Yii2 + MySQL: dealing with larger amounts of data

Is there any way to deal with a large amount of data?
Here is my code, which fails with "Allowed memory size of 524288000 bytes exhausted (tried to allocate 20480 bytes)":
$amount_needed = 0;
$openWo = Production::find()->where(['fulfilled' => 0])->all(); // returns ~10k records
foreach ($openWo as $value) {
    $amount_needed += $value->amount_needed;
    $item = Item::findOne($value->item_id); // fires a query 10k times
    $materials = Materials::find()->where(['item_id' => $value->item_id])->all(); // fires a query 10k times
    foreach ($materials as $val) {
        // much more here...
    }
    // much more here...
}
So the outer foreach runs 10k times, and inside it there are further queries and further loops.
Is there any way to work with this much data in MySQL + Yii2, or do I need to switch from MySQL to another database?
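One possible direction, sketched here with standard Yii2 query APIs rather than anything from the original thread: ActiveQuery::each() iterates the result in chunks instead of hydrating everything with all(), ActiveQuery::sum() pushes the aggregation into MySQL, and the related Item / Materials lookups can be eager-loaded or prefetched instead of being fired once per row. The chunk size below is arbitrary.

$amount_needed = 0;

// Iterate in chunks of 500 instead of all(), so only one chunk of
// Production ActiveRecord objects is hydrated in memory at a time.
foreach (Production::find()->where(['fulfilled' => 0])->each(500) as $value) {
    $amount_needed += $value->amount_needed;

    // Related rows can still be looked up per record as before, or prefetched
    // outside the loop keyed by item_id to avoid the 10k extra queries.
    $item      = Item::findOne($value->item_id);
    $materials = Materials::find()->where(['item_id' => $value->item_id])->all();
    foreach ($materials as $val) {
        // ...
    }
}

// If only the total is needed, let MySQL do the aggregation entirely:
$amount_needed = Production::find()->where(['fulfilled' => 0])->sum('amount_needed');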

Related

Problem with loading large MySQL Table into JSON

We have a large MySQL table of roughly 50,000 rows. Loading all of the records - that is, getting them displayed in the browser in a JavaScript grid - takes close to 20 minutes, which is a big problem for us. We even tried returning all of the data as JSON, following this approach: How to convert result table to JSON array in MySQL, but loading is still extremely slow.
The query is very straightforward, as shown below. Can anyone suggest a solution to overcome this problem? Thanks in advance!
<?php
include 'includes/xxx.php';

$returnArray = array();
$totalSymptoms = 0;
$matchedSymptomIds = array();
$cutOff = 10;
$runningGlobalSymptomId = "";

$symptomResult = mysqli_query($db, "SELECT * FROM comparison_small WHERE (matched_percentage IS NULL OR matched_percentage >= " . $cutOff . ")");
if (mysqli_num_rows($symptomResult) > 0) {
    while ($symRow = mysqli_fetch_array($symptomResult)) {
        $dataArray = array();
        $totalSymptoms++;
        if ($symRow['current_symptom'] == '1') {
            // Global symptom
            $runningGlobalSymptomId = $symRow['symptom_id'];
            $dataArray['symptom_id'] = $symRow['symptom_id'];
            $dataArray['row_id'] = "row" . $symRow['symptom_id'];
            $dataArray['symptom'] = $symRow['symptom'];
            $dataArray['match'] = "";
        } else {
            // Local symptom
            array_push($matchedSymptomIds, $symRow['symptom_id']);
            $dataArray['symptom_id'] = $symRow['symptom_id'];
            $dataArray['row_id'] = "row" . $runningGlobalSymptomId . "_" . $symRow['symptom_id'];
            $dataArray['symptom'] = $symRow['symptom'];
            $dataArray['match'] = $symRow['matched_percentage'];
        }
        $returnArray[] = $dataArray;
    }
}
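A hedged suggestion that is not part of the original thread: rather than shipping all 50,000 rows to the browser at once, a common fix is to page the result on the server and let the grid request one page at a time. A minimal sketch, assuming the same $db mysqli connection; the page parameter, page size and ORDER BY column are illustrative choices only:

$pageSize = 500;
$page     = isset($_GET['page']) ? max(0, (int)$_GET['page']) : 0;
$offset   = $page * $pageSize;

// Prepared statement with LIMIT/OFFSET: only one page of rows is fetched and encoded.
$stmt = mysqli_prepare(
    $db,
    "SELECT symptom_id, symptom, current_symptom, matched_percentage
       FROM comparison_small
      WHERE matched_percentage IS NULL OR matched_percentage >= ?
      ORDER BY symptom_id
      LIMIT ? OFFSET ?"
);
mysqli_stmt_bind_param($stmt, 'iii', $cutOff, $pageSize, $offset);
mysqli_stmt_execute($stmt);
$result = mysqli_stmt_get_result($stmt);

$rows = array();
while ($row = mysqli_fetch_assoc($result)) {
    $rows[] = $row;
}

header('Content-Type: application/json');
echo json_encode($rows);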

Big database - doctrine query slow even with index

I'm building an app with Symfony 4 + Doctrine, where people can upload big CSV files and those records then get stored in a database. Before inserting, I'm checking that the entry doesn't already exist...
On a sample CSV file with only 1000 records, it takes 16 seconds without an index and 8 seconds with an index (MacBook, 3 GHz, 16 GB memory). My intuition tells me this is quite slow and should take under 1 second, especially with the index.
The index is set on the email column.
My code:
$ssList = $this->em->getRepository(EmailList::class)->findOneBy(["id" => 1]);

foreach ($csv as $record) {
    $subscriber_exists = $this->em->getRepository(Subscriber::class)
        ->findOneByEmail($record['email']);

    if ($subscriber_exists === NULL) {
        $subscriber = (new Subscriber())
            ->setEmail($record['email'])
            ->setFirstname($record['first_name'])
            ->addEmailList($ssList)
        ;

        $this->em->persist($subscriber);
        $this->em->flush();
    }
}
My Question:
How can I speed up this process?
Use LOAD DATA INFILE.
LOAD DATA INFILE has IGNORE and REPLACE options for handling duplicates if you put a UNIQUE KEY or PRIMARY KEY on your email column.
Also look at the MySQL settings that speed up bulk imports.
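For illustration, a minimal sketch of what that could look like when issued through Doctrine's connection; the file path, table and column names (subscriber, email, firstname) are assumptions, not taken from the question:

// Requires a UNIQUE KEY (or PRIMARY KEY) on subscriber.email; IGNORE skips duplicate rows.
// LOCAL INFILE must be enabled on both the client and the server.
$sql = <<<'SQL'
LOAD DATA LOCAL INFILE '/tmp/upload.csv'
IGNORE INTO TABLE subscriber
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(email, firstname)
SQL;

$this->em->getConnection()->executeUpdate($sql);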
As Cid said, move the flush() outside of the loop, or keep a batch counter inside the loop and only flush at certain intervals:
$batchSize = 1000;
$i = 1;
foreach ($csv as $record) {
    $subscriber_exists = $this->em->getRepository(Subscriber::class)
        ->findOneByEmail($record['email']);

    if ($subscriber_exists === NULL) {
        $subscriber = (new Subscriber())
            ->setEmail($record['email'])
            ->setFirstname($record['first_name'])
            ->addEmailList($ssList)
        ;

        $this->em->persist($subscriber);
        if (($i % $batchSize) === 0) {
            $this->em->flush();
        }
        $i++;
    }
}
$this->em->flush();
Or, if that's still slow, you could grab the Connection via $this->em->getConnection() and use DBAL directly, as described here: https://www.doctrine-project.org/projects/doctrine-dbal/en/2.8/reference/data-retrieval-and-manipulation.html#insert
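A rough sketch of that last option; the table and column names are assumed, the email-list relation is left out, and the duplicate check is delegated to a UNIQUE KEY on email instead of a per-row query:

$conn = $this->em->getConnection();

foreach ($csv as $record) {
    // Plain DBAL insert, bypassing the ORM's change tracking entirely.
    // With a UNIQUE KEY on email, duplicates fail the insert and are skipped.
    try {
        $conn->insert('subscriber', [
            'email'     => $record['email'],
            'firstname' => $record['first_name'],
        ]);
    } catch (\Doctrine\DBAL\Exception\UniqueConstraintViolationException $e) {
        // duplicate email, skip
    }
}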

Grails 3: Bulk insert performance issue

Grails Version: 3.3.2
I have 100k records I am loading from a CSV file and trying to do a bulk save. The issue I am having is that the bulk save is performing worse than a non-bulk save.
All the online resources I found basically use the same methods as this site, which I have referenced:
http://krixisolutions.com/bulk-insert-grails-gorm/
I tried all 3 solutions on the page, here is an example of one of them:
def saveFsRawData(List<FactFsRawData> rawData) {
    int startTime = DateTime.newInstance().secondOfDay;
    println("Start Save");

    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();

    rawData.eachWithIndex { FactFsRawData entry, int i ->
        session.save(entry);
        if (i % 1000 == 0) {
            session.flush();
            session.clear();
        }
    }

    tx.commit();
    session.close();
    println("End Save - " + (DateTime.newInstance().secondOfDay - startTime));
}
I have tried various bulk sizes from 100 to 5k (using 1k in the example). All of them average around 80 seconds.
If I remove the batch processing completely then I get an average of 65 seconds.
I am unsure of what the issue is or where I am going wrong. Any ideas?

Fast analytics over a table with thousands of rows, displayed in PHP

I have a table with thousands of rows and I want to build analytics charts from it in my PHP front end.
My table structure is shown in the attached screenshot, and this is how I display the data: from the user_agent column I derive the operating system, browser, and device.
For now I still use the old approach of looping over every row with a for () loop and parsing each one, and it takes a long time to respond and display the data.
Does anyone know how I can display this data without such a long response time on my website? Any ideas, either about the database structure or my PHP script?
Thank you in advance.
Assuming you're loading all your data in a PHP script and postprocessing it in a for-loop in PHP, you should alter your database query. A GROUP BY statement might help. Of course, you need to alter your script to work with the new data. Revisiting your database structure is a good idea, too. A better approach might be not to save the whole user-agent string in one column but to use several columns.
Example before:
$data = $db->query('SELECT * FROM table');
for ($i = 0; $i <= $data->max(); $i++) {
    $row = $data->getRow($i);
    postprocessRow($row); /* $sum += 1; */
}
Example after:
$data = $db->query('SELECT user_agent, COUNT(*) AS weight FROM table GROUP BY user_agent');
for ($i = 0; $i <= $data->max(); $i++) {
    $row = $data->getRow($i);
    postprocessRowWeighted($row); /* $sum += $row['weight']; */
}
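To make that concrete, a small sketch with plain mysqli; the table name (visits) and the user-agent parsing function are assumptions for illustration only:

// Let MySQL aggregate: one row per distinct user agent plus its hit count,
// instead of shipping every raw row to PHP and counting there.
$result = mysqli_query($db, 'SELECT user_agent, COUNT(*) AS hits FROM visits GROUP BY user_agent');

$osCounts = array();
while ($row = mysqli_fetch_assoc($result)) {
    $os = parseOperatingSystem($row['user_agent']); // hypothetical UA parser
    $osCounts[$os] = ($osCounts[$os] ?? 0) + $row['hits'];
}
// $osCounts now feeds the chart directly.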

Bulk insert performance issue in EF ObjectContext

I am trying to insert a large number of rows (>10,000,000) into a MySQL database using EF ObjectContext (db-first). After reading the answer to this question, I wrote this code (batch save) to insert about 10,000 contacts (roughly 30k rows in total, including related rows):
// var myContactGroupId = ...;
const int maxContactToAddInOneBatch = 100;
var numberOfContactsAdded = 0;

// IEnumerable<ContactDTO> contacts = ...
foreach (var contact in contacts)
{
    var newContact = AddSingleContact(contact); // method excerpt below
    if (newContact == null)
    {
        return;
    }

    if (++numberOfContactsAdded % maxContactToAddInOneBatch == 0)
    {
        LogAction(Action.ContactCreated, "Batch #" + numberOfContactsAdded / maxContactToAddInOneBatch);
        _context.SaveChanges();
        _context.Dispose();
        // _context = new ...
    }
}
// ...

private Contact AddSingleContact(ContactDTO contact)
{
    Validate(contact); // Simple input validations
    // ...
    // ...
    var newContact = Contact.New(contact); // Creates a Contact entity

    // Add cell numbers
    foreach (var cellNumber in contact.CellNumbers)
    {
        var existingContactCell = _context.ContactCells.FirstOrDefault(c => c.CellNo == cellNumber);
        if (existingContactCell != null)
        {
            // Set some error message and return
            return null;
        }
        newContact.ContactCells.Add(new ContactCell
        {
            CellNo = cellNumber,
        });
    }

    _context.Contacts.Add(newContact);
    _context.ContactsInGroups.Add(new ContactsInGroup
    {
        Contact = newContact,
        // GroupId = some group id
    });

    return newContact;
}
But it seems that the more contacts are added (batch by batch), the more time each batch takes, and the growth is non-linear.
Here is the log for a batch size of 100 (10k contacts). Notice the increasing time needed as the batch number grows:
12:16:48 Batch #1
12:16:49 Batch #2
12:16:49 Batch #3
12:16:50 Batch #4
12:16:50 Batch #5
12:16:50 Batch #6
12:16:51 Batch #7
12:16:52 Batch #8
12:16:53 Batch #9
12:16:54 Batch #10
...
...
12:21:26 Batch #89
12:21:32 Batch #90
12:21:38 Batch #91
12:21:44 Batch #92
12:21:50 Batch #93
12:21:57 Batch #94
12:22:03 Batch #95
12:22:10 Batch #96
12:22:16 Batch #97
12:22:23 Batch #98
12:22:29 Batch #99
12:22:36 Batch #100
It took 6 min 48 sec. If I increase the batch size to 10,000 (so everything goes in a single batch), it takes about 26 sec for the same 10k contacts. But when I try to insert 100k contacts (10k per batch), it takes a very long time, presumably because of that increasing per-batch time.
Can you explain why it takes an increasing amount of time even though the context is renewed?
Is there any other idea, apart from dropping to raw SQL?
Most answers on the question you linked to use context.Configuration.AutoDetectChangesEnabled = false; I don't see that in your example, so you should try it. You might also want to consider EF6: it has an AddRange method on the context for exactly this purpose; see INSERTing many rows with Entity Framework 6 beta 1.
Finally got it. It turns out the Validate() method was the culprit: it ran an existence-check query to see whether the contact already existed. As contacts are added, the table grows and that check takes longer with every batch, mainly because the cell number column it compares against was not indexed.