I have a big SQL dump with ~1.3 million rows.
I'm trying to import it through the mysql console like this:
source mysql_dump.sql
It goes well at the start: it creates the new table and so on, but after some time the INSERT queries take longer and longer to run.
E.g. the console reports results and timing for every batch of ~1700 records. At the beginning, ~1700 inserts take ~0.3 sec; after 5 minutes they take ~1 minute.
What can be done to make it process the queries as fast as it did at the beginning?
This is a bit long for a comment.
One possibility is indexes. You should drop all the indexes on the table before inserting the records, then add them back after all the data is in the table. Maintaining indexes during the load slows the inserts down.
Second, if you want to load all the data into one table, it is better to load it using LOAD DATA INFILE.
When you do this many inserts, commit after every 1,000 records or so.
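A minimal sketch of that pattern, assuming the data is also available as a flat file (the table name, index name, and file path below are hypothetical):
-- hypothetical table, index, and file names
ALTER TABLE big_table DROP INDEX idx_created;               -- drop secondary indexes before the load
SET autocommit = 0;                                         -- avoid a commit per row; commit in large batches instead
LOAD DATA INFILE '/tmp/big_table.csv'
  INTO TABLE big_table
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n';
COMMIT;
ALTER TABLE big_table ADD INDEX idx_created (created_at);   -- rebuild the index once, after the data is in
If the dump stays as plain INSERT statements, the same idea applies: keep autocommit off and issue a COMMIT every ~1,000 rows instead of letting each INSERT commit on its own.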
Related
So I have a MySQL database in my project.
I have a main table that holds the main data for updating and inserting.
There is heavy traffic on this data. What I mainly do is read a .csv file and insert the rows into the table.
Everything works fine for 3 days, but once the table goes above 20 million records the database starts responding slowly, and at 60 million it is even slower.
What I have done:
I have added indexes on the columns where I think they are needed (the WHERE-clause fields, for fast searching).
I don't think query optimisation is the issue, because the database works fine for 3 days and only slows down as the table fills up; by the time I reach 60 million records it is even slower.
Can you suggest an approach for handling this?
What should I do? Should I move the data out every 3 days, or what? What have you done in this kind of situation?
The purpose of a database is to store huge amounts of information. I don't think the problem is your database itself; it is more likely poor queries, joins, the database buffer, indexes, or caching. These are the usual reasons responses slow down. For more info check this link.
"I have added indexes on the columns where I think they are needed"
Yes, indexes improve the performance of SELECT queries, but at the same time they degrade your DML operations, because the index has to be restructured whenever you change an indexed column.
So this depends entirely on your business needs: whether you need the index at all, and whether you can compromise on SELECT or on DML performance (a small illustration follows below).
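A tiny example of the trade-off, with made-up table and column names:
-- hypothetical table: readings(device_id, recorded_at, value)
CREATE INDEX idx_device ON readings (device_id);   -- speeds up SELECT ... WHERE device_id = ?
-- ...but from now on every insert also has to update idx_device, not just the table
INSERT INTO readings (device_id, recorded_at, value) VALUES (42, NOW(), 3.14);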
Many organisations now use two separate schemas: OLAP for reporting and analytics, and OLTP for storing real-time data (including some real-time reporting).
First of all, it would help us to know what kind of data you want to store.
Normally it makes no sense to store such a huge amount of data every 3 days, because nobody will ever be able to use it effectively. So it is better to reduce the data before storing it in the database.
e.g.
If you get measurements from a device that produces one value per millisecond, ask yourself whether any user will ever request a specific value at a specific millisecond, or whether it makes more sense to calculate the average value per second, per minute, per hour, or perhaps per day.
If you really need the milliseconds, but only when the user takes a deeper look, you can create a summary table derived from the main table that holds only the hourly or daily averages and work with that table (see the sketch below). Only when the user drills into the "milliseconds" view do you use the main table and have to live with the worse performance.
All of this is of course only possible if the data is read-only. If the data in the database is changed by the application (and not only appended by the CSV import), then using more than one table will be error-prone.
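A rough sketch of that summary-table idea; the table and column names are invented for illustration:
-- hypothetical raw table: readings(device_id, recorded_at, value)
CREATE TABLE readings_hourly (
  device_id  INT      NOT NULL,
  hour_start DATETIME NOT NULL,
  avg_value  DOUBLE   NOT NULL,
  PRIMARY KEY (device_id, hour_start)
);
-- refresh the summary after each CSV import; reruns simply overwrite the existing averages
INSERT INTO readings_hourly (device_id, hour_start, avg_value)
SELECT device_id,
       DATE_FORMAT(recorded_at, '%Y-%m-%d %H:00:00'),
       AVG(value)
FROM readings
GROUP BY device_id, DATE_FORMAT(recorded_at, '%Y-%m-%d %H:00:00')
ON DUPLICATE KEY UPDATE avg_value = VALUES(avg_value);
User-facing queries then read from readings_hourly, and only the "milliseconds" drill-down touches the big readings table.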
Which operation do you want to speed up?
Insert operation
A good way to speed it up is to insert records in batches. For example, insert 1000 records in each INSERT statement:
INSERT INTO test VALUES (value_list), (value_list), ..., (value_list);
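For instance, with a made-up three-column table, a single batched statement would look like this:
-- hypothetical table test(id, name, score); one round trip inserts all three rows
INSERT INTO test (id, name, score) VALUES
  (1, 'a', 10),
  (2, 'b', 20),
  (3, 'c', 30);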
Other operations
If your table has tens of millions of records, everything will slow down. This is quite common.
To speed things up in this situation, here is some advice:
Optimize your table definition. This depends on your particular case; creating indexes is a common approach.
Optimize your SQL statements. A good SQL statement will obviously run much faster, while a bad one can be a performance killer.
Data migration. If only part of your data is used frequently, you can move the infrequently used data to another big table.
Sharding. This is a more complicated approach, but it is commonly used in big-data systems.
For the .csv file, use LOAD DATA INFILE ...
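A hedged sketch of what that could look like for the CSV import; the file path, table, and column list are assumptions:
-- assumes a header row and columns matching (device_id, recorded_at, value)
LOAD DATA INFILE '/var/lib/mysql-files/import.csv'
  INTO TABLE readings
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  IGNORE 1 LINES
  (device_id, recorded_at, value);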
Are you using InnoDB? How much RAM do you have? What is the value of innodb_buffer_pool_size? It may not be set right, given that queries slow down as the data grows.
Let's see a slow query, and the output of SHOW CREATE TABLE. Often a 'composite' index is needed, or a reformulation of the SELECT.
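To check those settings (the table name below is hypothetical, and the 4 GB value is only an example; the buffer pool is commonly sized to a large share of RAM on a dedicated database server):
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;   -- current size in MB
SHOW CREATE TABLE my_table\G                                        -- hypothetical table name
SET GLOBAL innodb_buffer_pool_size = 4294967296;                    -- example: 4 GB, resizable online in MySQL 5.7+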
I have two tables, and I get about 1 million records for each of them. I am using a cron job every night to insert the records. In the first table I truncate the table first and then insert the records; in the second table I update and insert records according to the primary key. I am using MySQL as my database. My problem is that I need to do this task every day, but I am unable to insert all the data in time. What could be a possible solution to this problem?
The important thing is to switch off all the actions and checks MySQL wants to perform while loading the data, such as autocommit, index maintenance, etc.
https://dev.mysql.com/doc/refman/5.7/en/optimizing-innodb-bulk-data-loading.html
If you do not do this, MySQL does a lot of work after every record added, and it adds up as the process proceeds, resulting in very slow processing and importing towards the end, and it may not complete in one day.
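A minimal sketch of those settings around a bulk load, following the linked InnoDB bulk-loading page (the table and file names are hypothetical):
SET autocommit = 0;            -- one big transaction instead of a commit per row
SET unique_checks = 0;         -- skip uniqueness checks during the load (only if the data is already clean)
SET foreign_key_checks = 0;    -- skip foreign key checks during the load
LOAD DATA INFILE '/tmp/nightly.csv'
  INTO TABLE main_table
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
COMMIT;
SET unique_checks = 1;
SET foreign_key_checks = 1;
SET autocommit = 1;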
If you must use MySQL: for the first table, disable the indexes, do the inserts, then enable the indexes again (see the sketch below). This will work faster.
Alternatively, MongoDB would be faster, and Redis is very fast.
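For the truncate-and-reload table, a sketch of that disable/enable approach; this works for MyISAM tables (on InnoDB, dropping and re-adding the secondary indexes achieves the same effect), and the table name is made up:
TRUNCATE TABLE first_table;
ALTER TABLE first_table DISABLE KEYS;   -- stop maintaining non-unique indexes during the load
-- ... run the nightly bulk inserts here ...
ALTER TABLE first_table ENABLE KEYS;    -- rebuild the indexes in one pass at the end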
Scenario
I have an hourly cron that inserts roughly 25k entries into a table of about 7 million rows. My primary key is a composite of 5 different fields. I did this so that I wouldn't have to search the table for duplicates prior to insert, assuming the dupes would just fall to the floor on insert. Due to PHP memory issues I was seeing while reading these 25k entries in (downloading multiple JSON files from a URL and constructing insert queries), I break the entries into 2k chunks and insert them at once via INSERT INTO blah (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);. Lastly, I should probably mention I'm on DreamHost, so I doubt my server/db setup is all that great. Oh, and the table is MyISAM (the default).
Problem
Each 2k-chunk insert is taking roughly 20-30 seconds (resulting in about a 10-minute total script time, including 2 minutes for downloading the 6k JSON files), and while this is happening, user SELECTs from that table appear to be getting blocked/delayed, making the website unresponsive for users. My guess is that the slowdown comes from the insert having to index the 5-field PK into a table of 7 million rows.
What I'm considering
I originally thought enabling concurrent inserts/selects would help the unresponsive site, but as far as I can tell my table is already MyISAM and I already have concurrent inserts enabled.
I read that LOAD DATA INFILE is a lot faster, so I was thinking of maybe inserting all my values into an empty temp table that would be mostly collision-free (besides dupes from the current hour), exporting those with SELECT * INTO OUTFILE and then using LOAD DATA INFILE, but I don't know if the overhead of inserting and writing negates the speed benefit. Also, the guides I've read talk about further optimizing by disabling indexes prior to insert, but I think that would break my method of avoiding duplicates on insert...
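For concreteness, the flow I'm imagining looks something like this (all names are made up, and I'm not sure it's the right approach):
-- the hour's rows go into an empty staging table first
CREATE TABLE staging LIKE main_table;
-- ... the 2k-row chunk INSERTs from PHP go into `staging` instead of `main_table` ...
SELECT * INTO OUTFILE '/tmp/hourly.csv'
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
  FROM staging;
LOAD DATA INFILE '/tmp/hourly.csv'
  IGNORE INTO TABLE main_table   -- IGNORE keeps the "dupes fall to the floor" behaviour on the 5-column PK
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
TRUNCATE TABLE staging;          -- ready for the next hourly run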
It's probably obvious that I'm a bit clueless here; I know just enough to get myself really confused about what to do next. Any advice on how to speed up the inserts, or just on keeping selects responsive while these inserts are occurring, would be greatly appreciated.
I have created a query using the Doctrine query builder that inserts almost 65,000 rows (across all 3 tables) into 3 different tables at once when a certain action is performed, and the complete process takes almost 2-3 minutes to execute.
What I have done is persist the records in loops and then flush at the end.
So is there any way to minimize the execution time and insert the data within seconds?
No, unfortunately Doctrine doesn't support grouping inserts into a single statement. If you need to do bulk inserts, one possibility is calling $em->flush() and $em->clear() after every 100th or so row; see the manual's recommendation:
https://doctrine-orm.readthedocs.org/en/latest/reference/batch-processing.html
Just finished rewriting many queries as batch queries - no more DB calls inside of foreach loops!
One of these new batch queries, an INSERT IGNORE into a pivot table, is taking 1-4 seconds each time. It is fairly large (~100 rows per call) and the table also has > 2 million rows.
This is the current bottleneck in my program. Should I consider something like locking the table (never done this before, but I have heard it is ... dangerous), or are there other options I should look at first?
As it is a pivot table, there is a unique key comprised of both of the columns I am updating.
Are you using indexes? Indexing the correct columns speeds things up immensely. If you are doing a lot of updating and inserting, it sometimes makes sense to disable indexes until you are finished, since re-indexing takes time. I don't understand how locking the table would help. Is this table in use by other users or applications? That would be the main reason locking would increase speed.