I have a situation in which I have to insert over 10 million separate records into one table. Normally a batch insert split into chunks does the job for me. The problem, however, is that this 3+ GB file contains over 10 million separate INSERT statements. Since every query takes 0.01 to 0.1 seconds, it will take over 2 days to insert everything.
I'm sure there must be a way to optimize this, either by lowering the insert time drastically or by importing the data in a different way.
Right now I'm just using the CLI:
source /home/blabla/file.sql
Note: it's a third party that is providing me this file.
Small update: I removed all the indexes.
Drop the indexes, then re-index when you are done!
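For example (a minimal sketch, assuming a hypothetical table mytable with a secondary index idx_col1 on column col1):
ALTER TABLE mytable DROP INDEX idx_col1;        -- drop the secondary index before the bulk insert
-- ... run the bulk insert here ...
ALTER TABLE mytable ADD INDEX idx_col1 (col1);  -- rebuild the index once, afterwards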
Maybe you can parse the file data and combine several INSERT queries into one query, like this:
INSERT INTO tablename (field1, field2...) VALUES (val1, val2, ..), (val3, val4, ..), ...
There are some ways to improve the speed of your INSERT statements:
Try to insert many rows at once if this is an option.
An alternative is to create a copy of your target table without indexes, insert the data there, then add the indexes and rename the table (sketched below).
Maybe use LOAD DATA INFILE, if this is an option.
The MySQL manual has something to say about that, too.
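A minimal sketch of the copy-table suggestion above, assuming a hypothetical table orders with one secondary index; all names and columns here are illustrative:
CREATE TABLE orders_load LIKE orders;            -- copy of the target table
ALTER TABLE orders_load DROP INDEX idx_customer; -- drop the secondary index(es) on the copy

INSERT INTO orders_load (id, customer_id, total) VALUES  -- bulk insert into the index-free copy
  (1, 10, 99.90),
  (2, 11, 12.50);

ALTER TABLE orders_load ADD INDEX idx_customer (customer_id); -- rebuild the index once
RENAME TABLE orders TO orders_old, orders_load TO orders;     -- swap the tables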
Related
I want to add some 1000 records to my table to populate a database. Inserting each record manually is not at all practical. Is there a proper way to do this?
In MySQL you can insert multiple rows with a single insert statement.
INSERT INTO tablename VALUES (data-row-1), (data-row-2), (data-row-3);
If you run a mysqldump on your database, you will see that this is what the output does.
The insert is then run as a single "transaction", so it's much, much faster than running 1,000 individual inserts.
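For example, with a hypothetical table people(id, name), the single-statement form looks like this:
INSERT INTO people (id, name) VALUES
  (1, 'Alice'),
  (2, 'Bob'),
  (3, 'Carol');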
I get a report in a tab-delimited file which stores some SKUs and their current quantities.
Most of the time the inventory is the same and we just have to update the quantities.
But it can happen that a new SKU is in the list, which we have to insert instead of update.
We are using an InnoDB table for storing those SKUs. At the moment we just split the file by tabs and line breaks and build an INSERT ... ON DUPLICATE KEY UPDATE query, which is quite inefficient, because INSERT is expensive in InnoDB, right? It is also slow: when a list with a lot of SKUs (> 20k) comes in, it takes several minutes.
So my plan for now is to just do a LOAD DATA INFILE into a tmp table and afterwards run the INSERT ... ON DUPLICATE KEY UPDATE, which should be faster, I think.
Also, is there another solution which does a simple UPDATE first and performs an INSERT only for the rows that are left? That would be perfect, but I could not find anything about it. Is there a way to delete rows which returned an update: 1?
1. Sort the CSV file by the PRIMARY KEY of the table.
2. LOAD DATA INFILE into a separate table (as you said).
3. INSERT INTO real_table SELECT * FROM tmp_table ON DUPLICATE KEY UPDATE ... -- Note: This is a single INSERT.
Caveat: This may block the table from other uses during step 3. A solution: Break the CSV into 1000-row chunks. COMMIT after each chunk.
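A rough sketch of steps 2 and 3, assuming a hypothetical real table inventory(sku, qty) with a unique key on sku; the file path and column names are illustrative:
CREATE TABLE tmp_inventory LIKE inventory;   -- staging table with the same structure

LOAD DATA INFILE '/path/to/report.txt'
INTO TABLE tmp_inventory
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(sku, qty);

INSERT INTO inventory (sku, qty)             -- single-statement upsert from the staging table
SELECT sku, qty FROM tmp_inventory
ON DUPLICATE KEY UPDATE qty = VALUES(qty);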
Hi, I have a huge unnormalized MySQL database with ~100 million URLs (~20% dupes) divided into identical split tables of 13 million rows each.
I want to move the URLs into a normalized database on the same MySQL server.
The old database table is unnormalized, and the URLs have no index.
It looks like this:
entry{id,data,data2, data3, data4, possition,rang,url}
And I'm going to split it up into multiple tables:
url{id,url}
data{id,data}
data1{id,data}
etc
The first thing I did was:
INSERT IGNORE INTO newDatabase.url (url)
SELECT DISTINCT unNormalised.url FROM oldDatabase.unNormalised
But the " SELECT DISTINCT unNormalised.url" (13 million rows) took ages, and I figured that that since "INSERT IGNORE INTO" also do a comparison, it would be fast to just do a
INSERT IGNORE INTO newDatabase.url (url)
SELECT unNormalised.url FROM oldDatabase.unNormalised
without the DISTINCT. Is this assumption wrong?
Anyway, it still takes forever and I need some help. Is there a better way of dealing with this huge quantity of unnormalized data?
Would it be best if I did a SELECT DISTINCT unNormalised.url on the entire 100 million row database, exported all the IDs, and then moved only those IDs to the new database with, let's say, a PHP script?
All ideas are welcome; I have no clue how to port all this data without it taking a year!
PS: it is hosted on an Amazon RDS server.
Thank you!
Since the MySQL manual states that LOAD DATA INFILE is quicker than INSERT, the fastest way to load your data would be:
LOCK TABLES url WRITE;
ALTER TABLE url DISABLE KEYS;
LOAD DATA INFILE 'urls.txt'
IGNORE
INTO TABLE url
...;
ALTER TABLE url ENABLE KEYS;
UNLOCK TABLES;
But since you already have the data loaded into MySQL and just need to normalize it, you might try:
LOCK TABLES url WRITE;
ALTER TABLE url DISABLE KEYS;
INSERT IGNORE INTO url (url)
SELECT url FROM oldDatabase.unNormalised;
ALTER TABLE url ENABLE KEYS;
UNLOCK TABLES;
My guess is that INSERT IGNORE ... SELECT will be faster than INSERT IGNORE ... SELECT DISTINCT, but that's just a guess.
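Once the url table is filled, the other columns can be migrated with the same INSERT ... SELECT pattern. A rough sketch, assuming url.url has a UNIQUE index (which INSERT IGNORE relies on) and that the new data table gets a url_id column to link back to url; that column is my assumption and is not in the schema above:
INSERT INTO newDatabase.data (url_id, data)
SELECT u.id, o.data
FROM oldDatabase.unNormalised AS o
JOIN newDatabase.url AS u ON u.url = o.url;  -- join on the url text to pick up the new id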
I have inserted 14,485 lines into MySQL like this:
INSERT INTO `bairros` (`id`,`cidade_id`,`descricao`) VALUES (1,8891,'VILA PELICIARI');
INSERT INTO `bairros` (`id`,`cidade_id`,`descricao`) VALUES (2,8891,'VILA MARIANA');
...
It took around 5 minutes.
I had to insert 16,021 lines into another table in the same database, so as a test I did this:
INSERT INTO `bairros` (`id`,`cidade_id`,`descricao`) VALUES (1,8891,'VILA PELICIARI'),(2,8891,'VILA MARIANA');
...
It took just a few seconds.
What is the difference, for the database, between the two scripts? And why is one faster than the other?
The difference is that the first script contains 14,485 separate queries, each of which must be committed.
The second is a single query.
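One way to narrow that gap while keeping individual statements (a sketch, reusing the bairros inserts above) is to turn off autocommit for the session so all the statements are committed once at the end:
SET autocommit = 0;
INSERT INTO `bairros` (`id`,`cidade_id`,`descricao`) VALUES (1,8891,'VILA PELICIARI');
INSERT INTO `bairros` (`id`,`cidade_id`,`descricao`) VALUES (2,8891,'VILA MARIANA');
-- ... remaining statements ...
COMMIT;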
I've created a stored procedure. I fetch a cursor with a huge amount of data (around 1 lakh, i.e. 100,000, rows).
After that I call another procedure inside it to do all the calculations on the needed data.
I create a temporary table and try to insert this calculated data into it, but it takes too long: about 9.5 minutes.
I want to know how to bulk insert the data using as few INSERT queries as possible, since 100,000 individual INSERT queries give the poorest performance. Can anyone help me?
You can use the following SQL statement for bulk insert:
INSERT INTO TABLE_A (A, B, C, D) VALUES
(1,1,1,1),
(2,2,2,2),
(3,3,3,3),
(4,4,4,4);
Your question is a bit vague, but you can bulk load data into MySQL using:
LOAD DATA INFILE ...
Check the following links:
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
MySQL load data infile - acceleration?
If you need to process the data during the load, it might be better to first bulk load the data into a temporary table, then run some stored procedures against it that process the data and populate your main tables.
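For the "process and populate" step, a set-based INSERT ... SELECT is usually much faster than looping over a cursor row by row. A sketch, using TABLE_A from above, a hypothetical staging table staging_rows, and a made-up calculation:
INSERT INTO TABLE_A (A, B, C, D)
SELECT s.a,
       s.b * 1.2,   -- made-up calculation, stands in for whatever your second procedure computes
       s.c,
       s.d
FROM staging_rows AS s;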
Hope this helps.
You can also insert the records into an in-memory table first. After the inserts into the in-memory table are complete, use the following code to move the rows into the real table in bulk:
INSERT INTO TABLE_A (A, B, C, D)
SELECT A, B, C, D FROM INMEMORY_TABLE;  -- copy everything from the in-memory table in one statement

DELETE FROM INMEMORY_TABLE;             -- then clear the in-memory table
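For completeness, the in-memory table itself would use the MEMORY engine, e.g. (a sketch mirroring TABLE_A's hypothetical columns):
CREATE TABLE INMEMORY_TABLE (
  A INT,
  B INT,
  C INT,
  D INT
) ENGINE = MEMORY;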