How should I speed up inserts in a large table - MySQL

Scenario
I have an hourly cron that inserts roughly 25k entries into a table that's about 7 million rows. My primary key is a composite of 5 different fields. I did this so that I wouldn't have to search the table for duplicates prior to insert, assuming the dupes would just fall to the floor on insert. Due to PHP memory issues I was seeing while reading these 25k entries in (downloading multiple JSON files from a URL and constructing insert queries), I break the entries into 2k chunks and insert each chunk with a single multi-row statement: INSERT INTO blah (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);. Lastly, I should probably mention I'm on DreamHost, so I doubt my server/DB setup is all that great. Oh, and the table is MyISAM (the default).
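For the dupes to actually fall to the floor rather than abort the whole multi-row statement, the chunk inserts presumably need the IGNORE keyword; a minimal sketch, reusing the placeholder table and columns from the question:

INSERT IGNORE INTO blah (a, b, c)
VALUES
  (1, 2, 3),
  (4, 5, 6),
  (7, 8, 9);
-- rows whose composite primary key already exists are silently skipped instead of raising an error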
Problem
Each 2k-chunk insert is taking roughly 20-30 seconds (resulting in about a 10-minute total script time, including 2 minutes for downloading 6k JSON files), and while this is happening, user SELECTs from that table appear to be getting blocked/delayed, making the website unresponsive. My guess is that the slowdown comes from the insert having to maintain the 5-field primary key index on a table of 7 million rows.
What I'm considering
I originally thought enabling concurrent inserts/selects would help with the unresponsive site, but as far as I can tell my table is already MyISAM and I already have concurrent inserts enabled.
I read that LOAD DATA INFILE is a lot faster, so I was thinking of inserting all my values into an empty temp table that will be mostly collision-free (besides dupes from the current hour), exporting those with SELECT * INTO OUTFILE and then using LOAD DATA INFILE, but I don't know whether the overhead of the extra inserting and writing negates the speed benefit. Also, the guides I've read talk about further optimizing by disabling my indexes prior to insert, but I think that would break my method of avoiding duplicates on insert...
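A rough sketch of that temp-table route, assuming a staging table with the same structure as the live one; the file path and table names are placeholders, and IGNORE plays the same duplicate-skipping role as above:

CREATE TEMPORARY TABLE blah_staging LIKE blah;
-- ... the 2k-row chunk inserts go into blah_staging instead of blah ...
SELECT * FROM blah_staging
  INTO OUTFILE '/tmp/hourly_batch.csv'
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
LOAD DATA INFILE '/tmp/hourly_batch.csv'
  IGNORE INTO TABLE blah
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';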
It's probably obvious that I'm a bit clueless here; I know just enough to get myself really confused about what to do next. Any advice on how to speed up the inserts, or just how to keep selects responsive while these inserts are occurring, would be greatly appreciated.

Related

Performance of MySQL LOAD DATA INFILE over small file

I'm loading a small data file consisting of around 1K rows into a MyISAM table:
{
id INT(8),
text TEXT(or VARCHAR(1000))
}
The cost is around 2 seconds for LOAD DATA INFILE. I've seen that MySQL can load more than 10K rows per second on average when loading large files, and I roughly know there are costs such as opening/closing tables. Can someone help me understand what exactly happens in these 2 seconds, and whether it's possible to get it under a second? My program runs in a time-critical environment. Thanks.
Somebody asked a similar question here
http://forums.mysql.com/read.php?144,558753,558753.
Looks like it has not been well answered yet.
Scenario Description
The whole MySQL setup is for some academic projects, with around 300G of databases for various projects. Most of these databases, if not all, use the MyISAM engine. They contain imported dumps and processed intermediate tables from experiments. There are delete and update operations on these tables, but right now they are all idle. I have a project that generates result tuples which are inserted into a table in one of the databases. The table starts out empty, and the schema is very simple, containing only the two columns I pasted. Now, if I set ENGINE=MyISAM, it always takes 2s to insert 1-1K rows; however, if I switch to ENGINE=InnoDB, it drops to 0.01s. I installed a fresh MySQL on another machine, created the table with ENGINE=MyISAM, and inserted the same number of rows; there it only takes 0.01s.
At 1k rows, you may find that multi-inserts are faster. Some benchmarking should help. This should be helpful as well:
http://dev.mysql.com/doc/refman/5.5/en/optimizing-innodb.html
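For comparison, a multi-insert here just means batching the 1K rows into one statement; a minimal sketch with an invented table name and values:

INSERT INTO results (id, text)
VALUES
  (1, 'first tuple'),
  (2, 'second tuple'),
  (3, 'third tuple');
-- extend the VALUES list to cover the whole ~1K-row batch in a single round trip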

2273 msec to insert 7679 records is this fast or slow?

I'm planning on inserting a large number of rows into my MySQL database. At the moment I'm inserting about 8000 records into an almost empty table with no indexes (only a primary key with auto-increment), using batches, against a MySQL server (default install) on localhost (i7, 6 GB RAM, fast HD).
It currently takes about 2273 msec to insert 7679 records.
A single record looks like:
39492, 1.4618, 1.4619, 1.4606, 1.4613, 1199235602000, 0, 133
I was wondering whether this is a normal average speed or whether I should be worried because it is extremely slow.
I ask because I have no point of reference when it comes to speed, so I can't tell whether my code is fine or whether it might be buggy and that's why it's slow.
0.3 milliseconds per row is respectable performance, especially if you haven't yet done anything to make your code run fast. If you have any indexes on your table, the insertion rate may slow down as you get up to many thousands of rows already in the database. Then you'll need to look at disabling constraints, loading the table, and re-enabling the constraints afterwards. But you can cross that bridge if you come to it.
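If it does come to that, a minimal sketch of the disable/load/re-enable pattern for a MyISAM table (table name invented); note that DISABLE KEYS only suspends non-unique indexes, so a primary key is still maintained during the load:

ALTER TABLE quotes DISABLE KEYS;   -- stop maintaining non-unique indexes during the bulk load
-- ... run the batched INSERTs here ...
ALTER TABLE quotes ENABLE KEYS;    -- rebuild the suspended indexes in one pass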

how to optimize a slow batch INSERT IGNORE in MySQL

Just finished rewriting many queries as batch queries - no more DB calls inside of foreach loops!
One of these new batch queries, an INSERT IGNORE into a pivot table, is taking 1-4 seconds each time. It is fairly large (~100 rows per call) and the table is also > 2 million rows.
This is the current bottleneck in my program. Should I consider something like locking the table (never done this before, but I have heard it is ... dangerous), or are there other options I should look at first?
As it is a pivot table, there is a unique key comprising both of the columns I am updating.
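For concreteness, a sketch of what such a batch looks like, with hypothetical pivot-table and column names:

CREATE TABLE tag_post (
  tag_id  INT UNSIGNED NOT NULL,
  post_id INT UNSIGNED NOT NULL,
  UNIQUE KEY uniq_tag_post (tag_id, post_id)
);

INSERT IGNORE INTO tag_post (tag_id, post_id)
VALUES (1, 10), (1, 11), (2, 10);   -- ~100 pairs per call; pairs that already exist are skipped by IGNORE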
Are you using indexes? Indexing the correct columns speeds things up immensely. If you are doing a lot of updating and inserting, it sometimes makes sense to disable indexes until you're finished, since re-indexing takes time. I don't understand how locking the table would help. Is this table in use by other users or applications? That would be the main reason locking would increase speed.

MySQL locking processing large LOAD DATA INFILE when live SELECT queries still needed

Looking for some help and advice please from Super Guru MySQL/PHP pros who can spare a moment of their time.
I have a web application in PHP/MySQL which has grown over the years and gets a lot of searches. It's hitting bottlenecks now when the various daily data dumps of new rows get processed with MySQL LOAD DATA INFILE.
It's a large MyISAM table with about 1.5 million rows, and all the SELECT queries run against it. When those SELECTs take place during the LOAD DATA INFILE of about 600k rows (and the deletion of outdated data), they just get backed up and take 30+ minutes to be freed up, making any of those searches fruitless.
I need to come up with a way to get that table updated while retaining the ability to provide SELECT results in a reasonable timeframe.
I'm completely out of ideas and have not been able to come up with a solution myself, as it's the first time I've encountered this sort of issue.
Any helpful advice, solutions or pointers from similar past experiences would be greatly appreciated as I would love to learn to resolve this sort of problem.
Many thanks everyone for your time! J
You can use the CONCURRENT keyword with LOAD DATA INFILE. This way, while you load the data, the table is still able to serve SELECTs.
Concerning the delete, this is more complicated. I would personally add a 'status' INT(1) column, which defines whether the row is active or not (i.e. deleted), and then partition the table with a rule based on this status column.
This way it will be easier to drop all rows where status=0. :P I haven't tested this last part; I may do that in the near future.
The CONCURRENT keyword only helps if your table is optimized: if there are free blocks (holes left by deletes) in the middle of the data file, the LOAD DATA INFILE will still lock the table.
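A minimal sketch of that combination, with a placeholder file path and table name; OPTIMIZE TABLE removes the holes that would otherwise force the load to take a full table lock:

OPTIMIZE TABLE search_data;        -- defragment so concurrent inserts can append at the end of the data file
LOAD DATA CONCURRENT INFILE '/path/to/daily_dump.csv'
  IGNORE INTO TABLE search_data
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';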
MyISAM doesn't support row-level locking, so operations like mysqldump are forced to lock the entire table to guarantee a consistent dump. Your only practical options are to switch to another table type (like InnoDB) that supports row-level locking, and/or split your dump up into smaller pieces. The small dumps will still lock the table while they're dumping/reloading, but the lock periods will be shorter.
A hairier option would be to have "live" and "backup" tables. Do the dump/load operations on the backup table. When they're complete, swap it out for the live table (rename the tables, or have your code dynamically change which table it's using). If you can live with a short window of potentially stale data, this could be a better option.
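A sketch of that swap, assuming tables named search_live and search_backup; RENAME TABLE applies all the renames as one atomic operation, so readers only ever see a complete table:

-- load the day's dump into the backup copy while the live table keeps serving SELECTs
LOAD DATA INFILE '/path/to/daily_dump.csv'
  INTO TABLE search_backup
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

-- atomic rotation: the old live table becomes the next run's backup
RENAME TABLE search_live   TO search_old,
             search_backup TO search_live,
             search_old    TO search_backup;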
You should switch your table storage engine from MyISAM to InnoDB. InnoDB provides row-locking (as opposed to MyISAM's table-locking) meaning while one query is busy updating or inserting a row, another query can update a different row at the same time.
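The conversion itself is one statement (table name hypothetical), though on a large table it rewrites everything and is best run off-peak:

ALTER TABLE search_data ENGINE=InnoDB;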

Generating a massive 150M-row MySQL table

I have a C program that mines a huge data source (20 GB of raw text) and generates loads of INSERTs to execute against a simple blank table (4 integer columns with 1 primary key). Set up as a MEMORY table, the entire task completes in 8 hours. After finishing, about 150 million rows exist in the table. Eight hours is a completely decent number for me. This is a one-time deal.
The problem comes when trying to convert the MEMORY table back into MyISAM so that (A) I'll have the memory freed up for other processes and (B) the data won't be killed when I restart the computer.
ALTER TABLE memtable ENGINE = MyISAM
I've let this ALTER TABLE query run for over two days now, and it's not done. I've now killed it.
If I create the table initially as MyISAM, the write speed seems terribly poor (especially because the query relies on the ON DUPLICATE KEY UPDATE technique). I can't temporarily turn off the keys: the table would become over 1000 times larger if I did, and then I'd have to reprocess the keys and essentially run a GROUP BY on 150,000,000,000 rows. Umm, no.
One of the key constraints to realize: The INSERT query UPDATEs records if the primary key (a hash) exists in the table already.
At the very beginning of an attempt at strictly using MyISAM, I'm getting a rough speed of 1,250 rows per second. Once the index grows, I imagine this rate will tank even more.
I have 16GB of memory installed in the machine. What's the best way to generate a massive table that ultimately ends up as an on-disk, indexed MyISAM table?
Clarification: There are many, many UPDATEs going on from the query (INSERT ... ON DUPLICATE KEY UPDATE val=val+whatever). This isn't, by any means, a raw dump problem. My reasoning for trying a MEMORY table in the first place was to speed up all the index lookups and table changes that occur for every INSERT.
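For reference, the shape of the statement being described, with invented table and column names standing in for the 4-integer-column schema:

INSERT INTO counts (hash_pk, a, b, val)
VALUES (123456789, 1, 2, 10)
ON DUPLICATE KEY UPDATE val = val + VALUES(val);
-- if hash_pk already exists, the existing row's val is bumped instead of inserting a new row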
If you intend to make it a MyISAM table, why are you creating it in memory in the first place? If it's only for speed, I think the conversion to a MyISAM table is going to negate any speed improvement you get by creating it in memory to start with.
You say inserting directly into an "on disk" table is too slow (though I'm not sure how you're deciding that, given that your current method is taking days), but you may be able to turn off or remove the uniqueness constraints, bulk-load the data, then use a DELETE query later to re-establish uniqueness, and finally re-enable/re-add the constraints. I have used this technique when importing into an InnoDB table in the past, and found that even with the later delete it was much faster overall.
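A hedged sketch of that sequence, reusing the invented names from the earlier upsert example; the exact duplicate-removal query depends on how a surviving row should be chosen (and in this particular question the duplicate values would need summing rather than deleting):

ALTER TABLE counts DROP PRIMARY KEY;              -- remove the uniqueness check before the bulk load
-- ... bulk-load every generated row with no duplicate checks ...
DELETE t1 FROM counts t1
  JOIN counts t2
    ON t1.hash_pk = t2.hash_pk AND t1.val < t2.val;  -- keep the largest-val row per hash (ties need another rule)
ALTER TABLE counts ADD PRIMARY KEY (hash_pk);     -- re-add the constraint once the data is unique again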
Another option might be to create a CSV file instead of the INSERT statements, and either load it into the table using LOAD DATA INFILE (I believe that is faster than individual inserts, but I can't find a reference at present) or use it directly via the CSV storage engine, depending on your needs.
Sorry to keep throwing comments at you (last one, probably).
I just found this article which provides an example of converting a large table from MyISAM to InnoDB. While this isn't exactly what you are doing, the author uses an intermediate MEMORY table and describes going from memory to InnoDB efficiently, by ordering the table in memory the way that InnoDB expects it to be ordered in the end. If you aren't tied to MyISAM it might be worth a look, since you already have a "correct" memory table built.
I don't use MySQL; I use SQL Server, but this is the process I use to handle a file of similar size. First I dump the file into a staging table that has no constraints. Then I identify and delete the dupes from the staging table. Then I search for existing records that might match and put their ID into a column in the staging table. Then I update where the ID column is not null and insert where it is null. One of the reasons I do all the work of getting rid of the dupes in the staging table is that it means less impact on the prod table when I run it, and thus it is faster in the end.
My whole process runs in less than an hour (and actually does much more than I describe, as I also have to denormalize and clean the data) and affects production tables for less than 15 minutes of that time. I don't have to worry about adjusting any constraints or dropping indexes or any of that, since I do most of my processing before I hit the prod table.
Consider whether a similar process might work better for you. Also, could you use some sort of bulk import to get the raw data into the staging table (I pull the 22 gig file I have into staging in around 16 minutes) instead of working row by row?
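For what it's worth, a rough MySQL translation of that staging-table flow, again with invented table and column names; the merge step assumes the duplicate values should be summed, as in the question:

CREATE TABLE staging LIKE counts;
ALTER TABLE staging DROP PRIMARY KEY;              -- no constraints while bulk-loading
LOAD DATA INFILE '/path/to/raw_rows.csv'
  INTO TABLE staging
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

-- collapse the dupes inside staging first, away from the production table
CREATE TABLE staging_clean AS
  SELECT hash_pk, a, b, SUM(val) AS val
  FROM staging
  GROUP BY hash_pk, a, b;

-- one pass against production: bump existing hashes, insert new ones
INSERT INTO counts (hash_pk, a, b, val)
  SELECT hash_pk, a, b, val FROM staging_clean
  ON DUPLICATE KEY UPDATE val = val + VALUES(val);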