MySQL INSERT on large table - can I improve speed performance?

I have a MyISAM table with 2 fields f1 and f2, both unsigned integers that cannot be NULL. The table purposely has no primary key but it has an index on f2. The table currently has 320 million rows.
I would like to be able to insert new rows (about 4000 once every week) into this table at a decent speed. However, currently I insert about 4000 rows in 2 minutes. (I am doing this using a text file and the "source" command - The text file contains just INSERT statements into this table). Is there a way in which I can speed up the insert statements? Also, while performing the INSERT statements, will any SELECT/JOIN statements to the same table be affected or slowed down?

You can bulk up the insert statements from
INSERT INTO table (f1, f2) VALUES ($f1, $f2);
INSERT INTO table (f1, f2) VALUES ($other, $other);
etc...
into
INSERT INTO table (f1, f2) VALUES ($f1, $f2), ($other, $other), etc...
which will reduce parsing overhead somewhat. This may speed things up a little bit. However, don't go too far overboard grouping the inserts, as the query is subject to the max_allowed_packet setting.
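If you do group rows into larger statements, it is worth checking the packet limit first. A minimal sketch (the 64M value is only an example, not a recommendation):
SHOW VARIABLES LIKE 'max_allowed_packet';
-- raise it for the running server if needed; new connections pick up the new value
-- (add the same setting to my.cnf to make it survive a restart)
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;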
4000 rows in 2 minutes is still about 33 rows per second. That's not too shabby, especially on a huge table where an index has to be updated. You could disable keys on the table prior to doing the insert and then rebuild the key again afterwards with a REPAIR TABLE, but that might take longer, especially with 320 million rows to scan. You'd have to do some benchmarking to see if it's worthwhile.
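If you do benchmark the disable-keys route, the usual MyISAM pattern is the sketch below (your_table is a placeholder for the real table name); the ENABLE KEYS step is where the non-unique indexes get rebuilt, and that is the part that may be slow on 320 million rows:
ALTER TABLE your_table DISABLE KEYS;
-- run the ~4000 INSERT statements here
ALTER TABLE your_table ENABLE KEYS;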
As for SELECTs/JOINs, since you're on MyISAM tables, there's no way to hide the new rows in a transaction until they're all done. Each row will be immediately visible to other sessions as it's entered, unless you lock the table so you get exclusive access to it for the inserts. But then, you've locked everyone else out while the insert's running.

As far as I know, the source command is the fastest way of doing this. Since the table is MyISAM, the whole table is locked during write actions. So yes, all SELECT statements are queued up until all inserts/updates/deletes have finished.

If the data to load can be accessed by the database, you could use the LOAD DATA INFILE command. As described in the manual:
The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed.
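A minimal sketch, assuming the 4000 rows are in a tab-separated file readable by the server (the file path and table name are placeholders):
-- requires the FILE privilege; secure_file_priv may restrict where the file can live
LOAD DATA INFILE '/tmp/new_rows.txt'
INTO TABLE your_table (f1, f2);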
Hope that helps.

@rjk is correct. LOAD DATA INFILE is the fastest way to get data into your table. A few other thoughts.
2 minutes seems long to me for 4k rows. SELECTs block INSERTs in MyISAM and are likely slowing down your inserts. I strongly suggest InnoDB, which doesn't have this issue and also has better crash recovery, etc. If you must use MyISAM, locking the table before running your inserts may help, or you could try INSERT DELAYED, which allows the INSERT statements to return immediately and be processed when the table is free.
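For the table-locking route on MyISAM, the pattern is roughly the sketch below (the table name is a placeholder, and remember that readers are blocked while the lock is held). Note also that INSERT DELAYED was deprecated in MySQL 5.6 and removed in 5.7, so locking plus batched inserts is the more portable option:
LOCK TABLES your_table WRITE;
INSERT INTO your_table (f1, f2) VALUES (1, 2), (3, 4);
-- ... more batched INSERTs ...
UNLOCK TABLES;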

Related

MySQL PDO create and populate 1000 small tables in 3 seconds or less?

Is it possible? From a single process?
DB is on SATA disk.
I am using Ubuntu 14.04. All tables have 20-60 rows and 6 columns each.
I am using transactions.
The current sequence is:
Create table
Start transaction
Insert #1
Insert #2
...
Insert #n
Commit
Right now I am getting about 3-4 tables/second.
Conclusion: When I disabled logging, my performance became similar to phpMyAdmin's. So, as Rick James suggested, I guess there is no way to achieve further improvements without faster storage.
On a spinning drive, you can get about 100 operations per second. CREATE TABLE might be slower since it involves multiple file operations in the OS. So, I would expect 1000 CREATE TABLEs to take more than 10 seconds. That's on Ubuntu; longer on Windows.
It is usually poor schema design to make multiple tables that are identical; instead have a single table with an extra column to distinguish the subsets.
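For illustration only, a single table with a discriminator column instead of 1000 identical small tables could look like the sketch below (the column names are made up, since the real schema isn't shown):
CREATE TABLE all_subsets (
  subset_id SMALLINT UNSIGNED NOT NULL,  -- which of the former small tables the row belongs to
  col1 INT, col2 INT, col3 INT,
  col4 INT, col5 INT, col6 INT,
  INDEX (subset_id)
) ENGINE=InnoDB;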
INSERTing 40K rows--
40K single-row INSERTs with autocommit=ON -- 400 seconds.
1000 multi-row INSERTs of 20-60 rows each, again COMMITted after each statement -- 10 seconds.
A single INSERT with 40K rows (if you don't blow out some other limitation) -- possibly less than 1 second.
Do not use multi-statement queries; it is a potential security problem. Anyway, it won't help much.
For CREATE TABLE you could perform a multi-statement query (PDO supports this), so in a single query you can create several tables. For the inserts,
you could use a bulk insert: prepare a single SQL INSERT query with repeated value groups and execute it as a single query.
The bulk insert is based on:
INSERT INTO your_table (col1, col2, ...)
VALUES (val1_1, val1_2, ...),
       (val2_1, val2_2, ...),
       ...
You can then build a PDO query based on this technique; the execution is a single statement rather than one statement per row. Depending on the number of values, you can insert thousands of rows in one query and get the result in a few seconds.
Use the multiple-row INSERT syntax to reduce communication overhead between the client and the server if you need to insert many rows. This tip is valid for inserts into any table, not just InnoDB tables.

MySQL Transaction in Cron job

I have a PHP DAEMON on my Ubuntu server doing huge data inserts into InnoDB. The very same tables are also being used by people using the platform.
The DAEMON, when not running in TRANSACTION mode, takes about 60-70 seconds for 100,000 inserts. When running in TRANSACTION mode (BEGIN .... COMMIT) it takes 15-20 seconds.
However, will TRANSACTION mode lock the tables and prevent users of the platform from doing inserts while the DAEMON's transaction is being performed?
Locking the tables the users are manipulating for over 20 seconds is, of course, not desirable :)
Well, I'm doing inserts in batches of 500 inside a FOR loop: INSERT INTO (col1, col2) VALUES (a,b) etc. This is fine and runs smoothly, but I'm able to speed up the process significantly if I issue a BEGIN before the loop and a COMMIT after the loop; however, that means the time between the BEGIN/COMMIT is over 60 seconds. And while the system is doing a few hundred thousand inserts, people using the platform can do inserts to the very same table. Will their inserts be processed alongside the system-generated inserts, or will the users have to wait XX seconds before their insert is processed?
Based on your description, you use InnoDB with the default autocommit mode enabled and you insert records one by one in a loop. Autocommit mode means that each insert is encapsulated into its own transaction, which is fine, but very slow, since each record is persisted separately to disk.
If you wrap the loop that inserts the records within BEGIN - COMMIT statements, all inserts are run within a single transaction and are persisted to disk only once, when the COMMIT is issued - this is why you experience the speed gain.
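In SQL terms the difference is just the following (a sketch; the table name and values are placeholders):
-- autocommit: each INSERT is its own transaction and is flushed to disk on its own
INSERT INTO t (col1, col2) VALUES (1, 2);
INSERT INTO t (col1, col2) VALUES (3, 4);
-- explicit transaction: one flush at COMMIT for the whole batch
START TRANSACTION;
INSERT INTO t (col1, col2) VALUES (1, 2);
INSERT INTO t (col1, col2) VALUES (3, 4);
COMMIT;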
Regardless of which way you insert the records, innodb will use locks. However, innodb only locks the record being inserted:
INSERT sets an exclusive lock on the inserted row. This lock is an index-record lock, not a next-key lock (that is, there is no gap lock) and does not prevent other sessions from inserting into the gap before the inserted row.
Prior to inserting the row, a type of gap lock called an insert intention gap lock is set. This lock signals the intent to insert in such a way that multiple transactions inserting into the same index gap need not wait for each other if they are not inserting at the same position within the gap. Suppose that there are index records with values of 4 and 7. Separate transactions that attempt to insert values of 5 and 6 each lock the gap between 4 and 7 with insert intention locks prior to obtaining the exclusive lock on the inserted row, but do not block each other because the rows are nonconflicting.
This means, that having a transaction open for a longer period of time that only inserts records will not interfere with other users inserting records into the same table.
Please note that issuing single INSERT statements in a loop is the least efficient way of inserting a larger amount of data into MySQL.
Either use a bulk insert (build a single INSERT statement in the loop and execute it after the loop, paying attention to the max_allowed_packet setting):
INSERT statements that use VALUES syntax can insert multiple rows. To do this, include multiple lists of column values, each enclosed within parentheses and separated by commas. Example:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
Or use the LOAD DATA INFILE statement.
These two solutions can significantly speed up the data insertion and will not cause table locks either.
Plan A: LOAD DATA. Drawback: This requires writing the data to a file. If it is already in a file, then this is the best approach.
Plan B: "Batched INSERTs" -- Build INSERT INTO t (a,b) VALUES (1,2), (3,4), ... and execute them. Do it in batches of 100-1000. This will be even faster than BEGIN..COMMIT around lots of 1-row INSERTs. Have autocommit=ON. Locking/blocking will be minimal since each 'transaction' will be only 100-1000 row's worth.
Let's see SHOW CREATE TABLE. INDEXes, especially UNIQUE indexes have an impact on the performance. We can advise further.
If this is a "Data Warehouse" application, then we should talk about "Summary Tables". These would lighten the load crated the 'readers' significantly and cut back on the need for indexes on the Fact table and prevent locking/blocking because they would be reading a different table.
Also, UUIDs are terrible for performance.
How big is the table? How much RAM do you have? What is the value of innodb_buffer_pool_size?
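To answer those questions on your own server, the usual checks look like this sketch (replace the schema and table names):
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024) AS size_mb,
       table_rows
  FROM information_schema.tables
 WHERE table_schema = 'your_db' AND table_name = 'your_table';
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';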

MySQL Insert with lock tables write

The task is to upload a price-list of sorts, so quick question before I implement this.
If I want to INSERT, say, 1000 rows at a time for a total of, say, 100,000 rows, the recommendation is:
http://dev.mysql.com/doc/refman/5.5/en/optimizing-myisam-bulk-data-loading.html
"If you do very many successive inserts, you could do a LOCK TABLES followed by an UNLOCK TABLES once in a while (each 1,000 rows or so) to permit other threads to access table. This would still result in a nice performance gain."
Obviously, while I have the WRITE lock on the table, you can still read the table, right?
The reason I ask is that:
http://dev.mysql.com/doc/refman/5.0/en/lock-tables.html
says:
Only the session that holds the lock can access the table. No other session can access it until the lock is released.
"Can access it"... my gosh, if that is the case them our entire system would freeze up... we simply cant have that... Is this in fact the case, or did they mean "...No other session can write to the table until the lock is released."?
Ultimately what I want to be able to do is INSERT 100,000 simple rows of data without impacting the system. I have used:
INSERT INTO a VALUES (1,0.00),(2,0.00), ... ..., (999,0.00)
But this often results in no rows added for some reason.
If you lock a table with LOCK TABLES ... WRITE no other thread can read/write that table.
The best approach is to use multi-row insert statements
INSERT INTO table (f1,f2,f3....) VALUES
(v1,v2,v3...),
(v1,v2,v3...),
...
You will probably need to split your rows into multiple statements with 1-10K rows each (maybe more depending on your max_allowed_packet and other settings).
MyISAM locks the table if it inserts into empty space somewhere in the middle of the table. If you only do inserts (no deletions) on that table, you should be OK with multiple threads inserting and selecting.
If the table has frequent deletions/updates you should consider switching to InnoDB.
As for "no rows added for some reason"- there is almost certainly an error somewhere in your statement/code. The statement will either insert the rows or return an error.

Mysql InnoDB Insertion Speed Too Slow?

I have an InnoDB table in MySQL that needs to handle insertions very quickly (everything else can be as slow as it wants). The table has no relations or indexes, just an id which is auto-incremented and a timestamp.
I ran a script from multiple clients to insert as many records as possible in the allotted time (loop insert) and calculated the number of insertions per second.
I am only getting on average 200 insertions per second and I need around 20000. The performance doesn't change with the number of clients running the script or the machine the script is running on.
Is there any way to speed up the performance of these insertions?
------ edit --------
Thanks for all your help. I couldn't group any of the insertions together because, when we launch, all the insertions will be coming from multiple connections. I ended up switching the engine for that table to MyISAM and the insertions per second immediately shot up to 40,000.
Which primary key did you use?
InnoDB uses a clustered index, so all data is stored in the same order as the primary key index.
If you don't use an auto-increment primary key, each insert causes large disk operations: it pushes the other data aside and inserts the new element.
For a longer reference, check http://dev.mysql.com/doc/refman/5.0/en/innodb-table-and-index.html
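A sketch of what this answer is suggesting: an AUTO_INCREMENT primary key keeps new rows appending at the end of the clustered index instead of forcing page splits in the middle (the column names are placeholders, since the real table definition isn't shown):
CREATE TABLE fast_inserts (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  payload VARCHAR(255),
  PRIMARY KEY (id)   -- monotonically increasing, so inserts always go to the end
) ENGINE=InnoDB;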
First, execute the INSERTs in a TRANSACTION:
START TRANSACTION;
INSERT ...
COMMIT;
Second, batch up multiple rows into a single INSERT statement:
START TRANSACTION;
INSERT INTO table (only,include,fields,that,need,non_default,values) VALUES
(1,1,1,1,1,1,1),
(2,1,1,1,1,1,1),
(3,1,1,1,1,1,1),
...;
COMMIT;
Lastly, you might find LOAD DATA INFILE performs better than INSERT, if the input data is in the proper format.
I suggest trying multiple inserts in one query, for example:
INSERT INTO table
(title, description)
VALUES
('test1', 'description1'),
('test2', 'description2'),
('test3', 'description3'),
('test4', 'description4')
Or try to use stored procedures.
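A minimal sketch of the stored-procedure idea, reusing the columns from the example above (your_table and the row count are placeholders):
DELIMITER //
CREATE PROCEDURE bulk_test_insert(IN n INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  START TRANSACTION;               -- one commit for the whole batch
  WHILE i < n DO
    INSERT INTO your_table (title, description)
    VALUES (CONCAT('test', i), CONCAT('description', i));
    SET i = i + 1;
  END WHILE;
  COMMIT;
END //
DELIMITER ;
CALL bulk_test_insert(10000);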

What is the best mysql table format for high insert load?

I am in the process of adding a new feature to a system. The process will read live data from PLCs and store it in a database.
The data table will have 4 columns: variable_id (SMALLINT), timestamp (TIMESTAMP), value(FLOAT), quality(TINYINT).
The primary key is (variable_id, timestamp).
The system needs to be able to insert 1000-2000 records every second.
The data table will keep the last 24 hours of data, older data is deleted from the table.
The data table also needs to handle 5-10 select statements every second. The select statement is selecting the latest value from the table for a specific variable and displaying it on the web.
Should I use MyISAM or InnoDB table format? Does MyISAM lock the entire table while doing inserts, thus blocking the select statements from the web interface?
Currently all the data tables in the data base are MyISAM tables.
Should I use MyISAM or InnoDB table format?
For any project with frequent concurrent reads and writes, you should use InnoDB.
Does MyISAM lock the entire table while doing inserts, thus blocking the select statements from the web interface?
With the concurrent_insert setting enabled, MyISAM can append inserted rows to the end of the table while another session is still reading from it.
However, if you ever do anything but INSERT, this can fail (i.e. the table will lock).
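A sketch of checking and adjusting that setting (value 1/AUTO is the default; 2/ALWAYS appends concurrently even when the table has free blocks in the middle):
SHOW VARIABLES LIKE 'concurrent_insert';
SET GLOBAL concurrent_insert = 2;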
The system needs to be able to insert 1000-2000 records every second.
It will be better to group these inserts and perform them in batches.
InnoDB is much faster in terms of rows per second than in transactions per second.
The data table will keep the last 24 hours of data, older data is deleted from the table.
Note that InnoDB locks all rows examined, not only those affected.
If your DELETE statement ever uses a full scan, the concurrent INSERTs will fail (since the full scan will make InnoDB place gap locks on all records browsed, including the last one).
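For the 24-hour purge described in the question, an index on the timestamp column lets the DELETE walk a range instead of doing a full scan; a sketch (plc_data is a placeholder table name, and the LIMIT keeps each statement's lock footprint small):
ALTER TABLE plc_data ADD INDEX idx_ts (`timestamp`);
-- delete expired rows in small batches; repeat until no rows are affected
DELETE FROM plc_data
 WHERE `timestamp` < NOW() - INTERVAL 24 HOUR
 LIMIT 1000;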
MyISAM is quicker, but it locks the entire table. InnoDB is transaction-based, so it does row locking, but it is slower.