The task is to upload a price-list of sorts, so quick question before I implement this.
If I want to INSERT, say, 1,000 rows at a time out of a total of around 100,000, the recommendation is:
http://dev.mysql.com/doc/refman/5.5/en/optimizing-myisam-bulk-data-loading.html
"If you do very many successive inserts, you could do a LOCK TABLES followed by an UNLOCK TABLES once in a while (each 1,000 rows or so) to permit other threads to access table. This would still result in a nice performance gain."
Obviously, while I hold the "WRITE LOCK" on the table, other sessions can still read the table, right?
The reason I ask is that:
http://dev.mysql.com/doc/refman/5.0/en/lock-tables.html
says:
Only the session that holds the lock can access the table. No other session can access it until the lock is released.
"Can access it"... my gosh, if that is the case them our entire system would freeze up... we simply cant have that... Is this in fact the case, or did they mean "...No other session can write to the table until the lock is released."?
Ultimately what I want to be able to do is INSERT 100,000 simple rows of data without impacting the system. I have used:
INSERT INTO a VALUES (1,0.00),(2,0.00), ... ..., (999,0.00)
But this often results in no rows added for some reason.
If you lock a table with LOCK TABLES ... WRITE, no other thread can read or write that table.
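To make that concrete, the pattern from the MyISAM bulk-loading page looks like this (using the table name a from your example); while the WRITE lock is held, SELECTs from every other session queue up behind it:
LOCK TABLES a WRITE;
INSERT INTO a VALUES (1, 0.00), (2, 0.00), (3, 0.00);
-- repeat for the next batch of rows, then release the lock:
UNLOCK TABLES;
So the periodic UNLOCK TABLES in the manual's advice is exactly what lets those queued readers catch up.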
The best approach is to use multi-row insert statements
INSERT INTO table (f1,f2,f3....) VALUES
(v1,v2,v3...),
(v1,v2,v3...),
...
You will probably need to split your rows into multiple statements with 1-10K rows each (maybe more, depending on your max_allowed_packet and other settings).
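If you are unsure how large each batch can safely be, you can check the packet limit first; a quick sketch (the value is in bytes, and your usable batch size depends on how long each row's text ends up being):
SHOW VARIABLES LIKE 'max_allowed_packet';
-- optionally raise it (needs the SUPER privilege; affects connections made after the change):
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;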
MyISAM only takes a full table lock for an INSERT when it has to fill holes (space freed by deleted rows) somewhere in the middle of the data file. If you only do inserts (no deletions) on that table, new rows are appended to the end of the file and concurrent inserts let multiple threads insert and select at the same time, so you should be OK.
If the table has frequent deletions/updates you should consider switching to InnoDB.
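If you do decide to switch, the conversion itself is a single statement (using the table name a from the question as a stand-in); be aware that it copies and rebuilds the whole table, so schedule it accordingly:
ALTER TABLE a ENGINE=InnoDB;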
As for "no rows added for some reason"- there is almost certainly an error somewhere in your statement/code. The statement will either insert the rows or return an error.
Related
I have a PHP DAEMON on my Ubuntu server doing huge data inserts into InnoDB. The very same tables are also being used by people using the platform.
The DAEMON, when not running in TRANSACTION mode, takes about 60-70 seconds for 100,000 inserts. When running in TRANSACTION mode (BEGIN ... COMMIT) it takes 15-20 seconds.
However, will TRANSACTION mode lock the tables and prevent users of the platform from doing inserts while the DAEMON's TRANSACTION is being performed?
Locking the tables the users are manipulating for over 20 seconds is, of course, not desirable :)
Well, I'm doing inserts in batches of 500 at a time inside a FOR loop: INSERT INTO (col1, col2) VALUES (a,b) etc. This is fine and runs smoothly; however, I'm able to speed up the process significantly if I issue a BEGIN before the loop and a COMMIT after the loop, but this means the time between the BEGIN/COMMIT is over 60 seconds. While the system is doing a few hundred thousand inserts, people using the platform can do inserts into the very same table. Will the system-generated inserts get along with the user inserts, or will the users have to wait XX seconds before their insert is processed?
Based on your description, you use InnoDB with the default autocommit mode enabled and you insert records one by one in a loop. Autocommit mode means that each insert is encapsulated in its own transaction, which is fine, but very slow, since each record is persisted to disk separately.
If you wrap your loop that inserts the records within begin - commit statements, all inserts are run within a single transaction and are persisted to the disk only once, when the commit is issued - this is why you experience the speed gain.
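A rough sketch of what that looks like on the SQL side (price_rows, col1 and col2 are placeholder names; the daemon keeps issuing the same single-row inserts, just inside one transaction):
START TRANSACTION;
INSERT INTO price_rows (col1, col2) VALUES (1, 0.00);
INSERT INTO price_rows (col1, col2) VALUES (2, 0.00);
-- ... the rest of the loop's inserts ...
COMMIT;  -- everything is flushed to disk (and becomes visible to others) here, once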
Regardless of which way you insert the records, innodb will use locks. However, innodb only locks the record being inserted:
INSERT sets an exclusive lock on the inserted row. This lock is an index-record lock, not a next-key lock (that is, there is no gap lock) and does not prevent other sessions from inserting into the gap before the inserted row.
Prior to inserting the row, a type of gap lock called an insert intention gap lock is set. This lock signals the intent to insert in such a way that multiple transactions inserting into the same index gap need not wait for each other if they are not inserting at the same position within the gap. Suppose that there are index records with values of 4 and 7. Separate transactions that attempt to insert values of 5 and 6 each lock the gap between 4 and 7 with insert intention locks prior to obtaining the exclusive lock on the inserted row, but do not block each other because the rows are nonconflicting.
This means that having a transaction open for a longer period of time that only inserts records will not interfere with other users inserting records into the same table.
Please note that issuing single insert statements in a loop is the least efficient way of inserting a larger amount of data into MySQL.
Either use a bulk insert (build a single multi-row insert statement in the loop and execute it after the loop, paying attention to the max_allowed_packet setting):
INSERT statements that use VALUES syntax can insert multiple rows. To do this, include multiple lists of column values, each enclosed within parentheses and separated by commas. Example:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
Or use the LOAD DATA INFILE statement.
These two solutions can significantly speed up the data insertion and will not cause a table lock either.
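A minimal LOAD DATA INFILE sketch (the file path, table and column names are made up; the file must be readable by the MySQL server, or use LOAD DATA LOCAL INFILE to read it from the client side):
LOAD DATA INFILE '/tmp/prices.csv'
INTO TABLE price_rows
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(col1, col2);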
Plan A: LOAD DATA. Drawback: This requires writing the data to a file. If it is already in a file, then this is the best approach.
Plan B: "Batched INSERTs" -- Build INSERT INTO t (a,b) VALUES (1,2), (3,4), ... and execute them. Do it in batches of 100-1000. This will be even faster than BEGIN..COMMIT around lots of 1-row INSERTs. Have autocommit=ON. Locking/blocking will be minimal since each 'transaction' will be only 100-1000 row's worth.
Let's see SHOW CREATE TABLE. INDEXes, especially UNIQUE indexes, have an impact on the performance. We can advise further.
If this is a "Data Warehouse" application, then we should talk about "Summary Tables". These would lighten the load crated the 'readers' significantly and cut back on the need for indexes on the Fact table and prevent locking/blocking because they would be reading a different table.
Also, UUIDs are terrible for performance.
How big is the table? How much RAM do you have? What is the value of innodb_buffer_pool_size?
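For reference, those details can be pulled with statements like these (fact_table is a placeholder for the actual table name):
SHOW CREATE TABLE fact_table;
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';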
I'm trying to speed up a fact table load as part of an overall performance project. The table has just about 120 million rows, and about 100k are added each evening. The table is pretty heavily indexed.
Currently I'm using an SSIS Fast Load OLE DB destination, and loading the 100,000 rows takes about 15 minutes. This seemed really high to me for inserting 100k rows, so I altered the package to dump its results into a staging table, then did a T-SQL insert into the fact table from that staging table. The insert now runs in less than 1 minute.
I found it quite odd that a plain old T-SQL insert would be faster than SSIS Fast Load, so I started looking at which boxes were checked on the OLE DB destination. It turns out Table Lock was NOT checked. When I checked this option, the SSIS load also ran in under 1 minute.
My questions are:
What are the implications of leaving Table Lock checked?
Does the T-SQL insert statement issue a table lock by default and that's why it was initially faster?
Well, I think the explanation is straightforward (see a more detailed reference here):
For your first question:
Table Lock – By default this setting is checked and the recommendation is to let it be checked unless the same table is being used by some other process at same time. It specifies a table lock will be acquired on the destination table instead of acquiring multiple row level locks, which could turn into lock escalation problems.
As for the INSERT statement: considering the rather large number of rows that need to be inserted, SQL Server will most likely escalate to a table lock anyway.
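If you want the T-SQL insert from the staging table to take the table lock explicitly rather than relying on escalation, you can request it with a hint; a sketch with placeholder object names:
INSERT INTO dbo.FactSales WITH (TABLOCK) (OrderDateKey, ProductKey, SalesAmount)
SELECT OrderDateKey, ProductKey, SalesAmount
FROM dbo.FactSales_Staging;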
To confirm this you can check what kind of locks are held on the table by using the sys.dm_tran_locks DMV. Here are a few good samples on how to interpret the results (and also good reading on lock escalation): http://aboutsqlserver.com/2012/01/11/locking-in-microsoft-sql-server-part-12-lock-escalation/.
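A minimal query against that DMV, run from a second session while the load is in progress, would be something like:
SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID();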
I would like to be able to lock a table to prevent other users doing INSERTs. I don't want to lock the whole table, because this would prevent other users from updating rows. I do have another reasonably elegant solution; however, if I could lock the table solely to prevent another user inserting rows, that would be better. I.e., any user before attempting an INSERT would attempt to acquire this lock, and wait if it is already in use.
I'm not exactly sure why you're trying to do this, but I believe you can accomplish what you're after by locking on a dummy table.
That is, for all inserts you would not actually lock the table that you want to insert into, but a different table that you use only for locks:
BEGIN WORK;
LOCK TABLE insert_locks IN EXCLUSIVE MODE;
INSERT INTO real_table VALUES
(_id_, 'GREAT! I was waiting for it for so long!');
COMMIT WORK;
See Postgres' doc on LOCK.
Unfortunately you will have to go change any code that is doing inserts without locking. The other option is to use some sort of message queue, which I have done many times with great success.
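The insert_locks table referenced above just has to exist; it never needs to hold any rows, so a minimal definition is enough, for example:
CREATE TABLE insert_locks (id integer);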
In MySQL+InnoDB, suppose I have a single table, and two threads which both do "SELECT ... FOR UPDATE". Suppose that both of the SELECT statements end up selecting multiple rows, e.g. both of them end up selecting rows R42 and R99. Is it possible that this will deadlock?
I'm thinking of this situation: the first thread tries to lock R42 then R99, the second thread tries to lock R99 then R42. If I'm unlucky, the two threads will deadlock.
I read in the MySQL Glossary for "deadlock" that
A deadlock can occur when the transactions lock rows in multiple tables (through statements such as UPDATE or SELECT ... FOR UPDATE), but in the opposite order. ...
To reduce the possibility of deadlocks, ... create indexes on the columns used in SELECT ... FOR UPDATE and UPDATE ... WHERE statements.
This hints that in my situation (single table) I won't deadlock, maybe because MySQL automatically tries to lock rows in the order of the primary key, but I want to be certain, and I can't find the proper place in the documentation that tells me exactly what's going on.
From the MySQL documentation:
InnoDB uses automatic row-level locking. You can get deadlocks even in the case of transactions that just insert or delete a single row. That is because these operations are not really “atomic”; they automatically set locks on the (possibly several) index records of the row inserted or deleted.
http://dev.mysql.com/doc/refman/5.1/en/innodb-deadlocks.html
So generally, deadlocking is not fatal, you just need to try again, or add the appropriate indexes so that fewer rows are scanned and thus fewer rows are locked.
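When a deadlock does occur, the transaction InnoDB picks as the victim gets error 1213 (ER_LOCK_DEADLOCK) and is rolled back, and you can see which statements and locks were involved in the "LATEST DETECTED DEADLOCK" section of:
SHOW ENGINE INNODB STATUS;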
I have a MyISAM table with 2 fields f1 and f2, both Unsigned integers and cannot be null. The table purposely has no primary key but it has an index on f2. The table currently has 320 million rows.
I would like to be able to insert new rows (about 4000 once every week) into this table at a decent speed. However, currently I insert about 4000 rows in 2 minutes. (I am doing this using a text file and the "source" command - The text file contains just INSERT statements into this table). Is there a way in which I can speed up the insert statements? Also, while performing the INSERT statements, will any SELECT/JOIN statements to the same table be affected or slowed down?
You can bulk up the insert statements from
INSERT INTO table (f1, f2) VALUES ($f1, $f2);
INSERT INTO table (f1, f2) VALUES ($other, $other);
etc...
into
INSERT INTO table (f1, f2) VALUES ($f1, $f2), ($other, $other), etc...
which will reduce parsing overhead somewhat. This may speed things up a little bit. However, don't go too far overboard grouping the inserts, as the query is subject to the max_allowed_packet setting.
4000 rows in 2 minutes is still about 33 rows per second. That's not too shabby, especially on a huge table where an index has to be updated. You could disable the table's non-unique keys prior to doing the insert and then rebuild them afterwards, but the rebuild might take longer, especially with 320 million rows to scan. You'd have to do some benchmarking to see if it's worthwhile.
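If you do want to benchmark that approach, the MyISAM statements are below (t stands in for your table name; note this only affects non-unique indexes, and ENABLE KEYS is where the rebuild time is spent):
ALTER TABLE t DISABLE KEYS;
-- run the batched INSERTs here
ALTER TABLE t ENABLE KEYS;  -- rebuilds the disabled indexes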
As for SELECTs/JOINs: since you're on MyISAM tables, there's no way to hide the new rows in a transaction until they're all done. Each row will immediately be visible to other sessions as it's entered, unless you lock the table so you get exclusive access to it for the inserts. But then you've locked everyone else out while the insert is running.
As far as I know, the source command is the fastest way of doing this. Since the table is MyISAM, the whole table is locked during write actions. So yes, all SELECT statements are queued up until all inserts/updates/deletes have finished.
If the data to load can be accessed by the database, you could use the LOAD DATA INFILE command. As described in the manual:
The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed.
Hope that helps.
#rjk is correct. LOAD DATA INFILE is the fastest way to get data into your table. A few other thoughts.
2 minutes seems long to me for 4k rows. SELECTs block INSERTs in MyISAM and are likely slowing down your inserts. I strongly suggest InnoDB, which doesn't have this issue, plus better crash recovery, etc. If you must use MyISAM, locking the table before running your inserts may help, or you could try INSERT DELAYED, which allows the INSERT statements to return immediately and be processed when the table is free.
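For completeness, the DELAYED variant is just a modifier on the normal statement (it is not supported for InnoDB and is deprecated in later MySQL versions, so treat this as a sketch; rows are buffered and written when the table is free, and you won't see an error if a buffered row later fails to insert):
INSERT DELAYED INTO t (f1, f2) VALUES (1, 2);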