I am using Perl and DBI to perform deletes in chunks of 1000 rows on a very large MySQL table, but I am receiving this error: DBD::mysql::db do failed: The total number of locks exceeds the lock table size.
Here is the Perl code with the SQL statement that performs the deletes:
my $q = q{
    DELETE FROM table
    WHERE date_format(date, '%Y-%m') > '2015-01'
    LIMIT 1000
};
my $rc = '';
until ($rc eq '0E0') {      # DBI's do() returns '0E0' when zero rows are affected
    $rc = $dbh->do($q);
    $dbh->commit();
}
In my experience this error has only occurred when trying to delete or insert a very large number of records all at once with a single statement. The viable solutions I have been able to find are:
1. Increase the InnoDB buffer pool size using the innodb_buffer_pool_size global variable.
2. Perform the delete in chunks.
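(For reference, option 1 is a server-side configuration change along these lines; the value in the comment is purely illustrative.)
-- Check the current setting (the default is 128 MB):
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;
-- Raising it on MySQL 5.x means editing my.cnf, e.g.
--     [mysqld]
--     innodb_buffer_pool_size = 1G
-- and restarting the server; the variable only became dynamic in 5.7.5.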
I have not tried solution 1 for two reasons: first, it seems that in my specific situation it would only increase the time before the buffer is eventually filled (though I am not sure about that), and second, we are not certain what effect it may have on the application using the database.
I would like to know:
Why is this error occurring even though I am deleting in chunks?
Is there a quick, high-level solution to this problem with Perl and/or DBI?
Any other info that could lead to a solution.
Why is this error occurring even though I am deleting in chunks?
InnoDB uses row-level locking:
14.5.8 Locks Set by Different SQL Statements in InnoDB
A locking read, an UPDATE, or a DELETE generally set record locks on every index record that is scanned in the processing of the SQL statement. It does not matter whether there are WHERE conditions in the statement that would exclude the row. InnoDB does not remember the exact WHERE condition, but only knows which index ranges were scanned. The locks are normally next-key locks that also block inserts into the “gap” immediately before the record.
[...]
DELETE FROM ... WHERE ... sets an exclusive next-key lock on every record the search encounters.
(emphasis added)
This means that your query will lock every row it scans, even rows that don't match the condition in your WHERE clause.
I don't know the exact execution details of your query, but with a large table it wouldn't be difficult to overrun the default 128 MB of innodb_buffer_pool_size, from which InnoDB allocates its lock information and which is shared by all sessions (other sessions could be locking rows at the same time as your query), especially if your query doesn't use an index and triggers a full table scan.
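A quick way to check that is to EXPLAIN an equivalent SELECT; the table and column names below are the placeholders from your query:
EXPLAIN SELECT COUNT(*)
FROM `table`
WHERE date_format(date, '%Y-%m') > '2015-01';
-- key = NULL and a rows estimate near the full table size means the statement
-- scans (and, for the DELETE, locks) every row; wrapping the column in
-- date_format() prevents any index on date from being used.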
Is there a quick high level solution to this problem?
The MySQL manual describes a simple workaround for exactly this situation:
If you are deleting many rows from a large table, you may exceed the lock table size for an InnoDB table. To avoid this problem, or simply to minimize the time that the table remains locked, the following strategy (which does not use DELETE at all) might be helpful:
Select the rows not to be deleted into an empty table that has the same structure as the original table:
INSERT INTO t_copy SELECT * FROM t WHERE ... ;
Use RENAME TABLE to atomically move the original table out of the way and rename the copy to the original name:
RENAME TABLE t TO t_old, t_copy TO t;
Drop the original table:
DROP TABLE t_old;
No other sessions can access the tables involved while RENAME TABLE executes, so the rename operation is not subject to concurrency problems. See Section 13.1.20, “RENAME TABLE Syntax”.
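Applied to the table in your question (names are the question's placeholders, and the rows to keep are everything up to and including January 2015), that sketch would look roughly like this:
CREATE TABLE t_copy LIKE `table`;
INSERT INTO t_copy SELECT * FROM `table` WHERE date < '2015-02-01';
RENAME TABLE `table` TO t_old, t_copy TO `table`;
DROP TABLE t_old;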
Have INDEX(date), where date is of type DATETIME, DATE, or TIMESTAMP (see the ALTER TABLE sketch after the query). Then perform the delete this way:
DELETE FROM table
WHERE date > '2015-01-31'
ORDER BY date DESC
LIMIT 1000
Stop when the DELETE affects 0 rows (rows_affected == 0); in the Perl loop above, that is when do() returns '0E0'.
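The INDEX(date) mentioned above could be added like this (the index name is arbitrary, and `table` is still the question's placeholder):
ALTER TABLE `table` ADD INDEX idx_date (date);
With the function call removed from the WHERE clause, the DELETE can walk that index instead of scanning and locking the whole table.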
Related
I have a requirement where we need to update rows without holding locks on them while updating.
Here are the details of the requirement: we will be running a batch process on a table every 5 minutes, along the lines of update blogs set is_visible=1 where some conditions. This query has to run on millions of records, so we don't want to block all of those rows for writes during the update.
I totally understand the implications of not having write locks, which is fine for us because the is_visible column will be updated only by this batch process; no other thread will update this column. On the other hand, there will be a lot of updates to other columns of the same table, which we don't want to block.
First of all, if you are using MySQL's default InnoDB storage engine, then there is no way you can update data without row locks, short of setting the transaction isolation level down to READ UNCOMMITTED by running
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
However, I don't think the database behavior would be what you expect, since dirty reads are allowed in this case. READ UNCOMMITTED is rarely useful in practice.
To complement the answer from @Tim, it is indeed a good idea to have a unique index on the column used in the WHERE clause. However, please note as well that there is no absolute guarantee that the optimizer will actually choose an execution plan that uses the index created; it may or may not, depending on the case.
For your case, what you could do is split the long transaction into multiple short transactions. Instead of updating millions of rows in one shot, scanning only thousands of rows each time would be better. The X locks are released when each short transaction commits or rolls back, giving concurrent updates the opportunity to proceed.
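A minimal sketch of that batching, using the table and column from the question (the WHERE condition is only a stand-in for the batch's real conditions, and the batch size is arbitrary):
-- Run this as its own short transaction and repeat until it affects 0 rows:
UPDATE blogs
   SET is_visible = 1
 WHERE is_visible = 0          -- plus the batch's real conditions
 LIMIT 10000;
COMMIT;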
By the way, I assume that your batch has lower priority than the other online processes, thus it could be scheduled out of peak hours to further minimize the impact.
P.S. The IX lock is not on the record itself, but attached to the higher-granularity table object. And even with REPEATABLE READ transaction isolation level, there is no gap lock when the query uses a unique index.
Best practice is to always acquire a specific lock when there is a chance that an update could happen concurrently with other transactions. If your storage engine is MyISAM, then MySQL will lock the entire table during an update, and there isn't much you can do about that. If the storage engine is InnoDB, then it is possible that MySQL will only put an exclusive (IX) lock on the records targeted by the update, but there are caveats to this being the case. The first thing to try in order to achieve this would be a SELECT ... FOR UPDATE:
SELECT * FROM blogs WHERE <some conditions> FOR UPDATE;
In order to ensure that InnoDB only locks the records being updated, there needs to be a unique index on the column which appears in the WHERE clause. In the case of your query, assuming id is the column involved, it would have to be the primary key, or else you would need to create a unique index:
CREATE UNIQUE INDEX idx ON blogs (id);
Even with such an index, InnoDB may still apply gap locks on the records in between index values, to ensure that the REPEATABLE READ contract is enforced.
So, you may add an index on the column(s) involved in your WHERE clause to optimize the update on InnoDB.
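Putting those pieces together, one such transaction might look like the following hedged sketch (id = 42 is just an illustrative key value):
START TRANSACTION;
SELECT * FROM blogs WHERE id = 42 FOR UPDATE;   -- with a unique index on id,
                                                -- only this record is locked
UPDATE blogs SET is_visible = 1 WHERE id = 42;
COMMIT;                                         -- releases the lock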
The task is to upload a price list of sorts, so a quick question before I implement this.
If I want to INSERT, say, 1,000 rows at a time out of a total of around 100,000, the recommendation is:
http://dev.mysql.com/doc/refman/5.5/en/optimizing-myisam-bulk-data-loading.html
"If you do very many successive inserts, you could do a LOCK TABLES followed by an UNLOCK TABLES once in a while (each 1,000 rows or so) to permit other threads to access table. This would still result in a nice performance gain."
Obviously, while I have the WRITE lock on the table, other sessions can still read the table, right?
The reason I ask is that:
http://dev.mysql.com/doc/refman/5.0/en/lock-tables.html
says:
Only the session that holds the lock can access the table. No other session can access it until the lock is released.
"Can access it"... my gosh, if that is the case them our entire system would freeze up... we simply cant have that... Is this in fact the case, or did they mean "...No other session can write to the table until the lock is released."?
Ultimately what I want to be able to do is INSERT 100,000 simple rows of data without impacting the system. I have used:
INSERT INTO a VALUES (1,0.00),(2,0.00), ... ..., (999,0.00)
But this often results in no rows added for some reason.
If you lock a table with LOCK TABLES ... WRITE, no other thread can read or write that table.
The best approach is to use multi-row insert statements:
INSERT INTO table (f1,f2,f3....) VALUES
(v1,v2,v3...),
(v1,v2,v3...),
...
You will probably need to split your rows into multiple statements with 1-10K rows each (maybe more, depending on your max_allowed_packet and other settings).
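For the price list in the question, chunked into separate statements, that might look like the following (tiny three-row chunks just for illustration):
INSERT INTO a VALUES (1,0.00),(2,0.00),(3,0.00);
INSERT INTO a VALUES (4,0.00),(5,0.00),(6,0.00);
-- ...one statement per chunk, continuing up to the full 100,000 rows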
MyISAM locks the table if it inserts into empty space somewhere in the middle of the table. If you only do inserts (no deletions) on that table, you should be OK with multiple threads inserting and selecting.
If the table has frequent deletions/updates you should consider switching to InnoDB.
As for "no rows added for some reason"- there is almost certainly an error somewhere in your statement/code. The statement will either insert the rows or return an error.
I have a MySQL lock question: if I run the query select * from user order by id asc limit 0,1000 and another thread simultaneously deletes a row within that first 1000 rows of the user table, is that allowed?
The MySQL documentation for InnoDB states that InnoDB does locking at the row level and runs queries as nonlocking consistent reads by default.
More directly relevant, however, is Internal Locking Methods, which says MySQL uses table-level locking for MyISAM, MEMORY, and MERGE tables, allowing only one session to update those tables at a time. Also, this:
MySQL grants table write locks as follows:
1. If there are no locks on the table, put a write lock on it.
2. Otherwise, put the lock request in the write lock queue.
MySQL grants table read locks as follows:
1. If there are no write locks on the table, put a read lock on it.
2. Otherwise, put the lock request in the read lock queue.
Okay, let's digest that: in InnoDB, each row has its own lock, which means your query would work through the table until it hits a row that is locked. However, in MyISAM, there is only one lock for the entire table, which is set before the query is executed.
In other words, for InnoDB, if the DELETE operation removed the row before the SELECT operation read the row, then the row would not show up in the results. However, if the SELECT operation read the row first, then it would be returned in the result set, but any future SELECT operations would not show the row. If you want to intentionally lock the entire result set in InnoDB, look into SELECT ... FOR UPDATE.
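For example, a hedged sketch using the query from the question:
START TRANSACTION;
SELECT * FROM user ORDER BY id ASC LIMIT 0,1000 FOR UPDATE;
-- a concurrent DELETE of any of these rows now blocks until this
-- transaction commits or rolls back
COMMIT;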
In MyISAM, the whole table is locked for each statement by default, so it depends on which query began execution first: if the DELETE started first, the row would not be returned by the SELECT, but if the SELECT began execution first, the row would indeed be returned.
There is more about how statements interleave here: http://dev.mysql.com/doc/refman/5.0/en/select.html
And also here: Any way to select without causing locking in MySQL?
In MySQL+InnoDB, suppose I have a single table, and two threads which both do "SELECT ... FOR UPDATE". Suppose that both of the SELECT statements end up selecting multiple rows, e.g. both of them end up selecting rows R42 and R99. Is it possible that this will deadlock?
I'm thinking of this situation: the first thread tries to lock R42 then R99, the second thread tries to lock R99 then R42. If I'm unlucky, the two threads will deadlock.
I read in the MySQL Glossary for "deadlock" that
A deadlock can occur when the transactions lock rows in multiple tables (through statements such as UPDATE or SELECT ... FOR UPDATE), but in the opposite order. ...
To reduce the possibility of deadlocks, ... create indexes on the columns used in SELECT ... FOR UPDATE and UPDATE ... WHERE statements.
This hints that in my situation (single table) I won't deadlock, maybe because MySQL automatically tries to lock rows in the order of the primary key, but I want to be certain, and I can't find the proper place in the documentation that tells me exactly what's going on.
From the MySQL documentation:
InnoDB uses automatic row-level locking. You can get deadlocks even in the case of transactions that just insert or delete a single row. That is because these operations are not really “atomic”; they automatically set locks on the (possibly several) index records of the row inserted or deleted.
http://dev.mysql.com/doc/refman/5.1/en/innodb-deadlocks.html
So generally, deadlocking is not fatal; you just need to try again, or add the appropriate indexes so that fewer rows are scanned and thus fewer rows are locked.
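When a deadlock does occur, InnoDB rolls one of the transactions back with error 1213, and the details of the most recent deadlock can be inspected with:
SHOW ENGINE INNODB STATUS;
-- the LATEST DETECTED DEADLOCK section lists the statements involved and the
-- locks each transaction held and was waiting for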
The typical documentation on locking in InnoDB is way too confusing. I think it would be of great value to have a "dummies' guide to InnoDB locking".
I will start, and I will gather all responses into a wiki:
The column needs to be indexed before row-level locking applies.
EXAMPLE: a DELETE ... WHERE column1 = 10 will lock up the whole table unless column1 is indexed.
Here are my notes from working with MySQL support on a recent, strange locking issue (version 5.1.37):
All rows and index entries traversed to get to the rows being changed will be locked. This is covered at:
http://dev.mysql.com/doc/refman/5.1/en/innodb-locks-set.html
"A locking read, an UPDATE, or a DELETE generally set record locks on every index record that is scanned in the processing of the SQL statement. It does not matter whether there are WHERE conditions in the statement that would exclude the row. InnoDB does not remember the exact WHERE condition, but only knows which index ranges were scanned. ... If you have no indexes suitable for your statement and MySQL must scan the entire table to process the statement, every row of the table becomes locked, which in turn blocks all inserts by other users to the table."
That is a MAJOR headache if true.
It is. A workaround that is often helpful is to do:
UPDATE whichevertable SET whatever = something WHERE primarykey IN (SELECT primarykey FROM whichevertable WHERE constraints ORDER BY primarykey);
The inner select doesn't need to take locks and the update will then have less work to do. The ORDER BY clause ensures that the update is done in primary key order to match InnoDB's physical order, which is the fastest way to do it.
Where large numbers of rows are involved, as in your case, it can be better to store the select result in a temporary table with a flag column added. Then select from the temporary table where the flag is not set to get each batch. Run updates with a limit of say 1000 or 10000 and set the flag for the batch after the update. The limits will keep the amount of locking to a tolerable level while the select work will only have to be done once. Commit after each batch to release the locks.
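A rough sketch of that approach, keeping the placeholder names from the UPDATE above (the todo table, its pk and done columns, and the batch size of 10000 are illustrative additions). A normal table is used rather than a TEMPORARY one because MySQL cannot open a TEMPORARY table twice in the same statement, which the second batch UPDATE below needs to do, and each subquery is wrapped in a derived table because MySQL does not allow LIMIT directly inside an IN () subquery:
-- Collect the keys once; a plain SELECT takes no locks:
CREATE TABLE todo (
  pk   BIGINT NOT NULL PRIMARY KEY,
  done TINYINT NOT NULL DEFAULT 0
);
INSERT INTO todo (pk)
SELECT primarykey FROM whichevertable WHERE constraints;

-- Repeat the following, committing each time, until no rows remain with done = 0:
UPDATE whichevertable
   SET whatever = something
 WHERE primarykey IN (SELECT pk FROM
         (SELECT pk FROM todo WHERE done = 0 ORDER BY pk LIMIT 10000) AS batch);

UPDATE todo
   SET done = 1
 WHERE pk IN (SELECT pk FROM
         (SELECT pk FROM todo WHERE done = 0 ORDER BY pk LIMIT 10000) AS batch);

COMMIT;

DROP TABLE todo;   -- once every row is flagged done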
You can also speed this work up by doing a select sum of an unindexed column before doing each batch of updates. This will load the data pages into the buffer pool without taking locks. Then the locking will last for a shorter timespan because there won't be any disk reads.
This isn't always practical but when it is it can be very helpful. If you can't do it in batches you can at least try the select first to preload the data, if it's small enough to fit into the buffer pool.
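The warm-up select described above might look like this (the column name is a placeholder; any unindexed column of the table will do):
SELECT SUM(some_unindexed_column)
  FROM whichevertable
 WHERE constraints;
-- a plain consistent-read SELECT: it pulls the relevant pages into the
-- buffer pool without taking row locks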
If possible, use the READ COMMITTED transaction isolation level. See:
http://dev.mysql.com/doc/refman/5.1/en/set-transaction.html
Getting that reduced locking requires the use of row-based binary logging (rather than the default statement-based binary logging).
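For example (the binlog setting is a server-level change):
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- with binary logging enabled on 5.1 this also requires row-based logging,
-- e.g. binlog_format = ROW in my.cnf, because READ COMMITTED is not permitted
-- with statement-based binary logging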
Two known issues:
Subqueries can sometimes be less than ideally optimised. In this case it was an undesirable dependent subquery; because of that, the suggestion I made to use a subquery turned out to be unhelpful compared to the alternative.
Deletes and updates do not have the same range of query plans as select statements, so sometimes it's hard to optimise them properly without measuring the results to work out exactly what they are doing.
Both of these are gradually improving. This bug is one example where we've just improved the optimisations available for an update, though the changes are significant and it's still going through QA to be sure it doesn't have any great adverse effects:
http://bugs.mysql.com/bug.php?id=36569