I have a table in my database (actually a few related tables) that get can be manipulated manually from various points through our interface but also automatically from two sources on a continuous basis. The periodic updates can contain huge amounts of data and can result in thousands of inserts/updates. In order to improve performance of the inserts/updates I have used "SET autocommit = 0" around the updates from these automated sources. This has resulted in the desired performance improvement, maybe even more than expected. However the problem now is that if the automated sources overlap or if a manual update is performed very often the database locks up and after a while throws an error:
Lock wait timeout exceeded; try restarting transaction
This may be thrown even in a single statement with autocommit on and no transaction but I guess that is reasonable as well if it conflicts with a transaction. I have read various suggestions, unfortunately there is no ideal solution. I guess my options are:
Try to order updates/inserts on the tables so that locks on all threads are requested in the same order and there is no deadlock. Unfortunately this is no possible, updates need to be applied in the order they are received.
Use LOCK TABLES to serialize transactions. This is theoretically possible but a) Apart from the two automated sources the tables are updated from many points in the system, including triggers, schedules, manually from various interfaces. It would be a nightmare to identify and maintain LOCK tables around all these places and no easy way to know that all have been identified, and b) LOCK TABLES has to lock all tables involved and the updates/inserts though not often but sometimes may need to update many tables as a result of the updates and again need to identify and maintain all the tables that might be updated so that they are included in the LOCK TABLES.
Use a semaphore table before each update in order to achieve the serialization of updates as with LOCK TABLES above but without actually having to use LOCK TABLES. This is an improvement but still has problem a) of LOCK TABLES above.
Any other suggestions? Could the improvement benefits of autocommit = 0 (transactions) be achieved some other way that does not involve locks? Could innodb be configured to actually not lock or lock much less on updates/inserts?
Last resort option may be to move to MyISAM tables. Would this actually achieve performance improvements with heavy inserts/update operations?
Thank you
You can achieve the benefits of autocommit = 0 while still not using long transactions.
a) You can commit the transaction every X statements, assuming that you don't need to rollback the entire transaction
b) instead of using autocommit = 0 you can use ALTER TABLE x DISABLE keys / ALTER TABLE x ENABLE keys before/after the import. This is the reason for the performance improvement of the operation - the non-unique indexes are not updated until the transaction finishes, and then are updated in bulk.
Related
What's the timeout for mysql LOCK TABLES statement?
Can't find it anywhere.
I tried to set variable innodb_lock_wait_timeout ini my.cnf but it seems it's related to another (row level) locking not to table locking.
Simply it has no effect for LOCK TABLES.
I want to set some low timeout value for case of deadlock, because if some operation will LOCK tables and something will go wrong, it will hang up the whole site!
Which is stupid for example in case of finishing purchase on your site.
My work-around is to create a dedicated lock table and just lock a row in that table. This has the advantage of only locking the processes that specifically want to be locked. Other parts of the application can continue to access the tables even if they are at some point touched by the update processes.
Setup
CREATE TABLE `mutex` (
EMPTY ENUM('') NOT NULL,
PRIMARY KEY (EMPTY)
);
Usage
set innodb_lock_wait_timeout = 1;
start transaction;
insert into `mutex` values();
[... do the real work here ... or somewhere else ... even a different machine ...]
delete from `mutex`;
commit;
Why are you using LOCK TABLES?
If you are using MyISAM (which sometimes needs LOCK TABLES), you should convert to InnoDB.
If you are using InnoDB, you should never use LOCK TABLES. Instead, depend on innodb_lock_wait_timeout (default is an unreasonably high 50 seconds). And you should check for errors.
InnoDB Deadlocks are caught and immediately cause an error. Certain non-deadlocks may wait for innodb_lock_wait_timeout.
Edit
Since the transaction looks like
BEGIN;
SELECT ...;
compute some stuff
UPDATE ... (using that stuff);
COMMIT;
You need to add FOR UPDATE on the end of the SELECT.
I think you are after the table_lock_timout variable which was introduced in MySQL 5.0.10 but subsequently removed in 5.5. Unfortunately, the release notes don't specify an alternative to use, and I'm guessing that the general attitude is to switch over to using InnoDB transactions as #Rick James has stated in his answer.
I think that removing the variable was unhelpful. Others may regard this as a case of the XY Problem, where we are trying to fix a symptom (deadlocks) by changing the timeout period of locking tables when really we should resolve the root cause by switching over to transactions instead. I think there may still be cases where table locks are more suitable to the application than using transactions and are perhaps a lot easier to comprehend, even if they are worse performing.
The nice thing about using LOCK TABLES, is that you can state the tables that you're queries are dependent upon before proceeding. With transactions, the locks are grabbed at the last possible moment and if they can't be fetched and time-out, you then need to check for this failure and roll back before trying everything all over again. It's simpler to have a 1 second timeout (minimum) on the lock tables query and keep retrying to get the lock(s) until you succeed and then proceeding with your queries before unlocking the tables. This logic is at no risk of deadlocks.
I believe the developer's attitude is summed up by the following excerpt from the documetation:
...avoid using the LOCK TABLES statement, because it does not offer
any extra protection, but instead reduces concurrency.
The correct answer is the lock_wait_timeout system variable.
From the documentation:
This variable specifies the timeout in seconds for attempts to acquire
metadata locks. The permissible values range from 1 to 31536000 (1
year). The default is 31536000.
This timeout applies to all statements that use metadata locks. These
include DML and DDL operations on tables, views, stored procedures,
and stored functions, as well as LOCK TABLES, FLUSH TABLES WITH READ
LOCK, and HANDLER statements.
I think you meant to say the default timeout value; which is 50 Seconds per MySQL Documentation it says
innodb_lock_wait_timeout Default 50 The timeout in seconds an
InnoDB transaction may wait for a row lock before giving up. The
default value is 50 seconds
can someone explain the need to lock tables and/or rows in mysql?
I am assuming that it to prevent multiple writes to the same field, is this the best practise?
First lets look a good document This is not a mysql related documentation, it's about postgreSQl, but it's one of the simplier and clear doc I've read on transaction. You'll understand MySQl transaction better after reading this link http://www.postgresql.org/docs/8.4/static/mvcc.html
When you're running a transaction 4 rules are applied (ACID):
Atomicity : all or nothing (rollback)
Coherence : coherent before, coherent after
Isolation: not impacted by others?
Durability : commit, if it's done, it's really done
In theses rules there's only one which is problematic, it's Isolation. using a transaction does not ensure a perfect isolation level. The previous link will explain you better what are the phantom-reads and suchs isolation problems between concurrent transactions. But to make it simple you should really use Row levels locks to prevent other transaction, running in the same time as you (and maybe comitting before you), to alter the same records. But with locks comes deadlocks...
Then when you'll try using nice transactions with locks you'll need to handle deadlocks and you'll need to handle the fact that transaction can fail and should be re-launched (simple for or while loops).
Edit:------------
Recent versions of InnoDb provides greater levels of isolation than previous ones. I've done some tests and I must admit that even the phantoms reads that should happen are now difficult to reproduce.
MySQL is on level 3 by default of the 4 levels of isolation explained in the PosgtreSQL document (where postgreSQL is in level 2 by default). This is REPEATABLE READS. That means you won't have Dirty reads and you won't have Non-repeatable reads. So someone modifying a row on which you made your select in your transaction will get an implicit LOCK (like if you had perform a select for update).
Warning: If you work with an older version of MySQL like 5.0 you're maybe in level 2, you'll need to perform the row lock using the 'FOR UPDATE' words!
We can always find some nice race conditions, working with aggregate queries it could be safer to be in the 4th level of isolation (by using LOCK IN SHARE MODE at the end of your query) if you do not want people adding rows while you're performing some tasks. I've been able to reproduce one serializable level problem but I won't explain here the complex example, really tricky race conditions.
There is a very nice example of race conditions that even serializable level cannot fix here : http://www.postgresql.org/docs/8.4/static/transaction-iso.html#MVCC-SERIALIZABILITY
When working with transactions the more important things are:
data used in your transaction must always be read INSIDE the transaction (re-read it if you had data from before the BEGIN)
understand why the high isolation level set implicit locks and may block some other queries ( and make them timeout)
try to avoid dead locks (try to lock tables in the same order) but handle them (retry a transaction aborted by MySQL)
try to freeze important source tables with serialization isolation level (LOCK IN SHARE MODE) when your application code assume that no insert or update should modify the dataset he's using (if not you will not have problems but your result will have ignored the concurrent changes)
It is not a best practice. Modern versions of MySQL support transactions with well defined semantics. Use transactions, and forget about locking stuff by hand.
The only new thing you'll have to deal with is that transaction commits may fail because of race conditions, but you'd be doing error checking with locks anyway, and it is easier to retry the logic that led to a transaction failure than to recover from errors in a non-transactional setup.
If you do get race conditions and failed commits, then you may want to fine-tune the isolation configuration for your transactions.
For example if you need to generate invoice numbers which are sequential and have no numbers missing - this is a requirement at least in the country I live in.
If you have a few web servers, then a few users might be buying stuff literally at the same time.
If you do select max(invoice_id)+1 from invoice to get the new invoice number, two web servers might do that at the same time (before the new invoice has been added), and get the same invoice number for the invoices they're trying to create.
If you use a mechanism such as "auto_increment", this is just meant to generate unique values, and makes no guarantees about not missing out numbers (if one transaction tries to insert a row, then does a rollback, the number is "lost"),
So the solution is to (a) lock the table (b) select max(invoice_id)+1 from invoice (c) do the insert (d) commit + unlock the table.
On another note, in MySQL you're best using InnoDB and using row-level locking. Doing a lock table command can implicitly commit the transaciton you're working on.
Take a look here for general introduction to what transactions are and how to use them.
Databases are designed to work in concurrent environments, so locking the tables and/or records helps to keep the transactions consistent.
So a record affected by one transaction should not be altered until this transaction commits or rolls back.
So after reading Performance in PDO / PHP / MySQL: transaction versus direct execution in regards to performance issues I was thinking about I did some research on locking tables in MySQL.
On http://dev.mysql.com/doc/refman/5.0/en/table-locking.html
Table locking enables many sessions to
read from a table at the same time,
but if a session wants to write to a
table, it must first get exclusive
access. During the update, all other
sessions that want to access this
particular table must wait until the
update is done.
This part struck me particularly because most of our queries will be updates rather than inserts. I was wondering if one created a table called foo on which all updates/inserts were carried out and then a view called foo_view (A copy of foo, or perhaps foo and a linkage of several other tables plus foo) on which all selects occurred, would this locking issue still occur?
That is, would SELECT queries on foo_view still have to wait for an update to finish on foo?
Another brief question my colleague asked. Does this affect caching? I.e. if the SELECT is cached will it hit the cache and return results, or will it wait for the lock to finish first?
Your view will experience the same locking as the underlying tables.
From the MySQL Reference page on locking:
MySQL grants table write locks as
follows:
If there are no locks on the table, put a write lock on it.
Otherwise, put the lock request in the write lock queue.
MySQL grants table read locks as
follows:
If there are no write locks on the table, put a read lock on it.
Otherwise, put the lock request in the read lock queue.
It's worth mentioning that this depends on the database engine you are using. MyISAM will follow the steps above and lock the entire table (even if it is split into multiple partitions) where an engine like InnoDB will do row level locking instead.
If you're not reaching the necessary performance benchmarks with MyISAM and you have shown your bottleneck is waiting on table locks via updates, I would suggest changing the storage engine of your table to InnoDB.
In MySQL:
Every one minute I empty the table and fill it with a new data. Now I want that users should not read data during the fill process, before or after is ok.
How do I achieve this?
Is transaction the way?
Assuming you use a transactional engine (Usually Innodb), clear and refill the table in the same transaction.
Be sure that your readers use READ_COMMITTED or higher transaction isolation level (the default is REPEATABLE READ which is higher).
That way readers will continue to be able to read the old contents of the table during the update.
There are a few things to be careful of:
If the table is so big that it exhausts the rollback area - this is possible if you update the whole of (say) a 1M row table. Of course this is tunable but there are limits
If the transaction fails part way through and gets rolled back - rolling back big transactions is VERY inefficient in InnoDB (it is optimised for commits, not rollbacks)
Be careful of deadlocks and lock wait timeouts, which are more likely if you use big transactions.
You can LOCK your table for the duration of your operation:
http://dev.mysql.com/doc/refman/5.1/en/lock-tables.html
A table lock protects only against
inappropriate reads or writes by other
sessions. The session holding the
lock, even a read lock, can perform
table-level operations such as DROP
TABLE. Truncate operations are not
transaction-safe, so an error occurs
if the session attempts one during an
active transaction or while holding a
table lock.
I don't know enough about the internal row-versioning mechanisms of MySql (or indeed, if there is one), but other databases (Oracle, Postgresql, and more recently, Sql Server) have invested a lot of effort into allowing writers to not block readers, in so far as readers have access to the version of the rows that existed immediately before the update/write process started. Once the update is committed, that version of the row becomes the one made availabe to all readers, thereby avoiding a bottleneck that the above behaviour in MySql will introduce.
This policy ensures that table locking
is deadlock free. There are, however,
other things you need to be aware of
about this policy: If you are using a
LOW_PRIORITY WRITE lock for a table,
it means only that MySQL waits for
this particular lock until there are
no other sessions that want a READ
lock. When the session has gotten the
WRITE lock and is waiting to get the
lock for the next table in the lock
table list, all other sessions wait
for the WRITE lock to be released. If
this becomes a serious problem with
your application, you should consider
converting some of your tables to
transaction-safe tables.
You can load your data into a shadow table as slowly as you like, then instantly swap the shadow and actual with RENAME TABLE:
truncate table shadow; # make sure it is clean to start with
insert into shadow .....; # lots of inserts etc against shadow table
rename table active to temp, shadow to active, temp to shadow;
truncate table shadow; # throw away the old active data
The rename statement is atomic. An intermediate name "temp" is used to help swap the names of temp and active.
This should work with all storage engines.
Rename table - MySQL Manual
I need to use a table for a queuing system. The table will be constantly be updated.
For example, multiple users on my website, will add their files for process, and I heard that when updates occur simultaneously from multiple users, the table becomes non responsive or something like that.
so do I need locking tables in this situation ? how do i apply a lock to a mysql table ?
By constantly be updated do you mean it will be appended to 10,000 times per second? Even for middle-range servers, that still presents 10,000 opportunities per second for the table to be shared by other users.
It's only necessary to lock the table when several dependent operations need to occur as a unit. In most cases, it is sufficient to include the series of operations in a database transaction. If the "constant updates" are merely insert sometable values ( ...), then it will be easy to guarantee transaction consistency.
so do I need locking tables in this situation ?
All tables can be locked, there isn't a special type of table. That said, deal with the issue when you run into deadlocks. Isolation levels are a closely related topic as well.
how do i apply a lock to a mysql table ?
There's the LOCK TABLES syntax - this article covers how & why you'd want to lock tables.
When you do several updates at once, you can get into a deadlock situation. Engines such as InnoDB will detect this and fail one of your transactions (You can retry, but only the whole transaction).
You can avoid this by table locking, but it reduces concurrency.
Engines such as MyISAM (which does not support MVCC or transactions anyway) use table locking implicitly anyway. Such table locks exist only for the duration of a query; they're automatically released soonish after the table isn't needed any more (not necessarily as soon as possible, but quite soon)
I recommend you do not use LOCK TABLE unless you feel you really need to; if you get a deadlock, retry the transaction.