Explain inexplicable deadlock - MySQL

First of all, I don't see how I could be getting any deadlock at all, since I am using no explicit locking, there's only one table involved, there's a separate process each to insert, select, and update rows, only one row is inserted or updated at a time, and each process only rarely (perhaps once a minute) runs at all.
It's an email queue:
CREATE TABLE `emails_queue` (
`id` varchar(40) NOT NULL,
`email_address` varchar(128) DEFAULT NULL,
`body` text,
`status_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`status` enum('pending','inprocess','sent','discarded','failed') DEFAULT NULL,
KEY `status` (`status`),
KEY `status_time` (`status`,`status_time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The generating process, in response to some user action but roughly every 90 seconds, does an insert to the table, setting the status to "pending".
There's a monitoring process that every minute checks that the number of "pending" and "failed" emails is not excessive. It takes less than a second to run and has never given me any trouble.
Every minute, the sending process grabs all the pending emails. It loops through them and, one email at a time, sets its status to "inprocess", tries to send it, and finally sets its status accordingly to "sent", "discarded" (it has reasons for deciding an email shouldn't go out), or "failed" (rejected by the SMTP system).
The statement for setting the status is unusual.
UPDATE emails_queue SET status=?, status_time=NOW() WHERE id=? AND status = ?
That is, I only update the status if the current status is already what I believe it to be. Before this mechanism, I accidentally kicked off two sending processes and they would each try to send the same email. Now, if that were to happen, one process would successfully move the email from "pending" to "inprocess", but the second one would update zero rows, realize there's a problem, and skip that email.
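Spelled out in SQL, the guard amounts to something like this (a sketch; in the real code the affected-row count comes back from JDBC's executeUpdate(), ROW_COUNT() is just the SQL-level equivalent):

UPDATE emails_queue
   SET status = 'inprocess', status_time = NOW()
 WHERE id = ?                -- the email being claimed
   AND status = 'pending';   -- only if nobody else has claimed it first

-- ROW_COUNT() is 1 if this process won the race, 0 if another process already
-- moved the row out of 'pending'; in the 0 case the email is skipped.
SELECT ROW_COUNT();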
The problem is, about one time in 100, the update fails altogether! I get com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
WTH?
This is the only table and only query that this happens to and it only happens in production (to maximize difficulty in investigating it).
The only two things that seem at all unusual are (1) updating a column that participates in the WHERE clause, and (2) the (unused) automatic updating of the status_time.
I'm looking for any suggestions or diagnostic techniques.

Firstly, deadlocks do not depend on explicit locking. MySQL's LOCK TABLES statement and non-default transaction isolation levels are NOT required for a deadlock. You can still get deadlocks even if you never use an explicit transaction.
Deadlocks can happen on a single table, quite easily. Most commonly it's from a single hot table.
Deadlocks can even happen if all your transactions just do a single row insert.
A deadlock can happen if you have:
More than one connection to the database (obviously)
Any operation that internally involves more than one lock.
What is not obvious is that, most of the time, a single-row insert or update involves more than one lock. The reason is that secondary indexes also need to be locked during inserts and updates. In this case, changing status means the entries in both the status and (status, status_time) secondary indexes must be updated as well, so even a single-row UPDATE takes several locks.
SELECTs won't lock (assuming you're using the default isolation mode, and aren't using FOR UPDATE) so they can't be the cause.
SHOW ENGINE INNODB STATUS is your friend. It will give you a bunch of (admittedly very confusing) information about deadlocks, specifically, the most recent one.
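For example, from any client session with the PROCESS privilege (a sketch; the interesting part of the output is the LATEST DETECTED DEADLOCK section):

SHOW ENGINE INNODB STATUS\G
-- The LATEST DETECTED DEADLOCK section lists the two transactions involved,
-- the statement each was executing, and the locks each held and waited for.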
You can't completely eliminate deadlocks; they will continue to happen in production (even on test systems if you stress them properly).
Aim for a very low number of deadlocks. If 1% of your transactions deadlock, that is possibly too many.
Consider changing the transaction isolation level of your transactions to READ COMMITTED, IF YOU FULLY UNDERSTAND THE IMPLICATIONS.
Ensure that your software handles deadlocks appropriately (one way to do that in pure SQL is sketched below).
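If you want the retry to live in the database itself rather than in application code, one way is a stored procedure with a handler for error 1213 (ER_LOCK_DEADLOCK). This is only a sketch: the procedure name and retry count are invented, and it assumes the UPDATE runs as its own autocommit transaction (otherwise the whole enclosing transaction must be retried, not just the statement).

DELIMITER //
CREATE PROCEDURE claim_email(IN p_id VARCHAR(40))
BEGIN
  DECLARE attempts INT DEFAULT 0;
  DECLARE done BOOLEAN DEFAULT FALSE;
  -- 1213 = ER_LOCK_DEADLOCK: on deadlock, clear the flag and let the loop retry
  DECLARE CONTINUE HANDLER FOR 1213 SET done = FALSE;
  WHILE NOT done AND attempts < 3 DO
    SET attempts = attempts + 1;
    SET done = TRUE;  -- assume success; the handler flips this back on deadlock
    UPDATE emails_queue
       SET status = 'inprocess', status_time = NOW()
     WHERE id = p_id
       AND status = 'pending';
  END WHILE;
END//
DELIMITER ;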

With some database servers there are default settings for locking behaviour. The usual default is to use locks (at least on the systems I used). I'm not sure this is true of MySQL, but I believe it is.
Do you have an index on the emails_queue table? The type of index can change how locking is done. In one case I dealt with, not having a clustered index on the table caused it to use page locking instead of row locking. I had explicitly told it to use row locking and it silently changed it. Page locking can cause deadlocks. Try checking that index.
If those don't help, the solution is the one suggested in the error message: catch the exception for deadlocks and re-run the SQL when it happens.

You have not described the scope of the transactions in your description. If each process that you have described is trying to do everything within a single transaction, then there certainly is the potential for deadlock in this system.
While it may seem like a deadlock should not occur because only a single table is involved, the resources being locked are not tables but rows. Two processes may each be holding a row lock that is required by the other process, if the same transaction is used to manipulate multiple rows.

Related

MySQL InnoDB: Difference Between `FOR UPDATE` and `LOCK IN SHARE MODE`

What is the exact difference between the two locking read clauses:
SELECT ... FOR UPDATE
and
SELECT ... LOCK IN SHARE MODE
And why would you need to use one over the other?
I have been trying to understand the difference between the two. I'll document what I have found in hopes it'll be useful to the next person.
Both LOCK IN SHARE MODE and FOR UPDATE ensure no other transaction can update the rows that are selected. The difference between the two is in how they treat locks while reading data.
LOCK IN SHARE MODE does not prevent another transaction from reading the same row that was locked.
FOR UPDATE prevents other locking reads of the same row (non-locking reads can still read that row; LOCK IN SHARE MODE and FOR UPDATE are locking reads).
This matters in cases like updating counters, where you read the value in one statement and update it in another. Here, using LOCK IN SHARE MODE will allow two transactions to read the same initial value. So if the counter was incremented by 1 by both transactions, the ending count might increase only by 1, since both transactions initially read the same value.
Using FOR UPDATE would have blocked the second transaction from reading the value until the first one is done. This ensures the counter is incremented by 2.
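A minimal two-session sketch of the counter case (the counters table, hits column, and id value are invented for illustration):

-- Session A:
START TRANSACTION;
SELECT hits FROM counters WHERE id = 1 FOR UPDATE;   -- exclusive row lock

-- Session B, meanwhile:
START TRANSACTION;
SELECT hits FROM counters WHERE id = 1 FOR UPDATE;   -- blocks, waiting for A

-- Session A:
UPDATE counters SET hits = hits + 1 WHERE id = 1;
COMMIT;                                               -- B's SELECT now returns A's new value

-- Session B:
UPDATE counters SET hits = hits + 1 WHERE id = 1;
COMMIT;                                               -- net effect: counter increased by 2

-- With LOCK IN SHARE MODE instead, both SELECTs return the same value at once;
-- if each session then writes back the value it read plus one, an increment is
-- lost (or the two UPDATEs deadlock while upgrading their shared locks).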
FOR UPDATE --- You're informing MySQL that the selected rows may be updated in the next steps (before the end of this transaction), so MySQL does not grant any locks (shared or exclusive) on those rows to any other transaction. Another transaction that needs a lock on those rows (whether for a locking read or a write) must wait until the first transaction is finished.
FOR SHARE (LOCK IN SHARE MODE) --- Indicates to MySQL that you're selecting the rows only for reading and will not modify them before the end of the transaction. Any number of transactions can hold a shared (read) lock on the rows.
Note: there is a chance of a deadlock if these clauses (FOR UPDATE, FOR SHARE) are not used properly.
Either way the integrity of your data will be guaranteed, it's just a question of how the database guarantees it. Does it do so by raising runtime errors when transactions conflict with each other (i.e. FOR SHARE), or does it do so by serializing any transactions that would conflict with each other (i.e. FOR UPDATE)?
FOR SHARE (a.k.a. LOCK IN SHARE MODE): Transactions face a higher probability of failure due to deadlock, because they delay blocking until the moment an update statement is received (at which point they either block until all read locks are released, or fail due to deadlock if another write is in progress). However, only one client blocks and eventually succeeds: the other clients will fail with deadlock if they try to update, so only one of them will succeed and the rest will have to retry their transactions.
FOR UPDATE: Transactions won't fail due to deadlock, because they won't be allowed to run concurrently. This may be desirable for example because it makes it easier to reason about multi-threading if all updates are serialized across all clients. However, it limits the concurrency you can achieve because all other transactions block until the first transaction is finished.
Pro-Tip: As an exercise I recommend taking some time to play with a local test database and a couple mysql clients on the command line to prove this behavior for yourself. That is how I eventually understood the difference myself, because it can be very abstract until you see it in action.

How can I avoid deadlocks with my queue table on MariaDB/Galera?

I have a database table that is basically a first-in-first-out queue. Rows are simply inserted into the table by other parts of the system and forgotten about. Every 5 minutes, a job runs to process items from the queue. Each row to be processed has its status field changed from a pending value to a processing value. Subsequent duplicates in the queue are matched up and marked as duplicates of an earlier queued item that is being processed. The queue processor job is the only thing that does anything with the table, apart from the parts of the system which just blindly insert rows.
This is exactly what the processor does with the queue:
START TRANSACTION;
SELECT id
FROM api_queue
WHERE status=:status_processing
-- Application checks this result set is empty, then...
UPDATE api_queue qs
INNER JOIN api_queue qdupes ON qdupes.products_id=qs.products_id AND qdupes.action=qs.action
SET qdupes.status = IF(qs.id=qdupes.id, :status_processing, :status_processing_duplicate)
WHERE qs.id IN (:queue_ids) ;
COMMIT;
-- Each queue item is processed
-- Once processing is complete, we purge the queue
START TRANSACTION;
SELECT COUNT(*) AS total FROM api_queue WHERE status = :status_processing ;
-- Application sanity checks the number of processing items it's about to delete against how many it's processed, and then...
DELETE FROM api_queue WHERE status IN (:status_processing, :status_processing_duplicate) ;
COMMIT;
In a typical 5 minutes, the queue will build up a backlog of about 100 items, though occasionally it can be in the thousands if a lot of changes have occurred in the catalog.
The first transaction is typically pretty fast when it doesn't hit a deadlock (0.1 - 0.2 seconds to complete), but it does seem to hit deadlocks about 10% of the time.
Why does it hit deadlocks so often? Even if a transaction locks all the rows currently in a table, should I expect this to cause contention when new rows are added to the table? If so, why is that?
I've also noticed that sometimes the first transaction above (containing the UPDATE query) doesn't appear to actually apply at all - though I think this may well be an unrelated bug.
My queue table looks like this:
CREATE TABLE IF NOT EXISTS `api_queue` (
`id` int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
`products_id` int(11) NOT NULL,
`action` tinyint(3) NOT NULL,
`triggered_by` tinyint(3) NOT NULL,
`status` tinyint(1) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
My Mantra: "Don't queue it, just do it". I say this because I have seen too many queues implemented in MySQL that flopped for one reason or another. A common reason is that the overhead of inserting/checking/removing the items may be as costly as "just doing the task". So why double the cost? And, apparently, the queuing is causing extra deadlocks.
According to the info you gave, the system should be able to handle 1500-3000 every 5 minutes. That should handle the "100" to "thousands".
Your queuing mechanism seems overly complex since it involves a JOIN and other things that are not simply 1-in, 1-out.
Assuming you reject my comments so far, I will proceed to critique the code...
SELECT ... FOR UPDATE
is possibly required for both SELECTs (see the sketch after this list).
The SELECT next to the DELETE could possibly be merged with the DELETE as a multi-table DELETE. Or it might be possible to pull it, plus the associated code, out of the transaction. (Faster transactions are less likely to deadlock.)
You are checking for errors (deadlock, etc) after the COMMITs, yes? That's when Galera gets the hit.
When using an IN(...), sort the elements. The underlying code is probably locking the rows in the order of the IN elements. This could turn a deadlock into a delay of up to innodb_lock_wait_timeout seconds. (Such a delay is not as 'bad' as a deadlock.)
You repeat the transaction when it gets a deadlock, correct? (That's the simple way to deal with deadlocks.)
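Putting the FOR UPDATE and IN-ordering points together, the first transaction might be rewritten along these lines (a sketch only; it assumes the application sorts :queue_ids ascending before binding them):

START TRANSACTION;

-- Locking read, so two overlapping runs cannot both decide the queue is idle
SELECT id
  FROM api_queue
 WHERE status = :status_processing
   FOR UPDATE;

-- :queue_ids sorted ascending by the application, so concurrent runs lock
-- rows in the same order and wait for each other instead of deadlocking
UPDATE api_queue qs
 INNER JOIN api_queue qdupes
         ON qdupes.products_id = qs.products_id
        AND qdupes.action = qs.action
   SET qdupes.status = IF(qs.id = qdupes.id,
                          :status_processing,
                          :status_processing_duplicate)
 WHERE qs.id IN (:queue_ids);

COMMIT;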
Edit (IN)
If one thread is doing UPDATE ... WHERE id IN (11,22) and another is doing UPDATE ... WHERE id IN (22,11), and each gets one row locked, then trying to get the other row locked is a deadlock -- and one would have to ROLLBACK. If, instead, both said (11,22), then (at worst) one would have to wait (but not be deadlocked). I am assuming, without proof, that the InnoDB code is not clever enough to somehow avoid this IN deadlock -- by sorting the numbers, by atomically locking, or whatever. (And I would argue that clever=slower, hence not worth doing for such a rare occurrence.)

MySQL "LOCK TABLES" timeout?

What's the timeout for mysql LOCK TABLES statement?
Can't find it anywhere.
I tried to set the variable innodb_lock_wait_timeout in my.cnf, but it seems it's related to another kind of (row-level) locking, not to table locking.
It simply has no effect on LOCK TABLES.
I want to set some low timeout value for the case of a deadlock, because if some operation takes LOCK TABLES and something goes wrong, it will hang up the whole site!
Which is stupid, for example, in the case of finishing a purchase on your site.
My work-around is to create a dedicated lock table and just lock a row in that table. This has the advantage of only locking the processes that specifically want to be locked. Other parts of the application can continue to access the tables even if they are at some point touched by the update processes.
Setup
CREATE TABLE `mutex` (
EMPTY ENUM('') NOT NULL,
PRIMARY KEY (EMPTY)
);
Usage
set innodb_lock_wait_timeout = 1;
start transaction;
insert into `mutex` values();
[... do the real work here ... or somewhere else ... even a different machine ...]
delete from `mutex`;
commit;
Why are you using LOCK TABLES?
If you are using MyISAM (which sometimes needs LOCK TABLES), you should convert to InnoDB.
If you are using InnoDB, you should never use LOCK TABLES. Instead, depend on innodb_lock_wait_timeout (default is an unreasonably high 50 seconds). And you should check for errors.
InnoDB Deadlocks are caught and immediately cause an error. Certain non-deadlocks may wait for innodb_lock_wait_timeout.
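For example (a sketch):

-- See the current value (50 seconds by default)
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';

-- Lower it for this session so a stuck row lock fails fast
-- instead of hanging a page for almost a minute
SET SESSION innodb_lock_wait_timeout = 2;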
Edit
Since the transaction looks like
BEGIN;
SELECT ...;
compute some stuff
UPDATE ... (using that stuff);
COMMIT;
You need to add FOR UPDATE on the end of the SELECT.
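That is, something along these lines (a sketch; the inventory table, qty column, and item_id are invented to stand in for "that stuff"):

BEGIN;
-- FOR UPDATE makes this a locking read: a second connection running the same
-- transaction blocks here instead of both computing from the same stale value
SELECT qty INTO @qty FROM inventory WHERE item_id = 123 FOR UPDATE;
-- compute some stuff using @qty ...
UPDATE inventory SET qty = @qty - 1 WHERE item_id = 123;
COMMIT;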
I think you are after the table_lock_wait_timeout variable, which was introduced in MySQL 5.0.10 but subsequently removed in 5.5. Unfortunately, the release notes don't specify an alternative to use, and I'm guessing that the general attitude is to switch over to using InnoDB transactions, as Rick James has stated in his answer.
I think that removing the variable was unhelpful. Others may regard this as a case of the XY problem, where we are trying to fix a symptom (deadlocks) by changing the timeout period of locking tables when really we should resolve the root cause by switching over to transactions instead. I think there may still be cases where table locks are more suitable to the application than transactions, and perhaps a lot easier to comprehend, even if they perform worse.
The nice thing about using LOCK TABLES is that you can state the tables that your queries depend upon before proceeding. With transactions, the locks are grabbed at the last possible moment, and if they can't be fetched and time out, you then need to check for this failure and roll back before trying everything all over again. It's simpler to have a 1-second timeout (minimum) on the LOCK TABLES query and keep retrying to get the lock(s) until you succeed, then proceed with your queries before unlocking the tables. This logic is at no risk of deadlocks.
I believe the developers' attitude is summed up by the following excerpt from the documentation:
...avoid using the LOCK TABLES statement, because it does not offer
any extra protection, but instead reduces concurrency.
The correct answer is the lock_wait_timeout system variable.
From the documentation:
This variable specifies the timeout in seconds for attempts to acquire
metadata locks. The permissible values range from 1 to 31536000 (1
year). The default is 31536000.
This timeout applies to all statements that use metadata locks. These
include DML and DDL operations on tables, views, stored procedures,
and stored functions, as well as LOCK TABLES, FLUSH TABLES WITH READ
LOCK, and HANDLER statements.
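So a short timeout on the LOCK TABLES itself would look something like this (a sketch; purchases is a placeholder table name):

SET SESSION lock_wait_timeout = 5;   -- fail after 5 seconds instead of the default 1 year
LOCK TABLES purchases WRITE;
-- ... do the critical work ...
UNLOCK TABLES;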
I think you meant to say the default timeout value, which is 50 seconds. Per the MySQL documentation:
innodb_lock_wait_timeout Default: 50. The timeout in seconds an
InnoDB transaction may wait for a row lock before giving up. The
default value is 50 seconds.

MySQL is very slow with DELETE query, Apache is weird while query runs

To start, a few details to describe the situation as a whole:
MySQL (5.1.50) database on a very beefy (32 CPU cores, 64GB RAM) FreeBSD 8.1-RELEASE machine which also runs Apache 2.2.
Apache gets an average of about 50 hits per second. The vast majority of these hits are API calls for a sale platform.
The API calls usually take about a half of a second or less to generate a result, but could take up to 30 seconds depending on third parties.
Each of the API calls stores a row in a database. The information stored there is important, but only for about fifteen minutes, after which it must expire.
In the table which stores API call information (schema for this table is below), InnoDB row-level locking is used to synchronize between threads (Apache connections, really) requesting the same information at the same time, which happens often. This means that several threads may be waiting for a lock on a row for up to 30 seconds, as API calls can take that long (but usually don't).
Above all, the most important thing to note is that everything works perfectly under normal circumstances.
That said, this is the very highly used table (fifty or so INSERTs per second, many SELECTs, row-level locking is utilized) I'm running the DELETE query on:
CREATE TABLE `sales` (
`sale_id` int(32) unsigned NOT NULL auto_increment,
`start_time` int(20) unsigned NOT NULL,
`end_time` int(20) unsigned default NULL,
`identifier` char(9) NOT NULL,
`zip_code` char(5) NOT NULL,
`income` mediumint(6) unsigned NOT NULL,
PRIMARY KEY USING BTREE (`sale_id`),
UNIQUE KEY `SALE_DATA` (`identifier`,`zip_code`,`income`),
KEY `SALE_START` USING BTREE (`start_time`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii ROW_FORMAT=FIXED
The DELETE query looks like this, and is run every five minutes on cron (I'd prefer to run it once per minute):
DELETE FROM `sales` WHERE
`start_time` < UNIX_TIMESTAMP(NOW() - INTERVAL 30 MINUTE);
I've used INT for the time field because it is apparent that MySQL has trouble using indexes with DATETIME fields.
So this is the problem: The DELETE query seems to run fine the majority of the time (maybe 7 out of 10 times). Other times, the query finishes quickly, but MySQL seems to get choked up for awhile afterwards. I can't exactly prove it's MySQL that is acting up, but the times the symptoms happen definitely coincides with the times that this query is run. Here are the symptoms while everything is choked up:
Logging into MySQL and using SHOW FULL PROCESSLIST;, there are just a few INSERT INTO sales ... queries running, where normally there are more than a hundred. What's abnormal here is actually the lack of any tasks in the process list, rather than there being too many. It seems MySQL stops taking connections entirely.
Checking Apache server-status, Apache has reached MaxClients. All threads are in "Sending reply" status.
Apache begins using lots of system time CPU. Load averages shoot way up, I've seen 1-minute load averages as high as 100. Normal load average for this machine is around 15. I see that it's using system CPU (as opposed to user CPU) because I use GKrellM to monitor it.
In top, there are many Apache processes using lots of CPU.
The web site and API (served by Apache of course) are unreachable most of the time. Some requests go through, but take around three or four minutes. Other requests reply after a time with a "Can't connect to MySQL server through /tmp/mysql.sock" error - this is the same error as I get when MySQL is over capacity and has too many connections (only it doesn't actually say too many connections).
MySQL accepts a maximum of 1024 connections, mysqltuner.pl reports "[!!] Highest connection usage: 100% (1025/1024)", meaning it's taken on more than it could handle at one point. Generally under normal conditions, there are only a few hundred concurrent MySQL connections at most. mysqltuner.pl reports no other issues, I'd be happy to paste the output if anybody wants.
Eventually, after about a minute or two, things recover on their own without any intervention. CPU usage goes back to normal, Apache and MySQL resume normal operations.
So, what can I do? :) How can I even begin to investigate why this is happening? I need that DELETE query to run for various reasons, why do things go bonkers when it's run (but not all the time)?
Hard one. This is not an answer, just the start of a brainstorm.
I would say, maybe, a re-index problem on delete; in the docs we can find DELETE QUICK followed by OPTIMIZE TABLE to try to avoid the index-merging work.
One other possibility is a chain of deadlocks on the delete with at least one other thread: row locks could pause the delete operation, and the delete operation could pause the next row locks. Then you've got either a detected deadlock, or an undetected one and so a timeout occurring. How do you detect such concurrency-aborted exceptions? Do you re-run your transactions? If your threads are taking a lot of different row locks in the same transactions, chances are that the first deadlock will impact more and more threads (a traffic jam).
Did you try locking the table in the delete transaction? Check the manual for the way of locking tables in a transaction in InnoDB, or for getting a SHARE lock on all rows. Maybe it will take you some time to get the table to yourself, but if your delete is quite fast nobody will notice you've taken the table for only a second.
Even if you haven't tried that, it may be what the delete is effectively doing. Check as well the documentation on implicit locks: your delete query should be using the start_time index, so I'm fairly sure your current delete is not locking all rows (not completely sure -- InnoDB locks all examined rows, not only the rows matching the WHERE condition), but the delete is quite certainly blocking inserts. Some examples of deadlocks with transactions performing deletes are explained there. Good luck! For me it's too late to understand all the lock isolation impacts.
Edit: you could try to replace your DELETE with an UPDATE setting deleted=1, and perform the real delete at low-usage times (if you have any). And change the client queries to check this indexed deleted flag.
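A sketch of that idea against the sales table (the deleted column, its index, and the schedule are additions for illustration, not part of the original schema):

ALTER TABLE sales
  ADD COLUMN deleted TINYINT(1) NOT NULL DEFAULT 0,
  ADD KEY SALE_DELETED (deleted);

-- Every few minutes: cheap flagging pass instead of a DELETE
UPDATE sales
   SET deleted = 1
 WHERE deleted = 0
   AND start_time < UNIX_TIMESTAMP(NOW() - INTERVAL 30 MINUTE);

-- Client queries add "AND deleted = 0" to their WHERE clauses.

-- At a low-traffic time: the real purge
DELETE FROM sales WHERE deleted = 1;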

Mysql with innodb and serializable transaction does not (always) lock rows

I have a transaction with a SELECT and possible INSERT. For concurrency reasons, I added FOR UPDATE to the SELECT. To prevent phantom rows, I'm using the SERIALIZABLE transaction isolation level. This all works fine when there are any rows in the table, but not if the table is empty. When the table is empty, the SELECT FOR UPDATE does not do any (exclusive) locking and a concurrent thread/process can issue the same SELECT FOR UPDATE without being locked.
CREATE TABLE t (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
display_order INT
) ENGINE = InnoDB;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT COALESCE(MAX(display_order), 0) + 1 from t FOR UPDATE;
..
This concept works as expected with SQL Server, but not with MySQL. Any ideas on what I'm doing wrong?
EDIT
Adding an index on display_order does not change the behavior.
There's something fun with this: both transactions are ready to get the real lock. As soon as one of the transactions tries to perform an insert, the lock will be there. If both transactions try it, one will get a deadlock and roll back. If only one of them tries it, it will get a lock wait timeout.
If you detect the lock wait timeout, you can roll back, and this allows the next transaction to perform the insert.
So I think you're likely to get a deadlock exception or a timeout exception quite fast, and this should save the situation. But in terms of a perfectly 'serializable' situation, this is effectively a bad side effect of the empty table. The engine cannot be perfect in all cases; at least no double inserts from the two transactions can occur.
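A two-session sketch of what this describes, using the table from the question:

-- Session A:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT COALESCE(MAX(display_order), 0) + 1 FROM t FOR UPDATE;  -- returns 1, locks only the empty "gap"

-- Session B (table still empty, so this does NOT block):
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT COALESCE(MAX(display_order), 0) + 1 FROM t FOR UPDATE;  -- also returns 1

-- Session A:
INSERT INTO t (display_order) VALUES (1);  -- blocks on B's gap lock

-- Session B:
INSERT INTO t (display_order) VALUES (1);  -- deadlock detected: one session is rolled
                                           -- back, the other's insert then completes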
I saw yesterday an interesting case of true serializability vs engine serializability in the PostgreSQL documentation; check this example, it's funny: http://www.postgresql.org/docs/8.4/static/transaction-iso.html#MVCC-SERIALIZABILITY
Update:
Other interesting resource: Does MySQL/InnoDB implement true serializable isolation?
This is probably not a bug.
The way that the different databases implement specific transaction isolation levels is NOT 100% consistent, and there are a lot of edge-cases to consider which behave differently. InnoDB was meant to emulate Oracle, but even there, I believe there are cases where it works differently.
If your application relies on very subtle locking behaviour in specific transaction isolation modes, it is probably broken:
Even if it "works" right now, it might not if somebody changes the database schema
It is unlikely that engineers maintaining your code will understand how it's using the database if it depends upon subtleties of locking
Did you have a look at this document:
http://dev.mysql.com/doc/refman/5.1/en/innodb-locking-reads.html
If you ask me, MySQL wasn't built to be used in that way...
My recommendation is: if you can afford it, lock the whole table.
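A sketch of that whole-table approach (note that LOCK TABLES is paired with autocommit = 0 here because it otherwise commits any open transaction; @next is just an illustration variable):

SET autocommit = 0;
LOCK TABLES t WRITE;                 -- no other session can read or write t now
SELECT COALESCE(MAX(display_order), 0) + 1 INTO @next FROM t;
INSERT INTO t (display_order) VALUES (@next);
COMMIT;
UNLOCK TABLES;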