Our production server gets stuck in the "Init for update" state whenever we start a query like:
update
<some_big_table>
set
<primary_key> = <some_sequence>.nextval
order by
<some_indexed_field>
While the query is stuck in this state, all other queries get stuck in the "commit" or "writing to binlog" state.
I couldn't find any relevant documentation on this either.
That has to change every row in the table. So it effectively locks the entire table. And it takes a long time.
Hence, it blocks other queries touching the table for any purpose.
As for the "state" -- it is like most states: it does not mean much, and is possibly misleading. (I would expect it to be finished with "init" and to be "performing" the update.)
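If you want to watch this happen, one way (a sketch, assuming InnoDB and a server that exposes the InnoDB transaction table in information_schema) is:

-- Run while the big UPDATE is in flight; shows the long-running transaction:
SELECT trx_id, trx_state, trx_started, trx_query
FROM information_schema.INNODB_TRX;

-- Other sessions pile up behind it; the State column shows where:
SHOW FULL PROCESSLIST;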
I have a piece of (Perl) code, of which I have multiple instances running at the same time, all with a different - unique - value for a variable $dsID. Nearly all of them keep falling over when they try to execute the following (prepared) SQL statement:
DELETE FROM ssRates WHERE ssID IN (SELECT id FROM snapshots WHERE dsID=?)
returning the error:
Lock wait timeout exceeded; try restarting transaction
Which sounds clear enough, except for a few things.
I have autocommit enabled, and am not using (explicit) transactions.
I'm using InnoDB which is supposed to use row-level locking.
The argument passed as $dsID is unique to each instance, so there should be no conflicting locks to get into deadlocks.
Actually, at present, there are no rows that match the inner SELECT clause (I have verified this).
Given these things, I cannot understand why I am getting lock problems -- no locks should be waiting on each other, and there is no scope for deadlocks! (Note, though, that the same script later on does insert into the ssRates table, so some instances of the code may be doing that).
Having googled around a little, this looks like it may be a "gap locking" phenomenon, but I'm not entirely sure why, and more to the point, I'm not sure what the right solution is. I have some possible workarounds -- the obvious one being to split the process up: run the SELECT first, then loop over the results issuing DELETE commands (sketched below). But really, I'd like to understand this, otherwise I'm going to end up in this mess again!
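For what it's worth, the split-up workaround would look roughly like this in plain SQL (a sketch using the original table and column names; the per-row loop would live in the Perl code):

-- Step 1: read the matching ids first, as a plain non-locking read:
SELECT id FROM snapshots WHERE dsID = ?;

-- Step 2: for each id returned, delete by that value alone, so each
-- DELETE only locks the index entries (and gaps) for that one value:
DELETE FROM ssRates WHERE ssID = ?;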
So I have two questions for you friendly experts.
Is this a gap-locking thing?
If not, what is it? If yes, why? I can't see how this condition matches the gap-lock definition.
(NB, server is running MariaDB: 5.5.68-MariaDB; in case this is something fixed in newer versions).
I want to create a model with ID equal to the current greatest ID for that model plus one (like auto-increment). I'm considering doing this with select_for_update to ensure there is no race condition for the current greatest ID, like this:
with transaction.atomic():
    greatest_id = MyModel.objects.select_for_update().order_by('id').last().id
    MyModel.objects.create(id=greatest_id + 1)
But I'm wondering, if two processes try to run this simultaneously, once the second one unblocks, will it see the new greatest ID inserted by the first process, or will it still see the old greatest ID?
For example, say the current greatest ID is 10. Two processes go to create a new model. The first one locks ID 10. Then the second one blocks because 10 is locked. The first one inserts 11 and unlocks 10. Then, the second one unblocks, and now will it see the 11 inserted by the first as the greatest, or will it still see 10 because that's the row it blocked on?
In the select_for_update docs, it says:
Usually, if another transaction has already acquired a lock on one of the selected rows, the query will block until the lock is released.
So for my example, I'm thinking this means that the second process will rerun the query for the greatest ID once it unblocks and get 11. But I'm not certain I'm interpreting that right.
Note: I'm using MySQL for the db.
No, I don't think this will work.
First, let me note that you should absolutely check the documentation for the database you're using, as there are many subtle differences between the databases that are not captured in the Django documentation.
Using the PostgreSQL documentation as a guide, the problem is that, at the default READ COMMITTED isolation level, the blocked query will not be rerun. When the first transaction commits, the blocked transaction will be able to see changes to that row, but it will not be able to see that new rows have been added.
It is possible for an updating command to see an inconsistent snapshot: it can see the effects of concurrent updating commands on the same rows it is trying to update, but it does not see effects of those commands on other rows in the database.
So 10 is what will be returned.
Edit: My understanding in this answer is wrong, just leaving it for documentation's sake in case I ever want to come back to it.
After some investigation, I believe this will work as intended.
The reason is that for this call:
MyModel.objects.select_for_update().order_by('id').last().id
The SQL Django generates and runs against the db is actually:
SELECT ... FROM MyModel ORDER BY id ASC FOR UPDATE;
(the call to last() only happens after the queryset has already been evaluated.)
Meaning, the query scans over all rows each time it runs, so the second time it runs, it will pick up the new row and return it accordingly.
I learned that this phenomenon is called a "phantom read", and is possible because the isolation level of my db is REPEATABLE-READ.
@KevinChristopherHenry "The issue is that the query is not rerun after the lock is released; the rows have already been selected" Are you sure that's how it works? Why does READ COMMITTED imply the select doesn't run after the lock is released? I thought the isolation level defines which snapshot of data a query sees when it runs, not *when* the query is run. It seems to me that whether the select happens before or after the lock is released is orthogonal to the isolation level. And by definition, doesn't a blocked query not select the rows until after it is unblocked?
For what it's worth, I tried to test this by opening two separate connections to my db in a shell and issuing some queries. In the first, I began a transaction and took a lock with 'select * from MyModel order by id for update'. Then, in the second, I did the same, causing the select to block. Back in the first, I inserted a new row and committed the transaction. Then in the second, the query unblocked and returned the new row. This makes me think my hypothesis is correct.
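For reference, that two-shell experiment as raw SQL looks roughly like this (the table name and inserted id are illustrative):

-- Session 1:
START TRANSACTION;
SELECT * FROM MyModel ORDER BY id FOR UPDATE;   -- takes the row locks

-- Session 2, concurrently:
START TRANSACTION;
SELECT * FROM MyModel ORDER BY id FOR UPDATE;   -- blocks on session 1

-- Session 1 again:
INSERT INTO MyModel (id) VALUES (11);
COMMIT;   -- session 2's SELECT now unblocks; in my test it returned
          -- the new row with id 11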
P.S. I finally actually read the "undesirable results" documentation that you read and I see your point - in that example, it looks like it ignores rows that weren't preselected, so that would point to the conclusion that my second query wouldn't pick up the new row. But I tested in a shell and it did. Now I'm not sure what to make of this.
I think I got the principle right; I just want to make sure.
So when autocommit is disabled, every command I issue won't be committed immediately, except those that trigger a commit themselves.
So when I have, for example, a basic script running like:
statement.executeUpdate("SET autocommit = 0;");
//some code
//SQL Queries
//SQL DELETEs
//SQL INSERTs
statement.executeUpdate("COMMIT;");
Then what would happen is: if the script runs through without any problem, it reaches the point where every SQL statement has been executed and is COMMITted at the end. If an error or exception happens instead, the script breaks at that point and never reaches the COMMIT, so every change prior to that point is undone: every deleted row will still be there and every insertion is thrown away.
Is it that simple or did I get something wrong?
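In plain SQL, the flow being described looks something like this (the table and values are made up for illustration):

SET autocommit = 0;                               -- start buffering changes

DELETE FROM orders WHERE id = 42;                 -- not yet permanent
INSERT INTO orders (id, total) VALUES (43, 9.99); -- not yet permanent

COMMIT;   -- only now do both changes become permanent; if the session
          -- dies or rolls back before this point, both are undone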
Assuming you are using a decent database, the data of the current transaction is not typically stored in the table heap itself but in a "redo log".
That is, the data is not even in the table until the commit is executed; the commit, and other later processing, places it in the main table at some point.
In general, if the database engine crashes, the data may still be on disk somewhere, but not in any "official" table area, so it will be discarded when the database engine is restarted. It did not modify the actual data.
I have about a 25 GB table that I have to add a column to.
When I run the script, I can see the temp table in the data directory, but it stays stuck at about 480K. I can see in the processlist that the ALTER is running and there are no issues.
If I kill the script after a long period of activity, then the query remains in the "killed" state in the processlist, and the tmp file will start growing until the query is literally killed (i.e., goes from the "killed" state in the processlist to disappearing from the processlist altogether).
When I run the following (before killing the query):
select * from global_temporary_tables\G
it doesn't show any rows being added either.
Is there anything else I can do?
Firstly, what your "ps" output may show has nothing to do with anything. Don't rely upon what "ps" says: it includes stale data.
If the process has been killed (SIGKILL, not SIGTERM), I guarantee you it is no longer delivering any output to anywhere. If it's been SIGTERMed, it depends what signal handlers you've attached. I'm going to hazard a wild guess that you haven't registered any signal handlers.
Most production DBMSs set up storage in chunks: X amount of space is obtained, which may contain "slack" room that enables rows and/or columns to be added (I do NOT say that the two mechanisms are identical). Just because something didn't grow in a manner you could perceive doesn't mean the changes weren't made. Why not check out the data dictionary, interrogating the current structure of the table?
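For example, to interrogate the current structure via the standard MySQL/MariaDB data dictionary (the schema and table names here are placeholders):

SELECT COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'mydb'
  AND TABLE_NAME = 'my_big_table'
ORDER BY ORDINAL_POSITION;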
Did you COMMIT your changes? In some DBMS, DDL operations are regarded as committable/rollbackable (yecch) events.
If two independent scripts call a database with update requests to the same field, but with different values, would they execute at the same time and one overwrite the other?
As an example, to help ensure clarity, imagine both of these statements being requested to run at the same time, each by a different script, where Status = 2 is called microseconds after Status = 1 by coincidence:
UPDATE My_Table SET Status = 1 WHERE Status = 0;
UPDATE My_Table SET Status = 2 WHERE Status = 0;
What would my results be, and why? If other factors play a role, expand on them as much as you please; this is meant to be a general idea.
Side Note:
Because I know people will still ask: my situation is using MySQL with Google App Engine, but I don't want to limit this question to just me, should it be useful to others. I am using Status as an identifier for which script is doing stuff to the field. If Status is not 0, no other script is allowed to touch it.
This is what locking is for. All major SQL implementations lock DML statements by default so that one query won't overwrite another before the first is complete.
There are different levels of locking. If you've got row locking then your second update will run in parallel with the first, so at some point you'll have 1s and 2s in your table.
Table locking would force the second query to wait for the first query to completely finish and release its table lock.
You can usually turn off locking right in your SQL, but it's only ever done if you need a performance boost and you know you won't encounter race conditions like the one in your example.
Edits based on the new MySQL tag
If you're updating a table that uses the InnoDB engine, then you're working with row locking, and your query could yield a table with both 1s and 2s.
If you're working with a table that uses the MyISAM engine, then you're working with table locking, and your update statements would end up with a table that would either have all 1s or all 2s.
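If you want to see this for yourself, here is a small repro sketch (the table definition is hypothetical; run the two UPDATEs from two separate connections):

CREATE TABLE My_Table (
    id     INT PRIMARY KEY,
    Status INT NOT NULL
) ENGINE = InnoDB;

INSERT INTO My_Table (id, Status) VALUES (1, 0), (2, 0), (3, 0);

-- Connection A:
UPDATE My_Table SET Status = 1 WHERE Status = 0;

-- Connection B, a moment later:
UPDATE My_Table SET Status = 2 WHERE Status = 0;
-- B waits on A's row locks; the final contents depend on when A's locks
-- are released and which rows still match Status = 0, as discussed above.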
from https://dev.mysql.com/doc/refman/5.0/en/lock-tables-restrictions.html (MySQL)
Normally, you do not need to lock tables, because all single UPDATE statements are atomic; no other session can interfere with any other currently executing SQL statement. However, there are a few cases when locking tables may provide an advantage:
from https://msdn.microsoft.com/en-us/library/ms177523.aspx (sql server)
An UPDATE statement always acquires an exclusive (X) lock on the table it modifies, and holds that lock until the transaction completes. With an exclusive lock, no other transactions can modify data.
If you had two separate connections executing the two posted update statements, whichever statement started first would be the one that completed. The other statement would not update the data, as there would no longer be any records with a status of 0.
The short answer is: it depends on which statement commits first. Just because one process started an update statement before another doesn't mean that it will complete before another. It might not get scheduled first, it might be blocked by another process, etc.
Ultimately, it's a race condition: the operation that completes (and commits) last, wins.
Since you have TWO scripts doing the same thing with different values for the UPDATE, they will NOT run at the same time; one of the scripts will run first, even if you think you are calling them at the same time. You need to specify WHEN each script should run, otherwise the program will not know what should be 1 and what should be 2.