I have a table with millions of rows:
id | info | uid
The uid is null by default. I want to select 10 rows and assign them to a uid, but I want to avoid any potential concurrency issues. So I think the only way to do that is to somehow select 10 rows based on certain criteria, lock those rows and then make my changes before unlocking them.
Is there a way to do row-locking in MySQL and PHP? Or is there some other way I can guarantee that this doesn't happen:
user a queries the table where uid is null
finds row 1
user b queries the table where uid is null
finds row 1
user a processes the row and sets it back to null
user b processes the row and sets it back to null
See my problem?
What you probably need is SELECT ... FOR UPDATE. With this, the retrieved rows stay locked until a COMMIT or a ROLLBACK is issued (this requires a transactional engine such as InnoDB). So you can do something like:
START TRANSACTION;
SELECT * FROM yourTable WHERE uid IS NULL LIMIT 10 FOR UPDATE;
-- UPDATE to whatever you want
COMMIT;
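The same select-then-update flow can be sketched in Python. This is only an illustration under stated assumptions: it runs against SQLite so that it works without a server, the table name `your_table` and the helper `claim_rows` are hypothetical, and SQLite has no `SELECT ... FOR UPDATE`, so `BEGIN IMMEDIATE` stands in for the lock. On MySQL/InnoDB the SELECT would carry `FOR UPDATE` inside `START TRANSACTION` instead.

```python
import sqlite3


def claim_rows(conn, uid, limit=10):
    """Assign up to `limit` unclaimed rows to `uid` inside one transaction."""
    cur = conn.cursor()
    # SQLite stand-in for START TRANSACTION + SELECT ... FOR UPDATE:
    # BEGIN IMMEDIATE takes the write lock before anything is read.
    cur.execute("BEGIN IMMEDIATE")
    try:
        cur.execute(
            "SELECT id FROM your_table WHERE uid IS NULL ORDER BY id LIMIT ?",
            (limit,),
        )
        ids = [row[0] for row in cur.fetchall()]
        cur.executemany(
            "UPDATE your_table SET uid = ? WHERE id = ?",
            [(uid, row_id) for row_id in ids],
        )
        cur.execute("COMMIT")
    except Exception:
        cur.execute("ROLLBACK")
        raise
    return ids


# isolation_level=None lets the code issue BEGIN/COMMIT itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE your_table (id INTEGER PRIMARY KEY, info TEXT, uid INTEGER)"
)
conn.executemany(
    "INSERT INTO your_table (info) VALUES (?)",
    [(f"row {i}",) for i in range(25)],
)

first = claim_rows(conn, uid=1)   # rows claimed by user a
second = claim_rows(conn, uid=2)  # rows claimed by user b
```

Because the read and the write happen under the same lock, the two calls never hand out the same row.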
Related
I am facing a problem in MySQL 8.0.x with SKIP LOCKED.
The table is something like this (oversimplified):
Table users as U {
id int [pk, increment] // auto-increment
transaction_id int
full_name varchar
status int
created_at timestamp
country_code int
important varchar
}
There are plenty of rows there, and multiple threads are writing to the table concurrently.
Initially I am selecting a row for update
SELECT * FROM Table WHERE STATUS=0 AND IMPORTANT="QgBv0kidCvg" AND
country_code=17200 AND TRANSACTION_ID < 0 ORDER BY ID LIMIT 1 FOR
UPDATE SKIP LOCKED
Then, with the update statement, I am reserving the row with a temporary status code, so if another thread runs the query with the same parameters, no rows will be returned to that second thread.
UPDATE Table SET STATUS=3 WHERE ID=4695
The code does a few integrations here and there and at the end, after the reservation is OK, I am finalizing the status value to 1
UPDATE Table SET STATUS=1, TRANSACTION_ID=2312313 WHERE ID=4695
So it is a reservation scenario: a request comes in to check whether there is a free "important", reserves it for a while by changing the status to 3 (a temporary state), and after the peers are updated we mark the row with status = 1, meaning that it is reserved.
My problem here is that the SKIP LOCKED statement returns NO ROWS, meaning that no free rows were found. There is no chance of two concurrent threads querying for the same row. I checked this several times.
What concerns me here is that if I remove the SKIP LOCKED at the end of the select for update statement everything works as expected.
I have read in the PostgreSQL docs that without an ORDER BY clause, SELECT will return records in an unspecified order.
Recently, in an interview, I was asked how to SELECT records in the order in which they were inserted, without a PK, created_at or any other field that can be used for ordering. The senior dev who interviewed me was insistent that without an ORDER BY clause the records will be returned in the order in which they were inserted.
Is this true for PostgreSQL? Is it true for MySQL? Or any other RDBMS?
I can answer for MySQL. I don't know for PostgreSQL.
The default order is not the order of insertion, generally.
In the case of InnoDB, the default order depends on the order of the index read for the query. You can get this information from the EXPLAIN plan.
For MyISAM, it returns rows in the order they are read from the table. This might be the order of insertion, but MyISAM will reuse gaps after you delete records, so newer rows may be stored earlier.
None of this is guaranteed; it's just a side effect of the current implementation. MySQL could change the implementation in the next version, making the default order of result sets different, without violating any documented behavior.
So if you need the results in a specific order, you should use ORDER BY on your queries.
Following BK's answer, and by way of example...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table(id INT NOT NULL) ENGINE = MYISAM;
INSERT INTO my_table VALUES (1),(9),(5),(8),(7),(3),(2),(6);
DELETE FROM my_table WHERE id = 8;
INSERT INTO my_table VALUES (4),(8);
SELECT * FROM my_table;
+----+
| id |
+----+
| 1 |
| 9 |
| 5 |
| 4 | -- is this what
| 7 |
| 3 |
| 2 |
| 6 |
| 8 | -- we expect?
+----+
In the case of PostgreSQL, that is quite wrong.
If there are no deletes or updates, rows will be stored in the table in the order you insert them. And even though a sequential scan will usually return the rows in that order, that is not guaranteed: the synchronized sequential scan feature of PostgreSQL can have a sequential scan "piggy back" on an already executing one, so that rows are read starting somewhere in the middle of the table.
However, this ordering of the rows breaks down completely if you update or delete even a single row: the old version of the row will become obsolete, and (in the case of an UPDATE) the new version can end up somewhere entirely different in the table. The space for the old row version is eventually reclaimed by autovacuum and can be reused for a newly inserted row.
Without an ORDER BY clause, the database is free to return rows in any order. There is no guarantee that rows will be returned in the order they were inserted.
With MySQL (InnoDB), we observe that rows are typically returned in the order of an index used in the execution plan, or in the order of the table's clustered index.
It is not difficult to craft an example...
CREATE TABLE foo
( id INT NOT NULL
, val VARCHAR(10) NOT NULL DEFAULT ''
, UNIQUE KEY (id,val)
) ENGINE=InnoDB;
INSERT INTO foo (id, val) VALUES (7,'seven') ;
INSERT INTO foo (id, val) VALUES (4,'four') ;
SELECT id, val FROM foo ;
MySQL is free to return rows in any order, but in this case, we would typically observe that MySQL accesses the rows through the InnoDB clustered index.
id val
---- -----
4 four
7 seven
It is not at all clear what point the interviewer was trying to make. If the interviewer is trying to sell the idea that, given a requirement to return rows from a table in the order the rows were inserted, a query without an ORDER BY clause is ever the right solution, I'm not buying it.
We can craft examples where rows are returned in the order they were inserted, but that is a byproduct of the implementation, not guaranteed behavior, and we should never rely on that behavior to satisfy a specification.
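To make the point concrete, here is a minimal sketch in Python using SQLite as a stand-in database. It assumes that insertion order is recorded explicitly: the `seq` auto-increment column is a hypothetical addition, not part of the `foo` table above. If insertion order is a requirement, it has to live in a column and be requested with ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `seq` is a hypothetical auto-increment column that records insertion order.
conn.execute(
    "CREATE TABLE foo ("
    " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
    " id INTEGER NOT NULL,"
    " val TEXT NOT NULL)"
)
conn.execute("INSERT INTO foo (id, val) VALUES (7, 'seven')")
conn.execute("INSERT INTO foo (id, val) VALUES (4, 'four')")

# An explicit ORDER BY is the only ordering the database promises.
by_insertion = conn.execute("SELECT id, val FROM foo ORDER BY seq").fetchall()
by_id = conn.execute("SELECT id, val FROM foo ORDER BY id").fetchall()

print(by_insertion)  # [(7, 'seven'), (4, 'four')]
print(by_id)         # [(4, 'four'), (7, 'seven')]
```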
Assume I have the following table:
| id | claimed |
----------------
| 1 | NULL |
| 2 | NULL |
| 3 | NULL |
I can execute this query to update exactly (any) one of the rows without having to execute a select first.
UPDATE mytable SET claimed = [someId] WHERE claimed IS NULL LIMIT 1
However, what happens if two concurrent requests run this query at the same time? Is it possible for the later request to override the value of the first request? I know the chance of this happening is very slight, but still.
Performing statement UPDATE mytable SET claimed = [someId] WHERE claimed IS NULL LIMIT 1 in a transaction t1 locks the respective record and prevents any other transaction t2 from updating the same record until transaction t1 commits (or aborts). Transaction t2 is blocked in the meanwhile; t2 continues once t1 commits (or aborts), or t2 gets aborted automatically once a timeout is reached.
See the MySQL reference on internal locking methods (row-level locking):
MySQL uses row-level locking for InnoDB tables to support simultaneous
write access by multiple sessions, making them suitable for
multi-user, highly concurrent, and OLTP applications.
and the MySQL reference on Locks Set by Different SQL Statements in InnoDB:
UPDATE ... WHERE ... sets an exclusive next-key lock on every record
the search encounters. However, only an index record lock is required
for statements that lock rows using a unique index to search for a
unique row.
and finally the locking behaviour described in the MySQL reference on InnoDB Locking, for record locks:
If a transaction T1 holds an exclusive (X) lock on row r, a request
from some distinct transaction T2 for a lock of either type on r
cannot be granted immediately. Instead, transaction T2 has to wait for
transaction T1 to release its lock on row r.
So two queries will not grab the same record as long as these two queries run in different transactions.
Note that the complete record is locked, such that other update operations by other transactions are blocked, even if they would update other attributes of the respective record.
I tried it out using SequelPro, and you can try it out with any client you want, as follows:
Make sure that mytable contains at least two records where claimed is null.
Open two connection windows / terminals; let's call them c1 and c2.
In c1, execute the following two commands: start transaction; UPDATE mytable SET claimed = 15 WHERE claimed IS NULL LIMIT 1; # No commit so far!
In c2, execute similar commands (note the different value for claimed): start transaction; UPDATE mytable SET claimed = 16 WHERE claimed IS NULL LIMIT 1; # Again, no commit so far
Window c2 should inform you that it is working (i.e. waiting for the query to finish).
Switch to window c1 and execute the command commit;
Switch to window c2, where the (previously started) query should now have finished; execute commit;
When looking into mytable, one record should now have claimed = 15, and another one should have claimed = 16.
I've looked over all of the related questions I could find, but couldn't find one that answers mine.
I have a table like this:
id | name | age | active | ...... | ... |
where "id" is the primary key, and the "..." means there are something like 30 columns.
The "active" column is of type tinyint.
My task:
Update ids 1,4,12,55,111 (those are just an example; there can be 1000 different ids in total) with active = 1 in a single query.
I did:
UPDATE table SET active = 1 WHERE id IN (1,4,12,55,111)
It's inside a transaction, because I'm updating something else in this process.
The engine is InnoDB.
My problem:
Someone told me that running such a query is equivalent to 5 queries at execution time, because the IN will be translated into the given number of ORs, which are run one after another.
So eventually, instead of 1 query I get N, where N is the number of values in the IN.
He suggests creating a temp table, inserting all the new values into it, and then updating by join.
Is he right, both about the equivalence and about the performance?
What do you suggest? I thought INSERT INTO .. ON DUPLICATE KEY UPDATE would help, but I don't have all the data for the row, only its id, and I only want to set active = 1 on it.
Maybe this query is better?
UPDATE table SET
active = CASE
WHEN id='1' THEN '1'
WHEN id='4' THEN '1'
WHEN id='12' THEN '1'
WHEN id='55' THEN '1'
WHEN id='111' THEN '1'
ELSE active END
WHERE campaign_id > 0; -- otherwise it throws an error about updating without a WHERE clause in safe mode, and I don't know if I could toggle safe mode off.
Thanks.
It's the other way around. OR can sometimes be turned into IN. IN is then efficiently executed, especially if there is an index on the column. If you have 1000 entries in the IN, it will do 1000 probes into the table based on id.
If you are running a new enough version of MySQL, I think you can do EXPLAIN EXTENDED UPDATE ...OR...; SHOW WARNINGS; to see this conversion.
The UPDATE CASE... will probably tediously check each and every row.
It would probably be kinder to other users of the system if you broke the UPDATE up into multiple UPDATEs, each covering 100-1000 rows (chunking).
Where did you get the ids in the first place? If it was via a SELECT, then perhaps it would be practical to combine it with the UPDATE to make it one step instead of two.
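The chunking idea can be sketched as follows, in Python against SQLite. The table name `items`, the helpers `chunks` and `activate`, and the batch size are assumptions made for the example. Note that committing after each batch gives up all-or-nothing behaviour; if the update has to stay inside a larger transaction, as in the question, drop the per-batch commit.

```python
import sqlite3


def chunks(ids, size):
    """Yield consecutive slices of `ids` with at most `size` elements."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def activate(conn, ids, batch_size=100):
    """Set active = 1 for the given ids, one parameterized UPDATE per batch."""
    updated = 0
    for batch in chunks(ids, batch_size):
        placeholders = ",".join("?" for _ in batch)
        cur = conn.execute(
            f"UPDATE items SET active = 1 WHERE id IN ({placeholders})",
            batch,
        )
        updated += cur.rowcount
        conn.commit()  # committing per batch keeps each lock window short
    return updated


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, active INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 301)])
conn.commit()

# A tiny batch size only to show that several statements are issued.
updated = activate(conn, [1, 4, 12, 55, 111], batch_size=2)
active_ids = [
    row[0] for row in conn.execute("SELECT id FROM items WHERE active = 1 ORDER BY id")
]
```

Each statement still uses a plain IN list on the primary key, so every batch is a handful of index probes rather than a scan.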
I think the query below is better because it uses the primary key.
UPDATE table SET active = 1 WHERE id<=5
I ran into a problem and can't choose the right solution.
I have a SELECT query that selects records from a table.
These records have a status column, as seen below.
SELECT id, <...>, status FROM table WHERE something
Now, right after this SELECT, I have to UPDATE the status column.
How can I do this while avoiding a race condition?
What I want to achieve is that once somebody (a session) has selected something, it cannot be selected by anybody else until I release it manually (for example, using a status column).
Thoughts?
There is some MySQL documentation that may be useful for your task. I'm not sure whether it fits your needs, but it describes the right way to do a SELECT followed by an UPDATE.
The technique described does not prevent other sessions from reading, but it prevents them from writing to the selected records until the end of the transaction.
It contains an example similar to your problem:
SELECT counter_field FROM child_codes FOR UPDATE;
UPDATE child_codes SET counter_field = counter_field + 1;
It requires that your tables use the InnoDB engine and that your programs use transactions.
If you need the lock only for a short time, i.e. one session selects a row with a lock, updates it, and releases the lock within the same session, then you do not need a status field at all. Just use SELECT ... FOR UPDATE and SELECT ... LOCK IN SHARE MODE: if all sessions use these two inside transactions (SELECT ... FOR UPDATE followed by UPDATE to modify, and SELECT ... with a shared lock just to read), this will meet your requirements.
If you need to hold the lock for a long time, i.e. select and lock in one session and then update and release in another, then you are right to use some storage to keep the lock statuses, and all sessions should use it as follows. For the update scenario: SELECT ... FOR UPDATE and set the status and status owner in one session; then, in another session, SELECT ... FOR UPDATE, check the status and owner, update, and remove the status. For the read scenario: SELECT ... with a shared lock and check the status.
You can do it with some preparations. Add a column sessionId to your table. It has to be NULL-able and it will contain the unique ID of the session that acquires the row. Also add an index on this new column; we'll use the column to search for rows in the table.
ALTER TABLE `tbl`
ADD COLUMN `sessionId` CHAR(32) DEFAULT NULL,
ADD INDEX `sessionId`(`sessionId`)
When a session needs to acquire some rows (based on some criteria) run:
UPDATE `tbl`
SET `sessionId` = 'aaa'
WHERE `sessionId` IS NULL
AND ...
LIMIT bbb
Replace aaa with the current session ID and ... with the conditions you need to select the correct rows. Replace bbb with the number of rows you need to acquire. Add an ORDER BY clause if you need to process the rows in a certain order (if some of them have higher priority than others). You can also add status = ... to the SET clause to change the status of the acquired rows (to pending, for example) to let other instances of the code know those rows are being processed right now.
The query above acquires some rows. Next, run:
SELECT *
FROM `tbl`
WHERE `sessionId` = 'aaa'
This query gets the acquired rows to be processed in the client code.
After each row is processed, you either DELETE the row or UPDATE it and set sessionId to NULL (release the row) and status to reflect its new status.
Also you should release the rows (using the same procedure as above) when the session is closed.
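The acquire / process / release cycle described above can be sketched in Python. This is an illustration under assumptions, not a definitive implementation: it runs on SQLite, the `payload` and `status` columns, the status values and the helpers `acquire` and `release` are hypothetical, and because SQLite does not accept `UPDATE ... LIMIT` by default, a subquery on `id` takes its place (MySQL accepts the `LIMIT` form shown earlier).

```python
import sqlite3
import uuid


def acquire(conn, session_id, limit):
    """Tag up to `limit` free rows with `session_id` and return them."""
    # The status filter plays the role of the "..." conditions in the text.
    conn.execute(
        "UPDATE tbl SET sessionId = ?, status = 'pending' "
        "WHERE id IN (SELECT id FROM tbl "
        "             WHERE sessionId IS NULL AND status = 'new' "
        "             ORDER BY id LIMIT ?)",
        (session_id, limit),
    )
    conn.commit()
    return conn.execute(
        "SELECT id, payload FROM tbl WHERE sessionId = ? ORDER BY id",
        (session_id,),
    ).fetchall()


def release(conn, session_id, row_id, new_status):
    """Release one processed row and record its new status."""
    conn.execute(
        "UPDATE tbl SET sessionId = NULL, status = ? WHERE id = ? AND sessionId = ?",
        (new_status, row_id, session_id),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, payload TEXT,"
    " status TEXT NOT NULL DEFAULT 'new', sessionId TEXT DEFAULT NULL)"
)
conn.execute("CREATE INDEX sessionId ON tbl (sessionId)")
conn.executemany("INSERT INTO tbl (payload) VALUES (?)", [(f"job {i}",) for i in range(5)])
conn.commit()

session_a = uuid.uuid4().hex  # 32 hex characters, fits CHAR(32)
session_b = uuid.uuid4().hex

rows_a = acquire(conn, session_a, 2)
rows_b = acquire(conn, session_b, 2)
for row_id, _payload in rows_a:
    release(conn, session_a, row_id, "done")
```

Checking `sessionId` again in `release` means a session can only release rows that it actually holds.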