Fetch Unique Rows from Database in a Multi-Deployment Environment - mysql

I have a table (codes) in a MySQL relational database which stores a list of unique codes. A sample of the schema is given below.
Integer id
String codes
String user
Boolean is_available
id | codes | user | is_available
1 | ABC | | true
2 | XYZ | | true
Whenever a code is assigned to a user, the user column is updated with the user_id and is_available is set to false.
1 | ABC | user_id | false
There are multiple instances of a service running which fetch the code from database and gives it to the user.
Each request must give a unique code.
GET /code -> returns a unique code
As there are multiple instances, what is the best way to handle the concurrency?
Is update-then-select the correct way to do it? Does it cause a full table scan?
I read about optimistic locking, where we can set retries on a failed attempt, but I don't feel this is a good approach.
Setting the isolation level to serializable has been suggested, but I don't think it should be used in a production environment.
Having a centralised blocking queue that pops a unique code each time a request is made would work, but it would be a single point of failure.
I have read a lot of theory on this, but I am looking for how it should be implemented in an enterprise-scale application handling thousands of concurrent calls.

Trying to answer "which is better" would be an opinion-based answer, because there's hardly ever a single right answer to architecture problems. They're always a tradeoff, so you have to decide which advantages are priorities for your project.
I'll try to address your specific questions, which are not opinion-based.
You can use a locking read to acquire an exclusive lock on a single row.
SELECT * FROM codes WHERE is_available='true' ORDER BY id LIMIT 1 FOR UPDATE;
Then update that row to claim it for a given user, and then return the code as the response.
UPDATE codes SET user = ?, is_available = 'false' WHERE id = ?;
Neither of these statements will do a table-scan. The SELECT will terminate when it gets the first row that satisfies the LIMIT, and the UPDATE applies only to one row which it looks up by its primary key.
Optimistic locking has no automatic retries. That's up to your application code. But ideally, you shouldn't need to do retries due to lock timeouts if your code is prompt about finishing transactions.
There's no need to set the isolation level to serializable globally even if you chose this strategy. You can use SET TRANSACTION immediately before starting a transaction, to change the isolation level once.
A centralized queue of unique codes would be an effective way to scale further, once your concurrent traffic grows so much that the locking SQL reads are a bottleneck. Any architecture needs to be reevaluated periodically, because what worked at scale N may not be the best for scale 10 * N.

Make user_id NULLable. Then is_available can be eliminated and replaced by user_id IS NULL.
Pseudo-code:
BEGIN;
$code <- SELECT code FROM t WHERE user_id IS NULL LIMIT 1 FOR UPDATE;
UPDATE t SET user_id = '$user_id' WHERE code = '$code';
COMMIT;
It's atomic. It assigns some code to the given user_id. It allows a user to have multiple "codes".
If a user can have only one "code", I think the transaction can be simplified.
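A runnable version of that transaction can be sketched in Python. SQLite stands in for MySQL here (it has no FOR UPDATE, so BEGIN IMMEDIATE plays the role of the exclusive lock); the table and column names follow the pseudo-code above:

```python
import sqlite3

def claim_code(conn, user_id):
    """Atomically assign one unassigned code to user_id; return the code or None."""
    cur = conn.cursor()
    # BEGIN IMMEDIATE takes the write lock up front, standing in for FOR UPDATE
    cur.execute("BEGIN IMMEDIATE")
    row = cur.execute(
        "SELECT code FROM t WHERE user_id IS NULL LIMIT 1").fetchone()
    if row is None:
        conn.rollback()        # no codes left
        return None
    cur.execute("UPDATE t SET user_id = ? WHERE code = ?", (user_id, row[0]))
    conn.commit()
    return row[0]

# In-memory stand-in for the MySQL table from the pseudo-code
conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
conn.execute("CREATE TABLE t (code TEXT PRIMARY KEY, user_id TEXT)")
conn.executemany("INSERT INTO t (code) VALUES (?)", [("ABC",), ("XYZ",)])

first = claim_code(conn, "user_1")
second = claim_code(conn, "user_2")
third = claim_code(conn, "user_3")   # pool exhausted -> None
```

Each claim is a single short transaction, so the lock on the selected row is held only for the instant between the SELECT and the COMMIT.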

Related

mysql deadlock while updating

I need to select, manipulate and update a lot of data in less than 3 minutes. We decided to create some kind of locking mechanism so that separate processes can run in parallel, and each process should lock, select and update its own rows.
To make this possible we decided to add the column worker_id to the table.
Table structure:
CREATE TABLE offers
(
id int(10) unsigned PRIMARY KEY NOT NULL AUTO_INCREMENT,
offer_id int(11) NOT NULL,
offer_sid varchar(255) NOT NULL,
offer_name varchar(255),
account_name varchar(255),
worker_id varchar(255)
);
CREATE UNIQUE INDEX offers_offer_id_offer_sid_unique ON offers (offer_id, offer_sid);
CREATE INDEX offers_offer_id_index ON offers (offer_id);
CREATE INDEX offers_offer_sid_index ON offers (offer_sid);
Also, we decided to start with 5 parallel processes, and to prevent different processes from selecting the same row we are using the formula: offer_id % max_amount_of_processes = process_number (process_number starts from 0, so the first is 0 and the last is 4).
Each process is following the steps:
set worker_id with the current process id on the first 1000 rows using the query: update offers set worker_id = :process_id where worker_id is null and offer_id%5 = :process_number order by offer_id asc limit 1000
select those rows: select * from offers where worker_id = :process_id order by offer_id asc limit 1000
manipulate the data, store the last offer_id in a variable and the prepared data in another variable for a further update
run the same query from step 1 to lock the next 1000 rows
run the same query as in step 2 with an additional where clause and offer_id > :last_selected_id to select the next 1000 rows
repeat the same steps in a loop until all rows are locked
remove all locks: update offers set worker_id = null where worker_id = :process_id
run the query to update all collected data
and the same steps for the other 4 processes
The issue here is that I'm getting a deadlock when all 5 processes simultaneously run the query from step 1 to lock rows (set worker_id), even though each process locks its own rows according to the formula. I tried to set the transaction isolation level to READ COMMITTED but still got the same issue.
I'm a novice with locking mechanisms and I need help preventing deadlocks here, or creating a better mechanism.
The expression offer_id%5 = :process_number cannot use an index, so it can only scan all the rows matched by the first condition, worker_id is null.
You can prove this with two windows:
mysql1> begin;
mysql1> set @p=1;
mysql1> update offers set worker_id = @p where worker_id is null and offer_id%5 = @p;
Don't commit the transaction in window 1 yet.
mysql2> set @p=2;
mysql2> update offers set worker_id = @p where worker_id is null and offer_id%5 = @p;
...waits for about 50 seconds, or value of innodb_lock_wait_timeout, then...
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
This demonstrates that each concurrent session locks overlapping sets of rows, not only the rows that match the modulus expression. So the sessions queue up against each other's locks.
This will get worse if you put all the steps into a transaction as @SloanThrasher suggests. Making the work of each worker take longer will make them hold their locks longer, and further delay the other processes waiting on those locks.
"I do not understand how the updated_at field can cause the issue as I'm still updating other fields"
I'm not sure because you haven't posted the InnoDB deadlock diagnostics from SHOW ENGINE INNODB STATUS.
I do notice that your table has a secondary UNIQUE KEY, which will also require locks. There are some cases of deadlocks that occur because of non-atomicity of the lock assignment.
Worker 1                                 Worker 2
UPDATE SET worker_id = 1
(acquires locks on PK)
                                         UPDATE SET worker_id = 2
                                         (waits for PK locks held by worker 1)
(waits for locks on UNIQUE KEY)
Both worker 1 and worker 2 can therefore be waiting on each other, and enter into a deadlock.
This is just a guess. Another possibility is that the ORM is doing a second UPDATE for the updated_at column, and this introduces another opportunity for a race condition. I haven't quite worked that out mentally, but I think it's possible.
Below is a recommendation for a different system that would avoid these problems:
There's another problem, that you're not really balancing the work over your processes to achieve the best completion time. There might not be an equal number of offers in each group when you split them by modulus. And each offer might not take the same amount of time to process anyway. So some of your workers could finish and have nothing to do, while the last worker is still processing its work.
You can solve both problems, the locking and the load-balancing:
Change the table columns in the following way:
ALTER TABLE offers
CHANGE worker_id work_state ENUM('todo', 'in progress', 'done') NOT NULL DEFAULT 'todo',
ADD INDEX (work_state),
ADD COLUMN updated_at DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
ADD INDEX (updated_at);
Create ONE process that reads from the table periodically, and adds the primary key id values of offers in a 'todo' state to a message queue. All the offers, regardless of their offer_id value, get queued in the same way.
SELECT id FROM offers WHERE work_state = 'todo'
/* push each id onto the queue */
Then each of the workers can pull one id at a time from the message queue. The worker does the following steps with each id:
UPDATE offers SET work_state = 'in progress' WHERE id = :id
The worker performs the work for its one offer.
UPDATE offers SET work_state = 'done' WHERE id = :id
These worker queries only reference one offer at a time, and they address the offers by primary key, which will use the PK index and only lock one row at a time.
Once it has finished one offer, then the worker pulls the next offer from the queue.
In this way, the workers will all finish at about the same time, and the work will be balanced over the workers better. Also you can start or stop workers at any time, and you don't care what worker number they are, because your offers don't need to be processed by a worker whose number matches the modulus of the offer_id.
When the workers finish all the offers, the message queue will be empty. Most message queues allow workers to do blocking reads, so while the queue is empty, the worker will just wait for the read to return. When you use a database, the workers have to poll frequently for new work.
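The dispatcher/worker flow above can be sketched in Python. queue.Queue stands in for the message queue and sqlite3 for MySQL; all names are illustrative:

```python
import queue
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE offers (
    id INTEGER PRIMARY KEY,
    work_state TEXT NOT NULL DEFAULT 'todo')""")
conn.executemany("INSERT INTO offers (id) VALUES (?)",
                 [(i,) for i in range(1, 6)])
conn.commit()

q = queue.Queue()  # stands in for the message queue

def dispatch(conn, q):
    """The ONE dispatcher: push every 'todo' offer id onto the queue."""
    for (offer_id,) in conn.execute(
            "SELECT id FROM offers WHERE work_state = 'todo'"):
        q.put(offer_id)

def work(conn, q):
    """Worker loop: claim one offer at a time by primary key (one-row locks)."""
    done = []
    while True:
        try:
            offer_id = q.get_nowait()  # a real worker would do a blocking read
        except queue.Empty:
            return done
        conn.execute("UPDATE offers SET work_state = 'in progress' "
                     "WHERE id = ?", (offer_id,))
        # ... process the one offer here ...
        conn.execute("UPDATE offers SET work_state = 'done' "
                     "WHERE id = ?", (offer_id,))
        conn.commit()
        done.append(offer_id)

dispatch(conn, q)
processed = work(conn, q)
remaining = conn.execute("SELECT COUNT(*) FROM offers "
                         "WHERE work_state != 'done'").fetchone()[0]
```

Because every worker statement touches exactly one row by primary key, two workers can never queue up behind each other's range locks the way the modulus UPDATEs do.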
There's a chance a worker will fail during its work, and never mark an offer 'done'. You need to check periodically for orphaned offers. Assume they are not going to be completed, and mark their state 'todo'.
UPDATE offers SET work_state = 'todo'
WHERE work_state = 'in progress' AND updated_at < NOW() - INTERVAL 5 MINUTE
Choose the interval length so it's certain that any worker would have finished it by that time unless something had gone wrong. You would probably do this "reset" before the dispatcher queries for current offers todo, so the offers that had been forgotten will be re-queued.
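The orphan-reset query can be exercised on its own. A sketch with Python's sqlite3, where datetime('now', '-5 minutes') stands in for MySQL's NOW() - INTERVAL 5 MINUTE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE offers (
    id INTEGER PRIMARY KEY,
    work_state TEXT NOT NULL,
    updated_at TEXT NOT NULL)""")
# One worker died 10 minutes ago mid-offer; another is actively working.
conn.execute("INSERT INTO offers VALUES "
             "(1, 'in progress', datetime('now', '-10 minutes'))")
conn.execute("INSERT INTO offers VALUES "
             "(2, 'in progress', datetime('now'))")

# Requeue anything 'in progress' for longer than the 5-minute grace period.
conn.execute("""
    UPDATE offers SET work_state = 'todo'
    WHERE work_state = 'in progress'
      AND updated_at < datetime('now', '-5 minutes')""")
conn.commit()

states = dict(conn.execute("SELECT id, work_state FROM offers"))
```

Only the stale offer is reset; the one still inside the grace period is left alone for its worker to finish.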
I found the issue. It was because my ORM by default updates timestamp fields while doing an update operation (to simplify the example above I removed them from the table structure), and after I turned that off the deadlock disappeared. But still, I do not understand how the updated_at field can cause the issue as I'm still updating other fields.

Mysql Update one column of multiple rows in one query

I've looked over all of the related questions I've found, but couldn't get one which answers mine.
I've got a table like this:
id | name | age | active | ...... | ... |
where "id" is the primary key, and the ...... means there are something like 30 more columns.
The "active" column is of tinyint type.
My task:
Update ids 1,4,12,55,111 (those are just an example; it can be 1000 different ids in total) with active = 1 in a single query.
I did:
UPDATE table SET active = 1 WHERE id IN (1,4,12,55,111)
It's inside a transaction, because I'm updating something else in this process.
The engine is InnoDB.
My problem:
Someone told me that executing such a query is equivalent to 5 queries, because the IN will translate to the corresponding number of ORs and run them one after another.
Eventually, instead of 1 query I get N, where N is the count of ids in the IN.
He suggests creating a temp table, inserting all the new values into it, and then updating by join.
Is he right, about both the equivalence and the performance?
What do you suggest? I thought INSERT INTO .. ON DUPLICATE KEY UPDATE would help, but I don't have all the data for the row, only its id, and I want to set active = 1 on it.
Maybe this query is better?
UPDATE table SET
active = CASE
WHEN id='1' THEN '1'
WHEN id='4' THEN '1'
WHEN id='12' THEN '1'
WHEN id='55' THEN '1'
WHEN id='111' THEN '1'
ELSE active END
WHERE campaign_id > 0; -- otherwise it throws an error about updating without a WHERE clause in safe mode, and I don't know if I can toggle safe mode off.
Thanks.
It's the other way around. OR can sometimes be turned into IN. IN is then efficiently executed, especially if there is an index on the column. If you have 1000 entries in the IN, it will do 1000 probes into the table based on id.
If you are running a new enough version of MySQL, you can do EXPLAIN EXTENDED UPDATE ...OR...; SHOW WARNINGS; to see this conversion.
The UPDATE CASE... will probably tediously check each and every row.
It would probably be better for other users of the system if you broke the UPDATE up into multiple UPDATEs, each covering 100-1000 rows. More on chunking.
Where did you get the ids in the first place? If it was via a SELECT, then perhaps it would be practical to combine it with the UPDATE to make it one step instead of two.
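A sketch of the chunking idea in Python with sqlite3 (the chunk size of 3 and the table name t are illustrative; a real job would use something like 100-1000 ids per chunk):

```python
import sqlite3

def update_in_chunks(conn, ids, chunk_size=3):
    """Set active = 1 for the given ids, one small batch at a time."""
    for i in range(0, len(ids), chunk_size):
        chunk = ids[i:i + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        conn.execute(
            f"UPDATE t SET active = 1 WHERE id IN ({placeholders})", chunk)
        conn.commit()  # short transactions keep the lock window small

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, active INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(1, 201)])
conn.commit()

target_ids = [1, 4, 12, 55, 111]
update_in_chunks(conn, target_ids)
activated = [r[0] for r in
             conn.execute("SELECT id FROM t WHERE active = 1 ORDER BY id")]
```

Each chunk is one IN-probe update against the primary key, committed immediately, so no single statement holds row locks on a large range.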
I think the below is better because it uses the primary key.
UPDATE table SET active = 1 WHERE id<=5

Optimistic Locks and Interleaving

I read about the optimistic locking scheme, where clients can read values, perform their computation, and when a write needs to happen, updates are validated before being written to the database.
Let's say we employ a version mechanism for optimistic locks; then (in the case of two clients) both will have update statements like:
update tableName Set field = val, version = oldVersion +1 where
version = OldVersion and Id = x;
Now let's consider the following scenario with two clients:
Both clients read the values of field and version.
Both clients compute something at their end and generate a new value of field.
Now both clients send the query request to the database server.
As soon as they reach the database:
One client's update query starts executing.
But in the meantime interleaving happens and the other client's update
starts executing.
Will this query interleaving cause data races in the table?
I mean to say, we can't say that an optimistic lock executes on its own. For example, I understand the case where row-level locking or other locking like table-level locking happens; then it's fine. But then it's like optimistic locking doesn't work on its own; it needs a pessimistic lock as well (row-level / table-level, which totally depends on the underlying storage engine implementation).
What happens when there are no row/table-level locks already, but we want to implement an optimistic locking strategy? With query interleaving, will it cause data races in the table? (I mean, only field is updated and version is not, and then interleaving happens. Does this totally depend on what isolation levels are set for the query?)
I'm a little bit confused by this scenario.
Also, what is the right use case where optimistic locking can be really helpful and increase the overall performance of the application as compared to pessimistic locking?
The worst-case scenario in pseudo-code: two clients update the same record.
Scenario 1 (your scenario: optimistic locking):
Final constraints are checked at the server side. Optimistic locking is used only for presentation purposes.
Client one orders a product of which there is only 1 in stock.
Client two orders the same product of which there is only 1 in stock.
Both clients get this presented on the screen.
Products table:
CREATE TABLE products (
product_id VARCHAR(200),
stock INT,
price DOUBLE(5,2)
) ENGINE=InnoDB;
Presentation code:
-- Presentation:
SELECT * FROM products WHERE product_id="product_a";
-- Presented to client
Order code:
-- Verification of record (executed in the same block of code within
-- an as short time interval as possible):
SELECT stock FROM products WHERE product_id="product_a";
IF(stock>0) THEN
-- Client clicks "order" (one click method=also payment);
START TRANSACTION;
-- Gets a record lock
SELECT * FROM products WHERE product_id="product_a" FOR UPDATE;
UPDATE products SET stock=stock-1 WHERE product_id="product_a";
INSERT INTO orders (customer_id,product_id,price)
VALUES (customer_1, "product_a",price);
COMMIT;
END IF;
The result of this scenario is that both orders can succeed: both clients get stock>0 from the first SELECT, and then both execute the order placement. This is an unwanted situation in almost any scenario, so it would then have to be addressed in code by cancelling one of the orders, taking a few more transactions.
Scenario 2: Alternative to optimistic locking:
Final constraints are checked at the database side. Optimistic locking is used only for presentation purposes. Fewer database queries than in the previous optimistic locking scenario, and less chance of redos.
Client one orders a product of which there is only 1 in stock.
Client two orders the same product of which there is only 1 in stock.
Both clients get this presented on the screen.
Products table:
CREATE TABLE products (
product_id VARCHAR(200),
stock INT,
price DOUBLE(5,2),
CHECK (stock>=0) -- The constraint preventing over-ordering
) ENGINE=InnoDB;
Presentation code:
-- Presentation:
SELECT * FROM products WHERE product_id="product_a";
-- Presented to client
Order code:
-- Client clicks "order" (one click method=also payment);
START TRANSACTION;
-- Gets a record lock
SELECT * FROM products WHERE product_id="product_a" FOR UPDATE;
UPDATE products SET stock=stock-1 WHERE product_id="product_a";
INSERT INTO orders (customer_id,product_id,price)
VALUES (customer_1, "product_a",price);
COMMIT;
So now two customers get presented this product, and click order at the same time. The system executes both orders simultaneously. The result will be: one order is placed, and the other gets an exception since the constraint fails to verify, so that transaction is aborted. This abort (exception) will have to be handled in code but does not take any further queries or transactions.
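Coming back to the version-check UPDATE from the question: both writers read version 0, but only the first UPDATE matches its WHERE clause; the second affects zero rows and must re-read and retry. A sketch with Python's sqlite3 (names follow the question's update statement):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableName "
             "(id INTEGER PRIMARY KEY, field INTEGER, version INTEGER)")
conn.execute("INSERT INTO tableName VALUES (1, 100, 0)")
conn.commit()

def optimistic_write(conn, row_id, old_version, new_value):
    """Return True if this write won, False if another writer got there first."""
    cur = conn.execute(
        "UPDATE tableName SET field = ?, version = ? "
        "WHERE id = ? AND version = ?",
        (new_value, old_version + 1, row_id, old_version))
    conn.commit()
    return cur.rowcount == 1   # 0 rows matched means the version was stale

# Both clients read version 0, then race their updates:
first = optimistic_write(conn, 1, 0, 111)
second = optimistic_write(conn, 1, 0, 222)  # stale version -> loses
```

Even though the statements interleave, each UPDATE is atomic, so the losing client cannot half-write field without bumping version; it simply matches no row.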

SQL handling ACID and concurrency

(Please answer as generally as possible. But I am working in MS SQL Server and MySql, so if there is no general answer, go ahead and answer for one or both of those.)
Consider a reservation system implemented in a SQL database. I want to make sure that among many multiple users, only one user gets a reservation and no other user "thinks" they got it. It's a classic concurrency problem in DB work, but I'm not sure what the best answer is.
SPECIFICS:
Suppose each user has a UserID. We could imagine a few users are currently trying to make a reservation with UserID values of 1004, 1005, 1009, and 1011.
Suppose the resource and reservations are stored in a table, SEATS .
We could imagine at one point the SEATS table contains:
----- SEATS -----------------------------
SeatID UserID ResvTime
1 1017 2014.07.15 04:17:18.000
2 NULL NULL
3 NULL NULL
4 1012 2014.07.15 04:19:35.000
5 1003 2014.07.15 04:20:46.000
-----------------------------------------
Now suppose that "at the same time", users 1004 and 1005 try to grab SeatID 3.
I want to know what SQL will properly make sure that only one of them gets the seat and the other gets a refusal. The simplest version of the code I can think of, in T-SQL, would be:
PROC GRABSEAT @seatid INT, @userid INT, @obtained BIT OUTPUT
BEGIN
DECLARE @avail INT
SET @avail = (SELECT UserID FROM SEATS WHERE (SeatID = @seatid))
IF (@avail IS NULL)
BEGIN
UPDATE SEATS SET UserID = @userid, ResvTime = GETDATE() WHERE (SeatID = @seatid)
SET @obtained = 1
END
ELSE
SET @obtained = 0
END
But the question is how to prevent multiple concurrent users, all executing this PROC, from getting a TRUE return on the same seat (say SeatID = 3).
For example, if both users 1004 and 1005 execute this PROC nearly simultaneously, they could both do the SELECT and get @avail = NULL before either of them tries to set the UserID column. Then both of them would run the UPDATE statement. Assuming nothing even worse could result, one of them would overwrite the other's write, both would think they got the seat, but actually only the one whose UPDATE ran last would have their data stored in the SEATS table. The other one would have their data overwritten. This is referred to as a "lost update" problem. But what is the way in a SQL database to prevent it? I have been assuming that each single SQL statement is executed as a TRANSACTION. A TRANSACTION has the four so-called "ACID" properties. These properties are what I need. So, I think the answer, in a SQL database, is:
BEGIN TRANSACTION
EXECUTE GRABSEAT @seatid = <id1>, @userid = <id2>, @obtained
COMMIT
By doing so, the main property I need (isolation), will guarantee that the interleaved execution I'm worried about will not occur.
But I've seen articles that say it's not that simple at all. I think the big problem various articles point to is that not every TRANSACTION really runs in total atomicity and isolation. So, perhaps the above wrapping in a TRANSACTION will not achieve the desired result. If not, then what is needed?
A transaction is atomic by definition. But when a transaction's changes become visible to other users / connections / transactions depends on the isolation level. The default isolation in SQL Server is READ COMMITTED - see this question's answer for more info and links on how to change it.
For this type of scenario, you probably want SERIALIZABLE. The good news is that you can change the isolation level for a transaction with a SET TRANSACTION ISOLATION LEVEL statement in your stored proc. The bad news is that you have to be 100% sure that this is the only place in your code that ever updates the SEAT table.
Fundamentally, the issue you have is that there is a race condition. Just because you are in a transaction does not mean that two transactions can't both call the stored proc at the same time, then run the SELECT. Now both transactions think it's OK to do the UPDATE. Setting the isolation level to SERIALIZABLE locks the table for the transaction that hits the SELECT first.
Instead of the SELECT statement, why don't you just do the update, with an extra filter so it only applies while UserID is still NULL and can't overwrite a seat that is already taken, and then return whether the query had any effect or not. That way, the transaction is atomic, since it's just one query.
PROC GRABSEAT @seatid INT, @userid INT, @obtained BIT OUTPUT
BEGIN
UPDATE SEATS SET UserID = @userid, ResvTime = GETDATE()
WHERE (SeatID = @seatid) AND UserID IS NULL
SET @obtained = @@ROWCOUNT
END
Due to row locking, two updates can't happen simultaneously, so one will work (returning @@ROWCOUNT = 1) and the other will fail (@@ROWCOUNT = 0).
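The same rowcount logic can be sketched outside T-SQL. Here is a Python/sqlite3 version, where cursor.rowcount plays the role of @@ROWCOUNT (table and column names are adapted from the question):

```python
import sqlite3

def grab_seat(conn, seat_id, user_id):
    """Atomically claim a free seat; return True iff this caller got it."""
    cur = conn.execute(
        "UPDATE seats SET user_id = ?, resv_time = datetime('now') "
        "WHERE seat_id = ? AND user_id IS NULL",
        (user_id, seat_id))
    conn.commit()
    return cur.rowcount == 1   # 1 = obtained, 0 = someone else holds it

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seats "
             "(seat_id INTEGER PRIMARY KEY, user_id INTEGER, resv_time TEXT)")
conn.executemany("INSERT INTO seats (seat_id) VALUES (?)",
                 [(i,) for i in range(1, 6)])
conn.commit()

won = grab_seat(conn, 3, 1004)
lost = grab_seat(conn, 3, 1005)  # seat 3 already taken
```

Because the availability check (UserID IS NULL) and the write happen in one statement, there is no window between "check" and "set" for a second caller to sneak through.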

MySQL way to make a lock in a PHP page

I have the following MySQL table:
myTable:
id int auto_increment
voucher int not null
id_user int null
I've populated the voucher field with values from 1 to 100000, so I've got 100000 records. When a user clicks a button on a PHP page, I need to allocate a record for the user, so I do something like:
update myTable set id_user=XXX where
voucher=(SELECT * FROM (SELECT MIN(voucher) FROM myTable WHERE id_user is null) v);
The problem is that I don't use locks and I should, because if two users click at the same moment I risk assigning the same voucher to different people (2 updates to the same record, so I lose 1 user).
I think there must be a correct way to do this, can you help me please?
Thanks!
If you truly want to serialize your process, you can grab a LOCK TABLES tablename WRITE at the start of your transaction, and UNLOCK TABLES when done.
If you are using InnoDB and transactions, you have to perform the LOCK TABLES after the start of the transaction.
I am not advocating this method, as there is usually a better way of handling it; however, if you need a quick and dirty solution, this will work with a minimal amount of code changes.
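One of those better ways is to make the allocation a single atomic statement, so two concurrent clicks cannot claim the same voucher even without table locks. A sketch with Python's sqlite3 (not the original PHP; the repeated id_user IS NULL guard in the outer WHERE is what keeps a concurrent winner from being overwritten):

```python
import sqlite3

def allocate_voucher(conn, user_id):
    """Give user_id the lowest unassigned voucher; return it, or None if none left."""
    cur = conn.execute(
        "UPDATE myTable SET id_user = ? "
        "WHERE id_user IS NULL "
        "AND voucher = (SELECT MIN(voucher) FROM myTable WHERE id_user IS NULL)",
        (user_id,))
    conn.commit()
    if cur.rowcount == 0:      # lost the race, or vouchers exhausted
        return None
    return conn.execute(
        "SELECT voucher FROM myTable WHERE id_user = ?",
        (user_id,)).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable ("
             "id INTEGER PRIMARY KEY AUTOINCREMENT, "
             "voucher INTEGER NOT NULL, "
             "id_user INTEGER)")
conn.executemany("INSERT INTO myTable (voucher) VALUES (?)",
                 [(v,) for v in range(1, 6)])
conn.commit()

v1 = allocate_voucher(conn, 1001)
v2 = allocate_voucher(conn, 1002)
```

Each caller either claims a distinct voucher or updates zero rows and can retry, which is the same claim-by-rowcount pattern as the seat-reservation answer above.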