mysql insert race condition - mysql

How do you stop race conditions in MySQL? The problem at hand is caused by a simple algorithm:
(1) select a row from the table
(2) if it doesn't exist, insert it
and then either you get a duplicate row or, if you prevent that via unique/primary keys, an error.
Now normally I'd think transactions help here, but because the row doesn't exist, transactions don't actually help (or am I missing something?).
LOCK TABLE sounds like overkill, especially if the table is updated multiple times per second.
The only other solution I can think of is GET_LOCK() for every different id, but isn't there a better way? Are there no scalability issues here as well? And also, doing it for every table sounds a bit unnatural, as it sounds like a very common problem in high-concurrency databases to me.

What you want is LOCK TABLES.
Or, if that seems excessive, how about INSERT IGNORE with a check that the row was actually inserted?
"If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead."
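A minimal sketch of the INSERT IGNORE idea above. The table `items` and its columns are hypothetical names chosen for illustration. ROW_COUNT() reports how many rows the previous statement affected, so it can serve as the "was the row actually inserted" check.

```sql
-- Hypothetical table; the PRIMARY KEY is what turns a repeated insert into a duplicate.
CREATE TABLE IF NOT EXISTS items (
    id   INT NOT NULL PRIMARY KEY,
    name VARCHAR(30) NOT NULL
) ENGINE=InnoDB;

-- With IGNORE, a duplicate-key error is downgraded to a warning and the row is skipped.
INSERT IGNORE INTO items (id, name) VALUES (42, 'example');

-- 1 if this session inserted the row, 0 if the row already existed.
SELECT ROW_COUNT() AS rows_inserted;
```

Keep in mind that IGNORE is not limited to duplicate keys: it also downgrades some other errors to warnings, so SHOW WARNINGS is worth checking while testing.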

It seems to me you should have a unique index on your id column, so a repeated insert would trigger an error instead of being blindly accepted again.
That can be done by defining the id as a primary key or using a unique index by itself.
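Both options can be sketched as follows. The table names `things_a` and `things_b` are made up for the example; the last statement is expected to fail on purpose.

```sql
-- Option 1: declare the id as the primary key when the table is created.
CREATE TABLE things_a (
    id INT NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Option 2: add a unique index to a table that was created without one.
CREATE TABLE things_b (
    id INT NOT NULL
) ENGINE=InnoDB;
ALTER TABLE things_b ADD UNIQUE INDEX uq_things_b_id (id);

-- The first insert succeeds.
INSERT INTO things_a (id) VALUES (1);
-- The repeated insert is rejected with error 1062 (duplicate entry).
INSERT INTO things_a (id) VALUES (1);
```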
I think the first question you need to ask is why do you have many threads doing the exact SAME work? Why would they have to insert the exact same row?
Once that is answered, I think that just ignoring the errors will be the most performant solution, but measure both approaches (GET_LOCK vs. ignoring errors) and see for yourself.
There is no other way that I know of. Why do you want to avoid errors? You still have to code for the case when another type of error occurs.
As staticsan says, transactions do help, but as they are usually implied, if two inserts are run by different threads, each will be inside its own implied transaction and see a consistent view of the database.

Locking the entire table is indeed overkill. To get the effect that you want, you need something that the literature calls "predicate locks". No one has ever seen those except printed on the paper that academic studies are published on. The next best thing is locks on the "access paths" to the data (in some DBMSs: "page locks").
Some non-SQL systems allow you to do both (1) and (2) in one single statement, more or less meaning the potential race conditions arising from your OS suspending your execution thread right between (1) and (2), are entirely eliminated.
Nevertheless, in the absence of predicate locks such systems will still need to resort to some kind of locking scheme, and the finer the "granularity" (/"scope") of the locks it takes, the better for concurrency.
(And to conclude : some DBMS's - especially the ones you don't have to pay for - do indeed offer no finer lock granularity than "the entire table".)

On a technical level, a transaction will help here because other threads won't see the new row until you commit the transaction.
But in practice that doesn't solve the problem - it only moves it. Your application now needs to check whether the commit fails and decide what to do. I would normally have it roll back what you did and restart the transaction, because now the row will be visible. This is how transaction-based programming is supposed to work.

I ran into the same problem and searched the Net for a moment :)
Finally I came up with a solution similar to the method used for creating filesystem objects in shared (temporary) directories, i.e. for securely opening temporary files:
// Assumes create_record() returns true on success, or an error code such as ERROR_DUP_ROW.
$exists = false;
do {
    $exists = check();              // select the row from the table
    if ($exists) {
        break;                      // the row is already there, nothing to do
    }
    $result = create_record();      // try to insert the row
    if ($result === true) {
        $exists = true;             // we created the row ourselves
    } elseif ($result !== ERROR_DUP_ROW) {
        log_error("failed to create row, and not because of DUP_ROW");
        break;
    }
    // otherwise another process has probably created the record already,
    // so loop and check again whether it exists
} while (!$exists);
Don't be afraid of the busy loop - normally it will execute once or twice.

You prevent duplicate rows very simply by putting unique indexes on your tables. That has nothing to do with LOCKS or TRANSACTIONS.
Do you care if an insert fails because it's a duplicate? Do you need to be notified if it fails? Or is all that matters that the row was inserted, and it doesn't matter by whom or how many duplicate inserts failed?
If you don't care, then all you need is INSERT IGNORE. There is no need to think about transactions or table locks at all.
InnoDB has row level locking automatically, but that applies only to updates and deletes. You are right that it does not apply to inserts. You can't lock what doesn't yet exist!
You can explicitly LOCK the entire table. But if your purpose is to prevent duplicates, then you are doing it wrong. Again, use a unique index.
If there is a set of changes to be made and you want an all-or-nothing result (or even a set of all-or-nothing results within a larger all-or-nothing result), then use transactions and savepoints. Then use ROLLBACK or ROLLBACK TO SAVEPOINT *savepoint_name* to undo changes, including deletes, updates and inserts.
LOCK TABLES is not a replacement for transactions, but it is your only option with MyISAM tables, which do not support transactions. You can also use it with InnoDB tables if row-level locking isn't enough. See this page for more information on using transactions with lock table statements.
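The nested all-or-nothing behaviour mentioned above might look like this. It is only a sketch: the table `audit_log` and the savepoint name are hypothetical.

```sql
-- Hypothetical table. CREATE TABLE commits implicitly, so it stays outside the transaction.
CREATE TABLE IF NOT EXISTS audit_log (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    message VARCHAR(100) NOT NULL
) ENGINE=InnoDB;

START TRANSACTION;

-- Part of the outer all-or-nothing unit.
INSERT INTO audit_log (message) VALUES ('outer change');

-- Mark a point we can return to without abandoning the whole transaction.
SAVEPOINT before_optional_step;
INSERT INTO audit_log (message) VALUES ('optional change');

-- Undo only the work done since the savepoint.
ROLLBACK TO SAVEPOINT before_optional_step;

-- Makes 'outer change' permanent; 'optional change' is gone.
COMMIT;
```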

I have a similar issue. I have a table that under most circumstances should have a unique ticket_id value, but there are some cases where I will have duplicates; not the best design, but it is what it is.
User A checks to see if the ticket is reserved, it isn't
User B checks to see if the ticket is reserved, it isn't
User B inserts a 'reserved' record into the table for that ticket
User A inserts a 'reserved' record into the table for that ticket
User B check for duplicate? Yes, is my record newer? Yes, leave it
User A check for duplicate? Yes, is my record newer? No, delete it
User B has reserved the ticket, User A reports back that the ticket has been taken by someone else.
The key in my instance is that you need a tie-breaker, in my case it's the auto-increment id on the row.

In case INSERT IGNORE, as suggested in the accepted answer, doesn't fit your needs, go back to the requirements in your question:
1] select a row from table
2] if it doesn't exist, insert it
Another possible approach is to add a condition to the insert sql statement,
e.g :
INSERT INTO table_listnames (name, address, tele)
SELECT * FROM (SELECT 'Rupert', 'Somewhere', '022') AS tmp
WHERE NOT EXISTS (
SELECT name FROM table_listnames WHERE name = 'Rupert'
) LIMIT 1;
Reference:
https://stackoverflow.com/a/3164741/179744

Related

How does a lock work for two inserts in MySQL?

Let's say the isolation level is Repeatable Read, which is the default for MySQL.
I have two inserts (no checking, no unique columns).
a) Let's say these two inserts happen at the same moment. What will happen? Will it first run the first insert and the second or both of them in different MySQL's threads?
b) Let's say I have insert statement and column called vehicle_id as unique, but before that, I check if it exists or not. If it doesn't exist, go on and insert. Let's say two threads in my code both come at the same moment. So they will both go into if statement since they happened at the same moment.
Now, they both have to do insert with the same vehicle_id. How does MySQL handle this? If it's asynchronous or something, maybe both inserts might happen so quickly that they will both get inserted even though vehicle_id was the same as unique field. If it's not asynchronous or something, one will get inserted first, second one waits. When one is done, second one goes and tries to insert, but it won't insert because of unique vehicle_id restriction. How does this situation work?
I am asking because locks in repeatable read for INSERT lose their essence. I know how it's going to work for Updating/Selecting.
As I understand it the situation is:
a) Threads are assigned per connection. If both statements are received on the same connection then they will be executed in the same thread, one after the other, in the order in which they are received. If they're in different threads then it will be down to whichever thread is scheduled first, and that's likely to be OS-determined and non-deterministic from your point of view.
b) if a column is defined as UNIQUE at the server, then you cannot insert a second row with the same value so the second insert must fail.
Trying to use a conflicting index in the way you described appears to be an application logic problem, not a MySQL problem. Whatever entity is responsible for your unique IDs (which is your application in this case) needs to ensure that they are unique. One approach is to implement an application lock using MySQL, which allows applications running in isolation from each other to share a lock at the server. Check the MySQL docs for how to use this. Its usual use is intended to be application level, and therefore it is not binding on the MySQL server. Another approach would be to use UUIDs for unique keys and rely on their uniqueness when you need to create a new one.
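One way to read "application lock" here is MySQL's named-lock functions GET_LOCK() and RELEASE_LOCK(). The sketch below assumes a hypothetical `vehicles` table and a made-up lock-naming convention (one lock name per vehicle_id). The lock is advisory: it only helps if every writer asks for it first.

```sql
-- Hypothetical table; the unique key remains the final safety net.
CREATE TABLE IF NOT EXISTS vehicles (
    vehicle_id INT NOT NULL,
    UNIQUE KEY uq_vehicle_id (vehicle_id)
) ENGINE=InnoDB;

-- Wait up to 10 seconds for the named lock.
-- Returns 1 if acquired, 0 on timeout, NULL on error.
SELECT GET_LOCK('vehicle:123', 10) AS got_lock;

-- Run this only when got_lock = 1: check and insert in one statement.
INSERT INTO vehicles (vehicle_id)
SELECT 123 FROM DUAL
WHERE NOT EXISTS (SELECT 1 FROM vehicles WHERE vehicle_id = 123);

-- Release the lock so other sessions can proceed.
SELECT RELEASE_LOCK('vehicle:123');
```

The application has to inspect got_lock itself; nothing in the server stops a session that skips GET_LOCK() from inserting.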

Difference between table and row locks

I'm studying about MySQL and how it works, and something confuses me and I don't find any clear explanation on the web about this.
What exactly is the difference between row and table locks? One locks the row and the other locks the table. Correct?
So, in which sort of situations would you use a table lock and a row lock? Is it something the programmer or database manager can program in, or is it the engine that does it for you?
If there is any other information you think is good to know, feel free to add that to your answer.
I'm sorry for this possible noobish question, but I'm still learning.
While this is about SQL Server, it applies to MySQL as well: "What are row, page and table locks? And when they are acquired?".
The MySQL docs show this:
Generally, table locks are superior to row-level locks in the following cases:
Most statements for the table are reads.
Statements for the table are a mix of reads and writes, where writes are updates or deletes for a single row that can be fetched with one key read.
SELECT combined with concurrent INSERT statements, and very few UPDATE or DELETE statements.
Many scans or GROUP BY operations on the entire table without any writers.
Now when to use: The infamous "It depends" applies here:
Ask yourself what is the use case for this transaction?
Typically row-level locking will be used when highly granular control is needed. In my opinion this should be used as the default. Take an orders or order details table where an order could be updated or deleted. Locking the whole table on a high-transaction-volume table makes no sense. I want users of individual orders to be able to update each order and not lock someone else out when I know the scope of their change is limited to a specific order.
Now if I needed to restore the orders and details tables from backup for some reason, or make many updates to many records based on an external source, I may lock the whole table to ensure all the updates complete successfully and I can verify the load before I let anyone back in. I don't want any changes while I'm making the needed updates. But we have to consider whether locking the whole table will negatively impact user experience, or whether we have no other options available. Locking at the table level will prevent other users from changing any value. Is this really what we want?
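The two situations above can be contrasted in a short sketch. The `orders` table comes from the answer, but its columns and values are assumptions made for the example.

```sql
-- Hypothetical columns for the orders table discussed above.
CREATE TABLE IF NOT EXISTS orders (
    id     INT NOT NULL PRIMARY KEY,
    status VARCHAR(20) NOT NULL
) ENGINE=InnoDB;
INSERT IGNORE INTO orders (id, status) VALUES (1, 'new'), (2, 'new');

-- Row-level: only order 1 is locked; other sessions can still change order 2.
START TRANSACTION;
SELECT status FROM orders WHERE id = 1 FOR UPDATE;
UPDATE orders SET status = 'shipped' WHERE id = 1;
COMMIT;

-- Table-level: nobody else can read or write orders until UNLOCK TABLES.
LOCK TABLES orders WRITE;
UPDATE orders SET status = 'archived';
UNLOCK TABLES;
```

The row-level variant relies on the lookup using an index (here the primary key); a search that scans the table can lock far more rows than the one it returns.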

How to perform check-and-insert in mysql?

In my code I need to do the following:
Check a MySQL table (InnoDB) if a particular row (matching some criteria) exists. If it does, return it. If it doesn't, create it and then return it.
The problem I seem to have is race conditions. Every now and then two processes run so closely together, that they both check the table at the same time, don't see the row, and both insert it - thus duplicate data.
I'm reading the MySQL documentation trying to come up with some way to prevent this. What I've come up with so far:
Unique indexes seem to be one option, but they're not universal (it only works when the criteria is something unique for all rows).
Transactions even at SERIALIZABLE level don't protect against INSERT, period.
Neither do SELECT ... LOCK IN SHARE MODE or SELECT ... FOR UPDATE.
A LOCK TABLE ... WRITE would do it, but it's a very drastic measure - other processes won't be able to read from the table, and I need to lock ALL tables that I intend to use until I unlock them.
Basically, I'd like to do either of the following:
Prevent all INSERTs to the table from processes other than mine, while allowing SELECT/UPDATE (this is probably impossible because it makes so little sense most of the time).
Organize some sort of manual locking. The two processes would coordinate among themselves which one gets to do the select/insert dance, while the other waits. This needs some sort of operation that waits until the lock is released. I could probably implement a spin-lock (one process repeatedly checks if the other has released the lock), but I'm afraid that it would be too resource intensive.
I think I found an answer myself. Transactions + SELECT ... FOR UPDATE in an InnoDB table can provide a synchronization lock (aka mutex). Have all processes lock on a specific row in a specific table before they start their work. Then only one will be able to run at a time and the rest will wait until the first one finishes its transaction.
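A sketch of that mutex-row idea, under stated assumptions: the tables `app_mutex` and `settings`, their columns, and the lock name are all hypothetical. Every process must run the same SELECT ... FOR UPDATE on the same row before touching the target table, otherwise nothing is serialized.

```sql
-- Hypothetical mutex table: one row per named critical section.
CREATE TABLE IF NOT EXISTS app_mutex (
    name VARCHAR(64) NOT NULL PRIMARY KEY
) ENGINE=InnoDB;
INSERT IGNORE INTO app_mutex (name) VALUES ('settings_check_and_insert');

-- Hypothetical target table with no unique index on the search criteria.
CREATE TABLE IF NOT EXISTS settings (
    id    INT AUTO_INCREMENT PRIMARY KEY,
    owner VARCHAR(30) NOT NULL,
    topic VARCHAR(30) NOT NULL
) ENGINE=InnoDB;

START TRANSACTION;

-- Blocks here while another transaction holds the lock on this row.
SELECT name FROM app_mutex
WHERE name = 'settings_check_and_insert' FOR UPDATE;

-- Check-and-insert, now serialized across cooperating processes.
INSERT INTO settings (owner, topic)
SELECT 'alice', 'billing' FROM DUAL
WHERE NOT EXISTS (
    SELECT 1 FROM settings WHERE owner = 'alice' AND topic = 'billing'
);

-- Return the row, whether we created it or found it.
SELECT id, owner, topic FROM settings
WHERE owner = 'alice' AND topic = 'billing';

-- Committing releases the row lock and lets the next waiter continue.
COMMIT;
```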

Whether UPDATE statement locks the rows in the table separately or entirely when using InnoDb

Say, we have a table called person like below
CREATE TABLE person (
id INT,
name VARCHAR(30),
point INT
);
I want to update the entire table, changing the point of each person according to another's, like
UPDATE person SET point=(
SELECT point FROM person WHERE some-condition
);
or, simply just increasing by one, like
UPDATE person SET point=point+1;
When executing the scripts above, which rows will be locked and will other statements wait until the update statement finishes or can be executed between two update operations?
Neither of your update statements has a where clause. (Your first one has a select with a where clause; it's possible you want that where clause to be part of the update, but I am not sure about that.)
That means they'll update all the rows in your person table. The transaction semantics provided by InnoDB say that each row will be locked until the entire update is completed. That is, other updates will be blocked. If you attempt other updates in an order different from the one in this query, you're risking a deadlock.
Other client connection select-queries will see the previous state of the table ... the state at the instant before your update statement began ... until your update statement completes. In many cases InnoDB can do that without delaying its response to the other connections' queries. But sometimes it must delay its response. The biggest delay may come at the end of your update query while InnoDB is committing its results.
Keep this in mind: in order to implement transaction semantics, InnoDB sacrifices the predictability of query performance.
I strongly suggest you avoid doing updates without where clauses where it makes sense to do that. It doesn't in your second (give every person another point) query.

Looking for the correct way to lock tables in mySQL

I have a situation where a specific transaction has to update the DB before any other transaction can look at a given table. Specifically, there is a prizing mechanism whereby there is a limited number of prizes and I'm concerned that if two requests arrive virtually at the same time the second request might find the prize still available by virtue of the first request having not had the time it needs to mark it as not-available.
I'm looking at the documentation for lock tables and it's not clear to me what's going on, and since it's extremely difficult to test this feature (as it requires two requests arriving at the same time), I was hoping for some advice.
My needs are extremely simple. There is only one table that I need to lock, while all the others can go about their business.
**request 1**:
lock prizes;
select from prizes;
mark prize as unavailable;
unlock prizes;
simultaneous **request n**
find the prizes table locked and wait for it to unlock //this is not critical, so long as they can just fail gracefully
select [no prize available]
As I said, it's CRITICALLY important that the other tables in this DB are completely unaffected by my lock, I got the sense from the documentation that when I lock one table, selecting another table will produce an error that says "other table isn't locked"... I'm probably not understanding this correctly, as that would seem idiotic, but just need to make sure that locking prizes doesn't affect anything else.
TIA
This is a perfect case for a transaction. Note that your tables need to be InnoDB for this to work.
start transaction;
select [your_fields] from [prizes_table] WHERE [your_where] FOR UPDATE;
// if is a valid recipient and prize gets taken:
update prizes set available=0 where id=[used_prize_id];
commit;
This should do exactly what you expect.
You should simply LOCK TABLES prizes WRITE to get the semantics you need. Since locking one or more tables prevents you from accessing any tables you have not locked for the lock's duration, you will also need to lock -- either for reading or for writing -- all other tables that you need for the "mark prize as unavailable" step, if any.
Be aware that if you intend to access tables using an alias, you also need to supply the same alias in the LOCK statement. This topic is covered in the documentation, but I mention it explicitly because it could be overlooked.
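A sketch of both points, assuming a second, purely hypothetical table `winners` that the "mark prize as unavailable" step also needs to read. The `available` column on prizes is taken from the transaction example earlier in the thread.

```sql
-- Hypothetical tables for the example.
CREATE TABLE IF NOT EXISTS prizes (
    id        INT NOT NULL PRIMARY KEY,
    available TINYINT NOT NULL DEFAULT 1
) ENGINE=InnoDB;
CREATE TABLE IF NOT EXISTS winners (
    user_id  INT NOT NULL,
    prize_id INT NOT NULL
) ENGINE=InnoDB;

-- Lock every table this session will touch, using the alias it will query with.
LOCK TABLES prizes WRITE, winners AS w READ;

-- Allowed: winners is referenced through the locked alias w.
SELECT COUNT(*) FROM winners AS w WHERE w.prize_id = 1;

-- Allowed: prizes is locked for writing.
UPDATE prizes SET available = 0 WHERE id = 1 AND available = 1;

-- Other sessions are only blocked on the tables named above.
UNLOCK TABLES;
```

The restriction applies to the session holding the locks: while they are held, that session cannot touch a table it did not list. Other sessions remain free to use any table that is not locked.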
If you need to lock only the record, you can use SELECT ... FOR UPDATE inside a transaction, with autocommit=0.
This way the first request selects the record and takes the lock; the second request is then blocked until the first request commits (or rolls back) the transaction, or until the lock wait exceeds the timeout.