I am planning on using auto-incremented user IDs on my website. As user data will be split across several tables, I was wondering how reliable auto-incremented values are in transactions. When inserting some initial rows into several tables in a transaction upon registration, is it safe to just let the auto-increment mechanism set the IDs in all the tables, or should I insert into one table, fetch the inserted ID in a separate query, and use it for the subsequent insertions, at the cost of higher database load just for user creation?
The auto-increment value gets reserved every time you attempt to insert something. The insert proceeds, more or less, in two steps:
1: reserve the next available auto-increment key
2: perform the insert with this reserved key
Now, if the transaction rolls back, the one thing that is never rolled back is the auto-increment reservation, which results in gaps in the auto-increment column. Because of this, it is impossible to predict with 100% certainty what the next auto-increment value will be. There is no other issue with auto_increment that I know of, and in almost all cases it is more reliable to rely on MySQL's features than to try to do something manually.
If your insertions happen under very controlled conditions, such as no concurrent transactions and nothing ever being rolled back, you might be safe. Otherwise, the answer can depend on which storage engine you are using. You should expect auto_increment to produce gaps in the numbering, at least if you are using InnoDB.
More generally, every database I know of can leave gaps in auto_increment (or equivalent) values. The reason is that if it were not so, any transaction inserting a new row would have to block all other insertions into that table: a second row could not be inserted while the first transaction had neither committed nor rolled back, because the next value would not yet be known. If you allow gaps, you simply assume the first transaction will commit; if it happens to roll back, you get a gap, but that is not a problem.
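To illustrate the gap (the table and values are made up; InnoDB with default settings assumed):
CREATE TABLE t (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(50));
START TRANSACTION;
INSERT INTO t (name) VALUES ('a');   -- reserves id 1
ROLLBACK;                            -- the row disappears, but the reservation of 1 does not
INSERT INTO t (name) VALUES ('b');   -- this row gets id 2, leaving a gap at 1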
If your concern is the extra load of a user-creation operation, you might be able to make things snappier by implementing it as a stored procedure; that way you avoid a round trip to the application for each query.
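A rough sketch of that idea (the table, column, and procedure names are just placeholders), reusing LAST_INSERT_ID() for the child rows:
DELIMITER //
CREATE PROCEDURE create_user(IN p_name VARCHAR(50), IN p_email VARCHAR(100))
BEGIN
  START TRANSACTION;
  INSERT INTO users (name) VALUES (p_name);
  -- LAST_INSERT_ID() returns the id generated by the insert above for this session
  INSERT INTO user_profiles (user_id, email) VALUES (LAST_INSERT_ID(), p_email);
  COMMIT;
END //
DELIMITER ;
The application then issues a single CALL create_user('alice', 'alice@example.com'); per registration.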
Related
I just received access to a MySQL database where the ID is a float field (not auto-increment). This database was first used with a C# application that is no longer maintained.
I have to build a web app, and I can't change the field's type in the database, nor add a new one.
So, how can I write an INSERT query that increments the ID and doesn't cause problems when multiple people are working at the same time?
I tried getting the last ID, incrementing it by one, and then inserting into the table, but that isn't safe when users are creating records at the same time.
Thank you
how can I write an INSERT query that increments the ID and doesn't cause problems when multiple people are working at the same time?
You literally cannot make an INSERT query alone that will increment the ID and avoid race conditions. It has nothing to do with the data type of the column. The column could be INT and you would have the same race condition problem.
One solution is to use LOCK TABLES to block concurrent sessions from inserting rows. Then your session can read the current MAX() value in the table, increment it, INSERT a new row with the incremented value, and then UNLOCK TABLES as promptly as possible to allow the concurrent sessions to do their INSERTs.
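A minimal sketch of that sequence (the table and column names are just placeholders):
LOCK TABLES items WRITE;
SELECT IFNULL(MAX(id), 0) + 1 INTO @next_id FROM items;
INSERT INTO items (id, name) VALUES (@next_id, 'example');
UNLOCK TABLES;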
In fact, this is exactly how MySQL's AUTO_INCREMENT works. Each table stores its own most recent auto-increment value. When you insert into a table with an auto-increment column, the table is locked briefly, just long enough for your session to read the table's auto-inc value, increment it, store it back into the table's metadata, and also store that value in your session's thread data. Then it unlocks the table's auto-inc lock. This all happens very quickly. Read https://dev.mysql.com/doc/refman/8.0/en/innodb-auto-increment-handling.html for more on this.
The difficult part is that you can't simulate this from SQL, because SQL naturally must obey transaction scope. The auto-inc mechanism built into InnoDB works outside of transaction scope, so concurrent sessions can read the latest incremented auto-inc value for the table even if the transaction that incremented it has not finished inserting that value and committing its transaction. This is good for allowing maximum concurrency, but you can't do that at the SQL level.
The closest you can do is the LOCK TABLES solution that I described, but this is rather clumsy because it ends up holding that lock a lot longer than the auto-inc lock typically lasts. This puts a limit on the throughput of concurrent inserts to your table. Is that too limiting for your workload? I can't say. Perhaps you have a modest rate of inserts to this table, and it won't be a problem.
Another solution is to use some other table that has an auto-increment or another type of unique id generator that is safe for concurrent sessions to share. But this would require all concurrent sessions to use the same mechanism as they INSERT rows.
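One possible shape of that (the table names are just placeholders) is a tiny table whose only job is to hand out IDs:
CREATE TABLE id_generator (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY);
-- every session that needs a new id does this, then uses the value in its own INSERT:
INSERT INTO id_generator (id) VALUES (NULL);
INSERT INTO legacy_table (id, name) VALUES (LAST_INSERT_ID(), 'example');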
A possible solution could be the following, but it is risky and requires thorough testing of ALL applications using the table/database!
The steps to follow (a rough SQL sketch appears below):
rename the table (xxx_refactored or something)
create a view using the original table and cast the ID column as FLOAT in the view, so the other application will see the data as FLOAT.
create a new column or alter the existing one and add the AUTO_INCREMENT to it
Eventually the legacy application will have to be updated to handle the column properly, so the view can be dropped
The view will be updatable, so the legacy application will still be able to insert and update the table through the view.
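A rough, untested sketch of those steps (all names are placeholders; CAST(... AS FLOAT) needs MySQL 8.0.17 or later, and the existing values must fit the new integer type):
RENAME TABLE customers TO customers_refactored;
ALTER TABLE customers_refactored MODIFY id INT NOT NULL AUTO_INCREMENT;
CREATE VIEW customers AS
  SELECT CAST(id AS FLOAT) AS id, name FROM customers_refactored;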
This won't work if:
Data in the column is outside of the range of the chosen new datatype
The column is referenced by a foreign key constraint from any other table
Probably more :)
!!! TEST EVERYTHING BEFORE YOU DO IT IN PRODUCTION !!!
Probably a better option is to ask somebody to show you the code which maintains this field in the legacy application.
Let's say the isolation level is REPEATABLE READ, as it is by default in MySQL.
I have two inserts (no checking, no unique columns).
a) Let's say these two inserts happen at the same moment. What will happen? Will MySQL run the first insert and then the second, or both of them in different threads?
b) Let's say I have an insert statement and a column called vehicle_id that is unique, but before inserting, I check whether the value already exists; if it doesn't, I go on and insert. Now say two threads in my code arrive at the same moment, so both of them pass the existence check.
Now they both try to insert the same vehicle_id. How does MySQL handle this? If it works asynchronously, maybe both inserts happen so quickly that both rows get inserted even though vehicle_id is a unique field. If not, one insert happens first and the second one waits; when the first is done, the second tries to insert but fails because of the unique vehicle_id constraint. Which of these is it?
I am asking because, for INSERTs, the locks taken under REPEATABLE READ seem to lose their relevance. I know how it works for updates and selects.
As I understand it, the situation is:
a) Threads are assigned per connection. If both inserts arrive on the same connection, they will be executed in the same thread, one after the other, in the order in which they are received. If they arrive on different connections, it comes down to whichever thread is scheduled first, which is likely OS-determined and non-deterministic from your point of view.
b) If a column is defined as UNIQUE at the server, then you cannot insert a second row with the same value, so the second insert must fail.
Trying to use a conflicting unique index in the way you described is an application-logic problem, not a MySQL problem. Whatever entity is responsible for your unique IDs (your application, in this case) needs to ensure that they are unique. One approach is to implement an application lock using MySQL, which allows applications running in isolation from each other to share a lock at the server; check the MySQL docs for how to use this. Its usual use is intended to be application level, and therefore not binding on the MySQL server. Another approach would be to use UUIDs as unique keys and rely on their uniqueness when you need to create a new one.
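A minimal sketch of that application-lock idea using MySQL's named locks (the lock name and timeout are arbitrary):
SELECT GET_LOCK('vehicle_id_allocation', 10);   -- wait up to 10 seconds for the shared named lock
-- check whether the vehicle_id exists and INSERT it if not, while holding the lock
SELECT RELEASE_LOCK('vehicle_id_allocation');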
A co-worker just made me aware of a very strange MySQL behavior.
Assume you have a table with an auto_increment field and another field that is set to unique (e.g. a username field). When trying to insert a row with a username that's already in the table, the insert fails, as expected. Yet the auto_increment value is still increased, as you can see when you insert a valid new entry after several failed attempts.
For example, when our last entry looks like this...
ID: 10
Username: myname
...and we try five new entries with the same username value, on our next valid insert we will have created a new row like so:
ID: 16
Username: mynewname
While this is not a big problem in itself, it seems like a very easy attack vector for breaking a table by flooding it with failing insert requests, given that the MySQL Reference Manual states:
"The behavior of the auto-increment mechanism is not defined if [...] the value becomes bigger than the maximum integer that can be stored in the specified integer type."
Is this expected behavior?
InnoDB is a transactional engine.
This means that in the following scenario:
Session A inserts record 1
Session B inserts record 2
Session A rolls back
there is either the possibility of a gap, or session B would have to block until session A committed or rolled back.
InnoDB designers (as most of the other transactional engine designers) chose to allow gaps.
From the documentation:
When accessing the auto-increment counter, InnoDB uses a special table-level AUTO-INC lock that it keeps to the end of the current SQL statement, not to the end of the transaction. The special lock release strategy was introduced to improve concurrency for inserts into a table containing an AUTO_INCREMENT column
…
InnoDB uses the in-memory auto-increment counter as long as the server runs. When the server is stopped and restarted, InnoDB reinitializes the counter for each table for the first INSERT to the table, as described earlier.
If you are afraid of the id column wrapping around, make it BIGINT (an 8-byte integer).
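For example (assuming a table named t with an integer id column):
ALTER TABLE t MODIFY id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;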
Without knowing the exact internals, I would say yes, the auto-increment SHOULD allow for skipped values due to failed inserts. Let's say you are doing a banking transaction, or anything else where the entire transaction and its multiple records go in as all-or-nothing. If you do your insert, get an ID, then stamp all subsequent detail records with that transaction ID and insert them, you need that ID to be guaranteed unique. If multiple people are hammering the database, they too need to get their own transaction IDs so as not to conflict with yours when their transactions get committed. If something fails in the first transaction, no harm done, and no dangling records downstream.
Old post, but this may help people.
You may have to set innodb_autoinc_lock_mode to 0 or 2.
System variables that take a numeric value can be specified as --var_name=value on the command line or as var_name=value in option files.
Command-Line parameter format:
--innodb-autoinc-lock-mode=0
OR
Open your option file (my.ini or my.cnf) and add the following line:
innodb_autoinc_lock_mode=0
I know this is an old article, but since I also couldn't find the right answer, here is a way I found to do this. You have to wrap your query in an if statement. It's usually INSERT queries, or INSERT ... ON DUPLICATE KEY queries, that mess up the contiguous auto-increment order, so for regular inserts use something like:
// hypothetical example using PDO; the table and column names are assumptions
$check_email_address = $pdo->query("SELECT 1 FROM users WHERE email = 'foo@example.com'")->fetchColumn();
if ($check_email_address === false) {
    // run the INSERT only when the email is not already taken
    $pdo->exec("INSERT INTO users (email) VALUES ('foo@example.com')");
}
and instead of INSERT ... ON DUPLICATE KEY UPDATE, use an UPDATE ... SET ... WHERE query (inside or outside an if statement, it doesn't matter); a REPLACE INTO query also seems to work.
I have a system which has a complex primary key for interfacing with external systems, and a fast, small opaque primary key for internal use. For example: the external key might be a compound value - something like (given name (varchar), family name (varchar), zip code (char)) and the internal key would be an integer ("customer ID").
When I receive an incoming request with the external key, I need to look up the internal key - and here's the tricky part - allocate a new internal key if I don't already have one for the given external ID.
Obviously if I have only one client talking to the database at a time, this is fine. SELECT customer_id FROM customers WHERE given_name = 'foo' AND ..., then INSERT INTO customers VALUES (...) if I don't find a value. But, if there are potentially many requests coming in from external systems concurrently, and many may arrive for a previously unheard-of customer all at once, there is a race condition where multiple clients may try to INSERT the new row.
If I were modifying an existing row, that would be easy; simply SELECT FOR UPDATE first, to acquire the appropriate row-level lock, before doing an UPDATE. But in this case, I don't have a row that I can lock, because the row doesn't exist yet!
I've come up with several solutions so far, but each of them has some pretty significant issues:
Catch the error on INSERT, re-try the entire transaction from the top. This is a problem if the transaction involves a dozen customers, especially if the incoming data is potentially talking about the same customers in a different order each time. It's possible to get stuck in mutually recursive deadlock loops, where the conflict occurs on a different customer each time. You can mitigate this with an exponential wait time between re-try attempts, but this is a slow and expensive way to deal with conflicts. Also, this complicates the application code quite a bit as everything needs to be restartable.
Use savepoints. Start a savepoint before the SELECT, catch the error on INSERT, and then roll back to the savepoint and SELECT again (a rough sketch of this flow appears after this list). Savepoints aren't completely portable, and their semantics and capabilities differ slightly and subtly between databases; the biggest difference I've noticed is that sometimes they seem to nest and sometimes they don't, so it would be nice if I could avoid them. This is only a vague impression though - is it inaccurate? Are savepoints standardized, or at least practically consistent? Also, savepoints make it difficult to do things in parallel in the same transaction, because you might not be able to tell exactly how much work you'll be rolling back, although I realize I might just need to live with that.
Acquire some global lock, like a table-level lock using a LOCK statement (Oracle, MySQL, Postgres). This obviously slows down these operations and results in a lot of lock contention, so I'd prefer to avoid it.
Acquire a more fine-grained, but database-specific lock. I'm only familiar with Postgres's way of doing this, which is very definitely not supported in other databases (the functions even start with "pg_") so again it's a portability issue. Also, postgres's way of doing this would require me to convert the key into a pair of integers somehow, which it may not neatly fit into. Is there a nicer way to acquire locks for hypothetical objects?
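For reference, the savepoint flow from the second option would look roughly like this in MySQL (the column values are placeholders, and the error handling lives in the application):
SAVEPOINT before_customer;
-- look for the customer first
SELECT customer_id FROM customers
 WHERE given_name = 'foo' AND family_name = 'bar' AND zip_code = '12345';
-- not found, so try to create it
INSERT INTO customers (given_name, family_name, zip_code) VALUES ('foo', 'bar', '12345');
-- if the INSERT hits a duplicate-key error, the application issues:
ROLLBACK TO SAVEPOINT before_customer;
-- and repeats the SELECT, which now finds the row inserted by the other session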
It seems to me that this has got to be a common concurrency problem with databases but I haven't managed to find a lot of resources on it; possibly just because I don't know the canonical phrasing. Is it possible to do this with some simple extra bit of syntax, in any of the tagged databases?
I'm not clear on why you can't use INSERT IGNORE, which will run without error, and you can check whether an insert occurred (affected rows). If the insert "fails", then you know the key already exists and you can do a SELECT. In other words, do the INSERT first, then the SELECT.
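A minimal sketch of that order of operations (assuming a unique key over the three external-key columns):
INSERT IGNORE INTO customers (given_name, family_name, zip_code)
VALUES ('foo', 'bar', '12345');
-- ROW_COUNT() is 1 if the row was inserted, 0 if the key already existed;
-- either way, the internal id can now be read:
SELECT customer_id FROM customers
WHERE given_name = 'foo' AND family_name = 'bar' AND zip_code = '12345';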
Alternatively, if you are using MySQL, use InnoDB, which supports transactions. That would make it easier to roll back.
Perform each customer's "lookup or maybe create" operations in autocommit mode, prior to and outside of the main, multi-customer transaction.
With regard to generating an opaque primary key, there are a number of options, e.g. use a GUID or (at least with Oracle) a sequence. With regard to ensuring the external key is unique, apply a unique constraint on the columns. If the insert fails because the key exists, reattempt the fetch. You can also use an INSERT with a WHERE NOT EXISTS or WHERE NOT IN clause. Use a stored procedure to reduce the round trips and improve performance.
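One way to phrase that conditional insert in MySQL (again with the hypothetical customers columns; the unique constraint is still what actually guarantees no duplicates under concurrency):
INSERT INTO customers (given_name, family_name, zip_code)
SELECT 'foo', 'bar', '12345' FROM DUAL
WHERE NOT EXISTS (
    SELECT 1 FROM customers
    WHERE given_name = 'foo' AND family_name = 'bar' AND zip_code = '12345'
);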