MySQL atomic insert-if-not-exists with stable autoincrement

In MySQL, I am using an InnoDB table that contains unique names, and IDs for those names. Clients need to atomically check for an existing name, insert a new one if it does not exist, and get the ID. The ID is an AUTO_INCREMENT value, and it must not increment out-of-control when checking for existing values regardless of the setting of "innodb_autoinc_lock_mode"; this is because very often the same name will be checked (e.g. "Alice"), and every now and then some new name will come along (e.g. "Bob").
The "INSERT...ON DUPLICATE KEY UPDATE" statement causes an AUTO_INCREMENT increase even in the duplicate-key case, depending on "innodb_autoinc_lock_mode", and is thus unacceptable. The ID will be used as the target of a Foreign-Key Constraint (in another table), and thus it is not okay to change existing IDs. Clients must not deadlock when they do this action concurrently, regardless of how the operations might be interleaved.
I would like the processing during the atomic operation (e.g. checking for the existing ID and deciding whether or not to do the insert) to be done on the server-side rather than the client-side, so that the delay for other sessions attempting to do the same thing simultaneously is minimal and does not need to wait for client-side processing.
My test table to demonstrate this is named FirstNames:
CREATE TABLE `FirstNames` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`FirstName` varchar(45) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `FirstName_UNIQUE` (`FirstName`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
The best solution that I have come up with thus far is as follows:
COMMIT;
SET @myName='Alice';
SET @curId=NULL;
SET autocommit=0;
LOCK TABLES FirstNames WRITE;
SELECT Id INTO @curId FROM FirstNames WHERE FirstName = @myName;
INSERT INTO `FirstNames` (`FirstName`) SELECT @myName FROM DUAL WHERE @curId IS NULL;
COMMIT;
UNLOCK TABLES;
SET @curId=IF(@curId IS NULL, LAST_INSERT_ID(), @curId);
SELECT @curId;
This uses "LOCK TABLES...WRITE" following the instructions given in the MySQL "Interaction of Table Locking and Transactions" documentation for the correct way to lock InnoDB tables. This solution requires the user to have the "LOCK TABLES" privilege.
If I run the above query with @myName="Alice", I obtain a new ID and then continue to obtain the same ID no matter how many times I run it. If I then run with @myName="Bob", I get another ID with the next AUTO_INCREMENT value, and so on. Checking for a name that already exists does not increase the table's AUTO_INCREMENT value.
I am wondering if there is a better solution to accomplish this, perhaps one that does not require the "LOCK TABLES"/"UNLOCK TABLES" commands and combines more "rudimentary" commands (e.g. "INSERT" and "SELECT") in a more clever way? Or is this the best methodology that MySQL currently has to offer?
Edit
This is not a duplicate of "How to 'insert if not exists' in MySQL?". That question does not address all of the criteria that I stated. The issue of keeping the AUTO_INCREMENT value stable is not resolved there (it is only mentioned in passing).
Many of the answers do not address getting the ID of the existing/inserted record, some of the answers do not provide an atomic operation, and some of the answers have the logic being done on the client-side rather than the server-side. A number of the answers change an existing record, which is not what I'm looking for. I am asking for either a better method to meet all of the criteria stated, or confirmation that my solution is the optimal one with existing MySQL support.

The question is really about how to normalize data when you expect there to be duplicates. And then avoid "burning" ids.
http://mysql.rjweb.org/doc.php/staging_table#normalization discusses a 2-step process and is aimed at mass updates due to high-speed ingestion of rows. It degenerates to a single row, but still requires the 2 steps.
Step 1 INSERTs any new rows, creating new auto_inc ids.
Step 2 pulls back the ids en masse.
Note that the work is best done with autocommit=ON and outside the main transaction that is loading the data. This avoids an extra cause for burning ids, namely potential rollbacks.
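The two steps can be sketched as follows. This is a minimal illustration in Python using the standard-library sqlite3 module as a stand-in for MySQL; the table names `names` and `staging` and the function `normalize_batch` are hypothetical, and SQLite's AUTOINCREMENT is only an approximation of InnoDB's behaviour.

```python
import sqlite3


def normalize_batch(conn, names):
    """Return {name: id} for a batch, inserting only names not seen before."""
    with conn:  # one short transaction covering both steps
        conn.execute("CREATE TEMP TABLE IF NOT EXISTS staging (name TEXT NOT NULL)")
        conn.execute("DELETE FROM staging")
        conn.executemany("INSERT INTO staging (name) VALUES (?)", [(n,) for n in names])
        # Step 1: insert only the names that are missing, so no id is
        # generated for a name that already exists.
        conn.execute(
            "INSERT INTO names (name) "
            "SELECT DISTINCT s.name FROM staging s "
            "LEFT JOIN names n ON n.name = s.name "
            "WHERE n.id IS NULL ORDER BY s.name"
        )
        # Step 2: pull back the ids for the whole batch in one query.
        rows = conn.execute(
            "SELECT DISTINCT s.name, n.id FROM staging s "
            "JOIN names n ON n.name = s.name"
        ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE names (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL UNIQUE)"
)
first = normalize_batch(conn, ["Alice", "Bob", "Alice"])
second = normalize_batch(conn, ["Bob", "Carol"])
```

In this sketch the second batch reuses Bob's existing id and only Carol receives a new one.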

You can use a conditional INSERT in a single statement:
INSERT INTO FirstNames (FirstName)
SELECT i.firstName
FROM (SELECT 'Alice' AS firstName) i
WHERE NOT EXISTS (SELECT * FROM FirstNames t WHERE t.FirstName = i.firstName);
The next AUTO_INCREMENT value stays untouched if the name already exists. But I can't tell you that this would be the case in any (future) version or for every configuration. However, it is not much different from what you did, just in a single statement and without locking the table.
At this point you can be sure that the name exists and just select the corresponding Id:
SELECT Id FROM FirstNames WHERE FirstName = 'Alice';
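A rough way to try out the insert-then-select pattern is sketched below in Python with the standard-library sqlite3 module standing in for MySQL. The helper name `get_or_create_id` is hypothetical, and the sketch says nothing about how InnoDB handles AUTO_INCREMENT under concurrency; it only shows the shape of the two statements.

```python
import sqlite3


def get_or_create_id(conn, first_name):
    """Insert the name only if it is missing, then return its id."""
    with conn:
        # Conditional insert: produces zero rows when the name exists.
        conn.execute(
            "INSERT INTO FirstNames (FirstName) "
            "SELECT ? WHERE NOT EXISTS "
            "(SELECT 1 FROM FirstNames WHERE FirstName = ?)",
            (first_name, first_name),
        )
        # At this point the name is known to exist.
        row = conn.execute(
            "SELECT id FROM FirstNames WHERE FirstName = ?", (first_name,)
        ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FirstNames ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "FirstName TEXT NOT NULL UNIQUE)"
)
ids = [get_or_create_id(conn, n) for n in ("Alice", "Alice", "Bob", "Alice")]
```

Repeated lookups of "Alice" return the same id here, and "Bob" gets the next one without a gap.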

Related

Select FOR UPDATE gives duplicate key error

I am using SELECT...FOR UPDATE to enforce a unique key. My table looks like:
CREATE TABLE tblProductKeys (
pkKey varchar(100) DEFAULT NULL,
fkVendor varchar(50) DEFAULT NULL,
productType varchar(100) DEFAULT NULL,
productKey bigint(20) DEFAULT NULL,
UNIQUE KEY pkKey (pkKey,fkVendor,productType,productKey)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
So rows might look like:
{'Ace-Hammer','Ace','Hammer',121},
{'Ace-Hammer','Ace','Hammer',122},
...
{'Menards-Hammer','Menards','Hammer',121},
...
So note that 'Ace-Hammer' and 'Menards-Hammer' can have the same productKey; only the product+key combination needs to be unique. The requirement that the key be an integer defined in this way is organizational. I don't think this is something I can do with auto_increment using InnoDB, hence the question.
So if a vendor creates a new version of an existing product, we give it a distinct key for that vendor/product combination (I realize the pkKey column is redundant in these examples).
My stored procedure is like:
CREATE PROCEDURE getNewKey(IN vkey varchar(50),vvendor varchar(50),vkeyType varchar(50)) BEGIN
start transaction;
set @newKey=(select max(productKey) from tblProductKeys where pkKey=vkey and fkVendor=vvendor and productType=vkeyType FOR UPDATE);
set @newKey=coalesce(@newKey,0);
set @newKey=@newKey+1;
insert into tblProductKeys values (vkey,vvendor,vkeyType,@newKey);
commit;
select @newKey as keyMax;
END
That's all! During periods of heavy use, (1000s of users), I see:
Duplicate entry 'Ace-Hammer-Ace-Hammer-44613' for key 'pkKey'.
I can retry the transaction, but this is not an error I was expecting and I'd like to understand why it happens. I could understand the row locking causing deadlock but in this case it seems like the rows are not locked at all. I wonder if the issue is with max() in this context, or possibly the table index. This sproc is the only transaction that is performed on this table.
Any insight is appreciated. I have read several MySQL/SO posts on the subject; most concerns and issues seem to be with over-locking or deadlocks. E.g. here: When using MySQL's FOR UPDATE locking, what is exactly locked?
To achieve "only the product+key combination needs to be unique", say
UNIQUE(pkKey, productKey)
in either order. Then, your 4-column UNIQUE is redundant. It could be turned into a plain INDEX if needed for some particular query.
Furthermore, you really ought to have a PRIMARY KEY. It may as well be
PRIMARY KEY(pkKey, productKey) -- in either order
and then get rid of my suggested UNIQUE key.
There is no good reason to make productKey depend on pkKey, if that is what you are thinking of. Instead, simply do
productKey INT UNSIGNED AUTO_INCREMENT
There needs to be at least INDEX(productKey).
Now, I am unclear on whether you need for the 'Menards' and 'Ace' hammers to both be number 121? Summary:
PRIMARY KEY(pkKey, productKey),
INDEX(productKey)
Case 1: Both need to be "121". You need some way to explicitly insert a new row with an existing auto-inc value. This is not a problem; you simply specify '121' instead of letting it acquire the next auto-inc value.
Case 2: There is no need for both to be "121". Then simply use the full force of AUTO_INCREMENT:
PRIMARY KEY(productKey)
But if you really like your SP, let's shorten it down to a single statement, even tossing the transaction:
BEGIN;
INSERT
INTO tblProductKeys
SELECT vkey, vvendor, vkeyType,
@new_id := COALESCE(MAX(productKey) + 1, 0)
FROM tblProductKeys
WHERE pkKey = vkey
AND fkVendor = vvendor
AND productType = vkeyType;
END //
Now, you will need
INDEX(pkKey, fkVendor, productType, -- in any order
productKey) -- last
PRIMARY KEY(pkKey, productKey) -- in either order (as previously discussed)
Then use @new_id outside the SP.
I'm a little embarrassed but it is a pretty obvious problem. The issue is that 'FOR UPDATE' only locks the current row. So you can UPDATE it. But I am doing an INSERT! Not an update.
If 2 queries collide, the row is locked, but after the transaction is complete the row is unlocked and it can be read. So you are still reading a stale value. To get the behavior I was expecting, you'd need to lock the whole table.
So I think auto-increment would work for me, although I need a way to get the last_inserted_id so I need to be in the context of a procedure anyway (I am using c# driver).
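The read-then-insert collision can be reproduced in miniature. The sketch below uses Python's standard-library sqlite3 module as a stand-in, with a simplified, hypothetical table `product_keys`; the interleaving of the two "sessions" is forced by hand on one connection, so it illustrates the stale read only and does not model InnoDB's locking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_keys ("
    "pk_key TEXT NOT NULL, product_key INTEGER NOT NULL, "
    "UNIQUE (pk_key, product_key))"
)
conn.execute("INSERT INTO product_keys VALUES ('Ace-Hammer', 121)")


def next_key(pk_key):
    """Compute MAX(product_key) + 1, as the stored procedure does."""
    row = conn.execute(
        "SELECT COALESCE(MAX(product_key), 0) + 1 "
        "FROM product_keys WHERE pk_key = ?",
        (pk_key,),
    ).fetchone()
    return row[0]


# Both "sessions" read before either one inserts, so both see the same maximum.
key_a = next_key("Ace-Hammer")
key_b = next_key("Ace-Hammer")

conn.execute("INSERT INTO product_keys VALUES (?, ?)", ("Ace-Hammer", key_a))
try:
    conn.execute("INSERT INTO product_keys VALUES (?, ?)", ("Ace-Hammer", key_b))
    duplicate_error = None
except sqlite3.IntegrityError as exc:
    # The second insert hits the unique key, like the "Duplicate entry" error.
    duplicate_error = exc
```

Whenever the second reader is not blocked until the first writer has inserted and committed, both compute the same new key and the unique index rejects the later insert.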

How to let data "disappear" from database? MySQL

I've got a bit of a stupid question. The thing is, my program needs a function to delete data from my database. Yay, not really the problem. But how can I delete data without the danger that others can see that something has been deleted?
User Table:
U_ID U_NAME
1 Chris
2 Peter
OTHER TABLE
ID TIMESTAMP FK_U_ID
1 2012-12-01 1
2 2012-12-02 1
Sooooo the ID's are AUTO_INCREMENT, so if I delete one of them there's a gap. Furthermore, the timestamp is also bigger than the row before, so ascending.
I want to let the data with ID 1 disappear from the user's profile (U_ID 1).
If I delete it, there is a gap. If I just change the FK_U_ID to 2 (Peter) it's obvious, because when I insert data, there are 20 or 30 data rows with the same U_ID...so it's obvious that there has been a modification.
If I set the FK_U_ID NULL --> same sh** like when I change it to another U_ID.
Is there any solution to get this to work? I know that if nobody but me has access to the database, it's just no problem. But just in case somebody checks my program, it should not be obvious that there have been modifications.
So here we go.
For the ID gaps issue you can use GUIDs as @SLaks suggests, but then you can't use the native RDBMS auto_increment which means you have to create the GUID and insert it along with the rest of the record data upon creation. Of course, you don't really need the ID to be globally unique, you could just store a random string of 20 characters or something, but then you have to do a DB read to see if that ID is taken and repeat (recursively) that process until you find an unused ID... could be quite taxing.
It's not at all clear why you would want to "hide" evidence that a delete was performed. That sounds like a really bad idea. I'm not a fan of promulgating misinformation.
Two of the characteristics of an ideal primary key are:
- anonymous (be void of any useful information, doesn't matter what it's set to)
- immutable (once assigned, it will never be changed.)
But, if we set that whole discussion aside...
I can answer a slightly different question (an answer you might find helpful to your particular situation)
The only way to eliminate a "gap" in the values in a column with an AUTO_INCREMENT would be to change the column values from their current values to a contiguous sequence of new values. If there are any foreign keys that reference that column, the values in those columns would need to be updated as well, to preserve the relationship. That will likely leave the current auto_increment value of the table higher than the largest value of the id column, so I'd want to reset that as well, to avoid a "gap" on the next insert.
(I have done re-sequencing of auto_increment values in development and test environments, to "cleanup" lookup tables, and to move the id values of some tables to ranges that are distinct from ranges in other tables... that lets me test SQL to make sure the SQL join predicates aren't inadvertently referencing the wrong table, and returning rows that look correct by accident... those are some reasons I've done reassignment of auto_increment values)
Note that the database can "automagically" update foreign key values (for InnoDB tables) when you change the primary key value, as long as the foreign key constraint is defined with ON UPDATE CASCADE, and FOREIGN_KEY_CHECKS is not disabled.
If there are no foreign keys to deal with, and assuming that all of the current values of id are positive integers, then I've been able to do something like this: (with appropriate backups in place, so I can recover if things don't work right)
UPDATE mytable t
JOIN (
SELECT s.id AS old_id
, @i := @i + 1 AS new_id
FROM mytable s
CROSS
JOIN (SELECT @i := 0) i
ORDER BY s.id
) c
ON t.id = c.old_id
SET t.id = c.new_id
WHERE t.id <> c.new_id
To reset the table AUTO_INCREMENT back down to the largest id value in the table:
ALTER TABLE mytable AUTO_INCREMENT = 1;
Typically, I will create a table and populate it from that query in the inline view (aliased as c) above. I can then use that table to update both foreign key columns and the primary key column, first disabling the FOREIGN_KEY_CHECKS and then re-enabling it. (In a concurrent environment, where other processes might be inserting/updating/deleting rows from one of the tables, I would of course first obtain an exclusive lock on all of the tables to be updated.)
Taking up again, the discussion I set aside earlier... this type of "administrative" function can be useful in a test environment, when setting up test cases. But it is NOT a function that is ever performed in a production environment, with live data.
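For experimenting with the re-sequencing idea outside MySQL, here is a small sketch in Python with the standard-library sqlite3 module as a stand-in. It assumes a hypothetical table `mytable` with positive integer ids and no foreign keys pointing at it, and the helper `resequence` is illustrative only. It does not touch any auto-increment counter.

```python
import sqlite3


def resequence(conn):
    """Renumber mytable.id to 1..N, keeping the existing order."""
    with conn:
        old_ids = [row[0] for row in conn.execute("SELECT id FROM mytable ORDER BY id")]
        # Ascending order is safe for positive ids: each new id is <= the old
        # one, and the slot it moves into was vacated earlier or never used.
        for new_id, old_id in enumerate(old_ids, start=1):
            if new_id != old_id:
                conn.execute(
                    "UPDATE mytable SET id = ? WHERE id = ?", (new_id, old_id)
                )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, label) VALUES (?, ?)",
    [(1, "a"), (3, "b"), (7, "c")],
)
conn.commit()
resequence(conn)
rows = conn.execute("SELECT id, label FROM mytable ORDER BY id").fetchall()
```

As with the SQL version, this belongs in a test environment with backups in place.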

Emulating a transaction-safe SEQUENCE in MySQL

We're using MySQL with InnoDB storage engine and transactions a lot, and we've run into a problem: we need a nice way to emulate Oracle's SEQUENCEs in MySQL. The requirements are:
- concurrency support
- transaction safety
- max performance (meaning minimizing locks and deadlocks)
We don't care if some of the values won't be used, i.e. gaps in sequence are ok. There is an easy way to achieve that by creating a separate InnoDB table with a counter, however this means it will take part in transaction and will introduce locks and waiting. I am thinking of trying a MyISAM table with manual locks, any other ideas or best practices?
If auto-increment isn't good enough for your needs, you can create an atomic sequence mechanism with n named sequences like this:
Create a table to store your sequences:
CREATE TABLE sequence (
seq_name varchar(20) unique not null,
seq_current int unsigned not null
);
Assuming you have a row for 'foo' in the table you can atomically get the next sequence id like this:
UPDATE sequence SET seq_current = (@next := seq_current + 1) WHERE seq_name = 'foo';
SELECT @next;
No locks required. Both statements need to be executed in the same session, so that the user-defined variable @next is actually defined when the SELECT happens.
The right way to do this is given in the MySQL manual:
UPDATE child_codes SET counter_field = LAST_INSERT_ID(counter_field + 1);
SELECT LAST_INSERT_ID();
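The counter-row idea can be tried out with the sketch below, written in Python against the standard-library sqlite3 module as a stand-in. SQLite has no `LAST_INSERT_ID(expr)`, so this version increments and reads back inside one transaction instead; the helper `next_val` is hypothetical, and the `sequence` table mirrors the one defined earlier in this answer thread.

```python
import sqlite3


def next_val(conn, seq_name):
    """Advance the named sequence and return its new value."""
    with conn:  # one transaction: increment, then read back the new value
        cur = conn.execute(
            "UPDATE sequence SET seq_current = seq_current + 1 WHERE seq_name = ?",
            (seq_name,),
        )
        if cur.rowcount != 1:
            raise KeyError(seq_name)  # no such sequence row
        return conn.execute(
            "SELECT seq_current FROM sequence WHERE seq_name = ?", (seq_name,)
        ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence ("
    "seq_name TEXT NOT NULL UNIQUE, seq_current INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO sequence VALUES (?, ?)", [("foo", 0), ("bar", 100)])
conn.commit()
values = [next_val(conn, "foo"), next_val(conn, "foo"), next_val(conn, "bar")]
```

Each named sequence advances independently of the others.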
We are a high transaction gaming company and need these sort of solutions for our needs. One of the features of Oracle sequences was also the increment value that could also be set.
The solution uses DUPLICATE KEY.
CREATE TABLE sequences (
id BIGINT DEFAULT 1,
name CHAR(20),
increment TINYINT,
UNIQUE KEY(name)
);
To get the next index:
Abstract the following with a stored procedure or a function sp_seq_next_val(VARCHAR):
INSERT INTO sequences (name) VALUES ("user_id") ON DUPLICATE KEY UPDATE id = id + increment;
SELECT id FROM sequences WHERE name = "user_id";
Won't the MySQL Identity column on the table handle this?
CREATE TABLE table_name
(
id INTEGER AUTO_INCREMENT PRIMARY KEY
)
Or are you looking to use it for something other than just inserting into another table?
If you're writing using a procedural language as well (instead of just SQL) then the other option would be to create a table containing a single integer (or long integer) value and a stored procedure which locked it, selected from it, incremented it and unlocked it before returning the value.
(Note - always increment before you return the value - it maximises the chance of not getting duplicates if there are errors - or wrap the whole thing in a transaction.)
You would then call this independently of your main insert / update (so it doesn't get caught in any transactions automatically created by the calling mechanism) and then pass it as a parameter to wherever you want to use it.
Because it's independent of the rest of the stuff you're doing it should be quick and avoid locking issues. Even if you did see an error caused by locking (unlikely unless you're overloading the database) you could just call it a second / third time.

MySQL AUTO_INCREMENT does not ROLLBACK

I'm using MySQL's AUTO_INCREMENT field and InnoDB to support transactions. I noticed that when I roll back the transaction, the AUTO_INCREMENT field is not rolled back. I found out that it was designed this way, but are there any workarounds to this?
It can't work that way. Consider:
Program one opens a transaction and inserts into a table FOO which has an autoinc primary key (arbitrarily, we say it gets 557 for its key value).
Program two starts, it opens a transaction and inserts into table FOO getting 558.
Program two inserts into table BAR which has a column which is a foreign key to FOO. So now the 558 is located in both FOO and BAR.
Program two now commits.
Program three starts and generates a report from table FOO. The 558 record is printed.
After that, program one rolls back.
How does the database reclaim the 557 value? Does it go into FOO and decrement all the other primary keys greater than 557? How does it fix BAR? How does it erase the 558 printed on the report program three output?
Oracle's sequence numbers are also independent of transactions for the same reason.
If you can solve this problem in constant time, I'm sure you can make a lot of money in the database field.
Now, if you have a requirement that your auto-increment field never have gaps (for auditing purposes, say), then you cannot roll back your transactions. Instead, you need a status flag on your records. On first insert, the record's status is "Incomplete"; then you start the transaction, do your work, and update the status to "Complete" (or whatever you need). When you commit, the record is live. If the transaction rolls back, the incomplete record is still there for auditing. This will cause you many other headaches, but it is one way to deal with audit trails.
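A minimal sketch of the status-flag approach, in Python with the standard-library sqlite3 module as a stand-in for MySQL. The `invoices` table, its columns, and the helper `create_invoice` are hypothetical; the `fail` flag only simulates an error during the work.

```python
import sqlite3


def create_invoice(conn, amount, fail=False):
    """Reserve an id first, then try to complete the record."""
    with conn:  # committed on its own, so the id survives a later rollback
        invoice_id = conn.execute(
            "INSERT INTO invoices (status) VALUES ('incomplete')"
        ).lastrowid
    try:
        with conn:  # the real work; rolled back if anything raises
            conn.execute(
                "UPDATE invoices SET amount = ?, status = 'complete' WHERE id = ?",
                (amount, invoice_id),
            )
            if fail:
                raise RuntimeError("simulated failure during the work")
    except RuntimeError:
        pass  # the 'incomplete' row stays behind for the audit trail
    return invoice_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "status TEXT NOT NULL, amount REAL)"
)
ids = [
    create_invoice(conn, 10.0),
    create_invoice(conn, 20.0, fail=True),
    create_invoice(conn, 30.0),
]
rows = conn.execute("SELECT id, status, amount FROM invoices ORDER BY id").fetchall()
```

The failed attempt leaves row 2 marked 'incomplete', so the id sequence has no gap.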
Let me point out something very important:
You should never depend on the numeric features of autogenerated keys.
That is, other than comparing them for equality (=) or unequality (<>), you should not do anything else. No relational operators (<, >), no sorting by indexes, etc. If you need to sort by "date added", have a "date added" column.
Treat them as apples and oranges: Does it make sense to ask if an apple is the same as an orange? Yes. Does it make sense to ask if an apple is larger than an orange? No. (Actually, it does, but you get my point.)
If you stick to this rule, gaps in the continuity of autogenerated indexes will not cause problems.
I had a client who needed the ID to roll back on a table of invoices, where the order must be consecutive.
My solution in MySQL was to remove the AUTO-INCREMENT and pull the latest Id from the table, add one (+1) and then insert it manually.
If the table is named "TableA" and the Auto-increment column is "Id"
INSERT INTO TableA (Id, Col2, Col3, Col4, ...)
VALUES (
(SELECT Id FROM TableA t ORDER BY t.Id DESC LIMIT 1)+1,
Col2_Val, Col3_Val, Col4_Val, ...)
Why do you care if it is rolled back? AUTO_INCREMENT key fields are not supposed to have any meaning so you really shouldn't care what value is used.
If you have information you're trying to preserve, perhaps another non-key column is needed.
I do not know of any way to do that. According to the MySQL Documentation, this is expected behavior and will happen with all innodb_autoinc_lock_mode lock modes. The specific text is:
In all lock modes (0, 1, and 2), if a
transaction that generated
auto-increment values rolls back,
those auto-increment values are
“lost.” Once a value is generated for
an auto-increment column, it cannot be
rolled back, whether or not the
“INSERT-like” statement is completed,
and whether or not the containing
transaction is rolled back. Such lost
values are not reused. Thus, there may
be gaps in the values stored in an
AUTO_INCREMENT column of a table.
If you set auto_increment to 1 after a rollback or deletion, on the next insert, MySQL will see that 1 is already used and will instead get the MAX() value and add 1 to it.
This will ensure that if the row with the last value is deleted (or the insert is rolled back), it will be reused.
To set the auto_increment to 1, do something like this:
ALTER TABLE tbl auto_increment = 1
This is not as efficient as simply continuing on with the next number because MAX() can be expensive, but if you delete/rollback infrequently and are obsessed with reusing the highest value, then this is a realistic approach.
Be aware that this does not prevent gaps from records deleted in the middle or if another insert should occur prior to you setting auto_increment back to 1.
INSERT INTO prueba(id)
VALUES (
(SELECT IFNULL( MAX( id ) , 0 )+1 FROM prueba target))
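IFNULL handles the case where the table contains no rows, so the first id becomes 1.
The alias target on the subquery table is there to work around MySQL's error about selecting from the same table that the statement modifies.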
If you need to have the ids assigned in numerical order with no gaps, then you can't use an autoincrement column. You'll need to define a standard integer column and use a stored procedure that calculates the next number in the insert sequence and inserts the record within a transaction. If the insert fails, then the next time the procedure is called it will recalculate the next id.
Having said that, it is a bad idea to rely on ids being in some particular order with no gaps. If you need to preserve ordering, you should probably timestamp the row on insert (and potentially on update).
Concrete answer to this specific dilemma (which I also had) is the following:
1) Create a table that holds different counters for different documents (invoices, receipts, RMA's, etc..); Insert a record for each of your documents and add the initial counter to 0.
2) Before creating a new document, do the following (for invoices, for example):
UPDATE document_counters SET counter = LAST_INSERT_ID(counter + 1) where type = 'invoice'
3) Get the last value that you just updated to, like so:
SELECT LAST_INSERT_ID()
or just use your PHP (or whatever) mysql_insert_id() function to get the same thing
4) Insert your new record along with the primary ID that you just got back from the DB. This will override the current auto increment index, and make sure you have no ID gaps between your records.
This whole thing needs to be wrapped inside a transaction, of course. The beauty of this method is that, when you rollback a transaction, your UPDATE statement from Step 2 will be rolled back, and the counter will not change anymore. Other concurrent transactions will block until the first transaction is either committed or rolled back so they will not have access to either the old counter OR a new one, until all other transactions are finished first.
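The rollback behaviour described above can be sketched as follows, in Python with the standard-library sqlite3 module standing in for MySQL. SQLite lacks `LAST_INSERT_ID(expr)`, so the counter is read back with a SELECT inside the same transaction; the `documents` table and the helper `create_document` are hypothetical, and `fail` simulates an error before the insert.

```python
import sqlite3


def create_document(conn, doc_type, body, fail=False):
    """Take the next counter value and insert the document in one transaction."""
    try:
        with conn:
            conn.execute(
                "UPDATE document_counters SET counter = counter + 1 WHERE type = ?",
                (doc_type,),
            )
            number = conn.execute(
                "SELECT counter FROM document_counters WHERE type = ?", (doc_type,)
            ).fetchone()[0]
            if fail:
                raise RuntimeError("simulated failure before the insert")
            conn.execute(
                "INSERT INTO documents (type, number, body) VALUES (?, ?, ?)",
                (doc_type, number, body),
            )
            return number
    except RuntimeError:
        return None  # the rollback also undid the counter increment


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document_counters (type TEXT PRIMARY KEY, counter INTEGER NOT NULL)")
conn.execute("CREATE TABLE documents (type TEXT, number INTEGER, body TEXT, UNIQUE (type, number))")
conn.execute("INSERT INTO document_counters VALUES ('invoice', 0)")
conn.commit()
numbers = [
    create_document(conn, "invoice", "a"),
    create_document(conn, "invoice", "b", fail=True),
    create_document(conn, "invoice", "c"),
]
```

Because the counter update and the insert share a transaction, the failed attempt does not consume a number.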
SOLUTION:
Let's use 'tbl_test' as an example table, and suppose the field 'Id' has AUTO_INCREMENT attribute
CREATE TABLE tbl_test (
Id int NOT NULL AUTO_INCREMENT ,
Name varchar(255) NULL ,
PRIMARY KEY (`Id`)
)
;
Let's suppose that table has hundreds or thousands of rows already inserted and you don't want to use AUTO_INCREMENT anymore, because when you roll back a transaction the field 'Id' still adds +1 to the AUTO_INCREMENT value.
So to avoid that you might make this:
Let's remove AUTO_INCREMENT value from column 'Id' (this won't delete your inserted rows):
ALTER TABLE tbl_test MODIFY COLUMN Id int(11) NOT NULL FIRST;
Finally, we create a BEFORE INSERT trigger to generate the 'Id' value automatically. Done this way, rolling back a transaction does not affect your Id values.
CREATE TRIGGER trg_tbl_test_1
BEFORE INSERT ON tbl_test
FOR EACH ROW
BEGIN
SET NEW.Id= COALESCE((SELECT MAX(Id) FROM tbl_test),0) + 1;
END;
That's it! You're done!
You're welcome.
$masterConn = mysql_connect("localhost", "root", '');
mysql_select_db("sample", $masterConn);
for($i=1; $i<=10; $i++) {
mysql_query("START TRANSACTION",$masterConn);
$qry_insert = "INSERT INTO `customer` (id, `a`, `b`) VALUES (NULL, '$i', 'a')";
mysql_query($qry_insert,$masterConn);
if($i%2==1) mysql_query("COMMIT",$masterConn);
else mysql_query("ROLLBACK",$masterConn);
mysql_query("ALTER TABLE customer auto_increment = 1",$masterConn);
}
echo "Done";

Should I lock an ISAM table to insert a value into a unique key field?

I have an ISAM table in MySQL that was created similar to this:
create table mytable (
id int not null auto_increment primary key,
name varchar(64) not null );
create unique index nameIndex on mytable (name);
I have multiple processes inserting rows into this table. If two processes try to insert the same "name", I want to make sure that one of them either gets an error or finds the row with the matching "name".
Should I lock the table and in the lock make sure that the name doesn't exist, or should I rely on the server giving an error to one of the processes that try to insert a value that already exists in the unique indexed field?
I'm a bit hesitant to use a lock because I don't want to get into a deadlock situation.
Do not bother locking, your index will prevent duplicates. You should handle the error code from your application.
MySQL should return an error code of 1062 (or SQLSTATE 23000) when your unique key constraint is violated.
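Handling the duplicate-key error in the application might look like the sketch below. It is written in Python with the standard-library sqlite3 module as a stand-in, where the violation surfaces as `sqlite3.IntegrityError` rather than MySQL error 1062; the helper `insert_or_find` is a hypothetical name.

```python
import sqlite3


def insert_or_find(conn, name):
    """Insert the name, or return the existing row's id if it is taken."""
    try:
        with conn:
            return conn.execute(
                "INSERT INTO mytable (name) VALUES (?)", (name,)
            ).lastrowid
    except sqlite3.IntegrityError:
        # The unique index rejected the insert: the name already exists.
        return conn.execute(
            "SELECT id FROM mytable WHERE name = ?", (name,)
        ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
ids = [insert_or_find(conn, n) for n in ("alice", "alice", "bob")]
```

No explicit table lock is taken; the unique index alone decides which insert wins.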
From the way you describe your fear of encountering a deadlock, it sounds as if the cause of deadlocks may not be clearly understood (unless there is more to your querying than described in the question).
A good summary someone else wrote:
Query 1 begins by locking resource A
Query 2 begins by locking resource B
Query 1, in order to continue, needs a lock on resource B, but Query 2 is locking that resource, so Query 1 starts waiting for it to release
In the meantime, Query 2 tries to finish, but it needs a lock on resource A in order to finish, but it can't get that because Query 1 has the lock on that.