I was running my Ruby scripts to load data into MySQL, and one of them failed with this error:
Mysql::Error: Duplicate entry '4444281482' for key 'PRIMARY'
The primary key is an auto-increment ID (BIGINT). I was running the script in multiple terminals with different data, using screen, to load into the same table. This problem never happened before, but when it does happen, the scripts in all the terminals tend to hit the same error. The datasets are different; it seems to happen at random.
What is likely to be the cause?
Why would there be duplicates in an auto-increment field?
You mention that you are running the script from different terminals with different data. According to the MySQL manual, and assuming your engine is InnoDB, when a statement inserts an unknown number of rows into a table with an AUTO_INCREMENT column (a bulk insert such as INSERT ... SELECT or LOAD DATA), the engine cannot know in advance how many auto-increment values it will need. This could explain why you are receiving a duplicate key error. With a table-level AUTO-INC lock held to the end of the statement, only one such INSERT statement can execute at a time and the generation of auto-increment numbers won't interleave.
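As a first diagnostic step, you can compare the table's stored counter with the largest ID actually present. A minimal sketch, where your_table is a placeholder for your table name:

-- The Auto_increment value reported by SHOW TABLE STATUS should be
-- greater than MAX(id); if it is not, the counter has fallen behind
-- the data and new inserts will collide with existing rows.
SHOW TABLE STATUS LIKE 'your_table';
SELECT MAX(id) FROM your_table;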
I'm pretty sure I had this problem. It has nothing to do with the client; it's reproducible in my app, Query Browser, the CLI client, etc.
If you don't mind gaps in your ID numbering, you can try
ALTER TABLE `tableName` AUTO_INCREMENT = 4444281492;
(Of course you can skip ahead by more than 10 IDs, say 100000, to be sure; you can always revert the counter to the old value with the same query.)
This will change your auto-increment counter to a greater number and potentially skip the problematic IDs, although I have no idea what the cause of this issue is (in my case it persisted through a mysqld restart and even a full machine reboot).
Oh, and I should add: I did this on a dev server. If this is production, I would advise further investigation.
Related
I'm creating a database which requires several fields to be unique, and I was wondering which method is least expensive for checking that uniqueness:
1. Query the database with a mysqli() call to check if a value exists?
2. Use PHP to download a file of all entries, then check that file and delete it afterwards?
3. Set the columns to a UNIQUE index?
If the best option (which I'm assuming it is) is to set the columns to UNIQUE, then how do you go about handling the error that gets thrown when the value already exists, without breaking out of the function? Or is that even possible?
Querying the database first (option 1) risks race conditions. That is, you SELECT to verify the value isn't already there, so you can INSERT it. But in the brief moment between your SELECT and your INSERT, someone else can slip in and insert the value you were going to add. So you end up having to catch the error anyway.
This may seem unlikely, but there's some old wisdom: "one in a million is next Tuesday." I.e. when we process millions of transactions per day, even a rare fluke is bound to happen sooner than we think.
Option 2 is right out. What happens when the set of entries is 10 million long? 100 million? 1 billion? This solution doesn't scale, so put it out of your mind immediately.
Option 3: yes, use a UNIQUE constraint. Attempt the INSERT and handle the error. This avoids the race condition, because your INSERT's uniqueness check is atomic. That is, no one can slip in between the clock ticks to add a value before you can insert it.
One caveat: in MySQL's InnoDB storage engine, if an INSERT fails because it conflicts with a UNIQUE constraint (or for any other reason), InnoDB does not reverse its allocation of the next auto-increment value. The row is not inserted, but the auto-increment value is generated and discarded. So if such failures are frequent, you can end up skipping a lot of integers in your primary key. I had one case where my customer actually ran out of integers because they were skipping 1500 ID values for each row that was successfully inserted. In their case, I suggested using solution 1 and attempting the insert only when they were pretty sure it was safe, but they still had to handle the error anyway, just in case of the race condition.
Handling the error means checking the return value every time you execute an SQL query. I can't tell you how many questions I read on Stack Overflow where programmers fail to check that execute() returned false, and then wonder why their INSERT failed.
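To make the atomic behavior concrete, here is a minimal sketch, assuming a hypothetical users table with a unique email column:

CREATE TABLE users (
  id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(255) NOT NULL,
  UNIQUE KEY uq_email (email)
);

-- The uniqueness check happens inside the INSERT itself, so there is no
-- window for a race condition:
INSERT INTO users (email) VALUES ('alice@example.com');  -- succeeds
INSERT INTO users (email) VALUES ('alice@example.com');  -- fails with ERROR 1062 (ER_DUP_ENTRY)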
The quick answer: let the database do it if at all possible.
The slower answer depends on how you want to handle exceptions to your uniqueness requirement.
If you never need to override the uniqueness requirement, you can use a UNIQUE index in MySQL. Then you can use ON DUPLICATE KEY to handle the exceptions, as sketched below.
However, if you sometimes need to allow a duplicate, you can't use a UNIQUE key; you'd be better off using a regular INDEX and running a query first to see if the value exists before you insert it.
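For the first case, a short sketch of the ON DUPLICATE KEY approach, reusing the hypothetical users table from the previous answer:

-- If the email already exists, the INSERT degrades into a no-op UPDATE
-- instead of raising an error:
INSERT INTO users (email) VALUES ('alice@example.com')
ON DUPLICATE KEY UPDATE email = VALUES(email);

Note that even this no-op consumes an auto-increment value in InnoDB, as described in the previous answer's caveat.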
Well, the least expensive option is one consideration; the user experience is another.
I would personally go for a query (with a custom message if the key is found) AND a UNIQUE constraint (to keep the database consistent). So 1 + 3.
But if you want the least expensive option, just go with the unique constraint and try to build a comprehensible error message from the output of mysqli_error.
So 1 + 3, or 3 alone, but not 2.
A co-worker just made me aware of a very strange MySQL behavior.
Assume you have a table with an auto_increment field and another field that is set to unique (e.g. a username field). When you try to insert a row with a username that's already in the table, the insert fails, as expected. Yet the auto_increment value is increased, as you can see when you insert a valid new entry after several failed attempts.
For example, when our last entry looks like this...
ID: 10
Username: myname
...and we try five new entries with the same username value, our next valid insert will create a row like this:
ID: 16
Username: mynewname
While this is not a big problem in itself, it seems like a very silly attack vector: you could kill a table by flooding it with failed insert requests, since the MySQL Reference Manual states:
"The behavior of the auto-increment mechanism is not defined if [...] the value becomes bigger than the maximum integer that can be stored in the specified integer type."
Is this expected behavior?
InnoDB is a transactional engine.
This means that in the following scenario:
Session A inserts record 1
Session B inserts record 2
Session A rolls back
there is either the possibility of a gap, or session B would have to block until session A committed or rolled back.
InnoDB's designers (like most other transactional engine designers) chose to allow gaps.
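A minimal sketch of that scenario, assuming a hypothetical table t with an auto-increment id column:

-- Session A:
START TRANSACTION;
INSERT INTO t (val) VALUES ('a');  -- allocates id 1
-- Session B, concurrently:
INSERT INTO t (val) VALUES ('b');  -- allocates id 2 and commits
-- Session A:
ROLLBACK;          -- id 1 is discarded and never reused
SELECT id FROM t;  -- returns only 2: a permanent gap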
From the documentation:
When accessing the auto-increment counter, InnoDB uses a special table-level AUTO-INC lock that it keeps to the end of the current SQL statement, not to the end of the transaction. The special lock release strategy was introduced to improve concurrency for inserts into a table containing an AUTO_INCREMENT column
…
InnoDB uses the in-memory auto-increment counter as long as the server runs. When the server is stopped and restarted, InnoDB reinitializes the counter for each table for the first INSERT to the table, as described earlier.
If you are afraid of the id column wrapping around, make it BIGINT (8 bytes long).
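If you need to widen an existing column, a one-line sketch (the table and column names are placeholders):

ALTER TABLE t MODIFY id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;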
Without knowing the exact internals, I would say yes, the auto-increment should allow for skipped values due to failed inserts. Say you are doing a banking transaction, or some other operation where the whole transaction and its multiple records go in as all-or-nothing. If you try your insert, get an ID, then stamp all subsequent detail records with that transaction ID and insert them, you need your uniqueness guaranteed. If multiple people are hammering the database, they too need to get their own transaction IDs that won't conflict with yours when their transactions commit. If something fails in the first transaction, no harm done: there are no dangling records downstream.
Old post, but this may help people:
You may have to set innodb_autoinc_lock_mode to 0 or 2.
System variables that take a numeric value can be specified as --var_name=value on the command line or as var_name=value in option files.
Command-Line parameter format:
--innodb-autoinc-lock-mode=0
OR
Open your my.ini (my.cnf on Unix) and add the following line:
innodb_autoinc_lock_mode=0
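After restarting the server, you can verify that the setting took effect; the variable is read-only at runtime, so it can only be changed via the command line or the option file:

SHOW VARIABLES LIKE 'innodb_autoinc_lock_mode';
-- 0 = traditional, 1 = consecutive, 2 = interleaved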
I know that this is an old article, but since I also couldn't find the right answer, here is a way I found to do it. You have to wrap your query in an if statement. It's usually INSERT queries, or INSERT ... ON DUPLICATE KEY queries, that break the consecutive auto-increment order, so for regular inserts use:
$email = $mysqli->real_escape_string($email);
// Check whether the value already exists before inserting
$exists = $mysqli->query("SELECT 1 FROM users WHERE email = '$email'")->num_rows > 0;
if (!$exists) {
    // your INSERT query goes here
    $mysqli->query("INSERT INTO users (email) VALUES ('$email')");
}
And instead of INSERT ... ON DUPLICATE KEY UPDATE, use a plain UPDATE ... SET ... WHERE query (inside or outside an if statement, it doesn't matter); a REPLACE INTO query also seems to work.
I have a table that stores messages from one user to another: messages(user_id, friend_id, message, created_date). My primary key is (friend_id, created_date). This prevents duplicate messages (AFAIK) because duplicates will fail to insert.
Right now this is OK, because my code generates about 20 of these queries at a time per user and I only have one user. But if there were hundreds or thousands of users, would this create a bottleneck in my database with all the failed transactions? And if so, what kinds of things could I do to improve the situation?
EDIT:
The boiled-down question: should I use the primary key constraint, check outside of MySQL, or use some other MySQL functionality to keep duplicates out of the database?
It should be fine, as MySQL will just do a primary key lookup internally and ignore the record (I'm assuming you're using INSERT IGNORE). If you were checking whether rows exist before inserting, MySQL would still check again when you insert. So if most inserts are going to succeed, you're saving an extra check. Only if the vast majority of inserts were failing (not likely) might the savings from not sending unnecessary data outweigh the occasional repeated check.
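A minimal sketch of the INSERT IGNORE approach against the messages table described above (the literal values are placeholders):

-- A row whose (friend_id, created_date) already exists is silently
-- skipped instead of raising a duplicate-key error:
INSERT IGNORE INTO messages (user_id, friend_id, message, created_date)
VALUES (1, 2, 'hello', NOW());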
For example, if I have an auto-numbered field, I add new records without specifying this field and let the DB engine pick the value for me.
So, will it pick the number of the deleted record? If yes, when?
(SQL Server, MySQL.)
Follow-up question: What happens when DB engine runs out of numbers to use for primary keys?
No, numerical primary keys will not be reused, unless you specify them manually (which you should really avoid!).
AFAIK, this could happen in MySQL:
How AUTO_INCREMENT Handling Works in InnoDB:
InnoDB uses the in-memory auto-increment counter as long as the server runs. When the server is stopped and restarted, InnoDB reinitializes the counter for each table for the first INSERT to the table, as described earlier.
After a restart of the server, InnoDB can reuse previously generated auto_increment values.
Suggested fix: an InnoDB table should not lose track of the next number for its auto_increment column after a restart.
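A sketch of that behavior, assuming a hypothetical table t; note that MySQL 8.0 made the counter persistent, so this applies to older versions:

INSERT INTO t (val) VALUES ('a'), ('b'), ('c');  -- ids 1, 2, 3
DELETE FROM t WHERE id = 3;
-- ... mysqld is restarted; InnoDB reinitializes the counter to MAX(id) + 1 ...
INSERT INTO t (val) VALUES ('d');  -- gets id 3 again: the value is reused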
It depends on the auto-numbering system. If you're using a sequence of any kind, the numbers of deleted records will not be reused, as the sequence does not know about them.
Generally, no, the numbers are not reused.
However, in products like Oracle you can specify a sequence generator that cycles around and will reuse numbers.
Whether those are the numbers of deleted records or not is your application's problem.
This question needs to be made more precise:
... "with Oracle Sequences"
... "with MySQL autonumber columns"
... etc...
As long as you create the table correctly, you will not reuse numbers.
However, you can RESEED the identity column (in MSSQL, anyway) using the following:
-- Enter the number of the last valid entry in the table, not the next number to be used
DBCC CHECKIDENT ([TableName], RESEED, [NumberYouWantToStartAt])
This is of course insane... and should never be done :)
MySQL will not reuse IDs unless you truncate the table or delete from the table with no WHERE clause (in which case MySQL, internally, simply does a truncate).
Not specifically. If the key is being read from a sequence or an auto-incrementing identity column, the sequence will just plug along and produce the next value. However, you can deactivate this (SET IDENTITY_INSERT ON in SQL Server) and put any number you want in the column, as long as it doesn't violate the uniqueness constraint.
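For the SQL Server case, a minimal sketch (dbo.MyTable and its columns are placeholders):

SET IDENTITY_INSERT dbo.MyTable ON;
-- Supply an explicit identity value; it must simply not collide with an
-- existing one:
INSERT INTO dbo.MyTable (id, name) VALUES (42, 'manual');
SET IDENTITY_INSERT dbo.MyTable OFF;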
Yeah, it really depends on how you generate the ID.
For example, if you are using a GUID as the primary key, most implementations of generating a new random GUID are unlikely to produce an existing GUID again, though it will happen given enough time. If the GUID is not in the table, the insert statement will go fine, but if the GUID is already there, you will get a primary key constraint violation.
I consider the MySQL "feature" of reusing IDs a bug.
Consider something like processing file uploads. Using the database ID as a filename is good practice: simple, no risk of exploits with user-supplied filenames, etc.
You can't really make everything transactional when the filesystem is involved. You have to commit the database transaction and then write the file, or write the file and then commit the database transaction; if one or both fail, or you have a crash, or your network filesystem has a fit, you might end up with a valid record in the database and no file, or a file without a database record, because the combination is not atomic.
If such a problem happens, and the first thing the server does when it comes back is reuse the IDs, and thus overwrite the files, of rolled-back transactions, that's terrible. Those files could have been useful.
No. Imagine if your bank decided to reuse your account_id. Argh!