Too many auto increments with ON DUPLICATE KEY UPDATE - mysql

I have a basic table with columns:
id (primary with AI)
name (unique)
etc
If the unique column doesn't exist, INSERT the row, otherwise UPDATE the row....
INSERT INTO pages (name, etc)
VALUES
('bob', 'randomness')
ON DUPLICATE KEY UPDATE
name = VALUES(name),
etc = VALUES(etc)
The problem is that if it performs an UPDATE, the auto_increment value on the id column goes up. So if a whole bunch of UPDATES are performed, the id auto_increment goes through the roof.
Apparently it was a bug: http://bugs.mysql.com/bug.php?id=28781
...but I'm using InnoDB on MySQL 5.5.8 on shared hosting.
Other people having issues with no solution years ago:
prevent autoincrement on MYSQL duplicate insert and
Why does MySQL autoincrement increase on failed inserts?
Ideas on a fix? Have I maybe structured the database incorrectly somehow?
EDIT: It appears that adding innodb_autoinc_lock_mode = 0 to the my.ini file fixes the problem, but what options do I have on shared hosting?
EDIT 2: OK, I think my only option is to change the storage engine to MyISAM. Being a mega MySQL newbie, I hope that doesn't cause many issues. Yeah?
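For reference, a minimal sketch of the two options mentioned in the edits, assuming the pages table from above. innodb_autoinc_lock_mode can only be set at server startup, so on shared hosting it can usually be read but not changed:

```sql
-- Show the current InnoDB auto-increment lock mode (0, 1, or 2).
-- This variable is read-only at runtime; it must be set in the server
-- configuration file and requires a restart to change.
SELECT @@innodb_autoinc_lock_mode;

-- Switch a single table to MyISAM. Note that MyISAM has no transactions
-- and no foreign key enforcement, so check that nothing relies on them.
ALTER TABLE pages ENGINE = MyISAM;

-- Confirm which engine the table now uses.
SHOW TABLE STATUS LIKE 'pages';
```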

I don't think there is a way to bypass this behaviour of INSERT ... ON DUPLICATE KEY UPDATE.
You can however put two statements, one UPDATE and one INSERT, in one transaction:
START TRANSACTION ;
UPDATE pages
SET etc = 'randomness'
WHERE name = 'bob' ;
INSERT INTO pages (name, etc)
SELECT
'bob' AS name
, 'randomness' AS etc
FROM dual
WHERE NOT EXISTS
( SELECT *
FROM pages p
WHERE p.name = 'bob'
) ;
COMMIT ;

The ON DUPLICATE KEY functionality of MySQL has much the same effect as doing two separate queries: one to select, then one to either update the selected record or insert a new one. Doing this programmatically is about as fast, avoids this problem in the future, and makes your code more portable. Unlike the single statement, though, the two queries are not atomic unless they run inside one transaction.

Related

Resolve MySQL deadlock on INSERT INTO SELECT NOT EXISTS

I've been searching the internet for a couple hours now and I'm not sure how to resolve this at all. So brief description is a customer posts orders to our system and they can supply a Customer Reference that our system will reject if that Customer Reference already exists.
I can't make the column in MySQL UNIQUE as different clients sometimes use the same Customer Reference and we do not require the Customer Reference so sometimes it's just left blank.
Originally I was just checking whether the Customer Reference existed, if necessary, and then inserting the row if it did not. This works in 99.99% of cases, but I have a client that mass-sends orders, and those sometimes contain duplicates. Because they post quickly, the SELECT for one order can run before the INSERT of the previous one has completed, and duplicates arise.
I've switched to code like the one below (shortened for the example; it only runs if customerReference is not blank):
INSERT INTO ordersTable (clientID,customerReference,deliveryName) SELECT clientID, customerReference,deliveryName
FROM (SELECT 'clientID' as clientID, 'customerReference' as customerReference, 'deliveryName' as deliveryName) t
WHERE NOT EXISTS (SELECT 1 FROM ordersTable u WHERE u.customerReference = t.customerReference AND u.clientID = t.clientID);
This ends in deadlocks for any processes after the original row is inserted. I was hoping to avoid deadlocks?
My options it seems are:
Live with it deadlocking because I know if it deadlocks then the row already exists and instead of looking at affected_rows ==0 make it affected_rows <= 0.
Try to come up with some column that will make a unique record hash per order based on client ID and Customer Reference? and then do an "INSERT IGNORE" for that column?
I wasn't too confident in either solution so I thought it couldn't hurt to ask for advice first.
Have you tried using a transaction with a unique constraint on the uniqueID and clientID columns? This will prevent duplicates from being inserted, and you can catch the error that is raised when a duplicate insert is attempted and handle it as needed.
INSERT INTO ordersTable (clientID,uniqueID,deliveryName)
VALUES ('clientID', 'uniqueID', 'deliveryName')
ON DUPLICATE KEY UPDATE deliveryName = VALUES(deliveryName);
Ok, you can also use "INSERT IGNORE" statement. This statement tells the server to insert the new record, but if there is a violation of a UNIQUE index or PRIMARY KEY, ignore the error and don't insert the new record.
INSERT IGNORE INTO ordersTable (clientID,uniqueID,deliveryName)
VALUES ('clientID', 'uniqueID', 'deliveryName');
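A minimal sketch of how the unique constraint could be set up with the question's own columns. The index name and the sample values are hypothetical. MySQL UNIQUE indexes allow any number of NULLs, so storing a missing reference as NULL (rather than an empty string) keeps blank references from colliding:

```sql
-- Hypothetical index name; uniqueness is per client, so two different
-- clients can still use the same customerReference.
ALTER TABLE ordersTable
  ADD UNIQUE KEY uq_client_reference (clientID, customerReference);

-- A blank reference stored as NULL never counts as a duplicate.
INSERT IGNORE INTO ordersTable (clientID, customerReference, deliveryName)
VALUES (17, NULL, 'No reference supplied');

-- First insert of a reference succeeds.
INSERT IGNORE INTO ordersTable (clientID, customerReference, deliveryName)
VALUES (17, 'REF-001', 'First order');

-- A second insert of the same (clientID, customerReference) is skipped.
INSERT IGNORE INTO ordersTable (clientID, customerReference, deliveryName)
VALUES (17, 'REF-001', 'Duplicate order');

-- 0 when the preceding INSERT IGNORE was skipped as a duplicate,
-- which the application can treat as "reference already exists".
SELECT ROW_COUNT();
```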

MySQL concurrency and auto_incrementing key

I have a MySQL table of Users, and a table of Actions performed by the Users (linked to that User by the primary key, userid). The Actions table has an incrementing key indx. Whenever I add a new row to that table, I then update the latest column of the relevant Users row with the indx of the row I just added to the Actions table. So something like:
INSERT INTO actions(indx,actionname,userid) VALUES(default, "myaction", 1);
UPDATE users SET latest=LAST_INSERT_ID() WHERE userid=1;
The idea being that I can check for updates for a User by seeing if the latest is higher than the last time I checked.
My issue is that if more than one connection is opened on the database and they try and add an Action for the same User at the same time, connection2 could conceivably run their INSERT and UPDATE between the INSERT and update of connection1, and the latest entry of the user they're both trying to update will no longer have the indx of the most recent action entry.
I've been reading up on transactions, isolation levels, etc., but haven't really found a way around this (though my understanding of how these work exactly is pretty shaky, so maybe I just misunderstood). I think I need a way to lock the Actions table until the User table is updated. This application only gets used by a few hundred users tops, so I don't think the performance hit due to momentarily locking the table will be too bad.
So is that something that can be done in MySQL? Is there a better solution? I imagine this general pattern must be pretty common: having one table with a bunch of varieties of rows, and a second table with a row that tracks meta data for each variety in table A and needs to be updated atomically each time that first table is changed. So I'm hoping there's a solution that isn't too complex
Use SELECT ... FOR UPDATE to lock the user's row, which serializes access to it and prevents race conditions:
START TRANSACTION;
SELECT any_column FROM users WHERE userid=1 FOR UPDATE;
INSERT INTO actions(indx,actionname,userid) VALUES(default, "myaction", 1);
UPDATE users SET latest=LAST_INSERT_ID() WHERE userid=1;
COMMIT;
However, this will slow down your INSERT rate, because all these transactions from all sessions will be serialized.
The better option is to not store the last ID in the users table at all. Just use SELECT max( id ) FROM actions WHERE userid = xxxx in all places where this number is required. With an index on actions( userid ) this query will be very fast (assuming that the id column is the primary key of this table), and the inserts will not be slowed down.
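A minimal sketch of that second option, using the question's indx column in place of id. The index name is hypothetical:

```sql
-- Hypothetical index name. In InnoDB a secondary index also carries the
-- primary key, so MAX(indx) per user can be answered from this index.
CREATE INDEX idx_actions_userid ON actions (userid);

-- Latest action for one user, computed on demand instead of being
-- stored in users.latest.
SELECT MAX(indx) AS latest
FROM actions
WHERE userid = 1;
```

Because nothing is written to the users table, concurrent inserts into actions no longer have to wait for each other.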

How to update a row and insert one if it doesn't exist, without wrongly raising auto_increment [duplicate]

I have table structure like this
when I insert row to the table I'm using this query:
INSERT INTO table_blah ( material_item, ... hidden ) VALUES ( data, ... data ) ON DUPLICATE KEY UPDATE id = id, material_item = data, ... hidden = data;
when I first insert data without triggering the ON DUPLICATE KEY the id increments fine:
but when the ON DUPLICATE KEY triggers and i INSERT A NEW ROW the id looks odd to me:
How can I keep the auto increment, increment properly even when it triggers ON DUPLICATE KEY?
This behavior is documented (paragraph in parentheses):
If you specify ON DUPLICATE KEY UPDATE, and a row is inserted that
would cause a duplicate value in a UNIQUE index or PRIMARY KEY, MySQL
performs an UPDATE of the old row. For example, if column a is
declared as UNIQUE and contains the value 1, the following two
statements have similar effect:
INSERT INTO table (a,b,c) VALUES (1,2,3) ON DUPLICATE KEY UPDATE c=c+1;
UPDATE table SET c=c+1 WHERE a=1;
(The effects are not identical for
an InnoDB table where a is an auto-increment column. With an
auto-increment column, an INSERT statement increases the
auto-increment value but UPDATE does not.)
Here is a simple explanation. MySQL attempts the insert first, and this is when the auto-increment counter advances. Once advanced, it stays that way. Then the duplicate is detected and the update happens instead, so the reserved id value is never used and leaves a gap.
You should not depend on auto_increment having no gaps. If that is a requirement, the overhead on the updates and inserts is much larger. Essentially, you need to put a lock on the entire table, and renumber everything that needs to be renumbered, typically using a trigger. A better solution is to calculate incremental values on output.
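As a sketch of calculating incremental values on output, assuming MySQL 8.0 or later (window functions) and the table_blah columns from the question:

```sql
-- row_num is a gap-free sequence computed at query time;
-- the stored id keeps whatever gaps it has.
SELECT
  ROW_NUMBER() OVER (ORDER BY id) AS row_num,
  id,
  material_item
FROM table_blah
ORDER BY id;
```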
This question is fairly old, but I'll answer in case it helps someone. To solve the auto-increment problem, run the following code before the INSERT ... ON DUPLICATE KEY UPDATE part and execute it all together:
SET @NEW_AI = (SELECT COALESCE(MAX(`the_id`), 0) + 1 FROM `table_blah`);
SET @ALTER_SQL = CONCAT('ALTER TABLE `table_blah` AUTO_INCREMENT =', @NEW_AI);
PREPARE NEWSQL FROM @ALTER_SQL;
EXECUTE NEWSQL;
Put together, it should look something like this:
SET @NEW_AI = (SELECT COALESCE(MAX(`the_id`), 0) + 1 FROM `table_blah`);
SET @ALTER_SQL = CONCAT('ALTER TABLE `table_blah` AUTO_INCREMENT =', @NEW_AI);
PREPARE NEWSQL FROM @ALTER_SQL;
EXECUTE NEWSQL;
INSERT INTO `table_blah` (`the_col`) VALUES("the_value")
ON DUPLICATE KEY UPDATE `the_col` = "the_value";
I had the same frustration of gaps in the auto increment but I found a way to avoid it.
Regarding the previously discussed "overheads": when I first wrote my DB query code, it ran so many separate queries that it took 5 hours. Once I added "ON DUPLICATE KEY UPDATE", it got down to about 50 seconds. Amazing! Anyway, I solved it by using 2 queries, which doubles the time to about 2 minutes, which is still fine.
First I ran a SQL query that writes all the data (updates and inserts), but I included "IGNORE" in it, so it skips the rows that already exist and only inserts the new ones. Assuming your auto_increment previously had no gaps, it will still have no gaps, because only new records are added. I believe it is updates that cause the gaps. So for inserts:
"INSERT IGNORE INTO mytablename(stuff,stuff2) VALUES "
Next I ran the "ON DUPLICATE KEY UPDATE" variation of that SQL query. It keeps the IDs intact because all the records being updated already have IDs. The only thing it breaks is the auto_increment value, which gets incremented when a new record is added (or updated). So the solution is to patch this auto_increment value back to what it was before, once you have applied the updates.
To patch the auto increment value use this sql in your php:
"ALTER TABLE mytablename AUTO_INCREMENT = " . ($TableCount + 1);
This works because when you do the updates you are not increasing the amount of records. Therefore we can use the tablecount to know what the next ID should be. You set $TableCount to the table count, then we add 1 and that's the next auto increment number.
This is cheap and dirty but it seems to work. Could be bad using this while something else is writing to the db though.
Changing the database engine from InnoDB to MyISAM will resolve your issue.
I often deal with this by creating a temporary table, recording in the temporary table whether the record is new or not, doing an UPDATE only on the rows that are not new, and doing an INSERT with the new rows. Here's a complete example:
## THE SETUP
# This is the table we're trying to insert into
DROP TABLE IF EXISTS items;
CREATE TABLE items (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100) UNIQUE,
price INT
);
# Put a few rows into the table
INSERT INTO items (name, price) VALUES
("Bike", 200),
("Basketball", 10),
("Fishing rod", 25)
;
## THE INSERT/UPDATE
# Create a temporary table to help with the update
DROP TEMPORARY TABLE IF EXISTS itemUpdates;
CREATE TEMPORARY TABLE itemUpdates (
name VARCHAR(100) UNIQUE,
price INT,
isNew BOOLEAN DEFAULT TRUE
);
# Change the price of the Bike and Basketball and add a new Tent item
INSERT INTO itemUpdates (name, price) VALUES
("Bike", 150),
("Basketball", 8),
("Tent", 100)
;
# For items that already exist, set isNew false
UPDATE itemUpdates
JOIN items
ON items.name = itemUpdates.name
SET isNew = false;
# UPDATE the already-existing items
UPDATE items
JOIN itemUpdates
ON items.name = itemUpdates.name
SET items.price = itemUpdates.price
WHERE itemUpdates.isNew = false;
# INSERT the new items
INSERT INTO items (name, price)
SELECT name, price
FROM itemUpdates
WHERE itemUpdates.isNew = true;
# Check the results
SELECT * FROM items;
# Results:
# ID | Name | Price
# 1 | Bike | 150
# 2 | Basketball | 8
# 3 | Fishing rod | 25
# 4 | Tent | 100
The INSERT IGNORE INTO approach is simpler, but it ignores any error, which isn't what I want. And I agree that this is strange behavior on the part of MySQL but it's what we've got to work with.
I just thought I'd add this, as I was trying to find an answer to my own problem.
I could not stop the duplicate warning and found it was because I had the column set to TINYINT, which only allows values up to 127. Changing it to SMALLINT, MEDIUMINT, or BIGINT allows for many more.
I don't think this is a problem with MySQL 5.6. See this example.
ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id)
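For context, a sketch of what that clause is documented to do: it makes LAST_INSERT_ID() return the existing row's id when the statement ends up updating. As far as I know it does not by itself stop InnoDB from advancing the counter. The UNIQUE index on material_item and the sample value are assumptions for illustration:

```sql
-- Assumes table_blah has an AUTO_INCREMENT primary key `id` and a
-- (hypothetical) UNIQUE index on material_item.
INSERT INTO table_blah (material_item)
VALUES ('steel')
ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);

-- Returns the id of the newly inserted row, or, when the insert hit a
-- duplicate, the id of the existing row.
SELECT LAST_INSERT_ID();
```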
Adding less of a direct answer and more of a fix to the end results.
If you don't use your auto-increment column as an identification field within your application (and you really shouldn't; a UUID or something of that nature is better practice), and of course if you don't have billions of rows, you can reset your auto-increment field fairly easily.
SET SQL_SAFE_UPDATES = 0;
SET @num := 0;
UPDATE my_table SET id = @num := (@num+1);
ALTER TABLE my_table AUTO_INCREMENT =1;
I kinda hate that this is a thing when doing an INSERT UPDATE in MySQL.
This is not my code. I got it somewhere on SO, but it was so long ago...
Additional note: this is not really an answer to this issue. It's more to help fix an out-of-control auto-increment field.
INSERT INTO table_blah ( material_item, ... hidden ) VALUES ( data, ... data ) ON DUPLICATE KEY UPDATE material_item = data, ... hidden = data
Yes, remove the id = id, as MySQL will automatically match on PRIMARY KEY = PRIMARY KEY...

Replace Into Query Syntax

I want to be able to update a table of the same schema using a "replace into" statement. In the end, I need to be able to update a large table with values that may have changed.
Here is the query I am using to start off:
REPLACE INTO table_name
(visual, inspection_status, inspector_name, gelpak_name, gelpak_location)
VALUES (3, 'Partially Inspected', 'Me', 'GP1234', 'A01');
What I don't understand is how does the database engine know what is a duplicate row and what isn't? This data is extremely important and I can't risk the data being corrupted. Is it as simple as "if all columns listed have the same value, it is a duplicate row"?
I am just trying to figure out an efficient way of doing this so I can update > 45,000 rows in under a minute.
As the documentation says:
REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted.
REPLACE does work much like an INSERT that just overwrites records that have the same PRIMARY KEY or UNIQUE index, however, beware.
Shlomi Noach writes about the problem with using REPLACE INTO here:
But weak hearted people as myself should be aware of the following: it is a heavyweight solution. It may be just what you were looking for in terms of ease of use, but the fact is that on duplicate keys, a DELETE and INSERT are performed, and this calls for a closer look.
Whenever a row is deleted, all indexes need to be updated, and most importantly the PRIMARY KEY. When a new row is inserted, the same happens. Especially on InnoDB tables (because of their clustered nature), this means much overhead. The restructuring of an index is an expensive operation. Index nodes may need to be merged upon DELETE. Nodes may need to be split due to INSERT. After many REPLACE INTO executions, it is most probable that your index is more fragmented than it would have been, had you used SELECT/UPDATE or INSERT INTO ... ON DUPLICATE KEY
Also, there's the notion of "well, if the row isn't there, we create it. If it's there, it simply gets updated". This is false. The row doesn't just get updated, it is completely removed. The problem is, if there's a PRIMARY KEY on that table, and the REPLACE INTO does not specify a value for the PRIMARY KEY (for example, it's an AUTO_INCREMENT column), the new row gets a different value, and this may not be what you were looking for in terms of behavior.
Many uses of REPLACE INTO have no intention of changing PRIMARY KEY (or other UNIQUE KEY) values. In that case, it's better left alone. On a production system I've seen, changing REPLACE INTO to INSERT INTO ... ON DUPLICATE KEY resulted in a tenfold increase in throughput (measured in queries per second) and a drastic decrease in IO operations and in load average.
In summary, REPLACE INTO may be right for your implementation, but you might find it more appropriate (and less risky) to use INSERT ... ON DUPLICATE KEY UPDATE instead.
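A minimal sketch of the difference, using a hypothetical temporary table (replace_demo and its columns are made up for illustration) with an AUTO_INCREMENT primary key and a UNIQUE column:

```sql
CREATE TEMPORARY TABLE replace_demo (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(50) NOT NULL UNIQUE,
  status VARCHAR(50)
);

INSERT INTO replace_demo (name, status)
VALUES ('GP1234', 'Not Inspected');

-- REPLACE finds the duplicate on the UNIQUE column, deletes the old row,
-- and inserts a new one, so the row ends up with a different id.
REPLACE INTO replace_demo (name, status)
VALUES ('GP1234', 'Partially Inspected');
SELECT id, name, status FROM replace_demo;

-- ON DUPLICATE KEY UPDATE changes the existing row in place,
-- so the id it has at this point is preserved.
INSERT INTO replace_demo (name, status)
VALUES ('GP1234', 'Inspected')
ON DUPLICATE KEY UPDATE status = VALUES(status);
SELECT id, name, status FROM replace_demo;

DROP TEMPORARY TABLE replace_demo;
```

Any other table that references the old id would be left pointing at a row that no longer exists after the REPLACE, which is the risk the quoted passage describes.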
or something like that:
insert ignore tbl1 (select * from tbl2);
UPDATE
`tbl1` AS `dest`,
(SELECT * FROM tbl2) AS `src`
SET
dest.field=src.field,
dest.field=if (length(src.field)>0,src.field,dest.field) /* or anything like that*/
WHERE
`dest`.id = `src`.id;
CREATE TEMPORARY TABLE test
(prim INT PRIMARY KEY
,sec INT NOT NULL UNIQUE
,tert INT UNIQUE
,com VARCHAR(255)
);
INSERT INTO test (prim,sec,tert,com)
VALUES (1,2,3,'123')
,(2,3,null,'23n')
,(3,1,null,'31n');
REPLACE INTO test(prim,sec,tert,com)
VALUES (3,3,3,'333');
SELECT *
FROM test;
DROP TEMPORARY TABLE test;
fun times

Which is a faster way for checking for duplicate entries, then creating a new entry?

I want to check if an entry exists. If it does, I'll increment its count field by 1; if it doesn't, I'll create a new entry and initialize its count to 1. Simple enough, right? It seems so, however, I've stumbled upon a lot of ways to do this and I'm not sure which way is the fastest.
1) I could use this to check for an existing entry, then depending, either update or create:
if(mysql_num_rows(mysql_query("SELECT userid FROM plus_signup WHERE userid = '$userid'")))
2) Or should I use WHERE EXISTS?
SELECT DISTINCT store_type FROM stores
WHERE EXISTS (SELECT * FROM cities_stores
WHERE cities_stores.store_type = stores.store_type);
3) Or use this to insert an entry, then if it exists, update it:
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
UPDATE table SET c=c+1 WHERE a=1;
4) Or perhaps I can set the id column as a unique key then just wait to see if there's a duplicate error on entry? Then I could update that entry instead.
I'll have around 1 million entries to search through, the primary key is currently a bigint. All I want to match when searching through the entries is just the bigint id field, no two entries have the same id at the moment and I'd like to keep it that way.
Edit: Oh shoot, I created this in the wrong section. I meant to put it into serverfault.
I believe it's 3.
Set an INDEX or a UNIQUE constraint and then use the syntax of number 3.
It depends which case will happen more often.
If it is more likely that the record does not exist, I'd go for an INSERT IGNORE INTO, checking the affected rows afterwards; if this is 0 the record already exists, so an UPDATE is issued.
Otherwise I'd go for INSERT INTO ... ON DUPLICATE KEY UPDATE.
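Both variants can be sketched against the question's plus_signup table. This assumes userid is the PRIMARY KEY (or has a UNIQUE index); the signup_count column name and the value 42 are hypothetical:

```sql
-- Variant A: a single statement. Inserts the row with a count of 1,
-- or increments the count if the userid already exists.
INSERT INTO plus_signup (userid, signup_count)
VALUES (42, 1)
ON DUPLICATE KEY UPDATE signup_count = signup_count + 1;

-- Variant B: insert first, then update only if the insert was skipped.
INSERT IGNORE INTO plus_signup (userid, signup_count)
VALUES (42, 1);

-- 1 if the row was inserted, 0 if it already existed.
SELECT ROW_COUNT();

-- Run this from the application only when ROW_COUNT() returned 0.
UPDATE plus_signup
SET signup_count = signup_count + 1
WHERE userid = 42;
```

Variant A needs one round trip in every case; Variant B needs a second statement only when the row already exists.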