I've been searching the internet for a couple of hours now and I'm not sure how to resolve this at all. Brief description: a customer posts orders to our system and can supply a Customer Reference, which our system will reject if that Customer Reference already exists.
I can't make the column UNIQUE in MySQL, as different clients sometimes use the same Customer Reference, and we don't require a Customer Reference, so sometimes it's just left blank.
Originally I was just checking whether the Customer Reference existed and then inserting the row if it did not. This works in 99.99% of cases, but I have a client that mass-sends orders, and those sometimes contain duplicates. Because they post so quickly, the SELECT for the second order can run before the first order's INSERT, and duplicates arise.
I've switched to code like the example below (shortened for the example; this only runs if customerReference is not blank):
INSERT INTO ordersTable (clientID,customerReference,deliveryName) SELECT clientID, customerReference,deliveryName
FROM (SELECT 'clientID' as clientID, 'customerReference' as customerReference, 'deliveryName' as deliveryName) t
WHERE NOT EXISTS (SELECT 1 FROM ordersTable u WHERE u.customerReference = t.customerReference AND u.clientID = t.clientID);
This ends in deadlocks for any process that runs after the original row is inserted, and I was hoping to avoid deadlocks.
My options, it seems, are:
Live with the deadlocking, because I know that if it deadlocks then the row already exists, and instead of checking affected_rows == 0 check affected_rows <= 0.
Try to come up with some column that holds a unique hash per order, based on client ID and Customer Reference, and then do an INSERT IGNORE against that column (see the sketch below).
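A rough sketch of what option 2 could look like; the refHash column and index name are made up for illustration:
ALTER TABLE ordersTable
  ADD COLUMN refHash CHAR(32) NULL,
  ADD UNIQUE KEY uq_refHash (refHash);

INSERT IGNORE INTO ordersTable (clientID, customerReference, deliveryName, refHash)
VALUES ('clientID', 'customerReference', 'deliveryName',
        MD5(CONCAT('clientID', '|', 'customerReference')));
-- affected rows = 0 means an order with this clientID + customerReference already exists
Leaving refHash NULL for orders without a Customer Reference would keep those rows out of the uniqueness check, since NULL values never collide in a UNIQUE index.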
I wasn't too confident in either solution so I thought it couldn't hurt to ask for advice first.
Have you tried using a transaction with a unique constraint on the uniqueID and clientID columns? This will prevent duplicates from being inserted, and you can catch the error that is thrown when a duplicate insert is attempted and handle it as needed.
INSERT INTO ordersTable (clientID,uniqueID,deliveryName)
VALUES ('clientID', 'uniqueID', 'deliveryName')
ON DUPLICATE KEY UPDATE deliveryName = VALUES(deliveryName);
Alternatively, you can also use the INSERT IGNORE statement. This tells the server to insert the new record, but if there is a violation of a UNIQUE index or PRIMARY KEY, to ignore the error and not insert the new record.
INSERT IGNORE INTO ordersTable (clientID,uniqueID,deliveryName)
VALUES ('clientID', 'uniqueID', 'deliveryName');
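Both statements rely on a composite unique key covering clientID and uniqueID; a minimal sketch of that DDL, assuming ordersTable already exists (the index name is made up):
ALTER TABLE ordersTable ADD UNIQUE KEY uq_client_ref (clientID, uniqueID);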
Related
I have a table where I do bulk imports from CSV files.
The first column is the Id field with auto-increment.
What bothers me is that when I do a
SELECT COUNT(*)
and a
SELECT MAX(Id)
I get different values. I would have expected those to be identical. What am I missing?
If you insert 10 rows, delete 5, then insert 10 more, your COUNT(*) will not match MAX(id).
You can also insert an id way ahead of where it should be; for example, in an empty table, INSERT ... (id) VALUES (9000000) will push MAX(id) way up despite the table having only 1 row.
Rolled-back transactions can also interfere with this.
If you want to know the next increment, check the table's AUTO_INCREMENT value, but be aware that this is only a guess; the actual value used may differ by the time you actually get around to inserting.
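A quick way to see the first point in action, using a hypothetical throwaway table (smaller numbers than in the example above):
CREATE TABLE t (id INT AUTO_INCREMENT PRIMARY KEY);
INSERT INTO t (id) VALUES (NULL), (NULL), (NULL);  -- 3 rows, ids 1-3
DELETE FROM t WHERE id <= 2;                       -- remove 2 rows
INSERT INTO t (id) VALUES (NULL), (NULL), (NULL);  -- 3 more rows, ids 4-6
SELECT COUNT(*) FROM t;  -- 4
SELECT MAX(id) FROM t;   -- 6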
If you want them to match, then you need to:
Start with a table where AUTO_INCREMENT=1, as in it's either brand new or has been cleared with TRUNCATE.
Insert using only auto-generated id values, either as one transaction or as a series of transactions that have all been fully committed.
I have a MySQL table of Users, and a table of Actions performed by the Users (linked to that User by the primary key, userid). The Actions table has an incrementing key indx. Whenever I add a new row to that table, I then update the latest column of the relevant Users row with the indx of the row I just added to the Actions table. So something like:
INSERT INTO actions(indx,actionname,userid) VALUES(default, "myaction", 1);
UPDATE users SET latest=LAST_INSERT_ID() WHERE userid=1;
The idea being that I can check for updates for a User by seeing if latest is higher than it was the last time I checked.
My issue is that if more than one connection is open on the database and both try to add an Action for the same User at the same time, connection 2 could conceivably run its INSERT and UPDATE between the INSERT and UPDATE of connection 1, and the latest value of the User they're both trying to update would no longer hold the indx of the most recent Actions entry.
I've been reading up on transactions, isolation levels, etc., but haven't really found a way around this (though my understanding of how these work is pretty shaky, so maybe I just misunderstood). I think I need a way to lock the Actions table until the Users table is updated. This application only gets used by a few hundred users at most, so I don't think the performance hit from momentarily locking the table will be too bad.
So is that something that can be done in MySQL? Is there a better solution? I imagine this general pattern must be pretty common: one table with a bunch of varieties of rows, and a second table with a row that tracks metadata for each variety in table A and needs to be updated atomically each time the first table is changed. So I'm hoping there's a solution that isn't too complex.
Use SELECT ... FOR UPDATE to lock the row, in order to serialize access and prevent race conditions:
START TRANSACTION;
SELECT any_column FROM users WHERE userid=1 FOR UPDATE;
INSERT INTO actions(indx,actionname,userid) VALUES(default, "myaction", 1);
UPDATE users SET latest=LAST_INSERT_ID() WHERE userid=1;
COMMIT;
However, this will slow down your insert rate, because these transactions from all sessions will be serialized.
The better option is to not store the last ID in the users table at all. Just use SELECT MAX(id) FROM actions WHERE userid = xxxx everywhere this number is required. With an index on actions(userid) this query will be very fast (assuming the id column is the primary key of that table), and the inserts will not be slowed down.
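A minimal sketch of that alternative, using the question's column names (indx is the incrementing key of actions; the index name is made up):
CREATE INDEX idx_actions_userid ON actions (userid);
-- wherever the latest action for a user is needed:
SELECT MAX(indx) FROM actions WHERE userid = 1;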
I have a LEFT JOIN query that shows all the fields from a primary table (tblMarkers) and the values from a second table (tblLocations) where there is a matching record.
tblLocations does not have a record for every ID in tblMarkers.
$query ="SELECT `tblMarkers`.*,`tblLocation`.*,`tblLocation`.`ID` AS `markerID`
FROM
`tblMarkers`
LEFT JOIN `tblLocation` ON `tblMarkers`.`ID` = `tblLocation`.`ID`
WHERE
`tblMarkers`.`ID` = $id";
I am comfortable with using UPDATE to update the tblMarkers fields, but how do I UPDATE or INSERT a record into tblLocations if the record does not yet exist in tblLocations?
Also, how do I lock the record I am working on to prevent someone else from doing an update at the same time?
Can I also use UPDATE tblMarkers * or do I have to list every field in the UPDATE statement?
Unfortunately you might have to implement some validation in your outside script. There is an IF statement in SQL, but I'm not sure if you can trigger different commands based on its outcome.
Locking
In terms of locking, you have two options. For MyISAM tables, you can only lock the entire table: http://dev.mysql.com/doc/refman/5.0/en/lock-tables.html
LOCK TABLES users WRITE;
For InnoDB tables, there is no explicit lock for single rows; however, you can use transactions to get exclusive rights during the operation: http://dev.mysql.com/doc/refman/5.0/en/innodb-locks-set.html
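A rough sketch of that approach on InnoDB, assuming ID is the primary key of tblMarkers:
START TRANSACTION;
SELECT * FROM tblMarkers WHERE ID = 1 FOR UPDATE;  -- the row is now locked against other writers until COMMIT
-- ... run your UPDATE / INSERT statements here ...
COMMIT;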
Update
There might be some shorthand notation, but I think you have to list every field in your query. Alternatively, you can always read the entire row, delete it, and insert it again using the shorthand INSERT query. It all depends on how many fields you've got.
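For the UPDATE-or-INSERT part of the question, one more option is INSERT ... ON DUPLICATE KEY UPDATE; this assumes tblLocation.ID is a PRIMARY KEY or UNIQUE key, and the address/city columns are made-up examples:
INSERT INTO tblLocation (ID, address, city)
VALUES (1, '123 Main St', 'Springfield')
ON DUPLICATE KEY UPDATE
  address = VALUES(address),
  city = VALUES(city);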
I have a basic table with columns:
id (primary with AI)
name (unique)
etc
If the unique value doesn't exist yet, INSERT the row; otherwise, UPDATE the row...
INSERT INTO pages (name, etc)
VALUES
('bob',
'randomness')
ON DUPLICATE KEY UPDATE
name = VALUES(name),
etc = VALUES(etc)
The problem is that if it performs an UPDATE, the auto_increment value on the id column goes up. So if a whole bunch of UPDATES are performed, the id auto_increment goes through the roof.
Apparently it was a bug: http://bugs.mysql.com/bug.php?id=28781
...but I'm using InnoDB on mySQL 5.5.8 on shared hosting.
Other people having issues with no solution years ago:
prevent autoincrement on MYSQL duplicate insert and
Why does MySQL autoincrement increase on failed inserts?
Ideas on a fix? Have I maybe structured the database incorrectly somehow?
EDIT: It appears adding innodb_autoinc_lock_mode = 0 to your my.ini file fixes the problem, but what options do I have on shared hosting?
EDIT 2: OK, I think my only option is to change to MyISAM as the storage engine. Being a mega MySQL newbie, I hope that doesn't cause many issues. Yeah?
I don't think there is a way to bypass this behaviour of INSERT ... ON DUPLICATE KEY UPDATE.
You can however put two statements, one UPDATE and one INSERT, in one transaction:
START TRANSACTION ;
UPDATE pages
SET etc = 'randomness'
WHERE name = 'bob' ;
INSERT INTO pages (name, etc)
SELECT
'bob' AS name
, 'randomness' AS etc
FROM dual
WHERE NOT EXISTS
( SELECT *
FROM pages p
WHERE p.name = 'bob'
) ;
COMMIT ;
The ON DUPLICATE KEY functionality of MySQL is essentially the same as doing two separate queries: one to select, then one to either update the selected record or insert a new one. Doing it programmatically is just as fast, prevents this problem in the future, and makes your code more portable.
I want to check if an entry exists; if it does, I'll increment its count field by 1, and if it doesn't, I'll create a new entry with its count initialized to 1. Simple enough, right? It seems so; however, I've stumbled upon a lot of ways to do this and I'm not sure which is the fastest.
1) I could use this to check for an existing entry, then, depending on the result, either update or create:
if(mysql_num_rows(mysql_query("SELECT userid FROM plus_signup WHERE userid = '$userid'")))
2) Or should I use WHERE EXISTS?
SELECT DISTINCT store_type FROM stores
WHERE EXISTS (SELECT * FROM cities_stores
WHERE cities_stores.store_type = stores.store_type);
3) Or use this to insert an entry, or update its count if it already exists:
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
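-- when a row with a=1 already exists, the statement above has the same effect as: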
UPDATE table SET c=c+1 WHERE a=1;
4) Or perhaps I could set the id column as a unique key, then just wait to see whether there's a duplicate-key error on insert? Then I could update that entry instead.
I'll have around 1 million entries to search through; the primary key is currently a BIGINT. All I want to match when searching through the entries is the BIGINT id field; no two entries have the same id at the moment, and I'd like to keep it that way.
Edit: Oh shoot, I created this in the wrong section. I meant to put it into serverfault.
I believe it's 3.
Set a UNIQUE index (or PRIMARY KEY) on the column and then use the syntax of option 3.
It depends on which case will happen more often.
If it is more likely that the record does not exist, I'd go for INSERT IGNORE INTO, checking the affected rows afterwards; if that is 0, the record already exists, so an UPDATE is issued.
Otherwise I'd go for INSERT INTO ... ON DUPLICATE KEY UPDATE.
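A minimal sketch of the INSERT IGNORE variant, reusing the table from option 3 (this assumes a UNIQUE key or PRIMARY KEY on column a):
INSERT IGNORE INTO `table` (a, b, c) VALUES (1, 2, 1);
-- if the affected-row count (ROW_COUNT()) is 0, the row already existed, so increment its counter:
UPDATE `table` SET c = c + 1 WHERE a = 1;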