Ignore duplicate key upon insert with PDO - mysql

Is there a generic way to INSERT IGNORE with PDO that will work on all database drivers?
If not, is it fair to assume the following would work:
try {
    $stmt = $db->prepare("INSERT INTO link_table (id1, id2) VALUES (:id1, :id2)");
    $stmt->execute(array( ':id1' => $id1, ':id2' => $id2 ));
}
catch (PDOException $ex) {
    // Thanks to comment by Mike:
    // Re-throw exception if it wasn't a constraint violation.
    if ($ex->getCode() != 23000)
        throw $ex;
}

AFAIK, no, there is no generic version that works with all database drivers. INSERT IGNORE and INSERT ... ON DUPLICATE KEY UPDATE are specific to MySQL.
Checking for an existing record by selecting it first, or deleting an existing record and re-inserting are both prone to problems, including race conditions and possible foreign key constraint violations or cascading deletes.
I think your approach is probably the safest. You can always check the error code if you want to determine the cause of the exception - see:
http://docstore.mik.ua/orelly/java-ent/jenut/ch08_06.htm
I think you might want to check for code 23000, the SQLSTATE code for an integrity constraint violation.
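The catch-and-rethrow pattern is not tied to PHP. Here is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for PDO and MySQL. The function name insert_link, the in-memory database and the SQLite-specific message check are assumptions made for illustration. Other drivers expose the error class differently (PDO through the SQLSTATE code, for instance).

```python
import sqlite3


def insert_link(conn, id1, id2):
    """Insert a row; return False on a duplicate key, re-raise anything else."""
    try:
        with conn:  # commits on success, rolls back if the insert raises
            conn.execute(
                "INSERT INTO link_table (id1, id2) VALUES (:id1, :id2)",
                {"id1": id1, "id2": id2},
            )
        return True
    except sqlite3.IntegrityError as ex:
        # Assumption: SQLite reports duplicate keys with this message text.
        # Other integrity errors (NOT NULL, foreign key, ...) are re-raised.
        if "UNIQUE constraint failed" in str(ex):
            return False
        raise


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE link_table ("
    " id1 INTEGER NOT NULL,"
    " id2 INTEGER NOT NULL,"
    " PRIMARY KEY (id1, id2))"
)

first = insert_link(conn, 1, 2)   # new row
second = insert_link(conn, 1, 2)  # same key again, silently skipped
```

Swallowing only the duplicate-key case matters because the same error class also covers other constraint failures, which should still surface.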

Your code would probably work, but other things can go wrong while executing a PDO statement. You will want to check that you really received a constraint violation, and that it is the kind you expect (for example, you could instead receive a foreign key violation when inserting a pair of values that has no corresponding rows in one of the referenced tables). AFAIK, there is no cross-DBMS way to achieve this, though.
For DBMSes that support transactions (Oracle, SQL Server, PostgreSQL, MySQL to some extent, ...), here's a different approach:
start transaction
delete all rows that match the row to be inserted (deleting zero or one rows)
insert
commit
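The four steps above could look roughly like this. It is a sketch in Python with the standard-library sqlite3 module standing in for the DBMSes listed; the function name delete_then_insert is hypothetical. Note that the DELETE can still trigger cascading deletes or foreign key errors on tables that reference the row.

```python
import sqlite3


def delete_then_insert(conn, id1, id2):
    # "with conn" wraps both statements in one transaction:
    # commit if both succeed, roll back if either raises.
    with conn:
        # Deletes zero or one rows, because (id1, id2) is the primary key.
        conn.execute(
            "DELETE FROM link_table WHERE id1 = ? AND id2 = ?", (id1, id2)
        )
        conn.execute(
            "INSERT INTO link_table (id1, id2) VALUES (?, ?)", (id1, id2)
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE link_table ("
    " id1 INTEGER NOT NULL,"
    " id2 INTEGER NOT NULL,"
    " PRIMARY KEY (id1, id2))"
)

delete_then_insert(conn, 1, 2)
delete_then_insert(conn, 1, 2)  # second call replaces the row, no error
row_count = conn.execute("SELECT COUNT(*) FROM link_table").fetchone()[0]
```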

Related

SQL standard UPSERT call

I'm looking for a standard SQL "UPSERT" statement: a single call that inserts a row, or updates it if it already exists.
I'm looking for a working, efficient and cross-platform call.
I've seen MERGE, UPSERT, REPLACE, INSERT ... ON DUPLICATE KEY UPDATE, but none of them meets these needs.
BTW I use MySQL, and HSQLDB for unit tests. I understand that HSQLDB is limited and may not cover what I need, but I couldn't find a standard way even without it.
A statement that works only on MySQL and HSQLDB would also be enough for now.
I've been looking around for a while and couldn't get an answer.
My table:
CREATE TABLE MY_TABLE (
    MY_KEY varchar(50) NOT NULL,
    MY_VALUE varchar(50) DEFAULT NULL,
    TIME_STAMP bigint NOT NULL,
    PRIMARY KEY (MY_KEY)
);
Any idea?
The only solution that is supported by both MySQL and HSQLDB is to query the rows you intend to replace, and conditionally either INSERT or UPDATE. This means you have to write more application code to compensate for the differences between RDBMS implementations.
START TRANSACTION.
SELECT ... FOR UPDATE.
If the SELECT finds rows, then UPDATE.
Else, INSERT.
COMMIT.
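Here is a sketch of that select-then-write sequence in Python, with the standard-library sqlite3 module as a stand-in. SQLite has no SELECT ... FOR UPDATE, so this sketch takes the write lock up front with BEGIN IMMEDIATE instead, which is an assumption of the example rather than what MySQL or HSQLDB would do. The function name upsert is hypothetical.

```python
import sqlite3


def upsert(conn, key, value, ts):
    # Take the write lock before reading, so no other writer can slip in
    # between the SELECT and the INSERT/UPDATE (SQLite stand-in for FOR UPDATE).
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT 1 FROM MY_TABLE WHERE MY_KEY = ?", (key,)
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE MY_TABLE SET MY_VALUE = ?, TIME_STAMP = ? WHERE MY_KEY = ?",
                (value, ts, key),
            )
        else:
            conn.execute(
                "INSERT INTO MY_TABLE (MY_KEY, MY_VALUE, TIME_STAMP) VALUES (?, ?, ?)",
                (key, value, ts),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None lets the code issue BEGIN/COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE MY_TABLE ("
    " MY_KEY varchar(50) NOT NULL,"
    " MY_VALUE varchar(50) DEFAULT NULL,"
    " TIME_STAMP bigint NOT NULL,"
    " PRIMARY KEY (MY_KEY))"
)

upsert(conn, "k", "v1", 1)  # no row yet: INSERT
upsert(conn, "k", "v2", 2)  # row exists: UPDATE
rows = conn.execute("SELECT MY_KEY, MY_VALUE, TIME_STAMP FROM MY_TABLE").fetchall()
```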
MySQL doesn't support the ANSI SQL MERGE statement. It supports REPLACE and INSERT...ON DUPLICATE KEY UPDATE. See my answer to "INSERT IGNORE" vs "INSERT ... ON DUPLICATE KEY UPDATE" for more on that.
Re comments: Yes, another approach is to just try the INSERT and see if it succeeds. Otherwise, do an UPDATE. If you attempt the INSERT and it hits a duplicate key, it'll generate an error, which turns into an exception in some client interfaces. The disadvantage of doing this in MySQL is that it generates a new auto-increment ID even if the INSERT fails. So you end up with gaps. I know gaps in auto-increment sequence are not ordinarily something to worry about, but I helped a customer last year who had gaps of 1000-1500 in between successful inserts because of this effect, and the result was that they exhausted the range of an INT in their primary key.
As @baraky says, one could instead attempt the UPDATE first, and if that affects zero rows, then do the INSERT instead. My comment on this strategy is that UPDATEing zero rows is not an exception -- you'll have to check for "number of rows affected" after the UPDATE to know whether it "succeeded" or not.
But querying the number of rows affected returns you to the original problem: you have to use different queries in MySQL versus HSQLDB.
HSQLDB:
CALL DIAGNOSTICS(ROW_COUNT);
MySQL:
SELECT ROW_COUNT();
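Many client libraries report the affected-row count themselves, which can spare the application the engine-specific query. Below is a sketch of the update-first strategy in Python, where DB-API cursors expose the count as cursor.rowcount. The standard-library sqlite3 module is used as a stand-in, and the function name update_or_insert is hypothetical.

```python
import sqlite3


def update_or_insert(conn, key, value, ts):
    with conn:  # one transaction around both statements
        cur = conn.execute(
            "UPDATE MY_TABLE SET MY_VALUE = ?, TIME_STAMP = ? WHERE MY_KEY = ?",
            (value, ts, key),
        )
        # Zero affected rows is not an error, so it has to be checked explicitly.
        if cur.rowcount > 0:
            return "updated"
        conn.execute(
            "INSERT INTO MY_TABLE (MY_KEY, MY_VALUE, TIME_STAMP) VALUES (?, ?, ?)",
            (key, value, ts),
        )
        return "inserted"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MY_TABLE ("
    " MY_KEY varchar(50) NOT NULL,"
    " MY_VALUE varchar(50) DEFAULT NULL,"
    " TIME_STAMP bigint NOT NULL,"
    " PRIMARY KEY (MY_KEY))"
)

first = update_or_insert(conn, "k", "v1", 1)
second = update_or_insert(conn, "k", "v2", 2)
```

One caveat when porting this to MySQL: by default MySQL counts only rows whose values actually changed, so an UPDATE that rewrites identical values can report zero rows and send the code down the INSERT path.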
The syntax for doing an upsert in a single command varies by RDBMS.
MySQL: INSERT … ON DUPLICATE KEY UPDATE
HSQLDB: MERGE
Postgres: INSERT … ON CONFLICT …
See Wikipedia for more.
If you want a cross-platform solution, then you'll need to use multiple commands. First check for the existing row, then conditionally insert or update as appropriate.
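For comparison with the per-engine syntax listed above, here is what a single-command upsert looks like in the ON CONFLICT style. It is a sketch run against SQLite (version 3.24 or later, an assumption about the bundled library), whose form resembles the Postgres one. It is not portable to MySQL or HSQLDB.

```python
import sqlite3

# "excluded" refers to the row that the INSERT tried to add.
UPSERT_SQL = """
INSERT INTO MY_TABLE (MY_KEY, MY_VALUE, TIME_STAMP)
VALUES (?, ?, ?)
ON CONFLICT (MY_KEY) DO UPDATE SET
    MY_VALUE = excluded.MY_VALUE,
    TIME_STAMP = excluded.TIME_STAMP
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MY_TABLE ("
    " MY_KEY varchar(50) NOT NULL,"
    " MY_VALUE varchar(50) DEFAULT NULL,"
    " TIME_STAMP bigint NOT NULL,"
    " PRIMARY KEY (MY_KEY))"
)

with conn:
    conn.execute(UPSERT_SQL, ("k", "v1", 1))  # inserts
    conn.execute(UPSERT_SQL, ("k", "v2", 2))  # conflicts on MY_KEY, so updates

rows = conn.execute("SELECT MY_KEY, MY_VALUE, TIME_STAMP FROM MY_TABLE").fetchall()
```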

MySQL DUPLICATE KEY UPDATE fails to update due to a NOT NULL field which is already set

I have a MySQL DB which is using strict mode, so I need to fill all NOT NULL values when I insert a row. The API I'm creating uses only the ON DUPLICATE KEY UPDATE functionality to do both inserts and updates.
The client application complains if any NOT NULL attributes are missing from an insert, which is expected.
Basic example (id is the primary key and there are two NOT NULL fields, aaa and xxx):
INSERT INTO tablename (aaa, xxx, id ) VALUES ( "value", "value", 1)
ON DUPLICATE KEY UPDATE aaa=VALUES(aaa), xxx=VALUES(xxx)
All good so far. Once it is inserted, the system would allow doing updates. Nevertheless, I get the following error when updating only one of the fields.
INSERT INTO tablename (aaa, id ) VALUES ( "newValue", 1)
ON DUPLICATE KEY UPDATE aaa=VALUES(aaa)
java.sql.SQLException: Field 'xxx' doesn't have a default value
This exception is a lie, as the row is already inserted and the xxx attribute has "value" as its value. I would expect the statement above to be equivalent to:
UPDATE tablename SET aaa="newValue" WHERE id=1
I would be glad if someone could shed some light on this issue.
Edit:
I can use the SQL query in PhpMyAdmin successfully to update just one field, so I am afraid that this is not a SQL problem but a driver problem with JDBC. There may be no solution, then.
@Marc B: Your insight is probably true and would indicate what I just described. That would mean that there is a bug in JDBC, as it should not do that check when the insert is of the ON DUPLICATE type, since there may be a value for the row after all. I can't provide real table data, but I believe that everything explained above is quite clear.
@ruakh: It does not fail to insert, nor am I expecting delayed validation. One requirement I have is to do both inserts and updates using the same query, as the servlet does not know whether the row exists or not. The Java API service only fails to update a row that has NOT NULL fields which were already filled when the insert was done. The exception is a lie because the field DOES have a value, as it was inserted before the update.
This is a typical case of DRY / SRP fail; in an attempt to not duplicate code you've created a function that violates the single responsibility principle.
The semantics of an INSERT statement is that you expect no conflicting rows; the ON DUPLICATE KEY UPDATE option is merely there to avoid handling the conflict inside your code, requiring another separate query. This is quite different from an UPDATE statement, where you would expect at least one matching row to be present.
Imagine that MySQL only checked the columns when an INSERT doesn't conflict: if for some reason the row had just been removed from the database, your code that expects to perform an update would have to deal with an exception it doesn't expect. Given the difference in statement behaviour, it's good practice to separate your insert and update logic.
Theory aside, MySQL puts together an execution plan when a query is run; in the case of an INSERT statement it has to assume that the insert might succeed when attempted, because that's the optimal strategy. It avoids having to check indices etc. only to find out later that a column is missing.
This is by design and not a bug in JDBC.

Is inserting a new database entry faster than checking if the entry exists first?

I was once told that it is faster to just run an insert and let the insert fail than to check if a database entry exists and then inserting if it is missing.
I was also told that most databases are heavily optimized for reading rather than writing, so wouldn't a quick check be faster than a slow insert?
Is this a question of the expected number of collisions? (I.e., it's faster to insert only if there is a low chance of the entry already existing.) Does it depend on the database type I am running? And for that matter, is it bad practice to have a method that is going to be constantly adding insert errors to my error log?
Thanks.
If the insert is going to fail because of an index violation, it will be at most marginally slower than a check that the record exists. (Both require checking whether the index contains the value.) If the insert is going to succeed, then issuing two queries is significantly slower than issuing one.
You can use INSERT IGNORE so that if the key already exists, the insert command is simply ignored; otherwise the new row is inserted. This way you need to issue only a single query, which both checks for duplicate values and inserts new ones.
Still, be careful with INSERT IGNORE, as it turns EVERY error into a warning. Read this post about INSERT IGNORE:
On duplicate key ignore?
I think INSERT IGNORE INTO ... can be used here; it will either insert the row or ignore it.
If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row still is not inserted, but no error is issued.
If you want to delete the old value and insert a new value, you can use REPLACE instead of INSERT to overwrite old rows.
REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted.
Otherwise, use INSERT IGNORE, as it will either insert the row or ignore it.
Without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row still is not inserted, but no error is issued.
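The "old row is deleted before the new row is inserted" behaviour of REPLACE has a visible side effect: columns not named in the REPLACE fall back to their defaults. Here is a small sketch using the standard-library sqlite3 module in Python. SQLite also accepts REPLACE INTO, and the table t and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, aaa TEXT, xxx TEXT)")

with conn:
    conn.execute("INSERT INTO t (id, aaa, xxx) VALUES (1, 'old', 'value')")
    # Same primary key: the existing row is removed, then a fresh row is
    # inserted with only the columns listed here.
    conn.execute("REPLACE INTO t (id, aaa) VALUES (1, 'new')")

row = conn.execute("SELECT id, aaa, xxx FROM t").fetchone()
# xxx is no longer 'value'; it was reset when the old row was deleted.
```

This is why REPLACE is not a drop-in substitute for an update of selected columns.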
If your intention is to insert if it's a new record OR update the record if it already exists, then how about doing an UPSERT?
Check out - http://vadivel.blogspot.com/2011/09/upsert-insert-and-update-in-sql-server.html
Instead of checking whether the record exists or not, we can try to update it directly. If there is no matching record, then @@ROWCOUNT will be 0. Based on that, we can insert it as a new record. [In SQL Server 2008 you can use the MERGE statement for this.]
EDIT: Please note, I know this works for MS SQL Server, and I don't know about MySQL or Oracle.

Enforcing unique columns

If a column is made unique in a database table structure, is there any need to do a check to see if a new value to be inserted already exists in the table via script? Or would it be fine just to insert values letting the DBMS filter non-new values?
When you try to insert a duplicate value into a unique column, your insert query will fail. So it is a good idea to check whether your insert queries succeeded. Regardless of the situation, though, you should always check whether your insert query went through :)
You should always validate your data before inserting it into the database. That being said, what will happen if you try to insert a non-unique value into a column defined as unique is an SQLException.
In order to validate this before insertion, you could for example do a
select 1
from mytable_with_unique_column
where my_unique_column = myNewValue
If the query returns anything, then simply do not try to insert as that will throw an SQLException.
Verifying the unique constraint yourself is definitely overkill.
When you put a unique constraint on your column, an implicit index is created for that column. Thus, the DBMS can (and will) verify your data much faster. Unfortunately, when you try to insert a duplicate value into your column, you will get a constraint violation exception that you have to deal with (but you have to deal with such an error when using script verification too).
Good luck.
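You can combine the insert statement and validation select into one statement:
insert into mytable_with_unique_column (my_unique_column)
select myNewValue
where not exists
(
    select 1
    from mytable_with_unique_column
    where my_unique_column = myNewValue
)
This will only insert a new row if there isn't already a row with the given unique value. (A VALUES clause cannot take a WHERE, so the new value is supplied through a SELECT; some engines require a dummy FROM clause there, such as FROM dual.)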
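Here is a runnable sketch of the combined statement, using the standard-library sqlite3 module in Python. It is written in the INSERT ... SELECT ... WHERE NOT EXISTS form, which SQLite accepts without a FROM clause. The function name insert_if_absent is hypothetical, and the unique constraint is kept in place as the final safeguard against concurrent writers.

```python
import sqlite3

INSERT_IF_ABSENT = """
INSERT INTO mytable_with_unique_column (my_unique_column)
SELECT :v
WHERE NOT EXISTS (
    SELECT 1
    FROM mytable_with_unique_column
    WHERE my_unique_column = :v
)
"""


def insert_if_absent(conn, value):
    with conn:
        cur = conn.execute(INSERT_IF_ABSENT, {"v": value})
    # rowcount is 1 when the row was inserted, 0 when it already existed.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable_with_unique_column (my_unique_column TEXT UNIQUE)"
)

first = insert_if_absent(conn, "a")
second = insert_if_absent(conn, "a")
total = conn.execute(
    "SELECT COUNT(*) FROM mytable_with_unique_column"
).fetchone()[0]
```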

Best way to test for duplicate keys in a database

This is more of a correctness question. Say I have a table with a primary key column in my database. In my DAO code I have a function called insertRow(string key) that will return true if the key doesn't exist in the table and insert a new row with the key. Otherwise, if a row already exists with that key it returns false. Is it better/worse to have insertRow first check for the existence of the key or just go ahead and do the insert and catch the duplicate key error? Or is saving on a single select statement too trivial an optimization to even bother worrying about?
So in pseudocode:
boolean insertRow(String key){
    //potentially a select + insert
    if(select count(*) from mytable where key = "somekey" == 0){
        insert into mytable values("somekey")
        return true;
    }
    return false;
}
or
boolean insertRow(String key){
    try{
        //always just 1 insert
        insert into mytable values("somekey")
        return true;
    } catch (DuplicateKeyException ex){}
    return false;
}
Insert the row and catch the duplicate key error. That is my personal choice.
I reckon this might perform better, depending on the cost of throwing the exception against the cost of hitting the db twice.
Only by testing both scenarios will you know for sure.
Try the insert, then catch the error.
Otherwise, you could still have a concurrency issue between two active SPIDs (let's say two web users on the system at the same time), in which case you'd have to catch the error anyway:
User1: Check for key "newkey"? Not in database.
User2: Check for key "newkey"? Not in database.
User1: Insert key "newkey". Success.
User2: Insert key "newkey". Duplicate Key Error.
You can mitigate this by using explicit transactions or setting the transaction-isolation level, but it's just easier to use the second technique, unless you are sure only one application thread is running against the database at all times.
In my opinion, this is an excellent case for using exceptions (since the duplicate is exceptional), unless you expect a row to already be there most of the time (i.e., you're doing "insert, but update if exists" logic).
If the purpose of the code is to update, then you should either use the select or an INSERT ... ON DUPLICATE KEY UPDATE clause (if supported by your database engine.) Alternatively, make a stored procedure that handles this logic for you.
The second one, because the first option hits the database twice while the second hits it just once.
The short answer is that you need to test it for yourself. My gut feeling is that doing a small select to check for existence will perform better, but you need to verify that at volume and see which performs better.
In general, I don't like to leave my error checking entirely to the exception engine of whatever it is I'm doing. In other words, if I can check to see if what I'm doing is valid rather than just having an exception thrown, that's generally what I do.
I would suggest, however, using an EXISTS query rather than count(*)
if(exists (select 1 from mytable where key = "somekey"))
    return false
else
    insert the row
All that being said (from an abstract, engine-neutral perspective), I'm pretty sure that MySQL has some keywords that can be used to insert a row into a table only if the primary key doesn't exist. This may be your best bet, assuming you're OK with using MySQL-specific keywords.
Another option would be to place the logic entirely in the SQL statement.
Another two options in MySQL are to use
insert ignore into....
and
insert into .... on duplicate key update field=value
including on duplicate key update field=field
See: http://dev.mysql.com/doc/refman/5.0/en/insert.html
Edit:
You can test affected_rows to see whether or not the insert had an effect.
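Applied to the insertRow question, the affected-rows test could be sketched as follows, in Python with the standard-library sqlite3 module. SQLite spells the statement INSERT OR IGNORE rather than MySQL's INSERT IGNORE, and cursor.rowcount plays the role of affected_rows. Both substitutions, and the name insert_row, are assumptions of this example.

```python
import sqlite3


def insert_row(conn, key):
    """Return True if a new row was inserted, False if the key already existed."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO mytable (key) VALUES (?)", (key,)
        )
    # An ignored duplicate affects zero rows; a real insert affects one.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (key TEXT PRIMARY KEY)")

first = insert_row(conn, "somekey")
second = insert_row(conn, "somekey")
```

No exception handling is needed for the duplicate case. The trade-off is that IGNORE-style statements can also hide errors other than the duplicate key.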
Now that I've found Martin Fowler's book online, a decent way to do it is with a key table; see pg 222 for more info.