SQL standard UPSERT call - mysql

I'm looking for a standard SQL "UPSERT" statement: a single call that inserts a row, or updates it if it already exists.
I'm looking for a working, efficient and cross-platform call.
I've seen MERGE, UPSERT, REPLACE and INSERT ... ON DUPLICATE KEY UPDATE, but none of them meets my needs.
BTW, I use MySQL, and HSQLDB for unit tests. I understand that HSQLDB is limited and may not cover what I need, but I couldn't find a standard way even without it.
A statement that works only on MySQL and HSQLDB would also be enough for now.
I've been looking around for a while and couldn't find an answer.
My table:
CREATE TABLE MY_TABLE (
MY_KEY varchar(50) NOT NULL ,
MY_VALUE varchar(50) DEFAULT NULL,
TIME_STAMP bigint NOT NULL,
PRIMARY KEY (MY_KEY)
);
Any idea?

The only solution that is supported by both MySQL and HSQLDB is to query the rows you intend to replace, and conditionally either INSERT or UPDATE. This means you have to write more application code to compensate for the differences between RDBMS implementations.
START TRANSACTION.
SELECT ... FOR UPDATE.
If the SELECT finds rows, then UPDATE.
Else, INSERT.
COMMIT.
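The steps above can be sketched in application code roughly as follows. This is only an illustration: SQLite (through Python's built-in sqlite3 module) stands in for MySQL/HSQLDB so the sketch runs without a server, BEGIN IMMEDIATE stands in for START TRANSACTION plus SELECT ... FOR UPDATE (which SQLite does not have), and the function name select_then_write is made up for this example.

```python
import sqlite3

DDL = """CREATE TABLE MY_TABLE (
    MY_KEY varchar(50) NOT NULL,
    MY_VALUE varchar(50) DEFAULT NULL,
    TIME_STAMP bigint NOT NULL,
    PRIMARY KEY (MY_KEY)
)"""


def select_then_write(conn, key, value, ts):
    """Hypothetical helper: look the row up, then UPDATE or INSERT accordingly."""
    # Assumption: SQLite has no SELECT ... FOR UPDATE, so the write lock is
    # taken up front with BEGIN IMMEDIATE instead.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT 1 FROM MY_TABLE WHERE MY_KEY = ?", (key,)
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE MY_TABLE SET MY_VALUE = ?, TIME_STAMP = ? WHERE MY_KEY = ?",
                (value, ts, key),
            )
            outcome = "updated"
        else:
            conn.execute(
                "INSERT INTO MY_TABLE (MY_KEY, MY_VALUE, TIME_STAMP) VALUES (?, ?, ?)",
                (key, value, ts),
            )
            outcome = "inserted"
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return outcome


# isolation_level=None leaves transaction control to the explicit statements above.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(DDL)
```

Against MySQL or HSQLDB the same shape would presumably use the driver's own transaction API and a locking SELECT; only the control flow is meant to carry over.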
MySQL doesn't support the ANSI SQL MERGE statement. It supports REPLACE and INSERT...ON DUPLICATE KEY UPDATE. See my answer to "INSERT IGNORE" vs "INSERT ... ON DUPLICATE KEY UPDATE" for more on that.
Re comments: Yes, another approach is to just try the INSERT and see if it succeeds. Otherwise, do an UPDATE. If you attempt the INSERT and it hits a duplicate key, it'll generate an error, which turns into an exception in some client interfaces. The disadvantage of doing this in MySQL is that it generates a new auto-increment ID even if the INSERT fails. So you end up with gaps. I know gaps in auto-increment sequence are not ordinarily something to worry about, but I helped a customer last year who had gaps of 1000-1500 in between successful inserts because of this effect, and the result was that they exhausted the range of an INT in their primary key.
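A minimal sketch of this insert-first approach, again with SQLite via Python's sqlite3 module standing in for MySQL so it can run anywhere. The helper name insert_or_update is hypothetical, and sqlite3.IntegrityError plays the role of the duplicate-key exception a MySQL driver would raise.

```python
import sqlite3

DDL = """CREATE TABLE MY_TABLE (
    MY_KEY varchar(50) NOT NULL,
    MY_VALUE varchar(50) DEFAULT NULL,
    TIME_STAMP bigint NOT NULL,
    PRIMARY KEY (MY_KEY)
)"""


def insert_or_update(conn, key, value, ts):
    """Hypothetical helper: attempt the INSERT, fall back to UPDATE on a duplicate key."""
    try:
        conn.execute(
            "INSERT INTO MY_TABLE (MY_KEY, MY_VALUE, TIME_STAMP) VALUES (?, ?, ?)",
            (key, value, ts),
        )
        outcome = "inserted"
    except sqlite3.IntegrityError:
        # Note: IntegrityError also covers other constraint failures (e.g. NOT NULL),
        # so real code may want to inspect the error before assuming a duplicate key.
        conn.execute(
            "UPDATE MY_TABLE SET MY_VALUE = ?, TIME_STAMP = ? WHERE MY_KEY = ?",
            (value, ts, key),
        )
        outcome = "updated"
    conn.commit()
    return outcome


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
```

This table has no auto-increment column, so the gap problem described above does not arise here; it would matter for tables that do have one.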
As @baraky says, one could instead attempt the UPDATE first, and if that affects zero rows, then do the INSERT instead. My comment on this strategy is that UPDATEing zero rows is not an exception -- you'll have to check for "number of rows affected" after the UPDATE to know whether it "succeeded" or not.
But querying the number of rows affected returns you to the original problem: you have to use different queries in MySQL versus HSQLDB.
HSQLDB:
CALL DIAGNOSTICS(ROW_COUNT);
MySQL:
SELECT ROW_COUNT();
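When the statements are issued from a client library rather than as raw SQL, the driver usually reports the affected-row count itself, which sidesteps the two different diagnostic queries. A rough sketch of the update-first strategy using Python's DB-API cursor.rowcount, with SQLite standing in for MySQL/HSQLDB; the helper name update_or_insert is hypothetical.

```python
import sqlite3

DDL = """CREATE TABLE MY_TABLE (
    MY_KEY varchar(50) NOT NULL,
    MY_VALUE varchar(50) DEFAULT NULL,
    TIME_STAMP bigint NOT NULL,
    PRIMARY KEY (MY_KEY)
)"""


def update_or_insert(conn, key, value, ts):
    """Hypothetical helper: UPDATE first, INSERT only if no row was affected."""
    cur = conn.execute(
        "UPDATE MY_TABLE SET MY_VALUE = ?, TIME_STAMP = ? WHERE MY_KEY = ?",
        (value, ts, key),
    )
    if cur.rowcount == 0:
        # Zero affected rows is not an error, so the count must be checked explicitly.
        conn.execute(
            "INSERT INTO MY_TABLE (MY_KEY, MY_VALUE, TIME_STAMP) VALUES (?, ?, ?)",
            (key, value, ts),
        )
        outcome = "inserted"
    else:
        outcome = "updated"
    conn.commit()
    return outcome


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
```

One caveat to verify for your driver: by default MySQL counts rows actually changed, not rows matched, so an UPDATE that sets identical values can report zero and the follow-up INSERT would then hit a duplicate key.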

The syntax for doing an upsert in a single command varies by RDBMS.
MySQL: INSERT … ON DUPLICATE KEY UPDATE
HSQLDB: MERGE
Postgres: INSERT … ON CONFLICT …
See Wikipedia for more.
If you want a cross platform solution, then you'll need to use multiple commands. First check for the existing row, then conditionally insert or update as appropriate.

Related

Manually increment primary key - Transaction and race condition

This may not be a real world issue but is more like a learning topic.
Using PHP, MySQL and PDO, I know all about auto_increment and lastInsertId(). Suppose the primary key has no auto_increment attribute and we have to use something like SELECT MAX(id) FROM table to retrieve the last id, increment it manually and then INSERT INTO table (id) VALUES (:lastIdPlusOne). The whole code is wrapped in beginTransaction and commit.
Is this approach safe? If users A and B load this script at the same time, what happens in the end? Will both transactions fail? Or will both succeed (for instance, if the last id was 10, A will insert 11 and B will insert 12)?
Note that since I am a PHP & MySQL developer, I am more interested in MySQL's behavior in this case.
If both got the same max, then the one that inserts first will succeed, and other(s) will fail.
To overcome this issue without using auto_increment fields, you may use a trigger before insert that does the job (new.id=max), i.e. the same logic, but in a trigger, so the DB server is the one who controls it.
I'm not sure, though, whether this is 100% safe in a master-master replication environment in case of a server failure.
This is @eggyal's comment, which I quote here:
You must ensure that you use a locking read to fetch the MAX() in the first (select) query; it will then block until the transaction is committed. However, this is very poor design and should not be used in a production system.
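The failure mode described in this answer can be reproduced in a small simulation. This is only a sketch: a single SQLite connection (Python's sqlite3 module) imitates two clients whose reads interleave before either writes, and the table name items and helper next_id are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")  # no auto-increment in use
conn.execute("INSERT INTO items (id) VALUES (10)")


def next_id(conn):
    """Hypothetical helper: the manual MAX(id) + 1 scheme from the question."""
    return conn.execute("SELECT MAX(id) FROM items").fetchone()[0] + 1


# Simulated interleaving: clients A and B both read before either one writes.
id_a = next_id(conn)
id_b = next_id(conn)

conn.execute("INSERT INTO items (id) VALUES (?)", (id_a,))  # A wins the race
try:
    conn.execute("INSERT INTO items (id) VALUES (?)", (id_b,))
    b_result = "inserted"
except sqlite3.IntegrityError:
    b_result = "duplicate key"  # B computed the same id and is rejected
```

With a locking read, as the quoted comment suggests, B's SELECT would instead block until A commits and then see the new maximum.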

MySQL DUPLICATE KEY UPDATE fails to update due to a NOT NULL field which is already set

I have a MySQL DB which is using strict mode, so I need to fill all NOT NULL values when I insert a row. The API I'm creating uses just the ON DUPLICATE KEY UPDATE functionality to do both inserts and updates.
The client application complains if any NOT NULL attribute is missing from an insert, which is expected.
Basic example (id is the primary key and there are two NOT NULL fields, aaa and xxx):
INSERT INTO tablename (aaa, xxx, id ) VALUES ( "value", "value", 1)
ON DUPLICATE KEY UPDATE aaa=VALUES(aaa), xxx=VALUES(xxx)
All good so far. Once it is inserted, the system would allow doing updates. Nevertheless, I get the following error when updating only one of the fields.
INSERT INTO tablename (aaa, id ) VALUES ( "newValue", 1)
ON DUPLICATE KEY UPDATE aaa=VALUES(aaa)
java.sql.SQLException: Field 'xxx' doesn't have a default value
This exception is a lie, as the row is already inserted and the xxx attribute has "value" as its value. I would expect the statement above to be equivalent to:
UPDATE tablename SET aaa="newValue" WHERE id=1
I would be glad if someone can shed some light about this issue.
Edit:
I can use the SQL query in phpMyAdmin successfully to update just one field, so I am afraid that this is not an SQL problem but a driver problem with JDBC. There may be no solution, then.
@Marc B: Your insight is probably true and would indicate what I just described. That would mean that there is a bug in JDBC, as it should not do that check when the insert is of the ON DUPLICATE type, since there may be a default value for the row after all. I can't provide real table data, but I believe that everything explained above is quite clear.
@ruakh: It does not fail to insert, nor am I expecting delayed validation. One requirement I have is to do both inserts and updates using the same query, as the servlet does not know whether the row exists or not. The Java API service only fails to update a row that has NOT NULL fields which were already filled when the insert was done. The exception is a lie because the field DOES have a value, as it was inserted before the update.
This is a typical case of DRY / SRP fail; in an attempt to not duplicate code you've created a function that violates the single responsibility principle.
The semantics of an INSERT statement is that you expect no conflicting rows; the ON DUPLICATE KEY UPDATE option is merely there to avoid handling the conflict inside your code, requiring another separate query. This is quite different from an UPDATE statement, where you would expect at least one matching row to be present.
Imagine that MySQL would only check the columns when an INSERT doesn't conflict and for some reason a row was just removed from the database and your code that expects to perform an update has to deal with an exception it doesn't expect. Given the difference in statement behaviour it's good practice to separate your insert and update logic.
Theory aside, MySQL puts together an execution plan when a query is run; in the case of an INSERT statement it has to assume that the insert might succeed when attempted, because that's the optimal strategy. It avoids having to check indices etc. only to find out later that a column is missing.
This is by design and not a bug in JDBC.

Is inserting a new database entry faster than checking if the entry exists first?

I was once told that it is faster to just run an insert and let the insert fail than to check if a database entry exists and then inserting if it is missing.
I was also told that most databases are heavily optimized for reading rather than writing, so wouldn't a quick check be faster than a slow insert?
Is this a question of the expected number of collisions? (i.e. it's faster to insert only if there is a low chance of the entry already existing.) Does it depend on the database type I am running? And for that matter, is it bad practice to have a method that is going to be constantly adding insert errors to my error log?
Thanks.
If the insert is going to fail because of an index violation, it will be at most marginally slower than a check that the record exists. (Both require checking whether the index contains the value.) If the insert is going to succeed, then issuing two queries is significantly slower than issuing one.
You can use INSERT IGNORE so that if the key already exists, the insert command is just ignored; otherwise the new row is inserted. This way you need to issue only a single query, which both checks for duplicate values and inserts new values.
Still, be careful with INSERT IGNORE as it turns EVERY error into a warning. Read this post about INSERT IGNORE:
On duplicate key ignore?
I think INSERT IGNORE INTO ... can be used here; it will either insert the row or ignore it.
If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row still is not inserted, but no error is issued.
If you want to delete the old value and insert a new value, you can use REPLACE instead of INSERT to overwrite old rows.
REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted.
Otherwise use INSERT IGNORE, as it will either insert the row or ignore it.
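The difference between the two statements can be seen in a short sketch. Note the assumptions: it runs on SQLite through Python's sqlite3 module rather than MySQL, SQLite spells the first statement INSERT OR IGNORE instead of INSERT IGNORE, and the table t with columns k, a, b is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k TEXT PRIMARY KEY, a TEXT, b TEXT)")
conn.execute("INSERT INTO t (k, a, b) VALUES ('x', 'a1', 'b1')")

# MySQL: INSERT IGNORE INTO t ...   SQLite equivalent: INSERT OR IGNORE INTO t ...
# The duplicate key is skipped silently and the existing row is left untouched.
conn.execute("INSERT OR IGNORE INTO t (k, a) VALUES ('x', 'a2')")
after_ignore = conn.execute("SELECT k, a, b FROM t").fetchall()

# REPLACE deletes the conflicting row and inserts a fresh one, so any column
# not listed in the statement (b here) falls back to its default.
conn.execute("REPLACE INTO t (k, a) VALUES ('x', 'a3')")
after_replace = conn.execute("SELECT k, a, b FROM t").fetchall()

print(after_ignore)
print(after_replace)
```

The second result is the reason REPLACE is not a drop-in substitute for an update: the old value of b is lost along with the deleted row.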
If your intention is to insert if it's a new record OR update the record if it already exists, then how about doing an UPSERT?
Check out - http://vadivel.blogspot.com/2011/09/upsert-insert-and-update-in-sql-server.html
Instead of checking whether the record exists or not, we can try to update it directly. If there is no matching record then @@ROWCOUNT would be 0. Based on that we can insert it as a new record. [In SQL Server 2008 you can use MERGE for this.]
EDIT: Please note, I know this works for MS SQL Server; I don't know about MySQL or Oracle.

Enforcing unique columns

If a column is made unique in a database table structure, is there any need to do a check to see if a new value to be inserted already exists in the table via script? Or would it be fine just to insert values letting the DBMS filter non-new values?
When you try to insert a duplicate value into a unique column, your insert query will fail. So it is a good idea to check whether your insert queries succeeded. Regardless of the situation, though, you should always check whether your insert query went through :)
You should always validate your data before inserting it into the database. That being said, what happens if you try to insert a non-unique value into a column defined as unique is an SQLException.
In order to validate this before insertion, you could for example do a
select 1
from mytable_with_unique_column
where my_unique_column = myNewValue
If the query returns anything, then simply do not try to insert as that will throw an SQLException.
Verifying a unique constraint yourself is definitely overkill.
When you put a unique constraint on your column, an implicit index is created for this column. Thus, the DBMS can (and will) verify your data much faster. Unfortunately, when you try to insert a duplicate value into your column, you will get a constraint violation exception that you have to deal with (but you have to deal with such an error when using script verification too).
Good luck.
You can combine the insert statement and validation select into one statement:
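-- A VALUES list cannot take a WHERE clause, so the new row comes from a SELECT.
-- FROM DUAL works in MySQL and Oracle; some other databases omit it.
insert into mytable_with_unique_column (...)
select ...
from dual
where not exists
(
select 1
from mytable_with_unique_column
where my_unique_column = myNewValue
)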
This will only insert a new row if there isn't already a row with the given unique value.
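A runnable sketch of such a combined statement, written as INSERT ... SELECT ... WHERE NOT EXISTS because a VALUES list cannot carry a WHERE clause. SQLite (Python's sqlite3 module) is used as a stand-in database, the single-column table definition is an assumption, and the helper name insert_if_absent is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed shape of the table from the answers above: one UNIQUE column.
conn.execute(
    "CREATE TABLE mytable_with_unique_column (my_unique_column TEXT UNIQUE)"
)


def insert_if_absent(conn, value):
    """Hypothetical helper: insert value only if no row already holds it.

    Returns True when a row was inserted, False when it already existed.
    """
    cur = conn.execute(
        "INSERT INTO mytable_with_unique_column (my_unique_column) "
        "SELECT ? WHERE NOT EXISTS ("
        "  SELECT 1 FROM mytable_with_unique_column WHERE my_unique_column = ?"
        ")",
        (value, value),
    )
    conn.commit()
    return cur.rowcount == 1


conn_rows = lambda: conn.execute(
    "SELECT my_unique_column FROM mytable_with_unique_column"
).fetchall()
```

Keeping the UNIQUE constraint in place still seems wise: under concurrent writers the constraint, not the NOT EXISTS check, is what ultimately guarantees uniqueness.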

Easy mysql question regarding primary keys and an insert

In MySQL, how do I get the primary key used for an insert operation when it is auto-incrementing?
Basically, I want the new auto-incremented value to be returned when the statement completes.
Thanks!
Your clarification comment says that you're interested in making sure that LAST_INSERT_ID() doesn't give the wrong result if another concurrent INSERT happens. Rest assured that it is safe to use LAST_INSERT_ID() regardless of other concurrent activity. LAST_INSERT_ID() returns only the most recent ID generated during the current session.
You can try it yourself:
Open two shell windows, run the mysql client in each and connect to the database.
Shell 1: INSERT into a table with an AUTO_INCREMENT key.
Shell 1: SELECT LAST_INSERT_ID(), see result.
Shell 2: INSERT into the same table.
Shell 2: SELECT LAST_INSERT_ID(), see result different from shell 1.
Shell 1: SELECT LAST_INSERT_ID() again, see a repeat of earlier result.
If you think about it, this is the only way that makes sense. All databases that support auto-incrementing key mechanisms must act this way. If the result depends on a race condition with other clients possibly INSERTing concurrently, then there would be no dependable way to get the last inserted ID value in your current session.
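The two-shell experiment can also be scripted. The sketch below is an analogy rather than a MySQL test: it uses two SQLite connections (Python's sqlite3 module) to the same temporary database file as the two "sessions", and SQLite's last_insert_rowid() in place of MySQL's LAST_INSERT_ID(). The table t and helper last_id are invented for the example.

```python
import os
import shutil
import sqlite3
import tempfile

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo.db")

# Two independent connections play the role of the two shell windows.
session1 = sqlite3.connect(path)
session2 = sqlite3.connect(path)
session1.execute("CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, v TEXT)")
session1.commit()


def last_id(session):
    """Hypothetical helper: SQLite's per-connection analogue of LAST_INSERT_ID()."""
    return session.execute("SELECT last_insert_rowid()").fetchone()[0]


session1.execute("INSERT INTO t (v) VALUES ('from session 1')")
session1.commit()
first = last_id(session1)

session2.execute("INSERT INTO t (v) VALUES ('from session 2')")
session2.execute("INSERT INTO t (v) VALUES ('from session 2 again')")
session2.commit()
second = last_id(session2)

# Session 1 still sees its own last id, unaffected by session 2's inserts.
repeat = last_id(session1)

session1.close()
session2.close()
shutil.rmtree(tmpdir)
```

Here repeat equals first even though session 2 has inserted rows in between, which mirrors the per-session behavior described above.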
MySQL's LAST_INSERT_ID()
The MySQL Docs describe the function: LAST_INSERT_ID()
[select max(primary_key_column_name) from table_name]
Ahhh, not necessarily. I am not a MySQL guy, but there are specific ways to get the last inserted id for the last completed action that are a little more robust than this. What if an insert has happened between you writing to the table and querying it? I know about this because it stung me many moons ago (so yeah, it does happen).
If all else fails read the manual: http://dev.mysql.com/doc/refman/5.0/en/getting-unique-id.html