Proper usage of INSERT .... ON DUPLICATE KEY UPDATE - mysql

I am considering using INSERT ... ON DUPLICATE KEY UPDATE for my application, which routinely has to submit many rows to the database in one transaction. However, I am slightly confused about one thing: the usage examples online vary widely.
The behavior I am looking for is to insert the row if it does not already exist in the unique index, but if it does exist, simply return the ID and update nothing. Am I correct in assuming that this is the intended functionality of this statement?
Also, I don't want to create dummy fields in my tables to use this functionality, as many examples suggest. That, in my opinion, is just bad practice.
Any advice is greatly appreciated. Below is an example from MySQL's website that illustrates close to what I want, but the c=3 part is not explained there. I am wondering whether it is required to make LAST_INSERT_ID actually work or whether it's just part of their example. I have read that without some dummy operation after the LAST_INSERT_ID part, LAST_INSERT_ID won't work.
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id), c=3;

Instead you can just SELECT the unique ID to determine whether it exists. If it does, just return it. Otherwise, do the INSERT and return the new ID.

You cannot do this with a single statement in MySQL. A SELECT statement returns existing values; data modification statements (including INSERT) do not return data. (They usually return a count of some sort.) This includes the INSERT...ON DUPLICATE KEY UPDATE statement—it does not return data.
You can probably do what you want with a stored procedure, but the procedure will contain more than one statement. If that doesn't work for you, then do as @Explosion Pills suggests and use a SELECT followed, if needed, by an INSERT.
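A minimal sketch of the SELECT-then-INSERT flow suggested above. The table `item`, its columns, and the index name are hypothetical and exist only for illustration; the "only if no row was found" decision is assumed to be made by the application or a stored procedure.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE item (
  id INT AUTO_INCREMENT PRIMARY KEY,
  a  INT NOT NULL,
  b  INT NOT NULL,
  UNIQUE KEY uq_item_a (a)
);

-- Step 1: look for an existing row by its unique value.
SELECT id FROM item WHERE a = 1;

-- Step 2: run this only if step 1 returned no row.
INSERT INTO item (a, b) VALUES (1, 2);

-- Step 3: read the id generated by the INSERT on this connection.
SELECT LAST_INSERT_ID();
```

With concurrent writers, another session could insert the same value between steps 1 and 2, so the application should still be prepared for a duplicate-key error from the INSERT.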

Related

Insert statement yields different results than the select all in sql

I'm currently trying to create a table of theater locations that only has three locations. I imported denormalized data that I tried to normalize with this statement:
insert into theater(`name`, email, address, phone)
select distinct theater, theater_email, theater_address, theater_phone
from denormalized_tickets;
When I comment out the first line and run it, I get the result I'm looking for.
When I run select * from theater; to see the theater table, it returns each theater duplicated 12 times.
How should I solve this? Is there anything I'm overlooking?
As discussed in the comments above, INSERT creates new rows each time you execute it. If you do that multiple times, you may add more rows every time.
Vasya recommended creating a UNIQUE index to block new rows from being created with the same values. This may or may not be appropriate for a given table. For instance, what if you want to allow multiple rows to have the same values?
Another thing you might like to read about is MySQL's REPLACE statement. The syntax is similar to INSERT, but if there's a duplicate in column(s) of a primary key or unique key, it first deletes the old row and then inserts the new row. But this won't help if you don't have the unique key defined, because how would MySQL know it's a conflict?
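As a rough sketch of the two ideas above combined, the statements below add a unique key and then reload the table with REPLACE. The index name `uq_theater_name` is made up, and the sketch assumes that a theater is identified by its name and that it is acceptable to empty the table first (adding the unique key fails while duplicate rows are still present).

```sql
-- Assumption: the existing rows are disposable and can be reloaded.
TRUNCATE TABLE theater;

-- Assumption: `name` alone identifies a theater. The index name is hypothetical.
ALTER TABLE theater ADD UNIQUE KEY uq_theater_name (`name`);

-- Re-running this no longer multiplies rows: a row whose name already
-- exists is deleted and replaced instead of being added again.
REPLACE INTO theater (`name`, email, address, phone)
SELECT DISTINCT theater, theater_email, theater_address, theater_phone
FROM denormalized_tickets;
```

Because REPLACE deletes and re-inserts, a replaced row gets a new auto-increment id if the table has one, which matters when other tables reference it.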

Query optimization for insert to database

I need a solution for inserting a lot of rows concurrently into my SQL DB.
I have a rule that every time I insert into my transaction table, I need a unique ID composed of currentTime+transactionSource+sequenceNumber. My problem is that when I test my service using JMeter, the service goes down once the concurrent insert load reaches 3000 rows. The problem lies in duplication of the unique ID I generate. My assumption is that the duplication happens because a new insert starts before the previous one has finished, so both generate the same ID.
Can anyone suggest the best way to do this? Thank you.
MySQL has three wonderful methods to ensure that an id is unique:
auto_increment columns
uuid()
uuid_short()
Use them! The most common way to implement a unique id is the first one:
create table t (
t_id int auto_increment primary key,
. . .
)
I strongly, strongly advise you not to maintain your own id. You get race conditions (as you have seen). Your code will be less efficient than the code in the database. If you need the separate components, you can implement them as columns in the table.
In other words, your fundamental problem is your "rule". And there are zillions of databases in the world that work perfectly well without such a rule.
Why don't you let the database handle the insert id and then update the row with a secondary field containing the format you want? If you have duplicates, you can always append the row id to this identifier so it will always be unique.
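One way to picture the advice above: keep an auto-increment key, store the components as ordinary columns, and build the composite identifier only when it is needed. The table and column names below are hypothetical, the format string is just one possible layout, and a DATETIME default of CURRENT_TIMESTAMP assumes MySQL 5.6.5 or later.

```sql
-- Hypothetical table: the database owns uniqueness through t_id.
CREATE TABLE transaction_log (
  t_id               INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  created_at         DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  transaction_source VARCHAR(16) NOT NULL
);

-- Concurrent sessions can run this safely; each gets its own t_id.
INSERT INTO transaction_log (transaction_source) VALUES ('WEB');

-- Derive the currentTime+transactionSource+sequenceNumber string on read.
SELECT CONCAT(DATE_FORMAT(created_at, '%Y%m%d%H%i%s'),
              transaction_source,
              LPAD(t_id, 8, '0')) AS display_id
FROM transaction_log
WHERE t_id = LAST_INSERT_ID();
```

Since t_id is unique, the derived string is unique as well, without any locking in application code.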

generate id number mysql

I want to generate an id number for my user table.
The id number has a unique index.
Here is my trigger:
USE `schema_epolling`;
DELIMITER $$
CREATE DEFINER=`root`@`localhost` TRIGGER `tbl_user_BINS` BEFORE INSERT ON `tbl_user`
FOR EACH ROW
BEGIN
SET NEW.id_number = CONCAT(DATE_FORMAT(NOW(),'%y'),LPAD((SELECT auto_increment FROM
information_schema.tables WHERE table_schema = 'schema_epolling' AND table_name =
'tbl_user'),6,0));
END$$
DELIMITER ;
It works if I insert rows one by one, or maybe 5 rows at a time.
But if I insert rows in bulk, an error occurs.
Here's the code I use for inserting bulk rows from another schema/table:
INSERT INTO schema_epolling.tbl_user (last_name, first_name)
SELECT last_name, first_name
FROM schema_nc.tbl_person
Here's the error:
Error Code: 1062. Duplicate entry '14000004' for key 'id_number_UNIQUE'
Error Code: 1062. Duplicate entry '14000011' for key 'id_number_UNIQUE'
Error Code: 1062. Duplicate entry '14000018' for key 'id_number_UNIQUE'
Error Code: 1062. Duplicate entry '14000025' for key 'id_number_UNIQUE'
Error Code: 1062. Duplicate entry '14000032' for key 'id_number_UNIQUE'
If I use the uuid() function it works fine, but I don't want uuid() because it's too long.
You don't want to generate id values that way.
The auto-increment value for the current INSERT is not generated yet at the time the BEFORE INSERT trigger executes.
Even if it were, the INFORMATION_SCHEMA would contain the maximum auto-increment value generated by any thread, not just the thread executing the trigger. So you would have a race condition that would easily conflict with other concurrent inserts and get the wrong value.
Also, querying INFORMATION_SCHEMA on every INSERT is likely to be a bottleneck for your performance.
In this case, to get the auto-increment value formatted with the two-digit year number prepended, you could advance the table's auto-increment value up to %y million, and then when we reach January 1 2015 you would ALTER TABLE to advance it again.
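A sketch of what advancing the counter could look like, assuming the id number is carried by the table's own AUTO_INCREMENT column and that fewer than one million rows are inserted per year (the 8-digit values in the error messages, such as 14000004, suggest a two-digit year followed by six digits).

```sql
-- Year 2014: the next generated value is 14000000, then 14000001, and so on.
ALTER TABLE schema_epolling.tbl_user AUTO_INCREMENT = 14000000;

-- On January 1 2015, advance the counter to the next year's range.
ALTER TABLE schema_epolling.tbl_user AUTO_INCREMENT = 15000000;
```

With this approach no trigger is needed, and bulk INSERT ... SELECT statements receive distinct values from the storage engine.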
Re your comments:
The answer I gave above applies to how MySQL's auto-increment works. If you don't rely on auto-increment, you can generate the values by some other means.
Incrementing another one-row table as @Vatev suggests (though this creates a relatively long-lived lock on that table, which could be a bottleneck for your inserts).
Generating values in your application, based on a central, atomic id-generator like memcached. See other ideas here: Generate unique IDs in a distributed environment
Using UUID(). Yes, sorry, it's 36 characters long (32 hex digits plus 4 hyphens). Don't truncate it or you will lose uniqueness.
But combining triggers with auto-increment in the way you show simply won't work.
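For the one-row counter table option mentioned above, a minimal sketch could look like the following. The table name `user_sequence` is hypothetical; the `LAST_INSERT_ID(expr)` form is the idiom the MySQL manual describes for simulating sequences.

```sql
-- Hypothetical one-row counter table.
CREATE TABLE user_sequence (id INT UNSIGNED NOT NULL);
INSERT INTO user_sequence VALUES (0);

-- Atomically increment the counter and remember the new value
-- for this connection only.
UPDATE user_sequence SET id = LAST_INSERT_ID(id + 1);

-- Returns the value this connection just generated, even if other
-- connections have incremented the counter in the meantime.
SELECT LAST_INSERT_ID();
```

The application would then format the returned number (for example with the two-digit year prepended) and supply it in the INSERT. Inside a long transaction the row lock on the counter is held until commit, which is the bottleneck noted above.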
I'd like to add my two cents to expound on Bill Karwin's point.
It's better that you don't generate a Unique ID by attempting to manually cobble one together.
The fact that your school produces an ID in that way does not mean that's the best way to do it (assuming that is what they are using that generated value for which I can't know without more information).
Your database work will be simpler and less error prone if you accept that the purpose for an ID field (or key) is to guarantee uniqueness in each row of data, not as a reference point to store certain pieces of human readable data in a central spot.
This type of a ID/key is known as a surrogate key.
If you'd like to read more about them here's a good article: http://en.wikipedia.org/wiki/Surrogate_key
It's common for a surrogate key to also be the primary key of a table, (and when it's used in this way it can greatly simplify creating relationships between tables).
If you would like to add a secondary column that concatenates date values and other information because that's valuable for an application you are writing, or any other purpose you see fit, then create that as a separate column in your table.
Thinking of an ID column/key in this fire-and-forget way may simplify the concept enough to bring a number of benefits to your database design work.
As an example, should you require uniqueness between un-associated databases, you will more easily be able to stomach the use of a UUID.
(Because you'll know its purpose is merely to ensure uniqueness, NOT to be useful to you in any other way.)
Additionally, as you've found, taking the responsibility on yourself, instead of relying on the database, to produce a unique value adds time consuming complexity that can otherwise be avoided.
Hope this helps.

Avoid inserting duplicate column values

I have a very simple table with 3 columns tag_id, label, timestamp and I need as lightweight as possible a query to insert only when there is a new value for the column label.
How would I write an sql query to do this? I can see some examples already on the site but they are all mixed up in more complex queries (some involving subqueries) that I can't understand.
There seem to be different ways of doing it, and I need to find the most lightweight one so that I can repeat it in a loop to insert multiple tags in one go without putting too much strain on the server.
You can use
ALTER TABLE `tableName` ADD UNIQUE KEY (label);
This will enforce a unique value for that column in the schema. You will get an error when you attempt to insert a duplicate value. If you want to simply ignore the error, you can use INSERT IGNORE INTO.
You can also use:
INSERT INTO `tableName` (`label`) VALUES ("new value")
ON DUPLICATE KEY UPDATE `label` = "new value";
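A sketch of the INSERT IGNORE variant, inserting several tags in one statement instead of looping. The table name `tag`, the column types, and the index name are assumptions; only the three column names come from the question.

```sql
-- Hypothetical definition of the three-column table from the question.
CREATE TABLE tag (
  tag_id      INT AUTO_INCREMENT PRIMARY KEY,
  label       VARCHAR(64) NOT NULL,
  `timestamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  UNIQUE KEY uq_tag_label (label)
);

-- One round trip for many tags; rows whose label already exists
-- (including the repeated 'red' in this same statement) are skipped.
INSERT IGNORE INTO tag (label) VALUES ('red'), ('green'), ('red');
```

A single multi-row statement is generally lighter on the server than one statement per tag, and the unique key does the duplicate check without a subquery.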

insert and exclude duplicates in mysql

I have 2 equal databases (A and B) with one table each running in separate offline machines.
Every day I export their data (as csv) and "merge" it into a 3rd database (C). I first process A, then B (I insert the content from A to C, then the contents from B to C)
Now, it could happen that I get duplicate rows. I consider a duplicate if some field, for example "mail" already exists. I don't care if the rest of the fields are the same.
How can I insert A and B into C excluding those rows that are duplicates?
Thanks in advance!
The easiest solution should be to create a unique index on the columns in question and run the second insert as INSERT IGNORE.
Personally, I use ON DUPLICATE KEY UPDATE, because INSERT IGNORE downgrades any errors to warnings.
This may have side effects and result in behavior you do not expect. See this post for details on some of the side effects.
If you end up using the ON DUPLICATE KEY UPDATE syntax, it will also provide a means of changing your logic to update specific fields with new data should business requirements change.
For instance, you can tally how many times a duplicate record was inserted by saying ON DUPLICATE KEY UPDATE quantity = quantity+1.
The post referenced above has a ton more information.
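To make the tally idea concrete, here is a hedged sketch. The table `contact`, its columns, and the `duplicate_count` column are hypothetical; the question only says that a field such as "mail" decides whether a row is a duplicate.

```sql
-- Hypothetical target table in database C.
CREATE TABLE contact (
  id              INT AUTO_INCREMENT PRIMARY KEY,
  mail            VARCHAR(255) NOT NULL,
  full_name       VARCHAR(255),
  duplicate_count INT NOT NULL DEFAULT 0,
  UNIQUE KEY uq_contact_mail (mail)
);

-- Rows exported from A.
INSERT INTO contact (mail, full_name)
VALUES ('ann@example.com', 'Ann'), ('bob@example.com', 'Bob')
ON DUPLICATE KEY UPDATE duplicate_count = duplicate_count + 1;

-- Rows exported from B: ann@example.com already exists, so her row is
-- kept as-is apart from the counter, and only Cy is added.
INSERT INTO contact (mail, full_name)
VALUES ('ann@example.com', 'Ann B.'), ('cy@example.com', 'Cy')
ON DUPLICATE KEY UPDATE duplicate_count = duplicate_count + 1;
```

If no tally is wanted, the same statements with INSERT IGNORE and no ON DUPLICATE KEY UPDATE clause would simply skip the duplicate rows.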