ON DUPLICATE KEY UPDATE - precedence - mysql

When doing an INSERT INTO {tableA} SELECT a,b,c FROM {tableB} ON DUPLICATE KEY UPDATE x=y, what is the precedence in which the duplicate keys are evaluated? I assume that MySQL first checks whether a tuple from tableB clashes with a unique/primary key in tableA. If the SELECT doesn't include a primary key, or if no other duplicate key exists, then each subsequent UNIQUE INDEX "group" is evaluated and the same check is performed. But what happens if your tableB has multiple sets of unique, multi-column indexes? Are they evaluated top-to-bottom, as listed by SHOW INDEXES FROM tableB?
Here's my SHOW INDEXES FROM <table>:
Table,Non_unique,Key_name,Seq_in_index,Column_name,Collation
daily_metrics,0,PRIMARY,1,id,A
daily_metrics,0,unique_lineItem_creative_y_m_d,1,line_item_id,A
daily_metrics,0,unique_lineItem_creative_y_m_d,2,creative_id,A
daily_metrics,0,unique_lineItem_creative_y_m_d,3,year,A
daily_metrics,0,unique_lineItem_creative_y_m_d,4,month,A
...
Imagine there are additional sets of unique indexes similar to unique_lineItem_creative_y_m_d
The docs don't seem to illustrate this behavior.
https://dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html
I also assume that the first matching unique index is used, if a match exists, without any attempt to match subsequent unique indexes that could match. In other words, the first unique index that matches is used, without regard for trying to find the best possible match across all indexes.

You are correct: as soon as MySQL detects a duplicate in any UNIQUE index, it abandons the INSERT and does the update.
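The order in which MySQL evaluates the unique indexes does not change the outcome as long as the new row collides with only one existing row. There's no such thing as a better match for one unique index over another: because they're unique indexes, any combination of column values that shows up as a duplicate is enough to completely specify the row to be updated. The exception is a new row that duplicates different existing rows through different unique indexes; in that case only one of them is updated, which is why the MySQL manual recommends avoiding ON DUPLICATE KEY UPDATE on tables with multiple unique indexes.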
MySQL's query planner, hopefully, chooses the index that's least costly to evaluate. But, formally speaking, the index it uses for this purpose is unpredictable. This unpredictability is an attribute of declarative languages like SQL. MySQL can do its work any way that works, and it doesn't have to tell you. It can be hard for programmers to grasp, because we're used to procedural languages.
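One edge case is worth trying yourself: a new row that collides with two different existing rows through two different unique indexes. The MySQL manual notes that only one row is updated in that situation. Here is a minimal sketch, assuming a hypothetical table multi_uq_demo (the table, column, and index names are illustrative and not taken from the question):

```sql
-- Hypothetical demo table with two independent unique indexes.
CREATE TABLE multi_uq_demo (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  a    INT NOT NULL,
  b    INT NOT NULL,
  note VARCHAR(20) NOT NULL DEFAULT 'original',
  UNIQUE KEY uq_a (a),
  UNIQUE KEY uq_b (b)
);

INSERT INTO multi_uq_demo (a, b) VALUES (1, 10), (2, 20);

-- (1, 20) duplicates the first row via uq_a and the second row via uq_b.
-- Only one of the two rows is updated; which one is not something to rely on.
INSERT INTO multi_uq_demo (a, b) VALUES (1, 20)
ON DUPLICATE KEY UPDATE note = 'updated';

-- Inspect which row carries note = 'updated'.
SELECT id, a, b, note FROM multi_uq_demo;
```

If each incoming row can only ever collide with one existing row, this ambiguity does not arise.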

If any primary or unique keys exist in tableB, that's irrelevant. The only thing that matters for INSERT...ON DUPLICATE KEY UPDATE are the primary or unique keys of the table you're inserting into — tableA in your example.
If the values you insert to tableA are already found in any of tableA's primary or unique keys, that triggers the UPDATE part of the IODKU.
It's about the values being inserted, not the constraints of the source table.
You can also trigger the UPDATE without using any source table, just by inserting a set of constants with a VALUES clause.
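To illustrate that only the target table's keys matter, here is a small sketch with no source table at all. The table iodku_demo and its columns are hypothetical names chosen for the example:

```sql
-- Hypothetical target table; its unique key is what triggers the UPDATE.
CREATE TABLE iodku_demo (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  a    INT NOT NULL,
  hits INT NOT NULL DEFAULT 1,
  UNIQUE KEY uq_a (a)
);

-- First insert: no row with a = 1 exists yet, so a row is inserted.
INSERT INTO iodku_demo (a) VALUES (1)
ON DUPLICATE KEY UPDATE hits = hits + 1;

-- Same constants again: a = 1 is already present in uq_a,
-- so the existing row is updated instead of a second row being inserted.
INSERT INTO iodku_demo (a) VALUES (1)
ON DUPLICATE KEY UPDATE hits = hits + 1;

SELECT id, a, hits FROM iodku_demo;
```

The same rule applies when the rows come from INSERT ... SELECT: each selected row is checked against the unique keys of the table being inserted into.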

Related

MYSQL index optimization for table that stores relationship between 2 other tables

My question is about structuring a table that links two other tables to store the relationship between them.
For example, I have 3 tables: users, locations, and users_locations.
The users and locations tables both have an id column.
The users_locations table has the user_id and location_id from the other 2 tables.
How do you define your indexes/constraints on these tables to efficiently answer questions such as "what locations does this user have?" or "what users belong to this location?"
E.g.
select user_id from users_locations where location_id = 5;
or
select location_id from users_locations where user_id = 5;
Currently, I do not have a foreign key constraint set, which I assume I should add, but does that automatically speed up the queries or create an index?
I don't think I can create an index on each column since there will be duplicates, e.g. multiple user_id entries for each location, and vice versa.
Will adding a composite key like PRIMARY KEY (user_id, location_id) speed up queries when most queries only have half of the key?
Is there any reason to set an AUTO_INCREMENT PRIMARY KEY field on this table when you will never query by that id?
Do I really even need to set a PRIMARY KEY?
Basically, for any table, the decision to create an index or not depends entirely on the use cases you support. Indexes should be created per use case, not as a nice-to-have.
For the particular queries you mention, a separate index on each column is enough to find the matching rows quickly. For a query to be answered from the index alone, without going to the rows, the index also has to contain the column being selected.
Creating a foreign key on a column automatically creates an index if a suitable one does not already exist, so you need not create the indexes yourself if you decide to set up foreign keys.
If you keep an auto-increment key as the primary key, you will still have to make the user_id and location_id combination unique, otherwise you will bloat your table with duplicates. So keeping a separate auto-increment key doesn't make sense in your use case. However, if you want to keep track of each visit to a location and save the user experience each time, then an auto-increment primary key will be required.
However, I would like to point out that creating indexes does not guarantee that your queries will use them unless you specify them explicitly. For a single query there can be many execution plans, and the most efficient one may not use an index.
The optimal indexes for a many-to-many mapping table:
PRIMARY KEY (aid, bid),
INDEX(bid, aid)
More discussion and more tips: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
(Comments on specific points in the Question)
FOREIGN KEYs implicitly create indexes, unless an explicit index has already been provided.
Composite indexes are better for many-to-many tables.
A FOREIGN KEY involves an integrity check, so it is inherently slower than simply having the index. (And the integrity check for this kind of table is of dubious value.)
There is no need for an AUTO_INCREMENT on a many:many table. However, ...
It is important to have a PRIMARY KEY on every table. The pair of columns is fine as a "natural" PRIMARY KEY.
A WHERE clause would like to use the first column(s) of some index; don't worry that it is not using all the columns.
In EXPLAIN you sometimes see "Using index". This means that a "covering index" was used: all the columns used in the SELECT were found in that one index, without having to reach into the data to get more columns. This is a performance boost. It requires two two-column indexes (one is the PK, one is a plain INDEX).
With InnoDB, any 'secondary' index (INDEX or UNIQUE) implicitly includes the columns of the PK. So, given PRIMARY KEY(a,b), INDEX(b), that secondary index is effectively INDEX(b,a). I prefer to spell out the two columns to point out to the reader that I deliberately wanted those two columns in that order.
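Applied to the table from the question, the advice above could look like the following sketch. The column types and the index name idx_location_user are assumptions, since the question does not show the actual definitions:

```sql
-- Mapping table with a natural composite PRIMARY KEY and its mirror index.
-- INT UNSIGNED is an assumed type; match it to users.id and locations.id.
CREATE TABLE users_locations (
  user_id     INT UNSIGNED NOT NULL,
  location_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, location_id),              -- serves WHERE user_id = ?
  INDEX idx_location_user (location_id, user_id)   -- serves WHERE location_id = ?
) ENGINE=InnoDB;

-- Each query filters on the leading column of one index and reads the
-- other column from that same index, so EXPLAIN should report "Using index".
EXPLAIN SELECT location_id FROM users_locations WHERE user_id = 5;
EXPLAIN SELECT user_id FROM users_locations WHERE location_id = 5;
```

The PRIMARY KEY also prevents the same (user_id, location_id) pair from being stored twice.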
Hopefully, the above link will answer any further questions.

INSERT or UPDATE if condition over two columns (regardless of order) is met

I want to do the following:
INSERT INTO table1 (term1, term2, number) values ('A','B',10);
however if values A and B are already present in table1 regardless of their order, ie. the predicate
(term1='A' && term2='B') OR (term1='B' && term2='A')
holds true, then I just want to update column number. Is there any way of doing that?
A (perhaps the) clean way to handle this situation is to use INSERT ... ON DUPLICATE KEY UPDATE. From the documentation:
If you specify ON DUPLICATE KEY UPDATE, and a row is inserted that
would cause a duplicate value in a UNIQUE index or PRIMARY KEY, MySQL
performs an UPDATE of the old row
The important part is "would cause a duplicate value in a UNIQUE index". Therefore you need to create a multicolumn unique index.
A plain unique index on (term1, term2) treats ('A','B') and ('B','A') as different keys, so it cannot handle the order by itself. Therefore I'd create an extra field holding the concatenation of the sorted values of your fields, and have that field uniquely indexed.
EDIT: Instead of storing the concatenation of your fields, you could also just store its hash and index that.
Thanks @okiharaherbst,
This is what I did: I added a new column "uniqueKey" as the primary key, and the insert goes as follows:
INSERT INTO table1 (term1, term2, num, uniquekey)
VALUES ('a', 'b', 10, CONCAT(GREATEST(term1, term2), LEAST(term1, term2)))
ON DUPLICATE KEY UPDATE num = 10;
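A variation on the same idea, not what the asker did, is to let MySQL maintain the sorted pair itself with generated columns (available from MySQL 5.7) and put the unique index on those. This is only a sketch: the table term_pairs, the columns term_lo and term_hi, and the VARCHAR(50) sizes are hypothetical. Keeping the two sorted values in separate columns also avoids the ambiguity of concatenating without a separator ('ab' + 'c' versus 'a' + 'bc').

```sql
-- Hypothetical table: the unique key is on the order-independent pair.
CREATE TABLE term_pairs (
  term1   VARCHAR(50) NOT NULL,
  term2   VARCHAR(50) NOT NULL,
  num     INT NOT NULL,
  term_lo VARCHAR(50) GENERATED ALWAYS AS (LEAST(term1, term2)) STORED,
  term_hi VARCHAR(50) GENERATED ALWAYS AS (GREATEST(term1, term2)) STORED,
  UNIQUE KEY uq_pair (term_lo, term_hi)
);

-- Inserts a new row: no ('A','B') pair exists yet.
INSERT INTO term_pairs (term1, term2, num) VALUES ('A', 'B', 10)
ON DUPLICATE KEY UPDATE num = VALUES(num);

-- Same pair in the other order: collides on uq_pair, so num is updated.
-- VALUES(num) refers to the value this INSERT tried to insert.
INSERT INTO term_pairs (term1, term2, num) VALUES ('B', 'A', 25)
ON DUPLICATE KEY UPDATE num = VALUES(num);

SELECT term1, term2, num FROM term_pairs;
```

The INSERT statements never mention the generated columns, so callers do not need to know how the key is built.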

Replace Into Query Syntax

I want to be able to update a table of the same schema using a "replace into" statement. In the end, I need to be able to update a large table with values that may have changed.
Here is the query I am using to start off:
REPLACE INTO table_name
(visual, inspection_status, inspector_name, gelpak_name, gelpak_location)
VALUES (3, 'Partially Inspected', 'Me', 'GP1234', 'A01');
What I don't understand is how does the database engine know what is a duplicate row and what isn't? This data is extremely important and I can't risk the data being corrupted. Is it as simple as "if all columns listed have the same value, it is a duplicate row"?
I am just trying to figure out an efficient way of doing this so I can update > 45,000 rows in under a minute.
As the documentation says:
REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted.
REPLACE does work much like an INSERT that just overwrites records that have the same PRIMARY KEY or UNIQUE index. However, beware.
Shlomi Noach writes about the problem with using REPLACE INTO here:
But weak hearted people as myself should be aware of the following: it is a heavyweight solution. It may be just what you were looking for in terms of ease of use, but the fact is that on duplicate keys, a DELETE and INSERT are performed, and this calls for a closer look.
Whenever a row is deleted, all indexes need to be updated, and most importantly the PRIMARY KEY. When a new row is inserted, the same happens. Especially on InnoDB tables (because of their clustered nature), this means much overhead. The restructuring of an index is an expensive operation. Index nodes may need to be merged upon DELETE. Nodes may need to be split due to INSERT. After many REPLACE INTO executions, it is most probable that your index is more fragmented than it would have been, had you used SELECT/UPDATE or INSERT INTO ... ON DUPLICATE KEY
Also, there's the notion of "well, if the row isn't there, we create it. If it's there, it simply gets updated". This is false. The row doesn't just get updated, it is completely removed. The problem is, if there's a PRIMARY KEY on that table, and the REPLACE INTO does not specify a value for the PRIMARY KEY (for example, it's an AUTO_INCREMENT column), the new row gets a different value, and this may not be what you were looking for in terms of behavior.
Many uses of REPLACE INTO have no intention of changing PRIMARY KEY (or other UNIQUE KEY) values. In that case, it's better left alone. On a production system I've seen, changing REPLACE INTO to INSERT INTO ... ON DUPLICATE KEY resulted in a tenfold increase in throughput (measured in queries per second) and a drastic decrease in IO operations and in load average.
In summary, REPLACE INTO may be right for your implementation, but you might find it more appropriate (and less risky) to use INSERT ... ON DUPLICATE KEY UPDATE instead.
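The difference described in the quote can be seen in a small experiment. This is a sketch with a hypothetical table replace_demo whose PRIMARY KEY is an AUTO_INCREMENT column and whose natural key is a separate UNIQUE column:

```sql
-- Hypothetical table: surrogate AUTO_INCREMENT id plus a unique natural key.
CREATE TABLE replace_demo (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  code VARCHAR(10) NOT NULL UNIQUE,
  qty  INT NOT NULL
);

INSERT INTO replace_demo (code, qty) VALUES ('GP1234', 1);
SELECT id, code, qty FROM replace_demo;   -- note the id assigned here

-- REPLACE deletes the old row and inserts a new one,
-- so the row comes back with a different AUTO_INCREMENT id.
REPLACE INTO replace_demo (code, qty) VALUES ('GP1234', 2);
SELECT id, code, qty FROM replace_demo;

-- ON DUPLICATE KEY UPDATE changes the existing row in place,
-- so the id seen after the REPLACE stays the same.
INSERT INTO replace_demo (code, qty) VALUES ('GP1234', 3)
ON DUPLICATE KEY UPDATE qty = VALUES(qty);
SELECT id, code, qty FROM replace_demo;
```

If other tables reference id, the REPLACE variant leaves those references pointing at a row that no longer exists, or fails if a foreign key forbids the delete.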
Alternatively, something like this:
INSERT IGNORE INTO tbl1 SELECT * FROM tbl2;
UPDATE
  `tbl1` AS `dest`,
  (SELECT * FROM tbl2) AS `src`
SET
  dest.field = src.field,
  /* or a conditional copy, or anything like that: */
  dest.field = IF(LENGTH(src.field) > 0, src.field, dest.field)
WHERE
  `dest`.id = `src`.id;
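CREATE TEMPORARY TABLE test
(prim INT PRIMARY KEY
,sec INT NOT NULL UNIQUE
,tert INT UNIQUE
,com VARCHAR(255)
);
INSERT INTO test (prim,sec,tert,com)
VALUES (1,2,3,'123')
,(2,3,null,'23n')
,(3,1,null,'31n');
-- The new row collides with a different existing row on each key:
-- prim=3 (third row), sec=3 (second row), tert=3 (first row).
REPLACE INTO test(prim,sec,tert,com)
VALUES (3,3,3,'333');
-- All three old rows have been deleted; only (3,3,3,'333') remains.
SELECT *
FROM test;
DROP TEMPORARY TABLE test;
Fun times.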

Does mySql have an update/insert combo which inserts if the update fails?

I'm not optimistic that this can be done without a stored procedure, but I'm curious if the following is possible.
I want to write a single query insert/update that updates a row if it finds a match and if not inserts into the table with the values it would have been updating.
So... something like
updateInsert into table_a set n = 'foo' where p='bar';
in the event that there is no row where p='bar' it would automatically insert into table_a set n = 'foo';
EDIT:
Based on a couple of comments I see that I need to clarify that n is not a PRIMARY KEY and the table actually needs the freedom to have duplicate rows. I just have a situation where a specific entry needs to be unique... perhaps I'm just mixing metaphors in a bad way and should pull this out into a separate table where this key is unique.
I would enforce this with the table schema: put a unique multi-column key on the target table and use INSERT IGNORE INTO. A duplicate key would normally raise an error, but with IGNORE the conflicting row is skipped and the error is downgraded to a warning.
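A minimal sketch of that suggestion follows. The table table_a_demo, the key name uq_p_n, and the column sizes are assumptions made for the example. Note that IGNORE only skips the duplicate; it does not update the existing row.

```sql
-- Hypothetical table with a unique multi-column key.
CREATE TABLE table_a_demo (
  p VARCHAR(20) NOT NULL,
  n VARCHAR(20) NOT NULL,
  UNIQUE KEY uq_p_n (p, n)
);

-- First insert succeeds.
INSERT IGNORE INTO table_a_demo (p, n) VALUES ('bar', 'foo');

-- Second insert hits uq_p_n; with IGNORE the row is skipped
-- and the duplicate-key error is reported as a warning instead.
INSERT IGNORE INTO table_a_demo (p, n) VALUES ('bar', 'foo');
SHOW WARNINGS;

SELECT p, n FROM table_a_demo;
```

If the existing row should be changed rather than left alone, INSERT ... ON DUPLICATE KEY UPDATE against the same unique key is the variant to use.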

Which DB design is faster: a unique index and INSERT IGNORE, or using SELECT to find existing records?

I have a table with just one column: userid.
When a user accesses a certain page, their userid is inserted into the table. Userids are unique, so there shouldn't be two of the same userid in that table.
I'm considering two designs:
Making the column unique and using INSERT commands every time a user accesses that page.
Checking if the user is already recorded in the table by SELECTing from the table, then INSERTing if no record is found.
Which one is faster?
Definitely create a UNIQUE index, or, better, make this column a PRIMARY KEY.
You need an index to make your checks fast anyway.
Why not make this index UNIQUE, so that you have a fallback if for some reason you forget to check with SELECT?
If your table is InnoDB, it will have a PRIMARY KEY anyway, since all InnoDB tables are index-organized by design.
If you don't declare a PRIMARY KEY (or a UNIQUE NOT NULL index that InnoDB can use in its place), InnoDB will create a hidden column to act as the primary key, making your table roughly twice as large, and you will still not have an index on your column.
Creating a PRIMARY KEY on your column is a win-win.
You can issue
INSERT IGNORE INTO mytable VALUES (userid)
and check how many records were affected.
If 0, there was a key violation, but no exception.
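One way to check the affected-row count from SQL itself is ROW_COUNT(), which reports the rows affected by the preceding statement; client libraries usually expose the same number through their own API. A sketch, assuming a hypothetical table page_visitors:

```sql
-- Hypothetical single-column table with the column as PRIMARY KEY.
CREATE TABLE page_visitors (
  userid INT NOT NULL PRIMARY KEY
);

-- New user: one row is inserted.
INSERT IGNORE INTO page_visitors (userid) VALUES (42);
SELECT ROW_COUNT();   -- 1: the row was inserted

-- Same user again: the duplicate is skipped without an error.
INSERT IGNORE INTO page_visitors (userid) VALUES (42);
SELECT ROW_COUNT();   -- 0: key violation, nothing inserted
```

ROW_COUNT() has to be read immediately after the INSERT, on the same connection, before any other statement runs.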
How about using REPLACE?
If a user already exists it's being replaced, if it doesn't a new row is inserted.
What about doing an update, e.g.
UPDATE xxx SET x=x+1 WHERE userid=y
and if that fails (i.e. no rows matched), then doing an insert for a new user?
SELECT is faster, but the reason to prefer the SELECT check is not speed; it is to avoid raising an error.
Or:
INSERT INTO xxx (`userid`) VALUES (4) ON DUPLICATE KEY UPDATE userid=VALUES(`userid`)
You should make it unique in any case.
Whether to check first using SELECT depends on which scenario is most common. If you have new users all the time, and only occasionally existing users, it might be faster overall to just insert and catch the exception on the rare occasions it happens. But handling an exception is slower than checking first and then inserting, so if an existing user is the common scenario, you should always check first with SELECT.