REPLACE INTO, does it re-use the PRIMARY KEY? - mysql

The REPLACE INTO statement in MySQL works by deleting the existing row and inserting a new one. In my table, the primary key (id) is auto-incremented, so I was expecting it to delete the row and then insert a new one with an id at the tail of the table.
However, it does the unexpected and inserts it with the same id! Is this the expected behaviour, or am I missing something here? (I am not setting the id when calling the REPLACE INTO statement)

This is expected behavior if you have another UNIQUE index in your table, which you must have; otherwise REPLACE would simply add the row as you expected. See the documentation:
REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted. See Section 13.2.5, “INSERT Syntax”.
https://dev.mysql.com/doc/refman/5.5/en/replace.html
This also makes a lot of sense, because how else would MySQL find the row to replace? It would otherwise have to scan the whole table, and that would be time-consuming. I created an SQL Fiddle to demonstrate this, please have a look here.
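In essence, the demonstration boils down to something like this (table and column names are made up for illustration):

CREATE TABLE demo (
  id INT AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(64) NOT NULL,
  name VARCHAR(64),
  UNIQUE KEY uniq_email (email)
);

INSERT INTO demo (email, name) VALUES ('a@example.com', 'first');
-- 1 row affected

REPLACE INTO demo (email, name) VALUES ('a@example.com', 'second');
-- 2 rows affected: the old row was found via the UNIQUE index on email,
-- deleted, and the new row was inserted; without that UNIQUE index,
-- REPLACE would simply have inserted a second, independent row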

That is expected behavior. Technically, whenever a unique key (not just the primary key) on the data to be replaced/inserted matches an existing row, MySQL actually deletes your existing row and inserts a new row with the replacement data, using the same values for the unique keys. So, if you look at the number of affected rows for such a query, you will get 2 affected rows for each replacement and only 1 for the straight inserts.

Related

Adding a UNIQUE key to a large existing MySQL table which is receiving INSERTs/DELETEs

I have a very large table (tens of millions of rows) and a UNIQUE index needs to be added to a column on that table. I know for a fact that the table contains duplicate values on that key, which I need to clean up (by deleting rows or resetting the column to something unique that I can generate automatically). A plus is that the rows which are already duplicated no longer get modified.
What would be the right approach to perform a change like this, given that I will probably be using the Percona pt-osc tool and there are continuous deletes/inserts on the table? My plan was:
Add code that ensures no duplicate values get inserted anymore. I will probably need a separate table for this temporarily, since I want the database to enforce this for me and not the application: insert into the "shadow table" with a unique index in a transaction together with my main table, and roll back any insert that tries to insert a duplicate value (see the sketch below)
Backfill the table by zapping all invalid column values which are within the primary key range below $current_pkey_value
Then add the index and use pt-osc to change over the table
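A minimal sketch of the shadow-table idea from the first step (all names are hypothetical):

CREATE TABLE uniq_guard (
  keycol VARCHAR(64) NOT NULL PRIMARY KEY
);

START TRANSACTION;
INSERT INTO uniq_guard (keycol) VALUES ('some-value');
-- if the line above fails with a duplicate-key error,
-- the application rolls back instead of committing
INSERT INTO big_table (keycol, payload) VALUES ('some-value', '...');
COMMIT;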
Is there anything I am missing?
Since we use pt-online-schema-change, the synchronisation from the existing table to the temporary table is performed via triggers. The tool actually has a special option for this, --no-check-unique-key-change, which does exactly what we need: it agrees to perform the ALTER TABLE and sets up the triggers in such a way that, if a conflict occurs, INSERT IGNORE is applied and the first row to have used the now-unique value wins during synchronisation. For us this is a good tradeoff, because all the duplicates we have seen resulted from data races, not from actual conflicts in the value generation process.
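The invocation might look roughly like this (the connection details, database, table and column names here are assumptions, not from the original setup):

pt-online-schema-change \
  --alter "ADD UNIQUE INDEX uniq_mycol (mycol)" \
  --no-check-unique-key-change \
  D=mydb,t=mytable \
  --execute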

Efficiently bulk import data in SSIS with occasional PK duplicate content?

I am regularly loading a flat file with 100k records into a table after some transformations. The table has a PK on two columns. The data on the whole does not contain duplicate PK values, but occasionally there are duplicates.
I naively didn't understand why SSIS was rejecting all my records when only some of them violated the PK constraint. I believe the problem is that during a bulk load, if even 1 of the rows violates the PK constraint, all rows in that batch get rejected.
If I alter the FastLoadMaxInsertCommitSize property of the OLE DB Destination to 1, it fixes the problem, but it then runs like a dog because it's committing every single row.
In MySQL, the bulk load facility allows you to ignore PK errors and skip those rows without sacrificing performance. Does anyone know of a way to achieve this in SQL Server?
Any help much appreciated.
It sounds like you may be looking for IGNORE_DUP_KEY?
Using the IGNORE_DUP_KEY Option to Handle Duplicate Values
When you create or modify a unique index or constraint, you can set the IGNORE_DUP_KEY option ON or OFF. This option specifies the error response to duplicate key values in a multiple-row INSERT statement after the index has been created. When IGNORE_DUP_KEY is set to OFF (the default), the SQL Server Database Engine rejects all rows in the statement when one or more rows contain duplicate key values. When set to ON, only the rows that contain duplicate key values are rejected; the nonduplicate key values are added.
For example, if a single statement inserts 20 rows into a table with a unique index, and 10 of those rows contain duplicate key values, by default all 20 rows are rejected. However, if the index option IGNORE_DUP_KEY is set to ON, only the 10 duplicate key values will be rejected; the other 10 nonduplicate key values will be inserted into the table.
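Applied to a two-column PK like the one described in the question, a sketch might look like this (table, column and index names are illustrative):

CREATE TABLE dbo.Staging (
  KeyCol1 INT NOT NULL,
  KeyCol2 INT NOT NULL,
  Payload VARCHAR(100) NULL
);

-- with IGNORE_DUP_KEY = ON, duplicate-key rows are discarded with a
-- warning instead of failing the whole insert batch
CREATE UNIQUE CLUSTERED INDEX UQ_Staging_Keys
  ON dbo.Staging (KeyCol1, KeyCol2)
  WITH (IGNORE_DUP_KEY = ON);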
You can up the FastLoadMaxInsertCommitSize to, say, 5k; this will speed up your inserts greatly. Then set the Error Output to redirect the rows: on the error output from there, send each batch of 5k rows that contains an error row to another Destination. (This next bit is from memory!) If you set that destination up not to use fast load, it will insert the good rows, and you can pass its error output to an error table or something like a row count task.
You can play with the FastLoadMaxInsertCommitSize figures until you find something that works well for you.

INSERT IGNORE Inserting 0 rows

When I attempt to run "INSERT IGNORE ..." in MySQL to add only one value to a table of several options, it refuses to work after the first insert. It says "Inserted rows: 0" and doesn't insert my new value into the database. I believe this is because there is already an entry with a "nothing" value, and MySQL doesn't allow the empty field to be duplicated. This seems like odd behavior (as long as the two inserts are not exactly the same), so I am wondering if there is some way to avoid it.
The two INSERTs do not have to be exactly the same; they just have to be the same for the primary key columns.
INSERT IGNORE will ignore an insert if there is already a row with the same primary key.
If you did INSERT instead of INSERT IGNORE, you would be getting an error (duplicate primary key).
If you want to instead update the existing row, you can use REPLACE.
Either way, there can be only one row for each primary key.
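For example, with a hypothetical options table:

CREATE TABLE options (
  name VARCHAR(32) NOT NULL PRIMARY KEY,
  value VARCHAR(32)
);

INSERT IGNORE INTO options (name, value) VALUES ('color', 'red');
-- 1 row affected

INSERT IGNORE INTO options (name, value) VALUES ('color', 'blue');
-- 0 rows affected: same primary key, so the row is silently skipped

INSERT IGNORE INTO options (name, value) VALUES ('size', 'blue');
-- 1 row affected: different primary key; the duplicate value is fine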

Performing an UPDATE or INSERT depending whether a row exists or not in MySQL

In MySQL, I'm trying to find an efficient way to perform an UPDATE if a row already exists in a table, or an INSERT if the row doesn't exist.
I've found two possible ways so far:
The obvious one: open a transaction, SELECT to find if the row exists, INSERT if it doesn't exist or UPDATE if it exists, commit transaction
first INSERT IGNORE into the table (so no error is raised if the row already exists), then UPDATE
The second method avoids the transaction.
Which one do you think is more efficient, and are there better ways (for example using a trigger)?
INSERT ... ON DUPLICATE KEY UPDATE
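For example (table and column names are illustrative, assuming a unique key on id):

INSERT INTO myTable (id, counter)
VALUES (1, 1)
ON DUPLICATE KEY UPDATE counter = counter + 1;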
You could also perform an UPDATE, check the number of rows affected, and if it's less than 1, it didn't find a matching row, so perform the INSERT.
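A sketch of that pattern, with the affected-row check done in application code (names are again illustrative):

UPDATE myTable SET counter = counter + 1 WHERE id = 1;
-- if the affected-row count (ROW_COUNT() in MySQL) is 0, no matching
-- row existed, so fall back to:
INSERT INTO myTable (id, counter) VALUES (1, 1);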
There is another way - REPLACE.
REPLACE INTO myTable (col1) VALUES (value1)
REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted. See Section 12.2.5, “INSERT Syntax”.
In MySQL there's a REPLACE statement that, I believe, does more or less what you want it to do.
REPLACE INTO would be a solution; it uses the UNIQUE index for replacing or inserting.
REPLACE INTO yourTable SET column = value;
Please be aware that this works differently from what you might expect; the REPLACE is meant quite literally. It first checks whether there is a UNIQUE index collision that would prevent an INSERT, removes (DELETEs) all rows which collide, and then INSERTs the row you've given it.
This, for example, leads to subtle problems such as triggers not firing (an UPDATE trigger never fires because no update ever occurs) or values reverting to their defaults (because you must specify all values).
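For example, with a hypothetical table where one column has a default:

CREATE TABLE accounts (
  id INT PRIMARY KEY,
  name VARCHAR(32),
  status VARCHAR(16) NOT NULL DEFAULT 'active'
);

INSERT INTO accounts (id, name, status) VALUES (1, 'alice', 'banned');
REPLACE INTO accounts SET id = 1, name = 'alice2';
-- the old row is deleted and a new one inserted; status silently
-- reverts to the default 'active' because it was not specified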
If you're doing a lot of these, it might be worth writing them to a file, and then using 'LOAD DATA INFILE ... REPLACE ...'
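A sketch of that approach (the file path, field format and table are assumptions):

LOAD DATA INFILE '/tmp/rows.csv'
REPLACE INTO TABLE myTable
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(col1, col2);
-- rows whose unique keys collide with existing rows replace them,
-- exactly like REPLACE INTO, but in one bulk operation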

Is MySQL handling one SQL query at a time?

If you have 100,000 users, is MySQL executing one SQL query at a time?
Because in my PHP code I check whether a certain row exists; if it doesn't, it creates one. If it does, it just updates the row counter.
It crossed my mind that perhaps 100 users could be checking whether the row exists at the same time, and when it doesn't, they would all create one row each.
If MySQL handles them sequentially, I know it won't be an issue: one user will check whether the row exists and, if not, create it. The next user will check whether it exists, and since it now does, just update the counter.
But if they all check at the same time whether it exists, and let's say it doesn't, then they each create a row, and the one-row-per-user structure of the table breaks.
Would be great if someone could shed some light on this topic.
Use a UNIQUE constraint or, if viable, make one of your data items the primary key, and the database server will prevent duplicate rows from being created. You can even use the "ON DUPLICATE KEY UPDATE ..." syntax to specify the alternate operation if the row already exists.
From your comments, it sounds like you could use the user_id as your primary key, in which case, you'd be able to use something like this:
INSERT INTO usercounts (user_id,usercount)
VALUES (id-goes-here,1)
ON DUPLICATE KEY UPDATE usercount=usercount+1;
If you put the check and the insert into a transaction with a locking read (SELECT ... FOR UPDATE), you can avoid this problem: the check and the create then run as one atomic unit, and there shouldn't be any confusion.
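A sketch of that approach using an explicit locking read (InnoDB assumed; in practice the ON DUPLICATE KEY UPDATE approach above is usually simpler):

START TRANSACTION;
-- lock the row (or the gap where it would go) so concurrent sessions
-- block until this transaction commits or rolls back
SELECT usercount FROM usercounts WHERE user_id = 42 FOR UPDATE;
-- in application code: if no row came back, INSERT, otherwise UPDATE
INSERT INTO usercounts (user_id, usercount) VALUES (42, 1);
-- or, if the row already existed:
-- UPDATE usercounts SET usercount = usercount + 1 WHERE user_id = 42;
COMMIT;
-- note: under heavy concurrency two sessions can still deadlock on the
-- gap lock; MySQL rolls one back, and the application should retry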