Is there a performance difference between using a unique constraint and a trigger to prevent duplicate rows in MySQL?
Benchmark it. But really, the outcome should be obvious.
As the unique index exists specifically to enforce this constraint, it should be the first choice.
With a trigger you would have to perform additional operations just to check whether the row already exists (a full table scan versus an index lookup, unless you also add a non-unique index on the column), and then react to the result accordingly. So unless you are trying to do something else as well (logging the failed attempt, maybe), these are unnecessary steps.
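For illustration, a minimal sketch of the two approaches (table, column, and trigger names here are hypothetical):

-- Approach 1: let a unique index enforce the constraint
CREATE TABLE emails_unique (
    id    INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    UNIQUE KEY uq_email (email)
);

-- Approach 2: enforce it manually in a BEFORE INSERT trigger
CREATE TABLE emails_trigger (
    id    INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) NOT NULL
);

DELIMITER //
CREATE TRIGGER trg_no_dupe_email
BEFORE INSERT ON emails_trigger
FOR EACH ROW
BEGIN
    -- without an index on email, this lookup is a full table scan on every insert
    IF EXISTS (SELECT 1 FROM emails_trigger WHERE email = NEW.email) THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Duplicate email';
    END IF;
END//
DELIMITER ;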
I have a very large table (dozens of millions of rows), and a UNIQUE index needs to be added to a column on that table. I know for a fact that the table contains duplicate values on that key, which I need to clean up (by deleting rows or resetting the value of the column to something unique that I can generate automatically). A plus is that the rows which are already duplicated do not get modified anymore.
What would be the right approach to perform a change like this, given that I will probably be using the Percona pt-osc tool and there are continuous deletes/inserts on the table? My plan was:
Add code that ensures no dupe IDs get inserted anymore. I will probably need to add a separate table for this temporarily, since I want the database to enforce this for me and not the application: insert into the "shadow table" (which has a unique index) in the same transaction as the insert into my main table, and roll back any insert that tries to insert a duplicate value (see the sketch after this list)
Backfill the table by zapping all invalid column values which are within the primary key range below $current_pkey_value
Then add the index and use pt-osc to change over the table
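A rough sketch of the shadow-table part of step 1 (all table and column names are placeholders I made up):

-- shadow table whose only job is to enforce uniqueness during the migration
CREATE TABLE main_table_dedup_guard (
    the_column VARCHAR(64) NOT NULL PRIMARY KEY
);

START TRANSACTION;
-- fails with a duplicate-key error if the value has been seen before
INSERT INTO main_table_dedup_guard (the_column) VALUES ('some-value');
INSERT INTO main_table (the_column, other_column) VALUES ('some-value', 'payload');
COMMIT;
-- on a duplicate-key error from the guard insert, the application issues ROLLBACK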
Is there anything I am missing?
Since we use pt-online-schema-change, triggers perform the synchronisation from the existing table to a temp table. The tool actually has a special option for this, --no-check-unique-key-change, which does exactly what we need: it agrees to perform the ALTER TABLE and sets up the triggers in such a way that, if a conflict occurs, INSERT .. IGNORE is applied and the first row to have used the now-unique value wins during synchronisation. For us this is a good tradeoff, because all the duplicates we have seen resulted from data races, not from actual conflicts in the value-generation process.
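An invocation along these lines enables that behaviour (database, table, column, and index names are placeholders):

pt-online-schema-change \
  --alter "ADD UNIQUE INDEX uniq_the_column (the_column)" \
  --no-check-unique-key-change \
  --execute \
  D=mydb,t=main_table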
I have a table that serves as a foreign-key lookup from another table. The table is very simple, containing an ID column, which is the primary key, and a JSON column. I wish to remove abandoned entries from this table.
I tried running this script:
DELETE FROM `ate`.`test_configuration`
WHERE `ate`.`test_configuration`.`ID` NOT IN
    (SELECT DISTINCT `ate`.`index`.`TestID` FROM `ate`.`index`);
But I encountered an error stating that I wasn't using a WHERE clause that uses the key column:
Error Code: 1175. You are using safe update mode and you tried to update a table without a WHERE that uses a KEY column To disable safe mode, toggle the option in Preferences -> SQL Editor and reconnect.
This is confusing, as my WHERE clause does use the primary key column. I am aware that I can disable safe mode as part of my script as a workaround, but I would still like to understand why I'm getting this error. I'd like to avoid unsafe updates if possible.
I believe the optimizer is simply unable to use the index effectively for such a query, so it does a full table scan.
How many rows are in test_configuration, and how many of them will be deleted?
(You might try index hints to force the optimizer to use the index for the query; I'm just not sure whether they are supported in your version of MySQL.)
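Two possible workarounds, using the schema from the question (an untested sketch): rewrite the NOT IN subquery as an anti-join, whose join condition references the key column directly, or disable safe update mode for this session only.

DELETE tc
FROM `ate`.`test_configuration` AS tc
LEFT JOIN `ate`.`index` AS i ON i.`TestID` = tc.`ID`
WHERE i.`TestID` IS NULL;

-- or, for this session only:
SET SQL_SAFE_UPDATES = 0;
-- ... run the original DELETE ...
SET SQL_SAFE_UPDATES = 1;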
Update: After a lot of painful research, I've discovered what the problem actually is and have updated the title to make a little more sense. I'll put my answer below.
Unfortunately, I'm not able to copy the query that's giving me this problem because it belongs to my company, so I'll have to keep my question very specific.
I have an INSERT INTO ... SELECT query that's returning this error:
Duplicate entry <gobbledygook> for key 'idx_<tablename>'
The tablename at the end is the correct name, but it has this weird idx_ prefix before it that's not a part of any of the tables I'm currently working with. What is that idx? Does it have something to do with the information_schema?
Update: Apparently, I need to clarify something: There is no column with idx in the name.
Numerous web searches didn't reveal much while I was trying to solve this problem, but I did finally figure it out (and JohnH's answer helped me get there).
I finally discovered that "idx" is not something created by MySQL, but a name that someone else gave to the index. I had never come across a uniqueness constraint on an index that wasn't a key before, so I didn't know where that error came from.
This command showed all of the indices:
SHOW INDEX FROM <tablename>
And I was able to see that Non_unique was set to 0 for this key.
To fix the problem, I was able to simply drop the index and recreate it without adding a uniqueness constraint:
DROP INDEX idx_<tablename> ON <tablename>;
ALTER TABLE <tablename> ADD INDEX idx_<tablename> (<comma-separated columns>);
Whether or not removing the uniqueness constraint is a good idea remains to be seen, but it's also beyond the scope of this question.
"idx_" is a common prefix for index names.
You may have an index that does not allow duplicate values for the column values referenced by that index.
In my case the unique index had duplicate entries even though the column being indexed didn't. I can only think this was caused by a bug. The solution was:
Stop the service that writes to the db
Drop the index
Recreate the index
(Do the operation that was previously failing)
Start the service
If you are dropping and recreating an index, it's important that nothing gets an opportunity to insert a duplicate entry while you are doing so. This is why I stopped the service that writes to the db.
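One way to shrink (though not necessarily eliminate) that window is to drop and recreate the index in a single ALTER TABLE statement; table, index, and column names here are placeholders:

ALTER TABLE my_table
    DROP INDEX idx_my_table,
    ADD UNIQUE INDEX idx_my_table (col_a, col_b);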
I'd like to ask a question regarding unique columns in MySQL.
I would like to ask the experts which is the better way to approach this problem, and what the advantages or disadvantages are, if any.
Set a varchar column as unique
Do a SQL INSERT IGNORE
If affected rows > 0, proceed with running the code
versus
Leave the varchar column as non-unique
Do a search query to look for an identical value
If no rows are returned by the query, do a SQL INSERT
Proceed with running the code
Neither of the two approaches is good.
You don't do INSERT IGNORE, nor do you search first. The searching part is unreliable because it fails under concurrency and compromises integrity. Imagine this scenario: you and I try to insert the same info into the database, and we connect at the same time. The code in question determines that there's no such record in the database, for both of us. We both insert the same data. Since your column isn't unique, we end up with two identical records, and your integrity has now failed.
What you do is set the column to unique, insert, and catch the exception in the language of your choice.
MySQL will fail in the case of a duplicate record, and any proper MySQL driver will interpret this as an exception.
Since you haven't mentioned what the language is, it's difficult to move forward with examples.
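As a language-agnostic illustration, this is what the driver sees on the SQL side (table and column names are hypothetical; the exact key name in the message varies by MySQL version):

CREATE TABLE users (
    id    INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    UNIQUE KEY uq_email (email)
);

INSERT INTO users (email) VALUES ('a@example.com');  -- succeeds
INSERT INTO users (email) VALUES ('a@example.com');
-- ERROR 1062 (23000): Duplicate entry 'a@example.com' for key 'uq_email'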
Defining a column as a unique index has a few advantages. First of all, when you define it as a "unique index", MySQL can optimize the index for unique values (the same as for a primary key): it doesn't have to check whether there are more rows with the same value, so it can use an optimized algorithm for lookups.
You are also assured that there will never be a duplicate entry in your database, instead of having to handle this in multiple places in your code.
When you don't define it as UNIQUE, you first need to check whether a record exists in your table and then insert something, which requires two queries (and possibly even a full table lock) instead of one; this decreases performance and is more error prone.
http://dev.mysql.com/doc/refman/5.0/en/constraint-primary-key.html
I'm leaving aside the fact that you could use INSERT IGNORE, which IGNORES the error when the entry already exists in the database (still, you could use it for high-performance operations, maybe in some special case). A normal INSERT will give you feedback when an entry already exists.
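To make the difference concrete (reusing the hypothetical users table with a unique email column from the answer above):

-- INSERT IGNORE swallows the duplicate: 0 rows affected, 1 warning
INSERT IGNORE INTO users (email) VALUES ('a@example.com');

-- a plain INSERT reports error 1062, which your driver can surface
-- as an exception or a failed query that your code can react to
INSERT INTO users (email) VALUES ('a@example.com');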
Putting a constraint like UNIQUE on the column is better for query performance and data reliability, but there is also a trade-off when it comes to writes, so it's up to you which you prefer. In your case, since you would otherwise do an INSERT-if-not-exists query anyway, I think it's better to just use the constraint.
If you've got 100,000 users, does MySQL execute one SQL query at a time?
In my PHP code I check whether a certain row exists; if it doesn't, the code creates one, and if it does, it just updates the row counter.
It crossed my mind that perhaps 100 users might check whether the row exists at the same time, and when it doesn't, they would all create one row each.
If MySQL handles them sequentially, I know it won't be an issue: one user will check whether the row exists and, if not, create it. The next user will check whether it exists, and since it now does, will just update the counter.
But if they all check whether it exists at the same time, and let's say it doesn't, then they all create one row each and the integrity of the table fails.
It would be great if someone could shed some light on this topic.
Use a UNIQUE constraint or, if viable, make one of your data items the primary key, and the SQL server will prevent duplicate rows from being created. You can even use the ON DUPLICATE KEY UPDATE ... syntax to specify the alternative operation if the row already exists.
From your comments, it sounds like you could use the user_id as your primary key, in which case, you'd be able to use something like this:
INSERT INTO usercounts (user_id,usercount)
VALUES (id-goes-here,1)
ON DUPLICATE KEY UPDATE usercount=usercount+1;
If you put the check and the insert into a transaction and use a locking read (SELECT ... FOR UPDATE), you can avoid this problem. This way, the check and the create run as one atomic unit and there shouldn't be any confusion.
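A minimal sketch of that pattern, assuming InnoDB and the usercounts table from the answer above:

START TRANSACTION;
-- the locking read serializes concurrent sessions on this key
SELECT usercount FROM usercounts WHERE user_id = 42 FOR UPDATE;
-- if the SELECT returned no row:
INSERT INTO usercounts (user_id, usercount) VALUES (42, 1);
-- otherwise:
-- UPDATE usercounts SET usercount = usercount + 1 WHERE user_id = 42;
COMMIT;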