Removing duplicates with unique index - mysql

I copied rows between two tables with fields A, B, C, D, believing I had created a unique index on A, B, C, D to prevent duplicates. However, I had somehow created only a normal index on those columns, so duplicates got inserted. It is a 20 million record table.
If I change my existing index from normal to unique, or simply add a new unique index on A, B, C, D, will the duplicates be removed, or will adding it fail because duplicate records exist? I'd test it, but with this many records I don't want to mess the table up or duplicate it.

If you have duplicates in your table and you use
ALTER TABLE mytable ADD UNIQUE INDEX myindex (A, B, C, D);
the query will fail with Error 1062 (duplicate key).
But if you use IGNORE
-- (only works before MySQL 5.7.4)
ALTER IGNORE TABLE mytable ADD UNIQUE INDEX myindex (A, B, C, D);
the duplicates will be removed. But the documentation doesn't specify which row will be kept:
IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or
if warnings occur when strict mode is enabled. If IGNORE is not
specified, the copy is aborted and rolled back if duplicate-key errors
occur. If IGNORE is specified, only one row is used of rows with
duplicates on a unique key. The other conflicting rows are deleted.
Incorrect values are truncated to the closest matching acceptable
value.
As of MySQL 5.7.4, the IGNORE clause for ALTER TABLE is removed and
its use produces an error.
(ALTER TABLE Syntax)
If your version is 5.7.4 or later, you can:
Copy the data into a temporary table (it doesn't technically need to be temporary).
Truncate the original table.
Create the UNIQUE INDEX.
And copy the data back with INSERT IGNORE (which is still available).
CREATE TABLE tmp_data SELECT * FROM mytable;
TRUNCATE TABLE mytable;
ALTER TABLE mytable ADD UNIQUE INDEX myindex (A, B, C, D);
INSERT IGNORE INTO mytable SELECT * from tmp_data;
DROP TABLE tmp_data;
If you use the IGNORE modifier, errors that occur while executing the
INSERT statement are ignored. For example, without IGNORE, a row that
duplicates an existing UNIQUE index or PRIMARY KEY value in the table
causes a duplicate-key error and the statement is aborted. With
IGNORE, the row is discarded and no error occurs. Ignored errors
generate warnings instead.
(INSERT Syntax)
Also see: INSERT ... SELECT Syntax and Comparison of the IGNORE Keyword and Strict SQL Mode

If there are duplicates, adding the unique index will fail.
First check what duplicates there are:
select * from
(select a,b,c,d,count(*) as n from table_name group by a,b,c,d) x
where x.n > 1
This may be an expensive query on 20M rows, but it will get you all the duplicate keys that would prevent you from adding the unique index.
You could split this up into smaller chunks by adding a where clause in the subquery: where a='some_value'
For the records retrieved, you will have to change something to make the rows unique. Once that is done (the query returns 0 rows), you should be safe to add the unique index.
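If the duplicate rows are true copies and you would rather delete the extras than edit them, a self-join delete is one way to do it. This is only a sketch: it assumes the table has an AUTO_INCREMENT primary key column, here called `id`, which the question does not mention. It keeps the row with the lowest `id` in each group.

```sql
-- Assumption: table_name has a unique `id` column (hypothetical, not from the question).
-- Delete every row for which an older row with the same (a, b, c, d) exists.
DELETE t1
FROM table_name AS t1
JOIN table_name AS t2
  ON  t1.a = t2.a
  AND t1.b = t2.b
  AND t1.c = t2.c
  AND t1.d = t2.d
  AND t1.id > t2.id;

-- Once the duplicate check above returns 0 rows, the unique index can be added.
ALTER TABLE table_name ADD UNIQUE INDEX myindex (a, b, c, d);
```

Rows with NULL in any of the four columns are never matched by `=`, so they are left alone; a unique index allows multiple NULLs as well, so that should not block the ALTER. On a table of this size it is worth trying on a copy or in chunks first.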

Instead of IGNORE you can use ON DUPLICATE KEY UPDATE, which will give you control over which values should prevail.
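As a rough sketch of what that could look like when copying the data back: the column `E` below is hypothetical (a non-key column that is not in the question), and `tmp_data` is the copy table from the answer above.

```sql
-- Assumption: mytable already has the UNIQUE index on (A, B, C, D),
-- and E is a hypothetical non-key column whose value you want to control.
INSERT INTO mytable (A, B, C, D, E)
SELECT A, B, C, D, E
FROM tmp_data
ON DUPLICATE KEY UPDATE E = VALUES(E);  -- the value from the later duplicate wins
```

With INSERT IGNORE the first row inserted for a key is kept and later ones are dropped; here each later duplicate overwrites `E` instead. Any other expression can go on the right-hand side, for example keeping the larger of the two values. Note that the `VALUES()` function in this position is deprecated in recent MySQL 8.0 releases in favor of row aliases, so check the documentation for your version.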

To answer your question- adding a UNIQUE constraint on a column that has duplicate values will throw an error.
For example, you can try the following script:
CREATE TABLE `USER` (
`USER_ID` INT NOT NULL,
`USERNAME` VARCHAR(45) NOT NULL,
`NAME` VARCHAR(45) NULL,
PRIMARY KEY (`USER_ID`));
INSERT INTO USER VALUES(1,'apple', 'woz'),(2,'apple', 'jobs'),
(3,'google', 'sergey'),(4,'google', 'larry');
ALTER TABLE `USER`
ADD UNIQUE INDEX `USERNAME_UNIQUE` (`USERNAME` ASC);
/*
Operation failed: There was an error while applying the SQL script to the database.
ERROR 1062: Duplicate entry 'apple' for key 'USERNAME_UNIQUE'
*/

Related

AUTO_INCREMENT not working as expected [duplicate]

I've been using InnoDB for a project, and relying on auto_increment. This is not a problem for most of the tables, but for tables with deletion, this might be an issue:
AUTO_INCREMENT Handling in InnoDB
particularly this part:
AUTO_INCREMENT column named ai_col: After a server startup, for the first insert into a table t, InnoDB executes the equivalent of this statement:
SELECT MAX(ai_col) FROM t FOR UPDATE;
InnoDB increments by one the value retrieved by the statement and assigns it to the column and to the auto-increment counter for the table.
This is a problem because while it ensures that within the table, the key is unique, there are foreign keys to this table where those keys are no longer unique.
The mysql server does/should not restart often, but this is breaking. Are there any easy ways around this?
If you have a foreign key constraint, how can you delete a row from table A when table B references that row? That seems like an error to me.
Regardless, you can avoid the reuse of auto-increment values by resetting the offset when your application starts back up. Query for the maximum in all the tables that reference table A, then alter the table above that maximum, e.g. if the max is 989, use this:
alter table TableA auto_increment=999;
Also beware that different MySQL engines have different auto-increment behavior. This trick works for InnoDB.
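A minimal sketch of that startup step, assuming a single referencing table named `TableB` with a column `A_ID` (these names are placeholders). `ALTER TABLE ... AUTO_INCREMENT` needs a literal value, so the statement is built as a string and run as a prepared statement.

```sql
-- Assumption: TableB.A_ID is the only column referencing TableA.A_ID.
-- Find the highest id used anywhere and add 1.
SELECT GREATEST(
         COALESCE((SELECT MAX(A_ID) FROM TableA), 0),
         COALESCE((SELECT MAX(A_ID) FROM TableB), 0)
       ) + 1
INTO @next_id;

-- Build and run: ALTER TABLE TableA AUTO_INCREMENT = <next_id>
SET @sql = CONCAT('ALTER TABLE TableA AUTO_INCREMENT = ', @next_id);
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

If more tables reference TableA, add one more `COALESCE((SELECT MAX(...) ...), 0)` argument per table.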
So you have two tables:
TableA
A_ID [PK]
and
TableB
B_ID [PK]
A_ID [FK, TableA.A_ID]
And in TableB, the value of A_ID is not unique? Or is there a value in TableB.A_ID that is not in TableA.A_ID?
If you need the value of TableB.A_ID to be unique, then you need to add a UNIQUE constraint to that column.
Or am I still missing something?
Use a foreign key constraint with 'SET NULL' for updates and deletes.
Create another table with a column that remembers the last created Id. This way you don't have to take care of the max values in new tables that have this as foreign key.
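One possible shape for such a table, using the `LAST_INSERT_ID(expr)` form so that each connection gets its own value back. The table and column names (`id_sequence`, `table_name`, `last_id`) are made up for illustration, and the sketch assumes `TableA.A_ID` is filled in explicitly rather than by AUTO_INCREMENT.

```sql
-- Hypothetical sequence table: one row per table whose ids it hands out.
CREATE TABLE id_sequence (
  table_name VARCHAR(64)     NOT NULL PRIMARY KEY,
  last_id    BIGINT UNSIGNED NOT NULL
) ENGINE=InnoDB;

INSERT INTO id_sequence (table_name, last_id) VALUES ('TableA', 0);

-- Reserve the next id; LAST_INSERT_ID(expr) stores expr for this connection.
UPDATE id_sequence
SET last_id = LAST_INSERT_ID(last_id + 1)
WHERE table_name = 'TableA';

-- Use the reserved id for the new row.
INSERT INTO TableA (A_ID) VALUES (LAST_INSERT_ID());
```

Because the counter lives in an ordinary row, it survives a restart and is not affected by deleting rows from TableA.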
I checked.
alter table TableA auto_increment=1;
does NOT work.
And the reason I found in two documents
http://docs.oracle.com/cd/E17952_01/refman-5.1-en/innodb-auto-increment-handling.html
InnoDB uses the following algorithm to initialize the auto-increment counter for a table t that contains an AUTO_INCREMENT column named ai_col: After a server startup, for the first insert into a table t, InnoDB executes the equivalent of this statement:
SELECT MAX(ai_col) FROM t FOR UPDATE;
InnoDB increments the value retrieved by the statement and assigns it to the column and to the auto-increment counter for the table. By default, the value is incremented by one. This default can be overridden by the auto_increment_increment configuration setting.
and
http://docs.oracle.com/cd/E17952_01/refman-5.1-en/alter-table.html
You cannot reset the counter to a value less than or equal to any that have already been used.
This is the reason why alter table will not work. I think the only option is to wipe out the data and rewrite it in a new table with new ids.
In my case the table was a log file, so I just did:
RENAME TABLE SystemEvents To SystemEvents_old;
CREATE TABLE SystemEvents LIKE SystemEvents_old;

How to reseed an "Auto increment" column for InnoDB engine database?

I am using an artificial primary key for a table. The table had two columns, one is the primary key and the other one is a Dates (datatype: Date) column. When I tried to load bulk data from a file (which contained values for the second column only), the YYYY part of the dates were added to the primary key column (which was the first column in the table) and the rest of the date was truncated.
So I needed to reset the table. I tried it using the Truncate table statement, but it failed with an error because this table was referenced in the foreign key constraint of another table. So I had to do it using the delete from table; statement. I did delete all the records, but then when I inserted the records again (using the insert into statement this time), the ID continued from the value after the last year I had previously inserted (i.e. the counter was not reset).
NOTE:- I am using MySQL 5.5 and InnoDB engine.
MY EFFORT SO FAR:-
I tried ALTER TABLE table1 AUTO_INCREMENT=0; (Reference Second Answer) ---> IT DID NOT HELP.
I tried ALTER TABLE table1 DROP column; (Reference- answer 1) ---> Error on rename of table1
Deleted the table again and tried to do:
DBCC CHECKIDENT('table1', RESEED, 0);
(Reference) ---> Syntax error at "DBCC" - Unexpected INDENT_QUOTED
(This statement is right after the delete table statement, if that matters)
In this article, under the section named "Auto Increment Columns for INNODB Tables" and the heading "Update 17 Feb 2009:", it says that in InnoDB truncate does reset the AUTO_INCREMENT index in versions higher than MySQL 4.1... So I want some way to truncate my table, or do something else to reset the AUTO_INCREMENT index.
QUESTION:-
Is there a way to somehow reset the auto_increment when I delete the data in my table?
I need a way to fix the aforementioned DBCC CHECKIDENT error, or somehow truncate the table which has been referenced in a foreign key constraint of another table.
Follow the steps below:
Step 1: Truncate the table after disabling foreign key checks, then enable them again:
set foreign_key_checks=0;
truncate table mytable;
set foreign_key_checks=1;
Step 2: When bulk loading, list only the table columns that are present in your CSV file (leave out the auto-increment id and any other columns not in the file), and make sure the columns in the CSV are in the same order as that column list. The auto-increment id column should not be in your CSV file.
You can use the command below to load the data.
LOAD DATA LOCAL INFILE '/root/myfile.csv' INTO TABLE mytable fields terminated by ',' enclosed by '"' lines terminated by '\n' (field2,field3,field5);
Note: If you are working in a Windows environment, adjust the command accordingly.
You can only reset the auto increment value to 1 (not 0). Therefore, unless I am mistaken you are looking for
alter table a auto_increment = 1;
You can query the next auto increment value that will be used with
select auto_increment from information_schema.tables where
table_name='a' and table_schema=schema();
(Do not forget to replace 'a' with the actual name of your table).
You can play around with a test database (it is likely that your MySQL installation already has a database called test, otherwise create it using create database test;)
use test;
create table a (id int primary key auto_increment, x int); -- auto_increment = 1
insert into a (x) values (1), (42), (43), (12); -- auto_increment = 5
delete from a where id > 1; -- auto_increment = 5
alter table a auto_increment = 2; -- auto_increment = 2
delete from a;
alter table a auto_increment = 1; -- auto_increment = 1

MySQL: ALTER IGNORE TABLE gives "Integrity constraint violation"

I'm trying to remove duplicates from a MySQL table using ALTER IGNORE TABLE + an UNIQUE KEY. The MySQL documentation says:
IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.
When I run the query ...
ALTER IGNORE TABLE table ADD UNIQUE INDEX dupidx (field)
... I still get the error #1062 - Duplicate entry 'blabla' for key 'dupidx'.
The IGNORE keyword extension seems to have a bug with InnoDB in some versions of MySQL.
You could always convert to MyISAM, add the index with IGNORE, and then convert back to InnoDB:
ALTER TABLE table ENGINE MyISAM;
ALTER IGNORE TABLE table ADD UNIQUE INDEX dupidx (field);
ALTER TABLE table ENGINE InnoDB;
Note: if you have foreign key constraints this will not work; you will have to remove those first and add them back later.
Or try set session old_alter_table=1 (don't forget to set it back!)
See: http://mysqlolyk.wordpress.com/2012/02/18/alter-ignore-table-add-index-always-give-errors/
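Spelled out, the old_alter_table route might look like the sketch below. It uses a placeholder table name `mytable` and applies only to versions that still accept ALTER IGNORE (before 5.7.4). It also assumes the variable was at its default of OFF beforehand.

```sql
-- Force the older table-copy ALTER algorithm for this session only.
SET SESSION old_alter_table = 1;

-- With the copy algorithm, IGNORE drops the conflicting rows instead of failing.
ALTER IGNORE TABLE mytable ADD UNIQUE INDEX dupidx (field);

-- Restore the setting (assumed default: OFF).
SET SESSION old_alter_table = 0;
```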
The problem is that you have duplicate data in the field you're trying to index. You'll need to remove the offending duplicates before you can add a unique index.
One way is to do the following:
CREATE TABLE tmp_table LIKE table;
ALTER IGNORE TABLE tmp_table ADD UNIQUE INDEX dupidx (field);
INSERT IGNORE INTO tmp_table SELECT * FROM table;
DROP TABLE table;
RENAME TABLE tmp_table TO table;
This allows you to insert only the unique data into the table.

In MySQL, how do I write a query to skip a duplicate row while inserting, when there's no unique field (primary key)?

I am trying to insert rows into a table that has no unique field or primary key. How can I write a query that will simply ignore the insert if there already exists a row with the exact same values on all fields -- a duplicate row?
Thanks.
You must have a primary key or unique key defined on some column or columns in the table for uniqueness to have any meaning. Every mechanism for detecting duplicates automatically relies on this being true.
You can't do the SELECT COUNT(*)... solution because it's subject to race conditions. That is, someone could insert a duplicate row in the moment after you select and before you insert. The only way around this is to lock the table with SELECT ... FOR UPDATE or LOCK TABLES.
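A small sketch of the unique-key route, assuming a hypothetical table `my_table` with just two columns, `field1` and `field2`. The index covers every column, so a duplicate row becomes a duplicate key that INSERT IGNORE can skip.

```sql
-- Assumption: my_table(field1, field2) currently holds no duplicate rows;
-- otherwise this ALTER fails with error 1062 and they must be removed first.
ALTER TABLE my_table ADD UNIQUE INDEX uniq_all_fields (field1, field2);

-- The first insert adds the row; the second is skipped with a warning.
INSERT IGNORE INTO my_table (field1, field2) VALUES ('x', 'y');
INSERT IGNORE INTO my_table (field1, field2) VALUES ('x', 'y');
```

Two caveats: a unique index does not treat NULLs as equal, so rows containing NULL can still repeat, and very wide or long columns may exceed the index key length limit.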
Uh, why not make a primary key?
Otherwise, you basically have to run SELECT COUNT(*) FROM table WHERE field1=value AND ... AND fieldN=value before EVERY insert.

MySQL INSERT IGNORE not working

Here's my table with some sample data
a_id | b_id
------------
1 225
2 494
3 589
When I run this query
INSERT IGNORE INTO table_name (a_id, b_id) VALUES ('4', '230'), ('2', '494')
It inserts both rows, when it's supposed to ignore the second value pair (2, 494).
No indexes are defined, and neither of those columns is a primary key.
What don't I know?
From the docs:
If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row still is not inserted, but no error is issued.
(my italics).
Your row is not duplicating "an existing UNIQUE index or PRIMARY KEY value" since you have no primary key nor any unique constraints.
If, as you mention in one of your comments, you want neither field to be unique but you do want the combination to be unique, you need a composite primary key across both columns (get rid of any duplicates first):
alter table MYTABLE add primary key (a_id,b_id)
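To see the difference on the sample data, here is a self-contained sketch using a throwaway table called `pairs` (the name is made up) that is created with the composite key from the start:

```sql
-- Hypothetical test table with a composite primary key on both columns.
CREATE TABLE pairs (
  a_id INT NOT NULL,
  b_id INT NOT NULL,
  PRIMARY KEY (a_id, b_id)
);

INSERT INTO pairs (a_id, b_id) VALUES (1, 225), (2, 494), (3, 589);

-- (4, 230) is new and is inserted; (2, 494) already exists and is skipped.
INSERT IGNORE INTO pairs (a_id, b_id) VALUES (4, 230), (2, 494);

-- Returns four rows: (1, 225), (2, 494), (3, 589), (4, 230).
SELECT a_id, b_id FROM pairs ORDER BY a_id;
```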
If you don't add a UNIQUE constraint or set a PRIMARY KEY, MySQL won't know that your new entry is a duplicate.
If there is no primary key, there can't be a duplicate key to ignore. You should always set a primary key, so please do that, and if you have additional columns that shouldn't contain duplicates, set them as "unique".
If I understand you correctly, after you run the insert command your table looks like this
1 225
2 494
3 589
4 230
2 494
If so, then the answer is because your table design allows duplicates.
If you want to prevent the second record from being inserted, you'll need to define the a_id column as a primary key or give it a unique index. If you do, the insert ignore statement will work as you expect it to, i.e. insert the records and ignore errors such as trying to add a duplicate record.