Remove duplicates from table - mysql

I have a table with only a primary key column and a text column. The text column has duplicate values and I want those gone.
What I have tried
I googled around a bit and quickly found what I thought was the answer, which was this:
ALTER IGNORE TABLE tablename ADD UNIQUE INDEX index_name (column1);
But when I executed the query, MySQL responded with: "#1062 - Duplicate entry 'v&d' for key 'remove_duplicates'". After fiddling with it for a while, I concluded that it didn't work.
After that I tried creating a tmp table and removing the old one, but I couldn't get that going either. I may have gotten the syntax wrong which was:
CREATE table `tmp` like `Tag`
alter table tmp add unique (text)
INSERT IGNORE INTO `tmp` SELECT * FROM `Tag`
RENAME TABLE `Tag` TO `deleteme`
RENAME TABLE `tmp` TO `Tag`
DROP TABLE `deleteme`;
What do I want
A table that has no duplicate values for column 'text'. If anyone sees any errors with my previous methods please let me know, or if you think it should/could be done in a different way please let me know!
Edit
I forgot to mention that I also have a relation hanging on the PK (yeah, quite important, I know). Is there some way to "preserve" the relation with the other table as well? I could manually change the ids in the other table if need be, but a way to update those automatically would be great.

Remove the duplicates at the point where you insert into the new table:
CREATE TABLE `tmp` LIKE `Tag`;
ALTER TABLE `tmp` ADD UNIQUE (text);
INSERT INTO `tmp` SELECT MIN(pk), text FROM `Tag` GROUP BY text;
RENAME TABLE `Tag` TO `deleteme`;
RENAME TABLE `tmp` TO `Tag`;
DROP TABLE `deleteme`;
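For the relation mentioned in the question's edit, one possible approach is to repoint the referencing rows before removing the duplicates, so nothing is left dangling. The sketch below is only an illustration: the child table `ItemTag` and its column `tag_id` are hypothetical names, and `Tag` is assumed to have the columns `id` and `text`.

```sql
-- Assumption: ItemTag(tag_id) references Tag(id); both names are hypothetical.

-- 1. Point every child row at the surviving (lowest) id for its text value.
UPDATE ItemTag AS it
JOIN Tag AS t ON t.id = it.tag_id
JOIN (
    SELECT `text`, MIN(id) AS keep_id
    FROM Tag
    GROUP BY `text`
) AS k ON k.`text` = t.`text`
SET it.tag_id = k.keep_id
WHERE it.tag_id <> k.keep_id;

-- 2. Delete the duplicate rows, which are no longer referenced.
DELETE t
FROM Tag AS t
JOIN (
    SELECT `text`, MIN(id) AS keep_id
    FROM Tag
    GROUP BY `text`
) AS k ON k.`text` = t.`text`
WHERE t.id <> k.keep_id;

-- 3. Prevent new duplicates from being inserted.
ALTER TABLE Tag ADD UNIQUE INDEX uq_tag_text (`text`);
```

If the child table has its own unique key that includes `tag_id`, step 1 may itself create duplicates there, so that case would need to be handled first.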

This is how I would do it on a fairly large table. I assume you have a column "id".
ALTER TABLE Tag ADD UNIQUE INDEX text_id (text, id);
This creates a unique index on (text, id), so the next queries should run faster.
Then, if you would like to know how many duplicates you had in the table Tag:
SELECT COUNT(*) AS total, COUNT(*) - COUNT(DISTINCT text) AS duplicates FROM Tag;
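To see which values are duplicated, rather than only how many, a grouped query along these lines may help. It is a sketch that assumes the same `Tag(id, text)` layout as above.

```sql
-- List each duplicated text value with its number of copies
-- and the lowest and highest id that carry it.
SELECT `text`,
       COUNT(*) AS copies,
       MIN(id)  AS first_id,
       MAX(id)  AS last_id
FROM Tag
GROUP BY `text`
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```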
To keep the first row for each value (if FIFO is important), run Gordon Linoff's query:
INSERT INTO `tmp` SELECT MIN(id), text FROM `Tag` GROUP BY text;
To keep the last row for each value (if LIFO is important), run:
INSERT INTO `tmp` SELECT MAX(id), text FROM `Tag` GROUP BY text;
Because of the covering index, the copy should be pretty quick, as long as the server doesn't need to create an on-disk temporary table...
RENAME TABLE `Tag` TO `deleteme`;
RENAME TABLE `tmp` TO `Tag`;
DROP TABLE `deleteme`;

Related

Unique Column in mysql database

I'm facing a problem and I'm not sure it has a solution, so if anyone knows one, please suggest it.
I have a column in my table that contains a certain records; some of those records are duplicated and I want to insert some new records into my table, but I wish for the new records to not be duplicated. So, basically I want to control when the data can be duplicated and when not.
I've tried this, but it does not work:
ALTER TABLE MyTable DROP PRIMARY KEY
ALTER TABLE MyTable
ADD PRIMARY KEY (`S.No`),
ADD UNIQUE KEY `PCID_uk` (`PCID`),
ADD UNIQUE KEY `USERNAME_uk` (`USERNAME`)
"some of those records are duplicated and I want to insert some new records into my table, but I wish for the new records to not be duplicated"
Constraints are meant to guarantee integrity over the whole table, so what you ask for is not straightforward, but it is still possible.
The idea is to create a new column with a default value of 1, and then feed it using row_number() (available in MySQL 8.0). Assuming that the primary key of your table is id, and that you want to enforce partial uniqueness on column col, that would look like:
alter table mytable add col_rn int default 1;
update mytable t
inner join (
select id, row_number() over(partition by col order by id) rn
from mytable
) t1 on t1.id = t.id
set t.col_rn = t1.rn;
With this setup in place, you can create the following unique constraint:
alter table mytable add constraint unique_col_rn unique (col, col_rn);
Now you can insert new records in your table, not providing values for col_rn, so it defaults to 1. If a record already exists for col, the unique constraint raises an error.
insert into mytable (col) values (...);
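To illustrate how this gives control over when a duplicate is allowed, here is a small sketch. It assumes `id` is auto-generated and uses the made-up value 'abc'; the second statement is not part of the answer above, just one way to allow a duplicate deliberately by picking the next free `col_rn`.

```sql
-- Strict insert: relies on the default col_rn = 1, so it is rejected
-- with a duplicate-key error if a row ('abc', 1) already exists.
INSERT INTO mytable (col) VALUES ('abc');

-- Deliberate duplicate: take the next unused col_rn for this value.
INSERT INTO mytable (col, col_rn)
SELECT 'abc', COALESCE(MAX(col_rn), 0) + 1
FROM mytable
WHERE col = 'abc';
```

If two sessions run the second statement at the same time they may compute the same `col_rn`; the unique constraint would then reject one of them, and that insert can simply be retried.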

How to delete the duplicate record from table in mysql

I want to delete duplicate records from table. The table does not have primary key (ID), so answers like this are not suitable.
Here is my try:
DELETE FROM afscp_permit
USING afscp_permit, afscp_permit AS vtable
WHERE (NOT afscp_permit.field_name=vtable.field_name)
AND (afscp_permit.field_name=vtable.field_name)
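The following might help, provided there is no bug in your DBMS and no constraints are in the way (note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so it only works on older versions):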
ALTER IGNORE TABLE afscp_permit ADD UNIQUE INDEX field_name_index (field_name );
How about creating a temporary table with the same columns and doing:
INSERT INTO temp SELECT DISTINCT * FROM afscp_permit;
DELETE FROM afscp_permit;
INSERT INTO afscp_permit SELECT * FROM temp;
DROP TABLE temp;
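A slightly fuller sketch of the same idea, including the creation of the temporary table, could look like this. It assumes nothing else writes to `afscp_permit` while it runs and that the table uses a transactional engine such as InnoDB, so that the DELETE and re-INSERT can be rolled back together. Note that `SELECT DISTINCT *` only collapses rows that are identical in every column.

```sql
-- Copy one instance of each fully identical row into a temporary table.
CREATE TEMPORARY TABLE temp AS
SELECT DISTINCT * FROM afscp_permit;

-- Optional sanity check before touching the original data.
SELECT (SELECT COUNT(*) FROM afscp_permit) AS rows_before,
       (SELECT COUNT(*) FROM temp)         AS rows_after;

-- Swap the contents inside one transaction (assumes InnoDB).
START TRANSACTION;
DELETE FROM afscp_permit;
INSERT INTO afscp_permit SELECT * FROM temp;
COMMIT;

DROP TEMPORARY TABLE temp;
```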

Error 156: table already exists when adding a primary key

I'm using the following statement ALTER TABLE my_tbl ADD PRIMARY KEY (id ); to add a primary key to an existing MySQL table. In reply I'm getting the error:
Error 156 : Table 'db_name.my_tbl#1' already exists.
I checked and the table has no duplicate id entries, and if I do something like DROP TABLE my_tbl#1 then the original table (my_tbl) is deleted. It's perhaps interesting to note that my_tbl was created by Create Table my_tbl SELECT id, ... FROM tmp_tbl (where tmp_tbl is a temporary table).
Does anyone have an idea what's going on here?
Update: there seems to be some kind of orphaned table situation here. I tried the suggestions in the answers below, but in my case they did not resolve the problem. I finally used a workaround: I created a table with a different name (e.g. my_tbl_new), copied the data into it, and added the primary key to it. I then deleted the original table and renamed the new one back to my_tbl.
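Try something like this (the DROP PRIMARY KEY part only applies if the table already has a primary key):
ALTER TABLE my_tbl DROP PRIMARY KEY, ADD PRIMARY KEY (id);
Or try this (note that this is SQL Server syntax and would need adapting before it runs on MySQL):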
IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS WHERE CONSTRAINT_TYPE = 'PRIMARY KEY'
AND TABLE_NAME = '[my_tbl]'
AND TABLE_SCHEMA ='dbo' )
BEGIN
ALTER TABLE [dbo].[my_tbl] ADD CONSTRAINT [PK_ID] PRIMARY KEY CLUSTERED ([ID])
END
Or try to flush the table like this:
DROP TABLE IF EXISTS `my_tbl` ;
FLUSH TABLES `my_tbl` ;
CREATE TABLE `my_tbl` ...
Also it might be a permission issue.
I had the same problem while trying to alter indexes, through SQLyog, when my database name contained "-" chars. So I renamed the database to not have them and then it worked just fine.
(Since there's no direct way to rename a DB, I had to copy it to a new one, with correct name)

Performance concerns regarding ALTER TABLE ADD COLUMN

I have a big MySQL InnoDB table having 5 million rows. I need to add a column to the table which will have a default int value.
What is the best way to do it? The normal alter table command appears to take a lot of time. Is there any better way to do it? Basically I want to know if there is any faster way or efficient way of doing it.
And if the table has foreign key references, is there any way other than alter table to do this?
Any help appreciated.
I would not say this is a better way, but ... You could create a separate table for the new data and set it up as foreign key relationship to the existing table. That would be "fast", but if the data really belongs in the main table and every (or most) existing records will have a value, then you should just alter the table and add it.
Suppose the table looked like this:
CREATE TABLE mytable
(
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(25),
PRIMARY KEY (id),
KEY name (name)
);
and you want to add an age column with
ALTER TABLE mytable ADD COLUMN age INT NOT NULL DEFAULT 0;
You could perform the ALTER TABLE in stages as follows:
CREATE TABLE mytablenew LIKE mytable;
ALTER TABLE mytablenew ADD COLUMN age INT NOT NULL DEFAULT 0;
INSERT INTO mytablenew (id,name) SELECT id,name FROM mytable;
ALTER TABLE mytable RENAME mytableold;
ALTER TABLE mytablenew RENAME mytable;
DROP TABLE mytableold;
If mytable uses the MyISAM Storage Engine and has nonunique indexes, add two more lines (DISABLE KEYS and ENABLE KEYS):
CREATE TABLE mytablenew LIKE mytable;
ALTER TABLE mytablenew ADD COLUMN age INT NOT NULL DEFAULT 0;
ALTER TABLE mytablenew DISABLE KEYS;
INSERT INTO mytablenew (id,name) SELECT id,name FROM mytable;
ALTER TABLE mytable RENAME mytableold;
ALTER TABLE mytablenew RENAME mytable;
DROP TABLE mytableold;
ALTER TABLE mytable ENABLE KEYS;
This will let you see how many seconds each stage takes. From here, you can decide whether or not a straightforward ALTER TABLE is better.
This technique gets a little gory if there are foreign key references.
Your steps would be
SET UNIQUE_CHECKS = 0;
SET FOREIGN_KEY_CHECKS = 0;
Drop the foreign key references in mytable.
Perform the ALTER TABLE in Stages
Create the foreign key references in mytable.
SET UNIQUE_CHECKS = 1;
SET FOREIGN_KEY_CHECKS = 1;
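Written out as SQL, the steps above might look roughly like the following. This is a sketch under assumptions: it reads "foreign key references" as constraints in another table that point at `mytable`, and the child table `child_tbl`, its column `mytable_id`, and the constraint name `fk_child_mytable` are all hypothetical.

```sql
SET UNIQUE_CHECKS = 0;
SET FOREIGN_KEY_CHECKS = 0;

-- Drop the foreign key that points at mytable (hypothetical names).
ALTER TABLE child_tbl DROP FOREIGN KEY fk_child_mytable;

-- Perform the ALTER TABLE in stages.
CREATE TABLE mytablenew LIKE mytable;
ALTER TABLE mytablenew ADD COLUMN age INT NOT NULL DEFAULT 0;
INSERT INTO mytablenew (id, name) SELECT id, name FROM mytable;
ALTER TABLE mytable RENAME mytableold;
ALTER TABLE mytablenew RENAME mytable;
DROP TABLE mytableold;

-- Recreate the foreign key against the new table.
ALTER TABLE child_tbl
    ADD CONSTRAINT fk_child_mytable
    FOREIGN KEY (mytable_id) REFERENCES mytable (id);

SET UNIQUE_CHECKS = 1;
SET FOREIGN_KEY_CHECKS = 1;
```

As far as I know, a foreign key added while FOREIGN_KEY_CHECKS is 0 is not validated against existing rows, so it may be worth checking for orphaned child rows separately.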
Give it a Try !!!

How to do a MySQL Insert if unique, but the columns are too long for unique index

I have found a great answer for when inserting a new record, ignore if the data already exists.
1) Create a UNIQUE INDEX on the columns.
2) INSERT IGNORE INTO ...
But my problem is that one of the columns is a VARCHAR(2000)**, and MySQL has a 1000-character limit to indexes.
The columns are: id (int), type (varchar 35), data (varchar 2000)
So is there a way to make sure I'm not adding the same data twice with a single query? Or do I need to do a select first to check for existence and if false, perform the insert?
Thanks.
** This is not design, I'm just moving data around so no chance of making this column smaller.
Given the table design you mentioned:
CREATE TABLE mydb.mytable
(
id int not null auto_increment,
type varchar(35),
data varchar(2000),
primary key (id)
);
Your best chance would be the following:
CREATE TABLE mydb.mytable_new LIKE mydb.mytable;
ALTER TABLE mydb.mytable_new ADD COLUMN data_hash CHAR(40);
ALTER TABLE mydb.mytable_new ADD UNIQUE INDEX (data_hash);
INSERT INTO mydb.mytable_new (id,type,data,data_hash)
SELECT id,type,data,UPPER(SHA(data)) FROM mydb.mytable;
ALTER TABLE mydb.mytable RENAME mydb.mytable_old;
ALTER TABLE mydb.mytable_new RENAME mydb.mytable;
DROP TABLE mydb.mytable_old;
Once you add this new column and index, table should now look like this:
CREATE TABLE mydb.mytable
(
id int not null auto_increment,
type varchar(35),
data varchar(2000),
data_hash char(40),
primary key (id),
unique key data_hash (data_hash)
);
Simply perform your operations as follows:
INSERTs
INSERT INTO mydb.mytable (type,data,data_hash)
VALUES ('somtype','newdata',UPPER(SHA('newdata')));
INSERTs should fail on data_hash if you attempt a duplicate key insertion.
SELECTs
SELECT * FROM mydb.mytable
WHERE data_hash = UPPER(SHA('data_I_am_searching_for'));
Give it a Try !!!
Hashes can collide, so unless that risk is acceptable to you, this is not a foolproof solution.
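Tying this back to the INSERT IGNORE approach from the question, a sketch with the data_hash column could look like the following. The literal values are made up for illustration, and the extra comparison on `data` in the SELECT is one possible guard against a hash collision returning the wrong row.

```sql
-- Skip the row silently if the same hash is already stored.
INSERT IGNORE INTO mydb.mytable (type, data, data_hash)
VALUES ('sometype', 'newdata', UPPER(SHA('newdata')));

-- Should report 1 if the row was inserted and 0 if it was ignored.
SELECT ROW_COUNT();

-- Look up by hash (uses the unique index), then compare the full value
-- so that a colliding hash cannot match a different data string.
SELECT *
FROM mydb.mytable
WHERE data_hash = UPPER(SHA('newdata'))
  AND data = 'newdata';
```

Note that with a unique index on the hash alone, a genuinely different value whose hash collides would still be rejected on insert, which is the limitation the comment above points out.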