Delete duplicate rows with the same column values from a MySQL table

I've an 'orders' table structure like this which contains 100,000 records:
date         orderid  type    productsales  other
01-Aug-2014  11       order   118           10.12
01-Aug-2014  11       order   118           10.12
18-Aug-2014  11       order   35            4.21
22-Aug-2014  11       Refund  -35           -4.21
09-Sep-2014  12       order   56            7.29
15-Sep-2014  12       refund  -56           -7.29
23-Oct-2014  13       order   25            2.32
26-Oct-2014  13       refund  -25           -2.32
What I want is to delete the duplicate rows where the orderid, type, productsales and other column values are all identical, keeping only one row of each (see the first two records for orderid 11).
If two records have the same orderid and the same type but different productsales and other values, those rows must not be deleted.
I'm looking for a MySQL DELETE query to perform this task.

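You should add an id column. If you don't want to use a temp table, you could probably do something like this (I have NOT tested this, so...):
ALTER TABLE `orders`
ADD COLUMN `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
-- For each group of identical rows, keep the lowest id and delete the rest.
DELETE o
FROM orders o INNER JOIN
(
SELECT MIN(id) AS keep_id, orderid, type, productsales, other
FROM orders
GROUP BY orderid, type, productsales, other
HAVING COUNT(*) > 1
) dupes
ON o.orderid = dupes.orderid AND o.type = dupes.type
AND o.productsales = dupes.productsales AND o.other = dupes.other
AND o.id > dupes.keep_id;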
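If you want to try the "add an id, keep the lowest id per group" idea on the sample data without touching a real server, here is a minimal sketch in Python. It uses the standard-library sqlite3 module purely as a stand-in for MySQL (my assumption, so the example runs anywhere), which is why the DELETE is written with a NOT IN subquery instead of MySQL's multi-table DELETE. The helper name dedupe_orders is made up for illustration.

```python
import sqlite3

# Sample rows from the question: (date, orderid, type, productsales, other)
ROWS = [
    ("01-Aug-2014", 11, "order", 118, 10.12),
    ("01-Aug-2014", 11, "order", 118, 10.12),
    ("18-Aug-2014", 11, "order", 35, 4.21),
    ("22-Aug-2014", 11, "Refund", -35, -4.21),
    ("09-Sep-2014", 12, "order", 56, 7.29),
    ("15-Sep-2014", 12, "refund", -56, -7.29),
    ("23-Oct-2014", 13, "order", 25, 2.32),
    ("26-Oct-2014", 13, "refund", -25, -2.32),
]


def dedupe_orders(rows):
    """Load rows into an in-memory table with a surrogate id, then keep
    only the lowest id in each (orderid, type, productsales, other) group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "date TEXT, orderid INTEGER, type TEXT, productsales REAL, other REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (date, orderid, type, productsales, other) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.execute(
        "DELETE FROM orders WHERE id NOT IN ("
        " SELECT MIN(id) FROM orders"
        " GROUP BY orderid, type, productsales, other)"
    )
    result = conn.execute(
        "SELECT id, date, orderid, type FROM orders ORDER BY id"
    ).fetchall()
    conn.close()
    return result


remaining = dedupe_orders(ROWS)
```

Only the second of the two identical rows for orderid 11 is removed; the order/refund pairs with different amounts are left alone.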

This may be a duplicate of this question: MySql: remove table rows depending on column duplicate values?
You can look for the answer there.
The solution there says that adding a unique index on the possibly duplicated columns with the IGNORE keyword removes all duplicate rows.
ALTER IGNORE TABLE `table` ADD UNIQUE INDEX `name` (`col1`, `col2`, `col3`);
I also want to mention a few points here:
A unique index does not treat rows as duplicates if any of the indexed columns (three columns here) is NULL. For example, null,1,"asdsa" can be stored twice.
In the same way, with a single-column unique index, multiple rows with NULL in that column will remain in the table.
The IGNORE keyword for ALTER TABLE is deprecated as of MySQL 5.6.17 and was removed in MySQL 5.7.4. On newer versions the option is to create a new table with a query like this:
CREATE TABLE <table_name> AS SELECT * FROM <your_table> GROUP BY col1,col2,col3;
After that you can drop <your_table> and rename <table_name> to your table's name.
You can change the column list in the GROUP BY clause as needed (all columns, a single column, or the few columns whose combined values are duplicated).
The advantage is that this also works with NULL values.
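The NULL behaviour is easy to check for yourself. Below is a small sketch in Python with the standard-library sqlite3 module, used only as a convenient stand-in (an assumption on my part; SQLite, like MySQL, treats NULLs as distinct in a unique index). The table name t and index name idx_cols are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 INTEGER, col3 TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_cols ON t (col1, col2, col3)")

# The same row containing a NULL is accepted twice by the unique index.
for _ in range(2):
    conn.execute("INSERT INTO t VALUES (NULL, 1, 'asdsa')")

# A duplicate without any NULL is rejected the second time.
conn.execute("INSERT INTO t VALUES (2, 1, 'asdsa')")
try:
    conn.execute("INSERT INTO t VALUES (2, 1, 'asdsa')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Rows actually stored despite the unique index.
stored = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# GROUP BY, in contrast, puts the NULL rows into one group.
grouped = conn.execute(
    "SELECT COUNT(*) FROM ("
    " SELECT col1, col2, col3 FROM t GROUP BY col1, col2, col3)"
).fetchone()[0]
```

Here stored counts both NULL rows, while grouped collapses them, which is why the GROUP BY copy handles NULLs and the unique index does not.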

A really easy way to do this is to add a UNIQUE index on the four columns. When you write the ALTER statement, include the IGNORE keyword. Like so:
ALTER IGNORE TABLE orders ADD UNIQUE INDEX idx_name (orderid, type, productsales, other);
This will drop all the duplicate rows. As an added benefit, future INSERTs that are duplicates will error out. As always, you may want to take a backup before running something like this...
I hope this can help you.

Try this.
Create a table such as temp and store the unique rows in it:
CREATE TABLE temp AS SELECT DISTINCT * FROM orders;
Then delete all records from the orders table:
DELETE FROM orders;
After deleting all records, insert the records from temp back into orders and drop temp:
INSERT INTO orders SELECT * FROM temp;
DROP TABLE temp;

If you have completely duplicated rows, and you want to do this in SQL, then perhaps the best method is to save the rows you want in a temporary table, truncate the table, and insert the data back in:
create temporary table temp_orders as
select distinct *
from orders;
truncate table orders;
alter table orders add id int not null primary key auto_increment first;
insert into orders (`date`, orderid, type, productsales, other)
select `date`, orderid, type, productsales, other
from temp_orders;
Oh, look, I also added an auto-incrementing primary key so you won't have this problem in the future. This would be a simpler process if you have a unique key on each row.
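For anyone who wants to see the copy-aside-and-reload pattern end to end, here is a rough sketch in Python. It runs against an in-memory SQLite database through the standard-library sqlite3 module, which is my stand-in for MySQL, so DELETE replaces TRUNCATE (SQLite has no TRUNCATE). The function name rebuild_without_duplicates and the three sample rows are illustrative only.

```python
import sqlite3


def rebuild_without_duplicates(conn):
    """Copy the distinct rows aside, empty the table, and load them back."""
    with conn:  # commits on success, rolls back if a statement fails
        conn.execute(
            "CREATE TEMP TABLE temp_orders AS SELECT DISTINCT * FROM orders"
        )
        conn.execute("DELETE FROM orders")  # SQLite has no TRUNCATE
        conn.execute("INSERT INTO orders SELECT * FROM temp_orders")
        conn.execute("DROP TABLE temp_orders")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (date TEXT, orderid INTEGER, type TEXT, "
    "productsales INTEGER, other REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        ("01-Aug-2014", 11, "order", 118, 10.12),
        ("01-Aug-2014", 11, "order", 118, 10.12),
        ("18-Aug-2014", 11, "order", 35, 4.21),
    ],
)
rebuild_without_duplicates(conn)
rows_after = conn.execute(
    "SELECT orderid, productsales FROM orders ORDER BY productsales DESC"
).fetchall()
```

Note that SELECT DISTINCT * only removes rows that are identical in every column, so this fits the "completely duplicated rows" case described above.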

Related

Delete just one record of each duplicate

I am trying to delete duplicate e-mail addresses from the table nlt_user.
This query correctly shows the records that have duplicates:
select [e-mail], count([e-mail])
from nlt_user
group by [e-mail]
having count([e-mail]) > 1
Now, how can I delete all but one of each set of duplicate records?
Thank you
If your MySQL version is earlier than 5.7.4, you can add a UNIQUE index on the e-mail column with the IGNORE keyword.
This will remove all the duplicate e-mail rows:
ALTER IGNORE TABLE nlt_user
ADD UNIQUE INDEX `idx_e-mail` (`e-mail`);
On 5.7.4 or later, you can copy the rows through a new table instead (IGNORE is no longer possible on ALTER):
CREATE TABLE nlt_user_new LIKE nlt_user;
ALTER TABLE nlt_user_new ADD UNIQUE INDEX (`e-mail`);
INSERT IGNORE INTO nlt_user_new SELECT * FROM nlt_user;
DROP TABLE nlt_user;
RENAME TABLE nlt_user_new TO nlt_user;
Try this:
delete n1 from nlt_user n1
inner join nlt_user n2 on n1.`e-mail` = n2.`e-mail` and n1.id > n2.id;
This keeps the record with the lowest id among each set of duplicates and deletes the rest.
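To see the "keep the lowest id" rule in action, here is a small Python sketch. It assumes an id column exists (as the query above does) and uses the standard-library sqlite3 module as a stand-in for MySQL. SQLite has no DELETE ... JOIN, so the same condition is expressed with EXISTS, and the column is called email here only to avoid quoting a hyphenated name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nlt_user (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO nlt_user (id, email) VALUES (?, ?)",
    [
        (1, "a@example.com"),
        (2, "b@example.com"),
        (3, "a@example.com"),
        (4, "a@example.com"),
        (5, "c@example.com"),
    ],
)

# Delete a row whenever another row has the same email and a smaller id.
# The row with the smallest id in each group never matches, so it survives.
conn.execute(
    """
    DELETE FROM nlt_user
    WHERE EXISTS (
        SELECT 1 FROM nlt_user AS n2
        WHERE n2.email = nlt_user.email AND n2.id < nlt_user.id
    )
    """
)
kept = conn.execute("SELECT id, email FROM nlt_user ORDER BY id").fetchall()
```

Flipping the comparison to n2.id > nlt_user.id would keep the highest id instead.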
A window function can be used to retain only the unique values (MySQL 8.0 or later). ROW_NUMBER() is used here because RANK() ordered by the partitioning column would give every duplicate the same rank of 1.
1: Create a new table which contains only unique values
Example: nlt_user_unique
CREATE TABLE nlt_user_unique AS
SELECT * FROM
(SELECT A.*, ROW_NUMBER() OVER (PARTITION BY `e-mail` ORDER BY `e-mail`) AS RNK
FROM nlt_user A) ranked
WHERE RNK = 1;
ALTER TABLE nlt_user_unique DROP COLUMN RNK;
2: Truncate the original table containing duplicates
TRUNCATE TABLE nlt_user;
3: Insert the unique rows from the table created in step 1 into your table nlt_user
INSERT INTO nlt_user
SELECT * FROM nlt_user_unique;
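One thing worth checking with any window-function approach is how the ranking function treats ties. The sketch below compares RANK() and ROW_NUMBER() side by side. It is written in Python against SQLite (standard-library sqlite3, needs SQLite 3.25 or newer for window functions) as a stand-in for MySQL 8.0, and it assumes an id column and a column named email, both chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nlt_user (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO nlt_user (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "a@example.com"), (3, "b@example.com")],
)

# RANK() ordered by the partition column sees only ties inside a partition;
# ROW_NUMBER() numbers the rows of each partition 1, 2, 3, ...
ranked = conn.execute(
    """
    SELECT id,
           RANK()       OVER (PARTITION BY email ORDER BY email) AS rnk,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY id)    AS rn
    FROM nlt_user
    ORDER BY id
    """
).fetchall()

rank_survivors = [row[0] for row in ranked if row[1] == 1]
row_number_survivors = [row[0] for row in ranked if row[2] == 1]
```

Filtering on rnk = 1 keeps every row, duplicates included, while filtering on rn = 1 keeps exactly one row per email.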

Remove duplicate values without ID

I have a table like this:
uuid | username | first_seen | last_seen | score
Previously, the table's primary key was an auto-incrementing "player_id" column. I removed this player_id as I no longer needed it. I want to make 'uuid' the primary key, but there are a lot of duplicates. I want to remove all these duplicates from the table, but keep the first one (based on the row number, the first row stays).
How can I do this? I've searched everywhere, but the answers all show how to do it if you have a row ID column...
I highly advocate having auto-incremented integer primary keys. So, I would encourage you to go back. These are useful for several reasons, such as:
They tell you the insert order of rows.
They are more efficient for primary keys.
Because primary keys are clustered in MySQL, new rows always go at the end.
But, you don't have to follow that advice. My recommendation would be to insert the data into a new table and reload into your desired table:
create temporary table tt as
select t.*
from t
group by t.uuid;
truncate table t;
alter table t add constraint pk_uuid primary key (uuid);
insert into t
select * from tt;
Note: I am using a (mis)feature of MySQL that allows you to group by one column while pulling columns not in the group by. I don't like this extension, but you do not specify how to choose the particular row you want. This will give values for the other columns from matching rows. There are other ways to get one row per uuid.

How to delete a certain row from mysql table with same column values?

I have a problem with my queries in MySQL. My table has 4 columns and it looks something like this:
id_users id_product quantity date
1 2 1 2013
1 2 1 2013
2 2 1 2013
1 3 1 2013
id_users and id_product are foreign keys from different tables.
What I want is to delete just one row:
1 2 1 2013
Which appears twice, so I just want to delete it.
I've tried this query:
delete from orders where id_users = 1 and id_product = 2
But it will delete both of them (since they are duplicated). Any hints on solving this problem?
Add a limit to the delete query
delete from orders
where id_users = 1 and id_product = 2
limit 1
All tables should have a primary key (consisting of a single column or multiple columns); duplicate rows don't make sense in a relational database. You can limit the number of deleted rows using LIMIT though:
DELETE FROM orders WHERE id_users = 1 AND id_product = 2 LIMIT 1
But that just solves your current issue, you should definitely work on the bigger issue by defining primary keys.
You need to specify the number of rows to delete. In your case (assuming you want to keep only one) that is the number of matching rows minus one. MySQL does not accept a subquery in LIMIT, so compute the count first and pass it to a prepared statement:
SET @n = (SELECT COUNT(*) - 1 FROM your_table WHERE id_users = 1 AND id_product = 2);
PREPARE stmt FROM 'DELETE FROM your_table WHERE id_users = 1 AND id_product = 2 LIMIT ?';
EXECUTE stmt USING @n;
DEALLOCATE PREPARE stmt;
The best way to design the table is to add an auto-increment column and use it as the primary key. That avoids issues like the one above.
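From application code, the "count first, then delete count minus one" idea could look roughly like this. It is a sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL. Stock SQLite builds do not accept LIMIT on DELETE, so the sketch limits a rowid subquery instead (an SQLite-specific workaround, not MySQL syntax). The helper name delete_all_but_one is invented.

```python
import sqlite3


def delete_all_but_one(conn, id_users, id_product):
    """Delete every matching row except one; return how many were removed."""
    where = "id_users = ? AND id_product = ?"
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM orders WHERE {where}", (id_users, id_product)
    ).fetchone()
    extra = max(count - 1, 0)  # nothing to delete when 0 or 1 rows match
    conn.execute(
        "DELETE FROM orders WHERE rowid IN ("
        f" SELECT rowid FROM orders WHERE {where} LIMIT ?)",
        (id_users, id_product, extra),
    )
    return extra


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id_users INTEGER, id_product INTEGER, "
    "quantity INTEGER, date INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 2, 1, 2013), (1, 2, 1, 2013), (2, 2, 1, 2013), (1, 3, 1, 2013)],
)
removed = delete_all_but_one(conn, 1, 2)
left = conn.execute(
    "SELECT * FROM orders ORDER BY id_users, id_product"
).fetchall()
```

Calling it for a pair that has no duplicates removes nothing, so it is safe to run more than once.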
There are already answers for deleting a row with LIMIT. Ideally you should have a primary key in your table, but if you don't, here are other ways:
By creating a unique index
It looks like id_users and id_product should be unique in your example.
ALTER IGNORE TABLE orders ADD UNIQUE INDEX unique_columns_index (id_users, id_product);
This will delete the duplicate rows with the same data.
But if you still get an error even with the IGNORE clause, try this:
ALTER TABLE orders ENGINE MyISAM;
ALTER IGNORE TABLE orders ADD UNIQUE INDEX unique_columns_index (id_users, id_product);
ALTER TABLE orders ENGINE InnoDB;
By creating the table again
If multiple rows have duplicate values, you can also recreate the table:
RENAME TABLE `orders` TO `orders2`;
CREATE TABLE `orders`
SELECT * FROM `orders2` GROUP BY id_users, id_product;
You should add an auto-incrementing id to each row; after that you can delete a row by its id.
That way your table will have a unique id for each row in addition to id_users, id_product, etc.

How do I delete all the duplicate records in a MySQL table without temp tables

I've seen a number of variations on this but nothing quite matches what I'm trying to accomplish.
I have a table, TableA, which contains the answers given by users to configurable questionnaires. The columns are member_id, quiz_num, question_num, answer_num.
Somehow a few members got their answers submitted twice. So I need to remove the duplicated records, but make sure that one row is left behind.
There is no primary key, so there could be two or three rows all with the exact same data.
Is there a query to remove all the duplicates?
Add Unique Index on your table:
ALTER IGNORE TABLE `TableA`
ADD UNIQUE INDEX (`member_id`, `quiz_num`, `question_num`, `answer_num`);
Another way to do this would be:
Add a primary key to your table; then you can remove duplicates using the following query (it removes one row per duplicate group, so repeat it until no rows are affected):
DELETE FROM TableA
WHERE id IN (SELECT *
FROM (SELECT MAX(id) FROM TableA
GROUP BY member_id, quiz_num, question_num, answer_num HAVING (COUNT(*) > 1)
) AS A
);
Instead of dropping TableA, you could delete all rows (DELETE FROM TableA;) and then repopulate the original table with the rows from TableA_Verify (INSERT INTO TableA SELECT * FROM TableA_Verify). That way you won't lose the objects attached to the original table (indexes, ...).
CREATE TABLE TableA_Verify AS SELECT DISTINCT * FROM TableA;
DELETE FROM TableA;
INSERT INTO TableA SELECT * FROM TableA_Verify;
DROP TABLE TableA_Verify;
This doesn't use TEMP tables, but real tables instead. If the problem is just about temp tables and not about creating or dropping tables, this will work:
CREATE TABLE TableA_Verify AS SELECT DISTINCT * FROM TableA;
DROP TABLE TableA;
RENAME TABLE TableA_Verify TO TableA;
Thanks to jveirasv for the answer above.
If you need to remove duplicates based on a specific set of columns, you can use this (for example, if the table has a timestamp that varies between otherwise identical rows):
CREATE TABLE TableA_Verify AS SELECT * FROM TableA WHERE 1 GROUP BY [COLUMN TO remove duplicates BY];
DELETE FROM TableA;
INSERT INTO TableA SELECT * FROM TableA_Verify;
DROP TABLE TableA_Verify;
Add Unique Index on your table:
ALTER IGNORE TABLE TableA
ADD UNIQUE INDEX (member_id, quiz_num, question_num, answer_num);
This works very well.
If you are not using any primary key, then execute the following queries in one go, replacing these values:
# table_name - Your table name
# column_name_of_duplicates - Name of the column where duplicate entries are found
create table table_name_temp like table_name;
insert into table_name_temp select distinct(column_name_of_duplicates),value,type from table_name group by column_name_of_duplicates;
delete from table_name;
insert into table_name select * from table_name_temp;
drop table table_name_temp;
Create a temporary table and store the distinct (non-duplicate) values.
Empty the original table.
Insert the values from the temp table into the original table.
Delete the temp table.
It is always advisable to take a backup of the database before you play with it.
As noted in the comments, the query in Saharsh Shah's answer must be run multiple times if items are duplicated more than once.
Here's a solution that doesn't delete any data, and keeps the data in the original table the entire time, allowing for duplicates to be deleted while keeping the table 'live':
alter table tableA add column duplicate tinyint(1) not null default '0';
update tableA set
duplicate=if(@member_id=member_id
and @quiz_num=quiz_num
and @question_num=question_num
and @answer_num=answer_num,1,0),
member_id=(@member_id:=member_id),
quiz_num=(@quiz_num:=quiz_num),
question_num=(@question_num:=question_num),
answer_num=(@answer_num:=answer_num)
order by member_id, quiz_num, question_num, answer_num;
delete from tableA where duplicate=1;
alter table tableA drop column duplicate;
This basically checks to see if the current row is the same as the last row, and if it is, marks it as duplicate (the order statement ensures that duplicates will show up next to each other). Then you delete the duplicate records. I remove the duplicate column at the end to bring it back to its original state.
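The "compare each row with the previous one in sorted order" logic can be sketched outside the database too. Here is a minimal Python version, with a made-up function name and sample answers, just to show why sorting first makes a single pass enough even when a row appears three times.

```python
def mark_adjacent_duplicates(rows):
    """Sort the rows, then flag each row that equals the one before it."""
    marked = []
    previous = None
    for row in sorted(rows):
        marked.append((row, row == previous))
        previous = row
    return marked


# (member_id, quiz_num, question_num, answer_num); one answer appears 3 times
answers = [
    (7, 1, 1, 2),
    (7, 1, 1, 2),
    (7, 1, 2, 4),
    (8, 1, 1, 3),
    (7, 1, 1, 2),
]
marked = mark_adjacent_duplicates(answers)
deduped = [row for row, is_duplicate in marked if not is_duplicate]
```

The first row of each run of identical rows is never flagged, so exactly one copy survives.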
It looks like alter table ignore also might go away soon: http://dev.mysql.com/worklog/task/?id=7395
An alternative way would be to create a new table with the same structure.
CREATE TABLE temp_table AS SELECT * FROM original_table LIMIT 0;
Then create the primary key in the table.
ALTER TABLE temp_table ADD PRIMARY KEY (primary-key-field);
Finally, copy all records from the original table while ignoring the duplicate records.
INSERT IGNORE INTO temp_table SELECT * FROM original_table;
Now you can drop the original table and rename the new table.
DROP TABLE original_table;
RENAME TABLE temp_table TO original_table;
Tested in MySQL 5. I don't know about other versions.
If you want to keep the row with the lowest id value:
DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id > n2.id AND n1.member_id = n2.member_id AND n1.quiz_num = n2.quiz_num AND n1.question_num = n2.question_num AND n1.answer_num = n2.answer_num;
If you want to keep the row with the highest id value:
DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id < n2.id AND n1.member_id = n2.member_id AND n1.quiz_num = n2.quiz_num AND n1.question_num = n2.question_num AND n1.answer_num = n2.answer_num;

Duplicate Entries in DB

I have a huge table of products, but there are a lot of duplicate entries. The table has more than 10 thousand entries, and I want to remove the duplicates without finding and deleting them manually. Please let me know if you can suggest a solution.
You could copy the distinct rows into a temp table with CREATE TABLE ... AS SELECT DISTINCT, drop the original table, and then rename the temp one.
You should also add primary and unique keys to avoid this sort of thing in the future.
For full-row duplicates, try this:
create table mytable_tmp as select distinct * from mytable;
drop table mytable;
alter table mytable_tmp rename mytable;
The statements below should help with your requirement.
If the table (foo) has a primary key field
First step
Store the key values to keep in a temporary table, putting the columns that define a duplicate in the GROUP BY clause.
For example, to delete duplicate email ids, put the email id column in the GROUP BY clause and select either min(primarykey) or max(primarykey).
CREATE TEMPORARY TABLE temptable AS SELECT min( primarykey ) FROM foo GROUP BY uniquefields;
Second step
Run the delete statement below, substituting your table name and primary key column.
DELETE FROM foo WHERE primarykey NOT IN (SELECT * FROM temptable );
Execute both queries together in your query analyser or DB tool.
If the table (foo) doesn't have a primary key field
Step 1
CREATE TABLE temp_table AS SELECT * FROM foo GROUP BY field or fields;
Step 2
DELETE FROM foo;
Step 3
INSERT INTO foo select * from temp_table;
There are different solutions for removing duplicate rows, and which one to use depends on your scenario. The simplest method is to alter the table, adding a unique index on the product name field:
alter ignore table products add unique index `unique_index` (product_name);
You can remove the index once all the duplicate rows have been deleted:
alter table products drop index `unique_index`;
Please let me know if this resolves the issue. If not, I can give you alternative solutions.
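You can group by the product name and, if needed, add more than one column to the GROUP BY. E.g.
SELECT * from tableName GROUP BY prod_name HAVING count(prod_name) > 1
That will show the products that have duplicate entries. Without the HAVING clause, the query returns one row per product, which you can write to a new table before dropping the existing one.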
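Before deleting anything it can help to see which products are actually duplicated. Here is a quick sketch in Python with the standard-library sqlite3 module (my stand-in for MySQL); the sample product names are invented, and tableName / prod_name are taken from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableName (id INTEGER PRIMARY KEY, prod_name TEXT)")
conn.executemany(
    "INSERT INTO tableName (prod_name) VALUES (?)",
    [("lamp",), ("desk",), ("lamp",), ("chair",), ("lamp",)],
)

# With HAVING: only the names that occur more than once, with their counts.
duplicated = conn.execute(
    "SELECT prod_name, COUNT(*) FROM tableName "
    "GROUP BY prod_name HAVING COUNT(*) > 1"
).fetchall()

# Without HAVING: one row per product name, duplicated or not.
distinct_names = [
    row[0]
    for row in conn.execute(
        "SELECT prod_name FROM tableName GROUP BY prod_name ORDER BY prod_name"
    )
]
```

The first query is a report of the problem; the second is the shape of the data you would copy into the new table.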