Removing duplicate entries in an SQL table? - mysql

I did research but the answers were too complicated to convert to my schema and solution.
I have a table which I forgot to make a field unique in and now the insert has created lots and lots of items under the same field value. My table name is queue_items and the field is called item - how can I remove duplicates of item field?
I still want to be left with 1 item of the duplicates if that makes sense, but just delete any more than 1.
Here is what I've got so far:
WITH CTE AS(
SELECT `item`
RN = ROW_NUMBER()OVER(PARTITION BY `item` ORDER BY `item`)
FROM `queue_items`
)
DELETE FROM CTE WHERE RN > 1

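If you have a primary key such as id, you could try something like the following. The inner query is wrapped in a derived table because MySQL otherwise rejects a subquery that selects from the table being deleted from (error 1093):
DELETE FROM
queue_items
WHERE
id
NOT IN (
SELECT keep_id FROM (
SELECT MIN(id) AS keep_id FROM queue_items GROUP BY item
) AS keepers
);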
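A minimal runnable sketch of the keep-the-lowest-id idea, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so it runs without a server). The sample rows are made up; the table and column names follow the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queue_items (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany(
    "INSERT INTO queue_items (item) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",), ("a",)],  # made-up sample data
)

# Keep the lowest id for each item and delete every other row.
# The inner query sits in a derived table, which is the form MySQL needs
# when the subquery reads the table being deleted from.
conn.execute(
    """DELETE FROM queue_items
       WHERE id NOT IN (
           SELECT keep_id FROM (
               SELECT MIN(id) AS keep_id FROM queue_items GROUP BY item
           ) AS keepers
       )"""
)

remaining = conn.execute("SELECT id, item FROM queue_items ORDER BY id").fetchall()
print(remaining)
```

Once the duplicates are gone, adding a unique index on item stops them from coming back.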

I would suggest emptying the table and repopulating it (ROW_NUMBER() requires MySQL 8.0 or later):
CREATE TABLE temp_qi AS
SELECT t.*
FROM (SELECT qi.*,
ROW_NUMBER() OVER (PARTITION BY `item` ORDER BY `item`) AS seqnum
FROM `queue_items` qi
) t
WHERE seqnum = 1;
ALTER TABLE temp_qi DROP COLUMN seqnum;
TRUNCATE TABLE queue_items; -- backup before doing this!
INSERT INTO queue_items
SELECT * -- columns should be in the right order
FROM temp_qi;

Well, that was a tricky one :) As far as I can tell, you have a SQL table with duplicate entries and no unique key, and you simply want to get rid of the duplicates. I tried to recreate this in MySQL, which is not that easy because MySQL does not let you DELETE from or UPDATE a table while selecting from that same table in a subquery.
Instead I had to follow this simple workaround:
Create a new temp table with the same structure as the original table
Copy every entry to the new temp table, but grouped by the column that holds the duplicates (duplicate_ID)
DROP the original table
RENAME the temp table to the original table's name
In SQL you can do so by running the following queries - but make sure you have a backup, just in case.... :)
Workaround Delete duplicate SQL entries:
CREATE TABLE newTestTabelle LIKE TestTabelle;
-- make sure the table was created successfully before the next step
INSERT INTO newTestTabelle
SELECT * FROM TestTabelle
GROUP BY myIndex; -- rejected when ONLY_FULL_GROUP_BY is enabled (the default since MySQL 5.7.5)
-- make sure the copy completed successfully
DROP TABLE TestTabelle;
ALTER TABLE newTestTabelle RENAME TO TestTabelle;
Hint: I found a similar solution with very good documentation at the following link (http://www.mysqltutorial.org/mysql-delete-duplicate-rows/); read it for further information on the topic.
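The create, copy, drop, rename sequence can be tried out end to end with a small sketch. It uses Python's sqlite3 module in place of MySQL (an assumption for runnability), the payload column and the sample rows are hypothetical, and MIN(payload) is used instead of SELECT * so the grouped query stays valid in strict SQL modes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE TestTabelle (myIndex INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO TestTabelle VALUES (?, ?)",
    [(1, "x"), (1, "x"), (2, "y"), (3, "z"), (3, "z")],  # made-up sample data
)

# Step 1: new table with the same structure.
conn.execute("CREATE TABLE newTestTabelle (myIndex INTEGER, payload TEXT)")

# Step 2: copy one row per myIndex value.
conn.execute(
    """INSERT INTO newTestTabelle (myIndex, payload)
       SELECT myIndex, MIN(payload) FROM TestTabelle GROUP BY myIndex"""
)

# Safety check before anything destructive: one copied row per distinct key.
copied = conn.execute("SELECT COUNT(*) FROM newTestTabelle").fetchone()[0]
expected = conn.execute("SELECT COUNT(DISTINCT myIndex) FROM TestTabelle").fetchone()[0]
if copied != expected:
    raise RuntimeError("copy incomplete, original table left untouched")

# Steps 3 and 4: drop the original and rename the copy.
conn.execute("DROP TABLE TestTabelle")
conn.execute("ALTER TABLE newTestTabelle RENAME TO TestTabelle")

rows = conn.execute("SELECT myIndex, payload FROM TestTabelle ORDER BY myIndex").fetchall()
print(rows)
```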

Related

Delete Duplicate records and keep one in MYSQL version 5.7 ( Table with out primary key)

We have some duplicate entries in our Items table and are trying to delete them, but we need to keep one of each.
Table: Items (no primary key)
ItemNumber,lastModifiedDate
10056,'2020-10-19'
10056,'2020-10-19'
10057,'2020-10-19'
10057,'2020-10-20'
Expected Output:
ItemNumber,lastModifiedDate
10056,'2020-10-19'
10057,'2020-10-20'
I tried below :
delete from Items where (ItemNumber,LastModifiedDate) not in
(
SELECT
ItemNumber,max(LastModifiedDate) LastModifiedDate
FROM
(select * from Items ) Items
GROUP BY
ItemNumber
);
We could do this in MySQL 8 using the ROW_NUMBER() window function, but that feature is not available in 5.7, and I can't upgrade the DB now.
Thanks in advance
Your problem is actually tricky because your duplicate records are really identical in every way. One approach here is to filter off the duplicates in a temporary table. Then truncate your current table and populate it using the filtered data.
CREATE TEMPORARY TABLE ItemsTemp AS (
SELECT ItemNumber, MAX(lastModifiedDate) AS lastModifiedDate
FROM Items
GROUP BY ItemNumber
);
TRUNCATE TABLE Items; -- remove all data in Items
-- repopulate Items using non duplicate data
INSERT INTO Items (ItemNumber, lastModifiedDate)
SELECT ItemNumber, lastModifiedDate
FROM ItemsTemp;
DROP TABLE ItemsTemp; -- drop the temporary table
I found the answer the other way round: alter the table to add an id column with auto increment by 1.
That gives you a differentiator, so you can group by the required fields, keep the min(id) record in each group, and delete the remaining duplicates.

MYSQL drop duplicates of userid

I thought I'd made the column userid in my table "userslive" unique, but somehow must have made a mistake. I've seen multiple answers to this question, but I'm afraid of messing up again so I hope someone can help me directly.
So this table has no unique columns, but I've got a column "timer" which was the timestamp of scraping the data. If possible I'd like to drop rows with the lowest "timer" with duplicate "userid" column.
It's a fairly big table at about 2 million rows (20 columns). There are about 1000 duplicated userid values, which I found using this query:
SELECT userid, COUNT(userid) as cnt FROM userslive GROUP BY userid HAVING (cnt > 1);
Is this the correct syntax? I tried this on a backup table, but I suspect it is too heavy for a table this big (unless left to run for a very long time).
DELETE FROM userslive using userslive,
userslive e1
where userslive.timer < e1.timer
and userslive.userid = e1.userid
Is there a quicker way to do this?
EDIT: I should say the "timer" is not a unique column.
DELETE t1.* /* delete from a copy named t1 only */
FROM userslive t1, userslive t2
WHERE t1.userid = t2.userid
AND t1.timer < t2.timer
Logic: if for some record (in a copy aliased as t1) we can find a record (in a table copy aliased as t2) with the same user but with greater/later timer value - this record must be deleted.
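To see that logic on a few rows, here is a small sketch using Python's sqlite3 module instead of MySQL (an assumption for runnability). SQLite has no multi-table DELETE, so the self-join is written as a correlated EXISTS, which expresses the same condition. The name column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE userslive (userid INTEGER, timer INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO userslive VALUES (?, ?, ?)",
    [
        (1, 100, "old"), (1, 200, "new"),       # userid 1: keep timer 200
        (2, 150, "only"),                        # userid 2: no duplicate
        (3, 300, "tie-a"), (3, 300, "tie-b"),    # userid 3: two rows tie on the latest timer
        (3, 100, "old"),
    ],
)

# Delete a row when another row exists for the same userid with a later timer.
cur = conn.execute(
    """DELETE FROM userslive
       WHERE EXISTS (
           SELECT 1 FROM userslive AS t2
           WHERE t2.userid = userslive.userid
             AND t2.timer > userslive.timer
       )"""
)
deleted = cur.rowcount

remaining = conn.execute(
    "SELECT userid, timer, name FROM userslive ORDER BY userid, name"
).fetchall()
print(deleted, remaining)
```

Because the comparison is strict, rows that share the latest timer for a userid both survive. That matters here since timer is not unique; those leftovers need a second tie-breaker, for example an added id column.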
I've done this in the past and the easiest way to solve this is to add an id column and then select userid, max(new_id) into a new table and join that for the delete. Something like this.
ALTER TABLE `userslive`
ADD `new_id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;
Now that you have a new unique column, create a table listing the rows to delete.
CREATE TABLE `users_to_delete`
AS
SELECT userid, new_id
FROM (
SELECT userid, max(new_id) new_id, count(*) user_rows
FROM `userslive`
GROUP BY 1
) dataset
WHERE user_rows > 1;
Then use that to delete your duplicate rows by joining it into a DELETE statement like this:
DELETE `userslive` FROM `userslive`
INNER JOIN `users_to_delete` USING(userid,new_id);
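Make sure you back everything up before you delete anything, just in case. Note that each pass removes only one row per duplicated userid (the one with the highest new_id), so rebuild users_to_delete and repeat the delete if a userid appears more than twice.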

Mysql, Insert new record into table B if foreign key exists in table A

There are a few similar questions on here. None provide a solution. I would like to INSERT a NEW record into table B, but only if a foreign key exists in table A. To be clear, I do not wish to insert the result of a select. I just need to know that the foreign key exists.
INSERT INTO tableB (tableA_ID,code,notes,created) VALUES ('24','1','test',NOW())
SELECT tableA_ID FROM tableA WHERE tableA_ID='24' AND owner_ID='9'
Clearly, the above does not work. But is this even possible? I want to insert the NEW data into tableB, only if the record for the row in tableA exists and belongs to owner_ID.
The queries I have seen so far relate to INSERTING the results from the SELECT query - I do not wish to do that.
Try this:
INSERT INTO tableB (tableA_ID,code,notes,created)
SELECT id, code, notes, created
FROM ( SELECT '24' as id, '1' as code, 'test' as notes, NOW() as created) t
WHERE EXISTS
(
SELECT tableA_ID
FROM tableA
WHERE tableA_ID='24' AND owner_ID='9'
)
I know this is an old, already-answered question, but it ranks highly in Google search results now, and I think an addition may help someone in the future.
In some database designs, you may want to insert a row into a table that has two or more foreign keys. Let's say we have four tables in a chat application:
Users, Threads, Thread_Users and Messages
If we want a user to join a thread, we need to insert a row into Thread_Users, which has two foreign keys: user_id and thread_id.
Then we can use a query like this to insert if both foreign keys exist, and silently insert nothing otherwise:
INSERT INTO `thread_users` (thread_id,user_id,status,creation_date)
SELECT 2,3,'pending',1601465161690 FROM (SELECT 1 as nb_threads, 1 as nb_users) as tmp
WHERE tmp.nb_threads = (SELECT count(*) FROM `threads` WHERE threads.id = 2)
AND tmp.nb_users = (SELECT count(*) FROM `users` WHERE users.id = 3)
It's a little verbose but it does the job pretty well.
On the application side, we just have to raise an error if affectedRows = 0, and perhaps check which of the keys doesn't exist. IMHO, this is better than executing two SELECT queries and then the INSERT, especially when the probability of a nonexistent foreign key is very low.
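A rough sketch of that application-side check, using Python's sqlite3 module in place of MySQL (an assumption for runnability). The helper name join_thread is hypothetical, the table layout is reduced to the columns needed, and EXISTS is used instead of the count comparison. In MySQL you may need FROM DUAL before the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE threads (id INTEGER PRIMARY KEY);
    CREATE TABLE thread_users (thread_id INTEGER, user_id INTEGER, status TEXT);
    INSERT INTO users (id) VALUES (3);
    INSERT INTO threads (id) VALUES (2);
    """
)


def join_thread(conn, thread_id, user_id):
    """Insert the link row only if both referenced rows exist (hypothetical helper).

    Returns True when a row was inserted, False when a key was missing.
    """
    cur = conn.execute(
        """INSERT INTO thread_users (thread_id, user_id, status)
           SELECT ?, ?, 'pending'
           WHERE EXISTS (SELECT 1 FROM threads WHERE id = ?)
             AND EXISTS (SELECT 1 FROM users WHERE id = ?)""",
        (thread_id, user_id, thread_id, user_id),
    )
    # rowcount is the number of inserted rows: 1 on success, 0 otherwise.
    return cur.rowcount == 1


ok = join_thread(conn, 2, 3)        # both keys exist
missing = join_thread(conn, 2, 99)  # user 99 does not exist
total = conn.execute("SELECT COUNT(*) FROM thread_users").fetchone()[0]
print(ok, missing, total)
```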

How do I delete all the duplicate records in a MySQL table without temp tables

I've seen a number of variations on this but nothing quite matches what I'm trying to accomplish.
I have a table, TableA, which contains the answers given by users to configurable questionnaires. The columns are member_id, quiz_num, question_num, answer_num.
Somehow a few members got their answers submitted twice. So I need to remove the duplicated records, but make sure that one row is left behind.
There is no primary column so there could be two or three rows all with the exact same data.
Is there a query to remove all the duplicates?
Add Unique Index on your table:
ALTER IGNORE TABLE `TableA`
ADD UNIQUE INDEX (`member_id`, `quiz_num`, `question_num`, `answer_num`);
Another way to do this would be:
Add a primary key (id) to your table; then you can remove duplicates using the following query. Each run removes one row per duplicated group, so repeat it until no rows are affected:
DELETE FROM TableA
WHERE id IN (SELECT *
FROM (SELECT MAX(id) FROM TableA
GROUP BY member_id, quiz_num, question_num, answer_num HAVING (COUNT(*) > 1)
) AS A
);
Instead of dropping TableA, you could delete all of its rows (delete from TableA;) and then repopulate the original table with the rows from TableA_Verify (insert into TableA select * from TableA_Verify). That way you won't lose the objects attached to the original table (indexes, ...).
CREATE TABLE TableA_Verify AS SELECT DISTINCT * FROM TableA;
DELETE FROM TableA;
INSERT INTO TableA SELECT * FROM TableA_Verify;
DROP TABLE TableA_Verify;
This doesn't use TEMP Tables, but real tables instead. If the problem is just about temp tables and not about table creation or dropping tables, this will work:
CREATE TABLE TableA_Verify AS SELECT DISTINCT * FROM TableA;
DROP TABLE TableA;
RENAME TABLE TableA_Verify TO TableA;
Thanks to jveirasv for the answer above.
If you need to remove duplicates based on a specific set of columns, you can use this (for example, if the table has a timestamp that varies between otherwise identical rows):
CREATE TABLE TableA_Verify AS SELECT * FROM TableA WHERE 1 GROUP BY [COLUMN TO remove duplicates BY];
DELETE FROM TableA;
INSERT INTO TableA SELECT * FROM TableA_Verify;
DROP TABLE TableA_Verify;
Add Unique Index on your table:
ALTER IGNORE TABLE TableA
ADD UNIQUE INDEX (member_id, quiz_num, question_num, answer_num);
This works very well.
If the table has no primary key, run the following queries in one go after replacing these values:
# table_name - Your Table Name
# column_name_of_duplicates - Name of column where duplicate entries are found
create table table_name_temp like table_name;
insert into table_name_temp select distinct(column_name_of_duplicates),value,type from table_name group by column_name_of_duplicates;
delete from table_name;
insert into table_name select * from table_name_temp;
drop table table_name_temp;
In other words:
Create a temporary table and store the distinct (non-duplicate) values in it
Empty the original table
Insert the values from the temp table back into the original table
Drop the temp table
It is always advisable to take a backup of the database before you modify it.
As noted in the comments, the query in Saharsh Shah's answer must be run multiple times if items are duplicated more than once.
Here's a solution that doesn't delete any data, and keeps the data in the original table the entire time, allowing for duplicates to be deleted while keeping the table 'live':
alter table tableA add column duplicate tinyint(1) not null default '0';
update tableA set
duplicate=if(@member_id=member_id
and @quiz_num=quiz_num
and @question_num=question_num
and @answer_num=answer_num,1,0),
member_id=(@member_id:=member_id),
quiz_num=(@quiz_num:=quiz_num),
question_num=(@question_num:=question_num),
answer_num=(@answer_num:=answer_num)
order by member_id, quiz_num, question_num, answer_num;
delete from tableA where duplicate=1;
alter table tableA drop column duplicate;
This basically checks to see if the current row is the same as the last row, and if it is, marks it as duplicate (the order statement ensures that duplicates will show up next to each other). Then you delete the duplicate records. I remove the duplicate column at the end to bring it back to its original state.
Note that ALTER IGNORE TABLE is no longer an option on current servers: it was deprecated in MySQL 5.6.17 and removed in 5.7.4. See http://dev.mysql.com/worklog/task/?id=7395
An alternative way would be to create a new temporary table with the same structure.
CREATE TABLE temp_table AS SELECT * FROM original_table LIMIT 0;
Then create the primary key in the table.
ALTER TABLE temp_table ADD PRIMARY KEY (primary-key-field);
Finally, copy all records from the original table while ignoring the duplicate records.
INSERT IGNORE INTO temp_table SELECT * FROM original_table;
Now you can delete the original table and rename the new table.
DROP TABLE original_table;
RENAME TABLE temp_table TO original_table;
Tested in MySQL 5. I don't know about other versions.
If you want to keep the row with the lowest id value:
DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id > n2.id AND n1.member_id = n2.member_id AND n1.answer_num = n2.answer_num;
If you want to keep the row with the highest id value:
DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id < n2.id AND n1.member_id = n2.member_id AND n1.answer_num = n2.answer_num;

Duplicate Entries in DB

I have a huge table of products, but there are a lot of duplicate entries. The table has more than 10 thousand entries, and I want to remove the duplicates without manually finding and deleting them. Please let me know if you can provide a solution for this.
You could use SELECT DISTINCT INTO TempTable, drop the original table, and then rename the temp one.
You should also add primary and unique keys to avoid this sort of thing in the future.
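As a quick illustration of how a unique key prevents this in the future, here is a sketch using Python's sqlite3 module as a stand-in for your database (an assumption for runnability). The index name unique_product_name and the sample names are made up. SQLite spells the tolerant insert INSERT OR IGNORE, where MySQL uses INSERT IGNORE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in (assumption)
conn.execute("CREATE TABLE products (product_name TEXT)")
conn.execute("CREATE UNIQUE INDEX unique_product_name ON products (product_name)")

# With the unique index in place, a repeated name is skipped instead of stored twice.
for name in ["widget", "gadget", "widget"]:
    conn.execute("INSERT OR IGNORE INTO products (product_name) VALUES (?)", (name,))

# A plain INSERT of an existing name is rejected outright.
try:
    conn.execute("INSERT INTO products (product_name) VALUES ('widget')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

names = [row[0] for row in conn.execute("SELECT product_name FROM products ORDER BY product_name")]
print(names, rejected)
```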
For full-row duplicates, try this (MySQL syntax):
create table mytable_tmp as select distinct * from mytable;
drop table mytable;
alter table mytable_tmp rename mytable;
The statements below should help with your requirement.
If the table (foo) has a primary key field:
First step
Store the key values to keep in a temporary table, and put the fields that define uniqueness in the GROUP BY clause.
For example, to delete rows with a duplicate email id, put the email id in the GROUP BY clause and select
either min(primarykey) or max(primarykey).
CREATE TEMPORARY TABLE temptable AS SELECT min( primarykey ) FROM foo GROUP BY uniquefields;
Second step
Run the delete statement below, substituting your table name and primary key column.
DELETE FROM foo WHERE primarykey NOT IN (SELECT * FROM temptable );
Execute both queries together in your query analyser or DB tool.
If the table (foo) doesn't have a primary key field:
Step 1
CREATE TABLE temp_table AS SELECT * FROM foo GROUP BY field or fields;
Step 2
DELETE FROM foo;
Step 3
INSERT INTO foo select * from temp_table;
There are several ways to remove duplicate rows, and which one to use depends on your scenario. The simplest method is to alter the table and create a unique index on the product name field:
alter ignore table products add unique index `unique_index` (product_name);
You can remove the index once the duplicate rows are gone:
alter table products drop index `unique_index`;
Please let me know if this resolves the issue. If not I can give you alternate solutions for that.
You can add more than one column to a GROUP BY. For example:
SELECT * from tableName GROUP BY prod_name HAVING count(prod_name) > 1
That shows one row for each product name that occurs more than once; without the HAVING clause you get one row per product name. You can write that result to a new table and drop the existing one.