Deleting duplicates in MySQL (2 tables)

I have two tables (id_test, test). Each has a unique ID column, and rows with the same ID in the two tables refer to the same entry. One of the tables (id_test) has another column that should also be unique, so I want to eliminate duplicates according to this other column; let's call it YD.
To identify the duplicates I used
SELECT ID, YD AS x, COUNT(*) AS y
FROM id_test
GROUP BY x
HAVING y>1;
Now I want to delete these entries in both tables. How can I do it?

This query shows the lowest ID for every YD in the id_test table (MIN(ID) makes the choice deterministic; selecting a bare ID with GROUP BY YD returns an arbitrary row and is rejected when ONLY_FULL_GROUP_BY is enabled):
SELECT MIN(ID) AS ID, YD
FROM id_test
GROUP BY YD
These are the rows you have to keep. The following query returns the IDs you have to delete:
SELECT id_test.ID
FROM id_test LEFT JOIN (SELECT MIN(ID) AS ID, YD FROM id_test GROUP BY YD) id_test_keep
ON id_test.ID = id_test_keep.ID AND id_test.YD = id_test_keep.YD
WHERE id_test_keep.ID IS NULL
Now I think I need more details about your tables, but what I think you need is this:
DELETE FROM test
WHERE
test.ID IN (
SELECT id_test.ID
FROM id_test LEFT JOIN (SELECT MIN(ID) AS ID, YD FROM id_test GROUP BY YD) id_test_keep
ON id_test.ID = id_test_keep.ID AND id_test.YD = id_test_keep.YD
WHERE id_test_keep.ID IS NULL)
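The question asks for the rows to disappear from both tables, so here is a minimal sketch of one way that could look. It uses Python's sqlite3 module as a runnable stand-in for MySQL (an assumption, as are the function names, the payload column and the sample data). The IDs to drop are collected first, so that deleting from id_test does not change the list used for test.

```python
import sqlite3

def delete_duplicates(conn):
    """Keep the lowest ID per YD and delete every other ID from both tables."""
    cur = conn.cursor()
    # Collect the IDs to drop before touching either table.
    cur.execute("""
        CREATE TEMP TABLE ids_to_delete AS
        SELECT ID FROM id_test
        WHERE ID NOT IN (SELECT MIN(ID) FROM id_test GROUP BY YD)
    """)
    cur.execute("DELETE FROM test WHERE ID IN (SELECT ID FROM ids_to_delete)")
    cur.execute("DELETE FROM id_test WHERE ID IN (SELECT ID FROM ids_to_delete)")
    cur.execute("DROP TABLE ids_to_delete")
    conn.commit()

def demo():
    # Hypothetical sample data: YD 'a' and 'b' each appear twice.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE id_test (ID INTEGER PRIMARY KEY, YD TEXT);
        CREATE TABLE test (ID INTEGER PRIMARY KEY, payload TEXT);
        INSERT INTO id_test VALUES (1,'a'),(2,'b'),(3,'a'),(4,'b'),(5,'c');
        INSERT INTO test VALUES (1,'p1'),(2,'p2'),(3,'p3'),(4,'p4'),(5,'p5');
    """)
    delete_duplicates(conn)
    kept_id_test = [r[0] for r in conn.execute("SELECT ID FROM id_test ORDER BY ID")]
    kept_test = [r[0] for r in conn.execute("SELECT ID FROM test ORDER BY ID")]
    return kept_id_test, kept_test

print(demo())  # ([1, 2, 5], [1, 2, 5])
```

In MySQL the same three steps (temporary table, delete from test, delete from id_test) should carry over, but test them on a backup first.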

As documented under ALTER TABLE Syntax (emphasis added):
IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.
Therefore (on MySQL versions before 5.7.4; the IGNORE clause of ALTER TABLE was removed in that release):
ALTER IGNORE TABLE id_test ADD UNIQUE (YD)

I don't think you should use SELECT ... IN here, because it becomes impractical when the data is large.
Instead, create a new table with the same structure and insert only the non-duplicate rows into it.
INSERT INTO test_new (ID, YD) SELECT t.ID, t.YD FROM test t LEFT JOIN test_id ti ON t.ID = ti.id WHERE ti.id IS NULL;
After that, drop table test and rename test_new to test.

Related

MYSQL drop duplicates of userid

I thought I'd made the column userid in my table "userslive" unique, but somehow must have made a mistake. I've seen multiple answers to this question, but I'm afraid of messing up again so I hope someone can help me directly.
So this table has no unique columns, but I've got a column "timer" which was the timestamp of scraping the data. If possible I'd like to drop rows with the lowest "timer" with duplicate "userid" column.
It's a fairly big table at about 2 million rows (20 columns). There are about 1000 duplicate userid values, which I found using this query:
SELECT userid, COUNT(userid) as cnt FROM userslive GROUP BY userid HAVING (cnt > 1);
Is this the correct syntax? I tried this on a backup table, but I suspect it is too heavy for a table this big (unless left to run for a very long time).
DELETE FROM userslive using userslive,
userslive e1
where userslive.timer < e1.timer
and userslive.userid = e1.userid
Is there a quicker way to do this?
EDIT: I should say the "timer" is not a unique column.
DELETE t1.* /* delete from a copy named t1 only */
FROM userslive t1, userslive t2
WHERE t1.userid = t2.userid
AND t1.timer < t2.timer
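Logic: if for some record (in the table copy aliased as t1) we can find a record (in the table copy aliased as t2) with the same userid but a greater (later) timer value, then the t1 record must be deleted. Because timer is not unique, rows that tie on the latest timer for a userid are all kept.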
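To see the "delete if a newer row exists" logic run end to end, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. SQLite has no multi-table DELETE, so the self-join is swapped for an equivalent correlated EXISTS; the function names and sample rows are made up for illustration.

```python
import sqlite3

def delete_older_duplicates(conn):
    """Delete each row that has a newer row (greater timer) for the same userid."""
    cur = conn.execute("""
        DELETE FROM userslive
        WHERE EXISTS (
            SELECT 1 FROM userslive AS newer
            WHERE newer.userid = userslive.userid
              AND newer.timer > userslive.timer
        )
    """)
    conn.commit()
    return cur.rowcount  # number of rows removed

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE userslive (userid INTEGER, timer INTEGER)")
    conn.executemany(
        "INSERT INTO userslive VALUES (?, ?)",
        # userid 3 has two rows with the same timer (a tie).
        [(1, 10), (1, 20), (1, 30), (2, 5), (3, 7), (3, 7)],
    )
    removed = delete_older_duplicates(conn)
    rows = conn.execute(
        "SELECT userid, timer FROM userslive ORDER BY userid, timer"
    ).fetchall()
    return removed, rows

print(demo())  # (2, [(1, 30), (2, 5), (3, 7), (3, 7)])
```

Note the two rows left for userid 3: a tie on the latest timer survives this delete, so a unique key on userid could still fail afterwards. On a table with 2 million rows, an index on (userid, timer) is probably what makes either form of the query tolerable.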
I've done this in the past, and the easiest way to solve it is to add an id column, select userid and max(new_id) into a new table, and join that for the delete. Something like this:
ALTER TABLE `userslive`
ADD `new_id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;
Now you have your new unique column. Next, create a new table that selects the rows to delete.
CREATE TABLE `users_to_delete`
AS
SELECT userid, new_id
FROM (
SELECT userid, max(new_id) new_id, count(*) user_rows
FROM `userslive`
GROUP BY 1
) dataset
WHERE user_rows > 1
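Then use that to delete your duplicate rows by joining it into a DELETE statement like this (it removes the row with the highest new_id for each duplicated userid, so recreate users_to_delete and repeat if a userid has more than two rows):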
DELETE `userslive` FROM `userslive`
INNER JOIN `users_to_delete` USING(userid,new_id);
Make sure you back everything up before you delete anything just in case.

Mysql, Insert new record into table B if foreign key exists in table A

There are a few similar questions on here. None provide a solution. I would like to INSERT a NEW record into table B, but only if a foreign key exists in table A. To be clear, I do not wish to insert the result of a select. I just need to know that the foreign key exists.
INSERT INTO tableB (tableA_ID,code,notes,created) VALUES ('24','1','test',NOW())
SELECT tableA_ID FROM tableA WHERE tableA_ID='24' AND owner_ID='9'
Clearly, the above does not work. But is this even possible? I want to insert the NEW data into tableB, only if the record for the row in tableA exists and belongs to owner_ID.
The queries I have seen so far relate to INSERTING the results from the SELECT query - I do not wish to do that.
Try this:
INSERT INTO tableB (tableA_ID,code,notes,created)
SELECT id, code, notes, created
FROM ( SELECT '24' as id, '1' as code, 'test' as notes, NOW() as created) t
WHERE EXISTS
(
SELECT tableA_ID
FROM tableA
WHERE tableA_ID='24' AND owner_ID='9'
)
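Here is a rough sketch of how the application side could use this pattern, again with Python's sqlite3 module standing in for MySQL. The helper name insert_if_owned and the sample data are assumptions, and SQLite lets the SELECT omit the derived table, so the query is slightly shorter than the MySQL version above. The affected-row count tells the caller whether the row in tableA existed.

```python
import sqlite3

def insert_if_owned(conn, table_a_id, owner_id, code, notes):
    """Insert into tableB only if tableA has the row for this owner.

    Returns True when a row was inserted, False when the check failed.
    """
    cur = conn.execute("""
        INSERT INTO tableB (tableA_ID, code, notes, created)
        SELECT ?, ?, ?, CURRENT_TIMESTAMP
        WHERE EXISTS (
            SELECT 1 FROM tableA WHERE tableA_ID = ? AND owner_ID = ?
        )
    """, (table_a_id, code, notes, table_a_id, owner_id))
    conn.commit()
    return cur.rowcount == 1

def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tableA (tableA_ID INTEGER PRIMARY KEY, owner_ID INTEGER);
        CREATE TABLE tableB (tableA_ID INTEGER, code TEXT, notes TEXT, created TEXT);
        INSERT INTO tableA VALUES (24, 9);
    """)
    ok = insert_if_owned(conn, 24, 9, "1", "test")       # owner matches
    refused = insert_if_owned(conn, 24, 8, "1", "test")  # wrong owner
    count = conn.execute("SELECT COUNT(*) FROM tableB").fetchone()[0]
    return ok, refused, count

print(demo())  # (True, False, 1)
```

Binding the values as parameters rather than pasting them into the SQL string keeps the check and the insert in one statement without inviting injection.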
I know this is an old, already-answered question, but it now ranks highly in Google search results, and I think an addition may help someone in the future.
In some database designs you may want to insert a row into a table that has two or more foreign keys. Let's say we have four tables in a chat application:
Users, Threads, Thread_Users and Messages
If we want a User to join a Thread, we need to insert a row into Thread_Users, which has two foreign keys: user_id and thread_id.
We can then use a query like this to insert if both foreign keys exist, and silently insert nothing otherwise:
INSERT INTO `thread_users` (thread_id,user_id,status,creation_date)
SELECT 2,3,'pending',1601465161690 FROM (SELECT 1 as nb_threads, 1 as nb_users) as tmp
WHERE tmp.nb_threads = (SELECT count(*) FROM `threads` WHERE threads.id = 2)
AND tmp.nb_users = (SELECT count(*) FROM `users` WHERE users.id = 3)
It's a little verbose, but it does the job pretty well.
Application-side, we just have to raise an error if affectedRows = 0 and perhaps check which of the keys doesn't exist. IMHO, this is better than executing two SELECT queries and then the INSERT, especially when the probability of a nonexistent foreign key is very low.

MySQL: delete duplicates rows where possible

I ran a query that introduced some duplicates into my database. The table is straightforward.
It has an id (int) column and a phrase column which is varchar(255). In order to find duplicates, my query looks like the following:
SELECT phrase from foo GROUP BY phrase HAVING (count(phrase) > 1)
My question is: how do I delete the duplicate entries without having to do it manually? I want to use the query above to generate the list of entries that need to be deleted, so that only one version of each 'phrase' exists in table foo.
This would keep one row (the one with the lowest ID) per phrase.
DELETE FROM foo
WHERE id NOT IN (
SELECT id FROM (
SELECT MIN(id) id
FROM foo
GROUP BY phrase
) _
);
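As dan pointed out in the comments, MySQL needs that odd-looking inner query: it refuses to select directly from the table being deleted from in a subquery (error 1093), and wrapping the subquery in a derived table works around that.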
You should use:
SELECT max(id) from foo GROUP BY phrase HAVING (count(phrase) > 1)
to establish which ids need to be deleted.
To delete the entries you can do something like this (max(id) needs the alias id so that the outer query can select it):
delete from foo where id in (select id from (SELECT max(id) AS id from foo GROUP BY phrase HAVING (count(phrase) > 1)) dups);
Each run removes one row per duplicated phrase, so if a phrase occurs more than twice, execute the delete statement repeatedly until it affects no rows.
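The "run it several times" part is easy to get wrong by hand, so here is a hedged sketch of the loop, using Python's sqlite3 module as a stand-in for MySQL. SQLite does not need the extra derived-table wrapper, so it is omitted here; keep it in MySQL. The function names and sample phrases are invented for the example.

```python
import sqlite3

def delete_max_id_until_unique(conn):
    """Repeatedly delete the highest id of every duplicated phrase.

    Returns the number of passes that actually removed rows.
    """
    passes = 0
    while True:
        cur = conn.execute("""
            DELETE FROM foo WHERE id IN (
                SELECT MAX(id) FROM foo GROUP BY phrase HAVING COUNT(*) > 1
            )
        """)
        conn.commit()
        if cur.rowcount == 0:  # nothing left to delete: every phrase is unique
            return passes
        passes += 1

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, phrase TEXT)")
    conn.executemany(
        "INSERT INTO foo VALUES (?, ?)",
        [(1, "x"), (2, "y"), (3, "x"), (4, "x"), (5, "y"), (6, "z")],
    )
    passes = delete_max_id_until_unique(conn)
    ids = [r[0] for r in conn.execute("SELECT id FROM foo ORDER BY id")]
    return passes, ids

print(demo())  # (2, [1, 2, 6])
```

Phrase 'x' appears three times, so two passes are needed. The MIN(id) / NOT IN variant from the earlier answer gets the same result in a single statement.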
You can create a copy of the table with a unique key on phrase, insert the rows so that only one of each duplicated value is kept, and then rename the copy to replace the original.
create table tmp like foo;
alter table tmp add unique (phrase);
insert into tmp select * from foo
on duplicate key update phrase=ifnull(phrase, values(phrase));
rename table foo to deleteme, tmp to foo;
drop table deleteme;
You can do a JOIN and decide whether you want to delete the first (min) or last (max) duplicate.
DELETE foo FROM foo JOIN
(SELECT max(id) AS id, COUNT(id) cnt from foo GROUP BY phrase
HAVING cnt>1) AS dups
ON foo.id=dups.id
You need to run it multiple times if you have more than one duplicate of each record.

Removing duplicate rows from a MySql table

Similar questions were indeed asked, but I didn't find an answer.
I have a MySql table with 3 non-unique fields. I don't want duplicate rows. Meaning ("a", "b", "c") and ("a", "dasd", "dfsd") are okay (I don't mind having "a" twice in the first fields), but having ("a", "b", "c") twice is wrong.
I need a query which will remove duplicates, leaving only one row for each row group.
Edit: This has already been covered on SO before.
One approach would be to create a new table based on the existing table. You could do this through something like:
create table myNewTable SELECT distinct * FROM myOldTable;
Then you could clear the old table's data, and create a unique constraint on the fields you don't want duplicated (all three, in this case):
TRUNCATE TABLE myOldTable;
ALTER TABLE myOldTable
ADD UNIQUE (field1, field2, field3);
Then insert your data back into the original table. Because you created myNewTable using DISTINCT, you should not have any duplicates.
INSERT INTO myOldTable SELECT * FROM myNewTable;
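The whole round trip can be tried out in a throwaway database. This sketch uses Python's sqlite3 module as a stand-in for MySQL, which forces two substitutions that are worth naming: SQLite has no TRUNCATE (DELETE FROM is used instead) and cannot add a constraint via ALTER TABLE (a unique index is used instead). The index name uniq_fields and the function names are assumptions.

```python
import sqlite3

def rebuild_without_duplicates(conn):
    """Copy distinct rows aside, empty the table, add a unique index, reload."""
    conn.executescript("""
        CREATE TABLE myNewTable AS SELECT DISTINCT * FROM myOldTable;
        DELETE FROM myOldTable;  -- stands in for TRUNCATE TABLE
        CREATE UNIQUE INDEX uniq_fields
            ON myOldTable (field1, field2, field3);  -- stands in for ADD UNIQUE
        INSERT INTO myOldTable SELECT * FROM myNewTable;
        DROP TABLE myNewTable;
    """)

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myOldTable (field1 TEXT, field2 TEXT, field3 TEXT)")
    conn.executemany(
        "INSERT INTO myOldTable VALUES (?, ?, ?)",
        [("a", "b", "c"), ("a", "b", "c"), ("a", "dasd", "dfsd")],
    )
    rebuild_without_duplicates(conn)
    rows = conn.execute("SELECT * FROM myOldTable ORDER BY field2").fetchall()
    try:
        conn.execute("INSERT INTO myOldTable VALUES ('a', 'b', 'c')")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True  # the unique index now blocks new duplicates
    return rows, rejected

print(demo())  # ([('a', 'b', 'c'), ('a', 'dasd', 'dfsd')], True)
```

The table is briefly empty between the delete and the reload, so on a live system this probably wants a maintenance window or a lock.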
Note: This assumes the table has a primary key in addition to Column1, Column2 and Column3, and that the last row of each group should be preserved. It is helpful when the rows carry other information besides Column1, Column2 and Column3.
It keeps the highest primary key for each distinct combination of Column1, Column2, Column3 and deletes the rest.
Insert the result of the query below into a temp table:
SELECT MAX(PrimaryKey) AS PrimaryKey
FROM TABLENAME
GROUP BY Column1,Column2,Column3
Then delete everything else:
DELETE FROM TABLENAME WHERE PrimaryKey NOT IN (SELECT PrimaryKey FROM TEMPTABLE)
If the table has only these 3 columns, then:
Save the distinct rows in a temp table
Truncate the original table
Insert back into the original from the temp table
You can retrieve a list of the duplicates like this:
SELECT field1, field2, field3, count(*) AS cnt
FROM yourtable
GROUP by field1, field2, field3
HAVING (cnt > 1)
You'll then have to delete the duplicate rows in subsequent separate queries.
I would solve the problem by using a temporary table and a subquery to find the elements to erase. This only works if your table 'yourTable' with the fields f1, f2, f3 also has a unique ID field.
Create the temporary table to store the IDs of the elements to erase.
CREATE TEMPORARY TABLE ids (ID int);
Find the IDs of the elements to erase:
INSERT INTO ids(ID) SELECT ID FROM yourTable AS t
WHERE 1 != (SELECT COUNT(*) FROM yourTable
WHERE yourTable.ID <= t.ID
AND yourTable.f1 = t.f1
AND yourTable.f2 = t.f2
AND yourTable.f3 = t.f3);
Delete the rows of the table with the previously selected IDs:
DELETE yourTable FROM yourTable,ids WHERE yourTable.ID = ids.ID;
Remove the temporary table
DROP TABLE ids;
If MySQL allowed a subquery to read from the same table that a DELETE is modifying, we could do all of this in a single query, but it does not, so we need to go through a temporary table.
To prevent duplicates from appearing again, I would make the three fields the primary key of the table, like this:
ALTER TABLE yourTable ADD PRIMARY KEY (f1, f2, f3);
You can only alter the table this way after all the duplicates have been removed; once the table is altered, subsequent inserts with duplicated values will fail.
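For anyone who wants to check the counting trick before running it on real data, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. The inner table is aliased as o (an assumption for readability), and the function names and sample rows are invented. A row is selected for deletion when the count of identical rows with an ID at or below its own is not 1, that is, when it is not the first of its group.

```python
import sqlite3

def delete_all_but_first(conn):
    """Delete every row that is not the lowest-ID row of its (f1, f2, f3) group."""
    conn.executescript("""
        CREATE TEMP TABLE ids (ID INTEGER);
        INSERT INTO ids (ID)
        SELECT t.ID FROM yourTable AS t
        WHERE 1 != (SELECT COUNT(*) FROM yourTable AS o
                    WHERE o.ID <= t.ID
                      AND o.f1 = t.f1 AND o.f2 = t.f2 AND o.f3 = t.f3);
        DELETE FROM yourTable WHERE ID IN (SELECT ID FROM ids);
        DROP TABLE ids;
    """)

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE yourTable (ID INTEGER PRIMARY KEY, f1 TEXT, f2 TEXT, f3 TEXT)"
    )
    conn.executemany(
        "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
        [(1, "a", "b", "c"), (2, "a", "b", "c"), (3, "a", "x", "y"),
         (4, "a", "b", "c"), (5, "a", "x", "y")],
    )
    delete_all_but_first(conn)
    return [r[0] for r in conn.execute("SELECT ID FROM yourTable ORDER BY ID")]

print(demo())  # [1, 3]
```

One caveat: the equality comparisons never match NULL, so rows with NULL in f1, f2 or f3 count zero matches and would all be selected for deletion. Handle those separately if the columns are nullable.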

Deleting duplicate rows from a table

I have a table in my database which has duplicate records that I want to delete. I don't want to create a new table with distinct entries for this. What I want is to delete duplicate entries from the existing table without the creation of any new table. Is there any way to do this?
id action
L1_name L1_data
L2_name L2_data
L3_name L3_data
L4_name L4_data
L5_name L5_data
L6_name L6_data
L7_name L7_data
L8_name L8_data
L9_name L9_data
L10_name L10_data
L11_name L11_data
L12_name L12_data
L13_name L13_data
L14_name L14_data
L15_name L15_data
These are all my fields:
id is unique for every row.
L11_data should be unique for its respective action field.
L11_data holds company names, while action holds the names of the industries.
So in my data I have duplicate company names in L11_data for their respective industries.
What I want is a unique name, along with the other data, for each company in the particular industry stored in action. I hope I have stated my problem in a way that people can understand.
Yes, assuming you have a unique ID field, you can delete all records that are the same except for the ID, but don't have "the minimum ID" for their group of values.
Example query:
DELETE FROM Table
WHERE ID NOT IN
(
SELECT MIN(ID)
FROM Table
GROUP BY Field1, Field2, Field3, ...
)
Notes:
I freely chose "Table" and "ID" as representative names
The list of fields ("Field1, Field2, ...") should include all fields except for the ID
This may be a slow query depending on the number of fields and rows, however I expect it would be okay compared to alternatives
EDIT: In case you don't have a unique index, my recommendation is to simply add an auto-incremental unique index. Mainly because it's good design, but also because it will allow you to run the query above.
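ALTER IGNORE TABLE `table` ADD UNIQUE INDEX(your cols);
With IGNORE, only the first row for each duplicated key is kept and the other conflicting rows are dropped while the index is built, so no separate DELETE is needed. (The IGNORE clause was removed from ALTER TABLE in MySQL 5.7.4.)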
DELETE
FROM table_x a
WHERE a.rowid < ANY (
SELECT b.rowid
FROM table_x b
WHERE a.someField = b.someField
AND a.someOtherField = b.someOtherField
)
AND (
a.someField,
a.someOtherField
) IN (
SELECT c.someField,
c.someOtherField
FROM table_x c
GROUP BY c.someField,
c.someOtherField
HAVING count(*) > 1
)
In the above query (which relies on a rowid pseudocolumn, as Oracle provides), the combination of someField and someOtherField must uniquely identify the duplicates.