I thought I'd made the column userid in my table "userslive" unique, but somehow must have made a mistake. I've seen multiple answers to this question, but I'm afraid of messing up again so I hope someone can help me directly.
So this table has no unique columns, but I've got a column "timer" which was the timestamp of scraping the data. If possible I'd like to drop rows with the lowest "timer" with duplicate "userid" column.
It's a fairly big table at about 2 million rows (20 columns). There are about 1,000 duplicated userid values, which I found using this query:
SELECT userid, COUNT(userid) as cnt FROM userslive GROUP BY userid HAVING (cnt > 1);
Is this the correct syntax? I tried this on a backup table, but I suspect it is too heavy for a table this big (unless left to run for a very long time).
DELETE FROM userslive using userslive,
userslive e1
where userslive.timer < e1.timer
and userslive.userid = e1.userid
Is there a quicker way to do this?
EDIT: I should say the "timer" is not a unique column.
DELETE t1.* /* delete from a copy named t1 only */
FROM userslive t1, userslive t2
WHERE t1.userid = t2.userid
AND t1.timer < t2.timer
Logic: if for some row (in the copy aliased as t1) we can find a row (in the copy aliased as t2) with the same userid but a greater (later) timer value, the t1 row must be deleted. Because the comparison is strict, rows that tie for the latest timer are all kept.
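To check this rule on toy data before touching a 2-million-row table, here is a minimal sketch using Python's built-in sqlite3 module. SQLite only stands in for MySQL here: it has no multi-table DELETE, so the same condition is written with EXISTS, and the sample rows and the note column are made up for illustration.

```python
import sqlite3

# Toy stand-in for userslive; SQLite is used only to check the logic.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE userslive (userid INTEGER, timer INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO userslive VALUES (?, ?, ?)",
    [
        (1, 100, "old"),
        (1, 200, "new"),
        (2, 150, "only"),
        (3, 300, "tie-a"),
        (3, 300, "tie-b"),
        (3, 250, "old"),
    ],
)

# Same rule as the self-join: delete a row if another row for the same
# userid has a strictly later timer.
conn.execute(
    """
    DELETE FROM userslive
    WHERE EXISTS (
        SELECT 1 FROM userslive AS t2
        WHERE t2.userid = userslive.userid
          AND t2.timer > userslive.timer
    )
    """
)

remaining = conn.execute(
    "SELECT userid, timer, note FROM userslive ORDER BY userid, note"
).fetchall()
print(remaining)
```

Note that userid 3 still has two rows afterwards, because both share the latest timer. If timer is not unique, a second pass with a different tie-breaker would be needed before adding a unique index.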
I've done this in the past and the easiest way to solve this is to add an id column and then select userid, max(new_id) into a new table and join that for the delete. Something like this.
ALTER TABLE `userslive`
ADD `new_id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;
Now you have your new unique column and create a new table for selecting the ones to delete.
CREATE TABLE `users_to_delete`
AS
SELECT userid, new_id
FROM (
SELECT userid, max(new_id) new_id, count(*) user_rows
FROM `userslive`
GROUP BY 1
) dataset
WHERE user_rows > 1
Then use that to delete your duplicate rows by joining it into a DELETE statement like this:
DELETE `userslive` FROM `userslive`
INNER JOIN `users_to_delete` USING(userid,new_id);
Make sure you back everything up before you delete anything just in case.
I did research but the answers were too complicated to convert to my schema and solution.
I have a table which I forgot to make a field unique in and now the insert has created lots and lots of items under the same field value. My table name is queue_items and the field is called item - how can I remove duplicates of item field?
I still want to be left with 1 item of the duplicates if that makes sense, but just delete any more than 1.
Here is what I've got so far
WITH CTE AS(
SELECT `item`
RN = ROW_NUMBER()OVER(PARTITION BY `item` ORDER BY `item`)
FROM `queue_items`
)
DELETE FROM CTE WHERE RN > 1
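If you have a primary key, e.g. id, you could try something like the following (the extra derived table avoids MySQL error 1093, which is raised when a subquery selects directly from the table being deleted from):
DELETE FROM
queue_items
WHERE
id
NOT IN (
SELECT keep_id FROM (
SELECT MIN(id) AS keep_id FROM queue_items GROUP BY item
) AS keepers
);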
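As a rough way to try the "keep the lowest id per item" idea on sample data first, here is a sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL, and the sample items are invented. It previews the ids that would be kept before running the delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queue_items (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany(
    "INSERT INTO queue_items (item) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",), ("a",)],
)

# Preview: the lowest id for each distinct item is the row that survives.
keep_ids = [
    row[0]
    for row in conn.execute(
        "SELECT MIN(id) AS keep_id FROM queue_items GROUP BY item ORDER BY keep_id"
    ).fetchall()
]

# Delete every row whose id is not one of the survivors.
conn.execute(
    """
    DELETE FROM queue_items
    WHERE id NOT IN (
        SELECT keep_id FROM (
            SELECT MIN(id) AS keep_id FROM queue_items GROUP BY item
        ) AS keepers
    )
    """
)

rows = conn.execute("SELECT id, item FROM queue_items ORDER BY id").fetchall()
print(keep_ids, rows)
```

Running the preview query on the real table first is a cheap sanity check: its row count should equal the number of distinct item values.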
I would suggest emptying the table and repopulating it (ROW_NUMBER() needs MySQL 8.0 or later):
CREATE TABLE temp_qi AS
SELECT qi.*
FROM (SELECT q.*,
ROW_NUMBER() OVER (PARTITION BY `item` ORDER BY `item`) AS seqnum
FROM `queue_items` q
) qi
WHERE seqnum = 1;
ALTER TABLE temp_qi DROP COLUMN seqnum;
TRUNCATE TABLE queue_items; -- backup before doing this!
INSERT INTO queue_items
SELECT * -- columns should be in the right order
FROM temp_qi;
Well, that was a tricky one :) As far as I can tell, you have a SQL table with duplicate entries and no unique key (of course), and now you simply want to get rid of the duplicates. I tried to recreate this in MySQL, which is not that easy, because MySQL does not let you DELETE from or UPDATE a table while selecting from that same table in a subquery.
Instead I had to follow this simple workaround:
Create a new TempTable, with the same structure as the original table
Copy every entry to the new TempTable, BUT group them by the duplicate_ID
DROP the original table
RENAME TempTable to the original table's name
In SQL you can do so by running the following queries - but make sure you have a backup, just in case.... :)
Workaround Delete duplicate SQL entries:
CREATE TABLE newTestTabelle LIKE TestTabelle;
-- make sure the table was created successfully before the next step
INSERT INTO newTestTabelle
SELECT * FROM TestTabelle
GROUP BY myIndex;
-- note: SELECT * with GROUP BY is rejected when ONLY_FULL_GROUP_BY is enabled
-- make sure the copy succeeded before dropping anything
DROP TABLE TestTabelle;
ALTER TABLE newTestTabelle RENAME TO TestTabelle;
Hint: I found a similar solution, with very nice documentation, at the following link (http://www.mysqltutorial.org/mysql-delete-duplicate-rows/); read it for further information on the topic.
I am using the following query to delete duplicate records, keeping one of each, from my table. It works well with small tables, but it got stuck when I tried it with a table that has more than 130,000 records. The thing is, I don't even get an error. phpMyAdmin just gets stuck and the query (the "loading..." yellow line) basically takes forever.
My table structure
person_id (AI & PK)
person_name ( I want to delete multiple person_name records except one)
query
DELETE t2
FROM `person` t1
INNER JOIN `person` t2
ON t1.person_name = t2.person_name
AND t1.person_id < t2.person_id;
UPDATE: I don't have an index on the person table. But my three other tables (person_job, person_image, book_who_wrote_it) contain foreign keys referencing person (person_id).
First, do you have an index on person(person_name, person_id)? That would be the place to start.
Deleting lots of rows incurs overhead. Often, it is faster to put the results in another table and reinsert them:
create temporary table tmp_person as
select p.*
from person p join
(select person_name, max(person_id) as max_person_id
from person
group by person_name
) pp
on p.person_id = pp.max_person_id;
truncate table person;
insert into person
select * from tmp_person;
Be sure you validate tmp_person before truncating person! Truncate does not log the deletion of each row, so it is much, much, much faster than delete under most circumstances.
NOTE:
If you really only have two columns in person, then you can simplify the first query to:
create temporary table tmp_person as
select person_name, max(person_id) as max_person_id
from person
group by person_name;
try this
DELETE
FROM `person`
where person_id not in
(select * from
(select min(person_id) from person group by person_name) x)
I need to remove all duplicate records that have the same stationId and keep only the one record that has the latest dateUpdated
stationId is varchar(20)
dateUpdated is datetime
I usually remove duplicates with the following, but this time I don't think it will work
ALTER IGNORE TABLE table ADD UNIQUE KEY idx1(title);
That ALTER IGNORE statement does remove duplicate rows on the MySQL versions that support it, but it keeps whichever row it encounters first, not necessarily the one with the latest dateUpdated.
Instead:
delete t
from table t left join
(select t.stationId, max(t.dateUpdated) as maxdu
from table t
group by t.stationId
) tmax
on t.stationId = tmax.stationId and t.dateUpdated = tmax.maxdu
where tmax.stationId is null;
DELETE t1 FROM table t1, table t2 WHERE t1.dateUpdated < t2.dateUpdated AND t1.stationId= t2.stationId
Delete all Duplicate Rows except for One in MySQL?
I've seen a number of variations on this but nothing quite matches what I'm trying to accomplish.
I have a table, TableA, which contain the answers given by users to configurable questionnaires. The columns are member_id, quiz_num, question_num, answer_num.
Somehow a few members got their answers submitted twice. So I need to remove the duplicated records, but make sure that one row is left behind.
There is no primary column so there could be two or three rows all with the exact same data.
Is there a query to remove all the duplicates?
Add Unique Index on your table:
ALTER IGNORE TABLE `TableA`
ADD UNIQUE INDEX (`member_id`, `quiz_num`, `question_num`, `answer_num`);
Another way to do this would be:
Add primary key in your table then you can easily remove duplicates from your table using the following query:
DELETE FROM member
WHERE id IN (SELECT *
FROM (SELECT id FROM member
GROUP BY member_id, quiz_num, question_num, answer_num HAVING (COUNT(*) > 1)
) AS A
);
Instead of dropping TableA, you could delete all of its records (delete from TableA;) and then repopulate the original table with the records from TableA_Verify (insert into TableA select * from TableA_Verify). That way you won't lose anything attached to the original table (indexes, ...).
CREATE TABLE TableA_Verify AS SELECT DISTINCT * FROM TableA;
DELETE FROM TableA;
INSERT INTO TableA SELECT * FROM TableA_Verify;
DROP TABLE TableA_Verify;
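For anyone who wants to see the copy-distinct-and-repopulate sequence run end to end, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL (these four statements happen to be valid in both), and the sample answers are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA ("
    "member_id INTEGER, quiz_num INTEGER, question_num INTEGER, answer_num INTEGER)"
)
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 2),
        (1, 1, 1, 2),  # exact duplicate
        (1, 1, 2, 3),
        (2, 1, 1, 4),
        (2, 1, 1, 4),  # submitted three times
        (2, 1, 1, 4),
    ],
)
before = len(conn.execute("SELECT * FROM TableA").fetchall())

# The four steps: copy distinct rows aside, empty the original,
# copy them back, drop the helper table.
for statement in (
    "CREATE TABLE TableA_Verify AS SELECT DISTINCT * FROM TableA",
    "DELETE FROM TableA",
    "INSERT INTO TableA SELECT * FROM TableA_Verify",
    "DROP TABLE TableA_Verify",
):
    conn.execute(statement)

after = conn.execute(
    "SELECT * FROM TableA ORDER BY member_id, quiz_num, question_num, answer_num"
).fetchall()
print(before, after)
```

This only collapses rows that are identical in every column. If some column (such as a timestamp) differs between the copies, DISTINCT will keep all of them.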
This doesn't use TEMP Tables, but real tables instead. If the problem is just about temp tables and not about table creation or dropping tables, this will work:
CREATE TABLE TableA_Verify AS SELECT DISTINCT * FROM TableA;
DROP TABLE TableA;
RENAME TABLE TableA_Verify TO TableA;
Thanks to jveirasv for the answer above.
If you need to remove duplicates based on a specific set of columns, you can use this (for example, if the table has a timestamp that varies between otherwise identical rows):
CREATE TABLE TableA_Verify AS SELECT * FROM TableA WHERE 1 GROUP BY [COLUMN TO remove duplicates BY];
DELETE FROM TableA;
INSERT INTO TableA SELECT * FROM TableA_Verify;
DROP TABLE TableA_Verify;
Add Unique Index on your table:
ALTER IGNORE TABLE TableA
ADD UNIQUE INDEX (member_id, quiz_num, question_num, answer_num);
This works very well.
If you are not using a primary key, execute the following queries in one go, replacing these values:
# table_name - Your Table Name
# column_name_of_duplicates - Name of column where duplicate entries are found
create table table_name_temp like table_name;
insert into table_name_temp select distinct(column_name_of_duplicates),value,type from table_name group by column_name_of_duplicates;
delete from table_name;
insert into table_name select * from table_name_temp;
drop table table_name_temp
create temporary table and store distinct(non duplicate) values
make empty original table
insert values to original table from temp table
delete temp table
It is always advisable to take backup of database before you play with it.
As noted in the comments, the query in Saharsh Shah's answer must be run multiple times if items are duplicated more than once.
Here's a solution that doesn't delete any data, and keeps the data in the original table the entire time, allowing for duplicates to be deleted while keeping the table 'live':
alter table tableA add column duplicate tinyint(1) not null default '0';
update tableA set
duplicate=if(@member_id=member_id
and @quiz_num=quiz_num
and @question_num=question_num
and @answer_num=answer_num,1,0),
member_id=(@member_id:=member_id),
quiz_num=(@quiz_num:=quiz_num),
question_num=(@question_num:=question_num),
answer_num=(@answer_num:=answer_num)
order by member_id, quiz_num, question_num, answer_num;
delete from tableA where duplicate=1;
alter table tableA drop column duplicate;
This basically checks to see if the current row is the same as the last row, and if it is, marks it as duplicate (the order statement ensures that duplicates will show up next to each other). Then you delete the duplicate records. I remove the duplicate column at the end to bring it back to its original state.
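The "compare each row with the previous one in sorted order" idea can be sketched outside the database too. The plain-Python version below is only an illustration of the logic, not of how MySQL executes the UPDATE. The function name mark_duplicates and the sample rows are made up.

```python
# Each tuple is (member_id, quiz_num, question_num, answer_num).
rows = [
    (2, 1, 1, 4),
    (1, 1, 1, 2),
    (2, 1, 1, 4),
    (1, 1, 2, 3),
    (1, 1, 1, 2),
    (2, 1, 1, 4),
]


def mark_duplicates(rows):
    """Return (row, flag) pairs; flag is 1 when the row repeats the previous one."""
    flagged = []
    previous = None
    # Sorting puts identical rows next to each other, like the ORDER BY does.
    for row in sorted(rows):
        flagged.append((row, 1 if row == previous else 0))
        previous = row
    return flagged


flagged = mark_duplicates(rows)
kept = [row for row, duplicate in flagged if duplicate == 0]
print(kept)
```

Because every repeat after the first is flagged, a row that appears three times is reduced to one copy in a single pass.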
It looks like alter table ignore also might go away soon: http://dev.mysql.com/worklog/task/?id=7395
An alternative way would be to create a new temporary table with same structure.
CREATE TABLE temp_table AS SELECT * FROM original_table LIMIT 0
Then create the primary key in the table.
ALTER TABLE temp_table ADD PRIMARY KEY (primary-key-field)
Finally copy all records from the original table while ignoring the duplicate records.
INSERT IGNORE INTO temp_table SELECT * FROM original_table
Now you can delete the original table and rename the new table.
DROP TABLE original_table
RENAME TABLE temp_table TO original_table
Tested in MySQL 5. I don't know about other versions.
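Here is a small sketch of the same copy-into-a-keyed-table idea using Python's built-in sqlite3 module. Treat it as an approximation: SQLite spells MySQL's INSERT IGNORE as INSERT OR IGNORE and renames tables with ALTER TABLE ... RENAME TO, and the code and label columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_table (code TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO original_table VALUES (?, ?)",
    [
        ("x", "first x"),
        ("y", "first y"),
        ("x", "second x"),
        ("z", "first z"),
        ("y", "second y"),
    ],
)

# New table with the key that should have been there from the start.
conn.execute("CREATE TABLE temp_table (code TEXT PRIMARY KEY, label TEXT)")

# Rows that would violate the primary key are skipped instead of raising,
# so the first row seen for each code wins.
conn.execute(
    "INSERT OR IGNORE INTO temp_table "
    "SELECT code, label FROM original_table ORDER BY rowid"
)

# Swap the deduplicated table into place.
conn.execute("DROP TABLE original_table")
conn.execute("ALTER TABLE temp_table RENAME TO original_table")

rows = conn.execute(
    "SELECT code, label FROM original_table ORDER BY code"
).fetchall()
print(rows)
```

Which duplicate survives depends on the order in which the rows are read, so add an explicit ORDER BY to the SELECT if a particular row should be the one that is kept.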
If you want to keep the row with the lowest id value:
DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id > n2.id AND n1.member_id = n2.member_id AND n1.answer_num = n2.answer_num
If you want to keep the row with the highest id value:
DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id < n2.id AND n1.member_id = n2.member_id AND n1.answer_num = n2.answer_num
I have two tables (id_test, test). Each of them has an ID column, which is unique, and entries with the same ID in the two tables refer to the same thing. Now, I have another column in one of the tables (id_test) that should also be unique, so I want to eliminate duplicates according to this other column; let's call it YD.
To identify the duplicates I used
SELECT ID, YD AS x, COUNT(*) AS y
FROM id_test
GROUP BY x
HAVING y>1;
Now I want to delete these entries in both tables. How can I do it?
This query shows the first ID for every YD in id_test table:
SELECT ID, YD
FROM id_test
GROUP BY YD
and these are the rows you have to keep. The following query returns the IDs you have to delete:
SELECT id_test.ID
FROM id_test LEFT JOIN (select ID, YD from id_test group by YD) id_test_keep
on id_test.ID=id_test_keep.ID and id_test.YD = id_test_keep.YD
WHERE id_test_keep.ID IS NULL
I think I need more details about your tables, but what I think you need is this:
DELETE FROM test
WHERE
test.ID IN (
SELECT id_test.ID
FROM id_test LEFT JOIN (select ID, YD from id_test group by YD) id_test_keep
on id_test.ID=id_test_keep.ID and id_test.YD = id_test_keep.YD
WHERE id_test_keep.ID IS NULL)
As documented under ALTER TABLE Syntax (emphasis added):
IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.
Therefore:
ALTER IGNORE TABLE id_test ADD UNIQUE (YD)
I don't think you should use SELECT ... IN here, because it becomes impractical when the data is large.
You should clone a table with the same structure and insert only the non-duplicate data into it.
INSERT INTO test_new (ID, YD) SELECT t.ID, t.YD FROM test t LEFT JOIN test_id ti ON t.ID = ti.id WHERE ti.id IS NULL;
Afterwards, drop table test and rename test_new to test.