Remove repeat rows from MySQL table - mysql

Is there a way to remove all repeat rows from a MySQL database?

A couple of years ago, someone requested a way to delete duplicates. Subselects make it possible with a query like this in MySQL 4.1:
DELETE FROM some_table WHERE primaryKey NOT IN
(SELECT MIN(primaryKey) FROM some_table GROUP BY some_column)
Of course, you can use MAX(primaryKey) as well if you want to keep the newest record with the duplicate value instead of the oldest record with the duplicate value.
To understand how this works, look at the output of this query:
SELECT some_column, MIN(primaryKey) FROM some_table GROUP BY some_column
As you can see, this query returns the primary key for the first record containing each value of some_column. Logically, then, any key value NOT found in this result set must be a duplicate, and therefore it should be deleted.

These questions / answers might interest you :
How to delete duplicate records in mysql database?
How to delete Duplicates in MySQL table.
And idea that's often used when you are working with a big table is to :
Create a new table
Insert into that table the unique records (i.e. only one version of the duplicates in the original table, generally using a select distinct)
and use that new table in your application ; or drop the old table and rename the new one to the old name.
Good thing with this principle is you have the possibility to verify what's in the new table before dropping the old one -- always nice to check that sort of thing ^^

Related

Can I add rows to MySQL before removing all old rows (except same primary)?

If I have a table that has these rows:
animal (primary)
-------
man
dog
cow
and I want to delete all the rows and insert my new rows (that may contain some of the same data), such as:
animal (primary)
-------
dog
chicken
wolf
I could simply do something like:
delete from animal;
and then insert the new rows.
But when I do that, for a split second, 'dog' won't be accessible through the SELECT statement.
I could simply insert ignore the new data and then delete the rest, one by one, but that doesn't feel like the right solution when I have a lot of rows.
Is there a way to insert the new data and then have MySQL automatically delete the rest afterward?
I have a program that selects data from this table every 5 minutes (and the code I'm writing now will be updating this table once every 30 minutes), so I would like to be as accurate as possible at all times, and I would rather have too many rows for a split second than too few rows for the same time.
Note: I know that this may seem like it is unnecessary but I just feel like if I leave too many of those unlikely possibilities in different places, there will be times where things go wrong.
You may want to use TRUNCATE instead of DELETE here. TRUNCATE is faster than DELETE and resets the table back to its empty state (meaning IDENTITY columns are reset to original values as well).
Not sure why you're having problems with selecting a value that was deleted and re-added, maybe I'm missing some context. But if you're wiping the table clean, you might want to use truncate instead.
You could add another column timestamp and change the select statement to accommodate this scenario where it needs to check for the latest value.
If this is for school, I would argue that you need a timestamp and that is what your professor is looking for. You shouldn't need to truncate a table to get the latest values, you need to adjust the thinking behind the table and how you are querying data. Hope this helps!
Check out these:
How to make a mysql table with date and time columns?
Why not update values instead?
My other questions would be:
How are you loading this into the table?
What does that code look like?
Can you change the way you Select from the table?
What values are being "updated" and change in such a way that you need to truncate the entire table?
If you don't want to add new column, there is an other method.
1. At first step, update table in any way that mark all existing rows for deletion in future. For example:
UPDATE `table_name` SET `animal`=CONCAT('MUST_BE_DELETED_', `animal`)
At second step, insert new rows.
On final step, remove all marked rows:
DELETE FROM `table_name` WHERE `animal` LIKE 'MUST_BE_DELETED_%'
You could implement this by having the updated_on column as timestamp and you may even utilize some default values, but let's go with an example without them.
I presume the table would look something like this:
CREATE TABLE `new_table` (
`animal` varchar(255) NOT NULL,
`updated_on` timestamp,
PRIMARY KEY (`animal`)
) ENGINE=InnoDB
This is just a dummy table example. What's important are the two queries later on.
You would simply perform a query to insert the data, such as:
insert into my_table(animal)
select animal from my_view where animal = 'dogs'
on duplicate key update
updated_on = current_timestamp;
Please notice that my_view is your table/view/query by which you supply the values to insert into your table. Also notice that you need to have primary/unique key constraint on your animal column in this example, in order to work.
Then, you proceed with the following query, to "purge" (delete) the old values:
delete from my_table
where updated_on < (
select *
from (
select max(updated_on) from my_table
) as max_date
);
Please notice that you could make a separate view in order to obtain this max_date value for updated_on entry. This entry should indicate the timestamp for your last updated/inserted values in a previous query, so you could proceed with utilizing it in a where clause in order to issue deletion of old records that you don't want/need anymore.
IMPORTANT NOTE:
Since you are doing multiple queries and it's supposed to be a single operation, I'd advise you to utilize it within a single trancations and to utilize a proper rollback on various potential outcomes (i.e. in case of mysql exceptions). You might wish to utilize a proper stored procedure for that.

How to remove duplicate entries from database which as over 10000 records with no id field

I have database like the following with 10K rows. How to delete duplicate if all fields are same. I don't want to search for any specific company. Is there a way to search and find any multiple entries with all same fields get deleted. Thanks
This command adds a unique key, and drops all rows that generate errors (due to the unique key). This removes duplicates.
ALTER IGNORE TABLE table ADD UNIQUE KEY idx1(title);
Note: This command may not work for InnoDB tables for some versions of MySQL. See this post for a workaround. (Thanks to "an anonymous user" for this information.)
OR
Simply creates a new table without duplicates. Sometimes this is actually faster and easier than trying to delete all the offending rows. Just create a new table, insert the unique rows (I used min(id) for the id of the resulting row), rename the two tables, and (once you are satisfied that everything worked correctly) drop the original table
This below query used to find the duplicate entry using all fields:
Select * from Table group by company_name,city,state,country having count(*)>1;

I have a query that finds duplicates in my SQL database-now how do I delete said duplicates?

I have an sql query that finds and groups these duplicates using very complicated conditions:
SELECT right(post_url, LOCATE('-', REVERSE(post_url),LOCATE('-',REVERSE(post_url))+1) -1) as name,
left(post_name,LOCATE('-',post_url,LOCATE('-',post_url)+1) - 1) as city,
post_title as original,ID,post_name,count(*)
FROM table WHERE post_type='finder'
GROUP BY name,city having count(*) > 1
To explain the query, post_url is basically a url name, ending with the name of someone, e.g : new-jersey-something-something-donald-t
I go to the second dash from the right and get the name that way. Then I get the city/state which is in the second dash from the left. In this manner, I've successfully found the duplicates in this database-but I'm having trouble thinking of a way to isolate the duplicate and delete it. In addition, I only want to delete the copy that does not have %near% in post_url. my question is, using the query here, how would I change this to delete the duplicate?
You're not going to be able to do it in one query. That's because you need to write a query that looks something like this:
DELETE FROM table
WHERE id IN (SELECT ... FROM table WHERE ...)
MySQL specifically prohibits this. You can't delete based on a subquery that references the same table. You also can't rewrite this query using JOINs.
There is an easy solution, though: use a temporary table and two queries.
-- build the list of IDs to delete
CREATE TEMPORARY TABLE temp
SELECT ... FROM table WHERE ...
-- now delete those items
DELETE FROM table
WHERE id IN (SELECT id FROM temp);
You can improve performance with JOINs and indexes.
The key to "isolating" the duplicates is to ensure that every item you want to delete has a primary key - that way you can easily build a list of IDs to delete. If your table don't have primary keys, you are reduced to doing WHERE clauses and JOINs on multiple columns - that gets messy very quickly.

How to avoid duplicate entries without primary key and unique key?

I want to know whether it is possible to avoid duplicate entries or data without any keys or group by statement
Create Unique key constrait.
ALTER TABLE Comment ADD CONSTRAINT uc_Comment UNIQUE (CommentId, Comment)
In above case Comment duplication will not be done as we are creating the unique combination of COmmentId and Comment.
Hope this helps.
More info : http://www.w3schools.com/sql/sql_unique.asp OR
SQL Server 2005 How Create a Unique Constraint?
If you want to suppress duplicates when querying, use SELECT DISTINCT.
If you want to avoid putting duplicates into a table, just don't insert records that are already there. It doesn't matter whether you have a primary/unique key: those will make the database not allow duplicate records, but it's still up to you to avoid trying to insert duplicates (assuming you want your queries to succeed).
You can use SELECT to find whether a record already exists before trying to insert it. Or, if you want to be fancy, you can insert the new records into a temporary table, use DELETE to remove any that are already present in the real table, then use INSERT ... SELECT to copy the remaining records from the temporary table into the real one.

mySQL find dupes and remove them

I am wondering if there is a way to do this through one query.
Seems when I was initially populating my DB with dummy data to work with 10k records, somewhere in the mess of it all the script dummped an extra 1,044 rows where the rows are duplicates. I determined this using
SELECT x.ID, x.firstname FROM info x
INNER JOIN (SELECT ID FROM info
GROUP BY ID HAVING count(id) > 1) d ON x.ID = d.ID
What I am trying to figure out is through this single query can I add another piece to it that will remove one of the matching dupes from each dupe found?
also I realize the ID column should have been set to auto increment, but it wasn't
My favorite way of removing duplicates would be:
ALTER IGNORE TABLE info ADD UNIQUE (ID);
To explain a bit further (for reference, take a look here)
UNIQUE - you are adding unique index to ID column.
IGNORE - is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.
The query that I use is generally something like
Delete from table where id in (
Select Max(id) from table
Group by (DUPFIELD)
Having count (*)>1)
You have to run this several times since it all only remove one duplicated row at a time, but it's fast.
The most efficient way is you do it in below steps:
Step 1: Move the non duplicates (unique tuples) into a temporary table
CREATE TABLE new_table as
SELECT * FROM old_table WHERE 1 GROUP BY [column to remove duplicates by];
Step 2: delete delete the old table.We no longer need the table with all the duplicate entries, so drop it!
DROP TABLE old_table;
Step 3: rename the new_table to the name of the old_table
RENAME TABLE new_table TO old_table;