I have a database called 'master_database' and a table called 'info'.
In the 'info' table I have multiple records and I need the 'email' field to not contain any duplicates but currently it does. What SQL command can I run to remove these duplicates?
You can find the rows that are repeated by using this:
SELECT email, COUNT(email) FROM info GROUP BY email HAVING COUNT(email) > 1
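To delete them, keeping the row with the highest ID for each email (the inner derived table works around MySQL's error 1093, which is raised when a DELETE reads directly from its own target table):
DELETE FROM master_database.info WHERE ID NOT IN (SELECT keep_id FROM (SELECT MAX(ID) AS keep_id FROM master_database.info GROUP BY email) AS keepers)
This assumes you have a unique ID in the table where a higher number means a later record. If you have a last_edited timestamp, it might be better to use the MAX of that.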
PLEASE TEST FIRST! Run the following to test:
SELECT * FROM master_database.info WHERE ID NOT IN (SELECT MAX(ID) FROM master_database.info GROUP BY email)
These are the rows that will be deleted.
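As a rough, self-contained illustration of the keep-the-highest-ID idea, the sketch below runs that kind of statement from Python against an in-memory SQLite database. SQLite is only a stand-in for MySQL so the example runs anywhere; the sample rows and the helper name dedupe_by_email are made up for illustration. The subquery has no HAVING filter, so rows whose email appears only once are kept as well.

```python
import sqlite3


def dedupe_by_email(conn):
    """Delete every row whose ID is not the highest ID for its email."""
    cur = conn.execute(
        "DELETE FROM info WHERE ID NOT IN ("
        "  SELECT MAX(ID) FROM info GROUP BY email"
        ")"
    )
    return cur.rowcount  # number of rows removed


# Hypothetical sample data: a@example.com appears three times.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE info (ID INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO info (ID, email) VALUES (?, ?)",
    [
        (1, "a@example.com"),
        (2, "b@example.com"),
        (3, "a@example.com"),
        (4, "a@example.com"),
    ],
)

removed = dedupe_by_email(conn)
remaining = conn.execute("SELECT ID, email FROM info ORDER BY ID").fetchall()
print(removed, remaining)
```

On MySQL itself the subquery would additionally need to be wrapped in a derived table, because MySQL refuses a DELETE whose subquery reads directly from the target table.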
If you don't have a unique field at all (i.e. the rows really are complete duplicates), the following methods will work:
Go to relevant table.
Create select query to select only the relevant duplicate row pairs.
Amend the query to delete, but add a limit, like:
DELETE FROM [table name] WHERE [fieldname] LIKE '[value]' AND [fieldname] LIKE '[value]' LIMIT 1
Simulate the query to check its effect before using it to delete one row of a duplicate pair, or run it repeatedly if there are more than two duplicates.
or
Go to relevant table.
Create select query to select only the relevant rows.
Copy query for later pasting.
Use the Export button at the bottom of the query results to export just those results.
Paste query back but amend it to delete those rows.
Copy export insert query for one of the duplicate rows to reinsert without the duplicate.
I have an SQL query that finds and groups these duplicates using very complicated conditions:
SELECT right(post_url, LOCATE('-', REVERSE(post_url),LOCATE('-',REVERSE(post_url))+1) -1) as name,
left(post_name,LOCATE('-',post_url,LOCATE('-',post_url)+1) - 1) as city,
post_title as original,ID,post_name,count(*)
FROM table WHERE post_type='finder'
GROUP BY name,city having count(*) > 1
To explain the query: post_url is basically a URL name ending with the name of someone, e.g.: new-jersey-something-something-donald-t
I go to the second dash from the right and get the name that way. Then I get the city/state, which ends at the second dash from the left. In this manner, I've successfully found the duplicates in this database, but I'm having trouble thinking of a way to isolate the duplicate and delete it. In addition, I only want to delete the copy that does not have %near% in post_url. My question is: using the query here, how would I change this to delete the duplicate?
You're not going to be able to do it in one straightforward query. That's because you would need to write a query that looks something like this:
DELETE FROM table
WHERE id IN (SELECT ... FROM table WHERE ...)
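MySQL specifically prohibits this form: a DELETE can't use a subquery that reads directly from the table being deleted from (error 1093, "You can't specify target table for update in FROM clause"). Rewriting a grouped query like this one with JOINs also gets awkward quickly.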
There is an easy solution, though: use a temporary table and two queries.
-- build the list of IDs to delete
CREATE TEMPORARY TABLE temp
SELECT ... FROM table WHERE ...;
-- now delete those items
DELETE FROM table
WHERE id IN (SELECT id FROM temp);
You can improve performance with JOINs and indexes.
The key to "isolating" the duplicates is to ensure that every item you want to delete has a primary key. That way you can easily build a list of IDs to delete. If your table doesn't have a primary key, you are reduced to doing WHERE clauses and JOINs on multiple columns, and that gets messy very quickly.
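The two-query pattern can be sketched roughly as follows, run from Python against an in-memory SQLite database as a stand-in for MySQL. Everything here is assumed for illustration: the posts table, the temp_ids name, and especially the dup_key column, which stands for whatever name/city expression identifies a duplicate group in the real query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, dup_key TEXT, post_url TEXT)"
)
conn.executemany(
    "INSERT INTO posts (id, dup_key, post_url) VALUES (?, ?, ?)",
    [
        (1, "donald-t|new-jersey", "new-jersey-something-donald-t"),
        (2, "donald-t|new-jersey", "new-jersey-near-something-donald-t"),
        (3, "alice-b|ohio", "ohio-something-alice-b"),
    ],
)

# Query 1: build the list of IDs to delete. A row qualifies if it belongs
# to a duplicate group and its post_url does not contain "near".
conn.execute(
    """
    CREATE TEMPORARY TABLE temp_ids AS
    SELECT p.id
    FROM posts p
    JOIN (SELECT dup_key FROM posts
          GROUP BY dup_key HAVING COUNT(*) > 1) d ON d.dup_key = p.dup_key
    WHERE p.post_url NOT LIKE '%near%'
    """
)

# Query 2: delete those rows by primary key.
deleted = conn.execute(
    "DELETE FROM posts WHERE id IN (SELECT id FROM temp_ids)"
).rowcount
remaining_ids = [row[0] for row in conn.execute("SELECT id FROM posts ORDER BY id")]
print(deleted, remaining_ids)
```

Note that if no row in a duplicate group contains "near", this sketch deletes every row of that group, so it is worth inspecting temp_ids before running the second query.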
I have a table called sg with the following columns:
player_uuid, player_name, coins, kills, deaths, and wins
However, I ran into an issue that caused some duplicate rows, and some of those rows have since been modified. So I am wondering how to drop the rows with older data. That said...
How do I drop the duplicate rows where player_uuid is the same? I only want to drop the rows where coins, kills, deaths, or wins is smaller than in its duplicate.
Example data: http://i.stack.imgur.com/Xieod.png
In this case, I want to keep the row with 46 deaths and delete the row with 43 deaths.
Failing to come up with a single DELETE statement due to the way the data is structured, here are multiple statements instead:
The way it works: for each UUID, determine which row is to be kept (the max value of the given column), then join back on the table itself to determine which rows are not to be kept. Store those in a temporary table, and delete everything marked in that temporary table from the main data table (called someTable). The benefit of this approach: if you have more than one duplicate (3, 4, 5 rows and so on), they will also be deleted.
CREATE TEMPORARY TABLE tempTable AS
SELECT a.player_uuid, a.kills
FROM someTable a
LEFT JOIN (SELECT MAX(kills) AS kills, player_uuid, 1 AS keepRow
FROM someTable
GROUP BY player_uuid
) b ON a.player_uuid=b.player_uuid AND a.kills=b.kills
-- rows that do not match their player's MAX(kills) have keepRow = NULL: those are the stale rows
WHERE b.keepRow IS NULL;
DELETE a.* FROM someTable a, tempTable b
WHERE a.player_uuid=b.player_uuid AND a.kills=b.kills;
Repeat for the other columns (wins, coins, deaths) by replacing all kills with the other column names, dropping tempTable before each repeat.
Always test delete code first :)
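A minimal sketch of the mark-then-delete idea, repeated for each column, run from Python against in-memory SQLite as a stand-in for MySQL. The helper name delete_stale_rows and the sample rows are hypothetical. SQLite has no multi-table DELETE, so the second step uses a correlated EXISTS instead of the DELETE a.* ... join form.

```python
import sqlite3

COLUMNS = ("coins", "kills", "deaths", "wins")


def delete_stale_rows(conn, column):
    """Delete rows whose `column` is below the max for their player_uuid."""
    if column not in COLUMNS:  # column names cannot be bound as parameters
        raise ValueError(f"unexpected column: {column}")
    conn.execute("DROP TABLE IF EXISTS tempTable")
    # Step 1: mark rows that do not carry their player's maximum value.
    conn.execute(
        f"""
        CREATE TEMPORARY TABLE tempTable AS
        SELECT a.player_uuid, a.{column} AS val
        FROM someTable a
        LEFT JOIN (SELECT player_uuid, MAX({column}) AS max_val
                   FROM someTable GROUP BY player_uuid) b
          ON a.player_uuid = b.player_uuid AND a.{column} = b.max_val
        WHERE b.player_uuid IS NULL
        """
    )
    # Step 2: delete the marked rows from the main table.
    cur = conn.execute(
        f"""
        DELETE FROM someTable
        WHERE EXISTS (SELECT 1 FROM tempTable t
                      WHERE t.player_uuid = someTable.player_uuid
                        AND t.val = someTable.{column})
        """
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE someTable (player_uuid TEXT, player_name TEXT,"
    " coins INTEGER, kills INTEGER, deaths INTEGER, wins INTEGER)"
)
conn.executemany(
    "INSERT INTO someTable VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("u1", "ann", 100, 10, 43, 2),  # older copy
        ("u1", "ann", 100, 10, 46, 2),  # newer copy
        ("u2", "bob", 50, 5, 7, 1),     # no duplicate
    ],
)

removed = sum(delete_stale_rows(conn, col) for col in COLUMNS)
rows = conn.execute(
    "SELECT player_uuid, deaths FROM someTable ORDER BY player_uuid"
).fetchall()
print(removed, rows)
```

Because each column is handled separately, a player whose copies disagree in opposite directions (more kills in one row, more deaths in the other) would lose both rows, so checking the marked rows first is worthwhile.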
Also, while you are at it:
Add a unique index to prevent this from happening again:
CREATE UNIQUE INDEX idx_st_nn_1 ON someTable(player_uuid);
When you then try to insert a faulty record, your code will just get an error in return. The best code to handle inserts in that case would be:
INSERT INTO someTable(player_uuid,kills) VALUES ('someplayer',1000)
ON DUPLICATE KEY UPDATE kills=1000;
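To see how a unique index and an upsert interact, here is a small sketch run from Python against in-memory SQLite. SQLite does not have ON DUPLICATE KEY UPDATE; its closest equivalent, used here as a stand-in, is ON CONFLICT ... DO UPDATE, which needs SQLite 3.24 or newer. The helper name save_kills is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE someTable (player_uuid TEXT, kills INTEGER)")
conn.execute("CREATE UNIQUE INDEX idx_st_nn_1 ON someTable(player_uuid)")


def save_kills(conn, player_uuid, kills):
    """Insert the player, or update kills if the player already exists."""
    conn.execute(
        "INSERT INTO someTable (player_uuid, kills) VALUES (?, ?) "
        "ON CONFLICT(player_uuid) DO UPDATE SET kills = excluded.kills",
        (player_uuid, kills),
    )


save_kills(conn, "someplayer", 900)
save_kills(conn, "someplayer", 1000)  # same key: updates instead of duplicating

# A plain INSERT of an existing key is rejected by the unique index.
try:
    conn.execute(
        "INSERT INTO someTable (player_uuid, kills) VALUES ('someplayer', 1)"
    )
    plain_insert_failed = False
except sqlite3.IntegrityError:
    plain_insert_failed = True

rows = conn.execute("SELECT player_uuid, kills FROM someTable").fetchall()
print(plain_insert_failed, rows)
```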
What also helps is having a time indicator column; then only one delete would have to be executed:
ALTER TABLE someTable ADD COLUMN last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;
With these clauses the timestamp updates itself, so no code changes are required to use it.
So I know in MySQL it's possible to insert multiple rows in one query like so:
INSERT INTO table (col1,col2) VALUES (1,2),(3,4),(5,6)
I would like to delete multiple rows in a similar way. I know it's possible to delete multiple rows based on the exact same conditions for each row, i.e.
DELETE FROM table WHERE col1='4' and col2='5'
or
DELETE FROM table WHERE col1 IN (1,2,3,4,5)
However, what if I wanted to delete multiple rows in one query, with each row having a set of conditions unique to itself? Something like this would be what I am looking for:
DELETE FROM table WHERE (col1,col2) IN (1,2),(3,4),(5,6)
Does anyone know of a way to do this? Or is it not possible?
You were very close, you can use this:
DELETE FROM table WHERE (col1,col2) IN ((1,2),(3,4),(5,6))
Please see this fiddle.
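When the list of pairs comes from application code, the row-value IN statement can be generated with placeholders rather than by pasting values into the SQL. The sketch below only builds the statement and its parameter list; the function name build_pair_delete is made up, and the %s placeholder style is assumed to match the MySQL driver in use.

```python
def build_pair_delete(table, columns, rows):
    """Return (sql, params) for DELETE ... WHERE (c1, c2) IN ((%s, %s), ...).

    `table` and `columns` are interpolated directly, so they must come
    from trusted code, never from user input.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    if any(len(row) != len(columns) for row in rows):
        raise ValueError("every row needs one value per column")
    one_row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    placeholders = ", ".join([one_row] * len(rows))
    sql = (
        f"DELETE FROM {table} "
        f"WHERE ({', '.join(columns)}) IN ({placeholders})"
    )
    params = [value for row in rows for value in row]  # flattened values
    return sql, params


sql, params = build_pair_delete("tbl", ["col1", "col2"], [(1, 2), (3, 4), (5, 6)])
print(sql)
print(params)
```

The result would then be passed to the driver as cursor.execute(sql, params).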
A slight extension to the answer given, so, hopefully useful to the asker and anyone else looking.
You can also SELECT the values you want to delete. But watch out for error 1093 (You can't specify the target table for update in FROM clause); wrapping the SELECT in an extra derived table, aliased a below, avoids it.
DELETE FROM
orders_products_history
WHERE
(branchID, action) IN (
SELECT
branchID,
action
FROM
(
SELECT
branchID,
action
FROM
orders_products_history
GROUP BY
branchID,
action
HAVING
COUNT(*) > 10000
) a
);
I wanted to delete all history records where the number of history records for a single action/branch exceeds 10,000. And thanks to this question and chosen answer, I can.
Hope this is of use.
Richard.
It took a lot of googling, but here is what I do in Python for MySQL when I want to delete multiple items from a single table using a list of values.
# create an empty list
values = []
# keep appending the values you want to delete;
# each one must be a single-element tuple, not a bare string
values.append((your_variable,))
# once the list is loaded, run executemany
cursor.executemany("DELETE FROM YourTable WHERE ID = %s", values)
I am wondering if there is a way to do this through one query.
It seems that when I was initially populating my DB with dummy data (10k records to work with), somewhere in the mess of it all the script dumped an extra 1,044 rows that are duplicates. I determined this using
SELECT x.ID, x.firstname FROM info x
INNER JOIN (SELECT ID FROM info
GROUP BY ID HAVING count(id) > 1) d ON x.ID = d.ID
What I am trying to figure out is whether I can add another piece to this single query that will remove one of the matching dupes from each dupe found.
Also, I realize the ID column should have been set to auto-increment, but it wasn't.
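My favorite way of removing duplicates would be the following (note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this only works on older versions):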
ALTER IGNORE TABLE info ADD UNIQUE (ID);
To explain a bit further (for reference, take a look here)
UNIQUE - you are adding unique index to ID column.
IGNORE - is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.
The query that I use is generally something like
DELETE FROM table WHERE id IN (
SELECT id FROM (
SELECT MAX(id) AS id FROM table
GROUP BY DUPFIELD
HAVING COUNT(*) > 1) AS dupes)
You have to run this several times, since each run removes only one duplicated row per group, but it's fast.
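The most efficient way is to do it in the steps below:
Step 1: Copy one row per distinct value into a new table. (SELECT * combined with GROUP BY on a single column is generally rejected when the ONLY_FULL_GROUP_BY SQL mode is enabled, which is the default since MySQL 5.7.5.)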
CREATE TABLE new_table as
SELECT * FROM old_table WHERE 1 GROUP BY [column to remove duplicates by];
Step 2: Delete the old table. We no longer need the table with all the duplicate entries, so drop it!
DROP TABLE old_table;
Step 3: Rename new_table to the name of old_table.
RENAME TABLE new_table TO old_table;
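The three steps can be tried end to end with a sketch like the one below, run from Python against in-memory SQLite as a stand-in for MySQL. The id and email columns are assumed sample data. Two deliberate differences: the copy uses an explicit MIN(id) instead of SELECT * so that the surviving row is well defined, and SQLite renames with ALTER TABLE ... RENAME TO rather than RENAME TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_table (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO old_table (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "a@example.com"), (3, "b@example.com")],
)

# Step 1: copy one row per email (the lowest id) into a new table.
conn.execute(
    "CREATE TABLE new_table AS "
    "SELECT MIN(id) AS id, email FROM old_table GROUP BY email"
)
# Step 2: drop the table that still holds the duplicates.
conn.execute("DROP TABLE old_table")
# Step 3: give the new table the old name.
conn.execute("ALTER TABLE new_table RENAME TO old_table")

rows = conn.execute("SELECT id, email FROM old_table ORDER BY id").fetchall()
print(rows)
```

A table created from a SELECT does not inherit the original's indexes or keys, so those need to be recreated afterwards.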
I have read many articles about this, and I'd like to hear from you.
My problem is:
A table: ID (INT, unique, auto-increment), Title (varchar), Content (text), Keywords (varchar)
My PHP code always inserts new records, but it must not accept a duplicate record based on Title or Keywords. The title and keywords can't be the primary key. My PHP code needs to check for existing records and insert about 10-20 records at a time.
So, I check like this:
SELECT * FROM TABLE WHERE TITLE=XXX
And if it returns nothing, then I do the INSERT.
I read some other posts, and someone suggested:
INSERT IGNORE INTO Table values()
Another person suggested:
SELECT COUNT(ID) FROM TABLE
If it returns 0, then do the INSERT.
I don't know which of these queries is faster.
I have one more question: what is the difference between these queries, and which is faster?
SELECT COUNT(ID) FROM ..
SELECT COUNT(0) FROM ...
SELECT COUNT(1) FROM ...
SELECT COUNT(*) FROM ...
All of them give me the total number of records in the table, but does MySQL treat the number 0 or 1 as my ID field? Even with SELECT COUNT(1000) I still get the total number of records, although my table has only 4 columns.
I'm using MySQL Workbench. Does it have any option for testing query speed?
I would use the INSERT ... ON DUPLICATE KEY UPDATE command. One important comment in the documentation states: "...if there is a single multiple-column unique index on the table, then the update uses (seems to use) all columns (of the unique index) in the update query."
So if there is a UNIQUE(Title,Keywords) constraint on the table in the example, then, you would use:
INSERT INTO table (Title,Content,Keywords) VALUES ('blah_title','blah_content','blah_keywords')
ON DUPLICATE KEY UPDATE Content='blah_content';
It should work, and it is one query to the database.
SELECT COUNT(*) FROM ... is typically as fast as or faster than SELECT COUNT(ID) FROM .... Alternatively, skip the check and build something like this:
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=3;
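On the COUNT variants asked about in the question: COUNT(expr) counts the rows for which expr is not NULL, and a constant such as 0, 1 or 1000 is never NULL, so it is not a column position and simply counts every row. The sketch below shows this from Python on in-memory SQLite, where this standard SQL behaviour is the same as in MySQL. The table t and its rows are made up for illustration, and the example says nothing about speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (ID INTEGER PRIMARY KEY, Title TEXT, Keywords TEXT)"
)
conn.executemany(
    "INSERT INTO t (Title, Keywords) VALUES (?, ?)",
    [("a", "x"), ("b", None), ("c", "z")],  # one row has NULL Keywords
)

counts = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(0), COUNT(1000),"
    " COUNT(ID), COUNT(Keywords) FROM t"
).fetchone()
print(counts)  # only COUNT(Keywords) skips the NULL row
```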