I have a MySQL query that returns the distinct (and correct) set of rows:
select distinct page_id, display_id
from display_to_page;
Now I want to delete everything that isn't in that result set (i.e. remove the duplicates), but I'm a bit stuck.
I know I can do something like:
delete from display_to_page dp
(select distinct page_id, display_id
from display_to_page) dp2 ...
But I'm unsure how to complete the syntax there.
How can I structure a delete that will remove anything not in that result set?
If you only have two columns in the table, then the easiest way is probably truncate/reload:
create temporary table temp_pd as
select distinct page_id, display_id
from display_to_page;
truncate table display_to_page;
insert into display_to_page (page_id, display_id)
select page_id, display_id
from temp_pd;
Be sure to copy the table before trying this on your data!
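A minimal sketch of the truncate/reload steps, using Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL. The sample rows are made up, and because SQLite has no TRUNCATE, a DELETE without a WHERE clause empties the table instead:

```python
import sqlite3

# In-memory SQLite stands in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE display_to_page (page_id INTEGER, display_id INTEGER)")
conn.executemany(
    "INSERT INTO display_to_page VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 20), (2, 20), (2, 20), (3, 30)],  # made-up rows
)

# 1. Save one copy of each distinct pair.
conn.execute(
    "CREATE TEMP TABLE temp_pd AS "
    "SELECT DISTINCT page_id, display_id FROM display_to_page"
)
# 2. Empty the original table (SQLite's substitute for TRUNCATE).
conn.execute("DELETE FROM display_to_page")
# 3. Reload the distinct pairs.
conn.execute(
    "INSERT INTO display_to_page (page_id, display_id) "
    "SELECT page_id, display_id FROM temp_pd"
)
conn.commit()

rows = sorted(conn.execute("SELECT page_id, display_id FROM display_to_page"))
print(rows)  # [(1, 10), (2, 20), (3, 30)]
```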
Related
Hello SQL query experts!
I have one table called 'mytable' which has 2 columns, id and title.
I am trying to remove rows with duplicate titles, keeping only one row per title.
This is what I tried:
DELETE FROM `myTable` AS `m1`
WHERE `m1`.`id`
NOT IN (SELECT MIN(`b`.`id`) as `recordid` FROM `myTable` AS `b` GROUP BY `b`.`title`)
It fails with: Error in query (1064): Syntax error near '* FROM `myTable` AS `m1` WHERE `m1`.`id` NOT IN (SELECT MIN(`b`.`id`) as `reco' at line 1
I have been trying to solve this for more than 2 hours.
It seems like a very simple problem,
but I can't figure it out, so I am asking on Stack Overflow!
What puzzles me most is this:
the same query written as a SELECT runs without any error.
SELECT * FROM `myTable` AS `m1`
WHERE `m1`.`id`
NOT IN (SELECT MIN(`b`.`id`) as `recordid` FROM `myTable` AS `b` GROUP BY `b`.`title`)
When I run this query, I get exactly the list of rows I want to delete from the 'myTable' table.
Why does the DELETE fail when the equivalent SELECT returns the rows to delete?
I really need your help.
Thanks, everyone!
You can phrase this as:
delete m
from mytable m join
(select m2.title, min(m2.id) as min_id
from mytable m2
group by m2.title
) m2
on m.title = m2.title and m.id > m2.min_id;
For performance, you want an index on (title, id).
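SQLite does not support MySQL's multi-table DELETE ... JOIN syntax, so this sketch expresses the same condition (delete a row whose id is greater than the minimum id for its title) as a correlated subquery. It runs on an in-memory database via Python's sqlite3, with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, title) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b"), (6, "a")],  # made-up rows
)
# The (title, id) index suggested in the answer.
conn.execute("CREATE INDEX idx_title_id ON mytable (title, id)")

# Same rule as the join: a row goes if its id exceeds the MIN(id) of its title.
conn.execute(
    """
    DELETE FROM mytable
    WHERE id > (SELECT MIN(m2.id) FROM mytable AS m2 WHERE m2.title = mytable.title)
    """
)
remaining = conn.execute("SELECT id, title FROM mytable ORDER BY id").fetchall()
print(remaining)  # [(1, 'a'), (2, 'b'), (4, 'c')]
```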
I think Gordon's answer gives the gist. I recently had to do something similar and ended up with this (adapted to your situation):
DELETE FROM mytable WHERE id IN (
SELECT *
FROM (
SELECT m.id
FROM mytable m
WHERE m.id NOT IN (
SELECT MAX(sub.id)
FROM mytable sub
GROUP BY sub.title
HAVING COUNT(sub.title) > 1
)
AND m.id NOT IN (
SELECT MAX(sub2.id)
FROM mytable sub2
GROUP BY sub2.title
HAVING COUNT(sub2.title) = 1
)
) AS m
)
The extra wrapper was necessary (if I remember correctly) because MySQL does not let a subquery in a DELETE read directly from the table being deleted from; wrapped in a derived table like this, it works.
This removes every record whose title appears more than once, except the latest (max id) record for each title.
NOTE: this is a very intensive query. Indexes on id and title are recommended, and even then it is slow: on just 100k records, with indexes, it still takes about 10 seconds.
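For non-NULL titles, the two NOT IN lists above together cover the maximum id of every title group (duplicated or not), so a single NOT IN over MAX(id) ... GROUP BY title should select the same rows. A sketch of that simplification on made-up data in an in-memory SQLite database (Python's sqlite3); SQLite does not need the derived-table wrapper, but it accepts it, so the MySQL shape is kept:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, title) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b"), (6, "a")],  # made-up rows
)

# Keep the latest (highest id) row per title; the derived table "doomed"
# mirrors the wrapper that MySQL needs.
conn.execute(
    """
    DELETE FROM mytable WHERE id IN (
        SELECT id FROM (
            SELECT m.id FROM mytable AS m
            WHERE m.id NOT IN (SELECT MAX(sub.id) FROM mytable AS sub GROUP BY sub.title)
        ) AS doomed
    )
    """
)
remaining = conn.execute("SELECT id, title FROM mytable ORDER BY id").fetchall()
print(remaining)  # [(4, 'c'), (5, 'b'), (6, 'a')]
```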
The syntax:
DELETE FROM `myTable` AS `m1`
is wrong.
It should be:
DELETE m1 FROM `myTable` AS `m1`
but you don't need to alias the table, you can just do
DELETE FROM `myTable`
Also, MySQL does not allow direct use of the target table inside a subquery like the one you use with NOT IN, but you can work around this limitation by enclosing the subquery inside another one:
DELETE FROM `myTable`
WHERE `id` NOT IN (
SELECT `recordid`
FROM (
SELECT MIN(`id`) as `recordid`
FROM `myTable`
GROUP BY `title`
) t
)
I removed the aliases of the nested subquery because they are not needed.
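A sketch of the preview-then-delete workflow with the wrapped subquery, run against an in-memory SQLite database from Python's sqlite3. SQLite does not enforce MySQL's target-table restriction, so this only shows that the wrapped form selects the right rows, not the MySQL error itself; the data is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO myTable (id, title) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b"), (6, "a")],  # made-up rows
)

# Same shape as the answer: derived table t holds MIN(id) per title,
# i.e. the ids to keep.
KEEP = "SELECT recordid FROM (SELECT MIN(id) AS recordid FROM myTable GROUP BY title) t"

# Preview the rows that would be deleted, like the SELECT in the question.
preview = [r[0] for r in conn.execute(f"SELECT id FROM myTable WHERE id NOT IN ({KEEP}) ORDER BY id")]
print(preview)  # [3, 5, 6]

# Then delete them.
conn.execute(f"DELETE FROM myTable WHERE id NOT IN ({KEEP})")
remaining = conn.execute("SELECT id, title FROM myTable ORDER BY id").fetchall()
print(remaining)  # [(1, 'a'), (2, 'b'), (4, 'c')]
```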
I finally found the exact cause of the issue.
I followed the comment by #Malakiyasanjay,
which you can find here: How to keep only one row of a table, removing duplicate rows?
I tried this (it worked for me as well, but it took a long time to run on 30,000 rows):
delete from myTable
where id not in
(select min(id) as min from (select * from myTable) as x group by title)
The problem was that I couldn't use 'myTable' both as the target of the DELETE and inside the subquery, so I used (select * from myTable) as x, and that fixed it.
I am sorry I can't explain in more detail, because I am not familiar with MySQL queries. But you should note that:
MySQL does not allow the direct use of the target table inside a subquery like the one you use with NOT IN, but you can overcome this limitation by enclosing the subquery inside another one.
(Please refer to #forpas's answer.)
But note that this takes a very long time and might cause a timeout error. I ran this query on a table with about 600,000 rows and it didn't respond for several days, so I conclude this approach only suits small tables.
I hope this is helpful for everyone! :)
I want to remove duplicates based on the combination of listings.product_id and listings.channel_listing_id.
This simple query returns 400,000 rows (the ids of the rows I want to keep):
SELECT id
FROM `listings`
WHERE is_verified = 0
GROUP BY product_id, channel_listing_id
While this variation returns 1,600,000 rows, which is every record in the table, not only those with is_verified = 0:
SELECT *
FROM (
SELECT id
FROM `listings`
WHERE is_verified = 0
GROUP BY product_id, channel_listing_id
) AS keepem
I'd expect them to return the same number of rows.
What's the reason for this? How can I avoid it (so that I can use the subselect in the WHERE condition of the DELETE statement)?
EDIT: I found that doing a SELECT DISTINCT in the outer SELECT "fixes" it (it returns 400,000 records, as it should). I'm still not sure I should trust this subquery, since there is no DISTINCT in the DELETE statement.
EDIT 2: It seems to be just a bug in the way phpMyAdmin reports the total row count.
Your query as it stands is ambiguous. Suppose you have two listings with the same product_id and channel_listing_id. Then which id is supposed to be returned? The first, the second? Or both, ignoring the GROUP BY?
What if there is more than one id with different product and channel ids?
Try removing the ambiguity by selecting MAX(id) AS id and adding DISTINCT.
Are there any foreign keys to worry about? If not, you could pour the original table into a copy, empty the original and copy back in it the non-duplicates only. Messier, but you only do SELECTs or DELETEs guaranteed to succeed, and you also get to keep a backup.
Assign aliases in order to avoid field reference ambiguity:
SELECT
keepem.*
FROM
(
SELECT
innerStat.id
FROM
`listings` AS innerStat
WHERE
innerStat.is_verified = 0
GROUP BY
innerStat.product_id,
innerStat.channel_listing_id
) AS keepem
I wanted to know if there is an easy way to remove duplicates from an SQL table,
rather than fetching the whole table and deleting the rows that appear twice.
Thank you in advance.
This is my structure:
CREATE TABLE IF NOT EXISTS `mups` (
`idgroupe` varchar(15) NOT NULL,
`fan` bigint(20) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
If you are using SQL Server:
Check this: SQL SERVER – 2005 – 2008 – Delete Duplicate Rows
Sample Code using CTE:
/* Delete Duplicate records */
WITH CTE (Col1, Col2, DuplicateCount)
AS
(
SELECT Col1, Col2,
ROW_NUMBER() OVER(PARTITION BY Col1, Col2 ORDER BY Col1) AS DuplicateCount
FROM DuplicateRecordTable
)
DELETE
FROM CTE
WHERE DuplicateCount > 1
GO
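SQL Server can DELETE through a CTE, but SQLite cannot. A sketch of the same ROW_NUMBER() idea in SQLite (via Python's sqlite3; window functions need SQLite 3.25 or later) that deletes by rowid instead; on MySQL 8.0 you would select a primary-key column in place of rowid. The sample rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DuplicateRecordTable (Col1 TEXT, Col2 TEXT)")
conn.executemany(
    "INSERT INTO DuplicateRecordTable VALUES (?, ?)",
    [("x", "1"), ("x", "1"), ("y", "2"), ("x", "1"), ("y", "3")],  # made-up rows
)

# Number the rows inside each (Col1, Col2) group and delete every row
# numbered 2 or higher; rowid identifies the physical row.
conn.execute(
    """
    DELETE FROM DuplicateRecordTable
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY Col1, Col2 ORDER BY rowid) AS DuplicateCount
            FROM DuplicateRecordTable
        )
        WHERE DuplicateCount > 1
    )
    """
)
remaining = sorted(conn.execute("SELECT Col1, Col2 FROM DuplicateRecordTable"))
print(remaining)  # [('x', '1'), ('y', '2'), ('y', '3')]
```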
Add a calculated column that holds a checksum of the entire row. Search for duplicate checksums, rank the rows that share one, and remove all but the first.
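A sketch of the checksum idea with one substitution: instead of a calculated column, the checksum (SHA-256 of the row's values, via Python's hashlib) is computed client-side, and the first row per checksum is kept. It uses the mups structure from the question in an in-memory SQLite database, with made-up rows:

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mups (idgroupe VARCHAR(15) NOT NULL, fan BIGINT)")
conn.executemany(
    "INSERT INTO mups VALUES (?, ?)",
    [("g1", 100), ("g1", 100), ("g2", 100), ("g1", 200), ("g2", 100)],  # made-up rows
)

seen = set()
to_delete = []
rows = conn.execute("SELECT rowid, idgroupe, fan FROM mups ORDER BY rowid").fetchall()
for rowid, *values in rows:
    # Checksum of the whole row (every column except the rowid).
    digest = hashlib.sha256(repr(tuple(values)).encode("utf-8")).hexdigest()
    if digest in seen:
        to_delete.append((rowid,))  # a later copy of a row already seen
    else:
        seen.add(digest)  # first occurrence: keep it

conn.executemany("DELETE FROM mups WHERE rowid = ?", to_delete)
remaining = sorted(conn.execute("SELECT idgroupe, fan FROM mups"))
print(remaining)  # [('g1', 100), ('g1', 200), ('g2', 100)]
```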
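You can do something like this:
DELETE from yourTable WHERE tableID in
(SELECT tableID FROM
(SELECT clone.tableID
from yourTable origine,
yourTable clone
where clone.tableID > origine.tableID
and clone.someField = origine.someField -- someField is a placeholder for each column that defines a duplicate
) AS dup)
In the WHERE clause you decide how to find your doubles: the id comparison keeps the row with the lowest id, and you compare each of the other fields that must match (joining on tableID alone would match every row with itself and delete everything).
The extra derived table (dup) is there because MySQL does not let the subquery read the table being deleted from directly.
Note: this solution has the advantage of letting you choose what counts as a duplicate (for example, if the PK changes).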
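A sketch of the self-join on made-up data in an in-memory SQLite database (Python's sqlite3). The name and city columns are hypothetical stand-ins for whatever fields define a duplicate; a clone must have a higher id and equal values in those fields:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# name and city are hypothetical columns that define a duplicate.
conn.execute("CREATE TABLE yourTable (tableID INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [(1, "ann", "rome"), (2, "bob", "oslo"), (3, "ann", "rome"),
     (4, "ann", "oslo"), (5, "bob", "oslo")],  # made-up rows
)

# A clone is a row with a higher id whose other fields match an earlier row,
# so the lowest id of each duplicate set survives.
conn.execute(
    """
    DELETE FROM yourTable WHERE tableID IN (
        SELECT clone.tableID
        FROM yourTable AS origine, yourTable AS clone
        WHERE clone.tableID > origine.tableID
          AND clone.name = origine.name
          AND clone.city = origine.city
    )
    """
)
remaining = conn.execute("SELECT * FROM yourTable ORDER BY tableID").fetchall()
print(remaining)  # [(1, 'ann', 'rome'), (2, 'bob', 'oslo'), (4, 'ann', 'oslo')]
```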
You can find the duplicates by joining the table to itself, doing a group by the fields you are looking for duplicates in, and a having clause where count is greater than one.
Let's say your table name is customers, and you're looking for duplicate name fields.
select cust_out.name, count(cust_count.name)
from customers cust_out
inner join customers cust_count on cust_out.name = cust_count.name
group by cust_out.name
having count(cust_count.name) > 1
If you used this in a delete statement, you would delete all the duplicate records, when you probably intend to keep one of them.
So, to select the records to delete:
select cust_dup.id
from customers cust
inner join customers cust_dup on cust.name = cust_dup.name and cust_dup.id > cust.id
group by cust_dup.id
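A sketch of both queries on made-up rows in an in-memory SQLite database (via Python's sqlite3), followed by deleting the selected ids. One thing it makes visible: because the first query counts self-join pairs, a name stored n times reports a count of n * n rather than n, although the HAVING filter still finds the right names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "ann"), (2, "bob"), (3, "ann"), (4, "ann"), (5, "cy")],  # made-up rows
)

# Duplicate finder: the self-join pairs every row with every row of the
# same name, so 'ann' (3 rows) reports 3 * 3 = 9.
dupes = conn.execute(
    """
    SELECT cust_out.name, COUNT(cust_count.name)
    FROM customers cust_out
    INNER JOIN customers cust_count ON cust_out.name = cust_count.name
    GROUP BY cust_out.name
    HAVING COUNT(cust_count.name) > 1
    """
).fetchall()
print(dupes)  # [('ann', 9)]

# Ids to delete: rows that have an earlier row with the same name.
doomed = sorted(r[0] for r in conn.execute(
    """
    SELECT cust_dup.id
    FROM customers cust
    INNER JOIN customers cust_dup ON cust.name = cust_dup.name AND cust_dup.id > cust.id
    GROUP BY cust_dup.id
    """
))
print(doomed)  # [3, 4]

conn.executemany("DELETE FROM customers WHERE id = ?", [(i,) for i in doomed])
remaining = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
print(remaining)  # [(1, 'ann'), (2, 'bob'), (5, 'cy')]
```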
I read all the relevant duplicate questions/answers and found this to be the most relevant answer:
INSERT IGNORE INTO temp(MAILING_ID,REPORT_ID)
SELECT DISTINCT MAILING_ID,REPORT_ID FROM table_1
;
The problem is that I want to remove duplicates by the two columns (MAILING_ID and REPORT_ID), but I also want to include all the other fields of table_1 in the insert.
I tried adding all the relevant columns this way:
INSERT IGNORE INTO temp(M_ID,MAILING_ID,REPORT_ID,
MAILING_NAME,VISIBILITY,EXPORTED) SELECT DISTINCT
M_ID,MAILING_ID,REPORT_ID,MAILING_NAME,VISIBILITY,
EXPORTED FROM table_1
;
Table structure: M_ID (int, primary), MAILING_ID (int), REPORT_ID (int),
MAILING_NAME (varchar), VISIBILITY (varchar), EXPORTED (int)
But it inserted all rows into temp (including the duplicates).
The best way to delete duplicate rows by multiple columns is the simplest one:
Add an UNIQUE index:
ALTER IGNORE TABLE your_table ADD UNIQUE (field1,field2,field3);
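The IGNORE above makes sure that only the first row found is kept and the rest are discarded. (Note that the IGNORE clause of ALTER TABLE was deprecated in MySQL 5.6.17 and removed in 5.7.4, so this only works on older servers.)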
(You can then drop that index if you need future duplicates and/or know they won't happen again).
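On servers without ALTER IGNORE TABLE, the same idea can be applied by copying into a table that already has the unique key. This also suggests why the INSERT IGNORE in the question kept everything: with the unique M_ID in the select list every row is DISTINCT, so presumably nothing ever collided with a key on temp. A sketch in SQLite via Python's sqlite3, where INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE; VISIBILITY and EXPORTED are left out for brevity, the copy is named table_1_dedup (a hypothetical name), and the rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_1 (M_ID INTEGER PRIMARY KEY, MAILING_ID INTEGER, "
    "REPORT_ID INTEGER, MAILING_NAME TEXT)"
)
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?, ?)",
    [(1, 10, 100, "a"), (2, 10, 100, "a-copy"), (3, 11, 100, "b"),
     (4, 10, 101, "c"), (5, 11, 100, "b-copy")],  # made-up rows
)

# The copy carries the unique key on the two columns up front ...
conn.execute(
    "CREATE TABLE table_1_dedup (M_ID INTEGER PRIMARY KEY, MAILING_ID INTEGER, "
    "REPORT_ID INTEGER, MAILING_NAME TEXT, UNIQUE (MAILING_ID, REPORT_ID))"
)
# ... so INSERT OR IGNORE skips every later row that repeats a pair,
# while all the other columns come along with the first one.
conn.execute(
    "INSERT OR IGNORE INTO table_1_dedup "
    "SELECT M_ID, MAILING_ID, REPORT_ID, MAILING_NAME FROM table_1 ORDER BY M_ID"
)
kept = conn.execute("SELECT M_ID, MAILING_NAME FROM table_1_dedup ORDER BY M_ID").fetchall()
print(kept)  # [(1, 'a'), (3, 'b'), (4, 'c')]
```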
This works in any version of MySQL, including 5.7+. It also avoids the error You can't specify target table 'my_table' for update in FROM clause by using a double-nested subquery. Each run deletes only ONE duplicate row per group (the one with the highest id), so if you have 3 or more copies, run the query multiple times. It never deletes unique rows.
DELETE FROM my_table
WHERE id IN (
SELECT calc_id FROM (
SELECT MAX(id) AS calc_id
FROM my_table
GROUP BY identField1, identField2
HAVING COUNT(id) > 1
) temp
)
I needed this query because I wanted to add a UNIQUE index on two columns but there were some duplicate rows that I needed to discard first.
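Since each run removes at most one row per duplicate group, a group with k copies needs k - 1 runs. A sketch that simply repeats the statement until it deletes nothing, using Python's sqlite3 with an in-memory database and made-up rows (the derived-table alias is renamed from temp to dup here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, identField1 TEXT, identField2 TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "x"), (3, "a", "x"),
     (4, "b", "y"), (5, "b", "y"), (6, "c", "z")],  # made-up rows
)

# The answer's statement: deletes the highest id of each duplicated group.
DELETE_ONE_PER_GROUP = """
    DELETE FROM my_table
    WHERE id IN (
        SELECT calc_id FROM (
            SELECT MAX(id) AS calc_id
            FROM my_table
            GROUP BY identField1, identField2
            HAVING COUNT(id) > 1
        ) dup
    )
"""

passes = 0
while True:
    deleted = conn.execute(DELETE_ONE_PER_GROUP).rowcount
    passes += 1
    print(f"pass {passes}: deleted {deleted} row(s)")
    if deleted == 0:  # nothing left to remove
        break

remaining = [r[0] for r in conn.execute("SELECT id FROM my_table ORDER BY id")]
print(remaining)  # [1, 4, 6]
```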
For MySQL:
DELETE t1 FROM yourtable t1
INNER JOIN yourtable t2 WHERE t1.id < t2.id
AND t1.identField1 = t2.identField1
AND t1.identField2 = t2.identField2;
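SQLite has no multi-table DELETE, so this sketch expresses the same rule with EXISTS: a row is deleted when another row with the same identField1/identField2 has a higher id, which means the highest id of each group survives in a single pass. Made-up rows, in-memory SQLite via Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER PRIMARY KEY, identField1 TEXT, identField2 TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "x"), (3, "a", "x"),
     (4, "b", "y"), (5, "b", "y"), (6, "c", "z")],  # made-up rows
)

# Same condition as "t1.id < t2.id AND fields equal": delete t1 rows that
# have a later twin.
conn.execute(
    """
    DELETE FROM yourtable
    WHERE EXISTS (
        SELECT 1 FROM yourtable AS t2
        WHERE t2.id > yourtable.id
          AND t2.identField1 = yourtable.identField1
          AND t2.identField2 = yourtable.identField2
    )
    """
)
remaining = [r[0] for r in conn.execute("SELECT id FROM yourtable ORDER BY id")]
print(remaining)  # [3, 5, 6]
```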
You will first need to find your duplicates by grouping on the two fields with a having clause.
Select identField1, identField2, count(*) FROM yourTable
GROUP BY identField1, identField2
HAVING count(*) >1
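If this returns what you want, you can then use it as a subquery. IN needs the subquery to return the same columns you compare, and MySQL needs an extra derived table around it. Note that this deletes every copy of each duplicated pair, including the one you probably want to keep:
DELETE FROM yourTable WHERE (identField1, identField2) IN (SELECT identField1, identField2 FROM (Select identField1, identField2 FROM yourTable
GROUP BY identField1, identField2
HAVING count(*) >1) AS dup)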
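A sketch of what a valid version of this statement does, comparing the column pair against a two-column subquery (row values need SQLite 3.15 or later; in-memory database via Python's sqlite3, made-up rows). It makes the catch concrete: every copy of each duplicated pair disappears, not just the extras:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER PRIMARY KEY, identField1 TEXT, identField2 TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "x"), (3, "a", "x"),
     (4, "b", "y"), (5, "b", "y"), (6, "c", "z")],  # made-up rows
)

# Pair-vs-pair IN: both sides have two columns, so the statement is valid.
conn.execute(
    """
    DELETE FROM yourTable
    WHERE (identField1, identField2) IN (
        SELECT identField1, identField2 FROM yourTable
        GROUP BY identField1, identField2
        HAVING COUNT(*) > 1
    )
    """
)
remaining = [r[0] for r in conn.execute("SELECT id FROM yourTable ORDER BY id")]
print(remaining)  # [6]: only the pair that was never duplicated is left
```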
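You can always get the primary ids by grouping on the two columns that should be unique (col_a and col_b stand in for them, and my_table for your table):
select count(*), max(id) as id from my_table group by col_a, col_b having count(*) > 1;
and then
delete from my_table where id in (select id from (select max(id) as id from my_table group by col_a, col_b having count(*) > 1) as t);
Using max() (or min()) picks one id per duplicate group, so each run deletes one extra copy per group; you can also add limit N to cap how many rows a single run deletes.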
NOTE: This is an alternative, old-school solution.
If you couldn't achieve what you wanted, you can try my "old-school" method:
First, run this query to get the duplicate records:
select column1,
column2,
count(*)
from table
group by column1,
column2
having count(*) > 1
order by count(*) desc
After that, select those results and paste them into Notepad++:
Now, using Notepad++'s find-and-replace feature, replace them with first "delete" and then "insert" queries like this (from now on, for security reasons, my values will be AAAA).
Special note: add an extra new line after the last line of your data in Notepad++, because the regex matches the '\r\n' at the end of each line:
Find what regex: \D*(\d+)\D*(\d+)\D*\r\n
Replace with string: delete from table where column1 = $1 and column2 = $2; insert into table set column1 = $1, column2 = $2;\r\n
Finally, paste those queries into MySQL Workbench's query console and execute them. You will see only one occurrence of each duplicate record.
This answer is for a relation table made of just two columns, without an id. I think you can apply it to your situation.
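The Notepad++ step can be reproduced with Python's re module, which is handy for checking the pattern before running anything. Two assumptions: the pasted text is made up, and each line holds only the two numeric key values (with the count column still on the line, the pattern would pick up the second and third numbers instead). Python writes the group references as \1 and \2 where Notepad++ uses $1 and $2:

```python
import re

# Hypothetical text pasted from the result grid: two numeric key values per
# line, tab-separated, with a trailing CRLF after the last line too.
pasted = "5\t7\r\n12\t40\r\n"

pattern = r"\D*(\d+)\D*(\d+)\D*\r\n"
replacement = (
    r"delete from table where column1 = \1 and column2 = \2; "
    r"insert into table set column1 = \1, column2 = \2;\r\n"
)

# re.sub expands \1 and \2 and turns the \r\n escapes in the template into CRLF.
queries = re.sub(pattern, replacement, pasted)
print(queries)
```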
In a large data set, if you are selecting multiple columns in the select clause, e.g.:
select x,y,z from table1.
and the requirement is to remove duplicates based on two columns (from the example above, y and z),
then you can use the query below instead of a combination of "group by" and a subquery, which performs poorly:
select x,y,z
from (
select x,y,z , row_number() over (partition by y,z) as index_num
from table1) main
where main.index_num=1
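A sketch of that query in SQLite (3.25 or later, via Python's sqlite3) on made-up rows. An ORDER BY x is added inside OVER(...) as an assumption so the kept row (here the smallest x) is deterministic; without it, which row of each group gets number 1 is unspecified:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (x INTEGER, y TEXT, z TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "a", "p"), (2, "a", "p"), (3, "b", "q"), (4, "a", "q"), (5, "b", "q")],  # made-up rows
)

# Number rows within each (y, z) group and keep only the first of each.
rows = conn.execute(
    """
    SELECT x, y, z
    FROM (
        SELECT x, y, z, ROW_NUMBER() OVER (PARTITION BY y, z ORDER BY x) AS index_num
        FROM table1
    ) main
    WHERE main.index_num = 1
    ORDER BY x
    """
).fetchall()
print(rows)  # [(1, 'a', 'p'), (3, 'b', 'q'), (4, 'a', 'q')]
```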
I was wondering if there is a way to select all columns except some of them,
something like SELECT */column1,column2. Is there a way to do this?
I just need to output something like:
column1, column2 (from another table), then all the other columns except column1 (or something that makes the select skip the first few columns)
EDIT:
The thing is that I need this to be dynamic, so I can't just list columns I don't know. I never know how many columns there will be; I only know the 1st and the 2nd column.
EDIT: here is a picture http://oi44.tinypic.com/xgdyiq.jpg
I don't need the second id column , just the last column like i have pointed.
Start building custom views, which are geared around saving developers time and insulating them from the database schema.
Oh, so select all but certain fields. You have two options.
One is a little slow: copy the table, drop the fields you don't want, then SELECT *.
The other is to build the field list from a subquery against information_schema (or similar), then remove occurrences of 'field_i_dont_want' from that list.
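A sketch of the second option. In MySQL the column list would come from information_schema.COLUMNS (COLUMN_NAME, ordered by ORDINAL_POSITION); SQLite has no information_schema, so this version reads PRAGMA table_info instead. The table t, its columns, and the excluded column id2 are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, id2 INTEGER, name TEXT, score REAL)")
conn.execute("INSERT INTO t VALUES (1, 1, 'a', 2.5)")


def select_all_except(conn, table, excluded):
    """Build and run a SELECT listing every column of `table` except `excluded`."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    kept = [c for c in columns if c not in excluded]
    # Quote identifiers, doubling any embedded double quotes.
    column_list = ", ".join('"' + c.replace('"', '""') + '"' for c in kept)
    sql = f'SELECT {column_list} FROM "{table}"'
    return sql, conn.execute(sql).fetchall()


sql, rows = select_all_except(conn, "t", {"id2"})
print(sql)   # SELECT "id", "name", "score" FROM "t"
print(rows)  # [(1, 'a', 2.5)]
```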
SELECT ( SELECT THE TABLES YOU WANT AND CONCAT INTO ONE STRING ) FROM TABLE
If you need to combine records from multiple tables, you need a way to relate them: primary keys, foreign keys, or anything they have in common.
I will try to explain this with SQL similar to your problem.
SELECT table1.id, table2.name, table1.column3, table1.column4
FROM table1
INNER JOIN table2 On table2.commonfield = table1.commonfield
If you have 'n' columns in your table as in Col1,Col2,Col3....Coln you can select whatever columns you want to select from the table.
SELECT Col1,Col2 FROM YOURTABLE;
You either select all columns (*) or specify the columns you want one by one. There is no way to select 'all but some'.
The SQL language lets you either select a wildcard set of columns or enumerated single columns from a singular table. However you can join a secondary table and get a wildcard there.
SELECT
a.col1,
b.*
FROM
table_a as a
JOIN table_b as b ON (a.col5 = b.col_1)