I have a table with columns like this:
id timestamp content
where ID is a string, and timestamp is DEFAULT CURRENT_TIMESTAMP.
id and timestamp together make a composite key, so you can select the newest row with something like:
select * from table where id = 'text-here' order by timestamp desc limit 1
I now have a problem where I want to delete all but the newest entry for each id, but I have no idea how to do this. If it had an auto-incrementing primary key I could use a sub-query to select the ones to keep and use NOT IN, as is demonstrated on numerous questions here, but I don't know how to do this with a composite key.
It is possible without a subquery too:
DELETE t
FROM t
JOIN t AS t2 ON t.timestamp < t2.timestamp AND t.id = t2.id;
http://sqlfiddle.com/#!9/1ff88/1
The following query:
DELETE mytable
FROM mytable
INNER JOIN (SELECT id, MAX(`timestamp`) AS `timestamp`
FROM mytable
GROUP BY id) AS t
ON mytable.id = t.id AND mytable.`timestamp` < t.`timestamp`
deletes all but the newest record per id from mytable.
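To sanity-check the "delete every row that has a newer sibling" condition on a toy table, here is a minimal sketch using Python's built-in sqlite3 module. It is not MySQL: SQLite has no DELETE ... JOIN, so the same condition is written as a correlated EXISTS, and the sample rows are made up. In MySQL itself the join forms above are the ones to use, since MySQL may refuse a subquery that reads the table being deleted from.

```python
import sqlite3

def keep_newest_per_id(rows):
    """Load (id, timestamp, content) rows, delete all but the newest row
    per id, and return the surviving rows ordered by id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable ("
        " id TEXT, timestamp TEXT, content TEXT,"
        " PRIMARY KEY (id, timestamp))"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    # Same condition as the join answers: a row is deleted when another row
    # with the same id has a later timestamp.
    conn.execute(
        """
        DELETE FROM mytable
        WHERE EXISTS (
            SELECT 1 FROM mytable AS newer
            WHERE newer.id = mytable.id
              AND newer.timestamp > mytable.timestamp
        )
        """
    )
    survivors = conn.execute(
        "SELECT id, timestamp, content FROM mytable ORDER BY id"
    ).fetchall()
    conn.close()
    return survivors

# Hypothetical sample data; ISO-formatted timestamps compare correctly as text.
sample_rows = [
    ("a", "2024-01-01 10:00:00", "old"),
    ("a", "2024-01-02 10:00:00", "new"),
    ("b", "2024-01-01 09:00:00", "only"),
]
```

Calling keep_newest_per_id(sample_rows) leaves one row per id, the one with the latest timestamp.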
Hi I have the following example table
ID  StartDate
1   05/12/2007
1   31/05/2010
I need it so that there is only one row per ID with the earliest start date as follows:
ID  StartDate
1   05/12/2007
Is there a way to do this in MySQL?
Many Thanks
Yes. Group the rows by ID and take the earliest date in each group; any ordering is applied to the grouped result.
The query may be:
SELECT `ID`, MIN(`StartDate`) AS `StartDate` FROM `TABLE` GROUP BY `ID` ORDER BY `StartDate`;
The grouping happens first, then the ordering.
There are a couple of options - e.g. null left joins - but the simplest approach is probably something like this (and I'm assuming a unique column of 'UID'):
select t1.id, t1.startdate
from myTable as t1
join myTable as t2
on t2.uid = (select uid from myTable where id = t1.id order by startdate limit 1)
where t1.uid = t2.uid;
Simply:
SELECT ID, MIN(StartDate)
FROM your_table_name
GROUP BY ID
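A minimal sketch of the MIN/GROUP BY approach, run against an in-memory SQLite table through Python's sqlite3 module rather than MySQL. The sample rows are hypothetical, and the dates are stored as ISO text (YYYY-MM-DD) as an assumption: MIN only picks the earliest date if the column is a real date type or sorts chronologically as text, which the DD/MM/YYYY strings shown in the question would not.

```python
import sqlite3

def earliest_start_per_id(rows):
    """Return one (ID, earliest StartDate) pair per ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table_name (ID INTEGER, StartDate TEXT)")
    conn.executemany("INSERT INTO your_table_name VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT ID, MIN(StartDate) AS StartDate
        FROM your_table_name
        GROUP BY ID
        ORDER BY ID
        """
    ).fetchall()
    conn.close()
    return result

# Hypothetical sample data in ISO format so text comparison is chronological.
sample_dates = [
    (1, "2007-12-05"),
    (1, "2010-05-31"),
    (2, "2012-01-15"),
]
```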
I tried to remove duplicate rows from a table TT
here is my query
delete t1
from TT t1
, TT t2
where t1.id < t2.id
and t1.url = t2.url
Here id is the primary key and url has a unique key in table TT. You may be wondering why there are duplicate rows despite the unique index.
It did happen, and I don't know why, but for now I just want to remove the duplicate rows. The query runs in phpMyAdmin, but no rows are deleted at all (and there are duplicate rows in table TT).
What could be the reason? Thanks!
You can use ROW_NUMBER() to remove duplicates. Note that deleting through a CTE like this is SQL Server syntax:
;WITH cte AS (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY url ORDER BY url) AS rn
FROM TT
)
DELETE FROM cte
WHERE rn > 1
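For engines that cannot delete through a CTE, the same row-numbering idea can be used to collect the ids to remove. The sketch below assumes Python's sqlite3 module linked against SQLite 3.25 or later (needed for window functions), uses made-up rows, and orders each partition by id so that the kept row is deterministic (the lowest id per url). That ordering is my assumption, not part of the answer above.

```python
import sqlite3

def delete_duplicate_urls(rows):
    """Keep the lowest-id row for each url and return the surviving rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TT (id INTEGER PRIMARY KEY, url TEXT)")
    conn.executemany("INSERT INTO TT VALUES (?, ?)", rows)
    # Number the rows within each url, then delete everything past the first.
    conn.execute(
        """
        DELETE FROM TT
        WHERE id IN (
            SELECT id FROM (
                SELECT id,
                       ROW_NUMBER() OVER (PARTITION BY url ORDER BY id) AS rn
                FROM TT
            ) AS numbered
            WHERE rn > 1
        )
        """
    )
    survivors = conn.execute("SELECT id, url FROM TT ORDER BY id").fetchall()
    conn.close()
    return survivors

# Hypothetical sample data with three copies of one url.
sample_urls = [
    (1, "http://a.example"),
    (2, "http://a.example"),
    (3, "http://b.example"),
    (4, "http://a.example"),
]
```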
I am updating my table, setting a field named "status" when the number of distinct years for an id is between 10 and 13. The query is as follows:
update myTable set status='Established'
where id IN(select id, count(*) as c
from myTable
where year>=1996 and year<=2008
group by id
having count(distinct year)>=10 and count(distinct year)<=13)
The problem is that I'm getting error 1241, "Operand should contain 1 column(s)". Could you please advise how I can solve this? Thanks!
The subquery used with IN must return only one column:
update myTable set status='Established'
where id IN(select id
from myTable
where year>=1996 and year<=2008
group by id
having count(distinct year)>=10 and count(distinct year)<=13)
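As a quick check of the single-column rule, here is a sketch that runs the corrected UPDATE against an in-memory SQLite table via Python's sqlite3 module. The rows are hypothetical: id 'x' has 10 distinct years in range and id 'y' has 3. One caveat to keep in mind: SQLite accepts a subquery on the table being updated, whereas MySQL may reject it with error 1093, which is one more reason to prefer a join form there.

```python
import sqlite3

def mark_established(year_rows):
    """Set status for ids with 10 to 13 distinct years between 1996 and 2008
    and return the distinct (id, status) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (id TEXT, year INTEGER, status TEXT)")
    conn.executemany("INSERT INTO myTable (id, year) VALUES (?, ?)", year_rows)
    # The IN subquery selects exactly one column (id); the count is only
    # used inside HAVING, so it does not need to be selected.
    conn.execute(
        """
        UPDATE myTable SET status = 'Established'
        WHERE id IN (
            SELECT id FROM myTable
            WHERE year >= 1996 AND year <= 2008
            GROUP BY id
            HAVING COUNT(DISTINCT year) >= 10 AND COUNT(DISTINCT year) <= 13
        )
        """
    )
    result = conn.execute(
        "SELECT DISTINCT id, status FROM myTable ORDER BY id"
    ).fetchall()
    conn.close()
    return result

# Hypothetical sample data: 'x' covers 1996..2005, 'y' only three years.
sample_years = [("x", y) for y in range(1996, 2006)] + [
    ("y", 2000), ("y", 2001), ("y", 2002),
]
```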
In MySQL, an update with a join often performs better than an update with a subquery in the where clause.
This version might have better performance:
update myTable join
(select id, count(*) as c
from myTable
where year >= 1996 and year <= 2008
group by id
having count(distinct year) >= 10 and count(distinct year) <= 13
) filter
on myTable.id = filter.id
set status = 'Established';
I will also note that you have a table where a column called id is not unique among the rows. Typically, such a column would be a primary key, so the having clause would always fail (there would only be one row).
update myTable
set status='Established'
where id IN(select id from myTable
where year>=1996 and year<=2008
group by id
having count(distinct year)>=10
and count(distinct year)<=13)
You are using the IN operator, but your inner query returns two columns, id and count(*). It should return only one column.
We have two tables called "post" and "post_extra".
The columns of the "post" table are: id, postdate, title, description
And for "post_extra" they are: eid, news_id, rating, views
The "id" field in the first table is related to "news_id" in the second table.
There are more than 100,000 records in the table, and many of them are duplicates. I want to keep only one record and remove the duplicate records in the "post" table that have the same title, and then remove the related records in "post_extra".
I ran this query in phpMyAdmin, but the server crashed and I had to restart it.
DELETE e
FROM Post p1, Post p2, Post_extra e
WHERE p1.postdate > p2.postdate
AND p1.title = p2.title
AND e.news_id = p1.id
How can I do this?
Suppose you have a table named 'tables' that contains the duplicate records.
First, group by the column on which you want to find duplicates. For the delete itself I am not using GROUP BY; I use a self-join instead of a nested query or a temporary table.
SELECT title, COUNT(*) AS cnt FROM `tables` GROUP BY title HAVING COUNT(*) > 1;
This query returns each duplicated title together with the number of rows that share it.
You don't need to create a temporary table in this case.
To delete the duplicates except for one record:
The table should have an auto-increment column. A possible solution that I've just come across:
DELETE t1 FROM `tables` t1, `tables` t2 WHERE t1.id > t2.id AND t1.title = t2.title
if you want to keep the row with the lowest auto-increment id value, OR
DELETE t1 FROM `tables` t1, `tables` t2 WHERE t1.id < t2.id AND t1.title = t2.title
if you want to keep the row with the highest auto-increment id value.
You can cross-check the result by selecting the duplicate records again with the same query:
SELECT title, COUNT(*) AS cnt FROM `tables` GROUP BY title HAVING COUNT(*) > 1;
If it returns 0 rows, then your query was successful.
This will keep the entry with the lowest id for each title:
DELETE p, e
FROM Post p
left join Post_extra e on e.news_id = p.id
where p.id not in
(
select * from
(
select min(id)
from post
group by title
) x
)
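The keep-the-lowest-id logic can be tried on a small data set before running it on 100,000 rows. This is a sketch in Python's sqlite3 module with hypothetical rows; SQLite cannot delete from two tables in one statement, so the related post_extra rows are removed first and the duplicate posts second.

```python
import sqlite3

def dedupe_posts(posts, extras):
    """Keep the lowest-id post per title, drop the extras of removed posts,
    and return (remaining post ids, remaining post_extra news_ids)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE post (id INTEGER PRIMARY KEY, postdate TEXT,"
        " title TEXT, description TEXT)"
    )
    conn.execute(
        "CREATE TABLE post_extra (eid INTEGER PRIMARY KEY, news_id INTEGER,"
        " rating INTEGER, views INTEGER)"
    )
    conn.executemany("INSERT INTO post VALUES (?, ?, ?, ?)", posts)
    conn.executemany("INSERT INTO post_extra VALUES (?, ?, ?, ?)", extras)
    # Step 1: remove extras that belong to posts about to be deleted.
    conn.execute(
        """
        DELETE FROM post_extra
        WHERE news_id IN (
            SELECT id FROM post
            WHERE id NOT IN (SELECT MIN(id) FROM post GROUP BY title)
        )
        """
    )
    # Step 2: remove every post that is not the lowest id for its title.
    conn.execute(
        "DELETE FROM post WHERE id NOT IN"
        " (SELECT MIN(id) FROM post GROUP BY title)"
    )
    post_ids = [r[0] for r in conn.execute("SELECT id FROM post ORDER BY id")]
    extra_ids = [
        r[0] for r in conn.execute("SELECT news_id FROM post_extra ORDER BY news_id")
    ]
    conn.close()
    return post_ids, extra_ids

# Hypothetical sample data: posts 1 and 2 share a title.
sample_posts = [
    (1, "2020-01-01", "Hello", "a"),
    (2, "2020-01-02", "Hello", "b"),
    (3, "2020-01-03", "World", "c"),
]
sample_extras = [(10, 1, 5, 100), (11, 2, 4, 50), (12, 3, 3, 10)]
```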
You can delete duplicate records by creating a temporary table with a unique index on the fields that you need to check for duplicate values,
then issuing
INSERT IGNORE INTO TempTable SELECT * FROM TableWithDuplicates
You will get a temporary table without duplicates.
Then delete the records from the original table (TableWithDuplicates) by joining the two tables.
It should be something like:
CREATE TEMPORARY TABLE `tmp_post` (
`id` INT(10) NULL,
`postDate` DATE NULL,
`title` VARCHAR(50) NULL,
`description` VARCHAR(50) NULL,
UNIQUE INDEX `postDate_title_description` (`postDate`, `title`, `description`)
);
INSERT IGNORE INTO tmp_post
SELECT id,postDate,title,description
FROM post ;
DELETE post.*
FROM post
LEFT JOIN tmp_post tmp ON tmp.id = post.id
WHERE tmp.id IS NULL ;
Sorry, I haven't tested this code.
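Since the code above is untested, here is a small sketch of the same idea that can be run locally with Python's sqlite3 module. It is a translation, not the MySQL statements themselves: SQLite spells the insert as INSERT OR IGNORE, the final delete uses NOT IN instead of a LEFT JOIN, and the ORDER BY id (my assumption) makes the lowest id the copy that survives. The sample rows are made up.

```python
import sqlite3

def dedupe_via_temp_table(posts):
    """Copy posts into a temp table with a unique index, then delete from
    post every row that did not make it into the temp table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE post (id INTEGER PRIMARY KEY, postDate TEXT,"
        " title TEXT, description TEXT)"
    )
    conn.executemany("INSERT INTO post VALUES (?, ?, ?, ?)", posts)
    conn.execute(
        "CREATE TEMP TABLE tmp_post (id INTEGER, postDate TEXT, title TEXT,"
        " description TEXT, UNIQUE (postDate, title, description))"
    )
    # Rows that collide with the unique index are silently skipped, so only
    # the first copy (lowest id, because of the ORDER BY) is stored.
    conn.execute(
        "INSERT OR IGNORE INTO tmp_post"
        " SELECT id, postDate, title, description FROM post ORDER BY id"
    )
    conn.execute("DELETE FROM post WHERE id NOT IN (SELECT id FROM tmp_post)")
    remaining = [r[0] for r in conn.execute("SELECT id FROM post ORDER BY id")]
    conn.close()
    return remaining

# Hypothetical sample data: rows 1 and 2 are exact duplicates, row 3 differs
# in postDate and therefore counts as a separate record.
sample_posts_tmp = [
    (1, "2020-01-01", "Hello", "x"),
    (2, "2020-01-01", "Hello", "x"),
    (3, "2020-01-02", "Hello", "x"),
]
```

Note that the unique index decides what counts as a duplicate. With (postDate, title, description) as above, two posts with the same title but different dates are both kept.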
I have a table called scheduler. It contains the following columns:
ID
sequence_id
schedule_time (timestamp)
processed
source_order
I need to delete duplicate rows from the table, keeping one row for each combination of schedule_time and source_order within a particular sequence_id, where processed=0.
DELETE yourTable FROM yourTable LEFT OUTER JOIN (
SELECT MIN(ID) AS minID FROM yourTable WHERE processed = 0 GROUP BY schedule_time, source_order
) AS keepRowTable ON yourTable.ID = keepRowTable.minID
WHERE keepRowTable.ID IS NULL AND processed = 0
I adapted this from the post "How can I remove duplicate rows?" ;P
Have you seen it?
--fixed version: the outer WHERE now tests keepRowTable.minID, the only column the subquery returns--
DELETE yourTable FROM yourTable LEFT OUTER JOIN (
SELECT MIN(ID) AS minID FROM yourTable WHERE processed = 0 GROUP BY schedule_time, source_order
) AS keepRowTable ON yourTable.ID = keepRowTable.minID
WHERE keepRowTable.minID IS NULL AND processed = 0
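The keep-MIN(ID) idea can be checked on a few rows with the sketch below, written for Python's sqlite3 module with hypothetical data. Two things differ from the statement above and are my assumptions: the LEFT JOIN ... IS NULL test is written as NOT IN because SQLite has no DELETE ... JOIN, and sequence_id is added to the GROUP BY because the question asks for duplicates within a particular sequence_id.

```python
import sqlite3

def dedupe_scheduler(rows):
    """Among unprocessed rows, keep the lowest ID per
    (sequence_id, schedule_time, source_order); return the remaining IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scheduler (ID INTEGER PRIMARY KEY, sequence_id INTEGER,"
        " schedule_time TEXT, processed INTEGER, source_order TEXT)"
    )
    conn.executemany("INSERT INTO scheduler VALUES (?, ?, ?, ?, ?)", rows)
    # Processed rows are never touched: they are excluded both from the
    # set of keepers and from the rows eligible for deletion.
    conn.execute(
        """
        DELETE FROM scheduler
        WHERE processed = 0
          AND ID NOT IN (
              SELECT MIN(ID) FROM scheduler
              WHERE processed = 0
              GROUP BY sequence_id, schedule_time, source_order
          )
        """
    )
    remaining = [r[0] for r in conn.execute("SELECT ID FROM scheduler ORDER BY ID")]
    conn.close()
    return remaining

# Hypothetical sample data: row 2 duplicates row 1; row 3 is already processed.
sample_schedule = [
    (1, 7, "2024-01-01 10:00:00", 0, "A"),
    (2, 7, "2024-01-01 10:00:00", 0, "A"),
    (3, 7, "2024-01-01 10:00:00", 1, "A"),
    (4, 8, "2024-01-01 10:00:00", 0, "A"),
    (5, 7, "2024-01-02 10:00:00", 0, "A"),
]
```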
For MySQL:
DELETE a from tbl a , tbl b WHERE a.Id>b.Id and
a.sequence_id= b.sequence_id and a.processed=0;
The fastest way to remove duplicates is to force them out by adding a unique index with IGNORE, leaving only one copy of each in the table (ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this works only on older servers):
ALTER IGNORE TABLE dates ADD PRIMARY KEY (
ID,
sequence_id,
schedule_time,
processed,
source_order
)
Now, if the table already has a key, you might need to drop it first and so on, but the point is that when you add a unique key with IGNORE to a table with duplicates, the behavior is to delete all the extra records. The key should cover only the columns that define a duplicate; if ID is unique per row, leave it out, or no rows will count as duplicates. After you have added this key, you just need to drop it again if you want to allow new duplicates :-)
If you need more complex filtering (on which one of the duplicates to keep) that you cannot express with an index, although that is unlikely, you can create a new table and fill it with exactly the rows you want, all in the same query:
CREATE TABLE tmp SELECT ..fields.. FROM original_table GROUP BY ..what you need..;
DROP TABLE original_table;
ALTER TABLE tmp RENAME TO original_table;
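The create, drop and rename sequence can be rehearsed on a throwaway table first. The sketch below uses Python's sqlite3 module, where the first statement is spelled CREATE TABLE ... AS SELECT; the column list and sample rows are hypothetical, borrowed from the scheduler question. One thing worth knowing before trying this on real data: a table created from a SELECT does not carry over the original table's indexes or keys, so they have to be recreated afterwards.

```python
import sqlite3

def rebuild_without_duplicates(rows):
    """Build a deduplicated copy of original_table, swap it in under the
    original name, and return the IDs it contains."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE original_table (ID INTEGER PRIMARY KEY,"
        " sequence_id INTEGER, schedule_time TEXT, processed INTEGER,"
        " source_order TEXT)"
    )
    conn.executemany("INSERT INTO original_table VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    # One row per group; MIN(ID) decides which duplicate is kept.
    conn.execute(
        """
        CREATE TABLE tmp AS
        SELECT MIN(ID) AS ID, sequence_id, schedule_time, processed, source_order
        FROM original_table
        GROUP BY sequence_id, schedule_time, processed, source_order
        """
    )
    conn.execute("DROP TABLE original_table")
    conn.execute("ALTER TABLE tmp RENAME TO original_table")
    ids = [r[0] for r in conn.execute("SELECT ID FROM original_table ORDER BY ID")]
    conn.close()
    return ids

# Hypothetical sample data: row 2 duplicates row 1 in every non-ID column.
sample_original = [
    (1, 7, "2024-01-01 10:00:00", 0, "A"),
    (2, 7, "2024-01-01 10:00:00", 0, "A"),
    (3, 7, "2024-01-01 10:00:00", 1, "A"),
    (4, 8, "2024-01-01 10:00:00", 0, "A"),
    (5, 7, "2024-01-02 10:00:00", 0, "A"),
]
```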