I want to be able to limit the number of duplicate records in a MySQL database table to 2.
(Excluding the id field which is auto increment)
My table is set up like this:
id  city     item
---------------------
1   Miami    4
2   Detroit  5
3   Miami    4
4   Miami    18
5   Miami    4
So in that table, only row 5 would be deleted.
How can I do this?
MySQL has some foibles when reading from and writing to the same table, so I don't actually know if this will work. The syntax is fine in many implementations of SQL, but I don't know if it's MySQL friendly...
DELETE FROM
  yourTable
WHERE
  1 < (SELECT COUNT(*)
       FROM yourTable AS Lookup
       WHERE Lookup.city = yourTable.city
         AND Lookup.item = yourTable.item
         AND Lookup.id < yourTable.id)
EDIT
Amazingly convoluted, but worth a try?
DELETE
  yourTable
FROM
  yourTable
INNER JOIN
(
  SELECT
    id
  FROM
  (
    SELECT
      id
    FROM
      yourTable
    WHERE
      1 < (SELECT COUNT(*)
           FROM yourTable AS Lookup
           WHERE Lookup.city = yourTable.city AND Lookup.item = yourTable.item AND Lookup.id < yourTable.id)
  )
    AS inner_deletes
)
  AS deletes
    ON deletes.id = yourTable.id
I think your problem here is that your code and/or table structure allows duplicates to be inserted; rather than asking how to clean them up, you should really fix your database and/or code.
I think a better solution is to avoid letting too many duplicates in at all: implement a validation that runs a SELECT COUNT(*) before each insert and rejects the new row once the limit has been reached.
If you want to do this in the data tier instead, you will have to use a stored procedure, because you first need to identify every group that has too many records and then delete only the most recent ones.
Regards
Due to MySQL being notoriously difficult when it comes to updating queried tables (see for example the answers from Dems), the best I can figure out is sadly more than one statement, but on the plus side it is fairly readable:
CREATE TEMPORARY TABLE Dump AS
  SELECT id FROM table1 WHERE id NOT IN
    (SELECT MIN(id) FROM table1 GROUP BY city, item
     UNION
     SELECT MAX(id) FROM table1 GROUP BY city, item);
DELETE FROM table1 WHERE id IN (SELECT id FROM Dump);
DROP TEMPORARY TABLE Dump;
I'm not sure whether it matters which duplicates are removed; this keeps the first and the last.
In your reply to Joachim's answer, you ask about saving 3 or 5 rows, this is one way to accomplish it. Depending on how you are using this database, you could either call this in a loop, or you could turn it into a stored procedure. Either way, you would continue to run this entire block of code until Rows Affected = 0:
drop table if exists TempTable;
create table TempTable
select city, item,
       count(*) as record_count,
       min(id) as ItemToDrop -- this could be changed to max() if you
                             -- want to delete new stuff instead
from YourTable
group by city, item
having count(*) > 2; -- This value = number of rows you save
delete from YourTable
where id in (select ItemToDrop from TempTable);
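If you are on MySQL 8.0 or later, window functions give another way to keep a fixed number of rows per group without looping. The following is only a sketch, assuming the table is called YourTable with the id, city and item columns from the question; the aliases t, ranked and rn are made up for illustration. It numbers the rows inside each (city, item) group by id and deletes everything past the second one.

```sql
-- Sketch for MySQL 8.0+ (ROW_NUMBER() is not available in older versions).
-- The derived table is materialized before the delete runs, so the target
-- table is not read and modified at the same time.
DELETE t
FROM YourTable AS t
JOIN (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY city, item ORDER BY id) AS rn
    FROM YourTable
) AS ranked ON ranked.id = t.id
WHERE ranked.rn > 2;  -- 2 = number of rows to keep per (city, item)
```

Changing ORDER BY id to ORDER BY id DESC would keep the newest rows instead of the oldest. As with the other answers, try it on a copy of the table first.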
Related
I'm trying to learn SQL better, views more specifically, but I can't get the following to work.
I've put a slimmed-down version of it here. There are more joins I have to do based on foreign keys from the tbl2 matches.
Since it's a view, I can't create temp tables.
I can't rely on stored procedures in this case.
I could use OUTER APPLY, but only to get specific references (row 1, 2...), and that would mean doing a SELECT * FROM Table2 WHERE ..., which costs one index scan each time I use it.
I could create the view using "WITH tbl2 (FK_TABLE1...) AS (SELECT FK_TABLE1 FROM dbo.TABLE2)", but that doesn't seem to help: each reference to it does a sort or a scan, so there is no gain there.
Is there some way to create some type of list that I can reuse, so that a single index scan gets the matching rows from Table2?
Or is there another way to think about this?
Table1 (PK, XX, YY)
Table2 (PK, FK_TABLE1, Type, Progress, ZZ, FK_Status)
Create View MyView
as
Select
Table1.PK
,Table1.XX
,Table1.YY
---- I want to present data from the first 3 matches
,(SELECT ZZ from tbl2 where tbl2.FK_TABLE1 = Table1.PK ORDER BY Type ASC OFFSET(0) ROWS FETCH NEXT (1) ROWS ONLY) ZZ1
,(SELECT ZZ from tbl2 where tbl2.FK_TABLE1 = Table1.PK ORDER BY Type ASC OFFSET(1) ROWS FETCH NEXT (1) ROWS ONLY) ZZ2
,(SELECT ZZ from tbl2 where tbl2.FK_TABLE1 = Table1.PK ORDER BY Type ASC OFFSET(2) ROWS FETCH NEXT (1) ROWS ONLY) ZZ3
,sts.StatusName CurrentStatus
From Table1
LEFT OUTER JOIN Table2 AS tbl2 ON (tbl2.FK_TABLE1= Table1.PK) ---- Here I want to make some sort of join so I get all matching rows from the other table
LEFT OUTER JOIN STATUS AS sts ON (sts.PK = [tbl2 ordered by type, if last elements status = X take that, else status of first).FK_STATUS) ---- Here I'm a bit puzzled, since I have to order by, but also have a fallback value if last element isn't matching.
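One possible way to read Table2 only once is to number its rows per parent with ROW_NUMBER() and then pivot the first three with conditional aggregation. This is a rough sketch in T-SQL, not a tested answer: it assumes the Table1 and Table2 definitions above live in the dbo schema, the names ranked, rn and t1 are invented for illustration, and the status fallback logic is left out entirely.

```sql
-- Sketch only: presents ZZ from the first three Table2 matches per Table1 row.
-- ranked, rn and t1 are hypothetical names; the status lookup is not handled.
CREATE VIEW MyView
AS
WITH ranked AS (
    SELECT FK_TABLE1,
           ZZ,
           ROW_NUMBER() OVER (PARTITION BY FK_TABLE1 ORDER BY Type ASC) AS rn
    FROM dbo.Table2
)
SELECT t1.PK,
       t1.XX,
       t1.YY,
       MAX(CASE WHEN r.rn = 1 THEN r.ZZ END) AS ZZ1,
       MAX(CASE WHEN r.rn = 2 THEN r.ZZ END) AS ZZ2,
       MAX(CASE WHEN r.rn = 3 THEN r.ZZ END) AS ZZ3
FROM dbo.Table1 AS t1
LEFT OUTER JOIN ranked AS r
       ON r.FK_TABLE1 = t1.PK
      AND r.rn <= 3
GROUP BY t1.PK, t1.XX, t1.YY;
```

The CTE is referenced a single time, so the plan should not need one scan per ZZ column; whether it actually beats the three subqueries is something to check in the execution plan.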
I have a table called actions with the following columns. I want to extract only the ID_tracking values that have not done a certain action. I tried:
SELECT id_tracking FROM table WHERE id_tracking NOT IN
( SELECT id_tracking FROM table WHERE id_action = X )
This method works, but it takes extremely long even on a small table, and there will be tables with millions of rows, so this is not a solution. How can this be done?
Sample data
ID_tracking | ID_action
1009        | 1
1009        | 2
1009        | 3
1009        | 5
1010        | 2
1010        | 3
1010        | 4
1011        | 5
I often approach this type of problem using GROUP BY and HAVING:
SELECT id_tracking
FROM table t
GROUP BY id_tracking
HAVING SUM(id_action = x) = 0;
One issue with your query is that you'll get multiple rows for each id_tracking that meets the condition.
In practice, though, a very reasonable approach would be:
SELECT t.id_tracking
FROM tracking t
WHERE NOT EXISTS (SELECT 1
FROM trackingaction ta
WHERE ta.id_tracking = t.id_tracking AND
ta.id_action = X
);
This uses two different tables, one where id_tracking is the unique key and one which is the table you describe. For best results, you want an index on trackingaction(id_tracking, id_action).
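For reference, the index mentioned above could be created like this. The index name is just an example; trackingaction and its two columns are the names used in the query.

```sql
-- Composite index so the NOT EXISTS probe can be answered from the index alone.
-- idx_trackingaction_tracking_action is an arbitrary, illustrative name.
CREATE INDEX idx_trackingaction_tracking_action
    ON trackingaction (id_tracking, id_action);
```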
Why not do it like this:
Do you have a separate table for tracking IDs, i.e. one row per ID_tracking? If so, add a does_it_have_action column there that records whether the tracking ID has an action in the actions table.
When a new entry is added to your table with ID_tracking and ID_action, update the tracking IDs table and set does_it_have_action = 1; otherwise leave it at the default of 0.
When you want to check the tracking IDs for actions, just run a SELECT against this table.
You will then have to take care of the UPDATE and INSERT statements for this tracking IDs table. Whenever an action is added for a tracking ID, check whether that ID_tracking already exists in the tracking IDs table. If it exists, update its does_it_have_action column to 1; otherwise insert it with does_it_have_action = 0.
I hope that will work for you when you have billions of rows one day.
P.S.: Here is a rough structure for this new table, tracking_ids:
ID_tracking | does_it_have_action (default 0) | created_at (current timestamp)
Using DISTINCT and a join will make your query much faster (when the inner query returns numerous results). I just added DISTINCT and translated your NOT IN into a join. Try this:
Demo At SqlFiddle
SELECT distinct yt.id_tracking
FROM table1 yt
left join (SELECT distinct id_tracking as idt from table1 where id_action = 3) mt
on yt.id_tracking=mt.idt
where mt.idt is null
First, I would like to state that I am still a newbie at writing SQL queries. I searched thoroughly for an answer to this error and found a good number of them, but none seems to be helpful, or rather I don't really know how to apply the solutions to my case.
Here is my challenge: I have an application table that stores applicants' records with some unique columns, e.g. (dl_number, parent_id, person_id). The parent_id ties each applicant's history records to his/her first record, and each applicant is meant to have a unique dl_number. For some reason, some applicants' dl_numbers are not unique, hence the need to identify the records with changing dl_numbers.
Below is the SQL query on which I am getting the error [sql error (1241) operand should contain 1 column(s)].
SELECT id,application_id,dl_number,surname,firstname,othername,birth_date,status_id,expiry_date,person_id,COUNT(DISTINCT(dl_number,parent_id,birth_date)) AS NumOccurrences
FROM tbl_dl_application
WHERE status_id > 1
GROUP BY dl_number,parent_id,birth_date
HAVING NumOccurrences > 1
Any help on how to solve this, or a better way to do it, would be appreciated.
Sample table and expected result
DISTINCT is not really a function to be used that way.
You can do SELECT DISTINCT column1, column2 FROM table to get unique rows only, or similarly SELECT column, COUNT(DISTINCT anothercolumn) FROM table GROUP BY column to count the unique values within each group.
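As a side note, MySQL's COUNT(DISTINCT ...) does accept several expressions as long as they are listed without an extra pair of parentheses around them. A small illustration against the table from the question (the alias distinct_combinations is made up):

```sql
-- Counts the distinct (dl_number, parent_id, birth_date) combinations.
-- Combinations in which any of the three values is NULL are not counted.
SELECT COUNT(DISTINCT dl_number, parent_id, birth_date) AS distinct_combinations
FROM tbl_dl_application
WHERE status_id > 1;
```

This only shows the syntax; it does not by itself find the duplicates.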
Problem as I understand it: You look for duplicates in your table. Duplicates are defined as having identical values of these 3 columns: dl_number, parent_id and birth_date.
I'm also assuming that id is a primary key in your table. If not, replace the t2.id <> t.id condition with one that uniquely identifies a row.
If you only wanted to know what are the duplicated groups, this should work:
SELECT dl_number, parent_id, birth_date, count(*) as NumOccurences -- You can only add aggregation functions here, not another column unless you group by it.
FROM tbl_dl_application t
WHERE status_id > 1 -- I don't know what this is but it should do no harm.
GROUP BY dl_number, parent_id, birth_date
HAVING count(*)>1
If, however, you want to know details of each duplicated row, this query will give you that:
SELECT *
FROM tbl_dl_application t
WHERE
status_id > 1 -- I don't know what this is but it should do no harm.
AND EXISTS (
SELECT 1
FROM tbl_dl_application t2
WHERE
t2.dl_number = t.dl_number
AND t2.parent_id = t.parent_id
AND t2.birth_date = t.birth_date
AND t2.id <> t.id
)
ORDER BY dl_number, parent_id, birth_date, id; -- So you have your duplicates nicely next to each other.
Please explain further if I misunderstood your objective, or ask if the solution is not clear enough.
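**COUNT(DISTINCT ...) cannot take a parenthesized list of columns. Writing (dl_number,parent_id,birth_date) makes MySQL treat the three fields as a single row value, which is what raises error 1241. Use just one of these three fields, or list the columns without the inner parentheses, and the query will run.**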
For example.
SELECT id,application_id,dl_number,surname,firstname,othername,birth_date,status_id,expiry_date,person_id,COUNT(DISTINCT(parent_id)) AS NumOccurrences
FROM tbl_dl_application
WHERE status_id > 1
GROUP BY dl_number,parent_id,birth_date
HAVING NumOccurrences > 1
This is a hard one. A third party has been sending us data from a fourth party, but they did so in a horrible format, and they messed up and duplicated much of the data.
The data is all in one table, even though it should have been spread over several. This has to do with a historical data format.
What SHOULD be one record with multiple related records in other tables has actually been put into our database as follows:
Id HistoricalId Field1 Field2 Field3 Field4 FieldX ...
1 327
2 data data data
3 data data data
4 data data
5 data data
6 328
7 data data data (etc etc)
Everything is grossly simplified. You always first have a sort of "header record", then records with the data, until there is a new header. Let's call all the records from one header up to the next a "superrecord" (in the example, Ids 1 through 5 together form the first superrecord; the next superrecord starts at Id 6).
The problem is that there are MANY duplicate superrecords, easily identified by the duplicate HistoricalId in their header record. But they can be anywhere in the database (the records that form a superrecord are well sorted and not mixed up, but the superrecords themselves are).
So the puzzle: remove all duplicate superrecords. We are talking tens of thousands here, if not more.
So, how would you, in MySQL:
Find the Id of a duplicate superrecord (easy)
Find the Id of the next header record (i.e. the following superrecord)
Delete everything between (and including) the first Id and the second Id minus 1
And do this for all duplicate superrecords.
My head is starting to spin. It must be possible with just MySQL, but how? I am just not experienced enough. Even though I am not bad at MySQL, here I cannot even see where to start. Or should I program something in PHP?
Anyone like a challenge? Thank you in advance!
UPDATE: Solved it thanks to you and two hours of hard work. See solution.
If you're open to copying to a different table etc., then...
You can figure out which records you want to delete: all records whose historical ID also exists in some other record with a higher ID.
SELECT id, historical_id
FROM tbl t1
WHERE historical_id > 0
  AND EXISTS
    (SELECT 1 FROM tbl t2
     WHERE t2.historical_id = t1.historical_id AND t2.id > t1.id)
Since each record has an ID, you can compute, for each record, the ID of its header record (this is what you mention in your comment). It is the maximum ID among the records at or before it in which the historical ID is filled in.
SELECT id, historical_id
  ,(SELECT MAX(t2.id) FROM tbl t2 WHERE t2.id <= t1.id AND t2.historical_id > 0) AS parent_id
FROM tbl t1
You can then match the PARENT_ID with the first query to get all the IDs you wish to delete
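Put together, the matching step might look roughly like the sketch below. It is untested and rests on assumptions: the table is called tbl with columns id and historical_id, data rows have an empty (or zero) historical_id, and the header of a row is taken to be the closest header at or before it. The names ids_to_delete, r and dup are invented for illustration. A temporary table is used so that the DELETE does not read from the table it modifies.

```sql
-- Step 1: collect the ids of every row whose header is a duplicate
-- (a header whose historical_id appears again later in the table).
CREATE TEMPORARY TABLE ids_to_delete AS
SELECT r.id
FROM (
    SELECT t1.id,
           (SELECT MAX(t2.id)
            FROM tbl t2
            WHERE t2.id <= t1.id AND t2.historical_id > 0) AS parent_id
    FROM tbl t1
) AS r
JOIN (
    SELECT t1.id
    FROM tbl t1
    WHERE t1.historical_id > 0
      AND EXISTS (SELECT 1 FROM tbl t2
                  WHERE t2.historical_id = t1.historical_id
                    AND t2.id > t1.id)
) AS dup ON dup.id = r.parent_id;

-- Step 2: delete those rows, then clean up.
DELETE FROM tbl WHERE id IN (SELECT id FROM ids_to_delete);
DROP TEMPORARY TABLE ids_to_delete;
```

Note that this keeps the last copy of each superrecord. The correlated MAX() lookup runs once per row, so on a large table it may be slow; checking the contents of ids_to_delete before running the DELETE is a cheap safety net.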
I finally solved it. Thanks everyone, you all pointed me in the right direction.
Three queries are needed.
First, mark all duplicate header records by setting HistoricalID to -1:
UPDATE
  t1 INNER JOIN
  (SELECT MIN(id) AS keep, HistoricalID FROM t1
   GROUP BY HistoricalID
   HAVING COUNT(*) > 1 AND HistoricalID > 0) t2
  ON t1.HistoricalID = t2.HistoricalID
SET t1.HistoricalID = IF(t1.id = t2.keep, t1.HistoricalID, -1)
WHERE t1.HistoricalID > 0
Second, copy HistoricalID from the header record to all the records below it (in the same superrecord). I can easily undo this later if needed.
UPDATE
  t1 JOIN
  (SELECT Id, @s := IF(HistoricalID = '', @s, HistoricalID) HistoricalID FROM
    (SELECT * FROM t1 ORDER BY Id) r, (SELECT @s := '') t) t2
  ON t1.Id = t2.Id
SET t1.HistoricalID = t2.HistoricalID
Finally, delete all duplicates:
DELETE FROM t1 WHERE HistoricalID = -1
It worked. Couldn't have done it without you!
I am trying to delete duplicate rows from my MySQL table. I've tried multiple queries, but I keep getting this error: #1093 - You can't specify target table 'usa_city' for update in FROM clause
The table looks like this:
usa_city
--------
id(pk)
id_state
city_name
And the queries I have tried were:
DELETE FROM usa_city
WHERE id NOT IN
(
SELECT MIN(id)
FROM usa_city
GROUP BY city_name, id_state
)
And:
DELETE
FROM usa_city
WHERE usa_city.id IN
-- List 1 - all rows that have duplicates
(SELECT F.id
FROM usa_city AS F
WHERE Exists (SELECT city_name, id_state, Count(id)
FROM usa_city
WHERE usa_city.city_name = F.city_name
AND usa_city.id_state = F.id_state
GROUP BY usa_city.city_name, usa_city.id_state
HAVING Count(usa_city.id) > 1))
AND usa_city.id NOT IN
-- List 2 - one row from each set of duplicate
(SELECT Min(id)
FROM usa_city AS F
WHERE Exists (SELECT city_name, id_state, Count(id)
FROM usa_city
WHERE usa_city.city_name = F.city_name
AND usa_city.id_state = F.id_state
GROUP BY usa_city.city_name, usa_city.id_state
HAVING Count(usa_city.id) > 1)
GROUP BY city_name, id_state);
Thanks in advance.
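Try to select the duplicates first, then delete them
DELETE FROM usa_city WHERE id IN
(
SELECT id FROM (
SELECT MAX(id) AS id FROM usa_city -- removes the newest copy per group; rerun until no rows are affected
GROUP BY city_name, id_state
HAVING COUNT(*) > 1
) AS dup -- the derived table is materialized first, which avoids error #1093
)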
Hope it helps!!!
MODIFIED: Based on the comment, if you want to keep one record, you can make a join and keep the row with the lowest id
DELETE c1 FROM usa_city c1, usa_city c2 WHERE c1.id > c2.id AND
(c1.city_name= c2.city_name AND c1.id_state = c2.id_state)
Be sure to make a backup before executing the query above...
From the MySQL documentation:
"Currently, you cannot delete from a table and select from the same
table in a subquery."
But here is a workaround for UPDATE that should work for DELETE too.
Alternatively, you could select the rows and then delete them in a loop, for example in PHP.
You may find an answer to your problem here: How to delete duplicate records in mysql database?
You should also improve your database by using unique keys to prevent duplicate rows, so you don't need to clean up in the future.
Edit : This solution is also found if you follow the link posted by BloodyWorld, so if it works please go and upvote DMin's post here
Found this browsing the internet (#1 google result for mysql delete duplicate rows), have you tried it?
delete from table1
USING table1, table1 as vtable
WHERE (table1.ID > vtable.ID) -- with NOT table1.ID=vtable.ID, every copy of a duplicated row would be deleted
AND (table1.field_name=vtable.field_name)
Judging from your examples, when you say "duplicate", you mean "having the same combination of id_state and city_name", correct? If so, once you have finished removing the duplicates, I strongly suggest creating a UNIQUE constraint on {id_state, city_name}.
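Such a constraint could be added along these lines once the table is clean. The constraint name uq_usa_city_state_name is only an example.

```sql
-- Fails with a duplicate-entry error if any duplicates are still present,
-- so run it only after the cleanup.
ALTER TABLE usa_city
    ADD CONSTRAINT uq_usa_city_state_name UNIQUE (id_state, city_name);
```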
To actually remove the duplicates, it is not enough to just identify the set of duplicates, you must also decide which of the identified duplicates to keep. Assuming you want to keep the ones with the smallest id, the following piece of SQL will do the job:
CREATE TEMPORARY TABLE usa_city_to_delete AS
SELECT id FROM usa_city T1
WHERE EXISTS (
SELECT * FROM usa_city T2
WHERE
T1.id_state = T2.id_state
AND T1.city_name = T2.city_name
AND T1.id > T2.id
);
DELETE FROM usa_city
WHERE id IN (SELECT id FROM usa_city_to_delete);
DROP TEMPORARY TABLE usa_city_to_delete;
Unfortunately, MySQL does not allow a subquery in a DELETE to select from the table being deleted from; otherwise we could have done this in a single statement, without the temporary table.
--- EDIT ---
You can't have such a subquery, but you can have a JOIN, as illustrated by Carlos Quijano's answer. Also, the temporary table can be created implicitly, as suggested by Kokers.
So it is possible to do it in a single statement, contrary to what I wrote above...