Essentially, I have the following table called Table1 with columns OrderNum and Book. There should never be duplicate records of any Book for a given OrderNum; if there are, they need to be identified and deleted.
For example:
OrderNum 1 should only have Book1 listed once so the query must identify the other 2 Book1 listed for OrderNum 1 and delete them.
OrderNum 4 should only have Book2 listed once so the query must identify the other Book2 listed for OrderNum 4 and delete it.
After the query runs, Table1 should look like this:
I am working with MS Access queries, but I am looking for a solution that could work as a MySQL query as well.
I don't know how to do this gracefully on either MySQL or Access, because your table doesn't have a primary key column, which it rightfully should have. On Access, you could try creating a new table, then populating it using the following query:
INSERT INTO yourNewTable (OrderNum, Book)
SELECT DISTINCT OrderNum, Book
FROM yourTable;
Then, delete yourTable after you are done with the above query.
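If you want to try the copy-then-swap idea before touching real data, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Access/MySQL (that swap is my assumption, not something from the question), and the sample rows are made up because the original table contents were not shown.

```python
import sqlite3

# Hypothetical sample data: OrderNum 1 lists Book1 three times,
# OrderNum 4 lists Book2 twice.
rows = [(1, "Book1"), (1, "Book1"), (1, "Book1"),
        (2, "Book3"), (4, "Book2"), (4, "Book2")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (OrderNum INTEGER, Book TEXT)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?)", rows)

# Copy exactly one row per (OrderNum, Book) pair into a fresh table.
conn.execute("CREATE TABLE yourNewTable (OrderNum INTEGER, Book TEXT)")
conn.execute(
    "INSERT INTO yourNewTable (OrderNum, Book) "
    "SELECT DISTINCT OrderNum, Book FROM yourTable"
)

# Swap the tables once the copy looks right.
conn.execute("DROP TABLE yourTable")
conn.execute("ALTER TABLE yourNewTable RENAME TO yourTable")

deduped = conn.execute(
    "SELECT OrderNum, Book FROM yourTable ORDER BY OrderNum, Book"
).fetchall()
```

The rename step is an addition on my part; the answer above only says to delete the old table.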
If you had a primary key/auto increment column in your table, let's say id, then you could use the following delete statement directly:
DELETE
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.OrderNum = t1.OrderNum AND
t2.Book = t1.Book AND
t2.id < t1.id);
This would leave, for each (OrderNum, Book) combination, the single record among duplicates which happens to have the lowest id value.
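As a rough check of the "keep the lowest id" behaviour, the sketch below runs the same EXISTS pattern against SQLite through Python's sqlite3 module. SQLite and the sample rows are my assumptions; the outer table is referenced by name rather than by an alias, which is just a convenience for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, OrderNum INTEGER, Book TEXT)"
)
# Hypothetical rows; ids 1..6 are assigned in insertion order.
conn.executemany(
    "INSERT INTO yourTable (OrderNum, Book) VALUES (?, ?)",
    [(1, "Book1"), (1, "Book1"), (1, "Book1"),
     (2, "Book3"), (4, "Book2"), (4, "Book2")],
)

# Delete every row for which an older twin (same OrderNum and Book,
# smaller id) exists. The row with the lowest id per pair survives.
conn.execute(
    """
    DELETE FROM yourTable
    WHERE EXISTS (SELECT 1 FROM yourTable AS t2
                  WHERE t2.OrderNum = yourTable.OrderNum
                    AND t2.Book = yourTable.Book
                    AND t2.id < yourTable.id)
    """
)

kept = conn.execute(
    "SELECT id, OrderNum, Book FROM yourTable ORDER BY id"
).fetchall()
```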
Related
I thought I'd made the column userid in my table "userslive" unique, but somehow must have made a mistake. I've seen multiple answers to this question, but I'm afraid of messing up again so I hope someone can help me directly.
So this table has no unique columns, but I've got a column "timer" which was the timestamp of scraping the data. If possible I'd like to drop rows with the lowest "timer" with duplicate "userid" column.
It's a fairly big table at about 2 million rows (20 columns). There are about 1000 duplicated userid values, which I found using this query:
SELECT userid, COUNT(userid) as cnt FROM userslive GROUP BY userid HAVING (cnt > 1);
Is this the correct syntax? I tried this on a backup table, but I suspect it is too heavy for a table this big (unless left to run for a very long time).
DELETE FROM userslive using userslive,
userslive e1
where userslive.timer < e1.timer
and userslive.userid = e1.userid
Is there a quicker way to do this?
EDIT: I should say the "timer" is not a unique column.
DELETE t1.* /* delete from a copy named t1 only */
FROM userslive t1, userslive t2
WHERE t1.userid = t2.userid
AND t1.timer < t2.timer
Logic: if for some record (in a copy aliased as t1) we can find a record (in a table copy aliased as t2) with the same user but with greater/later timer value - this record must be deleted.
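One thing worth seeing in action: since "timer" is not unique (per the EDIT in the question), rows that tie on the latest timer all survive this kind of delete. Below is a small sketch using Python's sqlite3 module. SQLite has no multi-table DELETE, so the self-join is rewritten as an equivalent EXISTS test; that rewrite and the sample rows are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE userslive (userid INTEGER, timer INTEGER)")
# Hypothetical rows: user 1 has an older and a newer row,
# user 2 has two rows with the same timer, user 3 is unique.
conn.executemany(
    "INSERT INTO userslive VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 5), (2, 5), (3, 7)],
)

# Delete a row when another row for the same user has a later timer.
conn.execute(
    """
    DELETE FROM userslive
    WHERE EXISTS (SELECT 1 FROM userslive AS t2
                  WHERE t2.userid = userslive.userid
                    AND t2.timer > userslive.timer)
    """
)

remaining = conn.execute(
    "SELECT userid, timer FROM userslive ORDER BY userid, timer"
).fetchall()
```

User 2 still has two rows afterwards, so a tie-breaker (for example an added auto-increment column) would be needed to remove those.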
I've done this in the past and the easiest way to solve this is to add an id column and then select userid, max(new_id) into a new table and join that for the delete. Something like this.
ALTER TABLE `userslive`
ADD `new_id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;
Now that you have a new unique column, create a new table that holds the rows to delete.
CREATE TABLE `users_to_delete`
AS
SELECT userid, new_id
FROM (
SELECT userid, max(new_id) new_id, count(*) user_rows
FROM `userslive`
GROUP BY 1
) dataset
WHERE user_rows > 1
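Then use that to delete your duplicate rows by joining it into a DELETE statement. Note that this removes only one row per duplicated userid (the one with the highest new_id), so if a userid appears three or more times you need to repeat the last two steps until no duplicates remain: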
DELETE `userslive` FROM `userslive`
INNER JOIN `users_to_delete` USING(userid,new_id);
Make sure you back everything up before you delete anything just in case.
My SQL table contains some duplicate geocoded addresses; I am trying to delete the duplicates and leave only one of each. I've written the code below:
SELECT
`address`, COUNT(*)
FROM stores
GROUP BY `address`
HAVING COUNT(*) > 1
However, it only shows each address and its count (which is 2), and it won't let me select any rows to delete. What additional SQL command do I need to add to delete the duplicates?
Assuming that id is a unique column in your table stores:
DELETE FROM stores WHERE id NOT IN
( SELECT min_id FROM (SELECT MIN(id) AS min_id FROM stores GROUP BY address) AS T1)
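Here is a minimal sketch of the "keep MIN(id) per address" idea, run on SQLite through Python's sqlite3 module (my stand-in, not the asker's database). The addresses are invented. SQLite does not need the extra derived-table wrapper that MySQL requires when the subquery reads the table being deleted from, so it is left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stores (id INTEGER PRIMARY KEY, address TEXT)")
# Hypothetical addresses; two of them appear twice.
conn.executemany(
    "INSERT INTO stores VALUES (?, ?)",
    [(1, "1 Main St"), (2, "1 Main St"), (3, "5 Oak Ave"),
     (4, "9 Elm Rd"), (5, "9 Elm Rd")],
)

# Keep the lowest id for each address and delete everything else.
conn.execute(
    "DELETE FROM stores WHERE id NOT IN "
    "(SELECT MIN(id) FROM stores GROUP BY address)"
)

kept_ids = [row[0] for row in conn.execute("SELECT id FROM stores ORDER BY id")]
```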
I have a table “table-A” with duplicate records like (duplicates based on “name” column)
ID Name Course
01 abc dotnet
02 xyz java
03 mno sas
04 abc dotnet
05 xyz java
06 abc dotnet
07 aaa testing
08 bbb sap
Here “abc” appears 3 times (IDs 1, 4, 6) and “xyz” appears 2 times (IDs 2, 5).
From the above table I need to delete the duplicates with IDs 1, 4 and 2, but not IDs 6 and 5, which are the latest records.
“table-A” should then contain only:
ID Name Course
03 mno sas
05 xyz java
06 abc dotnet
07 aaa testing
08 bbb sap
I tried this:
CREATE TEMPORARY TABLE temp_table (idTemp int(12), totTemp int(4));
INSERT INTO temp_table(`idTemp`, `totTemp`) select max(ID), count(*) as tot from `table-A`
group by Name, Course having tot > 1 or tot =1 order by ID ;
Delete from `table-A` where ID not in (select idTemp from temp_table);
The above code works, but it takes a very long time on a large table. My table contains 200,000 records with 40+ columns, and 20,000 records are added every month.
I need to find duplicates based on 10 columns (GROUP BY on 10 columns), so can anyone suggest code that works well and fast for this case?
I found different approaches on the internet and tried them, but they take even longer than the one above.
My main concern is query execution time, so please suggest a good approach or query that does the above task quickly.
(Just for information:
I also found a few approaches on Stack Overflow. The best one suggests creating a unique index on those columns, but in my case the data comes from the government every month, and it may contain duplicates within the file as well as against the database. So I need to delete the duplicates or show them in a grid (using ASP.NET).
)
ALTER IGNORE TABLE table_a ADD UNIQUE INDEX index_123 (name, course );
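This will drop the duplicate rows and make later inserts of duplicate data fail with an error. Make sure to take a backup before running this query. Note that the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4, so this only works on older versions.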
You must delete the duplicates manually. To prevent this in the future, make the values that should not be duplicated unique: users cannot share the same ID number or employee number, for example, but they can share the same name and surname. Read up on the UNIQUE constraint in SQL.
DELETE t1.*
FROM
tableName t1 INNER JOIN tableName t2
ON t1.Name=t2.Name
AND t1.ID < t2.ID
I think you should create a unique index on these fields to avoid duplicates on INSERT.
Here is the query to delete duplicates:
DELETE FROM T WHERE ID NOT IN
(SELECT MAX(ID) FROM (SELECT * FROM T) T1 GROUP BY Name)
One more way:
DELETE T1
FROM T as T1
LEFT JOIN (SELECT MAX(ID) as ID FROM T GROUP BY Name) as T2
ON T1.Id=T2.Id
Where T2.id is null
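To tie the two suggestions together (delete keeping the latest ID per Name, then add a unique index so new duplicates are rejected), here is a small sketch using the sample rows from the question. It runs on SQLite through Python's sqlite3 module, which is my assumption. The index name t_name_uq is hypothetical, and the index covers only Name to match the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ID INTEGER PRIMARY KEY, Name TEXT, Course TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, "abc", "dotnet"), (2, "xyz", "java"), (3, "mno", "sas"),
     (4, "abc", "dotnet"), (5, "xyz", "java"), (6, "abc", "dotnet"),
     (7, "aaa", "testing"), (8, "bbb", "sap")],
)

# Keep only the highest ID for each Name.
conn.execute("DELETE FROM T WHERE ID NOT IN (SELECT MAX(ID) FROM T GROUP BY Name)")

# With the duplicates gone, a unique index can be created; it would
# fail if any duplicate Name were still present.
conn.execute("CREATE UNIQUE INDEX t_name_uq ON T (Name)")

# A later insert of an existing Name is now rejected.
try:
    conn.execute("INSERT INTO T (ID, Name, Course) VALUES (9, 'abc', 'dotnet')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

remaining_ids = [row[0] for row in conn.execute("SELECT ID FROM T ORDER BY ID")]
```

For a monthly import that may contain duplicates, the rejected rows could be collected in the except branch and shown to the user instead of being silently dropped.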
I am currently writing a query. I want to select all records from a table based on multiple values of a foreign key, for example all records that are related to both 1 and 2.
E.g. the table might have:
id name uid
1 bil 3
2 test 3
3 test 4
4 test 4
5 bil 5
6 bil 5
I want to select all records related to 3 that are also related to 4. In this case it is record number 2.
SELECT id
FROM `table`
WHERE uid = value1 AND like_id
IN (SELECT like_id
FROM likes
WHERE uid = uid2)
LIMIT 0 , 30
It's not at all clear where "value1" is coming from, or "uid2" is coming from, or where the column "like_id" is coming from. Those column names do not appear in your sample table. Your example query references two different table names (table and likes), yet you only show data for one example table, and that table does not have a column named like_id.
Let's assume that "value1" and "uid2" in your query are literals, or bind parameters supplied to the query, which seems reasonable given that your specification variously mentions the values 1, 2, 3 and 4. We're still left with the "like_id" column. Given that it's referenced in the SELECT list of the IN subquery, we're going to presume it's a column in the "likes" table, and given that it's referenced in the outer query, we're going to assume it's also a column in the (unfortunately named) table table.
(Bottom line: it's not at all clear how your query is returning a "correct" result, given that you've made it impossible to replicate a working test case.)
Given a single table, as shown in your example data, e.g.
CREATE TABLE likes (id INT, name VARCHAR(4), uid INT);
INSERT INTO likes VALUES (1,'bil',3),(2,'test',3),(3,'test',4)
,(4,'test',4),(5,'bil',5),(6,'bil',5);
ALTER TABLE likes ADD PRIMARY KEY (id);
-- non-unique index: the sample data repeats some (uid, name) pairs
ALTER TABLE likes ADD INDEX likes_ix (uid, name);
Assuming that we're running a query against that single table, and that we're matching "likes" associated with uid=3 to "likes" associated with uid=4, and that the matching is done on the "name" column, then
SELECT t.id
FROM `likes` t
WHERE t.uid = 3
AND EXISTS
( SELECT 1
FROM `likes` s
WHERE s.name = t.name
AND s.uid = 4
)
That will return the id of the row from the likes table for uid=3 where we also find a row in the likes table for uid=4 with a matching name value.
Given a limited number of rows to be inspected from the likes table on the outer query, that gives a limited number of times a correlated subquery would need to be run, which should give reasonable performance.
For large sets, a join operation generally performs better to return an equivalent result:
SELECT t.id
FROM `likes` t
JOIN `likes` s
ON s.name = t.name
AND s.uid = 4
WHERE t.uid = 3
GROUP
BY t.id
The key to optimum performance for either query is going to be appropriate indexes.
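For anyone who wants to confirm that the two queries agree on the sample data, here is a quick sketch using Python's sqlite3 module. SQLite is my stand-in for MySQL, and the index is created as a plain non-unique index on (uid, name). Both forms should return the row the question calls "record number 2".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE likes (id INTEGER PRIMARY KEY, name TEXT, uid INTEGER)")
conn.executemany(
    "INSERT INTO likes VALUES (?, ?, ?)",
    [(1, "bil", 3), (2, "test", 3), (3, "test", 4),
     (4, "test", 4), (5, "bil", 5), (6, "bil", 5)],
)
# Index that supports lookups by uid and then by name.
conn.execute("CREATE INDEX likes_ix ON likes (uid, name)")

exists_sql = """
SELECT t.id FROM likes t
WHERE t.uid = ?
  AND EXISTS (SELECT 1 FROM likes s WHERE s.name = t.name AND s.uid = ?)
ORDER BY t.id
"""

join_sql = """
SELECT t.id FROM likes t
JOIN likes s ON s.name = t.name AND s.uid = ?
WHERE t.uid = ?
GROUP BY t.id
ORDER BY t.id
"""

# Parameters follow the order of the placeholders in each statement.
exists_ids = [row[0] for row in conn.execute(exists_sql, (3, 4))]
join_ids = [row[0] for row in conn.execute(join_sql, (4, 3))]
```

The GROUP BY in the join form matters here: uid 4 has two "test" rows, so without it id 2 would be returned twice.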
I have two tables (id_test, test). Each of them has an ID column, which is unique, and two entries with the same ID in the two tables describe the same thing. Now, I have another column in one of the tables (id_test) that should also be unique, so I want to eliminate duplicates according to this other column; let's call it YD.
To identify the duplicates I used
SELECT ID, YD AS x, COUNT(*) AS y
FROM id_test
GROUP BY x
HAVING y>1;
Now I want to delete these entries in both tables. How can I do it?
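This query shows the first (lowest) ID for every YD in the id_test table. MIN(ID) is used because a bare ID next to GROUP BY YD returns an arbitrary row and is rejected under ONLY_FULL_GROUP_BY:
SELECT MIN(ID) AS ID, YD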
FROM id_test
GROUP BY YD
and these are the rows you have to keep. The following query returns the IDs you have to delete:
SELECT id_test.ID
FROM id_test LEFT JOIN (select MIN(ID) AS ID, YD from id_test group by YD) id_test_keep
on id_test.ID=id_test_keep.ID and id_test.YD = id_test_keep.YD
WHERE id_test_keep.ID IS NULL
I think I need more details about your tables, but what I think you need is this:
DELETE FROM test
WHERE
test.ID IN (
SELECT id_test.ID
FROM id_test LEFT JOIN (select MIN(ID) AS ID, YD from id_test group by YD) id_test_keep
on id_test.ID=id_test_keep.ID and id_test.YD = id_test_keep.YD
WHERE id_test_keep.ID IS NULL)
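Since the question asks for the rows to disappear from both tables, one way to sketch it is to materialise the IDs to delete first and then run one DELETE per table against that list. The example below uses Python's sqlite3 module as a stand-in for MySQL. The sample rows, the payload column and the ids_to_delete table are all hypothetical names introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE id_test (ID INTEGER PRIMARY KEY, YD TEXT)")
conn.execute("CREATE TABLE test (ID INTEGER PRIMARY KEY, payload TEXT)")
# Hypothetical rows: YD 'a' and 'c' each appear twice.
conn.executemany(
    "INSERT INTO id_test VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "b"), (4, "c"), (5, "c")],
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 6)],
)

# "with conn" commits on success and rolls back if a statement raises,
# so the two tables are not left half-cleaned.
with conn:
    # Capture the IDs to remove before either table changes.
    conn.execute(
        "CREATE TEMP TABLE ids_to_delete AS "
        "SELECT ID FROM id_test "
        "WHERE ID NOT IN (SELECT MIN(ID) FROM id_test GROUP BY YD)"
    )
    conn.execute("DELETE FROM test WHERE ID IN (SELECT ID FROM ids_to_delete)")
    conn.execute("DELETE FROM id_test WHERE ID IN (SELECT ID FROM ids_to_delete)")

test_ids = [row[0] for row in conn.execute("SELECT ID FROM test ORDER BY ID")]
id_test_ids = [row[0] for row in conn.execute("SELECT ID FROM id_test ORDER BY ID")]
```

Capturing the IDs first avoids the second DELETE seeing a table that the first one has already changed.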
As documented under ALTER TABLE Syntax (emphasis added):
IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.
Therefore:
ALTER IGNORE TABLE id_test ADD UNIQUE (YD)
I don't think you should use SELECT ... IN here, because with a large amount of data it becomes impractical.
Instead, clone the table with the same structure and insert only the non-duplicate data into it.
INSERT INTO test_new (ID, YD) SELECT t.ID, t.YD FROM test t LEFT JOIN test_id ti ON t.ID = ti.id WHERE ti.id IS NULL;
After that, drop table test and rename test_new to test.