Deleting duplicate rows with SQL, CTE and everything else not working - mysql

I'm trying to delete a lot of duplicate rows from a SQL table of business codes and business descriptions, but I have to keep one row for each entry. The table has about 1925 rows, and 345 of them have duplicate or triple entries. This is the query I used to find the duplicate and triple entries:
SELECT codice_ateco_2007, descrizione_ateco_2007, COUNT(*) AS CNT FROM codici_ateco_il_leone GROUP BY codice_ateco_2007, descrizione_ateco_2007 HAVING CNT > 1;
I tried the following approaches, but none of them work. When I use a CTE, I get an error saying unknown function after the WITH statement, and when I use other code like
DELETE
FROM MyDuplicateTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyDuplicateTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
it doesn't work either: MySQL says I cannot select from the table inside the IN subquery.
Are CTEs and the other code out of date, or what? How can I fix this? By the way, the codici_ateco_il_leone table also has an id PRIMARY KEY.

One method uses row_number() with a join (window functions, like CTEs, require MySQL 8.0 or later):
delete mdt
from MyDuplicateTable mdt join
(select mdt2.*,
row_number() over (partition by DuplicateColumn1, DuplicateColumn2, DuplicateColumn3 order by id) as seqnum
from MyDuplicateTable mdt2
) mdt2
on mdt2.id = mdt.id
where seqnum > 1;
A similar approach uses aggregation:
delete mdt
from MyDuplicateTable mdt join
(select DuplicateColumn1, DuplicateColumn2, DuplicateColumn3, min(id) as min_id
from MyDuplicateTable mdt2
group by DuplicateColumn1, DuplicateColumn2, DuplicateColumn3
having count(*) > 1
) mdt2
using (DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
where mdt.id > mdt2.min_id;
Both of these assume that id is a globally unique identifier for each row. That seems reasonable based on the context. However, both can be tweaked if the id can be duplicated for different values of the three key columns.
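The row_number() idea above can be tried out on a small copy of the data before touching the real table. The following is a minimal sketch, assuming Python's bundled sqlite3 module as a stand-in for MySQL (SQLite 3.25 or later is needed for window functions) and a handful of made-up sample rows. SQLite has no multi-table DELETE, so the numbered rows are fed to the DELETE through an IN subquery instead of a join; the numbering logic is the same.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE codici_ateco_il_leone ("
    " id INTEGER PRIMARY KEY,"
    " codice_ateco_2007 TEXT,"
    " descrizione_ateco_2007 TEXT)"
)
# Made-up sample rows: one entry three times, one twice, one once.
conn.executemany(
    "INSERT INTO codici_ateco_il_leone"
    " (codice_ateco_2007, descrizione_ateco_2007) VALUES (?, ?)",
    [
        ("01.11", "A"), ("01.11", "A"), ("01.11", "A"),
        ("01.12", "B"), ("01.12", "B"),
        ("01.13", "C"),
    ],
)

# Number the rows inside each duplicate group by id, then delete
# everything except the first row of each group (seqnum = 1).
conn.execute(
    "DELETE FROM codici_ateco_il_leone WHERE id IN ("
    " SELECT id FROM ("
    "  SELECT id, ROW_NUMBER() OVER ("
    "   PARTITION BY codice_ateco_2007, descrizione_ateco_2007"
    "   ORDER BY id) AS seqnum"
    "  FROM codici_ateco_il_leone) AS numbered"
    " WHERE seqnum > 1)"
)

remaining = conn.execute(
    "SELECT id, codice_ateco_2007 FROM codici_ateco_il_leone ORDER BY id"
).fetchall()

# Re-run the duplicate-finding query from the question; it should be empty now.
leftover_duplicates = conn.execute(
    "SELECT codice_ateco_2007, descrizione_ateco_2007, COUNT(*) AS CNT"
    " FROM codici_ateco_il_leone"
    " GROUP BY codice_ateco_2007, descrizione_ateco_2007"
    " HAVING COUNT(*) > 1"
).fetchall()
```

Ordering by id inside each partition means the row with the lowest id survives in every group.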

Your DELETE statement is fine and works in almost every DBMS. MySQL is the exception: it refuses to let a DELETE select from its own target table in a subquery. The workaround is simple: replace FROM sometable with FROM (SELECT * FROM sometable) somealias:
DELETE
FROM MyDuplicateTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM (SELECT * FROM MyDuplicateTable) t
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3
);
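To check that the rewritten statement still keeps exactly one row per group, it can be run against a throwaway table. This is a minimal sketch, assuming Python's sqlite3 module and invented sample rows. SQLite does not raise the MySQL error in the first place, so the sketch only confirms which rows the wrapped query keeps, not that the workaround silences the error.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE codici_ateco_il_leone ("
    " id INTEGER PRIMARY KEY,"
    " codice_ateco_2007 TEXT,"
    " descrizione_ateco_2007 TEXT)"
)
# Made-up sample rows with duplicate and triple entries.
conn.executemany(
    "INSERT INTO codici_ateco_il_leone"
    " (codice_ateco_2007, descrizione_ateco_2007) VALUES (?, ?)",
    [
        ("01.11", "A"), ("01.11", "A"), ("01.11", "A"),
        ("01.12", "B"), ("01.12", "B"),
        ("01.13", "C"),
    ],
)

# Same shape as the statement above: the target table is wrapped in a
# derived table (alias t) inside the NOT IN subquery.
conn.execute(
    "DELETE FROM codici_ateco_il_leone WHERE id NOT IN ("
    " SELECT MAX(id)"
    " FROM (SELECT * FROM codici_ateco_il_leone) t"
    " GROUP BY codice_ateco_2007, descrizione_ateco_2007)"
)

kept_ids = [
    row[0]
    for row in conn.execute("SELECT id FROM codici_ateco_il_leone ORDER BY id")
]
```

Because the subquery uses MAX(id), the newest row of each group is the one that survives; swapping in MIN(id) would keep the oldest instead.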

Related

Select column from selected column subquery [duplicate]

I am running this query on MySQL
SELECT ID FROM (
SELECT ID, msisdn
FROM (
SELECT * FROM TT2
)
);
and it is giving this error:
Every derived table must have its own alias.
What's causing this error?
Every derived table (AKA sub-query) must indeed have an alias. I.e. each query in brackets must be given an alias (AS whatever), which can then be used to refer to it in the rest of the outer query.
SELECT ID FROM (
SELECT ID, msisdn FROM (
SELECT * FROM TT2
) AS T
) AS T
In your case, of course, the entire query could be replaced with:
SELECT ID FROM TT2
I think it's asking you to do this:
SELECT ID
FROM (SELECT ID,
msisdn
FROM (SELECT * FROM TT2) as myalias
) as anotheralias;
But why would you write this query in the first place?
Here's a different example that can't be rewritten without aliases (you can't GROUP BY DISTINCT).
Imagine a table called purchases that records purchases made by customers at stores, i.e. it's a many-to-many table, and the software needs to know which customers have made purchases at more than one store:
SELECT DISTINCT customer_id, SUM(1)
FROM ( SELECT DISTINCT customer_id, store_id FROM purchases)
GROUP BY customer_id HAVING 1 < SUM(1);
...will fail with the error Every derived table must have its own alias. To fix:
SELECT DISTINCT customer_id, SUM(1)
FROM ( SELECT DISTINCT customer_id, store_id FROM purchases) AS custom
GROUP BY customer_id HAVING 1 < SUM(1);
(Note the AS custom alias.)
I arrived here because I thought I should check in SO if there are adequate answers, after a syntax error that gave me this error, or if I could possibly post an answer myself.
OK, the answers here explain what this error is, so not much more to say, but nevertheless I will give my 2 cents, using my own words:
This error occurs because a subquery in the FROM clause generates a new table.
That's what a derived table is, and as such, it needs an alias (a name by which the outer query can refer to it).
Given the following hypothetical query:
SELECT id, key1
FROM (
SELECT t1.ID id, t2.key1 key1, t2.key2 key2, t2.key3 key3
FROM table1 t1
LEFT JOIN table2 t2 ON t1.id = t2.id
WHERE t2.key3 = 'some-value'
) AS tt
In the end, the whole subquery inside the FROM clause produces a table that is aliased as tt and has the columns id, key1, key2, key3.
Then, with the outer SELECT, we select id and key1 from that generated table (tt).
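The purchases example above can be run end to end as a quick check of what an aliased derived table returns. This is a minimal sketch, assuming Python's sqlite3 module and invented sample rows; COUNT(*) is swapped in for SUM(1) purely for readability. SQLite itself does not insist on the alias, so the sketch shows the result of the aliased query, not the MySQL error.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer_id INTEGER, store_id INTEGER)")
# Made-up rows: customer 1 shops at two stores, customer 2 at one store
# (twice), customer 3 at three stores.
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 20), (2, 10), (2, 10), (3, 30), (3, 31), (3, 32)],
)

# The derived table (aliased AS custom) removes repeat visits to the same
# store, so COUNT(*) in the outer query counts distinct stores per customer.
multi_store_customers = conn.execute(
    "SELECT customer_id, COUNT(*) AS store_count"
    " FROM (SELECT DISTINCT customer_id, store_id FROM purchases) AS custom"
    " GROUP BY customer_id"
    " HAVING COUNT(*) > 1"
    " ORDER BY customer_id"
).fetchall()
```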

Display the orders in which more than one article is ordered

I tried to display the entries in a table where the order has more than one article, but it's not working the way I tried it. Can somebody show me what's wrong?
Here's what I tried:
SELECT *
FROM TableX
WHERE (SELECT COUNT(Ordernumber) FROM TableX AS a WHERE a>1);
One option is to use a subquery to identify the order numbers having more than one article, then join this subquery to your original table to obtain the full records for these matching orders.
SELECT t1.*
FROM TableX t1
INNER JOIN
(
SELECT Ordernumber
FROM TableX
GROUP BY Ordernumber
HAVING COUNT(*) > 1
) t2
ON t1.Ordernumber = t2.Ordernumber
This query assumes that all articles within a given order are unique. If duplicate articles could occur, and you would not count duplicates, then you can use the following HAVING clause instead:
HAVING COUNT(DISTINCT article) > 1
Another option:
SELECT *
FROM TableX
WHERE Ordernumber IN
(
SELECT Ordernumber
FROM TableX
GROUP BY Ordernumber
HAVING COUNT(*) > 1
)
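The difference between COUNT(*) and COUNT(DISTINCT article) is easiest to see on sample data. The following is a minimal sketch, assuming Python's sqlite3 module, an article column as in the HAVING clause above, and made-up rows; the helper orders_matching is a hypothetical name introduced only for this illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableX (Ordernumber INTEGER, article TEXT)")
# Made-up rows: order 103 contains the same article twice.
conn.executemany(
    "INSERT INTO TableX VALUES (?, ?)",
    [
        (100, "pen"), (100, "ink"),
        (101, "pad"),
        (102, "cup"), (102, "cup"), (102, "lid"),
        (103, "mug"), (103, "mug"),
    ],
)


def orders_matching(having_clause):
    """Run the join query with the given HAVING condition and return
    the order numbers of the rows it selects (hypothetical helper)."""
    sql = (
        "SELECT DISTINCT t1.Ordernumber FROM TableX t1 "
        "INNER JOIN (SELECT Ordernumber FROM TableX "
        "GROUP BY Ordernumber HAVING " + having_clause + ") t2 "
        "ON t1.Ordernumber = t2.Ordernumber "
        "ORDER BY t1.Ordernumber"
    )
    return [row[0] for row in conn.execute(sql)]


# Counting rows treats the repeated 'mug' as two articles.
by_rows = orders_matching("COUNT(*) > 1")
# Counting distinct articles does not.
by_distinct = orders_matching("COUNT(DISTINCT article) > 1")
```

Order 103 appears only in by_rows, which is the case the COUNT(DISTINCT article) variant is meant to exclude.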

SQL find distinct and show other columns

I have read many replies to similar questions but cannot seem to apply them to my situation. I have a table that averages 10,000 records and is ever changing. It contains a column called deviceID which has about 20 unique values, another called dateAndTime, and many others including status1 and status2. I need to isolate one instance of each deviceID, showing the record that has the most recent dateAndTime. This works great using:
select DISTINCT deviceID, MAX(dateAndTime)
from MyTable
Group By deviceID
ORDER BY MAX(dateAndTime) DESC
(I have noticed omitting DISTINCT from the above statement also yields the same result)
However, I cannot expand this statement to include the status fields without incurring errors in the statement or incorrect results. I have tried using IN and EXISTS syntax to isolate rows, all without luck. I am wondering how I can nest or rewrite this query so that the results will display the unique deviceIDs, the date of the most recent record, and the corresponding status fields associated with those unique records.
If you can guarantee that the DeviceID + DateAndTime is UNIQUE you can do the following:
SELECT *
FROM
MyTable as T1,
(SELECT DeviceID, max(DateAndTime) as mx FROM MyTable group by DeviceID) as T2
WHERE
T1.DeviceID = T2.DeviceID AND
T1.DateAndTime = T2.mx
So basically, you do a GROUP BY on the DeviceID (NOTE: a GROUP BY is normally paired with an aggregate function; we are using MAX in this case).
Then you join that query with the table, matching on DeviceID and DateAndTime in the WHERE clause.
Side note: GROUP BY returns distinct rows with or without DISTINCT, because each group produces exactly one row.
Maybe:
SELECT a.*
FROM( SELECT DISTINCT *,
ROW_NUMBER() OVER (PARTITION BY deviceID ORDER BY dateAndTime DESC) as rown
FROM MyTable ) a
WHERE a.rown = 1
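The join-on-MAX approach can be checked on a few rows to see that the status columns come from the newest record of each device. This is a minimal sketch, assuming Python's sqlite3 module, made-up sample rows, and timestamps stored as ISO-formatted text (which sort correctly as strings). It uses explicit JOIN ... ON syntax instead of the comma join, which is equivalent here.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real DBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable ("
    " deviceID TEXT, dateAndTime TEXT, status1 TEXT, status2 TEXT)"
)
# Made-up rows: device A has two records, device B has one.
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        ("A", "2024-01-01 10:00", "ok", "ok"),
        ("A", "2024-01-02 09:00", "fail", "ok"),
        ("B", "2024-01-01 08:00", "ok", "warn"),
    ],
)

# T2 finds the latest timestamp per device; joining back to MyTable on
# both columns pulls in the status fields of exactly that record.
latest_rows = conn.execute(
    "SELECT T1.deviceID, T1.dateAndTime, T1.status1, T1.status2"
    " FROM MyTable AS T1"
    " JOIN (SELECT deviceID, MAX(dateAndTime) AS mx"
    "       FROM MyTable GROUP BY deviceID) AS T2"
    " ON T1.deviceID = T2.deviceID AND T1.dateAndTime = T2.mx"
    " ORDER BY T1.deviceID"
).fetchall()
```

If two records of the same device shared the same latest timestamp, both would be returned, which is why the uniqueness guarantee mentioned above matters.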

Using sql to find duplicate records and delete in same operation

I'm using this SQL statement to find duplicate records:
SELECT id,
user_id,
activity_type_id,
source_id,
source_type,
COUNT(*) AS cnt
FROM activities
GROUP BY id, user_id, activity_type_id, source_id, source_type
HAVING COUNT(*) > 1
However, I want to not only find, but delete in the same operation.
delete from activities where id not in (select max(id) from activities group by ....)
Thanks to @OMG Ponies and his other post, here is a revised solution (not exactly the same). I assume here that it does not matter which specific rows are left undeleted, and that id is the primary key.
In my example, I set up just one extra column, name, for testing, but it can easily be extended to more columns via the GROUP BY clause.
DELETE a FROM activities a
LEFT JOIN (SELECT MAX(id) AS id FROM activities GROUP BY name) uniqId
ON a.id=uniqId.id WHERE uniqId.id IS NULL;
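Before running a DELETE like this, the same anti-join can be run as a SELECT to preview which rows would go. This is a minimal sketch, assuming Python's sqlite3 module, the single name column from the example above, and made-up sample rows. SQLite does not support the multi-table DELETE a FROM ... JOIN syntax, so only the preview is shown.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (id INTEGER PRIMARY KEY, name TEXT)")
# Made-up rows: ids 1..6 are assigned in insertion order.
conn.executemany(
    "INSERT INTO activities (name) VALUES (?)",
    [("login",), ("login",), ("post",), ("login",), ("post",), ("share",)],
)

# Same LEFT JOIN ... IS NULL anti-join as the DELETE above, run as a
# SELECT: rows with no match in uniqId are the ones that would be deleted.
ids_to_delete = [
    row[0]
    for row in conn.execute(
        "SELECT a.id FROM activities a"
        " LEFT JOIN (SELECT MAX(id) AS id FROM activities GROUP BY name) uniqId"
        " ON a.id = uniqId.id"
        " WHERE uniqId.id IS NULL"
        " ORDER BY a.id"
    )
]
```

The rows that survive are the ones with the highest id for each name (4, 5 and 6 in this sample).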