How to identify and select all duplicated entries - mysql

I am trying to identify and select all duplicated entries stored in a column of my MySQL table.
I have tried using this query:
SELECT id, link, COUNT(*)
FROM linkuri
HAVING COUNT(*)>1
LIMIT 0 , 30
The problem is that it returns 0 results, yet I checked a few pages manually and saw some duplicate entries.
What I want is to find and delete all duplicated entries in the link column.

You are probably looking for
SELECT a.id, a.link, b.cnt
FROM linkuri a
INNER JOIN
( SELECT link, COUNT(*) AS cnt FROM linkuri GROUP BY link HAVING COUNT(*) >1 )b
ON (b.link = a.link)

The problem with your query is that you're not grouping by anything. To find duplicates, you have to group the rows by one or more columns and then get the count for each group. The HAVING clause works like a WHERE clause applied to the groups: http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html
SELECT MIN(id) AS id, link, COUNT(*)
FROM linkuri
GROUP BY link
HAVING COUNT(*)>1
LIMIT 0 , 30
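To see both answers side by side, here is a minimal sketch that runs the grouped query and the join-back query against a tiny in-memory table. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL, and the sample links are made up for illustration; only the table and column names (linkuri, id, link) come from the question.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE linkuri (id INTEGER PRIMARY KEY, link TEXT)")
conn.executemany(
    "INSERT INTO linkuri (id, link) VALUES (?, ?)",
    [(1, "a.com"), (2, "b.com"), (3, "a.com"),
     (4, "c.com"), (5, "a.com"), (6, "b.com")],
)

# One row per duplicated link, with the number of occurrences.
duplicate_links = conn.execute(
    """
    SELECT link, COUNT(*) AS cnt
    FROM linkuri
    GROUP BY link
    HAVING COUNT(*) > 1
    ORDER BY link
    """
).fetchall()

# Every individual row whose link is duplicated (the join-back form).
duplicate_rows = conn.execute(
    """
    SELECT a.id, a.link, b.cnt
    FROM linkuri a
    INNER JOIN (
        SELECT link, COUNT(*) AS cnt
        FROM linkuri
        GROUP BY link
        HAVING COUNT(*) > 1
    ) b ON b.link = a.link
    ORDER BY a.id
    """
).fetchall()

print(duplicate_links)
print(duplicate_rows)
```

The first result is useful for a quick overview; the second is the one to start from if you want the ids of the rows to delete.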

Related

Deleting duplicate rows with SQL, CTE and everything else not working

I'm trying to delete a lot of duplicate rows from a SQL table of business codes and business descriptions, but I have to keep one row for each entry. The table has about 1925 rows, and 345 of them have duplicate or triple entries. This is the query I used to find the duplicate and triple entries:
SELECT codice_ateco_2007, descrizione_ateco_2007, COUNT(*) AS CNT FROM codici_ateco_il_leone GROUP BY codice_ateco_2007, descrizione_ateco_2007 HAVING CNT > 1;
I tried the following approaches, but none of them work. When I use a CTE, I get an error saying unknown function after the WITH statement, and when I use other code like
DELETE
FROM MyDuplicateTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyDuplicateTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
it doesn't work either: it says I cannot select from the table inside the IN subquery.
Are CTEs and the other code out of date, or what? How can I fix this? By the way, the codici_ateco_il_leone table also has an id PRIMARY KEY.
One method is row_number() with a join (window functions require MySQL 8.0 or later):
delete mdt
from MyDuplicateTable mdt join
(select mdt2.*,
row_number() over (partition by DuplicateColumn1, DuplicateColumn2, DuplicateColumn3 order by id) as seqnum
from MyDuplicateTable mdt2
) mdt2
on mdt2.id = mdt.id
where seqnum > 1;
A similar approach uses aggregation:
delete mdt
from MyDuplicateTable mdt join
(select DuplicateColumn1, DuplicateColumn2, DuplicateColumn3, min(id) as min_id
from MyDuplicateTable mdt2
group by DuplicateColumn1, DuplicateColumn2, DuplicateColumn3
having count(*) > 1
) mdt2
using (DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
where mdt.id > mdt2.min_id;
Both of these assume that id is a global unique identifier for each row. That seems reasonable based on the context. However, both can be tweaked if the id can be duplicated for different values of the three key columns.
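Before running a delete like the aggregation version, it can help to preview which ids it would remove. The sketch below turns that join into a SELECT and runs it on SQLite through Python's sqlite3 module, as a stand-in for MySQL (SQLite has no multi-table DELETE, so only the selection logic is shown). The sample rows are hypothetical; the table and column names are the placeholders used above.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyDuplicateTable ("
    "id INTEGER PRIMARY KEY, "
    "DuplicateColumn1 TEXT, DuplicateColumn2 TEXT, DuplicateColumn3 TEXT)"
)
conn.executemany(
    "INSERT INTO MyDuplicateTable VALUES (?, ?, ?, ?)",
    [
        (1, "x", "y", "z"),
        (2, "x", "y", "z"),  # duplicate of id 1
        (3, "p", "q", "r"),
        (4, "x", "y", "z"),  # duplicate of id 1
        (5, "p", "q", "s"),
    ],
)

# Same join and filter as the aggregation-based delete, but as a SELECT:
# every row whose id is greater than the smallest id in its duplicate group.
ids_to_delete = [
    row[0]
    for row in conn.execute(
        """
        SELECT mdt.id
        FROM MyDuplicateTable mdt
        JOIN (
            SELECT DuplicateColumn1, DuplicateColumn2, DuplicateColumn3,
                   MIN(id) AS min_id
            FROM MyDuplicateTable
            GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3
            HAVING COUNT(*) > 1
        ) mdt2 USING (DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
        WHERE mdt.id > mdt2.min_id
        ORDER BY mdt.id
        """
    )
]

print(ids_to_delete)
```

If the previewed ids look right, the same join and WHERE clause can be used in the delete itself.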
Your delete statement is fine and works in just about every DBMS, except MySQL, where you get this stupid error. The solution is simple: replace from sometable with from (select * from sometable) somealias:
DELETE
FROM MyDuplicateTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM (SELECT * FROM MyDuplicateTable) t
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3
);
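As a rough check of what this statement leaves behind, here is a small sketch that runs it on SQLite through Python's sqlite3 module. SQLite is only a stand-in: it does not have MySQL's restriction on selecting from the table being deleted from, so this shows which rows survive, not the error itself. The sample rows are made up.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyDuplicateTable ("
    "id INTEGER PRIMARY KEY, "
    "DuplicateColumn1 TEXT, DuplicateColumn2 TEXT, DuplicateColumn3 TEXT)"
)
conn.executemany(
    "INSERT INTO MyDuplicateTable VALUES (?, ?, ?, ?)",
    [
        (1, "x", "y", "z"),
        (2, "x", "y", "z"),
        (3, "p", "q", "r"),
        (4, "x", "y", "z"),
        (5, "p", "q", "s"),
    ],
)

# Keep only the row with the highest id in each group of identical values.
# The inner (SELECT * FROM ...) t is the derived-table wrapper from above.
conn.execute(
    """
    DELETE FROM MyDuplicateTable
    WHERE id NOT IN (
        SELECT MAX(id)
        FROM (SELECT * FROM MyDuplicateTable) t
        GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3
    )
    """
)
conn.commit()

remaining = conn.execute(
    "SELECT id, DuplicateColumn1, DuplicateColumn2, DuplicateColumn3 "
    "FROM MyDuplicateTable ORDER BY id"
).fetchall()

print(remaining)
```

Note that this variant keeps the newest row (MAX(id)) of each group; swap in MIN(id) to keep the oldest instead.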

MySQL database | querying count() and select at the same time

I am using MySQL Workbench 5.7 to run this.
I am trying to get the result of this query:
SELECT COUNT(Users) FROM UserList.custumers;
and this query:
SELECT Users FROM UserList.custumers;
in one result from the same table, meaning I want a list of users in one column and the total number of users in the other column.
When I try this:
SELECT Users , COUNT(Users) FROM UserList.custumers;
I get a single row with the right count but only the first user in my list.
You can use a cross join, since you know the count query will return exactly one row, whose value you want repeated on every row.
SELECT users, userCount
FROM userlist.custumers
CROSS JOIN (SELECT count(*) AS userCount FROM userlist.custumers) x
Or you can run the count in the select list. I prefer the first, as the count only has to be done once.
SELECT users, (SELECT count(*) cnt FROM userlist.custumers) as userCount
FROM userlist.custumers
Or, in an environment supporting window functions (MySQL 8.0 and later, but not 5.7), you could use count(*) over (partition by 1) as userCount.
The reason you're getting one row is MySQL's extension of GROUP BY, which picks a single value from non-aggregated columns to display when you use aggregation without a GROUP BY clause. If you add a GROUP BY to your select, you will not get the count of all users. Thus the need for the inline select or the cross join.
Consider: -- 1 record not all users
SELECT Users , COUNT(Users) FROM UserList.custumers;
vs --all users wrong count
SELECT Users , COUNT(Users) FROM UserList.custumers group by users;
vs -- what I believe you're after
SELECT Users, x.usercount FROM UserList.custumers
CROSS JOIN (Select count(*) UserCount from userlist.custumers) x
Use a subquery in SELECT.
Select Users,
(SELECT COUNT(Users) FROM UserList.custumers) as total
FROM UserList.custumers;
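To make the difference concrete, here is a minimal sketch of the cross join version on a three-row table. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL, so the UserList schema prefix is dropped, and the user names are made up.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the user names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE custumers (Users TEXT)")
conn.executemany(
    "INSERT INTO custumers (Users) VALUES (?)",
    [("alice",), ("bob",), ("carol",)],
)

# The derived table x has exactly one row (the total), so the cross join
# repeats that total next to every user.
rows = conn.execute(
    """
    SELECT c.Users, x.UserCount
    FROM custumers c
    CROSS JOIN (SELECT COUNT(*) AS UserCount FROM custumers) x
    ORDER BY c.Users
    """
).fetchall()

print(rows)
```

Each user appears once, and the second column carries the same total on every row.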

MySQL look for duplicates on multiple fields

I have a MySQL database with the following fields:
id, email, first_name, last_name
I want to run an SQL query that displays rows where the combination of id and email exists more than once.
Basically, each id and email pair should appear in only one row, and I would like to run a query to check for possible duplicates.
If you just want to return the id and email that are duplicated, you can just use a GROUP BY query:
SELECT id, email
FROM yourtable
GROUP BY id, email
HAVING COUNT(*)>1
If you also want to return the full rows, you have to join the previous query back:
SELECT yourtable.*
FROM
yourtable INNER JOIN (
SELECT id, email
FROM yourtable
GROUP BY id, email
HAVING COUNT(*)>1
) s
ON yourtable.id = s.id AND yourtable.email=s.email
You'll want something like this:
select field1,field2,field3, count(*)
from table_name
group by field1,field2,field3
having count(*) > 1
See also this question.
You can search for all ids that meet a specific count by grouping them and using a having clause like this:
SELECT id, COUNT(*) AS totalCount
FROM myTable
GROUP BY id
HAVING COUNT(*) > 1;
Anything this query returns has a duplicate. To check for duplicate emails, you can just change the column you're selecting.
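Here is a small sketch of the two-column case, returning the full rows for every duplicated (id, email) pair. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL; the table name yourtable is the placeholder used above and the people in it are made up.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (id INTEGER, email TEXT, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [
        (1, "a@x.org", "Ann", "Lee"),
        (1, "a@x.org", "Anne", "Lee"),  # same id AND same email: a duplicate pair
        (2, "b@x.org", "Bob", "Ray"),
        (3, "c@x.org", "Cy", "Fox"),
        (3, "d@x.org", "Cy", "Fox"),    # same id, different email: not a duplicate pair
    ],
)

# Group on both columns, then join back to fetch the complete rows.
full_duplicate_rows = conn.execute(
    """
    SELECT t.id, t.email, t.first_name, t.last_name
    FROM yourtable t
    INNER JOIN (
        SELECT id, email
        FROM yourtable
        GROUP BY id, email
        HAVING COUNT(*) > 1
    ) s ON t.id = s.id AND t.email = s.email
    ORDER BY t.first_name
    """
).fetchall()

print(full_duplicate_rows)
```

Grouping on id alone would also flag id 3 here, which is why the column list in GROUP BY has to match what you consider a duplicate.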

mysql - how do I get count of counts

I have a table with duplicate skus.
skua
skua
skub
skub
skub
skuc
skuc
skud
SELECT sku, COUNT(1) AS `Count` FROM products GROUP BY sku;
shows me all the skus and the number of times each one occurs
skua 2
skub 3
skuc 2
skud 1
I am trying to find how many there are with 2 duplicates, 3 duplicates etc.
i.e.
duplicated count
1 1 (skud)
2 2 (skua, and skuc)
3 1 (skub)
and I don't know how to write the SQL. I imagine it needs a subselect...
Thanks
Just use your current query as an inline view, and use the rows from that just like it was from a table.
e.g.
SELECT t.Count AS `duplicated`
, COUNT(1) AS `count`
FROM ( SELECT sku, COUNT(1) AS `Count` FROM products GROUP BY sku ) t
GROUP BY t.Count
MySQL refers to an inline view as a "derived table", and that name makes sense, when we understand how MySQL actually processes that. MySQL runs that inner query, and creates a temporary MyISAM table; once that is done, MySQL runs the outer query, using the temporary MyISAM table. (You'll see that if you run an EXPLAIN on the query.)
Above, I left your query just as you formatted it; I'd tend to reformat your query, so that entire query looks like this:
SELECT t.Count AS `duplicated`
, COUNT(1) AS `count`
FROM ( SELECT p.sku
, COUNT(1) AS `Count`
FROM products p
GROUP BY p.sku
) t
GROUP BY t.Count
(Just makes it easier for me to see the inner query, and easier to extract it and run it separately. And qualifying all column references (with a table alias or table name) is a best practice.)
select dup_count as duplicated,
count(*) as `count`,
group_concat(sku) as skus
from
(
SELECT sku, COUNT(1) AS dup_count
FROM products
GROUP BY sku
) tmp_tbl
group by dup_count
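For reference, this is a minimal sketch of the count-of-counts query run on the sample skus from the question. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL (both provide group_concat); the alias how_many is my own name, and the order of skus inside group_concat is not guaranteed, so the sketch sorts them in Python.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL. The skus are the ones listed
# in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT)")
conn.executemany(
    "INSERT INTO products (sku) VALUES (?)",
    [("skua",), ("skua",), ("skub",), ("skub",), ("skub",),
     ("skuc",), ("skuc",), ("skud",)],
)

# Inner query: occurrences per sku. Outer query: how many skus share
# each occurrence count, plus the list of those skus.
result = conn.execute(
    """
    SELECT dup_count AS duplicated,
           COUNT(*) AS how_many,
           group_concat(sku) AS skus
    FROM (
        SELECT sku, COUNT(1) AS dup_count
        FROM products
        GROUP BY sku
    ) tmp_tbl
    GROUP BY dup_count
    ORDER BY dup_count
    """
).fetchall()

# Map each occurrence count to (number of skus, sorted sku names).
histogram = {
    duplicated: (how_many, sorted(skus.split(",")))
    for duplicated, how_many, skus in result
}

print(histogram)
```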

Mysql select one of duplicate rows

I am trying to select the latest of the duplicate entries for each ticket_id in a MySQL table. My current query is something like this:
SELECT * ,
COUNT(*) AS cnt ,
ticket_id
FROM temp_tickets
GROUP BY ticket_id
ORDER BY id DESC
It gives the number of times a row is duplicated, but I am not able to select the latest one of those multiple rows.
Let's say I have 3 ticket_ids that were each duplicated 5 times; I now want to select the latest occurrence for each of these 3 ids.
Let me know if I have to be more specific.
Thanks
Here's one way (assuming "latest" means "greatest id")
SELECT temp_tickets.*
FROM temp_tickets
JOIN ( SELECT MAX(id) AS id
FROM temp_tickets
GROUP BY ticket_id
HAVING COUNT(*) > 1
) latest ON latest.id = temp_tickets.id
I suspect it might be possible to come up with a more efficient solution involving user variables but I'll leave that to someone else...
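Here is a minimal sketch of the MAX(id) join on a handful of rows, under the same assumption that "latest" means "greatest id". It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL; the note column and the sample tickets are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows and the `note`
# column are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp_tickets (id INTEGER PRIMARY KEY, ticket_id INTEGER, note TEXT)"
)
conn.executemany(
    "INSERT INTO temp_tickets VALUES (?, ?, ?)",
    [
        (1, 100, "first"),
        (2, 100, "second"),
        (3, 200, "only"),   # ticket 200 occurs once, so it is not returned
        (4, 300, "a"),
        (5, 100, "third"),
        (6, 300, "b"),
    ],
)

# For each ticket_id that occurs more than once, pick the greatest id,
# then join back to fetch that whole row.
latest_duplicates = conn.execute(
    """
    SELECT temp_tickets.*
    FROM temp_tickets
    JOIN (
        SELECT MAX(id) AS id
        FROM temp_tickets
        GROUP BY ticket_id
        HAVING COUNT(*) > 1
    ) latest ON latest.id = temp_tickets.id
    ORDER BY temp_tickets.id
    """
).fetchall()

print(latest_duplicates)
```

Dropping the HAVING clause would return the latest row for every ticket_id, duplicated or not.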