mysql get all records when looking for duplicates - mysql

I want to make a report of all the entries in a table where one column has duplicate entries. Let's assume we have a table like this:
customer_name | some_number
Tom           | 1
Steve         | 3
Chris         | 4
Tim           | 3
...
I want to show all the records that have some_number as a duplicate. I have used a query like this to show all the duplicate records:
select customer_name, some_number from table where some_number in (select some_number from table group by some_number having count(*) > 1) order by some_number;
This works for a small table, but the one I actually need to operate on is fairly large (30,000+ rows), and the query is taking forever. Does someone have a better way to do this?
Thanks!

Try this query:
SELECT t1.*
FROM (SELECT some_number, COUNT(*) AS nb
      FROM your_table
      GROUP BY some_number
      HAVING nb > 1
     ) t2, your_table t1
WHERE t1.some_number = t2.some_number
The query first uses GROUP BY to find duplicate records, then joins with the table to retrieve all fields.
Since HAVING is used, it will return only the records you are interested in, then do the join with your_table.
Be sure your table has an index on some_number if you want the query to be fast.
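To make the approach above easy to try, here is a minimal sketch that runs the same kind of join against the sample data from the question. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL; the index name idx_some_number is made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the SQL is plain standard SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (customer_name TEXT, some_number INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("Tom", 1), ("Steve", 3), ("Chris", 4), ("Tim", 3)],
)
# Index on the column that is grouped and joined (hypothetical index name).
conn.execute("CREATE INDEX idx_some_number ON your_table (some_number)")

# The derived table t2 holds only the duplicated values; joining back
# to the base table returns every row that carries one of them.
duplicate_rows = conn.execute(
    """
    SELECT t1.customer_name, t1.some_number
    FROM your_table t1
    JOIN (SELECT some_number
          FROM your_table
          GROUP BY some_number
          HAVING COUNT(*) > 1) t2
      ON t1.some_number = t2.some_number
    ORDER BY t1.some_number, t1.customer_name
    """
).fetchall()

print(duplicate_rows)  # [('Steve', 3), ('Tim', 3)]
```

Timings on SQLite will not match MySQL, so this only checks that the query returns the right rows, not that it is faster.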

Does this perform better? It joins on a table of some_number counts and then filters to include only those with a count > 1.
SELECT t.customer_name, t.some_number
FROM my_table t
INNER JOIN (
    SELECT some_number, COUNT(*) AS ct
    FROM my_table
    GROUP BY some_number ) dup ON t.some_number = dup.some_number
WHERE dup.ct > 1

Related

Select the same columns from a list of tables

I am looking to run this query on a list of tables.
SELECT Description,Code,count(*) as count
FROM table1
group by Description,code
having count(*) > 1
I will have to run this query on 30+ different tables. I was wondering if I could change the FROM clause and just list off the table names.
In addition, is there some functionality that will add the name of the source table in a separate column, to distinguish where the results came from?
Thanks in advance
You might use UNION ALL to put it together. Unless you need some dynamic table selection.
SELECT Description,Code,count(*) as count, 'table1' as tableName
FROM table1
group by Description,code
having count(*) > 1
UNION ALL
SELECT Description,Code,count(*) as count, 'table2' as tableName
FROM table2
group by Description,code
having count(*) > 1
...
Actually, I like @Shubhradeep Majumdar's version. It will generate more concise code.
SELECT Description,Code, Count(Code), tableName FROM (
SELECT Description,Code, 'table1' as tableName
FROM table1
UNION ALL
SELECT Description,Code, 'table2' as tableName
FROM table2
) tables
GROUP BY tableName, Description, Code
HAVING COUNT(Code) > 1
But there might be a little catch. It is more elegant code, but it might actually be slower than the first version: tableName is appended to every record before grouping, while in my first version it is added to data that has already been aggregated.
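With 30+ tables, writing the UNION ALL by hand gets tedious, so one option is to generate it. The sketch below builds the first variant (group per table, then union) from a list of table names. The helper name build_duplicates_query and the sample data are made up for illustration, and an in-memory SQLite database stands in for MySQL.

```python
import sqlite3


def build_duplicates_query(table_names):
    """Build one UNION ALL query reporting duplicate (Description, Code) pairs per table."""
    parts = []
    for name in table_names:
        # Table names cannot be bound as parameters, so accept plain identifiers only.
        if not name.isidentifier():
            raise ValueError(f"unexpected table name: {name!r}")
        parts.append(
            f"SELECT Description, Code, COUNT(*) AS cnt, '{name}' AS tableName "
            f"FROM {name} GROUP BY Description, Code HAVING COUNT(*) > 1"
        )
    return "\nUNION ALL\n".join(parts)


conn = sqlite3.connect(":memory:")
for name in ("table1", "table2"):
    conn.execute(f"CREATE TABLE {name} (Description TEXT, Code TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("apple", "A"), ("apple", "A"), ("pear", "P")],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [("plum", "L"), ("kiwi", "K"), ("kiwi", "K"), ("kiwi", "K")],
)

query = build_duplicates_query(["table1", "table2"])
results = sorted(conn.execute(query).fetchall())
print(results)  # [('apple', 'A', 2, 'table1'), ('kiwi', 'K', 3, 'table2')]
```

The generated text can also be printed and pasted into a MySQL client instead of being executed from Python.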
Carrying over from @Marek's answer, you could first append all the tables into one table with UNION ALL.
select *, 'tab1' as tabnm from tab1
union all
select *, 'tab2' as tabnm from tab2
union all
select *, 'tab3' as tabnm from tab3
-- and so on...
Then use your code to process that combined table.
This will save you a great deal of time.
EDITED with a column specifying the table name

mysql - how do I get count of counts

I have a table with duplicate skus.
skua
skua
skub
skub
skub
skuc
skuc
skud
SELECT sku, COUNT(1) AS `Count` FROM products GROUP BY sku;
shows me all the skus that have duplicates and the number of duplicates
skua 2
skub 3
skuc 2
skud 1
I am trying to find how many there are with 2 duplicates, 3 duplicates etc.
i.e.
duplicated count
1 1 (skud)
2 2 (skua, and skuc)
3 1 (skub)
and I don't know how to write the sql. I imagine it needs a subselect...
thanks
Just use your current query as an inline view, and use the rows from that just like it was from a table.
e.g.
SELECT t.Count AS `duplicated`
, COUNT(1) AS `count`
FROM ( SELECT sku, COUNT(1) AS `Count` FROM products GROUP BY sku ) t
GROUP BY t.Count
MySQL refers to an inline view as a "derived table", and that name makes sense once we understand how MySQL processes it. MySQL runs the inner query and stores its result in an internal temporary table (a MyISAM table in older versions); once that is done, MySQL runs the outer query against that temporary table. (You'll see that if you run an EXPLAIN on the query.)
Above, I left your query just as you formatted it; I'd tend to reformat your query, so that entire query looks like this:
SELECT t.Count AS `duplicated`
     , COUNT(1) AS `count`
  FROM ( SELECT p.sku
              , COUNT(1) AS `Count`
           FROM products p
          GROUP BY p.sku
       ) t
 GROUP BY t.Count
(Just makes it easier for me to see the inner query, and easier to extract it and run it separately. And qualifying all column references (with a table alias or table name) is a best practice.)
select dup_count as duplicated,
count(*) as `count`,
group_concat(sku) as skus
from
(
SELECT sku, COUNT(1) AS dup_count
FROM products
GROUP BY sku
) tmp_tbl
group by dup_count
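To check the "count of counts" idea against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module (an in-memory SQLite database standing in for MySQL). The inner query counts rows per sku and the outer query counts how many skus share each count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("skua",), ("skua",), ("skub",), ("skub",), ("skub",),
     ("skuc",), ("skuc",), ("skud",)],
)

# Inner query: rows per sku. Outer query: number of skus per row count.
count_of_counts = conn.execute(
    """
    SELECT t.dup_count AS duplicated, COUNT(*) AS cnt
    FROM (SELECT sku, COUNT(*) AS dup_count
          FROM products
          GROUP BY sku) t
    GROUP BY t.dup_count
    ORDER BY t.dup_count
    """
).fetchall()

print(count_of_counts)  # [(1, 1), (2, 2), (3, 1)]
```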

Get last but one row for each ID

I am using query like
select * from audittable where a_id IN (1,2,3,4,5,6,7,8);
For each ID it returns 5-6 records. I want to get the last but one record for each ID.
Can I do this in one SQL statement?
Try this query
SELECT
    *
FROM
    (SELECT
        @rn := IF(@prv = a_id, @rn + 1, 1) AS rId,
        @prv := a_id AS a_id
        -- , remaining columns
    FROM
        audittable
    JOIN
        (SELECT @rn := 0, @prv := 0) t
    WHERE
        a_id IN (1,2,3,4,5,6,7,8)
    ORDER BY
        a_id, <column> DESC) tmp -- replace <column> with the column that determines which record is the last one
WHERE
    rId = 2; -- rId 1 is the last record for each a_id, rId 2 is the last but one
If your table has a DateCreated column (or any column that stores the date and time at which a row was inserted), you may use a query like this, which returns the most recent record for each ID:
select at1.* from audittable at1 where
datecreated in (select max(datecreated) from audittable at2
where
at1.a_id = at2.a_id
);
You may also use the LIMIT clause.
I hope this makes sense and works for you.
In SQLite, suppose you have the columns a_id and b, so that each a_id has a set of b values, and you want
the latest/highest b (the maximum in terms of row_id, date, or another naturally increasing index):
SELECT MAX(b), *
FROM audittable
GROUP BY a_id
Here MAX picks the maximum b from each group.
The bad news is that MySQL doesn't associate MAX(b) with the other columns selected by *. But it can still be used for a simple table that has only the a_id and b columns.
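Since the question asks for the last but one record rather than the last one, here is a sketch of one portable way to get it: keep the rows that have exactly one newer row with the same a_id. This is a different technique from the answers above (a correlated subquery, no user variables). It assumes a datecreated column with distinct values per a_id, and the column names and sample rows are made up. An in-memory SQLite database stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical audit table: datecreated and note are assumed columns.
conn.execute("CREATE TABLE audittable (a_id INTEGER, datecreated TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO audittable VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", "first"),
        (1, "2024-01-02", "second"),
        (1, "2024-01-03", "third"),
        (2, "2024-02-01", "first"),
        (2, "2024-02-05", "second"),
        (3, "2024-03-01", "only"),
    ],
)

# A row is the last but one for its a_id when exactly one row is newer.
last_but_one = conn.execute(
    """
    SELECT a.a_id, a.datecreated, a.note
    FROM audittable a
    WHERE a.a_id IN (1, 2, 3)
      AND (SELECT COUNT(*)
           FROM audittable newer
           WHERE newer.a_id = a.a_id
             AND newer.datecreated > a.datecreated) = 1
    ORDER BY a.a_id
    """
).fetchall()

print(last_but_one)  # [(1, '2024-01-02', 'second'), (2, '2024-02-01', 'first')]
```

An a_id with a single record (3 in this sample) produces no row, since it has no last but one entry.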

Get number of duplicate rows resulting from a DISTINCT query

I have a table with rows where a, b, and c are commonly the same.
I have a query that gives me each unique record. I'm trying to get the count of the duplicate records for each distinct record returned.
SELECT DISTINCT
a,
b,
c,
COUNT(id) as counted
FROM
table
The COUNT here returns the count for all the records. What I was looking for was the count of records identical to the unique record.
SELECT a,b,c,COUNT(*) FROM table GROUP BY a,b,c
SELECT DISTINCT
a,
b,
c,
(
SELECT
COUNT(id)
FROM
table_name t1
WHERE
t2.a = t1.a
AND t2.b = t1.b
AND t2.c = t1.c
) AS counted
FROM
table_name t2
The subquery above is a correlated subquery. In its WHERE clause, t1 and t2 are treated as two different tables (although they are a single table in the database), so for each row of t2 the query checks the equality and counts the matching rows of t1. DISTINCT then collapses the identical result rows into one.
I hope I was able to explain it.
Ah, figured this one out from a duplicate as I was writing the question - I figured I'd share my results as they were different enough from the answer I got mine from.
I have to use a subquery to count the non-distinct records. Then I can use values from the outer query in the subquery's WHERE clause.
SELECT DISTINCT
a,
b,
c,
(
SELECT
COUNT(id)
FROM
table_name t1
WHERE
t2.a = t1.a
AND t2.b = t1.b
AND t2.c = t1.c
) AS counted
FROM
table_name t2
This works. Let me know if there are gaps in my understanding.
With help from this answer: https://stackoverflow.com/a/14110336/1270996
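To compare the two approaches in this thread, here is a minimal sketch that runs both the GROUP BY query and a correlated-subquery query on the same made-up rows. Note the assumption: the subquery here matches on all three columns a, b and c, so that it counts records identical to each distinct record. An in-memory SQLite database stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO table_name (a, b, c) VALUES (?, ?, ?)",
    [("x", "y", "z"), ("x", "y", "z"), ("x", "y", "w"),
     ("p", "q", "r"), ("p", "q", "r"), ("p", "q", "r")],
)

# Approach 1: group on the three columns and count each group.
grouped = conn.execute(
    "SELECT a, b, c, COUNT(*) FROM table_name GROUP BY a, b, c ORDER BY a, b, c"
).fetchall()

# Approach 2: DISTINCT plus a correlated subquery matching on a, b and c.
correlated = conn.execute(
    """
    SELECT DISTINCT a, b, c,
           (SELECT COUNT(id)
            FROM table_name t1
            WHERE t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c) AS counted
    FROM table_name t2
    ORDER BY a, b, c
    """
).fetchall()

print(grouped)  # [('p', 'q', 'r', 3), ('x', 'y', 'w', 1), ('x', 'y', 'z', 2)]
print(grouped == correlated)  # True
```

Both return the same rows here; the GROUP BY form is shorter and scans the table once.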

How do I write a SQL query to detect duplicate primary keys?

Suppose I want to alter the table so that my primary keys are as follows
user_id , round , tournament_id
Currently there are duplicates that I need to clean up. What is the query to find all duplicates?
This is for MySQL and I would like to see duplicate rows
Technically, you don't need such a query; any RDBMS worth its salt will not allow the insertion of a row which would produce a duplicate primary key in the table. Such a thing violates the very definition of a primary key.
However, if you are looking to write a query to find duplicates of these groups of columns before applying a primary key to the table that consists of these columns, then this is what you'd want:
select
t.user_id, t.round, t.tournament_id
from
table as t
group by
t.user_id, t.round, t.tournament_id
having
count(*) > 1
The above will only give you the combination of columns that have more than one row for that combination, if you want to see all of the columns in the rows, then you would do the following:
select
o.*
from
table as o
inner join (
select
t.user_id, t.round, t.tournament_id
from
table as t
group by
t.user_id, t.round, t.tournament_id
having
count(*) > 1
) as t on
t.user_id = o.user_id and
t.round = o.round and
t.tournament_id = o.tournament_id
Note that you could also create a temporary table and join on that if you need to use the results multiple times.
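As a sketch of the temporary-table variant mentioned above: store the duplicated key combinations once, then join the base table against them. The table name results, the score column, the temporary table name dup_keys and the sample rows are all made up for illustration, and an in-memory SQLite database stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; only the three key columns come from the question.
conn.execute(
    "CREATE TABLE results (user_id INTEGER, round INTEGER, tournament_id INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?)",
    [(1, 1, 10, 50), (1, 1, 10, 55), (2, 1, 10, 40),
     (1, 2, 10, 60), (2, 1, 11, 70), (2, 1, 11, 75)],
)

# Step 1: keep the duplicated key combinations in a temporary table.
conn.execute(
    """
    CREATE TEMPORARY TABLE dup_keys AS
    SELECT user_id, round, tournament_id
    FROM results
    GROUP BY user_id, round, tournament_id
    HAVING COUNT(*) > 1
    """
)

# Step 2: join back to see every column of the offending rows.
duplicates = conn.execute(
    """
    SELECT o.user_id, o.round, o.tournament_id, o.score
    FROM results o
    JOIN dup_keys d
      ON d.user_id = o.user_id
     AND d.round = o.round
     AND d.tournament_id = o.tournament_id
    ORDER BY o.user_id, o.round, o.tournament_id, o.score
    """
).fetchall()

print(duplicates)  # [(1, 1, 10, 50), (1, 1, 10, 55), (2, 1, 11, 70), (2, 1, 11, 75)]
```

The dup_keys table can then be reused for the cleanup itself, for example to decide which rows to delete.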
SELECT name, COUNT(*) AS counter
FROM customers
GROUP BY name
HAVING COUNT(*) > 1
That's what you are looking for.
In table:
ID  NAME        email
--  ----        -----
1   John Doe    john@teratrax.com
2   Mark Smith  marks@teratrax.com
3   John Doe    jdoe@company.com
will return
name counter
---- -------
John Doe 2
Assuming you either have a table with those three columns, or that you can make and populate a table with those three columns, this query will show the duplicates.
select user_id, round, tournament_id
from yourtable
group by user_id, round, tournament_id
having count(*) > 1
This query selects all rows from the customers table that have a duplicate name but also shows the email of each duplicate.
SELECT c.name, c.email FROM customers c, customers d
WHERE c.name = d.name
GROUP BY c.name, c.email
HAVING COUNT(*) > 1
The downside of this is that you have to list all the columns you want to output twice, once in the SELECT and once in the GROUP BY clause. The other approach is to use a subquery or join to filter the table against the list of known duplicate keys.
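The other approach mentioned above, filtering the table against the list of known duplicate keys, might look like the sketch below. It uses the three sample customers from this thread in an in-memory SQLite database standing in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "John Doe", "john@teratrax.com"),
        (2, "Mark Smith", "marks@teratrax.com"),
        (3, "John Doe", "jdoe@company.com"),
    ],
)

# The subquery lists the duplicated names; the outer query returns
# every row with one of those names, so output columns are listed once.
duplicate_customers = conn.execute(
    """
    SELECT c.name, c.email
    FROM customers c
    WHERE c.name IN (SELECT name
                     FROM customers
                     GROUP BY name
                     HAVING COUNT(*) > 1)
    ORDER BY c.id
    """
).fetchall()

print(duplicate_customers)
# [('John Doe', 'john@teratrax.com'), ('John Doe', 'jdoe@company.com')]
```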