Count and sum up all duplicate records in MySQL - mysql

I have a table with the following structure:
id name
1 john
2 ana
3 john
4 ana
5 peter
6 ana
7 Abrar
8 Raju
The duplicate entries in the table are:
john (2 rows)
ana (3 rows)
So the duplicated names are john and ana.
My question: how do I count the total number of records that are duplicates? Here that is 5 records.
Note: I have read the similar question in the community, but it explains how to count the duplicates for each name and show that count as a third column. In my case I want the total number of duplicate records in the table regardless of name (here the query should return just the number 5).

Wrap the grouped query you already have in mind (or perhaps have already written) in a subquery and sum its counts:
SELECT SUM(cnt) AS total_duplicates
FROM
(
SELECT COUNT(*) AS cnt
FROM yourTable
GROUP BY name
HAVING COUNT(*) > 1
) t;
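As a quick check of the query above, here is a minimal sketch that runs it against the sample data. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (this query uses only standard SQL), and `yourTable` is the answer's placeholder name.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER PRIMARY KEY, name TEXT)")
names = ["john", "ana", "john", "ana", "peter", "ana", "Abrar", "Raju"]
conn.executemany("INSERT INTO yourTable (name) VALUES (?)", [(n,) for n in names])

# Inner query: one row per name that occurs more than once, with its row count.
# Outer query: add those counts together.
query = """
SELECT SUM(cnt) AS total_duplicates
FROM (
    SELECT COUNT(*) AS cnt
    FROM yourTable
    GROUP BY name
    HAVING COUNT(*) > 1
) t
"""
total_duplicates = conn.execute(query).fetchone()[0]
print(total_duplicates)  # john (2) + ana (3) = 5
```

If no name repeats, SUM over the empty subquery yields NULL rather than 0; wrapping it as COALESCE(SUM(cnt), 0) is one way to handle that.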

Related

How to get exact rows count of particular column in MySQL table

I want to get the exact number of rows that have a value in a specified column.
Example table:
Name Id Age
_______________________________
Jon 1 30
Merry 2 40
William 50
David
There are 4 rows in the table, but I want to count only the Id column.
I am using the query below:
select count(Id) from table;
But it returns 4. I know why it returns 4, but I want 2 as the output, because only two rows have a value in the Id column.
How can I achieve this?
Try this:
select count(Id) from table where id>0;
With the help of @blabla_bingo and @Edwin Dijk, I finally achieved it with the query below:
select count(Id) from table where Id!="";
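To see why the plain count returns 4, here is a small sketch, assuming the blank Id values are stored as empty strings rather than NULL (which is what the accepted fix implies). COUNT(column) skips only NULLs, so empty strings are still counted. The table name `people` is hypothetical, and SQLite via Python's sqlite3 stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name "people"; Id is text so a blank Id can be '' instead of NULL.
conn.execute("CREATE TABLE people (Name TEXT, Id TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Jon", "1", 30), ("Merry", "2", 40), ("William", "", 50), ("David", "", None)],
)

# COUNT(Id) ignores NULLs only, so the two empty strings are counted too.
count_all = conn.execute("SELECT COUNT(Id) FROM people").fetchone()[0]

# Filtering out empty strings gives the number of rows with a real Id.
count_non_empty = conn.execute(
    "SELECT COUNT(Id) FROM people WHERE Id != ''"
).fetchone()[0]

# Alternative: turn '' into NULL so COUNT skips it without a WHERE clause.
count_nullif = conn.execute("SELECT COUNT(NULLIF(Id, '')) FROM people").fetchone()[0]

print(count_all, count_non_empty, count_nullif)  # 4 2 2
```

The NULLIF form can be handy when other aggregates in the same query still need to see all rows.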

Remove duplicated based on two columns [duplicate]

I have a flights table with a few columns, but somebody seems to have run a migration twice, which created the same data twice.
Each flight should be unique on the combination of flight_number and date.
The table currently looks like this:
flight_number date
123 2021-09-16
123 2021-09-16
123 2021-09-17
124 2021-09-18
124 2021-09-18
Result I want:
flight_number date
123 2021-09-16
123 2021-09-17
124 2021-09-18
Basically, keep only one row and remove the duplicates (rows where flight_number and date are both the same).
I'm looking for a DELETE query but couldn't find one that does what I need.
Which query can help me achieve this?
Thanks!
EDIT: Yes, every row has an id column that is unique even when the rest of the data is the same.
You need to identify which rows to keep and which to remove. This can be done as follows:
delete ff from
flight ff
inner join (
select id, row_number() over (partition by flight_number, date order by id) as rn
from flight f
) dups
on ff.id = dups.id
where dups.rn > 1
This uses ROW_NUMBER() (available in MySQL 8.0 and later) to number the rows within each partition, in this case each combination of flight_number and date, and then deletes every record whose row number is greater than 1.
The join has to be on the unique id column; joining on flight_number alone would also delete the row you want to keep. A working demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=58a4ac7235ea22b557116ad68c8449c3
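The keep-one-per-group idea can be tried out with the sketch below. It is an illustration under assumptions, not the linked fiddle: SQLite (through Python's sqlite3, version 3.25 or later for window functions) stands in for MySQL, and because SQLite has no DELETE ... JOIN, the numbered rows are fed to the DELETE through an IN subquery instead. The partition uses both flight_number and date, and the match is on the unique id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flight (id INTEGER PRIMARY KEY, flight_number INTEGER, date TEXT)"
)
rows = [
    (123, "2021-09-16"),
    (123, "2021-09-16"),
    (123, "2021-09-17"),
    (124, "2021-09-18"),
    (124, "2021-09-18"),
]
conn.executemany("INSERT INTO flight (flight_number, date) VALUES (?, ?)", rows)

# Number the rows inside each (flight_number, date) group by ascending id,
# then delete everything except the first row of each group.
conn.execute("""
DELETE FROM flight
WHERE id IN (
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY flight_number, date ORDER BY id) AS rn
        FROM flight
    ) AS dups
    WHERE rn > 1
)
""")

remaining = conn.execute(
    "SELECT flight_number, date FROM flight ORDER BY flight_number, date"
).fetchall()
print(remaining)
```

Ordering by id inside the partition means the oldest copy of each flight is the one that survives; ordering by id DESC would keep the newest instead.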

mysql: is there any way we can fetch records in different order than stored order?

We have the following table with columns:
CategoryId | CategoryName, with CategoryId as the primary key.
We have the following data:
0 Other
1 Bumper
2 Door
3 Roof
4 Fender
When I run select * from this table, I get the rows only in the sequence above, i.e. 0 first and 4 last.
Is there any way to get 0 last, i.e. 1,2,3,4,0?
select * from yourtable order by CategoryId=0,CategoryId
You can use arbitrary expressions in an ORDER BY clause. The comparison CategoryId=0 evaluates to 0 (false) or 1 (true), so ordering by it first returns all records for which it is false, followed by the records for which it is true (only one here, since CategoryId is the primary key). Within each of those two sets, the records are then sorted by CategoryId.
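A minimal sketch of this ordering, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL; both evaluate a comparison to 0 or 1. The table name `categories` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; the columns follow the question.
conn.execute(
    "CREATE TABLE categories (CategoryId INTEGER PRIMARY KEY, CategoryName TEXT)"
)
conn.executemany(
    "INSERT INTO categories VALUES (?, ?)",
    [(0, "Other"), (1, "Bumper"), (2, "Door"), (3, "Roof"), (4, "Fender")],
)

# CategoryId = 0 is 0 for every row except "Other", where it is 1,
# so "Other" sorts after the rest; CategoryId then orders the remaining rows.
ordered_ids = [
    row[0]
    for row in conn.execute(
        "SELECT CategoryId FROM categories ORDER BY CategoryId = 0, CategoryId"
    )
]
print(ordered_ids)  # [1, 2, 3, 4, 0]
```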

SQL: Repeated records by grouping some columns

I have data like this:
ID Name ItemA ItemB ItemC
OXZ234 Adam 4 4 5
OXZ234 Adam 1 2 3
OXZ345 Tarzen 6 7 8
OXDER2 William 9 8 2
OXDER2 William 0 8 0
I need to find how much food each person eats. For example, from the first two records I can say that Adam (ID OXZ234) ate ItemA-5, ItemB-6 and ItemC-8. For a small amount of data this kind of manual calculation is manageable, but I have a million records like this. So first I need to find the records that have the same ID and name but differing item counts.
I tried this query to find duplicate records by grouping on all columns:
select ID,Name,ItemA,ItemB,ItemC, COUNT(*)
from DATA_REFRESH
group by ID,Name,ItemA,ItemB,ItemC
having COUNT(*) > 1
But now I have to identify records whose item columns differ.
So the expected output is:
OXZ234 Adam 2
OXDER2 William 2
OXZ345 Tarzen 1
Any suggestion would be helpful!
You want SUM
select ID,
Name,
sum(ItemA) as ItA,
sum(ItemB) as ItB,
sum(ItemC) as ItC,
count(ID) as Occurrences -- Counts the number of entries per person
from DATA_REFRESH
group by ID,Name
having count(ID) >1 -- restricts this so only those with more than one entry appear
You can use a simple query without a HAVING clause:
select ID,Name,COUNT(*)
from DATA_REFRESH
group by ID,Name order by COUNT(*) desc ;
Or simply try this:
select ID,Name,COUNT(*)
from Sample_Check
group by ID,Name
having COUNT(*) > 1
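The grouped queries above can be tried on the sample rows with the sketch below, assuming SQLite (via Python's sqlite3) as a stand-in for the asker's database. It groups by ID and Name only, sums the item columns, and counts the rows per person; the HAVING clause is left out so that Tarzen, who has a single row, still appears as in the expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DATA_REFRESH (ID TEXT, Name TEXT, ItemA INTEGER, ItemB INTEGER, ItemC INTEGER)"
)
conn.executemany(
    "INSERT INTO DATA_REFRESH VALUES (?, ?, ?, ?, ?)",
    [
        ("OXZ234", "Adam", 4, 4, 5),
        ("OXZ234", "Adam", 1, 2, 3),
        ("OXZ345", "Tarzen", 6, 7, 8),
        ("OXDER2", "William", 9, 8, 2),
        ("OXDER2", "William", 0, 8, 0),
    ],
)

# One output row per person: item totals plus the number of source rows.
totals = conn.execute("""
SELECT ID, Name,
       SUM(ItemA) AS ItA,
       SUM(ItemB) AS ItB,
       SUM(ItemC) AS ItC,
       COUNT(*)   AS Occurrences
FROM DATA_REFRESH
GROUP BY ID, Name
ORDER BY COUNT(*) DESC, ID
""").fetchall()

for row in totals:
    print(row)
```

Adding HAVING COUNT(*) > 1 before the ORDER BY would restrict the result to people with more than one record.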

ORDER BY and GROUP BY those results in a single query

I am trying to query a dataset from a single table, which contains quiz answers/entries from multiple users. I want to pull out the highest scoring entry from each individual user.
My data looks like the following:
ID TP_ID quiz_id name num_questions correct incorrect percent created_at
1 10154312970149546 1 Joe 3 2 1 67 2015-09-20 22:47:10
2 10154312970149546 1 Joe 3 3 0 100 2015-09-21 20:15:20
3 125564674465289 1 Test User 3 1 2 33 2015-09-23 08:07:18
4 10153627558393996 1 Bob 3 3 0 100 2015-09-23 11:27:02
My query looks like the following:
SELECT * FROM `entries`
WHERE `TP_ID` IN('10153627558393996', '10154312970149546')
GROUP BY `TP_ID`
ORDER BY `correct` DESC
In my mind, what that should do is get the two users from the IN clause, order them by the number of correct answers and then group them together, so I should be left with the 2 highest scores from those two users.
In reality it's giving me two results, but the one from Joe gives me the lower of the two values (2), with Bob first with a score of 3. Swapping to ASC ordering keeps the scores the same but places Joe first.
So, how could I achieve what I need?
You're after the groupwise maximum, which can be obtained by joining the grouped results back to the table:
SELECT * FROM entries NATURAL JOIN (
SELECT TP_ID, MAX(correct) correct
FROM entries
WHERE TP_ID IN ('10153627558393996', '10154312970149546')
GROUP BY TP_ID
) t
Of course, if a user has multiple records with the maximal score, it will return all of them; should you only want some subset, you'll need to express the logic for determining which.
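The join-back approach can be checked against the sample rows with the sketch below. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL, and stores TP_ID as text to match the quoted values in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE entries (
    ID INTEGER PRIMARY KEY, TP_ID TEXT, quiz_id INTEGER, name TEXT,
    num_questions INTEGER, correct INTEGER, incorrect INTEGER,
    percent INTEGER, created_at TEXT
)
""")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "10154312970149546", 1, "Joe", 3, 2, 1, 67, "2015-09-20 22:47:10"),
        (2, "10154312970149546", 1, "Joe", 3, 3, 0, 100, "2015-09-21 20:15:20"),
        (3, "125564674465289", 1, "Test User", 3, 1, 2, 33, "2015-09-23 08:07:18"),
        (4, "10153627558393996", 1, "Bob", 3, 3, 0, 100, "2015-09-23 11:27:02"),
    ],
)

# The subquery finds each user's best score; the NATURAL JOIN matches on the
# shared column names (TP_ID and correct) to pull back the full rows.
best = conn.execute("""
SELECT ID, name, correct
FROM entries NATURAL JOIN (
    SELECT TP_ID, MAX(correct) AS correct
    FROM entries
    WHERE TP_ID IN ('10153627558393996', '10154312970149546')
    GROUP BY TP_ID
) t
ORDER BY ID
""").fetchall()
print(best)  # [(2, 'Joe', 3), (4, 'Bob', 3)]
```

NATURAL JOIN matches on every column name the two sides share, so an explicit JOIN ... USING (TP_ID, correct) may be the safer spelling if the subquery ever gains more columns.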
MySQL is quite lax about GROUP BY clauses, but as a rule of thumb you should follow the rule that other DBMSs enforce:
In a GROUP BY query, each selected column should either be part of the GROUP BY clause or be wrapped in an aggregate function.
For your query I would suggest:
SELECT `TP_ID`,`name`,max(`correct`) FROM `entries`
WHERE `TP_ID` IN('10153627558393996', '10154312970149546')
GROUP BY `TP_ID`,`name`
Since your table seems quite denormalized, the name part of the GROUP BY could be omitted, but it might be necessary in other cases.
ORDER BY only specifies the order in which results are returned; it does nothing about which results are returned. So you need to apply the max() function to get the highest number of correct answers.