Query for finding duplicates - mysql

I have a table with the following schema:
CUSTOMERS (id INT, name VARCHAR(10), height VARCHAR(10), weight INT)
id is the primary key. I want to find the rows for people who have exactly the same name, height and weight. In other words, I want to find duplicates with respect to name, height and weight.
Example table:
1, sam, 160, 100
2, ron, 167, 88
3, john, 150, 90
4, sam, 160, 100
5, rick, 158, 110
6, john, 150, 90
7, sam, 166, 110
Example Output:
Now since there are people with same name, same height and same weight:
sam (id=1), sam (id=4)
and
john (id=3), john (id=6)
I want to get these ids. It is also okay if I get only one id per match (i.e. id=1 from first match and id=3 from second match).
I am trying this query but am not sure whether it is correct:
SELECT id
FROM customers
GROUP BY name, height, weight

Try this (valid for sql server):
SELECT
t.NAME,
'Ids = '+
(
SELECT cast(Id as varchar)+','
FROM Customers c
WHERE c.NAME = t.NAME AND c.Weight = t.Weight AND c.Height = t.Height
FOR XML PATH('')
)
FROM
(
SELECT Name, height, weight
FROM Customers
GROUP BY Name, height, weight
HAVING COUNT(*) > 1
) t
Or, with one id per row (note that this lists every id in each duplicate group):
SELECT
t.NAME,
c.Id
FROM
(
SELECT Name, height, weight
FROM Customers
GROUP BY Name, height, weight
HAVING COUNT(*) > 1
) t
JOIN Customers c ON c.NAME = t.NAME AND c.Weight = t.Weight AND c.Height = t.Height

SELECT *
FROM customers C
INNER JOIN
(
SELECT name, height, weight
FROM customers
GROUP BY name, height, weight
HAVING COUNT(*) > 1
) X ON C.name = X.name AND C.height = X.height AND C.weight = X.weight

SELECT c.*
FROM customers c
JOIN (
SELECT name, height, weight
FROM customers
GROUP BY name, height, weight
HAVING count(*) > 1
) t ON c.name = t.name and c.height = t.height and c.weight = t.weight

You are on the right track:
SELECT min(id)
FROM customers
GROUP BY name, height, weight
HAVING COUNT(*) > 1

I don't know which database you are using, since you tagged several.
In SQL Server you can't select id unless it appears in the GROUP BY clause or inside an aggregate function.
So if you want to select other fields besides the ones in the GROUP BY clause, you can use PARTITION BY. Something like this:
SELECT id,
ROW_NUMBER() OVER(PARTITION BY c.name, c.height, c.weight ORDER BY c.name) AS DuplicateCount
FROM customers c
This numbers the rows within each group of identical name, height and weight; rows with DuplicateCount greater than 1 are the duplicates.
I'm not sure that this is faster than the other solutions, but you can profile it and compare.
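To see what the ROW_NUMBER() approach returns, here is a minimal sketch that runs it against the sample data from the question. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server or MySQL (an assumption for illustration; window functions need SQLite 3.25 or newer). The alias duplicate_count and the ORDER BY id inside the window are choices made for this sketch.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, height TEXT, weight INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "sam", "160", 100),
        (2, "ron", "167", 88),
        (3, "john", "150", 90),
        (4, "sam", "160", 100),
        (5, "rick", "158", 110),
        (6, "john", "150", 90),
        (7, "sam", "166", 110),
    ],
)

# Number the rows inside each (name, height, weight) group, lowest id first,
# then keep only the second and later rows of each group.
extra_copies = conn.execute(
    """
    SELECT id, duplicate_count
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY name, height, weight ORDER BY id
               ) AS duplicate_count
        FROM customers
    ) numbered
    WHERE duplicate_count > 1
    ORDER BY id
    """
).fetchall()

print(extra_copies)
```

Note that this lists only the extra copies (ids 4 and 6 here), not the first row of each group, which is handy if the goal is to delete the surplus rows.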

If it is okay to get only one id per match as you say, you are close to solution:
SELECT
min( id )
,name, height, weight -- only if you need/want them
FROM customers
GROUP BY name, height, weight
HAVING count(*) > 1
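The GROUP BY ... HAVING answers above can be checked against the sample data from the question. The following is a small sketch, assuming an in-memory SQLite database (via Python's sqlite3) in place of MySQL; the variable names are made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, height TEXT, weight INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "sam", "160", 100),
        (2, "ron", "167", 88),
        (3, "john", "150", 90),
        (4, "sam", "160", 100),
        (5, "rick", "158", 110),
        (6, "john", "150", 90),
        (7, "sam", "166", 110),
    ],
)

# Every id that belongs to a (name, height, weight) group with more than one row.
all_duplicate_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT c.id
        FROM customers c
        JOIN (
            SELECT name, height, weight
            FROM customers
            GROUP BY name, height, weight
            HAVING COUNT(*) > 1
        ) x ON c.name = x.name AND c.height = x.height AND c.weight = x.weight
        ORDER BY c.id
        """
    )
]

# One representative id (the smallest) per duplicate group.
one_id_per_group = [
    row[0]
    for row in conn.execute(
        """
        SELECT MIN(id)
        FROM customers
        GROUP BY name, height, weight
        HAVING COUNT(*) > 1
        ORDER BY MIN(id)
        """
    )
]

print(all_duplicate_ids)
print(one_id_per_group)
```

Row 7 (sam, 166, 110) is not reported because only the name matches; height and weight differ.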

Related

MySQL Select max() from every Value

Is there a way to get the max() value for every value?
I have a table like this:
id primary key
name foreign key
age
and I need the highest age for every name. For example:
ID NAME AGE
1, Marco, 12
2, Jason, 23
3, Tom, 5
4, Marco, 16
5, Jason, 22
The output should be:
ID NAME AGE
2, Jason, 23
3, Tom, 5
4, Marco, 16
Is this possible and how?
Thank you.

You can get the max value of each column using aggregation:
select max(id), name, max(age)
from t
group by name;
But if you want the complete row with the max age, then that would be:
select t.*
from t
where t.age = (select max(t2.age) from t t2 where t2.name = t.name);
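The difference between the two queries in this answer is easy to miss, so here is a short sketch that runs both on the sample data from the question. It assumes SQLite through Python's sqlite3 as a stand-in for MySQL; the table name t follows the answer.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "Marco", 12), (2, "Jason", 23), (3, "Tom", 5), (4, "Marco", 16), (5, "Jason", 22)],
)

# Column-wise maxima: max(id) and max(age) are computed independently,
# so the id need not belong to the row that holds the max age.
per_column_max = conn.execute(
    "SELECT MAX(id), name, MAX(age) FROM t GROUP BY name ORDER BY name"
).fetchall()

# Complete rows: the correlated subquery keeps only rows whose age is the
# maximum for their own name.
oldest_rows = conn.execute(
    """
    SELECT t.id, t.name, t.age
    FROM t
    WHERE t.age = (SELECT MAX(t2.age) FROM t t2 WHERE t2.name = t.name)
    ORDER BY t.id
    """
).fetchall()

print(per_column_max)
print(oldest_rows)
```

For Jason the first query pairs id 5 with age 23, although age 23 belongs to id 2; the second query returns the row the question asks for. If two rows of the same name tie for the max age, the second query returns both.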

Try this:
select id,name,max(age) over(partition by name) as max_age from table group by id,name;

You can use aggregation:
select min(id) id, name, max(age) age from mytable group by name

You can get the name and max age from a subquery, then left join back to the table to obtain the id.
SELECT b.id, a.name, a.maxage
FROM (SELECT name, MAX(age) AS maxage
FROM table
GROUP BY NAME
) a
LEFT JOIN table b ON a.NAME = b.NAME AND a.maxage= b.AGE

MySQL select percentage

I have a database table "tblfavs" with five columns: id, userid, logoid, favdate, did.
I want to determine the percentage of a user's (userid) favorites (id) that share the same designer id (did), and where userid <> did, displayed from highest percentage to lowest.
In pseudo-query format:
SELECT [percentage], userid, did
FROM tblfavs
WHERE record has the same userid and did
AND userid <> did
GROUP BY userid
ORDER BY [percentage] DESC
I can't get my head around the query to accomplish this. Help appreciated!
Edit:
Sample data
1, 1, 5, 2017-01-01, 2
2, 7, 3, 2017-01-02, 5
3, 1, 8, 2017-01-02, 2
4, 7, 1, 2017-01-02, 3
In this set user 1 (second column) has two entries and both have "2" as the designer id (final column).
Expected output
100%, userid 1, did 2
50%, userid 7, did 5
50%, userid 7, did 3
etc.

This is easier in other DBMS which feature window functions (e.g. COUNT OVER). However, this is not that difficult in MySQL either. You just need two aggregations: Count per userid and did, count per userid, divide.
select
ud.cnt * 100.0 / u.cnt as percentage,
ud.userid,
ud.did
from
(
select userid, did, count(*) as cnt
from tblfavs
group by userid, did
) ud
join
(
select userid, count(*) as cnt
from tblfavs
group by userid
) u on u.userid = ud.userid
order by percentage desc;
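As a quick check of the two-aggregation idea, the sketch below runs the query on the four sample rows from the question. It assumes SQLite via Python's sqlite3 instead of MySQL, and adds userid and did to the ORDER BY purely so that tied percentages come back in a predictable order.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblfavs (id INTEGER PRIMARY KEY, userid INTEGER, logoid INTEGER, favdate TEXT, did INTEGER)"
)
conn.executemany(
    "INSERT INTO tblfavs VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 5, "2017-01-01", 2),
        (2, 7, 3, "2017-01-02", 5),
        (3, 1, 8, "2017-01-02", 2),
        (4, 7, 1, "2017-01-02", 3),
    ],
)

# ud counts favorites per (userid, did); u counts favorites per userid.
# Dividing the two gives each designer's share of a user's favorites.
percentages = conn.execute(
    """
    SELECT ud.cnt * 100.0 / u.cnt AS percentage, ud.userid, ud.did
    FROM (
        SELECT userid, did, COUNT(*) AS cnt
        FROM tblfavs
        GROUP BY userid, did
    ) ud
    JOIN (
        SELECT userid, COUNT(*) AS cnt
        FROM tblfavs
        GROUP BY userid
    ) u ON u.userid = ud.userid
    ORDER BY percentage DESC, ud.userid, ud.did
    """
).fetchall()

print(percentages)
```

The query does not apply the userid <> did condition from the question; where that filter belongs depends on whether self-favorites should count toward the user's total.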

Try this (improved from the previous answer to match your exact input, so you get a percentage for each row, not aggregated):
select
ud.cnt2 * 100.0 / u.cnt as percentage,
ud.userid,
ud.did
from
(
select
out3.userid,out3.did, cnt2
from
(select userid,did from tblfavs) out3
join
(
select
userid, did, count(*) as cnt2
from tblfavs
group by userid, did
) ud on ud.userid = out3.userid and ud.did = out3.did
) ud
join
(
select userid, count(*) as cnt
from tblfavs
group by userid
) u on u.userid = ud.userid
order by percentage desc;

MYSQL - Group By / Order By not working

I have the following data inside a table:
id person_id item_id price
1 1 1 10
2 1 1 20
3 1 3 50
Now what I want to do is group by the item ID, select the id that has the highest value and take the price.
E.g. the sum would be: (20 + 50) and ignore the 10.
I am using the following:
SELECT SUM(`price`)
FROM
(SELECT id, person_id, item_id, price
FROM `table` tbl
INNER JOIN person p USING (person_id)
WHERE p.person_id = 1
ORDER BY id DESC) x
GROUP BY item_id
However, this query is still adding (10 + 20 + 50), which is obviously not what I need to have.
Any ideas to where I am going wrong?

Here is what you are trying to achieve. First, the grouping needs to happen in a subquery, not in the outer query. The outer query only needs the sum:
SELECT SUM(`price`)
FROM
(SELECT MAX(price) as price
FROM `table` tbl
INNER JOIN person p USING (person_id)
WHERE p.person_id = 1
GROUP BY item_id) x

http://sqlfiddle.com/#!9/40803/5
SELECT SUM(t1.price)
FROM tbl t1
LEFT JOIN tbl t2
ON t1.person_id= t2.person_id
AND t1.item_id = t2.item_id
AND t1.id<t2.id
WHERE t1.person_id = 1
AND t2.id IS NULL;

I'm not sure if this is the only requirement you have. If so, try this.
SELECT SUM(price)
FROM
(SELECT MAX(price) AS price
FROM `table`
WHERE person_id = 1
GROUP BY item_id) x

First of all, you don't need the person table, because the other table already contains the person_id. So I removed it from the examples.
Your query returns a sum of prices for each item.
If you replace SELECT SUM(price) with SELECT item_id, SUM(price) you will get
item_id SUM(`price`)
1 30
3 50
But that is not what you want. Neither is it what you wrote in the question, "(10 + 20 + 50)".
Now, replacing the first line with SELECT id, item_id, price, you will get one row for each item with the highest id.
id item_id price
2 1 20
3 3 50
This works because of the "undocumented feature" of MySQL, which allows you to select columns that are not listed in the GROUP BY clause and get the first row from the subselect for each group (each item in this case).
Now you only need to sum the price column in an additional outer select:
SELECT SUM(price)
FROM (
SELECT id, item_id ,price
FROM (
SELECT id, person_id, item_id, price
FROM `table` tbl
WHERE tbl.person_id = 1
ORDER BY id DESC ) x
GROUP BY item_id
) y
However, I do not recommend using that "feature". While it still works on MySQL 5.6, you never know if it will work with newer versions. It already doesn't work on MariaDB.
Instead, you can determine the MAX(id) for each item in a subselect, select only the rows with those ids and sum their prices.
SELECT SUM(`price`)
FROM `table` tbl
WHERE tbl.id IN (
SELECT MAX(tbl2.id)
FROM `table` tbl2
WHERE tbl2.person_id = 1
GROUP BY tbl2.item_id
)
Another solution (which internally does the same) is
SELECT SUM(`price`)
FROM `table` tbl
JOIN (
SELECT MAX(tbl2.id) as id
FROM `table` tbl2
WHERE tbl2.person_id = 1
GROUP BY tbl2.item_id
) x ON x.id = tbl.id
Alex's solution also works fine, if the groups (number of rows per person and item) are rather small.
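The answers to this question interpret it in two ways: sum the highest price per item, or sum the price of the row with the highest id per item. On the question's three rows both give 70, so the sketch below adds one extra row (id 4, an assumption made only to separate the two readings). It uses SQLite via Python's sqlite3, and the table name purchases is hypothetical, since the question only calls it `table`.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (id INTEGER PRIMARY KEY, person_id INTEGER, item_id INTEGER, price INTEGER)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 10),
        (2, 1, 1, 20),
        (3, 1, 3, 50),
        (4, 1, 1, 5),  # extra row, not in the question: newest entry is not the most expensive
    ],
)

# Reading 1: take the highest price per item, then sum (20 + 50).
sum_of_max_prices = conn.execute(
    """
    SELECT SUM(price)
    FROM (
        SELECT MAX(price) AS price
        FROM purchases
        WHERE person_id = 1
        GROUP BY item_id
    ) x
    """
).fetchone()[0]

# Reading 2: take the row with the highest id per item, then sum its price (5 + 50).
sum_of_latest_prices = conn.execute(
    """
    SELECT SUM(price)
    FROM purchases
    WHERE id IN (
        SELECT MAX(id)
        FROM purchases
        WHERE person_id = 1
        GROUP BY item_id
    )
    """
).fetchone()[0]

print(sum_of_max_prices, sum_of_latest_prices)
```

Which query is right depends on whether "the id that has the highest value" means the highest price or the most recent row.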

You used GROUP BY in the main query, but it belongs in the subquery, like this:
SELECT SUM(`price`) FROM ( SELECT MAX(price) AS price FROM `table` tbl WHERE tbl.person_id = 1 GROUP BY item_id ) AS x

MySql - Get duplicates by multiple columns

I have a part inventory table that stores parts by PartName, WarehouseId and VendorCode (the main columns of interest). It should only have unique PartName entries per WarehouseId and VendorCode. However, entries are mixed, and I need to get the PartName, WarehouseId and VendorCode for such cases. E.g.:
ABC133, Warehouse 10, VendorCode 1234
ABC133, Warehouse 10, VendorCode 1222
BBB111, Warehouse 20, VendorCode 1111
BBB111, Warehouse 20, VendorCode 2222
I have customized a query found on this site to do this, but it only brings the first "duplicate" for every duplicate PartName, and I need to get all the faulty entries:
ABC133, Warehouse 10, VendorCode 1222
BBB111, Warehouse 20, VendorCode 1111
This is the query I use:
SELECT i.MFGPN, i.VendorCode, i.WarehouseID FROM edi_846_inventory i
INNER JOIN (SELECT MFGPN FROM edi_846_inventory
GROUP BY MFGPN HAVING count(MFGPN) > 1 and count(VendorCode) > 1) dup ON i.MFGPN = dup.MFGPN
where MFGPN is the PartName
Thanks

This is the query that you want:
SELECT i.MFGPN, i.VendorCode, i.WarehouseID
FROM edi_846_inventory i INNER JOIN
(SELECT MFGPN, WarehouseID
FROM edi_846_inventory
GROUP BY MFGPN, WarehouseID
HAVING count(*) > 1
) dup
ON i.MFGPN = dup.MFGPN AND i.WarehouseID = dup.WarehouseID;
In other words, your subquery needs to aggregate by both MFGPN and WarehouseID.
Also, just concatenating the vendors together might be sufficient for you:
SELECT MFGPN, WarehouseID, GROUP_CONCAT(VendorCode) as Vendors
FROM edi_846_inventory
GROUP BY MFGPN, WarehouseID
HAVING count(*) > 1
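Both queries from this answer can be tried on the four rows from the question. This is a sketch under some assumptions: SQLite (via Python's sqlite3) replaces MySQL, VendorCode is stored as text, and one extra non-duplicated part (CCC222) is added to show that clean rows are left out. SQLite's GROUP_CONCAT does not promise an order, so the vendors are sorted in Python.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE edi_846_inventory (MFGPN TEXT, WarehouseID INTEGER, VendorCode TEXT)"
)
conn.executemany(
    "INSERT INTO edi_846_inventory VALUES (?, ?, ?)",
    [
        ("ABC133", 10, "1234"),
        ("ABC133", 10, "1222"),
        ("BBB111", 20, "1111"),
        ("BBB111", 20, "2222"),
        ("CCC222", 30, "3333"),  # extra clean row, not in the question
    ],
)

# All rows whose (MFGPN, WarehouseID) pair occurs more than once.
faulty_rows = conn.execute(
    """
    SELECT i.MFGPN, i.VendorCode, i.WarehouseID
    FROM edi_846_inventory i
    INNER JOIN (
        SELECT MFGPN, WarehouseID
        FROM edi_846_inventory
        GROUP BY MFGPN, WarehouseID
        HAVING COUNT(*) > 1
    ) dup ON i.MFGPN = dup.MFGPN AND i.WarehouseID = dup.WarehouseID
    ORDER BY i.MFGPN, i.VendorCode
    """
).fetchall()

# One row per duplicated pair, with the vendor codes concatenated.
vendors_by_group = {
    (mfgpn, warehouse): sorted(vendors.split(","))
    for mfgpn, warehouse, vendors in conn.execute(
        """
        SELECT MFGPN, WarehouseID, GROUP_CONCAT(VendorCode) AS Vendors
        FROM edi_846_inventory
        GROUP BY MFGPN, WarehouseID
        HAVING COUNT(*) > 1
        """
    )
}

print(faulty_rows)
print(vendors_by_group)
```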

A work colleague found a solution:
select i.MFGPN, i.WarehouseID, i.VendorCode, i.IngramSKU
from edi_846_inventory i
where i.MFGPN in
(
select *
from
(
select distinct i.MFGPN
from edi_846_inventory i
group by i.MFGPN, i.WarehouseID
having count(*) > 1
) dup
)
order by i.MFGPN, i.WarehouseID

DISPLAY MIN() and MAX() Values in MySQL

I have a table called cia with 2 columns:
Column 1 ('Name') has the names of all countries in the world.
Column 2 ('area') has the size of those countries in m^2.
I want to find the biggest and smallest country. To find those I need to enter the following Queries:
SELECT Name, MAX(area) FROM cia
My other query:
SELECT Name, MIN(area) FROM cia
Now obviously I could do
SELECT MIN(area), MAX(area) FROM cia
however, I wouldn't get the corresponding name to my values then. Is it possible to get an output like this
Country | Area
Afghanistan | lowest value of column 'area'
China | highest value of column 'area'

This is the minimum size:
select min(area) from cia;
And this the maximum:
select max(area) from cia;
So:
select * from cia
where area = (select min(area) from cia)
or area = (select max(area) from cia)
order by area;
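Here is a minimal sketch of that query in action. It assumes SQLite via Python's sqlite3 in place of MySQL, and borrows the four made-up countries and areas from the test data in the next answer rather than real figures.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cia (name TEXT, area INTEGER)")
conn.executemany(
    "INSERT INTO cia VALUES (?, ?)",
    [("Italy", 1000), ("China", 10000), ("San Marino", 10), ("Ghana", 3333)],
)

# Keep the rows whose area equals the overall minimum or the overall maximum.
extremes = conn.execute(
    """
    SELECT name, area
    FROM cia
    WHERE area = (SELECT MIN(area) FROM cia)
       OR area = (SELECT MAX(area) FROM cia)
    ORDER BY area
    """
).fetchall()

print(extremes)
```

If several countries share the smallest or largest area, all of them are returned.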

You can try this query:
CREATE TABLE area (name varchar(50), area int);
insert into area values ('Italy', 1000);
insert into area values ('China', 10000);
insert into area values ('San Marino', 10);
insert into area values ('Ghana', 3333);
select main.* from area main
where
not exists(
SELECT 'MINIMUM'
FROM area a2
where a2.area < main.area
)
or
not exists(
SELECT 'MAXIMUM'
FROM area a3
WHERE a3.area > main.area
)
order by area desc
This way only two subqueries are needed. Other approaches may fail on other DBMSs (this one does not rely on GROUP BY to show the name).

EDIT:
Sorry, my first thought was wrong. But this works.
SELECT Typ = 'MaxValue', * FROM (SELECT TOP 1 Name, area FROM cia ORDER BY area DESC) tmp1
UNION ALL
SELECT Typ = 'MinValue', * FROM (SELECT TOP 1 Name, area FROM cia ORDER BY area ASC) tmp2

You obviously have to hit the table twice, but there is no need for three hits.
select case Area when MaxArea then 'Largest' else 'Smallest' end Rating, b.Name as Country, b.Area
from(
select Max( Area ) as MaxArea, Min( Area ) as MinArea
from cia a
) s
join cia b
on b.Area = s.MaxArea
or b.Area = s.MinArea;
