MySQL query: Single Table multiple comparisons - mysql

I have the following mysql table:
id | member |
1 | abc
1 | pqr
2 | xyz
3 | pqr
3 | abc
I have been trying to write a query that returns the ids which have exactly the same members as a given id. For example, if the given id is 1, the query should return 3, because id 1 and id 3 both have exactly the same members, viz. {abc, pqr}. Any pointers? Appreciate it.
EDIT: The table may have duplicates, e.g. id 3 may have members {abc, abc} instead of {pqr, abc}, in which case the query should not return id 3.

Here's a solution that finds matching pairs for the entire table; you can add a WHERE clause to filter as needed (replace "table" with your actual table name). It self-joins the table on equal "member" and unequal "id" (t1.id < t2.id, so each pair appears only once), counts the shared members for each pair of ids, and compares that count with each id's total member count from the original table. If the count matches both totals, the two ids have exactly the same members, provided no id lists the same member twice.
select
t1.id, t2.id
from
table t1
inner join table t2
on t1.member = t2.member
and t1.id < t2.id
inner join (select id, count(1) as cnt from table group by id) c1
on t1.id = c1.id
inner join (select id, count(1) as cnt from table group by id) c2
on t2.id = c2.id
group by
t1.id, t2.id, c1.cnt, c2.cnt
having
count(1) = c1.cnt
and count(1) = c2.cnt
order by
t1.id, t2.id
This is some sample data I used which returned matches of (1,3) and (6,7)
insert into table
values
(1, 'abc'), (1, 'pqr'), (2, 'xyz'), (3, 'pqr'), (3, 'abc'), (4, 'abc'), (5, 'pqr'),
(6, 'abc'), (6, 'def'), (6, 'ghi'), (7, 'abc'), (7, 'def'), (7, 'ghi')
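As a quick check of the query above, here is a minimal sketch that runs it against the same sample data in an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for MySQL here, and the table name mc and the function name matching_pairs are assumptions made for the example (table itself is a reserved word).

```python
import sqlite3

# Sample rows from the answer above: (id, member).
SAMPLE = [
    (1, "abc"), (1, "pqr"), (2, "xyz"), (3, "pqr"), (3, "abc"),
    (4, "abc"), (5, "pqr"),
    (6, "abc"), (6, "def"), (6, "ghi"),
    (7, "abc"), (7, "def"), (7, "ghi"),
]

# Same shape as the answer's query; "mc" is a hypothetical table name.
PAIRS_SQL = """
SELECT t1.id, t2.id
FROM mc t1
JOIN mc t2 ON t1.member = t2.member AND t1.id < t2.id
JOIN (SELECT id, COUNT(*) AS cnt FROM mc GROUP BY id) c1 ON t1.id = c1.id
JOIN (SELECT id, COUNT(*) AS cnt FROM mc GROUP BY id) c2 ON t2.id = c2.id
GROUP BY t1.id, t2.id, c1.cnt, c2.cnt
HAVING COUNT(*) = c1.cnt AND COUNT(*) = c2.cnt
ORDER BY t1.id, t2.id
"""


def matching_pairs(rows):
    """Return the (id, id) pairs whose member lists match."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mc (id INTEGER, member TEXT)")
    conn.executemany("INSERT INTO mc VALUES (?, ?)", rows)
    result = conn.execute(PAIRS_SQL).fetchall()
    conn.close()
    return result


print(matching_pairs(SAMPLE))
```

On this data the sketch should list the same two pairs mentioned above, (1, 3) and (6, 7).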

A similar approach (to Derek Kromm's) using subqueries:
SELECT id
FROM mc a
WHERE
id != 1 AND
member IN (
SELECT member FROM mc WHERE id=1)
GROUP BY id
HAVING
COUNT(*) IN (
SELECT COUNT(*) FROM mc WHERE id=1) AND
COUNT(*) IN (
SELECT COUNT(*) FROM mc where id=a.id);
The logic here is that we need all ids that meet the following 3 conditions:
every selected member is among those that belong to id 1
the number of selected members is the same as the number of members that belong to id 1
the number of selected members is equal to the total number of members for the current id
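The three conditions can be tried out with a small sketch, again using an in-memory SQLite database from Python as a stand-in for MySQL. The helper name ids_matching and the :target parameter are assumptions for illustration; the table name mc comes from the query above.

```python
import sqlite3

# The answer's query, with the hard-coded id 1 replaced by a named parameter.
QUERY = """
SELECT id
FROM mc a
WHERE id != :target
  AND member IN (SELECT member FROM mc WHERE id = :target)
GROUP BY id
HAVING COUNT(*) IN (SELECT COUNT(*) FROM mc WHERE id = :target)
   AND COUNT(*) IN (SELECT COUNT(*) FROM mc WHERE id = a.id)
ORDER BY id
"""


def ids_matching(rows, target):
    """Return the ids that the query reports as matching `target`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mc (id INTEGER, member TEXT)")
    conn.executemany("INSERT INTO mc VALUES (?, ?)", rows)
    found = [row[0] for row in conn.execute(QUERY, {"target": target})]
    conn.close()
    return found


# Data from the question.
QUESTION_ROWS = [(1, "abc"), (1, "pqr"), (2, "xyz"), (3, "pqr"), (3, "abc")]
# Variant from the question's EDIT: id 3 holds {abc, abc}.
DUPLICATE_ROWS = [(1, "abc"), (1, "pqr"), (2, "xyz"), (3, "abc"), (3, "abc")]

print(ids_matching(QUESTION_ROWS, 1))
print(ids_matching(DUPLICATE_ROWS, 1))
```

With the question's original rows this reports id 3. With the duplicate variant it still reports id 3, because all three counts equal 2, so the duplicate case from the EDIT needs an extra check that this sketch does not include.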

try this:
declare @id int
set @id=1
select a.id from
(select id,COUNT(*) cnt from sample_table
where member in (select member from sample_table where id=@id)
and id <>@id
group by id)a
join
(select count(distinct member) cnt from sample_table where id=@id)b
on a.cnt=b.cnt

Related

SQL question. Find the two person having same hobbies in one table

TABLE [tbl_hobby]
person_id (int) , hobby_id(int)
It has many records. I want a SQL query that finds all pairs of person_id values that have exactly the same hobbies (the same set of hobby_id values).
If A has hobby_id 1 then B has it too, and if A doesn't have hobby_id 2 then B doesn't have it either; in that case we output the person_ids of A & B.
If A, B and C all satisfy the condition, we output A & B, B & C, A & C.
I solved it with a very clumsy method: joining the table to itself several times, with multiple subqueries. And of course my team lead laughed at it.
Is there a high-performance way to do this in SQL?
I have been thinking hard about this for the last 36 hours...
sample data in mysql dump
CREATE TABLE `tbl_hobby` (
`person_id` int(11) NOT NULL,
`hobby_id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `tbl_hobby` (`person_id`, `hobby_id`) VALUES
(1, 1),(1, 2),(1, 3),(1, 4),(1, 5),(2, 2),
(2, 3),(2, 4),(3, 1),(3, 2),(3, 3),(3, 4),
(4, 1),(4, 3),(4, 4),(5, 1),(5, 5),(5, 9),
(6, 2),(6, 3),(6, 4),(7, 1),(7, 3),(7, 7),
(8, 2),(8, 3),(8, 4),(9, 1),(9, 2),(9, 3),
(9, 4),(10, 1),(10, 5),(10, 9),(10, 11);
COMMIT;
Expected result: (2, 6 and 8 are the same; 3 and 9 are the same)
2,6
2,8
6,8
3,9
The order of the result records and the order of the two numbers within a record are not important. A result in one column or in two columns is equally acceptable, since it can easily be concatenated or separated.
Aggregate per person to get strings of their hobbies. Then aggregate per hobby list to find out which lists belong to more than one person.
select hobbies, group_concat(person_id order by person_id) as persons
from
(
select person_id, group_concat(hobby_id order by hobby_id) as hobbies
from tbl_hobby
group by person_id
) persons
group by hobbies
having count(*) > 1
order by hobbies;
This gives a list of persons per hobby list, which is the easiest way to output a solution, as we would otherwise have to build all possible pairs.
UPDATE: If you want pairs, you'll have to query the table twice:
select p1.person_id as person1, p2.person_id as person2
from
(
select person_id, group_concat(hobby_id order by hobby_id) as hobbies
from tbl_hobby
group by person_id
) p1
join
(
select person_id, group_concat(hobby_id order by hobby_id) as hobbies
from tbl_hobby
group by person_id
) p2 on p2.person_id > p1.person_id and p2.hobbies = p1.hobbies
order by person1, person2;
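The idea behind both queries (reduce each person to a canonical hobby list, then group people by that list) can also be sketched outside the database. The following is a plain Python illustration under that assumption, not the answer's SQL; the function name same_hobby_pairs is made up for the example, and the rows are the question's sample data.

```python
from collections import defaultdict
from itertools import combinations

# (person_id, hobby_id) rows from the question's dump.
ROWS = [
    (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2),
    (2, 3), (2, 4), (3, 1), (3, 2), (3, 3), (3, 4),
    (4, 1), (4, 3), (4, 4), (5, 1), (5, 5), (5, 9),
    (6, 2), (6, 3), (6, 4), (7, 1), (7, 3), (7, 7),
    (8, 2), (8, 3), (8, 4), (9, 1), (9, 2), (9, 3),
    (9, 4), (10, 1), (10, 5), (10, 9), (10, 11),
]


def same_hobby_pairs(rows):
    """Return sorted (person, person) pairs with identical hobby sets."""
    # Step 1: aggregate per person (the role of the inner group_concat).
    hobbies = defaultdict(set)
    for person_id, hobby_id in rows:
        hobbies[person_id].add(hobby_id)

    # Step 2: group persons by their sorted hobby list (the outer GROUP BY).
    groups = defaultdict(list)
    for person_id, hobby_set in hobbies.items():
        groups[tuple(sorted(hobby_set))].append(person_id)

    # Step 3: expand every group of two or more persons into pairs.
    pairs = []
    for persons in groups.values():
        pairs.extend(combinations(sorted(persons), 2))
    return sorted(pairs)


print(same_hobby_pairs(ROWS))
```

For the sample data this yields the four expected pairs: 2 and 6, 2 and 8, 6 and 8, 3 and 9.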
Alternative version, without using any proprietary string handling:
select distinct t1.person_id, t2.person_id
from tbl_hobby t1
join tbl_hobby t2
on t1.person_id < t2.person_id
where 2 = all (select count(*)
from tbl_hobby
where person_id in (t1.person_id, t2.person_id)
group by hobby_id);
Perhaps less efficient, but portable!
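To experiment with the portable version, here is a hedged sketch against an in-memory SQLite database from Python. As far as I know SQLite does not accept the "= ALL (subquery)" form, so the sketch rewrites the same condition as NOT EXISTS ... HAVING COUNT(*) <> 2; that rewrite, the column aliases p1 and p2, and the function name portable_pairs are assumptions of this example. Like the original, it relies on there being no duplicate (person_id, hobby_id) rows.

```python
import sqlite3

ROWS = [
    (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2),
    (2, 3), (2, 4), (3, 1), (3, 2), (3, 3), (3, 4),
    (4, 1), (4, 3), (4, 4), (5, 1), (5, 5), (5, 9),
    (6, 2), (6, 3), (6, 4), (7, 1), (7, 3), (7, 7),
    (8, 2), (8, 3), (8, 4), (9, 1), (9, 2), (9, 3),
    (9, 4), (10, 1), (10, 5), (10, 9), (10, 11),
]

# A pair matches when no hobby of either person occurs a number of times
# other than 2 across the two persons' rows.
PAIRS_SQL = """
SELECT DISTINCT t1.person_id AS p1, t2.person_id AS p2
FROM tbl_hobby t1
JOIN tbl_hobby t2 ON t1.person_id < t2.person_id
WHERE NOT EXISTS (
    SELECT 1
    FROM tbl_hobby
    WHERE person_id IN (t1.person_id, t2.person_id)
    GROUP BY hobby_id
    HAVING COUNT(*) <> 2
)
ORDER BY p1, p2
"""


def portable_pairs(rows):
    """Run the pair query on `rows` and return the matching pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl_hobby (person_id INTEGER, hobby_id INTEGER)")
    conn.executemany("INSERT INTO tbl_hobby VALUES (?, ?)", rows)
    result = conn.execute(PAIRS_SQL).fetchall()
    conn.close()
    return result


print(portable_pairs(ROWS))
```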

SQL How to use SUM and Group By with multiple tables?

I have three tables.
For each "id" value, I would like the sum of the col1 values, the sum of col2 values & the sum of col3 values listed separately. I am not summing across tables.
table a
num | id | col1
================
1 100 0
2 100 1
3 100 0
1 101 1
2 101 1
3 101 0
table b
idx | id | col2
=================
1 100 20
2 100 20
3 100 20
4 101 100
5 101 100
table c
idx | id | col3
==============================
1 100 1
2 100 1
3 100 1
4 101 10
5 101 1
I would like the results to look like this,
ID | sum_col1 | sum_col2 | sum_col3
====================================
100 1 60 3
101 2 200 11
Here is my query which runs too long and then times out. My tables are about 25,000 rows.
SELECT a.id as id,
SUM(a.col1) as sum_col1,
SUM(b.col2) as sum_col2,
SUM(c.col3) as sum_col3
FROM a, b, c
WHERE a.id=b.id
AND a.id=c.id
GROUP by id
Order by id desc
The number of rows in each table may be different, but the range of "id" values in each table is the same.
This appears to be a similar question, but I can't make it work,
Mysql join two tables sum, where and group by
Here is a solution based on your data. The issue with your query is that you were joining the tables on a non-unique column, which produces a Cartesian product of the matching rows for each id.
Data
DROP TABLE IF EXISTS A;
CREATE TABLE A
(num int,
id int,
col1 int);
INSERT INTO A VALUES (1, 100, 0);
INSERT INTO A VALUES (2, 100, 1);
INSERT INTO A VALUES (3, 100, 0);
INSERT INTO A VALUES (1, 101, 1);
INSERT INTO A VALUES (2, 101, 1);
INSERT INTO A VALUES (3 , 101, 0);
DROP TABLE IF EXISTS B;
CREATE TABLE B
(idx int,
id int,
col2 int);
INSERT INTO B VALUES (1, 100, 20);
INSERT INTO B VALUES (2, 100, 20);
INSERT INTO B VALUES (3, 100, 20);
INSERT INTO B VALUES (4, 101, 100);
INSERT INTO B VALUES (5, 101, 100);
DROP TABLE IF EXISTS C;
CREATE TABLE C
(idx int,
id int,
col3 int);
INSERT INTO C VALUES (1, 100, 1);
INSERT INTO C VALUES (2, 100, 1);
INSERT INTO C VALUES (3, 100, 1);
INSERT INTO C VALUES (4, 101, 10);
INSERT INTO C VALUES (5, 101, 1);
Solution
SELECT a_sum.id, col1_sum, col2_sum, col3_sum
FROM (SELECT id, SUM(col1) AS col1_sum
FROM a
GROUP BY id ) a_sum
JOIN
(SELECT id, SUM(col2) AS col2_sum
FROM b
GROUP BY id ) b_sum
ON (a_sum.id = b_sum.id)
JOIN
(SELECT id, SUM(col3) AS col3_sum
FROM c
GROUP BY id ) c_sum
ON (a_sum.id = c_sum.id);
Result is as expected
Note: Use outer joins if an id doesn't have to be present in all three tables.
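To make the Cartesian-product effect concrete, here is a small sketch that loads the sample data into an in-memory SQLite database (a stand-in for MySQL) and runs both the direct three-way join and the pre-aggregated join. The function names naive_sums and preaggregated_sums are assumptions for the example.

```python
import sqlite3

SETUP = """
CREATE TABLE a (num INT, id INT, col1 INT);
CREATE TABLE b (idx INT, id INT, col2 INT);
CREATE TABLE c (idx INT, id INT, col3 INT);
INSERT INTO a VALUES (1,100,0),(2,100,1),(3,100,0),(1,101,1),(2,101,1),(3,101,0);
INSERT INTO b VALUES (1,100,20),(2,100,20),(3,100,20),(4,101,100),(5,101,100);
INSERT INTO c VALUES (1,100,1),(2,100,1),(3,100,1),(4,101,10),(5,101,1);
"""

# Joining the raw tables on id multiplies the rows before summing.
NAIVE_SQL = """
SELECT a.id, SUM(a.col1), SUM(b.col2), SUM(c.col3)
FROM a JOIN b ON a.id = b.id JOIN c ON a.id = c.id
GROUP BY a.id ORDER BY a.id
"""

# Summing each table first leaves one row per id, so the join is 1:1:1.
PREAGG_SQL = """
SELECT a_sum.id, col1_sum, col2_sum, col3_sum
FROM (SELECT id, SUM(col1) AS col1_sum FROM a GROUP BY id) a_sum
JOIN (SELECT id, SUM(col2) AS col2_sum FROM b GROUP BY id) b_sum
  ON a_sum.id = b_sum.id
JOIN (SELECT id, SUM(col3) AS col3_sum FROM c GROUP BY id) c_sum
  ON a_sum.id = c_sum.id
ORDER BY a_sum.id
"""


def _run(sql):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


def naive_sums():
    return _run(NAIVE_SQL)


def preaggregated_sums():
    return _run(PREAGG_SQL)


print(naive_sums())
print(preaggregated_sums())
```

For id 100 the direct join produces 3 x 3 x 3 = 27 rows, so every sum is inflated ninefold (9, 540, 27 instead of 1, 60, 3), while the pre-aggregated version returns the expected totals.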
Maybe this will do?
I haven't had a chance to run it, but I think it can do the job.
SELECT sumA.id, sumA.sumCol1, sumB.sumCol2, sumC.sumCol3
FROM
(SELECT id, SUM(col1) AS sumCol1 FROM a GROUP BY id ORDER BY id ASC) AS sumA
JOIN (SELECT id, SUM(col2) AS sumCol2 FROM b GROUP BY id ORDER BY id ASC) AS sumB ON sumB.id = sumA.id
JOIN (SELECT id, SUM(col3) AS sumCol3 FROM c GROUP BY id ORDER BY id ASC) AS sumC ON sumC.id = sumB.id
;
EDIT
-- MySQL needs LEFT or RIGHT before OUTER JOIN (it has no FULL OUTER JOIN); LEFT keeps every id found in a
SELECT IF(sumA.id IS NOT NULL, sumA.id, IF(sumB.id IS NOT NULL, sumB.id, IF(sumC.id IS NOT NULL, sumC.id,''))) AS id,
sumA.sumCol1, sumB.sumCol2, sumC.sumCol3
FROM
(SELECT id, SUM(col1) AS sumCol1 FROM a GROUP BY id ORDER BY id ASC) AS sumA
LEFT OUTER JOIN (SELECT id, SUM(col2) AS sumCol2 FROM b GROUP BY id ORDER BY id ASC) AS sumB ON sumB.id = sumA.id
LEFT OUTER JOIN (SELECT id, SUM(col3) AS sumCol3 FROM c GROUP BY id ORDER BY id ASC) AS sumC ON sumC.id = sumA.id
;
I would do the summing first, then union the results, then pivot them round:
SELECT
id,
MAX(CASE WHEN which = 'a' then sumof end) as sum_a,
MAX(CASE WHEN which = 'b' then sumof end) as sum_b,
MAX(CASE WHEN which = 'c' then sumof end) as sum_c
FROM
(
SELECT id, sum(col1) as sumof, 'a' as which FROM a GROUP BY id
UNION ALL
SELECT id, sum(col2) as sumof, 'b' as which FROM b GROUP BY id
UNION ALL
SELECT id, sum(col3) as sumof, 'c' as which FROM c GROUP BY id
) a
GROUP BY id
You could also union, then sum:
SELECT
id,
SUM(CASE WHEN which = 'a' then v end) as sum_a,
SUM(CASE WHEN which = 'b' then v end) as sum_b,
SUM(CASE WHEN which = 'c' then v end) as sum_c
FROM
(
SELECT id, col1 as v, 'a' as which FROM a
UNION ALL
SELECT id, col2 as v, 'b' as which FROM b
UNION ALL
SELECT id, col3 as v, 'c' as which FROM c
) a
GROUP BY id
You can't easily use a join unless all tables have all values of ID, in which case you can sum them as subqueries and then join the results together. But if one of your tables suddenly lacks an id value that the other two tables have, that row disappears from your results (unless you use a full outer join and some really ugly coalescing in your ON clause).
Using a union in this case gives you a more missing-value-tolerant result set, as it can cope with missing values of ID in any table. Thus, we union the tables together into one dataset, but use a constant to track which table each value came from, so that we can pick it out into its own summation later.
If an id value is missing from one of the tables, the sum for that column will be null. If you want it to be 0, wrap the aggregate in COALESCE(..., 0).
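The missing-id behaviour can be checked with a short sketch in which table c has no rows for id 101. It uses an in-memory SQLite database from Python as a stand-in for MySQL; the reduced sample data and the function name pivoted_sums are assumptions made for this example.

```python
import sqlite3

SETUP = """
CREATE TABLE a (id INT, col1 INT);
CREATE TABLE b (id INT, col2 INT);
CREATE TABLE c (id INT, col3 INT);
INSERT INTO a VALUES (100, 0), (100, 1), (101, 1);
INSERT INTO b VALUES (100, 20), (101, 100), (101, 100);
INSERT INTO c VALUES (100, 1);  -- id 101 is deliberately missing here
"""

# Union the raw rows, tag each with its source table, then sum per id.
# COALESCE turns the NULL produced by a missing id into 0.
PIVOT_SQL = """
SELECT id,
       COALESCE(SUM(CASE WHEN which = 'a' THEN v END), 0) AS sum_a,
       COALESCE(SUM(CASE WHEN which = 'b' THEN v END), 0) AS sum_b,
       COALESCE(SUM(CASE WHEN which = 'c' THEN v END), 0) AS sum_c
FROM (
    SELECT id, col1 AS v, 'a' AS which FROM a
    UNION ALL
    SELECT id, col2 AS v, 'b' AS which FROM b
    UNION ALL
    SELECT id, col3 AS v, 'c' AS which FROM c
) u
GROUP BY id
ORDER BY id
"""


def pivoted_sums():
    """Return one (id, sum_a, sum_b, sum_c) row per id seen in any table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(PIVOT_SQL).fetchall()
    conn.close()
    return rows


print(pivoted_sums())
```

Id 101 still appears in the output, with 0 in the column for the table that lacks it, whereas an inner join of the three per-table sums would drop that id.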

Selecting all rows with only one value in column with another common value

my table:
drop table if exists new_table;
create table if not exists new_table(
obj_type int(4),
user_id varchar(30),
payer_id varchar(30)
);
insert into new_table (obj_type, user_id, payer_id) values
(1, 'user1', 'payer1'),
(1, 'user2', 'payer1'),
(2, 'user3', 'payer1'),
(1, 'user1', 'payer2'),
(1, 'user2', 'payer2'),
(2, 'user3', 'payer2'),
(3, 'user1', 'payer3'),
(3, 'user2', 'payer3');
I am trying to select all the payer id's whose obj_type is only one value and not any other values. In other words, even though each payer has multiple users, I only want the payers who are only using one obj_type.
I have tried using a query like this:
select * from new_table
where obj_type = 1
group by payer_id;
But this returns rows whose payers also have other users with other obj_types. I am trying to get a result that looks like:
obj | user | payer
----|-------|--------
3 | user1 | payer3
3 | user2 | payer3
Thanks in advance.
That is actually easy:
SELECT payer_id
FROM new_table
GROUP BY payer_id
HAVING COUNT(DISTINCT obj_type) = 1
HAVING filters rows just like WHERE, but it does so after the aggregation.
The difference is best explained by an example:
SELECT dept_id, SUM(salary)
FROM employees
WHERE salary > 100000
GROUP BY dept_id
This will give you, for each department, the sum of the salaries of the people who individually earn more than 100000.
SELECT dept_id, SUM(salary)
FROM employees
GROUP BY dept_id
HAVING SUM(salary) > 100000
The second query will give you the departments where all employees together earn more than 100000, even if no single employee earns that much.
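The difference between filtering before and after aggregation is easy to see on a tiny data set. The sketch below uses an in-memory SQLite database from Python; the employees rows and the function name run_query are invented for illustration.

```python
import sqlite3

# Hypothetical (dept_id, salary) rows, made up for this example.
EMPLOYEES = [(1, 60000), (1, 70000), (2, 120000), (2, 50000), (3, 30000)]

# WHERE drops individual rows before they are summed.
WHERE_SQL = """
SELECT dept_id, SUM(salary)
FROM employees
WHERE salary > 100000
GROUP BY dept_id
ORDER BY dept_id
"""

# HAVING drops whole groups after they have been summed.
HAVING_SQL = """
SELECT dept_id, SUM(salary)
FROM employees
GROUP BY dept_id
HAVING SUM(salary) > 100000
ORDER BY dept_id
"""


def run_query(sql, rows=EMPLOYEES):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (dept_id INTEGER, salary INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


print(run_query(WHERE_SQL))
print(run_query(HAVING_SQL))
```

Department 1 has no single salary above 100000, so the WHERE version omits it, but its total of 130000 passes the HAVING filter.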
If you want to return all rows without grouping them you can use analytic functions (this requires a database that allows DISTINCT inside a window aggregate, such as Oracle; MySQL 8.0 does not):
SELECT * FROM (
SELECT obj_type,user_id,
payer_id,
COUNT(DISTINCT obj_type) OVER (PARTITION BY payer_id) AS distinct_obj_type
FROM new_table) t
WHERE distinct_obj_type = 1
Or you can use IN with the grouped query above:
SELECT *
FROM new_table
WHERE payer_id IN (SELECT payer_id
FROM new_table
GROUP BY payer_id
HAVING COUNT(DISTINCT obj_type) = 1)
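Here is a minimal sketch of the IN variant run against the question's sample rows, using an in-memory SQLite database from Python as a stand-in for MySQL. The function name single_type_rows and the added ORDER BY (for a stable result order) are assumptions of the example.

```python
import sqlite3

# (obj_type, user_id, payer_id) rows from the question.
ROWS = [
    (1, "user1", "payer1"), (1, "user2", "payer1"), (2, "user3", "payer1"),
    (1, "user1", "payer2"), (1, "user2", "payer2"), (2, "user3", "payer2"),
    (3, "user1", "payer3"), (3, "user2", "payer3"),
]

# Keep every row whose payer uses exactly one distinct obj_type.
SQL = """
SELECT *
FROM new_table
WHERE payer_id IN (SELECT payer_id
                   FROM new_table
                   GROUP BY payer_id
                   HAVING COUNT(DISTINCT obj_type) = 1)
ORDER BY user_id
"""


def single_type_rows(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE new_table (obj_type INTEGER, user_id TEXT, payer_id TEXT)"
    )
    conn.executemany("INSERT INTO new_table VALUES (?, ?, ?)", rows)
    result = conn.execute(SQL).fetchall()
    conn.close()
    return result


print(single_type_rows(ROWS))
```

Only the two payer3 rows come back, matching the result table in the question.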

How to find the column having duplicate value in SQL Server

I have a table:
create table #t
(
ID int,
value nvarchar(5)
)
insert #t
values (1,'A'), (2, 'B'), (3, 'A'), (3, 'B')
Sample data:
ID value
------------
1 A
2 B
3 A
3 B
For my project I need the ID which has both of the values.
Result :
ID
3
Kindly help me out.
To get IDs having 2 values
select id
from #t
group by id
having count(distinct value) >= 2
or to get all IDs having A and B
select id
from #t
where value in ('A','B')
group by id
having count(distinct value) = 2
or to make it more generic to get IDs having all values
select id
from #t
group by id
having count(distinct value) = (select count(distinct value) from #t)
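The last two variants can be tried with a short sketch on the question's data. It uses an in-memory SQLite database from Python rather than SQL Server, so the temp table #t becomes an ordinary table named t; that name and the function name ids_for are assumptions of the example.

```python
import sqlite3

# (ID, value) rows from the question.
ROWS = [(1, "A"), (2, "B"), (3, "A"), (3, "B")]

# IDs that have both 'A' and 'B'.
BOTH_SQL = """
SELECT id FROM t
WHERE value IN ('A', 'B')
GROUP BY id
HAVING COUNT(DISTINCT value) = 2
ORDER BY id
"""

# IDs that have every distinct value present in the table.
ALL_SQL = """
SELECT id FROM t
GROUP BY id
HAVING COUNT(DISTINCT value) = (SELECT COUNT(DISTINCT value) FROM t)
ORDER BY id
"""


def ids_for(sql, rows=ROWS):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, value TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = [row[0] for row in conn.execute(sql)]
    conn.close()
    return result


print(ids_for(BOTH_SQL))
print(ids_for(ALL_SQL))
```

Both queries return only ID 3 here. They start to differ once a third value appears: the generic version then requires an ID to have all three.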

Getting x amount of rows where id = id(other table) with the highest amount of rows with that id

x = doesn't matter
table A
0 row0
1 row1
2 row2
table B
0 x
1 x
0 X
2 X
2 X
0 x
In this example there are 3 rows with 0, 2 rows with 2 and 1 row with 1.
I want to get, for example, the two rows of table A that have the highest number of matching rows in table B.
desired result:
0 row0 ==> because 3 rows in b is the highest amount.
2 row2 ==> because 2 rows in b is the second highest amount.
my attempt so far:
SELECT Id, Name FROM A
WHERE Id =
(
SELECT IdB FROM B
GROUP BY IdB
ORDER BY count(IdB) DESC
LIMIT 2
)
edit: i use mysql
thanks
You want to JOIN your tables together by id:
SELECT
A.id,
A.name
FROM
A join
B on A.id = B.id
GROUP BY
A.name
ORDER BY
count(B.id) desc
LIMIT 2
That will return an output that matches your example above.
Not sure how you want to handle ties.
What this does is JOIN the two tables together based on your IDs, then aggregate the results, counting the occurrences of B.IDB and displaying that count along with the name from table A:
SELECT A.Name, count(B.IDB) cnt
FROM A
INNER JOIN B
on A.ID = B.IDB
GROUP BY A.Name
ORDER BY count(B.IDB) desc
LIMIT 2
But with your example this should return
Name cnt
row0 3
row2 2
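The join-count-limit pattern can be checked against the question's example with a small sketch. It runs in an in-memory SQLite database from Python as a stand-in for MySQL; the column names id, name and idb, the function name top_two, and the extra A.id tie-breaker in ORDER BY are assumptions of the example.

```python
import sqlite3

A_ROWS = [(0, "row0"), (1, "row1"), (2, "row2")]
# Ids from table B in the question, in their original order.
B_IDS = [0, 1, 0, 2, 2, 0]

# Join, count matching B rows per A row, keep the two largest counts.
TOP_SQL = """
SELECT A.name, COUNT(B.idb) AS cnt
FROM A
INNER JOIN B ON A.id = B.idb
GROUP BY A.id, A.name
ORDER BY cnt DESC, A.id
LIMIT 2
"""


def top_two(a_rows=A_ROWS, b_ids=B_IDS):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE B (idb INTEGER)")
    conn.executemany("INSERT INTO A VALUES (?, ?)", a_rows)
    conn.executemany("INSERT INTO B VALUES (?)", [(i,) for i in b_ids])
    result = conn.execute(TOP_SQL).fetchall()
    conn.close()
    return result


print(top_two())
```

Without a tie-breaker such as A.id, which of two equally frequent ids makes it into the top two is left to the database.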
-- Sample table
-- table A schema
create table A (id int, name varchar(10))
insert into A
values (0, 'row0')
insert into A
values (1, 'row1')
insert into A
values (2, 'row2')
select * from A
-- table B schema
create table B (id int, name varchar(10))
insert into B
values (0, 'X')
insert into B
values (0, 'X')
insert into B
values (0, 'X')
insert into B
values (0, 'X')
insert into B
values (1, 'X')
insert into B
values (2, 'X')
insert into B
values (2, 'X')
insert into B
values (2, 'X')
insert into B
values (2, 'X')
insert into B
values (2, 'X')
insert into B
values (2, 'X')
select * from B
--Query
select Top 2 A.name, count(B.id) from A
inner join B on A.id = b.id
group by A.name
order by count(A.name) desc
You can try:
select a.name, b.count
from A a,
(select idB, count(idB) count from B group by idB) b
where a.id = b.idB
and count > 1