Select two items with maximum number of common values - mysql

I have the following table:
+----+-----------+-----------+
| id | teacherId | studentId |
+----+-----------+-----------+
| 1 | 1 | 4 |
| 2 | 1 | 2 |
| 3 | 1 | 1 |
| 4 | 1 | 3 |
| 5 | 2 | 2 |
| 6 | 2 | 1 |
| 7 | 2 | 3 |
| 8 | 3 | 9 |
| 9 | 3 | 6 |
| 10 | 1 | 6 |
+----+-----------+-----------+
I need a query to find the two teacherIds with the maximum number of common studentIds.
In this case, teachers 1 and 2 share the students with studentIds 2, 1, and 3, which is more than teachers 1 and 3, who share only student 6.
Thanks in Advance!
[Edit]: After several hours I came up with the following solution:
SELECT * FROM (
SELECT r1tid, r2tid, COUNT(r2tid) AS cnt
FROM (
SELECT r1.teacherId AS r1tid, r2.teacherId AS r2tid
FROM table r1
INNER JOIN table r2 ON r1.studentId=r2.studentId AND r1.teacherId!=r2.teacherId
ORDER BY r1tid
) t
GROUP BY r1tid, r2tid
ORDER BY cnt DESC
) t GROUP BY cnt ORDER BY cnt DESC LIMIT 1;
I am sure a shorter, more elegant solution must exist, but I could not find it.

You would do this with a self-join. Assuming no duplicates in the table:
select t.teacherid, t2.teacherid, count(*) as NumStudentsInCommon
from table t join
table t2
on t.studentid = t2.studentid and
t.teacherid < t2.teacherid
group by t.teacherid, t2.teacherid
order by NumStudentsInCommon desc
limit 1;
If you had duplicates, you would just replace count(*) with count(distinct t.studentid) (the column has to be qualified, because both t and t2 have a studentid), but count(distinct) requires a bit more work.
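The self-join can be checked against the sample data from the question. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL; the table name teaches and the function name top_teacher_pair are assumptions made for illustration (the answer uses the placeholder name table).

```python
import sqlite3

# Sample rows from the question: (id, teacherId, studentId)
ROWS = [
    (1, 1, 4), (2, 1, 2), (3, 1, 1), (4, 1, 3), (5, 2, 2),
    (6, 2, 1), (7, 2, 3), (8, 3, 9), (9, 3, 6), (10, 1, 6),
]


def top_teacher_pair(rows):
    """Return (teacherA, teacherB, shared_count) for the pair sharing the most students."""
    conn = sqlite3.connect(":memory:")
    # "teaches" is a hypothetical table name; the original post calls it "table".
    conn.execute(
        "CREATE TABLE teaches (id INTEGER PRIMARY KEY, teacherId INTEGER, studentId INTEGER)"
    )
    conn.executemany("INSERT INTO teaches VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT t.teacherId, t2.teacherId, COUNT(*) AS NumStudentsInCommon
        FROM teaches t
        JOIN teaches t2
          ON t.studentId = t2.studentId
         AND t.teacherId < t2.teacherId   -- count each pair once, never pair a teacher with itself
        GROUP BY t.teacherId, t2.teacherId
        ORDER BY NumStudentsInCommon DESC
        LIMIT 1
        """
    ).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    print(top_teacher_pair(ROWS))
```

With the sample rows this returns teachers 1 and 2 with 3 students in common. The t.teacherId < t2.teacherId condition is what keeps the pair (2, 1) from being counted separately from (1, 2).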

select t.teacherId, t2.teacherId, count(*) as NumStudentsInCommon
from table1 t join
table1 t2
on t.studentId = t2.studentId and
t.teacherId < t2.teacherId
group by t.teacherId, t2.teacherId
order by NumStudentsInCommon desc

Related

How to calculate max values of groups?

I have a table like so (I'm not sure how to format tables)
Category / Products / Purchases
1 | A | 12
1 | B | 13
1 | C | 11
2 | A | 1
2 | B | 2
2 | C | 3
Expected output:
1 | B | 13
2 | C | 3
However I keep on getting
1 | A | 13
2 | A | 3
i.e., it just selects the first occurrence of the second column.
Here is my code:
SELECT Category, Products, MAX(Purchases) FROM myTable GROUP BY Category;
Use filtering in the where clause:
select t.*
from t
where t.purchases = (select max(t2.purchases) from t t2 where t2.category = t.category);
With NOT EXISTS:
select m.* from myTable m
where not exists (
select 1 from myTable
where category = m.category and purchases > m.purchases
)
Results:
| Category | Products | Purchases |
| -------- | -------- | --------- |
| 1 | B | 13 |
| 2 | C | 3 |
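Both filtering approaches above can be tried on the question's data. This is a rough sketch using Python's built-in sqlite3 module in place of MySQL; the helper name run and the two query constants are assumptions for illustration, and an ORDER BY was added only to make the row order predictable.

```python
import sqlite3

# Sample rows from the question: (Category, Products, Purchases)
ROWS = [(1, "A", 12), (1, "B", 13), (1, "C", 11), (2, "A", 1), (2, "B", 2), (2, "C", 3)]

# Keep a row when its Purchases equals the maximum of its own category.
CORRELATED_MAX = """
    SELECT m.Category, m.Products, m.Purchases
    FROM myTable m
    WHERE m.Purchases = (SELECT MAX(m2.Purchases)
                         FROM myTable m2
                         WHERE m2.Category = m.Category)
    ORDER BY m.Category
"""

# Keep a row when no other row of the same category has more Purchases.
NOT_EXISTS = """
    SELECT m.Category, m.Products, m.Purchases
    FROM myTable m
    WHERE NOT EXISTS (SELECT 1
                      FROM myTable m2
                      WHERE m2.Category = m.Category
                        AND m2.Purchases > m.Purchases)
    ORDER BY m.Category
"""


def run(query, rows=ROWS):
    """Load the sample rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (Category INTEGER, Products TEXT, Purchases INTEGER)")
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run(CORRELATED_MAX))
    print(run(NOT_EXISTS))
```

Both queries return the full row (1, B, 13) and (2, C, 3), because the product is taken from the same row as the maximum instead of being picked arbitrarily by GROUP BY. If two products tie for the maximum in a category, both forms return both rows.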
You can use row_number() (MySQL 8.0 or later) to pick the row with the maximum purchases in each category, or replace row_number() with rank() if rows that tie for the maximum should all be returned. Partition by category only; partitioning by products as well would put every row in its own group:
Select Category, Products, Purchases
from (Select Category,
Products,
Purchases,
row_number() over (partition by category
order by purchases desc) rn
from myTable) t
where t.rn = 1

select count only showing 1 result and the wrong one

I want to search TABLE1 and count which number_id has the most 5's in experience column.
TABLE1
+-------------+------------+
| number_id | experience |
+-------------+------------+
| 20 | 5 |
| 20 | 5 |
| 19 | 1 |
| 18 | 2 |
| 15 | 3 |
| 13 | 1 |
| 10 | 5 |
+-------------+------------+
So in this case it would be number_id=20
Then do an inner join on TABLE2 and map the number that matches the number_id in TABLE1.
TABLE2
+-------------+------------+
| id | number |
+-------------+------------+
| 20 | 000000000 |
| 29 | 012345678 |
| 19 | 123456789 |
| 18 | 223456789 |
| 15 | 345678910 |
| 13 | 123457898 |
| 10 | 545678910 |
+-------------+------------+
So the result would be:
000000000 (2 results of 5)
545678910 (1 result of 5)
So far I have:
SELECT number, experience, number_id, COUNT(*) AS SUM FROM TABLE1
INNER JOIN TABLE2 ON TABLE1.number_id = TABLE2.id
WHERE experience = '5' order by SUM LIMIT 10
But it's returning just
545678910
How can I get it to return both results and by order of number of instances of 5 in the experience column?
Thanks
This query will give you the results that you want. The subquery fetches all the number_id that have experience values of 5. The SUM(experience=5) works because MySQL uses a value of 1 for true and 0 for false. The results of the subquery are then joined to table2 to give the number field. Finally the results are ordered by the number of experience=5:
SELECT t2.number, t1.num_fives
FROM (SELECT number_id, SUM(experience = 5) AS num_fives
FROM table1
WHERE experience = 5
GROUP BY number_id) t1
JOIN table2 t2
ON t2.id = t1.number_id
ORDER BY num_fives DESC
Output:
number num_fives
000000000 2
545678910 1
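The counting query can be reproduced with the question's two tables. Below is a small sketch that runs it through Python's built-in sqlite3 module, which (like MySQL) evaluates a comparison to 1 or 0; the function name count_fives is an assumption for illustration.

```python
import sqlite3

# Sample rows from the question
TABLE1 = [(20, 5), (20, 5), (19, 1), (18, 2), (15, 3), (13, 1), (10, 5)]  # (number_id, experience)
TABLE2 = [
    (20, "000000000"), (29, "012345678"), (19, "123456789"), (18, "223456789"),
    (15, "345678910"), (13, "123457898"), (10, "545678910"),
]  # (id, number); number is stored as text so leading zeros survive


def count_fives():
    """Return (number, num_fives) pairs, most fives first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (number_id INTEGER, experience INTEGER)")
    conn.execute("CREATE TABLE table2 (id INTEGER, number TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", TABLE1)
    conn.executemany("INSERT INTO table2 VALUES (?, ?)", TABLE2)
    rows = conn.execute(
        """
        SELECT t2.number, t1.num_fives
        FROM (SELECT number_id, SUM(experience = 5) AS num_fives  -- each match adds 1
              FROM table1
              WHERE experience = 5
              GROUP BY number_id) t1
        JOIN table2 t2
          ON t2.id = t1.number_id
        ORDER BY t1.num_fives DESC
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for number, num_fives in count_fives():
        print(number, num_fives)
```

Because the WHERE clause already keeps only the rows with experience = 5, SUM(experience = 5) gives the same number as COUNT(*) here. The SUM form becomes useful when the WHERE filter is dropped and ids with zero fives should still appear.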
Add a group by clause:
SELECT number, experience, number_id, COUNT(*) AS SUM
FROM TABLE1
JOIN TABLE2 ON TABLE1.number_id = TABLE2.id
WHERE experience = '5'
GROUP BY 1, 2, 3 -- <<< Added this clause
ORDER BY SUM DESC -- largest count first
LIMIT 10

how to select all of duplicate record in mysql

My records are:
name | id | AVG(point) as point
a | 1 | 6
b | 2 | 6
c | 3 | 5
d | 4 | 5
e | 5 | 4
f | 6 | 3
g | 7 | 2
How can I select the records below?
1. I want to select the records with the top 3 point values. The result should be:
name | id | AVG(point) as point
a | 1 | 6
b | 2 | 6
c | 3 | 5
d | 4 | 5
e | 5 | 4
2. I want to select the records that are not in the top 3. The result should be:
name | id | AVG(point) as point
f | 6 | 3
g | 7 | 2
How can I do this?
There are several ways to do this. Here are a couple using in and not in. MySQL does not allow limit directly inside an in subquery, so the limited query is wrapped in a derived table.
For the top 3, you can use in:
select *
from yourtable
where point in (select point
from (select distinct point
from yourtable
order by 1 desc
limit 3) as top3)
For the rest, use not in instead:
select *
from yourtable
where point not in (select point
from (select distinct point
from yourtable
order by 1 desc
limit 3) as top3)
Other methods include exists with not exists and distinct with joins.
select *
from yourtable as t1
inner join (select distinct point
from yourtable
order by 1 desc
limit 3) as t2
on t1.point = t2.point
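For the second part of your question, use a left join against the same derived table and keep the rows where t2.point is null. Simply dropping desc is not enough: that selects the three lowest point values, which with the sample data would also return e.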
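The join-based approach for both parts of the question can be sketched as follows, using Python's built-in sqlite3 module as a stand-in for MySQL. The query constants TOP3 and REST and the helper run are assumptions for illustration, and yourtable is filled with the already-averaged values shown in the question.

```python
import sqlite3

# Sample rows from the question: (name, id, point)
ROWS = [("a", 1, 6), ("b", 2, 6), ("c", 3, 5), ("d", 4, 5), ("e", 5, 4), ("f", 6, 3), ("g", 7, 2)]

# Rows whose point is one of the three highest distinct point values.
TOP3 = """
    SELECT t1.name, t1.id, t1.point
    FROM yourtable AS t1
    INNER JOIN (SELECT DISTINCT point
                FROM yourtable
                ORDER BY point DESC
                LIMIT 3) AS t2
      ON t1.point = t2.point
    ORDER BY t1.id
"""

# Rows that found no match among the three highest distinct point values.
REST = """
    SELECT t1.name, t1.id, t1.point
    FROM yourtable AS t1
    LEFT JOIN (SELECT DISTINCT point
               FROM yourtable
               ORDER BY point DESC
               LIMIT 3) AS t2
      ON t1.point = t2.point
    WHERE t2.point IS NULL
    ORDER BY t1.id
"""


def run(query, rows=ROWS):
    """Load the sample rows into an in-memory table and return the names selected."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourtable (name TEXT, id INTEGER, point INTEGER)")
    conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?)", rows)
    names = [name for name, _id, _point in conn.execute(query)]
    conn.close()
    return names


if __name__ == "__main__":
    print(run(TOP3))
    print(run(REST))
```

The first query returns a through e (five rows, because the point values 6 and 5 each occur twice), and the second returns f and g. Limiting on distinct point values rather than on rows is what keeps tied records together.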

MySQL group by with MAX not working as expected?

I have a table:
ID | User | Amount
1 | 1 | 50
2 | 1 | 80
3 | 2 | 80
4 | 2 | 100
5 | 1 | 90
6 | 1 | 120
7 | 2 | 120
8 | 1 | 150
9 | 2 | 300
I do a query:
SELECT * FROM TABLE ORDER BY amount DESC group by userid
I'm getting this:
ID | User | Amount
1 | 1 | 50
2 | 1 | 80
But I was expecting:
ID | User | Amount
9 | 2 | 300
8 | 1 | 150
What is wrong with my sql?
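When grouping, you have to use aggregate functions like max() for all columns that are not in the group by clause. To get the whole row that holds each user's highest amount, join the table to its per-user maximum: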
select t.*
from table t
inner join
(
SELECT userid, max(amount) as total
FROM TABLE
group by userid
) x on x.userid = t.userid and x.total = t.amount
ORDER BY t.amount DESC
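The join against the per-user maximum can be verified on the question's rows. This is a minimal sketch using Python's built-in sqlite3 module instead of MySQL; the table name payments and the function name highest_per_user are assumptions for illustration (the answer uses the placeholder name table).

```python
import sqlite3

# Sample rows from the question: (id, userid, amount)
ROWS = [
    (1, 1, 50), (2, 1, 80), (3, 2, 80), (4, 2, 100), (5, 1, 90),
    (6, 1, 120), (7, 2, 120), (8, 1, 150), (9, 2, 300),
]


def highest_per_user(rows):
    """Return the full row holding each user's highest amount, largest amount first."""
    conn = sqlite3.connect(":memory:")
    # "payments" is a hypothetical table name chosen for this sketch.
    conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, userid INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT t.id, t.userid, t.amount
        FROM payments t
        INNER JOIN (SELECT userid, MAX(amount) AS total
                    FROM payments
                    GROUP BY userid) x
          ON x.userid = t.userid
         AND x.total = t.amount      -- keep only the row that carries the maximum
        ORDER BY t.amount DESC
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(highest_per_user(ROWS))
```

For the sample data this yields row 9 (user 2, amount 300) followed by row 8 (user 1, amount 150), matching the expected output in the question. A user whose maximum amount occurs in more than one row would get one result row per occurrence.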
Another solution:
Using FIND_IN_SET clause
SELECT
ua.*
FROM user_amount ua
WHERE FIND_IN_SET(ua.amount,(SELECT
MAX(ua1.amount)
FROM user_amount ua1
WHERE ua1.user = ua.user)) > 0
ORDER BY amount desc;
Using IN clause
SELECT
ua.*
FROM user_amount ua
WHERE ua.amount IN (SELECT
MAX(ua1.amount)
FROM user_amount ua1
WHERE ua1.user = ua.user)
ORDER BY amount desc

How to select only the latest rows for each user?

My table looks like this:
id | user_id | period_id | completed_on
----------------------------------------
1 | 1 | 1 | 2010-01-01
2 | 2 | 1 | 2010-01-10
3 | 3 | 1 | 2010-01-13
4 | 1 | 2 | 2011-01-01
5 | 2 | 2 | 2011-01-03
6 | 2 | 3 | 2012-01-13
... | ... | ... | ...
I want to select only the latest period entry for each user, bearing in mind that users will not all have the same period entries.
Essentially (assuming all I have is the above table) I want to get this:
id | user_id | period_id | completed_on
----------------------------------------
3 | 3 | 1 | 2010-01-13
4 | 1 | 2 | 2011-01-01
6 | 2 | 3 | 2012-01-13
Both of the queries below always return the first user_id occurrence, not the latest (because, from what I understand, the ordering happens after the rows are selected):
SELECT
DISTINCT user_id,
period_id,
completed_on
FROM my_table
ORDER BY
user_id ASC,
period_id DESC
SELECT *
FROM my_table
GROUP BY user_id
ORDER BY
user_id ASC,
period_id DESC
Seems like this should work using MAX and a subquery:
SELECT t.Id, t.User_Id, t.Period_Id, t.Completed_On
FROM my_table t
JOIN (SELECT Max(completed_on) Max_Completed_On, User_Id
FROM my_table
GROUP BY User_Id
) t2 ON
t.User_Id = t2.User_Id AND t.Completed_On = t2.Max_Completed_On
However, if you potentially have multiple records where the completed_on date is the same per user, then this could return multiple records. Depending on your needs, potentially adding a MAX(Id) in your subquery and joining on that would work.
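One way the MAX(Id) tie-break could look is sketched below, again with Python's built-in sqlite3 module standing in for MySQL. This is only one possible reading of the suggestion: the inner query finds each user's latest completed_on, the middle query takes the highest id among the rows on that date, and the outer query fetches those rows. The function name latest_per_user, the derived-table aliases, and the extra row with id 7 (added to create a tie for user 2) are assumptions and not part of the original data.

```python
import sqlite3

# Sample rows from the question: (id, user_id, period_id, completed_on)
ROWS = [
    (1, 1, 1, "2010-01-01"), (2, 2, 1, "2010-01-10"), (3, 3, 1, "2010-01-13"),
    (4, 1, 2, "2011-01-01"), (5, 2, 2, "2011-01-03"), (6, 2, 3, "2012-01-13"),
    (7, 2, 4, "2012-01-13"),  # hypothetical extra row: same date as id 6, to force a tie
]


def latest_per_user(rows):
    """Return exactly one latest row per user, breaking date ties by the highest id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER PRIMARY KEY, user_id INTEGER, "
        "period_id INTEGER, completed_on TEXT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT t.id, t.user_id, t.period_id, t.completed_on
        FROM my_table t
        JOIN (SELECT m.user_id, MAX(m.id) AS max_id          -- tie-break on id
              FROM my_table m
              JOIN (SELECT user_id, MAX(completed_on) AS max_completed_on
                    FROM my_table
                    GROUP BY user_id) latest
                ON latest.user_id = m.user_id
               AND latest.max_completed_on = m.completed_on
              GROUP BY m.user_id) pick
          ON pick.max_id = t.id
        ORDER BY t.id
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_per_user(ROWS):
        print(row)
```

With the added tie, user 2 gets only the row with id 7 instead of both id 6 and id 7, while users 3 and 1 keep rows 3 and 4. Dates are stored as ISO 8601 text in this sketch so that MAX compares them in chronological order.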
Try this:
SELECT t.Id, t.User_Id, t.Period_Id, t.Completed_On
FROM table1 t
JOIN (SELECT Max(completed_on) Max_Completed_On, t.User_Id
FROM table1 t
GROUP BY t.User_ID) t2 ON t.User_Id = t2.User_Id AND t.Completed_On = t2.Max_Completed_On