MySQL Relational Division with multiple IDs - mysql

Please take a look at the question described here: MySQL ONLY IN() equivalent clause, regarding Relational Division in MySQL.
My database structure is very similar to the one described, but in the "Chocolate Boys Table", I have an additional ID field - let's call it milk ID.
Chocolates Boys Table
+----+--------------+---------+--------+
| id | chocolate_id | milk_id | boy_id |
+----+--------------+---------+--------+
|  1 |         1000 |    2000 |  10007 |
|  2 |         1003 |    2001 |  10007 |
|  3 |         1006 |    2005 |  10007 |
|  4 |         1000 |    2001 |  10009 |
|  5 |         1001 |    2000 |  10009 |
|  6 |         1005 |    2008 |  10009 |
+----+--------------+---------+--------+
The objective is to run a query that retrieves the boy ID that contains the exact chocolate and milk IDs that I pass in. Here are some examples of my expected results:
Example #1:
Chocolate IDs Passed In (in order) - 1000,1003,1006.
Milk IDs Passed In (in order) - 2000,2001,2005.
Expected Result: Query returns boy ID of 10007.
Example #2:
Chocolate IDs Passed In (in order) - 1000,1003.
Milk IDs Passed In (in order) - 2000,2001.
Expected Result: Empty result set.
Example #3:
Chocolate IDs Passed In (in order) - 1003,1000,1006.
Milk IDs Passed In (in order) - 2000,2001,2005.
Expected Result: Empty result set - The passed in IDs are included in boy ID 10007, but the order is wrong. The values of Chocolate ID and Milk ID don't match up if examined on a row by row basis.
I am attempting to use a slightly modified version of John Woo's solution in order to incorporate the added ID field:
SELECT boy_id
FROM boys_chocolates a
WHERE chocolate_id IN (1003,1000,1006) AND milk_id IN (2000,2001,2005) AND
EXISTS
(
SELECT 1
FROM boys_chocolates b
WHERE a.boy_ID = b.boy_ID
GROUP BY boy_id
HAVING COUNT(DISTINCT chocolate_id) = 3
)
GROUP BY boy_id
HAVING COUNT(*) = 3
The problem that I'm having is that the IN function does not enforce order, as seen in example #3. I would like the above query to return an empty result set. What needs to be changed in order to address this problem? Thank you!

Try this approach:
SELECT a.boy_id
FROM
(SELECT id, boy_id FROM boys_chocolates WHERE chocolate_id = 1000) a
JOIN
(
(SELECT id, boy_id FROM boys_chocolates WHERE chocolate_id = 1003) b,
(SELECT id, boy_id FROM boys_chocolates WHERE chocolate_id = 1006) c,
(SELECT id, boy_id FROM boys_chocolates WHERE milk_id = 2000) d,
(SELECT id, boy_id FROM boys_chocolates WHERE milk_id = 2001) e,
(SELECT id, boy_id FROM boys_chocolates WHERE milk_id = 2005) f
)
ON a.boy_id = b.boy_id AND a.boy_id = c.boy_id AND a.boy_id = d.boy_id
AND a.boy_id = e.boy_id AND a.boy_id = f.boy_id AND b.id > a.id
AND c.id > b.id AND e.id > d.id AND f.id > e.id;
Replace 1000, 1003, 1006 with your first, second, and third chocolate_id respectively, and 2000, 2001, 2005 with your first, second, and third milk_id. The id comparisons assume the rows were inserted in the same order as the IDs you pass in.
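An alternative that checks the (chocolate_id, milk_id) pairs row by row, instead of relying on id order, is to load the wanted pairs into a helper table and join on both columns. This is only a sketch and not part of the answer above: it uses Python's sqlite3 module as a stand-in for MySQL, the helper table wanted and the function boys_with_exact_pairs are hypothetical names, and it assumes the wanted pairs are distinct and that a boy never has the same pair twice.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE boys_chocolates ("
    "id INTEGER PRIMARY KEY, chocolate_id INTEGER, milk_id INTEGER, boy_id INTEGER)"
)
conn.executemany(
    "INSERT INTO boys_chocolates VALUES (?, ?, ?, ?)",
    [
        (1, 1000, 2000, 10007),
        (2, 1003, 2001, 10007),
        (3, 1006, 2005, 10007),
        (4, 1000, 2001, 10009),
        (5, 1001, 2000, 10009),
        (6, 1005, 2008, 10009),
    ],
)


def boys_with_exact_pairs(pairs):
    """Return boy_ids whose rows are exactly the given (chocolate_id, milk_id) pairs."""
    # Hypothetical helper table holding the pairs that were passed in.
    conn.execute("CREATE TEMP TABLE wanted (chocolate_id INTEGER, milk_id INTEGER)")
    conn.executemany("INSERT INTO wanted VALUES (?, ?)", pairs)
    rows = conn.execute(
        """
        SELECT bc.boy_id
        FROM boys_chocolates AS bc
        LEFT JOIN wanted AS w
               ON w.chocolate_id = bc.chocolate_id
              AND w.milk_id = bc.milk_id
        GROUP BY bc.boy_id
        -- the boy has as many rows as there are wanted pairs ...
        HAVING COUNT(*) = (SELECT COUNT(*) FROM wanted)
        -- ... and every one of his rows matched a wanted pair
           AND COUNT(w.chocolate_id) = (SELECT COUNT(*) FROM wanted)
        ORDER BY bc.boy_id
        """
    ).fetchall()
    conn.execute("DROP TABLE wanted")
    return [boy_id for (boy_id,) in rows]


example_1 = boys_with_exact_pairs([(1000, 2000), (1003, 2001), (1006, 2005)])
example_2 = boys_with_exact_pairs([(1000, 2000), (1003, 2001)])
example_3 = boys_with_exact_pairs([(1003, 2000), (1000, 2001), (1006, 2005)])
print(example_1, example_2, example_3)
```

Because the join is on both columns at once, a chocolate paired with the wrong milk simply fails to match, which is what example #3 asks for.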

Related

How to write a multiple query solution as a single query in MySQL

My task is to find all subjects, by their id, that have at least one but the fewest lowest passing grades in the database (the lowest passing grade being 6). I've managed to write the solution with three queries, however my task is to write it as a single query in MySQL. Thank you in advance.
-- 1. single query "solution"
SELECT subject_id FROM (SELECT subject_id, COUNT(*) AS six_count
FROM exams WHERE grade = 6
GROUP BY subject_id) AS sixes
WHERE six_count = (SELECT MIN(six_count) FROM sixes);
-- 2. multiple queries solution
CREATE TABLE sixes AS (SELECT subject_id, COUNT(*) AS six_count
FROM exams WHERE grade = 6
GROUP BY subject_id);
SELECT subject_id FROM sixes
WHERE six_count = (SELECT MIN(six_count) FROM sixes);
DROP TABLE sixes;
EDIT:
Exams table example:
| subject_id | student_id | exam_year | exam_mark | grade | exam_date |
| 1 | 20100022| 2011 | 'apr' | 10 | 2011-04-11 |
| 2 | 20100055| 2011 | 'oct' | 6 | 2011-10-04 |
| 3 | 20110030| 2011 | 'jan1' | 7 | 2011-01-26 |
| 5 | 20110055| 2011 | 'jan2' | 6 | 2011-02-13 |
| 5 | 20110001| 2011 | 'jun1' | 8 | 2011-06-23 |
This should do the trick. The subquery returns the lowest number of sixes that any subject has. The main query selects all subjects with that number. The trick is in ORDER BY count(*) LIMIT 1, which makes the subquery return the record with the lowest count.
SELECT
subject_id,
count(*) as six_count
FROM exams
WHERE grade = 6
GROUP BY subject_id
HAVING count(*) =
( SELECT count(*)
FROM exams
WHERE grade = 6
GROUP BY subject_id
ORDER BY count(*)
LIMIT 1
)
This pattern should do the trick. Generalized names.
SELECT subjectID
FROM TEST_DATA
WHERE grade = 6
GROUP
BY SubjectID
HAVING COUNT(1) =
( SELECT count(1) AS minCount
FROM TEST_DATA
WHERE grade = 6
GROUP
BY subjectID
ORDER
BY minCount
LIMIT 1
);
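If your MySQL version supports common table expressions (8.0 and later), the original idea of naming the intermediate result sixes and reading it twice can be kept as a single statement. A minimal sketch, not taken from the answers above: it runs the query through Python's sqlite3 module as a stand-in for MySQL, the exams table is reduced to three columns, and the two rows for subject 3 with grade 6 are hypothetical additions so that the minimum actually filters something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exams (subject_id INTEGER, student_id INTEGER, grade INTEGER)"
)
conn.executemany(
    "INSERT INTO exams VALUES (?, ?, ?)",
    [
        # rows from the example table in the question
        (1, 20100022, 10),
        (2, 20100055, 6),
        (3, 20110030, 7),
        (5, 20110055, 6),
        (5, 20110001, 8),
        # hypothetical extra rows: subject 3 gets two sixes
        (3, 20119901, 6),
        (3, 20119902, 6),
    ],
)

query = """
WITH sixes AS (
    SELECT subject_id, COUNT(*) AS six_count
    FROM exams
    WHERE grade = 6
    GROUP BY subject_id
)
SELECT subject_id
FROM sixes
WHERE six_count = (SELECT MIN(six_count) FROM sixes)
ORDER BY subject_id
"""

# Subjects 2 and 5 have one six each, subject 3 has two.
fewest_sixes = [subject_id for (subject_id,) in conn.execute(query)]
print(fewest_sixes)
```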

leetcode 574 winning candidate query

Please see the picture for ERROR SCREENSHOT
Table: Candidate
+-----+---------+
| id | Name |
+-----+---------+
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
| 5 | E |
+-----+---------+
Table: Vote
+-----+--------------+
| id | CandidateId |
+-----+--------------+
| 1 | 2 |
| 2 | 4 |
| 3 | 3 |
| 4 | 2 |
| 5 | 5 |
+-----+--------------+
id is the auto-increment primary key, CandidateId is the id appeared in Candidate table.
Write a sql to find the name of the winning candidate, the above example will return the winner B.
+------+
| Name |
+------+
| B |
+------+
Notes:
You may assume there is no tie, in other words there will be at most one winning candidate.
Why doesn't this code work? I am trying to solve it without using LIMIT.
SELECT c.Name AS Name
FROM Candidate AS c
JOIN
(SELECT r.CandidateId AS can, MAX(r.Total_vote) AS big
FROM (SELECT CandidateId, COUNT(id) AS Total_vote
FROM Vote
GROUP BY CandidateId) AS r) AS v
ON c.id = v.can;
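In your query, here: SELECT r.CandidateId AS can, MAX(r.Total_vote) AS big
you select the non-aggregated column r.CandidateId next to the MAX aggregate function without a GROUP BY, which is not correct standard SQL. MySQL only accepts it when ONLY_FULL_GROUP_BY is disabled, and then the CandidateId it returns is not guaranteed to come from the row with the maximum.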
Try:
SELECT Candidate.* FROM Candidate
JOIN (
SELECT CandidateId, COUNT(id) AS Total_vote
FROM Vote
GROUP BY CandidateId
ORDER BY COUNT(id) DESC LIMIT 1
) v
ON Candidate.id = v.CandidateId
This is a join/group by query with order by:
select c.name
from candidate c join
vote v
on v.candidateid = c.id
group by c.id, c.name
order by count(*) desc
limit 1;
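To check the join/group by query against the sample tables from the question, here is a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption, LeetCode runs MySQL), with the Candidate and Vote rows copied from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Candidate (id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Vote (id INTEGER PRIMARY KEY, CandidateId INTEGER)")
conn.executemany(
    "INSERT INTO Candidate VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E")],
)
conn.executemany(
    "INSERT INTO Vote VALUES (?, ?)",
    [(1, 2), (2, 4), (3, 3), (4, 2), (5, 5)],
)

# One row per candidate with votes, most votes first, keep only the top row.
winner = conn.execute(
    """
    SELECT c.Name
    FROM Candidate AS c
    JOIN Vote AS v ON v.CandidateId = c.id
    GROUP BY c.id, c.Name
    ORDER BY COUNT(*) DESC
    LIMIT 1
    """
).fetchone()[0]
print(winner)
```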
SELECT c.Name AS Name
FROM Candidate AS c JOIN (SELECT r.CandidateId AS can
FROM
(SELECT CandidateId, COUNT(id) AS Total_vote
FROM Vote
GROUP BY CandidateId) AS r
WHERE r.Total_vote = (SELECT MAX(r.Total_vote) FROM (SELECT
CandidateId, COUNT(id) AS Total_vote
FROM Vote
GROUP BY CandidateId) r)) AS v
ON c.id = v.can;
This is my updated code.
My original code had two errors. The first is that using an aggregate such as MAX requires a GROUP BY clause if there are any non-aggregated columns in the select list; I am not sure why my previous code still ran without an error. Maybe the system adds the GROUP BY automatically when it runs.
The second is that MAX can't be used with GROUP BY in this format.
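For a shorter variant that avoids LIMIT, one option is to compare each candidate's vote count with the maximum count in a HAVING clause. A minimal sketch on the sample data from the question, run through Python's sqlite3 module as a stand-in for MySQL; the alias per_candidate is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Candidate (id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Vote (id INTEGER PRIMARY KEY, CandidateId INTEGER)")
conn.executemany(
    "INSERT INTO Candidate VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E")],
)
conn.executemany(
    "INSERT INTO Vote VALUES (?, ?)",
    [(1, 2), (2, 4), (3, 3), (4, 2), (5, 5)],
)

# Keep only the candidates whose vote count equals the highest vote count.
winners = [
    name
    for (name,) in conn.execute(
        """
        SELECT c.Name
        FROM Candidate AS c
        JOIN Vote AS v ON v.CandidateId = c.id
        GROUP BY c.id, c.Name
        HAVING COUNT(*) = (
            SELECT MAX(total_vote)
            FROM (
                SELECT COUNT(*) AS total_vote
                FROM Vote
                GROUP BY CandidateId
            ) AS per_candidate
        )
        """
    )
]
print(winners)
```

Unlike the LIMIT version, this form would return every tied candidate if a tie were possible.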

Difference between these sql queries

I can't seem to understand why these two queries return different results for the following task: "Find names and grades of students who only have friends in the same grade. Return the result sorted by grade, then by name within each grade."
Tables here: https://lagunita.stanford.edu/c4x/DB/SQL/asset/socialdata.html
The first query:
SELECT DISTINCT h1.name, h1.grade
FROM Highschooler h1, Friend f, Highschooler h2
WHERE h1.ID = f.ID1 AND h2.ID = f.ID2 AND h1.grade = h2.grade
ORDER BY h1.grade, h1.name
The second query:
select name, grade from Highschooler
where ID not in (
select ID1 from Highschooler H1, Friend, Highschooler H2
where H1.ID = Friend.ID1 and Friend.ID2 = H2.ID and H1.grade <> H2.grade)
order by grade, name;
The second one returns the expected result, but not the first one. If anyone cares to clarify, Thanks.
The first query applies all three filters simultaneously to the joined tables and returns only the rows that match every one of them. The second query first runs a subquery that returns the IDs of students with a friend in a different grade, and then returns every ID that is not in that list. That also includes IDs for which H1.ID = Friend.ID1 and Friend.ID2 = H2.ID never hold, i.e. students with no friends at all. You can try something like:
select name, grade from Highschooler
where ID in (select ID1 from Friend)
and ID not in (
select ID1 from Highschooler H1, Friend, Highschooler H2
where H1.ID = Friend.ID1 and Friend.ID2 = H2.ID and H1.grade <> H2.grade)
order by grade, name;
This may be standard NULL-related behavior. Demo:
create table tble (ID int, col int);
insert tble(ID, col)
values (1,1),(2,null),(3,2);
select *
from tble
where col=1;
select *
from tble
where ID not in (select t2.ID from tble t2 where t2.col<>1);
The two queries differ because select t2.ID from tble t2 where t2.col<>1 does not return ID 2, as the predicate NULL <> 1 does not evaluate to TRUE.
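The demo can be reproduced with a few lines of Python. This is a sketch that uses the sqlite3 module as a stand-in engine (an assumption, the comparison with NULL behaves the same way there); the table and values are the ones from the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tble (ID INTEGER, col INTEGER)")
conn.executemany(
    "INSERT INTO tble (ID, col) VALUES (?, ?)",
    [(1, 1), (2, None), (3, 2)],  # None is stored as NULL
)

# Positive filter: only the row where col really equals 1.
equal_ids = [
    row[0] for row in conn.execute("SELECT ID FROM tble WHERE col = 1 ORDER BY ID")
]

# Negative filter: the subquery finds only ID 3, because NULL <> 1 is not TRUE,
# so the NULL row (ID 2) survives the NOT IN.
not_in_ids = [
    row[0]
    for row in conn.execute(
        "SELECT ID FROM tble "
        "WHERE ID NOT IN (SELECT t2.ID FROM tble t2 WHERE t2.col <> 1) "
        "ORDER BY ID"
    )
]
print(equal_ids, not_in_ids)
```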
I just wanted to add further clarification on the first query explanations. The first query results in this:
SELECT DISTINCT h1.name, h1.grade FROM Highschooler h1, Friend f, Highschooler h2 WHERE h1.ID = f.ID1 AND h2.ID = f.ID2 AND h1.grade = h2.grade ORDER BY h1.grade, h1.name;
+-----------+-------+
| name | grade |
+-----------+-------+
| Cassandra | 9 |
| Gabriel | 9 |
| Jordan | 9 |
| Tiffany | 9 |
| Andrew | 10 |
| Brittany | 10 |
| Haley | 10 |
| Kris | 10 |
| Alexis | 11 |
| Gabriel | 11 |
| Jessica | 11 |
| John | 12 |
| Jordan | 12 |
| Kyle | 12 |
| Logan | 12 |
+-----------+-------+
15 rows in set (0,00 sec)
Since you are performing a cartesian product (by selecting from the same table Highschooler twice), and one of your conditions is h1.grade = h2.grade, you retrieve all students that have at least one friend in the same grade. The only student you are not getting is Austin, who is the only one that doesn't have any friends in his grade.
The second query is explained in Radek's answer.
I hope this helps.
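To see the difference on a small scale, here is a sketch with made-up data (five hypothetical students, not the Stanford data set), run through Python's sqlite3 module as a stand-in engine. Bob and Cat each have one friend in their own grade and one in another grade, and Eve has no friends at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Highschooler (ID INTEGER, name TEXT, grade INTEGER)")
conn.execute("CREATE TABLE Friend (ID1 INTEGER, ID2 INTEGER)")
conn.executemany(
    "INSERT INTO Highschooler VALUES (?, ?, ?)",
    [(1, "Ann", 9), (2, "Bob", 9), (3, "Cat", 10), (4, "Dan", 10), (5, "Eve", 11)],
)
# Friendships are stored in both directions, as in the original data.
conn.executemany(
    "INSERT INTO Friend VALUES (?, ?)",
    [(1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3)],
)

# First query: students with AT LEAST ONE friend in the same grade.
at_least_one = conn.execute(
    """
    SELECT DISTINCT h1.name, h1.grade
    FROM Highschooler h1, Friend f, Highschooler h2
    WHERE h1.ID = f.ID1 AND h2.ID = f.ID2 AND h1.grade = h2.grade
    ORDER BY h1.grade, h1.name
    """
).fetchall()

# Second query: students with NO friend in a different grade.
only_same_grade = conn.execute(
    """
    SELECT name, grade FROM Highschooler
    WHERE ID NOT IN (
        SELECT ID1 FROM Highschooler H1, Friend, Highschooler H2
        WHERE H1.ID = Friend.ID1 AND Friend.ID2 = H2.ID AND H1.grade <> H2.grade)
    ORDER BY grade, name
    """
).fetchall()
print(at_least_one)
print(only_same_grade)
```

The first result keeps Bob and Cat even though they also have a friend in another grade, while the second drops them and keeps the friendless Eve.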

Is there any way in Google Big Query to Left Outer Join one-to-one *without reusing* any row from the right?

We have a group of patients in one table and we want to match each of them to a patient exactly like them in another table - but we want pairs of patients so we cannot match a patient to more than one other patient.
Left Outer Joins add every occurrence of a match - which matches patients to every other possible match - so we need some other approach.
We see lots of answers on SO about matching to the first row - but that leaves us with a single patient being matched to multiple other patients - not a pair like we need.
Is there any possible way to create pair matches without duplication between tables in Google Big Query? (Even if it takes multiple steps.)
ADDENDUM: Here are example tables. It would be great to see a SQL example using this.
Here is what is needed.
Example Source Tables:
Table A
PatientID Race Gender
1 A F
2 B M
3 A F
Table B
PatientID Race Gender
4 A F
5 A F
6 B M
Results Table Desired:
Table C
A.PatientID B.PatientID_Match
1 4
2 6
3 5
CLARIFICATION: Patients in Table A must match patients from Table B. (They cannot match patients in their own table.)
select min (case tab when 'A' then patientID end) as A_patientID
,min (case tab when 'B' then patientID end) as B_patientID
from (select tab
,patientID
,rank() over (order by race,gender) r
,row_number() over (partition by tab,race,gender order by patientID) rn
from ( select 'A' as tab,A.* from A
union all select 'B' as tab,B.* from B
) t
) t
group by t.r
,t.rn
-- having count(*) = 2
;
+-------------+-------------+
| a_patientid | b_patientid |
+-------------+-------------+
| 3 | 5 |
+-------------+-------------+
| 2 | 6 |
+-------------+-------------+
| 1 | 4 |
+-------------+-------------+
The main idea -
Rows from both tables are divided to groups by their attributes (race,gender).
This is being done using the RANK function.
Within each group of attributes (race,gender) the rows are being ordered, per table, by their patientid .
+-----+-----------+------+--------+ +---+----+
| tab | patientid | race | gender | | r | rn |
+-----+-----------+------+--------+ +---+----+
+-----+-----------+------+--------+ +---+----+
| A | 1 | A | F | | 1 | 1 |
+-----+-----------+------+--------+ +---+----+
| B | 4 | A | F | | 1 | 1 |
+-----+-----------+------+--------+ +---+----+
+-----+-----------+------+--------+ +---+----+
| A | 3 | A | F | | 1 | 2 |
+-----+-----------+------+--------+ +---+----+
| B | 5 | A | F | | 1 | 2 |
+-----+-----------+------+--------+ +---+----+
+-----+-----------+------+--------+ +---+----+
| A | 2 | B | M | | 5 | 1 |
+-----+-----------+------+--------+ +---+----+
| B | 6 | B | M | | 5 | 1 |
+-----+-----------+------+--------+ +---+----+
In the final phase, the rows are being divided into groups (GROUP BY) by their RANK (r) and ROW_NUMBER (rn) values, which means each group has a row from each table (or only a single row if there is no matching row from the other table).
In many databases, a lateral join would be the way to go. In Google, you can use row_number(). The query looks something like this:
select p.*, pp.patient_id as other_patient_id
from patients p cross join
(select p.*,
row_number() over (partition by col1, col2, col3 order by col1) as seqnum
from patients p
) pp
where pp.seqnum = 1;
The columns in the partition by are the columns used for similarity.
SELECT
a.PatientID AS PatientID,
b.PatientID AS PatientID_Match
FROM (
SELECT PatientID, Race, Gender,
ROW_NUMBER() OVER(PARTITION BY Race, Gender) AS Pos
FROM TableA
) AS a
JOIN (
SELECT PatientID, Race, Gender,
ROW_NUMBER() OVER(PARTITION BY Race, Gender) AS Pos
FROM TableB
) AS b
ON a.Race = b.Race AND a.Gender = b.Gender AND a.Pos = b.Pos
The query above will leave out those patients from TableA that either do not have a match in TableB or whose potential match in TableB was already used as the match for another patient in TableA (as per your requirement: "we want pairs of patients so we cannot match a patient to more than one other patient").
To address Dudu's comments about NULL for attributes:
SELECT
a.PatientID AS PatientID,
b.PatientID AS PatientID_Match
FROM (
SELECT
PatientID, IFNULL(Race, 'null') AS Race, IFNULL(Gender, 'null') AS Gender,
ROW_NUMBER() OVER(PARTITION BY Race, Gender) AS Pos
FROM TableA
) AS a
JOIN (
SELECT
PatientID, IFNULL(Race, 'null') AS Race, IFNULL(Gender, 'null') AS Gender,
ROW_NUMBER() OVER(PARTITION BY Race, Gender) AS Pos
FROM TableB
) AS b
ON a.Race = b.Race AND a.Gender = b.Gender AND a.Pos = b.Pos
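The ROW_NUMBER pairing can be tried on the example tables from the question. The sketch below uses Python's sqlite3 module as a stand-in for BigQuery (an assumption; it needs SQLite 3.25 or later for window functions), and it adds ORDER BY PatientID inside the window, which is not in the queries above, so that the pairing is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TableA (PatientID INTEGER, Race TEXT, Gender TEXT);
    CREATE TABLE TableB (PatientID INTEGER, Race TEXT, Gender TEXT);
    INSERT INTO TableA VALUES (1, 'A', 'F'), (2, 'B', 'M'), (3, 'A', 'F');
    INSERT INTO TableB VALUES (4, 'A', 'F'), (5, 'A', 'F'), (6, 'B', 'M');
    """
)

# Number the patients within each (Race, Gender) group on both sides,
# then pair the n-th patient of a group in A with the n-th one in B.
pairs = conn.execute(
    """
    SELECT a.PatientID, b.PatientID
    FROM (
        SELECT PatientID, Race, Gender,
               ROW_NUMBER() OVER (PARTITION BY Race, Gender
                                  ORDER BY PatientID) AS Pos
        FROM TableA
    ) AS a
    JOIN (
        SELECT PatientID, Race, Gender,
               ROW_NUMBER() OVER (PARTITION BY Race, Gender
                                  ORDER BY PatientID) AS Pos
        FROM TableB
    ) AS b
      ON a.Race = b.Race AND a.Gender = b.Gender AND a.Pos = b.Pos
    ORDER BY a.PatientID
    """
).fetchall()
print(pairs)
```

Each position number exists at most once per group on each side, so no patient from either table can appear in two pairs.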

retrieve value of maximum occurrence in a table

I am facing a very complicated problem. Let me first explain what I am doing right now:
I have a table named feedback in which I am storing grades against course id. The table looks like this:
+-------+-------+-------+-------+-----------+--------------
| id | cid | grade |g_point| workload | easiness
+-------+-------+-------+-------+-----------+--------------
| 1 | 10 | A+ | 1 | 5 | 4
| 2 | 10 | A+ | 1 | 2 | 4
| 3 | 10 | B | 3 | 3 | 3
| 4 | 11 | B+ | 2 | 2 | 3
| 5 | 11 | A+ | 1 | 5 | 4
| 6 | 12 | B | 3 | 3 | 3
| 7 | 11 | B+ | 2 | 7 | 8
| 8 | 11 | A+ | 1 | 1 | 2
g_point has just specific values for the grades, thus I can use these values to show the user courses sorted by grades.
Okay, now first my task is to print out the grade of each course. The grade can be calculated by the maximum occurrence against each course. For example from this table we can see the result of cid = 10 will be A+, because it is present two times there. This is simple. I have already implemented this query which I will write here in the end.
The main problem is with a course such as cid = 11, where two different grades occur equally often. In that situation the client asks me to take the average of workload and easiness for each of these grades, and whichever grade has the greater average should be shown. The average would be computed like this:
(average of all workload values of the grade for the course
+ average of all easiness values of the grade for the course)
/ 2
In this example cid = 11 has four entries, with an equal number of each grade:
B+ grade average
avgworkload = (2 + 7)/2 = 4.5 = x
avgeasiness = (3 + 8)/2 = 5.5 = y
answer (x + y)/2 = 5
A+ grade average
avgworkload = (5 + 1)/2 = 3 = x
avgeasiness = (4 + 2)/2 = 3 = y
answer (x + y)/2 = 3
so the grade should be B+.
This is the query which I am running to get the max occurrence grade
SELECT
f3.coursecodeID cid,
f3.grade_point p,
f3.grade g
FROM (
SELECT
coursecodeID,
MAX(mode_qty) mode_qty
FROM (
SELECT
coursecodeID,
COUNT(grade_point) mode_qty
FROM feedback
GROUP BY
coursecodeID, grade_point
) f1
GROUP BY coursecodeID
) f2
INNER JOIN (
SELECT
coursecodeID,
grade_point,
grade,
COUNT(grade_point) mode_qty
FROM feedback
GROUP BY
coursecodeID, grade_point
) f3
ON
f2.coursecodeID = f3.coursecodeID AND
f2.mode_qty = f3.mode_qty
GROUP BY f3.coursecodeID
ORDER BY f3.grade_point
Here is SQL Fiddle.
I added a table Courses with the list of all course IDs, to make the main idea of the query easier to see. Most likely you have it in the real database. If not, you can generate it on the fly from feedback by grouping by cid.
For each cid we need to find the grade. Group feedback by cid, grade to get a list of all grades for the cid. We need to pick only one grade for a cid, so we use LIMIT 1. To determine which grade to pick we order them. First, by occurrence - simple COUNT. Second, by the average score. Finally, if there are several grades that have the same occurrence and the same average score, then pick the grade with the smallest g_point. You can adjust the rules by tweaking the ORDER BY clause.
SELECT
courses.cid
,(
SELECT feedback.grade
FROM feedback
WHERE feedback.cid = courses.cid
GROUP BY
cid
,grade
ORDER BY
COUNT(*) DESC
,(AVG(workload) + AVG(easiness))/2 DESC
,g_point
LIMIT 1
) AS CourseGrade
FROM courses
ORDER BY courses.cid
result set
cid CourseGrade
10 A+
11 B+
12 B
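The same ordering rules can be checked without a fiddle. This is a sketch only: it runs a lightly adapted query through Python's sqlite3 module as a stand-in for MySQL, derives the course list from feedback instead of a separate courses table, and uses MIN(g_point) for the last tie-breaker so that the query does not depend on how non-grouped columns are handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feedback (id INTEGER, cid INTEGER, grade TEXT, "
    "g_point INTEGER, workload INTEGER, easiness INTEGER)"
)
conn.executemany(
    "INSERT INTO feedback VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 10, "A+", 1, 5, 4),
        (2, 10, "A+", 1, 2, 4),
        (3, 10, "B", 3, 3, 3),
        (4, 11, "B+", 2, 2, 3),
        (5, 11, "A+", 1, 5, 4),
        (6, 12, "B", 3, 3, 3),
        (7, 11, "B+", 2, 7, 8),
        (8, 11, "A+", 1, 1, 2),
    ],
)

# For every course pick one grade: most frequent first, then highest
# average of workload and easiness, then smallest g_point.
course_grades = conn.execute(
    """
    SELECT c.cid,
           (SELECT f.grade
            FROM feedback AS f
            WHERE f.cid = c.cid
            GROUP BY f.grade
            ORDER BY COUNT(*) DESC,
                     (AVG(f.workload) + AVG(f.easiness)) / 2.0 DESC,
                     MIN(f.g_point)
            LIMIT 1) AS CourseGrade
    FROM (SELECT DISTINCT cid FROM feedback) AS c
    ORDER BY c.cid
    """
).fetchall()
print(course_grades)
```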
UPDATE
MySQL (before version 8.0.14) doesn't have lateral joins, so one possible way to get the second column g_point is to repeat the correlated sub-query. SQL Fiddle
SELECT
courses.cid
,(
SELECT feedback.grade
FROM feedback
WHERE feedback.cid = courses.cid
GROUP BY
cid
,grade
ORDER BY
COUNT(*) DESC
,(AVG(workload) + AVG(easiness))/2 DESC
,g_point
LIMIT 1
) AS CourseGrade
,(
SELECT feedback.g_point
FROM feedback
WHERE feedback.cid = courses.cid
GROUP BY
cid
,grade
ORDER BY
COUNT(*) DESC
,(AVG(workload) + AVG(easiness))/2 DESC
,g_point
LIMIT 1
) AS CourseGPoint
FROM courses
ORDER BY CourseGPoint
result set
cid CourseGrade CourseGPoint
10 A+ 1
11 B+ 2
12 B 3
Update 2 Added average score into ORDER BY SQL Fiddle
SELECT
courses.cid
,(
SELECT feedback.grade
FROM feedback
WHERE feedback.cid = courses.cid
GROUP BY
cid
,grade
ORDER BY
COUNT(*) DESC
,(AVG(workload) + AVG(easiness))/2 DESC
,g_point
LIMIT 1
) AS CourseGrade
,(
SELECT feedback.g_point
FROM feedback
WHERE feedback.cid = courses.cid
GROUP BY
cid
,grade
ORDER BY
COUNT(*) DESC
,(AVG(workload) + AVG(easiness))/2 DESC
,g_point
LIMIT 1
) AS CourseGPoint
,(
SELECT (AVG(workload) + AVG(easiness))/2
FROM feedback
WHERE feedback.cid = courses.cid
GROUP BY
cid
,grade
ORDER BY
COUNT(*) DESC
,(AVG(workload) + AVG(easiness))/2 DESC
,g_point
LIMIT 1
) AS AvgScore
FROM courses
ORDER BY CourseGPoint, AvgScore DESC
result
cid CourseGrade CourseGPoint AvgScore
10 A+ 1 3.75
11 B+ 2 5
12 B 3 3
If I understood well you need an inner select to find the average, and a second outer select to find the maximum values of the average
select cid, grade, max(average)/2 from (
select cid, grade, avg(workload + easiness) as average
from feedback
group by cid, grade
) x group by cid, grade
This solution has been tested on your data using SQL Fiddle at this link
If you change the previous query to
select cid, max(average)/2 from (
select cid, grade, avg(workload + easiness) as average
from feedback
group by cid, grade
) x group by cid
You will find the max average for each cid.
As mentioned in the comments, you have to choose which strategy to use if more than one grade meets the max average. For example, if you have
+-------+-------+-------+-------+-----------+--------------
| id | cid | grade |g_point| workload | easiness
+-------+-------+-------+-------+-----------+--------------
| 1 | 10 | A+ | 1 | 5 | 4
| 2 | 10 | A+ | 1 | 2 | 4
| 3 | 10 | B | 3 | 3 | 3
| 4 | 11 | B+ | 2 | 2 | 3
| 5 | 11 | A+ | 1 | 5 | 4
| 9 | 11 | C | 1 | 3 | 6
You will have grades A+ and C satisfying the maximum average 4.5
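To list every grade that reaches the maximum average for its course, ties included, one possible sketch is shown below. It is not part of the answer above: it runs through Python's sqlite3 module as a stand-in for MySQL, the CTE name scores is hypothetical, and the rows are the six rows of the tie example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feedback (id INTEGER, cid INTEGER, grade TEXT, "
    "g_point INTEGER, workload INTEGER, easiness INTEGER)"
)
conn.executemany(
    "INSERT INTO feedback VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 10, "A+", 1, 5, 4),
        (2, 10, "A+", 1, 2, 4),
        (3, 10, "B", 3, 3, 3),
        (4, 11, "B+", 2, 2, 3),
        (5, 11, "A+", 1, 5, 4),
        (9, 11, "C", 1, 3, 6),
    ],
)

# Average score per (cid, grade), then keep the grades whose score equals
# the best score of their course.
ties = conn.execute(
    """
    WITH scores AS (
        SELECT cid, grade, AVG(workload + easiness) / 2.0 AS score
        FROM feedback
        GROUP BY cid, grade
    )
    SELECT s.cid, s.grade, s.score
    FROM scores AS s
    WHERE s.score = (SELECT MAX(m.score) FROM scores AS m WHERE m.cid = s.cid)
    ORDER BY s.cid, s.grade
    """
).fetchall()
print(ties)
```

For cid 11 this returns both A+ and C, which makes the tie visible so that a tie-breaking rule can be chosen explicitly.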