SQL how to select unique one-to-one pairings - mysql

edited to make clearer - many apologies for the confusion of the original example
I have the following table structure representing married couples:
id | Person | Spouse
______________________
1 | Mary | John
2 | John | Mary
3 | Katy | Bob
4 | Bob | Katy
5 | Mary | John
6 | John | Mary
In this example Mary is married to John, Katy to Bob and a different Mary is married to a different John.
How can I retrieve these pairs of married couples?
I have got close with this:
SELECT
p.id id1,
q.id id2
FROM
people p
INNER JOIN people q ON
p.person = q.spouse AND
q.person = p.spouse AND
p.id < q.id
ORDER BY p.id
However this returns:
1 | 2 (1st Mary & 1st John)
1 | 6 (1st Mary & 2nd John) *problem*
2 | 5 (1st John & 2nd Mary) *problem*
3 | 4 (Katy & Bob)
5 | 6 (2nd Mary & 2nd John)
How can I make sure the 1st Mary and 1st John are only married once (i.e. remove the problem rows above)?
Many thanks
Here's the SQL to create the example:
CREATE TABLE people
(`id` int, `person` varchar(7), `spouse` varchar(7))
;
INSERT INTO people
(`id`, `person`, `spouse`)
VALUES
(1, 'Mary', 'John'),
(2, 'John', 'Mary'),
(3, 'Katy', 'Bob'),
(4, 'Bob', 'Katy'),
(5, 'Mary', 'John'),
(6, 'John', 'Mary')
;
SELECT
p.id id1,
q.id id2
FROM
people p
INNER JOIN people q ON
p.person = q.spouse AND
q.person = p.spouse AND
p.id < q.id
ORDER BY p.id
;

I'll give it a try:
SELECT
p.id AS id1,
q.id AS id2
FROM
people AS p
JOIN people AS q ON
p.person = q.spouse AND
q.person = p.spouse AND
p.id < q.id
JOIN (SELECT
p.id, COUNT(*) AS rnk -- "rank" is a reserved word in MySQL 8.0+
FROM
people AS p
INNER JOIN people AS p2 ON
p.person = p2.person AND
p.spouse = p2.spouse AND
p.id >= p2.id
GROUP BY p.id
) AS x ON
x.id = p.id
JOIN (SELECT
p.id, COUNT(*) AS rnk
FROM
people AS p
INNER JOIN people AS p2 ON
p.person = p2.person AND
p.spouse = p2.spouse AND
p.id >= p2.id
GROUP BY p.id
) AS y ON
y.id = q.id AND
y.rnk = x.rnk ;
And another one:
SELECT
p.id AS id1,
q.id AS id2
FROM
people AS p
JOIN people AS q ON
p.person = q.spouse AND
q.person = p.spouse
JOIN people AS p2 ON
p.person = p2.person AND
p.spouse = p2.spouse AND
p.id >= p2.id
JOIN people AS q2 ON
q.person = q2.person AND
q.spouse = q2.spouse AND
q.id >= q2.id
WHERE
p.id < q.id
GROUP BY
p.id, q.id
HAVING
COUNT(DISTINCT p2.id) = COUNT(DISTINCT q2.id) ;
Both tested at SQL-Fiddle
It would be much simpler with window functions, which MySQL lacked before version 8.0 (almost all other DBMS have them). Tested at Postgres fiddle:
WITH cte AS
( SELECT
id, person, spouse,
ROW_NUMBER() OVER( PARTITION BY person, spouse
ORDER BY id )
AS rn
FROM
people
)
SELECT
p.id AS id1,
q.id AS id2
FROM
cte AS p
JOIN cte AS q ON
p.person = q.spouse AND
q.person = p.spouse AND
p.rn = q.rn AND
p.id < q.id ;
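For readers who want to try the ranking idea without window functions, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL; the names pair_couples, ROWS and rn are made up for this illustration. Each row is ranked within its (person, spouse) group by id, and the n-th "Mary, John" row is paired with the n-th "John, Mary" row.

```python
import sqlite3

# Sample rows from the question: (id, person, spouse).
ROWS = [
    (1, "Mary", "John"), (2, "John", "Mary"),
    (3, "Katy", "Bob"), (4, "Bob", "Katy"),
    (5, "Mary", "John"), (6, "John", "Mary"),
]

PAIR_SQL = """
WITH ranked AS (
    -- rn = position of this row among rows with the same (person, spouse)
    SELECT p.id, p.person, p.spouse,
           (SELECT COUNT(*)
              FROM people AS p2
             WHERE p2.person = p.person
               AND p2.spouse = p.spouse
               AND p2.id <= p.id) AS rn
      FROM people AS p
)
SELECT a.id, b.id
  FROM ranked AS a
  JOIN ranked AS b
    ON a.person = b.spouse
   AND b.person = a.spouse
   AND a.rn = b.rn      -- n-th "Mary, John" meets n-th "John, Mary"
   AND a.id < b.id      -- report each couple once
 ORDER BY a.id
"""

def pair_couples(rows):
    """Return (id1, id2) pairs, matching duplicate names by id order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people (id INTEGER PRIMARY KEY, person TEXT, spouse TEXT)"
    )
    conn.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)
    pairs = conn.execute(PAIR_SQL).fetchall()
    conn.close()
    return pairs

print(pair_couples(ROWS))  # [(1, 2), (3, 4), (5, 6)]
```

Pairing by id order is only a heuristic: it works for the sample data because each couple's rows were inserted together, which the table itself does not guarantee.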

In this example Mary is married to John, Katy to Bob and a different Mary is married to Richard.
Nothing in the data structure you show allows you to differentiate between those two "Marys", because there is no difference between them.
Both are just the text literal Mary. If you want to differentiate between people who might have the same name, then you need another criterion, and a unique one at that (for example, the id of the database record for each individual person).

Your database structure is wrong.
People like Mary, John, etc. do not have an identity of their own in it.
Some heuristic query might help, but it is not a reliable solution.
So, please, improve your data structure.

Not very elegant, but works:
SELECT p.id, q.id
FROM people p
INNER JOIN people q ON
p.person = q.spouse AND
q.person = p.spouse
which in fact uses the existence of an inverted row as a selector.

There's lots of ways of doing it, however one of the most important reasons for using a database is that it holds lots of data - and there should rarely be times when you ever write a query which retrieves lots of data. Except in very unusual circumstances, and for homework assignments, the results should be filtered according to some criteria. Hence the most appropriate solution depends on what other stuff you add to the query later.
But here's a couple of examples of how to get the unique pairs:
SELECT a, b, GROUP_CONCAT(id)
FROM (SELECT id
, IF (person>=spouse, person, spouse) as a
, IF (person>=spouse, spouse, person) as b
FROM yourtable ) AS pairs
GROUP BY a,b;
SELECT id, person, spouse
FROM yourtable s1
WHERE NOT EXISTS ( SELECT 1
FROM yourtable s2
WHERE s2.id>s1.id
AND s1.person=s2.spouse
AND s1.spouse=s2.person);
(there are several other solutions).

Related

MySQL where joined value is multiple ANDs

Running into a seemingly simple JOIN problem here.
I have two tables, users and courses
| users.id | users.name |
| 1 | Joe |
| 2 | Mary |
| 3 | Mark |
| courses.id | courses.name |
| 1 | History |
| 2 | Math |
| 3 | Science |
| 4 | English |
and another table that joins the two:
| users_id | courses_id |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
I'm trying to find distinct user names who are in course 1 and course 2.
It's possible a user is in other courses, too, but I only care that they're in 1 and 2 at a minimum.
SELECT DISTINCT(users.name)
FROM users_courses
LEFT JOIN users ON users_courses.users_id = users.id
LEFT JOIN courses ON users_courses.courses_id = courses.id
WHERE courses.name = "History" AND courses.name = "Math"
AND courses.name NOT IN ("English")
I understand why this is returning an empty set (no single joined row has both History and Math, since each row holds only one value).
How can I structure the query so that it returns "Joe" because he is in both courses?
Update - I'm hoping to avoid hard-coding the expected total count of courses for a given user, since they might be in other courses my search does not care about.
Join users to a query that returns the user ids that are in both courses:
select u.name
from users u
inner join (
select users_id
from users_courses
where courses_id in (1, 2)
group by users_id
having count(distinct courses_id) = 2
) c on c.users_id = u.id
You can omit distinct from the condition:
count(distinct courses_id) = 2
if there are no duplicates in users_courses.
If you want to search by course names and not ids:
select u.name
from users u
inner join (
select uc.users_id
from users_courses uc inner join courses c
on c.id = uc.courses_id
where c.name in ('History', 'Math')
group by uc.users_id
having count(distinct c.id) = 2
) c on c.users_id = u.id
Results:
| name |
| ---- |
| Joe |
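To check the GROUP BY / HAVING approach against the sample data, here is a minimal sketch. It assumes Python's built-in sqlite3 module in place of MySQL; the helper names make_db and users_in_all are invented for the illustration. The required count is derived from the list of course names, so nothing about a user's other courses is hard-coded.

```python
import sqlite3

def make_db():
    """Build the sample tables from the question in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE courses (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE users_courses (users_id INTEGER, courses_id INTEGER);
        INSERT INTO users VALUES (1, 'Joe'), (2, 'Mary'), (3, 'Mark');
        INSERT INTO courses VALUES
            (1, 'History'), (2, 'Math'), (3, 'Science'), (4, 'English');
        INSERT INTO users_courses VALUES (1, 1), (1, 2), (1, 3), (2, 1);
    """)
    return conn

def users_in_all(conn, course_names):
    """Names of users enrolled in every course listed in course_names."""
    wanted = sorted(set(course_names))          # ignore repeated names
    marks = ", ".join("?" for _ in wanted)      # one placeholder per course
    sql = f"""
        SELECT u.name
          FROM users AS u
          JOIN (SELECT uc.users_id
                  FROM users_courses AS uc
                  JOIN courses AS c ON c.id = uc.courses_id
                 WHERE c.name IN ({marks})
                 GROUP BY uc.users_id
                HAVING COUNT(DISTINCT c.id) = ?) AS m
            ON m.users_id = u.id
         ORDER BY u.name
    """
    rows = conn.execute(sql, [*wanted, len(wanted)])
    return [name for (name,) in rows]

conn = make_db()
print(users_in_all(conn, ["History", "Math"]))  # ['Joe']
```

Only the placeholders are interpolated into the SQL text; the course names themselves are passed as bound parameters.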
You can use the IN operator with a subquery that lists the users_id values attending the second course, then match those against the first course. Depending on the data and indexes, this may be faster than using joins.
select distinct u.users_id, users.name
from users_courses u, users
where u.users_id in (select distinct users_id from users_courses where courses_id = 2)
and u.courses_id = 1
and users.id = u.users_id
Almost the same as @Nae's solution:
select u.name from users u
where exists
(select 1
from users_courses uc
where uc.courses_id in (1, 2)
and uc.users_id = u.id
group by uc.users_id
having count(0) = 2);
Your code is close. Just use GROUP BY and a HAVING clause:
SELECT u.name
FROM users_courses uc JOIN
users u
ON uc.users_id = u.id JOIN
courses c
ON uc.courses_id = c.id
WHERE c.name IN ('History', 'Math')
GROUP BY u.name
HAVING COUNT(DISTINCT c.name) = 2;
Notes:
This assumes that users cannot have the same name. You might want to use GROUP BY u.id, u.name to ensure that you are counting individual users.
If users cannot take the same course multiple times, then use COUNT(*) = 2 rather than COUNT(DISTINCT).
I'd write:
SELECT MAX(u.name)
FROM users_courses uc
LEFT JOIN users u ON uc.users_id = u.id
WHERE uc.courses_id IN (1, 2)
GROUP BY uc.users_id
HAVING COUNT(0) = 2
;
For more complex conditions (for example requiring the user to be in certain classes but also not in certain classes such as "Science") this should also work:
SELECT MAX(u.name)
FROM users_courses uc
LEFT JOIN users u ON uc.users_id = u.id
GROUP BY uc.users_id
HAVING (
SUM(uc.courses_id = 1) = 1
-- user enrolled exactly once in the course 2
AND SUM(uc.courses_id = 2) = 1
-- user enrolled in course 3, 0 times
AND SUM(uc.courses_id = 3) = 0
)
;
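The conditional-sum idea can be tried on the sample data with the sketch below. It assumes Python's built-in sqlite3 module rather than MySQL (both evaluate a comparison to 0 or 1, so SUM over it counts matching rows); make_db and users_matching are hypothetical helper names. Unlike the query above, the sketch uses >= 1 for required courses, which is an assumption that repeated enrolments should still count.

```python
import sqlite3

def make_db():
    """Sample users and enrolments from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE users_courses (users_id INTEGER, courses_id INTEGER);
        INSERT INTO users VALUES (1, 'Joe'), (2, 'Mary'), (3, 'Mark');
        INSERT INTO users_courses VALUES (1, 1), (1, 2), (1, 3), (2, 1);
    """)
    return conn

def users_matching(conn, required, excluded):
    """Users enrolled in all `required` course ids and in none of `excluded`."""
    conditions = (
        ["SUM(uc.courses_id = ?) >= 1"] * len(required)   # at least once
        + ["SUM(uc.courses_id = ?) = 0"] * len(excluded)  # never
    )
    having = " AND ".join(conditions) or "1"  # no conditions: keep everyone
    sql = f"""
        SELECT u.name
          FROM users_courses AS uc
          JOIN users AS u ON u.id = uc.users_id
         GROUP BY u.id, u.name
        HAVING {having}
         ORDER BY u.name
    """
    rows = conn.execute(sql, [*required, *excluded])
    return [name for (name,) in rows]

conn = make_db()
print(users_matching(conn, required=[1, 2], excluded=[4]))  # ['Joe']
print(users_matching(conn, required=[1], excluded=[2]))     # ['Mary']
```

Users with no enrolment rows at all (Mark here) never appear, because the query starts from users_courses.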

Correctly join multiple many-to-many tables - MySQL query

A seemingly generic SQL query really left me clueless.
Here's the case.
I have 3 generic tables (simplified versions here):
Movie
id | title
-----------------------
1 | Evil Dead
-----------------------
2 | Bohemian Rhapsody
....
Genre
id | title
-----------------------
1 | Horror
-----------------------
2 | Comedy
....
Rating
id | title
-----------------------
1 | PG-13
-----------------------
2 | R
....
And 2 many-to-many tables to connect them:
Movie_Genre
movie_id | genre_id
Movie_Rating
movie_id | rating_id
The initial challenge was to write a query which allows me to fetch movies that belong to multiple genres (e.g. horror comedies or sci-fi action).
Thankfully, I was able to find a solution in this question: "MySQL: Select records where joined table matches ALL values".
However, what would be the correct way to fetch records that must match across multiple many-to-many tables? E.g. rated R horror comedies. Is there any way to do so without a subquery (or with only a single one)?
One method uses correlated subqueries:
select m.*
from movie m
where (select count(*)
from movie_genre mg
where mg.movie_id = m.id
) > 1 and
(select count(*)
from movie_rating mr
where mr.movie_id = m.id
) > 1 ;
With indexes on movie_genre(movie_id) and movie_rating(movie_id) this probably has quite reasonable performance.
The above is possibly the most efficient method. However, if you wanted to avoid subqueries, one method would be:
select mg.movie_id
from movie_genre mg join
movie_rating mr
on mg.movie_id = mr.movie_id
group by mg.movie_id
having count(distinct mg.genre_id) > 1 and
count(distinct mr.rating_id) > 1;
More efficient than the above is aggregating before the join:
select mg.movie_id
from (select movie_id
from movie_genre
group by movie_id
having count(*) >= 2
) mg join
(select movie_id
from movie_rating
group by movie_id
having count(*) >= 2
) mr
on mg.movie_id = mr.movie_id;
Although you state that you want to avoid subqueries, the irony is that the version with no subqueries probably has the worst performance of these three options.
E.g. rated R horror comedies
You can join all the tables together, aggregate by movie and filter with a HAVING clause:
select m.id, m.title
from movie m
inner join movie_genre mg on mg.movie_id = m.id
inner join genre g on g.id = mg.genre_id
inner join movie_rating mr on mr.movie_id = m.id
inner join rating r on r.id = mr.rating_id
group by m.id, m.title
having
max(r.title = 'R') = 1
and max(g.title = 'Horror') = 1
and max(g.title = 'Comedy') = 1
You can also use a couple of exists conditions along with correlated subqueries:
select m.*
from movie m
where
exists (
select 1
from movie_rating mr
inner join rating r on r.id = mr.rating_id
where mr.movie_id = m.id and r.title = 'R')
and exists (
select 1
from movie_genre mg
inner join genre g on g.id = mg.genre_id
where mg.movie_id = m.id and g.title = 'Horror'
)
and exists (
select 1
from movie_genre mg
inner join genre g on g.id = mg.genre_id
where mg.movie_id = m.id and g.title = 'Comedy'
)
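The EXISTS approach generalises to any list of genres and ratings. The sketch below is one way to try it, assuming Python's built-in sqlite3 module instead of MySQL. The question does not show the contents of Movie_Genre and Movie_Rating, so the link rows here are invented purely for the illustration, as are the helper names make_db and find_movies.

```python
import sqlite3

def make_db():
    """Tables from the question; the link rows are hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE genre (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE rating (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE movie_genre (movie_id INTEGER, genre_id INTEGER);
        CREATE TABLE movie_rating (movie_id INTEGER, rating_id INTEGER);
        INSERT INTO movie VALUES (1, 'Evil Dead'), (2, 'Bohemian Rhapsody');
        INSERT INTO genre VALUES (1, 'Horror'), (2, 'Comedy');
        INSERT INTO rating VALUES (1, 'PG-13'), (2, 'R');
        -- Invented links, for this sketch only.
        INSERT INTO movie_genre VALUES (1, 1), (1, 2), (2, 2);
        INSERT INTO movie_rating VALUES (1, 2), (2, 1);
    """)
    return conn

GENRE_EXISTS = """EXISTS (SELECT 1
                            FROM movie_genre AS mg
                            JOIN genre AS g ON g.id = mg.genre_id
                           WHERE mg.movie_id = m.id AND g.title = ?)"""

RATING_EXISTS = """EXISTS (SELECT 1
                             FROM movie_rating AS mr
                             JOIN rating AS r ON r.id = mr.rating_id
                            WHERE mr.movie_id = m.id AND r.title = ?)"""

def find_movies(conn, genres, ratings):
    """Titles of movies that have every listed genre and every listed rating."""
    conditions = [GENRE_EXISTS] * len(genres) + [RATING_EXISTS] * len(ratings)
    where = " AND ".join(conditions) or "1"   # no filters: return all movies
    sql = f"SELECT m.title FROM movie AS m WHERE {where} ORDER BY m.title"
    rows = conn.execute(sql, [*genres, *ratings])
    return [title for (title,) in rows]

conn = make_db()
print(find_movies(conn, ["Horror", "Comedy"], ["R"]))  # ['Evil Dead']
```

One EXISTS clause is generated per required value, so each condition is checked independently and no counting is needed.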

Mysql SELECT only unique values in one column when left joined with another table

This is the query:
SELECT a.id, a.userName,if(o.userId=1,'C',if(i.userId=1,'I','N')) AS relation
FROM tbl_users AS a
LEFT JOIN tbl_contacts AS o ON a.id = o.contactId
LEFT JOIN tbl_invites AS i ON a.id = i.invitedId
ORDER BY relation
This returns the output as follows:
+----+--------------+-------------+
| ID | USERNAME | RELATION |
+----+--------------+-------------+
| 1 | ray | C |
+----+--------------+-------------+
| 2 | john | I |
+----+--------------+-------------+
| 1 | ray | N |
+----+--------------+-------------+
I need to remove the third row from the select query by checking, if possible, whether the id is a duplicate. The priority is as follows:
C -> I -> N. So since there is already a "ray" with a C, I don't want it again with an I or N.
I tried adding distinct(a.id) but it doesn't work. How do I do this?
Why doesn't DISTINCT work for this?
From the specs you gave, all you have to do is group by ID and username, then pick the lowest value of relation you can find (since C < I < N):
SELECT a.id, a.userName, MIN(if(o.userId=1,'C',if(i.userId=1,'I','N'))) AS relation
FROM tbl_users AS a
LEFT JOIN tbl_contacts AS o ON a.id = o.contactId
LEFT JOIN tbl_invites AS i ON a.id = i.invitedId
GROUP BY a.id, a.username
There are multiple ways to get the group-wise maximum/minimum, as the MySQL manual page on the group-wise maximum shows.
The first approach described there suits you best if the order of the rows cannot be defined by alphabetical order.
In that case, for example if the desired order were z-a-m (see Rams' comment), you'd need the FIELD() function.
So your answer is
SELECT
a.id,
a.userName,
if(o.userId=1,'C',if(i.userId=1,'I','N')) AS relation
FROM tbl_users a
LEFT JOIN tbl_contacts AS o ON a.id = o.contactId
LEFT JOIN tbl_invites AS i ON a.id = i.invitedId
WHERE
if(o.userId=1,'C',if(i.userId=1,'I','N')) = (
SELECT
if(o.userId=1,'C',if(i.userId=1,'I','N')) AS relation
FROM tbl_users aa
LEFT JOIN tbl_contacts AS o ON aa.id = o.contactId
LEFT JOIN tbl_invites AS i ON aa.id = i.invitedId
WHERE aa.id = a.id AND aa.userName = a.userName
ORDER BY FIELD(relation, 'N', 'I', 'C') DESC
LIMIT 1
)
Note that you could also write ORDER BY FIELD(relation, 'C', 'I', 'N'), which is more readable and intuitive. I reversed it because, if relation could ever hold a value such as 'X' that is not listed as a parameter, FIELD() would return 0 for it, and it would therefore sort before 'C'. Sorting descending with the parameters reversed prevents this.
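An explicit priority mapping is another way to express "C before I before N" without relying on alphabetical order. The sketch below assumes Python's built-in sqlite3 module, which has neither IF() nor FIELD(), so it uses CASE expressions instead. The rows in tbl_contacts and tbl_invites are invented so that the ungrouped query reproduces the three rows from the question; make_db and best_relations are hypothetical names.

```python
import sqlite3

def make_db():
    """Question's tables with made-up rows that yield ray/C, john/I, ray/N."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tbl_users (id INTEGER PRIMARY KEY, userName TEXT);
        CREATE TABLE tbl_contacts (userId INTEGER, contactId INTEGER);
        CREATE TABLE tbl_invites (userId INTEGER, invitedId INTEGER);
        INSERT INTO tbl_users VALUES (1, 'ray'), (2, 'john');
        INSERT INTO tbl_contacts VALUES (1, 1), (3, 1);
        INSERT INTO tbl_invites VALUES (1, 2);
    """)
    return conn

BEST_SQL = """
SELECT a.id, a.userName,
       -- Map each joined row to a priority (0 = best), keep the best
       -- priority per user, then translate it back to a letter.
       CASE MIN(CASE WHEN o.userId = 1 THEN 0
                     WHEN i.userId = 1 THEN 1
                     ELSE 2 END)
            WHEN 0 THEN 'C'
            WHEN 1 THEN 'I'
            ELSE 'N'
       END AS relation
  FROM tbl_users AS a
  LEFT JOIN tbl_contacts AS o ON a.id = o.contactId
  LEFT JOIN tbl_invites AS i ON a.id = i.invitedId
 GROUP BY a.id, a.userName
 ORDER BY a.id
"""

def best_relations(conn):
    """One row per user, carrying that user's highest-priority relation."""
    return conn.execute(BEST_SQL).fetchall()

print(best_relations(make_db()))  # [(1, 'ray', 'C'), (2, 'john', 'I')]
```

Changing the priority order only means renumbering the inner CASE, which is the same job FIELD() does in the MySQL version.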

SQL - How to find the person with the highest grade

I am trying to find the name of the person who received the highest grade in the "Big Data" course.
I have 3 different tables:
People (id, name, age, address)
---------------------------------------------------
p1 | Tom Martin| 24 | 11, Integer Avenue, Fractions, MA
p2 | Al Smith | 33 | 26, Main Street, Noman's Land, PA
p3 | Kim Burton| 40 | 45, Elm Street, Blacksburg, VA
---------------------------------------------------
Courses (cid, name, department)
---------------------------------------------------------
c1 | Systematic Torture | MATH
c2 | Pretty Painful | CS
c3 | Not so Bad | MATH
c4 | Big Data | CS
---------------------------------------------------------
Grades (pid, cid, grade)
---------------------------------------------------
p1 | c1 | 3.5
p2 | c3 | 2.5
p3 | c2 | 4.0
p3 | c4 | 3.85
---------------------------------------------------
I can't figure out how to find the person with the highest grade without using any fancy SQL feature. That is, I just want to use SELECT, FROM, WHERE, UNION, INTERSECT, EXCEPT, CREATE VIEW and arithmetic comparison operators like =, <, >.
My outcome is showing something other than what I try to achieve.
This is what I have tried so far:
CREATE VIEW TEMPFIVE AS
SELECT G1.pid FROM Grades AS G1, Grades AS G2 WHERE G1.pid = G2.pid AND G1.cid = G2.cid
SELECT People.name, Courses.name FROM TEMPFIVE, People, Courses WHERE TEMPFIVE.pid = People.pid AND Courses.name = "Big Data";
+------------+----------+
| name | name |
+------------+----------+
| Tom Martin | Big Data |
| Al Smith | Big Data |
| Kim Burton | Big Data |
| Kim Burton | Big Data |
+------------+----------+
The easiest way is to use LIMIT 1 with an ORDER BY ... DESC clause:
SELECT p.name, c.name, g.grade
FROM People AS p
JOIN Grades AS g ON p.id = g.pid
JOIN Courses AS c ON c.cid = g.cid
WHERE c.name = "Big Data"
ORDER BY g.grade DESC LIMIT 1
I'm not sure about the exact MySQL query structure, so I'll explain it in steps. I hope you can build the query from them.
Join the three tables according to their relationships.
Set the course name 'Big Data' in the WHERE clause.
Order by grade in descending order.
Set the limit to fetch only the first row.
Try this
select * from(
select p.id pid,p.name name, p.age age,p.address address,
c.cid cid, c.name coursname, c.department department,g.grade grade
from Grades g
left join
Courses c on g.cid = c.cid
left join
People p on g.pid = p.id
)a where coursname= 'Big Data' order by grade desc
You can apply the comparison operators in the WHERE clause.
GiorgosBestos shows the correct way if you only want 1 record. If you want ties, meaning that more than 1 student can share the same MAX grade, you can use a subselect as follows:
SELECT p.name, c.name, g.grade
FROM
(
SELECT c.cid, MAX(g.grade) MaxGrade
FROM
Grades g
INNER JOIN Courses c
ON c.cid = g.cid
AND c.name = 'Big Data'
GROUP BY
c.cid
) m
INNER JOIN Grades g
ON g.cid = m.cid
AND g.grade = m.MaxGrade
INNER JOIN People p
ON g.pid = p.id
The following SQL covers the case when two or more students have the same maximum grade:
SELECT P.NAME,
C.NAME,
G.GRADE
FROM PEOPLE P
JOIN GRADES G ON G.PID = P.ID
JOIN COURSES C ON C.CID = G.CID
WHERE C.NAME = 'Big Data'
AND G.GRADE = (SELECT MAX(G2.GRADE)
FROM PEOPLE P2
JOIN GRADES G2 ON G2.PID = P2.ID
JOIN COURSES C2 ON C2.CID = G2.CID
WHERE C2.NAME = 'Big Data');
It is similar but not identical to the SQL proposed by Matt.
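Since the question restricts itself to SELECT, FROM, WHERE, set operators and comparisons, the maximum can also be found with EXCEPT: take every Big Data grade, then subtract every grade that is lower than some other grade in the same course. The sketch below is one possible reading of that idea, assuming Python's built-in sqlite3 module; make_db, top_students and the extra_grades parameter are invented for the illustration.

```python
import sqlite3

def make_db(extra_grades=()):
    """Tables from the question, plus optional hypothetical extra grades."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE People (id TEXT PRIMARY KEY, name TEXT, age INTEGER, address TEXT);
        CREATE TABLE Courses (cid TEXT PRIMARY KEY, name TEXT, department TEXT);
        CREATE TABLE Grades (pid TEXT, cid TEXT, grade REAL);
        INSERT INTO People VALUES
            ('p1', 'Tom Martin', 24, '11, Integer Avenue, Fractions, MA'),
            ('p2', 'Al Smith', 33, '26, Main Street, Noman''s Land, PA'),
            ('p3', 'Kim Burton', 40, '45, Elm Street, Blacksburg, VA');
        INSERT INTO Courses VALUES
            ('c1', 'Systematic Torture', 'MATH'), ('c2', 'Pretty Painful', 'CS'),
            ('c3', 'Not so Bad', 'MATH'), ('c4', 'Big Data', 'CS');
        INSERT INTO Grades VALUES
            ('p1', 'c1', 3.5), ('p2', 'c3', 2.5), ('p3', 'c2', 4.0), ('p3', 'c4', 3.85);
    """)
    conn.executemany("INSERT INTO Grades VALUES (?, ?, ?)", extra_grades)
    return conn

TOP_SQL = """
SELECT p.id, p.name, g.grade
  FROM People p, Grades g, Courses c
 WHERE p.id = g.pid AND g.cid = c.cid AND c.name = ?
EXCEPT
-- grades that are beaten by some other grade in the same course
SELECT p.id, p.name, g.grade
  FROM People p, Grades g, Courses c, Grades g2
 WHERE p.id = g.pid AND g.cid = c.cid AND c.name = ?
   AND g2.cid = g.cid AND g.grade < g2.grade
"""

def top_students(conn, course):
    """(name, grade) of everyone holding the highest grade in `course`."""
    rows = conn.execute(TOP_SQL, (course, course)).fetchall()
    return [(name, grade) for _id, name, grade in sorted(rows)]

print(top_students(make_db(), "Big Data"))  # [('Kim Burton', 3.85)]
```

Ties are kept, because a grade is removed only when a strictly higher one exists.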

GROUP BY aggregate by column

I've run into an issue where every time I attempt to use GROUP BY, H2 informs me that I need to add certain column names into the GROUP BY clause because, based on my research, it's unclear to H2 how to sort columns with non-repeating data.
Here's an example to elaborate:
Person table
+------------+------------+
| ID | Name |
+============+============+
| 1 | John |
+------------+------------+
| 2 | Jane |
+------------+------------+
Pet table
+------------+------------+------------+------------+
| ID | PERSON_ID | NAME | BIRTHDATE |
+============+============+============+============+
| 1 | 1 | Rufus | 2012 |
+------------+------------+------------+------------+
| 2 | 1 | Ben | 2014 |
+------------+------------+------------+------------+
Let's say I want all the oldest pets belonging to John.
SELECT PERSON.NAME, PET.NAME, PET.BIRTHDATE FROM PERSON
INNER JOIN PET ON PET.PERSON_ID = PERSON.ID
GROUP BY PERSON.NAME
ORDER BY PET.BIRTHDATE ASC
This would work perfectly in MySQL because it will simply group by PERSON.NAME and, by default, select the first record in the set. However, in H2 it needs to have aggregation such as MAX, MIN, etc.
The problem, as you can see in this example, is that you could use MIN to get the BIRTHDATE ordered correctly but there does not appear to be any aggregation function available for sorting NAME based on the oldest BIRTHDATE?
If you want the oldest pets, I would recommend:
SELECT p.NAME, pt.NAME, pt.BIRTHDATE
FROM PERSON p INNER JOIN
PET pt
ON pt.PERSON_ID = p.ID
WHERE pt.BIRTHDATE = (SELECT MIN(pt2.BIRTHDATE)
FROM pet pt2
WHERE pt2.PERSON_ID = PT.PERSON_ID
);
This explicitly selects the pet or pets (for each person) that have the earliest birth year. No aggregation is necessary.
You can also phrase this with JOINs only in the FROM:
SELECT p.NAME, pt.NAME, pt.BIRTHDATE
FROM PERSON p INNER JOIN
PET pt
ON pt.PERSON_ID = p.ID JOIN
(SELECT pt2.PERSON_ID, MIN(pt2.BIRTHDATE) as MINBT
FROM pet pt2
GROUP BY pt2.PERSON_ID
) m
ON m.PERSON_ID = pt.PERSON_ID AND
m.MINBT = pt.BIRTHDATE;
You can always resort to NOT EXISTS in such cases: if the person has no pet with an earlier birthdate, then that pet is the oldest. (If two pets have the same age and both are the oldest for that person, both are selected.)
SELECT p.NAME, q.NAME, q.BIRTHDATE
FROM PERSON p
INNER JOIN PET q ON q.PERSON_ID = p.ID AND NOT EXISTS (
SELECT * FROM PET WHERE PERSON_ID = p.ID AND BIRTHDATE < q.BIRTHDATE
)
ORDER BY q.BIRTHDATE ASC
If you insist on GROUP BY you can do it like this:
SELECT a.name, b.name, b.BIRTHDATE FROM (
SELECT p.id, MIN(q.BIRTHDATE) birthdate FROM PERSON p
INNER JOIN PET q ON q.PERSON_ID = p.ID
GROUP BY p.ID
) o INNER JOIN PERSON a ON a.ID = o.ID
INNER JOIN PET b ON b.PERSON_ID = a.ID AND b.BIRTHDATE = o.BIRTHDATE
ORDER BY b.BIRTHDATE
If you can use WITH, the query can be written more easily.
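As a rough illustration of the WITH variant, the sketch below computes each person's earliest pet birthdate in a common table expression and joins back to PET on both the person and that date. It assumes Python's built-in sqlite3 module rather than H2, and the names make_db, oldest_pets, oldest and MIN_BIRTHDATE are invented for the example.

```python
import sqlite3

def make_db():
    """Person and Pet tables with the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE PERSON (ID INTEGER PRIMARY KEY, NAME TEXT);
        CREATE TABLE PET (ID INTEGER PRIMARY KEY, PERSON_ID INTEGER,
                          NAME TEXT, BIRTHDATE INTEGER);
        INSERT INTO PERSON VALUES (1, 'John'), (2, 'Jane');
        INSERT INTO PET VALUES (1, 1, 'Rufus', 2012), (2, 1, 'Ben', 2014);
    """)
    return conn

OLDEST_SQL = """
WITH oldest AS (
    -- earliest birthdate per owner
    SELECT PERSON_ID, MIN(BIRTHDATE) AS MIN_BIRTHDATE
      FROM PET
     GROUP BY PERSON_ID
)
SELECT p.NAME, pt.NAME, pt.BIRTHDATE
  FROM PERSON AS p
  JOIN PET AS pt ON pt.PERSON_ID = p.ID
  JOIN oldest AS o
    ON o.PERSON_ID = pt.PERSON_ID
   AND o.MIN_BIRTHDATE = pt.BIRTHDATE   -- keep only the oldest pet(s)
 ORDER BY p.NAME, pt.NAME
"""

def oldest_pets(conn):
    """(person, pet, birthdate) for each person's oldest pet, ties included."""
    return conn.execute(OLDEST_SQL).fetchall()

print(oldest_pets(make_db()))  # [('John', 'Rufus', 2012)]
```

Every selected column is either grouped inside the CTE or comes from an ungrouped join, so no engine has to guess which pet name belongs to the minimum birthdate.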