A seemingly generic SQL query really left me clueless.
Here's the case.
I have 3 generic tables (simplified versions here):
Movie
id | title
-----------------------
1 | Evil Dead
-----------------------
2 | Bohemian Rhapsody
....
Genre
id | title
-----------------------
1 | Horror
-----------------------
2 | Comedy
....
Rating
id | title
-----------------------
1 | PG-13
-----------------------
2 | R
....
And 2 many-to-many tables to connect them:
Movie_Genre
movie_id | genre_id
Movie_Rating
movie_id | rating_id
The initial challenge was to write a query which allows me to fetch movies that belong to multiple genres (e.g. horror comedies or sci-fi action).
Thankfully, I was able to find this solution here
MySQL: Select records where joined table matches ALL values
However, what would be the correct option to fetch records that belong to multiple many-to-many tables? E.g. rated R horror comedies. Is there any way to do so without subquery (or a single one only)?
One method uses correlated subqueries:
select m.*
from movie m
where (select count(*)
from movie_genre mg
where mg.movie_id = m.id
) > 1 and
(select count(*)
from movie_rating mr
where mr.movie_id = m.id
) > 1 ;
With indexes on movie_genre(movie_id) and movie_rating(movie_id) this probably has quite reasonable performance.
The above is possibly the most efficient method. However, if you wanted to avoid subqueries, one method would be:
select mg.movie_id
from movie_genre mg join
movie_rating mr
on mg.movie_id = mr.movie_id
group by mg.movie_id
having count(distinct mg.genre_id) > 1 and
count(distinct mr.rating_id) > 1;
More efficient than the above is aggregating before the join:
select mg.movie_id
from (select movie_id
from movie_genre
group by movie_id
having count(*) >= 2
) mg join
(select movie_id
from movie_rating
group by movie_id
having count(*) >= 2
) mr
on mg.movie_id = mr.movie_id;
Although you state that you want to avoid subqueries, the irony is that the version with no subqueries probably has the worst performance of these three options.
E.g. rated R horror comedies
You can join all the tables together, aggregate by movie and filter with a HAVING clause:
select m.id, m.title
from movie m
inner join movie_genre mg on mg.movie_id = m.id
inner join genre g on g.id = mg.genre_id
inner join movie_rating mr on mr.movie_id = m.id
inner join rating r on r.id = mr.rating_id
group by m.id, m.title
having
max(r.title = 'R') = 1
and max(g.title = 'Horror') = 1
and max(g.title = 'Comedy') = 1
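To try the HAVING approach without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows (the 'Scary Movie' title, the 'Drama' genre and all genre/rating assignments) are made up for illustration, and the table names follow the question's singular naming. SQLite, like MySQL, evaluates a comparison such as r.title = 'R' to 1 or 0, which is what makes the max(...) = 1 trick work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE genre (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE rating (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE movie_genre (movie_id INTEGER, genre_id INTEGER);
    CREATE TABLE movie_rating (movie_id INTEGER, rating_id INTEGER);

    -- Illustrative sample data (assumed, not from the question)
    INSERT INTO movie VALUES (1, 'Evil Dead'), (2, 'Bohemian Rhapsody'), (3, 'Scary Movie');
    INSERT INTO genre VALUES (1, 'Horror'), (2, 'Comedy'), (3, 'Drama');
    INSERT INTO rating VALUES (1, 'PG-13'), (2, 'R');
    INSERT INTO movie_genre VALUES (1, 1), (2, 3), (3, 1), (3, 2);
    INSERT INTO movie_rating VALUES (1, 2), (2, 1), (3, 2);
""")

QUERY = """
    SELECT m.id, m.title
    FROM movie m
    JOIN movie_genre mg ON mg.movie_id = m.id
    JOIN genre g ON g.id = mg.genre_id
    JOIN movie_rating mr ON mr.movie_id = m.id
    JOIN rating r ON r.id = mr.rating_id
    GROUP BY m.id, m.title
    -- each MAX(...) is 1 only if at least one joined row satisfies the condition
    HAVING MAX(r.title = 'R') = 1
       AND MAX(g.title = 'Horror') = 1
       AND MAX(g.title = 'Comedy') = 1
"""

r_horror_comedies = conn.execute(QUERY).fetchall()
print(r_horror_comedies)
```

Only the movie that is tagged Horror and Comedy and rated R survives the HAVING filter; a movie that matches just one of the genres is dropped.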
You can also use a couple of exists conditions along with correlated subqueries:
select m.*
from movie m
where
exists (
select 1
from movie_genre mg
inner join genre g on g.id = mg.genre_id
where mg.movie_id = m.id and g.title = 'Horror'
)
and exists (
select 1
from movie_genre mg
inner join genre g on g.id = mg.genre_id
where mg.movie_id = m.id and g.title = 'Comedy'
)
and exists (
select 1
from movie_rating mr
inner join rating r on r.id = mr.rating_id
where mr.movie_id = m.id and r.title = 'R'
)
Related
For each movie genre, I want to find the N actors who have played in the most movies of that genre.
Tables and their columns:
actor(actor_id,name)
role(actor_id,movie_id)
movie(movie_id,title)
movie_has_genre(movie_id,genre_id)
genre(genre_id,genre_name)
With this query I can find the actors who played in the most movies of each genre.
select t1.genre_name, t1.actor_id, t1.max_value
from
(
select g.genre_name, a.actor_id, count(*) as max_value
from genre g
inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
inner join movie m on mhg.movie_id = m.movie_id
inner join role r on m.movie_id = r.movie_id
inner join actor a on a.actor_id = r.actor_id
group by g.genre_name, a.actor_id
) t1
inner join
(
select genre_name, MAX(max_value) AS max_value
from
(
select g.genre_name, a.actor_id, count(*) as max_value
from genre g
inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
inner join movie m on mhg.movie_id = m.movie_id
inner join role r on m.movie_id = r.movie_id
inner join actor a on a.actor_id = r.actor_id
group by g.genre_name, a.actor_id
) t
GROUP BY genre_name
) t2
ON t1.genre_name = t2.genre_name and t1.max_value = t2.max_value
ORDER BY
t1.max_value desc;
But I want to limit the number of actors per genre to 1. How can I do that?
Example:
Results I get:
genre_name | actor_id | max_value
==================================
Thriller | 22591 | 7
Drama | 22591 | 6
Crime | 65536 | 3
Horror | 22591 | 3
Action | 292028 | 3
Action | 378578 | 3
Action | 388698 | 3
Results I want:
genre_name | actor_id | max_value
==================================
Thriller | 22591 | 7
Drama | 22591 | 6
Crime | 65536 | 3
Horror | 22591 | 3
Action | 292028 | 3
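If you want just one arbitrarily chosen actor per genre, add GROUP BY genre_name, max_value to your query. This relies on MySQL running with ONLY_FULL_GROUP_BY disabled, because actor_id is neither grouped nor aggregated; with that mode enabled, wrap actor_id in MIN() or ANY_VALUE(). The full query: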
select genre_name, actor_id, max_value
from
(
select g.genre_name, a.actor_id, count(*) as max_value
from genre g
inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
inner join movie m on mhg.movie_id = m.movie_id
inner join role r on m.movie_id = r.movie_id
inner join actor a on a.actor_id = r.actor_id
group by g.genre_name, a.actor_id
) t1
inner join
(
select genre_name, MAX(max_value) AS max_value
from
(
select g.genre_name, a.actor_id, count(*) as max_value
from genre g
inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
inner join movie m on mhg.movie_id = m.movie_id
inner join role r on m.movie_id = r.movie_id
inner join actor a on a.actor_id = r.actor_id
group by g.genre_name, a.actor_id
) t
GROUP BY genre_name
) t2
USING(genre_name,max_value)
GROUP BY genre_name, max_value
ORDER BY max_value desc;
Some of the joins you used are redundant: role already carries both actor_id and movie_id, so the joins to movie and actor can be dropped.
SELECT
U.genre_name, U.actor_id, U.actor_genre_count
FROM
(SELECT
A.genre_id, A.genre_name, C.actor_id, count(*) actor_genre_count
FROM genre A
JOIN movie_has_genre B
ON A.genre_id=B.genre_id
JOIN role C
ON C.movie_id=B.movie_id
GROUP BY A.genre_id, A.genre_name, C.actor_id) U
JOIN
(SELECT
S.genre_id, S.genre_name, MAX(S.actor_genre_count) max_actor_genre
FROM
(SELECT
A.genre_id, A.genre_name, C.actor_id, count(*) actor_genre_count
FROM genre A
JOIN movie_has_genre B
ON A.genre_id=B.genre_id
JOIN role C
ON C.movie_id=B.movie_id
GROUP BY A.genre_id, A.genre_name, C.actor_id) S
GROUP BY S.genre_id, S.genre_name) V
ON U.genre_name=V.genre_name AND U.actor_genre_count=V.max_actor_genre;
This solution is adapted from this Stack Overflow answer about limiting results by name. I attempted to do a similar query that should choose the first actor_id and only return it.
SELECT t1.genre_name, t1.actor_id, t1.actor_movie_count
FROM
(
SELECT g.genre_name, r.actor_id, COUNT(*) AS actor_movie_count
FROM genre g
INNER JOIN movie_has_genre mhg ON mhg.genre_id = g.genre_id
INNER JOIN role r ON mhg.movie_id = r.movie_id
GROUP BY g.genre_name, r.actor_id
) t1
LEFT JOIN
(
SELECT g.genre_name, r.actor_id, COUNT(*) AS actor_movie_count
FROM genre g
INNER JOIN movie_has_genre mhg ON mhg.genre_id = g.genre_id
INNER JOIN role r ON mhg.movie_id = r.movie_id
GROUP BY g.genre_name, r.actor_id
) t2
-- t2 matches any row of the same genre that beats t1:
-- a higher count, or the same count with a lower actor_id
ON t1.genre_name = t2.genre_name
AND (t2.actor_movie_count > t1.actor_movie_count
OR (t2.actor_movie_count = t1.actor_movie_count AND t2.actor_id < t1.actor_id))
-- keep only the rows that nothing beats
WHERE t2.genre_name IS NULL
ORDER BY t1.actor_movie_count DESC
If this still doesn't solve your problem, other similar questions with explanations are described below:
SO answer about returning 1 row per group
SO question about limiting query answer to N results per group
SO question about selecting N items per category
External Article: Finding the max/first of a particular group in SQL
You can use a correlated LIMIT 1 subquery to get the id of the actor who played that genre most.
select g.genre_name, (
select r.actor_id
from movie_has_genre mg
join role r on r.movie_id = mg.movie_id
where mg.genre_id = g.genre_id
group by r.actor_id
order by count(*) desc,
r.actor_id asc -- on tie least actor_id wins
limit 1
) as actor_id
from genre g
The result would be like:
genre_name | actor_id
======================
Thriller | 22591
Drama | 22591
Crime | 65536
Horror | 22591
Action | 292028
As you see, the count is not included. If you need the count, the simple way would be to return it in the same string column with actor_id
Change the SELECT clause in the subquery to
select concat(r.actor_id, ':', count(*)) as actor_id_count
This will return the actor_id and the count in a single string column like
genre_name | actor_id_count
===========================
Thriller | 22591:7
You can then parse it (with split, explode or whatever) in your application code.
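As a small sketch of that parsing step, assuming the application code is Python and the rows arrive as (genre_name, actor_id_count) tuples; the helper name parse_actor_count is hypothetical:

```python
def parse_actor_count(value):
    """Split an 'actor_id:count' string into a pair of integers."""
    actor_id, count = value.split(":")
    return int(actor_id), int(count)

# Rows shaped like the query result above (values taken from the question's output)
rows = [("Thriller", "22591:7"), ("Drama", "22591:6")]
parsed = {genre: parse_actor_count(value) for genre, value in rows}
print(parsed)
```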
A solution with CTE (Common Table Expression) and ROW_NUMBER() (window functions) (supported by MySQL 8 and MariaDB 10.2) could be:
with cte as (
select g.genre_name, r.actor_id, count(*) as max_value,
row_number() over (partition by g.genre_name order by count(*) desc, r.actor_id) as rn
from genre g
inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
inner join role r on mhg.movie_id = r.movie_id
group by g.genre_name, r.actor_id
)
select genre_name, actor_id, max_value from cte where rn = 1
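The same idea can be tried locally with Python's sqlite3 module, assuming the bundled SQLite is 3.25 or newer (the first release with window functions). This sketch is a variation rather than the exact query above: it splits the counting and the ranking into two CTEs, and the sample rows are made up so that two actors tie in one genre.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, genre_name TEXT);
    CREATE TABLE movie_has_genre (movie_id INTEGER, genre_id INTEGER);
    CREATE TABLE role (actor_id INTEGER, movie_id INTEGER);

    -- Illustrative data: movies 1 and 2 are Action, movie 3 is Drama
    INSERT INTO genre VALUES (1, 'Action'), (2, 'Drama');
    INSERT INTO movie_has_genre VALUES (1, 1), (2, 1), (3, 2);
    -- Actors 10 and 20 tie in Action with 2 movies each; actor 30 has 1 Action and 1 Drama
    INSERT INTO role VALUES (10, 1), (10, 2), (20, 1), (20, 2), (30, 2), (30, 3);
""")

QUERY = """
    WITH counts AS (
        SELECT g.genre_name, r.actor_id, COUNT(*) AS movie_count
        FROM genre g
        JOIN movie_has_genre mhg ON mhg.genre_id = g.genre_id
        JOIN role r ON r.movie_id = mhg.movie_id
        GROUP BY g.genre_name, r.actor_id
    ),
    ranked AS (
        SELECT genre_name, actor_id, movie_count,
               ROW_NUMBER() OVER (
                   PARTITION BY genre_name
                   ORDER BY movie_count DESC, actor_id  -- lowest actor_id wins a tie
               ) AS rn
        FROM counts
    )
    SELECT genre_name, actor_id, movie_count
    FROM ranked
    WHERE rn = 1
    ORDER BY genre_name
"""

top_actors = conn.execute(QUERY).fetchall()
print(top_actors)
```

Changing rn = 1 to rn <= N returns the top N actors per genre instead of a single one.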
I have a many to many relationship between Movies and Genres. What I want to do is query for Action Comedy Movies. This is as close as I have gotten:
SELECT * FROM movies
JOIN movies_genre ON (movies.id = movies_genre.movie_id)
JOIN genres ON (movies_genre.genre_id = genres.id)
WHERE (
genres.genre = "Comedy" OR
genres.genre = "Action & Adventure"
)
But this gives me all the movies that are Comedy or Adventure. If I change the OR to an AND then I get back an empty table. Is there a simple way to do this with one query?
You want information about a movie, so SELECT * is not appropriate. The following query returns movie ids that match both genres:
SELECT mg.movie_id
FROM movies_genre mg JOIN
genres g
ON mg.genre_id = g.id
WHERE g.genre IN ('Comedy', 'Action & Adventure')
GROUP BY mg.movie_id
HAVING COUNT(*) = 2;
Notes:
Table aliases make the query much easier to write and to read.
IN is more sensible than a bunch of OR expressions.
The HAVING clause counts the number of matching genres. It assumes that genres are not repeated.
If you want full movie information, you can join that in using additional logic.
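One way to get the full movie rows is to join the grouped result back to movies. The sketch below runs that on SQLite through Python's sqlite3 module; the title column, the movie names, the 'Drama' genre and the derived-table alias matched are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE genres (id INTEGER PRIMARY KEY, genre TEXT);
    CREATE TABLE movies_genre (movie_id INTEGER, genre_id INTEGER);

    -- Illustrative data: only 'Movie A' has both genres
    INSERT INTO movies VALUES (1, 'Movie A'), (2, 'Movie B'), (3, 'Movie C');
    INSERT INTO genres VALUES (1, 'Comedy'), (2, 'Action & Adventure'), (3, 'Drama');
    INSERT INTO movies_genre VALUES (1, 1), (1, 2), (2, 1), (3, 2), (3, 3);
""")

QUERY = """
    SELECT m.id, m.title
    FROM movies m
    JOIN (
        -- ids of movies that match both requested genres
        SELECT mg.movie_id
        FROM movies_genre mg
        JOIN genres g ON mg.genre_id = g.id
        WHERE g.genre IN ('Comedy', 'Action & Adventure')
        GROUP BY mg.movie_id
        HAVING COUNT(*) = 2
    ) matched ON matched.movie_id = m.id
"""

action_comedies = conn.execute(QUERY).fetchall()
print(action_comedies)
```

As with the original query, this assumes a movie is never linked to the same genre twice; if that can happen, COUNT(DISTINCT g.id) = 2 is the safer test.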
Try...
select * from movies m
where m.id in (select movie_id from movies_genre join genres on (movies_genre.genre_id = genres.id) where genres.genre = 'Comedy')
and m.id in (select movie_id from movies_genre join genres on (movies_genre.genre_id = genres.id) where genres.genre = 'Action & Adventure')
Ugh, many-to-many relationships are the worst, requiring queries like this:
select *
from movies as m
where exists (
select 1
from movies_genre as mg
inner join genres as g
on g.id = mg.genre_id
where mg.movie_id = m.id
and g.genre in ('Action & Adventure', 'Comedy')
group by mg.movie_id
having count(*) = 2
)
I found something that works, but in no way do I believe it is the best way:
SELECT * FROM movies
JOIN movies_genre ON (movies.id = movies_genre.movie_id)
JOIN genres ON (movies_genre.genre_id = genres.id)
WHERE (
genres.genre = "Action"
AND (
movies.poster IN (
SELECT movies.poster FROM movies
JOIN movies_genre ON (movies.id = movies_genre.movie_id)
JOIN genres ON (movies_genre.genre_id = genres.id)
WHERE genres.genre = "Comedy"
)
)
)
I have three tables (movie, actor, casting). I want to know the actors' names for the id obtained from this query.
select id from movie where title ='Casablanca';
My tables:
Movie(Movieid, title, yr, director, budget, gross)
Actor(Actorid, name)
casting(Movieid, Actorid, ord)
This should do it:
SELECT a.name
FROM movie m
INNER JOIN casting c
ON m.id = c.movieid
INNER JOIN actor a
ON c.actorid = a.id
WHERE m.title = 'Casablanca';
Try this:
SELECT a.id, a.name
FROM actor a
INNER JOIN casting c
ON a.id = c.actorid
INNER JOIN movie m
ON c.movieid = m.id
WHERE m.title ='Casablanca';
I have to group my anime index according to their AniDB ID and show the values in a DESCENDING order according to file auto increment id.
Here's what I did currently:
SELECT
f.id, f.category, f.anidb, f.mal_id, COUNT( * ) AS dupes, f.filename,
a.titles, a.synopsis, a.episodes, a.image, a.rating,
c.name as cat_name, c.id as categoryid
FROM table_files f
LEFT JOIN table_anidb a ON a.id = f.anidb
LEFT JOIN table_categories c ON c.id = f.category
GROUP BY a.id ORDER BY f.id DESC
PROBLEM:
I have 8 episodes of Naruto. Episode 8's ID is 204, and episode 1 has ID 160. The query returns this:
id | anidb | filename | dupes | cat_name
--------------------------------------------------------
201 | 8692 | SAO | 1 | Series
200 | 9251 | RYO | 1 | Movie
.....
.......
160 | 239 | Naruto ep.1 | 8 | Series
But I want Naruto episode 8 to be shown at the top of the results instead of episode 1 at the bottom.
How do I group by anidb and mal_id at the same time with an OR logic? So that the grouping can be done even if there is not any anidb ID provided.
Ad. 1.
Since id, anidb and filename are all in one table, I'm afraid you can't get away from doing a subquery join:
SQLFiddle
SELECT f.id, f.anidb, f.filename
FROM files f
JOIN
(SELECT MAX(id) as id FROM files GROUP BY anidb) AS f2
ON f2.id = f.id
ORDER BY f.id DESC
(data flattened for the sake of readability, but you can get the general idea)
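Here is a minimal sketch of that subquery join running on SQLite through Python's sqlite3 module. The rows mirror the ids from the question; the 'Naruto ep.8' filename is assumed, since the question only gives its id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, anidb INTEGER, filename TEXT);

    INSERT INTO files VALUES
        (160, 239, 'Naruto ep.1'),
        (200, 9251, 'RYO'),
        (201, 8692, 'SAO'),
        (204, 239, 'Naruto ep.8');
""")

QUERY = """
    SELECT f.id, f.anidb, f.filename
    FROM files f
    JOIN (
        -- newest file id within each anidb group
        SELECT MAX(id) AS id FROM files GROUP BY anidb
    ) AS f2 ON f2.id = f.id
    ORDER BY f.id DESC
"""

latest = conn.execute(QUERY).fetchall()
print(latest)
```

Each anidb group is now represented by its newest file, so the Naruto group sorts by episode 8's id (204) and lands at the top.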
Ad. 2.
As for the second problem, you really just have to add second grouping column to the above joined subquery:
SQLFiddle
SELECT f.id, f.anidb, f.mal_id, f.filename
FROM files f
JOIN
(SELECT MAX(id) as id FROM files GROUP BY anidb, mal_id) AS f2 on f2.id = f.id
ORDER BY f.id DESC
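Note that GROUP BY treats NULLs as equal to each other (even though NULL = NULL is not true in a comparison), so all rows whose anidb and mal_id are both NULL end up in a single group; rows with a NULL anidb but different mal_id values stay in separate groups.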
For the first problem you can use ORDER BY dupes
SELECT
f.id, f.category, f.anidb, f.mal_id, COUNT( * ) AS dupes, f.filename,
a.titles, a.synopsis, a.episodes, a.image, a.rating,
c.name as cat_name, c.id as categoryid
FROM table_files f
LEFT JOIN table_anidb a ON a.id = f.anidb
LEFT JOIN table_categories c ON c.id = f.category
GROUP BY a.id ORDER BY dupes DESC
For the second problem you can use CASE to check if f.anidb is null
SELECT
f.id, f.category, f.anidb, f.mal_id, COUNT( * ) AS dupes, f.filename,
a.titles, a.synopsis, a.episodes, a.image, a.rating,
c.name as cat_name, c.id as categoryid
FROM table_files f
LEFT JOIN table_anidb a ON a.id = f.anidb
LEFT JOIN table_categories c ON c.id = f.category
GROUP BY
(CASE WHEN f.anidb IS NULL THEN f.mal_id ELSE f.anidb END )
ORDER BY dupes DESC
So let's say I'm trying to find movies that two actors appear in together (for the purpose of a degrees of separation page). I have tables as such (this is just some made-up data):
actors
id first_name last_name gender
17 brad pitt m
2 kevin bacon m
movies
id name year
20 benjamin button 2008
roles
a_id m_id role
17 20 Mr. Benjamin Button
So I want to return the names of the movies which both actors are in. I have the first and last names of two actors.
I'm having a lot of trouble getting this to work. What I'm having trouble with, specifically, is the SELECT part
SELECT name FROM movies JOIN . . .
I'm starting with first_name and last_name values for each
You must join twice:
SELECT m.name movie_name
FROM movies m join roles r1 on
r1.m_id = m.id join actors a1 on
r1.a_id = a1.id join roles r2 on
r2.m_id = m.id join actors a2 on
r2.a_id = a2.id
WHERE
a1.first_name = 'brad' and a1.last_name = 'pitt' and
a2.first_name = 'kevin' and a2.last_name = 'bacon'
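To see the double join in action, here is a small sketch using Python's sqlite3 module with the names passed as parameters. Movie 21 and the two roles in it are made up so that the sample actors actually share a film, and the helper name shared_movies is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actors (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, gender TEXT);
    CREATE TABLE movies (id INTEGER PRIMARY KEY, name TEXT, year INTEGER);
    CREATE TABLE roles (a_id INTEGER, m_id INTEGER, role TEXT);

    INSERT INTO actors VALUES (17, 'brad', 'pitt', 'm'), (2, 'kevin', 'bacon', 'm');
    -- Movie 21 and its roles are invented for this example
    INSERT INTO movies VALUES (20, 'benjamin button', 2008), (21, 'made-up film', 2010);
    INSERT INTO roles VALUES
        (17, 20, 'Mr. Benjamin Button'),
        (17, 21, 'made-up role A'),
        (2, 21, 'made-up role B');
""")

def shared_movies(first1, last1, first2, last2):
    """Return the names of movies in which both named actors have a role."""
    sql = """
        SELECT m.name
        FROM movies m
        JOIN roles r1 ON r1.m_id = m.id
        JOIN actors a1 ON r1.a_id = a1.id
        JOIN roles r2 ON r2.m_id = m.id
        JOIN actors a2 ON r2.a_id = a2.id
        WHERE a1.first_name = ? AND a1.last_name = ?
          AND a2.first_name = ? AND a2.last_name = ?
    """
    return [row[0] for row in conn.execute(sql, (first1, last1, first2, last2))]

together = shared_movies('brad', 'pitt', 'kevin', 'bacon')
print(together)
```

The roles table is joined once per actor, so a movie appears only when it has a matching role row for each of them.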
Show all actor combinations per film:
SELECT m.name movie_name, a1.id actor1, a2.id actor2
FROM movies m join roles r1 on
r1.m_id = m.id join actors a1 on
r1.a_id = a1.id join roles r2 on
r2.m_id = m.id join actors a2 on
r2.a_id = a2.id
WHERE
a1.id < a2.id
The < ensures that each combination is only reported once.
select m.name,group_concat(concat_ws(' ',a.first_name,a.last_name) order by a.last_name) as actors
from actors as a
inner join roles as r on a.id = r.a_id
inner join movies as m on m.id = r.m_id
where r.a_id in (2,17)
group by r.m_id
having count(r.a_id) = 2
order by m.name
declare @FirstActorID int,
@SecondActorID int;
select m.[name]
from
movies m
inner join [roles] r1 on r1.m_id = m.id and r1.a_id = @FirstActorID
inner join [roles] r2 on r2.m_id = m.id and r2.a_id = @SecondActorID