A seemingly generic SQL query really left me clueless.
Here's the case.
I have 3 generic tables (simplified versions here):
Movie
id | title
-----------------------
1 | Evil Dead
-----------------------
2 | Bohemian Rhapsody
....
Genre
id | title
-----------------------
1 | Horror
-----------------------
2 | Comedy
....
Rating
id | title
-----------------------
1 | PG-13
-----------------------
2 | R
....
And 2 many-to-many tables to connect them:
Movie_Genre
movie_id | genre_id
Movie_Rating
movie_id | rating_id
The initial challenge was to write a query which allows me to fetch movies that belong to multiple genres (e.g. horror comedies or sci-fi action).
Thankfully, I was able to find this solution here
MySQL: Select records where joined table matches ALL values
However, what would be the correct way to fetch records that must match across multiple many-to-many tables? E.g. rated R horror comedies. Is there any way to do so without a subquery (or with only a single one)?
One method uses correlated subqueries:
select m.*
from movie m
where (select count(*)
from movie_genre mg
where mg.movie_id = m.id
) > 1 and
(select count(*)
from movie_rating mr
where mr.movie_id = m.id
) > 1 ;
With indexes on movie_genre(movie_id) and movie_rating(movie_id) this probably has quite reasonable performance.
The above is possibly the most efficient method. However, if you wanted to avoid subqueries, one method would be:
select mg.movie_id
from movie_genre mg join
movie_rating mr
on mg.movie_id = mr.movie_id
group by mg.movie_id
having count(distinct mg.genre_id) > 1 and
count(distinct mr.rating_id) > 1;
More efficient than the above is aggregating before the join:
select mg.movie_id
from (select movie_id
from movie_genre
group by movie_id
having count(*) >= 2
) mg join
(select movie_id
from movie_rating
group by movie_id
having count(*) >= 2
) mr
on mg.movie_id = mr.movie_id;
Although you state that you want to avoid subqueries, the irony is that the version with no subqueries probably has the worst performance of these three options.
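To compare the correlated-subquery and aggregate-before-join versions on a small data set, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The sample rows, the index names and the variable names are assumptions made up for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table movie (id integer primary key, title text);
create table movie_genre (movie_id integer, genre_id integer);
create table movie_rating (movie_id integer, rating_id integer);
create index idx_mg_movie on movie_genre (movie_id);
create index idx_mr_movie on movie_rating (movie_id);

-- Invented sample rows: only movie 1 has several genres AND several ratings.
insert into movie values (1, 'Evil Dead'), (2, 'Bohemian Rhapsody');
insert into movie_genre values (1, 1), (1, 2), (2, 1), (2, 2);
insert into movie_rating values (1, 1), (1, 2), (2, 1);
""")

# Version 1: one correlated count per link table.
correlated = """
select m.id
from movie m
where (select count(*) from movie_genre mg where mg.movie_id = m.id) > 1
  and (select count(*) from movie_rating mr where mr.movie_id = m.id) > 1
order by m.id
"""

# Version 3: aggregate each link table first, then join the survivors.
pre_aggregated = """
select mg.movie_id
from (select movie_id from movie_genre group by movie_id having count(*) >= 2) mg
join (select movie_id from movie_rating group by movie_id having count(*) >= 2) mr
  on mg.movie_id = mr.movie_id
order by mg.movie_id
"""

correlated_ids = [row[0] for row in conn.execute(correlated)]
pre_aggregated_ids = [row[0] for row in conn.execute(pre_aggregated)]
print(correlated_ids, pre_aggregated_ids)
```

Both lists should contain the same movie ids. Which version is actually faster depends on the engine and the data, so it is worth checking the execution plan on your own tables.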
E.g. rated R horror comedies
You can join all the tables together, aggregate by movie and filter with a HAVING clause:
select m.id, m.title
from movie m
inner join movie_genre mg on mg.movie_id = m.id
inner join genre g on g.id = mg.genre_id
inner join movie_rating mr on mr.movie_id = m.id
inner join rating r on r.id = mr.rating_id
group by m.id, m.title
having
max(r.title = 'R') = 1
and max(g.title = 'Horror') = 1
and max(g.title = 'Comedy') = 1
You can also use a couple of exists conditions along with correlated subqueries:
select m.*
from movie m
where
exists (
select 1
from movie_rating mr
inner join rating r on r.id = mr.rating_id
where mr.movie_id = m.id and r.title = 'R'
)
and exists (
select 1
from movie_genre mg
inner join genre g on g.id = mg.genre_id
where mg.movie_id = m.id and g.title = 'Horror'
)
and exists (
select 1
from movie_genre mg
inner join genre g on g.id = mg.genre_id
where mg.movie_id = m.id and g.title = 'Comedy'
)
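The two approaches above (join plus HAVING, and one EXISTS per condition) can be checked against each other on a toy data set. This is only a sketch run on SQLite through Python's sqlite3 module, not on MySQL; the link rows and the third movie are invented for the example, and the `max(condition) = 1` trick relies on the engine evaluating a comparison to 0 or 1, which both MySQL and SQLite do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table movie (id integer primary key, title text);
create table genre (id integer primary key, title text);
create table rating (id integer primary key, title text);
create table movie_genre (movie_id integer, genre_id integer);
create table movie_rating (movie_id integer, rating_id integer);

insert into movie values (1, 'Evil Dead'), (2, 'Bohemian Rhapsody'), (3, 'Made Up Spoof');
insert into genre values (1, 'Horror'), (2, 'Comedy');
insert into rating values (1, 'PG-13'), (2, 'R');
-- Invented links: movie 1 is the only R-rated horror comedy in this sample.
insert into movie_genre values (1, 1), (1, 2), (2, 2), (3, 1), (3, 2);
insert into movie_rating values (1, 2), (2, 1), (3, 1);
""")

having_query = """
select m.id, m.title
from movie m
inner join movie_genre mg on mg.movie_id = m.id
inner join genre g on g.id = mg.genre_id
inner join movie_rating mr on mr.movie_id = m.id
inner join rating r on r.id = mr.rating_id
group by m.id, m.title
having max(r.title = 'R') = 1
   and max(g.title = 'Horror') = 1
   and max(g.title = 'Comedy') = 1
order by m.id
"""

exists_query = """
select m.id, m.title
from movie m
where exists (select 1 from movie_rating mr
              inner join rating r on r.id = mr.rating_id
              where mr.movie_id = m.id and r.title = 'R')
  and exists (select 1 from movie_genre mg
              inner join genre g on g.id = mg.genre_id
              where mg.movie_id = m.id and g.title = 'Horror')
  and exists (select 1 from movie_genre mg
              inner join genre g on g.id = mg.genre_id
              where mg.movie_id = m.id and g.title = 'Comedy')
order by m.id
"""

having_rows = conn.execute(having_query).fetchall()
exists_rows = conn.execute(exists_query).fetchall()
print(having_rows, exists_rows)
```

Both queries should return the same single movie here; the EXISTS form needs one extra block per required genre, while the HAVING form needs one extra condition.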
For each movie genre, I want to find the N actors who have played in the most movies of that genre.
Tables and their columns:
actor(actor_id,name)
role(actor_id,movie_id)
movie(movie_id,title)
movie_has_genre(movie_id,genre_id)
genre(genre_id,genre_name)
With this query I can find the actors who played in the most movies of the same genre.
select t1.genre_name, t1.actor_id, t1.max_value
from
(
select g.genre_name, a.actor_id, count(*) as max_value
from genre g
inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
inner join movie m on mhg.movie_id = m.movie_id
inner join role r on m.movie_id = r.movie_id
inner join actor a on a.actor_id = r.actor_id
group by g.genre_name, a.actor_id
) t1
inner join
(
select genre_name, MAX(max_value) AS max_value
from
(
select g.genre_name, a.actor_id, count(*) as max_value
from genre g
inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
inner join movie m on mhg.movie_id = m.movie_id
inner join role r on m.movie_id = r.movie_id
inner join actor a on a.actor_id = r.actor_id
group by g.genre_name, a.actor_id
) t
GROUP BY genre_name
) t2
ON t1.genre_name = t2.genre_name and t1.max_value = t2.max_value
ORDER BY
t1.max_value desc;
But I want to limit the number of actors per genre to 1. How can I do that?
Example:
Results I get:
genre_name | actor_id | max_value
==================================
Thriller | 22591 | 7
Drama | 22591 | 6
Crime | 65536 | 3
Horror | 22591 | 3
Action | 292028 | 3
Action | 378578 | 3
Action | 388698 | 3
Results I want:
genre_name | actor_id | max_value
==================================
Thriller | 22591 | 7
Drama | 22591 | 6
Crime | 65536 | 3
Horror | 22591 | 3
Action | 292028 | 3
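If you want just one actor per genre, chosen arbitrarily, add a GROUP BY genre_name, max_value to the end of your query. Note that this relies on MySQL returning an arbitrary actor_id for each group, which only works when ONLY_FULL_GROUP_BY is disabled: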
select genre_name, actor_id, max_value
from
(
select g.genre_name, a.actor_id, count(*) as max_value
from genre g
inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
inner join movie m on mhg.movie_id = m.movie_id
inner join role r on m.movie_id = r.movie_id
inner join actor a on a.actor_id = r.actor_id
group by g.genre_name, a.actor_id
) t1
inner join
(
select genre_name, MAX(max_value) AS max_value
from
(
select g.genre_name, a.actor_id, count(*) as max_value
from genre g
inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
inner join movie m on mhg.movie_id = m.movie_id
inner join role r on m.movie_id = r.movie_id
inner join actor a on a.actor_id = r.actor_id
group by g.genre_name, a.actor_id
) t
GROUP BY genre_name
) t2
USING(genre_name,max_value)
GROUP BY genre_name, max_value
ORDER BY max_value desc;
Some of the joins you used are redundant; the movie and actor tables are not needed:
SELECT
U.genre_name, U.actor_id, U.actor_genre_count
FROM
(SELECT
A.genre_id, A.genre_name, C.actor_id, count(*) actor_genre_count
FROM genre A
JOIN movie_has_genre B
ON A.genre_id=B.genre_id
JOIN role C
ON C.movie_id=B.movie_id
GROUP BY A.genre_id, A.genre_name, C.actor_id) U
JOIN
(SELECT
S.genre_id, S.genre_name, MAX(S.actor_genre_count) max_actor_genre
FROM
(SELECT
A.genre_id, A.genre_name, C.actor_id, count(*) actor_genre_count
FROM genre A
JOIN movie_has_genre B
ON A.genre_id=B.genre_id
JOIN role C
ON C.movie_id=B.movie_id
GROUP BY A.genre_id, A.genre_name, C.actor_id) S
GROUP BY S.genre_id, S.genre_name) V
ON U.genre_name=V.genre_name AND U.actor_genre_count=V.max_actor_genre;
This solution is adapted from this Stack Overflow answer about limiting results by name. I attempted to do a similar query that should choose the first actor_id and only return it.
SELECT t1.genre_name, t1.actor_id, t1.actor_movie_count
FROM
(
SELECT g.genre_name, r.actor_id, COUNT(*) AS actor_movie_count
FROM genre g
INNER JOIN movie_has_genre mhg ON mhg.genre_id = g.genre_id
INNER JOIN role r ON mhg.movie_id = r.movie_id
GROUP BY g.genre_name, r.actor_id
) t1
LEFT JOIN
(
SELECT g.genre_name, r.actor_id, COUNT(*) AS actor_movie_count
FROM genre g
INNER JOIN movie_has_genre mhg ON mhg.genre_id = g.genre_id
INNER JOIN role r ON mhg.movie_id = r.movie_id
GROUP BY g.genre_name, r.actor_id
) t2
-- t2 "beats" t1 if it has a higher count, or the same count and a lower actor_id
ON t1.genre_name = t2.genre_name
AND (t2.actor_movie_count > t1.actor_movie_count
OR (t2.actor_movie_count = t1.actor_movie_count AND t2.actor_id < t1.actor_id))
-- keep only the rows that nothing beats: one actor per genre
WHERE t2.genre_name IS NULL
ORDER BY t1.actor_movie_count DESC;
If this still doesn't solve your problem, other similar questions with explanations are described below:
SO answer about returning 1 row per group
SO question about limiting query answer to N results per group
SO question about selecting N items per category
External Article: Finding the max/first of a particular group in SQL
You can use a correlated LIMIT 1 subquery to get the id of the actor who played that genre most.
select g.genre_name, (
select r.actor_id
from movie_has_genre mg
join role r on r.movie_id = mg.movie_id
where mg.genre_id = g.genre_id
group by r.actor_id
order by count(*) desc,
r.actor_id asc -- on tie least actor_id wins
limit 1
) as actor_id
from genre g
The result would be like:
genre_name | actor_id
======================
Thriller | 22591
Drama | 22591
Crime | 65536
Horror | 22591
Action | 292028
As you can see, the count is not included. If you need the count, the simplest way would be to return it in the same string column as the actor_id.
Change the SELECT clause in the subquery to
select concat(r.actor_id, ':', count(*)) as actor_id_count
This will return the actor_id and the count in a single string column like
genre_name | actor_id_count
===========================
Thriller | 22591:7
You can then parse it (with split, explode or whatever) in your application code.
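As an illustration of the parsing step, here is a minimal Python sketch. The function name `parse_actor_count` is hypothetical, and it assumes the column always has the `actor_id:count` shape produced by the concat above.

```python
from typing import Tuple


def parse_actor_count(value: str) -> Tuple[int, int]:
    """Split an 'actor_id:count' string into two integers."""
    actor_id, movie_count = value.split(":", 1)
    return int(actor_id), int(movie_count)


print(parse_actor_count("22591:7"))
```

A genre with no roles at all would yield NULL from the subquery, so real application code would probably need to handle a missing value before parsing.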
A solution with CTE (Common Table Expression) and ROW_NUMBER() (window functions) (supported by MySQL 8 and MariaDB 10.2) could be:
with cte as (
select g.genre_name, r.actor_id, count(*) as max_value,
row_number() over (partition by g.genre_name order by count(*) desc, r.actor_id) as rn
from genre g
inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
inner join role r on mhg.movie_id = r.movie_id
group by g.genre_name, r.actor_id
)
select genre_name, actor_id, max_value from cte where rn = 1
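The ROW_NUMBER() idea also extends to the "top N per genre" case from the question. The following is a sketch only, run on SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or newer) rather than on MySQL 8. The sample rows are invented, and splitting the work into a `counts` and a `ranked` CTE is my own arrangement, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table genre (genre_id integer primary key, genre_name text);
create table movie_has_genre (movie_id integer, genre_id integer);
create table role (actor_id integer, movie_id integer);

-- Invented sample rows: actors 100 and 200 tie in Action with 2 movies each.
insert into genre values (1, 'Action'), (2, 'Drama');
insert into movie_has_genre values (10, 1), (11, 1), (12, 2);
insert into role values (100, 10), (100, 11), (200, 10), (200, 11), (300, 12);
""")

top_actors_per_genre = """
with counts as (
    -- movies per (genre, actor)
    select g.genre_name, r.actor_id, count(*) as movie_count
    from genre g
    inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
    inner join role r on r.movie_id = mhg.movie_id
    group by g.genre_name, r.actor_id
),
ranked as (
    -- rank actors inside each genre; ties go to the lowest actor_id
    select genre_name, actor_id, movie_count,
           row_number() over (partition by genre_name
                              order by movie_count desc, actor_id) as rn
    from counts
)
select genre_name, actor_id, movie_count
from ranked
where rn <= ?
order by genre_name, rn
"""

top_one = conn.execute(top_actors_per_genre, (1,)).fetchall()
top_two = conn.execute(top_actors_per_genre, (2,)).fetchall()
print(top_one)
print(top_two)
```

Passing N as a bound parameter keeps the same query usable for one actor per genre or for several.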
I want to perform a search query on my database tables. Here is an example of my tables.
movies
--------------------------
ID (INT) | TITLE (VARCHAR)
--------------------------
actors
-------------------------
ID (INT) | NAME (VARCHAR)
-------------------------
directors
-------------------------
ID (INT) | NAME (VARCHAR)
-------------------------
ma
------------------------------------
ID (INT) | MOVIE (INT) | ACTOR (INT)
------------------------------------
md
---------------------------------------
ID (INT) | MOVIE (INT) | DIRECTOR (INT)
---------------------------------------
I want to search movies.title, actors.name and directors.name for a keyword. The ma table has a movie field that references movies.id and an actor field that references actors.id; md works the same way for directors. I want to display the results grouped by movies.id. The problem is that I can't figure out how to search the actors and directors tables through ma and md, which reference the movies table. Sorry for the grammar; if you think this question is worth it, feel free to edit it.
So, any idea?
This one returns everything:
SELECT m.title, a.name AS actor, d.name AS director
FROM movies m
INNER JOIN ma ON (ma.movie = m.id)
INNER JOIN actors a ON (a.id = ma.actor)
INNER JOIN md ON (md.movie = m.id)
INNER JOIN directors d ON (d.id = md.director)
ORDER BY m.title
This one groups by movie (DISTINCT avoids repeating each name once per actor/director combination):
SELECT m.title, GROUP_CONCAT(DISTINCT a.name) AS actors, GROUP_CONCAT(DISTINCT d.name) AS directors
FROM movies m
INNER JOIN ma ON (ma.movie = m.id)
INNER JOIN actors a ON (a.id = ma.actor)
INNER JOIN md ON (md.movie = m.id)
INNER JOIN directors d ON (d.id = md.director)
GROUP BY m.id
ORDER BY m.title
[EDIT]
For goodness' sake, just add the WHERE clause (it goes before the GROUP BY in the grouped query):
...
WHERE m.title LIKE 'KEYWORD%' OR a.name LIKE 'KEYWORD%' OR d.name LIKE 'KEYWORD%'
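Putting the grouped query and the WHERE clause together, a keyword search could look roughly like the sketch below. It runs on SQLite through Python's sqlite3 module instead of MySQL, the sample rows and the helper name `search_movies` are invented, and the keyword is passed as a bound parameter rather than pasted into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table movies (id integer primary key, title text);
create table actors (id integer primary key, name text);
create table directors (id integer primary key, name text);
create table ma (id integer primary key, movie integer, actor integer);
create table md (id integer primary key, movie integer, director integer);

-- Invented sample rows.
insert into movies values (1, 'Alpha'), (2, 'Beta');
insert into actors values (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
insert into directors values (1, 'Dee'), (2, 'Alphonse');
insert into ma values (1, 1, 1), (2, 1, 2), (3, 2, 3);
insert into md values (1, 1, 1), (2, 2, 2);
""")


def search_movies(keyword):
    """Return (title, actors, directors) rows where any of the three starts with keyword."""
    pattern = keyword + "%"  # note: % or _ inside keyword still act as wildcards
    sql = """
    select m.title,
           group_concat(distinct a.name) as actors,
           group_concat(distinct d.name) as directors
    from movies m
    inner join ma on ma.movie = m.id
    inner join actors a on a.id = ma.actor
    inner join md on md.movie = m.id
    inner join directors d on d.id = md.director
    where m.title like ? or a.name like ? or d.name like ?
    group by m.id, m.title
    order by m.title
    """
    return conn.execute(sql, (pattern, pattern, pattern)).fetchall()


print(search_movies("Al"))
```

Because the filter is applied before grouping, a match on an actor name lists only the matching actors for that movie, not the full cast. The inner joins also drop movies that have no actor or no director row.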
So let's say I'm trying to find actors who are in two movies together (for the purpose of a degrees of separation page). I have tables as such (this is just some made up data):
actors
id first_name last_name gender
17 brad pitt m
2 kevin bacon m
movies
id name year
20 benjamin button 2008
roles
a_id m_id role
17 20 Mr. Benjamin Button
So I want to return the names of the movies which both actors are in. I have the first and last names of two actors.
I'm having a lot of trouble getting this to work. What I'm having trouble with, specifically, is the SELECT part
SELECT name FROM movies JOIN . . .
I'm starting with first_name and last_name values for each
You must join twice:
SELECT m.name movie_name
FROM movies m join roles r1 on
r1.m_id = m.id join actors a1 on
r1.a_id = a1.id join roles r2 on
r2.m_id = m.id join actors a2 on
r2.a_id = a2.id
WHERE
a1.first_name = 'brad' and a1.last_name = 'pitt' and
a2.first_name = 'kevin' and a2.last_name = 'bacon'
Show all actor combinations per film:
SELECT m.name movie_name, a1.id actor1, a2.id actor2
FROM movies m join roles r1 on
r1.m_id = m.id join actors a1 on
r1.a_id = a1.id join roles r2 on
r2.m_id = m.id join actors a2 on
r2.a_id = a2.id
WHERE
a1.id < a2.id
The < ensures that each combination is only reported once.
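The double join on roles can be tried out with the sample rows from the question. This sketch uses Python's sqlite3 module; movie 21 and its two roles are invented so that the two actors actually share a film, and the helper name `shared_movies` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table actors (id integer primary key, first_name text, last_name text, gender text);
create table movies (id integer primary key, name text, year integer);
create table roles (a_id integer, m_id integer, role text);

insert into actors values (17, 'brad', 'pitt', 'm'), (2, 'kevin', 'bacon', 'm');
-- Movie 21 and its roles are invented so that the two actors share a film.
insert into movies values (20, 'benjamin button', 2008), (21, 'made up film', 2000);
insert into roles values (17, 20, 'Mr. Benjamin Button'),
                         (17, 21, 'Role A'), (2, 21, 'Role B');
""")


def shared_movies(first1, last1, first2, last2):
    """Names of movies in which both actors have at least one role."""
    sql = """
    select distinct m.name
    from movies m
    join roles r1 on r1.m_id = m.id
    join actors a1 on a1.id = r1.a_id
    join roles r2 on r2.m_id = m.id
    join actors a2 on a2.id = r2.a_id
    where a1.first_name = ? and a1.last_name = ?
      and a2.first_name = ? and a2.last_name = ?
    order by m.name
    """
    return [row[0] for row in conn.execute(sql, (first1, last1, first2, last2))]


print(shared_movies("brad", "pitt", "kevin", "bacon"))
```

The `distinct` is an addition of mine: it keeps a movie from being listed twice if one of the actors plays several roles in it.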
select m.name,group_concat(concat_ws(' ',a.first_name,a.last_name) order by a.last_name) as actors
from actors as a
inner join roles as r on a.id = r.a_id
inner join movies as m on m.id = r.m_id
where r.a_id in (2,17)
group by r.m_id
having count(distinct r.a_id) = 2
order by m.name
declare @FirstActorID int,
@SecondActorID int;
select m.[name]
from
movies m
inner join [roles] r1 on r1.m_id = m.id and r1.a_id = @FirstActorID
inner join [roles] r2 on r2.m_id = m.id and r2.a_id = @SecondActorID
Consider I have three tables:
Movies (movie 1, movie 2, etc)
Categories (action, suspense, etc)
Movies_Categories (movie 1 -> action, movie 1 -> suspense, movie 2 -> suspense, etc)
How could I select only the movies that belong or don't belong to a specific category using only 1 query?
Thanks!
Belongs:
SELECT m.*
FROM movies m
INNER JOIN movies_categories mc
ON m.id = mc.movie_id
INNER JOIN categories c
ON c.id = mc.category_id
AND c.name = 'action';
Doesn't belong:
SELECT m.*
FROM movies m
LEFT OUTER JOIN (SELECT mc.movie_id
FROM movies_categories mc
INNER JOIN categories c
ON c.id = mc.category_id
AND c.name = 'action') mcx
ON m.id = mcx.movie_id
WHERE mcx.movie_id IS NULL
SELECT m.*,
IF(mc.movie_id IS NULL, 'doesnt belong', 'belongs')
FROM Movies m
LEFT JOIN Movies_Categories mc ON mc.movie_id = m.id
AND mc.category_id = (SELECT id
FROM categories
WHERE name = 'action')
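The single-query "belongs / doesn't belong" flag can be checked on the example data from the question. This is a sketch on SQLite via Python's sqlite3 module; SQLite has no IF() function, so the sketch uses the standard CASE expression instead, and movie 3 (with no categories at all) is an invented extra row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table movies (id integer primary key, name text);
create table categories (id integer primary key, name text);
create table movies_categories (movie_id integer, category_id integer);

-- movie 3 is invented to show a movie without any category.
insert into movies values (1, 'movie 1'), (2, 'movie 2'), (3, 'movie 3');
insert into categories values (1, 'action'), (2, 'suspense');
insert into movies_categories values (1, 1), (1, 2), (2, 2);
""")

flag_query = """
select m.name,
       case when mc.movie_id is null then 'doesnt belong' else 'belongs' end as status
from movies m
left join movies_categories mc
       on mc.movie_id = m.id
      and mc.category_id = (select id from categories where name = ?)
order by m.id
"""

status_by_movie = dict(conn.execute(flag_query, ("action",)).fetchall())
print(status_by_movie)
```

Filtering the result on the `status` column (or on `mc.movie_id is null` / `is not null`) gives either of the two lists from the earlier answer.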