I want to find, for each movie genre, the N actors who have played in the most movies of that genre.
Tables and their columns:
actor(actor_id,name)
role(actor_id,movie_id)
movie(movie_id,title)
movie_has_genre(movie_id,genre_id)
genre(genre_id,genre_name)
With this query I can find the actors who played in the most movies of each genre (all of them, when several are tied).
select t1.genre_name, t1.actor_id, t1.max_value
from
(
select g.genre_name, a.actor_id, count(*) as max_value
from genre g
inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
inner join movie m on mhg.movie_id = m.movie_id
inner join role r on m.movie_id = r.movie_id
inner join actor a on a.actor_id = r.actor_id
group by g.genre_name, a.actor_id
) t1
inner join
(
select genre_name, MAX(max_value) AS max_value
from
(
select g.genre_name, a.actor_id, count(*) as max_value
from genre g
inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
inner join movie m on mhg.movie_id = m.movie_id
inner join role r on m.movie_id = r.movie_id
inner join actor a on a.actor_id = r.actor_id
group by g.genre_name, a.actor_id
) t
GROUP BY genre_name
) t2
ON t1.genre_name = t2.genre_name and t1.max_value = t2.max_value
ORDER BY
t1.max_value desc;
But I want to limit the number of actors per genre to 1. How can I do that?
Example:
Results I get:
genre_name | actor_id | max_value
==================================
Thriller | 22591 | 7
Drama | 22591 | 6
Crime | 65536 | 3
Horror | 22591 | 3
Action | 292028 | 3
Action | 378578 | 3
Action | 388698 | 3
Results I want:
genre_name | actor_id | max_value
==================================
Thriller | 22591 | 7
Drama | 22591 | 6
Crime | 65536 | 3
Horror | 22591 | 3
Action | 292028 | 3
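If you just want one actor per genre, group the outer query by genre_name and max_value and pick one actor_id from each group. Using MIN(actor_id) makes the pick deterministic (the lowest id wins) and keeps the query valid when ONLY_FULL_GROUP_BY is enabled:
select genre_name, min(actor_id) as actor_id, max_value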
from
(
select g.genre_name, a.actor_id, count(*) as max_value
from genre g
inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
inner join movie m on mhg.movie_id = m.movie_id
inner join role r on m.movie_id = r.movie_id
inner join actor a on a.actor_id = r.actor_id
group by g.genre_name, a.actor_id
) t1
inner join
(
select genre_name, MAX(max_value) AS max_value
from
(
select g.genre_name, a.actor_id, count(*) as max_value
from genre g
inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
inner join movie m on mhg.movie_id = m.movie_id
inner join role r on m.movie_id = r.movie_id
inner join actor a on a.actor_id = r.actor_id
group by g.genre_name, a.actor_id
) t
GROUP BY genre_name
) t2
USING(genre_name,max_value)
GROUP BY genre_name, max_value
ORDER BY max_value desc;
Some of the joins you used are redundant: role already contains movie_id and actor_id, so there is no need to join movie or actor.
SELECT
U.genre_name, U.actor_id, U.actor_genre_count
FROM
(SELECT
A.genre_id, A.genre_name, C.actor_id, count(*) actor_genre_count
FROM genre A
JOIN movie_has_genre B
ON A.genre_id=B.genre_id
JOIN role C
ON C.movie_id=B.movie_id
GROUP BY A.genre_id, A.genre_name, C.actor_id) U
JOIN
(SELECT
S.genre_id, S.genre_name, MAX(S.actor_genre_count) max_actor_genre
FROM
(SELECT
A.genre_id, A.genre_name, C.actor_id, count(*) actor_genre_count
FROM genre A
JOIN movie_has_genre B
ON A.genre_id=B.genre_id
JOIN role C
ON C.movie_id=B.movie_id
GROUP BY A.genre_id, A.genre_name, C.actor_id) S
GROUP BY S.genre_id, S.genre_name) V
ON U.genre_name=V.genre_name AND U.actor_genre_count=V.max_actor_genre;
This solution is adapted from a Stack Overflow answer about limiting results per group. The idea is to join each (genre, actor) count to the counts in the same genre that beat it (a higher count, or an equal count with a lower actor_id) and keep only the rows for which no such row exists. That leaves exactly one actor per genre: the lowest actor_id among those tied for the top count.
SELECT t1.genre_name, t1.actor_id, t1.actor_movie_count
FROM
(
SELECT g.genre_name, r.actor_id, COUNT(*) AS actor_movie_count
FROM genre g
INNER JOIN movie_has_genre mhg ON mhg.genre_id = g.genre_id
INNER JOIN role r ON mhg.movie_id = r.movie_id
GROUP BY g.genre_name, r.actor_id
) t1
LEFT JOIN
(
SELECT g.genre_name, r.actor_id, COUNT(*) AS actor_movie_count
FROM genre g
INNER JOIN movie_has_genre mhg ON mhg.genre_id = g.genre_id
INNER JOIN role r ON mhg.movie_id = r.movie_id
GROUP BY g.genre_name, r.actor_id
) t2
-- t2 is any actor in the same genre who ranks above t1
ON t1.genre_name = t2.genre_name
AND (t2.actor_movie_count > t1.actor_movie_count
OR (t2.actor_movie_count = t1.actor_movie_count AND t2.actor_id < t1.actor_id))
-- keep t1 only if nobody ranks above it
WHERE t2.genre_name IS NULL
ORDER BY t1.actor_movie_count DESC;
If this still doesn't solve your problem, similar questions with explanations are listed below:
SO answer about returning 1 row per group
SO question about limiting query answer to N results per group
SO question about selecting N items per category
External Article: Finding the max/first of a particular group in SQL
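The anti-join idea (keep a row only when no better row exists in its genre) can be tried out with a minimal sketch that uses SQLite through Python's sqlite3 as a stand-in for MySQL. The rows are hypothetical sample data using the question's table names, and the per-genre counts are written once as a CTE instead of two identical derived tables:

```python
import sqlite3

# In-memory database with the question's table names; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, genre_name TEXT);
CREATE TABLE movie_has_genre (movie_id INTEGER, genre_id INTEGER);
CREATE TABLE role (actor_id INTEGER, movie_id INTEGER);
INSERT INTO genre VALUES (1, 'Action'), (2, 'Drama');
INSERT INTO movie_has_genre VALUES (10, 1), (11, 1), (12, 2);
-- Actors 7 and 8 tie in Action (2 movies each); actor 9 has 1 Action and 1 Drama movie.
INSERT INTO role VALUES (7, 10), (7, 11), (8, 10), (8, 11), (9, 10), (9, 12);
""")

ANTI_JOIN = """
WITH counts AS (
    SELECT g.genre_name, r.actor_id, COUNT(*) AS actor_movie_count
    FROM genre g
    JOIN movie_has_genre mhg ON mhg.genre_id = g.genre_id
    JOIN role r ON r.movie_id = mhg.movie_id
    GROUP BY g.genre_name, r.actor_id
)
SELECT t1.genre_name, t1.actor_id, t1.actor_movie_count
FROM counts t1
LEFT JOIN counts t2
  ON t1.genre_name = t2.genre_name
 AND (t2.actor_movie_count > t1.actor_movie_count
      OR (t2.actor_movie_count = t1.actor_movie_count AND t2.actor_id < t1.actor_id))
WHERE t2.genre_name IS NULL  -- no better-ranked actor exists in the same genre
ORDER BY t1.genre_name
"""

rows = conn.execute(ANTI_JOIN).fetchall()
print(rows)  # [('Action', 7, 2), ('Drama', 9, 1)]
```

Actor 8 is dropped because actor 7 has the same count and a lower id, which is the tie-break the ON clause encodes.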
You can use a correlated LIMIT 1 subquery to get the id of the actor who played in the most movies of each genre.
select g.genre_name, (
select r.actor_id
from movie_has_genre mg
join role r on r.movie_id = mg.movie_id
where mg.genre_id = g.genre_id
group by r.actor_id
order by count(*) desc,
r.actor_id asc -- on a tie, the lowest actor_id wins
limit 1
) as actor_id
from genre g
The result would be like:
genre_name | actor_id
======================
Thriller | 22591
Drama | 22591
Crime | 65536
Horror | 22591
Action | 292028
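A minimal sketch of the correlated LIMIT 1 subquery, run in SQLite through Python's sqlite3 as a stand-in for MySQL, with hypothetical sample data. It also shows one behavioral difference from the join-based queries: a genre with no roles still appears, with a NULL (None) actor_id:

```python
import sqlite3

# Hypothetical sample data; 'Horror' has no movies at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, genre_name TEXT);
CREATE TABLE movie_has_genre (movie_id INTEGER, genre_id INTEGER);
CREATE TABLE role (actor_id INTEGER, movie_id INTEGER);
INSERT INTO genre VALUES (1, 'Action'), (2, 'Drama'), (3, 'Horror');
INSERT INTO movie_has_genre VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO role VALUES (7, 10), (7, 11), (8, 10), (8, 11), (9, 10), (9, 12);
""")

TOP_ACTOR = """
SELECT g.genre_name,
       (SELECT r.actor_id
        FROM movie_has_genre mg
        JOIN role r ON r.movie_id = mg.movie_id
        WHERE mg.genre_id = g.genre_id   -- correlated with the outer genre row
        GROUP BY r.actor_id
        ORDER BY COUNT(*) DESC, r.actor_id ASC
        LIMIT 1) AS actor_id
FROM genre g
ORDER BY g.genre_name
"""

rows = conn.execute(TOP_ACTOR).fetchall()
print(rows)  # [('Action', 7), ('Drama', 9), ('Horror', None)]
```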
As you can see, the count is not included. If you need it, the simplest way is to return it in the same string column as actor_id.
Change the SELECT clause in the subquery to:
select concat(r.actor_id, ':', count(*)) as actor_id_count
This will return the actor_id and the count in a single string column, like:
genre_name | actor_id_count
===========================
Thriller | 22591:7
You can then parse it (with split, explode or whatever your language offers) in your application code.
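For example, in Python the parsing step could look like the sketch below. The rows are hard-coded here to stand in for whatever the database driver returns; a genre with no roles would come back as NULL (None), which this sketch does not handle:

```python
from typing import Dict, List, Tuple


def parse_actor_id_count(value: str) -> Tuple[int, int]:
    """Split an 'actor_id:count' string (as built by CONCAT) into two integers."""
    actor_id, count = value.split(":", 1)
    return int(actor_id), int(count)


# Rows as they might come back from the query: (genre_name, actor_id_count).
rows: List[Tuple[str, str]] = [("Thriller", "22591:7"), ("Crime", "65536:3")]
top_actor: Dict[str, Tuple[int, int]] = {
    genre: parse_actor_id_count(value) for genre, value in rows
}
print(top_actor)  # {'Thriller': (22591, 7), 'Crime': (65536, 3)}
```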
A solution with a CTE (Common Table Expression) and ROW_NUMBER() (a window function), supported by MySQL 8.0 and MariaDB 10.2 and later, could be:
with cte as (
select g.genre_name, r.actor_id, count(*) as max_value,
row_number() over (partition by g.genre_name order by count(*) desc, r.actor_id) as rn
from genre g
inner join movie_has_genre mhg on mhg.genre_id = g.genre_id
inner join role r on mhg.movie_id = r.movie_id
group by g.genre_name, r.actor_id
)
select genre_name, actor_id, max_value from cte where rn = 1
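Because ROW_NUMBER() numbers the actors within each genre, changing rn = 1 to rn <= N answers the original "top N per genre" question. A minimal sketch, assuming SQLite 3.25 or later (needed for window functions) through Python's sqlite3 as a stand-in for MySQL 8, with hypothetical sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, genre_name TEXT);
CREATE TABLE movie_has_genre (movie_id INTEGER, genre_id INTEGER);
CREATE TABLE role (actor_id INTEGER, movie_id INTEGER);
INSERT INTO genre VALUES (1, 'Action'), (2, 'Drama');
INSERT INTO movie_has_genre VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO role VALUES (7, 10), (7, 11), (8, 10), (8, 11), (9, 10), (9, 12);
""")

TOP_N = """
WITH cte AS (
    SELECT g.genre_name, r.actor_id, COUNT(*) AS max_value,
           ROW_NUMBER() OVER (PARTITION BY g.genre_name
                              ORDER BY COUNT(*) DESC, r.actor_id) AS rn
    FROM genre g
    JOIN movie_has_genre mhg ON mhg.genre_id = g.genre_id
    JOIN role r ON mhg.movie_id = r.movie_id
    GROUP BY g.genre_name, r.actor_id
)
SELECT genre_name, actor_id, max_value
FROM cte
WHERE rn <= ?          -- keep the N best-ranked actors per genre
ORDER BY genre_name, rn
"""


def top_n_actors(n: int) -> list:
    return conn.execute(TOP_N, (n,)).fetchall()


print(top_n_actors(1))  # [('Action', 7, 2), ('Drama', 9, 1)]
print(top_n_actors(2))  # [('Action', 7, 2), ('Action', 8, 2), ('Drama', 9, 1)]
```

Using RANK() instead of ROW_NUMBER() would keep every actor tied at the cutoff rather than exactly N.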
-- Note: The actor should have acted in at least five Indian movies.
-- (Hint: You should use the weighted average based on votes. If the ratings clash, then the total number of votes should act as the tie-breaker.)
SELECT n.name as actor_name
, r.total_votes
, COUNT(r.movie_id) as movie_count
, r.avg_rating as actor_avg_rating
, RANK() OVER( PARTITION BY
rm.category = 'actor'
ORDER BY
r.avg_rating DESC
) actor_rank
FROM names as n
JOIN role_mapping as rm
ON n.id = rm.movie_id
JOIN movie as m
ON m.id = rm.movie_id
JOIN ratings as r
ON r.movie_id = m.id
where m.country regexp '^INDIA$'
and m.languages regexp '^HINDI$'
group
by actor_name
having count(rm.movie_id) >= 5;
The query runs without error, but it returns no rows.
This would work:
SELECT a.name as actor_name, c.total_votes, COUNT(c.movie_id) as movie_count,c.avg_rating as actor_avg_rating,
RANK() OVER( PARTITION BY
d.category = 'actor'
ORDER BY
c.avg_rating DESC
) actor_rank
FROM names a, movie b, ratings c, role_mapping d
where b.country = 'INDIA'
and b.id = c.movie_id
and b.id= d.movie_id
and a.id = d.name_id
group by actor_name
having count(d.movie_id) >= 5
order by actor_avg_rating desc
;
The mistake was joining names.id to role_mapping.movie_id; it should be joined to role_mapping.name_id.
SELECT NAME AS actor_name,
Cast(Sum(total_votes)/Count(movie_id) AS DECIMAL(8,0)) AS total_votes,
Count(movie_id) AS movie_count,
avg_rating AS actor_avg_rating,
Dense_rank() OVER(ORDER BY avg_rating DESC) AS actor_rank
FROM names n INNER JOIN role_mapping r ON n.id=r.name_id
INNER JOIN ratings using (movie_id) INNER JOIN movie m ON m.id=r.movie_id
WHERE country="india" AND category="actor"
GROUP BY actor_name
HAVING Count(movie_id)>=5;
This version computes the vote-weighted average rating the hint asks for and uses total votes to break rating ties:
WITH top_actor AS (
SELECT b.NAME AS actor_name,
Sum(c.total_votes) AS total_votes,
Count(DISTINCT a.movie_id) AS movie_count,
Round(Sum(c.avg_rating * c.total_votes) / Sum(c.total_votes), 2) AS actor_avg_rating
FROM role_mapping a
INNER JOIN names b ON a.name_id = b.id
INNER JOIN ratings c ON a.movie_id = c.movie_id
INNER JOIN movie d ON a.movie_id = d.id
WHERE a.category = 'actor'
AND d.country LIKE '%India%'
GROUP BY a.name_id, b.NAME
HAVING Count(DISTINCT a.movie_id) >= 5)
SELECT *,
Rank() OVER (ORDER BY actor_avg_rating DESC, total_votes DESC) AS actor_rank
FROM top_actor;
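To see why the weighted average differs from a plain AVG(avg_rating), here is the same arithmetic as a small Python sketch over hypothetical per-movie (avg_rating, total_votes) pairs; it leaves out the "at least five movies" filter and only shows the rating formula and the tie-break order:

```python
from typing import Dict, List, Tuple


def weighted_avg_rating(movies: List[Tuple[float, int]]) -> float:
    """Vote-weighted mean of (avg_rating, total_votes) pairs, rounded like the query."""
    total_votes = sum(votes for _, votes in movies)
    return round(sum(rating * votes for rating, votes in movies) / total_votes, 2)


# Hypothetical actors, each with (avg_rating, total_votes) per movie.
actors: Dict[str, List[Tuple[float, int]]] = {
    "actor_a": [(8.0, 100), (6.0, 300)],  # weighted 6.5; a plain mean would give 7.0
    "actor_b": [(6.5, 1000)],             # same weighted rating, more votes
    "actor_c": [(7.0, 50)],
}

summary = {
    name: (weighted_avg_rating(movies), sum(v for _, v in movies))
    for name, movies in actors.items()
}
# Highest rating first; total votes break ties, as the hint asks.
ranking = sorted(summary, key=lambda name: (-summary[name][0], -summary[name][1]))
print(ranking)  # ['actor_c', 'actor_b', 'actor_a']
```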
A seemingly generic SQL query has left me clueless.
Here's the case.
I have 3 generic tables (simplified versions here):
Movie
id | title
-----------------------
1 | Evil Dead
-----------------------
2 | Bohemian Rhapsody
....
Genre
id | title
-----------------------
1 | Horror
-----------------------
2 | Comedy
....
Rating
id | title
-----------------------
1 | PG-13
-----------------------
2 | R
....
And 2 many-to-many tables to connect them:
Movie_Genre
movie_id | genre_id
Movie_Rating
movie_id | rating_id
The initial challenge was to write a query which allows me to fetch movies that belong to multiple genres (e.g. horror comedies or sci-fi action).
Thankfully, I found a solution here:
MySQL: Select records where joined table matches ALL values
However, what is the correct way to fetch records that match conditions across several many-to-many tables, e.g. rated R horror comedies? Is there any way to do so without a subquery (or with only one)?
One method uses correlated subqueries:
select m.*
from movie m
where (select count(*)
from movie_genre mg
where mg.movie_id = m.id
) > 1 and
(select count(*)
from movie_rating mr
where mr.movie_id = m.id
) > 1 ;
With indexes on movie_genre(movie_id) and movie_rating(movie_id) this probably has quite reasonable performance.
The above is possibly the most efficient method. However, if you wanted to avoid subqueries, one method would be:
select mg.movie_id
from movie_genre mg join
movie_rating mr
on mg.movie_id = mr.movie_id
group by mg.movie_id
having count(distinct mg.genre_id) > 1 and
count(distinct mr.rating_id) > 1;
More efficient than the above is aggregating before the join:
select mg.movie_id
from (select movie_id
from movie_genre
group by movie_id
having count(*) >= 2
) mg join
(select movie_id
from movie_rating
group by movie_id
having count(*) >= 2
) mr
on mg.movie_id = mr.movie_id;
Although you state that you want to avoid subqueries, the irony is that the version with no subqueries probably has the worst performance of these three options.
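The performance difference comes from row multiplication: joining movie_genre to movie_rating first produces (genres × ratings) rows per movie, which is also why that version needs COUNT(DISTINCT ...). A small SQLite sketch (Python's sqlite3, hypothetical sample data) shows the fan-out and that both versions return the same movies:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie_genre (movie_id INTEGER, genre_id INTEGER);
CREATE TABLE movie_rating (movie_id INTEGER, rating_id INTEGER);
-- Movie 1: 2 genres, 2 ratings; movie 2: 2 genres, 1 rating; movie 3: 1 genre, 2 ratings.
INSERT INTO movie_genre VALUES (1, 1), (1, 2), (2, 1), (2, 2), (3, 1);
INSERT INTO movie_rating VALUES (1, 1), (1, 2), (2, 1), (3, 1), (3, 2);
""")

# Joining first multiplies rows: 2 genres x 2 ratings = 4 rows for movie 1.
fanout = conn.execute("""
SELECT COUNT(*) FROM movie_genre mg
JOIN movie_rating mr ON mg.movie_id = mr.movie_id
WHERE mg.movie_id = 1
""").fetchone()[0]

join_then_aggregate = conn.execute("""
SELECT mg.movie_id
FROM movie_genre mg
JOIN movie_rating mr ON mg.movie_id = mr.movie_id
GROUP BY mg.movie_id
HAVING COUNT(DISTINCT mg.genre_id) > 1 AND COUNT(DISTINCT mr.rating_id) > 1
""").fetchall()

aggregate_then_join = conn.execute("""
SELECT mg.movie_id
FROM (SELECT movie_id FROM movie_genre GROUP BY movie_id HAVING COUNT(*) >= 2) mg
JOIN (SELECT movie_id FROM movie_rating GROUP BY movie_id HAVING COUNT(*) >= 2) mr
  ON mg.movie_id = mr.movie_id
""").fetchall()

print(fanout, join_then_aggregate, aggregate_then_join)  # 4 [(1,)] [(1,)]
```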
For the example in the question, rated R horror comedies, you can join all the tables together, aggregate by movie and filter with a HAVING clause:
select m.id, m.title
from movie m
inner join movie_genre mg on mg.movie_id = m.id
inner join genre g on g.id = mg.genre_id
inner join movie_rating mr on mr.movie_id = m.id
inner join rating r on r.id = mr.rating_id
group by m.id, m.title
having
max(r.title = 'R') = 1
and max(g.title = 'Horror') = 1
and max(g.title = 'Comedy') = 1
You can also use one EXISTS condition with a correlated subquery per requirement:
select m.*
from movie m
where
exists (
select 1
from movie_rating mr
inner join rating r on r.id = mr.rating_id
where mr.movie_id = m.id and r.title = 'R')
and exists (
select 1
from movie_genre mg
inner join genre g on g.id = mg.genre_id
where mg.movie_id = m.id and g.title = 'Horror'
)
and exists (
select 1
from movie_genre mg
inner join genre g on g.id = mg.genre_id
where mg.movie_id = m.id and g.title = 'Comedy'
)
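Both approaches can be checked against each other with a small sketch in SQLite (through Python's sqlite3) as a stand-in for MySQL. The movie titles and their genres and ratings are hypothetical sample data; like MySQL, SQLite evaluates a comparison such as r.title = 'R' to 0 or 1, so MAX() over it works the same way:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE genre (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE rating (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE movie_genre (movie_id INTEGER, genre_id INTEGER);
CREATE TABLE movie_rating (movie_id INTEGER, rating_id INTEGER);
INSERT INTO movie VALUES (1, 'Movie A'), (2, 'Movie B'), (3, 'Movie C');
INSERT INTO genre VALUES (1, 'Horror'), (2, 'Comedy');
INSERT INTO rating VALUES (1, 'PG-13'), (2, 'R');
-- A: R horror comedy; B: PG-13 horror comedy; C: R horror only.
INSERT INTO movie_genre VALUES (1, 1), (1, 2), (2, 1), (2, 2), (3, 1);
INSERT INTO movie_rating VALUES (1, 2), (2, 1), (3, 2);
""")

HAVING_VERSION = """
SELECT m.id, m.title
FROM movie m
JOIN movie_genre mg ON mg.movie_id = m.id
JOIN genre g ON g.id = mg.genre_id
JOIN movie_rating mr ON mr.movie_id = m.id
JOIN rating r ON r.id = mr.rating_id
GROUP BY m.id, m.title
HAVING MAX(r.title = 'R') = 1        -- at least one joined row satisfies each test
   AND MAX(g.title = 'Horror') = 1
   AND MAX(g.title = 'Comedy') = 1
"""

EXISTS_VERSION = """
SELECT m.id, m.title
FROM movie m
WHERE EXISTS (SELECT 1 FROM movie_rating mr JOIN rating r ON r.id = mr.rating_id
              WHERE mr.movie_id = m.id AND r.title = 'R')
  AND EXISTS (SELECT 1 FROM movie_genre mg JOIN genre g ON g.id = mg.genre_id
              WHERE mg.movie_id = m.id AND g.title = 'Horror')
  AND EXISTS (SELECT 1 FROM movie_genre mg JOIN genre g ON g.id = mg.genre_id
              WHERE mg.movie_id = m.id AND g.title = 'Comedy')
"""

having_rows = conn.execute(HAVING_VERSION).fetchall()
exists_rows = conn.execute(EXISTS_VERSION).fetchall()
print(having_rows)  # [(1, 'Movie A')]
print(exists_rows)  # [(1, 'Movie A')]
```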
I have a movie dataset (tables Movie, M_cast and Person). For each year, I need to report the percentage of movies that year with only female actors, and the total number of movies made that year. For example, one row of the answer would be 1990 31.81 13522, meaning that in 1990 there were 13,522 movies and 31.81% of them had only female actors.
To get the movies with only female actors, I wrote the following code:
SELECT a.year as Year, COUNT(a.title) AS Female_Movies, a.title
FROM Movie a
WHERE a.title NOT IN (
SELECT b.title from Movie b
Inner Join M_cast c
on TRIM(c.MID) = b.MID
Inner Join Person d
on TRIM(c.PID) = d.PID
WHERE d.Gender='Male'
GROUP BY b.title
)
GROUP BY a.year,a.title
Order By a.year asc
The total number of movies in each year can be found with:
SELECT a.year, count(a.title) AS Total_Movies
FROM Movie a
GROUP BY a.year
ORDER BY COUNT(a.title) DESC
Combining the two, I wrote the following code:
SELECT z.year as Year, count(z.title) AS Total_Movies, count(x.title) as Female_movies, count(z.title)/ count(x.title) As percentage
FROM Movie z
Inner Join (
SELECT a.year as Year, COUNT(a.title) AS Female_Movies, a.title
FROM Movie a
WHERE a.title NOT IN (
SELECT b.title from Movie b
Inner Join M_cast c
on TRIM(c.MID) = b.MID
Inner Join Person d
on TRIM(c.PID) = d.PID
WHERE d.Gender='Male'
GROUP BY b.title
)
GROUP BY a.year,a.title
Order By a.year asc
)x
on x.year = z.year
GROUP BY z.year
ORDER BY COUNT(z.title) DESC
However, in the output I see the years with female-only movies correctly, but the total movie count equals Female_movies, so I'm getting 1%. I tried debugging the code, but I'm not sure where it goes wrong. Any insights would be appreciated.
You assume that z contains all movies, but because you inner join it to the female-only movies, it ends up containing only female-only movies as well. You could fix that with a LEFT JOIN.
Assuming your two queries are correct, you can join them with a WITH clause like this:
WITH allmovies (year, cnt) as
(SELECT a.year, count(a.title) AS Total_Movies
FROM Movie a
GROUP BY a.year
ORDER BY COUNT(a.title) DESC)
,
femalemovies (year, cnt, title) as
(SELECT a.year as Year, COUNT(a.title) AS Female_Movies, a.title
FROM Movie a
WHERE a.title NOT IN (
SELECT b.title from Movie b
Inner Join M_cast c
on TRIM(c.MID) = b.MID
Inner Join Person d
on TRIM(c.PID) = d.PID
WHERE d.Gender='Male'
GROUP BY b.title
)
GROUP BY a.year,a.title
Order By a.year asc)
select * from allmovies left join femalemovies on allmovies.year = femalemovies.year
You can use conditional aggregation. In a CASE expression, use a correlated subquery to check that no cast member who isn't female exists. If the check succeeds, return something non-NULL and count() it to get the number of movies with only female cast members (or no cast at all).
SELECT m.year,
count(*) count_all,
count(CASE
WHEN NOT EXISTS (SELECT *
FROM m_cast c
INNER JOIN person p
ON p.pid = c.pid
WHERE c.mid = m.mid
AND p.gender <> 'Female') THEN
1
END)
*
100.0
/
count(*) percentage_only_female
FROM movie m
GROUP BY m.year;
Since in MySQL Boolean expressions in numerical context evaluate to 1 if true and to 0 otherwise, you could also use a sum() over the NOT EXISTS.
SELECT m.year,
count(*) count_all,
sum(NOT EXISTS (SELECT *
FROM m_cast c
INNER JOIN person p
ON p.pid = c.pid
WHERE c.mid = m.mid
AND p.gender <> 'Female'))
/
count(*)
*
100 percentage_only_female
FROM movie m
GROUP BY m.year;
Unlike the first query, however, this one isn't portable to most other DBMSs.
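Both forms can be sanity-checked with a small sketch that uses SQLite through Python's sqlite3 as a stand-in for MySQL, with hypothetical sample data and the question's column names. Like MySQL, SQLite evaluates NOT EXISTS to 0 or 1, so the sum() form runs there too. Unlike MySQL, SQLite truncates integer division, which is why the sketch multiplies by 100.0 before dividing (the last query shows what happens otherwise):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (mid INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE person (pid INTEGER PRIMARY KEY, gender TEXT);
CREATE TABLE m_cast (mid INTEGER, pid INTEGER);
INSERT INTO person VALUES (1, 'Female'), (2, 'Female'), (3, 'Male');
-- 1990: movie 1 all-female, 2 mixed, 3 no cast, 4 all-female; 1991: movie 5 all-male.
INSERT INTO movie VALUES (1, 1990), (2, 1990), (3, 1990), (4, 1990), (5, 1991);
INSERT INTO m_cast VALUES (1, 1), (2, 2), (2, 3), (4, 1), (4, 2), (5, 3);
""")

# True when the movie has no cast member who isn't female (including no cast at all).
ONLY_FEMALE = """NOT EXISTS (SELECT * FROM m_cast c JOIN person p ON p.pid = c.pid
                 WHERE c.mid = m.mid AND p.gender <> 'Female')"""

count_case = conn.execute(f"""
SELECT m.year, COUNT(*),
       COUNT(CASE WHEN {ONLY_FEMALE} THEN 1 END) * 100.0 / COUNT(*)
FROM movie m GROUP BY m.year ORDER BY m.year
""").fetchall()

sum_bool = conn.execute(f"""
SELECT m.year, COUNT(*), SUM({ONLY_FEMALE}) * 100.0 / COUNT(*)
FROM movie m GROUP BY m.year ORDER BY m.year
""").fetchall()

# Dividing first truncates, because both counts are integers in SQLite.
truncated = conn.execute(f"""
SELECT m.year, COUNT(CASE WHEN {ONLY_FEMALE} THEN 1 END) / COUNT(*) * 100
FROM movie m GROUP BY m.year ORDER BY m.year
""").fetchall()

print(count_case)  # [(1990, 4, 75.0), (1991, 1, 0.0)]
print(truncated)   # [(1990, 0), (1991, 0)]
```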
I would use two levels of aggregation:
SELECT m.MID, m.title, m.year,
COUNT(*) as num_actors,
SUM(gender = 'Female') as num_female_actors
FROM Movie m JOIN
M_cast c
ON c.MID = m.MID JOIN
Person p
ON p.PID = c.PID
GROUP BY m.MID, m.title, m.year;
Then a simple outer aggregation:
SELECT year,
COUNT(*) as num_movies,
SUM( num_actors = num_female_actors ) as num_female_only,
AVG( num_actors = num_female_actors ) as female_only_ratio
FROM (SELECT m.MID, m.title, m.year,
COUNT(*) as num_actors,
SUM(gender = 'Female') as num_female_actors
FROM Movie m JOIN
M_cast c
ON c.MID = m.MID JOIN
Person p
ON p.PID = c.PID
GROUP BY m.MID, m.title, m.year
) m
GROUP BY year;
Notes:
Use meaningful table aliases, rather than arbitrary letters. You'll note that the table aliases are abbreviations for the table names.
Do not use functions when filtering or joining unless necessary: wrapping a column in a function generally keeps the optimizer from using an index on it. I removed the TRIM(); if you need it, keep it, or better yet, fix the data.
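A small SQLite sketch of the two-level aggregation (Python's sqlite3 as a stand-in for MySQL, hypothetical sample data). It also shows a side effect of the inner joins: a movie with no cast rows at all (movie 3 here) is dropped, so it counts neither toward num_movies nor toward the female-only movies:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (mid INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE person (pid INTEGER PRIMARY KEY, gender TEXT);
CREATE TABLE m_cast (mid INTEGER, pid INTEGER);
INSERT INTO person VALUES (1, 'Female'), (2, 'Female'), (3, 'Male');
INSERT INTO movie VALUES (1, 1990), (2, 1990), (3, 1990), (4, 1990), (5, 1991);
INSERT INTO m_cast VALUES (1, 1), (2, 2), (2, 3), (4, 1), (4, 2), (5, 3);
""")

rows = conn.execute("""
SELECT year,
       COUNT(*) AS num_movies,
       SUM(num_actors = num_female_actors) AS num_female_only,
       AVG(num_actors = num_female_actors) AS female_only_ratio
FROM (SELECT m.mid, m.year,
             COUNT(*) AS num_actors,                      -- inner aggregation: per movie
             SUM(p.gender = 'Female') AS num_female_actors
      FROM movie m
      JOIN m_cast c ON c.mid = m.mid
      JOIN person p ON p.pid = c.pid
      GROUP BY m.mid, m.year) per_movie
GROUP BY year                                             -- outer aggregation: per year
ORDER BY year
""").fetchall()

print(rows)  # 1990: 3 movies, 2 female-only (ratio 2/3); 1991: 1 movie, none
```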
SELECT m.Year, COUNT(m.Year) AS female_only_movies, x.t AS total_movies,
(COUNT(m.Year) * 1.0 / x.t) * 100 AS percentage
FROM Movie m LEFT JOIN
(SELECT Year, COUNT(Year) AS t FROM Movie GROUP BY Year) AS x
ON m.Year = x.Year
WHERE m.MID IN
(SELECT MID FROM M_Cast WHERE PID IN
(SELECT PID FROM Person WHERE Gender='Female'))
AND m.MID NOT IN
(SELECT MID FROM M_Cast WHERE PID IN
(SELECT PID FROM Person WHERE Gender='Male'))
GROUP BY m.Year, x.t
Check if this is what you're looking for.
select movie.year, count(movie.mid) as Year_Wise_Movie_Count,cast(x.Female_Cast_Only as real) / count(movie.mid) As Percentage_of_Female_Cast from movie
inner join
(
SELECT Movie.year as Year, COUNT(Movie.mid) AS Female_Cast_Only
FROM Movie
WHERE Movie.MID NOT IN (
SELECT Movie.MID from Movie
Inner Join M_cast
on TRIM(M_cast.MID) = Movie.MID
Inner Join Person
on TRIM(M_cast.PID) = Person.PID
WHERE Person.Gender!='Female'
GROUP BY Movie.MID
)
GROUP BY Movie.year
Order By Movie.year asc
) x
on x.year = movie.year
GROUP BY movie.year
ORDER BY movie.year
Output:
year Year_Wise_Movie_Count Percentage_of_Female_Cast
---- --------------------- -------------------------
1939 2 0.5
1999 66 0.0151515151515152
2000 64 0.015625
2018 104 0.00961538461538462
Note:
This was executed in SQLite3.
I need to group my anime index by AniDB ID and show the groups in DESCENDING order of the file's auto-increment id.
Here's what I have currently:
SELECT
f.id, f.category, f.anidb, f.mal_id, COUNT( * ) AS dupes, f.filename,
a.titles, a.synopsis, a.episodes, a.image, a.rating,
c.name as cat_name, c.id as categoryid
FROM table_files f
LEFT JOIN table_anidb a ON a.id = f.anidb
LEFT JOIN table_categories c ON c.id = f.category
GROUP BY a.id ORDER BY f.id DESC
PROBLEM:
I have 8 Naruto episodes. Episode 8 has ID 204 and episode 1 has ID 160. The query returns this:
id | anidb | filename | dupes | cat_name
--------------------------------------------------------
201 | 8692 | SAO | 1 | Series
200 | 9251 | RYO | 1 | Movie
.....
.......
160 | 239 | Naruto ep.1 | 8 | Series
But I want Naruto episode 8 to be shown at the top of the results instead of episode 1 at the bottom.
How do I group by anidb and mal_id at the same time with OR logic, so that the grouping still works when no anidb ID is provided?
Ad. 1.
Since id, anidb and filename are all in one table, I'm afraid you can't avoid a subquery join:
SELECT f.id, f.anidb, f.filename
FROM files f
JOIN
(SELECT MAX(id) as id FROM files GROUP BY anidb) AS f2
ON f2.id = f.id
ORDER BY f.id DESC
(data flattened for the sake of readability, but you can get the general idea)
Ad. 2.
As for the second problem, you just have to add a second grouping column to the joined subquery above:
SELECT f.id, f.anidb, f.mal_id, f.filename
FROM files f
JOIN
(SELECT MAX(id) as id FROM files GROUP BY anidb, mal_id) AS f2 on f2.id = f.id
ORDER BY f.id DESC
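Note that GROUP BY puts all NULLs into the same group, so grouping by anidb alone would merge every row without an AniDB ID into one group. Because mal_id is also a grouping column here, those rows stay separated by their MAL ID; only rows where both anidb and mal_id are NULL end up in a single group.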
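How GROUP BY treats NULLs is easy to check directly. A minimal sketch in SQLite (Python's sqlite3), using a hypothetical simplified files table with only the id, anidb and mal_id columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (id INTEGER PRIMARY KEY, anidb INTEGER, mal_id INTEGER);
INSERT INTO files VALUES
  (1, 100, NULL), (2, 100, NULL),    -- same AniDB id
  (3, NULL, 5), (4, NULL, 5),        -- no AniDB id, same MAL id
  (5, NULL, 6),                      -- no AniDB id, different MAL id
  (6, NULL, NULL), (7, NULL, NULL);  -- neither id
""")


def latest_ids(group_by: str) -> list:
    # group_by is a trusted, hard-coded column list in this sketch.
    sql = f"SELECT MAX(id) FROM files GROUP BY {group_by} ORDER BY MAX(id)"
    return [row[0] for row in conn.execute(sql)]


print(latest_ids("anidb"))          # [2, 7]: every NULL anidb row lands in one group
print(latest_ids("anidb, mal_id"))  # [2, 4, 5, 7]: mal_id keeps them apart, except both-NULL rows
```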
For the first problem you can use ORDER BY dupes
SELECT
f.id, f.category, f.anidb, f.mal_id, COUNT( * ) AS dupes, f.filename,
a.titles, a.synopsis, a.episodes, a.image, a.rating,
c.name as cat_name, c.id as categoryid
FROM table_files f
LEFT JOIN table_anidb a ON a.id = f.anidb
LEFT JOIN table_categories c ON c.id = f.category
GROUP BY a.id ORDER BY dupes DESC
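For the second problem, you can use a CASE expression to fall back to f.mal_id when f.anidb is NULL (equivalent to COALESCE(f.anidb, f.mal_id)). Note that this merges two groups if an AniDB ID happens to equal an unrelated MAL ID: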
SELECT
f.id, f.category, f.anidb, f.mal_id, COUNT( * ) AS dupes, f.filename,
a.titles, a.synopsis, a.episodes, a.image, a.rating,
c.name as cat_name, c.id as categoryid
FROM table_files f
LEFT JOIN table_anidb a ON a.id = f.anidb
LEFT JOIN table_categories c ON c.id = f.category
GROUP BY
(CASE WHEN f.anidb IS NULL THEN f.mal_id ELSE f.anidb END )
ORDER BY dupes DESC
SELECT *
FROM ( SELECT a.*, a.id AS id_player,
(SELECT COUNT(id)
FROM `vd7qw_footsal_goals`
WHERE a.id = id_player
AND id_group IN (SELECT id_group
from `vd7qw_footsal_groupofleague`
WHERE id_league = 2)
) AS goals,
team.team_name
FROM `vd7qw_footsal_players` AS a
LEFT JOIN vd7qw_footsal_teams AS team
ON team.id = a.id_team
LEFT JOIN vd7qw_footsal_teamofgroup AS tog
ON tog.id_team = team.id
LEFT JOIN vd7qw_footsal_groups AS g
ON g.id = tog.id_group
WHERE (a.state IN (1))
) AS h
WHERE goals > 0
ORDER BY goals DESC
This is my query. When a league has two or more groups and a player scores a goal in each group, the query returns the correct number of goals but duplicates the player. For example:
John Doe got 3 goals in group 1
and
John Doe got 4 goals in group 2
Query returns:
John Doe got 7 goals
Where is my mistake?
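Try grouping by the player. The duplicates come from the LEFT JOINs to vd7qw_footsal_teamofgroup and vd7qw_footsal_groups, which produce one row for every group the player's team plays in; the goals subquery already counts the player's goals across the whole league, so every duplicate carries the same total. Grouping by id_player collapses them. Alternatively, drop those two joins, since nothing is selected from them (with ONLY_FULL_GROUP_BY enabled, MySQL may reject SELECT * combined with GROUP BY, in which case dropping the joins is the cleaner fix):
SELECT *
FROM ( SELECT a.*, a.id AS id_player, (SELECT COUNT(id)
FROM `vd7qw_footsal_goals`
WHERE a.id = id_player
AND id_group IN (SELECT id_group
from `vd7qw_footsal_groupofleague`
WHERE id_league = 2)) AS goals, team.team_name
FROM `vd7qw_footsal_players` AS a
LEFT JOIN vd7qw_footsal_teams AS team
ON team.id = a.id_team
LEFT JOIN vd7qw_footsal_teamofgroup AS tog
ON tog.id_team = team.id
LEFT JOIN vd7qw_footsal_groups AS g
ON g.id = tog.id_group
WHERE (a.state IN (1))) AS h
WHERE goals > 0
GROUP BY id_player
ORDER BY goals DESC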
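The row multiplication can be reproduced with a minimal SQLite sketch (Python's sqlite3). The tables are hypothetical simplified versions of the question's tables (the vd7qw_footsal_ prefix and the teams/groups tables are left out), with one player whose team plays in two groups:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT, id_team INTEGER);
CREATE TABLE teamofgroup (id_team INTEGER, id_group INTEGER);
CREATE TABLE goals (id INTEGER PRIMARY KEY, id_player INTEGER, id_group INTEGER);
INSERT INTO players VALUES (1, 'John Doe', 1);
INSERT INTO teamofgroup VALUES (1, 1), (1, 2);   -- the team plays in two groups
-- 3 goals in group 1, 4 goals in group 2
INSERT INTO goals (id_player, id_group) VALUES
  (1, 1), (1, 1), (1, 1), (1, 2), (1, 2), (1, 2), (1, 2);
""")

GOALS = "(SELECT COUNT(*) FROM goals gl WHERE gl.id_player = p.id) AS goals"

BASE = f"""
SELECT p.id, p.name, {GOALS}
FROM players p
LEFT JOIN teamofgroup tog ON tog.id_team = p.id_team
"""

duplicated = conn.execute(BASE).fetchall()                # one row per group
grouped = conn.execute(BASE + " GROUP BY p.id").fetchall()  # collapsed per player
no_join = conn.execute(f"SELECT p.id, p.name, {GOALS} FROM players p").fetchall()

print(duplicated)  # [(1, 'John Doe', 7), (1, 'John Doe', 7)]
print(grouped)     # [(1, 'John Doe', 7)]
print(no_join)     # [(1, 'John Doe', 7)]
```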