Average Count of Genres per Movie for MySQL - mysql

Stumbled into a problem using an IMDb dataset that I can't seem to figure out the answer to. The question is:
Create a table that contains the average count of genres per movie for
each genre
We have two tables: Movies: id, name; Genres: id (movieId), genre
Movies:
id,name
1,Toy Story
2,Jumanji
3,Grumpier Old Men
4,Waiting to Exhale
5,Father of the Bride Part II
6,Heat
Genres:
id,genre
1,Animation
1,Children's
1,Comedy
2,Adventure
2,Children's
2,Fantasy
3,Comedy
3,Romance
4,Comedy
4,Drama
5,Comedy
6,Action
6,Crime
6,Thriller
I may be interpreting the question incorrectly, but shouldn't the output be 3 columns: genre, movie, and count?
My answer would start along the lines of:
SELECT genre, name, AVG(COUNT(*)) FROM movies
JOIN genres ON genres.id=movies.id
GROUP BY name;
Any ideas on how you would interpret the question and answer?

Well, I would start with the number of genres per movie:
select id, count(*) as num_genres
from genres g
group by id
Then, I would "attach" this information to the genres information. And aggregate and average:
select g.genre, avg(m.num_genres)
from genres g join
(select id, count(*) as num_genres
from genres g
group by id
) m
on g.id = m.id
group by g.genre;
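To see what the query above returns for the sample data in the question, here is a minimal sketch that runs it with Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption), but the query uses nothing MySQL-specific.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genres (id INTEGER, genre TEXT)")
conn.executemany(
    "INSERT INTO genres VALUES (?, ?)",
    [
        (1, "Animation"), (1, "Children's"), (1, "Comedy"),
        (2, "Adventure"), (2, "Children's"), (2, "Fantasy"),
        (3, "Comedy"), (3, "Romance"),
        (4, "Comedy"), (4, "Drama"),
        (5, "Comedy"),
        (6, "Action"), (6, "Crime"), (6, "Thriller"),
    ],
)

# For each genre, average the genre counts of the movies that carry it.
query = """
SELECT g.genre, AVG(m.num_genres) AS avg_genres
FROM genres g
JOIN (SELECT id, COUNT(*) AS num_genres
      FROM genres
      GROUP BY id) m
  ON g.id = m.id
GROUP BY g.genre
ORDER BY g.genre
"""
per_genre = dict(conn.execute(query).fetchall())
for genre, avg_genres in per_genre.items():
    print(genre, avg_genres)
```

Comedy appears on movies 1, 3, 4 and 5, which have 3, 2, 2 and 1 genres, so its average is 2.0.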

I agree with Gordon: first, get the number of genres per movie:
select id, count(*) as num_genres
from genres g
group by id
But the average number of genres per movie should be:
SELECT AVG(num_genres)
FROM (
SELECT id, count(*) as num_genres
FROM genres g
GROUP BY id
) t
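As a quick check of this second reading (one overall average rather than one per genre), here is a small sketch that runs the query on the question's sample data. It uses Python's sqlite3 module as a stand-in for MySQL, which is an assumption; the SQL itself is unchanged.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genres (id INTEGER, genre TEXT)")
conn.executemany(
    "INSERT INTO genres VALUES (?, ?)",
    [
        (1, "Animation"), (1, "Children's"), (1, "Comedy"),
        (2, "Adventure"), (2, "Children's"), (2, "Fantasy"),
        (3, "Comedy"), (3, "Romance"),
        (4, "Comedy"), (4, "Drama"),
        (5, "Comedy"),
        (6, "Action"), (6, "Crime"), (6, "Thriller"),
    ],
)

# Count genres per movie in the inner query, then average those counts.
query = """
SELECT AVG(num_genres)
FROM (SELECT id, COUNT(*) AS num_genres
      FROM genres
      GROUP BY id) t
"""
overall_avg = conn.execute(query).fetchone()[0]
print(overall_avg)
```

The six movies have 3, 3, 2, 2, 1 and 3 genres, so the result is 14 / 6.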

Related

SQL: Group by in subquery

I'm trying to find the output of all books that have more than one genre using a group by statement and subquery. However, it keeps returning Subquery returns more than 1 row. This is what I have so far:
SELECT title
FROM book
WHERE 1 < (SELECT COUNT(genre) FROM genres GROUP BY book_id);
Here's an example:
SELECT b.title
FROM ( SELECT g.book_id
FROM genres g
GROUP
BY g.book_id
HAVING COUNT(1) > 1
) m
JOIN book b
ON b.id = m.book_id
The inline view m returns the values of book_id that appear more than once in the genres table. Depending on uniqueness constraints, we might want to count distinct values of genre instead:
HAVING COUNT(DISTINCT g.genre) > 1
Or, if we want to find books with exactly three related genres:
HAVING COUNT(DISTINCT g.genre) = 3
Once we have a list of book_id values, we can join to the book table. (The query assumes that book_id in genres is a foreign key reference to the id column in book table.)
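Here is a minimal sketch of the inline-view approach on some made-up rows. The book titles and genres are hypothetical sample data, and Python's sqlite3 module stands in for MySQL (an assumption); the table and column names follow the query above.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE genres (book_id INTEGER, genre TEXT)")

# Hypothetical sample rows, not taken from the question.
conn.executemany("INSERT INTO book VALUES (?, ?)",
                 [(1, "Dune"), (2, "Emma"), (3, "It")])
conn.executemany(
    "INSERT INTO genres VALUES (?, ?)",
    [(1, "sci-fi"), (1, "adventure"),
     (2, "romance"),
     (3, "horror"), (3, "thriller"), (3, "drama")],
)

# The inline view m keeps only book_id values that meet the HAVING condition.
template = """
SELECT b.title
FROM (SELECT g.book_id
      FROM genres g
      GROUP BY g.book_id
      HAVING COUNT(DISTINCT g.genre) {condition}) m
JOIN book b ON b.id = m.book_id
ORDER BY b.title
"""
multi_genre = [row[0] for row in conn.execute(template.format(condition="> 1"))]
three_genres = [row[0] for row in conn.execute(template.format(condition="= 3"))]
print(multi_genre)
print(three_genres)
```

Because the grouping happens inside the inline view, the outer query never compares against a multi-row subquery, which is what triggered the "Subquery returns more than 1 row" error.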
You seem to want a correlated subquery:
SELECT b.title
FROM book b
WHERE 1 < (SELECT COUNT(*) FROM genres g WHERE g.book_id = b.book_id);
SELECT DISTINCT a.title
FROM book a, (SELECT book_id, COUNT(DISTINCT genre) AS genres FROM genres GROUP BY book_id) b
WHERE a.book_id = b.book_id AND b.genres > 1
hope it helps!

SQL queries when two or more tables to be joined

IMDB database has the following tables
actors(id, first_name, last_name, gender)
directors(id, first_name,last_name)
directors_genres(director_id, genre, prob)
movies(id, name,year, rank)
movies_directors(director_id, movie_id)
roles(actor_id,movie_id, role)
movies_genres(movie_id, genre)
a) Write a query that lists the female actors who appeared in a movie during the 90s (1990-1999) that was rated higher than 8.5.
b) Write a query that lists all actors who were in a movie rated lower than 3.0 two or more times. List the name of the actor, the movie and each rating, ordered ascending by the actors’ last name then first name.
c) Write a query that lists all actors who have been in two or more movies of different genres. List their name, movie and their respective genres.
My answers:
a)
SELECT actors.firstname
from ((roles inner join movies on roles.mid=movies.id)
inner join actors on actors.id=roles.aid)
where (movies.year between 1990 and 1999)
and
(movies.rank >= 8.5)
Is it correct?
Also, can anyone help with how to approach the other queries? Thanks in advance.
You forgot the gender filter, and the rating must be higher than 8.5 (not higher than or equal to). Also note that the column names must match the schema (first_name, actor_id, movie_id).
SELECT actors.first_name
from roles inner join movies on roles.movie_id = movies.id
inner join actors on actors.id = roles.actor_id
where movies.year between 1990 and 1999
and movies.rank > 8.5
and actors.gender = 'F';
P.S. Is this your school work?
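A small sketch of query (a) against a few made-up rows, to show the three filters working together. The actor and movie rows are hypothetical, gender is assumed to be stored as 'F'/'M', and Python's sqlite3 module stands in for MySQL (an assumption). The rank column is quoted defensively because RANK is a reserved word in recent MySQL versions.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actors (id INTEGER PRIMARY KEY, first_name TEXT,
                     last_name TEXT, gender TEXT);
CREATE TABLE movies (id INTEGER PRIMARY KEY, name TEXT,
                     year INTEGER, "rank" REAL);
CREATE TABLE roles (actor_id INTEGER, movie_id INTEGER, role TEXT);
""")

# Hypothetical sample rows; gender is assumed to be stored as 'F' / 'M'.
conn.executemany("INSERT INTO actors VALUES (?, ?, ?, ?)",
                 [(1, "Ann", "Lee", "F"),
                  (2, "Bob", "Ray", "M"),
                  (3, "Cara", "Diaz", "F")])
conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?)",
                 [(10, "Alpha", 1994, 9.0),
                  (11, "Beta", 1995, 8.5),
                  (12, "Gamma", 2001, 9.1)])
conn.executemany("INSERT INTO roles VALUES (?, ?, ?)",
                 [(1, 10, "Lead"), (2, 10, "Support"),
                  (3, 11, "Lead"), (3, 12, "Lead")])

# DISTINCT avoids listing an actor once per qualifying movie.
query = """
SELECT DISTINCT a.first_name, a.last_name
FROM roles r
JOIN movies m ON r.movie_id = m.id
JOIN actors a ON a.id = r.actor_id
WHERE m.year BETWEEN 1990 AND 1999
  AND m."rank" > 8.5
  AND a.gender = 'F'
ORDER BY a.last_name, a.first_name
"""
female_90s = conn.execute(query).fetchall()
print(female_90s)
```

Cara is excluded because her 1995 movie is rated exactly 8.5 and her other movie is from 2001; Bob is excluded by the gender filter.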
SELECT DISTINCT
first_name, last_name
FROM actors
JOIN roles ON roles.actor_id = actors.id
JOIN
movies ON movies.id = roles.movie_id
WHERE actors.gender = 'F' AND movies.rank > 8.5 AND movies.year
BETWEEN
1990 AND 1999
Add role to actors table
actors(id, first_name, last_name, gender,role)
movies(id, name,year, rank)
One more variant:
SELECT actors.first_name FROM actors WHERE gender = 'F' AND id IN(
SELECT actor_id FROM roles WHERE movie_id IN(
SELECT id FROM movies WHERE (movies.year BETWEEN 1990 and 1999)
and (movies.rank > 8.5)));

How to separate the maximum count for each genre of something

For each genre of movie, I want to find the N actors who have played in the most movies of that genre.
I have done this:
select genre.genre_name,actor.actor_id,count(genre.genre_name) from genre
inner join movie_has_genre on movie_has_genre.genre_id=genre.genre_id
inner join movie on movie_has_genre.movie_id=movie.movie_id
inner join role on movie.movie_id=role.movie_id
inner join actor on actor.actor_id=role.actor_id
group by genre.genre_name,actor.actor_id;
This gives, for each genre, how many movies of that genre every actor has played in. Now I want to find, for each genre, the actor who has played in the most movies of that genre.
Tables and their columns:
actor(actor_id,name)
role(actor_id,movie_id)
movie(movie_id,title)
movie_has_genre(movie_id,genre_id)
genre(genre_id,genre_name)
Also the result should be something like this:
Action 22591 7
Horror 25863 3
Horror 24867 3
Comedy 23476 2
Drama 14536 1
Drama 19634 1
Drama 17563 1
Here's what I'd do (supposing your code works):
-- Notice this is your code with some aliases, nothing else.
-- Just to make my job easier.
create view frequency as
select genre.genre_name as genre_name,
actor.actor_id as actor_id,
count(genre.genre_name) as freq
from genre
inner join movie_has_genre on movie_has_genre.genre_id=genre.genre_id
inner join movie on movie_has_genre.movie_id=movie.movie_id
inner join role on movie.movie_id=role.movie_id
inner join actor on actor.actor_id=role.actor_id
group by genre.genre_name,actor.actor_id;
-- And this is my proposal
-- Take the max frequency per each category
-- and find the guy who possesses it (maybe 2 or more...)
select tbl1.genre_name, tbl1.actor_id
from frequency as tbl1 inner join
(
-- The max frequency in a genre.
select f.genre_name,
max(f.freq) as max_freq
from frequency f
group by f.genre_name
) as tbl2 on (tbl1.genre_name = tbl2.genre_name)
where tbl1.freq = tbl2.max_freq;
There's one catch: it may return more than one actor per genre if there's a tie, and deciding who wins a tie is something I'll leave to you. It might be wrong, though I don't think so; we're both learning. Hope this helps.
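The view-plus-join idea above can be sketched on a tiny made-up dataset. All rows here are hypothetical, the movie and actor tables are left out because only their keys are needed (a simplification), and Python's sqlite3 module stands in for MySQL (an assumption).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, genre_name TEXT);
CREATE TABLE movie_has_genre (movie_id INTEGER, genre_id INTEGER);
CREATE TABLE role (actor_id INTEGER, movie_id INTEGER);

-- Hypothetical sample rows.
INSERT INTO genre VALUES (1, 'Action'), (2, 'Horror');
INSERT INTO movie_has_genre VALUES (100, 1), (101, 1), (102, 2), (103, 2);
INSERT INTO role VALUES (7, 100), (7, 101), (8, 100), (8, 102), (9, 103);

-- Movies per actor per genre, as in the view above.
CREATE VIEW frequency AS
SELECT g.genre_name, r.actor_id, COUNT(*) AS freq
FROM genre g
JOIN movie_has_genre mg ON mg.genre_id = g.genre_id
JOIN role r ON r.movie_id = mg.movie_id
GROUP BY g.genre_name, r.actor_id;
""")

# Keep only the rows whose frequency equals the maximum for their genre.
query = """
SELECT t1.genre_name, t1.actor_id, t1.freq
FROM frequency t1
JOIN (SELECT genre_name, MAX(freq) AS max_freq
      FROM frequency
      GROUP BY genre_name) t2
  ON t1.genre_name = t2.genre_name AND t1.freq = t2.max_freq
ORDER BY t1.genre_name, t1.actor_id
"""
top_actors = conn.execute(query).fetchall()
for row in top_actors:
    print(row)
```

Actor 7 wins Action with two movies, while actors 8 and 9 tie in Horror with one movie each, so both are returned.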
You need to use the MAX() function. Some SQL implementations (such as Oracle) allow you to do this: SELECT MAX(COUNT(whatever)) but MySQL isn't one of them.
One way to do what you want is this:
select genre_name, max(genrecount) as max_count
from (
select genre.genre_name, role.actor_id, count(genre.genre_name) as genrecount
from genre
inner join movie_has_genre on movie_has_genre.genre_id=genre.genre_id
inner join movie on movie_has_genre.movie_id=movie.movie_id
inner join role on movie.movie_id=role.movie_id
group by genre.genre_name,role.actor_id
) as topactor
group by genre_name;
This does the outer SELECT on the table derived from the inner SELECT and gives the highest count per genre. To get the actors who hold that count, join this result back to the per-actor counts on genre_name and genrecount = max_count.

denormalize many to many relationship in MySQL

I would like to denormalize a many-to-many relationship in MySQL in order to import the data into MongoDB as JSON.
Input
I have 3 tables:
Movies : id, title, url
Genres : id, genre
movie_genres : movie_id, genre_id
example
movie Table
id title link
1 star wars http://link-to-imdb
2 shrek http://link-to-imdb
movie_genres Table
movie genre
1 1
2 1
genres Table
id genre
0 unknown
1 action
2 comedy
3 drama
I would like to transform it to a single table by moving genres into movies as array or multiple values.
There is only a limited number of genres (15).
Output
So, Final output of table would be:
Movies : id, title, url, genre
Here, genre would be multiple values.
Example:
id title link genre
1 star wars http://link-to-imdb action, drama, sci-fi
2 shrek http://link-to-imdb anime
I did this - MySQL Query:
select M.id ,M.title ,M.release_date, M.video, M.IMDBURL, G.genre
from genres G, movie_genres MG, movies M
where M.id = MG.movie and MG.genre = G.id
but it causes a lot of repetition, depending on the number of genres. It would be nice if I could dump all the genres of a movie together.
In this case you should use the GROUP_CONCAT() function:
SELECT m.id, m.title, m.url, GROUP_CONCAT(g.genre SEPARATOR ', ') AS genres
FROM movies m
LEFT JOIN movie_genres mg ON m.id = mg.movie_id
LEFT JOIN genres g ON mg.genre_id = g.id
GROUP BY m.id, m.title, m.url
I didn't test the query above (there could be some typos), but I hope you get the idea.
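A rough sketch of the same idea, runnable with Python's sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption), and its syntax differs slightly: it takes the separator as a second argument, group_concat(g.genre, ', '), instead of MySQL's SEPARATOR keyword. The genre links and the third movie are hypothetical sample rows.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, url TEXT);
CREATE TABLE genres (id INTEGER PRIMARY KEY, genre TEXT);
CREATE TABLE movie_genres (movie_id INTEGER, genre_id INTEGER);

INSERT INTO movies VALUES (1, 'star wars', 'http://link-to-imdb'),
                          (2, 'shrek', 'http://link-to-imdb'),
                          (3, 'untagged movie', 'http://link-to-imdb');
INSERT INTO genres VALUES (0, 'unknown'), (1, 'action'),
                          (2, 'comedy'), (3, 'drama');
-- Hypothetical links: movie 1 has two genres, movie 3 has none.
INSERT INTO movie_genres VALUES (1, 1), (1, 3), (2, 2);
""")

# One row per movie; LEFT JOIN keeps movies that have no genre at all.
query = """
SELECT m.id, m.title, m.url, group_concat(g.genre, ', ') AS genres
FROM movies m
LEFT JOIN movie_genres mg ON m.id = mg.movie_id
LEFT JOIN genres g ON mg.genre_id = g.id
GROUP BY m.id, m.title, m.url
ORDER BY m.id
"""
denormalized = conn.execute(query).fetchall()
for row in denormalized:
    print(row)
```

The order of the concatenated genres is not guaranteed here. In MySQL, an ORDER BY inside GROUP_CONCAT() can pin it down. A movie without genres comes back with NULL in the genres column rather than being dropped.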

MYSQL, Max,Group by and Max

I have the following two tables:
1.Movie Detail (Movie-ID,Movie_Name,Rating,Votes,Year)
2.Movie Genre (Movie-ID,Genre)
I am using the following query to perform join and get the movie with highest rating in each
genre.
select Movie_Name,
max(Rating) as Rating,
Genre from movie_test
inner join movie_genre
where movie_test.Movie_ID = movie_genre.Movie_ID
group by Genre
In the output Rating and Genre are correct but the Movie_Name is incorrect.
Can anyone suggest what changes I should make to get the correct movie name along with the rating and genre?
SELECT g.*, d.*
FROM MovieGenre g
INNER JOIN MovieDetail d
ON g.MovieID = d.MovieID
INNER JOIN
(
SELECT a.Genre, MAX(b.Rating) maxRating
FROM MovieGenre a
INNER JOIN MovieDetail b
ON a.MovieID = b.MovieID
GROUP BY a.Genre
) sub ON g.Genre = sub.Genre AND
d.rating = sub.maxRating
There is something wrong with your schema design. If a movie can have many genres and a genre can apply to many movies, it should be a three-table design.
MovieDetails Table
MovieID (PK)
MovieName
MovieRating
Genre Table
GenreID (PK)
GenreName
Movie_Genre Table
MovieID (FK) -- compound primary key with GenreID
GenreID (FK)
This is a common MySQL problem: selecting columns that are neither aggregated nor listed in GROUP BY in an aggregate query. Other flavours of SQL do not let you do this and will raise an error; MySQL also rejects it when the ONLY_FULL_GROUP_BY SQL mode is enabled, which is the default since MySQL 5.7.5.
When you run a query like yours, you are selecting non-aggregate columns from an aggregated group. Since many rows share the same genre, when you select Movie_Name MySQL returns the value from an arbitrary row in each group, because there is no general algorithm to guess which row you want.
You might ask, 'Why does it pick arbitrarily? Couldn't it pick the row that max(Rating) belongs to?' But what about other aggregates, like avg(Rating)? Which row would it pick there? And what if two rows share the same maximum? So there cannot be a general rule for picking a row.
To solve a problem like this, you have to restructure your query, something like:
select mt.Movie_Name,
mt.Rating,
mg.Genre
from movie_test mt
inner join movie_genre mg on mt.Movie_ID = mg.Movie_ID
where mt.Rating = (select max(mt2.Rating)
from movie_test mt2
inner join movie_genre mg2 on mt2.Movie_ID = mg2.Movie_ID
where mg2.Genre = mg.Genre);
This will select the row with the rating being the same as the maximum rating for that genre, using a subquery.
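To illustrate the max-rating subquery, here is a small sketch on made-up rows. The movie names and ratings are hypothetical, and Python's sqlite3 module stands in for MySQL (an assumption). Note how a tie produces two rows for the same genre.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie_test (Movie_ID INTEGER PRIMARY KEY,
                         Movie_Name TEXT, Rating REAL);
CREATE TABLE movie_genre (Movie_ID INTEGER, Genre TEXT);

-- Hypothetical sample rows: B and C tie for the best Drama.
INSERT INTO movie_test VALUES (1, 'A', 8.0), (2, 'B', 9.0),
                              (3, 'C', 9.0), (4, 'D', 7.0);
INSERT INTO movie_genre VALUES (1, 'Drama'), (2, 'Drama'), (3, 'Drama'),
                               (4, 'Comedy'), (1, 'Comedy');
""")

# Keep rows whose rating equals the maximum rating within their genre.
query = """
SELECT mt.Movie_Name, mt.Rating, mg.Genre
FROM movie_test mt
JOIN movie_genre mg ON mt.Movie_ID = mg.Movie_ID
WHERE mt.Rating = (SELECT MAX(mt2.Rating)
                   FROM movie_test mt2
                   JOIN movie_genre mg2 ON mt2.Movie_ID = mg2.Movie_ID
                   WHERE mg2.Genre = mg.Genre)
ORDER BY mg.Genre, mt.Movie_Name
"""
best_per_genre = conn.execute(query).fetchall()
for row in best_per_genre:
    print(row)
```

Every selected column now comes from the same row, so the movie name always matches the rating shown next to it.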
Query:
SELECT t.Movie_Name,
t.Rating,
g.Genre
FROM movie_test t
INNER JOIN movie_genre g ON t.Movie_ID = g.Movie_ID
WHERE t.Movie_ID = (SELECT t1.Movie_ID
FROM movie_test t1
INNER JOIN movie_genre g1 ON t1.Movie_ID = g1.Movie_ID
WHERE g1.Genre = g.Genre
ORDER BY t1.Rating DESC
LIMIT 1)
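A sketch of how this ORDER BY ... LIMIT 1 variant behaves on made-up rows: it returns exactly one movie per genre even when ratings tie. The sample rows are hypothetical, Python's sqlite3 module stands in for MySQL (an assumption), and the extra Movie_ID tie-breaker in the ORDER BY is an addition for determinism, not part of the query above.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie_test (Movie_ID INTEGER PRIMARY KEY,
                         Movie_Name TEXT, Rating REAL);
CREATE TABLE movie_genre (Movie_ID INTEGER, Genre TEXT);

-- Hypothetical sample rows: B and C tie for the best Drama.
INSERT INTO movie_test VALUES (1, 'A', 8.0), (2, 'B', 9.0),
                              (3, 'C', 9.0), (4, 'D', 7.0);
INSERT INTO movie_genre VALUES (1, 'Drama'), (2, 'Drama'), (3, 'Drama'),
                               (4, 'Comedy'), (1, 'Comedy');
""")

# The subquery picks a single Movie_ID per genre: highest rating first,
# then lowest Movie_ID as a tie-breaker (added here for determinism).
query = """
SELECT t.Movie_Name, t.Rating, g.Genre
FROM movie_test t
JOIN movie_genre g ON t.Movie_ID = g.Movie_ID
WHERE t.Movie_ID = (SELECT t1.Movie_ID
                    FROM movie_test t1
                    JOIN movie_genre g1 ON t1.Movie_ID = g1.Movie_ID
                    WHERE g1.Genre = g.Genre
                    ORDER BY t1.Rating DESC, t1.Movie_ID
                    LIMIT 1)
ORDER BY g.Genre
"""
one_per_genre = conn.execute(query).fetchall()
for row in one_per_genre:
    print(row)
```

Without a tie-breaker, which of the tied movies is returned is up to the database, so it may be worth ordering by a second column whenever ties are possible.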