MySQL - Movie Database Query including Null Rows

For reference I am working on the Sakila rental DVD database outlined here (#2): http://www.ntu.edu.sg/home/ehchua/programming/sql/SampleDatabases.html
I am trying to find the runtime of all Sci-Fi movies each actor has been in, including those who have not been in a Sci-Fi movie. I have the correct query for those who HAVE been in a Sci-Fi movie, but I'm having trouble expanding it to include all actors even if the runtime is NULL.
Here is my query:
SELECT act.first_name, act.last_name, SUM(fm.length)
FROM film fm
INNER JOIN film_actor fa ON fa.film_id = fm.film_id
INNER JOIN actor act ON fa.actor_id = act.actor_id
LEFT JOIN film_category fc ON fm.film_id = fc.film_id
LEFT JOIN category cat ON fc.category_id = cat.category_id
WHERE cat.name = 'Sci-Fi'
GROUP BY act.first_name, act.last_name
ORDER BY act.last_name ASC
This gets me all 167 actors who are in at least 1 Sci-Fi movie. I think my WHERE clause is not allowing NULL rows, but I don't know how to fix it.

You are correct: the WHERE clause is applied to the far right side of your left joins. If cat.name is NULL, the row is filtered out, because NULL does not equal 'Sci-Fi'. Want NULLs? Tell the WHERE clause to include them:
WHERE (cat.name = 'Sci-Fi' or cat.name is null)

You should make cat.name = 'Sci-Fi' part of the ON clause of the outer join.
Unless your criteria are filtering on NULLs, you should never put criteria involving an outer-joined table in the WHERE clause, because that turns the outer join into an inner join. The criteria belong in the ON clause.
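SELECT act.first_name, act.last_name,
-- fm.length is not filtered by the category join, so only add it when the Sci-Fi row matched
SUM(CASE WHEN cat.category_id IS NOT NULL THEN fm.length END) AS scifi_length
FROM film fm
INNER JOIN film_actor fa
ON fa.film_id = fm.film_id
INNER JOIN actor act
ON fa.actor_id = act.actor_id
LEFT JOIN film_category fc
ON fm.film_id = fc.film_id
LEFT JOIN category cat
ON fc.category_id = cat.category_id
and cat.name = 'Sci-Fi'
-- group by actor_id too, so two actors who share a name are not merged
GROUP BY act.actor_id, act.first_name, act.last_name
ORDER BY act.last_name ASC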
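To see the ON-clause approach on something small, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a hypothetical two-actor, two-film dataset (table and column names follow Sakila, the rows are invented) and runs the SQL on SQLite rather than MySQL. Because fm.length itself is not filtered by the category join, the sketch adds a film's length only when the Sci-Fi category row matched, using CASE inside SUM.

```python
import sqlite3

# Hypothetical miniature of the Sakila tables involved; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE film (film_id INTEGER PRIMARY KEY, length INTEGER);
CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE film_category (film_id INTEGER, category_id INTEGER);
INSERT INTO actor VALUES (1, 'ANN', 'ALPHA'), (2, 'BOB', 'BETA');
INSERT INTO film VALUES (10, 100), (11, 90);
INSERT INTO category VALUES (1, 'Sci-Fi'), (2, 'Drama');
INSERT INTO film_category VALUES (10, 1), (11, 2);
-- ANN: one Sci-Fi film (10) and one drama (11); BOB: only the drama
INSERT INTO film_actor VALUES (1, 10), (1, 11), (2, 11);
""")

query = """
SELECT act.actor_id, act.first_name, act.last_name,
       -- add a film's length only when the Sci-Fi category row matched
       SUM(CASE WHEN cat.category_id IS NOT NULL THEN fm.length END) AS scifi_length
FROM film fm
INNER JOIN film_actor fa ON fa.film_id = fm.film_id
INNER JOIN actor act ON fa.actor_id = act.actor_id
LEFT JOIN film_category fc ON fm.film_id = fc.film_id
LEFT JOIN category cat ON fc.category_id = cat.category_id
                      AND cat.name = 'Sci-Fi'  -- category filter in ON, not WHERE
GROUP BY act.actor_id, act.first_name, act.last_name
ORDER BY act.last_name ASC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

BOB keeps his row with a NULL (None in Python) total, which is what the question asks for; with cat.name = 'Sci-Fi' in the WHERE clause instead, his only row would be filtered out.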

Related

SQL Sakila Query Question - Find all actors that have starred in films of all 16 film categories

I am trying to put together a query from the Sakila database.
The query should find all actors that have starred in all 16 film categories.
To get all of this information into one table for querying, I have performed an INNER JOIN:
SELECT a.first_name, a.last_name FROM actor a
INNER JOIN film_actor fa
ON fa.actor_id = a.actor_id
INNER JOIN film_category fc
ON fc.film_id = fa.film_id;
However, from there I do a GROUP BY on category_id, but I don't know how to iterate through the rows and check whether a particular actor_id has all 16 categories.
Does a query this complex require writing a FUNCTION or PROCEDURE?
You are almost there. Group by the actor and check that the number of distinct categories equals the total number of categories (16):
SELECT a.actor_id, a.first_name, a.last_name
FROM actor a
INNER JOIN film_actor fa ON fa.actor_id = a.actor_id
INNER JOIN film_category fc ON fc.film_id = fa.film_id
GROUP BY a.actor_id, a.first_name, a.last_name
HAVING COUNT(DISTINCT fc.category_id) =
(
SELECT COUNT(DISTINCT category_id)
FROM film_category
)
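The GROUP BY / HAVING COUNT(DISTINCT ...) pattern (a form of what is often called relational division) can be checked on a toy dataset. This is a minimal sketch using Python's sqlite3 module, assuming three hypothetical categories instead of Sakila's 16; the rows are invented and SQLite, not MySQL, runs the query.

```python
import sqlite3

# Hypothetical data: three categories in use, two actors.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
CREATE TABLE film_category (film_id INTEGER, category_id INTEGER);
INSERT INTO actor VALUES (1, 'ANN', 'ALPHA'), (2, 'BOB', 'BETA');
INSERT INTO film_category VALUES (10, 1), (11, 2), (12, 3), (13, 1);
-- ANN covers categories 1, 2 and 3; BOB has two films, both in category 1
INSERT INTO film_actor VALUES (1, 10), (1, 11), (1, 12), (2, 10), (2, 13);
""")

query = """
SELECT a.actor_id, a.first_name, a.last_name
FROM actor a
INNER JOIN film_actor fa ON fa.actor_id = a.actor_id
INNER JOIN film_category fc ON fc.film_id = fa.film_id
GROUP BY a.actor_id, a.first_name, a.last_name
-- DISTINCT matters: BOB has 2 film rows but only 1 distinct category
HAVING COUNT(DISTINCT fc.category_id) =
       (SELECT COUNT(DISTINCT category_id) FROM film_category)
"""
rows = conn.execute(query).fetchall()
print(rows)
```

As in the answer, the subquery counts categories that appear in film_category; counting the rows of category instead would also include categories that have no films.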

SQL Finding similarities

"Determine if there are actors with the same first name who appeared in the same movie."
This is my task and I'm supposed to do it with subqueries, but I just don't know what else to try. I tried everything with GROUP BY, ORDER BY and HAVING COUNT, but I never get to the point where I see actors with the same first name in the same movie.
Can someone help me? I am using the Sakila database.
SELECT
a.first_name
,(a.last_name)
,a.actor_id
, f.title
FROM actor a
JOIN film_actor fa ON fa.actor_id = a.actor_id
JOIN film f ON f.film_id = fa.film_id
JOIN(SELECT b.first_name, COUNT(*)
FROM actor B
GROUP BY b.first_name
HAVING COUNT(*) > 1 ) b
ON a.first_name = b.first_name
GROUP BY a.last_name
HAVING COUNT(f.title) > 1
ORDER BY a.first_name
You can do this with joins only:
select f.title, a1.first_name, a1.last_name as last_name_1, a2.last_name as last_name_2
from film f
inner join film_actor fa1 on fa1.film_id = f.film_id
inner join film_actor fa2 on fa2.film_id = f.film_id
inner join actor a1 on a1.actor_id = fa1.actor_id
inner join actor a2 on a2.actor_id = fa2.actor_id
where a1.first_name = a2.first_name and a1.actor_id < a2.actor_id
Starting from the film table, this follows the relationships to actor through film_actor twice, and then keeps only pairs of different actors that have the same first name.
As a result, you get tuples of actors that have the same first name and played in the same film. The inequality condition ensures that there are no "mirror" records (that is, each pair appears only once per film).
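A minimal sketch of the self-join on invented rows, using Python's sqlite3 module (SQLite rather than MySQL; the actor names and film titles are hypothetical). Two actors named SUSAN share FILM A, while FILM B has only one SUSAN.

```python
import sqlite3

# Hypothetical rows: two actors named SUSAN share FILM A; FILM B has only one SUSAN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
INSERT INTO actor VALUES (1, 'SUSAN', 'DAVIS'), (2, 'SUSAN', 'WOOD'), (3, 'TOM', 'MIRANDA');
INSERT INTO film VALUES (10, 'FILM A'), (11, 'FILM B');
INSERT INTO film_actor VALUES (1, 10), (2, 10), (3, 10), (1, 11), (3, 11);
""")

query = """
SELECT f.title, a1.first_name, a1.last_name AS last_name_1, a2.last_name AS last_name_2
FROM film f
INNER JOIN film_actor fa1 ON fa1.film_id = f.film_id
INNER JOIN film_actor fa2 ON fa2.film_id = f.film_id
INNER JOIN actor a1 ON a1.actor_id = fa1.actor_id
INNER JOIN actor a2 ON a2.actor_id = fa2.actor_id
WHERE a1.first_name = a2.first_name
  AND a1.actor_id < a2.actor_id  -- removes self-pairs and mirrored (b, a) pairs
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With a1.actor_id <> a2.actor_id instead of <, the same pair would come back twice, once in each order.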
I would simply use aggregation:
SELECT fa.film_id, a.first_name,
GROUP_CONCAT(a.last_name) as last_names,
GROUP_CONCAT(a.actor_id) as actor_ids
FROM actor a JOIN
film_actor fa
ON fa.actor_id = a.actor_id
GROUP BY fa.film_id, a.first_name
HAVING COUNT(*) > 1;
Your question doesn't specify what the result set should look like. This returns one row per film and first name whenever at least two actors in that film share that first name. The last names are concatenated into a string, as are the actor ids.
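A minimal sketch of the aggregation approach, assuming invented rows and SQLite (via Python's sqlite3 module) in place of MySQL. SQLite also provides GROUP_CONCAT with a comma as the default separator, but it does not guarantee the order of the concatenated values, so the check below sorts them; in MySQL you can write GROUP_CONCAT(a.last_name ORDER BY a.last_name) to fix the order.

```python
import sqlite3

# Hypothetical rows: two SUSANs share film 10; film 11 has no repeated first name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
INSERT INTO actor VALUES (1, 'SUSAN', 'DAVIS'), (2, 'SUSAN', 'WOOD'), (3, 'TOM', 'MIRANDA');
INSERT INTO film_actor VALUES (1, 10), (2, 10), (3, 10), (1, 11), (3, 11);
""")

query = """
SELECT fa.film_id, a.first_name,
       GROUP_CONCAT(a.last_name) AS last_names,  -- e.g. 'DAVIS,WOOD' (order not guaranteed)
       GROUP_CONCAT(a.actor_id) AS actor_ids
FROM actor a
JOIN film_actor fa ON fa.actor_id = a.actor_id
GROUP BY fa.film_id, a.first_name
HAVING COUNT(*) > 1  -- keep only first names that occur more than once in a film
"""
rows = conn.execute(query).fetchall()
print(rows)
```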
How about this:
SELECT f.title, f.film_id, a.first_name, a.last_name, a.actor_id
FROM actor a
JOIN film_actor fa ON fa.actor_id = a.actor_id
JOIN film f ON f.film_id = fa.film_id
WHERE a.first_name IN (
SELECT a2.first_name
FROM actor a2
JOIN film_actor fa2 ON fa2.actor_id = a2.actor_id
JOIN film f2 ON f2.film_id = fa2.film_id
WHERE a2.actor_id <> a.actor_id AND f2.film_id = f.film_id
)
ORDER BY f.title ASC, a.last_name ASC, a.first_name ASC
Explaining the query step by step:
SELECT the needed fields from the joined tables.
JOIN the necessary tables.
WHERE (this is where the subquery goes) a.first_name is in the set of:
first names of actors other than the current actor (a2.actor_id <> a.actor_id) who appear in the same film (f2.film_id = f.film_id).
The subquery in the WHERE clause is a select-with-joins query similar to the parent query.
PS:
One can build variations on this basic query template:
For example, film_id can be given as a parameter, so one can find all actors with the same first name in a specific film.
One can also group and count how many actors with the same first name appeared in the same film, e.g. by grouping on film_id and counting.
One can even optimise the query a bit by removing unnecessary joins (e.g. film.title may not be needed at all), and so on.
The advantage of returning individual rows (instead of tuples or aggregates) is that the number of actors with the same name in the same film is not fixed, and manipulating the results (e.g. grouping and counting, or getting further information for each actor) is easier.
The price is a slightly more complex and potentially slower query.
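The "film_id as a parameter" variation can be sketched with a bound parameter. This minimal example uses Python's sqlite3 module (the ? placeholder is sqlite3's parameter style, not MySQL's), invented rows, and a hypothetical helper same_first_name_actors. Following the optimisation hint above, it also drops the join to film inside the subquery, since fa2.film_id is enough for the comparison.

```python
import sqlite3

# Hypothetical rows: two SUSANs share film 10; film 11 has no repeated first name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
INSERT INTO actor VALUES (1, 'SUSAN', 'DAVIS'), (2, 'SUSAN', 'WOOD'), (3, 'TOM', 'MIRANDA');
INSERT INTO film VALUES (10, 'FILM A'), (11, 'FILM B');
INSERT INTO film_actor VALUES (1, 10), (2, 10), (3, 10), (1, 11), (3, 11);
""")

# Correlated subquery with the film bound as a parameter (?).
QUERY = """
SELECT f.title, f.film_id, a.first_name, a.last_name, a.actor_id
FROM actor a
JOIN film_actor fa ON fa.actor_id = a.actor_id
JOIN film f ON f.film_id = fa.film_id
WHERE f.film_id = ?
  AND a.first_name IN (
      SELECT a2.first_name
      FROM actor a2
      JOIN film_actor fa2 ON fa2.actor_id = a2.actor_id
      WHERE a2.actor_id <> a.actor_id AND fa2.film_id = f.film_id
  )
ORDER BY a.last_name ASC, a.first_name ASC
"""

def same_first_name_actors(film_id):
    """Hypothetical helper: actors in film_id who share a first name with a co-star."""
    return conn.execute(QUERY, (film_id,)).fetchall()

print(same_first_name_actors(10))
print(same_first_name_actors(11))
```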

List of customers that have never rented out a movie from the top 5 actors (Sakila DB)

In the Sakila DB, how do I get a list of customers who have never rented even a single movie featuring the top 5 actors (with the top actors ranked by rental volume)?
This is what I used to find the top 5 actors:
SELECT a.actor_id, a.first_name, a.last_name,
COUNT(r.rental_id) AS rentalVolume
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
JOIN inventory i ON f.film_id = i.film_id
JOIN rental r ON i.inventory_id = r.inventory_id
GROUP BY a.actor_id, a.first_name, a.last_name
ORDER BY rentalVolume DESC
LIMIT 5;
I want to SELECT the customer_id, first_name and last_name of the customers who have never rented a movie featuring these actors.
The desired result would be something like this:
Customer Number | First Name | Last Name
2               | PETER      | OLIVIER
8               | JOHN       | DOE
64              | GWEN       | LORENZO
You can use the NOT EXISTS operator to find the customers who haven't rented any movie with the top 5 actors (to identify which of their rentals are movies with those actors, you can use the IN operator):
select c.customer_id, c.first_name, c.last_name
from customer c
where not exists (
select *
from rental r
inner join inventory i on i.inventory_id = r.inventory_id
inner join film f on f.film_id = i.film_id
inner join film_actor fa on fa.film_id = f.film_id and
-- MySQL does not allow LIMIT directly inside IN (...), and IN needs a single column,
-- so the top 5 query is wrapped in a derived table that selects only actor_id
fa.actor_id in (select actor_id from (<< Here_goes_your_top_5_actors_query >>) as top_actors)
where r.customer_id = c.customer_id
)
This is the most direct and easiest-to-understand translation of your logic to SQL, but a correlated subquery can perform very badly (especially when large numbers of records are involved). If this query is too slow for you, you can instead select rentals grouped by customer, count their films that feature one of those actors, and return only the customers with a count of zero.
The inner joins now have to be changed to left joins, because we are interested in customers who don't have even a single film_actor row matching the top 5 actors, and an inner join wouldn't return those customers.
select c.customer_id, c.first_name, c.last_name
from customer c
left join rental r on r.customer_id = c.customer_id
left join inventory i on i.inventory_id = r.inventory_id
left join film_actor fa on fa.film_id = i.film_id and
fa.actor_id in (select actor_id from (<< Here_goes_your_top_5_actors_query >>) as top_actors)
group by c.customer_id, c.first_name, c.last_name
-- count() skips the NULLs produced by unmatched left joins; sum() over only NULLs returns NULL, never 0
having count(fa.film_id) = 0
PS: In this faster query I have removed the join with film because it was never needed: we don't use any data from the film table, so we can join inventory directly to film_actor.
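Both strategies can be compared on a tiny dataset. This is a minimal sketch using Python's sqlite3 module with invented rows; it uses LIMIT 2 instead of 5 because the toy data has only three actors, and it runs on SQLite, which (unlike MySQL) would accept LIMIT inside IN, but the derived-table wrapper around the top-N query is kept so the SQL has the MySQL-compatible shape. The last query shows why filtering on a SUM of the left-joined column does not work: customers with no matching film get NULL, not 0.

```python
import sqlite3

# Hypothetical toy data; rental volume: actor 1 = 3, actor 2 = 2, actor 3 = 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE film (film_id INTEGER PRIMARY KEY);
CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, inventory_id INTEGER, customer_id INTEGER);
INSERT INTO customer VALUES (1, 'ANN', 'ALPHA'), (2, 'BOB', 'BETA'), (3, 'CAT', 'GAMMA');
INSERT INTO actor VALUES (1, 'A1', 'X'), (2, 'A2', 'Y'), (3, 'A3', 'Z');
INSERT INTO film VALUES (10), (11), (12);
INSERT INTO film_actor VALUES (1, 10), (2, 11), (3, 12);
INSERT INTO inventory VALUES (100, 10), (101, 11), (102, 12);
-- BOB rents films of actors 1 and 2; ANN only a film of actor 3; CAT rents nothing
INSERT INTO rental VALUES (1, 100, 2), (2, 100, 2), (3, 100, 2), (4, 101, 2), (5, 101, 2), (6, 102, 1);
""")

TOP_ACTORS = """
SELECT a.actor_id, a.first_name, a.last_name, COUNT(r.rental_id) AS rentalVolume
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
JOIN inventory i ON f.film_id = i.film_id
JOIN rental r ON i.inventory_id = r.inventory_id
GROUP BY a.actor_id, a.first_name, a.last_name
ORDER BY rentalVolume DESC
LIMIT 2
"""
# Derived table: compare against actor_id only.
TOP_IDS = f"SELECT actor_id FROM ({TOP_ACTORS}) AS top_actors"

not_exists = f"""
SELECT c.customer_id, c.first_name, c.last_name
FROM customer c
WHERE NOT EXISTS (
    SELECT *
    FROM rental r
    INNER JOIN inventory i ON i.inventory_id = r.inventory_id
    INNER JOIN film_actor fa ON fa.film_id = i.film_id AND fa.actor_id IN ({TOP_IDS})
    WHERE r.customer_id = c.customer_id
)
ORDER BY c.customer_id
"""

LEFT_JOINS = f"""
FROM customer c
LEFT JOIN rental r ON r.customer_id = c.customer_id
LEFT JOIN inventory i ON i.inventory_id = r.inventory_id
LEFT JOIN film_actor fa ON fa.film_id = i.film_id AND fa.actor_id IN ({TOP_IDS})
"""
left_join_count = ("SELECT c.customer_id, c.first_name, c.last_name" + LEFT_JOINS
                   + " GROUP BY c.customer_id, c.first_name, c.last_name"
                   + " HAVING COUNT(fa.film_id) = 0 ORDER BY c.customer_id")
# SUM over only NULLs is NULL, so "HAVING SUM(fa.film_id) = 0" would match nobody.
sums = ("SELECT c.customer_id, SUM(fa.film_id)" + LEFT_JOINS
        + " GROUP BY c.customer_id ORDER BY c.customer_id")

rows_not_exists = conn.execute(not_exists).fetchall()
rows_left_join = conn.execute(left_join_count).fetchall()
rows_sums = conn.execute(sums).fetchall()
print(rows_not_exists, rows_left_join, rows_sums, sep="\n")
```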

How to display Null result with a WHERE clause

Hey guys, I'm new to SQL and having some difficulty. I'm hoping someone can clear some stuff up for me.
This is my issue. I want to display all of the categories that an actor has played in and the number of films they have played in each category. So, for example, they have played in action movies 5 times. This is what I have so far:
SELECT c.name AS "Category_Name"
, Count(c.name) AS "Count"
FROM category c
JOIN film_category fc
ON c.category_id = fc.category_id
JOIN film f
ON fc.film_id = f.film_id
JOIN film_actor fa
ON f.film_id = fa.film_id
JOIN actor a
ON fa.actor_id = a.actor_id
WHERE a.first_name = "Kevin"
AND a.last_name = "Bloom"
GROUP
BY c.name
ORDER
BY c.name ASC;
This displays all of the categories and the number of times "Kevin Bloom" has played in each; however, it does not display NULL values for the categories he has not played in, and I need it to. I have spent a few hours trying to figure this out, but what I found either didn't help or I wasn't able to understand it.
From what I gather, the WHERE clause is causing this issue. I also believe I will likely need to use a LEFT JOIN instead, and possibly a subquery. I'm a little shaky on both of these things when they are used together. If anyone can offer some help to a first-time learner, I would really appreciate it!
SELECT c.name AS "Category_Name", Count(a.actor_id) AS "Count"
FROM category c
LEFT JOIN film_category fc ON c.category_id = fc.category_id
LEFT JOIN film f ON fc.film_id = f.film_id
LEFT JOIN film_actor fa ON f.film_id = fa.film_id
LEFT JOIN actor a
ON fa.actor_id = a.actor_id
AND a.first_name = 'Kevin' AND a.last_name = 'Bloom'
GROUP BY c.name
ORDER BY c.name ASC;
Per your comment, the reason to use AND (in the ON clause) instead of WHERE comes down to how WHERE is evaluated. The WHERE clause limits the entire result set by the condition(s) you specify, whereas ON conditions only limit which records are allowed to match; with an OUTER JOIN they do not necessarily limit the entire result set. So if you add a WHERE condition that filters on the right side of your LEFT JOIN, the join effectively becomes an INNER JOIN: it tells SQL that you only want the rows that match, and because only some categories match that actor, you would only get those categories. By putting the condition in the ON clause of the JOIN instead, your results are not limited: all categories are returned, but only actors matching your criteria are counted.
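The difference between filtering in ON and in WHERE can be shown side by side. This is a minimal sketch using Python's sqlite3 module and invented rows. Sakila stores names in upper case; MySQL's default collation compares case-insensitively, but SQLite's = is case-sensitive, so the sketch compares against 'KEVIN' and 'BLOOM'.

```python
import sqlite3

# Hypothetical data; Horror has no films and Drama only has another actor.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE film (film_id INTEGER PRIMARY KEY);
CREATE TABLE film_category (film_id INTEGER, category_id INTEGER);
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
INSERT INTO category VALUES (1, 'Action'), (2, 'Comedy'), (3, 'Drama'), (4, 'Horror');
INSERT INTO film VALUES (10), (11), (12), (13);
INSERT INTO film_category VALUES (10, 1), (11, 1), (12, 2), (13, 3);
INSERT INTO actor VALUES (1, 'KEVIN', 'BLOOM'), (2, 'ANN', 'ALPHA');
INSERT INTO film_actor VALUES (1, 10), (1, 11), (1, 12), (2, 13);
""")

# Shared joins; the last line is the ON clause of the join to actor.
JOINS = """
FROM category c
LEFT JOIN film_category fc ON c.category_id = fc.category_id
LEFT JOIN film f ON fc.film_id = f.film_id
LEFT JOIN film_actor fa ON f.film_id = fa.film_id
LEFT JOIN actor a ON fa.actor_id = a.actor_id
"""
NAME = "a.first_name = 'KEVIN' AND a.last_name = 'BLOOM'"

# Filter appended to the ON clause: every category survives, non-matches count as 0.
in_on = conn.execute("SELECT c.name, COUNT(a.actor_id)" + JOINS + " AND " + NAME
                     + " GROUP BY c.name ORDER BY c.name").fetchall()

# Same filter in WHERE: rows whose actor is NULL or someone else are removed.
in_where = conn.execute("SELECT c.name, COUNT(a.actor_id)" + JOINS + " WHERE " + NAME
                        + " GROUP BY c.name ORDER BY c.name").fetchall()

print(in_on)
print(in_where)
```

Counting a.actor_id rather than c.name matters too: COUNT(c.name) would report 1 for a category with no matching film, because the category row itself is still there.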
You would use a LEFT JOIN, but you have to be careful:
SELECT c.name AS Category_Name, Count(a.actor_id) AS "Count"
FROM category c LEFT JOIN
film_category fc
ON c.category_id = fc.category_id LEFT JOIN
film f
ON fc.film_id = f.film_id LEFT JOIN
film_actor fa
ON f.film_id = fa.film_id LEFT JOIN
actor a
ON fa.actor_id = a.actor_id AND a.first_name = 'Kevin' AND a.last_name = 'Bloom'
GROUP BY c.name
ORDER BY c.name ASC;
Notes:
LEFT JOIN is key to the solution.
Notice the COUNT() has changed to count the id from actor. This will return 0 for categories where he has not acted.
The standard delimiter for strings in SQL is a single quote, not a double quote.
Column aliases do not need to be quoted unless quoting is required (for example, for names containing spaces or reserved words).

IN clause in MySQL

I have 4 tables with the following columns:
actor: actor_id, first_name, last_name
film_actor: actor_id, film_id
film_category: film_id, category_id
category: category_id, name
I want to find the list of all actors working in films, with their film_id, category_id and category name.
I want to use the IN clause for the following query, and I am getting the output by implementing it as follows:
select a.first_name,a.actor_id,fc.film_id,c.name,c.category_id
from actor a,film_actor fa,film_category fc,category c where a.actor_id
in (select fa.film_id from film_actor fa where fa.actor_id=a.actor_id
and fa.film_id
in (select fc.film_id from film_category fc where fc.film_id=fa.film_id
and fc.category_id
in(select fc.category_id from category c where c.category_id=fc.category_id)))
But suppose I now want the list of actors for a particular category_id, say 5, which is present.
So I make the following changes:
select a.first_name,a.actor_id,fc.film_id,c.name,c.category_id
from actor a,film_actor fa,film_category fc,category c
where a.actor_id in
(select fa.film_id from film_actor fa where fa.actor_id=a.actor_id
and fa.film_id
in (select fc.film_id from film_category fc where fc.film_id=fa.film_id
and fc.category_id
in(select fc.category_id from category c where c.category_id=fc.category_id
and category_id=5)))
I am getting an empty result. Also, when should we use the IN clause and when should we not?
Don't use IN for this; use JOIN instead.
select
a.first_name,
a.actor_id,
fc.film_id,
c.name,
c.category_id
from
actor a join
film_actor fa on fa.actor_id = a.actor_id join
film_category fc on fc.film_id = fa.film_id join
category c on c.category_id = fc.category_id and c.category_id = 5
I typically only use IN for a hard-coded set of IDs... JOIN or EXISTS for every other case. Not only is this cleaner, but it will likely result in a better performing execution plan as well.
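A minimal sketch of the two forms mentioned, using Python's sqlite3 module and invented rows (category 5 is named 'Drama' here purely for illustration). The JOIN query is the one from the answer and returns one row per actor and film; the EXISTS variant is an assumed alternative for when you only need to know which actors appear in the category, so each actor comes back once.

```python
import sqlite3

# Hypothetical rows: films 10 and 12 are in category 5, film 11 in category 6.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
CREATE TABLE film_category (film_id INTEGER, category_id INTEGER);
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO actor VALUES (1, 'ANN', 'ALPHA'), (2, 'BOB', 'BETA'), (3, 'CAT', 'GAMMA');
INSERT INTO category VALUES (5, 'Drama'), (6, 'Horror');
INSERT INTO film_category VALUES (10, 5), (11, 6), (12, 5);
INSERT INTO film_actor VALUES (1, 10), (1, 12), (2, 11), (3, 12);
""")

# JOIN form: one row per (actor, film) in category 5.
join_rows = conn.execute("""
SELECT a.first_name, a.actor_id, fc.film_id, c.name, c.category_id
FROM actor a
JOIN film_actor fa ON fa.actor_id = a.actor_id
JOIN film_category fc ON fc.film_id = fa.film_id
JOIN category c ON c.category_id = fc.category_id AND c.category_id = 5
ORDER BY a.actor_id, fc.film_id
""").fetchall()

# EXISTS form: one row per actor who has at least one film in category 5.
exists_rows = conn.execute("""
SELECT a.actor_id, a.first_name
FROM actor a
WHERE EXISTS (
    SELECT 1
    FROM film_actor fa
    JOIN film_category fc ON fc.film_id = fa.film_id
    WHERE fa.actor_id = a.actor_id AND fc.category_id = 5
)
ORDER BY a.actor_id
""").fetchall()

print(join_rows)
print(exists_rows)
```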
Please start to use join syntax instead of an IN clause:
select a.first_name,
a.actor_id,
fc.film_id,
c.name,
c.category_id
from actor a
left join film_actor fa
on a.actor_id = fa.actor_id
left join film_category fc
on fa.film_id = fc.film_id
left join category c
on fc.category_id = c.category_id
and c.category_id=5
This will return all records from the actor table regardless of if there is a matching record in the other tables.
If you need help learning JOIN syntax, here is a great visual explanation of joins