SQL - List attributes that showed up with others - mysql

I am working on an assignment that requires me to list the actors and actresses that show up in a movie, and then the actors and actresses that have starred with them in other movies. A diagram of the database is viewable here: http://i.imgur.com/kj8qVgF.png
I have a query for the first part (getting the names of actors and actresses that show up in a certain movie).
SELECT DISTINCT n.name
FROM cast_info c
INNER JOIN name n
ON (n.id = c.person_id)
INNER JOIN title t
ON (c.movie_id = t.id)
CROSS JOIN role_type r
WHERE (t.title = 'The Movie') AND (r.role = 'actress' OR r.role = 'actor')
Could I get some assistance to help me find the actresses and actors that star with them in other movies?
Example:
Actors in a given movie 'The Movie': Bob, Joe, Billy
Actors in a different movie 'Another Movie': Joe, Daniel, Frank
Actors in another different movie 'Third Movie': Billy, Susan, Theodore
It should return Daniel, Frank, Susan, and Theodore, because they starred in at least one movie with one of the actors in the given movie.

Going by your example, and since your model is in 3NF, you can solve the problem with plain IN subqueries, just as you described it in natural language.
SELECT DISTINCT n.name
FROM cast_info c
INNER JOIN name n
ON (n.id = c.person_id)
INNER JOIN title t
ON (c.movie_id = t.id)
INNER JOIN role_type r
ON (c.role_id = r.id)
WHERE (r.role = 'actress' OR r.role = 'actor') -- find the actresses and actors ...
AND c.movie_id IN -- starred in at least one movie ...
(SELECT movie_id
FROM cast_info
WHERE person_id IN -- with one of the actors in the given movie => Bob, Joe, Billy
(SELECT cc.person_id
FROM cast_info cc
INNER JOIN role_type rr
ON (cc.role_id = rr.id)
WHERE (rr.role = 'actress' OR rr.role = 'actor') -- only select actors and actresses in the given movie
AND cc.movie_id IN (SELECT id FROM title WHERE title = 'The Movie')
)
)
AND c.person_id NOT IN -- except the actors in the given movie
(SELECT cc.person_id
FROM cast_info cc
INNER JOIN role_type rr
ON (cc.role_id = rr.id)
WHERE (rr.role = 'actress' OR rr.role = 'actor') -- only select actors and actresses in the given movie
AND cc.movie_id IN (SELECT id FROM title WHERE title = 'The Movie')
)
This is easy to translate from natural language to SQL, but the nested IN subqueries may not be efficient. You can replace IN with EXISTS, which is often more efficient.
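As a rough sketch of the EXISTS idea, here is one way the same question could be phrased with correlated subqueries. It is wrapped in Python's built-in sqlite3 module only so it can be run without a server. The trimmed-down schema, the helper names (build_db, co_stars) and the choice of Susan as the one actress are my own assumptions, and SQLite is standing in for MySQL here.

```python
import sqlite3

# Hypothetical miniature of the schema: only the columns the query touches.
SCHEMA = """
CREATE TABLE name (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE title (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE role_type (id INTEGER PRIMARY KEY, role TEXT);
CREATE TABLE cast_info (id INTEGER PRIMARY KEY, person_id INTEGER,
                        movie_id INTEGER, role_id INTEGER);
"""

# Casts taken from the example in the question.
CASTS = {
    "The Movie": ["Bob", "Joe", "Billy"],
    "Another Movie": ["Joe", "Daniel", "Frank"],
    "Third Movie": ["Billy", "Susan", "Theodore"],
}

CO_STARS_EXISTS = """
SELECT DISTINCT n.name
FROM cast_info c
JOIN name n ON n.id = c.person_id
JOIN role_type r ON r.id = c.role_id
WHERE r.role IN ('actor', 'actress')
  -- the movie of row c also features someone who acts in the given movie
  AND EXISTS (SELECT 1
              FROM cast_info shared
              JOIN cast_info given ON given.person_id = shared.person_id
              JOIN title t ON t.id = given.movie_id
              JOIN role_type gr ON gr.id = given.role_id
              WHERE shared.movie_id = c.movie_id
                AND t.title = :movie
                AND gr.role IN ('actor', 'actress'))
  -- the person of row c does not act in the given movie
  AND NOT EXISTS (SELECT 1
                  FROM cast_info given
                  JOIN title t ON t.id = given.movie_id
                  JOIN role_type gr ON gr.id = given.role_id
                  WHERE given.person_id = c.person_id
                    AND t.title = :movie
                    AND gr.role IN ('actor', 'actress'))
ORDER BY n.name
"""


def build_db():
    """Create an in-memory database holding the example casts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO role_type (id, role) VALUES (1, 'actor'), (2, 'actress')")
    people = {}
    for movie_id, (title, cast) in enumerate(CASTS.items(), start=1):
        conn.execute("INSERT INTO title (id, title) VALUES (?, ?)", (movie_id, title))
        for person in cast:
            if person not in people:
                people[person] = len(people) + 1
                conn.execute("INSERT INTO name (id, name) VALUES (?, ?)",
                             (people[person], person))
            role_id = 2 if person == "Susan" else 1  # assumption: Susan is the actress
            conn.execute(
                "INSERT INTO cast_info (person_id, movie_id, role_id) VALUES (?, ?, ?)",
                (people[person], movie_id, role_id))
    return conn


def co_stars(conn, movie):
    """Names of people who share some movie with the cast of `movie`."""
    return [row[0] for row in conn.execute(CO_STARS_EXISTS, {"movie": movie})]


print(co_stars(build_db(), "The Movie"))
```

Whether EXISTS actually beats IN depends on the optimizer and the indexes, so it is worth comparing the plans with EXPLAIN on the real data.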

So we meet again. Hopefully I can be of the same assistance as with your last question. I like that you began by breaking this down into smaller parts. I would have started at the same place as you: getting all of the actors in the necessary movie. I've changed your query slightly, selecting the person's id instead of their name, so I can keep this answer a little shorter and cleaner.
SELECT c.person_id
FROM cast_info c
JOIN title t ON c.movie_id = t.id
JOIN role_type r ON c.role_id = r.id
WHERE t.title = 'The Movie' AND (r.role = 'actress' OR r.role = 'actor');
*Note, I did not use distinct here because a person should not appear in this movie as an actor or actress more than once, but c.person_id is not a unique key so it would not hurt to be sure.
Then we can get all of the movies with those actors in them. We can filter to avoid the original movie.
SELECT t.id
FROM title t
JOIN cast_info c ON c.movie_id = t.id
WHERE t.title != 'The Movie'
AND c.person_id IN(SELECT c.person_id
FROM cast_info c
JOIN title t ON c.movie_id = t.id
JOIN role_type r ON r.id = c.role_id
WHERE t.title = 'The Movie' AND (r.role = 'actress' OR r.role = 'actor'))
Now we can pull all of the actors from all of those movies, and exclude our original actors. Here is where it is a good idea to use distinct person id. This is because Bob may star in 'The Movie'. Later, Bob did a movie with John, and another movie with John, but we don't want John to appear twice.
So, here is the final query:
SELECT DISTINCT c.person_id
FROM cast_info c
WHERE c.movie_id IN(SELECT t.id
FROM title t
JOIN cast_info c ON c.movie_id = t.id
WHERE t.title != 'The Movie'
AND c.person_id IN(SELECT c.person_id
FROM cast_info c
JOIN title t ON c.movie_id = t.id
JOIN role_type r ON r.id = c.role_id
WHERE t.title = 'The Movie' AND (r.role = 'actress' OR r.role = 'actor')))
AND c.person_id NOT IN(SELECT c.person_id
FROM cast_info c
JOIN title t ON c.movie_id = t.id
JOIN role_type r ON r.id = c.role_id
WHERE t.title = 'The Movie' AND (r.role = 'actress' OR r.role = 'actor'))
As I've said before, it's difficult to test this without data, and it's a lot of tables to make a simple SQL Fiddle, so please try these in bits like I have written them if they don't work, and let me know what may need to be tweaked.
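Since testing without data is the sticking point, here is a small sketch that loads the example from the question into an in-memory SQLite database (through Python's sqlite3 module) and runs the three pieces one at a time. The table definitions are cut down to the columns used above, the numeric ids are made up, and the last step joins to name purely to make the output readable. SQLite stands in for MySQL, so treat it as a sanity check rather than proof.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE name (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE title (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE role_type (id INTEGER PRIMARY KEY, role TEXT);
CREATE TABLE cast_info (person_id INTEGER, movie_id INTEGER, role_id INTEGER);
INSERT INTO role_type VALUES (1, 'actor'), (2, 'actress');
INSERT INTO title VALUES (1, 'The Movie'), (2, 'Another Movie'), (3, 'Third Movie');
INSERT INTO name VALUES (1, 'Bob'), (2, 'Joe'), (3, 'Billy'), (4, 'Daniel'),
                        (5, 'Frank'), (6, 'Susan'), (7, 'Theodore');
INSERT INTO cast_info VALUES
  (1, 1, 1), (2, 1, 1), (3, 1, 1),   -- The Movie: Bob, Joe, Billy
  (2, 2, 1), (4, 2, 1), (5, 2, 1),   -- Another Movie: Joe, Daniel, Frank
  (3, 3, 1), (6, 3, 2), (7, 3, 1);   -- Third Movie: Billy, Susan, Theodore
""")

# Step 1: everyone acting in the given movie.
ORIGINAL_CAST = """
SELECT c.person_id FROM cast_info c
JOIN title t ON c.movie_id = t.id
JOIN role_type r ON c.role_id = r.id
WHERE t.title = 'The Movie' AND r.role IN ('actor', 'actress')
"""

# Step 2: other movies featuring any of those people.
OTHER_MOVIES = f"""
SELECT t.id FROM title t
JOIN cast_info c ON c.movie_id = t.id
WHERE t.title != 'The Movie' AND c.person_id IN ({ORIGINAL_CAST})
"""

# Step 3: the cast of those movies, minus the original cast.
CO_STARS = f"""
SELECT DISTINCT n.name FROM cast_info c
JOIN name n ON n.id = c.person_id
WHERE c.movie_id IN ({OTHER_MOVIES})
  AND c.person_id NOT IN ({ORIGINAL_CAST})
ORDER BY n.name
"""


def column(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


step1 = sorted(column(ORIGINAL_CAST))
step2 = sorted(set(column(OTHER_MOVIES)))
step3 = column(CO_STARS)
print(step1, step2, step3)
```

Building the larger queries out of the smaller strings mirrors the "try these in bits" advice: if step3 looks wrong, step1 and step2 can be inspected on their own.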

Related

SQL - Find the object with the most appearances

I am a newbie to SQL working on an assignment to find the actor or actress with the most appearances. A diagram of the database I'm working with is here:
Here was the query I was trying to use:
SELECT DISTINCT n.name, count(n.name)
FROM cast_info c
INNER JOIN name n
ON (n.id = c.person_id)
INNER JOIN title t
ON (c.movie_id = t.id)
CROSS JOIN role_type r
WHERE (r.role = 'actor' OR r.role = 'actress')
GROUP BY n.name
This is intended to get a count of how many times different actors showed up, which I can then sort and select the top one. But it doesn't work. Something else I did was:
SELECT n.name, count(n.name) AS amount
FROM cast_info c
INNER JOIN name n
ON (n.id = c.person_id)
INNER JOIN title t
ON (c.movie_id = t.id)
LEFT JOIN role_type r
ON c.role_id = r.id
AND (r.role = 'actor' OR r.role = 'actress')
GROUP BY amount
ORDER BY amount DESC
LIMIT 1
But that gives the error
aggregate functions are not allowed in GROUP BY
LINE 1: SELECT COUNT(*) AS total FROM (SELECT n.name, count(n.name) ...
Tips?
I am going to take a stab at each of these questions for you, because this assignment is obviously causing you some trouble.
You can find everything you need in your cast_info table and your role_type table, unless you need to display the actor's or actress's actual name.
I would start by selecting all rows that represent an actor or actress in a movie. This should be a unique combination, as a person can't be an actor in the same movie twice. Once you've done that, group by the person's id and get the count() of rows, which should effectively be the number of movies. I think the error you're getting is exactly what it sounds like: you can't group by an aggregate column (your GROUP BY amount refers to count(n.name)). A workaround is to do the counting in a subquery, then sort by the count and keep the top row; a bare MAX() in the outer query would not reliably return the matching person id.
Try this:
SELECT t.person_id, t.numMovies AS mostAppearances
FROM (SELECT c.person_id, COUNT(*) AS numMovies
FROM cast_info c
JOIN role_type r ON r.id = c.role_id
WHERE r.role = 'actor' OR r.role = 'actress'
GROUP BY c.person_id) t
ORDER BY t.numMovies DESC
LIMIT 1;
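If ties matter (two people sharing the top count), one possible variation is to compute the counts once and keep every row whose count equals the maximum. The sketch below runs that against a few made-up rows in an in-memory SQLite database via Python's sqlite3. The data, the CTE name counts and the column name num_movies are my own, and a WITH clause needs MySQL 8 or later (or PostgreSQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE role_type (id INTEGER PRIMARY KEY, role TEXT);
CREATE TABLE cast_info (person_id INTEGER, movie_id INTEGER, role_id INTEGER);
INSERT INTO role_type VALUES (1, 'actor'), (2, 'actress'), (3, 'director');
-- Made-up rows: persons 10 and 11 each act in three movies, person 13 in one,
-- and person 12 directs four movies but never acts.
INSERT INTO cast_info VALUES
  (10, 1, 1), (10, 2, 1), (10, 3, 1),
  (11, 1, 2), (11, 2, 2), (11, 4, 2),
  (12, 1, 3), (12, 2, 3), (12, 3, 3), (12, 4, 3),
  (13, 4, 1);
""")

MOST_APPEARANCES = """
WITH counts AS (
    -- one row per performer with the number of acting credits
    SELECT c.person_id, COUNT(*) AS num_movies
    FROM cast_info c
    JOIN role_type r ON r.id = c.role_id
    WHERE r.role IN ('actor', 'actress')
    GROUP BY c.person_id
)
SELECT person_id, num_movies
FROM counts
WHERE num_movies = (SELECT MAX(num_movies) FROM counts)
ORDER BY person_id
"""

top = conn.execute(MOST_APPEARANCES).fetchall()
print(top)
```

Here MAX() is applied on its own in a scalar subquery, so it never has to share a SELECT list with a non-aggregated column.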
Try this
SELECT DISTINCT n.name, count(n.name)
FROM cast_info c
INNER JOIN name n
ON n.id = c.person_id
INNER JOIN title t
ON c.movie_id = t.id
INNER JOIN role_type r
ON c.role_id = r.id
AND (r.role = 'actor' OR r.role = 'actress')
GROUP BY n.name

SQL - Select rows where one element appears twice

I am working on an assignment, and I need to find movies that have been directed by directors that directed more than one movie starring Angelina Jolie. Currently, I have this:
SELECT DISTINCT t.title, n.name
FROM (
SELECT DISTINCT t.id theMovies
FROM name n
INNER JOIN cast_info c
ON (c.person_id = n.id)
INNER JOIN title t
ON (t.id = c.movie_id)
WHERE n.name = 'Jolie, Angelina'
) as newTable
INNER JOIN title t
ON (t.id = theMovies)
INNER JOIN cast_info c
ON (c.movie_id = t.id)
INNER JOIN name n
ON (n.id = c.person_id)
CROSS JOIN role_type
WHERE role = 'director';
What this query currently does is find a list of movies starring Angelina Jolie, and then it lists the directors of those movies. All I need to do now is keep only the rows where the director is present in at least one other row. Any tips?
For reference, here is a diagram of the database I'm using:
http://i.imgur.com/kj8qVgF.png
I'm also rather new to SQL so any suggestions to improve my query would be much appreciated!
I would break this up into several pieces and build up to your final query. If you are new to SQL, it's good practice to break things into bits and put them back together. With that, I'll restate the goal: find movies that have been directed by directors who have directed more than one movie with Angelina Jolie.
I would start by getting all movies with Angelina Jolie:
SELECT t.id
FROM name n
JOIN cast_info c ON c.person_id = n.id
JOIN title t ON t.id = c.movie_id
WHERE n.name = 'Jolie, Angelina';
Now, let's get the directors of those movies:
SELECT c.person_id
FROM cast_info c
JOIN title t ON t.id = c.movie_id
JOIN role_type r ON r.id = c.role_id
WHERE r.role = 'director' AND t.id IN(SELECT t.id
FROM name n
JOIN cast_info c ON c.person_id = n.id
JOIN title t ON t.id = c.movie_id
WHERE n.name = 'Jolie, Angelina');
We can modify the above query to group by person_id, having a count(*) greater than one (meaning more than one movie).
SELECT c.person_id
FROM cast_info c
JOIN title t ON t.id = c.movie_id
JOIN role_type r ON r.id = c.role_id
WHERE r.role = 'director' AND t.id IN(SELECT t.id
FROM name n
JOIN cast_info c ON c.person_id = n.id
JOIN title t ON t.id = c.movie_id
WHERE n.name = 'Jolie, Angelina')
GROUP BY person_id
HAVING COUNT(*) > 1;
Now, we need to find movies directed by those directors, and filter so that we don't include movies with Angelina Jolie.
SELECT t.id
FROM title t
JOIN cast_info c ON c.movie_id = t.id
JOIN role_type r ON r.id = c.role_id
WHERE r.role = 'director'
AND c.person_id IN (SELECT c.person_id
FROM cast_info c
JOIN title t ON t.id = c.movie_id
JOIN role_type r ON r.id = c.role_id
WHERE r.role = 'director' AND t.id IN(SELECT t.id
FROM name n
JOIN cast_info c ON c.person_id = n.id
JOIN title t ON t.id = c.movie_id
WHERE n.name = 'Jolie, Angelina')
GROUP BY person_id
HAVING COUNT(*) > 1)
AND t.id NOT IN(SELECT t.id
FROM name n
JOIN cast_info c ON c.person_id = n.id
JOIN title t ON t.id = c.movie_id
WHERE n.name = 'Jolie, Angelina');
I can't test via SQL Fiddle because it is not working at the moment but I will do so as soon as I can. Some stuff might need to be tweaked, but let me know if this helps.
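In the meantime, here is a rough stand-in for the fiddle: a handful of made-up rows in an in-memory SQLite database, driven from Python's sqlite3 module. The movie titles, director names and ids are invented for illustration, only the columns used above are modelled, and SQLite is not MySQL, so this only checks the logic of the GROUP BY / HAVING step and the final filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE name (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE title (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE role_type (id INTEGER PRIMARY KEY, role TEXT);
CREATE TABLE cast_info (person_id INTEGER, movie_id INTEGER, role_id INTEGER);
INSERT INTO role_type VALUES (1, 'actress'), (2, 'director');
INSERT INTO name VALUES (1, 'Jolie, Angelina'), (2, 'Director A'), (3, 'Director B');
INSERT INTO title VALUES (1, 'M1'), (2, 'M2'), (3, 'M3'), (4, 'M4'), (5, 'M5');
INSERT INTO cast_info VALUES
  (1, 1, 1), (1, 2, 1), (1, 3, 1),   -- Jolie acts in M1, M2, M3
  (2, 1, 2), (2, 2, 2), (2, 4, 2),   -- Director A directs M1, M2, M4
  (3, 3, 2), (3, 5, 2);              -- Director B directs M3, M5
""")

# Movies with Angelina Jolie in them.
JOLIE_MOVIES = """
SELECT c.movie_id FROM cast_info c
JOIN name n ON n.id = c.person_id
WHERE n.name = 'Jolie, Angelina'
"""

# Directors credited on more than one of those movies.
REPEAT_DIRECTORS = f"""
SELECT c.person_id FROM cast_info c
JOIN role_type r ON r.id = c.role_id
WHERE r.role = 'director' AND c.movie_id IN ({JOLIE_MOVIES})
GROUP BY c.person_id
HAVING COUNT(*) > 1
"""

# Movies by those directors, leaving out the Jolie movies themselves.
THEIR_OTHER_MOVIES = f"""
SELECT t.title FROM title t
JOIN cast_info c ON c.movie_id = t.id
JOIN role_type r ON r.id = c.role_id
WHERE r.role = 'director'
  AND c.person_id IN ({REPEAT_DIRECTORS})
  AND t.id NOT IN ({JOLIE_MOVIES})
ORDER BY t.title
"""

repeat_directors = [row[0] for row in conn.execute(REPEAT_DIRECTORS)]
other_movies = [row[0] for row in conn.execute(THEIR_OTHER_MOVIES)]
print(repeat_directors, other_movies)
```

Director B has only one Jolie movie, so HAVING COUNT(*) > 1 drops them, and only Director A's non-Jolie movie survives the final filter. Dropping the NOT IN condition would list M1 and M2 as well, if that is what the assignment wants.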
Please try:
SELECT
t.title
, n.name
FROM title t
INNER JOIN cast_info c
ON t.id = c.movie_id
INNER JOIN name n
ON c.person_id = n.id
INNER JOIN role_type r
ON c.role_id = r.id
INNER JOIN (
SELECT
c.person_id
, r.id
FROM cast_info c
INNER JOIN role_type r
ON c.role_id = r.id
WHERE r.role = 'director'
AND c.movie_id IN (
SELECT DISTINCT
c.movie_id
FROM name n
INNER JOIN cast_info c
ON c.person_id = n.id
WHERE n.name = 'Jolie, Angelina'
)
GROUP BY
c.person_id
, r.id
HAVING COUNT(*) > 1
) d
ON c.person_id = d.person_id
AND r.id = d.id
;
Try a multi-part query "in parts" if it doesn't seem to work; this helps identify where it may be failing:
-- 1
SELECT DISTINCT
c.movie_id
FROM name n
INNER JOIN cast_info c
ON c.person_id = n.id
WHERE n.name = 'Jolie, Angelina'
;
-- 2
SELECT
c.person_id
, r.id
FROM cast_info c
INNER JOIN role_type r
ON c.role_id = r.id
WHERE r.role = 'director'
GROUP BY
c.person_id
, r.id
HAVING COUNT(*) > 1
;
-- 3
SELECT
c.person_id
, r.id
FROM cast_info c
INNER JOIN role_type r
ON c.role_id = r.id
WHERE r.role = 'director'
AND c.movie_id IN (
SELECT DISTINCT
c.movie_id
FROM name n
INNER JOIN cast_info c
ON c.person_id = n.id
WHERE n.name = 'Jolie, Angelina'
)
GROUP BY
c.person_id
, r.id
HAVING COUNT(*) > 1
;

SQL Query Improvement - Picking X only if A>B where A and B values are within a scope range of a variable

I am practicing my SQL skills and trying to solve some exercises.
Problem: For all cases where the same reviewer rated the same movie twice (stored in rating table) and gave it a higher rating (rating.stars) the second time, return the reviewer's name (stored in reviewer table) and the title of the movie (stored in movie table).
SELECT
r.NAME AS reviewer
,m.Title AS movietitle
FROM rating ra
LEFT JOIN movie m ON m.mID = ra.mID
LEFT JOIN reviewer r ON r.rID = ra.rID
LEFT JOIN
(
SELECT
ra.rID
,ra.mID
,MAX(ra.RatingDate) AS MaxDate
,MIN(ra.RatingDate) AS MinDate
,MAX(ra.stars) AS MaxStars
,MIN(ra.stars) AS MinStars
FROM Rating ra
GROUP BY ra.rID, ra.mID
HAVING MAX(ra.stars) <> MIN(ra.stars) and COUNT(*) = 2
) rs ON ra.rID = rs.rID AND ra.mID = rs.mID
WHERE
ra.Ratingdate = rs.MaxDate
AND ra.stars = rs.MaxStars
The query above is the solution I tried, and I think it is correct.
Is this the clearest way to solve the problem?
Are there any shortcuts for solving the same problem?
What is the practical name for this kind of problem? I made up a name in my title: "Picking X only if A>B where A and B values are within a scope range of a variable". Is there a better way to categorize these types of problems, or are they already categorized in books or in communities of some sort?
References
http://sqlfiddle.com/#!2/8031b/1509
http://i42.photobucket.com/albums/e311/indiecoding/data_zpsd505fa8a.png
I would be inclined to do this with an exists clause:
select rv.name, m.title
from rating r join
movie m
on r.mid = m.mid join
reviewer rv
on r.rid = rv.rid
where exists (select 1
from rating r2
where r2.rid = r.rid and
r2.mid = r.mid and
r2.ratingdate < r.ratingdate and
r2.stars < r.stars
);
The joins just bring the necessary tables together to get the reviewer name and movie name. The key is the correlated subquery. It directly implements the logic you are looking for.
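To see the correlated subquery at work, here is a small sketch using Python's sqlite3 with an in-memory database. The reviewers, movies and dates are invented for illustration (they are not the rows from the linked fiddle), and ratingDate is stored as ISO text so that plain string comparison orders the dates correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (mID INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE reviewer (rID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE rating (rID INTEGER, mID INTEGER, stars INTEGER, ratingDate TEXT);
INSERT INTO movie VALUES (101, 'Film X'), (102, 'Film Y');
INSERT INTO reviewer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO rating VALUES
  (1, 101, 2, '2011-01-01'), (1, 101, 4, '2011-02-01'),  -- Alice: higher the second time
  (2, 101, 5, '2011-01-05'), (2, 101, 3, '2011-03-01'),  -- Bob: lower the second time
  (3, 102, 4, '2011-01-10');                             -- Carol: rated only once
""")

IMPROVED = """
SELECT rv.name, m.title
FROM rating r
JOIN movie m ON r.mID = m.mID
JOIN reviewer rv ON r.rID = rv.rID
-- keep a rating only if the same reviewer gave the same movie
-- an earlier rating with fewer stars
WHERE EXISTS (SELECT 1
              FROM rating r2
              WHERE r2.rID = r.rID
                AND r2.mID = r.mID
                AND r2.ratingDate < r.ratingDate
                AND r2.stars < r.stars)
"""

improved = conn.execute(IMPROVED).fetchall()
print(improved)
```

Only Alice's second rating has an earlier, lower rating by the same reviewer for the same movie, so hers is the only row that passes the EXISTS test.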
This will generate one row per pair of ratings that qualify, i.e., if there are three ratings by the same reviewer for the same movie, with increasing ratings, you will get one row in the output for each pair: (1, 2), (2, 3), and (1, 3). Of course, the movie name and reviewer name will be the same for all of those output rows anyway, so if that's all you want, just add DISTINCT to the select clause and you'll only get one row:
Select r.Name reviewer, m.Title movietitle
From rating r1
join rating r2
On r2.mID = r1.mID
And r2.rID = r1.rID
And r2.RatingDate > r1.RatingDate
And r2.stars > r1.stars
left join movie m ON m.mID = r1.mID
left join reviewer r ON r.rID = r1.rID
If you don't want to see the output for (1, 3) [where there is an intervening rating between the two], and if you want other data to be generated like the rating dates, for example, then that would require use of a subquery to restrict pairings to adjacent (successive) ratings...
Select r.Name reviewer, m.Title movietitle
From rating r1
join rating r2
On r2.mID = r1.mID
And r2.rID = r1.rID
And r2.stars > r1.stars
And r2.RatingDate =
(Select min(ratingDate)
From rating
Where mID = r1.mId
and rId = r1.rId
And ratingDate > r1.ratingDate)
left join movie m ON m.mID = r1.mID
left join reviewer r ON r.rID = r1.rID

SQL query, three tables

So let's say I'm trying to find actors who are in two movies together (for the purpose of a degrees of separation page). I have tables as such (this is just some made up data):
actors
id first_name last_name gender
17 brad pitt m
2 kevin bacon m
movies
id name year
20 benjamin button 2008
roles
a_id m_id role
17 20 Mr. Benjamin Button
So I want to return the names of the movies which both actors are in. I have the first and last names of two actors.
I'm having a lot of trouble getting this to work. What I'm having trouble with, specifically, is the SELECT part
SELECT name FROM movies JOIN . . .
I'm starting with first_name and last_name values for each
You must join twice:
SELECT m.name movie_name
FROM movies m join roles r1 on
r1.m_id = m.id join actors a1 on
r1.a_id = a1.id join roles r2 on
r2.m_id = m.id join actors a2 on
r2.a_id = a2.id
WHERE
a1.first_name = 'brad' and a1.last_name = 'pitt' and
a2.first_name = 'kevin' and a2.last_name = 'bacon'
Show all actor combinations per film:
SELECT m.name movie_name, a1.id actor1, a2.id actor2
FROM movies m join roles r1 on
r1.m_id = m.id join actors a1 on
r1.a_id = a1.id join roles r2 on
r2.m_id = m.id join actors a2 on
r2.a_id = a2.id
WHERE
a1.id < a2.id
The < ensures that each combination is only reported once.
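To make the effect of the < visible, this sketch runs the pairing query with and without the filter on a tiny in-memory SQLite database (Python's sqlite3). The actors and the first movie come from the question's sample data; the second movie, 'made up film', and its two roles are invented so that there is at least one shared movie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actors (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, gender TEXT);
CREATE TABLE movies (id INTEGER PRIMARY KEY, name TEXT, year INTEGER);
CREATE TABLE roles (a_id INTEGER, m_id INTEGER, role TEXT);
INSERT INTO actors VALUES (17, 'brad', 'pitt', 'm'), (2, 'kevin', 'bacon', 'm');
INSERT INTO movies VALUES (20, 'benjamin button', 2008),
                          (21, 'made up film', 2010);   -- invented for the example
INSERT INTO roles VALUES (17, 20, 'Mr. Benjamin Button'),
                         (17, 21, 'invented role 1'),
                         (2, 21, 'invented role 2');
""")

# Same join shape as above; {where} is filled in with or without the filter.
PAIRS = """
SELECT m.name, a1.id, a2.id
FROM movies m
JOIN roles r1 ON r1.m_id = m.id
JOIN actors a1 ON r1.a_id = a1.id
JOIN roles r2 ON r2.m_id = m.id
JOIN actors a2 ON r2.a_id = a2.id
{where}
ORDER BY m.name, a1.id, a2.id
"""

unfiltered = conn.execute(PAIRS.format(where="")).fetchall()
pairs = conn.execute(PAIRS.format(where="WHERE a1.id < a2.id")).fetchall()
print(len(unfiltered), pairs)
```

Without the filter, every actor is also paired with themselves and each real pair shows up in both orders; with a1.id < a2.id only one ordering of each distinct pair remains.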
select m.name,group_concat(concat_ws(' ',a.first_name,a.last_name) order by a.last_name) as actors
from actors as a
inner join roles as r on a.id = r.a_id
inner join movies as m on m.id = r.m_id
where r.a_id in (2,17)
group by r.m_id
having count(r.a_id) = 2
order by m.name
declare @FirstActorID int,
@SecondActorID int;
select m.[name]
from
movies m
inner join [roles] r1 on r1.m_id = m.id and r1.a_id = @FirstActorID
inner join [roles] r2 on r2.m_id = m.id and r2.a_id = @SecondActorID

Help with (somewhat simple) MySQL JOIN

Consider I have three tables:
Movies (movie 1, movie 2, etc)
Categories (action, suspense, etc)
Movies_Categories (movie 1 -> action, movie 1 -> suspense, movie 2 -> suspense, etc)
How could I select only the movies that belong or don't belong to a specific category using only 1 query?
Thanks!
Belongs:
SELECT m.*
FROM movies m
INNER JOIN movies_categories mc
ON m.id = mc.movie_id
INNER JOIN categories c
ON c.id = mc.category_id
AND c.name = 'action';
Doesn't belong:
SELECT m.*
FROM movies m
LEFT OUTER JOIN (SELECT mc.movie_id
FROM movies_categories mc
INNER JOIN categories c
ON c.id = mc.category_id
AND c.name = 'action') mcx
ON m.id = mcx.movie_id
WHERE mcx.movie_id IS NULL
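For a quick check of both queries, here is a sketch with the three tables filled from the question's examples, plus a hypothetical 'movie 3' with no categories at all, run through Python's sqlite3 against an in-memory database. The column names id, name, movie_id and category_id follow the queries above; the exact table definitions are my guess.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movies_categories (movie_id INTEGER, category_id INTEGER);
INSERT INTO movies VALUES (1, 'movie 1'), (2, 'movie 2'), (3, 'movie 3');
INSERT INTO categories VALUES (1, 'action'), (2, 'suspense');
INSERT INTO movies_categories VALUES (1, 1), (1, 2), (2, 2);  -- movie 3 is uncategorized
""")

# Inner joins keep only movies that have an 'action' row.
BELONGS = """
SELECT m.name
FROM movies m
JOIN movies_categories mc ON m.id = mc.movie_id
JOIN categories c ON c.id = mc.category_id AND c.name = 'action'
ORDER BY m.name
"""

# Anti-join: left join to the 'action' movies and keep the rows with no match.
NOT_BELONGS = """
SELECT m.name
FROM movies m
LEFT JOIN (SELECT mc.movie_id
           FROM movies_categories mc
           JOIN categories c ON c.id = mc.category_id AND c.name = 'action') mcx
  ON m.id = mcx.movie_id
WHERE mcx.movie_id IS NULL
ORDER BY m.name
"""

belongs = [row[0] for row in conn.execute(BELONGS)]
not_belongs = [row[0] for row in conn.execute(NOT_BELONGS)]
print(belongs, not_belongs)
```

Note that the uncategorized movie lands in the "doesn't belong" list, which is usually what you want from the LEFT JOIN ... IS NULL pattern.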
SELECT m.*,
IF(mc.movie_id IS NULL, 'doesnt belong', 'belongs')
FROM Movies m
LEFT JOIN Movies_Categories mc ON mc.movie_id = m.id
AND mc.category_id = (SELECT id
FROM categories
WHERE name = 'action')