I have to write a query that, for every pair of movie genres, counts the directors who have directed movies in both genres. I thought I should take two instances of the genre ID and two instances of the director ID and keep the rows where the genre IDs differ but the director IDs are the same, so I tried something like this:
select distinct g1.genre_id as genre1,
g2.genre_id as genre2,
count(distinct mhd1.director_id) as directors_count
from genre g1, genre g2, movie_has_genre mhg1,movie_has_genre
mhg2,movie_has_director mhd1,movie_has_director mhd2
where
(g2.genre_id <> g1.genre_id) and (mhg1.genre_id = g1.genre_id)
and (mhg2.genre_id = g2.genre_id) and (mhd1.movie_id = mhg1.movie_id)
and (mhd2.movie_id = mhg2.movie_id) and (mhd1.director_id =mhd2.director_id)
group by g1.genre_id, g2.genre_id;
The schema is:
actor(actor_id,first_name,last_name,gender)
director(director_id,first_name,last_name)
role(movie_id,actor_id,role)
genre(genre_id,name)
movie(movie_id,title,year,rank)
movie_has_director(movie_id,director_id)
movie_has_genre(genre_id,movie_id)
But it is not working. What am I missing? Thank you.
EDIT: The problem seems to be that I get both (a,b) and (b,a) pairs, while I should get only (a,b) with a < b.
First you need a self-join to get all combinations of two genres: genre g1 join genre g2 on g2.genre_id > g1.genre_id. The > comparison keeps each unordered pair only once, so you get (a,b) but not (b,a). Then you need to join movie_has_genre and movie_has_director to both (g1 and g2) and only keep rows where the director is the same (md2.director_id = md1.director_id). The rest is basic GROUP BY and COUNT:
select g1.name as genre1,
g2.name as genre2,
count(distinct md1.director_id) as directors_count
from genre g1
join genre g2 on g2.genre_id > g1.genre_id
join movie_has_genre mg1 on mg1.genre_id = g1.genre_id
join movie_has_genre mg2 on mg2.genre_id = g2.genre_id
join movie_has_director md1 on md1.movie_id = mg1.movie_id
join movie_has_director md2 on md2.movie_id = mg2.movie_id
and md2.director_id = md1.director_id
group by g1.genre_id, g2.genre_id
Adding on to @Paul's answer, you can also get rid of the duplicate (a,b), (b,a) pairs by using DISTINCT, LEAST and GREATEST:
select DISTINCT LEAST(g1.name,g2.name) as genre1,
GREATEST(g1.name,g2.name) as genre2,
count(distinct md1.director_id) as directors_count
from genre g1
join genre g2 on g2.genre_id <> g1.genre_id
join movie_has_genre mg1 on mg1.genre_id = g1.genre_id
join movie_has_genre mg2 on mg2.genre_id = g2.genre_id
join movie_has_director md1 on md1.movie_id = mg1.movie_id
join movie_has_director md2 on md2.movie_id = mg2.movie_id
and md2.director_id = md1.director_id
group by g1.genre_id, g2.genre_id
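To see the effect of the pair condition, here is a minimal sketch using Python's sqlite3 module. The rows are made up for illustration, SQLite stands in for MySQL, and the column names follow the schema in the question (genre_id rather than id). It runs the same join once with > and once with <> on the genre IDs.

```python
import sqlite3

# Hypothetical toy data; table and column names follow the question's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movie_has_genre (genre_id INTEGER, movie_id INTEGER);
CREATE TABLE movie_has_director (movie_id INTEGER, director_id INTEGER);
INSERT INTO genre VALUES (1, 'Action'), (2, 'Comedy'), (3, 'Drama');
-- movie 10: Action; movie 11: Comedy; movie 12: Comedy and Drama
INSERT INTO movie_has_genre VALUES (1, 10), (2, 11), (2, 12), (3, 12);
-- director 100 made movies 10 and 11; director 200 made movie 12
INSERT INTO movie_has_director VALUES (10, 100), (11, 100), (12, 200);
""")

PAIR_QUERY = """
SELECT g1.name, g2.name, COUNT(DISTINCT md1.director_id)
FROM genre g1
JOIN genre g2 ON g2.genre_id > g1.genre_id
JOIN movie_has_genre mg1 ON mg1.genre_id = g1.genre_id
JOIN movie_has_genre mg2 ON mg2.genre_id = g2.genre_id
JOIN movie_has_director md1 ON md1.movie_id = mg1.movie_id
JOIN movie_has_director md2 ON md2.movie_id = mg2.movie_id
                           AND md2.director_id = md1.director_id
GROUP BY g1.genre_id, g2.genre_id
ORDER BY g1.genre_id, g2.genre_id
"""

# With ">" each unordered genre pair appears once.
pairs = conn.execute(PAIR_QUERY).fetchall()

# With "<>" every pair shows up in both orders.
both_orders = conn.execute(
    PAIR_QUERY.replace("g2.genre_id > g1.genre_id", "g2.genre_id <> g1.genre_id")
).fetchall()

print(pairs)
print(len(both_orders))
```

Pairs with no shared director (Action and Drama here) produce no row at all, because every join is an inner join.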
List all the actors who made a film during the 1950s and also in the 1980s.
When I query each decade on its own:
SELECT a.name
FROM movies AS m
JOIN castings AS c
JOIN actors AS a
ON m.id = c.movieid
AND c.actorid = a.id
WHERE (m.yr BETWEEN 1950 AND 1959)
or:
SELECT a.name
FROM movies AS m
JOIN castings AS c
JOIN actors AS a
ON m.id = c.movieid
AND c.actorid = a.id
WHERE (m.yr BETWEEN 1980 AND 1989)
I get results for both separate queries. However I get no rows when I combine these.
SELECT a.name
FROM movies AS m1
JOIN movies AS m2
JOIN castings AS c
JOIN actors AS a
ON m1.id = c.movieid
AND m2.id = c.movieid
AND c.actorid = a.id
AND m1.id < m2.id
WHERE (m1.yr BETWEEN 1950 AND 1959) AND (m2.yr BETWEEN 1980 AND 1989);
How can I find the names I'm looking for?
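Your combined query returns no rows because m1 and m2 are both joined to the same castings row (m1.id = c.movieid AND m2.id = c.movieid), so they must be the same movie, and a single movie cannot fall in both decades. Starting from your existing query, a straightforward approach is to use aggregation and filter with a HAVING clause: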
SELECT a.name
FROM movies AS m
JOIN castings AS c ON m.id = c.movieid
JOIN actors AS a ON c.actorid = a.id
GROUP BY a.id, a.name
HAVING
MAX(m.yr BETWEEN 1950 AND 1959) = 1
AND MAX(m.yr BETWEEN 1980 AND 1989) = 1
You can use a subquery in the WHERE clause:
SELECT
a.name
FROM
actors AS a
WHERE EXISTS
(SELECT
1
FROM
movies m,
castings c
WHERE c.actorid = a.id
AND m.id = c.movieid
AND (m.yr BETWEEN 1950
AND 1959)
LIMIT 1)
AND EXISTS
(SELECT
1
FROM
movies m,
castings c
WHERE c.actorid = a.id
AND m.id = c.movieid
AND (m.yr BETWEEN 1980
AND 1989)
LIMIT 1)
Or you can use GROUP BY as follows, which also returns the number of movies per decade:
SELECT
a.name,
SUM(IF(m.yr BETWEEN 1950 AND 1959, 1, 0)) AS e1950, -- Number of Movies in 1950s
SUM(IF(m.yr BETWEEN 1980 AND 1989, 1, 0)) AS e1980 -- Number of Movies in 1980s
FROM movies AS m
JOIN castings AS c
JOIN actors AS a
ON m.id = c.movieid
AND c.actorid = a.id
GROUP BY a.name HAVING e1950 > 0 AND e1980 > 0;
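The difference between the approaches can be checked on a tiny data set. This is a minimal sketch using Python's sqlite3 module with made-up rows (SQLite stands in for MySQL; the table and column names are taken from the question). It compares a join that ties both movie aliases to one castings row with the HAVING form and with a join that uses a separate castings row per decade.

```python
import sqlite3

# Hypothetical toy data: Ann acted in 1955 and 1984, Bob only in the 1950s.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (id INTEGER PRIMARY KEY, yr INTEGER);
CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE castings (movieid INTEGER, actorid INTEGER);
INSERT INTO movies VALUES (1, 1955), (2, 1984), (3, 1957);
INSERT INTO actors VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO castings VALUES (1, 1), (2, 1), (1, 2), (3, 2);
""")

# m1 and m2 hang off the same castings row, so they are always the same movie.
same_row = conn.execute("""
SELECT a.name
FROM castings c
JOIN movies m1 ON m1.id = c.movieid
JOIN movies m2 ON m2.id = c.movieid
JOIN actors a ON a.id = c.actorid
WHERE m1.yr BETWEEN 1950 AND 1959 AND m2.yr BETWEEN 1980 AND 1989
""").fetchall()

# Aggregation: each BETWEEN yields 0 or 1, and MAX checks "at least one movie".
with_having = conn.execute("""
SELECT a.name
FROM movies m
JOIN castings c ON m.id = c.movieid
JOIN actors a ON c.actorid = a.id
GROUP BY a.id, a.name
HAVING MAX(m.yr BETWEEN 1950 AND 1959) = 1
   AND MAX(m.yr BETWEEN 1980 AND 1989) = 1
""").fetchall()

# Self-join alternative: one castings row per decade for the same actor.
two_castings = conn.execute("""
SELECT DISTINCT a.name
FROM actors a
JOIN castings c1 ON c1.actorid = a.id
JOIN movies m1 ON m1.id = c1.movieid AND m1.yr BETWEEN 1950 AND 1959
JOIN castings c2 ON c2.actorid = a.id
JOIN movies m2 ON m2.id = c2.movieid AND m2.yr BETWEEN 1980 AND 1989
""").fetchall()

print(same_row, with_having, two_castings)
```

The HAVING form relies on the comparison evaluating to 0 or 1, which holds in MySQL and SQLite; other databases may need a CASE expression instead.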
I am trying to list every pair of actors such that the two actors have not played in any common movie genre, while the number of genres one has played in plus the number of genres the other has played in is at least 7.
I did this:
select a1.actor_id as i8opoios1,a2.actor_id as i8opoios2,((count(distinct(g1.genre_name))+count(distinct(g2.genre_name)))>=7) as result from actor as a1
inner join actor as a2 on a1.actor_id!=a2.actor_id
inner join role as r1 on a1.actor_id=r1.actor_id
inner join movie as m1 on m1.movie_id=r1.movie_id
inner join movie_has_genre as mg1 on mg1.movie_id=m1.movie_id
inner join genre as g1 on mg1.genre_id=g1.genre_id
inner join role as r2 on a2.actor_id=r2.actor_id
inner join movie as m2 on m2.movie_id=r2.movie_id
inner join movie_has_genre as mg2 on mg2.movie_id=m2.movie_id
inner join genre as g2 on mg2.genre_id=g2.genre_id
where a1.actor_id<a2.actor_id and mg1.genre_id!=mg2.genre_id
group by a1.actor_id,a2.actor_id;
This query returns all pairs of actors who have not played in any common movie genre, plus a result column that is 1 (TRUE) if together they played in 7 or more genres and 0 (FALSE) if they have not. My question: how can I return only the rows where the result is true?
Tables and their columns:
actor(actor_id,name)
role(actor_id,movie_id)
movie(movie_id,title)
movie_has_genre(movie_id,genre_id)
genre(genre_id,genre_name)
Add the condition to a HAVING clause to limit the rows. A filter on an aggregate such as COUNT cannot go in WHERE, and HAVING has to come after GROUP BY:
SELECT
a1.actor_id as i8opoios1,
a2.actor_id as i8opoios2,
IF((count(distinct(g1.genre_name))+count(distinct(g2.genre_name)))>=7,1,0) as result
FROM actor as a1
INNER JOIN actor as a2
on a1.actor_id != a2.actor_id
INNER JOIN role as r1
on a1.actor_id = r1.actor_id
INNER JOIN movie as m1
on m1.movie_id = r1.movie_id
INNER JOIN movie_has_genre as mg1
on mg1.movie_id = m1.movie_id
INNER JOIN genre as g1
on mg1.genre_id = g1.genre_id
INNER JOIN role as r2
on a2.actor_id = r2.actor_id
INNER JOIN movie as m2
on m2.movie_id = r2.movie_id
INNER JOIN movie_has_genre as mg2
on mg2.movie_id = m2.movie_id
INNER JOIN genre as g2
on mg2.genre_id = g2.genre_id
WHERE a1.actor_id < a2.actor_id
AND mg1.genre_id != mg2.genre_id
GROUP BY a1.actor_id,a2.actor_id
HAVING IF((count(distinct(g1.genre_name))+count(distinct(g2.genre_name)))>=7,1,0) = 1;
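As a smaller illustration of filtering on an aggregate, here is a minimal sketch using Python's sqlite3 module. The rows are made up, SQLite stands in for MySQL, and MIN_GENRES is a hypothetical stand-in for the threshold of 7. It counts distinct genres per actor and keeps only the groups that reach the threshold.

```python
import sqlite3

# Hypothetical toy data: actor 1 has played in 3 genres, actor 2 in 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE role (actor_id INTEGER, movie_id INTEGER);
CREATE TABLE movie_has_genre (movie_id INTEGER, genre_id INTEGER);
INSERT INTO role VALUES (1, 10), (1, 11), (2, 12);
INSERT INTO movie_has_genre VALUES (10, 1), (10, 2), (11, 3), (12, 1);
""")

MIN_GENRES = 3  # hypothetical threshold, playing the part of the 7 above

# WHERE filters rows before grouping; HAVING filters groups after aggregation,
# so a condition on COUNT(...) has to live in HAVING, after GROUP BY.
rows = conn.execute("""
SELECT r.actor_id, COUNT(DISTINCT mg.genre_id) AS genres
FROM role r
JOIN movie_has_genre mg ON mg.movie_id = r.movie_id
GROUP BY r.actor_id
HAVING COUNT(DISTINCT mg.genre_id) >= ?
""", (MIN_GENRES,)).fetchall()

print(rows)
```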
I am a newbie to SQL working on an assignment to find the actor or actress with the most appearances. A diagram of the database I'm working with is here:
Here was the query I was trying to use:
SELECT DISTINCT n.name, count(n.name)
FROM cast_info c
INNER JOIN name n
ON (n.id = c.person_id)
INNER JOIN title t
ON (c.movie_id = t.id)
CROSS JOIN role_type r
WHERE (r.role = 'actor' OR r.role = 'actress')
GROUP BY n.name
This is intended to get a count of how many times different actors showed up, which I can then sort and select the top one. But it doesn't work. Something else I did was:
SELECT n.name, count(n.name) AS amount
FROM cast_info c
INNER JOIN name n
ON (n.id = c.person_id)
INNER JOIN title t
ON (c.movie_id = t.id)
LEFT JOIN role_type r
ON c.role_id = r.id
AND (r.role = 'actor' OR r.role = 'actress')
GROUP BY amount
ORDER BY amount DESC
LIMIT 1
But that gives the error
aggregate functions are not allowed in GROUP BY
LINE 1: SELECT COUNT(*) AS total FROM (SELECT n.name, count(n.name) ...
Tips?
I am going to take a stab at each of these questions for you, because this assignment is obviously causing you some trouble.
You can find everything you need in your cast_info table and your role_type table, unless you need to display the actor's or actress's actual name.
I would start by selecting all rows that represent an actor or actress in a movie. This should be a unique combination, as a person can't be an actor in the same movie twice. Once you've done that, group by the person's id and get the count() of rows, which should effectively be the number of movies. I think the error you're getting is exactly for the reason it sounds: amount is an alias for count(n.name), and you can't group by an aggregate. A workaround is to do the counting in a subquery, then sort by the count and keep the top row.
Try this:
SELECT t.person_id, t.numMovies AS mostAppearances
FROM (SELECT c.person_id, COUNT(*) AS numMovies
FROM cast_info c
JOIN role_type r ON r.id = c.role_id
WHERE r.role = 'actor' OR r.role = 'actress'
GROUP BY c.person_id) t
ORDER BY t.numMovies DESC
LIMIT 1
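Try this. The join to role_type has to be an inner join, because a LEFT JOIN keeps every cast_info row whether or not the role matches:
SELECT n.name, count(n.name) AS amount
FROM cast_info c
INNER JOIN name n
ON n.id = c.person_id
INNER JOIN title t
ON c.movie_id = t.id
INNER JOIN role_type r
ON c.role_id = r.id
AND (r.role = 'actor' OR r.role = 'actress')
GROUP BY n.name
ORDER BY amount DESC
LIMIT 1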
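How the role filter is joined changes the counts. This is a minimal sketch using Python's sqlite3 module with made-up rows (SQLite stands in for the real database; table and column names are taken from the question). It compares an inner join on role_type with a LEFT JOIN whose ON clause carries the same filter.

```python
import sqlite3

# Hypothetical toy data: Ann has two acting credits,
# Bob has one acting credit and three directing credits.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE name (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE role_type (id INTEGER PRIMARY KEY, role TEXT);
CREATE TABLE cast_info (person_id INTEGER, movie_id INTEGER, role_id INTEGER);
INSERT INTO name VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO role_type VALUES (1, 'actor'), (2, 'actress'), (3, 'director');
INSERT INTO cast_info VALUES
  (1, 10, 2), (1, 11, 2),
  (2, 10, 1), (2, 12, 3), (2, 13, 3), (2, 14, 3);
""")

TOP_QUERY = """
SELECT n.name, COUNT(*) AS amount
FROM cast_info c
JOIN name n ON n.id = c.person_id
{join} role_type r ON r.id = c.role_id
                  AND r.role IN ('actor', 'actress')
GROUP BY n.id, n.name
ORDER BY amount DESC
LIMIT 1
"""

# Inner join: rows with any other role are dropped before counting.
inner_top = conn.execute(TOP_QUERY.format(join="JOIN")).fetchone()

# Left join: non-matching rows survive with NULLs, so every credit is counted.
left_top = conn.execute(TOP_QUERY.format(join="LEFT JOIN")).fetchone()

print(inner_top, left_top)
```

Grouping by n.id as well as n.name keeps two different people who share a name apart; that is a choice made in this sketch, not something taken from the answers above.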
My original query does its joins in the WHERE clause rather than with JOIN. I realized that movies without any stars or genres were not being returned, so I think I have to use a LEFT JOIN in order to show every movie. Here is my original SQL:
SELECT *
FROM movies m, stars s, stars_in_movies sm, genres g, genres_in_movies gm
WHERE m.id = sm.movie_id
AND sm.star_id = s.id
AND gm.genre_id = g.id
AND gm.movie_id = m.id
AND m.title LIKE '%the%'
AND s.first_name LIKE '%Ben%'
ORDER BY m.title ASC
LIMIT 5;
I tried to do a LEFT JOIN on movies, but I'm definitely doing something wrong.
SELECT *
FROM movies m, stars s, stars_in_movies sm, genres g, genres_in_movies gm
LEFT JOIN movies m1 ON m1.id = sm.movie_id
LEFT JOIN movies m2 ON m2.id = gm.movie_id
AND sm.star_id = s.id
AND gm.genre_id = g.id
ORDER BY m.title ASC
LIMIT 5;
I get ERROR 1054 (42S22): Unknown column 'sm.movie_id' in 'on clause' so clearly I'm doing the join wrong, I just don't see what it is.
Don't mix the comma operator with JOIN - they have different precedence! There is even a warning about this in the manual:
However, the precedence of the comma operator is less than that of INNER JOIN, CROSS JOIN, LEFT JOIN, and so on. If you mix comma joins with the other join types when there is a join condition, an error of the form Unknown column 'col_name' in 'on clause' may occur. Information about dealing with this problem is given later in this section.
Try this instead:
SELECT *
FROM movies m
LEFT JOIN (
stars s
JOIN stars_in_movies sm
ON sm.star_id = s.id
) ON m.id = sm.movie_id AND s.first_name LIKE '%Ben%'
LEFT JOIN (
genres g
JOIN genres_in_movies gm
ON gm.genre_id = g.id
) ON gm.movie_id = m.id
WHERE m.title LIKE '%the%'
ORDER BY m.title ASC
LIMIT 5;
Put the conditions that belong to a join in that join's ON clause. For your problem, use a LEFT JOIN for every table after movies, so that movies without stars or genres are kept (an inner JOIN after a LEFT JOIN would drop them again):
SELECT *
FROM movies m
LEFT JOIN stars_in_movies sm ON sm.movie_id = m.id
LEFT JOIN stars s ON sm.star_id = s.id
LEFT JOIN genres_in_movies gm ON gm.movie_id = m.id
LEFT JOIN genres g ON gm.genre_id = g.id
ORDER BY m.title ASC
LIMIT 5;
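One detail worth checking with chained outer joins: an inner JOIN placed after a LEFT JOIN filters the NULL-extended rows out again. This is a minimal sketch using Python's sqlite3 module with made-up rows (SQLite stands in for MySQL; only the star tables from the question are used), comparing the two forms.

```python
import sqlite3

# Hypothetical toy data: only 'The Town' has a star attached.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE stars (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE stars_in_movies (star_id INTEGER, movie_id INTEGER);
INSERT INTO movies VALUES (1, 'The Town'), (2, 'The Void');
INSERT INTO stars VALUES (1, 'Ben');
INSERT INTO stars_in_movies VALUES (1, 1);
""")

QUERY = """
SELECT m.title, s.first_name
FROM movies m
LEFT JOIN stars_in_movies sm ON sm.movie_id = m.id
{join} stars s ON s.id = sm.star_id
ORDER BY m.title
"""

# Inner JOIN after LEFT JOIN: sm.star_id is NULL for 'The Void',
# so the row cannot match and is lost.
mixed = conn.execute(QUERY.format(join="JOIN")).fetchall()

# LEFT JOIN all the way: the movie without stars is kept with NULLs.
all_left = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()

print(mixed)
print(all_left)
```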
This may be ugly, but here is a way that works. Beware: many people warn against this kind of hack.
SELECT *
FROM movies m, stars_in_movies sm LEFT JOIN movies m1 ON m1.id = sm.movie_id, stars s
ORDER BY m.title ASC
LIMIT 5;
When using joins, you must join against the table that actually has the columns you are comparing.
SQL Join (inner join in MySQL)
select emp1.id,emp1.name,emp1.job from (select id, type as name, description as job from component_type as emp1)emp1
inner join
emp
on emp1.id=emp.id;
Left Join
select emp1.id,emp1.name,emp1.job from (select id, type as name, description as job from component_type as emp1 where id between '1' AND '5')emp1
left join
emp
on emp1.id=emp.id;
Right Join
select emp1.id,emp1.name,emp1.job from (select id, type as name, description as job from component_type as emp1)emp1
Right join
(select * from emp where id between '1' and '5')exe
on emp1.id=exe.id;
Using aliases to connect several tables without the JOIN keyword:
select sum(s.salary_amount) as total_expenses_paid_to_all_department
from salary_mas_tbl s,dept_mas_tbl d
where s.salary_dept=d.dept_id;
So let's say I'm trying to find actors who are in two movies together (for the purpose of a degrees of separation page). I have tables as such (this is just some made-up data):
actors
id  first_name  last_name  gender
17  brad        pitt       m
2   kevin       bacon      m
movies
id  name             year
20  benjamin button  2008
roles
a_id  m_id  role
17    20    Mr. Benjamin Button
So I want to return the names of the movies which both actors are in. I have the first and last names of two actors.
I'm having a lot of trouble getting this to work. What I'm having trouble with, specifically, is the SELECT part
SELECT name FROM movies JOIN . . .
I'm starting with first_name and last_name values for each
You must join twice:
SELECT m.name movie_name
FROM movies m join roles r1 on
r1.m_id = m.id join actors a1 on
r1.a_id = a1.id join roles r2 on
r2.m_id = m.id join actors a2 on
r2.a_id = a2.id
WHERE
a1.first_name = 'brad' and a1.last_name = 'pitt' and
a2.first_name = 'kevin' and a2.last_name = 'bacon'
Show all actor combinations per film:
SELECT m.name movie_name, a1.id actor1, a2.id actor2
FROM movies m join roles r1 on
r1.m_id = m.id join actors a1 on
r1.a_id = a1.id join roles r2 on
r2.m_id = m.id join actors a2 on
r2.a_id = a2.id
WHERE
a1.id < a2.id
The < ensures that each combination is only reported once.
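The double join can be wrapped in a small helper that takes the two names as parameters. This is a minimal sketch using Python's sqlite3 module. The function name shared_movies and the extra movie row are hypothetical, added only so that the toy data contains a film with both actors; the table layout follows the question.

```python
import sqlite3

# Hypothetical toy data based on the question; movie 21 is made up
# so that the two actors have one film in common.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actors (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE movies (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE roles (a_id INTEGER, m_id INTEGER);
INSERT INTO actors VALUES (17, 'brad', 'pitt'), (2, 'kevin', 'bacon');
INSERT INTO movies VALUES (20, 'benjamin button'), (21, 'example film');
INSERT INTO roles VALUES (17, 20), (17, 21), (2, 21);
""")


def shared_movies(conn, first1, last1, first2, last2):
    """Return names of movies in which both named actors have a role."""
    sql = """
    SELECT m.name
    FROM movies m
    JOIN roles r1 ON r1.m_id = m.id
    JOIN actors a1 ON a1.id = r1.a_id
    JOIN roles r2 ON r2.m_id = m.id
    JOIN actors a2 ON a2.id = r2.a_id
    WHERE a1.first_name = ? AND a1.last_name = ?
      AND a2.first_name = ? AND a2.last_name = ?
    ORDER BY m.name
    """
    # Placeholders keep the actor names out of the SQL string itself.
    return [row[0] for row in conn.execute(sql, (first1, last1, first2, last2))]


shared = shared_movies(conn, "brad", "pitt", "kevin", "bacon")
print(shared)
```

Each actor gets its own roles alias (r1, r2), which is what lets a single movie row match two different people at once.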
select m.name,group_concat(concat_ws(' ',a.first_name,a.last_name) order by a.last_name) as actors
from actors as a
inner join roles as r on a.id = r.a_id
inner join movies as m on m.id = r.m_id
where r.a_id in (2,17)
group by r.m_id
having count(r.a_id) = 2
order by m.name
declare @FirstActorID int,
@SecondActorID int;
select m.[name]
from
movies m
inner join [roles] r1 on r1.m_id = m.id and r1.a_id = @FirstActorID
inner join [roles] r2 on r2.m_id = m.id and r2.a_id = @SecondActorID