Selecting from multiple tables in MySQL - mysql

I want to run a search query across my database tables. Here are my example tables.
movies
--------------------------
ID (INT) | TITLE (VARCHAR)
--------------------------
actors
-------------------------
ID (INT) | NAME (VARCHAR)
-------------------------
directors
-------------------------
ID (INT) | NAME (VARCHAR)
-------------------------
ma
------------------------------------
ID (INT) | MOVIE (INT) | ACTOR (INT)
------------------------------------
md
---------------------------------------
ID (INT) | MOVIE (INT) | DIRECTOR (INT)
---------------------------------------
I want to search movies.title, actors.name and directors.name for a keyword. The ma table has a movie field that references movies.id and an actor field that references actors.id; the md table links movies and directors in the same way. I want to display the results grouped by movies.id. The problem is that I can't figure out how to search the actors and directors tables through ma and md, which reference the movies table.
Any ideas?

This one returns every movie/actor/director combination (the join columns follow the ma and md tables in the question):
SELECT m.title, a.name AS actor, d.name AS director
FROM movies m
INNER JOIN ma ON (ma.movie = m.id)
INNER JOIN actors a ON (a.id = ma.actor)
INNER JOIN md ON (md.movie = m.id)
INNER JOIN directors d ON (d.id = md.director)
ORDER BY m.title
This one groups by movie (DISTINCT keeps a name from repeating when a movie has several actors and several directors):
SELECT m.title, GROUP_CONCAT(DISTINCT a.name) AS actors, GROUP_CONCAT(DISTINCT d.name) AS directors
FROM movies m
INNER JOIN ma ON (ma.movie = m.id)
INNER JOIN actors a ON (a.id = ma.actor)
INNER JOIN md ON (md.movie = m.id)
INNER JOIN directors d ON (d.id = md.director)
GROUP BY m.id
ORDER BY m.title
[EDIT]
To search, just add a WHERE clause after the joins and before the GROUP BY:
...
WHERE m.title LIKE 'KEYWORD%' OR a.name LIKE 'KEYWORD%' OR d.name LIKE 'KEYWORD%'
...
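Putting the pieces together, here is a minimal sketch of the grouped search. It assumes the column names from the question (ma.movie, ma.actor, md.movie, md.director) and uses Python's built-in sqlite3 module as a stand-in for MySQL, since SQLite also provides GROUP_CONCAT. The sample rows and the search_movies helper are made up for illustration.

```python
import sqlite3

def search_movies(conn, keyword):
    """Return (title, actors, directors) for movies whose title, actor
    or director name starts with keyword (hypothetical helper)."""
    pattern = keyword + "%"
    sql = """
        SELECT m.title,
               GROUP_CONCAT(DISTINCT a.name) AS actors,
               GROUP_CONCAT(DISTINCT d.name) AS directors
        FROM movies m
        INNER JOIN ma ON ma.movie = m.id
        INNER JOIN actors a ON a.id = ma.actor
        INNER JOIN md ON md.movie = m.id
        INNER JOIN directors d ON d.id = md.director
        WHERE m.title LIKE ? OR a.name LIKE ? OR d.name LIKE ?
        GROUP BY m.id
        ORDER BY m.title
    """
    return conn.execute(sql, (pattern, pattern, pattern)).fetchall()

# In-memory database with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE directors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ma (id INTEGER PRIMARY KEY, movie INT, actor INT);
    CREATE TABLE md (id INTEGER PRIMARY KEY, movie INT, director INT);
    INSERT INTO movies VALUES (1, 'Alien'), (2, 'Heat');
    INSERT INTO actors VALUES (1, 'Sigourney Weaver'), (2, 'Tom Skerritt'), (3, 'Al Pacino');
    INSERT INTO directors VALUES (1, 'Ridley Scott'), (2, 'Michael Mann');
    INSERT INTO ma (movie, actor) VALUES (1, 1), (1, 2), (2, 3);
    INSERT INTO md (movie, director) VALUES (1, 1), (2, 2);
""")

print(search_movies(conn, "Ali"))   # matches on the movie title
print(search_movies(conn, "Mich"))  # matches on a director name
print(search_movies(conn, "Sig"))   # matches on an actor name
```

One thing to be aware of: the WHERE clause filters joined rows before they are grouped, so when the keyword matches only one actor of a movie, the actors column lists just that actor rather than the full cast.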

Related

Querying from IMDB Database using MySQL [closed]

I wrote a SQL query to answer the following question:
Find all the actors that made more movies with Yash Chopra than any other director in the IMDB database.
Sample schema:
person
(pid *
,name
);
m_cast
(mid *
,pid *
);
m_director
(mid*
,pid*
);
* = (component of) PRIMARY KEY
Following is my query:
WITH common_actors AS
(SELECT A.actor_id as actors, B.director_id as director_id, B.movies as movies_with_director,
B.director_id as yash_chops_id, B.movies as movies_with_yash_chops FROM
(SELECT M_Cast.PID as actor_id, M_Director.PID as director_id, COUNT(*) as movies from M_Cast
left join M_Director
ON M_Cast.MID = M_Director.MID
GROUP BY actor_id, director_id) A
JOIN
(SELECT M_Cast.PID as actor_id, M_Director.PID as director_id, COUNT(*) as movies from M_Cast
left join M_Director
ON M_Cast.MID = M_Director.MID
GROUP BY actor_id, director_id
)B
ON A.actor_id = B.actor_id
WHERE B.director_id in (SELECT PID FROM Person WHERE Name LIKE
'%Yash%Chopra%'))
SELECT distinct actors as actor_id, movies_with_yash_chops as total_movies FROM common_actors
WHERE actors NOT IN (SELECT actors FROM common_actors WHERE movies_with_director > movies_with_yash_chops)
This query returns 430 rows, but the expected result has 243 rows. Could anyone suggest where my query goes wrong? Is my approach right?
Sample result:
Actor name
0 Sharib Hashmi
1 Kulbir Badesron
2 Gurdas Maan
3 Parikshat Sahni
...
242 Ramlal Shyamlal
Thanks in advance!
Consider the following:
DROP TABLE IF EXISTS person;
CREATE TABLE person
(person_id SERIAL PRIMARY KEY
,name VARCHAR(20) NOT NULL UNIQUE
);
DROP TABLE IF EXISTS movie;
CREATE TABLE movie
(movie_id SERIAL PRIMARY KEY
,title VARCHAR(50) NOT NULL UNIQUE
);
DROP TABLE IF EXISTS m_cast;
CREATE TABLE m_cast
(movie_id INT NOT NULL
,person_id INT NOT NULL
,PRIMARY KEY(movie_id,person_id)
);
DROP TABLE IF EXISTS m_director;
CREATE TABLE m_director
(movie_id INT NOT NULL
,person_id INT NOT NULL
,PRIMARY KEY(movie_id,person_id)
);
INSERT INTO person (name) VALUES
('Steven Feelberg'),
('Manly Kubrick'),
('Alfred Spatchcock'),
('Fred Pitt'),
('Raphael DiMaggio'),
('Bill Smith');
INSERT INTO movie VALUES
(1,'Feelberg\'s Movie with Fred & Raph'),
(2,'Feelberg and Fred Ride Again'),
(3,'Kubrick shoots DiMaggio'),
(4,'Kubrick\'s Movie with Bill Smith'),
(5,'Spatchcock Presents Bill Smith');
INSERT INTO m_director VALUES
(1,1),
(2,1),
(3,2),
(4,2),
(5,3);
INSERT INTO m_cast VALUES
(1,4),
(1,5),
(2,4),
(3,5),
(4,6),
(5,6);
I've included the movie table only for ease of reference. It's not relevant to the actual problem.
Also, note that this model assumes that cast members are only listed once, regardless of whether or not they have multiple roles in a given film.
The following query asks 'how often have each actor and director worked together'...
An actor is any person who has been a cast member of any movie.
A director is any person who has been a director of any movie.
SELECT a.name actor
, d.name director
, COUNT(DISTINCT ma.movie_id) total
FROM person d
JOIN m_director md
ON md.person_id = d.person_id
JOIN person a
LEFT
JOIN m_cast ma
ON ma.person_id = a.person_id
AND ma.movie_id = md.movie_id
JOIN m_cast x
ON x.person_id = a.person_id
GROUP
BY actor
, director;
+-------------------+-------------------+-------+
| actor             | director          | total |
+-------------------+-------------------+-------+
| Fred Pitt         | Alfred Spatchcock |     0 |
| Fred Pitt         | Manly Kubrick     |     0 |
| Fred Pitt         | Steven Feelberg   |     2 |
| Raphael DiMaggio  | Alfred Spatchcock |     0 |
| Raphael DiMaggio  | Manly Kubrick     |     1 |
| Raphael DiMaggio  | Steven Feelberg   |     1 |
| Bill Smith        | Alfred Spatchcock |     1 |
| Bill Smith        | Manly Kubrick     |     1 |
| Bill Smith        | Steven Feelberg   |     0 |
+-------------------+-------------------+-------+
By observation, we can see that:
the only actor to work more often with Feelberg than any other director is Fred Pitt
Raphael DiMaggio and Bill Smith have both worked equally often with two directors (albeit different directors)
EDIT: While I'm not seriously advocating this as a solution, the following is simply to demonstrate that the kernel provided above is really all you need to solve the problem...
SELECT x.*
FROM
( SELECT a.*
FROM
( SELECT a.name actor
, d.name director
, COUNT(DISTINCT ma.movie_id) total
FROM person d
JOIN m_director md
ON md.person_id = d.person_id
JOIN person a
LEFT
JOIN m_cast ma
ON ma.person_id = a.person_id
AND ma.movie_id = md.movie_id
JOIN m_cast x
ON x.person_id = a.person_id
GROUP
BY actor
, director
) a
LEFT
JOIN
( SELECT a.name actor
, d.name director
, COUNT(DISTINCT ma.movie_id) total
FROM person d
JOIN m_director md
ON md.person_id = d.person_id
JOIN person a
LEFT
JOIN m_cast ma
ON ma.person_id = a.person_id
AND ma.movie_id = md.movie_id
JOIN m_cast x
ON x.person_id = a.person_id
GROUP
BY actor
, director
) b
ON b.actor = a.actor
AND b.director <> a.director
AND b.total > a.total
WHERE b.actor IS NULL
) x
LEFT JOIN
( SELECT a.*
FROM
( SELECT a.name actor
, d.name director
, COUNT(DISTINCT ma.movie_id) total
FROM person d
JOIN m_director md
ON md.person_id = d.person_id
JOIN person a
LEFT
JOIN m_cast ma
ON ma.person_id = a.person_id
AND ma.movie_id = md.movie_id
JOIN m_cast x
ON x.person_id = a.person_id
GROUP
BY actor
, director
) a
LEFT
JOIN
( SELECT a.name actor
, d.name director
, COUNT(DISTINCT ma.movie_id) total
FROM person d
JOIN m_director md
ON md.person_id = d.person_id
JOIN person a
LEFT
JOIN m_cast ma
ON ma.person_id = a.person_id
AND ma.movie_id = md.movie_id
JOIN m_cast x
ON x.person_id = a.person_id
GROUP
BY actor
, director
) b
ON b.actor = a.actor
AND b.director <> a.director
AND b.total > a.total
WHERE b.actor IS NULL
) y
ON y.actor = x.actor AND y.director <> x.director
WHERE y.actor IS NULL;
+-----------+-----------------+-------+
| actor     | director        | total |
+-----------+-----------------+-------+
| Fred Pitt | Steven Feelberg |     2 |
+-----------+-----------------+-------+
This returns a list of every actor who has a single most-frequent director, together with that director. In this case, because Bill Smith and Raphael DiMaggio have each worked equally often with two directors, they are excluded from the result.
The answer to your problem is simply to select from this list all rows with Yash Chopra listed as the director.
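As a minimal sketch of that last step, the following uses Python's built-in sqlite3 module (standing in for MySQL) with the toy data from this answer. It is a more compact formulation than the nested query above, not the same query: it counts movies per actor/director pair in a CTE, then keeps the actors whose count with the chosen director is strictly greater than their count with every other director. The favourite_actors helper is hypothetical, and 'Steven Feelberg' stands in for Yash Chopra.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE m_cast (movie_id INT NOT NULL, person_id INT NOT NULL,
                         PRIMARY KEY (movie_id, person_id));
    CREATE TABLE m_director (movie_id INT NOT NULL, person_id INT NOT NULL,
                             PRIMARY KEY (movie_id, person_id));
    INSERT INTO person (name) VALUES
        ('Steven Feelberg'), ('Manly Kubrick'), ('Alfred Spatchcock'),
        ('Fred Pitt'), ('Raphael DiMaggio'), ('Bill Smith');
    INSERT INTO m_director VALUES (1,1), (2,1), (3,2), (4,2), (5,3);
    INSERT INTO m_cast VALUES (1,4), (1,5), (2,4), (3,5), (4,6), (5,6);
""")

def favourite_actors(conn, director_name):
    """Actors who made strictly more movies with director_name
    than with any other director (hypothetical helper)."""
    sql = """
        WITH pair AS (
            -- movies made together, per (actor, director) pair
            SELECT c.person_id AS actor_id,
                   d.person_id AS director_id,
                   COUNT(*)    AS total
            FROM m_cast c
            JOIN m_director d ON d.movie_id = c.movie_id
            GROUP BY c.person_id, d.person_id
        )
        SELECT a.name, p.total
        FROM pair p
        JOIN person a   ON a.person_id = p.actor_id
        JOIN person dir ON dir.person_id = p.director_id
        WHERE dir.name = ?
          AND NOT EXISTS (
              -- no other director with an equal or higher count
              SELECT 1
              FROM pair o
              WHERE o.actor_id = p.actor_id
                AND o.director_id <> p.director_id
                AND o.total >= p.total
          )
        ORDER BY a.name
    """
    return conn.execute(sql, (director_name,)).fetchall()

print(favourite_actors(conn, "Steven Feelberg"))
print(favourite_actors(conn, "Manly Kubrick"))
```

Using >= in the NOT EXISTS test excludes ties, which matches the "more than any other director" wording; switching it to > would keep actors who are tied between two directors.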

Correctly join multiple many-to-many tables - MySQL query

A seemingly generic SQL query has left me clueless.
Here's the case.
I have 3 generic tables (simplified versions here):
Movie
id | title
-----------------------
1 | Evil Dead
-----------------------
2 | Bohemian Rhapsody
....
Genre
id | title
-----------------------
1 | Horror
-----------------------
2 | Comedy
....
Rating
id | title
-----------------------
1 | PG-13
-----------------------
2 | R
....
And 2 many-to-many tables to connect them:
Movie_Genre
movie_id | genre_id
Movie_Rating
movie_id | rating_id
The initial challenge was to write a query which allows me to fetch movies that belong to multiple genres (e.g. horror comedies or sci-fi action).
Thankfully, I was able to find this solution here
MySQL: Select records where joined table matches ALL values
However, what would be the correct option to fetch records that belong to multiple many-to-many tables? E.g. rated R horror comedies. Is there any way to do so without subquery (or a single one only)?
One method uses correlated subqueries:
select m.*
from movie m
where (select count(*)
from movie_genre mg
where mg.movie_id = m.id
) > 1 and
(select count(*)
from movie_rating mr
where mr.movie_id = m.id
) > 1 ;
With indexes on movie_genre(movie_id) and movie_rating(movie_id) this probably has quite reasonable performance.
The above is possibly the most efficient method. However, if you wanted to avoid subqueries, one method would be:
select mg.movie_id
from movie_genre mg join
movie_rating mr
on mg.movie_id = mr.movie_id
group by mg.movie_id
having count(distinct mg.genre_id) > 1 and
count(distinct mr.rating_id) > 1;
More efficient than the above is aggregating before the join:
select mg.movie_id
from (select movie_id
from movie_genre
group by movie_id
having count(*) >= 2
) mg join
(select movie_id
from movie_rating
group by movie_id
having count(*) >= 2
) mr
on mg.movie_id = mr.movie_id;
Although you state that you want to avoid subqueries, the irony is that the version with no subqueries probably has the worst performance of these three options.
E.g. rated R horror comedies
You can join all the tables together, aggregate by movie and filter with a HAVING clause:
select m.id, m.title
from movie m
inner join movie_genre mg on mg.movie_id = m.id
inner join genre g on g.id = mg.genre_id
inner join movie_rating mr on mr.movie_id = m.id
inner join rating r on r.id = mr.rating_id
group by m.id, m.title
having
max(r.title = 'R') = 1
and max(g.title = 'Horror') = 1
and max(g.title = 'Comedy') = 1
You can also use a couple of exists conditions along with correlated subqueries:
select m.*
from movie m
where
exists (
select 1
from movie_genre mg
inner join genre g on g.id = mg.genre_id
where mg.movie_id = m.id and g.title = 'Horror'
)
and exists (
select 1
from movie_genre mg
inner join genre g on g.id = mg.genre_id
where mg.movie_id = m.id and g.title = 'Comedy'
)
and exists (
select 1
from movie_rating mr
inner join rating r on r.id = mr.rating_id
where mr.movie_id = m.id and r.title = 'R'
)
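For a quick check of the join-and-HAVING approach, here is a minimal sketch using Python's built-in sqlite3 module in place of MySQL (SQLite also evaluates a comparison such as r.title = 'R' to 0 or 1, so MAX(...) = 1 behaves the same way). Table names follow the question's schema; the extra movies and all genre and rating assignments are invented sample data, and find_movies is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie  (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE genre  (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE rating (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE movie_genre  (movie_id INT, genre_id INT);
    CREATE TABLE movie_rating (movie_id INT, rating_id INT);
    INSERT INTO movie  VALUES (1, 'Evil Dead'), (2, 'Bohemian Rhapsody'),
                              (3, 'Sample Horror Comedy'), (4, 'Sample Teen Spoof');
    INSERT INTO genre  VALUES (1, 'Horror'), (2, 'Comedy');
    INSERT INTO rating VALUES (1, 'PG-13'), (2, 'R');
    -- invented assignments, for illustration only
    INSERT INTO movie_genre  VALUES (1, 1), (2, 2), (3, 1), (3, 2), (4, 1), (4, 2);
    INSERT INTO movie_rating VALUES (1, 2), (2, 1), (3, 2), (4, 1);
""")

def find_movies(conn, rating, genre_a, genre_b):
    """Movies that carry the given rating and both genres (hypothetical helper)."""
    sql = """
        SELECT m.id, m.title
        FROM movie m
        INNER JOIN movie_genre mg  ON mg.movie_id = m.id
        INNER JOIN genre g         ON g.id = mg.genre_id
        INNER JOIN movie_rating mr ON mr.movie_id = m.id
        INNER JOIN rating r        ON r.id = mr.rating_id
        GROUP BY m.id, m.title
        HAVING MAX(r.title = ?) = 1   -- at least one joined row has the rating
           AND MAX(g.title = ?) = 1   -- at least one joined row has genre A
           AND MAX(g.title = ?) = 1   -- at least one joined row has genre B
        ORDER BY m.id
    """
    return conn.execute(sql, (rating, genre_a, genre_b)).fetchall()

print(find_movies(conn, "R", "Horror", "Comedy"))
```

Each HAVING condition asks whether any row in the movie's group satisfies the comparison, which is why a single pass over the joined rows is enough.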

GROUP BY aggregate by column

I've run into an issue where every time I attempt to use GROUP BY, H2 informs me that I need to add certain column names into the GROUP BY clause because, based on my research, it's unclear to H2 how to sort columns with non-repeating data.
Here's an example to elaborate:
Person table
+------------+------------+
| ID | Name |
+============+============+
| 1 | John |
+------------+------------+
| 2 | Jane |
+------------+------------+
Pet table
+------------+------------+------------+------------+
| ID | PERSON_ID | NAME | BIRTHDATE |
+============+============+============+============+
| 1 | 1 | Rufus | 2012 |
+------------+------------+------------+------------+
| 2 | 1 | Ben | 2014 |
+------------+------------+------------+------------+
Let's say I want all the oldest pets belonging to John.
SELECT PERSON.NAME, PET.NAME, PET.BIRTHDATE FROM PERSON
INNER JOIN PET ON PET.PERSON_ID = PERSON.ID
GROUP BY PERSON.NAME
ORDER BY PET.BIRTHDATE ASC
This would work perfectly in MySQL because it will simply group by PERSON.NAME and, by default, select the first record in the set. However, in H2 it needs to have aggregation such as MAX, MIN, etc.
The problem, as you can see in this example, is that you could use MIN to get the BIRTHDATE ordered correctly but there does not appear to be any aggregation function available for sorting NAME based on the oldest BIRTHDATE?
If you want the oldest pets, I would recommend:
SELECT p.NAME, pt.NAME, pt.BIRTHDATE
FROM PERSON p INNER JOIN
PET pt
ON pt.PERSON_ID = p.ID
WHERE pt.BIRTHDATE = (SELECT MIN(pt2.BIRTHDATE)
FROM pet pt2
WHERE pt2.PERSON_ID = PT.PERSON_ID
);
This explicitly selects the pet or pets (for each person) that have the earliest birth year. No aggregation is necessary.
You can also phrase this with JOINs only in the FROM:
SELECT p.NAME, pt.NAME, pt.BIRTHDATE
FROM PERSON p INNER JOIN
PET pt
ON pt.PERSON_ID = p.ID JOIN
(SELECT PERSON_ID, MIN(pt2.BIRTHDATE) as MINBT
FROM pet pt2
GROUP BY pt2.PERSON_ID
) pt2
ON pt2.PERSON_ID = pt.PERSON_ID AND pt2.MINBT = pt.BIRTHDATE;
You can always resort to NOT EXISTS in such cases: if the person has no pet with an earlier birthdate, then that pet is the oldest. If two pets have the same birthdate and both are the oldest for that person, both are selected:
SELECT p.NAME, q.NAME, q.BIRTHDATE
FROM PERSON p
INNER JOIN PET q ON q.PERSON_ID = p.ID AND NOT EXISTS (
SELECT * FROM PET WHERE PERSON_ID = p.ID AND BIRTHDATE < q.BIRTHDATE
)
ORDER BY q.BIRTHDATE ASC
If you insist on GROUP BY you can do it like this:
SELECT a.name, b.name, b.BIRTHDATE FROM (
SELECT p.id, MIN(q.BIRTHDATE) birthdate FROM PERSON p
INNER JOIN PET q ON q.PERSON_ID = p.ID
GROUP BY p.ID
) o INNER JOIN PERSON a ON a.ID = o.ID
INNER JOIN PET b ON b.PERSON_ID = a.ID AND b.BIRTHDATE = o.BIRTHDATE
ORDER BY b.BIRTHDATE
If you can use WITH (a common table expression), the query can be written more simply.
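A minimal sketch of what the WITH form might look like, run here through Python's built-in sqlite3 module rather than H2 (this assumes the target database accepts the same common table expression syntax). Jane's two pets are invented rows added to show how ties are handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pet (id INTEGER PRIMARY KEY, person_id INT, name TEXT, birthdate INT);
    INSERT INTO person VALUES (1, 'John'), (2, 'Jane');
    INSERT INTO pet VALUES (1, 1, 'Rufus', 2012), (2, 1, 'Ben', 2014);
    -- invented rows: Jane has two pets born in the same year
    INSERT INTO pet VALUES (3, 2, 'Milo', 2015), (4, 2, 'Luna', 2015);
""")

OLDEST_PETS = """
    WITH oldest AS (
        -- earliest birth year per owner
        SELECT person_id, MIN(birthdate) AS birthdate
        FROM pet
        GROUP BY person_id
    )
    SELECT p.name, pt.name, pt.birthdate
    FROM person p
    INNER JOIN pet pt  ON pt.person_id = p.id
    INNER JOIN oldest o ON o.person_id = pt.person_id
                       AND o.birthdate = pt.birthdate
    ORDER BY pt.birthdate, pt.name
"""

oldest_pets = conn.execute(OLDEST_PETS).fetchall()
print(oldest_pets)
```

The CTE does the aggregation once, and the outer query joins back to PET on both the owner and the minimum birthdate, so every selected column comes from a real row and no extra GROUP BY columns are needed.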

MySQL many-to-many junction table: Select all entries from A which contain no values in B not in list

I have a many-to-many relationship, joined through a junction table. My specific case is recipes and ingredients. I want to select all recipes which don't contain ingredients not in a given list. For example, if I input cheese, toast and crackers, I want the results to include cheese on toast, cheese with crackers, but not jam on toast.
So something like:
SELECT * FROM
recipe
JOIN recipe_ingredient on recipe.id = recipe_ingredient.recipe_id
JOIN ingredient on ingredient.id = recipe_ingredient.ingredient_id
WHERE ingredient.name
???
("cheese", "toast", "crackers")
Selecting recipes which do contain any or all of these ingredients is easy enough, but if it can be avoided I don't want to have to then subsequently filter out results which contain unlisted ingredients.
Edit:
Some example tables:
ingredient
-----------
id | name
1 | "cheese"
2 | "toast"
3 | "crackers"
4 | "jam"
recipe
-----------
id | name
1 | "cheese on toast"
2 | "cheese with crackers"
3 | "jam on toast"
recipe_ingredient
-------------------------
recipe_id | ingredient_id
1 | 1
1 | 2
2 | 1
2 | 3
3 | 2
3 | 4
One way to achieve this is to first select the recipes that have any ingredient not in your list, using <> ALL with a subquery:
SELECT r.id
FROM recipe r
JOIN recipe_ingredient ri on r.id = ri.recipe_id
JOIN ingredient i on i.id = ri.ingredient_id
WHERE i.name <> ALL ( SELECT 'cheese' UNION SELECT 'toast' UNION SELECT 'crackers' )
GROUP BY r.id
To retrieve only those recipes that match your conditions, you could wrap the above statement using the very same <> ALL comparison.
SELECT *
FROM recipe
WHERE id <> ALL (
SELECT r.id
FROM recipe r
JOIN recipe_ingredient ri on r.id = ri.recipe_id
JOIN ingredient i on i.id = ri.ingredient_id
WHERE i.name <> ALL ( SELECT 'cheese' UNION SELECT 'toast' UNION SELECT 'crackers' )
GROUP BY r.id
);
Additional note: Actually NOT IN is an alias for <> ALL, so you could use them interchangeably.
Given your sample it would only return:
id | name
---|-------------------------
1 | cheese on toast
2 | cheese with crackers
See it working here: http://sqlfiddle.com/#!9/f20010/25
This one is pretty tricky, but it can be done. The tricky part is to find out how many ingredients each recipe has and then compare that with the number of its ingredients that appear in the given list.
select tb.name
from ( select r.id, r.name, count(*) qtd
from ingredient i
inner join recipe_ingredient ri
on i.id = ri.ingredient_id
inner join recipe r
on r.id = ri.recipe_id
where i.name in ('cheese', 'toast', 'crackers')
group by r.id, r.name
) tb
where exists ( select 1
from ingredient i
inner join recipe_ingredient ri
on i.id = ri.ingredient_id
inner join recipe rr
on rr.id = ri.recipe_id
where rr.id = tb.id
group by rr.id
having count(*) = tb.qtd)
First I selected all recipes that contain any of the listed ingredients, counting how many of those ingredients each one has. That first query gives me:
"cheese on toast" 2
"cheese with crackers" 2
"jam on toast" 1
In the EXISTS clause, I used a subquery to count the total number of ingredients in each recipe and correlated it with the outer subquery. So it only returns the recipes in which every ingredient is on the list.
See it working here: http://sqlfiddle.com/#!9/f20010/22
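The same counting idea can be sketched in a single grouped query: a recipe qualifies when its total number of ingredients equals the number of its ingredients that are on the list. This is a variation on the answer above, not the answer's own query, run through Python's built-in sqlite3 module with the sample tables from the question. The recipes_using_only helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ingredient (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE recipe (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE recipe_ingredient (recipe_id INT, ingredient_id INT);
    INSERT INTO ingredient VALUES (1, 'cheese'), (2, 'toast'), (3, 'crackers'), (4, 'jam');
    INSERT INTO recipe VALUES (1, 'cheese on toast'), (2, 'cheese with crackers'),
                              (3, 'jam on toast');
    INSERT INTO recipe_ingredient VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 2), (3, 4);
""")

def recipes_using_only(conn, allowed):
    """Recipes whose ingredients are all in `allowed` (hypothetical helper)."""
    if not allowed:
        return []
    # One "?" placeholder per allowed ingredient name.
    placeholders = ", ".join("?" * len(allowed))
    sql = f"""
        SELECT r.id, r.name
        FROM recipe r
        INNER JOIN recipe_ingredient ri ON ri.recipe_id = r.id
        INNER JOIN ingredient i ON i.id = ri.ingredient_id
        GROUP BY r.id, r.name
        -- total ingredients = ingredients that are on the list
        HAVING COUNT(*) = SUM(CASE WHEN i.name IN ({placeholders}) THEN 1 ELSE 0 END)
        ORDER BY r.id
    """
    return conn.execute(sql, list(allowed)).fetchall()

print(recipes_using_only(conn, ["cheese", "toast", "crackers"]))
```

Because of the inner joins, a recipe with no ingredient rows at all never appears in the result; whether such a recipe should count as a match is a design decision this sketch leaves out.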

Joining tables in sql and obtain the data as per the query

I have three tables (movie, actor, casting). I want to get the names of the actors for the id returned by this query.
select id from movie where title ='Casablanca';
My tables:
Movie     | Actor     | casting
________    ________    ________
Movieid     Actorid     Movieid
title       name        Actorid
yr                      ord
director
budget
gross
This should do it:
SELECT a.name
FROM movie m
INNER JOIN casting c
ON m.id = c.movieid
INNER JOIN actor a
ON c.actorid = a.id
WHERE m.title = 'Casablanca';
Try this:
SELECT a.id, a.name
FROM actor a
INNER JOIN casting c
ON a.id = c.actorid
INNER JOIN movie m
ON c.movieid = m.id
WHERE m.title ='Casablanca';