Querying from IMDB Database using MySQL [closed] - mysql

I wrote a SQL query to answer the following question:
Find all the actors that made more movies with Yash Chopra than with any other director in the IMDB database.
Sample schema:
person
(pid *
,name
);
m_cast
(mid *
,pid *
);
m_director
(mid*
,pid*
);
* = (component of) PRIMARY KEY
Following is my query:
WITH common_actors AS
(SELECT A.actor_id as actors, B.director_id as director_id, B.movies as movies_with_director,
B.director_id as yash_chops_id, B.movies as movies_with_yash_chops FROM
(SELECT M_Cast.PID as actor_id, M_Director.PID as director_id, COUNT(*) as movies from M_Cast
left join M_Director
ON M_Cast.MID = M_Director.MID
GROUP BY actor_id, director_id) A
JOIN
(SELECT M_Cast.PID as actor_id, M_Director.PID as director_id, COUNT(*) as movies from M_Cast
left join M_Director
ON M_Cast.MID = M_Director.MID
GROUP BY actor_id, director_id
)B
ON A.actor_id = B.actor_id
WHERE B.director_id in (SELECT PID FROM Person WHERE Name LIKE
'%Yash%Chopra%'))
SELECT distinct actors as actor_id, movies_with_yash_chops as total_movies FROM common_actors
WHERE actors NOT IN (SELECT actors FROM common_actors WHERE movies_with_director > movies_with_yash_chops)
This query returns 430 rows, but the expected result has 243 rows. Could anyone please suggest where I went wrong in my query? Is my approach right?
Sample result:
Actor name
0 Sharib Hashmi
1 Kulbir Badesron
2 Gurdas Maan
3 Parikshat Sahni
...
242 Ramlal Shyamlal
Thanks in advance!

Consider the following:
DROP TABLE IF EXISTS person;
CREATE TABLE person
(person_id SERIAL PRIMARY KEY
,name VARCHAR(20) NOT NULL UNIQUE
);
DROP TABLE IF EXISTS movie;
CREATE TABLE movie
(movie_id SERIAL PRIMARY KEY
,title VARCHAR(50) NOT NULL UNIQUE
);
DROP TABLE IF EXISTS m_cast;
CREATE TABLE m_cast
(movie_id INT NOT NULL
,person_id INT NOT NULL
,PRIMARY KEY(movie_id,person_id)
);
DROP TABLE IF EXISTS m_director;
CREATE TABLE m_director
(movie_id INT NOT NULL
,person_id INT NOT NULL
,PRIMARY KEY(movie_id,person_id)
);
INSERT INTO person (name) VALUES
('Steven Feelberg'),
('Manly Kubrick'),
('Alfred Spatchcock'),
('Fred Pitt'),
('Raphael DiMaggio'),
('Bill Smith');
INSERT INTO movie VALUES
(1,'Feelberg\'s Movie with Fred & Raph'),
(2,'Feelberg and Fred Ride Again'),
(3,'Kubrick shoots DiMaggio'),
(4,'Kubrick\'s Movie with Bill Smith'),
(5,'Spatchcock Presents Bill Smith');
INSERT INTO m_director VALUES
(1,1),
(2,1),
(3,2),
(4,2),
(5,3);
INSERT INTO m_cast VALUES
(1,4),
(1,5),
(2,4),
(3,5),
(4,6),
(5,6);
I've included the movie table only for ease of reference. It's not relevant to the actual problem.
Also, note that this model assumes that cast members are only listed once, regardless of whether or not they have multiple roles in a given film.
The following query asks 'how often have each actor and director worked together'...
An actor is any person who has been a cast member of any movie.
A director is any person who has been a director of any movie.
SELECT a.name actor
, d.name director
, COUNT(DISTINCT ma.movie_id) total
FROM person d
JOIN m_director md
ON md.person_id = d.person_id
JOIN person a
LEFT
JOIN m_cast ma
ON ma.person_id = a.person_id
AND ma.movie_id = md.movie_id
JOIN m_cast x
ON x.person_id = a.person_id
GROUP
BY actor
, director;
+-------------------+-------------------+-------+
| actor | director | total |
+-------------------+-------------------+-------+
| Fred Pitt | Alfred Spatchcock | 0 |
| Fred Pitt | Manly Kubrick | 0 |
| Fred Pitt | Steven Feelberg | 2 |
| Raphael DiMaggio | Alfred Spatchcock | 0 |
| Raphael DiMaggio | Manly Kubrick | 1 |
| Raphael DiMaggio | Steven Feelberg | 1 |
| Bill Smith | Alfred Spatchcock | 1 |
| Bill Smith | Manly Kubrick | 1 |
| Bill Smith | Steven Feelberg | 0 |
+-------------------+-------------------+-------+
By observation, we can see that:
the only actor to work more often with Feelberg than with any other director is Fred Pitt
Raphael DiMaggio and Bill Smith have both worked equally often with two directors (albeit different directors)
EDIT: While I'm not seriously advocating this as a solution, the following is simply to demonstrate that the kernel provided above is really all you need to solve the problem...
SELECT x.*
FROM
( SELECT a.*
FROM
( SELECT a.name actor
, d.name director
, COUNT(DISTINCT ma.movie_id) total
FROM person d
JOIN m_director md
ON md.person_id = d.person_id
JOIN person a
LEFT
JOIN m_cast ma
ON ma.person_id = a.person_id
AND ma.movie_id = md.movie_id
JOIN m_cast x
ON x.person_id = a.person_id
GROUP
BY actor
, director
) a
LEFT
JOIN
( SELECT a.name actor
, d.name director
, COUNT(DISTINCT ma.movie_id) total
FROM person d
JOIN m_director md
ON md.person_id = d.person_id
JOIN person a
LEFT
JOIN m_cast ma
ON ma.person_id = a.person_id
AND ma.movie_id = md.movie_id
JOIN m_cast x
ON x.person_id = a.person_id
GROUP
BY actor
, director
) b
ON b.actor = a.actor
AND b.director <> a.director
AND b.total > a.total
WHERE b.actor IS NULL
) x
LEFT JOIN
( SELECT a.*
FROM
( SELECT a.name actor
, d.name director
, COUNT(DISTINCT ma.movie_id) total
FROM person d
JOIN m_director md
ON md.person_id = d.person_id
JOIN person a
LEFT
JOIN m_cast ma
ON ma.person_id = a.person_id
AND ma.movie_id = md.movie_id
JOIN m_cast x
ON x.person_id = a.person_id
GROUP
BY actor
, director
) a
LEFT
JOIN
( SELECT a.name actor
, d.name director
, COUNT(DISTINCT ma.movie_id) total
FROM person d
JOIN m_director md
ON md.person_id = d.person_id
JOIN person a
LEFT
JOIN m_cast ma
ON ma.person_id = a.person_id
AND ma.movie_id = md.movie_id
JOIN m_cast x
ON x.person_id = a.person_id
GROUP
BY actor
, director
) b
ON b.actor = a.actor
AND b.director <> a.director
AND b.total > a.total
WHERE b.actor IS NULL
) y
ON y.actor = x.actor AND y.director <> x.director
WHERE y.actor IS NULL;
+-----------+-----------------+-------+
| actor | director | total |
+-----------+-----------------+-------+
| Fred Pitt | Steven Feelberg | 2 |
+-----------+-----------------+-------+
This returns each actor together with the single director with whom they've worked most often. In this case, because Bill Smith and Raphael DiMaggio are each tied between two directors, they are excluded from the result.
The answer to your problem is simply to select from this list all rows with Yash Chopra listed as the director.
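As a minimal sketch of that final step, the snippet below loads the sample data above into an in-memory SQLite database (a stand-in for MySQL, used only so the example is self-contained) and keeps the actors whose count with one named director is strictly greater than their count with every other director. The names `actors_favouring` and `pair_counts` are illustrative and not part of the answer above. Pairs that never worked together are simply left out of `pair_counts`, which is safe here because a count of zero can never tie or beat a positive count.

```python
import sqlite3

SETUP = """
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE m_cast (movie_id INT NOT NULL, person_id INT NOT NULL,
                     PRIMARY KEY (movie_id, person_id));
CREATE TABLE m_director (movie_id INT NOT NULL, person_id INT NOT NULL,
                         PRIMARY KEY (movie_id, person_id));
INSERT INTO person (name) VALUES
  ('Steven Feelberg'), ('Manly Kubrick'), ('Alfred Spatchcock'),
  ('Fred Pitt'), ('Raphael DiMaggio'), ('Bill Smith');
INSERT INTO m_director VALUES (1,1),(2,1),(3,2),(4,2),(5,3);
INSERT INTO m_cast VALUES (1,4),(1,5),(2,4),(3,5),(4,6),(5,6);
"""

QUERY = """
WITH pair_counts AS (
    -- one row per (actor, director) pair that shares at least one movie
    SELECT mc.person_id AS actor_id,
           md.person_id AS director_id,
           COUNT(DISTINCT mc.movie_id) AS total
    FROM m_cast mc
    JOIN m_director md ON md.movie_id = mc.movie_id
    GROUP BY mc.person_id, md.person_id
)
SELECT a.name, pc.total
FROM pair_counts pc
JOIN person a ON a.person_id = pc.actor_id
JOIN person d ON d.person_id = pc.director_id
WHERE d.name = ?
  AND NOT EXISTS (
      -- reject the actor if any other director ties or beats this count
      SELECT 1
      FROM pair_counts other
      WHERE other.actor_id = pc.actor_id
        AND other.director_id <> pc.director_id
        AND other.total >= pc.total
  )
ORDER BY a.name
"""

def actors_favouring(conn, director_name):
    """Actors whose movie count with director_name exceeds every other director's."""
    return conn.execute(QUERY, (director_name,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
feelberg_actors = actors_favouring(conn, "Steven Feelberg")
```

On the sample data this keeps only Fred Pitt for Steven Feelberg; for the real problem the parameter would be the director's name as stored in the database.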

Related

Correctly join multiple many-to-many tables - MySQL query

A seemingly generic SQL query has left me clueless.
Here's the case.
I have 3 generic tables (simplified versions here):
Movie
id | title
-----------------------
1 | Evil Dead
-----------------------
2 | Bohemian Rhapsody
....
Genre
id | title
-----------------------
1 | Horror
-----------------------
2 | Comedy
....
Rating
id | title
-----------------------
1 | PG-13
-----------------------
2 | R
....
And 2 many-to-many tables to connect them:
Movie_Genre
movie_id | genre_id
Movie_Rating
movie_id | rating_id
The initial challenge was to write a query which allows me to fetch movies that belong to multiple genres (e.g. horror comedies or sci-fi action).
Thankfully, I was able to find this solution here
MySQL: Select records where joined table matches ALL values
However, what would be the correct option to fetch records that belong to multiple many-to-many tables? E.g. rated R horror comedies. Is there any way to do so without subquery (or a single one only)?
One method uses correlated subqueries:
select m.*
from movie m
where (select count(*)
from movie_genre mg
where mg.movie_id = m.id
) > 1 and
(select count(*)
from movie_rating mr
where mr.movie_id = m.id
) > 1 ;
With indexes on movie_genre(movie_id) and movie_rating(movie_id) this probably has quite reasonable performance.
The above is possibly the most efficient method. However, if you wanted to avoid subqueries, one method would be:
select mg.movie_id
from movie_genre mg join
movie_rating mr
on mg.movie_id = mr.movie_id
group by mg.movie_id
having count(distinct mg.genre_id) > 1 and
count(distinct mr.rating_id) > 1;
More efficient than the above is aggregating before the join:
select mg.movie_id
from (select movie_id
from movie_genre
group by movie_id
having count(*) >= 2
) mg join
(select movie_id
from movie_rating
group by movie_id
having count(*) >= 2
) mr
on mg.movie_id = mr.movie_id;
Although you state that you want to avoid subqueries, the irony is that the version with no subqueries probably has the worst performance of these three options.
E.g. rated R horror comedies
You can join all the tables together, aggregate by movie and filter with a HAVING clause:
select m.id, m.title
from movie m
inner join movie_genre mg on mg.movie_id = m.id
inner join genre g on g.id = mg.genre_id
inner join movie_rating mr on mr.movie_id = m.id
inner join rating r on r.id = mr.rating_id
group by m.id, m.title
having
max(r.title = 'R') = 1
and max(g.title = 'Horror') = 1
and max(g.title = 'Comedy') = 1
You can also use a couple of exists conditions along with correlated subqueries:
select m.*
from movie m
where
exists (
select 1
from movie_rating mr
inner join rating r on r.id = mr.rating_id
where mr.movie_id = m.id and r.title = 'R')
and exists (
select 1
from movie_genre mg
inner join genre g on g.id = mg.genre_id
where mg.movie_id = m.id and g.title = 'Horror'
)
and exists (
select 1
from movie_genre mg
inner join genre g on g.id = mg.genre_id
where mg.movie_id = m.id and g.title = 'Comedy'
)
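For a list of genres that varies at run time, a common alternative is to filter to the wanted genres and rating first and then require that the number of distinct matching genres equals the number asked for. Below is a minimal sketch of that idea against an in-memory SQLite database (standing in for MySQL). The helper name `movies_with_rating_and_genres`, the third movie and the genre/rating assignments are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE genre (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE rating (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE movie_genre (movie_id INT, genre_id INT);
CREATE TABLE movie_rating (movie_id INT, rating_id INT);
INSERT INTO movie VALUES (1, 'Evil Dead'), (2, 'Bohemian Rhapsody'), (3, 'Horror Only');
INSERT INTO genre VALUES (1, 'Horror'), (2, 'Comedy');
INSERT INTO rating VALUES (1, 'PG-13'), (2, 'R');
-- Hypothetical assignments, chosen only to exercise the query.
INSERT INTO movie_genre VALUES (1, 1), (1, 2), (2, 2), (3, 1);
INSERT INTO movie_rating VALUES (1, 2), (2, 1), (3, 2);
""")

def movies_with_rating_and_genres(conn, rating, genres):
    """Movies that carry the given rating and every genre in `genres`."""
    placeholders = ",".join("?" for _ in genres)
    sql = f"""
        SELECT m.id, m.title
        FROM movie m
        JOIN movie_genre mg ON mg.movie_id = m.id
        JOIN genre g ON g.id = mg.genre_id
        JOIN movie_rating mr ON mr.movie_id = m.id
        JOIN rating r ON r.id = mr.rating_id
        WHERE r.title = ? AND g.title IN ({placeholders})
        GROUP BY m.id, m.title
        -- all requested genres must be present, not just one of them
        HAVING COUNT(DISTINCT g.title) = ?
        ORDER BY m.id
    """
    params = [rating, *genres, len(set(genres))]
    return conn.execute(sql, params).fetchall()

r_horror_comedies = movies_with_rating_and_genres(conn, "R", ["Horror", "Comedy"])
```

Only the genre and rating values are bound as parameters; the placeholder list is generated from the length of `genres`, so no user text is interpolated into the SQL.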

view specific queried value of each cell of table using some query in mysql?

I have one table that holds references to several other tables:
school_id of school_table
class_id of class_table
teacher_id of teacher_table
subject_id of subject_table
Table sample:
I want to retrieve this whole table with each id replaced by the corresponding name from the child tables.
For example, when retrieving the table (as a table or a view) I want to see the school name in place of school_id, the class name in place of class_id, and likewise for teacher and subject, for all rows in the table (shown in the image above).
The image above shows my original table as stored in the MySQL database. I want this as the result:
assign_subject_id | Schoolname | teachername | classname | subjectName | session
1 | UCDC | rahul | one | math | 2018-19
2 | UCDC | gopi | two | CS | 2018-19
school_table:
school_id | schoolname
7 | UCDC
and so forth for all table.
Thank you.
Assuming you have school_table, teacher_table, class_table and subject_table,
each with the same pattern as school_table (school_id | schoolname),
then you need some joins.
Use INNER JOIN if every relation key must match:
select a.assign_subject_id, b.schoolname, c.teachername
, d.classname, e.subjectName, a.session
from my_table a
inner join school_table b on a.school_id = b.school_id
inner join teacher_table c on a.teacher_id = c.teacher_id
inner join class_table d on a.class_id = d.class_id
inner join subject_table e on a.subject_id = e.subject_id
Use LEFT JOIN if some relation keys may have no match:
select a.assign_subject_id, b.schoolname, c.teachername
, d.classname, e.subjectName, a.session
from my_table a
left join school_table b on a.school_id = b.school_id
left join teacher_table c on a.teacher_id = c.teacher_id
left join class_table d on a.class_id = d.class_id
left join subject_table e on a.subject_id = e.subject_id
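To make the difference between the two variants concrete, here is a small sketch using an in-memory SQLite database as a stand-in for MySQL. It is cut down to two lookup tables, and the teacher ids and the third assignment row (which points at a teacher that does not exist) are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE school_table (school_id INTEGER PRIMARY KEY, schoolname TEXT);
CREATE TABLE teacher_table (teacher_id INTEGER PRIMARY KEY, teachername TEXT);
CREATE TABLE my_table (assign_subject_id INTEGER PRIMARY KEY,
                       school_id INT, teacher_id INT, session TEXT);
INSERT INTO school_table VALUES (7, 'UCDC');
INSERT INTO teacher_table VALUES (1, 'rahul'), (2, 'gopi');
-- Row 3 points at teacher_id 99, which has no row in teacher_table.
INSERT INTO my_table VALUES
  (1, 7, 1, '2018-19'), (2, 7, 2, '2018-19'), (3, 7, 99, '2018-19');
""")

# The same query is run twice, once per join type.
TEMPLATE = """
SELECT a.assign_subject_id, b.schoolname, c.teachername, a.session
FROM my_table a
{join} JOIN school_table b ON a.school_id = b.school_id
{join} JOIN teacher_table c ON a.teacher_id = c.teacher_id
ORDER BY a.assign_subject_id
"""

inner_rows = conn.execute(TEMPLATE.format(join="INNER")).fetchall()
left_rows = conn.execute(TEMPLATE.format(join="LEFT")).fetchall()
```

With INNER JOIN the row whose teacher is missing is dropped; with LEFT JOIN it is kept and the teacher name comes back as NULL (`None` in Python).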

SQL query with left join does not allow use of column from previous table

I am building a SQL query like the one below, which gives me an error saying that I cannot use m.id in the inner WHERE clause.
SELECT
c.id,
m.member_no,
mc.card_no
FROM
customer AS c
LEFT JOIN
(
SELECT
*
FROM
membership
WHERE
creation_date = (SELECT MAX(creation_date) FROM membership AS m1 WHERE m1.cust_id = 123)
) AS m ON m.cust_id = c.id
LEFT JOIN
(
SELECT
*
FROM
member_card
WHERE
emboss_date = (SELECT MAX(emboss_date) FROM member_card AS mc1 WHERE mc1.membership_id = m.id)
) AS mc ON mc.membership_id = m.id
WHERE
c.id = 123
Table :
Customer, Membership, Member_card.
Customer can have many membership and each membership can have many member_card.
Table Customer
id name address
1 amit abc
2 mohit xyz
3 rahul asdf
4 ganesh pqr
Table membership
id member_no creation_date cust_id
21 123 21-09-1978 1
31 234 21-09-1988 2
41 345 21-09-1998 1
51 456 21-09-1977 2
Table member_card
id card_no membership_id emboss_date
111 12345 21 21-09-1978
222 23456 31 21-09-1977
333 34567 21 21-09-1976
444 456789 41 21-09-1975
cust_id is a foreign key in the membership table.
membership_id is a foreign key in the member_card table.
Now, I want the details of every customer in the customer table, with the latest member_no (by creation_date) and card_no (by emboss_date). Even if a customer does not have a membership, the customer details should still be there. See the query I wrote above.
So there should be one record per customer, i.e. the final result should contain 4 rows with data from all three tables.
The joins should use properly correlated subqueries:
select
c.id, m.member_no, mc.card_no
from customer c
left join (
select * from membership m
where creation_date = (select max(creation_date)
from membership where cust_id = m.cust_id)
) m on m.cust_id = c.id
left join (select * from member_card mc
where emboss_date = (select max(emboss_date)
from member_card where membership_id = mc.membership_id)
) mc on mc.membership_id = m.id
where c.id = 123
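Another version, using aggregation instead of correlation. Each aggregate is joined back to its base table to pick up member_no and card_no:
select
c.id, m.member_no, mc.card_no
from customer c
left join (
select cust_id, max(creation_date) creation_date
from membership
group by cust_id) lm
on lm.cust_id = c.id
left join membership m
on m.cust_id = lm.cust_id and m.creation_date = lm.creation_date
left join (
select membership_id, max(emboss_date) emboss_date
from member_card
group by membership_id) lc
on lc.membership_id = m.id
left join member_card mc
on mc.membership_id = lc.membership_id and mc.emboss_date = lc.emboss_date
where c.id = 123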
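A third possibility, sketched here rather than taken from the answer, is to put the "latest row" condition directly in the ON clause, where the aliases of the tables joined earlier are visible. The example runs against in-memory SQLite as a stand-in for MySQL, drops the `c.id = 123` filter so that all four customers are returned, and stores the question's dates in ISO `YYYY-MM-DD` form, since `MAX` on `DD-MM-YYYY` strings would not compare chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE membership (id INTEGER PRIMARY KEY, member_no INT,
                         creation_date TEXT, cust_id INT);
CREATE TABLE member_card (id INTEGER PRIMARY KEY, card_no INT,
                          membership_id INT, emboss_date TEXT);
INSERT INTO customer VALUES
  (1, 'amit', 'abc'), (2, 'mohit', 'xyz'), (3, 'rahul', 'asdf'), (4, 'ganesh', 'pqr');
INSERT INTO membership VALUES
  (21, 123, '1978-09-21', 1), (31, 234, '1988-09-21', 2),
  (41, 345, '1998-09-21', 1), (51, 456, '1977-09-21', 2);
INSERT INTO member_card VALUES
  (111, 12345, 21, '1978-09-21'), (222, 23456, 31, '1977-09-21'),
  (333, 34567, 21, '1976-09-21'), (444, 456789, 41, '1975-09-21');
""")

LATEST = """
SELECT c.id, c.name, m.member_no, mc.card_no
FROM customer c
LEFT JOIN membership m
  ON m.cust_id = c.id
 -- keep only the newest membership of this customer
 AND m.creation_date = (SELECT MAX(creation_date)
                        FROM membership WHERE cust_id = c.id)
LEFT JOIN member_card mc
  ON mc.membership_id = m.id
 -- keep only the newest card of that membership
 AND mc.emboss_date = (SELECT MAX(emboss_date)
                       FROM member_card WHERE membership_id = m.id)
ORDER BY c.id
"""

rows = conn.execute(LATEST).fetchall()
```

Customers without a membership still appear, with NULL (`None`) for both `member_no` and `card_no`. If two memberships of one customer share the same latest date, that customer would appear twice.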

SQL - How to find the person with the highest grade

I am trying to find the name of the person who received the highest grade in the "Big Data" course.
I have 3 different tables:
People (id, name, age, address)
---------------------------------------------------
p1 | Tom Martin| 24 | 11, Integer Avenue, Fractions, MA
p2 | Al Smith | 33 | 26, Main Street, Noman's Land, PA
p3 | Kim Burton| 40 | 45, Elm Street, Blacksburg, VA
---------------------------------------------------
Courses (cid, name, department)
---------------------------------------------------------
c1 | Systematic Torture | MATH
c2 | Pretty Painful | CS
c3 | Not so Bad | MATH
c4 | Big Data | CS
---------------------------------------------------------
Grades (pid, cid, grade)
---------------------------------------------------
p1 | c1 | 3.5
p2 | c3 | 2.5
p3 | c2 | 4.0
p3 | c4 | 3.85
---------------------------------------------------
I can't figure out how to find the person with the highest grade without using any fancy SQL feature. That is, I just want to use SELECT, FROM, WHERE, UNION, INTERSECT, EXCEPT, CREATE VIEW and arithmetic comparison operators like =, <, >.
My outcome is showing something other than what I try to achieve.
This is what I have tried so far:
CREATE VIEW TEMPFIVE AS
SELECT G1.pid FROM Grades AS G1, Grades AS G2 WHERE G1.pid = G2.pid AND G1.cid = G2.cid
SELECT People.name, Courses.name FROM TEMPFIVE, People, Courses WHERE TEMPFIVE.pid = People.pid AND Courses.name = "Big Data";
+------------+----------+
| name | name |
+------------+----------+
| Tom Martin | Big Data |
| Al Smith | Big Data |
|Kim Burton | Big Data |
|Kim Burton | Big Data |
+------------+----------+
The easiest way is to use LIMIT 1 with an ORDER BY DESC clause:
SELECT p.name, c.name, g.grade
FROM People AS p
JOIN Grades AS g ON p.id = g.pid
JOIN Courses AS c ON c.cid = g.cid
WHERE c.name = "Big Data"
ORDER BY g.grade DESC LIMIT 1
I'm not sure of the exact MySQL query structure, so here it is explained in steps. I hope you can build the query from them:
join three tables according to their relationship
set course name 'Big data' in where clause
set grade order to DESC order
set the limit to fetch only first row.
Try this
select * from(
select p.id pid,p.name name, p.age age,p.address address,
c.cid cid, c.name coursname, c.department department,g.grade grade
from Grades g
left join
Courses c on g.cid = c.cid
left join
People p on g.pid = p.id
)a where coursname= 'Big Data' order by grade desc
You can apply the comparison operators in the WHERE clause.
GiorgosBestos shows the correct way if you only want 1 record. If you want ties, meaning more than one student has the same MAX grade, you can use a subselect as follows:
SELECT p.name, c.name, g.grade
FROM
(
SELECT c.cid, MAX(g.grade) MaxGrade
FROM
Grades g
INNER JOIN Courses c
ON c.cid = g.cid
AND c.name = 'Big Data'
GROUP BY
c.cid
) m
INNER JOIN Grades g
ON g.cid = m.cid
AND g.grade = m.MaxGrade
INNER JOIN People p
ON g.pid = p.id
The following SQL covers the case when two or more students have the same maximum grade:
SELECT P.NAME,
C.NAME,
G.GRADE
FROM PEOPLE P
JOIN GRADES G ON G.PID = P.ID
JOIN COURSES C ON C.CID = G.CID
WHERE C.NAME = 'Big Data'
AND G.GRADE = (SELECT MAX(G2.GRADE)
FROM PEOPLE P2
JOIN GRADES G2 ON G2.PID = P2.ID
JOIN COURSES C2 ON C2.CID = G2.CID
WHERE C2.NAME = 'Big Data');
It is similar but not identical to the SQL proposed by Matt.
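None of the queries above stays within the restricted operator set the question asks for (SELECT, FROM, WHERE, the set operators and plain comparisons). A minimal sketch of one way to do that: take everyone graded in "Big Data", then use EXCEPT to remove everyone whose grade is lower than some other grade in the same course. The example uses in-memory SQLite as a stand-in (MySQL only gained EXCEPT in recent 8.0 releases), leaves out the age and address columns, and adds one hypothetical grade row so that there is something to compare.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE Courses (cid TEXT PRIMARY KEY, name TEXT, department TEXT);
CREATE TABLE Grades (pid TEXT, cid TEXT, grade REAL);
INSERT INTO People VALUES ('p1', 'Tom Martin'), ('p2', 'Al Smith'), ('p3', 'Kim Burton');
INSERT INTO Courses VALUES
  ('c1', 'Systematic Torture', 'MATH'), ('c2', 'Pretty Painful', 'CS'),
  ('c3', 'Not so Bad', 'MATH'), ('c4', 'Big Data', 'CS');
INSERT INTO Grades VALUES
  ('p1', 'c1', 3.5), ('p2', 'c3', 2.5), ('p3', 'c2', 4.0), ('p3', 'c4', 3.85);
-- Hypothetical extra row so that 'Big Data' has more than one grade.
INSERT INTO Grades VALUES ('p1', 'c4', 3.2);
""")

TOP_GRADE = """
SELECT P.id, P.name
FROM People P, Grades G, Courses C
WHERE P.id = G.pid AND G.cid = C.cid AND C.name = 'Big Data'
EXCEPT
SELECT P.id, P.name
FROM People P, Grades G1, Grades G2, Courses C
WHERE P.id = G1.pid AND G1.cid = C.cid AND G2.cid = C.cid
  AND C.name = 'Big Data' AND G1.grade < G2.grade
"""

def top_students(conn):
    """Everyone in 'Big Data' who is not beaten by another grade in that course."""
    return sorted(conn.execute(TOP_GRADE).fetchall())

best = top_students(conn)
```

Because the second branch only removes people who are strictly beaten, students tied for the top grade are all kept.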

mysql particular query [closed]

I have two tables: teams and formations:
Table teams:
team_id | team_name
1 | Barcelona
2 | Real Madrid
3 | PSG
Table formations:
formation_id | team_id | module
1 | 2 | 5-3-2
2 | 1 | 4-4-2
3 | 3 | 4-4-2
4 | 2 | 4-4-3
Practically, I need a join between the two tables, grouped by team_id, but with the last formation_id for each team.
The result should be this:
team_id | team_name | formation_id | module
1 | Barcelona | 2 | 4-4-2
2 | Real Madrid| 4 | 4-4-3
3 | PSG | 3 | 4-4-2
Currently my query is:
SELECT *
FROM formations f
INNER JOIN teams t
ON (f.team_id = t.team_id)
GROUP BY t.team_id
My query selects the first inserted formation for each team; instead, I must select the last formation for each team.
Check this SQLFIDDLE
SELECT A.team_id,A.team_name,B.formation_id,B.module
FROM teams A,formations B
WHERE A.team_id=B.team_id
AND B.formation_id =
(
SELECT max(formation_id)
FROM formations C
WHERE C.team_id =B.team_id
)
ORDER BY A.team_id;
create table teams
(
team_id int
,team_name varchar(40)
);
create table formations
(
formation_id int
,team_id int
,module int
);
insert into teams
values
(1,'Barcelona'),(2,'Real Madrid'),(3,'PSG');
insert into formations
values
(1,2,532),(2,1,442),(3,3,442),(4,2,443);
You can find each team's maximum formation ID using a subquery, and then join it with the original tables. Try this one:
SELECT a.*, c.formation_ID, c.`module`
FROM teams a
INNER JOIN
(
SELECT team_id, MAX(formation_ID) maxID
FROM formations
GROUP BY team_ID
) b ON a.team_id = b.team_id
INNER JOIN formations c
ON c.team_id = b.team_id AND
c.formation_ID = b.maxID
ORDER BY a.Team_id
SQLFiddle Demo
You can write:
SELECT t.team_id,
t.team_name,
f.formation_id,
f.module
FROM teams t
JOIN formations f
ON f.team_id = t.team_id
-- require f.formation_id to be MAX(formation_id) for some team:
JOIN ( SELECT MAX(formation_id) AS id
FROM formations
GROUP
BY team_id
) max_formation_ids
ON max_formation_ids.id = f.formation_id
;
or:
SELECT t.team_id,
t.team_name,
f.formation_id,
f.module
FROM teams t
JOIN formations f
ON f.team_id = t.team_id
-- require f.formation_id to be MAX(formation_id) for this team:
WHERE f.formation_id =
( SELECT MAX(formation_id)
FROM formations
WHERE team_id = t.team_id
)
;
or:
SELECT t.team_id,
t.team_name,
f.formation_id,
f.module
FROM teams t
JOIN formations f
ON f.team_id = t.team_id
-- forbid f.formation_id to be less than another for the same team:
LEFT
OUTER
JOIN formations f2
ON f2.team_id = t.team_id
AND f2.formation_id > f.formation_id
WHERE f2.formation_id IS NULL
;
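As a quick check that the three formulations agree, the sketch below runs each of them against the question's sample data in an in-memory SQLite database (a stand-in for MySQL). The dictionary keys and the shared `BASE` fragment are illustrative names only, and an ORDER BY is appended so the results can be compared directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teams (team_id INTEGER, team_name TEXT);
CREATE TABLE formations (formation_id INTEGER, team_id INTEGER, module TEXT);
INSERT INTO teams VALUES (1, 'Barcelona'), (2, 'Real Madrid'), (3, 'PSG');
INSERT INTO formations VALUES
  (1, 2, '5-3-2'), (2, 1, '4-4-2'), (3, 3, '4-4-2'), (4, 2, '4-4-3');
""")

# Part shared by all three variants.
BASE = """
SELECT t.team_id, t.team_name, f.formation_id, f.module
FROM teams t
JOIN formations f ON f.team_id = t.team_id
"""

VARIANTS = {
    # join against the per-team maximum ids
    "join_on_max": BASE + """
        JOIN (SELECT MAX(formation_id) AS id
              FROM formations GROUP BY team_id) mx
          ON mx.id = f.formation_id""",
    # correlated subquery evaluated per team
    "correlated_max": BASE + """
        WHERE f.formation_id = (SELECT MAX(formation_id)
                                FROM formations
                                WHERE team_id = t.team_id)""",
    # anti-join: no later formation exists for the same team
    "anti_join": BASE + """
        LEFT JOIN formations f2
          ON f2.team_id = t.team_id
         AND f2.formation_id > f.formation_id
        WHERE f2.formation_id IS NULL""",
}

results = {
    name: conn.execute(sql + " ORDER BY t.team_id").fetchall()
    for name, sql in VARIANTS.items()
}
```

The first variant relies on formation_id being unique across all teams; the other two only compare ids within one team.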
The SQL spec states that when you use a GROUP BY clause, every field or expression in the output must be either the result of an aggregate function (such as COUNT or AVG) or a field listed in the GROUP BY clause; otherwise the behavior is not defined. So if you need to pick one specific record per group, you need to add a criterion for it to your query. Also, there is no guarantee that a simple 'select * from some_table' always returns rows in the same order.
You could do something like:
select f.*, t.team_name from formations f
join (select team_id, max(formation_id) as max_formation_id from formations f
group by team_id) as mrf on mrf.max_formation_id = f.formation_id
join teams t on f.team_id = t.team_id
This version produces the same result without aggregates, which often incur internal sorting.
SELECT t.team_id,
t.team_name,
f.formation_id,
f.module
FROM teams t
INNER JOIN formations f
ON f.team_id = t.team_id
LEFT OUTER JOIN formations f2
ON f2.team_id = t.team_id
AND f2.formation_id > f.formation_id
WHERE f2.formation_id IS NULL
create table teams (team_id integer, team_name varchar(50));
create table formations (formation_id integer, team_id integer, module varchar(20));
insert into formations (formation_id, team_id, module) values
(1, 2, '5-3-2'),
(2, 1, '4-4-2'),
(3, 3, '4-4-2'),
(4, 2, '4-4-3')
;
insert into teams (team_id, team_name) values
(1, 'Barcelona'),
(2, 'Real Madrid'),
(3, 'PSG')
;
select t.team_id, team_name, f.formation_id, module
from
formations f
inner join
teams t on f.team_id = t.team_id
inner join (
select team_id, max(formation_id) as formation_id
from formations
group by team_id
) s on s.team_id = t.team_id and s.formation_id = f.formation_id
order by t.team_id
;
+---------+-------------+--------------+--------+
| team_id | team_name | formation_id | module |
+---------+-------------+--------------+--------+
| 1 | Barcelona | 2 | 4-4-2 |
| 2 | Real Madrid | 4 | 4-4-3 |
| 3 | PSG | 3 | 4-4-2 |
+---------+-------------+--------------+--------+