SQL query for duplicate rows based on 2 columns - mysql

I have 3 tables: movie, rating and reviewer.
movie has 4 columns: movieID, title, year, director
rating has 4 columns: reviewerID, movieID, stars, ratingDate
reviewer has 2 columns: reviewerID, name
How do I find the reviewers who rated the same movie more than once and gave it a higher rating in the second review?
This is my attempt. It finds rows with duplicate values in 2 columns, meaning a reviewer rated the same movie more than once. I still need to narrow it down to the reviewers who gave more stars in the second review.
SELECT reviewer.name, movie.title, rating.stars, rating.ratingDate
FROM rating
INNER JOIN reviewer ON reviewer.reviewerID = rating.reviewerID
INNER JOIN movie ON movie.movieID = rating.movieID
WHERE rating.reviewerID IN (SELECT rating.reviewerID FROM rating GROUP BY rating.reviewerID, rating.movieID HAVING COUNT(*) > 1)
ORDER BY reviewer.name, rating.ratingDate;
movie table
| movieID | Title | Year | Director |
| ------- | ----- | ---- | -------- |
| 101 | Gone with the Wind | 1939 | Victor Fleming |
| 102 | Star Wars | 1977 | George Lucas |
| 103 | The Sound of Music | 1965 | Robert Wise |
| 104 | E.T. | 1982 | Steven Spielberg |
| 105 | Titanic | 1997 | James Cameron |
| 106 | Snow White | 1937 | null |
| 107 | Avatar | 2009 | James Cameron |
| 108 | Raiders of the Lost Ark | 1981 | Steven Spielberg |
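rating table
| reviewerID | movieID | Stars | ratingDate |
| ---------- | ------- | ----- | ---------- |
| 201 | 101 | 2 | 2011-01-22 |
| 201 | 101 | 4 | 2011-01-27 |
| 202 | 106 | 4 | null |
| 203 | 103 | 2 | 2011-01-20 |
| 203 | 108 | 4 | 2011-01-12 |
| 203 | 108 | 2 | 2011-01-30 |
| 204 | 101 | 3 | 2011-01-09 |
| 205 | 103 | 3 | 2011-01-27 |
| 205 | 104 | 2 | 2011-01-22 |
| 205 | 108 | 4 | null |
| 206 | 107 | 3 | 2011-01-15 |
| 206 | 106 | 5 | 2011-01-19 |
| 207 | 107 | 5 | 2011-01-20 |
| 208 | 104 | 3 | 2011-01-02 |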
reviewer table
| reviewerID | Name |
| ---------- | ---- |
| 201 | Sarah Martinez |
| 202 | Daniel Lewis |
| 203 | Brittany Harris |
| 204 | Mike Anderson |
| 205 | Chris Jackson |
| 206 | Elizabeth Thomas |
| 207 | James Cameron |
| 208 | Ashley White |
Expected result
| Reviewer | Title |
| -------- | ----- |
| Sarah Martinez | Gone with the Wind |
EDIT: I am using MySQL version 8.0.29

Use:
select re.Name, mo.Title
FROM (
    select reviewerID, movieID, ratingDate, Stars
    from rating r
    where exists (select 1
                  from rating r1
                  where r1.reviewerID = r.reviewerID
                    and r.movieID = r1.movieID
                    and r.ratingDate > r1.ratingDate
                    and r.Stars > r1.Stars
                 )
) as t1
inner join movie mo on t1.movieID = mo.movieID
inner join reviewer re on t1.reviewerID = re.reviewerID;
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=0c5d850ee3393b054d9af4c4ac241d96
The key part is the EXISTS clause:
where exists (select 1
              from rating r1
              where r1.reviewerID = r.reviewerID
                and r.movieID = r1.movieID
                and r.ratingDate > r1.ratingDate
                and r.Stars > r1.Stars)
It keeps only the ratings for which the same reviewer rated the same movie earlier (by ratingDate) with fewer Stars, that is, a later review that scored the movie higher than an earlier one.
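To check the EXISTS approach against the question's sample data, here is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL 8 (an assumption: the query only uses standard SQL that both engines accept). The helper build_db is hypothetical and only loads the three sample tables.

```python
import sqlite3

# Sample data copied from the question (None stands for NULL).
MOVIES = [
    (101, "Gone with the Wind", 1939, "Victor Fleming"),
    (102, "Star Wars", 1977, "George Lucas"),
    (103, "The Sound of Music", 1965, "Robert Wise"),
    (104, "E.T.", 1982, "Steven Spielberg"),
    (105, "Titanic", 1997, "James Cameron"),
    (106, "Snow White", 1937, None),
    (107, "Avatar", 2009, "James Cameron"),
    (108, "Raiders of the Lost Ark", 1981, "Steven Spielberg"),
]
RATINGS = [
    (201, 101, 2, "2011-01-22"), (201, 101, 4, "2011-01-27"),
    (202, 106, 4, None), (203, 103, 2, "2011-01-20"),
    (203, 108, 4, "2011-01-12"), (203, 108, 2, "2011-01-30"),
    (204, 101, 3, "2011-01-09"), (205, 103, 3, "2011-01-27"),
    (205, 104, 2, "2011-01-22"), (205, 108, 4, None),
    (206, 107, 3, "2011-01-15"), (206, 106, 5, "2011-01-19"),
    (207, 107, 5, "2011-01-20"), (208, 104, 3, "2011-01-02"),
]
REVIEWERS = [
    (201, "Sarah Martinez"), (202, "Daniel Lewis"), (203, "Brittany Harris"),
    (204, "Mike Anderson"), (205, "Chris Jackson"), (206, "Elizabeth Thomas"),
    (207, "James Cameron"), (208, "Ashley White"),
]


def build_db():
    """Hypothetical helper: load the sample tables into an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movie (movieID INTEGER, Title TEXT, Year INTEGER, Director TEXT)")
    conn.execute("CREATE TABLE rating (reviewerID INTEGER, movieID INTEGER, Stars INTEGER, ratingDate TEXT)")
    conn.execute("CREATE TABLE reviewer (reviewerID INTEGER, Name TEXT)")
    conn.executemany("INSERT INTO movie VALUES (?, ?, ?, ?)", MOVIES)
    conn.executemany("INSERT INTO rating VALUES (?, ?, ?, ?)", RATINGS)
    conn.executemany("INSERT INTO reviewer VALUES (?, ?)", REVIEWERS)
    return conn


# The answer's query; ISO date strings compare correctly as text in SQLite.
EXISTS_QUERY = """
SELECT re.Name, mo.Title
FROM (
    SELECT reviewerID, movieID, ratingDate, Stars
    FROM rating r
    WHERE EXISTS (SELECT 1
                  FROM rating r1
                  WHERE r1.reviewerID = r.reviewerID
                    AND r.movieID = r1.movieID
                    AND r.ratingDate > r1.ratingDate
                    AND r.Stars > r1.Stars)
) AS t1
INNER JOIN movie mo ON t1.movieID = mo.movieID
INNER JOIN reviewer re ON t1.reviewerID = re.reviewerID
"""

rows = build_db().execute(EXISTS_QUERY).fetchall()
print(rows)
```

Only Sarah Martinez's second rating of Gone with the Wind (4 stars after 2) survives. Brittany Harris's second rating of Raiders of the Lost Ark is lower than her first, and rows with a NULL ratingDate can never satisfy the date comparison.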

You don't need the WHERE ... IN subquery on rating plus a join back to rating.
You can use the LEAD window function to get the next Stars value for each reviewerID and movieID pair, ordered by ratingDate; a non-NULL next value means that reviewer rated the movie more than once.
Then keep the rows where the newer Stars value is greater than the older one.
SELECT DISTINCT r.Name, m.Title
FROM (
    SELECT reviewerID,
           movieID,
           Stars,
           LEAD(Stars) OVER (PARTITION BY reviewerID, movieID ORDER BY ratingDate) n_start
    FROM rating
) t1
INNER JOIN movie m ON t1.movieID = m.movieID
INNER JOIN reviewer r ON r.reviewerID = t1.reviewerID
WHERE Stars < t1.n_start;
The sample data fiddle was provided by @ErgestBasha.
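A comparable sketch for the LEAD version, again assuming SQLite through Python's built-in sqlite3 module as a stand-in for MySQL 8. Window functions need SQLite 3.25 or later. The helper build_db is hypothetical and only loads the question's sample data.

```python
import sqlite3

# Sample data copied from the question (None stands for NULL).
MOVIES = [
    (101, "Gone with the Wind", 1939, "Victor Fleming"),
    (102, "Star Wars", 1977, "George Lucas"),
    (103, "The Sound of Music", 1965, "Robert Wise"),
    (104, "E.T.", 1982, "Steven Spielberg"),
    (105, "Titanic", 1997, "James Cameron"),
    (106, "Snow White", 1937, None),
    (107, "Avatar", 2009, "James Cameron"),
    (108, "Raiders of the Lost Ark", 1981, "Steven Spielberg"),
]
RATINGS = [
    (201, 101, 2, "2011-01-22"), (201, 101, 4, "2011-01-27"),
    (202, 106, 4, None), (203, 103, 2, "2011-01-20"),
    (203, 108, 4, "2011-01-12"), (203, 108, 2, "2011-01-30"),
    (204, 101, 3, "2011-01-09"), (205, 103, 3, "2011-01-27"),
    (205, 104, 2, "2011-01-22"), (205, 108, 4, None),
    (206, 107, 3, "2011-01-15"), (206, 106, 5, "2011-01-19"),
    (207, 107, 5, "2011-01-20"), (208, 104, 3, "2011-01-02"),
]
REVIEWERS = [
    (201, "Sarah Martinez"), (202, "Daniel Lewis"), (203, "Brittany Harris"),
    (204, "Mike Anderson"), (205, "Chris Jackson"), (206, "Elizabeth Thomas"),
    (207, "James Cameron"), (208, "Ashley White"),
]


def build_db():
    """Hypothetical helper: load the sample tables into an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movie (movieID INTEGER, Title TEXT, Year INTEGER, Director TEXT)")
    conn.execute("CREATE TABLE rating (reviewerID INTEGER, movieID INTEGER, Stars INTEGER, ratingDate TEXT)")
    conn.execute("CREATE TABLE reviewer (reviewerID INTEGER, Name TEXT)")
    conn.executemany("INSERT INTO movie VALUES (?, ?, ?, ?)", MOVIES)
    conn.executemany("INSERT INTO rating VALUES (?, ?, ?, ?)", RATINGS)
    conn.executemany("INSERT INTO reviewer VALUES (?, ?)", REVIEWERS)
    return conn


# LEAD() looks one row ahead within each (reviewerID, movieID) partition,
# so n_start holds the Stars of the next rating by date (NULL if there is none).
LEAD_QUERY = """
SELECT DISTINCT r.Name, m.Title
FROM (
    SELECT reviewerID, movieID, Stars,
           LEAD(Stars) OVER (PARTITION BY reviewerID, movieID ORDER BY ratingDate) n_start
    FROM rating
) t1
INNER JOIN movie m ON t1.movieID = m.movieID
INNER JOIN reviewer r ON r.reviewerID = t1.reviewerID
WHERE Stars < t1.n_start
"""

rows = build_db().execute(LEAD_QUERY).fetchall()
print(rows)
```

Unlike the EXISTS version, this compares each rating only with the immediately following one, which matches "higher on the second review" directly.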

Related

Find what genre is most frequent in each age category

I want to put customers into age groups by decade of birth and find the most frequent genre in each group. It is more difficult than I expected, but here is what I have tried.
The first table, sell_log, looks like this:
id id_film id_cust
1 2 2
2 3 4
3 1 5
4 4 3
5 5 1
6 2 4
7 2 3
8 3 1
9 5 3
The second table, films, has the id and the genre of each film:
id_film genres
1 comedy
2 fantasy
3 sci-fi
4 drama
5 thriller
The third table, customers, looks like this:
id_cust date_of_birth_cust
1 1992-03-12
2 1999-06-25
3 1986-01-14
4 1985-09-18
5 1992-05-19
This is the code I wrote:
select id_cust,date_of_birth_cust,
CASE
WHEN date_of_birth_cust > 1980-01-01 and date_of_birth_cust < 1990-01-01 then ##show genre##
WHEN date_of_birth_cust > 1990-01-01 and date_of_birth_cust < 2000-01-01 then ##show genre##
ELSE ##show genre##
END
from purchases
INNER JOIN (
select id_cust
FROM sell_log
group by id_cust
) customer.id_cust = sell_log.id_cust
What is the correct way to write this?
Expected result, for example: for each age group, find the genre with the most purchases and show it for that group.
ages most frequent genre
from 1980 to 1990 comedy
from 1990 to 2000 fantasy
rest ages drama
Update:
Running the code from the answer gives this:
ages most_frequent_genre
from 1980 to 1989 Comedy
from 1990 to 1999 Thriller
from 1990 to 1999 Action
from 1990 to 1999 Comedy
rest Comedy
What am I doing wrong?
Use a CTE to count the purchases per age group and genre, then use it to get the maximum count per age group. Finally, join that back to the CTE to pick the genre(s) with that count:
with cte as (
    select
        CASE
            WHEN year(c.date_of_birth_cust) between 1980 and 1989 then 'from 1980 to 1989'
            WHEN year(c.date_of_birth_cust) between 1990 and 1999 then 'from 1990 to 1999'
            ELSE 'rest'
        END ages,
        f.genres,
        count(*) counter
    from sell_log s
    inner join films f on f.id_film = s.id_film
    inner join customers c on c.id_cust = s.id_cust
    group by ages, f.genres
)
select c.ages, c.genres most_frequent_genre
from cte c inner join (
    select c.ages, max(counter) counter
    from cte c
    group by c.ages
) g on g.ages = c.ages and g.counter = c.counter
order by c.ages;
See the demo.
Your sample data contains ties, and every tied genre appears in the results.
Results:
| ages | most_frequent_genre |
| ----------------- | ------------------- |
| from 1980 to 1989 | fantasy |
| from 1990 to 1999 | comedy |
| rest | fantasy |
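The sketch below runs the answer's query on the question's own sample data, assuming SQLite via Python's built-in sqlite3 module in place of MySQL. SQLite has no YEAR() function, so a hypothetical Python stand-in named year is registered with create_function. With this data the 1980s group has a clear winner (fantasy, 2 purchases), while every 1990s purchase is a different genre, so four genres tie at 1 and all of them are returned, which is the kind of output the Update shows. No customer falls into 'rest', so that group does not appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no YEAR(); this hypothetical stand-in takes the first four
# characters of an ISO 'YYYY-MM-DD' date so the answer's query runs unchanged.
conn.create_function("year", 1, lambda d: int(d[:4]))
conn.executescript("""
CREATE TABLE sell_log (id INTEGER, id_film INTEGER, id_cust INTEGER);
CREATE TABLE films (id_film INTEGER, genres TEXT);
CREATE TABLE customers (id_cust INTEGER, date_of_birth_cust TEXT);
INSERT INTO sell_log VALUES (1,2,2),(2,3,4),(3,1,5),(4,4,3),(5,5,1),(6,2,4),(7,2,3),(8,3,1),(9,5,3);
INSERT INTO films VALUES (1,'comedy'),(2,'fantasy'),(3,'sci-fi'),(4,'drama'),(5,'thriller');
INSERT INTO customers VALUES (1,'1992-03-12'),(2,'1999-06-25'),(3,'1986-01-14'),
                             (4,'1985-09-18'),(5,'1992-05-19');
""")

rows = conn.execute("""
WITH cte AS (
    SELECT CASE
               WHEN year(c.date_of_birth_cust) BETWEEN 1980 AND 1989 THEN 'from 1980 to 1989'
               WHEN year(c.date_of_birth_cust) BETWEEN 1990 AND 1999 THEN 'from 1990 to 1999'
               ELSE 'rest'
           END ages,
           f.genres,
           COUNT(*) counter
    FROM sell_log s
    INNER JOIN films f ON f.id_film = s.id_film
    INNER JOIN customers c ON c.id_cust = s.id_cust
    GROUP BY ages, f.genres
)
SELECT c.ages, c.genres most_frequent_genre
FROM cte c
INNER JOIN (SELECT c.ages, MAX(counter) counter FROM cte c GROUP BY c.ages) g
    ON g.ages = c.ages AND g.counter = c.counter
ORDER BY c.ages
""").fetchall()
for row in rows:
    print(row)
```

If a single genre per group is required, the ties have to be broken explicitly, for example by also ordering on the genre name; the query as written deliberately keeps all of them.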

MySQL three table join with sum

I have three tables
Student
studenid stuname
101 john
102 aron
103 mary
104 lucy
Subject
studenid subjid subjname
101 1 maths
102 2 science
103 3 computer
104 4 english
Marks
subjid mark
1 50
2 40
3 55
4 60
1 40
2 55
3 60
I want output like this, where mark is the total of each student's marks:
studenid stuname mark
101 john 90
102 aron 95
103 mary 115
104 lucy 60
Thank you in advance for your help. I want this output with either a join query or a subquery, whichever is faster.
This just requires a straight left join across all tables, with an aggregation by student.
SELECT
st.studenid,
st.stuname,
COALESCE(SUM(m.mark), 0) AS mark
FROM Student st
LEFT JOIN Subject su
ON st.studenid = su.studenid
LEFT JOIN Marks m
ON su.subjid = m.subjid
GROUP BY
st.studenid,
st.stuname;
Demo
Note that if studenid is the primary key of the Student table, then strictly we only need to group by that column.
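A quick check of the LEFT JOIN plus SUM query, assuming SQLite through Python's built-in sqlite3 module as a stand-in for MySQL. Student 105 ('sam') is a hypothetical extra row with no subject, added only to show that the LEFT JOINs keep such a student and that COALESCE turns the NULL sum into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (studenid INTEGER PRIMARY KEY, stuname TEXT);
CREATE TABLE Subject (studenid INTEGER, subjid INTEGER, subjname TEXT);
CREATE TABLE Marks (subjid INTEGER, mark INTEGER);
INSERT INTO Student VALUES (101,'john'),(102,'aron'),(103,'mary'),(104,'lucy');
-- 105 is a hypothetical student with no subject, not part of the question's data
INSERT INTO Student VALUES (105,'sam');
INSERT INTO Subject VALUES (101,1,'maths'),(102,2,'science'),(103,3,'computer'),(104,4,'english');
INSERT INTO Marks VALUES (1,50),(2,40),(3,55),(4,60),(1,40),(2,55),(3,60);
""")

# The answer's query, plus ORDER BY so the output order is predictable.
rows = conn.execute("""
SELECT st.studenid, st.stuname, COALESCE(SUM(m.mark), 0) AS mark
FROM Student st
LEFT JOIN Subject su ON st.studenid = su.studenid
LEFT JOIN Marks m ON su.subjid = m.subjid
GROUP BY st.studenid, st.stuname
ORDER BY st.studenid
""").fetchall()
for row in rows:
    print(row)
```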

MYSQL - Joining Multiple tables with COUNT()

I was asked to create a database for both student tardiness and school policy violations.
Now, I have three separate tables:
tbl_ClassList:
Student_ID Student_Name
1000 Lee, Jonder
1001 Chow, Stephen
1002 Kim, Martin
1003 Johns, Kevin
1004 Hearfield, Olivia
1005 Jarrs, Marlon
tbl_Tardy:
Record_No Student_ID
1 1001
2 1001
3 1000
4 1003
5 1002
6 1003
7 1001
8 1001
9 1002
10 1004
tbl_Violation:
Record_No Student_ID
1 1000
2 1000
3 1004
4 1005
5 1001
6 1002
7 1003
I was asked to generate a list with information about each student: his/her ID, name, number of tardies and number of violations. Something like this:
Student_ID Student_Name No. of Tardy No. of Violation
1000 Lee, Jonder 1 2
1001 Chow, Stephen 4 1
1002 Kim, Martin 2 1
1003 Johns, Kevin 2 1
1004 Hearfield, Olivia 1 1
1005 Jarrs, Marlon 0 1
Is there a type of join I can use to achieve this output? Please help me.
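Compute the tardy and violation counts separately in subqueries and LEFT JOIN them to the class list table. Aggregating each table before joining avoids the row multiplication you get when both detail tables are joined to the class list at once. Use COALESCE to get zero when a student has no tardy or violation rows.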
select c.*,
       coalesce(t.count_tardy, 0) as no_of_tardy,
       coalesce(v.count_violations, 0) as no_of_violations
from tbl_ClassList c
left join (
    select student_id,
           count(*) as count_tardy
    from tbl_Tardy
    group by student_id
) t on c.student_id = t.student_id
left join (
    select student_id,
           count(*) as count_violations
    from tbl_Violation
    group by student_id
) v on c.student_id = v.student_id;
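The sketch below assumes SQLite via Python's built-in sqlite3 module in place of MySQL. It runs the subquery version and, for contrast, a naive version that LEFT JOINs both detail tables directly. In the naive version every tardy row is paired with every violation row of the same student, so the counts get multiplied: student 1000 shows 2 tardies and student 1001 shows 4 violations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_ClassList (Student_ID INTEGER, Student_Name TEXT)")
conn.execute("CREATE TABLE tbl_Tardy (Record_No INTEGER, Student_ID INTEGER)")
conn.execute("CREATE TABLE tbl_Violation (Record_No INTEGER, Student_ID INTEGER)")
conn.executemany("INSERT INTO tbl_ClassList VALUES (?, ?)", [
    (1000, "Lee, Jonder"), (1001, "Chow, Stephen"), (1002, "Kim, Martin"),
    (1003, "Johns, Kevin"), (1004, "Hearfield, Olivia"), (1005, "Jarrs, Marlon"),
])
# Record_No is just 1, 2, 3, ... in the question, so enumerate() rebuilds it.
tardy_students = [1001, 1001, 1000, 1003, 1002, 1003, 1001, 1001, 1002, 1004]
violation_students = [1000, 1000, 1004, 1005, 1001, 1002, 1003]
conn.executemany("INSERT INTO tbl_Tardy VALUES (?, ?)", enumerate(tardy_students, start=1))
conn.executemany("INSERT INTO tbl_Violation VALUES (?, ?)", enumerate(violation_students, start=1))

# The answer's approach: aggregate each table first, then LEFT JOIN.
rows = conn.execute("""
SELECT c.*,
       COALESCE(t.count_tardy, 0) AS no_of_tardy,
       COALESCE(v.count_violations, 0) AS no_of_violations
FROM tbl_ClassList c
LEFT JOIN (SELECT Student_ID, COUNT(*) AS count_tardy
           FROM tbl_Tardy GROUP BY Student_ID) t ON c.Student_ID = t.Student_ID
LEFT JOIN (SELECT Student_ID, COUNT(*) AS count_violations
           FROM tbl_Violation GROUP BY Student_ID) v ON c.Student_ID = v.Student_ID
ORDER BY c.Student_ID
""").fetchall()

# Naive alternative for contrast: joining both detail tables at once pairs every
# tardy row with every violation row of the same student, inflating the counts.
naive = conn.execute("""
SELECT c.Student_ID, COUNT(t.Record_No), COUNT(v.Record_No)
FROM tbl_ClassList c
LEFT JOIN tbl_Tardy t ON t.Student_ID = c.Student_ID
LEFT JOIN tbl_Violation v ON v.Student_ID = c.Student_ID
GROUP BY c.Student_ID
ORDER BY c.Student_ID
""").fetchall()
print(rows)
print(naive)
```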

Return rows in common with another user in SQL (Collaborative Filtering)

I'm trying to build a basic collaborative filtering recommendation system using MySQL. I have a user rating table like this:
row user_id movie_id rating
1 131 342 3 <<< User 131 has rated movie 342
2 131 312 5 <<< and also 312
3 122 312 4
4 213 342 5
5 141 342 5 <<< User 141 has rated 342
6 141 312 4 <<< and also 312 (2 movies in common)
7 141 323 3
So I'm trying to find users similar to 131: users who have rated at least two of the same movies, with ratings above 3. For the data above, it should return rows 5 and 6.
This is what I have so far:
SELECT * FROM user_ratings
WHERE rating >= 3
AND movie_id IN (SELECT movie_id FROM user_ratings WHERE user_id = 131)
AND user_id != 131
This returns:
row user_id movie_id rating
3 122 312 4 <<< Don't want these two
4 213 342 5 <<<
5 141 342 5
6 141 312 4
It returns the movies that other users have in common with 131, but I need it to show only the users who have at least two movies in common. How could I do this? I'm unsure how to proceed :(
First find the user_ids that gave a rating above 3 to at least as many of user 131's movies as user 131 has rated (that is, to all of them). Then use IN in the WHERE clause to get those users' rows:
SQL Fiddle
SELECT *
FROM user_ratings
WHERE
    user_id IN (
        SELECT user_id
        FROM user_ratings
        WHERE
            movie_id IN (SELECT movie_id FROM user_ratings WHERE user_id = 131)
            AND rating > 3
            AND user_id <> 131
        GROUP BY user_id
        HAVING
            COUNT(*) >= (SELECT COUNT(*) FROM user_ratings WHERE user_id = 131)
    )
    AND rating > 3
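A minimal check of the IN/HAVING approach on the question's sample rows, assuming SQLite through Python's built-in sqlite3 module as a stand-in for MySQL. HAVING COUNT(*) >= (number of movies user 131 rated) requires a match on all of 131's movies, which equals "at least two" here only because 131 rated exactly two. This sketch includes a user_id <> 131 filter so that user 131 can never match itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_ratings (user_id INTEGER, movie_id INTEGER, rating INTEGER)")
# Rows 1-7 from the question, in order.
conn.executemany("INSERT INTO user_ratings VALUES (?, ?, ?)", [
    (131, 342, 3), (131, 312, 5), (122, 312, 4), (213, 342, 5),
    (141, 342, 5), (141, 312, 4), (141, 323, 3),
])

rows = conn.execute("""
SELECT *
FROM user_ratings
WHERE user_id IN (
        SELECT user_id
        FROM user_ratings
        WHERE movie_id IN (SELECT movie_id FROM user_ratings WHERE user_id = 131)
          AND rating > 3
          AND user_id <> 131  -- keep user 131 from matching itself
        GROUP BY user_id
        HAVING COUNT(*) >= (SELECT COUNT(*) FROM user_ratings WHERE user_id = 131)
      )
  AND rating > 3
""").fetchall()
print(rows)
```

The result is user 141's two ratings above 3, i.e. rows 5 and 6 from the question.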
You can use a self join on movie_id to get a list of user_ids that have rated at least 2 of the same movies as user 131 at a rating of 4 or higher:
select ur2.user_id
from user_ratings ur1
join user_ratings ur2
on ur2.movie_id = ur1.movie_id
and ur2.user_id <> ur1.user_id
where ur1.user_id = 131
and ur2.rating > 3
group by ur2.user_id
having count(*) >= 2
http://sqlfiddle.com/#!9/06b56/4
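The self-join version can be checked the same way, assuming SQLite via Python's built-in sqlite3 module as a stand-in for MySQL. On the question's rows it should return only user 141, since users 122 and 213 each share just one movie with user 131.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_ratings (user_id INTEGER, movie_id INTEGER, rating INTEGER)")
# Rows 1-7 from the question, in order.
conn.executemany("INSERT INTO user_ratings VALUES (?, ?, ?)", [
    (131, 342, 3), (131, 312, 5), (122, 312, 4), (213, 342, 5),
    (141, 342, 5), (141, 312, 4), (141, 323, 3),
])

# Pair each of user 131's ratings with other users' ratings of the same movie,
# keep the other side's ratings above 3, and count shared movies per user.
rows = conn.execute("""
SELECT ur2.user_id
FROM user_ratings ur1
JOIN user_ratings ur2
  ON ur2.movie_id = ur1.movie_id
 AND ur2.user_id <> ur1.user_id
WHERE ur1.user_id = 131
  AND ur2.rating > 3
GROUP BY ur2.user_id
HAVING COUNT(*) >= 2
""").fetchall()
print(rows)
```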

Query to display the only the most recent message of each thread

Environment: PHP 5.3.5, phpMyAdmin 3.3.9
Here's a link to the issue in SQL Fiddle.
I'm trying to create a query that filters a user's message activity. The activity should list the threads ordered by their most recently added message, so I have to consider whether the user is the recipient or the sender.
I have the query structured, but I am having great difficulty grouping the threads: when I group them, the order of the most recent activity is lost.
Here is the query and its results, which capture the correct order of the last 10 messages but contain duplicate thread_ids. I want to display only the most recent message of each thread, based on the date it was sent.
SQL query:
SELECT m.date_sent, m.thread_id, m.message_id, m.sender_id,
upub2.firstname as sender_name, mr.recipient_id,
upub1.firstname as recipient_name
FROM message AS m
JOIN message_recipient AS mr ON mr.message_id = m.message_id
JOIN user_public_info AS upub1 ON upub1.user_public_info_id = mr.recipient_id
join user_public_info AS upub2 ON upub2.user_public_info_id = m.sender_id
WHERE ((m.senderDelete IS NULL OR m.senderDelete = 0) AND m.sender_id = 2 ) OR
((mr.is_delete IS NULL OR mr.is_delete = 0 ) AND mr.recipient_id = 2)
ORDER BY m.message_id;
Results:
date_sent thread_id message_id sender_id sender_name recipient_id recipient_name
2013-10-09 14:31:50 106 113 1 John 2 Mark
2013-10-09 14:30:50 107 112 2 Mark 1 John
2013-10-09 14:30:31 106 111 2 Mark 1 John
2013-10-09 09:49:58 112 110 1 John 2 Mark
2013-10-09 09:20:24 108 106 1 John 2 Mark
2013-10-07 15:46:15 107 105 1 John 2 Mark
2013-10-07 14:40:25 103 104 1 John 2 Mark
2013-10-07 14:39:37 103 103 1 John 2 Mark
2013-10-07 14:36:34 107 102 2 Mark 1 John
2013-10-07 14:36:07 106 101 2 Mark 1 John
2013-10-07 14:35:29 105 100 2 Mark 1 John
2013-10-07 12:32:50 104 99 2 Mark 1 John
2013-10-07 12:15:43 104 98 2 Mark 1 John
2013-10-07 11:46:36 104 97 2 Mark 1 John
2013-10-07 11:43:32 104 96 1 John 2 Mark
2013-10-07 11:43:17 104 95 1 John 2 Mark
2013-10-07 11:27:14 103 94 1 John 2 Mark
What I want is this:
date_sent thread_id message_id sender_id sender_name recipient_id recipient_name
2013-10-09 14:31:50 106 113 1 John 2 Mark
2013-10-09 14:30:50 107 112 2 Mark 1 John
2013-10-09 09:49:58 112 110 1 John 2 Mark
2013-10-09 09:20:24 108 106 1 John 2 Mark
2013-10-07 14:40:25 103 104 1 John 2 Mark
2013-10-07 14:35:29 105 100 2 Mark 1 John
2013-10-07 12:32:50 104 99 2 Mark 1 John
Can this be done in one single query or should I create another query based upon the results of the first query?
Thank you to anyone who can help me solve this query.
Your various data sets, queries and results do not appear to correspond with one another, so it's a little difficult to follow, but I suspect you're after something along these lines:
SELECT m.message_id m_id
, m.sender_id s_id
, m.thread_id t_id
, m.subject
, m.message
, m.date_sent
, s.firstname sender
, r.firstname recipient
FROM message m
JOIN message_recipient n
ON n.message_id = m.message_id
JOIN user_public_info s
ON s.user_public_info_id = m.sender_id
JOIN user_public_info r
ON r.user_public_info_id = n.recipient_id
JOIN (SELECT thread_id, MAX(message_id) max_message_id FROM message GROUP BY thread_id) x
ON x.thread_id = m.thread_id AND x.max_message_id = m.message_id;
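Here is a sketch of this latest-message-per-thread join using the rows from the question's result set, assuming SQLite via Python's built-in sqlite3 module in place of MySQL. The subject and message columns are left out because the question does not show them, recipients are taken from the result rows, and ORDER BY m.date_sent DESC is an addition to get the desired order. It also assumes message_id grows over time, so the highest id in a thread is its latest message (true for this data).

```python
import sqlite3

# (message_id, thread_id, sender_id, date_sent) taken from the question's result set.
MESSAGES = [
    (113, 106, 1, "2013-10-09 14:31:50"), (112, 107, 2, "2013-10-09 14:30:50"),
    (111, 106, 2, "2013-10-09 14:30:31"), (110, 112, 1, "2013-10-09 09:49:58"),
    (106, 108, 1, "2013-10-09 09:20:24"), (105, 107, 1, "2013-10-07 15:46:15"),
    (104, 103, 1, "2013-10-07 14:40:25"), (103, 103, 1, "2013-10-07 14:39:37"),
    (102, 107, 2, "2013-10-07 14:36:34"), (101, 106, 2, "2013-10-07 14:36:07"),
    (100, 105, 2, "2013-10-07 14:35:29"), (99, 104, 2, "2013-10-07 12:32:50"),
    (98, 104, 2, "2013-10-07 12:15:43"), (97, 104, 2, "2013-10-07 11:46:36"),
    (96, 104, 1, "2013-10-07 11:43:32"), (95, 104, 1, "2013-10-07 11:43:17"),
    (94, 103, 1, "2013-10-07 11:27:14"),
]
USERS = [(1, "John"), (2, "Mark")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (message_id INTEGER, thread_id INTEGER, sender_id INTEGER, date_sent TEXT)")
conn.execute("CREATE TABLE message_recipient (message_id INTEGER, recipient_id INTEGER)")
conn.execute("CREATE TABLE user_public_info (user_public_info_id INTEGER, firstname TEXT)")
conn.executemany("INSERT INTO message VALUES (?, ?, ?, ?)", MESSAGES)
# Every row in the question is between John (1) and Mark (2), so the recipient is the other user.
conn.executemany("INSERT INTO message_recipient VALUES (?, ?)",
                 [(mid, 3 - sender) for mid, _, sender, _ in MESSAGES])
conn.executemany("INSERT INTO user_public_info VALUES (?, ?)", USERS)

rows = conn.execute("""
SELECT m.message_id m_id, m.sender_id s_id, m.thread_id t_id, m.date_sent,
       s.firstname sender, r.firstname recipient
FROM message m
JOIN message_recipient n ON n.message_id = m.message_id
JOIN user_public_info s ON s.user_public_info_id = m.sender_id
JOIN user_public_info r ON r.user_public_info_id = n.recipient_id
-- keep only the highest message_id of each thread
JOIN (SELECT thread_id, MAX(message_id) max_message_id FROM message GROUP BY thread_id) x
  ON x.thread_id = m.thread_id AND x.max_message_id = m.message_id
ORDER BY m.date_sent DESC
""").fetchall()
for row in rows:
    print(row)
```

The seven rows returned correspond to the seven rows in the "What I want" listing, in the same order.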