SQL code need help understanding - mysql

I'm working on a movie database and I thought it might be a good idea to have some type of parental control in place for the future. I wrote some SQL code and it works for the most part, but i don't know why.
In my main movies table I have movies rated with the standard rating g, pg, pg-13, r, and NC-17.
Here is the SQL code i used
Select title
From movies
Where rating < "r";
I works though it still shows the NC-17 shows. If I change the r rating to NC-17 it only shows the g rated shows.
I know I can type out a longer SQL to give me the matches I want, but I want to understand why this code is performing the way it is.
Thanks for the help.

How is MySQL to know R is less than NC-17? MySQL knows how to sort numbers and letters but not movie ratings. You have to assign the ratings numbers and sort based on that.
For example:
Rating Value
------------------
G 1
PG 10
PG-13 20
R 30
NC-17 40
Than give each movie the numerical value of the rating (or use a join) and then sort on that.

SQL doesn't understand the movie rating system. The < operator looks at strings in alphabetical order. So when you say < 'R', it's looking for for all ratings that start with a letter before R in the alphabet. Since there are a limited number of options for ratings, you're best off doing something along the lines of this:
SELECT title
FROM movies
WHERE rating NOT LIKE 'R'
AND rating NOT LIKE 'NC-17'

Here is the query that would probably work for you if you want to rank them:
SELECT m.title, m.rating
FROM
(
SELECT s.title, s.rating,
CASE WHEN s.rating = 'G' THEN 1
WHEN s.rating = 'PG' THEN 2
WHEN s.rating = 'PG-13' THEN 3
WHEN s.rating = 'R' THEN 4
WHEN s.rating = 'NC-17' THEN 5
ELSE 6 END AS MovieRanking
FROM movies s
) m
WHERE m.MovieRanking < 4

Related

To find actors who were never unemployed for more than 3 years in a stretch

I'm suppose to find the actors that were never unemployed for more than 3 years at a stretch. (Assume that the actors remain unemployed between two consecutive movies).
The M_cast represents the actors and person table has the name of the actor and the movie table has the year column.
SQL code:
SELECT a.Name, c.year
FROM Person a
Inner Join M_cast b
ON a.PID = b.PID
Inner Join Movie c
ON c.MID = b.MID
This would give us all the actor name and the various years they worked , however, I'm not sure how to check if the an actor worked for 3 years in a row or not. Would appreciate your insights on this. If a similar question was asked anywhere else, please point me in the right direction.
So you've got the Actors, Movies (Work), and the Year for each of the Movies. My approach to this would be to use the LEAD() function to add a new column next_year.
A bit of background on the function:
LEAD() is a window function that provides access to a row at a specified physical offset which follows the current row.
For example, by using the LEAD() function, from the current row, you can access data of the next row, or the row after the next row, and so on.
The LEAD() function can be very useful for comparing the value of the current row with the value of the following row.
The following shows the syntax of the LEAD() function:
LEAD(return_value ,offset [,default])
OVER (
[PARTITION BY partition_expression, ... ]
ORDER BY sort_expression [ASC | DESC], ...
)
Example use:
SELECT *
, LEAD(year, 1, 0) OVER (PARTITION BY Name ORDER BY year ASC) AS next_year
FROM (yourJoinedTable)
Using (next_year - year) AS gap you can create a column of gaps between movies (years unemployed). Finally, find the MAX(gap) for each actor and filter those actors out which have a MAX(gap) of ≤ 3. This should leave you with only actors that have no more than 3 years between movies.
Hope this helps!

MySQL get duplicate rows in subquery

I want to display all duplicate records from my table, rows are like this
uid planet degree
1 1 104
1 2 109
1 3 206
2 1 40
2 2 76
2 3 302
I have many different OR statements with different combinations in subquery and I want to count every one of them which matches, but it only displays the first match of each planet and degree.
Query:
SELECT DISTINCT
p.uid,
(SELECT COUNT(*)
FROM Params AS p2
WHERE p2.uid = p.uid
AND(
(p2.planet = 1 AND p2.degree BETWEEN 320 - 10 AND 320 + 10) OR
(p2.planet = 7 AND p2.degree BETWEEN 316 - 10 AND 316 + 10)
...Some more OR statements...
)
) AS counts FROM Params AS p HAVING counts > 0 ORDER BY p.uid DESC
any solution folks?
updated
So, the problem most people have with their counting-joined-sub-query-group-queries, is that the base query isn't right, and the following may seem like a complete overkill for this question ;o)
base data
in this particular example what you would want as a data basis is at first this:
(uidA, planetA, uidB, planetB) for every combination of player A and player B planets. that one is quite simple (l is for left, r is for right):
SELECT l.uid, l.planet, r.uid, r.planet
FROM params l, params r
first step done.
filter data
now you want to determine if - for one row, meaning one pair of planets - the planets collide (or almost collide). this is where the WHERE comes in.
WHERE ABS(l.degree-r.degree) < 10
would for example only leave those pairs of planet with a difference in degrees of less than 10. more complex stuff is possible (your crazy conditional ...), for example if the planets have different diameter, you may add additional stuff. however, my advise would be, that you put some additional data that you have in your query into tables.
for example, if all 1st planets players have the same size, you could have a table with (planet_id, size). If every planet can have different sizes, add the size to the params table as a column.
then your WHERE clause could be like:
WHERE l.size+r.size < ABS(l.degree-r.degree)
if for example two big planets with size 5 and 10 should at least be 15 degrees apart, this query would find all those planets that aren't.
we assume, that you have a nice conditional, so at this point, we have a list of (uidA, planetA, uidB, planetB) of planets, that are close to colliding or colliding (whatever semantics you chose). the next step is to get the data you're actually interested in:
limit uidA to a specific user_id (the currently logged in user for example)
add l.uid = <uid> to your WHERE.
count for every planet A, how many planets B exist, that threaten collision
add GROUP BY l.uid, l.planet,
replace r.uid, r.planet with count(*) as counts in your SELECT clause
then you can even filter: HAVING counts > 1 (HAVING is the WHERE for after you have GROUPed)
and of course, you can
filter out certain players B that may not have planetary interactions with player A
add to your WHERE
r.uid NOT IN (1)
find only self collisions
WHERE l.uid = r.uid
find only non-self collisions
WHERE l.uid <> r.uid
find only collisions with one specific planet
WHERE l.planet = 1
conclusion
a structured approach where you start from the correct base data, then filter it appropriately and then group it, is usually the best approach. if some of the concepts are unclear to you, please read up on them online, there are manuals everywhere
final query could look something like this
SELECT l.uid, l.planet, count(*) as counts
FROM params l, params r
WHERE [ collision-condition ]
GROUP BY l.uid, l.planet
HAVING counts > 0
if you want to collide a non-planet object, you might want to either make a "virtual table", so instead of FROM params l, params r you do (with possibly different fields, I just assume you add a size-field that is somehow used):
FROM params l, (SELECT 240 as degree, 2 as planet, 5 as size) r
multiple:
FROM params l, (SELECT 240 as degree, 2 as planet, 5 as size
UNION
SELECT 250 as degree, 3 as planet, 10 as size
UNION ...) r

selecting count, percentage and compare in mysql

I have a MYSQL database that has two tables.
The first one called movies_interaction with the following parameters: movie_id, watcher_id, movie_duration.
the second table is movies_additional with the following parameters: movie_id and movie_length
What I need is to connect these two tables by which I need to retrieve the movies which was watched more than once AND more than 60% of its duration for each user.
This is so far what I wrote, but I know its wrong, so don't take that on me please.
SELECT watcher_id, COUNT(*)
AS video_count FROM movie_interaction
GROUP BY movie_id HAVING COUNT(*) >= 2 AND
movies_interaction.movie_duration *100 / movies_additional.movie_length >= 60
the fourth line of the code is where I need help!
Result can look like this: watcher_id1 = 9, watcher_id2 = 13...etc.
Thank you very much.
I'm not exactly sure on you need but I think that way look like that.
SELECT A.movie_id, COUNT(I.watcher_id) AS video_count
FROM movie_interaction I
INNER JOIN movie_additional A ON A.movie_id = I.movie_id AND
I.movie_duration / A.movie_length >= 0.60
GROUP BY A.movie_id
HAVING COUNT(I.watcher_id) >= 2
This will give you a list of movies watched more than one time for more that 60>%

How do I compute a ranking with MySQL stored procedures?

Let's assume we have this very simple table:
|class |student|
---------------
Math Alice
Math Bob
Math Peter
Math Anne
Music Bob
Music Chis
Music Debbie
Music Emily
Music David
Sports Alice
Sports Chris
Sports Emily
.
.
.
Now I want to find out, who I have the most classes in common with.
So basically I want a query that gets as input a list of classes (some subset of all classes)
and returns a list like:
|student |common classes|
Brad 6
Melissa 4
Chris 3
Bob 3
.
.
.
What I'm doing right now is a single query for every class. Merging the results is done on the client side. This is very slow, because I am a very hardworking student and I'm attending around 1000 classes - and so do most of the other students. I'd like to reduce the transactions and do the processing on the server side using stored procedures. I have never worked with sprocs, so I'd be glad if someone could give me some hints on how to do that.
(note: I'm using a MySQL cluster, because it's a very big school with 1 million classes and several million students)
UPDATE
Ok, it's obvious that I'm not a DB expert ;) 4 times the nearly the same answer means it's too easy.
Thank you anyway! I tested the following SQL statement and it's returning what I need, although it is very slow on the cluster (but that will be another question, I guess).
SELECT student, COUNT(class) as common_classes
FROM classes_table
WHERE class in (my_subject_list)
GROUP BY student
ORDER BY common_classes DESC
But actually I simplified my problem a bit too much, so let's make a bit it harder:
Some classes are more important than others, so they are weighted:
| class | importance |
Music 0.8
Math 0.7
Sports 0.01
English 0.5
...
Additionally, students can be more ore less important.
(In case you're wondering what this is all about... it's an analogy. And it's getting worse. So please just accept that fact. It has to do with normalizing.)
|student | importance |
Bob 3.5
Anne 4.2
Chris 0.3
...
This means a simple COUNT() won't do it anymore.
In order to find out who I have the most in common with, I want to do the following:
map<Student,float> studentRanking;
foreach (Class c in myClasses)
{
float myScoreForClassC = getMyScoreForClass(c);
List students = getStudentsAttendingClass(c);
foreach (Student s in students)
{
float studentScoreForClassC = c.classImportance*s.Importance;
studentRanking[s] += min(studentScoreForClassC, myScoreForClassC);
}
}
I hope it's not getting too confusing.
I should also mention that I myself am not in the database, so I have to tell the SELECT statement / stored procedure, which classes I'm attending.
SELECT
tbl.student,
COUNT(tbl.class) AS common_classes
FROM
tbl
WHERE tbl.class IN (SELECT
sub.class
FROM
tbl AS sub
WHERE
(sub.student = "BEN")) -- substitue "BEN" as appropriate
GROUP BY tbl.student
ORDER BY common_classes DESC;
SELECT student, COUNT(class) as common_classes
FROM classes_table
WHERE class in (my_subject_list)
GROUP BY student
ORDER BY common_classes DESC
Update re your question update.
Assuming there's a table class_importance and student_importance as you describe above:
SELECT classes.student, SUM(ci.importance*si.importance) AS weighted_importance
FROM classes
LEFT JOIN class_importance ci ON classes.class=ci.class
LEFT JOIN student_importance si ON classes.student=si.student
WHERE classes.class in (my_subject_list)
GROUP BY classes.student
ORDER BY weighted_importance DESC
The only thing this doesn't have is the LEAST(weighted_importance, myScoreForClassC) because I don't know how you calculate that.
Supposing you have another table myScores:
class | score
Math 10
Sports 0
Music 0.8
...
You can combine it all like this (see the extra LEAST inside the SUM):
SELECT classes.student, SUM(LEAST(m.score,ci.importance*si.importance)) -- min
AS weighted_importance
FROM classes
LEFT JOIN class_importance ci ON classes.class=ci.class
LEFT JOIN student_importance si ON classes.student=si.student
LEFT JOIN myScores m ON classes.class=m.class -- add in myScores
WHERE classes.class in (my_subject_list)
GROUP BY classes.student
ORDER BY weighted_importance DESC
If your myScores didn't have a score for a particular class and you wanted to assign some default, you could use IFNULL(m.score,defaultvalue).
As I understand your question, you can simply run a query like this:
SELECT `student`, COUNT(`class`) AS `commonClasses`
FROM `classes_to_students`
WHERE `class` IN ('Math', 'Music', 'Sport')
GROUP BY `student`
ORDER BY `commonClasses` DESC
Do you need to specify the classes? Or could you just specify the student? Knowing the student would let you get their classes and then get the list of other students who share those classes.
SELECT
otherStudents.Student,
COUNT(*) AS sharedClasses
FROM
class_student_map AS myClasses
INNER JOIN
class_student_map AS otherStudents
ON otherStudents.class = myClasses.class
AND otherStudents.student != myClasses.student
WHERE
myClasses.student = 'Ben'
GROUP BY
otherStudents.Student
EDIT
To follow up your edit, you just need to join on the new table and do your calculation.
Using the SQL example you gave in the edit...
SELECT
classes_table.student,
MIN(class_importance.importance * student_importance.importance) as rank
FROM
classes_table
INNER JOIN
class_important
ON classes_table.class = class_importance.class
INNER JOIN
student_important
ON classes_table.student = student_importance.student
WHERE
classes_table.class in (my_subject_list)
GROUP BY
classes_table.student
ORDER BY
2

In MySQL, can I have a table returning the ten last rated games by rating?

The actual question is a little more complex than that, so here goes.
I have a website which reviews games. Ratings/reviews are posted for each game, and so I have a MySQL database to handle it all.
Thing is, I'd really like a page that showed what score (out of 10) meant what, and to illustrate it would have the game that was last reviewed as an example. I can always do it without, but this would be cooler.
So the query should return something like this (but running from 10 to 0):
|---------------*----------------*-----------------*-----------------|
* game.gameName | game.gameImage | review.ourScore | review.postedOn *
|---------------*----------------*-----------------*-----------------|
| Top Game | img | 10 | (unix timestamp)|
| NearlyTop Game| img | 9 | (unix timestamp)|
| Great Game | img | 8 | (unix timestamp)|
|---------------*----------------*-----------------*-----------------|
The information is in two tables, game and review. I think you'd use MAX() to find out the last timestamp and corresponding game information, but as far as complex queries go, I'm in way over my head.
Of course this could be done with 10 simple SELECTs but I'm sure there must be a way to do this in one query.
Thanks for any help.
Here is an ugly solution I found:
This query simply gets the IDs and scores of the reviews that you want to look at. I have included it so that you can understand what the trick is, without getting distracted by other stuff:
SELECT * FROM
(SELECT reviewID, ourScore FROM review ORDER BY postedOn DESC) as `r`
GROUP BY ourScore
ORDER BY ourScore DESC;
This exploits MySQL's 'GROUP BY' behavior. When the grouping is done, if the source rows have different values for different columns, then the value of the topmost source row is used. So if you had rows in this order:
reviewId Score
1 3
0 3
2 3
Then after you group by score, the reviewId is 1 because that row was on the top:
reviewId Score
1 3
So we want to put the most recent review on the top before we do the group by. Since ORDERing is always dones after grouping, in a single SELECT statement, I had to make a subquery to accomplish this. Now we just dress up this query a little bit to get all the fields you wanted:
SELECT `r`.*, game.gameName, game.gameImage FROM
(SELECT reviewID, ourScore, postedOn, gameID FROM review ORDER BY **postedOn DESC**) as `r`
JOIN game ON `r`.gameID = game.gameID
GROUP BY ourScore
ORDER BY ourScore DESC;
That should work.
SELECT DISTINCT game.gameName, game.gameImage, review.ourScore FROM game
LEFT JOIN review
ON game.ID = review.gameID
ORDER BY review.postedOn
LIMIT 10
Or something like that, check out how to use the Distinct first, I'm not sure on the syntax, and you may have to tell the ORDER BY DESC or ASC depending on what you want.
Well..
SELECT game.gameName, game.gameImage, review.ourScore
FROM game
LEFT JOIN review ON game.gameID = review.gameID
GROUP BY review.ourScore DESC
LIMIT 10
returns a list of games grouped by each individual score. But this isn't what I want, I want the game that is last posted - this is why the timestamp is important. With that query, MySQL returns the first result it can find.
I think this would work:
select g.gameName, g.gameImage, r.ourScore, r.postedOn
from game g, review r
where g.gameId = r.gameId
and r.postedOn = (select max(sr.postedOn)
from review sr where sr.ourScore = r.ourScore)
group by r.ourScore
order by r.ourScore desc;
Edit: above SQL was corrected after David Grayson's comment. I think this query is pretty easy to understand but probably performs poorly compared with his solution.