I have a movie database with one table for actors and another for movies. I created a third table to record an actor's participation in a movie, and added a "star" field to distinguish leading actors from non-leading ones.
I want to create a list ordered by each actor's importance, that is, by their total number of "stars".
SELECT a.id, a.name, COUNT( p.star ) AS star
FROM actors a
JOIN playing p, movies m
WHERE p.id_actor = a.id
AND p.id_movie = m.id
AND p.star =1
GROUP BY p.star
ORDER BY p.star DESC;
ACTORS
+----+---------+
| id | name |
+----+---------+
| 1 | actor01 |
| 2 | actor02 |
| 3 | actor03 |
+----+---------+
MOVIES
+----+----------+
| id | title |
+----+----------+
| 1 | movie01 |
| 2 | movie02 |
| 3 | movie03 |
+----+----------+
PLAYING
+----------+----------+-------+------+
| id_movie | id_actor | char | star |
+----------+----------+-------+------+
| 1 | 1 | char1 | 0 |
| 1 | 2 | char2 | 1 |
| 2 | 3 | char3 | 1 |
+----------+----------+-------+------+
I need output like:
+----------+--------------+
| actor | protagonist |
+----------+--------------+
| actor01 | 2 times |
| actor02 | 3 times |
+----------+--------------+
You need to fix the GROUP BY clause so that it groups by the actor, not by the star column. You also need to fix the ORDER BY so that it orders by the aggregated column, not the original column:
SELECT a.id, a.name, SUM(p.star = 1) AS stars
FROM actors a
JOIN playing p ON p.id_actor = a.id
JOIN movies m ON p.id_movie = m.id
GROUP BY a.id, a.name
ORDER BY stars DESC;
Along the way, I fixed the FROM clause so that it uses proper JOIN syntax (with an ON clause). I also changed the query so that it returns every actor who appears in playing, even those who have never been the star (they get a count of 0).
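To check the grouped query against the question's sample data, here is a minimal sketch that runs it in SQLite through Python's sqlite3 module. SQLite is only a stand-in here (the original question appears to target MySQL), and the fourth playing row is a hypothetical addition so that one actor has more than one star.

```python
import sqlite3

# In-memory database holding the question's sample tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actors  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE playing (id_movie INTEGER, id_actor INTEGER, "char" TEXT, star INTEGER);
    INSERT INTO actors VALUES (1, 'actor01'), (2, 'actor02'), (3, 'actor03');
    INSERT INTO playing VALUES
        (1, 1, 'char1', 0),
        (1, 2, 'char2', 1),
        (2, 3, 'char3', 1),
        (3, 2, 'char4', 1);  -- hypothetical extra row, not in the question
""")

# Group by the actor and sum the boolean (star = 1), which is 1 or 0 per row.
star_counts = conn.execute("""
    SELECT a.name, SUM(p.star = 1) AS stars
    FROM actors a
    JOIN playing p ON p.id_actor = a.id
    GROUP BY a.id, a.name
    ORDER BY stars DESC, a.name
""").fetchall()

for name, stars in star_counts:
    print(f"{name}: {stars} times")
```

Because the grouping is per actor, actor01 still shows up with 0 stars instead of disappearing from the result.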
1. If you want to count all stars for an actor, you should group by actor, not by star. (Unless you want to count how many times an actor gets 1 star in a movie, you probably don't want to group by star.)
2. You may want to use ON with JOIN.
3. You may want to ORDER BY the aliased result rather than ORDER BY p.star, since you want to order by the aggregated value.
4. You may want to use SUM instead of COUNT to get the star counts. (SUM adds up the values, whereas COUNT counts the rows with a non-NULL value. With SUM, you can set star to whatever value you want without changing your SQL. You can have star=2 to show that the actor is important to the movie, or star=-1 to mean that the actor stinks.)
You may have a look at the SQL below:
SELECT a.id, a.name, COALESCE(SUM(p.star), 0) AS stars
FROM actors a
LEFT JOIN playing p ON p.id_actor = a.id
LEFT JOIN movies m ON p.id_movie = m.id
GROUP BY a.id, a.name
ORDER BY stars DESC;
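The difference between COUNT and SUM in point 4 is easy to see on the sample data. This is a small sketch using SQLite via Python's sqlite3 as a stand-in engine; actor04 is a hypothetical actor with no roles, added to show what the LEFT JOIN does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actors  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE playing (id_movie INTEGER, id_actor INTEGER, star INTEGER);
    INSERT INTO actors VALUES
        (1, 'actor01'), (2, 'actor02'), (3, 'actor03'),
        (4, 'actor04');  -- hypothetical actor with no roles at all
    INSERT INTO playing VALUES (1, 1, 0), (1, 2, 1), (2, 3, 1);
""")

# COUNT(p.star) counts rows where star is not NULL (including star = 0);
# SUM(p.star) adds the values, and is NULL when the actor has no rows.
totals = conn.execute("""
    SELECT a.name,
           COUNT(p.star)            AS counted,
           COALESCE(SUM(p.star), 0) AS summed
    FROM actors a
    LEFT JOIN playing p ON p.id_actor = a.id
    GROUP BY a.id, a.name
    ORDER BY a.id
""").fetchall()

for row in totals:
    print(row)
```

actor01 has one non-starring role, so COUNT reports 1 while SUM reports 0. That is why COUNT(p.star) only gives the right answer when combined with a filter such as star = 1.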
Please see the picture for the error screenshot.
Table: Candidate
+-----+---------+
| id | Name |
+-----+---------+
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
| 5 | E |
+-----+---------+
Table: Vote
+-----+--------------+
| id | CandidateId |
+-----+--------------+
| 1 | 2 |
| 2 | 4 |
| 3 | 3 |
| 4 | 2 |
| 5 | 5 |
+-----+--------------+
id is the auto-increment primary key; CandidateId references the id column of the Candidate table.
Write a SQL query to find the name of the winning candidate. For the example above, it should return the winner, B.
+------+
| Name |
+------+
| B |
+------+
Notes:
You may assume there is no tie, in other words there will be at most one winning candidate.
Why doesn't this code work? I am trying to do it without LIMIT.
SELECT c.Name AS Name
FROM Candidate AS c
JOIN
(SELECT r.CandidateId AS can, MAX(r.Total_vote) AS big
FROM (SELECT CandidateId, COUNT(id) AS Total_vote
FROM Vote
GROUP BY CandidateId) AS r) AS v
ON c.id = v.can;
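In your query, here: SELECT r.CandidateId AS can, MAX(r.Total_vote) AS big
you use the MAX aggregate function alongside the non-aggregated column r.CandidateId without a GROUP BY, which is not valid standard SQL. MySQL accepts it only when ONLY_FULL_GROUP_BY is disabled, and then the CandidateId it returns is not guaranteed to come from the row with the maximum.
Try: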
SELECT Candidate.* FROM Candidate
JOIN (
SELECT CandidateId, COUNT(id) AS Total_vote
FROM Vote
GROUP BY CandidateId
ORDER BY COUNT(id) DESC LIMIT 1
) v
ON Candidate.id = v.CandidateId
This is a join/group by query with order by:
select c.name
from candidate c join
vote v
on v.candidateid = c.id
group by c.id, c.name
order by count(*) desc
limit 1;
SELECT c.Name AS Name
FROM Candidate AS c
JOIN (SELECT r.CandidateId AS can
      FROM (SELECT CandidateId, COUNT(id) AS Total_vote
            FROM Vote
            GROUP BY CandidateId) AS r
      WHERE r.Total_vote = (SELECT MAX(r2.Total_vote)
                            FROM (SELECT CandidateId, COUNT(id) AS Total_vote
                                  FROM Vote
                                  GROUP BY CandidateId) AS r2)) AS v
ON c.id = v.can;
The query above is my updated code.
My original code had two errors. The first is that "use of an aggregate like MAX requires a GROUP BY clause if there are any non-aggregated columns in the select list", though I'm not sure why my previous code still ran without an error. Maybe the system adds the GROUP BY automatically when it runs.
The second is that MAX can't be combined with GROUP BY in this form.
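For a LIMIT-free version, one option is to compare each candidate's vote count with the maximum count in a HAVING clause. The sketch below is not taken from any of the answers; it runs in SQLite through Python's sqlite3 as a stand-in engine, and the helper names (make_db, winners, WINNERS_SQL) are made up for illustration. A side effect of this form is that it returns every tied winner rather than one arbitrary row.

```python
import sqlite3

def make_db(votes):
    """Build the Candidate/Vote tables in memory; `votes` lists CandidateId values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Candidate (id INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute(
        "CREATE TABLE Vote (id INTEGER PRIMARY KEY AUTOINCREMENT, CandidateId INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Candidate VALUES (?, ?)",
        [(1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E")],
    )
    conn.executemany("INSERT INTO Vote (CandidateId) VALUES (?)", [(v,) for v in votes])
    return conn

# Keep only the candidates whose vote count equals the highest vote count.
WINNERS_SQL = """
    SELECT c.Name
    FROM Candidate c
    JOIN Vote v ON v.CandidateId = c.id
    GROUP BY c.id, c.Name
    HAVING COUNT(*) = (SELECT MAX(total)
                       FROM (SELECT COUNT(*) AS total
                             FROM Vote
                             GROUP BY CandidateId) AS per_candidate)
    ORDER BY c.Name
"""

def winners(votes):
    conn = make_db(votes)
    return [name for (name,) in conn.execute(WINNERS_SQL)]

print(winners([2, 4, 3, 2, 5]))     # the votes from the question
print(winners([2, 4, 3, 2, 5, 4]))  # hypothetical extra vote that creates a tie
```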
I'm sure I'm not the first to need to do this, but I couldn't find a similar question that accounted for the nuance.
I have 3 tables (fav_food, fav_color, and fav_place) that all follow a similar pattern:
userid | label  | rank
========================
1      | red    | 1
1      | green  | 2
1      | orange | 3
2      | blue   | 1
2      | red    | 2
...
Each table will have at most 3 items per userid, but some users might have fewer than 3, and other users might have none for a given table. i.e., it's possible user 10 has 2 favorite colors, 1 favorite food, and 0 favorite places.
I'm looking for a query that can output my data like so:
userid | fav_food | fav_place | fav_color | rank
===========================================================
1 | pizza | New York | red | 1
1 | burgers | NULL | green | 2
1 | NULL | NULL | orange | 3
2 | tacos | Chicago | blue | 1
2 | burgers | Orlando | red | 2
...
Basically, all rank 1 items together, all rank 2 items together, and all rank 3 items together (with NULLs where no item of that rank exists).
I was able to get it working using 3 separate queries (one for each table) + post processing at the application layer, but for the sake of my personal knowledge base, I was wondering if anyone knew how to do it in a single query.
Many thanks!
@Isick,
You can do this by building a table containing just the userid and rank pairs and LEFT OUTER JOINing each of the three tables to it.
select user_rank.userid, user_rank.rank, f.food, p.place, c.color
from (
    select userid, rank from fav_food
    union
    select userid, rank from fav_place
    union
    select userid, rank from fav_color
) user_rank
left outer join (select userid, rank, label as food from fav_food) f
    on user_rank.userid = f.userid and user_rank.rank = f.rank
left outer join (select userid, rank, label as place from fav_place) p
    on user_rank.userid = p.userid and user_rank.rank = p.rank
left outer join (select userid, rank, label as color from fav_color) c
    on user_rank.userid = c.userid and user_rank.rank = c.rank
order by user_rank.userid, user_rank.rank;
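Here is a small runnable sketch of the same idea, using SQLite through Python's sqlite3 as a stand-in engine. The food and place rows are reconstructed from the desired output in the question, and the column is quoted as "rank" as a precaution, since RANK is a reserved word in some engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Sample rows as (userid, label, rank), reconstructed from the question.
sample = {
    "fav_food":  [(1, "pizza", 1), (1, "burgers", 2), (2, "tacos", 1), (2, "burgers", 2)],
    "fav_place": [(1, "New York", 1), (2, "Chicago", 1), (2, "Orlando", 2)],
    "fav_color": [(1, "red", 1), (1, "green", 2), (1, "orange", 3),
                  (2, "blue", 1), (2, "red", 2)],
}
for table, rows in sample.items():
    conn.execute(f'CREATE TABLE {table} (userid INTEGER, label TEXT, "rank" INTEGER)')
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)

# The UNION yields every (userid, rank) pair that exists in any table;
# each LEFT JOIN then fills in that table's label, or NULL when it is missing.
merged = conn.execute("""
    SELECT ur.userid, f.label, p.label, c.label, ur."rank"
    FROM (
        SELECT userid, "rank" FROM fav_food
        UNION
        SELECT userid, "rank" FROM fav_place
        UNION
        SELECT userid, "rank" FROM fav_color
    ) ur
    LEFT JOIN fav_food  f ON f.userid = ur.userid AND f."rank" = ur."rank"
    LEFT JOIN fav_place p ON p.userid = ur.userid AND p."rank" = ur."rank"
    LEFT JOIN fav_color c ON c.userid = ur.userid AND c."rank" = ur."rank"
    ORDER BY ur.userid, ur."rank"
""").fetchall()

for row in merged:
    print(row)
```

UNION (rather than UNION ALL) matters here: it removes duplicate (userid, rank) pairs, so each pair produces exactly one output row.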
There are 3 tables: category, program, and video. Every video belongs to a program, and every program belongs to a category.
Category table
id | title
1 | cartoons
2 | documental
Program table
id | programcode | title | category_id
1 | WUCU | Program Name | 1
2 | ELKI | Program Name | 2
Video table
id | videocode | title | program_id
1 | ELKI00001 | Name | 2
2 | ELKI00002 | Name | 2
3 | ELKI00003 | Name | 2
4 | WUCU00001 | Name | 1
5 | WUCU00002 | Name | 1
6 | WUCU00003 | Name | 1
I need to get the last 2 videos for every category.
The problem: MySQL before 8.0 has no built-in way to get the top N rows per group (8.0 added window functions such as ROW_NUMBER()). So we have to do it ourselves. This means we need a way to group the sets together (Category.Title) and a way to know which videos to return for each category. We'd like to use LIMIT to limit the results to 2, but LIMIT can't be applied per category. We'd also like to use MAX to get the highest video ID for each category, but that doesn't get us the second one. So we have to build those in ourselves.
This is built using the logic found at:
http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/
but altered to fit your data set.
This is working, but I don't know why I have to do multiple subselects for @Rnum and @VCat. Not sure why they have to be separated at this point.
SET @Rnum := 0, @VCat := '';
SELECT * FROM (
    -- Restart the row number at 1 whenever the category changes.
    SELECT SequencedSet.*, @Rnum := IF(mvcat = CTitle, @Rnum + 1, 1) AS RowNumber FROM (
        -- mvcat holds the previous row's category; @VCat is then set to the current one.
        SELECT CTitle, VCode, VTitle, VID, @VCat AS mvcat,
               @VCat := CTitle AS VCAT
        FROM (
            SELECT C.Title AS CTitle, V.videocode AS VCode, V.Title AS VTitle, V.ID AS VID
            FROM Video V
            INNER JOIN Program P
                ON P.ID = V.Program_Id
            INNER JOIN Category C
                ON C.ID = P.Category_ID
            ORDER BY C.Title, V.ID DESC) OrderedSet) SequencedSet) X
WHERE X.RowNumber <= 2;
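On engines with window functions (MySQL 8.0+, SQLite 3.25+), the same top-N-per-group result can be had without user variables. The following is a sketch of that alternative, not the answer's method; it runs in SQLite through Python's sqlite3 and assumes, like the query above, that "last" means the highest video id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE program  (id INTEGER PRIMARY KEY, programcode TEXT, title TEXT,
                           category_id INTEGER);
    CREATE TABLE video    (id INTEGER PRIMARY KEY, videocode TEXT, title TEXT,
                           program_id INTEGER);
    INSERT INTO category VALUES (1, 'cartoons'), (2, 'documental');
    INSERT INTO program  VALUES (1, 'WUCU', 'Program Name', 1),
                                (2, 'ELKI', 'Program Name', 2);
    INSERT INTO video VALUES
        (1, 'ELKI00001', 'Name', 2), (2, 'ELKI00002', 'Name', 2),
        (3, 'ELKI00003', 'Name', 2), (4, 'WUCU00001', 'Name', 1),
        (5, 'WUCU00002', 'Name', 1), (6, 'WUCU00003', 'Name', 1);
""")

# Number the videos inside each category, newest (highest id) first,
# then keep only the first two of every category.
last_two = conn.execute("""
    SELECT ctitle, videocode
    FROM (
        SELECT c.title AS ctitle, v.videocode, v.id AS vid,
               ROW_NUMBER() OVER (PARTITION BY c.id ORDER BY v.id DESC) AS rn
        FROM video v
        JOIN program  p ON p.id = v.program_id
        JOIN category c ON c.id = p.category_id
    ) ranked
    WHERE rn <= 2
    ORDER BY ctitle, vid DESC
""").fetchall()

for row in last_two:
    print(row)
```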
I have 2 tables that look like this:
users (uid, name)
-------------------
| 1 | User 1 |
| 2 | User 2 |
| 3 | User 3 |
| 4 | User 4 |
| 5 | User 5 |
-------------------
highscores (user_id, time)
-------------------
| 3 | 12005 |
| 3 | 29505 |
| 3 | 17505 |
| 5 | 19505 |
-------------------
I want to query only for users that have a highscore and only the top highscore of each user. The result should look like:
------------------------
| User 3 | 29505 |
| User 5 | 19505 |
------------------------
My query looks like this:
SELECT users.name, highscores.time
FROM users
INNER JOIN highscores ON users.uid = highscores.user_id
ORDER BY time ASC
LIMIT 0 , 10
Currently this returns multiple highscores for the same user. I also tried grouping them, but that did not work: it returned not the best result but an arbitrary one (e.g. for user id 3 it returned 17505 instead of 29505).
Many thanks!
You should use the aggregate function MAX() together with a GROUP BY clause.
SELECT a.name, MAX(b.`time`) maxTime
FROM users a
INNER JOIN highscores b
on a.uid = b.user_id
GROUP BY a.name
Your attempt at grouping users was correct. You just needed to use the MAX(time) aggregate function instead of selecting time directly.
I think your older query looked like this:
SELECT name, time
FROM users
INNER JOIN highscores ON users.uid = highscores.user_id
GROUP BY name,time
But the actual query should be:
SELECT users.name, MAX(`time`) AS topScore
FROM users
INNER JOIN highscores ON users.uid = highscores.user_id
GROUP BY users.name
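To see the MAX() plus GROUP BY approach on the question's data, here is a minimal sketch that runs it in SQLite through Python's sqlite3 (a stand-in for MySQL). Grouping by uid as well as name is my own precaution, so that two users who share a name would not be merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users      (uid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE highscores (user_id INTEGER, "time" INTEGER);
    INSERT INTO users VALUES
        (1, 'User 1'), (2, 'User 2'), (3, 'User 3'), (4, 'User 4'), (5, 'User 5');
    INSERT INTO highscores VALUES (3, 12005), (3, 29505), (3, 17505), (5, 19505);
""")

# The INNER JOIN drops users without a highscore;
# MAX() collapses each remaining user's rows to the best one.
top_scores = conn.execute("""
    SELECT u.name, MAX(h."time") AS top_score
    FROM users u
    JOIN highscores h ON h.user_id = u.uid
    GROUP BY u.uid, u.name
    ORDER BY top_score DESC
""").fetchall()

for name, score in top_scores:
    print(name, score)
```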
I have a list of publications stored in publications table. Each publication has a many-to-many relation with categories and also a many-to-many relation with keywords.
Given a publication I'd like to find related ones based on a score value computed with the following algorithm:
each shared category with other publications counts as one point
each shared keyword with other publications counts as one point
the score value is the sum of the points computed with previous steps
I want to retrieve with a single query the list of related publications ordered by this score.
Right now I have these two queries, which compute the score for categories and for keywords. For the category score:
SELECT c.publication_id, (COUNT(c.category_id)) AS cscore
FROM cat_pub c
WHERE c.category_id IN <list of category ids obtained from the current publication>
GROUP BY c.publication_id
ORDER BY cscore DESC
And for the keyword score:
SELECT k.publication_id, (COUNT(k.keyword_id)) AS kscore
FROM key_pub k
WHERE k.keyword_id IN <list of keyword ids obtained from the current publication>
GROUP BY k.publication_id
ORDER BY kscore DESC
Finally, I need to JOIN the resulting query with a SELECT query that retrieves the publication data (title, intro, etc.), ordered by score and with a LIMIT clause, to get the most relevant publications related to the selected one.
So far I have tried using these two queries as derived tables in a join:
SELECT p.*, (q1.cscore + q2.kscore) AS score
FROM publications p
INNER JOIN (<cscore query>) q1 ON p.id = q1.publication_id
INNER JOIN (<kscore query>) q2 ON p.id = q2.publication_id
ORDER BY score DESC
LIMIT 5
EXPLAIN shows me that a couple of temporary tables will be used. Could that be a performance problem? Is there a better way to implement this?
Update
To answer Johan's comment:
Your solution is wrong. Using a LIMIT clause in the subqueries can lead to inconsistent results, whatever value you choose for the limit. What if I have the following results for the subqueries? (I'll show 11 records, but your query will fetch only the first ten.)
+-------+--------+ +-------+--------+
| p.id | cscore | | p.id | kscore |
+-------+--------+ +-------+--------+
| 27854 | 100 | | 27865 | 100 |
| 27853 | 100 | | 27864 | 100 |
| 27852 | 100 | | 27863 | 100 |
| 27851 | 100 | | 27862 | 100 |
| 27850 | 100 | | 27861 | 100 |
| 27849 | 100 | | 27860 | 100 |
| 27848 | 100 | | 27859 | 100 |
| 27847 | 100 | | 27858 | 100 |
| 27846 | 100 | | 27857 | 100 |
| 27845 | 100 | | 27856 | 100 |
| 27844 | 100 | | 27855 | 100 |
| 1000 | 99 | | 1000 | 99 |
+-------+--------+ +-------+--------+
If I have ten records with 100 as cscore and ten different records with 100 as kscore, the join will produce an empty set. So I get no result at all, while the publication with id 1000 should be the answer and is left out of the result set.
Furthermore, I could try your solution with a LEFT JOIN. In that case only records from the left table are fetched, and each record gets a total score of 100 at best (the missing kscore from the second table is NULL, so the sum needs COALESCE to avoid being NULL itself). Again, the result is wrong, because the highest-scored record should be p1000 with a total score of 198 (= 99 + 99).
Your solution cannot produce reliable results.
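One way to sidestep both problems (the inner join dropping publications that match on only one side, and LIMIT cutting the subqueries too early) is to stack the category matches and keyword matches with UNION ALL and count them per publication. This is only a sketch of that alternative, run in SQLite through Python's sqlite3. The rows and the name RELATED_SQL are hypothetical, and it assumes each (publication_id, category_id) and (publication_id, keyword_id) pair is stored once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cat_pub (publication_id INTEGER, category_id INTEGER);
    CREATE TABLE key_pub (publication_id INTEGER, keyword_id INTEGER);
    -- Hypothetical data: publication 1 is the current one.
    INSERT INTO cat_pub VALUES (1, 10), (1, 11), (2, 10), (3, 10), (3, 11), (5, 12);
    INSERT INTO key_pub VALUES (1, 20), (1, 21), (2, 20), (2, 21), (4, 20), (5, 22);
""")

# Each row of the UNION ALL is one shared category or keyword, i.e. one point.
RELATED_SQL = """
    SELECT publication_id, COUNT(*) AS score
    FROM (
        SELECT o.publication_id
        FROM cat_pub o
        JOIN cat_pub cur ON cur.category_id = o.category_id
        WHERE cur.publication_id = :pid AND o.publication_id <> :pid
        UNION ALL
        SELECT o.publication_id
        FROM key_pub o
        JOIN key_pub cur ON cur.keyword_id = o.keyword_id
        WHERE cur.publication_id = :pid AND o.publication_id <> :pid
    ) matches
    GROUP BY publication_id
    ORDER BY score DESC, publication_id
    LIMIT 5
"""

related = conn.execute(RELATED_SQL, {"pid": 1}).fetchall()
print(related)
```

The LIMIT is applied only once, after the scores have been summed, so a publication with a middling score on both sides can still come out on top. Joining the result back to publications for the title and intro is left out to keep the sketch short.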
You only want 5 results from each of the subqueries.
I think it is best to select only the top rows from them and use those in the query.
Rewrite q1 as:
SELECT c.publication_id, COUNT(*) AS cscore
FROM cat_pub c
WHERE c.publication_id = p.id
AND c.category_id IN <list of category ids obtained from the current publication>
GROUP BY c.publication_id
ORDER BY cscore DESC
LIMIT 10
Rewrite q2 as:
SELECT k.publication_id, COUNT(*) AS kscore
FROM key_pub k
WHERE p.id = k.publication_id
AND k.keyword_id IN <list of keyword ids obtained from the current publication>
GROUP BY k.publication_id
ORDER BY kscore DESC
LIMIT 10
Leave the join as is:
SELECT p.*, (q1.cscore + q2.kscore) AS score
FROM publications p
INNER JOIN (<cscore query>) q1 ON p.id = q1.publication_id
INNER JOIN (<kscore query>) q2 ON p.id = q2.publication_id
ORDER BY score DESC
LIMIT 5
Note that COUNT(*) is usually the faster choice, because it does not test for NULL. If you can have NULL values and don't want to include them in the count, then name the column explicitly: COUNT(field).
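The difference between the two forms only shows up when the column can be NULL. A tiny sketch, using SQLite through Python's sqlite3 as a stand-in engine and a hypothetical key_pub row with a NULL keyword_id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE key_pub (publication_id INTEGER, keyword_id INTEGER)")
conn.executemany(
    "INSERT INTO key_pub VALUES (?, ?)",
    [(1, 20), (1, None), (2, 21)],  # hypothetical rows; None is stored as NULL
)

# COUNT(*) counts rows; COUNT(keyword_id) skips rows where keyword_id is NULL.
all_rows, non_null = conn.execute(
    "SELECT COUNT(*), COUNT(keyword_id) FROM key_pub"
).fetchone()

print(all_rows, non_null)
```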