I was trying to solve problem 15 under http://sqlzoo.net/wiki/More_JOIN_operations
I don't understand why my query is wrong, even though my output is like it's supposed to be.
Here's my query:
SELECT movie.title, COUNT(actorid)
FROM movie JOIN casting on movie.id=movieid
WHERE yr='1978'
GROUP BY casting.movieid
ORDER BY COUNT(casting.actorid) DESC
And the official answer:
SELECT title, COUNT(actorid)
FROM casting,movie
WHERE yr=1978
AND movieid=movie.id
GROUP BY title
ORDER BY 2 DESC
If I just change the ORDER BY in my query from ORDER BY COUNT(casting.actorid) DESC to ORDER BY 2 DESC the answer is accepted (correct). Any reason for this?
All three of these should be accepted:
SELECT m.title, COUNT(c.actorid) as NumActors
FROM movie m JOIN
casting c
on m.id= c.movieid
WHERE yr = '1978'
GROUP BY c.movieid
ORDER BY COUNT(c.actorid) DESC
and:
ORDER BY 2 DESC
and:
ORDER BY NumActors DESC
Kudos for using proper explicit JOIN syntax. Simple rule: do not use commas in the FROM clause.
As a note: I think sorting by column position (ORDER BY 2) has been dropped from the more recent ANSI SQL standards, so it might be removed from some future version of ANSI-compliant databases.
Yes - you can sort by the column ordinal. However, be aware that if the select list changes (columns are added or removed, the column order is changed, etc.) you will see unexpected results unless the ORDER BY clause is modified to reflect the changes.
Some RDBMSs will allow you to give the column an alias and use it in the ORDER BY clause. Not sure if MySQL is one of those, however.
SELECT movie.title, COUNT(actorid) NumActors
FROM movie JOIN casting on movie.id=movieid
WHERE yr='1978'
GROUP BY casting.movieid
ORDER BY NumActors DESC
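To see that the three ORDER BY spellings discussed above really are interchangeable, here is a minimal sketch using Python's built-in sqlite3 module. The rows are made up for illustration, and SQLite stands in for whatever engine sqlzoo runs, so treat it as a demonstration of the idea rather than of that site's checker.

```python
import sqlite3

# Hypothetical miniature of the movie/casting tables (made-up rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, yr INTEGER);
    CREATE TABLE casting (movieid INTEGER, actorid INTEGER);
    INSERT INTO movie VALUES
        (1, 'Film A', 1978), (2, 'Film B', 1978), (3, 'Film C', 1980);
    INSERT INTO casting VALUES
        (1, 10), (1, 11), (1, 12), (2, 10), (3, 11), (3, 12);
""")

base = """
    SELECT m.title, COUNT(c.actorid) AS NumActors
    FROM movie m JOIN casting c ON m.id = c.movieid
    WHERE m.yr = 1978
    GROUP BY m.id, m.title
    ORDER BY {} DESC
"""

# Sort by the aggregate expression, by column position, and by alias.
results = [conn.execute(base.format(key)).fetchall()
           for key in ("COUNT(c.actorid)", "2", "NumActors")]
print(results[0])
```

All three variants return the same rows in the same order here; the ordinal form is the one that silently changes meaning if the select list is later reordered.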
I was hoping someone could point me in the right direction as to why my data is not ordered by the flight_count column?
SELECT pilot_id,
pilot_firstname,
pilot_lastname,
pilot_email,
licence_num,
flight_count
FROM pilots
INNER JOIN
( SELECT pilot_id,
COUNT(flight_id) AS 'flight_count'
FROM flights
GROUP BY pilot_id
ORDER BY flight_count DESC
) as a
USING (pilot_id);
Move the ordering criteria to the outer select:
SELECT p.pilot_id, p.pilot_firstname, p.pilot_lastname, p.pilot_email, p.licence_num,
fc.flight_count
FROM pilots p
JOIN (
SELECT pilot_id, COUNT(flight_id) AS flight_count
FROM flights
GROUP BY pilot_id
) as fc
on fc.pilot_id = p.pilot_id
ORDER BY fc.flight_count DESC;
Note that you should not be 'quoting' column aliases to delimit them; the name is fine as-is. It's also generally a good idea to use (meaningful) table aliases explicitly: it helps with readability, and it means there's less work for the query optimizer to do when columns are explicitly qualified.
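A small sketch of the outer-ORDER BY version, run against SQLite through Python's sqlite3 module. The table contents are invented and the column list is trimmed to keep it short; only the shape of the query follows the answer above.

```python
import sqlite3

# Hypothetical data: three pilots with 1, 3 and 2 flights respectively.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pilots (pilot_id INTEGER PRIMARY KEY, pilot_lastname TEXT);
    CREATE TABLE flights (flight_id INTEGER PRIMARY KEY, pilot_id INTEGER);
    INSERT INTO pilots VALUES (1, 'Adams'), (2, 'Baker'), (3, 'Clark');
    INSERT INTO flights (pilot_id) VALUES (1), (2), (2), (2), (3), (3);
""")

# The derived table only aggregates; the sort is applied to the final result.
rows = conn.execute("""
    SELECT p.pilot_id, p.pilot_lastname, fc.flight_count
    FROM pilots p
    JOIN (
        SELECT pilot_id, COUNT(flight_id) AS flight_count
        FROM flights
        GROUP BY pilot_id
    ) AS fc ON fc.pilot_id = p.pilot_id
    ORDER BY fc.flight_count DESC
""").fetchall()
print(rows)
```

The reason for moving the clause is that an ORDER BY inside a derived table is generally not guaranteed to survive the join; only the outermost ORDER BY defines the order of the final rows.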
I have recently started to learn SQL queries but I am having some issues.
I have these two tables here:
Birds
http://i.imgur.com/2m0VuoE.png
MembersLikesBirdEncounter (birdID is the foreign key here, referencing the Birds table above)
http://i.imgur.com/0cWlG94.png
I am trying to display the most common birdID value from the table MembersLikesBirdEncounter, which is 234. Below is the query I have come up with, which doesn't seem to be working. What am I doing wrong?
SELECT m.birdID, COUNT(m.birdID)
FROM MembersLikesBirdEncounter m, Birds b
WHERE b.birdID = m.birdID
GROUP BY m.birdID
ORDER BY m.birdID DESC
LIMIT 1;
I want the output to be
birdID
------
234
Not so hard...
Instead, order by the count:
ORDER BY COUNT(m.birdID) DESC
Never use commas in the FROM clause. Always use proper, explicit JOIN syntax. Then, the problem with your query is the ORDER BY column. You want to order by the count:
SELECT m.birdID, COUNT(m.birdID)
FROM MembersLikesBirdEncounter m JOIN
Birds b
ON b.birdID = m.birdID
GROUP BY m.birdID
ORDER BY COUNT(m.birdID) DESC
LIMIT 1;
Then -- assuming that birdID always refers to valid birds -- the JOIN is not necessary. This should be sufficient:
SELECT m.birdID, COUNT(m.birdID)
FROM MembersLikesBirdEncounter m
GROUP BY m.birdID
ORDER BY COUNT(m.birdID) DESC
LIMIT 1;
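The difference between sorting by the id and sorting by the count is easy to check with a quick sketch in Python's sqlite3 module. The rows and the memberID column are assumptions made up for this example (the original table images are not reproduced here); 234 is simply the value that occurs most often.

```python
import sqlite3

# Hypothetical likes table: birdID 234 appears three times, 301 twice, 512 once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MembersLikesBirdEncounter (memberID INTEGER, birdID INTEGER);
    INSERT INTO MembersLikesBirdEncounter VALUES
        (1, 234), (2, 234), (3, 234), (1, 301), (2, 301), (3, 512);
""")

query = """
    SELECT m.birdID, COUNT(m.birdID)
    FROM MembersLikesBirdEncounter m
    GROUP BY m.birdID
    ORDER BY {} DESC
    LIMIT 1
"""

# Sorting by the count returns the most common id ...
by_count = conn.execute(query.format("COUNT(m.birdID)")).fetchone()
# ... while sorting by the id merely returns the largest id.
by_id = conn.execute(query.format("m.birdID")).fetchone()
print(by_count, by_id)
```

With this data the count-ordered query returns birdID 234, whereas the original ORDER BY m.birdID returns 512, the highest id rather than the most frequent one.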
select d.order_type from migu_td_aaa_order_log_d d where exists(select 1
from migu_user r where r.user_id = '156210106' and r.user_num =
d.serv_number) and d.product_id in ('2028594290','2028596512','2028597138' )
order by d.opr_time desc limit 1
Why does the above SQL fail? It reports:
FAILED: SemanticException [Error 10002]: Line 4:11 Invalid column reference 'opr_time'
But the one below works:
select temp.order_type from (
select d.* from migu_td_aaa_order_log_d d where exists(select 1 from
migu_user r where r.user_id = '156210106' and r.user_num = d.serv_number)
and d.product_id in ('2028594290','2028596512','2028597138' ) order by
d.opr_time desc limit 1) temp;
This one works fine too, and is much more efficient than the second one:
select d.* from migu_td_aaa_order_log_d d where exists(select 1 from
migu_user r where r.user_id = '156210106' and r.user_num = d.serv_number)
and d.product_id in ('2028594290','2028596512','2028597138' )
order by d.opr_time desc limit 1
I only need the order_type field, so even though the second one works, it takes much more time.
Can anyone help me?
Thanks a lot!
Your first query does not work because, in the first select statement, you are selecting just one column (d.order_type) but trying to order by another column (d.opr_time), which you have not included in your select statement:
select d.order_type from ...
...
order by d.opr_time desc limit 1
Note that if you added the column d.opr_time to your first query, it would work:
select d.order_type, d.opr_time from ...
...
order by d.opr_time desc limit 1
Your second query works because, in the subquery, you have selected all the columns of d (d.*), so when you order by opr_time, that column is present. (Same for the third query).
select temp.order_type from (
select d.* ... order by d.opr_time ...
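Following that reasoning, the subquery does not need d.* at all; it only needs the sort column alongside the one you want back. Here is a sketch of that shape using Python's sqlite3 module. The table name order_log and its rows are hypothetical, and SQLite itself would accept ordering by an unselected column, so this only illustrates the structure of the workaround, not Hive's behaviour.

```python
import sqlite3

# Hypothetical, trimmed-down order log (made-up rows and table name).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_log (order_type TEXT, opr_time TEXT, product_id TEXT);
    INSERT INTO order_log VALUES
        ('subscribe',   '2017-01-01 10:00:00', '2028594290'),
        ('unsubscribe', '2017-03-01 09:30:00', '2028594290'),
        ('renew',       '2017-02-01 12:00:00', '2028596512');
""")

# Inner query carries the sort column; outer query projects only order_type.
latest = conn.execute("""
    SELECT t.order_type
    FROM (
        SELECT d.order_type, d.opr_time
        FROM order_log d
        WHERE d.product_id IN ('2028594290', '2028596512', '2028597138')
        ORDER BY d.opr_time DESC
        LIMIT 1
    ) AS t
""").fetchone()
print(latest)
```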
EDITED:
According to the Hive documentation:
When using group by clause, the select statement can only include
columns included in the group by clause. Of course, you can have as
many aggregation functions (e.g. count) in the select statement as
well.
So, this query:
select d.order_type, d.opr_time from ...
...
order by d.opr_time desc limit 1
Shouldn't work either, because the select clause has an additional column (d.order_type) that is not included in the group by clause.
I hope this helps.
P.S. This answer about SQL execution order might be useful.
1.
Hive currently has an ORDER BY limitation.
The current status of this issue is PATCH AVAILABLE.
see -
"Can't order by an unselected column"
https://issues.apache.org/jira/browse/HIVE-15160
2.
You might want to get familiar with LEFT SEMI JOIN which is a cleaner syntax for EXISTS
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#LanguageManualJoins-JoinSyntax
3.
Using min / max over a struct / named_struct can replace order by ... asc / desc combined with limit 1.
Here is an alternative solution:
select max(named_struct('opr_time',d.opr_time,'order_type',d.order_type)).order_type
from migu_td_aaa_order_log_d d
left semi join migu_user r
on r.user_num =
d.serv_number
and r.user_id = '156210106'
where d.product_id in ('2028594290','2028596512','2028597138')
;
P.S.
You should seriously consider treating IDs (user_id, product_id) as numeric rather than as strings.
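The max-over-struct trick relies on structs being compared field by field in declaration order, so putting opr_time first makes the maximum struct the latest row. Python tuples compare the same way, which gives a quick way to picture it. The rows below are made up, and the analogy assumes Hive's struct comparison behaves like this, which is what the query above depends on.

```python
# Hypothetical rows as (opr_time, order_type) pairs, mirroring
# named_struct('opr_time', ..., 'order_type', ...).
rows = [
    ("2017-01-01 10:00:00", "subscribe"),
    ("2017-03-01 09:30:00", "unsubscribe"),
    ("2017-02-01 12:00:00", "renew"),
]

# Tuples compare element by element, so max() picks the latest opr_time
# and carries its order_type along, with no sort and no LIMIT.
latest_time, latest_type = max(rows)
print(latest_type)
```

This is a single aggregation pass instead of a global sort, which is why it tends to be attractive for a "latest row" lookup.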
I have the following query, which was developed from a hint found online because of a problem with a GROUP BY returning the maximum value; but it's running really slowly.
Having looked online I'm seeing that WHERE IN (SELECT.... GROUP BY) is probably the issue, but, to be honest, I'm struggling to find a way around this:
SELECT *
FROM tbl_berths a
JOIN tbl_active_trains b on a.train_uid=b.train_uid
WHERE (a.train_id, a.TimeStamp) in (
SELECT a.train_id, max(a.TimeStamp)
FROM a
GROUP BY a.train_id
)
I'm thinking I possibly need a derived table, but my experience in this area is zero and it's just not working out!
You can move that into a derived table (a subquery in the FROM clause) and join to it, and also select only the required columns instead of all (*):
SELECT a.train_uid
FROM tbl_berths a
JOIN tbl_active_trains b on a.train_uid=b.train_uid
JOIN (SELECT train_id, max(TimeStamp) as TimeStamp
FROM tbl_berths
GROUP BY train_id )T
on a.train_id = T.train_id
and a.TimeStamp = T.TimeStamp
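A runnable sketch of the derived-table join, using Python's sqlite3 module. The berth column and all rows are assumptions invented for this example, and the derived table reads from tbl_berths on the assumption that this is the table the alias a refers to; whether it is actually faster depends on the engine and the available indexes.

```python
import sqlite3

# Hypothetical data: train T1 has two berth records, T3 is not active.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_berths (train_uid TEXT, train_id TEXT, TimeStamp TEXT, berth TEXT);
    CREATE TABLE tbl_active_trains (train_uid TEXT);
    INSERT INTO tbl_berths VALUES
        ('U1', 'T1', '2015-01-01 10:00:00', 'B001'),
        ('U1', 'T1', '2015-01-01 10:05:00', 'B002'),
        ('U2', 'T2', '2015-01-01 10:01:00', 'B010'),
        ('U3', 'T3', '2015-01-01 10:02:00', 'B020');
    INSERT INTO tbl_active_trains VALUES ('U1'), ('U2');
""")

# The derived table computes the latest TimeStamp per train once;
# the join then keeps only the berth rows matching that timestamp.
rows = conn.execute("""
    SELECT a.train_id, a.berth, a.TimeStamp
    FROM tbl_berths a
    JOIN tbl_active_trains b ON a.train_uid = b.train_uid
    JOIN (
        SELECT train_id, MAX(TimeStamp) AS TimeStamp
        FROM tbl_berths
        GROUP BY train_id
    ) AS t ON a.train_id = t.train_id AND a.TimeStamp = t.TimeStamp
    ORDER BY a.train_id
""").fetchall()
print(rows)
```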
Here is a simplified version of my table
tbl_records
-title
-created
-views
I am wondering how I can make a query where they are grouped by title, but the record that is returned for each group is the most recently created. I then will order it by views.
One way I guess is to do a sub query and order it by created and then group it by title and then from those results order it by views. I guess there is a better way though.
Thanks
EDIT:
SAMPLE DATA:
-title: Gnu Design
-created: 2009-11-11 14:47:18
-views: 104
-title: Gnu Design
-created:2010-01-01 21:37:09
-views:9
-title: French Connection
-created:2010-05-01 09:27:19
-views:20
I would like the results to be:
-title: French Connection
-created:2010-05-01 09:27:19
-views:20
-title: Gnu Design
-created:2010-01-01 21:37:09
-views:9
Only the most recent Gnu Design is shown and then the results are ordered by views.
This is an example of the greatest-n-per-group problem that appears frequently on Stack Overflow.
Here's my usual solution:
SELECT t1.*
FROM tbl_records t1
LEFT OUTER JOIN tbl_records t2 ON (t1.title = t2.title AND
(t1.created < t2.created OR t1.created = t2.created AND t1.primarykey < t2.primarykey))
WHERE t2.title IS NULL;
Explanation: find the row t1 for which no other row t2 exists with the same title and a greater created date. In case of ties, use some unique key to resolve the tie, unless it's okay to get multiple rows per title.
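Here is the anti-join run against the sample data from the question, as a sketch using Python's sqlite3 module. The id column is an assumption standing in for primarykey, and the final ORDER BY is added to get the ordering by views that the question asks for.

```python
import sqlite3

# Sample data from the question; id is a hypothetical unique key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_records (
        id INTEGER PRIMARY KEY, title TEXT, created TEXT, views INTEGER);
    INSERT INTO tbl_records (title, created, views) VALUES
        ('Gnu Design', '2009-11-11 14:47:18', 104),
        ('Gnu Design', '2010-01-01 21:37:09', 9),
        ('French Connection', '2010-05-01 09:27:19', 20);
""")

# Keep each row t1 for which no newer row t2 with the same title exists.
rows = conn.execute("""
    SELECT t1.title, t1.created, t1.views
    FROM tbl_records t1
    LEFT OUTER JOIN tbl_records t2
      ON t1.title = t2.title
     AND (t1.created < t2.created
          OR (t1.created = t2.created AND t1.id < t2.id))
    WHERE t2.title IS NULL
    ORDER BY t1.views DESC
""").fetchall()
for row in rows:
    print(row)
```

Only the most recent Gnu Design row survives, and French Connection sorts first because it has more views.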
select i.*, o.views from
(
select
title
, max(created) as last_created
from tbl_records
group by title
) i inner join tbl_records o
on i.title = o.title and i.last_created = o.created
order by o.views desc
I'm assuming that the aggregation to be applied to views is count(), but could well be wrong (you'll need to have some way of defining which measure of views you wish to have for the latest created title). Hope that helps.
EDIT: have seen your sample data and edited accordingly.
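-- Caution: views is neither grouped nor aggregated, so on databases that accept this
-- (e.g. MySQL without ONLY_FULL_GROUP_BY) it may come from any row in the group.
SELECT title,
MAX(created),
views
FROM tbl_records
GROUP BY title
ORDER BY views DESC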