SQL - subqueries for top result without order by - mysql

(Sorry about the title, I couldn't think of how to explain it.)
So I have an Olympic database. The basic layout is a competitors table with competitornum, givenname, and familyname (the other columns aren't needed for this), plus a results table with competitornum and place (between 1 and 8).
I'm trying to get the givenname, familyname, and total number of gold, silver, and bronze medals (place = 1, 2 or 3).
It also needs to display only the competitors with the top number of medals, and all of this without using the ORDER BY clause...
I asked this question before but realised I forgot to say some things. The previous answer, from before the bold part was added, was:
SELECT c.Givenname, c.Familyname, COUNT(r.places) AS TotalPlaces
FROM Competitors c INNER JOIN Results r
ON r.Competitornum = c.Competitornum
WHERE r.place IN (1,2,3)
GROUP BY c.Givenname, c.Familyname
I'm thinking it needs another subquery like
AND TotalPlaces = (SELECT MAX(TotalPlaces))
but I'm not sure how to use an alias in a subquery when it's above the subquery...
All help is appreciated, thanks!
EDIT: The official question on my assignment (I can't figure out the answer, I've really tried, that's why I'm here):
Which competitor(s) got the largest number of medals (counting gold, silver and bronze all together)? List their given and family names and the total number of their medals (only).
Warning: your solution must not assume that competitor names are always different
Do not use an ORDER BY clause, in any part of this query.

You need another subquery for this. Note that MySQL requires every derived table to have an alias:
SELECT c.Givenname, c.Familyname, COUNT(r.place) AS TotalPlaces
FROM Competitors c
INNER JOIN Results r ON r.Competitornum = c.Competitornum
WHERE r.place IN (1,2,3)
GROUP BY c.Givenname, c.Familyname
HAVING COUNT(r.place) =
(
  SELECT MAX(TotalPlaces)
  FROM
  (
    SELECT COUNT(g.place) AS TotalPlaces
    FROM Competitors f
    INNER JOIN Results g ON f.Competitornum = g.Competitornum
    WHERE g.place IN (1,2,3)
    GROUP BY f.Givenname, f.Familyname
  ) AS totals
)

The final answer (thanks to John Woo and lc.)
Pasted this here for anyone that comes across this question in the future:
SELECT c.Givenname, c.Familyname, COUNT(r.place) AS TotalPlaces
FROM Competitors c
INNER JOIN Results r ON r.Competitornum = c.Competitornum
WHERE r.place IN (1,2,3)
GROUP BY c.competitornum, c.Givenname, c.Familyname
HAVING COUNT(r.place) =
(
  SELECT MAX(TotalPlaces)
  FROM
  (
    SELECT COUNT(r.place) AS TotalPlaces
    FROM Competitors c
    INNER JOIN Results r ON r.Competitornum = c.Competitornum
    WHERE r.place IN (1,2,3)
    GROUP BY c.competitornum, c.Givenname, c.Familyname
  ) AS totals
)
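For anyone who wants to try this without the assignment database, here is a minimal sketch using Python's built-in sqlite3 module. SQLite standing in for MySQL is an assumption made for portability, and the sample rows are made up. It shows why grouping by competitornum matters when two competitors share a name.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
# SQLite stands in for MySQL here (an assumption made for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Competitors (competitornum INTEGER PRIMARY KEY,
                          givenname TEXT, familyname TEXT);
CREATE TABLE Results (competitornum INTEGER, place INTEGER);
-- Competitors 1 and 2 are different people who share a name.
INSERT INTO Competitors VALUES (1, 'Ann', 'Lee'), (2, 'Ann', 'Lee'), (3, 'Bob', 'Ray');
INSERT INTO Results VALUES (1, 1), (1, 2), (1, 8), (2, 3), (2, 1), (3, 2), (3, 5);
""")

MEDALS = """
SELECT c.givenname, c.familyname, COUNT(r.place) AS TotalPlaces
FROM Competitors c
INNER JOIN Results r ON r.competitornum = c.competitornum
WHERE r.place IN (1, 2, 3)
GROUP BY {group_cols}
"""

# Grouping by name only merges the two Ann Lees into one row.
by_name = conn.execute(
    MEDALS.format(group_cols="c.givenname, c.familyname")
).fetchall()

# Grouping by competitornum keeps them apart; HAVING keeps only the top count.
by_id_cols = "c.competitornum, c.givenname, c.familyname"
top = conn.execute(
    MEDALS.format(group_cols=by_id_cols)
    + "HAVING COUNT(r.place) = (SELECT MAX(TotalPlaces) FROM ("
    + MEDALS.format(group_cols=by_id_cols)
    + ") AS totals)"
).fetchall()

print(sorted(by_name))
print(sorted(top))
```

With this data, grouping by name reports a single Ann Lee with four medals, while grouping by competitornum returns both Ann Lees with two medals each, and no ORDER BY is involved.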

Related

SQL - Find object with the highest count in a column

I am answering questions about an IMDB database as shown below.
I need to find which TV show (which is a kind_type that shows up as 'tv series') has the most episodes, actors and actresses, and seasons (these are separate parts of the question).
To start off, I wrote a query to find the name of the TV show that has the most actresses:
SELECT *
FROM (
SELECT DISTINCT t.title, count(t.title) total
FROM title t
INNER JOIN kind_type k
ON (t.kind_id = k.id)
INNER JOIN cast_info c
ON (c.movie_id = t.id)
CROSS JOIN role_type r
GROUP BY t.title
HAVING r.role = 'actress' AND k.kind = 'tv series'
ORDER BY total DESC
) as newTable
LIMIT 1
However, I get the error:
column "r.role" must appear in the GROUP
BY clause or be used in an aggregate function
LINE 11: HAVING r.role = 'actress' AND k.kind = 'tv series'
So you can think of it as having a lot of cast_info objects, each attached to role_type objects. Each cast_info also has a variable for the movie_id, and I aimed to select a list of all cast_info objects that had role_types with the role 'actress', and then pick out the most frequently occurring 'movie_id' that shows up in that list.
Example:
In this example, the query should ideally return "3" because that is the movie ID that has the most actresses.
Any tips would be greatly appreciated.
This is a simple fix.
You're receiving the error because you put ordinary row conditions inside your HAVING clause. HAVING is for conditions on aggregate functions. It is evaluated after GROUP BY, so it can only reference grouped columns or aggregates.
For example, to keep only the groups with a total greater than 2, you would use HAVING:
HAVING count(t.title) > 2
However, the conditions you want filter individual rows, so they belong in a WHERE clause. The CROSS JOIN on role_type also has to become a join on the role id. Try this:
SELECT *
FROM (
SELECT DISTINCT t.title, count(t.title) total
FROM title t
INNER JOIN kind_type k
ON (t.kind_id = k.id)
INNER JOIN cast_info c
ON (c.movie_id = t.id)
JOIN role_type r
ON (r.id = c.role_id)
WHERE r.role = 'actress' AND k.kind = 'tv series'
GROUP BY t.title
ORDER BY total DESC
) as newTable
LIMIT 1
Here is more info on the HAVING clause.
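To make the WHERE versus HAVING distinction concrete, here is a small sketch using Python's sqlite3 module. The four tables are cut down to the columns used in the query above, the rows are made up, and SQLite standing in for the asker's database is an assumption. WHERE drops rows before grouping, and HAVING drops whole groups afterwards.

```python
import sqlite3

# Hypothetical, heavily simplified version of the IMDB tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE kind_type (id INTEGER PRIMARY KEY, kind TEXT);
CREATE TABLE role_type (id INTEGER PRIMARY KEY, role TEXT);
CREATE TABLE title (id INTEGER PRIMARY KEY, title TEXT, kind_id INTEGER);
CREATE TABLE cast_info (movie_id INTEGER, role_id INTEGER);
INSERT INTO kind_type VALUES (1, 'tv series'), (2, 'movie');
INSERT INTO role_type VALUES (1, 'actor'), (2, 'actress');
INSERT INTO title VALUES (1, 'Show A', 1), (2, 'Show B', 1), (3, 'Film C', 2);
INSERT INTO cast_info VALUES (1, 2), (1, 1), (1, 1), (2, 2), (2, 2),
                             (3, 2), (3, 2), (3, 2);
""")

BASE = """
SELECT t.title, COUNT(*) AS total
FROM title t
JOIN kind_type k ON t.kind_id = k.id
JOIN cast_info c ON c.movie_id = t.id
JOIN role_type r ON r.id = c.role_id
WHERE r.role = 'actress' AND k.kind = 'tv series'  -- row filter, before grouping
GROUP BY t.id, t.title
"""

# One row per TV series with its actress count.
per_show = conn.execute(BASE).fetchall()

# HAVING filters the groups produced by GROUP BY, using the aggregate.
at_least_two = conn.execute(BASE + "HAVING COUNT(*) >= 2").fetchall()

# The series with the most actresses.
top_show = conn.execute(BASE + "ORDER BY total DESC LIMIT 1").fetchone()

print(sorted(per_show), at_least_two, top_show)
```

The sketch groups by t.id as well as t.title so that two different shows with the same title would not be merged. That is a small deviation from the query above.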

SQL join tables and get Average

Yesterday I asked a somewhat similar question (I thought that was my problem, but later I realised there was a fault in it). That question got a couple of nice answers, so it did not make sense to change it, and I think this question is different enough.
Question:
I have four tables and I need to calculate the average points that each School has gotten.
Problem: the School average should be calculated from the two latest Points each Team has gotten. At the moment the query includes all the points a Team has gotten in the average.
A School can have multiple Teams, and a Team can have multiple points. From each Team, only the two latest points should count towards the School average. Each School should also get the proper City KAID (CITY_ID). In the sqlFiddle everything works, but the average is wrong because it includes all the points a Team has gotten.
I have created a simplified working example: sqlFiddle
The average for SCHOOL1 should be 2,66...
Example:
Let's say that Team10 has 6 points:
TEAM10 3..4..7..0..3..5 = 8 (3+5=8)
Only the latest two points should be calculated in the average (3 and 5). This should happen for all the teams.
I have tried a couple of queries but they don't work.
Query 1 (Problem: calculates all the points):
SELECT SNAME As School, AVG(PTS) As Points, ka.KAID As City_id FROM
Schools op
LEFT JOIN Points pi
ON op.OPID = pi.OPID
LEFT JOIN Citys ka
ON op.KAID = ka.KAID
GROUP BY SNAME, ka.KAID
ORDER BY City_id, Points, School ASC
Query 2 (Problem: Average wrong and duplicates):
SELECT IFNULL(AVG(PTS), 0) AS AVG, po2.KAID AS KID, SNAME AS SNAM FROM
(
SELECT te1.ID, te1.KAID, po1.PTS, te1.OPID FROM Points po1
INNER JOIN Teams te1 ON te1.ID = po1.TEID
GROUP BY po1.TEID, te1.ID HAVING count(*) >= 2
)
po2 INNER JOIN Schools sch1 ON po2.KAID = sch1.KAID
GROUP BY sch1.SNAME, sch1.OPID
ORDER BY po2.ID DESC
I am quite new to SQL. I have tried different queries but haven't gotten this to work properly.
If something is not clear, please ask and I will try to explain it better.
Try running this:
SELECT
SNAME As School,
SUM(pts)/ count(*) As Points,
ka.KAID As City_id
FROM Schools op
LEFT JOIN Points pi
ON op.OPID = pi.OPID
LEFT JOIN Citys ka
ON op.KAID = ka.KAID
GROUP BY SNAME, ka.KAID
ORDER BY City_id, Points, School ASC
DEMO
From what I see you have for the first school and the first city 8 rows with the sum = 29.
29/8 = 3.25.. you are joining the tables on the correct fields and the query is returning the rows in the table based on the opid and kaid so it seems the results are correct.. i'm guessing the avg function is not including the 0's or something but the results are there
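EDIT:
To get the average over the two newest rows, you need to look at the greatest id per team and then the second greatest. The query below does that, assuming each team's ids in points are consecutive, because it looks up the second-newest row as t_id - 1.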
SELECT
SNAME As School,
SUM(pts)/ count(*) As Points,
ka.KAID As City_id
FROM Schools op
LEFT JOIN Points pi ON op.OPID = pi.OPID
LEFT JOIN Citys ka ON op.KAID = ka.KAID
JOIN
( ( SELECT MAX(id) as f_id
FROM points
GROUP BY TEID
ORDER BY f_id
)
UNION
( SELECT p1.id
FROM
( SELECT MAX(id) as t_id
FROM points
GROUP BY TEID
ORDER BY t_id
)t
LEFT JOIN points p1 on p1.id = (t.t_id -1)
)
) temp ON temp.f_id = pi.id
GROUP BY SNAME, ka.KAID
ORDER BY City_id, Points, School ASC;
ANOTHER DEMO
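If the ids of different teams are interleaved, looking up the second-newest row as t_id - 1 can pick a row from another team. A different technique, swapped in here and not taken from the answer above, is a correlated subquery that keeps a row only when fewer than two newer rows exist for the same team. This is a rough sketch with Python's sqlite3 module. The cut-down schema (a Teams table linking teams to schools, Points with only ID, TEID and PTS) and the sample rows are assumptions, and "latest" is taken to mean "highest ID".

```python
import sqlite3

# Hypothetical, simplified schema; "latest" is assumed to mean "highest ID".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Schools (OPID INTEGER PRIMARY KEY, SNAME TEXT, KAID INTEGER);
CREATE TABLE Teams (ID INTEGER PRIMARY KEY, OPID INTEGER);
CREATE TABLE Points (ID INTEGER PRIMARY KEY, TEID INTEGER, PTS INTEGER);
INSERT INTO Schools VALUES (1, 'SCHOOL1', 10), (2, 'SCHOOL2', 20);
INSERT INTO Teams VALUES (1, 1), (2, 1), (3, 2);
-- Ids of different teams are interleaved on purpose.
INSERT INTO Points VALUES (1, 1, 3), (2, 2, 0), (3, 1, 4), (4, 2, 2),
                          (5, 1, 1), (6, 2, 3), (7, 3, 5);
""")

LATEST_TWO_AVG = """
SELECT s.SNAME, s.KAID, AVG(p.PTS) AS Points
FROM Schools s
JOIN Teams t ON t.OPID = s.OPID
JOIN Points p ON p.TEID = t.ID
-- Keep a row only if fewer than two newer rows exist for the same team.
WHERE (SELECT COUNT(*)
       FROM Points newer
       WHERE newer.TEID = p.TEID AND newer.ID > p.ID) < 2
GROUP BY s.OPID, s.SNAME, s.KAID
"""
averages = conn.execute(LATEST_TWO_AVG).fetchall()
print(sorted(averages))
```

Here team 1 contributes 4 and 1, team 2 contributes 2 and 3, so SCHOOL1 averages 2.5. A team with a single row, like team 3, contributes just that row. Because the sketch uses inner joins, schools without any points are left out; a LEFT JOIN version would need the filter moved into the join condition.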

creating a custom column from joining two tables

I am terrible with subqueries, if that is what I need here. First, let me show you a preview of my tables and what I'm trying to do.
This is the result I want at the end:
business.name
reviews_count (total count, matching the current query's business_id)
where b.industry_id matches 7
This is what I'm trying, but I feel stuck and don't know how to match the total count. Let me explain:
select
b.name,
reviews_count as (select count(*) as count from reviews where business_id = b.business_id),
from business as b
left join reviews as r
on r.business_id = b.id
where b.industry_id = 7
The subquery's business_id needs to match the id of the business currently being processed. I hope that makes sense. (reviews_count doesn't exist; I just made it up to use in the output.)
This looks like a job for GROUP BY:
SELECT
  b.name,
  COUNT(DISTINCT r.id) AS reviews_count
FROM
  business b
  JOIN reviews r ON r.business_id = b.id
WHERE b.industry_id = 7
GROUP BY b.id, b.name
That way you can avoid the subquery altogether.
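One thing worth checking with the GROUP BY approach is what happens to businesses that have no reviews. Here is a short sketch with Python's sqlite3 module. The table layouts and rows are assumptions based on the question, and SQLite stands in for MySQL. It runs the same query once with JOIN and once with LEFT JOIN.

```python
import sqlite3

# Hypothetical tables modelled on the question; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE business (id INTEGER PRIMARY KEY, name TEXT, industry_id INTEGER);
CREATE TABLE reviews (id INTEGER PRIMARY KEY, business_id INTEGER);
INSERT INTO business VALUES (1, 'Cafe', 7), (2, 'Bar', 7), (3, 'Gym', 3);
INSERT INTO reviews VALUES (1, 1), (2, 1), (3, 3);
""")

COUNT_REVIEWS = """
SELECT b.name, COUNT(DISTINCT r.id) AS reviews_count
FROM business b
{join} reviews r ON r.business_id = b.id
WHERE b.industry_id = 7
GROUP BY b.id, b.name
"""

# An inner join drops businesses that have no reviews at all.
inner_rows = conn.execute(COUNT_REVIEWS.format(join="JOIN")).fetchall()

# A left join keeps them; COUNT ignores the NULL r.id and reports 0.
left_rows = conn.execute(COUNT_REVIEWS.format(join="LEFT JOIN")).fetchall()

print(inner_rows, sorted(left_rows))
```

Which join is right depends on whether a business with zero reviews should appear in the output with a count of 0 or not appear at all.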

MySQL: SUM Joined Table on Limited Selection of Rows

I have a problem similar to this question but a bit more complicated and I'm having trouble figuring out how to do it efficiently. Given two tables, one for a list of athletes and one with a list of races they've run in, e.g.,
ATHLETES
id
name
gender
details
RACES
athlete_id
year
points
I want to rank all of the athletes by gender for a given period of years using only their top 4 race finishes (where "top" is defined by points). I feel like I should be able to do this in a subquery but I can't figure out how to reference the outer query from the inner. What I have now looks like this:
SELECT SUM(points) as points, a.* FROM
(SELECT rr.points, inner_a.id as athlete_id
FROM athletes_raceresult rr
INNER JOIN athletes_athlete inner_a ON rr.athlete_id = inner_a.id
WHERE inner_a.gender ='m' AND rr.year BETWEEN 2012 AND 2014
AND inner_a.id = a.id
ORDER BY rr.points DESC) as races
INNER JOIN athletes_athlete a ON races.athlete_id = a.id
GROUP BY races.athlete_id
ORDER BY points DESC
But that doesn't limit the points to 4 rows per athlete. It looks like I want a correlated subquery, but I can't get that to work.
The following SQL Fiddle example illustrates how this can be done:
SELECT SUM(rr.points), a.id
FROM athletes_raceresult AS rr
INNER JOIN athletes_athlete AS a ON rr.athlete_id = a.id
WHERE (
SELECT count(crr.points)
FROM athletes_raceresult AS crr
INNER JOIN athletes_athlete AS ca ON crr.athlete_id = ca.id
WHERE ca.gender = 'm'
AND crr.year BETWEEN 2012 AND 2014
AND crr.athlete_id = a.id AND crr.points >= rr.points
) <= 4
AND a.gender = 'm'
AND rr.year BETWEEN 2012 AND 2014
GROUP BY a.id
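The correlated subquery ranks each race by counting how many of the same athlete's races scored at least as many points, and keeps the row when that count is 4 or less. Below is a small sketch with Python's sqlite3 module that runs the query above on made-up rows. SQLite standing in for MySQL and the sample data are assumptions.

```python
import sqlite3

# Hypothetical sample rows; table names follow the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes_athlete (id INTEGER PRIMARY KEY, name TEXT, gender TEXT);
CREATE TABLE athletes_raceresult (athlete_id INTEGER, year INTEGER, points INTEGER);
INSERT INTO athletes_athlete VALUES (1, 'A', 'm'), (2, 'B', 'm'), (3, 'C', 'f');
INSERT INTO athletes_raceresult VALUES
  (1, 2012, 10), (1, 2012, 8), (1, 2013, 6), (1, 2014, 4), (1, 2014, 2),
  (2, 2011, 100), (2, 2013, 9), (2, 2014, 7),
  (3, 2013, 50);
""")

TOP4_SUM = """
SELECT SUM(rr.points), a.id
FROM athletes_raceresult AS rr
INNER JOIN athletes_athlete AS a ON rr.athlete_id = a.id
WHERE (
    -- Rank of this race among the athlete's races in the period.
    SELECT COUNT(crr.points)
    FROM athletes_raceresult AS crr
    INNER JOIN athletes_athlete AS ca ON crr.athlete_id = ca.id
    WHERE ca.gender = 'm'
      AND crr.year BETWEEN 2012 AND 2014
      AND crr.athlete_id = a.id AND crr.points >= rr.points
) <= 4
AND a.gender = 'm'
AND rr.year BETWEEN 2012 AND 2014
GROUP BY a.id
"""
totals = conn.execute(TOP4_SUM).fetchall()
print(sorted(totals))
```

Athlete 1 has five races in the period and only the best four (10, 8, 6, 4) are summed. Athlete 2's 100-point race from 2011 falls outside the period. One caveat: when two races tie on points exactly at the cutoff, both get a count above 4 and both are dropped, so tied data may need an extra tie-breaker column.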

MySQL is not using INDEX in subquery

I have these tables and queries as defined in sqlfiddle.
First, my problem was to group people and show the LEFT JOINed visits row with the newest year. I solved that using a subquery.
Now my problem is that the subquery is not using the INDEX defined on the visits table. That causes my query to run nearly indefinitely on tables with approx. 15000 rows each.
Here's the query. The goal is to list every person once with their newest (by year) record in the visits table.
Unfortunately, on large tables it gets really slow because it's not using the INDEX in the subquery.
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id
Does anyone know how to force MySQL to use INDEX already defined on visits table?
Your query:
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id;
First, it uses non-standard SQL syntax (items appear in the SELECT list that are not part of the GROUP BY clause, are not aggregate functions, and do not depend on the grouping items). This can give indeterminate (semi-random) results.
Second, to avoid the indeterminate results, you have added an ORDER BY inside a subquery. Nothing in the MySQL documentation says this should work as expected, non-standard or not. So it may be working now, but it may stop working in the not so distant future, when you upgrade to MySQL version X (where the optimizer will be clever enough to understand that an ORDER BY inside a derived table is redundant and can be eliminated).
Try using this query:
SELECT
p.*, v.*
FROM
people AS p
LEFT JOIN
( SELECT
id_people
, MAX(year) AS year
FROM
visits
GROUP BY
id_people
) AS vm
JOIN
visits AS v
ON v.id_people = vm.id_people
AND v.year = vm.year
ON v.id_people = p.id;
See the SQL Fiddle.
A compound index on (id_people, year) would help efficiency.
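Here is a minimal sketch of the MAX(year) join with Python's sqlite3 module. The nested join from the query above is written as two chained LEFT JOINs, which is a rewrite for SQLite and not the exact syntax of the answer. The column list, the index name visits_people_year, and the rows are assumptions.

```python
import sqlite3

# Hypothetical tables modelled on the question; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE visits (id INTEGER PRIMARY KEY, id_people INTEGER,
                     year INTEGER, note TEXT);
-- Compound index so the per-person MAX(year) lookup can use it.
CREATE INDEX visits_people_year ON visits (id_people, year);
INSERT INTO people VALUES (1, 'Al'), (2, 'Bo'), (3, 'Cy');
INSERT INTO visits VALUES (1, 1, 2010, 'old'), (2, 1, 2012, 'new'),
                          (3, 2, 2011, 'only');
""")

LATEST_VISIT = """
SELECT p.name, v.year, v.note
FROM people AS p
LEFT JOIN (
    SELECT id_people, MAX(year) AS year   -- newest year per person
    FROM visits
    GROUP BY id_people
) AS vm ON vm.id_people = p.id
LEFT JOIN visits AS v ON v.id_people = vm.id_people AND v.year = vm.year
"""
latest = conn.execute(LATEST_VISIT).fetchall()
print(sorted(latest))
```

Each person appears once with the visit from their newest year, and Cy, who has no visits, still appears with NULLs. No GROUP BY on the outer query and no ORDER BY in the derived table are needed.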
A different approach. It works fine if you limit the persons to a sensible limit (say 30) first and then join to the visits table:
SELECT
p.*, v.*
FROM
( SELECT *
FROM people
ORDER BY name
LIMIT 30
) AS p
LEFT JOIN
visits AS v
ON v.id_people = p.id
AND v.year =
( SELECT
year
FROM
visits
WHERE
id_people = p.id
ORDER BY
year DESC
LIMIT 1
)
ORDER BY name ;
Why do you have a subquery when all you need is a table name for joining?
It is also not obvious to me why your query has a GROUP BY clause in it. GROUP BY is ordinarily used with aggregate functions like MAX or COUNT, but you don't have those.
How about this? It may solve your problem.
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
If you need to show the person, the most recent visit, and the note from the most recent visit, you're going to have to explicitly join the visits table again to the summary query (virtual table) like so.
SELECT a.id, a.name, a.year, v.note
FROM (
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
)a
JOIN visits v ON (a.id = v.id_people and a.year = v.year)
Go fiddle: http://www.sqlfiddle.com/#!2/d67fc/20/0
If you need to show something for people that have never had a visit, you should try switching the JOIN items in my statement with LEFT JOIN.
As someone else wrote, an ORDER BY clause in a subquery is not standard, and generates unpredictable results. In your case it baffled the optimizer.
Edit: GROUP BY is a big hammer. Don't use it unless you need it. And, don't use it unless you use an aggregate function in the query.
Notice that if you have more than one row in visits for a person and the most recent year, this query will generate multiple rows for that person, one for each visit in that year. If you want just one row per person, and you DON'T need the note for the visit, then the first query will do the trick. If you have more than one visit for a person in a year, and you only need the latest one, you have to identify which row IS the latest one. Usually it will be the one with the highest ID number, but only you know that for sure. I added another person to your fiddle with that situation. http://www.sqlfiddle.com/#!2/4f644/2/0
This is complicated. But: if your visits.id numbers are automatically assigned and they are always in time order, you can simply report the highest visit id, and be guaranteed that you'll have the latest year. This will be a very efficient query.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT id_people, max(id) id
FROM visits
GROUP BY id_people
)m
JOIN people p ON (p.id = m.id_people)
JOIN visits v ON (m.id = v.id)
http://www.sqlfiddle.com/#!2/4f644/1/0 But this is not the way your example is set up. So you need another way to disambiguate your latest visit, so you just get one row per person. The only trick we have at our disposal is to use the largest id number.
So, we need to get a list of the visit.id numbers that are the latest ones, by this definition, from your tables. This query does that, with a MAX(year)...GROUP BY(id_people) nested inside a MAX(id)...GROUP BY(id_people) query.
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON (p.id_people = v.id_people AND p.year = v.year)
GROUP BY v.id_people
The overall query (http://www.sqlfiddle.com/#!2/c2da2/1/0) is this.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON ( p.id_people = v.id_people
AND p.year = v.year)
GROUP BY v.id_people
)m
JOIN people p ON (m.id_people = p.id)
JOIN visits v ON (m.id = v.id)
Disambiguation in SQL is a tricky business to learn, because it takes some time to wrap your head around the idea that there's no inherent order to rows in a DBMS.
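To see the disambiguation at work, here is a sketch with Python's sqlite3 module that runs both the MAX(year) join and the overall query on made-up rows. SQLite standing in for MySQL is an assumption, and the inner alias is renamed from p to y only to keep it distinct from the people alias.

```python
import sqlite3

# Hypothetical rows: Al has two visits in his newest year, and Bo's
# highest visit id belongs to an older year.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE visits (id INTEGER PRIMARY KEY, id_people INTEGER,
                     year INTEGER, note TEXT);
INSERT INTO people VALUES (1, 'Al'), (2, 'Bo');
INSERT INTO visits VALUES (1, 1, 2010, 'a'), (2, 1, 2012, 'b'), (3, 1, 2012, 'c'),
                          (4, 2, 2013, 'x'), (5, 2, 2011, 'y');
""")

# Joining on the newest year alone returns two rows for Al.
YEAR_ONLY = """
SELECT v.id_people, v.note
FROM (SELECT id_people, MAX(year) AS year FROM visits GROUP BY id_people) m
JOIN visits v ON v.id_people = m.id_people AND v.year = m.year
"""
year_only = conn.execute(YEAR_ONLY).fetchall()

# Newest year first, then the highest id within that year.
LATEST_ONE = """
SELECT p.id, p.name, v.year, v.note
FROM (
    SELECT v.id_people, MAX(v.id) AS id
    FROM (SELECT id_people, MAX(year) AS year FROM visits GROUP BY id_people) y
    JOIN visits v ON y.id_people = v.id_people AND y.year = v.year
    GROUP BY v.id_people
) m
JOIN people p ON m.id_people = p.id
JOIN visits v ON m.id = v.id
"""
latest_one = conn.execute(LATEST_ONE).fetchall()
print(sorted(year_only), sorted(latest_one))
```

Bo's rows show why MAX(id) on its own is not enough here: his highest id is the 2011 visit, but the nested query first narrows to 2013 and only then takes the largest id.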