MySQL: SUM Joined Table on Limited Selection of Rows - mysql

I have a problem similar to this question but a bit more complicated and I'm having trouble figuring out how to do it efficiently. Given two tables, one for a list of athletes and one with a list of races they've run in, e.g.,
ATHLETES: id, name, gender, details
RACES: athlete_id, year, points
I want to rank all of the athletes by gender for a given period of years using only their top 4 race finishes (where "top" is defined by points). I feel like I should be able to do this in a subquery but I can't figure out how to reference the outer query from the inner. What I have now looks like this:
SELECT SUM(points) as points, a.* FROM
(SELECT rr.points, inner_a.id as athlete_id
FROM athletes_raceresult rr
INNER JOIN athletes_athlete inner_a ON rr.athlete_id = inner_a.id
WHERE inner_a.gender ='m' AND rr.year BETWEEN 2012 AND 2014
AND inner_a.id = a.id
ORDER BY rr.points DESC) as races
INNER JOIN athletes_athlete a ON races.athlete_id = a.id
GROUP BY races.athlete_id
ORDER BY points DESC
But that doesn't limit the points to 4 rows per athlete. It looks like I want a correlated subquery, but I can't get that to work.

The following SQL Fiddle example illustrates how this can be done:
SELECT SUM(rr.points), a.id
FROM athletes_raceresult AS rr
INNER JOIN athletes_athlete AS a ON rr.athlete_id = a.id
WHERE (
SELECT count(crr.points)
FROM athletes_raceresult AS crr
INNER JOIN athletes_athlete AS ca ON crr.athlete_id = ca.id
WHERE ca.gender = 'm'
AND crr.year BETWEEN 2012 AND 2014
AND crr.athlete_id = a.id AND crr.points >= rr.points
) <= 4
AND a.gender = 'm'
AND rr.year BETWEEN 2012 AND 2014
GROUP BY a.id
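For anyone who wants to try the correlated-count idea without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL, and the sample rows, the `total_points` alias and the named parameters are my own assumptions, not part of the original fiddle.

```python
import sqlite3

# In-memory SQLite database; table names follow the question, data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes_athlete (id INTEGER PRIMARY KEY, name TEXT, gender TEXT);
CREATE TABLE athletes_raceresult (athlete_id INTEGER, year INTEGER, points INTEGER);
INSERT INTO athletes_athlete VALUES (1, 'Al', 'm'), (2, 'Bo', 'm'), (3, 'Cy', 'f');
INSERT INTO athletes_raceresult VALUES
  (1, 2012, 10), (1, 2012, 9), (1, 2013, 8), (1, 2013, 7), (1, 2014, 6),
  (1, 2011, 100),            -- outside the period, must be ignored
  (2, 2013, 5), (2, 2014, 4),
  (3, 2013, 50);             -- different gender, must be ignored
""")

# A row is kept only if at most :n rows of the same athlete (within the
# period) have points greater than or equal to its own points.
TOP_N_SQL = """
SELECT a.id, SUM(rr.points) AS total_points
FROM athletes_raceresult AS rr
INNER JOIN athletes_athlete AS a ON rr.athlete_id = a.id
WHERE a.gender = :gender
  AND rr.year BETWEEN :lo AND :hi
  AND (
    SELECT COUNT(*)
    FROM athletes_raceresult AS crr
    WHERE crr.athlete_id = rr.athlete_id
      AND crr.year BETWEEN :lo AND :hi
      AND crr.points >= rr.points
  ) <= :n
GROUP BY a.id
ORDER BY total_points DESC
"""

params = {"gender": "m", "lo": 2012, "hi": 2014, "n": 4}
top4 = conn.execute(TOP_N_SQL, params).fetchall()
print(top4)
```

One caveat with this technique: rows that tie on points at the cutoff all get the same count, so a tie for 4th place can push both tied rows over the limit and drop them. If that matters, a tie-breaker column (for example a unique row id) would need to be added to the comparison.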

Related

MYSQL View Query Performance Issue

I have 5 SQL tables
store
staff
departments
sold_items
staff_rating
I created a view that JOINs four of the tables together. From the last table (staff_rating), I want to get the rating column at the time closest to when each item was sold (sold_items.date) for the view rows.
I have tried the following SQL queries, which work but have performance issues.
SQL QUERY 1
SELECT s.name,
s.country,
d.name,
si.item,
si.date,
(SELECT rating
FROM staff_ratings
WHERE staff_id = s.id
ORDER BY DATEDIFF(date, si.date) LIMIT 1) AS rating,
st.name,
st.owner
FROM store st
LEFT OUTER JOIN staff s ON s.store_id = st.id
LEFT JOIN departments d ON d.store_id = st.id
LEFT JOIN sold_items si ON si.store_id = st.id
SQL QUERY 2
SELECT s.name,
s.country,
d.name,
si.item,
si.date,
si.rating ,
st.name,
st.owner
FROM store st
LEFT OUTER JOIN staff s ON s.store_id = st.id
LEFT JOIN departments d ON d.store_id = st.id
LEFT JOIN (SELECT *,
(SELECT rating
FROM staff_ratings
WHERE staff_id = si.staff_id
ORDER BY DATEDIFF(date, si.date) LIMIT 1) AS rating
FROM sold_items) si ON si.store_id = st.id
SQL Query 2 is faster than SQL Query 1, but both still have performance issues. I would appreciate help with a better-performing query. Thanks in advance.
Your query doesn't look right to me (as mentioned in a comment on the original post: the join on the sales lacks staff_id, etc.).
Ignoring that, one of your biggest performance hits is likely to be this...
ORDER BY DATEDIFF(date, si.date) LIMIT 1
That order by can only be answered by comparing EVERY record for that staff member to the current sales record.
What you ideally want to be able to do is find the appropriate staff rating from an index, and not to have to run computations that involve dates from both the ratings table and the sales table.
If, for example, you wanted "the most recent rating BEFORE the sale", the query can be substantially improved...
SELECT
s.name,
s.country,
d.name,
si.item,
si.date,
(
SELECT sr.rating
FROM staff_ratings sr
WHERE sr.staff_id = s.id
AND sr.date <= si.date
ORDER BY sr.date DESC
LIMIT 1
)
AS rating,
st.name,
st.owner
FROM store st
LEFT JOIN staff s ON s.store_id = st.id
LEFT JOIN departments d ON d.store_id = st.id
LEFT JOIN sold_items si ON si.store_id = st.id
Then, with an index for staff_ratings(staff_id, date, rating) the optimiser can very quickly look up which rating to use, without having to scan Every Single Rating for that staff member.
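To see the "most recent rating before the sale" lookup in isolation, here is a small sketch using Python's sqlite3 module (SQLite standing in for MySQL). The sample data, the index name `idx_ratings`, and the `staff_id` column on sold_items are assumptions for illustration; the question's second query suggests sold_items has a staff_id, but the schema was never posted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff_ratings (staff_id INTEGER, date TEXT, rating INTEGER);
CREATE TABLE sold_items (id INTEGER PRIMARY KEY, staff_id INTEGER,
                         item TEXT, date TEXT);
-- Composite index matching the lookup: equality on staff_id, range on date.
CREATE INDEX idx_ratings ON staff_ratings (staff_id, date, rating);
INSERT INTO staff_ratings VALUES
  (1, '2020-01-01', 3), (1, '2020-02-01', 4), (1, '2020-03-01', 5),
  (2, '2020-02-15', 2);
INSERT INTO sold_items VALUES
  (1, 1, 'pen', '2020-02-10'),
  (2, 1, 'ink', '2020-03-05'),
  (3, 2, 'pad', '2020-01-20');
""")

# For each sale, pick the latest rating dated on or before the sale date.
LATEST_RATING_SQL = """
SELECT si.item,
       (SELECT sr.rating
        FROM staff_ratings AS sr
        WHERE sr.staff_id = si.staff_id
          AND sr.date <= si.date
        ORDER BY sr.date DESC
        LIMIT 1) AS rating
FROM sold_items AS si
ORDER BY si.id
"""

ratings = conn.execute(LATEST_RATING_SQL).fetchall()
print(ratings)
```

A sale with no earlier rating (the 'pad' row here) comes back with a NULL rating, which Python shows as None.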
Why DATEDIFF? Would something like this work better? If so, the given index will make it work much faster.
WHERE staff_id = s.id
AND date >= si.date
ORDER BY date
LIMIT 1
And INDEX(staff_id, date)
Do you need LEFT JOIN? Perhaps plain JOIN?
d may benefit from INDEX(store_id, name)

Writing SQL code using count and group_concat

I've already read every post with a similar title but didn't find the right answer.
What I really need to do is to count some data from MySQL table and then do group_concat because I got more than one row.
My table looks like this
and here is how I tried to run the query
SELECT
count(cal.day) * 8,
w.name
FROM claim as c
RIGHT JOIN calendar as cal ON c.id = cal.claim_id
RIGHT JOIN worker as w ON c.worker_id = w.id
GROUP BY c.id
ORDER BY w.name asc
But for some workers I get more than one row, and I can't group_concat them because of count(). I need this for a MySQL procedure I'm writing, so please help me if you can.
I hope I've given you enough information.
Edit for Dylan:
See the difference in output
GROUP BY w.id
GROUP BY c.id
MySQL does not allow aggregate functions to be nested directly, as in GROUP_CONCAT(COUNT(...)).
Therefore, we can use a subquery as a workaround, as below.
SELECT
GROUP_CONCAT(t.cnt_cal_day) as cnt_days,
t.name
FROM
(
SELECT
count(cal.day) * 8 as cnt_cal_day,
w.name
FROM claim as c
RIGHT JOIN calendar as cal ON c.id = cal.claim_id
RIGHT JOIN worker as w ON c.worker_id = w.id
GROUP BY c.id
ORDER BY w.name asc
) t
GROUP BY t.name
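Here is a runnable sketch of the same "aggregate over an aggregate" pattern, using Python's sqlite3 module with SQLite standing in for MySQL. The sample rows, the `hours` alias and the plain inner JOINs are my assumptions (older SQLite versions have no RIGHT JOIN); the point is only that the inner query does the COUNT per claim and the outer query does the GROUP_CONCAT per worker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE worker (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE claim (id INTEGER PRIMARY KEY, worker_id INTEGER);
CREATE TABLE calendar (claim_id INTEGER, day TEXT);
INSERT INTO worker VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO claim VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO calendar VALUES
  (10, '2020-01-01'), (10, '2020-01-02'),
  (11, '2020-02-01'),
  (12, '2020-03-01'), (12, '2020-03-02'), (12, '2020-03-03');
""")

# Inner query: one row per claim with its hours (days * 8).
# Outer query: concatenate those per-claim values for each worker.
NESTED_SQL = """
SELECT t.name, GROUP_CONCAT(t.hours) AS hours_per_claim
FROM (
  SELECT w.id AS worker_id, w.name AS name, c.id AS claim_id,
         COUNT(cal.day) * 8 AS hours
  FROM worker AS w
  JOIN claim AS c ON c.worker_id = w.id
  JOIN calendar AS cal ON cal.claim_id = c.id
  GROUP BY w.id, w.name, c.id
) AS t
GROUP BY t.worker_id, t.name
ORDER BY t.name
"""

rows = conn.execute(NESTED_SQL).fetchall()
# The order inside the concatenated string is not guaranteed, so sort it.
hours = {name: sorted(int(h) for h in joined.split(","))
         for name, joined in rows}
print(hours)
```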
While the question is still not clear to me, I'll try to guess what you need.
This query:
SELECT
w.name,
COUNT(cal.day) * 8 AS nb_hours
FROM worker w
LEFT JOIN claim c ON w.id = c.worker_id
INNER JOIN calendar cal ON c.id = cal.claim_id
GROUP BY w.id
ORDER BY w.name ASC
returns the names of all workers and, for each one, the number of hours of vacation approved for them.
If you use LEFT JOIN calendar instead you will get the number of hours of vacation claimed by each worker (approved and not approved). In order to separate them you should make the query like this:
SELECT
w.name,
c.approved, # <---- I assumed the name of this field
COUNT(cal.day) * 8 AS nb_hours
FROM worker w
LEFT JOIN claim c ON w.id = c.worker_id
LEFT JOIN calendar cal ON c.id = cal.claim_id
GROUP BY w.id, c.approved
ORDER BY w.name ASC
This query should return 1 or 2 rows for each worker, depending on the types of vacation claims they have (none, approved only, not approved only, both). For workers that don't have any vacation claim, the query returns NULL in column approved and 0 in column nb_hours.
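To check the "1 or 2 rows per worker" behaviour, here is a small sketch with Python's sqlite3 module (SQLite in place of MySQL). The `approved` column is the same guessed name as in the query above, stored here as 0/1, and all sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE worker (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE claim (id INTEGER PRIMARY KEY, worker_id INTEGER,
                    approved INTEGER);   -- assumed column, 1 = approved
CREATE TABLE calendar (claim_id INTEGER, day TEXT);
INSERT INTO worker VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO claim VALUES (10, 1, 1), (11, 1, 0), (12, 2, 1);
INSERT INTO calendar VALUES
  (10, '2020-01-01'), (10, '2020-01-02'),
  (11, '2020-02-01'),
  (12, '2020-03-01');
""")

# LEFT JOINs keep workers without claims; COUNT(cal.day) ignores the NULLs
# those workers produce, so they end up with 0 hours.
HOURS_SQL = """
SELECT w.name, c.approved, COUNT(cal.day) * 8 AS nb_hours
FROM worker AS w
LEFT JOIN claim AS c ON w.id = c.worker_id
LEFT JOIN calendar AS cal ON c.id = cal.claim_id
GROUP BY w.id, w.name, c.approved
ORDER BY w.name ASC, c.approved ASC
"""

hours_rows = conn.execute(HOURS_SQL).fetchall()
for row in hours_rows:
    print(row)
```

Ann has both kinds of claims and gets two rows, Bob gets one, and Cid (no claims at all) gets a single row with NULL in approved and 0 hours.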

SQL join tables and get Average

Yesterday I asked a somewhat similar question (I thought that was my problem, but later I realised there was a mistake in it). That question got a couple of nice answers, so it did not make sense to change it, and I think this question is different enough.
Question:
I have four tables and I need to calculate the average points that each School has gotten.
Problem: the School average should be calculated from the two latest Points each Team has gotten. At the moment the query includes all the points a Team has gotten in the average.
A School can have multiple Teams and a Team can have multiple points. From each team, only the two latest points should be included in the School average. Each School should also get the proper City KAID (CITY_ID). In the sqlFiddle everything works, but the average is wrong because it includes all the points a Team has gotten.
I have created a simplified working example: sqlFiddle
The average for SCHOOL1 should be 2,66...
Example:
Let's say that Team10 has 6 points:
TEAM10 3..4..7..0..3..5 = 8 (3+5=8)
Only the latest two points should be calculated in the average (3 and 5). This should happen for all the teams.
I have tried a couple of queries but they don't work.
Query 1 (Problem: calculates all the points):
SELECT SNAME As School, AVG(PTS) As Points, ka.KAID As City_id FROM
Schools op
LEFT JOIN Points pi
ON op.OPID = pi.OPID
LEFT JOIN Citys ka
ON op.KAID = ka.KAID
GROUP BY SNAME, ka.KAID
ORDER BY City_id, Points, School ASC
Query 2 (Problem: Average wrong and duplicates):
SELECT IFNULL(AVG(PTS), 0) AS AVG, po2.KAID AS KID, SNAME AS SNAM FROM
(
SELECT te1.ID, te1.KAID, po1.PTS, te1.OPID FROM Points po1
INNER JOIN Teams te1 ON te1.ID = po1.TEID
GROUP BY po1.TEID, te1.ID HAVING count(*) >= 2
)
po2 INNER JOIN Schools sch1 ON po2.KAID = sch1.KAID
GROUP BY sch1.SNAME, sch1.OPID
ORDER BY po2.ID DESC
I am quite new to SQL. I have tried different queries but I haven't gotten this to work properly.
If something is not clear, please ask and I will try to explain it better.
try running this...
SELECT
SNAME As School,
SUM(pts)/ count(*) As Points,
ka.KAID As City_id
FROM Schools op
LEFT JOIN Points pi
ON op.OPID = pi.OPID
LEFT JOIN Citys ka
ON op.KAID = ka.KAID
GROUP BY SNAME, ka.KAID
ORDER BY City_id, Points, School ASC
DEMO
From what I see, for the first school and the first city you have 8 rows with a sum of 29.
29/8 = 3.625. You are joining the tables on the correct fields, and the query returns the rows based on the OPID and KAID, so the results seem correct. My guess is that the difference comes from how the rows are counted: AVG skips NULL values (it does include 0s), while SUM(pts)/COUNT(*) divides by every row.
EDIT:
To get it for the two newest rows, you need to look at the greatest id per team and then the second greatest. This query does that, assuming each team's two latest rows have consecutive ids (it looks up MAX(id) - 1):
SELECT
SNAME As School,
SUM(pts)/ count(*) As Points,
ka.KAID As City_id
FROM Schools op
LEFT JOIN Points pi ON op.OPID = pi.OPID
LEFT JOIN Citys ka ON op.KAID = ka.KAID
JOIN
( ( SELECT MAX(id) as f_id
FROM points
GROUP BY TEID
ORDER BY f_id
)
UNION
( SELECT p1.id
FROM
( SELECT MAX(id) as t_id
FROM points
GROUP BY TEID
ORDER BY t_id
)t
LEFT JOIN points p1 on p1.id = (t.t_id -1)
)
) temp ON temp.f_id = pi.id
GROUP BY SNAME, ka.KAID
ORDER BY City_id, Points, School ASC;
ANOTHER DEMO
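If a team's ids are not consecutive, an alternative is to keep a Points row only when at most two rows of the same team have an id greater than or equal to its own. Below is a sketch of that idea with Python's sqlite3 module (SQLite instead of MySQL). The simplified schema, the sample data and the assumption that "latest" means "highest ID" are mine; the original fiddle is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Schools (OPID INTEGER PRIMARY KEY, SNAME TEXT, KAID INTEGER);
CREATE TABLE Teams (ID INTEGER PRIMARY KEY, OPID INTEGER);
CREATE TABLE Points (ID INTEGER PRIMARY KEY, TEID INTEGER, PTS INTEGER);
INSERT INTO Schools VALUES (1, 'SCHOOL1', 100), (2, 'SCHOOL2', 200);
INSERT INTO Teams VALUES (10, 1), (11, 1), (20, 2);
-- Team 10 scores 3, 4, 7, 0, 3, 5 in ID order. IDs are interleaved between
-- teams, so "MAX(ID) - 1" for team 10 (ID 9) would hit a row of team 11.
INSERT INTO Points (ID, TEID, PTS) VALUES
  (1, 10, 3), (2, 11, 1), (3, 10, 4), (4, 20, 6), (5, 10, 7), (6, 11, 2),
  (7, 10, 0), (8, 10, 3), (9, 11, 4), (10, 10, 5), (11, 20, 2);
""")

# Keep only each team's two newest rows, then average per school.
AVG_LATEST_TWO_SQL = """
SELECT s.SNAME, s.KAID, AVG(p.PTS) AS avg_pts
FROM Schools AS s
JOIN Teams AS t ON t.OPID = s.OPID
JOIN Points AS p ON p.TEID = t.ID
WHERE (
  SELECT COUNT(*)
  FROM Points AS p2
  WHERE p2.TEID = p.TEID
    AND p2.ID >= p.ID
) <= 2
GROUP BY s.OPID, s.SNAME, s.KAID
ORDER BY s.SNAME
"""

averages = conn.execute(AVG_LATEST_TWO_SQL).fetchall()
print(averages)
```

With this data SCHOOL1 averages team 10's last two scores (3 and 5) and team 11's last two (2 and 4). The inner joins drop schools that have no teams or points; switching to LEFT JOINs would keep them with a NULL average.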

creating a custom column from joining two tables

I am terrible with subqueries, if that is what I need to do. First let me show you a preview of my tables and what I'm trying to do.
this is the result i want at the end:
business.name
reviews_count (total count, matching the current queries business_id)
where the b.industry_id matches 7
This is what I'm trying, but I feel stuck and don't know how to match the total count. Let me explain:
select
b.name,
reviews_count as (select count(*) as count from reviews where business_id = b.business_id),
from business as b
left join reviews as r
on r.business_id = b.id
where b.industry_id = 7
The subquery's business_id needs to match the id of the business currently being processed. I hope that makes sense. (reviews_count doesn't exist; I just made it up to use in the output.)
This looks like a job for GROUP BY
SELECT
b.name,
count(distinct r.id) AS reviews_count
FROM
business b
JOIN reviews r ON r.business_id = b.id
WHERE b.industry_id = 7
GROUP BY b.id
That way you can avoid the subquery altogether.
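One thing to watch with the GROUP BY version: a plain JOIN leaves out businesses that have no reviews at all. The sketch below (Python's sqlite3 module, SQLite instead of MySQL, invented sample data) compares a LEFT JOIN variant with the correlated subquery the question was reaching for; both should report a count of 0 for a business without reviews.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE business (id INTEGER PRIMARY KEY, name TEXT, industry_id INTEGER);
CREATE TABLE reviews (id INTEGER PRIMARY KEY, business_id INTEGER);
INSERT INTO business VALUES (1, 'Acme', 7), (2, 'Bolt', 7), (3, 'Coil', 3);
INSERT INTO reviews VALUES (1, 1), (2, 1), (3, 3);
""")

# Variant 1: LEFT JOIN + GROUP BY. COUNT(r.id) ignores the NULL produced
# for businesses without reviews, so they get 0 instead of disappearing.
JOIN_SQL = """
SELECT b.name, COUNT(r.id) AS reviews_count
FROM business AS b
LEFT JOIN reviews AS r ON r.business_id = b.id
WHERE b.industry_id = 7
GROUP BY b.id, b.name
ORDER BY b.name
"""

# Variant 2: correlated subquery; the inner query refers to the outer row.
SUBQUERY_SQL = """
SELECT b.name,
       (SELECT COUNT(*) FROM reviews AS r
        WHERE r.business_id = b.id) AS reviews_count
FROM business AS b
WHERE b.industry_id = 7
ORDER BY b.name
"""

via_join = conn.execute(JOIN_SQL).fetchall()
via_subquery = conn.execute(SUBQUERY_SQL).fetchall()
print(via_join, via_subquery)
```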

Listing top 5 users measured by most common rows in foreign key table

Here's a picture of my database structure:
When I input an Observation, I'm entering a row into Observations but also 0, 1 or more rows into Criteria.
I'm trying to write an SQL statement which allows me to select the 5 strongest teachers for a specific criterion.
So for example I'd click on Questioning (which might have an ID of 5 in Criteria_Labels), I'd want to return a list of 5 teachers (Teacher_ID from Observations) who have the most rows of Criteria_ID = 5 in Criteria.
The statement that I've attempted to write is as follows:
SELECT t.Name AS Teacher_Name
FROM observations o
LEFT JOIN teachers t ON o.Teacher_ID = t.Teacher_ID
LEFT JOIN criteria c ON o.ID = c.Observation_ID
WHERE c.Criteria_ID = 5
ORDER BY COUNT(c.Criteria_ID) DESC
LIMIT 0,5
However, it only appears to return one member of staff. I'm not sure I've got this right at all, but hopefully I'm along the right lines.
Can anyone help? Thanks in advance,
SELECT t.Teacher_ID, t.Name AS Teacher_Name, count(*) as total
FROM observations o
LEFT JOIN teachers t ON o.Teacher_ID = t.Teacher_ID
LEFT JOIN criteria c ON o.ID = c.Observation_ID
WHERE c.Criteria_ID = 5
group by t.Teacher_ID, t.Name
order by total desc
limit 5
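The likely reason the original attempt returned a single member of staff is that an aggregate such as COUNT without a GROUP BY collapses the whole result into one row. Here is a small sketch with Python's sqlite3 module (SQLite instead of MySQL) that shows both cases; the sample data and the extra tie-breaker on the name are my own assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teachers (Teacher_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE observations (ID INTEGER PRIMARY KEY, Teacher_ID INTEGER);
CREATE TABLE criteria (Observation_ID INTEGER, Criteria_ID INTEGER);
INSERT INTO teachers VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cat');
INSERT INTO observations VALUES (1, 1), (2, 1), (3, 2), (4, 3);
INSERT INTO criteria VALUES (1, 5), (2, 5), (2, 4), (3, 5), (4, 4);
""")

# Without GROUP BY the aggregate produces exactly one row for everything.
UNGROUPED_SQL = """
SELECT COUNT(*)
FROM observations AS o
JOIN criteria AS c ON o.ID = c.Observation_ID
WHERE c.Criteria_ID = 5
"""

# With GROUP BY there is one count per teacher, which can then be ranked.
GROUPED_SQL = """
SELECT t.Teacher_ID, t.Name AS Teacher_Name, COUNT(*) AS total
FROM observations AS o
JOIN teachers AS t ON o.Teacher_ID = t.Teacher_ID
JOIN criteria AS c ON o.ID = c.Observation_ID
WHERE c.Criteria_ID = 5
GROUP BY t.Teacher_ID, t.Name
ORDER BY total DESC, t.Name ASC
LIMIT 5
"""

ungrouped = conn.execute(UNGROUPED_SQL).fetchall()
top_teachers = conn.execute(GROUPED_SQL).fetchall()
print(ungrouped, top_teachers)
```

Teachers with no rows for the chosen criterion (Cat here) do not appear, since the WHERE clause on criteria filters them out either way.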