Sorting a dataset with a group by statement - mysql

I am writing a query into a database that tracks the results of athletic competitions. My database has an athletes table:
| id | first_name | last_name | Gender |
| 1 | Sam | Johnson | m |
| 2 | Adam | Jones | m |
and a results table
| id | time | athlete_id
| 1 | 1302 | 1
| 2 | 1420 | 1
| 3 | 1491 | 2
| 4 | 1541 | 2
| 5 | 0 | 1
I want to retrieve all the athletes and only their fastest result. I have a query like this
select a.id as aid, a.`first`, a.`last`, r.`id` as `rid`, min(r.`time`) as `time`
FROM athletes a, results r
WHERE
r.athlete_id=a.id AND
r.time > 0
GROUP BY a.id
ORDER BY r.time
So far my query does limit the result to the fastest time, but it's not sorting by the time correctly. I also tried adding a second reference to the results table:
select a.id as aid, a.`first`, a.`last`, r.`id` as `rid`, r.`time`
FROM athletes a, results r, results r2
WHERE
r.athlete_id=a.id AND
r2.athlete_id=a.id AND
r.time > 0 AND
r.time < r2.time
ORDER BY r.time
but that caused an out-of-memory error. The results table has over a million entries and the athletes table has over 15,000. So the question remains: is there an efficient way of sorting the grouped records, or should I have the PHP script remove results as the record set is looped?

Try
SELECT q.athlete_id aid, a.first, a.last, r.id rid, q.`time`
FROM
(SELECT athlete_id, MIN(`time`) `time`
FROM results
WHERE time > 0
GROUP BY athlete_id) q JOIN results r
ON q.athlete_id = r.athlete_id
AND q.`time` = r.`time` JOIN athletes a
ON q.athlete_id = a.id
ORDER BY q.`time`
Output:
| AID | FIRST | LAST | RID | TIME |
--------------------------------------
| 1 | Sam | Johnson | 1 | 1302 |
| 2 | Adam | Jones | 3 | 1491 |
SQLFiddle
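The derived-table join above can be checked against the sample rows from the question. This is a minimal sketch, not the original fiddle: it assumes Python's built-in sqlite3 module as a stand-in for MySQL, and it uses the first_name and last_name columns from the question's table listing.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE athletes (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, gender TEXT);
    CREATE TABLE results  (id INTEGER PRIMARY KEY, time INTEGER, athlete_id INTEGER);
    INSERT INTO athletes VALUES (1, 'Sam', 'Johnson', 'm'), (2, 'Adam', 'Jones', 'm');
    INSERT INTO results  VALUES (1, 1302, 1), (2, 1420, 1), (3, 1491, 2), (4, 1541, 2), (5, 0, 1);
""")

# Step 1 (derived table q): fastest positive time per athlete.
# Step 2: join back to results to recover the id of that result row.
# Step 3: join athletes for the names and sort by the fastest time.
fastest = conn.execute("""
    SELECT q.athlete_id AS aid, a.first_name, a.last_name, r.id AS rid, q.time
    FROM (SELECT athlete_id, MIN(time) AS time
          FROM results
          WHERE time > 0
          GROUP BY athlete_id) AS q
    JOIN results  AS r ON r.athlete_id = q.athlete_id AND r.time = q.time
    JOIN athletes AS a ON a.id = q.athlete_id
    ORDER BY q.time
""").fetchall()

for row in fastest:
    print(row)
```

On a results table of the size described in the question, a composite index on (athlete_id, time) would probably help both the grouping and the join back; that is an untested assumption here.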

Related

Bring all data from a table with joins with where clause that may not exist in the other table

I'm having a hard time setting up a query(select). Database is not my specialty, so I'm turning to the experts. Let me show what I need.
companies
| id | name |
| 1 | Company1 |
| 2 | Company2 |
| 3 | Company3 |
company_server (companies 1--N company_server, company_server N--1 servers)
| company | server |
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 3 | 3 |
servers
| id | name |
| 1 | Server1 |
| 2 | Server2 |
| 3 | Server3 |
| 4 | Server4 |
print (servers 1--N print)
| id | page | copy | date | server |
| 1 | 2 | 3 | 2020-1-11 | 1 |
| 2 | 1 | 6 | 2020-1-12 | 3 |
| 3 | 4 | 5 | 2020-1-13 | 4 |
| 4 | 5 | 3 | 2020-1-15 | 2 |
| 5 | 3 | 4 | 2020-1-15 | 4 |
| 6 | 1 | 2 | 2020-1-16 | 3 |
| 7 | 2 | 2 | 2020-1-16 | 4 |
What I need?
Example: where date between CAST('2020-1-12' AS DATE) AND CAST('2020-1-15' AS DATE), grouped by servers.id
| companies | server | sum | percent
------------------------------------------------------------------------------------
| company1,company2 | server1 | sum(page*copy) = 0 or null | 0 or NULL
| company3 | server2 | sum(page*copy) = 15 | 28.30
| company3 | server3 | sum(page*copy) = 6 | 11.32
| NULL | server4 | sum(page*copy) = 32 | 60.38
Few notes:
I need this query for MYSQL;
Every Company is linked to at least one server.
I need result grouped by server. So, every company linked to that server must be concatenated by a comma.
If the company has not yet been registered, the value null should be presented.
The sum (page * copy) must be presented as zero or null (I don't care) in the case that there was no printing in the date range.
The percentage should be calculated according to the date range entered and not with all records in the database.
The field date is stored as MYSQL DATE.
Experts, I thank you in advance for your help. I currently solve this problem with at least three queries to the database, but I am convinced that I could do it with just one query.
Added a fiddle. Sorry, I'm still learning how to use this.
https://www.db-fiddle.com/f/dXej7QCPe9iDopfYd1SfVh/2
Below is the query that roughly shows how far I got. Notice that 'server4' disappears along the way because there are no values for it in print in the period searched, and although I have the total for the period, I cannot calculate the percentage.
I'm stuck.
select
*
from
(select
sum(p.copy * p.page) as sum1,
s.name as s_name,
s.id as s_id
from
print p
join servers s on s.id = p.server
where p.date between cast('2020-1-12' as date) and cast('2020-1-15' as date)
group by s.id) as t1
join company_server cs on cs.server = t1.s_id
right join companies c on c.id = cs.company
cross join(
select
sum(p1.copy * p1.page) sum2
from
print p1
where p1.date between cast('2020-1-12' as date) and cast('2020-1-15' as date)
) as c;
I wrote this query before you added the fiddle, so some of my column names may differ from yours. Anyway, this is my solution; I hope it helps.
select group_concat(c.name separator ',') as name_company,
ss.name,
sum_print as sum,
(sum_print/total) *100 as percentage
from companies c
inner join company_server cs on c.id = cs.company
right join servers ss on ss.id = cs.server
left join
(
select server,sum(page*copy) as sum_print from print
where date between CAST('2020-1-12' AS DATE) AND CAST('2020-1-15' AS DATE)
group by server
) tmp on tmp.server = ss.id
cross join
(select sum(page*copy) as total from print where date between CAST('2020-1-12' AS DATE) AND CAST('2020-1-15' AS DATE)) tmp2
group by ss.id, ss.name, tmp.sum_print, tmp2.total
The company names are grouped and concatenated with commas using GROUP_CONCAT.
You can refer to this image for the JOIN clauses.
https://i.stack.imgur.com/6cioZ.png
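The shape of this answer (pre-aggregate the prints per server, cross join the period total, keep every server) can be tried against the question's sample data. The sketch below rests on several assumptions: Python's sqlite3 stands in for MySQL, dates are stored as ISO text so BETWEEN compares them correctly, and the query starts from servers with LEFT JOINs instead of using RIGHT JOIN, which older SQLite versions lack. The company names are also pre-aggregated per server in their own derived table, which is a variation on the answer rather than a copy of it.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; dates are ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE servers   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE company_server (company INTEGER, server INTEGER);
    CREATE TABLE print (id INTEGER PRIMARY KEY, page INTEGER, copy INTEGER,
                        date TEXT, server INTEGER);
    INSERT INTO companies VALUES (1, 'Company1'), (2, 'Company2'), (3, 'Company3');
    INSERT INTO servers   VALUES (1, 'Server1'), (2, 'Server2'), (3, 'Server3'), (4, 'Server4');
    INSERT INTO company_server VALUES (1, 1), (2, 1), (3, 2), (3, 3);
    INSERT INTO print VALUES
        (1, 2, 3, '2020-01-11', 1), (2, 1, 6, '2020-01-12', 3),
        (3, 4, 5, '2020-01-13', 4), (4, 5, 3, '2020-01-15', 2),
        (5, 3, 4, '2020-01-15', 4), (6, 1, 2, '2020-01-16', 3),
        (7, 2, 2, '2020-01-16', 4);
""")

period = {"start": "2020-01-12", "end": "2020-01-15"}

# cn: company names per server; p: printed pages per server in the period;
# t: one-row total for the period, attached to every server row.
report = conn.execute("""
    SELECT cn.companies, s.name, p.sum_print,
           ROUND(100.0 * p.sum_print / t.total, 2) AS percent
    FROM servers AS s
    LEFT JOIN (SELECT cs.server, GROUP_CONCAT(c.name, ',') AS companies
               FROM company_server AS cs
               JOIN companies AS c ON c.id = cs.company
               GROUP BY cs.server) AS cn ON cn.server = s.id
    LEFT JOIN (SELECT server, SUM(page * copy) AS sum_print
               FROM print
               WHERE date BETWEEN :start AND :end
               GROUP BY server) AS p ON p.server = s.id
    CROSS JOIN (SELECT SUM(page * copy) AS total
                FROM print
                WHERE date BETWEEN :start AND :end) AS t
    ORDER BY s.id
""", period).fetchall()

for row in report:
    print(row)
```

In MySQL the concatenation would be written GROUP_CONCAT(c.name SEPARATOR ','). Aggregating companies and prints in separate derived tables keeps a server with several companies from having its print sum counted once per company.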

Multiple rows with same identifier, how to select the row with highest value (ignoring the rest) with multiple left joins?

This issue has been bothering me for a few hours now. After finding out my old query had an issue, I had to rebuild it.
The situation:
I need to match each patient_id with a clinic_id, and for that I get all the appointments using the patient_id,
find the highest appointment_id and use its clinic_id to set the last known clinic_id.
My old query did this, but it skipped patients that never had an appointment.
These are my current results, but I need to filter my results. Question is, how?
+---------------+-------------------+-------------------+---------------+
| patient_id | country_code | appointment_id | clinic_id |
+---------------+-------------------+-------------------+---------------+
| 111 | UK | 620 | 3 |
| 111 | UK | 621 | 2 |
| 111 | UK | 1995 | 1 |
| 222 | UK | 609 | 3 |
| 222 | UK | 610 | 2 |
| 333 | UK | null | null |
| 444 | UK | null | null |
+---------------+-------------------+-------------------+---------------+
What I want is the following:
+---------------+-------------------+-------------------+---------------+
| patient_id | country_code | appointment_id | clinic_id |
+---------------+-------------------+-------------------+---------------+
| 111 | UK | 1995 | 1 |
| 222 | UK | 610 | 2 |
| 333 | UK | null | null |
| 444 | UK | null | null |
+---------------+-------------------+-------------------+---------------+
I am using the following query right now:
SELECT
patient.id,
systemcountry.country_code,
appointment_patient.appointment_id,
appointment.clinic_id
FROM
patient
LEFT JOIN
systemcountry ON patient.country_id = systemcountry.id
LEFT JOIN
appointment_patient ON patient_id = patient.id
LEFT JOIN
appointment ON appointment_patient.appointment_id = appointment.id
This was my old query, which had an issue causing it to skip patients that never had an appointment:
SELECT
patient.id AS patient_id,
systemcountry.code AS systemcountry_code,
appointment.clinic_id
FROM
patient
LEFT JOIN
systemcountry ON patient.land_id = systemcountry.id,
appointment
WHERE
appointment.id = (SELECT
MAX(appointment_id)
FROM
appointment_patient
WHERE
patient_id = patient.id);
I am still a beginner, so go easy on me.
I appreciate any input, thanks!
Move the sub-select from your original query's WHERE clause into a LEFT JOIN, then join appointment on the resulting appointment_id (something like this):
LEFT JOIN
(SELECT patient_id, MAX(appointment_id) AS appointment_id
FROM appointment_patient
GROUP BY patient_id) AS apt ON patient.id = apt.patient_id
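You can try applying max() to the column whose highest value you want, then group the result set by patient.id. Note that appointment.clinic_id is not aggregated here, so it is not guaranteed to come from the row with the highest appointment_id.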
SELECT
patient.id,
systemcountry.country_code,
max(appointment_patient.appointment_id),
appointment.clinic_id
FROM
patient
LEFT JOIN
systemcountry ON patient.country_id = systemcountry.id
LEFT JOIN
appointment_patient ON patient_id = patient.id
LEFT JOIN
appointment ON appointment_patient.appointment_id = appointment.id
GROUP BY patient.id
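To see the LEFT JOIN on a derived table working end to end, here is a small sketch against the rows shown in the question. It assumes Python's sqlite3 as a stand-in for MySQL, and the table layouts (in particular a single systemcountry row with id 1) are guesses made only to reproduce the sample output.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; schemas are guessed
# from the column names used in the question's current query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE systemcountry (id INTEGER PRIMARY KEY, country_code TEXT);
    CREATE TABLE patient (id INTEGER PRIMARY KEY, country_id INTEGER);
    CREATE TABLE appointment (id INTEGER PRIMARY KEY, clinic_id INTEGER);
    CREATE TABLE appointment_patient (appointment_id INTEGER, patient_id INTEGER);
    INSERT INTO systemcountry VALUES (1, 'UK');
    INSERT INTO patient VALUES (111, 1), (222, 1), (333, 1), (444, 1);
    INSERT INTO appointment VALUES (620, 3), (621, 2), (1995, 1), (609, 3), (610, 2);
    INSERT INTO appointment_patient VALUES
        (620, 111), (621, 111), (1995, 111), (609, 222), (610, 222);
""")

# latest: one row per patient holding that patient's highest appointment_id.
# LEFT JOINs keep patients 333 and 444, who have no appointments at all.
last_clinic = conn.execute("""
    SELECT p.id, sc.country_code, latest.appointment_id, a.clinic_id
    FROM patient AS p
    LEFT JOIN systemcountry AS sc ON sc.id = p.country_id
    LEFT JOIN (SELECT patient_id, MAX(appointment_id) AS appointment_id
               FROM appointment_patient
               GROUP BY patient_id) AS latest ON latest.patient_id = p.id
    LEFT JOIN appointment AS a ON a.id = latest.appointment_id
    ORDER BY p.id
""").fetchall()

for row in last_clinic:
    print(row)
```

Because appointment is joined on the already-chosen appointment_id, clinic_id always belongs to the highest appointment rather than to an arbitrary row of the group.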

How to ORDER BY CALCULATED SUM with Another Table in MySQL

For example, I have two tables, 'meta' and 'log'.
in meta table:
| type | score |
|------|-------|
| a | 1 |
| b | 2 |
| c | 3 |
in log table:
| log_id | log_type | object_id |
|--------|----------|-----------|
| 1 | a | 13 |
| 2 | b | 13 |
| 3 | a | 14 |
| 4 | c | 14 |
| 5 | b | 15 |
| 6 | c | 15 |
so we know:
object 13 got score: a+b = 3
object 14 got score: a+c = 4
object 15 got score: b+c = 5
I want to query log table group by object id and order by sum of object score, is it possible?
select log.object_id, sum(meta.score)
from log
left join meta on meta.type = log.log_type
group by log.object_id
order by sum(meta.score) desc
This will produce the desired output:
SELECT object_id, sum(score) from log
INNER JOIN meta on meta.type = log.log_type group by object_id ORDER BY sum(score);
But have you got the correct table design? You need to join on the meta.type and log.log_type columns, which implies that if the log_type is 'a', the score of 1 is common to all object_ids. Is this really what you want?
Try this
Select object_id,sum(score) as c FROM `log` as a INNER JOIN `meta` as b on a.log_type=b.type group by object_id order by c desc
SELECT log.object_id, sum(score) total_score
FROM meta
INNER JOIN log on meta.type = log.log_type
GROUP BY log.object_id
ORDER BY total_score DESC
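The answers in this thread all come down to joining log to meta, grouping by object_id, and ordering by the summed score. A minimal sketch with the question's sample rows, assuming Python's sqlite3 as a stand-in for MySQL:

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meta (type TEXT PRIMARY KEY, score INTEGER);
    CREATE TABLE log  (log_id INTEGER PRIMARY KEY, log_type TEXT, object_id INTEGER);
    INSERT INTO meta VALUES ('a', 1), ('b', 2), ('c', 3);
    INSERT INTO log  VALUES (1, 'a', 13), (2, 'b', 13), (3, 'a', 14),
                            (4, 'c', 14), (5, 'b', 15), (6, 'c', 15);
""")

# Sum the score of every log entry per object, highest total first.
scores = conn.execute("""
    SELECT log.object_id, SUM(meta.score) AS total_score
    FROM log
    INNER JOIN meta ON meta.type = log.log_type
    GROUP BY log.object_id
    ORDER BY total_score DESC
""").fetchall()

print(scores)
```

With a LEFT JOIN instead of the INNER JOIN, objects whose log types have no row in meta would still be listed, with a NULL total.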

MySQL select unique rows in two columns with the highest value in one column

I have a basic table:
+-----+--------+------+------+
| id | name | cat | time |
+-----+--------+------+------+
| 1 | jamie | 1 | 100 |
| 2 | jamie | 2 | 100 |
| 3 | jamie | 1 | 50 |
| 4 | jamie | 2 | 150 |
| 5 | bob | 1 | 100 |
| 6 | tim | 1 | 300 |
| 7 | alice | 4 | 100 |
+-----+--------+------+------+
I tried using the "Left Joining with self, tweaking join conditions and filters" part of this answer: SQL Select only rows with Max Value on a Column. But for some reason it breaks when there are records with a value of 0, and it also doesn't return every unique row.
When doing the query on this table I'd like to receive the following values:
+-----+--------+------+------+
| id | name | cat | time |
+-----+--------+------+------+
| 1 | jamie | 1 | 100 |
| 4 | jamie | 2 | 150 |
| 5 | bob | 1 | 100 |
| 6 | tim | 1 | 300 |
| 7 | alice | 4 | 100 |
+-----+--------+------+------+
Because they are unique on name and cat and have the highest time value.
The query I adapted from the answer above is:
SELECT a.name, a.cat, a.id, a.time
FROM data A
INNER JOIN (
SELECT name, cat, id, MAX(time) as time
FROM data
WHERE extra_column = 1
GROUP BY name, cat
) b ON a.id = b.id AND a.time = b.time
The issue here is that id is unique per row, so you can't get a meaningful id alongside the max; you have to join on the grouped values instead.
SELECT a.name, a.cat, a.id, a.time
FROM data A
INNER JOIN (
SELECT name, cat, MAX(time) as time
FROM data
WHERE extra_column = 1
GROUP BY name, cat
) b ON A.Cat = B.cat and A.Name = B.Name AND a.time = b.time
Think about it: which ID is MySQL returning from the inline view? For jamie it could be 1 or 3 (cat 1) and 2 or 4 (cat 2). How does the engine know to pick the ID of the row with the max time? It doesn't: it is "free to choose any value from each group, so unless they are the same, the values chosen are indeterminate." It could pick the wrong one, giving incorrect results, so you can't use it to join on.
https://dev.mysql.com/doc/refman/5.0/en/group-by-handling.html
If you want to use a self join, you could use this query:
SELECT
d1.*
FROM
data d1 LEFT JOIN data d2
ON d1.name=d2.name
AND d1.cat=d2.cat
AND d1.time<d2.time
WHERE
d2.time IS NULL
It is very simple if you do not need the id: group by both name and cat.
SELECT MAX(time), name, cat FROM data GROUP BY name, cat
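Both techniques discussed in this thread, joining on the grouped values and the self LEFT JOIN that keeps rows with no better match, can be compared on the question's table. This is a sketch under assumptions: Python's sqlite3 stands in for MySQL, and the extra_column filter is left out because the question does not show that column's values.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; extra_column is omitted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (id INTEGER PRIMARY KEY, name TEXT, cat INTEGER, time INTEGER);
    INSERT INTO data VALUES
        (1, 'jamie', 1, 100), (2, 'jamie', 2, 100), (3, 'jamie', 1, 50),
        (4, 'jamie', 2, 150), (5, 'bob', 1, 100), (6, 'tim', 1, 300),
        (7, 'alice', 4, 100);
""")

# Technique 1: find MAX(time) per (name, cat), then join back on all three
# grouped values to recover the full row, including its id.
by_group_join = conn.execute("""
    SELECT a.id, a.name, a.cat, a.time
    FROM data AS a
    INNER JOIN (SELECT name, cat, MAX(time) AS time
                FROM data
                GROUP BY name, cat) AS b
            ON a.name = b.name AND a.cat = b.cat AND a.time = b.time
    ORDER BY a.id
""").fetchall()

# Technique 2: keep a row only if no row in the same (name, cat) group
# has a strictly greater time.
by_anti_join = conn.execute("""
    SELECT d1.id, d1.name, d1.cat, d1.time
    FROM data AS d1
    LEFT JOIN data AS d2
           ON d1.name = d2.name AND d1.cat = d2.cat AND d1.time < d2.time
    WHERE d2.id IS NULL
    ORDER BY d1.id
""").fetchall()

print(by_group_join)
print(by_anti_join)
```

If two rows in a group tie on the top time, both techniques return both rows, so a tie-breaker (for example on id) would be needed where exactly one row per group is required.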

More concise SQL query involving MAX()

inventory
+------------------+-------------------+------------+
| DVD | replacement_price | stock |
+------------------+-------------------+------------+
| Pi | 9.99 | 500 |
| Dune | 29.99 | 100 |
| Heathers | 4.99 | 20 |
| Jaws | 19.99 | 500 |
| Mulholland_Drive | 39.99 | 50 |
| Waking_Life | 29.99 | 200 |
+------------------+-------------------+------------+
rented
+-----------------+-----------+------------------+
| subscriber | queue_nbr | DVD |
+-----------------+-----------+------------------+
| Bob | 1 | Mulholland_Drive |
| Bob | 2 | Jaws |
| Chey | 1 | Pi |
| Chey | 2 | Heathers |
| Jamie | 2 | Mulholland_Drive |
| Jamie | 4 | Dune |
| Jamie | 1 | Jaws |
| Jamie | 3 | Waking_Life |
| Nora | 4 | Jaws |
| Nora | 2 | Mulholland_Drive |
| Nora | 3 | Dune |
| Nora | 1 | Waking_Life |
+-----------------+-----------+------------------+
I want to return ONLY the subscriber(s) with the priciest movie queue (think Netflix DVD replacement costs if you lost all the movies you had out at a given time). I've used MAX() rather than TOP, LIMIT or ROWNUM because the query needs to be as db-independent as possible and must return multiple subscribers in the event of a tie. Using the tables above, the result should be
+---------+
| highest |
+---------+
| Jamie |
| Nora |
+---------+
After much searching and experimentation, I've come up with code that works, but it seems to my novice eyes bloated and inefficient, both in quantity of code and execution.
Would anyone mind refactoring and explaining your code?
My code:
SELECT z.subscriber highest
FROM
(SELECT MAX(price) max_price
FROM (
SELECT subscriber_name subscriber, SUM(replacement_price) price
FROM inventory i
INNER JOIN rented r
ON i.DVD = r.DVD
GROUP BY subscriber
) x
) y
INNER JOIN
(
SELECT subscriber_name subscriber, SUM(replacement_price) price
FROM inventory i
INNER JOIN rented r
ON i.DVD = r.DVD
GROUP BY subscriber
) z
ON z.price = y.max_price
If you want to return only those with the max total, then you could use the following which works in both MySQL and SQL Server. It is not any more concise than your current query though:
select subscriber
from inventory i
inner join rented r
on i.dvd = r.dvd
group by subscriber
having sum(replacement_price) = (select max(TotalCost)
from
(
select sum(replacement_price) TotalCost
from inventory i
inner join rented r
on i.dvd = r.dvd
group by subscriber
) p);
If you are using SQL Server, then I would suggest using window functions, similar to this:
select subscriber
from
(
select subscriber,
rank() over(order by sum(replacement_price) desc) rnk
from inventory i
inner join rented r
on i.dvd = r.dvd
group by subscriber
) src
where rnk = 1
See SQL Fiddle with Demo
SELECT z.subscriber
FROM(
SELECT RANK() OVER(ORDER BY SUM(replacement_price) DESC) subscriber_rank,
r.subscriber subscriber,
SUM(replacement_price) totalReplacementPrice
FROM inventory i
INNER JOIN rented r ON i.dvd = r.DVD
GROUP BY subscriber
) z
WHERE z.subscriber_rank = 1
Some of the column names in your query differ from those in your SQL sample, so I've used the column names given in the demo tables. I use the rank function in the inner query to rank all of the subscribers by the sum of replacement_price, highest first, then select the row(s) where the rank is 1.
RANK is available in both MS SQL Server and Oracle. To go much further than that, as @bluefeet says, you will need to give more detail as to which database you are targeting.
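One way to avoid writing the per-subscriber aggregation twice is a common table expression, which most current engines accept (MySQL only from 8.0). The sketch below is an illustration under assumptions, not a refactor anyone in the thread proposed: it uses Python's sqlite3 as the engine, the column names from the demo tables, and prices stored as integer cents (a hypothetical replacement_cents column) so that the equality test against the maximum is exact.

```python
import sqlite3

# Assumption: in-memory SQLite as the engine; replacement_cents is a
# hypothetical column holding replacement_price in integer cents.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (dvd TEXT PRIMARY KEY, replacement_cents INTEGER, stock INTEGER);
    CREATE TABLE rented (subscriber TEXT, queue_nbr INTEGER, dvd TEXT);
    INSERT INTO inventory VALUES
        ('Pi', 999, 500), ('Dune', 2999, 100), ('Heathers', 499, 20),
        ('Jaws', 1999, 500), ('Mulholland_Drive', 3999, 50), ('Waking_Life', 2999, 200);
    INSERT INTO rented VALUES
        ('Bob', 1, 'Mulholland_Drive'), ('Bob', 2, 'Jaws'),
        ('Chey', 1, 'Pi'), ('Chey', 2, 'Heathers'),
        ('Jamie', 2, 'Mulholland_Drive'), ('Jamie', 4, 'Dune'),
        ('Jamie', 1, 'Jaws'), ('Jamie', 3, 'Waking_Life'),
        ('Nora', 4, 'Jaws'), ('Nora', 2, 'Mulholland_Drive'),
        ('Nora', 3, 'Dune'), ('Nora', 1, 'Waking_Life');
""")

# totals is computed once, then used both for the maximum and for the
# final filter, so ties for the highest total are all returned.
highest = conn.execute("""
    WITH totals AS (
        SELECT r.subscriber, SUM(i.replacement_cents) AS total
        FROM inventory AS i
        INNER JOIN rented AS r ON i.dvd = r.dvd
        GROUP BY r.subscriber
    )
    SELECT subscriber AS highest
    FROM totals
    WHERE total = (SELECT MAX(total) FROM totals)
    ORDER BY subscriber
""").fetchall()

print(highest)
```

With a DECIMAL price column, as MySQL and SQL Server offer, the same comparison would also be exact; the integer cents are only there because SQLite would otherwise store the prices as floating point.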