SQL LIMIT dynamic N - mysql

I want to get the average air_temperature of the last row from every station that has the specified county_number.
Therefore, my solution would be something like:
SELECT AVG(air_temperature)
FROM weather
WHERE station_id IN (
SELECT station_id
FROM stations
WHERE county_number = 25
)
ORDER BY id DESC
LIMIT 1;
Clearly, this does not give the correct result. AVG collapses every matching row into a single result row, so ORDER BY and LIMIT 1 have nothing left to choose from. The query returns the average of every air_temperature ever recorded by those stations.
Back to the problem: I want the average air_temperature over the last inserted row of each station that has the specified county_number.
Table weather
+------------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| station_id | char(20) | YES | MUL | NULL | |
| timestamp | timestamp | YES | | NULL | |
| air_temperature | float | YES | | NULL | |
+------------------+-------------+------+-----+---------+----------------+
Table stations
+---------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-------------+------+-----+---------+-------+
| station_id | char(20) | NO | PRI | NULL | |
| county_number | int(10) | YES | | NULL | |
+---------------+-------------+------+-----+---------+-------+
Tables are minimized

I would recommend doing this with a join and some filtering:
select avg(w.air_temperature)
from weather w join
stations s
on w.station_id = s.station_id
where s.county_number = 25 and
w.timestamp = (select max(w2.timestamp) from weather w2 where w2.station_id = w.station_id)
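Below is a minimal runnable sketch of the correlated-subquery approach above. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, and the sample rows are invented for illustration. Only the newest row of each station in county 25 should contribute to the average.

```python
import sqlite3

# In-memory SQLite stand-in for the question's MySQL tables; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stations (station_id TEXT PRIMARY KEY, county_number INTEGER);
CREATE TABLE weather (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    station_id TEXT,
    timestamp TEXT,
    air_temperature REAL
);
INSERT INTO stations VALUES ('A', 25), ('B', 25), ('C', 12);
INSERT INTO weather (station_id, timestamp, air_temperature) VALUES
    ('A', '2020-01-01 10:00', 1.0),
    ('A', '2020-01-01 11:00', 3.0),    -- newest row for A
    ('B', '2020-01-01 09:00', 10.0),
    ('B', '2020-01-01 12:00', 5.0),    -- newest row for B
    ('C', '2020-01-01 12:00', 100.0);  -- county 12, must be ignored
""")

# Keep only rows whose timestamp equals the newest timestamp of the same station.
LATEST_BY_TIMESTAMP = """
SELECT AVG(w.air_temperature)
FROM weather w
JOIN stations s ON w.station_id = s.station_id
WHERE s.county_number = ?
  AND w.timestamp = (SELECT MAX(w2.timestamp)
                     FROM weather w2
                     WHERE w2.station_id = w.station_id)
"""

latest_avg = conn.execute(LATEST_BY_TIMESTAMP, (25,)).fetchone()[0]
print(latest_avg)  # prints 4.0, i.e. (3.0 + 5.0) / 2
```

Two caveats apply. If two rows of a station share the same newest timestamp, both are counted. A row with a NULL timestamp can never equal the maximum, so it is never picked.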

You can get the last inserted row by checking the max(timestamp):
SELECT
AVG(w.air_temperature)
FROM weather w
INNER JOIN (
SELECT station_id, max(timestamp) maxtimestamp FROM weather GROUP BY station_id
) t
ON w.station_id = t.station_id AND w.timestamp = t.maxtimestamp
WHERE
w.station_id IN (SELECT station_id FROM stations WHERE county_number = 25)

UPDATE: I just noticed that your timestamp column is nullable and you are talking about the "last inserted row". That is the one with the greatest ID. Hence:
As of MySQL 8 you can use window functions in order to read the table only once:
select avg(air_temperature)
from
(
select air_temperature, id, max(id) over (partition by station_id) as max_id
from weather
where station_id in (select station_id from stations where county_number = 25)
) analyzed
where id = max_id;
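The window-function version can be checked the same way with SQLite 3.25 or later, which supports MAX() OVER (PARTITION BY ...). This is an in-memory stand-in for MySQL 8, and the rows are invented. Station A's last inserted row deliberately has a NULL timestamp, so this id-based query and the timestamp-based query from the first answer give different results.

```python
import sqlite3

# Hypothetical rows; the last row inserted for station A (id 3) has a NULL timestamp.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stations (station_id TEXT PRIMARY KEY, county_number INTEGER);
CREATE TABLE weather (id INTEGER PRIMARY KEY AUTOINCREMENT, station_id TEXT,
                      timestamp TEXT, air_temperature REAL);
INSERT INTO stations VALUES ('A', 25), ('B', 25), ('C', 12);
INSERT INTO weather (station_id, timestamp, air_temperature) VALUES
    ('A', '2020-01-01 10:00', 1.0),
    ('B', '2020-01-01 09:00', 10.0),
    ('A', NULL, 3.0),
    ('B', '2020-01-01 12:00', 5.0),
    ('C', '2020-01-01 12:00', 100.0);
""")

# Window-function version: tag each row with its station's highest id, keep matches.
BY_ID = """
SELECT AVG(air_temperature)
FROM (
    SELECT air_temperature, id,
           MAX(id) OVER (PARTITION BY station_id) AS max_id
    FROM weather
    WHERE station_id IN (SELECT station_id FROM stations WHERE county_number = ?)
) analyzed
WHERE id = max_id
"""
# Timestamp version from the first answer, for comparison (MAX ignores NULLs).
BY_TIMESTAMP = """
SELECT AVG(w.air_temperature)
FROM weather w JOIN stations s ON w.station_id = s.station_id
WHERE s.county_number = ?
  AND w.timestamp = (SELECT MAX(w2.timestamp) FROM weather w2
                     WHERE w2.station_id = w.station_id)
"""

by_id = conn.execute(BY_ID, (25,)).fetchone()[0]
by_timestamp = conn.execute(BY_TIMESTAMP, (25,)).fetchone()[0]
print(by_id)         # 4.0: rows 3 (A) and 4 (B)
print(by_timestamp)  # 3.0: picks A's older 10:00 row instead
```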
In older versions you must read the table twice:
select avg(air_temperature)
from weather
where (station_id, id) in
(
select station_id, max(id)
from weather
where station_id in (select station_id from stations where county_number = 25)
group by station_id
);
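The (station_id, id) IN (...) form for servers without window functions can be exercised the same way. SQLite has supported row values since 3.15 and serves here as an in-memory stand-in. The rows are hypothetical: station A's newest row has the highest id even though its timestamp is NULL.

```python
import sqlite3

# Hypothetical rows; station A's newest row (highest id) has no timestamp.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stations (station_id TEXT PRIMARY KEY, county_number INTEGER);
CREATE TABLE weather (id INTEGER PRIMARY KEY AUTOINCREMENT, station_id TEXT,
                      timestamp TEXT, air_temperature REAL);
INSERT INTO stations VALUES ('A', 25), ('B', 25), ('C', 12);
INSERT INTO weather (station_id, timestamp, air_temperature) VALUES
    ('A', '2020-01-01 10:00', 1.0), ('B', '2020-01-01 09:00', 10.0),
    ('A', NULL, 3.0), ('B', '2020-01-01 12:00', 5.0), ('C', '2020-01-01 12:00', 100.0);
""")

# The inner query: each county-25 station paired with its highest id.
LAST_IDS = """
SELECT station_id, MAX(id)
FROM weather
WHERE station_id IN (SELECT station_id FROM stations WHERE county_number = ?)
GROUP BY station_id
ORDER BY station_id
"""
# The full query: average only the rows whose (station_id, id) pair is in that set.
AVG_LAST = """
SELECT AVG(air_temperature)
FROM weather
WHERE (station_id, id) IN (
    SELECT station_id, MAX(id)
    FROM weather
    WHERE station_id IN (SELECT station_id FROM stations WHERE county_number = ?)
    GROUP BY station_id
)
"""

last_ids = conn.execute(LAST_IDS, (25,)).fetchall()
avg_last = conn.execute(AVG_LAST, (25,)).fetchone()[0]
print(last_ids)  # [('A', 3), ('B', 4)]
print(avg_last)  # 4.0
```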

Related

MySql - Getting rowid / order in a complex query [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 4 years ago.
Working on a project with students/grades/etc, I need to update the top 3 students every once in a while. I came up with the query below. However, I am having trouble getting their rank/order. I know how to do that in a simple query, but in a more complex one, it is not working.
I am getting all of the other columns correctly, and, with all the methods I tried to get the order by, I sometimes got 0 (like the current state of the code), sometimes values that are just wrong (1, 11, 10), etc.
NOTE: I have checked various questions (including the question below), but I just couldn't figure out how to place them in my query.
What is the best way to generate ranks in MYSQL?
Summary:
GOAL:
- Get the sum of each student's marks from marks and divide it by the number of entries in the table (again, marks). Students are from a given grade.
- Use sum(mark) to rank these students.
- Get the top three.
- Place the top three students from that grade in the TopStudents table, with their average marks (as sum) and their ids.
TABLES:
The Students table contains information about each student, including the id:
+-------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+----------------+
| id | int (20) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(20) | NO | | NULL | |
+-------------+---------------------+------+-----+---------+----------------+
The Marks table has the marks of each student on each exam:
+-------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+----------------+
| id |int (20) unsigned | NO | PRI | NULL | auto_increment |
| idStudent |int (20) unsigned | NO | FOR | NULL | |
| mark |tinyInt (3) unsigned | NO | | NULL | |
| idExam |int (20) unsigned | NO | FOR | NULL | |
+-------------+---------------------+------+-----+---------+----------------+
The Grade table has the grade id and name:
+-------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+----------------+
| id | int (20) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(20) | NO | | NULL | |
+-------------+---------------------+------+-----+---------+----------------+
The Class table holds the classes of each grade (idGrade references the Grade table):
+-------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+----------------+
| id | int (20) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(20) | NO | | NULL | |
| idGrade | int (20) unsigned | NO | FOR | NULL | |
+-------------+---------------------+------+-----+---------+----------------+
And finally, the infamous TopStudents table:
+-------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+----------------+
| id | int (20) unsigned | NO | PRI | NULL | auto_increment |
| idStudent | int (20) unsigned | NO | FOR | NULL | |
| sumMarks | int (20) unsigned | NO | | NULL | |
| rank |tinyInt (1) unsigned | NO | | NULL | |
| date | date | NO | | NULL | |
+-------------+---------------------+------+-----+---------+----------------+
ATTEMPTS:
Attempt 1: ERROR: all ranks are 0
INSERT INTO topStudents(`date`, idStudent, `sum`, `order`)
SELECT
'2018-10-10' AS DATE,
student.id AS idStudent,
AVG(marks.mark)
@n = @n + 1 AS `order`
FROM
marks
INNER JOIN student ON student.id = marks.idStudent
INNER JOIN class ON class.id = marks.idClass
INNER JOIN grade ON class.idGrade = grade.id
WHERE
grade.id = 2
GROUP BY
marks.idStudent
ORDER BY
SUM(mark)
DESC
LIMIT 3
Attempt 2: ranks returned: 1, 11, 10
SET @n := 0;
INSERT INTO topStudents(`date`, idStudent, `sum`, `rank`)
SELECT
'2018-10-10' AS DATE,
tbl.idStudent AS idStudent,
AVG(tbl.mark) AS `sum`,
rnk AS `rank`
FROM (SELECT student.id AS idStudent, SUM(mark) AS mark FROM
marks
INNER JOIN student ON student.id = marks.idStudent
INNER JOIN class ON class.id = marks.idClass
INNER JOIN grade ON class.idGrade = grade.id
WHERE
grade.id = 2
GROUP BY
marks.idStudent
ORDER BY
SUM(mark)
DESC
LIMIT 3) AS tbl, (SELECT @n = @n + 1) AS rnk
In more recent versions of MySQL, you need to use a derived table for the ordering, before assigning the ranks:
INSERT INTO topStudents (`date`, idStudent, `sum`, `order`)
SELECT date, idStudent, `sum`, (@n := @n + 1) AS `order`
FROM (SELECT '2018-10-10' AS DATE, s.id AS idStudent,
SUM(m.mark) / (SELECT COUNT(*) FROM marks m2 WHERE m2.idStudent = m.idStudent) AS `sum`
FROM marks m JOIN
student s
ON s.id = m.idStudent JOIN
class c
ON c.id = m.idClass JOIN
grade g
ON c.idGrade = g.id
WHERE g.id = 2
GROUP BY m.idStudent
ORDER BY SUM(mark) DESC
LIMIT 3
) sm CROSS JOIN
(SELECT @n := 0) params;
I am almost certain that the calculation for sum is incorrect, and that you really intend avg(mark). However, this is the logic you have in your question.
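Here is a sketch of the same top-three insert with two deliberate changes. First, it ranks by AVG(mark), as the note above suggests was intended. Second, it swaps the @n user variable for ROW_NUMBER(), a window function available in MySQL 8 and SQLite 3.25+. It runs against an in-memory SQLite database. The trimmed schema (only marks and class are needed), the column names avgMark and rnk, and the sample marks are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE class (id INTEGER PRIMARY KEY, name TEXT, idGrade INTEGER);
CREATE TABLE marks (id INTEGER PRIMARY KEY, idStudent INTEGER, mark INTEGER, idClass INTEGER);
-- hypothetical column names: avgMark instead of sum, rnk instead of rank/order
CREATE TABLE topStudents (
    id INTEGER PRIMARY KEY, date TEXT, idStudent INTEGER, avgMark REAL, rnk INTEGER
);
INSERT INTO class VALUES (10, 'C1', 1), (20, 'C2', 2);
INSERT INTO marks (idStudent, mark, idClass) VALUES
    (1, 90, 20), (1, 70, 20),   -- student 1: sum 160, avg 80
    (2, 95, 20),                -- student 2: sum 95,  avg 95
    (3, 60, 20), (3, 64, 20),   -- student 3: sum 124, avg 62
    (4, 85, 20),                -- student 4: sum 85,  avg 85
    (5, 99, 10);                -- student 5 is in grade 1, excluded
""")

# Pick the three best averages in the derived table, then number them 1..3.
conn.execute("""
INSERT INTO topStudents (date, idStudent, avgMark, rnk)
SELECT '2018-10-10', idStudent, avgMark,
       ROW_NUMBER() OVER (ORDER BY avgMark DESC)
FROM (
    SELECT m.idStudent, AVG(m.mark) AS avgMark
    FROM marks m
    JOIN class c ON c.id = m.idClass
    WHERE c.idGrade = ?
    GROUP BY m.idStudent
    ORDER BY avgMark DESC
    LIMIT 3
) top3
""", (2,))

ranked = conn.execute(
    "SELECT rnk, idStudent, avgMark FROM topStudents ORDER BY rnk").fetchall()
print(ranked)  # [(1, 2, 95.0), (2, 4, 85.0), (3, 1, 80.0)]

# Ranking by SUM instead favours students with more marks.
by_sum = [row[0] for row in conn.execute("""
    SELECT m.idStudent FROM marks m JOIN class c ON c.id = m.idClass
    WHERE c.idGrade = 2 GROUP BY m.idStudent ORDER BY SUM(m.mark) DESC LIMIT 3
""")]
print(by_sum)  # [1, 3, 2]
```

With this data the two orderings pick different students, which is why the choice between SUM and AVG matters. ROW_NUMBER() assigns arbitrary distinct numbers to ties; RANK() would give tied students the same number.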

Mysql use query result in new query

I have two tables, one called points_log and one called leaderboard.
mysql> describe points_log;
+---------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------+------+-----+---------+-------+
| user_id | int(11) | NO | | NULL | |
| points | int(11) | YES | | 0 | |
| date | date | NO | | NULL | |
+---------+---------+------+-----+---------+-------+
3 rows in set (0.00 sec)
mysql> describe leaderboard;
+-----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| bucket | varchar(255) | YES | | NULL | |
| user_id | int(11) | YES | | NULL | |
| school_id | int(11) | YES | | NULL | |
+-----------+--------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
I have the following query:
SELECT leaderboard.user_id FROM leaderboard where
leaderboard.bucket=(SELECT bucket FROM leaderboard WHERE leaderboard.user_id=$user_id) AND
leaderboard.school_id = (SELECT school_id FROM leaderboard WHERE leaderboard.user_id=$user_id)
This returns one or more rows with the user_ids that are in the same bucket and school as the $user_id passed in. What I want to do is take all of those user_ids and run the following query for each of them:
SELECT sum(points) FROM points_log WHERE user_id=$user_id AND
date >= (SELECT subdate(curdate(), INTERVAL (weekday(now())) DAY))
The issue is that this second query is not guaranteed to return something, so when it returns nothing I want sum(points) to be 0. I also need to return the user_id, bucket, and sum(points) for each row.
Right now what I have is
SELECT leaderboard.user_id,sum(points_log.points) AS points, leaderboard.bucket
FROM points_log LEFT JOIN leaderboard ON points_log.user_id = leaderboard.user_id
WHERE points_log.DATE >= (SELECT subdate(curdate(), INTERVAL (weekday(now())) DAY))
AND leaderboard.bucket=(SELECT bucket FROM leaderboard WHERE leaderboard.user_id=$user_id)
AND leaderboard.school_id = (SELECT school_id FROM leaderboard WHERE leaderboard.user_id=$user_id)
GROUP BY USER_ID ORDER BY SUM(points) DESC
The issue with this is that it only works when there is a value in points_log for that user. I'm unsure how to make it default to 0 if there is no value.
Any help is greatly appreciated!
SELECT leaderboard.user_id, COALESCE(SUM(points_log.points), 0) AS points, leaderboard.bucket
FROM points_log RIGHT OUTER JOIN leaderboard
  ON points_log.user_id = leaderboard.user_id
 AND points_log.DATE >= (SELECT subdate(curdate(), INTERVAL (weekday(now())) DAY))
WHERE leaderboard.bucket = (SELECT bucket FROM leaderboard WHERE leaderboard.user_id=$user_id)
  AND leaderboard.school_id = (SELECT school_id FROM leaderboard WHERE leaderboard.user_id=$user_id)
GROUP BY leaderboard.user_id, leaderboard.bucket
ORDER BY COALESCE(SUM(points_log.points), 0) DESC
Try this. Note the outer join, the COALESCE function, and that the date condition is in the ON clause. A WHERE condition on points_log.DATE would discard the leaderboard rows that have no matching points (their DATE is NULL) and turn the outer join back into an inner join.
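Here is a small check of why the date condition belongs in the ON clause, using SQLite in memory as a stand-in for MySQL. It joins leaderboard LEFT JOIN points_log, which is equivalent to points_log RIGHT OUTER JOIN leaderboard; older SQLite versions lack RIGHT JOIN. The fixed week-start date and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leaderboard (bucket TEXT, user_id INTEGER, school_id INTEGER);
CREATE TABLE points_log (user_id INTEGER, points INTEGER DEFAULT 0, date TEXT NOT NULL);
INSERT INTO leaderboard VALUES ('gold', 1, 7), ('gold', 2, 7), ('gold', 3, 7), ('silver', 4, 7);
INSERT INTO points_log VALUES
    (1, 10, '2024-05-07'), (1, 5, '2024-05-08'),  -- user 1: this week
    (2, 20, '2024-04-01');                         -- user 2: before this week
""")

# Hypothetical fixed week start, standing in for SUBDATE(CURDATE(), INTERVAL WEEKDAY(NOW()) DAY).
PARAMS = {"week": "2024-05-06", "uid": 1}

# Date filter in ON: users without recent points keep their row and get 0.
IN_ON = """
SELECT l.user_id, COALESCE(SUM(p.points), 0) AS points, l.bucket
FROM leaderboard l
LEFT JOIN points_log p
  ON p.user_id = l.user_id AND p.date >= :week
WHERE l.bucket = (SELECT bucket FROM leaderboard WHERE user_id = :uid)
  AND l.school_id = (SELECT school_id FROM leaderboard WHERE user_id = :uid)
GROUP BY l.user_id, l.bucket
ORDER BY COALESCE(SUM(p.points), 0) DESC, l.user_id
"""
# Date filter in WHERE: NULL dates fail the test, so those users disappear.
IN_WHERE = """
SELECT l.user_id, COALESCE(SUM(p.points), 0) AS points, l.bucket
FROM leaderboard l
LEFT JOIN points_log p
  ON p.user_id = l.user_id
WHERE p.date >= :week
  AND l.bucket = (SELECT bucket FROM leaderboard WHERE user_id = :uid)
  AND l.school_id = (SELECT school_id FROM leaderboard WHERE user_id = :uid)
GROUP BY l.user_id, l.bucket
ORDER BY COALESCE(SUM(p.points), 0) DESC, l.user_id
"""

on_rows = conn.execute(IN_ON, PARAMS).fetchall()
where_rows = conn.execute(IN_WHERE, PARAMS).fetchall()
print(on_rows)     # [(1, 15, 'gold'), (2, 0, 'gold'), (3, 0, 'gold')]
print(where_rows)  # [(1, 15, 'gold')]
```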

How to get the difference of a column between the most current date and the earliest date on multiple rows

Here are the columns of the users table:
+--------+-----------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+-----------------+------+-----+---------+----------------+
| uid | int(6) unsigned | YES | | NULL | |
| score | decimal(6,2) | YES | | NULL | |
| status | text | YES | | NULL | |
| date | datetime | YES | | NULL | |
| cid | int(7) unsigned | NO | PRI | NULL | auto_increment |
+--------+-----------------+------+-----+---------+----------------+
I want the difference between a user's most current score and earliest score. I tried:
select co1.uid, co1.score, co1.date from users as co1, (select uid, score, min(date) from users group by uid) as co2 where co2.uid = co1.uid;
This does not work. I also tried
select co1.uid, co1.score, co1.date from users as co1, (select uid, score, max(date) - min(date) from users group by uid) as co2 where co2.uid = co1.uid;
Result I get: http://pastebin.com/seR81WbE
Result I want:
uid max(score)-min(score)
1 40
2 -60
3 23
etc
I think the simplest solution is two joins:
select u.uid, umin.score as first_score, umax.score as last_score,
       umax.score - umin.score as score_diff
from (select uid, min(date) as mind, max(date) as maxd
      from users
      group by uid
     ) u join
     users umin
     on u.uid = umin.uid and umin.date = u.mind join
     users umax
     on u.uid = umax.uid and umax.date = u.maxd;
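Here is a runnable sketch of the two-join idea, computing the latest-minus-earliest difference the question asks for. It uses SQLite in memory as a stand-in for MySQL, with invented scores. uid 2's scores fall over time, which is exactly where this query and a plain max(score) - min(score) disagree. The sample data reproduces the 40 and -60 from the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    uid INTEGER, score REAL, status TEXT, date TEXT,
    cid INTEGER PRIMARY KEY AUTOINCREMENT
);
INSERT INTO users (uid, score, date) VALUES
    (1, 50, '2020-01-01'), (1, 70, '2020-02-01'), (1, 90, '2020-03-01'),
    (2, 80, '2020-01-05'), (2, 40, '2020-02-05'), (2, 20, '2020-03-05');
""")

# Latest score minus earliest score, found via each uid's min/max date.
FIRST_LAST = """
SELECT u.uid, umax.score - umin.score AS score_diff
FROM (SELECT uid, MIN(date) AS mind, MAX(date) AS maxd
      FROM users GROUP BY uid) u
JOIN users umin ON u.uid = umin.uid AND umin.date = u.mind
JOIN users umax ON u.uid = umax.uid AND umax.date = u.maxd
ORDER BY u.uid
"""
# Only equivalent when scores never decrease over time.
MAX_MIN = "SELECT uid, MAX(score) - MIN(score) FROM users GROUP BY uid ORDER BY uid"

first_last = conn.execute(FIRST_LAST).fetchall()
max_min = conn.execute(MAX_MIN).fetchall()
print(first_last)  # [(1, 40.0), (2, -60.0)]
print(max_min)     # [(1, 40.0), (2, 60.0)]
```

If a user has two rows with the same earliest or latest date, the joins produce more than one row for that uid.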
I should note: if you know the scores are only increasing, you can do the much simpler:
select uid, min(score), max(score), max(score) - min(score) as score_diff
from users
group by uid;

Get all rows from a table for a particular user along with sum

I have a table called real_estate. Its structure and data are as follows:
| id | user_id | details | location | worth
| 1 | 1 | Null | Null | 10000000
| 2 | 1 | Null | Null | 20000000
| 3 | 2 | Null | Null | 10000000
My query is the following:
SELECT * , SUM( worth ) as sum
FROM real_estate
WHERE user_id = '1'
The result which I get from this query is
| id | user_id | details | location | worth | sum
| 1 | 1 | Null | Null | 10000000 | 30000000
I want result to be like
| id | user_id | details | location | worth | sum
| 1 | 1 | Null | Null | 10000000 | 30000000
| 2 | 1 | Null | Null | 20000000 | 30000000
Is there any way to get the result the way I want or should I write 2 different queries?
1) To get the sum of worth
2) To get all the rows for that user
You need to use a subquery that calculates the sum for every user, and then JOIN the result of the subquery with your table:
SELECT real_estate.*, s.user_sum
FROM
  real_estate INNER JOIN (SELECT user_id, SUM(worth) AS user_sum
                          FROM real_estate
                          GROUP BY user_id) s
  ON real_estate.user_id = s.user_id
WHERE
  real_estate.user_id = '1'  -- qualified: both real_estate and s have a user_id column
But if you just need to return records for a single user, you could use this:
SELECT
real_estate.*,
(SELECT SUM(worth) FROM real_estate WHERE user_id='1') AS user_sum
FROM
real_estate
WHERE
user_id='1'
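Both queries above can be checked against the question's own sample rows, using SQLite in memory as a stand-in for MySQL. Each should return every row of user 1 alongside that user's total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE real_estate (
    id INTEGER PRIMARY KEY, user_id INTEGER, details TEXT, location TEXT, worth INTEGER
);
INSERT INTO real_estate VALUES
    (1, 1, NULL, NULL, 10000000),
    (2, 1, NULL, NULL, 20000000),
    (3, 2, NULL, NULL, 10000000);
""")

# Join each row to its owner's total (user_id must be qualified: both sides have it).
JOINED = """
SELECT r.id, r.worth, s.user_sum
FROM real_estate r
JOIN (SELECT user_id, SUM(worth) AS user_sum
      FROM real_estate GROUP BY user_id) s
  ON r.user_id = s.user_id
WHERE r.user_id = ?
ORDER BY r.id
"""
# Single-user variant: a scalar subquery computes the total once.
SCALAR = """
SELECT r.id, r.worth,
       (SELECT SUM(worth) FROM real_estate WHERE user_id = ?) AS user_sum
FROM real_estate r
WHERE r.user_id = ?
ORDER BY r.id
"""

joined = conn.execute(JOINED, (1,)).fetchall()
scalar = conn.execute(SCALAR, (1, 1)).fetchall()
print(joined)  # [(1, 10000000, 30000000), (2, 20000000, 30000000)]
```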
You can do your sum in a subquery like this
SELECT * , (select SUM(worth) from real_estate WHERE user_id = '1' ) as sum
FROM real_estate WHERE user_id = '1'
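Group by id:
SELECT * , SUM( worth ) as sum FROM real_estate WHERE user_id = '1' group by id
Note that grouping by id puts each row in its own group. As a result, sum equals that row's worth (10000000 and 20000000 here) rather than the user's total of 30000000.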

Optimize MySQL nested select with arithmetic operation

I have this SQL query running on a non-normalized MySQL 5.1 table. It works the way I want it to, but it can be quite slow. I added an index on the day column, but it still needs to be faster. Any suggestions on how to make it faster (maybe with a join instead)?
SELECT DISTINCT(bucket) AS b,
(possible_free_slots -
(SELECT COUNT(availability)
FROM ip_bucket_list
WHERE bucket = b
AND availability = 'used'
AND tday = 'evening'
AND day LIKE '2012-12-14%'
AND network = '10_83_mh1_bucket')) AS free_slots
FROM ip_bucket_list
ORDER BY free_slots DESC;
The individual queries are fast:
SELECT DISTINCT(bucket) FROM ip_bucket_list;
1024 rows in set (0.05 sec)
SELECT COUNT(availability) from ip_bucket_list WHERE bucket = 0 AND availability = 'used' AND tday = 'evening' AND day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket';
1 row in set (0.00 sec)
Table:
mysql> describe ip_bucket_list;
+---------------------+--------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| ip | varchar(50) | YES | | NULL | |
| bucket | int(11) | NO | MUL | NULL | |
| availability | varchar(20) | YES | | NULL | |
| network | varchar(100) | NO | MUL | NULL | |
| possible_free_slots | int(11) | NO | | NULL | |
| tday | varchar(20) | YES | | NULL | |
| day | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
+---------------------+--------------+------+-----+-------------------+----------------+
And the DESC (EXPLAIN) output:
DESC SELECT DISTINCT(bucket) as b,(possible_free_slots - (SELECT COUNT(availability) from ip_bucket_list WHERE bucket = b AND availability = 'used' AND tday = 'evening' AND day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket')) as free_slots FROM ip_bucket_list ORDER BY free_slots DESC;
+----+--------------------+----------------+------+-----------------------------------------+--------+---------+------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------------+------+-----------------------------------------+--------+---------+------+--------+---------------------------------+
| 1 | PRIMARY | ip_bucket_list | ALL | NULL | NULL | NULL | NULL | 328354 | Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | ip_bucket_list | ref | bucket,network,ip_bucket_list_day_index | bucket | 4 | func | 161 | Using where |
+----+--------------------+----------------+------+-----------------------------------------+--------+---------+------+--------+---------------------------------+
I would move the correlated subquery from the SELECT clause into the FROM clause, using a join:
SELECT distinct ipbl.bucket as b,
       (ipbl.possible_free_slots - COALESCE(a.avail, 0)) as free_slots  -- buckets with no 'used' rows get NULL from the outer join
FROM ip_bucket_list ipbl left outer join
     (SELECT bucket, COUNT(availability) as avail
      from ip_bucket_list
      WHERE availability = 'used' AND tday = 'evening' AND
            day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket'
      GROUP BY bucket
     ) a
     on ipbl.bucket = a.bucket
ORDER BY free_slots DESC;
The version in the SELECT clause is probably being re-run for every row (even before the DISTINCT is applied). When it is in the FROM clause, the aggregation over ip_bucket_list runs only once, and its per-bucket counts are then joined back to each row.
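Here is a runnable check of the derived-table join against the original correlated query, using SQLite in memory as a stand-in for MySQL 5.1. The sample rows are invented. The correlated query is written with explicit table aliases instead of relying on the select-list alias b. The join uses COALESCE(a.avail, 0) because a bucket with no matching 'used' rows gets NULL from the outer join, and subtracting NULL yields NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ip_bucket_list (
    id INTEGER PRIMARY KEY AUTOINCREMENT, ip TEXT, bucket INTEGER NOT NULL,
    availability TEXT, network TEXT NOT NULL, possible_free_slots INTEGER NOT NULL,
    tday TEXT, day TEXT NOT NULL
);
INSERT INTO ip_bucket_list (ip, bucket, availability, network, possible_free_slots, tday, day) VALUES
    ('10.0.0.1', 0, 'used', '10_83_mh1_bucket', 4, 'evening', '2012-12-14 18:00:00'),
    ('10.0.0.2', 0, 'used', '10_83_mh1_bucket', 4, 'evening', '2012-12-14 19:00:00'),
    ('10.0.0.3', 0, 'free', '10_83_mh1_bucket', 4, 'evening', '2012-12-14 19:00:00'),
    ('10.0.0.4', 1, 'used', '10_83_mh1_bucket', 4, 'morning', '2012-12-14 08:00:00'),
    ('10.0.0.5', 1, 'free', '10_83_mh1_bucket', 4, 'evening', '2012-12-14 18:00:00'),
    ('10.0.0.6', 2, 'used', '10_83_mh1_bucket', 4, 'evening', '2012-12-14 20:00:00');
""")

# Shared filter for "used evening slots on 2012-12-14 in this network".
USED = ("availability = 'used' AND tday = 'evening' "
        "AND day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket'")

# Original shape: a correlated subquery evaluated per outer row.
CORRELATED = f"""
SELECT DISTINCT o.bucket AS b,
       o.possible_free_slots - (SELECT COUNT(i.availability) FROM ip_bucket_list i
                                WHERE i.bucket = o.bucket AND {USED}) AS free_slots
FROM ip_bucket_list o
ORDER BY free_slots DESC
"""
# Join shape: count once per bucket, then attach the counts to each row.
JOINED = f"""
SELECT DISTINCT ipbl.bucket AS b,
       ipbl.possible_free_slots - COALESCE(a.avail, 0) AS free_slots
FROM ip_bucket_list ipbl
LEFT OUTER JOIN (SELECT bucket, COUNT(availability) AS avail
                 FROM ip_bucket_list WHERE {USED}
                 GROUP BY bucket) a
  ON ipbl.bucket = a.bucket
ORDER BY free_slots DESC
"""

correlated = conn.execute(CORRELATED).fetchall()
joined = conn.execute(JOINED).fetchall()
no_coalesce = conn.execute(JOINED.replace("COALESCE(a.avail, 0)", "a.avail")).fetchall()
print(joined)       # [(1, 4), (2, 3), (0, 2)]
print(no_coalesce)  # [(2, 3), (0, 2), (1, None)]: bucket 1 has no 'used' rows
```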
Also, if you are expecting each bucket to only show up once, then I would recommend that you use group by rather than distinct. It would clarify the purpose of the query. You may be able to eliminate the second reference to the table altogether, with something like:
SELECT bucket as b,
       max(possible_free_slots) -   -- assumes the same value on every row of a bucket
       sum(case when availability = 'used' AND tday = 'evening' AND
                     day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket'
                then 1 else 0
           end) as free_slots
FROM ip_bucket_list
group by bucket
ORDER BY free_slots DESC;
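Here is a sketch of the single-scan GROUP BY form, using SQLite in memory as a stand-in and invented rows. The subtraction has to be MAX(possible_free_slots) - SUM(CASE ...). Taking MAX over the per-row difference instead gives possible_free_slots (or one less) and ignores how many slots are used. Both variants are shown. The sketch assumes possible_free_slots is the same on every row of a bucket, as in the non-normalized table from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ip_bucket_list (
    id INTEGER PRIMARY KEY AUTOINCREMENT, ip TEXT, bucket INTEGER NOT NULL,
    availability TEXT, network TEXT NOT NULL, possible_free_slots INTEGER NOT NULL,
    tday TEXT, day TEXT NOT NULL
);
INSERT INTO ip_bucket_list (ip, bucket, availability, network, possible_free_slots, tday, day) VALUES
    ('10.0.0.1', 0, 'used', '10_83_mh1_bucket', 4, 'evening', '2012-12-14 18:00:00'),
    ('10.0.0.2', 0, 'used', '10_83_mh1_bucket', 4, 'evening', '2012-12-14 19:00:00'),
    ('10.0.0.3', 0, 'free', '10_83_mh1_bucket', 4, 'evening', '2012-12-14 19:00:00'),
    ('10.0.0.4', 1, 'used', '10_83_mh1_bucket', 4, 'morning', '2012-12-14 08:00:00'),
    ('10.0.0.5', 1, 'free', '10_83_mh1_bucket', 4, 'evening', '2012-12-14 18:00:00'),
    ('10.0.0.6', 2, 'used', '10_83_mh1_bucket', 4, 'evening', '2012-12-14 20:00:00');
""")

# 1 for a used evening slot on the day in question, else 0.
IS_USED = """CASE WHEN availability = 'used' AND tday = 'evening'
                   AND day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket'
              THEN 1 ELSE 0 END"""

# Correct: total slots minus the number of used rows in the bucket.
GROUPED = f"""
SELECT bucket AS b, MAX(possible_free_slots) - SUM({IS_USED}) AS free_slots
FROM ip_bucket_list GROUP BY bucket ORDER BY free_slots DESC, b
"""
# Misplaced MAX: the largest per-row difference, not slots minus the used count.
MAX_OF_DIFF = f"""
SELECT bucket AS b, MAX(possible_free_slots - {IS_USED}) AS free_slots
FROM ip_bucket_list GROUP BY bucket ORDER BY free_slots DESC, b
"""

grouped = conn.execute(GROUPED).fetchall()
max_of_diff = conn.execute(MAX_OF_DIFF).fetchall()
print(grouped)      # [(1, 4), (2, 3), (0, 2)]
print(max_of_diff)  # [(0, 4), (1, 4), (2, 3)]
```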
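To speed up your version of the query, you need an index that serves the correlated subquery. The existing index on bucket is already used (see the DEPENDENT SUBQUERY line in the EXPLAIN output). A composite index that also covers the other filter columns, for example (bucket, network, availability, tday, day), should narrow the rows further.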
Try moving the subquery into the main query - like so:
SELECT b.bucket AS b,
       b.possible_free_slots - COUNT(DISTINCT l.id) AS free_slots  -- DISTINCT: every row of b joins every matching row of l
FROM ip_bucket_list b
LEFT JOIN ip_bucket_list l
  ON l.bucket = b.bucket
 AND l.availability = 'used'
 AND l.tday = 'evening'
 AND l.day LIKE '2012-12-14%'
 AND l.network = '10_83_mh1_bucket'
GROUP BY b.bucket, b.possible_free_slots
ORDER BY 2 DESC;
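Here is a check of the self-join form, using SQLite in memory as a stand-in and invented rows. The table is non-normalized, so every row of a bucket in b joins every matching row in l. A plain COUNT(l.availability) is therefore multiplied by the number of rows in the bucket, while COUNT(DISTINCT l.id) counts each used row once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ip_bucket_list (
    id INTEGER PRIMARY KEY AUTOINCREMENT, ip TEXT, bucket INTEGER NOT NULL,
    availability TEXT, network TEXT NOT NULL, possible_free_slots INTEGER NOT NULL,
    tday TEXT, day TEXT NOT NULL
);
INSERT INTO ip_bucket_list (ip, bucket, availability, network, possible_free_slots, tday, day) VALUES
    ('10.0.0.1', 0, 'used', '10_83_mh1_bucket', 4, 'evening', '2012-12-14 18:00:00'),
    ('10.0.0.2', 0, 'used', '10_83_mh1_bucket', 4, 'evening', '2012-12-14 19:00:00'),
    ('10.0.0.3', 0, 'free', '10_83_mh1_bucket', 4, 'evening', '2012-12-14 19:00:00'),
    ('10.0.0.4', 1, 'used', '10_83_mh1_bucket', 4, 'morning', '2012-12-14 08:00:00'),
    ('10.0.0.5', 1, 'free', '10_83_mh1_bucket', 4, 'evening', '2012-12-14 18:00:00'),
    ('10.0.0.6', 2, 'used', '10_83_mh1_bucket', 4, 'evening', '2012-12-14 20:00:00');
""")

# {counted} is filled in with the COUNT argument to compare.
SELF_JOIN = """
SELECT b.bucket AS b,
       b.possible_free_slots - COUNT({counted}) AS free_slots
FROM ip_bucket_list b
LEFT JOIN ip_bucket_list l
  ON l.bucket = b.bucket AND l.availability = 'used' AND l.tday = 'evening'
 AND l.day LIKE '2012-12-14%' AND l.network = '10_83_mh1_bucket'
GROUP BY b.bucket, b.possible_free_slots
ORDER BY 2 DESC
"""

distinct_rows = conn.execute(SELF_JOIN.format(counted="DISTINCT l.id")).fetchall()
plain_rows = conn.execute(SELF_JOIN.format(counted="l.availability")).fetchall()
print(distinct_rows)  # [(1, 4), (2, 3), (0, 2)]
print(plain_rows)     # [(1, 4), (2, 3), (0, -2)]: bucket 0 counts 3 x 2 = 6 used
```

Even when it returns correct results, this self-join builds rows-per-bucket times matches-per-bucket intermediate rows. On a table of this size it may be slower than counting once per bucket in a derived table.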