Limit on join query using between - mysql

I am trying to filter some results indexed by a timestamp using another set of results that define valid timestamp periods.
Current query:
SELECT Measurements.moment AS "moment",
Measurements.actualValue,
start,
stop
FROM Measurements
INNER JOIN (SELECT COALESCE(@previousValue <> M.actualValue AND @previousResource = M.resourceId, 1) AS "changed",
(COALESCE(@previousMoment, ?)) AS "start",
M.moment AS "stop",
@previousValue AS "actualValue",
M.resourceId,
@previousMoment := moment,
@previousValue := M.actualValue,
@previousResource := M.resourceId
FROM Measurements `M`
INNER JOIN (SELECT @previousValue := NULL, @previousResource := NULL, @previousMoment := NULL) `d`
WHERE (M.moment BETWEEN ? AND ?) AND
(M.actualValue > ?)
ORDER BY M.resourceId ASC, M.moment ASC) `changes` ON Measurements.moment BETWEEN changes.start AND changes.stop
WHERE (Measurements.resourceId = 1) AND
(Measurements.moment BETWEEN ? AND ?) AND
(changes.changed)
ORDER BY Measurements.moment ASC;
There is already an index on (resourceId, moment).
Since this is time-series data, is there any way to limit the join to just one match to improve performance?
Sample data
+-------------+---------------------+------------+
| actualValue | moment | resourceId |
+-------------+---------------------+------------+
| 0.01 | 2018-09-26 07:50:25 | 1 |
| 0.01 | 2018-09-26 07:52:35 | 1 |
| 0.01 | 2018-09-26 07:52:44 | 2 |
| 0.01 | 2018-09-26 07:52:54 | 1 |
| 0.01 | 2018-09-26 07:53:03 | 1 |
| 0.01 | 2018-09-26 07:53:13 | 2 |
| 0.01 | 2018-09-26 07:53:22 | 1 |
| 0.01 | 2018-09-26 07:54:32 | 1 |
| 0.01 | 2018-09-26 07:55:41 | 1 |
| 0.01 | 2018-09-26 07:56:51 | 1 |
+-------------+---------------------+------------+
Expected output: All measurements with resourceId=1 where resourceId=2 had a measurement in that same minute (in an advanced version, the minute can be dynamic).
+-------------+---------------------+------------+
| actualValue | moment | resourceId |
+-------------+---------------------+------------+
| 0.01 | 2018-09-26 07:52:35 | 1 |
| 0.01 | 2018-09-26 07:52:54 | 1 |
| 0.01 | 2018-09-26 07:53:03 | 1 |
| 0.01 | 2018-09-26 07:53:22 | 1 |
+-------------+---------------------+------------+
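A minimal sketch of the expected output as a semi-join, assuming "same minute" means the same calendar minute (the timestamp truncated to HH:MM). It uses Python's built-in sqlite3 module as a stand-in for MySQL; in MySQL the truncation could be written DATE_FORMAT(moment, '%Y-%m-%d %H:%i'). EXISTS returns each resourceId=1 row at most once, however many resourceId=2 rows share its minute, which is one way to "limit the join to one match".

```python
import sqlite3

# Sample data from the question: (actualValue, moment, resourceId).
ROWS = [
    (0.01, "2018-09-26 07:50:25", 1),
    (0.01, "2018-09-26 07:52:35", 1),
    (0.01, "2018-09-26 07:52:44", 2),
    (0.01, "2018-09-26 07:52:54", 1),
    (0.01, "2018-09-26 07:53:03", 1),
    (0.01, "2018-09-26 07:53:13", 2),
    (0.01, "2018-09-26 07:53:22", 1),
    (0.01, "2018-09-26 07:54:32", 1),
    (0.01, "2018-09-26 07:55:41", 1),
    (0.01, "2018-09-26 07:56:51", 1),
]

# Assumption: SQLite stands in for MySQL; strftime() truncates to the minute.
SAME_MINUTE_SQL = """
    SELECT m.moment
    FROM Measurements AS m
    WHERE m.resourceId = 1
      AND EXISTS (
          SELECT 1
          FROM Measurements AS c
          WHERE c.resourceId = 2
            AND strftime('%Y-%m-%d %H:%M', c.moment)
              = strftime('%Y-%m-%d %H:%M', m.moment)
      )
    ORDER BY m.moment
"""


def same_minute_moments(conn):
    """Return resourceId=1 moments sharing a calendar minute with a resourceId=2 row."""
    return [moment for (moment,) in conn.execute(SAME_MINUTE_SQL)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Measurements (actualValue REAL, moment TEXT, resourceId INTEGER)")
conn.execute("CREATE INDEX idx_resource_moment ON Measurements (resourceId, moment)")
conn.executemany("INSERT INTO Measurements VALUES (?, ?, ?)", ROWS)
print(same_minute_moments(conn))
```

Comparing truncated values keeps the query simple; whether it is fast enough on the real table (the function call hides c.moment from a range scan) is something to measure rather than assume.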

When you use an independent subquery (as in this case), it is executed entirely before the outer query. In your case its result could be massive, and most of its rows are probably not needed.
If you rephrase the query as a plain inner JOIN, the second access to the table is filtered immediately, avoiding a full scan of the table.
Try the following query:
select
m.moment,
m.actualValue,
c.moment as start,
timestampadd(minute, 1, c.moment) as stop
from Measurements m
join Measurements c on m.moment
between c.moment and timestampadd(minute, 1, c.moment)
where m.resourceId = 1
and c.resourceId = 2
and m.moment between ? and ?
order by m.moment
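To see what this self-join returns on the question's sample rows, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. SQLite's datetime(x, '+1 minute') replaces TIMESTAMPADD(MINUTE, 1, x), and the m.moment BETWEEN ? AND ? filter is left out because the sample covers one short period.

```python
import sqlite3

# Sample data from the question: (actualValue, moment, resourceId).
ROWS = [
    (0.01, "2018-09-26 07:50:25", 1),
    (0.01, "2018-09-26 07:52:35", 1),
    (0.01, "2018-09-26 07:52:44", 2),
    (0.01, "2018-09-26 07:52:54", 1),
    (0.01, "2018-09-26 07:53:03", 1),
    (0.01, "2018-09-26 07:53:13", 2),
    (0.01, "2018-09-26 07:53:22", 1),
    (0.01, "2018-09-26 07:54:32", 1),
    (0.01, "2018-09-26 07:55:41", 1),
    (0.01, "2018-09-26 07:56:51", 1),
]

# Same shape as the answer's query: each resourceId=2 row opens a
# one-minute window that starts at its own timestamp.
WINDOW_JOIN_SQL = """
    SELECT m.moment, c.moment AS start, datetime(c.moment, '+1 minute') AS stop
    FROM Measurements AS m
    JOIN Measurements AS c
      ON m.moment BETWEEN c.moment AND datetime(c.moment, '+1 minute')
    WHERE m.resourceId = 1
      AND c.resourceId = 2
    ORDER BY m.moment, c.moment
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Measurements (actualValue REAL, moment TEXT, resourceId INTEGER)")
conn.executemany("INSERT INTO Measurements VALUES (?, ?, ?)", ROWS)
rows = conn.execute(WINDOW_JOIN_SQL).fetchall()
for row in rows:
    print(row)
```

On this sample the forward-looking window returns 07:52:54, 07:53:03 and 07:53:22, the last one twice (once per matching resourceId=2 row), and it does not return 07:52:35, which precedes its resourceId=2 neighbour within the same minute. Whether that matters depends on how "same minute" is meant.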

Composite index needed:
Measurements: INDEX(resourceId, moment) -- in this order
You may also want AND (Measurements.moment BETWEEN ? AND ?) inside the subquery.
In a "derived table" (the subquery you have), the Optimizer is free to ignore the ORDER BY. If, however, you add a LIMIT, the ORDER BY will be honored.

I found a solution using table unpivoting:
SELECT moment, value
FROM (SELECT IF(resourceId = ? AND @previousValue = 0, NULL, actualValue) AS value,
measurements.moment,
resourceId,
@previousValue := IF(resourceId <> ?, actualValue, @previousValue) AS enabled
FROM (SELECT *
FROM (SELECT moment,
Measurements.actualValue,
Measurements.resourceId AS resourceId
FROM Measurements
WHERE Measurements.resourceId = ?
AND moment BETWEEN ? AND ?
UNION (SELECT start,
periods.actualValue AS actualValue,
resourceId
FROM (SELECT COALESCE(@previousValue <> M3.actualValue, 1) AS "changed",
(COALESCE(@previousMoment, ?)) AS "start",
@previousMoment := M3.moment AS "stop",
COALESCE(@previousValue, IF(M3.actualValue = 1, 0, 1)) AS "actualValue",
M3.resourceId AS resourceId,
@previousValue := M3.actualValue
FROM Measurements `M3`
INNER JOIN (SELECT @previousValue := NULL,
@previousMoment := NULL) `d`
WHERE (M3.moment BETWEEN ? AND ?)
ORDER BY M3.resourceId ASC, M3.moment ASC) AS periods
WHERE periods.changed)) AS measurements
ORDER BY moment ASC) AS measurements
INNER JOIN (SELECT @previousValue := NULL) `k`) AS mixed
WHERE value IS NOT NULL
AND resourceId = ?;
This essentially scans the table once per SELECT, processing ~40k x ~4k rows in about 100 ms.

Related

Optimizing SQL Query for max value with various conditions from a single MySQL table

I have the following SQL query
SELECT *
FROM `sensor_data` AS `sd1`
WHERE (sd1.timestamp BETWEEN '2017-05-13 00:00:00'
AND '2017-05-14 00:00:00')
AND (`id` =
(
SELECT `id`
FROM `sensor_data` AS `sd2`
WHERE sd1.mid = sd2.mid
AND sd1.sid = sd2.sid
ORDER BY `value` DESC, `id` DESC
LIMIT 1)
)
Background:
I've checked the validity of the query by changing LIMIT 1 to LIMIT 0, and the query works without any problem. However, with LIMIT 1 the query never completes; it just keeps loading until I shut down and restart.
Breaking the Query down:
I have broken down the query with the date boundary as follows:
SELECT *
FROM `sensor_data` AS `sd1`
WHERE (sd1.timestamp BETWEEN '2017-05-13 00:00:00'
AND '2017-05-14 00:00:00')
This takes about 0.24 seconds and returns 8200 rows, each with 5 columns.
Question:
I suspect the second half of my query is incorrect or poorly optimized.
The tables are as follows:
Current Table:
+------+-------+-------+-----+-----------------------+
| id | mid | sid | v | timestamp |
+------+-------+-------+-----+-----------------------+
| 51 | 10 | 1 | 40 | 2015-05-13 11:56:01 |
| 52 | 10 | 2 | 39 | 2015-05-13 11:56:25 |
| 53 | 10 | 2 | 40 | 2015-05-13 11:56:42 |
| 54 | 10 | 2 | 40 | 2015-05-13 11:56:45 |
| 55 | 10 | 2 | 40 | 2015-05-13 11:57:01 |
| 56 | 11 | 1 | 50 | 2015-05-13 11:57:52 |
| 57 | 11 | 2 | 18 | 2015-05-13 11:58:41 |
| 58 | 11 | 2 | 19 | 2015-05-13 11:58:59 |
| 59 | 11 | 3 | 58 | 2015-05-13 11:59:01 |
| 60 | 11 | 3 | 65 | 2015-05-13 11:59:29 |
+------+-------+-------+-----+-----------------------+
Q: How would I get the MAX(v) for each sid within each mid?
NB#1: In the example above, rows 53, 54 and 55 all have the same value (40), but I would like to retrieve the row with the most recent timestamp, which is row 55.
Expected Output:
+------+-------+-------+-----+-----------------------+
| id | mid | sid | v | timestamp |
+------+-------+-------+-----+-----------------------+
| 51 | 10 | 1 | 40 | 2015-05-13 11:56:01 |
| 55 | 10 | 2 | 40 | 2015-05-13 11:57:01 |
| 56 | 11 | 1 | 50 | 2015-05-13 11:57:52 |
| 58 | 11 | 2 | 19 | 2015-05-13 11:58:59 |
| 60 | 11 | 3 | 65 | 2015-05-13 11:59:29 |
+------+-------+-------+-----+-----------------------+
Structure of the table:
NB#2:
Since this table has over 110 million entries, it is critical to have date boundaries, which limit the query to ~8000 entries over a 24-hour period.
The query can be written as follows:
SELECT t1.id, t1.mid, t1.sid, t1.v, t1.ts
FROM yourtable t1
INNER JOIN (
SELECT mid, sid, MAX(v) as v
FROM yourtable
WHERE ts BETWEEN '2015-05-13 00:00:00' AND '2015-05-14 00:00:00'
GROUP BY mid, sid
) t2
ON t1.mid = t2.mid
AND t1.sid = t2.sid
AND t1.v = t2.v
INNER JOIN (
SELECT mid, sid, v, MAX(ts) as ts
FROM yourtable
WHERE ts BETWEEN '2015-05-13 00:00:00' AND '2015-05-14 00:00:00'
GROUP BY mid, sid, v
) t3
ON t1.mid = t3.mid
AND t1.sid = t3.sid
AND t1.v = t3.v
AND t1.ts = t3.ts;
Edit and Explanation:
The first sub-query (first INNER JOIN) fetches MAX(v) per (mid, sid) combination. The second sub-query identifies MAX(ts) for every (mid, sid, v). At this point, the two queries do not influence each other's results. It is also important to note that the ts date range selection is done in the two sub-queries independently, so that the final query has fewer rows to examine and no additional WHERE filters to apply.
Effectively, this translates into getting MAX(v) per (mid, sid) combination initially (first sub-query); and if there is more than one record with the same value MAX(v) for a given (mid, sid) combo, then the excess records get eliminated by the selection of MAX(ts) for every (mid, sid, v) combination obtained by the second sub-query. We then simply associate the output of the two queries by the two INNER JOIN conditions to get to the id of the desired records.
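A minimal sketch that runs the double-join query above on the question's sample rows, using Python's sqlite3 module as a stand-in for MySQL. The table keeps the placeholder name yourtable and the ts column name from the answer; the final ORDER BY is added only to make the result order predictable.

```python
import sqlite3

# Sample rows from the question: (id, mid, sid, v, ts).
ROWS = [
    (51, 10, 1, 40, "2015-05-13 11:56:01"),
    (52, 10, 2, 39, "2015-05-13 11:56:25"),
    (53, 10, 2, 40, "2015-05-13 11:56:42"),
    (54, 10, 2, 40, "2015-05-13 11:56:45"),
    (55, 10, 2, 40, "2015-05-13 11:57:01"),
    (56, 11, 1, 50, "2015-05-13 11:57:52"),
    (57, 11, 2, 18, "2015-05-13 11:58:41"),
    (58, 11, 2, 19, "2015-05-13 11:58:59"),
    (59, 11, 3, 58, "2015-05-13 11:59:01"),
    (60, 11, 3, 65, "2015-05-13 11:59:29"),
]

# t2: MAX(v) per (mid, sid); t3: latest ts per (mid, sid, v).
# Joining both keeps only the most recent row holding the maximum v.
DOUBLE_JOIN_SQL = """
    SELECT t1.id, t1.mid, t1.sid, t1.v, t1.ts
    FROM yourtable AS t1
    JOIN (SELECT mid, sid, MAX(v) AS v
          FROM yourtable
          WHERE ts BETWEEN ? AND ?
          GROUP BY mid, sid) AS t2
      ON t1.mid = t2.mid AND t1.sid = t2.sid AND t1.v = t2.v
    JOIN (SELECT mid, sid, v, MAX(ts) AS ts
          FROM yourtable
          WHERE ts BETWEEN ? AND ?
          GROUP BY mid, sid, v) AS t3
      ON t1.mid = t3.mid AND t1.sid = t3.sid AND t1.v = t3.v AND t1.ts = t3.ts
    ORDER BY t1.mid, t1.sid
"""


def latest_max_per_group(conn, start, end):
    """Run the query; the same date range is bound into both sub-queries."""
    return conn.execute(DOUBLE_JOIN_SQL, (start, end, start, end)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER PRIMARY KEY, mid INTEGER, sid INTEGER, v INTEGER, ts TEXT)")
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?, ?)", ROWS)
result = latest_max_per_group(conn, "2015-05-13 00:00:00", "2015-05-14 00:00:00")
for row in result:
    print(row)
```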
select * from sensor_data s1 where s1.v in (select max(v) from sensor_data s2 group by s2.mid)
union
select * from sensor_data s1 where s1.v in (select max(v) from sensor_data s2 group by s2.sid);
IN ( SELECT ... ) does not optimize well, and it is even worse when the subquery is correlated.
What you are looking for is a groupwise max.
Please provide SHOW CREATE TABLE; we need to know at least what the PRIMARY KEY is.
Suggested code
You will need:
With the WHERE: INDEX(timestamp, mid, sid, v, id)
Without the WHERE: INDEX(mid, sid, v, timestamp, id)
Code:
SELECT id, mid, sid, v, timestamp
FROM ( SELECT @prev_mid := 99999, -- some value not in table
@prev_sid := 99999,
@n := 0 ) AS init
JOIN (
SELECT @n := if(mid != @prev_mid OR
sid != @prev_sid,
1, @n + 1) AS n,
@prev_mid := mid,
@prev_sid := sid,
id, mid, sid, v, timestamp
FROM sensor_data
WHERE timestamp >= '2017-05-13'
AND timestamp < '2017-05-13' + INTERVAL 1 DAY
ORDER BY mid DESC, sid DESC, v DESC, timestamp DESC
) AS x
WHERE n = 1
ORDER BY mid, sid; -- optional
Notes:
The index is 'composite' and 'covering'.
This should make one pass over the index, thereby providing 'good' performance.
The final ORDER BY is optional; without it, the results may come back in reverse order.
All the DESC keywords in the inner ORDER BY must be in place for this to work correctly (unless you are using MySQL 8.0).
Note how the WHERE avoids including both midnights, and avoids manually computing leap days, year ends, etc.
With the WHERE (and its associated INDEX), rows are filtered, but a sort is needed.
Without the WHERE (and with the other INDEX), no sort is needed.
You can test the performance of any competing formulations via this trick, even if you do not have enough rows (yet) to get reliable timings:
FLUSH STATUS;
SELECT ...
SHOW SESSION STATUS LIKE 'Handler%';
This can also be used to compare different versions of MySQL and MariaDB -- I have seen 3 significantly different performance characteristics in a related groupwise-max test.
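On MySQL 8.0, the user-variable emulation above can be replaced by the ROW_NUMBER() window function. A minimal sketch, run through Python's sqlite3 module as a stand-in (SQLite 3.25+ is needed for window functions); the timestamp column is called ts here, as in the previous answer, and the day range is half-open like the WHERE above.

```python
import sqlite3

# Sample rows from the question: (id, mid, sid, v, ts).
ROWS = [
    (51, 10, 1, 40, "2015-05-13 11:56:01"),
    (52, 10, 2, 39, "2015-05-13 11:56:25"),
    (53, 10, 2, 40, "2015-05-13 11:56:42"),
    (54, 10, 2, 40, "2015-05-13 11:56:45"),
    (55, 10, 2, 40, "2015-05-13 11:57:01"),
    (56, 11, 1, 50, "2015-05-13 11:57:52"),
    (57, 11, 2, 18, "2015-05-13 11:58:41"),
    (58, 11, 2, 19, "2015-05-13 11:58:59"),
    (59, 11, 3, 58, "2015-05-13 11:59:01"),
    (60, 11, 3, 65, "2015-05-13 11:59:29"),
]

# Assumption: window functions are available (MySQL 8.0+ / SQLite 3.25+).
# Rows are numbered per (mid, sid), highest v first, ties broken by latest ts.
ROW_NUMBER_SQL = """
    SELECT id, mid, sid, v, ts
    FROM (SELECT id, mid, sid, v, ts,
                 ROW_NUMBER() OVER (PARTITION BY mid, sid
                                    ORDER BY v DESC, ts DESC) AS n
          FROM sensor_data
          WHERE ts >= ? AND ts < ?) AS x
    WHERE n = 1
    ORDER BY mid, sid
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_data (id INTEGER PRIMARY KEY, mid INTEGER, sid INTEGER, v INTEGER, ts TEXT)")
conn.executemany("INSERT INTO sensor_data VALUES (?, ?, ?, ?, ?)", ROWS)
# From midnight up to, but not including, the next midnight.
winners = conn.execute(ROW_NUMBER_SQL, ("2015-05-13", "2015-05-14")).fetchall()
for row in winners:
    print(row)
```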

Get user's highest score from a table

I have a feeling this is a very simple question, but maybe I'm having a brain fart right now and just can't figure out how to go about it.
I have a MySQL table with the following structure:
+---------------------------------------------------+
| id | date | score | speed | user_id |
+---------------------------------------------------+
| 1 | 2016-11-17 | 2 | 133291 | 17 |
| 2 | 2016-11-17 | 6 | 82247 | 17 |
| 3 | 2016-11-17 | 6 | 21852 | 17 |
| 4 | 2016-11-17 | 1 | 109338 | 17 |
| 5 | 2016-11-17 | 7 | 64762 | 61 |
| 6 | 2016-11-17 | 8 | 49434 | 61 |
+---------------------------------------------------+
Now I can get a particular user's best performance by doing this:
SELECT *
FROM performance
WHERE user_id = 17 AND date = '2016-11-17'
ORDER BY score desc,speed asc LIMIT 1
This should return the row with ID = 3. Now what I want is a single query that returns one such row for each unique user_id in the table, so the result would look something like this:
+---------------------------------------------------+
| id | date | score | speed | user_id |
+---------------------------------------------------+
| 3 | 2016-11-17 | 6 | 21852 | 17 |
| 6 | 2016-11-17 | 8 | 49434 | 61 |
+---------------------------------------------------+
Furthermore, can the same query also sort the final result by the same criteria (score desc, speed asc)? Thanks
A simple method uses a correlated subquery:
select p.*
from performance p
where p.date = '2016-11-17' and
p.id = (select p2.id
from performance p2
where p2.user_id = p.user_id and p2.date = p.date
order by score desc, speed asc
limit 1
);
This should be able to take advantage of an index on performance(date, user_id, score, speed).
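A minimal sketch of the correlated-subquery approach on the question's sample rows, using Python's sqlite3 module as a stand-in for MySQL. The outer ORDER BY score DESC, speed ASC is an addition that answers the follow-up about sorting the final result; the index name is illustrative.

```python
import sqlite3

# Sample rows from the question: (id, date, score, speed, user_id).
ROWS = [
    (1, "2016-11-17", 2, 133291, 17),
    (2, "2016-11-17", 6, 82247, 17),
    (3, "2016-11-17", 6, 21852, 17),
    (4, "2016-11-17", 1, 109338, 17),
    (5, "2016-11-17", 7, 64762, 61),
    (6, "2016-11-17", 8, 49434, 61),
]

# The subquery picks each user's best row for the day; the outer
# ORDER BY then sorts the winners by the same criteria.
BEST_PER_USER_SQL = """
    SELECT p.id, p.date, p.score, p.speed, p.user_id
    FROM performance AS p
    WHERE p.date = ?
      AND p.id = (SELECT p2.id
                  FROM performance AS p2
                  WHERE p2.user_id = p.user_id AND p2.date = p.date
                  ORDER BY p2.score DESC, p2.speed ASC
                  LIMIT 1)
    ORDER BY p.score DESC, p.speed ASC
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE performance (id INTEGER PRIMARY KEY, date TEXT, "
    "score INTEGER, speed INTEGER, user_id INTEGER)"
)
conn.execute("CREATE INDEX idx_perf ON performance (date, user_id, score, speed)")
conn.executemany("INSERT INTO performance VALUES (?, ?, ?, ?, ?)", ROWS)
best = conn.execute(BEST_PER_USER_SQL, ("2016-11-17",)).fetchall()
print(best)
```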
This is easy using variables to emulate ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...).
Explanation:
First, initialize two variables in the subquery.
Order by user_id so that @rn resets to 1 whenever the user changes.
Then order by score desc, speed asc so each row gets a row number, and the one you want always has rn = 1.
@rn := ... updates @rn on every row:
if you have a new user_id, @rn is set to 1;
otherwise @rn is set to @rn + 1.
SELECT `id`, `date`, `score`, `speed`, `user_id`
FROM (
SELECT *,
@rn := if(@user_id = `user_id`,
@rn + 1 ,
if(@user_id := `user_id`,1,1)
) as rn
FROM Table1
CROSS JOIN (SELECT @user_id := 0, @rn := 0) as param
WHERE date = '2016-11-17'
ORDER BY `user_id`, `score` desc, `speed` asc
) T
where T.rn =1
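For comparison, a minimal sketch of the real ROW_NUMBER() that the variables above emulate (available in MySQL 8.0+), run through Python's sqlite3 module as a stand-in (SQLite 3.25+ needed). It keeps the ORDER BY user_id output order of the variable version.

```python
import sqlite3

# Sample rows from the question: (id, date, score, speed, user_id).
ROWS = [
    (1, "2016-11-17", 2, 133291, 17),
    (2, "2016-11-17", 6, 82247, 17),
    (3, "2016-11-17", 6, 21852, 17),
    (4, "2016-11-17", 1, 109338, 17),
    (5, "2016-11-17", 7, 64762, 61),
    (6, "2016-11-17", 8, 49434, 61),
]

# Assumption: window functions are available. PARTITION BY user_id restarts
# the numbering per user, exactly what the @rn reset does by hand.
ROW_NUMBER_SQL = """
    SELECT id, date, score, speed, user_id
    FROM (SELECT p.*,
                 ROW_NUMBER() OVER (PARTITION BY user_id
                                    ORDER BY score DESC, speed ASC) AS rn
          FROM performance AS p
          WHERE date = ?) AS t
    WHERE rn = 1
    ORDER BY user_id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE performance (id INTEGER PRIMARY KEY, date TEXT, "
    "score INTEGER, speed INTEGER, user_id INTEGER)"
)
conn.executemany("INSERT INTO performance VALUES (?, ?, ?, ?, ?)", ROWS)
firsts = conn.execute(ROW_NUMBER_SQL, ("2016-11-17",)).fetchall()
print(firsts)
```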
For MySQL, you can try a double IN subselect with GROUP BY:
select * from performance
where (user_id, score,speed ) in (
SELECT user_id, score, min(speed)
FROM performance
WHERE (user_id, score) in (select user_id, max(score) max_score
from performance
group by user_id)
group by user_id, score
);

"Faking" a sub-table of results from 3 tables, in SQL

If I have a table of cases:
CASE_NUMBER | CASE_ID | STATUS | SUBJECT |
----------------------------------------------------------------
3108 | 123456 | Closed_Billable | Something Interesting
3109 | 325124 | Closed_Billable | Broken printer
3110 | 432432 | Open_Assigned | Email not working
And a table of calls:
PARENT_ID | STATUS | DUR(H) | DUR(M) | SUBJECT
---------------------------------------------------------------
123456 | Held | 1 | 30 | Initial discussion
123456 | Cancelled | 0 | 0 | Walk user through
123456 | Held | 0 | 45 | Remote debug session
325124 | Held | 1 | 0 | Consultation
325124 | Held | 1 | 15 | Needs assessment
432432 | Held | 1 | 30 | Support call
And a table of meetings:
PARENT_ID | STATUS | DUR(H) | DUR(M) | SUBJECT
-------------------------------------------------------
123456 | Held | 3 | 15 | On-site work
325124 | Held | 2 | 0 | Un-jam printer
432432 | Held | 1 | 0 | Reconnect network
How do I do a select with these parameters (this is not working code, obviously):
SELECT cases.case_number, cases.subject, calls.subject, meetings.subject
WHERE cases.status="Closed_Billable" AND (calls.status="Held" OR meetings.status="Held")
LEFT JOIN cases
ON cases.case_id = calls.parent_id
LEFT JOIN cases
ON cases.case_id = meetings.parent_id
and end up with a "faked" nested table like:
CASE_NUMBER | CASE SUBJECT | # CALLS | # MEETINGS | CALL SUBJECT | MEETING SUBJECT | DURATION (H) | DURATION (M) | TOTAL
-----------------------------------------------------------------------------------------------------------------------------------------
3108 | Something Interesting | 2 | 1 | | | | | 5.5H
| | | | Initial Discussion | | 1 | 30 |
| | | | Remote Debug Session | | 0 | 45 |
| | | | | On-site work | 3 | 15 |
3109 | Broken printer | 2 | 1 | | | | | 4.25H
| | | | Consultation | | 1 | 0 |
| | | | Needs assessment | | 1 | 15 |
| | | | | Un-jam printer | 2 | 0 |
I've tried joins and subqueries the best I can figure out, but I get repeated entries - for example, each Meeting in a Case will show say 3 times, once for each Call in that case.
I'm stumped! Obviously there are other fields I'm pulling here, and I'm doing COUNTs of calls and meetings and SUMs of their durations, but I'd be happy just to show a table/sub-table like this.
Is it even possible?
Thanks,
David.
Assembling a query result in the exact format you want is... somewhat of a pain. It can be done, but presentation work like that is best left to the application.
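A minimal sketch of the application-side route in Python, with sqlite3 as a stand-in database. One UNION ALL query returns one flat row per held call or meeting of a billable case (so nothing is multiplied the way two LEFT JOINs multiply rows), and the grouping, counts, totals and blank cells are produced in code. The dhour/dmin column names and the function names are assumptions for illustration.

```python
import sqlite3

# Assumption: DUR(H)/DUR(M) are renamed dhour/dmin.
SCHEMA = """
CREATE TABLE cases (case_number INTEGER, case_id INTEGER, status TEXT, subject TEXT);
CREATE TABLE calls (parent_id INTEGER, status TEXT, dhour INTEGER, dmin INTEGER, subject TEXT);
CREATE TABLE meetings (parent_id INTEGER, status TEXT, dhour INTEGER, dmin INTEGER, subject TEXT);
"""

# One flat row per held call or meeting; UNION ALL avoids the call x meeting cross product.
FLAT_SQL = """
SELECT c.case_number, c.subject, 'call' AS kind, k.subject, k.dhour, k.dmin
FROM cases AS c JOIN calls AS k ON k.parent_id = c.case_id
WHERE c.status = 'Closed_Billable' AND k.status = 'Held'
UNION ALL
SELECT c.case_number, c.subject, 'meeting', m.subject, m.dhour, m.dmin
FROM cases AS c JOIN meetings AS m ON m.parent_id = c.case_id
WHERE c.status = 'Closed_Billable' AND m.status = 'Held'
ORDER BY 1, 3, 4
"""


def build_report(conn):
    """Group the flat rows per case and compute counts and total hours."""
    report = {}
    for number, case_subject, kind, subject, dhour, dmin in conn.execute(FLAT_SQL):
        case = report.setdefault(
            number,
            {"subject": case_subject, "calls": 0, "meetings": 0, "items": [], "total": 0.0},
        )
        case["calls" if kind == "call" else "meetings"] += 1
        case["items"].append((kind, subject, dhour, dmin))
        case["total"] += dhour + dmin / 60
    return report


def print_report(report):
    """Print a case line, then detail lines with the unused column left blank."""
    for number, case in report.items():
        print(f"{number} | {case['subject']} | {case['calls']} calls | "
              f"{case['meetings']} meetings | {case['total']:g}H")
        for kind, subject, dhour, dmin in case["items"]:
            call_col = subject if kind == "call" else ""
            meeting_col = subject if kind == "meeting" else ""
            print(f"     | {call_col:<20} | {meeting_col:<17} | {dhour} | {dmin}")


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO cases VALUES (?, ?, ?, ?)", [
    (3108, 123456, "Closed_Billable", "Something Interesting"),
    (3109, 325124, "Closed_Billable", "Broken printer"),
    (3110, 432432, "Open_Assigned", "Email not working"),
])
conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?, ?)", [
    (123456, "Held", 1, 30, "Initial discussion"),
    (123456, "Cancelled", 0, 0, "Walk user through"),
    (123456, "Held", 0, 45, "Remote debug session"),
    (325124, "Held", 1, 0, "Consultation"),
    (325124, "Held", 1, 15, "Needs assessment"),
    (432432, "Held", 1, 30, "Support call"),
])
conn.executemany("INSERT INTO meetings VALUES (?, ?, ?, ?, ?)", [
    (123456, "Held", 3, 15, "On-site work"),
    (325124, "Held", 2, 0, "Un-jam printer"),
    (432432, "Held", 1, 0, "Reconnect network"),
])
report = build_report(conn)
print_report(report)
```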
That said, this will do what you want:
select case when case_id > floor(case_id) then ''
else case_number
end case_number,
coalesce(q1.c, '') calls,
coalesce(q2.c, '') meetings,
coalesce(calls.subject, '') `call subject`,
coalesce(meetings.subject, '') `meeting subject`,
case when calls.subject is not null then calls.dhour
when meetings.subject is not null then meetings.dhour
else ''
end dhour,
case when calls.subject is not null then calls.dmin
when meetings.subject is not null then meetings.dmin
else ''
end dmin,
coalesce(q3.total, '') total
from
(
select case_number, case_id
from cases where status = 'Closed_Billable'
union select case_number, concat(case_id, '.1')
from cases where status = 'Closed_Billable'
union select case_number, concat(case_id, '.2')
from cases where status = 'Closed_Billable'
) main
left join
(select parent_id, count(*) c
from calls
where status != 'Cancelled'
group by parent_id ) q1
on q1.parent_id = case_id
left join
(select parent_id, count(*) c
from meetings
group by parent_id) q2
on q2.parent_id = case_id
left join
(select parent_id, sum(dhour + m) total
from
(select parent_id, dhour, dmin / 60 m
from calls
where status != 'Cancelled'
union all
select parent_id, dhour, dmin / 60 m
from meetings
) qq
group by parent_id
) q3
on q3.parent_id = case_id
left join calls
on concat(calls.parent_id, '.1') = main.case_id
left join meetings
on concat(meetings.parent_id, '.2') = main.case_id
order by case_id asc
Note: I've renamed your duration fields (to dhour and dmin) because I dislike the parentheses in them.
We have to mangle the case_id a little inside the query in order to get you your blank rows and fields; that is what makes the query cumbersome.
There's a demo here: http://sqlfiddle.com/#!9/d59d4/21
Edit: code adjusted to work with the different schema in the fiddle from the comments:
select case when case_id > floor(case_id) then ''
else case_number
end case_number,
coalesce(q1.c, '') calls,
coalesce(q2.c, '') meetings,
coalesce(calls.name, '') `call subject`,
coalesce(meetings.name, '') `meeting subject`,
case when calls.name is not null then calls.duration_hours
when meetings.name is not null then meetings.duration_hours
else ''
end duration_hours,
case when calls.name is not null then calls.duration_minutes
when meetings.name is not null then meetings.duration_minutes
else ''
end duration_minutes,
coalesce(q3.total, '') total
from
(
select case_number, id as case_id
from cases where status = 'Closed_Billable'
union select case_number, concat(id, '.1') as case_id
from cases where status = 'Closed_Billable'
union select case_number, concat(id, '.2') as case_id
from cases where status = 'Closed_Billable'
) main
left join
(select parent_id, count(*) c
from calls
where status != 'Cancelled'
group by parent_id ) q1
on q1.parent_id = case_id
left join
(select parent_id, count(*) c
from meetings
group by parent_id) q2
on q2.parent_id = case_id
left join
(select parent_id, sum(duration_hours + m) total
from
(select parent_id, duration_hours, duration_minutes / 60 m
from calls
where status != 'Cancelled'
union all
select parent_id, duration_hours, duration_minutes / 60 m
from meetings
) qq
group by parent_id
) q3
on q3.parent_id = case_id
left join calls
on concat(calls.parent_id, '.1') = main.case_id
left join meetings
on concat(meetings.parent_id, '.2') = main.case_id
order by case_id asc
You can't really get final results like that without some seriously ugly "wrapper" queries, of this sort:
SET @prevCaseNum := 'blahdyblahnowaythisshouldmatchanything';
SET @prevCaseSubject := 'seeabovetonotmatchanything';
SELECT IF(@prevCaseNum = CASE_NUMBER, '', CASE_NUMBER) AS CASE_NUMBER
, IF(@prevCaseNum = CASE_NUMBER AND @prevCaseSubject = CASE_SUBJECT, '', CASE_SUBJECT) AS CASE_SUBJECT
, etc.....
, @prevCaseNum := CASE_NUMBER AS prevCaseNum
, @prevCaseSubject := CASE_SUBJECT AS prevCaseSub
, etc....
FROM ( [the real query] ORDER BY CASE_NUMBER, etc....) AS trq
;
And then wrap all that with another select to strip the prevCase fields.
And even this still won't give you the blanks you want on the "upper right".

How to update the following rows after the sum of the previous rows reach a threshold? MySQL

I want to update the following rows after the sum of the previous rows reach a defined threshold. I'm using MySQL, and trying to think of a way to solve this using SQL only.
Here's an example, with a threshold of 100: iterating through the rows, once the sum of the previous rows' amounts is >= 100, set the following rows to checked.
Before the operation:
| id | amount | checked |
| 1 | 50 | false |
| 2 | 50 | false |
| 3 | 20 | false |
| 4 | 30 | false |
After the operation:
| id | amount | checked |
| 1 | 50 | false |
| 2 | 50 | false | <- threshold reached (50 + 50 >= 100)
| 3 | 20 | true* |
| 4 | 30 | true* |
Is it possible to do it with just a SQL query? Do I need a stored procedure? How could I implement it using either solution?
You can do this by calculating the cumulative amount and using UPDATE with a JOIN:
update table t join
(select t.*, (select sum(amount) from table t2 where t2.id <= t.id) as cum
from table t
) tcum
on tcum.id = t.id and tcum.cum - tcum.amount >= 100 -- cum - amount = sum of the previous rows
set checked = true;
EDIT:
For faster performance, you can use variables. The following should be a correct way to do this:
update table t join
(select t.*, (@cum := @cum + amount) as cum
from table t cross join
(select @cum := 0) vars
order by t.id
) tcum
on tcum.id = t.id and tcum.cum - tcum.amount >= 100
set checked = true;
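A minimal sketch of the same cumulative-sum idea, run with Python's sqlite3 module as a stand-in for MySQL, checked against the question's example. The table name items is hypothetical (the question only calls it "table"), and the condition compares the sum of the *previous* rows with the threshold, so row 2 (where the running total first reaches 100) stays unchecked.

```python
import sqlite3


def mark_after_threshold(conn, threshold):
    """Set checked = 1 on rows whose preceding rows (lower ids) already sum to >= threshold."""
    # COALESCE turns the empty sum for the first row into 0 instead of NULL.
    conn.execute(
        """
        UPDATE items
        SET checked = 1
        WHERE (SELECT COALESCE(SUM(p.amount), 0)
               FROM items AS p
               WHERE p.id < items.id) >= ?
        """,
        (threshold,),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, amount INTEGER, "
    "checked INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO items (id, amount) VALUES (?, ?)", [(1, 50), (2, 50), (3, 20), (4, 30)])
mark_after_threshold(conn, 100)
state = conn.execute("SELECT id, amount, checked FROM items ORDER BY id").fetchall()
print(state)  # [(1, 50, 0), (2, 50, 0), (3, 20, 1), (4, 30, 1)]
```

The correlated SUM is re-evaluated per row, which is fine for small tables; a single ordered pass (as in the variable version above) scales better.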
Something like this? I haven't tested it, so it may need a tweak or two, but it should do what you want.
UPDATE table t,
(SELECT id, amount,
@a := @a + amount AS cumulative_sum
FROM table
JOIN (SELECT @a := 0) as whatever
ORDER BY id
) temp
SET t.checked = true WHERE t.id = temp.id AND temp.cumulative_sum - temp.amount >= 100 ;

Optimize nested query to single query

I have a (MySQL) table containing dates of the last scan of hosts combined with a report ID:
+--------------+---------------------+--------+
| host | last_scan | report |
+--------------+---------------------+--------+
| 112.86.115.0 | 2012-01-03 01:39:30 | 4 |
| 112.86.115.1 | 2012-01-03 01:39:30 | 4 |
| 112.86.115.2 | 2012-01-03 02:03:40 | 4 |
| 112.86.115.2 | 2012-01-03 04:33:47 | 5 |
| 112.86.115.1 | 2012-01-03 04:20:23 | 5 |
| 112.86.115.6 | 2012-01-03 04:20:23 | 5 |
| 112.86.115.2 | 2012-01-05 04:29:46 | 8 |
| 112.86.115.6 | 2012-01-05 04:17:35 | 8 |
| 112.86.115.5 | 2012-01-05 04:29:48 | 8 |
| 112.86.115.4 | 2012-01-05 04:17:37 | 8 |
+--------------+---------------------+--------+
I want to select a list of all hosts with the date of the last scan and the corresponding report id. I have built the following nested query, but I am sure it can be done in a single query:
SELECT rh.host, rh.report, rh.last_scan
FROM report_hosts rh
WHERE rh.report = (
SELECT rh2.report
FROM report_hosts rh2
WHERE rh2.host = rh.host
ORDER BY rh2.last_scan DESC
LIMIT 1
)
GROUP BY rh.host
Is it possible to do this with a single, non-nested query?
No, but you can use a JOIN in your query:
SELECT x.*
FROM report_hosts x
INNER JOIN (
SELECT host,MAX(last_scan) AS last_scan FROM report_hosts GROUP BY host
) y ON x.host=y.host AND x.last_scan=y.last_scan
Your query is doing a filesort, which is very inefficient. My solution doesn't. It's strongly advisable to create an index on this table:
ALTER TABLE `report_hosts` ADD INDEX ( `host` , `last_scan` ) ;
Otherwise, your query will do a filesort twice.
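A minimal sketch of the JOIN-against-MAX approach on the question's sample rows, using Python's sqlite3 module as a stand-in for MySQL (the index mirrors the ALTER TABLE above; its name is illustrative). If two scans of the same host had the identical latest last_scan, this join would return both.

```python
import sqlite3

# Sample rows from the question: (host, last_scan, report).
ROWS = [
    ("112.86.115.0", "2012-01-03 01:39:30", 4),
    ("112.86.115.1", "2012-01-03 01:39:30", 4),
    ("112.86.115.2", "2012-01-03 02:03:40", 4),
    ("112.86.115.2", "2012-01-03 04:33:47", 5),
    ("112.86.115.1", "2012-01-03 04:20:23", 5),
    ("112.86.115.6", "2012-01-03 04:20:23", 5),
    ("112.86.115.2", "2012-01-05 04:29:46", 8),
    ("112.86.115.6", "2012-01-05 04:17:35", 8),
    ("112.86.115.5", "2012-01-05 04:29:48", 8),
    ("112.86.115.4", "2012-01-05 04:17:37", 8),
]

# y holds the latest scan per host; joining back on (host, last_scan)
# recovers the report id belonging to that scan.
LATEST_SCAN_SQL = """
    SELECT x.host, x.last_scan, x.report
    FROM report_hosts AS x
    JOIN (SELECT host, MAX(last_scan) AS last_scan
          FROM report_hosts
          GROUP BY host) AS y
      ON x.host = y.host AND x.last_scan = y.last_scan
    ORDER BY x.host
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report_hosts (host TEXT, last_scan TEXT, report INTEGER)")
conn.execute("CREATE INDEX idx_host_scan ON report_hosts (host, last_scan)")
conn.executemany("INSERT INTO report_hosts VALUES (?, ?, ?)", ROWS)
latest = conn.execute(LATEST_SCAN_SQL).fetchall()
for row in latest:
    print(row)
```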
If you want to select from the report_hosts table only once, you could use a sort of 'RANK OVER PARTITION' method (available in Oracle, but not in MySQL before 8.0). Something like this should work:
select h.host,h.last_scan as most_recent_scan,h.report
from
(
select rh.*,
case when @curHost != rh.host then @rank := 1 else @rank := @rank+1 end as rnk,
case when @curHost != rh.host then @curHost := rh.host end
from report_hosts rh
cross join (select @rank := 0, @curHost := '') t
order by host asc,last_scan desc
) h
where h.rnk = 1;
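Since MySQL 8.0, the ranking can be done with a real window function instead of user variables, which also removes the dependence on the evaluation order of expressions in the select list. A minimal sketch using Python's sqlite3 module as a stand-in (SQLite 3.25+ needed for ROW_NUMBER()):

```python
import sqlite3

# Sample rows from the question: (host, last_scan, report).
ROWS = [
    ("112.86.115.0", "2012-01-03 01:39:30", 4),
    ("112.86.115.1", "2012-01-03 01:39:30", 4),
    ("112.86.115.2", "2012-01-03 02:03:40", 4),
    ("112.86.115.2", "2012-01-03 04:33:47", 5),
    ("112.86.115.1", "2012-01-03 04:20:23", 5),
    ("112.86.115.6", "2012-01-03 04:20:23", 5),
    ("112.86.115.2", "2012-01-05 04:29:46", 8),
    ("112.86.115.6", "2012-01-05 04:17:35", 8),
    ("112.86.115.5", "2012-01-05 04:29:48", 8),
    ("112.86.115.4", "2012-01-05 04:17:37", 8),
]

# Assumption: window functions are available. One pass over the table;
# rn = 1 marks the most recent scan of each host.
RANKED_SQL = """
    SELECT host, last_scan AS most_recent_scan, report
    FROM (SELECT rh.*,
                 ROW_NUMBER() OVER (PARTITION BY rh.host
                                    ORDER BY rh.last_scan DESC) AS rn
          FROM report_hosts AS rh) AS h
    WHERE h.rn = 1
    ORDER BY host
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report_hosts (host TEXT, last_scan TEXT, report INTEGER)")
conn.executemany("INSERT INTO report_hosts VALUES (?, ?, ?)", ROWS)
ranked = conn.execute(RANKED_SQL).fetchall()
for row in ranked:
    print(row)
```

Unlike the JOIN-on-MAX version, ROW_NUMBER() returns exactly one row per host even if two scans share the same timestamp (which one wins is then arbitrary).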
Granted, it is still nested, but it avoids the 'double select' problem. I'm not sure whether it will be more efficient; that depends on which indexes you have and on the volume of data.