Optimize nested query to single query - mysql

I have a (MySQL) table containing dates of the last scan of hosts combined with a report ID:
+--------------+---------------------+--------+
| host | last_scan | report |
+--------------+---------------------+--------+
| 112.86.115.0 | 2012-01-03 01:39:30 | 4 |
| 112.86.115.1 | 2012-01-03 01:39:30 | 4 |
| 112.86.115.2 | 2012-01-03 02:03:40 | 4 |
| 112.86.115.2 | 2012-01-03 04:33:47 | 5 |
| 112.86.115.1 | 2012-01-03 04:20:23 | 5 |
| 112.86.115.6 | 2012-01-03 04:20:23 | 5 |
| 112.86.115.2 | 2012-01-05 04:29:46 | 8 |
| 112.86.115.6 | 2012-01-05 04:17:35 | 8 |
| 112.86.115.5 | 2012-01-05 04:29:48 | 8 |
| 112.86.115.4 | 2012-01-05 04:17:37 | 8 |
+--------------+---------------------+--------+
I want to select a list of all hosts with the date of the last scan and the corresponding report id. I have built the following nested query, but I am sure it can be done in a single query:
SELECT rh.host, rh.report, rh.last_scan
FROM report_hosts rh
WHERE rh.report = (
SELECT rh2.report
FROM report_hosts rh2
WHERE rh2.host = rh.host
ORDER BY rh2.last_scan DESC
LIMIT 1
)
GROUP BY rh.host
Is it possible to do this with a single, non-nested query?

No, but you can use a JOIN instead of the correlated subquery:
SELECT x.*
FROM report_hosts x
INNER JOIN (
SELECT host,MAX(last_scan) AS last_scan FROM report_hosts GROUP BY host
) y ON x.host=y.host AND x.last_scan=y.last_scan
Your query does a filesort, which is very inefficient; my solution doesn't. It's strongly advisable to create an index on this table:
ALTER TABLE `report_hosts` ADD INDEX ( `host` , `last_scan` ) ;
Otherwise your query will do a filesort twice.
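To see what the derived-table JOIN returns for the sample data, here is a small sketch that runs the same SQL in an in-memory SQLite database through Python's `sqlite3` module (SQLite stands in for MySQL here, so this shows the result, not MySQL's execution plan). The index name `idx_host_scan` is made up. Note that if a host has two rows sharing the same maximum `last_scan`, the join returns both of them.

```python
import sqlite3

# Sample rows copied from the question's report_hosts table.
ROWS = [
    ("112.86.115.0", "2012-01-03 01:39:30", 4),
    ("112.86.115.1", "2012-01-03 01:39:30", 4),
    ("112.86.115.2", "2012-01-03 02:03:40", 4),
    ("112.86.115.2", "2012-01-03 04:33:47", 5),
    ("112.86.115.1", "2012-01-03 04:20:23", 5),
    ("112.86.115.6", "2012-01-03 04:20:23", 5),
    ("112.86.115.2", "2012-01-05 04:29:46", 8),
    ("112.86.115.6", "2012-01-05 04:17:35", 8),
    ("112.86.115.5", "2012-01-05 04:29:48", 8),
    ("112.86.115.4", "2012-01-05 04:17:37", 8),
]

# Derived table y holds MAX(last_scan) per host; joining back on
# (host, last_scan) recovers the full row, including the report id.
LATEST_PER_HOST = """
SELECT x.host, x.last_scan, x.report
FROM report_hosts x
INNER JOIN (
    SELECT host, MAX(last_scan) AS last_scan
    FROM report_hosts
    GROUP BY host
) y ON x.host = y.host AND x.last_scan = y.last_scan
ORDER BY x.host
"""


def latest_scans():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE report_hosts (host TEXT, last_scan TEXT, report INTEGER)"
    )
    # Composite index on (host, last_scan), as suggested above.
    conn.execute("CREATE INDEX idx_host_scan ON report_hosts (host, last_scan)")
    conn.executemany("INSERT INTO report_hosts VALUES (?, ?, ?)", ROWS)
    result = conn.execute(LATEST_PER_HOST).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_scans():
        print(row)
```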

If you want to select from the report_hosts table only once, you could use a sort of 'RANK OVER PARTITION' method (available in Oracle but, sadly, not in MySQL before 8.0). Something like this should work:
select h.host,h.last_scan as most_recent_scan,h.report
from
(
select rh.*,
case when @curHost != rh.host then @rank := 1 else @rank := @rank+1 end as rank,
case when @curHost != rh.host then @curHost := rh.host end
from report_hosts rh
cross join (select @rank := 0, @curHost := '') t
order by host asc,last_scan desc
) h
where h.rank = 1;
Granted, it is still nested, but it does avoid the 'double select' problem. I'm not sure whether it will be more efficient; that depends on which indexes you have and on the volume of data.
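The user-variable query works by evaluating the two CASE expressions row by row in `ORDER BY host asc, last_scan desc` order. The Python sketch below replays that logic so the numbering is easy to follow. It assumes the variables start as `''` and `0`: if `@curHost` starts as NULL, the first `@curHost != rh.host` comparison is NULL, the ELSE branch runs, and `NULL + 1` keeps the rank NULL. It also assumes MySQL evaluates the two CASE expressions left to right, which MySQL does not formally guarantee.

```python
from typing import List, Tuple

# (host, last_scan, report), as in the report_hosts table.
Row = Tuple[str, str, int]


def rank_like_user_variables(rows: List[Row]) -> List[Tuple[Row, int]]:
    """Replay the two CASE expressions row by row, in ORDER BY order."""
    # ORDER BY host ASC, last_scan DESC: sort by the secondary key first,
    # then stable-sort by the primary key.
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    ordered.sort(key=lambda r: r[0])

    cur_host = ""  # @curHost := '' (must differ from every real host)
    rank = 0       # @rank := 0
    ranked = []
    for row in ordered:
        if cur_host != row[0]:
            rank = 1           # first CASE: new host, restart at 1
            cur_host = row[0]  # second CASE: remember the current host
        else:
            rank += 1          # first CASE: same host, keep counting
        ranked.append((row, rank))
    return ranked


def most_recent_scans(rows: List[Row]) -> List[Row]:
    """Outer query: WHERE h.rank = 1."""
    return [row for row, rank in rank_like_user_variables(rows) if rank == 1]


if __name__ == "__main__":
    sample = [
        ("112.86.115.2", "2012-01-03 02:03:40", 4),
        ("112.86.115.2", "2012-01-05 04:29:46", 8),
        ("112.86.115.1", "2012-01-03 04:20:23", 5),
    ]
    print(most_recent_scans(sample))
```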

Related

Target table not updatable? Creating a sequence column with ORDER BY using MySql 5.5.62

I'm running the following query on MySQL 5.5.62, but I get an error message saying that the table is not updatable:
SET @row_number = 0;
> OK
> Time: 0,064s
UPDATE `tbl_a` t
JOIN ( SELECT ( @row_number := @row_number + 1 ) pNum,
pDateHour, pPrio, pID FROM `tbl_a` ORDER BY pDateHour ASC ) q ON t.pID = q.pID
SET t.pPrio = q.pNum
> 1288 - The target table q of the UPDATE is not updatable
> Time: 0,068s
What I'm trying to achieve is to add a sequence column that I can sort my results on, instead of sorting on multiple columns. It's based on an ORDER BY clause involving one column.
I need this result:
+-------+---------------------+
| pPrio | pDateHour |
+-------+---------------------+
| 1 | 2021-09-16 18:40:02 |
| 2 | 2021-09-16 19:00:20 |
| 3 | 2021-09-16 19:20:47 |
| 4 | 2021-09-16 20:00:59 |
| 5 | 2021-09-16 20:01:48 |
| 6 | 2021-09-16 20:20:31 |
| 7 | 2021-09-16 20:40:05 |
+-------+---------------------+
The sequence column you seem to want is actually a derived column based on pDateHour. Therefore, it might make more sense to actually generate this sequence at the time you query, rather than doing an update:
SELECT (@rn := @rn + 1) AS pPrio, pDateHour
FROM yourTable, (SELECT @rn := 0) AS x
ORDER BY pDateHour;
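If you would rather avoid user variables altogether, a different, variable-free technique is a correlated COUNT: a row's position is the number of rows whose `pDateHour` is less than or equal to its own. The sketch below checks that idea in SQLite with the timestamps from the desired output. It assumes `pDateHour` values are unique (tied timestamps would share a number), and it is O(n²), so it only suits small tables.

```python
import sqlite3

# Timestamps from the desired output, inserted out of order on purpose
# to show that the numbering follows pDateHour, not insertion order.
DATES = [
    "2021-09-16 20:01:48",
    "2021-09-16 18:40:02",
    "2021-09-16 20:40:05",
    "2021-09-16 19:00:20",
    "2021-09-16 20:20:31",
    "2021-09-16 19:20:47",
    "2021-09-16 20:00:59",
]

# Position = how many rows have a pDateHour <= this row's pDateHour.
SEQUENCE_QUERY = """
SELECT (SELECT COUNT(*) FROM tbl_a t2 WHERE t2.pDateHour <= t.pDateHour) AS pPrio,
       t.pDateHour
FROM tbl_a t
ORDER BY t.pDateHour
"""


def numbered_rows():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl_a (pID INTEGER PRIMARY KEY, pDateHour TEXT)")
    conn.executemany(
        "INSERT INTO tbl_a (pDateHour) VALUES (?)", [(d,) for d in DATES]
    )
    rows = conn.execute(SEQUENCE_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for pprio, date_hour in numbered_rows():
        print(pprio, date_hour)
```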

Alternative to "ntile" for MySQL version lower than 8?

I am trying the below code, which analyses and scores customers based on recency, frequency and monetary value of transactions.
select customer_id, rfm_recency, rfm_frequency, rfm_monetary
from
(
select customer_id,
ntile(4) over (order by last_order_date) as rfm_recency,
ntile(4) over (order by count_order) as rfm_frequency,
ntile(4) over (order by sum_amount) as rfm_monetary
from
(
select customer_id,
max(local_date) as last_order_date,
count(*) as count_order,
sum(amount) as sum_amount
from transaction
group by customer_id) as T
) as P
However, NTILE is not available in my MySQL version (5.x), apparently because it's a "window function", which only works in 8.0+.
I can't find a working alternative to this function. I am very new to SQL, so I'm having a hard time figuring it out myself.
Is there an NTILE alternative I can use? The code works fine if I remove the NTILE segment.
You should really upgrade to MySQL 8.0 if you need features in MySQL 8.0. They are bound to be easier and more optimized.
I found a way to simulate the ntile query shown in the documentation:
SELECT
val,
ROW_NUMBER() OVER w AS 'row_number',
NTILE(2) OVER w AS 'ntile2',
NTILE(4) OVER w AS 'ntile4'
FROM numbers
WINDOW w AS (ORDER BY val);
Here's a solution:
SELECT val, @r:=@r+1 AS rownum,
FLOOR((@r-1)*2/9)+1 AS ntile2,
FLOOR((@r-1)*4/9)+1 AS ntile4
FROM (SELECT @r:=0,@n:=0) AS _init, numbers
The 2 and 4 factors are for the ntile(2) and ntile(4) respectively. The 9 value is because there are 9 rows in this example table. You must know the count of the table before you can run this query. The solution also requires user defined variables, which are always kind of tricky.
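One caveat worth checking before adapting this: FLOOR((r-1)*n/N)+1 matches NTILE for the 9-row example, but not for every row count. Standard NTILE makes bucket sizes differ by at most one and gives the first (N mod n) buckets the extra row, while the formula spreads the larger buckets out. The Python sketch below computes both; `ntile_exact` is a helper written for this comparison, not a MySQL function. For 10 rows and 4 buckets, NTILE gives bucket sizes 3, 3, 2, 2 and the formula gives 3, 2, 3, 2.

```python
def ntile_exact(row_number: int, buckets: int, total: int) -> int:
    """Bucket of a 1-based row under standard NTILE semantics: bucket sizes
    differ by at most one and the larger buckets come first."""
    size, extra = divmod(total, buckets)
    big_rows = extra * (size + 1)  # rows covered by the first `extra` buckets
    if row_number <= big_rows:
        return (row_number - 1) // (size + 1) + 1
    return extra + (row_number - 1 - big_rows) // size + 1


def ntile_formula(row_number: int, buckets: int, total: int) -> int:
    """The answer's arithmetic: FLOOR((@r - 1) * buckets / total) + 1."""
    return (row_number - 1) * buckets // total + 1


if __name__ == "__main__":
    for total in (9, 10):
        exact = [ntile_exact(r, 4, total) for r in range(1, total + 1)]
        approx = [ntile_formula(r, 4, total) for r in range(1, total + 1)]
        print(total, exact, approx, exact == approx)
```

If exact NTILE behaviour matters, the same piecewise arithmetic can be written as a CASE expression over the row number and the row count.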
Result:
+------+--------+--------+--------+
| val | rownum | ntile2 | ntile4 |
+------+--------+--------+--------+
| 1 | 1 | 1 | 1 |
| 1 | 2 | 1 | 1 |
| 2 | 3 | 1 | 1 |
| 3 | 4 | 1 | 2 |
| 3 | 5 | 1 | 2 |
| 3 | 6 | 2 | 3 |
| 4 | 7 | 2 | 3 |
| 4 | 8 | 2 | 4 |
| 5 | 9 | 2 | 4 |
+------+--------+--------+--------+
I'll leave it as an exercise for you to adapt this technique to your query and your table, or to decide that it's time to upgrade to MySQL 8.0.
You can enumerate rows and use arithmetic. Unfortunately, you'll need to do this three times:
select floor((seqnum - 1) * 4 / @rn) + 1 as ntile_recency, t.*
from (select (@rn := @rn + 1) as seqnum, t.*
from (select customer_id, max(local_date) as last_order_date, count(*) as count_order,
sum(amount) as sum_amount
from transaction
group by customer_id
order by last_order_date
) t cross join
(select @rn := 0) params
) t;
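Here is a Python sketch of "enumerate rows and use arithmetic", done once per measure. It numbers customers 1..n in ascending order of each column (the role of `@rn := @rn + 1`) and then applies FLOOR((seqnum - 1) * 4 / n) + 1, which keeps every score in 1..4. FLOOR(seqnum * 4 / n) on its own does not: with 9 rows it gives 0 for the first two rows and 4 only for the last one. The SQL version relies on `@rn` holding the final count n by the time the outer SELECT runs; the customer data in the usage note is made up for illustration.

```python
from typing import Dict, List, Tuple


def quartile_by(customers: List[Dict], key: str, buckets: int = 4) -> Dict[int, int]:
    """Number customers 1..n in ascending `key` order, then map each
    sequence number to FLOOR((seqnum - 1) * buckets / n) + 1."""
    ordered = sorted(customers, key=lambda c: c[key])
    n = len(ordered)
    return {
        c["customer_id"]: (seqnum - 1) * buckets // n + 1
        for seqnum, c in enumerate(ordered, start=1)
    }


def rfm_scores(customers: List[Dict]) -> Dict[int, Tuple[int, int, int]]:
    # One separate enumeration per measure: recency, frequency, monetary.
    recency = quartile_by(customers, "last_order_date")
    frequency = quartile_by(customers, "count_order")
    monetary = quartile_by(customers, "sum_amount")
    return {cid: (recency[cid], frequency[cid], monetary[cid]) for cid in recency}


if __name__ == "__main__":
    # Hypothetical per-customer aggregates (output of the GROUP BY step).
    demo = [
        {"customer_id": 1, "last_order_date": "2021-01-01", "count_order": 5, "sum_amount": 10.0},
        {"customer_id": 2, "last_order_date": "2021-03-01", "count_order": 1, "sum_amount": 40.0},
    ]
    print(rfm_scores(demo))
```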

Limit on join query using between

I am trying to filter some results indexed by a timestamp using another set of results that define valid timestamp periods.
Current query:
SELECT Measurements.moment AS "moment",
Measurements.actualValue,
start,
stop
FROM Measurements
INNER JOIN (SELECT COALESCE(@previousValue <> M.actualValue AND @previousResource = M.resourceId, 1) AS "changed",
(COALESCE(@previousMoment, ?)) AS "start",
M.moment AS "stop",
@previousValue AS "actualValue",
M.resourceId,
@previousMoment := moment,
@previousValue := M.actualValue,
@previousResource := M.resourceId
FROM Measurements `M`
INNER JOIN (SELECT @previousValue := NULL, @previousResource := NULL, @previousMoment := NULL) `d`
WHERE (M.moment BETWEEN ? AND ?) AND
(M.actualValue > ?)
ORDER BY M.resourceId ASC, M.moment ASC) `changes` ON Measurements.moment BETWEEN changes.start AND changes.stop
WHERE (Measurements.resourceId = 1) AND
(Measurements.moment BETWEEN ? AND ?) AND
(changes.changed)
ORDER BY Measurements.moment ASC;
(resourceId, moment) is already an index.
Since these are actually time-series data, is there any way to limit the join to just one match to improve performance?
Sample data:
+-------------+---------------------+------------+
| actualValue | moment | resourceId |
+-------------+---------------------+------------+
| 0.01 | 2018-09-26 07:50:25 | 1 |
| 0.01 | 2018-09-26 07:52:35 | 1 |
| 0.01 | 2018-09-26 07:52:44 | 2 |
| 0.01 | 2018-09-26 07:52:54 | 1 |
| 0.01 | 2018-09-26 07:53:03 | 1 |
| 0.01 | 2018-09-26 07:53:13 | 2 |
| 0.01 | 2018-09-26 07:53:22 | 1 |
| 0.01 | 2018-09-26 07:54:32 | 1 |
| 0.01 | 2018-09-26 07:55:41 | 1 |
| 0.01 | 2018-09-26 07:56:51 | 1 |
+-------------+---------------------+------------+
Expected output: All measurements with resourceId=1 where resourceId=2 had a measurement in that same minute (in an advanced version, the minute can be dynamic).
+-------------+---------------------+------------+
| actualValue | moment | resourceId |
+-------------+---------------------+------------+
| 0.01 | 2018-09-26 07:52:35 | 1 |
| 0.01 | 2018-09-26 07:52:54 | 1 |
| 0.01 | 2018-09-26 07:53:03 | 1 |
| 0.01 | 2018-09-26 07:53:22 | 1 |
+-------------+---------------------+------------+
When you use an independent subquery (as in this case), it is executed entirely before the outer query. In your case its result could be massive, and most of those rows are probably not needed.
If you rephrase the query using an inner JOIN then the secondary access to the table will be filtered out immediately, avoiding the need for a full scan of the table.
Try the following query:
select
m.moment,
m.actualValue,
c.moment as start,
timestampadd(minute, 1, c.moment) as stop
from Measurements m
join Measurements c on m.moment
between c.moment and timestampadd(minute, 1, c.moment)
where m.resourceId = 1
and c.resourceId = 2
and m.moment between ? and ?
order by m.moment
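For the sample data, this sliding-window join (rows up to one minute after each resourceId = 2 measurement) returns 07:52:54, 07:53:03 and 07:53:22 twice, and misses 07:52:35, which is in the expected output. The expected output matches a "same calendar minute" reading instead. The sketch below runs both versions in SQLite so they can be compared. `datetime(x, '+1 minute')` stands in for `TIMESTAMPADD(MINUTE, 1, x)`, and `strftime('%Y-%m-%d %H:%M', x)` for what would presumably be `DATE_FORMAT(x, '%Y-%m-%d %H:%i')` in MySQL. The EXISTS form also avoids duplicates when several resourceId = 2 rows fall in the same minute.

```python
import sqlite3

# Sample rows from the question: (actualValue, moment, resourceId).
MEASUREMENTS = [
    (0.01, "2018-09-26 07:50:25", 1),
    (0.01, "2018-09-26 07:52:35", 1),
    (0.01, "2018-09-26 07:52:44", 2),
    (0.01, "2018-09-26 07:52:54", 1),
    (0.01, "2018-09-26 07:53:03", 1),
    (0.01, "2018-09-26 07:53:13", 2),
    (0.01, "2018-09-26 07:53:22", 1),
    (0.01, "2018-09-26 07:54:32", 1),
    (0.01, "2018-09-26 07:55:41", 1),
    (0.01, "2018-09-26 07:56:51", 1),
]

# The answer's self-join: match resource-1 rows up to one minute after
# each resource-2 row.
SLIDING_WINDOW = """
SELECT m.moment
FROM Measurements m
JOIN Measurements c
  ON m.moment BETWEEN c.moment AND datetime(c.moment, '+1 minute')
WHERE m.resourceId = 1 AND c.resourceId = 2
ORDER BY m.moment
"""

# Same-calendar-minute reading: keep a resource-1 row if resource 2 has
# any row in the same minute.
SAME_MINUTE = """
SELECT m.moment
FROM Measurements m
WHERE m.resourceId = 1
  AND EXISTS (
      SELECT 1 FROM Measurements c
      WHERE c.resourceId = 2
        AND strftime('%Y-%m-%d %H:%M', c.moment) = strftime('%Y-%m-%d %H:%M', m.moment)
  )
ORDER BY m.moment
"""


def run(query):
    """Load the sample into an in-memory database and return matching moments."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Measurements (actualValue REAL, moment TEXT, resourceId INTEGER)"
    )
    conn.execute("CREATE INDEX idx_resource_moment ON Measurements (resourceId, moment)")
    conn.executemany("INSERT INTO Measurements VALUES (?, ?, ?)", MEASUREMENTS)
    moments = [row[0] for row in conn.execute(query)]
    conn.close()
    return moments


if __name__ == "__main__":
    print("sliding window:", run(SLIDING_WINDOW))
    print("same minute:   ", run(SAME_MINUTE))
```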
Composite Index needed:
Measurements: INDEX(resourceId, moment) -- in this order
You may want AND (Measurements.moment BETWEEN ? AND ?) in the subquery
In a "derived table" (the subquery you have), the Optimizer is free to ignore the ORDER BY. If, however, you add a LIMIT, the ORDER BY will be honored.
I found a solution using table unpivoting:
SELECT moment, value
FROM (SELECT IF(resourceId = ? AND @previousValue = 0, NULL, actualValue) AS value,
measurements.moment,
resourceId,
@previousValue := IF(resourceId <> ?, actualValue, @previousValue) AS enabled
FROM (SELECT *
FROM (SELECT moment,
Measurements.actualValue,
Measurements.resourceId AS resourceId
FROM Measurements
WHERE Measurements.resourceId = ?
AND moment BETWEEN ? AND ?
UNION (SELECT start,
periods.actualValue AS actualValue,
resourceId
FROM (SELECT COALESCE(@previousValue <> M3.actualValue, 1) AS "changed",
(COALESCE(@previousMoment, ?)) AS "start",
@previousMoment := M3.moment AS "stop",
COALESCE(@previousValue, IF(M3.actualValue = 1, 0, 1)) AS "actualValue",
M3.resourceId AS resourceId,
@previousValue := M3.actualValue
FROM Measurements `M3`
INNER JOIN (SELECT @previousValue := NULL,
@previousMoment := NULL) `d`
WHERE (M3.moment BETWEEN ? AND ?)
ORDER BY M3.resourceId ASC, M3.moment ASC) AS periods
WHERE periods.changed)) AS measurements
ORDER BY moment ASC) AS measurements
INNER JOIN (SELECT @previousValue := NULL) `k`) AS mixed
WHERE value IS NOT NULL
AND resourceId = ?;
This essentially scans the table once per SELECT, processing ~40k x ~4k rows in about 100 ms.

Get user's highest score from a table

I have a feeling this is a very simple question, but maybe I'm having a brain fart right now and just can't figure out how to go about it.
I have a MySQL table structure like the one below:
+----+------------+-------+--------+---------+
| id | date       | score | speed  | user_id |
+----+------------+-------+--------+---------+
| 1  | 2016-11-17 | 2     | 133291 | 17      |
| 2  | 2016-11-17 | 6     | 82247  | 17      |
| 3  | 2016-11-17 | 6     | 21852  | 17      |
| 4  | 2016-11-17 | 1     | 109338 | 17      |
| 5  | 2016-11-17 | 7     | 64762  | 61      |
| 6  | 2016-11-17 | 8     | 49434  | 61      |
+----+------------+-------+--------+---------+
Now I can get a particular user's best performance by doing this:
SELECT *
FROM performance
WHERE user_id = 17 AND date = '2016-11-17'
ORDER BY score desc,speed asc LIMIT 1
This should return the row with ID = 3. What I want now is a single query that returns one such row for each unique user_id in the table, so the result would look something like this:
+----+------------+-------+--------+---------+
| id | date       | score | speed  | user_id |
+----+------------+-------+--------+---------+
| 3  | 2016-11-17 | 6     | 21852  | 17      |
| 6  | 2016-11-17 | 8     | 49434  | 61      |
+----+------------+-------+--------+---------+
Furthermore, can the same query also sort the final result by the same criteria (score desc, speed asc)? Thanks
A simple method uses a correlated subquery:
select p.*
from performance p
where p.date = '2016-11-17' and
p.id = (select p2.id
from performance p2
where p2.user_id = p.user_id and p2.date = p.date
order by score desc, speed asc
limit 1
);
This should be able to take advantage of an index on performance(date, user_id, score, speed).
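A quick SQLite check of the correlated subquery against the sample rows. The outer `ORDER BY p.score DESC, p.speed ASC` is an addition here that addresses the follow-up question about sorting the final result; it is not part of the original answer. For user 17, rows 2 and 3 tie on score 6, and `speed asc` picks row 3.

```python
import sqlite3

# Sample rows from the question: (id, date, score, speed, user_id).
ROWS = [
    (1, "2016-11-17", 2, 133291, 17),
    (2, "2016-11-17", 6, 82247, 17),
    (3, "2016-11-17", 6, 21852, 17),
    (4, "2016-11-17", 1, 109338, 17),
    (5, "2016-11-17", 7, 64762, 61),
    (6, "2016-11-17", 8, 49434, 61),
]

# For each row, keep it only if it is the best row for its user and date;
# the outer ORDER BY then sorts the winners by the same criteria.
BEST_PER_USER = """
SELECT p.*
FROM performance p
WHERE p.date = '2016-11-17'
  AND p.id = (SELECT p2.id
              FROM performance p2
              WHERE p2.user_id = p.user_id AND p2.date = p.date
              ORDER BY p2.score DESC, p2.speed ASC
              LIMIT 1)
ORDER BY p.score DESC, p.speed ASC
"""


def best_rows():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE performance (id INTEGER PRIMARY KEY, date TEXT, "
        "score INTEGER, speed INTEGER, user_id INTEGER)"
    )
    # Index suggested in the answer: performance(date, user_id, score, speed).
    conn.execute(
        "CREATE INDEX idx_perf ON performance (date, user_id, score, speed)"
    )
    conn.executemany("INSERT INTO performance VALUES (?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(BEST_PER_USER).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in best_rows():
        print(row)
```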
It's easy to use variables to emulate ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...).
Explanation:
First, create two variables in the subquery.
Order by user_id so that @rn resets to 1 whenever the user changes.
Order by score desc, speed asc so that each row gets a row number, and the row you want always has rn = 1.
@rn := ... updates @rn on every row:
if you have a new user_id, @rn is set to 1;
otherwise @rn is set to @rn + 1.
SELECT `id`, `date`, `score`, `speed`, `user_id`
FROM (
SELECT *,
@rn := if(@user_id = `user_id`,
@rn + 1,
if(@user_id := `user_id`,1,1)
) as rn
FROM performance
CROSS JOIN (SELECT @user_id := 0, @rn := 0) as param
WHERE date = '2016-11-17'
ORDER BY `user_id`, `score` desc, `speed` asc
) T
where T.rn =1
For MySQL, you can try a double subselect with GROUP BY:
select * from performance
where (user_id, score,speed ) in (
SELECT user_id, score, min(speed)
FROM performance
WHERE (user_id, score) in (select user_id, max(score) max_score
from performance
group by user_id)
group by user_id, score
);
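A quick SQLite check of the double-subselect idea, assuming two changes to the snippet: the middle query selects `score`, because the `max_score` alias only exists inside the innermost subquery, and it uses `MIN(speed)`, because a lower speed ranks higher under `speed asc`. SQLite has supported row-value `IN` comparisons since version 3.15. If two rows share the same user_id, score and speed, both are returned.

```python
import sqlite3

# Sample rows from the question: (id, date, score, speed, user_id).
ROWS = [
    (1, "2016-11-17", 2, 133291, 17),
    (2, "2016-11-17", 6, 82247, 17),
    (3, "2016-11-17", 6, 21852, 17),
    (4, "2016-11-17", 1, 109338, 17),
    (5, "2016-11-17", 7, 64762, 61),
    (6, "2016-11-17", 8, 49434, 61),
]

# Innermost: best score per user. Middle: lowest speed among those rows.
# Outer: fetch the full rows matching (user_id, score, speed).
DOUBLE_SUBSELECT = """
SELECT * FROM performance
WHERE (user_id, score, speed) IN (
    SELECT user_id, score, MIN(speed)
    FROM performance
    WHERE (user_id, score) IN (SELECT user_id, MAX(score)
                               FROM performance
                               GROUP BY user_id)
    GROUP BY user_id, score
)
ORDER BY score DESC, speed ASC
"""


def best_rows_double_subselect():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE performance (id INTEGER PRIMARY KEY, date TEXT, "
        "score INTEGER, speed INTEGER, user_id INTEGER)"
    )
    conn.executemany("INSERT INTO performance VALUES (?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(DOUBLE_SUBSELECT).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(best_rows_double_subselect())
```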

Split a column into a defined range in MYSQL

I have a table which looks like this:
+--------+------------+
| id     | first_name |
+--------+------------+
| AC0089 | John       |
| AC0015 | Dan        |
| AC0017 | Marry      |
| AC0003 | Andy       |
| AC0001 | Trent      |
| AC0006 | Smith      |
+--------+------------+
I need a query that splits the ids into ranges of 3 rows and also displays the id of the first row in each range, i.e.:
+------------+----------+--------+
| startrange | endrange | id     |
+------------+----------+--------+
| 1          | 3        | AC0089 |
| 4          | 6        | AC0003 |
+------------+----------+--------+
I am pretty new to SQL and tried the query below, but I don't think I am anywhere near the correct solution. Here is the query:
select startrange, endrange, id from table inner join (select 1 startRange, 3 endrange union all select 4 startRange, 6 endRange) r group by r.startRange, r.endRange;
It gives the same id every time, and I can't come up with any other solution. How can I get the required output?
Try this:
SET @ct := 0;
select startrange, (startrange + 2) as endrange, id from
(select (@ct := @ct + 1) as startrange, b.id from <table_name> as b) as c
where startrange mod 3 = 1;
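The counter-based idea can be checked in SQLite, where the implicit `rowid` (1, 2, 3, ... in insertion order) stands in for the `@ct` counter. This sketch assumes that "row order" means insertion order; neither MySQL nor SQLite guarantees any row order without an ORDER BY. The table name `people` is hypothetical. If the row count is not a multiple of 3, the last range's `endrange` will point past the final row.

```python
import sqlite3

# Rows in the order shown in the question.
PEOPLE = [
    ("AC0089", "John"),
    ("AC0015", "Dan"),
    ("AC0017", "Marry"),
    ("AC0003", "Andy"),
    ("AC0001", "Trent"),
    ("AC0006", "Smith"),
]

# A row whose position mod 3 is 1 starts a new range of three rows.
RANGES = """
SELECT rowid AS startrange, rowid + 2 AS endrange, id
FROM people
WHERE rowid % 3 = 1
ORDER BY rowid
"""


def ranges():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id TEXT, first_name TEXT)")
    conn.executemany("INSERT INTO people VALUES (?, ?)", PEOPLE)
    result = conn.execute(RANGES).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in ranges():
        print(row)
```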
I'm not completely sure what you're trying to do, but if you're trying to convert a table of IDs into ranges, use a CASE WHEN expression:
CASE WHEN startrange in(1,2,3) THEN 1
ELSE NULL
END as startrange,
CASE WHEN endrange in(1,2,3) THEN 3
ELSE NULL
END as endrange,
CASE WHEN ID in(1,2,3) THEN id
WHEN ID in(4,5,6) THEN id
ELSE id
END AS ID