MySQL - get min/max of consecutive events in a series of rows

I have a table that looks like this:
http://sqlfiddle.com/#!9/152d2/1/0
CREATE TABLE Table1 (
id int,
value decimal(10,5),
dt datetime,
threshold_id int
);
Current Query:
SELECT sensors_id, DATE_FORMAT(datetime, '%Y-%m-%d'), MIN(value), MAX(value)
FROM Readings
WHERE datetime < "2015-11-18 00:00:00"
AND datetime > "2015-10-18 00:00:00"
AND sensors_id = 9
GROUP BY DATE_FORMAT(datetime, '%Y-%m-%d')
ORDER BY datetime DESC
What I'm trying to do is to return the min/max value in each group, where threshold_id IS NOT NULL. Therefore, the example should return something like:
min_value | max_value | start_date | end_date
9 | 10.5 | 2015-07-29 10:52:31 | 2015-07-29 10:57:31
8.5 | 9.5 | 2015-07-29 11:03:31 | 2015-07-29 11:05:31
I can't work out how to do this grouping. I need to return the min/max for each group of consecutive rows where the threshold_id IS NOT NULL.

Use user variables to compare each row's threshold_id with the previous row's, and increment a counter whenever it changes. That counter gives you a column to group by. Tested on my machine.
SELECT MIN(value),MAX(value),MIN(dt),MAX(dt)
FROM (
SELECT id,value,dt,
CASE WHEN COALESCE(threshold_id,'')=@last_ci THEN @n ELSE @n:=@n+1 END AS g,
@last_ci := COALESCE(threshold_id,'') As th
FROM
Table1, (SELECT @n:=0) r
ORDER BY
id
) s
WHERE th!=''
GROUP BY
g
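To make the grouping logic concrete outside SQL, here is a minimal Python sketch of the same idea: walk the rows in id order, bump a counter whenever threshold_id changes, and aggregate the rows that share a counter value. The rows are hypothetical (the fiddle data is not reproduced here), and `summarise_runs` is an illustrative name, not part of any library.

```python
# Hypothetical rows (id, value, dt, threshold_id); not the data from the fiddle.
ROWS = [
    (1, 8.0, "2015-07-29 10:50:31", None),
    (2, 9.0, "2015-07-29 10:52:31", 1),
    (3, 10.5, "2015-07-29 10:57:31", 1),
    (4, 7.0, "2015-07-29 11:00:31", None),
    (5, 8.5, "2015-07-29 11:03:31", 2),
    (6, 9.5, "2015-07-29 11:05:31", 2),
]


def summarise_runs(rows):
    """Return (min_value, max_value, start_date, end_date) per run of equal, non-null threshold_id."""
    groups = {}        # counter value -> list of (value, dt)
    counter = 0        # plays the role of the SQL group counter
    last = object()    # sentinel that equals no threshold, so the first row starts a group
    for _id, value, dt, threshold in sorted(rows, key=lambda r: r[0]):
        if threshold != last:
            counter += 1
            last = threshold
        if threshold is not None:
            groups.setdefault(counter, []).append((value, dt))
    return [
        (min(v for v, _ in run), max(v for v, _ in run),
         min(d for _, d in run), max(d for _, d in run))
        for run in groups.values()
    ]


summary = summarise_runs(ROWS)
```

With these sample rows the two runs come out as 9.0 to 10.5 and 8.5 to 9.5, matching the shape of the expected output in the question.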
For MySQL 8 this can be rewritten as below. Use a CTE to compute two row-number sequences and GROUP BY the difference between them.
WITH cte as (
SELECT *,
ROW_NUMBER() OVER (ORDER BY id)as rn,
ROW_NUMBER() OVER (PARTITION BY threshold_id ORDER BY id)as rnn
FROM Table1
ORDER BY id
)
SELECT MIN(value),MAX(value),MIN(dt),MAX(dt) FROM cte WHERE threshold_id IS NOT NULL GROUP BY rn-rnn
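Why the difference of the two row numbers identifies a run: within one run of equal threshold_id values both counters advance by one per row, so their difference stays constant, and it changes as soon as another value interrupts the run. A small Python sketch of that arithmetic follows. `island_keys` is a hypothetical helper, and pairing the difference with the threshold value in the key is my own precaution (an assumption, not part of the query above) so that runs of different thresholds can never share a key.

```python
from collections import defaultdict


def island_keys(thresholds):
    """For thresholds listed in id order, return one (threshold, rn - rnn) key per row."""
    seen = defaultdict(int)   # rows seen so far per threshold value (the partitioned row number)
    keys = []
    for rn, threshold in enumerate(thresholds, start=1):   # rn is the overall row number
        seen[threshold] += 1
        keys.append((threshold, rn - seen[threshold]))
    return keys


# Hypothetical threshold_id column, in id order.
keys = island_keys([None, 1, 1, None, 2, 2])
```

Rows 2 and 3 share one key and rows 5 and 6 share another, so grouping on the key reproduces the two runs.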

Your sample data only includes a single day's worth, so you only get a single row back (assuming you want to group by day):
SELECT DAYOFYEAR(dt) `day`, MIN(`value`) min_value, MAX(`value`) max_value
FROM Table1
GROUP BY `day`
ORDER BY `day` ASC

Related

Get sum of previous records in query and add or subtract the following results

Case:
I select a start date and an end date, and the query should return the movements of all products in that date range. If there were movements before the start date (records in the table), I also want the sum of those earlier movements (prevData).
For example, if the first movement in the range is an exit of 5 and the second is an income of 2,
the first row should show (prevData - 5) and the second row (prevData - 5 + 2), giving a cumulative balance.
prevData would be calculated as the sum of the earlier rows for the same product id. I wrote the query below, but if a product has 10 movements the subquery runs 10 times, and how would I identify the sum for another product_id?
SELECT
ik.id,
ik.quantity,
ik.date,
ik.product_id,
@balance = (SELECT SUM(quantity) FROM table_kardex WHERE product_id = ik.product_id AND id < ik.id)
from table_kardex ik
where ik.date between '2021-11-01' and '2021-11-15'
order by ik.product_id,ik.id asc
I hope I have made myself clear; I will be happy to answer any questions.
#table_kardex
id|quantity|date|product_id
1 8 2020-10-12 2
2 15 2020-10-12 1
3 5 2021-11-01 1
4 10 2021-11-01 2
5 -2 2021-11-02 1
6 -4 2021-11-02 2
#result
id|quantity|date|product_id|saldo
3 5 2021-11-01 1 20 (15+5)
5 -2 2021-11-02 1 18 (15+5-2)
4 10 2021-11-01 2 18 (8+10)
6 -4 2021-11-02 2 14 (8+10-4)
I am using MySQL 5.7.
If you're using MySQL 8+, then analytic functions can be used here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY date) rn,
SUM(quantity) OVER (PARTITION BY product_id ORDER BY date) saldo
FROM table_kardex
WHERE date BETWEEN '2021-11-01' AND '2021-11-15'
)
SELECT id, quantity, date, product_id, saldo
FROM cte
WHERE rn > 1
ORDER BY product_id, date;
MySQL 5.7
Try this:
SELECT *
FROM (
SELECT product_id,
t1.`date`,
SUM(t2.quantity) - t1.quantity cumulative_quantity_before,
SUM(t2.quantity) cumulative_quantity_after
FROM table_kardex t1
JOIN table_kardex t2 USING (product_id)
WHERE t1.`date` >= t2.`date`
AND t1.`date` <= @period_end
GROUP BY product_id, t1.`date`, t1.quantity
) prepare_data
WHERE `date` >= @period_start;
The easiest solution is to use the window function SUM OVER to get the running total. In a second step, restrict the result to the date range you want:
SELECT id, quantity, date, product_id, balance
FROM
(
SELECT
id,
quantity,
date,
product_id,
SUM(quantity) OVER (PARTITION BY product_id ORDER BY id) AS balance
from table_kardex ik
where date < DATE '2021-11-16'
) cumulated
WHERE date >= DATE '2021-11-01'
ORDER BY product_id, id;
UPDATE: You have changed your request to mention that you are using an old MySQL version (5.7), which doesn't support window functions. In that case use your original query. If I am not mistaken, though, @balance = (...) is invalid syntax for MySQL. And according to your explanation you want id <= ik.id, not id < ik.id:
SELECT
ik.id,
ik.quantity,
ik.date,
ik.product_id,
(
SELECT SUM(quantity)
FROM table_kardex
WHERE product_id = ik.product_id AND id <= ik.id
) AS balance
FROM table_kardex ik
WHERE ik.date >= DATE '2021-11-01' AND ik.date < DATE '2021-11-16'
ORDER BY ik.product_id, ik.id;
The appropriate indexes for this query are:
create index idx1 on table_kardex (date, product_id, id);
create index idx2 on table_kardex (product_id, id, quantity);
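The key point in the answers above is the order of operations: accumulate over all rows of a product first, then filter on the date range. A minimal Python sketch of that order, using the sample rows from the question (`running_balances` is a hypothetical name chosen for illustration):

```python
from collections import defaultdict

# Sample rows from the question: (id, quantity, date, product_id).
KARDEX = [
    (1, 8, "2020-10-12", 2),
    (2, 15, "2020-10-12", 1),
    (3, 5, "2021-11-01", 1),
    (4, 10, "2021-11-01", 2),
    (5, -2, "2021-11-02", 1),
    (6, -4, "2021-11-02", 2),
]


def running_balances(rows, start, end):
    """Accumulate per product over ALL rows, then keep only rows inside [start, end]."""
    balance = defaultdict(int)
    result = []
    for row_id, quantity, date, product_id in sorted(rows, key=lambda r: (r[3], r[0])):
        balance[product_id] += quantity      # earlier movements count towards the balance
        if start <= date <= end:             # ISO dates compare correctly as strings
            result.append((row_id, quantity, date, product_id, balance[product_id]))
    return result


report = running_balances(KARDEX, "2021-11-01", "2021-11-15")
```

Filtering before accumulating would lose the movements from 2020 and start every balance at zero.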

How to find condition when sum is specific value by MySQL

I'd like to know some condition from this table.
date value
2022-01-01 5
2022-01-02 1
2022-01-03 3
2022-01-04 0
2022-01-05 2
2022-01-06 2
On which date does the running sum of the values first exceed 10?
The answer is '2022-01-05', because the sum from '2022-01-01' to '2022-01-05' is 11. That is easy for a human to see.
But how do I express this in MySQL? Please let me know.
If you are using MySQL 8+, window functions make this easy:
WITH cte AS (
SELECT *, SUM(value) OVER (ORDER BY date) sum_value
FROM yourTable
)
SELECT date
FROM cte
WHERE sum_value > 10
ORDER BY date
LIMIT 1;
On earlier versions of MySQL we can express the rolling sum with a correlated subquery:
SELECT date
FROM yourTable t1
WHERE (SELECT SUM(t2.value)
FROM yourTable t2
WHERE t2.date <= t1.date) > 10
ORDER BY date
LIMIT 1;
Another approach for MySQL < 8, using a user variable to store the rolling sum:
SELECT `date`
FROM (
SELECT t.*, @sum_value := @sum_value + `value` AS `sum_value`
FROM t, (SELECT @sum_value := 0) z
ORDER BY `date` ASC
) y
WHERE `sum_value` > 10
ORDER BY `date` ASC
LIMIT 1;
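For comparison, the same running-sum search is short in plain Python. This is only a sketch of the logic using the sample data from the question; `first_date_exceeding` is a hypothetical helper name.

```python
from itertools import accumulate

# Sample data from the question: (date, value).
DATA = [
    ("2022-01-01", 5),
    ("2022-01-02", 1),
    ("2022-01-03", 3),
    ("2022-01-04", 0),
    ("2022-01-05", 2),
    ("2022-01-06", 2),
]


def first_date_exceeding(rows, limit):
    """Return the first date whose running total is strictly greater than limit, else None."""
    ordered = sorted(rows)                              # ISO dates sort chronologically
    totals = accumulate(value for _, value in ordered)  # running sum, like SUM() OVER (ORDER BY date)
    for (date, _), total in zip(ordered, totals):
        if total > limit:
            return date
    return None


answer = first_date_exceeding(DATA, 10)
```

The strict comparison matters: the question asks when the sum exceeds 10, not when it reaches 10.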

Update by LEAD/LAG or Recursive CTE?

There is a table holding dates and account balances.
However, the balance is not available for some dates.
Assume the balance does not change on dates where it is unavailable.
How can I fill in the balance for all dates?
Here is an example:
Table D contains all valid dates.
2000-01-01
2000-01-02
2000-01-03
2000-01-04
2000-01-05
2000-01-06
2000-01-07
2000-01-08
2000-01-09
Table A contains date and account balance.
2000-01-02 $100
2000-01-05 $200
2000-01-09 $700
Ultimately, I want to generate a table like this:
2000-01-01 null
2000-01-02 $100
2000-01-03 $100
2000-01-04 $100
2000-01-05 $200
2000-01-06 $200
2000-01-07 $200
2000-01-08 $200
2000-01-09 $700
I have thought about the following:
LEAD and LAG,
Recursive CTE
However, it seems that none of them is suitable for this scenario.
SQL Server (before SQL Server 2022) does not support the IGNORE NULLS option for LAG() or LAST_VALUE(), which would actually be the simplest method.
Instead, you can use APPLY:
select d.*, a.balance
from dates d outer apply
(select top (1) a.*
from a
where a.date <= d.date
order by a.date desc
) a;
Or the equivalent using a correlated subquery:
select d.*,
(select top (1) a.balance
from a
where a.date <= d.date
order by a.date desc
) as balance
from dates d;
This works in SQL Server. In MySQL, drop top (1) and end the subquery with limit 1 instead.
That said, if you had a large amount of data (which is unlikely at the granularity of "date"), then two steps of window functions are probably the better solution:
select ad.date,
max(ad.balance) over (partition by grp) as balance
from (select d.date, a.balance,
count(a.date) over (order by d.date) as grp
from dates d left join
a
on d.date = a.date
) ad;
The subquery assigns a "group" to each balance value and the following dates. This "group" is then used to assign the balance in the outer query.
This version will work in both MySQL (8+) and SQL Server.
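What all of these queries compute is a forward fill: each date takes the most recent known balance on or before it. A minimal Python sketch with the data from the question (`forward_fill` is a hypothetical name; balances are stored as plain integers for simplicity):

```python
# Data from the question: every valid date, and the dates that have a balance.
DATES = [f"2000-01-0{day}" for day in range(1, 10)]
BALANCES = {"2000-01-02": 100, "2000-01-05": 200, "2000-01-09": 700}


def forward_fill(dates, balances):
    """Pair each date with the latest balance recorded on or before it (None if there is none yet)."""
    filled = []
    last_known = None
    for day in sorted(dates):
        if day in balances:
            last_known = balances[day]   # a new balance starts a new "group"
        filled.append((day, last_known))
    return filled


filled = forward_fill(DATES, BALANCES)
```

The `last_known` variable plays the same role as the running count used as `grp` in the window-function version: it changes exactly when a new balance row appears.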
One way to achieve this is to use FIRST_VALUE together with a running count that acts as a ranking. I am using SQL Server:
with cte as
(
select '2000-01-01' as Datenew union all
select '2000-01-02' as Datenew union all
select '2000-01-03' as Datenew union all
select '2000-01-04' as Datenew union all
select '2000-01-05' as Datenew union all
select '2000-01-06' as Datenew union all
select '2000-01-07' as Datenew union all
select '2000-01-08' as Datenew union all
select '2000-01-09' as Datenew ), cte2 as (
select '2000-01-02' as DateSal, '100' as Salary union all
select '2000-01-05' as DateSal, '200' as Salary union all
select '2000-01-09' as DateSal, '700' as Salary )
select datenew, Salary = FIRST_VALUE(salary) over (partition by ranking order by datenew) from (
select datenew, salary ,
sum(case when DateSal is not null then 1 end) over (order by datenew) ranking
from cte c
left join cte2 c2 on c.Datenew = c2.DateSal ) tst
order by datenew
--Sum creates running total to create a grouping and first value ensures that we are getting the same value for the given group.
ANSI SQL.
Table_D
-------
dd(field name)
-------
2000-01-01
2000-01-02
2000-01-03
2000-01-04
2000-01-05
2000-01-06
2000-01-07
2000-01-08
2000-01-09
Table_A
-------
dd(field name) cost(field name)
-------
2000-01-02 $100
2000-01-05 $200
2000-01-09 $700
select a.dd
, (
case when a.cost is null then min(a.cost) OVER (partition by a.cost_group ORDER BY a.dd) else a.cost end
) as cost
from (
select a.dd, b.cost
, count(b.cost) over (order by a.dd) as cost_group
from Table_D a
left join Table_A b on (b.dd = a.dd)
) a
We can use count() over () window function to create different partitions and then min() over () function to spread minimum value to that partition.
First, I create a table variable to hold the OP's data:
declare @xyz table (dt date, amount int)
insert into @xyz values
('2000-01-02','100'),
('2000-01-05','200'),
('2000-01-09','700');
Second, I fetch the max date from the above table.
declare @maxDT date = (select cast(max(dt) as date) from @xyz);
Finally, the first CTE is a recursive CTE that generates the dates from 2000-01-01 to the max date stored in the above variable. The second CTE creates the partitions.
;with cte as (
select cast('2000-01-01' as date) as dt
union all
select dateadd(day,1,cte.dt) from cte where cte.dt < @maxDT
), cte2 as (
select cte.dt, x.amount, x.dt as dt2, count(x.dt) over (order by cte.dt) as ranking
from cte
left join @xyz x on x.dt = cte.dt
)
select dt, min(amount) over (partition by ranking) as amount
from cte2;

Query to subtract same column value at different interval of day with SQL database

In MySQL, I want to subtract the values of one column taken at different times of the day, based on another column, 'timestamp'.
The table structure is:
id | generator_id | timestamp | generated_value
1 | 1 | 2019-05-27 06:55:20 | 123456
2 | 1 | 2019-05-27 07:55:20 | 234566
3 | 1 | 2019-05-27 08:55:20 | 333456
..
..
20 | 1 | 2019-05-27 19:55:20 | 9876908
From the above table I want to fetch the difference between the generated_value at the first timestamp of the day and the generated_value at the last timestamp of the day.
In the above example I am looking for a query that gives me 9,753,452 (9876908 - 123456).
In general, to fetch the single record with the first or last value of the day, I use the queries below:
-- This gives me the last value of the day
SELECT * FROM generator_meters where generator_id=1 and timestamp like '2019-05-27%' order by timestamp desc limit 1 ;
-- This gives me the first value of the day
SELECT * FROM generator_meters where generator_id=1 and timestamp like '2019-05-27%' order by timestamp limit 1 ;
The question is how to get the final generated_value by subtracting the first value of the day from the last value of the day.
Expected Output
generator_id | generated_value
1 | 9753452
Thanks in advance !!
In your example the value gets bigger and bigger. If this is guaranteed to be so, you can use
select max(generated_value) - min(generated_value) as result
from sun_electric.generator_meters
where generator_id = 1
and date(timestamp) = date '2019-05-27';
Or for multiple IDs:
select generator_id, max(generated_value) - min(generated_value) as result
from sun_electric.generator_meters
where date(timestamp) = date '2019-05-27'
group by generator_id
order by generator_id;
If the value is not ascending, then you can use the following query for ID 1:
select last_row.generated_value - first_row.generated_value as result
from
(
select *
from sun_electric.generator_meters
where generator_id = 1
and date(timestamp) = date '2019-05-27'
order by timestamp
limit 1
) first_row
cross join
(
select *
from sun_electric.generator_meters
where generator_id = 1
and date(timestamp) = date '2019-05-27'
order by timestamp desc
limit 1
) last_row;
Here is one way to get a result for multiple IDs:
select
minmax.generator_id,
(
select generated_value
from sun_electric.generator_meters gm
where gm.generator_id = minmax.generator_id
and gm.timestamp = minmax.max_ts
) -
(
select generated_value
from sun_electric.generator_meters gm
where gm.generator_id = minmax.generator_id
and gm.timestamp = minmax.min_ts
) as result
from
(
select generator_id, min(timestamp) as min_ts, max(timestamp) as max_ts
from sun_electric.generator_meters
where date(timestamp) = date '2019-05-27'
group by generator_id
) minmax
order by minmax.generator_id;
You can also move the subqueries to the from clause and join them, if you like this better. Yet another approach would be to use window functions, available as of MySQL 8.
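As a cross-check of the "value at last timestamp minus value at first timestamp" logic, here is a small Python sketch. It uses only the four rows that are visible in the question (the elided rows in between are not invented), and `daily_difference` is a hypothetical name.

```python
# The four rows shown in the question: (generator_id, timestamp, generated_value).
READINGS = [
    (1, "2019-05-27 06:55:20", 123456),
    (1, "2019-05-27 07:55:20", 234566),
    (1, "2019-05-27 08:55:20", 333456),
    (1, "2019-05-27 19:55:20", 9876908),
]


def daily_difference(rows, day):
    """Per generator: value at the last timestamp of `day` minus value at the first one."""
    first_last = {}   # generator_id -> [first_value, last_value]
    for generator_id, timestamp, value in sorted(rows, key=lambda r: r[1]):
        if not timestamp.startswith(day):
            continue
        if generator_id not in first_last:
            first_last[generator_id] = [value, value]
        else:
            first_last[generator_id][1] = value   # rows arrive in time order, so this ends as the last
    return {gid: last - first for gid, (first, last) in first_last.items()}


result = daily_difference(READINGS, "2019-05-27")
```

Unlike the max minus min shortcut, this does not assume the values only ever increase during the day.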
The following script returns your expected result for the filtered ID and date. The subqueries are correlated on generator_id, and the aggregates refer only to the outer table (alias A):
SELECT A.generator_id, CAST(A.timestamp AS DATE) AS day,
(
SELECT B.generated_value
FROM sun_electric.generator_meters B
WHERE B.generator_id = A.generator_id
AND B.timestamp = MAX(A.timestamp)
)
-
(
SELECT B.generated_value
FROM sun_electric.generator_meters B
WHERE B.generator_id = A.generator_id
AND B.timestamp = MIN(A.timestamp)
) AS Diff
FROM sun_electric.generator_meters A
WHERE A.generator_id = 1
AND CAST(A.timestamp AS DATE) = '2019-05-27'
GROUP BY A.generator_id, CAST(A.timestamp AS DATE);
If you want the same result grouped by ID and date, just remove the filter, as below:
SELECT A.generator_id, CAST(A.timestamp AS DATE) AS day,
(
SELECT B.generated_value
FROM sun_electric.generator_meters B
WHERE B.generator_id = A.generator_id
AND B.timestamp = MAX(A.timestamp)
)
-
(
SELECT B.generated_value
FROM sun_electric.generator_meters B
WHERE B.generator_id = A.generator_id
AND B.timestamp = MIN(A.timestamp)
) AS Diff
FROM sun_electric.generator_meters A
GROUP BY A.generator_id, CAST(A.timestamp AS DATE);

Calculating a Moving Average MySQL?

Good Day,
I am using the following code to calculate the 9 Day Moving average.
SELECT SUM(close)
FROM tbl
WHERE date <= '2002-07-05'
AND name_id = 2
ORDER BY date DESC
LIMIT 9
But it does not work because it first calculates all of the returned fields before the limit is called. In other words it will calculate all the closes before or equal to that date, and not just the last 9.
So I need to calculate the SUM from the returned select, rather than calculate it straight.
IE. Select the SUM from the SELECT...
Now how would I go about doing this and is it very costly or is there a better way?
If you want the moving average for each date, then try this:
SELECT date, SUM(close),
(select avg(close) from tbl t2 where t2.name_id = t.name_id and t2.date <= t.date and datediff(t.date, t2.date) < 9
) as mvgAvg
FROM tbl t
WHERE date <= '2002-07-05' and
name_id = 2
GROUP BY date
ORDER BY date DESC
It uses a correlated subquery to calculate the average over the 9 calendar days ending on each date (which is 9 values only if there is one row per day).
Starting from MySQL 8, you should use window functions for this. Using the window RANGE clause, you can create a logical window over an interval, which is very powerful. Something like this:
SELECT
date,
close,
AVG (close) OVER (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING)
FROM tbl
WHERE date <= DATE '2002-07-05'
AND name_id = 2
ORDER BY date DESC
For example:
WITH t (date, `close`) AS (
SELECT DATE '2020-01-01', 50 UNION ALL
SELECT DATE '2020-01-03', 54 UNION ALL
SELECT DATE '2020-01-05', 51 UNION ALL
SELECT DATE '2020-01-12', 49 UNION ALL
SELECT DATE '2020-01-13', 59 UNION ALL
SELECT DATE '2020-01-15', 30 UNION ALL
SELECT DATE '2020-01-17', 35 UNION ALL
SELECT DATE '2020-01-18', 39 UNION ALL
SELECT DATE '2020-01-19', 47 UNION ALL
SELECT DATE '2020-01-26', 50
)
SELECT
date,
`close`,
COUNT(*) OVER w AS c,
SUM(`close`) OVER w AS s,
AVG(`close`) OVER w AS a
FROM t
WINDOW w AS (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING)
ORDER BY date DESC
Leading to:
date |close|c|s |a |
----------|-----|-|---|-------|
2020-01-26| 50|1| 50|50.0000|
2020-01-19| 47|2| 97|48.5000|
2020-01-18| 39|3|136|45.3333|
2020-01-17| 35|4|171|42.7500|
2020-01-15| 30|4|151|37.7500|
2020-01-13| 59|5|210|42.0000|
2020-01-12| 49|6|259|43.1667|
2020-01-05| 51|3|159|53.0000|
2020-01-03| 54|3|154|51.3333|
2020-01-01| 50|3|155|51.6667|
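Note what the frame covers here: because the window is ordered by date DESC, "9 DAY PRECEDING" reaches towards later dates, so each row is aggregated with the rows dated from its own date up to 9 days after it. The Python sketch below recomputes the c, s and a columns under that reading, using the same ten rows; `range_window` is a hypothetical name.

```python
from datetime import date, timedelta

# The ten (date, close) rows from the example above.
ROWS = [
    (date(2020, 1, 1), 50), (date(2020, 1, 3), 54), (date(2020, 1, 5), 51),
    (date(2020, 1, 12), 49), (date(2020, 1, 13), 59), (date(2020, 1, 15), 30),
    (date(2020, 1, 17), 35), (date(2020, 1, 18), 39), (date(2020, 1, 19), 47),
    (date(2020, 1, 26), 50),
]


def range_window(rows, days=9):
    """For each row (newest first) return (date, count, sum, average) over [date, date + days]."""
    stats = []
    for current, _ in sorted(rows, reverse=True):
        upper = current + timedelta(days=days)
        window = [close for other, close in rows if current <= other <= upper]
        stats.append((current, len(window), sum(window), sum(window) / len(window)))
    return stats


stats = range_window(ROWS)
```

If you want the average over the 9 days up to and including each date instead, the window would have to look backwards in time, which corresponds to ordering the window by date ascending.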
Use something like
SELECT
sum(close) as sum,
avg(close) as average
FROM (
SELECT
(close)
FROM
tbl
WHERE
date <= '2002-07-05'
AND name_id = 2
ORDER BY
date DESC
LIMIT 9 ) temp
The inner query returns the 9 most recent filtered rows, and the outer query then computes the sum and average over just those rows.
Your original query doesn't work because the SUM is calculated first and the LIMIT clause is applied afterwards, so you get the sum of all the matching rows.
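The same "limit first, aggregate second" order can be sketched in Python. The prices below are made up purely for illustration (12 consecutive days with closes 1 to 12), and `last_n_average` is a hypothetical name.

```python
from datetime import date, timedelta

# Hypothetical data: 12 consecutive days ending 2002-07-05, closes 1..12.
PRICES = [(date(2002, 6, 24) + timedelta(days=i), i + 1) for i in range(12)]


def last_n_average(rows, cutoff, n=9):
    """Take the n most recent rows on or before cutoff, then return (sum, average) of their closes."""
    recent = sorted((r for r in rows if r[0] <= cutoff), reverse=True)[:n]   # ORDER BY date DESC LIMIT n
    closes = [close for _, close in recent]
    if not closes:
        return (0, None)
    return (sum(closes), sum(closes) / len(closes))


total, average = last_n_average(PRICES, date(2002, 7, 5))
```

Here the slice happens before `sum` is called, which is exactly what the derived table achieves in the SQL version.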
Another technique is to create a helper table of integers:
CREATE TABLE `tinyint_asc` (
`value` tinyint(3) unsigned NOT NULL default '0',
PRIMARY KEY (value)
) ;
INSERT INTO `tinyint_asc` VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95),(96),(97),(98),(99),(100),(101),(102),(103),(104),(105),(106),(107),(108),(109),(110),(111),(112),(113),(114),(115),(116),(117),(118),(119),(120),(121),(122),(123),(124),(125),(126),(127),(128),(129),(130),(131),(132),(133),(134),(135),(136),(137),(138),(139),(140),(141),(142),(143),(144),(145),(146),(147),(148),(149),(150),(151),(152),(153),(154),(155),(156),(157),(158),(159),(160),(161),(162),(163),(164),(165),(166),(167),(168),(169),(170),(171),(172),(173),(174),(175),(176),(177),(178),(179),(180),(181),(182),(183),(184),(185),(186),(187),(188),(189),(190),(191),(192),(193),(194),(195),(196),(197),(198),(199),(200),(201),(202),(203),(204),(205),(206),(207),(208),(209),(210),(211),(212),(213),(214),(215),(216),(217),(218),(219),(220),(221),(222),(223),(224),(225),(226),(227),(228),(229),(230),(231),(232),(233),(234),(235),(236),(237),(238),(239),(240),(241),(242),(243),(244),(245),(246),(247),(248),(249),(250),(251),(252),(253),(254),(255);
You can then use it like this:
select
date_add(tbl.date, interval tinyint_asc.value day) as mydate,
count(*),
sum(myvalue) -- myvalue stands for the column you want to aggregate
from tbl
inner join tinyint_asc on tinyint_asc.value <= 30 -- for a 30 day moving average
where date_add(tbl.date, interval tinyint_asc.value day) between '2016-01-01' and current_date()
group by mydate
This query is fast:
select date, name_id,
case @i when name_id then @i:=name_id else (@i:=name_id)
and (@n:=0)
and (@a0:=0) and (@a1:=0) and (@a2:=0) and (@a3:=0) and (@a4:=0) and (@a5:=0) and (@a6:=0) and (@a7:=0) and (@a8:=0)
end as a,
case @n when 9 then @n:=9 else @n:=@n+1 end as n,
@a0:=@a1,@a1:=@a2,@a2:=@a3,@a3:=@a4,@a4:=@a5,@a5:=@a6,@a6:=@a7,@a7:=@a8,@a8:=close,
(@a0+@a1+@a2+@a3+@a4+@a5+@a6+@a7+@a8)/@n as av
from tbl,
(select @i:=0, @n:=0,
@a0:=0, @a1:=0, @a2:=0, @a3:=0, @a4:=0, @a5:=0, @a6:=0, @a7:=0, @a8:=0) a
where name_id=2
order by name_id, date
If you need an average over 50 or 100 values, it's tedious to write, but
worth the effort. The speed is close to the ordered select.