This question is related to a previous question. I want to get differential daily values among the rows of a table that looks like this:
Date | VALUE
--------------------------------
"2011-01-14 11:00" | 2
"2011-01-14 19:30" | 5
"2011-01-15 13:30" | 7
"2011-01-15 23:50" | 6
"2011-01-16 9:30" | 10
"2011-01-16 18:30" | 15
The query below gets the difference between each day's newest value and the previous day's newest value.
SELECT
t1.dt AS date,
t1.value - t2.value AS value
FROM
(SELECT DATE(date) dt, MAX(date), VALUE as value FROM table GROUP BY dt) t1
JOIN
(SELECT DATE(date) dt, MAX(date), VALUE as value FROM table GROUP BY dt) t2
ON t1.dt = t2.dt + INTERVAL 1 DAY
So the result is something like:
Date | VALUE
---------------------------
"2011-01-15 00:00" | -1
"2011-01-16 00:00" | 6
But I also need the accumulated value from the first day. In general, I need the accumulated value for a day if the previous day doesn't exist. Something like this:
Date | VALUE
---------------------------
"2011-01-14 00:00" | 3
"2011-01-15 00:00" | -1
"2011-01-16 00:00" | 6
You would almost certainly be better off processing these differences when iterating over the result set in your language of choice, but the following query should work:
SELECT
`t1`.`date`,
`t1`.`value`,
IFNULL((
SELECT CAST(`t1`.`value` AS SIGNED) - CAST(`t2`.`value` AS SIGNED)
FROM `table` `t2`
WHERE `t2`.`date` < `t1`.`date`
ORDER BY `date` DESC
LIMIT 1
), `t1`.`value`)
FROM `table` `t1`
ORDER BY `date` ASC;
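For reference, on MySQL 8+ the same per-day difference can also be written with window functions. A minimal sketch, assuming the same `table`, `date` and `value` names as above; like the IFNULL fallback, the first day simply reports its own newest value:
SELECT dt AS `date`,
       `value` - COALESCE(LAG(`value`) OVER (ORDER BY dt), 0) AS `value`
FROM (
    -- keep only the newest reading of each day
    SELECT DATE(`date`) AS dt, `value`,
           ROW_NUMBER() OVER (PARTITION BY DATE(`date`) ORDER BY `date` DESC) AS rn
    FROM `table`
) d
WHERE rn = 1
ORDER BY dt;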
Is there an easy way, avoiding the use of cursors, to convert this:
+-------+------+-------+
| Group | From | Until |
+-------+------+-------+
| X | 1 | 3 |
+-------+------+-------+
| X | 2 | 4 |
+-------+------+-------+
| Y | 5 | 7 |
+-------+------+-------+
| X | 8 | 10 |
+-------+------+-------+
| Y | 11 | 12 |
+-------+------+-------+
| Y | 12 | 13 |
+-------+------+-------+
Into this:
+-------+------+-------+
| Group | From | Until |
+-------+------+-------+
| X | 1 | 4 |
+-------+------+-------+
| Y | 5 | 7 |
+-------+------+-------+
| X | 8 | 10 |
+-------+------+-------+
| Y | 11 | 13 |
+-------+------+-------+
So far I've tried to assign an ID to each row and GROUP BY that ID, but I can't get any closer without using cursors.
SELECT `Group`, `From`, `Until`
FROM ( SELECT `Group`, `From`, ROW_NUMBER() OVER (PARTITION BY `Group` ORDER BY `From`) rn
FROM test t1
WHERE NOT EXISTS ( SELECT NULL
FROM test t2
WHERE t1.`From` > t2.`From`
AND t1.`From` <= t2.`Until`
AND t1.`Group` = t2.`Group` ) ) t3
JOIN ( SELECT `Group`, `Until`, ROW_NUMBER() OVER (PARTITION BY `Group` ORDER BY `From`) rn
FROM test t1
WHERE NOT EXISTS ( SELECT NULL
FROM test t2
WHERE t1.`Until` >= t2.`From`
AND t1.`Until` < t2.`Until`
AND t1.`Group` = t2.`Group` ) ) t4 USING (`Group`, rn)
fiddle
It works with any type of overlap (partial overlap, adjacent ranges, full inclusion).
It will not work if From and/or Until is NULL.
Could you add an explanation in English? – ysth
The 1st subquery finds the starts of the merged ranges (see the fiddle, where it is executed separately): it looks for From values in a group that are not in the middle or at the end of any other range in that group (a shared start point is allowed).
The 2nd subquery does the same for the merged ranges' Until values.
Both additionally number the values they find in ascending order.
The outer query simply joins each range start with its end into one row.
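For the sample data above, that works out as follows (derived by hand, not copied from the fiddle):
t3 (range starts):  X: From=1 (rn=1), From=8 (rn=2)     Y: From=5 (rn=1), From=11 (rn=2)
t4 (range ends):    X: Until=4 (rn=1), Until=10 (rn=2)  Y: Until=7 (rn=1), Until=13 (rn=2)
Joined on (Group, rn): (X, 1, 4), (X, 8, 10), (Y, 5, 7), (Y, 11, 13)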
If you are using MySQL version 8+, then you can use row_number() to get the desired result:
Demo
SELECT MIN(`FROM`) START,
MAX(`UNTIL`) END,
`GROUP` FROM (
SELECT A.*,
ROW_NUMBER() OVER(ORDER BY `FROM`) RN_FROM,
ROW_NUMBER() OVER(PARTITION BY `GROUP` ORDER BY `UNTIL`) RN_UNTIL
FROM Table_lag A) X
GROUP BY `GROUP`, (RN_FROM - RN_UNTIL)
ORDER BY START;
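To see why grouping by (GROUP, RN_FROM - RN_UNTIL) works, here are the row numbers for the sample data, worked out by hand:
GROUP  FROM  UNTIL  RN_FROM  RN_UNTIL  RN_FROM - RN_UNTIL
X      1     3      1        1         0
X      2     4      2        2         0
Y      5     7      3        1         2
X      8     10     4        3         1
Y      11    12     5        2         3
Y      12    13     6        3         3
Rows belonging to the same merged interval end up with the same (GROUP, difference) pair, so MIN(FROM) and MAX(UNTIL) per pair yield the merged ranges.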
You can do this with window functions only, using a gaps-and-islands technique.
The idea is to build groups of consecutive records that share the same group and have overlapping ranges, using lag() and a window sum(). You can then aggregate the groups:
select grp, min(c_from) c_from, max(c_until) c_until
from (
select
t.*,
sum(lag_c_until < c_from) over(partition by grp order by c_from) mygrp
from (
select
t.*,
lag(c_until, 1, c_until) over(partition by grp order by c_from) lag_c_until
from mytable t
) t
) t
group by grp, mygrp
The column names you chose conflict with SQL keywords (group, from), so I renamed them to grp, c_from and c_until.
Demo on DB Fiddle - with credits to ysth for creating the fiddle in the first place:
grp | c_from | c_until
:-- | -----: | ------:
X | 1 | 4
Y | 5 | 7
X | 8 | 10
Y | 11 | 13
I would use a recursive CTE for this:
with recursive intervals (`Group`, `From`, `Until`) as (
select distinct t1.Group, t1.From, t1.Until
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.Group=t2.Group
and t1.From between t2.From and t2.Until+1
and (t1.From,t1.Until) <> (t2.From,t2.Until)
)
union all
select t1.Group, t1.From, t2.Until
from intervals t1
join Table_lag t2
on t2.Group=t1.Group
and t2.From between t1.From and t1.Until+1
and t2.Until > t1.Until
)
select `Group`, `From`, max(`Until`) as Until
from intervals
group by `Group`, `From`
order by `From`, `Group`;
The anchor expression (select .. where not exists (...)) finds all the group & from that won't combine with some earlier from (so has one row for each row in our eventual output):
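With the sample data, those anchor rows would be (derived by hand):
Group  From  Until
X      1     3
X      8     10
Y      5     7
Y      11    12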
Then the recursive query adds rows for merged intervals for each of our rows.
Then just group by group and from (those are awful column names) to get the biggest interval for each starting group/from.
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=9efa508504b80e44b73c952572394b76
Alternatively, you can do it with a straightforward set of joins and subqueries, with no CTE or window functions needed:
select
interval_start_range.grp,
interval_start_range.start,
max(merged.finish) finish
from (
select
interval_start.grp,
interval_start.start,
min(later_interval_start.start) next_start
from (
select distinct t1.grp, t1.start, t1.finish
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.grp=t2.grp
and t1.start between t2.start and t2.finish+1
and (t1.start,t1.finish) <> (t2.start,t2.finish)
)
) interval_start
left join (
select distinct t1.grp, t1.start, t1.finish
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.grp=t2.grp
and t1.start between t2.start and t2.finish+1
and (t1.start,t1.finish) <> (t2.start,t2.finish)
)
) later_interval_start
on interval_start.grp=later_interval_start.grp
and interval_start.start < later_interval_start.start
group by interval_start.grp, interval_start.start
) as interval_start_range
join Table_lag merged
on merged.grp=interval_start_range.grp
and merged.start >= interval_start_range.start
and (interval_start_range.next_start is null or merged.start < interval_start_range.next_start)
group by interval_start_range.grp, interval_start_range.start
order by interval_start_range.start, interval_start_range.grp
(I have renamed the columns here to not need backticks.)
Here there is a select that gets all the starts of the intervals we will report, joined to another similar select (you could use a CTE to avoid the redundancy) to find the following start of a reportable interval for the same group (if there is one). That is wrapped in a subquery that returns the group, the start value, and the start value of the following reportable interval. Then it just needs to join all the other records that start within that range and pick the maximum ending value.
https://dbfiddle.uk/?rdbms=mysql_5.5&fiddle=151cc933489c299f7beefa99e1959549
Is it possible to select a range of dates where each date is one row?
What I would like to do is something like this pseudo code:
SELECT date from dual where date >= "2019-01-01" AND date <= "2019-01-05"
And results should be a single column with values:
2019-01-01
2019-01-02
2019-01-03
2019-01-04
2019-01-05
Is this even possible in MySQL?
In MySQL 8.0, you can do this with a recursive query:
with recursive cte as (
select '2019-01-01' dt
union all select dt + interval 1 day from cte where dt < '2019-01-05'
)
select dt from cte order by dt
Demo on DB Fiddle:
| dt |
| :--------- |
| 2019-01-01 |
| 2019-01-02 |
| 2019-01-03 |
| 2019-01-04 |
| 2019-01-05 |
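One caveat with the recursive approach: MySQL 8 stops recursion at cte_max_recursion_depth iterations (1000 by default), so for date ranges longer than roughly 1000 days you would raise that limit first, for example:
SET SESSION cte_max_recursion_depth = 100000;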
In earlier versions, solutions would typically involve a table of numbers that contains sequential integers starting at 0:
create table mynumbers (n int);
insert into mynumbers values (0), (1), (2), (3), (4);
select '2019-01-01' + interval n day dt
from mynumbers n
where '2019-01-01' + interval n day <= '2019-01-05'
Demo
Note: if you need to generate a very large dataset, the "number table" solution is more efficient than the recursive query.
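If you need many more rows than a hand-typed list, one common way to populate such a number table (a sketch; the digits and numbers table names are illustrative) is to cross-join a small digits table with itself:
-- ten digits, typed once
CREATE TABLE digits (d INT);
INSERT INTO digits VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9);

-- cross-join them to get 1000 sequential integers (0..999)
CREATE TABLE numbers (n INT PRIMARY KEY);
INSERT INTO numbers (n)
SELECT d1.d + 10 * d2.d + 100 * d3.d
FROM digits d1
CROSS JOIN digits d2
CROSS JOIN digits d3;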
You can try something like:
SELECT date from dual
where date >= (SELECT min(a.date) from DatesTable a)
AND date <= (SELECT max(b.date) from DatesTable b)
OR
SELECT date from dual
where date BETWEEN (SELECT min(a.date) from DatesTable a)
AND
(SELECT max(b.date) from DatesTable b)
You can also add conditions (using a WHERE clause) in the subqueries to reflect different situations.
Let's say I have a table "values" which contains the fields
id (int)
name (varchar)
value (float)
timestamp (int)
Now I want to calculate the highest, lowest, and first value (based on timestamp) for each name over the entire values table.
Is it possible to achieve this in one single performant query? I stumbled upon the FIRST_VALUE function, but that one doesn't seem to work. I tried the following query, using joins, but also without success.
SELECT
a.name,
b.value as open,
MIN(a.value) as low,
MAX(a.value) as high
FROM values a
LEFT JOIN values b
ON a.name = b.name AND b.id = MIN(a.id)
GROUP BY a.name;
Isn't there some sort of function which would make something similar as this possible?
SELECT
name,
FIRST_VALUE(value) as open,
MIN(value) as low,
MAX(value) as high
FROM values
GROUP BY name
ORDER BY timestamp ASC;
Example data
id name value timestamp
1 USD 3 16540
2 EUR 5 16540
3 GBP 4 16540
4 EUR 2 16600
5 USD 4 16600
6 GBP 5 16600
7 USD 6 16660
8 EUR 7 16660
9 GBP 6 16660
10 USD 5 16720
11 EUR 5 16720
12 GBP 7 16720
13 EUR 8 16780
14 USD 7 16780
15 GBP 8 16780
Example output
name open low high
USD 3 3 7
EUR 5 2 8
GBP 4 4 8
I'm using MySQL-client version: 5.6.39
A tie should not be possible; if it happens, I don't care which value gets picked.
If you are running MySQL 8.0, this can be quite easily solved with window functions:
select name, value open, low, high
from (
select
name,
value,
min(value) over(partition by name) low,
max(value) over(partition by name) high,
row_number() over(partition by name order by timestamp) rn
from mytable
) x
where rn = 1
Demo on DB Fiddle:
| name | open | low | high |
| ---- | ---- | --- | ---- |
| EUR | 5 | 2 | 8 |
| GBP | 4 | 4 | 8 |
| USD | 3 | 3 | 7 |
In earlier versions, you could:
use a correlated subquery to filter on the first record for each name
join the table with an aggregate query that computes the min and max of each name
Query:
select
t.name,
t.value open,
t0.low,
t0.high
from
mytable t
inner join (
select name, min(value) low, max(value) high from mytable group by name
) t0 on t0.name = t.name
where t.timestamp = (
select min(t1.timestamp) from mytable t1 where t1.name = t.name
);
Demo on MySQL 5.6 DB Fiddle: same results as above
This could also be achieved using inline subqueries (which may actually perform better):
select
t.name,
t.value open,
(select min(value) from mytable t1 where t1.name = t.name) low,
(select max(value) from mytable t1 where t1.name = t.name) high
from
mytable t
where timestamp = (
select min(t1.timestamp) from mytable t1 where t1.name = t.name
)
Demo on MySQL 5.6 DB Fiddle
in one single performant query
Do it logically and let the DBMS worry about performance. If that isn't fast enough, check your indexes.
The value associated with the first timestamp requires a join. You can find the first timestamp easily enough. Getting a value from a row associated with a given row: that's what joins are for.
So, we have:
SELECT
v.name,
v.value as open,
v1.low,
v1.high
FROM values as v join (
select name,
min(timestamp) as timestamp,
min(value) as low,
max(value) as high
FROM values
GROUP BY name
) as v1
on v.name = v1.name and v.timestamp = v1.timestamp
This solution seems to have the best performance.
SELECT
name,
CAST(SUBSTRING_INDEX(GROUP_CONCAT(CAST(value AS CHAR) ORDER BY TIMESTAMP ASC), ',', 1) AS DECIMAL(10, 6)) AS open,
MIN(value) AS low,
MAX(value) AS high
FROM mytable
GROUP BY name
ORDER BY name ASC
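A small caveat, in case the groups are large: GROUP_CONCAT output is truncated at group_concat_max_len (1024 bytes by default). Since SUBSTRING_INDEX only reads the first element here, truncation shouldn't change the result, but it does produce a warning; raising the limit for the session avoids that:
SET SESSION group_concat_max_len = 1000000;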
I have a table similar to the following in my database
+----+----+----+---------------------+
| id | a | b | date_created |
+----+----+----+---------------------+
| 1 | 22 | 33 | 2014-07-31 14:38:17 |
| 2 | 11 | 9 | 2014-07-30 14:40:19 |
| 3 | 8 | 4 | 2014-07-29 14:40:34 |
+----+----+----+---------------------+
I'm trying to write a query that subtracts sum(b) from each a. However, the values of b included in sum(b) should be only those that are earlier than (or the same time as) the a they are being subtracted from. In other words, the results returned by the query should be those shown below
22 - (33 + 9 + 4)
11 - (9 + 4)
8 - (4)
Is it possible to calculate this in a single query?
select id, a, a - (select sum(b)
from My_TABLE T2
where T2.date_created <= T1.date_created)
from MY_TABLE T1;
Something like this should work:
select t1.a - ifnull( sum(t2.b), 0)
from myTable t1
left outer join myTable t2 on t2.date_created <= t1.date_created
group by t1.a
Note that the table is joined to itself to access two different sets of information.
Edit:
I think you probably want to group by the date_created like:
select t1.date_created, t1.a - ifnull( sum(t2.b), 0)
from myTable t1
left outer join myTable t2 on t2.date_created <= t1.date_created
group by t1.date_created, t1.a
SELECT x.*
, x.a - SUM(y.b)
FROM my_table x
JOIN my_table y
ON y.date_created <= x.date_created
GROUP
BY x.id;
Another alternative
SQL Fiddle Example
SELECT
id,
a - (@total := @total + b) as Total
FROM
(SELECT *, @total:=0
FROM my_table
ORDER BY date_created asc) AS Base
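For completeness: on MySQL 8+ a window SUM() expresses the same running subtraction without a self-join or user variables (a sketch using the same my_table names as above):
SELECT id,
       a,
       a - SUM(b) OVER (ORDER BY date_created) AS Total
FROM my_table
ORDER BY date_created;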
Suppose a table, tableX, like this:
| date | hours |
| 2014-07-02 | 10 |
| 2014-07-03 | 10 |
| 2014-07-07 | 20 |
| 2014-07-08 | 40 |
The dates are 'workdays' -- that is, no weekends or holidays.
I want to find the increase in hours between consecutive workdays, like this:
| date | hours |
| 2014-07-03 | 0 |
| 2014-07-07 | 10 |
| 2014-07-08 | 20 |
The challenge is dealing with the gaps. If there were no gaps, something like
SELECT t1.date1 AS 'first day', t2.date1 AS 'second day', (t2.hours - t1.hours)
FROM tableX t1
LEFT JOIN tableX t2 ON t2.date1 = DATE_add(t1.date1, INTERVAL 1 DAY)
ORDER BY t2.date1;
would get it done, but that doesn't work in this case as there is a gap between 2014-07-03 and 2014-07-07.
Just use a correlated subquery instead. You have two fields, so you can do this with two correlated subqueries, or a correlated subquery with a join back to the table. Here is the first version:
SELECT t.date1 as `first day`,
(select t2.date1
from tableX t2
where t2.date1 > t.date1
order by t2.date1 asc
limit 1
) as `next day`,
(select t2.hours
from tableX t2
where t2.date1 > t.date1
order by t2.date1 asc
limit 1
) - t.hours
FROM tableX t
ORDER BY t.date1;
Another alternative is to rank the data by date and then subtract the hours of the previous workday's date from the hours of the current workday's date.
SELECT
ranked_t1.date1 date,
ranked_t1.hours - ranked_t2.hours hours
FROM
(
SELECT t.*,
@rownum := @rownum + 1 AS rank
FROM (SELECT * FROM tableX ORDER BY date1) t,
(SELECT @rownum := 0) r
) ranked_t1
INNER JOIN
(
SELECT t.*,
@rownum2 := @rownum2 + 1 AS rank
FROM (SELECT * FROM tableX ORDER BY date1) t,
(SELECT @rownum2 := 0) r
) ranked_t2
ON ranked_t2.rank = ranked_t1.rank - 1;
SQL Fiddle demo
Note:
Obviously an index on tableX.date1 would speed up the query.
Instead of a correlated subquery, a join is used in the above query.
Reference:
Mysql rank function on SO
Unfortunately, MySQL doesn't (yet) have analytic functions which would allow you to access the "previous row" or the "next row" of the data stream. However, you can duplicate it with this:
select h2.LogDate, h2.Hours - h1.Hours as Added_Hours
from Hours h1
left join Hours h2
on h2.LogDate =(
select Min( LogDate )
from Hours
where LogDate > h1.LogDate )
where h2.LogDate is not null;
Check it out here. Note the index on the date field. If that field is not indexed, this query will take forever.
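For completeness, on MySQL 8+ LAG() reads the previous workday's row directly, so calendar gaps don't matter. A sketch against the original tableX; the first workday, which has no predecessor, is filtered out to match the expected output:
SELECT date1 AS `date`, diff AS hours
FROM (
    SELECT date1,
           hours - LAG(hours) OVER (ORDER BY date1) AS diff
    FROM tableX
) t
WHERE diff IS NOT NULL
ORDER BY date1;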