Finding a previous, non-contiguous date using SQL - mysql

Suppose a table, tableX, like this:
| date | hours |
| 2014-07-02 | 10 |
| 2014-07-03 | 10 |
| 2014-07-07 | 20 |
| 2014-07-08 | 40 |
The dates are 'workdays' -- that is, no weekends or holidays.
I want to find the increase in hours between consecutive workdays, like this:
| date | hours |
| 2014-07-03 | 0 |
| 2014-07-07 | 10 |
| 2014-07-08 | 20 |
The challenge is dealing with the gaps. If there were no gaps, something like
SELECT t1.date1 AS 'first day', t2.date1 AS 'second day', (t2.hours - t1.hours)
FROM tableX t1
LEFT JOIN tableX t2 ON t2.date1 = DATE_add(t1.date1, INTERVAL 1 DAY)
ORDER BY t2.date1;
would get it done, but that doesn't work in this case as there is a gap between 2014-07-03 and 2014-07-07.

Just use a correlated subquery instead. You have two fields, so you can do this with two correlated subqueries, or a correlated subquery with a join back to the table. Here is the first version:
SELECT t.date1 as `first day`,
(select t2.date1
from tableX t2
where t2.date1 > t.date1
order by t2.date1 asc
limit 1
) as `next day`,
(select t2.hours
from tableX t2
where t2.date1 > t.date1
order by t2.date1 asc
limit 1
) - t.hours as `increase`
FROM tableX t
ORDER BY t.date1;

Another alternative is to rank the data by date and then subtract the hours of the previous workday's date from the hours of the current workday's date.
SELECT
ranked_t1.date1 date,
ranked_t1.hours - ranked_t2.hours hours
FROM
(
SELECT t.*,
@rownum := @rownum + 1 AS rn -- named rn because RANK is a reserved word in MySQL 8.0+
FROM (SELECT * FROM tableX ORDER BY date1) t,
(SELECT @rownum := 0) r
) ranked_t1
INNER JOIN
(
SELECT t.*,
@rownum2 := @rownum2 + 1 AS rn
FROM (SELECT * FROM tableX ORDER BY date1) t,
(SELECT @rownum2 := 0) r
) ranked_t2
ON ranked_t2.rn = ranked_t1.rn - 1;
SQL Fiddle demo
Note:
Obviously an index on tableX.date1 would speed up the query.
Instead of a correlated subquery, a join is used in the above query.
Reference:
Mysql rank function on SO

Unfortunately, MySQL versions before 8.0 don't have analytic (window) functions, which would allow you to access the "previous row" or the "next row" of the data stream. However, you can emulate that with a self-join:
select h2.LogDate, h2.Hours - h1.Hours as Added_Hours
from Hours h1
left join Hours h2
on h2.LogDate =(
select Min( LogDate )
from Hours
where LogDate > h1.LogDate )
where h2.LogDate is not null;
Check it out here. Note the index on the date field. If that field is not indexed, this query will take forever.
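On MySQL 8.0 and later, the window function LAG() reads the previous row directly, so the gaps between workdays stop mattering. The sketch below is only an illustration under stated assumptions: it runs the query through Python's built-in sqlite3 module (SQLite 3.25+ accepts the same LAG() syntax) so that the example is self-contained, it uses the date1/hours columns from the question, and the function name hour_increases is made up for the example.

```python
import sqlite3

# Sample data from the question (workdays only, with a weekend gap).
SAMPLE_ROWS = [
    ("2014-07-02", 10),
    ("2014-07-03", 10),
    ("2014-07-07", 20),
    ("2014-07-08", 40),
]


def hour_increases(rows):
    """Return (date1, increase over the previous existing row) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tableX (date1 TEXT PRIMARY KEY, hours INTEGER)")
    conn.executemany("INSERT INTO tableX VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT date1, hours - prev_hours AS increase
        FROM (
            SELECT date1, hours,
                   LAG(hours) OVER (ORDER BY date1) AS prev_hours
            FROM tableX
        ) AS d
        WHERE prev_hours IS NOT NULL   -- the first day has no previous row
        ORDER BY date1
        """
    ).fetchall()
    conn.close()
    return result


for row in hour_increases(SAMPLE_ROWS):
    print(row)
```

Because LAG() works on row order rather than on date arithmetic, no join condition has to know how long the gap between two workdays is.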

Related

Group overlapping ranges of data in MySQL

Is there an easy way avoiding the usage of cursors to convert this:
+-------+------+-------+
| Group | From | Until |
+-------+------+-------+
| X | 1 | 3 |
+-------+------+-------+
| X | 2 | 4 |
+-------+------+-------+
| Y | 5 | 7 |
+-------+------+-------+
| X | 8 | 10 |
+-------+------+-------+
| Y | 11 | 12 |
+-------+------+-------+
| Y | 12 | 13 |
+-------+------+-------+
Into this:
+-------+------+-------+
| Group | From | Until |
+-------+------+-------+
| X | 1 | 4 |
+-------+------+-------+
| Y | 5 | 7 |
+-------+------+-------+
| X | 8 | 10 |
+-------+------+-------+
| Y | 11 | 13 |
+-------+------+-------+
So far I've tried to assign an ID to each row and GROUP BY that ID, but I can't get any closer without using cursors.
SELECT `Group`, `From`, `Until`
FROM ( SELECT `Group`, `From`, ROW_NUMBER() OVER (PARTITION BY `Group` ORDER BY `From`) rn
FROM test t1
WHERE NOT EXISTS ( SELECT NULL
FROM test t2
WHERE t1.`From` > t2.`From`
AND t1.`From` <= t2.`Until`
AND t1.`Group` = t2.`Group` ) ) t3
JOIN ( SELECT `Group`, `Until`, ROW_NUMBER() OVER (PARTITION BY `Group` ORDER BY `From`) rn
FROM test t1
WHERE NOT EXISTS ( SELECT NULL
FROM test t2
WHERE t1.`Until` >= t2.`From`
AND t1.`Until` < t2.`Until`
AND t1.`Group` = t2.`Group` ) ) t4 USING (`Group`, rn)
fiddle
This should work for any type of overlap (partially overlapping, adjacent, fully included).
It will not work if From and/or Until is NULL.
Could you add an explanation in English? – ysth
The 1st subquery finds the starts of the merged ranges (see the fiddle, where it is executed separately): it looks for a From value in a group that does not fall in the middle or at the end of any other range (equal start points are allowed).
The 2nd subquery does the same for the Until values of the merged ranges.
Both subqueries also number the values they find in ascending order.
The outer query simply joins each range start with its matching end into one row.
If you are using MySQL 8.0 or later, you can use ROW_NUMBER() to get the desired result:
Demo
SELECT MIN(`FROM`) START,
MAX(`UNTIL`) END,
`GROUP` FROM (
SELECT A.*,
ROW_NUMBER() OVER(ORDER BY `FROM`) RN_FROM,
ROW_NUMBER() OVER(PARTITION BY `GROUP` ORDER BY `UNTIL`) RN_UNTIL
FROM Table_lag A) X
GROUP BY `GROUP`, (RN_FROM - RN_UNTIL)
ORDER BY START;
You can do this with window functions only, using a gaps-and-islands technique.
The idea is to build groups of consecutive records that have the same group and overlapping ranges, using lag() and a window sum(). You can then aggregate the groups:
select grp, min(c_from) c_from, max(c_until) c_until
from (
select
t.*,
sum(lag_c_until < c_from) over(partition by grp order by c_from) mygrp
from (
select
t.*,
lag(c_until, 1, c_until) over(partition by grp order by c_from) lag_c_until
from mytable t
) t
) t
group by grp, mygrp
The column names you chose conflict with SQL keywords (group, from), so I renamed them to grp, c_from and c_until.
Demo on DB Fiddle - with credits to ysth for creating the fiddle in the first place:
grp | c_from | c_until
:-- | -----: | ------:
X | 1 | 4
Y | 5 | 7
X | 8 | 10
Y | 11 | 13
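The gaps-and-islands idea can be tried out end to end with the sketch below. It is a variant, not the query above: instead of lag() it compares each row with the running maximum of all earlier c_until values in the same group, which is an assumption added here so that a range fully contained in an earlier one does not start a new island. It runs on SQLite 3.25+ through Python's sqlite3 module only to stay self-contained; the table and column names follow the renamed ones (mytable, grp, c_from, c_until), and merge_ranges is a made-up helper name.

```python
import sqlite3

# Sample data from the question, with the renamed columns.
SAMPLE_ROWS = [
    ("X", 1, 3), ("X", 2, 4), ("Y", 5, 7),
    ("X", 8, 10), ("Y", 11, 12), ("Y", 12, 13),
]

MERGE_QUERY = """
SELECT grp, MIN(c_from) AS merged_from, MAX(c_until) AS merged_until
FROM (
    SELECT s.*,
           -- running count of island starts = island number within the group
           SUM(CASE WHEN prev_max_until IS NULL OR prev_max_until < c_from
                    THEN 1 ELSE 0 END)
               OVER (PARTITION BY grp ORDER BY c_from, c_until) AS island
    FROM (
        SELECT t.*,
               -- largest c_until among all earlier rows of the same group
               MAX(c_until) OVER (
                   PARTITION BY grp ORDER BY c_from, c_until
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
               ) AS prev_max_until
        FROM mytable t
    ) s
) u
GROUP BY grp, island
ORDER BY merged_from, grp
"""


def merge_ranges(rows):
    """Merge overlapping (grp, c_from, c_until) ranges per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (grp TEXT, c_from INTEGER, c_until INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = conn.execute(MERGE_QUERY).fetchall()
    conn.close()
    return result


for row in merge_ranges(SAMPLE_ROWS):
    print(row)
```

As in the query above, ranges that merely touch without sharing a value (for example 1-3 and 4-5) are kept separate; changing the comparison to prev_max_until + 1 < c_from would merge them too.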
I would use a recursive CTE for this:
with recursive intervals (`Group`, `From`, `Until`) as (
select distinct t1.Group, t1.From, t1.Until
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.Group=t2.Group
and t1.From between t2.From and t2.Until+1
and (t1.From,t1.Until) <> (t2.From,t2.Until)
)
union all
select t1.Group, t1.From, t2.Until
from intervals t1
join Table_lag t2
on t2.Group=t1.Group
and t2.From between t1.From and t1.Until+1
and t2.Until > t1.Until
)
select `Group`, `From`, max(`Until`) as Until
from intervals
group by `Group`, `From`
order by `From`, `Group`;
The anchor expression (select .. where not exists (...)) finds every group & from that won't combine with some earlier from, so it has one row for each row in our eventual output.
Then the recursive query adds rows for merged intervals for each of our rows.
Then just group by group and from (those are awful column names) to get the biggest interval for each starting group/from.
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=9efa508504b80e44b73c952572394b76
Alternatively, you can do it with a straightforward set of joins and subqueries, with no CTE or window functions needed:
select
interval_start_range.grp,
interval_start_range.start,
max(merged.finish) finish
from (
select
interval_start.grp,
interval_start.start,
min(later_interval_start.start) next_start
from (
select distinct t1.grp, t1.start, t1.finish
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.grp=t2.grp
and t1.start between t2.start and t2.finish+1
and (t1.start,t1.finish) <> (t2.start,t2.finish)
)
) interval_start
left join (
select distinct t1.grp, t1.start, t1.finish
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.grp=t2.grp
and t1.start between t2.start and t2.finish+1
and (t1.start,t1.finish) <> (t2.start,t2.finish)
)
) later_interval_start
on interval_start.grp=later_interval_start.grp
and interval_start.start < later_interval_start.start
group by interval_start.grp, interval_start.start
) as interval_start_range
join Table_lag merged
on merged.grp=interval_start_range.grp
and merged.start >= interval_start_range.start
and (interval_start_range.next_start is null or merged.start < interval_start_range.next_start)
group by interval_start_range.grp, interval_start_range.start
order by interval_start_range.start, interval_start_range.grp
(I have renamed the columns here to not need backticks.)
Here there's a select to get all the starts of the reportable intervals, joined to another similar select (you could use a CTE to avoid the redundancy) to find the following start of a reportable interval for the same group (if there is one). That's wrapped in a subquery to get the group, the start value, and the start value of the following reportable interval. Then it just needs to join all the other records that start within that range and pick the maximum ending value.
https://dbfiddle.uk/?rdbms=mysql_5.5&fiddle=151cc933489c299f7beefa99e1959549

Sum until certain value

I've tried some other topics on this but couldn't find answers that actually worked for me.
I have an activities table with some values (in MySQL):
| id| user_id | elevation | distance |
|---|------------|--------------------|----------|
| 1 | 1 | 220 | 5000 |
| 2 | 1 | 300 | 7000 |
| 3 | 2 | 520 | 2000 |
| 4 | 2 | 120 | 3500 |
I need to sum distance and elevation until the distance sum reaches a certain value, per user_id.
Example, sum until 5000 is reached:
User 1 - distance 5000 - elevation 220
User 2 - distance 5500 - elevation 640
I found many solutions but none with GROUP BY. How do I do this in MySQL?
Update: I used that query, but now I have another problem. The join always uses the insert order rather than the datetime field I want.
SELECT
t.*
FROM
(
SELECT
t.*,
(
@d := @d + DISTANCE
) AS running_distance
FROM
(
SELECT
t.*,
c.meta
FROM
inscricao i
INNER JOIN categorias c ON
i.categoria_id = c.id
LEFT JOIN(
select
t.data_inicio,t.usuario_id,t.aplicativo,t.data_fim,t.distance,t.tempo_decorrido,t.ritmo_cardiaco,t.velocidade_media,t.type,t.ganho_de_altimetria
from
corridas t
order by
data_inicio asc
) t ON
t.usuario_id = i.usuario_id
AND t.data_inicio >= i.inicio
AND t.data_fim <= i.fim
WHERE
i.desafio_id = 29
AND(
i.usuario_id = 5354
)
ORDER BY
data_inicio asc
-- usuario_id
) t
join (
SELECT
@u := -1,
@d := 0
) params
ORDER BY
data_inicio asc
) t
WHERE
(
running_distance >= meta * 1000
AND running_distance - DISTANCE < meta * 1000
)
OR(
running_distance <= meta * 1000
)
order by
data_inicio desc
So if an older activity is inserted later, the sum comes out wrong. Does anyone know how to handle this?
You can use variables to get the cumulative sum . . . then some simple filtering logic:
select t.*
from (select t.*,
(@d := if(@u = user_id, @d + distance,
if(@u := user_id, distance, distance)
)
) as running_distance -- pun intended ??
from (select t.*
from t
order by user_id, id
) t cross join
(select @u := -1, @d := 0) params
) t
where running_distance >= 5000 and
running_distance - distance < 5000;
Notes:
The more recent versions of MySQL are finicky about variable assignment and order by. The innermost subquery is not needed in earlier versions of MySQL.
MySQL does not guarantee the order of evaluation of expressions in a select. Hence, all variable assignments are in a single expression.
If distance can be negative, then a user may have more than one row in the result set.
This is not an aggregation query.
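On MySQL 8.0+ the same filter can be written with a window SUM() whose ORDER BY names the column that should drive the running total, which sidesteps the insert-order problem from the update. The sketch below is an illustration only: it runs on SQLite 3.25+ via Python's sqlite3 module to stay self-contained, and the started_at column and the reach_target helper are hypothetical names that do not appear in the question.

```python
import sqlite3

# (id, user_id, elevation, distance, started_at); started_at is a hypothetical
# datetime column standing in for the field the running sum should follow.
SAMPLE_ROWS = [
    (1, 1, 220, 5000, "2020-01-01"),
    (2, 1, 300, 7000, "2020-01-02"),
    (3, 2, 520, 2000, "2020-01-01"),
    (4, 2, 120, 3500, "2020-01-02"),
]

TARGET_QUERY = """
SELECT user_id, running_distance, running_elevation
FROM (
    SELECT user_id, distance,
           SUM(distance)  OVER w AS running_distance,
           SUM(elevation) OVER w AS running_elevation
    FROM activities
    WINDOW w AS (PARTITION BY user_id
                 ORDER BY started_at, id
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
) r
-- keep only the row on which the running total first reaches the target
WHERE running_distance >= :target
  AND running_distance - distance < :target
ORDER BY user_id
"""


def reach_target(rows, target):
    """Per user, return the totals at the row that first reaches the target."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activities (id INTEGER PRIMARY KEY, user_id INTEGER, "
        "elevation INTEGER, distance INTEGER, started_at TEXT)"
    )
    conn.executemany("INSERT INTO activities VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(TARGET_QUERY, {"target": target}).fetchall()
    conn.close()
    return result


for row in reach_target(SAMPLE_ROWS, 5000):
    print(row)
```

Because the ordering lives inside the window definition, an older activity that is inserted later is still summed in its chronological position.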

get totals each day based on a given timestamp

I have a simple table:
user | timestamp
===================
Foo | 1440358805
Bar | 1440558805
BarFoo | 1440559805
FooBar | 1440758805
I would like to get a view with total number of users each day:
date | total
===================
...
2015-08-23 | 1 //Foo
2015-08-24 | 1
2015-08-25 | 1
2015-08-26 | 3 //+Bar +BarFoo
2015-08-27 | 3
2015-08-28 | 4 //+FooBar
...
What I currently have is
SELECT From_unixtime(a.timestamp, '%Y-%m-%d') AS date,
Count(From_unixtime(a.timestamp, '%Y-%m-%d')) AS total
FROM thetable AS a
GROUP BY From_unixtime(a.timestamp, '%Y-%m-%d')
ORDER BY a.timestamp ASC
which counts only the users of a certain day:
date | total
===================
2015-08-23 | 1 //Foo
2015-08-26 | 2 //Bar +BarFoo
2015-08-28 | 1 //FooBar
I've prepared a sqlfiddle
EDIT
The solution by @splash58 returns this result:
date | @t:=coalesce(total, @t)
==================================
2015-08-23 | 1
2015-08-26 | 3
2015-08-28 | 4
2015-08-21 | 4
2015-08-22 | 4
2015-08-24 | 4
2015-08-25 | 4
2015-08-27 | 4
2015-08-29 | 4
2015-08-30 | 4
You can get the cumulative values by using variables:
SELECT date, total, (@cume := @cume + total) as cume_total
FROM (SELECT From_unixtime(a.timestamp, '%Y-%m-%d') as date, Count(*) AS total
FROM thetable AS a
GROUP BY From_unixtime(a.timestamp, '%Y-%m-%d')
) a CROSS JOIN
(SELECT @cume := 0) params
ORDER BY date;
This gives you the dates that are in your data. If you want additional dates (where no users start), then one way is a calendar table:
SELECT c.date, a.total, (@cume := @cume + coalesce(a.total, 0)) as cume_total
FROM Calendar c LEFT JOIN
(SELECT From_unixtime(a.timestamp, '%Y-%m-%d') as date, Count(*) AS total
FROM thetable AS a
GROUP BY From_unixtime(a.timestamp, '%Y-%m-%d')
) a
ON a.date = c.date CROSS JOIN
(SELECT @cume := 0) params
WHERE c.date BETWEEN '2015-08-23' AND '2015-08-28'
ORDER BY c.date;
You can also put the dates explicitly in the query (using a subquery), if you don't have a calendar table.
To preserve the order of the dates, I think we need to wrap the query in one more select:
select date, @n:=@n + ifnull(total,0) total
from
(select Calendar.date, total
from Calendar
left join
(select From_unixtime(timestamp, '%Y-%m-%d') date, count(*) total
from thetable
group by date) t2
on Calendar.date= t2.date
order by date) t3
cross join (select @n:=0) n
Demo on sqlfiddle
You can use the function
TIMESTAMPDIFF(DAY,`timestamp_field`, CURDATE())
You will not have to convert the timestamp to other field types.
drop table if exists thetable;
create table thetable (user text, timestamp int);
insert into thetable values
('Foo', 1440358805),
('Bar', 1440558805),
('BarFoo', 1440559805),
('FooBar', 1440758805);
DROP PROCEDURE IF EXISTS insertTEMP;
DELIMITER //
CREATE PROCEDURE insertTEMP (first date, last date) begin
drop table if exists Calendar;
CREATE TEMPORARY TABLE Calendar (date date);
WHILE first <= last DO
INSERT INTO Calendar Values (first);
SET first = first + interval 1 day;
END WHILE;
END //
DELIMITER ;
call insertTEMP('2015-08-23', '2015-08-28');
select Calendar.date, @t:=coalesce(total, @t)
from Calendar
left join
(select date, max(total) total
from (select From_unixtime(a.timestamp, '%Y-%m-%d') AS date,
@n:=@n+1 AS total
from thetable AS a, (select @n:=0) n
order by a.timestamp ASC) t1
group by date ) t2
on Calendar.date= t2.date,
(select @t:=0) t
result
date, @t:=coalesce(total, @t)
2015-08-23 1
2015-08-24 1
2015-08-25 1
2015-08-26 3
2015-08-27 3
2015-08-28 4
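The calendar-plus-running-total idea can also be expressed without user variables or a stored procedure, using a recursive CTE for the calendar and a window SUM() for the cumulative count. The sketch below is an illustration under stated assumptions: it runs on SQLite 3.25+ through Python's sqlite3 module so it is self-contained, it interprets the timestamps as UTC, and it uses SQLite's date(..., 'unixepoch') and date(day, '+1 day') where MySQL 8.0 would use FROM_UNIXTIME() and day + INTERVAL 1 DAY. The helper name cumulative_users is made up.

```python
import sqlite3

# Sample data from the question.
SAMPLE_ROWS = [
    ("Foo", 1440358805),
    ("Bar", 1440558805),
    ("BarFoo", 1440559805),
    ("FooBar", 1440758805),
]

CUMULATIVE_QUERY = """
WITH RECURSIVE calendar(day) AS (
    SELECT :first
    UNION ALL
    SELECT date(day, '+1 day') FROM calendar WHERE day < :last
),
daily AS (
    -- new users per day (UTC)
    SELECT date(timestamp, 'unixepoch') AS day, COUNT(*) AS total
    FROM thetable
    GROUP BY date(timestamp, 'unixepoch')
)
SELECT c.day,
       SUM(COALESCE(d.total, 0)) OVER (ORDER BY c.day) AS cume_total
FROM calendar c
LEFT JOIN daily d ON d.day = c.day
ORDER BY c.day
"""


def cumulative_users(rows, first_day, last_day):
    """Return (day, cumulative user count) for every day in the range."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE thetable (user TEXT, timestamp INTEGER)")
    conn.executemany("INSERT INTO thetable VALUES (?, ?)", rows)
    result = conn.execute(
        CUMULATIVE_QUERY, {"first": first_day, "last": last_day}
    ).fetchall()
    conn.close()
    return result


for row in cumulative_users(SAMPLE_ROWS, "2015-08-23", "2015-08-28"):
    print(row)
```

Note that the running total only counts users who fall inside the calendar range; users who signed up before first_day would need to be added as a starting offset.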

query that subtracts sum of one column from another

I have a table similar to the following in my database
+----+----+----+---------------------+
| id | a | b | date_created |
+----+----+----+---------------------+
| 1 | 22 | 33 | 2014-07-31 14:38:17 |
| 2 | 11 | 9 | 2014-07-30 14:40:19 |
| 3 | 8 | 4 | 2014-07-29 14:40:34 |
+----+----+----+---------------------+
I'm trying to write a query that subtracts sum(b) from each a. However, the values of b included in sum(b) should be only those that are earlier than (or the same time as) the a they are being subtracted from. In other words, the results returned by the query should be those shown below
22 - (33 + 9 + 4)
11 - (9 + 4)
8 - (4)
Is it possible to calculate this in a single query?
select id, a, a - (select sum(b)
from MY_TABLE T2
where T2.date_created <= T1.date_created)
from MY_TABLE T1;
Something like this should work:
select t1.a - ifnull( sum(t2.b), 0)
from myTable t1
left outer join myTable t2 on t2.date_created <= t1.date_created
group by t1.a
Note that the table is joined to itself to access two different sets of information.
Edit:
I think you probably want to group by the date_created like:
select t1.date_created, t1.a - ifnull( sum(t2.b), 0)
from myTable t1
left outer join myTable t2 on t2.date_created <= t1.date_created
group by t1.date_created, t1.a
SELECT x.*
, x.a - SUM(y.b)
FROM my_table x
JOIN my_table y
ON y.date_created <= x.date_created
GROUP
BY x.id;
Another alternative
SQL Fiddle Example
SELECT
id,
a - (@total := @total + b) as Total
FROM
(SELECT *, @total:=0
FROM my_table
ORDER BY date_created asc) AS Base
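For comparison, on MySQL 8.0+ the running sum of b can be taken with a window SUM() instead of a self-join or a user variable. The sketch below is only an illustration: it runs on SQLite 3.25+ via Python's sqlite3 module to stay self-contained, uses the my_table columns from the question, and the helper name a_minus_running_b is made up. The window's default frame includes rows with an equal date_created, which matches the "earlier than or the same time as" requirement.

```python
import sqlite3

# Sample data from the question: (id, a, b, date_created).
SAMPLE_ROWS = [
    (1, 22, 33, "2014-07-31 14:38:17"),
    (2, 11, 9, "2014-07-30 14:40:19"),
    (3, 8, 4, "2014-07-29 14:40:34"),
]


def a_minus_running_b(rows):
    """Return (id, a, a - sum of b over rows created at or before this one)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER PRIMARY KEY, a INTEGER, "
        "b INTEGER, date_created TEXT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT id, a,
               a - SUM(b) OVER (ORDER BY date_created) AS remaining
        FROM my_table
        ORDER BY date_created DESC, id
        """
    ).fetchall()
    conn.close()
    return result


for row in a_minus_running_b(SAMPLE_ROWS):
    print(row)
```

With the sample data this corresponds to 22 - (33 + 9 + 4), 11 - (9 + 4) and 8 - (4) from the question.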

MySQL query to get accumulative values

This question is related to a previous question. I want to get the daily differences between rows in a table that looks like this:
Date | VALUE
--------------------------------
"2011-01-14 11:00" | 2
"2011-01-14 19:30" | 5
"2011-01-15 13:30" | 7
"2011-01-15 23:50" | 6
"2011-01-16 9:30" | 10
"2011-01-16 18:30" | 15
The query gets the difference/accumulative between the newest daily value and the previous newest daily value.
SELECT
t1.dt AS date,
t1.value - t2.value AS value
FROM
(SELECT DATE(date) dt, MAX(date), VALUE as value FROM table GROUP BY dt) t1
JOIN
(SELECT DATE(date) dt, MAX(date), VALUE as value FROM table GROUP BY dt) t2
ON t1.dt = t2.dt + INTERVAL 1 DAY
So the result is something like:
Date | VALUE
---------------------------
"2011-01-15 00:00" | -1
"2011-01-16 00:00" | 6
But I need the accumulative value from the first day also. In general, I need the accumulative value for a day, if the previous day doesn't exist. Something like this:
Date | VALUE
---------------------------
"2011-01-14 00:00" | 3
"2011-01-15 00:00" | -1
"2011-01-16 00:00" | 6
You would almost certainly be better off processing these differences when iterating over the result set in your language of choice, but the following query should work:
SELECT
`t1`.`date`,
`t1`.`value`,
IFNULL((
SELECT CAST(`t1`.`value` AS SIGNED) - CAST(`t2`.`value` AS SIGNED)
FROM `table` `t2`
WHERE `t2`.`date` < `t1`.`date`
ORDER BY `date` DESC
LIMIT 1
), `t1`.`value`)
FROM `table` `t1`
ORDER BY `date` ASC;