I have a simple table:
user | timestamp
===================
Foo | 1440358805
Bar | 1440558805
BarFoo | 1440559805
FooBar | 1440758805
I would like to get a view with the total number of users for each day:
date | total
===================
...
2015-08-23 | 1 //Foo
2015-08-24 | 1
2015-08-25 | 1
2015-08-26 | 3 //+Bar +BarFoo
2015-08-27 | 3
2015-08-28 | 4 //+FooBar
...
What I currently have is
SELECT From_unixtime(a.timestamp, '%Y-%m-%d') AS date,
Count(From_unixtime(a.timestamp, '%Y-%m-%d')) AS total
FROM thetable AS a
GROUP BY From_unixtime(a.timestamp, '%Y-%m-%d')
ORDER BY a.timestamp ASC
which counts only the users added on each particular day:
date | total
===================
2015-08-23 | 1 //Foo
2015-08-26 | 2 //Bar +BarFoo
2015-08-28 | 1 //FooBar
I've prepared a sqlfiddle
EDIT
The solution by @splash58 returns this result:
date | @t:=coalesce(total, @t)
==================================
2015-08-23 | 1
2015-08-26 | 3
2015-08-28 | 4
2015-08-21 | 4
2015-08-22 | 4
2015-08-24 | 4
2015-08-25 | 4
2015-08-27 | 4
2015-08-29 | 4
2015-08-30 | 4
You can get the cumulative values by using variables:
SELECT date, total, (@cume := @cume + total) as cume_total
FROM (SELECT From_unixtime(a.timestamp, '%Y-%m-%d') as date, Count(*) AS total
FROM thetable AS a
GROUP BY From_unixtime(a.timestamp, '%Y-%m-%d')
) a CROSS JOIN
(SELECT @cume := 0) params
ORDER BY date;
This gives you the dates that are in your data. If you want additional dates (where no users start), then one way is a calendar table:
SELECT c.date, a.total, (@cume := @cume + coalesce(a.total, 0)) as cume_total
FROM Calendar c LEFT JOIN
(SELECT From_unixtime(a.timestamp, '%Y-%m-%d') as date, Count(*) AS total
FROM thetable AS a
GROUP BY From_unixtime(a.timestamp, '%Y-%m-%d')
) a
ON a.date = c.date CROSS JOIN
(SELECT @cume := 0) params
WHERE c.date BETWEEN '2015-08-23' AND '2015-08-28'
ORDER BY c.date;
You can also put the dates explicitly in the query (using a subquery), if you don't have a calendar table.
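For comparison, here is a minimal Python sketch of the same running-total idea done in client code: count the new users per day, then carry the total across every calendar day in the range. The function name `cumulative_users` is made up for illustration, and converting the Unix timestamps in UTC is an assumption (MySQL's FROM_UNIXTIME uses the session time zone instead).

```python
from collections import Counter
from datetime import date, datetime, timedelta, timezone


def cumulative_users(timestamps, first_day, last_day):
    """Return [(day, running_total)] for every day in [first_day, last_day]."""
    # New users per calendar day (UTC is an assumption of this sketch).
    per_day = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).date() for ts in timestamps
    )
    # Users who joined before the range still count toward the total.
    running = sum(n for day, n in per_day.items() if day < first_day)
    result = []
    day = first_day
    while day <= last_day:
        running += per_day.get(day, 0)  # days without new users add 0
        result.append((day, running))
        day += timedelta(days=1)
    return result


# Sample timestamps from the question.
rows = cumulative_users(
    [1440358805, 1440558805, 1440559805, 1440758805],
    date(2015, 8, 23),
    date(2015, 8, 28),
)
for day, total in rows:
    print(day, total)
```

The loop over every day plays the role of the calendar table: days with no new users simply repeat the previous total.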
To preserve the order of the dates, I think we need to wrap the query in one more select:
select date, @n:=@n + ifnull(total,0) total
from
(select Calendar.date, total
from Calendar
left join
(select From_unixtime(timestamp, '%Y-%m-%d') date, count(*) total
from thetable
group by date) t2
on Calendar.date= t2.date
order by date) t3
cross join (select @n:=0) n
Demo on sqlfiddle
You can use the function
TIMESTAMPDIFF(DAY,`timestamp_field`, CURDATE())
You will not have to convert the timestamp to other field types.
drop table if exists thetable;
create table thetable (user text, timestamp int);
insert into thetable values
('Foo', 1440358805),
('Bar', 1440558805),
('BarFoo', 1440559805),
('FooBar', 1440758805);
DROP PROCEDURE IF EXISTS insertTEMP;
DELIMITER //
CREATE PROCEDURE insertTEMP (first date, last date) begin
drop table if exists Calendar;
CREATE TEMPORARY TABLE Calendar (date date);
WHILE first <= last DO
INSERT INTO Calendar Values (first);
SET first = first + interval 1 day;
END WHILE;
END //
DELIMITER ;
call insertTEMP('2015-08-23', '2015-08-28');
select Calendar.date, @t:=coalesce(total, @t)
from Calendar
left join
(select date, max(total) total
from (select From_unixtime(a.timestamp, '%Y-%m-%d') AS date,
@n:=@n+1 AS total
from thetable AS a, (select @n:=0) n
order by a.timestamp ASC) t1
group by date ) t2
on Calendar.date= t2.date,
(select @t:=0) t
Result:
date, @t:=coalesce(total, @t)
2015-08-23 1
2015-08-24 1
2015-08-25 1
2015-08-26 3
2015-08-27 3
2015-08-28 4
Is there an easy way, without using cursors, to convert this:
+-------+------+-------+
| Group | From | Until |
+-------+------+-------+
| X | 1 | 3 |
+-------+------+-------+
| X | 2 | 4 |
+-------+------+-------+
| Y | 5 | 7 |
+-------+------+-------+
| X | 8 | 10 |
+-------+------+-------+
| Y | 11 | 12 |
+-------+------+-------+
| Y | 12 | 13 |
+-------+------+-------+
Into this:
+-------+------+-------+
| Group | From | Until |
+-------+------+-------+
| X | 1 | 4 |
+-------+------+-------+
| Y | 5 | 7 |
+-------+------+-------+
| X | 8 | 10 |
+-------+------+-------+
| Y | 11 | 13 |
+-------+------+-------+
So far I've tried to assign an ID to each row and GROUP BY that ID, but I can't get any closer without using cursors.
SELECT `Group`, `From`, `Until`
FROM ( SELECT `Group`, `From`, ROW_NUMBER() OVER (PARTITION BY `Group` ORDER BY `From`) rn
FROM test t1
WHERE NOT EXISTS ( SELECT NULL
FROM test t2
WHERE t1.`From` > t2.`From`
AND t1.`From` <= t2.`Until`
AND t1.`Group` = t2.`Group` ) ) t3
JOIN ( SELECT `Group`, `Until`, ROW_NUMBER() OVER (PARTITION BY `Group` ORDER BY `From`) rn
FROM test t1
WHERE NOT EXISTS ( SELECT NULL
FROM test t2
WHERE t1.`Until` >= t2.`From`
AND t1.`Until` < t2.`Until`
AND t1.`Group` = t2.`Group` ) ) t4 USING (`Group`, rn)
fiddle
This should work for any kind of overlap (partially overlapping, adjacent, fully included).
It will not work if From and/or Until is NULL.
Could you add an explanation in English? – ysth
The 1st subquery finds the starts of the joined ranges (see the fiddle, where it is executed separately): it looks for a From value in a group that is not in the middle or at the end of any other range (equal start points are allowed).
The 2nd subquery does the same for the Until values of the joined ranges.
Both also number the values they find in ascending order.
The outer query simply joins each range start with its finish into one row.
If you are using MySQL version 8+, you can use row_number to get the desired result:
Demo
SELECT MIN(`FROM`) START,
MAX(`UNTIL`) END,
`GROUP` FROM (
SELECT A.*,
ROW_NUMBER() OVER(ORDER BY `FROM`) RN_FROM,
ROW_NUMBER() OVER(PARTITION BY `GROUP` ORDER BY `UNTIL`) RN_UNTIL
FROM Table_lag A) X
GROUP BY `GROUP`, (RN_FROM - RN_UNTIL)
ORDER BY START;
You can do this with window functions only, using a gaps-and-islands technique.
The idea is to build groups of consecutive records that have the same group and overlapping ranges, using lag() and a window sum(). You can then aggregate the groups:
select grp, min(c_from) c_from, max(c_until) c_until
from (
select
t.*,
sum(lag_c_until < c_from) over(partition by grp order by c_from) mygrp
from (
select
t.*,
lag(c_until, 1, c_until) over(partition by grp order by c_from) lag_c_until
from mytable t
) t
) t
group by grp, mygrp
The column names you chose conflict with SQL keywords (group, from), so I renamed them to grp, c_from and c_until.
Demo on DB Fiddle - with credits to ysth for creating the fiddle in the first place:
grp | c_from | c_until
:-- | -----: | ------:
X | 1 | 4
Y | 5 | 7
X | 8 | 10
Y | 11 | 13
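If it helps to see the gaps-and-islands idea outside SQL, here is a small Python sketch of the same merge. The function name `merge_ranges` is hypothetical, and treating ranges that merely touch (start equal to the previous end) as one island is an assumption of this sketch. Unlike a plain lag(), it keeps a running maximum of the end value, so a range fully contained in an earlier one does not end the island.

```python
from itertools import groupby
from operator import itemgetter


def merge_ranges(rows):
    """Merge overlapping or touching (group, start, end) ranges per group."""
    merged = []
    ordered = sorted(rows, key=itemgetter(0, 1))  # by group, then start
    for grp, items in groupby(ordered, key=itemgetter(0)):
        cur_start = cur_end = None
        for _, start, end in items:
            if cur_start is not None and start <= cur_end:
                cur_end = max(cur_end, end)  # extends the current island
            else:
                if cur_start is not None:
                    merged.append((grp, cur_start, cur_end))
                cur_start, cur_end = start, end  # a gap starts a new island
        merged.append((grp, cur_start, cur_end))
    return sorted(merged, key=itemgetter(1))  # report in order of start


# Sample rows from the question.
sample = [("X", 1, 3), ("X", 2, 4), ("Y", 5, 7),
          ("X", 8, 10), ("Y", 11, 12), ("Y", 12, 13)]
result = merge_ranges(sample)
for row in result:
    print(row)
```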
I would use a recursive CTE for this:
with recursive intervals (`Group`, `From`, `Until`) as (
select distinct t1.Group, t1.From, t1.Until
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.Group=t2.Group
and t1.From between t2.From and t2.Until+1
and (t1.From,t1.Until) <> (t2.From,t2.Until)
)
union all
select t1.Group, t1.From, t2.Until
from intervals t1
join Table_lag t2
on t2.Group=t1.Group
and t2.From between t1.From and t1.Until+1
and t2.Until > t1.Until
)
select `Group`, `From`, max(`Until`) as Until
from intervals
group by `Group`, `From`
order by `From`, `Group`;
The anchor expression (select .. where not exists (...)) finds every group & from that won't combine with some earlier from, so it yields one row for each row in our eventual output.
Then the recursive query adds rows for merged intervals for each of our rows.
Then just group by group and from (those are awful column names) to get the biggest interval for each starting group/from.
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=9efa508504b80e44b73c952572394b76
Alternatively, you can do it with a straightforward set of joins and subqueries, with no CTE or window functions needed:
select
interval_start_range.grp,
interval_start_range.start,
max(merged.finish) finish
from (
select
interval_start.grp,
interval_start.start,
min(later_interval_start.start) next_start
from (
select distinct t1.grp, t1.start, t1.finish
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.grp=t2.grp
and t1.start between t2.start and t2.finish+1
and (t1.start,t1.finish) <> (t2.start,t2.finish)
)
) interval_start
left join (
select distinct t1.grp, t1.start, t1.finish
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.grp=t2.grp
and t1.start between t2.start and t2.finish+1
and (t1.start,t1.finish) <> (t2.start,t2.finish)
)
) later_interval_start
on interval_start.grp=later_interval_start.grp
and interval_start.start < later_interval_start.start
group by interval_start.grp, interval_start.start
) as interval_start_range
join Table_lag merged
on merged.grp=interval_start_range.grp
and merged.start >= interval_start_range.start
and (interval_start_range.next_start is null or merged.start < interval_start_range.next_start)
group by interval_start_range.grp, interval_start_range.start
order by interval_start_range.start, interval_start_range.grp
(I have renamed the columns here to not need backticks.)
Here there's a select to get the starts of all the intervals we will report, joined to another similar select (you could use a CTE to avoid the redundancy) to find the next reportable interval start for the same group (if there is one). That's wrapped in a subquery to get the group, the start value, and the start value of the following reportable interval. Then it just needs to join all the other records that start within that range and pick the maximum ending value.
https://dbfiddle.uk/?rdbms=mysql_5.5&fiddle=151cc933489c299f7beefa99e1959549
I am trying to get the availability of the rooms in my hotel in 1-hour increments. For example, if a room is booked from 9 AM to 10 AM and from 12 AM to 3 PM, I want the 1-hour increments of all other times between available_from and available_to.
I am able to left join on the table and get the room availability, but not the time slots.
Here is my related schema:
Hotel:
Id | name
Reservation:
Id | hotel_id | room_id | start | end | status
Rooms:
Id | hotel_id | name | number | available_from | available_to
Here is the query I have so far:
SELECT r.id, r.name, r.number, r.type, r.rating
FROM rooms r
LEFT OUTER JOIN reservations res ON res.room_id = r.id
AND CURRENT_TIMESTAMP BETWEEN r.available_from AND r.available_to
GROUP BY r.id, r.type
Example:
(This is the array I am trying to get back from database. Ignore the property names):
[{"roomNumber":1,"availableTimes":["2019-01-01 00:00:00","2019-01-01 01:00:00","2019-01-01 02:00:00","2019-01-01 03:00:00","2019-01-01 04:00:00","2019-01-01 05:00:00","2019-01-01 06:00:00","2019-01-01 07:00:00","2019-01-01 08:00:00","2019-01-01 09:00:00","2019-01-01 10:00:00","2019-01-01 11:00:00","2019-01-01 12:00:00","2019-01-01 13:00:00","2019-01-01 14:00:00","2019-01-01 15:00:00","2019-01-01 16:00:00","2019-01-01 17:00:00","2019-01-01 18:00:00","2019-01-01 19:00:00","2019-01-01 20:00:00","2019-01-01 21:00:00","2019-01-01 22:00:00","2019-01-01 23:00:00"]}]
I tried the following:
SELECT free_from, free_until
FROM (
SELECT a.end AS free_from,
(SELECT MIN(c.start)
FROM reservations c
WHERE c.start > a.end) as free_until
FROM reservations a
WHERE NOT EXISTS (
SELECT 1
FROM reservations b
WHERE b.start BETWEEN a.end AND a.end + INTERVAL 1 HOUR
)
AND a.end BETWEEN '2019-01-03 09:00' AND '2019-01-03 21:00'
) as d
ORDER BY free_until-free_from
LIMIT 0,3;
But I get only one row back, and it is incorrect as well. How can I solve this problem?
Sample Data:
Hotel:
1 | Marriott
Reservation:
1 | 1 | 1 | 2019-01-03 15:00:00 | 2019-01-03 17:00:00 | Confirmed
1 | 1 | 1 | 2019-01-03 18:00:00 | 2019-01-03 20:00:00 | Confirmed
Rooms:
1 | 1 | "Single" | 528 | 09:00:00 | 21:00:00
Expected Result
Room Id | Room name | Available Times
1 | "Single" | 2019-01-03 09:00:00, 2019-01-03 10:00:00, 2019-01-03 11:00:00, 2019-01-03 12:00:00, 2019-01-03 13:00:00, 2019-01-03 14:00:00, 2019-01-03 17:00:00, 2019-01-03 20:00:00, 2019-01-03 21:00:00, 2019-01-03 22:00:00, 2019-01-03 23:00:00, 2019-01-03 24:00:00
If you add a Time_Slots table to your database, as shown in this SQL Fiddle:
CREATE TABLE Time_Slots
(`Slot` time);
INSERT INTO Time_Slots
(`Slot`)
VALUES
('00:00:00'),
('01:00:00'),
('02:00:00'),
('03:00:00'),
('04:00:00'),
('05:00:00'),
('06:00:00'),
('07:00:00'),
('08:00:00'),
('09:00:00'),
('10:00:00'),
('11:00:00'),
('12:00:00'),
('13:00:00'),
('14:00:00'),
('15:00:00'),
('16:00:00'),
('17:00:00'),
('18:00:00'),
('19:00:00'),
('20:00:00'),
('21:00:00'),
('22:00:00'),
('23:00:00');
Then the following query will provide room availability for all rooms with reservations:
Query 1:
select r.id
, r.Name
, res_date + interval t.slot hour_second available
from Time_Slots t
join Rooms r
on t.Slot between r.available_from and r.available_to
join (select distinct room_id, date(start) res_date from Reservation) res
on res.room_id = r.id
where (r.id, res_date + interval t.slot hour_second) not in (
select r.room_id
, date(r.start) + interval t.slot hour_second Reserved
from Time_Slots t
join Reservation r
on r.start <= date(r.end) + interval t.slot hour_second
and date(r.start) + interval t.slot hour_second < r.end)
This query works by first selecting the available slots from Time_Slots for each day that has at least one reservation for that room, and then filtering out the reserved time slots.
Results:
| id | Name | available |
|----|--------|----------------------|
| 1 | Single | 2019-01-03T09:00:00Z |
| 1 | Single | 2019-01-03T10:00:00Z |
| 1 | Single | 2019-01-03T11:00:00Z |
| 1 | Single | 2019-01-03T12:00:00Z |
| 1 | Single | 2019-01-03T13:00:00Z |
| 1 | Single | 2019-01-03T14:00:00Z |
| 1 | Single | 2019-01-03T17:00:00Z |
| 1 | Single | 2019-01-03T20:00:00Z |
| 1 | Single | 2019-01-03T21:00:00Z |
In your sample output you indicated that the room was available at 2019-01-03 22:00:00, 2019-01-03 23:00:00, and 2019-01-03 24:00:00; however, those times fall outside the availability window defined in the Rooms table, so my query excluded them.
The first problem you have is your schema setup is poor. You don't have good data normalization. 1) Rename the fields for better clarity. 2) Change these two tables to be like this:
Reservation:
Res_ID | hotel_id | room_id | res_start | res_end | status
Rooms:
Room_ID | hotel_id | room_name | room_number | available_from | available_to
You will need a table that has your time slots defined. You can do it with a CTE and then CROSS JOIN it with your rooms. This is one of the few cases where the CROSS JOIN is useful.
Now do your query like this:
WITH timeslots AS (
SELECT CURRENT_DATE() + INTERVAL 0 HOUR AS time_slot UNION
SELECT CURRENT_DATE() + INTERVAL 1 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 2 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 3 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 4 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 5 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 6 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 7 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 8 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 9 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 10 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 11 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 12 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 13 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 14 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 15 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 16 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 17 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 18 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 19 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 20 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 21 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 22 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 23 HOUR )
SELECT r.room_id, r.room_name, r.room_number, r.type, r.rating,
t.time_slot AS time_slot_open,
t.time_slot + INTERVAL 1 HOUR AS time_slot_close,
res.Res_ID
FROM rooms r
CROSS JOIN timeslots t
LEFT JOIN reservation res ON res.hotel_id = r.hotel_id AND res.room_id = r.room_id
AND t.time_slot >= res.res_start AND t.time_slot < res.res_end
That will get you a list of all your hotel rooms with 24 records each. If there is a reservation in that room, then it will show you the reservation ID for that slot. From here, you can either use the data as is, or you can further put this into its own CTE and just select everything from it where the reservation ID is null. You can also join or look up other data about the reservation based on that ID.
Update
If you run a version of MySQL before 8.0, the WITH clause is not supported (See: How do you use the "WITH" clause in MySQL?). You'll have to make this a subquery like this:
SELECT r.room_id, r.room_name, r.room_number, r.type, r.rating,
t.time_slot AS time_slot_open,
t.time_slot + INTERVAL 1 HOUR AS time_slot_close,
res.Res_ID
FROM rooms r
CROSS JOIN (SELECT CURRENT_DATE() + INTERVAL 0 HOUR AS time_slot UNION
SELECT CURRENT_DATE() + INTERVAL 1 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 2 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 3 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 4 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 5 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 6 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 7 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 8 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 9 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 10 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 11 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 12 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 13 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 14 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 15 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 16 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 17 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 18 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 19 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 20 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 21 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 22 HOUR UNION
SELECT CURRENT_DATE() + INTERVAL 23 HOUR ) t
LEFT JOIN reservation res ON res.hotel_id = r.hotel_id AND res.room_id = r.room_id
AND t.time_slot >= res.res_start AND t.time_slot < res.res_end
Plan A (one row per hour)
Get rid of the T and Z; MySQL does not understand that syntax.
Your motel is in a single timezone, correct? Then using either DATETIME or TIMESTAMP is equivalent.
For a 3-hour reservation, make 3 rows. (It is likely to be messier to work with ranges.)
Alas, you are using MySQL, not MariaDB; the latter has automatic sequence generators. Example: The pseudo-table named seq_0_to_23 acts like a table prepopulated with the numbers 0 through 23.
Finding available times requires having a table with all possible hours for all days, hence the note above.
Either do arithmetic or use LEFT for hours. Since LEFT is simple and straightforward, I will discuss it:
mysql> SELECT NOW(), LEFT(NOW(), 13);
+---------------------+-----------------+
| NOW() | LEFT(NOW(), 13) |
+---------------------+-----------------+
| 2019-01-03 13:43:56 | 2019-01-03 13 |
+---------------------+-----------------+
1 row in set (0.00 sec)
The second column shows a string that could be used for indicating the 1pm hour on that day.
Plan B (ranges)
Another approach uses ranges. However, the processing is complex since all hours are always associated with either a reservation or with "available". The code gets complex, but the performance is good: http://mysql.rjweb.org/doc.php/ipranges
Plan C (bits)
The table involves a date (no time), plus a MEDIUMINT UNSIGNED which happens to be exactly 24 bits. Each bit represents one hour of the day.
Use various boolean operations:
| (OR) the bits together to see what hours are assigned.
0xFFFFFF & ~hours to see what is available.
BIT_COUNT() to count the bits (hours).
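To make the bit operations concrete, here is a rough Python sketch of Plan C. The helper `hours_mask` and the convention that bit 0 stands for the 00:00-01:00 hour are assumptions chosen for illustration; the reservation hours are borrowed from the sample data in the question.

```python
ALL_HOURS = 0xFFFFFF  # 24 bits, one per hour of the day


def hours_mask(start_hour, end_hour):
    """Bitmask with bits start_hour..end_hour-1 set (bit 0 = 00:00-01:00)."""
    mask = 0
    for hour in range(start_hour, end_hour):
        mask |= 1 << hour
    return mask


# Two reservations on the same day: 15:00-17:00 and 18:00-20:00.
reserved = hours_mask(15, 17) | hours_mask(18, 20)  # OR the bits together
available = ALL_HOURS & ~reserved                   # hours that are still free
free_hours = [h for h in range(24) if (available >> h) & 1]

print(bin(reserved).count("1"))  # number of reserved hours, like BIT_COUNT()
print(free_hours)
```

Note that this mask ignores the room's available_from/available_to window; you would AND it with a second mask for the opening hours.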
While it is possible in SQL to identify which hours a room is available, it may be better to do that in your client code. I assume you have a PHP/Java/whatever frontend!
etc.
More?
Would you like to discuss any of these in more detail?
You need to join the rooms table with a table of timeslots (24 rows). This will generate a list of all possible timeslots for a given room. Filtering out not-available time slots is trivial:
SELECT rooms.id, rooms.name, TIMESTAMP(checkdates.checkdate, timeslots.timeslot) AS datetimeslot
FROM rooms
INNER JOIN (
SELECT CAST('00:00' AS TIME) AS timeslot UNION
SELECT CAST('01:00' AS TIME) UNION
SELECT CAST('02:00' AS TIME) UNION
SELECT CAST('03:00' AS TIME) UNION
SELECT CAST('04:00' AS TIME) UNION
SELECT CAST('05:00' AS TIME) UNION
SELECT CAST('06:00' AS TIME) UNION
SELECT CAST('07:00' AS TIME) UNION
SELECT CAST('08:00' AS TIME) UNION
SELECT CAST('09:00' AS TIME) UNION
SELECT CAST('10:00' AS TIME) UNION
SELECT CAST('11:00' AS TIME) UNION
SELECT CAST('12:00' AS TIME) UNION
SELECT CAST('13:00' AS TIME) UNION
SELECT CAST('14:00' AS TIME) UNION
SELECT CAST('15:00' AS TIME) UNION
SELECT CAST('16:00' AS TIME) UNION
SELECT CAST('17:00' AS TIME) UNION
SELECT CAST('18:00' AS TIME) UNION
SELECT CAST('19:00' AS TIME) UNION
SELECT CAST('20:00' AS TIME) UNION
SELECT CAST('21:00' AS TIME) UNION
SELECT CAST('22:00' AS TIME) UNION
SELECT CAST('23:00' AS TIME)
) AS timeslots ON timeslots.timeslot >= rooms.available_from
AND timeslots.timeslot < rooms.available_to
CROSS JOIN (
SELECT CAST('2019-01-03' AS DATE) AS checkdate
) AS checkdates
WHERE NOT EXISTS (
SELECT 1
FROM reservations
WHERE room_id = rooms.id
AND TIMESTAMP(checkdates.checkdate, timeslots.timeslot) >= `start`
AND TIMESTAMP(checkdates.checkdate, timeslots.timeslot) < `end`
)
Demo on DB Fiddle
The above query checks availability for one date (2019-01-03). For multiple dates simply add them to checkdates.
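If you would rather compute the slots in client code, the same filtering can be sketched in Python as follows. The function name `free_slots` is hypothetical; like the query above, it treats the availability window as half-open (a slot must start before available_to) and a reservation as covering start <= slot < end.

```python
from datetime import date, datetime, time, timedelta


def free_slots(day, available_from, available_to, reservations):
    """Hourly slot starts in [available_from, available_to) with no reservation."""
    slot = datetime.combine(day, available_from)
    end = datetime.combine(day, available_to)
    free = []
    while slot < end:
        # A slot is taken if it starts inside any reservation.
        taken = any(start <= slot < stop for start, stop in reservations)
        if not taken:
            free.append(slot)
        slot += timedelta(hours=1)
    return free


# Sample data from the question: room open 09:00-21:00, two reservations.
reservations = [
    (datetime(2019, 1, 3, 15), datetime(2019, 1, 3, 17)),
    (datetime(2019, 1, 3, 18), datetime(2019, 1, 3, 20)),
]
slots = free_slots(date(2019, 1, 3), time(9), time(21), reservations)
for s in slots:
    print(s)
```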
I have a table with dates in it. I want the difference between the first date and the second, the second and the third, and so on up to the (n-1)th and the nth.
How do I write a query for this?
The table is called Random and the column name is date
date
+------------+
| 2009-06-20 |
| 2010-02-12 |
| 2012-03-14 |
| 2013-09-10 |
| 2014-01-01 |
| 2015-04-10 |
| 2015-05-01 |
| 2016-01-01 |
+------------+
You need to get the next date. I would use a correlated subquery:
select t.date,
(select min(t2.date)
from t t2
where t2.date > t.date
) as next_date
from t;
You just need to use datediff() to get the difference in days.
Use ROW_NUMBER for line numbering and then use a subquery to calculate the difference:
SELECT column_date
,DATEDIFF( D , column_date
,(SELECT column_date FROM
(
SELECT column_date , ROW_NUMBER() OVER ( ORDER BY column_date) AS RowMum
FROM table_Random AS tBL_1
) AS tbl_2
WHERE tbl_2.RowMum= tBL_1.RowMum-1
)
) DIFF
FROM
(
SELECT column_date , ROW_NUMBER() OVER ( ORDER BY column_date) AS RowMum
FROM table_Random
) AS tBL_1
I did not notice the mysql tag when I first wrote my answer, so I am updating it now with a link to a MySQL 8.0 fiddle:
https://www.db-fiddle.com/f/myUJYeFrMXmU1piQXAmnv4/0
/* tested against MySQL v8.0 */
WITH T(d) AS (
SELECT '2009-06-20' as d
UNION
SELECT '2010-02-12'
UNION
SELECT '2012-03-14'
UNION
SELECT '2013-09-10'
UNION
SELECT '2014-01-01'
UNION
SELECT '2015-04-10'
UNION
SELECT '2015-05-01'
UNION
SELECT '2016-01-01'
), LAGGED(d, next_d) AS (
SELECT d, LEAD(d) OVER (ORDER BY d ASC) AS next_d
FROM T
)
/* datediff args are in opposite order to SQL server. Also,
only day part is considered */
SELECT l.d, l.next_d, DATEDIFF(l.next_d, l.d) AS n_days
FROM LAGGED AS l
Here is my original answer that targeted SQL Server:
WITH T(d) AS (
SELECT d FROM (
VALUES
('2009-06-20'),
('2010-02-12'),
('2012-03-14'),
('2013-09-10'),
('2014-01-01'),
('2015-04-10'),
('2015-05-01'),
('2016-01-01')
) AS T1(d)
), LAGGED(d, next_d) AS (
SELECT d, LEAD(d) OVER (ORDER BY d ASC) AS next_d
FROM T
)
SELECT l.d, l.next_d, DATEDIFF(DAY, l.d, l.next_d) AS n_days
FROM LAGGED AS l
and produces this output (modulo the fussy hand-editing I have done):
d next_d n_days
2009-06-20 2010-02-12 237
2010-02-12 2012-03-14 761
2012-03-14 2013-09-10 545
2013-09-10 2014-01-01 113
2014-01-01 2015-04-10 464
2015-04-10 2015-05-01 21
2015-05-01 2016-01-01 245
2016-01-01 NULL NULL
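As a cross-check outside the database, the same LEAD-style pairing can be sketched in a few lines of Python. The function name `day_gaps` is made up for this example; it pairs each date with the next one after sorting, so it returns n-1 rows and has no trailing NULL row.

```python
from datetime import date


def day_gaps(dates):
    """Return (d, next_d, n_days) for each pair of consecutive dates."""
    ordered = sorted(dates)
    return [(a, b, (b - a).days) for a, b in zip(ordered, ordered[1:])]


# The dates from the question.
dates = [
    date(2009, 6, 20), date(2010, 2, 12), date(2012, 3, 14), date(2013, 9, 10),
    date(2014, 1, 1), date(2015, 4, 10), date(2015, 5, 1), date(2016, 1, 1),
]
gaps = day_gaps(dates)
for d, next_d, n_days in gaps:
    print(d, next_d, n_days)
```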
If I have a table that includes:
user_id | event_time
How can I calculate the average days between events? To get something like:
days_diff | count
1 | 100
2 | 90
3 | 20
A user may have 1 day between events, but may also have 3 days between two subsequent events. How can I count them in both buckets?
Sample data (note in this case the DAY DIFF is 0/1 but this is just a small subset of data)
user_id | event_time
82770 2015-05-04 02:34:53
1 2015-05-04 08:45:53
82770 2015-05-04 20:38:24
82770 2015-05-04 20:38:24
82770 2015-05-04 20:38:24
1 2015-05-05 09:31:42
82770 2015-05-05 13:33:36
82770 2015-05-05 13:33:53
1 2015-05-06 09:53:59
1 2015-05-06 23:31:18
1 2015-05-06 23:31:35
1 2015-05-07 12:31:41
82770 2015-05-07 16:01:16
Here's a solution without using a temporary table:
select daybetweenevents as days_diff,
count(daybetweenevents) as count
from (select t1.user_id,
t1.event_time,
datediff(day, t1.event_time, min(t2.event_time)) as daybetweenevents
from yourtable t1
inner join yourtable t2
on t1.user_id = t2.user_id
and t1.event_time < t2.event_time
group by t1.user_id, t1.event_time) temp
group by daybetweenevents
Use DATEDIFF and a correlated subquery to get the previous date for the same user.
SELECT user_id, event_time,
DATEDIFF(event_time,
(SELECT MAX(event_time)
FROM yourtable
WHERE user_id = a.user_id
AND event_time < a.event_time)) AS days_diff
FROM yourtable AS a
I went with a temporary table of sorted user events to make the correlation lookup easier and handle users with more than two events. This should get you the output you are asking for.
create table #tempOrderedUserEvents
(
id int identity (1,1),
userid int,
event_time datetime
)
insert into #tempOrderedUserEvents (userid, event_time)
select [user_id], event_time
from YourUserDataTable A
order by [user_id], event_time
select interval, count(*) as [count]
from
(
select A.userid, datediff(day, A.event_time, B.event_time) as interval
from #tempOrderedUserEvents A
JOIN #tempOrderedUserEvents B on A.id+1 = B.id and A.userid = B.userid
) as Intervals
group by interval
drop table #tempOrderedUserEvents
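For reference, here is how the same bucketing could look in client code, as a Python sketch. The name `gap_histogram` is hypothetical, and counting the gap as the difference between calendar dates (which is what a day-level datediff does) is an assumption about what "days between events" should mean.

```python
from collections import Counter, defaultdict
from datetime import datetime


def gap_histogram(events):
    """Count calendar-day gaps between consecutive events of the same user."""
    by_user = defaultdict(list)
    for user_id, event_time in events:
        by_user[user_id].append(event_time)
    histogram = Counter()
    for times in by_user.values():
        times.sort()
        # Every consecutive pair contributes to exactly one bucket.
        for prev, cur in zip(times, times[1:]):
            histogram[(cur.date() - prev.date()).days] += 1
    return histogram


# Sample data from the question.
raw = [
    (82770, "2015-05-04 02:34:53"), (1, "2015-05-04 08:45:53"),
    (82770, "2015-05-04 20:38:24"), (82770, "2015-05-04 20:38:24"),
    (82770, "2015-05-04 20:38:24"), (1, "2015-05-05 09:31:42"),
    (82770, "2015-05-05 13:33:36"), (82770, "2015-05-05 13:33:53"),
    (1, "2015-05-06 09:53:59"), (1, "2015-05-06 23:31:18"),
    (1, "2015-05-06 23:31:35"), (1, "2015-05-07 12:31:41"),
    (82770, "2015-05-07 16:01:16"),
]
events = [(u, datetime.strptime(t, "%Y-%m-%d %H:%M:%S")) for u, t in raw]
hist = gap_histogram(events)
for days_diff in sorted(hist):
    print(days_diff, hist[days_diff])
```

A user with gaps of both 1 and 3 days lands in both buckets, because each pair of consecutive events is counted on its own.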
Suppose a table, tableX, like this:
| date | hours |
| 2014-07-02 | 10 |
| 2014-07-03 | 10 |
| 2014-07-07 | 20 |
| 2014-07-08 | 40 |
The dates are 'workdays' -- that is, no weekends or holidays.
I want to find the increase in hours between consecutive workdays, like this:
| date | hours |
| 2014-07-03 | 0 |
| 2014-07-07 | 10 |
| 2014-07-08 | 20 |
The challenge is dealing with the gaps. If there were no gaps, something like
SELECT t1.date1 AS 'first day', t2.date1 AS 'second day', (t2.hours - t1.hours)
FROM tableX t1
LEFT JOIN tableX t2 ON t2.date1 = DATE_add(t1.date1, INTERVAL 1 DAY)
ORDER BY t2.date1;
would get it done, but that doesn't work in this case as there is a gap between 2014-07-03 and 2014-07-07.
Just use a correlated subquery instead. You have two fields, so you can do this with two correlated subqueries, or a correlated subquery with a join back to the table. Here is the first version:
SELECT t.date1 as `first day`,
(select t2.date1
from tableX t2
where t2.date1 > t.date1
order by t2.date1 asc
limit 1
) as `next day`,
(select t2.hours
from tableX t2
where t2.date1 > t.date1
order by t2.date1 asc
limit 1
) - t.hours
FROM tableX t
ORDER BY t.date1;
Another alternative is to rank the data by date and then subtract the hours of the previous workday's date from the hours of the current workday's date.
SELECT
ranked_t1.date1 date,
ranked_t1.hours - ranked_t2.hours hours
FROM
(
SELECT t.*,
@rownum := @rownum + 1 AS rank
FROM (SELECT * FROM tableX ORDER BY date1) t,
(SELECT @rownum := 0) r
) ranked_t1
INNER JOIN
(
SELECT t.*,
@rownum2 := @rownum2 + 1 AS rank
FROM (SELECT * FROM tableX ORDER BY date1) t,
(SELECT @rownum2 := 0) r
) ranked_t2
ON ranked_t2.rank = ranked_t1.rank - 1;
SQL Fiddle demo
Note:
Obviously an index on tableX.date1 would speed up the query.
Instead of a correlated subquery, a join is used in the above query.
Reference:
Mysql rank function on SO
Unfortunately, MySQL doesn't (yet) have analytic functions which would allow you to access the "previous row" or the "next row" of the data stream. However, you can duplicate it with this:
select h2.LogDate, h2.Hours - h1.Hours as Added_Hours
from Hours h1
left join Hours h2
on h2.LogDate =(
select Min( LogDate )
from Hours
where LogDate > h1.LogDate )
where h2.LogDate is not null;
Check it out here. Note the index on the date field. If that field is not indexed, this query will take forever.
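If the rows are already in your application, the "previous workday" lookup is just a pairing of neighbours after sorting. Here is a minimal Python sketch; the function name `hour_increases` is made up, and the rows are the sample data from the question.

```python
from datetime import date


def hour_increases(rows):
    """Return (date, hours minus hours of the previous listed date)."""
    ordered = sorted(rows)  # sorts by date; gaps in the calendar do not matter
    return [
        (cur_day, cur_hours - prev_hours)
        for (_, prev_hours), (cur_day, cur_hours) in zip(ordered, ordered[1:])
    ]


table_x = [
    (date(2014, 7, 2), 10),
    (date(2014, 7, 3), 10),
    (date(2014, 7, 7), 20),
    (date(2014, 7, 8), 40),
]
increases = hour_increases(table_x)
for day, added in increases:
    print(day, added)
```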