MySQL time difference in consecutive rows with hierarchical data - mysql

How does one query the time difference between consecutive rows with hierarchical data? For example, I'd like to go from the following table:
+-------+----------+---------------------+
| group_id | event | event_time |
+-------+----------+---------------------+
| 1 | alarm | 2016-12-01 17:53:12 |
| 1 | alarm | 2016-12-01 17:59:43 |
| 2 | purchase | 2016-11-29 09:49:47 |
| 2 | purchase | 2016-11-29 09:53:51 |
| 2 | purchase | 2016-11-29 09:57:59 |
| 2 | alarm | 2016-11-29 10:01:02 |
| 2 | alarm | 2016-11-29 10:13:27 |
| 2 | purchase | 2016-11-29 10:15:00 |
| 2 | purchase | 2016-11-29 10:16:24 |
+-------+----------+---------------------+
to:
+-------+----------+---------------------+------------+
| group_id | event | event_time | time_delta |
+-------+----------+---------------------+------------+
| 1 | alarm | 2016-12-01 17:53:12 | 0 |
| 1 | alarm | 2016-12-01 17:59:43 | 00:06:31 |
| 2 | purchase | 2016-11-29 09:49:47 | 0 |
| 2 | purchase | 2016-11-29 09:53:51 | 00:04:04 |
| 2 | purchase | 2016-11-29 09:57:59 | 00:04:08 |
| 2 | alarm | 2016-11-29 10:01:02 | 0 |
| 2 | alarm | 2016-11-29 10:13:27 | 00:12:25 |
| 2 | purchase | 2016-11-29 10:15:00 | 0 |
| 2 | purchase | 2016-11-29 10:16:24 | 00:01:24 |
+-------+----------+---------------------+------------+
The data above is illustrative; my data actually has many groups and many events. So basically, I'd like to calculate the time difference whenever the group_id and the event are the same in consecutive rows.

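You can get the previous time for a given group and event with a correlated subquery. It picks the most recent earlier row with the same group_id and event, even if rows of another event lie in between:
select t.*,
       (select t2.event_time
        from t t2
        where t2.group_id = t.group_id and
              t2.event = t.event and
              t2.event_time < t.event_time
        order by t2.event_time desc
        limit 1
       ) as prev_event_time
from t;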
You can then get the time difference in a variety of ways, such as:
select t.*, timediff(event_time, prev_event_time) as time_delta
from (select t.*,
             (select t2.event_time
              from t t2
              where t2.group_id = t.group_id and
                    t2.event = t.event and
                    t2.event_time < t.event_time
              order by t2.event_time desc
              limit 1
             ) as prev_event_time
      from t
     ) t;
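To see what the correlated subquery returns on the sample data, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. SQLite stands in for MySQL purely for illustration, and the helper names (seconds_since, previous_same_event_deltas) as well as the choice to report plain seconds instead of TIMEDIFF output are assumptions of the sketch, not part of the answer.

```python
import sqlite3
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Sample rows from the question: (group_id, event, event_time).
SAMPLE_ROWS = [
    (1, "alarm", "2016-12-01 17:53:12"),
    (1, "alarm", "2016-12-01 17:59:43"),
    (2, "purchase", "2016-11-29 09:49:47"),
    (2, "purchase", "2016-11-29 09:53:51"),
    (2, "purchase", "2016-11-29 09:57:59"),
    (2, "alarm", "2016-11-29 10:01:02"),
    (2, "alarm", "2016-11-29 10:13:27"),
    (2, "purchase", "2016-11-29 10:15:00"),
    (2, "purchase", "2016-11-29 10:16:24"),
]

# Same query shape as the answer; SQLite is used here instead of MySQL.
PREV_TIME_QUERY = """
select t.group_id, t.event, t.event_time,
       (select t2.event_time
        from t t2
        where t2.group_id = t.group_id
          and t2.event = t.event
          and t2.event_time < t.event_time
        order by t2.event_time desc
        limit 1) as prev_event_time
from t
"""


def seconds_since(current, previous):
    """Seconds between two timestamps, or None when there is no previous row."""
    if previous is None:
        return None
    delta = datetime.strptime(current, FMT) - datetime.strptime(previous, FMT)
    return int(delta.total_seconds())


def previous_same_event_deltas(rows):
    """Map each event_time to the seconds elapsed since the previous
    row with the same group_id and event."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (group_id integer, event text, event_time text)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = {
        event_time: seconds_since(event_time, prev)
        for _, _, event_time, prev in conn.execute(PREV_TIME_QUERY)
    }
    conn.close()
    return result


deltas = previous_same_event_deltas(SAMPLE_ROWS)
print(deltas["2016-11-29 10:15:00"])  # 1021
```

For the purchase at 10:15:00 this reports the gap back to the purchase at 09:57:59 (17 minutes and 1 second) rather than 0, because the two alarm rows in between are skipped. If only directly adjacent rows should count, as in the desired output of the question, a row-by-row comparison is needed instead.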

Try this using user-defined variables:
SELECT
    group_id, event, event_time, diff AS time_delta
FROM
    (SELECT
        t1.*,
        CASE
            WHEN @event = event AND @group = group_id THEN TIME_FORMAT(TIMEDIFF(event_time, @et), '%H:%i:%s')
            ELSE 0
        END diff,
        @event := event,
        @group := group_id,
        @et := event_time
    FROM
        (SELECT
            *
        FROM
            your_table
        ORDER BY group_id, event_time) t1
    CROSS JOIN (SELECT @event := '', @group := -1, @et := '') t2) t;
The @et variable stores the previous row's event_time; the difference is only computed when the previous row has the same group_id and event.
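The row-by-row comparison that the variable-based query performs can be sketched in plain Python, which is handy for checking the expected output. This is only an illustration: the function name consecutive_deltas and the use of seconds instead of a formatted time are assumptions of the sketch.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Sample rows from the question: (group_id, event, event_time).
EVENT_ROWS = [
    (1, "alarm", "2016-12-01 17:53:12"),
    (1, "alarm", "2016-12-01 17:59:43"),
    (2, "purchase", "2016-11-29 09:49:47"),
    (2, "purchase", "2016-11-29 09:53:51"),
    (2, "purchase", "2016-11-29 09:57:59"),
    (2, "alarm", "2016-11-29 10:01:02"),
    (2, "alarm", "2016-11-29 10:13:27"),
    (2, "purchase", "2016-11-29 10:15:00"),
    (2, "purchase", "2016-11-29 10:16:24"),
]


def consecutive_deltas(rows):
    """Return (group_id, event, event_time, seconds) per row, where seconds
    is the gap to the directly preceding row if that row has the same
    group_id and event, and 0 otherwise."""
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))  # group_id, event_time
    result = []
    prev = None  # plays the role of the @group, @event and @et variables
    for group_id, event, ts in ordered:
        if prev is not None and prev[0] == group_id and prev[1] == event:
            gap = datetime.strptime(ts, FMT) - datetime.strptime(prev[2], FMT)
            seconds = int(gap.total_seconds())
        else:
            seconds = 0
        result.append((group_id, event, ts, seconds))
        prev = (group_id, event, ts)
    return result


for row in consecutive_deltas(EVENT_ROWS):
    print(row)
```

On the sample data the seconds column comes out as 0, 391, 0, 244, 248, 0, 745, 0, 84, which corresponds to the time_delta column requested in the question.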

Related

How to get data status wise after grouping?

I want to get count of different statuses for bookings for each event and some other data. Each row should represent an event.
So I have an events table and a bookings table.
Events table has id, name, max_allowed
Bookings table has id,event_id,status
Status can be booked, canceled, waitlisted.
I want to get data for all events with the count for each status.
So I need these columns -
event_id
booked_count
canceled_count
waitlisted_count
remaining_slots - (max_allowed - booked_count)
occupancy_rate - booked_count/max_allowed
Sample data:
Events
| id | name | max_allowed |
|---- |--------- |------------- |
| 1 | Yoga | 5 |
| 2 | Boxing | 2 |
| 3 | Pilates | 5 |
Bookings
| id | event_id | status |
|---- |---------- |------------ |
| 1 | 1 | booked |
| 2 | 1 | booked |
| 3 | 2 | booked |
| 4 | 2 | canceled |
| 5 | 2 | booked |
| 6 | 2 | waitlisted |
| 7 | 3 | booked |
| 8 | 3 | booked |
| 9 | 3 | booked |
Output:
| event_id | booked_count | canceled_count | waitlisted_count | remaining_slots | occupancy_rate |
|---------- |-------------- |---------------- |------------------ |----------------- |---------------- |
| 1 | 2 | 0 | 0 | 3 | 0.4 |
| 2 | 2 | 1 | 1 | 0 | 1 |
| 3 | 3 | 0 | 0 | 2 | 0.6 |
Use conditional aggregation. The left join keeps events that have no bookings, so the sums are wrapped in coalesce to report 0 instead of NULL:
select t.*,
       greatest(0, t.max_allowed - t.booked_count) remaining_slots,
       t.booked_count / t.max_allowed occupancy_rate
from (
    select e.id, e.name, e.max_allowed,
           coalesce(sum(b.status = 'booked'), 0) booked_count,
           coalesce(sum(b.status = 'canceled'), 0) canceled_count,
           coalesce(sum(b.status = 'waitlisted'), 0) waitlisted_count
    from Events e left join Bookings b
        on b.event_id = e.id
    group by e.id, e.name, e.max_allowed
) t
Try the query below. It uses a common table expression, which requires MySQL 8.0 or later, and the inner join leaves out events that have no bookings:
with src_data as
(select
    event_id,
    sum(case when status='booked' then 1 else 0 end) as booked_count,
    sum(case when status='canceled' then 1 else 0 end) as canceled_count,
    sum(case when status='waitlisted' then 1 else 0 end) as waitlisted_count
 from bookings group by event_id
)
select
    s.event_id,
    s.booked_count,
    s.canceled_count,
    s.waitlisted_count,
    e.max_allowed - s.booked_count as remaining_slots,
    s.booked_count / e.max_allowed as occupancy_rate
from events e inner join src_data s
    on e.id = s.event_id;
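As a quick way to try conditional aggregation on the sample data, the sketch below runs a query of the same shape against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, and the column formulas follow the definitions given in the question (remaining_slots = max_allowed - booked_count, occupancy_rate = booked_count / max_allowed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table events (id integer primary key, name text, max_allowed integer);
create table bookings (id integer primary key, event_id integer, status text);
insert into events values (1, 'Yoga', 5), (2, 'Boxing', 2), (3, 'Pilates', 5);
insert into bookings values
  (1, 1, 'booked'), (2, 1, 'booked'), (3, 2, 'booked'),
  (4, 2, 'canceled'), (5, 2, 'booked'), (6, 2, 'waitlisted'),
  (7, 3, 'booked'), (8, 3, 'booked'), (9, 3, 'booked');
""")

# A comparison such as status = 'booked' evaluates to 1 or 0, so summing
# it counts the matching rows. The 1.0 factor is needed in SQLite, where
# dividing two integers truncates; MySQL's / already returns a decimal.
STATUS_QUERY = """
select e.id as event_id,
       coalesce(sum(b.status = 'booked'), 0) as booked_count,
       coalesce(sum(b.status = 'canceled'), 0) as canceled_count,
       coalesce(sum(b.status = 'waitlisted'), 0) as waitlisted_count,
       e.max_allowed - coalesce(sum(b.status = 'booked'), 0) as remaining_slots,
       1.0 * coalesce(sum(b.status = 'booked'), 0) / e.max_allowed as occupancy_rate
from events e
left join bookings b on b.event_id = e.id
group by e.id, e.max_allowed
order by e.id
"""

status_rows = conn.execute(STATUS_QUERY).fetchall()
conn.close()

for row in status_rows:
    print(row)
```

The three rows printed match the output table in the question: (1, 2, 0, 0, 3, 0.4), (2, 2, 1, 1, 0, 1.0) and (3, 3, 0, 0, 2, 0.6).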

Row counter per Column

Say I have a table like so
| id | user_id | event_id | created_at |
|----|---------|----------|------------|
| 1 | 5 | 10 | 2015-01-01 |
| 2 | 6 | 7 | 2015-01-02 |
| 3 | 3 | 8 | 2015-01-01 |
| 4 | 5 | 9 | 2015-01-04 |
| 5 | 5 | 10 | 2015-01-02 |
| 6 | 6 | 1 | 2015-01-01 |
I want to be able to generate a counter of events per user. So my result would be:
| counter | user_id | event_id | created_at |
|---------|---------|----------|------------|
| 1 | 5 | 10 | 2015-01-01 |
| 1 | 6 | 7 | 2015-01-02 |
| 1 | 3 | 8 | 2015-01-01 |
| 2 | 5 | 9 | 2015-01-04 |
| 3 | 5 | 10 | 2015-01-02 |
| 2 | 6 | 1 | 2015-01-01 |
One idea is to self-join the table and group by, which replicates the row_number() over (...) window function available in other RDBMSs. Each row is joined to every row of the same user whose id is less than or equal to its own, so the number of matches is that row's counter:
select t1.user_id,
t1.event_id,
t1.created_at,
count(*) as counter
from your_table t1
inner join your_table t2
on t1.user_id=t2.user_id
and t1.id>=t2.id
group by t1.user_id,
t1.event_id,
t1.created_at
order by t1.user_id,t1.event_id;
Output:
+---------+----------+------------+---------+
| user_id | event_id | created_at | counter |
+---------+----------+------------+---------+
| 3 | 8 | 01-01-2015 | 1 |
| 5 | 10 | 01-01-2015 | 1 |
| 5 | 10 | 02-01-2015 | 3 |
| 5 | 9 | 04-01-2015 | 2 |
| 6 | 1 | 01-01-2015 | 2 |
| 6 | 7 | 02-01-2015 | 1 |
+---------+----------+------------+---------+
Try the following:
select counter,
xx.user_id,
xx.event_id,
xx.created_at
from xx
join (select a.id,
a.user_id,
count(*) as counter
from xx as a
join xx as b
on a.user_id=b.user_id
and b.id<=a.id
group by 1,2) as counts
on xx.id=counts.id
The join pairs each row with every row of the same user that has a lower or equal id; counting those pairs gives the counter.
Try this one. A correlated subquery will give the result:
select (select count(*)
        from user_event iue
        where iue.user_id = oue.user_id
          and iue.id <= oue.id) as counter,
       oue.user_id,
       oue.event_id,
       oue.created_at
from user_event oue
You could also initialise user variables in a derived table, cross join it with the source table, and reset the counter whenever user_id changes.
SELECT @counter := CASE
                       WHEN @user = user_id THEN @counter + 1
                       ELSE 1
                   END AS counter,
       @user := user_id AS user_id,
       event_id,
       created_at
FROM your_table m,
     (SELECT @counter := 0,
             @user := '') AS t
ORDER BY user_id;
I've created a demo here
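For reference, the per-user counter that all of these queries aim for can be written as a short Python sketch. It assumes rows are numbered in id order, as in the expected result of the question; the function name add_counter is made up for the illustration.

```python
from collections import defaultdict

# Sample rows from the question: (id, user_id, event_id, created_at).
USER_EVENTS = [
    (1, 5, 10, "2015-01-01"),
    (2, 6, 7, "2015-01-02"),
    (3, 3, 8, "2015-01-01"),
    (4, 5, 9, "2015-01-04"),
    (5, 5, 10, "2015-01-02"),
    (6, 6, 1, "2015-01-01"),
]


def add_counter(rows):
    """Return (counter, user_id, event_id, created_at) in id order, where
    counter is the running number of rows seen so far for that user."""
    seen = defaultdict(int)  # user_id -> rows counted so far
    result = []
    for _row_id, user_id, event_id, created_at in sorted(rows):
        seen[user_id] += 1
        result.append((seen[user_id], user_id, event_id, created_at))
    return result


for row in add_counter(USER_EVENTS):
    print(row)
```

The counter column comes out as 1, 1, 1, 2, 3, 2, matching the table in the question.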

Get last balance sign change in (My)SQL

I have a Transaction table that records every amount added to or subtracted from the balance of a Customer, with the new balance:
+----+------------+------------+--------+---------+
| id | customerId | timestamp | amount | balance |
+----+------------+------------+--------+---------+
| 1 | 1 | 1000000001 | 10 | 10 |
| 2 | 1 | 1000000002 | -20 | -10 |
| 3 | 1 | 1000000003 | -10 | -20 |
| 4 | 2 | 1000000004 | -5 | -5 |
| 5 | 2 | 1000000005 | -5 | -10 |
| 6 | 2 | 1000000006 | 10 | 0 |
| 7 | 3 | 1000000007 | -5 | -5 |
| 8 | 3 | 1000000008 | 10 | 5 |
| 9 | 3 | 1000000009 | 10 | 15 |
| 10 | 4 | 1000000010 | 5 | 5 |
+----+------------+------------+--------+---------+
The Customer table stores the current balance, and looks like:
+----+---------+
| id | balance |
+----+---------+
| 1 | -20 |
| 2 | 0 |
| 3 | 15 |
| 4 | 5 |
+----+---------+
I would like to add a balanceSignSince column, that would store the timestamp at which the balance sign last changed. Transitioning to and from positive, negative, or zero counts as a balance change.
After the update, based on the above data, the Customer table should contain:
+----+---------+------------------+
| id | balance | balanceSignSince |
+----+---------+------------------+
| 1 | -20 | 1000000002 |
| 2 | 0 | 1000000006 |
| 3 | 15 | 1000000008 |
| 4 | 5 | 1000000010 |
+----+---------+------------------+
How can I write a SQL query that updates every Customer with the last time the balance sign changed, based on the Transaction table?
I suspect I can't do this without a quite complex stored procedure, but am curious to see if any clever ideas come up.
This uses a simulated rank() function.
select customerId, min(tstamp) from
(
    select tstamp,
           if (@cust = customerId and sign(@bal) = sign(balance), @rn := @rn,
               if (@cust = customerId and sign(@bal) <> sign(balance), @rn := @rn + 1, @rn := 0)) as rn,
           @cust := customerId as customerId, @bal := balance as balance
    from
        (select @rn := 0) x,
        (select id, @cust := customerId as customerId, tstamp, amount, @bal := balance as balance
         from trans order by customerId, tstamp desc) y
) z
where rn = 0
group by customerId;
Check it: http://rextester.com/XJVKK61181
This script returns a table like this:
+------------+----+------------+---------+
| tstamp | rn | customerId | balance |
+------------+----+------------+---------+
| 1000000003 | 0 | 1 | -20 |
| 1000000002 | 0 | 1 | -10 |
| 1000000001 | 1 | 1 | 10 |
| 1000000006 | 0 | 2 | 0 |
| 1000000005 | 2 | 2 | -10 |
| 1000000004 | 2 | 2 | -5 |
| 1000000009 | 0 | 3 | 15 |
| 1000000008 | 2 | 3 | 5 |
| 1000000007 | 3 | 3 | -5 |
| 1000000010 | 0 | 4 | 5 |
+------------+----+------------+---------+
Then selecting min(tstamp) of the rows where rn = 0:
+------------+-------------+
| customerId | min(tstamp) |
+------------+-------------+
| 1 | 1000000002 |
+------------+-------------+
| 2 | 1000000006 |
+------------+-------------+
| 3 | 1000000009 |
+------------+-------------+
| 4 | 1000000010 |
+------------+-------------+
Note that for customer 3 this yields 1000000009, whereas the question expects 1000000008, since the balance has been positive from that transaction onwards.
Updated answer with the restriction that this needs to work on the existing data
The following query should work for most cases, but there is still an issue with customers that have only a single transaction or no sign change. As this is a one-time update, I would run the query below and then do a simple update for all customers that still have no timestamp set; for them it is going to be the timestamp of the first transaction:
# Find the smallest timestamp, i.e. the
# transaction which changed the sign.
SELECT
p.customerId as customerId,
MIN(t.timestamp) as balanceSignSince
FROM
transaction as t,
(
# find the latest timestamp having
# a different sign for each user.
# Here is the issue with users having
# only a single transaction or no sign
# changes.
SELECT
u.customerId as customerId,
MAX(t.timestamp) as balanceSignSince
FROM
transaction as t,
customer as c,
(
# find the timestamp of the very last
# transaction for every user.
SELECT
t.customerId as customerId,
MAX(t.timestamp) as lastTransaction
FROM
transaction as t
GROUP BY
t.customerId
) as u
WHERE
u.customerId = c.id
AND u.customerId = t.customerId
AND SIGN(c.balance) <> SIGN(t.balance)
GROUP BY
u.customerId
) as p
WHERE
p.customerId = t.customerId
AND p.balanceSignSince < t.timestamp
GROUP BY
p.customerId;
Fiddle: http://sqlfiddle.com/#!9/bd0760/13
Original Answer
This should work to get the timestamp of a sign change:
SELECT
    c.id as id,
    MAX(t.timestamp) as balanceSignSince
FROM
    transaction as t,
    customer as c
WHERE
    t.customerId = c.id
    AND SIGN(t.balance) <> SIGN(c.balance)
GROUP BY
    c.id
This needs to be executed before the customer table is updated with the new balance. If you have an insert trigger on the transaction table, you should probably put the above into the query that updates the customer table.
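The rule stated in the question (record the timestamp whenever the sign of the balance differs from the sign after the previous transaction) can be checked with a small Python sketch. It is not a replacement for the SQL update, just a reference implementation under the assumption that a customer's first transaction always counts as a sign change; the names sign and balance_sign_since are hypothetical.

```python
# Sample rows from the question: (customerId, timestamp, balance).
TRANSACTIONS = [
    (1, 1000000001, 10),
    (1, 1000000002, -10),
    (1, 1000000003, -20),
    (2, 1000000004, -5),
    (2, 1000000005, -10),
    (2, 1000000006, 0),
    (3, 1000000007, -5),
    (3, 1000000008, 5),
    (3, 1000000009, 15),
    (4, 1000000010, 5),
]


def sign(value):
    """Return -1, 0 or 1, like SQL's SIGN()."""
    return (value > 0) - (value < 0)


def balance_sign_since(transactions):
    """Map each customer to the timestamp at which the sign of the
    balance last changed."""
    since = {}      # customerId -> timestamp of the last sign change
    last_sign = {}  # customerId -> sign after the latest transaction seen
    for customer_id, ts, balance in sorted(transactions):
        current = sign(balance)
        # A missing entry never equals -1, 0 or 1, so the first
        # transaction of a customer is always recorded.
        if last_sign.get(customer_id) != current:
            since[customer_id] = ts
            last_sign[customer_id] = current
    return since


print(balance_sign_since(TRANSACTIONS))
```

On the sample data this gives 1000000002, 1000000006, 1000000008 and 1000000010 for customers 1 to 4, which is the balanceSignSince column the question asks for.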

MySQL - Time difference between rows of the same type

I have a table with a list of agent_ids, a previous_status, a new status, and a time stamp. I'm trying to determine the time difference between each status change, by agent, in order to determine how long an agent was active in a particular status.
For example:
+------+--------------+--------------+----------------+----------------------+
| id | agent_id | old_status | new_status | date_time |
+----------------------------------------------------------------------------+
| 1 | 1 | offline | online | 2015-06-11 09:00:01 |
| 2 | 1 | online | busy | 2015-06-11 09:30:23 |
| 3 | 3 | offline | online | 2015-06-11 09:31:27 |
| 4 | 1 | busy | offline | 2015-06-11 09:31:45 |
| 5 | 3 | online | offline | 2015-06-11 09:32:10 |
+----------------------------------------------------------------------------+
The goal would be to create a new result table with a time_difference column. For example, the time_difference for row 5 should be 43 seconds, which is the difference between row 5 (the most recent status for agent_id 3) and row 3, the previous status for agent_id 3. Likewise, the time_difference for row 4 should be the difference between row 4 and row 2.
You can do something along the lines of
SELECT id, agent_id, old_status, new_status, date_time, seconds
FROM
(
    SELECT id, agent_id, old_status, new_status, date_time,
           IF(@a = agent_id, TIMESTAMPDIFF(SECOND, @p, date_time), NULL) seconds,
           @a := agent_id, @p := date_time
    FROM table1 t CROSS JOIN (SELECT @p := NULL, @a := NULL) i
    ORDER BY agent_id, id
) q
Output:
+------+----------+------------+------------+---------------------+---------+
| id | agent_id | old_status | new_status | date_time | seconds |
+------+----------+------------+------------+---------------------+---------+
| 1 | 1 | offline | online | 2015-06-11 09:00:01 | NULL |
| 2 | 1 | online | busy | 2015-06-11 09:30:23 | 1822 |
| 4 | 1 | busy | offline | 2015-06-11 09:31:45 | 82 |
| 3 | 3 | offline | online | 2015-06-11 09:31:27 | NULL |
| 5 | 3 | online | offline | 2015-06-11 09:32:10 | 43 |
+------+----------+------------+------------+---------------------+---------+
Here is a SQLFiddle demo
You can approach this without variables, using a correlated subquery that looks up the next status change of the same agent, so secs is the time spent in new_status:
select t.*,
       timestampdiff(second, t.date_time, t.next_date_time) as secs
from (select t.*,
             (select t2.date_time
              from table1 t2
              where t2.agent_id = t.agent_id and
                    t2.date_time > t.date_time
              order by t2.date_time
              limit 1
             ) as next_date_time
      from table1 t
     ) t
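The "time until the next status change" reading used by the correlated subquery can be illustrated with a short Python sketch. The function name seconds_in_status is an assumption, and only the columns needed for the calculation are kept.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Sample rows from the question: (id, agent_id, new_status, date_time).
STATUS_ROWS = [
    (1, 1, "online", "2015-06-11 09:00:01"),
    (2, 1, "busy", "2015-06-11 09:30:23"),
    (3, 3, "online", "2015-06-11 09:31:27"),
    (4, 1, "offline", "2015-06-11 09:31:45"),
    (5, 3, "offline", "2015-06-11 09:32:10"),
]


def seconds_in_status(rows):
    """Map row id -> seconds until the same agent's next status change,
    or None when the row is the agent's latest status."""
    durations = {}
    next_change = {}  # agent_id -> time of the change that follows
    # Walk backwards in time so each agent's next change is already known.
    newest_first = sorted(rows, key=lambda r: (r[3], r[0]), reverse=True)
    for row_id, agent_id, _status, ts in newest_first:
        when = datetime.strptime(ts, FMT)
        following = next_change.get(agent_id)
        if following is None:
            durations[row_id] = None
        else:
            durations[row_id] = int((following - when).total_seconds())
        next_change[agent_id] = when
    return durations


print(seconds_in_status(STATUS_ROWS))
```

Agent 1 was online for 1822 seconds (row 1) and busy for 82 seconds (row 2); agent 3 was online for 43 seconds (row 3). Rows 4 and 5 are the latest status of their agents, so they have no duration yet.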

MySQL count rows with similar timestamp

Is there any way to count a given run of timestamps that are close to each other, but not necessarily in a fixed time frame?
I.e., not grouped by hour or minute, but rather grouped by how close the current row's timestamp is to the next row's timestamp. If the next row is within "x" seconds/minutes then add that row to the group, otherwise start a new grouping.
Given this data:
+----+---------+---------------------+
| id | item_id | event_date |
+----+---------+---------------------+
| 1 | 1 | 2013-05-17 11:59:59 |
| 2 | 1 | 2013-05-17 12:00:00 |
| 3 | 1 | 2013-05-17 12:00:02 |
| 4 | 1 | 2013-05-17 12:00:03 |
| 5 | 3 | 2013-05-17 14:05:00 |
| 6 | 3 | 2013-05-17 14:05:01 |
| 7 | 3 | 2013-05-17 15:30:00 |
| 8 | 3 | 2013-05-17 15:30:01 |
| 9 | 3 | 2013-05-17 15:30:02 |
| 10 | 1 | 2013-05-18 09:12:00 |
| 11 | 1 | 2013-05-18 09:13:30 |
| 12 | 1 | 2013-05-18 09:13:45 |
| 13 | 1 | 2013-05-18 09:14:00 |
| 14 | 2 | 2013-05-20 15:45:00 |
| 15 | 2 | 2013-05-20 15:45:03 |
| 16 | 2 | 2013-05-20 15:45:10 |
| 17 | 2 | 2013-05-23 07:36:00 |
| 18 | 2 | 2013-05-23 07:36:10 |
| 19 | 2 | 2013-05-23 07:36:12 |
| 20 | 2 | 2013-05-23 07:36:15 |
| 21 | 1 | 2013-05-24 11:55:00 |
| 22 | 1 | 2013-05-24 11:55:02 |
+----+---------+---------------------+
Desired Results:
+---------+-------+---------------------+
| item_id | total | last_date_in_group |
+---------+-------+---------------------+
| 1 | 4 | 2013-05-17 12:00:03 |
| 3 | 2 | 2013-05-17 14:05:01 |
| 3 | 3 | 2013-05-17 15:30:02 |
| 1 | 4 | 2013-05-18 09:14:00 |
| 2 | 3 | 2013-05-20 15:45:10 |
| 2 | 4 | 2013-05-23 07:36:15 |
| 1 | 2 | 2013-05-24 11:55:02 |
+---------+-------+---------------------+
This is a little complicated. To start, you need the time of the next event for each record. The following subquery adds in such a time (nexted), if it is within bounds:
select t.*,
(select event_date
from t t2
where t2.item_id = t.item_id and
t2.event_date > t.event_date and
<date comparison here>
order by event_date limit 1
) as nexted
from t
This uses a correlated subquery. The <date comparison here> is for whatever date comparison you want. When there is no record, the value will be NULL.
Now, with this information (nexted) there is a trick to get the grouping. For any record, it is the first event time afterwards where nexted is NULL. This will be the last event in the series. Unfortunately, this requires two levels of nested correlated subqueries (or joins with aggregations). The result looks a bit unwieldy:
select item_id, grp, MIN(event_date) as start_date, MAX(event_date) as end_date,
       COUNT(*) as num_dates
from (select t.*,
             (select min(tn.event_date)
              from (select t1.*,
                           (select event_date
                            from t t2
                            where t2.item_id = t1.item_id and
                                  t2.event_date > t1.event_date and
                                  <date comparison here>
                            order by event_date limit 1
                           ) as nexted
                    from t t1
                   ) tn
              where tn.nexted is null and
                    tn.item_id = t.item_id and
                    tn.event_date >= t.event_date
             ) as grp
      from t
     ) s
group by item_id, grp;
What about approaching it by finding each individual record's local associations, and then grouping on the max event date from each record's discoveries? This is based on a static time interval (5 minutes in my example):
SELECT item_id, MAX(total) AS total, last_date_in_group FROM (
    SELECT t1.item_id, COUNT(*) AS total, COALESCE(GREATEST(t1.event_date, MAX(t2.event_date)), t1.event_date) AS last_date_in_group
    FROM table_name t1
    LEFT JOIN table_name t2 ON t2.item_id = t1.item_id
        AND t2.event_date BETWEEN t1.event_date AND t1.event_date + INTERVAL 5 MINUTE
    GROUP BY t1.id, t1.item_id, t1.event_date
) t
GROUP BY item_id, last_date_in_group
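The grouping rule from the question (start a new group whenever the gap to the previous row of the same item exceeds a threshold) is easy to state procedurally. The Python sketch below is one way to check expected results; the 5-minute threshold and the name group_runs are assumptions, since the question leaves "x" open.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Sample rows from the question: (item_id, event_date).
ITEM_EVENTS = [
    (1, "2013-05-17 11:59:59"), (1, "2013-05-17 12:00:00"),
    (1, "2013-05-17 12:00:02"), (1, "2013-05-17 12:00:03"),
    (3, "2013-05-17 14:05:00"), (3, "2013-05-17 14:05:01"),
    (3, "2013-05-17 15:30:00"), (3, "2013-05-17 15:30:01"),
    (3, "2013-05-17 15:30:02"), (1, "2013-05-18 09:12:00"),
    (1, "2013-05-18 09:13:30"), (1, "2013-05-18 09:13:45"),
    (1, "2013-05-18 09:14:00"), (2, "2013-05-20 15:45:00"),
    (2, "2013-05-20 15:45:03"), (2, "2013-05-20 15:45:10"),
    (2, "2013-05-23 07:36:00"), (2, "2013-05-23 07:36:10"),
    (2, "2013-05-23 07:36:12"), (2, "2013-05-23 07:36:15"),
    (1, "2013-05-24 11:55:00"), (1, "2013-05-24 11:55:02"),
]


def group_runs(rows, max_gap):
    """Return (item_id, total, last_date_in_group) for every run of rows
    of one item in which consecutive timestamps are at most max_gap apart."""
    parsed = sorted((item, datetime.strptime(ts, FMT)) for item, ts in rows)
    runs = []  # each entry: [item_id, total, last timestamp]
    prev_item = prev_ts = None
    for item, ts in parsed:
        if item == prev_item and ts - prev_ts <= max_gap:
            runs[-1][1] += 1   # extend the current run
            runs[-1][2] = ts
        else:
            runs.append([item, 1, ts])  # gap too large or new item
        prev_item, prev_ts = item, ts
    runs.sort(key=lambda run: run[2])  # report in chronological order
    return [(item, total, last.strftime(FMT)) for item, total, last in runs]


for run in group_runs(ITEM_EVENTS, timedelta(minutes=5)):
    print(run)
```

With a 5-minute threshold the sample data produces seven runs with totals 4, 2, 3, 4, 3, 4 and 2, in line with the desired results table. A smaller threshold, for example 60 seconds, would split the run on 2013-05-18 because of the 90-second gap between 09:12:00 and 09:13:30.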