Calculate average days between events? - mysql

If I have a table that includes:
user_id | event_time
How can I calculate the average days between events? To get something like:
days_diff | count
1 | 100
2 | 90
3 | 20
A user may have 1 day between events, but may also have 3 days between two subsequent events. How can I count them in both buckets?
Sample data (note: in this case the day difference is mostly 0 or 1, but this is just a small subset of the data):
user_id | event_time
82770 | 2015-05-04 02:34:53
1 | 2015-05-04 08:45:53
82770 | 2015-05-04 20:38:24
82770 | 2015-05-04 20:38:24
82770 | 2015-05-04 20:38:24
1 | 2015-05-05 09:31:42
82770 | 2015-05-05 13:33:36
82770 | 2015-05-05 13:33:53
1 | 2015-05-06 09:53:59
1 | 2015-05-06 23:31:18
1 | 2015-05-06 23:31:35
1 | 2015-05-07 12:31:41
82770 | 2015-05-07 16:01:16

Here's a solution without using a temporary table:
select daybetweenevents as days_diff,
count(daybetweenevents) as count
from (select t1.user_id,
t1.event_time,
-- MySQL's DATEDIFF(later, earlier) returns the difference in calendar days
datediff(min(t2.event_time), t1.event_time) as daybetweenevents
from yourtable t1
inner join yourtable t2
on t1.user_id = t2.user_id
and t1.event_time < t2.event_time
group by t1.user_id, t1.event_time) temp
group by daybetweenevents
order by days_diff
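As a cross-check of what the query should return, here is a small Python sketch of the same logic. It is plain Python, not MySQL, and `day_gap_histogram` and `SAMPLE` are illustrative names. For every distinct (user_id, event_time), it takes the user's next later event, like MIN(t2.event_time), and buckets the gap by calendar date, as DATEDIFF does. Note that the sample data does contain one 2-day gap (user 82770, 2015-05-05 to 2015-05-07).

```python
from collections import Counter
from datetime import datetime

# Sample data from the question: (user_id, event_time)
SAMPLE = [
    (82770, "2015-05-04 02:34:53"), (1, "2015-05-04 08:45:53"),
    (82770, "2015-05-04 20:38:24"), (82770, "2015-05-04 20:38:24"),
    (82770, "2015-05-04 20:38:24"), (1, "2015-05-05 09:31:42"),
    (82770, "2015-05-05 13:33:36"), (82770, "2015-05-05 13:33:53"),
    (1, "2015-05-06 09:53:59"), (1, "2015-05-06 23:31:18"),
    (1, "2015-05-06 23:31:35"), (1, "2015-05-07 12:31:41"),
    (82770, "2015-05-07 16:01:16"),
]

def day_gap_histogram(rows):
    """Return {days_diff: count} like the self-join query (hypothetical helper)."""
    by_user = {}
    for user_id, ts in rows:
        # A set collapses duplicate timestamps, like GROUP BY t1.user_id, t1.event_time
        by_user.setdefault(user_id, set()).add(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"))
    hist = Counter()
    for times in by_user.values():
        ordered = sorted(times)
        # Pair each event with the user's next later event, i.e. MIN(t2.event_time)
        for earlier, later in zip(ordered, ordered[1:]):
            # Compare only the date parts, as MySQL's DATEDIFF does
            hist[(later.date() - earlier.date()).days] += 1
    return dict(sorted(hist.items()))

print(day_gap_histogram(SAMPLE))  # {0: 4, 1: 4, 2: 1}
```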

Use DATEDIFF and a correlated subquery to get the same user's previous event time:
SELECT user_id, event_time,
DATEDIFF(event_time,
(SELECT MAX(event_time)
FROM yourtable
WHERE user_id = a.user_id
AND event_time < a.event_time)) AS days_diff
FROM yourtable AS a
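Here is the per-row logic of this correlated subquery, sketched in Python. It assumes the subquery is correlated on user_id as above, and `days_since_previous` is a hypothetical helper. Rows with no earlier event get None, the analogue of SQL NULL. Grouping the non-NULL values by days_diff then yields the bucket counts.

```python
from datetime import datetime

def days_since_previous(rows):
    """For each (user_id, event_time) row, return DATEDIFF(event_time, latest
    earlier event of the same user), or None (SQL NULL) if there is none."""
    parsed = [(u, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")) for u, ts in rows]
    result = []
    for user_id, t in parsed:
        # WHERE user_id = a.user_id AND event_time < a.event_time
        earlier = [p for u, p in parsed if u == user_id and p < t]
        result.append((t.date() - max(earlier).date()).days if earlier else None)
    return result

rows = [
    (1, "2015-05-04 08:45:53"),
    (1, "2015-05-05 09:31:42"),
    (82770, "2015-05-04 02:34:53"),
    (1, "2015-05-07 12:31:41"),
]
print(days_since_previous(rows))  # [None, 1, None, 2]
```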

I went with a temporary table of sorted user events to make the correlation lookup easier and to handle users with more than two events. This should get you the output you are asking for. (Note that this is SQL Server / T-SQL syntax: #temp tables, IDENTITY and DATEDIFF(day, ...).)
create table #tempOrderedUserEvents
(
id int identity (1,1),
userid int,
event_time datetime
)
insert into #tempOrderedUserEvents (userid, event_time)
select [user_id], event_time
from YourUserDataTable A
order by [user_id], event_time
select interval, count(*) as [count]
from
(
select A.userid, datediff(day, A.event_time, B.event_time) as interval
from #tempOrderedUserEvents A
JOIN #tempOrderedUserEvents B on A.id+1 = B.id and A.userid = B.userid
) as Intervals
group by interval
drop table #tempOrderedUserEvents
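Here is a Python sketch of the same numbering idea, using the hypothetical helper `interval_counts`. It sorts by user and time and numbers the rows consecutively, as the IDENTITY column does. It then pairs row id with row id + 1 only when both rows belong to the same user.

```python
from collections import Counter
from datetime import datetime

def interval_counts(rows):
    """Mimic the temp-table answer: number rows 1..n in (user_id, event_time)
    order, then pair row id with row id + 1 of the same user."""
    ordered = sorted((u, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")) for u, ts in rows)
    numbered = dict(enumerate(ordered, start=1))  # like the IDENTITY(1,1) column
    counts = Counter()
    for i, (user_id, t) in numbered.items():
        nxt = numbered.get(i + 1)  # JOIN ... ON A.id + 1 = B.id
        if nxt is not None and nxt[0] == user_id:  # AND A.userid = B.userid
            counts[(nxt[1].date() - t.date()).days] += 1  # datediff(day, A, B)
    return dict(sorted(counts.items()))

rows = [
    (82770, "2015-05-04 20:38:24"), (82770, "2015-05-04 20:38:24"),
    (82770, "2015-05-05 13:33:36"), (1, "2015-05-04 08:45:53"),
    (1, "2015-05-06 09:53:59"),
]
print(interval_counts(rows))  # {0: 1, 1: 1, 2: 1}
```

Unlike the self-join answer, this approach keeps duplicate timestamps, so two identical events produce a 0-day interval.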

Related

get totals each day based on a given timestamp

I have a simple table:
user | timestamp
===================
Foo | 1440358805
Bar | 1440558805
BarFoo | 1440559805
FooBar | 1440758805
I would like to get a view with total number of users each day:
date | total
===================
...
2015-08-23 | 1 //Foo
2015-08-24 | 1
2015-08-25 | 1
2015-08-26 | 3 //+Bar +BarFoo
2015-08-27 | 3
2015-08-28 | 4 //+FooBar
...
What I currently have is:
SELECT From_unixtime(a.timestamp, '%Y-%m-%d') AS date,
Count(From_unixtime(a.timestamp, '%Y-%m-%d')) AS total
FROM thetable AS a
GROUP BY From_unixtime(a.timestamp, '%Y-%m-%d')
ORDER BY a.timestamp ASC
which counts only the users of each particular day:
date | total
===================
2015-08-23 | 1 //Foo
2015-08-26 | 2 //Bar +BarFoo
2015-08-28 | 1 //FooBar
I've prepared a sqlfiddle
EDIT
The solution by @splash58 returns this result:
date | @t:=coalesce(total, @t)
==================================
2015-08-23 | 1
2015-08-26 | 3
2015-08-28 | 4
2015-08-21 | 4
2015-08-22 | 4
2015-08-24 | 4
2015-08-25 | 4
2015-08-27 | 4
2015-08-29 | 4
2015-08-30 | 4
You can get the cumulative values by using variables:
SELECT date, total, (@cume := @cume + total) as cume_total
FROM (SELECT From_unixtime(a.timestamp, '%Y-%m-%d') as date, Count(*) AS total
FROM thetable AS a
GROUP BY From_unixtime(a.timestamp, '%Y-%m-%d')
) a CROSS JOIN
(SELECT @cume := 0) params
ORDER BY date;
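Here is the running-total idea in Python, for checking results. This is a sketch: `cumulative_by_day` is a hypothetical helper, and it assumes UTC. FROM_UNIXTIME converts using the MySQL session time zone instead, so day boundaries may shift on a server in another zone.

```python
from collections import Counter
from datetime import datetime, timezone
from itertools import accumulate

def cumulative_by_day(unix_timestamps):
    """Per-day counts plus a running total, like the @cume query (UTC assumed)."""
    per_day = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        for ts in unix_timestamps
    )
    days = sorted(per_day)  # ORDER BY date
    totals = [per_day[d] for d in days]
    return list(zip(days, totals, accumulate(totals)))

for row in cumulative_by_day([1440358805, 1440558805, 1440559805, 1440758805]):
    print(row)
# ('2015-08-23', 1, 1)
# ('2015-08-26', 2, 3)
# ('2015-08-28', 1, 4)
```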
This gives you the dates that are in your data. If you want additional dates (where no users start), then one way is a calendar table:
SELECT c.date, a.total, (@cume := @cume + coalesce(a.total, 0)) as cume_total
FROM Calendar c LEFT JOIN
(SELECT From_unixtime(a.timestamp, '%Y-%m-%d') as date, Count(*) AS total
FROM thetable AS a
GROUP BY From_unixtime(a.timestamp, '%Y-%m-%d')
) a
ON a.date = c.date CROSS JOIN
(SELECT @cume := 0) params
WHERE c.date BETWEEN '2015-08-23' AND '2015-08-28'
ORDER BY c.date;
You can also put the dates explicitly in the query (using a subquery), if you don't have a calendar table.
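Filling the gaps amounts to walking every calendar day in the range and adding 0 for days with no rows. Here is a minimal Python sketch, assuming the per-day counts are already known (`fill_calendar` is a hypothetical helper):

```python
from datetime import date, timedelta

def fill_calendar(per_day_counts, first, last):
    """Running total for every day from first to last inclusive. Days missing
    from per_day_counts count as 0, like COALESCE(a.total, 0) after a LEFT JOIN
    against a calendar table."""
    result, running, day = [], 0, first
    while day <= last:
        running += per_day_counts.get(day, 0)
        result.append((day.isoformat(), running))
        day += timedelta(days=1)
    return result

counts = {date(2015, 8, 23): 1, date(2015, 8, 26): 2, date(2015, 8, 28): 1}
for row in fill_calendar(counts, date(2015, 8, 23), date(2015, 8, 28)):
    print(row)
# running totals 1, 1, 1, 3, 3, 4 for 2015-08-23 through 2015-08-28
```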
To preserve the order of the dates, I think we need to wrap the query in one more SELECT:
select date, @n:=@n + ifnull(total,0) total
from
(select Calendar.date, total
from Calendar
left join
(select From_unixtime(timestamp, '%Y-%m-%d') date, count(*) total
from thetable
group by date) t2
on Calendar.date= t2.date
order by date) t3
cross join (select @n:=0) n
Demo on sqlfiddle
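You can use the function
TIMESTAMPDIFF(DAY,`timestamp_field`, CURDATE())
You will not have to convert the timestamp to other field types. (This assumes a DATETIME or TIMESTAMP column; for an integer Unix-time column like the one in the question, you would still need FROM_UNIXTIME(`timestamp_field`).)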
Another approach is to build a temporary Calendar table with a stored procedure:
drop table if exists thetable;
create table thetable (user text, timestamp int);
insert into thetable values
('Foo', 1440358805),
('Bar', 1440558805),
('BarFoo', 1440559805),
('FooBar', 1440758805);
DROP PROCEDURE IF EXISTS insertTEMP;
DELIMITER //
CREATE PROCEDURE insertTEMP (first date, last date) begin
drop table if exists Calendar;
CREATE TEMPORARY TABLE Calendar (date date);
WHILE first <= last DO
INSERT INTO Calendar Values (first);
SET first = first + interval 1 day;
END WHILE;
END //
DELIMITER ;
call insertTEMP('2015-08-23', '2015-08-28');
select Calendar.date, @t:=coalesce(total, @t)
from Calendar
left join
(select date, max(total) total
from (select From_unixtime(a.timestamp, '%Y-%m-%d') AS date,
@n:=@n+1 AS total
from thetable AS a, (select @n:=0) n
order by a.timestamp ASC) t1
group by date ) t2
on Calendar.date= t2.date,
(select @t:=0) t
Result:
date | @t:=coalesce(total, @t)
2015-08-23 | 1
2015-08-24 | 1
2015-08-25 | 1
2015-08-26 | 3
2015-08-27 | 3
2015-08-28 | 4

Calculating duration percentage

I have a table which logs the HTTP status code of a website whenever the status changes, so the table looks like this...
id status date
-----------------------------------
1 404 2015-10-01 13:30:00
2 200 2015-10-02 13:30:00
3 404 2015-10-03 13:30:00
I want to use this data to display a table on my website showing how many times each status has been logged and the percentage duration of the status to the present time.
I have successfully managed to get the total count for each status using the following query:
SELECT
`status`,
COUNT(*) AS `status_count`
FROM `table_name`
GROUP BY `status`
ORDER BY `status`
...which, when executed, gives me something like this:
status status_count
----------------------
200 1
404 2
I would like to modify my SQL to add a duration, calculated from the date column, to my results. My goal is to end up with this:
status status_count duration (%)
-----------------------------------
200 1 20
404 2 80
Here is SQL FIDDLE DEMO
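SELECT t1.status
,COUNT(t1.id) as status_count
-- each status lasts until the next log entry (ids assumed consecutive), or until NOW() for the latest one
,SUM(TIMESTAMPDIFF(SECOND, t1.date, IFNULL(t2.date, NOW())))
/ TIMESTAMPDIFF(SECOND, MIN(t3.start_date), NOW()) * 100 as duration
FROM table_name t1
LEFT JOIN table_name t2 ON t1.id = (t2.id - 1)
,(SELECT MIN(date) as start_date FROM table_name) t3
GROUP BY t1.status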
Mine is more complicated than Nick's but gives a different result.
I tried it in Excel to verify that the values are correct.
I used dates starting at 2015-07-01 13:30:00 so that the NOW() function can work with them.
That means the durations in seconds are:
status | seconds | duration | fraction
404 | 86400 | 1 day | 0.05101
200 | 86400 | 1 day | 0.05101
404 | 1521138 | 17 days | 0.89799
total | 1693938 | |
Final result:
status | count | fraction
404 | 2 | 0.94899
200 | 1 | 0.05101
SQL FIDDLE DEMO
SELECT status, Count(status), SUM(secdiff) / MAX(sectotal) as percentage
FROM
(
SELECT
h1.status,
h2.dateupdate d1,
h1.dateupdate d2,
TIMESTAMPDIFF(SECOND,h1.dateupdate, h2.dateupdate) secdiff,
TIMESTAMPDIFF(SECOND,
(SELECT MIN(dateupdate) from logHttp),
NOW()) sectotal
FROM
logHttp as h1 INNER JOIN
(
(Select * from logHttp)
union
(select MAX(id) +1, 0, NOW() from logHttp)
) as h2
On h1.id + 1 = h2.id
) as t1
group by status;
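The arithmetic above can be reproduced with a short Python sketch. Like both queries, it assumes each row's status lasts until the next row and the latest status lasts until now. `status_durations` is a hypothetical helper, and `now` is chosen to match the answer's 1521138-second figure.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def status_durations(log, now):
    """log: [(status, changed_at)] ordered by time. Each status lasts until the
    next entry, the latest one until `now`.
    Returns {status: (count, share of total elapsed time)}."""
    total = (now - log[0][1]).total_seconds()
    counts, seconds = defaultdict(int), defaultdict(float)
    for i, (status, start) in enumerate(log):
        end = log[i + 1][1] if i + 1 < len(log) else now
        counts[status] += 1
        seconds[status] += (end - start).total_seconds()
    return {s: (counts[s], seconds[s] / total) for s in sorted(counts)}

log = [
    (404, datetime(2015, 7, 1, 13, 30)),
    (200, datetime(2015, 7, 2, 13, 30)),
    (404, datetime(2015, 7, 3, 13, 30)),
]
now = log[-1][1] + timedelta(seconds=1521138)  # the answer's "17 days" figure
for status, (count, share) in status_durations(log, now).items():
    print(status, count, round(share, 5))
# 200 1 0.05101
# 404 2 0.94899
```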
Use this query
select a.*,(a.status_count/b.tot)*100 as duration_per from (SELECT
`status`,
COUNT(*) AS `status_count`
FROM `table_name`
GROUP BY `status`
ORDER BY `status`) a join (select count(*) as tot from `table_name`) b
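Note that this query divides each status's number of log entries by the total number of entries. It therefore gives a share of entries rather than a share of elapsed time. A quick Python illustration with the three sample rows (hypothetical helper `count_share`):

```python
from collections import Counter

def count_share(statuses):
    """Percentage of log entries per status, like status_count / tot * 100."""
    counts = Counter(statuses)
    total = sum(counts.values())
    return {s: 100 * c / total for s, c in sorted(counts.items())}

shares = count_share([404, 200, 404])
print({s: round(p, 1) for s, p in shares.items()})  # {200: 33.3, 404: 66.7}
```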

Get distinct values in union all in hive

I have a table in hive that looks something like this
cust_id prod_id timestamp
1 11 2011-01-01 03:30:23
2 22 2011-01-01 03:34:53
1 22 2011-01-01 04:21:03
2 33 2011-01-01 04:44:09
3 33 2011-01-01 04:54:49
so on and so forth.
For each record, I want to check how many unique products this customer has bought within the last 24 hours, excluding the current transaction. So the output should look something like this:
1 0
2 0
1 1
2 1
3 0
My hive query looks something like this
select * from(
select t1.cust_id, count(distinct t1.prod_id) as freq from temp_table t1
left outer join temp_table t2 on (t1.cust_id=t2.cust_id)
where t1.timestamp>=t2.timestamp
and unix_timestamp(t1.timestamp)-unix_timestamp(t2.timestamp) < 24*60*60
group by t1.cust_id
union all
select t.cust_id, 0 as freq from temp_table t2
)unioned;
Just get all the rows for the last 24 hours, do a GROUP BY on cust_id, and output count(distinct prod_id) - 1. The overall query would look something like this:
select cust_id, COUNT(distinct prod_id) - 1 from table_name where
unix_timestamp(`timestamp`) > unix_timestamp() - 24*60*60
GROUP BY cust_id
*I am subtracting 1 here to exclude the user's latest transaction (I hope this is what you meant).
You can join to a derived table that contains the distinct number of products purchased in the past 24 hours for each customer/timestamp pair. The t3.timestamp <= t2.timestamp condition keeps later purchases out of the window.
select t1.cust_id, t1.prod_id, t1.timestamp, t2.count_distinct_prod_id - 1
from mytable t1
join (
select t2.cust_id, t2.timestamp, count(distinct t3.prod_id) count_distinct_prod_id
from mytable t2
join mytable t3 on t3.cust_id = t2.cust_id
where t3.timestamp <= t2.timestamp
and unix_timestamp(t2.timestamp) - unix_timestamp(t3.timestamp) < 24*60*60
group by t2.cust_id, t2.timestamp
) t2 on t1.cust_id = t2.cust_id and t1.timestamp = t2.timestamp
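Here is a Python sketch of the per-record window logic, useful for checking the Hive output against the expected result. The helper name is hypothetical. It assumes the window is (t - 24h, t], matching the conditions in the query above.

```python
from datetime import datetime

def distinct_products_last_24h(rows):
    """rows: [(cust_id, prod_id, 'YYYY-MM-DD HH:MM:SS')]. For each row: distinct
    products bought by the same customer in (t - 24h, t], minus 1 for the
    current transaction, as in the join above."""
    parsed = [(c, p, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")) for c, p, ts in rows]
    out = []
    for cust, _, t in parsed:
        window = {
            p2
            for c2, p2, t2 in parsed
            # same customer, not later than t, and less than 24 hours earlier
            if c2 == cust and t2 <= t and (t - t2).total_seconds() < 24 * 60 * 60
        }
        out.append((cust, len(window) - 1))
    return out

rows = [
    (1, 11, "2011-01-01 03:30:23"),
    (2, 22, "2011-01-01 03:34:53"),
    (1, 22, "2011-01-01 04:21:03"),
    (2, 33, "2011-01-01 04:44:09"),
    (3, 33, "2011-01-01 04:54:49"),
]
print(distinct_products_last_24h(rows))  # [(1, 0), (2, 0), (1, 1), (2, 1), (3, 0)]
```

As in the SQL, subtracting 1 removes the current purchase exactly when its product was not also bought earlier in the same window. Otherwise the result is one lower than the number of distinct products in the earlier purchases.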

Finding a previous, non-contiguous date using SQL

Suppose a table, tableX, like this:
| date | hours |
| 2014-07-02 | 10 |
| 2014-07-03 | 10 |
| 2014-07-07 | 20 |
| 2014-07-08 | 40 |
The dates are 'workdays' -- that is, no weekends or holidays.
I want to find the increase in hours between consecutive workdays, like this:
| date | hours |
| 2014-07-03 | 0 |
| 2014-07-07 | 10 |
| 2014-07-08 | 20 |
The challenge is dealing with the gaps. If there were no gaps, something like
SELECT t1.date1 AS 'first day', t2.date1 AS 'second day', (t2.hours - t1.hours)
FROM tableX t1
LEFT JOIN tableX t2 ON t2.date1 = DATE_add(t1.date1, INTERVAL 1 DAY)
ORDER BY t2.date1;
would get it done, but that doesn't work in this case as there is a gap between 2014-07-03 and 2014-07-07.
Just use a correlated subquery instead. You have two fields, so you can do this with two correlated subqueries, or a correlated subquery with a join back to the table. Here is the first version:
SELECT t.date1 as `first day`,
(select t2.date1
from tableX t2
where t2.date1 > t.date1
order by t2.date1 asc
limit 1
) as `next day`,
(select t2.hours
from tableX t2
where t2.date1 > t.date1
order by t2.date1 asc
limit 1
) - t.hours as `added hours`
FROM tableX t
ORDER BY t.date1;
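The "next row, whatever the gap" idea can be sketched in Python (hypothetical helper `hours_increase`). Sorting the rows by date and pairing each with its successor is what the ORDER BY ... LIMIT 1 subqueries do for every row.

```python
def hours_increase(rows):
    """rows: [(iso_date, hours)]. Pair each date with the next later date present
    in the data, however large the gap, and return
    [(later_date, later_hours - earlier_hours)]."""
    ordered = sorted(rows)  # ISO dates sort chronologically as strings
    return [(d2, h2 - h1) for (_, h1), (d2, h2) in zip(ordered, ordered[1:])]

table_x = [("2014-07-02", 10), ("2014-07-03", 10), ("2014-07-07", 20), ("2014-07-08", 40)]
print(hours_increase(table_x))
# [('2014-07-03', 0), ('2014-07-07', 10), ('2014-07-08', 20)]
```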
Another alternative is to rank the data by date and then subtract the hours of the previous workday's date from the hours of the current workday's date.
SELECT
ranked_t1.date1 date,
ranked_t1.hours - ranked_t2.hours hours
FROM
(
-- rnk rather than rank: RANK is a reserved word in MySQL 8.0
SELECT t.*,
@rownum := @rownum + 1 AS rnk
FROM (SELECT * FROM tableX ORDER BY date1) t,
(SELECT @rownum := 0) r
) ranked_t1
INNER JOIN
(
SELECT t.*,
@rownum2 := @rownum2 + 1 AS rnk
FROM (SELECT * FROM tableX ORDER BY date1) t,
(SELECT @rownum2 := 0) r
) ranked_t2
ON ranked_t2.rnk = ranked_t1.rnk - 1;
SQL Fiddle demo
Notes:
An index on tableX.date1 would speed up the query.
The query above uses a join instead of a correlated subquery.
Reference:
Mysql rank function on SO
Unfortunately, at the time of writing MySQL did not have analytic (window) functions that would allow you to access the "previous row" or the "next row" of the data stream (LAG() and LEAD() were added in MySQL 8.0). However, you can emulate them with this:
select h2.LogDate, h2.Hours - h1.Hours as Added_Hours
from Hours h1
left join Hours h2
on h2.LogDate =(
select Min( LogDate )
from Hours
where LogDate > h1.LogDate )
where h2.LogDate is not null;
Note the index on the date field: if that field is not indexed, this query will be very slow on a large table.
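To try this join without a MySQL server, you can run it with Python's built-in sqlite3 module. This sketch uses SQLite as a stand-in. Its syntax differs from MySQL in other areas, but this query needs nothing MySQL-specific. The ORDER BY is an addition for deterministic output, and the PRIMARY KEY supplies the index on LogDate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# PRIMARY KEY gives LogDate an index, as the answer recommends
conn.execute("CREATE TABLE Hours (LogDate TEXT PRIMARY KEY, Hours INTEGER)")
conn.executemany(
    "INSERT INTO Hours VALUES (?, ?)",
    [("2014-07-02", 10), ("2014-07-03", 10), ("2014-07-07", 20), ("2014-07-08", 40)],
)
rows = conn.execute(
    """
    select h2.LogDate, h2.Hours - h1.Hours as Added_Hours
    from Hours h1
    left join Hours h2
      on h2.LogDate = (select Min(LogDate) from Hours where LogDate > h1.LogDate)
    where h2.LogDate is not null
    order by h2.LogDate
    """
).fetchall()
print(rows)  # [('2014-07-03', 0), ('2014-07-07', 10), ('2014-07-08', 20)]
conn.close()
```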

same column calculation / percentage

In MySQL, how can I compare a user's score on one date with their score on another date, effectively returning the user's percentage increase from one date to the other?
I have been trying to wrap my head around this for a few days, am running out of ideas, and feel my SQL knowledge is limited. I'm not sure whether I'm supposed to use a join or a subquery. The MySQL table consists of three fields: name, score, and date.
TABLE: userData
name score date
joe 5 2014-01-01
bob 10 2014-01-01
joe 15 2014-01-08
bob 12 2014-01-08
Desired query result:
user %inc last date
joe 33% 2014-01-08
bob 17% 2014-01-08
It seems like such a simple thing for a database to do, yet it is beyond my grasp.
You need to use SUBQUERIES. Something like:
SELECT name,
((SELECT score
FROM userData as u2
WHERE u2.name = u1.name
ORDER BY date desc
LIMIT 1
)
/
(
SELECT score
FROM userData as u3
WHERE u3.name = u1.name
ORDER BY date desc
LIMIT 1,1
)
* 100.0
) as inc_perc,
max(date) as last_date
FROM userData as u1
GROUP BY name
Simple solution assuming that the formula for %Inc column = total/sum *100
select name,total/sum * 100, date from (
select name,sum(score) as total,count(*) as num,date from table group by name
)as resultTable
select a.name as [user],(cast(cast(b.score as float)-a.score as float)/cast(a.score as float))*100 as '% Inc',b.[date] as lastdate
from userdata a inner join userdata b on a.name = b.name and a.date < b.date
I guess you are looking for the % increase in the score compared to the past date.
Another way (note that I get a different result: based on the name "percinc", percentage increase, I calculated it the way that seems correct to me. If you want your result, just calculate it with t1.score / t2.score * 100):
Sample data:
CREATE TABLE t
(`name` varchar(3), `score` int, `date` varchar(10))
;
INSERT INTO t
(`name`, `score`, `date`)
VALUES
('joe', 5, '2014-01-01'),
('bob', 10, '2014-01-01'),
('joe', 15, '2014-01-08'),
('bob', 12, '2014-01-08')
;
Query:
select
t1.name,
t1.score first_score,
t1.date first_date,
t2.score last_score,
t2.date last_date,
t2.score / t1.score * 100 percinc
from
t t1
join t t2 on t1.name = t2.name
where
t1.date = (select min(date) from t where t.name = t1.name)
and t2.date = (select max(date) from t where t.name = t1.name);
Result:
| NAME | FIRST_SCORE | FIRST_DATE | LAST_SCORE | LAST_DATE | PERCINC |
|------|-------------|------------|------------|------------|---------|
| joe | 5 | 2014-01-01 | 15 | 2014-01-08 | 300 |
| bob | 10 | 2014-01-01 | 12 | 2014-01-08 | 120 |
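The 300 and 120 above express the last score as a percentage of the first. If you want a percentage increase in the usual sense, (last - first) / first * 100, the same join gives it with (t2.score - t1.score) / t1.score * 100. Here is a small Python check of both formulas on the sample data (`score_change` is a hypothetical helper):

```python
def score_change(first, last):
    """Return (last as % of first, percentage increase from first to last)."""
    ratio = last / first * 100               # t2.score / t1.score * 100
    increase = (last - first) / first * 100  # (new - old) / old * 100
    return ratio, increase

for name, first, last in [("joe", 5, 15), ("bob", 10, 12)]:
    ratio, increase = score_change(first, last)
    print(name, round(ratio), round(increase))
# joe 300 200
# bob 120 20
```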
live demo