MySQL query histogram for time-interval data

I have event input with these fields:
event user
event start
event end
event type
Each event is inserted into a MySQL table as its own row, with user+start as the primary key.
I need to query a histogram for one type by time interval (say, per minute), counting the events that occurred in each interval.
Something like:
SELECT count(*) as hits FROM events
WHERE type="browsing"
GROUP BY time_diff("2015-1-1" AND "2015-1-2") / 60 * second
But I could not find a way to do that in SQL without writing application code. Any ideas?
Sample data
user, start, end, type
1, 2015-1-1 12:00:00, 2015-1-1 12:03:59, browsing
2, 2015-1-1 12:03:00, 2015-1-1 12:06:00, browsing
2, 2015-1-1 12:03:00, 2015-1-1 12:06:00, eating
3, 2015-1-1 12:03:00, 2015-1-1 12:08:00, browsing
the result should look like this:
[ASCII chart, column alignment lost: y axis = count of browsing users, x axis = minute (0 to 9); the rows hold 1, 4 and 8 stars from top to bottom]

You can do this using group by with the level that you want. Here is an example using the data you gave:
First the SQL to create the table and populate it. The ID column here isn't "needed" but it is recommended if the table will be large or have indexes on it.
CREATE TABLE `test`.`events` (
`id` INT NOT NULL AUTO_INCREMENT,
`user` INT NULL,
`start` DATETIME NULL,
`end` DATETIME NULL,
`type` VARCHAR(45) NULL,
PRIMARY KEY (`id`));
INSERT INTO events (user, start, end, type) VALUES
(1, '2015-1-1 12:00:00', '2015-1-1 12:03:59', 'browsing'),
(2, '2015-1-1 12:03:00', '2015-1-1 12:06:00', 'browsing'),
(2, '2015-1-1 12:03:00', '2015-1-1 12:06:00', 'eating'),
(3, '2015-1-1 12:03:00', '2015-1-1 12:08:00', 'browsing');
To get a list of ordered pairs of (duration in minutes, number of events):
The query can be written easily using the TIMESTAMPDIFF function, as shown below:
SELECT
TIMESTAMPDIFF(MINUTE, start, end) as minutes,
COUNT(*) AS numEvents
FROM
test.events
GROUP BY TIMESTAMPDIFF(MINUTE, start, end)
The output:
minutes numEvents
3 3
5 1
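As a cross-check on that output, here is a minimal Python sketch of the same grouping. It assumes that end is never earlier than start, so flooring the difference matches what TIMESTAMPDIFF(MINUTE, start, end) returns; the names whole_minutes and duration_histogram are made up for this illustration.

```python
from collections import Counter
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Sample rows from the question: (user, start, end, type)
rows = [
    (1, "2015-01-01 12:00:00", "2015-01-01 12:03:59", "browsing"),
    (2, "2015-01-01 12:03:00", "2015-01-01 12:06:00", "browsing"),
    (2, "2015-01-01 12:03:00", "2015-01-01 12:06:00", "eating"),
    (3, "2015-01-01 12:03:00", "2015-01-01 12:08:00", "browsing"),
]

def whole_minutes(start, end):
    """Complete minutes between two timestamps (assumes end >= start)."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)

# Group rows by their duration in whole minutes and count each group.
duration_histogram = Counter(whole_minutes(s, e) for _, s, e, _ in rows)

for minutes, num_events in sorted(duration_histogram.items()):
    print(minutes, num_events)
```

Note that this counts events per duration, not events per clock minute.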
The first argument to TIMESTAMPDIFF can be one of MICROSECOND (FRAC_SECOND in old MySQL versions; it was removed in MySQL 5.5), SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, or YEAR.
Here are some more examples of queries you can do:
Events by hour (floor function is applied)
SELECT
TIMESTAMPDIFF(HOUR, start, end) as hours,
COUNT(*) AS numEvents
FROM
test.events
GROUP BY TIMESTAMPDIFF(HOUR, start, end)
Events by hour with better formatting
SELECT
CONCAT("<", TIMESTAMPDIFF(HOUR, start, end) + 1) as hours,
COUNT(*) AS numEvents
FROM
test.events
GROUP BY TIMESTAMPDIFF(HOUR, start, end)
You can group by a variety of options, but this should definitely get you started. Most plotting packages will allow you to specify arbitrary x y coordinates, so you don't need to worry about the missing values on the x axis.
To get a list of ordered pairs of number of events at a specific time (for logging):
Note that this is left for reference.
Now for the queries. First you have to pick which item you want to use for the grouping. For example, a task might take more than a minute, so the start and end would be in different minutes. For all these examples, I am basing them off of the start time, since that is when the event actually took place.
To group event counts by minute, you can use a query like this:
SELECT
DATE_FORMAT(start, '%M %e, %Y %h:%i %p') as minute,
count(*) AS numEvents
FROM test.events
GROUP BY YEAR(start), MONTH(start), DAYOFMONTH(start), HOUR(start), MINUTE(start);
Note how this groups by all the items, starting with the year and going down to the minute. I also have the minute displayed as a label. The resulting output looks like this:
minute numEvents
January 1, 2015 12:00 PM 1
January 1, 2015 12:03 PM 3
You could then read this data with PHP and prepare it for display with one of the many graphing libraries out there, plotting the minute column on the x axis and numEvents on the y axis.
Here are some more examples of queries you can do:
Events by hour
SELECT
DATE_FORMAT(start, '%M %e, %Y %h %p') as hour,
count(*) AS numEvents
FROM test.events
GROUP BY YEAR(start), MONTH(start), DAYOFMONTH(start), HOUR(start);
Events by date
SELECT
DATE_FORMAT(start, '%M %e, %Y') as date,
count(*) AS numEvents
FROM test.events
GROUP BY YEAR(start), MONTH(start), DAYOFMONTH(start);
Events by month
SELECT
DATE_FORMAT(start, '%M %Y') as date,
count(*) AS numEvents
FROM test.events
GROUP BY YEAR(start), MONTH(start);
Events by year
SELECT
DATE_FORMAT(start, '%Y') as date,
count(*) AS numEvents
FROM test.events
GROUP BY YEAR(start);
I should also point out that if you have an index on the start column for this table, these queries will complete quickly, even with hundreds of millions of rows.
Hope this helps! Let me know if you have any other questions about this.

I am going to assume that you have a numbers table that contains integers. You also have $starttime and $endtime.
This is one way to get the values you want:
select ($starttime + interval n.n - 1 minute) as thetime, n.n as minutes,
count(sd.user)
from numbers n left join
sampledata sd
on $starttime + interval n.n - 1 minute between sd.start and sd.end
where $starttime + interval n.n - 1 minute <= $endtime and
sd.end >= $starttime and
sd.start <= $endtime
group by n.n
order by n.n;
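To see what the numbers-table approach should produce for the sample data in the question, here is a rough Python sketch of the same idea: probe each whole minute in the window and count the events whose [start, end] range contains it. It is an illustration under assumptions, not a translation of the query: the filter on type is added because the question asks about one type, and the window bounds stand in for $starttime and $endtime.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

def ts(text):
    return datetime.strptime(text, FMT)

# Sample rows from the question: (user, start, end, type)
events = [
    (1, ts("2015-01-01 12:00:00"), ts("2015-01-01 12:03:59"), "browsing"),
    (2, ts("2015-01-01 12:03:00"), ts("2015-01-01 12:06:00"), "browsing"),
    (2, ts("2015-01-01 12:03:00"), ts("2015-01-01 12:06:00"), "eating"),
    (3, ts("2015-01-01 12:03:00"), ts("2015-01-01 12:08:00"), "browsing"),
]

def active_per_minute(events, event_type, window_start, window_end):
    """Count events of one type that are active at each whole-minute probe."""
    counts = []
    probe = window_start
    while probe <= window_end:
        # Inclusive on both ends, like "probe BETWEEN start AND end" in SQL.
        active = sum(
            1 for _, start, end, kind in events
            if kind == event_type and start <= probe <= end
        )
        counts.append((probe.strftime("%H:%M"), active))
        probe += timedelta(minutes=1)
    return counts

histogram = active_per_minute(
    events, "browsing", ts("2015-01-01 12:00:00"), ts("2015-01-01 12:09:00")
)
for label, n in histogram:
    print(label, "*" * n)
```

Minutes with no active events appear with a count of zero, which is what the numbers table is for.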

Related

Find number of rows for each hour where datetime columns match certain criteria

RDBMS: MySQL
The time column(s) datatype is of datetime
For every hour of the 24-hour day I need to retrieve the number of rows whose start_time matches the hour OR whose end_time is greater than or equal to the hour.
Below is the current query I have which returns the data I need but only based off of one hour. I can loop through and do 24 separate queries for each hour of the day but I would love to have this in one query.
SELECT COUNT(*) as total_online
FROM broadcasts
WHERE DATE(start_time) = '2018-01-01' AND (HOUR(start_time) = '0' OR
HOUR(end_time) >= '0')
Is there a better way of querying the data I need? Perhaps by using group by somehow? Thank you.
Not exactly sure if I am following, but try something like this:
select datepart(hh, getdate()) , count(*)
from broadcasts
where datepart(hh, starttime) <=datepart(hh, endtime)
and cast(starttime as date)=cast(getdate() as date) and cast(endtime as date)=cast(getdate() as date)
group by datepart(hh, getdate())
Join with a subquery that returns all the hour numbers:
SELECT h.hour_num, COUNT(*) AS total_online
FROM (SELECT 0 AS hour_num UNION SELECT 1 UNION SELECT 2 ... UNION SELECT 23) AS h
JOIN broadcasts AS b ON HOUR(b.start_time) = h.hour_num OR HOUR(b.end_time) >= h.hour_num
WHERE DATE(b.start_time) = '2018-01-01'
GROUP BY h.hour_num
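For comparison, here is a small Python sketch that applies the condition from the question literally to every hour from 0 to 23. The broadcasts list is made-up sample data and hourly_totals is a hypothetical helper name. One difference worth noting: an inner join returns no row for an hour with no matches, while this sketch reports such hours as zero.

```python
from datetime import date, datetime

def ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M")

# Hypothetical broadcasts: (start_time, end_time)
broadcasts = [
    (ts("2018-01-01 00:30"), ts("2018-01-01 02:15")),
    (ts("2018-01-01 05:00"), ts("2018-01-01 05:45")),
    (ts("2017-12-31 23:00"), ts("2018-01-01 01:00")),  # starts on another date
]

def hourly_totals(broadcasts, day):
    """For each hour, count rows where HOUR(start) = hour OR HOUR(end) >= hour."""
    totals = {}
    for hour in range(24):
        totals[hour] = sum(
            1 for start, end in broadcasts
            if start.date() == day
            and (start.hour == hour or end.hour >= hour)
        )
    return totals

total_online = hourly_totals(broadcasts, date(2018, 1, 1))
for hour, count in total_online.items():
    print(hour, count)
```

Because the condition only compares end_time against the hour, a broadcast is also counted for hours before it starts; the sketch keeps that behaviour as written in the question.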

Mysql Get data for Last Six weeks using JOIN

I have edited the query to select the data for all employees who have done assessments in the past six weeks. Logically it should show each employee twice if they have done assessments in two different weeks, but this query shows a single record.
select
AssessmentEmployee.EmployeeName,
AVG(AssessmentListing.AssessmentScore),
DATE_FORMAT((STR_TO_DATE(`AssessmentSubmittedDatetime`, '%d-%b-%Y %I:%i %p')) , '%Y-%m-%v') as _month
from AssessmentEmployee
LEFT JOIN AssessmentListing
ON AssessmentEmployee.AssessmentID=AssessmentListing.AssessmentID
WHERE (STR_TO_DATE(`AssessmentSubmittedDatetime`, '%d-%b-%Y %I:%i %p') >= DATE_FORMAT(NOW() - INTERVAL 6 Week, '%Y' ))
group by AssessmentEmployee.EmployeeName
I have following table which I am using.
AssessmentEmployee
ID
AssessmentID
EmployeeName
Other table is AssessmentListing
ID
AssessmentID
AssessmentSubmittedDateTime
AssessmentScore
I want to get the employees who have scores (that is, who have done assessments) in the last six weeks, and their average score.
Sample of Data Column of AssessmentListing
ID AssessmentID AssessmentSubmittedDatetime AssessmentScore
1 040416024720 04-Apr-2016 02:48 PM 50
Please try the following query:
select
AssessmentEmployee.EmployeeName,
AVG(AssessmentListing.AssessmentScore),
DATE_FORMAT((STR_TO_DATE(`AssessmentSubmittedDatetime`, '%d-%b-%Y %I:%i %p')) , '%Y-%v') as year_week
from AssessmentEmployee
LEFT JOIN AssessmentListing
ON AssessmentEmployee.AssessmentID=AssessmentListing.AssessmentID
WHERE UNIX_TIMESTAMP(DATE_FORMAT(STR_TO_DATE(`AssessmentSubmittedDatetime`,'%d-%b-%Y %I:%i %p'),'%Y-%m-%d')) >= UNIX_TIMESTAMP(CURDATE() - INTERVAL 6 WEEK)
group by AssessmentEmployee.EmployeeName, year_week;
You shouldn't store dates and times as strings. If you do, you have to put up with cumbersome conversions like these whenever you process them.
You're taking an average, so it will average the two scores if they take it twice.
SELECT *, SUM(AssessmentScore) as total, SUM(AssessmentScore)/6 as avg
FROM `assessmentlisting`
WHERE STR_TO_DATE(`AssessmentSubmittedDatetime`, '%d-%b-%Y %I:%i %p') > DATE_FORMAT(NOW() - INTERVAL 6 Week, '%Y-%m-%d %I:%i %p' )
GROUP BY assessmentlisting.AssessmentID
Hope it works.

Find number of "active" rows each month for multiple months in one query

I have a MySQL database with each row containing an activate and a deactivate date. This refers to the period of time when the object the row represents was active.
activate deactivate id
2015-03-01 2015-05-10 1
2013-02-04 2014-08-23 2
I want to find the number of rows that were active at any time during each month. Ex.
Jan: 4
Feb: 2
Mar: 1
etc...
I figured out how to do this for a single month, but I'm struggling with how to do it for all 12 months in a year in a single query. The reason I would like it in a single query is for performance, as information is used immediately and caching wouldn't make sense in this scenario. Here's the code I have for a month at a time. It checks if the activate date comes before the end of the month in question and that the deactivate date was not before the beginning of the period in question.
SELECT * from tblName WHERE activate <= DATE_SUB(NOW(), INTERVAL 1 MONTH)
AND deactivate >= DATE_SUB(NOW(), INTERVAL 2 MONTH)
If anybody has any idea how to change this and do grouping such that I can do this for an indefinite number of months I'd appreciate it. I'm at a loss as to how to group.
If you have a table of months that you care about, you can do:
select m.*,
(select count(*)
from table t
where t.activate_date <= m.month_end and
t.deactivate_date >= m.month_start
) as Actives
from months m;
If you don't have such a table handy, you can create one on the fly:
select m.*,
(select count(*)
from table t
where t.activate_date <= m.month_end and
t.deactivate_date >= m.month_start
) as Actives
from (select date('2015-01-01') as month_start, date('2015-01-31') as month_end union all
select date('2015-02-01') as month_start, date('2015-02-28') as month_end union all
select date('2015-03-01') as month_start, date('2015-03-31') as month_end union all
select date('2015-04-01') as month_start, date('2015-04-30') as month_end
) m;
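The overlap test in those queries (activated on or before the month's end, deactivated on or after the month's start) can be checked with a short Python sketch. It uses the two sample rows from the question; month_bounds and active_counts are hypothetical helper names, and the month table is generated with the standard calendar module rather than typed out.

```python
import calendar
from datetime import date

# Sample rows from the question: (activate, deactivate, id)
rows = [
    (date(2015, 3, 1), date(2015, 5, 10), 1),
    (date(2013, 2, 4), date(2014, 8, 23), 2),
]

def month_bounds(year, month):
    """First and last day of a month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)

def active_counts(rows, year):
    """Number of rows active at any time during each month of a year."""
    counts = {}
    for month in range(1, 13):
        month_start, month_end = month_bounds(year, month)
        counts[month] = sum(
            1 for activate, deactivate, _ in rows
            if activate <= month_end and deactivate >= month_start
        )
    return counts

counts_2014 = active_counts(rows, 2014)
counts_2015 = active_counts(rows, 2015)
for month, count in counts_2015.items():
    print(calendar.month_abbr[month], count)
```

Months with no active rows still show up with a count of zero, which is the reason for driving the query from a table of months.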
EDIT:
A potentially faster way is to calculate a cumulative sum of activations and deactivations and then take the maximum per month:
select year(d), month(d), max(cumes)
from (select d, (@s := @s + inc) as cumes
from (select activate_date as d, 1 as inc from table t union all
select deactivate_date, -1 as inc from table t
) t cross join
(select @s := 0) param
order by d
) s
group by year(d), month(d);

MySQL group by week

I have a large number of records with a transaction datetime field going back several years. I would like to do a comparative analysis between the same timespan this year and last. How can I group by week over a 3 month range?
I'm running into problems using the YEARWEEK and WEEK functions because of the day the year 2012 starts on versus the day 2011 starts on.
Given that I have records with datetimes everyday from Jan 1st to the current day, and records with the same datetimes from the prior year, how can I group by week so the output is sums with dates like: 01/01/2011, 01/08/2011, 01/15/2011, etc., and 01/01/2012, 01/08/2012, 01/15/2012, etc.?
My query so far is as follows:
SELECT
DATE_FORMAT(A.transaction_date, '%Y-%m-%d') as date,
ROUND(sum(A.quantity), 3) AS quantity,
ROUND(sum(A.total_amount), 3) AS amount,
A.product_code,
D.fuel_type_code,
D.fuel_type_name,
C.customer_code,
C.customer_name
FROM
cl_transactions AS A
INNER JOIN
card AS B ON A.card_number=B.card_number
INNER JOIN
customer AS C ON B.customer_code=C.customer_code
INNER JOIN
fuel_type AS D ON A.fuel_type=D.fuel_type_code
WHERE
((A.transaction_date >= DATE_FORMAT(NOW() - INTERVAL 3 MONTH, '%Y-%m-01')) OR (A.transaction_date - INTERVAL 1 YEAR >= DATE_FORMAT(NOW() - INTERVAL 15 MONTH, '%Y-%m-01') AND A.transaction_date <= NOW() - INTERVAL 1 YEAR))
GROUP BY
A.transaction_date, fuel_type_code;
I would essentially like something that achieves the following pseudo-query:
GROUP BY
STARTING FROM THE OLDEST DATE (A.transaction_date + INTERVAL 6 DAY)
I started with an inner query using SQL variables to build out from/to ranges for this year and last year, each from its respective start of year/month/day (ex: 2012-01-01 and 2011-01-01 respectively). From that, I'm also pre-formatting the date for final output so you have ONE master date basis for display, reflecting whatever the "this year" week would be.
From that, I join to the transaction table where the transaction date is BETWEEN the respective start of the current week and the start of the next week. Since date/time stamps include hours and minutes, 2012-01-01 by itself is implied as 12:00:00 AM (midnight) at the start of that day, and BETWEEN goes up to 12:00:00 AM seven days later. That date then becomes the start date of the following week.
So, by joining on the date being between EITHER last year's or this year's time period, it's the same group qualification. The field selection does a ROUND( SUM( IF() )) for last year or this year respectively: if the incoming transaction date is LESS than the current year's week start, it must be a record from the prior year; otherwise it is for the current year. So each column adds either the value itself or zero, as applicable.
So now, you have the group by. The week that it qualified for was already prepared from the inner query via "ThisYearWeekOf" formatted column, regardless of the otherwise computed "YEARWEEK()" or "WEEK()". The date ranges took care of that qualification for us.
Finally, I added the fuel-type as a join and included that as the group by. You have to group by all non-aggregate columns for proper SQL, although MySQL lets you get by by just grabbing the first entry for the given group if it is NOT so specified in group by.
To close, I DID include the information for the customer as you didn't have it in the group by and did not appear to be applicable... it would just arbitrarily grab one. However, I've added it to the group by, so now your records will show at the per customer level, per product and fuel type, how much sales and quantity between this year and last.
SELECT
JustWeekRange.ThisYearWeekOf,
CTrans.product_code,
FT.fuel_type_code,
FT.fuel_type_name,
C.customer_code,
C.customer_name,
ROUND( SUM( IF( CTrans.transaction_date < JustWeekRange.ThisYrWeekStart, CTrans.Quantity, 0 )), 3) as LastYrQty,
ROUND( SUM( IF( CTrans.transaction_date < JustWeekRange.ThisYrWeekStart, CTrans.total_amount, 0 )), 3) as LastYrAmt,
ROUND( SUM( IF( CTrans.transaction_date < JustWeekRange.ThisYrWeekStart, 0, CTrans.Quantity )), 3) as ThisYrQty,
ROUND( SUM( IF( CTrans.transaction_date < JustWeekRange.ThisYrWeekStart, 0, CTrans.total_amount )), 3) as ThisYrAmt
FROM
( SELECT
DATE_FORMAT(@ThisYearDate, '%Y-%m-%d') as ThisYearWeekOf,
@LastYearDate as LastYrWeekStart,
@ThisYearDate as ThisYrWeekStart,
@LastYearDate := date_add( @LastYearDate, interval 7 day ) LastYrStartOfNextWeek,
@ThisYearDate := date_add( @ThisYearDate, interval 7 day ) ThisYrStartOfNextWeek
FROM
(select @ThisYearDate := '2012-01-01',
@LastYearDate := '2011-01-01' ) sqlvars,
cl_transactions justForLimit
HAVING
ThisYrWeekStart < '2012-04-01'
LIMIT 15 ) JustWeekRange
JOIN cl_transactions AS CTrans
ON CTrans.transaction_date BETWEEN
JustWeekRange.LastYrWeekStart AND JustWeekRange.LastYrStartOfNextWeek
OR CTrans.transaction_date BETWEEN
JustWeekRange.ThisYrWeekStart AND JustWeekRange.ThisYrStartOfNextWeek
JOIN fuel_type FT
ON CTrans.fuel_type = FT.fuel_type_code
JOIN card
ON CTrans.card_number = card.card_number
JOIN customer AS C
ON card.customer_code = C.customer_code
GROUP BY
JustWeekRange.ThisYearWeekOf,
CTrans.product_code,
FT.fuel_type_code,
FT.fuel_type_name,
C.customer_code,
C.customer_name
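The week ranges that the inner query builds (Jan 1, Jan 8, Jan 15 and so on within each year) can also be described with plain date arithmetic. The Python sketch below is a different way to get the same buckets, not the variable-based method used in the query: it assigns each date to the start of its 7-day block counted from January 1 of its own year. The transactions are invented sample data, and the blocks are half-open, so a date that falls exactly on a week start belongs only to that week.

```python
from collections import defaultdict
from datetime import date, timedelta

def week_start(day):
    """Start of the 7-day block containing `day`, counted from Jan 1 of its year."""
    jan1 = date(day.year, 1, 1)
    return jan1 + timedelta(days=7 * ((day - jan1).days // 7))

# Hypothetical transactions: (transaction_date, quantity)
transactions = [
    (date(2011, 1, 3), 10.0),
    (date(2011, 1, 9), 5.0),
    (date(2012, 1, 1), 7.5),
    (date(2012, 1, 7), 2.5),
    (date(2012, 1, 8), 4.0),
]

# Sum quantity per week bucket.
totals = defaultdict(float)
for day, quantity in transactions:
    totals[week_start(day)] += quantity

for start in sorted(totals):
    print(start.isoformat(), totals[start])
```

Because both years are bucketed from their own January 1, the Nth week of 2011 lines up with the Nth week of 2012 regardless of which weekday each year starts on.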

Rolling 30 day uniques in sql

Suppose you have a table of the form:
create table user_activity (
user_id int not null,
activity_date timestamp not null,
...);
It's easy enough to select the number of unique user_id's in the past 30 days.
select count(distinct user_id) from user_activity where activity_date > now() - interval 30 day;
But how can you select the number of unique user_ids in the prior 30 days for each of the past 30 days? E.g. uniques for 0-30 days ago, 1-31 days ago, 2-32 days ago and so on to 30-60 days ago.
The database engine is mysql if it matters
You could try using a sub query:
SELECT DISTINCT `activity_date` as `day`, (
SELECT count(DISTINCT `user_id`) FROM `user_activity` WHERE `activity_date` = `day`
) as `num_uniques`
FROM `user_activity`
WHERE `activity_date` > NOW() - INTERVAL 30 day;
This should give you the number of unique users for each day. However, I haven't tested this since I don't have the DB to work with.
I haven't tried this in MySQL, but hopefully the syntax is right. If not, maybe it will point you in the right direction. First, I often employ a Numbers table. It can be a physical table simply made up of numbers or it can be a generated/virtual/temporary table.
SELECT
N.number,
COUNT(DISTINCT UA.user_id)
FROM
Numbers N
INNER JOIN User_Activity UA ON
UA.activity_date > NOW() - INTERVAL 30 + N.number DAY AND
UA.activity_date <= NOW() - INTERVAL N.number DAY
WHERE
N.number BETWEEN 0 AND 30
GROUP BY
N.number
I'm not familiar with the whole INTERVAL syntax, so if I got that wrong, please let me know and I'll try to correct it.
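To make the intent of the Numbers-table join concrete, here is a minimal Python sketch of the same windowing: for each offset n from 0 to 30 it counts distinct users whose activity falls in the 30-day window ending n days ago. The sample activity, the fixed value of now, and the name rolling_uniques are all assumptions for illustration.

```python
from datetime import datetime, timedelta

def rolling_uniques(activity, now, window_days=30, offsets=range(0, 31)):
    """Distinct users per window (now - n - window_days, now - n] for each offset n."""
    result = {}
    for n in offsets:
        window_end = now - timedelta(days=n)
        window_start = window_end - timedelta(days=window_days)
        # Same bounds as the join: strictly after the start, up to and including the end.
        users = {
            user_id for user_id, when in activity
            if window_start < when <= window_end
        }
        result[n] = len(users)
    return result

now = datetime(2024, 3, 1, 12, 0, 0)  # fixed "now" so the sketch is repeatable

# Hypothetical activity rows: (user_id, activity_date)
activity = [
    (1, now - timedelta(days=1)),
    (1, now - timedelta(days=2)),
    (2, now - timedelta(days=29)),
    (3, now - timedelta(days=45)),
]

uniques = rolling_uniques(activity, now)
for n, count in uniques.items():
    print(n, count)
```

Each user is counted once per window no matter how many rows they have in it, which is the job COUNT(DISTINCT ...) does in the query.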
If you get the day number for today's date and take it modulo 30, you get the offset of the current day. Add that offset to each date's day number and divide the result by 30 (integer division) to get the group of days, then group your results by this number. In code, something like this:
select count(distinct user_id), (to_days(activity_date)+(to_days(now()) % 30)) div 30 as period
from user_activity
group by (to_days(activity_date)+(to_days(now()) % 30)) div 30
I will leave calculating the reverse numbering of period up to you (hint: take the period number for the current date as "max" and subtract period above and add 1.)