Generating time series reports - mysql

I'm trying to work out how to create a solution that will let me query a table that has a timestamp column and get time series data in return. The request consists of a start/end date and time, a granularity type (minute, hour, day, week, month or year) and a granularity value. Using something like
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) DIV 60)
in a query to get the results per minute, or DIV 300 for every five minutes, works fine. The problem lies further up: months and years don't have a fixed number of seconds, so calculating them from seconds will be inaccurate. I've stumbled upon generate_series in PostgreSQL (and MySQL alternatives to it) and am stuck trying to tie them together. How do I calculate a count of rows, for example, over two days at a 15-minute granularity? It's a complex question that I'll probably have to break down further.
I have already visited #1 and #2, but they are incomplete.
To me it seems that rounding will only be possible down to a certain level and I'd have to restrict it (e.g. for a two-month period there cannot be an hourly breakdown).
EDIT
I had the wrong impression: I would not have to calculate monthly figures based on seconds when using a query like:
SELECT DATE_FORMAT(MIN(created_at),'%d/%m/%Y %H:%i:%s') AS date,
COUNT(*) AS count FROM guests
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) / 300)
It only uses the minimum value to label each group. But the question still stands: is the best approach really to step through the time period using the granularity value and "slice" the data that way without losing too much accuracy?
It seems that the only approach is to run sub-queries over a set of data (e.g. for a period of two months, generate timestamps at 15-minute intervals, group the data into them and produce an aggregate) without dividing the original timestamp to produce a rounded approximation.
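The "generate the interval timestamps, then group rows into them" idea can be sketched outside the database to check expected counts. This is a minimal Python sketch, not MySQL code; `bucket_counts` is a hypothetical helper, and it assumes a fixed-width step (minutes, hours, days, weeks), which is exactly why months and years need calendar functions instead.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_counts(timestamps, start, end, step):
    """Count timestamps per fixed-width bucket in [start, end).

    Bucket starts are generated from `start` in steps of `step`, much like
    PostgreSQL's generate_series, so empty buckets still appear with 0.
    """
    counts = Counter()
    for ts in timestamps:
        if start <= ts < end:
            # Whole number of steps between start and ts picks the bucket.
            index = (ts - start) // step
            counts[start + index * step] += 1

    series = []
    slot = start
    while slot < end:
        series.append((slot, counts.get(slot, 0)))
        slot += step
    return series
```

For two days at 15-minute granularity this yields 192 rows, including the empty ones that a plain GROUP BY would leave out.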

Let's say you have a gigantic table measure with two columns datestamp and temp.
Let's say you want to see the temperature every six minutes (10x per hour) for the last week. You can do this sort of thing. We'll get to defining trunc in a moment.
SELECT trunc(datestamp) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INTERVAL 7 DAY
GROUP BY trunc(datestamp)
ORDER BY trunc(datestamp)
That works for any reasonable definition of trunc. In this case trunc(t) returns the beginning of the six-minute period in which t occurs. So trunc('1942-12-07 08:45:17') gives 1942-12-07 08:42:00.
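To make the idea concrete, here is a minimal Python sketch of one such trunc() and of what GROUP BY trunc(datestamp) with AVG(temp) computes. It is an illustration, not MySQL code; `trunc` and `average_by_period` are hypothetical names, and rows are assumed to be (datestamp, temp) pairs.

```python
from collections import defaultdict
from datetime import datetime


def trunc(ts):
    """Start of the six-minute period containing ts."""
    return ts.replace(minute=ts.minute - ts.minute % 6, second=0, microsecond=0)


def average_by_period(rows):
    """Like GROUP BY trunc(datestamp) ... AVG(temp) ... ORDER BY trunc(datestamp)."""
    groups = defaultdict(list)
    for datestamp, temp in rows:
        groups[trunc(datestamp)].append(temp)
    return [(period, sum(temps) / len(temps)) for period, temps in sorted(groups.items())]
```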
Here's a query that works for every six minute interval.
SELECT DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6) MINUTE datestamp,
AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INTERVAL 7 DAY
GROUP BY DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6) MINUTE
ORDER BY 1
This uses built-in date arithmetic rather than Unix timestamp arithmetic.
You can use a stored function to make this easier to read.
DELIMITER $$
DROP FUNCTION IF EXISTS TRUNC_N_MINUTES$$
CREATE
FUNCTION TRUNC_N_MINUTES(datestamp DATETIME, n INT)
RETURNS DATETIME DETERMINISTIC NO SQL
COMMENT 'truncate to N minute boundary. For example,
TRUNC_N_MINUTES(sometime, 15) gives the nearest
preceding quarter hour'
RETURN DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD n) MINUTE$$
DELIMITER ;
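The stored function's arithmetic can be checked with a small Python mirror. This is a sketch for illustration only; `trunc_n_minutes` is a hypothetical counterpart of TRUNC_N_MINUTES, not part of MySQL. Because both versions restart at the top of each hour, n should divide 60 evenly; otherwise the last bucket of every hour comes out shorter than n minutes.

```python
from datetime import datetime, timedelta


def trunc_n_minutes(datestamp, n):
    """Round datestamp down to an n-minute boundary within its hour.

    Mirrors DATE_FORMAT(d, '%Y-%m-%d %H:00') + INTERVAL (MINUTE(d) - MINUTE(d) MOD n) MINUTE.
    """
    hour_start = datestamp.replace(minute=0, second=0, microsecond=0)
    return hour_start + timedelta(minutes=datestamp.minute - datestamp.minute % n)
```

With n = 7, for example, 08:59 falls in a bucket starting at 08:56, and 09:03 starts over at 09:00, so that last bucket is only four minutes long.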
Then your query will say
SELECT TRUNC_N_MINUTES(datestamp, 6) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INTERVAL 7 DAY
GROUP BY TRUNC_N_MINUTES(datestamp, 6)
ORDER BY TRUNC_N_MINUTES(datestamp, 6)
If you want to summarize by 5-, 10-, 15- or 20-minute boundaries (20 gives three items per hour), simply use that number in place of 6.
You'll need different trunc() functions for hours, etc.
The trunc() function for daily summaries is DATE(datestamp).
For monthly summaries it is LAST_DAY(datestamp). For example,
SELECT LAST_DAY(datestamp) month_ending, AVG(temp) temp
FROM measure
GROUP BY LAST_DAY(datestamp)
ORDER BY LAST_DAY(datestamp)
yields a month-by-month summary.
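Grouping by LAST_DAY() works because every timestamp in a month maps to the same month-ending date, whatever the month's length. A minimal Python sketch of the same grouping, with hypothetical helper names and rows assumed to be (datestamp, temp) pairs:

```python
import calendar
from collections import defaultdict
from datetime import date, datetime


def last_day(ts):
    """Like MySQL LAST_DAY(): the last date of ts's month."""
    days_in_month = calendar.monthrange(ts.year, ts.month)[1]
    return date(ts.year, ts.month, days_in_month)


def monthly_average(rows):
    """Like GROUP BY LAST_DAY(datestamp) with AVG(temp), ordered by month."""
    groups = defaultdict(list)
    for datestamp, temp in rows:
        groups[last_day(datestamp)].append(temp)
    return [(month_ending, sum(t) / len(t)) for month_ending, t in sorted(groups.items())]
```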

Related

use group by in mysql to aggregate 5-minute timestamp intervals

I have a MySQL database called crypto, and inside it a table called coin with three columns: timestamp, price, volume.
The problem is the following: I want to group the data by timestamp into 5-minute periods, where the price field shows the maximum value and the volume field shows the sum.
I tried the following command:
SELECT sum(volume), max(price),
round(unix_timestamp(addtime(date(0), timestamp) )/(15*60)) AS
timestamp
FROM btcusd_raw0
GROUP BY timestamp;
But it doesn't return the datetime as a column.
Think about this as truncating a datestamp expression to the next lowest five-minute boundary.
How can you do that? This (long) expression works:
DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) - MINUTE(datestamp) MOD 5) MINUTE
How does it work?
DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') truncates the expression to the start of its hour. For example, it turns 2001-09-11 08:42:00 into 2001-09-11 08:00:00.
(MINUTE(datestamp) - MINUTE(datestamp) MOD 5) retrieves the minute (42 in the example) and rounds it down to the nearest five-minute boundary, 40.
hourvalue + INTERVAL minutevalue MINUTE adds the hour and minutes together, giving 2001-09-11 08:40:00.
So, use this expression in your query, both in the SELECT and GROUP BY clauses.
SELECT sum(volume), max(price),
DATE_FORMAT(timestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(timestamp) - MINUTE(timestamp) MOD 5) MINUTE
FROM btcusd_raw0
GROUP BY DATE_FORMAT(timestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(timestamp) - MINUTE(timestamp) MOD 5) MINUTE;
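To sanity-check what this query should return, here is a minimal Python sketch of the same aggregation: truncate each timestamp to its five-minute boundary, then take SUM(volume) and MAX(price) per boundary. The helper names are hypothetical, and rows are assumed to be (timestamp, price, volume) tuples.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def five_minute_boundary(ts):
    hour = ts.replace(minute=0, second=0, microsecond=0)        # DATE_FORMAT(ts, '%Y-%m-%d %H:00')
    return hour + timedelta(minutes=ts.minute - ts.minute % 5)  # + INTERVAL (...) MINUTE


def summarize(rows):
    """Return (period_start, sum_volume, max_price) per five-minute period, in order."""
    volume = defaultdict(float)
    price = {}
    for ts, p, v in rows:
        key = five_minute_boundary(ts)
        volume[key] += v
        price[key] = max(price.get(key, p), p)
    return [(key, volume[key], price[key]) for key in sorted(volume)]
```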
It makes for verbose queries. You might consider creating a stored function for it. Here's a more complete explanation.

MySQL find date range with count

In MySQL, it is fairly easy to find the number of records that exist within some time interval.
SELECT COUNT(*) FROM records WHERE create_date > '2018-01-01 01:15:00' AND create_date < '2018-01-01 02:15:00'
But I want to do the opposite, sort of. Rather than providing a time interval and getting a count of records, I want to provide a count of records and check whether an X-minute time interval exists in which more than Y records were created. Getting the exact time interval is not essential, only whether one exists. At a higher level, I am trying to identify whether there was any X-minute "surge" when more than Y records were created during the course of a day.
For example, in the past 24 hours was there any 1 hour interval where a "surge" of more than 50 new records occurred?
I have already ruled out dividing the 24 hours into blocks of 1 hour intervals and checking each block. This does not work because the "surge" could span two sequential 1 hour blocks, such as 25 records at the end of the 01:00:00 block and 25 records at the beginning of the 02:00:00 block.
This should do it:
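SELECT COUNT(*)
FROM records r1
WHERE
(SELECT COUNT(*) FROM records r2
WHERE r2.create_date >= r1.create_date
AND r2.create_date < r1.create_date + INTERVAL X SECOND) > Y
For each record, this counts the records created in the X seconds starting at that record (the record itself included) and keeps the record if that count exceeds Y. Any X-second window holding more than Y records can be slid forward until it starts at its earliest record, so checking only the windows that start at a record is enough. (Comparing ABS() of the difference in both directions would instead test a window 2X seconds wide.)
So it returns 1 or more if such a surge exists, and 0 if not.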
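Here is a minimal Python sketch of the same sliding-window check, handy for testing against known data. `has_surge` is a hypothetical helper; it sorts the creation times and, for each record, uses binary search to count the records in the window [record, record + window).

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def has_surge(create_dates, window, threshold):
    """True if some window of length `window` contains more than `threshold` records."""
    times = sorted(create_dates)
    for i, start in enumerate(times):
        # Index of the first record at or after start + window.
        end = bisect_left(times, start + window)
        if end - i > threshold:
            return True
    return False


if __name__ == "__main__":
    # 25 records at the end of the 01:00 hour and 25 at the start of 02:00:
    # no fixed hourly block holds more than 25, but one sliding hour holds all 50.
    times = [datetime(2024, 1, 1, 1, 35) + timedelta(minutes=k) for k in range(50)]
    print(has_surge(times, timedelta(hours=1), 40))  # prints True
```

This is the boundary-spanning case from the question, which fixed hourly blocks miss.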
If you want to check fixed hourly blocks instead, group the records by hour. Here I'm using the built-in functions that return parts of a timestamp, date() and hour(). Since you can't use an aggregate function in the WHERE clause, I had to use HAVING to filter by the count requirement.
select date(create_date),
hour(create_date),
count(*) as surge from records
where create_date > curdate() - interval 1 day
group by date(create_date), hour(create_date)
having count(*) > 50;
Another method is to group by the hour in the same way, but also return when the first record in each busy hour was created, which gives a rough starting point for the surge. Any time the count is greater than 50 it returns a row. Grouping by the hour prevents multiple starts for a "surge" within the same hour:
select min(create_date) as surge_start, count(*) as surge from records
group by date(create_date), hour(create_date)
having count(*) > 50;
The problem with this, however, is that a surge can cross an hour boundary or last longer than an hour, but it should give you roughly the moment the "surge" started.

Efficient SELECT query to find records within a month

I have a MySQL DB table with multiple date type fields. I need to do different SELECT queries on this table but I am not sure which way is the best to find records from the same month.
I know I can do the following:
SELECT *
FROM table
WHERE MONTH(somedate) = 5
AND YEAR(somedate) = 2015
But I keep reading that isn't efficient and that I should go with using actual dates, i.e.
SELECT *
FROM table
WHERE somedate BETWEEN '2015-05-01' AND '2015-05-31'
However, all I would have is the month and the year as variables coming in from PHP. How do I easily and quickly calculate the last day of the month if I go with second option?
Don't calculate the last day of the month. Calculate the first day of the next month instead.
Your query can be like this
WHERE t.mydatetimecol >= '2015-05-01'
AND t.mydatetimecol < '2015-05-01' + INTERVAL 1 MONTH
Note that we're doing a "less than" comparison, not "less than or equal to". This is very convenient for comparing TIMESTAMP and DATETIME columns, which can include a time portion.
Note that a BETWEEN comparison is a "less than or equal to". To get a comparison equivalent to the query above, we'd need to do
WHERE t.mydatetimecol
BETWEEN '2015-05-01' AND '2015-05-01' + INTERVAL 1 MONTH + INTERVAL -1 SECOND
(This assumes that the resolution of DATETIME and TIMESTAMP is one second. In other databases, such as SQL Server, the resolution is finer than a second, so there we'd have the potential of missing a row with a value of '2015-05-31 23:59:59.997'. MySQL 5.6.4 and later also support fractional seconds, for example DATETIME(3). We don't have a problem like that with the "less than the first day of the next month" comparison: < '2015-06-01'.)
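The fractional-second pitfall is easy to reproduce with plain Python datetimes. This sketch only mimics the two comparisons (it does not talk to a database); the variable names are illustrative.

```python
from datetime import datetime, timedelta

month_start = datetime(2015, 5, 1)
next_month = datetime(2015, 6, 1)
late_row = datetime(2015, 5, 31, 23, 59, 59, 997000)  # a value with fractional seconds

# BETWEEN month_start AND next_month - 1 second: the upper bound is 23:59:59 exactly.
in_between = month_start <= late_row <= next_month - timedelta(seconds=1)

# Half-open range: anything before the first day of the next month.
in_half_open = month_start <= late_row < next_month
```

Here in_between is False while in_half_open is True, which is why the half-open form is the safer default.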
No need to do the month or date math yourself; let MySQL do it for you. If you muck with adding 1 to the month, you have to handle the rollover from December to January and increment the year. MySQL has all that built in.
In PHP, date('t', strtotime("$year-$month-01")) gives the number of days in the month.

Find data of a whole month in sql

I have this query where I provide a from-date and a to-date.
SELECT *
FROM sales
WHERE date between to-date AND from-date;
Now I want to execute this query with following parameters
from-date = Oct-2015
to-date = Oct-2015
That is, I want records for the whole month.
How would I do that in a query where I have the from and to dates provided, so that it works both when the months are the same and when they are different?
Update:
The data type of column date is DATE.
You can find the first day of the month containing any given timestamp with an expression like this. For example, using the timestamp NOW(), this finds the first day of the present month.
(DATE(NOW()) - INTERVAL (DAYOFMONTH(NOW()) - 1) DAY)
That's handy, because then you can use an expression like
(DATE(NOW()) - INTERVAL (DAYOFMONTH(NOW()) - 1) DAY) - INTERVAL 1 MONTH
to find the beginning of the previous month if you like. All sorts of date arithmetic become available.
Therefore, you can use an expression like the following to find all records with item_date in the month before the present month.
WHERE item_date >= (DATE(NOW()) - INTERVAL (DAYOFMONTH(NOW()) - 1) DAY) - INTERVAL 1 MONTH
AND item_date < (DATE(NOW()) - INTERVAL (DAYOFMONTH(NOW()) - 1) DAY)
Notice that we express the end of the range as a strict inequality (<) against the moment just after the end of the range.
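The same first-of-month arithmetic can be written in Python, for example to compute the range in application code. This is a minimal sketch with hypothetical helper names. Python's timedelta has no month unit, so the previous month is found by stepping back one day from the first of the current month and truncating again.

```python
from datetime import date, timedelta


def first_of_month(d):
    """Like DATE(d) - INTERVAL (DAYOFMONTH(d) - 1) DAY."""
    return d - timedelta(days=d.day - 1)


def previous_month_range(today):
    """Half-open [first day of previous month, first day of this month)."""
    this_month = first_of_month(today)
    # One day before the 1st is always in the previous month; truncate that.
    prev_month = first_of_month(this_month - timedelta(days=1))
    return prev_month, this_month
```

Rows belong to the previous month when prev_month <= item_date < this_month, matching the WHERE clause.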
You may find this writeup useful. http://www.plumislandmedia.net/mysql/sql-reporting-time-intervals/
It's often useful to create a stored function called TRUNC_MONTH() to perform the conversion of the arbitrary timestamp to the first day of the month. It makes your SQL statements easier to read.
select * from sales
where date >= '2015-10-01'
and date < '2015-11-01'
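To cover both cases in the question (the same month or different months), one approach is to turn the from-month and to-month into a half-open range: from the 1st of the from-month up to, but not including, the 1st of the month after the to-month. A minimal Python sketch, assuming the months arrive as year and month numbers; `months_range` is a hypothetical helper whose results would be passed as the two query parameters.

```python
from datetime import date


def months_range(from_year, from_month, to_year, to_month):
    """Half-open [start, end) covering every day from from-month through to-month."""
    start = date(from_year, from_month, 1)
    # The end is the first day of the month after to-month; December rolls over.
    if to_month == 12:
        end = date(to_year + 1, 1, 1)
    else:
        end = date(to_year, to_month + 1, 1)
    return start, end
```

The query then becomes date >= start AND date < end, which works whether the two months are equal or not.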
Update
select * from sales
where date >= from-date
and date <= to-date
You can get the month from both your to and from dates and find the records of those months (note that this ignores the year and cannot use an index on date):
SELECT * FROM sales
WHERE MONTH(date) BETWEEN MONTH('2002-01-03') AND MONTH('2002-01-03')

How do I select two weeks ago in MySQL?

I have a report that is driven by a sql query that looks like this:
SELECT batch_log.userid,
batches.operation_id,
SUM(TIME_TO_SEC(ramses.batch_log.time_elapsed)),
SUM(ramses.tasks.estimated_nonrecurring + ramses.tasks.estimated_recurring),
DATE(start_time)
FROM batch_log
JOIN batches ON batch_log.batch_id=batches.id
JOIN ramses.tasks ON ramses.batch_log.batch_id=ramses.tasks.batch_id
JOIN protocase.tblusers on ramses.batch_log.userid = protocase.tblusers.userid
WHERE DATE(ramses.batch_log.start_time) > "2011-02-01"
AND ramses.batch_log.time_elapsed > "00:03:00"
AND DATE(ramses.batch_log.start_time) < now()
AND protocase.tblusers.active = 1
AND protocase.tblusers.userid NOT in ("ksnow","smanning", "dstapleton")
GROUP BY userid, batches.operation_id, date(start_time)
ORDER BY start_time, userid ASC
Since this is to be compared with the time from the current pay period, including that unfinished period skews the results.
Our pay periods start on a Sunday; the first pay period was 2011-02-01 and our last pay period started on the 4th of this month. How do I put that into my WHERE clause to strip the most recent pay period out of the query?
EDIT: So now I'm using date_sub(now(), INTERVAL 2 WEEK), but I really need a particular day of the week (Sunday); since it is Wednesday, it's cutting it off at Wednesday.
You want to use DATE_SUB. Specifically:
select DATE_SUB(curdate(), INTERVAL 2 WEEK)
gets you two weeks ago. Insert the DATE_SUB ... part into your sql and you're good to go.
Edit per your comment:
Check out DAYOFWEEK(), and you can do something along the lines of:
DATE_SUB(DATE_SUB(curdate(), INTERVAL 2 WEEK), INTERVAL (DAYOFWEEK(curdate()) - 1) DAY)
(I don't have a MySQL instance to test it on, but essentially this subtracts the number of days since the most recent Sunday. DAYOFWEEK() returns 1 for Sunday through 7 for Saturday.)
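Since the expression above is untested, here is a minimal Python sketch of the same arithmetic for checking it against known dates. `sunday_two_weeks_ago` is a hypothetical helper. Python's isoweekday() numbers Monday as 1 and Sunday as 7, so isoweekday() % 7 gives the days since Sunday, the same value as DAYOFWEEK() - 1 in MySQL.

```python
from datetime import date, timedelta


def sunday_two_weeks_ago(today):
    """Sunday on or before the date two weeks before today."""
    two_weeks_ago = today - timedelta(weeks=2)
    # Sunday -> 0, Monday -> 1, ..., Saturday -> 6 (same as DAYOFWEEK() - 1).
    days_since_sunday = today.isoweekday() % 7
    return two_weeks_ago - timedelta(days=days_since_sunday)
```

For Wednesday 2024-01-03 this gives Sunday 2023-12-17.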
The question isn't quite clear, especially after the edit: is the "pay period" two weeks long, or do you just want the last two weeks back from last Sunday? I assume the period is two weeks. Then you first need to know how many days the latest period (which you want to ignore, as it isn't over yet) has been going on. To get that number of days you can use an expression like
DATEDIFF(CURDATE(), FirstPeriod) % 14
where FirstPeriod is 2011-02-01. And now you strip that number of days from the current date in the query using date_sub(). The exact expression depends on how the period is defined but you should get the idea...
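As a sketch of the idea, assuming two-week periods counted from 2011-02-01 as described above, the start of the current (unfinished) period is today minus the days elapsed in it. The constant and helper names are hypothetical; in SQL the same value would be date_sub(CURDATE(), INTERVAL (DATEDIFF(CURDATE(), FirstPeriod) % 14) DAY).

```python
from datetime import date, timedelta

FIRST_PERIOD = date(2011, 2, 1)  # first pay period start, from the question
PERIOD_DAYS = 14                 # assumption: pay periods are two weeks long


def current_period_start(today):
    """Start of the pay period that contains today (still in progress)."""
    days_into_period = (today - FIRST_PERIOD).days % PERIOD_DAYS
    return today - timedelta(days=days_into_period)
```

Rows with a start_time before current_period_start(today) then belong to completed pay periods only.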