In MySQL, it is fairly easy to find the number of records that exist within some time interval.
SELECT COUNT(*) FROM records WHERE create_date > '2018-01-01 01:15:00' AND create_date < '2018-01-01 02:15:00'
But I want to do roughly the opposite. Rather than providing a time interval and getting a count of records, I want to provide a count of records and check whether any X-minute interval exists in which more than Y records were created. Getting the exact interval is not essential, only whether one exists. At a higher level, I am trying to identify whether there was any X-minute "surge" during the course of a day when more than Y records were created.
For example, in the past 24 hours was there any 1 hour interval where a "surge" of more than 50 new records occurred?
I have already ruled out dividing the 24 hours into blocks of 1 hour intervals and checking each block. This does not work because the "surge" could span two sequential 1 hour blocks, such as 25 records at the end of the 01:00:00 block and 25 records at the beginning of the 02:00:00 block.
This should do it:
SELECT COUNT(*)
FROM records r1
WHERE
(SELECT COUNT(*) FROM records r2
WHERE ABS(UNIX_TIMESTAMP(r1.create_date) - UNIX_TIMESTAMP(r2.create_date)) < X) > Y
This counts the records that have more than Y records created within X seconds before or after them (each record's count includes the record itself).
So it returns a value >= 1 if any such record exists, and 0 if not.
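Note that the before-or-after comparison looks at a span of up to 2X seconds around each record. A stricter variant, sketched here with 60 minutes and 50 records as placeholder values for X and Y, counts only the records that fall in the X-minute window starting at each record. Any window holding more than Y records can be slid forward until it starts on a record without losing any of them, so checking windows anchored at records should be enough.

```sql
-- Sketch only: assumes records(create_date DATETIME).
-- 60 MINUTE and 50 are placeholders for X and Y.
SELECT EXISTS (
    SELECT 1
    FROM records r1
    WHERE r1.create_date >= NOW() - INTERVAL 1 DAY
      AND (SELECT COUNT(*)
           FROM records r2
           WHERE r2.create_date >= r1.create_date
             AND r2.create_date <  r1.create_date + INTERVAL 60 MINUTE) > 50
) AS surge_found;
```

surge_found is 1 if at least one such window exists in the past 24 hours and 0 otherwise. An index on create_date should help the correlated subquery considerably.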
So if you wanted to count by hour you would want to group the records. Here I'm using the built-in functions that return parts of a timestamp, date() and hour(). Since you can't use an aggregate function in the where clause I had to use having to limit by the count requirement.
select date(create_date),
hour(create_date),
count(*) as surge from records
where create_date > curdate() - interval 1 day
group by date(create_date), hour(create_date)
having count(*) > 50;
Another method to accomplish your goal might be to select the count of records and group by the interval in question. In this case I'm adding an hour to the create_date to get your 1 hour suggested interval. Anytime the count is greater than 50 it returns a row. Notice I'm also grouping by the hour. This is to prevent multiple starts for a "surge" within the same hour:
select min(create_date) as surge_start, count(*) as surge from records
group by year(create_date), month(create_date),
dayofmonth(create_date),hour(create_date),
(create_date + interval 1 hour - create_date) having count(*) > 50;
The problem with this however is that some surges may last longer than 1 hour, but it should give you the moment the "surge" started.
Related
Assuming that there are 6 months of historical data with hundreds of rides per day:
Write a query that returns, for each of the last 90 days, a count of the rides taken in the 7 day window preceding that day
I would like to write this in MySQL, but I have had trouble with two things: producing a rolling sum that resets for each window, and cutting timestamps down to a date so that I can group by it.
I have tried writing subqueries that limit the sum to the prior week and then apply an additional limit of 90 days, but I cannot get the code to return any output.
I have tried writing this in PostgreSQL using a sort of "window" functionality, but I am much more comfortable working in MySQL and would like to solve it that way. I am familiar with limits, grouping, and ordering, among other things, but I am having trouble with the rolling sum resetting per week.
Thank you for your help!
First you'll want a numbers table/query. There are some tricky CTE ways to do that but it might be easier for now just to add a table with the numbers 1-90 in 90 rows.
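For reference, on MySQL 8.0 or later one of those CTE approaches is a recursive common table expression. This is only a sketch; the CTE name numbers and its column num are chosen to match the table described above.

```sql
-- Generates the integers 1 through 90, one per row (MySQL 8.0+).
WITH RECURSIVE numbers (num) AS (
    SELECT 1
    UNION ALL
    SELECT num + 1 FROM numbers WHERE num < 90
)
SELECT num FROM numbers;
```

On older versions, which have no recursive CTEs, a small permanent table filled once by hand remains the simpler route.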
Then use that to generate, for each row, a date range. Sorry if the syntax isn't quite correct, but write a query along the lines:
SELECT num, DATE_ADD(CURRENT_DATE(), INTERVAL -(num+7) DAY) startdate, DATE_ADD(CURRENT_DATE(), INTERVAL -num DAY) enddate FROM numbers
Then you can cross-join that with your rides table grouped on num and counting the rows in the range:
SELECT dts.num, dts.startdate, dts.enddate,
SUM(CASE WHEN dts.startdate <= ridedate AND ridedate < dts.enddate THEN 1 ELSE 0 END) ridecount
FROM (date range query) dts, rides
GROUP BY dts.num, dts.startdate, dts.enddate
Hope that helps.
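Putting the two steps together, a sketch might look like the following. It assumes a numbers table holding 1 to 90 and a rides table with a ridedate column. It uses a LEFT JOIN on the date range instead of a full cross join, so days with no rides still appear with a count of 0, and the range is half-open so that a day is excluded from its own preceding window.

```sql
SELECT dts.num, dts.startdate, dts.enddate,
       COUNT(r.ridedate) AS ridecount   -- counts matched rides only; 0 when none
FROM (
    SELECT num,
           CURRENT_DATE() - INTERVAL (num + 7) DAY AS startdate,
           CURRENT_DATE() - INTERVAL num DAY       AS enddate
    FROM numbers
) AS dts
LEFT JOIN rides r
       ON r.ridedate >= dts.startdate
      AND r.ridedate <  dts.enddate
GROUP BY dts.num, dts.startdate, dts.enddate
ORDER BY dts.num;
```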
Assuming you have data on each day, a correlated subquery might be the simplest approach:
select d.dt,
(select count(*)
from rides r
where r.ridedate >= d.dt - interval 7 day and
r.ridedate < d.dt
) as rolling_7
from (select distinct ridedate as dt
from rides
) d
I'm currently running the following query to get the daily average of entries per user in my database. It works as expected, but I want to modify it to get 7-day averages for each day.
SELECT
AVG(bg),
AVG(carbs),
timestamp
FROM users_entries
WHERE uid = '10b47fded7d2ea8d' AND
timestamp >= '2019-01-01 00:00:00' AND timestamp <= '2019-01-30 00:00:00'
GROUP BY DAY(timestamp)
So for example, for the time frame, say 2019-01-01 00:00:00 to 2019-06-01 00:00:00 I would like to find all averages for 7 days and list them out. Basically take each day in the time frame, go back 7 days and get the average of the columns I select.
I'm thinking that this would require some sort of subquery but based on what I see online I do not understand them well enough to figure it out on my own, any help would be great.
In MySQL 8+, you can use window functions:
SELECT DATE(timestamp),
AVG(bg),
AVG(carbs),
AVG(AVG(bg)) OVER (ORDER BY DATE(timestamp) ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as bg_7,
AVG(AVG(carbs)) OVER (ORDER BY DATE(timestamp) ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as carbs_7
FROM users_entries
WHERE uid = '10b47fded7d2ea8d' AND
timestamp >= '2019-01-01' AND
timestamp < '2019-01-30'
GROUP BY DATE(timestamp);
This is much more challenging in older versions of MySQL.
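For versions before 8.0, one possible workaround is a correlated subquery per day. This is a sketch rather than an exact equivalent: it averages every entry in the 7-day window ending on each day, whereas the window-function query averages the daily averages over the preceding rows. The aliases d and e are illustrative.

```sql
SELECT d.dt,
       (SELECT AVG(e.bg)
        FROM users_entries e
        WHERE e.uid = '10b47fded7d2ea8d'
          AND e.timestamp >= d.dt - INTERVAL 6 DAY
          AND e.timestamp <  d.dt + INTERVAL 1 DAY) AS bg_7,
       (SELECT AVG(e.carbs)
        FROM users_entries e
        WHERE e.uid = '10b47fded7d2ea8d'
          AND e.timestamp >= d.dt - INTERVAL 6 DAY
          AND e.timestamp <  d.dt + INTERVAL 1 DAY) AS carbs_7
FROM (SELECT DISTINCT DATE(timestamp) AS dt   -- one row per day that has entries
      FROM users_entries
      WHERE uid = '10b47fded7d2ea8d'
        AND timestamp >= '2019-01-01'
        AND timestamp <  '2019-01-30') AS d
ORDER BY d.dt;
```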
I'm trying to work out a solution that lets me query a table that has a timestamp column and get time series data in return. The request consists of a start/end date and time, a granularity type (minute, hour, day, week, month, or year), and a granularity value. I have tried using something like this in a query:
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) DIV 60)
That works fine for getting results per minute, or with DIV 300 for every five minutes. The problem arises further up: computing months and years in seconds will be inaccurate. I've stumbled upon generate_series in PGSQL (a MySQL alternative) and am stuck trying to tie the two together. How do I calculate a count of rows, for example, over two days at a 15-minute granularity? It's a complex question that I'll probably have to break down further.
I have already visited #1 and #2, but they are incomplete.
To me it seems that rounding will only be allowed up to a certain level and I'd have to restrict it (i.e., for a 2-month period there cannot be an hourly breakdown).
EDIT
It gave me the wrong impression - I would not have to calculate monthly figures based on seconds using the query like:
SELECT DATE_FORMAT(MIN(created_at),'%d/%m/%Y %H:%i:%s') as date,
COUNT(*) AS count FROM guests
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) / 300)
It only groups based on the minimum value. But the question still stands: is the best approach really to step through the time period using the granularity value and "slice" the data that way without losing too much accuracy?
It seems that the only approach is to run sub-queries for a set of data (i.e. for a period of two months, generate 15 minute intervals timestamps, group the data into them and produce an aggregate) without dividing the original timestamp to produce the rounded approximation.
Let's say you have a gigantic table measure with two columns datestamp and temp.
Let's say you want to see the temperature every six minutes (10x per hour) for the last week. You can do this sort of thing. We'll get to defining trunc in a moment.
SELECT trunc(datestamp) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INTERVAL 7 DAY
GROUP BY trunc(datestamp)
ORDER BY trunc(datestamp)
That works for any reasonable definition of trunc. In this case trunc(t) returns the beginning of the six-minute period in which t occurs. So, trunc('1942-12-07 08:45:17') gives 1942-12-07 08:42:00.
Here's a query that works for every six minute interval.
SELECT DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6) MINUTE datestamp,
AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INTERVAL 7 DAY
GROUP BY DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6) MINUTE
ORDER BY 1
This uses inbuilt date arithmetic rather than unix timestamp arithmetic.
You can use a stored function to make this easier to read.
DELIMITER $$
DROP FUNCTION IF EXISTS TRUNC_N_MINUTES$$
CREATE
FUNCTION TRUNC_N_MINUTES(datestamp DATETIME, n INT)
RETURNS DATETIME DETERMINISTIC NO SQL
COMMENT 'truncate to N minute boundary. For example,
TRUNC_N_MINUTES(sometime, 15) gives the nearest
preceding quarter hour'
RETURN DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD n) MINUTE$$
DELIMITER ;
Then your query will say
SELECT TRUNC_N_MINUTES(datestamp, 6) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INTERVAL 7 DAY
GROUP BY TRUNC_N_MINUTES(datestamp, 6)
ORDER BY TRUNC_N_MINUTES(datestamp, 6)
If you want to summarize by 5-, 10-, 15-, or 20-minute boundaries (20 minutes gives three items per hour), simply use that number in place of 6.
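Applied to the question's example (a count of rows over two days at 15-minute granularity), a sketch could look like this. It assumes the guests table and created_at column from the question and the TRUNC_N_MINUTES function defined above; the aliases slot and row_count are illustrative.

```sql
SELECT TRUNC_N_MINUTES(created_at, 15) AS slot,
       COUNT(*) AS row_count
FROM guests
WHERE created_at >= CURDATE() - INTERVAL 2 DAY
GROUP BY slot
ORDER BY slot;
```

Slots with no rows are simply absent from the result; filling those gaps would need a generated series of slot start times to join against. The interval length should divide 60 evenly, otherwise the last slot of each hour is shorter than the rest.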
You'll need different trunc() functions for hours, etc.
The trunc() function for daily summaries is DATE(datestamp).
For monthly summaries it is LAST_DAY(datestamp). For example,
SELECT LAST_DAY(datestamp) month_ending, AVG(temp) temp
FROM measure
GROUP BY LAST_DAY(datestamp)
ORDER BY LAST_DAY(datestamp)
yields a month-by-month summary.
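Along the same lines, here are possible truncation expressions for two of the other granularities the question mentions. These are sketches: the weekly one assumes weeks start on Monday (WEEKDAY() returns 0 for Monday), and the yearly one labels each group with January 1 of its year.

```sql
-- Weekly summary, labelled with the Monday that starts each week.
SELECT DATE(datestamp) - INTERVAL WEEKDAY(datestamp) DAY AS week_starting,
       AVG(temp) AS temp
FROM measure
GROUP BY week_starting
ORDER BY week_starting;

-- Yearly summary, labelled with the first day of each year.
SELECT MAKEDATE(YEAR(datestamp), 1) AS year_starting,
       AVG(temp) AS temp
FROM measure
GROUP BY year_starting
ORDER BY year_starting;
```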
I need to count the number of records inserted into a table in every half-hour interval. For example, if 5 records were inserted from 11:00 to 11:30 and 4 records from 11:30 to 12:00, how do I find those counts?
You'd need the datetime each row was inserted; it's easiest if that is a column in the table. (We'll assume here that the column is named inserted_dt.)
All that we really need is an expression that operates on inserted_dt to return a single value for every value within a given half hour.
If we needed "hour" intervals, and not "half-hour" intervals, it would be very easy:
DATE_FORMAT(t.inserted_dt,'%Y-%m-%d %H:00:00')
Let's define the first "half-hour" ranges as minutes >= '00' AND minutes < '30'
To get the "minutes" out of the inserted_dt column, we could use either of
EXTRACT(MINUTE FROM t.inserted_dt)
DATE_FORMAT(t.inserted_dt,'%i')
We can use a conditional test to determine whether the minutes value is less than 30, or flip it around and test for greater than or equal to thirty:
DATE_FORMAT(t.inserted_dt,'%i')+0 >= 30
We can put that back together with the "year-month-day-hour", by adding an interval of either 0 or 30 minutes,
DATE_FORMAT(t.inserted_dt,'%Y-%m-%d %H:00:00')
+ INTERVAL 30*(DATE_FORMAT(t.inserted_dt,'%i')+0>=30) MINUTE
(There are lots of expressions we could use to do something similar; this one is just one of the shortest that returns a DATETIME datatype.)
Now, if we add that expression to the SELECT list of our query, we get a value that identifies the "halfhour".
To get a "count" for each half hour range, that's just a simple COUNT() aggregate and a GROUP BY. The "trick" is that we use the new "halfhour" expression in the GROUP BY clause.
SELECT DATE_FORMAT(t.inserted_dt,'%Y-%m-%d %H:00:00')
+ INTERVAL 30*(DATE_FORMAT(t.inserted_dt,'%i')+0>=30) MINUTE AS halfhour
, COUNT(*)
FROM mytable t
GROUP BY halfhour
Obviously, add a WHERE clause if you only want to return results for a specified datetime range,
SELECT DATE_FORMAT(t.inserted_dt,'%Y-%m-%d %H:00:00')
+ INTERVAL 30*(DATE_FORMAT(t.inserted_dt,'%i')+0>=30) MINUTE AS halfhour
, COUNT(*)
FROM mytable t
WHERE t.inserted_dt >= '2014-08-12'
AND t.inserted_dt < '2014-08-12' + INTERVAL 1 DAY
GROUP BY halfhour
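As one of those alternative expressions, the bucket can also be computed with integer arithmetic on the Unix timestamp. This is a sketch under the same assumptions (table mytable, column inserted_dt); 1800 is the number of seconds in half an hour, and record_count is an illustrative alias. It relies on the session time zone's UTC offset being a multiple of 30 minutes, so the date-based expression above is the safer choice in general.

```sql
SELECT FROM_UNIXTIME((UNIX_TIMESTAMP(t.inserted_dt) DIV 1800) * 1800) AS halfhour
     , COUNT(*) AS record_count
FROM mytable t
GROUP BY halfhour
ORDER BY halfhour;
```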
I have a table from which I'm trying to extract summed timediff information grouped by day. I don't really know if this is possible.
Table columns: mode_type, start_time.
A record exists in this table for each time an employee starts or stops a timer. mode_type = 1 for start, mode_type = 0 for stop.
I'd like to return a sum of the seconds used for each day in the last 30 days.
E.g:
date, seconds_used
02/04/2014, 25
03/04/2014, 12415
04/04/2014, 925
Currently I can return a list of seconds used per mode_type and date, but this requires a later calculation in PHP.
SELECT
mode_type,
Sum(Unix_Timestamp(start_time)) AS time,
start_time
FROM
activations
WHERE
start_time < Date(Now() + INTERVAL 1 MONTH)
GROUP BY
mode_type, Day(start_time)
ORDER BY
start_time
I'm stuck... is this possible, or do I need to revert to calculating the diff in PHP after the request?
Thanks in advance.
Can you try with this:
SELECT DATE(start_time) AS startdate, TIME_TO_SEC(TIMEDIFF(NOW(),start_time)) AS secs
FROM activations
GROUP BY
startdate
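If the goal is the seconds between each start and the stop that follows it, summed per day, one possible approach is to pair every start row with the next stop row and then aggregate. This is a sketch under several assumptions that the question does not confirm: starts and stops strictly alternate, the table holds a single timer (no employee column is listed), and a session is credited to the day it started on. The aliases s, e, p and the output column names are illustrative.

```sql
SELECT DATE(p.started) AS used_date,
       SUM(TIMESTAMPDIFF(SECOND, p.started, p.stopped)) AS seconds_used
FROM (
    -- Pair each start (mode_type = 1) with the earliest later stop (mode_type = 0).
    SELECT s.start_time AS started,
           (SELECT MIN(e.start_time)
            FROM activations e
            WHERE e.mode_type = 0
              AND e.start_time > s.start_time) AS stopped
    FROM activations s
    WHERE s.mode_type = 1
      AND s.start_time >= CURDATE() - INTERVAL 30 DAY
) AS p
GROUP BY used_date
ORDER BY used_date;
```

A start with no later stop yields NULL for its duration, which SUM() ignores, so a timer that is still running does not contribute to the total.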