Aggregating data by timespan in MySQL - mysql

Basically, what I want is to aggregate some values in a table according to a timespan.
What I do is, I take snapshots of a system every 15 minutes and I want to be able to draw some graph over a long period. Since the graphs get really confusing if too many points are shown (besides getting really slow to render) I want to reduce the number of points by aggregating multiple points into a single point by averaging over them.
For this I'd have to be able to group by buckets that I can define myself (daily, weekly, monthly, yearly, ...), but so far none of my experiments have worked.
Is there some trick I can apply to do so?

I had a similar question: collating-stats-into-time-chunks and had it answered very well. In essence, the answer was:
Perhaps you can use the DATE_FORMAT() function, and grouping. Here's an example, hopefully you can adapt to your precise needs.
SELECT
DATE_FORMAT( time, "%H:%i" ),
SUM( bytesIn ),
SUM( bytesOut )
FROM
stats
WHERE
time BETWEEN <start> AND <end>
GROUP BY
DATE_FORMAT( time, "%H:%i" )
If your time window covers more than one day and you use the example format, data from different days will be aggregated into 'hour-of-day' buckets. If the raw data doesn't fall exactly on the hour, you can smooth it out by using "%H:00".
Thanks be to martin clayton for the answer he provided me.
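To make the bucketing idea concrete outside the database, here is a minimal Python sketch of the same technique: format each timestamp with a pattern, group on the resulting string, and average each group. The function name, the sample data and the use of Python are assumptions for illustration; strftime patterns stand in for the DATE_FORMAT() patterns above.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def average_by_bucket(samples, bucket_format):
    """Group (timestamp, value) pairs by a strftime pattern and average each group."""
    buckets = defaultdict(list)
    for moment, value in samples:
        # The formatted string is the bucket key, like DATE_FORMAT() in GROUP BY.
        buckets[moment.strftime(bucket_format)].append(value)
    return {label: mean(values) for label, values in sorted(buckets.items())}

# Made-up sample data spanning two days.
samples = [
    (datetime(2024, 1, 1, 10, 0), 10.0),
    (datetime(2024, 1, 1, 10, 15), 20.0),
    (datetime(2024, 1, 1, 11, 0), 30.0),
    (datetime(2024, 1, 2, 10, 0), 50.0),
]

# Pattern without a date: both days collapse into the same hour-of-day bucket.
hour_of_day = average_by_bucket(samples, "%H:00")
# Pattern with a date: one bucket per calendar hour.
per_hour = average_by_bucket(samples, "%Y-%m-%d %H:00")
```

With a pattern that omits the date, the 10:00 samples from both days end up in one bucket, which is the 'hour-of-day' effect described above; including the date in the pattern keeps the days apart.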

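It's easy to truncate times to the last 15 minutes (for example). The query below uses SQL Server's dateadd/datediff syntax; in MySQL, TIMESTAMPADD and TIMESTAMPDIFF play the same roles, with DIV for the integer division: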
SELECT dateadd(minute, datediff(minute, '20000101', yourDateTimeField) / 15 * 15, '20000101') AS the15minuteBlock, COUNT(*) as Cnt
FROM yourTable
GROUP BY dateadd(minute, datediff(minute, '20000101', yourDateTimeField) / 15 * 15, '20000101');
Use similar truncation methods to group by hour, week, whatever.
You could always wrap it up in a CASE statement to handle multiple methods, using:
GROUP BY CASE @option WHEN 'week' THEN dateadd(week, .....
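The arithmetic behind that truncation can be sketched in Python: count whole minutes since a fixed reference date, integer-divide by the block size, multiply back, and add the result to the reference date. The function name and the default block size are assumptions for illustration only.

```python
from datetime import datetime, timedelta

# Same arbitrary reference date as '20000101' in the query above.
EPOCH = datetime(2000, 1, 1)

def floor_to_minutes(moment, block_minutes=15):
    """Truncate a datetime down to the start of its block_minutes-wide block."""
    # Whole minutes elapsed since the reference date.
    elapsed_minutes = (moment - EPOCH) // timedelta(minutes=1)
    # Integer division drops the remainder, so we land on the block start.
    block_start = (elapsed_minutes // block_minutes) * block_minutes
    return EPOCH + timedelta(minutes=block_start)

block = floor_to_minutes(datetime(2024, 3, 5, 10, 44, 59))
```

Changing block_minutes to 60 or 1440 gives hourly or daily blocks with the same code, which is the "similar truncation methods" idea mentioned above.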

As an addition to @cmroanirgo's answer: I didn't need "sums" of data, but averages (to see the average FPS / player count of my game servers). I also need to view the data either in detail per 5 minutes or over an entire week (data gets stored every minute).
As an example, you can use the SQL function AVG instead of SUM to get an average. Also, you have to give your selected values an alias, and it shouldn't be the actual field name (that will conflict later on in your query). Here's the query I'm using to aggregate averages over 1 week, by the hour:
SELECT
DATE_FORMAT( moment, "%Y-%m-%d %H:00" ) as _moment,
AVG( maxplayers ) as _maxplayers,
AVG( players ) as _players,
AVG( servers ) as _servers,
AVG( avarage_fps ) as _avarage_fps,
AVG( avarage_realfps ) as _avarage_realfps,
AVG( avarage_maxfps ) as _avarage_maxfps
FROM
playercount
WHERE
moment BETWEEN "<date minus 1 week>" AND "<now>"
GROUP BY
_moment
ORDER BY _moment ASC
This is then used (together with PHP) in a Bootstrap graph:
<?php
//Do the query here
foreach ($result->fetch_all(MYSQLI_ASSOC) as $item) {
$labels[] = $item['_moment'];
$maxplayers[] = $item['_maxplayers'];
$players[] = $item['_players'];
$servers[] = $item['_servers'];
$fps[] = $item['_avarage_fps'];
$fpsreal[] = $item['_avarage_realfps']/10;
$fpsmax[] = $item['_avarage_maxfps'];
}
?>
var playerChartId = document.getElementById("playerChartId");
var playerChart = new Chart(playerChartId, {
type: 'line',
data: {
labels: ["<?= implode('","', $labels); ?>"],
datasets: [
{
data: [<?= implode(',', $servers); ?>],
borderColor: '#007bff',
pointRadius: 0
},
//etc...

Related

MySql - Calculating distance in time using 2 values from 1 column (Poor design workaround)

I was granted access to a legacy database in order to do some statistics work.
I've so far gotten everything I need out of it, except I am trying to calculate a distance in time, using 5 values, stored in 4 columns (ARGGGHHH)
Above is a subsection of the database.
As you can see, I have start and stop date and time.
I would like to calculate the distance in time from str_date + str_time to stp_date + stp_time
The issue I have is, the calculation should be performed differently depending on the second value in stp_time.
If the second value is "DUR", then I can just take the first value ("01:04:51" in this scenario).
If the second value is anything else, stp_time represents a timecode and not a duration. The result must then be calculated as stp_time - str_time (accounting for the date if it is not the same date).
All data is 24 hour format. I have done work with conditional aggregation, but I have not figured this one out, and I have never worked with a malformed column like this before.
Any and all advice is welcome.
Thanks for reading
SELECT
CASE WHEN RIGHT(stp_time,3)="DUR"
THEN
TIMEDIFF(LEFT(stp_time,8), '00:00:00')
ELSE
TIMEDIFF(
STR_TO_DATE(CONCAT(stp_date," ",LEFT(stp_time,8)), '%d/%b/%Y %H:%i:%s'),
STR_TO_DATE(CONCAT(str_date," ",LEFT(str_time,8)), '%d/%b/%Y %H:%i:%s')
)
END AS diff
FROM so33289063
Try this out; you might want a WHERE condition in the subquery.
With LEFT and RIGHT:
-- dur = 1: stp_time holds a duration, so take its time part; otherwise subtract start from stop
SELECT IF(dur, TIME(stp), TIMEDIFF(stp, str)) AS diff FROM (
SELECT STR_TO_DATE(CONCAT(str_date," ",LEFT(str_time,8)), '%d/%b/%Y %H:%i:%s') AS str,
STR_TO_DATE(CONCAT(stp_date," ",LEFT(stp_time,8)), '%d/%b/%Y %H:%i:%s') AS stp,
IF(RIGHT(stp_time,3)="DUR",1,0) AS dur
FROM my_table
) AS times
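For checking results against a few rows by hand, the same branching can be sketched in Python. This is only an illustration under assumptions: the column layout (an 8-character clock value followed by a marker), the "TC" marker used in the example call, and the date layout mirroring '%d/%b/%Y' are all hypothetical, since the original sample data is not shown.

```python
from datetime import datetime, timedelta

# Assumed layout, mirroring the '%d/%b/%Y %H:%i:%s' format used in the queries above.
STAMP_FORMAT = "%d/%b/%Y %H:%M:%S"

def elapsed(str_date, str_time, stp_date, stp_time):
    """Return the elapsed time as a timedelta, honouring the DUR marker."""
    clock = stp_time[:8]  # first 8 characters, like LEFT(stp_time, 8)
    if stp_time.rstrip().endswith("DUR"):
        # stp_time already holds a duration: use it directly.
        hours, minutes, seconds = (int(part) for part in clock.split(":"))
        return timedelta(hours=hours, minutes=minutes, seconds=seconds)
    # Otherwise stp_time is a timecode: subtract start from stop, dates included.
    start = datetime.strptime(f"{str_date} {str_time[:8]}", STAMP_FORMAT)
    stop = datetime.strptime(f"{stp_date} {clock}", STAMP_FORMAT)
    return stop - start

# Hypothetical rows: one duration, one timecode that crosses midnight.
as_duration = elapsed("23/Oct/2015", "10:00:00 TC", "23/Oct/2015", "01:04:51 DUR")
as_timecode = elapsed("23/Oct/2015", "23:30:00 TC", "24/Oct/2015", "00:15:00 TC")
```

Because the dates take part in the subtraction, a stop time on the following day still yields a positive difference.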

Calculating time difference between activity timestamps in a query

I'm reasonably new to Access and having trouble solving what should be (I hope) a simple problem - think I may be looking at it through Excel goggles.
I have a table named importedData into which I (not so surprisingly) import a log file each day. This log file is from a simple data-logging application on some mining equipment, and essentially it saves a timestamp and status for the point at which the current activity changes to a new activity.
A sample of the data looks like this:
This information is then filtered using a query to define the range I want to see information for, say from 29/11/2013 06:00:00 AM until 29/11/2013 06:00:00 PM
Now the object of this is to take a status entry's timestamp and get the time difference between it and the record on the subsequent row of the query results. As the equipment works for a 12hr shift, I should then be able to build a picture of how much time the equipment spent doing each activity during that shift.
In the above example, the equipment was in status "START_SHIFT" for 00:01:00, in status "DELAY_WAIT_PIT" for 06:08:26 and so-on. I would then build a unique list of the status entries for the period selected, and sum the total time for each status to get my shift summary.
You can use a correlated subquery to fetch the next timestamp for each row.
SELECT
i.status,
i.timestamp,
(
SELECT Min([timestamp])
FROM importedData
WHERE [timestamp] > i.timestamp
) AS next_timestamp
FROM importedData AS i
WHERE i.timestamp BETWEEN #2013-11-29 06:00:00#
AND #2013-11-29 18:00:00#;
Then you can use that query as a subquery in another query where you compute the duration between timestamp and next_timestamp. And then use that entire new query as a subquery in a third where you GROUP BY status and compute the total duration for each status.
Here's my version which I tested in Access 2007 ...
SELECT
sub2.status,
Format(Sum(Nz(sub2.duration,0)), 'hh:nn:ss') AS SumOfduration
FROM
(
SELECT
sub1.status,
(sub1.next_timestamp - sub1.timestamp) AS duration
FROM
(
SELECT
i.status,
i.timestamp,
(
SELECT Min([timestamp])
FROM importedData
WHERE [timestamp] > i.timestamp
) AS next_timestamp
FROM importedData AS i
WHERE i.timestamp BETWEEN #2013-11-29 06:00:00#
AND #2013-11-29 18:00:00#
) AS sub1
) AS sub2
GROUP BY sub2.status;
If you run into trouble or need to modify it, break out the innermost subquery, sub1, and test that by itself. Then do the same for sub2. I suspect you will want to change the WHERE clause to use parameters instead of hard-coded times.
Note the query Format expression would not be appropriate if your durations exceed 24 hours. Here is an Immediate window session which illustrates the problem ...
' duration greater than one day:
? #2013-11-30 02:00# - #2013-11-29 01:00#
1.04166666667152
' this Format() makes the 25 hr. duration appear as 1 hr.:
? Format(#2013-11-30 02:00# - #2013-11-29 01:00#, "hh:nn:ss")
01:00:00
However, if you're dealing exclusively with data from 12 hr. shifts, this should not be a problem. Keep it in mind in case you ever need to analyze data which spans more than 24 hrs.
If subqueries are unfamiliar, see Allen Browne's page: Subquery basics. He discusses correlated subqueries in the section titled Get the value in another record.
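The overall logic (pair each row with the next timestamp, take the difference, then total per status) can also be sketched in Python for comparison. The function names, the "LOADING" status and the sample timestamps are assumptions; the two durations match the figures quoted in the question. The formatter keeps counting hours past 24 instead of wrapping, which avoids the Format() issue shown above.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def time_per_status(events):
    """events: (timestamp, status) pairs. Returns the total timedelta per status."""
    ordered = sorted(events)
    totals = defaultdict(timedelta)
    # Pair every event with its successor; the last event has no successor
    # and therefore contributes no duration.
    for (started, status), (next_started, _) in zip(ordered, ordered[1:]):
        totals[status] += next_started - started
    return dict(totals)

def format_duration(duration):
    """Render a timedelta as hh:mm:ss without wrapping at 24 hours."""
    total_seconds = int(duration.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

# Made-up shift log; "LOADING" is a hypothetical status name.
events = [
    (datetime(2013, 11, 29, 6, 0, 0), "START_SHIFT"),
    (datetime(2013, 11, 29, 6, 1, 0), "DELAY_WAIT_PIT"),
    (datetime(2013, 11, 29, 12, 9, 26), "LOADING"),
]
summary = {status: format_duration(total) for status, total in time_per_status(events).items()}
```

A 25-hour duration is rendered as 25:00:00 here rather than 01:00:00.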

mysql - set value of cell equal to value of cell in another row

I have a MySQL query that generates a table for my vehicle tracking 'in' and 'out' times.
The problem is that the 'in' time is not the same as the 'out' time so seconds or minutes are lost in between.
Is there a way to set the 'in' time equal to the 'out time' from the previous row, even if I need to embed my current select inside a new select?
You will see in the image below that the first row's out time is 15:45:14 and the in time for the next row is 15:46:14, so in this case a minute is lost.
In reality, once a vehicle has left one point it is immediately on the road to the next point, so I can set the in time equal to the out time of the previous row. This way, time is never lost.
the sql for my query is:
select vehicle,InTime,OutTime from (select
PreQuery.callingname as vehicle,
PreQuery.geofence,
PreQuery.GroupSeq,
MIN( PreQuery.`updatetime` ) as InTime,
UNIX_TIMESTAMP(MIN( PreQuery.`updatetime`))as InSeconds,
MAX( PreQuery.`updatetime` ) as OutTime,
UNIX_TIMESTAMP(MAX( PreQuery.`updatetime`))as OutSeconds,
TIME_FORMAT(SEC_TO_TIME((UNIX_TIMESTAMP(MAX( PreQuery.`updatetime` )) - UNIX_TIMESTAMP(MIN( PreQuery.`updatetime`)))),'%H:%i:%s') as Duration,
(UNIX_TIMESTAMP(MAX( PreQuery.`updatetime` )) - UNIX_TIMESTAMP(MIN( PreQuery.`updatetime`))) as DurationSeconds
from
( select
v_starting.callingname,
v_starting.geofence,
v_starting.`updatetime`,
@lastGroup := @lastGroup + if( @lastAddress = v_starting.geofence
AND @lastVehicle = v_starting.callingname, 0, 1 ) as GroupSeq,
@lastVehicle := v_starting.callingname as justVarVehicleChange,
@lastAddress := v_starting.geofence as justVarAddressChange
from
v_starting,
( select @lastVehicle := '',
@lastAddress := '',
@lastGroup := 0 ) SQLVars
order by
v_starting.`updatetime` ) PreQuery
Group By
PreQuery.callingname,
PreQuery.geofence,
PreQuery.GroupSeq) parent
where (InTime> DATE_SUB('2013-03-23 15:00', INTERVAL 24 HOUR) or OutTime> '2013-03-23 15:00' ) and vehicle='TT08' order by InTime asc
The MySQL query is detailed and therefore quite large, but the same thing could be shown on a much simpler query, like:
select vehicle, intime,outtime from vehicletimes
My desired result is something like:
select vehicle, intime(outtime of row above),outtime from vehicletimes
The first rows in time can be as is and the last rows outtime can be as is. I just need to account for every second between the smallest in time and the largest out time.
Any help appreciated as always.
Thanks in advance
I think this will give you the latest out-time at or before each current in-time, for your existing records:
select
vt.vehicle, max(qGetMaxOut.outtime) as intime , vt.outtime
from
vehicle_times vt
inner join
(
select vehicle, outtime
from vehicle_times
) qGetMaxOut
on qGetMaxOut.vehicle = vt.vehicle
and qGetMaxOut.outtime <= vt.intime
group by
vt.vehicle, vt.outtime
The above query will also help you if you want to insert a new record, but need to find the previous out-time for a particular time (i.e. if you need to insert a new record whose in/out times are prior to the latest time, e.g. inserting a record that was somehow previously missed and where newer time entries have been added since). If you need this scenario, let me know and I'll elaborate if you can't work it out from the above.
The join basically joins the table "back on itself" to provide another "copy", but limits the results in the "copy" to only those rows for the current vehicle in the main table, and excludes those rows from the copy where the vehicle's out-time is more recent than the current in-time from the main table. This way you can do a MAX() over the copy, to find what the previous out time was.
I don't know your specific requirements, but I would recommend storing the most accurate information you can. So if "synthesising" a value is just for cosmetic purposes on a few reports, I would leave the data alone and tidy up the report, rather than losing data that might come in handy down the track. For example, what happens if in the future you suddenly have a requirement to tell your boss "how long are our vehicles 'in' and sitting idle for?"
But if you do just want to insert a new record with its actual in-time ignored and replaced by the out-time of the most recent record, then the following query will find that value for you:
select
vt.vehicle, max(vt.outtime) as intime
from
vehicle_times vt
group by
vt.vehicle
Have I missed your requirement?
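If the adjustment is done in application code instead of SQL, the "in time = previous row's out time" rule can be sketched as below. This is an illustrative Python sketch under assumptions: rows are (vehicle, in_time, out_time) tuples, the function name is made up, and the third timestamp pair in the example is invented. The first row of each vehicle keeps its own in time, as asked in the question.

```python
from datetime import datetime

def close_gaps(visits):
    """visits: (vehicle, in_time, out_time) tuples in any order.

    Returns the rows sorted per vehicle by in_time, with each in_time
    replaced by the previous out_time of the same vehicle."""
    previous_out = {}
    result = []
    for vehicle, in_time, out_time in sorted(visits, key=lambda v: (v[0], v[1])):
        # The first row of a vehicle has no predecessor and keeps its in_time.
        adjusted_in = previous_out.get(vehicle, in_time)
        result.append((vehicle, adjusted_in, out_time))
        previous_out[vehicle] = out_time
    return result

rows = [
    ("TT08", datetime(2013, 3, 23, 15, 40, 0), datetime(2013, 3, 23, 15, 45, 14)),
    ("TT08", datetime(2013, 3, 23, 15, 46, 14), datetime(2013, 3, 23, 16, 10, 0)),
]
adjusted = close_gaps(rows)
```

The stored data stays untouched; only the values shown in the report change.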

How to get count of records added since the start of the current month

I am building a web service using PHP and MySQL, and would like to limit requests from each API key to (x) requests within (y) time.
To achieve this, I count the records added to the log by each API key within a period, using an SQL statement like this:
SELECT COUNT(*) FROM ws_account_log WHERE account_log_account_id='1' AND account_log_timestamp > DATE_SUB(now(), INTERVAL 1 HOUR)
This gives me a rolling hourly count, which is fine, but I would like a hard limit on the monthly count, i.e. how many rows have been added since 00:00 on the first day of the current month.
I have seen examples using stored procedures and different syntax, I would like an answer that will work on MYSQL please. Also, if possible the fastest implementation as one of the services provides autocomplete functions, therefore the log table is rather large.
Thanks
SELECT COUNT(*)
FROM ws_account_log
WHERE
account_log_account_id='1'
AND account_log_timestamp >= SUBDATE(CURDATE(), DAYOFMONTH(CURDATE())-1)
Calculate the timestamp for the moment you want in PHP, then pass it to the MySQL query.
$date = mktime(0, 0, 0, date("n"), 1, date("Y"));
$mysqldate = date( 'Y-m-d H:i:s', $date );
// Quote the date literal and use >= so rows stamped exactly at midnight are counted
$query = " ... AND account_log_timestamp >= '".$mysqldate."'";
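The same "compute the boundary in the application, then pass it to the query" approach, sketched in Python for comparison. The function names are assumptions; the %s placeholders follow the parameter style used by common Python MySQL drivers, and no database call is made here.

```python
from datetime import datetime

def start_of_month(now):
    """Return 00:00:00 on the first day of the month that contains now."""
    return now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

def monthly_count_query(account_id, now):
    """Build the SQL text and its parameters; executing it is left to the caller."""
    sql = (
        "SELECT COUNT(*) FROM ws_account_log "
        "WHERE account_log_account_id = %s AND account_log_timestamp >= %s"
    )
    boundary = start_of_month(now).strftime("%Y-%m-%d %H:%M:%S")
    return sql, (account_id, boundary)

sql, params = monthly_count_query(1, datetime(2024, 2, 29, 13, 5, 7))
```

Passing the boundary as a parameter rather than concatenating it into the string sidesteps the quoting of the date literal.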

Grouping Unix Timestamp by Day Producing Unevenly Spaced Groups

I'm using a MySQL query to pull a range of datetimes as a Unix Timestamp (because I'll be converting them to Javascript time). I'm grouping by 'FROM_UNIXTIME' as below:
SELECT
UNIX_TIMESTAMP(DateAndTime) as x,
Sum(If(Pass='Pass',1,0)) AS y,
Sum(If(Pass='Fail',1,0)) AS z,
Sum(If(Pass='Fail',1,0))/(Sum(If(Pass='Pass',1,0))+Sum(If(Pass='Fail',1,0))) AS a,
cases.primaryApp
FROM casehistory, cases
WHERE DATE_SUB(CURDATE(),INTERVAL 80 DAY) <= DateAndTime
AND cases.caseNumber = casehistory.caseNumber
AND cases.primaryApp = 'Promo'
GROUP BY FROM_UNIXTIME(x, '%Y-%m-%d')
While I'd expected my timestamps to be returned evenly spaced (that is, the same amount of time between each day/group), I get the following series:
1300488140, 1300501520,
1300625099, 1300699980
All the other data from the query is correct, but because the spacing of the timestamps is irregular, a bar chart based on these stamps looks pretty awful. Perhaps I'm doing something wrong in the way I apply the grouping?
Thank you for the reply. My query 'made sense' in that it produced data that could be plotted (the grouping was done on the x alias for the dateandtime value), but the problem was that pulling a Unix timestamp from the database and grouping by day returned a series of timestamps that did not have equal distance between them.
I solved this by pulling only the day (without the time) from the datetime MySQL field, then - in PHP - concatenating an empty time to the date, converting the resulting string to a time, then multiplying the whole shebang by 1000 to return the Javascript time I needed for the charting, like this:
$x = $x . ' 00:00:00';
$x = strtotime($x) * 1000;
The answer put me on the right track; I'll accept it. My chart looks perfect now.
Question is very confused.
Your SQL statement makes no sense: you are grouping by entities not found in the select statement. And a bar chart plots an ordered set of values, so if there's something funny with the spacing then it's not really a bar chart.
But I think the answer you are looking for is:
SELECT DATE_FORMAT(dateandtime, '%Y-%m-%d') as ondate
, SUM(IF(Pass='Pass',1,0)) AS passed
, SUM(IF(Pass='Fail',1,0)) AS failed
, SUM(IF(Pass='Fail',1,0))
/(SUM(IF(pass='Pass',1,0))+SUM(IF(Pass='Fail',1,0))) AS fail_pct
, cases.primaryapp
FROM casehistory, cases
WHERE DATE_SUB(CURDATE(),INTERVAL 80 DAY) <= dateandtime
AND cases.casenumber = casehistory.casenumber
AND cases.primaryapp = 'Promo'
GROUP BY DATE_FORMAT(dateandtime, '%Y-%m-%d')
ORDER BY 1;
And if you need Unix timestamps, wrap the above in....
SELECT UNIX_TIMESTAMP(STR_TO_DATE(CONCAT(ilv.ondate, ' 00:00:00'), '%Y-%m-%d %H:%i:%s')) AS tstamp
, passed
, failed
, fail_pct
, primaryapp
FROM (
...
) AS ilv
Note that you'll still get anomalies around DST switches.
C.
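To see why snapping each group to midnight evens out the spacing, here is a small Python sketch applied to the four timestamps from the question. It assumes the day boundaries are taken in UTC, which is a simplification: the queries above use the server's time zone, where a DST switch makes one day 23 or 25 hours long. The function name is made up.

```python
from datetime import datetime, timezone

def day_start_ms(unix_seconds):
    """Snap a Unix timestamp to 00:00:00 UTC of its day, in JavaScript milliseconds."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    midnight = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    return int(midnight.timestamp()) * 1000

# The unevenly spaced series from the question.
raw = [1300488140, 1300501520, 1300625099, 1300699980]
snapped = [day_start_ms(value) for value in raw]
# Distance between consecutive points after snapping.
gaps = [later - earlier for earlier, later in zip(snapped, snapped[1:])]
```

After snapping, every gap is exactly one day (86,400,000 ms), so the bars are evenly spaced.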