Group By sets of values - mysql

For a report I'm trying to query events from different shifts. The shifts start at 6 am, 2 pm, and 10 pm every day, and all of the data in the table is tagged with a datetime timestamp. Previously the graveyard shift wasn't doing anything important, so a simple group by DATE(stamp) was sufficient, but now it's 24/7 and I need to break it down into shifts.
Can anyone explain to me how to use a single group by clause to combine datetime values from a range or a set of values? The difficulty is that each graveyard shift spans two calendar days.
I've considered populating a table with 24 hours and shift numbers, then outer joining it and group by DATE(stamp), HOUR(stamp), but that seems hackish and possibly not even working, plus it would give 24 values for each day instead of 3, which then have to be combined in a superquery or script.
MySQL-specific is perfectly ok, that's all we ever use in the reporting.

Since they are all 8-hour shifts, offset by 6 hours from starting at midnight, you turn Stamp into the start-of-shift time like this:
select
stamp,
adddate(date(subdate(stamp, interval 6 hour)),
interval ((hour(subdate(stamp, interval 6 hour))
div 8) * 8) + 6 hour) as shift_start
from mytable;
This subtracts 6 hours, then rounds the hour down to 0, 1, or 2 by using integer division by 8, then expands it out again and adds the 6 hours back.
Here's the test code with some edge cases:
create table mytable (stamp datetime);
insert into mytable values ('2011-08-17 22:00:00'), ('2011-08-17 23:01:00'),
('2011-08-18 00:02:00'), ('2011-08-18 05:59:00'), ('2011-08-18 06:00:00'),
('2011-08-18 13:59:00'), ('2011-08-18 14:00:00'), ('2011-08-18 17:59:00');
Output of above query:
+---------------------+---------------------+
| stamp | shift_start |
+---------------------+---------------------+
| 2011-08-17 22:00:00 | 2011-08-17 22:00:00 |
| 2011-08-17 23:01:00 | 2011-08-17 22:00:00 |
| 2011-08-18 00:02:00 | 2011-08-17 22:00:00 |
| 2011-08-18 05:59:00 | 2011-08-17 22:00:00 |
| 2011-08-18 06:00:00 | 2011-08-18 06:00:00 |
| 2011-08-18 13:59:00 | 2011-08-18 06:00:00 |
| 2011-08-18 14:00:00 | 2011-08-18 14:00:00 |
| 2011-08-18 17:59:00 | 2011-08-18 14:00:00 |
+---------------------+---------------------+
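The same arithmetic can be checked outside the database. Below is a minimal Python sketch of the subtract, integer-divide, and expand steps; the function name `shift_start` is only an illustration and is not part of the original answer.

```python
from datetime import datetime, timedelta

def shift_start(stamp: datetime) -> datetime:
    """Return the start (06:00, 14:00 or 22:00) of the 8-hour shift containing stamp."""
    shifted = stamp - timedelta(hours=6)   # shift boundaries now fall on 0, 8, 16
    midnight = shifted.replace(hour=0, minute=0, second=0, microsecond=0)
    block = shifted.hour // 8              # 0, 1 or 2
    return midnight + timedelta(hours=block * 8 + 6)

# A few of the edge cases from the test data above
for text in ["2011-08-17 22:00:00", "2011-08-18 05:59:00", "2011-08-18 06:00:00"]:
    stamp = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    print(stamp, "->", shift_start(stamp))
```

The graveyard rows after midnight land on the previous day's 22:00 because the 6-hour subtraction is applied before the date is taken.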

Try this:
GROUP BY DATE(DATE_ADD(Stamp,INTERVAL -6 HOUR))
That should keep each graveyard shift on a single day, namely the day on which it started.
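Grouping on the shifted date alone gives one group per day, not one per shift. A possible extension, sketched here in Python as an assumption rather than as part of the answer, is to add a shift number derived from the shifted hour. In MySQL that second key would presumably be `HOUR(DATE_SUB(stamp, INTERVAL 6 HOUR)) DIV 8`. The name `shift_key` and the 1/2/3 numbering are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

def shift_key(stamp: datetime):
    """Return (shift day, shift number); 1 = 06:00, 2 = 14:00, 3 = 22:00 (assumed numbering)."""
    shifted = stamp - timedelta(hours=6)
    return shifted.date(), shifted.hour // 8 + 1

stamps = [
    datetime(2011, 8, 17, 22, 0), datetime(2011, 8, 17, 23, 1),
    datetime(2011, 8, 18, 0, 2), datetime(2011, 8, 18, 5, 59),
    datetime(2011, 8, 18, 6, 0), datetime(2011, 8, 18, 13, 59),
    datetime(2011, 8, 18, 14, 0), datetime(2011, 8, 18, 17, 59),
]

# Count events per (day, shift), the equivalent of GROUP BY on both keys
counts = Counter(shift_key(s) for s in stamps)
for key, total in sorted(counts.items()):
    print(key, total)
```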

I think you should pursue your "table with hours and shift numbers" approach. Further, I think you should consider using a calendar table, i.e. a table covering not just 24 hours but the whole past, present and future of your enterprise's expected needs. This is not hackish: rather, it is a tried and tested approach. The idea is that SQL is a declarative language designed to query data in tables, so a declarative, data-driven solution makes a lot of sense.

Related

SQL max date related issue

I'm having a bit of an issue with max(date) in SQL.
Basically, I have to check whether the latest date entered for each user is more than 1 day old, and if so return that date.
id | user_id | send_date
8 | 90 | 2016-10-21 14:31:14
10 | 90 | 2016-10-25 09:56:28
11 | 18 | 2016-10-22 09:56:28
12 | 19 | 2016-10-21 09:56:28
13 | 19 | 2016-10-23 09:56:28
13 | 20 | 2016-10-25 09:56:28
This is part of a much longer SQL (just the part that I have a problem with):
SELECT max(h.send_date) as lastSent
FROM history h
WHERE (h.send_date < NOW() - INTERVAL 1 DAY);
Now what happens is that instead of selecting rows where the latest entered date is older than 1 day, I get the latest one that is older than 1 day even if there's a newer entry in the table.
Does anyone have an idea how to change it so that SQL would only return the latest date when it's older than 24h and is also the newest (by user) in the table (in the example, it would have to return nothing because there's an entry less than 24h old)?
Edited the table example a bit. This is what I need to get as a result (user_ids 90 and 20 get ignored because of 2016-10-25 09:56:28):
18 | 2016-10-22 09:56:28
19 | 2016-10-23 09:56:28
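To filter on the result of an aggregate function, use HAVING rather than WHERE. Grouping by user_id gives one row per user, as in your expected result:
SELECT h.user_id, MAX(h.send_date) AS lastSent
FROM history h
GROUP BY h.user_id
HAVING MAX(h.send_date) < DATE_SUB(NOW(), INTERVAL 1 DAY);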
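The difference between filtering before and after aggregation can be sketched in Python with the sample rows. This is only an illustration: the fixed `NOW` value is an assumption chosen to match the example, and grouping by user is inferred from the expected output in the question.

```python
from datetime import datetime, timedelta

NOW = datetime(2016, 10, 25, 12, 0, 0)   # assumed "current time" for the example
CUTOFF = NOW - timedelta(days=1)

rows = [  # (user_id, send_date)
    (90, datetime(2016, 10, 21, 14, 31, 14)),
    (90, datetime(2016, 10, 25, 9, 56, 28)),
    (18, datetime(2016, 10, 22, 9, 56, 28)),
    (19, datetime(2016, 10, 21, 9, 56, 28)),
    (19, datetime(2016, 10, 23, 9, 56, 28)),
    (20, datetime(2016, 10, 25, 9, 56, 28)),
]

def latest_per_user(pairs):
    latest = {}
    for user_id, sent in pairs:
        if user_id not in latest or sent > latest[user_id]:
            latest[user_id] = sent
    return latest

# WHERE-style: drop recent rows first, then aggregate (user 90 wrongly survives)
where_style = latest_per_user((u, d) for u, d in rows if d < CUTOFF)

# HAVING-style: aggregate first, then keep users whose latest date is old enough
having_style = {u: d for u, d in latest_per_user(rows).items() if d < CUTOFF}

print(sorted(where_style))   # includes user 90
print(sorted(having_style))  # only users 18 and 19
```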

Convert MySQL now() based on timezone of rows being searched

Suppose I have data in a table "events" structured like this:
eventid | datetime_start | datetime_end | timezone
001 | 2016-01-01 10:00:00 | 2016-01-01 14:00:00 | America/Los_Angeles
002 | 2016-01-03 19:00:00 | 2016-01-03 22:00:00 | America/Los_Angeles
003 | 2016-01-17 02:00:00 | 2016-01-17 06:00:00 | America/New_York
004 | 2016-01-31 23:00:00 | 2016-02-01 01:00:00 | America/Los_Angeles
The timezone column allows dates/times to be stored exactly as entered rather than normalized to UTC, GMT, etc.
I want to query the table to find eventids where "now()" falls between datetime_start and datetime_end:
SELECT eventid FROM events WHERE now() BETWEEN datetime_start AND datetime_end
However, since "now()" is based on a fixed timezone (UTC in my case), is there any way to convert "now()" to match the timezone column row by row as it searches? Maybe something in the spirit of the following:
SELECT eventid FROM events WHERE CONVERT_TZ(now(),'UTC',timezone) BETWEEN datetime_start
AND datetime_end

Select distinct and get sum of timestamp differences

I don't know if this is possible, but it'd be really awesome. I have a table of sign-ins for people who are logging time on different projects and I need to compile a report of time logged for each project for a given time period.
My table looks something like this:
id | project | time_in | time_out | break
----------------------------------------------------------------
1 | 1 | 2014-12-07 05:00:00 | 2014-12-07 10:00:00 | 30
2 | 2 | 2014-12-07 06:00:00 | 2014-12-07 13:00:00 | 15
3 | 1 | 2014-12-07 14:00:00 | 2014-12-07 18:00:00 | 0
4 | 3 | 2014-12-07 08:30:00 | 2014-12-07 18:45:00 | 75
5 | 2 | 2014-12-07 12:00:00 | 2014-12-07 16:30:00 | 0
What I'd like to be able to do is get a report of the time logged for each project given a date range, i.e. the total time, probably in seconds, logged for each project.
time_in and time_out are fields of type TIMESTAMP; break is an integer representing the number of minutes the person was on break. I need to get the sum of time_out - time_in - break for each project, e.g. for December 7:
project | time
---------------
1 | 30600
2 | 40500
3 | 32400
This is all I have so far:
SELECT DISTINCT
`project`
FROM `sign_ins`
WHERE
`time_in` >= '2014-12-07 00:00:00' AND
`time_out` <= '2014-12-08 00:00:00';
I appreciate your help on this, SO community. You guys are so brilliant.
You can get the difference in seconds by converting the date/time values to Unix time stamps. Then, just aggregate the differences using sum():
SELECT project,
SUM(UNIX_TIMESTAMP(time_out) - UNIX_TIMESTAMP(time_in) - (break * 60)) as DiffSecs
FROM `sign_ins`
WHERE `time_in` >= '2014-12-07 00:00:00' AND
`time_out` <= '2014-12-08 00:00:00'
GROUP BY project;
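To sanity-check what that query should return for the sample rows, here is a small Python sketch of the same sum. It is not a replacement for the SQL, only the arithmetic done by hand: seconds between time_in and time_out, minus break minutes times 60, summed per project.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
rows = [  # (project, time_in, time_out, break in minutes)
    (1, "2014-12-07 05:00:00", "2014-12-07 10:00:00", 30),
    (2, "2014-12-07 06:00:00", "2014-12-07 13:00:00", 15),
    (1, "2014-12-07 14:00:00", "2014-12-07 18:00:00", 0),
    (3, "2014-12-07 08:30:00", "2014-12-07 18:45:00", 75),
    (2, "2014-12-07 12:00:00", "2014-12-07 16:30:00", 0),
]

range_start = datetime(2014, 12, 7)
range_end = datetime(2014, 12, 8)

totals = defaultdict(int)
for project, time_in, time_out, break_min in rows:
    t_in = datetime.strptime(time_in, FMT)
    t_out = datetime.strptime(time_out, FMT)
    if t_in >= range_start and t_out <= range_end:   # same filter as the WHERE clause
        totals[project] += int((t_out - t_in).total_seconds()) - break_min * 60

for project in sorted(totals):
    print(project, totals[project])
```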

Timezone issue on Simple temperature trend using average for last hour in mysql

I would like to use highcharts to see how the temperature behaved over the last hour.
I record 20 to 30 temperature values each hour.
For the last hour, I would like to extract 4 to 6 average values (one value per 10- or 15-minute period) and plot them. Maybe I will change that to 3 values (one per 20 minutes) to get something smoother.
I have values like this, for example:
mysql> SELECT date,valeur FROM temperature
+---------------------+--------+
| date | valeur |
+---------------------+--------+
| 2013-09-26 11:30:40 | 25.2 |
| 2013-09-26 11:33:19 | 25.4 |
| 2013-09-26 11:34:12 | 25.5 |
| 2013-09-26 11:38:37 | 25.4 |
| 2013-09-26 11:39:30 | 25.4 |
| 2013-09-26 11:40:23 | 25.4 |
| 2013-09-26 11:43:02 | 25.4 |
| 2013-09-26 11:45:41 | 25.3 |
| 2013-09-26 11:47:33 | 25.3 |
| 2013-09-26 11:51:07 | 25.4 |
| 2013-09-26 11:51:52 | 25.3 |
...
I tried to extract with this command :
SELECT ROUND(UNIX_TIMESTAMP(date)/(15 * 60)) AS timekey, ROUND(AVG(valeur),1) AS a FROM temperature WHERE date >= (now() - INTERVAL 1 HOUR) GROUP BY timekey ORDER BY DATE;
But I don't get any output. If I change the interval to 5 hours, I get 16 values :
[1534861, 24.600000]
[1534862, 24.600000]
[1534863, 24.600000]
[1534864, 24.700000]
[1534865, 24.700000]
[1534866, 24.600000]
[1534867, 24.600000]
[1534868, 24.600000]
[1534869, 24.600000]
[1534870, 24.600000]
[1534871, 24.700000]
[1534872, 24.700000]
[1534873, 24.700000]
[1534874, 24.800000]
[1534875, 25.000000]
[1534876, 25.200000]
Any idea how to correct this MySQL query?
Thank you all
Greg
edit - See selected answer : the code was good, but the timezone wasn't !
I am guessing your issue is most likely a timezone difference of 1 hour.
If you get no values for the last hour but 16 for the last 5 (16 buckets of 15 minutes make up 4 hours' worth of values), that suggests there are no rows in the last hour as the server sees it. If you are certain there should be, check the timezone of the data against the timezone of now().
Perhaps try using sysdate(). Quote from the manual:
In addition, the SET TIMESTAMP statement affects the value returned by
NOW() but not by SYSDATE(). This means that timestamp settings in the
binary log have no effect on invocations of SYSDATE(). Setting the
timestamp to a nonzero value causes each subsequent invocation of
NOW() to return that value. Setting the timestamp to zero cancels this
effect so that NOW() once again returns the current date and time.
See the description for SYSDATE() for additional information about the
differences between the two functions.
This should work (note floor() usage):
SELECT from_unixtime(floor(unix_timestamp(date) / (15 * 60)) * 15 * 60) AS tstamp,
round(avg(valeur),1) AS a
FROM temperature
WHERE date >= (now() - INTERVAL 1 HOUR)
GROUP BY 1
ORDER BY 1
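One thing to watch in this kind of expression is operator precedence: without parentheses, `x / 15 * 60` is evaluated left to right as `(x / 15) * 60`, which is not a division by 900 seconds. The Python sketch below shows the difference on an arbitrary Unix timestamp (the value itself is just an example, and the function names are hypothetical).

```python
import math

BUCKET = 15 * 60  # 15 minutes in seconds

def bucket_without_parens(ts: int) -> int:
    # Evaluates as floor((ts / 15) * 60) * 15 * 60, far larger than ts
    return math.floor(ts / 15 * 60) * 15 * 60

def bucket_with_parens(ts: int) -> int:
    # Divides by the full bucket width, floors, then scales back up
    return math.floor(ts / (15 * 60)) * (15 * 60)

ts = 1380195040  # arbitrary example timestamp
good = bucket_with_parens(ts)
bad = bucket_without_parens(ts)

print(ts - good)   # offset into the 15-minute bucket, between 0 and 899
print(bad > ts)    # True: the unparenthesised form overshoots
```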

Grouping MySQL datetime into intervals irrespective of timezone

This question has been asked before but I am facing a slightly different problem.
I have a table which logs events and stores their timestamps (as datetime). I need to be able to break up time into chunks and get number of events that occurred in that interval. The interval can be custom (Say from 5 minutes to 1 hour and even beyond).
The obvious solution is to convert the datetime to unix_timestamp divide it by number of seconds in the interval, take its floor function and multiply it back by the number of seconds. Finally convert the unix_timestamp back to the datetime format.
This works fine for small intervals.
select
from_unixtime(floor(unix_timestamp(event.timestamp)/300)*300) as start_time,
count(*) as total
from event
where timestamp>='2012-08-03 00:00:00'
group by start_time;
This gives the correct output
+---------------------+-------+
| start_time | total |
+---------------------+-------+
| 2012-08-03 00:00:00 | 11 |
| 2012-08-03 00:05:00 | 4 |
| 2012-08-03 00:10:00 | 4 |
| 2012-08-03 00:15:00 | 7 |
| 2012-08-03 00:20:00 | 8 |
| 2012-08-03 00:25:00 | 1 |
| 2012-08-03 00:30:00 | 1 |
| 2012-08-03 00:35:00 | 3 |
| 2012-08-03 00:40:00 | 3 |
| 2012-08-03 00:45:00 | 5 |
~~~~~OUTPUT SNIPPED~~~~~~~~~~~~
But if I increase the interval to say 1 hour (3600 sec)
mysql> select from_unixtime(floor(unix_timestamp(event.timestamp)/3600)*3600) as start_time, count(*) as total from event where timestamp>='2012-08-03 00:00:00' group by start_time;
+---------------------+-------+
| start_time | total |
+---------------------+-------+
| 2012-08-02 23:30:00 | 35 |
| 2012-08-03 00:30:00 | 30 |
| 2012-08-03 01:30:00 | 12 |
| 2012-08-03 02:30:00 | 18 |
| 2012-08-03 03:30:00 | 12 |
| 2012-08-03 04:30:00 | 4 |
| 2012-08-03 05:30:00 | 3 |
| 2012-08-03 06:30:00 | 13 |
| 2012-08-03 07:30:00 | 269 |
| 2012-08-03 08:30:00 | 681 |
| 2012-08-03 09:30:00 | 1523 |
| 2012-08-03 10:30:00 | 911 |
+---------------------+-------+
The reason, as far as I could gauge, for the boundaries not being set properly is that unix_timestamp will convert time from my local timezone (GMT + 0530) to UTC and then output the numerical value.
So a value like 2012-08-03 00:00:00 will actually be 2012-08-02 18:30:00. Dividing and using floor will set the minutes part to 00. But when I use from_unixtime, it will convert it back to GMT + 0530 and hence give me intervals that begin at 30 mins.
How do I ensure the query works correctly irrespective of the timezone? I use MySQL 5.1.52 so to_seconds() is not available
EDIT:
The query should also fire correctly irrespective of the interval (can be hours, minutes, days). A generic solution would be appreciated
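The effect described above can be reproduced outside MySQL. The following Python sketch assumes a fixed session offset of +05:30, as in the question, and floors the Unix timestamp directly; the helper name `naive_bucket` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

LOCAL = timezone(timedelta(hours=5, minutes=30))  # assumed session timezone (GMT+0530)

def naive_bucket(local_dt: datetime, seconds: int) -> datetime:
    """Floor the Unix timestamp to a multiple of `seconds`, then convert back to local time."""
    unix = int(local_dt.replace(tzinfo=LOCAL).timestamp())
    floored = unix // seconds * seconds
    return datetime.fromtimestamp(floored, tz=LOCAL).replace(tzinfo=None)

event = datetime(2012, 8, 3, 0, 10, 0)
print(naive_bucket(event, 300))    # 5-minute buckets line up with local time
print(naive_bucket(event, 3600))   # 1-hour buckets start at :30 local time
```

Intervals that divide the 30-minute offset evenly (such as 5 minutes) are unaffected, which is why the problem only shows up for the larger intervals.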
You can use TIMESTAMPDIFF to group by intervals of time:
For a specified interval of hours, you can use:
SELECT '2012-08-03 00:00:00' +
INTERVAL FLOOR(TIMESTAMPDIFF(HOUR, '2012-08-03 00:00:00', timestamp) / <n>) * <n> HOUR AS start_time,
COUNT(*) AS total
FROM event
WHERE timestamp >= '2012-08-03 00:00:00'
GROUP BY start_time
Replace the occurrences of 2012-08-03 00:00:00 with your minimum input date.
<n> is your specified interval in hours (every 2 hours, 3 hours, etc.), and you can do the same for minutes:
SELECT '2012-08-03 00:00:00' +
INTERVAL FLOOR(TIMESTAMPDIFF(MINUTE, '2012-08-03 00:00:00', timestamp) / <n>) * <n> MINUTE AS start_time,
COUNT(*) AS total
FROM event
WHERE timestamp >= '2012-08-03 00:00:00'
GROUP BY start_time
Where <n> is your specified interval in minutes (every 45 minutes, 90 minutes, etc).
Be sure you're passing in your minimum input date (in this example 2012-08-03 00:00:00) as the second parameter to TIMESTAMPDIFF.
EDIT: If you don't want to worry about which interval unit to pick in the TIMESTAMPDIFF function, then of course just do the interval by seconds (300 = 5 minutes, 3600 = 1 hour, 7200 = 2 hours, etc.)
SELECT '2012-08-03 00:00:00' +
INTERVAL FLOOR(TIMESTAMPDIFF(SECOND, '2012-08-03 00:00:00', timestamp) / <n>) * <n> SECOND AS start_time,
COUNT(*) AS total
FROM event
WHERE timestamp >= '2012-08-03 00:00:00'
GROUP BY start_time
EDIT2: To address your comment pertaining to reducing the number of areas in the statement where you have to pass in your minimum parameter date, you can use:
SELECT b.mindate +
INTERVAL FLOOR(TIMESTAMPDIFF(SECOND, b.mindate, timestamp) / <n>) * <n> SECOND AS start_time,
COUNT(*) AS total
FROM event
JOIN (SELECT '2012-08-03 00:00:00' AS mindate) b ON timestamp >= b.mindate
GROUP BY start_time
And simply pass in your minimum datetime parameter once into the join subselect.
You can even make a second column in the join subselect for your seconds interval (e.g. 3600) and name the column something like secinterval... then change the <n>'s to b.secinterval, so you only have to pass in your minimum date parameter AND interval one time each.
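The idea behind these queries is to measure elapsed time from a fixed anchor date instead of from the Unix epoch, so no timezone conversion is involved. A minimal Python sketch of that arithmetic, assuming all timestamps are at or after the anchor (the name `bucket_start` is hypothetical):

```python
from datetime import datetime, timedelta

def bucket_start(ts: datetime, anchor: datetime, seconds: int) -> datetime:
    """Return anchor + floor(elapsed / seconds) * seconds, for ts >= anchor."""
    elapsed = int((ts - anchor).total_seconds())
    return anchor + timedelta(seconds=elapsed // seconds * seconds)

anchor = datetime(2012, 8, 3, 0, 0, 0)      # the minimum input date
event = datetime(2012, 8, 3, 0, 47, 12)

print(bucket_start(event, anchor, 300))     # 5-minute bucket
print(bucket_start(event, anchor, 3600))    # 1-hour bucket
```

Because the buckets are counted from the anchor, they start on whole hours only if the anchor itself is on a whole hour.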
The easier method would be:
Method1
select date(timestamp) as date_timestamp, hour(timestamp) as hour_timestamp, count(*) as total
from event
where timestamp>='2012-08-03 00:00:00'
group by date_timestamp, hour_timestamp
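If you would rather keep your original approach, compensate for the 30-minute part of your timezone offset (note that the 1800 seconds are subtracted after converting to a Unix timestamp):
Method2
select from_unixtime(floor((unix_timestamp(event.timestamp)-1800)/3600)*3600+1800) as start_time,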
count(*) as total
from event
where timestamp>='2012-08-03 00:00:00'
group by start_time;
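A Python sketch of why the 1800-second shift works for a +05:30 session, under the assumption of a fixed offset with no daylight saving. It compares the hard-coded form used in Method2 with a more general form that adds the full offset before flooring; both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

OFFSET = 5 * 3600 + 1800                      # +05:30 in seconds (assumed fixed offset)
LOCAL = timezone(timedelta(seconds=OFFSET))

def to_unix(local_dt: datetime) -> int:
    return int(local_dt.replace(tzinfo=LOCAL).timestamp())

def method2_bucket(unix: int) -> int:
    # Mirrors floor((u - 1800) / 3600) * 3600 + 1800
    return (unix - 1800) // 3600 * 3600 + 1800

def offset_bucket(unix: int, seconds: int, offset: int = OFFSET) -> int:
    # General form: align to local time by adding the offset before flooring
    return (unix + offset) // seconds * seconds - offset

u = to_unix(datetime(2012, 8, 3, 0, 10, 0))
start = datetime.fromtimestamp(method2_bucket(u), tz=LOCAL).replace(tzinfo=None)
print(start)                                   # start of the local hour
print(method2_bucket(u) == offset_bucket(u, 3600))
```

The two forms agree for 1-hour buckets because the offset differs from 1800 seconds by a whole number of hours; for other intervals or other timezones the general form would be the one to adapt.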
EDIT1
The first method also lets the user choose a different interval.
For example, to group the log into 15-minute intervals:
select date(timestamp) as date_timestamp,
hour(timestamp) as hour_timestamp,
floor(minute(timestamp) / 15) * 15 as minute_timestamp,
count(*) as total
from event
group by date_timestamp, hour_timestamp, minute_timestamp