Sum up multiple datediffs of datetimes in MySQL - mysql

I have a table with one user and one day's worth of punches (clockin, breakout, breakin, clockout). Now say the user takes 2 or more breaks. I need to sum up the total time of all breaks taken. I have created a sqlfiddle to make it easier to show what I am trying to do. Here is my example: http://sqlfiddle.com/#!2/21542/6 Now I need to take (12:30:21 - 12:04:44) + (12:36:00 - 12:34:00) to get the total of all breaks taken. How can I do that in my query? Now pretend I have 10 users and 10 days in my table. I know I would need to group by day and user.

I would start by finding some way to link the punch-out records with the punch-in records from the same table. We can then put this data into a table and use it for querying against.
CREATE TEMPORARY TABLE breakPunchInOut (
SELECT
DATE(punchout.PunchDateTime) AS ShiftDate,
punchout.EmpId,
punchout.PunchId AS PunchOutID,
(SELECT
PunchId
FROM
timeclock
WHERE
timeclock.EmpId = punchout.EmpId
AND
timeclock.`In-Out` = 1
AND
timeclock.PunchDateTime > punchout.PunchDateTime
AND
DATE(timeclock.PunchDateTime) = DATE(punchout.PunchDateTime)
ORDER BY
timeclock.PunchDateTime ASC
LIMIT 1
) AS PunchInID
FROM
timeclock AS punchout
WHERE
punchout.`In-Out` = 0
HAVING
PunchInID IS NOT NULL
);
This query looks for all the "punch-outs" on a given day; for each one it then finds the next "punch-in" by the same person on the same day. The HAVING clause filters out records where there is no punch-in after a punch-out, for example where the employee goes home for the day. This is something to remember, because if someone goes home halfway through a shift, that final absence will not be added to their break total.
It's important to point out that this approach will only work for shifts which start and end on the same day. If you have a night shift which starts in the evening and finishes in the morning the next day, then you'll have to alter the way that you join the punch outs and punch ins together.
Now that we have this linking table, it's relatively simple to use it to create a summary report for each employee and each shift:
SELECT
breakPunchInOut.ShiftDate,
breakPunchInOut.EmpId,
SUM(
TIMESTAMPDIFF(MINUTE, punchOut.PunchDateTime, punchIn.PunchDateTime)
) AS TotalBreakLengthMins
FROM
breakPunchInOut
INNER JOIN
timeclock AS punchOut
ON
punchOut.PunchId = breakPunchInOut.PunchOutId
INNER JOIN
timeclock AS punchIn
ON
punchIn.PunchId = breakPunchInOut.PunchInId
GROUP BY
breakPunchInOut.ShiftDate,
breakPunchInOut.EmpId
;
Notice we use the TIMESTAMPDIFF function, not the DATEDIFF. DATEDIFF only calculates the number of days between two dates - it's not used for time.
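As a cross-check of the arithmetic in the question, here is a minimal Python sketch of the same pairing logic outside MySQL. The punch list is hypothetical sample data: the break times are the ones quoted in the question, while the date and the clock-in and clock-out times are made up. Each punch-out is paired with the next punch-in on the same day, and the gaps are summed in seconds.

```python
from datetime import datetime

# Hypothetical punches for one employee on one day: (timestamp, in_out),
# where in_out = 1 is a punch-in and 0 is a punch-out, as in the answer.
punches = [
    (datetime(2014, 1, 6, 8, 0, 0), 1),     # clock in (assumed)
    (datetime(2014, 1, 6, 12, 4, 44), 0),   # break out
    (datetime(2014, 1, 6, 12, 30, 21), 1),  # break in
    (datetime(2014, 1, 6, 12, 34, 0), 0),   # break out
    (datetime(2014, 1, 6, 12, 36, 0), 1),   # break in
    (datetime(2014, 1, 6, 17, 0, 0), 0),    # clock out (no later punch-in)
]


def total_break_seconds(punches):
    """Sum the gaps between each punch-out and the next punch-in that day."""
    ordered = sorted(punches)
    total = 0
    for i, (out_time, in_out) in enumerate(ordered):
        if in_out != 0:
            continue
        # Next punch-in after this punch-out, on the same calendar day.
        next_in = next(
            (t for t, flag in ordered[i + 1:]
             if flag == 1 and t.date() == out_time.date()),
            None,
        )
        if next_in is not None:  # mirrors HAVING PunchInID IS NOT NULL
            total += int((next_in - out_time).total_seconds())
    return total


seconds = total_break_seconds(punches)
print(divmod(seconds, 60))  # (whole minutes, leftover seconds)
```

For this data the sketch gives 27 minutes and 37 seconds. TIMESTAMPDIFF(MINUTE, ...) truncates each break to whole minutes, so the SQL above would report 27; summing TIMESTAMPDIFF(SECOND, ...) instead would keep the leftover seconds.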

Related

MYSQL Test multiple dates in one record against a table with specific dates

First-time question, so I will try to stay on point.
I have a system for recording staff attendance -
Tables:
tbl_Payperiod - (ID, Payperiod, StartDate)
tbl_Rota - (RotaID, PayperodID,EmployeeID, MonDate, Monstarttime, Monfinishtime, TuesDate, Tuesstarttime etc..)
The above works as I want it to, and I can capture different variants of the working day, e.g. annual leave, sickness etc.
The system is accessed through a browser using PHP (PHPRUNNER)
The Question: What I need to do is check if the date is a Public holiday.
I did this in previous setups in Excel (using array and lookup) but I cannot figure out how to test it in MySQL.
I can create a table to hold the holiday dates and have this updated manually.
So how would I check and 'mark' the date in the tbl_rota.MonPH = True or false
Once I can 'mark' the date I can then apply the corresponding pay rate..
Thanks in advance for any assistance
Left join tbl_Rota with the holidays table 7 times, once for each day of the week. Then set each dayPH field to true or false depending on whether the join was successful.
UPDATE tbl_Rota AS r
LEFT JOIN tbl_Holidays AS hMon ON r.MonDate = hMon.date
LEFT JOIN tbl_Holidays AS hTue ON r.TuesDate = hTue.date
...
SET r.MonPH = hMon.date IS NOT NULL,
r.TuePH = hTue.date IS NOT NULL,
...
Having separate columns in tbl_Rota for each day of the week really complicates this. If you had separate rows for each day, you could just do a single left join with the holiday table.
A normalised approach might look a little like this, with one table for the rota itself:
RotaID, PayperodID, EmployeeID
and a second table with one row per working day:
RotaID, Start_dt, Finish_dt
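To illustrate why one row per day makes the holiday check a single left join, here is a small sketch using Python's built-in sqlite3 module rather than MySQL. The table names rota_day and holidays, the is_public_holiday column and all the dates are assumptions made up for the example; only RotaID, Start_dt and Finish_dt come from the layout above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalised layout: one row per rota day.
    CREATE TABLE rota_day (RotaID INTEGER, Start_dt TEXT, Finish_dt TEXT);
    CREATE TABLE holidays (date TEXT PRIMARY KEY);
    INSERT INTO rota_day VALUES
        (1, '2014-12-24 09:00:00', '2014-12-24 17:00:00'),
        (1, '2014-12-25 09:00:00', '2014-12-25 17:00:00'),
        (1, '2014-12-26 09:00:00', '2014-12-26 17:00:00');
    INSERT INTO holidays VALUES ('2014-12-25'), ('2014-12-26');
""")

# One LEFT JOIN is enough: the flag is 1 when a matching holiday row exists.
rows = conn.execute("""
    SELECT d.RotaID,
           date(d.Start_dt)   AS work_date,
           h.date IS NOT NULL AS is_public_holiday
    FROM rota_day AS d
    LEFT JOIN holidays AS h ON h.date = date(d.Start_dt)
    ORDER BY work_date
""").fetchall()

for row in rows:
    print(row)
```

The same join shape should carry over to MySQL with DATE(d.Start_dt), and the flag can then drive the choice of pay rate.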

How do I group a table of datetimes together as long as there is a continuous chain at least every hour?

I have a table called 'events'.
It contains eventID (INT), eventDateTime(DATETIME), and eventMessage(VARCHAR).
I want to be able to group the rows by eventDateTime where there is another row with eventDateTime within 1 hour each side. This should propagate forever (for example, a group should be able to go on for years, as long as there is never a gap longer than an hour between a linking chain of eventDateTime values within that time period). Ideally I want to end up selecting MIN(eventID) for each group, and both the MIN and MAX of eventDateTime, which will give me the time span in which the group runs.
I assume I need some kind of iterating loop to do this? Where would I start?
Let's start with the subqueries we need.
SET @row_number1 = 0;
SET @row_number2 = 0;
This query returns the events table ordered by time, with row numbers (rn):
SELECT
(@row_number1:=@row_number1 + 1) AS rn, eventID, eventDateTime
FROM
events
ORDER BY eventDateTime
Run it twice, once with each counter, and call the two results SUB1 and SUB2.
Then join them:
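select *
from SUB1 join SUB2 on SUB2.rn = SUB1.rn + 1
Each joined row then holds two eventDateTime values, one from the current row (SUB1) and one from the next row (SUB2), so we can calculate the time difference:
TIMESTAMPDIFF(HOUR, SUB1.eventDateTime, SUB2.eventDateTime) as hoursDiff
Then we can add HAVING hoursDiff > 1 to keep only the rule-breaking intervals. For such records SUB1.eventDateTime is the end of the previous group and SUB2.eventDateTime is the beginning of the next group. Note that TIMESTAMPDIFF(HOUR, ...) truncates to whole hours, so a gap of 1 hour 30 minutes yields 1; compare TIMESTAMPDIFF(MINUTE, ...) against 60 if you need the exact one-hour rule.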
So our query will return us
SUB1.eventID as previousGroupEndEventId,
SUB1.eventDateTime as previousGroupEndeventDateTime,
SUB2.eventID as currentGroupStartEventId,
SUB2.eventDateTime as currentGroupStarteventDateTime,
TIMESTAMPDIFF(HOUR, SUB1.eventDateTime, SUB2.eventDateTime) as breakInterval
And you can use the query results to get all your info
For complex problems requiring some form of looping, some databases allow recursive queries; MySQL only gained them (recursive common table expressions) in version 8.0.
Fortunately, in your case I don't think it is necessary. You can instead look for any rows which don't have another row in the preceding hour (each such row is the start of a group), thus:
select *
from events as A
where not exists (
select 1
from events as B
where B.eventDateTime < A.eventDateTime
and B.eventDateTime > DATE_ADD(A.eventDateTime, INTERVAL -1 HOUR)
)
Example kept simple. Fix up the details to meet your requirements.
Working example is here: http://sqlfiddle.com/#!9/c3b73c/1
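To make the chaining rule concrete, here is a minimal Python sketch of the grouping itself, not a MySQL solution. The event rows are hypothetical, and the sketch assumes that a gap of exactly one hour still keeps two events in the same chain (the NOT EXISTS query above treats that boundary as a new group, so adjust the comparison to taste).

```python
from datetime import datetime, timedelta

# Hypothetical rows: (eventID, eventDateTime)
events = [
    (1, datetime(2015, 3, 1, 9, 0)),
    (2, datetime(2015, 3, 1, 9, 40)),
    (3, datetime(2015, 3, 1, 10, 30)),
    (4, datetime(2015, 3, 1, 13, 0)),   # more than an hour after event 3
    (5, datetime(2015, 3, 1, 13, 59)),
    (6, datetime(2015, 3, 2, 8, 0)),    # isolated event
]


def chain_groups(events, max_gap=timedelta(hours=1)):
    """Return (min eventID, first time, last time) for each chain."""
    groups = []
    for event_id, when in sorted(events, key=lambda e: e[1]):
        if groups and when - groups[-1]["last"] <= max_gap:
            # Within max_gap of the previous event: extend the current chain.
            groups[-1]["min_id"] = min(groups[-1]["min_id"], event_id)
            groups[-1]["last"] = when
        else:
            # No event in the preceding hour: this row starts a new chain.
            groups.append({"min_id": event_id, "first": when, "last": when})
    return [(g["min_id"], g["first"], g["last"]) for g in groups]


for group in chain_groups(events):
    print(group)
```

The rows that open a new chain in this loop are the same rows the NOT EXISTS query selects; the MIN and MAX of eventDateTime per chain then follow from one ordered pass.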

MySQL Group By Order and Count(Distinct)

What is the best way to think about the Group By function in MySQL?
I am writing a MySQL query to pull data through an ODBC connection in a pivot table in Excel so that users can easily access the data.
For example, I have:
Select
statistic_date,
week(statistic_date,4),
year(statistic_date),
Emp_ID,
count(distinct Emp_ID),
Site,
Cost_Center
I'm trying to count the number of unique employees we have by site by week. The problem I'm running into is around year end: the calendar years don't always match up, so it is important to have them by date so that I can manually filter down to the correct dates using a pivot table (2013/2014 had a week where we had to add week 53 + week 1).
I'm experimenting by using different group by statements but I'm not sure how the order matters and what changes when I switch them around.
i.e.
Group by week(statistic_date,4), Site, Cost_Center, Emp_ID
vs
Group by Site, Cost_Center, week(statistic_date,4), Emp_ID
Other things to note:
-Employees can work any number of days. Some are working 4 x 10's, others 5 x 8's with possibly a 6th day if they sign up for OT. If I sum the counts by week, I get anywhere between 3-7 per Emp_ID. I'm hoping to get 1 for the week.
-There are different pay code per employee so the distinct count helps when we are looking by day (VTO = Voluntary Time Off, OT = Over Time, LOA = Leave of Absence, etc). The distinct count will show me 1, where often times I will have 2-3 for the same emp in the same day (hits 40 hours and starts accruing OT then takes VTO or uses personal time in the same day).
I'm starting with a query I wrote to understand our paid hours by week. I'm trying to adapt it for this application. Actual code is below:
SELECT
dkh.STATISTIC_DATE AS 'Date'
,week(dkh.STATISTIC_DATE,4) as 'Week'
,month(dkh.STATISTIC_DATE) as 'Month'
,year(dkh.STATISTIC_DATE) as 'Year'
,dkh.SITE AS 'Site ID Short'
,aep.LOC_DESCR as 'Site Name'
,dkh.EMPLOYEE_ID AS 'Employee ID'
,count(distinct dkh.EMPLOYEE_ID) AS 'Distinct Employee ID'
,aep.NAME AS 'Employee Name'
,aep.BUSINESS_TITLE AS 'Business_Ttile'
,aep.SPRVSR_NAME AS 'Manager'
,SUBSTR(aep.DEPTID,1,4) AS 'Cost_Center'
,dkh.PAY_CODE
,dkh.PAY_CODE_SHORT
,dkh.HOURS
FROM metrics.DAT_KRONOS_HOURS dkh
JOIN metrics.EMPLOYEES_PUBLIC aep
ON aep.SNAPSHOT_DATE = SUBDATE(dkh.STATISTIC_DATE, DAYOFWEEK(dkh.STATISTIC_DATE) + 1)
AND aep.EMPLID = dkh.EMPLOYEE_ID
WHERE dkh.STATISTIC_DATE BETWEEN adddate(now(), interval -1 year) AND DATE(now())
group by dkh.SITE, SUBSTR(aep.DEPTID,1,4), week(dkh.STATISTIC_DATE,4), dkh.STATISTIC_DATE, dkh.EMPLOYEE_ID
The order you use in group by doesn't matter. Each unique combination of the values gets a group of its own. Selecting columns you don't group by gives you somewhat arbitrary results; you'd probably want to use some aggregation function on them, such as SUM to get the group total.
Grouping by values you derive from other values that you already use in group by, like below, isn't very useful.
week(dkh.STATISTIC_DATE,4), dkh.STATISTIC_DATE
If two rows have different weeks, they'll also have different dates, right?
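Both points can be checked on a toy table. The sketch below uses Python's built-in sqlite3 module rather than MySQL, and the hours table, its columns and its rows are all hypothetical; week_no is stored directly so the example does not depend on MySQL's WEEK() function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified hours table: several rows per employee per week.
    CREATE TABLE hours (stat_date TEXT, week_no INTEGER, site TEXT, emp_id INTEGER);
    INSERT INTO hours VALUES
        ('2014-01-06', 2, 'A', 100),
        ('2014-01-07', 2, 'A', 100),
        ('2014-01-07', 2, 'A', 101),
        ('2014-01-08', 2, 'B', 102),
        ('2014-01-13', 3, 'A', 100);
""")

# Same grouping columns, listed in a different order.
by_week_site = conn.execute("""
    SELECT week_no, site, COUNT(DISTINCT emp_id)
    FROM hours GROUP BY week_no, site ORDER BY week_no, site
""").fetchall()

by_site_week = conn.execute("""
    SELECT week_no, site, COUNT(DISTINCT emp_id)
    FROM hours GROUP BY site, week_no ORDER BY week_no, site
""").fetchall()

print(by_week_site)
print(by_week_site == by_site_week)
```

Employee 100 has two rows in week 2 at site A but is counted once, because the date and the employee ID are not part of the GROUP BY. Adding either of them creates one group per day or per employee, which is why the distinct count then collapses to 1 per row.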

Calculating time difference between activity timestamps in a query

I'm reasonably new to Access and having trouble solving what should be (I hope) a simple problem - think I may be looking at it through Excel goggles.
I have a table named importedData into which I (not so surprisingly) import a log file each day. This log file is from a simple data-logging application on some mining equipment, and essentially it saves a timestamp and status for the point at which the current activity changes to a new activity.
A sample of the data looks like this:
This information is then filtered using a query to define the range I want to see information for, say from 29/11/2013 06:00:00 AM until 29/11/2013 06:00:00 PM
Now the object of this is to take a status entry's timestamp and get the time difference between it and the record on the subsequent row of the query results. As the equipment works for a 12hr shift, I should then be able to build a picture of how much time the equipment spent doing each activity during that shift.
In the above example, the equipment was in status "START_SHIFT" for 00:01:00, in status "DELAY_WAIT_PIT" for 06:08:26 and so-on. I would then build a unique list of the status entries for the period selected, and sum the total time for each status to get my shift summary.
You can use a correlated subquery to fetch the next timestamp for each row.
SELECT
i.status,
i.timestamp,
(
SELECT Min([timestamp])
FROM importedData
WHERE [timestamp] > i.timestamp
) AS next_timestamp
FROM importedData AS i
WHERE i.timestamp BETWEEN #2013-11-29 06:00:00#
AND #2013-11-29 18:00:00#;
Then you can use that query as a subquery in another query where you compute the duration between timestamp and next_timestamp. And then use that entire new query as a subquery in a third where you GROUP BY status and compute the total duration for each status.
Here's my version which I tested in Access 2007 ...
SELECT
sub2.status,
Format(Sum(Nz(sub2.duration,0)), 'hh:nn:ss') AS SumOfduration
FROM
(
SELECT
sub1.status,
(sub1.next_timestamp - sub1.timestamp) AS duration
FROM
(
SELECT
i.status,
i.timestamp,
(
SELECT Min([timestamp])
FROM importedData
WHERE [timestamp] > i.timestamp
) AS next_timestamp
FROM importedData AS i
WHERE i.timestamp BETWEEN #2013-11-29 06:00:00#
AND #2013-11-29 18:00:00#
) AS sub1
) AS sub2
GROUP BY sub2.status;
If you run into trouble or need to modify it, break out the innermost subquery, sub1, and test that by itself. Then do the same for sub2. I suspect you will want to change the WHERE clause to use parameters instead of hard-coded times.
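If it helps to check the query's output by hand, the same three steps (next timestamp, duration, total per status) can be sketched in a few lines of Python. The log rows below are hypothetical: the first two durations match the figures quoted in the question, and the remaining statuses and times are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical log rows: (timestamp, status).
log = [
    (datetime(2013, 11, 29, 6, 0, 0), "START_SHIFT"),
    (datetime(2013, 11, 29, 6, 1, 0), "DELAY_WAIT_PIT"),
    (datetime(2013, 11, 29, 12, 9, 26), "LOADING"),
    (datetime(2013, 11, 29, 12, 39, 26), "DELAY_WAIT_PIT"),
    (datetime(2013, 11, 29, 13, 0, 0), "END_SHIFT"),
]


def time_per_status(log):
    """Total time spent in each status, using the next row's timestamp."""
    ordered = sorted(log)
    totals = defaultdict(timedelta)
    # Pair each row with the row that follows it. The last row has no
    # successor, so it contributes no duration.
    for (start, status), (next_start, _) in zip(ordered, ordered[1:]):
        totals[status] += next_start - start
    return dict(totals)


totals = time_per_status(log)
for status, total in totals.items():
    print(status, total)
```

One difference to keep in mind: the Access subquery looks for the next timestamp in the whole table, so the last row inside the selected range can pick up a duration that runs past the end of the range, whereas this sketch only sees the rows it is given.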
Note the query Format expression would not be appropriate if your durations exceed 24 hours. Here is an Immediate window session which illustrates the problem ...
' duration greater than one day:
? #2013-11-30 02:00# - #2013-11-29 01:00#
1.04166666667152
' this Format() makes the 25 hr. duration appear as 1 hr.:
? Format(#2013-11-30 02:00# - #2013-11-29 01:00#, "hh:nn:ss")
01:00:00
However, if you're dealing exclusively with data from 12 hr. shifts, this should not be a problem. Keep it in mind in case you ever need to analyze data which spans more than 24 hrs.
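For durations that can exceed a day, the usual workaround is to build the hh:mm:ss string from the total number of seconds instead of from a time-of-day format. A minimal Python sketch of that arithmetic follows; format_duration is a hypothetical helper name, and it assumes non-negative durations.

```python
from datetime import datetime


def format_duration(delta):
    """Format a timedelta as hh:mm:ss without wrapping at 24 hours."""
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Same 25-hour interval as in the Immediate window session above.
duration = datetime(2013, 11, 30, 2, 0) - datetime(2013, 11, 29, 1, 0)
print(format_duration(duration))  # 25:00:00
```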
If subqueries are unfamiliar, see Allen Browne's page: Subquery basics. He discusses correlated subqueries in the section titled Get the value in another record.

Group by date from multiple columns?

First of all, sorry for the title, but I have no idea how to describe it:
I'm saving sessions in my table and I would like to get the count of sessions per hour to know how many sessions were active over the day. The sessions are specified by two timestamps: start and end.
Hopefully you can help me.
Here we go:
http://sqlfiddle.com/#!2/bfb62/2/0
While I'm still not sure how you'd like to compare the start and end dates, looks like using COUNT, YEAR, MONTH, DAY, and HOUR, you could come up with your desired results.
Possibly something similar to this:
SELECT COUNT(ID), YEAR(Start), HOUR(Start), DAY(Start), MONTH(Start)
FROM Sessions
GROUP BY YEAR(Start), HOUR(Start), DAY(Start), MONTH(Start)
And the SQL Fiddle.
What you want to do is rather hard in MySQL. You can, however, get an approximation without too much difficulty. The following counts up users who start and stop within one day:
select date(start), hour,
sum(case when hours.hour between hour(s.start) and hour(s.end) then 1 else 0
end) as GoodEstimate
from sessions s cross join
(select 0 as hour union all
select 1 union all
. . .
select 23
) hours
group by date(start), hour
When a user spans multiple days, the query is harder. Here is one approach, that assumes that there exists a user who starts during every hour:
select thehour, count(*)
from (select distinct date(start), hour(start),
(cast(date(start) as datetime) + interval hour(start) hour) as thehour
from sessions
) dh left outer join
sessions s
on s.start <= thehour + interval 1 hour and
s.end >= thehour
group by thehour
Note: these are untested so might have syntax errors.
OK, this is another problem where the index table comes to the rescue.
An index table is something that everyone should have in their toolkit, preferably in the master database. It is a table with a single id int primary key indexed column containing sequential numbers from 0 to n where n is a number big enough to do what you need, 100,000 is good, 1,000,000 is better. You only need to create this table once but once you do you will find it has all kinds of applications.
For your problem you need to consider each hour and, if I understand your problem you need to count every session that started before the end of the hour and hasn't ended before that hour starts.
Here is the SQL fiddle for the solution.
What it does is use a known sequential number from the indextable (only 0 to 100 for this fiddle - just over 4 days - you can see why you need a big n) to link with your data at the top and bottom of the hour.
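The overlap rule described above (a session counts for an hour if it started before the end of that hour and had not ended before the hour began) can be sketched in Python, with range() standing in for the index table. The session rows, the starting hour and the function name are all assumptions for illustration; this is not the code from the fiddle.

```python
from datetime import datetime, timedelta

# Hypothetical sessions: (start, end)
sessions = [
    (datetime(2013, 5, 1, 9, 15), datetime(2013, 5, 1, 9, 45)),
    (datetime(2013, 5, 1, 9, 50), datetime(2013, 5, 1, 11, 5)),
    (datetime(2013, 5, 1, 10, 30), datetime(2013, 5, 1, 10, 40)),
]


def active_sessions_per_hour(sessions, first_hour, hours):
    """Count sessions overlapping each hour slot starting at first_hour."""
    counts = {}
    for n in range(hours):  # n plays the role of the index table's id
        slot_start = first_hour + timedelta(hours=n)
        slot_end = slot_start + timedelta(hours=1)
        counts[slot_start] = sum(
            1 for start, end in sessions
            if start < slot_end and end >= slot_start
        )
    return counts


counts = active_sessions_per_hour(sessions, datetime(2013, 5, 1, 8, 0), 5)
for slot, count in counts.items():
    print(slot, count)
```

Because every hour slot is generated from the number sequence, hours with no sessions still appear with a count of 0, which a plain GROUP BY on the sessions table cannot give you.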