Calculating working hours for each weekday between date range - mysql

I have a table timeandattandance with the following fields:
TAId int(11) NOT NULL AUTO_INCREMENT,
PostId int(11) NOT NULL,
PositionId int(11) NOT NULL,
CreatedBy int(11) NOT NULL,
ModifiedBy int(11) NOT NULL,
CreatedDate datetime NOT NULL,
ModifiedDate datetime NOT NULL,
TimeIn datetime NOT NULL,
TimeOut datetime DEFAULT NULL,
TimeBilled tinyint(1) NOT NULL DEFAULT '0',
DeviceId varchar(50) DEFAULT NULL,
UserId int(11) DEFAULT NULL,
oldid varchar(45) DEFAULT NULL,
FromCallIn tinyint(1) DEFAULT '0'
I need to get the working hours for each user for each day, grouped by week-ending date, within a given date range. That is, given a date range, I first need to find the rows whose timein falls within it, then determine all the week-ending dates, and then, for every week, calculate the working hours for each day.
While calculating the working hours, I also need to check whether the span between timein and timeout crosses into another day; if it does, the working hours must be split across the days instead of being counted on one.
I know it's a bit complex; if it requires more explanation, please let me know.

Here's an example/demonstration of one possible approach:
SELECT r.dt + INTERVAL 7-DAYOFWEEK(r.dt) DAY AS week_ending
, t.userid
, r.dt
, SUM(TIMESTAMPDIFF(SECOND
,GREATEST(r.dt,t.timein)
,LEAST(r.dt+INTERVAL 1 DAY,t.timeout)
)
)/3600 AS hours_worked
FROM ( SELECT '2014-09-28' AS rb, '2014-10-11' AS re) dr
JOIN ( SELECT DATE(i.timein) AS dt FROM mytable i
UNION
SELECT DATE(o.timeout) FROM mytable o
) r
ON r.dt BETWEEN DATE(dr.rb) AND DATE(dr.re)
JOIN mytable t
ON t.timein < r.dt+INTERVAL 1 DAY
AND t.timeout > r.dt
GROUP BY t.userid, r.dt
ORDER BY week_ending, t.userid, r.dt
NOTE: The week_ending returns the date of the Saturday following (or of) the work date.
The date range is specified in the inline view dr (date range): the range begin date is column rb, and the range end date is column re. This example shows a two-week range, starting on Sunday 2014-09-28 and running for two full weeks, including time worked on Saturday 2014-10-11. (The value for re could be derived as an integer number of days from rb; to get 14 full days, use rb + INTERVAL 13 DAY.)
Only hours worked on these dates, or on days between these dates, are reported. A given userid that did not have any work "time" on a given date will not have a row returned for that date. (The query could be easily tweaked to return rows for all employees for all dates in the range, and returning zeros.)
Absent a "cal" calendar table that contains all date values, we can get a distinct list of date values in the range from the table itself; this could be a relatively expensive operation for a large number of rows in the table. This won't return date values that don't appear in the timein or timeout columns. (If there's a work period of over 24 hours, e.g. starting on Monday and ending on Wednesday, and there are no other rows that have a timein or timeout on Tuesday, the 24 hours worked that day will be omitted... that's likely an extreme corner case. Having a distinct list of all possible date values available in a calendar table avoids that problem.)
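The clamping that the query does with GREATEST and LEAST can also be checked outside the database. The following is a minimal Python sketch, not part of the original answer; the function names hours_by_day and week_ending are made up for illustration. It walks every calendar day a shift touches, so it does not depend on a day appearing in a timein or timeout column.

```python
from datetime import date, datetime, time, timedelta


def week_ending(d: date) -> date:
    """Saturday on or after d, mirroring dt + INTERVAL 7-DAYOFWEEK(dt) DAY."""
    dayofweek = d.isoweekday() % 7 + 1  # MySQL DAYOFWEEK: 1 = Sunday ... 7 = Saturday
    return d + timedelta(days=7 - dayofweek)


def hours_by_day(time_in: datetime, time_out: datetime) -> dict:
    """Split one timein/timeout interval into hours worked per calendar day."""
    hours = {}
    day = time_in.date()
    while datetime.combine(day, time.min) < time_out:
        day_start = datetime.combine(day, time.min)
        day_end = day_start + timedelta(days=1)
        # Same clamp as GREATEST(r.dt, t.timein) and LEAST(r.dt + 1 day, t.timeout).
        start = max(day_start, time_in)
        end = min(day_end, time_out)
        hours[day] = (end - start).total_seconds() / 3600
        day += timedelta(days=1)
    return hours


# A shift from Monday 22:00 to Wednesday 02:00 touches three calendar days.
shift = hours_by_day(datetime(2014, 9, 29, 22, 0), datetime(2014, 10, 1, 2, 0))
for day, worked in shift.items():
    print(week_ending(day), day, worked)
```

The Monday-to-Wednesday shift comes out as 2, 24 and 2 hours, all under the week ending Saturday 2014-10-04.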
FOLLOWUP
From your comment it sounds like you need a cross product between a distinct list of days, and a distinct list of users, then an outer join to the working hours table.
JOIN ( distinct_list_of_dates_r ) r
ON ( r_in_specified_week )
CROSS
JOIN ( distinct_list_of_userid_u ) u
LEFT
JOIN mytable t
ON ( t_userid_matches_u )
AND ( t_times_matches_r )
And then GROUP BY the userid from u, rather than from t. You'll likely want to replace a NULL value returned from hours_worked with a 0. The MySQL IFNULL function is convenient for that, IFNULL(expr,0) is shorthand for CASE WHEN expr IS NULL THEN 0 ELSE expr END. If you want total hours for the entire week, rather than by individual days, then do the GROUP BY on the week_ending expression, rather than on the individual date.
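To illustrate the shape of that result, here is a small Python sketch of the same idea: a cross product of users and dates, with a lookup that falls back to 0 the way IFNULL(expr,0) does. The data and the names worked, users, days and report are hypothetical, chosen only for the example.

```python
from datetime import date, timedelta
from itertools import product

# Hypothetical per-day totals: (userid, work date) -> hours worked.
worked = {
    (1, date(2014, 9, 29)): 8.0,
    (2, date(2014, 9, 30)): 4.5,
}

# Distinct list of users (u) and distinct list of dates in the week (r).
users = sorted({userid for userid, _ in worked})
days = [date(2014, 9, 28) + timedelta(days=n) for n in range(7)]

# CROSS JOIN of users and days, then a LEFT JOIN style lookup with a default of 0.
report = {(u, d): worked.get((u, d), 0) for u, d in product(users, days)}

for (userid, day), hours in sorted(report.items()):
    print(userid, day, hours)
```

Every user gets one row per date, including the dates on which nothing was worked.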

Related

Get records between Two Dates with overlapping time intervals

I have the following table:
CREATE TABLE `table` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`time` bigint(20) DEFAULT NULL,
`name` varchar(20) DEFAULT NULL,
`messages` varchar(2000) NOT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO `table` VALUES (1,1467311473,"Jim", "Jim wants a book"),
(2,1467226792,"Tyler", "Tyler wants a book"),
(3,1467336672,"Phil", "Phil wants a book");
I need to get the records between date 29 Jun 2016 and 1 July 2016 for time intervals 18:59:52 to 01:31:12.
I wrote a query but it doesn't return the desired output
SELECT l.*
FROM table l
WHERE ((time >=1467226792) AND (CAST(FROM_UNIXTIME(time/1000) as time) >= '18:59:52') AND (CAST(FROM_UNIXTIME(time/1000) as time) <= '01:31:12') AND (time <=1467336672))
Any suggestions??
As I understand it, you're simply interested in all rows later than '2016-06-29 18:59:52' and earlier than '2016-07-01 01:31:12' where the time-of-day element is NOT between '01:31:12' and '18:59:52'.
I think you can turn that logic into SQL without further assistance.
Ah, well, here's a fiddle. I left out all the FROM_UNIXTIME() stuff because it adds unnecessary complication to an understanding of the problem, but adapting this solution to your needs is literally just a case of wrapping each instance of the column time in that function:
http://rextester.com/OOGWB23993
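The "NOT between" logic can be tried against the sample rows in a few lines of Python. This is only a sketch under two assumptions that the question does not state: the time column holds seconds since the epoch, and the times of day are meant in UTC. The helper name in_nightly_window is made up for the example.

```python
from datetime import datetime, time, timezone

WINDOW_START = time(18, 59, 52)
WINDOW_END = time(1, 31, 12)


def in_nightly_window(ts: int) -> bool:
    """True if the time of day is at or after 18:59:52 or at or before 01:31:12."""
    t = datetime.fromtimestamp(ts, tz=timezone.utc).time()  # assumption: UTC
    # The window wraps past midnight, so the two tests are joined with OR, not AND.
    return t >= WINDOW_START or t <= WINDOW_END


rows = [(1, 1467311473, "Jim"), (2, 1467226792, "Tyler"), (3, 1467336672, "Phil")]
matches = [name for _id, ts, name in rows
           if 1467226792 <= ts <= 1467336672 and in_nightly_window(ts)]
print(matches)
```

Joining the two time-of-day tests with AND, as in the question, can never match, because no time of day is both after 18:59:52 and before 01:31:12.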
If I got it right:
SELECT l.*
FROM `table` l
WHERE time >= 1467226792
AND time <= 1467336672
-- the sample values are in seconds, so FROM_UNIXTIME() takes them without dividing by 1000
-- the window wraps past midnight, so the two time-of-day tests are joined with OR
AND ( CAST(FROM_UNIXTIME(time) AS time) >= '18:59:52'
OR CAST(FROM_UNIXTIME(time) AS time) <= '01:31:12' )

Optimize SQL query, multiple select with differing arguments

I'm running a query like this in a Python script:
results = []
for day in days:
    for hour in hours:
        for _id in ids:
            query = "SELECT AVG(weight) from table WHERE date >= '%s' \
                     AND day=%s \
                     AND hour=%s AND id=%s" % \
                     (paststr, day, hour, _id)
            results.append(query.exec_and_fetch())
Or, for people not used to Python: for every day, for every hour in that day, and for all the ids in a list for each of those hours, I need to get the average weight for some items.
as an example:
day 0 hour 0 id 0
day 0 hour 0 id 1
...
day 2 hour 5 id 4
day 2 hour 6 id 0
...
This results in a lot of queries, so I'm wondering whether it's possible to do this in one query instead. I've been fiddling a bit with views, but I always get stuck on the varying parameters, or the views get very slow; it's a rather big table.
My closest guess is this:
create or replace view testavg as
select date, day, hour, id, (select avg(weight) from cuWeight w_i
where w_i.date=w_o.date
and w_i.day=w_o.day
and w_i.hour=w_o.hour)
from cuWeight w_o;
But that hasn't returned anything yet; after waiting a minute or two I cancel the query.
The table looks like this:
CREATE TABLE `cuWeight` (
`id` int(11) NOT NULL default '0',
`date` date default NULL,
`hour` int(11) default '0',
`weight` float default '0',
`day` int(11) default '0',
KEY `id_index` (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
MyISAM and latin1 are there for historical (almost fossilised) reasons.
You need a GROUP BY query:
select date, day, hour, id, avg(weight)
from cuWeight
where date > *some date*
group by date, day, hour, id ;
If it's still slow you can split it up in chunks, for example:
for day in days:
    query = "select date, day, hour, id, avg(weight) \
             from cuWeight \
             where date > '%s' \
             and day = %s \
             group by date, day, hour, id " % \
             (paststr, day)
    ...
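As a rough end-to-end illustration of replacing the nested loops with one grouped query, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, with a few made-up rows. It groups by day, hour and id only, which matches the loop in the question; adding date to the GROUP BY gives the per-date breakdown shown in the answer. The cutoff is passed as a bound parameter rather than formatted into the string.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cuWeight (id INTEGER, date TEXT, hour INTEGER, weight REAL, day INTEGER)"
)
conn.executemany(
    "INSERT INTO cuWeight (id, date, hour, weight, day) VALUES (?, ?, ?, ?, ?)",
    [
        (0, "2010-01-04", 0, 10.0, 0),
        (0, "2010-01-11", 0, 20.0, 0),
        (1, "2010-01-04", 0, 30.0, 0),
        (0, "2009-12-28", 0, 99.0, 0),  # older than the cutoff, filtered out
    ],
)

paststr = "2010-01-01"  # placeholder cutoff date

# One grouped query instead of one query per (day, hour, id) combination.
rows = conn.execute(
    "SELECT day, hour, id, AVG(weight) FROM cuWeight "
    "WHERE date >= ? GROUP BY day, hour, id ORDER BY day, hour, id",
    (paststr,),
).fetchall()

# Index the averages the way the original loops would have addressed them.
averages = {(day, hour, _id): avg for day, hour, _id, avg in rows}
print(averages)
```

Looking up averages[(day, hour, _id)] then replaces each individual query from the loop.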

MySQL Query to find monthly active user

I have the following table:
CREATE TABLE account (
account_id bigint(20) NOT NULL AUTO_INCREMENT,
time_start datetime NOT NULL,
time_end datetime DEFAULT NULL,
PRIMARY KEY (account_id),
KEY idx_start (account_id,time_start),
KEY idx_end (account_id,time_end)
) ENGINE=MyISAM
How can I write a query to find how many users log on monthly?
I want to find, for the last 90 days, how many different account_id values are in the table, grouped by month. "Month" here means a period of 30 days: for example from 2011-12-05 to 2011-11-06, from 2011-12-04 to 2011-11-05, and so on for the last 90 days.
You can trivially get years/months out of a datetime field with YEAR() and MONTH() respectively. But your periods don't match start/end on month boundaries, so you'll need some ugly-looking query logic to handle that conversion.
You should start by writing a stored function/procedure that'll convert a regular date/time to a "fiscal" date time, after which the query should become much cleaner looking. Once you've got the procedure done, it can be reused everywhere, as fiscal period calculations will undoubtedly be repeated elsewhere as well.
This query assumes two things:
1) you have your month logic squared away (see @Marc's post) and added as an extra column (month) on the table.
2) time_start is the time that the user has "logged-on".
SELECT COUNT(*), month
FROM account
WHERE time_start > DATE_SUB(CURDATE(), INTERVAL 90 DAY)
GROUP BY month;
Try messing around with it and see if that helps. The filter sits in WHERE because it applies to individual rows before grouping; for other ways to write the date arithmetic, check out MySQL's reference page for date and time functions.
Try this:
select count(distinct account_id)
from account
where
time_start >= date_sub(now(), interval 90 day)
group by
floor(datediff(now(), time_start) / 30)
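The bucketing done by floor(datediff(now(), time_start) / 30) can be sketched in Python as follows. The function name active_accounts_by_period and the sample sessions are invented for the example. One deliberate difference: rows exactly 90 days old are dropped here so that only three buckets appear, whereas the SQL above can place rows from late on that day into a fourth bucket.

```python
from collections import defaultdict
from datetime import date, datetime


def active_accounts_by_period(sessions, today: date) -> dict:
    """Count distinct account ids per 30-day bucket; bucket 0 is the most recent."""
    buckets = defaultdict(set)
    for account_id, time_start in sessions:
        # DATEDIFF compares the date parts only, so the time of day is ignored.
        age_days = (today - time_start.date()).days
        if 0 <= age_days < 90:
            buckets[age_days // 30].add(account_id)
    return {bucket: len(ids) for bucket, ids in sorted(buckets.items())}


sessions = [
    (1, datetime(2011, 12, 1, 10, 0)),
    (1, datetime(2011, 11, 20, 9, 0)),   # same account, same bucket: counted once
    (2, datetime(2011, 11, 10, 9, 0)),
    (2, datetime(2011, 10, 20, 9, 0)),
    (3, datetime(2011, 9, 10, 9, 0)),
    (4, datetime(2011, 8, 1, 9, 0)),     # older than 90 days, ignored
]
counts = active_accounts_by_period(sessions, today=date(2011, 12, 5))
print(counts)
```

Using a set per bucket plays the role of COUNT(DISTINCT account_id).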

MySQL query runs very slow on large table

I am trying to run the following query on a very large table with over 90 million rows and growing:
SELECT COUNT(DISTINCT device_uid) AS cnt, DATE_FORMAT(time_start, '%Y-%m-%d') AS period
FROM game_session
WHERE account_id = -2 AND DATE_FORMAT(time_start, '%Y-%m-%d') BETWEEN CURDATE() - INTERVAL 90 DAY AND CURDATE()
GROUP BY period
ORDER BY period DESC
I have the following table structure:
CREATE TABLE `game_session` (
`session_id` bigint(20) NOT NULL,
`account_id` bigint(20) NOT NULL,
`authentification_type` char(2) NOT NULL,
`source_ip` char(40) NOT NULL,
`device` char(50) DEFAULT NULL COMMENT 'Added 0.9',
`device_uid` char(50) NOT NULL,
`os` char(50) DEFAULT NULL COMMENT 'Added 0.9',
`carrier` char(50) DEFAULT NULL COMMENT 'Added 0.9',
`protocol_version` char(20) DEFAULT NULL COMMENT 'Added 0.9',
`lang_key` char(2) NOT NULL DEFAULT 'en',
`instance_id` char(100) NOT NULL,
`time_start` datetime NOT NULL,
`time_end` datetime DEFAULT NULL,
PRIMARY KEY (`session_id`),
KEY `game_account_session_fk` (`account_id`),
KEY `lang_key_fk` (`lang_key`),
KEY `lookup_active_session_idx` (`account_id`,`time_start`),
KEY `lookup_finished_session_idx` (`account_id`,`time_end`),
KEY `start_time_idx` (`time_start`),
KEY `lookup_guest_session_idx` (`device_uid`,`time_start`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
How can I optimize this?
Thanks for your answers.
DATE_FORMAT(time_start, '%Y-%m-%d') sounds expensive.
Every calculation on a column reduces the use of indexes. You probably run into a full index scan plus a DATE_FORMAT calculation for each value, instead of an index lookup or range scan.
Try to store the computed value in a column (or create a computed index if MySQL supports it). Or, even better, rewrite your conditions to compare directly against the value stored in the column.
Well, 90 million is a lot, but I suspect it doesn't use start_time_idx because of the manipulations, which you can avoid: manipulate the values you compare the column with instead, which also needs to be done only once per query (if MySQL is smart enough). Have you checked EXPLAIN?
You may want to group and sort by time_start instead of the period value you create when the query is run. Sorting by period requires all of those values to be generated before any sorting can be done.
Try swapping out your WHERE clause with the following:
WHERE account_id = -2 AND time_start BETWEEN CURDATE() - INTERVAL 90 DAY AND CURDATE()
MySQL will still catch the dates in between; the only rows you'll need to worry about are the ones from today, which get cut off because CURDATE() means midnight at the start of today and they are technically later than that.
You can fix that by replacing the second CURDATE() with CURDATE() + INTERVAL 1 DAY.
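The cut-off is easy to see with plain datetime values. In this Python sketch the date standing in for CURDATE() and the session time are made up for the example.

```python
from datetime import date, datetime, time, timedelta

today = date(2012, 1, 15)                       # stand-in for CURDATE()
midnight_today = datetime.combine(today, time.min)
range_start = midnight_today - timedelta(days=90)
session = datetime(2012, 1, 15, 10, 30)         # a session started this morning

# BETWEEN range_start AND CURDATE(): the upper bound is today at 00:00:00.
in_between = range_start <= session <= midnight_today

# Upper bound moved to tomorrow's midnight and made exclusive.
in_half_open = range_start <= session < midnight_today + timedelta(days=1)

print(in_between, in_half_open)
```

The session from this morning fails the first test and passes the second.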
I'd change
BETWEEN CURDATE() - INTERVAL 90 DAY AND CURDATE()
to
> (CURDATE() - INTERVAL 90 DAY)
You don't have records from the future, do you?
Change the query to:
SELECT COUNT(DISTINCT device_uid) AS cnt
, DATE_FORMAT(time_start, '%Y-%m-%d') AS period
FROM game_session
WHERE account_id = -2
AND time_start >= CURDATE() - INTERVAL 90 DAY
AND time_start < CURDATE() + INTERVAL 1 DAY
GROUP BY DATE(time_start)
ORDER BY DATE(time_start) DESC
so the index of (account_id, time_start) can be used for the WHERE part of the query.
If it's still slow - the DATE(time_start) does not look very good for performance - add a date_start column and store the date part of time_start.
Then add an index on (account_id, date_start, device_uid) which will further improve performance as all necessary info - for the GROUP BY date_start and the COUNT(DISTINCT device_uid) parts - will be on the index:
SELECT COUNT(DISTINCT device_uid) AS cnt
, date_start AS period
FROM game_session
WHERE account_id = -2
AND date_start BETWEEN CURDATE() - INTERVAL 90 DAY
AND CURDATE()
GROUP BY date_start
ORDER BY date_start DESC

How to get each day's information using group by keyword in mysql

I'm new to MySQL and I'm currently having an issue with a query. I need to get the average duration of each activity for each day within a week. The datetime format is like '2000-01-01 01:01:01', but I want to get rid of the 01:01:01 part and only care about the date. How do I do that?
The table is something like this:
record_id int(10) NOT NULL,
activity_id varchar(100) NOT NULL,
start_time datetime NOT NULL,
end_time datetime NOT NULL,
duration int(10) NOT NULL;
Thanks.
You could do something like the following (start_date and end_date stand for the bounds of the week you want):
select activity_id, dayofweek(start_time) as day, avg(duration) as average
from table_name where start_time between start_date and end_date
group by activity_id, dayofweek(start_time)
If I'm understanding, you want to group by the different activity times and see the average days between the start and end of each activity. This should do it for you.
SELECT activity_id, avg(DATEDIFF(end_time, start_time)) AS Average
FROM tablename
GROUP BY activity_id, DAYOFWEEK(start_time)
Edit: Misunderstood, you want it broken down by the day as well, so this should pull each group, broken down by the day of the week that the start_time falls on, and then the average days between start_time and end_time.
You can use date_format(end_time, '%Y%m%d') to convert it to a sortable value by day. Put that in the group by expression, and that should do what you want.
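The effect of grouping on the date part alone can be sketched in Python. The records below are invented sample data, and durations are treated as plain numbers in whatever unit the duration column uses.

```python
from collections import defaultdict
from datetime import date, datetime

# (activity_id, start_time, duration) rows; made-up sample data.
records = [
    ("run", datetime(2000, 1, 1, 1, 1, 1), 30),
    ("run", datetime(2000, 1, 1, 18, 0, 0), 50),
    ("run", datetime(2000, 1, 2, 7, 30, 0), 20),
    ("swim", datetime(2000, 1, 1, 9, 0, 0), 45),
]

# Group on (activity, calendar date); .date() drops the time-of-day part.
durations = defaultdict(list)
for activity_id, start_time, duration in records:
    durations[(activity_id, start_time.date())].append(duration)

averages = {key: sum(values) / len(values) for key, values in durations.items()}
for (activity_id, day), average in sorted(averages.items()):
    print(activity_id, day, average)
```

The two "run" rows on 2000-01-01 collapse into one group even though their times of day differ.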