MySQL: group by consecutive days and count groups

I have a database table which holds each user's checkins in cities. I need to know how many days a user has been in a city, and then, how many visits a user has made to a city (a visit consists of consecutive days spent in a city).
So, consider the following table (simplified to the DATETIME values only; all rows belong to the same user and city):
datetime
-------------------
2011-06-30 12:11:46
2011-07-01 13:16:34
2011-07-01 15:22:45
2011-07-01 22:35:00
2011-07-02 13:45:12
2011-08-01 00:11:45
2011-08-05 17:14:34
2011-08-05 18:11:46
2011-08-06 20:22:12
The number of days this user has been to this city would be 6 (30.06, 01.07, 02.07, 01.08, 05.08, 06.08).
I thought of doing this using SELECT COUNT(id) FROM table GROUP BY DATE(datetime)
Then, for the number of visits this user has made to this city, the query should return 3 (30.06-02.07, 01.08, 05.08-06.08).
The problem is that I have no idea how to build this query.
Any help would be highly appreciated!

You can find the first day of each visit by finding checkins where there was no checkin the day before.
select count(distinct date(start_of_visit.datetime))
from checkin start_of_visit
left join checkin previous_day
on start_of_visit.user = previous_day.user
and start_of_visit.city = previous_day.city
and date(start_of_visit.datetime) - interval 1 day = date(previous_day.datetime)
where previous_day.id is null
There are several important parts to this query.
First, each checkin is joined to any checkin from the previous day. But since it's an outer join, if there was no checkin the previous day the right side of the join will have NULL results. The WHERE filtering happens after the join, so it keeps only those checkins from the left side where there are none from the right side. LEFT OUTER JOIN/WHERE IS NULL is really handy for finding where things aren't.
Then it counts distinct checkin dates to make sure it doesn't double-count if the user checked in multiple times on the first day of the visit. (I actually added that part on edit, when I spotted the possible error.)
Edit: I just re-read your proposed query for the first question. Your query would get you the number of checkins on a given date, instead of a count of dates. I think you want something like this instead:
select count(distinct date(datetime))
from checkin
where user='some user' and city='some city'
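Here is a self-contained sketch of the same anti-join idea, assuming MySQL and a table named `checkin` with columns `id`, `user`, `city` and `datetime` (the names used in the queries above). The user id 1 and the city 'Berlin' are made-up sample values. The sketch first reduces the check-ins to one row per user, city and day, so both counts come out of one grouped query. Every day counts toward number_of_days. Every day with no match on the previous day starts a visit.

```sql
-- Sample data from the question; user 1 and city 'Berlin' are placeholders
DROP TABLE IF EXISTS checkin;
CREATE TABLE checkin (
  id         INT AUTO_INCREMENT PRIMARY KEY,
  `user`     INT         NOT NULL,
  city       VARCHAR(50) NOT NULL,
  `datetime` DATETIME    NOT NULL
);

INSERT INTO checkin (`user`, city, `datetime`) VALUES
  (1, 'Berlin', '2011-06-30 12:11:46'),
  (1, 'Berlin', '2011-07-01 13:16:34'),
  (1, 'Berlin', '2011-07-01 15:22:45'),
  (1, 'Berlin', '2011-07-01 22:35:00'),
  (1, 'Berlin', '2011-07-02 13:45:12'),
  (1, 'Berlin', '2011-08-01 00:11:45'),
  (1, 'Berlin', '2011-08-05 17:14:34'),
  (1, 'Berlin', '2011-08-05 18:11:46'),
  (1, 'Berlin', '2011-08-06 20:22:12');

-- One row per user/city/day (d), LEFT JOINed to the same set shifted by one day (p).
-- A day with no check-in on the previous day (p.check_day IS NULL) starts a visit.
SELECT d.`user`,
       d.city,
       COUNT(*)                 AS number_of_days,
       SUM(p.check_day IS NULL) AS number_of_visits
FROM (SELECT DISTINCT `user`, city, DATE(`datetime`) AS check_day FROM checkin) AS d
LEFT JOIN (SELECT DISTINCT `user`, city, DATE(`datetime`) AS check_day FROM checkin) AS p
  ON  p.`user`    = d.`user`
  AND p.city      = d.city
  AND p.check_day = d.check_day - INTERVAL 1 DAY
GROUP BY d.`user`, d.city;
```

Deduplicating to days before the join means each day matches at most one previous day. COUNT(*) therefore counts days rather than check-ins, and no COUNT(DISTINCT ...) is needed.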

Try to apply this code to your task -
CREATE TABLE visits(
user_id INT(11) NOT NULL,
dt DATETIME DEFAULT NULL
);
INSERT INTO visits VALUES
(1, '2011-06-30 12:11:46'),
(1, '2011-07-01 13:16:34'),
(1, '2011-07-01 15:22:45'),
(1, '2011-07-01 22:35:00'),
(1, '2011-07-02 13:45:12'),
(1, '2011-08-01 00:11:45'),
(1, '2011-08-05 17:14:34'),
(1, '2011-08-05 18:11:46'),
(1, '2011-08-06 20:22:12'),
(2, '2011-08-30 16:13:34'),
(2, '2011-08-31 16:13:41');
SET @i = 0;
SET @last_dt = NULL;
SET @last_user = NULL;
SELECT v.user_id,
COUNT(DISTINCT(DATE(dt))) number_of_days,
MAX(days) number_of_visits
FROM
(SELECT user_id, dt,
@i := IF(@last_user IS NULL OR @last_user <> user_id, 1, IF(@last_dt IS NULL OR (DATE(dt) - INTERVAL 1 DAY) > DATE(@last_dt), @i + 1, @i)) AS days,
@last_dt := DATE(dt),
@last_user := user_id
FROM
visits
ORDER BY
user_id, dt
) v
GROUP BY
v.user_id;
----------------
Output:
+---------+----------------+------------------+
| user_id | number_of_days | number_of_visits |
+---------+----------------+------------------+
| 1 | 6 | 3 |
| 2 | 2 | 1 |
+---------+----------------+------------------+
Explanation:
To understand how it works, let's look at the subquery:
SET @i = 0;
SET @last_dt = NULL;
SET @last_user = NULL;
SELECT user_id, dt,
@i := IF(@last_user IS NULL OR @last_user <> user_id, 1, IF(@last_dt IS NULL OR (DATE(dt) - INTERVAL 1 DAY) > DATE(@last_dt), @i + 1, @i)) AS days,
@last_dt := DATE(dt) lt,
@last_user := user_id lu
FROM
visits
ORDER BY
user_id, dt;
As you can see, the query returns all rows and ranks them by visit. This is a well-known ranking technique based on user variables; note that the rows are ordered by the user and date fields. The query numbers each user's visits and returns the following data set, where the days column holds the visit number:
+---------+---------------------+------+------------+----+
| user_id | dt | days | lt | lu |
+---------+---------------------+------+------------+----+
| 1 | 2011-06-30 12:11:46 | 1 | 2011-06-30 | 1 |
| 1 | 2011-07-01 13:16:34 | 1 | 2011-07-01 | 1 |
| 1 | 2011-07-01 15:22:45 | 1 | 2011-07-01 | 1 |
| 1 | 2011-07-01 22:35:00 | 1 | 2011-07-01 | 1 |
| 1 | 2011-07-02 13:45:12 | 1 | 2011-07-02 | 1 |
| 1 | 2011-08-01 00:11:45 | 2 | 2011-08-01 | 1 |
| 1 | 2011-08-05 17:14:34 | 3 | 2011-08-05 | 1 |
| 1 | 2011-08-05 18:11:46 | 3 | 2011-08-05 | 1 |
| 1 | 2011-08-06 20:22:12 | 3 | 2011-08-06 | 1 |
| 2 | 2011-08-30 16:13:34 | 1 | 2011-08-30 | 2 |
| 2 | 2011-08-31 16:13:41 | 1 | 2011-08-31 | 2 |
+---------+---------------------+------+------------+----+
Then we group this data set by user and use aggregate functions:
'COUNT(DISTINCT(DATE(dt)))' - counts the number of days
'MAX(days)' - the number of visits, it is a maximum value for the days field from our subquery.
That is all;)
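On MySQL 8.0 or later, the same computation can be written with window functions instead of user variables. Here is a sketch of that. MySQL documents the evaluation order of expressions that both set and read user variables as undefined. Recent versions also deprecate setting user variables inside expressions. The ranking above therefore relies on behavior the server does not guarantee, whereas LAG() makes the "previous day" explicit. The sketch assumes Devart's `visits` table (user_id, dt), recreated here so the block runs on its own.

```sql
-- Devart's sample table and rows
DROP TABLE IF EXISTS visits;
CREATE TABLE visits (
  user_id INT      NOT NULL,
  dt      DATETIME NOT NULL
);
INSERT INTO visits VALUES
  (1, '2011-06-30 12:11:46'), (1, '2011-07-01 13:16:34'),
  (1, '2011-07-01 15:22:45'), (1, '2011-07-01 22:35:00'),
  (1, '2011-07-02 13:45:12'), (1, '2011-08-01 00:11:45'),
  (1, '2011-08-05 17:14:34'), (1, '2011-08-05 18:11:46'),
  (1, '2011-08-06 20:22:12'), (2, '2011-08-30 16:13:34'),
  (2, '2011-08-31 16:13:41');

SELECT user_id,
       COUNT(*)       AS number_of_days,
       SUM(new_visit) AS number_of_visits
FROM (
  SELECT user_id,
         d,
         -- A day continues a visit only if the user's previous distinct day is the day before;
         -- for the first day LAG() is NULL, the comparison is NULL, and ELSE 1 applies.
         CASE WHEN LAG(d) OVER (PARTITION BY user_id ORDER BY d) = d - INTERVAL 1 DAY
              THEN 0 ELSE 1 END AS new_visit
  FROM (SELECT DISTINCT user_id, DATE(dt) AS d FROM visits) AS days
) AS flagged
GROUP BY user_id
ORDER BY user_id;
```

For the sample rows this returns 6 days and 3 visits for user 1, and 2 days and 1 visit for user 2, matching the output above.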

Using the sample data provided by Devart, the inner "PreQuery" works with SQL variables. By defaulting @LUser to -1 (presumably a non-existent user ID), the IF() test detects any difference between the last user and the current one: as soon as a new user appears, NextVisit gets a value of 1. Likewise, if the last date is more than 1 day before the new check-in date, NextVisit is 1. The subsequent columns then set @LUser and @LDate to the values of the record just tested, for the next cycle. Finally, the outer query sums and counts per user, giving these results for the Devart data set (note that count(*) counts check-in rows, so TotalDays is 9 for user 1 rather than its 6 distinct days):
User ID | Distinct Visits | Total Days
--------+-----------------+-----------
1       | 3               | 9
2       | 1               | 2
select PreQuery.User_ID,
sum( PreQuery.NextVisit ) as DistinctVisits,
count(*) as TotalDays
from
( select v.user_id,
if( @LUser <> v.User_ID OR @LDate < ( date( v.dt ) - Interval 1 day ), 1, 0 ) as NextVisit,
@LUser := v.user_id,
@LDate := date( v.dt )
from
visits v,
( select @LUser := -1, @LDate := date(now()) ) AtVars
order by
v.user_id,
v.dt ) PreQuery
group by
PreQuery.User_ID

For the first sub-task (the number of distinct days; here p is the check-in table and d its DATETIME column):
select count(*)
from (
select TO_DAYS(p.d)
from p
group by TO_DAYS(p.d)
) t

I think you should consider changing the database structure. You could add a visits table and a visit_id column to your checkins table. Each time you register a new check-in, check whether there is a check-in from the day before. If there is, add the new check-in with the visit_id of yesterday's check-in. If not, add a new visit to visits and the new check-in with the new visit_id.
Then you could get your data in one query with something like this:
SELECT COUNT(DISTINCT DATE(datetime)) AS number_of_days, COUNT(DISTINCT visit_id) AS number_of_visits FROM checkin GROUP BY user, city
It's not optimal, but it's still better than doing anything with the current structure, and it will work. Also, if the results can come from separate queries, they will be very fast.
The drawbacks, of course, are that you will need to change the database structure, do some more scripting, and convert the current data to the new structure (i.e. add a visit_id to the existing data).
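Here is a minimal sketch of that design, assuming MySQL. The names `visit`, `checkin` and `register_checkin` are hypothetical and not from the answer. The procedure reuses the visit of any check-in by the same user in the same city on the same day or the day before; otherwise it opens a new visit. Including the same day keeps several check-ins on one day in a single visit. The sketch assumes check-ins are registered in chronological order. `DELIMITER` is a mysql client command, not SQL.

```sql
DROP PROCEDURE IF EXISTS register_checkin;
DROP TABLE IF EXISTS checkin;
DROP TABLE IF EXISTS visit;

CREATE TABLE visit (
  id     INT AUTO_INCREMENT PRIMARY KEY,
  `user` INT         NOT NULL,
  city   VARCHAR(50) NOT NULL
);

CREATE TABLE checkin (
  id         INT AUTO_INCREMENT PRIMARY KEY,
  `user`     INT         NOT NULL,
  city       VARCHAR(50) NOT NULL,
  `datetime` DATETIME    NOT NULL,
  visit_id   INT         NOT NULL,
  FOREIGN KEY (visit_id) REFERENCES visit (id)
);

DELIMITER //
CREATE PROCEDURE register_checkin(IN p_user INT, IN p_city VARCHAR(50), IN p_time DATETIME)
BEGIN
  DECLARE v_id INT;
  -- Visit of an existing check-in on the same day or the day before, if any
  SELECT MAX(visit_id) INTO v_id
  FROM checkin
  WHERE `user` = p_user
    AND city = p_city
    AND DATE(`datetime`) BETWEEN DATE(p_time) - INTERVAL 1 DAY AND DATE(p_time);
  -- No such check-in: this one starts a new visit
  IF v_id IS NULL THEN
    INSERT INTO visit (`user`, city) VALUES (p_user, p_city);
    SET v_id = LAST_INSERT_ID();
  END IF;
  INSERT INTO checkin (`user`, city, `datetime`, visit_id)
  VALUES (p_user, p_city, p_time, v_id);
END //
DELIMITER ;

-- Replay the question's check-ins ('Berlin' is a placeholder city)
CALL register_checkin(1, 'Berlin', '2011-06-30 12:11:46');
CALL register_checkin(1, 'Berlin', '2011-07-01 13:16:34');
CALL register_checkin(1, 'Berlin', '2011-07-01 15:22:45');
CALL register_checkin(1, 'Berlin', '2011-07-01 22:35:00');
CALL register_checkin(1, 'Berlin', '2011-07-02 13:45:12');
CALL register_checkin(1, 'Berlin', '2011-08-01 00:11:45');
CALL register_checkin(1, 'Berlin', '2011-08-05 17:14:34');
CALL register_checkin(1, 'Berlin', '2011-08-05 18:11:46');
CALL register_checkin(1, 'Berlin', '2011-08-06 20:22:12');

-- Both counts from one query
SELECT `user`, city,
       COUNT(DISTINCT DATE(`datetime`)) AS number_of_days,
       COUNT(DISTINCT visit_id)         AS number_of_visits
FROM checkin
GROUP BY `user`, city;
```

Check-ins that arrive out of order (for example, a backfilled day that joins two existing visits) would need extra merge logic, which this sketch does not attempt.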

Related

Finding total active hours by calculating difference between TimeDate records

I have a table that logs each user every minute, plus other activities, with a DateTime for each user_id.
Here is sample data from my table:
id | user_id | log_datetime
------------------------------------------
1 | 1 | 2016-09-25 13:01:08
2 | 1 | 2016-09-25 13:04:08
3 | 1 | 2016-09-25 13:07:08
4 | 1 | 2016-09-25 13:10:08
5 | 2 | 2016-09-25 13:11:08
6 | 1 | 2016-09-25 13:13:08
7 | 2 | 2016-09-25 13:13:09
8 | 2 | 2016-09-25 13:14:10
I would like to calculate the total active time on the system
UPDATE: Expected Output
For example, user_id 1's total active time should be 00:12:00.
I'll just subtract each log from the next one and then sum all the differences.
Simply put, I want to loop through the data from the last record to the first within my range.
This simple formula should make my question clear:
$\sum_{k=2}^{n} (T_k - T_{k-1}) = (T_n - T_{n-1}) + (T_{n-1} - T_{n-2}) + \dots + (T_2 - T_1)$
Since user_id 1's hours and seconds are the same, I'll calculate only the minutes:
(13-10)+(10-7)+(7-4)+(4-1) = 12
user_id | total_hours
---------------------------------
1 | 00:12:00
2 | 00:03:02
I wrote this code:
SET @start_date = '2016-09-25';
SET @start_time = '13:00:00';
SET @end_date = '2016-09-25';
SET @end_time = '13:15:00';
SELECT
`ul1`.`user_id`, SEC_TO_TIME(SUM(TIME_TO_SEC(`ul1`.`log_datetime`))) AS total_hours
FROM
`users_logs` AS `ul1`
JOIN `users_logs` AS `ul2`
ON `ul1`.`id` = `ul2`.`id`
WHERE
`ul1`.`log_datetime` >= CONCAT(@start_date, ' ', @start_time)
AND
`ul2`.`log_datetime` <= CONCAT(@end_date, ' ', @end_time)
GROUP BY `ul1`.`user_id`
But this code sums all the times instead of the differences. This is its output:
user_id | total_hours
---------------------------------
1 | 65:35:40
2 | 39:38:25
How can I calculate the sum of all the datetime differences? I then want to display each user's active hours split into 12-hour halves (00:00:00 - 11:59:59 and 12:00:00 - 23:59:59) within the DateTime period selected at the beginning of the code.
So the output would look like this (just a dummy example, not from the given data):
user_id | total_hours | 00_12_am | 12_00_pm |
-------------------------------------------------------
1 | 10:10:40 | 02:05:20 | 08:05:20 |
2 | 04:10:20 | 01:05:10 | 03:05:30 |
Thank you
So you log every minute and if a user is available there is a log entry.
Then count the logs per user, so you have the number of total minutes.
select user_id, count(*) as total_minutes
from user_logs
group by user_id;
If you want them displayed as time use sec_to_time:
select user_id, sec_to_time(count(*) * 60) as total_hours
from user_logs
group by user_id;
To split the minutes into AM and PM, use conditional aggregation:
select
user_id,
count(*) as total_minutes,
count(case when hour(log_datetime) < 12 then 1 end) as total_minutes_am,
count(case when hour(log_datetime) >= 12 then 1 end) as total_minutes_pm
from user_logs
group by user_id;
UPDATE: To count each minute just once, count distinct minutes, i.e. DATE_FORMAT(log_datetime, '%Y-%m-%d %H:%i'). This can be done with COUNT(DISTINCT ...) or with a subquery that selects distinct values.
The complete query:
select
user_id,
count(*) as total_minutes,
count(case when log_hour < 12 then 1 end) as total_minutes_am,
count(case when log_hour >= 12 then 1 end) as total_minutes_pm
from
(
select distinct
user_id,
date_format(log_datetime, '%Y-%m-%d %H:%i') as log_moment,
hour(log_datetime) as log_hour
from user_logs
) log
group by user_id;
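The formula in the question telescopes. In (T_n - T_{n-1}) + ... + (T_2 - T_1), every intermediate timestamp cancels, leaving T_n - T_1. Here is a sketch that computes it both ways: literally with LAG() (MySQL 8.0+), and via MIN()/MAX(), which also works on older versions. It assumes the question's `users_logs` table and sample rows. Both give 00:12:00 for user 1 and 00:03:02 for user 2, matching the expected output. Unlike the answer above, which counts one minute per log entry, this treats any idle gap between two logs as active time, exactly as the question's formula does.

```sql
DROP TABLE IF EXISTS users_logs;
CREATE TABLE users_logs (
  id           INT PRIMARY KEY,
  user_id      INT      NOT NULL,
  log_datetime DATETIME NOT NULL
);
INSERT INTO users_logs VALUES
  (1, 1, '2016-09-25 13:01:08'),
  (2, 1, '2016-09-25 13:04:08'),
  (3, 1, '2016-09-25 13:07:08'),
  (4, 1, '2016-09-25 13:10:08'),
  (5, 2, '2016-09-25 13:11:08'),
  (6, 1, '2016-09-25 13:13:08'),
  (7, 2, '2016-09-25 13:13:09'),
  (8, 2, '2016-09-25 13:14:10');

-- MySQL 8.0+: sum the gap between each log and the same user's previous log.
-- Each user's first log has no previous log, so its gap is NULL and SUM() skips it.
SELECT user_id, SEC_TO_TIME(SUM(gap)) AS total_hours
FROM (
  SELECT user_id,
         TIMESTAMPDIFF(SECOND,
                       LAG(log_datetime) OVER (PARTITION BY user_id ORDER BY log_datetime),
                       log_datetime) AS gap
  FROM users_logs
  WHERE log_datetime BETWEEN '2016-09-25 13:00:00' AND '2016-09-25 13:15:00'
) AS gaps
GROUP BY user_id;

-- Any version: because the sum telescopes, it equals last log minus first log
SELECT user_id,
       SEC_TO_TIME(TIMESTAMPDIFF(SECOND, MIN(log_datetime), MAX(log_datetime))) AS total_hours
FROM users_logs
WHERE log_datetime BETWEEN '2016-09-25 13:00:00' AND '2016-09-25 13:15:00'
GROUP BY user_id;
```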

Selecting the number of last consecutive days from timestamp (excepting today)

I have a table A_DailyLogins with the columns ID (auto increment), Key (user ID) and Date (timestamp). I want a query that returns, for each Key, the number of most recent consecutive days with a login. For example, if the user has a row for yesterday, one for two days ago and one for three days ago, but none for four days ago, it should return 3, because that is the number of most recent days the user was logged in.
My attempt was a query selecting the player's last 7 rows ordered by Date DESC (this is what I wanted in the first place, but then I thought it would be great to have all the last consecutive days). I then retrieved the query result, converted the dates to year/month/day with functions from my language (Pawn), and increased the count of consecutive days whenever a date was exactly one day before the previous one. But this is extremely slow compared to what I think can be done directly in MySQL.
The closest thing I found is this: Check for x consecutive days - given timestamps in database. But it still isn't what I want; it's quite different. I tried to modify it, but it is way too hard for me, as I don't have much experience with MySQL.
context
let consecutive login period be a period where the user is logged in on all days ( has an entry in A_DailyLogins on every day in period ) where there is no entry in A_DailyLogins immediately before or after the consecutive login period with the same user
and number of consecutive days be one more than the difference between the maximum and minimum dates in a consecutive login period (both ends count)
the maximum date of a consecutive login period has no login entry immediately after it (sequentially).
the minimum date of a consecutive login period has no login entry immediately before it (sequentially).
plan
left join A_DailyLogins to itself using same user and sequential dates where right is null to find maximums
analogous logic to find minimums
calculate row ordering over minimums and maximums with appropriate order by
join maximums and minimums on row number
filter where maximum login is yesterday/today
calculate datediff between maximum and minimum in range, plus one
left join users to above resultset and coalesce over the case where user does not have a consecutive login period ending yesterday/today
input
+----+------+------------+
| ID | Key | Date |
+----+------+------------+
| 25 | eric | 2015-12-23 |
| 26 | eric | 2015-12-25 |
| 27 | eric | 2015-12-26 |
| 28 | eric | 2015-12-27 |
| 29 | eric | 2016-01-01 |
| 30 | eric | 2016-01-02 |
| 31 | eric | 2016-01-03 |
| 32 | nusa | 2015-12-27 |
| 33 | nusa | 2015-12-29 |
+----+------+------------+
query
select all_users.`Key`,
coalesce(nconsecutive, 0) as nconsecutive
from
(
select distinct `Key`
from A_DailyLogins
) all_users
left join
(
select
lower_login_bounds.`Key`,
lower_login_bounds.`Date` as from_login,
upper_login_bounds.`Date` as to_login,
1 + datediff(least(upper_login_bounds.`Date`, date_sub(current_date, interval 1 day))
, lower_login_bounds.`Date`) as nconsecutive
from
(
select curr_login.`Key`, curr_login.`Date`, @rn1 := @rn1 + 1 as rn
from A_DailyLogins curr_login
left join A_DailyLogins prev_login
on curr_login.`Key` = prev_login.`Key`
and prev_login.`Date` = date_add(curr_login.`Date`, interval -1 day)
cross join ( select @rn1 := 0 ) params
where prev_login.`Date` is null
order by curr_login.`Key`, curr_login.`Date`
) lower_login_bounds
inner join
(
select curr_login.`Key`, curr_login.`Date`, @rn2 := @rn2 + 1 as rn
from A_DailyLogins curr_login
left join A_DailyLogins next_login
on curr_login.`Key` = next_login.`Key`
and next_login.`Date` = date_add(curr_login.`Date`, interval 1 day)
cross join ( select @rn2 := 0 ) params
where next_login.`Date` is null
order by curr_login.`Key`, curr_login.`Date`
) upper_login_bounds
on lower_login_bounds.rn = upper_login_bounds.rn
where upper_login_bounds.`Date` >= date_sub(current_date, interval 1 day)
and lower_login_bounds.`Date` < current_date
) last_consecutive
on all_users.`Key` = last_consecutive.`Key`
;
output
+------+--------------+
| Key  | nconsecutive |
+------+--------------+
| eric |            2 |
| nusa |            0 |
+------+--------------+
valid as run on 2016-01-03
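On MySQL 8.0 or later, the same result can be had with the "gaps and islands" trick instead of matching user-variable row numbers. This is a different technique from the answer's. Within one user, consecutive days minus their ROW_NUMBER() give the same value, so that value identifies each run. The sketch assumes the answer's A_DailyLogins sample. It fixes "today" in a user variable (@today, my own addition) so the result can be reproduced. In practice you would use CURRENT_DATE. With @today = 2016-01-03 it returns eric 2 and nusa 0, like the output above.

```sql
DROP TABLE IF EXISTS A_DailyLogins;
CREATE TABLE A_DailyLogins (
  ID     INT AUTO_INCREMENT PRIMARY KEY,
  `Key`  VARCHAR(20) NOT NULL,
  `Date` DATE        NOT NULL
);
INSERT INTO A_DailyLogins (ID, `Key`, `Date`) VALUES
  (25, 'eric', '2015-12-23'), (26, 'eric', '2015-12-25'), (27, 'eric', '2015-12-26'),
  (28, 'eric', '2015-12-27'), (29, 'eric', '2016-01-01'), (30, 'eric', '2016-01-02'),
  (31, 'eric', '2016-01-03'), (32, 'nusa', '2015-12-27'), (33, 'nusa', '2015-12-29');

-- Reference date for a reproducible result; use CURRENT_DATE in practice
SET @today = '2016-01-03';

SELECT u.`Key`, COALESCE(s.streak, 0) AS nconsecutive
FROM (SELECT DISTINCT `Key` FROM A_DailyLogins) AS u
LEFT JOIN (
  SELECT `Key`, COUNT(*) AS streak
  FROM (
    -- Consecutive days share the same (day - row number): that is the run ("island") id
    SELECT `Key`, login_day, login_day - INTERVAL rn DAY AS island
    FROM (
      SELECT `Key`, login_day,
             ROW_NUMBER() OVER (PARTITION BY `Key` ORDER BY login_day) AS rn
      FROM (SELECT DISTINCT `Key`, DATE(`Date`) AS login_day
            FROM A_DailyLogins
            WHERE DATE(`Date`) < CAST(@today AS DATE)) AS days   -- exclude today
    ) AS numbered
  ) AS islands
  GROUP BY `Key`, island
  -- keep only the run that ends yesterday
  HAVING MAX(login_day) = CAST(@today AS DATE) - INTERVAL 1 DAY
) AS s ON s.`Key` = u.`Key`
ORDER BY u.`Key`;
```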

Average time difference between rows in database

Using MySQL, I have a table that keeps track of user visits:
USER_ID | TIMESTAMP
--------+----------------------
1 | 2014-08-11 14:37:36
2 | 2014-08-11 12:37:36
3 | 2014-08-07 16:37:36
1 | 2014-07-14 15:34:36
1 | 2014-07-09 14:37:36
2 | 2014-07-03 14:37:36
3 | 2014-05-23 15:37:36
3 | 2014-05-13 12:37:36
The time of day is not important; I'm more concerned with answering "how many days between entries".
How do I figure out the average number of days between entries with SQL queries?
For example, the output should look like something like:
(output is just a sample, not reflection of the data table above)
USER_ID | AVG TIME (days)
--------+----------------------
1 | 2
2 | 3
3 | 1
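MySQL (before 8.0, which added window functions such as LAG()) has no direct "get something from a previous row" capability. The easiest workaround is to use variables to store the "previous" values:
SET @last_user = NULL, @last = NULL;
SELECT user_id, AVG(diff) AS avg_days
FROM (
SELECT user_id,
IF(@last_user = user_id, DATEDIFF(`timestamp`, @last), NULL) AS diff,
@last_user := user_id,
@last := `timestamp`
FROM yourtable
ORDER BY user_id, `timestamp` ASC
) AS foo
GROUP BY user_id
The inner query does the "difference in days from the previous row" calculation, restarting at each new user_id (the first row of each user gets NULL, which AVG() ignores), and the outer query does the averaging.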
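No row-by-row work is strictly needed here. DATEDIFF() is a difference of day numbers, so the gaps between one user's consecutive entries telescope to DATEDIFF(last, first). Their average is that span divided by the number of gaps, COUNT(*) - 1. Here is a sketch using the question's sample rows and the answer's placeholder table name `yourtable`. It returns 16.5 days for user 1, 39 for user 2 and 43 for user 3. NULLIF() gives NULL for a user with a single entry instead of dividing by zero.

```sql
DROP TABLE IF EXISTS yourtable;
CREATE TABLE yourtable (
  user_id     INT      NOT NULL,
  `timestamp` DATETIME NOT NULL
);
INSERT INTO yourtable VALUES
  (1, '2014-08-11 14:37:36'),
  (2, '2014-08-11 12:37:36'),
  (3, '2014-08-07 16:37:36'),
  (1, '2014-07-14 15:34:36'),
  (1, '2014-07-09 14:37:36'),
  (2, '2014-07-03 14:37:36'),
  (3, '2014-05-23 15:37:36'),
  (3, '2014-05-13 12:37:36');

-- Sum of day gaps = DATEDIFF(last, first); number of gaps = entries - 1
SELECT user_id,
       DATEDIFF(MAX(`timestamp`), MIN(`timestamp`)) / NULLIF(COUNT(*) - 1, 0) AS avg_days
FROM yourtable
GROUP BY user_id
ORDER BY user_id;
```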

Counting appointments for each day using MYSQL

I'm having trouble with a MySQL statement that counts appointments per day within a given time period. I have a calendar table with starting and ending columns (type DateTime). The following statement should count all appointments for November, including those spanning the whole month:
SELECT
COUNT('APPOINTMENTS') AS Count,
DATE(c.StartingDate) AS Datum
FROM t_calendar c
WHERE
c.GUID = 'blalblabla' AND
((DATE(c.StartingDate) <= DATE('2012-11-01 00:00:00')) AND (DATE(c.EndingDate) >= DATE('2012-11-30 23:59:59'))) OR
((DATE(c.StartingDate) >= DATE('2012-11-01 00:00:00')) AND (DATE(c.EndingDate) <= DATE('2012-11-30 23:59:59')))
GROUP BY DATE(c.StartingDate)
HAVING Count > 1
But how do I include appointments that start before a StartingDate and end on that StartingDate?
e.g.
StartingDate = 2012-11-14 17:00:00, EndingDate = 2012-11-15 08:00:00
StartingDate = 2012-11-15 09:00:00, EndingDate = 2012-11-15 10:00:00
StartingDate = 2012-11-15 11:00:00, EndingDate = 2012-11-15 12:00:00
My statement returns a count of 2 for the 15th of November, but that's wrong because the first appointment is missing. How do I include these appointments? What am I missing: UNION SELECT, JOIN, a subselect?
A possible solution?
SELECT
c1.GUID, COUNT('APPOINTMENTS') + COUNT(DISTINCT c2.ANYFIELD) AS Count,
DATE(c1.StartingDate) AS Datum,
COUNT(DISTINCT c2.ANYFIELD)
FROM
t_calendar c1
LEFT JOIN
t_calendar c2
ON
c2.ResourceGUID = c1.ResourceGUID AND
(DATE(c2.EndingDate) = DATE(c1.StartingDate)) AND
(DATE(c2.StartingDate) < DATE(c1.StartingDate))
WHERE
((DATE(c1.StartingDate) <= DATE('2012-11-01 00:00:00')) AND (DATE(c1.EndingDate) >= DATE('2012-11-30 23:59:59'))) OR
((DATE(c1.StartingDate) >= DATE('2012-11-01 00:00:00')) AND (DATE(c1.EndingDate) <= DATE('2012-11-30 23:59:59')))
GROUP BY
c1.ResourceGUID,
DATE(c1.StartingDate)
First: Consolidate range checking
First of all, your two range conditions in the WHERE clause can be replaced by a single one. It also seems that you're only counting appointments that either completely overlap the target date range or are completely contained within it. Partially overlapping ones aren't included, hence your question about appointments that end right on the range starting date.
To make the WHERE clause easy to understand, I'll simplify it by using:
two variables to define the target range:
rangeStart (in your case 1st Nov 2012)
rangeEnd (I'll assume 1st Dec 2012 00:00:00.00000, exclusive)
no conversion of datetimes to dates (using the date function) the way you did, though you can easily add that.
With these in mind your where clause can be greatly simplified and covers all appointments for given range:
...
where (c.StartingDate < rangeEnd) and (c.EndingDate >= rangeStart)
...
This will search for all appointments that fall in target range and will cover all these appointment cases:
                      start        end
target range          |==============|
partial front  |---------|
partial back                     |---------|
total overlap     |---------------------|
total containment         |-----|
Partial front/back may also barely touch your target range (what you've been after).
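To check that the single condition handles every case in the diagram, here is a small sketch with one made-up appointment per case. The table `overlap_demo` and its rows are hypothetical. It adds one appointment that ends exactly at rangeStart and two that lie outside the range. The first five rows get in_range = 1. 'before range' and 'starts at end' get 0, the latter because rangeEnd is exclusive.

```sql
DROP TABLE IF EXISTS overlap_demo;
CREATE TABLE overlap_demo (
  label        VARCHAR(20) NOT NULL,
  StartingDate DATETIME    NOT NULL,
  EndingDate   DATETIME    NOT NULL
);
INSERT INTO overlap_demo VALUES
  ('partial front',     '2012-10-28 09:00:00', '2012-11-02 09:00:00'),
  ('partial back',      '2012-11-28 09:00:00', '2012-12-03 09:00:00'),
  ('total overlap',     '2012-10-28 09:00:00', '2012-12-03 09:00:00'),
  ('total containment', '2012-11-10 09:00:00', '2012-11-12 09:00:00'),
  ('touches start',     '2012-10-31 17:00:00', '2012-11-01 00:00:00'),
  ('before range',      '2012-10-01 09:00:00', '2012-10-02 09:00:00'),
  ('starts at end',     '2012-12-01 00:00:00', '2012-12-02 09:00:00');

-- rangeStart = '2012-11-01' (inclusive), rangeEnd = '2012-12-01' (exclusive)
SELECT label,
       (StartingDate < '2012-12-01' AND EndingDate >= '2012-11-01') AS in_range
FROM overlap_demo;
```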
Second: Resolving the problem
Why are you missing the first record? Simply because your HAVING clause only keeps groups with more than one appointment starting on a given day: the 15th of Nov has two, but the 14th has only one and is therefore excluded, because Count = 1 is not > 1.
As for your second question, what you are missing: nothing. Actually, your statement has too much in it and needs to be simplified.
Try this statement instead that should return exactly what you're after:
select count(c.GUID) as Count,
date(c.StartingDate) as Datum
from t_calendar c
where (c.GUID = 'blabla') and
(c.StartingDate < str_to_date('2012-12-01', '%Y-%m-%d')) and
(c.EndingDate >= str_to_date('2012-11-01', '%Y-%m-%d'))
group by date(c.StartingDate)
I used the str_to_date function to make the string-to-date conversion safer.
I'm not really sure why you included HAVING in your statement, because it isn't needed, unless your actual statement is more complex and you only included the most relevant part. In that case you'll likely have to change it to:
having Count > 0
Getting appointment count per day in any given date range
There are likely other ways as well, but the most common would be a numbers or calendar table, which lets you break a range into individual points (days). Then you join your appointments to this numbers table and produce the results.
I've created a SQLFiddle that does the trick. Here's what it does...
Suppose you have a numbers table Nums with numbers from 0 to x, and an appointments table Cal with your records. The following script creates these two tables and populates them with some data. The numbers only go from 0 to 99, which is enough for 3 months' worth of data.
-- appointments
create table Cal (
Id int not null auto_increment primary key,
StartDate datetime not null,
EndDate datetime not null
);
-- create appointments
insert Cal (StartDate, EndDate)
values
('2012-10-15 08:00:00', '2012-10-20 16:00:00'),
('2012-10-25 08:00:00', '2012-11-01 03:00:00'),
('2012-11-01 12:00:00', '2012-11-01 15:00:00'),
('2012-11-15 10:00:00', '2012-11-16 10:00:00'),
('2012-11-20 08:00:00', '2012-11-30 08:00:00'),
('2012-11-30 22:00:00', '2012-12-05 00:00:00'),
('2012-12-01 05:00:00', '2012-12-10 12:00:00');
-- numbers table
create table Nums (
Id int not null primary key
);
-- add 100 numbers
insert into Nums
select a.a + (10 * b.a)
from (select 0 as a union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9) as a,
(select 0 as a union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9) as b;
Now you have to:
Select a range of days, by selecting numbers from the Nums table and converting them to dates.
Join your appointments to those dates, so that the appointments that fall on a particular day are joined to that day.
Group all these appointments by day and get the results.
Here's the code that does this:
-- just in case so comparisons don't trip over
set names 'latin1' collate latin1_general_ci;
-- start and end target date range
set @s := str_to_date('2012-11-01', '%Y-%m-%d');
set @e := str_to_date('2012-12-01', '%Y-%m-%d');
-- get appointment count per day within target range of days
select adddate(@s, n.Id) as Day, count(c.Id) as Appointments
from Nums n
left join Cal c
on ((date(c.StartDate) <= adddate(@s, n.Id)) and (date(c.EndDate) >= adddate(@s, n.Id)))
where adddate(@s, n.Id) < @e
group by Day;
And this is the result of this rather simple select statement:
| DAY | APPOINTMENTS |
-----------------------------
| 2012-11-01 | 2 |
| 2012-11-02 | 0 |
| 2012-11-03 | 0 |
| 2012-11-04 | 0 |
| 2012-11-05 | 0 |
| 2012-11-06 | 0 |
| 2012-11-07 | 0 |
| 2012-11-08 | 0 |
| 2012-11-09 | 0 |
| 2012-11-10 | 0 |
| 2012-11-11 | 0 |
| 2012-11-12 | 0 |
| 2012-11-13 | 0 |
| 2012-11-14 | 0 |
| 2012-11-15 | 1 |
| 2012-11-16 | 1 |
| 2012-11-17 | 0 |
| 2012-11-18 | 0 |
| 2012-11-19 | 0 |
| 2012-11-20 | 1 |
| 2012-11-21 | 1 |
| 2012-11-22 | 1 |
| 2012-11-23 | 1 |
| 2012-11-24 | 1 |
| 2012-11-25 | 1 |
| 2012-11-26 | 1 |
| 2012-11-27 | 1 |
| 2012-11-28 | 1 |
| 2012-11-29 | 1 |
| 2012-11-30 | 2 |
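On MySQL 8.0 or later, the numbers table can be replaced by a recursive common table expression that generates the days on the fly. This swaps in a different technique from the answer's Nums table. Here is a sketch that reuses the Cal table and sample rows from above, recreated so the block runs on its own, together with the same join condition. It should return the same 30 rows as the result above. The CTE name `days` and the column `day_date` are my own choices.

```sql
DROP TABLE IF EXISTS Cal;
CREATE TABLE Cal (
  Id        INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  StartDate DATETIME NOT NULL,
  EndDate   DATETIME NOT NULL
);
INSERT INTO Cal (StartDate, EndDate) VALUES
  ('2012-10-15 08:00:00', '2012-10-20 16:00:00'),
  ('2012-10-25 08:00:00', '2012-11-01 03:00:00'),
  ('2012-11-01 12:00:00', '2012-11-01 15:00:00'),
  ('2012-11-15 10:00:00', '2012-11-16 10:00:00'),
  ('2012-11-20 08:00:00', '2012-11-30 08:00:00'),
  ('2012-11-30 22:00:00', '2012-12-05 00:00:00'),
  ('2012-12-01 05:00:00', '2012-12-10 12:00:00');

-- One row per day in [2012-11-01, 2012-12-01), generated instead of read from Nums
WITH RECURSIVE days AS (
  SELECT DATE('2012-11-01') AS day_date
  UNION ALL
  SELECT day_date + INTERVAL 1 DAY
  FROM days
  WHERE day_date + INTERVAL 1 DAY < '2012-12-01'
)
SELECT d.day_date AS Day, COUNT(c.Id) AS Appointments
FROM days AS d
LEFT JOIN Cal AS c
  ON DATE(c.StartDate) <= d.day_date AND DATE(c.EndDate) >= d.day_date
GROUP BY d.day_date
ORDER BY d.day_date;
```

The recursion depth here is 30, well under MySQL's default cte_max_recursion_depth. Longer ranges may need that limit raised.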

MySQL grouping by date range with multiple joins

I currently have quite a messy query, which joins data from multiple tables involving two subqueries. I now have a requirement to group this data by DAY(), WEEK(), MONTH(), and QUARTER().
I have three tables: days, qos and employees. An employee is self-explanatory, a day is a summary of an employee's performance on a given day, and qos is a random quality inspection, which can be performed many times a day.
At the moment, I am selecting all employees, and LEFT JOINing day and qos, which works well. However, now, I need to group the data in order to breakdown a team or individual's performance over a date range.
Taking this data:
Employee
id | name
------------------
1 | Bob Smith
Day
id | employee_id | day_date | calls_taken
---------------------------------------------
1 | 1 | 2011-03-01 | 41
2 | 1 | 2011-03-02 | 24
3 | 1 | 2011-04-01 | 35
Qos
id | employee_id | qos_date | score
----------------------------------------
1 | 1 | 2011-03-03 | 85
2 | 1 | 2011-03-03 | 95
3 | 1 | 2011-04-01 | 91
If I were to start by grouping by DAY(), I would need to see the following results:
Day__date | Day__Employee__id | Day__calls | Day__qos_score
------------------------------------------------------------
2011-03-01 | 1 | 41 | NULL
2011-03-02 | 1 | 24 | NULL
2011-03-03 | 1 | NULL | 90
2011-04-01 | 1 | 35 | 91
As you can see, Day__calls should be SUM(calls_taken) and Day__qos_score is AVG(score). I've tried using a similar method as above, but as the date isn't known until one of the tables has been joined, it's only displaying a record where there's a day saved.
Is there any way of doing this, or am I going about things the wrong way?
Edit: As requested, here's what I've come up with so far. However, it only shows dates where there's a day.
SELECT COALESCE(`day`.day_date, qos.qos_date) AS Day__date,
employee.id AS Day__Employee__id,
`day`.calls_taken AS Day__Day__calls,
qos.score AS Day__Qos__score
FROM faults_employees `employee`
LEFT JOIN (SELECT `day`.employee_id AS employee_id,
SUM(`day`.calls_taken) AS `calls_in`,
FROM faults_days AS `day`
WHERE employee.id = 7
GROUP BY (`day`.day_date)
) AS `day`
ON `day`.employee_id = `employee`.id
LEFT JOIN (SELECT `qos`.employee_id AS employee_id,
AVG(qos.score) AS `score`
FROM faults_qos qos
WHERE employee.id = 7
GROUP BY (qos.qos_date)
) AS `qos`
ON `qos`.employee_id = `employee`.id AND `qos`.qos_date = `day`.day_date
WHERE employee.id = 7
GROUP BY Day__date
ORDER BY `day`.day_date ASC
The solution I'm coming up with looks like this:
SELECT
`date`,
`employee_id`,
SUM(`union`.`calls_taken`) AS `calls_taken`,
AVG(`union`.`score`) AS `score`
FROM ( -- select from union table
(SELECT -- first select all calls taken, leaving qos_score null
`day`.`day_date` AS `date`,
`day`.`employee_id`,
`day`.`calls_taken`,
NULL AS `score`
FROM `employee`
LEFT JOIN
`day`
ON `day`.`employee_id` = `employee`.`id`
)
UNION ALL -- union both tables; ALL keeps duplicate rows, e.g. two equal qos scores on the same day
(
SELECT -- now select qos score, leaving calls taken null
`qos`.`qos_date` AS `date`,
`qos`.`employee_id`,
NULL AS `calls_taken`,
`qos`.`score`
FROM `employee`
LEFT JOIN
`qos`
ON `qos`.`employee_id` = `employee`.`id`
)
) `union`
GROUP BY `union`.`date`, `union`.`employee_id` -- group union table by date and employee
For the UNION to work, both branches must return the same columns, so we select a NULL score in the day branch and a NULL calls_taken in the qos branch. Otherwise calls_taken and score would end up in the same column of the UNION.
After this, I selected the required fields with the aggregation functions SUM() and AVG() from the union'd table, grouping by the date field in the union table.
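The question also asks for grouping by WEEK(), MONTH() and QUARTER(). Here is a sketch of the same UNION approach grouped per employee and calendar month. It assumes the `day` and `qos` table names used in this answer and the question's sample rows. It uses UNION ALL so that identical rows from the two branches are not collapsed. For the sample data it gives 65 calls with an average score of 90 for March 2011, and 35 calls with a score of 91 for April 2011.

```sql
DROP TABLE IF EXISTS qos, `day`;
CREATE TABLE `day` (
  id          INT PRIMARY KEY,
  employee_id INT  NOT NULL,
  day_date    DATE NOT NULL,
  calls_taken INT  NOT NULL
);
CREATE TABLE qos (
  id          INT PRIMARY KEY,
  employee_id INT  NOT NULL,
  qos_date    DATE NOT NULL,
  score       INT  NOT NULL
);
INSERT INTO `day` VALUES (1, 1, '2011-03-01', 41), (2, 1, '2011-03-02', 24), (3, 1, '2011-04-01', 35);
INSERT INTO qos   VALUES (1, 1, '2011-03-03', 85), (2, 1, '2011-03-03', 95), (3, 1, '2011-04-01', 91);

-- Same union as above, grouped per employee and month instead of per day.
-- Replace MONTH() with QUARTER() (in SELECT and GROUP BY) to group by quarter.
SELECT u.employee_id,
       YEAR(u.`date`)     AS yr,
       MONTH(u.`date`)    AS mon,
       SUM(u.calls_taken) AS calls_taken,
       AVG(u.score)       AS score
FROM (
  SELECT employee_id, day_date AS `date`, calls_taken, NULL AS score FROM `day`
  UNION ALL
  SELECT employee_id, qos_date, NULL, score FROM qos
) AS u
GROUP BY u.employee_id, YEAR(u.`date`), MONTH(u.`date`)
ORDER BY u.employee_id, yr, mon;
```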