Summarize weekly earnings, including empty records - mysql

I'm trying to summarize weekly earnings for a time management system (PHP/MySQL), but having a lot of trouble with this SQL query. Basically, I need to return sums of earnings for the past 8 weeks including records for weeks with no earnings, but I can't make this work when I add anything into the WHERE clause to narrow it down to specific kinds of tasks. There are three tables involved in this query:
tbltask stores information about tasks, including the date the task is logged, which user did it, how much time should be billed, and whether the task is billable (some are not billable and should be excluded from the earnings calculation)...
task_id  task_name        time_est  billable  date_logged  user_id
----------------------------------------------------------------------
223      some task        120       0         2014-12-19   1
224      a billable task  45        1         2014-12-19   2
225      also billable    90        1         2014-12-20   1
tbluser stores user information, so I need to join with it to get the payrate per hour...
user_id payrate
--------------------
1 50
2 75
calendar is just a table with a wide range of dates so that I can join with it and produce results for dates with no records.
datefield
--------------------
2013-01-01
2013-01-02
2013-01-03
[...]
2025-12-31
Below is what I have so far, to give me a total of everyone's earnings by week (starting on a Monday) for the 8 weeks prior to the date in question. This seems to work as expected, but counts all tasks instead of just billable tasks. If there are no tasks logged for any of these weeks, I get records returned with 0 as total_earned, which is important because I need records for the past 8 weeks even if no time is logged.
SELECT FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7)) AS first_day,
SUM( IFNULL( time_est /60 * payrate, 0 ) ) AS total_earned
FROM calendar
LEFT JOIN tbltask ON tbltask.date_logged = calendar.datefield
LEFT JOIN tbluser ON tbltask.user_id = tbluser .user_id
WHERE datefield <= '2014-12-26'
GROUP BY FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7))
ORDER BY FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7)) DESC
LIMIT 8
BUT, I need to only add up earnings for tasks that are billable (billable=1). As soon as I add this in, I no longer get weeks with no logs, so there are weeks missing from the records returned.
SELECT FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7)) AS first_day,
SUM( IFNULL( time_est /60 * payrate, 0 ) ) AS total_earned
FROM calendar
LEFT JOIN tbltask ON tbltask.date_logged = calendar.datefield
LEFT JOIN tbluser ON tbltask.user_id = tbluser .user_id
WHERE datefield <= '2014-12-26' AND billable = 1
GROUP BY FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7))
ORDER BY FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7)) DESC
LIMIT 8
I understand why this result makes sense (because there are no billable=1 tasks completed in those weeks, so no record is returned), but I can't for the life of me figure out how to rewrite the query to get what I want. I would also like to write queries that get the earnings for a particular user instead of all users totalled (user_id=1) but of course that gives me the same problem. I think I might need to use a subquery?
Can anyone point me in the right direction?
SOLUTION:
In case anyone else ends up struggling with something similar, I used terary's suggestion of IF() to move the billable=1 and user_id=1 logic inside the SUM calculation, instead of putting it in the WHERE clause. This solved my problem because it returns all the empty weeks with a 0 for total earnings instead of skipping those weeks without records. I'm sure there are other ways to do this, but this does work. Here's the resulting query:
SELECT FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7)) AS first_day,
SUM(IF(billable=1 AND tbltask.user_id=1, time_est /60 * payrate, 0)) AS total_earned
FROM calendar
LEFT JOIN tbltask ON tbltask.date_logged = calendar.datefield
LEFT JOIN tbluser ON tbltask.user_id = tbluser.user_id
WHERE datefield <= '2014-12-26'
GROUP BY FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7))
ORDER BY FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7)) DESC
LIMIT 8
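For anyone comparing the approaches, here is a minimal sketch of why the WHERE filter drops empty periods while a condition inside the join or inside the aggregate keeps them. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, made only so the example is self-contained), groups by day rather than by week to keep the SQL short, and writes CASE WHEN where MySQL would also accept IF(). Table and column names follow the question; the calendar row for 2014-12-18 is made up to provide a day with no tasks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (datefield TEXT);
    CREATE TABLE tbltask (task_id INTEGER, time_est INTEGER, billable INTEGER,
                          date_logged TEXT, user_id INTEGER);
    CREATE TABLE tbluser (user_id INTEGER, payrate INTEGER);
    INSERT INTO calendar VALUES ('2014-12-18'), ('2014-12-19'), ('2014-12-20');
    INSERT INTO tbltask VALUES
        (223, 120, 0, '2014-12-19', 1),
        (224,  45, 1, '2014-12-19', 2),
        (225,  90, 1, '2014-12-20', 1);
    INSERT INTO tbluser VALUES (1, 50), (2, 75);
""")

BASE = """
    SELECT c.datefield, SUM({expr}) AS total_earned
    FROM calendar c
    LEFT JOIN tbltask t ON t.date_logged = c.datefield {on_extra}
    LEFT JOIN tbluser u ON u.user_id = t.user_id
    {where}
    GROUP BY c.datefield
    ORDER BY c.datefield
"""
EARNED = "t.time_est / 60.0 * u.payrate"


def run(expr, on_extra="", where=""):
    """Fill in the query template and return all rows."""
    return conn.execute(BASE.format(expr=expr, on_extra=on_extra, where=where)).fetchall()


# 1) Filter in WHERE: days without tasks have t.billable = NULL and are discarded.
where_filter = run(f"COALESCE({EARNED}, 0)", where="WHERE t.billable = 1")

# 2) Filter in the ON clause: the calendar row survives with NULL task columns.
on_filter = run(f"COALESCE({EARNED}, 0)", on_extra="AND t.billable = 1")

# 3) Filter inside the aggregate, here for billable tasks of user 1 only.
in_sum = run(f"CASE WHEN t.billable = 1 AND t.user_id = 1 THEN {EARNED} ELSE 0 END")

print(where_filter)  # 2014-12-18 is missing
print(on_filter)     # 2014-12-18 is present with a total of 0
print(in_sum)        # every day is present; only user 1's billable task counts
```

Variants 2 and 3 both keep the empty days. Putting the condition in the ON clause has the side benefit that non-matching task rows are never joined in the first place.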

Forgive me, I didn't create the tables or run the queries, so I can't debug what you have.
However,
A)
You can create a tbltask.wkyear column, which will eliminate the need for one of your tables.
It can be populated on insert via a trigger (maybe a default value?), or you can just write the value yourself.
B) Well, I guess I am uncertain of your goal?
I guess I have half a clue now.
I think SELECT IF(billable=1,50,0) is your friend.
http://dev.mysql.com/doc/refman/5.0/en/control-flow-functions.html
http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_yearweek
SELECT YEARWEEK('1987-01-01');
-> 198653

Related

Growth for each quarter+year in SQL over my user table

I am using MySQL and I have a User database table where my registered users are stored. I'd love to see how many users have registered on an increasing timeline for each quarter. So maybe in Q1 2016 I had 1000 users total, then in Q2 2016 I had 2000 total users registered, in Q3 2016 4000 total users registered, etc. (so I want to see the running total, not just how many registered in each quarter).
From another Stack Overflow post, I was able to create a query to see it by each day:
select u.created, count(*)
from (select distinct date(DateCreated) created from `Users`) u
join `Users` u2 on u.created >= date(u2.DateCreated)
group by u.created
and this works for each day, but I'd like to now group it by quarter and year. I tried using the QUARTER(d) function in mysql and even QUARTER(d) + YEAR(d) to concat it but I still can't get the data right (The count(*) ends up producing incredibly high values).
Would anyone be able to help me get my data grouped by quarter/year? My timestamp column is called DateCreated (it's a unix timestamp in milliseconds, so I have to divide by 1000 too)
Thanks so much
I would suggest using a correlated subquery -- this allows you to easily define each row in the result set. I think this is the logic that you want:
select dates.yyyy, dates.q,
(select count(*)
from Users u
where u.DateCreated < dates.mindc + interval 3 month
) as cnt
from (select year(DateCreated) as yyyy, quarter(DateCreated) as q,
min(DateCreated) as mindc
from Users u
group by year(DateCreated), quarter(DateCreated)
) dates;
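As a rough, runnable sketch of the same correlated-subquery idea, the example below computes a running total per quarter with Python's built-in sqlite3 module standing in for MySQL (an assumption made so it is self-contained). Because the question says DateCreated holds Unix milliseconds, the value is divided by 1000 before conversion; in MySQL the equivalent conversion would be FROM_UNIXTIME(DateCreated / 1000) combined with YEAR() and QUARTER(). The sign-up dates are invented sample data.

```python
import sqlite3
from datetime import datetime, timezone


def ms(year, month, day):
    """Unix timestamp in milliseconds for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp() * 1000)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, DateCreated INTEGER)")
signups = [
    ms(2016, 1, 15), ms(2016, 3, 31),                     # Q1 2016
    ms(2016, 4, 1),                                       # Q2 2016
    ms(2016, 10, 5), ms(2016, 11, 20), ms(2016, 12, 31),  # Q4 2016
    ms(2017, 2, 1),                                       # Q1 2017
]
conn.executemany("INSERT INTO Users (DateCreated) VALUES (?)", [(s,) for s in signups])

rows = conn.execute("""
    WITH uq AS (
        -- one row per user with the year and quarter of registration
        SELECT CAST(strftime('%Y', DateCreated / 1000, 'unixepoch') AS INTEGER) AS yyyy,
               (CAST(strftime('%m', DateCreated / 1000, 'unixepoch') AS INTEGER) + 2) / 3 AS qtr
        FROM Users
    )
    SELECT d.yyyy, d.qtr,
           -- count everyone registered in this quarter or any earlier one
           (SELECT COUNT(*) FROM uq u
             WHERE u.yyyy < d.yyyy OR (u.yyyy = d.yyyy AND u.qtr <= d.qtr)) AS cnt
    FROM (SELECT DISTINCT yyyy, qtr FROM uq) d
    ORDER BY d.yyyy, d.qtr
""").fetchall()

for row in rows:
    print(row)
```

Comparing on (year, quarter) rather than on a date offset avoids counting users from the following quarter. Quarters without any registrations (Q3 2016 in the sample data) produce no row, which is the same behaviour as grouping the Users table directly.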

How to transform a NOT IN into a LEFT JOIN (or a NOT EXISTS)

I have the following query, which is quite complex. I tried to work out how to do this from various sources online, but all the examples use simple queries while mine is more complex, so I couldn't find the solution.
Here's my current query :
SELECT id, category_id, name
FROM orders AS u1
WHERE added < (UTC_TIMESTAMP() - INTERVAL 60 SECOND)
AND (executed IS NULL OR executed < (UTC_DATE() - INTERVAL 1 MONTH))
AND category_id NOT IN (SELECT category_id
FROM orders AS u2
WHERE executed > (UTC_TIMESTAMP() - INTERVAL 5 SECOND)
GROUP BY category_id)
GROUP BY category_id
ORDER BY added ASC
LIMIT 10;
The table orders is like this:
id
category_id
name
added
executed
The purpose of the query is to list n orders (here, 10) that belong to different categories (I have hundreds of categories), so 10 different category_id values. The orders shown here must have been added more than a minute ago (INTERVAL 60 SECOND) and either never executed (IS NULL) or executed more than a month ago.
The NOT IN query is to avoid treating a category_id that has already been treated less than 5 seconds ago. So in the result, I remove all the categories that have been treated less than 5 seconds ago.
I've tried to change the NOT IN in a LEFT JOIN clause or a NOT EXISTS but the switch results in a different set of entries so I believe it's not correct.
Here's what I have so far :
SELECT u1.id, u1.category_id, u1.name, u1.added
FROM orders AS u1
LEFT JOIN orders AS u2
ON u1.category_id = u2.category_id
AND u2.executed > (UTC_TIMESTAMP() - INTERVAL 5 SECOND)
WHERE u1.added < (UTC_TIMESTAMP() - INTERVAL 60 SECOND)
AND (u1.executed IS NULL OR u1.executed < (UTC_DATE() - INTERVAL 1 MONTH))
AND u2.category_id IS NULL
GROUP BY u1.category_id
LIMIT 10
Thank you for your help.
Here's a sample data to try. In that case, there is no "older than 5 seconds" since it's near impossible to get a correct value, but it gives you some data to help out :)
Your query is using a column which doesn't exist in the table as a join condition.
ON u1.domain = u2.category_id
There is no column in your example data called "domain"
Your query is also using the incorrect operator for your 2nd join condition.
AND u2.executed > (UTC_TIMESTAMP() - INTERVAL 5 SECOND)
should be
AND u2.executed < (UTC_TIMESTAMP() - INTERVAL 5 SECOND)
as is used in your first query
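For reference, here is a small sketch of the three anti-join forms the question mentions, run against a simplified, hypothetical orders table with Python's built-in sqlite3 module standing in for MySQL. To keep it deterministic, executed is stored as an integer number of seconds and NOW is a made-up constant that plays the role of UTC_TIMESTAMP(); the GROUP BY, ORDER BY added and LIMIT parts of the original query are left out.

```python
import sqlite3

NOW = 1000  # hypothetical "current time" in seconds, standing in for UTC_TIMESTAMP()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, category_id INTEGER NOT NULL,
                         executed INTEGER);
    INSERT INTO orders VALUES
        (1, 10, NULL), (2, 10, 998),   -- category 10 was executed 2 seconds ago
        (3, 20, NULL), (4, 20, 500),   -- category 20 was last executed long ago
        (5, 30, NULL);                 -- category 30 was never executed
""")

not_in = """
    SELECT u1.id FROM orders u1
    WHERE u1.executed IS NULL
      AND u1.category_id NOT IN (SELECT u2.category_id FROM orders u2
                                 WHERE u2.executed > :now - 5)
    ORDER BY u1.id
"""
not_exists = """
    SELECT u1.id FROM orders u1
    WHERE u1.executed IS NULL
      AND NOT EXISTS (SELECT 1 FROM orders u2
                      WHERE u2.category_id = u1.category_id
                        AND u2.executed > :now - 5)
    ORDER BY u1.id
"""
left_join = """
    SELECT u1.id FROM orders u1
    LEFT JOIN orders u2
           ON u2.category_id = u1.category_id
          AND u2.executed > :now - 5
    WHERE u1.executed IS NULL
      AND u2.id IS NULL              -- keep only rows with no recent match
    ORDER BY u1.id
"""

queries = [("not_in", not_in), ("not_exists", not_exists), ("left_join", left_join)]
results = {name: [r[0] for r in conn.execute(sql, {"now": NOW})] for name, sql in queries}
print(results)
```

On this data all three forms return the same ids. They can diverge when category_id is nullable, because NOT IN returns no rows at all once the subquery yields a NULL. Note also that the LEFT JOIN attempt in the question omits the ORDER BY added ASC of the original, so with LIMIT 10 it may return a different set of rows even if the filtering is equivalent.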

Calculate salary of tutor based on distinct sittings using mysql

I have the following table denoting a tutor teaching pupils in small groups. Each pupil has an entry in the database. A pupil may be alone or in a group. I wish to calculate the tutor's "salary" as follows: payment is based on time spent, which means that each sitting (with one or more pupils) is counted only once - distinct sittings! The start and end times are Unix times.
start       end         attendance
1359882000  1359882090  1
1359867600  1359867690  0
1359867600  1359867690  1
1359867600  1359867690  0
1360472400  1360477800  1
1360472400  1360477800  1
1359867600  1359867690  1
1359914400  1359919800  1
1360000800  1360006200  1
1360000800  1360006200  0
1360000800  1360006200  1
This is what I tried: with no success - I can't get the right duration (number of hours for all distinct sittings)
SELECT YEAR(FROM_UNIXTIME(start)) AS year,
MONTHNAME(STR_TO_DATE(MONTH(FROM_UNIXTIME(start)), '%m')) AS month,
COUNT(DISTINCT start) AS sittings,
SUM(TRUNCATE((end-start)/3600, 1)) as duration
FROM schedules
GROUP BY
YEAR(FROM_UNIXTIME(start)),
MONTH(FROM_UNIXTIME(start))
Thanks for your proposals / support!
EDIT: Required results
Rate = 25
Year Month Sittings Duration Bounty
2013 February 2 2.2 2.2*25
2013 April 4 12.0 12.0*25
You could probably do something with subqueries, I've had a play with SQL fiddle, how does this look for you. Link to sql fiddle : http://sqlfiddle.com/#!2/50718c/3
SELECT
YEAR(d.date) AS year,
MONTH(d.date) AS month,
COUNT(*) AS sittings,
SUM(d.duration) AS duration_secs
FROM (
SELECT
DATE(FROM_UNIXTIME(s.start)) AS date,
s.attendance,
end-start AS duration
FROM schedules s
) d
GROUP BY
year,
month
I couldn't really see where attendance comes into this at present, you didn't specify. The inner query is responsible for taking the schedules, extracting a start date, and a duration (in seconds).
The outer query then uses these derived values but groups them up to get the sums. You could elaborate from here i.e. maybe you only want to select where attendance > 0, or maybe you want to multiply by attendance.
In this next example I have done this, calculating the duration in hours instead, and calculating the applicable duration for sessions with attendance > 0, along with the appropriate bounty, assuming bounty == hours * rate : http://sqlfiddle.com/#!2/50718c/21
SELECT
YEAR(d.date) AS year,
MONTH(d.date) AS month,
COUNT(*) AS sittings,
SUM(d.duration) AS duration,
SUM(
IF(d.attendance>0,1,0)
) AS sittingsWorthBounty,
SUM(
IF(d.attendance>0,d.duration,0)
) AS durationForBounty,
SUM(
IF(d.attendance>0,d.bounty,0)
) AS bounty
FROM (
SELECT
DATE(FROM_UNIXTIME(s.start)) AS date,
s.attendance,
(end-start)/3600 AS duration,
(end-start)/3600 * @rate AS bounty
FROM schedules s,
(SELECT @rate := 25) v
) d
GROUP BY
year,
month
The key point here, is that in the subquery you do all the calculation per-row. The main query then is responsible for grouping up the results and getting your totals. The IF statements in the outer query could easily be moved into the subquery instead, for example. I just included them like this so you could see where the values came from.
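The queries above still count one row per pupil. A possible way to count each sitting only once, as the question asks, is to deduplicate on (start, end) in the subquery before aggregating. The sketch below does this with the question's sample rows, using Python's built-in sqlite3 module in place of MySQL. It assumes that two rows with the same start and end belong to the same sitting and that attendance does not affect payment; neither point is confirmed by the question.

```python
import sqlite3

RATE = 25  # hourly rate, taken from "Rate = 25" in the question

conn = sqlite3.connect(":memory:")
# "end" is quoted because END is a keyword in SQLite.
conn.execute('CREATE TABLE schedules (start INTEGER, "end" INTEGER, attendance INTEGER)')
sample = [
    (1359882000, 1359882090, 1), (1359867600, 1359867690, 0),
    (1359867600, 1359867690, 1), (1359867600, 1359867690, 0),
    (1360472400, 1360477800, 1), (1360472400, 1360477800, 1),
    (1359867600, 1359867690, 1), (1359914400, 1359919800, 1),
    (1360000800, 1360006200, 1), (1360000800, 1360006200, 0),
    (1360000800, 1360006200, 1),
]
conn.executemany("INSERT INTO schedules VALUES (?, ?, ?)", sample)

rows = conn.execute("""
    SELECT strftime('%Y', d.start, 'unixepoch') AS year,
           strftime('%m', d.start, 'unixepoch') AS month,
           COUNT(*) AS sittings,
           SUM(d."end" - d.start) AS duration_secs
    FROM (SELECT DISTINCT start, "end" FROM schedules) d   -- one row per sitting
    GROUP BY year, month
""").fetchall()

for year, month, sittings, duration_secs in rows:
    hours = duration_secs / 3600
    bounty = hours * RATE
    print(year, month, sittings, round(hours, 2), round(bounty, 2))
```

In MySQL the same shape should work with YEAR(FROM_UNIXTIME(start)) and MONTH(FROM_UNIXTIME(start)) in place of strftime. Rounding is done once on the total here, whereas the question's TRUNCATE(..., 1) per row would discard the two 90-second sittings almost entirely.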

Mysql query to calculate total cost

Hi,
I have a table listing_prices (id, listing_id, day_from, day_to, price).
I need to calculate the total cost of a holiday in MySQL because I need to sort the results by total cost.
EX:
VALUES IN TABLE
1 6 2011-04-27 2011-04-30 55,00
2 6 2011-05-01 2011-05-02 60,00
3 6 2011-05-03 2011-05-15 65,00
holiday from 2011-04-28 to 2011-05-05 total cost = 480
Without creating an actual table to represent every day from the start date to the end date, you could use MySQL query variables. The first query can join to any table as long as it has at least as many records as the number of days in the holiday period... in this case, 8 days from April 28 to May 5. Doing a Cartesian join and limiting it to 8 rows will, in essence, create a temporary result set with one record per day, starting with 2011/04/28 (your starting date).
Then, this is joined back to your pricing table that matches the date period and sums the matching price for total costs...
select
sum( pt.price ) as TotalCosts
from
( SELECT
@r := date_add(@r, interval 1 day ) CalendarDate
FROM
-- initialised one day early because @r is incremented before the first row is returned
(select @r := STR_TO_DATE('2011/04/27', '%Y/%m/%d')) vars,
AnyTableWithAtLeast8Days limit 8 ) JustDates,
listing_prices pt
where
JustDates.CalendarDate between pt.day_from and pt.day_to
select count(price) from listing_prices where day_from >= '2011-04-28' and day_to <= '2011-05-05'
-- This will provide a list of ids along with how many days of each price range fall within the holiday
SELECT a.id, DATEDIFF(LEAST(a.day_to, '2011-05-05'), GREATEST(a.day_from, '2011-04-28')) + 1 AS DayCount
FROM listing_prices a
WHERE a.day_from <= '2011-05-05'
AND a.day_to >= '2011-04-28'
-- Based on the previous query, multiply each price by its number of days within the range and sum the results
SELECT SUM( a.price * b.DayCount ) AS Total
FROM listing_prices a
JOIN ( SELECT a.id, DATEDIFF(LEAST(a.day_to, '2011-05-05'), GREATEST(a.day_from, '2011-04-28')) + 1 AS DayCount
FROM listing_prices a
WHERE a.day_from <= '2011-05-05'
AND a.day_to >= '2011-04-28'
) b ON a.id = b.id
Please note that this is untested. I believe the first query should work, but if it doesn't, it can be modified until it does (it needs to return the number of days within each range) and then copied into the subquery of the second query. The second query is the one that you will actually use.
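To check the arithmetic, here is a small runnable sketch of the overlap-days approach, using Python's built-in sqlite3 module as a stand-in for MySQL. It counts, for every price range, the days that fall inside the holiday and multiplies by the price. The rows for listing 6 come from the question (with the comma decimals written as 55.0 and so on); listing 7 is a hypothetical second listing added only to show the sort by total cost. SQLite's two-argument MIN/MAX and julianday() take the place of MySQL's LEAST/GREATEST and DATEDIFF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE listing_prices (id INTEGER PRIMARY KEY, listing_id INTEGER,
                                 day_from TEXT, day_to TEXT, price REAL);
    INSERT INTO listing_prices VALUES
        (1, 6, '2011-04-27', '2011-04-30', 55.0),
        (2, 6, '2011-05-01', '2011-05-02', 60.0),
        (3, 6, '2011-05-03', '2011-05-15', 65.0),
        (4, 7, '2011-04-01', '2011-05-31', 50.0);  -- hypothetical second listing
""")

holiday = {"from_day": "2011-04-28", "to_day": "2011-05-05"}

totals = conn.execute("""
    SELECT listing_id,
           -- days of overlap = (earlier of the two ends) - (later of the two starts) + 1
           SUM(price * (CAST(julianday(MIN(day_to, :to_day))
                           - julianday(MAX(day_from, :from_day)) AS INTEGER) + 1)) AS total_cost
    FROM listing_prices
    WHERE day_from <= :to_day      -- the price range starts before the holiday ends
      AND day_to >= :from_day      -- and ends after the holiday starts
    GROUP BY listing_id
    ORDER BY total_cost
""", holiday).fetchall()

print(totals)
```

For listing 6 this gives 3 days at 55, 2 days at 60 and 3 days at 65, which matches the total of 480 stated in the question. The approach assumes that both the first and the last day of the holiday are charged and that price ranges for one listing do not overlap.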

Create a summary row for data across multiple tables

I'm trying to write a SQL query to generate a summary row for the actions performed by a given user in a given period. I have the following relevant table structure:
users
  id
  team
audit_periods (can be processing, shipping, break, etc)
  user_id
  period_type (can be "processing", "shipping", etc -- not currently normalized)
  started_at
  finished_at (can be null for the current period, hence the logic around times below)
audit_tasks
  audit_period_id
  audit_task_type_id
  created_at
  score
audit_task_types
  name ("scan", "place_in_pallet", etc)
  score (seems redundant, but we need to maintain the score that the audit_task received at the time it was performed, as the audit_task_type score can change later)
For each user for a given period, I'd like to create something like the following row of data:
users.id users.email time_spent_processing time_spent_shipping ... number_of_scans number_of_pallets
which would be calculated by figuring out for each user:
What audit_periods fall at least partially in the desired window? (Uses started_at and finished_at.)
How long did a user spend in each type of audit_period? (Should involve group by audit_periods.period_type, I'd imagine.)
What audit_tasks fall within the desired window? (Uses created_at -- not in the code below yet.)
How many of each type of audit_task did a user accomplish during the window? (Joins out to audit_task_type, and likely involves a group by on audit_task_types.name.)
How many points were earned during the time period? (Sums the scores of all the audit_tasks in the window.)
I've exhausted all of the SQL tricks I know (not many) and came up with something like the following:
select
u.id as user_id,
u.email as email,
u.team as team,
ap.period_type as period_type,
att.name,
time_to_sec(
timediff(least("2011-03-17 00:00:00", ifnull(ap.finished_at, utc_timestamp())), greatest("2011-03-16 00:00:00", ap.started_at))
) as period_duration,
sum(at.score) as period_score
from audit_periods as ap
inner join users as u on ap.user_id = u.id
left join audit_tasks as at on at.audit_period_id = ap.id
left join audit_task_types as att on at.audit_task_type_id = att.id
where (ap.started_at >= "2011-03-16 00:00:00" or (ap.finished_at >= "2011-03-17 00:00:00" and ap.finished_at <= "2011-03-17 00:00:00"))
and (ap.finished_at <= "2011-03-17 00:00:00" or (ap.started_at >= "2011-03-16 00:00:00" and ap.started_at <= "2011-03-16 00:00:00"))
and u.team in ("Foo", "Bar")
group by u.id, ap.id, at.id
but this seems to be functionally equivalent to just selecting all of the audit tasks in the end. I've tried some subqueries as well, but to little avail. More directly, this generates something like (skipping less important columns):
user_id | period_type | period_duration | name | score
1 processing 1800s scan 200
1 shipping 1000s place_in_pallet 100
1 shipping 1000s place_in_pallet 100
1 break 500s null null
when I want:
user_id | processing | shipping | break | scan | place_in_pallet | score
1 1800s 1000s 500s 1 2 400
I can easily fetch all of the audit_tasks for a given user and roll them up in code, but I might be fetching hundreds of thousands of audit_tasks over a given period, so it needs to be done in SQL.
Just to be clear -- I'm looking for a query to generate one row per user, containing summary data collected across the other 3 tables. So, for each user, I want to know how much time he spent in each type of audit_period (3600 seconds processing, 3200 seconds shipping, etc), as well as how many of each audit_task he performed (5 scans, 10 items placed in pallet, etc).
I think I have the elements of a solution, I'm just having trouble piecing them together. I know exactly how I would accomplish this in Ruby/Java/etc, but I don't think I understand SQL well enough to know which tool I'm missing. Do I need a temp table? A union? Some other construct entirely?
Any help is greatly appreciated, and I can clarify if the above is complete nonsense.
You will need to break this up into two crosstab queries which give you the information about audit_periods by user and another query that will give you the audit_task information by user and then join that to the Users table. It isn't clear how you want to roll up the information in each of the cases. For example, if a given user has 10 audit_period rows, how should the query roll up those durations? I assumed a sum of the durations here but you might want a min or max or perhaps even an overall delta.
Select U.id As user_id
, AuditPeriodByUser.TotalDuration_Processing As processing
, AuditPeriodByUser.TotalDuration_Shipping As shipping
, AuditPeriodByUser.TotalDuration_Break As break
, AuditTasksByUser.TotalCount_Scan As scan
, AuditTasksByUser.TotalCount_Place_In_Pallet As place_in_pallet
, AuditTasksByUser.TotalScore As score
From users As U
Left Join (
Select AP.user_id
, Sum( Case When AP.period_type = 'processing'
Then Time_To_Sec(
TimeDiff(
Coalesce(AP.finished_at, UTC_TIMESTAMP()), AP.started_at ) ) End )
As TotalDuration_Processing
, Sum( Case When AP.period_type = 'shipping'
Then Time_To_Sec(
TimeDiff(
Coalesce(AP.finished_at, UTC_TIMESTAMP()), AP.started_at ) ) End )
As TotalDuration_Shipping
, Sum( Case When AP.period_type = 'break'
Then Time_To_Sec(
TimeDiff(
Coalesce(AP.finished_at, UTC_TIMESTAMP()), AP.started_at ) ) End )
As TotalDuration_Break
From audit_periods As AP
Where AP.started_at >= @StartDate
And AP.finished_at <= @EndDate
Group by AP.user_id
) As AuditPeriodByUser
On AuditPeriodByUser.user_id = U.id
Left Join (
Select AP.user_id
, Sum( Case When ATT.name = 'scan' Then 1 Else 0 End ) As TotalCount_Scan
, Sum( Case When ATT.name = 'place_in_pallet' Then 1 Else 0 End ) As TotalCount_Place_In_Pallet
, Sum( AT.score ) As TotalScore
From audit_tasks As AT
Join audit_task_types As ATT
On ATT.id = AT.audit_task_type_id
Join audit_periods As AP
On AT.audit_period_id = AP.id
Where AP.started_at >= @StartDate
And AP.finished_at <= @EndDate
Group By AP.user_id
) As AuditTasksByUser
On AuditTasksByUser.user_id = U.id
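The two-subquery shape can be tried out with the sketch below, which uses Python's built-in sqlite3 module instead of MySQL and reproduces the sample figures from the question (1800s processing, 1000s shipping, 500s break, one scan, two pallet placements, score 400). To stay short it stores started_at and finished_at as plain integer seconds and leaves out the date-window filter; both simplifications are assumptions of this example, as are the id columns and the second user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, team TEXT);
    CREATE TABLE audit_periods (id INTEGER PRIMARY KEY, user_id INTEGER, period_type TEXT,
                                started_at INTEGER, finished_at INTEGER);
    CREATE TABLE audit_task_types (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE audit_tasks (id INTEGER PRIMARY KEY, audit_period_id INTEGER,
                              audit_task_type_id INTEGER, score INTEGER);
    INSERT INTO users VALUES (1, 'Foo'), (2, 'Bar');
    INSERT INTO audit_periods VALUES
        (1, 1, 'processing',    0, 1800),
        (2, 1, 'shipping',   1800, 2800),
        (3, 1, 'break',      2800, 3300);
    INSERT INTO audit_task_types VALUES (1, 'scan'), (2, 'place_in_pallet');
    INSERT INTO audit_tasks VALUES (1, 1, 1, 200), (2, 2, 2, 100), (3, 2, 2, 100);
""")

rows = conn.execute("""
    SELECT u.id, p.processing, p.shipping, p.break_secs,
           t.scan, t.place_in_pallet, t.score
    FROM users u
    LEFT JOIN (
        -- durations pivoted per user, computed without touching audit_tasks
        SELECT user_id,
               SUM(CASE WHEN period_type = 'processing' THEN finished_at - started_at ELSE 0 END) AS processing,
               SUM(CASE WHEN period_type = 'shipping'   THEN finished_at - started_at ELSE 0 END) AS shipping,
               SUM(CASE WHEN period_type = 'break'      THEN finished_at - started_at ELSE 0 END) AS break_secs
        FROM audit_periods
        GROUP BY user_id
    ) p ON p.user_id = u.id
    LEFT JOIN (
        -- task counts and score pivoted per user
        SELECT ap.user_id,
               SUM(CASE WHEN att.name = 'scan' THEN 1 ELSE 0 END) AS scan,
               SUM(CASE WHEN att.name = 'place_in_pallet' THEN 1 ELSE 0 END) AS place_in_pallet,
               SUM(tk.score) AS score
        FROM audit_tasks tk
        JOIN audit_task_types att ON att.id = tk.audit_task_type_id
        JOIN audit_periods ap ON ap.id = tk.audit_period_id
        GROUP BY ap.user_id
    ) t ON t.user_id = u.id
    ORDER BY u.id
""").fetchall()

for row in rows:
    print(row)
```

Aggregating periods and tasks in separate subqueries matters because joining them first would repeat each period once per task: the 1000s shipping period in the sample data would be summed twice. Users with no activity (user 2 here) come back with NULLs, which can be wrapped in COALESCE(..., 0) if zeros are preferred.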