I have a list of workers and would like to understand how many, and what percentage, of the workers each day were returning workers. I'm defining a returner as someone who worked at any time from the beginning of the year to the day before the date in the specific row. The output I'm looking for:
Date, # of workers total today, # of returners, % returners
7/29 , 5, 3, 60%
7/28 , 10, 5, 50%
and so on...
For 7/29, 3 workers worked on or before 7/28 (day before).
For 7/28, 5 workers worked on or before 7/27 (day before).
I currently have a list of workers that came in for today and a list of workers that came in for all days before today, but I'm not sure how to put them together in one query. Currently the dates are hardcoded; how do I run this query for each day in the last 2 months? I'm also not sure that the way I'm using dates is correct; is there a better way than using BETWEEN?
What I currently have:
List of workers for 1 day (the date should be dynamic depending on the date of the row):
select
/* date_format(CONVERT_TZ(backend_shiftgroup.ends_at,'+00:00','{UTC_OFFSET.RAW}:00'), '%X-W%v') as Week, */
backend_userprofile.name,
business.name as 'Business',
worker_id,
ends_at
from backend_shift
join backend_shiftgroup on backend_shift.shift_group_id = backend_shiftgroup.id
join backend_gigtemplate on backend_shift.gig_id = backend_gigtemplate.id
join business on backend_gigtemplate.business_id = business.id
join backend_company on backend_company.id = backend_gigtemplate.company_id
join backend_userprofile on backend_shift.worker_id = backend_userprofile.id
where is_cancelled = 0
and worker_id is not null
and ends_at BETWEEN '2020-07-28' and '2020-07-29'
group by worker_id
order by count(backend_shift.id) desc;
List of all workers before today:
select
/* date_format(CONVERT_TZ(backend_shiftgroup.ends_at,'+00:00','{UTC_OFFSET.RAW}:00'), '%X-W%v') as Week, */
count(backend_shift.id) as '# of shifts',
backend_userprofile.name,
business.name as 'Business',
worker_id,
ends_at
from backend_shift
join backend_shiftgroup on backend_shift.shift_group_id = backend_shiftgroup.id
join backend_gigtemplate on backend_shift.gig_id = backend_gigtemplate.id
join business on backend_gigtemplate.business_id = business.id
join backend_company on backend_company.id = backend_gigtemplate.company_id
join backend_userprofile on backend_shift.worker_id = backend_userprofile.id
where is_cancelled = 0
and worker_id is not null
and ends_at <= '2020-07-28'
group by worker_id
order by count(backend_shift.id) desc;
I have been tasked with finding how many users performed a transaction in every month in 2020.
I know I have two tables to work with.
Table Name: Receipts|Columns: receipt_id, collection_id, user_id, amount
Table Name: Games |Columns: game_id, collection_id, game_date_time
I tried this, but I don't think it makes sense or works:
select month(games.game_date_time) AS Month, sum(receipts.id) from bills
join games on bills.collection_id = games.collection_id
WHERE YEAR(games.game_date_time) = 2020
group by receipts.user_id, month(games.game_date_time)
order by month(games.game_date_time)
Use COUNT() to get a count, not SUM(). And if you want a count of users, without counting the same user twice, use COUNT(DISTINCT user_id), don't put user_id in the grouping.
SELECT MONTH(g.game_date_time) AS month, COUNT(DISTINCT r.user_id) AS users
FROM receipts AS r
JOIN games AS g ON r.collection_id = g.collection_id
WHERE YEAR(g.game_date_time) = 2020
GROUP BY month
ORDER BY month
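The difference between counting rows and counting distinct users can be checked on a tiny dataset. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, so MONTH() and YEAR() are replaced by strftime(), and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE games (game_id INTEGER, collection_id INTEGER, game_date_time TEXT);
CREATE TABLE receipts (receipt_id INTEGER, collection_id INTEGER, user_id INTEGER, amount REAL);
-- invented sample data
INSERT INTO games VALUES (1, 10, '2020-01-05 12:00:00'),
                         (2, 20, '2020-01-20 12:00:00'),
                         (3, 30, '2020-02-11 12:00:00'),
                         (4, 40, '2019-02-11 12:00:00');
INSERT INTO receipts VALUES (1, 10, 7, 5.0),   -- user 7, January
                            (2, 20, 7, 5.0),   -- user 7 again, January
                            (3, 20, 8, 5.0),   -- user 8, January
                            (4, 30, 7, 5.0),   -- user 7, February
                            (5, 40, 9, 5.0);   -- user 9, but in 2019
""")

# strftime() replaces MySQL's MONTH()/YEAR() because this runs on SQLite.
monthly_users = conn.execute("""
    SELECT CAST(strftime('%m', g.game_date_time) AS INTEGER) AS month,
           COUNT(r.user_id)          AS receipt_rows,
           COUNT(DISTINCT r.user_id) AS users
    FROM receipts AS r
    JOIN games AS g ON r.collection_id = g.collection_id
    WHERE strftime('%Y', g.game_date_time) = '2020'
    GROUP BY month
    ORDER BY month
""").fetchall()

print(monthly_users)  # [(1, 3, 2), (2, 1, 1)]
```

In January there are three receipt rows but only two distinct users, which is the gap between COUNT() and COUNT(DISTINCT ...).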
find how many users performed a transaction in every month in 2020
SELECT COUNT(*) AS users
FROM (
SELECT r.user_id
FROM receipts AS r
JOIN games AS g USING (collection_id)
WHERE YEAR(g.game_date_time) = 2020
GROUP BY r.user_id
HAVING COUNT(DISTINCT MONTH(g.game_date_time)) = MONTH(CURRENT_DATE)
) AS users_in_every_month
This query:
Selects rows for the year 2020 only.
For each user, counts the distinct months in which the user has payments and compares that count with the current month number. If the user has payments in every month (including the current one!), the two values are equal.
Counts the users that match this condition.
PS. The query will fail from 2021 onward. To get correct results in the future, use
HAVING COUNT(DISTINCT MONTH(g.game_date_time)) = CASE YEAR(CURRENT_DATE)
WHEN 2020
THEN MONTH(CURRENT_DATE)
ELSE 12
END
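A small reproducible sketch of the "present in every month" check, run on SQLite through Python's sqlite3 module rather than MySQL. The sample rows and the months_required parameter are assumptions made for the example; it stands in for MONTH(CURRENT_DATE) so that the result does not depend on the day the code is run. The grouped query is wrapped in an outer COUNT(*) so that a single number comes back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE games (collection_id INTEGER, game_date_time TEXT);
CREATE TABLE receipts (receipt_id INTEGER, collection_id INTEGER, user_id INTEGER);
-- invented sample data
INSERT INTO games VALUES (1, '2020-01-10'), (2, '2020-02-10'), (3, '2020-03-10');
INSERT INTO receipts VALUES (1, 1, 7), (2, 2, 7), (3, 3, 7),  -- user 7: Jan, Feb, Mar
                            (4, 1, 8), (5, 1, 8), (6, 3, 8);  -- user 8: Jan twice, Mar
""")

# Assumption for the example: the year so far covers January through March.
months_required = 3

users_in_every_month = conn.execute("""
    SELECT COUNT(*)
    FROM (
        -- one row per user who has receipts in the required number of distinct months
        SELECT r.user_id
        FROM receipts AS r
        JOIN games AS g USING (collection_id)
        WHERE strftime('%Y', g.game_date_time) = '2020'
        GROUP BY r.user_id
        HAVING COUNT(DISTINCT strftime('%m', g.game_date_time)) = ?
    ) AS per_user
""", (months_required,)).fetchone()[0]

print(users_in_every_month)  # 1
```

User 8 has three receipts but only two distinct months, so only user 7 is counted.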
I have two MySQL tables memberships and member_cards. Each membership & member card can have three states.
Active = start_date <= today <= end_date
Future = today < start_date
Expired = end_date < today
Memberships table
id--------membership_number--------start_date-------------end_date
1--------**123**--------------------------------09-20-2014-----------09-20-2015
2--------**123**--------------------------------09-20-2015-----------09-20-2016
3--------**123**--------------------------------09-20-2016-----------09-20-2017
4--------**123**--------------------------------09-20-2017-----------09-20-2018
5--------**456**--------------------------------09-20-2013-----------09-20-2014
6--------**456**--------------------------------09-20-2014-----------09-20-2015
Membership cards
id--------membership_id-------------start_date-------------end_date
1--------**1**--------------------------------09-20-2014-----------05-15-2015
2--------**1**--------------------------------09-20-2014-----------09-20-2015
3--------**2**--------------------------------09-20-2015-----------05-13-2016
4--------**2**--------------------------------09-20-2015-----------09-20-2016
5--------**3**--------------------------------09-20-2016-----------09-21-2016 (past)
6--------**3**--------------------------------09-20-2016-----------05-15-2017
7--------**3**--------------------------------09-20-2016-----------09-20-2017
8--------**4**--------------------------------09-20-2017-----------05-13-2017
9--------**4**--------------------------------09-20-2017-----------09-20-2018
10-------**5**--------------------------------09-20-2013-----------05-13-2014
11-------**5**--------------------------------09-20-2013-----------09-20-2014
12------**6**--------------------------------09-20-2014-----------05-13-2015
13-----**6**--------------------------------09-20-2014-----------09-20-2015
I want to retrieve
All the active + future memberships + (if there are no active or future memberships for a particular membership number, the last expired record)
The results:
id--------membership_number--------start_date-------------end_date
3--------**123**--------------------------------09-20-2016-----------09-20-2017
4--------**123**--------------------------------09-20-2017-----------09-20-2018
6--------**456**--------------------------------09-20-2014-----------09-20-2015
Active cards + (if the membership has expired, all the cards tied to that membership )
The results:
id--------membership_id-------------start_date-------------end_date
6--------**3**--------------------------------09-20-2016-----------05-15-2017
7--------**3**--------------------------------09-20-2016-----------09-20-2017
8--------**4**--------------------------------09-20-2017-----------05-13-2017
9--------**4**--------------------------------09-20-2017-----------09-20-2018
12------**6**--------------------------------09-20-2014-----------05-13-2015
13-----**6**--------------------------------09-20-2014-----------09-20-2015
Each table contains about 200k records. I am trying to do the second query (for the member_cards) using a single MySQL query using UNION. Are there any better approaches?
As has been said, your question is a bit unclear, but I've written a query that returns your 2nd target table based on the 1st one. The inner query returns your 1st target.
select
b.id#,
b.membership_id,
b.start_date,
b.end_date
from membership_cards b
full outer join (
select * from
(select a.*,
max(end_date) over (partition by membership_number) max_end_date,
case when
end_date>=sysdate --replace with today or whatever
then 1 --active
else 0 --inactive
end index_active
from memberships a)
where (end_date<=max_end_date and end_date>=sysdate) or
end_date = max_end_date) c
on c.id# = membership_id
where (c.index_active = 1 and b.end_date >= sysdate) or
c.index_active = 0
I have two tables, one is a list of firms, the other is a list of jobs the firms have advertised with deadlines for application and start dates.
Some of the firms will have advertised no jobs, some will only have jobs that are past their deadline dates, some will only have live jobs and others will have past and live applications.
What I want to be able to show as a result of a query is a list of all the firms, with the nearest deadline they have, sorted by that deadline. So the result might look something like this (if today was 2015-01-01).
Sorry, I misstated that. What I want to be able to do is find the next future deadline, and if there is no future deadline then show the last past deadline. So in the first table below the BillCo deadline has passed, but the next BuffyCo deadline is shown. In the BillCo case there are only earlier deadlines, but in the BuffyCo case there are both earlier and later deadlines.
id name title date
== ==== ===== ====
1 BobCo null null
2 BillCo Designer 2014-12-01
3 BuffyCo Admin 2015-01-31
So, BobCo has no jobs listed at all, BillCo has a deadline that has passed and BuffyCo has a deadline in the future.
The problematic part is that BillCo may have a set of jobs like this:
id title date desired hit
== ===== ==== ===========
1 Coder 2013-12-01
2 Manager 2014-06-30
3 Designer 2014-12-01 <--
And BuffyCo might have:
id title date desired hit
== ===== ==== ===========
1 Magician 2013-10-01
2 Teaboy 2014-05-19
3 Admin 2015-01-31 <--
4 Writer 2015-02-28
So, I can do something like:
select * from (
select * from firms
left join jobs on firms.id = jobs.firmid
order by date desc)
as t1 group by firmid;
Or, limit the jobs joined or returned by a date criterion, but I don't seem to be able to get the records I want returned. ie the above query would return:
id name title date
== ==== ===== ====
1 BobCo null null
2 BillCo Designer 2014-12-01
3 BuffyCo Writer 2015-02-28
For BuffyCo it's returning the Writer job rather than the Admin job.
Is it impossible with an SQL query? Any advice appreciated, thanks in advance.
I think this may be what you need. You need to:
1) calculate the delta between each job's date and the current date, and find the minimum delta for each firm.
2) join firms to jobs only where the firm ids match and where the calculated minimum delta for the firm matches the delta for the row in jobs.
SELECT f.id, f.name, j.title, j.date
FROM firms f LEFT JOIN
(SELECT firmid, MIN(abs(datediff(date, curdate()))) AS delta
FROM jobs
GROUP BY firmid) d
ON f.id = d.firmid
LEFT JOIN jobs j ON f.id = j.firmid AND d.delta = abs(datediff(j.date, curdate()));
You want to make an outer join with something akin to the group-wise maximum of (next upcoming, last expired):
SELECT * FROM firms LEFT JOIN (
-- fetch the "groupwise" record
jobs NATURAL JOIN (
-- using the relevant date for each firm
SELECT firmid, MAX(closest_date) date
FROM (
-- next upcoming deadline
SELECT firmid, MIN(date) closest_date
FROM jobs
WHERE date >= CURRENT_DATE
GROUP BY firmid
UNION ALL
-- most recent expired deadline
SELECT firmid, MAX(date)
FROM jobs
WHERE date < CURRENT_DATE
GROUP BY firmid
) closest_dates
GROUP BY firmid
) selected_dates
) ON jobs.firmid = firms.id
This will actually give you all jobs that have the best deadline date for each firm. If you want to restrict the results to an indeterminate record from each such group, you can add GROUP BY firms.id to the very end.
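The selection rule (next upcoming deadline, otherwise the most recent expired one) can be tried against the sample data from the question. This is a sketch on SQLite via Python's sqlite3 module, not MySQL; it uses explicit joins instead of NATURAL JOIN, and it fixes "today" to 2015-01-01 through a bound parameter (an assumption made so the output is reproducible) instead of CURRENT_DATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE firms (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE jobs (id INTEGER, firmid INTEGER, title TEXT, date TEXT);
INSERT INTO firms VALUES (1, 'BobCo'), (2, 'BillCo'), (3, 'BuffyCo');
INSERT INTO jobs VALUES (1, 2, 'Coder',    '2013-12-01'),
                        (2, 2, 'Manager',  '2014-06-30'),
                        (3, 2, 'Designer', '2014-12-01'),
                        (1, 3, 'Magician', '2013-10-01'),
                        (2, 3, 'Teaboy',   '2014-05-19'),
                        (3, 3, 'Admin',    '2015-01-31'),
                        (4, 3, 'Writer',   '2015-02-28');
""")

today = "2015-01-01"  # fixed for reproducibility; the original uses CURRENT_DATE

rows = conn.execute("""
    SELECT f.id, f.name, j.title, j.date
    FROM firms AS f
    LEFT JOIN (
        -- any upcoming date is later than any expired date, so MAX() prefers it
        SELECT firmid, MAX(closest_date) AS picked
        FROM (
            SELECT firmid, MIN(date) AS closest_date   -- next upcoming deadline
            FROM jobs WHERE date >= :today GROUP BY firmid
            UNION ALL
            SELECT firmid, MAX(date)                   -- most recent expired deadline
            FROM jobs WHERE date < :today GROUP BY firmid
        ) AS closest_dates
        GROUP BY firmid
    ) AS p ON p.firmid = f.id
    LEFT JOIN jobs AS j ON j.firmid = p.firmid AND j.date = p.picked
    ORDER BY f.id
""", {"today": today}).fetchall()

for row in rows:
    print(row)
```

With this data BobCo comes back with NULLs, BillCo with Designer (2014-12-01), and BuffyCo with Admin (2015-01-31) rather than Writer.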
The revision to your question makes it rather trickier, but it can still be done. Try this:
select
closest_job.*, firms.name
from
firms
left join (
select future_job.*
from
(
select firmid, min(date) as mindate
from jobs
where date >= curdate()
group by firmid
) future
inner join jobs future_job
on future_job.firmid = future.firmid and future_job.date = future.mindate
union all
select past_job.*
from
(
select firmid, max(date) as maxdate
from jobs
group by firmid
having max(date) < curdate()
) past
inner join jobs past_job
on past_job.firmid = past.firmid and past_job.date = past.maxdate
) closest_job
on firms.id = closest_job.firmid
I think this does what I need:
select * from (
select firms.name, t2.closest_date from firms
left join
(
select * from (
-- get first date in the future
SELECT firmid, MIN(date) closest_date
FROM jobs
WHERE date >= CURRENT_DATE
GROUP BY firmid
UNION ALL
-- most recent expired deadline
SELECT firmid, MAX(date)
FROM jobs
WHERE date < CURRENT_DATE
GROUP BY firmid) as t1
-- order so latest date is first
order by closest_date desc) as t2
on firms.id = t2.firmid
-- group by eliminates all but latest date
group by firms.id) as t3
order by closest_date asc;
Thanks for all the help on this
I'm trying to summarize weekly earnings for a time management system (PHP/MySQL), but having a lot of trouble with this SQL query. Basically, I need to return sums of earnings for the past 8 weeks including records for weeks with no earnings, but I can't make this work when I add anything into the WHERE clause to narrow it down to specific kinds of tasks. There are three tables involved in this query:
tbltask stores information about tasks, including the date the task is logged, which user did it, how much time should be billed for, and whether the task is billable or not (some are not billable and should be excluded from the earnings calculation)...
task_id task_name time_est billable date_logged user_id
----------------------------------------------------------------------
223 some task 120 0 2014-12-19 1
224 a billable task 45 1 2014-12-19 2
225 also billable 90 1 2014-12-20 1
tbluser stores user information, so i need to join with it to get the payrate per hour...
user_id payrate
--------------------
1 50
2 75
calendar is just a table with a wide range of dates so that I can join with it and produce results for dates with no records.
datefield
--------------------
2013-01-01
2013-01-02
2013-01-03
[...]
2025-12-31
Below is what I have so far, to give me a total of everyone's earnings by week (starting on a Monday) for the 8 weeks prior to the date in question. This seems to work as expected, but counts all tasks instead of just billable tasks. If there are no tasks logged for any of these weeks, I get records returned with 0 as total_earned, which is important because I need records for the past 8 weeks even if no time is logged.
SELECT FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7)) AS first_day,
SUM( IFNULL( time_est /60 * payrate, 0 ) ) AS total_earned
FROM calendar
LEFT JOIN tbltask ON tbltask.date_logged = calendar.datefield
LEFT JOIN tbluser ON tbltask.user_id = tbluser .user_id
WHERE datefield <= '2014-12-26'
GROUP BY FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7))
ORDER BY FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7)) DESC
LIMIT 8
BUT, I need to only add up earnings for tasks that are billable (billable=1). As soon as I add this in, I no longer get weeks with no logs, so there are weeks missing from the records returned.
SELECT FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7)) AS first_day,
SUM( IFNULL( time_est /60 * payrate, 0 ) ) AS total_earned
FROM calendar
LEFT JOIN tbltask ON tbltask.date_logged = calendar.datefield
LEFT JOIN tbluser ON tbltask.user_id = tbluser .user_id
WHERE datefield <= '2014-12-26' AND billable = 1
GROUP BY FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7))
ORDER BY FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7)) DESC
LIMIT 8
I understand why this result makes sense (because there are no billable=1 tasks completed in those weeks, so no record is returned), but I can't for the life of me figure out how to rewrite the query to get what I want. I would also like to write queries that get the earnings for a particular user instead of all users totalled (user_id=1) but of course that gives me the same problem. I think I might need to use a subquery?
Can anyone point me in the right direction?
SOLUTION:
In case anyone else ends up struggling with something similar, I used terary's suggestion of IF() to move the billable=1 and user_id=1 logic inside the SUM calculation, instead of putting it in the WHERE clause. This solved my problem because it returns all the empty weeks with a 0 for total earnings instead of skipping those weeks without records. I'm sure there are other ways to do this, but this does work. Here's the resulting query:
SELECT FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7)) AS first_day,
SUM(IF(billable=1 AND tbltask.user_id=1, time_est /60 * payrate, 0)) AS total_earned
FROM calendar
LEFT JOIN tbltask ON tbltask.date_logged = calendar.datefield
LEFT JOIN tbluser ON tbltask.user_id = tbluser .user_id
WHERE datefield <= '2014-12-26'
GROUP BY FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7))
ORDER BY FROM_DAYS(TO_DAYS(datefield) - MOD(TO_DAYS(datefield)-2, 7)) DESC
LIMIT 8
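To see why moving the condition out of WHERE and into the SUM keeps the empty weeks, here is a small sketch on SQLite via Python's sqlite3 module. It reuses the three sample tasks and two pay rates from the question, but the two-week calendar, the weekly() helper, and the SQLite date() expression for the Monday of each week (used in place of the FROM_DAYS/TO_DAYS arithmetic) are assumptions made for the example.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar (datefield TEXT);
CREATE TABLE tbltask (task_id INTEGER, time_est INTEGER, billable INTEGER,
                      date_logged TEXT, user_id INTEGER);
CREATE TABLE tbluser (user_id INTEGER, payrate REAL);
INSERT INTO tbltask VALUES (223, 120, 0, '2014-12-19', 1),
                           (224,  45, 1, '2014-12-19', 2),
                           (225,  90, 1, '2014-12-20', 1);
INSERT INTO tbluser VALUES (1, 50), (2, 75);
""")
# Two full weeks of calendar rows, Monday 2014-12-08 through Sunday 2014-12-21.
start = date(2014, 12, 8)
conn.executemany("INSERT INTO calendar VALUES (?)",
                 [((start + timedelta(days=i)).isoformat(),) for i in range(14)])

# SQLite expression for the Monday of the week containing datefield.
WEEK_START = "date(c.datefield, 'weekday 0', '-6 days')"

def weekly(total_expr, where=""):
    """Hypothetical helper: weekly totals for a given SUM expression and optional WHERE."""
    sql = f"""
        SELECT {WEEK_START} AS first_day, SUM({total_expr}) AS total_earned
        FROM calendar AS c
        LEFT JOIN tbltask AS t ON t.date_logged = c.datefield
        LEFT JOIN tbluser AS u ON u.user_id = t.user_id
        {where}
        GROUP BY first_day
        ORDER BY first_day DESC
    """
    return conn.execute(sql).fetchall()

# Filtering in WHERE discards calendar rows that have no billable task.
filtered = weekly("t.time_est / 60.0 * u.payrate", "WHERE t.billable = 1")
# Testing inside the SUM keeps every calendar row and adds 0 for non-matching ones.
conditional = weekly(
    "CASE WHEN t.billable = 1 THEN t.time_est / 60.0 * u.payrate ELSE 0.0 END")

print(filtered)     # [('2014-12-15', 131.25)]
print(conditional)  # [('2014-12-15', 131.25), ('2014-12-08', 0.0)]
```

The WHERE version loses the week of 2014-12-08 entirely, while the conditional sum reports it with a total of 0.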
Forgive me, I didn't create the tables to run the queries, so I can't debug what you have.
However,
A)
You can create tbltask.wkyear, which will eliminate the need for one of your tables.
It can be populated on insert via a trigger (maybe a default value?), or you can just write the value yourself.
B) Well, I guess I am uncertain of your goal?
I guess I have half a clue now.
I think SELECT IF(billable=1,50,0) is your friend.
http://dev.mysql.com/doc/refman/5.0/en/control-flow-functions.html
http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_yearweek
SELECT YEARWEEK('1987-01-01');
-> 198653
I'm trying to write a SQL query to generate a summary row for the actions performed by a given user in a given period. I have the following relevant table structure:
users
id
team
audit_periods (can be processing, shipping, break, etc)
user_id
period_type (can be "processing", "shipping", etc -- not currently normalized)
started_at
finished_at (can be null for the current period, hence the logic around times below)
audit_tasks
audit_period_id
audit_task_type_id
created_at
score
audit_task_types
name ("scan", "place_in_pallet", etc)
score (seems redundant, but we need to maintain the score that the audit_task received at the time it was performed, as the audit_task_type score can change later)
For each user for a given period, I'd like to create something like the following row of data:
users.id users.email time_spent_processing time_spent_shipping ... number_of_scans number_of_pallets
which would be calculated by figuring out for each user:
What audit_periods fall at least partially in the desired window? (Uses started_at and finished_at.)
How long did a user spend in each type of audit_period? (Should involve group by audit_periods.period_type, I'd imagine.)
What audit_tasks fall within the desired window? (Uses created_at -- not in the code below yet.)
How many of each type of audit_task did a user accomplish during the window? (Joins out to audit_task_type, and likely involves a group by on audit_task_types.name.)
How many points were earned during the time period? (Sums the scores of all the audit_tasks in the window.)
I've exhausted all of the SQL tricks I know (not many) and came up with something like the following:
select
u.id as user_id,
u.email as email,
u.team as team,
ap.period_type as period_type,
att.name,
time_to_sec(
timediff(least("2011-03-17 00:00:00", ifnull(ap.finished_at, utc_timestamp())), greatest("2011-03-16 00:00:00", ap.started_at))
) as period_duration,
sum(at.score) as period_score
from audit_periods as ap
inner join users as u on ap.user_id = u.id
left join audit_tasks as at on at.audit_period_id = ap.id
left join audit_task_types as att on at.audit_task_type_id = att.id
where (ap.started_at >= "2011-03-16 00:00:00" or (ap.finished_at >= "2011-03-17 00:00:00" and ap.finished_at <= "2011-03-17 00:00:00"))
and (ap.finished_at <= "2011-03-17 00:00:00" or (ap.started_at >= "2011-03-16 00:00:00" and ap.started_at <= "2011-03-16 00:00:00"))
and u.team in ("Foo", "Bar")
group by u.id, ap.id, at.id
but this seems to be functionally equivalent to just selecting all of the audit tasks in the end. I've tried some subqueries as well, but to little avail. More directly, this generates something like (skipping less important columns):
user_id | period_type | period_duration | name | score
1 processing 1800s scan 200
1 shipping 1000s place_in_pallet 100
1 shipping 1000s place_in_pallet 100
1 break 500s null null
when I want:
user_id | processing | shipping | break | scan | place_in_pallet | score
1 1800s 1000s 500s 1 2 400
I can easily fetch all of the audit_tasks for a given user and roll them up in code, but I might be fetching hundreds of thousands of audit_tasks over a given period, so it needs to be done in SQL.
Just to be clear -- I'm looking for a query to generate one row per user, containing summary data collected across the other 3 tables. So, for each user, I want to know how much time he spent in each type of audit_period (3600 seconds processing, 3200 seconds shipping, etc), as well as how many of each audit_task he performed (5 scans, 10 items placed in pallet, etc).
I think I have the elements of a solution, I'm just having trouble piecing them together. I know exactly how I would accomplish this in Ruby/Java/etc, but I don't think I understand SQL well enough to know which tool I'm missing. Do I need a temp table? A union? Some other construct entirely?
Any help is greatly appreciated, and I can clarify if the above is complete nonsense.
You will need to break this up into two crosstab queries: one that gives you the audit_periods information by user and another that gives you the audit_task information by user. Then join both to the users table. It isn't clear how you want to roll up the information in each of the cases. For example, if a given user has 10 audit_period rows, how should the query roll up those durations? I assumed a sum of the durations here, but you might want a min or max or perhaps even an overall delta.
Select U.id As user_id
, AuditPeriodByUser.TotalDuration_Processing As processing
, AuditPeriodByUser.TotalDuration_Shipping As shipping
, AuditPeriodByUser.TotalDuration_Break As break
, AuditTasksByUser.TotalCount_Scan As scan
, AuditTasksByUser.TotalCount_Place_In_Pallet As place_in_pallet
, AuditTasksByUser.TotalScore As score
From users As U
Left Join (
Select AP.user_id
, Sum( Case When AP.period_type = 'processing'
Then Time_To_Sec(
TimeDiff(
Coalesce(AP.finished_at, UTC_TIMESTAMP()), AP.started_at ) ) End )
As TotalDuration_Processing
, Sum( Case When AP.period_type = 'shipping'
Then Time_To_Sec(
TimeDiff(
Coalesce(AP.finished_at, UTC_TIMESTAMP()), AP.started_at ) ) End )
As TotalDuration_Shipping
, Sum( Case When AP.period_type = 'break'
Then Time_To_Sec(
TimeDiff(
Coalesce(AP.finished_at, UTC_TIMESTAMP()), AP.started_at ) ) End )
As TotalDuration_Break
From audit_periods As AP
Where AP.started_at >= @StartDate
And AP.finished_at <= @EndDate
Group by AP.user_id
) As AuditPeriodByUser
On AuditPeriodByUser.user_id = U.id
Left Join (
Select AP.user_id
, Sum( Case When ATT.name = 'scan' Then 1 Else 0 End ) As TotalCount_Scan
, Sum( Case When ATT.name = 'place_in_pallet' Then 1 Else 0 End ) As TotalCount_Place_In_Pallet
, Sum( AT.score ) As TotalScore
From audit_tasks As AT
Join audit_task_types As ATT
On ATT.id = AT.audit_task_type_id
Join audit_periods As AP
On AT.audit_period_id = AP.id
Where AP.started_at >= @StartDate
And AP.finished_at <= @EndDate
Group By AP.user_id
) As AuditTasksByUser
On AuditTasksByUser.user_id = U.id
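The two-subquery shape can be tried on the sample figures from the question (1800s processing, 1000s shipping, 500s break, one scan, two place_in_pallet tasks). What follows is a sketch on SQLite via Python's sqlite3 module rather than MySQL: durations are computed with strftime('%s', ...) instead of Time_To_Sec/TimeDiff, the date-window filter is left out for brevity, and the concrete timestamps, the id columns, and the alias names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, team TEXT);
CREATE TABLE audit_periods (id INTEGER PRIMARY KEY, user_id INTEGER, period_type TEXT,
                            started_at TEXT, finished_at TEXT);
CREATE TABLE audit_task_types (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE audit_tasks (id INTEGER PRIMARY KEY, audit_period_id INTEGER,
                          audit_task_type_id INTEGER, score INTEGER);
INSERT INTO users VALUES (1, 'Foo');
INSERT INTO audit_periods VALUES
    (1, 1, 'processing', '2011-03-16 08:00:00', '2011-03-16 08:30:00'),  -- 1800 s
    (2, 1, 'shipping',   '2011-03-16 09:00:00', '2011-03-16 09:16:40'),  -- 1000 s
    (3, 1, 'break',      '2011-03-16 10:00:00', '2011-03-16 10:08:20');  --  500 s
INSERT INTO audit_task_types VALUES (1, 'scan'), (2, 'place_in_pallet');
INSERT INTO audit_tasks VALUES (1, 1, 1, 200), (2, 2, 2, 100), (3, 2, 2, 100);
""")

summary = conn.execute("""
    SELECT u.id, p.processing, p.shipping, p.break_secs,
           t.scan, t.place_in_pallet, t.score
    FROM users AS u
    LEFT JOIN (
        -- one row per user: seconds spent in each period type
        SELECT user_id,
               SUM(CASE WHEN period_type = 'processing' THEN secs ELSE 0 END) AS processing,
               SUM(CASE WHEN period_type = 'shipping'   THEN secs ELSE 0 END) AS shipping,
               SUM(CASE WHEN period_type = 'break'      THEN secs ELSE 0 END) AS break_secs
        FROM (SELECT user_id, period_type,
                     strftime('%s', finished_at) - strftime('%s', started_at) AS secs
              FROM audit_periods) AS d
        GROUP BY user_id
    ) AS p ON p.user_id = u.id
    LEFT JOIN (
        -- one row per user: task counts by type and total score
        SELECT ap.user_id,
               SUM(CASE WHEN tt.name = 'scan' THEN 1 ELSE 0 END) AS scan,
               SUM(CASE WHEN tt.name = 'place_in_pallet' THEN 1 ELSE 0 END) AS place_in_pallet,
               SUM(tk.score) AS score
        FROM audit_tasks AS tk
        JOIN audit_task_types AS tt ON tt.id = tk.audit_task_type_id
        JOIN audit_periods AS ap ON ap.id = tk.audit_period_id
        GROUP BY ap.user_id
    ) AS t ON t.user_id = u.id
""").fetchall()

print(summary)  # [(1, 1800, 1000, 500, 1, 2, 400)]
```

Aggregating periods and tasks in separate subqueries before joining matters here: joining tasks directly onto periods would repeat the 1000s shipping period once per task and inflate the duration totals.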