MySQL monthly report inaccuracies

My company has introduced an on-call rota for the IT department. I created a MySQL table which details who takes on-call, when they take it and when it's taken by the next individual on completion of each shift.
Below is a sample (with names removed) taken from late May - early June:
|seq_num | date_taken | date_relinquished | user |
|-----------|---------------|-----------------------|-----------|
| 1 | 2015-05-29 | 2015-06-05 | A |
| 2 | 2015-06-05 | 2015-06-06 | B |
| 3 | 2015-06-06 | 2015-06-07 | C |
| 4 | 2015-06-07 | 2015-06-10 | B |
| 5 | 2015-06-10 | 2015-06-10 | A |
| 6 | 2015-06-10 | 2015-06-12 | B |
| 7 | 2015-06-12 | 2015-06-19 | C |
| 8 | 2015-06-19 | 2015-07-03 | D |
The next step is to produce an automated monthly report which queries the table and outputs how many days each user held on-call for so Finance know how much they need paying. Currently this is counted manually.
The query I've got is:
SELECT user, SUM(DATEDIFF(date_relinquished, date_taken))
AS duration
FROM `on-call_log`
WHERE YEAR(date_relinquished) = YEAR(CURRENT_DATE - INTERVAL 1 MONTH)
AND MONTH(date_relinquished) = MONTH(CURRENT_DATE - INTERVAL 1 MONTH)
GROUP BY user
This works if an on-call period falls entirely within one month. However, if someone is on call from one month into the next, it reports the full period, which produces inaccurate totals. Instead of counting only the days within June (30 in total), like so:
A 4
B 6
C 8
D 12
It also counts the days that A was on call in May and that D was on call in July, like so:
A 7
B 6
C 8
D 14
I'm at a bit of a loss as to how to make it report accurately. Does anyone have any suggestions or ideas? Thanks in advance.
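One way to get the intended June figures is to clip each shift to the reporting month before measuring it: take the later of `date_taken` and the first day of the month, and the earlier of `date_relinquished` and the first day of the next month. The Python below is only a sketch of that arithmetic, not a MySQL query; in MySQL the same idea would presumably use `GREATEST`, `LEAST` and `DATEDIFF`. `days_in_month` is a hypothetical helper, and it measures days as end minus start (like `DATEDIFF`), which reproduces the A 4, B 6, C 8, D 12 the question asks for.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (user, date_taken, date_relinquished).
ROWS = [
    ("A", date(2015, 5, 29), date(2015, 6, 5)),
    ("B", date(2015, 6, 5), date(2015, 6, 6)),
    ("C", date(2015, 6, 6), date(2015, 6, 7)),
    ("B", date(2015, 6, 7), date(2015, 6, 10)),
    ("A", date(2015, 6, 10), date(2015, 6, 10)),
    ("B", date(2015, 6, 10), date(2015, 6, 12)),
    ("C", date(2015, 6, 12), date(2015, 6, 19)),
    ("D", date(2015, 6, 19), date(2015, 7, 3)),
]


def days_in_month(rows, month_start, next_month_start):
    """Sum on-call days per user, counting only the part of each shift
    that overlaps [month_start, next_month_start)."""
    totals = defaultdict(int)
    for user, taken, relinquished in rows:
        start = max(taken, month_start)            # clip the start to the month
        end = min(relinquished, next_month_start)  # clip the end to the month
        # Shifts that do not overlap the month contribute 0 days.
        totals[user] += max(0, (end - start).days)
    return dict(totals)


june = days_in_month(ROWS, date(2015, 6, 1), date(2015, 7, 1))
print(june)  # {'A': 4, 'B': 6, 'C': 8, 'D': 12}
```

Because every shift is clipped to the same half-open range, the June totals add up to exactly 30 days, and A's May days would show up in a May report instead.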

One solution is to use a calendar table - even a calendar table holding all plausible dates into the future is depressingly small!
Then your query might look like this - I've assumed that on-calls are only counted once per day per user (DISTINCT)...
SELECT user
, DATE_FORMAT(dt,'%Y-%m') month
, COUNT(DISTINCT dt) total
FROM calendar x
JOIN my_table y
ON x.dt BETWEEN y.date_taken AND y.date_relinquished
GROUP
BY month
, user;
+------+---------+-------+
| user | month | total |
+------+---------+-------+
| A | 2015-05 | 3 |
| A | 2015-06 | 6 |
| B | 2015-06 | 8 |
| C | 2015-06 | 10 |
| D | 2015-06 | 12 |
| D | 2015-07 | 3 |
+------+---------+-------+
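The calendar-table query counts every date from `date_taken` to `date_relinquished` inclusive (that is what `BETWEEN` does), so a handover day is credited to both the outgoing and the incoming person. That is why A, B and C get 6, 8 and 10 June days here rather than the 4, 6 and 8 the question expects. The Python sketch below replays that join on the sample rows to make the counting rule visible; `calendar_days` is a hypothetical generator standing in for the `calendar` table.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample shifts from the question: (user, date_taken, date_relinquished).
SHIFTS = [
    ("A", date(2015, 5, 29), date(2015, 6, 5)),
    ("B", date(2015, 6, 5), date(2015, 6, 6)),
    ("C", date(2015, 6, 6), date(2015, 6, 7)),
    ("B", date(2015, 6, 7), date(2015, 6, 10)),
    ("A", date(2015, 6, 10), date(2015, 6, 10)),
    ("B", date(2015, 6, 10), date(2015, 6, 12)),
    ("C", date(2015, 6, 12), date(2015, 6, 19)),
    ("D", date(2015, 6, 19), date(2015, 7, 3)),
]


def calendar_days(first, last):
    """Yield every date from first to last inclusive (stands in for the calendar table)."""
    day = first
    while day <= last:
        yield day
        day += timedelta(days=1)


def distinct_days_per_month(shifts):
    """Count distinct on-call dates per (user, 'YYYY-MM'), treating both ends of a
    shift as on-call days, like dt BETWEEN date_taken AND date_relinquished."""
    seen = defaultdict(set)  # the set plays the role of COUNT(DISTINCT dt)
    first = min(taken for _, taken, _ in shifts)
    last = max(relinquished for _, _, relinquished in shifts)
    for dt in calendar_days(first, last):
        for user, taken, relinquished in shifts:
            if taken <= dt <= relinquished:
                seen[(user, dt.strftime("%Y-%m"))].add(dt)
    return {key: len(dates) for key, dates in sorted(seen.items())}


result = distinct_days_per_month(SHIFTS)
for (user, month), total in result.items():
    print(user, month, total)
```

If the DATEDIFF-style count is what Finance wants, the join condition would need a half-open range (for example `x.dt >= y.date_taken AND x.dt < y.date_relinquished`) rather than `BETWEEN`.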

Related

Joining two MySQL datasets by date as well as id

I am struggling to find a way to efficiently join two datasets using a single query.
Dataset one can be returned using the following query:
SELECT hours_person_id, hours_date, hours_job, SUM(hours_value) AS hours
FROM hours
WHERE hours_status = 1
GROUP BY hours_person_id, hours_date, hours_job
which gives a dataset similar to:
| 1 | 2020-06-07 | 101 | 25 |
| 1 | 2020-06-07 | 102 | 10 |
| 1 | 2020-06-07 | 103 | 5 |
| 2 | 2020-06-07 | 101 | 30 |
| 2 | 2020-06-07 | 104 | 10 |
From this we can get total hours per week, per job, etc.
Our second dataset gives us the hourly rates for each person. The problem is that this table contains both historical and future hourly rates, so the join needs to ensure that the rate applies to the correct person_id and date. There could also be more than one rate for a person on a date.
The following gives all the rates that are active:
SELECT rate_person_id, rate_date, rate_value
FROM rates
WHERE rate_active = 1
which could look like:
| 1 | 2020-01-01 | 20.00 |
| 1 | 2020-05-01 | 25.00 |
| 1 | 2020-07-01 | 22.00 |
| 2 | 2020-01-01 | 22.00 |
| 2 | 2020-05-01 | 24.00 |
| 3 | 2020-05-01 | 20.00 |
| 3 | 2020-05-01 | 21.00 |
| 3 | 2020-07-01 | 18.00 |
So for the hours above, the rate from 2020-05-01 would be the expected result, with 21.00 being the result for person_id = 3.
Can what I am looking for be done in a single query, or am I better off joining two subqueries?
Update
As requested, here is a fiddle that represents the above:
https://www.db-fiddle.com/f/oiUpTnajY6M6ZTfZgRf4kT/0
As you can see, we have a query that returns the correct data, but it does not scale to our current data set (1.8m rows and more sub-tables).
So for the hours above the rate from the 2020-05-01 would be the expected result, with the 21.00 value being the result for person_id === 1
From your rates output, person_id = 1 never had a rate value of 21.00.
| 1 | 2020-01-01 | 20.00 |
| 1 | 2020-05-01 | 25.00 |
| 1 | 2020-07-01 | 22.00 |
If a person has two active rates, do you need the most recent rate, or the rate in effect in the month they worked? If there is no rate for that month, do you want a rate of 0 or something else?
SELECT h.*,
-- latest rate dated on or before the day the hours were worked
(SELECT rate_value
FROM rates r
WHERE h.hours_person_id = r.rate_person_id AND
r.rate_date <= h.hours_date
ORDER BY r.rate_date DESC
LIMIT 1
) as rate_value
FROM hours h
I don't see what active has to do with the question, because you need to go back in time. You can then aggregate or do whatever you want once you have the correct rate on the date.
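The correlated subquery picks, for each hours row, the latest rate dated on or before that row's date. Below is a Python sketch of the same lookup using `bisect_right` over each person's sorted rate dates, which avoids rescanning the rates for every hours row. The question doesn't say why person 3 should get 21.00 rather than 20.00 when both are dated 2020-05-01; here the later-listed row wins, which is an assumption (the SQL's `ORDER BY ... LIMIT 1` leaves that tie unresolved). `build_index` and `rate_on` are hypothetical names.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date

# Active rates from the question: (person_id, rate_date, rate_value).
RATES = [
    (1, date(2020, 1, 1), 20.00),
    (1, date(2020, 5, 1), 25.00),
    (1, date(2020, 7, 1), 22.00),
    (2, date(2020, 1, 1), 22.00),
    (2, date(2020, 5, 1), 24.00),
    (3, date(2020, 5, 1), 20.00),
    (3, date(2020, 5, 1), 21.00),
    (3, date(2020, 7, 1), 18.00),
]


def build_index(rates):
    """Group rates per person, sorted by date. The sort is stable, so rows
    with the same date keep their input order (the later row wins ties)."""
    by_person = defaultdict(list)
    for person_id, rate_date, value in rates:
        by_person[person_id].append((rate_date, value))
    index = {}
    for person_id, entries in by_person.items():
        entries.sort(key=lambda e: e[0])
        index[person_id] = ([d for d, _ in entries], [v for _, v in entries])
    return index


def rate_on(index, person_id, day):
    """Return the latest rate dated on or before `day`, or None if there is none."""
    if person_id not in index:
        return None
    dates, values = index[person_id]
    pos = bisect_right(dates, day)  # number of rates dated <= day
    return values[pos - 1] if pos else None


idx = build_index(RATES)
print(rate_on(idx, 1, date(2020, 6, 7)))  # 25.0
```

Returning `None` when no rate exists yet is one answer to the "0 rate or something else" question above; the caller can substitute whatever default Finance prefers.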

Converting lump sums to transactions

I have a database that tracks the size of claims.
Each claim has fixed information that is stored in claim (such as claim_id and date_reported_to_insurer).
Each month, I get a report which is added to the table claim_month. This includes fields such as claim_id, month_id [101 is 31/01/2018, 102 is 28/02/2018, etc] and paid_to_date.
Since most claims don't change from month to month, I only add a record for claim_month when the figure has changed since last month. As such, a claim may have a June report and an August report, but not a July report. This would be because the amount paid to date increased in June and August, but not July.
The problem that I have now is that I want to be able to check the amount paid each month.
Consider the following example data:
+----------------+----------+----------------+--------------+
| claim_month_id | claim_id | month_id | paid_to_date |
+----------------+----------+----------------+--------------+
| 1 | 1 | 6 | 1000 |
+----------------+----------+----------------+--------------+
| 5 | 1 | 7 | 1200 |
+----------------+----------+----------------+--------------+
| 7 | 2 | 6 | 500 |
+----------------+----------+----------------+--------------+
| 12 | 1 | 9 | 1400 |
+----------------+----------+----------------+--------------+
| 18 | 2 | 8 | 600 |
+----------------+----------+----------------+--------------+
If we assume that this is all of the information regarding claims 1 and 2, then that would suggest that they are both claims that occurred during June 2018. Their transactions should look like the following:
+----------------+----------+----------------+------------+
| claim_month_id | claim_id | month_id | paid_month |
+----------------+----------+----------------+------------+
| 1 | 1 | 6 | 1000 |
+----------------+----------+----------------+------------+
| 5 | 1 | 7 | 200 |
+----------------+----------+----------------+------------+
| 7 | 2 | 6 | 500 |
+----------------+----------+----------------+------------+
| 12 | 1 | 9 | 200 |
+----------------+----------+----------------+------------+
| 18 | 2 | 8 | 100 |
+----------------+----------+----------------+------------+
The algorithm I'm using for this is:
SELECT new.claim_month_id,
new.month_id,
new.claim_id,
new.paid_to_date - old.paid_to_date AS paid_to_date_change
FROM claim_month AS new
LEFT JOIN claim_month AS old
ON new.claim_id = old.claim_id
AND ( new.month_id > old.month_id
OR old.month_id IS NULL )
GROUP BY new.claim_month_id
HAVING old.month_id = Max(old.month_id)
However, this has two issues:
1. It seems really inefficient at dealing with claims with multiple records. I haven't run any benchmarking, but it's pretty obvious.
2. It doesn't show new claims. In the above example, it would only show lines 2, 4 and 5.
Where am I going wrong with my algorithm, and is there a better logic to use to do this?
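Use the LAG window function (available in MySQL 8.0 and later) to get the previous paid_to_date of each claim_id, and subtract it from the current paid_to_date. The third argument, 0, is the default used for a claim's first record, so new claims show their full amount.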
SELECT
claim_month_id,
claim_id,
month_id,
paid_to_date - LAG(paid_to_date, 1, 0) OVER (PARTITION BY claim_id ORDER BY month_id) AS paid_month
FROM claim_month
ORDER BY claim_id, month_id
The output table is:
+----------------+----------+----------+------------+
| claim_month_id | claim_id | month_id | paid_month |
+----------------+----------+----------+------------+
| 1 | 1 | 6 | 1000 |
| 5 | 1 | 7 | 200 |
| 12 | 1 | 9 | 200 |
| 7 | 2 | 6 | 500 |
| 18 | 2 | 8 | 100 |
+----------------+----------+----------+------------+
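To check the `LAG` expression end to end, the sketch below loads the example rows into an in-memory SQLite database through Python's `sqlite3` module and runs the same window expression. This is SQLite rather than MySQL; it assumes the bundled SQLite is 3.25 or newer (the first release with window functions), just as the MySQL query needs 8.0 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claim_month (claim_month_id INTEGER, claim_id INTEGER,"
    " month_id INTEGER, paid_to_date INTEGER)"
)
conn.executemany(
    "INSERT INTO claim_month VALUES (?, ?, ?, ?)",
    [(1, 1, 6, 1000), (5, 1, 7, 1200), (7, 2, 6, 500), (12, 1, 9, 1400), (18, 2, 8, 600)],
)

# LAG(paid_to_date, 1, 0) returns the previous cumulative figure for the same
# claim (0 for its first report), so the difference is the amount paid that month.
rows = conn.execute(
    """
    SELECT claim_month_id, claim_id, month_id,
           paid_to_date - LAG(paid_to_date, 1, 0)
               OVER (PARTITION BY claim_id ORDER BY month_id) AS paid_month
    FROM claim_month
    ORDER BY claim_id, month_id
    """
).fetchall()
for row in rows:
    print(row)
```

Because the window is partitioned by `claim_id` and ordered by `month_id`, a missing month (claim 1 has no month 8 row) simply means the next row's difference covers both months, which matches how the table only stores changes.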

Query to create records that don't exist between 2 points

I've currently written a query that returns the following:
+-----------+--------+--------+
| client_id | Period | Status |
+-----------+--------+--------+
| 2378 | 1 | Paid |
| 2378 | 2 | Paid |
| 2378 | 4 | Paid |
| 2378 | 5 | Paid |
| 2378 | 6 | Frozen |
| 2378 | 10 | Paid |
+-----------+--------+--------+
However I would like it to include the periods for where I don't have any data.
E.g. period 3 and periods 7-9, filled in with the status from the previous period.
For example, period 3 would become Paid, like so:
+-----------+--------+--------+
| client_id | Period | Status |
+-----------+--------+--------+
| 2378 | 1 | Paid |
| 2378 | 2 | Paid |
| 2378 | 3 | Paid |
| 2378 | 4 | Paid |
| 2378 | 5 | Paid |
| 2378 | 6 | Frozen |
| 2378 | 7 | Frozen |
| 2378 | 8 | Frozen |
| 2378 | 9 | Frozen |
| 2378 | 10 | Paid |
+-----------+--------+--------+
Note that I have more than one client_id, and my intention is just to fill in any blanks between the minimum and maximum period present for each client_id in the data.
Also, the periods vary from client to client. For example, client 1 can have a maximum period of 6 and client 2 a maximum period of 8.
Does anyone know of a way that this can be done?
I came across the following question, which is similar, but in my case do I need to code a loop over the different client_ids?
Example I found
A typical solution is to define a source (table) with all periods and left join your query to the table.
select
ap.Period,
-- use this period's status if present, otherwise carry the previous one forward
@prevStatus := coalesce(q.Status, @prevStatus) as status
from all_periods ap
left join (your query here) q on ap.Period = q.Period,
(select @prevStatus := 'undefined') sess
order by ap.Period
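The variable-based query above handles a single series of periods. Per client, the logic is: find that client's minimum and maximum period, walk every period in between, and carry the last known status forward. Below is a Python sketch of that logic, where `fill_gaps` is a hypothetical helper and client 1 is a made-up second client added only to show that each client gets its own range.

```python
from itertools import groupby

# (client_id, period, status) rows as returned by the existing query.
# Client 2378 is from the question; client 1 is hypothetical.
ROWS = [
    (2378, 1, "Paid"), (2378, 2, "Paid"), (2378, 4, "Paid"),
    (2378, 5, "Paid"), (2378, 6, "Frozen"), (2378, 10, "Paid"),
    (1, 3, "Paid"), (1, 5, "Frozen"),
]


def fill_gaps(rows):
    """For each client, emit every period from its min to its max period,
    carrying the last known status forward into missing periods."""
    out = []
    ordered = sorted(rows)  # by client_id, then period
    for client_id, group in groupby(ordered, key=lambda r: r[0]):
        known = {period: status for _, period, status in group}
        status = None
        for period in range(min(known), max(known) + 1):
            # The first period always exists in `known`, so status is never None here.
            status = known.get(period, status)
            out.append((client_id, period, status))
    return out


filled = fill_gaps(ROWS)
for row in filled:
    print(row)
```

In SQL the equivalent would be to join each client to the periods between its own MIN(Period) and MAX(Period) before carrying the status forward, rather than to one shared list of periods.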

mysql return two minimum values with limit

I have a table named workers and a table named schedule, with the following formats:
workers:
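+----+-----------+------------+------------+-------------+
| id | name | vacationA | vacationB | workhistory |
+----+-----------+------------+------------+-------------+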
| 1 | Florin | 2017-05-05 | 2017-05-25 | 2010-01-01 |
| 2 | Andrei | 2017-06-05 | 2017-06-25 | 2010-01-01 |
| 3 | Alexandra | 2017-07-05 | 2017-07-25 | 2010-01-01 |
| 4 | Emilia | 2017-08-05 | 2017-08-25 | 2010-01-01 |
| 5 | Nicoleta | 2017-09-05 | 2017-09-25 | 2010-01-01 |
+----+-----------+------------+------------+-------------+
schedule:
+-----+-------+-----------+--------+
| day | month | name | shifts |
+-----+-------+-----------+--------+
| 1 | 6 | Florin | 0 |
| 1 | 6 | Andrei | 1 |
| 1 | 6 | Alexandra | 2 |
| 1 | 6 | Emilia | 3 |
| 1 | 6 | Nicoleta | 4 |
+-----+-------+-----------+--------+
I need to query the workers table to give me 2 random names with the minimum number of shifts, excluding workers who are on vacation. Also, work history must be greater than 18 months.
In this case, the query I need should return Florin and Andrei.
This is what I've got so far, but it doesn't work as supposed:
SELECT name
FROM workers
WHERE (CURDATE() NOT BETWEEN vacationA AND vacationB) AND
workhistory > (DATE_SUB(CURDATE(), INTERVAL 18 MONTH)) AND
name IN (SELECT name FROM schedule ORDER BY shifts LIMIT 2)
ORDER BY RAND() LIMIT 2;
This query returns the error:
1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'.
Thank you!
Join to schedule instead and then use LIMIT 2:
SELECT w.name
FROM workers w
INNER JOIN schedule s
ON w.name = s.name
WHERE CURDATE() NOT BETWEEN w.vacationA AND w.vacationB AND
w.workhistory <= DATE_SUB(CURDATE(), INTERVAL 18 MONTH)
ORDER BY s.shifts
LIMIT 2;
I don't understand the random ordering, because you are only returning two worker records, and those records are not chosen randomly, but rather belong to the smallest shift numbers.
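The selection rules can be checked outside the database with a small Python sketch. The question doesn't give the current date, so the example assumes 2017-06-01 as a hypothetical `today`, and it reads "work history greater than 18 months" as "started on or before the date 18 months ago". Note that the question's own query uses `workhistory > DATE_SUB(CURDATE(), INTERVAL 18 MONTH)`, which keeps workers who started within the last 18 months and so excludes everyone in the sample. `months_before` and `pick_workers` are hypothetical helpers.

```python
import calendar
from datetime import date

# Rows from the question: name -> (vacationA, vacationB, workhistory).
WORKERS = {
    "Florin": (date(2017, 5, 5), date(2017, 5, 25), date(2010, 1, 1)),
    "Andrei": (date(2017, 6, 5), date(2017, 6, 25), date(2010, 1, 1)),
    "Alexandra": (date(2017, 7, 5), date(2017, 7, 25), date(2010, 1, 1)),
    "Emilia": (date(2017, 8, 5), date(2017, 8, 25), date(2010, 1, 1)),
    "Nicoleta": (date(2017, 9, 5), date(2017, 9, 25), date(2010, 1, 1)),
}
SHIFTS = {"Florin": 0, "Andrei": 1, "Alexandra": 2, "Emilia": 3, "Nicoleta": 4}


def months_before(d, months):
    """Same day `months` earlier, clamped to the last day of the target month."""
    total = d.year * 12 + (d.month - 1) - months
    year, month = divmod(total, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def pick_workers(today, count=2):
    cutoff = months_before(today, 18)
    eligible = [
        name for name, (vac_a, vac_b, started) in WORKERS.items()
        if not (vac_a <= today <= vac_b)  # not on vacation today
        and started <= cutoff             # at least 18 months of work history
    ]
    # Fewest shifts first; sorted() is stable, so ties keep dictionary order.
    return sorted(eligible, key=lambda n: SHIFTS[n])[:count]


print(pick_workers(date(2017, 6, 1)))  # ['Florin', 'Andrei']
```

On a day when Andrei is on vacation (for example 2017-06-10), the same rules pick Florin and Alexandra instead.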

MySQL - Counting transactions related to the most recent visit

Suppose I have two tables in the same MySQL DB:
The first is the inbound_campaign_visit table. It looks like this:
+----+-----------+------------------+---------------------+
| id | user_id | inbound_campaign | date |
+----+-----------+------------------+---------------------+
| 1 | 1 | 1 | 2013-02-18 13:00:00 |
| 2 | 1 | 2 | 2013-02-24 13:00:00 |
| 3 | 2 | 3 | 2013-01-01 01:00:00 |
| 4 | 2 | 2 | 2013-02-24 19:00:00 |
+----+-----------+------------------+---------------------+
A row on this table is generated every time a user visits my site as a result of clicking on a promotional campaign. The "date" column represents the time when they came to the site.
The second table is my transaction table.
+--------+---------+---------------------+
| id | user_id | creation_date |
+--------+---------+---------------------+
| 321639 | 1 | 2013-02-18 14:00:00 |
| 321640 | 1 | 2013-02-24 15:00:00 |
| 321641 | 1 | 2013-02-25 13:00:00 |
| 321642 | 1 | 2013-04-05 12:00:00 |
| 321643 | 2 | 2013-01-01 12:00:00 |
| 321644 | 2 | 2013-02-23 12:00:00 |
+--------+---------+---------------------+
A row on this table is created whenever a transaction happens. The "creation_date" column represents the time the transaction occurred.
I want to create a report that will count the number of transactions per inbound campaign. The following rules must apply:
1. A transaction is considered related to an inbound campaign if its user_id matches the visit's user_id and the transaction occurred within 30 days of an inbound_campaign_visit row being created.
2. A transaction can only apply to the most recent inbound_campaign_visit for the given user.
The result table should look something like this:
+------------------+-------------------+
| inbound_campaign | transaction_count |
+------------------+-------------------+
| 1 | 1 |
| 2 | 2 |
| 3 | 1 |
+------------------+-------------------+
Notice that transactions 321644 and 321642 are not counted as they fail rule 1. Also notice how transaction 321641 only applies to inbound_campaign 2 and not inbound_campaign 1 (even though both campaigns fall within the 30 day restriction defined in rule 1).
I have been struggling with this for some time so any help would be appreciated. Of course I could do this in code but there must be a way to do this in SQL. TIA.
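SELECT a.inbound_campaign,
COUNT(b.id) AS TotalCount
FROM inbound_campaign_visit a
LEFT JOIN `transaction` b
ON a.user_id = b.user_id
-- rule 1: the transaction falls within 30 days after the visit
AND b.creation_date >= a.date
AND b.creation_date <= a.date + INTERVAL 30 DAY
-- rule 2: the user made no later visit before the transaction
AND NOT EXISTS (SELECT 1 FROM inbound_campaign_visit c
WHERE c.user_id = a.user_id AND c.date > a.date
AND c.date <= b.creation_date)
GROUP BY a.inbound_campaign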
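The two rules can be pinned down with a short Python sketch over the sample rows: for each transaction, find the same user's most recent visit at or before the transaction time (rule 2), then count the transaction only if it falls within 30 days of that visit (rule 1). It reads "within 30 days" as at most 30 × 24 hours after the visit, which is an assumption; `count_per_campaign` is a hypothetical helper. Campaigns with no qualifying transactions are simply absent from the result.

```python
from collections import Counter
from datetime import datetime, timedelta

VISITS = [  # (user_id, inbound_campaign, visit time)
    (1, 1, datetime(2013, 2, 18, 13)),
    (1, 2, datetime(2013, 2, 24, 13)),
    (2, 3, datetime(2013, 1, 1, 1)),
    (2, 2, datetime(2013, 2, 24, 19)),
]
TRANSACTIONS = [  # (id, user_id, creation_date)
    (321639, 1, datetime(2013, 2, 18, 14)),
    (321640, 1, datetime(2013, 2, 24, 15)),
    (321641, 1, datetime(2013, 2, 25, 13)),
    (321642, 1, datetime(2013, 4, 5, 12)),
    (321643, 2, datetime(2013, 1, 1, 12)),
    (321644, 2, datetime(2013, 2, 23, 12)),
]
WINDOW = timedelta(days=30)


def count_per_campaign(visits, transactions):
    counts = Counter()
    for _, user_id, created in transactions:
        # Rule 2: only the user's most recent visit at or before the transaction.
        earlier = [(t, c) for u, c, t in visits if u == user_id and t <= created]
        if not earlier:
            continue
        visit_time, campaign = max(earlier)
        # Rule 1: the transaction must fall within 30 days of that visit.
        if created - visit_time <= WINDOW:
            counts[campaign] += 1
    return dict(sorted(counts.items()))


result = count_per_campaign(VISITS, TRANSACTIONS)
print(result)  # {1: 1, 2: 2, 3: 1}
```

Applying rule 2 before rule 1 is what drops 321644: user 2's latest visit before it is the 2013-01-01 one, which is more than 30 days earlier, even though a later campaign visit exists the following day.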