MySQL Daily Time Coverage Without Gaps - mysql

I have a table like the following example:
What I need to do is return the coverage (number of hours an operator/s were onsite) for each day. The challenge is that I need to ignore gaps in coverage and not double count hours where two operators were signed in at the same time. For instance, the image below is a visual representation of the table.
The logic of the image is as follows:
Operator A: Signed in at 10 and signed out at noon for a total of 2 hours
Operator B: Signed in at 1 and signed out at 3 for a total of 2 hours
Operator A: Came back and signed in at 2 and signed out at 5 for a total of 3 hours, but 1 hour overlaps with operator B, so I cannot count that hour; otherwise I would be double counting coverage
Therefore the total coverage time without overlaps is 6 hours, which is the value I need the query to produce. So far I can avoid double counting by taking the max and min dates of each day and subtracting the two:
SELECT YEAR, WEEK, SUM(HOURS)
FROM
(SELECT
YEAR(SignedIn) AS YEAR,
WEEK(SignedIn) AS WEEK,
DAY(SignedIn) AS DAY,
time_to_sec(timediff(MAX(SignedOut), MIN(SignedIn)))/ 3600 AS HOURS
FROM OperatorLogs
GROUP BY YEAR, WEEK, DAY) As VirtualTable
GROUP BY YEAR, WEEK
Which produces 7 because it takes the first sign-in (10 AM) and calculates the hours up until the last sign-out (4:00 PM). However, it includes the gap in coverage (12 - 1) which should not be included. I am unsure of how to remove that time from the total hours while also not double counting when there is overlap, i.e. from 2-3 there should only be 1 hour of coverage even though two separate operators are on site each putting in an hour. Any help is appreciated.

Sorry, work interrupted me.
Here's my working solution. I'm not convinced it's optimal, given the (relatively) expensive joins, but I've optimised it slightly based on the soft rule that "shifts" never span multiple days.
SELECT
calendar_date,
SUM(coverage_seconds) / 3600 AS coverage_hours
FROM
(
-- Sign-ins that didn't happen within another operator's shift
SELECT DISTINCT
DATE(e.signedin) AS calendar_date,
-(UNIX_TIMESTAMP(e.signedin) MOD 86400) AS coverage_seconds
FROM
OperatorLogs e
LEFT JOIN
OperatorLogs o
ON o.signedin >= DATE(e.signedin)
AND o.signedin < e.signedin
AND o.signedout >= e.signedin
WHERE
o.signedin IS NULL
UNION ALL
-- Sign-outs that didn't happen within another operator's shift
SELECT DISTINCT
DATE(e.signedout) AS calendar_date,
+(UNIX_TIMESTAMP(e.signedout) MOD 86400) AS coverage_seconds
FROM
OperatorLogs e
LEFT JOIN
OperatorLogs o
ON o.signedin >= DATE(e.signedout)
AND o.signedin <= e.signedout
AND o.signedout > e.signedout
WHERE
o.signedin IS NULL
)
AS coverage_markers
GROUP BY
calendar_date
;
Feel free to test it with more rigorous data...
https://www.db-fiddle.com/f/4RgWVhcdNEro21rUksVdXD/0
(As a note, to make your sample data match your excel image, your first shift should have started at 9am)
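For anyone who wants to sanity-check the query's output outside MySQL, here is a minimal Python sketch of the same idea: merge overlapping shifts, then sum the merged blocks. It is not the query above, just an equivalent check; the function name `coverage_hours` and the calendar date are made up for illustration, and the shift times are the ones described in the question (10-12, 1-3, 2-5).

```python
from datetime import datetime


def coverage_hours(shifts):
    """Hours covered by at least one shift; overlaps count once, gaps are skipped."""
    total_seconds = 0.0
    block_start = block_end = None
    for start, end in sorted(shifts):
        if block_end is None or start > block_end:
            # Gap before this shift: close the previous merged block.
            if block_end is not None:
                total_seconds += (block_end - block_start).total_seconds()
            block_start, block_end = start, end
        else:
            # Overlapping (or touching) shift: extend the merged block.
            block_end = max(block_end, end)
    if block_end is not None:
        total_seconds += (block_end - block_start).total_seconds()
    return total_seconds / 3600


# Hypothetical date; times follow the question's description.
shifts = [
    (datetime(2019, 1, 7, 10), datetime(2019, 1, 7, 12)),  # Operator A
    (datetime(2019, 1, 7, 13), datetime(2019, 1, 7, 15)),  # Operator B
    (datetime(2019, 1, 7, 14), datetime(2019, 1, 7, 17)),  # Operator A again
]
print(coverage_hours(shifts))
```

Running the same rows through both the query and a check like this is a cheap way to catch edge cases such as identical sign-in times.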

Related

How to query available item leases based on a date range in MySQL?

We have a business that rents out international phone numbers to customers when they travel. When a customer places an order, we want to display the phone numbers available for their booking dates, based on their start_date and end_date, showing only numbers that are not already occupied.
Since these phone numbers are rented out, I need to select from the table ONLY those numbers that are not rented out yet for dates that would interfere with the current customers dates.
I also don't want to rent out any phone number prior to 7 days after its end date. Meaning, If a customer booked a phone number for 1-1-2020 through 1-20-2020, I don't want this phone number to be booked by another customer before 1-27-2020. I want the phone number to have a 7 day window of being clear.
I have a table with the phone numbers and a table with the orders that is related to the phone numbers table via phone_number_id. The orders table has the current customers start_date and end_date for travel without the phone number id saved yet to it. The orders table also has the start_date and end_date for all other customers dates of travel as well as which phone_number_id was assigned/booked up for their travel dates.
What would the MySQL query look like to select the phone numbers that are available for the current customer's dates?
This is the query I have built so far:
SELECT x.id
, x.area_code
, x.phone_number
, y.start_date
, y.end_date
FROM vir_num_table x
LEFT
JOIN orderitemsdetail_table y
ON y.vn_id = x.id
WHERE y.start_date BETWEEN '2020-01-11' AND '2020-01-18'
OR y.start_date IS NULL
I've built this query, but I'm stuck on how to add the end_date logic.
Any help would be appreciated! Thanks in advance.
Conceptually, I'd approach the problem as a cross product of the set of all phone numbers with the proposed reservation timeframe, and then exclude the phone numbers that have a conflicting reservation.
A conflict is an overlap: an existing reservation that has a start_date before the end of the proposed reservation AND an end_date on or after the start of the proposed reservation.
I'd do an anti-join pattern, something like this:
SELECT pn.phone_number
FROM phone_number pn
LEFT
JOIN reservation rs
ON rs.phone_number = pn.phone_number
AND rs.start_dt <= '2019-12-27' + INTERVAL +7 DAY
AND rs.end_dt > '2019-12-20' + INTERVAL -7 DAY
WHERE rs.phone_number IS NULL
That essentially says get all rows from phone number, along with matching rows from reservations (rows that overlap), but then exclude all the rows that had a match, leaving just phone_number rows that did not have a match.
We can switch a strict comparison to a non-strict one (or subtract 8 days instead of 7) to tailor the "7 day" window before the reservation; we can tweak this as we run the query through the test cases.
We can achieve an equivalent result using NOT EXISTS and a correlated subquery. Some people find this easier to comprehend than the anti-join, but it's essentially the same query doing the same thing: get all rows from phone_number, but exclude the rows where there is a matching (overlapping) row in reservation.
SELECT pn.phone_number
FROM phone_number pn
WHERE NOT EXISTS
( SELECT 1
FROM reservation rs
WHERE rs.phone_number = pn.phone_number
AND rs.start_dt <= '2019-12-27' + INTERVAL +7 DAY
AND rs.end_dt > '2019-12-20' + INTERVAL -7 DAY
)
There are several questions on StackOverflow about checking for overlap, or no overlap, of date ranges.
See e.g.
How to check if two date ranges overlap in mysql?
PHP/SQL - How can I check if a date user input is in between an existing date range in my database?
MySQL query to select distinct rows based on date range overlapping
EDIT
Based on the SQL added as an edit to the question, I'd do the query like this:
SELECT pn.`id`
, pn.`area_code`
, pn.`phone_number`
FROM `vir_num_table` pn
LEFT
JOIN `orderitemsdetail_table` rs
ON rs.vn_id = pn.id
AND rs.start_date <= '2020-01-18' + INTERVAL +7 DAY
AND rs.end_date > '2020-01-11' + INTERVAL -7 DAY
WHERE rs.vn_id IS NULL
There are two "tricky" parts. The first is the anti-join and understanding how it works: an outer join that returns all rows from vir_num_table, then excludes any rows that have a matching row in the reservations table. The second is checking for the overlap and coming up with the conditions: r.start <= p.end AND r.end >= p.start, then deciding whether equality counts as an overlap, and adding the extra seven days (easiest to me is to just subtract 7 days from the beginning of the proposed reservation).
... now occurs to me like we need to add a guard period of 7 days on the end of the reservation period as well, doh!
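To make the overlap-plus-guard test concrete, here is a small Python sketch of the check the anti-join performs, with the 7-day guard applied on both sides. The names `conflicts` and `available_numbers` are made up for illustration, and the strict comparisons are an assumption chosen so that a number booked through 1-20-2020 becomes bookable again on 1-27-2020, as the question asks.

```python
from datetime import date, timedelta

GUARD = timedelta(days=7)  # clear window required before and after a booking


def conflicts(res_start, res_end, new_start, new_end, guard=GUARD):
    """True if an existing reservation falls inside the proposed dates widened by the guard."""
    return res_start < new_end + guard and res_end > new_start - guard


def available_numbers(numbers, reservations, new_start, new_end):
    """Anti-join: keep every number that has no conflicting reservation."""
    taken = {
        number
        for number, res_start, res_end in reservations
        if conflicts(res_start, res_end, new_start, new_end)
    }
    return [number for number in numbers if number not in taken]


# Hypothetical data: number "A" is booked 1-1-2020 through 1-20-2020.
numbers = ["A", "B"]
reservations = [("A", date(2020, 1, 1), date(2020, 1, 20))]

print(available_numbers(numbers, reservations, date(2020, 1, 26), date(2020, 2, 2)))
print(available_numbers(numbers, reservations, date(2020, 1, 27), date(2020, 2, 2)))
```

Whether the boundary day itself counts as a conflict is exactly the <= versus < tweak mentioned above, so adjust the comparisons to match your own rule.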
Here's a query plus sorting algo to choose the optimal phone number selection for maximum utilization efficiency (i.e. getting as close as possible to exactly 7 days before and after each use).
I set it to give open ends a weight of 9, so that "near perfect" fits (7-8 days before or after) would be selected ahead of open-ended numbers. This will yield a slight efficiency improvement, as open numbers can accommodate any reservation. You can adjust this for your needs. If you set this to 0, for example, it would always select open numbers first.
SELECT ph.phone_number,
COALESCE(
MIN(
-- ignore before-comparison for reservations starting (and so ending) after the date range;
-- assumes end_date >= start_date, since a chained a > b > c comparison does not work in MySQL
IF(res.start_date > '2020-01-18',
NULL,
DATEDIFF('2020-01-11', res.end_date))
), 9) AS open_days_before,
COALESCE(
MIN(
-- ignore after-comparison for reservations ending (and so starting) before the date range
IF(res.end_date < '2020-01-11',
NULL,
DATEDIFF(res.start_date, '2020-01-18'))
), 9) AS open_days_after
FROM phone_number ph
LEFT JOIN reservation res
ON res.phone_number = ph.phone_number
AND res.end_date >= CURRENT_DATE() - INTERVAL 6 DAY
GROUP BY ph.phone_number
HAVING open_days_before >= 7
AND open_days_after >= 7
ORDER BY open_days_before + open_days_after
LIMIT 1
Edit: updated to add grouping, because I realize this is an aggregate problem.
Edit 2: bug fix, changed MAX to MIN
Edit 3: added res.end_date >= CURRENT_DATE - INTERVAL 6 DAY to ignore past reservations, limiting aggregate data and treating phone number with no reservations between 6 days ago and the beginning of the new order as "open on the front-end"
Edit 4: added IF conditions to eliminate reservations outside the given before-or-after comparison ranges (e.g. comparing reservations after the selected range from influencing the "open days before" number), to prevent negative numbers, except when there's overlap with the selected range.
Based on the info you've added, you shouldn't need to check the start date of phone numbers that have been booked out.
Your customer provides you with a start date and an end date.
You only rent out phone numbers 7 days after their last lease ended.
All you need to do is fetch back phone numbers which either:
- Are not rented out and therefore aren't in the orderitems table
- OR have an end_date which is at least 7 days before the new customer's start date.
Here you go:
SELECT
`main_table`.`id`,
`main_table`.`area_code`,
`main_table`.`phone_number`,
`orderitemsdetail_table`.`start_date`,
`orderitemsdetail_table`.`end_date`
FROM
`vir_num_table` AS `main_table`
LEFT JOIN
`orderitemsdetail_table` AS `orderitemsdetail_table` ON main_table.id = orderitemsdetail_table.vn_id
WHERE
-- OR, not AND: a single order cannot both end before the new start and begin after the new end
(DATE_ADD(orderitemsdetail_table.end_date, INTERVAL 7 DAY) < '<CUSTOMER START DATE>'
OR orderitemsdetail_table.start_date > '<CUSTOMER END DATE>')
OR orderitemsdetail_table.id IS NULL

SQL Calculating Moving Average Crossover of variable lengths [duplicate]

I am trying to calculate moving averages crossover with variable dates.
My database is structured:
id
stock_id
date
closing_price
And:
stock_id
symbol
For example, I'd like to find out if the average price going back X days ever gets greater than the average price going back Y days within the past Z days. Each of those time periods is variable. This needs to be run for every stock in the database (about 3000 stocks with prices going back 100 years).
I'm a bit stuck on this. What I currently have is a mess of SQL subqueries that don't work because they can't account for the fact that X, Y, and Z can each be any value (0-N). That is, in the past 5 days I could be looking for a stock where the 40-day average is greater than the 5-day average, or where the 5-day is greater than the 40-day. Or I could be looking over the past 40 days to find stocks where the 10-day moving average is greater than the 30-day moving average.
This question is different from the other questions as there is variable short and long dates as well as a variable term.
Please see these earlier posts on Stack Overflow:
How to calculated multiple moving average in MySQL
Calculate moving averages in SQL
These posts have solutions to your question.
I think the most direct way to do a moving average in MySQL is using a correlated subquery. Here is an example:
select p.*,
(select avg(closing_price)
from prices p2
where p2.stock_id = p.stock_id and
p2.date between p.date - interval x day and p.date
) as MvgAvg_X,
(select avg(closing_price)
from prices p2
where p2.stock_id = p.stock_id and
p2.date between p.date - interval y day and p.date
) as MvgAvg_Y
from prices p
You need to fill in the values for x and y.
For performance reasons, you will want an index on prices(stock_id, date, closing_price).
If you have an option for another database, Oracle, Postgres, and SQL Server 2012 all offer much better performing solutions for this problem.
In Postgres, you can write this as:
select p.*,
avg(p.closing_price) over (partition by p.stock_id order by p.date rows x preceding) as AvgX,
avg(p.closing_price) over (partition by p.stock_id order by p.date rows y preceding) as AvgY
from prices p

SQL - Calculating variable moving average over variable lengths

FIRST: This question is NOT a duplicate. I have asked this on here already and it was closed as a duplicate. While it is similar to other threads on stackoverflow, it is actually far more complex. Please read the post before assuming it is a duplicate:
I am trying to calculate variable moving averages crossover with variable dates.
That is: I want to prompt the user for 3 values and 1 option. The input is through a web front end so I can build/edit the query based on input or have multiple queries if needed.
X = 1st moving average term (N day moving average. Any number 1-N)
Y = 2nd moving average term. (N day moving average. Any number 1-N)
Z = Number of days back from present to search for the occurrence of:
option = Over/Under: (> or <. X passing over Y, or X passing Under Y)
X day moving average passing over OR under Y day moving average
within the past Z days.
My database is structured:
tbl_daily_data
id
stock_id
date
adj_close
And:
tbl_stocks
stock_id
symbol
I have a btree index on:
daily_data(stock_id, date, adj_close)
stock_id
I am stuck on this query and having a lot of trouble writing it. If the variables were fixed it would seem trivial but because X, Y, Z are all 100% independent of each other (could look, for example for 5 day moving average within the past 100 days, or 100 day moving average within the past 5) I am having a lot of trouble coding it.
Please help! :(
Edit: I've been told some more context might be helpful?
We are creating an open stock analytic system where users can perform trend analysis. I have a database containing 3500 stocks and their price histories going back to 1970.
This query will be running every day in order to find stocks that match certain criteria
for example:
10 day moving average crossing over 20 day moving average within 5
days
20 day crossing UNDER 10 day moving average within 5 days
55 day crossing UNDER 22 day moving average within 100 days
But each user may be interested in a different analysis so I cannot just store the moving average with each row, it must be calculated.
I am not sure if I fully understand the question ... but something like this might help you get where you need to go: sqlfiddle
SET @X:=5;
SET @Y:=3;
SET @Z:=25;
SET @option:='under';
select * from (
SELECT stock_id,
datediff(current_date(), date) days_ago,
adj_close,
(
SELECT
AVG(adj_close) AS moving_average
FROM
tbl_daily_data T2
WHERE
T2.stock_id = T1.stock_id -- average within the same stock only
AND (
SELECT
COUNT(*)
FROM
tbl_daily_data T3
WHERE
T3.stock_id = T1.stock_id
AND T3.date BETWEEN T2.date AND T1.date
) BETWEEN 1 AND @X
) move_av_1,
(
SELECT
AVG(adj_close) AS moving_average
FROM
tbl_daily_data T2
WHERE
T2.stock_id = T1.stock_id -- average within the same stock only
AND (
SELECT
COUNT(*)
FROM
tbl_daily_data T3
WHERE
T3.stock_id = T1.stock_id
AND T3.date BETWEEN T2.date AND T1.date
) BETWEEN 1 AND @Y
) move_av_2
FROM
tbl_daily_data T1
where
datediff(current_date(), date) <= @Z
) x
where
case when @option ='over' and move_av_1 > move_av_2 then 1 else 0 end +
case when @option ='under' and move_av_2 > move_av_1 then 1 else 0 end > 0
order by stock_id, days_ago
Based on the answer by @Tom H here: How do I calculate a moving average using MySQL?
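If it helps to pin down what "crossing" means before tuning the SQL, here is a rough Python sketch for a single stock's price list (oldest first). Note that it is stricter than the query above: the query returns rows where one average is simply above the other, while this sketch looks for the row where the relationship flips. The function names are made up, and windows are counted in rows (trading days), not calendar days; both are assumptions.

```python
def moving_average(prices, end, window):
    """Mean of the `window` prices ending at index `end` (inclusive); None if too little history."""
    if window < 1 or end + 1 < window:
        return None
    return sum(prices[end - window + 1 : end + 1]) / window


def crossed(prices, x, y, z, option="over"):
    """True if the X-row average crossed over/under the Y-row average within the last z rows."""
    n = len(prices)
    for i in range(max(1, n - z), n):
        prev_x = moving_average(prices, i - 1, x)
        prev_y = moving_average(prices, i - 1, y)
        cur_x = moving_average(prices, i, x)
        cur_y = moving_average(prices, i, y)
        if None in (prev_x, prev_y, cur_x, cur_y):
            continue  # not enough history for one of the windows
        if option == "over" and prev_x <= prev_y and cur_x > cur_y:
            return True
        if option == "under" and prev_x >= prev_y and cur_x < cur_y:
            return True
    return False


# Hypothetical closes: flat, then a jump on the last day.
rising = [10, 10, 10, 10, 10, 20]
print(crossed(rising, x=2, y=4, z=3, option="over"))
print(crossed(rising, x=2, y=4, z=3, option="under"))
```

Because X, Y and Z are plain parameters here, the same function covers both "5-day within the past 100 days" and "100-day within the past 5".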

Group by date from multiple columns?

First of all, sorry for the title, but I have no idea how to describe it:
I'm saving sessions in my table and I would like to get the count of sessions per hour to know how many sessions were active over the day. The sessions are specified by two timestamps: start and end.
Hopefully you can help me.
Here we go:
http://sqlfiddle.com/#!2/bfb62/2/0
While I'm still not sure how you'd like to compare the start and end dates, looks like using COUNT, YEAR, MONTH, DAY, and HOUR, you could come up with your desired results.
Possibly something similar to this:
SELECT COUNT(ID), YEAR(Start), HOUR(Start), DAY(Start), MONTH(Start)
FROM Sessions
GROUP BY YEAR(Start), HOUR(Start), DAY(Start), MONTH(Start)
And the SQL Fiddle.
What you want to do is rather hard in MySQL. You can, however, get an approximation without too much difficulty. The following counts up users who start and stop within one day:
select date(start), hour,
sum(case when hours.hour between hour(s.start) and hour(s.end) then 1 else 0
end) as GoodEstimate
from sessions s cross join
(select 0 as hour union all
select 1 union all
. . .
select 23
) hours
group by date(start), hour
When a user spans multiple days, the query is harder. Here is one approach, that assumes that there exists a user who starts during every hour:
select thehour, count(*)
from (select distinct date(start), hour(start),
(cast(date(start) as datetime) + interval hour(start) hour) as thehour
from sessions
) dh left outer join
sessions s
on s.start <= thehour + interval 1 hour and
s.end >= thehour
group by thehour
Note: these are untested so might have syntax errors.
OK, this is another problem where the index table comes to the rescue.
An index table is something that everyone should have in their toolkit, preferably in the master database. It is a table with a single id int primary key indexed column containing sequential numbers from 0 to n where n is a number big enough to do what you need, 100,000 is good, 1,000,000 is better. You only need to create this table once but once you do you will find it has all kinds of applications.
For your problem you need to consider each hour and, if I understand your problem you need to count every session that started before the end of the hour and hasn't ended before that hour starts.
Here is the SQL fiddle for the solution.
What it does is use a known sequential number from the indextable (only 0 to 100 for this fiddle - just over 4 days - you can see why you need a big n) to link with your data at the top and bottom of the hour.
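As a plain-Python illustration of the counting rule described above (a session counts for an hour if it started before the end of that hour and had not ended before the hour started), here is a minimal sketch. The function name `sessions_per_hour` and the sample sessions are made up; the loop over hours plays the role of the index table.

```python
from collections import Counter
from datetime import datetime, timedelta


def sessions_per_hour(sessions):
    """Count, for each clock hour, the sessions that were active at some point in it."""
    counts = Counter()
    for start, end in sessions:
        # First hour bucket the session touches.
        hour = start.replace(minute=0, second=0, microsecond=0)
        # Keep counting while the session had not ended before the hour started.
        while hour <= end:
            counts[hour] += 1
            hour += timedelta(hours=1)
    return counts


# Hypothetical sessions on one day.
sessions = [
    (datetime(2013, 5, 1, 9, 30), datetime(2013, 5, 1, 11, 15)),
    (datetime(2013, 5, 1, 10, 5), datetime(2013, 5, 1, 10, 20)),
]
for hour, count in sorted(sessions_per_hour(sessions).items()):
    print(hour, count)
```

Unlike the DISTINCT-based query earlier in the thread, this does not depend on some session starting in every hour, which is the same benefit the index table gives you in SQL.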

Calculating increasing or decreasing trend over time in MySQL

I have a table store_visits with the following structure:
store_visits:
store_name: string
visit_count: integer
visit_date: date
My goal is to create a query that for each store and a given date range, will calculate:
Average Number of Visits over the date range (currently using AVG(visit_count))
Whether store visits are increasing or decreasing
The relative rate of increase/decrease (1 to 4 scale where 1 = low rate, 4 = high rate)
The relative rate of increase/decrease in visits is for directional purpose only. It will always be a linear scale.
I've spent a day trying to construct the MySQL query to do this, and just can't get my head around it.
Any help would be greatly appreciated.
Thanks,
-Scott
Assuming you just want to compare the store visits in the first half of the date range to the second half, here's an example that spans the last 40 days using 2 sub-queries to get the counts for each range.
select
((endVisits + startVisits)/40) average,
(endVisits > startVisits) increasing,
((endVisits - startVisits)/(startVisits) * 100) percentChange
from
(select sum(visit_count) startVisits
from store_visits
where
visit_date > current_date - interval 40 day
and visit_date <= current_date - interval 20 day) startRange,
(select sum(visit_count) endVisits
from store_visits
where
visit_date > current_date - interval 20 day) endRange;
Notes
I don't know how you want to calculate your 1-4 increase amount, so I just made it a percentage, and you can modify that to whatever logic you want. Also, you'll need to update the date ranges in the sub-queries as needed.
Edit: Just updated the average to ((endVisits + startVisits)/40) instead of ((endVisits + startVisits)/2). You could also use the avg function in your sub-queries and divide the sum of those by 2 to get the average over the whole period.
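Here is the same first-half versus second-half comparison as a small Python sketch for one store, in case it is easier to experiment with the arithmetic there first. The function name `visit_trend` and the sample rows are made up, and the 1-4 scale is deliberately left out since its definition is still open.

```python
from datetime import date, timedelta


def visit_trend(visits, range_start, range_end):
    """Return (average per day, increasing?, percent change) for an inclusive date range.

    `visits` is a list of (visit_date, visit_count) rows for a single store.
    """
    days = (range_end - range_start).days + 1
    midpoint = range_start + timedelta(days=days // 2)
    first_half = sum(c for d, c in visits if range_start <= d < midpoint)
    second_half = sum(c for d, c in visits if midpoint <= d <= range_end)
    average = (first_half + second_half) / days
    # Percent change is undefined when the first half has no visits.
    percent_change = None if first_half == 0 else (second_half - first_half) / first_half * 100
    return average, second_half > first_half, percent_change


# Hypothetical rows for one store over four days.
visits = [
    (date(2024, 1, 1), 10),
    (date(2024, 1, 2), 10),
    (date(2024, 1, 3), 15),
    (date(2024, 1, 4), 15),
]
print(visit_trend(visits, date(2024, 1, 1), date(2024, 1, 4)))
```

The guard for an empty first half matters in SQL too: dividing by a zero or NULL startVisits yields NULL in MySQL rather than an error, so handle that case explicitly if you bucket the result.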