SQL Calculating Moving Average Crossover of variable lengths [duplicate]


This question already has answers here:
How to calculated multiple moving average in MySQL
(3 answers)
Closed 9 years ago.
I am trying to calculate moving average crossovers with variable dates.
My database is structured:
id
stock_id
date
closing_price
And:
stock_id
symbol
For example, I'd like to find out if the average price going back X days ever gets greater than the average price going back Y days within the past Z days. Each of those time periods is variable. This needs to be run for every stock in the database (about 3000 stocks with prices going back 100 years).
I'm a bit stuck on this. What I currently have is a mess of SQL subqueries that don't work because they can't account for the fact that X, Y, and Z can each be any value (0-N). That is, in the past 5 days I could be looking for a stock where the 40-day average is greater than the 5-day average, or the 5-day is greater than the 40-day. Or I could be looking over the past 40 days to find stocks where the 10-day moving average is greater than the 30-day moving average.
This question differs from the other questions because the short and long averaging periods are variable, as is the lookback term.

Please see these earlier posts on Stack Overflow:
How to calculated multiple moving average in MySQL
Calculate moving averages in SQL
These posts have solutions to your question.

I think the most direct way to do a moving average in MySQL is using a correlated subquery. Here is an example:
select p.*,
       (select avg(p2.closing_price)
        from prices p2
        where p2.stock_id = p.stock_id and
              p2.date between p.date - interval x day and p.date
       ) as MvgAvg_X,
       (select avg(p2.closing_price)
        from prices p2
        where p2.stock_id = p.stock_id and
              p2.date between p.date - interval y day and p.date
       ) as MvgAvg_Y
from prices p
You need to fill in the values for x and y.
For performance reasons, you will want an index on prices(stock_id, date, closing_price).
If you have an option for another database, Oracle, Postgres, and SQL Server 2012 all offer much better performing solutions for this problem.
In Postgres, you can write this as:
select p.*,
       avg(p.closing_price) over (partition by p.stock_id order by p.date rows x preceding) as AvgX,
       avg(p.closing_price) over (partition by p.stock_id order by p.date rows y preceding) as AvgY
from prices p
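To make the window arithmetic concrete, here is a minimal Python sketch of a trailing average over rows for a single stock. The function name `trailing_average` and the sample prices are hypothetical, not from the posts above. Note that a SQL frame of `rows x preceding` covers the current row plus the x rows before it, so it corresponds to `window = x + 1` here.

```python
from collections import deque


def trailing_average(prices, window):
    """Yield the mean of the last `window` prices (fewer rows at the start)."""
    recent = deque()
    total = 0.0
    for price in prices:
        recent.append(price)
        total += price
        if len(recent) > window:
            # Drop the oldest price once the window is full.
            total -= recent.popleft()
        yield total / len(recent)


# Hypothetical closing prices for one stock, oldest first.
closes = [10.0, 11.0, 12.0, 13.0, 14.0]
print(list(trailing_average(closes, 3)))  # [10.0, 10.5, 11.0, 12.0, 13.0]
```

Keeping a running total avoids re-summing the window for every row, which is roughly what the window-function version buys you over the correlated subquery.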

Related

How to query available item leases based on a date range in MySQL?

We have a business that rents out international phone numbers to customers when traveling. When a customer places an order, we want to show the customer the phone numbers that are available for the booking dates, based on the customer's start_date and end_date, and that are not already occupied.
Since these phone numbers are rented out, I need to select from the table ONLY those numbers that are not rented out yet for dates that would interfere with the current customers dates.
I also don't want to rent out any phone number prior to 7 days after its end date. Meaning, If a customer booked a phone number for 1-1-2020 through 1-20-2020, I don't want this phone number to be booked by another customer before 1-27-2020. I want the phone number to have a 7 day window of being clear.
I have a table with the phone numbers and a table with the orders that is related to the phone numbers table via phone_number_id. The orders table has the current customers start_date and end_date for travel without the phone number id saved yet to it. The orders table also has the start_date and end_date for all other customers dates of travel as well as which phone_number_id was assigned/booked up for their travel dates.
What would the MySQL query look like to select the phone numbers that are available for the current customer's dates?
This is the query I have built so far:
SELECT x.id
, x.area_code
, x.phone_number
, y.start_date
, y.end_date
FROM vir_num_table x
LEFT
JOIN orderitemsdetail_table y
ON y.vn_id = x.id
WHERE y.start_date BETWEEN '2020-01-11' AND '2020-01-18'
OR y.start_date IS NULL
I've built this query, but I'm stuck on how to add the end_date logic.
Any help would be appreciated! Thanks in advance.

Conceptually, I'd approach the problem as a cross product of the set of all phone numbers with the proposed reservation timeframe, and then exclude the phone numbers that have a conflicting reservation.
A conflict is an overlap: an existing reservation that has a start_date before the end of the proposed reservation AND an end_date on or after the start of the proposed reservation.
I'd do an anti-join pattern, something like this:
SELECT pn.phone_number
FROM phone_number pn
LEFT
JOIN reservation rs
ON rs.phone_number = pn.phone_number
AND rs.start_dt <= '2019-12-27' + INTERVAL +7 DAY
AND rs.end_dt > '2019-12-20' + INTERVAL -7 DAY
WHERE rs.phone_number IS NULL
That essentially says get all rows from phone number, along with matching rows from reservations (rows that overlap), but then exclude all the rows that had a match, leaving just phone_number rows that did not have a match.
We can change the < test to <=, or subtract 8 days instead of 7, to tailor the "7 day" window before the reservation; we can tweak this as we run the query through the test cases.
We can achieve an equivalent result using NOT EXISTS and a correlated subquery. Some people find this easier to comprehend than the anti-join, but it's essentially the same query doing the same thing: get all rows from phone_number, but exclude the rows where there is a matching (overlapping) row in reservation.
SELECT pn.phone_number
FROM phone_number pn
WHERE NOT EXISTS
( SELECT 1
FROM reservation rs
WHERE rs.phone_number = pn.phone_number
AND rs.start_dt <= '2019-12-27' + INTERVAL +7 DAY
AND rs.end_dt > '2019-12-20' + INTERVAL -7 DAY
)
There are several questions on StackOverflow about checking for overlap, or no overlap, of date ranges.
See e.g.
How to check if two date ranges overlap in mysql?
PHP/SQL - How can I check if a date user input is in between an existing date range in my database?
MySQL query to select distinct rows based on date range overlapping
EDIT
Based on the SQL added as an edit to the question, I'd do the query like this:
SELECT pn.`id`
, pn.`area_code`
, pn.`phone_number`
FROM `vir_num_table` pn
LEFT
JOIN `orderitemsdetail_table` rs
ON rs.vn_id = pn.id
AND rs.start_date <= '2020-01-18' + INTERVAL +7 DAY
AND rs.end_date > '2020-01-11' + INTERVAL -7 DAY
WHERE rs.vn_id IS NULL
There are two "tricky" parts. The first is the anti-join and understanding how it works: an outer join returns all rows from vir_num_table, and we then exclude any rows that have a matching row in orderitemsdetail_table. The second is checking for the overlap, which means coming up with the conditions r.start <= p.end AND r.end >= p.start, then deciding whether equality should count as an overlap, and adjusting for the extra seven days (the easiest way, to me, is to subtract the 7 days from the beginning of the proposed reservation).
... it now occurs to me that we need to add a guard period of 7 days on the end of the reservation period as well, doh!
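The overlap test with a guard period on both ends can be checked outside the database with a small Python sketch. This is only an illustration: `conflicts`, `available_numbers` and the sample data are hypothetical names, and the boundary choices (`<=` on the start side, strict `>` on the end side) simply mirror the SQL above and are open to the same tweaking.

```python
from datetime import date, timedelta

GUARD = timedelta(days=7)  # assumed guard period on both sides


def conflicts(res_start, res_end, want_start, want_end, guard=GUARD):
    """True if an existing reservation overlaps the wanted range padded by `guard`."""
    return res_start <= want_end + guard and res_end > want_start - guard


def available_numbers(numbers, reservations, want_start, want_end):
    """Anti-join in plain Python: keep numbers with no conflicting reservation."""
    blocked = {
        number
        for number, res_start, res_end in reservations
        if conflicts(res_start, res_end, want_start, want_end)
    }
    return [n for n in numbers if n not in blocked]


# Hypothetical data: (phone number, start, end) of existing reservations.
reservations = [
    ("A", date(2020, 1, 1), date(2020, 1, 20)),
    ("B", date(2020, 2, 10), date(2020, 2, 15)),
]
print(available_numbers(["A", "B", "C"], reservations, date(2020, 1, 27), date(2020, 2, 1)))
```

With these boundaries, number A (booked through 1-20) becomes available again for a booking that starts on 1-27, which matches the rule stated in the question.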

Here's a query plus sorting algo to choose the optimal phone number selection for maximum utilization efficiency (i.e. getting as close as possible to exactly 7 days before and after each use).
I set it to give open ends a weight of 9, so that "near perfect" fits (7-8 days before or after) would be selected ahead of open-ended numbers. This will yield a slight efficiency improvement, as open numbers can accommodate any reservation. You can adjust this for your needs. If you set this to 0, for example, it would always select open numbers first.
SELECT ph.phone_number,
       COALESCE(
         MIN(
           IF(res.start_date > '2020-01-18' AND res.end_date > '2020-01-18',
              NULL, -- ignore before-comparison for reservations starting and ending after date range
              DATEDIFF('2020-01-11', res.end_date)
           )
         ), 9) AS open_days_before,
       COALESCE(
         MIN(
           IF(res.start_date < '2020-01-11' AND res.end_date < '2020-01-11',
              NULL, -- ignore after-comparison for reservations starting and ending before date range
              DATEDIFF(res.start_date, '2020-01-18')
           )
         ), 9) AS open_days_after
FROM phone_number ph
LEFT JOIN reservation res
  ON res.phone_number = ph.phone_number
 AND res.end_date >= CURRENT_DATE() - INTERVAL 6 DAY
GROUP BY ph.phone_number
HAVING open_days_before >= 7
   AND open_days_after >= 7
ORDER BY open_days_before + open_days_after
LIMIT 1
Edit: updated to add grouping, because I realize this is an aggregate problem.
Edit 2: bug fix, changed MAX to MIN
Edit 3: added res.end_date >= CURRENT_DATE - INTERVAL 6 DAY to ignore past reservations, limiting aggregate data and treating phone number with no reservations between 6 days ago and the beginning of the new order as "open on the front-end"
Edit 4: added IF conditions to eliminate reservations outside the given before-or-after comparison ranges (e.g. comparing reservations after the selected range from influencing the "open days before" number), to prevent negative numbers, except when there's overlap with the selected range.

Based on the info you've added, you shouldn't need to check the start date of phone numbers which have been booked out.
Your customer provides you with a start date and an end date.
You only rent out phone numbers 7 days after their last lease ended.
All you need to do is fetch phone numbers which either:
- Are not rented out and therefore aren't in the orderitems table
- OR have an end_date which is at least 7 days before the new customer's start date.
Here you go:
SELECT
`main_table`.`id`,
`main_table`.`area_code`,
`main_table`.`phone_number`,
`orderitemsdetail_table`.`start_date`,
`orderitemsdetail_table`.`end_date`
FROM
`vir_num_table` AS `main_table`
LEFT JOIN
`orderitemsdetail_table` AS `orderitemsdetail_table` ON main_table.id = orderitemsdetail_table.vn_id
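WHERE
-- a reservation row passes if it ended more than 7 days before the new start, or begins after the new end
DATE_ADD(orderitemsdetail_table.end_date, INTERVAL 7 DAY) < '<CUSTOMER START DATE>'
OR orderitemsdetail_table.start_date > '<CUSTOMER END DATE>'
OR orderitemsdetail_table.id IS NULL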

MySQL Daily Time Coverage Without Gaps

I have a table like the following example:
What I need to do is return the coverage (number of hours an operator/s were onsite) for each day. The challenge is that I need to ignore gaps in coverage and not double count hours where two operators were signed in at the same time. For instance, the image below is a visual representation of the table.
The logic of the image is as follows:
Operator A: Signed in at 10 and signed out at noon for a total of 2 hours
Operator B: Signed in at 1 and signed out at 3 for a total of 2 hours
Operator A: Came back and signed in at 2 and signed out at 5 for a total of 3 hours, but 1 hour overlaps with operator B, so I cannot count that 1 hour; otherwise I would be double counting coverage
Therefore the total coverage time without overlaps is 6 hours, which is the value I need the query to produce. So far I can avoid double counting by taking the max and min dates of each day and subtracting the two:
SELECT YEAR, WEEK, SUM(HOURS)
FROM
(SELECT
YEAR(SignedIn) AS YEAR,
WEEK(SignedIn) AS WEEK,
DAY(SignedIn) AS DAY,
time_to_sec(timediff(MAX(SignedOut), MIN(SignedIn)))/ 3600 AS HOURS
FROM OperatorLogs
GROUP BY YEAR, WEEK, DAY) As VirtualTable
GROUP BY YEAR, WEEK
Which produces 7 because it takes the first sign-in (10 AM) and calculates the hours up until the last sign-out (4:00 PM). However, it includes the gap in coverage (12 - 1) which should not be included. I am unsure of how to remove that time from the total hours while also not double counting when there is overlap, i.e. from 2-3 there should only be 1 hour of coverage even though two separate operators are on site each putting in an hour. Any help is appreciated.

Sorry, work interrupted me.
Here's my working solution, I'm not convinced it's optimal due to the (relatively) expensive nature of the joins, but I've optimised it slightly based on the soft-rule that "shifts" never span multiple days.
SELECT
calendar_date,
SUM(coverage_seconds) / 3600 AS coverage_hours
FROM
(
-- Signins that didn't happen within another operator's shift
SELECT DISTINCT
DATE(e.signedin) AS calendar_date,
-(UNIX_TIMESTAMP(e.signedin) MOD 86400) AS coverage_seconds
FROM
OperatorLogs e
LEFT JOIN
OperatorLogs o
ON o.signedin >= DATE(e.signedin)
AND o.signedin < e.signedin
AND o.signedout >= e.signedin
WHERE
o.signedin IS NULL
UNION ALL
-- Signouts that didn't happen within another operator's shift
SELECT DISTINCT
DATE(e.signedout) AS calendar_date,
+(UNIX_TIMESTAMP(e.signedout) MOD 86400) AS coverage_seconds
FROM
OperatorLogs e
LEFT JOIN
OperatorLogs o
ON o.signedin >= DATE(e.signedout)
AND o.signedin <= e.signedout
AND o.signedout > e.signedout
WHERE
o.signedin IS NULL
)
AS coverage_markers
GROUP BY
calendar_date
;
Feel free to test it with more rigorous data...
https://www.db-fiddle.com/f/4RgWVhcdNEro21rUksVdXD/0
(As a note, to make your sample data match your excel image, your first shift should have started at 9am)
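For checking the SQL result against small data sets, the same "coverage without gaps or double counting" idea can be sketched in Python as an interval merge. This is a different technique from the marker-based query above, and the function name `coverage_hours` and the sample shifts (taken from the question's description of the image) are assumptions for illustration.

```python
from datetime import datetime


def coverage_hours(shifts):
    """Total hours covered by at least one (sign_in, sign_out) pair."""
    total_seconds = 0.0
    current_start = current_end = None
    for start, end in sorted(shifts):
        if current_end is None or start > current_end:
            # Gap before this shift: close the previous merged block.
            if current_end is not None:
                total_seconds += (current_end - current_start).total_seconds()
            current_start, current_end = start, end
        else:
            # Overlapping or touching shift: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        total_seconds += (current_end - current_start).total_seconds()
    return total_seconds / 3600


def at(hour):
    """Hypothetical helper: a timestamp on an arbitrary sample day."""
    return datetime(2020, 1, 1, hour)


# 10-12, 13-15 and 14-17 as described in the question.
print(coverage_hours([(at(10), at(12)), (at(13), at(15)), (at(14), at(17))]))  # 6.0
```

Sorting by sign-in time first means each shift only needs to be compared with the block currently being built, so no self-join is required.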

Pulling rows from 3x columns based off dates

I have 3 columns of importance in my table, each of which store a date.
ID
Inpatient_date
ER_date
I am trying to find which people (ID) went to the ER (ER_date) within 30 days of a hospital stay (Inpatient_date). I need to be able to look at every date within the inpatient_date column, and compare it to every date in the ER_date column. Then, from those results, I need to narrow it down further so that the row with the ER_date that was within 30 days and the row holding the Inpatient_date have the same person's ID.
I am at a loss on how to do this.

You can do this using exists:
select t.*
from t
where exists (select 1
              from t t2
              where t2.id = t.id and
                    t2.er_date > t.inpatient_date and
                    t2.er_date <= t.inpatient_date + interval 30 day
             );
I am interpreting your question as "visits the ER 1-30 days after being in the hospital". If you are looking for 30 days before, or 30 days before and after, you can adjust the condition in the subquery.
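The same "1-30 days after, same person" rule can be sketched in Python to sanity-check results on a handful of rows. The row layout `(id, inpatient_date, er_date)` with `None` for a missing date, and the name `er_within_30_days`, are assumptions made for this example only.

```python
from datetime import date, timedelta


def er_within_30_days(rows, window=timedelta(days=30)):
    """Return IDs with an ER date 1 to 30 days after one of their own inpatient dates."""
    inpatient = {}
    for person_id, inpatient_date, _ in rows:
        if inpatient_date is not None:
            inpatient.setdefault(person_id, []).append(inpatient_date)
    matches = set()
    for person_id, _, er_date in rows:
        if er_date is None:
            continue
        # Only compare against inpatient dates of the same person.
        for stay in inpatient.get(person_id, []):
            if stay < er_date <= stay + window:
                matches.add(person_id)
                break
    return sorted(matches)


rows = [
    (1, date(2020, 1, 1), None),
    (1, None, date(2020, 1, 20)),              # 19 days after the stay: match
    (2, date(2020, 1, 1), date(2020, 3, 1)),   # 60 days after: no match
    (3, date(2020, 1, 5), None),
    (4, None, date(2020, 1, 10)),              # near person 3's stay, but a different ID
]
print(er_within_30_days(rows))  # [1]
```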

SQL - Calculating variable moving average over variable lengths

FIRST: This question is NOT a duplicate. I have asked this on here already and it was closed as a duplicate. While it is similar to other threads on stackoverflow, it is actually far more complex. Please read the post before assuming it is a duplicate:
I am trying to calculate variable moving averages crossover with variable dates.
That is: I want to prompt the user for 3 values and 1 option. The input is through a web front end so I can build/edit the query based on input or have multiple queries if needed.
X = 1st moving average term (N day moving average. Any number 1-N)
Y = 2nd moving average term (N day moving average. Any number 1-N)
Z = Number of days back from the present to search for the occurrence of the X day moving average passing over OR under the Y day moving average
option = Over/Under (> or <. X passing over Y, or X passing under Y)
My database is structured:
tbl_daily_data
id
stock_id
date
adj_close
And:
tbl_stocks
stock_id
symbol
I have a btree index on:
daily_data(stock_id, date, adj_close)
stock_id
I am stuck on this query and having a lot of trouble writing it. If the variables were fixed it would seem trivial but because X, Y, Z are all 100% independent of each other (could look, for example for 5 day moving average within the past 100 days, or 100 day moving average within the past 5) I am having a lot of trouble coding it.
Please help! :(
Edit: I've been told some more context might be helpful?
We are creating an open stock analytic system where users can perform trend analysis. I have a database containing 3500 stocks and their price histories going back to 1970.
This query will be running every day in order to find stocks that match certain criteria
for example:
10 day moving average crossing over 20 day moving average within 5 days
20 day crossing UNDER 10 day moving average within 5 days
55 day crossing UNDER 22 day moving average within 100 days
But each user may be interested in a different analysis so I cannot just store the moving average with each row, it must be calculated.

I am not sure if I fully understand the question ... but something like this might help you get where you need to go: sqlfiddle
SET @X := 5;
SET @Y := 3;
SET @Z := 25;
SET @option := 'under';
SELECT * FROM (
  SELECT stock_id,
         DATEDIFF(CURRENT_DATE(), date) days_ago,
         adj_close,
         (
           -- average of the last @X rows for this stock, up to and including T1.date
           SELECT AVG(T2.adj_close) AS moving_average
           FROM tbl_daily_data T2
           WHERE T2.stock_id = T1.stock_id
             AND (
               SELECT COUNT(*)
               FROM tbl_daily_data T3
               WHERE T3.stock_id = T1.stock_id
                 AND T3.date BETWEEN T2.date AND T1.date
             ) BETWEEN 1 AND @X
         ) move_av_1,
         (
           -- average of the last @Y rows for this stock, up to and including T1.date
           SELECT AVG(T2.adj_close) AS moving_average
           FROM tbl_daily_data T2
           WHERE T2.stock_id = T1.stock_id
             AND (
               SELECT COUNT(*)
               FROM tbl_daily_data T3
               WHERE T3.stock_id = T1.stock_id
                 AND T3.date BETWEEN T2.date AND T1.date
             ) BETWEEN 1 AND @Y
         ) move_av_2
  FROM tbl_daily_data T1
  WHERE DATEDIFF(CURRENT_DATE(), date) <= @Z
) x
WHERE
  CASE WHEN @option = 'over' AND move_av_1 > move_av_2 THEN 1 ELSE 0 END +
  CASE WHEN @option = 'under' AND move_av_2 > move_av_1 THEN 1 ELSE 0 END > 0
ORDER BY stock_id, days_ago
Based on the answer by @Tom H here: How do I calculate a moving average using MySQL?
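The query above reports days on which one average is above the other. If "crossing" is read more strictly, as a change of sign between consecutive rows, the check could be sketched in Python as follows. This is one possible interpretation rather than what the query does; the names `moving_averages` and `crossed` are hypothetical, and X, Y and Z are counted in rows (trading days), which is an assumption.

```python
def moving_averages(closes, window):
    """Trailing mean per row; None until `window` rows are available."""
    out = []
    for i in range(len(closes)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(closes[i + 1 - window:i + 1]) / window)
    return out


def crossed(closes, x, y, z, option="over"):
    """True if the x-row average crosses over/under the y-row average in the last z rows."""
    ma_x = moving_averages(closes, x)
    ma_y = moving_averages(closes, y)
    start = max(1, len(closes) - z)
    for i in range(start, len(closes)):
        if None in (ma_x[i - 1], ma_y[i - 1], ma_x[i], ma_y[i]):
            continue  # not enough history yet for one of the averages
        before = ma_x[i - 1] - ma_y[i - 1]
        after = ma_x[i] - ma_y[i]
        if option == "over" and before <= 0 < after:
            return True
        if option == "under" and before >= 0 > after:
            return True
    return False


# Hypothetical adjusted closes for one stock, oldest first.
closes = [10, 10, 10, 10, 9, 8, 12, 14]
print(crossed(closes, 2, 4, 3, "over"))   # True
print(crossed(closes, 2, 4, 3, "under"))  # False
```

Because X, Y, Z and the option are plain arguments, the same function covers every combination the front end might ask for; the cost is that the price history for each stock has to be fetched once, ordered by date.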

Tricky Rails3/mysql query

In Rails 3 (also with the meta_where gem, if you feel like using it in your query), I have a really tricky query that I have been banging my head against:
Suppose I have two models, customers and purchases, where a customer has many purchases. Let's define customers with at least 2 purchases as "repeat_customer". I need to find the total number of repeat_customers by each day for the past 3 months, something like:
Date TotalRepeatCustomerCount
1/1/11 10 (10 repeat customers by the end of 1/1/11)
1/2/11 15 (5 more customer gained "repeat" status on this date)
1/3/11 16 (1 more customer gained "repeat" status on this date)
...
3/30/11 150
3/31/11 160
Basically I need to group customer count based on the date of creation of their second purchase, since that is when they "gain repeat status".
Certainly this can be achieved in ruby, something like:
Customer.includes(:purchases).all.select{|x| x.purchases.count >= 2 }.group_by{|x| x.purchases.second.created_at.to_date }.map{|date, customers| [date, customers.count]}
However, the above code will fire queries along the lines of Customer.all and Purchase.all, then do a bunch of calculation in Ruby. I would much prefer doing the selection, grouping and calculations in MySQL, since it is not only much faster but also reduces the bandwidth from the database. In large databases, the code above is basically useless.
I have been trying for a while to conjure up the query in rails/active_record, but have no luck even with the nice meta_where gem. If I have to, I will accept a solution in pure mysql query as well.
Edited: I would cache it (or add a "repeat" field to customers), though only for this simplified problem. The criteria for repeat customer can change by the client at any point (2 purchases, 3 purchases, 4 purchases etc), so unfortunately I do have to calculate it on the spot.

SELECT p_date, COUNT(customers.id) FROM
(
SELECT p_date - INTERVAL 1 day p_date, customers.id
FROM
customers NATURAL JOIN purchases
JOIN (SELECT DISTINCT date(purchase_date) p_date FROM purchases) p_dates
WHERE purchases.purchase_date < p_date
GROUP BY p_date, customers.id
HAVING COUNT(purchases.id) >= 2
) a
GROUP BY p_date
I didn't test this in the slightest, so I hope it works. Also, I hope I understood what you are trying to accomplish.
But please note that you should not do this, it'll be too slow. Since the data never changes once the day is passed, just cache it for each day.
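The grouping the question describes (count each customer on the date of the purchase that gave them "repeat" status, then keep a running total) can be sketched in Python for cross-checking a SQL result. The names `repeat_customer_totals` and `threshold`, and the sample data, are hypothetical; only dates on which at least one customer gained repeat status appear in the output.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate


def repeat_customer_totals(purchases, threshold=2):
    """Map each date to the running count of customers with `threshold` purchases by then."""
    by_customer = defaultdict(list)
    for customer_id, purchased_on in purchases:
        by_customer[customer_id].append(purchased_on)
    gained = defaultdict(int)
    for dates in by_customer.values():
        if len(dates) >= threshold:
            # The date of the threshold-th purchase is when repeat status is gained.
            gained[sorted(dates)[threshold - 1]] += 1
    days = sorted(gained)
    running = accumulate(gained[d] for d in days)
    return dict(zip(days, running))


# Hypothetical (customer_id, purchase date) pairs.
purchases = [
    (1, date(2011, 1, 1)), (1, date(2011, 1, 1)),
    (2, date(2011, 1, 1)), (2, date(2011, 1, 2)),
    (3, date(2011, 1, 2)), (3, date(2011, 1, 3)),
    (4, date(2011, 1, 3)),
]
print(repeat_customer_totals(purchases))
```

Passing a different `threshold` covers the case where the definition of a repeat customer changes to 3 or 4 purchases.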
