I've been tasked with producing output that shows the number of days that passed between an order and its shipment, like this:
order_date   orders   Days0   Days1   Days7Plus
-----------------------------------------------
2022-11-01   12       9       3       1
2022-11-15   22       20      0       2
2022-12-02   77       65      5       7
I'm sure you can imagine example underlying data, where there's an orders table with a unique ID per record, an order date that can share multiple IDs, and each order has its own ship date.
The hard part is counting only business days, which required subtracting weekends and holidays from the date range days. I got that all figured out but it required copy-pasting these ugly sub-queries 7 more times :/ While this can be dynamically generated in other code, I figured there must be a cleaner way, since other people (some non-devs) may be testing or reviewing this, and I'll probably get grief about it.
Here's the query essentially:
# get orders shipped count
SELECT
...
# orders that were shipped x number of days from receipt
SUM(COALESCE(DATEDIFF(shipped, ordered), 0)
- ( # subtract weekend days
5 * (DATEDIFF('2022-12-05', '2022-11-01') DIV 7)
+ MID('0123444401233334012222340111123400012345001234550',
7 * WEEKDAY('2022-11-01') + WEEKDAY('2022-12-05') + 1, 1
)
)
- ( # subtract holidays
SELECT COUNT(`date`) FROM holiday WHERE active = 1
AND `date` BETWEEN '2022-11-01' AND '2022-12-05'
AND DAYOFWEEK(`date`) < 6
)
= 0) AS 0Days, # subsequently 1Days, 2Days, 3Days, etc
...
SUM(COALESCE(DATEDIFF(shipped, ordered), 0)
- ( # subtract weekend days
5 * (DATEDIFF('2022-12-05', '2022-11-01') DIV 7)
+ MID('0123444401233334012222340111123400012345001234550',
7 * WEEKDAY('2022-11-01') + WEEKDAY('2022-12-05') + 1, 1
)
)
- ( # subtract holidays
SELECT COUNT(`date`) FROM holiday WHERE active = 1
AND `date` BETWEEN '2022-11-01' AND '2022-12-05'
AND DAYOFWEEK(`date`) < 6
)
>= 7) AS Days7Plus
FROM orders
WHERE
AND ordered BETWEEN :startDate AND :endDate
GROUP BY CAST(ordered AS DATE)
ORDER BY ordered
I got the MID calculation from https://stackoverflow.com/a/6762805/14744970
I feel pretty proud of getting it all together, but I feel like I'm a small step away from collapsing the redundancy down somehow that I'm not quite understanding.
Note that I don't know if the GROUP BY actually matters with any sort of simplifying of the redundant statements.
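One way to picture the de-duplication being asked for: compute the business-day count once per order, then derive every bucket from that single value. The sketch below does this in Python rather than SQL, purely to illustrate the shape of the idea. The function names, the sample rows, and the single holiday are made up for the example, and it counts the days after the order date up to and including the ship date, which may not match the exact convention of the query above.

```python
from collections import Counter
from datetime import date, timedelta


def business_days_between(ordered: date, shipped: date, holidays: set) -> int:
    """Count weekdays after `ordered` up to and including `shipped`, skipping holidays."""
    days = 0
    current = ordered
    while current < shipped:
        current += timedelta(days=1)
        # weekday() is 0 for Monday through 6 for Sunday, like MySQL's WEEKDAY()
        if current.weekday() < 5 and current not in holidays:
            days += 1
    return days


def bucket(days: int) -> str:
    """Map a business-day count to a column name; 7 or more share one bucket."""
    return "Days7Plus" if days >= 7 else f"Days{days}"


# Hypothetical sample rows: (ordered, shipped)
orders = [
    (date(2022, 11, 1), date(2022, 11, 1)),    # shipped the same day
    (date(2022, 11, 4), date(2022, 11, 7)),    # Friday to Monday
    (date(2022, 11, 23), date(2022, 11, 25)),  # spans a holiday
]
holidays = {date(2022, 11, 24)}  # assumed holiday table contents

counts = Counter(bucket(business_days_between(o, s, holidays)) for o, s in orders)
print(counts)
```

In SQL the same shape would be a derived table (or CTE) that exposes the business-day count as one column, with the outer query doing only the `SUM(days = n)` bucketing.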
Related
We have a business that rents out international phone numbers to customers when they travel. When a customer places an order, we want to show the phone numbers that are available for the booking dates, based on the order's start_date and end_date, excluding numbers that are already occupied.
Since these phone numbers are rented out, I need to select from the table ONLY those numbers that are not rented out for dates that would interfere with the current customer's dates.
I also don't want to rent out any phone number until 7 days after its end date. That is, if a customer booked a phone number for 1-1-2020 through 1-20-2020, I don't want this phone number to be booked by another customer before 1-27-2020. I want the phone number to have a 7-day clear window.
I have a table with the phone numbers and a table with the orders, which is related to the phone numbers table via phone_number_id. The orders table has the current customer's start_date and end_date for travel, without a phone number id saved to it yet. The orders table also has the start_date and end_date for all other customers' travel dates, as well as which phone_number_id was assigned/booked for those dates.
What would the MySQL query look like to select the phone numbers that are available for the current customer's dates?
I've built the query below so far:
SELECT x.id
, x.area_code
, x.phone_number
, y.start_date
, y.end_date
FROM vir_num_table x
LEFT
JOIN orderitemsdetail_table y
ON y.vn_id = x.id
WHERE y.start_date BETWEEN '2020-01-11' AND '2020-01-18'
OR y.start_date IS NULL
I've built this query, but I'm stuck on how to add the end_date logic.
Any help would be appreciated! Thanks in advance.
The way I'd approach the problem conceptually is as a cross product of the set of all phone numbers with the reservation timeframe, and then exclude those where there's a conflicting reservation.
A conflict would be an overlap: an existing reservation that has a start_date on or before the end of the proposed reservation AND has an end_date on or after the start of the proposed reservation.
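As a minimal sketch of that overlap test with the 7-day buffer folded in, here is the predicate written as a plain Python function. The name `conflicts`, the `GUARD` constant, and the choice of which bound is strict are assumptions for illustration; they mirror the `<=` and `>` comparisons used in the queries in this answer and can be adjusted.

```python
from datetime import date, timedelta

GUARD = timedelta(days=7)  # assumed buffer on both sides of an existing reservation


def conflicts(res_start: date, res_end: date, req_start: date, req_end: date) -> bool:
    """True if the existing reservation, padded by GUARD, overlaps the request."""
    return res_start - GUARD <= req_end and res_end + GUARD > req_start


# Existing booking from the question: 1-1-2020 through 1-20-2020
booked = (date(2020, 1, 1), date(2020, 1, 20))
print(conflicts(*booked, date(2020, 1, 26), date(2020, 2, 1)))  # True
print(conflicts(*booked, date(2020, 1, 27), date(2020, 2, 1)))  # False
```

With these bounds a new booking can start on 1-27-2020 at the earliest, which matches the example in the question.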
I'd do an anti-join pattern, something like this:
SELECT pn.phone_number
FROM phone_number pn
LEFT
JOIN reservation rs
ON rs.phone_number = pn.phone_number
AND rs.start_dt <= '2019-12-27' + INTERVAL +7 DAY
AND rs.end_dt > '2019-12-20' + INTERVAL -7 DAY
WHERE rs.phone_number IS NULL
That essentially says get all rows from phone number, along with matching rows from reservations (rows that overlap), but then exclude all the rows that had a match, leaving just phone_number rows that did not have a match.
We can change a strict comparison to a non-strict one (> to >=), or subtract 8 days instead of 7, to tailor the "7 day" window before; we can tweak this as we run the query through the test cases.
We can achieve an equivalent result using a NOT EXISTS and a correlated subquery. Some people find this easier to comprehend than the anti-join, but it's essentially the same query, doing the same thing: get all rows from phone_number but exclude the rows where there is a matching (overlapping) row in reservation.
SELECT pn.phone_number
FROM phone_number pn
WHERE NOT EXISTS
( SELECT 1
FROM reservation rs
WHERE rs.phone_number = pn.phone_number
AND rs.start_dt <= '2019-12-27' + INTERVAL +7 DAY
AND rs.end_dt > '2019-12-20' + INTERVAL -7 DAY
)
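To check that the two forms really do return the same rows, here is a small self-contained sketch that runs both against an in-memory SQLite database. This is SQLite, not MySQL, so the padded dates are passed in as already-computed parameters instead of using `INTERVAL`; the sample phone numbers and reservations are invented for the test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phone_number (phone_number TEXT PRIMARY KEY);
    CREATE TABLE reservation (phone_number TEXT, start_dt TEXT, end_dt TEXT);
    INSERT INTO phone_number VALUES ('555-0001'), ('555-0002'), ('555-0003');
    -- 555-0001 overlaps the request, 555-0002 ended long before, 555-0003 has no rows
    INSERT INTO reservation VALUES
        ('555-0001', '2019-12-18', '2019-12-22'),
        ('555-0002', '2019-11-01', '2019-11-05');
""")

# Request 2019-12-20 .. 2019-12-27, already padded by 7 days on each side
params = {"padded_end": "2020-01-03", "padded_start": "2019-12-13"}

anti_join = """
    SELECT pn.phone_number
      FROM phone_number pn
      LEFT JOIN reservation rs
        ON rs.phone_number = pn.phone_number
       AND rs.start_dt <= :padded_end
       AND rs.end_dt > :padded_start
     WHERE rs.phone_number IS NULL
     ORDER BY pn.phone_number
"""
not_exists = """
    SELECT pn.phone_number
      FROM phone_number pn
     WHERE NOT EXISTS (SELECT 1
                         FROM reservation rs
                        WHERE rs.phone_number = pn.phone_number
                          AND rs.start_dt <= :padded_end
                          AND rs.end_dt > :padded_start)
     ORDER BY pn.phone_number
"""
available_a = [row[0] for row in conn.execute(anti_join, params)]
available_b = [row[0] for row in conn.execute(not_exists, params)]
print(available_a, available_b)
```

Both lists contain 555-0002 and 555-0003; only the number with an overlapping reservation is filtered out.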
There are several questions on StackOverflow about checking for overlap, or no overlap, of date ranges.
See e.g.
How to check if two date ranges overlap in mysql?
PHP/SQL - How can I check if a date user input is in between an existing date range in my database?
MySQL query to select distinct rows based on date range overlapping
EDIT
Based on the SQL added as an edit to the question, I'd do the query like this:
SELECT pn.`id`
, pn.`area_code`
, pn.`phone_number`
FROM `vir_num_table` pn
LEFT
JOIN `orderitemsdetail_table` rs
ON rs.vn_id = pn.id
AND rs.start_date <= '2020-01-18' + INTERVAL +7 DAY
AND rs.end_date > '2020-01-11' + INTERVAL -7 DAY
WHERE rs.vn_id IS NULL
There are two "tricky" parts. The first is the anti-join and understanding how it works: an outer join that returns all rows from vir_num_table, then excludes any rows that have a matching row in orderitemsdetail_table. The second tricky part is checking for the overlap, coming up with the conditions r.start <= p.end AND r.end >= p.start, then tweaking whether we want to count equality as an overlap, and tweaking the extra seven days (the easiest way, to me, is to just subtract the 7 days from the beginning of the proposed reservation).
... it now occurs to me that we need to add a guard period of 7 days on the end of the reservation period as well, doh!
Here's a query plus sorting algo to choose the optimal phone number selection for maximum utilization efficiency (i.e. getting as close as possible to exactly 7 days before and after each use).
I set it to give open ends a weight of 9, so that "near perfect" fits (7-8 days before or after) would be selected ahead of open-ended numbers. This will yield a slight efficiency improvement, as open numbers can accommodate any reservation. You can adjust this for your needs. If you set this to 0, for example, it would always select open numbers first.
SELECT ph.phone_number,
COALESCE(
MIN(
IF(res.start_date > '2020-01-18', -- assumes start_date <= end_date; a chained a > b > c would compare a boolean to the date
NULL, -- ignore before-comparison for reservations starting and ending after date range
DATEDIFF('2020-01-11', res.end_date)
)), 9) AS open_days_before,
COALESCE(
MIN(
IF(res.end_date < '2020-01-11', -- assumes start_date <= end_date
NULL, -- ignore after-comparison for reservations starting and ending before date range
DATEDIFF(res.start_date, '2020-01-18')
)), 9) AS open_days_after
FROM phone_number ph
LEFT JOIN reservation res
ON res.phone_number = ph.phone_number
AND res.end_date >= CURRENT_DATE() - INTERVAL 6 DAY
GROUP BY ph.phone_number
HAVING open_days_before >= 7
AND open_days_after >= 7
ORDER BY open_days_before + open_days_after
LIMIT 1
Edit: updated to add grouping, because I realize this is an aggregate problem.
Edit 2: bug fix, changed MAX to MIN
Edit 3: added res.end_date >= CURRENT_DATE - INTERVAL 6 DAY to ignore past reservations, limiting aggregate data and treating phone number with no reservations between 6 days ago and the beginning of the new order as "open on the front-end"
Edit 4: added IF conditions to eliminate reservations outside the given before-or-after comparison ranges (e.g. comparing reservations after the selected range from influencing the "open days before" number), to prevent negative numbers, except when there's overlap with the selected range.
Based on the info you've added then you shouldn't need to check the start date of phone numbers which have been booked out.
Your customer provides you with a start date and an end date.
You only rent out phone numbers 7 days after their last lease ended.
All you need to do is fetch back phone numbers which either:
- Are not rented out and therefore aren't in the orderitems table
- OR have an end_date which is at least 7 days before the new customer's start date.
Here you go:
SELECT
`main_table`.`id`,
`main_table`.`area_code`,
`main_table`.`phone_number`,
`orderitemsdetail_table`.`start_date`,
`orderitemsdetail_table`.`end_date`
FROM
`vir_num_table` AS `main_table`
LEFT JOIN
`orderitemsdetail_table` AS `orderitemsdetail_table` ON main_table.id = orderitemsdetail_table.vn_id
WHERE
(DATE_ADD(orderitemsdetail_table.end_date, INTERVAL 7 DAY) < '<CUSTOMER START DATE>'
OR orderitemsdetail_table.start_date > '<CUSTOMER END DATE>')
OR orderitemsdetail_table.id IS NULL
Today I'd like some help creating scores per user in my database. I have this query:
SELECT
r1.id,
r1.nickname,
r1.fecha,
r1.bestia1,
r1.bestia2,
r1.bestia3,
r1.bestia4,
r1.bestia5
FROM
reporte AS r1
INNER JOIN
( SELECT
nickname, MAX(fecha) AS max_date
FROM
reporte
GROUP BY
nickname ) AS latests_reports
ON latests_reports.nickname = r1.nickname
AND latests_reports.max_date = r1.fecha
ORDER BY
r1.fecha DESC
That's from a friend on this site who helped me get "the last record per user in each day". Based on this, I'm looking for a way to total the results into a daily, weekly or monthly ranking, in order to use statistics charts or Google Data Studio. I've tried the following:
select id, nickname, sum(bestia1), sum(bestia2), etc...
But it's not giving the complete result I want, which is why I'm looking for help. I also know about Data Studio filters, which let me show many charts, but I still can't get a complete count.
For example, one player reported 265 monsters killed in the last 30 days, but when I use my query in Data Studio it counts only the latest value (it can be 12). So I want to count correctly in order to use the result with charts.
SQL records filtered with my query:
One general approach to get the total monsters killed by each user over the latest X days, and to calculate a score like the one you proposed in the comments, looks like this:
SET @daysOnHistory = X; -- Where X should be a positive integer (like 10).
SELECT
nickname,
SUM(bestia1) AS total_bestia1_killed,
SUM(bestia2) AS total_bestia2_killed,
SUM(bestia3) AS total_bestia3_killed,
SUM(bestia4) AS total_bestia4_killed,
SUM(bestia5) AS total_bestia5_killed,
SUM(bestia1 + bestia2 + bestia3 + bestia4 + bestia5) AS total_monsters_killed,
SUM(bestia1 + 2 * bestia2 + 3 * bestia3 + 4 * bestia4 + 5 * bestia5) AS total_score
FROM
reporte
WHERE
fecha >= DATE_ADD(DATE(NOW()), INTERVAL -@daysOnHistory DAY)
GROUP BY
nickname
ORDER BY
total_score DESC
Now, if you want the same calculation but only taking into account the days of the current week (assuming a week starts on Monday), you need to replace the previous WHERE clause with the following one:
WHERE
fecha >= DATE_ADD(DATE(NOW()), INTERVAL -WEEKDAY(NOW()) DAY)
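The "start of the current week" expression works because WEEKDAY() returns 0 for Monday, so subtracting that many days from today always lands on Monday. Python's `date.weekday()` uses the same convention, which makes the arithmetic easy to check outside the database. The helper name `start_of_week` is made up for this sketch.

```python
from datetime import date, timedelta


def start_of_week(today: date) -> date:
    """Return the Monday on or before `today`."""
    # date.weekday() is 0 for Monday, the same convention as MySQL's WEEKDAY()
    return today - timedelta(days=today.weekday())


print(start_of_week(date(2018, 11, 8)))  # 2018-11-05 (a Thursday maps to its Monday)
print(start_of_week(date(2018, 11, 5)))  # 2018-11-05 (a Monday maps to itself)
```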
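Likewise, if you want the same thing but only taking into account the days of the current month, you need to replace the WHERE clause with the following (the YEAR check keeps the same month of earlier years from being counted):
WHERE
MONTH(fecha) = MONTH(NOW()) AND YEAR(fecha) = YEAR(NOW())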
To evaluate the statistics over the days of the current year, you need to replace the WHERE clause with:
WHERE
YEAR(fecha) = YEAR(NOW())
And finally, for evaluation on a specific range of days you can use, for example:
WHERE
DATE(fecha) BETWEEN CAST("2018-10-15" AS DATE) AND CAST('2018-11-10' AS DATE)
I hope this guide will help you and clarify your outlook.
This will give you the number of monsters killed in the last 30 days per user:
SELECT
nickname,
sum(bestia1) as bestia1,
sum(bestia2) as bestia2,
sum(bestia3) as bestia3,
sum(bestia4) as bestia4,
sum(bestia5) as bestia5
FROM
reporte
WHERE fecha >= DATE_ADD(curdate(), interval -30 day)
GROUP BY nickName
ORDER BY
I'd like some help with a SQL query that retrieves bookings with a specific age group. Basically, I want to retrieve bookings that contain both adult and child customers, which are determined only by date of birth. Children are 15 years old and below, and adults are more than 15 years of age. I want to retrieve bookings that have children and adults, where no one exceeds 20 years of age. No booking should be retrieved if even one customer in it is more than 20 years old, and bookings should have more than 1 customer. Here's a sample table for your reference:
Booking No 123
Customer 1 - March 1, 2008
Customer 2 - Aug 3, 1998
Booking No 456
Customer 1 - March 2, 1986
Customer 2 - Feb 9, 2007
Customer 3 - Apr 10, 1999
Booking No 789
Customer 1 - Jun 7, 1999
The booking that needs to be retrieved is only Booking No 123. No age is provided in the table and computed only using Date of birth - DateDiff.
BookingID
CustomerID
LName
FName
DOB
ReservationID
BookingID
CompanyID
ArrivalDate
CompanyName
This is the WHERE clause that I've written:
(SELECT DATEDIFF(YEAR ,bp.DOB,GETDATE())) <= 20 AND (SELECT DATEDIFF(YEAR ,bp.DOB,GETDATE())) < 15
But still pulling bookings containing customers > 20 yrs old.
This should get you the bookings. The date computation should be based on the RESERVATION Arrival Date, as such if considering older or future reservations, getdate() WOULD alter the computed age at the time of arrival.
I am doing a direct join between the reservation and booking tables grouped by booking and qualifying every occupant's age.
SELECT
R.BookingID
FROM
BookingCustomer BC
JOIN Reservation R
ON BC.BookingID = R.BookingID
group by
R.BookingID
having
SUM( case when DATEDIFF(YEAR , BC.DOB, R.ArrivalDate ) < 16 then 1 else 0 end ) > 0
AND SUM( case when DATEDIFF(YEAR , BC.DOB, R.ArrivalDate ) >= 16
and DATEDIFF(YEAR , BC.DOB, R.ArrivalDate ) < 21 then 1 else 0 end ) > 0
AND SUM( case when DATEDIFF(YEAR , BC.DOB, R.ArrivalDate ) > 20 then 1 else 0 end ) = 0
Now, since a booking is all pointing to a same reservation, you COULD grab all the other fields at the same time
SELECT
R.BookingID,
R.ReservationID,
R.CompanyID,
R.ArrivalDate,
R.CompanyName ... rest of query.
If the query nags about non-aggregate fields, you could just wrap the other fields not part of the group by as MAX() since a booking is always pointing to the same respective reservation and the parent reservation details would not change anyhow.
SELECT
R.BookingID,
MAX( R.ReservationID ) ReservationID,
MAX( R.CompanyID ) CompanyID,
MAX( R.ArrivalDate ) ArrivalDate,
MAX( R.CompanyName ) CompanyName ... rest of query.
Okay, now we can see what's going on (and what's going wrong for you).
Your current query has this:
WHERE (SELECT DATEDIFF(YEAR ,bp.DOB,GETDATE())) <= 20
AND (SELECT DATEDIFF(YEAR ,bp.DOB,GETDATE())) < 15
...this can be translated to:
WHERE (the number of times January 1st is passed) <= 20
AND AT THE SAME TIME (the number of times January 1st is passed) < 15
Besides the fact that ANDs are exclusive - rows have to match both conditions - what's going on is that DATEDIFF counts the number of "boundaries" crossed for the given measure:
Returns the count (signed integer) of the specified datepart boundaries crossed between the specified startdate and enddate.
... and of course the boundary for a year would be January 1st.
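A small sketch of why the boundary count is not the same thing as an age. The two helper functions are hypothetical stand-ins: one reproduces what a year-boundary count reports, the other computes completed years. The date of birth is Customer 2 of Booking No 123 from the question, and the "today" value is an arbitrary date picked for the illustration.

```python
from datetime import date


def year_boundaries(start: date, end: date) -> int:
    """What DATEDIFF(YEAR, start, end) reports: January 1sts crossed."""
    return end.year - start.year


def age_in_years(dob: date, on: date) -> int:
    """Completed years of age on the given date."""
    had_birthday = (on.month, on.day) >= (dob.month, dob.day)
    return on.year - dob.year - (0 if had_birthday else 1)


dob = date(1998, 8, 3)    # Customer 2 of Booking No 123
today = date(2019, 1, 1)  # hypothetical "today"
print(year_boundaries(dob, today), age_in_years(dob, today))  # 21 20
```

On that date the customer is still 20, but the boundary count already says 21, so a filter built on it would misclassify the booking.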
First, a digression on range searching in a database. When you do this, WHERE DATEDIFF(YEAR ,bp.DOB,GETDATE()) <= 20, you usually cause the database to ignore indices, which are ways to speed up queries; this is because it has to calculate a value (here, the difference in the year) for each row in the table (because otherwise it doesn't know if the calculated value matches).
Instead, it's better to do any "math", whenever possible, on constant values, since the database is going to remember them. The form we should use here will also solve the "selecting older customers" problem too:
WHERE DOB <= DATEADD(year, -21, GETDATE())
(This is equivalent to those "you are 21 if you were born on or before this date in the year XXXX" signs you see in grocery stores)
Now that that's out of the way, we need to figure out what we actually need. Restating your conditions above, we're looking for bookings with (all of):
At least one customer 20 years or younger
At least one customer younger than 15 years
No customers more than 20 years old
At least two customers.
Now... Presumably we don't care about multiple (or single) customers that are younger than 20 years old, so long as they're also all more than 15 years old, so we should modify the first condition. Also, quite probably we need to warn if there are only customers in a booking who are 15 years or younger - they don't even have an "almost" adult! And quite probably we need to warn if this person would be all alone, too! So the conditions should be changed to:
At least one customer younger than 15 years
No customers 21 years or older
(please tell me if this restatement was incorrect)
Now that we know our conditions, we can write our statement. We are looking for bookings:
SELECT ReservationID, BookingID, CompanyID, ArrivalDate, CompanyName
FROM Booking
Where there is at least one customer younger than 15 years:
WHERE EXISTS (SELECT 1
FROM BookingCustomer
WHERE BookingCustomer.bookingId = Booking.BookingId
-- birthday after 15 years ago today
AND BookingCustomer.dob > DATEADD(year, -15, GETDATE()))
And there is also no customer 21 years or older:
AND NOT EXISTS (SELECT 1
FROM BookingCustomer
WHERE BookingCustomer.bookingId = Booking.BookingId
-- birthday before or on 21 years ago today
AND BookingCustomer.dob <= DATEADD(year, -21, GETDATE()))
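To sanity-check the two restated conditions against the sample data from the question, here is the same logic as a short Python sketch. The helper names and the reference date are assumptions made for the example; the cutoffs are computed once from the reference date and compared against each date of birth, in the same spirit as the DATEADD comparisons above.

```python
from datetime import date


def years_before(day: date, years: int) -> date:
    """Same calendar day `years` earlier; Feb 29 falls back to Feb 28."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def qualifies(dobs: list, reference: date) -> bool:
    """At least one customer younger than 15 and nobody 21 or older."""
    has_child = any(dob > years_before(reference, 15) for dob in dobs)
    has_21_plus = any(dob <= years_before(reference, 21) for dob in dobs)
    return has_child and not has_21_plus


# Dates of birth from the sample bookings in the question
bookings = {
    123: [date(2008, 3, 1), date(1998, 8, 3)],
    456: [date(1986, 3, 2), date(2007, 2, 9), date(1999, 4, 10)],
    789: [date(1999, 6, 7)],
}
reference = date(2019, 1, 1)  # hypothetical reference date
selected = [no for no, dobs in bookings.items() if qualifies(dobs, reference)]
print(selected)  # [123]
```

The result depends on the reference date: run far enough in the future, Booking No 123 drops out as well, because its older customer eventually turns 21.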
side note: most of the time for booked tours and stuff, they only care about ages at the time the trip is taken, not the booking time, or whatever "today" happens to be when you run this. You probably don't want GETDATE(), but something else, likely ArrivalDate. Since you'd be doing math on a column it would again force a table scan, but keeping the age check - and modifying it a bit to take into account how far ahead a booking can be made - would knock out bookings "earlier" because somebody is definitely old enough (or nobody young enough).
Thanks for the detailed response. I tried the suggested query and it's not returning any booking
WHERE EXISTS (SELECT 1
FROM BookingCustomer
WHERE BookingCustomer.bookingId = Booking.BookingId
-- birthday after 15 years ago today
OR BookingCustomer.dob > DATEADD(year, -15, BOOKING.ARRIVALDATE))
AND NOT EXISTS (SELECT 1
FROM BookingCustomer
WHERE BookingCustomer.bookingId = Booking.BookingId
-- birthday before or on 21 years ago today
AND BookingCustomer.dob <= DATEADD(year, -21, BOOKING.ARRIVALDATE))
FIRST: This question is NOT a duplicate. I have asked this on here already and it was closed as a duplicate. While it is similar to other threads on stackoverflow, it is actually far more complex. Please read the post before assuming it is a duplicate:
I am trying to calculate variable moving averages crossover with variable dates.
That is: I want to prompt the user for 3 values and 1 option. The input is through a web front end so I can build/edit the query based on input or have multiple queries if needed.
X = 1st moving average term (N day moving average. Any number 1-N)
Y = 2nd moving average term. (N day moving average. Any number 1-N)
Z = Number of days back from the present to search for the occurrence of:
option = Over/Under: (> or <. X passing over Y, or X passing Under Y)
X day moving average passing over OR under Y day moving average
within the past Z days.
My database is structured:
tbl_daily_data
id
stock_id
date
adj_close
And:
tbl_stocks
stock_id
symbol
I have a btree index on:
daily_data(stock_id, date, adj_close)
stock_id
I am stuck on this query and having a lot of trouble writing it. If the variables were fixed it would seem trivial but because X, Y, Z are all 100% independent of each other (could look, for example for 5 day moving average within the past 100 days, or 100 day moving average within the past 5) I am having a lot of trouble coding it.
Please help! :(
Edit: I've been told some more context might be helpful?
We are creating an open stock analytic system where users can perform trend analysis. I have a database containing 3500 stocks and their price histories going back to 1970.
This query will be running every day in order to find stocks that match certain criteria
for example:
10 day moving average crossing over 20 day moving average within 5
days
20 day crossing UNDER 10 day moving average within 5 days
55 day crossing UNDER 22 day moving average within 100 days
But each user may be interested in a different analysis so I cannot just store the moving average with each row, it must be calculated.
I am not sure if I fully understand the question ... but something like this might help you get where you need to go: sqlfiddle
SET @X:=5;
SET @Y:=3;
SET @Z:=25;
SET @option:='under';
select * from (
SELECT stock_id,
datediff(current_date(), date) days_ago,
adj_close,
(
SELECT
AVG(adj_close) AS moving_average
FROM
tbl_daily_data T2
WHERE
(
SELECT
COUNT(*)
FROM
tbl_daily_data T3
WHERE
date BETWEEN T2.date AND T1.date
) BETWEEN 1 AND @X
) move_av_1,
(
SELECT
AVG(adj_close) AS moving_average
FROM
tbl_daily_data T2
WHERE
(
SELECT
COUNT(*)
FROM
tbl_daily_data T3
WHERE
date BETWEEN T2.date AND T1.date
) BETWEEN 1 AND @Y
) move_av_2
FROM
tbl_daily_data T1
where
datediff(current_date(), date) <= @Z
) x
where
case when @option ='over' and move_av_1 > move_av_2 then 1 else 0 end +
case when @option ='under' and move_av_2 > move_av_1 then 1 else 0 end > 0
order by stock_id, days_ago
Based on the answer by @Tom H here: How do I calculate a moving average using MySQL?
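Note that the query above reports days on which one average is above or below the other, which is not quite the same as the day on which the lines cross. As a rough sketch of the crossover test itself, here is a plain Python version for a single stock. The function names and the interpretation (a crossing on day i means the order of the two averages flips between day i-1 and day i, counted in rows rather than calendar days) are assumptions, not something the question spells out.

```python
from typing import List, Optional


def moving_average(prices: List[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until `window` prices are available."""
    out: List[Optional[float]] = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window : i + 1]) / window)
    return out


def crossed(prices: List[float], x: int, y: int, z: int, option: str) -> bool:
    """True if the x-day average crossed over/under the y-day average
    on any of the last z rows of `prices` (oldest first)."""
    ma_x, ma_y = moving_average(prices, x), moving_average(prices, y)
    first = max(1, len(prices) - z)
    for i in range(first, len(prices)):
        prev_x, prev_y, cur_x, cur_y = ma_x[i - 1], ma_y[i - 1], ma_x[i], ma_y[i]
        if None in (prev_x, prev_y, cur_x, cur_y):
            continue  # not enough history for one of the averages yet
        if option == "over" and prev_x <= prev_y and cur_x > cur_y:
            return True
        if option == "under" and prev_x >= prev_y and cur_x < cur_y:
            return True
    return False


prices = [10, 10, 10, 10, 10, 11, 12, 13]  # made-up closing prices
print(crossed(prices, 2, 4, 3, "over"))   # True: the 2-day average rises above the 4-day one
print(crossed(prices, 2, 4, 2, "over"))   # False: the crossing is older than 2 rows
```

Because X, Y and Z are only used as plain parameters here, none of them constrains the others, which is the independence the question asks for.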
Assume this table:
id date
----------------
1 2010-12-12
2 2010-12-13
3 2010-12-18
4 2010-12-22
5 2010-12-23
How do I find the average intervals between these dates, using MySQL queries only?
For instance, the calculation on this table will be
(
( 2010-12-13 - 2010-12-12 )
+ ( 2010-12-18 - 2010-12-13 )
+ ( 2010-12-22 - 2010-12-18 )
+ ( 2010-12-23 - 2010-12-22 )
) / 4
----------------------------------
= ( 1 DAY + 5 DAY + 4 DAY + 1 DAY ) / 4
= 2.75 DAY
Intuitively, what you are asking should be equivalent to the interval between the first and last dates, divided by the number of dates minus 1.
Let me explain more thoroughly. Imagine the dates are points on a line (+ are dates present, - are dates missing, the first date is the 12th, and I changed the last date to Dec 24th for illustration purposes):
++----+---+-+
Now, what you really want to do, is evenly space your dates out between these lines, and find how long it is between each of them:
+--+--+--+--+
To do that, you simply take the number of days between the last and first days, in this case 24 - 12 = 12, and divide it by the number of intervals you have to space out, in this case 4: 12 / 4 = 3.
With a MySQL query
SELECT DATEDIFF(MAX(dt), MIN(dt)) / (COUNT(dt) - 1) FROM a;
This works on this table (with your values it returns 2.75):
CREATE TABLE IF NOT EXISTS `a` (
`dt` date NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `a` (`dt`) VALUES
('2010-12-12'),
('2010-12-13'),
('2010-12-18'),
('2010-12-22'),
('2010-12-24');
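The shortcut works because the individual gaps telescope: each intermediate date is added once and subtracted once, so only the first and last dates survive in the sum. A quick check in Python, using the five dates from the question (the variable names are just for this sketch):

```python
from datetime import date

dates = [date(2010, 12, 12), date(2010, 12, 13), date(2010, 12, 18),
         date(2010, 12, 22), date(2010, 12, 23)]

# Gap in days between each pair of consecutive dates
gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
mean_of_gaps = sum(gaps) / len(gaps)
shortcut = (max(dates) - min(dates)).days / (len(dates) - 1)
print(gaps, mean_of_gaps, shortcut)  # [1, 5, 4, 1] 2.75 2.75
```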
If the ids are uniformly incremented without gaps, join the table to itself on id+1:
SELECT d.id, d.date, n.date, datediff(d.date, n.date)
FROM dates d
JOIN dates n ON(n.id = d.id + 1)
Then GROUP BY and average as needed.
If the ids are not uniform, do an inner query to assign ordered ids first.
I guess you'll also need to add a subquery to get the total number of rows.
Alternatively
Create an aggregate function that keeps track of the previous date, and a running sum and count. You'll still need to select from a subquery to force the ordering by date (actually, I'm not sure if that's guaranteed in MySQL).
Come to think of it, this is a much better way of doing it.
And Even Simpler
Just noting that Vegard's solution is much better.
The following query returns correct result
SELECT AVG(
DATEDIFF(i.date, (SELECT MAX(date)
FROM intervals WHERE date < i.date)
)
)
FROM intervals i
but it runs a dependent subquery which might be really inefficient with no index and on a larger number of rows.
You need to do a self join, get the differences using the DATEDIFF function, and take the average.