I have the following MySQL tables:
ServiceProviders - id, ...other irrelevant columns
ProvidersWorkHours - id, providerId, day (enum[1,2,3,4,5,6,7]), startTime (time), endTime (time)
Service - id, duration, ...other irrelevant columns
Groups - id, serviceId, providerId, size, duration
GroupReservations - groupId, customerId
ServiceProviders have their own work hours (when they're available to work). They can create Groups of different sizes and durations (a group can be longer or shorter than the regular service duration). I need to find the next available time slot for the regular service duration: either a slot that is not reserved, or a group that is not yet completely filled (for example, a group of size 2 that has only 1 reservation is a valid one).
Expected outcomes:
Current date: 2022-06-20 18:24 (MONDAY)
Regular service duration: 45 minutes
Provider workHours: [MONDAY 08:00-17:00, TUESDAY 08:00-17:00, WEDNESDAY 08:00-17:00]
Some expected scenarios:
We have a new provider with no groups and no reservations. Expected nextAvailableSlot: 2022-06-21 08:00.
We have only one full group, at 2022-06-27 12:00. nextAvailableSlot should still be 2022-06-21 08:00.
We have a group of size 2 that has only 1 reservation, at 2022-06-21 08:00. nextAvailableSlot should be 2022-06-21 08:00.
We have full groups until 2022-06-21 16:00 (the last one ends at 16:00). nextAvailableSlot should be 2022-06-21 16:00.
As far as I can tell, the solution I found in other places (joining the reservations table with itself; in my case that would be Groups) won't work here. It breaks down because I may have no groups at all, or my first group may be a week ahead of the current date. What would probably work is to start from the current time, keep adding the regular service duration, and check all the constraints until the candidate slot does not intersect any reservation, or until it finds a group that isn't filled yet. I'm not sure how to do that in MySQL, though. Any ideas or other solutions?
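The stepping idea described above can be sketched outside the database first. This is only a minimal Python sketch under stated assumptions, not a MySQL solution: it assumes day 1 in the enum means Monday (ISO numbering), that any existing group blocks its whole time range for regular bookings, and that a group with free seats is bookable at its own start time. The function name, the dictionary keys, and the `horizon_days` limit are hypothetical.

```python
from datetime import datetime, time, timedelta

def next_available_slot(now, work_hours, groups, service_minutes, horizon_days=60):
    """Earliest free regular slot, or earliest not-yet-full group, whichever is first.

    work_hours: {iso_weekday: (start_time, end_time)}, assuming 1 = Monday.
    groups: dicts with 'start' (datetime), 'minutes', 'size', 'reserved'.
    """
    duration = timedelta(minutes=service_minutes)
    # A group that still has free seats is a valid slot at its own start time.
    open_starts = [g["start"] for g in groups
                   if g["reserved"] < g["size"] and g["start"] >= now]
    best_group = min(open_starts, default=None)
    # Every group, full or not, occupies the provider for its own duration.
    busy = [(g["start"], g["start"] + timedelta(minutes=g["minutes"])) for g in groups]

    for offset in range(horizon_days):
        day = now.date() + timedelta(days=offset)
        hours = work_hours.get(day.isoweekday())
        if hours is None:
            continue  # provider does not work on this weekday
        day_end = datetime.combine(day, hours[1])
        t = max(datetime.combine(day, hours[0]), now)
        while t + duration <= day_end:
            clash_ends = [end for start, end in busy if start < t + duration and end > t]
            if not clash_ends:
                return t if best_group is None else min(t, best_group)
            t = max(clash_ends)  # jump past the groups that are in the way
    return best_group  # may be None if nothing was found within the horizon

NOW = datetime(2022, 6, 20, 18, 24)
WORK_HOURS = {d: (time(8, 0), time(17, 0)) for d in (1, 2, 3)}

def group(start, minutes, size, reserved):
    return {"start": start, "minutes": minutes, "size": size, "reserved": reserved}

# Scenario 4 from the question: full one-hour groups from 08:00 until 16:00.
full_day = [group(datetime(2022, 6, 21, h), 60, 2, 2) for h in range(8, 16)]
print(next_available_slot(NOW, WORK_HOURS, full_day, 45))
```

Jumping to the end of the clashing group, rather than adding the service duration each time, is what lets the last scenario land exactly on 16:00.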
!!! ---------------------!!!
After exchanging some comments with people, I have decided to include the source data, so anyone who wishes to help with solving this could load it into a table and run the query.
Here is the link that retrieves data from Yahoo Finance for symbol SPY in csv format.
https://query1.finance.yahoo.com/v7/finance/download/SPY?period1=1584963550&period2=1616499550&interval=1d&events=history&includeAdjustedClose=true
The file header needs to be changed. Change Date to Source_Date and Adj Close to Adj_Close. The file does not have a Symbol column, so the query doesn't need to reference it. The only two relevant columns are Source_Date and Adj_Close.
!!! ---------------------!!!
The issue with my query is that it takes a very long time to run. The query is not wrong. I know exactly why it takes such a long time to run. I just couldn't come up with anything more efficient.
First, here is the business logic.
Let's say I bought Apple stock and it went down. It's been 10 days since I bought it and it is still down. I extracted the entire history of daily prices for Apple going back to 1993 and uploaded it into a database table. Now I want to write a query that will tell me how often the Apple stock price took more than 10 days to recover.
For example. I bought Apple at $100. It went down to $90. It's been 10 days since I bought it. I run my query and it comes back with something like this:
-- Buy Date: April 1, 2001. Buy price: $10. Recovered on: April 12, 2001. Days to recovery: 12
-- Buy Date: June 12, 2006. Buy price: $23. Recovered on: July 20, 2006. Days to recovery: 38
-- Buy Date: January 15, 2009. Buy price: $65. Recovered on: December 30, 2010. Days to recovery: 700
Each example indicates that on each day between Buy Date and Recovered on date, the price stayed below the buy price.
My query has two steps.
The first step joins the table to itself, aliased as x and y. For each record in x it finds the earliest later date on which y.price is higher than x.price.
The second step simply extracts from the first result set only those records where the difference between x.date and y.date is greater than 10 days.
The first query (see below) runs for such a long time because, for each record in x, the query must search the entire y table. That's a lot of table scans. It does come back with the result, but it takes between 50 and 60 seconds.
The table structure is very simple: Symbol, Date, and Price. Symbol and Date form the primary key.
SELECT x.Symbol,
x.Source_Date as 'Source_Date',
min(y.Source_Date) as 'Recovery_Date'
FROM transformed_source x,
transformed_source y
where x.Symbol = 'AAPL'
and y.Symbol = x.Symbol
and y.Source_Date > x.Source_Date
and y.Adj_Close > x.Adj_Close
group by x.Symbol, x.Source_Date
*** Note: This query will miss records where the price never recovered, so I will need to change it to an outer join. That doesn't make any difference here: changing it to an outer join will not make it run any faster, so I am working with this version.
Any ideas are welcome.
Thank you
You might find that a correlated subquery is better:
SELECT ts.Symbol, ts.Source_Date,
(SELECT MIN(ts2.Source_Date)
FROM transformed_source ts2
WHERE ts2.Symbol = ts.Symbol AND
ts2.Source_Date > ts.Source_Date AND
ts2.Adj_Close > ts.Adj_Close
) as Recovery_Date
FROM transformed_source ts
WHERE ts.Symbol = 'AAPL';
Then for performance, you want an index on transformed_source(Symbol, Source_Date, Adj_Close).
If you're using an RDBMS that provides row pattern matching (e.g. Oracle with MATCH_RECOGNIZE), then you can write a query that does one pass over your table.
I've put together a DBFiddle
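To illustrate what "one pass" can look like, here is a small Python sketch that finds each day's recovery date with a stack instead of a self-join. This is a swapped-in technique (the classic "next greater element" scan), not the pattern-matching query mentioned above, and it assumes the rows have already been fetched ordered by Source_Date. The function name and the sample prices are made up for illustration.

```python
from datetime import date

def recovery_dates(rows):
    """rows: (source_date, adj_close) tuples sorted by date.

    Returns {source_date: recovery_date or None}, where the recovery date is the
    first later date whose adj_close is strictly higher.
    """
    result = {d: None for d, _ in rows}
    pending = []  # days still waiting for a higher price, most recent on top
    for current_date, price in rows:
        # Every pending day with a lower price recovers today.
        while pending and pending[-1][1] < price:
            buy_date, _ = pending.pop()
            result[buy_date] = current_date
        pending.append((current_date, price))
    return result

sample = [
    (date(2021, 1, 4), 100.0),
    (date(2021, 1, 5), 90.0),
    (date(2021, 1, 6), 95.0),
    (date(2021, 1, 7), 101.0),
]
recovered = recovery_dates(sample)
slow = {d: r for d, r in recovered.items() if r is not None and (r - d).days > 10}
```

Each row is pushed and popped at most once, so the work grows linearly with the number of rows. Days that never recover keep `None`, which covers the outer-join case, and the final filter corresponds to the "more than 10 days" step.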
I have 2 tables...
appointments:
id (int)
start (datetime)
end (datetime)
timeslots:
id (int)
timeslot (time)
The table timeslots has records like this...
00:00:00
00:05:00
00:10:00
...
23:55:00
The table appointments has one record...
start: 2017-05-15 01:30:00
end: 2017-05-15 02:00:00
With the following simplified query I try to get the first free starting timeslot of each hour when there is no free slot at the preferred begin time, while respecting existing appointments...
SET @mandatory=true, @parallel=1, @targetdate="2017-05-15", @begin='30', @duration='00:30:00', @time_start="00:00:00", @time_end="03:00:00";
SELECT MIN(timeslot) AS free_slot FROM timeslots
INNER JOIN
appointments
ON 1=1
WHERE timeslots.timeslot BETWEEN @time_start AND @time_end
AND (ADDTIME(timeslot, @duration) BETWEEN @time_start AND @time_end)
AND ((@mandatory AND MINUTE(timeslot)=@begin) OR (NOT @mandatory))
AND ((
SELECT COUNT(*) AS anzahl
FROM appointments
WHERE (concat(date(@targetdate)," ",timeslots.timeslot) BETWEEN appointments.start AND appointments.end - INTERVAL 1 SECOND
OR concat(date(@targetdate)," ", ADDTIME(timeslot, @duration) ) BETWEEN appointments.start AND appointments.end - INTERVAL 1 SECOND)
) < @parallel)
GROUP BY HOUR(timeslot)
Explanation of parameters:
@mandatory:
TRUE means I only allow timeslots matching @begin. FALSE means I prefer(!) timeslots matching @begin, but if there is no match, I would like to get the first free timeslot of each hour.
@parallel:
Number of parallel appointments allowed. The value 1 means that there should be no overlapping appointments. The value 2 means that at most 2 appointments may take place at the same time.
@targetdate:
The query should return timeslots only for the target date.
@begin:
The preferred beginning of the free timeslots. '30' means that I would like to get timeslots starting at "half past ..." every hour.
@duration: The needed duration of consecutive free timeslots.
@time_start and @time_end:
The result should only contain records within this time range.
My problem is that I get either only free timeslots starting at half past ..., or the first free timeslot of every full hour, depending on whether @mandatory is set to TRUE or FALSE.
When setting @mandatory to TRUE I think I get the desired result (only timeslots starting at the defined begin time). But I want to set @mandatory to FALSE and get the first free timeslot of an hour ONLY when there is no free timeslot beginning at half past (I would like to PREFER the matching begin timeslots, but offer alternative timeslots otherwise).
This is what I would like to get:
00:30:00
01:00:00 (this row respects the existing appointment at half past one and returns the first free timeslot for this hour only when the duration of the remaining timeslots fits)
02:30:00
Is there an easy way to get these results?
Best regards
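To make the intended "prefer, but fall back" rule concrete, here is a minimal Python sketch of the selection logic, assuming 5-minute slots as in the timeslots table. It is not a MySQL answer; the function name and argument names are hypothetical and only mirror the parameters described above (mandatory, parallel, begin, duration, time range).

```python
from datetime import datetime, timedelta

def hourly_free_slots(appointments, window_start, window_end, duration,
                      begin_minute, parallel=1, mandatory=False,
                      step=timedelta(minutes=5)):
    """Pick one free slot per hour, preferring the one starting at begin_minute."""
    chosen = {}  # hour (as datetime) -> selected slot start
    t = window_start
    while t + duration <= window_end:
        # Count appointments overlapping the half-open range [t, t + duration).
        overlapping = sum(1 for start, end in appointments
                          if start < t + duration and end > t)
        if overlapping < parallel:
            hour = t.replace(minute=0, second=0, microsecond=0)
            if t.minute == begin_minute:
                chosen[hour] = t  # the preferred slot replaces any fallback
            elif not mandatory and hour not in chosen:
                chosen[hour] = t  # first free slot of the hour as a fallback
        t += step
    return [chosen[hour] for hour in sorted(chosen)]

appointments = [(datetime(2017, 5, 15, 1, 30), datetime(2017, 5, 15, 2, 0))]
slots = hourly_free_slots(
    appointments,
    window_start=datetime(2017, 5, 15, 0, 0),
    window_end=datetime(2017, 5, 15, 3, 0),
    duration=timedelta(minutes=30),
    begin_minute=30,
)
print([s.strftime("%H:%M:%S") for s in slots])
```

The same two-tier rule could be expressed in SQL by ranking the free slots within each hour, with the preferred minute sorted first, and keeping the top row per hour.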
I am currently developing a hotel reservation system.
I need to store prices for specific dates or date ranges in the future, since the price varies from day to day. I need to store those price and date details in the database, and I have thought of two structures.
1st model :
room_prices :
room_id :
from_date :
to_date :
price:
is_available:
2nd design:
room_prices :
room_id :
date:
price:
is_available
I found the 2nd design easier to work with, but the amount of stored data grows quickly
as my hotel list grows.
For example, if I want to store the next 2 months of price data, I need to create about 60 records for every hotel.
With the 1st design, I don't need that many records.
ex:
```
price values for X hotel :
1-Dec-15 - 10-Dec-15 -- $200,
1st design requires : only 1 row
2nd design requires : 10 rows
```
I am using MySQL.
Will there be any performance degradation when searching
the room_prices table?
Can someone suggest a better design?
I have actually worked on designing and implementing a Hotel Reservation system and can offer the following advice based on my experience.
I would recommend your second design option, storing one record for each individual Date / Hotel combination. The reason being that although there will be periods where a Hotel's Rate is the same across multiple days it is more likely that, depending on availability, it will change over time and become different (hotels tend to increase the room rate as the availability drops).
Also there are other important pieces of information that will need to be stored that are specific to a given day:
You will need to manage the hotel availability, i.e. on Date x there are y rooms available. This will almost certainly vary by day.
Some hotels have blackout periods where the hotel is unavailable for short periods of time (typically specific days).
Lead Time - some Hotels only allow rooms to be booked a certain number of days in advance; this can differ between week days and weekends.
Minimum nights - again, data stored by individual date that says if you arrive on this day you must stay x number of nights (say over a weekend).
Also consider a person booking a week long stay, the database query to return the rates and availability for each day of that stay is a lot more concise if you have a pricing record for each Date. You can simply do a query where the Room Rate Date is BETWEEN the Arrival and Departure Date to return a dataset with one record per date of the stay.
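As a rough illustration of that query, the sketch below uses Python's built-in sqlite3 module with the second design's columns (room_id, date, price, is_available). It is only a sketch under assumptions: the sample data reuses the $200 rate for 1-Dec-15 to 10-Dec-15 from the question, dates are stored as ISO strings, and the function name is hypothetical. Whether the departure date itself should be charged is a business rule; the query simply follows the BETWEEN description above.

```python
import sqlite3
from datetime import date, timedelta

def stay_rates(conn, room_id, arrival, departure):
    """Return one (date, price) row per stored date of the stay."""
    return conn.execute(
        "SELECT date, price FROM room_prices "
        "WHERE room_id = ? AND date BETWEEN ? AND ? AND is_available = 1 "
        "ORDER BY date",
        (room_id, arrival.isoformat(), departure.isoformat()),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE room_prices ("
    " room_id INTEGER, date TEXT, price REAL, is_available INTEGER,"
    " PRIMARY KEY (room_id, date))"  # the key doubles as the lookup index
)
# One row per date: 1-Dec-15 to 10-Dec-15 at $200 for room 1.
for offset in range(10):
    day = date(2015, 12, 1) + timedelta(days=offset)
    conn.execute("INSERT INTO room_prices VALUES (?, ?, ?, ?)",
                 (1, day.isoformat(), 200.0, 1))

rows = stay_rates(conn, 1, date(2015, 12, 3), date(2015, 12, 5))
total = sum(price for _, price in rows)
```

With a primary key on (room_id, date), the range lookup touches only the rows of the stay, which is why the per-date layout tends to stay fast even with many rows.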
I realise with this approach you will store more records but with well indexed tables the performance will be fine and the management of the data will be much simpler. Judging by your comment you are only talking in the region of 18000 records which is a pretty small volume (the system I worked on has several million and works fine).
To illustrate the extra data management if you DON'T store one record per day, imagine that a Hotel has a rate of 100 USD and 20 rooms available for the whole of December:
You will start with one record:
1-Dec to 31st Dec Rate 100 Availability 20
Then you sell one room on the 10th Dec.
Your business logic now has to create three records from the one above:
1-Dec to 9th Dec Rate 100 Availability 20
10-Dec to 10th Dec Rate 100 Availability 19
11-Dec to 31st Dec Rate 100 Availability 20
Then the rate changes on the 3rd and 25th Dec to 110
Your business logic now has to split the data again:
1-Dec to 2-Dec Rate 100 Availability 20
3-Dec to 3-Dec Rate 110 Availability 20
4-Dec to 9-Dec Rate 100 Availability 20
10-Dec to 10-Dec Rate 100 Availability 19
11-Dec to 24-Dec Rate 100 Availability 20
25-Dec to 25-Dec Rate 110 Availability 20
26-Dec to 31-Dec Rate 100 Availability 20
That is more business logic and more overhead than storing one record per date.
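The splitting logic walked through above could look roughly like the following Python sketch. It assumes each range record is a plain dictionary, and the year 2015 is assumed only to make the dates concrete; the function name and keys are hypothetical. It shows the extra code a range-based design needs for every single-day change.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def apply_change(ranges, day, **changes):
    """Apply a one-day change, splitting the range record that contains `day`."""
    result = []
    for r in ranges:
        if not (r["from_date"] <= day <= r["to_date"]):
            result.append(r)  # untouched range
            continue
        if r["from_date"] < day:  # part before the changed day
            result.append({**r, "to_date": day - ONE_DAY})
        # The changed day becomes its own single-day record.
        result.append({**r, "from_date": day, "to_date": day, **changes})
        if day < r["to_date"]:  # part after the changed day
            result.append({**r, "from_date": day + ONE_DAY})
    return result

december = [{"from_date": date(2015, 12, 1), "to_date": date(2015, 12, 31),
             "rate": 100, "availability": 20}]
after_sale = apply_change(december, date(2015, 12, 10), availability=19)
after_rates = apply_change(after_sale, date(2015, 12, 3), rate=110)
after_rates = apply_change(after_rates, date(2015, 12, 25), rate=110)
```

With one row per date, each of these three changes would be a single UPDATE on one row, with no splitting or later merging of ranges.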
I can guarantee you that by the time you have finished your system will end up with one row per date anyway so you might as well design it that way from the beginning and get the benefits of easier data management and quicker database queries.
I think that the first solution is better and, as you already noticed, it reduces the amount of storage you need for prices. Another possible approach is to store a single date and assume that the price given for that date stays valid until a newer date is found. That is basically the same structure you designed in the second approach, but with internal logic in which a later row overrides the earlier price from its date onward.
Let's say that you have ROOM 1 with price $200 from 1st December and with price $250 from 12 December, then you will have only two rows :
1-Dec-15 -- $200
12-Dec-15 -- $250
And you will assume in your logic that a price is valid from the specified date up until a new price is found.
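The lookup rule for this "valid until the next row" design can be sketched in Python as follows. This is only an illustration with the two rows from the example; it assumes the price changes are kept sorted by date, and the function name is hypothetical.

```python
from bisect import bisect_right
from datetime import date

def price_on(price_changes, day):
    """price_changes: (effective_from, price) tuples sorted by date.

    Returns the price of the latest row whose date is on or before `day`,
    or None if `day` is earlier than the first row.
    """
    dates = [effective_from for effective_from, _ in price_changes]
    position = bisect_right(dates, day)
    if position == 0:
        return None
    return price_changes[position - 1][1]

room_1 = [(date(2015, 12, 1), 200), (date(2015, 12, 12), 250)]
```

In SQL the same rule is usually a query for the row with the greatest date not after the requested day, which has to be repeated for each night of a stay.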
I have a table
PEOPLE, DATE, DELETED
Amanda, 2015-03-01, Null
Ray, 2015-03-01, Null
Moe, 2015-04-01, Null
Yan, 2015-05-01, Null
Bee, 2015-05-05, 2015-06-12
Now I need to group it by month and count the people, like this:
March: 2 people
April: 3
May: 5
June: 5
July: 4
New people should not be counted in earlier months, but they should be counted in all later months of my range (January - June). If a person is DELETED, they should still be counted together with the other people in the month in which they were deleted, but not in the months after that.
How to write query for this?
This can be solved using running totals. This is just an outline of how to do it; you'll need to do some work for the actual solution:
select people, date, 1 as persons from yourtable
union all
select people, deleted, -1 as persons from yourtable where deleted is not null
Then do a running total over this data, summing the +1/-1 persons field; that gives the number of people present so far.
For the events happening in the middle of the month, you'll have to adjust the date to be the start of that or the next month whichever way you want them to be calculated.
If you need also those months when no changes happened, you'll probably need a table that contains the first day of each month for the biggest range of dates you'll ever need, for example 1.1.2000 - 1.12.2100.
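Here is a minimal Python sketch of that outline, using the five rows from the question. It assumes that a deletion takes effect at the start of the following month, so the person is still counted in the month they were deleted, as the question asks. The function names are hypothetical, and the loop over every month stands in for the calendar table.

```python
from datetime import date

def month_index(d):
    """Map a date to a running month number so months can be iterated."""
    return d.year * 12 + (d.month - 1)

def headcount_by_month(rows, first_month, last_month):
    """rows: (name, joined, deleted_or_None). Returns (year, month, count) tuples."""
    deltas = {}
    for _name, joined, deleted in rows:
        start = month_index(joined)
        deltas[start] = deltas.get(start, 0) + 1  # +1 event
        if deleted is not None:
            gone = month_index(deleted) + 1  # -1 moved to the next month
            deltas[gone] = deltas.get(gone, 0) - 1
    first, last = month_index(first_month), month_index(last_month)
    # Start with everything that happened before the reporting range.
    total = sum(change for month, change in deltas.items() if month < first)
    result = []
    for month in range(first, last + 1):  # includes months without changes
        total += deltas.get(month, 0)
        result.append((month // 12, month % 12 + 1, total))
    return result

people = [
    ("Amanda", date(2015, 3, 1), None),
    ("Ray", date(2015, 3, 1), None),
    ("Moe", date(2015, 4, 1), None),
    ("Yan", date(2015, 5, 1), None),
    ("Bee", date(2015, 5, 5), date(2015, 6, 12)),
]
counts = headcount_by_month(people, date(2015, 3, 1), date(2015, 7, 1))
```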
I have a table that has these fields:
How would I find all available time slots for that day (based on the day starting at 7am and ending at 10pm) that aren't currently in this table? For example, on one particular day, if all timeslots from 7am till 10pm were taken bar one at 6pm till 7pm, that would be the one result.
The duration of each time slot does not vary - they all last one hour.
I have tried many different things, but I have a feeling I am so far off, it is hardly worth posting what I've tried.
Any help would be greatly appreciated!
One thing I would think about is having an "hours table" with all the values you want to check. Then, with a left join that keeps only the rows where the joined side is NULL, you get only the slots that haven't been assigned. I built a SQLfiddle http://sqlfiddle.com/#!2/66b441/6 with some dummy values to show how this works:
SELECT h.slot
FROM hours h
LEFT JOIN deliveries d
ON ( h.slot = d.start_time AND date_stamp = '2014-04-04' )
WHERE start_time IS NULL
Check the data in the SQLfiddle. If you know the slots and they don't overlap, this query returns the following free slots for that date:
SLOT
January, 01 1970 07:00:00+0000
January, 01 1970 10:00:00+0000
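The same left-join pattern can be tried locally with Python's built-in sqlite3 module. This is a sketch with made-up data matching the example in the question (every one-hour slot from 7am to 10pm taken except 6pm), not the fiddle's data. Slots are stored as plain text times here, and the function name is hypothetical.

```python
import sqlite3

def free_slots(conn, day):
    """Slots from the hours table that have no delivery on the given day."""
    query = (
        "SELECT h.slot FROM hours h "
        "LEFT JOIN deliveries d "
        "  ON h.slot = d.start_time AND d.date_stamp = ? "
        "WHERE d.start_time IS NULL "
        "ORDER BY h.slot"
    )
    return [row[0] for row in conn.execute(query, (day,))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hours (slot TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE deliveries (date_stamp TEXT, start_time TEXT)")

# One-hour slots starting 07:00 through 21:00 (the last one ends at 10pm).
all_slots = [f"{hour:02d}:00:00" for hour in range(7, 22)]
conn.executemany("INSERT INTO hours VALUES (?)", [(s,) for s in all_slots])
# Every slot on 2014-04-04 is taken except the one from 6pm to 7pm.
conn.executemany(
    "INSERT INTO deliveries VALUES (?, ?)",
    [("2014-04-04", s) for s in all_slots if s != "18:00:00"],
)

available = free_slots(conn, "2014-04-04")
```

Keeping the date condition in the ON clause rather than the WHERE clause matters: moved to WHERE, it would filter out the unmatched rows that the IS NULL test is looking for.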