Using bitfields for scheduling? - mysql

Does it make any sense to use bitfields to store and manage a schedule?
I'm working on a Ruby on Rails application to handle restaurant opening hours and reservations, and I'm having some difficulty modeling the schedule.
Each restaurant will have opening hours (like Monday 9am-12pm and 2pm-5pm) each day, and each table in the restaurant will have a size (2, 4, 8-seat, etc.) and its own openings.
So far, I've been using two tables to keep track of things:
opening_hours
  day_of_the_week (string)
  starts_at (time)
  ends_at (time)
bookings
  table_id (int)
  starts_at (datetime)
  ends_at (datetime)
With those tables, I can make sure new bookings don't overlap other bookings for the same table and that the booking falls within an opening-hour range for that day of the week.
It's problematic to find the open slots in the schedule, though. That is, given a set of opening hours and existing bookings, where are the gaps that could accommodate new bookings?
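With the two-table design, the gaps can still be found in application code by subtracting a table's bookings from that day's opening ranges. A minimal sketch in Python, assuming both have already been loaded for one table and one date and converted to minutes since midnight as half-open [start, end) pairs; the function name free_slots is hypothetical.

```python
def free_slots(opening, bookings, min_length=0):
    """Gaps inside the opening ranges that no booking covers.

    All values are minutes since midnight; ranges are half-open [start, end).
    Gaps shorter than min_length minutes are dropped.
    """
    gaps = []
    busy = sorted(bookings)
    for open_start, open_end in sorted(opening):
        cursor = open_start  # earliest minute not yet known to be booked
        for b_start, b_end in busy:
            if b_end <= cursor or b_start >= open_end:
                continue  # booking does not touch the rest of this range
            if b_start > cursor:
                gaps.append((cursor, b_start))
            cursor = max(cursor, b_end)
        if cursor < open_end:
            gaps.append((cursor, open_end))
    return [(s, e) for s, e in gaps if e - s >= min_length]


# Opening hours 9:00-12:00 and 14:00-17:00, bookings 10:00-11:00 and 15:00-16:00
gaps = free_slots([(540, 720), (840, 1020)], [(600, 660), (900, 960)])
```

Passing the desired booking length as min_length leaves only the gaps that could actually hold a new booking.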
While looking through StackOverflow for inspiration, I came across this comment about using bitfields for schedules, and it piqued my curiosity. I don't really know anything about bitwise logic, but I wonder if I could replace the above tables with something like:
opening_hours
  day_of_the_week (string)
  hours (96 bits, representing open/closed times for each quarter-hour of the day)
bookings
  table_id (int)
  date (date)
  hours (96 bits, representing available/booked times for each quarter-hour of the day)
And then I could use bitwise logic (waves hands) and find the open, available times for a given day.
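To make the "bitwise logic" concrete: with one bit per quarter-hour (bit 0 = 00:00-00:15 and bit 95 = 23:45-24:00, an assumed ordering), the free quarter-hours for a table are the opening-hours mask AND NOT the booked mask. A minimal sketch in Python, whose integers are arbitrary-precision so a 96-bit value needs no special type; the helper names are hypothetical. Note that 96 bits do not fit in a single 64-bit BIGINT or BIT(64) column, so the database side would need two columns or a binary string.

```python
SLOTS_PER_DAY = 96  # one bit per quarter-hour


def slot(hour, minute=0):
    """Index of the quarter-hour that starts at hour:minute."""
    return hour * 4 + minute // 15


def range_mask(start, end):
    """Mask with the bits for slots start .. end-1 set."""
    return ((1 << end) - 1) ^ ((1 << start) - 1)


def free_runs(mask):
    """Yield (start_slot, end_slot) for each run of consecutive set bits."""
    start = None
    for i in range(SLOTS_PER_DAY + 1):
        is_set = i < SLOTS_PER_DAY and (mask >> i) & 1
        if is_set and start is None:
            start = i
        elif not is_set and start is not None:
            yield (start, i)
            start = None


opening = range_mask(slot(9), slot(12)) | range_mask(slot(14), slot(17))
booked = range_mask(slot(10), slot(11)) | range_mask(slot(15, 30), slot(16))
available = opening & ~booked  # open AND NOT booked
```

Here the free runs are slots 36-40, 44-48, 56-62 and 64-68, i.e. 9:00-10:00, 11:00-12:00, 14:00-15:30 and 16:00-17:00.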
So my questions:
Would it make sense to do something like this?
Can anybody point me to a blog post or tutorial covering using bitfields for schedules?
What else should I look at to learn about bitfields & bitwise logic, specifically in the ruby/Rails realm?
thanks,
Jacob

This seems like a strange way to store what are dates.
Ask yourself what behavior the Restaurant or OpeningHours will have. I bet you'll find yourself converting back to real date objects to implement that behavior. So why use a weird encoding in the database (day of week and bit string)?
Additionally, do you ever intend to use SQL date operators to find which restaurants or tables are open? That gets a whole lot trickier, if it's possible at all, with your encoding scheme.
I doubt you want to do this.

Related

How to do calculated Attributes in a database

I hope you can help me understand how to solve my issue. The basis is a database in which the working hours of employees should be stored.
These hours involve a little math, and the combination of database + math seems to be the problem.
I want to store the theoretically workable hours of an employee:
52 (weeks a year) * 40 (hours a week) = 2080 hours, minus holidays etc. -> 1900 hours of expected time per year.
His actually worked time would go up by 8 hours each day until he reaches 1900, which would already be an issue: I don't really know how to implement that.
But the problem continues:
This time should be split equally across the 12 months. So 1900 divided by 12 in 12 different columns... sounds stupid, but now he reports sick in February, his actual time decreases within that month, and accordingly his overall working time decreases as well.
There are also part-time workers and people taking a sabbatical, and these hours also need to be connected to different projects (another table in the same DB).
In Excel this issue was relatively easy to solve, but with a pure DB I am kind of lost as to how to approach it.
So, 2 questions: is something like this even possible in a MySQL DB (I somehow doubt it)?
And how would I do it (in the DB or with some additional software/frontend)?
It sounds like you are trying to build a DTR system (Daily Time Records). I recommend designing the database to cater to flexible scenarios for all types of employees: a common way of storing the information (date and time) from which the working hours of all these people can be calculated.
Worry about the algorithms later; they will follow from your database design.
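Following that advice, one common approach is to store only raw daily records (one row per employee per day with the hours worked and an absence type) and derive the monthly and yearly figures in queries or code, instead of keeping 12 precomputed columns. A minimal sketch in Python; the record layout, the "sick" code and the function names are hypothetical, and the 1900-hour target comes from the question's example.

```python
from collections import defaultdict
from datetime import date

YEARLY_TARGET = 1900  # expected hours per year, from the question's example

# Hypothetical raw records: (employee_id, day, hours_worked, absence_type or None)
records = [
    (1, date(2024, 1, 15), 8, None),
    (1, date(2024, 1, 16), 8, None),
    (1, date(2024, 2, 5), 0, "sick"),  # a sick day is recorded with 0 hours
    (1, date(2024, 2, 6), 8, None),
]


def monthly_hours(records, employee_id):
    """Sum hours worked per (year, month) for one employee."""
    totals = defaultdict(float)
    for emp, day, hours, _absence in records:
        if emp == employee_id:
            totals[(day.year, day.month)] += hours
    return dict(totals)


def remaining_hours(records, employee_id, target=YEARLY_TARGET):
    """Expected hours still outstanding (never negative)."""
    worked = sum(monthly_hours(records, employee_id).values())
    return max(target - worked, 0)
```

Part-time workers or a sabbatical would then only change the target passed in, not the stored records.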

Compare creation dates of things that may have been made before Christ

I'm thinking about making a project in a database with a large amount of objects / people / animals / buildings, etc.
The application would let the user select two candidates and see which came first. The comparison would be made by date, of course.
MySQL only supports dates from 01/01/1000 onward.
If a user were to compare which came first, Michael Jackson or Freddie Mercury, the answer would be easy, since both came after that year.
But if they were to compare which came first, Tyrannosaurus rex or the dog, both came before the supported range.
How could I make those comparisons considering the SQL limit?
I haven't done anything yet, but this is something I'd like to know before I start building something that will never work.
THIS IS NOT A DUPLICATE OF OTHER QUESTIONS ABOUT OLD DATES.
In other questions, people ask how to store such dates. That would be extremely easy: just make a string out of it. But in my case I need to compare such dates, which nobody has asked about yet.
I could store the dates as a string, using A for after and B for before, as people answered in other questions. There would be no problem with that. But how could I compare those dates? Which part of the string would I need to split?
You could take a signed BIGINT field and use it as a UNIX timestamp.
A UNIX timestamp is the number of seconds that passed since January 1, 1970, at 0:00 UTC.
Any point in time before 1970 would simply be a negative timestamp.
If my amateurish calculation is correct (2^63 seconds divided by the 31,536,000 seconds in a 365-day year), a BIGINT would be enough to take you 292471208678 years into the past (from 1970) and the same number of years into the future. That ought to be enough for pretty much anything.
That would make dates very easy to compare: you'd simply have to check whether one value is bigger than the other.
The conversion from calendar date to timestamp you'd have to do outside MySQL, though.
Depending on what platform you are using there may be a date library to help you with the task.
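Python's own datetime only covers years 1 to 9999, so the conversion needs a small calendar routine. A minimal sketch, assuming the proleptic Gregorian calendar and astronomical year numbering (1 BC = year 0, 2 BC = year -1); the day count follows Howard Hinnant's well-known days_from_civil algorithm, and the function names here are illustrative.

```python
def days_from_civil(year, month, day):
    """Days since 1970-01-01 in the proleptic Gregorian calendar.

    Uses astronomical year numbering: year 0 is 1 BC, year -1 is 2 BC.
    """
    if month <= 2:
        year -= 1  # count January and February as the end of the previous year
    era = year // 400  # Python's floor division also works for negative years
    year_of_era = year - era * 400  # 0 .. 399
    shifted_month = month - 3 if month > 2 else month + 9  # March = 0
    day_of_year = (153 * shifted_month + 2) // 5 + day - 1
    day_of_era = year_of_era * 365 + year_of_era // 4 - year_of_era // 100 + day_of_year
    return era * 146097 + day_of_era - 719468


def to_unix_seconds(year, month, day):
    """Signed seconds since the epoch, suitable for a signed BIGINT column."""
    return days_from_civil(year, month, day) * 86400
```

Even a date 66 million years back gives a value on the order of -2 * 10^15 seconds, far inside the signed BIGINT range, and comparing two rows is a plain integer comparison.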
Why deal with a static age at the time of entry plus an offset?
The user is going to want to see a date as a date anyway.
It makes data entry complex.
Use three fields instead:
year smallint (signed, down to -32,768, i.e. roughly 32,768 BC)
month tinyint
day tinyint
and compare with (y1*10000 + m1*100 + d1) > (y2*10000 + m2*100 + d2), which is true when date 1 is later than date 2.
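The combined key works because month*100 + day always lies between 101 and 1231, so it never spills into the year part, even for negative (BC) years. A minimal sketch of the same key in Python, assuming astronomical year numbering as the signed smallint column would hold; the function names are illustrative.

```python
def date_key(year, month, day):
    """Single integer that orders (year, month, day) chronologically."""
    return year * 10000 + month * 100 + day


def came_first(a, b):
    """Return whichever (year, month, day) tuple is earlier."""
    return a if date_key(*a) < date_key(*b) else b
```

In SQL the same expression can be used directly in WHERE or ORDER BY.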
OK, I had an idea.
Store the age in days, since the hours/seconds are irrelevant for this case.
Christ's age in days: -2015 * 365.
Dog's age in days: -40000 * 365.
To make precise calculations, I'd only need an extra field with the date on which I added the values. Then I subtract from the "age in days" the number of days between the day I added the record and the day the user makes the comparison.
For example:
Dog's age was added on 29/12/2015, and the age in days is -40000 * 365.
User is making a comparison on day 29/01/2016.
The difference in days between the two dates is 31 days.
So dog's age in days should be -40000 * 365 - 31.
Using a signed BIGINT can do the trick.
Thanks to Pekka for suggesting using negative numbers for any date before the current date.
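The offset arithmetic from the example can be written as follows. A minimal sketch in Python, assuming each row stores the (negative) age in days together with the date the row was entered; the function name is hypothetical.

```python
from datetime import date


def age_in_days_on(stored_age_days, entered_on, compared_on):
    """Shift a stored (negative) age so it is relative to the comparison date."""
    elapsed = (compared_on - entered_on).days
    return stored_age_days - elapsed


# The dog example: entered 29/12/2015, compared on 29/01/2016 (31 days later)
dog_age = age_in_days_on(-40000 * 365, date(2015, 12, 29), date(2016, 1, 29))
```

Because each row may have a different entry date, every row has to be shifted to the same comparison date before two ages are compared.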

Database design to get as many stats as possible

I have to structure a MySQL database for work and haven't done that in years. I'd love to get some ideas from you. So here's the task:
I have a couple of "shops" whose opening hours differ depending on the day of the week and the year, and which could change further down the line. The shops have space for a given number of people (which could change later as well).
A few times a day we count the number of people in the shop.
We want to compare the utilized capacity between shops. I would like to use dc.js to get as many stats as possible from the data.
We also have two different methods of counting our users:
By hand. Reliable, but time-consuming.
Light barrier. Automatic, but very inaccurate.
I'd like to get a better approximation of the user count using the light barrier data and some machine learning algorithm.
Anyway, do you have any tips on how to design the DB as efficiently as possible for my tasks? I was thinking:
SHOP
  Id
  Name
OPENINGHOURS
  Id
  ShopId
  MaxUsers
  Date
  Open
  Close
MANUALUSERCOUNT
  Id
  ShopId
  Time
  Count
AUTOUSERCOUNT
  ID
  ShopId
  Time
  Count
Does this structure make sense (at all and for my tasks)?
Thank you!
For an application of this size, I see no problem with this at all. But what does the "Time" column in the user-count tables refer to?
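One stat the proposed schema supports directly is utilized capacity: join each count to the opening-hours row for the same shop and date, and divide by MaxUsers. A minimal sketch using Python's built-in sqlite3 module with the question's table and column names; the sample rows are made up, and reading Time as the timestamp of the count is an assumption (it is exactly what the answer asks about). DATE() extracts the date part in both SQLite and MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SHOP (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE OPENINGHOURS (Id INTEGER PRIMARY KEY, ShopId INTEGER, MaxUsers INTEGER,
                           "Date" TEXT, "Open" TEXT, "Close" TEXT);
CREATE TABLE MANUALUSERCOUNT (Id INTEGER PRIMARY KEY, ShopId INTEGER,
                              "Time" TEXT, "Count" INTEGER);
INSERT INTO SHOP VALUES (1, 'North'), (2, 'South');
INSERT INTO OPENINGHOURS VALUES
  (1, 1, 50, '2024-05-06', '09:00', '18:00'),
  (2, 2, 20, '2024-05-06', '10:00', '16:00');
INSERT INTO MANUALUSERCOUNT VALUES
  (1, 1, '2024-05-06 11:00', 25),
  (2, 1, '2024-05-06 15:00', 40),
  (3, 2, '2024-05-06 12:00', 15);
""")

# Average utilized capacity per shop and day: count / capacity valid on that day
rows = conn.execute("""
SELECT s.Name, o."Date", AVG(1.0 * c."Count" / o.MaxUsers) AS utilization
FROM MANUALUSERCOUNT c
JOIN OPENINGHOURS o ON o.ShopId = c.ShopId AND o."Date" = DATE(c."Time")
JOIN SHOP s ON s.Id = c.ShopId
GROUP BY s.Name, o."Date"
ORDER BY s.Name
""").fetchall()
```

Because MaxUsers lives in the per-date OPENINGHOURS row, a later change in capacity does not distort older utilization figures.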

Storing dates in a train schedule (MySQL)

I have created a train schedule database in MySQL. There are several thousand routes each day. With a few exceptions, most of the routes are the same on every working day but differ on weekends.
Currently I basically update my SQL tables at midnight each day to get the departures for the next 24 hours. This is, however, very inconvenient, so I need a way to store dates in my tables so I don't have to do this every day.
I tried to create a separate table where I stored dates for each route number (route numbers are reset each day), but this made my query so slow that it was impossible to use. Does this mean I would have to store my departure and arrival times as datetimes? In that case the main table containing routes would have several million entries.
Or is there another way?
My route table looks like this:
StnCode (referenced in a separate Station table)
DepTime
ArrTime
Routenumber
legNumber
How were you storing the dates? A single date/time field? That'd certainly be the most compact representation, but also the most difficult to index and scan, especially if you're doing queries of the following type:
SELECT ...
WHERE MONTH(DepTime) = 4 AND DAY(DepTime) = 19;
Such a construct would require a full table scan to tear apart each date field and extract the month/day. For such a case, it'd be better to denormalize a bit: split the datetime into separate year/month/day/hour/minute fields and place indexes on them. It's a bit more of a hassle to maintain, but it would speed up querying by specific time parts immensely.
Instead of storing schedules in terms of dates, you can store them against the day of the week (Sun, Mon, Tue, etc.). This eliminates storing dates for routes. You can treat the routes as predetermined and therefore fixed: there are around 8000 passenger trains, the days are fixed (7), and there are between 50 and 1000 routes, each with a table number like 1 or 1A as published in the railway books.
This will avoid storing huge combinations of train schedules into the db since every date is translated into one of the weekdays and we are not missing any data.
You can create a table for storing days which will have at most 7 days.
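A minimal sketch of that idea in Python: each route row carries the weekdays it runs on (here a 7-bit mask with Monday as bit 0, an assumption; a separate days table as suggested works the same way), and the departures for any calendar date are found from its weekday instead of being stored per date. The route data and field layout are hypothetical.

```python
from datetime import date, time

# Hypothetical route rows: (routenumber, dep_time, days_mask); bit 0 = Monday
WEEKDAYS = 0b0011111
WEEKENDS = 0b1100000
routes = [
    ("101", time(6, 15), WEEKDAYS),
    ("102", time(7, 45), WEEKDAYS | WEEKENDS),
    ("201", time(9, 30), WEEKENDS),
]


def departures_on(day, routes):
    """Routes that run on the given calendar date, ordered by departure time."""
    bit = 1 << day.weekday()  # date.weekday(): Monday = 0 ... Sunday = 6
    return sorted((r for r in routes if r[2] & bit), key=lambda r: r[1])
```

With this layout, the nightly table rebuild is replaced by a lookup of the weekday of whatever date is requested.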
I would suggest modeling the database so that each station is a touch point, not just a station ID, and introducing a hub concept to group the 3-4 stations that belong to the same city.
Each station is a touch point supported by facilities such as boarding point, halt point, etc., because not all stations are boarding points for all trains. Facilities are what is available at the different stations, and not all facilities are available for all trains.
Example: Kazipet is a station that is also a junction, but for a few trains on a few routes, the train passes through the station and also halts there, yet it does not allow new passengers to board there.
It does, however, allow boarding on the reverse routes.

How would you store and query hours of operation?

We're building an app that stores "hours of operation" for various businesses. What is the easiest way to represent this data so you can easily check if an item is open?
Some options:
Segment out blocks (every 15 minutes) that you can mark "open/closed". Checking involves seeing if the "open" bit is set for the desired time (a bit like a train schedule).
Storing a list of time ranges (11am-2pm, 5-7pm, etc.) and checking whether the current time falls in any specified range (this is what our brain does when parsing the strings above).
Does anyone have experience in storing and querying timetable information and any advice to give?
(There's all sorts of crazy corner cases like "closed the first Tuesday of the month", but we'll leave that for another day).
store each contiguous block of time as a start time and a duration; this makes it easier to check when the hours cross date boundaries
if you're certain that hours of operation will never cross date boundaries (i.e. there will never be an open-all-night sale, a 72-hour marathon event or the like), then start/end times will suffice
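A minimal sketch of the start-plus-duration check in Python, assuming each block is stored as a start minute within the week (0 = Monday 00:00, an assumed convention) and a duration in minutes. Doing the comparison modulo the week length lets a block run past midnight, or from Sunday into Monday, without being split; the function names are illustrative.

```python
MINUTES_PER_WEEK = 7 * 24 * 60


def minute_of_week(weekday, hour, minute):
    """weekday: Monday = 0 ... Sunday = 6."""
    return (weekday * 24 + hour) * 60 + minute


def is_open(blocks, t):
    """blocks: iterable of (start_minute_of_week, duration_minutes)."""
    return any((t - start) % MINUTES_PER_WEEK < duration for start, duration in blocks)


# Open Sunday 20:00 for 8 hours, i.e. until Monday 04:00
bar = [(minute_of_week(6, 20, 0), 8 * 60)]
```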
The most flexible solution might be to use the bitset approach. There are 168 hours in a week, so there are 672 15-minute periods. That's only 84 bytes' worth of space, which should be tolerable.
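A minimal sketch of the weekly bitset in Python, assuming slot 0 is Monday 00:00-00:15 and that the set is persisted as an 84-byte binary value (for example a BINARY(84) column); int.to_bytes and int.from_bytes do the packing. The function names are hypothetical.

```python
SLOTS_PER_WEEK = 7 * 24 * 4  # 672 quarter-hours
BYTES_PER_WEEK = SLOTS_PER_WEEK // 8  # 84 bytes


def slot_index(weekday, hour, minute):
    """Quarter-hour slot for a time; weekday: Monday = 0 ... Sunday = 6."""
    return (weekday * 24 + hour) * 4 + minute // 15


def set_open(mask, weekday, start_hour, end_hour):
    """Return mask with [start_hour, end_hour) on one day marked open."""
    for s in range(slot_index(weekday, start_hour, 0), slot_index(weekday, end_hour, 0)):
        mask |= 1 << s
    return mask


def is_open(mask, weekday, hour, minute):
    return bool((mask >> slot_index(weekday, hour, minute)) & 1)


def to_blob(mask):
    return mask.to_bytes(BYTES_PER_WEEK, "little")


def from_blob(blob):
    return int.from_bytes(blob, "little")


week = 0
for d in range(5):  # Monday to Friday, 9:00-17:00
    week = set_open(week, d, 9, 17)
```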
I'd use a table like this:
BusinessID | weekDay | OpenTime | CloseTime
---------------------------------------------
1          | 1       | 9        | 13
1          | 2       | 5        | 18
1          | 3       | 5        | 18
1          | 4       | 5        | 18
1          | 5       | 5        | 18
1          | 6       | 5        | 18
1          | 7       | 5        | 18
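Here, we have a business with regular hours of 5 to 18 (5 AM to 6 PM), but shorter hours (9 to 13) on Sunday (weekDay 1).
A query to check whether the business is open would be (T-SQL, where @id, @Day and @now are parameters):
DECLARE @isOpen BIT;
SELECT @isOpen = CASE WHEN EXISTS
(SELECT 1 FROM tblHours
WHERE BusinessId = @id AND weekDay = @Day
AND DATEPART(HOUR, @now) >= OpenTime AND DATEPART(HOUR, @now) < CloseTime) THEN 1 ELSE 0 END;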
If you need to store edge cases, just have 365 entries, one per day. It's really not that much in the grand scheme of things; place an index on the day column and the BusinessId column.
Don't forget to store each business's time zone in a separate table (normalize!), and convert between your time and the business's time before making these comparisons.
OK, I'll throw in on this for what it's worth.
I need to handle quite a few things.
Fast / Performant Query
Any increments of time, 9:01 PM, 12:14, etc.
International (?) - not sure if this is an issue even with timezones, at least in my case but someone more versed here feel free to chime in
Open - Close spanning to the next day (open at noon, close at 2:00 AM)
Multiple timespans / day
Ability to override specific days (holidays, whatever)
Ability for overrides to be recurring
Ability to query for any point in time and get businesses open (now, future time, past time)
Ability to easily exclude businesses closing soon (e.g. filter out businesses closing within 30 minutes; you don't want to make your users 'that guy' who shows up 5 minutes before closing, in the food/beverage industry)
I like a lot of the approaches presented and I'm borrowing from a few of them. In my website/project, I need to take into consideration that I may have millions of businesses, and a few of the approaches here don't seem to scale well, to me personally.
Here's what I propose for an algorithm and structure.
We have to make some concrete assumptions, across the globe, anywhere, any time:
There are 7 days in a week.
There are 1440 minutes in one day.
There are a finite number of permutations of minutes of open / closed that are possible.
Not concrete but decent assumptions:
Many permutations of open/closed minutes will be shared across businesses reducing total permutations actually stored.
There was a time in my life when I could easily have calculated the actual number of possible combinations for this approach, but if someone could assist, or thinks it would be useful, that would be great.
I propose 3 tables:
Before you stop reading, consider that in the real world 2 of these tables will be small enough to cache neatly. This approach isn't going to be for everyone either, due to the sheer complexity of the code required to translate between the UI and the data model. Your mileage and needs may vary. This is an attempt at a reasonable 'enterprise' level solution, whatever that means.
HoursOfOperations Table
ID | OPEN (minute of day) | CLOSE (minute of day)
1 | 540 | 1020 (example: 9 AM - 5 PM)
2 | 545 | 1021 (example: edge case 9:05 AM - 5:01 PM (weirdos))
etc.
HoursOfOperations doesn't care about days, just open and close times and uniqueness: there can be only a single entry per open/close combination. Depending on your environment, either this entire table can be cached or it could be cached for the current hour of the day, etc. At any rate, you shouldn't need to query this table for every operation. Depending on your storage solution, I envision every column in this table as indexed for performance. As time progresses, INSERTs into this table should become less and less likely. Really, though, dealing with this table should mostly be an in-process (RAM) operation.
Business2HoursMap
Note: In my example I'm storing "Day" as a bit-flag field/column. This is largely due to my needs and the advancement of LINQ / Flags Enums in C#. There's nothing stopping you from expanding this to 7 bit fields. Both approaches should be relatively similar in both storage logic and query approach.
Another Note: I'm not entering into a semantics argument on "every table needs a PK ID column", please find another forum for that.
BusinessID | HoursID | Day (or, if you prefer split into: BIT Monday, BIT Tuesday, ...)
1 | 1 | 1111111 (this business is open 9-5 every day of the week)
2 | 2 | 1111110 (this business is open 9:05 - 5:01 Mon-Sat; Monday = day 1)
The reason this is easy to query is that we can always easily determine the MOTD (minute of the day) that we're after. If I want to know what's open at 5 PM tomorrow, I grab all HoursOfOperations IDs WHERE Close >= 1020. Unless I'm looking for a time range, Open becomes insignificant. If you don't want to show businesses closing in the next half hour, just adjust your incoming time accordingly (search for 5:30 PM (1050), not 5:00 PM (1020)).
The second query would naturally be 'give me all businesses with HoursID IN (1, 2, 3, 4, 5)', etc. This should probably raise a red flag, as there are limitations to this approach. However, if someone can answer the actual permutations question above, we may be able to pull the red flag down. Consider that we only need the possible permutations on one side of the equation at a time, either open or close.
Considering we've got our first table cached, that's a quick operation. Second operation is querying this potentially large-row table but we're searching very small (SMALLINT) hopefully indexed columns.
Now, you may be seeing the complexity on the code side of things. I'm targeting mostly bars in my particular project so it's going to be very safe to assume that I will have a considerable number of businesses with hours such as "11:00 AM - 2:00 AM (the next day)". That would indeed be 2 entries into both the HoursOfOperations table as well as the Business2HoursMap table. E.g. a bar that is open from 11:00 AM - 2:00 AM will have 2 references to the HoursOfOperations table 660 - 1440 (11:00 AM - Midnight) and 0 - 120 (Midnight - 2:00 AM). Those references would be reflected into the actual days in the Business2HoursMap table as 2 entries in our simplistic case, 1 entry = all days Hours reference #1, another all days reference #2. Hope that makes sense, it's been a long day.
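A minimal in-memory sketch of the HoursOfOperations / Business2HoursMap lookup in Python, including the split of an 11:00 AM - 2:00 AM bar into two rows. Assumptions: the day flags are an int with Monday as the lowest bit, OPEN is inclusive and CLOSE exclusive, and the sketch tests both OPEN and CLOSE, since a Close-only filter would also match times before a business opens. All names and rows are illustrative.

```python
MONDAY, SUNDAY = 0, 6
ALL_DAYS = 0b1111111  # bit 0 = Monday ... bit 6 = Sunday

# HoursOfOperations: id -> (open_minute, close_minute), one row per distinct pair
hours_of_operation = {
    1: (540, 1020),  # 9:00 AM - 5:00 PM
    2: (545, 1021),  # 9:05 AM - 5:01 PM
    3: (660, 1440),  # 11:00 AM - midnight
    4: (0, 120),     # midnight - 2:00 AM
}

# Business2HoursMap: (business_id, hours_id, day_flags)
business_hours = [
    (1, 1, ALL_DAYS),
    (2, 2, ALL_DAYS & ~(1 << SUNDAY)),  # Monday to Saturday
    (3, 3, ALL_DAYS),  # a bar open 11:00 AM - 2:00 AM every day ...
    (3, 4, ALL_DAYS),  # ... split at midnight into two rows
]


def open_businesses(weekday, minute_of_day):
    """Business IDs open on the given day (Monday = 0) at the given minute."""
    # Step 1: matching IDs from the small, cacheable HoursOfOperations table
    hour_ids = {hid for hid, (o, c) in hours_of_operation.items() if o <= minute_of_day < c}
    # Step 2: the "HoursID IN (...)" lookup, filtered by the day flag
    day_bit = 1 << weekday
    return sorted({b for b, hid, flags in business_hours if hid in hour_ids and flags & day_bit})
```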
Overriding on special days / holidays / whatever.
Overrides are by nature, date based, not day of week based. I think this is where some of the approaches try to shove the proverbial round peg into a square hole. We need another table.
HoursID | BusinessID | Day | Month | Year
1 | 2 | 1 | 1 | NULL
This can certainly get more complex if you need something like "on every second Tuesday, this company goes fishing for 4 hours". However, this allows us to handle 1) overrides and 2) reasonable recurring overrides quite easily. E.g. if Year IS NULL, then every year on New Year's Day this weirdo bar is open from 9:00 AM to 5:00 PM, keeping in line with the data examples above. If Year were set (say, to 2013), it would apply only to that year. If Month is NULL, it applies to the first day of every month. Again, NULL columns alone won't handle every scheduling scenario, but theoretically you could handle just about anything by relying on a long sequence of absolute dates if needed.
Again, I would cache this table on a rolling daily basis. I just can't realistically see the rows for this table in a single-day snapshot being very large, at least for my needs. I would check this table first since it is, well, an override, and that would save a query against the much larger Business2HoursMap table on the storage side.
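A minimal sketch in Python of checking the override table before the regular map, assuming NULL (None) in Day, Month or Year acts as a wildcard as described, and that a matching override replaces the regular hours for that date. The function and variable names are hypothetical.

```python
from datetime import date

# Overrides: (hours_id, business_id, day, month, year); None means "any"
overrides = [
    (1, 2, 1, 1, None),  # business 2 uses HoursID 1 every New Year's Day
]


def _matches(value, pattern):
    return pattern is None or pattern == value


def override_hours(business_id, on):
    """HoursIDs from matching overrides, or None if the regular hours apply."""
    ids = [
        hid
        for hid, bid, d, m, y in overrides
        if bid == business_id
        and _matches(on.day, d)
        and _matches(on.month, m)
        and _matches(on.year, y)
    ]
    return ids or None
```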
Interesting problem. I'm really surprised this is the first time I've really needed to think this through. As always, very keen on different insights, approaches or flaws in my approach.
I think I'd personally go for a start + end time, as it would make everything more flexible. A good question would be: what's the chance that the block size will change at some point? Then pick the solution that best fits your situation (if it's liable to change, I'd definitely go for the timespans).
You could store them as a timespan, and use segments in your application. That way you have the easy input using blocks, while keeping the flexibility to change in your datastore.
To add to what Johnathan Holland said, I would allow for multiple entries for the same day.
I would also allow for decimal time, or another column for minutes.
Why? Many restaurants and businesses around the world have lunch and/or afternoon breaks. Also, many restaurants close at odd times that aren't 15-minute increments: of two that I know of near my house, one closes at 9:40 PM on Sundays, and one closes at 1:40 AM.
There is also the issue of holiday hours, such as stores closing early on Thanksgiving Day, so you need a calendar-based override.
Perhaps what can be done is a date/time open, date-time close, such as this:
businessID | datetime              | type
==========================================
1          | 10/1/2008 10:30:00 AM | 1
1          | 10/1/2008 02:45:00 PM | 0
1          | 10/1/2008 05:15:00 PM | 1
1          | 10/2/2008 02:00:00 AM | 0
1          | 10/2/2008 10:30:00 AM | 1
etc. (type: 1 = open, 0 = closed)
And have all the days precalculated 1-2 years in advance. Note that you would only have 3 columns (int, datetime, bit), so the data consumption should be minimal.
This will also allow you to modify specific dates for odd hours for special days, as they become known.
It also takes care of crossing over midnight, as well as 12/24 hour conversions.
It is also timezone agnostic. If you store start time and duration, when you calculate the end time, is your machine going to give you the TZ adjusted time? Is that what you want? More code.
As far as querying for open/closed status goes, query the date-time in question:
SELECT TOP 1 type FROM thehours WHERE businessID = @businessID AND [datetime] <= @someDatetime ORDER BY [datetime] DESC
then look at "type": if 1, it's open; if 0, it's closed.
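The same "last transition at or before the time in question" lookup can be done in memory with a binary search. A minimal Python sketch using the standard bisect module and the sample rows above, assuming the rows for one business are sorted by datetime; the function name is illustrative.

```python
from bisect import bisect_right
from datetime import datetime

# (datetime, type) rows for one business, sorted by time; type 1 = open, 0 = closed
transitions = [
    (datetime(2008, 10, 1, 10, 30), 1),
    (datetime(2008, 10, 1, 14, 45), 0),
    (datetime(2008, 10, 1, 17, 15), 1),
    (datetime(2008, 10, 2, 2, 0), 0),
    (datetime(2008, 10, 2, 10, 30), 1),
]
times = [t for t, _ in transitions]


def is_open(when):
    """Type of the latest row with datetime <= when; closed if there is none."""
    i = bisect_right(times, when)
    return i > 0 and transitions[i - 1][1] == 1
```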
PS: I was in retail for 10 years. So I am familiar with the small business crazy-hours problems.
The segment blocks are better, just make sure you give the user an easy way to set them. Click and drag is good.
Any other system (like ranges) is going to be really annoying when you cross the midnight boundary.
As for how you store them, in C++ bitfields would probably be best. In most other languages, an array might be better (lots of wasted space, but it would run faster and be easier to comprehend).
I would think a little about those edge-cases right now, because they are going to inform whether you have a base configuration plus overlay or complete static storage of opening times or whatever.
There are so many exceptions - and on a regular basis (like snow days, irregular holidays like Easter, Good Friday), that if this is expected to be a reliable representation of reality (as opposed to a good guess), you'll need to address it pretty soon in the architecture.
How about something like this:
Store Hours Table
Business_id (int)
Start_Time (time)
End_Time (time)
Condition varchar/string
Open bit
'Condition' is a lambda expression (text for a 'where' clause). Build the query dynamically: for a particular business, you select all of the open/close times:
Let Query1 = SELECT COUNT(*) FROM store_hours WHERE @t BETWEEN Start_Time AND End_Time AND [Open] = 1 AND Business_id = @id AND (...dynamically built expression...)
Let Query2 = SELECT COUNT(*) FROM store_hours WHERE @t BETWEEN Start_Time AND End_Time AND [Open] = 0 AND Business_id = @id AND (...dynamically built expression...)
So in the end you want something like:
select cast(Query1 as bit) & ~cast(Query2 as bit)
If the result of the last query is 1 then the store is open at time t, otherwise it is closed.
Now you just need a friendly interface that can generate your where clauses (lambda expressions) for you.
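The open-AND-NOT-closed combination can be illustrated without dynamic SQL. A minimal Python sketch, assuming each row's Condition is represented as a Python predicate on the date instead of SQL text (which also avoids building SQL strings from stored text); the rows and names are hypothetical, and the time check is inclusive like BETWEEN.

```python
from datetime import datetime, time

# (start_time, end_time, condition, open_flag) rows for one business
store_hours = [
    (time(9), time(17), lambda d: d.weekday() < 5, True),  # open weekdays 9-5
    (time(9), time(17), lambda d: d.month == 12 and d.day == 25, False),  # closed Christmas
]


def is_open(rows, when: datetime) -> bool:
    """Open if some open row matches and no closed row matches (Query1 & ~Query2)."""
    t = when.time()

    def matching(flag):
        return any(
            start <= t <= end and cond(when.date()) and row_open == flag
            for start, end, cond, row_open in rows
        )

    return matching(True) and not matching(False)
```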
The only other corner case that I can think of is what happens if a store is open from say 7am to 2am on one date but closes at 11pm on the following date. Your system should be able to handle that as well by smartly splitting up the times between the two days.
There is surely no need to conserve memory here, but perhaps a need for clean and comprehensible code. "Bit twiddling" is not, IMHO, the way to go.
We need a set container here, which holds any number of unique items and can determine quickly and easily whether an item is a member or not. The setup requires care, but in routine use a single line of simple code determines whether you are open or closed.
Concept:
Assign an index number to every 15-minute block, starting at, say, midnight Sunday.
Initialize:
Insert into a set the index number of every 15-minute block during which you are open. (Assuming you are open fewer hours than you are closed.)
Use:
Subtract midnight of the previous Sunday from the time of interest, in minutes, and divide by 15 (integer division). If this number is present in the set, you are open.
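A minimal Python sketch of the set approach, assuming the week starts at midnight on Sunday as suggested and that times are naive local datetimes in the business's time zone; a plain Python set provides the membership test, and the function names are illustrative.

```python
from datetime import datetime, timedelta


def week_index(when):
    """Quarter-hour index since the previous Sunday midnight (0 .. 671)."""
    days_since_sunday = (when.weekday() + 1) % 7  # weekday(): Monday = 0, Sunday = 6
    sunday = (when - timedelta(days=days_since_sunday)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return int((when - sunday).total_seconds() // 60) // 15


def build_open_set(daily_hours):
    """daily_hours: {days_since_sunday: [(start_hour, end_hour), ...]}."""
    open_blocks = set()
    for day, ranges in daily_hours.items():
        for start, end in ranges:
            open_blocks.update(range(day * 96 + start * 4, day * 96 + end * 4))
    return open_blocks


# Open Monday to Friday (days 1-5 after Sunday), 9:00-17:00
open_blocks = build_open_set({d: [(9, 17)] for d in range(1, 6)})
```

The routine check is then the single line `week_index(some_time) in open_blocks`.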