Functions for value intervals - language-agnostic

I'm currently dealing with a lot of possibly indefinite date spans, i.e.
StartDate EndDate
--------- ---------
01JAN1921 31DEC2009
10OCT1955 null
...
where the other end of the interval might be unknown or not defined. I've been working on little functions for detecting overlap, checking whether an interval is a subinterval of another, calculating the gap between two intervals, etc.
For example to detect overlaps, the problem is
      S       E           S and E are the start and end of the interval
      |       |           we're comparing to. Here both are known, but
  s1--+---e1  |           either could be null. The small s:s and e:s
      |       | s2....e2  define the intervals we're comparing to and
      |s3--e3 |           again we'd like to allow for open intervals.
      |   s4--+----e4
s5..e5|       |
  s6--+-------+--s7
      |       |
Based on questions related to detecting overlap with well defined intervals you need to check
Coalesce(S,Coalesce(e-1,0))<Coalesce(e,Coalesce(S+1,1))
AND Coalesce(E,Coalesce(s+1,0))>Coalesce(s,Coalesce(E-1,1))
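A minimal Python sketch of the same idea, assuming an unknown end is represented as None and that intervals which merely touch do not count as overlapping (strict comparison, as in the expression above). The function name overlaps is made up for illustration:

```python
from datetime import date
from typing import Optional


def overlaps(s1: Optional[date], e1: Optional[date],
             s2: Optional[date], e2: Optional[date]) -> bool:
    """Return True if (s1, e1) and (s2, e2) overlap; None means 'unbounded'."""
    # Interval 1 must start before interval 2 ends. A missing bound on
    # either side can never rule the overlap out.
    first_starts_in_time = s1 is None or e2 is None or s1 < e2
    # Symmetrically, interval 2 must start before interval 1 ends.
    second_starts_in_time = s2 is None or e1 is None or s2 < e1
    return first_starts_in_time and second_starts_in_time


# The two rows from the table above: the second span has no end date.
print(overlaps(date(1921, 1, 1), date(2009, 12, 31), date(1955, 10, 10), None))
```

The same two comparisons work for any ordered type (numbers, timestamps), not only dates.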
I suppose this is such a common thing (concerning not only date or time intervals) that lots of people have been dealing with it. I'm looking for existing implementations, preferably just functions based on basic comparison operations.

Interval arithmetic is a wide and complex topic. C++/Boost has a library for dealing with it. Python too, and I guess many other languages. Is this what you're asking about, or is this too general?
As for time intervals, there's this SO question, and probably others you can find in the 'Related' sidebar.

There are three ways people commonly represent dates in programs.
In the Gregorian calendar, with years, months, days, hours, minutes, seconds. It's very robust, but it's difficult to do math with (a leap year every fourth year, but not on the centuries, unless the year is a multiple of 400... can you remember that?)
As the number of seconds since a specific time, although this creates problems as you have to choose whether to count leap seconds (search for "leap seconds" online and you'll see why)
As a Julian date: a number of days since a specific date, plus the number of seconds since the start of that day
As for implementations to use as a reference,
Java is an example of what not to do (folks, don't create mutable datetime types)
Python's is also a mess (no easy way to parse dates)
PostgreSQL deals with dates and times very nicely. See http://www.postgresql.org/docs/7.4/interactive/datatype-datetime.html, and you can also look at the source code. It uses Julian dates internally.
Haskell's standard time library is also nice, and has unlimited range and precision: that's right, they use bigints for the Julian date and a ratio of bigints for the time. The library includes a bunch of useful date functions, including one for calculating the date Easter falls on. See http://hackage.haskell.org/package/time.

Related

What data type should I use for a 'duration' attribute?

I'm using phpmyadmin/MySQL to make a database.
It's for a plane/bus/train booking system.
I have a 'depart_time' attribute which is a time data type. In the same table I have a 'duration' attribute. Later on I will need to do multiplication on this duration attribute (depending on if it is train/bus/plane).
My question is - what would be the best data type for this duration attribute?
I thought about using a decimal type - but then the values in it won't represent the time exactly (e.g. 1.30 won't represent 1 and a half hrs, it would need to be 1.50 - if that makes sense).
I also thought about using the time data type for this field as well, but I wasn't sure if multiplication would be possible on that?
I couldn't find any help after googling about multiplication on the time data type.
Hopefully this made sense. If you need any more information, feel free to ask in the comments!
Thanks in advance!
Use an int and record durations in the smallest unit you're interested in. For example, if you need minute accuracy, store one and a half hours as 90 minutes. Formatting that value for display purposes is presentation logic, not the business of the database.
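As a rough illustration of that split between storage and presentation, the sketch below keeps the duration as a plain integer number of minutes and only formats it when displaying. The helper name format_minutes and the 1.5 multiplier are assumptions made up for the example:

```python
def format_minutes(total_minutes: int) -> str:
    """Format an integer number of minutes as H:MM for display."""
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}:{minutes:02d}"


duration = 90                        # one and a half hours, stored as minutes
print(format_minutes(duration))      # 1:30

# Multiplication is ordinary arithmetic; round back to whole minutes.
scaled = round(duration * 1.5)       # hypothetical per-vehicle multiplier
print(format_minutes(scaled))        # 2:15
```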
If I were in that position I would probably store it either:
In seconds. Unlikely that you need more precision than that.
In a string such as P1D for 1 day and P1W2DT3H for 1 week, 2 days, 3 hours. This is a standard format used by many libraries and deals better with situations where something really takes 1 day, but it's a day with a leap hour.
For most cases just using seconds will be fine though.
I would represent it in the database as seconds or minutes (the minimum precision you want). Showing it to the user should be done dynamically in the frontend, e.g. in minutes (1 min, 30 min, 180 min), hours (0.1h, 1h, 3h, 3.0h), days (0.5d, 1d) or minimally packed (1d 5h 42min).
You should keep this separate. So I suggest to use seconds.
I've solved how I'm going to do this.
Instead of doing it within the database, I am going to do the multiplication using Python.
I took the information from the table, turned the data into int/datetime.timedelta and the multiplication worked.
I then just needed to return that data depending on whether it's bus/train/plane.
For a multiplier not involved with money, simply use FLOAT.
Then work in seconds (or minutes if you prefer). That can be in INT UNSIGNED.
Use appropriate DATETIME functions to convert seconds to hh:mm or whatever output you desire. Note: The internal format need not be the same as the display format.
A duration could be represented in an open standard manner using ISO 8601 duration format.
See https://www.digi.com/resources/documentation/digidocs/90001437-13/reference/r_iso_8601_duration_format.htm
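For reference, here is a small sketch of reading such a string into a Python datetime.timedelta. It is deliberately partial: it handles only the week, day, hour, minute and second designators with whole numbers, ignores years and months (which have no fixed length), and treats every day as exactly 24 hours. The function name parse_duration is made up for the example:

```python
import re
from datetime import timedelta

# Only W, D, H, M (minutes) and S are recognised; years/months are not.
_DURATION = re.compile(
    r"^P(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Convert a simple ISO 8601 style duration into a timedelta."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    # Keep only the designators that were actually present.
    parts = {name: int(value)
             for name, value in match.groupdict().items() if value}
    if not parts:
        raise ValueError(f"empty duration: {text!r}")
    return timedelta(**parts)


print(parse_duration("PT90M"))      # 1:30:00
print(parse_duration("P1W2DT3H"))   # 9 days, 3:00:00
```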

Compare creation dates of things that may have been made before Christ

I'm thinking about making a project in a database with a large amount of objects / people / animals / buildings, etc.
The application would let the user select two candidates and see which came first. The comparison would be made by date, of course.
MySQL only allows dates after 01/01/1000.
If one user were to compare which came first: Michael Jackson or Freddie Mercury, the answer would be easy since they came after this year.
But if they were to compare which came first: Tyrannosaurus Rex or Dog, they both came before the accepted date.
How could I make those comparisons considering the SQL limit?
I didn't do anything yet, but this is something I'd like to know before I start doing something that will never work.
THIS IS NOT A DUPLICATE OF OTHER QUESTIONS ABOUT OLD DATES.
In other questions, people are asking about how to store. It would be extremely easy, just make a string out of it. But in my case, I'd need to compare such dates, which they didn't ask for, yet.
I could store the dates as a string, using A for after and B for before, as people answered in other questions. There would be no problem. But how could I compare those dates? What part of the string I'd need to break?
You could take a signed BIGINT field and use it as a UNIX timestamp.
A UNIX timestamp is the number of seconds that passed since January 1, 1970, at 0:00 UTC.
Any point in time before that would simply be a negative timestamp.
If my amateurish calculation is correct, a BIGINT would be enough to take you 292471208678 years into the past (from 1970) and the same number of years into the future. That ought to be enough for pretty much anything.
That would make dates very easy to compare - you'd simply have to see whether one date is bigger than the other.
The conversion from calendar date to timestamp you'd have to do outside MySQL, though.
Depending on what platform you are using there may be a date library to help you with the task.
Why deal with static age at time of entry and offset?
User is going to want to see a date as a date anyway
Complex data entry
Three fields
year smallint (good back to 32,768 BC)
month tinyint
day tinyint
if ( (y1*10000 + m1*100 + d1) > (y2*10000 + m2*100 + d2) )
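The same comparison can be sketched in Python, where tuples already compare field by field. This assumes BC years are stored as negative numbers; the two sample dates are only illustrative values, and the names sort_key and is_earlier are made up:

```python
def sort_key(year: int, month: int, day: int) -> int:
    """Single-number key equivalent to the y*10000 + m*100 + d expression."""
    return year * 10000 + month * 100 + day


def is_earlier(a: tuple, b: tuple) -> bool:
    """Compare (year, month, day) tuples; negative years are BC."""
    return a < b


rome_founded = (-753, 4, 21)        # traditional date, for illustration
battle_of_hastings = (1066, 10, 14)

print(is_earlier(rome_founded, battle_of_hastings))              # True
# The numeric key orders the same way, including within a BC year.
print(sort_key(*rome_founded) < sort_key(*battle_of_hastings))   # True
print(sort_key(-753, 4, 21) < sort_key(-753, 5, 1))              # True
```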
OK I had an idea.
Store the age in days, since the hours/seconds are irrelevant for this case.
Christ's age in days: -2015 * 365.
Dog's age in days: -40000 * 365.
In order to make precise calculations, I'd only need an extra field with the date I have added the values. Then add to the "age in days" the difference in days from the day I have added the register, from the day the user is making the comparison.
For example:
Dog's age has been added in 29/12/2015 and the age in days is -40000 * 365.
User is making a comparison on day 29/01/2016.
The difference in days between the two dates is 31 days.
So dog's age in days should be -40000 * 365 - 31.
Using a signed big int can do the trick (the values are negative, so the column must not be unsigned).
Thanks to Pekka for suggesting using negative numbers for any date before the current date.

How to describe interval datatype in mysql?

I'm now working on a mysql database. I have a need to have 2 columns in tables:
The first one has to store info in format "N years M months".
The other one: "H hours MM minutes".
Can you give me an idea of how to represent this in a table? I know that MySQL doesn't provide any time interval datatype. I will appreciate your opinions very much.
If your database choice is not fixed, it's worth noting that PostgreSQL has a native time/date interval type that might readily represent your desired intervals.
But let's assume that MySQL is what you've got to work with. Then you're going to need a homegrown way to represent your intervals, because MySQL doesn't have any native ones.
A simple approach would be to represent intervals in terms of a least common denominator, such as "seconds." H hours and MM minutes = H * 3600 + MM * 60. Converting back to H and MM would be done through division, modulo, or divmod operations.
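A minimal sketch of that round trip, with hypothetical helper names to_seconds and to_hours_minutes:

```python
def to_seconds(hours: int, minutes: int) -> int:
    """Collapse 'H hours MM minutes' into a single number of seconds."""
    return hours * 3600 + minutes * 60


def to_hours_minutes(seconds: int) -> tuple:
    """Split a number of seconds back into (hours, minutes)."""
    hours, remainder = divmod(seconds, 3600)
    return hours, remainder // 60


stored = to_seconds(2, 5)
print(stored)                     # 7500
print(to_hours_minutes(stored))   # (2, 5)
```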
Another approach would be to use concatenated strings, such as N years M months = "N,M" and H hours MM minutes = "H:MM".
Either way, your logic (SQL and/or application language) will have to interpret these composed types. String concatenation is not especially efficient, but it's easily decomposed in most programming languages. E.g. in Python:
years, months = [int(part) for part in yearmonth.split(',')]
It's a little less convenient in SQL, but combinations of left, right, and locate will do the trick (possibly with the rare addition of cast).
A bigger problem is that while your hours and minutes intervals are well-defined, your years and months intervals are less so. Neither years nor months have a fixed number of days (hours, seconds, etc.). There are leap years, and months have varied lengths. You can assume a fixed value as an estimate (e.g. 365.25 days/year taking into account leap years, or the even more accurate 365.2425 days/year taking into account the special handling of 100- and 400-year marks). This is often done when estimating, such as 4.33 weeks/month. Just keep in mind, those are estimates. Alternatively, if you keep the intervals as composed strings, you can use an external library such as Perl's Date::Manip that has considerable savvy about dealing with date intervals. Which brings us to a final option:
Choose an external interval interpretation library, and commit to using its encoding of intervals. For example, Date::Manip::Delta compacts 7 fields (years, months, weeks, days, hours, minutes, and seconds) into a colon-separated string such as 1:2:3:4:5:6:7. If you're going to depend on an external library, might as well choose its native format. Other languages and libraries have their own favored formats. Just be careful in choosing one that has a semantic understanding of the passing of time that maps well to your application needs. Not all will. For example, Python does have a native datetime.timedelta type, but days are the largest unit in its understanding. That might not map well to your need for years and months to have full expression. As Wrikken suggests, if you have libraries that support a standard representation like ISO8601 durations, all the better. I, however, have not found such libraries to be as common, well-supported, or richly functional as alternatives such as Date::Manip.

Does MySQL support historical date (like 1200)?

I can't see any info about that. Where can I find the oldest date Mysql can support ?
For the specific example you used on your question (year 1200), technically things will work.
In general, however, timestamps are inadvisable for this use.
First, the range limitation is arbitrary: in MySQL it's Jan 1st, 1000. If you are working with 12-13th century stuff, things go fine... but if at some moment you need to add something older (10th century or earlier), the date will miserably break, and fixing the issue will require re-formatting all your historic dates into something more adequate.
Timestamps are normally represented as raw integers, with a given "tick interval" and "epoch point", so the number is indeed the number of ticks elapsed since the epoch to the represented date (or vice versa for negative dates). This means that, as with any fixed-width integer data-type, the set of representable values is finite. Most timestamp formats I know about sacrifice range in favor of precision, mostly because applications that need to perform time arithmetic often need to do so with a decent precision; while applications that need to work with historical dates very rarely need to perform serious arithmetic.
In other words, timestamps are meant for precise representation of dates. Second (or even fraction of a second) precision makes no sense for historical dates: could you tell me, down to the millisecond, when Henry the 8th was crowned King of England?
In the case of MySQL, the format is inherently defined as "4-digit years", so any related optimization can rely on the assumption that the year will have 4 digits, or that the entire string will have exactly 10 chars ("yyyy-mm-dd"), etc. It's just a matter of luck that the date you mentioned on your title still fits, but even relying on that is still dangerous: besides what the DB itself can store, you need to be aware of what the rest of your server stack can manipulate. For example, if you are using PHP to interact with your database, trying to handle historical dates is very likely to crash at some point or another (on a 32-bit environment, the range for UNIX-style timestamps is December 13, 1901 through January 19, 2038).
In summary: MySQL will store properly any date with a 4-digit year; but in general using timestamps for historical dates is almost guaranteed to trigger issues and headaches more often than not. I strongly advise against such usage.
Hope this helps.
Edit/addition:
Thank you for this very interesting answer. Should I create my own algo for historical date or choose another db but which one ? – user284523
I don't think any DB has too much support for this kind of dates: applications using it most often have enough with string-/text- representation. Actually, for dates on year 1 and later, a textual representation will even yield correct sorting / comparisons (as long as the date is represented by order of magnitude: y,m,d order). Comparisons will break, however, if "negative" dates are also involved (they would still compare as earlier than any positive one, but comparing two negative dates would yield a reversed result).
If you only need Year 1 and later dates, or if you don't need sorting, then you can make your life a lot easier by using strings.
Otherwise, the best approach is to use some kind of number, and define your own "tick interval" and "epoch point". A good interval could be days (unless you really need further precision, but even then you can rely on "real" (floating-point) numbers instead of integers); and a reasonable epoch could be Jan 1, 1. The main problem will be turning these values to their text representation, and vice versa. You need to keep in mind the following details:
Leap years have one extra day.
The rule for leap years was "any multiple of 4" until 1582, when it changed from the Julian to the Gregorian calendar and became "multiple of 4 except those that are multiples of 100 unless they are also multiples of 400".
The last day of the Julian calendar was Oct 4th, 1582. The next day, first of the Gregorian calendar, was Oct 15th, 1582. 10 days were skipped to make the new calendar match again with the seasons.
As stated in the comments, the two rules above vary by country: Papal states and some catholic countries did adopt the new calendar on the stated dates, but many other countries took longer to do so (the last being Turkey in 1926). This means that any date between the papal bull in 1582 and the last adoption in 1926 will be ambiguous without geographical context, and even more complex to process.
There is no "year 0": the year before year 1 was year -1, or year 1 BCE.
All of this requires quite elaborate parser and formatter functions, but beyond the many case-by-case breakings there isn't really too much complexity (it'd be tedious to code, but quite straightforward). The use of numbers as the underlying representation ensures correct sorting/comparing for any pair of values.
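As a small taste of what those functions involve, here is a sketch of only the leap-year rule from the list above. It assumes the 1582 papal switch date (which, as noted, varies by country), applies the Julian rule uniformly to all earlier years including BCE ones, and uses the "no year 0" numbering where -1 means 1 BCE. The name is_leap_year and the constant are made up for the example:

```python
GREGORIAN_START_YEAR = 1582  # assumption: papal adoption; varies by country


def is_leap_year(year: int) -> bool:
    """Leap-year test with year -1 meaning 1 BCE (there is no year 0)."""
    if year == 0:
        raise ValueError("there is no year 0")
    # Shift BCE years by one so the count is continuous: 1 BCE -> 0, 2 BCE -> -1.
    continuous = year + 1 if year < 0 else year
    if year <= GREGORIAN_START_YEAR:
        # Julian rule: every multiple of 4.
        return continuous % 4 == 0
    # Gregorian rule: multiples of 4, except centuries not divisible by 400.
    return continuous % 4 == 0 and (continuous % 100 != 0 or continuous % 400 == 0)


print(is_leap_year(1500))  # True  (Julian rule)
print(is_leap_year(1900))  # False (Gregorian century exception)
print(is_leap_year(2000))  # True
```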
Knowing this, now it's your choice to take the approach that better fits your needs.
From the documentation:
DATE
A date. The supported range is '1000-01-01' to
'9999-12-31'.
Yes. MySQL dates start in year 1000.
For whatever it's worth, I found that the MySQL DATE field does support dates < 1000 in practice, though the documentation says otherwise. E.g., I was able to enter 325 and it stores as 0325-00-00. A search WHERE table.date < 1000 also gave correct results.
But I am hesitant to rely on the < 1000 dates when they are not officially supported, plus I sometimes need BCE years with more than 4 digits anyway (e.g. 10000 BCE). So separate INT fields for year, month and day (as suggested above) do seem the only choice.
I do wish the DATE type (or perhaps a new HISTDATE type) supported a full range of historical dates - it would be nice to combine three fields into one and simply sort by date instead of having to sort by year, month, day.
Use SMALLINT for year, so the year will accept from -32768 (BC) to 32767 (AD)
As for months and days, use TINYINT UNSIGNED
Most historical events don't have months and days, so you could query like this:
SELECT events FROM history WHERE year='-4990'
Result : 'Noah Ark'
Or : SELECT events FROM history WHERE year='570' AND month='4' AND day='20'
return : "Muhammad pbuh was born"
Depending on requirements, you could also add DATETIME column and make it NULL for date before 1000 and vice versa (thus saving some bytes)
This is an important and interesting problem which has another solution.
Instead of relying on the database platform to support a potentially infinite number of dates with millisecond precision, rely on an object-oriented programming language compiler and runtime to correctly handle date and time arithmetic.
It is possible to do this using the Java Virtual Machine (JVM), where time is measured in milliseconds relative to midnight, January 1, 1970 UTC (Epoch), by persisting the required value as a long in the database (including negative values), and performing the required conversion/calculation in the component layer after retrieval.
For example:
Date d = new Date(Long.MIN_VALUE);
DateFormat df = new SimpleDateFormat("EEE, d MMM yyyy G HH:mm:ss Z");
System.out.println(df.format(d));
Should show:
Sun, 2 Dec 292269055 BC 16:47:04 +0000
This also enables independence of database versions and platforms as it abstracts all date and time arithmetic to the JVM runtime, i.e. changes in database versions and platforms will be much less likely to require re-implementation, if at all.
I had a similar problem and I wanted to continue to rely on date fields in the DB, so that I could use date range searches with accuracy of up to a day for historic values.
(My DB includes date of birth and dates of roman emperors...)
The solution was to add a constant year (example: 3000) to all the dates before adding them to the DB and subtracting the same number before displaying the query results to the users.
If your DB already has some date values in it, remember to update the existing values with the new constant.

How would you store and query hours of operation?

We're building an app that stores "hours of operation" for various businesses. What is the easiest way to represent this data so you can easily check if an item is open?
Some options:
Segment out blocks (every 15 minutes) that you can mark "open/closed". Checking involves seeing if the "open" bit is set for the desired time (a bit like a train schedule).
Storing a list of time ranges (11am-2pm, 5-7pm, etc.) and checking whether the current time falls in any specified range (this is what our brain does when parsing the strings above).
Does anyone have experience in storing and querying timetable information and any advice to give?
(There's all sorts of crazy corner cases like "closed the first Tuesday of the month", but we'll leave that for another day).
store each contiguous block of time as a start time and a duration; this makes it easier to check when the hours cross date boundaries
if you're certain that hours of operation will never cross date boundaries (i.e. there will never be an open-all-night sale or 72-hour marathon event et al) then start/end times will suffice
The most flexible solution might be use the bitset approach. There are 168 hours in a week, so there are 672 15-minute periods. That's only 84 bytes worth of space, which should be tolerable.
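A minimal sketch of that bitset, assuming weekday 0 is the first day of the week and that a range is given as start (inclusive) and end (exclusive) within one day. The class and method names are made up for the example:

```python
SLOTS_PER_WEEK = 7 * 24 * 4   # 672 fifteen-minute slots


def slot_index(weekday: int, hour: int, minute: int) -> int:
    """Index of the 15-minute slot containing the given time."""
    return (weekday * 24 + hour) * 4 + minute // 15


class WeeklyHours:
    def __init__(self) -> None:
        self.bits = bytearray(SLOTS_PER_WEEK // 8)   # 84 bytes

    def set_open(self, weekday: int, start: tuple, end: tuple) -> None:
        """Mark every slot from start (inclusive) to end (exclusive) as open."""
        first = slot_index(weekday, *start)
        last = slot_index(weekday, *end)
        for slot in range(first, last):
            self.bits[slot // 8] |= 1 << (slot % 8)

    def is_open(self, weekday: int, hour: int, minute: int) -> bool:
        slot = slot_index(weekday, hour, minute)
        return bool(self.bits[slot // 8] & (1 << (slot % 8)))


hours = WeeklyHours()
hours.set_open(0, (9, 0), (17, 0))
print(len(hours.bits))           # 84
print(hours.is_open(0, 16, 59))  # True
print(hours.is_open(0, 17, 0))   # False
```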
I'd use a table like this:
BusinessID | weekDay | OpenTime | CloseTime
---------------------------------------------
1          | 1       | 9        | 13
1          | 2       | 5        | 18
1          | 3       | 5        | 18
1          | 4       | 5        | 18
1          | 5       | 5        | 18
1          | 6       | 5        | 18
1          | 7       | 5        | 18
Here, we have a business that has regular hours of 5 AM to 6 PM, but shorter hours on Sunday.
A query to check whether it is open would be (pseudo-SQL)
SELECT #isOpen = CAST
(SELECT 1 FROM tblHours
WHERE BusinessId = #id AND weekDay = #Day
AND CONVERT(Currentime to 24 hour) IS BETWEEN(OpenTime,CloseTime)) AS BIT;
If you need to store edge cases, then just have 365 entries, one per day... it's really not that much in the grand scheme of things. Place an index on the day column and businessId column.
Don't forget to store the business's timezone in a separate table (normalize!), and perform a transform between your time and it before making these comparisons.
OK, I'll throw in on this for what it's worth.
I need to handle quite a few things.
Fast / Performant Query
Any increments of time, 9:01 PM, 12:14, etc.
International (?) - not sure if this is an issue even with timezones, at least in my case but someone more versed here feel free to chime in
Open - Close spanning to the next day (open at noon, close at 2:00 AM)
Multiple timespans / day
Ability to override specific days (holidays, whatever)
Ability for overrides to be recurring
Ability to query for any point in time and get businesses open (now, future time, past time)
Ability to easily exclude results of businesses closing soon (filter businesses closing in 30 minutes; you don't want to make your users 'that guy' who shows up 5 minutes before closing in the food/beverage industry)
I like a lot of the approaches presented and I'm borrowing from a few of them. In my website, project, whatever I need to take into consideration I may have millions of businesses and a few of the approaches here don't seem to scale well to me personally.
Here's what I propose for an algorithm and structure.
We have to make some concrete assumptions, across the globe, anywhere, any time:
There are 7 days in a week.
There are 1440 minutes in one day.
There are a finite number of permutations of minutes of open / closed that are possible.
Not concrete but decent assumptions:
Many permutations of open/closed minutes will be shared across businesses reducing total permutations actually stored.
There was a time in my life I could easily calculate the actual possible combinations to this approach but if someone could assist/thinks it would be useful, that would be great.
I propose 3 tables:
Before you stop reading, consider that in the real world 2 of these tables will be small enough to cache neatly. This approach isn't going to be for everyone either, due to the sheer complexity of code required to interpret a UI to the data model and back again if needed. Your mileage and needs may vary. This is an attempt at a reasonable 'enterprise' level solution, whatever that means.
HoursOfOperations Table
ID | OPEN (minute of day) | CLOSE (minute of day)
1 | 540 | 1020 (example: 9 AM - 5 PM)
2 | 545 | 1021 (example: edge-case 9:05 AM - 5:01 PM (weirdos) )
etc.
HoursOfOperations doesn't care about what days, just open and close and uniqueness. There can be only a single entry per open/close combination. Now, depending on your environment either this entire table can be cached or it could be cached for the current hour of the day, etc. At any rate, you shouldn't need to query this table for every operation. Depending on your storage solution I envision every column in this table as indexed for performance. As time progresses, this table likely has an exponentially inverse likelihood of INSERT(s). Really though, dealing with this table should mostly be an in-process operation (RAM).
Business2HoursMap
Note: In my example I'm storing "Day" as a bit-flag field/column. This is largely due to my needs and the advancement of LINQ / Flags Enums in C#. There's nothing stopping you from expanding this to 7 bit fields. Both approaches should be relatively similar in both storage logic and query approach.
Another Note: I'm not entering into a semantics argument on "every table needs a PK ID column", please find another forum for that.
BusinessID | HoursID | Day (or, if you prefer split into: BIT Monday, BIT Tuesday, ...)
1 | 1 | 1111111 (this business is open 9-5 every day of the week)
2 | 2 | 1111110 (this business is open 9:05 - 5:01 M-Sat (Monday = day 1))
The reason this is easy to query is that we can always determine quite easily the MOTD (Minute of the Day) that we're after. If I want to know what's open at 5 PM tomorrow I grab all HoursOfOperations IDS WHERE Close >= 1020. Unless I'm looking for a time range, Open becomes insignificant. If you don't want to show businesses closing in the next half-hour, just adjust your incoming time accordingly (search for 5:30 PM (1050), not 5:00 PM (1020)).
The second query would naturally be 'give me all businesses with HoursID IN (1, 2, 3, 4, 5)', etc. This should probably raise a red flag as there are limitations to this approach. However, if someone can answer the actual permutations question above we may be able to pull the red flag down. Consider we only need the possible permutations on any one side of the equation at one time, either open or close.
Considering we've got our first table cached, that's a quick operation. Second operation is querying this potentially large-row table but we're searching very small (SMALLINT) hopefully indexed columns.
Now, you may be seeing the complexity on the code side of things. I'm targeting mostly bars in my particular project so it's going to be very safe to assume that I will have a considerable number of businesses with hours such as "11:00 AM - 2:00 AM (the next day)". That would indeed be 2 entries into both the HoursOfOperations table as well as the Business2HoursMap table. E.g. a bar that is open from 11:00 AM - 2:00 AM will have 2 references to the HoursOfOperations table 660 - 1440 (11:00 AM - Midnight) and 0 - 120 (Midnight - 2:00 AM). Those references would be reflected into the actual days in the Business2HoursMap table as 2 entries in our simplistic case, 1 entry = all days Hours reference #1, another all days reference #2. Hope that makes sense, it's been a long day.
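The split described above might look roughly like this in application code. This is only a sketch under the assumption that hours are given as minute-of-day values and that a close time at or before the open time means "the next day"; the function name split_at_midnight is made up:

```python
MINUTES_PER_DAY = 1440


def split_at_midnight(open_minute: int, close_minute: int) -> list:
    """Return one or two (open, close) minute-of-day ranges."""
    if close_minute > open_minute:
        # Ordinary same-day hours: a single range is enough.
        return [(open_minute, close_minute)]
    # Hours run past midnight: one range up to midnight...
    ranges = [(open_minute, MINUTES_PER_DAY)]
    # ...and, unless closing is exactly at midnight, one after it.
    if close_minute > 0:
        ranges.append((0, close_minute))
    return ranges


print(split_at_midnight(660, 120))   # [(660, 1440), (0, 120)]
print(split_at_midnight(540, 1020))  # [(540, 1020)]
```

Each returned pair would correspond to one row in the HoursOfOperations table.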
Overriding on special days / holidays / whatever.
Overrides are by nature, date based, not day of week based. I think this is where some of the approaches try to shove the proverbial round peg into a square hole. We need another table.
HoursID | BusinessID | Day | Month | Year
1 | 2 | 1 | 1 | NULL
This can certainly get more complex if you needed something like "on every second Tuesday, this company goes fishing for 4 hours". However, what this will allow us to do quite easily is allow 1 - overrides, 2 - reasonable recurring overrides. E.G. if year IS NULL, then every year on New Years day this weirdo bar is open from 9:00 AM to 5:00 PM keeping in line with our above data examples. I.e. - If year were set, it's only for 2013. If month is null, it's every first day of the month. Again, this won't handle every scheduling scenario by NULL columns alone, but theoretically, you could handle just about anything by relying on a long sequence of absolute dates if needed.
Again, I would cache this table on a rolling day basis. I just can't realistically see the rows for this table in a single-day snapshot being very large, at least for my needs. I would check this table first as it is well, an override and would save a query against the much larger Business2HoursMap table on the storage-side.
Interesting problem. I'm really surprised this is the first time I've really needed to think this through. As always, very keen on different insights, approaches or flaws in my approach.
I think I'd personally go for a start + end time, as it would make everything more flexible. A good question would be: what's the chance that the block size would change at a certain point? Then pick the solution that best fits your situation (if it's liable to change I'd go for the timespans definitely).
You could store them as a timespan, and use segments in your application. That way you have the easy input using blocks, while keeping the flexibility to change in your datastore.
To add to what Johnathan Holland said, I would allow for multiple entries for the same day.
I would also allow for decimal time, or another column for minutes.
Why? Many restaurants, and many other businesses around the world, have lunch and/or afternoon breaks. Also, many restaurants (two that I know of near my house) close at odd times that are not multiples of 15 minutes: one closes at 9:40 PM on Sundays, and one closes at 1:40 AM.
There is also the issue of holiday hours , such as stores closing early on thanksgiving day, for example, so you need to have calendar-based override.
Perhaps what can be done is a date/time open, date-time close, such as this:
businessID | datetime              | type
==========================================
1          | 10/1/2008 10:30:00 AM | 1
1          | 10/1/2008 02:45:00 PM | 0
1          | 10/1/2008 05:15:00 PM | 1
1          | 10/2/2008 02:00:00 AM | 0
1          | 10/2/2008 10:30:00 AM | 1
etc. (type: 1 being open and 0 closed)
And have all the days in the coming one or two years precalculated in advance. Note that you would only have 3 columns (int, date/time, bit), so the data consumption should be minimal.
This will also allow you to modify specific dates for odd hours for special days, as they become known.
It also takes care of crossing over midnight, as well as 12/24 hour conversions.
It is also timezone agnostic. If you store start time and duration, when you calculate the end time, is your machine going to give you the TZ adjusted time? Is that what you want? More code.
as far as querying for open-closed status: query the date-time in question,
select top 1 type from thehours where datetime <= somedatetime and businessID = somebusinessid order by datetime desc
then look at "type". if one, it's open, if 0, it's closed.
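The same "latest event at or before the moment in question" lookup can be sketched in memory with a binary search. The list below reuses the sample rows from the table above for a single business; the function name is_open is made up, and a moment before the first event is assumed to mean closed:

```python
from bisect import bisect_right
from datetime import datetime

# (datetime, type) rows for one business, sorted by datetime; 1 = open, 0 = closed.
events = [
    (datetime(2008, 10, 1, 10, 30), 1),
    (datetime(2008, 10, 1, 14, 45), 0),
    (datetime(2008, 10, 1, 17, 15), 1),
    (datetime(2008, 10, 2, 2, 0), 0),
    (datetime(2008, 10, 2, 10, 30), 1),
]


def is_open(events: list, moment: datetime) -> bool:
    """Look at the most recent event at or before `moment`."""
    times = [when for when, _ in events]
    index = bisect_right(times, moment)
    if index == 0:
        return False   # assumption: nothing recorded yet means closed
    return events[index - 1][1] == 1


print(is_open(events, datetime(2008, 10, 1, 12, 0)))  # True
print(is_open(events, datetime(2008, 10, 1, 15, 0)))  # False
print(is_open(events, datetime(2008, 10, 2, 1, 0)))   # True (past midnight)
```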
PS: I was in retail for 10 years. So I am familiar with the small business crazy-hours problems.
The segment blocks are better, just make sure you give the user an easy way to set them. Click and drag is good.
Any other system (like ranges) is going to be really annoying when you cross the midnight boundary.
As for how you store them, in C++ bitfields would probably be best. In most other languages, an array might be better (lots of wasted space, but would run faster and be easier to comprehend).
I would think a little about those edge-cases right now, because they are going to inform whether you have a base configuration plus overlay or complete static storage of opening times or whatever.
There are so many exceptions - and on a regular basis (like snow days, irregular holidays like Easter, Good Friday), that if this is expected to be a reliable representation of reality (as opposed to a good guess), you'll need to address it pretty soon in the architecture.
How about something like this:
Store Hours Table
Business_id (int)
Start_Time (time)
End_Time (time)
Condition varchar/string
Open bit
'Condition' is a lambda expression (text for a 'where' clause). Build the query dynamically. So for a particular business you select all of the open/close times
Let Query1 = select count(open) from store_hours where #t between start_time and end_time and open = true and business_id = #id and (.. dynamically built expression)
Let Query2 = select count(open) from store_hours where #t between start_time and end_time and open = false and business_id = #id and (.. dynamically built expression)
So in the end you want something like:
select cast(Query1 as bit) & ~cast(Query2 as bit)
If the result of the last query is 1 then the store is open at time t, otherwise it is closed.
Now you just need a friendly interface that can generate your where clauses (lambda expressions) for you.
The only other corner case that I can think of is what happens if a store is open from say 7am to 2am on one date but closes at 11pm on the following date. Your system should be able to handle that as well by smartly splitting up the times between the two days.
There is surely no need to conserve memory here, but perhaps a need for clean and comprehensible code. "Bit twiddling" is not, IMHO, the way to go.
We need a set container here, which holds any number of unique items and can determine quickly and easily whether an item is a member or not. The setup requires care, but in routine use a single line of simply understood code determines if you are open or closed.
Concept:
Assign index number to every 15 min block, starting at, say, midnight sunday.
Initialize:
Insert into a set the index number of every 15 min block when you are open. ( Assuming you are open fewer hours than you are closed. )
Use:
Subtract from interesting time, in minutes, midnight the previous sunday and divide by 15. If this number is present in the set, you are open.
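A short Python sketch of this concept, using the built-in set type. The schedule (open Monday to Friday, 9:00 to 17:00) is a made-up example, and the helper names are assumptions:

```python
from datetime import datetime

BLOCK_MINUTES = 15
MINUTES_PER_DAY = 1440


def block_index(moment: datetime) -> int:
    """Index of the 15-minute block, counted from midnight on Sunday."""
    # Python's weekday() has Monday = 0, so Sunday (6) maps to day 0 here.
    days_since_sunday = (moment.weekday() + 1) % 7
    minutes = days_since_sunday * MINUTES_PER_DAY + moment.hour * 60 + moment.minute
    return minutes // BLOCK_MINUTES


# Initialize: insert the index of every block during which we are open.
open_blocks = set()
for day in range(1, 6):   # hypothetical schedule: Monday (1) to Friday (5)
    start = (day * MINUTES_PER_DAY + 9 * 60) // BLOCK_MINUTES
    end = (day * MINUTES_PER_DAY + 17 * 60) // BLOCK_MINUTES
    open_blocks.update(range(start, end))


def is_open(moment: datetime) -> bool:
    # Use: the single membership test described above.
    return block_index(moment) in open_blocks


print(is_open(datetime(2024, 1, 8, 10, 0)))   # Monday 10:00 -> True
print(is_open(datetime(2024, 1, 7, 10, 0)))   # Sunday 10:00 -> False
```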