I can't find any information about this. Where can I find the oldest date MySQL supports?
For the specific example in your question (year 1200), things will technically work.
In general, however, timestamps are inadvisable for this use.
First, the lower limit is arbitrary: in MySQL it's Jan 1st, 1000. If you are working with 12th-13th century data, things work fine... but if at some point you need to add something older (10th century or earlier), the date will break miserably, and fixing the issue will require reformatting all your historic dates into something more adequate.
Timestamps are normally represented as raw integers with a given "tick interval" and "epoch point", so the number is the count of ticks elapsed from the epoch to the represented date (or vice versa for negative dates). This means that, as with any fixed-width integer data type, the set of representable values is finite. Most timestamp formats I know of sacrifice range in favor of precision, mostly because applications that perform time arithmetic usually need decent precision, while applications that work with historical dates very rarely need to perform serious arithmetic.
In other words, timestamps are meant for precise representation of dates. Second (or even sub-second) precision makes no sense for historical dates: could you tell me, down to the millisecond, when Henry the 8th was crowned King of England?
In the case of MySQL, the format is inherently defined as "4-digit years", so any related optimization can rely on the assumption that the year has 4 digits, or that the entire string has exactly 10 characters ("yyyy-mm-dd"), etc. It's just luck that the date in your title still fits, and even relying on that is dangerous: besides what the DB itself can store, you need to be aware of what the rest of your server stack can handle. For example, if you are using PHP to interact with your database, handling historical dates is very likely to break at some point (in a 32-bit environment, the range for UNIX-style timestamps is December 13, 1901 through January 19, 2038).
In summary: MySQL will properly store any date with a 4-digit year within its supported range (1000 through 9999), but in general using timestamps for historical dates is almost guaranteed to cause issues and headaches. I strongly advise against such usage.
Hope this helps.
Edit/addition:
Thank you for this very interesting answer. Should I create my own algorithm for historical dates, or choose another DB, and if so which one? – user284523
I don't think any DB has much support for this kind of date: applications dealing with them usually make do with a string/text representation. Actually, for dates in year 1 and later, a textual representation will even yield correct sorting/comparisons, as long as the components are zero-padded and written from most to least significant (y, m, d order). Comparisons will break, however, if "negative" dates are also involved: they still compare as earlier than any positive one, but comparing two negative dates yields a reversed result.
If you only need Year 1 and later dates, or if you don't need sorting, then you can make your life a lot easier by using strings.
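To make the sorting point concrete, here is a small Java sketch (Java because it is the language used elsewhere on this page). It sorts zero-padded "yyyy-mm-dd" strings in plain lexicographic order, which is roughly what a binary-collated text column does. Writing BCE years with a leading '-' is an assumption for illustration, and the class and method names are hypothetical.

```java
import java.util.Arrays;

public class StringDateSort {
    // Sorts date strings the way a text column would: character by character (String.compareTo).
    // Assumed format: zero-padded "yyyy-mm-dd", BCE years prefixed with '-'.
    public static String[] sortAsText(String... dates) {
        String[] copy = dates.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        String[] sorted = sortAsText("1066-10-14", "-0044-03-15", "0800-12-25", "-0490-09-12");
        System.out.println(Arrays.toString(sorted));
    }
}
```

This prints [-0044-03-15, -0490-09-12, 0800-12-25, 1066-10-14]: the CE dates come out in chronological order and both BCE dates sort first, but 44 BCE lands before 490 BCE, which is the reversal described above.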
Otherwise, the best approach is to use some kind of number and define your own "tick interval" and "epoch point". A good interval could be days (unless you really need more precision, and even then you can rely on "real" (floating-point) numbers instead of integers), and a reasonable epoch could be Jan 1, 1. The main problem will be converting these values to their text representation and vice versa. Keep the following details in mind:
Leap years have one extra day.
The rule for leap years was "any multiple of 4" until 1582, when it changed from the Julian to the Gregorian calendar and became "multiple of 4 except those that are multiples of 100 unless they are also multiples of 400".
The last day of the Julian calendar was Oct 4th, 1582. The next day, the first of the Gregorian calendar, was Oct 15th, 1582: 10 days were skipped to realign the calendar with the seasons.
As stated in the comments, the two rules above vary by country: the Papal States and some Catholic countries adopted the new calendar on the stated dates, but many other countries took longer to do so (the last being Turkey in 1926). This means that any date between the papal bull of 1582 and the last adoption in 1926 is ambiguous without geographical context, and even more complex to process.
There is no "year 0": the year before year 1 was year -1, or year 1 BCE.
All of this requires fairly elaborate parser and formatter functions, but beyond the many special cases there isn't really much complexity (it'd be tedious to code, but quite straightforward). Using numbers as the underlying representation ensures correct sorting/comparison for any pair of values.
Knowing this, it's now your choice to take the approach that best fits your needs.
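A minimal sketch of the day-number approach, assuming the epoch suggested above (Jan 1, 1, in the Julian calendar) and the 1582 switch used by the Papal States, i.e. ignoring the country-by-country adoption dates. It uses the standard integer formulas for converting a calendar date to a Julian Day Number (one variant per calendar) and then subtracts the epoch's day number. Years are given BCE/CE style with no year 0 (-1 means 1 BCE); the class and method names are hypothetical, and month/day values are not validated beyond the 1582 gap.

```java
public class HistoricDays {
    // Julian Day Number of 1 January 1 CE (Julian calendar); used here as day 0.
    private static final long EPOCH_JDN = 1721424L;

    /** Days since 1 Jan 1 CE. year uses BCE/CE numbering with no year 0 (-1 = 1 BCE). */
    public static long dayNumber(int year, int month, int day) {
        if (year == 0) throw new IllegalArgumentException("there is no year 0");
        if (year == 1582 && month == 10 && day > 4 && day < 15)
            throw new IllegalArgumentException("date skipped by the Gregorian reform");
        // Astronomical numbering for the arithmetic: 1 BCE -> 0, 2 BCE -> -1, ...
        long y0 = year < 0 ? year + 1 : year;
        if (y0 < -4712) throw new IllegalArgumentException("this sketch only handles 4713 BCE onward");

        // Assumption: Gregorian from 15 Oct 1582 on, Julian before that.
        boolean gregorian = year > 1582 || (year == 1582 && (month > 10 || (month == 10 && day >= 15)));

        // Shift the year so it starts in March; the leap day then falls at the end of the year.
        long a = (14 - month) / 12;      // 1 for January/February, 0 otherwise
        long y = y0 + 4800 - a;          // always >= 0 here, so integer division truncates safely
        long m = month + 12 * a - 3;     // March = 0 ... February = 11
        long jdn = day + (153 * m + 2) / 5 + 365 * y + y / 4;
        jdn += gregorian ? (y / 400 - y / 100 - 32045) : -32083;
        return jdn - EPOCH_JDN;
    }

    public static void main(String[] args) {
        System.out.println(dayNumber(2000, 1, 1));                              // 730121
        System.out.println(dayNumber(1582, 10, 15) - dayNumber(1582, 10, 4));  // 1
    }
}
```

Because the result is a plain integer day count, it can be stored in a BIGINT column and sorted or compared directly. Formatting back to text would need the inverse conversion, which this sketch leaves out.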
From the documentation:
DATE
A date. The supported range is '1000-01-01' to '9999-12-31'.
Yes. MySQL dates start in year 1000.
For whatever it's worth, I found that the MySQL DATE field does support dates < 1000 in practice, though the documentation says otherwise. E.g., I was able to enter 325 and it stores as 0325-00-00. A search WHERE table.date < 1000 also gave correct results.
But I am hesitant to rely on the < 1000 dates when they are not officially supported, plus I sometimes need BCE years with more than 4 digits anyway (e.g. 10000 BCE). So separate INT fields for year, month and day (as suggested above) do seem the only choice.
I do wish the DATE type (or perhaps a new HISTDATE type) supported a full range of historical dates - it would be nice to combine three fields into one and simply sort by date instead of having to sort by year, month, day.
Use SMALLINT for the year, so it will accept values from -32768 (BC) to 32767 (AD).
For months and days, use TINYINT UNSIGNED.
Most historical events don't have known months and days, so you could query like this:
SELECT events FROM history WHERE year = -4990
Result: "Noah's Ark"
Or: SELECT events FROM history WHERE year = 570 AND month = 4 AND day = 20
Result: "Muhammad pbuh was born"
Depending on requirements, you could also add a DATETIME column that is NULL for dates before 1000, and leave the split year/month/day columns NULL for dates that fit in the DATETIME (thus saving some bytes).
This is an important and interesting problem which has another solution.
Instead of relying on the database platform to support a potentially infinite number of dates with millisecond precision, rely on an object-oriented programming language compiler and runtime to correctly handle date and time arithmetic.
It is possible to do this using the Java Virtual Machine (JVM), where time is measured in milliseconds relative to midnight, January 1, 1970 UTC (Epoch), by persisting the required value as a long in the database (including negative values), and performing the required conversion/calculation in the component layer after retrieval.
For example:
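import java.text.*;
import java.util.*;
Date d = new Date(Long.MIN_VALUE);
DateFormat df = new SimpleDateFormat("EEE, d MMM yyyy G HH:mm:ss Z", Locale.US);
df.setTimeZone(TimeZone.getTimeZone("UTC")); // otherwise the JVM's default time zone is used
System.out.println(df.format(d));
With the time zone pinned to UTC, this shows: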
Sun, 2 Dec 292269055 BC 16:47:04 +0000
This also enables independence of database versions and platforms as it abstracts all date and time arithmetic to the JVM runtime, i.e. changes in database versions and platforms will be much less likely to require re-implementation, if at all.
I had a similar problem and wanted to keep relying on date fields in the DB, so I could use date range searches with day-level accuracy for historic values.
(My DB includes date of birth and dates of roman emperors...)
The solution was to add a constant year (example: 3000) to all the dates before adding them to the DB and subtracting the same number before displaying the query results to the users.
If your DB already has some date values in it, remember to update the existing values with the same constant.
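One detail worth checking with this trick: MySQL validates dates with proleptic Gregorian leap-year rules, and shifting by 3000 years can change whether a year is a leap year, so a 29 February may not exist in the shifted year. The Java sketch below (class and method names are hypothetical; java.time's LocalDate stands in for the database's date handling) round-trips a leap day through a shift and back. An offset that is a multiple of 400 keeps the Gregorian leap-year pattern, and the weekday, unchanged, since 400 Gregorian years are exactly 146097 days (20871 weeks).

```java
import java.time.LocalDate;

public class YearOffset {
    // Shift into the storable range, then shift back as if reading the row again.
    public static LocalDate roundTrip(LocalDate original, int offsetYears) {
        // plusYears clamps an invalid 29 February to 28 February instead of failing.
        LocalDate stored = original.plusYears(offsetYears);
        return stored.minusYears(offsetYears);
    }

    public static void main(String[] args) {
        LocalDate leapDay = LocalDate.of(2000, 2, 29);
        System.out.println(roundTrip(leapDay, 3000)); // 2000-02-28: year 5000 is not a leap year
        System.out.println(roundTrip(leapDay, 3200)); // 2000-02-29: year 5200 is a leap year
    }
}
```

java.time silently moves the leap day to the 28th; a database in strict mode would more likely reject the shifted date outright. Either way, a multiple of 400 avoids the problem.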
Related
I would like to know whether the data in the columns below is in Daylight Saving Time or not.
Column names: Start date, End date
As stated, your question fails to acknowledge that there are a lot of different definitions of Daylight Saving Time. I wish I could find the article: there was a great interview with the original maintainers of the DST tables about how they were actually making guesses for many parts of the world.
The timezone package in Debian (tzdata) and other distributions defines many of these rules as accurately as the information from the relevant authorities allows.
My advice: convert all dates to a known timezone using authoritative tools (UTC is the typical choice; your local time zone is a second option), and do all your calculations based on that, without trying to understand the 120+ different timezone definitions.
If you really want to know whether a given date falls in DST, I would convert the date to a given timezone, format it with a DST-aware abbreviation (e.g. MDT/MST for Mountain Daylight Time or Mountain Standard Time) and look for the MDT/MST suffix to determine daylight-ness. You'll have to decide which locale's daylight-ness you want, however, as the cutoff day varies from locale to locale.
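If you can work in Java (the language used elsewhere on this page), the time zone rules can answer the question directly instead of parsing an MDT/MST suffix. This is a swapped-in technique, not the one described above: ZoneRules.isDaylightSavings checks an instant against a named zone's rules. The zone name and the assumption that your column values are local times recorded in that zone are placeholders; the class name is hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class DstCheck {
    // For values already stored as UTC instants.
    public static boolean inDst(Instant instant, String zoneId) {
        return ZoneId.of(zoneId).getRules().isDaylightSavings(instant);
    }

    // For values stored as local wall-clock times, assuming you know the zone they were recorded in.
    // atZone resolves ambiguous (overlap) or missing (gap) local times using java.time's defaults.
    public static boolean inDst(LocalDateTime local, String zoneId) {
        ZoneId zone = ZoneId.of(zoneId);
        return zone.getRules().isDaylightSavings(local.atZone(zone).toInstant());
    }

    public static void main(String[] args) {
        System.out.println(inDst(Instant.parse("2015-07-01T12:00:00Z"), "America/Denver"));      // true (MDT)
        System.out.println(inDst(LocalDateTime.parse("2015-01-15T12:00:00"), "America/Denver")); // false (MST)
    }
}
```

As with the suffix approach, the answer is only as good as the installed tz data, and you still have to pick the zone whose rules you care about.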
I'm thinking about building a project with a database of a large number of objects / people / animals / buildings, etc.
The application would let the user select two candidates and see which came first. The comparison would be made by date, of course.
MySQL only allows dates from 01/01/1000 onward.
If a user were to compare which came first, Michael Jackson or Freddie Mercury, the answer would be easy since both came after that year.
But if they were to compare which came first, Tyrannosaurus rex or the dog, both came before the earliest supported date.
How could I make those comparisons considering the SQL limit?
I didn't do anything yet, but this is something I'd like to know before I start doing something that will never work.
THIS IS NOT A DUPLICATE OF OTHER QUESTIONS ABOUT OLD DATES.
In other questions, people are asking about how to store. It would be extremely easy, just make a string out of it. But in my case, I'd need to compare such dates, which they didn't ask for, yet.
I could store the dates as strings, using A for after and B for before, as people answered in other questions. There would be no problem with that. But how could I compare those dates? Which part of the string would I need to split?
You could take a signed BIGINT field and use it as a UNIX timestamp.
A UNIX timestamp is the number of seconds that passed since January 1, 1970, at 0:00 UTC.
Any point in time before 1970 would simply be a negative timestamp.
If my amateurish calculation is correct, a BIGINT would be enough to take you 292471208678 years into the past (from 1970) and the same number of years into the future. That ought to be enough for pretty much anything.
That would make dates very easy to compare - you'd simply have to see whether one date is bigger than the other.
You'd have to do the conversion from calendar date to timestamp outside MySQL, though.
Depending on what platform you are using there may be a date library to help you with the task.
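On the JVM, java.time can do that conversion. A minimal sketch, assuming each date is taken at midnight UTC: LocalDate supports years from -999,999,999 to 999,999,999, but it uses the proleptic Gregorian calendar and astronomical year numbering (year 0 is 1 BCE, -43 is 44 BCE), so Julian-calendar dates from historical sources would need converting first. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

public class EpochSeconds {
    // Seconds since 1970-01-01T00:00Z for midnight UTC of the given ISO (proleptic Gregorian) date.
    public static long toEpochSecond(int isoYear, int month, int day) {
        return LocalDate.of(isoYear, month, day).atStartOfDay(ZoneOffset.UTC).toEpochSecond();
    }

    public static void main(String[] args) {
        long bce44 = toEpochSecond(-43, 3, 15);       // 15 March 44 BCE, proleptic Gregorian
        long dino = toEpochSecond(-66_000_000, 1, 1); // roughly 66 million years ago
        System.out.println(dino < bce44);             // true: an ordinary signed comparison
    }
}
```

The resulting long fits a signed BIGINT column, so ORDER BY or < on that column gives chronological order.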
Why deal with static age at time of entry and offset?
User is going to want to see a date as a date anyway
Complex data entry
Three fields
year smallint (good back to 32,768 BC, stored as -32768)
month tinyint
day tinyint
if ((y1*10000 + m1*100 + d1) > (y2*10000 + m2*100 + d2)) // date 1 is later than date 2
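The comparison above can be wrapped in a single sort key. A minimal Java sketch, assuming an unknown month or day is stored as 0 (so it sorts at the start of its year or month); the class and method names are hypothetical. Because month * 100 + day never exceeds 1231, the key stays in chronological order even for negative years.

```java
public class SortKey {
    // Combines year, month and day into one comparable number; 0 = unknown month/day (assumption).
    public static long key(int year, int month, int day) {
        return year * 10000L + month * 100L + day;
    }

    // Same test as the if-condition above: true when date 1 is later than date 2.
    public static boolean isLater(int y1, int m1, int d1, int y2, int m2, int d2) {
        return key(y1, m1, d1) > key(y2, m2, d2);
    }

    public static void main(String[] args) {
        System.out.println(isLater(570, 4, 20, -4990, 0, 0)); // true
        System.out.println(isLater(-5, 12, 31, -4, 1, 1));    // false: 5 BC comes before 4 BC
    }
}
```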
OK I had an idea.
Store the age in days, since the hours/seconds are irrelevant for this case.
Christ's age in days: -2015 * 365.
Dog's age in days: -40000 * 365.
In order to make precise calculations, I'd only need an extra field with the date on which I added the values. Then I'd subtract from the "age in days" the number of days between the day I added the record and the day the user makes the comparison.
For example:
Dog's age was added on 29/12/2015 and the age in days is -40000 * 365.
User is making a comparison on day 29/01/2016.
The difference in days between the two dates is 31 days.
So dog's age in days should be -40000 * 365 - 31.
Using a signed BIGINT can do the trick (it has to be signed, since the values are negative).
Thanks to Pekka for suggesting using negative numbers for any date before the current date.
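The arithmetic in this example can be checked with java.time (a sketch; the class and method names are hypothetical). It also shows that the extra "date added" field isn't strictly needed: the entry date's day number plus the stored age is a constant, which is just an absolute day count in the spirit of the signed-BIGINT answer above.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class AgeInDays {
    // Age as defined above: negative days, relative to the date the row was added.
    public static long ageOn(long storedAgeDays, LocalDate addedOn, LocalDate comparisonDay) {
        // One day more negative for every day elapsed since the row was added.
        return storedAgeDays - ChronoUnit.DAYS.between(addedOn, comparisonDay);
    }

    // The same information as a fixed day number (days since 1970-01-01); it never needs adjusting.
    public static long absoluteDay(long storedAgeDays, LocalDate addedOn) {
        return addedOn.toEpochDay() + storedAgeDays;
    }

    public static void main(String[] args) {
        LocalDate added = LocalDate.of(2015, 12, 29);
        LocalDate today = LocalDate.of(2016, 1, 29);
        System.out.println(ageOn(-40000L * 365, added, today)); // -14600031, i.e. -40000 * 365 - 31
    }
}
```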
In order to analyze dates and times I am creating a MySQL table where I want to keep the time information. Some example analyses will be stuff like:
Items per day/week/month/year
Items per weekday
Items per hour
etc.
Now, with regard to performance, how should I store this in my table:
date type: Unix timestamp?
date type: datetime?
or keep each date component in its own column, e.g. year, month, day in separate fields?
The last one, for example, would be handy if I'm analysing by weekday; I wouldn't have to perform WEEKDAY(item.date) on MySQL but could simply use WHERE item.weekday = :w.
Based on your usage, you want to use the native datetime format. Unix formats are most useful when the major operations are (1) ordering; (2) taking differences in seconds/minutes/hours/days; and (3) adding seconds/minutes/hours/days. They need to be converted to internal date time formats to get the month or week day, for instance.
You also have a potential indexing issue. If you want to select ranges of days, hours, months and so on for your results, then you want an index on the column. For this purpose an index on a datetime is probably sufficient.
If the summaries are by hour, you might find it helpful to store the date component in a date field and the hour in a separate column. That would be particularly helpful if you are combining hours from different days.
Whether you break out other components of the date, such as weekday and month, for indexing purposes would depend on the volume of data in the table, performance requirements, and the queries you are planning on running. I would not be inclined to do this, except as a later optimization.
The rule of thumb is: store things as they should be stored, and don't do performance tweaks until you hit a bottleneck. If you store your date as separate fields, you'll eventually run into a situation where you need the date as a whole inside your database (e.g. an update query for a particular range of time), and that will be painful: a condition covering 3 April 2015 to 15 May 2015 becomes enormous.
You should keep your dates as a date type. This will give you maximum flexibility and (most probably) query readability, and will keep all your options open for working with them. The only thing I can really recommend is also storing the same date split into year/month/day in additional columns; of course, this will bloat your database and require extreme caution in update scenarios, but it will allow you to use any form of the source data in your queries.
I have some data on fatalities which I'm trying to store, and I'm trying to come up with a reasonable scheme for storing the age of the person when they died.
I don't have DoB data for any of them, but I do have date of death generally (although not always very precisely) and I have data of varying accuracy for their age at death.
Some typical source data might be:
between 20 and 29 years old (or "in their 20s")
5 years old
2 months old
40 days old
adult
child
elderly
I have typically been storing this in three fields...
age_min (integer years)
age_max (integer years)
age_category (enum - baby, child, adult, elderly)
...but clearly this doesn't capture the 2 months old or 40 days old very well, both of which would simply end up as 0 years in my current schema, which is needlessly throwing away information.
It is very important that the database is honest about the precision to which information is known. So converting 2 months into 60 days, for example, would be a bad thing, because it implies a level of precision the source data didn't provide - converting it into 60-90 days might be ok.
I also considered adding a units field so I'd have...
age_min (integer)
age_min_unit (enum - days, months, years)
but the problem with this is it makes comparisons annoying. 24 months == 2 years, but dealing with that just makes a lot of code much more complex than I suspect it needs to be.
I could store all ages in days, with a min and a max, but then the complexity becomes converting that back into something human readable which isn't clunky and doesn't express a greater degree of precision than I actually have.
So, for example, 40 days might end up being rendered as 1 month, 10 days, which is actually a little less precise than saying 40 days.
OK, just adding this as an answer for the future:
Could you store age_min and age_max in days and also carry one more field, "human_readable_age_text", which reads, say, "40 days"?
Been there, done that. The least ambiguous and easiest to process is to convert everything to days and add a +/- tolerance. That way everything can be stored in 2 fields and all situations are covered. Obviously you have to convert to human readable format before display.
If you have date of birth and date of death the tolerance becomes 0.
Thus the following input values will yield the indicated stored values.
5 years: 2007 183 (i.e. 5.5 × 365 ≈ 2007 days; 365/2 ≈ ±183 days)
2 months: 75 15
9 years 7 months: 3512 15
child: The first value is the midpoint of your preferred "child" age range in days (1-12? 3-18?). The tolerance is half the width of that range.
baby: Same again. Decide on what constitutes a "baby" (0-2?) and generate the values accordingly.
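A minimal Java sketch of the midpoint/tolerance scheme, assuming 365-day years (as in the examples above) and 30-day months; the class, its fields, and the definitelyOlderThan helper are hypothetical additions, not part of the answer. Comparing the range edges rather than the midpoints keeps the comparison honest about precision: two estimates whose ranges overlap are not ordered.

```java
public final class AgeEstimate {
    public final long midpointDays;
    public final long toleranceDays;

    private AgeEstimate(long midpointDays, long toleranceDays) {
        this.midpointDays = midpointDays;
        this.toleranceDays = toleranceDays;
    }

    // From a [minDays, maxDays] range: integer midpoint, tolerance = ceil((max - min) / 2).
    public static AgeEstimate fromRange(long minDays, long maxDays) {
        return new AgeEstimate((minDays + maxDays) / 2, (maxDays - minDays + 1) / 2);
    }

    // "N years old" means somewhere between N and N+1 years; 365-day years assumed.
    public static AgeEstimate years(int n) {
        return fromRange(n * 365L, (n + 1) * 365L);
    }

    // "N months old"; 30-day months are an assumption of this sketch.
    public static AgeEstimate months(int n) {
        return fromRange(n * 30L, (n + 1) * 30L);
    }

    // True only when the ranges cannot overlap, i.e. the data really supports "older".
    public boolean definitelyOlderThan(AgeEstimate other) {
        return midpointDays - toleranceDays > other.midpointDays + other.toleranceDays;
    }
}
```

AgeEstimate.years(5) gives 2007 ± 183 and AgeEstimate.months(2) gives 75 ± 15, matching the first two rows above.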
Store the value as min+max+unit. 'adult','child'... etc can be represented as a unit of age for which the min and max would be ignored.
Then you need to find the answer to philosophical questions like "Who is older: a child or a person between 5 and 12 years old?".
When you have the answer to those for all of the possible combinations of age types, you will be able to tell whether it's possible to use a canonical representation of the age (e.g. days) for comparing.
If it's possible, you can add an additional field with the age in days (or seconds, or something else) to use for comparing/sorting. The comparison field can be calculated with a trigger, or in the app.
If it's not possible, you will need a custom comparator for sorting; AFAIK that can't be done in MySQL, so you will probably have to do all sorting and comparing in the app.
This is a HARD question. In fact it is so hard it seems the SQL standard and most of the major databases out there don't have a clue in their implementation.
Converting all datetimes to UTC allows for easy comparison between records but throws away the timezone information, which means you can't do calculations with them (e.g. add 8 months to a stored datetime) nor retrieve them in the time zone they were stored in. So the naive approach is out.
Storing the timezone offset from UTC in addition to the timestamp (e.g. timestamp with time zone in postgres) would seem to be enough, but different timezones can have the same offset at one point in the year and a different one 6 months later due to DST. For example you could have New York and Chile both at UTC-4 now (August) but after the 4th of November New York will be UTC-5 and Chile (after the 2nd of September) will be UTC-3. So storing just the offset will not allow you to do accurate calculations either. Like the above naive approach it also discards information.
What if you store the timezone identifier (e.g. America/Santiago) with the timestamp instead? This would allow you to distinguish between a Chilean datetime and a New York datetime. But this still isn't enough. If you are storing an expiration date, say midnight 6 months into the future, and the DST rules change (as unfortunately politicians like to do) then your timestamp will be wrong and expiration could happen at 11 pm or 1 am instead. Which might or might not be a big deal to your application. So using a timestamp also discards information.
It seems that to truly be accurate you need to store the local datetime (e.g. using a non timezone aware timestamp type) with the timezone identifier. To support faster comparisons you could cache the utc version of it until the timezone db you use is updated, and then update the cached value if it has changed. So that would be 2 naive timestamp types plus a timezone identifier and some kind of external cron job that checks if the timezone db has changed and runs the appropriate update queries for the cached timestamp.
Is that an accurate solution? Or am I still missing something? Could it be done better?
I'm interested in solutions for MySQL, SQL Server, Oracle, PostgreSQL and other DBMS that handle TIMESTAMP WITH TIME ZONE.
You've summarized the problem well. Sadly the answer is to do what you've described.
The correct format to use depends on the pragmatics of what the timestamp is supposed to represent. It can in general be divided between past and future events (though there are exceptions):
Past events can and usually should be stored as something which can never be reinterpreted differently (e.g. a UTC timestamp with a numeric time zone offset). If the named time zone should be kept (to be informative to the user), then it should be stored separately.
Future events need the solution you've described. Local timestamp and named time zone. This is because you want to change the "actual" (UTC) time of that event when the time zone rules change.
I would question whether time zone conversion is much of an overhead; it's usually pretty quick. I'd only go through the pain of caching if you are seeing a really significant performance hit. There are (as you pointed out) some big operations which will require caching, such as sorting billions of rows based on the actual (UTC) time.
If you require future events to be cached in UTC for performance reasons, then yes, you need to put a process in place to update the cached values. Depending on the type of DB, this could possibly be done by the sysadmins, as TZ rules change rarely.
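A minimal Java sketch of the "local time + zone identifier + cached UTC" layout for future events; the class and method names are hypothetical. refresh() is what the periodic job described in the question would call for each row after a tz data update. One assumption to flag: a JVM typically loads its time zone rules at startup, so the job would run in a process started after the tz data was updated.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class FutureEvent {
    private final LocalDateTime localTime; // what the user asked for, e.g. midnight on a date
    private final ZoneId zone;             // a named zone such as America/Santiago, not an offset
    private Instant cachedUtc;             // derived value, used for fast sorting/comparison

    public FutureEvent(LocalDateTime localTime, ZoneId zone) {
        this.localTime = localTime;
        this.zone = zone;
        this.cachedUtc = resolve();
    }

    // Applies the currently loaded zone rules; gaps and overlaps use java.time's default resolution.
    private Instant resolve() {
        return localTime.atZone(zone).toInstant();
    }

    /** Recomputes the cached UTC value; returns true if new rules moved it. */
    public boolean refresh() {
        Instant fresh = resolve();
        boolean changed = !fresh.equals(cachedUtc);
        cachedUtc = fresh;
        return changed;
    }

    public Instant utc() {
        return cachedUtc;
    }
}
```

Only the rows whose refresh() returns true would need an UPDATE of the cached column.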
If you care about the offset, you should store the actual offset. Storing the timezone identifier is not the same thing, as timezones can, and do, change over time. By storing the timezone offset, you can calculate the correct local time at the time of the event, rather than the local time based on the current offset. You may still want to store the timezone identifier, if it's important to know which timezone the event was considered to have happened in.
Remember, time is a physical attribute, but a timezone is a political one.
If you convert to UTC you can order and compare the records
If you add the name of the timezone it originated from, you can represent it in its original tz and add/subtract time periods like weeks, months etc. (instead of elapsed time).
In your question you state that this is not enough because DST might be changed. DST makes calculating with dates (other than elapsed time) complicated and quite code intensive. Just like you need code to deal with leap years, you need to take into account whether for a given date/period you need to apply a DST correction or not. For some years the answer will be yes, for others no.
See this wiki page for how complex those rules have become.
Storing the offset is basically storing the result of those calculations. That calculated offset is only valid for that given point in time and can't be applied as is to later or earlier points like you suggest in your question. You do the calculation on the UTC time and then convert the resulting time to the required timezone based on the rules that are active at that time in that timezone.
Note that DST was hardly used anywhere before the First World War, and date/time systems in databases handle those cases perfectly.
I'm interested in solutions for MySQL, SQL Server, Oracle, PostgreSQL and other DBMS that handle TIMESTAMP WITH TIME ZONE.
Oracle converts the instant in time to UTC but keeps the time zone or UTC offset, depending on what you pass. Oracle (correctly) distinguishes between a time zone and a UTC offset and returns what you passed in. This only costs two additional bytes.
Oracle does all calculations on TIMESTAMP WITH TIME ZONE in UTC. This does not make a difference when adding months, but it does when adding days, since UTC has no daylight saving time. Note that the result of a calculation must always be a valid timestamp, e.g. adding one month to January 31st will throw an exception in Oracle, as February 31st does not exist.