I see a lot of discussion about getting dates that are pre-1970. For example, I see people ask a question like, "how do I get a date before 1970?"
What I'd like to know is what is so special about 1970? Why do people have trouble getting dates before that particular year? Was it the beginning of the universe or something?
It is the beginning of the UNIX epoch, timestamp 0. All UNIX timestamps are the number of seconds since January 1st 1970 UTC. The moment of this writing is timestamp 1298440626.
UNIX timestamps pop up in the datetime libraries of a lot of languages and software, as storing times as a number of seconds is convenient for various reasons.
Since 1970 is time 0, dates before then require negative timestamps, which many systems and libraries (and unsigned integer columns) can't store.
It has to do with UNIX times. They're stored as the number of seconds since the epoch, and the epoch is defined as the start of the day January 1, 1970 (UTC).
That's also the cause of the upcoming Y2K38 bug: a signed 32-bit timestamp reaches its maximum value (2147483647) on January 19, 2038 at 03:14:07 UTC and rolls over to a negative value one second later. Unless the value is widened beyond a signed 32-bit integer, of course.
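A small Java sketch of the points above, using java.time (the class and method names are made up for illustration): timestamp 0 is the epoch, a pre-1970 instant is simply a negative number, and forcing a value into a signed 32-bit field wraps it around after 2147483647.

```java
import java.time.Instant;

class EpochBasics {
    // Converts a Unix timestamp (seconds since 1970-01-01T00:00:00Z) to an ISO-8601 UTC string.
    static String toUtc(long unixSeconds) {
        return Instant.ofEpochSecond(unixSeconds).toString();
    }

    // Simulates storing a timestamp in a signed 32-bit field (two's-complement wrap-around).
    static int truncateTo32Bits(long unixSeconds) {
        return (int) unixSeconds;
    }

    public static void main(String[] args) {
        System.out.println(toUtc(0));                 // 1970-01-01T00:00:00Z
        System.out.println(toUtc(-1));                // 1969-12-31T23:59:59Z (pre-1970 is just negative)
        System.out.println(toUtc(Integer.MAX_VALUE)); // 2038-01-19T03:14:07Z (last signed 32-bit second)

        // One second later no longer fits in 32 bits: it wraps to -2^31, i.e. back to 1901.
        int wrapped = truncateTo32Bits(Integer.MAX_VALUE + 1L);
        System.out.println(wrapped + " -> " + toUtc(wrapped)); // -2147483648 -> 1901-12-13T20:45:52Z
    }
}
```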
It was the beginning of the UNIX era.
Which type should I use to store the current date + time in UTC?
And how can I then convert the UTC date to a specific time zone?
Right now I use the TIMESTAMP type and CURRENT_TIMESTAMP.
It stores data like: 2019-08-19 20:44:11
But the minutes differ from the real UTC time, and I don't know why.
My server time is local. It is correct under Windows Server.
It is up to you to decide the best way to solve the time zone problem when users and the server have different locales.
No matter the case or the app (mobile, web, etc.), the problem is the same. You should find the best and easiest way to handle time zones in your case.
Here are a few options you can use:
MySQL
Using the MySQL Date and Time Types, you can create table fields that will hold your date and time values.
"The date and time types for representing temporal values are DATE, TIME, DATETIME, TIMESTAMP, and YEAR. Each temporal type has a range of valid values, as well as a “zero” value that may be used when you specify an invalid value that MySQL cannot represent. The TIMESTAMP type has special automatic updating behavior, described later."
Regarding the MySQL Data Type Storage Requirements, read the documentation and make sure you satisfy the table storage engine and type requirements in your project.
You can set the session time zone in MySQL with:
SET time_zone = '+8:00';
To me this is a bit more work to handle, but the data is fully loaded, managed and updated by MySQL. No PHP here!
Using MySQL might seem like a better idea (that's what I'd like to think), but there's a lot more to it.
To be able to choose, you will have to make an educated decision. There's a lot to cover with regard to using MySQL. Here's a practical article that goes down the rabbit hole of using MySQL to manage date, time and time zone.
Since you didn't specify how you interface the database, here's a PHP example and functions to handle the date, time and time zones.
PHP
1. Save date, time and time zone
E.g. Chicago (USA - Illinois): UTC offset -6 hours in standard time (CST), -5 hours during daylight saving time (CDT)
You can save the date time
2015-11-01 00:00:00
and the time zone
America/Chicago
You will have to work out DST transitions and months having different numbers of days.
Here's a reference to PHP's DateTime class for working out any time zone and DST differences:
DateTime Arithmetic
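The answer relies on PHP's DateTime here; as an analogous sketch in Java (java.time, not PHP, with a hypothetical load() helper standing in for reading the two stored columns), this stores the local value and zone ID from the example and lets the library apply Chicago's DST rules. 2015-11-01 happens to be the day DST ended in the US, so "one calendar day later" and "24 hours later" differ.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

class ChicagoDst {
    // Hypothetical helper: rebuild a zoned value from the stored local date-time and zone ID.
    static ZonedDateTime load(String storedLocal, String storedZone) {
        return LocalDateTime.parse(storedLocal).atZone(ZoneId.of(storedZone));
    }

    public static void main(String[] args) {
        ZonedDateTime start = load("2015-11-01T00:00:00", "America/Chicago");
        System.out.println(start); // 2015-11-01T00:00-05:00[America/Chicago] (still CDT)

        // Clocks fall back at 02:00 that morning, so the next calendar midnight is 25 hours away...
        ZonedDateTime nextDay = start.plusDays(1);
        System.out.println(nextDay);                                     // 2015-11-02T00:00-06:00[America/Chicago]
        System.out.println(Duration.between(start, nextDay).toHours()); // 25

        // ...while 24 elapsed hours end at 23:00 on the same calendar day.
        System.out.println(start.plusHours(24)); // 2015-11-01T23:00-06:00[America/Chicago]
    }
}
```

If only a fixed offset such as -05:00 had been stored instead of the zone ID, both calculations would silently ignore the DST change.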
2. Unix Timestamp and Time Zone
Before we go into the details of this option we should be aware of the following:
The unix time stamp is a way to track time as a running total of seconds. This count starts at the Unix Epoch on January 1st, 1970 at UTC. Therefore, the unix time stamp is merely the number of seconds between a particular date and the Unix Epoch. It should also be pointed out (thanks to the comments from visitors to this site) that this point in time technically does not change no matter where you are located on the globe. This is very useful to computer systems for tracking and sorting dated information in dynamic and distributed applications both online and client side.
What happens on January 19, 2038?
On this date the Unix Time Stamp will cease to work due to a 32-bit overflow. Before this moment millions of applications will need to either adopt a new convention for time stamps or be migrated to 64-bit systems which will buy the time stamp a "bit" more time.
Here's how the timestamp works:
08/19/2019 at 8:59:40 PM (UTC) translates to 1566248380 seconds since Jan 01 1970 (UTC).
Using the PHP date() function (with the default time zone set to UTC), you can format it however you like:
echo date('l jS \of F Y h:i:s A', 1566248380);
Monday 19th of August 2019 08:59:40 PM
or in MySQL (the result is shown in the session time zone, which is UTC here):
SELECT from_unixtime(2147483647);
+--------------------------------------+
| from_unixtime(2147483647) |
+--------------------------------------+
| 2038-01-19 03:14:07 |
+--------------------------------------+
More example formats that you can convert to:
08/19/2019 at 8:59:40 PM (UTC)
2019-08-19T20:59:40+00:00 in ISO 8601
Mon, 19 Aug 2019 20:59:40 +0000 in RFC 822, 1036, 1123, 2822
Monday, 19-Aug-19 20:59:40 UTC in RFC 850
2019-08-19T20:59:40+00:00 in RFC 3339
The PHP date() function documentation can be used as a reference for these formats.
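As a rough cross-check of that list, the same timestamp can be formatted with Java's built-in formatters (a sketch; the class name is illustrative). Note that java.time prints a zero offset as "Z" in ISO 8601 and as "GMT" in RFC 1123, rather than "+00:00"/"+0000".

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class TimestampFormats {
    // Interprets a Unix timestamp as a date-time at offset UTC.
    static OffsetDateTime utc(long unixSeconds) {
        return Instant.ofEpochSecond(unixSeconds).atOffset(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        OffsetDateTime t = utc(1566248380L);
        // ISO 8601 / RFC 3339 style.
        System.out.println(t.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME)); // 2019-08-19T20:59:40Z
        // RFC 1123 / RFC 2822 style.
        System.out.println(t.format(DateTimeFormatter.RFC_1123_DATE_TIME));   // Mon, 19 Aug 2019 20:59:40 GMT
        // A custom pattern roughly like the PHP date() example.
        System.out.println(t.format(DateTimeFormatter.ofPattern("EEEE d MMMM yyyy hh:mm:ss a", Locale.US)));
    }
}
```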
Again you will have to save the time zone:
America/Chicago
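Set the PHP script's time zone for your users by using the date_default_timezone_set() function:
// Sets the default time zone used by all date/time functions (available since PHP 5.1).
// Each call replaces the previous default: e.g. UTC while storing values...
date_default_timezone_set('UTC');
// ...then the user's saved time zone before formatting output for them.
date_default_timezone_set('America/Chicago');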
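Java has no per-script default that you would switch per user; a sketch of option 2 instead passes the saved zone explicitly when converting the stored timestamp (forUser is a hypothetical helper name, not a library API):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

class UserLocalTime {
    // Hypothetical helper: stored Unix timestamp + stored zone ID -> the user's wall-clock time.
    static ZonedDateTime forUser(long unixSeconds, String zoneId) {
        return Instant.ofEpochSecond(unixSeconds).atZone(ZoneId.of(zoneId));
    }

    public static void main(String[] args) {
        long stored = 1566248380L;           // 2019-08-19T20:59:40Z, as in the example
        String userZone = "America/Chicago"; // saved per user
        System.out.println(forUser(stored, userZone)); // 2019-08-19T15:59:40-05:00[America/Chicago]
    }
}
```

Passing the zone as an argument avoids the global state of a process-wide default, which matters when one server formats times for many users at once.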
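In MySQL, you can't store a date/time together with its time zone information.
MySQL does not store time zone information in either DATETIME or TIMESTAMP columns. DATETIME values are stored exactly as given; TIMESTAMP values are converted from the session time zone (which defaults to the server's time zone) to UTC for storage, and back again on retrieval.
One blunt workaround is to run the whole MySQL server/VM/Docker container in UTC.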
I want to store the date of birth as a UNIX timestamp in my database, because this keeps the database small and speeds up the queries.
However, when converting the date of birth to a UNIX time using strtotime, it outputs the wrong value, namely the inputted value shifted by my local UTC offset (two hours in the example below). I know setting date_default_timezone_set('UTC'); will output the correct date of birth in UNIX time, but the date of birth has nothing to do with where someone lives, right? Date of birth stays the date of birth, no matter where someone lives.
For example:
$bday = 20;
$bmonth = 6;
$bYear = 1993;
strtotime($bday.'-'.$bmonth.'-'.$bYear); // output: 740527200 == Sat, 19 Jun 1993 22:00:00 (UTC)
PS: Database field is defined as: bDate int(4) UNSIGNED
UTC is not a great choice for whole calendar dates such as a date of birth.
My date of birth is 1976-08-27. Not 1976-08-27T00:00:00Z.
I currently live in the US Pacific time zone.
My next birthday is from 2016-08-27T00:00:00-07:00 until 2016-08-28T00:00:00-07:00
In UTC, that's equivalent to 2016-08-27T07:00:00Z until 2016-08-28T07:00:00Z
Of course, if I move to a different time zone before then, I'll celebrate my birthday over a completely different set of ranges.
If I move to Japan, then my birthday will come 16 hours sooner.
My next birthday would be from 2016-08-27T00:00:00+09:00 until 2016-08-28T00:00:00+09:00
In UTC, that's equivalent to 2016-08-26T15:00:00Z until 2016-08-27T15:00:00Z
Therefore, a date of birth (or anniversary date, hire date, etc.) should be stored as a simple year, month and day. No time, and no time zone.
In MySQL, use the DATE type. Do not use DATETIME, TIMESTAMP or an integer containing Unix time.
Also consider that evaluation of age depends on the time zone where the person is currently located, not the time zone where they were born. If the person's location is unknown to the asker - then it's the asker's time zone that is relevant. "How old are you according to you?" is not necessarily the same as "How old are you according to me?".
Of course, where you live doesn't actually make you older or younger - but it comes down to how we as humans evaluate age in years based on our local calendars. If you were instead to ask "How many minutes old am I?" then the answer depends on the instantaneous point in time when you were born - which could be measured in UTC, but will usually be given as a local time and time zone. However, in the common case, one does not usually collect that level of detail.
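A Java sketch of this answer's model, assuming java.time's LocalDate as a counterpart of MySQL's DATE (the helper names are illustrative): the date of birth carries no zone, the birthday window and the age are computed only once a zone is supplied, and the same instant can yield different ages in Tokyo and California.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.Period;
import java.time.ZoneId;

class BirthdaySketch {
    // Instant at which the birthday starts in `year` for someone currently living in `zone`.
    static Instant birthdayStart(LocalDate dob, int year, ZoneId zone) {
        return dob.withYear(year).atStartOfDay(zone).toInstant();
    }

    // Whole years of age, judged by the calendar of `zone` at the instant `now`.
    static int ageAt(LocalDate dob, Instant now, ZoneId zone) {
        return Period.between(dob, now.atZone(zone).toLocalDate()).getYears();
    }

    public static void main(String[] args) {
        LocalDate dob = LocalDate.of(1976, 8, 27); // a plain date: no time, no zone
        ZoneId pacific = ZoneId.of("America/Los_Angeles");
        ZoneId tokyo = ZoneId.of("Asia/Tokyo");
        System.out.println(birthdayStart(dob, 2016, pacific)); // 2016-08-27T07:00:00Z
        System.out.println(birthdayStart(dob, 2016, tokyo));   // 2016-08-26T15:00:00Z

        Instant now = Instant.parse("2016-08-26T20:00:00Z");
        System.out.println(ageAt(dob, now, tokyo));   // 40: already the 27th in Tokyo
        System.out.println(ageAt(dob, now, pacific)); // 39: still the 26th in California
    }
}
```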
Unix does not know that you are storing a birth date. It just knows that you are storing a timestamp in Unix format. The timestamp includes a time component.
When you convert from the birth date to the timestamp, and back from the timestamp to the birth date, you need to use consistent timezones in order to avoid a time difference in either direction.
Using UTC is a fine choice. The key though is consistency.
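A Java sketch of the consistency point. Europe/Amsterdam is a hypothetical stand-in for a server zone that was UTC+2 in June 1993, which reproduces the 740527200 value from the question; toUnix and fromUnix are made-up names.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

class ConsistentZone {
    // Calendar date -> Unix seconds at the start of that day in `zone`.
    static long toUnix(LocalDate date, ZoneId zone) {
        return date.atStartOfDay(zone).toEpochSecond();
    }

    // Unix seconds -> calendar date as seen in `zone`.
    static LocalDate fromUnix(long unixSeconds, ZoneId zone) {
        return Instant.ofEpochSecond(unixSeconds).atZone(zone).toLocalDate();
    }

    public static void main(String[] args) {
        LocalDate dob = LocalDate.of(1993, 6, 20);
        ZoneId local = ZoneId.of("Europe/Amsterdam"); // hypothetical server zone (UTC+2 in June 1993)

        long inUtc = toUnix(dob, ZoneOffset.UTC); // 740534400
        long inLocal = toUnix(dob, local);        // 740527200, i.e. 1993-06-19T22:00:00Z
        System.out.println(inUtc + " " + inLocal);

        // Same zone in both directions: the date survives the round trip...
        System.out.println(fromUnix(inLocal, local));          // 1993-06-20
        // ...mixed zones: the date moves back by a day.
        System.out.println(fromUnix(inLocal, ZoneOffset.UTC)); // 1993-06-19
    }
}
```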
This is a HARD question. In fact it is so hard it seems the SQL standard and most of the major databases out there don't have a clue in their implementation.
Converting all datetimes to UTC allows for easy comparison between records but throws away the time zone information, which means you can neither do calculations with them (e.g. add 8 months to a stored datetime) nor retrieve them in the time zone they were stored in. So the naive approach is out.
Storing the timezone offset from UTC in addition to the timestamp (e.g. timestamp with time zone in postgres) would seem to be enough, but different timezones can have the same offset at one point in the year and a different one 6 months later due to DST. For example you could have New York and Chile both at UTC-4 now (August) but after the 4th of November New York will be UTC-5 and Chile (after the 2nd of September) will be UTC-3. So storing just the offset will not allow you to do accurate calculations either. Like the above naive approach it also discards information.
What if you store the timezone identifier (e.g. America/Santiago) with the timestamp instead? This would allow you to distinguish between a Chilean datetime and a New York datetime. But this still isn't enough. If you are storing an expiration date, say midnight 6 months into the future, and the DST rules change (as unfortunately politicians like to do) then your timestamp will be wrong and expiration could happen at 11 pm or 1 am instead. Which might or might not be a big deal to your application. So using a timestamp also discards information.
It seems that to truly be accurate you need to store the local datetime (e.g. using a non-time-zone-aware timestamp type) together with the time zone identifier. To support faster comparisons you could cache the UTC version of it until the time zone database you use is updated, and then update the cached value if it has changed. So that would be 2 naive timestamp types plus a time zone identifier, and some kind of external cron job that checks whether the time zone database has changed and runs the appropriate update queries for the cached timestamp.
Is that an accurate solution? Or am I still missing something? Could it be done better?
I'm interested in solutions for MySQL, SQL Server, Oracle, PostgreSQL and other DBMS that handle TIMESTAMP WITH TIME ZONE.
You've summarized the problem well. Sadly the answer is to do what you've described.
The correct format to use depends on the pragmatics of what the timestamp is supposed to represent. It can in general be divided between past and future events (though there are exceptions):
Past events can and usually should be stored as something which can never be reinterpreted differently (e.g. a UTC timestamp with a numeric offset). If the named time zone should be kept (to be informative to the user), then it should be stored separately.
Future events need the solution you've described: local timestamp and named time zone. This is because you want to change the "actual" (UTC) time of that event when the time zone rules change.
I would question whether time zone conversion is really such an overhead; it's usually pretty quick. I'd only go through the pain of caching if you are seeing a really significant performance hit. There are (as you pointed out) some big operations which will require caching (such as sorting billions of rows based on the actual (UTC) time).
If you require future events to be cached in UTC for performance reasons, then yes, you need to put a process in place to update the cached values. Depending on the type of DB, this could possibly be done by the sysadmins, as TZ rules change rarely.
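A minimal Java sketch of the future-event scheme from the question: the local date-time plus a region ID is the source of truth, and the UTC instant is a disposable cache. The class and method names are hypothetical, and in Java the zone rules only change when the JVM's tz data is updated, so refreshCache() would typically run after such an update.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

class FutureEvent {
    final LocalDateTime local; // what the user meant, e.g. "midnight on 2012-11-05"
    final ZoneId zone;         // a region ID such as America/New_York, not a fixed offset
    Instant cachedUtc;         // derived value, safe to overwrite

    FutureEvent(LocalDateTime local, ZoneId zone) {
        this.local = local;
        this.zone = zone;
        refreshCache();
    }

    // Re-resolves the local time with the zone rules currently loaded. If the local time
    // falls in a DST gap it is shifted forward; in an overlap the earlier offset is used.
    void refreshCache() {
        cachedUtc = local.atZone(zone).toInstant();
    }

    public static void main(String[] args) {
        FutureEvent e = new FutureEvent(LocalDateTime.parse("2012-11-05T00:00"), ZoneId.of("America/New_York"));
        System.out.println(e.cachedUtc); // 2012-11-05T05:00:00Z (EST, UTC-5)
    }
}
```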
If you care about the offset, you should store the actual offset. Storing the time zone identifier is not the same thing, as time zones can, and do, change over time. By storing the offset, you can calculate the correct local time at the time of the event, rather than the local time based on the current offset. You may still want to store the time zone identifier, if it's important to know which time zone the event was considered to have happened in.
Remember, time is a physical attribute, but a timezone is a political one.
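A Java sketch of that advice for past events (hypothetical class name): the offset in force at the moment of the event is frozen into an OffsetDateTime, and the zone ID is kept alongside as information only. The two New York examples show why an offset cannot stand in for the zone: it is only valid for the instant it was computed for.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneId;

class PastEvent {
    final OffsetDateTime at; // the instant plus the offset that applied then; never reinterpreted
    final String zoneId;     // kept for information only

    PastEvent(Instant instant, String zoneId) {
        this.zoneId = zoneId;
        this.at = instant.atZone(ZoneId.of(zoneId)).toOffsetDateTime();
    }

    public static void main(String[] args) {
        PastEvent summer = new PastEvent(Instant.parse("2012-08-15T12:00:00Z"), "America/New_York");
        PastEvent winter = new PastEvent(Instant.parse("2012-12-15T12:00:00Z"), "America/New_York");
        System.out.println(summer.at); // 2012-08-15T08:00-04:00
        System.out.println(winter.at); // 2012-12-15T07:00-05:00
    }
}
```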
If you convert to UTC you can order and compare the records
If you add the name of the time zone it originated from, you can represent it in its original time zone and be able to add/subtract time periods like weeks, months etc. (instead of elapsed time).
In your question you state that this is not enough because the DST rules might change. DST makes calculating with dates (other than elapsed time) complicated and quite code-intensive. Just like you need code to deal with leap years, you need to take into account whether for a given date/period you need to apply a DST correction or not. For some years the answer will be yes, for others no.
See this wiki page for how complex those rules have become.
Storing the offset is basically storing the result of those calculations. That calculated offset is only valid for that given point in time and can't be applied as is to later or earlier points like you suggest in your question. You do the calculation on the UTC time and then convert the resulting time to the required timezone based on the rules that are active at that time in that timezone.
Note that there was practically no DST anywhere before the First World War, and date/time systems in databases handle those cases perfectly.
I'm interested in solutions for MySQL, SQL Server, Oracle, PostgreSQL and other DBMS that handle TIMESTAMP WITH TIME ZONE.
Oracle converts the instant in time to UTC but keeps the time zone or UTC offset, depending on what you pass. Oracle (correctly) distinguishes between a time zone and a UTC offset and returns to you what you passed. This only costs two additional bytes.
Oracle does all calculations on TIMESTAMP WITH TIME ZONE in UTC. This does not make a difference for adding months, but it does for adding days, as UTC has no daylight saving time. Note that the result of a calculation must always be a valid timestamp, e.g. adding one month to January 31st will throw an exception in Oracle, as February 31st does not exist.
What is the very least date and time I can have for my time element to be machine readable? An example would help very much.
The spec says this:
The time element represents either a time on a 24 hour clock, or a precise date in the proleptic Gregorian calendar, optionally with a time and a time-zone offset.
You may ask, "what is the 'proleptic Gregorian calendar'?". I sure did.
According to Wikipedia:
The proleptic Gregorian calendar is produced by extending the Gregorian calendar backward to dates preceding its official introduction in 1582.
Another informative paragraph from the spec:
The time element is not intended for encoding times for which a precise date or time cannot be established. For example, it would be inappropriate for encoding times like "one millisecond after the big bang", "the early part of the Jurassic period", or "a winter around 250 BCE".
For dates before the introduction of the Gregorian calendar, authors are encouraged to not use the time element, or else to be very careful about converting dates and times from the period to the Gregorian calendar. This is complicated by the manner in which the Gregorian calendar was phased in, which occurred at different times in different countries, ranging from partway through the 16th century all the way to early in the 20th.
So, it looks like the answer is: don't use it for dates before the introduction of the Gregorian calendar, or else be careful about it.
You asked:
what is the very least date and time I can have for my time element to be machine readable.
It depends on what machine is reading it.
For example, a lot of software won't handle dates before the Unix epoch (January 1, 1970).
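If you want to check a value yourself: as far as I can tell from the WHATWG HTML spec, a valid date string needs a year of four or more digits that is greater than zero, so 0001-01-01 would be the earliest date the datetime attribute accepts. The Java sketch below is a rough validator under that reading, not a conforming parser; isValidDateString is a hypothetical name.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlDateCheck {
    // YYYY-MM-DD with four or more year digits (ASCII digits only).
    private static final Pattern DATE = Pattern.compile("(\\d{4,})-(\\d{2})-(\\d{2})");

    static boolean isValidDateString(String s) {
        Matcher m = DATE.matcher(s);
        if (!m.matches()) return false;
        try {
            int year = Integer.parseInt(m.group(1));
            if (year <= 0) return false; // year must be greater than zero
            // LocalDate uses the proleptic Gregorian calendar, like the spec, and rejects e.g. Feb 30.
            LocalDate.of(year, Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3)));
            return true;
        } catch (NumberFormatException | DateTimeException e) {
            // Invalid month/day, or a year too large for this sketch to handle.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0001-01-01", "0000-12-31", "325-06-19", "1582-10-10", "1900-02-29"}) {
            System.out.println(s + " -> " + isValidDateString(s));
        }
    }
}
```

1582-10-10 passes even though that day was skipped in countries that switched in 1582, which is exactly the conversion care the spec asks for.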
I can't see any info about that. Where can I find the oldest date MySQL can support?
For the specific example you used in your question (year 1200), technically things will work.
In general, however, timestamps are unadvisable for this uses.
First, the range limitation is arbitrary: in MySQL it's Jan 1st, 1000. If you are working with 12-13th century stuff, things go fine... but if at some moment you need to add something older (10th century or earlier), the date will miserably break, and fixing the issue will require re-formatting all your historic dates into something more adequate.
Timestamps are normally represented as raw integers, with a given "tick interval" and "epoch point", so the number is indeed the number of ticks elapsed since the epoch to the represented date (or vice versa for negative dates). This means that, as with any fixed-width integer data type, the set of representable values is finite. Most timestamp formats I know about sacrifice range in favor of precision, mostly because applications that need to perform time arithmetic often need to do so with decent precision, while applications that need to work with historical dates very rarely need to perform serious arithmetic.
In other words, timestamps are meant for precise representation of dates. Second (or even fraction-of-second) precision makes no sense for historical dates: could you tell me, down to the millisecond, when Henry VIII was crowned King of England?
In the case of MySQL, the format is inherently defined as "4-digit years", so any related optimization can rely on the assumption that the year will have 4 digits, or that the entire string will have exactly 10 chars ("yyyy-mm-dd"), etc. It's just a matter of luck that the date you mentioned in your title still fits, but even relying on that is still dangerous: besides what the DB itself can store, you need to be aware of what the rest of your server stack can manipulate. For example, if you are using PHP to interact with your database, trying to handle historical dates is very likely to crash at some point or another (in a 32-bit environment, the range for UNIX-style timestamps is December 13, 1901 through January 19, 2038).
In summary: MySQL will store properly any date with a 4-digit year; but in general using timestamps for historical dates is almost guaranteed to trigger issues and headaches more often than not. I strongly advise against such usage.
Hope this helps.
Edit/addition:
Thank you for this very interesting answer. Should I create my own algo for historical dates, or choose another db, and if so which one? – user284523
I don't think any DB has much support for this kind of date: applications that use them most often get by with a string/text representation. Actually, for dates in year 1 and later, a textual representation will even yield correct sorting/comparisons (as long as the date is written from the largest unit to the smallest, in y-m-d order, with zero-padded fields). Comparisons will break, however, if "negative" dates are also involved (they would still compare as earlier than any positive one, but comparing two negative dates would yield a reversed result).
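A quick Java check of that claim (class name illustrative): plain lexicographic sorting of zero-padded "yyyy-mm-dd" strings matches chronological order for year 1 and later, but two negative years come out reversed. Without zero padding even positive years break, since "1066" sorts before "325".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class StringDateSort {
    // Plain lexicographic (character-by-character) sort of date strings.
    static List<String> sorted(List<String> dates) {
        List<String> copy = new ArrayList<>(dates);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Zero-padded, year 1 and later: lexicographic order equals chronological order.
        System.out.println(sorted(Arrays.asList("1582-10-15", "0325-06-19", "1066-10-14")));
        // [0325-06-19, 1066-10-14, 1582-10-15]

        // Negative years: "-0044" sorts before "-0100", although 100 BCE is earlier than 44 BCE.
        System.out.println(sorted(Arrays.asList("-0100-07-12", "-0044-03-15", "0001-01-01")));
        // [-0044-03-15, -0100-07-12, 0001-01-01]
    }
}
```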
If you only need Year 1 and later dates, or if you don't need sorting, then you can make your life a lot easier by using strings.
Otherwise, the best approach is to use some kind of number, and define your own "tick interval" and "epoch point". A good interval could be days (unless you really need further precision, but even then you can rely on "real" (floating-point) numbers instead of integers); and a reasonable epoch could be Jan 1 of year 1. The main problem will be turning these values into their text representation, and vice versa. You need to keep in mind the following details:
Leap years have one extra day.
The rule for leap years was "any multiple of 4" until 1582, when it changed from the Julian to the Gregorian calendar and became "multiple of 4 except those that are multiples of 100 unless they are also multiples of 400".
The last day of the Julian calendar was Oct 4th, 1582. The next day, first of the Gregorian calendar, was Oct 15th, 1582. 10 days were skipped to make the new calendar match again with the seasons.
As stated in the comments, the two rules above vary by country: the Papal States and some Catholic countries adopted the new calendar on the stated dates, but many other countries took longer to do so (the last being Turkey in 1926). This means that any date between the papal bull in 1582 and the last adoption in 1926 will be ambiguous without geographical context, and even more complex to process.
There is no "year 0": the year before year 1 was year -1, or year 1 BCE.
All of this requires quite elaborate parser and formatter functions, but beyond the many case-by-case breakings there isn't really too much complexity (it'd be tedious to code, but quite straightforward). The use of numbers as the underlying representation ensures correct sorting/comparing for any pair of values.
Knowing this, now it's your choice to take the approach that better fits your needs.
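Before hand-rolling the parser, note that Java's standard library already covers part of this (a sketch; the helper names are made up). LocalDate.toEpochDay() is exactly a "days since an epoch" number, but it applies Gregorian rules backwards (proleptic) and uses a year 0 for 1 BCE, while java.util.GregorianCalendar models the single October 1582 switch. Neither models the per-country adoption dates, so that part remains your own job.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoField;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

class DayNumbers {
    // Days since 1970-01-01 in the proleptic Gregorian calendar (negative before 1970).
    static long dayNumber(LocalDate d) {
        return d.toEpochDay();
    }

    // Day-of-month that the hybrid Julian/Gregorian GregorianCalendar (default cutover
    // in October 1582) reports one day after the given date.
    static int hybridNextDayOfMonth(int year, int month0, int day) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(year, month0, day);
        cal.add(Calendar.DAY_OF_MONTH, 1);
        return cal.get(Calendar.DAY_OF_MONTH);
    }

    public static void main(String[] args) {
        // Proleptic Gregorian: no gap in 1582, so the 4th and the 15th are 11 days apart.
        System.out.println(dayNumber(LocalDate.of(1582, 10, 15)) - dayNumber(LocalDate.of(1582, 10, 4))); // 11
        // Hybrid calendar: the day after (Julian) October 4 is (Gregorian) October 15.
        System.out.println(hybridNextDayOfMonth(1582, Calendar.OCTOBER, 4)); // 15
        // Astronomical numbering: proleptic year 0 is 1 BCE.
        LocalDate yearZero = LocalDate.of(0, 1, 1);
        System.out.println(yearZero.get(ChronoField.YEAR_OF_ERA) + " " + yearZero.getEra()); // 1 BCE
    }
}
```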
From the documentation:
DATE
A date. The supported range is '1000-01-01' to '9999-12-31'.
Yes. Officially, MySQL dates start in year 1000.
For whatever it's worth, I found that the MySQL DATE field does support dates < 1000 in practice, though the documentation says otherwise. E.g., I was able to enter 325 and it stores as 0325-00-00. A search WHERE table.date < 1000 also gave correct results.
But I am hesitant to rely on the < 1000 dates when they are not officially supported, plus I sometimes need BCE years with more than 4 digits anyway (e.g. 10000 BCE). So separate INT fields for year, month and day (as suggested above) do seem the only choice.
I do wish the DATE type (or perhaps a new HISTDATE type) supported a full range of historical dates - it would be nice to combine three fields into one and simply sort by date instead of having to sort by year, month, day.
Use SMALLINT for the year, so it can hold values from -32768 (BC) to 32767 (AD).
As for months and days, use TINYINT UNSIGNED.
Most historical events don't have months and days, so you could query like this:
SELECT events FROM history WHERE year = -4990;
Result: "Noah's Ark"
Or: SELECT events FROM history WHERE year = 570 AND month = 4 AND day = 20;
Returns: "Muhammad pbuh was born"
Depending on requirements, you could also add a DATETIME column and make it NULL for dates before 1000, and vice versa (thus saving some bytes).
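To get a single chronological ordering out of the separate columns, sort by year, then month, then day; in MySQL, ORDER BY year, month, day does this and puts NULLs first in ascending order. A hypothetical Java mirror of those columns, with the same "unknown first" rule:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class HistDate {
    final int year;       // fits a SMALLINT column: -32768..32767
    final Integer month;  // TINYINT UNSIGNED, or null when unknown
    final Integer day;    // TINYINT UNSIGNED, or null when unknown

    HistDate(int year, Integer month, Integer day) {
        this.year = year;
        this.month = month;
        this.day = day;
    }

    // Year, then month, then day; unknown parts sort before known ones within the same year.
    static final Comparator<HistDate> CHRONOLOGICAL =
            Comparator.comparingInt((HistDate h) -> h.year)
                    .thenComparing(h -> h.month, Comparator.nullsFirst(Comparator.<Integer>naturalOrder()))
                    .thenComparing(h -> h.day, Comparator.nullsFirst(Comparator.<Integer>naturalOrder()));

    static List<HistDate> sorted(List<HistDate> dates) {
        List<HistDate> copy = new ArrayList<>(dates);
        copy.sort(CHRONOLOGICAL);
        return copy;
    }

    @Override
    public String toString() {
        return year + "/" + (month == null ? "?" : month) + "/" + (day == null ? "?" : day);
    }

    public static void main(String[] args) {
        List<HistDate> events = Arrays.asList(
                new HistDate(1066, 10, 14),
                new HistDate(570, 4, 20),
                new HistDate(-4990, null, null),
                new HistDate(570, null, null));
        System.out.println(sorted(events)); // [-4990/?/?, 570/?/?, 570/4/20, 1066/10/14]
    }
}
```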
This is an important and interesting problem which has another solution.
Instead of relying on the database platform to support a potentially infinite number of dates with millisecond precision, rely on an object-oriented programming language compiler and runtime to correctly handle date and time arithmetic.
It is possible to do this using the Java Virtual Machine (JVM), where time is measured in milliseconds relative to midnight, January 1, 1970 UTC (Epoch), by persisting the required value as a long in the database (including negative values), and performing the required conversion/calculation in the component layer after retrieval.
For example:
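import java.text.*;
import java.util.*;
Date d = new Date(Long.MIN_VALUE); // earliest instant a java.util.Date can represent
DateFormat df = new SimpleDateFormat("EEE, d MMM yyyy G HH:mm:ss Z", Locale.US);
df.setTimeZone(TimeZone.getTimeZone("UTC")); // format in UTC so the offset prints as +0000
System.out.println(df.format(d));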
Should show:
Sun, 2 Dec 292269055 BC 16:47:04 +0000
This also enables independence of database versions and platforms as it abstracts all date and time arithmetic to the JVM runtime, i.e. changes in database versions and platforms will be much less likely to require re-implementation, if at all.
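The same idea with the newer java.time API, as a sketch (method names are illustrative): persist a signed 64-bit millisecond count and convert in the application layer. One caveat: java.time uses the proleptic Gregorian calendar, while java.util.Date formatted through SimpleDateFormat uses the Julian calendar before October 1582, so the two APIs print different calendar dates for the same old long value.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

class EpochMillisStorage {
    // Value to persist in a 64-bit integer column: ms since 1970-01-01T00:00Z (negative before).
    static long toStored(LocalDate date) {
        return date.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    // Rebuilds the date from the stored long, in the same zone (UTC) used when storing it.
    static LocalDate fromStored(long millis) {
        return Instant.ofEpochMilli(millis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    public static void main(String[] args) {
        // Proleptic Gregorian with astronomical years: -43 means 44 BCE.
        LocalDate ancient = LocalDate.of(-43, 3, 15);
        long stored = toStored(ancient);
        System.out.println(stored < 0);         // true
        System.out.println(fromStored(stored)); // -0043-03-15
    }
}
```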
I had a similar problem, and I wanted to keep relying on date fields in the DB so I could use date range searches, accurate to the day, for historic values.
(My DB includes date of birth and dates of roman emperors...)
The solution was to add a constant year (example: 3000) to all the dates before adding them to the DB and subtracting the same number before displaying the query results to the users.
If your DB already has some date values in it, remember to update the existing values with the same constant.
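A Java sketch of that trick, with a hypothetical constant and helper names. Choosing a multiple of 400 (2400 here) keeps the Gregorian leap-year pattern intact; with 3000, a date like 2000-02-29 lands on 5000-02-29, which does not exist, and java.time's plusYears silently clamps it to the 28th. The range check reflects MySQL's documented DATE range; pre-1582 dates recorded in the Julian calendar are a separate problem this does not solve.

```java
import java.time.LocalDate;

class ShiftedYears {
    static final int YEAR_OFFSET = 2400; // hypothetical; a multiple of 400

    // Shift a real date into MySQL's documented DATE range before storing it.
    static LocalDate toDb(LocalDate real) {
        LocalDate shifted = real.plusYears(YEAR_OFFSET);
        if (shifted.getYear() < 1000 || shifted.getYear() > 9999) {
            throw new IllegalArgumentException("outside MySQL DATE range after shift: " + real);
        }
        return shifted;
    }

    // Undo the shift after reading the value back.
    static LocalDate fromDb(LocalDate stored) {
        return stored.minusYears(YEAR_OFFSET);
    }

    public static void main(String[] args) {
        LocalDate leapDay = LocalDate.of(2000, 2, 29);
        System.out.println(fromDb(toDb(leapDay))); // 2000-02-29 (4400 is a leap year)
        // With an offset of 3000 the target year 5000 is not a leap year, so the round trip loses a day:
        System.out.println(leapDay.plusYears(3000).minusYears(3000)); // 2000-02-28
    }
}
```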