I'm running a MATLAB function (fastinsert) to insert data into MySQL. The results are correct for the whole year except for one hour in March, during the daylight-saving changeover. In fact it seems that I cannot insert data between 2:00 am and 3:00 am on that day.
For example with:
ts = 2006 3 26 2 30 0
Looking inside the MATLAB function, I found that the problem lies in:
java.sql.Timestamp(ts(1)-1900,ts(2)-1,ts(3),ts(4),ts(5),secs,nanosecs)
that gives as a result:
2006-03-26 03:30:00.0
How can I solve this?
I've run into similar problems storing datetimes on many occasions. Treating the value as a derived value seems to make the most sense. In other words, instead of storing the local time, store the value as GMT plus a time zone. Then derive the appropriate value when you query the data.
This has the added benefit of making it possible to store values from multiple locations without having to worry about confusion down the road.
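To make that concrete, here is a minimal Java sketch of the idea (the same calls can be made through MATLAB's Java bridge); it is only a sketch, and whether fastinsert accepts a Timestamp built this way is an assumption. The point is to build the timestamp from fields interpreted as UTC, so the non-existent local wall-clock time 02:30 inside the DST gap can never be shifted:

import java.sql.Timestamp;
import java.util.Calendar;
import java.util.TimeZone;

public class UtcTimestamp {
    public static void main(String[] args) {
        // Interpret the MATLAB vector [2006 3 26 2 30 0] as UTC instead of local time.
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(2006, Calendar.MARCH, 26, 2, 30, 0);   // year, month (0-based), day, hour, minute, second

        // Build the Timestamp from the millisecond value; the DST gap does not exist in UTC.
        Timestamp ts = new Timestamp(cal.getTimeInMillis());
        System.out.println(ts.getTime());   // exact instant in milliseconds since the epoch
    }
}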
In the database, the table's date column is showing the correct date and time.
In the database table column:
2020-08-25 04:00:32.217609
But when I fetch the same date, the date and time shown is 12 hours earlier.
Fetched from the database:
2020-08-24T16:00:32.217Z
I think it's related to the local time zone, and the format is also different when fetching. I am trying to understand the issue and then find a solution.
Note: I am fetching the data using the TypeORM QueryBuilder.
Yes, they are the same time in different time zones. The first is in your local time zone (New Zealand Standard Time) 12 hours ahead of UTC. The Z at the end of the second indicates it is in UTC, 12 hours behind you.
The other difference is in the fractional seconds. Your database is storing in microseconds. Your program is storing in milliseconds, or only displaying milliseconds.
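As a small illustration that the two values denote the same instant (the question uses TypeORM/JavaScript; this Java sketch is only illustrative):

import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class SameInstant {
    public static void main(String[] args) {
        // The value fetched from the database, in UTC (note the trailing 'Z').
        Instant utc = Instant.parse("2020-08-24T16:00:32.217Z");

        // Converting to New Zealand time reproduces the value shown in the table column.
        ZonedDateTime local = utc.atZone(ZoneId.of("Pacific/Auckland"));
        System.out.println(local);   // 2020-08-25T04:00:32.217+12:00[Pacific/Auckland]
    }
}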
I am trying to understand the uses and limitations of outer joins in Tableau (Tableau Online in this case). I have found that Tableau's behaviour is not what I expected.
I have provided as detailed a description of my problems as I can below, to avoid any ambiguity and since I don't know where to start anymore. I hope I have not gone overboard (edits welcome).
Specifics of my use case
I am creating a join between two .csv files that have logged natural data at specific time intervals. One set has hourly time intervals, the other at intervals of minutes (which is variable due to various factors).
'Rain' data set(1):
Date and Time | Rain
01/01/2018 00:00 | 0
01/01/2018 01:00 | 0.4
01/01/2018 02:00 | 1.4
01/01/2018 03:00 | 0.4
'Fill' data set (2):
Date and Time | Fill
24/04/2018 06:04 | 78
24/04/2018 12:44 | 104
24/04/2018 18:51 | 96
25/04/2018 00:20 | 84
Unsurprisingly, I have many nulls in the data (which is not a problem to me) as:
'Rain' has a longer time series
In either data set, the majority of date times do not have an exact equivalent in the other
screenshot of data join here
What I am trying to achieve
I am trying to graph the two data sets in such a way that I can compare the full data sets against each other, in all of the following ways:
Monthly or Yearly aggregation (average)
Hourly aggregation (average)
Exact times
Problems (and my limited assumptions)
Once graphed in Tableau, some values had 'null' DateTime values*.
Once graphed in Tableau, it appears as if many points are simply missing**
Graphing using 'Fill' time series
Graphing using 'Rain' time series
I had assumed (given the full outer join on 'Date and Time') that Tableau would join the data sets in chronological order along a common date-time series
* I had assumed it impossible for the join conditions to have 'null' values without throwing an error. Also, the data is clean and uniform
** And this is when aggregating monthly, which I assumed would not be affected by any (if any) hourly/minute mismatches
So, finally, the question
In my reading of the online help documentation, I am struggling to find functionality native to Tableau that can help me achieve these specific goals. I am reaching the worrying conclusion that Tableau was not built for this type of 'visual analytics'.
Is there functionality native to Tableau that will allow me to combine the data in the way I described above?
Approaches I have considered
Since I have two .csv files, I could combine both sets so that I have the full, granular 'Date and Time' fields in one tall list.
However, I would like to find a method that is natural to Tableau (Online) because, in future, at least some of the data will come from a database (Postgres) connection, but other data will likely have to remain as uploads of .csv or Excel files.
Again I ask
What am I overlooking in regard to how (and why) to use Tableau?
I am not looking for a complete solution, but what tools could I use to achieve this?
Many thanks for any help
Your databases, or more specifically your data sources, are at different levels of granularity: one is in hours (a higher level of granularity) and the other is in minutes (a lower level of granularity), but your requirement spans several levels:
Year/Month -- High aggregation
Hourly -- Medium aggregation
Exact -- Lower aggregation
When you join two data sources on dates and times (which would never match exactly), you will get these kinds of weird results.
Possible Solution:
There is a tool called Tableau Prep. Use it to bring both data sources to the same level of aggregation; in your case, data set 2 would be aggregated to the hour level, and then you join the two tables. In this case you need to re-check the last requirement (exact times), as I assume you are looking for charts at the minute level.
The other solution is to use blending, where the primary data source is data set 1 and the secondary data source is data set 2. In this case you will get the required data, with Tableau managing the aggregation and granularity.
Let me know how it goes
So it appears as if various solutions are available.
I want to post this now but will re-edit when I get a bit more time
Option 1
One work-around/solution I found was to create a calculated field as mentioned here and then graph everything against this time series.
This worked well for me even after having created 20+ sheets and numerous dashboards.
As mentioned below, other use cases may not offer this flexibility.
Calculation:
IFNULL([Date and Time (Fill.csv)], [Date and Time (Rain.csv)])
Option 2
As mentioned by matt_black, a join of the data performs the job quite well. It seems less hacky and is perfect when starting from a clean slate.
I had difficulty creating a join on data sources already in use (will do more poking around on this)
Option 3?
As in the answer provided by Siva, blending may be an option.
I have not confirmed this yet.
Problem
I have a database (SQLite or MySQL) full of rainfall values for every day in the last few years. It is very simple:
date         | rain
-------------|------
"2014-10-20" | 3.3
Each day, I pull in a CSV file from my local meteorology bureau. They only publish CSV files with the entire year's data, no daily/weekly/etc files, so by the end of the year there are 365 rows in the file. Within each row the date is split up into the Year, Month, and Day fields.
So when it comes time to store the info in the database, I have two options.
Solution 1: Do date comparison
I would save the date at which I last ran the program, in either the database or a text file. I parse that date using Date.strptime and store it as last_run_time. Then I load the CSV file with CSV.read('raindata.csv').each do |row|, and for every row, I parse the three date fields as a new Date object with rowdate = Date.strptime("#{row[2]}-#{row[3]}-#{row[4]}") and say if rowdate > last_run_time then insert the info into the database.
This way, I avoid making database calls to insert-or-replace values I already have. By the end of the year this spares me 364 database queries, but it means I do a lot of Date parsing and comparing.
Solution 2: Just let the database handle it
I would avoid all of that, and just say for each row in the CSV, insert or ignore into the database. The date field in the DB is unique so if I try to insert but already have the date, it just ignores the query. Pro: avoid making Date comparisons and parsing, con: as many as 364 unnecessary hits to the database.
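For what it's worth, a rough sketch of Solution 2 as one batched statement follows. The question uses Ruby; this Java/JDBC sketch is only to illustrate the idea, and the table name, column positions, and driver URL are assumptions (SQLite spells it INSERT OR IGNORE; MySQL uses INSERT IGNORE):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public class InsertOrIgnore {
    // csvRows: pre-parsed CSV rows; the column layout (year, month, day, rain) is assumed.
    static void load(List<String[]> csvRows) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:rain.db");  // assumes the sqlite-jdbc driver
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT OR IGNORE INTO rainfall (date, rain) VALUES (?, ?)")) {
            for (String[] row : csvRows) {
                ps.setString(1, row[2] + "-" + row[3] + "-" + row[4]);  // date string; UNIQUE column in the table
                ps.setDouble(2, Double.parseDouble(row[5]));            // rain value; column index assumed
                ps.addBatch();
            }
            ps.executeBatch();  // duplicate dates are silently skipped by the UNIQUE constraint
        }
    }
}

Batching the inserts also reduces the per-row round-trip overhead, which is most of the cost the two solutions are trading off.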
Question
Which of these two solutions is the smarter, more efficient, more resource-friendly one? Is it better to make unnecessary database queries and spare the CPU, or vice versa?
Database calls are the heaviest operations here; whichever solution issues the smaller number of queries is the better approach.
Parsing and in-language functions have much, much lower cost, so process the input in your language and issue fewer queries.
Hitting the database is probably 1,000 or 1,000,000 times more expensive than comparing dates. Having said that, it makes no difference, because making 364 hits to the database once a day is zero load for all practical purposes.
If you need your update script to run as fast as possible, do date comparisons. You take the risk that there will be some bugs and maybe some data will be missed sometime in the future.
If you have the extra few seconds, and you care most about data integrity and simplicity, update the whole thing daily.
This is a HARD question. In fact it is so hard it seems the SQL standard and most of the major databases out there don't have a clue in their implementation.
Converting all datetimes to UTC allows for easy comparison between records but throws away the timezone information, which means you can't do calculations with them (e.g. add 8 months to a stored datetime) nor retrieve them in the time zone they were stored in. So the naive approach is out.
Storing the timezone offset from UTC in addition to the timestamp (e.g. timestamp with time zone in postgres) would seem to be enough, but different timezones can have the same offset at one point in the year and a different one 6 months later due to DST. For example you could have New York and Chile both at UTC-4 now (August) but after the 4th of November New York will be UTC-5 and Chile (after the 2nd of September) will be UTC-3. So storing just the offset will not allow you to do accurate calculations either. Like the above naive approach it also discards information.
What if you store the timezone identifier (e.g. America/Santiago) with the timestamp instead? This would allow you to distinguish between a Chilean datetime and a New York datetime. But this still isn't enough. If you are storing an expiration date, say midnight 6 months into the future, and the DST rules change (as unfortunately politicians like to do) then your timestamp will be wrong and expiration could happen at 11 pm or 1 am instead. Which might or might not be a big deal to your application. So using a timestamp also discards information.
It seems that to truly be accurate you need to store the local datetime (e.g. using a non timezone aware timestamp type) with the timezone identifier. To support faster comparisons you could cache the utc version of it until the timezone db you use is updated, and then update the cached value if it has changed. So that would be 2 naive timestamp types plus a timezone identifier and some kind of external cron job that checks if the timezone db has changed and runs the appropriate update queries for the cached timestamp.
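A minimal sketch of that scheme with java.time (the values and zone are illustrative): persist the wall-clock time plus the zone identifier, and keep the derived UTC instant only as a cache to be recomputed when the timezone database changes.

import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class FutureEvent {
    public static void main(String[] args) {
        // What is actually persisted: the local wall-clock time and the zone identifier.
        LocalDateTime wallClock = LocalDateTime.of(2026, 2, 1, 0, 0);   // "midnight, six months out"
        ZoneId zone = ZoneId.of("America/Santiago");

        // Cached column used for fast comparison and sorting; recompute it whenever
        // the timezone database updates the rules for that zone.
        Instant cachedUtc = wallClock.atZone(zone).toInstant();
        System.out.println(cachedUtc);
    }
}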
Is that an accurate solution? Or am I still missing something? Could it be done better?
I'm interested in solutions for MySQL, SQL Server, Oracle, PostgreSQL and other DBMS that handle TIMESTAMP WITH TIME ZONE.
You've summarized the problem well. Sadly the answer is to do what you've described.
The correct format to use does depend on the pragmatics of what the timestamp is supposed to represent. It can in general be divided between past and future events (though there are exceptions):
Past events can and usually should be stored as something which can never be reinterpreted differently (e.g. a UTC timestamp with a numeric offset). If the named time zone should be kept (to be informative to the user), then store it separately.
Future events need the solution you've described. Local timestamp and named time zone. This is because you want to change the "actual" (UTC) time of that event when the time zone rules change.
I would question whether time zone conversion is really such an overhead; it's usually pretty quick. I'd only go through the pain of caching if you are seeing a really significant performance hit. There are (as you pointed out) some big operations which will require caching (such as sorting billions of rows based on the actual (UTC) time).
If you require future events to be cached in UTC for performance reasons then yes, you need to put a process in place to update the cached values. Depending on the type of DB, it is possible that this could be done by the sysadmins, as TZ rules change rarely.
If you care about the offset, you should store the actual offset. Storing the timezone identifier is not the same thing, as time zones can, and do, change over time. By storing the timezone offset, you can calculate the correct local time at the time of the event, rather than the local time based on the current offset. You may still want to store the timezone identifier, if it's important to know which time zone the event was considered to have happened in.
Remember, time is a physical attribute, but a timezone is a political one.
If you convert to UTC you can order and compare the records
If you add the name of the timezone it originated from, you can represent it in its original time zone and add or subtract time periods like weeks, months etc. (instead of elapsed time).
In your question you state that this is not enough because DST rules might change. DST makes calculating with dates (other than elapsed time) complicated and quite code intensive. Just as you need code to deal with leap years, you need to take into account whether, for a given date / period, you need to apply a DST correction or not. For some years the answer will be yes, for others no.
See this wiki page for how complex those rules have become.
Storing the offset is basically storing the result of those calculations. That calculated offset is only valid for that given point in time and can't be applied as is to later or earlier points like you suggest in your question. You do the calculation on the UTC time and then convert the resulting time to the required timezone based on the rules that are active at that time in that timezone.
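To illustrate the difference between calendar arithmetic (which must consult the DST rules in force at the target time) and simple elapsed time, here is a small Java example; the zone and dates are chosen only because they straddle a DST switch:

import java.time.Duration;
import java.time.Period;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class DstArithmetic {
    public static void main(String[] args) {
        // Noon on the day before US clocks spring forward (2018-03-11).
        ZonedDateTime start = ZonedDateTime.of(2018, 3, 10, 12, 0, 0, 0,
                ZoneId.of("America/New_York"));

        System.out.println(start.plus(Period.ofDays(1)));      // 2018-03-11T12:00-04:00, same wall-clock time
        System.out.println(start.plus(Duration.ofHours(24)));  // 2018-03-11T13:00-04:00, 24 elapsed hours
    }
}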
Note that there wasn't any DST anywhere before the First World War, and date/time systems in databases handle those cases perfectly.
I'm interested in solutions for MySQL, SQL Server, Oracle, PostgreSQL and other DBMS that handle TIMESTAMP WITH TIME ZONE.
Oracle converts the instant in time to UTC but keeps the time zone or UTC offset, depending on what you pass. Oracle (correctly) makes a distinction between the time zone and the UTC offset and returns to you what you passed in. This only costs two additional bytes.
Oracle does all calculations on TIMESTAMP WITH TIME ZONE in UTC. This does not make a difference for adding months, but it does make a difference for adding days, as there is no daylight saving time in UTC. Note that the result of a calculation must always be a valid timestamp; e.g. adding one month to January 31st will throw an exception in Oracle, as February 31st does not exist.
We've been working on implementing timezone support for our Web app.
This great SO post has helped us a bunch: Daylight saving time and time zone best practices
We've implemented the Olson TZ database in MySQL and are using that for TZ conversions.
We're building a scheduling app so:
We are storing all our bookings, which occur on a specific date at a specific time, as UTC in DATETIME fields and converting them using CONVERT_TZ(). This is working great.
What we aren't so sure about is stuff like vacations and breaks:
Vacations are just date references and don't include a time portion. Because CONVERT_TZ() doesn't work on date values, we are guessing that it's best to just store the date value as per the user's timezone?
id1 id3 startDate endDate
-----------------------------
3 6 2010-12-25 2011-01-03
4 3 2010-09-22 2010-09-26
Same thing with recurring breaks, which are stored for each day of the week. We currently store breaks indexed 0-6 for each day of the week. Because these are just time values, we can't use CONVERT_TZ(), and we assume we should just store them as time values in the user's time zone?
bID sID dayID startTime endTime
--------------------------------
1 4 1 12:00:00 14:00:00
2 4 4 13:30:00 13:30:00
In this case with vacations and breaks we would only compare them to booking times AFTER the booking times have been converted to the user's local time.
Is this the correct way to handle things, or should we be storing both vacations and breaks in some other way so that we can convert them to UTC (not sure how this would work for breaks)?
Thanks for your assistance!
The two storage formats look fine. You just need to convert them to the user's local time when you pull them out of the table.
Actually, for the breaks table I presume they're already nominally in local time, so you just compare directly against the local time of the appointment.
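A small Java sketch of that comparison (the zone and values are purely illustrative): the booking is stored in UTC and converted on retrieval, while the break window is already a local wall-clock range.

import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class BreakOverlap {
    public static void main(String[] args) {
        Instant bookingUtc = Instant.parse("2010-09-22T20:15:00Z");  // as stored in the bookings table
        ZoneId userZone = ZoneId.of("America/Vancouver");            // the user's time zone (assumed)

        ZonedDateTime bookingLocal = bookingUtc.atZone(userZone);    // convert once, on retrieval

        LocalTime breakStart = LocalTime.of(12, 0);                  // startTime column
        LocalTime breakEnd = LocalTime.of(14, 0);                    // endTime column

        boolean duringBreak = !bookingLocal.toLocalTime().isBefore(breakStart)
                && bookingLocal.toLocalTime().isBefore(breakEnd);
        System.out.println(duringBreak);  // true: 20:15 UTC is 13:15 local in September
    }
}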
I don't understand your question well enough to say my answer is 100% correct for you. But I think what you need to do is store the DateTime in "local" time and also store the timezone. This way you have it correct even if the daylight saving time rules shift (which happens).
Good article at http://blogs.windwardreports.com/davidt/2009/11/what-every-developer-should-know-about-time.html (yes by me).