I am building an ISO- and RFC-compliant core for my new Go system, built around Go's time package (http://golang.org/pkg/time/). I am using MySQL and am currently figuring out the best setup for the most important base tables.
I am trying to figure out how to store date-times in the database. I want to strike a good balance between the space the stored time occupies in the database, the query capabilities, and compatibility with UTC and easy timezone conversion that doesn't cause annoying conflicts when inserting data into and retrieving it from Go/MySQL.
I know this sounds a bit weird in the context of the title of my question. But I see a lot of wrappers, ORMs and such still storing UNIX timestamps (microseconds?). I think it would be good to just always store UTC nano timestamps and accept losing the date/time querying functionality. I don't want to get into problems when running the system with tons of different countries/languages/timezones/currencies/translations/etc. (internationalization and localization). I have already encountered these problems with some systems at work, and it drove me nuts to the point where eventually tons of fixes had to be applied through the whole codebase just to get some of the conversions back in order. I don't want that to happen in my system. If it means I always have to do some extra coding to keep all stored times in correct UTC+0, I will take that for granted. Based on ISO 8601 and the timezone offsets and daylight saving time, I will determine the output of the date/time.
The story above is opinion-based, but my actual question is simply which is more efficient: storing Go's timestamp as an INT versus MySQL's TIMESTAMP or DATETIME.
1.) What is best in terms of storage?
2.) What is best in terms of timezone conventions?
3.) What is best in terms of speed and MySQL querying?
The answer to all these questions is simply to store the timestamp in UTC with t.UTC().UnixNano(). Keep in mind that the result is an int64, so it will always occupy 8 bytes in the database regardless of precision.
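For illustration, here is a minimal sketch of that round trip, assuming a hypothetical events table with a BIGINT created_at column (the table, column and DSN are placeholders, not from the question):

package main

import (
    "database/sql"
    "log"
    "time"

    _ "github.com/go-sql-driver/mysql"
)

func main() {
    db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/mydb") // placeholder DSN
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Store the current moment as UTC nanoseconds in a BIGINT column.
    ns := time.Now().UTC().UnixNano()
    if _, err := db.Exec("INSERT INTO events (created_at) VALUES (?)", ns); err != nil {
        log.Fatal(err)
    }

    // Read it back and rebuild a time.Time; only an integer crosses the
    // driver boundary, so no timezone conversion can sneak in.
    var stored int64
    if err := db.QueryRow("SELECT MAX(created_at) FROM events").Scan(&stored); err != nil {
        log.Fatal(err)
    }
    log.Println(time.Unix(0, stored).UTC())
}

Since only a plain integer is stored, the MySQL session timezone and the driver's parseTime/loc settings become irrelevant, which is exactly the conflict-avoidance the question is after.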
Related
I've wasted a lot of time figuring out timezone issues. I know the best practice is to store everything as UTC... but I'm at the point where I don't trust that the UTC timezone is always preserved.
So my question is: is it safe and/or cheaper to store dates as epoch milliseconds in the database instead of a Date type, to avoid the headache of wondering whether server libraries will convert and store timezones properly? I found that storing as milliseconds basically fools any server, or even the database itself, into not converting.
I have found it beneficial to think of time zone information as an input/output trait only. I consider it when reading time information, store/process pure time, and only add time zone when formatting for output.
In other words, a moment is never in a time zone. A moment is an absolute point in time that has various names depending on the time zone you are looking at it from.
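A minimal Go sketch of that discipline (the zone name and input string are illustrative only): the time zone is consulted when parsing input and when formatting output, while storage and processing deal only in the absolute instant.

package main

import (
    "fmt"
    "time"
)

func main() {
    // Input: interpret the user's wall-clock string in their zone.
    loc, err := time.LoadLocation("America/New_York")
    if err != nil {
        panic(err)
    }
    in, err := time.ParseInLocation("2006-01-02 15:04", "2024-06-01 09:30", loc)
    if err != nil {
        panic(err)
    }

    // Store/process: the pure instant, with no zone attached.
    instant := in.UTC()

    // Output: attach a zone only when formatting for display.
    fmt.Println(instant.In(loc).Format(time.RFC3339))
}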
Suppose I have a table where visitor (website visitor) information is stored. Suppose the table structure consists of the following fields:
ID
visitor_id
visit_time (stored as milliseconds in UTC since '1970-01-01 00:00:00')
Millions of rows are in this table and it's still growing.
In that case, if I want to see a report (day vs. visitors) for any timezone, then one solution is:
Solution #1:
Get the timezone of the report viewer (i.e. client)
Aggregate the data from this table considering the client's timezone
Show the result day wise
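A hedged sketch of Solution #1's query in Go (the visits table name is an assumption; a fixed offsetMs ignores DST transitions, which CONVERT_TZ with loaded timezone tables would handle exactly):

import "database/sql"

// reportPerDay groups visits into the viewer's local days by shifting the
// stored epoch milliseconds by the client's UTC offset before bucketing.
func reportPerDay(db *sql.DB, offsetMs int64) (*sql.Rows, error) {
    const q = `
SELECT FLOOR((visit_time + ?) / 86400000) AS local_day,
       COUNT(DISTINCT visitor_id)        AS visitors
FROM   visits
GROUP  BY local_day
ORDER  BY local_day`
    return db.Query(q, offsetMs)
}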
But in that case performance will degrade. Another solution may be the following:
Solution #2:
Using pre-aggregated tables / summary tables in which the client's timezone is ignored, for example the rollup sketched below.
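A hypothetical daily rollup keyed by UTC day (table and column names assumed), refreshed periodically:

import "database/sql"

// refreshRollup maintains a summary table with one row per UTC day; client
// timezones are ignored, so reads are cheap but day boundaries are UTC.
func refreshRollup(db *sql.DB, fromMs, toMs int64) error {
    const rollup = `
INSERT INTO daily_visitors (utc_day, visitors)
SELECT FLOOR(visit_time / 86400000) AS utc_day,
       COUNT(DISTINCT visitor_id)
FROM   visits
WHERE  visit_time >= ? AND visit_time < ?
GROUP  BY utc_day
ON DUPLICATE KEY UPDATE visitors = VALUES(visitors)`
    _, err := db.Exec(rollup, fromMs, toMs)
    return err
}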
But in either case there is a trade-off between performance and correctness.
Solution #1 ensures correctness and Solution #2 ensures better performance.
I want to know what is the best practice in this particular scenario?
The issue of handling time comes up a fair amount when you get into distributed systems, users and matching events between various sources of data.
I would strongly suggest that you ensure all logging systems use UTC. This allows collection from any variety of servers (which are all hopefully kept synchronized with respect to their view of the current UTC time) located anywhere in the world.
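In Go, for instance, the standard logger can be pinned to UTC timestamps regardless of the host's configured local zone:

package main

import "log"

func main() {
    // Emit log timestamps in UTC with microsecond resolution,
    // independent of the server's local time zone setting.
    log.SetFlags(log.LstdFlags | log.LUTC | log.Lmicroseconds)
    log.Println("request received") // timestamped in UTC
}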
Then, as requests come in, you can convert from the user's timezone to UTC. At this point you have the same decision -- perform a real-time query, or perhaps access some previously summarized data.
Whether or not you want to aggregate the data in advance will depend on a number of things: whether it lets you reduce the amount of data kept, how much processing is needed to support queries, how often queries will be performed, and even the cost of building the system versus the amount of use it might see.
With respect to best practices -- keep the display characteristics (e.g. time zone) independent from the processing of the data.
If you haven't already, be sure you consider the lifetime of the data you are keeping. Will you need ten years of back data available? Hopefully not. Do you have a strategy for culling old data when it is no longer required? Do you know how much data you'll have if you store every record (estimate with various traffic growth rates)?
Again, a best practice for larger data sets is to understand how you are going to deal with the size and how you are going to manage that data over time as it ages. This might involve long term storage, deletion, or perhaps reduction to summarized form.
Oh, and to slip in a Matrix analogy, what is really going to bake your noodle in terms of "correctness" is the fact that correctness is not at issue here. Every timezone has a different view of traffic during a "day" in their own zone and every one of them is "correct". Even those oddball time zones that differ from yours by an adjustment that isn't measured only in hours.
I am creating a database to store data from a monitoring system that I have created. The system takes a bunch of data points (~4000) a couple of times every minute and stores them in my database. I need to be able to downsample based on the timestamp. Right now I am planning on using one table with three columns:
results:
1. point_id
2. timestamp
3. value
so the query I'd like to run would be:
SELECT point_id,
       MAX(value) AS value
FROM results
WHERE timestamp BETWEEN date1 AND date2
GROUP BY point_id;
The problem I am running into is that this seems super inefficient with respect to memory. Using this structure, each timestamp would have to be recorded 4000 times, which seems a bit excessive to me. The only solutions I have thought of that reduce the memory footprint of my database require me either to use separate tables (which to my understanding is super bad practice) or to store the data in CSV files, which would require me to write my own code to search through the data (which to my understanding requires me not to be a bum... and would probably be substantially slower to search). Is there a database structure I could implement that doesn't require me to store so much duplicate data?
A database with your data structure is going to be less efficient than custom code. Guess what? That is not unusual.
First, though, I think you should wait until this is actually a performance problem. A timestamp with no fractional seconds requires 4 bytes (see here). So a record would take, say, 4+4+8=16 bytes (4 for point_id, 4 for the timestamp, and 8 for a double floating point value). By removing the timestamp you would get down to 12 bytes -- a savings of 25%. I'm not saying that is unimportant. I am saying that other considerations -- such as getting the code to work -- might be more important.
Based on your data, the difference is between 184 Mbytes/day and 138 Mbytes/day, or 67 Gbytes/year and 50 Gbytes/year. You know, you are going to have to deal with biggish-data issues regardless of how you store the timestamp.
Keeping the timestamp in the data will allow you other optimizations, notably the use of partitions to store each day in a separate file. This should be a big benefit for your queries, assuming the where conditions are partition-compatible. (Learn about partitioning here.) You may also need indexes, although partitions should be sufficient for your particular query example.
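A hedged sketch of such a layout (partition names and date boundaries are illustrative; MySQL accepts UNIX_TIMESTAMP() as the RANGE partitioning function for TIMESTAMP columns):

// Hypothetical DDL: range-partition results by day so that
// WHERE timestamp BETWEEN ... prunes to the relevant partitions.
const createResults = `
CREATE TABLE results (
  point_id  INT       NOT NULL,
  timestamp TIMESTAMP NOT NULL,
  value     DOUBLE    NOT NULL
)
PARTITION BY RANGE (UNIX_TIMESTAMP(timestamp)) (
  PARTITION p20240101 VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-02 00:00:00')),
  PARTITION p20240102 VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-03 00:00:00')),
  PARTITION pmax      VALUES LESS THAN MAXVALUE
)`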
The point of SQL is not that it is the optimal way to solve any given problem. Instead, it offers a reasonable solution to a very wide range of problems, and it offers many capabilities that would be difficult to implement individually. So the time to a reasonable solution is much, much less than the time to develop bespoke code.
Using this structure, each timestamp would have to be recorded 4000 times, which seems a bit excessive to me.
Not really. Date values are not that big and storing the same value for each row is perfectly reasonable.
...use separate tables (which to my understanding is super bad practice)
Who told you that!!! Normalising data (splitting it into separate, linked data structures) is actually good practice -- so long as you don't overdo it -- and SQL is designed to perform well with relational tables. It would be perfectly fine to create a "time" table and link to the data in the other table. It would use a little more memory, but that really shouldn't concern you unless you are working in a very limited-memory environment.
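A sketch of that normalised layout, with all table and column names assumed for illustration:

// One row per distinct sample time, referenced by the readings.
const timesDDL = `
CREATE TABLE sample_times (
  time_id INT AUTO_INCREMENT PRIMARY KEY,
  ts      TIMESTAMP NOT NULL UNIQUE
)`

const readingsDDL = `
CREATE TABLE readings (
  point_id INT    NOT NULL,
  time_id  INT    NOT NULL,
  value    DOUBLE NOT NULL,
  PRIMARY KEY (point_id, time_id),
  FOREIGN KEY (time_id) REFERENCES sample_times (time_id)
)`

// Queries then join the two tables back together:
const query = `
SELECT r.point_id, MAX(r.value) AS value
FROM   readings r
JOIN   sample_times t ON t.time_id = r.time_id
WHERE  t.ts BETWEEN ? AND ?
GROUP  BY r.point_id`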
We are developing an application in C# 4 that uses SQL Server 2008 R2 as its backend. SQL Server Compact 4 is also used for disconnected clients in a few rare scenarios. We are wondering what's the best way to store date/time data into these databases so that:
Data containing different time offsets (coming from different time zones) can co-exist nicely, meaning the values can still be sorted and compared.
Data in SQL Server 2008 R2 and SQL Server Compact 4 can be transferred back and forth seamlessly; this is a secondary requirement that should not compromise the chosen design.
Our main concern is to preserve the local time for each recorded event, but without losing the ability to compare and sort events that have been generated from different time zones and therefore have different time offsets.
We have considered the datetimeoffset data type, since it stores the time offset and because it maps nicely to .NET's DateTimeOffset. However, it is not supported in SQL Server Compact 4. An alternative would be to remove offset info from the database and use a simple datetime data type, so that every piece of data in the database is normalized, and the issues with Compact are fewer. However, this introduces the problem that offset info would need to be reconstructed somehow on retrieval before the user sees the data.
So my question is, are there any best practices or guidelines on how to store date/time values in SQL Server, taking into account that we will need to deal with different time zones, and making the interoperability between 2008 R2 and Compact 4 as easy as possible?
Thank you.
It sounds like the relevant points are:
Your incoming data is a good fit for DateTimeOffset
You only care about the offset at that particular time, so you don't need a real time zone. (An offset isn't a time zone.)
You do care about that original offset - you can't just normalize everything to UTC and ignore the offset entirely.
You want to query on the local time.
It does sound like DateTimeOffset is basically the most appropriate type in this case. You should make sure everyone on the team is clear about what it means though - the offset is the offset when the data was originally received. If you want to display that instant in time in a different time zone, you effectively need to go back to UTC, and find out what the offset would be in that display time zone. It's easy to get confused about this sort of thing :)
If you need to maintain the data in SqlServerCE with full fidelity, you'll probably want a DateTime field and then a separate field for the offset (e.g. in minutes, or as a TimeSpan if SqlServerCE supports that).
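The thread is about C#/SQL Server, but the two-field idea can be sketched in Go with a fixed-offset zone (the helper functions are purely illustrative):

package main

import (
    "fmt"
    "time"
)

// splitForStorage returns the UTC instant plus the original offset in
// minutes -- the two values you would persist in separate columns.
func splitForStorage(t time.Time) (time.Time, int) {
    _, offsetSec := t.Zone()
    return t.UTC(), offsetSec / 60
}

// rebuild restores the original local time from the stored pair.
func rebuild(utc time.Time, offsetMin int) time.Time {
    return utc.In(time.FixedZone("", offsetMin*60))
}

func main() {
    loc, _ := time.LoadLocation("Europe/Madrid")
    orig := time.Date(2024, 7, 1, 12, 0, 0, 0, loc)

    utc, off := splitForStorage(orig)
    fmt.Println(utc, off)          // what goes into the two columns
    fmt.Println(rebuild(utc, off)) // the original wall-clock time again
}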
You are probably right to use DateTimeOffset on the server. You might also want to read my answer on DateTime vs DateTimeOffset.
On the client, where you are using SQLCE, store a DateTime with UTC values. When you send the data to the server, you can use the local time zone of the client to determine the DateTimeOffset that the UTC value corresponds to.
If it's possible that the user might be changing their time zones, then you might also need to store the time zone's id in the client database. But you would just use this during conversion. There's no need to send it to the server unless you might be editing those values on the server or in some other client.
Don't try storing the time on the client in the local time of the client. You will encounter ambiguities. For example, when daylight saving time rolls backwards, you don't want two different possible UTC times for the same local time.
Why not use datetime and always store the value as a UTC value? Then you can format it to the end user's (display) time zone when required.
I would love to hear some opinions or thoughts on a mysql database design.
Basically, I have a Tomcat server which receives different types of data from about 1000 systems out in the field. Each of these systems is unique and will be reporting unique data.
The data sent can be categorized as frequent and infrequent data. The infrequent data is only sent about once a day and doesn't change much -- it is basically just configuration-based data.
Frequent data is sent every 2-3 minutes while the system is turned on, and represents the current state of the system.
This data needs to be stored in the database for each system and be accessible at any given time from a PHP page. Essentially, for any system in the field, a PHP page needs to be able to access all the data on that client system and display it. In other words, the database needs to show the state of the system.
The information itself is all text-based, and there is a lot of it. The config data (which doesn't change much) consists of key-value pairs, and there are currently about 100 of them.
My idea for the design was to have 100+ columns and 1 row for each system to hold the config data. But I am worried about having that many columns, mainly because it isn't very future-proof if I need to add columns later. I am also worried about insert speed if I do it that way. This might blow out to a 2000-row x 200-column table that gets accessed about 100 times a second, so I need to cater for this in my initial design.
I am also wondering if there are any design philosophies out there that cater for frequently changing and seldom-changing data depending on the storage engine. That would make sense, as I want to keep INSERT/UPDATE time low, and I don't care too much about the SELECT time from PHP.
I would also love to know how to split up the data. I.e., if frequently changing data can be categorised in a few different ways, should I have a bunch of tables representing the data and join them on selects? I am worried about this because I will probably have to produce a report showing common properties between all systems (i.e. show all systems with a certain condition).
I hope I have provided enough information here for someone to point me in the right direction; any help on the matter would be great. Or if someone has done something similar and can offer advice, I would be very appreciative. Thanks heaps :)
~ Dan
I've posted some questions in a comment. It's hard to give you advice about your rapidly changing data without knowing more about what you're trying to do.
For your configuration data, don't use a 100-column table. Wide tables are notoriously hard to handle in production. Instead, use a four-column table containing these columns:
SYSTEM_ID  VARCHAR   System identifier
POSTTIME   DATETIME  The time the information was posted
NAME       VARCHAR   The name of the parameter
VALUE      VARCHAR   The value of the parameter
The first three of these columns are your composite primary key.
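As DDL, that might look like this sketch (the table name and VARCHAR sizes are assumptions):

const configDDL = `
CREATE TABLE system_config (
  system_id VARCHAR(64)  NOT NULL,  -- system identifier
  posttime  DATETIME     NOT NULL,  -- when the information was posted
  name      VARCHAR(128) NOT NULL,  -- parameter name
  value     VARCHAR(255) NOT NULL,  -- parameter value
  PRIMARY KEY (system_id, posttime, name)
)`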
This design has the advantage that it grows (or shrinks) as you add to (or subtract from) your configuration parameter set. It also allows for the storing of historical data. That means new data points can be INSERTed rather than UPDATEd, which is faster. You can run a daily or weekly job to delete history you're no longer interested in keeping.
(Edit: if you really don't need history, get rid of the POSTTIME column and use MySQL's nice INSERT ... ON DUPLICATE KEY UPDATE extension when you post stuff. See http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html)
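For example, a sketch against the POSTTIME-less variant of the table above:

const upsert = `
INSERT INTO system_config (system_id, name, value)
VALUES (?, ?, ?)
ON DUPLICATE KEY UPDATE value = VALUES(value)`

_, err := db.Exec(upsert, systemID, name, value)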
If your rapidly changing data is similar in form (name/value pairs) to your configuration data, you can use a similar schema to store it.
You may want to create a "current data" table using the MEMORY access method for this stuff. MEMORY tables are very fast to read and write because the data is all in RAM in your MySQL server. The downside is that a MySQL crash and restart will give you an empty table, with the previous contents lost. (MySQL servers crash very infrequently, but when they do they lose MEMORY table contents.)
You can run an occasional job (every few minutes or hours) to copy the contents of your MEMORY table to an on-disk table if you need to save history.
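A hedged Go sketch of that flush job (table names and the interval are assumptions):

import (
    "database/sql"
    "log"
    "time"
)

// flushLoop copies the MEMORY table into an on-disk history table every
// ten minutes; MEMORY contents are lost on a MySQL restart, so this
// bounds how much history a crash can cost you.
func flushLoop(db *sql.DB) {
    ticker := time.NewTicker(10 * time.Minute)
    defer ticker.Stop()
    for range ticker.C {
        if _, err := db.Exec(`INSERT INTO state_history SELECT * FROM current_state`); err != nil {
            log.Println("flush failed:", err)
        }
    }
}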
(Edit: You might consider adding memcached (http://memcached.org/) to your web application system in the future to handle a high read rate, rather than constructing a database design for version 1 that handles a high read rate. That way you can see which parts of your overall app design have trouble scaling. I wish somebody had convinced me to do this in the past, rather than overdesigning for early versions.)