How to represent date and/or time information in JSON?

JSON text (RFC 4627) has unambiguous representations of objects, arrays, strings, numbers, Boolean values (literally true or false) and null. However, it defines nothing for representing time information such as dates and times of day, which is very common in applications. What are the current methods in use to represent time in JSON, given the constraints and grammar laid out in RFC 4627?
Note to respondents: The purpose of this question is to document the various methods known to be in circulation along with examples and relative pros and cons (ideally from field experience).

The only representation that I have seen in use (though, admittedly, my experience is limited to DOJO) is ISO 8601, which works nicely, and represents just about anything you could possibly think of.
For examples, you can visit the link above.
Pros:
Represents pretty much anything you could possibly throw at it, including timespans (e.g. 3 days, 2 hours).
Cons:
Umm... I don't know, actually. Other than perhaps it might take a bit of getting used to? It's certainly easy enough to parse, even if there aren't built-in functions to parse it already.
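To make the format concrete, here is a minimal sketch of an ISO 8601 timestamp and an ISO 8601 duration embedded in JSON, consumed from TypeScript (the field names createdAt and estimatedDuration are just illustrative):
// P3DT2H is the ISO 8601 duration for 3 days, 2 hours.
const payload = '{ "createdAt": "2015-04-25T18:30:00Z", "estimatedDuration": "P3DT2H" }';
const doc = JSON.parse(payload);
// Modern engines parse the timestamp form directly:
const createdAt = new Date(doc.createdAt);
console.log(createdAt.toISOString()); // "2015-04-25T18:30:00.000Z"
// Durations are not handled by Date; you would need a library or a small parser of your own.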

ISO 8601 seems like a natural choice, but if you'd like to parse it with JavaScript running in a browser, you will need to use a library, because browser support for the parts of the JavaScript Date object that can parse ISO 8601 dates is inconsistent, even in relatively new browsers. Another problem with ISO 8601 is that it is a large, rich standard, and date/time libraries support only part of it, so you will have to pick a subset of ISO 8601 that is supported by the libraries you use.
Instead, I represent times as the number of milliseconds since 1970-01-01T00:00Z. This is understood by the constructor of the Date object even in much older browsers, at least going back to IE7 (the oldest I have tested).
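For example (a sketch; the field name created is just illustrative):
// Producing and consuming epoch milliseconds.
const wire = JSON.stringify({ created: Date.now() }); // e.g. {"created":1431648000000}
const parsed = JSON.parse(wire);
const when = new Date(parsed.created);                // the millisecond form works even in old browsers
console.log(when.toUTCString());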

There is no date literal defined, so use whatever is easiest for you. For most people, that's either a string of the UTC output or a long integer holding the UTC-based timecode.
Read this for a bit more background: http://msdn.microsoft.com/en-us/library/bb299886.aspx
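For illustration, here is the same instant in both shapes (the field name updatedAt is just an assumption for the example):
const asString = { updatedAt: new Date(1431648000000).toUTCString() }; // UTC string output
const asNumber = { updatedAt: 1431648000000 };                         // epoch-based integer timecode
console.log(JSON.stringify(asString)); // {"updatedAt":"Fri, 15 May 2015 00:00:00 GMT"}
console.log(JSON.stringify(asNumber)); // {"updatedAt":1431648000000}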

I recommend using RFC 3339 format, which is nice and simple, and understood by an increasing number of languages, libraries, and tools.
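For example (the field name timestamp is just illustrative), both of these are valid RFC 3339 representations of the same instant:
const utc    = { timestamp: "2015-05-15T13:45:30Z" };       // UTC ("Zulu") form
const offset = { timestamp: "2015-05-15T15:45:30+02:00" };  // same instant with a local offset
console.log(JSON.stringify(utc));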
Unfortunately, RFC 3339, Unix epoch time, and JavaScript millisecond time are all still not quite accurate, since none of them account for leap seconds! At some point we're all going to have to revisit time representations yet again. Maybe next time we can be done with it.

Sorry to comment on such an old question, but in the intervening years more solutions have turned up.
Representing date and/or time information in JSON is a special case of the more general problem of representing complex types and complex data structures in JSON. Part of what makes the problem tricky is that if you represent complex types like timestamps as JSON objects, then you also need a way to distinguish an ordinary associative array or object that merely happens to look like your timestamp representation from an actual timestamp.
Google's protocol buffers have a JSON mapping which has the notion of a timestamp type, with defined semantics.
MongoDB's BSON has an Extended JSON format which represents a date as { "$date": "2017-05-17T23:09:14.000000Z" }.
Both can also express way more complex structures in addition to datetime.
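For comparison, a rough sketch of the two encodings for the same instant (the field name eventTime is just illustrative):
// Protocol buffers' JSON mapping renders a google.protobuf.Timestamp as an RFC 3339 string:
const protobufStyle = { eventTime: "2017-05-17T23:09:14.000000Z" };
// MongoDB Extended JSON wraps the value in a single-key object:
const extendedJson  = { eventTime: { "$date": "2017-05-17T23:09:14.000000Z" } };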

Related

How to store prices in Couchbase without losing precision?

I have a Couchbase database and I would like to store price without losing precision - double is really not good enough for my application. However, it seems that there is no support for currency data types in Couchbase.
Is there a preferred solution for this problem for this database engine?
I was thinking about storing each price twice, once as a string and once as a double, so that I can still run inequality queries on the price. It's better than nothing, but not really a nice solution.
This is really a problem with JSON, but since Couchbase uses pure JSON, it applies :)
One solution that I've seen is to store it as an integer.
For example, if you want to store a price of $129.99, you would store the number 12999. This could be kinda annoying, but depending on what language/framework you're using, it could be relatively easy to customize your (de)serializer to handle this automatically. Or you could create a calculated property in your class (assuming you're using OOP). Or you could use AOP.
But in any case, your precision is stored. Your string solution would also work, with similar caveats.
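A minimal TypeScript sketch of the integer-cents approach (the Price class and field names are invented for illustration, not part of any Couchbase SDK):
class Price {
  constructor(public readonly cents: number) {}
  static fromDollars(d: number): Price { return new Price(Math.round(d * 100)); }
  get dollars(): number { return this.cents / 100; } // calculated property for display
  toJSON(): number { return this.cents; }            // serialize 12999, not 129.99
}

const doc = JSON.stringify({ sku: "ABC-1", price: Price.fromDollars(129.99) });
console.log(doc); // {"sku":"ABC-1","price":12999}
const roundTripped = new Price(JSON.parse(doc).price);
console.log(roundTripped.dollars); // 129.99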

Why doesn't JSON support integer?

The semantics of integers and doubles are quite different. Lua recently also added integer support (albeit as a subtype of number). Python is, in a sense, perfect in its type completeness, and I am not even talking about other more heavyweight languages like C++/C#/Java ...
There are systems that treat integers significantly differently from floating point numbers (or doubles), and when using JSON it is a mental burden that all the integers written to the wire come back as doubles. In the high-level application logic one could probably differentiate based on the property, but the extra double-to-int cast makes the code unintuitive and misleading; people ask questions like: why do you cast here? Are you sure this is an integer?
So in a sense, the intent is not completely clear when there is no explicit integer support.
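For example, in JavaScript/TypeScript the round trip erases the writer's intent entirely:
console.log(JSON.stringify(2.0));                   // "2"
console.log(JSON.parse("2.0") === JSON.parse("2")); // true
// A reader of {"count": 2} cannot tell from the JSON alone whether the producer
// meant an integer count or a double that merely happens to be whole.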
Can anyone shed some light?
Thanks!

Pattern match to identify date format

My source has different date formats, as shown below, and I'm looking for an algorithm to identify the source date pattern. I have tried the Select Values and Fuzzy Match steps in Pentaho Data Integration (PDI).
Date Column (String)
"20150210"
"20050822--"
"2014-02-May"
"20051509--"
"02-May-2014"
"2013-May-12"
"12DEC2013"
"15050815"
"May-02-2014"
"12312015"
I know that in PDI we can achieve this through a JavaScript step by writing if conditions for each pattern, but that is not a good idea; it brings the transformation to a crawl when dealing with huge numbers of records. I'm looking for an efficient way to detect the date pattern.
I believe this is a very common issue in ETL projects. Here I'm trying to understand how enterprise vendors like SAS Data Integration, Informatica and SSIS provide an easy way to handle it.
Do we have any algorithm to identify the source pattern? If so, which one?
The formats listed above are not an exhaustive list.
One cannot simply determine a single, unambiguous format for any given input.
Consider all of the following formats completely valid:
MM-dd-yy
dd-MM-yy
yy-MM-dd
As stated in a comment by @billinkc, what would you call 01-02-05 in that case?
If anything, your problem would be solvable only if you took a whole data set into account (e.g. you know that the next X rows all use the same date format). Then you can look at it as a linear problem with some constraints that help you determine the date format. Even then, you can't be sure you'll get a definite answer; you can only increase the probability of one.
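One way to at least narrow the candidates is to map regular expressions to format masks and report every mask that matches; a rough TypeScript sketch (the mask list is illustrative, not exhaustive, and ambiguous inputs deliberately return several candidates):
const candidates: Array<[RegExp, string]> = [
  [/^\d{8}$/,                   "yyyyMMdd"],    // could equally be ddMMyyyy, MMddyyyy, ...
  [/^\d{2}-[A-Za-z]{3}-\d{4}$/, "dd-MMM-yyyy"],
  [/^[A-Za-z]{3}-\d{2}-\d{4}$/, "MMM-dd-yyyy"],
  [/^\d{4}-[A-Za-z]{3}-\d{2}$/, "yyyy-MMM-dd"],
  [/^\d{2}[A-Za-z]{3}\d{4}$/,   "ddMMMyyyy"],
];

function possibleFormats(value: string): string[] {
  return candidates.filter(([re]) => re.test(value)).map(([, mask]) => mask);
}

console.log(possibleFormats("12DEC2013")); // ["ddMMMyyyy"]
console.log(possibleFormats("20150210")); // ["yyyyMMdd"] -- still only a guess, as discussed above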

MongoDB: should I use string instead of date?

In my web project using Angular, Node and MongoDB, dates are not natively supported by the JSON serializer. There is a workaround for this problem, as shown here. However, I wonder what the benefit is of saving the date as a date object instead of a string in MongoDB. I'm not that far along with the project, so I don't see the difference yet.
By saving your dates not as dates but as strings, you are missing out on some very useful features:
MongoDB can query date ranges with $gt and $lt (see the sketch at the end of this answer).
In version 3.0, the aggregation framework got many useful aggregation operators for date handling. None of those work on strings and few of them can be adequately substituted by string operators.
MongoDB dates are internally handled in UNIX epoch, so nasty details like saving timestamps from different timezones or daylight saving times are a non-issue.
A BSON Date is just 8 bytes. A date in the minimal form of YYYYMMDD takes 12 bytes (strings in BSON are prefixed with a 4-byte integer for the length). When you store it as an ISO date string using everything the ISO 8601 standard has to offer (date, time accurate to the millisecond, and timezone), you are at 32 bytes - four times the storage space.
You need to know if any of this matters for your project.
When you really want to avoid using the BSON Date type, consider storing your dates as a number representing the elapsed milliseconds/seconds/hours/days (whatever is appropriate for your use case) since a fixed point in time, instead of as a string. That way you retain all of the advantages except point 2.
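As a sketch of point 1, here is a date-range query with the Node.js driver (the database, collection and field names are just illustrative):
import { MongoClient } from "mongodb";

async function ordersInMay2015(client: MongoClient) {
  const orders = client.db("shop").collection("orders");
  // $gte/$lt compare BSON Dates here; with arbitrary string formats the comparison
  // would be lexicographic instead, which only works for strictly sortable formats.
  return orders.find({
    createdAt: {
      $gte: new Date("2015-05-01T00:00:00Z"),
      $lt: new Date("2015-06-01T00:00:00Z"),
    },
  }).toArray();
}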
You should at least use ISO dates if you go for this approach. I would, however, argue that there are benefits to storing date values as date objects. Storing dates as date objects will allow you to add indices and should also help with date range queries. Having said this, many developers seem to be happy to store dates as strings; see What is the best way to store dates in MongoDB?

Does any programming language support defining constraints on primitive data types?

Last night I was thinking that programming languages could have a feature that lets us constrain the values assigned to primitive data types.
For example, I should be able to say that my variable of type int can only hold values between 0 and 100:
int<0, 100> progress;
This would then act as a normal integer in all scenarios, except that you won't be able to specify values outside the range defined in the constraint. The compiler will not compile the code progress = 200.
This constraint can be carried over with type information.
Is this possible? Is it done in any programming language? If yes then which language has it and what is this technique called?
It is generally not possible. It makes little sense to use integers without any arithmetic operators. With arithmetic operators you have this:
int<0,100> x, u, v;
...
x = u + v; // is it in range?
If you're willing to do checks at run-time, then yes, several mainstream languages support it, starting with Pascal.
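To make the run-time variant concrete, a minimal TypeScript sketch (the RangedInt name and its API are invented for illustration):
class RangedInt {
  constructor(readonly lo: number, readonly hi: number, readonly value: number) {
    if (!Number.isInteger(value) || value < lo || value > hi) {
      throw new RangeError(`${value} is outside [${lo}, ${hi}]`);
    }
  }
  add(other: RangedInt): RangedInt {
    // The check happens here, answering "x = u + v -- is it in range?" at run time.
    return new RangedInt(this.lo, this.hi, this.value + other.value);
  }
}

const u = new RangedInt(0, 100, 60);
const v = new RangedInt(0, 100, 55);
try {
  u.add(v); // 60 + 55 = 115, out of range
} catch (e) {
  console.log((e as Error).message); // "115 is outside [0, 100]"
}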
I believe Pascal (and Delphi) offers something similar with subrange types.
I think this is not possible at all in Java and in Ruby (well, in Ruby probably it is possible, but requires some effort). I have no idea about other languages, though.
Ada allows something like what you describe with ranges:
type My_Int is range 1..100;
So if you try to assign a value to a My_Int that's less than 1 or greater than 100, Ada will raise the exception Constraint_Error.
Note that I've never used Ada. I've only read about this feature, so do your research before you plunge in.
It is certainly possible. There are many different techniques to do that, but 'dependent types' is the most popular.
The constraints can even be checked statically at compile time by the compiler. See, for example, Agda2 and ATS (ats-lang.org).
Weaker forms of your 'range types' are possible without full dependent types, I think.
Some keywords to search for research papers:
- Guarded types
- Refinement types
- Subrange types
Certainly! In case you missed it: C. Do you C? You don't C? You don't count short as a constraint on Integer? Ok, so C only gives you pre-packaged constrained types.
BTW: It seems the answer that Pascal has subrange types misses the point of them. In Pascal, array bounds violations are not possible, because the array index must be of the same type as the array was declared with. In turn this means that to use an integer index you must coerce it down to the subrange, and that is where the run-time check is done, not when accessing the array.
This is a very important idea because it means a for loop over an array index type may access the array components safely without any run time checking.
Pascal has subranges. Ada extended that a bit, so you can do something like a subrange, or you can create an entirely new type with characteristics of the existing type, but not compatible with it (e.g., even if it was in the right range, you wouldn't be able to assign an Integer to your new type based off of Integer).
C++ doesn't support the idea directly, but is flexible enough that you can implement it if you want to. If you decide to support all the compound assignment operators (+=, -=, *=, etc.) this can be a lot of work though.
Other languages that support operator overloading (e.g., ML and company) can probably support it in much the same way as C++.
Also note that there are a few non-trivial decisions involved in the design. In particular, if the type is used in a way that could/does result in an intermediate result that overflows the specified range, but produces a final result that's within the specified range, what do you want to happen? Depending on your situation, that might be an error, or it might be entirely acceptable, and you'll have to decide which.
I really doubt that you can do that. After all, these are primitive data types, with emphasis on primitive!
Adding a constraint will make the type a subclass of its primitive state, thus extending it.
From Wikipedia:
a basic type is a data type provided by a programming language as a basic building block. Most languages allow more complicated composite types to be recursively constructed starting from basic types.
a built-in type is a data type for which the programming language provides built-in support.
So personally, even if it were possible, I wouldn't do it, since it's bad practice. Instead, just create an object that wraps this type and enforces the constraints (which I am sure you have already thought of).
SQL has domains, which consist of a base type together with a dynamically-checked constraint. For example, you might define the domain telephone_number as a character string with an appropriate number of digits, valid area code, etc.