Does JSON have any numeric limits or is it basically "anything goes" with respect to numbers and it's just up to the receiving party to validate and do error checking on the numeric values. For example:
[12e+10000, 9223372036854775808]
The above would be an out-of-bounds value in a language that only supported float64 and int64, so how is this usually done? Or does JSON only specify the grammar (for example for an int 0|[1-9]\d+ and its up to the processor to determine what to do with it?
The JSON standard does not set any limits on the magnitude or precision of a number; it is only concerned with the syntax. The RFC version is explicit about the fact that the standard permits an implementation to set numeric limits; it does not impose minimum limits.
An implementation which sets a small limit on number magnitude or precision will create interoperability issues; an implementation which outputs numbers outside the possibilities of IEEE754 binary64 may also cause interop issues. Many processors allow numbers to be transmitted as strings, leaving it to the receiver to interpret the string as a number (perhaps with the aid of a schema); this is the only way to transmit infinities and NaN values.
I cannot convert correctly binary numbers to decimal using ASN.1 compilation. Those binaries correspond with lat and long.
lat 1001110010100100101010110011111
long 01101100100101011100100100111000
If I convertem to decimal I get 1314018719 and 1821755704, respectively. However, the coordinates should be this:
enter image description here
I've tried multiple converters but without exit. Any clue?
I don't how you think the encoding works. ASN.1 PER is specified by ITU-T X.680 and ITU-T X.691. (UPER is unaligned PER, a variant of PER defined in the same specs.) The rules for integers include doing things such as encoding as an offset from a lower bound, using a length determinant and minimal octets, using a fixed number of octets and no length determinant, etc., depending on the INTEGER type's constraints. Nobody can tell you how to treat the data you've provided without having the ASN.1 schema and knowing what part of it relates to this data, as well as knowing whether the bits you have include the length determinant or not (if there is one).
SUMMARY
some support for JSON was added to XSLT 3.0 + XPath/XQuery 3.1
unfortunately, JSON number types are handled as IEEE double, subjecting the data to loss of numeric precision
I am considering writing a set of custom functions based on Java BigDecimal instead of IEEE double
Q: In order to support numeric precision beyond that offered by IEEE double, is it reasonable for me to consider cloning the JSON support in saxon 9.8 HE and building a set of customized functions which use BigDecimal instead of IEEE double?
DETAIL
I need to perform a number of transformations of JSON data.
XSLT 3.0 + XPath 3.1 + XQuery 3.1 have some support for JSON through json-to-xml + parse-json.
https://www.w3.org/TR/xpath-functions-31/#json-functions
https://www.saxonica.com/papers/xmlprague-2016mhk.pdf
I have hit a significant snag related to treatment of numeric data types.
My JSON data includes numeric values that exceed the precision of IEEE double-floats. In Java, my numeric values need to be processed using BigDecimal.
https://www.w3.org/TR/xpath-functions-31/#json-to-xml-mapping
states
Information may however be lost if (a) JSON numbers are not exactly representable as double-precision floating point ...
In addition, I have taken a look at the saxonica 9.8 HE reference implementation source for ./ma/json/JsonParser.java and confirm that the private method parseNumericLiteral() returns a primitive double.
I am considering cloning the saxon 9.8 HE JSON support code and using this as the basis for a set of customized functions which uses Java BigDecimal instead of double in order to retain numeric precision through the transformations ...
Q: In order to support numeric precision beyond that offered by IEEE double, is it reasonable for me to consider cloning the JSON support in saxon 9.8 HE and building a set of customized functions which use BigDecimal instead of IEEE double?
Q: Are you aware of any unforeseen issues which I may encounter?
The XML data model defines decimal numbers as having any finite precision.
https://www.w3.org/TR/xmlschema-2/#decimal
The JSON data model defines numbers as having any finite precision.
https://www.rfc-editor.org/rfc/rfc7159#page-6
Not surprisingly, both warn of potential interoperability issues with numeric values with extended precision.
Q: What was the rationale for explicitly defining the JSON number type in XPath/XQuery as IEEE double?
THE END
This is what the RFC says:
This specification allows implementations to set limits on the range
and precision of numbers accepted. Since software that implements
IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is
generally available and widely used, good interoperability can be
achieved by implementations that expect no more precision or range
than these provide, in the sense that implementations will
approximate JSON numbers within the expected precision. A JSON
number such as 1E400 or 3.141592653589793238462643383279 may indicate
potential interoperability problems, since it suggests that the
software that created it expects receiving software to have greater
capabilities for numeric magnitude and precision than is widely
available.
That, to my mind, is a pretty clear warning: it says that although the JSON grammar allows arbitrary precision in numeric values, you can't rely on JSON consumers to retain that precision, and it follows that if you want to convey high-precision numeric values, it would be better to convey them as strings.
The rules for fn:json-to-xml and fn:xml-to-json need to be read carefully:
The fn:json-to-xml function creates an element whose string value is
lexically the same as the JSON representation of the number. The
fn:xml-to-json function generates a JSON representation that is the
result of casting the (typed or untyped) value of the node to
xs:double and then casting the result to xs:string. Leading and
trailing whitespace is accepted. Since JSON does not impose limits on
the range or precision of numbers, these rules mean that conversion
from JSON to XML will always succeed, and will retain full precision
in the lexical representation unless the data model implementation is
one that reconstructs the string value from the typed value. In the
reverse direction, conversion from XML to JSON may fail if the value
is infinity or NaN, or if the string value is such that casting to
xs:double produces positive or negative infinity.
Although I probably wrote these words, I'm not sure I recall the exact rationale for why the decision was made this way, but it does suggest that the matter received careful thought. I suspect the thinking was that when you consume JSON, you should try to preserve all the information that is present in the input, but when you generate JSON, you should try to generate something that will be acceptable to all consumers. (The famous maxim about being liberal in what you accept and conservative in what you produce.)
Your analysis of the Saxon source isn't quite correct. You say:
the private method parseNumericLiteral() returns a primitive double.
which is true enough; but the original lexical representation is retained, and when the parser communicates the value to a JsonReceiver, it passes both the Java double and the string representation, so the JsonReceiver has access to both (which is needed for a correct implementation of fn:json-to-xml).
Some APIs, like the paypal API use a string type in JSON to represent a decimal number. So "7.47" instead of 7.47.
Why/when would this be a good idea over using the json number value type? AFAIK the number value type allows for infinite precision as well as scientific notation.
The main reason to transfer numeric values in JSON as strings is to eliminate any loss of precision or ambiguity in transfer.
It's true that the JSON spec does not specify a precision for numeric values. This does not mean that JSON numbers have infinite precision. It means that numeric precision is not specified, which means JSON implementations are free to choose whatever numeric precision is convenient to their implementation or goals. It is this variability that can be a pain if your application has specific precision requirements.
Loss of precision generally isn't apparent in the JSON encoding of the numeric value (1.7 is nice and succinct) but manifests in the JSON parsing and intermediate representations on the receiving end. A JSON parsing function would quite reasonably parse 1.7 into an IEEE double precision floating point number. However, finite length / finite precision decimal representations will always run into numbers whose decimal expansions cannot be represented as a finite sequence of digits:
Irrational numbers (like pi and e)
1.7 has a finite representation in base 10 notation, but in binary (base 2) notation, 1.7 cannot be encoded exactly. Even with a near infinite number of binary digits, you'll only get closer to 1.7, but you'll never get to 1.7 exactly.
So, parsing 1.7 into an in-memory floating point number, then printing out the number will likely return something like 1.69 - not 1.7.
Consumers of the JSON 1.7 value could use more sophisticated techniques to parse and retain the value in memory, such as using a fixed-point data type or a "string int" data type with arbitrary precision, but this will not entirely eliminate the specter of loss of precision in conversion for some numbers. And the reality is, very few JSON parsers bother with such extreme measures, as the benefits for most situations are low and the memory and CPU costs are high.
So if you are wanting to send a precise numeric value to a consumer and you don't want automatic conversion of the value into the typical internal numeric representation, your best bet is to ship the numeric value out as a string and tell the consumer exactly how that string should be processed if and when numeric operations need to be performed on it.
For example: In some JSON producers (JRuby, for one), BigInteger values automatically output to JSON as strings, largely because the range and precision of BigInteger is so much larger than the IEEE double precision float. Reducing the BigInteger value to double in order to output as a JSON numeric will often lose significant digits.
Also, the JSON spec (http://www.json.org/) explicitly states that NaNs and Infinities (INFs) are invalid for JSON numeric values. If you need to express these fringe elements, you cannot use JSON number. You have to use a string or object structure.
Finally, there is another aspect which can lead to choosing to send numeric data as strings: control of display formatting. Leading zeros and trailing zeros are insignificant to the numeric value. If you send JSON number value 2.10 or 004, after conversion to internal numeric form they will be displayed as 2.1 and 4.
If you are sending data that will be directly displayed to the user, you probably want your money figures to line up nicely on the screen, decimal aligned. One way to do that is to make the client responsible for formatting the data for display. Another way to do it is to have the server format the data for display. Simpler for the client to display stuff on screen perhaps, but this can make extracting the numeric value from the string difficult if the client also needs to make computations on the values.
I'll be a bit contrarian and say that 7.47 is perfectly safe in JSON, even for financial amounts, and that "7.47" isn't any safer.
First, let me address some misconceptions from this thread:
So, parsing 1.7 into an in-memory floating point number, then printing out the number will likely return something like 1.69 - not 1.7.
That is not true, especially in the context of IEEE 754 double precision format that was mentioned in that answer. 1.7 converts into an exact double 1.6999999999999999555910790149937383830547332763671875 and when that value is "printed" for display, it will always be 1.7, and never 1.69, 1.699999999999 or 1.70000000001. It is 1.7 "exactly".
Learn more here.
7.47 may actually be 7.4699999923423423423 when converted to float
7.47 already is a float, with an exact double value 7.46999999999999975131004248396493494510650634765625. It will not be "converted" to any other float.
a simple system that simply truncates the extra digits off will result in 7.46 and now you've lost a penny somewhere
IEEE rounds, not truncates. And it would not convert to any other number than 7.47 in the first place.
is the JSON number actually a float? As I understand it's a language independent number, and you could parse a JSON number straight into a java BigDecimal or other arbitrary precision format in any language if so inclined.
It is recommended that JSON numbers are interpreted as doubles (IEEE 754 double-precision format). I haven't seen a parser that wouldn't be doing that.
And no, BigDecimal(7.47) is not the right way to do it – it will actually create a BigDecimal representing the exact double of 7.47, which is 7.46999999999999975131004248396493494510650634765625. To get the expected behavior, BigDecimal("7.47") should be used.
Overall, I don't see any fundamental issue with {"price": 7.47}. It will be converted into a double on virtually all platforms, and the semantics of IEEE 754 guarantee that it will be "printed" as 7.47 exactly and always.
Of course floating point rounding errors can happen on further calculations with that value, see e.g. 0.1 + 0.2 == 0.30000000000000004, but I don't see how strings in JSON make this better. If "7.47" arrives as a string and should be part of some calculation, it will need to be converted to some numeric data type anyway, probably float :).
It's worth noting that strings also have disadvantages, e.g., they cannot be passed to Intl.NumberFormat, they are not a "pure" data type, e.g., the dot is a formatting decision.
I'm not strongly against strings, they seem fine to me as well but I don't see anything wrong on {"price": 7.47} either.
The reason I'm doing it is that the SoftwareAG parser tries to "guess" the java type from the value it receives.
So when it receives
"jackpot":{
"growth":200,
"percentage":66.67
}
The first value (growth) will become a java.lang.Long and the second (percentage) will become a java.lang.Double
Now when the second object in this jackpot-array has this
"jackpot":{
"growth":50.50,
"percentage":65
}
I have a problem.
When I exchange these values as Strings, I have complete control and can cast/convert the values to whatever I want.
Summarized Version
Just quoting from #dthorpe's answer, as I think this is the most important point:
Also, the JSON spec (http://www.json.org/) explicitly states that NaNs and Infinities (INFs) are invalid for JSON numeric values. If you need to express these fringe elements, you cannot use JSON number. You have to use a string or object structure.
I18N is another reason NOT to use String for decimal numbers
In tens of countries, such as Germany and France, comma (,) is the decimal separator and dot (.) is the thousands separator. See the list on Wikipedia.
If your JSON document carries decimal numbers as string, you're relying on all possible API consumers using the same number format conversion (which is a step after the JSON parsing). There's the risk of incorrect conversion due to inverted use of comma and dot as separators.
If you use number for decimal numbers that risk is averted.
When I export my bigquery tables in JSON format, the INTEGER fields are converted to strings. Is there any way to maintain the integer data type when exporting?
Here are the minimum steps to reproduce the integer->string conversion phenomenon:
Run the query SELECT INTEGER(1) AS myInt and save the result to a table. Note that the output table schema shows the type as INTEGER.
Export the table in JSON format. The output will be: {"myInt":"1"}
In the JSON format, "1" is a string, not an integer.
This is not currently possible; the reasoning is because of an unfortunate combination of the Javascript spec, IEEE floating point precision, JSON, and BigQuery integer sizes.
In Javascript, all numbers must be representable as IEEE754 double-precision floating point values. Javascript parses JSON numbers into javascript numbers. BigQuery uses 64-bit signed integer values.
The problem comes because not all 64-bit integer values can be represented as IEEE 754 double precision floating point values. (it is easy to see why: IEEE 754 double-precision floating point uses 64 bits but can represent lots of things that aren't integers; therefore, there must be 64 bit integers that it can't represent).
So in order to make BigQuery JSON responses work in Javascript, integer values are wrapped in quotes, so that no precision is lost.
That said ... the decision to represent integers as strings in API request makes sense, since many of the callers of the API will be in javascript. There doesn't seem to be as compelling of an argument to not represent integers as numbers when exporting data. (other than to change it now would be a breaking change).
Can you file a bug at the BigQuery issue tracker to fix this? (it would likely involve another flag in the export configuration).