BigDecimal to Double in order to add to JSON Object

How can we convert a BigDecimal into a Double without losing precision in Kotlin? I need to put it in a JSON response.
I'm using Vert.x and JsonObject, which internally uses Jackson as its object mapper. I've tried converting a BigDecimal with scale 2 to Double with toDouble.
Example:
Currently:
BigDecimal("0.000") -> Response: { amount: 0.0 }
What I need:
BigDecimal("0.000") -> Response: { amount: 0.000 }

I'm afraid you can't convert a BigDecimal into a Double without losing precision, for several reasons:
There are many more possible values for BigDecimal than for Double, so the conversion is necessarily lossy.
Doubles are 64-bit, so can't have more than 2⁶⁴ distinct values, while BigDecimals are effectively unlimited.
BigDecimals store decimal fractions, while Doubles store binary fractions.  There's very little overlap between the two, so in most cases the conversion will need to round the value.
Both can store integers exactly (up to a certain value), and both can store fractions such as 0.5 exactly.  But nearly all decimal fractions can't be represented exactly as a binary fraction, and so for example there's no Double holding exactly 0.1.  (1/10 is an infinite recurring fraction in binary — 0.0001100110011… — and so no finite binary fraction can represent it exactly.)
This means that in Kotlin (and most other programming languages), a numeric literal such as 0.1 gets converted to the nearest double-precision number, which is around 0.100000000000000005551115….  In practice, this is usually hidden from you, because when you print out a Double, the formatting routine will round it off, and in many cases that gives back the original number.  But not always, e.g.:
>>> println(0.1 + 0.1 + 0.1)
0.30000000000000004
(All of this is discussed in other questions, most notably here.)
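To see the stored value rather than the rounded printout, here is a minimal sketch in Python, used as a stand-in because its float is the same IEEE 754 double-precision type as Kotlin's Double. Decimal plays the role of BigDecimal here; the variable names are just for illustration.

```python
from decimal import Decimal

# Decimal(float) converts the stored binary value exactly, with no rounding.
exact = Decimal(0.1)
print(exact)                      # digits begin 0.1000000000000000055511151231...
print(exact == Decimal("0.1"))    # False: the double is not exactly one tenth

# Printing a float rounds to the shortest string that maps back to the same double,
# which hides the error for 0.1 but not for the sum.
shown = repr(0.1)
summed = repr(0.1 + 0.1 + 0.1)
print(shown)                      # 0.1
print(summed)                     # 0.30000000000000004
```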
Unlike BigDecimals, Doubles have no precision, so they can't make the distinction you want anyway.
For example, both 1.0 and 1.000000 are represented by exactly the same Double value:
>>> println(1.000000)
1.0
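The same point as a small Python sketch (Decimal standing in for BigDecimal, float for Double): the decimal type carries its number of decimal places along with the value, the double does not.

```python
from decimal import Decimal

amount = Decimal("0.000")
amount_text = str(amount)          # "0.000": three decimal places are part of the value
double_text = str(float(amount))   # "0.0": a double has nowhere to store the scale
print(amount_text, double_text)

# Numerically equal decimals can still differ in scale ...
same_number = Decimal("1.0") == Decimal("1.000000")      # True
texts = (str(Decimal("1.0")), str(Decimal("1.000000")))  # ("1.0", "1.000000")
# ... while the corresponding doubles are one and the same value.
same_double = float("1.0") == float("1.000000")          # True
print(same_number, texts, same_double)
```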
I don't know Vert.x, but I'd be surprised if you really needed a Double here.  Have you tried using a BigDecimal directly?
Or if that doesn't work, have you tried converting it to a String, which will preserve whatever formatting you want?
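A rough sketch of the string option, in Python with the standard json module rather than Vert.x or Jackson (which I haven't tried this with). The field name amount comes from the question; everything else is illustrative.

```python
import json
from decimal import Decimal

amount = Decimal("0.000")

# Going through a double drops the trailing zeros before the JSON is even written.
as_number = json.dumps({"amount": float(amount)})
# Going through a string keeps exactly the digits the decimal value had.
as_string = json.dumps({"amount": str(amount)})
print(as_number)   # {"amount": 0.0}
print(as_string)   # {"amount": "0.000"}

# The consumer can rebuild the decimal, scale included, from the string.
restored = Decimal(json.loads(as_string)["amount"])
print(restored)    # 0.000
```

The cost is that the consumer now receives a JSON string, not a JSON number, so both sides have to agree on that.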

Related

XSLT 3.0 transformation of JSON to XML -- numeric data types

SUMMARY
some support for JSON was added to XSLT 3.0 + XPath/XQuery 3.1
unfortunately, JSON number types are handled as IEEE double, subjecting the data to loss of numeric precision
I am considering writing a set of custom functions based on Java BigDecimal instead of IEEE double
Q: In order to support numeric precision beyond that offered by IEEE double, is it reasonable for me to consider cloning the JSON support in saxon 9.8 HE and building a set of customized functions which use BigDecimal instead of IEEE double?
DETAIL
I need to perform a number of transformations of JSON data.
XSLT 3.0 + XPath 3.1 + XQuery 3.1 have some support for JSON through json-to-xml + parse-json.
https://www.w3.org/TR/xpath-functions-31/#json-functions
https://www.saxonica.com/papers/xmlprague-2016mhk.pdf
I have hit a significant snag related to treatment of numeric data types.
My JSON data includes numeric values that exceed the precision of IEEE double-floats. In Java, my numeric values need to be processed using BigDecimal.
https://www.w3.org/TR/xpath-functions-31/#json-to-xml-mapping
states
Information may however be lost if (a) JSON numbers are not exactly representable as double-precision floating point ...
In addition, I have taken a look at the saxonica 9.8 HE reference implementation source for ./ma/json/JsonParser.java and confirm that the private method parseNumericLiteral() returns a primitive double.
I am considering cloning the saxon 9.8 HE JSON support code and using this as the basis for a set of customized functions which uses Java BigDecimal instead of double in order to retain numeric precision through the transformations ...
Q: In order to support numeric precision beyond that offered by IEEE double, is it reasonable for me to consider cloning the JSON support in saxon 9.8 HE and building a set of customized functions which use BigDecimal instead of IEEE double?
Q: Are you aware of any unforeseen issues which I may encounter?
The XML data model defines decimal numbers as having any finite precision.
https://www.w3.org/TR/xmlschema-2/#decimal
The JSON data model defines numbers as having any finite precision.
https://www.rfc-editor.org/rfc/rfc7159#page-6
Not surprisingly, both warn of potential interoperability issues with numeric values with extended precision.
Q: What was the rationale for explicitly defining the JSON number type in XPath/XQuery as IEEE double?
THE END
This is what the RFC says:
This specification allows implementations to set limits on the range
and precision of numbers accepted. Since software that implements
IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is
generally available and widely used, good interoperability can be
achieved by implementations that expect no more precision or range
than these provide, in the sense that implementations will
approximate JSON numbers within the expected precision. A JSON
number such as 1E400 or 3.141592653589793238462643383279 may indicate
potential interoperability problems, since it suggests that the
software that created it expects receiving software to have greater
capabilities for numeric magnitude and precision than is widely
available.
That, to my mind, is a pretty clear warning: it says that although the JSON grammar allows arbitrary precision in numeric values, you can't rely on JSON consumers to retain that precision, and it follows that if you want to convey high-precision numeric values, it would be better to convey them as strings.
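As an illustration of that warning, here is what one ordinary consumer does with the two numbers the RFC mentions. This is a sketch using Python's standard json module, which parses JSON numbers with a fraction or exponent into doubles by default; the key names are made up.

```python
import json
import math
from decimal import Decimal

doc = '{"pi": 3.141592653589793238462643383279, "big": 1E400}'
parsed = json.loads(doc)
print(parsed["pi"])    # 3.141592653589793  (digits beyond double precision are gone)
print(parsed["big"])   # inf                (1E400 is outside the range of a double)

# Conveying the same values as strings leaves the interpretation to the receiver.
as_text = json.loads('{"pi": "3.141592653589793238462643383279", "big": "1E400"}')
pi_exact = Decimal(as_text["pi"])
big_exact = Decimal(as_text["big"])
print(pi_exact, big_exact)
```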
The rules for fn:json-to-xml and fn:xml-to-json need to be read carefully:
The fn:json-to-xml function creates an element whose string value is
lexically the same as the JSON representation of the number. The
fn:xml-to-json function generates a JSON representation that is the
result of casting the (typed or untyped) value of the node to
xs:double and then casting the result to xs:string. Leading and
trailing whitespace is accepted. Since JSON does not impose limits on
the range or precision of numbers, these rules mean that conversion
from JSON to XML will always succeed, and will retain full precision
in the lexical representation unless the data model implementation is
one that reconstructs the string value from the typed value. In the
reverse direction, conversion from XML to JSON may fail if the value
is infinity or NaN, or if the string value is such that casting to
xs:double produces positive or negative infinity.
Although I probably wrote these words, I'm not sure I recall the exact rationale for why the decision was made this way, but it does suggest that the matter received careful thought. I suspect the thinking was that when you consume JSON, you should try to preserve all the information that is present in the input, but when you generate JSON, you should try to generate something that will be acceptable to all consumers. (The famous maxim about being liberal in what you accept and conservative in what you produce.)
Your analysis of the Saxon source isn't quite correct. You say:
the private method parseNumericLiteral() returns a primitive double.
which is true enough; but the original lexical representation is retained, and when the parser communicates the value to a JsonReceiver, it passes both the Java double and the string representation, so the JsonReceiver has access to both (which is needed for a correct implementation of fn:json-to-xml).
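The idea of handing over both the double and the original text can be sketched outside Saxon. The following is not Saxon's API; it uses the parse_float hook of Python's json module, and NumberToken and keep_both are hypothetical names.

```python
import json
from typing import NamedTuple

class NumberToken(NamedTuple):   # hypothetical type, not a Saxon class
    value: float                 # the double approximation
    lexical: str                 # the digits exactly as they appeared in the JSON text

def keep_both(text: str) -> NumberToken:
    # The json module passes the raw text of each non-integer number to this hook.
    return NumberToken(float(text), text)

parsed = json.loads('{"x": 0.10000000000000000001}', parse_float=keep_both)
print(parsed["x"].value)     # 0.1
print(parsed["x"].lexical)   # 0.10000000000000000001
```

A receiver built this way can compute with the double and still write out the original lexical form.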

JSON number output: OK to strip trailing zeros?

Is there any real-world hazard in stripping the trailing zero and decimal point from numbers output to JSON, i.e. outputting 2 instead of 2.0?
I'm not interested in hypotheticals. Do you know of any widely used JSON parsing libraries that would choke on seeing an "integer" value where a float is possible?
For example, a JSON array of numbers:
[2.4, 5.6, 4, 1, 0.12]
I'd like to minimize the char length of number values I write to JSON, but there are worries that this will confuse some bonehead JSON reader.
As long as the data is being assigned to a variable of a floating-point type, trailing fractional components that evaluate to 0 are superfluous.
I wouldn't worry about your boneheaded JSON reader, nor would I worry about a few extra zeroes after a decimal point upsetting anyone or anything.
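For what it's worth, a minimal sketch of such a writer in Python. The helper name compact and the 2**53 guard are my own choices, not anything from a library.

```python
import json

def compact(x: float):
    """Return an int when x is a whole number small enough to be exact as a double."""
    if x.is_integer() and abs(x) < 2 ** 53:
        return int(x)
    return x

written = json.dumps([compact(v) for v in [2.4, 5.6, 4.0, 1.0, 0.12]])
print(written)   # [2.4, 5.6, 4, 1, 0.12]

# A reader that assigns everything to a floating-point type gets the same values back.
read_back = [float(v) for v in json.loads(written)]
print(read_back == [2.4, 5.6, 4.0, 1.0, 0.12])   # True
```

Note that this particular reader sees 4 and 1 as integers until it converts them, which is harmless here but matters to a consumer that picks a type per value.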

Why would you use a string in JSON to represent a decimal number

Some APIs, like the PayPal API, use a string type in JSON to represent a decimal number: "7.47" instead of 7.47.
Why/when would this be a good idea over using the json number value type? AFAIK the number value type allows for infinite precision as well as scientific notation.
The main reason to transfer numeric values in JSON as strings is to eliminate any loss of precision or ambiguity in transfer.
It's true that the JSON spec does not specify a precision for numeric values. This does not mean that JSON numbers have infinite precision. It means that numeric precision is not specified, which means JSON implementations are free to choose whatever numeric precision is convenient to their implementation or goals. It is this variability that can be a pain if your application has specific precision requirements.
Loss of precision generally isn't apparent in the JSON encoding of the numeric value (1.7 is nice and succinct) but manifests in the JSON parsing and intermediate representations on the receiving end. A JSON parsing function would quite reasonably parse 1.7 into an IEEE double precision floating point number. However, finite length / finite precision decimal representations will always run into numbers whose decimal expansions cannot be represented as a finite sequence of digits:
Irrational numbers (like pi and e)
1.7 has a finite representation in base 10 notation, but in binary (base 2) notation, 1.7 cannot be encoded exactly. Even with a near infinite number of binary digits, you'll only get closer to 1.7, but you'll never get to 1.7 exactly.
So, parsing 1.7 into an in-memory floating point number, then printing out the number will likely return something like 1.69 - not 1.7.
Consumers of the JSON 1.7 value could use more sophisticated techniques to parse and retain the value in memory, such as using a fixed-point data type or a "string int" data type with arbitrary precision, but this will not entirely eliminate the specter of loss of precision in conversion for some numbers. And the reality is, very few JSON parsers bother with such extreme measures, as the benefits for most situations are low and the memory and CPU costs are high.
So if you are wanting to send a precise numeric value to a consumer and you don't want automatic conversion of the value into the typical internal numeric representation, your best bet is to ship the numeric value out as a string and tell the consumer exactly how that string should be processed if and when numeric operations need to be performed on it.
For example: In some JSON producers (JRuby, for one), BigInteger values automatically output to JSON as strings, largely because the range and precision of BigInteger is so much larger than the IEEE double precision float. Reducing the BigInteger value to double in order to output as a JSON numeric will often lose significant digits.
Also, the JSON spec (http://www.json.org/) explicitly states that NaNs and Infinities (INFs) are invalid for JSON numeric values. If you need to express these fringe elements, you cannot use JSON number. You have to use a string or object structure.
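A quick sketch of what this looks like in practice, using Python's standard json module. The key "x" and the choice of the string "NaN" as the substitute are arbitrary.

```python
import json
import math

# In strict mode the encoder refuses values that have no JSON number form.
strict_error = None
try:
    json.dumps({"x": math.nan}, allow_nan=False)
except ValueError as err:
    strict_error = str(err)
print(strict_error)

# By default this encoder writes the bare token NaN, which is not valid JSON.
lenient = json.dumps({"x": math.nan})
print(lenient)   # {"x": NaN}

# Carrying the value as a string keeps the document valid.
encoded = json.dumps({"x": "NaN"})
decoded = float(json.loads(encoded)["x"])
print(math.isnan(decoded))   # True
```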
Finally, there is another aspect which can lead to choosing to send numeric data as strings: control of display formatting. Leading zeros and trailing zeros are insignificant to the numeric value. If you send JSON number value 2.10 or 004, after conversion to internal numeric form they will be displayed as 2.1 and 4.
If you are sending data that will be directly displayed to the user, you probably want your money figures to line up nicely on the screen, decimal aligned. One way to do that is to make the client responsible for formatting the data for display. Another way to do it is to have the server format the data for display. Simpler for the client to display stuff on screen perhaps, but this can make extracting the numeric value from the string difficult if the client also needs to make computations on the values.
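The two options can be sketched like this in Python; the two-decimal format is an assumption about how the money figures should look.

```python
import json

# Sent as a JSON number: the trailing zero does not survive parsing ...
price = json.loads("2.10")
plain = str(price)                  # "2.1"
# ... so the client has to apply the display format itself.
client_formatted = format(price, ".2f")   # "2.10"

# Sent as a JSON string: the server's formatting arrives untouched,
# but the client must convert it before doing any arithmetic.
server_formatted = json.loads('"2.10"')   # "2.10"
for_maths = float(server_formatted)       # 2.1
print(plain, client_formatted, server_formatted, for_maths)
```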
I'll be a bit contrarian and say that 7.47 is perfectly safe in JSON, even for financial amounts, and that "7.47" isn't any safer.
First, let me address some misconceptions from this thread:
So, parsing 1.7 into an in-memory floating point number, then printing out the number will likely return something like 1.69 - not 1.7.
That is not true, especially in the context of IEEE 754 double precision format that was mentioned in that answer. 1.7 converts into an exact double 1.6999999999999999555910790149937383830547332763671875 and when that value is "printed" for display, it will always be 1.7, and never 1.69, 1.699999999999 or 1.70000000001. It is 1.7 "exactly".
Learn more here.
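This is easy to check. A small sketch in Python, whose float is an IEEE 754 double and whose repr prints the shortest decimal string that maps back to the same double:

```python
from decimal import Decimal

x = 1.7
stored = Decimal(x)      # the exact value held in memory, slightly below 1.7
printed = repr(x)        # what a shortest-round-trip printer shows
print(stored)
print(printed)           # 1.7

# Parsing the printed form gives back the very same double.
round_trips = float(printed) == x
print(round_trips)       # True
```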
7.47 may actually be 7.4699999923423423423 when converted to float
7.47 already is a float, with an exact double value 7.46999999999999975131004248396493494510650634765625. It will not be "converted" to any other float.
a simple system that simply truncates the extra digits off will result in 7.46 and now you've lost a penny somewhere
IEEE rounds, not truncates. And it would not convert to any other number than 7.47 in the first place.
is the JSON number actually a float? As I understand it's a language independent number, and you could parse a JSON number straight into a java BigDecimal or other arbitrary precision format in any language if so inclined.
It is recommended that JSON numbers are interpreted as doubles (IEEE 754 double-precision format). I haven't seen a parser that wouldn't be doing that.
And no, BigDecimal(7.47) is not the right way to do it – it will actually create a BigDecimal representing the exact double of 7.47, which is 7.46999999999999975131004248396493494510650634765625. To get the expected behavior, BigDecimal("7.47") should be used.
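The same trap exists in Python's Decimal, which makes for a compact illustration (Decimal here is only an analogue of Java's BigDecimal, and the variable names are mine):

```python
from decimal import Decimal

from_float = Decimal(7.47)     # exact value of the double nearest to 7.47
from_text = Decimal("7.47")    # exactly 7.47
print(from_float)              # a long expansion slightly below 7.47
print(from_text)               # 7.47
print(from_float == from_text) # False

# Going through the printed form of the double recovers the intended decimal.
via_text = Decimal(str(7.47))
print(via_text == from_text)   # True
```

As far as I know, Java's BigDecimal.valueOf(double) takes the same route through Double.toString.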
Overall, I don't see any fundamental issue with {"price": 7.47}. It will be converted into a double on virtually all platforms, and the semantics of IEEE 754 guarantee that it will be "printed" as 7.47 exactly and always.
Of course floating point rounding errors can happen on further calculations with that value, see e.g. 0.1 + 0.2 == 0.30000000000000004, but I don't see how strings in JSON make this better. If "7.47" arrives as a string and should be part of some calculation, it will need to be converted to some numeric data type anyway, probably float :).
It's worth noting that strings also have disadvantages, e.g., they cannot be passed to Intl.NumberFormat, they are not a "pure" data type, e.g., the dot is a formatting decision.
I'm not strongly against strings, they seem fine to me as well but I don't see anything wrong on {"price": 7.47} either.
The reason I'm doing it is that the SoftwareAG parser tries to "guess" the java type from the value it receives.
So when it receives
"jackpot":{
"growth":200,
"percentage":66.67
}
The first value (growth) will become a java.lang.Long and the second (percentage) will become a java.lang.Double
Now when the second object in this jackpot-array has this
"jackpot":{
"growth":50.50,
"percentage":65
}
I have a problem.
When I exchange these values as Strings, I have complete control and can cast/convert the values to whatever I want.
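The same type guessing shows up in other parsers too. A sketch with Python's json module, not the SoftwareAG parser, using the field names from the example above:

```python
import json
from decimal import Decimal

first = json.loads('{"growth": 200, "percentage": 66.67}')
second = json.loads('{"growth": 50.50, "percentage": 65}')
# The type of each field depends on how the value happened to be written.
growth_types = (type(first["growth"]).__name__, type(second["growth"]).__name__)
print(growth_types)   # ('int', 'float')

# Exchanged as strings, every value can be converted to one type of my choosing.
as_text = json.loads('{"growth": "50.50", "percentage": "65"}')
jackpot = {key: Decimal(value) for key, value in as_text.items()}
print(jackpot)
```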
Summarized Version
Just quoting from @dthorpe's answer, as I think this is the most important point:
Also, the JSON spec (http://www.json.org/) explicitly states that NaNs and Infinities (INFs) are invalid for JSON numeric values. If you need to express these fringe elements, you cannot use JSON number. You have to use a string or object structure.
I18N is another reason NOT to use String for decimal numbers
In tens of countries, such as Germany and France, comma (,) is the decimal separator and dot (.) is the thousands separator. See the list on Wikipedia.
If your JSON document carries decimal numbers as string, you're relying on all possible API consumers using the same number format conversion (which is a step after the JSON parsing). There's the risk of incorrect conversion due to inverted use of comma and dot as separators.
If you use number for decimal numbers that risk is averted.

Is there error propagation when serializing floating point values to strings?

Say I have a float (or double) in my favorite language. Say that in memory this value is stored according to IEEE 754, say that I serialize this value in XML or JSON or plain text using base 10. When serializing and de-serializing this value will I lose precision of my number? When should I care about this precision loss?
Would converting the number to base64 prevent the loss of precision?
It depends on the binary-to-decimal conversion function that you use. Assuming this function is not botched (it has no reason to be):
1. Either it converts to a fixed precision. Old-fashioned languages such as C offer this kind of conversion to decimal. In this case, you should use a format with 17 significant decimal digits. A common format is D.DDDDDDDDDDDDDDDDEXXX where D and X are decimal digits, and there are 16 digits after the dot. This would be specified as %.16e in C-like languages. Converting such a decimal value back to the nearest double produces the same double that was originally printed.
2. Or it converts to the shortest decimal representation that converts back to the same double. This is what some modern programming languages (e.g. Java) offer by default as their printing function. In this case, the property that parsing back the decimal representation will return the original double is automatic.
In either case loss of accuracy should not happen. This is not because you get the exact decimal representation of the original binary64 number with either method 1. or 2. above: in the general case, you don't. Such an exact representation always exists (because 10 is a multiple of 2), but can be up to ~750 digits long for a binary64 number.
What you get with method 1. or 2. above is a decimal number that is closer to the original binary64 number than to any other binary64 number. This means that the opposite conversion, from decimal to binary64, will “round back” to the original.
This is where the “non-botched” assumption is necessary: in order for the successive conversions to return to the original number they must respectively produce the closest decimal to the binary64 number passed and the closest binary64 to the decimal number passed. In these conditions, and with the appropriate number of decimal digits for the first conversion, the round-trip is lossless.
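Both kinds of conversion can be tried out in a few lines. A sketch in Python, where "%.16e" gives the 17-significant-digit form and repr gives the shortest round-tripping form; the sample values are arbitrary.

```python
samples = [0.1, 1.7, 7.47, 1 / 3, 2.0 ** -1074, 1.7976931348623157e308]

# Fixed precision with 17 significant digits: parses back to the same double.
fixed_ok = all(float("%.16e" % x) == x for x in samples)
# Shortest representation that round-trips: parses back to the same double.
shortest_ok = all(float(repr(x)) == x for x in samples)
print(fixed_ok, shortest_ok)   # True True

# With too few digits the round trip can land on a neighbouring double.
y = 0.1 + 0.2
too_short = float("%.14e" % y)
print(too_short == y)          # False
```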
I should point out that (non-botched) conversions to and from decimal are expensive operations. Unless human-readability of the result is important for you, you should consider a simpler format to convert to. The C99-style hexadecimal representation for floating-point numbers is a good compromise between conversion cost and readability. It is not the most compact but it contains only printable characters.
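For reference, Python exposes the C99-style hexadecimal form directly through float.hex and float.fromhex, so a short sketch of that round trip is:

```python
x = 0.1
text = x.hex()               # mantissa in hex digits, exponent as a power of two
print(text)                  # 0x1.999999999999ap-4

# No decimal rounding is involved, so the round trip is exact by construction.
back = float.fromhex(text)
print(back == x)             # True
```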
The approach of converting to the shortest form which converts back the same is dangerous (the "round-trip" string formatting mode in .NET uses such an approach, and is buggy as a result). While there is probably no reason for a decimal-to-binary conversion method to yield a result which is more than 0.75lsb from the exact specified numerical value, guaranteeing that a conversion will always yield a perfectly-rounded numerical value is expensive and in most cases not particularly helpful. It would be better to ensure that the precise arithmetic value of the decimal expression will be less than 0.25lsb from the double value to be represented. If a string that's less than 0.25lsb away from a double is fed to a routine which returns a double within 0.75lsb of it, the latter routine can be guaranteed to yield the same double as was given to the former.
The approach of simply finding the shortest form that yields the same double assumes that any string representation will always be parsed the same way, even if the value represented falls almost exactly halfway between two adjacent double values. Since obtaining a perfectly-rounded result could require reading an arbitrary number of digits (e.g. 1125899906842624.125000...1 should round up to 1125899906842624.25) few implementations are apt to bother; if an implementation is going to ignore digits beyond a certain point, even when that might yield a result that was e.g. more than .056lsb away from the correct one, it shouldn't be trusted to be accurate to 0.50000lsb in any case.
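The halfway example can be used as a probe for a given parser. A sketch in Python, whose float() is, as far as I know, correctly rounded in CPython; a parser that stops reading digits early would return the same double for both strings. The count of 40 padding zeros is arbitrary.

```python
base = 2.0 ** 50     # 1125899906842624.0; adjacent doubles here are 0.25 apart

# Exactly halfway between two doubles: the tie goes to the even neighbour.
tie = float("1125899906842624.125")
# A single nonzero digit far to the right pushes the value past the halfway point.
above = float("1125899906842624.125" + "0" * 40 + "1")

print(tie == base)            # True on a correctly rounding parser
print(above == base + 0.25)   # True on a correctly rounding parser
```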

MySQL: converting a float to decimal produces more decimal digits than are stored in the backup .sql file

I want to understand this:
I have a dump of a table (an SQL script file) from a database that uses float(9,2) as the default type for numbers.
In the backup file I have a value like '4172.08'.
I restore this file into a new database and convert the float to decimal(20,5).
Now the value in the field is 4172.08008
...where does the 008 come from?
Thanks to all
Where does the 008 come from?
Short answer:
In order to avoid the float inherent precision error, cast first to decimal(9,2), then to decimal(20,5).
Long answer:
Floating point numbers are prone to rounding errors in digital computers. It is a little hard to explain without throwing up a lot of math, but let's try: the same way 1/3 represented in decimal requires an infinite number of digits (it is 0.3333333...), some numbers that are "round" in decimal notation have an infinite number of digits in binary. Because this format is stored in binary and has finite precision, there is an implicit rounding error and you may experience funny things like getting 0.30000000000000004 as the result of 0.1 + 0.2.
This is the difference between float and decimal. Float is a binary type, and can't represent that value exactly. So when you convert to decimal (as expected, a decimal type), it's not exactly the original value.
See http://floating-point-gui.de/ for some more information.
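The 008 can be reproduced outside MySQL. The sketch below is Python, not SQL, and assumes the float column is stored as a 4-byte single-precision value; struct's "f" format simulates that storage and Decimal.quantize stands in for the casts.

```python
import struct
from decimal import Decimal

# Round 4172.08 to single precision, as a 4-byte float column would store it.
stored = struct.unpack("f", struct.pack("f", 4172.08))[0]
print(stored)        # 4172.080078125

# Converting straight to five decimal places exposes the binary rounding error.
direct = Decimal(stored).quantize(Decimal("0.00001"))
print(direct)        # 4172.08008

# Rounding to two places first (the decimal(9,2) step) discards it.
two_step = Decimal(stored).quantize(Decimal("0.01")).quantize(Decimal("0.00001"))
print(two_step)      # 4172.08000
```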