Relatively new to Rust. I am trying to make an API call which requires the JSON body to be serialized.
The JSON body contains an order_amount key whose value must be in INR format, i.e. 100.36 meaning 100 rupees and 36 paise. Some more examples: 10.48, 3.20, 1.09.
The problem I'm facing is that after serialization with json!() from serde_json, the floating point value becomes something like 100.359765464332.
The API subsequently fails because it expects the order_amount to have only two decimal places.
Here is the code that I have:
The imports
use lambda_runtime::{handler_fn, Context, Error};
use reqwest::header::ACCEPT;
use reqwest::{Response, StatusCode};
use serde_json::json;
use std::env;
#[macro_use]
extern crate serde_derive;
The struct that I'm serializing
#[derive(Serialize, Deserialize, Clone, Debug)]
struct OrderCreationEvent {
order_amount: f32,
customer_details: ...,
order_meta: ...,
}
E.g. the order_amount here has a value of 15.38:
async fn so_my_function(
e: OrderCreationEvent,
_c: Context,
) -> std::result::Result<CustomOutput, Error> {
let resp: Response = client
.post(url)
.json::<serde_json::Value>(&json!(e))
.send()
.await?;
After json!(), the amount is being serialized to 15.379345234542. I require 15.38
I read a few articles about writing a custom serializer for f32 which can truncate to 2 decimals, but my proficiency is limited in Rust.
So, I found this code and have been tinkering at it with no luck:
fn order_amount_serializer<S>(x: &f32, s: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
s.serialize_f32(*x)
// Ok(f32::trunc(x * 100.0) / 100.0)
}
Whether or not the custom serializer is the right approach or solution to the problem, I would still like to learn how to write one, so feel free to enlighten me there too.
Cheers! :)
TL;DR: it's a floating-point issue caused by serde_json widening f32 to f64. You can reproduce it with code as simple as println!("{}", 77.63_f32 as f64). To fix it, convert to f64 first, then round, and serialize the result as f64:
s.serialize_f64((*n as f64 * 100.0).trunc() / 100.0)
Detailed explanation
The problem in the code lies in a different place than where you think it is - it has to do with floating-point precision and not with serde. When you write something like:
let lat = 77.63_f32;
...you instruct the compiler to convert the fraction 7763/100 into an f32. But that number cannot be exactly represented by an f32, because f32 (like all binary floating-point types) uses binary fractions, i.e. rationals whose denominators are powers of two, within some size limits. Given those constraints, 7763/100 gets approximated as 10175119/2**17.[1] If you try to print that f32 value, you'll get the expected 77.63 output, because println!() knows it's printing an f32, where all digits after the 7th are a side effect of the approximation and should be discarded.
serde_json works differently - it serializes f32 values by converting them to f64, because that is the precision used by JSON and JavaScript. The unfortunate consequence is that the 10175119/2**17 approximation of 77.63_f32 gets widened to f64 without the context of the original intent to store 77.63. The f64 simply stores the approximation (which it can accommodate exactly, without further loss of precision), and when you print the resulting f64, you get 77.62999725341797 - that's what 10175119/2**17 looks like in decimal, to 16 digits of precision.
This is why implementing a custom serializer as s.serialize_f32(f32::trunc(*x * 100.0) / 100.0) has no effect - you rounded an f32 to two decimal digits (which in your program is a no-op, because the value was rounded to begin with), and then passed it to serialize_f32(). serialize_f32() proceeds to widen the f32 value to f64, which makes the extra digits from the f32 approximation visible - and you're back to the behavior of the implementation serde generated for you.
The correct version must convert the f32 to f64, get rid of the extra digits in the f64 type, and then pass the result to serialize_f64():
s.serialize_f64((*n as f64 * 100.0).trunc() / 100.0)
Playground
That works because the number 77.63_f32 gets converted to the f64 that corresponds to 10175119/2**17 (i.e. not to 77.63_f64, which would be approximated[2] as 682840701314007/2**43). This number then gets rounded to two digits in f64, and that rounding produces the closest approximation of 77.63 that f64 is capable of. I.e. now we get the same 682840701314007/2**43 approximation we'd get by writing 77.63_f64 in Rust source code. That's the number serde will work with, and serde_json will format it as 77.63 in the JSON output.
Side note: the above code uses trunc() following the attempt in the question, but round() would probably be a more appropriate choice.
[1]
You can obtain this ratio with this Python snippet:
>>> import numpy
>>> numpy.float32("77.63").as_integer_ratio()
(10175119, 131072)
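The same ratio can also be recovered in Rust by taking the f32 apart at the bit level (a sketch assuming the standard IEEE 754 binary32 layout: 1 sign bit, 8 exponent bits, 23 mantissa bits):

```rust
fn main() {
    let bits = 77.63_f32.to_bits();
    // Restore the implicit leading 1 of the normalized mantissa.
    let mantissa = (bits & 0x007f_ffff) | 0x0080_0000;
    // Unbias the exponent (bias 127) and account for the 23 fractional bits.
    let shift = 127 + 23 - ((bits >> 23) & 0xff) as i32;
    println!("{}/2**{}", mantissa, shift); // 10175119/2**17
}
```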
[2]
Also obtained using Python:
>>> n = 10175119/131072
>>> rounded = round(n*100.0)/100.0
>>> rounded
77.63
>>> rounded.as_integer_ratio()
(682840701314007, 8796093022208)
If you need precise control over the formatting of your JSON messages, serde_json provides the tools you need. Producing JSON goes through the serde_json::ser::Formatter trait. You can implement its write_f32/write_f64 methods so that they produce only two digits after the dot; take one of the already existing implementations and adapt it to your needs.
Related
How do we convert a BigDecimal into a Double without losing precision in Kotlin? I need to put it in a JSON response.
I'm using Vert.x and JsonObject. I've tried converting a BigDecimal with scale 2 to a Double with toDouble. Internally it uses Jackson as the object mapper.
Example:
Currently:
BigDecimal("0.000") -> Response: { amount: 0.0 }
What I need:
BigDecimal("0.000") -> Response: { amount: 0.000 }
I'm afraid you can't convert a BigDecimal into a Double without losing precision, for several reasons:
There are many more possible values for BigDecimal than for Double, so the conversion is necessarily lossy.
Doubles are 64-bit, so can't have more than 2⁶⁴ distinct values, while BigDecimals are effectively unlimited.
BigDecimals store decimal fractions, while Doubles store binary fractions. There's very little overlap between the two, so in most cases the conversion will need to round the value.
Both can store integers exactly (up to a certain value), and both can store fractions such as 0.5 exactly. But nearly all decimal fractions can't be represented exactly as a binary fraction, and so for example there's no Double holding exactly 0.1. (1/10 is an infinite recurring fraction in binary — 0.0001100110011… — and so no finite binary fraction can represent it exactly.)
This means that in Kotlin (and most other programming languages), a numeric literal such as 0.1 gets converted to the nearest double-precision number, which is around 0.100000000000000005551115…. In practice, this is usually hidden from you, because when you print out a Double, the formatting routine will round it off, and in many cases that gives back the original number. But not always, e.g.:
>>> println(0.1 + 0.1 + 0.1)
0.30000000000000004
(All of this is discussed in other questions, most notably here.)
Unlike BigDecimals, Doubles have no notion of precision (scale), so they can't make the distinction you want anyway.
For example, both 1.0 and 1.000000 are represented by exactly the same Double value:
>>> println(1.000000)
1.0
I don't know Vert.x, but I'd be surprised if you really needed a Double here. Have you tried using a BigDecimal directly?
Or if that doesn't work, have you tried converting it to a String, which will preserve whatever formatting you want?
I've been using strings to represent decoded JSON integers larger than 32 bits. It seems string_of_int is capable of dealing with large integer inputs, so a decoder written (in the Json.Decode namespace) as:
id: json |> field("id", int) |> string_of_int, /* 'id' is a string */
is successfully dealing with integers of at least 37 bits.
Encoding, on the other hand, is proving troublesome for me. The remote server won't accept a string representation, and is expecting an int64. Is it possible to make bs-json support the int64 type? I was hoping something like this could be made to work:
type myData = { id: int64 };
let encodeMyData = (data: myData) => Json.Encode.(object_([("id", int64(data.id))]));
Having to roll my own encoder is not nearly as formidable as a decoder, but ... I'd rather not.
You don't say exactly what problem you have with encoding. The int encoder does literally nothing except change the type, trusting that the int value is actually valid. So I would assume it's the int_of_string operation that causes problems. But that begs the question, if you can successfully decode it as an int, why are you then converting it to a string?
The underlying problem here is that JavaScript doesn't have 64-bit integers. The max safe integer is 2^53 - 1. JavaScript doesn't actually have integers at all, only floats, which can represent a certain range of integers but can't efficiently do integer arithmetic unless they're converted to either 32-bit or 64-bit ints. And so, for whatever reason - probably consistent overflow handling - it was decided in the ECMAScript specification that binary bitwise operations should operate on 32-bit integers. That opened the possibility for an internal 32-bit representation, a notation for creating 32-bit integers, and optimized integer arithmetic on those.
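The 2^53 - 1 limit is easy to demonstrate; a quick Rust sketch, with f64 standing in for JavaScript's Number type:

```rust
fn main() {
    // f64 has a 53-bit significand, so every integer up to 2^53 is exact...
    let max_safe = 2f64.powi(53) - 1.0;
    assert_eq!(max_safe as u64, 9_007_199_254_740_991);

    // ...but above 2^53, consecutive integers start to collide:
    let a = 9_007_199_254_740_993u64 as f64; // 2^53 + 1
    let b = 9_007_199_254_740_992u64 as f64; // 2^53
    assert_eq!(a, b);

    println!("2^53 + 1 and 2^53 map to the same f64");
}
```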
So to your question:
Would it be "safe" to just add external int64 : int64 -> Js.Json.t = "%identity" to the encoder files?
No. Because there's no 64-bit integer representation in JavaScript, int64 values are represented as an array of two Numbers, I believe - but that is an internal implementation detail that's subject to change. Just casting it to Js.Json.t will not yield the result you expect.
So what can you do then?
I would recommend using float. In most respects this will behave exactly like JavaScript numbers, giving you access to its full range.
Alternatively you can use nativeint, which should behave like floats except for division, where the result is truncated to a 32-bit integer.
Lastly, you could also implement your own int_of_string to create an int that is technically out of range by using a couple of lightweight JavaScript functions directly, though I wouldn't really recommend doing this:
let bad_int_of_string = str =>
str |> Js.Float.fromString |> Js.Math.floor_int;
SUMMARY
some support for JSON was added to XSLT 3.0 + XPath/XQuery 3.1
unfortunately, JSON number types are handled as IEEE double, subjecting the data to loss of numeric precision
I am considering writing a set of custom functions based on Java BigDecimal instead of IEEE double
Q: In order to support numeric precision beyond that offered by IEEE double, is it reasonable for me to consider cloning the JSON support in saxon 9.8 HE and building a set of customized functions which use BigDecimal instead of IEEE double?
DETAIL
I need to perform a number of transformations of JSON data.
XSLT 3.0 + XPath 3.1 + XQuery 3.1 have some support for JSON through json-to-xml + parse-json.
https://www.w3.org/TR/xpath-functions-31/#json-functions
https://www.saxonica.com/papers/xmlprague-2016mhk.pdf
I have hit a significant snag related to treatment of numeric data types.
My JSON data includes numeric values that exceed the precision of IEEE double-floats. In Java, my numeric values need to be processed using BigDecimal.
https://www.w3.org/TR/xpath-functions-31/#json-to-xml-mapping
states
Information may however be lost if (a) JSON numbers are not exactly representable as double-precision floating point ...
In addition, I have taken a look at the saxonica 9.8 HE reference implementation source for ./ma/json/JsonParser.java and confirm that the private method parseNumericLiteral() returns a primitive double.
I am considering cloning the saxon 9.8 HE JSON support code and using this as the basis for a set of customized functions which uses Java BigDecimal instead of double in order to retain numeric precision through the transformations ...
Q: In order to support numeric precision beyond that offered by IEEE double, is it reasonable for me to consider cloning the JSON support in saxon 9.8 HE and building a set of customized functions which use BigDecimal instead of IEEE double?
Q: Are you aware of any unforeseen issues which I may encounter?
The XML data model defines decimal numbers as having any finite precision.
https://www.w3.org/TR/xmlschema-2/#decimal
The JSON data model defines numbers as having any finite precision.
https://www.rfc-editor.org/rfc/rfc7159#page-6
Not surprisingly, both warn of potential interoperability issues with numeric values with extended precision.
Q: What was the rationale for explicitly defining the JSON number type in XPath/XQuery as IEEE double?
THE END
This is what the RFC says:
This specification allows implementations to set limits on the range
and precision of numbers accepted. Since software that implements
IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is
generally available and widely used, good interoperability can be
achieved by implementations that expect no more precision or range
than these provide, in the sense that implementations will
approximate JSON numbers within the expected precision. A JSON
number such as 1E400 or 3.141592653589793238462643383279 may indicate
potential interoperability problems, since it suggests that the
software that created it expects receiving software to have greater
capabilities for numeric magnitude and precision than is widely
available.
That, to my mind, is a pretty clear warning: it says that although the JSON grammar allows arbitrary precision in numeric values, you can't rely on JSON consumers to retain that precision, and it follows that if you want to convey high-precision numeric values, it would be better to convey them as strings.
The rules for fn:json-to-xml and fn:xml-to-json need to be read carefully:
The fn:json-to-xml function creates an element whose string value is
lexically the same as the JSON representation of the number. The
fn:xml-to-json function generates a JSON representation that is the
result of casting the (typed or untyped) value of the node to
xs:double and then casting the result to xs:string. Leading and
trailing whitespace is accepted. Since JSON does not impose limits on
the range or precision of numbers, these rules mean that conversion
from JSON to XML will always succeed, and will retain full precision
in the lexical representation unless the data model implementation is
one that reconstructs the string value from the typed value. In the
reverse direction, conversion from XML to JSON may fail if the value
is infinity or NaN, or if the string value is such that casting to
xs:double produces positive or negative infinity.
Although I probably wrote these words, I'm not sure I recall the exact rationale for why the decision was made this way, but it does suggest that the matter received careful thought. I suspect the thinking was that when you consume JSON, you should try to preserve all the information that is present in the input, but when you generate JSON, you should try to generate something that will be acceptable to all consumers. (The famous maxim about being liberal in what you accept and conservative in what you produce.)
Your analysis of the Saxon source isn't quite correct. You say:
the private method parseNumericLiteral() returns a primitive double.
which is true enough; but the original lexical representation is retained, and when the parser communicates the value to a JsonReceiver, it passes both the Java double and the string representation, so the JsonReceiver has access to both (which is needed for a correct implementation of fn:json-to-xml).
Is there any real world hazard to stripping trailing zero and decimal point from numbers output to JSON? Outputting 2 instead of 2.0
I'm not interested in hypotheticals. Do you know of any widely used JSON parsing libraries that would choke on seeing an "integer" value where a float is possible?
For example, a JSON array of number:
[2.4, 5.6, 4, 1, 0.12]
I'd like to minimize the char length of number values I write to JSON, but there are worries that this will confuse some bonehead JSON reader.
As long as the data is being assigned to a variable of a floating-point type, trailing fractional components that evaluate to 0 are superfluous.
I wouldn't worry about your boneheaded JSON reader, nor would I worry about a few extra zeroes after a decimal point upsetting anyone or anything.
Some APIs, like the paypal API use a string type in JSON to represent a decimal number. So "7.47" instead of 7.47.
Why/when would this be a good idea over using the json number value type? AFAIK the number value type allows for infinite precision as well as scientific notation.
The main reason to transfer numeric values in JSON as strings is to eliminate any loss of precision or ambiguity in transfer.
It's true that the JSON spec does not specify a precision for numeric values. This does not mean that JSON numbers have infinite precision; it means that numeric precision is left unspecified, so JSON implementations are free to choose whatever numeric precision is convenient to their implementation or goals. It is this variability that can be a pain if your application has specific precision requirements.
Loss of precision generally isn't apparent in the JSON encoding of the numeric value (1.7 is nice and succinct) but manifests in the JSON parsing and intermediate representations on the receiving end. A JSON parsing function would quite reasonably parse 1.7 into an IEEE double precision floating point number. However, finite length / finite precision decimal representations will always run into numbers whose decimal expansions cannot be represented as a finite sequence of digits:
Irrational numbers (like pi and e)
1.7 has a finite representation in base 10 notation, but in binary (base 2) notation, 1.7 cannot be encoded exactly. Even with a near infinite number of binary digits, you'll only get closer to 1.7, but you'll never get to 1.7 exactly.
So, parsing 1.7 into an in-memory floating point number, then printing out the number will likely return something like 1.69 - not 1.7.
Consumers of the JSON 1.7 value could use more sophisticated techniques to parse and retain the value in memory, such as using a fixed-point data type or a "string int" data type with arbitrary precision, but this will not entirely eliminate the specter of loss of precision in conversion for some numbers. And the reality is, very few JSON parsers bother with such extreme measures, as the benefits for most situations are low and the memory and CPU costs are high.
So if you are wanting to send a precise numeric value to a consumer and you don't want automatic conversion of the value into the typical internal numeric representation, your best bet is to ship the numeric value out as a string and tell the consumer exactly how that string should be processed if and when numeric operations need to be performed on it.
For example: In some JSON producers (JRuby, for one), BigInteger values automatically output to JSON as strings, largely because the range and precision of BigInteger is so much larger than the IEEE double precision float. Reducing the BigInteger value to double in order to output as a JSON numeric will often lose significant digits.
Also, the JSON spec (http://www.json.org/) explicitly states that NaNs and Infinities (INFs) are invalid for JSON numeric values. If you need to express these fringe elements, you cannot use JSON number. You have to use a string or object structure.
Finally, there is another aspect which can lead to choosing to send numeric data as strings: control of display formatting. Leading zeros and trailing zeros are insignificant to the numeric value. If you send JSON number value 2.10 or 004, after conversion to internal numeric form they will be displayed as 2.1 and 4.
If you are sending data that will be directly displayed to the user, you probably want your money figures to line up nicely on the screen, decimal aligned. One way to do that is to make the client responsible for formatting the data for display. Another way to do it is to have the server format the data for display. Simpler for the client to display stuff on screen perhaps, but this can make extracting the numeric value from the string difficult if the client also needs to make computations on the values.
I'll be a bit contrarian and say that 7.47 is perfectly safe in JSON, even for financial amounts, and that "7.47" isn't any safer.
First, let me address some misconceptions from this thread:
So, parsing 1.7 into an in-memory floating point number, then printing out the number will likely return something like 1.69 - not 1.7.
That is not true, especially in the context of IEEE 754 double precision format that was mentioned in that answer. 1.7 converts into an exact double 1.6999999999999999555910790149937383830547332763671875 and when that value is "printed" for display, it will always be 1.7, and never 1.69, 1.699999999999 or 1.70000000001. It is 1.7 "exactly".
Learn more here.
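This is easy to verify in Rust, whose default float formatting (like modern JavaScript's) emits the shortest string that round-trips to the same double:

```rust
fn main() {
    let x: f64 = 1.7;
    // Shortest round-trip formatting: always "1.7", never "1.69..." or "1.70...01".
    println!("{}", x); // 1.7
    // Asking for 52 fractional digits reveals the exact value of the double.
    println!("{:.52}", x);
}
```

The second line prints 1.6999999999999999555910790149937383830547332763671875, matching the exact value quoted above.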
7.47 may actually be 7.4699999923423423423 when converted to float
7.47 already is a float, with an exact double value 7.46999999999999975131004248396493494510650634765625. It will not be "converted" to any other float.
a simple system that simply truncates the extra digits off will result in 7.46 and now you've lost a penny somewhere
IEEE rounds, not truncates. And it would not convert to any other number than 7.47 in the first place.
is the JSON number actually a float? As I understand it's a language independent number, and you could parse a JSON number straight into a java BigDecimal or other arbitrary precision format in any language if so inclined.
It is recommended that JSON numbers are interpreted as doubles (IEEE 754 double-precision format). I haven't seen a parser that wouldn't be doing that.
And no, BigDecimal(7.47) is not the right way to do it – it will actually create a BigDecimal representing the exact double of 7.47, which is 7.46999999999999975131004248396493494510650634765625. To get the expected behavior, BigDecimal("7.47") should be used.
Overall, I don't see any fundamental issue with {"price": 7.47}. It will be converted into a double on virtually all platforms, and the semantics of IEEE 754 guarantee that it will be "printed" as 7.47 exactly and always.
Of course floating point rounding errors can happen on further calculations with that value, see e.g. 0.1 + 0.2 == 0.30000000000000004, but I don't see how strings in JSON make this better. If "7.47" arrives as a string and should be part of some calculation, it will need to be converted to some numeric data type anyway, probably float :).
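The same arithmetic behaves identically in, say, Rust, which uses the same IEEE 754 doubles - the error comes from the arithmetic, not from JSON:

```rust
fn main() {
    // The classic accumulation error exists in any language using IEEE 754 doubles.
    let sum = 0.1_f64 + 0.2_f64;
    println!("{}", sum); // 0.30000000000000004
    // Yet each literal on its own still formats back exactly:
    println!("{}", 0.1_f64); // 0.1
}
```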
It's worth noting that strings also have disadvantages, e.g., they cannot be passed to Intl.NumberFormat, they are not a "pure" data type, e.g., the dot is a formatting decision.
I'm not strongly against strings, they seem fine to me as well but I don't see anything wrong on {"price": 7.47} either.
The reason I'm doing it is that the SoftwareAG parser tries to "guess" the Java type from the value it receives.
So when it receives
"jackpot":{
"growth":200,
"percentage":66.67
}
The first value (growth) will become a java.lang.Long and the second (percentage) will become a java.lang.Double
Now when the second object in this jackpot-array has this
"jackpot":{
"growth":50.50,
"percentage":65
}
I have a problem.
When I exchange these values as Strings, I have complete control and can cast/convert the values to whatever I want.
Summarized Version
Just quoting from #dthorpe's answer, as I think this is the most important point:
Also, the JSON spec (http://www.json.org/) explicitly states that NaNs and Infinities (INFs) are invalid for JSON numeric values. If you need to express these fringe elements, you cannot use JSON number. You have to use a string or object structure.
I18N is another reason NOT to use String for decimal numbers
In tens of countries, such as Germany and France, comma (,) is the decimal separator and dot (.) is the thousands separator. See the list on Wikipedia.
If your JSON document carries decimal numbers as string, you're relying on all possible API consumers using the same number format conversion (which is a step after the JSON parsing). There's the risk of incorrect conversion due to inverted use of comma and dot as separators.
If you use number for decimal numbers that risk is averted.
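A small Rust sketch of that risk - a standard numeric parser (like JSON itself) accepts only the dot as the decimal separator, so a decimal-comma string produced by a locale-aware formatter fails outright:

```rust
fn main() {
    // German/French-style decimal comma: not a valid f64 (or JSON number) literal.
    assert!("7,47".parse::<f64>().is_err());
    // The dot form parses fine everywhere.
    assert_eq!("7.47".parse::<f64>().unwrap(), 7.47);
    println!("locale round-trip checks passed");
}
```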