How is a JSON number encoded?

How is a number represented in JSON internally, and how many bytes of data does it take to store a JSON number?
I can't find any info specifying this internal detail.

According to the ECMA-404 standard, §8:
A number is represented in base 10 with no superfluous leading zero. It may have a preceding minus sign (U+002D). It may have a . (U+002E) prefixed fractional part. It may have an exponent of ten, prefixed by e (U+0065) or E (U+0045) and optionally + (U+002B) or - (U+002D). The digits are the code points U+0030 through U+0039.
So, pretty much text, except that (later on the page) NaN and Infinity aren't acceptable values.
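The grammar quoted above fits in a single regular expression. This is a minimal Python sketch that follows the quoted rules only. The names JSON_NUMBER and is_json_number are made up for the illustration:

```python
import re

# ECMA-404 §8 number grammar as a regular expression:
#   optional minus, an integer part with no superfluous leading zero,
#   an optional "." fraction, an optional e/E exponent with optional sign.
# [0-9] is used instead of \d because \d also matches non-ASCII digits in Python.
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")

def is_json_number(text: str) -> bool:
    """Return True if the whole string is a valid JSON number literal."""
    return JSON_NUMBER.fullmatch(text) is not None

for candidate in ["0", "-12.5e+3", "1E10", "083", "+1", ".5", "1.", "NaN", "Infinity"]:
    print(f"{candidate!r:12} {is_json_number(candidate)}")
```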
BSON, however, has int32, int64, and double types that are a bit more traditional.

JSON is a data interchange format. It is just text. There is no "internal" representation of JSON, unless you are referring to how your particular system encodes and stores text data.
The number of bytes it takes to store a JSON number would be the length of the number, in characters, multiplied by the number of bytes required to store a character in your particular system.
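Every character a JSON number can contain (digits, "-", "+", ".", "e", "E") is ASCII, so the size follows directly from the text encoding. A small Python illustration using an arbitrary sample value:

```python
# A JSON number occupies exactly as many bytes as its text,
# which depends on the character encoding used to store that text.
number_text = "-1.25e10"

utf8_size = len(number_text.encode("utf-8"))       # 1 byte per ASCII character
utf16_size = len(number_text.encode("utf-16-le"))  # 2 bytes per character, no BOM

print(len(number_text), utf8_size, utf16_size)     # 8 8 16
```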

Related

Convert binary coordinates to decimal ASN.1 UPER

I cannot correctly convert binary numbers to decimal using ASN.1 compilation. These binary values correspond to latitude and longitude.
lat 1001110010100100101010110011111
long 01101100100101011100100100111000
If I convert them to decimal I get 1314018719 and 1821755704, respectively. However, those are not the coordinates I expect.
I've tried multiple converters without success. Any clue?
I don't know how you think the encoding works. ASN.1 PER is specified by ITU-T X.680 and ITU-T X.691. (UPER is unaligned PER, a variant of PER defined in the same specs.) The rules for integers include doing things such as encoding as an offset from a lower bound, using a length determinant and minimal octets, using a fixed number of octets and no length determinant, etc., depending on the INTEGER type's constraints. Nobody can tell you how to treat the data you've provided without having the ASN.1 schema and knowing what part of it relates to this data, as well as knowing whether the bits you have include the length determinant or not (if there is one).
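To make the "offset from a lower bound" rule concrete, here is a Python sketch of one case: a constrained whole number INTEGER (lb..ub) in unaligned PER. The encoder writes value minus lb in just enough bits to cover the range, with no length determinant. The constraints below are purely hypothetical. They were picked only because they need 31 and 32 bits, which matches the lengths of the two bit strings in the question. Only the real schema can say whether they apply.

```python
from decimal import Decimal

# HYPOTHETICAL constraints for illustration only; the real ones must come
# from the ASN.1 schema. Values assumed to be in units of 0.1 microdegree.
LAT_LB, LAT_UB = -900_000_000, 900_000_001
LON_LB, LON_UB = -1_800_000_000, 1_800_000_001

def constrained_bits(lb: int, ub: int) -> int:
    """Bits UPER uses for INTEGER (lb..ub): enough to hold any offset 0..ub-lb."""
    return (ub - lb).bit_length()

def decode_constrained(bits: str, lb: int, ub: int) -> int:
    """Decode a constrained whole number: the bits hold (value - lb), unsigned."""
    if len(bits) != constrained_bits(lb, ub):
        raise ValueError("bit string length does not match the constraint")
    return lb + int(bits, 2)

lat_bits = "1001110010100100101010110011111"
lon_bits = "01101100100101011100100100111000"

lat = decode_constrained(lat_bits, LAT_LB, LAT_UB)
lon = decode_constrained(lon_bits, LON_LB, LON_UB)
print(Decimal(lat) / 10_000_000, Decimal(lon) / 10_000_000)  # 41.4018719 2.1755704
```

The plain binary-to-decimal step, int(bits, 2), is the same conversion done in the question. The difference is adding the lower bound back, and that lower bound only exists in the schema.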

Why would you use a string in JSON to represent a decimal number

Some APIs, like the PayPal API, use a string type in JSON to represent a decimal number, so "7.47" instead of 7.47.
Why/when would this be a good idea over using the json number value type? AFAIK the number value type allows for infinite precision as well as scientific notation.
The main reason to transfer numeric values in JSON as strings is to eliminate any loss of precision or ambiguity in transfer.
It's true that the JSON spec does not specify a precision for numeric values. This does not mean that JSON numbers have infinite precision. It means that numeric precision is not specified, which means JSON implementations are free to choose whatever numeric precision is convenient to their implementation or goals. It is this variability that can be a pain if your application has specific precision requirements.
Loss of precision generally isn't apparent in the JSON encoding of the numeric value (1.7 is nice and succinct) but manifests in the JSON parsing and intermediate representations on the receiving end. A JSON parsing function would quite reasonably parse 1.7 into an IEEE double precision floating point number. However, finite-length / finite-precision representations will always run into numbers whose expansion in the representation's base cannot be written as a finite sequence of digits:
Irrational numbers (like pi and e)
1.7 has a finite representation in base 10 notation, but in binary (base 2) notation, 1.7 cannot be encoded exactly. Even with a near infinite number of binary digits, you'll only get closer to 1.7, but you'll never get to 1.7 exactly.
So, parsing 1.7 into an in-memory floating point number, then printing out the number will likely return something like 1.69 - not 1.7.
Consumers of the JSON 1.7 value could use more sophisticated techniques to parse and retain the value in memory, such as using a fixed-point data type or a "string int" data type with arbitrary precision, but this will not entirely eliminate the specter of loss of precision in conversion for some numbers. And the reality is, very few JSON parsers bother with such extreme measures, as the benefits for most situations are low and the memory and CPU costs are high.
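Python's standard json module is one example of such an option: it takes a single keyword argument. parse_float receives the original digit string, so decimal.Decimal can keep it exactly. A minimal sketch with made-up field names:

```python
import json
from decimal import Decimal

payload = '{"price": 7.47, "rate": 2.10}'

# Default: JSON numbers with a fraction or exponent become binary floats.
as_floats = json.loads(payload)

# Opt-in: hand the original digits to Decimal instead of float.
as_decimals = json.loads(payload, parse_float=Decimal)

print(as_floats)    # {'price': 7.47, 'rate': 2.1}
print(as_decimals)  # {'price': Decimal('7.47'), 'rate': Decimal('2.10')}
```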
So if you want to send a precise numeric value to a consumer and you don't want automatic conversion of the value into the typical internal numeric representation, your best bet is to ship the numeric value out as a string and tell the consumer exactly how that string should be processed if and when numeric operations need to be performed on it.
For example: In some JSON producers (JRuby, for one), BigInteger values automatically output to JSON as strings, largely because the range and precision of BigInteger is so much larger than the IEEE double precision float. Reducing the BigInteger value to double in order to output as a JSON numeric will often lose significant digits.
Also, the JSON spec (http://www.json.org/) explicitly states that NaNs and Infinities (INFs) are invalid for JSON numeric values. If you need to express these fringe elements, you cannot use JSON number. You have to use a string or object structure.
Finally, there is another aspect which can lead to choosing to send numeric data as strings: control of display formatting. Trailing zeros are insignificant to the numeric value, and leading zeros are not even allowed in a JSON number. If you send the JSON number value 2.10, after conversion to internal numeric form it will be displayed as 2.1.
If you are sending data that will be directly displayed to the user, you probably want your money figures to line up nicely on the screen, decimal aligned. One way to do that is to make the client responsible for formatting the data for display. Another way to do it is to have the server format the data for display. Simpler for the client to display stuff on screen perhaps, but this can make extracting the numeric value from the string difficult if the client also needs to make computations on the values.
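The first option, client-side formatting, can be sketched in Python as follows. The amounts are made-up sample values:

```python
import json

# Numbers arrive without any formatting...
amounts = json.loads("[2.10, 14.5, 7.47]")

# ...so the client decides how to display them: two decimals, right-aligned.
for amount in amounts:
    print(f"{amount:>8.2f}")
```

Because the formatting is applied at display time, the client still holds real numbers for any calculations.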
I'll be a bit contrarian and say that 7.47 is perfectly safe in JSON, even for financial amounts, and that "7.47" isn't any safer.
First, let me address some misconceptions from this thread:
"So, parsing 1.7 into an in-memory floating point number, then printing out the number will likely return something like 1.69 - not 1.7."
That is not true, especially in the context of IEEE 754 double precision format that was mentioned in that answer. 1.7 converts into an exact double 1.6999999999999999555910790149937383830547332763671875 and when that value is "printed" for display, it will always be 1.7, and never 1.69, 1.699999999999 or 1.70000000001. It is 1.7 "exactly".
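Both halves of that claim can be checked in Python. Like many modern languages, its repr() of a float picks the shortest decimal string that converts back to the same double:

```python
from decimal import Decimal

x = float("1.7")  # what a typical JSON parser produces for the literal 1.7

# Decimal(float) shows the exact binary value that was stored...
print(Decimal(x))  # 1.6999999999999999555910790149937383830547332763671875

# ...while repr() prints the shortest decimal string that maps back to it.
print(repr(x))     # 1.7
```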
"7.47 may actually be 7.4699999923423423423 when converted to float"
7.47 already is a float, with an exact double value 7.46999999999999975131004248396493494510650634765625. It will not be "converted" to any other float.
"a simple system that simply truncates the extra digits off will result in 7.46 and now you've lost a penny somewhere"
IEEE rounds, not truncates. And it would not convert to any other number than 7.47 in the first place.
"is the JSON number actually a float? As I understand it's a language independent number, and you could parse a JSON number straight into a java BigDecimal or other arbitrary precision format in any language if so inclined."
It is recommended that JSON numbers are interpreted as doubles (IEEE 754 double-precision format). I haven't seen a parser that wouldn't be doing that.
And no, BigDecimal(7.47) is not the right way to do it: it will actually create a BigDecimal representing the exact double of 7.47, which is 7.46999999999999975131004248396493494510650634765625. To get the expected behavior, BigDecimal("7.47") should be used.
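Python's decimal.Decimal has the same pitfall. It is shown here as an analogue of the Java behaviour, not as the Java API itself:

```python
from decimal import Decimal

# Like Java's BigDecimal constructors: building from a float captures the
# float's exact binary value, building from a string captures the digits as written.
from_float = Decimal(7.47)
from_string = Decimal("7.47")

print(from_float)   # 7.46999999999999975131004248396493494510650634765625
print(from_string)  # 7.47
```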
Overall, I don't see any fundamental issue with {"price": 7.47}. It will be converted into a double on virtually all platforms, and the semantics of IEEE 754 guarantee that it will be "printed" as 7.47 exactly and always.
Of course floating point rounding errors can happen on further calculations with that value, see e.g. 0.1 + 0.2 == 0.30000000000000004, but I don't see how strings in JSON make this better. If "7.47" arrives as a string and should be part of some calculation, it will need to be converted to some numeric data type anyway, probably float :).
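A quick Python check of that point: the error comes from the arithmetic. A string payload only avoids it if the consumer converts the string to a decimal type rather than to float:

```python
from decimal import Decimal

# The rounding error shows up in arithmetic on doubles.
print(0.1 + 0.2)                        # 0.30000000000000004

# A string helps only if it is converted to a decimal type...
print(Decimal("0.1") + Decimal("0.2"))  # 0.3

# ...converting it to float first brings the same error back.
print(float("0.1") + float("0.2"))      # 0.30000000000000004
```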
It's worth noting that strings also have disadvantages: e.g., they cannot be passed to Intl.NumberFormat, and they are not a "pure" data type, since the dot is a formatting decision.
I'm not strongly against strings; they seem fine to me as well, but I don't see anything wrong with {"price": 7.47} either.
The reason I do this is that the SoftwareAG parser tries to "guess" the Java type from the value it receives.
So when it receives
"jackpot":{
"growth":200,
"percentage":66.67
}
The first value (growth) will become a java.lang.Long and the second (percentage) will become a java.lang.Double
Now, when the second object in this jackpot array has this:
"jackpot":{
"growth":50.50,
"percentage":65
}
I have a problem.
When I exchange these values as Strings, I have complete control and can cast/convert the values to whatever I want.
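The same per-value type guessing is easy to reproduce with Python's json module, which is used here as an illustration and not as the SoftwareAG parser. Its parse_int and parse_float hooks are one way to get a single numeric type without switching to strings:

```python
import json
from decimal import Decimal

first = '{"growth": 200, "percentage": 66.67}'
second = '{"growth": 50.50, "percentage": 65}'

# Python's json module also picks a type per value: int or float.
for text in (first, second):
    obj = json.loads(text)
    print({key: type(value).__name__ for key, value in obj.items()})

# Routing both kinds of number through Decimal gives one consistent type.
obj = json.loads(second, parse_int=Decimal, parse_float=Decimal)
print(obj)  # {'growth': Decimal('50.50'), 'percentage': Decimal('65')}
```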
Summarized Version
Just quoting from @dthorpe's answer, as I think this is the most important point:
Also, the JSON spec (http://www.json.org/) explicitly states that NaNs and Infinities (INFs) are invalid for JSON numeric values. If you need to express these fringe elements, you cannot use JSON number. You have to use a string or object structure.
I18N is another reason NOT to use String for decimal numbers
In dozens of countries, such as Germany and France, comma (,) is the decimal separator, and in many of them (Germany, for example) dot (.) is the thousands separator. See the list on Wikipedia.
If your JSON document carries decimal numbers as string, you're relying on all possible API consumers using the same number format conversion (which is a step after the JSON parsing). There's the risk of incorrect conversion due to inverted use of comma and dot as separators.
If you use a JSON number for decimal numbers, that risk is averted.
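To make the risk concrete, here is a Python sketch that reads the same string under two conventions. parse_german is a hypothetical stand-in for a locale-aware converter, not a library function:

```python
# Hypothetical helper: reads a number written with German conventions,
# where "." groups thousands and "," is the decimal separator.
def parse_german(text: str) -> float:
    return float(text.replace(".", "").replace(",", "."))

value = "1.234"  # a decimal number sent as a JSON string

print(float(value))         # 1.234  (dot read as the decimal separator)
print(parse_german(value))  # 1234.0 (dot read as a thousands separator)
```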

Why is JSON invalid if an integer begins with a leading zero?

I'm importing some JSON files into my Parse.com project, and I keep getting the error "invalid key:value pair".
It states that there is an unexpected "8".
Here's an example of my JSON:
{
"Manufacturer":"Manufacturer",
"Model":"THIS IS A STRING",
"Description":"",
"ItemNumber":"Number12345",
"UPC":083456789012,
"Cost":"$0.00",
"DealerPrice":" $0.00 ",
"MSRP":" $0.00 "
}
If I update the JSON by either removing the 0 from "UPC":083456789012, or converting it to "UPC":"083456789012", it becomes valid.
Can JSON really not accept an integer that begins with 0, or is there a way around the problem?
A leading 0 indicates an octal number in JavaScript. An octal number cannot contain an 8; therefore, that number is invalid.
Moreover, JSON doesn't (officially) support octal numbers, so formally the JSON is invalid, even if the number would not contain an 8. Some parsers do support it though, which may lead to some confusion. Other parsers will recognize it as an invalid sequence and will throw an error, although the exact explanation they give may differ.
Solution: If you have a number, don't ever store it with leading zeroes. If you have a value that needs to have a leading zero, don't treat it as a number, but as a string. Store it with quotes around it.
In this case, you've got a UPC which needs to be 12 digits long and may contain leading zeroes. I think the best way to store it is as a string.
It is debatable, though. If you treat it as a barcode, seeing the leading 0 as an integral part of it, then string makes sense. Other types of barcodes can even contain alphabetic characters.
On the other hand, a UPC is a number, and the fact that it's left-padded with zeroes to 12 digits could be seen as a display property. Actually, if you left-pad it to 13 digits by adding an extra 0, you've got an EAN code, because EAN is a superset of UPC.
If you have a monetary amount, you might display it as € 7.30, while you store it as 7.3, so it could also make sense to store a product code as a number.
But that decision is up to you. I can only advise you to use a string, which is my personal preference for these codes; if you choose a number, then you'll have to remove the 0 to make it work.
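Using Python's json module as an example of a strict parser, the three options look like this. Twelve digits is the UPC length mentioned above:

```python
import json

invalid = '{"UPC": 083456789012}'
as_string = '{"UPC": "083456789012"}'
as_number = '{"UPC": 83456789012}'

# A strict parser rejects the leading zero.
try:
    json.loads(invalid)
except json.JSONDecodeError as err:
    print("rejected:", err.msg)

# As a string, the code survives unchanged.
print(json.loads(as_string)["UPC"])                 # 083456789012

# As a number, the padding has to be restored for display.
print(str(json.loads(as_number)["UPC"]).zfill(12))  # 083456789012
```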
One of the more confusing parts of JavaScript is that if a number starts with a 0 that isn't immediately followed by a ., it represents an octal, not a decimal.
JSON borrows from JavaScript syntax but avoids confusing features, so it simply bans numbers with leading zeros (unless the leading zero is followed by a .) outright.
Even if this wasn't the case, there would be no reason to expect the 0 to still be in the number when it was parsed, since 02 and 2 are just different representations of the same number (if you force decimal).
If the leading zero is important to your data, then you probably have a string and not a number.
"UPC":"083456789012"
A product code is an identifier, not something you do maths with. It should be a string.
Formally, it is because JSON uses DecimalIntegerLiteral in its JSONNumber production:
JSONNumber ::
-_opt DecimalIntegerLiteral JSONFraction_opt ExponentPart_opt
And DecimalIntegerLiteral may only start with 0 if it is 0:
DecimalIntegerLiteral ::
0
NonZeroDigit DecimalDigits_opt
The rationale behind this is probably:
In the JSON Grammar - to reuse constructs from the main ECMAScript grammar.
In the main ECMAScript grammar - to make it easier to distinguish DecimalIntegerLiteral from HexIntegerLiteral and, in the first place, from OctalIntegerLiteral.
See these productions:
HexIntegerLiteral ::
0x HexDigit
0X HexDigit
HexIntegerLiteral HexDigit
...
OctalIntegerLiteral ::
0 OctalDigit
OctalIntegerLiteral OctalDigit
The UPC should be in string format. In the future you may also get other types of codes, such as GS128 or string-based product identification codes. Set your DB column to be a string.
If an integer starts with 0 in JavaScript, it is considered to be the octal (base 8) value of the integer instead of the decimal (base 10) value. For example:
var a = 065; // legacy octal literal, 53 in decimal (a SyntaxError in strict mode)
var b = 53;  // decimal literal
a == b;      // true
I think the easiest way to send your number via JSON is to send it as a string.
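For contrast, Python 3 does not guess the base from a leading zero at all. A literal such as 065 is a SyntaxError there, and the base has to be spelled out:

```python
# Python 3 makes the base explicit instead of inferring it from a leading 0.
print(0o65)           # 53  (octal literal needs the 0o prefix)
print(int("065", 8))  # 53  (string parsed as octal)
print(int("065"))     # 65  (string parsed as decimal; leading zero ignored)
```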

Is there error propagation when serializing floating point values to strings?

Say I have a float (or double) in my favorite language. Say that in memory this value is stored according to IEEE 754, say that I serialize this value in XML or JSON or plain text using base 10. When serializing and de-serializing this value will I lose precision of my number? When should I care about this precision loss?
Would converting the number to base64 prevent the loss of precision?
It depends on the binary-to-decimal conversion function that you use. Assuming this function is not botched (it has no reason to be):
1. Either it converts to a fixed precision. Old-fashioned languages such as C offer this kind of conversion to decimal. In this case, you should use a format with 17 significant decimal digits. A common format is D.DDDDDDDDDDDDDDDDEXXX, where D and X are decimal digits and there are 16 digits after the dot. This would be specified as %.16e in C-like languages. Converting such a decimal value back to the nearest double produces the same double that was originally printed.
2. Or it converts it to the shortest decimal representation that converts back to the same double. This is what some modern programming languages (e.g. Java) offer as their default printing function. In this case, the property that parsing back the decimal representation will return the original double is automatic.
In either case loss of accuracy should not happen. This is not because you get the exact decimal representation of the original binary64 number with either method 1. or 2. above: in the general case, you don't. Such an exact representation always exists (because 10 is a multiple of 2), but can be up to ~750 digits long for a binary64 number.
What you get with method 1. or 2. above is a decimal number that is closer to the original binary64 number than to any other binary64 number. This means that the opposite conversion, from decimal to binary64, will “round back” to the original.
This is where the “non-botched” assumption is necessary: in order for the successive conversions to return to the original number they must respectively produce the closest decimal to the binary64 number passed and the closest binary64 to the decimal number passed. In these conditions, and with the appropriate number of decimal digits for the first conversion, the round-trip is lossless.
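Python's float() parsing and float formatting are correctly rounded (the "non-botched" case), so both methods can be exercised from it. format(x, ".16e") corresponds to method 1 and repr(x) to method 2. This is a quick randomized check, not a proof:

```python
import random
import struct

def random_double(rng: random.Random) -> float:
    """Draw an arbitrary finite double by reinterpreting 64 random bits."""
    while True:
        x = struct.unpack("<d", rng.getrandbits(64).to_bytes(8, "little"))[0]
        if x == x and abs(x) != float("inf"):  # skip NaN and infinities
            return x

rng = random.Random(0)
samples = [0.1, 1.7, 7.47, 2.0 ** -1074] + [random_double(rng) for _ in range(10_000)]

for x in samples:
    fixed = format(x, ".16e")  # method 1: 17 significant digits, like C's %.16e
    shortest = repr(x)         # method 2: shortest string that round-trips
    assert float(fixed) == x and float(shortest) == x

print(format(0.1, ".16e"), repr(0.1))  # 1.0000000000000001e-01 0.1
```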
I should point out that (non-botched) conversions to and from decimal are expensive operations. Unless human-readability of the result is important for you, you should consider a simpler format to convert to. The C99-style hexadecimal representation for floating-point numbers is a good compromise between conversion cost and readability. It is not the most compact but it contains only printable characters.
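Python exposes this C99-style hexadecimal notation through float.hex() and float.fromhex(). A small sketch:

```python
x = 1.7

# C99-style hexadecimal floating point: exact and cheap to convert.
text = x.hex()
print(text)                      # 0x1.b333333333333p+0

# Parsing it back gives the identical double.
assert float.fromhex(text) == x
```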
The approach of converting to the shortest form which converts back the same is dangerous (the "round-trip" string formatting mode in .NET uses such an approach, and is buggy as a result). While there is probably no reason for a decimal-to-binary conversion method to yield a result which is more than 0.75lsb from the exact specified numerical value, guaranteeing that a conversion will always yield a perfectly-rounded numerical value is expensive and in most cases not particularly helpful. It would be better to ensure that the precise arithmetic value of the decimal expression will be less than 0.25lsb from the double value to be represented. If a string whose value is less than 0.25lsb away from a double is fed to a routine which returns a double within 0.75lsb of it, the latter routine can be guaranteed to yield the same double as was given to the former.
The approach of simply finding the shortest form that yields the same double assumes that any string representation will always be parsed the same way, even if the value represented falls almost exactly halfway between two adjacent double values. Since obtaining a perfectly-rounded result could require reading an arbitrary number of digits (e.g. 1125899906842624.125000...1 should round up to 1125899906842624.25), few implementations are apt to bother; if an implementation is going to ignore digits beyond a certain point, even when that might yield a result that was e.g. more than .056lsb away from the correct one, it shouldn't be trusted to be accurate to 0.50000lsb in any case.
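The example above can be checked against Python's float(), which reads every digit and rounds correctly, with ties going to even. The number of zeros before the final 1 is arbitrary:

```python
# 2**50 = 1125899906842624; doubles in [2**50, 2**51) are spaced 0.25 apart,
# so 1125899906842624.125 lies exactly halfway between two neighbours.
base = 2 ** 50
halfway = "1125899906842624.125"
just_above = "1125899906842624.125" + "0" * 30 + "1"

print(float(halfway) - base)     # 0.0   (tie goes to the even neighbour)
print(float(just_above) - base)  # 0.25  (the trailing 1 decides it)
```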

Why is it useful to know how to convert between numeric bases?

We are learning about converting Binary to Decimal (and vice-versa) as well as other base-conversion methods, but I don't understand the necessity of this knowledge.
Are there any real-world uses for converting numbers between different bases?
When dealing with Unicode escape codes: '\u2014' in JavaScript is &#8212; in HTML
When debugging: many debuggers show all numbers in hex
When writing bitmasks: it's more convenient to specify powers of two in hex (or by writing 1 << 4)
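Two of these conversions, sketched in Python:

```python
# Unicode escape to HTML numeric character reference: one code point, two bases.
code_point = int("2014", 16)  # the hex digits from the escape "\u2014"
print(code_point)             # 8212
print(f"&#{code_point};")     # &#8212;
print(f"&#x{code_point:x};")  # &#x2014; (HTML also accepts hex references)

# Bitmasks: the same power of two written in three bases.
assert 1 << 4 == 0x10 == 0b10000 == 16
```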
In this article I describe a concrete use case. In short, suppose you have a series of bytes you want to transfer using some transport mechanism, but you cannot simply pass the payload as bytes, because you are not able to send binary content. Let's say you can only use 64 characters for encoding the payload. A solution to this problem is to convert the bytes (8-bit characters) into 6-bit characters. Here the number conversion comes into play. Consider the series of bytes as a big number whose base is 256. Then convert it into a number with base 64 and you are done. Each digit of the new base 64 number now denotes a character of your encoded payload...
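The article is not reproduced here, but the idea can be sketched in Python. Read the bytes as one base-256 integer, then write that integer in base 64, assuming the standard RFC 4648 base64 alphabet. When the input length is a multiple of 3, the result matches base64.b64encode. Padding for other lengths is left out of this sketch:

```python
import base64
import string

# RFC 4648 base64 alphabet: digit value 0 is "A", 63 is "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def bytes_as_base64_number(data: bytes) -> str:
    """Treat the bytes as one base-256 number and rewrite it in base 64."""
    if len(data) % 3 != 0:
        # Real base64 pads other lengths with "="; this sketch skips that case.
        raise ValueError("length must be a multiple of 3")
    n = int.from_bytes(data, "big")
    width = len(data) * 4 // 3  # 256**(3k) == 64**(4k), so the digit count is fixed
    digits = []
    for _ in range(width):
        n, d = divmod(n, 64)
        digits.append(ALPHABET[d])
    return "".join(reversed(digits))

for sample in (b"Man", b"hello!", b"\x00\x00\xff"):
    ours = bytes_as_base64_number(sample)
    print(sample, ours, ours == base64.b64encode(sample).decode("ascii"))
```

Fixing the width keeps leading zero bytes, which the plain "big number" view would otherwise drop.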
If you have a device, such as a hard drive, that can only have a set number of states, you can only count in a number system with that many states.
Because each bit in a computer has only two states, on and off, you can only represent 0 and 1. Therefore a base-2 system is used.
If you have a device that had 3 states, you could represent 0, 1 and 2, and therefore count in a base 3 system.
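Converting in either direction takes only a few lines of Python. Repeated division by the base produces the digits, least significant first, and the built-in int(text, base) converts back. to_base is a hypothetical helper written for this sketch:

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base(n: int, base: int) -> str:
    """Write a non-negative integer in the given base (2..36) by repeated division."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))

print(to_base(10, 2), to_base(10, 3), to_base(255, 16))  # 1010 101 ff
print(int("1010", 2), int("101", 3), int("ff", 16))      # 10 10 255
```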