What is the standard for formatting currency values in JSON?

Bearing in mind various quirks of the data types, and localization, what is the best way for a web service to communicate monetary values to and from applications? Is there a standard somewhere?
My first thought was to simply use the number type. For example
"amount": 1234.56
I have seen many arguments about issues with a lack of precision and rounding errors when using floating point data types for monetary calculations--however, we are just transmitting the value, not calculating, so that shouldn't matter.
EventBrite's JSON currency specifications specify something like this:
{
"currency": "USD",
"value": 432,
"display": "$4.32"
}
Bravo for avoiding floating point values, but now we run into another issue: what's the largest number we can hold?
One comment (I don’t know if it’s true, but it seems reasonable) claims that, since number implementations vary across JSON parsers, the best you can expect is a 32-bit signed integer. The largest value a 32-bit signed integer can hold is 2147483647. If we represent values in the minor unit, that’s $21,474,836.47. $21 million seems like a huge number, but it’s not inconceivable that some application may need to work with a value larger than that.
The problem gets worse with currencies where 1,000 of the minor unit make a major unit, or where the currency is worth less than the US dollar. For example, a Tunisian dinar is divided into 1,000 millimes. 2147483647 millimes, or 2147483.647 TND, is $1,124,492.04, and values over $1 million are even more likely to come up in such cases. Another example: the subunits of the Vietnamese dong have been rendered useless by inflation, so let’s just use major units. 2147483647 VND is $98,526.55. I’m sure many use cases (bank balances, real estate values, etc.) are substantially higher than that. (EventBrite probably doesn’t have to worry about ticket prices being that high, though!)
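To make the overflow concrete, here is a minimal Java sketch (my own illustration, not from the comment above) of what happens when a minor-unit amount just past the 32-bit limit is squeezed into an int:
public class Overflow {
    public static void main(String[] args) {
        long cents = 2_147_483_648L;   // $21,474,836.48 in minor units, one cent past Integer.MAX_VALUE
        int truncated = (int) cents;   // narrowing silently wraps around
        System.out.println(truncated); // prints -2147483648: a negative balance
    }
}
A consumer that parses into a 64-bit long (or a string) never hits this, which is exactly why the parser's choice of numeric type matters.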
If we avoid that problem by communicating the value as a string, how should the string be formatted? Different countries/locales have drastically different formats: different currency symbols, whether the symbol occurs before or after the amount, whether there is a space between the symbol and the amount, whether a comma or period is used as the decimal separator, whether commas are used as thousands separators, parentheses or a minus sign to indicate negative values, and possibly more that I’m not aware of.
Should the app know what locale/currency it's working with, communicate values like
"amount": "1234.56"
back and forth, and trust the app to correctly format the amount? (Also: should the decimal value be avoided, and the value specified in terms of the smallest monetary unit? Or should the major and minor unit be listed in different properties?)
Or should the server provide the raw value and the formatted value?
"amount": "1234.56"
"displayAmount": "$1,234.56"
Or should the server provide the raw value and the currency code, and let the app format it?
"amount": "1234.56"
"currencyCode": "USD"
I assume whichever method is used should be used in both directions, transmitting to and from the server.
I have been unable to find the standard--do you have an answer, or can point me to a resource that defines this? It seems like a common issue.

I don't know if it's the best solution, but what I'm trying now is to just pass values as strings unformatted except for a decimal point, like so:
"amount": "1234.56"
The app could easily parse that (and convert it to double, BigDecimal, int, or whatever type the app developer feels is best suited for the arithmetic involved). The app would be responsible for formatting the value for display according to locale and currency.
This format could accommodate other currency values, whether highly inflated large numbers, numbers with three digits after the decimal point, numbers with no fractional values at all, etc.
Of course, this would assume the app already knows the locale and currency used (from another call, an app setting, or local device values). If those need to be specified per call, another option would be:
"amount": "1234.56",
"currency": "USD",
"locale": "en_US"
I'm tempted to roll these into one JSON object, but a JSON feed may have multiple amounts for different purposes, in which case the currency settings would only need to be specified once. Of course, if the currency could vary for each amount listed, then it would be best to encapsulate them together, like so:
{
"amount": "1234.56",
"currency": "USD",
"locale": "en_US"
}
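For what it's worth, here is a rough Java sketch of the client-side formatting this structure implies (my own illustration; parsing the JSON itself is omitted, and note the en_US tag needs its underscore swapped before Locale.forLanguageTag accepts it):
import java.math.BigDecimal;
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

public class DisplayAmount {
    public static void main(String[] args) {
        BigDecimal amount = new BigDecimal("1234.56");                     // "amount" field
        Locale locale = Locale.forLanguageTag("en_US".replace('_', '-'));  // "locale" field
        NumberFormat fmt = NumberFormat.getCurrencyInstance(locale);
        fmt.setCurrency(Currency.getInstance("USD"));                      // "currency" field
        System.out.println(fmt.format(amount));                            // $1,234.56
    }
}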
Another debatable approach is for the server to provide the raw amount and the formatted amount. (If so, I would suggest encapsulating it as an object, instead of having multiple properties in a feed that all define the same concept):
{
"displayAmount":"$1,234.56",
"calculationAmount":"1234.56"
}
Here, more of the work is offloaded to the server. It also ensures consistency across different platforms and apps in how the numbers are displayed, while still providing an easily parseable value for conditional testing and the like.
However, it does leave a problem--what if the app needs to perform calculations and then show the results to the user? It will still need to format the number for display. Might as well go with the first example at the top of this answer and give the app control over the formatting.
Those are my thoughts, at least. I've been unable to find any solid best practices or research in this area, so I welcome better solutions or potential pitfalls I haven't pointed out.

AFAIK, there is no "currency" standard in JSON - it is a standard based on rudimentary types. Things you might want to consider are that some currencies do not have a decimal part (Guinean franc, Indonesian rupiah) and some can be divided into thousandths (Bahraini dinar) - hence you don't want to assume two decimal places. For the Iranian rial, $2 million is not going to get you far, so I would expect you need to deal with doubles, not integers. If you are looking for a general international model then you will need a currency code, as countries with hyperinflation often change currencies every year or two to divide the value by 1,000,000 (or 100 million). Historically Brazil and Iran have both done this, I think.
If you need a reference for currency codes (and a bit of other good information) then take a look here: https://gist.github.com/Fluidbyte/2973986

An amount of money should be represented as a string.
The idea of using a string is that any client consuming the JSON should parse it into a decimal type such as BigDecimal to avoid floating-point imprecision.
However, this is only meaningful if every part of the system avoids floating point too. Even if the backend is only passing data along and not doing any calculation, using floating point would eventually mean that what you see (in the program) is not what you get (in the JSON).
And assuming the source is a database, it is important to have the data stored with the right type. If the data is already stored as floating point, then any subsequent conversion or casting would be meaningless, as it would technically just be passing the imprecision around.
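As a quick illustration of why this matters (a sketch of my own, not from the answer above): sum the JSON string "0.10" ten times via BigDecimal and the result stays exact; route the same values through double and the imprecision shows up:
import java.math.BigDecimal;

public class ExactSum {
    public static void main(String[] args) {
        BigDecimal exact = BigDecimal.ZERO;
        for (int i = 0; i < 10; i++) exact = exact.add(new BigDecimal("0.10")); // parsed from the JSON string
        System.out.println(exact); // 1.00

        double lossy = 0;
        for (int i = 0; i < 10; i++) lossy += 0.10; // same sum through a double
        System.out.println(lossy); // 0.9999999999999999
    }
}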

On Dev Portal - API Guidelines - Currencies you may find interesting suggestions:
"price" : {
"amount": 40,
"currency": "EUR"
}
It's a bit harder to produce and format than just a string, but I feel this is the cleanest and most meaningful way to achieve it:
uncouple amount and currency
use the number JSON type
Here is the suggested JSON schema:
https://pattern.yaas.io/v2/schema-monetary-amount.json
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"title": "Monetary Amount",
"description":"Schema defining monetary amount in given currency.",
"properties": {
"amount": {
"type": "number",
"description": "The amount in the specified currency"
},
"currency": {
"type": "string",
"pattern": "^[a-zA-Z]{3}$",
"description": "ISO 4217 currency code, e.g.: USD, EUR, CHF"
}
},
"required": [
"amount",
"currency"
]
}
Another question related to currency formats pointed out, rightly or wrongly, that the more common practice is a string with base units:
{
"price": "40.0"
}

There probably isn't any official standard. We are using the following structure for our products:
"amount": {
"currency": "EUR",
"scale": 2,
"value": 875
}
The example above represents the amount €8.75.
Currency is defined as a string (values should correspond to ISO 4217), while scale and value are integers: the amount equals value × 10^-scale. This structure solves many of the problems with currencies that have no fractions, non-standard fractions, etc.
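As a side note of mine (not part of the original answer), this structure maps directly onto Java's BigDecimal, whose valueOf factory takes exactly an unscaled value and a scale:
import java.math.BigDecimal;

public class ScaledAmount {
    public static void main(String[] args) {
        long value = 875; // "value" from the JSON
        int scale = 2;    // "scale" from the JSON
        BigDecimal amount = BigDecimal.valueOf(value, scale);
        System.out.println(amount); // 8.75, i.e. EUR 8.75
    }
}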

Related

Converting strings based on units in a MongoDB field

I am using MongoDB to store different values based of units.
For example I have a speed field:
"Speed":"1 m/s"
or
"Speed":"1 mph"
I also have distance field, like this:
"Distance": "1 ft"
or
"Distance":"1 meter"
I have about 20 different field types, like speed, distance, power, area, angle, and others. I would like to store all the fields of different unit types in the same units, so I can compare them. I am not sure if it would be best to do this on input or when I am reading from the database, but either is an option.
I am planning on storing a field unit type (i.e., this field is a speed) and an equation to get to the base unit (i.e., if the speed field is in m/s and the base field is in ft/sec, multiply by 3.28), but I am not sure how to structure this. So ideally the fields above would be something like:
{"Speed":"1 m/s"},
{"Speed":"1 mph"},
{"Distance": "1 ft"},
{"Distance":"1 meter"}
Would become
{"Speed":{"base(ft/sec)":3.28,"orig_val":1,"orig_unit":"m/s"},
{"Speed":{"base(ft/sec)":1.47,"orig_val":1,"orig_unit":"mph"},
{"Distance":{"base(in)":12,"orig_val":1,"orig_unit":"ft"},
{"Distance":{"base(in)":39.37,"orig_val":1,"orig_unit":"meter"}
Some thoughts.
Store a specific field as the same unit, irrespective of how you capture - e.g., distance is always stored as feet. The field would be stored as { distance_feet: 120 }.
Store a specific field as it is captured, i.e., a field can be captured with different units. This will have an additional field specifying the "units". For example, { distance: 120, units: "feet" }. In this case, the field units can be either "feet" or "meters" .
In both cases, the application (or program) logic can take care of the conversion from feet to meters or vice versa.
An additional field called "conversion_factor" can be stored in the collection (e.g., for feet and meters conversion, { conversion_factor: 3.28084 }). This involves storing the same information in the database many times; with a large amount of data this adds to the storage space and the memory used when data is read - a factor to consider.
What info to store and how depends upon factors like:
What is the purpose of this data and how is it used in the application?
Is it queried? How and in what format? Captured in what format? How often? What kind of computations (calculations, comparisons, etc.) happen upon this data?
I think the application requirements or functionality should drive how you design the data, not how you have to program it. You should know at this stage what are the important things you will be doing with the data.
I have about 20 different field types, like speed, distance, power, area, angle, and others I would like to store all the fields of different units types in the same units, so I can compare them.
I think comparing is a computation, and the program can take care of data like the "conversion factor" and converting from one unit to another.
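A minimal sketch of that program-side conversion (names and factors below are my own, approximate illustration): keep one factor per unit into the chosen base unit, and convert on read or on write:
import java.util.Map;

public class UnitConverter {
    // Approximate factors into the base unit ft/sec
    private static final Map<String, Double> TO_FT_PER_SEC =
            Map.of("m/s", 3.28084, "mph", 1.46667, "ft/sec", 1.0);

    static double toBase(double value, String unit) {
        Double factor = TO_FT_PER_SEC.get(unit);
        if (factor == null) throw new IllegalArgumentException("Unknown unit: " + unit);
        return value * factor;
    }

    public static void main(String[] args) {
        System.out.println(toBase(1, "m/s")); // ~3.28084 ft/sec
        System.out.println(toBase(1, "mph")); // ~1.46667 ft/sec
    }
}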

Why would you use a string in JSON to represent a decimal number

Some APIs, like the PayPal API, use a string type in JSON to represent a decimal number. So "7.47" instead of 7.47.
Why/when would this be a good idea over using the JSON number value type? AFAIK the number value type allows for infinite precision as well as scientific notation.
The main reason to transfer numeric values in JSON as strings is to eliminate any loss of precision or ambiguity in transfer.
It's true that the JSON spec does not specify a precision for numeric values. This does not mean that JSON numbers have infinite precision. It means that numeric precision is not specified, which means JSON implementations are free to choose whatever numeric precision is convenient to their implementation or goals. It is this variability that can be a pain if your application has specific precision requirements.
Loss of precision generally isn't apparent in the JSON encoding of the numeric value (1.7 is nice and succinct) but manifests in the JSON parsing and intermediate representations on the receiving end. A JSON parsing function would quite reasonably parse 1.7 into an IEEE double-precision floating-point number. However, finite-length / finite-precision representations will always run into numbers whose expansions cannot be represented as a finite sequence of digits in the representation's base:
Irrational numbers (like pi and e)
1.7 has a finite representation in base 10 notation, but in binary (base 2) notation, 1.7 cannot be encoded exactly. Even with a near infinite number of binary digits, you'll only get closer to 1.7, but you'll never get to 1.7 exactly.
So, parsing 1.7 into an in-memory floating point number, then printing out the number will likely return something like 1.69 - not 1.7.
Consumers of the JSON 1.7 value could use more sophisticated techniques to parse and retain the value in memory, such as using a fixed-point data type or a "string int" data type with arbitrary precision, but this will not entirely eliminate the specter of loss of precision in conversion for some numbers. And the reality is, very few JSON parsers bother with such extreme measures, as the benefits for most situations are low and the memory and CPU costs are high.
So if you are wanting to send a precise numeric value to a consumer and you don't want automatic conversion of the value into the typical internal numeric representation, your best bet is to ship the numeric value out as a string and tell the consumer exactly how that string should be processed if and when numeric operations need to be performed on it.
For example: In some JSON producers (JRuby, for one), BigInteger values automatically output to JSON as strings, largely because the range and precision of BigInteger is so much larger than the IEEE double precision float. Reducing the BigInteger value to double in order to output as a JSON numeric will often lose significant digits.
Also, the JSON spec (http://www.json.org/) explicitly states that NaNs and Infinities (INFs) are invalid for JSON numeric values. If you need to express these fringe elements, you cannot use JSON number. You have to use a string or object structure.
Finally, there is another aspect which can lead to choosing to send numeric data as strings: control of display formatting. Leading zeros and trailing zeros are insignificant to the numeric value. If you send JSON number value 2.10 or 004, after conversion to internal numeric form they will be displayed as 2.1 and 4.
If you are sending data that will be directly displayed to the user, you probably want your money figures to line up nicely on the screen, decimal aligned. One way to do that is to make the client responsible for formatting the data for display. Another way to do it is to have the server format the data for display. Simpler for the client to display stuff on screen perhaps, but this can make extracting the numeric value from the string difficult if the client also needs to make computations on the values.
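A tiny Java demonstration of that last point (my own): once 2.10 has passed through a double, the trailing zero is gone, and getting it back is purely a formatting decision:
public class TrailingZeros {
    public static void main(String[] args) {
        double d = Double.parseDouble("2.10");
        System.out.println(d);          // 2.1 - the trailing zero is lost
        System.out.printf("%.2f%n", d); // 2.10 - restored only by explicit formatting
    }
}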
I'll be a bit contrarian and say that 7.47 is perfectly safe in JSON, even for financial amounts, and that "7.47" isn't any safer.
First, let me address some misconceptions from this thread:
So, parsing 1.7 into an in-memory floating point number, then printing out the number will likely return something like 1.69 - not 1.7.
That is not true, especially in the context of IEEE 754 double precision format that was mentioned in that answer. 1.7 converts into an exact double 1.6999999999999999555910790149937383830547332763671875 and when that value is "printed" for display, it will always be 1.7, and never 1.69, 1.699999999999 or 1.70000000001. It is 1.7 "exactly".
7.47 may actually be 7.4699999923423423423 when converted to float
7.47 already is a float, with an exact double value 7.46999999999999975131004248396493494510650634765625. It will not be "converted" to any other float.
a simple system that simply truncates the extra digits off will result in 7.46 and now you've lost a penny somewhere
IEEE rounds, not truncates. And it would not convert to any other number than 7.47 in the first place.
is the JSON number actually a float? As I understand it's a language independent number, and you could parse a JSON number straight into a java BigDecimal or other arbitrary precision format in any language if so inclined.
It is recommended that JSON numbers are interpreted as doubles (IEEE 754 double-precision format). I haven't seen a parser that wouldn't be doing that.
And no, BigDecimal(7.47) is not the right way to do it – it will actually create a BigDecimal representing the exact double of 7.47, which is 7.46999999999999975131004248396493494510650634765625. To get the expected behavior, BigDecimal("7.47") should be used.
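You can verify that difference in two lines (a quick check of my own):
import java.math.BigDecimal;

public class Constructors {
    public static void main(String[] args) {
        System.out.println(new BigDecimal(7.47));   // 7.46999999999999975131004248396493494510650634765625
        System.out.println(new BigDecimal("7.47")); // 7.47
    }
}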
Overall, I don't see any fundamental issue with {"price": 7.47}. It will be converted into a double on virtually all platforms, and the semantics of IEEE 754 guarantee that it will be "printed" as 7.47 exactly and always.
Of course floating point rounding errors can happen on further calculations with that value, see e.g. 0.1 + 0.2 == 0.30000000000000004, but I don't see how strings in JSON make this better. If "7.47" arrives as a string and should be part of some calculation, it will need to be converted to some numeric data type anyway, probably float :).
It's worth noting that strings also have disadvantages, e.g., they cannot be passed to Intl.NumberFormat, they are not a "pure" data type, e.g., the dot is a formatting decision.
I'm not strongly against strings, they seem fine to me as well but I don't see anything wrong on {"price": 7.47} either.
The reason I'm doing it is that the SoftwareAG parser tries to "guess" the java type from the value it receives.
So when it receives
"jackpot":{
"growth":200,
"percentage":66.67
}
The first value (growth) will become a java.lang.Long and the second (percentage) will become a java.lang.Double
Now when the second object in this jackpot-array has this
"jackpot":{
"growth":50.50,
"percentage":65
}
I have a problem.
When I exchange these values as Strings, I have complete control and can cast/convert the values to whatever I want.
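The same guess-the-type behavior is easy to reproduce with a common parser such as Jackson (used here purely as an illustration; the SoftwareAG parser is its own product, and the concrete classes chosen can differ by parser and by value size):
import com.fasterxml.jackson.databind.ObjectMapper;

public class TypeGuess {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        System.out.println(mapper.readValue("200", Object.class).getClass());   // class java.lang.Integer
        System.out.println(mapper.readValue("66.67", Object.class).getClass()); // class java.lang.Double
    }
}
Reading both values as strings and converting them yourself sidesteps the guessing entirely.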
Summarized Version
Just quoting from #dthorpe's answer, as I think this is the most important point:
Also, the JSON spec (http://www.json.org/) explicitly states that NaNs and Infinities (INFs) are invalid for JSON numeric values. If you need to express these fringe elements, you cannot use JSON number. You have to use a string or object structure.
I18N is another reason NOT to use String for decimal numbers
In tens of countries, such as Germany and France, comma (,) is the decimal separator and dot (.) is the thousands separator. See the list on Wikipedia.
If your JSON document carries decimal numbers as string, you're relying on all possible API consumers using the same number format conversion (which is a step after the JSON parsing). There's the risk of incorrect conversion due to inverted use of comma and dot as separators.
If you use number for decimal numbers that risk is averted.
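Here is the trap in concrete form (my own Java example): the same string parsed under a German locale silently means something else, because the dot is read as a thousands separator:
import java.text.NumberFormat;
import java.util.Locale;

public class LocaleTrap {
    public static void main(String[] args) throws Exception {
        String amount = "7.47";
        System.out.println(NumberFormat.getInstance(Locale.US).parse(amount));      // 7.47
        System.out.println(NumberFormat.getInstance(Locale.GERMANY).parse(amount)); // 747
    }
}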

Best column type in mysql for decimal time

I have a timesheet application where people put eg. 3.5 hours worked in a day.
Sometimes someone might put 1.25 I guess.
I stored it as a float but am now having issues when I retrieve the data... Should I have used decimal with 2 or 3 decimal places?
There are a few ways to tackle this.
If you want uniformity in the entries, I'd suggest using a fixed-point type (see the MySQL documentation on DECIMAL, its fixed-point type). This would mean that 3.5 hours would always be saved as 3.50.
The other option is to convert all your values to integer values (see the MySQL documentation on integer types) and, in your application layer, convert the values back to a "readable" format. For example, you would store 1.25 hours as 125, and in your application layer divide by 100, getting 1.25. This method is more useful for currency management/calculation applications where a specific level of precision is necessary.
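A sketch of that application-layer round trip (mine; the method names are made up):
public class HoursCodec {
    // Store 1.25 hours as the integer 125; divide by 100 on the way out.
    static int toStored(double hours) { return (int) Math.round(hours * 100); }
    static double fromStored(int stored) { return stored / 100.0; }

    public static void main(String[] args) {
        int stored = toStored(1.25);
        System.out.println(stored);             // 125
        System.out.println(fromStored(stored)); // 1.25
    }
}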
Floats have a floating decimal point, so they impose no uniform precision on the stored decimal values...
For your purposes, I'd recommend a fixed-point type unless there's complexity that you didn't mention.
Hope that helps.

How do I store decimals with USER-specified precision in MySQL?

I have an application in which users are typing measured amounts. I would like to respect the precision they are entering when storing the values in MySQL, i.e. if they type 0.050 I don't want that to become 0.05, since that loses information about how exact the measurement was. Is there a way other than storing the value as a string?
0.050 is equal to 0.05. If you want 3 digits after the decimal point, you have to implement this feature in the application.
As far as I know, for handling precision we use the FLOAT, DOUBLE, or DECIMAL data types. In your case, if you are not using any function like SUM(), AVG(), etc., then you could use VARCHAR.
Add always a "control" digit, lets say "1", save to database, then get data from database and always trim ONE time the "control" digit "1" from what you've got.
example:
user inputs 0,000050
save as 0,0000501
get the 0,0000501
trim the last "1" (only the last one be carefull)
k.i.s.s. people :D
Edit: the proper solution, of course, is to add a column to store the precision and right-pad zeroes if needed.
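To flesh out that edit (my own sketch, assuming a DECIMAL value column plus an integer precision column): in Java, for instance, BigDecimal carries a scale, so re-applying the stored precision reconstructs exactly what the user typed:
import java.math.BigDecimal;

public class PreservePrecision {
    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("0.05"); // as read back from the DECIMAL column
        int precision = 3;                         // from the extra precision column
        System.out.println(value.setScale(precision)); // 0.050
    }
}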

Text-correlation in MySQL [duplicate]

Suppose I want to match address records (or person names or whatever) against each other to merge records that are most likely referring to the same address. Basically, I guess I would like to calculate some kind of correlation between the text values and merge the records if this value is over a certain threshold.
Example:
"West Lawnmower Drive 54 A" is probably the same as "W. Lawn Mower Dr. 54A" but different from "East Lawnmower Drive 54 A".
How would you approach this problem? Would it be necessary to have some kind of context-based dictionary that knows, in the address case, that "W", "W." and "West" are the same? What about misspellings ("mover" instead of "mower" etc)?
I think this is a tricky one - perhaps there are some well-known algorithms out there?
A good baseline, though probably an impractical one in terms of its relatively high computational cost and, more importantly, its production of many false positives, would be generic string-distance algorithms such as
Edit distance (aka Levenshtein distance)
Ratcliff/Obershelp
Depending on the level of accuracy required (which, BTW, should be specified both in terms of its recall and precision, i.e. generally expressing whether it is more important to miss a correlation than to falsely identify one), a home-grown process based on [some of] the following heuristics and ideas could do the trick:
tokenize the input, i.e. see the input as an array of words rather than a string
tokenization should also keep the line number info
normalize the input with the use of a short dictionary of common substitutions (such as "dr" at the end of a line = "drive", "Jack" = "John", "Bill" = "William"..., "W." at the beginning of a line is "West", etc.)
Identify (a bit like tagging, as in POS tagging) the nature of some entities (for example ZIP Code, Extended ZIP Code, and also city)
Identify (look up) some of these entities (for example a relatively short database table can include all the cities/towns in the targeted area)
Identify (look up) some domain-related entities (if all/many of the addresses deal with, say, folks in the legal profession, a lookup of law firm names or of federal buildings may be of help)
Generally, put more weight on tokens that come from the last line of the address
Put more (or less) weight on tokens with a particular entity type (e.g., "Drive", "Street", "Court" should weigh much less than the tokens which precede them)
Consider a modified SOUNDEX algorithm to help with normalization of similar-sounding words
With the above in mind, implement a rule-based evaluator. Tentatively, the rules could be implemented as visitors to a tree/array-like structure where the input is parsed initially (Visitor design pattern).
The advantage of the rule-based framework, is that each heuristic is in its own function and rules can be prioritized, i.e. placing some rules early in the chain, allowing to abort the evaluation early, with some strong heuristics (eg: different City => Correlation = 0, level of confidence = 95% etc...).
An important consideration with search for correlations is the need to a priori compare every single item (here address) with every other item, hence requiring as many as 1/2 n^2 item-level comparisons. Because of this, it may be useful to store the reference items in a way where they are pre-processed (parsed, normalized...) and also to maybe have a digest/key of sort that can be used as [very rough] indicator of a possible correlation (for example a key made of the 5 digit ZIP-Code followed by the SOUNDEX value of the "primary" name).
I would look at producing a similarity comparison metric that, given two objects (strings perhaps), returns "distance" between them.
If your metric fulfils the following criteria then it helps:
distance between an object and itself is zero (reflexivity)
distance from a to b is the same in both directions (symmetry)
distance from a to c is not more than distance from a to b plus distance from b to c (the triangle inequality)
If your metric obeys these then you can arrange your objects in a metric space, which means you can run queries like:
Which other object is most like this one?
Give me the 5 objects most like this one.
Once you've set up the infrastructure for hosting objects and running the queries, you can simply plug in different comparison algorithms, compare their performance, and then tune them.
I did this for geographic data at university and it was quite fun trying to tune the comparison algorithms.
I'm sure you could come up with something more advanced, but you could start with something simple like reducing the address line to the digits and the first letter of each word, and then comparing the results using a longest common subsequence algorithm.
Hope that helps in some way.
You can use Levenshtein edit distance to find strings that differ by only a few characters. BK Trees can help speed up the matching process.
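For reference, a compact Java version of the textbook dynamic-programming Levenshtein distance (nothing here is specific to this answer):
public class Levenshtein {
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from the empty prefix
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("lawnmower", "lawnmover")); // 1
    }
}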
Disclaimer: I don't know of any algorithm that does that, but I would really be interested in knowing one if it exists. This answer is a naive attempt at solving the problem, with no previous knowledge whatsoever. Comments welcome, please don't laugh too loud.
If you try doing it by hand, I would suggest applying some kind of "normalization" to your strings : lowercase them, remove punctuation, maybe replace common abbreviations with the full words (Dr. => drive, St => street, etc...).
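That normalization step might look something like this (a naive sketch of mine; the abbreviation table is illustrative, not exhaustive):
import java.util.Map;

public class Normalize {
    private static final Map<String, String> ABBREV = Map.of(
            "w", "west", "e", "east", "dr", "drive", "st", "street");

    static String normalize(String address) {
        StringBuilder out = new StringBuilder();
        for (String token : address.toLowerCase().split("\\s+")) {
            token = token.replaceAll("[^a-z0-9]", ""); // strip punctuation
            out.append(ABBREV.getOrDefault(token, token)).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("W. Lawn Mower Dr. 54A")); // west lawn mower drive 54a
    }
}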
Then, you can try different alignments between the two strings you compare, and compute the correlation by averaging the absolute differences between corresponding letters (e.g., a = 1, b = 2, etc., and corr(a, b) = |a - b| = 1):
west lawnmover drive
w lawnmower street
Thus, even if some letters are different, the correlation would be high. Then, simply keep the maximal correlation you found, and decide that they are the same if the correlation is above a given threshold.
When I had to modify a proprietary program doing this, back in the early 90s, it took many thousands of lines of code in multiple modules, built up over years of experience. Modern machine-learning techniques ought to make it easier, and perhaps you don't need to perform as well (it was my employer's bread and butter).
So if you're talking about merging lists of actual mailing addresses, I'd do it by outsourcing if I could.
The USPS had some tests to measure quality of address standardization programs. I don't remember anything about how that worked, but you might check if they still do it -- maybe you can get some good training data.