Perl/MySQL floating point imprecision

I'm using Perl to exchange floating point numbers with a MySQL database. I perform a multiplication in Perl:
$var = 0.001 * 3;
I then store this value in a MySQL database in a column of type DOUBLE. I later retrieve the result, perform a further multiplication and addition on the number, and store it back into the database:
$previous_result_from_db += 0.001*1 + 0.001*0.5;
The result stored in the database should be 0.0045, but instead I get 0.0045000000000000005. I'm trying to understand where the source of the imprecision is. Is it Perl or the database? What is the correct way to handle this kind of floating point interaction to avoid the imprecision?
Thanks!

"10.0 times 0.1 is hardly ever 1.0" -- Brian Kernighan, The Elements of Programming Style
It is a known limitation of FLOAT and DOUBLE that they are imprecise numeric data types; this is inherent in the binary IEEE 754 format, and it affects every programming language that stores floating-point numbers in that format. In your case neither Perl nor MySQL is specially at fault: 0.001 has no exact binary representation, so the arithmetic is already inexact in Perl, and a DOUBLE column uses the very same 64-bit IEEE 754 format.
MySQL documents this in appendix B.5.5.8, Problems with Floating-Point Values.
PHP documents the same behavior in its warning on floating point precision.
If you want a scaled numeric data type in MySQL that avoids this rounding behavior, use DECIMAL.
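To see that the rounding comes from the arithmetic itself and not from the transfer, here is a small sketch in Java rather than the original Perl (both use the same IEEE 754 doubles); the BigDecimal part plays the role DECIMAL plays in MySQL:
import java.math.BigDecimal;

double v = 0.001 * 3;              // already inexact: 0.001 has no exact binary representation
v += 0.001 * 1 + 0.001 * 0.5;
System.out.println(v);             // 0.0045000000000000005, with no database involved

// Doing the same arithmetic in a decimal type keeps the value exact
BigDecimal d = new BigDecimal("0.001").multiply(new BigDecimal("3"))
        .add(new BigDecimal("0.001"))
        .add(new BigDecimal("0.0005"));
System.out.println(d);             // 0.0045
In Perl, an arbitrary-precision module such as Math::BigFloat plays the same role; alternatively, let MySQL do the arithmetic on a DECIMAL column.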

What is the precision of the MySQL RAND() function, and how to generate huge random numbers?

What is the precision of the MySQL RAND() function?
I can't find it on the official page: the MySQL RAND() function is documented to return a floating-point number, but unfortunately its precision is not clearly stated. It could be single-precision floating-point data, double-precision, or some other kind of data.
What I would like to know exactly is: what is the maximum integer range [0, N] in which I can generate random integer numbers with FLOOR(RAND()*N) such that there won't be any "skips" and every number from 0 to N can be generated?
Another thing I would like to know:
How can I generate numbers bigger than N in MySQL?
As stated in the MySQL docs, the precision is system dependent, so there is no single answer to your question.
https://dev.mysql.com/doc/internals/en/floating-point-types.html
Since MySQL uses the machine-dependent binary representation of float and double to store values in the database, we have to care about these. Today, most systems use the IEEE standard 754 for binary floating-point arithmetic. It describes a representation for single precision numbers as 1 bit for sign, 8 bits for biased exponent and 23 bits for fraction and for double precision numbers as 1-bit sign, 11-bit biased exponent and 52-bit fraction. However, we can not rely on the fact that every system uses this representation. Luckily, the ISO C standard requires the standard C library to have a header float.h that describes some details of the floating point representation on a machine. The comment above describes the value DBL_DIG. There is an equivalent value FLT_DIG for the C data type float.
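To make the single- vs. double-precision difference concrete, here is a small illustration in Java rather than SQL (the constant 0.1 is arbitrary): looking at what each type actually holds shows roughly 7 good decimal digits for single precision and roughly 16 for double precision, matching FLT_DIG and DBL_DIG.
System.out.println((double) 0.1f);                  // 0.10000000149011612 (float: about 7 good digits)
System.out.println(new java.math.BigDecimal(0.1));  // 0.1000000000000000055511151231257827021181583404541015625 (double: about 16 good digits)
Separately, a double can represent every integer up to 2^53 exactly but not beyond, so if RAND() returns a double on your platform, FLOOR(RAND()*N) must start skipping values once N exceeds 2^53, regardless of how RAND() is implemented internally.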
In the end, I have no clue why the precision of a random number would be important in any case; I cannot see a use case.

Why would you use a string in JSON to represent a decimal number

Some APIs, like the PayPal API, use a string type in JSON to represent a decimal number. So "7.47" instead of 7.47.
Why/when would this be a good idea over using the JSON number value type? AFAIK the number value type allows for infinite precision as well as scientific notation.
The main reason to transfer numeric values in JSON as strings is to eliminate any loss of precision or ambiguity in transfer.
It's true that the JSON spec does not specify a precision for numeric values. This does not mean that JSON numbers have infinite precision. It means that numeric precision is not specified, which means JSON implementations are free to choose whatever numeric precision is convenient to their implementation or goals. It is this variability that can be a pain if your application has specific precision requirements.
Loss of precision generally isn't apparent in the JSON encoding of the numeric value (1.7 is nice and succinct) but manifests in the JSON parsing and intermediate representations on the receiving end. A JSON parsing function would quite reasonably parse 1.7 into an IEEE double precision floating point number. However, finite length / finite precision decimal representations will always run into numbers whose decimal expansions cannot be represented as a finite sequence of digits:
Irrational numbers (like pi and e)
1.7 has a finite representation in base 10 notation, but in binary (base 2) notation, 1.7 cannot be encoded exactly. Even with a near infinite number of binary digits, you'll only get closer to 1.7, but you'll never get to 1.7 exactly.
So, parsing 1.7 into an in-memory floating point number, then printing out the number will likely return something like 1.69 - not 1.7.
Consumers of the JSON 1.7 value could use more sophisticated techniques to parse and retain the value in memory, such as using a fixed-point data type or a "string int" data type with arbitrary precision, but this will not entirely eliminate the specter of loss of precision in conversion for some numbers. And the reality is, very few JSON parsers bother with such extreme measures, as the benefits for most situations are low and the memory and CPU costs are high.
So if you are wanting to send a precise numeric value to a consumer and you don't want automatic conversion of the value into the typical internal numeric representation, your best bet is to ship the numeric value out as a string and tell the consumer exactly how that string should be processed if and when numeric operations need to be performed on it.
For example: In some JSON producers (JRuby, for one), BigInteger values automatically output to JSON as strings, largely because the range and precision of BigInteger is so much larger than the IEEE double precision float. Reducing the BigInteger value to double in order to output as a JSON numeric will often lose significant digits.
Also, the JSON spec (http://www.json.org/) explicitly states that NaNs and Infinities (INFs) are invalid for JSON numeric values. If you need to express these fringe elements, you cannot use JSON number. You have to use a string or object structure.
Finally, there is another aspect which can lead to choosing to send numeric data as strings: control of display formatting. Leading zeros and trailing zeros are insignificant to the numeric value. If you send JSON number value 2.10 or 004, after conversion to internal numeric form they will be displayed as 2.1 and 4.
If you are sending data that will be directly displayed to the user, you probably want your money figures to line up nicely on the screen, decimal aligned. One way to do that is to make the client responsible for formatting the data for display. Another way to do it is to have the server format the data for display. Simpler for the client to display stuff on screen perhaps, but this can make extracting the numeric value from the string difficult if the client also needs to make computations on the values.
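As a small illustration of the formatting point (Java here, but any consumer that parses into a binary double behaves the same way):
System.out.println(Double.parseDouble("2.10"));        // 2.1  (trailing zero is gone)
System.out.println(Double.parseDouble("004"));         // 4.0  (leading zeros are gone)
System.out.println(new java.math.BigDecimal("2.10"));  // 2.10 (a string or decimal representation keeps it)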
I'll be a bit contrarian and say that 7.47 is perfectly safe in JSON, even for financial amounts, and that "7.47" isn't any safer.
First, let me address some misconceptions from this thread:
So, parsing 1.7 into an in-memory floating point number, then printing out the number will likely return something like 1.69 - not 1.7.
That is not true, especially in the context of the IEEE 754 double precision format mentioned in that answer. 1.7 converts into an exact double 1.6999999999999999555910790149937383830547332763671875, and when that value is "printed" for display, it will always be 1.7, and never 1.69, 1.699999999999 or 1.70000000001. It is 1.7 "exactly".
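You can check this yourself; for instance, in Java (any IEEE 754 double implementation behaves the same way):
double d = Double.parseDouble("1.7");
System.out.println(d);                            // 1.7 (the shortest decimal that round-trips to this double)
System.out.println(new java.math.BigDecimal(d));  // 1.6999999999999999555910790149937383830547332763671875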
7.47 may actually be 7.4699999923423423423 when converted to float
7.47 already is a float, with an exact double value 7.46999999999999975131004248396493494510650634765625. It will not be "converted" to any other float.
a simple system that simply truncates the extra digits off will result in 7.46 and now you've lost a penny somewhere
IEEE rounds, not truncates. And it would not convert to any other number than 7.47 in the first place.
is the JSON number actually a float? As I understand it's a language independent number, and you could parse a JSON number straight into a java BigDecimal or other arbitrary precision format in any language if so inclined.
It is recommended that JSON numbers are interpreted as doubles (IEEE 754 double-precision format). I haven't seen a parser that wouldn't be doing that.
And no, BigDecimal(7.47) is not the right way to do it – it will actually create a BigDecimal representing the exact double of 7.47, which is 7.46999999999999975131004248396493494510650634765625. To get the expected behavior, BigDecimal("7.47") should be used.
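A minimal Java illustration of that difference:
System.out.println(new java.math.BigDecimal(7.47));    // 7.46999999999999975131004248396493494510650634765625
System.out.println(new java.math.BigDecimal("7.47"));  // 7.47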
Overall, I don't see any fundamental issue with {"price": 7.47}. It will be converted into a double on virtually all platforms, and the semantics of IEEE 754 guarantee that it will be "printed" as 7.47 exactly and always.
Of course floating point rounding errors can happen on further calculations with that value, see e.g. 0.1 + 0.2 == 0.30000000000000004, but I don't see how strings in JSON make this better. If "7.47" arrives as a string and should be part of some calculation, it will need to be converted to some numeric data type anyway, probably float :).
It's worth noting that strings also have disadvantages: they cannot be passed to Intl.NumberFormat, and they are not a "pure" data type; the dot, for example, is a formatting decision.
I'm not strongly against strings; they seem fine to me as well, but I don't see anything wrong with {"price": 7.47} either.
The reason I'm doing it is that the SoftwareAG parser tries to "guess" the java type from the value it receives.
So when it receives
"jackpot":{
"growth":200,
"percentage":66.67
}
The first value (growth) will become a java.lang.Long and the second (percentage) will become a java.lang.Double
Now when the second object in this jackpot-array has this
"jackpot":{
"growth":50.50,
"percentage":65
}
I have a problem.
When I exchange these values as Strings, I have complete control and can cast/convert the values to whatever I want.
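For example (a sketch only; the values below just mirror the jackpot snippet above, not the real SoftwareAG API), once the values arrive as strings, both fields can be converted to one predictable type instead of being guessed as Long or Double:
String growth = "50.50";      // would have been guessed as a Double if sent as 50.50
String percentage = "65";     // would have been guessed as a Long if sent as 65
java.math.BigDecimal g = new java.math.BigDecimal(growth);      // 50.50, same type every time
java.math.BigDecimal p = new java.math.BigDecimal(percentage);  // 65
System.out.println(g + " / " + p);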
Summarized Version
Just quoting from #dthorpe's answer, as I think this is the most important point:
Also, the JSON spec (http://www.json.org/) explicitly states that NaNs and Infinities (INFs) are invalid for JSON numeric values. If you need to express these fringe elements, you cannot use JSON number. You have to use a string or object structure.
I18N is another reason NOT to use String for decimal numbers
In dozens of countries, such as Germany and France, the comma (,) is the decimal separator and the dot (.) is the thousands separator. See the list on Wikipedia.
If your JSON document carries decimal numbers as strings, you're relying on all possible API consumers using the same number format conversion (which is a step after the JSON parsing). There's a risk of incorrect conversion due to the inverted use of comma and dot as separators.
If you use a JSON number for decimal values, that risk is averted.
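To illustrate the risk, a Java sketch (the locale and values are just examples; NumberFormat.parse declares ParseException, which is omitted here for brevity):
import java.text.NumberFormat;
import java.util.Locale;

NumberFormat de = NumberFormat.getInstance(Locale.GERMANY);
System.out.println(de.parse("7,47"));   // 7.47 (comma is the German decimal separator)
System.out.println(de.parse("7.47"));   // 747  (the dot is read as a thousands separator)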

How to preserve integer data type when exporting to JSON?

When I export my BigQuery tables in JSON format, the INTEGER fields are converted to strings. Is there any way to maintain the integer data type when exporting?
Here are the minimum steps to reproduce the integer->string conversion phenomenon:
Run the query SELECT INTEGER(1) AS myInt and save the result to a table. Note that the output table schema shows the type as INTEGER.
Export the table in JSON format. The output will be: {"myInt":"1"}
In the JSON format, "1" is a string, not an integer.
This is not currently possible; the reason is an unfortunate combination of the JavaScript spec, IEEE floating point precision, JSON, and BigQuery integer sizes.
In JavaScript, all numbers must be representable as IEEE 754 double-precision floating point values, and JavaScript parses JSON numbers into JavaScript numbers. BigQuery, however, uses 64-bit signed integer values.
The problem comes about because not all 64-bit integer values can be represented as IEEE 754 double-precision floating point values. (It is easy to see why: IEEE 754 double precision uses 64 bits but can represent lots of things that aren't integers; therefore, there must be 64-bit integers that it can't represent.)
So in order to make BigQuery JSON responses work in Javascript, integer values are wrapped in quotes, so that no precision is lost.
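A quick way to see the precision loss (Java here, but the arithmetic is the same IEEE 754 double JavaScript uses):
long big = (1L << 53) + 1;                 // 9007199254740993, well within a 64-bit INTEGER
double viaJson = (double) big;             // what a JavaScript consumer would end up with
System.out.println((long) viaJson);        // 9007199254740992: the value silently changed
System.out.println((long) viaJson == big); // false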
That said ... the decision to represent integers as strings in the API makes sense, since many of the callers of the API will be written in JavaScript. There doesn't seem to be as compelling an argument for not representing integers as numbers when exporting data (other than that changing it now would be a breaking change).
Can you file a bug at the BigQuery issue tracker to fix this? (it would likely involve another flag in the export configuration).

Emulating IBM floating point multiplication/addition in VBA

I am attempting to emulate a (no longer existing) mainframe report generator in an Access 2003 or Access 2010 environment. The data it generates must match the paper reports from the early 70s exactly. Unfortunately, the earliest years' data were run on hardware that used IBM floating point representation instead of IEEE. With the help of Google, I've found a library of VBA functions that will convert a float from decimal to the IEEE 754 32-bit binary format. I had to modify the library to accept either 32-bit or 64-bit floats, so I have a modest working knowledge of floating point formats; however, I'm having trouble making the conversion from IEEE to the IBM binary format, as well as trouble multiplying and adding either the IBM or the IEEE numbers.
I haven't turned up any other libraries for performing this conversion and these arithmetic operations in VBA. Is there an easier way to go about this, or an existing library that I'm not finding? Failing that, is there a clear and straightforward explanation of the relevant algorithms?
Thanks in advance.
To be honest you'd probably do better to start by looking at the Hercules emulator:
http://www.hercules-390.org/
Other than that, in theory with VBA you can use the Decimal type to get good results (note you have to use CDec to create these); it uses a 96-bit (12-byte) integer with a variable power-of-ten scale factor.
A quick Google search turns up this post from the Hercules group, which confirms Albert's point about needing to know the hardware:
---Snip--
In theory, but rather less so in practice. S/360 and S/370 had a
choice of Scientific or Commercial instruction sets. The former added
the FP instructions and registers to the base; the latter the decimal
instructions, including Edit and Edit & Mark. But larger 360 (iirc /65
and up) and 370 (/155 and up) models had the union of the two, called
the Universal instruction set, and at some point the S/370 dropped the
option.
---snip---
I have to say that, having looked at the Hercules source code, you'll probably need to figure out exactly which floating point operation codes (in terms of precision: single, long, extended) are being performed.
The problem here is that you're confusing the Decimal type in Access with the Single and Double floating point types available in Access.
If you use the Currency data type in Access, this is a scaled integer and will not produce rounding errors (that is what most of us use for financial calculations and reports). You can also use Decimal values in Access, and again they don't round at all, as they are packed decimals.
However, both the Single and Double values available inside of Access are in fact the same format and conform to the IEEE 754 floating point standard.
For an Access Single variable, this is a 32-bit number, and the range is:
-3.402823E38 to -1.401298E-45 for negative values
1.401298E-45 to 3.402823E38 for positive values
That looks to be the same to me as the IEEE 754 standard.
So, if you add up values in Access as a Single, you should get the same rounding results.
So, on Intel-based hardware, Access Singles and Doubles are, I believe, the same as this IEEE standard.
The only real issue here is: what is the format of the original data you're pulling into Access, and what kind of text, string, or conversion process occurs when that data is pulled in and stored?
Access can convert numbers. Try typing these values at the Access command line prompt (debug window):
? hex(255)
Above will show FF
? csng(&hFF)
Above will show 255
Edit:
Ah, ok, I see now I have this reversed; my mistake here. The problem is that, even assuming you convert a number to the older IBM format (excess-64 exponent), you will THEN have to get your hands on the code they used for adding those numbers. In fact, even back then, different IBM models, depending on what you purchased, actually produced different results (more money = more precision).
So, not only do you need conversion routines to convert to the internal representation, you THEN need the routines that add/subtract/multiply those numbers. So, just having conversion routines is not going to get you very far, since you also have to duplicate their exact math routines. Those routines were likely not all created equal in terms of how they round numbers, etc.
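For what it's worth, the representation half of the job is fairly compact. Here is a rough sketch (in Java, not VBA, and for the single-precision 32-bit case only) of decoding an IBM System/360 hexadecimal float: 1 sign bit, a 7-bit excess-64 exponent that is a power of 16, and a 24-bit fraction. It says nothing about reproducing the model-specific add/multiply behavior discussed above.
// Decode an IBM S/360 32-bit hex float: value = sign * (fraction / 2^24) * 16^(exponent - 64)
double ibm32ToDouble(int bits) {
    int sign = (bits >>> 31) == 0 ? 1 : -1;
    int exponent = (bits >>> 24) & 0x7F;   // excess-64, base 16
    long fraction = bits & 0x00FFFFFF;     // 24-bit fraction, interpreted as a value in [0, 1)
    return sign * (fraction / (double) (1 << 24)) * Math.pow(16, exponent - 64);
}

System.out.println(ibm32ToDouble(0x41100000));  // 1.0
System.out.println(ibm32ToDouble(0xC2640000));  // -100.0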

MySQL: converting a float to decimal produces more decimal digits than were stored in the backup .sql file

I want to understand this:
I have a dump of a table (a SQL script file) from a database that uses FLOAT(9,2) as the default type for numbers.
In the backup file I have a value like '4172.08'.
I restore this file into a new database and convert the float to DECIMAL(20,5).
Now the value in the field is 4172.08008
...where does the 008 come from??
Thanks to all!
Where does the 008 come from??
Short answer:
In order to avoid the float's inherent precision error, cast first to DECIMAL(9,2), then to DECIMAL(20,5).
Long answer:
Floating point numbers are prone to rounding errors in digital computers. It is a little hard to explain without throwing up a lot of math, but let's try: the same way 1/3 represented in decimal requires an infinite number of digits (it is 0.3333333...), some numbers that are "round" in decimal notation have an infinite number of digits in binary. Because this format is stored in binary with finite precision, there is an implicit rounding error, and you may experience funny things like getting 0.30000000000000004 as the result of 0.1 + 0.2.
This is the difference between FLOAT and DECIMAL. FLOAT is a binary type and can't represent that value exactly, so when you convert it to DECIMAL (which is, as expected, a decimal type), the result is not exactly the original value.
See http://floating-point-gui.de/ for some more information.
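To see the same effect outside MySQL, here is a small Java sketch (float stands in for the FLOAT(9,2) column, BigDecimal for DECIMAL):
float stored = 4172.08f;                               // the nearest binary float to 4172.08
System.out.println(stored);                            // 4172.08 (shortest decimal that round-trips as a float)
System.out.println(new java.math.BigDecimal(stored));  // 4172.080078125, the value the float really holds
// Rounded to 5 decimal places this is 4172.08008, exactly what the DECIMAL(20,5) column shows.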