What is the relationship between a DECIMAL(m,n) column specification and the actual representation of that column in a 64-bit MySQL implementation?
I'm defining tables in a context where I know I need an exact value (hence DECIMAL) and I don't know how sensitive I am to truncation errors in the decimal portion. I'd therefore like to choose a column specification that makes reasonable use of the underlying storage (I know it's a 64-bit system).
I haven't yet found an answer in the MySQL documentation despite a reasonable search.
It doesn't matter whether you're using a 64-bit MySQL build; DECIMAL is not stored as a native machine integer, and it supports precision greater than 64 bits can represent.
DECIMAL uses 4 bytes for each group of 9 digits, plus 1–4 extra bytes for any leftover digits. For example, DECIMAL(32,0) supports up to 9+9+9+5 digits. It uses 4+4+4 bytes for the first 27 digits, then 3 more bytes for the remaining 5 digits: a total of 12+3 = 15 bytes.
The fractional part of the value (after the decimal point) stores digits the same way, but is packed separately. So DECIMAL(32,9) supports up to 9+9+5 = 23 digits for the integer portion and another 9 digits for the fractional portion: 4+4+3 bytes for the integer part plus 4 bytes for the fractional part, again 15 bytes in total.
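To make the byte counting concrete, here is a minimal C++ sketch of the scheme just described. The leftover-digit byte table is the one given in the MySQL documentation (1–2 digits need 1 byte, 3–4 need 2, 5–6 need 3, 7–9 need 4); the function names are only for illustration, not MySQL's actual code.

#include <cstdio>

// Bytes needed for a run of decimal digits: 4 bytes per full group
// of 9 digits, plus 1-4 bytes for the leftover digits.
static int bytes_for_digits(int digits) {
    static const int leftover[9] = {0, 1, 1, 2, 2, 3, 3, 4, 4};
    return (digits / 9) * 4 + leftover[digits % 9];
}

// Storage for DECIMAL(m, n): the integer part (m - n digits) and the
// fractional part (n digits) are packed independently.
static int decimal_storage_bytes(int m, int n) {
    return bytes_for_digits(m - n) + bytes_for_digits(n);
}

int main() {
    printf("DECIMAL(32,0): %d bytes\n", decimal_storage_bytes(32, 0)); // 15
    printf("DECIMAL(32,9): %d bytes\n", decimal_storage_bytes(32, 9)); // 15
    printf("DECIMAL(13,4): %d bytes\n", decimal_storage_bytes(13, 4)); // 6
}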
There's a more detailed description, with examples down to the byte, in the code comments for the decimal2bin() function here:
https://github.com/mysql/mysql-server/blob/8.0/strings/decimal.cc#L1282-L1343
What is the precision of the MySQL RAND() function?
I can't find it on the official page: the MySQL RAND() function is documented to return a floating-point number, but unfortunately its precision is not clearly stated. It could be single-precision floating-point data, double-precision, or some other kind of data.
What I would like to know exactly is: what is the maximum integer range [0, N] in which I can generate random integers with FLOOR(RAND()*N) such that there won't be any "skips" and every number from 0 to N can be generated?
Another thing I would like to know:
How can I generate numbers bigger than N in MySQL?
As written in the MySQL docs, the precision is system dependent, so there is no single answer to your question.
https://dev.mysql.com/doc/internals/en/floating-point-types.html
Since MySQL uses the machine-dependent binary representation of float and double to store values in the database, we have to care about these. Today, most systems use the IEEE standard 754 for binary floating-point arithmetic. It describes a representation for single precision numbers as 1 bit for sign, 8 bits for biased exponent and 23 bits for fraction and for double precision numbers as 1-bit sign, 11-bit biased exponent and 52-bit fraction. However, we can not rely on the fact that every system uses this representation. Luckily, the ISO C standard requires the standard C library to have a header float.h that describes some details of the floating point representation on a machine. The comment above describes the value DBL_DIG. There is an equivalent value FLT_DIG for the C data type float.
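You can check these limits on your own machine. Here is a quick C++ look at the float.h constants mentioned above; on an IEEE 754 system this prints 6 and 15.

#include <cfloat>    // FLT_DIG and DBL_DIG, from the C float.h header
#include <iostream>

int main() {
    // Number of decimal digits that survive a round trip through the
    // machine's float and double types.
    std::cout << "FLT_DIG = " << FLT_DIG << '\n'
              << "DBL_DIG = " << DBL_DIG << '\n';
}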
In the end, I have no clue why the precision of a random number would be important in any case; I cannot see a use case.
This is surely a duplicate, but I was not able to find an answer to the following question.
Let's consider the decimal integer 14. We can obtain its binary representation, 1110, using e.g. the divide-by-2 method (% represents the modulo operator):
14 % 2 = 0
7 % 2 = 1
3 % 2 = 1
1 % 2 = 1
but how do computers convert decimal integers to binary?
The above method would require the computer to perform arithmetic, and, as far as I understand, since arithmetic is performed on binary numbers, it seems we would be back dealing with the same issue.
I suppose any other algorithmic method would suffer from the same problem. How do computers convert decimal to binary integers?
Update: Following a discussion with Code-Apprentice (see comments under his answer), here is a reformulation of the question in two cases of interest:
a) How is the conversion to binary performed when the user types an integer on the keyboard?
b) Given a mathematical operation in a programming language, say 12 / 3, how is the conversion from decimal to binary done when the program runs, so that the computer can do the arithmetic?
There is only binary
The computer stores all data as binary. It does not convert from decimal to binary, because binary is its native language. When the computer displays a number, it converts from the binary representation to whatever base is requested, which by default is decimal.
A key concept to understand here is the difference between the computer's internal storage and the representation as characters on your monitor. If you want to display a number as binary, you can write an algorithm in code to do the exact steps that you performed by hand, and then print out the characters 1 and 0 as calculated by the algorithm.
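For example, here is a minimal C++ sketch of such an algorithm. It renders a stored (binary) value as the characters 1 and 0 using the same divide-by-2 steps shown in the question; the machine only ever operates on the binary value itself.

#include <iostream>
#include <string>

// Build the '0'/'1' character string for n by repeated division by 2.
std::string to_binary(unsigned n) {
    if (n == 0) return "0";
    std::string bits;
    while (n > 0) {
        bits.insert(bits.begin(), char('0' + n % 2));  // remainder digit
        n /= 2;
    }
    return bits;
}

int main() {
    std::cout << to_binary(14) << '\n';  // prints 1110
}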
Indeed, as you mention in one of your comments, if the compiler has a small look-up table associating decimal digits with binary values, the conversion can be done with simple binary multiplications and additions.
The look-up table has to contain binary values for the single decimal digits and for decimal ten, hundred, thousand, and so on.
Decimal 14 can then be converted to binary by multiplying the binary value for 1 by the binary value for ten and adding the binary value for 4.
Decimal 149 would be the binary value for 1 multiplied by the binary value for one hundred, plus the binary value for 4 multiplied by the binary value for ten, plus the binary value for 9.
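Here is a minimal C++ sketch of that scheme. Real conversion routines usually fold the table into a running value * 10 + digit loop, but the idea is the same; the names here are just for illustration.

#include <iostream>

// The binary values for the digits come from the character codes
// ('4' - '0' == 4), and a table holds the binary values for
// one, ten, hundred, ... (up to 5 digits in this sketch).
int parse_decimal(const char *s, int len) {
    static const int powers[] = {1, 10, 100, 1000, 10000};
    int value = 0;
    for (int i = 0; i < len; i++) {
        int digit = s[i] - '0';                // binary value of one digit
        value += digit * powers[len - 1 - i];  // binary multiply and add
    }
    return value;
}

int main() {
    std::cout << parse_decimal("149", 3) << '\n';  // prints 149
}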
Decimals are misunderstood in a program
Let's take an example from the C language:
int x = 14;
Here 14 is not a decimal value; it is the two characters 1 and 4 written next to each other as 14.
We know that characters are just representations of binary values:
1 is 00110001
4 is 00110100
(see any full ASCII table for the codes of the remaining characters)
So 14 in character form is actually stored as the binary 00110001 00110100.
00110001 00110100 => this binary is made to look like 14 on the computer screen (so we think of it as decimal).
We know the number 14 eventually has to become 14 = 1110,
or, padded with zeros,
14 = 00001110
For this to happen, the computer/processor only needs to do a binary-to-binary conversion, i.e.
00110001 00110100 to 00001110
and we are all set.
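To see the two representations side by side, here is a small C++ illustration; it prints the character codes of the string "14" and then the numeric value they convert to.

#include <cstdio>

int main() {
    const char *text = "14";  // the two characters 0x31 and 0x34
    printf("bytes: %02X %02X\n",
           (unsigned char)text[0], (unsigned char)text[1]);  // 31 34

    // Binary-to-binary conversion: character codes in, number out.
    int value = (text[0] - '0') * 10 + (text[1] - '0');
    printf("value: %d\n", value);  // 14, i.e. 00001110 in binary
}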
I've come across two different precision formulas for floating-point numbers.
⌊(N-1) log10(2)⌋ = 6 decimal digits (Single-precision)
and
N log10(2) ≈ 7.225 decimal digits (Single-precision)
Where N = 24 Significant bits (Single-precision)
The first formula is found at the top of page 4 of "IEEE Standard 754 for Binary Floating-Point Arithmetic" written by, Professor W. Kahan.
The second formula is found on the Wikipedia article "Single-precision floating-point format" under section IEEE 754 single-precision binary floating-point format: binary32.
For the first formula, Professor W. Kahan says
If a decimal string with at most 6 sig. dec. is converted to Single and then converted back to the same number of sig. dec.,
then the final string should match the original.
For the second formula, Wikipedia says
...the total precision is 24 bits (equivalent to log10(2^24) ≈ 7.225 decimal digits).
The results of the two formulas (6 and 7.225 decimal digits) differ, and I expected them to be the same, because I assumed they were both meant to represent the most significant decimal digits that can be converted to floating-point binary and then converted back to decimal with the same number of significant decimal digits they started with.
Why do these two numbers differ, and what is the most significant decimal digits precision that can be converted to binary and back to decimal without loss of significance?
These are talking about two slightly different things.
The 7.225 digits[1] is the precision with which a number can be stored internally. For one example, if you did a computation with a double-precision number (so you were starting with something like 15 digits of precision), then rounded it to a single-precision number, the precision you'd have left at that point would be approximately 7 digits.
The 6 digits is talking about the precision that can be maintained through a round-trip conversion from a string of decimal digits, into a floating point number, then back to another string of decimal digits.
So, let's assume I start with a number like 1.23456789 as a string, then convert that to a float32, then convert the result back to a string. When I've done this, I can expect 6 digits to match exactly. The seventh digit might be rounded though, so I can't necessarily expect it to match (though it will probably be within ±1 of the original string).
For example, consider the following code:
#include <iostream>
#include <iomanip>

int main() {
    double init = 987.23456789;
    for (int i = 0; i < 100; i++) {
        // Round the double down to single precision, then print 10
        // significant digits of the result.
        float f = init + i / 100.0;
        std::cout << std::setprecision(10) << std::setw(20) << f;
        if (i % 4 == 3)
            std::cout << '\n';  // four columns per row
    }
}
This produces a table like the following:
987.2345581 987.2445679 987.2545776 987.2645874
987.2745972 987.2845459 987.2945557 987.3045654
987.3145752 987.324585 987.3345947 987.3445435
987.3545532 987.364563 987.3745728 987.3845825
987.3945923 987.404541 987.4145508 987.4245605
987.4345703 987.4445801 987.4545898 987.4645386
987.4745483 987.4845581 987.4945679 987.5045776
987.5145874 987.5245972 987.5345459 987.5445557
987.5545654 987.5645752 987.574585 987.5845947
987.5945435 987.6045532 987.614563 987.6245728
987.6345825 987.6445923 987.654541 987.6645508
987.6745605 987.6845703 987.6945801 987.7045898
987.7145386 987.7245483 987.7345581 987.7445679
987.7545776 987.7645874 987.7745972 987.7845459
987.7945557 987.8045654 987.8145752 987.824585
987.8345947 987.8445435 987.8545532 987.864563
987.8745728 987.8845825 987.8945923 987.904541
987.9145508 987.9245605 987.9345703 987.9445801
987.9545898 987.9645386 987.9745483 987.9845581
987.9945679 988.0045776 988.0145874 988.0245972
988.0345459 988.0445557 988.0545654 988.0645752
988.074585 988.0845947 988.0945435 988.1045532
988.114563 988.1245728 988.1345825 988.1445923
988.154541 988.1645508 988.1745605 988.1845703
988.1945801 988.2045898 988.2145386 988.2245483
If we look through this, we can see that the first six significant digits always follow the pattern precisely (i.e., each result is exactly 0.01 greater than its predecessor). As we can see in the original double, the value is actually 98x.xx456, but when we convert the single-precision float to decimal, the 7th digit frequently would not be read back in correctly. Since the subsequent digit is greater than 5, it should round up to 98x.xx46, but some of the values won't (e.g., the second-to-last item in the first column is 988.154541, which rounds down instead of up, so we'd end up with 98x.xx45 instead of 46). So, even though the value as stored is precise to 7 digits (plus a little), by the time we round-trip the value through a conversion to decimal and back, we can't depend on that seventh digit matching precisely anymore (even though there's enough precision that it will more often than not).
[1] That basically means 7 digits, and the 8th digit will be a little more accurate than nothing, but not a whole lot. For example, if we were converting from a double of 1.2345678, the .225 digits of precision mean that the last digit would be within about ±0.775 of what started out there (whereas without the .225 digits of precision, it would be basically ±1 of what started out there).
what is the most significant decimal digits precision that can be converted to binary and back to decimal without loss of significance?
The most significant decimal digits precision that can be converted to binary and back to decimal without loss of significance (for single-precision floating-point numbers or 24-bits) is 6 decimal digits.
Why do these two numbers differ...
The numbers 6 and 7.225 differ, because they define two different things. 6 is the most decimal digits that can be round-tripped. 7.225 is the approximate number of decimal digits precision for a 24-bit binary integer because a 24-bit binary integer can have 7 or 8 decimal digits depending on its specific value.
7.225 was found using the specific binary integer formula.
dspec = b · log10(2)   (dspec = specific decimal digits, b = bits)
However, what you normally need to know, are the minimum and maximum decimal digits for a b-bit integer. The following formulas are used to find the min and max decimal digits (7 and 8 respectively for 24-bits) of a specific binary integer.
dmin = ⌈(b-1) · log10(2)⌉   (dmin = min decimal digits, b = bits, ⌈x⌉ = smallest integer ≥ x)
dmax = ⌈b · log10(2)⌉   (dmax = max decimal digits, b = bits, ⌈x⌉ = smallest integer ≥ x)
To learn more about how these formulas are derived, read Number of Decimal Digits In a Binary Integer, written by Rick Regan.
This is all well and good, but you may ask, why is 6 the most decimal digits for a round-trip conversion if you say that the span of decimal digits for a 24-bit number is 7 to 8?
The answer is — because the above formulas only work for integers and not floating-point numbers!
Every decimal integer has an exact value in binary. However, the same cannot be said for every decimal floating-point number. Take .1 for example. .1 in binary is the number 0.000110011001100..., which is a repeating or recurring binary. This can produce rounding error.
Moreover, it takes one more bit to represent a decimal floating-point number than it does to represent a decimal integer of equal significance. This is because floating-point numbers are more precise the closer they are to 0, and less precise the further they are from 0. Because of this, many floating-point numbers near the minimum and maximum value ranges (emin = -126 and emax = +127 for single-precision) lose 1 bit of precision due to rounding error. To see this visually, look at What every computer programmer should know about floating point, part 1, written by Josh Haberman.
Furthermore, there are at least 784,757 positive seven-digit decimal numbers that cannot retain their original value after a round-trip conversion. An example of such a number that cannot survive the round-trip is 8.589973e9. This is the smallest positive number that does not retain its original value.
Here's the formula that you should be using for floating-point number precision that will give you 6 decimal digits for round-trip conversion.
dmax = ⌊(b-1) · log10(2)⌋   (dmax = max decimal digits, b = bits, ⌊x⌋ = largest integer ≤ x)
To learn more about how this formula is derived, read Number of Digits Required For Round-Trip Conversions, also written by Rick Regan. Rick does an excellent job showing the formula's derivation, with references to rigorous proofs.
As a result, you can utilize the above formulas in a constructive way; if you understand how they work, you can apply them to any programming language that uses floating-point data types. All you have to know is the number of significant bits that your floating-point data type has, and you can find their respective number of decimal digits that you can count on to have no loss of significance after a round-trip conversion.
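If you want to check the formulas yourself, here is a small C++ sketch that evaluates them for the single- and double-precision significand widths (24 and 53 bits); it prints dmin=7, dmax=8, round-trip=6 for 24 bits and dmin=16, dmax=16, round-trip=15 for 53 bits.

#include <cmath>
#include <cstdio>

int main() {
    // For each significand width b, compute the digit counts from the
    // formulas above: min/max digits of a b-bit integer, and the
    // round-trip digit count floor((b-1) * log10(2)).
    const int widths[] = {24, 53};  // single and double precision
    const double log10_2 = std::log10(2.0);
    for (int b : widths) {
        int dmin = (int)std::ceil((b - 1) * log10_2);
        int dmax = (int)std::ceil(b * log10_2);
        int roundtrip = (int)std::floor((b - 1) * log10_2);
        printf("b=%2d: dmin=%d dmax=%d round-trip=%d\n",
               b, dmin, dmax, roundtrip);
    }
}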
June 18, 2017 Update: I want to include a link to Rick Regan's new article which goes into more detail and in my opinion better answers this question than any answer provided here. His article is "Decimal Precision of Binary Floating-Point Numbers" and can be found on his website www.exploringbinary.com.
Do keep in mind that they are the exact same formulas. Remember your high-school math book identity:
Log(x^y) == y * Log(x)
It helps to actually calculate the values for N = 24 with your calculator:
Kahan's: 23 * Log(2) = 6.924
Wikipedia's: Log(2^24) = 7.225
Kahan was forced to truncate 6.924 down to 6 digits because of floor(), bummer. The only actual difference is that Kahan used 1 less bit of precision.
It's hard to guess why; the professor might have relied on old notes, written before IEEE-754, that didn't take into account that the 24th bit of precision is free. The format uses a trick: the most significant bit of a floating-point value that isn't 0 is always 1, so it doesn't need to be stored. The processor adds it back before it performs a calculation, turning 23 bits of stored precision into 24 bits of effective precision.
Or he took into account that the conversion from a decimal string to a binary floating-point value itself generates an error. Many nice round decimal values, like 0.1, cannot be perfectly converted to binary; their binary expansions have an endless number of digits, just like 1/3 in decimal. Simple rounding produces a result that is off by ±0.5 bits, so the result is accurate to 23.5 * Log(2) = 7.074 decimal digits. If he assumed that the conversion routine is clumsy and doesn't properly round, then the result can be off by ±1 bit and N-1 is appropriate. They are not clumsy.
Or he thought like a typical scientist or (heaven forbid) accountant and wanted the result of a calculation converted back to decimal as well, such as you'd get when you trivially look for a 7-digit decimal number whose back-and-forth conversion does not produce the same number. Yes, that adds another ±0.5 bit of error, summing up to 1 bit of error total.
But never, never make that mistake: you always have to include any errors you get from manipulating the number in a calculation. Some of them lose significant digits very quickly; subtraction in particular is very dangerous.
If I wanted to represent -2455.1152 as 32 bits, I know the first bit is 1 (the negative sign), and I can get the 2455 to binary as 10010010111, but I'm not too sure about the fractional part. .1152 could have an infinite number of fractional digits. Would that mean that only up to 23 bits are used to represent the fractional part? So, since 2455 uses 11 bits, are bits 11 to 0 for the fractional part?
For the binary representation I have 10010010111.00011101001. The exponent is 10; 10 + 127 = 137, and 137 in binary is 10001001.
The full representation would be:
1 10001001 1001001011100011101001
Is that right?
It looks like you are trying to devise your own floating-point representation, but you used a fixed-point tag, so I will explain how to convert your real number to a traditional fixed-point representation. First, you need to decide how many bits will be used to represent the fractional part of the number. Just for the sake of discussion, let's say 16 bits are used for the fractional part, 15 bits for the integer part, and one bit is reserved for the sign. Now, multiply the absolute value of the real number by 2^16: 2455.1152 * 65536 = 160898429.747. You can either round to the nearest integer or just truncate. Suppose we truncate to 160898429. Converting this to hexadecimal gives 0x09971D7D. To make it negative, invert and add 1 to the LSB; the final result is 0xF668E283.
To convert back to a real number, just reverse the process: take the absolute value of the fixed-point representation and divide by 2^16. In this case we find that the fixed-point representation is equal to the real number -2455.1151886. The accuracy can be improved by rounding instead of truncating when converting from real to fixed-point, or by allowing more bits for the fractional part.
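Here is a minimal C++ sketch of exactly these steps, assuming 16 fractional bits and truncation as described above.

#include <cstdint>
#include <cstdio>
#include <cmath>

int main() {
    const double real = -2455.1152;
    const int frac_bits = 16;  // 16 fractional bits, as in the text

    // Scale the absolute value by 2^16 and truncate.
    uint32_t magnitude = (uint32_t)(std::fabs(real) * (1 << frac_bits));
    printf("magnitude: 0x%08X\n", magnitude);  // 0x09971D7D

    // Negate via two's complement (invert and add one).
    uint32_t fixed = ~magnitude + 1;
    printf("fixed:     0x%08X\n", fixed);      // 0xF668E283

    // Convert back: reinterpret as signed and divide by 2^16.
    double back = (double)(int32_t)fixed / (1 << frac_bits);
    printf("back:      %.7f\n", back);         // -2455.1151886
}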
I want to store many records in a MySQL database. All of them contain money values.
Which data type do I have to use for this purpose?
VARCHAR or INT (or other numeric data types)?
Since money needs an exact representation, don't use data types that are only approximate, like FLOAT. You can use a fixed-point numeric data type such as
decimal(15,2)
15 is the precision (the total number of digits, including the decimal places)
2 is the scale (the number of digits after the decimal point)
See MySQL Numeric Types:
These types are used when it is important to preserve exact precision, for example with monetary data.
You can use DECIMAL or NUMERIC; in MySQL both are the same:
The DECIMAL and NUMERIC types store exact numeric data values. These types are used when it is important to preserve exact precision, for example with monetary data. In MySQL, NUMERIC is implemented as DECIMAL, so the following remarks about DECIMAL apply equally to NUMERIC. (from the MySQL manual)
e.g. DECIMAL(10,2)
I prefer to use BIGINT and store the values multiplied by 100, so that they become integers.
For example, to represent a currency value of 93.49, store the value 9349; when displaying, divide by 100. This occupies less storage space.
Caution:
We mostly don't perform currency * currency multiplication, but if we do, the result must be divided by 100 before storing, so that it returns to the proper scale.
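As an illustration of that scheme, here is a minimal C++ sketch using an int64_t count of cents. The 8% tax rate is just an example, and the integer division simply truncates; a real application would also pick an explicit rounding rule.

#include <cinttypes>
#include <cstdio>

int main() {
    // Store currency as an integer count of cents (value * 100).
    int64_t price_cents = 9349;                 // represents 93.49
    int64_t tax_cents = price_cents * 8 / 100;  // 8% tax in integer math

    int64_t total = price_cents + tax_cents;
    // Divide by 100 only when displaying.
    printf("total: %" PRId64 ".%02" PRId64 "\n", total / 100, total % 100);
}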
It depends on your need.
Using DECIMAL(10,2) is usually enough, but if you need slightly more precision in the fraction you can use DECIMAL(10,4).
If you work with big values, replace 10 with 19.
If your application needs to handle money values up to 99,999,999,999.99, then 13,2 should work.
If you need to comply with GAAP (Generally Accepted Accounting Principles), then use 13,4.
Usually you should sum your money values at 13,4 before rounding the output to 13,2.
At the time this question was asked, nobody thought about the Bitcoin price. In the case of BTC, DECIMAL(15,2) is probably insufficient. If Bitcoin rises to $100,000 or more, we will need at least DECIMAL(18,9) to support cryptocurrencies in our apps.
DECIMAL(18,9) takes 8 bytes of space in MySQL (4 bytes per 9 digits).
We use double.
*gasp*
Why?
Because it can represent any 15 digit number with no constraints on where the decimal point is. All for a measly 8 bytes!
So it can represent:
0.123456789012345
123456789012345.0
...and anything in between.
This is useful because we're dealing with global currencies, and double can store the various numbers of decimal places we'll likely encounter.
A single double field can represent up to 999,999,999,999,999 in Japanese yen, 9,999,999,999,999.99 in US dollars, and even 9,999,999.99999999 in bitcoin.
If you try doing the same with DECIMAL, you need DECIMAL(30,15), which costs 14 bytes.
Caveats
Of course, using double isn't without caveats.
However, it's not a loss of accuracy, as some tend to point out. Even though double itself may not be internally exact in base 10, we can make it exact by rounding the value we pull from the database to its significant decimal places, if needed (e.g., if it's going to be output and a base-10 representation is required).
The caveats are that any time we perform arithmetic with it, we need to normalize the result (by rounding it to its significant decimal places, as sketched below) before:
Performing comparisons on it.
Writing it back to the database.
Another kind of caveat is that, unlike DECIMAL(m, d), where the database will prevent programs from inserting a number with more than m digits, no such validation exists with double. A program could insert a user-entered value of 20 digits, and it would end up being silently recorded as an inaccurate amount.
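A minimal C++ sketch of that normalization step, using the classic 0.1 + 0.2 example (rounding to 2 decimal places here; the number of places is currency-specific):

#include <cmath>
#include <cstdio>

// Normalize a monetary double to 2 decimal places before comparing
// it or writing it back to the database.
double normalize_cents(double amount) {
    return std::round(amount * 100.0) / 100.0;
}

int main() {
    double a = 0.1 + 0.2;  // 0.30000000000000004
    printf("raw:        %.17f\n", a);
    printf("normalized: %.2f\n", normalize_cents(a));
    printf("equal to 0.3? %s\n",
           normalize_cents(a) == normalize_cents(0.3) ? "yes" : "no");
}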
If GAAP Compliance is required or you need 4 decimal places:
DECIMAL(13, 4)
Which supports a max value of:
$999,999,999.9999
Otherwise, if 2 decimal places is enough:
DECIMAL(13,2)
src: https://rietta.com/blog/best-data-types-for-currencymoney-in/
Indeed, this comes down to the programmer's preferences. I personally use numeric(15,4) to conform to the Generally Accepted Accounting Principles (GAAP).
Try using
DECIMAL(19,4)
which usually works with other databases as well.
Storing money as BIGINT multiplied by 100 or more in order to use less storage space makes no sense in all "normal" situations.
To stay aligned with GAAP, it is sufficient to store currencies in DECIMAL(13,4).
The MySQL manual says that DECIMAL needs 4 bytes per 9 digits:
https://dev.mysql.com/doc/refman/8.0/en/precision-math-decimal-characteristics.html
DECIMAL(13,4) holds 9 integer digits + 4 fraction digits (decimal places) => 4 + 2 bytes = 6 bytes,
compared to the 8 bytes required to store BIGINT.
There are 2 valid options:
use integer amount of currency minor units (e.g. cents)
represent amount as decimal value of the currency
In both cases you should use a decimal data type in order to have enough significant digits. The difference can be in precision:
even for integer amounts of minor units, it's better to have extra precision for accumulators (consider accumulating 10% fees from 1-cent operations)
different currencies have different numbers of decimals; cryptocurrencies have up to 18 decimals
the number of decimals can change over time due to inflation
A code sketch of the first option follows below.
Source and more caveats and facts.
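As a sketch of the first option above, here is a minimal C++ struct holding an integer count of minor units plus a per-currency decimal exponent. The type and field names are illustrative, not a standard API, and the formatting assumes positive amounts with at least one decimal place.

#include <cinttypes>
#include <cstdio>

// An integer count of minor units plus the currency's decimal exponent.
struct Money {
    int64_t minor_units;  // e.g. cents, satoshis, ...
    int     exponent;     // decimals: 2 for USD, 8 for BTC, ...
};

void print_money(Money m) {
    int64_t scale = 1;
    for (int i = 0; i < m.exponent; i++) scale *= 10;
    printf("%" PRId64 ".%0*" PRId64 "\n",
           m.minor_units / scale, m.exponent, m.minor_units % scale);
}

int main() {
    print_money({9349, 2});       // 93.49 (USD cents)
    print_money({150000000, 8});  // 1.50000000 (BTC satoshis)
}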
Multiply by 10,000 and store as BIGINT, like the Currency type in Visual Basic and Office. See https://msdn.microsoft.com/en-us/library/office/gg264338.aspx