How does CHAR(n) represent its data in binary? - mysql

I am currently reading Robin Nixon's Learning PHP, MySQL, & JavaScript book, in particular the chapters on databases using MySQL. Here is the passage I am struggling to understand thoroughly.
"The year field, however, has more-predictable values, so instead of VARCHAR we use the more efficient CHAR(4) data type. The parameter of 4 allows for 4 bytes of data, supporting all years from –999 to 9999; a byte comprises 8 bits and can have the values 00000000 through 11111111, which are 0 to 255 in decimal."
In the example above, CHAR(4) is set to store 4 characters that represent a four-digit year. Why are -999 to 9999 the only options? How does the binary representation limit it to just those values? My original thought was that since 4 bytes is 32 bits, and 32 bits can hold 2^32 values (about 4 billion unsigned, or roughly ±2 billion signed), that would be the range that could be stored.
I appreciate any explanations.

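The book you quoted suggests using CHAR to store year values. CHAR does not store the number in binary; it stores the digit characters, one byte per character in a single-byte character set. With four characters available, the largest value is 9999, and the most negative is -999 because the minus sign itself takes one of the four characters. You are correct that four bytes could store a much greater range if they held a binary value of the number.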
It's surprising that the book worries about the single extra byte needed to encode the length of a VARCHAR, yet uses CHAR to store numeric year values in such an inefficient way.
It's an example of "penny wise, pound foolish."
Take this book (or any book) with a grain of salt. Remember that tech books are written by people, and they can make mistakes.
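A minimal Python sketch (illustrating the encodings, not MySQL's internals) of the difference described above. It assumes a single-byte character set such as latin1 for the CHAR(4) column, so each digit or minus sign costs one byte; the helper names `char4_bytes` and `smallint_bytes` are hypothetical.

```python
import struct


def char4_bytes(year: int) -> bytes:
    """Encode a year the way a CHAR(4) column holds it: as text, one byte per
    character (assumes a single-byte character set such as latin1)."""
    text = str(year)
    if len(text) > 4:
        raise ValueError(f"{text!r} needs {len(text)} characters; CHAR(4) holds 4")
    return text.ljust(4).encode("ascii")  # short values are space-padded


def smallint_bytes(year: int) -> bytes:
    """Encode the same year as a 2-byte signed binary integer (like SMALLINT).
    Byte order is big-endian here; only the size matters for this comparison."""
    return struct.pack(">h", year)  # raises struct.error outside -32768..32767


print(char4_bytes(2024))           # b'2024': bytes 0x32 0x30 0x32 0x34
print(char4_bytes(-999))           # b'-999': the minus sign uses one of the 4 slots
print(smallint_bytes(2024).hex())  # 07e8: the whole year fits in 2 bytes
```

Trying `char4_bytes(-1000)` or `char4_bytes(10000)` raises, because those need five characters.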

First of all, for a limited range of years use YEAR (yyyy, 1901 to 2155, or 0000): 1 byte.
INT(4) would also work and has no year-2156 problem, but it actually takes 4 bytes (the (4) is only a display width); SMALLINT takes 2 bytes, as @BillKarwin mentions.
CHAR(4) with a single-byte character set takes 4 bytes, but CHAR values are padded with spaces (e.g. '712 ') and, being strings rather than numbers, cannot be used directly in arithmetic without conversion.
VARCHAR(4) needs an extra length byte compared with CHAR(4), and also allows shorter strings such as '712' without padding.

Related

mysql char, varchar and decimal byte size

I was reading the mysql documentation on the byte size for different data types, but was a little confused when it came to char, varchar and decimal.
Can somebody help explain the bytes for these three data types, and also answer how many bytes for the following:
char(7)
varchar(9)
decimal(15,2)
decimal(11,6)
Thanks
CHAR(N) is probably the most confusing because a character does not have a fixed byte size across character sets. Furthermore, different InnoDB row formats handle this differently. Tersely: with ROW_FORMAT=COMPACT, ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED, CHAR(N) reserves a minimum of N bytes so that updates can be done in place without fragmentation. If the character encoding requires more bytes, it uses more as necessary, as few as possible, and never more than the maximum character byte length × N. With ROW_FORMAT=REDUNDANT, CHAR(N) always uses the maximum character byte length × N.
VARCHAR(N) sets a maximum length of N characters per value (VARBINARY(N) sets a maximum of N bytes). Below that maximum, MySQL uses only the bytes the actual string requires in the column's character encoding, plus a length prefix: 1 byte if the column's values can require at most 255 bytes, or 2 bytes if they may require more than 255 bytes. VAR columns are storage-efficient, but for string columns with frequent UPDATEs you can trade storage for performance by using a fixed-length column such as CHAR or BINARY.
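The rules above can be turned into a rough Python calculator for the question's `char(7)` and `varchar(9)` cases. This is a sketch under stated assumptions: it uses the general MySQL storage-requirement formulas (M × w for CHAR, L + 1 or L + 2 for VARCHAR), ignores the InnoDB row-format nuances, and its charset table covers only three character sets; the function names are hypothetical.

```python
# Maximum bytes per character for a few common MySQL character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}

# Python codec used to measure the actual bytes of a value in each charset.
PY_CODEC = {"latin1": "latin-1", "utf8mb3": "utf-8", "utf8mb4": "utf-8"}


def char_max_storage(n: int, charset: str) -> int:
    """Worst case for CHAR(N): N x maximum bytes per character
    (what ROW_FORMAT=REDUNDANT always uses)."""
    return n * MAX_BYTES_PER_CHAR[charset]


def varchar_storage(value: str, n: int, charset: str) -> int:
    """Bytes for one VARCHAR(N) value: its encoded length plus a length prefix
    of 1 byte if the column can need at most 255 bytes, else 2 bytes."""
    if len(value) > n:
        raise ValueError(f"value has {len(value)} characters; VARCHAR({n}) allows {n}")
    data_bytes = len(value.encode(PY_CODEC[charset]))
    prefix = 1 if n * MAX_BYTES_PER_CHAR[charset] <= 255 else 2
    return data_bytes + prefix


print(char_max_storage(7, "latin1"))             # 7
print(char_max_storage(7, "utf8mb4"))            # 28
print(varchar_storage("Hello", 9, "latin1"))     # 6: 5 data bytes + 1 length byte
print(varchar_storage("Hello", 100, "utf8mb4"))  # 7: 100 x 4 > 255, so a 2-byte prefix
```

So `char(7)` in a single-byte character set is 7 bytes, and `varchar(9)` is the string's byte length plus 1.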
The DECIMAL description in the manual is fairly self-explanatory:
"Values for DECIMAL (and NUMERIC) columns are represented using a binary format that packs nine decimal (base 10) digits into four bytes. Storage for the integer and fractional parts of each value are determined separately. Each multiple of nine digits requires four bytes, and the “leftover” digits require some fraction of four bytes. The storage required for excess digits is given by the following table"
Leftover digits: number of bytes
0: 0
1-2: 1
3-4: 2
5-6: 3
7-9: 4
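Applying that rule to the question's two DECIMAL examples, a small Python sketch can do the arithmetic. The leftover-digit byte counts are transcribed from the MySQL manual's storage table; `digit_bytes` and `decimal_storage` are hypothetical helper names.

```python
# Bytes for the "leftover" digits (0-8) that do not fill a complete group of 9,
# following the storage table in the MySQL manual.
LEFTOVER_BYTES = [0, 1, 1, 2, 2, 3, 3, 4, 4]


def digit_bytes(digits: int) -> int:
    """Each full group of 9 digits takes 4 bytes; the remainder uses the table."""
    full_groups, leftover = divmod(digits, 9)
    return full_groups * 4 + LEFTOVER_BYTES[leftover]


def decimal_storage(m: int, d: int) -> int:
    """Bytes for DECIMAL(M, D): the integer part (M - D digits) and the
    fractional part (D digits) are sized separately and then added."""
    if not 0 <= d <= m:
        raise ValueError("DECIMAL(M, D) requires 0 <= D <= M")
    return digit_bytes(m - d) + digit_bytes(d)


for m, d in [(15, 2), (11, 6)]:
    print(f"DECIMAL({m},{d}): {decimal_storage(m, d)} bytes")
# DECIMAL(15,2): 7 bytes
# DECIMAL(11,6): 6 bytes
```

For DECIMAL(15,2), the 13 integer digits are one full group (4 bytes) plus 4 leftover digits (2 bytes), and the 2 fractional digits take 1 byte.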

mySQL - Does Int(9.455.487) take more space than string(John) in mySQL?

I understand that in a database an int takes less space than a string. But what if the int is actually longer than the string? For example, 9.455.487 vs "John". Which one will take more space? TY
From the documentation, the size of INT is 4 bytes, whereas for CHAR it is "M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set." and M is the declared column size.
So the INT will take 4 bytes for any value within the INT range. A string like "John", declared as CHAR(4), will take 4 × w bytes, which is 4 bytes when w is 1 (a single-byte character set).
Long story short, the size of a number is not how many characters long it is when you write it out, but the number of bytes to represent it in the binary form.
You should be aware of what an int (integer) is and what strings are. An integer type always has a fixed size, meaning the number of bytes in its binary representation. Strings, on the other hand, are sequences of bytes, so depending on the encoding, each character may take one or more bytes.
The fact that 9.455.487 looks "longer" than "John" is irrelevant here. What matters is how the DBMS (or any other environment) represents those things. You see a "longer" integer versus a "shorter" string, but that is only a matter of on-screen representation.
To answer the question: in MySQL, INT is 4 bytes, while string data types such as VARCHAR may have a variable length. The fixed-length string data type is CHAR, and from that viewpoint your number and your string (as CHAR(4) in a single-byte character set) have the same length: 4 bytes. Strings and integers are simply different things to compare by "length", and their visual representation should not confuse you. These entities have different internal structures and therefore should not be compared by the length of their visual representation.
Also be aware that an integer will not always be 4 bytes, since even in MySQL your number may be stored as, for example, a BIGINT (8 bytes). And, as mentioned above, strings have the encoding issue. For instance, a UTF-8 encoded string may use two (or even more) bytes for some non-ASCII characters, so each such character adds more than 1 byte to the total string length.
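A short Python illustration of the point, using `struct` to produce fixed-size binary integers as a stand-in for MySQL's INT and BIGINT (the byte order is arbitrary here; only the sizes matter) and UTF-8 for the strings. It is an analogy, not MySQL's storage code.

```python
import struct

number = 9455487
as_int = struct.pack("<i", number)       # 4-byte signed integer, like INT
as_bigint = struct.pack("<q", number)    # 8-byte signed integer, like BIGINT
as_digits = str(number).encode("ascii")  # the digits written out as text
name = "John".encode("utf-8")            # 4 ASCII characters -> 4 bytes
accented = "Jöhn".encode("utf-8")        # 'ö' needs 2 bytes in UTF-8

print(len(as_int), len(as_bigint), len(as_digits), len(name), len(accented))
# 4 8 7 4 5
```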

Best data type to store money values in MySQL

I want to store many records in a MySQL database. All of them contain money values. But I don't know how many digits will be inserted for each one.
Which data type do I have to use for this purpose?
VARCHAR or INT (or other numeric data types)?
Since money needs an exact representation, don't use data types that are only approximate, like FLOAT. You can use a fixed-point numeric data type for that, like
decimal(15,2)
15 is the precision (total length of value including decimal places)
2 is the number of digits after decimal point
See MySQL Numeric Types:
These types are used when it is important to preserve exact precision, for example with monetary data.
You can use DECIMAL or NUMERIC; both are the same:
The DECIMAL and NUMERIC types store exact numeric data values. These types are used when it is important to preserve exact precision, for example with monetary data. In MySQL, NUMERIC is implemented as DECIMAL, so the following remarks about DECIMAL apply equally to NUMERIC. (MySQL documentation)
e.g. DECIMAL(10,2)
Good read
I prefer to use BIGINT and store the values multiplied by 100, so that they become integers.
For example, a currency value of 93.49 is stored as 9349; when displaying the value, we divide by 100. This occupies less storage space.
Caution:
Mostly we don't perform currency * currency multiplication, but if we do, we divide the result by 100 before storing it, so that it returns to the proper precision.
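A minimal sketch of the integer-cents approach in Python, assuming values arrive as decimal strings. The helper names are hypothetical, and the answer does not say how to round after dividing by 100; round-half-to-even is an assumption here, so pick whatever rounding rule your accounting requires.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def to_cents(amount: str) -> int:
    """Parse a decimal string such as '93.49' into integer cents, without floats."""
    cents = Decimal(amount) * 100
    if cents != cents.to_integral_value():
        raise ValueError(f"{amount!r} has more than 2 decimal places")
    return int(cents)


def multiply_cents(a: int, b: int) -> int:
    """Multiply two cent amounts. The raw product carries 4 implied decimal
    places, so divide by 100 to get back to cents (rounding half to even)."""
    product = Decimal(a) * b / 100
    return int(product.quantize(Decimal(1), rounding=ROUND_HALF_EVEN))


def format_cents(cents: int) -> str:
    """Divide by 100 only for display."""
    return str(Decimal(cents).scaleb(-2))


price = to_cents("93.49")   # 9349 is what would be stored in the BIGINT column
factor = to_cents("1.50")   # 150
print(price)                                        # 9349
print(format_cents(multiply_cents(price, factor)))  # 140.24
```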
It depends on your needs.
Using DECIMAL(10,2) is usually enough, but if you need more precise values you can use DECIMAL(10,4).
If you work with big values, replace 10 with 19.
If your application needs to handle money values up to just under 100 billion, this should work: 13,2 (maximum 99,999,999,999.99)
If you need to comply with GAAP (Generally Accepted Accounting Principles), use: 13,4
Usually you should sum your money values at 13,4 before rounding the output to 13,2.
At the time this question was asked, nobody was thinking about the Bitcoin price. For BTC, DECIMAL(15,2) is probably insufficient. If Bitcoin rises to $100,000 or more, we will need at least DECIMAL(18,9) to support cryptocurrencies in our apps.
DECIMAL(18,9) takes 8 bytes of space in MySQL (4 bytes per 9 digits): 4 bytes for the 9 integer digits and 4 bytes for the 9 fractional digits.
We use double.
*gasp*
Why?
Because it can represent any 15 digit number with no constraints on where the decimal point is. All for a measly 8 bytes!
So it can represent:
0.123456789012345
123456789012345.0
...and anything in between.
This is useful because we're dealing with global currencies, and double can store the various numbers of decimal places we'll likely encounter.
A single double field can represent 999,999,999,999,999 in Japanese yen, 9,999,999,999,999.99 in US dollars and even 9,999,999.99999999 in bitcoins.
If you try doing the same with decimal, you need decimal(30, 15) which costs 14 bytes.
Caveats
Of course, using double isn't without caveats.
However, the caveat is not loss of accuracy, as some tend to point out. Even though a double may not represent base-10 values exactly, we can make the value exact by rounding what we pull from the database to its significant decimal places, if needed (e.g. if it is going to be output and a base-10 representation is required).
The caveat is that any time we perform arithmetic with it, we need to normalize the result (by rounding it to its significant decimal places) before:
Performing comparisons on it.
Writing it back to the database.
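A minimal Python sketch of the normalization step and of the missing-validation caveat, assuming 2 significant decimal places for the currency; `normalize` is a hypothetical helper, and Python's `float` is an IEEE 754 double like MySQL's DOUBLE.

```python
def normalize(value: float, places: int) -> float:
    """Round to the currency's significant decimal places before comparing or storing."""
    return round(value, places)


total = 0.1 + 0.2                # arithmetic result held in a double
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False: raw doubles differ in the last bits
print(normalize(total, 2) == normalize(0.3, 2))  # True after normalizing to 2 places

# No DECIMAL(m, d)-style validation: 20 digits silently collapse to one double.
a = float("12345678901234567890")
b = float("12345678901234567891")
print(a == b)                    # True: both round to the same double
```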
Another caveat is that, unlike DECIMAL(m, d), where the database will prevent programs from inserting a number with more than m digits, no such validation exists for DOUBLE. A program could insert a 20-digit value entered by a user, and it would be silently recorded as an inaccurate amount.
If GAAP Compliance is required or you need 4 decimal places:
DECIMAL(13, 4)
Which supports a max value of:
$999,999,999.9999
Otherwise, if 2 decimal places is enough:
DECIMAL(13,2)
src: https://rietta.com/blog/best-data-types-for-currencymoney-in/
Indeed, this depends on the programmer's preferences. I personally use NUMERIC(15,4) to conform to the Generally Accepted Accounting Principles (GAAP).
Try using
Decimal(19,4)
this usually works with every other DB as well
Storing money as BIGINT multiplied by 100 or more in order to use less storage space makes no sense in "normal" situations.
To stay aligned with GAAP it is sufficient to store currencies in DECIMAL(13,4)
The MySQL manual says that DECIMAL needs 4 bytes per 9 digits.
https://dev.mysql.com/doc/refman/8.0/en/precision-math-decimal-characteristics.html
DECIMAL(13,4) has 9 integer digits + 4 fraction digits (decimal places) => 4 + 2 bytes = 6 bytes,
compared to the 8 bytes required to store a BIGINT.
There are 2 valid options:
use an integer amount of currency minor units (e.g. cents)
represent the amount as a decimal value of the currency
In both cases you should use a decimal data type to have enough significant digits. The difference can be in precision:
even for an integer amount of minor units, it's better to have extra precision for accumulators (e.g. when accumulating 10% fees from 1-cent operations)
different currencies have different numbers of decimals, and cryptocurrencies have up to 18 decimals
the number of decimals can change over time due to inflation
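To make the accumulator point concrete, here is a Python sketch using the `decimal` module with hypothetical numbers (1,000 one-cent operations, each charged a 10% fee): rounding every fee to whole cents loses all of it, while an accumulator with extra decimal places keeps it.

```python
from decimal import Decimal, ROUND_HALF_EVEN

CENT = Decimal("0.01")
FEE_RATE = Decimal("0.10")   # hypothetical 10% fee
OPERATION = Decimal("0.01")  # hypothetical 1-cent operation

rounded_total = Decimal("0")  # fees rounded to whole cents before adding
exact_total = Decimal("0")    # accumulator that keeps extra decimal places

for _ in range(1000):
    fee = OPERATION * FEE_RATE                                     # 0.0010, a tenth of a cent
    rounded_total += fee.quantize(CENT, rounding=ROUND_HALF_EVEN)  # rounds to 0.00 every time
    exact_total += fee

print(rounded_total)  # 0.00
print(exact_total)    # 1.0000
```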
Source and more caveats and facts.
Multiply by 10,000 and store as BIGINT, like the "Currency" type in Visual Basic and Office. See https://msdn.microsoft.com/en-us/library/office/gg264338.aspx

MySQL stripping off leading zero from integer column

I have a bigint field which when entering a number such as '05555555555' for example, the 0 is being stripped off and only inserting '5555555555'.
What data type should I use to prevent this?
You can't. Integer columns (BIGINTs) do not store leading zeros, i.e. a visual representation of the number.
Rather than attempt to store a leading zero (by using a varchar field), have a view (or whatever) format the integer into a string in the format you require.
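A minimal Python sketch of formatting on the way out, assuming the value always has a known fixed width (11 digits in the question's example); `with_leading_zeros` is a hypothetical helper. Inside MySQL, `LPAD()` can do the same in a view, e.g. `LPAD(col, 11, '0')`.

```python
def with_leading_zeros(value: int, width: int) -> str:
    """Format a stored integer back into a fixed-width string, padding with zeros.
    Only meaningful when every value has the same known width."""
    if value < 0:
        raise ValueError("leading-zero formatting assumes a non-negative value")
    return str(value).zfill(width)


print(with_leading_zeros(5555555555, 11))  # 05555555555
```

If the number of leading zeros varies from value to value, the width cannot be recovered from the integer, which is the case for storing a string instead.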
If you need to store something that is actually a string in the Domain model (e.g. a phone number), use a string rather than an integer type field.
BIGINT and other Integer columns do not store the visual representation of a number, only the number itself in binary form (BIGINT is 8 bytes). 5555555555 is stored as:
00000000 00000000 00000000 00000001 01001011 00100011 00001100 11100011
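The bit pattern above can be reproduced in Python. This is a conceptual 8-byte, big-endian, two's-complement view of the value; the storage engine's exact on-disk byte layout may differ in detail.

```python
value = 5555555555

# Conceptual 8-byte, big-endian, two's-complement view (the size of a BIGINT).
raw = value.to_bytes(8, "big", signed=True)
bits = " ".join(f"{byte:08b}" for byte in raw)
print(bits)
# 00000000 00000000 00000000 00000001 01001011 00100011 00001100 11100011

# The leading zero is only part of the text; both strings parse to the same number.
print(int("05555555555") == int("5555555555"))  # True
```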
If the preceding zeros are significant to the integrity of your data, you should be using a VARCHAR or CHAR instead of an integer type. Numerical datatypes should only be used for numerical data. US ZIP Codes and phone numbers are NOT numerical data.
bigint stores the data as a number, and 05555555555 and 5555555555 are the same number. You'll need a string type to preserve the leading zero, e.g. varchar with a suitable maximum length.
You might look into altering the field to use UNSIGNED ZEROFILL. This displays the number padded with leading zeros up to the column's display width; note that it only affects display, not the stored value, and ZEROFILL is deprecated as of MySQL 8.0.17.
The problem is that with a big database of hundreds of thousands of rows, a BIGINT is much faster than a VARCHAR field. I had a similar issue with a product database full of European Article Numbers (EANs). Some of those codes start with a leading 0. When I changed the column to VARCHAR, certain pages that search for EAN codes took 8 seconds to load; when I changed it to BIGINT, they took 2 seconds.
Big difference in speed indeed.

Why can't tinyint store more than the number 255 in MySQL?

If TINYINT can store three characters, for example, why can't it store up to the number 999?
Because it takes only 8 bits and hence can encode no more than 2^8 = 256 values.
The three characters you see in something like '123' are the result of the binary-to-decimal conversion. You cannot store 3 arbitrary characters there.
It is 8 bits and can actually store a maximum value of 255. 8 bits have 256 possible states including zero.
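A quick Python check of the 8-bit limit, using `struct` formats `B` (unsigned byte) and `b` (signed byte) as stand-ins for TINYINT UNSIGNED (0 to 255) and MySQL's default signed TINYINT (-128 to 127); `fits` is a hypothetical helper.

```python
import struct


def fits(value: int, signed: bool) -> bool:
    """Return True if value fits in one byte (signed or unsigned)."""
    fmt = "b" if signed else "B"  # 'b' = signed char, 'B' = unsigned char
    try:
        struct.pack(fmt, value)
    except struct.error:
        return False
    return True


print(2 ** 8)                                                  # 256 distinct bit patterns
print(fits(255, signed=False), fits(256, signed=False))        # True False
print(fits(-128, signed=True), fits(127, signed=True), fits(128, signed=True))  # True True False
```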