I was reading the mysql documentation on the byte size for different data types, but was a little confused when it came to char, varchar and decimal.
Can somebody help explain the bytes for these three data types, and also answer how many bytes for the following:
char(7)
varchar(9)
decimal(15,2)
decimal(11,6)
Thanks
CHAR(N) is probably the most confusing because a char is not a fixed byte size across character sets. Furthermore, different row formats handle this problem differently. Tersely, if you're using ROW_FORMAT=COMPACT, ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED then CHAR(N) reverse a minimum of N bytes in order to achieve updates in place without fragmentation. If more bytes are required as the result of a different character encoding than it will use more as necessary, trying to use as few as possible, and NO MORE than the maximum character byte length * N is used. If you're using ROW_FORMAT=REDUNDANT, than CHAR(N) always uses the maximum character byte length * N.
VARCHAR(N) and VARBINARY(N) sets a maximum character length per column of N. Below N, MySQL uses the number of bytes required given the string and character encoding used. MySQL then uses one additional byte to record the length of the string if the string is below 256 bytes. If the length of the string is greater than 255 bytes than it uses 2 bytes to record the length of the string. VAR columns are storage efficient but for string columns with frequent UPDATES, one can trade storage for performance by using a fixed length column such as BINARY.
The DECIMAL description is pretty self explanatory:
"Values for DECIMAL (and NUMERIC) columns are represented using a binary format that packs nine decimal (base 10) digits into four bytes. Storage for the integer and fractional parts of each value are determined separately. Each multiple of nine digits requires four bytes, and the “leftover” digits require some fraction of four bytes. The storage required for excess digits is given by the following table"
Related
I read a lot of articles and SO questions about the topic, but all they did was to confuse me more.
I'm trying to understand what is the longest string that a VARCHAR can hold, and how to define it.
In some places it said that a VARCHAR can be created with a max length of 255 (i.e. VARCHAR(255)) - I don't understand if it means 255 bytes or characters.
In other places it said that a VARCHAR can hold up to 8000 bytes - and then depends on the language (if it's 1-byte per character, such as Latin, or more - that determined the length of the longest string).
In simple terms, what does the n in VARCHAR(n) stands for, and what is the range of n?
Is it bytes? Is it the number of a character? between 0-255? between 0-8000?
How does a really long text is saved? does it get split into multiple columns?
VARCHAR is storing strings in 1 byte per symbol (an opposite to nvarchar that can use 2 or more bytes per character). You can read details here.
A common misconception is to think that CHAR(n) and VARCHAR(n), the n defines the number of characters. But in CHAR(n) and VARCHAR(n) the n defines the string length in bytes (0-8,000). n never defines numbers of characters that can be stored. This is similar to the definition of NCHAR(n) and NVARCHAR(n). The misconception happens because when using single-byte encoding, the storage size of CHAR and VARCHAR is n bytes and the number of characters is also n. However, for multi-byte encoding such as UTF-8, higher Unicode ranges (128-1,114,111) result in one character using two or more bytes. For example, in a column defined as CHAR(10), the Database Engine can store 10 characters that use single-byte encoding (Unicode range 0-127), but less than 10 characters when using multi-byte encoding (Unicode range 128-1,114,111). For more information about Unicode storage and character ranges, see Storage differences between UTF-8 and UTF-16.
I've been reading about disc usage/space for different strings, it says that LONGTEXT takes 4GB.
Is that disk space declared for FULLY FILLED column or JUST CREATED (Empty)
Thank You.
The answer is: L + 4 bytes, where L < 2^32
Variable-length string types are stored using a length prefix plus
data. The length prefix requires from one to four bytes depending on
the data type, and the value of the prefix is L (the byte length of
the string). For example, storage for a MEDIUMTEXT value requires L
bytes to store the value plus three bytes to store the length of the
value.
Source: https://dev.mysql.com/doc/refman/5.7/en/storage-requirements.html#data-types-storage-reqs-strings
I understood that in a database an int takes less space than a string. But what if the int is really longer than the string. For example 9.455.487 vs "John". Which one will take more space? TY
From the documentation, size of int is 4 bytes, whereas for char it is "M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set." and M is the declared column size.
So when you talk of how much space is taken, the int will take up 4 bytes for a value as long as the value is within the range of int. A string like "John", if declared as char(4) will take up 4 * w bytes, so at least 4 bytes assuming w is 1.
Long story short, the size of a number is not how many characters long it is when you write it out, but the number of bytes to represent it in the binary form.
You should be aware of what "int" (integer) is and what strings are. Integer always has some length and that length means how many bytes are in it's binary representation. On the other hand, strings are sequences of bytes. So, depending of encoding, each symbol may be one or more bytes.
The thing that 9.455.487 is "longer" than "John" is irrelevant here. What is relevant - is how DBMS (or whatever other environment) will represent those things. You're seeing "longer" integer versus "shorter" string while it's not so, it's only a matter of "screen" representation (i.e. what you see on the screen).
Answering question - for MySQL, INT is 4 bytes, while string data types may have dynamic length - such as VARCHAR. Static string length date type is CHAR and from that viewpoint, your number and your string will have same length (4 bytes). Strings and integers are just different things to compare for "length". And visual representation should not confuse you. This entities have different internal structure, and, therefore, should not be compared on "length" according to their visual representation.
Also, you should be aware that not always integer will have 4 bytes length - since even in MySQL your number may belong to, for example, BIGINT data type (which is 8 bytes length). And, as mentioned above, for strings there's also encoding issue. For instance, UTF-8 encoded string may have two (or even more) bytes to represent some non-ASCII symbols. In this case each symbol will add more that 1 byte to total string length.
This is probably a stupid question, but i need to ask...
I've created a MySQL table to handle images called images. In it, I have an attribute that keeps the extension of the image called extension.
Most of the accepted images extensions are either jpg or png or gif or bmp or jpeg or tiff In other words, a maximum of 4 characters long.
Now, should the attribute be declared in the MySQL table like:
extension char(4)
or
extension varchar(4)
There's probably no impact what so ever on performance, but i do want the model to be optimize from the get go...
Anyone?
Depends....
If you look at this from the MySQL documentation
Value CHAR(4) Storage Required VARCHAR(4) Storage Required
'' ' ' 4 bytes '' 1 byte
'ab' 'ab ' 4 bytes 'ab' 3 bytes
'abcd' 'abcd' 4 bytes 'abcd' 5 bytes
'abcdefgh' 'abcd' 4 bytes 'abcd' 5 bytes
As you can see 4 characters for CHAR takes 4 bytes, while VARCHAR takes 5. If the vast majority of extensions would be 4 characters then CHAR would be more space efficient.
In your case I am guessing that 3 will be majority so VARCHAR is the better choice.
James :-)
Edited, I was making a wrong assumption on my previous answer. I'll just paste you an excerpt from http://dev.mysql.com/doc/refman/5.0/en/char.html (emphasis added)
The CHAR and VARCHAR types are similar, but differ in the way they are stored and retrieved. As of MySQL 5.0.3, they also differ in maximum length and in whether trailing spaces are retained.
The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters.
The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
How many characters can a SQL Server 2008 database field contain when the data type is VARCHAR(MAX)?
From http://msdn.microsoft.com/en-us/library/ms176089.aspx
varchar [ ( n | max ) ]
Variable-length, non-Unicode character
data. n can be a value from 1 through
8,000. max indicates that the maximum
storage size is 2^31-1 bytes. The
storage size is the actual length of
data entered + 2 bytes. The data
entered can be 0 characters in length.
The ISO synonyms for varchar are char
varying or character varying.
1 character = 1 byte. And don't forget 2 bytes for the termination. So, 2^31-3 characters.
For future readers who need this answer quickly:
2^31-1 = 2 147 483 647 characters, or roughly 2.147 billion
See the MSDN reference table for maximum numbers/sizes.
Bytes per varchar(max),
varbinary(max), xml, text, or image
column: 2^31-1
There's a two-byte overhead for the column, so the actual data is 2^31-3 max bytes in length. Assuming you're using a single-byte character encoding, that's 2^31-3 characters total. (If you're using a character encoding that uses more than one byte per character, divide by the total number of bytes per character. If you're using a variable-length character encoding, all bets are off.)
There are a few gotchas worth mentioning - you may need to force the use of varchar(max)
https://dba.stackexchange.com/questions/18483/varcharmax-field-cutting-off-data-after-8000-characters
and print only handles 8000 chars
How to print VARCHAR(MAX) using Print Statement?