Wondering how much actual storage space will be taken up by these two datatypes, as the MySQL documentation is slightly unclear on the matter.
CHAR(M): M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set
VARCHAR(M), VARBINARY(M): L + 1 bytes if column values require 0 – 255 bytes, L + 2 bytes if values may require more than 255 bytes
This seems to imply to me that, given a utf8-encoded database, a CHAR will always take up 32 bits per character, whilst a VARCHAR will take between 8 and 32 depending on the actual byte length of the characters stored. Is that correct? Or does a VARCHAR imply an 8-bit character width, and storing multi-octet UTF8 characters actually consumes multiple 'characters' from the VARCHAR? Or does the VARCHAR also always store 32 bits per character? So many possibilities.
Not something I've ever had to worry this much about before, but I'm starting to hit in-memory temp table size limits and I don't necessarily want to have to increase MySQL's available pool (for the second time).
CHAR and VARCHAR both count characters. Both of them count the maximum storage that they might require given the character encoding and length. For ASCII, that's 1 byte per character. For MySQL's utf8 character set, that's 3 bytes per character (not 4 as you'd expect, because utf8 is an alias for utf8mb3, which doesn't support any Unicode characters that would require 4 bytes in UTF-8; the utf8mb4 character set, available since MySQL 5.5.3, does support them and counts 4 bytes per character). So far, CHAR and VARCHAR are the same.
Now, CHAR just goes ahead and reserves this amount of storage.
VARCHAR instead allocates 1 or 2 bytes for a length prefix, depending on whether this maximum storage is < 256 or ≥ 256. The actual amount of space occupied by the entry is these one or two bytes, plus the amount of space actually occupied by the string.
Interestingly, this makes 85 a magic number for UTF-8 VARCHAR:
VARCHAR(85) uses 1 byte for the length because the maximum possible length of 85 UTF-8 characters is 3 × 85 = 255.
VARCHAR(86) uses 2 bytes for the length because the maximum possible length of 86 UTF-8 characters is 3 × 86 = 258.
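If it helps to see the rule as arithmetic, here is a minimal Python sketch of it. It only models the "length prefix + actual bytes" description above; the function name is made up for illustration, and it assumes a 3-byte-per-character utf8 column holding only characters from the Basic Multilingual Plane.

```python
def varchar_storage_bytes(value: str, declared_chars: int,
                          max_bytes_per_char: int = 3) -> int:
    """Estimate the bytes one VARCHAR value occupies: length prefix + data.

    Hypothetical helper, not a MySQL API. Assumes the value contains only
    characters the column's character set can store.
    """
    if len(value) > declared_chars:
        raise ValueError("value is longer than the declared column length")
    # The prefix size follows the column's worst case, not this value.
    max_column_bytes = declared_chars * max_bytes_per_char
    prefix = 1 if max_column_bytes <= 255 else 2
    return prefix + len(value.encode("utf-8"))


print(varchar_storage_bytes("abc", 85))  # 1-byte prefix + 3 data bytes
print(varchar_storage_bytes("abc", 86))  # 2-byte prefix + 3 data bytes
```

The same three-character string costs one byte more in the VARCHAR(86) column purely because of the wider prefix.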
Related
I was reading the mysql documentation on the byte size for different data types, but was a little confused when it came to char, varchar and decimal.
Can somebody help explain the bytes for these three data types, and also answer how many bytes for the following:
char(7)
varchar(9)
decimal(15,2)
decimal(11,6)
Thanks
CHAR(N) is probably the most confusing because a char is not a fixed byte size across character sets. Furthermore, different row formats handle this problem differently. Tersely, if you're using ROW_FORMAT=COMPACT, ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED, then CHAR(N) reserves a minimum of N bytes in order to allow updates in place without fragmentation. If more bytes are required because of the character encoding, then it will use more as necessary, trying to use as few as possible, and never more than the maximum character byte length * N. If you're using ROW_FORMAT=REDUNDANT, then CHAR(N) always uses the maximum character byte length * N.
VARCHAR(N) and VARBINARY(N) set a maximum length per column of N. Below N, MySQL uses the number of bytes required given the string and character encoding used. MySQL then uses one additional byte to record the length of the string if the string is below 256 bytes. If the length of the string is greater than 255 bytes, then it uses 2 bytes to record the length of the string. VAR columns are storage efficient, but for string columns with frequent UPDATEs, one can trade storage for performance by using a fixed-length column such as BINARY.
The DECIMAL description is pretty self-explanatory:
"Values for DECIMAL (and NUMERIC) columns are represented using a binary format that packs nine decimal (base 10) digits into four bytes. Storage for the integer and fractional parts of each value are determined separately. Each multiple of nine digits requires four bytes, and the “leftover” digits require some fraction of four bytes. The storage required for excess digits is given by the following table:"
Leftover Digits    Number of Bytes
--------------------------------------------
0                  0
1-2                1
3-4                2
5-6                3
7-9                4
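For the two DECIMAL columns in the question, the quoted rule can be worked through with a short Python sketch. The leftover-digit byte counts are the ones from the manual's table (0 digits: 0 bytes, 1-2: 1, 3-4: 2, 5-6: 3, 7-9: 4); the function names are hypothetical, just for illustration.

```python
# Bytes needed for the "leftover" digits that don't fill a group of nine.
LEFTOVER_BYTES = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4}


def digits_to_bytes(digits: int) -> int:
    """Each full group of 9 digits takes 4 bytes, plus the leftover."""
    full_groups, leftover = divmod(digits, 9)
    return full_groups * 4 + LEFTOVER_BYTES[leftover]


def decimal_storage_bytes(precision: int, scale: int) -> int:
    """Integer and fractional parts are sized separately, then added."""
    return digits_to_bytes(precision - scale) + digits_to_bytes(scale)


print(decimal_storage_bytes(15, 2))  # 13 integer digits + 2 fractional
print(decimal_storage_bytes(11, 6))  # 5 integer digits + 6 fractional
```

By this arithmetic decimal(15,2) needs 4 + 2 bytes for its 13 integer digits plus 1 byte for the fraction, 7 bytes in total, and decimal(11,6) needs 3 + 3 = 6 bytes.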
This is probably a stupid question, but I need to ask...
I've created a MySQL table to handle images called images. In it, I have an attribute that keeps the extension of the image called extension.
Most of the accepted image extensions are jpg, png, gif, bmp, jpeg or tiff. In other words, a maximum of 4 characters long.
Now, should the attribute be declared in the MySQL table like:
extension char(4)
or
extension varchar(4)
There's probably no impact whatsoever on performance, but I do want the model to be optimized from the get-go...
Anyone?
Depends....
If you look at this from the MySQL documentation
Value       CHAR(4)  Storage Required  VARCHAR(4)  Storage Required
--------------------------------------------------------------------
''          '    '   4 bytes           ''          1 byte
'ab'        'ab  '   4 bytes           'ab'        3 bytes
'abcd'      'abcd'   4 bytes           'abcd'      5 bytes
'abcdefgh'  'abcd'   4 bytes           'abcd'      5 bytes
As you can see 4 characters for CHAR takes 4 bytes, while VARCHAR takes 5. If the vast majority of extensions would be 4 characters then CHAR would be more space efficient.
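In your case I am guessing that 3 will be the majority. A 3-character value takes 3 + 1 = 4 bytes in VARCHAR(4), the same as in CHAR(4), so VARCHAR only saves space for values shorter than 3 characters.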
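To make the table concrete for the extensions listed in the question, here is a rough tally in Python. It assumes a single-byte character set such as latin1 and the L + 1 rule from the table; the helper names are made up for illustration.

```python
def char_bytes(declared_len: int, bytes_per_char: int = 1) -> int:
    """CHAR(N) is padded to its full declared length."""
    return declared_len * bytes_per_char


def varchar_bytes(value: str) -> int:
    """VARCHAR(4) needs a 1-byte length prefix plus the actual bytes."""
    return 1 + len(value.encode("latin-1"))


extensions = ["jpg", "png", "gif", "bmp", "jpeg", "tiff"]
for ext in extensions:
    print(f"{ext!r:8} CHAR(4)={char_bytes(4)}  VARCHAR(4)={varchar_bytes(ext)}")
```

Under those assumptions the three-letter extensions cost 4 bytes either way, and the four-letter ones cost 5 bytes as VARCHAR(4).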
James :-)
Edited, I was making a wrong assumption on my previous answer. I'll just paste you an excerpt from http://dev.mysql.com/doc/refman/5.0/en/char.html (emphasis added)
The CHAR and VARCHAR types are similar, but differ in the way they are stored and retrieved. As of MySQL 5.0.3, they also differ in maximum length and in whether trailing spaces are retained.
The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters.
The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
In the MySQL manual Data Type Storage Requirements, I found:
Data Type Storage Required
--------------------------------------------
TINYTEXT L + 1 bytes, where L < 2^8
TEXT L + 2 bytes, where L < 2^16
If I store 240 characters [utf8-general] in a TinyText field and also in a Text field, will the Text field just eat 1 byte more than the TinyText?
How much space will Text take if I store 1024 letters [utf8-general]?
I think 1024+2 bytes!
Will it eat the same space whether I save a single character or 2^16 characters in a Text field?
The TinyText can only store up to 255 bytes. That could be as few as 63 characters if you were so unfortunate as to have to store 63 characters that all required 4 bytes in UTF-8. On the other hand, it could store 255 characters if they are all, in fact, in the ASCII subset of UTF-8.
If you store 1024 characters, they will take between 1024 and 4096 (+2) bytes. A Unicode character encoded using UTF-8 will occupy between 1 and 4 bytes.
A single character requiring one byte (U+0000 .. U+007F) will require 3 bytes (1 for the character, 2 for the length) in a Text field. On the other hand, a single character requiring 4 bytes (say U+101001 - I'm not sure that's valid as a Unicode character, but it needs 4 bytes to store it) will require a grand total of 6 bytes to store it. In neither case is it close to 2^16 bytes.
Do learn to distinguish between bytes and characters when dealing with Unicode; it is very important.
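A small Python sketch shows the byte/character difference directly. It uses Python's own UTF-8 encoder as a stand-in and applies the L + 2 rule for Text from the manual excerpt in the question; the function name is hypothetical, and note that 4-byte characters need the utf8mb4 character set on the MySQL side.

```python
def text_storage_bytes(value: str) -> int:
    """Estimate storage for one TEXT value: encoded bytes + 2 length bytes."""
    data = value.encode("utf-8")
    if len(data) >= 2**16:
        raise ValueError("value does not fit in a TEXT column")
    return len(data) + 2


samples = {
    "U+0061 (ASCII)": "a",
    "U+00E9": "\u00e9",
    "U+20AC": "\u20ac",
    "U+1F600": "\U0001F600",
}
for label, ch in samples.items():
    encoded = len(ch.encode("utf-8"))
    print(f"{label}: 1 character, {encoded} byte(s), "
          f"{text_storage_bytes(ch)} bytes in TEXT")

# Worst case for TinyText: how many 4-byte characters fit in 255 bytes.
print(255 // 4)
```

One character can be anywhere from 1 to 4 bytes, which is why 1024 characters land somewhere between 1024 + 2 and 4096 + 2 bytes.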
Q1: yes
Q2: impossible to answer exactly. Each character in UTF-8 can take 1 to 4 bytes (MySQL's utf8 character set stores at most 3), so it will take between 1024+2 and 4096+2 bytes
Q3: nope
According to the mysql documentation:
A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes
AND
The maximum row size constrains the number of columns because the total width of all columns cannot exceed this size. For example, utf8 characters require up to three bytes per character, so for a CHAR(255) CHARACTER SET utf8 column, the server must allocate 255 × 3 = 765 bytes per value. Consequently, a table cannot contain more than 65,535 / 765 = 85 such columns.
For clarity, what then is the maximum value I can set in the varchar argument so it only uses 1 byte to store its length?
From the MySQL documentation:
The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters.
A [VARCHAR] column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
This makes the answer to your question depend on the character encoding.
With a single-byte encoding like windows-1252 (which MySQL calls latin1), the character length is the same as the byte length, so you can use a VARCHAR(255).
With UTF-8, a VARCHAR(N) may require up to 3N bytes, as would be the case if all characters were in the range U+0800 to U+FFFF. Thus, a VARCHAR(85) is the greatest that ensures a single-byte byte length (requiring a maximum of 255 bytes).
(Note that MySQL apparently does not support characters outside the BMP. The official definition of UTF-8 allows 4 bytes per character.)
For clarity, what then is the maximum value I can set in the varchar argument so it only uses 1 byte to store its length?
This depends on the character set of the VARCHAR column, which is implied by its collation.
As you noted, UTF8 may use up to three bytes per character, so if you declare a UTF8 column more than 85 characters long, there is a chance that it will use more than 255 bytes to store its data, and the length hence has to be stored in a two-byte field.
If you use latin1, each character is stored in 1 byte.
So the answer is:
VARCHAR(85) COLLATE UTF8_GENERAL_CI
or
VARCHAR(255) COLLATE LATIN1_GENERAL_CI
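The cut-off for other character sets falls out of the same division. This is only a sketch of the "no more than 255 bytes" rule quoted from the documentation, with a made-up function name; the utf8mb4 line assumes its maximum of 4 bytes per character.

```python
def max_one_byte_prefix_length(max_bytes_per_char: int) -> int:
    """Largest N for which N characters can never exceed 255 bytes."""
    return 255 // max_bytes_per_char


charsets = {"latin1": 1, "utf8 (utf8mb3)": 3, "utf8mb4": 4}
for name, width in charsets.items():
    n = max_one_byte_prefix_length(width)
    print(f"{name}: VARCHAR({n}) is the largest with a 1-byte length")
```

So a utf8mb4 column would, by this rule, move to a two-byte length one step past VARCHAR(63).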
I think you're confusing string size with character representation.
For instance, you could have a character that takes 4 bytes to represent it, and put it inside a string whose max storage size requires only one byte to hold the length, since there are fewer than 255 characters in it.
How many characters can a SQL Server 2008 database field contain when the data type is VARCHAR(MAX)?
From http://msdn.microsoft.com/en-us/library/ms176089.aspx
varchar [ ( n | max ) ]
Variable-length, non-Unicode character data. n can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes. The storage size is the actual length of data entered + 2 bytes. The data entered can be 0 characters in length. The ISO synonyms for varchar are char varying or character varying.
1 character = 1 byte. And don't forget the 2 bytes of storage overhead. So, 2^31-3 characters.
For future readers who need this answer quickly:
2^31-1 = 2 147 483 647 characters, or roughly 2.147 billion
See the MSDN reference table for maximum numbers/sizes.
Bytes per varchar(max), varbinary(max), xml, text, or image column: 2^31-1
There's a two-byte overhead for the column, so the actual data is 2^31-3 max bytes in length. Assuming you're using a single-byte character encoding, that's 2^31-3 characters total. (If you're using a character encoding that uses more than one byte per character, divide by the total number of bytes per character. If you're using a variable-length character encoding, all bets are off.)
There are a few gotchas worth mentioning - you may need to force the use of varchar(max)
https://dba.stackexchange.com/questions/18483/varcharmax-field-cutting-off-data-after-8000-characters
and print only handles 8000 chars
How to print VARCHAR(MAX) using Print Statement?