In the MySQL manual Data Type Storage Requirements, I found:
Data Type Storage Required
--------------------------------------------
TINYTEXT L + 1 bytes, where L < 2^8
TEXT L + 2 bytes, where L < 2^16
If I store 240 characters [utf8-general] in a TINYTEXT field and also in a TEXT field, will the TEXT field use just 1 byte more than the TINYTEXT field?
How much space will a TEXT field take if I store 1024 letters [utf8-general]?
I think 1024+2 bytes!
Will it use the same space whether I save a single character or 2^16 characters in a TEXT field?
A TINYTEXT can only store up to 255 bytes. That could be as few as 63 characters if you were so unfortunate as to have to store 63 characters that all required 4 bytes in UTF-8. On the other hand, it could store 255 characters if they are all, in fact, in the ASCII subset of UTF-8.
If you store 1024 characters, they will take between 1024 and 4096 (+2) bytes. A Unicode character encoded using UTF-8 will occupy between 1 and 4 bytes.
A single character requiring one byte (U+0000 .. U+007F) will require 3 bytes (1 for the character, 2 for the length) in a TEXT field. On the other hand, a single character requiring 4 bytes (say U+101001, a valid code point in Supplementary Private Use Area-B; every code point above U+FFFF needs 4 bytes in UTF-8) will require a grand total of 6 bytes. In neither case is it close to 2^16 bytes.
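The byte-versus-character arithmetic above can be checked by encoding strings as UTF-8 and adding TEXT's 2-byte length prefix. This is a minimal sketch that assumes the column stores the plain UTF-8 bytes (4-byte characters allowed, as in MySQL's utf8mb4); the helper name `text_storage_bytes` is hypothetical.

```python
# Hypothetical helper: bytes a TEXT value occupies, assuming the column
# stores the plain UTF-8 encoding of the string (utf8mb4-style).
TEXT_PREFIX_BYTES = 2            # TEXT keeps its length in a 2-byte prefix
TEXT_MAX_DATA_BYTES = 2**16 - 1  # L < 2^16

def text_storage_bytes(s: str) -> int:
    data = s.encode("utf-8")
    if len(data) > TEXT_MAX_DATA_BYTES:
        raise ValueError("too long for a TEXT column")
    return len(data) + TEXT_PREFIX_BYTES

print(text_storage_bytes("a"))                  # 3: 1 data byte + 2-byte prefix
print(text_storage_bytes("\U00101001"))         # 6: 4 data bytes + 2-byte prefix
print(text_storage_bytes("a" * 1024))           # 1026
print(text_storage_bytes("\U00101001" * 1024))  # 4098
```

The same 1024-character string can therefore cost anywhere from 1026 to 4098 bytes depending on which characters it contains.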
Do learn to distinguish between bytes and characters when dealing with Unicode; it is very important.
Q1: yes
Q2: impossible to answer exactly. Each character in UTF-8 takes 1 to 4 bytes (the original UTF-8 design allowed up to 6, but RFC 3629 restricts it to 4). So it will take 1024+2 .. 4096+2 bytes
Q3: nope
Do varchar/text/longtext columns in MySQL occupy full length size in filesystem even for incomplete cells?
Referring to this blog article, for example, I have the following columns, and I have put a given string into each of them.
CHAR (72) - can store up to 72 characters - I put in a string "abcd"
VARCHAR (72) - can store up to 72 characters - I put in a string "abcd"
TINYTEXT - can store up to 255 bytes - I put in a string "abcd"
TEXT - can store up to 65,535 bytes - I put in a string "abcd"
LONGTEXT - can store up to 4,294,967,295 bytes - I put in a string "abcd"
How much actual physical space will be occupied by each of these columns for a row?
The full size up to which each column is capable of storing?
Or only as much as required for storing "abcd"?
Check out http://dev.mysql.com/doc/refman/5.7/en/string-type-overview.html
Basically, all of these types except for the CHAR are variable length.
For instance, if you have VARCHAR(72) and you write abcd into it, you will store 5 bytes: 4 bytes for the four characters (one byte each in a single-byte character set such as latin1), and a 1-byte prefix to store the length of the string (which is 4).
The prefix becomes 2 bytes when the column's maximum length can exceed 255 bytes; it depends on the declared size, not on the string actually stored. So a VARCHAR(300) in a single-byte character set with a 256-character string stored in it will take up 258 bytes.
TINYTEXT always has a 1-byte prefix, because it can store at most 255 bytes, so abcd would take 5 bytes.
TEXT has a 2-byte prefix, so abcd would be 6 bytes.
LONGTEXT has a 4-byte prefix, so abcd would be 8 bytes.
Lastly, there's the nearly useless CHAR type. A CHAR(72) in a single-byte character set will always take up 72 bytes no matter what you store in it. It's really only useful for short fields that always hold exactly the same number of characters; a Y or N flag, for example, is a good CHAR(1) candidate.
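The per-column numbers in this answer can be reproduced with a small calculator. This is a sketch under stated assumptions: a single-byte character set (latin1), so one character is one byte, and no row-format overhead beyond the length prefix; the names `varchar_prefix` and `stored_bytes` are hypothetical.

```python
# Sketch of the storage rules described above. Assumes a single-byte
# character set (latin1) and ignores any row-format overhead other than
# the length prefix.
TEXT_PREFIX = {"TINYTEXT": 1, "TEXT": 2, "LONGTEXT": 4}

def varchar_prefix(declared_chars: int) -> int:
    # 1 byte if the declared maximum fits in 255 bytes, otherwise 2
    return 1 if declared_chars <= 255 else 2

def stored_bytes(col_type: str, value: str, declared_chars: int = 0) -> int:
    """declared_chars is only used for CHAR and VARCHAR."""
    n = len(value.encode("latin-1"))  # 1 byte per character in latin1
    if col_type == "CHAR":
        return declared_chars          # always the full declared width
    if col_type == "VARCHAR":
        return varchar_prefix(declared_chars) + n
    return TEXT_PREFIX[col_type] + n

for col, size in [("CHAR", 72), ("VARCHAR", 72), ("TINYTEXT", 0),
                  ("TEXT", 0), ("LONGTEXT", 0)]:
    print(col, stored_bytes(col, "abcd", size))
# CHAR 72, VARCHAR 5, TINYTEXT 5, TEXT 6, LONGTEXT 8

print(stored_bytes("VARCHAR", "x" * 256, 300))  # 258
print(stored_bytes("VARCHAR", "abcd", 300))     # 6
```

The last line shows that a VARCHAR(300) pays the 2-byte prefix even for a 4-character value, because the prefix size follows the declared maximum.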
I'm wondering how much actual storage space these two data types take up, as the MySQL documentation is slightly unclear on the matter.
CHAR(M)                    M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set
VARCHAR(M), VARBINARY(M)   L + 1 bytes if column values require 0 – 255 bytes, L + 2 bytes if values may require more than 255 bytes
This seems to imply to me that, given a utf8-encoded database, a CHAR will always take up 32 bits per character, whilst a VARCHAR will take between 8 and 32 depending on the actual byte length of the characters stored. Is that correct? Or does a VARCHAR imply an 8-bit character width, and storing multi-octet UTF8 characters actually consumes multiple 'characters' from the VARCHAR? Or does the VARCHAR also always store 32 bits per character? So many possibilities.
Not something I've ever had to worry this much about before, but I'm starting to hit in-memory temp table size limits and I don't necessarily want to have to increase MySQL's available pool (for the second time).
CHAR and VARCHAR both count characters. Both of them count the maximum storage that they might require given the character encoding and length. For ASCII, that's 1 byte per character. For MySQL's utf8 character set, that's 3 bytes per character (not 4 as you'd expect, because MySQL's utf8 is limited to 3-byte sequences and doesn't support any Unicode characters that would require 4 bytes in UTF-8; the separate utf8mb4 character set does support them). So far, CHAR and VARCHAR are the same.
Now, CHAR just goes ahead and reserves this amount of storage.
VARCHAR, instead, allocates 1 or 2 bytes for the length, depending on whether this maximum storage is < 256 or ≥ 256. The actual amount of space occupied by the entry is those one or two bytes plus the amount of space actually occupied by the string.
Interestingly, this makes 85 a magic number for UTF-8 VARCHAR:
VARCHAR(85) uses 1 byte for the length because the maximum possible length of 85 UTF-8 characters is 3 × 85 = 255.
VARCHAR(86) uses 2 bytes for the length because the maximum possible length of 86 UTF-8 characters is 3 × 86 = 258.
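The CHAR/VARCHAR accounting and the "magic 85" can be expressed directly in code. This sketch follows the answer's model and assumes MySQL's utf8 (utf8mb3), where each character needs at most 3 bytes; the function names are hypothetical, and real InnoDB row formats may store multibyte CHAR columns differently.

```python
# Sketch of the answer's CHAR/VARCHAR model, assuming MySQL's utf8
# (utf8mb3): at most 3 bytes per character.
UTF8MB3_MAX_BYTES = 3

def varchar_prefix_bytes(n_chars: int, w: int = UTF8MB3_MAX_BYTES) -> int:
    """Length-prefix size for VARCHAR(n_chars), based on the worst case n_chars * w."""
    return 1 if n_chars * w <= 255 else 2

def char_reserved_bytes(n_chars: int, w: int = UTF8MB3_MAX_BYTES) -> int:
    """CHAR(n_chars) reserves the worst case up front."""
    return n_chars * w

def varchar_bytes(value: str, n_chars: int) -> int:
    """Length prefix plus the bytes the value actually occupies in UTF-8."""
    if len(value) > n_chars:
        raise ValueError("value longer than the declared length")
    if any(ord(c) > 0xFFFF for c in value):
        raise ValueError("utf8mb3 cannot store characters outside the BMP")
    return varchar_prefix_bytes(n_chars) + len(value.encode("utf-8"))

print(varchar_prefix_bytes(85), varchar_prefix_bytes(86))    # 1 2
print(char_reserved_bytes(72))                               # 216
print(varchar_bytes("abcd", 72), varchar_bytes("abcd", 86))  # 5 6
```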
According to the MySQL documentation:
A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes
AND
The maximum row size constrains the number of columns because the total width of all columns cannot exceed this size. For example, utf8 characters require up to three bytes per character, so for a CHAR(255) CHARACTER SET utf8 column, the server must allocate 255 × 3 = 765 bytes per value. Consequently, a table cannot contain more than 65,535 / 765 = 85 such columns.
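The column-count limit in that quote is a floor division; a quick check, ignoring any other per-row overhead:

```python
# Arithmetic from the quoted documentation, ignoring other per-row overhead.
MAX_ROW_SIZE = 65_535        # maximum row size in bytes
UTF8_MAX_BYTES = 3           # MySQL utf8: up to 3 bytes per character

per_column = 255 * UTF8_MAX_BYTES         # CHAR(255) CHARACTER SET utf8
max_columns = MAX_ROW_SIZE // per_column  # whole columns that fit

print(per_column, max_columns)  # 765 85
```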
For clarity, what then is the maximum value I can set in the varchar argument so it only uses 1 byte to store its length?
From the MySQL documentation:
The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters.
A [VARCHAR] column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
This makes the answer to your question depend on the character encoding.
With a single-byte encoding like windows-1252 (which MySQL calls latin1), the character length is the same as the byte length, so you can use a VARCHAR(255).
With UTF-8, a VARCHAR(N) may require up to 3N bytes, as would be the case if all characters were in the range U+0800 to U+FFFF. Thus, VARCHAR(85) is the largest size that guarantees a single-byte length prefix (requiring a maximum of 255 bytes).
(Note that MySQL's utf8 character set does not support characters outside the BMP, even though the official definition of UTF-8 allows 4 bytes per character; MySQL's utf8mb4 character set does support them.)
This depends on the character set (and therefore the collation) of the VARCHAR column.
As you noted, UTF8 may use up to three bytes per character, so if you declare a UTF8 column more than 85 characters long, there is a chance that it will use more than 255 bytes to store its data, and the length must therefore be stored in a two-byte field.
If you use latin1, each character is stored in 1 byte.
So the answer is:
VARCHAR(85) COLLATE UTF8_GENERAL_CI, or
VARCHAR(255) COLLATE LATIN1_GENERAL_CI
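Both answers reduce to the same rule: the largest N such that N times the worst-case bytes per character still fits in 255 bytes. A sketch of that rule; latin1 and utf8 come from the answers above, while utf8mb4 (4 bytes per character) is added only for comparison, and the function name is hypothetical.

```python
# Worst-case bytes per character. latin1 and utf8 (utf8mb3) are from the
# answers above; utf8mb4 is included for comparison.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}

def longest_one_byte_prefix_varchar(charset: str) -> int:
    # Largest N with N * w <= 255, so the length always fits in one byte
    return 255 // MAX_BYTES_PER_CHAR[charset]

for cs in MAX_BYTES_PER_CHAR:
    print(f"{cs}: VARCHAR({longest_one_byte_prefix_varchar(cs)})")
# latin1: VARCHAR(255), utf8: VARCHAR(85), utf8mb4: VARCHAR(63)
```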
I think you're confusing string size with character representation.
For instance, you could have a character that takes 4 bytes to represent it, and put it inside a string whose maximum storage size requires only one byte to hold the length, since there are fewer than 255 characters in it.
How many characters can a SQL Server 2008 database field contain when the data type is VARCHAR(MAX)?
From http://msdn.microsoft.com/en-us/library/ms176089.aspx
varchar [ ( n | max ) ]
Variable-length, non-Unicode character data. n can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes. The storage size is the actual length of data entered + 2 bytes. The data entered can be 0 characters in length. The ISO synonyms for varchar are char varying or character varying.
1 character = 1 byte. And don't forget the 2 bytes of length overhead. So, 2^31-3 characters.
For future readers who need this answer quickly:
2^31-1 = 2 147 483 647 characters, or roughly 2.147 billion
See the MSDN reference table for maximum numbers/sizes.
Bytes per varchar(max), varbinary(max), xml, text, or image column: 2^31-1
There's a two-byte overhead for the column, so the actual data is 2^31-3 max bytes in length. Assuming you're using a single-byte character encoding, that's 2^31-3 characters total. (If you're using a character encoding that uses more than one byte per character, divide by the number of bytes per character. If you're using a variable-length character encoding, all bets are off.)
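The arithmetic in this answer can be written out as follows. This sketch assumes the answer's reading, in which the 2-byte overhead comes out of the documented 2^31-1 byte limit (another answer here treats 2^31-1 as the data limit instead), and a fixed number of bytes per character; `max_chars` is a hypothetical name.

```python
# Follows this answer's reading: the 2-byte overhead is subtracted from the
# documented 2^31-1 byte maximum. Assumes a fixed-width character encoding.
MAX_STORAGE_BYTES = 2**31 - 1  # documented maximum for varchar(max)
OVERHEAD_BYTES = 2             # per-value length overhead

max_data_bytes = MAX_STORAGE_BYTES - OVERHEAD_BYTES

def max_chars(bytes_per_char: int) -> int:
    # Whole characters that fit when each one takes bytes_per_char bytes
    return max_data_bytes // bytes_per_char

print(max_data_bytes)  # 2147483645
print(max_chars(1))    # 2147483645
print(max_chars(2))    # 1073741822
```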
There are a few gotchas worth mentioning. You may need to force the use of varchar(max):
https://dba.stackexchange.com/questions/18483/varcharmax-field-cutting-off-data-after-8000-characters
and PRINT only handles 8000 characters:
How to print VARCHAR(MAX) using Print Statement?
Reading this question raised a doubt in my mind:
char and varchar can store up to 255 chars
text can store up to 65k chars
char size in bytes is number of chars
varchar size in bytes is number of chars used + 1
So how many bytes does TEXT actually occupy? ~65KB, or the number of characters used + 1?
TEXT is a variable-length data type, with a maximum of 65,535 bytes.
LONGTEXT can be used for up to 4,294,967,295 bytes (about 4 GB).
To answer your question: it's variable-length, and it will only occupy as much space as the data you store.
TEXT occupies the actual length of your data in bytes, plus 2 bytes.
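The point that a TEXT value costs only its own byte length plus the prefix, and that the limits are in bytes rather than characters, can be sketched as below. It assumes UTF-8 data; MEDIUMTEXT (3-byte prefix, L < 2^24) is not mentioned above but is included so the "smallest type" search is complete, and both function names are hypothetical.

```python
# (length-prefix bytes, maximum data bytes) per MySQL TEXT type.
# Dict order matters: smallest type first (insertion order, Python 3.7+).
TEXT_TYPES = {
    "TINYTEXT":   (1, 2**8 - 1),
    "TEXT":       (2, 2**16 - 1),
    "MEDIUMTEXT": (3, 2**24 - 1),
    "LONGTEXT":   (4, 2**32 - 1),
}

def stored_bytes(type_name: str, value: str, encoding: str = "utf-8") -> int:
    """Actual data bytes plus the type's length prefix."""
    prefix, limit = TEXT_TYPES[type_name]
    n = len(value.encode(encoding))
    if n > limit:
        raise ValueError(f"{n} bytes exceeds the {type_name} limit of {limit}")
    return n + prefix

def smallest_text_type(value: str, encoding: str = "utf-8") -> str:
    """Smallest TEXT type whose byte limit fits the encoded value."""
    n = len(value.encode(encoding))
    for name, (_prefix, limit) in TEXT_TYPES.items():
        if n <= limit:
            return name
    raise ValueError("value too large for any TEXT type")

print(stored_bytes("TEXT", "abcd"))      # 6
print(stored_bytes("LONGTEXT", "abcd"))  # 8
print(smallest_text_type("\u00e9" * 200))  # TEXT: 200 chars but 400 bytes
```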