MySQL - TEXT vs CHAR and VARCHAR

Reading this question, a doubt popped into my head:
char and varchar can store up to 255 chars
text can store up to 65k chars
char size in bytes is number of chars
varchar size in bytes is number of chars used + 1
So how many bytes does TEXT actually occupy? ~65 KB, or the number of chars used + 1?

TEXT is a variable-length data type with a maximum of 65,535 bytes.
LONGTEXT can hold up to 4,294,967,295 bytes (roughly 4 billion).
To answer your question: it's variable length, so it only occupies as much space as the data you actually store.

TEXT occupies the actual length of your data + 2 bytes.

Related

Do char / varchar / text / longtext columns in MySQL occupy full size in filesystem even for partially filled cells?

Do varchar/text/longtext columns in MySQL occupy full length size in filesystem even for incomplete cells?
Referring to this blog article, for example: I have the following columns, and I have put the given string in each.
CHAR (72) - can store up to 72 characters - I put in the string "abcd"
VARCHAR (72) - can store up to 72 characters - I put in the string "abcd"
TINYTEXT - can store up to 255 characters - I put in the string "abcd"
TEXT - can store up to 65,535 characters - I put in the string "abcd"
LONGTEXT - can store up to 4,294,967,295 characters - I put in the string "abcd"
How much actual physical space will be occupied by each of these columns for a row?
The full size that each column is capable of storing?
Or only as much as required for storing "abcd"?
Check out http://dev.mysql.com/doc/refman/5.7/en/string-type-overview.html
Basically, all of these types except CHAR are variable length.
For instance, if you have VARCHAR(72) and you write abcd into it, you will store 5 bytes: 4 bytes for the characters (1 byte each in a single-byte character set), and a 1-byte prefix to store the length of the string (which is 4).
If the column can hold values longer than 255 bytes, then the VARCHAR prefix will be 2 bytes. So a VARCHAR(300) with a 256-character string stored in it will take up 258 bytes.
TINYTEXT always has a 1-byte prefix, because you can only store 255 bytes in it, so abcd would take 5 bytes.
TEXT has a 2-byte prefix, so abcd would be 6 bytes.
LONGTEXT has a 4-byte prefix, so abcd would be 8 bytes.
Lastly, there's the nearly useless CHAR type. A CHAR(72) will always take up 72 bytes (in a single-byte character set) no matter what you store in it. It's really only useful for super short fields where there is always the exact same number of characters in the field. For example, Y or N would be a good CHAR(1) candidate.
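The prefix rules in this answer can be sketched as a small calculator. This is only an illustration, assuming a single-byte character set such as latin1; the function name `stored_bytes` and the `PREFIX_BYTES` table are made up for this example, and the numbers cover the column data only, not any per-row overhead the storage engine adds.

```python
# Length-prefix sizes in bytes for the TEXT family, as listed in the answer.
PREFIX_BYTES = {"TINYTEXT": 1, "TEXT": 2, "LONGTEXT": 4}


def stored_bytes(col_type, value, max_chars=None):
    """Estimate bytes used by one value (assumption: 1 byte per character)."""
    data = len(value.encode("latin-1"))
    if col_type == "CHAR":
        # Fixed length: always the declared size, padded with spaces.
        return max_chars
    if col_type == "VARCHAR":
        # The prefix size depends on the declared maximum, not on the value.
        prefix = 1 if max_chars <= 255 else 2
        return data + prefix
    return data + PREFIX_BYTES[col_type]


for col, size in [("CHAR", 72), ("VARCHAR", 72), ("TINYTEXT", None),
                  ("TEXT", None), ("LONGTEXT", None)]:
    print(col, stored_bytes(col, "abcd", size))
```

With "abcd" this gives 72, 5, 5, 6 and 8 bytes respectively, matching the figures quoted in the answer.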

char or varchar

This is probably a stupid question, but I need to ask...
I've created a MySQL table to handle images called images. In it, I have an attribute that keeps the extension of the image called extension.
Most of the accepted image extensions are jpg, png, gif, bmp, jpeg, or tiff. In other words, a maximum of 4 characters long.
Now, should the attribute be declared in the MySQL table like:
extension char(4)
or
extension varchar(4)
There's probably no impact whatsoever on performance, but I do want the model to be optimized from the get-go...
Anyone?
Depends....
If you look at this from the MySQL documentation
Value       CHAR(4)  Storage Required  VARCHAR(4)  Storage Required
--------------------------------------------------------------------
''          '    '   4 bytes           ''          1 byte
'ab'        'ab  '   4 bytes           'ab'        3 bytes
'abcd'      'abcd'   4 bytes           'abcd'      5 bytes
'abcdefgh'  'abcd'   4 bytes           'abcd'      5 bytes
As you can see, 4 characters in a CHAR(4) take 4 bytes, while in a VARCHAR(4) they take 5. If the vast majority of extensions were 4 characters long, then CHAR would be more space efficient.
In your case I am guessing that 3-character extensions will be the majority, so VARCHAR is the better choice.
James :-)
Edited, I was making a wrong assumption on my previous answer. I'll just paste you an excerpt from http://dev.mysql.com/doc/refman/5.0/en/char.html (emphasis added)
The CHAR and VARCHAR types are similar, but differ in the way they are stored and retrieved. As of MySQL 5.0.3, they also differ in maximum length and in whether trailing spaces are retained.
The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters.
The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
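To make the padding and length-prefix behaviour described in this excerpt concrete, here is a rough simulation of the CHAR(4) versus VARCHAR(4) comparison. It is a sketch only: it assumes a single-byte character set and a non-strict SQL mode (where over-long values are truncated rather than rejected), and the helper names `char_store` and `varchar_store` are invented for this example.

```python
def char_store(value, n):
    """CHAR(n): truncate to n characters, right-pad with spaces, always n bytes."""
    stored = value[:n].ljust(n)
    return stored, n


def varchar_store(value, n):
    """VARCHAR(n): truncate to n characters, store the data plus a length prefix."""
    stored = value[:n]
    prefix = 1 if n <= 255 else 2  # one length byte while the column fits in 255 bytes
    return stored, len(stored) + prefix


for v in ["", "ab", "abcd", "abcdefgh"]:
    print(repr(v), char_store(v, 4), varchar_store(v, 4))
```

For a three-letter extension such as "png", CHAR(4) and VARCHAR(4) both come out at 4 bytes; VARCHAR only pulls ahead for shorter values and falls behind for four-letter ones.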

MySQL char & varchar character sets & storage sizes

Wondering how much actual storage space will be taken up by these two datatypes, as the MySQL documentation is slightly unclear on the matter.
CHAR(M): M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set
VARCHAR(M), VARBINARY(M): L + 1 bytes if column values require 0 – 255 bytes, L + 2 bytes if values may require more than 255 bytes
This seems to imply to me that, given a utf8-encoded database, a CHAR will always take up 32 bits per character, whilst a VARCHAR will take between 8 and 32 depending on the actual byte length of the characters stored. Is that correct? Or does a VARCHAR imply an 8-bit character width, and storing multi-octet UTF8 characters actually consumes multiple 'characters' from the VARCHAR? Or does the VARCHAR also always store 32 bits per character? So many possibilities.
Not something I've ever had to worry this much about before, but I'm starting to hit in-memory temp table size limits and I don't necessarily want to have to increase MySQL's available pool (for the second time).
CHAR and VARCHAR both count characters. Both of them compute the maximum storage that they might require given the character encoding and length. For ASCII, that's 1 byte per character. For MySQL's utf8 character set (utf8mb3), that's 3 bytes per character (not 4 as you'd expect, because that character set doesn't support any Unicode characters which would require 4 bytes in UTF-8; utf8mb4 does, at up to 4 bytes per character). So far, CHAR and VARCHAR are the same.
Now, CHAR just goes ahead and reserves this amount of storage.
VARCHAR instead allocates 1 or 2 bytes for the length, depending on whether this maximum storage is < 256 or ≥ 256. The actual amount of space occupied by the entry is these one or two bytes, plus the amount of space actually occupied by the string.
Interestingly, this makes 85 a magic number for UTF-8 VARCHAR:
VARCHAR(85) uses 1 byte for the length because the maximum possible length of 85 UTF-8 characters is 3 × 85 = 255.
VARCHAR(86) uses 2 bytes for the length because the maximum possible length of 86 UTF-8 characters is 3 × 86 = 258.
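The "magic number" arithmetic can be checked with a few lines. This is a minimal sketch of the rule as stated above, not MySQL's actual implementation; the function names are made up, and the 4-bytes-per-character case is an added assumption for a character set such as utf8mb4.

```python
def length_prefix_bytes(max_chars, max_bytes_per_char):
    """1 length byte if the column can never exceed 255 bytes, otherwise 2."""
    max_bytes = max_chars * max_bytes_per_char
    return 1 if max_bytes <= 255 else 2


def largest_one_byte_prefix(max_bytes_per_char):
    """Largest VARCHAR(M) that still gets away with a 1-byte length prefix."""
    return 255 // max_bytes_per_char


print(length_prefix_bytes(85, 3))  # 3 x 85 = 255 bytes at most
print(length_prefix_bytes(86, 3))  # 3 x 86 = 258 bytes at most
print(largest_one_byte_prefix(3))  # threshold at 3 bytes per character
print(largest_one_byte_prefix(4))  # threshold at 4 bytes per character (assumption)
```

Under the same rule, the threshold would be 255 for a single-byte character set and 63 for a 4-byte one.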

Difference in storage space requirement for TinyText and Text - MySQL

In the MySQL manual Data Type Storage Requirements, I found:
Data Type Storage Required
--------------------------------------------
TINYTEXT L + 1 bytes, where L < 2^8
TEXT L + 2 bytes, where L < 2^16
If I store 240 characters [utf8-general] in a TinyText field and also in a Text field, will the Text field just use 1 byte more than the TinyText?
How much space will Text take if I store 1024 letters [utf8-general]?
I think 1024+2 bytes!
Will it use the same space whether I save a single character or 2^16 characters in a Text field?
The TinyText can only store up to 255 bytes. That could be as few as 63 characters if you were so unfortunate as to have to store 63 characters that all required 4 bytes in UTF-8. On the other hand, it could store 255 characters if they are all, in fact, in the ASCII subset of UTF-8.
If you store 1024 characters, they will take between 1024 and 4096 (+2) bytes. A Unicode character encoded using UTF-8 will occupy between 1 and 4 bytes.
A single character requiring one byte (U+0000 .. U+007F) will require 3 bytes (1 for the character, 2 for the length) in a Text field. On the other hand, a single character requiring 4 bytes (say U+101001 - I'm not sure that's valid as a Unicode character, but it needs 4 bytes to store it) will require a grand total of 6 bytes to store it. In neither case is it close to 2^16 bytes.
Do learn to distinguish between bytes and characters when dealing with Unicode; it is very important.
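The byte-versus-character distinction is easy to check by encoding strings and counting bytes. The sketch below applies the L + 1 and L + 2 formulas from the manual excerpt; the function names are invented for illustration, and it assumes a character set that stores full 4-byte UTF-8 (for example utf8mb4).

```python
def utf8_len(s):
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


def tinytext_bytes(s):
    """TINYTEXT: L + 1 bytes, where L must stay below 2^8."""
    length = utf8_len(s)
    if length > 255:
        raise ValueError("value does not fit in TINYTEXT")
    return length + 1


def text_bytes(s):
    """TEXT: L + 2 bytes, where L must stay below 2^16."""
    length = utf8_len(s)
    if length > 65535:
        raise ValueError("value does not fit in TEXT")
    return length + 2


print(text_bytes("a"))             # one ASCII character
print(text_bytes("\U00101001"))    # one 4-byte character
print(text_bytes("a" * 1024))      # 1024 ASCII characters
print(tinytext_bytes("a" * 240), text_bytes("a" * 240))
```

A 64-character string of 4-byte characters needs 256 bytes, so `tinytext_bytes` rejects it even though it is well under 255 characters.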
Q1: yes
Q2: impossible to answer exactly. Each character in UTF-8 can take 1 to 4 bytes (1 to 3 in MySQL's utf8/utf8mb3 character set). So it will take between 1024+2 and 4096+2 bytes.
Q3: nope

How many characters in varchar(max)?

How many characters can a SQL Server 2008 database field contain when the data type is VARCHAR(MAX)?
From http://msdn.microsoft.com/en-us/library/ms176089.aspx
varchar [ ( n | max ) ]
Variable-length, non-Unicode character data. n can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes. The storage size is the actual length of data entered + 2 bytes. The data entered can be 0 characters in length. The ISO synonyms for varchar are char varying or character varying.
1 character = 1 byte. And don't forget the 2 bytes of overhead. So, 2^31-3 characters.
For future readers who need this answer quickly:
2^31-1 = 2 147 483 647 characters, or roughly 2.147 billion
See the MSDN reference table for maximum numbers/sizes.
Bytes per varchar(max), varbinary(max), xml, text, or image column: 2^31-1
There's a two-byte overhead for the column, so the actual data is 2^31-3 max bytes in length. Assuming you're using a single-byte character encoding, that's 2^31-3 characters total. (If you're using a character encoding that uses more than one byte per character, divide by the total number of bytes per character. If you're using a variable-length character encoding, all bets are off.)
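The arithmetic in this answer can be spelled out as follows. This is only a sketch that takes the answer's two-byte overhead at face value; the constant and function names are made up, and the fixed bytes-per-character parameter is an assumption that does not hold for variable-length encodings.

```python
MAX_STORAGE_BYTES = 2**31 - 1  # documented maximum storage size for varchar(max)
OVERHEAD_BYTES = 2             # per-column overhead, as stated in the answer


def max_chars(bytes_per_char=1):
    """Upper bound on characters for a fixed-width encoding (assumption)."""
    return (MAX_STORAGE_BYTES - OVERHEAD_BYTES) // bytes_per_char


print(MAX_STORAGE_BYTES)  # 2147483647
print(max_chars(1))       # single-byte encoding: 2^31 - 3
print(max_chars(2))       # hypothetical fixed 2-byte encoding
```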
There are a few gotchas worth mentioning. You may need to force the use of varchar(max):
https://dba.stackexchange.com/questions/18483/varcharmax-field-cutting-off-data-after-8000-characters
Also, PRINT only handles 8,000 characters; see "How to print VARCHAR(MAX) using Print Statement?"