How does MySQL varchar know how many bytes indicate the length? - mysql

The MySQL manual and several StackOverflow answers make it clear that varchar uses:
1 byte for varchars with 0-255 characters in them
2 bytes for varchars with more than 255 characters in them.
The first part makes sense. A single byte can store 256 different values, i.e. 0 through 255.
What I am trying to figure out is how MySQL knows how many bytes indicate the length.
Imagine a 255-char varchar starting with the following bytes: [255][w][o][r][d]~
According to the manual, only the first byte is used to indicate the length in this scenario. When reading the field, MySQL will somehow have to know that this is the case here, and that the second byte is not part of the length.
Now imagine a 256-char varchar starting with the following bytes: [255][1][w][o][r][d]~
Now MySQL miraculously knows that it should interpret the first two bytes as the length, when reading the field.
How does it distinguish? The only foolproof way I have come up with is to interpret only the first byte as length, then determine if the text length matches (in its current encoding), and if not, we know that the first two bytes must be the length.

It happens at the time of definition. All length prefixes will be the same size in bytes for a particular VARCHAR column. The VARCHAR column will use 2 bytes or the VARCHAR column will use 1 byte, depending on the defined size in characters, and the character set.
Every VARCHAR column defined such that its values might require more than 255 bytes uses 2 bytes to store the length. MySQL isn't going to use 1 byte for some values in a column and 2 bytes for others.
The MySQL documentation on CHAR and VARCHAR Types states this pretty clearly:
A column uses one length byte if values require no more than 255
bytes, two length bytes if values may require more than 255 bytes.
If you declare a VARCHAR(255) column to use the utf8 character set, it's still going to use 2 bytes for the length prefix, not 1, since the length in bytes may be greater than 255 with utf8 characters.
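The rule quoted above can be sketched in a few lines of Python. This is only a model of the documented rule, not MySQL's actual code: the function name and the bytes-per-character table are assumptions made for illustration (MySQL's legacy utf8 character set reserves up to 3 bytes per character, utf8mb4 up to 4).

```python
# Maximum bytes per character for a few MySQL character sets (illustrative table).
MAX_BYTES_PER_CHAR = {"latin1": 1, "ucs2": 2, "utf8": 3, "utf8mb4": 4}


def length_prefix_bytes(max_chars, charset):
    """Return the length-prefix size (1 or 2 bytes) for VARCHAR(max_chars).

    The decision depends only on the column definition, never on the
    individual value being stored.
    """
    max_bytes = max_chars * MAX_BYTES_PER_CHAR[charset]
    return 1 if max_bytes <= 255 else 2


print(length_prefix_bytes(255, "latin1"))  # 1
print(length_prefix_bytes(256, "latin1"))  # 2
print(length_prefix_bytes(255, "utf8"))    # 2
print(length_prefix_bytes(85, "utf8"))     # 1, because 85 * 3 = 255
```

Because the prefix size is a property of the column, a reader of the row never has to guess whether the second byte belongs to the length.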

Related

Difference between VARCHAR and TEXT in MySQL [duplicate]

When we create a table in MySQL with a VARCHAR column, we have to set the length for it. But for TEXT type we don't have to provide the length.
What are the differences between VARCHAR and TEXT?
TL;DR
TEXT
fixed max size of 65535 bytes (you cannot limit the max size)
takes 2 + c bytes of disk space, where c is the length in bytes of the stored string.
cannot be (fully) part of an index. One would need to specify a prefix length.
VARCHAR(M)
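variable max size of M characters
M needs to be between 0 and 65535 (the effective maximum is lower, because of the row size limit and the character set)
takes 1 + c bytes (if values never need more than 255 bytes) or 2 + c bytes (if values may need more than 255 bytes) of disk space, where c is the length in bytes of the stored string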
can be part of an index
More Details
TEXT has a fixed max size of 2¹⁶-1 = 65535 bytes.
VARCHAR has a variable max size M up to M = 2¹⁶-1.
So you cannot choose the size of TEXT but you can for a VARCHAR.
The other difference is that you cannot put an index on a whole TEXT column (except for a fulltext index); an ordinary index on a TEXT column needs a prefix length.
So if you want to index the entire column, you have to use VARCHAR. But notice that the length of an index is also limited, so if your VARCHAR column is too long you have to use only the first few characters of the VARCHAR column in your index (see the documentation for CREATE INDEX).
But you also want to use VARCHAR if you know that the maximum length of the possible input string is only M, e.g. a phone number or a name or something like this. Then you can use VARCHAR(30) instead of TINYTEXT or TEXT, and if someone tries to save the text of all three "Lord of the Rings" books in your phone number column you only store the first 30 characters (or, in strict SQL mode, the insert is rejected with an error) :)
Edit: If the text you want to store in the database is longer than 65535 bytes, you have to choose MEDIUMTEXT or LONGTEXT, but be careful: MEDIUMTEXT stores strings up to 16 MB, LONGTEXT up to 4 GB. If you use LONGTEXT and get the data via PHP (at least if you use mysqli without store_result), you may get a memory allocation error, because PHP tries to allocate 4 GB of memory to be sure the whole string can be buffered. This may also happen in languages other than PHP.
However, you should always check the input (Is it too long? Does it contain strange code?) before storing it in the database.
Notice: For both types, the required disk space depends only on the length of the stored string and not on the maximum length.
E.g. if you use the charset latin1 and store the text "Test" in VARCHAR(30), VARCHAR(100) and TINYTEXT, it always requires 5 bytes (1 byte to store the length of the string and 1 byte for each character). If you store the same text in a VARCHAR(2000) or a TEXT column, it would also require the same space, but, in this case, it would be 6 bytes (2 bytes to store the string length and 1 byte for each character).
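The byte counts in this example can be reproduced with a small Python sketch. It assumes a single-byte character set (latin1) and models only the documented "length prefix plus data" rule; the function names and the TEXT prefix table are illustrative, not part of any MySQL API.

```python
# Length-prefix sizes of the TEXT family, in bytes.
TEXT_PREFIX = {"TINYTEXT": 1, "TEXT": 2, "MEDIUMTEXT": 3, "LONGTEXT": 4}


def varchar_bytes(value, max_chars):
    """Storage for value in VARCHAR(max_chars), assuming latin1 (1 byte/char)."""
    prefix = 1 if max_chars <= 255 else 2
    return prefix + len(value.encode("latin-1"))


def text_bytes(value, text_type):
    """Storage for value in a TEXT-family column, assuming latin1."""
    return TEXT_PREFIX[text_type] + len(value.encode("latin-1"))


print(varchar_bytes("Test", 30))       # 5
print(varchar_bytes("Test", 100))      # 5
print(text_bytes("Test", "TINYTEXT"))  # 5
print(varchar_bytes("Test", 2000))     # 6
print(text_bytes("Test", "TEXT"))      # 6
```

The declared maximum only changes the prefix size; the data part always costs exactly as many bytes as the stored string.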
For more information have a look at the documentation.
Finally, I want to add that both TEXT and VARCHAR are variable-length data types, so they most likely minimize the space you need to store the data. But this comes with a trade-off for performance. If you need better performance, you have to use a fixed-length type like CHAR. You can read more about this here.
There is an important detail that has been omitted in the answer above.
MySQL imposes a limit of 65,535 bytes for the max size of each row.
The size of a VARCHAR column is counted towards the maximum row size, while TEXT columns are assumed to be storing their data by reference so they only need 9-12 bytes. That means even if the "theoretical" max size of your VARCHAR field is 65,535 characters you won't be able to achieve that if you have more than one column in your table.
Also note that the actual number of bytes required by a VARCHAR field is dependent on the encoding of the column (and the content). MySQL counts the maximum possible bytes used toward the max row size, so if you use a multibyte encoding like utf8mb4 (which you almost certainly should) it will use up even more of your maximum row size.
Correction: Regardless of how MySQL computes the max row size, whether or not the VARCHAR/TEXT field data is ACTUALLY stored in the row or stored by reference depends on your underlying storage engine. For InnoDB the row format affects this behavior. (Thanks Bill-Karwin)
Reasons to use TEXT:
If you want to store a paragraph or more of text
If you don't need to index the column
If you have reached the row size limit for your table
Reasons to use VARCHAR:
If you want to store a few words or a sentence
If you want to index the (entire) column
If you want to use the column with foreign-key constraints

How to constrain varchar in mysql 5.1?

I need to create column in mysql 5.1 that can store user's feedback.
It shouldn't be too long, so I think not more than 1000 characters of UTF-8.
The question is how to represent this efficiently in mysql 5.1.
For now I have:
`description` varchar NOT NULL,
But how to constrain varchar to hold at most 1000 characters of UTF-8?
From the documentation:
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
This means that you can store up to 65,535 bytes in a VARCHAR column. However, from the String Type Overview:
MySQL interprets length specifications in character column definitions in character units. (Before MySQL 4.1, column lengths were interpreted in bytes.) This applies to CHAR, VARCHAR, and the TEXT types.
So, declare your table with a UTF-8 character set, set the length of the varchar to 1,000 characters, and MySQL will do the work for you behind the scenes.
Since the size is apparently defined in bytes, ...
-correction- Field size is defined in 'character units'. It's a bit unclear what they mean by that, but I guess they mean 'code units'.
Removed the rest of the detailed explanation, since it wasn't (entirely) true.
Correction: in MySQL you actually define the number of characters in the field. It is still limited to the 65535-byte boundary, though. On top of that, MySQL reserves 3 bytes per character for utf8, which means that you cannot have utf8 fields of more than 21844 characters, and declaring a field as VARCHAR(21900) will fail for that reason: "Column length too big for column 'field1' (max = 21845); use BLOB or TEXT instead". The number in this message is wrong, by the way. The actual maximum size is 21844. 21845 is 1/3 of 65535, but I guess you need to subtract the two bytes for the field size header as well.
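The arithmetic behind those two numbers can be checked with a short sketch. It assumes a table with a single NOT NULL VARCHAR column and ignores any other per-row overhead; the constant and function names are made up for illustration.

```python
ROW_LIMIT = 65535      # maximum row size in bytes
LENGTH_PREFIX = 2      # a column this large always needs a 2-byte length prefix


def max_varchar_chars(max_bytes_per_char):
    """Largest M for which VARCHAR(M) plus its length prefix fits in one row."""
    return (ROW_LIMIT - LENGTH_PREFIX) // max_bytes_per_char


print(ROW_LIMIT // 3)          # 21845, the number in the error message
print(max_varchar_chars(3))    # 21844, the usable maximum for 3-byte utf8
print(max_varchar_chars(1))    # 65533, for a single-byte charset such as latin1
```

Under these assumptions the guess in the answer holds: the error message divides the raw row limit by 3, while the usable maximum also has to leave room for the two length bytes.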
The limit of 3 bytes is weird, though. Unicode is designed to be able to expand with extra characters. There are already supplementary characters of 4 bytes that cannot be stored in a utf8 varchar(1) field, or any utf8 varchar field for that matter, since MySQL's utf8 character set doesn't accept those characters: "Incorrect string value: '\xF0\xA0\x9C\x8E' for column 'field1' at row 1". So I guess you would need an actual binary/blob column to be able to store these characters (or, on MySQL 5.5.3 and later, a column that uses the utf8mb4 character set).
I think the documentation about this subject is pretty poor, but I've tried some things and came to this conclusion. You can see the fiddle here: http://sqlfiddle.com/#!2/4d938
To the question:
So for your specific situation, declaring the field as varchar(1000) will do the trick, presuming you don't want people to use the supplementary characters in their feedback.
Some things to consider though:
I think a 'feedback' field of 1000 characters is pretty small. For many folks this will be enough, but if you have to say more, it is annoying if you can't. So I would make the field bigger.
varchar fields are stored in the record and consume a part of the maximum row size of 65535 bytes. This is an important fact. You cannot have two utf8 varchar(20000) fields in a row, because together they would be larger than this maximum row size.
A better alternative for large text fields would therefore be to make them TEXT or MEDIUMTEXT, which can be even larger and are stored in a different way.
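To see why two varchar(20000) fields don't fit, here is a rough sketch of the row-size accounting. It counts the maximum possible bytes of each VARCHAR plus its length prefix, as MySQL does when checking the limit, and ignores NULL flags and any non-VARCHAR columns; the function names and the (max_chars, max_bytes_per_char) tuple layout are assumptions for illustration.

```python
ROW_LIMIT = 65535  # maximum row size in bytes


def varchar_row_bytes(max_chars, max_bytes_per_char):
    """Worst-case bytes one VARCHAR column contributes to the row size."""
    max_bytes = max_chars * max_bytes_per_char
    prefix = 1 if max_bytes <= 255 else 2
    return max_bytes + prefix


def fits_in_row(columns):
    """columns is a list of (max_chars, max_bytes_per_char) pairs."""
    return sum(varchar_row_bytes(m, b) for m, b in columns) <= ROW_LIMIT


print(fits_in_row([(20000, 3), (20000, 3)]))  # False: 2 * 60002 = 120004 bytes
print(fits_in_row([(20000, 3)]))              # True: 60002 bytes
print(fits_in_row([(1000, 3)]))               # True: 3002 bytes
```

A varchar(1000) in 3-byte utf8 uses only about 3 KB of the row budget, so it is the combination of several large columns that runs into the limit.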
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions.
http://dev.mysql.com/doc/refman/5.0/en/char.html

MySQL: VARCHAR(1024) vs VARCHAR(512)

In MySQL what is the difference between VARCHAR(1024) and VARCHAR(512)? If my item will never be more than 512 characters, what do I lose by using VARCHAR(1024)?
Don't know where you got that from, but it's not possible to create a table with varchar without specifying the length. It results in a syntax error. So your question is obsolete.
UPDATE:
Nothing. Varchar is, as the name implies, a datatype of variable length, up to the maximum length you specified when creating the table. This means that, in a varchar column, each row uses one or two additional bytes to store how long the string in that row actually is.
So the difference between varchar(1024) and varchar(512) is that your data gets truncated (or, in strict SQL mode, rejected) when you try to insert more than 1024 or 512 characters. Note: characters, not bytes. How many bytes each character uses depends on the character set you're using.
There is actually a difference, and it can have a big performance impact if you manipulate big data. If a temporary table is used, the records on disk will take the full declared length instead of the variable length. A high value will slow down the request even more in that case. Temporary tables can occur for various reasons (such as memory being full, or some combinations of GROUP BY / ORDER BY).
In VARCHAR(1024), 1024 is the length.
The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters.
According to the MySQL documentation:
In contrast to CHAR, VARCHAR values are stored as a 1-byte or 2-byte
length prefix plus data. The length prefix indicates the number of
bytes in the value. A column uses one length byte if values require no
more than 255 bytes, two length bytes if values may require more than
255 bytes.
A deeper analysis of the performance impact of larger VARCHARs can be found here.

Memory usage of storing strings as varchar in MySQL

I've begun to get very interested in the memory usage of MySQL. So I'm looking at this here:
http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html
I get very excited about the prospect of saving memory by (for example) needing only a signed smallint where I was using an unsigned int in many places. Then I read about varchars...
"VARCHAR(M) - Length + 1 bytes if column values require 0 – 255 bytes"
What?! Now it appears to me as though storing a single varchar would use up so much memory, that I may as well not even get excited with my int vs. smallint because it's vastly overshadowed by the varchar field. So I come here asking if this is true, because it simply can't be? Are varchars really that terrible? Or should I really not be getting excited at all for my smallint discovery?
edit: Sorry! I should've been more clear. So, let's say I store a varchar with 7 characters, meaning 8 bytes. That means, then, that it uses the same as a number stored in a BIGINT column? That's what I'm concerned about.
What this is saying is that for a given string length, the amount of storage used is equal to the length of the string in bytes, plus one byte to tell MySQL how long the string is.
So for instance, the word "automobile" is 10 bytes (1 for each character), so if it is stored in a varchar column it will take up 11 bytes: 1 for the number 10, and 1 for each of the characters in the string.
From the link you posted:
http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html
The storage requirements depend on these factors:
-The actual length of the column value
-The column's maximum possible length
-The character set used for the column, because some character sets contain multi-byte characters
For example, a VARCHAR(255) column can hold a string with a maximum length of 255 characters. Assuming that the column uses the latin1 character set (one byte per character), the actual storage required is the length of the string (L), plus one byte to record the length of the string. For the string 'abcd', L is 4 and the storage requirement is five bytes. If the same column is instead declared to use the ucs2 double-byte character set, the storage requirement is 10 bytes: The length of 'abcd' is eight bytes and the column requires two bytes to store lengths because the maximum length is greater than 255 (up to 510 bytes).
While I am no MySQL DBA, it appears there is a very simple answer to this question, and no need to go deeper into storage sizes, because it is NOT configurable.
Per MySQL memory storage documentation,
MEMORY tables use a fixed-length row-storage format. Variable-length types such as VARCHAR are stored using a fixed length.
Thus, you won't have any specific gains by using VARCHAR for a table using the MEMORY storage engine, no matter how VARCHAR is stored on other storage engines such as MyISAM or InnoDB.

MySQL VARCHAR size limit

If I have a column in table with field of type VARCHAR(15) and if I try to insert data of length 16, MySQL gives an error stating
Data too long for column 'testname' at row 1
Does anyone know why VARCHAR fields in MySQL take fixed length? Also how many bytes does a VARCHAR field take per record based on the size given?
From the MySQL 5.0 Manual:
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
I only use VARCHAR when I'm certain that the data the column needs to hold will never exceed a certain length, and even then I'm cautious. If I'm storing a text string I tend to use one of the TEXT types.
Check out the MySQL Storage Requirements for more information on how the bytes are used.
If you set a column to be varchar(15), the maximum number of characters allowed is 15. Thus you can't pass it more than 15 characters without modifying the column to support more than 15. If you store a 4 character string it should only use around 4 bytes out of a possible 15, whereas if you used char(15) it would have filled in the other 11 with trailing spaces.
http://dev.mysql.com/doc/refman/5.0/en/char.html
(My byte calculation was probably off, since it's always -1/+1 or something like that.)
Small extra local note. The number of bytes used will depend on the encoding scheme in use. 1 byte per character in latin1 encoding, but up to 3 in UTF8. See link in mlambie's answer for details.
If you look here it should tell you everything about varchar you want to know:
http://dev.mysql.com/doc/refman/5.0/en/char.html
Basically, depending on the length you chose, it will use 1 or 2 bytes to track the length of the current string in that column, so it will store the number of bytes for the data you put in, plus one or two bytes.
So, if you put in 'abc' then 4 or 5 bytes will be used for that column in that row.
If you used char(15) then even 'abc' would take up 15 bytes (in a single-byte character set), as the data is right-padded to use up the full 15 characters.
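The CHAR versus VARCHAR comparison above can be sketched as follows. This assumes a single-byte character set such as latin1 and models only the documented behavior (CHAR right-padded with spaces to the declared length, VARCHAR stored as a length prefix plus data); the function names are hypothetical, and storage engines may handle multi-byte character sets differently.

```python
def char_bytes(value, declared_len):
    """CHAR(declared_len): the value is right-padded with spaces to full length."""
    padded = value.ljust(declared_len)
    return len(padded.encode("latin-1"))


def varchar_bytes(value, declared_len):
    """VARCHAR(declared_len): 1- or 2-byte length prefix plus the actual data."""
    prefix = 1 if declared_len <= 255 else 2  # single-byte charset assumed
    return prefix + len(value.encode("latin-1"))


print(char_bytes("abc", 15))      # 15
print(varchar_bytes("abc", 15))   # 4
print(varchar_bytes("abc", 300))  # 5
```

So the "4 or 5 bytes" in the answer corresponds to a column whose values fit in 255 bytes versus one whose values may not.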