Index size on VARCHAR column - mysql

When trying to index a VARCHAR(256) field, I get the following error:
MySQL said: Index column size too large. The maximum column size is 767 bytes.
It works using VARCHAR(255), but I'm curious why VARCHAR seems to reserve 3 bytes for each character in a field. If it's using a variable-width encoding, does it just assume the worst case for every letter, so three bytes for UTF-8?
Additionally, if I'm only using English, would the best way to get a size larger than 255 be to use the ascii encoding on the column instead?

Each CHARACTER SET has a maximum length, in bytes. For latin1 and ascii, it is 1; that is, each character takes one byte.
For utf8 it may take 3 bytes, hence the factor of 3. 3*255+2 = 767. The "2" is for the length.
utf8mb4 covers all of the currently defined Unicode characters, and takes up to 4 bytes per character.
I say "up to" because English takes only 1 byte per char; European languages take 1 or 2 bytes. Chinese and Emojis take 3 or 4.
The implementation of indexing needs to reserve space for the largest possible number of bytes for the column, and there is (was) a limit of 767 bytes. Newer versions raised the limit to 3072 bytes (for InnoDB tables that use the DYNAMIC or COMPRESSED row format).
Meanwhile, do not arbitrarily use VARCHAR(256) or even VARCHAR(255); pick a limit that is reasonable for the data.
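The arithmetic behind these limits can be sketched in a few lines. This is a simplified model, not MySQL's code: it assumes the index limit is compared against declared characters times the worst-case bytes per character, and `max_indexable_chars` is a hypothetical helper name. Under that assumption it reproduces the familiar figures of 255 for utf8 and 191 for utf8mb4 at 767 bytes.

```python
# Worst-case bytes per character that MySQL reserves for each character set.
# Note: MySQL's "utf8" is the 3-byte variant (utf8mb3), not full UTF-8.
MAX_BYTES_PER_CHAR = {"ascii": 1, "latin1": 1, "utf8": 3, "utf8mb4": 4}


def max_indexable_chars(charset: str, index_limit_bytes: int = 767) -> int:
    """Largest n such that a full VARCHAR(n) index fits the key limit at worst case."""
    return index_limit_bytes // MAX_BYTES_PER_CHAR[charset]


for charset in ("ascii", "utf8", "utf8mb4"):
    # Compare the old 767-byte limit with the newer 3072-byte limit.
    print(charset, max_indexable_chars(charset), max_indexable_chars(charset, 3072))
```

It also answers the ascii follow-up: with a 1-byte character set, the same 767-byte limit allows indexing up to 767 characters.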

Related

Mysql 5 - how many characters can be stored with utf8mb4_unicode_ci?

Assuming I have a varchar(250) and a TEXT field, how many characters can be stored in the first field and in the second field if their collation is utf8mb4_unicode_ci, assuming every character has the maximum size (4 bytes, I think)?
varchar(250) can store up to 250 characters, since the length restriction is applied to the number of characters, not bytes. The overall row-length restriction is applied at the byte level; however, 250 * 4 is only 1,000 bytes, which is far below the limit of 65,535.
TEXT columns can hold up to 2^16 - 1 = 65,535 bytes of data. Since you assumed that each character takes the maximum 4 bytes, a simple division gives the maximum number of characters: 65,535 / 4, rounded down, is 16,383.

How does MySQL varchar know how many bytes indicate the length?

The MySQL manual and several StackOverflow answers make it clear that varchar uses:
1 byte for varchars with 0-255 characters in them
2 bytes for varchars with more than 255 characters in them.
The first part makes sense. A single byte can store 256 different values, i.e. 0 through 255.
What I am trying to figure out is how MySQL knows how many bytes indicate the length.
Imagine a 255-char varchar starting with the following bytes: [255][w][o][r][d]~
According to the manual, only the first byte is used to indicate the length in this scenario. When reading the field, MySQL will somehow have to know that this is the case here, and that the second byte is not part of the length.
Now imagine a 256-char varchar starting with the following bytes: [255][1][w][o][r][d]~
Now MySQL miraculously knows that it should interpret the first two bytes as the length, when reading the field.
How does it distinguish? The only foolproof way I have come up with is to interpret only the first byte as length, then determine if the text length matches (in its current encoding), and if not, we know that the first two bytes must be the length.
It happens at the time of definition. All length prefixes will be the same size in bytes for a particular VARCHAR column. The VARCHAR column will use 2 bytes or the VARCHAR column will use 1 byte, depending on the defined size in characters, and the character set.
All VARCHAR columns defined such that their values might require more than 255 bytes use 2 bytes to store the size. MySQL isn't going to use 1 byte for some values in a column and 2 bytes for others.
MySQL documentation on CHAR and VARCHAR Types states this pretty clearly (emphasis mine):
A column uses one length byte if values require no more than 255
bytes, two length bytes if values may require more than 255 bytes.
If you declare a VARCHAR(255) column to use the utf8 character set, it's still going to use 2 bytes for the length prefix, not 1, since the length in bytes may be greater than 255 with utf8 characters.
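The rule quoted above depends only on the column definition, never on the stored value, which is why there is nothing to "detect" when reading. A minimal sketch of that decision, assuming worst-case bytes per character for each character set (`length_prefix_bytes` is a hypothetical helper, not a MySQL function):

```python
# Worst-case bytes per character (MySQL's "utf8" is the 3-byte utf8mb3).
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def length_prefix_bytes(declared_chars: int, charset: str) -> int:
    """Length-prefix size, fixed once when the VARCHAR column is defined."""
    max_value_bytes = declared_chars * MAX_BYTES_PER_CHAR[charset]
    # One byte can count 0..255; a value that might be longer needs two bytes.
    return 1 if max_value_bytes <= 255 else 2


print(length_prefix_bytes(255, "latin1"))  # 1
print(length_prefix_bytes(255, "utf8"))    # 2
print(length_prefix_bytes(85, "utf8"))     # 1, because 85 * 3 = 255
```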

How to constrain varchar in mysql 5.1?

I need to create column in mysql 5.1 that can store user's feedback.
It shouldn't be too long, so I think no more than 1000 characters of UTF-8.
The question is how to represent this efficiently in mysql 5.1.
For now I have:
`description` varchar NOT NULL,
But how to constrain varchar to hold at most 1000 characters of UTF-8?
From the documentation:
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
This means that you can store up to 65,535 bytes in a VARCHAR column. However, from the String Type Overview:
MySQL interprets length specifications in character column definitions in character units. (Before MySQL 4.1, column lengths were interpreted in bytes.) This applies to CHAR, VARCHAR, and the TEXT types.
So, declare your table with a UTF8 collation and set the length of the varchar to 1,000 characters and MySQL will do the work for you behind the scenes.
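To see why counting in characters matters, compare a string's character count with its UTF-8 byte count. This is a small illustration, not MySQL code: `fits_varchar` and the sample text are hypothetical, and it assumes Python's `len()` (which counts code points) matches how MySQL counts characters for utf8.

```python
def fits_varchar(value: str, declared_chars: int) -> bool:
    """VARCHAR(n) limits characters, not encoded bytes."""
    return len(value) <= declared_chars


feedback = "Tr\u00e8s bien!"  # hypothetical feedback text containing an accented letter
print(len(feedback), len(feedback.encode("utf-8")))  # 10 characters, 11 bytes
print(fits_varchar(feedback, 1000))  # True
```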
Field size is defined in 'character units', that is, in characters rather than bytes.
In MySQL you actually define the number of characters in the field. It is still limited to the 65,535-byte boundary, though. In addition, MySQL reserves 3 bytes per character for UTF-8, which means that you cannot have UTF-8 fields of more than 21844 characters, and declaring a field as VARCHAR(21900) will fail for that reason: "Column length too big for column 'field1' (max = 21845); use BLOB or TEXT instead". The number in this message is misleading: 21845 is 65535 / 3, but the two bytes of the length prefix also have to fit, so the actual maximum is 21844.
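The 21844 figure follows from subtracting the length prefix before dividing. A minimal sketch, assuming a single NOT NULL VARCHAR column alone in the row and worst-case bytes per character (`max_varchar_chars` is a hypothetical helper name):

```python
ROW_LIMIT_BYTES = 65_535
LENGTH_PREFIX_BYTES = 2  # a VARCHAR this large always needs a 2-byte length prefix


def max_varchar_chars(bytes_per_char: int) -> int:
    """Largest VARCHAR(n) that fits in a row by itself (NOT NULL, no other columns)."""
    return (ROW_LIMIT_BYTES - LENGTH_PREFIX_BYTES) // bytes_per_char


print(max_varchar_chars(3))  # utf8:    21844
print(max_varchar_chars(4))  # utf8mb4: 16383
```

Any other columns in the table eat into the same 65,535 bytes, so real tables usually allow less.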
The limit of 3 bytes is odd, though. Unicode is designed to expand with extra characters, and there are already supplementary characters of 4 bytes that cannot be stored in a UTF-8 varchar(1) field, or any utf8 varchar field for that matter, since MySQL's utf8 cannot handle them: "Incorrect string value: '\xF0\xA0\x9C\x8E' for column 'field1' at row 1". So in MySQL 5.1 you would need an actual binary/BLOB column to store these characters. (MySQL 5.5.3 and later add the utf8mb4 character set, which can store them.)
I think the documentation on this subject is pretty poor, but I've tried some things and come to this conclusion. You can see the fiddle here: http://sqlfiddle.com/#!2/4d938
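The bytes in that error message are the UTF-8 encoding of U+2070E, a character outside the Basic Multilingual Plane. Every code point at U+10000 or above takes 4 bytes in UTF-8, which is more than MySQL's 3-byte utf8 allows. A small sketch for spotting such input before it reaches the database (`needs_utf8mb4` is a hypothetical helper):

```python
def needs_utf8mb4(value: str) -> bool:
    """True if any character is at U+10000 or above (4 bytes in UTF-8)."""
    return any(ord(ch) > 0xFFFF for ch in value)


ch = "\U0002070E"  # the character from the error message
print(ch.encode("utf-8"))          # b'\xf0\xa0\x9c\x8e'
print(needs_utf8mb4(ch))           # True
print(needs_utf8mb4("h\u00e9llo"))  # False: every character fits in 3 bytes or fewer
```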
To the question:
So for your specific situation, declaring the field as varchar(1000) will do the trick, presuming you don't want people to use the supplementary characters in their feedback.
Some things to consider though:
I think a 'feedback' field of 1000 characters is pretty small. For many folks this will be enough, but if you have to say more, it is annoying if you can't. So I would make the field bigger.
varchar fields are stored in the record and consume part of the maximum row size of 65,535 bytes. This is an important fact: you cannot have two utf8 varchar(20000) fields in a row, because together they would be larger than this maximum row size.
A better alternative for large text fields would therefore be to make them TEXT or MEDIUMTEXT, which can be even larger and are stored in a different way.
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions.
http://dev.mysql.com/doc/refman/5.0/en/char.html

What is the MySQL VARCHAR max size?

I would like to know what the max size is for a MySQL VARCHAR type.
I read that the max size is limited by the row size which is about 65k. I tried setting the field to varchar(20000) but it says that that's too large.
I could set it to varchar(10000). What is the exact max I can set it to?
Keep in mind that MySQL has a maximum row size limit
The internal representation of a MySQL table has a maximum row size limit of 65,535 bytes, not counting BLOB and TEXT types. BLOB and TEXT columns only contribute 9 to 12 bytes toward the row size limit because their contents are stored separately from the rest of the row. Read more about Limits on Table Column Count and Row Size.
The maximum size a single column can occupy is different before and after MySQL 5.0.3:
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
However, note that the limit is lower if you use a multi-byte character set like utf8 or utf8mb4.
Use TEXT types in order to overcome the row size limit.
The four TEXT types are TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT. These correspond to the four BLOB types and have the same maximum lengths and storage requirements.
More details on BLOB and TEXT Types
Ref for MySQLv8.0 https://dev.mysql.com/doc/refman/8.0/en/blob.html
Ref for MySQLv5.7 https://dev.mysql.com/doc/refman/5.7/en/blob.html
Ref for MySQLv5.6 https://dev.mysql.com/doc/refman/5.6/en/blob.html
Even more
Check out Data Type Storage Requirements, which covers the storage requirements for all data types.
As per the online docs, there is a 64K row limit and you can work out the row size by using:
row length = 1
+ (sum of column lengths)
+ (number of NULL columns + delete_flag + 7)/8
+ (number of variable-length columns)
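The formula above can be transcribed directly, using integer division for the "/8" term. This is a hedged sketch: `row_length` is a hypothetical helper, `delete_flag` is assumed to be 1, and the column lengths you pass in are assumed to be worst-case byte sizes (for example, 10 * 3 for CHAR(10) CHARACTER SET utf8).

```python
def row_length(column_lengths: list[int], null_columns: int,
               variable_length_columns: int, delete_flag: int = 1) -> int:
    """Row length per the formula quoted above; '/8' is integer division."""
    return (1
            + sum(column_lengths)
            + (null_columns + delete_flag + 7) // 8
            + variable_length_columns)


# Hypothetical table: an INT (4 bytes) and a CHAR(10) CHARACTER SET utf8
# (10 * 3 bytes), both NOT NULL, with no variable-length columns.
print(row_length([4, 10 * 3], null_columns=0, variable_length_columns=0))  # 36
```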
You need to keep in mind that the column lengths aren't a one-to-one mapping of their size. For example, CHAR(10) CHARACTER SET utf8 requires three bytes for each of the ten characters since that particular encoding has to account for the three-bytes-per-character property of utf8 (that's MySQL's utf8 encoding rather than "real" UTF-8, which can have up to four bytes).
But if your row size is approaching 64K, you may want to examine the schema of your database. It's a rare table that needs to be that wide in a properly set up (3NF) database; it's possible, just not very common.
If you want to use more than that, you can use the BLOB or TEXT types. These do not count against the 64K limit of the row (other than a small administrative footprint) but you need to be aware of other problems that come from their use, such as not being able to sort using the entire text block beyond a certain number of characters (though this can be configured upwards), forcing temporary tables to be on disk rather than in memory, or having to configure client and server comms buffers to handle the sizes efficiently.
The sizes allowed are:
TINYTEXT 255 (+1 byte overhead)
TEXT 64K - 1 (+2 bytes overhead)
MEDIUMTEXT 16M - 1 (+3 bytes overhead)
LONGTEXT 4G - 1 (+4 bytes overhead)
You still have the byte/character mismatch (so that a MEDIUMTEXT utf8 column can store "only" about 5.6 million characters, (16M-1)/3 = 5,592,405), but it still greatly expands your range.
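The same division works for every TEXT type. A sketch assuming worst-case bytes per character (3 for MySQL's utf8, 4 for utf8mb4); `guaranteed_chars` is a hypothetical helper that gives the number of characters that always fit, even if every one needs the maximum bytes:

```python
# Maximum byte length of each TEXT type (2^8, 2^16, 2^24, 2^32, minus 1).
TEXT_MAX_BYTES = {
    "TINYTEXT": 2**8 - 1,
    "TEXT": 2**16 - 1,
    "MEDIUMTEXT": 2**24 - 1,
    "LONGTEXT": 2**32 - 1,
}


def guaranteed_chars(text_type: str, bytes_per_char: int) -> int:
    """Characters that always fit, even if each uses the maximum bytes."""
    return TEXT_MAX_BYTES[text_type] // bytes_per_char


for t in TEXT_MAX_BYTES:
    print(f"{t:<10} utf8: {guaranteed_chars(t, 3):>13,}  utf8mb4: {guaranteed_chars(t, 4):>13,}")
```

Text that is mostly single-byte (such as English) fits far more characters than these worst-case figures.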
Source
The max length of a varchar is subject to the max row size in MySQL, which is 64KB (not counting BLOBs):
VARCHAR(65535)
However, note that the limit is lower if you use a multi-byte character set:
VARCHAR(21844) CHARACTER SET utf8
From the MySQL documentation:
The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used. For example, utf8 characters can require up to three bytes per character, so a VARCHAR column that uses the utf8 character set can be declared to be a maximum of 21,844 characters.
Limits for VARCHAR vary depending on the charset used. ASCII uses 1 byte per character, meaning you could store close to 65,535 characters. utf8 uses up to 3 bytes per character, resulting in a character limit of 21,844. But if you are using the modern multibyte charset utf8mb4, which you should use because it supports emoji and other special characters, it will use up to 4 bytes per character. This limits the number of characters per column to 16,383. Note that other fields, such as INT, also count toward these limits.
Conclusion:
utf8 maximum of 21,844 characters
utf8mb4 maximum of 16,383 characters
you can also use MEDIUMBLOB/LONGBLOB or MEDIUMTEXT/LONGTEXT
A BLOB column in MySQL can store up to 65,535 bytes; if you try to store more than that, MySQL truncates the data (with a warning) or, in strict SQL mode, rejects it with an error. MEDIUMBLOB can store up to 16,777,215 bytes, and LONGBLOB up to 4,294,967,295 bytes.
Before MySQL 5.0.3, the VARCHAR data type could store 255 characters, but from 5.0.3 it can store 65,535 characters.
BUT it is limited by the maximum row size of 65,535 bytes. This means that all columns together must not exceed 65,535 bytes.
In your case, when you try to set it to more than 10000, the row probably exceeds 65,535 bytes (for example, with a multi-byte character set), so MySQL gives the error.
For more information: https://dev.mysql.com/doc/refman/5.0/en/column-count-limit.html
blog with example: http://sforsuresh.in/mysql_varchar_max_length/
In my case, I tried 20,000 according to @Firze's answer (with the UTF8 limit), and phpMyAdmin responded with the maximum size; it suggested decreasing the size or choosing BLOB instead.
So I think, in the end, it's best to test it yourself with your MySQL version and storage engine, since MySQL and phpMyAdmin have safeguards.
You can use a TEXT type, whose contents do not count against the 64KB row size limit (TEXT itself holds up to 64KB; MEDIUMTEXT and LONGTEXT hold more).

MySQL VARCHAR size limit

If I have a column in a table of type VARCHAR(15) and I try to insert data of length 16, MySQL gives an error stating
Data too long for column 'testname' at row 1
Does anyone know why VARCHAR fields in MySQL have a fixed length? Also, how many bytes does a VARCHAR field take per record, based on the size given?
From the MySQL 5.0 Manual:
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
I only use VARCHAR when I'm certain that the data the column needs to hold will never exceed a certain length, and even then I'm cautious. If I'm storing a text string I tend to use one of the TEXT types.
Check out the MySQL Storage Requirements for more information on how the bytes are used.
If you set a column to be varchar(15), the maximum number of characters allowed is 15. Thus you can't pass it more than 15 characters without modifying the column to support more. If you store a 4-character string in a single-byte character set, it uses only 4 bytes for the data (plus a 1-byte length prefix), whereas if you used char(15), it would have padded the other 11 bytes with spaces.
http://dev.mysql.com/doc/refman/5.0/en/char.html
(The exact byte count also depends on the 1- or 2-byte length prefix.)
A small extra note. The number of bytes used depends on the encoding in use: 1 byte per character in latin1, but up to 3 in UTF8. See the link in mlambie's answer for details.
If you look here it should tell you everything about varchar you want to know:
http://dev.mysql.com/doc/refman/5.0/en/char.html
Basically, depending on the length you choose, it will use one or two bytes to track the length of the current string in that column, so it stores the number of bytes for the data you put in, plus one or two bytes.
So, if you put in 'abc' then it will be 4 or 5 bytes used for that column in that row.
If you used char(15), then even 'abc' would take up 15 bytes, as the data is right-padded with spaces to use up the full 15 bytes.
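The 'abc' example can be checked with a short sketch. It assumes a single-byte character set (latin1), where declared characters equal maximum bytes; `varchar_bytes` and `char_bytes` are hypothetical helpers that model the rules described above, not MySQL functions.

```python
def varchar_bytes(value: str, declared_chars: int) -> int:
    """latin1 VARCHAR(n): actual data bytes plus a 1- or 2-byte length prefix."""
    data = len(value.encode("latin-1"))
    prefix = 1 if declared_chars <= 255 else 2  # latin1: 1 byte per character
    return data + prefix


def char_bytes(value: str, declared_chars: int) -> int:
    """latin1 CHAR(n): the value is right-padded with spaces to n bytes."""
    return len(value.ljust(declared_chars).encode("latin-1"))


print(varchar_bytes("abc", 15))   # 4
print(varchar_bytes("abc", 300))  # 5: a longer column needs a 2-byte prefix
print(char_bytes("abc", 15))      # 15
```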