This question already has answers here:
VARCHAR vs TEXT in MySQL
(3 answers)
Closed 3 years ago.
When we create a table in MySQL with a VARCHAR column, we have to set the length for it. But for the TEXT type we don't have to provide a length.
What are the differences between VARCHAR and TEXT?
TL;DR
TEXT
fixed max size of 65535 bytes (you cannot limit the max size)
takes 2 + c bytes of disk space, where c is the length of the stored string in bytes.
cannot be (fully) part of an index. One would need to specify a prefix length.
VARCHAR(M)
variable max size of M characters
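M needs to be between 0 and 65535 (the effective maximum is lower, because it is subject to the 65,535-byte row-size limit and the character set)
takes 1 + c bytes (if M × w ≤ 255) or 2 + c bytes (if M × w > 255) of disk space, where c is the length of the stored string in bytes and w is the maximum number of bytes per character in the column's character set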
can be part of an index
More Details
TEXT has a fixed max size of 2¹⁶-1 = 65535 bytes.
VARCHAR has a variable max size M up to M = 2¹⁶-1.
So you cannot choose the size of TEXT but you can for a VARCHAR.
The other difference is that you cannot put an index on a whole TEXT column; only a FULLTEXT index or an index on a prefix of the column is possible.
So if you want an index on the full column, you have to use VARCHAR. But note that the length of an index key is also limited, so if your VARCHAR column is too long, you can only index the first few characters of it (see the documentation for CREATE INDEX).
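The arithmetic behind "too long to index" can be sketched as follows. This is a minimal illustration, assuming InnoDB's index key limits of 3,072 bytes (DYNAMIC and COMPRESSED row formats) and 767 bytes (REDUNDANT and COMPACT), and assuming MySQL budgets the maximum bytes per character of the column's character set; the helper name is hypothetical:

```python
# Assumed InnoDB index key limits in bytes, by row format (16 KB pages).
KEY_LIMIT_DYNAMIC = 3072  # DYNAMIC and COMPRESSED
KEY_LIMIT_COMPACT = 767   # REDUNDANT and COMPACT


def max_indexable_chars(key_limit_bytes: int, max_bytes_per_char: int) -> int:
    """Longest column, in characters, that fits entirely in one index key."""
    return key_limit_bytes // max_bytes_per_char


if __name__ == "__main__":
    print(max_indexable_chars(KEY_LIMIT_DYNAMIC, 4))  # 768 for utf8mb4
    print(max_indexable_chars(KEY_LIMIT_COMPACT, 4))  # 191 for utf8mb4
    print(max_indexable_chars(KEY_LIMIT_COMPACT, 3))  # 255 for utf8mb3
```

The 191 result is why many schemas written for older row formats declare utf8mb4 columns as VARCHAR(191); a longer column can still be indexed on a prefix of that length.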
But you also want to use VARCHAR if you know that the input string can be at most M characters long, e.g. a phone number or a name. Then you can use VARCHAR(30) instead of TINYTEXT or TEXT, and if someone tries to save the text of all three "The Lord of the Rings" books in your phone number column, it won't be stored: in strict SQL mode (the default since MySQL 5.7) the insert fails with a "Data too long" error, and in non-strict mode only the first 30 characters are stored, with a warning :)
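The difference between strict and non-strict mode can be modelled in a few lines. This is a toy simulation in Python, not MySQL code; the names `store_varchar` and `DataTooLong` are hypothetical, and Python code points are counted as characters:

```python
import warnings


class DataTooLong(Exception):
    """Stands in for MySQL's "Data too long for column" error."""


def store_varchar(value: str, max_chars: int, strict: bool = True) -> str:
    """Model what a VARCHAR(max_chars) column ends up holding."""
    if len(value) <= max_chars:
        return value
    if strict:
        # Strict SQL mode: the statement fails and nothing is stored.
        raise DataTooLong(f"Data too long for column (limit {max_chars} characters)")
    # Non-strict mode: the value is cut to fit and a warning is issued.
    warnings.warn(f"Data truncated to {max_chars} characters")
    return value[:max_chars]


if __name__ == "__main__":
    long_text = "The Lord of the Rings " * 10_000
    try:
        store_varchar(long_text, 30)
    except DataTooLong as err:
        print("strict:", err)
    print("non-strict:", repr(store_varchar(long_text, 30, strict=False)))
```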
Edit: If the text you want to store in the database is longer than 65535 bytes, you have to choose MEDIUMTEXT or LONGTEXT, but be careful: MEDIUMTEXT stores strings up to 16 MB, LONGTEXT up to 4 GB. If you use LONGTEXT and fetch the data via PHP (at least if you use mysqli without store_result), you may get a memory allocation error, because PHP tries to allocate 4 GB of memory to be sure the whole string can be buffered. This may also happen in languages other than PHP.
However, you should always check the input (Is it too long? Does it contain malicious code?) before storing it in the database.
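One way to answer "Is it too long?" before writing to a TEXT column is to compare the encoded byte length with the 65,535-byte limit, since counting characters is not enough. A minimal sketch, assuming the column uses utf8mb4 (approximated by Python's UTF-8 codec); the function name is hypothetical and it checks length only, not content:

```python
TEXT_MAX_BYTES = 2**16 - 1  # TEXT holds at most 65,535 bytes


def fits_in_text(value: str, encoding: str = "utf-8") -> bool:
    """Check the encoded length, because TEXT's limit is in bytes, not characters."""
    return len(value.encode(encoding)) <= TEXT_MAX_BYTES


if __name__ == "__main__":
    print(fits_in_text("a" * 65_535))   # True: 65,535 one-byte characters
    print(fits_in_text("😀" * 20_000))  # False: 20,000 characters, but 80,000 bytes
```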
Notice: For both types, the required disk space depends only on the length of the stored string and not on the maximum length.
E.g. if you use the charset latin1 and store the text "Test" in VARCHAR(30), VARCHAR(100) or TINYTEXT, it always requires 5 bytes (1 byte to store the length of the string and 1 byte for each character). If you store the same text in a VARCHAR(2000) or a TEXT column, it requires 6 bytes instead (2 bytes to store the string length and 1 byte for each character).
For more information have a look at the documentation.
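The "Test" example can be reproduced with a short sketch of the documented rule: a 1- or 2-byte length prefix for VARCHAR, a fixed prefix for each TEXT type, plus the encoded bytes. It assumes Python's cp1252 codec as a stand-in for MySQL's latin1 (which MySQL defines as cp1252) and UTF-8 for utf8mb4; the function names are hypothetical:

```python
# MySQL character set -> (Python codec, max bytes per character). Assumed mapping.
CHARSETS = {
    "latin1": ("cp1252", 1),
    "utf8mb4": ("utf-8", 4),
}

# Length-prefix size of each TEXT type.
TEXT_PREFIX = {"TINYTEXT": 1, "TEXT": 2, "MEDIUMTEXT": 3, "LONGTEXT": 4}


def varchar_bytes(value: str, declared_chars: int, charset: str) -> int:
    """Bytes needed to store value in a VARCHAR(declared_chars) column."""
    codec, max_bytes = CHARSETS[charset]
    data = len(value.encode(codec))
    # The prefix depends on the declared maximum in bytes, not on the value.
    prefix = 1 if declared_chars * max_bytes <= 255 else 2
    return prefix + data


def text_bytes(value: str, text_type: str, charset: str) -> int:
    """Bytes needed to store value in a TINYTEXT/TEXT/MEDIUMTEXT/LONGTEXT column."""
    codec, _ = CHARSETS[charset]
    return TEXT_PREFIX[text_type] + len(value.encode(codec))


if __name__ == "__main__":
    print(varchar_bytes("Test", 30, "latin1"))       # 5
    print(varchar_bytes("Test", 100, "latin1"))      # 5
    print(text_bytes("Test", "TINYTEXT", "latin1"))  # 5
    print(varchar_bytes("Test", 2000, "latin1"))     # 6
    print(text_bytes("Test", "TEXT", "latin1"))      # 6
    print(varchar_bytes("Test", 100, "utf8mb4"))     # 6: 100 x 4 bytes > 255
```

The last line shows why the prefix threshold is about bytes: VARCHAR(100) needs only 1 length byte in latin1 but 2 in utf8mb4.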
Finally, I want to add that both TEXT and VARCHAR are variable-length data types, so they most likely minimize the space you need to store the data. But this comes with a trade-off in performance. If you need better performance, you may have to use a fixed-length type like CHAR. You can read more about this here.
There is an important detail that has been omitted in the answer above.
MySQL imposes a limit of 65,535 bytes for the max size of each row.
The size of a VARCHAR column is counted towards the maximum row size, while TEXT columns are assumed to store their data by reference, so they count only 9-12 bytes. That means even if the "theoretical" max size of your VARCHAR field is 65,535 characters, you won't be able to reach it in practice, and certainly not if you have more than one column in your table.
Also note that the actual number of bytes required by a VARCHAR field is dependent on the encoding of the column (and the content). MySQL counts the maximum possible bytes used toward the max row size, so if you use a multibyte encoding like utf8mb4 (which you almost certainly should) it will use up even more of your maximum row size.
Correction: Regardless of how MySQL computes the max row size, whether or not the VARCHAR/TEXT field data is ACTUALLY stored in the row or stored by reference depends on your underlying storage engine. For InnoDB the row format affects this behavior. (Thanks Bill-Karwin)
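The counting rule can be turned into a quick check. This is a rough sketch, not MySQL's exact algorithm: it counts each VARCHAR at its maximum byte length plus its length prefix, assumes the upper bound of 12 bytes per TEXT column, and ignores NULL flags and other column types; the names are hypothetical:

```python
from __future__ import annotations

ROW_LIMIT = 65_535  # bytes, shared by all columns in a row
TEXT_COUNTED = 12   # assumption: upper end of the 9-12 bytes a TEXT column counts


def varchar_counted(declared_chars: int, max_bytes_per_char: int) -> int:
    """Bytes a VARCHAR counts toward the row limit: its maximum size plus prefix."""
    max_data = declared_chars * max_bytes_per_char
    return max_data + (1 if max_data <= 255 else 2)


def fits_row_limit(varchars: list[tuple[int, int]], text_columns: int = 0) -> bool:
    """varchars is a list of (declared_chars, max_bytes_per_char) pairs."""
    total = sum(varchar_counted(m, w) for m, w in varchars)
    total += text_columns * TEXT_COUNTED
    return total <= ROW_LIMIT


if __name__ == "__main__":
    print(fits_row_limit([(16_383, 4)]))                  # True: 65,534 bytes
    print(fits_row_limit([(16_384, 4)]))                  # False: 65,538 bytes
    print(fits_row_limit([(10_000, 4), (10_000, 4)]))     # False: 80,004 bytes
    print(fits_row_limit([(10_000, 4)], text_columns=1))  # True: 40,014 bytes
```

The last two calls mirror the advice that follows: two large utf8mb4 VARCHARs exceed the limit, while turning one of them into TEXT keeps the row within it.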
Reasons to use TEXT:
If you want to store a paragraph or more of text
If you don't need to index the column
If you have reached the row size limit for your table
Reasons to use VARCHAR:
If you want to store a few words or a sentence
If you want to index the (entire) column
If you want to use the column with foreign-key constraints
Related
I read that in MySQL, VARCHAR allows a (variable) max size of 65 KB, while the TEXT data type has a fixed max size of 65 KB. Does that mean that if a column is declared as TEXT and a field holds only one character, it will still take up 65 KB of disk space, and similarly take up 65 KB when loaded into memory? Is this correct?
Also, is this the same for MEDIUMTEXT (occupying 16 MB even if the column has just one character)?
Let's say I need to declare a column that will hold strings of around 150K characters. I cannot use TEXT but can use MEDIUMTEXT, and I wonder whether I will be wasting a lot of disk space/memory. What is a better way to do this? One way I can think of is to create 3 rows (splitting the 150K characters into 3 rows of 50K), but is there a better way of doing this?
No, variable length data types like VARCHAR and TEXT and their cousins do not occupy as much space as their maximum size. They only occupy the length of the string they store, and this can vary row by row, as you store strings of different lengths.
They also have between 1 and 4 bytes per row to encode the length. For example, a VARCHAR(255) in a single-byte character set such as latin1 is stored with 1 preceding byte, because 1 byte can encode any length up to 255. LONGTEXT, on the other hand, requires 4 bytes to encode the length, because that's what is required for the 4GB maximum length of a LONGTEXT. Following the bytes encoding the length, the actual data content only needs to be as long as the respective string.
It's a bit more complicated than that, since InnoDB stores all data in pages of uniform size (16KB by default), so long strings must be split over multiple pages.
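The 1-to-4-byte length prefix mentioned above follows from the maximum length each type must be able to record. A minimal sketch of that reasoning (the helper name is hypothetical; the maximum lengths are MySQL's byte limits for the TEXT types):

```python
def length_prefix_bytes(max_length: int) -> int:
    """Smallest n such that an unsigned n-byte integer (max 256**n - 1) holds max_length."""
    n = 1
    while 256**n - 1 < max_length:
        n += 1
    return n


# Maximum lengths, in bytes, of MySQL's TEXT family.
TEXT_MAX = {
    "TINYTEXT": 2**8 - 1,
    "TEXT": 2**16 - 1,
    "MEDIUMTEXT": 2**24 - 1,
    "LONGTEXT": 2**32 - 1,
}

if __name__ == "__main__":
    for name, max_len in TEXT_MAX.items():
        print(f"{name}: max {max_len:,} bytes, {length_prefix_bytes(max_len)}-byte prefix")
    # VARCHAR(255) needs 1 byte in latin1 but 2 in utf8mb4 (255 x 4 = 1,020 bytes).
    print(length_prefix_bytes(255 * 1), length_prefix_bytes(255 * 4))
```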
For a column where you expect typical data will be 150K characters, you should use MEDIUMTEXT. The VARCHAR and TEXT types can't store 150K characters.
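Choosing the smallest TEXT type for an expected length can be sketched like this. It assumes you budget for the worst case of the column's character set (e.g. 4 bytes per character for utf8mb4), since the TEXT limits are in bytes; the function name is hypothetical:

```python
TEXT_TYPES = [
    ("TINYTEXT", 2**8 - 1),
    ("TEXT", 2**16 - 1),
    ("MEDIUMTEXT", 2**24 - 1),
    ("LONGTEXT", 2**32 - 1),
]


def smallest_text_type(max_chars: int, max_bytes_per_char: int) -> str:
    """Return the smallest TEXT type whose byte limit covers the worst case."""
    worst_case = max_chars * max_bytes_per_char
    for name, limit in TEXT_TYPES:
        if worst_case <= limit:
            return name
    raise ValueError(f"{worst_case} bytes exceeds LONGTEXT")


if __name__ == "__main__":
    print(smallest_text_type(150_000, 1))  # MEDIUMTEXT
    print(smallest_text_type(150_000, 4))  # MEDIUMTEXT (600,000 bytes)
    print(smallest_text_type(10_000, 4))   # TEXT (40,000 bytes)
```

Either way the 150K-character column lands in MEDIUMTEXT, and each row still only uses the space its actual string needs.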
The variable-length quality does not apply to the CHAR type, which always stores the full length according to the definition of the column. I use CHAR only for strings that are of fixed size on every row, so there's no wasted space.
I have heard that in MS SQL/Access databases, if you declare a VARCHAR of length 100, it reserves those 100 characters in every row, even if there is only one character in that column.
I have two questions about this.
First: is this true? And if so, does it also work like this in MySQL?
Why I'm asking this:
I work a lot with MySQL, and I came across a database table with 128 LONGTEXT columns. The reasoning behind this was: "We cannot be certain how much data gets stored in these columns. Sometimes it's 1 char, sometimes thousands." I was wondering whether this is the right approach storage-wise, or whether he has to make some changes.
No, VARCHAR is meant for variable length text, while CHAR is fixed length. The number parameter is the character limit for the text but VARCHAR only uses up as much space as the actual characters you enter in that row (+ some bytes to store the length used).
MySQL, Microsoft SQL Server and pretty much all relational databases work the same way with VARCHAR. Every column takes up some minimum amount of space in a row, but with VARCHAR it is the bytes to store the text plus the bytes to store the length of the text. No text entered would mean just 1 or 2 bytes used to save '0' as the length.
If you don't know how much text data will be entered, then use LONGTEXT in MySQL or NVARCHAR(MAX) in MS SQL. These data types let you store very large amounts of text (up to 4 GB for LONGTEXT and 2 GB for NVARCHAR(MAX)) while still only using space for what is actually stored. They are essentially bigger versions of standard VARCHAR.
For SQL Server the answer is no. From the documentation on MSDN:
varchar [ ( n | max ) ]
Variable-length, non-Unicode string data. n defines the string length and can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of the data entered + 2 bytes. The ISO synonyms for varchar are char varying or character varying.
It is possible someone was confusing VARCHAR and CHAR. The CHAR data type requires a fixed amount of storage, based on the maximum allowed size.
EDIT
Rereading your question I'm not entirely sure I've followed your meaning. If you were not referring to the required storage space then please disregard.
We have some MySQL tables with 100,000 to 10,000,000 records. Some of the fields are VARCHAR(100) when in fact no entry exceeds 11 characters.
Clearly we are using up way more space than we should be... If one VARCHAR(100) field in a million-record table uses 100 MB of space, then we might be wasting as much as several GB of space.
If we were to streamline these tables, and reduce the VARCHAR fields to their proper size, would it help us with more than just storage space? Could it possibly improve the lookup times for queries?
According to the MySQL documentation on data type storage requirements, the VARCHAR type stores values as follows:
L + 1 bytes if column values require 0 – 255 bytes, L + 2 bytes if values may require more than 255 bytes, where L represents the actual length in bytes of a given string value
It seems to me that if you plan to change the type from VARCHAR(100) to VARCHAR(11), it will not affect query performance, at least as far as on-disk storage is concerned, because MySQL already stores each value using only the bytes it needs.
If you had a CHAR(100) column, strings with fewer than 100 characters would be right-padded with spaces; in that case you would waste space, and I think query performance would suffer too.
According to the documentation, the storage required by the CHAR type is:
M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set, where M represents the declared column length in characters
But if all your values have a fixed length of 11, you could use CHAR(11), which avoids the length byte and may slightly improve storage and query performance.
Another important point about string storage is the character set, as the documentation says:
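To put numbers on the question's scenario, here is a rough sketch comparing column data sizes for 11-character values in a single-byte character set such as latin1, using the two formulas quoted above. It ignores row headers, indexes and page overhead, takes the 10,000,000-row count from the upper end of the question, and the function names are hypothetical:

```python
ROWS = 10_000_000
VALUE_LEN = 11  # characters, 1 byte each in latin1


def char_bytes(declared_chars: int, bytes_per_char: int = 1) -> int:
    """CHAR(M) always uses M x w bytes, whatever the value."""
    return declared_chars * bytes_per_char


def varchar_row_bytes(value_bytes: int, declared_chars: int, bytes_per_char: int = 1) -> int:
    """VARCHAR uses the value's bytes plus a 1- or 2-byte length prefix."""
    prefix = 1 if declared_chars * bytes_per_char <= 255 else 2
    return value_bytes + prefix


if __name__ == "__main__":
    for label, per_row in [
        ("CHAR(11)", char_bytes(11)),
        ("VARCHAR(11)", varchar_row_bytes(VALUE_LEN, 11)),
        ("VARCHAR(100)", varchar_row_bytes(VALUE_LEN, 100)),
        ("CHAR(100)", char_bytes(100)),
    ]:
        print(f"{label:<13}{per_row:>4} bytes/row {per_row * ROWS / 1e6:>7,.0f} MB")
```

VARCHAR(11) and VARCHAR(100) come out identical (12 bytes per value); only CHAR(100) pays for the declared length.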
To calculate the number of bytes used to store a particular CHAR, VARCHAR, or TEXT column value, you must take into account the character set used for that column and whether the value contains multi-byte characters. In particular, when using the utf8 Unicode character set, you must keep in mind that not all characters use the same number of bytes and can require up to three bytes per character.
Hope it helps!
I don't know the specifics of the MySQL implementation, but I do know the typical implementation of a relational database, and in that implementation it does help.
Typically, records are stored consecutively in a file called a RID table. The record number in the RID table (using zero based counting) times the record size is an offset to where in the file the record is stored.
If the record size is smaller, then more records from the RID table fit into a disk sector fetched from the disk and more records fit into memory.
Even with a different implementation, a smaller record buffer allows more records to be cached in memory, which can reduce the number of disk accesses.
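The fixed-size record arithmetic described above can be sketched as follows. This is a toy model of the generic layout the answer describes, not how InnoDB actually stores rows; the 4,096-byte block size and the function names are assumptions:

```python
BLOCK_SIZE = 4096  # assumed size of one disk block, in bytes


def record_offset(record_number: int, record_size: int) -> int:
    """Byte offset of a record in the file (zero-based record number)."""
    return record_number * record_size


def records_per_block(record_size: int, block_size: int = BLOCK_SIZE) -> int:
    """How many whole records fit in one block read from disk."""
    return block_size // record_size


if __name__ == "__main__":
    for size in (200, 120):
        print(f"record size {size}: record 1000 at offset {record_offset(1000, size)}, "
              f"{records_per_block(size)} records per block")
```

Shrinking the record from 200 to 120 bytes raises the number of records per block from 20 to 34, which is the caching benefit the answer describes.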
If I have a column in a table of type VARCHAR(15) and I try to insert data of length 16, MySQL gives an error stating
Data too long for column 'testname' at row 1
Does anyone know why VARCHAR fields in MySQL have a fixed length? Also, how many bytes does a VARCHAR field take per record, based on the declared size?
From the MySQL 5.0 Manual:
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
I only use VARCHAR when I'm certain that the data the column needs to hold will never exceed a certain length, and even then I'm cautious. If I'm storing a text string I tend to use one of the TEXT types.
Check out the MySQL Storage Requirements for more information on how the bytes are used.
If you set a column to be varchar(15), the maximum number of characters allowed is 15. Thus you can't pass it more than 15 characters without modifying the column to support more than 15. If you store a 4-character string, it should only use around 4 bytes out of a possible 15, whereas if you used char(15), it would have padded the other 11 with spaces.
http://dev.mysql.com/doc/refman/5.0/en/char.html
(My byte calculation was probably off, since it's always -1/+1 or something like that.)
A small additional note: the number of bytes used depends on the character set in use. It is 1 byte per character in latin1, but up to 3 in utf8 (utf8mb3) and up to 4 in utf8mb4. See the link in mlambie's answer for details.
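The per-character byte counts can be checked quickly in Python. This sketch assumes cp1252 as a stand-in for MySQL's latin1 and UTF-8 for utf8mb4, and treats utf8mb3 as UTF-8 limited to 3 bytes per character; the helper name is hypothetical:

```python
def byte_lengths(ch: str) -> tuple:
    """Return (latin1, utf8mb3, utf8mb4) byte lengths; None if not representable."""
    try:
        latin1 = len(ch.encode("cp1252"))
    except UnicodeEncodeError:
        latin1 = None
    utf8mb4 = len(ch.encode("utf-8"))
    utf8mb3 = utf8mb4 if utf8mb4 <= 3 else None  # utf8mb3 has no 4-byte characters
    return latin1, utf8mb3, utf8mb4


if __name__ == "__main__":
    for ch in ["a", "é", "€", "😀"]:
        print(repr(ch), byte_lengths(ch))
```

The emoji row shows a character that a utf8mb3 column cannot store at all, while utf8mb4 needs 4 bytes for it.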
If you look here it should tell you everything about varchar you want to know:
http://dev.mysql.com/doc/refman/5.0/en/char.html
Basically, depending on the length you choose, it will use 1 or 2 bytes to track the length of the current string in that column, so it will store the number of bytes for the data you put in, plus 1 or 2 bytes.
So, if you put in 'abc', it will use 4 or 5 bytes for that column in that row.
If you used char(15), then even 'abc' would take up 15 bytes, as the data is right-padded with spaces to use up the full 15 bytes.