Memory usage of storing strings as varchar in MySQL - mysql

I've begun to get very interested in the memory usage of MySQL. So I'm looking at this here:
http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html
I get very excited about the prospect of saving memory by (for example) needing only a signed smallint where I was using an unsigned int in many places. Then I read about varchars...
"VARCHAR(M) - Length + 1 bytes if column values require 0 – 255 bytes"
What?! Now it appears to me as though storing a single varchar would use up so much memory, that I may as well not even get excited with my int vs. smallint because it's vastly overshadowed by the varchar field. So I come here asking if this is true, because it simply can't be? Are varchars really that terrible? Or should I really not be getting excited at all for my smallint discovery?
edit: Sorry! I should've been more clear. So, let's say I store a varchar with 7 characters, meaning 8 bytes. That means, then, that it uses the same as a number stored in a BIGINT column? That's what I'm concerned about.

What this is saying is that for a given string length, the amount of storage used is equal to the length of the string in bytes, plus one byte to tell MySQL how long the string is.
So for instance, the word "automobile" is 10 bytes (1 for each character), so if it is stored in a varchar column it will take up 11 bytes: 1 for the length (the number 10) and 1 for each of the characters in the string.
From the link you posted:
http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html
The storage requirements depend on these factors:
-The actual length of the column value
-The column's maximum possible length
-The character set used for the column, because some character sets contain multi-byte characters
For example, a VARCHAR(255) column can hold a string with a maximum length of 255 characters. Assuming that the column uses the latin1 character set (one byte per character), the actual storage required is the length of the string (L), plus one byte to record the length of the string. For the string 'abcd', L is 4 and the storage requirement is five bytes. If the same column is instead declared to use the ucs2 double-byte character set, the storage requirement is 10 bytes: The length of 'abcd' is eight bytes and the column requires two bytes to store lengths because the maximum length is greater than 255 (up to 510 bytes).
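The arithmetic in the quoted passage can be sketched in a few lines of Python. This is only an illustration of the documented rule, not MySQL code: the function name is made up, and Python's "latin-1" and "utf-16-be" codecs are assumed to stand in for MySQL's latin1 and ucs2 character sets.

```python
def varchar_storage_bytes(value: str, max_chars: int, codec: str,
                          max_bytes_per_char: int) -> int:
    """Bytes for one VARCHAR value: data bytes plus a 1- or 2-byte length prefix."""
    data_len = len(value.encode(codec))
    # The prefix width depends on the column's worst case, not on this value.
    prefix = 1 if max_chars * max_bytes_per_char <= 255 else 2
    return data_len + prefix


# VARCHAR(255) in latin1: 4 data bytes + 1 length byte
print(varchar_storage_bytes("abcd", 255, "latin-1", 1))
# VARCHAR(255) in ucs2: 8 data bytes + 2 length bytes
print(varchar_storage_bytes("abcd", 255, "utf-16-be", 2))
# "automobile" in latin1: 10 data bytes + 1 length byte
print(varchar_storage_bytes("automobile", 255, "latin-1", 1))
```

So a 7-character latin1 string does cost 8 bytes, the same as a BIGINT, but the cost grows with the string rather than with the declared maximum.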

While I am no MySQL DBA, it appears there is a very simple answer to this question, and no need to go deeper into storage sizes, because it is NOT configurable.
Per MySQL memory storage documentation,
MEMORY tables use a fixed-length row-storage format. Variable-length types such as VARCHAR are stored using a fixed length.
Thus, you won't have any specific gains by using VARCHAR for a table using the MEMORY storage engine, no matter how VARCHAR is stored on other storage engines such as MyISAM or InnoDB.

Related

Mysql memory efficiency and type declaration

I have heard that in MsSQL/Access databases that if you declare a varchar of length 100, it declares those 100 chars every row, even if there is only one char in that column.
I have two questions about this.
First: is this true? And if yes, does this also work like this in MySQL?
Why I'm asking this:
I work a lot with MySQL, and I came across a database table with 128 LONGTEXT columns. The reasoning behind this was "We cannot be certain how much data gets stored in these columns. Sometimes it's 1 char, sometimes thousands." I was wondering whether this is the right approach storage-wise, or whether he has to make some changes.
No, VARCHAR is meant for variable length text, while CHAR is fixed length. The number parameter is the character limit for the text but VARCHAR only uses up as much space as the actual characters you enter in that row (+ some bytes to store the length used).
MySQL, Microsoft SQL Server and pretty much all relational databases work the same way with VARCHAR. Every column takes up some minimum amount of space in a row, but with VARCHAR it is the bytes to store the text plus the bytes to store the length of the text. No text entered would mean just 1 or 2 bytes used to save '0' as the length.
If you don't know how much text data will be entered, then use LONGTEXT in MySQL or NVARCHAR(MAX) in MS-SQL. These data types let you store very large amounts of text efficiently (up to 4 GB for LONGTEXT and 2 GB for NVARCHAR(MAX)). They are essentially bigger versions of the standard VARCHAR.
For SQL Server the answer is no. From the documentation on MSDN:
varchar [ ( n | max ) ]
Variable-length, non-Unicode string data. n
defines the string length and can be a value from 1 through 8,000. max
indicates that the maximum storage size is 2^31-1 bytes (2 GB). The
storage size is the actual length of the data entered + 2 bytes. The
ISO synonyms for varchar are char varying or character varying.
It is possible someone was confusing VARCHAR and CHAR. The CHAR data type requires a fixed amount of storage, based on the maximum allowed size.
EDIT
Rereading your question I'm not entirely sure I've followed your meaning. If you were not referring to the required storage space then please disregard.

How does MySQL varchar know how many bytes indicate the length?

The MySQL manual and several StackOverflow answers make it clear that varchar uses:
1 byte for varchars with 0-255 characters in them
2 bytes for varchars with more than 255 characters in them.
The first part makes sense. A single byte can store 256 different values, i.e. 0 through 255.
What I am trying to figure out is how MySQL knows how many bytes indicate the length.
Imagine a 255-char varchar starting with the following bytes: [255][w][o][r][d]~
According to the manual, only the first byte is used to indicate the length in this scenario. When reading the field, MySQL will somehow have to know that this is the case here, and that the second byte is not part of the length.
Now imagine a 256-char varchar starting with the following bytes: [255][1][w][o][r][d]~
Now MySQL miraculously knows that it should interpret the first two bytes as the length, when reading the field.
How does it distinguish? The only foolproof way I have come up with is to interpret only the first byte as length, then determine if the text length matches (in its current encoding), and if not, we know that the first two bytes must be the length.
It happens at the time of definition. All length prefixes will be the same size in bytes for a particular VARCHAR column. The VARCHAR column will use 2 bytes or the VARCHAR column will use 1 byte, depending on the defined size in characters, and the character set.
All VARCHAR columns defined such that their values might require more than 255 bytes use 2 bytes to store the size. MySQL isn't going to use 1 byte for some values in a column and 2 bytes for others.
MySQL documentation on CHAR and VARCHAR Types states this pretty clearly (emphasis mine):
A column uses one length byte if values require no more than 255
bytes, two length bytes if values may require more than 255 bytes.
If you declare a VARCHAR(255) column to use the utf8 character set, it's still going to use 2 bytes for the length prefix, not 1, since the length in bytes may be greater than 255 with utf8 characters.
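To make the point concrete, here is a rough Python sketch of a reader that is told the prefix width by the column definition instead of guessing it from the data. The function names are hypothetical, and the little-endian byte order is an assumption made for illustration; the real on-disk layout depends on the storage engine.

```python
import struct


def prefix_width(max_chars: int, max_bytes_per_char: int) -> int:
    """Prefix size is fixed per column, from its worst-case byte length."""
    return 1 if max_chars * max_bytes_per_char <= 255 else 2


def encode_varchar(data: bytes, width: int) -> bytes:
    """Write the length prefix (1 or 2 bytes) followed by the data."""
    fmt = "<B" if width == 1 else "<H"
    return struct.pack(fmt, len(data)) + data


def decode_varchar(buf: bytes, offset: int, width: int):
    """Read one value; return (data, offset of the next field)."""
    fmt = "<B" if width == 1 else "<H"
    (length,) = struct.unpack_from(fmt, buf, offset)
    start = offset + width
    return buf[start:start + length], start + length


print(prefix_width(255, 1))  # VARCHAR(255) latin1
print(prefix_width(255, 3))  # VARCHAR(255) utf8
print(encode_varchar(b"word", prefix_width(255, 3)))
```

Because the width comes from the schema, the reader never has to decide whether the second byte belongs to the length or to the text.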

MySQL: VARCHAR(1024) vs VARCHAR(512)

In MySQL what is the difference between VARCHAR(1024) and VARCHAR(512)? If my item will never be more than 512 characters, what do I lose by using VARCHAR(1024)?
Don't know where you got that from, but it's not possible to create a table with varchar without specifying the length. It results in a syntax error. So your question is obsolete.
UPDATE:
Nothing. Varchar is, as the name implies, a data type of variable length, up to the maximum length you specified when creating the table. This means that in a varchar column each row uses one or two additional bytes (two for columns this large) to store how long the string in that row actually is.
So the difference between varchar(1024) and varchar(512) is the point at which your data gets truncated (or rejected, in strict SQL mode): when you try to insert more than 1024 or 512 characters. Note: since MySQL 4.1 the declared length counts characters, not bytes. How many bytes each character uses depends on the character set you're using.
There actually is a difference, and it can have a big performance impact if you manipulate large amounts of data. If a temporary table is used, the records on disk will take the full declared length instead of the variable length. A high value will slow down the query even more in that case. Temporary tables can occur for various reasons (such as memory being full, or some combinations of GROUP BY / ORDER BY).
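As a back-of-the-envelope sketch of that effect, assume a fixed-length record reserves the worst-case byte length of the column, and ignore any length prefix and per-row overhead. The function name and the row count are made up for illustration.

```python
def fixed_width_bytes(max_chars: int, max_bytes_per_char: int) -> int:
    """Bytes reserved per row when the column is stored at full declared length."""
    return max_chars * max_bytes_per_char


rows = 1_000_000  # hypothetical temporary table size
for declared in (512, 1024):
    per_row = fixed_width_bytes(declared, 3)  # utf8: up to 3 bytes per character
    print(f"VARCHAR({declared}): {per_row} bytes/row, {rows * per_row} bytes total")
```

Under those assumptions the VARCHAR(1024) column costs twice as much as VARCHAR(512) in such a table, even if every stored value is short.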
In VARCHAR(1024), 1024 is the length.
The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters.
According to the MySQL documentation:
In contrast to CHAR, VARCHAR values are stored as a 1-byte or 2-byte
length prefix plus data. The length prefix indicates the number of
bytes in the value. A column uses one length byte if values require no
more than 255 bytes, two length bytes if values may require more than
255 bytes.
A deeper analysis of the performance impact of larger VARCHARs can be found here.

MySQL Tables : Does storage efficiency affect lookup times?

We have some MySQL tables with 100,000 to 10,000,000 records. Some of the fields are VARCHAR(100) when in fact no entry exceeds 11 characters.
Clearly we are using up way more space than we should be... If one VARCHAR(100) field for a million-record table uses 100MB of space, then we might be wasting as much as several GB of space.
If we were to streamline these tables, and reduce the VARCHAR fields to their proper size, would it help us with more than just storage space? Could it possibly improve the lookup times for queries?
According to the MySQL documentation on data type storage requirements, the VARCHAR type stores values as follows:
L + 1 bytes if column values require 0 – 255 bytes, L + 2 bytes if values may require more than 255 bytes, where L represents the actual length in bytes of a given string value
It seems to me that changing the type from VARCHAR(100) to VARCHAR(11) will not affect query performance, because MySQL already stores each value at its actual length plus the length prefix.
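A quick estimate based on the L + 1 / L + 2 rule shows why. This is only a sketch: the function name is made up, every value is assumed to be 11 bytes long, and row and index overhead are ignored.

```python
def varchar_column_bytes(rows: int, value_bytes: int, max_chars: int,
                         max_bytes_per_char: int = 1) -> int:
    """Estimated data bytes for one VARCHAR column across all rows."""
    prefix = 1 if max_chars * max_bytes_per_char <= 255 else 2
    return rows * (value_bytes + prefix)


rows = 1_000_000
# Single-byte character set: the declared length makes no difference.
print(varchar_column_bytes(rows, 11, 100))  # VARCHAR(100)
print(varchar_column_bytes(rows, 11, 11))   # VARCHAR(11)
# With utf8 (3 bytes max per character), VARCHAR(100) may need more than
# 255 bytes, so it pays one extra prefix byte per row.
print(varchar_column_bytes(rows, 11, 100, 3))
print(varchar_column_bytes(rows, 11, 11, 3))
```

So the million-row column is closer to 12 MB than 100 MB, and shrinking the declaration saves at most one byte per row.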
If you had a CHAR(100) column, strings shorter than 100 characters would be right-padded with spaces, which wastes space and, I think, hurts query performance too.
The length of CHAR type, referring the documentation, is:
M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set, where M represents the declared column length in characters
But if all your records have fixed length 11 you should use CHAR(11) and it will improve the storage and performance of queries.
Another important point about string storage concerns the character set, as the documentation says:
To calculate the number of bytes used to store a particular CHAR, VARCHAR, or TEXT column value, you must take into account the character set used for that column and whether the value contains multi-byte characters. In particular, when using the utf8 Unicode character set, you must keep in mind that not all characters use the same number of bytes and can require up to three bytes per character.
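To see the per-character difference, you can measure encoded lengths in Python. This assumes Python's "utf-8" codec matches MySQL's utf8 for these characters, which holds for characters that need at most three bytes; the helper name is made up.

```python
def utf8_bytes(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


samples = {
    "a": "a",              # ASCII letter
    "e-acute": "\u00e9",   # Latin-1 supplement character
    "euro": "\u20ac",      # euro sign
}
for name, ch in samples.items():
    print(name, utf8_bytes(ch))

# Worst case for a VARCHAR(11) utf8 value: 11 characters of 3 bytes each.
print(utf8_bytes("\u20ac" * 11))
```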
Hope it helps!
I don't know the specifics of the mysql implementation, but I do know the typical implementation of a relational database, and in that implementation it does help.
Typically, records are stored consecutively in a file called a RID table. The record number in the RID table (using zero based counting) times the record size is an offset to where in the file the record is stored.
If the record size is smaller, then more records from the RID table fit into a disk sector fetched from the disk and more records fit into memory.
Even with a different implementation, a smaller record buffer allows more records to be cached in memory, which can reduce the number of disk accesses.
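The offset arithmetic described above looks roughly like this. It is a sketch of the generic fixed-size-record model, not of MySQL's actual storage; the 4096-byte block size and the record sizes are assumptions picked for illustration.

```python
def record_offset(record_number: int, record_size: int) -> int:
    """Byte offset of a record in a file of fixed-size records (zero-based)."""
    return record_number * record_size


def records_per_block(block_size: int, record_size: int) -> int:
    """How many whole records fit into one block read from disk."""
    return block_size // record_size


print(record_offset(3, 100))         # the fourth record starts here
print(records_per_block(4096, 100))  # wide records
print(records_per_block(4096, 12))   # narrow records
```

With the narrower record, each block fetched from disk carries several times as many records, which is where the lookup benefit would come from in this model.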

MySQL VARCHAR size limit

If I have a column in table with field of type VARCHAR(15) and if I try to insert data of length 16, MySQL gives an error stating
Data too long for column 'testname' at row 1
Does anyone know why VARCHAR fields in MySQL take fixed length? Also how many bytes does a VARCHAR field take per record based on the size given?
From the MySQL 5.0 Manual:
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
I only use VARCHAR when I'm certain that the data the column needs to hold will never exceed a certain length, and even then I'm cautious. If I'm storing a text string I tend to use one of the TEXT types.
Check out the MySQL Storage Requirements for more information on how the bytes are used.
If you set a column to be varchar(15), the maximum length allowed is 15 characters. Thus you can't store more than 15 characters in it without altering the column to support more. If you store a 4-character string it should only use around 4 bytes (plus the length byte) out of a possible 15, whereas if you used char(15) the other 11 would be filled with trailing spaces.
http://dev.mysql.com/doc/refman/5.0/en/char.html
( My byte calculation was probably off since it's always -1/+1 or something like that ).
A small extra note: the number of bytes used will depend on the encoding scheme in use. It is 1 byte per character in latin1 encoding, but up to 3 in UTF8. See the link in mlambie's answer for details.
If you look here it should tell you everything about varchar you want to know:
http://dev.mysql.com/doc/refman/5.0/en/char.html
Basically, depending on the length you chose it will use 1 or two bytes to track the length of the current string in that column, so it will store the number of bytes for the data you put in, plus one or two bytes.
So, if you put in 'abc' then it will be 4 or 5 bytes used for that column in that row.
If you used char(15) then even 'abc' would take up 15 bytes, as the data is right-padded to use up the full 15 bytes.
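The comparison can be checked with a small Python sketch. It assumes a single-byte character set (Python's "latin-1" codec standing in for latin1), and the function names are made up for illustration.

```python
def char_storage_bytes(value: str, declared_len: int) -> int:
    """CHAR(n): the value is right-padded with spaces to the declared length."""
    padded = value.ljust(declared_len)
    return len(padded.encode("latin-1"))


def varchar_storage_bytes(value: str, prefix_bytes: int) -> int:
    """VARCHAR: length prefix (1 or 2 bytes) plus the actual data bytes."""
    return prefix_bytes + len(value.encode("latin-1"))


print(char_storage_bytes("abc", 15))    # CHAR(15)
print(varchar_storage_bytes("abc", 1))  # VARCHAR with a 1-byte prefix
print(varchar_storage_bytes("abc", 2))  # VARCHAR with a 2-byte prefix
```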