varchar(20) and varchar(50) are same? - mysql

I saw comment "If you have 50 million values between 10 and 15 characters in a varchar(20) column, and the same 50 million values in a varchar(50) column, they will take up exactly the same space. That's the whole point of varchar, as opposed to char.". Can Anybody tell me the reason? See What is a reasonable length limit on person "Name" fields?

MySQL offers a choice of storage engines. The physical storage of data depends on the storage engine.
MyISAM Storage of VARCHAR
In MyISAM, VARCHARs typically occupy just the actual length of the string plus a byte or two of length. This is made practical by the design limitation of MyISAM to table locking as opposed to a row locking capability. Performance consequences include a more compact cache profile, but also more complicated (slower) computation of record offsets.
(In fact, MyISAM gives you a degree of choice between fixed physical row size and variable physical row size table formats depending on column types occurring in the whole table. Occurrence of VARCHAR changes the default method only, but the presence of a TEXT blob forces VARCHARs in the same table to use the variable length method as well.)
The physical storage method is particularly important with indexes, which is a different story than tables. MyISAM uses space compression for both CHAR and VARCHAR columns, meaning that shorter data take up less space in the index in both cases.
InnoDB Storage of VARCHAR
InnoDB, like most other current relational databases, uses a more sophisticated mechanism. VARCHAR columns whose maximum width is less than 768 bytes will be stored inline, with room reserved matching that maximum width. More accurately here:
For each non-NULL variable-length field, the record header contains
the length of the column in one or two bytes. Two bytes will only be
needed if part of the column is stored externally in overflow pages or
the maximum length exceeds 255 bytes and the actual length exceeds 127
bytes. For an externally stored column, the two-byte length indicates
the length of the internally stored part plus the 20-byte pointer to
the externally stored part. The internal part is 768 bytes, so the
length is 768+20. The 20-byte pointer stores the true length of the
column.
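The rule in the quoted passage can be written down as a small function. This is only a sketch of the quoted text, not InnoDB's actual code; the function name and its parameters are hypothetical.

```python
def innodb_length_header_bytes(max_len_bytes, actual_len_bytes, stored_externally=False):
    """Bytes used in the record header for the length of one non-NULL
    variable-length field, following the quoted rule.
    Hypothetical helper for illustration, not an InnoDB API."""
    if stored_externally:
        # Part of the column lives in overflow pages: always two bytes.
        return 2
    if max_len_bytes > 255 and actual_len_bytes > 127:
        # Wide column holding a value longer than 127 bytes: two bytes.
        return 2
    return 1


print(innodb_length_header_bytes(20, 12))     # narrow column -> 1
print(innodb_length_header_bytes(1000, 100))  # wide column, short value -> 1
print(innodb_length_header_bytes(1000, 300))  # wide column, longer value -> 2
```

Note that the declared maximum only matters once it exceeds 255 bytes, and even then only for values longer than 127 bytes.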
InnoDB currently does not do space compression in its indexes, the opposite of MyISAM as described above.
Back to the question
All of the above is, however, just an implementation detail that may even change between versions. The true difference between CHAR and VARCHAR is semantic, and so is the one between VARCHAR(20) and VARCHAR(50). By ensuring that there is no way to store a 30 character string in a VARCHAR(20), the database makes life easier and better defined for the various processors and applications that it supposedly integrates into a predictably behaving solution. This is the big deal.
Regarding personal names specifically, this question may give you some practical guidance. People with full names over 70 UTF-8 characters are in trouble anyway.

Yes, that is indeed the whole point of VARCHAR. It only takes up as much space as the text is long.
If you had CHAR(50), it would take up 50 bytes (or characters) no matter how short the data really is (it would be padded, usually by spaces).
Can Anybody tell me the reason?
Because people thought it was wasteful to store a lot of useless padding, they invented VARCHAR.

The manual states:
The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. (...)
In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
Notice that VARCHAR(255) is not the same as VARCHAR(256).
This is theory. As habeebperwad suggests, the actual footprint of one row depends on (engine) page size and (hard disk) block size.
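To make the manual's rule concrete, here is a minimal Python sketch of "length prefix plus data". The function name and the max_bytes_per_char parameter are made up for illustration, and Python's latin-1 codec stands in for a single-byte MySQL character set. It models only the rule as quoted, not any particular storage engine.

```python
def varchar_storage_bytes(value, declared_chars, max_bytes_per_char=1, encoding="latin-1"):
    """Length prefix plus data bytes for one VARCHAR value, per the quoted rule.
    Hypothetical helper; real engines add their own per-row overhead."""
    if len(value) > declared_chars:
        raise ValueError("value is longer than the declared length")
    data = value.encode(encoding)
    # One length byte if values require no more than 255 bytes, otherwise two.
    max_bytes = declared_chars * max_bytes_per_char
    prefix = 1 if max_bytes <= 255 else 2
    return prefix + len(data)


name = "Jane Example"  # 12 characters
print(varchar_storage_bytes(name, 20))   # 13
print(varchar_storage_bytes(name, 50))   # 13
print(varchar_storage_bytes(name, 255))  # 13
print(varchar_storage_bytes(name, 256))  # 14
```

Under this rule the declared length changes the result only where it crosses the 255-byte boundary.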

Related

always use 255 chars for varchar fields decreases performance?

I usually use maximum chars possible for varchar fields, so in most cases I set 255 but only using 16 chars in columns...
does this decreases performance for my database?
When it comes to storage, a VARCHAR(255) column will take up 1 byte to store the length of the actual value (2 bytes if the column may require more than 255 bytes, as with a multibyte character set) plus the bytes required to store the actual value.
For a latin1 VARCHAR(255) column, that's at most 256 bytes. For a utf8 column, where each character can take up to 3 bytes (though rarely), the data alone can reach 765 bytes (255 × 3). As we know, the maximum index length for a single column in InnoDB is 767 bytes, which 765 bytes just fits under, hence perhaps the reason some declare 255 as the maximum supported column length.
So, again, when storing the value, it only takes up as much room as is actually needed.
However, if the column is indexed, the index automatically allocates the maximum possible size so that each node in the index has enough room to store any possible value. When searching through an index, MySQL loads the nodes in chunks of a specific byte size. Larger nodes mean fewer nodes per read, which means it takes longer to search the index.
MySQL will also use the maximum size when storing the values in a temp table for sorting.
So, even if you aren't using indexes, but are ever performing a query that can't utilize an index for sorting, you will get a performance hit.
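Taking the claim above at face value (every value padded to the declared width in a temporary table), a back-of-the-envelope estimate shows the scale of the effect. The function below is a hypothetical helper, and the row count is an arbitrary example, not a measurement.

```python
def padded_sort_bytes(rows, declared_chars, bytes_per_char=1):
    """Rough upper bound on bytes needed if each value is padded to the
    declared width. Illustrative arithmetic only, not a MySQL formula."""
    return rows * declared_chars * bytes_per_char


rows = 1_000_000
wide = padded_sort_bytes(rows, 255)   # VARCHAR(255), single-byte charset
narrow = padded_sort_bytes(rows, 16)  # VARCHAR(16), single-byte charset
print(wide)           # 255000000
print(narrow)         # 16000000
print(wide / narrow)  # 15.9375
```

With a character set that allows up to 3 bytes per character, pass bytes_per_char=3 and both figures triple.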
Therefore, if performance is your goal, setting any VARCHAR column to 255 characters should not be a rule of thumb. Instead, you should use the minimum required.
There may be edge cases where you'd rather suffer the performance every day so that you never have to lock a table completely to increase the size of a column, but I don't think that's the norm.
One possible exception is if you are joining on a VARCHAR column between two tables. MySQL says:
MySQL can use indexes on columns more efficiently if they are declared
as the same type and size.
In that case, you might use the max size between the two.
Whenever you're talking about "performance" you can only find out one way: Benchmarking.
In theoretical terms there's no difference between VARCHAR(20) and VARCHAR(255) if they're both populated with the same data. Keep in mind if you get your length wrong you will have massive truncation problems and MySQL does not warn you before it starts chopping data to fit.
I try to avoid setting limits on VARCHAR columns unless the data would be completely invalid if it was longer. For instance, two-character ISO country codes can be stored in VARCHAR(2) because longer strings are meaningless. For other things, especially names or phone numbers, limiting the length is potentially and probably harmful.
Still, you will want to test any schema you create to be sure it meets your performance requirements. I expect you'd have a hard time detecting any difference at all between VARCHAR(25) and VARCHAR(16).
There are two ways in which this will decrease performance:
If you're loading those columns many, many times, performing a join on the column, or doing anything else that means they need to be accessed a large number of times. The number of times depends on your machine, but think on the order of millions.
If you're always filling the field (using 20 chars in a varchar(20)), then the length checks add a little overhead whenever you perform an insert.
The best way to determine this, though, is to benchmark your database.

Is there any difference between varchar(10) and varchar(1000) when we store string whose length is less than 10?

I understand the difference between char and varchar but I'm not sure of exact meaning of varchar maximum length.
When we store string whose length is less than 10, is there any difference between varchar(10) and varchar(100)?
I think both cases use same space and have same performance. If so, why do we need varchar max limit?
Is it enough to use just "varchar" instead of "varchar(xxx)"?
(added) I'm using MySQL 5.0.67
This depends entirely on the DBMS engine being used. SQL itself does not mandate how things are stored physically, just how they're seen logically.
For example, your DBMS may allocate space in the row for the maximum size, plus some extra bytes to store the length. In that case, there would be a big difference between varchar(10) and varchar(1000) since you would waste quite a bit of space per row.
Alternatively, it may use a buffer pool for the varchar data and store only the length and the buffer pool "starting address" in the row. In that case, every single row would store identically-sized information for a varchar column regardless of its size, but there would be an added step to extract the actual data in that column (following the link to the buffer pool).
The reason you use a varchar is exactly why it's named varchar. It allows you to store variable-sized data elements. Typically, char(10) gives you ten characters, no matter what, padding it with spaces if you insert something shorter. You can trim trailing spaces off as you extract it but that won't work so well if the data you want to store is actually "hello ", with a trailing space you want preserved.
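The trailing-space problem is easy to see in a few lines of Python. This only simulates a fixed-width column that pads on write and trims on read; the two function names are invented for the example and do not correspond to any DBMS API.

```python
def char_store(value, width):
    """Pad a value with spaces to a fixed width, as a CHAR(width) column would."""
    if len(value) > width:
        raise ValueError("value is longer than the column width")
    return value.ljust(width)


def char_retrieve(stored):
    """Strip trailing spaces on the way out, as described above."""
    return stored.rstrip(" ")


stored = char_store("hello ", 10)
print(repr(stored))                 # 'hello     '
print(repr(char_retrieve(stored)))  # 'hello'
```

The space that was part of the original value cannot be told apart from the padding, so it is lost on retrieval.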
A decent DBMS engine may decide to make a trade-off depending on the maximum size of the varchar column. For short ones, it could just store it inline in the row and consume the extra bytes for the size.
Longer varchar columns could be "outsourced" to a separate buffer pool to ensure row-reading is kept efficient (at least until you need the large varchar column, anyway).
What you need to do is re-ask the question for your specific DBMS so as to get a more targeted answer.
Or, in all honesty, engineer your database to only store the maximum size. If you know it's 10, then varchar(1000) is a waste. If, in the future, you need to enlarge the column, that is the time to do it, rather than now (see YAGNI).
For MySQL, you'll want to look at Chapter 14 Storage Engines of the online documentation.
It covers the various storage engines (such as InnoDB and MyISAM) that MySQL uses and, looking deep enough, you can see how the information is physically stored.
For example, in MyISAM, the presence of variable length data in a table (varchar included) usually means dynamic tables. This follows a scheme roughly analogous to the buffer pool concept I mentioned above, with the advantage that less space is wasted for variable sized columns, and the disadvantage that rows may become fragmented.
The other storage format (discounting compressed format since it's only really used for read-only tables) is the static one, where data is stored in a single physical row.
Information on the InnoDB physical structures can be found here. Depending on whether you use the Antelope or Barracuda file format, you end up with the "all information is a physical row" or "buffer pool" situation, similar to the MyISAM distinction between dynamic and static.
In SQL Server, the limit does not affect how the data is stored on disk. What it does provide, though, is one constraint for free. If you, as the database designer, only want up to 10 characters stored, you've prevented someone from storing a novel instead.
Is it enough to use just "varchar"
Again, for SQL Server, almost certainly not what you want. In most circumstances, if you don't specify a limit, you get a varchar(1) (surely the most pointless data type ever conceived). Occasionally, it's a varchar(30).
In Oracle, the Varchar stretches in size depending on its use, just up to the point where you set the limit. This means that indeed, a varchar(10) and a varchar(100) containing 2 characters use the same space (which is different for a char, which always uses the full allocated space).

Does limiting the size of rows benefit MySQL table size?

I know that limiting the row size will keep the size at a set maximum. However, if I limit all my rows to the maximum that they will ever need, does that increase performance on the rows (i.e. take up less space)?
Or does MySQL dynamically allocate the size necessary for rows, so that the maximum row length has no effect on it?
MySQL will not dynamically size fields; the potential required size is determined by the data type you choose. However, you can choose appropriate data types to help optimize the database.
For example, the TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT data types require 1 byte, 2 bytes, 3 bytes, 4 bytes, and 8 bytes respectively. If you are only ever going to store integers up to 10,000, SMALLINT would be the best choice since it can store values up to 65,535 (unsigned) with plenty of extra room and requires half the size of an INT.
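The same reasoning can be expressed as a lookup. The sketch below picks the smallest unsigned integer type for a given maximum value, using the byte sizes listed above. The function name is hypothetical, and signed ranges are left out for brevity.

```python
# (type name, storage size in bytes), smallest first
UNSIGNED_INT_TYPES = [
    ("TINYINT", 1),
    ("SMALLINT", 2),
    ("MEDIUMINT", 3),
    ("INT", 4),
    ("BIGINT", 8),
]


def smallest_unsigned_type(max_value):
    """Return (type name, bytes) of the smallest unsigned type holding max_value."""
    if max_value < 0:
        raise ValueError("unsigned types cannot hold negative values")
    for name, size in UNSIGNED_INT_TYPES:
        if max_value <= 2 ** (8 * size) - 1:
            return name, size
    raise ValueError("value does not fit in BIGINT UNSIGNED")


print(smallest_unsigned_type(10_000))     # ('SMALLINT', 2)
print(smallest_unsigned_type(70_000))     # ('MEDIUMINT', 3)
print(smallest_unsigned_type(5_000_000_000))  # ('BIGINT', 8)
```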
With variable length types like VARCHAR any length up to and including 255 uses a one byte prefix to store the length. Anything more uses a 2 byte prefix. Limiting VARCHARs to at most 255 if possible is best. As an example, the string foo will require 4 bytes in both a VARCHAR(3) field and a VARCHAR(255) field. If the max length of the field was 500, for example, foo would require 5 bytes.
When possible with short (or sometimes even long) text columns, the CHAR type can be beneficial if the strings always have the same length or a very close length. Also, if you're using the MyISAM storage engine and all of your columns are fixed length, you can use the FIXED row format, which is beneficial.
More information about the MySQL data types is available in the manual.
I don't know what the purpose of your question is, but if you want to minimize the storage space used by the database, you may use built-in compression, which is available when you are using the InnoDB storage engine.
More information: Enabling Compression for a Table (dev.mysql.com)
I don't know how it affects performance (speed), as I have never used that feature, but I think compression may slow your database significantly. However, if you have limited disk space on some hosting server, it may be useful for applications without heavy load.
MySQL does not automatically allocate the necessary size, so you want to define it yourself to improve the database's performance and keep it at a minimum size.
One nice approach that I walked into the other way though was to use NVARCHAR instead of VARCHAR if the VARCHAR was to be used for long strings, but yet not long enough to fit for TEXT, which requires a lot of space.

mysql varchar innodb page size limit 8100 bytes

Regarding innodb, someone recently told me:
"the varchar content beyond 768 bytes
is stored in supplemental 16K pages"
This is very interesting. If each varchar is latin1, which I believe is stored as 1 byte per letter, would a single varchar(500) (<768 bytes) require an extra I/O the way a varchar(1000) (>768 bytes) would?
(this question is to find out if all varchars or just big varchars are split into a separate page)
Is the 768 limit per varchar or for all varchars in the row added together? For example, does this get optimized: varchar(300), varchar(300), varchar(300), where each individual varchar column is below 768 but together they are above 768 characters?
I am confused about if the 768 limit relates to each individual varchar or all varchars in the row totaled (as in the question). Any clarification?
EDIT: Removed part about CHARS due to finding out about their limit of 255 max.
The documentation of the current MySQL does not mention this constraint. It actually gives the impression that VARCHAR is stored in the row itself, as the sum of the length of all VARCHARs together in a row cannot be more than 65k bytes.
It also mentions that storage is length+data, which is different from what you describe.
I did find a mention of split storage in MySQL compression internals:
Tables created in previous versions of
InnoDB use the “Antelope” file format,
which supports only
ROW_FORMAT=REDUNDANT and
ROW_FORMAT=COMPACT. In these formats,
InnoDB stores the first 768 bytes of
BLOB, VARCHAR and TEXT columns in the
clustered index record along with the
primary key. The 768-byte prefix is
followed by a 20-byte pointer to the
overflow pages that contain the rest
of the column value.
To me this suggests that each column is considered individually when determining whether that column needs an overflow.
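Read that way, the per-column split can be sketched as follows. This is a simplification based only on the quoted passage: it assumes any single value over 768 bytes is stored externally and ignores when InnoDB actually decides to move a column off-page. The constant and function names are invented for the example.

```python
INLINE_PREFIX = 768  # bytes kept in the clustered index record (from the quote)
POINTER_SIZE = 20    # bytes of the pointer to the overflow pages (from the quote)


def antelope_split(value_bytes):
    """Return (inline_bytes, overflow_bytes) for one column value.
    Simplifying assumption: a value over 768 bytes always overflows."""
    if value_bytes <= INLINE_PREFIX:
        return value_bytes, 0
    return INLINE_PREFIX + POINTER_SIZE, value_bytes - INLINE_PREFIX


# Three 300-byte values and one 1000-byte value, each judged on its own.
for size in (300, 300, 300, 1000):
    print(size, antelope_split(size))
# 300 (300, 0)
# 300 (300, 0)
# 300 (300, 0)
# 1000 (788, 232)
```

Under this reading, three varchar(300) columns never overflow individually even though together they exceed 768 bytes.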
This seems to contrast with newer versions of MySQL, which seem to store VARCHAR quite differently. A VARCHAR in a newer version has a max length of 65k bytes, which seems to be stored in the record itself (no overflow), which results in the constraint that all columns together may not be longer than 65k bytes.
A CHAR is very different. Its maximum length is 255, and it is stored in the row itself. So no CHAR(500) :)
The length of a CHAR column is fixed
to the length that you declare when
you create the table. The length can
be any value from 0 to 255. When CHAR
values are stored, they are
right-padded with spaces to the
specified length. When CHAR values are
retrieved, trailing spaces are removed
unless the PAD_CHAR_TO_FULL_LENGTH SQL
mode is enabled.

What are the optimum varchar sizes for MySQL?

How does MySQL store a varchar field? Can I assume that the following pattern represents sensible storage sizes :
1,2,4,8,16,32,64,128,255 (max)
A clarification via example. Let's say I have a varchar field of 20 characters. Does MySQL, when creating this field, basically reserve space for 32 bytes (not sure if they are bytes or not) but only allow 20 to be entered?
I guess I am worried about optimising disk space for a massive table.
To answer the question: on disk, MySQL uses 1 byte plus the size of the data actually stored in the field (so if the column was declared varchar(45) and the field was "FooBar", it would use 7 bytes on disk, unless of course you were using a multibyte character set, where it could use more, for example 13 bytes at 2 bytes per character). So, however you declare your columns, it won't make a difference on the storage end (you stated you are worried about disk optimization for a massive table). However, it does make a difference in queries, as VARCHARs are converted to CHARs when MySQL makes a temporary table (for sorting, ORDER BY, etc.), and the more records you can fit into a single page, the less memory you use and the faster your table scans will be.
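The difference between characters and bytes is easy to check in Python. The codecs below are only stand-ins for MySQL character sets (an approximation), and byte_length is a hypothetical helper. The figures are data bytes only, without the length prefix.

```python
def byte_length(text, encoding):
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


text = "FooBar"
print(len(text))                       # 6 characters
print(byte_length(text, "latin-1"))    # 6 bytes
print(byte_length(text, "utf-8"))      # 6 bytes: ASCII stays 1 byte each
print(byte_length(text, "utf-16-be"))  # 12 bytes: 2 bytes per character
print(byte_length("Zo\u00eb", "utf-8"))  # 4 bytes: the accented letter takes 2
```

So a variable-width encoding such as UTF-8 only costs extra bytes for the characters that need them.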
MySQL stores a varchar field as a variable length record, with either a one-byte or a two-byte prefix to indicate the record size.
Having a pattern of storage sizes doesn't really make any difference to how MySQL will function when dealing with variable length record storage. The length specified in a varchar(x) declaration will simply determine the maximum length of the data that can be stored. Basically, a varchar(16) is no different disk-wise than a varchar(128).
This manual page has a more detailed explanation.
Edit: With regards to your updated question, the answer is still the same. A varchar field will only use up as much space on disk as the data you store in it (plus a one or two byte overhead). So it doesn't matter if you have a varchar(16) or a varchar(128), if you store a 10-character string in it, you're only going to use 10 bytes (plus 1 or 2) of disk space.