Does always using 255 chars for varchar fields decrease performance? - mysql

I usually use the maximum number of chars possible for varchar fields, so in most cases I set 255 even though I only use about 16 chars in those columns.
Does this decrease performance for my database?

When it comes to storage, a VARCHAR(255) column will take up 1 byte to store the length of the actual value plus the bytes required to store the actual value.
For a latin1 VARCHAR(255) column, that's at most 256 bytes. For a utf8 column, where each character can take up to 3 bytes (though rarely), the data can reach 765 bytes, and because that exceeds 255 bytes the length prefix grows to 2 bytes, for a maximum of 767 bytes. The maximum index length for a single column in InnoDB is 767 bytes (with the older row formats), which is perhaps why some treat 255 as the maximum supported column length.
So, again, when storing the value, it only takes up as much room as is actually needed.
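The arithmetic behind the 255-character convention can be checked with a short sketch. It is only an illustration: the 767-byte index limit and the 3-bytes-per-character worst case are the figures quoted above, and the function name is made up for this example.

```python
# Illustrative arithmetic only. The 767-byte limit and the per-character
# byte counts are the figures quoted in the answer, not measured values.
INNODB_INDEX_LIMIT_BYTES = 767


def max_data_bytes(declared_chars: int, max_bytes_per_char: int) -> int:
    """Worst-case bytes of character data for a VARCHAR(declared_chars)."""
    return declared_chars * max_bytes_per_char


latin1_255 = max_data_bytes(255, 1)  # single-byte character set
utf8_255 = max_data_bytes(255, 3)    # up to 3 bytes per character
utf8_256 = max_data_bytes(256, 3)    # one character longer

# Longest 3-byte-per-character column whose data still fits under the limit.
longest_indexable = INNODB_INDEX_LIMIT_BYTES // 3

print(latin1_255, utf8_255, utf8_256, longest_indexable)
```

Under these assumptions a 255-character column is the longest 3-byte-per-character column whose worst-case data fits in a 767-byte index entry; one more character pushes it over.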
However, if the column is indexed, the index automatically allocates the maximum possible size so that each node in the index has enough room to store any possible value. When searching through an index, MySQL loads the nodes in chunks of a specific byte size. Larger nodes mean fewer nodes per read, which means it takes longer to search the index.
MySQL will also use the maximum size when storing the values in a temp table for sorting.
So, even if you aren't using indexes, but are ever performing a query that can't utilize an index for sorting, you will get a performance hit.
Therefore, if performance is your goal, setting any VARCHAR column to 255 characters should not be a rule of thumb. Instead, you should use the minimum required.
There may be edge cases where you'd rather suffer the performance every day so that you never have to lock a table completely to increase the size of a column, but I don't think that's the norm.
One possible exception is if you are joining on a VARCHAR column between two tables. MySQL says:
"MySQL can use indexes on columns more efficiently if they are declared as the same type and size."
In that case, you might use the max size between the two.

Whenever you're talking about "performance" you can only find out one way: Benchmarking.
In theoretical terms there's no difference between VARCHAR(20) and VARCHAR(255) if they're both populated with the same data. Keep in mind that if you get your length wrong you can have massive truncation problems: unless strict SQL mode is enabled, MySQL chops data to fit and issues only a warning rather than rejecting the statement.
I try to avoid setting limits on VARCHAR columns unless the data would be completely invalid if it was longer. For instance, two-character ISO country codes can be stored in VARCHAR(2) because longer strings are meaningless. For other things, especially names or phone numbers, limiting the length is potentially and probably harmful.
Still, you will want to test any schema you create to be sure it meets your performance requirements. I expect you'd have a hard time detecting any difference at all between VARCHAR(25) and VARCHAR(16).

There are two ways in which this can decrease performance:
1. If you're loading those columns many, many times, performing a join on the column, or doing anything else that means they are accessed a large number of times. The number of times depends on your machine, but think on the order of millions.
2. If you're always filling the field (using 20 chars in a varchar(20)), then the length checks add a little overhead whenever you perform an insert.
The best way to determine this, though, is to benchmark your database.

Related

Performance benefit in using correct data types

Is there any performance benefit in using the exact data types needed for a column? Or is it just storage optimisation?
For example, I'm creating a users table and I know for certain that there will only be 200 users in total. When I'm manipulating the data on the server, doing some select/update/insert/delete, is there any performance difference between using TINYINT UNSIGNED for the users_id column and using just INT?
The same applies to the user's name. I know, for now, that the longest user name is 48 characters, but I don't know whether a new user with a 65-character name will be inserted in the future. Is there any performance benefit in reserving only the length needed for now, using VARCHAR(48), or can I avoid having to constantly check the allowed column length for each new user and just use VARCHAR(255)?
There is little advantage in either case.
For the number, you do gain a slight performance advantage. Typically, integers are 4 and a tinyint is 1 byte. So, if you have multiple smaller fields, then your records will be smaller. Smaller records then imply fewer data pages and ultimately slightly faster queries. This shows up when you start to have lots of records.
For the varchar, you don't even have that advantage. Both varchar(48) and varchar(255) occupy the same amount of space (there is one additional byte for lengths greater than 255). The values determine the space for this data type.
In other cases, it can make a big difference. In particular, storing dates as the native format is usually important, both to take advantage of date/time functions and to make better use of indexes.

Does significantly increasing the length of varchar fields impact on performance?

At the moment, we have a varchar(255) field. One of our users has requested that the field be increased about 10-fold to around varchar(2048). Given his use-case, it appears to be a reasonable request.
The field in question is not indexed, it is not part of any joins and it is never included in any where clauses.
I realize that smaller increases (say from 50 to 100) have no impact, but does this still hold at much larger increases in size?
Apparently not with a sidenote:
"In storage, VARCHAR(255) is smart enough to store only the length you need on a given row, unlike CHAR(255) which would always store 255 characters.
But since you tagged this question with MySQL, I'll mention a MySQL-specific tip: when your query implicitly generates a temporary table, for instance while sorting or GROUP BY, VARCHAR fields are converted to CHAR to gain the advantage of working with fixed-width rows. If you use a lot of VARCHAR(255) fields for data that doesn't need to be that long, this can make the temporary table very large.
It's best to define the column based on the type of data that you intend to store. It's hard to know what the longest postal address is, of course, which is why many people choose a long VARCHAR that is certainly longer than any address. And 255 is customary because it may have been the maximum length of a VARCHAR in some databases in the dawn of time (as well as PostgreSQL until more recently)."
by Bill Karwin
So it basically depends on your specific use of that field; if you don't use GROUP BY with that field, then there's no problem.
For your case there is no difference between varchar(255) and varchar(2048).
In MySQL, temporary tables and MEMORY tables store a VARCHAR column as a fixed-length column, padded out to its maximum length. If you design VARCHAR columns much larger than the greatest size you need, you will consume more memory than you have to. This affects cache efficiency, sorting speed, etc.
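A rough sketch of what that padding costs, assuming a single-byte character set and ignoring any per-row overhead. The row count and the helper name are hypothetical, chosen only to make the comparison concrete.

```python
def fixed_width_bytes(declared_chars: int, max_bytes_per_char: int, rows: int) -> int:
    """Bytes used when every value is padded out to the declared maximum."""
    return declared_chars * max_bytes_per_char * rows


rows = 1_000_000  # hypothetical number of rows in the temporary table

narrow = fixed_width_bytes(16, 1, rows)   # column declared VARCHAR(16)
wide = fixed_width_bytes(255, 1, rows)    # column declared VARCHAR(255)

print(narrow, wide, wide / narrow)
```

With the same 16-character data in both columns, the wider declaration needs roughly 16 times as much memory for that column once the rows are padded.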

Is there any difference between varchar(10) and varchar(1000) when we store string whose length is less than 10?

I understand the difference between char and varchar but I'm not sure of exact meaning of varchar maximum length.
When we store string whose length is less than 10, is there any difference between varchar(10) and varchar(100)?
I think both cases use same space and have same performance. If so, why do we need varchar max limit?
Is it enough to use just "varchar" instead of "varchar(xxx)"?
(added) I'm using MySQL 5.0.67
This depends entirely on the DBMS engine being used. SQL itself does not mandate how things are stored physically, just how they're seen logically.
For example, your DBMS may allocate space in the row for the maximum size, plus some extra bytes to store the length. In that case, there would be a big difference between varchar(10) and varchar(1000) since you would waste quite a bit of space per row.
Alternatively, it may use a buffer pool for the varchar data and store only the length and the buffer pool "starting address" in the row. In that case, every single row would store identically-sized information for a varchar column regardless of its size, but there would be an added step to extract the actual data in that column (following the link to the buffer pool).
The reason you use a varchar is exactly why it's named varchar. It allows you to store variable-sized data elements. Typically, char(10) gives you ten characters, no matter what, padding it with spaces if you insert something shorter. You can trim trailing spaces off as you extract it but that won't work so well if the data you want to store is actually "hello ", with a trailing space you want preserved.
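The trailing-space problem can be mimicked in a few lines. This is only a model of fixed-width padding written in Python, not how any particular DBMS implements CHAR; both helper names are invented for the example.

```python
def store_as_char(value: str, width: int) -> str:
    """Pad with spaces to a fixed width, as a CHAR(width) column would."""
    if len(value) > width:
        raise ValueError("value longer than column width")
    return value.ljust(width)


def read_trimmed(stored: str) -> str:
    """Strip the padding on the way out, which also strips real trailing spaces."""
    return stored.rstrip(" ")


original = "hello "                # the trailing space is part of the data
stored = store_as_char(original, 10)
recovered = read_trimmed(stored)

print(repr(stored), repr(recovered))
```

Once the value is padded, there is no way to tell the intended trailing space from the padding, so the recovered string is not the one that was stored.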
A decent DBMS engine may decide to make a trade-off depending on the maximum size of the varchar column. For short ones, it could just store it inline in the row and consume the extra bytes for the size.
Longer varchar columns could be "outsourced" to a separate buffer pool to ensure row-reading is kept efficient (at least until you need the large varchar column, anyway).
What you need to do is re-ask the question for your specific DBMS so as to get a more targeted answer.
Or, in all honesty, engineer your database to only store the maximum size. If you know it's 10, then varchar(1000) is a waste. If, in the future, you need to enlarge the column, that is the time to do it, rather than now (see YAGNI).
For MySQL, you'll want to look at Chapter 14 Storage Engines of the online documentation.
It covers the various storage engines (such as InnoDB and MyISAM) that MySQL uses and, looking deep enough, you can see how the information is physically stored.
For example, in MyISAM, the presence of variable length data in a table (varchar included) usually means dynamic tables. This follows a scheme roughly analogous to the buffer pool concept I mentioned above, with the advantage that less space is wasted for variable sized columns, and the disadvantage that rows may become fragmented.
The other storage format (discounting compressed format since it's only really used for read-only tables) is the static one, where data is stored in a single physical row.
Information on the InnoDB physical structures can be found here. Depending on whether you use the Antelope or Barracuda file format, you end up with the "all information is a physical row" or "buffer pool" situation, similar to the MyISAM distinction between dynamic and static.
In SQL Server, the limit does not affect how the data is stored on disk. What it does provide, though, is one constraint for free. If you, as the database designer, only want up to 10 characters stored, you've prevented someone from storing a novel instead.
Is it enough to use just "varchar"
Again, for SQL Server, almost certainly not what you want. In most circumstances, if you don't specify a limit, you get a varchar(1) (surely the most pointless data type ever conceived). Occasionally, it's a varchar(30).
In Oracle, the Varchar stretches in size depending on its use, just up to the point where you set the limit. This means that indeed, a varchar(10) and a varchar(100) containing 2 characters use the same space (which is different for a char, which always uses the full allocated space).

Does limiting the size of rows benefit MySQL table size?

I know that limiting the row size will keep the size at a set maximum. However, if I limit all my rows to the maximum that they will ever need, does that improve performance on the rows (i.e. do they take up less space)?
Or does MySQL dynamically allocate the size necessary for rows, so that the maximum row length has no effect on it?
MySQL will not dynamically size fields; the potential required size is determined by the data type you choose. However, you can choose appropriate data types to help optimize the database.
For example, the TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT data types require 1 byte, 2 bytes, 3 bytes, 4 bytes, and 8 bytes respectively. If you are only ever going to store integers up to 10,000, SMALLINT would be the best choice since it can store values up to 65,535 (unsigned) with plenty of extra room and requires half the size of an INT.
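The selection rule described above can be written down as a small lookup. This is a sketch for unsigned values only, using the byte sizes listed in the answer; the function name is made up for illustration.

```python
# Byte sizes as listed above; the unsigned maximum is 2**(8 * size) - 1.
UNSIGNED_INT_TYPES = [
    ("TINYINT", 1),
    ("SMALLINT", 2),
    ("MEDIUMINT", 3),
    ("INT", 4),
    ("BIGINT", 8),
]


def smallest_unsigned_type(max_value: int) -> tuple[str, int]:
    """Return (type name, bytes) of the smallest type that can hold max_value."""
    if max_value < 0:
        raise ValueError("this sketch only covers unsigned values")
    for name, size in UNSIGNED_INT_TYPES:
        if max_value <= 2 ** (8 * size) - 1:
            return name, size
    raise ValueError("value does not fit in an unsigned BIGINT")


print(smallest_unsigned_type(10_000))
print(smallest_unsigned_type(200))
```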
With variable length types like VARCHAR any length up to and including 255 uses a one byte prefix to store the length. Anything more uses a 2 byte prefix. Limiting VARCHARs to at most 255 if possible is best. As an example, the string foo will require 4 bytes in both a VARCHAR(3) field and a VARCHAR(255) field. If the max length of the field was 500, for example, foo would require 5 bytes.
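The prefix rule can be sketched as follows. It assumes a single-byte character set by default (latin1 here) and that the prefix width depends on the maximum number of bytes the column could need; the function and its parameters are hypothetical, not part of any MySQL API.

```python
def varchar_bytes(value: str, declared_chars: int,
                  max_bytes_per_char: int = 1, encoding: str = "latin-1") -> int:
    """Estimate on-disk bytes for one VARCHAR value: length prefix plus data."""
    if len(value) > declared_chars:
        raise ValueError("value longer than the declared length")
    data = value.encode(encoding)
    # One length byte if the column can never need more than 255 bytes,
    # otherwise two.
    max_column_bytes = declared_chars * max_bytes_per_char
    prefix = 1 if max_column_bytes <= 255 else 2
    return prefix + len(data)


print(varchar_bytes("foo", 3))    # VARCHAR(3)
print(varchar_bytes("foo", 255))  # VARCHAR(255)
print(varchar_bytes("foo", 500))  # VARCHAR(500)
```

For the string foo this gives 4 bytes in the first two cases and 5 in the third, matching the figures in the answer.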
When possible with short (or sometimes even long) text columns, the CHAR type can be beneficial if the strings always have the same length or very nearly the same length. Also, if you're using the MyISAM storage engine and all of your columns are fixed length, you can use the FIXED row format, which is beneficial.
More information about the MySQL data types is available in the manual.
I don't know what the purpose of your question is, but if you want to minimize the storage space used by the database you may use built-in compression, which is available when you are using the InnoDB storage engine.
More information: Enabling Compression for a Table (dev.mysql.com)
I don't know how it affects performance (speed), as I have never used that feature, but I think compression may slow your database significantly. However, if you have limited disk space on some hosting server, it may be useful for applications without heavy load.
MySQL does not automatically allocate the necessary size, so you want to define it yourself to improve the database's performance and keep it at a minimum size.
One nice approach that I came across the other day, though, was to use NVARCHAR instead of VARCHAR if the column is to be used for long strings that are still not long enough to warrant TEXT, which requires a lot of space.

What are the optimum varchar sizes for MySQL?

How does MySQL store a varchar field? Can I assume that the following pattern represents sensible storage sizes :
1,2,4,8,16,32,64,128,255 (max)
A clarification via example. Let's say I have a varchar field of 20 characters. Does MySQL, when creating this field, basically reserve space for 32 bytes (not sure if they are bytes or not) but only allow 20 to be entered?
I guess I am worried about optimising disk space for a massive table.
To answer the question: on disk, MySQL uses 1 byte plus the size of the data actually stored in the field (so if the column was declared varchar(45) and the field held "FooBar", it would use 7 bytes on disk, unless of course you were using a multibyte character set in which those characters take more than one byte each, in which case it would use more). So, however you declare your columns, it won't make a difference on the storage end (you stated you are worried about disk optimization for a massive table). However, it does make a difference in queries, as VARCHARs are converted to CHARs when MySQL makes a temporary table (SORT, ORDER, etc.), and the more records you can fit into a single page, the less memory you use and the faster your table scans will be.
MySQL stores a varchar field as a variable length record, with either a one-byte or a two-byte prefix to indicate the record size.
Having a pattern of storage sizes doesn't really make any difference to how MySQL will function when dealing with variable length record storage. The length specified in a varchar(x) declaration will simply determine the maximum length of the data that can be stored. Basically, a varchar(16) is no different disk-wise than a varchar(128).
This manual page has a more detailed explanation.
Edit: With regards to your updated question, the answer is still the same. A varchar field will only use up as much space on disk as the data you store in it (plus a one or two byte overhead). So it doesn't matter if you have a varchar(16) or a varchar(128), if you store a 10-character string in it, you're only going to use 10 bytes (plus 1 or 2) of disk space.