Is Padding a field a good choice for MySQL Index? - mysql

If you have a MySQL table with a very large number of rows that includes a variable length field that is often used in WHERE or ORDER BY clauses, and it is infrequent that INSERTS or UPDATES are made, then it would be a good candidate for using an index on the field.
However, from what I could find on the topic, it seems MySQL doesn't handle variable length fields very quickly (compared to fixed length fields) when you index them in this manner. So, I was wondering if it would make sense to left-pad the column's values with spaces to force all of them to some fixed maximum length. Would this make any sense at all? Or am I just overthinking this?

After consulting the manual some more, I realize this is a "feature" already baked into MySQL:
The length of a CHAR column is fixed to the length that you declare
when you create the table. The length can be any value from 0 to 255.
When CHAR values are stored, they are right-padded with spaces to the
specified length. When CHAR values are retrieved, trailing spaces are
removed.
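
The behaviour quoted from the manual can be pictured with a small Python model. This is only a simulation of the pad-on-store, strip-on-retrieve rule, not MySQL code; the function names char_store and char_retrieve are made up for illustration.

```python
def char_store(value: str, length: int) -> str:
    """Model storing into CHAR(length): right-pad with spaces to the declared length."""
    if len(value) > length:
        raise ValueError("value is longer than the declared CHAR length")
    return value.ljust(length)


def char_retrieve(stored: str) -> str:
    """Model retrieval from CHAR: trailing spaces are removed."""
    return stored.rstrip(" ")


stored = char_store("abc", 8)
print(len(stored))                   # always the declared length
print(repr(char_retrieve(stored)))   # padding is gone again on retrieval
```

One consequence of this model is that a value that genuinely ends in spaces loses them on the round trip, which is why manual padding of your own would not survive in a CHAR column either.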

Related

always use 255 chars for varchar fields decreases performance?

I usually use the maximum number of chars possible for varchar fields, so in most cases I set 255 while only using 16 chars in the columns...
Does this decrease performance for my database?
When it comes to storage, a VARCHAR(255) column will take up 1 byte to store the length of the actual value (2 bytes if the column's values can require more than 255 bytes) plus the bytes required to store the actual value.
For a latin1 VARCHAR(255) column, that's at most 256 bytes. For a UTF8 column, where each character can take up to 3 bytes (though rarely), the data can reach 765 bytes, so the length needs 2 bytes and the maximum size is 767 bytes. The maximum index key length for a single column in InnoDB is 767 bytes (with the older COMPACT and REDUNDANT row formats), and 765 bytes of data just fits under it, which is perhaps why some declare 255 as the maximum supported column length.
So, again, when storing the value, it only takes up as much room as is actually needed.
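The arithmetic behind these maximum sizes can be sketched in a few lines of Python. This assumes MySQL's documented rule of one length byte when a column's values need at most 255 bytes and two length bytes otherwise, which puts the 3-byte utf8 case at 767 bytes; the function name varchar_max_bytes is hypothetical.

```python
def varchar_max_bytes(declared_chars: int, max_bytes_per_char: int) -> int:
    """Worst-case stored size of one VARCHAR value: data bytes plus length prefix."""
    data_bytes = declared_chars * max_bytes_per_char
    # One length byte if the data can never exceed 255 bytes, otherwise two.
    prefix_bytes = 1 if data_bytes <= 255 else 2
    return data_bytes + prefix_bytes


print(varchar_max_bytes(255, 1))  # latin1 VARCHAR(255)
print(varchar_max_bytes(255, 3))  # 3-byte utf8 VARCHAR(255)
print(varchar_max_bytes(16, 3))   # 3-byte utf8 VARCHAR(16)
```

These are upper bounds only; a row holding a 16-character value uses far less than the worst case.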
However, if the column is indexed, the index automatically allocates the maximum possible size so that each node in the index has enough room to store any possible value. When searching through an index, MySQL loads the nodes in specific byte size chunks at a time. Larger nodes mean fewer nodes per read, which means it takes longer to search the index.
MySQL will also use the maximum size when storing the values in a temp table for sorting.
So, even if you aren't using indexes, but are ever performing a query that can't utilize an index for sorting, you will get a performance hit.
Therefore, if performance is your goal, setting any VARCHAR column to 255 characters should not be a rule of thumb. Instead, you should use the minimum required.
There may be edge cases where you'd rather suffer the performance hit every day so that you never have to lock a table completely to increase the size of a column, but I don't think that's the norm.
One possible exception is if you are joining on a VARCHAR column between two tables. MySQL says:
MySQL can use indexes on columns more efficiently if they are declared
as the same type and size.
In that case, you might use the max size between the two.
Whenever you're talking about "performance" you can only find out one way: Benchmarking.
In theoretical terms there's no difference between VARCHAR(20) and VARCHAR(255) if they're both populated with the same data. Keep in mind if you get your length wrong you will have massive truncation problems and MySQL does not warn you before it starts chopping data to fit.
I try to avoid setting limits on VARCHAR columns unless the data would be completely invalid if it was longer. For instance, two-character ISO country codes can be stored in VARCHAR(2) because longer strings are meaningless. For other things, especially names or phone numbers, limiting the length is potentially and probably harmful.
Still, you will want to test any schema you create to be sure it meets your performance requirements. I expect you'd have a hard time detecting any difference at all between VARCHAR(25) and VARCHAR(16).
There are two ways in which this will decrease performance.
First, if you're loading those columns many, many times, joining on the column, or otherwise accessing them a large number of times. The number of times depends on your machine, but think on the order of millions.
Second, if you're always filling the field (using 20 chars in a varchar(20)), the length checks add a little overhead whenever you perform an insert.
The best way to determine this, though, is to benchmark your database.

Need help in MySQL DB Design (minimizing the overhead)

I am trying to design a MySQL DB
but don't know how to keep the overhead to a minimum; I have unique needs.
One of the fields could be as long as 60 KB or as short as 100 bytes. What type and length should I use for this field to minimize the overhead?
I've heard that if you define the maximum to be something like 60k, then all the unused space left in the row up to 60k will be filled with spaces. You can see how this could cause unnecessary overhead: only a few rows would make use of this length while most of the other rows won't. What do you suggest?
In MySQL, VARCHAR stores compactly on disk, that is it stores only the string used on a given row, plus a byte or two to encode the length of that string. This is true for both data and indexes.
But once the VARCHAR is loaded out of the storage engine and into memory, it is padded out to its full length. This consumes a lot of memory needlessly if you declare VARCHAR(65535). Then that padded-out representation may end up on disk during sorting or temporary table operations.
So use TEXT. This data type doesn't get padded out in memory like VARCHAR does, and it also supports strings up to 64KB.
If you need longer strings, use MEDIUMTEXT which supports up to 16MB.
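The memory cost described in this answer can be estimated with a rough Python model. It takes the answer's claim (values padded to the declared length once in memory) at face value and is not a measurement of MySQL internals; both function names are invented for the sketch, and a single-byte character set is assumed.

```python
def padded_bytes(row_count: int, declared_chars: int, bytes_per_char: int = 1) -> int:
    """Bytes needed if every value is padded out to the declared column length."""
    return row_count * declared_chars * bytes_per_char


def compact_bytes(values: list[str], encoding: str = "latin-1") -> int:
    """Bytes needed if only the actual data of each value is kept."""
    return sum(len(v.encode(encoding)) for v in values)


# 1000 rows that each hold a 100-character value.
values = ["x" * 100] * 1000
print(compact_bytes(values))               # actual data
print(padded_bytes(len(values), 65535))    # padded to VARCHAR(65535)
```

Under these assumptions the padded form is several hundred times larger than the data itself, which is the overhead the answer is warning about.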
Use VARCHAR.
Values in VARCHAR columns are variable-length strings. The length can
be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to
65,535 in 5.0.3 and later versions.
one of the fields could be as long as 60kb or short as 100 bytes
Sounds like a BLOB/TEXT. As Andreas has pointed out, you could even use VARCHAR if you know for sure you won't need more than 64K.
I've heard if you define the maximum to be like 60k then every unused space left in the row till 60k will be filled with space
If you use CHAR, then yes: it stores padding.
only few rows would make use of this length while most of the rest rows won't
That's incorrect. Modern DBMSes store variable-length rows: only the space actually needed will be actually used. Just use BLOB/TEXT or VARCHAR and you'll be fine.

Specifying lengths of datatypes

Here is something that troubles me as I am creating database table columns. Each of these has a data type, which has its length. For example, say one of the columns is a file path, and I assume this file path to be no longer than 100 characters at most, so obviously I specify this as
filepath Varchar(100)
However, this still takes the same amount of memory space as say varchar(255) which is 1 byte. Given this, what is the benefit of me specifying the length as 100. Taking an outlier example, if my filepath exceeds varchar(100), does the database reject/trim down the filepath value to fit it to 100? Or does it allow it to exceed beyond 100 since the allotted memory space is still around 1 byte?
Essentially the above explanation frames my question as should one try and be very specific about the expected maximum length for a table column? Or just play it safe and specify the upper limit of the expected length of the table column depending on the memory requirement ?
Thanks much !
Parijat
In non-strict SQL mode, MySQL will auto-truncate the value down to 100 characters (with a warning); in strict SQL mode the statement fails with an error instead. The number in the brackets for text/char fields is the MAXIMUM length. Note that this is a CHARACTER limit. If you've got a multibyte character set on that field, you can store more than 100 bytes in the field, but only 100 characters' worth of text.
This is different from saying int(10), where the bracketed number is for display purposes only. An int is an int internally and takes up 4 bytes (32 bits), regardless of how many digits you allow with the (#); the number only affects the display width.
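The character-versus-byte distinction in this answer is easy to see in Python, where len() on a str counts characters and len() on the encoded bytes counts bytes. The sketch below only models the truncating behaviour for a VARCHAR(100); truncate_to_chars is a hypothetical helper, not a MySQL function.

```python
def truncate_to_chars(value: str, max_chars: int) -> str:
    """Model a VARCHAR(max_chars) column that cuts values at a character limit."""
    return value[:max_chars]


# 120 accented characters; each one takes 2 bytes in UTF-8.
path = "é" * 120
stored = truncate_to_chars(path, 100)

print(len(stored))                  # characters kept
print(len(stored.encode("utf-8")))  # bytes those characters occupy
```

The stored value is exactly 100 characters long but occupies 200 bytes, so the declared length says nothing direct about the byte size of the data.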
very specific about the expected maximum length for a table column? Or just play it safe
If one would make a table containing addresses, you undoubtedly know that there will be some kind of limit to the length of the address. It would be useless to allow longer fields in the database.
You should play it safe, and be very careful.

What should be the typical length of user's Full Name in database [duplicate]

Possible Duplicate:
List of standard lengths for database fields
Simple as that, what should be the typical length of allowed "Full Name" of a user in database?
When I create users table, I usually set it as varchar 31 or 32 (according to performance). What do you guys use and what's standard/typical convention.
Sidenote: I never face problem in email length (as I set it 254) and password (hash, 32 length).
The maximum your average varchar field allows (254?).
You are not winning anything by making it arbitrarily shorter. The fine-grained size controls on numbers and chars are more or less a relic from the past, when every byte mattered. It can matter today - if you are dealing with tens or hundreds of millions of rows, or thousands of queries per sec. For your average database (i.e. 99% of them) performance comes from proper indexing and querying, NOT making your rows a couple of bytes smaller.
Only restrict the length of a field when there is some formal specification that defines a maximum length, like 13 digits for an EAN code or 12 characters for an ISIN.
Full name is always a computed column composed of first, middle, last, prefix, suffix, degree, family name, etc. in my designs. The list of individual columns is determined by the targeted locale of the app. The display length of 'Full Name' is normally contained within the app design, not the database. There is no space saving in SQL Server between varchar(32) and varchar(256). Varchar(256) is my choice.
I never want to be in the meeting when someone says "Your db design will not hold all our data".
You are always assigning an ID to the user so you can join and do look-ups using the ID instead of the FullName, correct?
I would recommend at least 128.
Well you can just put it at 255 if you want.
VARCHAR is a variable-length storage type. This means there's 1 byte which stores the actual length of the string; varchars don't use up more bytes than needed, so storage-wise it really does not matter. This is described on the MySQL page.
The description can be found here: http://dev.mysql.com/doc/refman/5.0/en/char.html
It is illustrated halfway down the page; check the table.
VARCHAR values are not padded when
they are stored. Handling of trailing
spaces is version-dependent. As of
MySQL 5.0.3, trailing spaces are
retained when values are stored and
retrieved, in conformance with
standard SQL. Before MySQL 5.0.3,
trailing spaces are removed from
values when they are stored into a
VARCHAR column; this means that the
spaces also are absent from retrieved
values.
Conclusion:
Storage-wise you could always go for 255 because it won't use up additional space and you won't get into trouble with strings getting cut off.
Greetz

What are the optimum varchar sizes for MySQL?

How does MySQL store a varchar field? Can I assume that the following pattern represents sensible storage sizes :
1,2,4,8,16,32,64,128,255 (max)
A clarification via example. Let's say I have a varchar field of 20 characters. Does MySQL, when creating this field, basically reserve space for 32 bytes (not sure if they are bytes or not) but only allow 20 to be entered?
I guess I am worried about optimising disk space for a massive table.
To answer the question, on disk MySQL uses 1 + the size that is used in the field to store the data (so if the column was declared varchar(45), and the field was "FooBar", it would use 7 bytes on disk, unless of course you were using a multibyte character set, where characters that need more than one byte would take correspondingly more). So, however you declare your columns, it won't make a difference on the storage end (you stated you are worried about disk optimization for a massive table). However, it does make a difference in queries, as VARCHARs are converted to CHARs when MySQL makes a temporary table (for sorting, ORDER BY, etc.), and the more records you can fit into a single page, the less memory you use and the faster your table scans will be.
MySQL stores a varchar field as a variable length record, with either a one-byte or a two-byte prefix to indicate the record size.
Having a pattern of storage sizes doesn't really make any difference to how MySQL will function when dealing with variable length record storage. The length specified in a varchar(x) declaration will simply determine the maximum length of the data that can be stored. Basically, a varchar(16) is no different disk-wise than a varchar(128).
This manual page has a more detailed explanation.
Edit: With regards to your updated question, the answer is still the same. A varchar field will only use up as much space on disk as the data you store in it (plus a one or two byte overhead). So it doesn't matter if you have a varchar(16) or a varchar(128), if you store a 10-character string in it, you're only going to use 10 bytes (plus 1 or 2) of disk space.
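The examples in these answers can be checked with a short Python sketch of the on-disk size of a single value. It assumes the length prefix is 1 byte when the column can never need more than 255 bytes and 2 bytes otherwise; varchar_stored_bytes and its parameters are names chosen for this illustration only.

```python
def varchar_stored_bytes(value: str, declared_chars: int,
                         encoding: str = "latin-1",
                         max_bytes_per_char: int = 1) -> int:
    """Approximate bytes used to store one VARCHAR value: prefix plus actual data."""
    if len(value) > declared_chars:
        raise ValueError("value is longer than the declared VARCHAR length")
    # The prefix size depends on the column's worst case, not on this value.
    prefix_bytes = 1 if declared_chars * max_bytes_per_char <= 255 else 2
    return prefix_bytes + len(value.encode(encoding))


print(varchar_stored_bytes("FooBar", 45))          # the varchar(45) example
print(varchar_stored_bytes("0123456789", 16))      # 10 characters in varchar(16)
print(varchar_stored_bytes("0123456789", 128))     # same value in varchar(128)
print(varchar_stored_bytes("FooBar", 45, "utf-8", 3))  # ASCII text in utf8
```

The two 10-character cases come out identical, which matches the point that the declared length does not change disk usage for the same data; ASCII text in a utf8 column also still takes one byte per character.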