Varchar, Char or Binary to improve MySQL Performance - mysql

I'm using MySQL, and I've read in some places that using CHAR for indexed columns is 20% faster than using VARCHAR. Elsewhere it seems the benefit applies only when the table has no VARCHAR columns at all. Is that true?
The information that I want to store is a GUID. If the database uses the utf8 character set, is it better to store the data in a BINARY or in a CHAR column? Is it worth converting my data to BINARY every time I insert, update, or filter by the GUID? I care more about fast data access than about saving disk space.
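For reference, the conversion the question describes can be sketched in a few lines of Python. This is only an illustration: the GUID value is made up, and how the resulting bytes are bound to a BINARY(16) parameter depends on the client library in use.

```python
import uuid

# Hypothetical GUID value, used only for illustration.
guid_text = "3f2504e0-4f89-11d3-9a0c-0305e82c3301"

# Text form: 32 hex digits plus 4 hyphens, the shape stored in a CHAR(36).
text_len = len(guid_text)

# Binary form: the same value packed into raw bytes, the shape stored in a BINARY(16).
guid_bytes = uuid.UUID(guid_text).bytes
binary_len = len(guid_bytes)

# Converting back recovers the original text form.
round_trip = str(uuid.UUID(bytes=guid_bytes))

print(text_len, binary_len, round_trip == guid_text)
```

The binary form is less than half the size of the text form before any character-set overhead is counted, and the conversion in each direction is a single call.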

Any fixed-width column type will lend itself to faster seeking operations when compared to variable-width types. Unless your table is partitioned, it is also true that the variable-width types can degrade performance even on operations which do not involve them. For consideration, think through the algorithm for how you would iterate all the values in a column when all columns are fixed-width and then when some aren't:
For all-fixed width tables (or partitions), you might use simple pointer arithmetic, where you add the value of the combined data-width of all the columns in the partition each time through the loop.
If there are any variable-width columns, however, you would need to calculate the amount to add to the pointer every iteration, based on the actual on-disk "width" of the columns.
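The two loops described above can be sketched roughly as follows. This is a toy model in Python, not how any MySQL storage engine actually lays out rows: the 12-byte fixed row, the 1-byte length prefix, and the function names are all assumptions made for illustration.

```python
import struct

# Fixed-width layout (assumed): a 4-byte integer followed by an 8-byte padded string.
FIXED_ROW = struct.Struct("<I8s")
ROW_WIDTH = FIXED_ROW.size


def fixed_offset(row_index):
    """Row N starts at N * ROW_WIDTH: pure arithmetic, no data is read."""
    return row_index * ROW_WIDTH


def variable_offset(buffer, row_index):
    """Each row is a 1-byte length prefix plus that many bytes of data,
    so every earlier row must be inspected to find where row N starts."""
    offset = 0
    for _ in range(row_index):
        length = buffer[offset]
        offset += 1 + length
    return offset


values = [b"a", b"hello", b"xy", b"database"]
fixed_buffer = b"".join(FIXED_ROW.pack(i, v) for i, v in enumerate(values))
variable_buffer = b"".join(bytes([len(v)]) + v for v in values)

print(fixed_offset(3), variable_offset(variable_buffer, 3))
```

In the fixed layout the position of any row is known up front, while in the variable layout the cost of finding a row grows with the number of rows in front of it.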

Related

Performance (not space) differences between storing dates in DATE vs. VARCHAR fields

We have a MySQL-based system that stores date values in VARCHAR(10) fields, as SQL-format strings (e.g. '2021-09-07').
I am aware of the large additional space requirements of using this format over the native DATE format, but I have been unable to find any information about performance characteristics and how the two formats would differ in terms of speed. I would assume that working with strings would be slower for non-indexed fields. However, if the field is indexed I could imagine that the indexes on string fields could potentially yield much larger improvements than those on date fields, and possibly even overtake them in terms of performance, for certain tasks.
Can anyone advise on the speed implications of choosing one format over the other, in relation to the following situations (assuming the field is indexed):
Use of the field in a JOIN condition
Use of the field in a WHERE clause
Use of the field in an ORDER BY clause
Updating the index on INSERT/UPDATE
I would like to migrate the fields to the more space-efficient format but want to get some information about any potential performance implications (good or bad) that may apply. I plan to do some profiling of my own if I go down this route, but don't want to waste my time if there is a clear, known advantage of one format over the other.
Note that I am also interested in the same question for VARCHAR(19) vs. DATETIME, particularly if it yields a different answer.
Additional space is a performance issue. Databases work by reading data pages (and index pages) from disk. Bigger records require more data pages to store. And that has an effect on performance.
In other words, your date column is 11 bytes (a 1-byte length prefix plus 10 characters) versus 3 bytes for a native DATE. If you had a table with only ten date columns, that would be 110 bytes versus 30 per row, so a scan would have to read more than three times as much data.
As for your operations, if you have indexes that are being used, then the indexes will also be larger. Because of the way that MySQL handles collations for columns, comparing string values is (generally) going to be less efficient than comparing binary values.
Certain operations such as extracting the date components are probably comparable (a string function versus a date function). However, other operations such as extracting the day of the week or the month name probably require converting to a date and then to a string, which is more expensive.
More bytes being compared --> slightly longer to do the comparison.
More bytes in the column --> more space used --> slightly slower operations because of more bulk on disk, in cache, etc.
This applies to either 1+10 or 1+19 bytes for VARCHAR versus 3 for DATE or 5 for DATETIME. The "1" is for the 'length' of VARCHAR; if the strings are a consistent length, you could switch to CHAR.
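To put numbers on that, here is a back-of-the-envelope calculation in Python using the per-value sizes quoted above. The row count is a made-up figure, and the totals ignore row headers, page overhead, and indexes.

```python
# Per-value storage in bytes, using the figures quoted above.
VARCHAR_LENGTH_PREFIX = 1

sizes = {
    "VARCHAR(10) date string": VARCHAR_LENGTH_PREFIX + len("2021-09-07"),
    "DATE": 3,
    "VARCHAR(19) datetime string": VARCHAR_LENGTH_PREFIX + len("2021-09-07 12:34:56"),
    "DATETIME": 5,
}


def bytes_for_rows(type_name, row_count):
    """Raw column bytes for row_count rows (no row, page, or index overhead)."""
    return sizes[type_name] * row_count


rows = 1_000_000  # hypothetical table size

for type_name in sizes:
    print(f"{type_name}: {bytes_for_rows(type_name, rows):,} bytes")
```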
A BTree is essentially the same whether the value is an Int, Float, Decimal, Enum, Date, or Char. VARCHAR is slightly different in that it has a 'length'; I don't see this as a big issue in the grand scheme of things.
The number of rows that need to be fetched is the most important factor in the speed of a query. Other things, such as datatype size, come further down the list of what slows things down.
There are lots of DATETIME functions that you won't be able to use. This may lead to a table scan instead of using an INDEX. That would have a big impact on performance. Let's see some specific SQL queries.
Using CHARACTER SET ascii COLLATE ascii_bin for your date/datetime columns would make comparisons slightly faster.
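A plain byte-wise comparison is safe for these columns because zero-padded 'YYYY-MM-DD' strings sort in chronological order. A quick check in Python, with made-up sample dates (this only demonstrates the ordering property, not the speed difference):

```python
from datetime import date

# Hypothetical sample values.
dates = [date(2021, 9, 7), date(2020, 12, 31), date(2021, 1, 15), date(1999, 6, 1)]

# The same values as ASCII byte strings, as a binary collation would compare them.
as_bytes = [d.isoformat().encode("ascii") for d in dates]

sorted_by_bytes = sorted(as_bytes)
sorted_by_date = [d.isoformat().encode("ascii") for d in sorted(dates)]

print(sorted_by_bytes == sorted_by_date)
```

This holds only while every value keeps the fixed, zero-padded format; a value such as '2021-9-7' would break the ordering.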

MySQL multiple rows vs storing values all in one string

I was just wondering about the efficiency of storing a large amount of boolean values inside of a CHAR or VARCHAR
data
"TFTFTTF"
vs
isFoo isBar isText
false true false
Would switching to storing these values this way be worth the performance cost? I figured it would be easier to set a single value than to have all of those other fields.
thanks
Don't do it. MySQL offers types such as char(1) and tinyint that occupy the same space as a single character. In addition, MySQL offers enumerated types, if you want your flags to have more than one value -- and for the values to be recognizable.
That last point is the critical point. You want your code to make sense. The string 'FTF' does not make sense. The columns isFoo, isBar, and isText do make sense.
There is no need to obfuscate your data model.
This would be a bad idea. Not only does it offer no advantage in terms of space used, it also hurts query performance and the comprehensibility of your data model.
Disk Space
In terms of storage usage, it makes no real difference whether the data is stored in a single varchar(n) or char(n) column or in multiple tinyint, char(1), or bit(1) columns. Only with varchar would you need 1 to 2 bytes more disk space per entry.
For more information about the storage requirements of the different data types, see the MySQL documentation.
Query Performance
If boolean values were stored in a VARCHAR, the search for all entries where a specific value is true would take much longer, since string operations would be necessary to find the correct entries. Even when searching for a combination of boolean values such as "TFTFTFTFTT", the query would still take longer than if the boolean values were stored in individual columns. Furthermore, you can assign indexes to single columns like isFoo or isBar, which has a great positive effect on query performance.
Data Model
A data model should be as comprehensible as possible and if possible independent of any kind of implementation considerations.
Realistically, a database field should only contain one atomic value, that is to say: a value that can't be subdivided into separate parts.
Columns that do not contain atomic values:
cannot be sorted
cannot be grouped
cannot be indexed
So let's say you want to find all rows where isFoo is true. You wouldn't be able to do it without string operations like "find the first character in this string and see if it's equal to 'T'". This would imply a full table scan with every query, which would degrade performance quite dramatically.
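The difference shows up even outside the database. Below is a small Python sketch that contrasts the two shapes; the flag order and sample rows are assumptions for illustration, with isFoo taken to be the first character.

```python
# With a packed string, position is the only link between a character and its meaning.
FLAG_ORDER = ("isFoo", "isBar", "isText")  # assumed order


def decode_flags(packed):
    """Turn a packed string such as 'FTF' into named boolean fields."""
    return {name: packed[i] == "T" for i, name in enumerate(FLAG_ORDER)}


rows_packed = ["FTF", "TTF", "TFT"]  # hypothetical rows
rows_columns = [decode_flags(p) for p in rows_packed]

# Packed form: every row has to be sliced by position to test a single flag.
foo_rows_packed = [p for p in rows_packed if p[FLAG_ORDER.index("isFoo")] == "T"]

# Column form: the condition names the field it tests.
foo_rows_columns = [r for r in rows_columns if r["isFoo"]]

print(foo_rows_packed, len(foo_rows_columns))
```

If a flag is ever added in the middle or the order changes, every positional lookup silently changes meaning, while the named form keeps working.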
It depends on what you want to do after storing the data in this format.
After retrieving the record you will have to do further processing on the server side, which worsens performance if you want to load the data by checking specific conditions. The logic on the server would also become more complex.
The columns isFoo, isBar, and isText would help you to write queries better.

Performance benefit in using correct data types

Is there any performance benefit in using the exact data types needed for a column? Or is it just storage optimisation?
For example, I'm creating a users table and I know for certain that there will only be 200 users in total. When I'm manipulating the data on the server, doing some select/update/insert/delete, is there any performance difference between using TINYINT UNSIGNED for the users_id column and using just INT?
The same applies to the user's name. I know that, for now, the longest name is 48 characters, but I can't rule out a future user with a 65-character name. Is there any performance benefit in reserving only the length needed for now, using VARCHAR(48), or can I avoid constantly checking the allowed column length for each new user and just use VARCHAR(255)?
There is little advantage in either case.
For the number, you do gain a slight performance advantage. Typically, an integer is 4 bytes and a tinyint is 1 byte. So, if you have multiple smaller fields, then your records will be smaller. Smaller records then imply fewer data pages and ultimately slightly faster queries. This shows up when you start to have lots of records.
For the varchar, you don't even have that advantage. Both varchar(48) and varchar(255) occupy the same amount of space (there is one additional byte for lengths greater than 255). The values determine the space for this data type.
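A rough way to see this is to compute the stored size directly. The sketch below assumes the usual rule of a 1-byte length prefix when the column can never need more than 255 bytes and a 2-byte prefix otherwise; the function name and the sample name are made up for illustration.

```python
def varchar_storage_bytes(value, max_chars, charset="latin1", max_bytes_per_char=1):
    """Approximate bytes used by one VARCHAR(max_chars) value: length prefix + data."""
    if len(value) > max_chars:
        raise ValueError("value does not fit in the declared column length")
    data = value.encode(charset)
    # 1-byte prefix if the column can never exceed 255 bytes, otherwise 2 bytes.
    prefix = 1 if max_chars * max_bytes_per_char <= 255 else 2
    return prefix + len(data)


name = "Alice Example"  # hypothetical 13-character value

print(varchar_storage_bytes(name, 48))   # declared as VARCHAR(48)
print(varchar_storage_bytes(name, 255))  # declared as VARCHAR(255)
print(varchar_storage_bytes(name, 256))  # one extra prefix byte past 255
```

The first two results are identical because only the stored value and the prefix count, not the declared maximum.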
In other cases, it can make a big difference. In particular, storing dates as the native format is usually important, both to take advantage of date/time functions and to make better use of indexes.

Does always using 255 chars for varchar fields decrease performance?

I usually use the maximum number of chars possible for varchar fields, so in most cases I set 255 but only use 16 chars in the columns.
Does this decrease performance for my database?
When it comes to storage, a VARCHAR(255) column will take up 1 byte to store the length of the actual value plus the bytes required to store the actual value.
For a latin1 VARCHAR(255) column, that's at most 256 bytes. For a UTF8 column, where each character can take up to 3 bytes (though rarely), the data alone can reach 765 bytes, and the length prefix grows to 2 bytes because the value may exceed 255 bytes, for a maximum of 767 bytes. The maximum index length for a single column in InnoDB is 767 bytes, which is perhaps why some declare 255 as the maximum supported column length.
So, again, when storing the value, it only takes up as much room as is actually needed.
However, if the column is indexed, the index automatically allocates the maximum possible size so that each node in the index has enough room to store any possible value. When searching through an index, MySQL loads the nodes in specific byte size chunks at a time. Larger nodes mean fewer nodes per read, which means it takes longer to search the index.
MySQL will also use the maximum size when storing the values in a temp table for sorting.
So, even if you aren't using indexes, but are ever performing a query that can't utilize an index for sorting, you will get a performance hit.
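Taking the premise above at face value (that the declared maximum is what gets reserved), the effect can be estimated with simple arithmetic. The 16 KB block size, the latin1 charset, and the 1-byte length prefix are assumptions for illustration; real engines add their own per-entry overhead.

```python
BLOCK_BYTES = 16 * 1024  # assumed size of one chunk read at a time


def reserved_bytes(max_chars, max_bytes_per_char=1, length_prefix=1):
    """Bytes reserved per value if the full declared length is always allocated."""
    return max_chars * max_bytes_per_char + length_prefix


def values_per_block(max_chars):
    """How many maximum-size entries fit into one block."""
    return BLOCK_BYTES // reserved_bytes(max_chars)


print(values_per_block(16))   # column declared as VARCHAR(16)
print(values_per_block(255))  # column declared as VARCHAR(255)
```

Under these assumptions the shorter declaration fits roughly fifteen times as many entries into each block, which is the effect being described.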
Therefore, if performance is your goal, setting any VARCHAR column to 255 characters should not be a rule of thumb. Instead, you should use the minimum required.
There may be edge cases where you'd rather suffer the performance every day so that you never have to lock a table completely to increase the size of a column, but I don't think that's the norm.
One possible exception is if you are joining on a VARCHAR column between two tables. MySQL says:
MySQL can use indexes on columns more efficiently if they are declared as the same type and size.
In that case, you might use the max size between the two.
Whenever you're talking about "performance" you can only find out one way: Benchmarking.
In theoretical terms there's no difference between VARCHAR(20) and VARCHAR(255) if they're both populated with the same data. Keep in mind if you get your length wrong you will have massive truncation problems and MySQL does not warn you before it starts chopping data to fit.
I try to avoid setting limits on VARCHAR columns unless the data would be completely invalid if it was longer. For instance, two-character ISO country codes can be stored in VARCHAR(2) because longer strings are meaningless. For other things, especially names or phone numbers, limiting the length is potentially and probably harmful.
Still, you will want to test any schema you create to be sure it meets your performance requirements. I expect you'd have a hard time detecting any difference at all between VARCHAR(25) and VARCHAR(16).
There are two ways in which this will decrease performance:
If you're loading those columns many, many times, performing a join on the column, or doing anything else that means they need to be accessed a large number of times. The number of times depends on your machine, but think on the order of millions.
If you're always filling the field (using 20 chars in a varchar(20)), then the length checks add a little overhead whenever you perform an insert.
The best way to determine this, though, is to benchmark your database.

Why do we limit the length of column values in MySQL

When creating the database for our application, we limited the lengths of the database columns.
example -
String (200)
int (5)
etc
Does this have any effect on speed, or any other effect?
First of all, one does not limit the length of a "database". Instead, one does limit the size of columns of tables in a database.
Why do we do this, you ask?
We don't want to waste any space for data that's never going to use it (that's what the varchar, varbinary and the like are for).
It's a best practice because it forces you to think of your data structure BEFORE you actually use it.
The less data there is the faster the processing of the application (that's a tautology).
It makes it easier to validate your data if you know exactly how much space it is allowed to take.
Full text indexes gain greatly when limited in size
One reason I can think of is that when you don't specify the length of a column data type, the MySQL engine assumes a default length that may be much larger than the actual data stored in that column. So it is a best practice never to ignore the length property of a column.
Limiting the length of database fields ensures validation of data, you won't get any unexpected data of a length other than what has been specified. Also certain fields cannot be indexed such as LONG so choose appropriately and wisely. With regard to performance the effect is negligible. You need to also think about the data itself, for example storing data in a unicode encoding such as UTF-8 may increase the storage requirements.