Decimal vs. Int in MySQL?

Are there any performance difference between decimal(10,0) unsigned type and int(10) unsigned type?

It may depend on the version of MySQL you are using. See here.
Prior to MySQL 5.0.3, the DECIMAL type was stored as a string and would typically be slower.
However, since MySQL 5.0.3 the DECIMAL type is stored in a binary format so with the size of your DECIMAL above, there may not be much difference in performance.
The main performance issue would have been the amount of space taken up by the different types (with DECIMAL taking more space). With MySQL 5.0.3+ this appears to be less of an issue; however, if you will be performing numeric calculations on the values as part of the query, there may be some performance difference. This may be worth testing, as there is no indication in the documentation that I can see.
Edit:
With regards to the int(10) unsigned, I took this at face value as just being a 4-byte int. However, this has a maximum value of 4294967295, which strictly doesn't provide the same range of numbers as a DECIMAL(10,0) unsigned.
As @Unreason pointed out, you would need to use a BIGINT to cover the full range of 10-digit numbers, pushing the size up to 8 bytes.
A common mistake when specifying numeric column types in MySQL is to think that the number in the brackets has an impact on the size of the number that can be stored. It doesn't. The number range is based purely on the column type and whether it is signed or unsigned. The number in the brackets is a display width for results and has no impact on the values stored in the column. It will also have no impact on the display of the results unless you specify the ZEROFILL option on the column as well.
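A quick sketch of the display-width behaviour (hypothetical table and column names):

```sql
-- Display width does not limit what can be stored: INT(4) is still a
-- full 4-byte integer. Only ZEROFILL makes the width visible in output.
CREATE TABLE width_demo (
  a INT(4) UNSIGNED,
  b INT(4) UNSIGNED ZEROFILL
);
INSERT INTO width_demo VALUES (123456, 42);
-- SELECT * FROM width_demo;
-- a comes back as 123456 (not truncated); b displays as 0042.
```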

According to the MySQL data storage documentation, your DECIMAL will require:
DECIMAL(10,0): 4 bytes for 9 digits and 1 byte for the remaining 10th digit, so five bytes in total (assuming my reading of the documentation is correct).
INT(10): the full 10-digit range will need BIGINT, which is 8 bytes.
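To make the comparison concrete, a sketch of the two column definitions being discussed (hypothetical table name):

```sql
-- DECIMAL(10,0) UNSIGNED packs 10 digits into 5 bytes (4 + 1).
-- Covering the same 10-digit range with a native integer requires
-- BIGINT UNSIGNED (8 bytes), since INT UNSIGNED tops out at 4294967295.
CREATE TABLE range_demo (
  d DECIMAL(10,0) UNSIGNED,  -- max 9999999999, packed storage
  b BIGINT UNSIGNED          -- native machine integer
);
```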
The difference is that DECIMAL is packed, and some operations on such a data type might be slower than on normal INT types, which map directly to machine-represented numbers.
Still, I would run your own tests to confirm the above reasoning.
EDIT:
I noticed that I did not elaborate on the obvious point: assuming the above logic is sound, the difference in size is 60% more space for the BIGINT variant (8 bytes versus 5).
However, this does not translate directly into penalties, because data is normally not written byte by byte. For selects/updates of many rows you should see the performance loss/gain, but when selecting/updating a small number of rows the filesystem will fetch blocks from the disk(s), which will normally read/write multiple columns anyway.
The size (and speed) of indexes might be more directly impacted.
However, the question on how the packing influences various operations still remains open.

According to this similar question, yes, there is potentially a big performance hit because of the difference in the way DECIMAL and INT values are handled by the CPU when doing calculations.
See: Is there a performance hit using decimal data types (MySQL / Postgres)

I doubt such a difference is performance-relevant at all.
Most performance issues are tied to proper database design and indexing strategy, with server/hardware tuning as the next level.

Related

Performance (not space) differences between storing dates in DATE vs. VARCHAR fields

We have a MySQL-based system that stores date values in VARCHAR(10) fields, as SQL-format strings (e.g. '2021-09-07').
I am aware of the large additional space requirements of using this format over the native DATE format, but I have been unable to find any information about performance characteristics and how the two formats would differ in terms of speed. I would assume that working with strings would be slower for non-indexed fields. However, if the field is indexed I could imagine that the indexes on string fields could potentially yield much larger improvements than those on date fields, and possibly even overtake them in terms of performance, for certain tasks.
Can anyone advise on the speed implications of choosing one format over the other, in relation to the following situations (assuming the field is indexed):
Use of the field in a JOIN condition
Use of the field in a WHERE clause
Use of the field in an ORDER BY clause
Updating the index on INSERT/UPDATE
I would like to migrate the fields to the more space-efficient format but want to get some information about any potential performance implications (good or bad) that may apply. I plan to do some profiling of my own if I go down this route, but don't want to waste my time if there is a clear, known advantage of one format over the other.
Note that I am also interested in the same question for VARCHAR(19) vs. DATETIME, particularly if it yields a different answer.
Additional space is a performance issue. Databases work by reading data pages (and index pages) from disk. Bigger records require more data pages to store. And that has an effect on performance.
In other words, your date column is 11 bytes versus 3 bytes. If you had a table with only ten date columns, that would be 110 bytes versus 30, requiring almost four times as much time to scan the data.
As for your operations, if you have indexes that are being used, then the indexes will also be larger. Because of the way that MySQL handles collations for columns, comparing string values is (generally) going to be less efficient than comparing binary values.
Certain operations such as extracting the date components are probably comparable (a string function versus a date function). However, other operations such as extracting the day of the week or the month name probably require converting to a date and then to a string, which is more expensive.
More bytes being compared --> slightly longer to do the comparison.
More bytes in the column --> more space used --> slightly slower operations because of more bulk on disk, in cache, etc.
This applies to either 1+10 or 1+19 bytes for VARCHAR versus 3 for DATE or 5 for DATETIME. The "1" is for the 'length' of VARCHAR; if the strings are a consistent length, you could switch to CHAR.
A BTree is essentially the same whether the value is an Int, Float, Decimal, Enum, Date, or Char. VARCHAR is slightly different in that it has a 'length'; I don't see this is a big issue in the grand scheme of things.
The number of rows that need to be fetched is the most important factor in the speed of a query. Other things, such as datatype size, come further down the list of what slows things down.
There are lots of DATETIME functions that you won't be able to used. This may lead to a table scan instead of using an INDEX. That would have a big impact on performance. Let's see some specific SQL queries.
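As a sketch of the kind of query that behaves differently between the two formats (hypothetical table and column names):

```sql
CREATE TABLE events (
  d_str VARCHAR(10),  -- dates as strings, e.g. '2021-09-07'
  d_nat DATE,
  KEY (d_str),
  KEY (d_nat)
);

-- Wrapping the column in a function prevents index use: every row
-- must be converted and checked, i.e. a full scan.
SELECT COUNT(*) FROM events
WHERE MONTH(STR_TO_DATE(d_str, '%Y-%m-%d')) = 9;

-- A range predicate is index-friendly; the native DATE version also
-- compares 3-byte binary values instead of strings.
SELECT COUNT(*) FROM events
WHERE d_nat >= '2021-09-01' AND d_nat < '2021-10-01';
```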
Using CHARACTER SET ascii COLLATE ascii_bin for your date/datetime columns would make comparisons slightly faster.

Performance benefit in using correct data types

Is there any performance benefit in using the exact data types needed for a column? Or is it just storage optimisation?
For example, I'm creating a users table and I know for certain that there will only be 200 users in total. When I'm manipulating the data on the server, doing some select/update/insert/delete, is there any performance difference between using TINYINT UNSIGNED for the users_id column or using just INT?
The same applies to the user's name. I know, for now, that the user with the longest name is 48 characters, but I don't know whether a new user with a 65-character name might be inserted in the future. Is there any performance benefit in reserving only the length needed for now, using VARCHAR(48), or can I avoid having to constantly check the column's allowed length for each new user and just use VARCHAR(255)?
There is little advantage in either case.
For the number, you do gain a slight performance advantage. Typically, an integer is 4 bytes and a tinyint is 1 byte. So, if you have multiple smaller fields, your records will be smaller. Smaller records imply fewer data pages and ultimately slightly faster queries. This shows up when you start to have lots of records.
For the varchar, you don't even have that advantage. Both varchar(48) and varchar(255) occupy the same amount of space (there is one additional length byte for declared maximum lengths greater than 255). The values determine the space for this data type.
In other cases, it can make a big difference. In particular, storing dates as the native format is usually important, both to take advantage of date/time functions and to make better use of indexes.
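As an illustration of the sizing points above (hypothetical table, assuming the 200-user scenario from the question):

```sql
CREATE TABLE users (
  user_id TINYINT UNSIGNED PRIMARY KEY,  -- 1 byte; 0-255 covers 200 users
  name    VARCHAR(64) NOT NULL           -- headroom over the current
                                         -- longest name (48 characters)
);
```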

always use 255 chars for varchar fields decreases performance?

I usually use the maximum characters possible for VARCHAR fields, so in most cases I set 255 but only use 16 characters in the columns...
Does this decrease performance for my database?
When it comes to storage, a VARCHAR(255) column will take up 1 byte to store the length of the actual value plus the bytes required to store the actual value.
For a latin1 VARCHAR(255) column, that's at most 256 bytes. For a utf8 column, where each character can take up to 3 bytes (though rarely), the maximum size is 765 bytes. As the maximum index length for a single column in InnoDB is 767 bytes, this is perhaps the reason some declare 255 as the maximum supported column length.
So, again, when storing the value, it only takes up as much room as is actually needed.
However, if the column is indexed, the index automatically allocates the maximum possible size so that each node in the index has enough room to store any possible value. When searching through an index, MySQL loads the nodes in specific byte size chunks at a time. Large nodes means less nodes per read, which means it takes longer to search the index.
MySQL will also use the maximum size when storing the values in a temp table for sorting.
So, even if you aren't using indexes, but are ever performing a query that can't utilize an index for sorting, you will get a performance hit.
Therefore, if performance is your goal, setting any VARCHAR column to 255 characters should not be a rule of thumb. Instead, you should use the minimum required.
There may be edge cases where you'd rather suffer the performance every day so that you never have to lock a table completely to increase the size of a column, but I don't think that's the norm.
One possible exception is if you are joining on a VARCHAR column between two tables. MySQL says:
MySQL can use indexes on columns more efficiently if they are declared as the same type and size.
In that case, you might use the max size between the two.
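A sketch of that join case (hypothetical tables): declaring both sides with the same type and length keeps the indexes usable for the join.

```sql
CREATE TABLE products (sku VARCHAR(32) NOT NULL, KEY (sku));
CREATE TABLE orders   (sku VARCHAR(32) NOT NULL, KEY (sku));

-- Both sku columns match in type, length, and (implicitly) collation,
-- so MySQL can use the indexes efficiently for this join.
SELECT o.sku
FROM orders o
JOIN products p ON o.sku = p.sku;
```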
Whenever you're talking about "performance" you can only find out one way: Benchmarking.
In theoretical terms there's no difference between VARCHAR(20) and VARCHAR(255) if they're both populated with the same data. Keep in mind that if you get your length wrong you will have massive truncation problems, and MySQL does not warn you before it starts chopping data to fit (unless strict SQL mode is enabled, in which case the insert fails instead).
I try to avoid setting limits on VARCHAR columns unless the data would be completely invalid if it was longer. For instance, two-character ISO country codes can be stored in VARCHAR(2) because longer strings are meaningless. For other things, especially names or phone numbers, limiting the length is potentially and probably harmful.
Still, you will want to test any schema you create to be sure it meets your performance requirements. I expect you'd have a hard time detecting any difference at all between VARCHAR(25) and VARCHAR(16).
There are two ways in which this will decrease performance.
First, if you're loading those columns many, many times, performing a join on the column, or otherwise accessing them a large number of times. The number of times depends on your machine, but think on the order of millions.
Second, if you're always filling the field (using 20 characters in a VARCHAR(20)), the length checks add a little overhead on every insert.
The best way to determine this, though, is to benchmark your database.

Roughly how much faster is int compared to bigint when indexing and used in ORDER BY?

First, I know there are many variables, but I just need something on which to base my decision to use int or bigint.
I have a rank field that is 9 to 10 digits long, so I either have to decrease the precision to ensure it's never bigger than 9 digits or use bigint. My question is whether there is any information on the approximate speed of reads/writes, indexing, and using int vs. bigint in an ORDER BY clause.
Roughly how much faster is int compared to bigint when indexing and used in ORDER BY?
I'm asking for any benchmarks you may have done, or in your own experience.
Edit: my ranking algorithm creates a rank that is a float and grows by the day; by 2020 it will be 7745.9570846. I convert this number to an integer by deciding how precise I want to be. For example, if I decide to be precise up to 5 digits after the dot, the converted number would be 774595708, which fits in an int even though I lose some precision. I could have more precision by using bigint, and also ensure I won't run out of space shortly after 2020, but because I am not sure what the speed tradeoffs are, I'm not sure if I should or not.
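A sketch of that scaling scheme (hypothetical table and column names):

```sql
CREATE TABLE ranks (
  user_id     INT UNSIGNED PRIMARY KEY,
  rank_scaled INT UNSIGNED NOT NULL,  -- rank * 100000, 5 decimal digits kept
  KEY (rank_scaled)
);

-- 7745.9570846 * 100000 = 774595708.46; FLOOR gives 774595708,
-- which fits comfortably in INT UNSIGNED (max 4294967295).
INSERT INTO ranks VALUES (1, FLOOR(7745.9570846 * 100000));
```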
Ints may be slightly (though probably not noticeably) faster than bigints because there is less data to store, compare, etc.
The bottom line is to use the data type that fits your data, not based on some premature optimization. Build your app first, then optimize if there is a problem.

mysql , bigint or decimal for storing > 32 bit values but less than 64 bits

We need to store integer values of up to 2^38. Are there any reasons to use DECIMAL(12,0), or should we use BIGINT?
In my view, bigint would be better. It's stored as an integer that MySQL will understand natively without any conversion required, and will therefore (I imagine) be faster at manipulating. You should therefore expect MySQL to be marginally more efficient if you use bigint.
According to this manual page, the first 9 digits of your number will be stored in a four-byte block and the remaining digits (you require up to 12) will be stored in a two-byte block. That means your column takes up 6 bytes per row, as opposed to 8 bytes for bigint. I would suggest that unless a) you are going to be storing a truly obscene number of rows, such that the space taken up is a serious concern, and b) you are going to need to query the data in question very little, you should go with bigint.
This is an assumption, but I think it's a good one: on a 64-bit machine, I'm pretty sure accessing a 64-bit integer is very efficient, so you should stick with BIGINT. I don't know off-hand how MySQL stores decimals, but I can't imagine it would do so more efficiently than storing a 64-bit integer.
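For comparison, a sketch of the two candidate columns (hypothetical table name):

```sql
CREATE TABLE big_values (
  v_big BIGINT UNSIGNED,  -- 8 bytes; native 64-bit integer, no conversion
  v_dec DECIMAL(12,0)     -- 6 bytes; 4 for the first 9 digits + 2 for the rest
);
-- 2^38 = 274877906944 (12 digits), so both columns can hold it.
```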