I seem to see a lot of people arbitrarily assigning large sizes to primary/foreign key fields in their MySQL schemas, such as INT(11) and even BIGINT(20) as WordPress uses.
Now correct me if I'm wrong, but even an INT(4) would support (unsigned) values up to over 4 billion. Change it to INT(5) and you allow for values up to a quadrillion, which is more than you would ever need, unless possibly you're storing geodata at NASA/Google, which I'm sure most of us aren't.
Is there a reason people use such large sizes for their primary keys? Seems like a waste to me...
The size is neither bits nor bytes. It's just the display width that is used when the field has ZEROFILL specified.
and
INT[(M)] [UNSIGNED] [ZEROFILL] A normal-size integer. The signed range is -2147483648 to 2147483647. The unsigned range is 0 to 4294967295.
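The quoted ranges follow from the fixed 32-bit storage of INT, not from the number in parentheses. A minimal sketch of that arithmetic in Python (the function name int_range is made up for illustration; this only models the math and does not talk to MySQL):

```python
def int_range(bits, unsigned=False):
    """Return (min, max) for a two's-complement integer of the given bit width."""
    if unsigned:
        return 0, 2**bits - 1
    return -(2**(bits - 1)), 2**(bits - 1) - 1


print(int_range(32))                 # (-2147483648, 2147483647)
print(int_range(32, unsigned=True))  # (0, 4294967295)
```

Nothing in the calculation depends on a display width, so INT(4), INT(5) and INT(11) all share these limits.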
See this explanation.
I don't see any good reason to use a number larger than 32-bit integer for indexing data in normal business-sized databases. Most of them have maybe millions of records (or that order of magnitude).
Related
I am running a MySQL 5.7.30-0ubuntu0.16.04.1-log Server where I have the option of saving in char(4) or in smallint(5, unsigned).
There will be a primary index on the column and the key will be used as a reference across tables.
What is faster? Char or Int?
Unsigned SMALLINT values use two bytes and have values in the range [0, 65535]. CHAR(4) values take four bytes. So, indexing SMALLINT values will make for a smaller index. Smaller is faster. Plus indexes on character columns usually have all sorts of character-set and case-insensitivity monkey business built in to them, which also takes time and space.
But for a table with at most 65K rows, the effect of this choice will be so small you'll have trouble measuring it. If you build something that's hard to debug, you'll spend far more of your precious time, and ten thousand times as much computer time, debugging it than the choice will ever save.
Design your tables so they match your application. If you're using a four-digit number use SMALLINT.
The next person to work on your code (even if that person is you a year from now) will thank you for a clear implementation.
And keep in mind that MySQL ignores the number in parentheses on INT declarations as far as storage and range are concerned. SMALLINT(4), SMALLINT(5), and SMALLINT all store precisely the same thing. MySQL uses the native processor integer datatypes: TINYINT is an 8-bit number, SMALLINT a 16-bit number, INT a 32-bit number, and BIGINT a 64-bit number. Likewise FLOAT is a 32-bit IEEE 754 floating point number and DOUBLE a 64-bit one. The number of digits in SMALLINT(4) is only a display width, which matters just when ZEROFILL is used.
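To make the point about the parentheses concrete, here is a small Python sketch. The lookup table and the function name storage_bytes are my own illustration, not a MySQL API; the byte sizes are the ones documented for MySQL integer types.

```python
import re

# Storage size in bytes per MySQL integer type.
STORAGE_BYTES = {
    "TINYINT": 1,
    "SMALLINT": 2,
    "MEDIUMINT": 3,
    "INT": 4,
    "INTEGER": 4,
    "BIGINT": 8,
}


def storage_bytes(declaration):
    """Return the storage size for a declaration such as 'SMALLINT(4)'.

    The optional '(M)' display width is matched and then deliberately ignored.
    """
    match = re.fullmatch(r"\s*([A-Za-z]+)\s*(?:\(\s*\d+\s*\))?\s*", declaration)
    if match is None:
        raise ValueError(f"not an integer type declaration: {declaration!r}")
    return STORAGE_BYTES[match.group(1).upper()]


for decl in ("SMALLINT(4)", "SMALLINT(5)", "SMALLINT"):
    print(decl, storage_bytes(decl))  # each line ends in 2
```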
As mentioned by O. Jones, SMALLINT will be faster and more space-efficient.
This is related to the following answer: mysql-char-vs-int
Also, MySQL Documentation:
CHAR and VARCHAR types
Integer Types
Case 1: The difference between CHAR(4) and SMALLINT is insignificant. It should not influence your choice of datatypes. Instead, use the datatypes that match the data.
Case 2: If you are comparing TINYINT to VARCHAR(255), the answer is probably different. Note that there is a much bigger difference in the choices.
Case 3: If the choice comes down to whether to "normalize" a column, there are arguments either way. I much prefer using a CHAR(2) for country_code than normalizing in order to shrink to a TINYINT. The overhead of extra normalization always(?) outweighs the space savings.
Another consideration: How many secondary keys are on the table? And how many other tables will you be joining to?
Case 4: PRIMARY KEY(big_string) but no secondary keys. There is possibly no advantage in switching to an int.
Case 5: Since secondary keys include the PK, consider:
PRIMARY KEY(big_string),
INDEX(foo),
INDEX(bar)
versus
PRIMARY KEY(id), -- surrogate AUTO_INCREMENT
INDEX(big_string),
INDEX(foo),
INDEX(bar)
The latter will take less disk space.
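A back-of-the-envelope estimate shows why. This Python sketch assumes each secondary index entry holds its own key plus a copy of the primary key, as stated above. The column sizes and row count are hypothetical, and page overhead, fill factor and the extra id column stored in each row are all ignored.

```python
def secondary_index_bytes(rows, pk_bytes, secondary_key_bytes):
    """Rough total size of all secondary indexes.

    Each entry is counted as its own key plus a copy of the primary key.
    """
    return rows * sum(key + pk_bytes for key in secondary_key_bytes)


ROWS = 1_000_000
BIG_STRING, FOO, BAR, ID = 40, 4, 4, 4  # hypothetical column sizes in bytes

# Layout 1: PRIMARY KEY(big_string), INDEX(foo), INDEX(bar)
string_pk = secondary_index_bytes(ROWS, BIG_STRING, [FOO, BAR])

# Layout 2: PRIMARY KEY(id), INDEX(big_string), INDEX(foo), INDEX(bar)
surrogate_pk = secondary_index_bytes(ROWS, ID, [BIG_STRING, FOO, BAR])

print(string_pk)     # 88000000
print(surrogate_pk)  # 60000000
```

With these made-up sizes the surrogate key wins even though it adds a third secondary index; the gap grows with every further secondary index and shrinks as big_string gets shorter.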
Another consideration: Fetching a row is far more costly than comparing an int or string. My point is that you should not worry about comparison performance; you should look at the bigger picture when optimizing.
Case 6: USA 5-digit zip code. CHAR(5) (5 bytes) is reasonable. MEDIUMINT(5) UNSIGNED ZEROFILL (3 bytes) is better because it does everything better. (And it is a very rare case of the *INT(n) being meaningful.)
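For illustration, the effect of a display width of 5 with ZEROFILL can be mimicked in Python. The helper name zerofill is hypothetical and this only imitates the padding; it does not query MySQL.

```python
def zerofill(value, width):
    """Left-pad a non-negative integer with zeros up to `width` characters."""
    return str(value).zfill(width)


print(zerofill(2134, 5))     # 02134
print(zerofill(90210, 5))    # 90210
print(zerofill(1234567, 5))  # 1234567
```

The last line shows that the width pads but never truncates: a value with more digits than the width is still shown in full.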
And the debate goes on and on.
I have created one table and I want to store numbers from 1 to 60 numbers only inside the field.
What should I put in the datatype of the table field? Should I use TINYINT (4) datatype?
"Best" data type is open to interpretation. Here are three options:
numeric(2, 0)
varchar(2)
tinyint(2)
These have different sizes, but that doesn't make them "best" -- except under certain circumstances where storage space is a primary concern. I am guessing that your "numbers" are not really numbers, but are codes of some sort that vary from 1 to 60.
If these are referencing a reference table, then tinyint makes sense as the key, because keys are often numbers. However, I often use int for such keys. The extra three bytes usually have little impact on performance.
If the code is zero-padded (so '01' rather than '1'), then char(2) is the appropriate type. It might take more space, but it accurately represents the value.
If these are indeed numbers -- like addition or multiplication is defined -- then tinyint is definitely the most appropriate type.
Yes, TINYINT is the best option. Its signed range is -128 to 127 (0 to 255 if UNSIGNED), so 1 to 60 fits easily.
Documentation Link: https://dev.mysql.com/doc/refman/8.0/en/integer-types.html
I'm trying to increase database performance for one of my customers.
Many tables have BIGINT(250). I've read in the MySQL documentation that a BIGINT is at most 8 bytes. I don't understand why it is possible to have a BIGINT(250) when the maximum is 8.
The same goes for the INT fields: some fields are INT(25), but an INT is at most 4 bytes.
Am I reading this correctly or not?
And what does MySQL do with sizes that are bigger than the field size?
When dealing with types such as INT, BIGINT, etc., the numbers inside the parentheses are for display width only, unlike e.g. VARCHAR where it defines the storage size as well.
If the display width is this big, you can safely assume the designer just had a moment of insanity: a width larger than the number of digits in the type's maximum value is basically useless.
It would be more important to determine whether the field is signed. Defining it as BIGINT UNSIGNED effectively doubles the range for fields that should never be negative, such as an identifier.
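The doubling is easy to check with plain arithmetic. A short Python sketch (the variable names are mine):

```python
BITS = 64  # BIGINT is always 64 bits, whatever display width is declared

signed_max = 2**(BITS - 1) - 1  # one bit is spent on the sign
unsigned_max = 2**BITS - 1      # all 64 bits hold the magnitude

print(signed_max)    # 9223372036854775807
print(unsigned_max)  # 18446744073709551615
print(unsigned_max == 2 * signed_max + 1)  # True
```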
I'm using phpMyAdmin to create my table structures.
I can read from the documentation pages on MySQL about size limits for Integer Types:
MySQL Integer Types Reference
So here is where I'm getting a little confused with creating a column.
I want to create a column in the table: tbl_note_categories called notescounter
I don't foresee myself creating thousands of noteids in the tbl_notes with any specific categoryid. But I do believe I'd create hundreds of notes to each categoryid.
I'm at that point of choosing between: tinyint, smallint, mediumint.
According to the documentation link above, I'm guessing smallint is my best choice.
So here's my confusion. PhpMyAdmin asks for a Length/Values parameter to be specified.
I'm going to make sure this new column (notescounter) is unsigned, giving me up to 65535.
Does that mean I need the Length/Values to be (5)?
I'm guessing Length is character length, but I'm not sure. (comparing to varchar)
No, this is a common misconception about MySQL. In fact, the "length" has no effect on the size of an integer or the range of values it can store.
TINYINT is always 8 bits and can store 2^8 distinct values.
SMALLINT is always 16 bits and can store 2^16 distinct values.
INT is always 32 bits and can store 2^32 distinct values.
BIGINT is always 64 bits and can store 2^64 distinct values.
There's also a MEDIUMINT, but the engineers who work on MySQL tell me MEDIUMINT always gets promoted to a 32-bit INT internally, so there's actually no benefit to using MEDIUMINT.
The length is only for display, and this only matters if you use the ZEROFILL option.
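So the choice of type comes down to the largest value you expect, not to any length. A rough Python sketch of that decision, using the bit widths listed above (the function name is hypothetical, and MEDIUMINT is left out only because of the argument above):

```python
# (type name, bit width) pairs, smallest first.
UNSIGNED_TYPES = [("TINYINT", 8), ("SMALLINT", 16), ("INT", 32), ("BIGINT", 64)]


def smallest_unsigned_type(max_value):
    """Return the narrowest unsigned integer type that can hold max_value."""
    if max_value < 0:
        raise ValueError("unsigned types cannot hold negative values")
    for name, bits in UNSIGNED_TYPES:
        if max_value <= 2**bits - 1:
            return name
    raise ValueError("value does not fit in any of the listed integer types")


print(smallest_unsigned_type(255))    # TINYINT
print(smallest_unsigned_type(999))    # SMALLINT
print(smallest_unsigned_type(65536))  # INT
```

For a counter expected to reach the hundreds, this gives SMALLINT, matching the guess in the question.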
See an example in my answer to What is the difference (when being applied to my code) between INT(10) and INT(12)?
Yes, you want to specify a length of 5.
In MySQL, the "length" attribute on the integer types is optional. (It's a non-standard MySQL extension.)
When it is omitted from the column declaration, MySQL provides a default value. For a SMALLINT UNSIGNED, the default value is 5.
This value does NOT have any impact on the range of values that can be stored for an integer type. It specifies a "display length", which is returned in resultset metadata, which a client can choose to use or ignore.
http://dev.mysql.com/doc/refman/5.5/en/numeric-type-attributes.html
Hi all..
I have always used int(10) for everything, but a couple of days ago I started a new project and was hoping to do this 100% optimized ;)
So I am wondering how many rows it will be possible to create/add with each of these:
user_id => int(6) vs. mediumint(8) or similar
group_id => tinyint(1) vs. tinyint(4) or similar
and so on..
I know that the (X) is the width of the field, but I cannot quite work out the actual number of users/posts/messages etc. that can be created using, for example, mediumint(8) for the id instead of int(10).
Thanks for any reply on this!!
-Tom
Database IDs are almost always non-negative, so assuming UNSIGNED columns the max value would be:
Integer Type    Max Value
TINYINT         255
SMALLINT        65535
MEDIUMINT       16777215
INT             4294967295
BIGINT          18446744073709551615
I know that the (X) is the width of the field
The optional number in parens is the display width. It has nothing to do with how many unique values are in the range of the integer or how much storage space the integer needs. Application code is free to ignore your hint about the display width. "Display width" is a non-standard extension to SQL.
INTEGER(6) and INTEGER(2) both take 4 bytes to store, and both accept values ranging from -2147483648 to 2147483647.
All medium integers take 3 bytes to store, and accept values ranging from -8388608 to 8388607.
Assuming that a medium int is big enough (~ 16 million unique values) to identify your full domain of values you potentially save 1 byte per row over a 4-byte integer. (Potentially, because some platforms require padding to the next word boundary. For 32-bit systems, that would be 4 bytes, so no actual space savings. I don't know whether MySQL does that.) For 10 million rows, you might save 10 megabytes (plus some space savings in the index)--not very much these days. Narrower tables are generally faster than wider tables, but I don't think you'll notice the difference here.
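The savings estimate is simple arithmetic. Sketched in Python (the function name is mine, and padding and index savings are ignored):

```python
def bytes_saved(rows, wide_bytes=4, narrow_bytes=3):
    """Bytes saved by storing `rows` values in the narrower column type."""
    return rows * (wide_bytes - narrow_bytes)


saved = bytes_saved(10_000_000)
print(saved)              # 10000000
print(saved / 1_000_000)  # 10.0 (megabytes, decimal)
```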