I have some small tables that don't need a BIGINT primary key, since they won't get that big, but all our tables use a BIGINT primary key as standard.
Can this affect my performance, or is MySQL smart about that?
I'd rather not change the PKs to INT on those tables, but if it can slow me down, I certainly will.
One of the optimization rules for a DBMS is "keep your data as small as possible" - so if you don't need BIGINT, declare it as an INT (and change the type when you need to).
Based on the benchmarks here, using a BIGINT could increase the database size by a significant factor, which would affect performance, though probably not noticeably until the tables reach a significant size.
MySQL won't do this for you, as neither it nor you can know how big the tables will get. The performance benefit of changing BIGINT to INT on smaller tables is negligible, although it might be an idea to keep the BIGINT type in case your row count goes above INT's limit of 2147483647 (4294967295 unsigned). It is, however, advisable to keep your data as compact as possible.
If it's a relatively small table, you might actually be better off going with MEDIUMINT. Its limits are 8388607 (signed) and 16777215 (unsigned).
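The limits quoted in these answers follow directly from the storage width of each type. Here is a small Python sketch that derives them; the byte widths are the ones documented for MySQL's integer types, while the helper name int_range is just something made up for this illustration.

```python
# Storage widths in bytes for MySQL's integer types.
INT_TYPE_BYTES = {
    "TINYINT": 1,
    "SMALLINT": 2,
    "MEDIUMINT": 3,
    "INT": 4,
    "BIGINT": 8,
}


def int_range(n_bytes, unsigned=False):
    """Return (min, max) for an integer stored in n_bytes bytes (hypothetical helper)."""
    bits = 8 * n_bytes
    if unsigned:
        return 0, 2**bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name, width in INT_TYPE_BYTES.items():
    signed_max = int_range(width)[1]
    unsigned_max = int_range(width, unsigned=True)[1]
    print(f"{name}: {width} bytes, max {signed_max} signed, {unsigned_max} unsigned")
```

If the largest ID a table can ever hold is comfortably below one of these maxima, the narrower type is enough.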
In MySQL, I need to store a primary key of length 21 (a Google or Facebook UID), which is larger than both INT (maximum value 2147483647) and BIGINT (maximum value 9223372036854775807).
I also read about VARCHAR and gathered that it has low performance and issues with AUTO_INCREMENT.
What is the best primary key for such a case?
Why not just use numeric(21, 0)? That seems to exactly describe what you are looking for.
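As a quick sanity check on why BIGINT is ruled out here, this Python sketch compares digit counts. It assumes the IDs really are 21 decimal digits, as the question states; fits_bigint is a hypothetical helper, not a MySQL function.

```python
BIGINT_SIGNED_MAX = 2**63 - 1
BIGINT_UNSIGNED_MAX = 2**64 - 1


def fits_bigint(value, unsigned=False):
    """True if a non-negative value fits in a (signed or unsigned) BIGINT."""
    limit = BIGINT_UNSIGNED_MAX if unsigned else BIGINT_SIGNED_MAX
    return 0 <= value <= limit


smallest_21_digit = 10**20  # the smallest number with 21 decimal digits

print(len(str(BIGINT_SIGNED_MAX)), len(str(BIGINT_UNSIGNED_MAX)))
print(fits_bigint(smallest_21_digit), fits_bigint(smallest_21_digit, unsigned=True))
```

Even the unsigned maximum has only 20 digits, so no 21-digit value fits, which is why a wider type such as NUMERIC(21, 0) comes up.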
I wouldn't say that "varchar has low performance". What is more accurate is that shorter keys are better and fixed length keys are better. Hence, strings are not optimal as keys for indexes. However, they are still very reasonable when needed.
I am running a MySQL 5.7.30-0ubuntu0.16.04.1-log Server where I have the option of saving in char(4) or in smallint(5, unsigned).
There will be a primary index on the column, and the key will be used as a reference across tables.
What is faster? Char or Int?
Unsigned SMALLINT values use two bytes and have values in the range [0, 65535]. CHAR(4) values take four bytes. So, indexing SMALLINT values will make for a smaller index. Smaller is faster. Plus indexes on character columns usually have all sorts of character-set and case-insensitivity monkey business built in to them, which also takes time and space.
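To make the two-bytes-versus-four-bytes point concrete, here is a rough Python illustration using raw encodings. It only mimics the widths; it is not MySQL's actual on-disk format, and it assumes a single-byte character set for the CHAR(4) case.

```python
import struct

code = 1234  # an example four-digit key

as_smallint = struct.pack("<H", code)   # 16-bit unsigned integer
as_char4 = str(code).encode("ascii")    # four one-byte characters

print(len(as_smallint), len(as_char4))
```

The integer form is half the size, and comparing it needs no character-set or collation logic.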
But for a table with at most 65K rows, the effect of this choice will be so small you'll have trouble measuring it. If you build something that's hard to debug, you'll spend your precious time, and ten thousand times more computer time, debugging it than the choice will ever save.
Design your tables so they match your application. If you're using a four-digit number, use SMALLINT.
The next person to work on your code (even if that person is you a year from now) will thank you for a clear implementation.
And keep in mind that MySQL ignores the number in parentheses on INT declarations. SMALLINT(4), SMALLINT(5), and SMALLINT all mean precisely the same thing. MySQL uses the native processor integer datatypes: TINYINT is an 8-bit number, SMALLINT a 16-bit number, INT a 32-bit number, and BIGINT a 64-bit number. Likewise FLOAT is a 32-bit IEEE 754 floating point number and DOUBLE a 64-bit one. The number of digits in SMALLINT(4) is a nod to SQL standards compatibility.
As mentioned by O. Jones, SMALLINT will be faster and more space-efficient.
This is related to the following answer: mysql-char-vs-int
Also, MySQL Documentation:
CHAR and VARCHAR types
Integer Types
Case 1: The difference between CHAR(4) and SMALLINT is insignificant. It should not influence your choice of datatypes. Instead, use the datatypes that match the data.
Case 2: If you are comparing TINYINT to VARCHAR(255), the answer is probably different. Note that there is a much bigger difference in the choices.
Case 3: If the choice comes down to whether to "normalize" a column, there are arguments either way. I much prefer using a CHAR(2) for country_code than normalizing in order to shrink to a TINYINT. The overhead of extra normalization always(?) outweighs the space savings.
Another consideration: How many secondary keys are on the table? And how many other tables will you be joining to?
Case 4: PRIMARY KEY(big_string) but no secondary keys. There is possibly no advantage in switching to an int.
Case 5: Since secondary keys include the PK, consider:
PRIMARY KEY(big_string),
INDEX(foo),
INDEX(bar)
versus
PRIMARY KEY(id), -- surrogate AUTO_INCREMENT
INDEX(big_string),
INDEX(foo),
INDEX(bar)
The latter will take less disk space.
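A back-of-the-envelope Python estimate of Case 5 follows. Everything numeric in it is an assumption chosen for illustration (one million rows, a 40-byte big_string, 4-byte foo, bar and id), and it ignores page and record overhead. It only models the fact that each secondary index entry carries a copy of the primary key.

```python
def secondary_index_bytes(rows, pk_bytes, indexed_col_bytes):
    """Rough size of all secondary indexes: each entry = indexed column + PK copy."""
    return rows * sum(col + pk_bytes for col in indexed_col_bytes)


ROWS = 1_000_000      # assumed row count
BIG_STRING = 40       # assumed average length of big_string, in bytes
INT_COL = 4           # assumed width of foo, bar and the surrogate id

# Design 1: PRIMARY KEY(big_string), INDEX(foo), INDEX(bar)
natural_pk = secondary_index_bytes(ROWS, BIG_STRING, [INT_COL, INT_COL])

# Design 2: PRIMARY KEY(id), INDEX(big_string), INDEX(foo), INDEX(bar)
# The extra id column also adds INT_COL bytes to every row.
surrogate_pk = (
    secondary_index_bytes(ROWS, INT_COL, [BIG_STRING, INT_COL, INT_COL])
    + ROWS * INT_COL
)

print(natural_pk, surrogate_pk)
```

Under these assumptions the surrogate-key design is smaller, and the gap grows with every additional secondary index.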
Another consideration: Fetching a row is far more costly than comparing an int or string. My point is that you should not worry about comparison performance; you should look at the bigger picture when optimizing.
Case 6: USA 5-digit zip code. CHAR(5) (5 bytes) is reasonable. MEDIUMINT(5) UNSIGNED ZEROFILL (3 bytes) is better because it does everything better. (And it is a very rare case of the *INT(n) being meaningful.)
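The idea behind Case 6 is that the number is stored compactly and the leading zeros are only added on display. A Python sketch of that behavior, with format_zip as a made-up helper standing in for what ZEROFILL does on output:

```python
MEDIUMINT_UNSIGNED_MAX = 2**24 - 1  # 3 bytes


def format_zip(zip_number, width=5):
    """Left-pad a numeric zip code with zeros to the given display width."""
    return f"{zip_number:0{width}d}"


print(format_zip(2134))                  # a zip code that starts with a zero
print(99999 <= MEDIUMINT_UNSIGNED_MAX)   # every 5-digit value fits in 3 bytes
```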
And the debate goes on and on.
I have a BOOK table consisting of 15 columns, but most of them are small integers (INT(1) for different ratings, and in some places INT(4) or INT(5)).
The table is meant to be used for dynamic search with filters on a website. In order to speed things up, I created indexes on almost every INT column (10-11 indexes in total). I don't have most of the data in the table yet, but will I run into memory trouble once the table becomes huge?
My question in general: does an index on a small integer column require more memory than I would expect?
It's a lot easier to shrink the datatypes before you have a zillion rows in the table.
INT UNSIGNED takes 4 bytes and allows numbers from 0 to about 4 billion.
TINYINT UNSIGNED takes 1 byte and allows values 0..255. So, if you have a billion-row table, changing an INT to TINYINT would shrink the disk footprint by 3GB, plus another 3GB if it is also in an index. (This is a simplification; hope you get the idea.)
SMALLINT UNSIGNED takes 2 bytes, allowing 0..65535. That is probably what you want instead of INT(4) and maybe INT(5)?
The (5) means nothing (except when used with ZEROFILL).
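The 3GB figure above is simple arithmetic, and it can be reproduced with a few lines of Python. This uses the same simplification as the answer (it ignores per-row and per-page overhead); bytes_saved is a hypothetical helper.

```python
def bytes_saved(rows, old_bytes, new_bytes, copies=1):
    """Bytes saved by narrowing one column that is stored `copies` times per row."""
    return rows * (old_bytes - new_bytes) * copies


ROWS = 1_000_000_000  # the billion-row table from the example

data_only = bytes_saved(ROWS, 4, 1)              # INT -> TINYINT in the table
with_index = bytes_saved(ROWS, 4, 1, copies=2)   # the column is also in one index

print(data_only / 1e9, with_index / 1e9)  # savings in GB
```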
Your table will probably be 1/3 data and 2/3 index. This ratio is abnormal, but not "bad".
Instead of 10-11 single-column indexes, I recommend you make about that many 2-column indexes. This will improve some more queries.
You need to get a feel for the traffic -- what columns do people usually filter on? And how do they filter? That is, a=7 versus a>7.
Once you have some likely SELECTs, study my Cookbook to see how to optimize the indexes. After that, come back with SHOW CREATE TABLE and the SELECTs; I may suggest further tweaks.
I would not hesitate to build a table like yours with a billion rows, even if I did not have enough RAM to cache it all.
I have a database using VARCHAR(255) as its primary key on tables, and the values look like GUIDs. Wouldn't an INT be better for performance?
It depends on your storage engine, but generally speaking an INT/BIGINT would be better. If you are using InnoDB, a UUID/GUID is a bad choice for a primary key because of the way a clustered index works. Read this blog to learn more about it. To sum it up, rows are stored in key order, and since UUIDs are random, inserts and lookups become less efficient: you thrash the cache by reading and writing whole blocks for each row.
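To get a feel for the difference, here is a toy Python model. It is not InnoDB's page logic, just a sorted list standing in for a key-ordered index, and random 128-bit numbers standing in for UUIDs. It counts how many inserts land at the tail of the key order, which is the cheap case for a clustered index.

```python
import bisect
import random


def tail_insert_fraction(keys):
    """Fraction of inserts that land at the end of the sorted key order."""
    ordered = []
    at_tail = 0
    for key in keys:
        pos = bisect.bisect_left(ordered, key)
        if pos == len(ordered):
            at_tail += 1
        ordered.insert(pos, key)
    return at_tail / len(keys)


rng = random.Random(0)  # fixed seed so the run is repeatable
N = 10_000

sequential = list(range(1, N + 1))                      # AUTO_INCREMENT-style keys
random_like = [rng.getrandbits(128) for _ in range(N)]  # stand-in for random UUIDs

print(tail_insert_fraction(sequential))
print(tail_insert_fraction(random_like))
```

Sequential keys always append at the end, while almost every random key lands somewhere in the middle, which in a real clustered index means touching a different page each time.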
Ints take less space on disk so you need less I/O when searching. As long as the range suits your need I would say that an int would be faster.
If the values of the primary key in my table range from 200,000,000 to 200,000,100
Will queries be much slower than if the values were 1000 to 1100?
No. Unless you're using char/varchar fields for those numbers, a number is stored in a fixed-sized raw binary format, and the "size" of a number has no bearing on search speeds - 1 will occupy as much space inside the database as 999999999999.
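The fixed-width point is easy to see with Python's struct module. This is only an illustration of fixed-size binary encoding, not MySQL's storage code, and it uses a 64-bit width (BIGINT-sized) so that both example values fit.

```python
import struct

small = struct.pack("<q", 1)                 # 64-bit signed integer
large = struct.pack("<q", 999_999_999_999)   # same width, much bigger value

print(len(small), len(large))                    # binary sizes in bytes
print(len(str(1)), len(str(999_999_999_999)))    # sizes if stored as text
```

The binary form has the same length for both values, whereas a text representation grows with the number of digits.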
The answer to your specific question is no, they will not be much slower.
Maybe a few nanoseconds as the extra bits get sent down the wire and processed, but nothing significant.
However, if your question was "will having 200,000,100 rows be slower than 1,100?" then yes, it would be a bit slower.
No, but as others have noted, a larger number of records in your table will generally require more IO to retrieve records, although it can be mitigated through indexes.
The data type for your primary key will make a slight difference (200M still fits into a 4 byte INT, but an 8 byte BIGINT will be slightly larger, and a CHAR(100) would be more wasteful etc). This choice will result in slightly larger storage requirements, for this table, its indexes, and other tables with foreign keys to this table.
Well, define much slower.....
The values 1000-1100 can fit in a SMALLINT (2 bytes), while 200,000,000 has to be put into an INT (4 bytes). So twice as many records fit in memory for a SMALLINT as for an INT, and 1000-1100 will be faster.
Nope, otherwise the algorithm would be very stupid... Your values perfectly fit into 32 bit integer.
The queries will be exactly the same.
They will be slower, however, when you have between 200,000,000 and 200,000,100 rows in your table compared to 1000 to 1100 rows.
Not at all. The MySQL developers are very particular about keeping speed independent of how large the stored values are.