MySQL Primary Keys and VARCHAR(255)

I have a database that uses VARCHAR(255) as the primary key on its tables, and the values look like GUIDs. Wouldn't an INT be better for performance?

It depends on your storage engine, but generally speaking an INT/BIGINT would be better. If you are using InnoDB, a UUID/GUID is a bad choice for a primary key because of the way a clustered index works. Read this blog post to learn more about it. To sum it up: InnoDB stores rows in primary-key order, and since UUIDs are random, each new row lands on an arbitrary page. That makes inserts and lookups less efficient, because you thrash the buffer pool by reading and writing a whole page for nearly every row.
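To make the cache argument concrete, here is a rough Python simulation. It is not InnoDB's real algorithm: it assumes a fixed number of rows per page, places each row directly at its final position in key order, and models the buffer pool as a plain LRU cache of pages. The names `buffer_pool_misses`, `insert_pages`, and `ROWS_PER_PAGE`, and all the sizes, are illustrative assumptions.

```python
import random
from collections import OrderedDict

ROWS_PER_PAGE = 100  # assumed rows per page (illustrative, not an InnoDB constant)


def buffer_pool_misses(page_sequence, pool_pages):
    """Count page loads for an access sequence, using a simple LRU cache of pages."""
    pool = OrderedDict()
    misses = 0
    for page in page_sequence:
        if page in pool:
            pool.move_to_end(page)  # hit: mark page as most recently used
        else:
            misses += 1  # miss: the page must be read from disk
            pool[page] = True
            if len(pool) > pool_pages:
                pool.popitem(last=False)  # evict the least recently used page
    return misses


def insert_pages(n_rows, random_keys, seed=42):
    """Page touched by each insert, given each row's final position in key order."""
    positions = list(range(n_rows))
    if random_keys:
        # Random UUIDs: each new row lands at an unpredictable place in the key order.
        random.Random(seed).shuffle(positions)
    return [pos // ROWS_PER_PAGE for pos in positions]


N_ROWS, POOL_PAGES = 100_000, 100  # 1,000 pages of data, but only 100 fit in memory
sequential = buffer_pool_misses(insert_pages(N_ROWS, random_keys=False), POOL_PAGES)
scattered = buffer_pool_misses(insert_pages(N_ROWS, random_keys=True), POOL_PAGES)
print(f"auto-increment keys: {sequential} page loads")
print(f"random UUID keys:    {scattered} page loads")
```

Under these assumptions the auto-increment case loads each page exactly once (1,000 loads), while the random-key case misses the pool on most inserts once the data outgrows it.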

INTs take less space on disk, so you need less I/O when searching. As long as the range suits your needs, I would say that an INT would be faster.
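A back-of-envelope Python sketch of why smaller keys mean less I/O: more entries fit in each page, so an index needs fewer pages and possibly fewer levels. The 16 KiB page is InnoDB's default `innodb_page_size`; the 10 bytes of per-entry overhead is an assumed round number, not InnoDB's exact record format, so only the trend matters here.

```python
PAGE_SIZE = 16 * 1024  # InnoDB's default page size (innodb_page_size)
ENTRY_OVERHEAD = 10    # assumed per-entry bytes (record header + child pointer); a rough guess


def entries_per_page(key_bytes):
    """Rough number of index entries that fit in one page."""
    return PAGE_SIZE // (key_bytes + ENTRY_OVERHEAD)


def levels_needed(n_entries, key_bytes):
    """B-tree levels needed to address n_entries with the fanout above."""
    fanout = entries_per_page(key_bytes)
    levels, reachable = 1, fanout
    while reachable < n_entries:
        reachable *= fanout
        levels += 1
    return levels


for name, size in [("INT", 4), ("BIGINT", 8), ("BINARY(16)", 16), ("VARCHAR(36) UUID", 37)]:
    print(f"{name:18} {entries_per_page(size):5} entries/page, "
          f"{levels_needed(10**9, size)} levels for 1e9 entries")
```

With these assumptions an INT index addresses a billion entries in three levels, while a 37-byte string key needs four; the real numbers depend on the actual row format.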

Related

Primary Key using timestamp + GUID

I have a large amount of data. Will indexing performance differ between epoch+GUID and GUID+epoch as the primary key, given that epoch values are sorted?
Strictly speaking, a primary key is a logical concept rather than a physical one, but it's usually implemented using an index (which is a physical concept). On InnoDB, that index determines the order in which data is stored. This means that when writing to the table, if the primary key value isn't neatly appended to the end, InnoDB has to insert the row into the middle of the existing data, splitting pages as needed to keep it in primary key order.
Therefore, epoch+GUID should be much faster than GUID+epoch, as long as the epoch reflects the time the data is written to disk. If the epoch is some other value (business transaction date, date of birth, whatever), the difference is harder to predict.
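The append-versus-middle-insert difference can be sketched in Python with a sorted list standing in for the primary key index. It assumes one row per distinct, increasing epoch value and random version-4 GUIDs; `count_appends` and the starting epoch are hypothetical.

```python
import bisect
import random
import uuid

rng = random.Random(7)
N = 2_000
START = 1_700_000_000  # hypothetical starting epoch; one row per second


def count_appends(keys):
    """How many keys, inserted in arrival order, land at the end of the sorted index."""
    index, appends = [], 0
    for key in keys:
        pos = bisect.bisect_right(index, key)
        if pos == len(index):
            appends += 1  # cheap case: the new key goes after every existing key
        index.insert(pos, key)
    return appends


rows = [(START + i, uuid.UUID(int=rng.getrandbits(128), version=4)) for i in range(N)]
epoch_first = count_appends([(epoch, guid) for epoch, guid in rows])
guid_first = count_appends([(guid, epoch) for epoch, guid in rows])
print(f"epoch+GUID: {epoch_first}/{N} inserts appended at the end")
print(f"GUID+epoch: {guid_first}/{N} inserts appended at the end")
```

With epoch+GUID every insert here is an append. If several rows share the same epoch value, the GUID still orders them randomly within that value, so inserts become "mostly" rather than strictly appended.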
Since GUIDs are effectively unique on their own, I'm not sure why you would ever want GUID+epoch.
INSERT: GUID+epoch will be somewhat slower. It will become much slower when the table is bigger than can fit in the buffer pool (innodb_buffer_pool_size).
SELECT: If you tend to access only "recent" data, then epoch+GUID could be significantly faster ("locality of reference").
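The locality point for SELECTs can be sketched by counting how many pages hold the newest rows under each key order. This is a simplified Python model that assumes a fixed number of rows per page and a table laid out in primary key order; all sizes and the helper `pages_holding_recent` are hypothetical.

```python
import random
import uuid

rng = random.Random(11)
N, ROWS_PER_PAGE, RECENT = 10_000, 100, 500  # illustrative sizes
START = 1_700_000_000  # hypothetical epoch of the first row

rows = [(START + i, uuid.UUID(int=rng.getrandbits(128), version=4)) for i in range(N)]


def pages_holding_recent(make_key):
    """Distinct pages holding the RECENT newest rows when the table is clustered by make_key."""
    order = sorted(range(N), key=lambda i: make_key(rows[i]))
    page_of = {row_id: pos // ROWS_PER_PAGE for pos, row_id in enumerate(order)}
    return len({page_of[i] for i in range(N - RECENT, N)})


epoch_first = pages_holding_recent(lambda r: (r[0], r[1]))  # (epoch, GUID)
guid_first = pages_holding_recent(lambda r: (r[1], r[0]))   # (GUID, epoch)
print(f"epoch+GUID: newest {RECENT} rows live on {epoch_first} pages")
print(f"GUID+epoch: newest {RECENT} rows live on {guid_first} pages")
```

In this model the 500 newest rows fit on 5 pages with epoch+GUID, but are spread over nearly every page with GUID+epoch, so reading "recent" data touches far more of the table.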
If you use a version 1 (time-based) UUID and rearrange its bits so that the timestamp comes first, you get both features (uniqueness and time ordering) in a single field. (Smaller is better.) More: http://mysql.rjweb.org/doc.php/uuid Also, MySQL 8.0 has equivalent functions: UUID_TO_BIN() and BIN_TO_UUID() with the swap flag set to 1.
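Here is a Python sketch of the bit rearrangement: it moves the time-high and time-mid fields in front of time-low, which is what MySQL 8.0's `UUID_TO_BIN(uuid, 1)` is documented to do, so that version 1 UUIDs sort by timestamp. `uuid_to_ordered_bin` and `make_v1` are hypothetical helpers, and the clock sequence and node values are made up.

```python
import uuid


def uuid_to_ordered_bin(u: str) -> bytes:
    """Put time_hi and time_mid before time_low so version 1 UUIDs sort by timestamp."""
    b = uuid.UUID(u).bytes
    return b[6:8] + b[4:6] + b[0:4] + b[8:]


def make_v1(timestamp: int) -> uuid.UUID:
    """Build a version 1 UUID from a 60-bit timestamp (hypothetical clock_seq and node)."""
    return uuid.UUID(
        fields=(
            timestamp & 0xFFFFFFFF,      # time_low
            (timestamp >> 32) & 0xFFFF,  # time_mid
            (timestamp >> 48) & 0x0FFF,  # time_hi (version bits are set by version=1)
            0x80,                        # clock_seq_hi_variant
            0x00,                        # clock_seq_low
            0x0123456789AB,              # node
        ),
        version=1,
    )


uuids = [make_v1(t) for t in (2**32 - 1, 2**32, 2**33 + 5)]  # increasing timestamps
raw_order = sorted(uuids, key=lambda u: u.bytes)
swapped_order = sorted(uuids, key=lambda u: uuid_to_ordered_bin(str(u)))
print(swapped_order == uuids)  # True: rearranged bytes follow time order
print(raw_order == uuids)      # False: plain bytes sort by time_low first
print(uuid_to_ordered_bin("6ccd780c-baba-1026-9564-5b8c656024db").hex().upper())
# 1026BABA6CCD780C95645B8C656024DB
```

The result is still 16 bytes, so it fits in a BINARY(16) column while keeping inserts close to append-only.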
To discuss further, please provide the type of data, size of data, and type of inserts/selects/updates/deletes.

Right choice of data type for UUID as primary key

Preconditions:
The table will have billions of rows.
The table will have secondary indexes.
The table's primary key will be used as a foreign key in another table.
The rows will be heavy (another column may be TEXT).
The primary key must be unique across machines because my database is replicated; that's why I am choosing UUID.
PS: Space is also a concern, so I guess VARCHAR(36) might be a bad idea.
I agree with BINARY(16). (16 bytes is better than 37: VARCHAR(36) stores the 36 characters plus a length byte.)
But UUIDs are hopelessly inefficient for huge tables. (I assume your billion-row table will not fit in RAM.)
I discuss those and more issues in http://mysql.rjweb.org/doc.php/uuid
I would go for BINARY(16). If you want to use CHAR, then CHAR(32) is large enough once the hyphens are removed, but BINARY is smaller and faster.
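The three storage forms discussed above can be compared in Python. The sketch shows what each column type would hold; in MySQL itself the conversion would be done with something like UNHEX(REPLACE(uuid, '-', '')) or, in 8.0, UUID_TO_BIN().

```python
import uuid

u = uuid.uuid4()
as_text = str(u)     # what VARCHAR(36)/CHAR(36) would store: 36 chars with hyphens
as_hex = u.hex       # what CHAR(32) would store: 32 hex chars, no hyphens
as_binary = u.bytes  # what BINARY(16) would store: the raw 16 bytes

print(len(as_text), len(as_hex), len(as_binary))  # 36 32 16

# Round trips: the compact forms lose nothing.
assert uuid.UUID(bytes=as_binary) == u
assert uuid.UUID(hex=as_hex) == u
```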

Is there a recommended size for a Mysql Primary key

Each entry in my 'projects' table has a unique 32-character hash identifier stored as a VARCHAR(32).
Would it be considered bad practice to use this as the primary key? Is there a recommended size or data type for primary keys?
I would say yes, it's a bad idea to use such a large column for a primary key. The reason is that every index you create on that table will contain that 32-character column (InnoDB stores a copy of the primary key in every secondary index entry), which bloats the size of all the indexes. Bigger indexes mean more disk space, memory, and I/O.
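A quick Python estimate of that bloat, assuming InnoDB (where each secondary index entry holds the indexed column plus the primary key) and ignoring per-record overhead. The row count, the three INT secondary indexes, and the single-byte characters in the hash are hypothetical.

```python
def secondary_index_bytes(rows, n_indexes, indexed_col_bytes, pk_bytes):
    """Rough payload of all secondary indexes: each entry stores the indexed
    column plus a copy of the primary key (per-record overhead ignored)."""
    return rows * n_indexes * (indexed_col_bytes + pk_bytes)


ROWS, INDEXES, COL = 10_000_000, 3, 4  # hypothetical: 3 secondary indexes on INT columns
int_pk = secondary_index_bytes(ROWS, INDEXES, COL, pk_bytes=4)    # INT primary key
hash_pk = secondary_index_bytes(ROWS, INDEXES, COL, pk_bytes=33)  # VARCHAR(32): 32 chars + 1 length byte
print(f"INT PK:         {int_pk / 1e6:,.0f} MB")
print(f"VARCHAR(32) PK: {hash_pk / 1e6:,.0f} MB")
```

Under these assumptions the same three indexes grow from about 240 MB to about 1,110 MB just because of the wider primary key.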
Better to use an auto-increment integer key if possible, and simply create a unique index on your hash identifier column.
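A minimal sketch of that layout, using Python's built-in sqlite3 as a stand-in for MySQL so it runs anywhere (the SQL dialects differ; the comment shows a MySQL version). The table and column names are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here. A MySQL version of the same idea could be:
#   id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
#   hash CHAR(32) NOT NULL, UNIQUE KEY uq_projects_hash (hash)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- small surrogate key
        hash CHAR(32) NOT NULL UNIQUE            -- natural identifier, still enforced unique
    );
    CREATE TABLE tasks (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        project_id INTEGER NOT NULL REFERENCES projects(id)  -- slim foreign key
    );
""")

conn.execute("INSERT INTO projects (hash) VALUES (?)", ("a" * 32,))
conn.execute("INSERT INTO projects (hash) VALUES (?)", ("b" * 32,))
print(conn.execute("SELECT id, hash FROM projects ORDER BY id").fetchall())

duplicate_rejected = False
try:
    conn.execute("INSERT INTO projects (hash) VALUES (?)", ("a" * 32,))
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True  # the unique index still protects the hash
    print("duplicate hash rejected:", exc)
```

Child tables reference the small integer, while the unique index keeps the hash usable for lookups.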
Depends™ ;)
Judging by your description, this field is intrinsic to your data and must be unique. If that really is the case, then you must make it a key. If you have child tables, consider introducing another, so-called "surrogate" key, simply to keep child foreign keys slimmer and possibly avoid ON UPDATE CASCADE. But beware that every additional index introduces overhead, especially for clustered tables. More on surrogate keys here.
On the other hand, if this key is not intrinsic to your data model, replace it with a smaller one (e.g. an auto-incremented integer). You'll save some disk space and, more importantly, increase the effectiveness of the cache.
How you should define your primary key depends on your usage. I typically use an INT(11) for my primary keys (the 11 is only a display width; an INT is always stored in 4 bytes). It makes foreign keys really easy.
I just saw your edit. I would personally use INT(11) with AUTO_INCREMENT. Depending on your setup, this would let you have other tables with foreign key constraints very easily. You could do the same thing with VARCHAR, but it has always been my understanding that INT is faster than VARCHAR, especially with indexes.
There's nothing inherently wrong with using this as the PKEY. If you've got many other tables using it as an FKEY, perhaps there is. There's no one answer.
Also note, if you know it's always going to be exactly 32 chars, you should make it a CHAR(32) instead.
In database engines, disk space is one of the most important factors. Keeping data small and compact normally goes hand in hand with good performance, because it reduces the amount of data the database has to read and transfer. If a table is only going to have a few rows, there's no reason to define a PK of type INT; MEDIUMINT, SMALLINT, or even TINYINT can be used instead (just as you would use DATE instead of DATETIME). It's all about keeping it succinct.
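A small Python helper that picks the smallest UNSIGNED integer type whose range covers the expected number of ids, using MySQL's documented ranges and storage sizes. The function name is hypothetical, and since AUTO_INCREMENT values can have gaps (for example after rollbacks), leave some headroom in practice.

```python
# Maximum values of MySQL's UNSIGNED integer types and their storage sizes in bytes.
UNSIGNED_INT_TYPES = [
    ("TINYINT UNSIGNED", 1, 2**8 - 1),
    ("SMALLINT UNSIGNED", 2, 2**16 - 1),
    ("MEDIUMINT UNSIGNED", 3, 2**24 - 1),
    ("INT UNSIGNED", 4, 2**32 - 1),
    ("BIGINT UNSIGNED", 8, 2**64 - 1),
]


def smallest_key_type(max_rows):
    """Pick the smallest UNSIGNED integer type whose range covers ids 1..max_rows."""
    for name, size, max_value in UNSIGNED_INT_TYPES:
        if max_rows <= max_value:
            return name, size
    raise ValueError("more rows than BIGINT UNSIGNED can number")


print(smallest_key_type(200))            # ('TINYINT UNSIGNED', 1)
print(smallest_key_type(50_000))         # ('SMALLINT UNSIGNED', 2)
print(smallest_key_type(3_000_000_000))  # ('INT UNSIGNED', 4)
```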
This key is bad for several reasons.
One was addressed by Eric: every secondary index will contain those same 32 characters.
Primary keys tend to be used for lookups from other tables, so those tables also need to store the 32 characters, perhaps in their own primary key, and the same problem arises again on those tables.
The biggest reason I can think of is performance. When you insert hash-type keys, you are essentially inserting keys in random order, which eventually leads to a lot of page splits and pages that are only 50% to 90% full. That leads to an unnecessarily deep tree, longer search times, a bigger tablespace, and an index that takes more memory.
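The page-split effect can be sketched with a simplified B-tree leaf level in Python. This is not InnoDB's actual split logic: it assumes a page splits in half when an insert lands mid-range, and that a fresh page is started when a key is appended past the current maximum. The page capacity, key counts, and helper names are illustrative.

```python
import bisect
import random
import uuid


def fill_leaf_pages(keys, capacity=64):
    """Insert keys one at a time into a simplified B-tree leaf level and return the pages."""
    pages, lows = [], []  # lows[i] is the smallest key on pages[i]
    for key in keys:
        if not pages:
            pages.append([key])
            lows.append(key)
            continue
        i = max(bisect.bisect_right(lows, key) - 1, 0)  # page whose range covers key
        page = pages[i]
        bisect.insort(page, key)
        lows[i] = page[0]
        if len(page) > capacity:
            if i == len(pages) - 1 and page[-1] == key:
                new_page = [page.pop()]  # appended past the maximum: start a fresh page
            else:
                half = len(page) // 2    # landed mid-range: split the page in two halves
                new_page = page[half:]
                del page[half:]
            pages.insert(i + 1, new_page)
            lows.insert(i + 1, new_page[0])
    return pages


def fill_factor(pages, capacity=64):
    """Fraction of the allocated page slots that actually hold keys."""
    return sum(len(p) for p in pages) / (len(pages) * capacity)


N = 20_000
rng = random.Random(3)
sequential = fill_leaf_pages(range(N))
hashed = fill_leaf_pages([uuid.UUID(int=rng.getrandbits(128), version=4) for _ in range(N)])
print(f"sequential keys: {len(sequential)} pages, {fill_factor(sequential):.0%} full")
print(f"random keys:     {len(hashed)} pages, {fill_factor(hashed):.0%} full")
```

In this model sequential keys leave almost every page full, while random keys settle at roughly two-thirds full, so the same data needs noticeably more pages.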

How can using UUIDs as primary keys affect performance in MySQL?

I would like to know how, and how much, using UUIDs for my primary keys in MySQL can affect the performance of the server.
I suppose you are using InnoDB (you should be, anyway).
So read the following chapter from High Performance MySQL, 2nd edition, p. 117: http://books.google.com.hk/books?id=BL0NNoFPuAQC&lpg=PA117&ots=COPMBsvA7V&dq=uuid%20innodb%20clustered%20index%20high%20performance%20mysql&pg=PA117#v=onepage&q&f=false
In general, UUID is a poor choice from a performance standpoint (because of the clustered index), and the book includes a benchmark that demonstrates this.
As a string, a UUID is 36 characters long, which means at least 36 bytes. INT is 4 bytes, and its variants TINYINT (1 byte), SMALLINT (2), MEDIUMINT (3), and BIGINT (8) are all well below 36 bytes.
Therefore, every record you insert takes more space in the index.
Also, generating a UUID takes more time than incrementing an integer.
As for how much it can affect the system, it's hard to give actual numbers because there are various factors: the load on the system, the hardware, and so on.

What is the use of Mysql Index key?

Hi, I am new to MySQL.
Here are my questions:
What is the use of a MySQL index key?
Does defining an index key make a difference to MySQL queries, compared with not defining one?
Are all primary keys index keys by default?
Thanks a million
1- Defining an index on a column (or set of columns) makes searching on that column (or set) much faster, at the expense of additional disk space.
2- Yes, the difference is that queries that filter or join on that column can be much faster.
3- Yes. Since searching by the primary key is so common, it makes sense for that column to always be indexed, and MySQL indexes the primary key automatically.
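To see the difference an index makes, here is a Python sketch that uses the built-in sqlite3 module as a stand-in for MySQL and inspects the query plan before and after creating an index. In MySQL you would run EXPLAIN on the query instead and look at its `type` and `key` columns. The table and index names are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here; the idea (scan vs. index lookup) is the same.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def plan(sql, *params):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


lookup = "SELECT name FROM users WHERE email = ?"
before = plan(lookup, "user42@example.com")
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(lookup, "user42@example.com")
by_pk = plan("SELECT name FROM users WHERE id = ?", 42)

print(before)  # a SCAN of the whole table
print(after)   # a SEARCH using idx_users_email
print(by_pk)   # a SEARCH using the primary key, with no extra index defined
```

The exact wording of the plan varies between SQLite versions, but the switch from a scan to a search is the point.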
Read more on MySQL indexing here.
An index is indeed an additional set of records. Nothing more.
Things that make index access faster:
The engine is more likely to keep the index in its buffer than all of the table's rows.
The index is smaller, so scanning it means reading fewer blocks from the hard drive.
The index is already sorted, so finding a given value is easy.
If the column is declared NOT NULL, it's even faster: MySQL can make better use of the index and skip testing each value for NULL. (Note that MySQL does store NULL values in its indexes.)
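The "already sorted" point can be illustrated in Python by comparing a full scan with a binary search over sorted values, which is roughly what an ordered index allows. This is a plain in-memory sketch, not how MySQL walks a B-tree; the helper names are hypothetical.

```python
def linear_search_steps(values, target):
    """Rows examined by a full scan until target is found."""
    for steps, value in enumerate(values, start=1):
        if value == target:
            return steps
    return len(values)


def binary_search_steps(sorted_values, target):
    """Comparisons a binary search on sorted data needs to locate target's position."""
    lo, hi, steps = 0, len(sorted_values), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_values[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps


data = list(range(1_000_000))
target = 999_999
print(linear_search_steps(data, target))  # 1000000: every row examined
print(binary_search_steps(data, target))  # no more than 20 comparisons
```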
Whether an index is useful is not easy to guess (obviously I'm not talking about the primary key) and should be investigated. Here are some downsides, cases where an index might slow down your operations:
It will slow down inserts and updates on indexed fields.
It requires more maintenance: statistics have to be maintained for each index, so this work can take significantly longer if you add many indexes.
It might slow down queries when the statistics are not up to date. This effect could be catastrophic, because the optimizer could actually choose "the wrong way".
It might slow things down when the queries are poorly suited to it (in any case, indexes should be the exception rather than the rule: add no index unless certain queries urgently need one. I know every table usually has at least one index, but that came after investigation).
We could discuss this last point at length, but I think every case is different, and many examples of this already exist on the internet.
Now, about your question of whether primary keys are the default index: that is not how the optimizer works. When several indexes are defined on a table, the optimizer chooses the most efficient index (or combination of indexes) based on runtime data and static data (index statistics), in order to reach the best performance. There's no default index per se; every situation leads to a different result.
Regards.
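The idea that the optimizer picks an index from statistics, and can go "the wrong way" when they are stale, can be sketched with a toy cost model in Python. This is a hypothetical simplification: the real MySQL optimizer is far more elaborate, though it does keep per-index cardinality estimates (visible in SHOW INDEX and refreshed by ANALYZE TABLE). `choose_index` and all the numbers are made up.

```python
def choose_index(table_rows, cardinality, usable_indexes):
    """Toy cost model: estimate matching rows per index as table_rows / cardinality
    and pick the index with the smallest estimate (None means a full table scan)."""
    candidates = {
        idx: table_rows / cardinality[idx] for idx in usable_indexes if idx in cardinality
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


ROWS = 1_000_000
fresh_stats = {"idx_status": 3, "idx_email": 1_000_000}
stale_stats = {"idx_status": 500_000, "idx_email": 10}  # hypothetical out-of-date statistics

query = ["idx_status", "idx_email"]  # e.g. WHERE status = ? AND email = ?
print(choose_index(ROWS, fresh_stats, query))         # idx_email
print(choose_index(ROWS, stale_stats, query))         # idx_status: the "wrong way"
print(choose_index(ROWS, fresh_stats, ["idx_name"]))  # None
```

With accurate statistics the highly selective email index wins; with stale ones the same query is steered to the nearly useless status index.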