MySQL performance using AUTO_INCREMENT on a PRIMARY KEY

I ran a comparison INSERTing rows into an empty table using MySQL 5.6.
Each table contained a column (ascending) that was incremented serially by AUTO_INCREMENT, and a pair of columns (random_1, random_2) that received random, unique numbers.
In the first test, ascending was PRIMARY KEY and (random_1, random_2) were KEY. In the second test, (random_1, random_2) were PRIMARY KEY and ascending was KEY.
CREATE TABLE clh_test_pk_auto_increment (
ascending_pk BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, -- PK
random_ak_1 BIGINT UNSIGNED NOT NULL, -- AK1
random_ak_2 BIGINT UNSIGNED, -- AK2
payload VARCHAR(40),
PRIMARY KEY ( ascending_pk ),
KEY ( random_ak_1, random_ak_2 )
) ENGINE=MYISAM
AUTO_INCREMENT=1
;
CREATE TABLE clh_test_auto_increment (
ascending_ak BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, -- AK
random_pk_1 BIGINT UNSIGNED NOT NULL, -- PK1
random_pk_2 BIGINT UNSIGNED, -- PK2
payload VARCHAR(40),
PRIMARY KEY ( random_pk_1, random_pk_2 ),
KEY ( ascending_ak )
) ENGINE=MYISAM
AUTO_INCREMENT=1
;
Consistently, the second test (where the auto-increment column is not the PRIMARY KEY) runs slightly faster -- 5-6%. Can anyone speculate as to why?

Primary keys are often used as the sequence in which the data is actually stored. If the primary key is incremented, the data is simply appended. If the primary key is random, that would mean that existing data must be moved about to get the new row into the proper sequence. A basic (non-primary-key) index is typically much lighter in content and can be moved around faster with less overhead.
I know this to be true for other DBMSs; I would venture to guess that MySQL works similarly in this respect.
UPDATE
As Bill Karwin points out in the comments below, this theory does not hold for MyISAM tables, because MyISAM stores rows in insertion order rather than in primary-key order. As a follow-up theory, I'd refer to Kevin Postlewaite's answer (which he has since deleted): the issue is the lack of AUTO_INCREMENT on the PRIMARY KEY, which must be unique. With AUTO_INCREMENT it's easier to determine that the values are unique, since they are guaranteed to increase. With random values, it may take some time to walk the index to make this determination.
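The append-versus-reshuffle theory in the first paragraph of this answer can be illustrated with a toy model. This is only an illustration, assuming a sorted Python list stands in for an index-organized (clustered) table such as InnoDB's; the helper count_tail_appends is hypothetical. It counts how many inserts land at the very end of the sorted structure for ascending keys versus unique random keys. As the update notes, MyISAM does not keep rows in primary-key order, so this model does not explain the MyISAM timings in the question.

```python
import bisect
import random


def count_tail_appends(keys):
    """Insert keys one at a time into a sorted list (a toy stand-in for a
    clustered index) and count how many of them landed at the very end."""
    stored = []
    appends = 0
    for key in keys:
        pos = bisect.bisect_left(stored, key)
        if pos == len(stored):
            appends += 1  # new key is larger than everything stored so far
        stored.insert(pos, key)  # any other position shifts existing entries
    return appends


n = 10_000
ascending = list(range(1, n + 1))  # what AUTO_INCREMENT would hand out
rng = random.Random(42)
random_keys = rng.sample(range(1, 10 * n), n)  # unique random values

print(count_tail_appends(ascending))  # prints 10000: every insert is an append
print(count_tail_appends(random_keys))  # only a small handful are appends
```

In a real B-tree the cost of out-of-order keys shows up roughly as page splits and scattered page writes rather than list shifts, but the asymmetry is the same idea.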

Related

Differences between defining primary key - along with column name, at the end of create table stmt, adding primary key index after create table stmt

There are three ways I have seen to define primary keys.
Define it as part of the column definition:
CREATE TABLE test (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
-- other fields
);
Define the key at the end of the table definition:
CREATE TABLE test (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
-- other fields
PRIMARY KEY (id)
);
Add the primary key index after creating the table. I have mostly seen this in .sql files exported by phpMyAdmin. (Does it depend on the storage engine used?)
CREATE TABLE test (
id INT UNSIGNED NOT NULL,
-- other fields
);
ALTER TABLE test
ADD PRIMARY KEY (id),
MODIFY id INT UNSIGNED NOT NULL AUTO_INCREMENT;
What are the internal differences between all these methods?
In my experience, importing an SQL file that uses the third method takes longer than one that uses the other methods.
Edit (after Bill Karwin pointed out that "(the) example(s) shows no import of data"):
The examples above don't contain INSERT queries, but what difference would there be if there were INSERT statements after each of these CREATE TABLE queries to load data into them?
There is no difference between the first two forms. It's only a syntax convenience if your primary key is a single column. But if you have a multi-column primary key, you must define the PK as a table constraint:
CREATE TABLE test (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
other INT NOT NULL,
-- other fields
PRIMARY KEY (id, other)
);
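To see what the table-constraint form enforces, here is a minimal sketch run against SQLite through Python's built-in sqlite3 module, used only as a convenient stand-in for MySQL. Assumptions: SQLite has no AUTO_INCREMENT for a column in a multi-column primary key, so id is a plain integer here, and the table named bad is hypothetical. The check shows that the pair (id, other) is unique while each column on its own may repeat, and that putting PRIMARY KEY on two separate column definitions is rejected (MySQL rejects that too, with a "Multiple primary key defined" error).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A multi-column primary key has to be written as a table constraint.
conn.execute("""
    CREATE TABLE test (
        id    INTEGER NOT NULL,
        other INTEGER NOT NULL,
        PRIMARY KEY (id, other)
    )
""")
# Each column repeats across these rows, but no (id, other) pair does.
conn.executemany("INSERT INTO test (id, other) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])

try:
    conn.execute("INSERT INTO test (id, other) VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Marking two columns PRIMARY KEY individually is refused at CREATE time.
try:
    conn.execute("CREATE TABLE bad (a INTEGER PRIMARY KEY, b INTEGER PRIMARY KEY)")
    two_column_pks_allowed = True
except sqlite3.OperationalError:
    two_column_pks_allowed = False

print(duplicate_rejected, two_column_pks_allowed)  # True False
```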
The third form is almost the same, because you define the primary key before inserting any data into the table. The only effect is that metadata is altered by the second DDL statement.
Some people claim that adding the primary key after importing data is faster, but this is not true for MySQL's default storage engine, InnoDB. InnoDB stores table data as a clustered index. If you don't declare your own primary key, InnoDB uses the first UNIQUE index whose columns are all NOT NULL, or, failing that, generates a hidden row ID, and that becomes the key for the clustered index. So you're inserting into an index one way or the other.
It's possible that in the old MyISAM storage engine, inserting data into a table with no primary key is a little faster. But you have to count the extra time it takes to add the primary key after you're done inserting data.
In any case, your example shows no import of data, so it's moot.

mysql choose between unique key and primary key for user id

I'm creating a user database. I want to move the cellphone number out of the 'user' table into a separate table (user_cellphone),
but I'm having trouble choosing the best index.
The user_cellphone table holds user_id and the cellphone number, and almost all SELECT queries look up rows by 'user_id', so I want to know whether it's better to make the 'user_id' column the primary key.
(Also, each user has only one cellphone number.)
Which of these two options is better?
CREATE TABLE `user_cellphone_num` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`cellphone_country_code` SMALLINT UNSIGNED NOT NULL,
`cellphone_num` BIGINT UNSIGNED NOT NULL,
`user_id` INT UNSIGNED NOT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `cellphone` (`cellphone_country_code`, `cellphone_num`),
UNIQUE INDEX `user_id` (`user_id`)
)
CREATE TABLE `user_cellphone_num` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`cellphone_country_code` SMALLINT UNSIGNED NOT NULL,
`cellphone_num` BIGINT UNSIGNED NOT NULL,
`user_id` INT UNSIGNED NOT NULL,
PRIMARY KEY (`user_id`),
UNIQUE INDEX `id` (`id`),
UNIQUE INDEX `cellphone` (`cellphone_country_code`, `cellphone_num`)
)
Should I make 'user_id' the primary key, or just a unique key? Is there any difference in performance? (I'm talking about millions of rows.)
In the future I'm going to use queries like this:
select u.*,cell.* FROM user AS u LEFT JOIN user_cellphone AS cell ON cell.user_id = u.id
So which of these options gives better performance for queries like this?
May I offer some hard-won data design advice?
Do not use telephone numbers as any kind of unique or primary key.
Why not?
Sometimes multiple people use a single number.
Sometimes people make up fake numbers.
People punctuate numbers based on context. To my neighbors, my number is (978)555-4321. To a customer in the Netherlands it is +1.978.555.4321. Can you write a program to regularize those numbers? Of course. Can you write a correct program to do that? No. Why bother trying. Just take whatever people give you.
(Unless you work for a mobile phone provider, in which case ask your database administrator.)
Read this carefully. https://github.com/google/libphonenumber/blob/master/FALSEHOODS.md
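A quick way to see why regularizing is harder than it looks: the sketch below applies a naive "keep only the digits" rule to the two spellings of the same number from this answer. The helper digits_only is hypothetical and not from any library. The results still differ, because one spelling carries the +1 country code and the other leaves it implied.

```python
import re


def digits_only(raw: str) -> str:
    """Naive normalization: drop every character that is not a digit."""
    return re.sub(r"\D", "", raw)


local = "(978)555-4321"  # how the neighbors write it
international = "+1.978.555.4321"  # how a customer abroad writes it

print(digits_only(local))  # 9785554321
print(digits_only(international))  # 19785554321
print(digits_only(local) == digits_only(international))  # False
```

Patching this one case needs country and context rules, which is the rabbit hole the falsehoods list describes.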
InnoDB tables are stored as a clustered index, also called an index-organized table. If the table has a PRIMARY KEY, then that is used as the key for the clustered index. Any other UNIQUE KEY is a secondary index.
Queries where you look up rows by the clustered index are a little bit more efficient than using a secondary index, even if that secondary index is a unique index. So if you want to optimize for the most common query which you say is by user_id, then it would be a good idea to make that your clustered index.
In your case, it would be kind of strange to separate the cellphones into a separate table, but then make user_id alone be the PRIMARY KEY. That means that only one row per user_id can exist in this table. I would have expected that you separated cellphones into a separate table to allow each user to have multiple phone numbers.
You can get the same benefit of the clustered index if you just make sure user_id is the first column in a compound key:
CREATE TABLE `user_cellphone_num` (
`user_id` INT UNSIGNED NOT NULL,
`num` TINYINT UNSIGNED NOT NULL,
`cellphone_country_code` SMALLINT UNSIGNED NOT NULL,
`cellphone_num` BIGINT UNSIGNED NOT NULL,
PRIMARY KEY (`user_id`, `num`)
)
So a query like SELECT ... FROM user_cellphone_num WHERE user_id = ? will match one or more rows, but it will be an efficient lookup because it's searching the first column of the clustered index.
Reference: https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html
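The lookup pattern described in this answer can be tried without a MySQL server. The sketch below is a stand-in, not InnoDB: it uses SQLite through Python's sqlite3 module, where a WITHOUT ROWID table stores its rows in the primary-key B-tree, SQLite's closest analogue of a clustered index. The phone numbers are made-up sample data. It stores two numbers for one user and checks that a WHERE user_id = ? query is answered by searching the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID: rows live in the PRIMARY KEY B-tree itself (clustered).
conn.execute("""
    CREATE TABLE user_cellphone_num (
        user_id INTEGER NOT NULL,
        num INTEGER NOT NULL,
        cellphone_country_code INTEGER NOT NULL,
        cellphone_num INTEGER NOT NULL,
        PRIMARY KEY (user_id, num)
    ) WITHOUT ROWID
""")
conn.executemany(
    "INSERT INTO user_cellphone_num VALUES (?, ?, ?, ?)",
    [(7, 1, 1, 9785554321), (7, 2, 31, 201234567), (8, 1, 1, 6175550000)],
)

# One user_id matches several rows, found via the leading PK column.
rows = conn.execute(
    "SELECT num, cellphone_country_code, cellphone_num "
    "FROM user_cellphone_num WHERE user_id = ? ORDER BY num", (7,)
).fetchall()
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM user_cellphone_num WHERE user_id = ?", (7,)
).fetchall()

print(rows)  # [(1, 1, 9785554321), (2, 31, 201234567)]
print(plan[0][-1])  # the plan's detail text should mention PRIMARY KEY
```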

The best way for designing many to many entities relationship

I have two tables, permissions and groups, with a many-to-many relationship.
CREATE TABLE `permissions` (
`Permission_Id` int(11) NOT NULL AUTO_INCREMENT,
`Permission_Name` varchar(50) DEFAULT NULL,
PRIMARY KEY (`Permission_Id`)
)
Groups table
CREATE TABLE `groups` (
`Group_Id` int(11) NOT NULL AUTO_INCREMENT,
`Group_Desc` varchar(100) DEFAULT NULL,
PRIMARY KEY (`Group_Id`)
)
I am confused about how to implement the many-to-many relationship.
Which is better: to create a new table with a composite primary key of Group_Id and Permission_Id,
or to create a new table and select the columns from the two tables using the JOIN keyword?
From my blog:
Do it this way.
CREATE TABLE XtoY (
# No surrogate id for this table
x_id MEDIUMINT UNSIGNED NOT NULL, -- For JOINing to one table
y_id MEDIUMINT UNSIGNED NOT NULL, -- For JOINing to the other table
# Include other fields specific to the 'relation'
PRIMARY KEY(x_id, y_id), -- When starting with X
INDEX (y_id, x_id) -- When starting with Y
) ENGINE=InnoDB;
Notes:
⚈ Lack of an AUTO_INCREMENT id for this table -- The PK given is the 'natural' PK; there is no good reason for a surrogate.
⚈ "MEDIUMINT" -- This is a reminder that all INTs should be made as small as is safe (smaller ⇒ faster). Of course the declaration here must match the definition in the table being linked to.
⚈ "UNSIGNED" -- Nearly all INTs may as well be declared non-negative
⚈ "NOT NULL" -- Well, that's true, isn't it?
⚈ "InnoDB" -- More efficient than MyISAM because of the way the PRIMARY KEY is clustered with the data in InnoDB.
⚈ "INDEX(y_id, x_id)" -- The PRIMARY KEY makes it efficient to go one direction; this index makes the other direction efficient. No need to say UNIQUE; that would be extra effort on INSERTs.
⚈ In the secondary index, saying just INDEX(y_id) would work because it would implicitly include x_id. But I would rather make it more obvious that I am hoping for a 'covering' index.
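The table and the notes above can be exercised in SQLite via Python's sqlite3 module. This is a stand-in only: SQLite has no ENGINE clause or MEDIUMINT UNSIGNED, so plain INTEGER columns are used, and the index name idx_y_x and the sample ids are made up. It shows the composite PRIMARY KEY rejecting a duplicate link, and the two keys serving lookups starting from either side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE XtoY (
        x_id INTEGER NOT NULL,  -- for JOINing to one table
        y_id INTEGER NOT NULL,  -- for JOINing to the other table
        PRIMARY KEY (x_id, y_id)  -- when starting with X
    );
    CREATE INDEX idx_y_x ON XtoY (y_id, x_id);  -- when starting with Y
""")
conn.executemany("INSERT INTO XtoY (x_id, y_id) VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 10)])

# The natural PK already makes each (x_id, y_id) link unique.
try:
    conn.execute("INSERT INTO XtoY (x_id, y_id) VALUES (1, 10)")
    duplicate_link_allowed = True
except sqlite3.IntegrityError:
    duplicate_link_allowed = False

ys_for_x1 = [r[0] for r in conn.execute(
    "SELECT y_id FROM XtoY WHERE x_id = ? ORDER BY y_id", (1,))]
xs_for_y10 = [r[0] for r in conn.execute(
    "SELECT x_id FROM XtoY WHERE y_id = ? ORDER BY x_id", (10,))]

print(duplicate_link_allowed, ys_for_x1, xs_for_y10)  # False [10, 11] [1, 2]
```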
To conditionally INSERT new links, use IODKU (INSERT ... ON DUPLICATE KEY UPDATE).
Note that if you had an AUTO_INCREMENT in this table, IODKU would "burn" ids quite rapidly.
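A minimal sketch of the "add the link only if it's missing" pattern. SQLite has no ON DUPLICATE KEY UPDATE, so this stand-in (SQLite via Python's sqlite3) uses INSERT OR IGNORE, which likewise relies on the PRIMARY KEY to spot an existing link; link() is a hypothetical helper. It does not reproduce MySQL's AUTO_INCREMENT id burning, since this table has no surrogate id at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE XtoY (
        x_id INTEGER NOT NULL,
        y_id INTEGER NOT NULL,
        PRIMARY KEY (x_id, y_id)
    )
""")


def link(conn, x_id, y_id):
    """Add the (x_id, y_id) link unless it already exists.

    INSERT OR IGNORE stands in for MySQL's IODKU here; both use the
    PRIMARY KEY to detect the duplicate."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO XtoY (x_id, y_id) VALUES (?, ?)", (x_id, y_id))
    return cur.rowcount  # 1 if a row was added, 0 if it was already there


print(link(conn, 1, 10))  # 1
print(link(conn, 1, 10))  # 0
print(conn.execute("SELECT COUNT(*) FROM XtoY").fetchone()[0])  # 1
```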
More
In InnoDB, a FOREIGN KEY needs an index on the referencing column(s) and implicitly creates one if a suitable index doesn't already exist.
PRIMARY KEY(a,b) (1) says that the combo (a,b) is UNIQUE, and (2) orders the data by (a,b).
INDEX(a), INDEX(b) (whether generated by FOREIGN KEY or generated manually) is not the same as INDEX(a,b).
InnoDB really needs a PRIMARY KEY, so you may as well say PRIMARY KEY (a,b) instead of UNIQUE(a,b).
I know the solution.
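I need to create a "junction" table to hold the many-to-many relationship in this case.
CREATE TABLE Groups_Permissions
(
Group_Id INT NOT NULL,
Permission_Id INT NOT NULL,
PRIMARY KEY (Group_Id, Permission_Id),
FOREIGN KEY (Group_Id) REFERENCES `groups` (Group_Id),
FOREIGN KEY (Permission_Id) REFERENCES permissions (Permission_Id)
) ENGINE=InnoDB;
The combination of Group_Id and Permission_Id should be UNIQUE (here, the PRIMARY KEY) and have foreign keys to the groups and permissions tables.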

MySQL constraints involving multiple columns

I have a table in an application for which the current schema is:
CREATE TABLE quotes
(
id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
quote_request_id INT UNSIGNED NOT NULL,
quote_amount DECIMAL(12, 2) NOT NULL,
accepted TINYINT UNSIGNED NOT NULL DEFAULT 0,
FOREIGN KEY (quote_request_id) REFERENCES quote_requests(id)
) Engine=InnoDB;
I want to enforce a constraint such that only one quote can be accepted for a given quote request - i.e. an UPDATE or INSERT query should fail if it attempts to modify the table such that two or more rows with the same quote_request_id value will have an accepted value of 1.
Is this possible in MySQL? Enforcing constraints such as foreign keys, uniqueness of columns other than the primary key etc. work fine, and I can find information about applying a UNIQUE constraint to multiple columns, but I can't find anything about more complex constraints which involve multiple columns.
If you want to do this without triggers, you can add another table where only accepted quotes will be stored - and you can remove the accepted column from the quotes table:
CREATE TABLE quotes
(
id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
quote_request_id INT UNSIGNED NOT NULL,
quote_amount DECIMAL(12, 2) NOT NULL,
-- accepted TINYINT UNSIGNED NOT NULL DEFAULT 0, -- removed
FOREIGN KEY (quote_request_id) REFERENCES quote_requests(id),
UNIQUE KEY (quote_request_id, id) -- needed for the FK below
) Engine=InnoDB;
CREATE TABLE quotes_accepted
(
id INT UNSIGNED NOT NULL PRIMARY KEY,
quote_request_id INT UNSIGNED NOT NULL,
UNIQUE KEY (quote_request_id), -- this ensures there is only one
-- accepted quote per request
FOREIGN KEY (quote_request_id, id)
REFERENCES quotes(quote_request_id, id)
) Engine=InnoDB;
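Here is a minimal sketch of this design, run against SQLite through Python's sqlite3 module as a stand-in for MySQL/InnoDB. The quote_requests table is reduced to a bare id, and the ids and amounts are made up; accept() is a hypothetical helper. The four attempts show both guarantees: the UNIQUE key allows only one accepted quote per request, and the composite foreign key stops a quote from being accepted under a request it doesn't belong to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE quote_requests (id INTEGER PRIMARY KEY);
    CREATE TABLE quotes (
        id INTEGER PRIMARY KEY,
        quote_request_id INTEGER NOT NULL REFERENCES quote_requests (id),
        quote_amount NUMERIC NOT NULL,
        UNIQUE (quote_request_id, id)  -- target for the composite FK below
    );
    CREATE TABLE quotes_accepted (
        id INTEGER PRIMARY KEY,
        quote_request_id INTEGER NOT NULL UNIQUE,  -- one accepted per request
        FOREIGN KEY (quote_request_id, id)
            REFERENCES quotes (quote_request_id, id)
    );
    INSERT INTO quote_requests (id) VALUES (1), (2);
    INSERT INTO quotes (id, quote_request_id, quote_amount)
        VALUES (10, 1, 100.00), (11, 1, 95.00), (20, 2, 50.00);
""")


def accept(quote_id, request_id):
    """Try to record a quote as accepted; return True on success."""
    try:
        conn.execute(
            "INSERT INTO quotes_accepted (id, quote_request_id) VALUES (?, ?)",
            (quote_id, request_id))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    accept(10, 1),  # first accepted quote for request 1
    accept(11, 1),  # request 1 already has an accepted quote (UNIQUE)
    accept(11, 2),  # quote 11 belongs to request 1, not 2 (FOREIGN KEY)
    accept(20, 2),  # first accepted quote for request 2
]
print(results)  # [True, False, False, True]
```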
You mean you want a UNIQUE like this:
UNIQUE `quote_accepts` (`quote_request_id`, `accepted`)
where, for a repeat pair of quote_request_id & accepted, the INSERT will fail.
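One caveat with this suggestion, illustrated below: if accepted holds 0 or 1, the same UNIQUE key also forbids a second not-accepted (0) quote for the same request. A common variation stores NULL instead of 0 for "not accepted", since unique indexes in both MySQL and SQLite allow any number of NULLs. The sketch uses SQLite via Python's sqlite3 as a stand-in; the table names quotes_01 and quotes_null and the helper try_insert are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- accepted stored as 0/1
    CREATE TABLE quotes_01 (
        id INTEGER PRIMARY KEY,
        quote_request_id INTEGER NOT NULL,
        accepted INTEGER NOT NULL DEFAULT 0,
        UNIQUE (quote_request_id, accepted)
    );
    -- accepted stored as 1, or NULL for "not accepted"
    CREATE TABLE quotes_null (
        id INTEGER PRIMARY KEY,
        quote_request_id INTEGER NOT NULL,
        accepted INTEGER NULL,
        UNIQUE (quote_request_id, accepted)
    );
""")


def try_insert(table, rows):
    """Insert (id, quote_request_id, accepted) rows as one transaction.

    Returns False (and rolls back) on a constraint error. The table name
    comes from this script, never from user input."""
    try:
        with conn:
            conn.executemany(
                f"INSERT INTO {table} (id, quote_request_id, accepted) "
                "VALUES (?, ?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("quotes_01", [(1, 1, 0), (2, 1, 0)]),  # two pending: 0 collides with 0
    try_insert("quotes_null", [(1, 1, None), (2, 1, None)]),  # NULLs never collide
    try_insert("quotes_null", [(3, 1, 1), (4, 1, 1)]),  # two accepted: still rejected
]
print(results)  # [False, True, False]
```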
Answered by a_horse_with_no_name, but in a comment so it can't be accepted:
"I don't think this is possible without reverting to a trigger in MySQL because MySQL does not support partial indexes."

Enforce unique rows in MySQL

I have a table in MySQL that has 3 fields and I want to enforce uniqueness among two of the fields. Here is the table DDL:
CREATE TABLE `CLIENT_NAMES` (
`ID` int(11) NOT NULL auto_increment,
`CLIENT_NAME` varchar(500) NOT NULL,
`OWNER_ID` int(11) NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The ID field is a surrogate key (this table is being loaded with ETL).
CLIENT_NAME contains the names of clients.
OWNER_ID is an ID that indicates a client's owner.
I thought I could enforce this with a unique index on CLIENT_NAME and OWNER_ID,
ALTER TABLE `DW`.`CLIENT_NAMES`
ADD UNIQUE INDEX enforce_unique_idx(`CLIENT_NAME`, `OWNER_ID`);
but MySQL gives me an error:
Error executing SQL commands to update table.
Specified key was too long; max key length is 765 bytes (error 1071)
Anyone else have any ideas?
MySQL can't build an index, and therefore can't enforce uniqueness, on a key longer than this limit (765 bytes here), and 500 UTF8 characters can take up to 1500 bytes.
Does CLIENT_NAME really need to be 500 characters long? Seems a bit excessive.
Add a new (shorter) column that is hash(CLIENT_NAME). Get MySQL to enforce uniqueness on that hash instead.
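A minimal sketch of the hash idea, using SQLite through Python's sqlite3 as a stand-in for MySQL. Assumptions: the hash is computed in application code with SHA-256 from Python's hashlib and stored in a hypothetical NAME_HASH column; add_client is a hypothetical helper. In MySQL such a column would typically be BINARY(32), so the unique key stays at a fixed 36 bytes together with the 4-byte OWNER_ID. One thing to watch: MySQL's default collations compare case-insensitively, so a unique key on CLIENT_NAME itself would treat 'Acme' and 'ACME' as equal, while a hash of the raw text does not unless you normalize the name first.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CLIENT_NAMES (
        ID INTEGER PRIMARY KEY,
        CLIENT_NAME TEXT NOT NULL,
        NAME_HASH BLOB NOT NULL,  -- 32-byte SHA-256 of CLIENT_NAME
        OWNER_ID INTEGER NOT NULL,
        UNIQUE (NAME_HASH, OWNER_ID)  -- short, fixed-length unique key
    )
""")


def add_client(name, owner_id):
    """Insert a client name; return False if (name, owner) already exists."""
    name_hash = hashlib.sha256(name.encode("utf-8")).digest()
    try:
        with conn:
            conn.execute(
                "INSERT INTO CLIENT_NAMES (CLIENT_NAME, NAME_HASH, OWNER_ID) "
                "VALUES (?, ?, ?)", (name, name_hash, owner_id))
        return True
    except sqlite3.IntegrityError:
        return False


# 450 characters: up to 1350 bytes in 3-byte UTF8, over the limit above.
long_name = "Acme " * 90
results = [add_client(long_name, 1), add_client(long_name, 1), add_client(long_name, 2)]
print(results)  # [True, False, True]
```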
Have you looked at CONSTRAINT ... UNIQUE?
Something seems a bit odd about this table; I would actually think about refactoring it. What do ID and OWNER_ID refer to, and what is the relationship between them?
Would it make sense to have
CREATE TABLE `CLIENTS` (
`ID` int(11) NOT NULL auto_increment,
`CLIENT_NAME` varchar(500) NOT NULL,
# other client fields - address, phone, whatever
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `CLIENTS_OWNERS` (
`CLIENT_ID` int(11) NOT NULL,
`OWNER_ID` int(11) NOT NULL,
PRIMARY KEY (`CLIENT_ID`,`OWNER_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I would really avoid adding a unique key like that on a 500 character string. It's much more efficient to enforce uniqueness on two ints, plus an id in a table should really refer to something that needs an id; in your version, the ID field seems to identify just the client/owner relationship, which really doesn't need a separate id, since it's just a mapping.
For the UTF8 charset, MySQL may use up to 3 bytes per character, so CLIENT_NAME can need 3 x 500 = 1500 bytes. Shorten CLIENT_NAME to 250.
later: +1 to creating a hash of the name and using that as the key.
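The arithmetic behind this answer, written as two small helpers (worst_case_key_bytes and max_indexable_chars are hypothetical names). It assumes 3 bytes per character for MySQL's utf8 (utf8mb3) charset and takes the 765-byte limit from the error message in the question; note that 765 is exactly 255 x 3. With utf8mb4, up to 4 bytes per character, the usable character count drops further.

```python
BYTES_PER_CHAR_UTF8 = 3  # MySQL's utf8 (utf8mb3): up to 3 bytes per character
KEY_LIMIT_BYTES = 765  # limit reported by error 1071 in the question


def worst_case_key_bytes(varchar_len, extra_bytes=0,
                         bytes_per_char=BYTES_PER_CHAR_UTF8):
    """Worst-case key size for a VARCHAR(varchar_len) column, plus any
    other fixed-size key columns (e.g. a 4-byte INT) passed as extra_bytes."""
    return varchar_len * bytes_per_char + extra_bytes


def max_indexable_chars(limit=KEY_LIMIT_BYTES, bytes_per_char=BYTES_PER_CHAR_UTF8):
    """Longest VARCHAR whose worst-case size still fits under the limit."""
    return limit // bytes_per_char


print(worst_case_key_bytes(500))  # 1500: over the limit
print(worst_case_key_bytes(250))  # 750: fits
print(worst_case_key_bytes(250, extra_bytes=4))  # 754: fits even counting OWNER_ID
print(max_indexable_chars())  # 255
```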