I'm just getting into indexes on MySQL using InnoDB.
Firstly, and hopefully I am right: because I am using InnoDB and creating foreign keys, will those foreign key columns automatically be indexed and used when querying the table? Is that correct?
Also, I'm reading that the order of the index will affect the speed of a query, and even whether it is used.
So... how exactly do I specify the order of the index, if that will indeed impact queries?
If you take my table below, for example, it would be very beneficial for a query on this table to first use the FK index on org_id, since that is going to greatly reduce the number of rows read, and it is org_id that most data is going to be separated by in the application.
CREATE TABLE IF NOT EXISTS `completed_checks` (
`complete_check_id` int(15) NOT NULL AUTO_INCREMENT,
`check_type` varchar(40) NOT NULL,
`check_desc` varchar(200) DEFAULT NULL,
`assigned_user` int(12) DEFAULT NULL,
`assigned_area` int(12) DEFAULT NULL,
`org_id` varchar(8) NOT NULL,
`check_notes` varchar(300) DEFAULT NULL,
`due` date NOT NULL,
`completed_by` int(12) DEFAULT NULL,
`completed_on` datetime DEFAULT NULL,
`status` int(1) DEFAULT NULL,
`passed` int(1) DEFAULT '0',
PRIMARY KEY (`complete_check_id`),
KEY `fk_org_id_CCheck` (`org_id`),
KEY `fk_user_id_CCheck` (`assigned_user`),
KEY `fk_AreaID_CCheck` (`assigned_area`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
So would MySQL use the FK index on org_id first when querying this table with org_id in the WHERE clause?
And on a separate note, how would I specify the order in which the indexes are used in MySQL, if this is something I need to be concerned about?
Thanks
Yes, this is correct; see the MySQL documentation on creating foreign keys:
index_name represents a foreign key ID. The index_name value is ignored if there is already an explicitly defined index on the child table that can support the foreign key. Otherwise, MySQL implicitly creates a foreign key index
The order of the indexes in a table does not affect which index a query will use. You cannot even say that in general all queries should use a particular index first, since different queries may need different indexes. Moreover, apart from the index merge optimization, MySQL uses at most one index per table in a query.
In general, MySQL decides which index to use (if any). If you believe that MySQL erred in its decision, you can use an index hint to influence it:
Index hints give the optimizer information about how to choose indexes during query processing.
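For example, a minimal sketch against the completed_checks table above (the org_id value is made up for illustration):
-- USE INDEX suggests the index; the optimizer may still choose a table scan:
SELECT * FROM completed_checks USE INDEX (fk_org_id_CCheck)
WHERE org_id = 'ORG001';
-- FORCE INDEX is stronger; a table scan is used only if the index cannot be used at all:
SELECT * FROM completed_checks FORCE INDEX (fk_org_id_CCheck)
WHERE org_id = 'ORG001';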
In the newer versions of MySQL you can also use optimizer hints to influence the query plan.
The last way to influence index use is to force an update of the index statistics collected on a table, using the ANALYZE TABLE statement:
ANALYZE TABLE analyzes and stores the key distribution for a table.
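For the table in the question, that is simply:
ANALYZE TABLE completed_checks;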
Example:
Here is the employee table:
CREATE TABLE `employees` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(50) NOT NULL,
`code` varchar(4) NOT NULL,
`deleted_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
);
The code is a simple 4-character login code. Soft delete is implemented using the deleted_at field: current employees are those whose deleted_at is NULL.
We need to keep the code unique among current employees.
Using a UNIQUE constraint on the code field alone would prevent current employees from using codes that have been used by a soft-deleted employee.
How to enforce this constraint?
This is an example of the general problem of how to enforce consistency constraints in MySQL.
Edit:
The schema could be changed to make use of unique constraints, as @bill-karwin suggests.
What about applying complex consistency constraints that may span multiple tables?
One way (if possible) is to change the schema in order to apply the constraints using a foreign key constraint or a unique constraint.
Is there another way to apply complex consistency constraints?
One relatively simple solution to your problem would be to change the deleted_at column to default to something other than NULL (e.g. '1900-01-01', or even the "zero" date '0000-00-00' if you have those enabled). You can then create a UNIQUE index on (code, deleted_at). This prevents any employee from taking a code that a current employee holds, since that would match on (code, default), but it does not stop them reusing a code from a previous employee, since the default value will not match that employee's deleted_at timestamp.
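A minimal sketch of that approach against the employees table above (the sentinel date and the index name uniq_code_active are arbitrary choices):
-- Make "not deleted" a concrete sentinel value instead of NULL:
ALTER TABLE employees
  MODIFY deleted_at datetime NOT NULL DEFAULT '1900-01-01';
-- Two current employees can now never share a code, because both
-- would have deleted_at = '1900-01-01':
ALTER TABLE employees
  ADD UNIQUE KEY uniq_code_active (code, deleted_at);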
One solution is to create a nullable column is_active that is restricted to either NULL or a single non-NULL value. The columns code and is_active together must be unique.
CREATE TABLE `employees` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(50) NOT NULL,
`code` varchar(4) NOT NULL,
`is_active` enum('yes'),
`deleted_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY (`code`, `is_active`)
);
If is_active is NULL, then it allows any number of duplicates in the code column. If is_active is not NULL, then it allows only one value 'yes' and therefore each value in the code column must be unique.
deleted_at no longer indicates whether the employee is currently active or not, only when they were inactivated.
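A usage sketch (the names and values are hypothetical):
-- A current employee must be inserted with is_active = 'yes',
-- which makes the (code, is_active) unique key bite:
INSERT INTO employees (name, code, is_active) VALUES ('Alice', '1234', 'yes');
-- Soft delete: set is_active to NULL so the code becomes reusable,
-- and record when the employee was inactivated:
UPDATE employees SET is_active = NULL, deleted_at = NOW() WHERE id = 1;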
Re your comment:
Constraints that span multiple tables are called ASSERTIONS in the SQL standard, but there is literally no RDBMS product that implements that feature from the standard.
Some implement constraints with triggers, but it's not always obvious how to design triggers to do what you want efficiently.
Honestly, most people resort to application logic for these sorts of constraints. This comes with some risk of race conditions. As soon as you do a SELECT statement to verify the data satisfies the constraints, some other concurrent session may commit data that spoils the constraint before your session can commit its changes.
The only solution is to use pessimistic locking to ensure no other session can jump ahead of you.
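As a sketch of that pattern (here the unique key above already covers this particular rule, so take it purely as an illustration of the locking idea):
START TRANSACTION;
-- Lock any conflicting rows (and, in InnoDB, the index gap) so no
-- concurrent session can sneak in a conflicting insert:
SELECT id FROM employees
WHERE code = '1234' AND is_active = 'yes'
FOR UPDATE;
-- The application checks the result; if no row came back, the code is free:
INSERT INTO employees (name, code, is_active) VALUES ('Bob', '1234', 'yes');
COMMIT;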
I have an extremely large MySQL table that I would like to partition. A simplified CREATE statement for this table is given below -
CREATE TABLE `myTable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`columnA` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`columnB` varchar(50) NOT NULL,
`columnC` int(11) DEFAULT NULL,
`columnD` varchar(255) DEFAULT NULL,
`columnE` int(11) DEFAULT NULL,
`columnF` varchar(255) DEFAULT NULL,
`columnG` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UNIQ_B` (`columnB`),
UNIQUE KEY `UNIQ_B_C` (`columnB`,`columnC`),
UNIQUE KEY `UNIQ_C_D` (`columnC`,`columnD`),
UNIQUE KEY `UNIQ_E_F_G` (`columnE`,`columnF`,`columnG`)
)
I want to partition my table either by columnA or id, but the problem is that the MySQL Manual states -
In other words, every unique key on the table must use every column in the table's partitioning expression.
This means that I cannot partition the table on either of those columns without changing my schema. For example, I have considered adding id to all my unique keys, like so -
CREATE TABLE `myTable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`columnA` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`columnB` varchar(50) NOT NULL,
`columnC` int(11) DEFAULT NULL,
`columnD` varchar(255) DEFAULT NULL,
`columnE` int(11) DEFAULT NULL,
`columnF` varchar(255) DEFAULT NULL,
`columnG` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UNIQ_B` (`columnB`,`id`),
UNIQUE KEY `UNIQ_B_C` (`columnB`,`columnC`,`id`),
UNIQUE KEY `UNIQ_C_D` (`columnC`,`columnD`,`id`),
UNIQUE KEY `UNIQ_E_F_G` (`columnE`,`columnF`,`columnG`,`id`)
)
I do not mind doing this, except for the fact that it allows the creation of rows that should not be created. For example, under my original schema, the following row insertion wouldn't have worked twice -
INSERT INTO myTable (columnC, columnD) VALUES (1, '2.0');
But it works with the second schema, as columnC and columnD by themselves no longer form a unique key. I have considered getting around this by using triggers to prevent the creation of such rows, but then the trigger cost would reduce (or even outweigh) the partitioning performance gain.
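For reference, this is what attempting to partition the original schema runs into (a sketch; the exact error text may vary between versions):
ALTER TABLE myTable PARTITION BY HASH (id) PARTITIONS 16;
-- ERROR 1503 (HY000): A UNIQUE INDEX must include all columns in the
-- table's partitioning function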
Edited:
Some additional information about this table:
The table has more than 1.2 billion records.
We are using MySQL 5.6.34 with the InnoDB engine, running on AWS RDS.
A few other indexes also exist on this table.
Because of the huge data volume and the multiple indexes, inserting and retrieving data is expensive.
There are no unique indexes on timestamp or float data types in the real table; the schema above was just a sample for illustration. Our actual table has a similar schema.
Other than partitioning, what options do we have to improve the performance of the table without losing any data and while maintaining the integrity constraints?
How do I partition a MySQL table that contains several unique keys?
Sorry to say, you don't.
Also, you shouldn't. Remember that UPDATE and INSERT operations on a table with unique keys necessarily must query the table to ensure the keys stay unique. If it were possible to partition a table so that unique keys weren't built into the partition expression, then every insert or update would require querying every partition. This would likely make the partitioning worse than useless.
We have a table like this to save login tokens per user session. This table was not partitioned earlier, but we have now decided to partition it to improve performance, as it contains several million rows.
CREATE TABLE `tokens` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`uid` int(10) unsigned DEFAULT NULL,
`session` int(10) unsigned DEFAULT '0',
`token` varchar(128) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
UNIQUE KEY `usersession` (`uid`,`session`),
KEY `uid` (`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 PARTITION BY HASH(id) PARTITIONS 101;
We plan to partition based on id, as it is primarily used in SELECT queries and hence pruning can be effective.
However, the problem is that we maintain a unique index on (uid, session), and partitioning requires the partitioning column to be part of every unique index. A unique index on (id, uid, session) doesn't make sense in this case (it will always be unique).
Is there any way to get around this issue without manually checking (uid, session)?
Don't use partitioning. It won't speed up this kind of table.
I have yet to see a case of BY HASH that speeds up a system.
It is almost never useful to partition on the PRIMARY KEY.
In general, don't have an AUTO_INCREMENT id when you have a perfectly good "natural" PK -- (uid, session). Or should it be (token)?
Don't have one index being the first part of another: (uid) is redundant, given (uid, session).
Consider using utf8mb4 if you expect to have Emoji or Chinese. On the other hand, if token is, say, base64, then make it ascii or something.
So, I think this will work significantly better (smaller, faster, etc):
CREATE TABLE `tokens` (
`uid` int(10) unsigned DEFAULT NULL,
`session` int(10) unsigned DEFAULT '0',
`token` VARBINARY(128) NOT NULL DEFAULT '',
PRIMARY KEY (token)
) ENGINE=InnoDB
Which of these do you search by?
WHERE token = ...
WHERE uid = ... AND session = ...
One drawback is that I got rid of id; if id is needed by other tables, then a change is needed there.
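If you search by both, here is a sketch of the extra key to add on top of the table above (keeping the uniqueness of (uid, session) if your business rules still need it):
ALTER TABLE tokens ADD UNIQUE KEY usersession (uid, session);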
Presumably your unique key on (uid, session) enforces some business rule for you.
Do you rely on DBMS enforcement of that rule? Do you use INSERT .... ON DUPLICATE KEY UPDATE... statements, or use error handlers, or some such thing, to handle this uniqueness? Or is it there just for good measure?
If you rely on that unique index, partitioning this table on id will not work. Fugeddaboudit.
If you can delete that index, or delete its unique constraint, you may be able to proceed with partitioning. But partitioning isn't generally suitable for tables with multiple unique keys.
A 40M-row table is ordinarily not large enough to be a good candidate for partitioning. If you're having performance problems you should investigate improving your indexing instead.
Edit: If you have modern hardware (multi-terabyte storage, plenty of RAM) and well-chosen indexes, partitioning is (I believe) more trouble than it's worth. It's definitely a lot of trouble for tables with fewer than about 10**9 rows. When your autoincrementing id values must be BIGINT rather than INT data types (because int.MaxValue isn't big enough), that's when partitioning starts to be worth considering.
It's most effective when all queries filter based on the partitioning key. Filtering on other criteria without the partitioning key is slow.
Pro tip: The old saying about regular expressions also applies to partitions. If you solve a problem with partitioning, now you have two problems.
I have a table in my database which looks like this (names changed to comply with an NDA):
CREATE TABLE `Job` (
`id` varchar(45) NOT NULL,
`type` int(11) NOT NULL,
`status` int(11) NOT NULL,
`parent_id` varchar(45) DEFAULT NULL,
`created_on` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `JobTypeFK_idx` (`type`),
KEY `JobStatusFK_idx` (`status`),
KEY `JobTypeFK_idx1` (`type`),
KEY `JobStatusFK_idx1` (`status`),
KEY `JobParentIDFK_idx` (`parent_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I have read about the significance of naming indexes, as per this question:
significance of index name in creating an index (mySQL)
Unfortunately, it doesn't talk about the situation where there is more than one duplicate index on the same column.
There is another question, but relevant to SQL Server
Same column with multiple index names. It is possible. What is the use?
Unfortunately, I am not working with SQL Server. I was cleaning up the schema to use newer MySQL features when I came across these duplicate index names, which I want to remove. I just wanted to know if there are any possible problems I might face later. If I keep worrying about breaking something, I will never be able to clean up the schema.
As far as I know, the only place where index names are used (other than DDL statements to modify and drop indexes) is in index hints. This allows you to suggest or force MySQL to use a specific index in a query, and it identifies them by name. If you ever make use of this feature, and you remove the index that's required by the query, the query will get an error.
As this feature is very rarely used, you can probably remove the redundant indexes without worrying about breaking anything. On the off chance that you do use this feature, just make sure you remove the index that isn't named. On the really unlikely chance that you have different queries that force different names of indexes on the same column, rewrite them to use the same index name, and then remove the other index.
You can search your code for the regular expression:
\b(using|ignore|force)\s+(index|key)\b
to find any uses of this feature.
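Once you have confirmed that nothing hints at them, the duplicates in the Job table above can be dropped in a single statement:
ALTER TABLE `Job`
  DROP INDEX `JobTypeFK_idx1`,
  DROP INDEX `JobStatusFK_idx1`;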
I have the following table:
create table stuff (
id mediumint unsigned not null auto_increment primary key,
title varchar(150) not null,
link varchar(250) not null,
time timestamp default current_timestamp not null,
content varchar(1500)
);
If I EXPLAIN the query
select id from stuff order by id;
then it says it uses the primary key as an index for ordering the results. But with this query:
select id,title from stuff order by id;
EXPLAIN says no possible keys and it resorts to filesort.
Why is that? Isn't the data of a certain row stored together in the database? If it can order the results using the index when I'm querying only the id, then why does adding another column to the query make a difference? The primary key already identifies the row, so I think it should use the primary key for ordering in the second case too.
Can you explain why this is not the case?
Sure: it is more performant for this query. Otherwise MySQL would need to read the full index and then iteratively fetch row after row from the data file, which is extremely inefficient. Instead, MySQL simply prefers to read the data right from the data file and sort it.
Also, what kind of storage engine do you use? It seems like MyISAM.
For this case InnoDB would be more efficient, since it uses a clustered index on the primary key (which is monotonically increasing in your case).
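If the filesort actually hurts on MyISAM, one way around it is a covering index, at the cost of extra disk space and slower writes (a sketch; idx_id_title is an arbitrary name):
ALTER TABLE stuff ADD INDEX idx_id_title (id, title);
-- "SELECT id, title FROM stuff ORDER BY id" can now be satisfied
-- entirely from the index, so EXPLAIN no longer shows a filesort.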