MySQL: INDEX name (lastName, firstName) - mysql

This is the query from a tutorial I read:
CREATE TABLE Employee (
id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
departmentId TINYINT UNSIGNED NOT NULL
COMMENT "CONSTRAINT FOREIGN KEY (departmentId) REFERENCES Department(id)",
firstName VARCHAR(20) NOT NULL,
lastName VARCHAR(40) NOT NULL,
email VARCHAR(60) NOT NULL,
ext SMALLINT UNSIGNED NULL,
hireDate TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
leaveDate DATETIME NULL,
INDEX name (lastName, firstName),
INDEX (departmentId)
)
What is the function of INDEX name (lastName, firstName)?
Please inform me if my question is not clear.
Thank you,
GusDe

INDEX name (lastName, firstName) creates an additional index, called name, for fast lookups when you query by lastName, with or without firstName.
It is a composite index because it includes two columns.
Added: The author of the tutorial is "guessing" that employees will often be looked up by their name or by their departmentId. That's why he or she created the two additional indexes.
-- The primary key index is automatically created for you in most database systems.
In real life, it is not wise to rely solely on "guessing" which columns in the tables should be indexed. Instead, use the "slow queries" log (MySQL example) to determine which queries are executing slowly and how to speed them up. Usually the answer is to add another index or two.
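As a rough sketch of switching that log on at runtime: this assumes an account allowed to set global variables, and the one-second threshold is only an illustrative value, not a recommendation.

```sql
-- Enable the slow query log on the running server.
SET GLOBAL slow_query_log = 'ON';

-- Log statements that run longer than 1 second (illustrative threshold).
-- The new global value is picked up by connections opened afterwards.
SET GLOBAL long_query_time = 1;

-- Show where the log file is being written.
SHOW VARIABLES LIKE 'slow_query_log_file';
```

Settings changed with SET GLOBAL are lost on restart unless they are also put in the server configuration file.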
P.S. The downside of indexes is that they increase the time required to add, update or delete data in the table, since both the table and the index have to be modified. A second downside is that indexes take up room in the database. But storage is cheap these days.
Since most databases have far more reads than writes, the speedup in querying provided by the index usually far outweighs the costs.
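To make the "with or without the first" point concrete, here is a minimal sketch against the Employee table from the question. The names 'Smith' and 'Anna' are made-up sample values, and the plan MySQL actually picks depends on the data in the table.

```sql
-- Filter on the leftmost column only: INDEX name can be used.
EXPLAIN SELECT id, email FROM Employee
WHERE lastName = 'Smith';

-- Filter on both columns: INDEX name can be used with both parts.
EXPLAIN SELECT id, email FROM Employee
WHERE lastName = 'Smith' AND firstName = 'Anna';

-- Filter on firstName alone: not a leftmost prefix of INDEX name,
-- so MySQL cannot seek into the index and typically scans instead.
EXPLAIN SELECT id, email FROM Employee
WHERE firstName = 'Anna';
```

In the EXPLAIN output, the possible_keys and key columns show which indexes were considered and which one was chosen.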

Related

Clustered index for foreign key column

Consider the following example of a messaging system:
create table chat_group
(
id int auto_increment primary key,
title varchar(100) not null,
date_created date not null
)
create table chat_message
(
id int auto_increment,
user_id int not null,
chat_group_id int not null,
message text charset utf8mb4 not null,
date_created datetime not null
)
I see that the most common query against the chat_message table is SELECT * FROM chat_message WHERE chat_group_id = ?. So my idea is to put a clustered index on the chat_group_id column, so that the chat messages are organized by group on disk.
But MySQL requires the PRIMARY KEY (which is the clustered index) to be unique, so what is the solution here? What clustered index should I create in this situation?
Yes, "you can have your cake and eat it, too":
PRIMARY KEY(chat_group_id, id),
INDEX(id)
The PK provides "clustering" by the group; this is likely to speed up your main queries. Including id makes it UNIQUE, which is a requirement (in MySQL) for the PK.
The secondary INDEX(id) is the minimum needed to keep AUTO_INCREMENT happy -- namely having some index starting with the id.
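Put together, the table from the question might look like the sketch below. ENGINE=InnoDB is an assumption here (the clustering behaviour described above is InnoDB's), and the group id 42 is just a sample value.

```sql
CREATE TABLE chat_message
(
    id            INT AUTO_INCREMENT NOT NULL,
    user_id       INT NOT NULL,
    chat_group_id INT NOT NULL,
    message       TEXT CHARSET utf8mb4 NOT NULL,
    date_created  DATETIME NOT NULL,
    -- Clusters rows by group; id makes the key unique.
    PRIMARY KEY (chat_group_id, id),
    -- AUTO_INCREMENT needs some index that starts with id.
    INDEX (id)
) ENGINE=InnoDB;

-- The main query then reads one contiguous range of the clustered index.
SELECT * FROM chat_message WHERE chat_group_id = 42 ORDER BY id;
```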

mysql choose between unique key and primary key for user id

I'm creating a user database. I want to move the user's cellphone number out of the 'user' table and into a table of its own (the user_cellphone table),
but I'm having trouble choosing the best index.
The user_cellphone table holds user_id and the cellphone number, but the SELECT queries are mostly based on 'user_id', so I want to know whether it's better to make the 'user_id' column the primary key or not.
(Also, each user has only one cellphone number.)
Which of these two options is better?
CREATE TABLE `user_cellphone_num` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`cellphone_country_code` SMALLINT UNSIGNED NOT NULL,
`cellphone_num` BIGINT UNSIGNED NOT NULL,
`user_id` INT UNSIGNED NOT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `cellphone` (`cellphone_country_code`, `cellphone_num`),
UNIQUE INDEX `user_id` (`user_id`)
)
CREATE TABLE `user_cellphone_num` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`cellphone_country_code` SMALLINT UNSIGNED NOT NULL,
`cellphone_num` BIGINT UNSIGNED NOT NULL,
`user_id` INT UNSIGNED NOT NULL,
PRIMARY KEY (`user_id`),
UNIQUE INDEX `id` (`id`),
UNIQUE INDEX `cellphone` (`cellphone_country_code`, `cellphone_num`)
)
Should I choose 'user_id' as the primary key, or just make 'user_id' a unique key? Is there any difference in performance? (I'm talking about when I have millions of rows.)
In the future I'm going to use queries like this:
select u.*,cell.* FROM user AS u LEFT JOIN user_cellphone AS cell ON cell.user_id = u.id
So which of these options gives me better performance for queries like this?
May I offer some hard-won data design advice?
Do not use telephone numbers as any kind of unique or primary key.
Why not?
Sometimes multiple people use a single number.
Sometimes people make up fake numbers.
People punctuate numbers based on context. To my neighbors, my number is (978)555-4321. To a customer in the Netherlands it is +1.978.555.4321. Can you write a program to regularize those numbers? Of course. Can you write a correct program to do that? No. Why bother trying? Just take whatever people give you.
(Unless you work for a mobile phone provider, in which case ask your database administrator.)
Read this carefully. https://github.com/google/libphonenumber/blob/master/FALSEHOODS.md
InnoDB tables are stored as a clustered index, also called an index-organized table. If the table has a PRIMARY KEY, then that is used as the key for the clustered index. The other UNIQUE KEY is a secondary index.
Queries where you look up rows by the clustered index are a little bit more efficient than using a secondary index, even if that secondary index is a unique index. So if you want to optimize for the most common query which you say is by user_id, then it would be a good idea to make that your clustered index.
In your case, it would be kind of strange to separate the cellphones into a separate table, but then make user_id alone be the PRIMARY KEY. That means that only one row per user_id can exist in this table. I would have expected that you separated cellphones into a separate table to allow each user to have multiple phone numbers.
You can get the same benefit of the clustered index if you just make sure user_id is the first column in a compound key:
CREATE TABLE `user_cellphone_num` (
`user_id` INT UNSIGNED NOT NULL,
`num` TINYINT UNSIGNED NOT NULL,
`cellphone_country_code` SMALLINT UNSIGNED NOT NULL,
`cellphone_num` BIGINT UNSIGNED NOT NULL,
PRIMARY KEY (`user_id`, `num`)
)
So a query like SELECT ... FROM user_cellphone_num WHERE user_id = ? will match one or more rows, but it will be an efficient lookup because it's searching the first column of the clustered index.
Reference: https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html
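A small usage sketch for the compound-key table above. The user id and phone numbers are invented sample values.

```sql
-- Two numbers for the same user; num distinguishes them.
INSERT INTO user_cellphone_num
    (user_id, num, cellphone_country_code, cellphone_num)
VALUES
    (1001, 1, 1, 9785554321),
    (1001, 2, 31, 612345678);

-- Lookup by the first column of the clustered index.
SELECT num, cellphone_country_code, cellphone_num
FROM user_cellphone_num
WHERE user_id = 1001
ORDER BY num;
```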

Priority of primary key in multiple column index?

I'm new to sql and now working with MySQL.
I'm going through the concept of indexes and I'm not sure what would happen in the following case:
CREATE TABLE test (
id INT NOT NULL,
last_name CHAR(30) NOT NULL,
first_name CHAR(30) NOT NULL,
PRIMARY KEY (id),
INDEX name (last_name,first_name)
);
I have read that here last_name or (last_name, first_name) can be used for lookups, whereas first_name cannot be used for a lookup directly (it is not a leftmost prefix of the index).
I have also read that PRIMARY KEY and UNIQUE KEY are indexed automatically. So where does the id index come in, in my case? Doesn't it count as a leftmost prefix?
select * from test
where id=xxx and last_name=xxxx
Will this use an index lookup, or will it search the entire table?
First, your query is redundant. The id comparison is sufficient.
The optimizer is going to recognize that two indexes can be used for the query. I'm pretty sure that MySQL will choose the primary key index, because it is unique and clustered. Hence, it is obviously the correct one.
If neither index is unique or a primary key, then MySQL will resort to statistics about the indexes (or arbitrarily choose one of them). You can read about index statistics in the documentation.
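One way to check which index is chosen is EXPLAIN. This is only a sketch with made-up values; the plan reported depends on the MySQL version and on the data.

```sql
-- Both PRIMARY and name are candidates here; the key column of the
-- output shows which one the optimizer actually picked.
EXPLAIN SELECT * FROM test WHERE id = 1 AND last_name = 'Smith';

-- Only the name index applies (last_name is its leftmost column).
EXPLAIN SELECT * FROM test WHERE last_name = 'Smith';

-- first_name alone is not a leftmost prefix, so no index seek is
-- possible; at best MySQL scans a whole index or the whole table.
EXPLAIN SELECT * FROM test WHERE first_name = 'Anna';
```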

mysql innodb primary key

Currently we have tables that use a type 4 (random) UUID as the primary key.
Our application layer does batch inserts into the DB.
But since the primary key is random, this still results in many random disk seeks.
There are four tables:
CREATE TABLE process (
process_id binary(16) not null, -- uuid
created_time bigint not null,
owner varchar(150) not null,
primary key (process_id),
index idx_c_t (created_time),
index idx_o (owner)
)
Records inserted = 2000/min
CREATE TABLE process_job (
job_id binary(16) not null, -- uuid
process_id binary(16) not null, -- uuid
info varchar(200),
text varchar(500),
primary key(job_id),
index idx_p_id (process_id)
)
Records inserted = 10000/min
CREATE TABLE ob_status (
job_id binary(16) not null, -- uuid
status enum('STARTED', 'SUCCESS', 'ERROR') not null,
job_code varchar(100) not null,
info varchar(200),
text varchar(500),
primary key(job_id, status, job_code)
)
Records inserted = 20000/min
CREATE TABLE process_job_custom (
job_id binary(16) not null, -- uuid
`key` varchar(100) not null,
value varchar(500),
primary key(job_id, `key`)
)
Records inserted = 10000/min
All our tables use the DYNAMIC row format.
We also periodically delete data older than 15 days.
We run these deletes in batches of around 1000 records.
But whenever the deletes run, the performance of the whole DB suffers and disk usage is very high. (We suspect this is due to the randomness of the primary key.)
So we are planning to change our primary key to (time-based key, uuid) and add an index on the uuid columns.
The records may arrive in random order (not exactly in time-based key order).
But the records for a given time-based key mostly arrive within a 5-minute spread.
Our deletes are also based on the time-based key.
Will this affect the performance of inserts?
Our primary use cases also involve time-based queries.
So will select performance improve as well?
We are also planning to partition by the time-based key.
Will this give us better performance overall?
We suspect the major issue was the randomness of the primary key.
Will adding the time-based key (something like the created time of the process) as the first part of the primary key in all the tables, and indexing the uuid columns, help us?
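For concreteness, the proposed change to the process table could be sketched as follows. This only illustrates the plan described above and says nothing about how it will perform. Several details are assumptions: created_time is treated as the time-based key and as epoch milliseconds, the separate idx_c_t index is dropped because the primary key now starts with created_time, and idx_p_id is a hypothetical name for the new uuid index.

```sql
CREATE TABLE process (
    process_id   BINARY(16)   NOT NULL,  -- UUID
    created_time BIGINT       NOT NULL,  -- time-based key (assumed epoch ms)
    owner        VARCHAR(150) NOT NULL,
    -- Rows are clustered by time; the UUID keeps the key unique.
    PRIMARY KEY (created_time, process_id),
    -- Lookups by UUID alone still need their own index.
    INDEX idx_p_id (process_id),
    INDEX idx_o (owner)
);

-- Batched purge of rows older than 15 days, oldest first.
SET @cutoff = (UNIX_TIMESTAMP() - 15 * 24 * 60 * 60) * 1000;
DELETE FROM process
WHERE created_time < @cutoff
ORDER BY created_time
LIMIT 1000;
```

With this layout, each delete batch targets the oldest end of the clustered index rather than rows scattered by random UUID.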

mySQL KEY Partitioning using three table fields (columns)

I am writing a data warehouse, using MySQL as the back-end. I need to partition a table based on two integer IDs and a name string. I have read (parts of) the MySQL documentation regarding partitioning, and it seems the most appropriate partitioning scheme in this scenario would be either HASH or KEY partitioning.
I have elected for KEY partitioning because I (chickened out and) don't want to be responsible for providing a 'collision free' hashing algorithm for my fields. Instead, I am relying on MySQL's own hashing to generate the keys required for partitioning.
I have included below a snippet of the schema of the table that I would like to partition based on the COMPOSITE of the following fields:
school_id, course_id, ssname (student surname).
BTW, before anyone points out that this is not the best way to store school-related information, I should point out that I am only using the case below as an analogy for what I am trying to model.
My Current CREATE TABLE statement looks like this:
CREATE TABLE foobar (
id int UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
school_id int UNSIGNED NOT NULL,
course_id int UNSIGNED NOT NULL,
ssname varchar(64) NOT NULL,
/* some other fields */
FOREIGN KEY (school_id) REFERENCES school(id) ON DELETE RESTRICT ON UPDATE CASCADE,
FOREIGN KEY (course_id) REFERENCES course(id) ON DELETE RESTRICT ON UPDATE CASCADE,
INDEX idx_fb_si (school_id),
INDEX idx_fb_ci (course_id),
CONSTRAINT UNIQUE INDEX idx_fb_scs (school_id,course_id,ssname(16))
) ENGINE=innodb;
I would like to know how to modify the statement above so that the table is partitioned using the three fields I mentioned at the beginning of this question (namely school_id, course_id and the starting letter of the student's surname).
Another question I would like to ask is this:
What happens in 'edge' situations? For example, if I attempt to insert a record that contains a valid* school_id, course_id or surname for which no underlying partitioned table file exists, will MySQL automatically create the underlying file?
Case in point: I have the following schools: New York Kindergarten, Belfast Elementary; and the following courses: Lie Algebra in Infinitesimal Dimensions, Entangled Entities.
Also assume I have the following students (surnames): Bush, Blair, Hussein.
When I add a new school (or course, or student), can I insert them into the foobar table? (Actually, I can't think why not.) The reason I ask is that I foresee adding more schools, courses and so on, which means that MySQL will have to create additional tables behind the scenes (as the hash will generate new keys).
I will be grateful if someone with experience in this area can confirm (preferably with links backing their assertion) that my understanding is correct, i.e. that no manual administration is required if I add new schools, courses or students to the database.
I don't know if my second question was well formed (clear) or not. If not, I will be glad to clarify further.
*VALID - by valid, I mean that it is valid in terms of not breaking referential integrity.
I doubt partitioning is as useful as you think. That said, there are a couple of other problems with what you're asking for (note: the entirety of this answer applies to MySQL 5; version 6 might be different):
columns used in KEY partitioning must be a part of the primary key. school_id, course_id and ssname are not part of the primary key.
more generally, every UNIQUE key (including the primary key) must include all columns in the partition. This means you can only partition on the intersection of the columns in the UNIQUE keys. In your example, the intersection is empty.
most partitioning schemes (other than KEY) require integer or null values. If not NULL, ssname will not be an integer value.
foreign keys and partitioning aren't supported simultaneously. This is a strong argument not to use partitioning.
Fortunately, collision free hashing is one thing you don't need to worry about, because partitioning is going to result in collisions (otherwise, you'd only have a single row in each partition). If you could ignore the above problems as well as the limitations on functions used in partitioning expressions, you could create a HASH partition with:
CREATE TABLE foobar (
...
) ENGINE=innodb
PARTITION BY HASH (school_id + course_id + ORD(ssname))
PARTITIONS 2
;
What should work is:
CREATE TABLE foobar (
id int UNSIGNED NOT NULL AUTO_INCREMENT,
school_id int UNSIGNED NOT NULL,
course_id int UNSIGNED NOT NULL,
ssname varchar(64) NOT NULL,
/* some other fields */
PRIMARY KEY (id, school_id, course_id),
INDEX idx_fb_si (school_id),
INDEX idx_fb_ci (course_id),
CONSTRAINT UNIQUE INDEX idx_fb_scs (school_id,course_id,ssname)
) ENGINE=innodb
PARTITION BY HASH (school_id + course_id)
PARTITIONS 2
;
or:
CREATE TABLE foobar (
id int UNSIGNED NOT NULL AUTO_INCREMENT,
school_id int UNSIGNED NOT NULL,
course_id int UNSIGNED NOT NULL,
ssname varchar(64) NOT NULL,
/* some other fields */
PRIMARY KEY (id, school_id, course_id, ssname),
INDEX idx_fb_si (school_id),
INDEX idx_fb_ci (course_id),
CONSTRAINT UNIQUE INDEX idx_fb_scs (school_id,course_id,ssname)
) ENGINE=innodb
PARTITION BY KEY (school_id, course_id, ssname)
PARTITIONS 2
;
As for the files that store tables, MySQL will create them, though it may do it when you define the table rather than when rows are inserted into it. You don't need to worry about how MySQL manages files. Remember, there are a limited number of partitions, defined when you create the table by the PARTITIONS *n* clause.
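To see the fixed set of partitions once the table exists, you can query information_schema. This is a small sketch that assumes the table is called foobar and lives in the current database.

```sql
-- One row per partition; for InnoDB, TABLE_ROWS is an estimate.
SELECT PARTITION_NAME, PARTITION_METHOD, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'foobar';
```

With PARTITIONS 2 this should list two partitions, however many schools, courses or students are inserted later.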