Consider the following example of a messaging system:
create table chat_group
(
id int auto_increment primary key,
title varchar(100) not null,
date_created date not null
)
create table chat_message
(
id int auto_increment,
user_id int not null,
chat_group_id int not null,
message text charset utf8mb4 not null,
date_created datetime not null
)
Now I see that the most common query on the chat_message table is SELECT * FROM chat_message WHERE chat_group_id = ?. So my idea is to put a clustered index on the chat_group_id column so that chat messages are organized by group on disk.
But in MySQL the PRIMARY KEY (which is actually the clustered index) must be unique, so what is the solution here? What clustered index should I create in this situation?
Yes, "you can have your cake and eat it, too":
PRIMARY KEY(chat_group_id, id),
INDEX(id)
The PK provides "clustering" by the group; this is likely to speed up your main queries. Including id makes it UNIQUE, which is a requirement (in MySQL) for the PK.
The secondary INDEX(id) is the minimum needed to keep AUTO_INCREMENT happy: InnoDB requires some index whose first column is the AUTO_INCREMENT column id.
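Put into the question's chat_message table, the answer's two index definitions might look like the sketch below. The ENGINE clause and the sample query are additions for illustration; the column definitions are taken from the question.

```sql
CREATE TABLE chat_message
(
    id            INT NOT NULL AUTO_INCREMENT,
    user_id       INT NOT NULL,
    chat_group_id INT NOT NULL,
    message       TEXT CHARSET utf8mb4 NOT NULL,
    date_created  DATETIME NOT NULL,
    -- Clustered index: all messages of one group are stored next to each other;
    -- id is appended only to make the key unique.
    PRIMARY KEY (chat_group_id, id),
    -- AUTO_INCREMENT needs some index that starts with id.
    INDEX (id)
) ENGINE = InnoDB;

-- The main query reads one contiguous range of the clustered index.
SELECT * FROM chat_message WHERE chat_group_id = 1;
```

Because id is still unique on its own, lookups of a single message by id go through the small secondary index and then into the clustered index.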
I'm creating a user database. I want to separate the user's cellphone number from the 'user' table and create another table for it (user_cellphone).
But I have a problem choosing the best index!
The user_cellphone table holds user_id and the cellphone number, but almost all SELECT queries are based on 'user_id', so I want to know whether it's better to make the 'user_id' column the primary key or not.
(Also, each user has only one cellphone number!)
Which of these two options is better?
CREATE TABLE `user_cellphone_num` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`cellphone_country_code` SMALLINT UNSIGNED NOT NULL,
`cellphone_num` BIGINT UNSIGNED NOT NULL,
`user_id` INT UNSIGNED NOT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `cellphone` (`cellphone_country_code`, `cellphone_num`),
UNIQUE INDEX `user_id` (`user_id`)
)
CREATE TABLE `user_cellphone_num` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`cellphone_country_code` SMALLINT UNSIGNED NOT NULL,
`cellphone_num` BIGINT UNSIGNED NOT NULL,
`user_id` INT UNSIGNED NOT NULL,
PRIMARY KEY (`user_id`),
UNIQUE INDEX `id` (`id`),
UNIQUE INDEX `cellphone` (`cellphone_country_code`, `cellphone_num`)
)
Should I choose 'user_id' as the primary key, or just make 'user_id' a unique key? Is there any difference in performance? (I'm talking about when I have millions of rows.)
In the future I'm going to use queries like this:
select u.*,cell.* FROM user AS u LEFT JOIN user_cellphone AS cell ON cell.user_id = u.id
So which of these options gives better performance for queries like this?
May I offer some hard-won data design advice?
Do not use telephone numbers as any kind of unique or primary key.
Why not?
Sometimes multiple people use a single number.
Sometimes people make up fake numbers.
People punctuate numbers based on context. To my neighbors, my number is (978)555-4321. To a customer in the Netherlands it is +1.978.555.4321. Can you write a program to regularize those numbers? Of course. Can you write a correct program to do that? No. Why bother trying. Just take whatever people give you.
(Unless you work for a mobile phone provider, in which case ask your database administrator.)
Read this carefully. https://github.com/google/libphonenumber/blob/master/FALSEHOODS.md
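Following the "just take whatever people give you" advice, a phone table could store the number as free text with no uniqueness on it. This is a minimal sketch; the table name user_phone, the column name phone_as_entered, and the VARCHAR(50) length are assumptions.

```sql
-- Hypothetical table: the number is stored exactly as typed.
CREATE TABLE user_phone (
  user_id          INT UNSIGNED NOT NULL,
  -- Free text, no UNIQUE constraint, no attempt to normalize punctuation.
  phone_as_entered VARCHAR(50)  NOT NULL,
  PRIMARY KEY (user_id)
) ENGINE = InnoDB;

-- Two people sharing one household number are both accepted.
INSERT INTO user_phone VALUES (1, '(978)555-4321'), (2, '(978)555-4321');
-- The same number with different punctuation is accepted as-is.
INSERT INTO user_phone VALUES (3, '+1.978.555.4321');
```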
InnoDB tables are stored as a clustered index, also called an index-organized table. If the table has a PRIMARY KEY, then that is used as the key for the clustered index. Any other UNIQUE KEY is a secondary index.
Queries that look up rows by the clustered index are a little more efficient than those using a secondary index, even if that secondary index is unique: in InnoDB, a secondary index entry stores the primary key value, so a lookup through it usually needs a second search in the clustered index to fetch the row. So if you want to optimize for your most common query, which you say is by user_id, it would be a good idea to make user_id the clustered index.
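One way to see the difference is to build both of the question's options side by side and compare EXPLAIN for the lookup by user_id. In this sketch the table names option_a and option_b are hypothetical, the cellphone unique index is left out to keep it short, and the sample row is made up. The key column of the plan should name user_id for option_a and PRIMARY for option_b.

```sql
-- Option 1 from the question: surrogate id is the clustered index.
CREATE TABLE option_a (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `cellphone_country_code` SMALLINT UNSIGNED NOT NULL,
  `cellphone_num` BIGINT UNSIGNED NOT NULL,
  `user_id` INT UNSIGNED NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE INDEX `user_id` (`user_id`)
) ENGINE = InnoDB;

-- Option 2 from the question: user_id is the clustered index.
CREATE TABLE option_b (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `cellphone_country_code` SMALLINT UNSIGNED NOT NULL,
  `cellphone_num` BIGINT UNSIGNED NOT NULL,
  `user_id` INT UNSIGNED NOT NULL,
  PRIMARY KEY (`user_id`),
  UNIQUE INDEX `id` (`id`)
) ENGINE = InnoDB;

INSERT INTO option_a (cellphone_country_code, cellphone_num, user_id) VALUES (1, 9785554321, 42);
INSERT INTO option_b (cellphone_country_code, cellphone_num, user_id) VALUES (1, 9785554321, 42);

-- Secondary index on user_id, then a second search in PRIMARY by id.
EXPLAIN SELECT * FROM option_a WHERE user_id = 42;
-- A single search in the clustered index.
EXPLAIN SELECT * FROM option_b WHERE user_id = 42;
```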
In your case, it would be kind of strange to separate the cellphones into a separate table, but then make user_id alone be the PRIMARY KEY. That means that only one row per user_id can exist in this table. I would have expected that you separated cellphones into a separate table to allow each user to have multiple phone numbers.
You can get the same benefit of the clustered index if you just make sure user_id is the first column in a compound key:
CREATE TABLE `user_cellphone_num` (
`user_id` INT UNSIGNED NOT NULL,
`num` TINYINT UNSIGNED NOT NULL,
`cellphone_country_code` SMALLINT UNSIGNED NOT NULL,
`cellphone_num` BIGINT UNSIGNED NOT NULL,
PRIMARY KEY (`user_id`, `num`)
)
So a query like SELECT ... FROM user_cellphone_num WHERE user_id = ? will match one or more rows, but it will be an efficient lookup because it's searching the first column of the clustered index.
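The question's LEFT JOIN works unchanged against this layout, now returning one row per phone number. To keep the sketch runnable on its own it declares a minimal stand-in for the question's user table (the name column is hypothetical) and repeats the answer's table with IF NOT EXISTS; the sample rows are made up.

```sql
-- Minimal stand-in for the question's user table.
CREATE TABLE IF NOT EXISTS `user` (
  `id`   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(100) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE = InnoDB;

-- The answer's table, repeated so this snippet runs by itself.
CREATE TABLE IF NOT EXISTS `user_cellphone_num` (
  `user_id` INT UNSIGNED NOT NULL,
  `num` TINYINT UNSIGNED NOT NULL,
  `cellphone_country_code` SMALLINT UNSIGNED NOT NULL,
  `cellphone_num` BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (`user_id`, `num`)
) ENGINE = InnoDB;

INSERT INTO `user` (`id`, `name`) VALUES (1, 'alice'), (2, 'bob');
-- alice has two numbers, bob has none.
INSERT INTO `user_cellphone_num` VALUES (1, 1, 1, 9785554321), (1, 2, 31, 201234567);

-- Each user row probes the leftmost column of the clustered index.
SELECT u.*, cell.*
FROM `user` AS u
LEFT JOIN `user_cellphone_num` AS cell ON cell.user_id = u.id;
```

With this data the join returns two rows for alice and one row for bob with NULL phone columns.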
Reference: https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html
I need to create a table called beneficiaries with three columns:
customerid
accountno
bank
The condition is that a customerid can have a given accountno only once. But another customerid can have the same accountno (again, only once). So I can't make accountno the primary key. I can't make customerid the primary key either, since one customerid can have multiple records with different accountnos.
How can we create the table in this case? Any ideas?
You can use multiple-column unique index.
CREATE TABLE YOUR_TABLE (
id INT NOT NULL AUTO_INCREMENT,
customerid INT NOT NULL,
accountno INT NOT NULL,
bank INT NOT NULL,
PRIMARY KEY (id),
UNIQUE INDEX name (customerid,accountno)
);
Documentation here.
https://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html
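A quick way to check that the composite UNIQUE index enforces exactly the question's rule is to insert a few rows. The table name beneficiaries, the index name uq_customer_account, and the sample values are assumptions; the last statement is commented out because it is expected to fail with a duplicate-key error.

```sql
CREATE TABLE beneficiaries (
  id         INT NOT NULL AUTO_INCREMENT,
  customerid INT NOT NULL,
  accountno  INT NOT NULL,
  bank       INT NOT NULL,
  PRIMARY KEY (id),
  UNIQUE INDEX uq_customer_account (customerid, accountno)
) ENGINE = InnoDB;

-- Allowed: one customer with two different accounts.
INSERT INTO beneficiaries (customerid, accountno, bank) VALUES (1, 1001, 10), (1, 1002, 10);
-- Allowed: another customer with an account number customer 1 already uses.
INSERT INTO beneficiaries (customerid, accountno, bank) VALUES (2, 1001, 10);
-- Rejected with a duplicate-key error: the same (customerid, accountno) pair again.
-- INSERT INTO beneficiaries (customerid, accountno, bank) VALUES (1, 1001, 20);
```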
If one customerid can have only one unique account number, then how can you expect duplicate customer ids in that table?
You can simply put the primary key on another column and make customerid unique. I think this is what you want: every customerid is unique, but many customerids can have the same accountno.
CREATE TABLE benificiaries(
id INT PRIMARY KEY,
customerid INT NOT NULL UNIQUE,
accountno INT NOT NULL,
bank INT NOT NULL
);
The database can't manage all the business constraints within the data model. In this case, you might handle the elementary constraints with indexes (a multiple-column index on customerid, accountno, and a single-column index on accountno to search the other way), add an auto-increment id, and enforce the remaining business constraints in your code.
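The two indexes this answer describes could be declared as below. This is a sketch: the table and index names are hypothetical, and making the multiple-column index UNIQUE is one reading of "elementary constraints".

```sql
CREATE TABLE beneficiaries_v2 (
  id         INT NOT NULL AUTO_INCREMENT,
  customerid INT NOT NULL,
  accountno  INT NOT NULL,
  bank       INT NOT NULL,
  PRIMARY KEY (id),
  -- customer -> accounts, and at most one row per (customer, account) pair.
  UNIQUE INDEX uq_customer_account (customerid, accountno),
  -- account -> customers: searching "the other way".
  INDEX idx_accountno (accountno)
) ENGINE = InnoDB;

-- Served by the leftmost column of uq_customer_account.
SELECT accountno FROM beneficiaries_v2 WHERE customerid = 1;
-- Served by idx_accountno.
SELECT customerid FROM beneficiaries_v2 WHERE accountno = 1001;
```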
Just set customer_id as the primary key; whether two customers may share the same account number will then have to be enforced by the process of your app or system.
CREATE TABLE `tmpr_map`.`tbl_example`
(`customer_id` INT(11) NOT NULL AUTO_INCREMENT,
`account_number` VARCHAR(50) NOT NULL, -- VARCHAR requires a length; 50 is an assumed value
`bank_amount` DECIMAL(11,2) NOT NULL,
PRIMARY KEY (`customer_id`)) ENGINE = InnoDB;
Currently we have tables that use a type 4 (random) UUID as the primary key.
Our application layer does batch inserts into the DB.
But because the primary key is random, each batch still touches pages all over the index, which results in many random disk seeks.
There are four tables.
CREATE TABLE process (
process_id binary(16) not null, -- uuid
created_time bigint not null,
owner varchar(150) not null,
primary key (process_id),
index idx_c_t (created_time),
index idx_o (owner)
)
Records inserted = 2000/min
CREATE TABLE process_job (
job_id binary(16) not null, -- uuid
process_id binary(16) not null, -- uuid
info varchar(200),
text varchar(500),
primary key(job_id),
index idx_p_id (process_id)
)
Records inserted = 10000/min
CREATE TABLE ob_status (
job_id binary(16) not null, -- uuid
status ENUM('STARTED', 'SUCCESS', 'ERROR') not null,
job_code varchar(100) not null,
info varchar(200),
text varchar(500),
primary key(job_id, status, job_code)
)
Records inserted = 20000/min
CREATE TABLE process_job_custom (
job_id binary(16) not null, -- uuid
`key` varchar(100) not null, -- KEY is a reserved word, so it must be quoted
value varchar(500),
primary key(job_id, `key`)
)
Records inserted = 10000/min
All our tables use the DYNAMIC row format.
Further, we periodically delete data older than 15 days.
We run these deletes in batches of around 1000 records.
But whenever the deletes run, the performance of the whole DB is bad and disk usage is very high. (We suspect this is due to the randomness of the primary key.)
So we are planning to change our primary key to (time-based key, uuid) and add an index on the uuid column.
The records may arrive in random order (not exactly in time-based key order),
but the records for a given time-based key mostly arrive within a 5-minute spread.
Also, our deletes are based on the time-based key.
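For the process table, the planned change (time-based key first, UUID second, separate index on the UUID) might look like the sketch below. Assumptions: created_time is the time-based key and holds epoch milliseconds, the table name process_v2 is hypothetical, and the purge cutoff is a placeholder. The old idx_c_t index is left out because a range on created_time can be served by the primary key once created_time leads it.

```sql
CREATE TABLE process_v2 (
  process_id   BINARY(16)   NOT NULL,  -- random (type 4) UUID
  created_time BIGINT       NOT NULL,  -- time-based key, assumed epoch milliseconds
  owner        VARCHAR(150) NOT NULL,
  -- Rows arriving within a few minutes of each other land in nearby pages
  -- at the "recent" end of the clustered index.
  PRIMARY KEY (created_time, process_id),
  -- Point lookups by UUID now go through this secondary index.
  INDEX idx_p_id (process_id),
  INDEX idx_o (owner)
) ENGINE = InnoDB ROW_FORMAT = DYNAMIC;

-- A batched purge then walks the "old" end of the clustered index.
-- 1704067200000 is a placeholder for "now minus 15 days".
DELETE FROM process_v2
WHERE created_time < 1704067200000
ORDER BY created_time
LIMIT 1000;
```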
Will this affect the performance of inserts?
Also, our primary use cases involve time-based queries.
So will select performance also improve?
Further, we are planning to partition by the time-based key.
Will this help us get better performance overall?
We suspect the major issue is the randomness of the primary key.
Will adding the time-based key (something like the created time of the process) as the first part of the primary key in all the tables, and indexing the uuid columns, help us?
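Partitioning by the time-based key pairs naturally with the 15-day purge, because dropping a whole partition replaces row-by-row batched DELETEs. The sketch below assumes created_time is epoch milliseconds in UTC; the table name, the daily granularity, the partition names and the boundary values are all illustrative. MySQL requires the partitioning column to be part of every unique key, which holds here because created_time leads the PRIMARY KEY. Whether this improves overall performance depends on the workload.

```sql
CREATE TABLE process_partitioned (
  process_id   BINARY(16)   NOT NULL,
  created_time BIGINT       NOT NULL,
  owner        VARCHAR(150) NOT NULL,
  PRIMARY KEY (created_time, process_id),
  INDEX idx_p_id (process_id),
  INDEX idx_o (owner)
) ENGINE = InnoDB ROW_FORMAT = DYNAMIC
PARTITION BY RANGE (created_time) (
  PARTITION p20240101 VALUES LESS THAN (1704153600000), -- before 2024-01-02 00:00 UTC
  PARTITION p20240102 VALUES LESS THAN (1704240000000), -- before 2024-01-03 00:00 UTC
  PARTITION p20240103 VALUES LESS THAN (1704326400000), -- before 2024-01-04 00:00 UTC
  PARTITION pmax      VALUES LESS THAN MAXVALUE
);

-- Retire the oldest day in one metadata operation.
ALTER TABLE process_partitioned DROP PARTITION p20240101;

-- Before the next day starts, split pmax to add a new daily partition.
ALTER TABLE process_partitioned REORGANIZE PARTITION pmax INTO (
  PARTITION p20240104 VALUES LESS THAN (1704412800000), -- before 2024-01-05 00:00 UTC
  PARTITION pmax      VALUES LESS THAN MAXVALUE
);
```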
This is the query from a tutorial I read:
CREATE TABLE Employee (
id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
departmentId TINYINT UNSIGNED NOT NULL
COMMENT "CONSTRAINT FOREIGN KEY (departmentId) REFERENCES Department(id)",
firstName VARCHAR(20) NOT NULL,
lastName VARCHAR(40) NOT NULL,
email VARCHAR(60) NOT NULL,
ext SMALLINT UNSIGNED NULL,
hireDate TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
leaveDate DATETIME NULL,
INDEX name (lastName, firstName),
INDEX (departmentId)
)
What is the purpose of INDEX name (lastName, firstName)?
Please let me know if my question is not clear.
Thank you,
GusDe
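INDEX name (lastName, firstName) creates an additional index (called name) for fast lookups when you query by lastName alone or by lastName together with firstName. Because lastName is its leftmost column, the index does not help a lookup by firstName alone.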
It is a composite index because it includes two columns.
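The leftmost-prefix behaviour can be checked with EXPLAIN. This sketch uses a hypothetical cut-down copy of the tutorial's table, employee_demo; email is kept outside the index so that SELECT * cannot be answered from the index alone. The sample names are made up.

```sql
CREATE TABLE employee_demo (
  id        MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  firstName VARCHAR(20) NOT NULL,
  lastName  VARCHAR(40) NOT NULL,
  email     VARCHAR(60) NOT NULL,  -- not in the index, so SELECT * is not covered by it
  INDEX name (lastName, firstName)
);

-- Can seek in INDEX name: lastName is its leftmost column.
EXPLAIN SELECT * FROM employee_demo WHERE lastName = 'Smith';
-- Can seek on both columns of INDEX name.
EXPLAIN SELECT * FROM employee_demo WHERE lastName = 'Smith' AND firstName = 'Ann';
-- Cannot seek in INDEX name: firstName alone is not a leftmost prefix.
EXPLAIN SELECT * FROM employee_demo WHERE firstName = 'Ann';
-- The index can also return names already sorted by last name, then first name.
EXPLAIN SELECT lastName, firstName FROM employee_demo ORDER BY lastName, firstName;
```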
Added: The author of the tutorial is "guessing" that employees will often be looked up by their name or by their departmentId. That's why he or she created the two additional indexes.
-- The primary key index is automatically created for you in most DBMSs.
In real life, it is not wise to rely solely on "guessing" which columns should be indexed. Instead, use the slow query log (in MySQL) to determine which queries are executing slowly and how to speed them up. Usually the answer is to add another index or two.
P.S. The downside of indexes is that they increase the time required to insert, update or delete data, since both the table and the index have to be modified. A second downside is that they take up space in the database. But storage is cheap these days.
Since most databases have far more reads than writes, the speedup in querying provided by the index usually far outweighs the costs.
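The slow query log recommended above can be switched on at runtime. This is a sketch: the 1-second threshold is an assumption to tune, setting these variables needs the SUPER or SYSTEM_VARIABLES_ADMIN privilege, and a new GLOBAL long_query_time only applies to sessions opened afterwards.

```sql
-- Start writing slow statements to the slow query log.
SET GLOBAL slow_query_log = 'ON';
-- Log statements that run longer than 1 second (assumed threshold).
SET GLOBAL long_query_time = 1;
-- Optionally also log statements that use no index at all.
SET GLOBAL log_queries_not_using_indexes = 'ON';
-- Show where the log file is written.
SHOW VARIABLES LIKE 'slow_query_log_file';
```

The mysqldumpslow utility that ships with MySQL can then summarize the log so the most frequent slow statements stand out.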
I know how to use INDEX as in the following code. And I know how to use foreign key and primary key.
CREATE TABLE tasks (
task_id int unsigned NOT NULL AUTO_INCREMENT,
parent_id int unsigned NOT NULL DEFAULT 0,
task varchar(100) NOT NULL,
date_added timestamp NOT NULL,
date_completed timestamp NULL,
PRIMARY KEY ( task_id ),
INDEX parent ( parent_id )
)
However, I found code using KEY instead of INDEX, as follows.
CREATE TABLE orders (
order_id int unsigned NOT NULL AUTO_INCREMENT,
-- etc
KEY order_date ( order_date )
)
I could not find any explanation on the official MySQL page. Could anyone tell me what the difference between KEY and INDEX is?
The only difference I see is that when I use KEY ..., I need to repeat the word, e.g. KEY order_date ( order_date ).
There's no difference. They are synonyms, though INDEX is arguably clearer, since KEY is easily confused with PRIMARY KEY. Note that neither form is portable ISO SQL: the standard does not define indexes at all, so both inline INDEX and KEY are MySQL extensions.
From the CREATE TABLE manual entry:
KEY is normally a synonym for INDEX. The key attribute PRIMARY KEY can also be specified as just KEY when given in a column definition. This was implemented for compatibility with other database systems.
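On the question's point about "repeating the word": in KEY order_date ( order_date ), the first order_date is the index name, which is optional; only the column list in parentheses is required. The table below is a hypothetical orders table with assumed columns, showing KEY and INDEX producing the same kind of index.

```sql
CREATE TABLE orders_demo (
  order_id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
  order_date  DATE NOT NULL,
  customer_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (order_id),
  -- "order_date" before the parentheses is just the (optional) index name.
  KEY order_date (order_date),
  -- The same kind of index, written with INDEX and a different name.
  INDEX idx_customer (customer_id)
);

-- Both appear as ordinary non-unique indexes (Non_unique = 1).
SHOW INDEX FROM orders_demo;
```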
By "The key attribute PRIMARY KEY can also be specified as just KEY when given in a column definition.", it means that these three CREATE TABLE statements below are equivalent and generate identical TABLE objects in the database:
CREATE TABLE orders1 (
order_id int PRIMARY KEY
);
CREATE TABLE orders2 (
order_id int KEY
);
CREATE TABLE orders3 (
order_id int NOT NULL,
PRIMARY KEY ( order_id )
);
...while these two statements (for orders4 and orders5) are equivalent to each other, but not to the three statements above, because here KEY is a synonym for INDEX, not for PRIMARY KEY:
CREATE TABLE orders4 (
order_id int NOT NULL,
KEY ( order_id )
);
CREATE TABLE orders5 (
order_id int NOT NULL,
INDEX ( order_id )
);
...as the KEY ( order_id ) and INDEX ( order_id ) members do not define a PRIMARY KEY, they only define a generic INDEX object, which is nothing like a KEY at all (as it does not uniquely identify a row).
As can be seen by running SHOW CREATE TABLE orders1...5:
Table   | SHOW CREATE TABLE ...
orders1 | CREATE TABLE orders1 ( order_id int NOT NULL, PRIMARY KEY ( order_id ))
orders2 | CREATE TABLE orders2 ( order_id int NOT NULL, PRIMARY KEY ( order_id ))
orders3 | CREATE TABLE orders3 ( order_id int NOT NULL, PRIMARY KEY ( order_id ))
orders4 | CREATE TABLE orders4 ( order_id int NOT NULL, KEY ( order_id ))
orders5 | CREATE TABLE orders5 ( order_id int NOT NULL, KEY ( order_id ))
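The same distinction is visible in information_schema. To run on its own, this sketch re-creates orders2 and orders4 from above with IF NOT EXISTS. It should report INDEX_NAME = PRIMARY with NON_UNIQUE = 0 for orders2, and an ordinary index named after its column with NON_UNIQUE = 1 for orders4.

```sql
-- Same definitions as orders2 and orders4 above.
CREATE TABLE IF NOT EXISTS orders2 (order_id int KEY);
CREATE TABLE IF NOT EXISTS orders4 (order_id int NOT NULL, KEY (order_id));

SELECT TABLE_NAME, INDEX_NAME, NON_UNIQUE, COLUMN_NAME
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('orders2', 'orders4')
ORDER BY TABLE_NAME;
```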
Here is a nice description about the "difference":
"MySQL requires every Key also be indexed, that's an implementation detail specific to MySQL to improve performance."
Keys are special fields that play very specific roles within a table, and the type of key determines its purpose within the table.
An index is a structure that an RDBMS (relational database management system) provides to improve data processing. An index has nothing to do with the logical database structure.
So...
Keys are logical structures you use to identify records within a table and indexes are physical structures you use to optimize data processing.
Source: Database Design for Mere Mortals
Author: Michael Hernandez
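The quoted point that MySQL backs every key with an index can be seen with two small hypothetical tables: the UNIQUE constraint and the PRIMARY KEY each show up as an index, and InnoDB creates an index on the foreign key column automatically because foreign key columns must be indexed. Table and constraint names are assumptions.

```sql
CREATE TABLE department_demo (
  id   INT NOT NULL,
  code CHAR(4) NOT NULL,
  PRIMARY KEY (id),                 -- logical key, stored as the clustered index
  CONSTRAINT uq_code UNIQUE (code)  -- logical candidate key, backed by a unique index
) ENGINE = InnoDB;

CREATE TABLE employee_fk_demo (
  id            INT NOT NULL,
  department_id INT NOT NULL,
  PRIMARY KEY (id),
  -- No index on department_id is declared here; InnoDB adds one for the foreign key.
  CONSTRAINT fk_dept FOREIGN KEY (department_id) REFERENCES department_demo (id)
) ENGINE = InnoDB;

SHOW INDEX FROM department_demo;   -- PRIMARY plus the unique index for uq_code
SHOW INDEX FROM employee_fk_demo;  -- PRIMARY plus an index on department_id
```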
It is mentioned as a synonym for INDEX in the 'create table' docs:
MySQL 5.5 Reference Manual :: 13 SQL Statement Syntax :: 13.1 Data Definition Statements :: 13.1.17 CREATE TABLE Syntax
@Nos already cited the section and linked the help for 5.1.
Just as PRIMARY KEY creates a primary key and an index for you, KEY creates an index only.
A key is a set of columns or expressions on which we build an index.
While an index is a structure that is stored in database, keys are strictly a logical concept.
An index helps us access records quickly, whereas keys just identify records uniquely.
Every table will necessarily have a key, but having an index is not mandatory.
See https://docs.oracle.com/cd/E11882_01/server.112/e40540/indexiot.htm#CNCPT721