MySQL InnoDB primary key

Currently we have tables that use a version 4 (random) UUID as the primary key.
Our application layer does batch inserts into the DB.
But since the primary key is random, the inserts land all over the index, which results in a lot of random disk I/O.
There are four tables.
CREATE TABLE process (
process_id binary(16) not null, -- uuid
created_time bigint not null,
owner varchar(150) not null,
primary key (process_id),
index idx_c_t (created_time),
index idx_o (owner)
)
Records inserted = 2000/min
CREATE TABLE process_job (
job_id binary(16) not null, -- uuid
process_id binary(16) not null, -- uuid
info varchar(200),
text varchar(500),
primary key (job_id),
index idx_p_id (process_id)
)
Records inserted = 10000/min
CREATE TABLE ob_status (
job_id binary(16) not null, -- uuid
status enum('STARTED', 'SUCCESS', 'ERROR') not null,
job_code varchar(100) not null,
info varchar(200),
text varchar(500),
primary key (job_id, status, job_code)
)
Records inserted = 20000/min
CREATE TABLE process_job_custom (
job_id binary(16) not null, -- uuid
`key` varchar(100) not null, -- quoted: KEY is a reserved word
value varchar(500),
primary key (job_id, `key`)
)
Records inserted = 10000/min
All our tables use the DYNAMIC row format.
Further, we periodically delete data older than 15 days.
We run these deletes in batches of around 1000 records.
But whenever the deletes run, overall DB performance suffers and disk usage spikes (we suspect this is due to the randomness of the primary key).
So we are planning to change our primary key to (time-based key, uuid) and add an index on the uuid column.
The records may arrive in random order (not exactly in time-based key order),
but the records for a given time-based key mostly arrive within a 5-minute spread.
Also, our deletes are based on the time-based key.
Will this affect insert performance?
Our primary use cases also involve time-based queries,
so will select performance improve as well?
Further, we are planning to partition by the time-based key.
Will this help us get better performance overall?
We suspect the major issue is the randomness of the primary key.
Will adding a time-based key (something like the created time of the process) as the first part of the primary key in all the tables, and indexing the uuid columns, help us?
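For illustration, a minimal sketch of this proposal applied to the process table, assuming created_time (an epoch-millisecond bigint) is the time-based key; the partition names and boundaries below are placeholders, not part of the original plan. Note that idx_c_t becomes redundant once created_time leads the primary key.
CREATE TABLE process (
process_id binary(16) not null, -- uuid
created_time bigint not null, -- time-based key, now first in the PK
owner varchar(150) not null,
primary key (created_time, process_id), -- inserts for a 5-minute window cluster together
index idx_p_id (process_id), -- secondary index for lookups by uuid
index idx_o (owner)
)
PARTITION BY RANGE (created_time) (
PARTITION p2024_01 VALUES LESS THAN (1706745600000), -- placeholder boundary
PARTITION p2024_02 VALUES LESS THAN (1709251200000), -- placeholder boundary
PARTITION pmax VALUES LESS THAN MAXVALUE
);
-- The 15-day retention job can then target the leading PK column, or become
-- a cheap metadata operation once partitions exist:
ALTER TABLE process DROP PARTITION p2024_01;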

Related

SQL: create one big table or separate tables for each client's stock portfolio? (training project)

Which is better for multiple clients?
I am building a training project and can't decide which is better: one big stock portfolio table for all of the broker's clients, or an individual table for each client? The individual tables would need the brokerage agreement id appended to each table's name for identification.
DROP TABLE IF EXISTS common_portfolio;
CREATE TABLE common_portfolio (
common_portfolio_id serial,
brokerage_agreement_id BIGINT UNSIGNED NOT NULL,
type_assets_id BIGINT UNSIGNED NOT NULL,
stock_id BIGINT UNSIGNED NOT NULL,
stock_num BIGINT UNSIGNED NOT NULL,
FOREIGN KEY (brokerage_agreement_id) REFERENCES brokerage_agreement (brokerage_agreement_id),
FOREIGN KEY (type_assets_id) REFERENCES type_assets (type_assets_id),
FOREIGN KEY (stock_id) REFERENCES stock (stock_id)
);
VS
DROP TABLE IF EXISTS portfolio_12345612348; -- number generate from brokerage_agreement_id
CREATE TABLE portfolio_12345612348 (
position_id serial,
type_assets_id BIGINT UNSIGNED NOT NULL,
stock_id BIGINT UNSIGNED NOT NULL,
stock_num BIGINT UNSIGNED NOT NULL,
FOREIGN KEY (type_assets_id) REFERENCES type_assets (type_assets_id),
FOREIGN KEY (stock_id) REFERENCES stock (stock_id)
);
It is always better to keep them all in the same table.
Keeping each client's data in a separate table gives you the best performance only when you are looking up that particular customer.
In all other cases it will be hell: creating or deleting a client will require you to build a dynamic CREATE/DROP TABLE statement.
When, sometime later, you decide to add a column, you will need to somehow find ALL of those tables and add the new column to each one of them.
Even counting the number of clients will require far more code than a simple SELECT COUNT statement.
And there are many more such cases.
So, use only one table.
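For instance, the per-client lookup stays a single indexed query against the shared table (InnoDB creates an index for the brokerage_agreement_id foreign key if one does not already exist):
-- portfolio of one client, from the single shared table
SELECT type_assets_id, stock_id, stock_num
FROM common_portfolio
WHERE brokerage_agreement_id = 12345612348;
-- and counting clients stays a one-liner
SELECT COUNT(DISTINCT brokerage_agreement_id) FROM common_portfolio;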

MySQL: choose between unique key and primary key for user id

I'm creating a user database. I want to move the user's cellphone number out of the user table and into its own table (user_cellphone).
But I have a problem choosing the best index!
In the user_cellphone table we store the user_id and cellphone number, but all SELECT queries will look up rows by user_id, so I want to know whether it's better to make the user_id column the primary key or not.
(Also, each user has only one cellphone number!)
Which of these 2 options is better?
CREATE TABLE `user_cellphone_num` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`cellphone_country_code` SMALLINT UNSIGNED NOT NULL,
`cellphone_num` BIGINT UNSIGNED NOT NULL,
`user_id` INT UNSIGNED NOT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `cellphone` (`cellphone_country_code`, `cellphone_num`),
UNIQUE INDEX `user_id` (`user_id`)
)
CREATE TABLE `user_cellphone_num` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`cellphone_country_code` SMALLINT UNSIGNED NOT NULL,
`cellphone_num` BIGINT UNSIGNED NOT NULL,
`user_id` INT UNSIGNED NOT NULL,
PRIMARY KEY (`user_id`),
UNIQUE INDEX `id` (`id`),
UNIQUE INDEX `cellphone` (`cellphone_country_code`, `cellphone_num`)
)
Should I choose user_id as the primary key, or just set user_id as a unique key? Is there any difference in performance? (I'm talking about when I have millions of rows.)
In the future I'm going to use queries like this:
select u.*,cell.* FROM user AS u LEFT JOIN user_cellphone AS cell ON cell.user_id = u.id
So which of these options gives me better performance for queries like this?
May I offer some hard-won data design advice?
Do not use telephone numbers as any kind of unique or primary key.
Why not?
Sometimes multiple people use a single number.
Sometimes people make up fake numbers.
People punctuate numbers based on context. To my neighbors, my number is (978)555-4321. To a customer in the Netherlands it is +1.978.555.4321. Can you write a program to regularize those numbers? Of course. Can you write a correct program to do that? No. Why bother trying? Just take whatever people give you.
(Unless you work for a mobile phone provider, in which case ask your database administrator.)
Read this carefully. https://github.com/google/libphonenumber/blob/master/FALSEHOODS.md
InnoDB tables are stored as a clustered index, also called an index-organized table. If the table has a PRIMARY KEY, then that is used as the key for the clustered index. The other UNIQUE KEY is a secondary index.
Queries where you look up rows by the clustered index are a little bit more efficient than using a secondary index, even if that secondary index is a unique index. So if you want to optimize for the most common query which you say is by user_id, then it would be a good idea to make that your clustered index.
In your case, it would be kind of strange to separate the cellphones into a separate table, but then make user_id alone be the PRIMARY KEY. That means that only one row per user_id can exist in this table. I would have expected that you separated cellphones into a separate table to allow each user to have multiple phone numbers.
You can get the same benefit of the clustered index if you just make sure user_id is the first column in a compound key:
CREATE TABLE `user_cellphone_num` (
`user_id` INT UNSIGNED NOT NULL,
`num` TINYINT UNSIGNED NOT NULL, -- ordinal of this phone number for the user (1, 2, ...)
`cellphone_country_code` SMALLINT UNSIGNED NOT NULL,
`cellphone_num` BIGINT UNSIGNED NOT NULL,
PRIMARY KEY (`user_id`, `num`) -- user_id leads, so lookups by user_id use the clustered index
)
So a query like SELECT ... FROM user_cellphone_num WHERE user_id = ? will match one or more rows, but it will be an efficient lookup because it's searching the first column of the clustered index.
Reference: https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html
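To tie this back to the question's join, a sketch against the compound-key table (assuming a user table whose primary key is id):
-- The join predicate hits user_id, the leading column of the clustered
-- index, so each user's phone rows are found without a secondary lookup.
SELECT u.*, cell.*
FROM user AS u
LEFT JOIN user_cellphone_num AS cell ON cell.user_id = u.id;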

How do I partition a MySQL table that contains several unique keys?

I have an extremely large MySQL table that I would like to partition. A simplified CREATE for this table is given below -
CREATE TABLE `myTable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`columnA` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`columnB` varchar(50) NOT NULL ,
`columnC` int(11) DEFAULT NULL,
`columnD` varchar(255) DEFAULT NULL,
`columnE` int(11) DEFAULT NULL,
`columnF` varchar(255) DEFAULT NULL,
`columnG` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UNIQ_B` (`columnB`),
UNIQUE KEY `UNIQ_B_C` (`columnB`,`columnC`),
UNIQUE KEY `UNIQ_C_D` (`columnC`,`columnD`),
UNIQUE KEY `UNIQ_E_F_G` (`columnE`,`columnF`,`columnG`)
)
I want to partition my table either by columnA or id, but the problem is that the MySQL Manual states -
In other words, every unique key on the table must use every column in the table's partitioning expression.
Which means that I cannot partition the table on either of those columns without changing my schema. For example, I have considered adding id to all my unique keys like so -
CREATE TABLE `myTable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`columnA` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`columnB` varchar(50) NOT NULL ,
`columnC` int(11) DEFAULT NULL,
`columnD` varchar(255) DEFAULT NULL,
`columnE` int(11) DEFAULT NULL,
`columnF` varchar(255) DEFAULT NULL,
`columnG` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UNIQ_B` (`columnB`,`id`),
UNIQUE KEY `UNIQ_B_C` (`columnB`,`columnC`,`id`),
UNIQUE KEY `UNIQ_C_D` (`columnC`,`columnD`,`id`),
UNIQUE KEY `UNIQ_E_F_G` (`columnE`,`columnF`,`columnG`,`id`)
)
Which I do not mind doing, except for the fact that it allows the creation of rows that should not be created. For example, under my original schema the following row insertion would not work a second time -
INSERT INTO myTable (columnC, columnD) VALUES (1.0, 2.0)
But it works with the second schema, as columnC and columnD by themselves no longer form a unique key. I have considered getting around this by using triggers to prevent the creation of such rows, but then the trigger cost would reduce (or outweigh) the partitioning performance gain.
Edited:
Some additional information about this table:
The table has more than 1.2 billion records.
We are using MySQL 5.6.34 with the InnoDB engine, running on AWS RDS.
A few other indexes also exist on this table.
Because of the huge data volume and the multiple indexes, it is expensive to insert and retrieve data.
There are no unique indexes on timestamp or float data types; this was just a sample table schema for illustration. Our actual table has a similar schema to the one above.
Other than partitioning, what options do we have to improve the performance of the table without losing any data and while maintaining the integrity constraints?
How do I partition a MySQL table that contains several unique keys?
Sorry to say, you don't.
Also, you shouldn't. Remember that UPDATE and INSERT operations on a table with unique keys must query the table to ensure the keys stay unique. If it were possible to partition a table so that its unique keys weren't built into the partition expression, then every insert or update would have to query every partition. This would likely make the partitioning worse than useless.
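To see the restriction in action, a sketch against the original schema (the partition boundaries are placeholders, and the exact error text varies by version):
-- Fails, because UNIQ_B (columnB) does not include the partitioning column id.
-- MySQL 5.6 reports roughly:
-- ERROR 1503 (HY000): A UNIQUE INDEX must include all columns in the
-- table's partitioning function
ALTER TABLE myTable
PARTITION BY RANGE (id) (
PARTITION p0 VALUES LESS THAN (600000000),
PARTITION p1 VALUES LESS THAN MAXVALUE
);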

MySQL performance using AUTO_INCREMENT on a PRIMARY KEY

I ran a comparison INSERTing rows into an empty table using MySQL 5.6.
Each table contained a column (ascending) that was incremented serially by AUTO_INCREMENT, and a pair of columns (random_1, random_2) that received random, unique numbers.
In the first test, ascending was PRIMARY KEY and (random_1, random_2) were KEY. In the second test, (random_1, random_2) were PRIMARY KEY and ascending was KEY.
CREATE TABLE clh_test_pk_auto_increment (
ascending_pk BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, -- PK
random_ak_1 BIGINT UNSIGNED NOT NULL, -- AK1
random_ak_2 BIGINT UNSIGNED, -- AK2
payload VARCHAR(40),
PRIMARY KEY ( ascending_pk ),
KEY ( random_ak_1, random_ak_2 )
) ENGINE=MYISAM
AUTO_INCREMENT=1
;
CREATE TABLE clh_test_auto_increment (
ascending_ak BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, -- AK
random_pk_1 BIGINT UNSIGNED NOT NULL, -- PK1
random_pk_2 BIGINT UNSIGNED, -- PK2
payload VARCHAR(40),
PRIMARY KEY ( random_pk_1, random_pk_2 ),
KEY ( ascending_ak )
) ENGINE=MYISAM
AUTO_INCREMENT=1
;
Consistently, the second test (where the auto-increment column is not the PRIMARY KEY) runs slightly faster -- 5-6%. Can anyone speculate as to why?
Primary keys often define the sequence in which the data is actually stored. If the primary key is incrementing, new data is simply appended. If the primary key is random, existing data must be shuffled around to slot the new row into the proper sequence. A basic (non-primary-key) index is typically much lighter in content and can be updated with less overhead.
I know this to be true for other DBMSs; I would venture to guess that MySQL works similarly in this respect.
UPDATE
As stated by @BillKarwin in the comments below, this theory does not hold for MyISAM tables. As a follow-up theory, I'd refer to @KevinPostlewaite's answer below (which he has since deleted): the issue is the lack of AUTO_INCREMENT on the PRIMARY KEY, which must be unique. With AUTO_INCREMENT it is easier to determine that the values are unique, since they are guaranteed to be increasing. With random values, it may take some time to actually walk the index to make this determination.
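As a side note, the clustered-index theory could be tested directly by re-running the same comparison on InnoDB (a sketch, not part of the original experiment), since MyISAM stores rows in a heap while InnoDB clusters them by primary key:
-- same columns as the first test table, but clustered by the primary key
CREATE TABLE clh_test_innodb_pk_auto_increment (
ascending_pk BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
random_ak_1 BIGINT UNSIGNED NOT NULL,
random_ak_2 BIGINT UNSIGNED,
payload VARCHAR(40),
PRIMARY KEY ( ascending_pk ),
KEY ( random_ak_1, random_ak_2 )
) ENGINE=INNODB;
-- a second copy with PRIMARY KEY ( random_pk_1, random_pk_2 ) completes the
-- pair; expect the random primary key to cost noticeably more here.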

MySQL: INDEX name (lastName, firstName)

This is the query from a tutorial I read:
CREATE TABLE Employee (
id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
departmentId TINYINT UNSIGNED NOT NULL
COMMENT "CONSTRAINT FOREIGN KEY (departmentId) REFERENCES Department(id)",
firstName VARCHAR(20) NOT NULL,
lastName VARCHAR(40) NOT NULL,
email VARCHAR(60) NOT NULL,
ext SMALLINT UNSIGNED NULL,
hireDate TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
leaveDate DATETIME NULL,
INDEX name (lastName, firstName),
INDEX (departmentId)
)
What is the function of INDEX name (lastName, firstName)?
Please let me know if my question is not clear.
Thank you,
GusDe
INDEX name (lastName, firstName) creates an additional index for fast lookups when you query by last name, with or without the first name.
It is a composite index because it includes two columns.
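For example (query shapes invented for illustration), the leftmost-prefix rule determines which lookups can use it:
-- can use index `name`: both filter on the leading column lastName
SELECT * FROM Employee WHERE lastName = 'Smith';
SELECT * FROM Employee WHERE lastName = 'Smith' AND firstName = 'Jane';
-- cannot use index `name`: skips the leading column
SELECT * FROM Employee WHERE firstName = 'Jane';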
Added: The author of the tutorial is "guessing" that employees will often be looked up by their name or by their departmentId. That's why he or she created the two additional indexes.
-- The primary key index is created for you automatically in most DBMSs.
In real life, it is not wise to rely solely on "guessing" which columns should be indexed. Instead, use the slow query log (MySQL example) to determine which queries are executing slowly and how to speed them up. Usually the answer is to add another index or two.
P.S. The downside of indexes is that they increase the time required to add, update, or delete data in the table, since both the table and the index have to be modified. A second downside is that they take up room in the database. But storage is cheap these days.
Since most databases have far more reads than writes, the speedup in querying provided by the index usually far outweighs the costs.