MySQL primary key doesn't consume any size

I have these two table schemas:
CREATE TABLE `myTable` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `lat` double NOT NULL,
  `lng` double NOT NULL,
  `date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `mobile` bigint(11) unsigned NOT NULL,
  `date_updated` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `IDX_Datee` (`mobile`,`date`),
  CONSTRAINT `FK_DeviceLocationss` FOREIGN KEY (`mobile`) REFERENCES `device` (`serial`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
And here is the second one:
CREATE TABLE `myTable2` (
  `lat` double NOT NULL,
  `lng` double NOT NULL,
  `date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `mobile` bigint(11) unsigned NOT NULL,
  `date_updated` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY `IDX_Datee2` (`mobile`,`date`),
  CONSTRAINT `FK_DeviceLocationss2` FOREIGN KEY (`mobile`) REFERENCES `device` (`serial`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
Each table contains around 4,000,000 records so far,
so I'm trying to build the most suitable schema: faster and less storage-consuming.
When I check the status of each table in MySQL Workbench, I get a little confused:
First table: [Workbench table status screenshot: the secondary index consumes space]
Second table: [Workbench table status screenshot: index size is zero]
When I changed IDX_Datee from a regular index to the PRIMARY KEY, it stopped consuming any space.
I believe the second schema is better for me, but I don't have a good understanding of the difference.
Can anyone explain it?

The table is index-organized: the data records are stored in index order.
see https://dev.mysql.com/doc/refman/5.5/en/optimizing-primary-keys.html
"With the InnoDB storage engine, the table data is physically organized to do ultra-fast lookups and sorts based on the primary key column or columns"
so no extra index is necessary.
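If you want to check this from SQL rather than Workbench, a rough sketch against information_schema shows the split (the sizes reported are estimates):

SELECT table_name,
       ROUND(data_length  / 1024 / 1024) AS data_mb,   -- clustered index: the rows plus the PK
       ROUND(index_length / 1024 / 1024) AS index_mb   -- secondary indexes only
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name IN ('myTable', 'myTable2');

For myTable2 the index_mb figure should be (near) zero, because its only index is the primary key, and that is the table itself.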

All operations (select, insert, delete, update) on a single row specified by the PK will be very fast and efficient. Drill down the BTree that contains the data and is organized by the PK, and there is the row to work with.
The PK takes a tiny amount of space, just as any BTree is slightly bigger than its leaf nodes. As a rule of thumb, MySQL's BTrees (data or index) have a fanout of about 100; that is, each node has about 100 nodes under it. This implies only about 1% overhead for the non-leaf nodes, which is the 'rest' of the PK's cost.
With rows of roughly 61 bytes in a 16KB InnoDB page, 16KB / 61 is about 268 -- your "fanout".
For starters, I will suggest that DOUBLE (8 bytes) is gross overkill for latitude and longitude unless you are trying to distinguish one flea from another on a dog. Here is my table of representation choices for lat/lng.
INT is 4 bytes. If you are sure you won't go past 16 million, change the PK to MEDIUMINT UNSIGNED (3 bytes). (I suggest this is too risky.)
The size of the PK is doubly important because it is included in every secondary key.
If (mobile, date) is unique, then it may as well be the PK. That shaves off two copies of id, and speeds up queries based on mobile.
If mobile contains phone numbers, then some numbers won't fit in the smaller integer types. You would be better off with DECIMAL(11), which takes 5 bytes; DECIMAL(13) takes 6. If, instead, mobile is an AUTO_INCREMENT in some other table, then perhaps even SMALLINT UNSIGNED (2 bytes per copy, per table) would be better.
Relative to the second table, your first table stores 4 extra column copies: id twice (once in the data, once in the secondary index), plus duplicate copies of mobile and date in that index.
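Putting those suggestions together, a revised table might look roughly like this. This is a sketch, not a drop-in replacement: it assumes FLOAT's ~1.7 m resolution is enough for lat/lng, that (mobile, date) really is unique, and the table and constraint names here are placeholders:

CREATE TABLE `myTable3` (
  `lat` float NOT NULL,    -- 4 bytes instead of 8; ~1.7 m resolution
  `lng` float NOT NULL,
  `date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `mobile` bigint unsigned NOT NULL,    -- shrink to match `device`.`serial` if possible
  `date_updated` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`mobile`,`date`),        -- clustered: no id, no secondary index
  CONSTRAINT `FK_DeviceLocations3` FOREIGN KEY (`mobile`) REFERENCES `device` (`serial`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;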

Related

How do I partition a MySQL table that contains several unique keys?

I have an extremely large MySQL table that I would like to partition. A simplified CREATE statement for this table is given below -
CREATE TABLE `myTable` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `columnA` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `columnB` varchar(50) NOT NULL,
  `columnC` int(11) DEFAULT NULL,
  `columnD` varchar(255) DEFAULT NULL,
  `columnE` int(11) DEFAULT NULL,
  `columnF` varchar(255) DEFAULT NULL,
  `columnG` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `UNIQ_B` (`columnB`),
  UNIQUE KEY `UNIQ_B_C` (`columnB`,`columnC`),
  UNIQUE KEY `UNIQ_C_D` (`columnC`,`columnD`),
  UNIQUE KEY `UNIQ_E_F_G` (`columnE`,`columnF`,`columnG`)
)
I want to partition my table either by columnA or id, but the problem is that the MySQL Manual states -
In other words, every unique key on the table must use every column in the table's partitioning expression.
Which means that I cannot partition the table on either of those columns without changing my schema. For example, I have considered adding id to all my unique keys like so -
CREATE TABLE `myTable` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `columnA` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `columnB` varchar(50) NOT NULL,
  `columnC` int(11) DEFAULT NULL,
  `columnD` varchar(255) DEFAULT NULL,
  `columnE` int(11) DEFAULT NULL,
  `columnF` varchar(255) DEFAULT NULL,
  `columnG` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `UNIQ_B` (`columnB`,`id`),
  UNIQUE KEY `UNIQ_B_C` (`columnB`,`columnC`,`id`),
  UNIQUE KEY `UNIQ_C_D` (`columnC`,`columnD`,`id`),
  UNIQUE KEY `UNIQ_E_F_G` (`columnE`,`columnF`,`columnG`,`id`)
)
Which I do not mind doing, except for the fact that it allows the creation of rows that should not be created. For example, under my original schema, the following insertion wouldn't have worked twice -
INSERT INTO myTable (columnC, columnD) VALUES (1.0, 2.0);
But it works with the second schema, as columnC and columnD by themselves no longer form a unique key. I have considered getting around this by using triggers to prevent the creation of such rows, but then the trigger cost would reduce (or outweigh) the partitioning performance gain.
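(For illustration, such a trigger might look roughly like the sketch below; an analogous BEFORE UPDATE trigger would be needed as well, and the lookup it performs on every insert is exactly the cost in question:)

DELIMITER //
CREATE TRIGGER myTable_before_insert
BEFORE INSERT ON myTable
FOR EACH ROW
BEGIN
  -- Re-check the rule that the widened unique key no longer enforces
  IF EXISTS (SELECT 1 FROM myTable
             WHERE columnC = NEW.columnC AND columnD = NEW.columnD) THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Duplicate (columnC, columnD) pair';
  END IF;
END//
DELIMITER ;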
Edited:
Some additional information about this table:
The table has more than 1.2 billion records.
It runs on MySQL 5.6.34 with the InnoDB engine, on AWS RDS.
A few other indexes also exist on this table.
Because of the huge data volume and the multiple indexes, inserting and retrieving data is expensive.
There are no unique indexes on timestamp or float data types; that was just a sample schema for illustration. Our actual table has a schema similar to the one above.
Other than partitioning, what options do we have to improve the performance of the table without losing any data and while maintaining the integrity constraints?
How do I partition a MySQL table that contains several unique keys?
Sorry to say, you don't.
Also, you shouldn't. Remember that UPDATE and INSERT operations on a table with unique keys necessarily must query the table to ensure the keys stay unique. If it were possible to partition a table so that unique keys weren't built into the partitioning expression, then every insert or update would have to query every partition. That would likely make the partitioning worse than useless.
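You can see the rule being enforced directly; trying to partition the original table by id alone is rejected (a sketch -- the exact error wording varies by version):

ALTER TABLE myTable
PARTITION BY RANGE (id) (
  PARTITION p0 VALUES LESS THAN (1000000),
  PARTITION p1 VALUES LESS THAN MAXVALUE
);
-- ERROR 1503 (HY000): A UNIQUE INDEX must include all columns
-- in the table's partitioning function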

Performance of joins on multi-million-row tables

I need to give my website users the ability to select their country, province and city. So I want to display a list of countries, then a list of provinces in the selected country, then a list of cities in the selected province (I don't want any other UI solution for now). Of course, every name must be in the user's language, so I need additional tables for the translations.
Let's focus on the case of the cities. Here are the two tables:
CREATE TABLE `city` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `province_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_fk_city_province` (`province_id`),
  CONSTRAINT `fk_city_province` FOREIGN KEY (`province_id`) REFERENCES `province` (`id`)
) ENGINE=InnoDB;
CREATE TABLE `city_translation` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `city_id` int(10) unsigned NOT NULL,
  `locale_id` int(10) unsigned DEFAULT NULL,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_fk_city_translation_city` (`city_id`),
  KEY `idx_fk_city_translation_locale` (`locale_id`),
  KEY `idx_city_translation_city_locale` (`city_id`,`locale_id`),
  CONSTRAINT `fk_city_translation_city` FOREIGN KEY (`city_id`) REFERENCES `city` (`id`),
  CONSTRAINT `fk_city_translation_locale` FOREIGN KEY (`locale_id`) REFERENCES `locale` (`id`)
) ENGINE=InnoDB;
The city table contains 4 million rows and the city_translation table 4 million × the number of languages available on my website. That is 12 million now. If in the future I want to support 10 languages, it will be 40 million...
So I am wondering: is it a bad idea (performance-wise) to work with a table of this size, or is a good index (here on the join fields, city_id and locale_id) sufficient to make the size not matter?
If not, what are the common solutions used to solve this specific -- but I guess common -- problem? I'm only interested in performance. I'm OK with denormalizing if necessary, or even with using other tools if they are more appropriate (Elasticsearch?).
Get rid of id in city_translation. Instead, have PRIMARY KEY(city_id, locale_id). With InnoDB, this may double the speed, because it cuts an unnecessary step out of the JOINs. And you can shrink the disk footprint by also removing the two indexes starting with city_id.
Do you think you will go beyond 16M cities? I doubt it. So save one byte by changing (in all tables) city_id to MEDIUMINT UNSIGNED.
Save 3 bytes by changing locale_id to TINYINT UNSIGNED.
Those savings are multiplied by the number of columns and indexes mentioning them.
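A sketch of the primary-key change, assuming nothing else references city_translation.id (the type shrinking is left out here because city.id and locale.id must shrink in lockstep, which means dropping and re-adding the foreign keys; each ALTER below rebuilds the 12-million-row table):

-- 1. Remove AUTO_INCREMENT so the old PK can be dropped
ALTER TABLE city_translation MODIFY id INT UNSIGNED NOT NULL;
-- 2. Swap the PK; locale_id must become NOT NULL to be part of it
ALTER TABLE city_translation
  MODIFY locale_id INT UNSIGNED NOT NULL,
  DROP PRIMARY KEY,
  DROP COLUMN id,
  ADD PRIMARY KEY (city_id, locale_id);
-- 3. Drop the now-redundant indexes (the new PK already starts with city_id)
ALTER TABLE city_translation
  DROP INDEX idx_fk_city_translation_city,
  DROP INDEX idx_city_translation_city_locale;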
How big are the tables (GB)? What is the setting of innodb_buffer_pool_size? How much RAM is there? See if you can make that setting bigger than the total table size and yet no more than 70% of available memory. (That's the only "tunable" that is worth checking.)
I hope you have a default of CHARACTER SET utf8mb4 for the sake of Chinese users. (But that is another story.)

Tuning SQL Query for a table with size more than 2GB

I have a table with millions of records; its size is currently 2GB and it is expected to grow further.
Table Structure
CREATE TABLE `test` (
  `column_1` int(11) NOT NULL AUTO_INCREMENT,
  `column_2` int(11) NOT NULL,
  `column_3` int(11) NOT NULL,
  `column_4` int(11) NOT NULL,
  `column_5` datetime NOT NULL,
  `column_6` time NOT NULL,
  PRIMARY KEY (`column_1`),
  UNIQUE KEY `index_1` (`column_2`,`column_3`),
  UNIQUE KEY `index_2` (`column_2`,`column_4`),
  KEY `index_3` (`column_3`),
  KEY `index_4` (`column_4`),
  KEY `index_5` (`column_2`),
  KEY `index_6` (`column_5`,`column_2`),
  CONSTRAINT `fk_1` FOREIGN KEY (`column_3`) REFERENCES `test2` (`id`),
  CONSTRAINT `fk_2` FOREIGN KEY (`column_4`) REFERENCES `test2` (`id`),
  CONSTRAINT `fl_3` FOREIGN KEY (`column_2`) REFERENCES `link` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=14164023 DEFAULT CHARSET=utf8;
When I run the following query, it takes around 5-8 seconds for different values of column_2. Can someone help me get this to execute faster?
SELECT COUNT(*) FROM test
WHERE test.column_2 = 26
  AND test.column_5 BETWEEN '2015-06-01 00:00:00' AND '2015-06-30 00:00:00';
Note: the timings mentioned were captured by executing the query in MySQL Workbench.
Your index_6 currently has column_5 first, then column_2, so MySQL first tries to filter based on the BETWEEN clause. However, MySQL has a limitation: once an index column is used in range mode, the following parts of the index can't be used (more info in this blog post).
The correct way to optimize such queries is to put the equality column first in the index and the range column second. Then MySQL will pick the rows whose column_2 value is 26 and use the second part of the index to filter them further by the column_5 date range.
So the solution is to have an index:
KEY `ind_c2_c5` (`column_2`,`column_5`)
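Applied to the existing table, that might look like this (a sketch; index_5 on column_2 alone becomes redundant because the new index has column_2 as its prefix, but drop it only after confirming nothing else relies on it):

ALTER TABLE `test`
  ADD KEY `ind_c2_c5` (`column_2`,`column_5`),
  DROP KEY `index_5`;   -- redundant: its column is the new key's prefix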
BTW, it is better to give indexes descriptive names, so you know at first sight what they are for...

MySQL performance using AUTO_INCREMENT on a PRIMARY KEY

I ran a comparison INSERTing rows into an empty table using MySQL 5.6.
Each table contained a column (ascending) that was incremented serially by AUTO_INCREMENT, and a pair of columns (random_1, random_2) that received random, unique numbers.
In the first test, ascending was PRIMARY KEY and (random_1, random_2) were KEY. In the second test, (random_1, random_2) were PRIMARY KEY and ascending was KEY.
CREATE TABLE clh_test_pk_auto_increment (
  ascending_pk BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, -- PK
  random_ak_1 BIGINT UNSIGNED NOT NULL,                 -- AK1
  random_ak_2 BIGINT UNSIGNED,                          -- AK2
  payload VARCHAR(40),
  PRIMARY KEY ( ascending_pk ),
  KEY ( random_ak_1, random_ak_2 )
) ENGINE=MYISAM AUTO_INCREMENT=1;
CREATE TABLE clh_test_auto_increment (
  ascending_ak BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, -- AK
  random_pk_1 BIGINT UNSIGNED NOT NULL,                 -- PK1
  random_pk_2 BIGINT UNSIGNED,                          -- PK2
  payload VARCHAR(40),
  PRIMARY KEY ( random_pk_1, random_pk_2 ),
  KEY ( ascending_ak )
) ENGINE=MYISAM AUTO_INCREMENT=1;
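(The benchmark harness isn't shown here; a minimal sketch of the kind of insert loop it implies, with made-up procedure name and row count, might be:)

DELIMITER //
CREATE PROCEDURE bench_inserts(IN n INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < n DO
    -- FLOOR(RAND() * 1e15) stands in for the "random, unique" values;
    -- a real harness must guarantee uniqueness or handle duplicate-key errors
    INSERT INTO clh_test_pk_auto_increment (random_ak_1, random_ak_2, payload)
    VALUES (FLOOR(RAND() * 1e15), FLOOR(RAND() * 1e15), UUID());
    SET i = i + 1;
  END WHILE;
END//
DELIMITER ;
CALL bench_inserts(1000000);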
Consistently, the second test (where the auto-increment column is not the PRIMARY KEY) runs slightly faster -- 5-6%. Can anyone speculate as to why?
Primary keys are often used as the sequence in which the data is actually stored. If the primary key is incremented, the data is simply appended. If the primary key is random, that would mean that existing data must be moved about to get the new row into the proper sequence. A basic (non-primary-key) index is typically much lighter in content and can be moved around faster with less overhead.
I know this to be true for other DBMSs; I would venture to guess that MySQL works similarly in this respect.
UPDATE
As stated by @BillKarwin in comments below, this theory would not hold true for MyISAM tables. As a follow-up theory, I'd refer to @KevinPostlewaite's answer below (which he has since deleted): the issue is the lack of AUTO_INCREMENT on a PRIMARY KEY, which must be unique. With AUTO_INCREMENT it's easier to determine that the values are unique, since they are guaranteed to be incremental. With random values, it may take some time to actually walk the index to make this determination.

Enforce unique rows in MySQL

I have a table in MySQL that has 3 fields and I want to enforce uniqueness among two of the fields. Here is the table DDL:
CREATE TABLE `CLIENT_NAMES` (
  `ID` int(11) NOT NULL auto_increment,
  `CLIENT_NAME` varchar(500) NOT NULL,
  `OWNER_ID` int(11) NOT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The ID field is a surrogate key (this table is loaded via ETL).
The CLIENT_NAME field contains the names of clients.
The OWNER_ID field is an id indicating a client's owner.
I thought I could enforce this with a unique index on CLIENT_NAME and OWNER_ID,
ALTER TABLE `DW`.`CLIENT_NAMES`
ADD UNIQUE INDEX enforce_unique_idx(`CLIENT_NAME`, `OWNER_ID`);
but MySQL gives me an error:
Error executing SQL commands to update table.
Specified key was too long; max key length is 765 bytes (error 1071)
Anyone else have any ideas?
MySQL cannot enforce uniqueness on keys that are longer than 765 bytes (and apparently 500 UTF8 characters can surpass this limit).
Does CLIENT_NAME really need to be 500 characters long? Seems a bit excessive.
Add a new (shorter) column that is hash(CLIENT_NAME). Get MySQL to enforce uniqueness on that hash instead.
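A sketch of that approach (the NAME_HASH column and the choice of SHA1 are assumptions, not part of the original schema; the ETL must keep the hash in sync, and note that SHA1 compares bytes exactly, so 'Acme' and 'acme' hash differently even where a case-insensitive collation would call them equal):

ALTER TABLE `CLIENT_NAMES`
  ADD COLUMN `NAME_HASH` char(40) NOT NULL;   -- hex SHA-1 digest of CLIENT_NAME

UPDATE `CLIENT_NAMES` SET `NAME_HASH` = SHA1(`CLIENT_NAME`);

ALTER TABLE `CLIENT_NAMES`
  ADD UNIQUE INDEX `enforce_unique_idx` (`NAME_HASH`, `OWNER_ID`);  -- 40 x 3 + 4 bytes, well under 765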
Have you looked at CONSTRAINT ... UNIQUE?
Something seems a bit odd about this table; I would actually think about refactoring it. What do ID and OWNER_ID refer to, and what is the relationship between them?
Would it make sense to have
CREATE TABLE `CLIENTS` (
  `ID` int(11) NOT NULL auto_increment,
  `CLIENT_NAME` varchar(500) NOT NULL,
  # other client fields - address, phone, whatever
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `CLIENTS_OWNERS` (
  `CLIENT_ID` int(11) NOT NULL,
  `OWNER_ID` int(11) NOT NULL,
  PRIMARY KEY (`CLIENT_ID`,`OWNER_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I would really avoid adding a unique key like that on a 500-character string. It's much more efficient to enforce uniqueness on two ints. Plus, an id in a table should really refer to something that needs an id; in your version, the ID field seems to identify just the client/owner relationship, which doesn't need a separate id, since it's just a mapping.
Here is the problem: for the utf8 charset, MySQL may use up to 3 bytes per character, so CLIENT_NAME is 3 x 500 = 1500 bytes. Shorten CLIENT_NAME to 250.
later: +1 to creating a hash of the name and using that as the key.