MySQL InnoDB: Large Composite PK, no other indexes

I am creating an Innodb table with four columns.
Table
column_a (tiny_int)
column_b (medium_int)
column_c (timestamp)
column_d (medium_int)
Primary Key -> column_a, column_b, column_c
From a logical standpoint, columns A, B, C must be made into a PK together. However, to increase performance and be able to read directly from the index (Using index), I am considering a PK that comprises all 4 columns (A, B, C, D).
QUESTION
What would the performance impact be of appending an additional column to the Primary Key on an InnoDB table?
CONSIDERATIONS
Surrogate primary keys are absolutely out of the question
No other indexes will exist on this table
Table is read/write intensive (both about equal)
Thank you!

In InnoDB, the PRIMARY KEY index structure includes all non-key fields and will automatically use them for covering index queries and row elimination. There is no separate "data" structure other than the PRIMARY KEY index structure. It is not necessary to add additional fields to the PRIMARY KEY definition itself. Note that it won't show Using index when it's using the PRIMARY KEY on an InnoDB table, because it's a different code path which doesn't trigger the addition of that message.
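As a minimal sketch of the point above, the table from the question can be declared with the three-column key only. The table name `readings`, the `DEFAULT` clause, and the sample row are assumptions made for illustration; the column names and types follow the question.

```sql
-- Hypothetical table name; columns follow the question.
CREATE TABLE readings (
  column_a TINYINT   NOT NULL,
  column_b MEDIUMINT NOT NULL,
  column_c TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  column_d MEDIUMINT,
  PRIMARY KEY (column_a, column_b, column_c)
) ENGINE=InnoDB;

-- Sample row so the optimizer has something to find.
INSERT INTO readings VALUES (1, 2, '2013-01-07 00:00:00', 42);

-- column_d is not part of the key, but it is stored in the same
-- clustered-index leaf record as the key columns.
EXPLAIN
SELECT column_d
FROM readings
WHERE column_a = 1
  AND column_b = 2
  AND column_c = '2013-01-07 00:00:00';
```

The plan should name PRIMARY as the chosen key. As noted above, the Extra column is not expected to say Using index here, even though column_d is read from the same B-tree record.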

A few things to consider:
An index can only be used if the query constrains a leftmost prefix of its columns (here A, or A and B, or A, B and C); the query does not have to use every column of the index.
As jeremycole notes, in the InnoDB structure all row data is stored in the B-tree leaf nodes of the clustered index (the PRIMARY KEY).
This concept is covered:
http://www.innodb.com/wp/wp-content/uploads/2009/05/innodb-file-formats-and-source-code-structure.pdf
http://blog.johnjosephbachir.org/2006/10/22/everything-you-need-to-know-about-designing-mysql-innodb-primary-keys/
... and in jeremy's blog post here:
http://blog.jcole.us/2013/01/07/the-physical-structure-of-innodb-index-pages/
As such, a query on A, B, C will be sufficient for efficiently obtaining all values on this InnoDB table.

Related

Does MySQL create an extra index for primary key or uses the data itself as an "index"

I can't find an explicit answer to that.
I know that when you create a primary key, MySQL orders the data according to that primary key. The question is: does it actually create another index, or does it use the actual data as an index, since the data should already be ordered by the primary key?
EDIT:
If I have a table that has index A and index B and no primary key, I have the data + index A + index B. If I change the table to have the columns of index A as the primary key, I will only have data (which is also used as an index) + index B, right? The above is in terms of memory usage.
Clustered and Secondary Indexes
Every InnoDB table has a special index called the clustered index where the data for the rows is stored. Typically, the clustered index is synonymous with the primary key. To get the best performance from queries, inserts, and other database operations, you must understand how InnoDB uses the clustered index to optimize the most common lookup and DML operations for each table.
When you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index.
If you do not define a PRIMARY KEY for your table, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.
If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.
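The three cases above can be sketched with three small tables. All table and index names here are hypothetical. The catalog query uses the MySQL 8.0 view names `INNODB_TABLES` and `INNODB_INDEXES`; MySQL 5.6/5.7 and MariaDB expose the same information as `INNODB_SYS_TABLES` and `INNODB_SYS_INDEXES`, and reading these views normally requires the PROCESS privilege.

```sql
-- Case 1: the PRIMARY KEY becomes the clustered index.
CREATE TABLE t_pk (
  id INT NOT NULL,
  v  INT,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Case 2: no PRIMARY KEY, but a UNIQUE index whose columns are all NOT NULL.
CREATE TABLE t_uniq (
  code INT NOT NULL,
  v    INT,
  UNIQUE KEY uk_code (code)
) ENGINE=InnoDB;

-- Case 3: neither, so InnoDB generates the hidden GEN_CLUST_INDEX.
CREATE TABLE t_none (
  v INT
) ENGINE=InnoDB;

-- List the indexes InnoDB actually created for the three tables.
SELECT t.NAME AS table_name,
       i.NAME AS index_name,
       i.TYPE AS index_type
FROM information_schema.INNODB_TABLES  AS t
JOIN information_schema.INNODB_INDEXES AS i
  ON i.TABLE_ID = t.TABLE_ID
WHERE t.NAME LIKE '%/t\_%';
```

For t_none the listing should contain an index named GEN_CLUST_INDEX, even though no index was declared.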
How the Clustered Index Speeds Up Queries
Accessing a row through the clustered index is fast because the index search leads directly to the page with all the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record.
If I have a table that has index A and index B and no primary key, I have the data + index A + index B. If I change the table to have the columns of index A as the primary key, I will only have data (which is also used as an index) + index B, right? The above is in terms of memory usage.
Yes, the index for the clustered index is the table itself. That's the only place other non-indexed columns are stored. When you SHOW TABLE STATUS you see this reported as Data_length. Secondary indexes are reported as Index_length.
mysql> show table status like 'redacted'\G
*************************** 1. row ***************************
Name: redacted
Engine: InnoDB
Version: 10
Row_format: Dynamic
Rows: 100217
Avg_row_length: 1168
Data_length: 117063680 <-- clustered index
Max_data_length: 0
Index_length: 3653632 <-- secondary index(es)
InnoDB always stores a clustered index. If your table has no PRIMARY KEY and no UNIQUE index whose columns are all NOT NULL, InnoDB creates an artificial column as the key for the clustered index, and this column cannot be queried.
If I have a table that has index A and index B and no primary key, I
have the data + index A + index B. If I change the table to have the
columns of index A as the primary key, I will only have data (which is
also used as an index) + index B, right? The above is in terms of
memory usage.
While that is true, there is more to consider in terms of storage size.
This assumes that what you are trying to do is logically sound and that the index you want to promote to primary key is actually a candidate key. Whether you save on storage size depends on the number of indexes and the size of the primary key columns. The reason is that InnoDB appends the primary key columns to every secondary index (if they are not already explicitly part of it). It can also affect other (bigger) tables if they need to reference it as a foreign key.
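One way to observe that secondary indexes carry the primary key columns is to ask for the primary key column through a secondary index. This is a small sketch; the table name `pk_demo` and index name `idx_b` are made up for illustration.

```sql
-- Hypothetical table: PRIMARY KEY on a, secondary index on b.
CREATE TABLE pk_demo (
  a INT NOT NULL,
  b INT,
  PRIMARY KEY (a),
  INDEX idx_b (b)
) ENGINE=InnoDB;

INSERT INTO pk_demo (a, b) VALUES (1, 10), (2, 20), (3, 30);

-- Only a and b are requested. idx_b stores b plus the primary key
-- column a, so the query can normally be answered from idx_b alone.
EXPLAIN SELECT a, b FROM pk_demo WHERE b = 20;
```

The plan should pick idx_b and report Using index, although idx_b was declared on b only. The wider the primary key, the more bytes each secondary index entry has to carry.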
Here are some simple tests that show the differences. I am using MariaDB since its sequence plugin makes it easy to create dummy data, but you should see the same effects on MySQL server.
So first I will just create a simple table with two INT columns and an index on each filling it with 100K rows.
drop table if exists test;
create table test(
a int,
b int,
index(a),
index(b)
);
insert into test(a, b)
select seq as a, seq as b
from seq_1_to_100000
;
To keep it simple, I will just look at the file size of the table (I'm using innodb_file_per_table=1).
16777216 test.ibd
Now let's do what you wanted, and make column a primary key, changing the CREATE statement:
create table test(
a int,
b int,
primary key(a),
index(b)
);
The file size now is:
13631488 test.ibd
So it's true - You can save on storage size by promoting an index to primary key. In this case almost 20%.
But what happens if I change the column type from INT (4 bytes) to BINARY(32) (32 byte)?
create table test(
a binary(32),
b binary(32),
index(a),
index(b)
);
File size:
37748736 test.ibd
Now make column a primary key
create table test(
a binary(32),
b binary(32),
primary key(a),
index(b)
);
File size:
41943040 test.ibd
As you can see, you can also increase the size, in this case by about 11%.
It is nevertheless advisable to always define a primary key. If in doubt, just create an AUTO_INCREMENT PRIMARY KEY. In my example it could be:
create table test(
id mediumint auto_increment primary key,
a binary(32),
b binary(32),
index(a),
index(b)
);
File size:
37748736 test.ibd
The size is the same as if we didn't have an explicit primary key. (Though I would expect to save a bit on size, since I use 3 byte PK instead of a hidden 6 byte PK.) But now you can use it in your queries, for foreign keys and joins.

MySQL: Composite Index vs Multiple Indices (Leftmost Index Prefixes)

We have a table that is currently using a composite (i.e. multi-column) index.
Let's say
PRIMARY KEY(A, B)
Of course we can rapidly search based on A alone (Leftmost Index Prefix) and if we want to efficiently search based on B alone, we need to create a separate index for B.
My question is that if I am doing:
PRIMARY KEY (B)
is there any value in retaining
PRIMARY KEY (A,B)
In other words will there be any advantage retaining
PRIMARY KEY (A,B)
if I have
PRIMARY KEY (A)
and
PRIMARY KEY (B)
You are missing a key point about PRIMARY KEY -- it is by definition (at least in MySQL), UNIQUE. And do not have more columns than are needed to make the PK unique.
If B alone is unique, then have PRIMARY KEY(B) without any other columns in the PK definition.
If A is also unique, then do
PRIMARY KEY(B),
UNIQUE(A)
or swap them.
For a longer discussion of creating indexes, see my cookbook.
If it takes both columns to be "unique", then you may need
PRIMARY KEY(A, B),
INDEX(B)
or
PRIMARY KEY(B, A),
INDEX(A)
Until you have the SELECTs, it is hard to know what indexes to create.
You can't have multiple primary keys, so I'm going to assume you're really asking about having an ordinary index.
If you have an index on (A, B), it will be used for queries that use both columns, like:
WHERE A = 1 AND B = 2
as well as queries that just use A:
WHERE A = 3
But if you have a query that just uses B, e.g.
WHERE B = 4
it will not be able to use the index to seek to the matching rows, because B is not a leftmost prefix of (A, B). If you need to optimize these queries, you should also have an index on B. So you might have:
UNIQUE KEY (A, B),
INDEX (B)
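The leftmost-prefix behaviour described above can be checked with EXPLAIN. The following is a sketch under assumptions: the table name, the extra columns `id` and `C`, and the index names are invented for illustration.

```sql
-- Hypothetical table with the two indexes suggested above.
CREATE TABLE prefix_demo (
  id INT NOT NULL AUTO_INCREMENT,
  A  INT NOT NULL,
  B  INT NOT NULL,
  C  INT,
  PRIMARY KEY (id),
  UNIQUE KEY uk_a_b (A, B),
  INDEX idx_b (B)
) ENGINE=InnoDB;

INSERT INTO prefix_demo (A, B, C)
VALUES (1, 2, 100), (3, 4, 200), (3, 5, 300), (6, 4, 400);

-- Both columns of uk_a_b are constrained.
EXPLAIN SELECT C FROM prefix_demo WHERE A = 1 AND B = 2;

-- Only A is constrained: A is a leftmost prefix of uk_a_b.
EXPLAIN SELECT C FROM prefix_demo WHERE A = 3;

-- Only B is constrained: uk_a_b cannot be used for the seek, idx_b can.
EXPLAIN SELECT C FROM prefix_demo WHERE B = 4;
```

The possible_keys column is the one to compare across the three plans. On a table this small the optimizer may still prefer a full scan, so the key column can differ from what a large table would show.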

Can I have too many columns in my composite primary key on one table

I have a table that uses 2 foreign key fields and a date field.
Is it common to have a table use 3 or more fields as a primary key? And are there any disadvantages to doing this?
--
My 3 tables are employees, training, and emp_training. The employees table holds employee data. Training table holds different training courses. And I am designing the emp_training table to be the fields EmployeeID (FK), TrainingID (FK), OnDate.
An employee can do multiple training courses, and can do the same training course multiple times. But they cannot do the same training course more than once on the same day.
Which is better to implement:
Option A - Make all 3 fields a primary key
Option B - Add an autonumber PK field, and use a query to find any potential duplicates.
I've created many tables before using 2 fields as a primary key, but never 3, so I'm curious if there is any disadvantage to proceeding with option A.
It's worth mentioning that in SQL Server the PK is by default the one and only clustered key, but you are allowed to create a non-clustered PK as well.
You may also define a clustered index that is not the PK. "Primary Key" is really just a name.
The most important question is: which columns participate in the clustered key, and (most important of all) do they have an implicit sort order? Also very important: are there many update operations that change the content of the participating columns?
You must be aware that a clustered key defines the physical order of the rows on disk. In other words, the clustered key is the table itself; you can think of it as an index with all columns included. If your leading column is (worst case) a GUID, inserts will not arrive in key order, which can lead to fragmentation of up to 99.99%.
If a clustered index follows the time of insert or a running number (best case), new rows are appended at the end and inserts cause hardly any fragmentation.
What makes things worse: If there is a clustered key (whether it's called PK or not), it will be used as lookup key for other indexes.
So in many cases it is best to use a running number as the clustered key, plus a non-clustered multi-column index, which is much faster to rebuild than it would be as the clustered index.
All indexes will profit from this!
My advice for you:
Option C: a running number as PK and additionally a unique multi-column-key to ensure data integrity. No need to use own logic here...
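Option C could look roughly like this. The sketch uses MySQL syntax; the table and column names come from the question, while the column types, the surrogate key name `EmpTrainingID`, the constraint name, and the minimal parent tables are assumptions.

```sql
-- Minimal stand-ins for the parent tables from the question.
CREATE TABLE employees (
  EmployeeID INT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE training (
  TrainingID INT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE emp_training (
  EmpTrainingID INT  NOT NULL AUTO_INCREMENT,  -- running number as PK
  EmployeeID    INT  NOT NULL,
  TrainingID    INT  NOT NULL,
  OnDate        DATE NOT NULL,
  PRIMARY KEY (EmpTrainingID),
  -- The database enforces "same course at most once per day".
  UNIQUE KEY uq_emp_training_day (EmployeeID, TrainingID, OnDate),
  FOREIGN KEY (EmployeeID) REFERENCES employees (EmployeeID),
  FOREIGN KEY (TrainingID) REFERENCES training (TrainingID)
) ENGINE=InnoDB;

INSERT INTO employees VALUES (1);
INSERT INTO training  VALUES (7);

-- First attendance on this day is accepted.
INSERT INTO emp_training (EmployeeID, TrainingID, OnDate)
VALUES (1, 7, '2024-01-15');

-- Same employee, course and day again: rejected with a duplicate-key error.
INSERT INTO emp_training (EmployeeID, TrainingID, OnDate)
VALUES (1, 7, '2024-01-15');
```

With the unique key in place, no separate duplicate-finding query (as in Option B) is needed.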
Yes, you can have a poor strategy that puts too many columns in your composite Primary Key (PK) when a better strategy could be employed to enforce uniqueness via secondary indexes.
Remember that the PK is special. There is only 1 physical / clustered ordering of your data. Changes to the data via Inserts and Updates (and the resulting shuffling) have overhead there that would not exist if uniqueness were maintained in a secondary index.
So the following can have not-so-insignificant differences:
A primary key with 5 composite columns
vs.
A primary key with 1 or 2 columns plus
Secondary indexes that maintain uniqueness if thought through well
The former mandates movement of data between data pages to maintain the clustered index (the PK). Which might suggest why so often one sees:
(
id int auto_increment primary key,
...
)
in table designs.
Performance with Index Width:
A PK of 1 or 2 columns is narrow; a PK of 5 composite columns can be quite wide. Wider keys propagating to child relationships will slow performance and concurrency.
Cases of FK compositions:
Special cases of compositions of foreign keys simply cannot be achieved without the use of a single column index, preferably the PK, as seen in this recent Answer of mine.
I don't think there is any problem with creating a table with a composite PK; such tables are often needed in larger databases. There is no real problem with a table whose two FKs, together with the OnDate field, form the PK. Both approaches are valid.
Good luck!
If you assign primary key on more than one column it will be composite primary key. For example,
CREATE TABLE employee(
training VARCHAR(10),
emp_training VARCHAR (20),
OnDate INTEGER,
PRIMARY KEY (training, emp_training, OnDate)
)
The combination (training, emp_training, OnDate) must then be unique across rows, and none of the three columns may be NULL.
As already stated you can have a single primary key which consists of multiple columns.
If the question was how to make the columns primary keys separately, that's not possible. However, you can create one primary key and add two unique keys.

How to find out size of indexes in mysql (including primary keys)

Two common answers are to use SHOW TABLE STATUS and INFORMATION_SCHEMA.TABLES.
But it seems that neither of them counts the primary key's size.
I have tables with millions of records with a primary key and no other indexes, and both of the methods mentioned above show Index_length: 0 for those tables. The tables are InnoDB.
Your primary key is your table. In an InnoDB table the primary key index contains the actual row data, so the primary key is the table.
Think about it for a moment. You get two different types of indexes on an InnoDB table: clustered and secondary indexes. The difference is that a clustered index contains the data, while a secondary index contains the indexed columns and a pointer to the data (in InnoDB, the primary key value of the row). Thus a secondary index does not contain the data but rather the location of the data in the CLUSTERED index.
Normally a primary key is a clustered index. It would be highly inefficient to store both the table with all its values and then a clustered index with all its values. This would effectively double the size of the table.
So when you have a primary key on an InnoDB table, the table size is the size of the primary key. In some database systems you can have a secondary index as a primary key and a separate index as a clustered key; however, InnoDB does not allow this.
Go read the following links for more details:
http://dev.mysql.com/doc/refman/5.0/en/innodb-table-and-index.html
http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html
In these links they explain all I have said above in more detail. Simply put you already have the size of the primary key index as it is the size of your table.
Hope that helps.
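To see the clustered index size directly, the figures can be read as follows. This is a sketch: `mydb` and `mytable` are placeholders. The second query assumes InnoDB persistent statistics are enabled (the default in recent MySQL versions) and that you may read the mysql schema. The numbers are estimates, refreshed by ANALYZE TABLE.

```sql
-- Table-level view: DATA_LENGTH is the clustered index (the PRIMARY KEY
-- together with all row data), INDEX_LENGTH covers the secondary indexes.
SELECT TABLE_NAME, DATA_LENGTH, INDEX_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydb'
  AND TABLE_NAME   = 'mytable';

-- Per-index view: one row per index, including PRIMARY.
-- stat_name = 'size' is the number of pages allocated to the index.
SELECT index_name,
       stat_value                      AS pages,
       stat_value * @@innodb_page_size AS approx_bytes
FROM mysql.innodb_index_stats
WHERE database_name = 'mydb'
  AND table_name    = 'mytable'
  AND stat_name     = 'size';
```

For a table with only a primary key, the second query should return a single PRIMARY row whose size roughly matches DATA_LENGTH from the first query.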

does a primary key speed up an index?

Aside from the convenient auto-increment and UNIQUE features, does the PK actually speed up the index?
Will the speed be the same whether it's a non-PKed indexed INT or PKed (same column, two different tests)? If I had the same column on the same table on the same system, will it be faster if a UNIQUE INT column with an index also has PK enabled? Does PK make the index it coexists with faster?
Please, actual results only with system stats if you could be so kind.
The primary key for a table represents the column or set of columns that you use in your most vital queries. It has an associated index, for fast query performance. Query performance benefits from the NOT NULL optimization, because it cannot include any NULL values. With the InnoDB storage engine, the table data is physically organized to do ultra-fast lookups and sorts based on the primary key column or columns.
If your table is big and important, but does not have an obvious column or set of columns to use as a primary key, you might create a separate column with auto-increment values to use as the primary key. These unique IDs can serve as pointers to corresponding rows in other tables when you join tables using foreign keys.
Also refer to the following pages: http://www.dbasquare.com/2012/04/04/how-important-a-primary-key-can-be-for-mysql-performance/ and http://www.w3schools.com/sql/sql_primarykey.asp
Rows in a base table are uniquely identified by the value of the primary key defined for the table. The primary key for a table is composed of the values of one or more columns.
Primary keys are automatically indexed to facilitate effective information retrieval.
The primary key index is the most effective access path for the table.
Other columns or combinations of columns may be defined as a secondary index to improve performance in data retrieval. Secondary indexes can be defined when the table is created or added afterwards (using the CREATE INDEX statement).
For example, a secondary index may be useful when a search is regularly performed on a non-key column in a table with many rows: defining an index on the column may speed up the search. The search result is not affected by the index; only the speed of the search changes.
It should be noted, however, that indexes create an overhead for update, delete and insert operations because the index must also be updated.
Indexes are internal structures which cannot be explicitly accessed by the user once created. An index will be used if the internal query optimization process determines it will improve the efficiency of a search.
SQL queries are automatically optimized when they are internally prepared for execution. The optimization process determines the most effective way to execute each query, which may or may not involve using an applicable index.