Performance difference between foreign key identifying and non-identifying relationships - mysql

I was just adding some foreign keys to my database. Usually all my foreign keys are non-identifying: I never bothered making them identifying because I never knew the difference, and my databases always seemed to work well enough for me.
Now I have decided to design this database properly, so I was making each foreign key identifying or non-identifying as appropriate. I was curious: is there any performance difference between them when doing joins?
Thanks

Yes, there could be some performance benefit to joins by making a foreign key on an identifying relationship. But it depends on the query (as optimization methods always do).
For example, querying the books for a given author:
SELECT a.author_name, b.book_name
FROM Authors AS a
JOIN AuthorBooks AS ab ON a.author_id = ab.author_id
JOIN Books AS b ON b.book_id = ab.book_id
WHERE a.author_id = 12345;
In this case, we hope the join to AuthorBooks uses an index. Which index will it use? It depends on how we define the indexes in that table.
The two entity tables are pretty straightforward.
CREATE TABLE Authors (
author_id INT AUTO_INCREMENT PRIMARY KEY,
author_name VARCHAR(50)
);
CREATE TABLE Books (
book_id INT AUTO_INCREMENT PRIMARY KEY,
book_name VARCHAR(50)
);
But there are two common ways that developers design the many-to-many table. One has an auto-increment id for its primary key:
CREATE TABLE AuthorBooks (
id INT AUTO_INCREMENT PRIMARY KEY,
author_id INT NOT NULL,
book_id INT NOT NULL,
FOREIGN KEY (author_id) REFERENCES Authors (author_id),
FOREIGN KEY (book_id) REFERENCES Books (book_id)
);
The other does not have an id. The primary key is the combination of the two foreign keys, and this makes them both have an identifying relationship with their respective referenced entity tables.
CREATE TABLE AuthorBooks (
author_id INT NOT NULL,
book_id INT NOT NULL,
PRIMARY KEY (author_id, book_id),
FOREIGN KEY (author_id) REFERENCES Authors (author_id),
FOREIGN KEY (book_id) REFERENCES Books (book_id)
);
What's the difference in terms of performance?
First of all, keep in mind how MySQL implements indexes for foreign keys: if there's no index, the foreign key will implicitly create one. If there's already an index on the column, the foreign key will use it. Even an index that merely includes the foreign key column as its left-most column can be used, so there is no need to create a new index for the foreign key.
In the first AuthorBooks table design, as MySQL does the join from Authors to AuthorBooks, it looks up an entry in the index for the author_id foreign key. But to perform the second join, that index entry has to fetch the row it references, to get the book_id value, which it then uses to join to the Books table. So the joins ultimately take an extra table lookup.
In the second AuthorBooks table design, the author_id is indexed by the PRIMARY KEY of the table. So as the join does a lookup to the author_id, it comes with access to the matching book_id, without an extra lookup to the table. The book_id can then be used for the second join. This eliminates a step for each row found by the query.
This turns out to be a great benefit for performance. I have optimized some queries simply by making a many-to-many table use a covering index like this—whether by using the primary key or creating an extra two-column index on the two foreign keys—and this resulted in up to six orders of magnitude improvement for performance.
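To make the covering-index effect visible without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. It is only an approximation. SQLite's plan output differs from MySQL's EXPLAIN, and SQLite does not index foreign keys automatically, so the single-column index is created by hand the way MySQL would create it implicitly. A SQLite rowid table also keeps its composite primary key in a separate automatic index rather than in the clustered row as InnoDB does. The layout names and the `plan_for` helper are hypothetical.

```python
import sqlite3

# Three hypothetical layouts of the AuthorBooks junction table.
LAYOUTS = {
    # Surrogate id plus a single-column index on the author_id foreign key.
    "surrogate": """
        CREATE TABLE AuthorBooks (id INTEGER PRIMARY KEY,
                                  author_id INT NOT NULL, book_id INT NOT NULL);
        CREATE INDEX ab_author ON AuthorBooks (author_id);""",
    # Identifying relationships: the two foreign keys form the primary key.
    "composite_pk": """
        CREATE TABLE AuthorBooks (author_id INT NOT NULL, book_id INT NOT NULL,
                                  PRIMARY KEY (author_id, book_id));""",
    # Surrogate id, but with an extra two-column index on both foreign keys.
    "surrogate_plus_index": """
        CREATE TABLE AuthorBooks (id INTEGER PRIMARY KEY,
                                  author_id INT NOT NULL, book_id INT NOT NULL);
        CREATE INDEX ab_author_book ON AuthorBooks (author_id, book_id);""",
}

# The step of the join discussed above: find the book_ids for one author.
QUERY = "SELECT book_id FROM AuthorBooks WHERE author_id = 12345"

def plan_for(ddl: str) -> str:
    """Build one layout in a fresh in-memory database and return its query plan."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(ddl)
        rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
        # The last column of each plan row is the human-readable detail text.
        return " | ".join(str(row[-1]) for row in rows)
    finally:
        conn.close()

plans = {name: plan_for(ddl) for name, ddl in LAYOUTS.items()}
for name, text in plans.items():
    print(f"{name:22} {text}")
```

In the `surrogate` layout the plan should mention a plain `USING INDEX` (each match still needs a trip to the table to read `book_id`), while the other two layouts should report a `COVERING INDEX`, which is the extra lookup being eliminated.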

The answer by @BillKarwin is really good. I would just add one observation.
Identifying and non-identifying relationships are logical constructs. They model the underlying business domain - see this question (also answered by the ubiquitous @BillKarwin). The reason to use logical constructs like this is to make the database easier to understand (and therefore maintain, extend, etc.). It's not to make your database "faster".

Related

Primary Key/Foreign Key for a table with two junction tables

Let's say I have table A with two junction tables, B and C. How would I go about creating primary keys for table A? I have two tables of this type in a diagram I drew; the circle keys are foreign keys, by the way.
Image with junction tables
Your games table needs only one primary key: this identifies each specific game. In the junction tables, the primary keys are composed of the game primary key and the director (or type) primary key.
Taken from the reference in the tutorial MySQL Primary Key:
CREATE TABLE roles(
role_id INT AUTO_INCREMENT,
role_name VARCHAR(50),
PRIMARY KEY(role_id)
);
It is difficult to provide information about your specific question because there are too few details in it.
As for your comment "if a table has two junction tables attached to it, would it need to have two primary keys?": no.
A primary key is actually a logical concept (a design mechanism) used to define a logical model. A primary key is a set of attributes (columns) that together uniquely identify each and every tuple (row) in a relation (table). One of the rules of a primary key is that there is only one per relation.
As mentioned, the logical model is used as the design for creating the physical model: relations become tables, attributes become columns, primary keys may become unique indexes, foreign keys may become indexes in the related table, and so on.
Many RDBMSs allow the specification of a PRIMARY KEY in a physical table definition, and most also allow FOREIGN KEYs to be defined on a physical table. What they do with them may vary from one implementation to another. Many use the definition of a PRIMARY KEY to define a UNIQUE INDEX of some sort, to enforce the requirement that it uniquely identify each and every record in the table.
So no, your games_directors table does not need, nor can it have, two primary keys. If you did choose to specify a PRIMARY KEY, you would need to specify all the columns that uniquely identify records in the games_directors table - most likely PRIMARY KEY (game_id, director_id).
Similarly, the PRIMARY KEY for the games table would likely be PRIMARY KEY (game_id), for the directors would likely be PRIMARY KEY (director_id) and for game types it would likely be PRIMARY KEY (game_type_id).
You might use foreign keys on your games_directors table to ensure that, when records are added to it, the corresponding game exists in the games table and the corresponding director exists in the directors table. In this case, your games_directors table will have two foreign key relationships (one to games and another to directors), but only one PRIMARY KEY.
So you might end up with something like this:
create table games (
game_id integer,
PRIMARY KEY (game_id)
);
create table directors (
director_id integer,
PRIMARY KEY (director_id)
);
CREATE TABLE games_directors (
game_id INTEGER NOT NULL,
director_id INTEGER NOT NULL,
commission_paid DECIMAL(10,2),
PRIMARY KEY (game_id, director_id),
FOREIGN KEY (game_id) REFERENCES games(game_id),
FOREIGN KEY (director_id) REFERENCES directors(director_id)
);
NB: I didn't test the above using PostgreSQL. The syntax should work for most RDBMSs, but some may require slight tweaking.
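A small sketch of what those two constraints buy you, using Python's sqlite3 as a stand-in for a full RDBMS (an assumption; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`). The sample rows and the `try_insert` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE games (game_id INTEGER, PRIMARY KEY (game_id));
    CREATE TABLE directors (director_id INTEGER, PRIMARY KEY (director_id));
    CREATE TABLE games_directors (
        game_id INTEGER NOT NULL,
        director_id INTEGER NOT NULL,
        commission_paid DECIMAL(10,2),
        PRIMARY KEY (game_id, director_id),
        FOREIGN KEY (game_id) REFERENCES games(game_id),
        FOREIGN KEY (director_id) REFERENCES directors(director_id)
    );
    INSERT INTO games VALUES (1);
    INSERT INTO directors VALUES (10);
""")

def try_insert(game_id: int, director_id: int) -> str:
    """Attempt one games_directors row; return 'ok' or the rejection message."""
    try:
        conn.execute(
            "INSERT INTO games_directors (game_id, director_id) VALUES (?, ?)",
            (game_id, director_id),
        )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

results = [
    try_insert(1, 10),  # game 1 and director 10 both exist
    try_insert(1, 10),  # same pair again: rejected by the composite primary key
    try_insert(1, 99),  # director 99 does not exist: rejected by the foreign key
]
print(results)
```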
Indexes can be used to speed up access to individual records within a table. For example, you might want to create an index on director name or director id (depending upon how you most frequently access that table). If you mostly access the directors table with an equality condition like where director_name = 'fred', then an index on director_name might make sense.
Indexes become more useful as the number of records in the tables grows.
I hope this answers your question. :-)

When to use foreign key as a primary key at the same time?

I have an intermediate table ArticleLanguage:
idArticleLanguage
ArticleId
LanguageId
Name
Foreign keys are:
ArticleId
LanguageId
Should I use primary keys for:
ArticleId
LanguageId
Because these fields are primary keys in related tables?
Link / Junction Tables
Assuming the linked tables are defined as:
CREATE TABLE Article
(
ArticleId INT PRIMARY KEY
-- ... other columns
);
CREATE TABLE Language
(
LanguageId INT PRIMARY KEY
-- ... other columns
);
As per @JulioPérez's Option 1, the link table could be created as:
CREATE TABLE ArticleLanguage
(
ArticleId INT NOT NULL,
LanguageId INT NOT NULL,
Name VARCHAR(50),
-- i.e. Composite Primary Key, consisting of the two foreign keys.
PRIMARY KEY(ArticleId, LanguageId),
FOREIGN KEY(ArticleId) REFERENCES Article(ArticleId),
FOREIGN KEY(LanguageId) REFERENCES Language(LanguageId)
);
i.e. with a composite primary key consisting of the two foreign keys used in the "link" relationship, and with no additional Surrogate Key (idArticleLanguage) at all.
Pros of this approach
Enforces uniqueness of the link, i.e. the same ArticleId and LanguageId cannot be linked more than once.
Saves an unnecessary additional surrogate key column on the link table.
Cons of this approach:
Any downstream tables which need to reference this link table would need to repeat both keys (ArticleId, LanguageId) as a composite foreign key, which would again consume space. Queries involving downstream tables which reference ArticleLanguage would also be able to join directly to Article and Language, potentially bypassing the link table (it is often easy to 'forget' that both keys are required in the join when using composite foreign keys).
SqlFiddle of option 1 here
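The composite-foreign-key cost mentioned in the cons can be sketched as follows, using Python's sqlite3 as a stand-in for MySQL (an assumption). The downstream table `ArticleLanguageReview`, its `ReviewId` column, and the `add_review` helper are hypothetical. The downstream table must carry both `ArticleId` and `LanguageId`, and the composite foreign key rejects a pair that was never linked even when each id exists on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE Article (ArticleId INT PRIMARY KEY);
    CREATE TABLE Language (LanguageId INT PRIMARY KEY);
    CREATE TABLE ArticleLanguage (
        ArticleId INT NOT NULL,
        LanguageId INT NOT NULL,
        Name VARCHAR(50),
        PRIMARY KEY (ArticleId, LanguageId),
        FOREIGN KEY (ArticleId) REFERENCES Article(ArticleId),
        FOREIGN KEY (LanguageId) REFERENCES Language(LanguageId)
    );
    -- Hypothetical downstream table: it has to repeat BOTH key columns.
    CREATE TABLE ArticleLanguageReview (
        ReviewId INTEGER PRIMARY KEY,
        ArticleId INT NOT NULL,
        LanguageId INT NOT NULL,
        FOREIGN KEY (ArticleId, LanguageId)
            REFERENCES ArticleLanguage(ArticleId, LanguageId)
    );
    INSERT INTO Article VALUES (1), (2);
    INSERT INTO Language VALUES (1), (2);
    INSERT INTO ArticleLanguage VALUES (1, 1, 'Welcome'), (2, 2, 'Willkommen');
""")

def add_review(article_id: int, language_id: int) -> bool:
    """Return True if the review row is accepted by the composite foreign key."""
    try:
        conn.execute(
            "INSERT INTO ArticleLanguageReview (ArticleId, LanguageId) VALUES (?, ?)",
            (article_id, language_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = add_review(1, 1)  # this Article/Language pair is linked
rejected = add_review(1, 2)  # both ids exist, but this pair was never linked
print(accepted, rejected)  # True False
```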
The alternative (@JulioPérez's Option 2) would be to keep your additional surrogate PK on the link table.
CREATE TABLE ArticleLanguage
(
-- New Surrogate PK
idArticleLanguage INT NOT NULL AUTO_INCREMENT,
ArticleId INT NOT NULL,
LanguageId INT NOT NULL,
Name VARCHAR(50),
PRIMARY KEY(idArticleLanguage),
-- Can still optionally enforce uniqueness of the link
UNIQUE(ArticleId, LanguageId),
FOREIGN KEY(ArticleId) REFERENCES Article(ArticleId),
FOREIGN KEY(LanguageId) REFERENCES Language(LanguageId)
);
Pros of this Approach
The Primary Key idArticleLanguage is narrower than the composite key, which will benefit any further downstream tables referencing table ArticleLanguage. It also requires downstream tables to join through the ArticleLanguage link table in order to get ArticleId and LanguageId, for further joins to the Language and Article tables.
The approach allows for an additional use case, viz. if it IS possible to add the same link between Language and Article more than once (e.g. two revisions or two reprints), then the UNIQUE key constraint can be removed.
Cons of this Approach
If only one unique link per Article and Language is possible, then the additional surrogate key is redundant
SqlFiddle of option 2 here
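A minimal sketch of the trade-off in option 2, assuming SQLite (via Python's sqlite3) as a stand-in and omitting the foreign keys for brevity. With the UNIQUE constraint a second identical link is rejected; without it, two editions of the same Article/Language link can coexist, each with its own surrogate id. The `link_twice` helper and the edition names are hypothetical.

```python
import sqlite3

def link_twice(enforce_unique: bool) -> int:
    """Insert the same Article/Language link twice; return how many rows remain."""
    conn = sqlite3.connect(":memory:")
    unique_clause = ", UNIQUE (ArticleId, LanguageId)" if enforce_unique else ""
    conn.execute(f"""
        CREATE TABLE ArticleLanguage (
            idArticleLanguage INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate PK
            ArticleId INT NOT NULL,
            LanguageId INT NOT NULL,
            Name VARCHAR(50){unique_clause}
        )""")
    for edition in ("first edition", "second edition"):
        try:
            conn.execute(
                "INSERT INTO ArticleLanguage (ArticleId, LanguageId, Name) "
                "VALUES (1, 1, ?)",
                (edition,),
            )
        except sqlite3.IntegrityError:
            pass  # the UNIQUE constraint rejected the duplicate link
    count = conn.execute("SELECT COUNT(*) FROM ArticleLanguage").fetchone()[0]
    conn.close()
    return count

print(link_twice(enforce_unique=True))   # 1
print(link_twice(enforce_unique=False))  # 2
```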
If you're asking for an opinion, I would stick with option 1, unless you do require non-unique links in your ArticleLanguage table, or unless you have many further downstream tables which reference ArticleLanguage (this would be unusual, IMO).
Table per Type / per Class Inheritance
Unrelated to OP's post, but another common occurrence where a Foreign Key can be used as a Primary Key in the referencing table is when the Table per Type approach is taken when modelling an object oriented class hierarchy with multiple subclasses. Because of the 0/1 to 1 relationship between subclass and base class tables, the base class table's primary key can also be used as the primary key for the subclass tables, for instance:
CREATE TABLE Animal
(
AnimalId INT NOT NULL AUTO_INCREMENT PRIMARY KEY
-- Common Animal fields here
);
CREATE TABLE Shark
(
AnimalId INT NOT NULL PRIMARY KEY,
-- Subclass specific columns
NumberFins INT,
FOREIGN KEY(AnimalId) REFERENCES Animal(AnimalId)
);
CREATE TABLE Ewok
(
AnimalId INT NOT NULL PRIMARY KEY,
-- Subclass specific columns
Fleas BOOL,
FOREIGN KEY(AnimalId) REFERENCES Animal(AnimalId)
);
More on TPT and other OO modelling in tables here
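A minimal sketch of the shared-key idea, using Python's sqlite3 as a stand-in for MySQL (an assumption: `AUTO_INCREMENT` becomes SQLite's `AUTOINCREMENT`, and foreign keys must be switched on with a pragma). The `Name` column, the sample row, and the `rejected` helper are hypothetical, and the Ewok table is left out. Because `Shark.AnimalId` is both the primary key and a foreign key, a subclass row must point at an existing `Animal` row, and each animal can have at most one `Shark` row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE Animal (
        AnimalId INTEGER PRIMARY KEY AUTOINCREMENT,
        Name VARCHAR(50)  -- hypothetical common column
    );
    CREATE TABLE Shark (
        AnimalId INT NOT NULL PRIMARY KEY,  -- both the PK and the FK
        NumberFins INT,
        FOREIGN KEY (AnimalId) REFERENCES Animal(AnimalId)
    );
""")

# Create the base-class row first, then reuse its key for the subclass row.
shark_id = conn.execute("INSERT INTO Animal (Name) VALUES ('Bruce')").lastrowid
conn.execute("INSERT INTO Shark (AnimalId, NumberFins) VALUES (?, 7)", (shark_id,))

def rejected(sql: str, params: tuple) -> bool:
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

# A second Shark row for the same animal breaks the 0/1-to-1 rule (primary key).
dup = rejected("INSERT INTO Shark (AnimalId, NumberFins) VALUES (?, 5)", (shark_id,))
# A Shark with no base Animal row breaks the foreign key.
orphan = rejected("INSERT INTO Shark (AnimalId, NumberFins) VALUES (?, 5)", (999,))
print(dup, orphan)  # True True
```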
You have 2 ways:
1) Put "ArticleId + LanguageId" as the only primary key in your intermediate table (you can name it "idArticleLanguage"). This is called a "composite" primary key because it is composed of 2 fields (in other cases, more than 2), in this case 2 foreign keys (PK = FK + FK).
2) Create "idArticleLanguage" with no relation to the other two ids and set it as the primary key. It can be a simple auto-increment integer.
Both alternatives are acceptable. Your choice will depend on the goal you want to achieve. For example, what happens if you need to add the same Article with the same language to this intermediate table (Willkommen in German, for example) because you have 2 different editions of the article? If you choose alternative 1, it will throw an error because 2 rows would have the same composite primary key, so you must choose alternative 2 and create a completely different primary key for this table.
In any other case (or purpose) you can choose alternative 1 and it will work perfectly.
About the change of your question title:
When use foreign key as primary key in the same time?
I will explain it with this example:
You have 2 tables: "country" and "city". "country" has all the countries of the world, and "city" has all the cities of the world. But you need to know every capital in the world. What should you do?
You must create an "intermediate table" (named "capital") that will hold every capital in the world. We know that country has its primary key "idcountry" and city has its primary key "idcity"; you need to bring both into the "capital" table as foreign keys, because you will need data from the "city" and "country" tables to fill the "capital" table.
Then "capital" will have its own primary key "idcapital", which can be either a composite one ("idcity + idcountry") or an auto-increment integer. In both cases you must have "idcity" and "idcountry" as foreign keys on your "capital" table.

MySQL - autoincrement + compound primary key - performance & integrity

I have a database design that makes use of compound primary keys to ensure uniqueness and which are also foreign keys.
These tables are then linked to other tables in the same way, so that in the end the compound key can get up to 4 or 5 columns. This led to some rather large JOINs, so I thought a simple solution would be to use an autoincrement column which is not part of the primary key but which is used as part of the primary key of other table(s).
Here is some pseudo code showing the general layout :
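-- Column types (INT) are assumed; the original layout only sketched them.
CREATE TABLE Item (
id INT NOT NULL AUTO_INCREMENT,
-- ...
PRIMARY KEY (id)
) ENGINE = InnoDB;
CREATE TABLE PriceCategory (
id INT NOT NULL AUTO_INCREMENT,
-- ...
PRIMARY KEY (id)
) ENGINE = InnoDB;
CREATE TABLE ItemPriceCategory (
itemId INT NOT NULL,
priceCategoryId INT NOT NULL,
id INT NOT NULL AUTO_INCREMENT,
-- ...
UNIQUE INDEX (id),
PRIMARY KEY (itemId, priceCategoryId)
) ENGINE = InnoDB;
CREATE TABLE ClientType (
id INT NOT NULL AUTO_INCREMENT,
-- ...
PRIMARY KEY (id)
) ENGINE = InnoDB;
CREATE TABLE Price (
itemPriceCategoryId INT NOT NULL,
clientTypeId INT NOT NULL,
id INT NOT NULL AUTO_INCREMENT,
-- ...
UNIQUE INDEX (id),
PRIMARY KEY (itemPriceCategoryId, clientTypeId)
) ENGINE = InnoDB;
CREATE TABLE Purchase (
priceId INT NOT NULL,
userId INT NOT NULL,
amount INT NOT NULL,
PRIMARY KEY (priceId, userId)
) ENGINE = InnoDB;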
The names of tables have been changed to protect the innocent ;-) Also the actual layout is a little deeper in terms of references.
So, my question is: is this a viable strategy from a performance and data integrity point of view? Is it better to have all keys from all the referenced tables in the Purchase table?
Thanks in advance.
Generally, the advice on primary keys is to have "meaningless", immutable primary keys with a single column. Auto incrementing integers are nice.
So, I would reverse your design - your join tables should also have meaningless primary keys. For instance:
CREATE TABLE ItemPriceCategory (
itemId INT NOT NULL,
priceCategoryId INT NOT NULL,
id INT NOT NULL AUTO_INCREMENT,
-- ...
PRIMARY KEY (id),
UNIQUE INDEX (itemId, priceCategoryId)
) ENGINE = InnoDB;
That way, the itemPriceCategoryId column in price is a proper foreign key, linking to the primary key of the ItemPriceCategory table.
You can then use foreign keys (see http://dev.mysql.com/doc/refman/5.5/en/innodb-foreign-key-constraints.html) to ensure the consistency of your database.
In terms of performance, broadly speaking, this strategy should be faster than querying compound keys in a join, but with a well-indexed database, you may not actually notice the difference...
I think that something has been lost in translation over here, but I did my best to make an ER diagram of this.
In general, there are two approaches. The first one is to propagate keys and the second one is to have an auto-increment integer as a PK for each table.
The second approach is often driven by ORM tools which use a DB as object-persistence storage, while the first one (using key propagation) is more common for hand-crafted DB design.
In general, the model with key propagation offers better performance for "random queries", mostly because you can "skip tables" in joins. For example, in the model with key propagation you can join the Purchase table directly to the Item table to report purchases by ItemName. In the other model you would have to join Price and ItemPriceCategory tables too -- just to get to the ItemID.
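The "skip tables" point can be sketched with two trimmed-down versions of the schema, using Python's sqlite3 as a stand-in. Column names, sample data, and the `run` helper are hypothetical, and foreign key constraints are left out for brevity. Both queries report purchases by item name, but only the propagated model lets `Purchase` join straight to `Item`.

```python
import sqlite3

# Hypothetical model with key propagation: Purchase carries itemId itself.
PROPAGATED = """
CREATE TABLE Item (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ItemPriceCategory (itemId INT, priceCategoryId INT,
    PRIMARY KEY (itemId, priceCategoryId));
CREATE TABLE Price (itemId INT, priceCategoryId INT, clientTypeId INT,
    PRIMARY KEY (itemId, priceCategoryId, clientTypeId));
CREATE TABLE Purchase (itemId INT, priceCategoryId INT, clientTypeId INT,
    userId INT, amount INT,
    PRIMARY KEY (itemId, priceCategoryId, clientTypeId, userId));
INSERT INTO Item VALUES (1, 'Ticket'), (2, 'Poster');
INSERT INTO ItemPriceCategory VALUES (1, 1), (2, 1);
INSERT INTO Price VALUES (1, 1, 1), (2, 1, 1);
INSERT INTO Purchase VALUES (1, 1, 1, 100, 2), (1, 1, 1, 101, 3), (2, 1, 1, 100, 1);
"""
# One join: Purchase -> Item.
PROPAGATED_QUERY = """
SELECT i.name, SUM(pu.amount) FROM Purchase pu
JOIN Item i ON i.id = pu.itemId
GROUP BY i.name ORDER BY i.name"""

# Hypothetical model with an independent key for each table.
SURROGATE = """
CREATE TABLE Item (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ItemPriceCategory (id INTEGER PRIMARY KEY, itemId INT,
    priceCategoryId INT, UNIQUE (itemId, priceCategoryId));
CREATE TABLE Price (id INTEGER PRIMARY KEY, itemPriceCategoryId INT,
    clientTypeId INT, UNIQUE (itemPriceCategoryId, clientTypeId));
CREATE TABLE Purchase (priceId INT, userId INT, amount INT,
    PRIMARY KEY (priceId, userId));
INSERT INTO Item VALUES (1, 'Ticket'), (2, 'Poster');
INSERT INTO ItemPriceCategory VALUES (1, 1, 1), (2, 2, 1);
INSERT INTO Price VALUES (1, 1, 1), (2, 2, 1);
INSERT INTO Purchase VALUES (1, 100, 2), (1, 101, 3), (2, 100, 1);
"""
# Three joins: only ItemPriceCategory knows the item, so the query must walk
# Purchase -> Price -> ItemPriceCategory -> Item.
SURROGATE_QUERY = """
SELECT i.name, SUM(pu.amount) FROM Purchase pu
JOIN Price pr ON pr.id = pu.priceId
JOIN ItemPriceCategory ipc ON ipc.id = pr.itemPriceCategoryId
JOIN Item i ON i.id = ipc.itemId
GROUP BY i.name ORDER BY i.name"""

def run(ddl: str, query: str) -> list:
    """Build a schema in a fresh in-memory database and run one report query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

print(run(PROPAGATED, PROPAGATED_QUERY))  # [('Poster', 1), ('Ticket', 5)]
print(run(SURROGATE, SURROGATE_QUERY))    # [('Poster', 1), ('Ticket', 5)]
```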
Basically, the model with key propagation is essentially relational -- while the other one is object-driven. ORM tools either prefer or enforce the model with separate ID (second case), but offer other advantages for development.
Your example seems to be trying to use some kind of combination of these two. That's not necessarily bad, but it would help if you could talk to the original designer.
With key propagation
Independent keys for each table

Does a Join table (association table) have a primary key ? many to many relationship

Does a join table (association table) in a many-to-many relationship have a primary key? I've seen some join tables with a primary key and some without. Can someone please explain when you would have a primary key in a join table, and why?
Thank you in advance;-)
In a pure 'join' or junction table all the fields will be part of the primary key. For example let's consider the following tables:
CREATE TABLE USERS
(ID_USER NUMBER PRIMARY KEY,
FIRST_NAME VARCHAR2(32),
LAST_NAME VARCHAR2(32));
CREATE TABLE ATTRIBUTES
(ID_ATTRIBUTE NUMBER PRIMARY KEY,
ATTRIBUTE_NAME VARCHAR2(64));
A junction table between these, allowing many users to have many attributes, would be:
CREATE TABLE USER_ATTRIBUTES
(ID_USER NUMBER REFERENCES USERS(ID_USER),
ID_ATTRIBUTE NUMBER REFERENCES ATTRIBUTES(ID_ATTRIBUTE),
PRIMARY KEY(ID_USER, ID_ATTRIBUTE));
Sometimes you'll find the need to add a non-primary column to a junction table but I find this is relatively rare.
Share and enjoy.
All tables should have a primary key. :-)
You can either use a compound foreign key, or a blind integer key.
You would use the compound foreign key when there are no other elements in your association table.
You could use the blind integer key when the association table has elements of its own. The compound foreign key would be defined with two additional indexes.
It depends on the records you are associating. You can create a composite primary key on the id's of the associated records as long as you don't need multiple records per association.
However, it's far more important that you make sure both these columns are indexed and have referential integrity defined.
Relationship tables frequently have three candidate keys, one of which need not be enforced with a constraint, and the choice of which key (if any) should be 'primary' is arbitrary.
Consider this example from Joe Celko:
CREATE TABLE Couples
(boy_name INTEGER NOT NULL UNIQUE -- nested key
REFERENCES Boys (boy_name),
girl_name INTEGER NOT NULL UNIQUE -- nested key,
REFERENCES Girls(girl_name),
PRIMARY KEY(boy_name, girl_name)); -- compound key
The "Couples" table lets you insert these rows from the original set:
('Joe Celko', 'Brooke Shields')
('Alec Baldwin', 'Kim Bassinger')
Think about this table for a minute. The PRIMARY KEY is now redundant. If each boy appears only once in the table and each girl appears only once in the table, then each (boy_name, girl_name) pair can appear only once. From a theoretical viewpoint, I could drop the compound key and make either boy_name or girl_name the new primary key, or I could just leave them as candidate keys.
SQL products and theory do not always match. Many products make the assumption that the PRIMARY KEY is in some way special in the data model and will be the way to access the table most of the time.
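Celko's argument can be checked with a short sketch in Python's sqlite3 (assumptions: SQLite as a stand-in, the `Boys`/`Girls` references omitted, and `TEXT` columns because the sample values are names rather than integers). With no compound key at all, the two UNIQUE constraints already reject a repeated pair. The `accepted` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No PRIMARY KEY at all: only the two nested UNIQUE keys.
conn.execute("""
    CREATE TABLE Couples (
        boy_name TEXT NOT NULL UNIQUE,
        girl_name TEXT NOT NULL UNIQUE
    )""")
conn.execute("INSERT INTO Couples VALUES ('Joe Celko', 'Brooke Shields')")

def accepted(boy: str, girl: str) -> bool:
    """Return True if the (boy, girl) row is accepted."""
    try:
        conn.execute("INSERT INTO Couples VALUES (?, ?)", (boy, girl))
        return True
    except sqlite3.IntegrityError:
        return False

# The exact pair again is rejected even though no compound key exists.
same_pair = accepted("Joe Celko", "Brooke Shields")
new_pair = accepted("Alec Baldwin", "Kim Bassinger")
print(same_pair, new_pair)  # False True
```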
...HOWEVER I suspect your question is implying something more like, "Assuming I'm the kind of person who shuns natural keys in favour of artificial identifiers, should I add an artificial identifier to a table that is composed entirely of artificial identifiers referenced from other tables?"

Should I use indexes for a many-to-many database table?

Does it make sense to create indexes for a table called user_movies with the following columns:
user_id
movie_id
There will be much more reading than inserting or updating on this table but I'm not sure what to do. Also: Is it adequate to omit a primary key in this situation?
The correct definition for this table is as follows:
CREATE TABLE user_movies (
user_id INT NOT NULL,
movie_id INT NOT NULL,
PRIMARY KEY (user_id, movie_id),
FOREIGN KEY (user_id) REFERENCES users(user_id),
FOREIGN KEY (movie_id) REFERENCES movies(movie_id)
) ENGINE=InnoDb;
Notice "primary key" is a constraint, not a column. It's best practice to have a primary key constraint in every table. Do not confuse primary key constraint with an auto-generated pseudokey column.
In MySQL, declaring a foreign key or a primary key implicitly creates an index. Yes, these are beneficial.
I would index both columns separately and yes you can eliminate the primary key.
I have always heard that you should create a unique index on BOTH columns, first one way (user_id + movie_id) then the other way (movie_id + user_id). It DOES work slightly faster (not much, about 10-20%) in my application with some quick and dirty testing.
It also makes sure you can't have two rows that tie the same movie_id to the same user_id (which could be good, but perhaps not always).
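A minimal sketch of why the second, reversed index helps, using SQLite through Python's sqlite3 as a stand-in for MySQL (its plan text differs from MySQL's EXPLAIN). The index name `movie_user` and the `movie_lookup_plan` helper are hypothetical. With only `PRIMARY KEY (user_id, movie_id)`, a lookup by `movie_id` alone cannot search that index and falls back to a scan; the reversed index allows a direct search.

```python
import sqlite3

def movie_lookup_plan(with_reverse_index: bool) -> str:
    """Return the query plan for 'which users have movie 42?'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE user_movies (
            user_id INT NOT NULL,
            movie_id INT NOT NULL,
            PRIMARY KEY (user_id, movie_id)
        )""")
    if with_reverse_index:
        # The "other way round" index: movie_id first, then user_id.
        conn.execute(
            "CREATE UNIQUE INDEX movie_user ON user_movies (movie_id, user_id)"
        )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT user_id FROM user_movies WHERE movie_id = 42"
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable detail text.
    return " | ".join(str(row[-1]) for row in rows)

print(movie_lookup_plan(with_reverse_index=False))  # a SCAN of some kind
print(movie_lookup_plan(with_reverse_index=True))   # a SEARCH on movie_user
```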
If you are using such a "join-table", you'll probably use some joins in your queries -- and those will probably benefit from an index on each one of those two columns (which means two separate indexes).