Foreign Keys in Database Design

Not so much a problem but a question about best practice and what will work for me in the future.
I have a number of tables which contain data that are linked to accounts in my schema - services, locations, providers, etc.
I have two choices. I can add a foreign key to accounts to all of my tables, which will reduce the number of joins needed but will potentially add to the data stored and (maybe?) lead to inconsistencies. Or I can leave those columns out and reach the account through joins.
So, my question is, should I add an accounts FK to services, locations, etc. or rely on joins to manage that for me?

Without knowing the structure of your database it's hard to give a correct answer. But let's take a single table, providers, for example. If a provider can only have one account, then I would add a FK to the providers table. If a provider can belong to several accounts, a single FK column on providers won't work, because one column can only point at one account.
Foreign keys relate rows together so that there is no inconsistency. So if you had an employees table and a departments table, employees would have a FK to departments because an employee can only be in one department.
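The employees/departments case could look roughly like this in MySQL. This is only a sketch: the column names and types are my own assumptions, and the literal id 1 assumes a freshly created departments table.

```sql
-- Hypothetical schema for the employees/departments example (InnoDB assumed).
CREATE TABLE departments (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE employees (
    id            INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name          VARCHAR(100) NOT NULL,
    department_id INT UNSIGNED NOT NULL,  -- each employee is in exactly one department
    PRIMARY KEY (id),
    CONSTRAINT fk_employees_department
        FOREIGN KEY (department_id) REFERENCES departments (id)
) ENGINE=InnoDB;

INSERT INTO departments (name) VALUES ('Engineering');             -- gets id 1 on a fresh table
INSERT INTO employees (name, department_id) VALUES ('Alice', 1);   -- accepted

-- Expected to be rejected with a foreign key constraint error,
-- because no department has id 99:
INSERT INTO employees (name, department_id) VALUES ('Bob', 99);
```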

You seem to have some trouble understanding FKs and what they give you.
Your FK should reference the PK (primary key) of another table; this ensures data integrity between the tables.
However, if the columns of the FK are not indexed, you have an unindexed FK, and this can lead to full table scans on your joins. MySQL's InnoDB creates an index on the FK columns automatically if none exists; many other systems (Oracle, PostgreSQL, SQL Server) do not, so there you have to add the index yourself.
PKs and FKs are constraints, and the constraints themselves add next to nothing to storage. The indexes behind them do add to storage, but the performance benefits of indexes usually outweigh the overhead.
Using PKs and indexed FKs is part of normalized data design, and you should not have concerns about using them.
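As a rough illustration of an indexed FK for the services table from the question: the column names, types, and the account id 42 are assumptions of mine. Declaring the index explicitly is harmless in InnoDB (it uses the existing index instead of creating its own) and makes the intent obvious.

```sql
-- Hypothetical accounts/services tables; only the FK and its index matter here.
CREATE TABLE accounts (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE services (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    account_id INT UNSIGNED NOT NULL,
    name       VARCHAR(100) NOT NULL,
    INDEX idx_services_account (account_id),   -- index backing the FK
    CONSTRAINT fk_services_account
        FOREIGN KEY (account_id) REFERENCES accounts (id)
) ENGINE=InnoDB;

-- Inspect the plan to check whether the join uses idx_services_account.
EXPLAIN
SELECT s.name
FROM accounts AS a
JOIN services AS s ON s.account_id = a.id
WHERE a.id = 42;
```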

Related

Is merging two many-to-many relations into one relation table better than separate relation tables?

I have three tables, two of which have a many-to-many relation with the third. I want to know whether merging these relations into one relation table is bad practice.
I think this design makes my relations and operations easier to manage and easier to understand, but I am worried about performance.
Which approach performs better: a single shared middle table, or a separate middle table for each relation?

Complex junctions aren't bad in and of themselves. A row in a prescriptions table could well have doctor_id, patient_id, and pharmacy_id values. This works because a single prescription represents the confluence of doctor, patient, and pharmacy (plus some other information which could be factored out into still more related tables). The important part is that the prescription can only be expressed in terms of several foreign keys.
What you have is different. You're describing a case where a row in tbl_1 can be linked to tbl_2 or tbl_3 depending on the value of rel_type. Multiple connections are possible with multiple rel_types. This is bad, because you can't create an enforceable foreign key constraint on tbl_x_id. It may look simpler at first glance because there's only one junction table, but it's hiding a lot of complexity in ways that will hurt later. It'll perform worse, but the real issue is the lack of referential integrity.
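A minimal sketch of the alternative, one junction table per relation, so that every id column carries a real foreign key. The table names follow the tbl_1/tbl_2/tbl_3 naming above; the junction table names and column types are assumptions.

```sql
-- The three entity tables, reduced to their keys for the sketch.
CREATE TABLE tbl_1 (id INT UNSIGNED NOT NULL PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE tbl_2 (id INT UNSIGNED NOT NULL PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE tbl_3 (id INT UNSIGNED NOT NULL PRIMARY KEY) ENGINE=InnoDB;

-- One junction table per many-to-many relation; both columns are enforceable FKs.
CREATE TABLE tbl_1_tbl_2 (
    tbl_1_id INT UNSIGNED NOT NULL,
    tbl_2_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (tbl_1_id, tbl_2_id),
    FOREIGN KEY (tbl_1_id) REFERENCES tbl_1 (id),
    FOREIGN KEY (tbl_2_id) REFERENCES tbl_2 (id)
) ENGINE=InnoDB;

CREATE TABLE tbl_1_tbl_3 (
    tbl_1_id INT UNSIGNED NOT NULL,
    tbl_3_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (tbl_1_id, tbl_3_id),
    FOREIGN KEY (tbl_1_id) REFERENCES tbl_1 (id),
    FOREIGN KEY (tbl_3_id) REFERENCES tbl_3 (id)
) ENGINE=InnoDB;
```

With the shared table, the single tbl_x_id column would have to point at tbl_2 for some rows and tbl_3 for others, which no FOREIGN KEY clause can express.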

Design considerations to using Foreign Key as Primary Key

Are there any general design considerations (good/bad/neutral) for using a foreign key of one table as the primary key in another table?
For example, assume the following tables as part of a film catalogue:
titles
------
id
episodes
--------
title_id (PK/FK)
Episodes could obviously be done with both an id and a title_id, where id would be the PK and title_id would be UNIQUE, but since title_id is already unique, and, technically, identifies the episode, would there be anything to consider in just using it as the PK? What about in general? What design considerations can you see to this?
Thanks for your thoughts!

The answer to your question is basically the description of the technique known as "shared primary key". Accordingly, I've replaced the two tags about primary-key and foreign-key with the single tag shared-primary-key.
Shared primary key is a design where the PK of one table is also an FK that references the PK of another table. As the tag wiki for shared-primary-key indicates, this is useful for one-to-one relationships, whether they are mandatory or optional. These relationships are sometimes called IS-A relationships, as in "an automobile is a vehicle". The relationship between vehicles and autos is also known as a class/subclass or type/subtype relationship.
Like any design technique, it has its benefits and its costs.
Edit in reply to comment:
The biggest benefit to shared primary key is that it enforces the 1-to-1 nature of the relationship. Having this rule enforced in the database is generally more productive than trying to make sure that all the application code follows the rule.
A secondary benefit is that it makes the join between the two tables simple and fast. It's fast (for some database systems) because the indexes built to support PKs are used by the optimizer to speed up the join.
A third benefit is that a third table can reference both of these two tables with the same FK.
The cost is that there is some programming involved in adding a new entry to both tables. The PK from the primary table has to be copied into the secondary table, and the system typically won't do this for you. Also, the join, while fast, is not free.
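For the titles/episodes example from the question, a shared primary key might be sketched like this in MySQL. The name and episode_number columns are assumptions added only to make the rows non-trivial.

```sql
-- titles is the primary table; episodes shares its key.
CREATE TABLE titles (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(200) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE episodes (
    title_id       INT UNSIGNED NOT NULL PRIMARY KEY,  -- PK and FK at the same time
    episode_number SMALLINT UNSIGNED NOT NULL,
    FOREIGN KEY (title_id) REFERENCES titles (id)
) ENGINE=InnoDB;

-- The "copy the PK into the secondary table" step, done by hand in one transaction.
START TRANSACTION;
INSERT INTO titles (name) VALUES ('Pilot');
INSERT INTO episodes (title_id, episode_number)
VALUES (LAST_INSERT_ID(), 1);   -- id generated by the insert into titles
COMMIT;
```

Because title_id is the primary key of episodes, a second episodes row for the same title is rejected, which is what enforces the 1-to-1 rule.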

Why are composite primary keys still around?

I'm assigned to migrate a database to a mid-class ERP.
The new system uses composite primary keys here and there, and from a pragmatic point of view, why?
Compared to autogenerated IDs, I can only see negative aspects:
Foreign keys become blurry
Harder migration or db-redesigns
Inflexible as the business changes. (My car has no reg.plate..)
Same integrity better achieved with constraints.
It's falling back to the design concept of candidate keys, which I don't see the point of either.
Is it a habit/artifact from the floppy-days (minimizing space/indexes), or am I missing something?
//edit//
Just found good SO-post: Composite primary keys versus unique object ID field
//

Composite keys are required when your primary keys are non-surrogate and inherently, um, composite, that is, breakable into several non-related parts.
Some real-world examples:
Many-to-many link tables, in which the primary keys are composed of the keys of the entities related.
Multi-tenant applications when tenant_id is a part of primary key of each entity and the entities are only linkable within the same tenant (constrained by a foreign key).
Applications processing third-party data (with already provided primary keys)
Note that logically, all this can be achieved using a UNIQUE constraint (additional to a surrogate PRIMARY KEY).
However, there are some implementation specific things:
Some systems won't let a FOREIGN KEY refer to anything that is not a PRIMARY KEY.
Some systems would only cluster a table on a PRIMARY KEY, hence making the composite the PRIMARY KEY would improve performance of the queries joining on the composite.
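To illustrate the multi-tenant case, here is a rough sketch in which tenant_id is part of every primary key and the composite foreign key keeps links inside one tenant. The customers/invoices tables and their columns are hypothetical.

```sql
CREATE TABLE customers (
    tenant_id   INT UNSIGNED NOT NULL,
    customer_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (tenant_id, customer_id)
) ENGINE=InnoDB;

CREATE TABLE invoices (
    tenant_id   INT UNSIGNED NOT NULL,
    invoice_id  INT UNSIGNED NOT NULL,
    customer_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (tenant_id, invoice_id),
    -- The invoice's own tenant_id is part of the FK, so an invoice
    -- cannot reference a customer that belongs to another tenant.
    FOREIGN KEY (tenant_id, customer_id)
        REFERENCES customers (tenant_id, customer_id)
) ENGINE=InnoDB;
```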

Personally I prefer the use of surrogate keys. However, in joining tables that consist only of the ids from two other tables (to create a many-to-many relationship) composite keys are the way to go, and thus taking them out would make things more difficult.
There is a school of thought that surrogate keys are always bad and that if you can't establish the uniqueness of a record through natural keys you have a bad design. I strongly disagree with this (if you aren't storing SSN or some other unique value I defy you to come up with a natural key for a person table, for instance). But many people feel that it is necessary for proper normalization.
Sometimes having a composite key reduces the need to join to another table. Sometimes it doesn't. So there are times when a composite key can boost performance as well as times when it can harm performance. If the key is relatively stable, you may be fine with faster performance on select queries. However, if it is something that is subject to change, like a company name, you could be in a world of hurt when company A changes its name and you have to update a million associated records.
There is no one size fits all in database design. There are times when composite keys are helpful and times when they are horrible. There are times when surrogate keys are helpful and times when they are not.

Composite primary keys provide better performance when they are used as foreign keys in other tables, because they reduce table reads. Sometimes they can be life savers. If you use surrogate keys, you have to go to the referenced table to get the natural key information.
For example (pure example, so we are not talking DB design here), let's say you have an ORDER table and an ORDER_ITEM table. If you use ProductId and LineNumber (UPDATE: and, as Pedro mentioned, OrderId or even better OrderNumber) as the composite primary key in ORDER_ITEM, then in your cross table for SHIPPING you would be able to have ProductId in SHIPPING_ORDERITEM. This can massively boost your performance if, for example, you have run out of a product and need to find all items with that ProductId that need to be shipped, without needing a join.
On the other hand, if you use a surrogate key, you have to join and you end up with a very inefficient SQL execution plan where it has to do bookmark lookup on several indexes.
See more on bookmark lookups, which can become a major issue when using surrogate keys.
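A rough sketch of what this looks like, keeping the "pure example" spirit. The shipping_id and quantity columns, the index name, and the product id 1234 are assumptions; the point is only that product_id travels with the composite key.

```sql
CREATE TABLE order_item (
    order_id    INT UNSIGNED NOT NULL,
    line_number SMALLINT UNSIGNED NOT NULL,
    product_id  INT UNSIGNED NOT NULL,
    quantity    INT UNSIGNED NOT NULL,
    PRIMARY KEY (order_id, line_number, product_id)
) ENGINE=InnoDB;

CREATE TABLE shipping_orderitem (
    shipping_id INT UNSIGNED NOT NULL,
    order_id    INT UNSIGNED NOT NULL,
    line_number SMALLINT UNSIGNED NOT NULL,
    product_id  INT UNSIGNED NOT NULL,
    PRIMARY KEY (shipping_id, order_id, line_number, product_id),
    INDEX idx_shipping_product (product_id),
    FOREIGN KEY (order_id, line_number, product_id)
        REFERENCES order_item (order_id, line_number, product_id)
) ENGINE=InnoDB;

-- All shipping rows for one product, answered from the cross table alone.
SELECT shipping_id, order_id, line_number
FROM shipping_orderitem
WHERE product_id = 1234;
```

With a surrogate order_item id in the cross table instead, the same question would need a join back to order_item to find the product.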

Natural primary keys are brittle.
Suppose we have built a system around a natural PK on (CountryCode, PhoneNumber), and several years down the road we need to add Extension, or change the PK to one column: Email. If these PK columns are propagated to all child tables, this becomes very expensive.
A few years ago there were some systems that were built assuming that Social Security Number is a natural PK, and had to be redesigned to use identities, when the SSN became non-unique and nullable.
Because we cannot predict the future, we don't know if later on some change will render obsolete what used to be a perfectly correct and complete model.

The very simple answer is data integrity. If the data is to be useful and accurate then the keys are presumably required. Having an "autogenerated id" doesn't remove the requirement for other keys as well. The alternative is not to enforce uniqueness and accept that data will be duplicated and almost inevitably contain anomalies and lead to errors as a result. Why would you want that?

In short, the purpose of composite keys is to use the database to enforce one or more business rules. In other words: protect the integrity of your data.
Ex. You have a list of parts that you buy from suppliers. You could create your supplier and parts tables like such:
SUPPLIER
--------
SupplierId
SupplierName
PART
----
PartId
PartName
SupplierId
Uh oh. The parts table allows for duplicate data. Since you used a surrogate key that was autogenerated, you're not enforcing the fact that a part from a supplier should only be entered once. Instead, you should create the PART table like such:
PART
----
SupplierId
SupplierPartId
PartName
In this example, your parts come from specific suppliers and you want to enforce the rule "A single supplier can only supply a single part once" in the PART table. Hence, the composite key. Your composite key prevents accidental duplicate entry of a part.
You can always leave business rules out of your database and leave them to your application, but by keeping the rule in the database (via a composite key), you ensure that the business rule is enforced everywhere, especially if you should ever decide to allow multiple applications to access the data.
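In MySQL the second PART layout could be sketched as follows. The column types and sample values are assumptions, and the literal SupplierId 1 assumes a freshly created SUPPLIER table.

```sql
CREATE TABLE SUPPLIER (
    SupplierId   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    SupplierName VARCHAR(100) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE PART (
    SupplierId     INT UNSIGNED NOT NULL,
    SupplierPartId VARCHAR(50)  NOT NULL,
    PartName       VARCHAR(100) NOT NULL,
    PRIMARY KEY (SupplierId, SupplierPartId),   -- one row per supplier/part pair
    FOREIGN KEY (SupplierId) REFERENCES SUPPLIER (SupplierId)
) ENGINE=InnoDB;

INSERT INTO SUPPLIER (SupplierName) VALUES ('Acme');   -- gets SupplierId 1 on a fresh table
INSERT INTO PART (SupplierId, SupplierPartId, PartName)
VALUES (1, 'A-100', 'Widget');                          -- accepted

-- Expected to be rejected with a duplicate key error:
-- the same supplier cannot list the same part twice.
INSERT INTO PART (SupplierId, SupplierPartId, PartName)
VALUES (1, 'A-100', 'Widget');
```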

Just as functions encapsulate a set of instructions, or database views abstract base table connections, so too do surrogate keys abstract the meaning of the entity they are placed on.
If, for example, you have a table that holds vehicle data, applying a surrogate VehicleId abstracts what it means to be a vehicle from a data point of view. When you reference VehicleId = 1, you are most surely talking about a vehicle of some sort, but do we know if it is a 2008 Chevy Impala, or a 1991 Ford F-150? No. Can the underlying data of whatever Vehicle #1 is change at any time? Yes.

Short answer: Multi-column foreign keys naturally refer to multi column primary keys. There can still be an autogenerated id column that is part of the primary key.
Philosophical answer: The primary key is the identity of the row. If there is a bit of information that is an intrinsic part of the identity of the row (such as which customer an article belongs to in a multi-customer wiki), that information should be part of the primary key.
An example: System for organizing LAN parties
The system supports several LAN parties with the same people and organizers attending thus:
CREATE TABLE users ( users_id serial PRIMARY KEY, ... );
And there are several parties:
CREATE TABLE parties ( parties_id serial PRIMARY KEY, ... );
But most of the other stuff needs to carry the information about which party it is linked to:
CREATE TABLE ticket_types (
    ticket_types_id serial,
    parties_id integer REFERENCES parties,
    name text,
    ....
    PRIMARY KEY(ticket_types_id, parties_id)
);
...this is because we want to refer to primary keys. Foreign key on table attendances points to table ticket_types.
CREATE TABLE attendances (
    attendances_id serial,
    parties_id integer REFERENCES parties,
    ticket_types_id integer,
    PRIMARY KEY (attendances_id, parties_id),
    FOREIGN KEY (ticket_types_id, parties_id) REFERENCES ticket_types
);

While I prefer surrogate keys, I use composite keys in a few cases. The composite key may consist entirely or partially of surrogate key fields.
Many-to-many join tables. These usually require a unique key on the key pair anyway. In some cases additional columns may be included in the key.
Weak child tables. Things like order lines do not stand on their own. In this case I use the parent (orders) table's primary key as part of the composite key.
When there are multiple weak tables related to an entity, it may be possible to eliminate a table from the join set when querying child data. In the case of grandchild tables, it is possible to join the grandparent to grandchild without involving the table in the middle.
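A sketch of the grandparent-to-grandchild case, using hypothetical orders, order_lines, and line_shipments tables (all names and columns are assumptions). Because the grandchild carries order_id as part of its key, the middle table can be left out of the join.

```sql
CREATE TABLE orders (
    order_id      INT UNSIGNED NOT NULL PRIMARY KEY,
    customer_name VARCHAR(100) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE order_lines (
    order_id INT UNSIGNED NOT NULL,
    line_no  SMALLINT UNSIGNED NOT NULL,
    PRIMARY KEY (order_id, line_no),            -- parent key + line number
    FOREIGN KEY (order_id) REFERENCES orders (order_id)
) ENGINE=InnoDB;

CREATE TABLE line_shipments (
    order_id    INT UNSIGNED NOT NULL,
    line_no     SMALLINT UNSIGNED NOT NULL,
    shipment_no SMALLINT UNSIGNED NOT NULL,
    shipped_on  DATE NOT NULL,
    PRIMARY KEY (order_id, line_no, shipment_no),
    FOREIGN KEY (order_id, line_no) REFERENCES order_lines (order_id, line_no)
) ENGINE=InnoDB;

-- Grandparent joined straight to grandchild; order_lines is not needed.
SELECT o.customer_name, s.line_no, s.shipped_on
FROM orders AS o
JOIN line_shipments AS s ON s.order_id = o.order_id;
```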

Alternate to storing Large number of tables -- MySQL

Well, I have been working with a large amount of network data, in which I have to filter out some IP addresses and store their communication with other IPs. But the number of IPs is huge, hundreds of thousands, and I would have to create that many tables. Ultimately my MySQL access will slow down, and everything will slow down. Each table will have few columns and many rows.
My Questions:
Is there a better way to deal with this, I mean storing data of each IP?
Is there something like table of tables?
[Edit]
The reason I am storing the data in different tables is that I have to keep removing and adding entries as time passes.
Here is the table structure
CREATE TABLE IP(syn_time datetime, source_ip varchar(18), dest_ip varchar(18));
I use C++ with the ODBC connector to access the database.

Don't DROP/CREATE tables frequently. MySQL is very buggy with doing that, and understandably so--it should only be done once when the database is created on a new machine. It will hurt things like your buffer pool hit ratio, and disk IO will spike out.
Instead, use InnoDB or XtraDB, which means you can delete old rows whilst inserting new ones.
Store the IP in a column of type int(10) unsigned, e.g. 192.168.10.50 would be stored as (192 * 2^24) + (168 * 2^16) + (10 * 2^8) + 50 = 3232238130.
Put all the information into 1 table, and just use a SELECT ... WHERE on an indexed column.
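Put together, the single-table approach might look something like this. The table name, index names, the 10.0.0.1 destination, and the one-day retention window are assumptions; INET_ATON() and INET_NTOA() are MySQL's built-in functions for the integer conversion described above.

```sql
CREATE TABLE ip_traffic (
    syn_time  DATETIME     NOT NULL,
    source_ip INT UNSIGNED NOT NULL,
    dest_ip   INT UNSIGNED NOT NULL,
    INDEX idx_source_time (source_ip, syn_time),  -- lookups per source IP
    INDEX idx_syn_time (syn_time)                 -- purging old rows
) ENGINE=InnoDB;

-- INET_ATON('192.168.10.50') yields 3232238130, matching the arithmetic above.
INSERT INTO ip_traffic (syn_time, source_ip, dest_ip)
VALUES (NOW(), INET_ATON('192.168.10.50'), INET_ATON('10.0.0.1'));

-- Everything one source IP talked to, instead of one table per IP.
SELECT syn_time, INET_NTOA(dest_ip) AS dest_ip
FROM ip_traffic
WHERE source_ip = INET_ATON('192.168.10.50');

-- Remove old entries as time passes, without dropping any table.
DELETE FROM ip_traffic
WHERE syn_time < NOW() - INTERVAL 1 DAY;
```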

Creating tables dynamically is almost always a bad idea. The alternative is normalisation. I won't go into the academic details of that, but I'll try to explain it in more simple terms.
You can separate relationships between data into three types: one-to-one, one-to-many and many-to-many. Think about how each bit of data relates to other bits and which type of relationship it has.
If a data relationship is one-to-one, then you can usually just stick it in the same row of the same table. Occasionally there may be a reason to separate it as if it were one-to-many, but generally speaking, stick it all in the same place.
If a data relationship is one-to-many, it should be referenced between two tables by its primary key (you've given each table a primary key, right?). The "many" side of one-to-many should have a field which references the primary key of the other table. This field is called a foreign key.
Many-to-many is the most complex relationship, and it sounds like you have a few of these. You have to create a join table. This table will contain two foreign key fields, one for one table and another for the other. For each link between two records, you'll add one record to your join table.
Hopefully this should get you started.
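Applied to the IP data from the question, a join table could be sketched as below. The hosts and communications tables and their columns are my assumptions; here both foreign keys happen to point at the same table, since both ends of a communication are hosts.

```sql
-- One row per distinct IP address.
CREATE TABLE hosts (
    host_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ip      VARCHAR(18)  NOT NULL UNIQUE
) ENGINE=InnoDB;

-- Join table: one row per link between two hosts.
CREATE TABLE communications (
    id             BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    source_host_id INT UNSIGNED NOT NULL,
    dest_host_id   INT UNSIGNED NOT NULL,
    syn_time       DATETIME NOT NULL,
    FOREIGN KEY (source_host_id) REFERENCES hosts (host_id),
    FOREIGN KEY (dest_host_id)   REFERENCES hosts (host_id)
) ENGINE=InnoDB;

-- All destinations contacted by one source address.
SELECT d.ip, c.syn_time
FROM communications AS c
JOIN hosts AS s ON s.host_id = c.source_host_id
JOIN hosts AS d ON d.host_id = c.dest_host_id
WHERE s.ip = '192.168.10.50';
```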

Add Foreign Key relationships as bulk operation

I've inherited a database with hundreds of tables. Tables may have implicit FK relations that are not explicitly defined as such. I would like to be able to write a script or query that would be able to do this for all tables. For instance, if a table has a field called user_id, then we know there's a FK relationship with the users table on the id column. Is this even doable?
Thanks in advance,

Yes, it's possible, but I would want to explore more first. Many folks design relational databases without foreign keys, especially in the MySQL world. Also, people reuse column names in different tables in the same schema (often with less than optimal results). Double check that what you think is a foreign key can be used that way (same data type, width, collation/character set, etc.).
Then I would recommend you copy the tables to a test machine and start running your ALTER TABLE statements to add foreign keys. Test like heck.
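One way to get a first list of candidates in MySQL is to let information_schema generate the statements for you. This is only a sketch under a naming assumption taken from the question: a column named user_id points at the id column of a table named users (column prefix plus "s"). It does not check data types or existing data, so review every generated line before running it on the test copy.

```sql
-- Generates one ALTER TABLE statement per <something>_id column in the
-- current schema for which a table named <something>s exists.
SELECT CONCAT(
         'ALTER TABLE `', c.TABLE_NAME, '` ',
         'ADD CONSTRAINT `fk_', c.TABLE_NAME, '_', c.COLUMN_NAME, '` ',
         'FOREIGN KEY (`', c.COLUMN_NAME, '`) ',
         'REFERENCES `', t.TABLE_NAME, '` (`id`);'
       ) AS ddl
FROM information_schema.COLUMNS AS c
JOIN information_schema.TABLES AS t
  ON  t.TABLE_SCHEMA = c.TABLE_SCHEMA
  -- Compare as bytes to sidestep collation differences between the two columns.
  AND CAST(t.TABLE_NAME AS BINARY) =
      CAST(CONCAT(LEFT(c.COLUMN_NAME, CHAR_LENGTH(c.COLUMN_NAME) - 3), 's') AS BINARY)
WHERE c.TABLE_SCHEMA = DATABASE()
  AND c.COLUMN_NAME LIKE '%\_id'      -- \_ matches a literal underscore
  AND t.TABLE_TYPE = 'BASE TABLE'
ORDER BY c.TABLE_NAME, c.COLUMN_NAME;
```

Columns that don't follow the naming pattern (irregular plurals, self-references like parent_id) won't be found and have to be handled by hand.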
