Relational Database design - entity associated with relationship - many-to-many

I have a question about best practices for relational database design. Here is an example that shows the issue I'm not sure about.
You have software packages with names, each of which holds several pieces of software. A user can own any number of software packages but can only comment on the individual pieces of software in the packages he owns.
My entities are: User, Software Package, Software Piece, Comment.
User -> Software Package is N to N.
Software Package -> Software Piece is N to N.
Is the best you can do in a relational database to make a Comment hold a foreign key to both a User and a Software Piece? I don't really see a way to ensure through the DB schema that a Comment can only exist between a User and a Software Piece from one of the Software Packages that user owns.
Is there a way to associate the Comment entity with a relationship that spans two entities?

Your business rule is that a user can only comment on a piece of software that is contained in a package that they own. What you need to do is apply this rule in the primary key(1) of your COMMENT table.
Consider this ERD:
Note how the primary key of a COMMENT is the combination of its foreign keys to the owned package and the package content. Since both of these primary keys include the package_id(2) you have a declarative referential constraint that ensures that the user can only comment on a piece of software that is contained in a package that they own.
Notes:
(1) If you are averse to composite primary keys, you can have a candidate key (unique index) which achieves the same result.
(2) If you are using an ORM or something that doesn't allow you to share one package_id column between two foreign key relationships you can have two package_id columns in the COMMENT table and add a constraint that forces the one to be equal to the other.
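The composite-key design described above can be sketched as follows. This is a minimal illustration, not the answerer's exact schema: the table and column names (app_user, owned_package, package_content, body) are assumptions, and SQLite is used through Python's sqlite3 module only so the example runs as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE app_user (user_id INTEGER PRIMARY KEY);
CREATE TABLE package  (package_id INTEGER PRIMARY KEY);
CREATE TABLE piece    (piece_id INTEGER PRIMARY KEY);

-- N:N between users and packages
CREATE TABLE owned_package (
    user_id    INTEGER NOT NULL REFERENCES app_user(user_id),
    package_id INTEGER NOT NULL REFERENCES package(package_id),
    PRIMARY KEY (user_id, package_id)
);
-- N:N between packages and pieces
CREATE TABLE package_content (
    package_id INTEGER NOT NULL REFERENCES package(package_id),
    piece_id   INTEGER NOT NULL REFERENCES piece(piece_id),
    PRIMARY KEY (package_id, piece_id)
);
-- package_id is shared by both composite foreign keys
CREATE TABLE comment (
    user_id    INTEGER NOT NULL,
    package_id INTEGER NOT NULL,
    piece_id   INTEGER NOT NULL,
    body       TEXT NOT NULL,
    PRIMARY KEY (user_id, package_id, piece_id),
    FOREIGN KEY (user_id, package_id)
        REFERENCES owned_package(user_id, package_id),
    FOREIGN KEY (package_id, piece_id)
        REFERENCES package_content(package_id, piece_id)
);
INSERT INTO app_user VALUES (1), (2);
INSERT INTO package VALUES (10), (20);
INSERT INTO piece VALUES (100), (200);
INSERT INTO owned_package VALUES (1, 10);            -- user 1 owns package 10
INSERT INTO package_content VALUES (10, 100), (20, 200);
""")

def try_comment(user_id, package_id, piece_id):
    """Return True if the schema accepts the comment, False if it rejects it."""
    try:
        conn.execute("INSERT INTO comment VALUES (?, ?, ?, 'text')",
                     (user_id, package_id, piece_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

With this data, try_comment(1, 10, 100) is accepted, while a comment by a user who does not own the package, or on a piece that is not in the named package, is rejected by the foreign keys alone.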

I'm not sure how you can do that with an Entity, but you can possibly do it with a composite foreign key (UserId, SoftwarePieceId) that points to the same two fields on the table expressing the N to N relationship between Users and Software Pieces.
However, I'm not sure that building this business rule into your schema is really a good idea. Most of your business rules should be checked at the client tier, not at the database tier; otherwise you might end up with an over-complicated schema.

How to design table with primary key, index, unique in SQL [duplicate]

Here we go again, the old argument still arises...
Is it better to have a business key as a primary key, or a surrogate id (i.e. an SQL Server identity) with a unique constraint on the business key field?
Please, provide examples or proof to support your theory.
Just a few reasons for using surrogate keys:
Stability: Changing a key because of a business or natural need will negatively affect related tables. Surrogate keys rarely, if ever, need to be changed because there is no meaning tied to the value.
Convention: Allows you to have a standardized Primary Key column naming convention rather than having to think about how to join tables with various names for their PKs.
Speed: Depending on the PK value and type, a surrogate key of an integer may be smaller, faster to index and search.
Both. Have your cake and eat it.
Remember there is nothing special about a primary key, except that it is labelled as such. It is nothing more than a NOT NULL UNIQUE constraint, and a table can have more than one.
If you use a surrogate key, you still want a business key to ensure uniqueness according to the business rules.
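A minimal sketch of the "both" approach, assuming a hypothetical department table (the names dept_id, dept_code and the sample rows are illustrative only). SQLite via Python's sqlite3 is used so it can be run directly; the same idea applies to an identity column plus a unique constraint in other engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,   -- surrogate key, assigned by the database
    dept_code TEXT NOT NULL UNIQUE,  -- business key, still enforced as unique
    name      TEXT NOT NULL
)""")
conn.execute(
    "INSERT INTO department (dept_code, name) VALUES ('HR', 'Human Resources')")

def insert_department(code, name):
    """Return True if the row was inserted, False if it broke a key constraint."""
    try:
        conn.execute(
            "INSERT INTO department (dept_code, name) VALUES (?, ?)",
            (code, name))
        return True
    except sqlite3.IntegrityError:
        return False
```

A second 'HR' row is rejected even though it would have received a fresh dept_id, which is the point of keeping the business key constrained.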
It appears that no one has yet said anything in support of non-surrogate (I hesitate to say "natural") keys. So here goes...
A disadvantage of surrogate keys is that they are meaningless (cited as an advantage by some, but...). This sometimes forces you to join a lot more tables into your query than should really be necessary. Compare:
select sum(t.hours)
from timesheets t
where t.dept_code = 'HR'
and t.status = 'VALID'
and t.project_code = 'MYPROJECT'
and t.task = 'BUILD';
against:
select sum(t.hours)
from timesheets t
join departments d on d.dept_id = t.dept_id
join timesheet_statuses s on s.status_id = t.status_id
join projects p on p.project_id = t.project_id
join tasks k on k.task_id = t.task_id
where d.dept_code = 'HR'
and s.status = 'VALID'
and p.project_code = 'MYPROJECT'
and k.task_code = 'BUILD';
Unless anyone seriously thinks the following is a good idea?:
select sum(t.hours)
from timesheets t
where t.dept_id = 34394
and t.status_id = 89
and t.project_id = 1253
and t.task_id = 77;
"But" someone will say, "what happens when the code for MYPROJECT or VALID or HR changes?" To which my answer would be: "why would you need to change it?" These aren't "natural" keys in the sense that some outside body is going to legislate that henceforth 'VALID' should be re-coded as 'GOOD'. Only a small percentage of "natural" keys really fall into that category - SSN and Zip code being the usual examples. I would definitely use a meaningless numeric key for tables like Person, Address - but not for everything, which for some reason most people here seem to advocate.
See also: my answer to another question
A surrogate key will NEVER have a reason to change. I cannot say the same about natural keys. Last names, emails, ISBN numbers - they all can change one day.
Surrogate keys (typically integers) have the added-value of making your table relations faster, and more economic in storage and update speed (even better, foreign keys do not need to be updated when using surrogate keys, in contrast with business key fields, that do change now and then).
A table's primary key should be used for identifying uniquely the row, mainly for join purposes. Think a Persons table: names can change, and they're not guaranteed unique.
Think Companies: you're a happy Merkin company doing business with other companies in Merkia. You are clever enough not to use the company name as the primary key, so you use Merkia's government's unique company ID in its entirety of 10 alphanumeric characters.
Then Merkia changes the company IDs because they thought it would be a good idea. It's ok, you use your db engine's cascaded updates feature, for a change that shouldn't involve you in the first place. Later on, your business expands, and now you work with a company in Freedonia. Freedonian company IDs are up to 16 characters. You need to enlarge the company ID primary key (and the foreign key fields in Orders, Issues, MoneyTransfers etc.), adding a Country field to the primary key (and to the foreign keys). Ouch! Civil war in Freedonia; it splits into three countries. The country name of your associate has to be changed to the new one; cascaded updates to the rescue. BTW, what's your primary key? (Country, CompanyID) or (CompanyID, Country)? The latter helps joins, the former avoids another index (or perhaps many, should you want your Orders grouped by country too).
All these are not proof, but an indication that a surrogate key to uniquely identify a row for all uses, including join operations, is preferable to a business key.
I hate surrogate keys in general. They should only be used when there is no quality natural key available. It is rather absurd when you think about it, to think that adding meaningless data to your table could make things better.
Here are my reasons:
When using natural keys, tables are clustered in the way that they are most often searched thus making queries faster.
When using surrogate keys you must add unique indexes on logical key columns. You still need to prevent logical duplicate data. For example, you can’t allow two Organizations with the same name in your Organization table even though the pk is a surrogate id column.
When surrogate keys are used as the primary key it is much less clear what the natural primary keys are. When developing you want to know what set of columns make the table unique.
In one to many relationship chains, the logical key chains. So for example, Organizations have many Accounts and Accounts have many Invoices. So the logical-key of Organization is OrgName. The logical-key of Accounts is OrgName, AccountID. The logical-key of Invoice is OrgName, AccountID, InvoiceNumber.
When surrogate keys are used, the key chains are truncated by only having a foreign key to the immediate parent. For example, the Invoice table does not have an OrgName column. It only has a column for the AccountID. If you want to search for invoices for a given organization, then you will need to join the Organization, Account, and Invoice tables. If you use logical keys, then you could query the Invoice table directly.
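The chained logical keys from the Organization, Account and Invoice example can be sketched like this. The column names, the amount column and the sample rows are assumptions made for illustration; SQLite is used through Python's sqlite3 so the sketch is runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE organization (org_name TEXT PRIMARY KEY);
CREATE TABLE account (
    org_name   TEXT    NOT NULL REFERENCES organization(org_name),
    account_id INTEGER NOT NULL,
    PRIMARY KEY (org_name, account_id)
);
-- The invoice key carries the whole chain: org_name, account_id, invoice_number
CREATE TABLE invoice (
    org_name       TEXT    NOT NULL,
    account_id     INTEGER NOT NULL,
    invoice_number INTEGER NOT NULL,
    amount         INTEGER NOT NULL,
    PRIMARY KEY (org_name, account_id, invoice_number),
    FOREIGN KEY (org_name, account_id)
        REFERENCES account(org_name, account_id)
);
INSERT INTO organization VALUES ('Acme'), ('Globex');
INSERT INTO account VALUES ('Acme', 1), ('Acme', 2), ('Globex', 1);
INSERT INTO invoice VALUES
    ('Acme', 1, 1, 100), ('Acme', 2, 1, 50), ('Globex', 1, 1, 75);
""")

# Invoices for one organization, read from the invoice table alone (no joins).
acme_invoices = conn.execute(
    "SELECT account_id, invoice_number, amount FROM invoice "
    "WHERE org_name = ? ORDER BY account_id, invoice_number",
    ("Acme",),
).fetchall()
```

The trade-off is that org_name is repeated in every child row, so renaming an organization has to cascade through all of them.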
Storing surrogate key values of lookup tables causes tables to be filled with meaningless integers. To view the data, complex views must be created that join to all of the lookup tables. A lookup table is meant to hold a set of acceptable values for a column. It should not be codified by storing an integer surrogate key instead. There is nothing in the normalization rules that suggest that you should store a surrogate integer instead of the value itself.
I have three different database books. Not one of them shows using surrogate keys.
I want to share my experience with you on this endless war :D on natural vs surrogate key dilemma. I think that both surrogate keys (artificial auto-generated ones) and natural keys (composed of column(s) with domain meaning) have pros and cons. So depending on your situation, it might be more relevant to choose one method or the other.
As it seems that many people present surrogate keys as the almost perfect solution and natural keys as the plague, I will focus on the other point of view's arguments:
Disadvantages of surrogate keys
Surrogate keys are:
Source of performance problems:
They are usually implemented using auto-incremented columns, which means:
A round-trip to the database each time you want to get a new Id (I know that this can be improved using caching or [seq]hilo alike algorithms but still those methods have their own drawbacks).
If one day you need to move your data from one schema to another (it happens quite regularly in my company at least) then you might encounter Id collision problems. And yes, I know that you can use UUIDs, but those require 32 hexadecimal digits! (If you care about database size then it can be an issue.)
If you are using one sequence for all your surrogate keys then - for sure - you will end up with contention on your database.
Error prone. A sequence has a max_value limit so, as a developer, you have to pay attention to the following points:
You must cycle your sequence (when the max-value is reached it goes back to 1, 2, ...).
If you are using the sequence as an ordering (over time) of your data then you must handle the case of cycling (the row with Id 1 might be newer than the row with Id max-value - 1).
Make sure that your code (and even your client interfaces, which should not happen as it is supposed to be an internal Id) supports the 32-bit/64-bit integers that you used to store your sequence values.
They don't guarantee non duplicated data. You can always have 2 rows with all the same column values but with a different generated value. For me this is THE problem of surrogate keys from a database design point of view.
More in Wikipedia...
Myths on natural keys
Composite keys are less efficient than surrogate keys. No! It depends on the database engine used:
Oracle
MySQL
Natural keys don't exist in real life. Sorry, but they do exist! In the aviation industry, for example, the following tuple will always be unique for a given scheduled flight: (airline, departureDate, flightNumber, operationalSuffix). More generally, when a set of business data is guaranteed to be unique by a given standard then this set of data is a [good] natural key candidate.
Natural keys "pollute the schema" of child tables. For me this is more a feeling than a real problem. Having a 4 columns primary-key of 2 bytes each might be more efficient than a single column of 11 bytes. Besides, the 4 columns can be used to query the child table directly (by using the 4 columns in a where clause) without joining to the parent table.
Conclusion
Use natural keys when it is relevant to do so and use surrogate keys when it is better to use them.
Hope that this helped someone!
Always use a key that has no business meaning. It's just good practice.
EDIT: I was trying to find a link to it online, but I couldn't. However, 'Patterns of Enterprise Application Architecture' [Fowler] has a good explanation of why you shouldn't use anything other than a key with no meaning other than being a key. It boils down to the fact that it should have one job and one job only.
Surrogate keys are quite handy if you plan to use an ORM tool to handle/generate your data classes. While you can use composite keys with some of the more advanced mappers (read: hibernate), it adds some complexity to your code.
(Of course, database purists will argue that even the notion of a surrogate key is an abomination.)
I'm a fan of using uids for surrogate keys when suitable. The major win with them is that you know the key in advance e.g. you can create an instance of a class with the ID already set and guaranteed to be unique whereas with, say, an integer key you'll need to default to 0 or -1 and update to an appropriate value when you save/update.
UIDs have penalties in terms of lookup and join speed though so it depends on the application in question as to whether they're desirable.
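As a rough sketch of "knowing the key in advance": the Customer class and customer table below are hypothetical, and uuid4 values are random, so uniqueness is overwhelmingly likely rather than enforced until the primary key constraint checks it.

```python
import sqlite3
import uuid
from dataclasses import dataclass, field

@dataclass
class Customer:
    name: str
    # The identifier exists as soon as the object does, before any
    # round-trip to the database.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT NOT NULL)")

customer = Customer("Smith Electronics")
conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)",
             (customer.id, customer.name))
stored_id = conn.execute("SELECT id FROM customer").fetchone()[0]
```

Nothing has to be patched on the object after the insert, which is the convenience being described; the cost is a 36-character text key (or 16 bytes in binary form) instead of a small integer.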
Using a surrogate key is better in my opinion as there is zero chance of it changing. Almost anything I can think of which you might use as a natural key could change (disclaimer: not always true, but commonly).
An example might be a DB of cars - at first glance, you might think that the licence plate could be used as the key. But these can be changed, so that'd be a bad idea. You wouldn't really want to find that out after releasing the app, when someone comes to you wanting to know why they can't change their number plate to their shiny new personalised one.
Always use a single column, surrogate key if at all possible. This makes joins as well as inserts/updates/deletes much cleaner because you're only responsible for tracking a single piece of information to maintain the record.
Then, as needed, stack your business keys as unique constraints or indexes. This will keep your data integrity intact.
Business logic/natural keys can change, but the physical key of a table should NEVER change.
Case 1: Your table is a lookup table with less than 50 records (50 types)
In this case, use manually named keys, according to the meaning of each record.
For Example:
Table: JOB with 50 records
CODE (primary key)  NAME        DESCRIPTION
PRG                 PROGRAMMER  A programmer is writing code
MNG                 MANAGER     A manager is doing whatever
CLN                 CLEANER     A cleaner cleans
...
joined with
Table: PEOPLE with 100000 inserts
foreign key JOBCODE in table PEOPLE
looks at
primary key CODE in table JOB
Case 2: Your table is a table with thousands of records
Use surrogate/autoincrement keys.
For Example:
Table: ASSIGNMENT with 1000000 records
joined with
Table: PEOPLE with 100000 records
foreign key PEOPLEID in table ASSIGNMENT
looks at
primary key ID in table PEOPLE (autoincrement)
In the first case:
You can select all programmers in table PEOPLE without use of join with table JOB, but just with: SELECT * FROM PEOPLE WHERE JOBCODE = 'PRG'
In the second case:
Your database queries are faster because your primary key is an integer
You don't need to bother yourself with finding the next unique key because the database itself gives you the next autoincrement.
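The two cases can be combined in one small sketch: a JOB lookup table keyed by a meaningful code, and a PEOPLE table with an auto-assigned integer key. The person names are invented sample data, and SQLite via Python's sqlite3 stands in for whatever engine is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Case 1: small lookup table keyed by a meaningful code
CREATE TABLE job (
    code        TEXT PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT
);
-- Case 2: large table keyed by an auto-assigned integer
CREATE TABLE people (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    jobcode TEXT NOT NULL REFERENCES job(code)
);
INSERT INTO job VALUES
    ('PRG', 'PROGRAMMER', 'A programmer is writing code'),
    ('MNG', 'MANAGER',    'A manager is doing whatever'),
    ('CLN', 'CLEANER',    'A cleaner cleans');
INSERT INTO people (name, jobcode) VALUES
    ('Ann', 'PRG'), ('Bob', 'MNG'), ('Cy', 'PRG');
""")

# All programmers, without joining to the job table.
programmers = [row[0] for row in conn.execute(
    "SELECT name FROM people WHERE jobcode = 'PRG' ORDER BY id")]
```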
Surrogate keys can be useful when business information can change or be identical. Business names don't have to be unique across the country, after all. Suppose you deal with two businesses named Smith Electronics, one in Kansas and one in Michigan. You can distinguish them by address, but that'll change. Even the state can change; what if Smith Electronics of Kansas City, Kansas moves across the river to Kansas City, Missouri? There's no obvious way of keeping these businesses distinct with natural key information, so a surrogate key is very useful.
Think of the surrogate key like an ISBN number. Usually, you identify a book by title and author. However, I've got two books titled "Pearl Harbor" by H. P. Willmott, and they're definitely different books, not just different editions. In a case like that, I could refer to the looks of the books, or the earlier versus the later, but it's just as well I have the ISBN to fall back on.
In a data warehouse scenario I believe it is better to follow the surrogate key path. Two reasons:
You are independent of the source system, and changes there --such as a data type change-- won't affect you.
Your DW will need less physical space since you will use only integer data types for your surrogate keys. Also your indexes will work better.
As a reminder, it is not good practice to place clustered indices on random surrogate keys, i.e. GUIDs that read XY8D7-DFD8S, as SQL Server has no ability to physically sort these data. You should instead place unique indices on these data, though it may also be beneficial to simply run SQL Profiler for the main table operations and then feed those data into the Database Engine Tuning Advisor.
See thread # http://social.msdn.microsoft.com/Forums/en-us/sqlgetstarted/thread/27bd9c77-ec31-44f1-ab7f-bd2cb13129be
This is one of those cases where a surrogate key pretty much always makes sense. There are cases where you either choose what's best for the database or what's best for your object model, but in both cases, using a meaningless key or GUID is a better idea. It makes indexing easier and faster, and it is an identity for your object that doesn't change.
In the case of a point-in-time database it is best to have a combination of surrogate and natural keys, e.g. when you need to track member information for a club. Some attributes of a member never change, e.g. date of birth, but the name can change.
So create a Member table with a member_id surrogate key and have a column for DOB.
Create another table called person name and have columns for member_id, member_fname, member_lname, date_updated. In this table the natural key would be member_id + date_updated.
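A small sketch of the two tables just described. The column names come from the text; the sample rows, the ISO date strings and the name_as_of helper are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE member (
    member_id INTEGER PRIMARY KEY,   -- surrogate key
    dob       TEXT NOT NULL          -- never changes
);
CREATE TABLE person_name (
    member_id    INTEGER NOT NULL REFERENCES member(member_id),
    member_fname TEXT NOT NULL,
    member_lname TEXT NOT NULL,
    date_updated TEXT NOT NULL,      -- ISO dates sort correctly as text
    PRIMARY KEY (member_id, date_updated)
);
INSERT INTO member VALUES (1, '1980-05-17');
INSERT INTO person_name VALUES
    (1, 'Jane', 'Doe',   '2010-01-01'),
    (1, 'Jane', 'Smith', '2015-06-30');
""")

def name_as_of(member_id, as_of_date):
    """Return (first, last) as recorded on as_of_date, or None if unknown."""
    return conn.execute(
        "SELECT member_fname, member_lname FROM person_name "
        "WHERE member_id = ? AND date_updated <= ? "
        "ORDER BY date_updated DESC LIMIT 1",
        (member_id, as_of_date),
    ).fetchone()
```

Each name change becomes a new row rather than an update, so the history stays queryable.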
Horses for courses. To state my bias: I'm a developer first, so I'm mainly concerned with giving the users a working application.
I've worked on systems with natural keys, and had to spend a lot of time making sure that value changes would ripple through.
I've worked on systems with only surrogate keys, and the only drawback has been a lack of denormalised data for partitioning.
Most traditional PL/SQL developers I have worked with didn't like surrogate keys because of the number of tables per join, but our test and production databases never raised a sweat; the extra joins didn't affect the application performance. With database dialects that don't support clauses like "X inner join Y on X.a = Y.b", or developers who don't use that syntax, the extra joins for surrogate keys do make the queries harder to read, and longer to type and check: see #Tony Andrews post. But if you use an ORM or any other SQL-generation framework you won't notice it. Touch-typing also mitigates.
Maybe not completely relevant to this topic, but a headache I have dealing with surrogate keys. Oracle pre-delivered analytics creates auto-generated SKs on all of its dimension tables in the warehouse, and it also stores those on the facts. So, anytime they (dimensions) need to be reloaded as new columns are added or need to be populated for all items in the dimension, the SKs assigned during the update makes the SKs out of sync with the original values stored to the fact, forcing a complete reload of all fact tables that join to it. I would prefer that even if the SK was a meaningless number, there would be some way that it could not change for original/old records. As many know, out-of-the box rarely serves an organization's needs, and we have to customize constantly. We now have 3yrs worth of data in our warehouse, and complete reloads from the Oracle Financial systems are very large. So in my case, they are not generated from data entry, but added in a warehouse to help reporting performance. I get it, but ours do change, and it's a nightmare.

MySQL: is there a point to having a primary key on a lookup table which refers to a primary key on another table which is indexed?

I'm just doing some basic normalisation but I don't have the answer for this, wondering if you guys can give me some info on right/wrong, do's/dont's etc.
So if I have:
I've always set a primary key (unique auto incrementer on lookup tables), in the image the lookup tables would be "page_downloads" and "page_includes" but I can guarantee those columns will never get used as they will only be queried via the page_id, same for so many definition tables.
So my question is: "Is there any point? What is the best practice thing to do? Always create the primary key even though it will never be used or don't bother creating it as it is fine to use the indexed int column which refers to a primary key in another table. Eg the relationship in the picture (page_id to page_id). Thoughts?"
Thanks
D
No. While every table should have a PRIMARY KEY, it need not be a surrogate. In this instance, (page_id,file_id) is a valid compound PRIMARY KEY (as is (file_id,page_id)).
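A minimal sketch of such a compound primary key. The page_downloads, page_id and file_id names come from the question; the pages and files parent tables and their columns are assumptions, since the original diagram is not available. SQLite is used only to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE pages (page_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE files (file_id INTEGER PRIMARY KEY, path  TEXT);
-- No surrogate id: the pair of foreign keys is the primary key
CREATE TABLE page_downloads (
    page_id INTEGER NOT NULL REFERENCES pages(page_id),
    file_id INTEGER NOT NULL REFERENCES files(file_id),
    PRIMARY KEY (page_id, file_id)
);
INSERT INTO pages VALUES (1, 'Home');
INSERT INTO files VALUES (1, 'a.pdf'), (2, 'b.pdf');
""")

def link(page_id, file_id):
    """Attach a file to a page; return False if the link is rejected."""
    try:
        conn.execute("INSERT INTO page_downloads VALUES (?, ?)",
                     (page_id, file_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

Besides saving a column, the compound key also rejects a duplicate (page_id, file_id) pair, which an unused auto-increment id would not do on its own.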
To add some info to Strawberry's valid observations.
There's no absolute answer or best practice regarding the surrogate keys and usually this boils down to individual preference. There are both advantages and disadvantages to using surrogate keys. Among the advantages, one could consider:
Immutability
Surrogate keys do not change while the row exists. This has the following advantages:
Applications cannot lose their reference to a row in the database (since the identifier never changes).
The primary or natural key data can always be modified, even with databases that do not support cascading updates across related foreign keys.
Requirement changes
Attributes that uniquely identify an entity might change, which might invalidate the suitability of natural keys. Consider the following example:
An employee's network user name is chosen as a natural key. Upon merging with another company, new employees must be inserted. Some of the new network user names create conflicts because their user names were generated independently (when the companies were separate). In these cases, generally a new attribute must be added to the natural key (for example, an original_company column). With a surrogate key, only the table that defines the surrogate key must be changed. With natural keys, all tables (and possibly other, related software) that use the natural key will have to change.
Some problem domains do not clearly identify a suitable natural key. Surrogate keys avoid choosing a natural key that might be incorrect.
Performance
Surrogate keys tend to be a compact data type, such as a four-byte integer. This allows the database to query the single key column faster than it could multiple columns. Furthermore a non-redundant distribution of keys causes the resulting b-tree index to be completely balanced. Surrogate keys are also less expensive to join (fewer columns to compare) than compound keys.
Compatibility
While using several database application development systems, drivers, and object-relational mapping systems, such as Ruby on Rails or Hibernate, it is much easier to use an integer or GUID surrogate keys for every table instead of natural keys in order to support database-system-agnostic operations and object-to-row mapping.
Uniformity
When every table has a uniform surrogate key, some tasks can be easily automated by writing the code in a table-independent way.
Validation
It is possible to design key-values that follow a well-known pattern or structure which can be automatically verified. For instance, the keys that are intended to be used in some column of some table might be designed to "look differently from" those that are intended to be used in another column or table, thereby simplifying the detection of application errors in which the keys have been misplaced. However, this characteristic of the surrogate keys should never be used to drive any of the logic of the applications themselves, as this would violate the principles of Database normalization.

ER diagram - avoiding one-to-one relationship

I've been working on an ER diagram for a university project. It is about a transport company. That company does particular jobs for other companies, and for each job there are three types of documents needed, and those documents have unique identifiers among other documents of the same kind. So what I did was model these types of documents as separate entities. Now when I want to join them (call them Doc1, Doc2, Doc3) to one entity (call it Job), they are basically made only for that one job and for no other. Also, this job has only one of each of these documents, so it looks like the relationships between documents and job are one-to-one. However, when the professor was teaching us ER models, he told us that we should always avoid drawing one-to-one relationships (that there should be a way to make these documents a kind of attribute of the job). So what I want to know is: is it correct to draw the identifiers of these documents as attributes of Job, and then make them foreign keys referencing the corresponding fields in the documents' tables (in the relational model)? Or is there any other, more elegant way to connect them that avoids these one-to-one relationships?
Also, if I do it this way, I guess I should make all 3 columns representing documents' identifiers UNIQUE in Job table, right? So that I avoid making two jobs having, for example, same Doc1?
Thank you!
One-to-one relationships are to be avoided, because they signal that the entities joined by the relationship are actually one. However, in the case specified here, the relationship is not one-to-one. Instead it is "one to zero or one", also known as "one-to-one optional".
An example is the relationship between a Home and a Lot. The Home must be located on a Lot, and only one Home can be located on any given Lot, but the Lot can exist before the Home is built. If you are modelling this relationship, you would have a "one to zero or one" relationship between Lot and Home. It would be shown like this:
In your case you have three separate dependencies, so it would look like:
Physically, these relationships may be represented in two ways:
A nullable foreign key in the "one" row (Lot, in my example above),
or
A non-nullable foreign key in the "zero or one" row (Home, in my example above)
You can choose the approach that is most comfortable and efficient for you, depending on the direction in which your application usually navigates.
You may decide to have the database enforce the uniqueness constraint (the fact that only one Home can be on a Lot). In some databases, a null value participates in uniqueness constraints (in other words, a unique index can only have one Null entry). In such a database, you would be constrained to the second approach. In MySQL, this is not the case; a uniqueness constraint ignores null values, so you can choose either approach. The second approach is more common.
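The second approach (a non-nullable foreign key in the "zero or one" row, with a uniqueness constraint) can be sketched as below. The lot_id and home_id column names are assumptions, and SQLite via Python's sqlite3 is used only to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE lot (lot_id INTEGER PRIMARY KEY);
CREATE TABLE home (
    home_id INTEGER PRIMARY KEY,
    -- NOT NULL: a home must sit on a lot; UNIQUE: at most one home per lot
    lot_id  INTEGER NOT NULL UNIQUE REFERENCES lot(lot_id)
);
INSERT INTO lot VALUES (1), (2);
""")

def build_home(lot_id):
    """Return True if a home can be recorded on the lot, False otherwise."""
    try:
        conn.execute("INSERT INTO home (lot_id) VALUES (?)", (lot_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

A lot with no home is simply a lot row with no matching home row, so no nullable column is needed and the question of how the engine treats NULLs in unique indexes never comes up.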

Not defined database schema

I have a MySQL database with 220 tables. The database is well structured but without any clear relations. I want to find a way to connect the primary key of each table to its corresponding foreign key.
I was thinking to write a script to discover the possible relation between two columns:
The content range should be similar in both of them
The foreign key name could be similar to the primary key table name
Those features are not sufficient to solve the problem. Do you have any idea how I could be more accurate and get closer to the solution? Also, is there any available tool which does that?
Please advise!
Sounds like you have a licensed app+RFS, and you want to save the data (which is an asset that belongs to the organisation), and ditch the app (due to the problems having exceeded the threshold of acceptability).
Happens all the time. Until something like this happens, people do not appreciate that their data is precious, that it out-lives any app, good or bad, in-house or third-party.
SQL Platform
If it was an honest SQL platform, it would have the SQL-compliant catalogue, and the catalogue would contain an entry for each reference. The catalogue is an entry-level SQL Compliance requirement. The code required to access the catalogue and extract the FOREIGN KEY declarations is simple, and it is written in SQL.
Unless you are saying "there are no Referential Integrity constraints, it is all controlled from the app layers", which means it is not a database, it is a data storage location, a Record Filing System, a slave of the app.
In that case, your data has no Referential Integrity
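Where FOREIGN KEY declarations do exist, reading them back from the catalogue is short. The sketch below is an illustration only: it uses SQLite's PRAGMA foreign_key_list through Python's sqlite3 so it can be run as-is, the customer and invoice tables are hypothetical, and other platforms expose their catalogue through different views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
CREATE TABLE invoice (
    invoice_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
""")

def declared_foreign_keys(conn):
    """List (child_table, child_column, parent_table, parent_column)."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    found = []
    for table in tables:
        # Each row starts with: id, seq, parent table, child column, parent column
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            parent_table, child_column, parent_column = row[2], row[3], row[4]
            found.append((table, child_column, parent_table, parent_column))
    return found
```

If this returns nothing for a database with hundreds of tables, the references were never declared and have to be inferred from the data instead.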
Pretend SQL Platform
Evidently non-compliant databases such as MySQL, PostgreSQL and Oracle fraudulently position themselves as "SQL", but they do not have basic SQL functionality, such as a catalogue. I suppose you get what you pay for.
Solution
For (a) such databases, such as your MySQL, and (b) data placed in an honest SQL container that has no FOREIGN KEY declarations, I would use one of two methods.
Solution 1
First preference.
use awk
load each table into an array
write scripts to:
determine the Keys (if your "keys" are ID fields, you are stuffed, details below)
determine any references between the Keys of the arrays
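A rough sketch of these steps, written in Python rather than awk purely so it can be run here against an in-memory SQLite database. The heuristic is an assumption of this sketch, not the answerer's code: a column whose values are all distinct is treated as a candidate Key, and any column in another table whose values all appear in that Key is reported as a possible reference. The customer and invoice tables are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- No FOREIGN KEY declarations anywhere, as in the question
CREATE TABLE customer (code TEXT, name TEXT);
CREATE TABLE invoice  (customer_code TEXT, seq INTEGER, amount INTEGER);
INSERT INTO customer VALUES ('IBM', 'International Business Machines'),
                            ('DEC', 'Digital Equipment');
INSERT INTO invoice VALUES ('IBM', 1, 100), ('IBM', 2, 250), ('DEC', 1, 75);
""")

def load_columns(conn):
    """Load every column of every table as a list of its non-null values."""
    data = {}
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        for info in conn.execute(f'PRAGMA table_info("{table}")'):
            column = info[1]
            data[(table, column)] = [r[0] for r in conn.execute(
                f'SELECT "{column}" FROM "{table}" '
                f'WHERE "{column}" IS NOT NULL')]
    return data

def candidate_references(conn):
    """Guess (child_table, child_col, parent_table, parent_col) by containment."""
    data = load_columns(conn)
    # Candidate Keys: non-empty columns with no repeated value.
    keys = {tc: set(v) for tc, v in data.items()
            if v and len(set(v)) == len(v)}
    found = []
    for (child_table, child_col), values in data.items():
        for (parent_table, parent_col), key_values in keys.items():
            if (child_table != parent_table and values
                    and set(values) <= key_values):
                found.append(
                    (child_table, child_col, parent_table, parent_col))
    return sorted(found)
```

On this sample the only hit is invoice.customer_code pointing at customer.code. With meaningless integer IDs that all share the same range, the same containment test would match almost every table against every other, which is the difficulty discussed under Non-keys.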
Solution 2
Now you could do all that in SQL, but then the code would be horrendous, and SQL is not designed for that (table comparisons). Which is why I would use awk, in which case the code (for an experienced coder) is complex (given 220 files) but straightforward. That is squarely within awk's design and purpose. It would take far less development time.
I wouldn't attempt to provide code here, there are too many dependencies to identify, it would be premature and primitive.
Relational Keys
Relational Keys, as required by Codd's Relational Model, relate ("link", "map", "connect") each row in each table to the rows in any other table that it is related to, by Key. These Keys are natural Keys, and usually compound Keys. Keys are logical identifiers of the data. Thus, writing either awk programs or SQL code to determine:
the Keys
the occurrences of the Keys elsewhere
and thus the dependencies
is a pretty straight-forward matter, because the Keys are visible, recognisable as such.
This is also very important for data that is exported from the database to some other system (which is precisely what we are trying to do here). The Keys have meaning, to the organisation, and that meaning is beyond the database. Thus importation is easy. Codd wrote about this value specifically in the RM.
This is just one of the many scenarios where the value of Relational Keys, the absolute need for them, is appreciated.
Non-keys
Conversely, if your Record Filing System has no Relational Keys, then you are stuffed, and stuffed big time. The IDs are in fact record numbers in the files. They all have the same range, say 1 to 1 million. It is not reasonably possible to relate any given record number in one file to its occurrences in any other file, because record numbers have no meaning.
Record numbers are physical, they do not identify the data.
I see a record number 123456 being repeated in the Invoice file, now what other file does this relate to ? Every other possible file, Supplier, Customer, Part, Address, CreditCard, where it occurs once only, has a record number 123456 !
Whereas with Relational Keys:
I see IBM plus a sequence 1, 2, 3, ... in the Invoice table, now what other table does this relate to ? The only table that has IBM occurring once is the Customer table.
The moral of the story, to etch into one's mind, is this. Actually there are a few, even when limiting them to the context of this Question:
If you want a Relational Database, use Relational Keys, do not use Record IDs
If you want Referential Integrity, use Relational Keys, do not use Record IDs
If your data is precious, use Relational Keys, do not use Record IDs
If you want to export/import your data, use Relational Keys, do not use Record IDs

Why are composite primary keys still around?

I'm assigned to migrate a database to a mid-class ERP.
The new system uses composite primary keys here and there, and from a pragmatic point of view, why?
Compared to autogenerated IDs, I can only see negative aspects;
Foreign keys become blurry
Harder migration or db-redesigns
Inflexible as the business changes. (My car has no reg. plate...)
Same integrity is better achieved with constraints.
It's falling back to the design concept of candidate keys, which I don't see the point of either.
Is it a habit/artifact from the floppy-days (minimizing space/indexes), or am I missing something?
//edit//
Just found good SO-post: Composite primary keys versus unique object ID field
//
Composite keys are required when your primary keys are non-surrogate and inherently, um, composite, that is, breakable into several non-related parts.
Some real-world examples:
Many-to-many link tables, in which the primary keys are composed of the keys of the entities related.
Multi-tenant applications when tenant_id is a part of primary key of each entity and the entities are only linkable within the same tenant (constrained by a foreign key).
Applications processing third-party data (with already provided primary keys)
Note that logically, all this can be achieved using a UNIQUE constraint (additional to a surrogate PRIMARY KEY).
However, there are some implementation specific things:
Some systems won't let a FOREIGN KEY refer to anything that is not a PRIMARY KEY.
Some systems would only cluster a table on a PRIMARY KEY, hence making the composite the PRIMARY KEY would improve performance of the queries joining on the composite.
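As a rough illustration of the multi-tenant case above, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (customer, invoice, tenant_id) are made up for this example, and SQLite stands in for whatever engine you actually use. The composite foreign key means an invoice can only point at a customer of the same tenant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE customer (
    tenant_id   INTEGER NOT NULL,
    customer_id INTEGER NOT NULL,
    name        TEXT NOT NULL,
    PRIMARY KEY (tenant_id, customer_id)
);
CREATE TABLE invoice (
    tenant_id   INTEGER NOT NULL,
    invoice_id  INTEGER NOT NULL,
    customer_id INTEGER NOT NULL,
    PRIMARY KEY (tenant_id, invoice_id),
    -- tenant_id is shared by both keys, so the link cannot cross tenants
    FOREIGN KEY (tenant_id, customer_id)
        REFERENCES customer (tenant_id, customer_id)
);
INSERT INTO customer VALUES (1, 100, 'Acme');
INSERT INTO customer VALUES (2, 200, 'Globex');
""")


def try_insert(sql, params):
    """Return True if the insert is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Tenant 1 invoicing its own customer 100: accepted.
same_tenant = try_insert("INSERT INTO invoice VALUES (?, ?, ?)", (1, 1, 100))
# Tenant 1 invoicing customer 200, which belongs to tenant 2: rejected.
cross_tenant = try_insert("INSERT INTO invoice VALUES (?, ?, ?)", (1, 2, 200))
print(same_tenant, cross_tenant)
```

With a surrogate customer_id alone as the key, the second insert would be accepted unless you added a separate check.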
Personally I prefer the use of surrogate keys. However, in join tables that consist only of the ids from two other tables (to create a many-to-many relationship), composite keys are the way to go, and taking them out would make things more difficult.
There is a school of thought that surrogate keys are always bad and that if you cannot guarantee the uniqueness of a record through natural keys, you have a bad design. I strongly disagree with this (if you aren't storing SSN or some other unique value, I defy you to come up with a natural key for a person table, for instance). But many people feel that it is necessary for proper normalization.
Sometimes having a composite key reduces the need to join to another table. Sometimes it doesn't. So there are times when a composite key can boost performance as well as times when it can harm performance. If the key is relatively stable, you may be fine with faster performance on select queries. However, if it is something that is subject to change like a company name, you could be in a world of hurt when company A changes its name and you have to update a million associated records.
There is no one size fits all in database design. There are times when composite keys are helpful and times when they are horrible. There are times when surrogate keys are helpful and times when they are not.
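To make that trade-off concrete, here is a small sketch with Python's sqlite3 module. The company and invoice tables are hypothetical, and three rows stand in for the "million associated records". The child table can show the company name without a join, but renaming the company rewrites every child row (here via ON UPDATE CASCADE).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE company (
    name TEXT PRIMARY KEY              -- natural key that may change
);
CREATE TABLE invoice (
    invoice_no   INTEGER PRIMARY KEY,
    company_name TEXT NOT NULL
        REFERENCES company (name) ON UPDATE CASCADE
);
INSERT INTO company VALUES ('Company A');
INSERT INTO invoice VALUES (1, 'Company A'), (2, 'Company A'), (3, 'Company A');
""")

# Upside: the name is readable straight from the child table, no join needed.
names_before = [r[0] for r in
                conn.execute("SELECT DISTINCT company_name FROM invoice")]

# Downside: one rename in the parent touches every referencing row.
with conn:
    conn.execute("UPDATE company SET name = 'Company B' WHERE name = 'Company A'")

rewritten = conn.execute(
    "SELECT COUNT(*) FROM invoice WHERE company_name = 'Company B'"
).fetchone()[0]
print(names_before, rewritten)
```

With a surrogate company id, the rename would touch one row in company, at the cost of a join whenever the name is needed.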
Composite primary keys can provide better performance when they are used as foreign keys in other tables, because they reduce table reads; sometimes they can be life savers. If you use surrogate keys, you have to go to the referenced table to get the natural key information.
For example (pure example, so we are not talking DB design here), let's say you have an ORDER table and ORDER_ITEM. If you use ProductId and LineNumber (UPDATE: and as Pedro mentioned OrderId or even better OrderNumber) as the composite primary key in ORDER_ITEM, then in your cross table for SHIPPING, you would be able to have ProductId in SHIPPING_ORDERITEM. This can massively boost your performance if, for example, you have run out of that product and need to find all items with that ProductId that need to be shipped, without needing a join.
On the other hand, if you use a surrogate key, you have to join and you end up with a very inefficient SQL execution plan where it has to do bookmark lookup on several indexes.
See more on bookmark lookups, which become a major issue when using surrogate keys.
Natural primary keys are brittle.
Suppose we have built a system around a natural PK on (CountryCode, PhoneNumber), and several years down the road we need to add Extension, or change the PK to one column: Email. If these PK columns are propagated to all child tables, this becomes very expensive.
A few years ago there were some systems that were built assuming that Social Security Number is a natural PK, and had to be redesigned to use identities, when the SSN became non-unique and nullable.
Because we cannot predict the future, we don't know if later on some change will render obsolete what used to be a perfectly correct and complete model.
The very simple answer is data integrity. If the data is to be useful and accurate then the keys are presumably required. Having an "autogenerated id" doesn't remove the requirement for other keys as well. The alternative is not to enforce uniqueness and accept that data will be duplicated and almost inevitably contain anomalies and lead to errors as a result. Why would you want that?
In short, the purpose of composite keys is to use the database to enforce one or more business rules. In other words: protect the integrity of your data.
Ex. You have a list of parts that you buy from suppliers. You could create your supplier and parts tables like this:
SUPPLIER
SupplierId
SupplierName
PART
PartId
PartName
SupplierId
Uh oh. The parts table allows for duplicate data. Since you used a surrogate key that was autogenerated, you're not enforcing the fact that a part from a supplier should only be entered once. Instead, you should create the PART table like such:
PART
SupplierId
SupplierPartId
PartName
In this example, your parts come from specific suppliers and you want to enforce the rule: "A single supplier can only supply a single part once" in the PART table. Hence, the composite key. Your composite key prevents accidental duplicate entry of a part.
You can always leave business rules out of your database and leave them to your application, but by keeping the rule in the database (via a composite key), you ensure that the business rule is enforced everywhere, especially if you should ever decide to allow multiple applications to access the data.
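Here is a quick sketch of the two PART designs, run through Python's sqlite3 module so you can see the difference. The table names part_surrogate and part_composite and the sample values are invented for the comparison; the columns follow the layouts above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE supplier (
    supplier_id   INTEGER PRIMARY KEY,
    supplier_name TEXT NOT NULL
);
-- Design 1: autogenerated surrogate key only
CREATE TABLE part_surrogate (
    part_id     INTEGER PRIMARY KEY,
    part_name   TEXT NOT NULL,
    supplier_id INTEGER NOT NULL REFERENCES supplier (supplier_id)
);
-- Design 2: composite key (supplier + the supplier's own part id)
CREATE TABLE part_composite (
    supplier_id      INTEGER NOT NULL REFERENCES supplier (supplier_id),
    supplier_part_id TEXT NOT NULL,
    part_name        TEXT NOT NULL,
    PRIMARY KEY (supplier_id, supplier_part_id)
);
INSERT INTO supplier VALUES (1, 'Acme');
""")


def insert_twice(table, columns, row):
    """Insert the same row twice; return how many inserts were accepted."""
    placeholders = ", ".join("?" for _ in row)
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    accepted = 0
    for _ in range(2):
        try:
            with conn:
                conn.execute(sql, row)
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # duplicate rejected by the key
    return accepted


surrogate_ok = insert_twice("part_surrogate", "part_name, supplier_id",
                            ("Widget", 1))
composite_ok = insert_twice("part_composite",
                            "supplier_id, supplier_part_id, part_name",
                            (1, "W-100", "Widget"))
print(surrogate_ok, composite_ok)
```

The surrogate-only table accepts the duplicate, while the composite key rejects the second entry. A UNIQUE constraint on (supplier_id, supplier_part_id) next to a surrogate key would reject it as well.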
Just as functions encapsulate a set of instructions, or database views abstract base table connections, so too do surrogate keys abstract the meaning of the entity they are placed on.
If, for example, you have a table that holds vehicle data, applying a surrogate VehicleId abstracts what it means to be a vehicle from a data point of view. When you reference VehicleId = 1, you are most surely talking about a vehicle of some sort, but do we know if it is a 2008 Chevy Impala, or a 1991 Ford F-150? No. Can the underlying data of whatever Vehicle #1 is change at any time? Yes.
Short answer: Multi-column foreign keys naturally refer to multi column primary keys. There can still be an autogenerated id column that is part of the primary key.
Philosophical answer: The primary key is the identity of the row. If there is a bit of information that is an intrinsic part of the identity of the row (such as which customer the article belongs to, in a multi-customer wiki), that information should be part of the primary key.
An example: System for organizing LAN parties
The system supports several LAN parties with the same people and organizers attending thus:
CREATE TABLE users ( users_id serial PRIMARY KEY, ... );
And there are several parties:
CREATE TABLE parties ( parties_id serial PRIMARY KEY, ... );
But most of the other stuff needs to carry the information about which party it is linked to:
CREATE TABLE ticket_types (
ticket_types_id serial,
parties_id integer REFERENCES parties,
name text,
....
PRIMARY KEY(ticket_types_id, parties_id)
);
...this is because we want to refer to primary keys. Foreign key on table attendances points to table ticket_types.
CREATE TABLE attendances (
attendances_id serial,
parties_id integer REFERENCES parties,
ticket_types_id integer,
PRIMARY KEY (attendances_id, parties_id),
FOREIGN KEY (ticket_types_id, parties_id) REFERENCES ticket_types (ticket_types_id, parties_id)
);
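To check what the composite foreign key to ticket_types buys you, here is a cut-down sketch of the same schema run through Python's sqlite3 module. It assumes SQLite in place of PostgreSQL (so serial becomes a plain INTEGER), trims the elided columns, and uses made-up sample rows. An attendance can only use a ticket type that belongs to the same party.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE parties (
    parties_id INTEGER PRIMARY KEY,
    name       TEXT
);
CREATE TABLE ticket_types (
    ticket_types_id INTEGER NOT NULL,
    parties_id      INTEGER NOT NULL REFERENCES parties (parties_id),
    name            TEXT,
    PRIMARY KEY (ticket_types_id, parties_id)
);
CREATE TABLE attendances (
    attendances_id  INTEGER NOT NULL,
    parties_id      INTEGER NOT NULL REFERENCES parties (parties_id),
    ticket_types_id INTEGER NOT NULL,
    PRIMARY KEY (attendances_id, parties_id),
    FOREIGN KEY (ticket_types_id, parties_id)
        REFERENCES ticket_types (ticket_types_id, parties_id)
);
INSERT INTO parties VALUES (1, 'Winter LAN'), (2, 'Summer LAN');
INSERT INTO ticket_types VALUES (10, 1, 'Weekend pass');  -- party 1 only
""")


def try_attend(attendances_id, parties_id, ticket_types_id):
    """Return True if the attendance row is accepted, False if rejected."""
    try:
        with conn:
            conn.execute("INSERT INTO attendances VALUES (?, ?, ?)",
                         (attendances_id, parties_id, ticket_types_id))
        return True
    except sqlite3.IntegrityError:
        return False


# Ticket type 10 belongs to party 1, so this is accepted.
same_party = try_attend(1, 1, 10)
# Same ticket type used for party 2: no (10, 2) row exists, so it is rejected.
other_party = try_attend(2, 2, 10)
print(same_party, other_party)
```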
While I prefer surrogate keys, I use composite keys in a few cases. The composite key may consist entirely or partially of surrogate key fields.
Many to many join tables. These usually require a unique key on the key pair anyway. In some cases additional columns may be included in the key.
Weak child tables. Things like order lines do not stand on their own. In this case I use the parent (orders) table's primary key as part of the composite key.
When there are multiple weak tables related to an entity, it may be possible to eliminate a table from the join set when querying child data. In the case of grandchild tables, it is possible to join the grandparent to grandchild without involving the table in the middle.
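A small sketch of the grandparent-to-grandchild join, again with Python's sqlite3 module. The customers, orders, and order_lines tables and their data are hypothetical. Because the customer key is carried down into order_lines as part of its composite key, the query can skip orders entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
    order_no    INTEGER NOT NULL,
    PRIMARY KEY (customer_id, order_no)
);
CREATE TABLE order_lines (
    customer_id INTEGER NOT NULL,
    order_no    INTEGER NOT NULL,
    line_no     INTEGER NOT NULL,
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (customer_id, order_no, line_no),
    FOREIGN KEY (customer_id, order_no)
        REFERENCES orders (customer_id, order_no)
);
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO orders VALUES (1, 1), (1, 2), (2, 1);
INSERT INTO order_lines VALUES
    (1, 1, 1, 2), (1, 1, 2, 3), (1, 2, 1, 5), (2, 1, 1, 4);
""")

# Grandparent joined straight to grandchild; the orders table is not needed.
totals = conn.execute("""
    SELECT c.name, SUM(l.quantity)
    FROM customers AS c
    JOIN order_lines AS l ON l.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.customer_id
""").fetchall()
print(totals)
```

With surrogate keys on orders and order_lines, the same query would have to go through orders to reach the customer.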