I have a database with a polymorphic structure: a "base" type table and two "derived" types:
CREATE TABLE ContactMethod(
id integer PRIMARY KEY,
person_id integer,
priority integer,
allow_solicitation boolean,
FOREIGN KEY(person_id) REFERENCES People(id)
);
CREATE TABLE PhoneNumbers(
contact_method_id integer PRIMARY KEY,
phone_number varchar,
FOREIGN KEY(contact_method_id) REFERENCES ContactMethod(id)
);
CREATE TABLE EmailAddresses(
contact_method_id integer PRIMARY KEY,
email_address varchar,
FOREIGN KEY(contact_method_id) REFERENCES ContactMethod(id)
);
I want to prevent orphaned ContactMethod records from existing, that is, a ContactMethod record with neither a corresponding PhoneNumber record nor an EmailAddress record. I've seen techniques for ensuring exclusivity (preventing a ContactMethod record with both a related PhoneNumber and EmailAddress), but not for preventing orphans.
One idea is a CHECK constraint that executes a custom function that executes queries. However, executing queries via functions in CHECK constraints is a bad idea.
Another idea is a View that will trigger a violation if an orphaned ContactMethod record is added. The "obvious" way to do this is to put a constraint on the View, but that's not allowed. So it has to be some sort of trick, probably involving an index on the View. Is that really the best (only?) way to enforce no orphans? If so, what is a working example?
Are there other ways? I could get rid of the ContactMethod table and duplicate the shared columns on the other two tables, but I don't want to do that. I'm primarily curious about capabilities available in MySQL and SQLite, but a solution in any SQL engine would be helpful.
The simplest solution would be to use single table inheritance. So both the contact methods are optional (that is, nullable) fields in the ContactMethod table, but you add a CHECK constraint to ensure at least one of these has a non-null value.
CREATE TABLE ContactMethod(
id integer PRIMARY KEY,
person_id integer,
priority integer,
allow_solicitation boolean,
phone_number varchar DEFAULT NULL,
email_address varchar DEFAULT NULL,
FOREIGN KEY(person_id) REFERENCES People(id),
CHECK (COALESCE(phone_number, email_address) IS NOT NULL)
);
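To see the CHECK in action, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The People foreign key is left out so the snippet stands alone, and the try_insert helper and the sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ContactMethod (
        id                 INTEGER PRIMARY KEY,
        person_id          INTEGER,   -- FK to People omitted in this sketch
        priority           INTEGER,
        allow_solicitation BOOLEAN,
        phone_number       VARCHAR DEFAULT NULL,
        email_address      VARCHAR DEFAULT NULL,
        CHECK (COALESCE(phone_number, email_address) IS NOT NULL)
    )
""")


def try_insert(phone, email):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute(
            "INSERT INTO ContactMethod"
            " (person_id, priority, allow_solicitation, phone_number, email_address)"
            " VALUES (1, 1, 0, ?, ?)",
            (phone, email),
        )
        return True
    except sqlite3.IntegrityError:
        return False


phone_only = try_insert("555-0100", None)       # accepted
email_only = try_insert(None, "a@example.com")  # accepted
neither = try_insert(None, None)                # rejected by the CHECK
row_count = conn.execute("SELECT COUNT(*) FROM ContactMethod").fetchone()[0]
```

Note that this only guarantees "at least one"; a row with both a phone number and an e-mail address is still accepted.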
Another solution that supports polymorphic associations is to reverse the direction of the foreign key. Give ContactMethod one nullable foreign key for each type of associated method, and use a CHECK to make sure at least one has a non-null value. This works because you don't allow multiple emails or phones per row in ContactMethod. It does mean that if you add a different type of contact (e.g. a Signal account), you'd have to add another foreign key to this table.
CREATE TABLE ContactMethod(
id integer PRIMARY KEY,
person_id integer,
priority integer,
allow_solicitation boolean,
phone_number_id integer DEFAULT NULL,
email_address_id integer DEFAULT NULL,
FOREIGN KEY(person_id) REFERENCES People(id),
FOREIGN KEY(phone_number_id) REFERENCES PhoneNumbers(id),
FOREIGN KEY(email_address_id) REFERENCES EmailAddresses(id),
CHECK (COALESCE(phone_number_id, email_address_id) IS NOT NULL)
);
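A rough sketch of the reversed design, again with Python's sqlite3 module. It assumes PhoneNumbers and EmailAddresses now carry their own id column (which the table above implies), drops the People reference to stay self-contained, and uses a hypothetical accepted() helper. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE PhoneNumbers   (id INTEGER PRIMARY KEY, phone_number  VARCHAR);
    CREATE TABLE EmailAddresses (id INTEGER PRIMARY KEY, email_address VARCHAR);
    CREATE TABLE ContactMethod (
        id               INTEGER PRIMARY KEY,
        person_id        INTEGER,
        phone_number_id  INTEGER DEFAULT NULL REFERENCES PhoneNumbers(id),
        email_address_id INTEGER DEFAULT NULL REFERENCES EmailAddresses(id),
        CHECK (COALESCE(phone_number_id, email_address_id) IS NOT NULL)
    );
    INSERT INTO PhoneNumbers (id, phone_number) VALUES (1, '555-0100');
""")


def accepted(phone_id, email_id):
    """Return True if ContactMethod accepts the row, False otherwise."""
    try:
        conn.execute(
            "INSERT INTO ContactMethod (person_id, phone_number_id, email_address_id)"
            " VALUES (1, ?, ?)",
            (phone_id, email_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


with_phone = accepted(1, None)       # phone 1 exists: accepted
no_method = accepted(None, None)     # CHECK fails: rejected
dangling = accepted(42, None)        # no PhoneNumbers row 42: FK fails
```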
A newly inserted ContactMethod will always be orphaned until you insert a phone number or an e-mail address. So, you cannot test the condition at insert.
Instead, you could insert contact information with a stored procedure having an optional phone number and optional e-mail parameter in addition to the base information. The base record would only be inserted if at least one of the two has a non-null value.
Then create a delete trigger that fires when a phone number or an e-mail address is deleted, to either delete the ContactMethod record when no related record exists anymore or to raise an exception, as shown in Alter a Delete Trigger to Check a Column Value.
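As a sketch of that approach: SQLite has no stored procedures, so a Python function (add_contact, a name made up here) stands in for the procedure, and two AFTER DELETE triggers implement the "delete the base record when nothing is left" variant. The People reference is omitted and the sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE ContactMethod (
        id INTEGER PRIMARY KEY, person_id INTEGER,
        priority INTEGER, allow_solicitation BOOLEAN
    );
    CREATE TABLE PhoneNumbers (
        contact_method_id INTEGER PRIMARY KEY REFERENCES ContactMethod(id),
        phone_number VARCHAR
    );
    CREATE TABLE EmailAddresses (
        contact_method_id INTEGER PRIMARY KEY REFERENCES ContactMethod(id),
        email_address VARCHAR
    );
    -- Remove the base row once its last derived row is gone.
    CREATE TRIGGER phone_deleted AFTER DELETE ON PhoneNumbers
    BEGIN
        DELETE FROM ContactMethod
        WHERE id = OLD.contact_method_id
          AND NOT EXISTS (SELECT 1 FROM EmailAddresses
                          WHERE contact_method_id = OLD.contact_method_id);
    END;
    CREATE TRIGGER email_deleted AFTER DELETE ON EmailAddresses
    BEGIN
        DELETE FROM ContactMethod
        WHERE id = OLD.contact_method_id
          AND NOT EXISTS (SELECT 1 FROM PhoneNumbers
                          WHERE contact_method_id = OLD.contact_method_id);
    END;
""")


def add_contact(person_id, priority, allow_solicitation, phone=None, email=None):
    """Insert the base row and its derived row(s) in one transaction."""
    if phone is None and email is None:
        raise ValueError("need a phone number or an e-mail address")
    with conn:  # commits on success, rolls back on any exception
        cur = conn.execute(
            "INSERT INTO ContactMethod (person_id, priority, allow_solicitation)"
            " VALUES (?, ?, ?)",
            (person_id, priority, allow_solicitation),
        )
        cm_id = cur.lastrowid
        if phone is not None:
            conn.execute("INSERT INTO PhoneNumbers VALUES (?, ?)", (cm_id, phone))
        if email is not None:
            conn.execute("INSERT INTO EmailAddresses VALUES (?, ?)", (cm_id, email))
    return cm_id


def base_rows():
    return conn.execute("SELECT COUNT(*) FROM ContactMethod").fetchone()[0]


first = add_contact(1, 1, False, phone="555-0100")
conn.execute("DELETE FROM PhoneNumbers WHERE contact_method_id = ?", (first,))
rows_after_first = base_rows()   # phone was the only method: base row removed

second = add_contact(1, 2, True, phone="555-0101", email="a@example.com")
conn.execute("DELETE FROM PhoneNumbers WHERE contact_method_id = ?", (second,))
rows_after_second = base_rows()  # e-mail remains: base row kept
```

This only helps if every writer goes through add_contact; a plain INSERT into ContactMethod still creates an orphan.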
I am trying to understand the syntax for defining tables, and I noticed that the column definitions include an option to indicate that the column references a column from another table.
If I can already define this here, do I still need to explicitly define a FOREIGN KEY constraint specifying that column again? Why?
I would expect the REFERENCES clause added as a column constraint to already take care of the fact that the column is a foreign key (since it is referencing another table).
Example code for clarity:
a)
create table SAMPLE (
sample_id INT PRIMARY KEY,
client_id INT REFERENCES CLIENT (client_id)
);
b)
create table SAMPLE (
sample_id INT PRIMARY KEY,
client_id INT NOT NULL,
CONSTRAINT fk_sample_client
FOREIGN KEY (client_id) REFERENCES CLIENT (client_id)
);
Does definition (a) ensure that client_id is identified as the foreign key, the same way definition (b) does?
No. In MySQL, REFERENCES as part of the column definition is parsed but ignored.
Important
For users familiar with the ANSI/ISO SQL Standard, please
note that no storage engine, including InnoDB, recognizes or enforces
the MATCH clause used in referential integrity constraint definitions.
Use of an explicit MATCH clause does not have the specified effect,
and also causes ON DELETE and ON UPDATE clauses to be ignored. For
these reasons, specifying MATCH should be avoided.
The MATCH clause in the SQL standard controls how NULL values in a
composite (multiple-column) foreign key are handled when comparing to
a primary key. InnoDB essentially implements the semantics defined by
MATCH SIMPLE, which permit a foreign key to be all or partially NULL.
In that case, the (child table) row containing such a foreign key is
permitted to be inserted, and does not match any row in the referenced
(parent) table. It is possible to implement other semantics using
triggers.
Additionally, MySQL requires that the referenced columns be indexed
for performance. However, InnoDB does not enforce any requirement that
the referenced columns be declared UNIQUE or NOT NULL. The handling of
foreign key references to nonunique keys or keys that contain NULL
values is not well defined for operations such as UPDATE or DELETE
CASCADE. You are advised to use foreign keys that reference only keys
that are both UNIQUE (or PRIMARY) and NOT NULL.
MySQL parses but ignores “inline REFERENCES specifications” (as
defined in the SQL standard) where the references are defined as part
of the column specification. MySQL accepts REFERENCES clauses only
when specified as part of a separate FOREIGN KEY specification. For
more information, see Section 1.7.2.3, “FOREIGN KEY Constraint
Differences”.
Source: https://dev.mysql.com/doc/refman/8.0/en/create-table.html
As Simon shared, inline REFERENCES in column definitions are ignored by MySQL, and this link gives further explanation as to why.
Defining a column to use a REFERENCES tbl_name(col_name) clause has no
actual effect and serves only as a memo or comment to you that the
column which you are currently defining is intended to refer to a
column in another table.
Simply put, the syntax will still only create the column; it will not specify it as a foreign key or carry out any checks on it.
At work we have a big database with unique indexes instead of primary keys and all works fine.
I'm designing a new database for a new project and I have a dilemma:
In DB theory, the primary key is a fundamental element, and that's OK, but in REAL projects what are the advantages and disadvantages of each?
What do you use in projects?
EDIT: ...and what about primary keys and replication on MS SQL server?
What is a unique index?
A unique index on a column is an index on that column that also enforces the constraint that you cannot have two equal values in that column in two different rows. Example:
CREATE TABLE table1 (foo int, bar int);
CREATE UNIQUE INDEX ux_table1_foo ON table1(foo); -- Create unique index on foo.
INSERT INTO table1 (foo, bar) VALUES (1, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (2, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (3, 1); -- OK
INSERT INTO table1 (foo, bar) VALUES (1, 4); -- Fails!
Duplicate entry '1' for key 'ux_table1_foo'
The last insert fails because it violates the unique index on column foo when it tries to insert the value 1 into this column for a second time.
In MySQL a unique constraint allows multiple NULLs.
It is possible to make a unique index on multiple columns.
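A small sketch of both points (a multi-column unique index, and NULLs not counting as duplicates), run against SQLite through Python's sqlite3 module as a stand-in. SQLite, like MySQL, treats NULLs as distinct in a unique index; the index name and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (foo INT, bar INT)")
# The pair (foo, bar) must be unique; either column alone may repeat.
conn.execute("CREATE UNIQUE INDEX ux_table1_foo_bar ON table1 (foo, bar)")


def accepted(foo, bar):
    """Return True if the row is inserted, False on a uniqueness violation."""
    try:
        conn.execute("INSERT INTO table1 (foo, bar) VALUES (?, ?)", (foo, bar))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    accepted(1, 2),     # OK
    accepted(1, 3),     # OK: foo repeats, but the pair is new
    accepted(None, 2),  # OK
    accepted(None, 2),  # OK: rows containing NULL never collide
    accepted(1, 2),     # Fails: the pair (1, 2) already exists
]
```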
Primary key versus unique index
Things that are the same:
A primary key implies a unique index.
Things that are different:
A primary key also implies NOT NULL, but a unique index can be nullable.
There can be only one primary key, but there can be multiple unique indexes.
If there is no clustered index defined then the primary key will be the clustered index.
You can see it like this:
A Primary Key IS Unique
A Unique value doesn't have to be the Representation of the Element
Meaning? Well, a primary key is used to identify the element. If you have a "Person", you would like to have a Personal Identification Number (SSN or such) which is Primary to your Person.
On the other hand, the person might have an e-mail which is unique, but doesn't identify the person.
I always have Primary Keys, even in relationship tables ( the mid-table / connection table ) I might have them. Why? Well I like to follow a standard when coding, if the "Person" has an identifier, the Car has an identifier, well, then the Person -> Car should have an identifier as well!
Foreign keys work with unique constraints as well as primary keys. From Books Online:
A FOREIGN KEY constraint does not have
to be linked only to a PRIMARY KEY
constraint in another table; it can
also be defined to reference the
columns of a UNIQUE constraint in
another table
For transactional replication, you need the primary key. From Books Online:
Tables published for transactional
replication must have a primary key.
If a table is in a transactional
replication publication, you cannot
disable any indexes that are
associated with primary key columns.
These indexes are required by
replication. To disable an index, you
must first drop the table from the
publication.
Both answers are for SQL Server 2005.
The choice of when to use a surrogate primary key as opposed to a natural key is tricky. Answers such as, always or never, are rarely useful. I find that it depends on the situation.
As an example, I have the following tables:
CREATE TABLE toll_booths (
id INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
...
UNIQUE(name)
)
CREATE TABLE cars (
vin VARCHAR(17) NOT NULL PRIMARY KEY,
license_plate VARCHAR(10) NOT NULL,
...
UNIQUE(license_plate)
)
CREATE TABLE drive_through (
id INTEGER NOT NULL PRIMARY KEY,
toll_booth_id INTEGER NOT NULL REFERENCES toll_booths(id),
vin VARCHAR(17) NOT NULL REFERENCES cars(vin),
at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
amount NUMERIC(10,4) NOT NULL,
...
UNIQUE(toll_booth_id, vin)
)
We have two entity tables (toll_booths and cars) and a transaction table (drive_through). The toll_booths table uses a surrogate key because it has no natural attribute that is guaranteed not to change (the name can easily be changed). The cars table uses a natural primary key because it has a non-changing unique identifier (vin). The drive_through transaction table uses a surrogate key for easy identification, but also has a unique constraint on the attributes that are guaranteed to be unique at the time the record is inserted.
http://database-programmer.blogspot.com has some great articles on this particular subject.
There are no disadvantages of primary keys.
To add some information to #MrWiggles' and #Peter Parker's answers: when a table doesn't have a primary key, you won't be able to edit its data in some applications (they will end up saying something like "cannot edit / delete data without primary key"). PostgreSQL allows multiple NULL values in a UNIQUE column, whereas a PRIMARY KEY doesn't allow NULLs. Also, some ORMs that generate code may have problems with tables without primary keys.
UPDATE:
As far as I know it is not possible to replicate tables without primary keys in MSSQL, at least without problems (details).
If something is a primary key, depending on your DB engine, the entire table gets sorted by the primary key. This means that lookups are much faster on the primary key because it doesn't have to do any dereferencing as it has to do with any other kind of index. Besides that, it's just theory.
In addition to what the other answers have said, some databases and systems may require a primary to be present. One situation comes to mind; when using enterprise replication with Informix a PK must be present for a table to participate in replication.
As long as you do not allow NULL for a value, they should be handled the same, but NULL is handled differently across databases (AFAIK MS-SQL does not allow more than one NULL value in a UNIQUE column, while MySQL and Oracle do).
So you should declare the column NOT NULL and put a UNIQUE index on it.
There is no such thing as a primary key in relational data theory, so your question has to be answered on the practical level.
Unique indexes are not part of the SQL standard. The particular implementation of a DBMS will determine what are the consequences of declaring a unique index.
In Oracle, declaring a primary key will result in a unique index being created on your behalf, so the question is almost moot. I can't tell you about other DBMS products.
I favor declaring a primary key. This has the effect of forbidding NULLs in the key column(s) as well as forbidding duplicates. I also favor declaring REFERENCES constraints to enforce referential integrity. In many cases, declaring an index on the column(s) of a foreign key will speed up joins. This kind of index should in general not be unique.
There are some disadvantages of CLUSTERED INDEXES vs UNIQUE INDEXES.
As already stated, a CLUSTERED INDEX physically orders the data in the table.
This means that when you have a lot of inserts or deletes on a table containing a clustered index, every time (well, almost, depending on your fill factor) you change the data, the physical table needs to be updated to stay sorted.
In relatively small tables, this is fine, but when you get to tables that have GBs worth of data, and inserts/deletes affect the sorting, you will run into problems.
I almost never create a table without a numeric primary key. If there is also a natural key that should be unique, I also put a unique index on it. Joins are faster on integers than multicolumn natural keys, data only needs to change in one place (natural keys tend to need to be updated which is a bad thing when it is in primary key - foreign key relationships). If you are going to need replication use a GUID instead of an integer, but for the most part I prefer a key that is user readable especially if they need to see it to distinguish between John Smith and John Smith.
The few times I don't create a surrogate key are when I have a joining table that is involved in a many-to-many relationship. In this case I declare both fields as the primary key.
My understanding is that a primary key and a unique index with a not‑null constraint are the same (*), and I suppose one chooses one or the other depending on what the specification explicitly states or implies (a matter of what you want to express and explicitly enforce). If it requires uniqueness and not‑null, then make it a primary key. If it just happens that all parts of a unique index are not‑null without any requirement for that, then just make it a unique index.
The sole remaining difference is, you may have multiple not‑null unique indexes, while you can't have multiple primary keys.
(*) Excepting a practical difference: a primary key can be the default unique key for some operations, like defining a foreign key. For example, if one defines a foreign key referencing a table and does not provide the column name, and the referenced table has a primary key, then the primary key will be the referenced column. Otherwise, the referenced column will have to be named explicitly.
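For what it's worth, here is a sketch of that default in SQLite, driven from Python's sqlite3 module. SQLite is one engine where a REFERENCES clause without a column list falls back to the parent's primary key; other engines may behave differently. The client and sample tables are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE client (client_id INTEGER PRIMARY KEY, name TEXT)")
# No column named after REFERENCES: the parent's primary key is used.
conn.execute("""
    CREATE TABLE sample (
        sample_id INTEGER PRIMARY KEY,
        client_id INTEGER NOT NULL REFERENCES client
    )
""")
conn.execute("INSERT INTO client (client_id, name) VALUES (1, 'Acme')")

conn.execute("INSERT INTO sample (sample_id, client_id) VALUES (10, 1)")  # parent exists

try:
    conn.execute("INSERT INTO sample (sample_id, client_id) VALUES (11, 99)")
    missing_parent_rejected = False
except sqlite3.IntegrityError:
    missing_parent_rejected = True  # no client with client_id 99

sample_rows = conn.execute("SELECT COUNT(*) FROM sample").fetchone()[0]
```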
Others here have mentioned DB replication, but I don't know about it.
In SQL Server, a unique index can contain one NULL value. By default it is created as a NON-CLUSTERED INDEX.
A primary key cannot contain a NULL value. By default it is created as a CLUSTERED INDEX.
In MSSQL, Primary keys should be monotonically increasing for best performance on the clustered index. Therefore an integer with identity insert is better than any natural key that might not be monotonically increasing.
If it were up to me...
You need to satisfy the requirements of the database and of your applications.
Adding an auto-incrementing integer or long id column to every table to serve as the primary key takes care of the database requirements.
You would then add at least one other unique index to the table for use by your application. This would be the index on employee_id, or account_id, or customer_id, etc. If possible, this index should not be a composite index.
I would favor indices on several fields individually over composite indices. The database can use a single-field index whenever the where clause includes that field, but it can use a composite index efficiently only when the where clause constrains the index's leading field(s) - meaning it can't use the second field in a composite index unless you provide both the first and second in your where clause.
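You can check this on your own engine by looking at the query plan. A minimal sketch with SQLite via Python's sqlite3 module follows; the employees table, the index name and the plan() helper are invented for the example, and the exact plan text differs between engines and versions. The order in which the conditions are written in the where clause does not matter here; what matters is whether the leading column of the index is constrained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT, salary INTEGER
    )
""")
conn.execute("CREATE INDEX ix_employees_name ON employees (last_name, first_name)")


def plan(where):
    """Return SQLite's query-plan description for a SELECT with this WHERE."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM employees WHERE " + where
    ).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column is the plan text


# Leading column only: the composite index is usable.
plan_first = plan("last_name = 'Smith'")
# Both columns, written in the "wrong" order: still usable.
plan_both = plan("first_name = 'John' AND last_name = 'Smith'")
# Second column only: the index is not used, the table is scanned.
plan_second = plan("first_name = 'John'")

for label, text in [("first", plan_first), ("both", plan_both), ("second", plan_second)]:
    print(label, "->", text)
```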
I am all for using calculated or Function type indices - and would recommend using them over composite indices. It makes it very easy to use the function index by using the same function in your where clause.
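A quick sketch of a function index, using SQLite's expression indexes (available since SQLite 3.9) through Python's sqlite3 module. The customers table, the index name and the plan() helper are assumptions for illustration. The index is only considered when the where clause uses the same expression as the index definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
# Index the lower-cased e-mail so case-insensitive lookups can use it.
conn.execute("CREATE INDEX ix_customers_email_lower ON customers (lower(email))")


def plan(where):
    """Return SQLite's query-plan description for a SELECT with this WHERE."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM customers WHERE " + where
    ).fetchall()
    return " | ".join(row[3] for row in rows)


# Same expression as in the index definition: the index is used.
plan_same_function = plan("lower(email) = 'a@example.com'")
# Bare column: the expression index does not apply.
plan_bare_column = plan("email = 'A@example.com'")

print(plan_same_function)
print(plan_bare_column)
```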
This takes care of your application requirements.
It is highly likely that other non-primary indices actually map that index's key value to a primary key value, not to rowid()s. This allows physical sorting operations and deletes to occur without having to recreate these indices.
I have a problem with foreign keys in MySQL or probably I am just thinking in the wrong direction... I have an activity log table where I need to reference key values from currently 2 other tables. So I am using a field that contains that foreign key value along with an indicator stating which table that foreign key value is from.
Table activitylog
...
RefID INT NOT NULL,
RefType INT NOT NULL,
...
Table offers
OfferID INT NOT NULL,
...
Table orders
OrderID INT NOT NULL,
...
If the user created an offer, the value of OfferID from table offers would be written to RefID of the activity log and RefType is set to 1. If it was an order, then the value of OrderID goes into RefID and RefType is set to 2.
Of course I could add an additional field, name it OrderID, rename RefID to OfferID, discard RefType and use these fields. But if a new entity is introduced in the future, I would have to add another field holding the key values of the new entity, instead of just introducing RefType 3 and continuing to store the key values in RefID.
I am now struggling with the definition of the foreign key constraints. The logic would be: if RefType = 1, look up the key in offers; if RefType = 2, look in orders.
Does anybody know if there is a way to achieve my current concept or do I have to add additional fields to the activitylog?
No. MySQL doesn't support enforcement of FOREIGN KEY constraints like you explain, a single column referencing multiple tables.
You could define the constraints with the MyISAM engine, but the FK constraints wouldn't be enforced.
If you define the FK constraints for tables using the InnoDB engine, then ALL of the foreign key constraints would be enforced, no matter what values are stored in other columns.
To have FK constraints on a table to reference two (or more) different, independent parent tables, you'd need two (or more) foreign keys columns, one for each table.
With your table design with InnoDB, you'd have to forgo declarative FOREIGN KEY constraints.
It might be possible for you to roll-your-own constraints by writing some messy triggers; have the trigger throw an exception when one of your constraint rules is violated.
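To give an idea of what such a trigger might look like, here is a sketch in SQLite (through Python's sqlite3 module) rather than MySQL. SQLite raises the error with RAISE(ABORT, ...), whereas a MySQL trigger would use SIGNAL instead. The trigger name, the id column and the helper are made up, and only INSERT is covered; updates to activitylog and deletes from offers/orders would each need their own trigger, which is where it gets messy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE offers (OfferID INTEGER PRIMARY KEY);
    CREATE TABLE orders (OrderID INTEGER PRIMARY KEY);
    CREATE TABLE activitylog (
        id      INTEGER PRIMARY KEY,
        RefID   INT NOT NULL,
        RefType INT NOT NULL          -- 1 = offer, 2 = order
    );
    CREATE TRIGGER activitylog_ref_check BEFORE INSERT ON activitylog
    BEGIN
        SELECT RAISE(ABORT, 'RefID not found in offers')
        WHERE NEW.RefType = 1
          AND NOT EXISTS (SELECT 1 FROM offers WHERE OfferID = NEW.RefID);
        SELECT RAISE(ABORT, 'RefID not found in orders')
        WHERE NEW.RefType = 2
          AND NOT EXISTS (SELECT 1 FROM orders WHERE OrderID = NEW.RefID);
        SELECT RAISE(ABORT, 'unknown RefType')
        WHERE NEW.RefType NOT IN (1, 2);
    END;
    INSERT INTO offers (OfferID) VALUES (1);
    INSERT INTO orders (OrderID) VALUES (7);
""")


def log_accepted(ref_id, ref_type):
    """Return True if the activity row is inserted, False if the trigger aborts."""
    try:
        conn.execute(
            "INSERT INTO activitylog (RefID, RefType) VALUES (?, ?)",
            (ref_id, ref_type),
        )
        return True
    except sqlite3.DatabaseError:
        return False


offer_ok = log_accepted(1, 1)      # offer 1 exists
order_ok = log_accepted(7, 2)      # order 7 exists
wrong_table = log_accepted(7, 1)   # there is no offer 7
bad_type = log_accepted(1, 3)      # RefType 3 is not defined
```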
I created my MySQL database using phpMyAdmin 3.5.8.1deb1 in Ubuntu.
Although all my tables are InnoDB, I can't add a foreign key. Here is an example:
ALTER TABLE element ADD CONSTRAINT FK_element_id_user FOREIGN KEY (id_user) REFERENCES user(id) ON DELETE SET NULL ON UPDATE CASCADE;
When I run this script I get this error :
#1005 - Can't create table 'tpw.#sql-4d8_e2' (errno: 150) (Details...)
When I click on details I get this :
InnoDB Documentation Supports transactions, row-level locking, and foreign keys
I tried to add the FK manually in the relation view
There could be a couple of things going on here. Here are some things to look for:
Do the data types of each field between the tables match?
Are both tables using the same MySQL engine?
Here is a good resource to help you debug this issue further.
Excerpt from the resource linked to above:
1) The two key fields type and/or size is not an exact match. For example, if one is INT(10) the key field needs to be INT(10) as well and not INT(11) or TINYINT. You may want to confirm the field size using SHOW CREATE TABLE because Query Browser will sometimes visually show just INTEGER for both INT(10) and INT(11). You should also check that one is not SIGNED and the other is UNSIGNED. They both need to be exactly the same.
2) One of the key fields that you are trying to reference does not have an index and/or is not a primary key. If one of the fields in the relationship is not a primary key, you must create an index for that field.
3) The foreign key name is a duplicate of an already existing key. Check that the name of your foreign key is unique within your database. Just add a few random characters to the end of your key name to test for this.
4) One or both of your tables is a MyISAM table. In order to use foreign keys, the tables must both be InnoDB. (Actually, if both tables are MyISAM then you won’t get an error message – it just won’t create the key.) In Query Browser, you can specify the table type.
5) You have specified a cascade ON DELETE SET NULL, but the relevant key field is set to NOT NULL. You can fix this by either changing your cascade or setting the field to allow NULL values.
6) Make sure that the Charset and Collate options are the same both at the table level as well as individual field level for the key columns.
7) You have a default value (ie default=0) on your foreign key column.
8) One of the fields in the relationship is part of a combination (composite) key and does not have its own individual index. Even though the field has an index as part of the composite key, you must create a separate index for only that key field in order to use it in a constraint.
9) You have a syntax error in your ALTER statement or you have mistyped one of the field names in the relationship.
10) The name of your foreign key exceeds the max length of 64 chars.
User.ID has to be declared as an INDEX.