"Segmented" foreign keys in MySQL? - mysql

I have a problem with foreign keys in MySQL or probably I am just thinking in the wrong direction... I have an activity log table where I need to reference key values from currently 2 other tables. So I am using a field that contains that foreign key value along with an indicator stating which table that foreign key value is from.
Table activitylog
...
RefID INT NOT NULL,
RefType INT NOT NULL,
...
Table offers
OfferID INT NOT NULL,
...
Table orders
OrderID INT NOT NULL,
...
If the user created an offer, the value of OfferID from table offers would be written to RefID of the activity log and RefType would be set to 1. If it was an order, then the value of OrderID goes into RefID and RefType is set to 2.
Of course I could add an additional field, name it OrderID, rename RefID to OfferID, discard RefType, and use these fields. But if a new entity is introduced in the future, I would have to add another field to hold its key values instead of just inventing RefType 3 and continuing to store the key values in RefID.
I am now struggling with the definition of the foreign key constraints. The logic would be: if RefType = 1, look up the key in offers; if RefType = 2, look it up in orders.
Does anybody know if there is a way to achieve my current concept or do I have to add additional fields to the activitylog?

No. MySQL doesn't support FOREIGN KEY constraints of the kind you describe, where a single column references multiple tables.
You could define the constraints with the MyISAM engine, but the FK constraints wouldn't be enforced.
If you define the FK constraints for tables using the InnoDB engine, then ALL of the foreign key constraints would be enforced, no matter what values are stored in other columns.
To have FK constraints on a table that reference two (or more) different, independent parent tables, you'd need two (or more) foreign key columns, one for each table.
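A minimal sketch of that two-column layout, assuming InnoDB. The LogID column is hypothetical; the other names come from the question. Each reference column is nullable, so a log row fills only the one that applies. The rule "exactly one of the two is set" is not enforced by this sketch and would need a CHECK constraint or a trigger.

```sql
-- Parent tables, reduced to their key columns.
CREATE TABLE offers (
  OfferID INT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE orders (
  OrderID INT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

-- One nullable foreign key column per parent table replaces RefID/RefType.
CREATE TABLE activitylog (
  LogID   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- hypothetical surrogate key
  OfferID INT NULL,
  OrderID INT NULL,
  FOREIGN KEY (OfferID) REFERENCES offers (OfferID),
  FOREIGN KEY (OrderID) REFERENCES orders (OrderID)
) ENGINE=InnoDB;

INSERT INTO offers (OfferID) VALUES (1);
INSERT INTO activitylog (OfferID) VALUES (1);   -- accepted: offer 1 exists
INSERT INTO activitylog (OrderID) VALUES (99);  -- rejected: no order 99
```

A third entity type would mean a third nullable column and a third FOREIGN KEY clause, which is the cost the question was trying to avoid.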
With your table design with InnoDB, you'd have to forgo declarative FOREIGN KEY constraints.
It might be possible for you to roll-your-own constraints by writing some messy triggers; have the trigger throw an exception when one of your constraint rules is violated.
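A rough sketch of such a trigger for the original RefID/RefType design, assuming MySQL 5.5 or later (for SIGNAL) and a fresh schema. The LogID column, the trigger name, and the error messages are made up for illustration. DELIMITER is a command of the mysql command-line client, not SQL, so other clients need a different way to send the trigger body as one statement.

```sql
CREATE TABLE offers (OfferID INT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE orders (OrderID INT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE activitylog (
  LogID   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- hypothetical surrogate key
  RefID   INT NOT NULL,
  RefType INT NOT NULL
) ENGINE=InnoDB;

DELIMITER //
CREATE TRIGGER activitylog_before_insert
BEFORE INSERT ON activitylog
FOR EACH ROW
BEGIN
  IF NEW.RefType = 1 THEN
    -- RefType 1: RefID must be an existing offer.
    IF NOT EXISTS (SELECT 1 FROM offers WHERE OfferID = NEW.RefID) THEN
      SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'RefID does not match any offer';
    END IF;
  ELSEIF NEW.RefType = 2 THEN
    -- RefType 2: RefID must be an existing order.
    IF NOT EXISTS (SELECT 1 FROM orders WHERE OrderID = NEW.RefID) THEN
      SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'RefID does not match any order';
    END IF;
  ELSE
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Unknown RefType';
  END IF;
END//
DELIMITER ;
```

This only guards inserts into activitylog. To get close to what a real foreign key does, you would also need a BEFORE UPDATE trigger on activitylog and triggers on offers and orders that handle deletes and key changes, which is where the mess comes from.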

Related

MySQL create both entities with cyclic foreign key

I want to have an "Entity" and many versions of it, where only one of those versions is active/used. It is also possible that the Entity is entirely inactive. So I thought of using two tables with cyclic foreign keys like this:
CREATE TABLE entity (
id int NOT NULL AUTO_INCREMENT,
-- some extra irrelevant data commented out
active_version_id int DEFAULT NULL,
PRIMARY KEY (id)
);
CREATE TABLE entityversion (
id int NOT NULL AUTO_INCREMENT,
-- some extra irrelevant data commented out
entity_id int NOT NULL,
PRIMARY KEY (id)
);
ALTER TABLE entity ADD FOREIGN KEY (active_version_id) REFERENCES entityversion(id) ON DELETE SET NULL;
ALTER TABLE entityversion ADD FOREIGN KEY (entity_id) REFERENCES entity(id) ON DELETE CASCADE;
When creating a new active Entity, I would like to create its first EntityVersion, which will be its active version, at the same time. The problem is we don't have their ids yet. Currently, we're creating the Entity with "returning id" and using that to create the EntityVersion, also with "returning id", and then updating the active_version_id of that same Entity, so 3 separate commands like this for example:
INSERT INTO entity DEFAULT VALUES RETURNING id;
-- get the ID back and use it as a parameter to the next command
INSERT INTO entityversion (entity_id) VALUES (%s) RETURNING id;
-- again the same thing
UPDATE entity SET active_version_id = %s WHERE id = %s;
I would like to know if there is a shorter way to do this. I also accept as answer a different approach to the table schemas, if it happens to be the better choice. Thanks for the help!
Create both your rows in a stored procedure, or use a before insert trigger if there is no data that only goes in the entityversion version table. To deal with your cyclical id problem, in mariadb use a sequence instead of auto_increment. In mysql, emulate a sequence with an entity_sequence table that only contains the auto_increment id. In your stored procedure/trigger, get the sequence value (with insert..returning id if emulating a sequence), store entityversion using that value, then set the entityversion id to store in your entity row.
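A sketch of the stored-procedure route, assuming MySQL and the entity and entityversion tables from the question already exist. Note that it swaps in LAST_INSERT_ID() instead of the sequence table described above, which keeps it short but is a different technique. The procedure and variable names are hypothetical.

```sql
DELIMITER //
CREATE PROCEDURE create_entity_with_version()
BEGIN
  DECLARE new_entity_id INT;
  DECLARE new_version_id INT;

  START TRANSACTION;

  -- Create the entity with all defaults; active_version_id starts as NULL.
  INSERT INTO entity () VALUES ();
  SET new_entity_id = LAST_INSERT_ID();

  -- Create its first version, pointing back at the entity.
  INSERT INTO entityversion (entity_id) VALUES (new_entity_id);
  SET new_version_id = LAST_INSERT_ID();

  -- Close the cycle by marking that version as the active one.
  UPDATE entity SET active_version_id = new_version_id WHERE id = new_entity_id;

  COMMIT;

  -- Hand both ids back to the caller.
  SELECT new_entity_id AS entity_id, new_version_id AS version_id;
END//
DELIMITER ;

CALL create_entity_with_version();
```

It is still three statements on the server, but the application makes a single call and never has to shuttle ids back and forth.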
You are implying that the entities are 1:1, in which case they may as well be in the same table. (Make one of them NULLable if it is not to be inserted until later.)
If it is 1:many (a 'latest' and many 'older' versions), then the FK only goes one way.
In either case, your "circular" FKs go away.
But to answer your question:
Turn off FK checks
CREATE both tables
Populate both tables
ALTER to add both FKs
Turn on FK checks.
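Those steps could look roughly like this in MySQL, using the tables from the question on a fresh schema. The inserted ids are example values only.

```sql
-- 1. Turn off FK checks for this session.
SET FOREIGN_KEY_CHECKS = 0;

-- 2. Create both tables without foreign keys.
CREATE TABLE entity (
  id int NOT NULL AUTO_INCREMENT,
  active_version_id int DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE entityversion (
  id int NOT NULL AUTO_INCREMENT,
  entity_id int NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- 3. Populate both tables; the rows may point at each other.
INSERT INTO entity (id, active_version_id) VALUES (1, 1);
INSERT INTO entityversion (id, entity_id) VALUES (1, 1);

-- 4. Add both foreign keys.
ALTER TABLE entity ADD FOREIGN KEY (active_version_id)
  REFERENCES entityversion (id) ON DELETE SET NULL;
ALTER TABLE entityversion ADD FOREIGN KEY (entity_id)
  REFERENCES entity (id) ON DELETE CASCADE;

-- 5. Turn FK checks back on.
SET FOREIGN_KEY_CHECKS = 1;
```

Keep in mind that turning the checks back on does not re-verify rows that were written while they were off, so the populate step has to be consistent on its own.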
More
Well, it seems that you have many:1, not 1:1. The "History" has a column that is the "id" into the "Current" ('active') table. No circular FKs. Index that column so you can go the other way efficiently. ON DELETE CASCADE is not practical in either direction.
The FK should go one direction, not both.

sql management studio [duplicate]

At work we have a big database with unique indexes instead of primary keys and all works fine.
I'm designing new database for a new project and I have a dilemma:
In DB theory, primary key is fundamental element, that's OK, but in REAL projects what are advantages and disadvantages of both?
What do you use in projects?
EDIT: ...and what about primary keys and replication on MS SQL server?
What is a unique index?
A unique index on a column is an index on that column that also enforces the constraint that you cannot have two equal values in that column in two different rows. Example:
CREATE TABLE table1 (foo int, bar int);
CREATE UNIQUE INDEX ux_table1_foo ON table1(foo); -- Create unique index on foo.
INSERT INTO table1 (foo, bar) VALUES (1, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (2, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (3, 1); -- OK
INSERT INTO table1 (foo, bar) VALUES (1, 4); -- Fails!
Duplicate entry '1' for key 'ux_table1_foo'
The last insert fails because it violates the unique index on column foo when it tries to insert the value 1 into this column for a second time.
In MySQL a unique constraint allows multiple NULLs.
It is possible to make a unique index on multiple columns.
Primary key versus unique index
Things that are the same:
A primary key implies a unique index.
Things that are different:
A primary key also implies NOT NULL, but a unique index can be nullable.
There can be only one primary key, but there can be multiple unique indexes.
If there is no clustered index defined then the primary key will be the clustered index.
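To make the NULL difference concrete, here is a small sketch in MySQL syntax (an assumption, since other engines treat NULLs in unique indexes differently). The table, column, and index names are hypothetical.

```sql
CREATE TABLE person (
  id    INT NOT NULL AUTO_INCREMENT,
  email VARCHAR(255) NULL,
  PRIMARY KEY (id),                  -- implies NOT NULL and uniqueness
  UNIQUE KEY ux_person_email (email) -- unique, but the column stays nullable
);

INSERT INTO person (email) VALUES (NULL);             -- OK
INSERT INTO person (email) VALUES (NULL);             -- OK in MySQL: several NULLs are allowed
INSERT INTO person (email) VALUES ('a@example.com');  -- OK
INSERT INTO person (email) VALUES ('a@example.com');  -- Fails: duplicate entry for ux_person_email
```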
You can see it like this:
A Primary Key IS Unique
A Unique value doesn't have to be the Representation of the Element
Meaning? Well, a primary key is used to identify the element. If you have a "Person", you would like to have a personal identification number (SSN or such) which is primary to your Person.
On the other hand, the person might have an e-mail address which is unique but doesn't identify the person.
I always have Primary Keys, even in relationship tables ( the mid-table / connection table ) I might have them. Why? Well I like to follow a standard when coding, if the "Person" has an identifier, the Car has an identifier, well, then the Person -> Car should have an identifier as well!
Foreign keys work with unique constraints as well as primary keys. From Books Online:
A FOREIGN KEY constraint does not have
to be linked only to a PRIMARY KEY
constraint in another table; it can
also be defined to reference the
columns of a UNIQUE constraint in
another table
For transactional replication, you need the primary key. From Books Online:
Tables published for transactional
replication must have a primary key.
If a table is in a transactional
replication publication, you cannot
disable any indexes that are
associated with primary key columns.
These indexes are required by
replication. To disable an index, you
must first drop the table from the
publication.
Both answers are for SQL Server 2005.
The choice of when to use a surrogate primary key as opposed to a natural key is tricky. Answers such as, always or never, are rarely useful. I find that it depends on the situation.
As an example, I have the following tables:
CREATE TABLE toll_booths (
id INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
...
UNIQUE(name)
)
CREATE TABLE cars (
vin VARCHAR(17) NOT NULL PRIMARY KEY,
license_plate VARCHAR(10) NOT NULL,
...
UNIQUE(license_plate)
)
CREATE TABLE drive_through (
id INTEGER NOT NULL PRIMARY KEY,
toll_booth_id INTEGER NOT NULL REFERENCES toll_booths(id),
vin VARCHAR(17) NOT NULL REFERENCES cars(vin),
at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
amount NUMERIC(10,4) NOT NULL,
...
UNIQUE(toll_booth_id, vin)
)
We have two entity tables (toll_booths and cars) and a transaction table (drive_through). The toll_booths table uses a surrogate key because it has no natural attribute that is guaranteed not to change (the name can easily be changed). The cars table uses a natural primary key because it has a non-changing unique identifier (vin). The drive_through transaction table uses a surrogate key for easy identification, but also has a unique constraint on the attributes that are guaranteed to be unique at the time the record is inserted.
http://database-programmer.blogspot.com has some great articles on this particular subject.
There are no disadvantages of primary keys.
To add just some information to @MrWiggles' and @Peter Parker's answers: when a table doesn't have a primary key, you won't be able to edit its data in some applications (they will end up saying something like "cannot edit / delete data without primary key"). PostgreSQL allows multiple NULL values in a UNIQUE column, while a PRIMARY KEY doesn't allow NULLs. Also, some ORMs that generate code may have problems with tables without primary keys.
UPDATE:
As far as I know it is not possible to replicate tables without primary keys in MSSQL, at least without problems (details).
If something is a primary key, depending on your DB engine, the entire table gets sorted by the primary key. This means that lookups are much faster on the primary key because it doesn't have to do any dereferencing as it has to do with any other kind of index. Besides that, it's just theory.
In addition to what the other answers have said, some databases and systems may require a primary to be present. One situation comes to mind; when using enterprise replication with Informix a PK must be present for a table to participate in replication.
As long as you do not allow NULL for a value, they should be handled the same, but NULL is handled differently across databases (AFAIK MS-SQL does not allow more than one NULL value if a column is UNIQUE, while MySQL and Oracle do).
So you must define this column as NOT NULL with a UNIQUE INDEX.
There is no such thing as a primary key in relational data theory, so your question has to be answered on the practical level.
Unique indexes are not part of the SQL standard. The particular implementation of a DBMS will determine what are the consequences of declaring a unique index.
In Oracle, declaring a primary key will result in a unique index being created on your behalf, so the question is almost moot. I can't tell you about other DBMS products.
I favor declaring a primary key. This has the effect of forbidding NULLs in the key column(s) as well as forbidding duplicates. I also favor declaring REFERENCES constraints to enforce referential integrity. In many cases, declaring an index on the column(s) of a foreign key will speed up joins. This kind of index should in general not be unique.
There are some disadvantages of CLUSTERED INDEXES vs UNIQUE INDEXES.
As already stated, a CLUSTERED INDEX physically orders the data in the table.
This means that when you have a lot of inserts or deletes on a table containing a clustered index, every time (well, almost, depending on your fill factor) you change the data, the physical table needs to be updated to stay sorted.
In relatively small tables, this is fine, but when you get to tables that have GBs' worth of data, and inserts/deletes affect the sorting, you will run into problems.
I almost never create a table without a numeric primary key. If there is also a natural key that should be unique, I also put a unique index on it. Joins are faster on integers than multicolumn natural keys, data only needs to change in one place (natural keys tend to need to be updated which is a bad thing when it is in primary key - foreign key relationships). If you are going to need replication use a GUID instead of an integer, but for the most part I prefer a key that is user readable especially if they need to see it to distinguish between John Smith and John Smith.
The few times I don't create a surrogate key are when I have a joining table that is involved in a many-to-many relationship. In this case I declare both fields as the primary key.
My understanding is that a primary key and a unique index with a not‑null constraint are the same (*), and I suppose one chooses one or the other depending on what the specification explicitly states or implies (a matter of what you want to express and explicitly enforce). If it requires uniqueness and not‑null, then make it a primary key. If it just happens that all parts of a unique index are not‑null without any requirement for that, then just make it a unique index.
The sole remaining difference is, you may have multiple not‑null unique indexes, while you can't have multiple primary keys.
(*) Excepting a practical difference: a primary key can be the default unique key for some operations, like defining a foreign key. E.g., if one defines a foreign key referencing a table and does not provide the column name, and the referenced table has a primary key, then the primary key will be the referenced column. Otherwise, the referenced column will have to be named explicitly.
Others here have mentioned DB replication, but I don't know about it.
Unique Index can have one NULL value. It creates NON-CLUSTERED INDEX.
Primary Key cannot contain NULL value. It creates CLUSTERED INDEX.
In MSSQL, Primary keys should be monotonically increasing for best performance on the clustered index. Therefore an integer with identity insert is better than any natural key that might not be monotonically increasing.
If it were up to me...
You need to satisfy the requirements of the database and of your applications.
Adding an auto-incrementing integer or long id column to every table to serve as the primary key takes care of the database requirements.
You would then add at least one other unique index to the table for use by your application. This would be the index on employee_id, or account_id, or customer_id, etc. If possible, this index should not be a composite index.
I would favor indices on several fields individually over composite indices. The database will use the single field indices whenever the where clause includes those fields, but it will only use a composite index when the where clause includes its leading fields, meaning it can't use the second field in a composite index unless you provide both the first and second in your where clause.
I am all for using calculated or Function type indices - and would recommend using them over composite indices. It makes it very easy to use the function index by using the same function in your where clause.
This takes care of your application requirements.
It is highly likely that other non-primary indices are actually mappings of that index's key value to a primary key value, not rowid()'s. This allows for physical sorting operations and deletes to occur without having to recreate these indices.

complex mysql constraints over foreign keys

It seems rational to me to stop users or bad code from inserting invalid data, but I don't remember seeing this anywhere!
Consider the following tables
How I can make sure an order is always referencing an address that is created by the same user?
Is this kind of constraint usual and recommended? I mean, Do I even have to care about it in the design?
Since I would not expect a user to be able to place an order without a valid address, I would simply remove the separate FK to the user table and use the combined user id and address id fields from the address table as a foreign key.
CREATE TABLE orders (
-- [COLUMN DEFINITIONS]
address_id BIGINT NOT NULL,
user_id BIGINT NOT NULL,
CONSTRAINT fk_usr_addr FOREIGN KEY (user_id, address_id)
REFERENCES address(user_id, id)
) ENGINE=InnoDB;
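A fuller sketch of the same idea, assuming InnoDB. The users table, the reduced column lists, and the index name are hypothetical, since the question's table definitions are not shown. The referenced table needs an index that starts with (user_id, id) for the composite foreign key to be accepted, so one is declared explicitly.

```sql
CREATE TABLE users (
  id BIGINT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE address (
  id      BIGINT NOT NULL PRIMARY KEY,
  user_id BIGINT NOT NULL,
  -- Index that the composite foreign key in orders can reference.
  UNIQUE KEY ux_address_user_id (user_id, id),
  FOREIGN KEY (user_id) REFERENCES users (id)
) ENGINE=InnoDB;

CREATE TABLE orders (
  id         BIGINT NOT NULL PRIMARY KEY,
  user_id    BIGINT NOT NULL,
  address_id BIGINT NOT NULL,
  -- The pair must exist in address, so the address must belong to this user.
  CONSTRAINT fk_usr_addr FOREIGN KEY (user_id, address_id)
    REFERENCES address (user_id, id)
) ENGINE=InnoDB;

INSERT INTO users (id) VALUES (1), (2);
INSERT INTO address (id, user_id) VALUES (10, 1), (20, 2);

INSERT INTO orders (id, user_id, address_id) VALUES (100, 1, 10);  -- accepted
INSERT INTO orders (id, user_id, address_id) VALUES (101, 1, 20);  -- rejected: address 20 belongs to user 2
```

Because address.user_id still references users, an order's user_id is indirectly guaranteed to be a real user, which is why the separate FK from orders to users can go.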
If an order is incomplete and does not have an address yet, then this should not be an issue for a multi column foreign key, since according to mysql documentation on using foreign keys:
The MATCH clause in the SQL standard controls how NULL values in a
composite (multiple-column) foreign key are handled when comparing to
a primary key. MySQL essentially implements the semantics defined by
MATCH SIMPLE, which permit a foreign key to be all or partially NULL.
In that case, the (child table) row containing such a foreign key is
permitted to be inserted, and does not match any row in the referenced
(parent) table. It is possible to implement other semantics using
triggers.

Database Design: use a non-key as a FK?

Say I have the following table:
TABLE: widget
- widget_id (not null, unique, auto-increment)
- model_name (PK)
- model_year (PK)
model_name and model_year make up a composite key. Is there any problem to using widget_id as a FK in another table?
A key is any number of columns that can be used to uniquely identify each row within the table.
In the example you've shown, your widget table has two keys:
model_name, model_year
widget_id
In standard SQL, a foreign key may reference any declared key on the referenced table (either primary key or unique). I'd need to check MySQL's compliance.
From MySQL reference manual on foreign keys:
InnoDB permits a foreign key to reference any index column or group of columns. However, in the referenced table, there must be an index where the referenced columns are listed as the first columns in the same order.
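For the widget case that could look something like the following, assuming InnoDB. The column types, the index name, and the sprocket table are guesses for illustration. The unique index on widget_id is what the foreign key references, and it also satisfies MySQL's rule that an AUTO_INCREMENT column must be part of a key.

```sql
CREATE TABLE widget (
  widget_id  INT NOT NULL AUTO_INCREMENT,
  model_name VARCHAR(100) NOT NULL,
  model_year SMALLINT NOT NULL,
  PRIMARY KEY (model_name, model_year),
  UNIQUE KEY ux_widget_id (widget_id)  -- the key the FK below points at
) ENGINE=InnoDB;

CREATE TABLE sprocket (
  part_number INT NOT NULL PRIMARY KEY,
  widget_id   INT NOT NULL,
  FOREIGN KEY (widget_id) REFERENCES widget (widget_id)
) ENGINE=InnoDB;
```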
As an alternative, if you wish to use the composite key from your referencing table, you'd have two columns in that table that correspond to model_name and model_year, and would then declare your foreign key constraint as:
ALTER TABLE OtherTable ADD CONSTRAINT FK_OtherTable_Widgets
FOREIGN KEY (model_name, model_year)
REFERENCES widget (model_name, model_year);
Re InnoDB vs MyISAM, in the docs for ALTER TABLE
The FOREIGN KEY and REFERENCES clauses are supported by the InnoDB storage engine, which implements ADD [CONSTRAINT [symbol]] FOREIGN KEY (...) REFERENCES ... (...). See Section 13.6.4.4, “FOREIGN KEY Constraints”. For other storage engines, the clauses are parsed but ignored. The CHECK clause is parsed but ignored by all storage engines. See Section 12.1.17, “CREATE TABLE Syntax”. The reason for accepting but ignoring syntax clauses is for compatibility, to make it easier to port code from other SQL servers, and to run applications that create tables with references. See Section 1.8.5, “MySQL Differences from Standard SQL”.
I have no experience specific to MySQL, but I do with database modeling in general.
It is really important to understand the difference between primary and secondary keys.
Even if many databases (I know for sure Oracle does) permit specifying a unique (simple or composite) key as the FK target, this is not considered a best practice. Use the PK instead.
An FK to a secondary key should, in my opinion, be used only to relate to tables that are not under your control.
In your specific case, I would certainly FK to widget_id: that is because widget_id should be your PK, and the composite only made unique (and not null of course). This leads to better performance in many cases, as you join only one column in queries, and is generally considered a best practice (google 'surrogate key' for more info).
MySQL will create an index on the column if there isn't one it can use:
InnoDB requires indexes on foreign
keys and referenced keys so that
foreign key checks can be fast and not
require a table scan. In the
referencing table, there must be an
index where the foreign key columns
are listed as the first columns in the
same order. Such an index is created
on the referencing table automatically
if it does not exist. (This is in
contrast to some older versions, in
which indexes had to be created
explicitly or the creation of foreign
key constraints would fail.)
index_name, if given, is used as
described previously.
http://dev.mysql.com/doc/refman/5.1/en/innodb-foreign-key-constraints.html
Let's say you have two tables. The first table looks like this:
TABLE: widget
- model_name (PK)
- model_year (PK)
- widget_id (not null, unique, auto-increment)
If you want to make another table that refers to unique records in the first table, it should look something like this:
TABLE B: sprocket
- part_number (PK)
- blah
- blah
- model_name_widget (FK to widget)
- model_year_widget (FK to widget)
- blah
With compound primary keys, you have to include all key fields in your FK references to make sure that you are uniquely specifying a record.

Can a foreign key reference a non-unique index?

I thought a foreign key meant that a single row must reference a single row, but I'm looking at some tables where this is definitely not the case. Table1 has column1 with a foreign key constraint on column2 in table2, BUT there are many records in table2 with the same value in column2. There's also non-unique index on column2. What does this mean? Does a foreign key constraint simply mean that at least one record must exist with the right values in the right columns? I thought it meant there must be exactly one such record (not sure how nulls fit in to the picture, but I'm less concerned about that at the moment).
update: Apparently, this behavior is specific to MySQL, which is what I was using, but I didn't mention it in my original question.
From MySQL documentation:
InnoDB allows a foreign key constraint to reference a non-unique key. This is an InnoDB extension to standard SQL.
However, there is a practical reason to avoid foreign keys on non-unique columns of the referenced table: what should the semantics of "ON DELETE CASCADE" be in that case?
The documentation further advises:
The handling of foreign key references to nonunique keys or keys that contain NULL values is not well defined (...) You are advised to use foreign keys that reference only UNIQUE (including PRIMARY) and NOT NULL keys.
Your analysis is correct; the keys don't have to be unique, and constraints will act on the set of matching rows. Not usually a useful behavior, but situations can come up where it's what you want.
When this happens, it usually means that two foreign keys are being linked to each other.
Often the table that would contain the key as a primary key isn't even in the schema.
Example: Two tables, COLLEGES and STUDENTS, both contain a column called ZIPCODE.
If we do a quick check on
SELECT * FROM COLLEGES JOIN STUDENTS ON COLLEGES.ZIPCODE = STUDENTS.ZIPCODE
We might discover that the relationship is many to many. If our schema had a table called ZIPCODES, with primary key ZIPCODE, it would be obvious what's really going on.
But our schema has no such table. Just because our schema has no such table doesn't mean that such data doesn't exist, however. Somewhere, out in USPO land, there is just such a table. And both COLLEGES.ZIPCODE and STUDENTS.ZIPCODE are references to that table, even if we don't acknowledge it.
This has more to do with the philosophy of data than the practice of building databases, but it neatly illustrates something fundamental: the data has characteristics that we discover, and not only characteristics that we invent. Of course, what we discover could be what somebody else invented. That's certainly the case with ZIPCODE.
Yes, you can create foreign keys to basically any column(s) in any table. Most times you'll create them to the primary key, though.
If you do use foreign keys that don't point to a primary key, you might also want to create a (non-unique) index to the column(s) being referenced for the sake of performance.
Depends on the RDBMS you're using. I think some do this for you implicitly, or use some other tricks. RTM.
PostgreSQL also refuses this (anyway, even if it is possible, it does not mean it is a good idea):
essais=> CREATE TABLE Cities (name TEXT, country TEXT);
CREATE TABLE
essais=> INSERT INTO Cities VALUES ('Syracuse', 'USA');
INSERT 0 1
essais=> INSERT INTO Cities VALUES ('Syracuse', 'Greece');
INSERT 0 1
essais=> INSERT INTO Cities VALUES ('Paris', 'France');
INSERT 0 1
essais=> INSERT INTO Cities VALUES ('Aramits', 'France');
INSERT 0 1
essais=> INSERT INTO Cities VALUES ('Paris', 'USA');
INSERT 0 1
essais=> CREATE TABLE People (name TEXT, city TEXT REFERENCES Cities(name));
ERROR: there is no unique constraint matching given keys for referenced table "cities"
Necromancing.
As others already said, you shouldn't reference a non-unique key as foreign key.
But what you can do instead (without delete cascade danger) is adding a check-constraint (at least in MS-SQL).
That's not exactly the same as a foreign key, but at least it will prevent the insertion of invalid/orphaned/dead data.
See here for reference (you'll have to port the MS-SQL code to MySQL syntax):
Foreign Key to non-primary key
Edit:
Searching for the reasons for the downvote, according to Mysql CHECK Constraint, MySQL doesn't really support CHECK constraints.
You can define them in your DDL query for compatibility reasons, but they are just ignored...
But as mentioned there, you can create a BEFORE INSERT and BEFORE UPDATE trigger, which will throw an error when the requirements of the data are not met, which is basically the same thing, except that it's an even bigger mess.
As to the question:
I thought a foreign key meant that a single row must reference a
single row, but I'm looking at some tables where this is definitely
not the case.
In any sane RDBMS, this is true.
The fact that this is possible in MySQL is just one more reason why
MySQL is an in-sane RDBMS.
It may be fast, but sacrificing referential integrity and data quality on the altar of speed is not my idea of a quality-rdbms.
In fact, if it's not ACID-compliant, it's not really a (correctly functioning) RDBMS at all.
What database are we talking about? In SQL 2005, I cannot create a foreign key constraint that references a column that does not have a unique constraint (primary key or otherwise).
create table t1
(
id int identity,
fk int
);
create table t2
(
id int identity,
);
CREATE NONCLUSTERED INDEX [IX_t2] ON [t2]
(
[id] ASC
);
ALTER TABLE t1 with NOCHECK
ADD CONSTRAINT FK_t2 FOREIGN KEY (fk)
REFERENCES t2 (id) ;
Msg 1776, Level 16, State 0, Line 1
There are no primary or candidate keys in the referenced table 't2'
that match the referencing column list in the foreign key 'FK_t2'.
Msg 1750, Level 16, State 0, Line 1
Could not create constraint. See previous errors.
If you could actually do this, you would effectively have a many-to-many relationship, which is not possible without an intermediate table. I would be truly interested in hearing more about this ...
See this related question and answers as well.