I'm developing a helpdesk-like system, and I want to employ foreign keys, to make sure the DB structure is decent, but I don't know if I should use them at all, and how to employ them properly.
Are there any good tutorials on how (and when) to use Foreign keys ?
edit The part where I'm the most confused at is the ON DELETE .. ON UPDATE .. part, let's say I have the following tables
table 'users'
id int PK auto_increment
department_id int FK (departments.department_id) NULL
name varchar
table 'departments'
id int PK auto_increment
name
users.department_id is a foreign key from departments.department_id, how does the ON UPDATE and ON DELETE functions work here when i want to delete the department or the user?
ON DELETE and ON UPDATE refer to how changes you make in the key table propagate to the dependent table. UPDATE means that the key values get changed in the dependent table to maintain the relation, and DELETE means that dependent records get deleted to maintain the integrity.
Example: Say you have
Users: Name = Bob, Department = 1
Users: Name = Jim, Department = 1
Users: Name = Roy, Department = 2
and
Departments: id = 1, Name = Sales
Departments: id = 2, Name = Bales
Now if you change the deparments table to modify the first record to read id = 5, Name = Sales, then with "UPDATE" you would also change the first two records to read Department = 5 -- and without "UPDATE" you wouldn't be allowed to make the change!
Similarly, if you deleted Department 2, then with "DELETE" you would also delete the record for Roy! And without "DELETE" you wouldn't be allowed to remove the department without first removing Roy.
You will need foreign keys if you are splitting your database into tables and you are working with a DBMS (e.g. MySQL, Oracle and others). I assume from your tags you are using MySQL.
If you don't use foreign keys your database will become hard to manage and maintain. The process of normalisation ensures data consistency, which uses foreign keys.
See here for foreign keys. See here for why foreign keys are important in a relational database here.
Although denormalization is often used when efficiency is the main factor in the design. If this is the case you may want to move away from what I have told you.
Hope this helps.
Related
Let's assume I have 2 tables: foo and bar.
In third table I want to store different kind of data, however every row will have a reference to either foo OR bar.
Is it correct if I create 2 NULLable foreign keys - foo_id and bar_id - in the third table, or is it againts database design principles?
Basically, I thought all the time that foreign keys need to ALWAYS have a "parent", so if I try to e.g. INSERT a row with no primary key matched (or, in this case, with a foreign key set to NULL), I will get an error. Nullable FK-s are new to me, and they feel a bit off.
Also, what are the alternatives? Is it better to create separate tables storing single reference? Isn't this creating redundancy?
Linking tables?
Help.
A nullable FK is "okay". You will still get an error when you try to insert a non-existing parent key (it is just NULL that is allowed now).
The alternative is two link tables, one for foo and one for bar.
Things to consider:
Link tables allow for 1:N. If you don't want that, you can enforce it by primary key on the link table. That is not necessary for the id column solution (they are always 1:N).
You can avoid columns with mostly NULL values using link tables. In your case, though, it seems that you have NULL for exactly half the values. Probably does not qualify as "mostly". This becomes more interesting with more than two parent tables.
You may want to enforce the constraint that exactly one of your two columns is NULL. This can be done with the id column version using a check constraint. It cannot be done with link tables (unless you use triggers maybe).
it is depend on the business logic of the program. if the foreign key field must has a value , it is bad to set it null-able .
for example .
a book table has category_id field which the value is reference from bookCategory table.
each record in book table must has category . if for some reason you set it as nullable . this will cause some record in book table with category_id is null.
the problem will show up in report. the following 2 query will return different total_book
select count(*) as total_book from book;
select
count(*) as total_book
from
book
inner join bookCategory
on book.category_id = category.id
my advice is don't use null-able unless you expect value and no-value . alot of complex system that sometime have value different from one report and another , usually is cause by this.
Is there any sense in using two foreign key to the same parent table, to avoid inner join?
table: user_profile
id1, userid, username, firstname
table: user_hobby1
id2, userid(fk), hobby, movies
table: user_hobby2
id3, userid(fk), firstname(fk), hobby, movies
I want to select all firstname and hobby from the above table. I am not sure if user_hobby1 or user_hobby2 is the best design in terms of performance? One adds extra foreign key and another requires join.
Query1 :
Select firstname, hobby
from user_hobby2;
Query2 :
Select p.firstname, h.hobby
from
user_profile p
inner join user_hobby1 h on u.userid=h.userid;
Copying the value of an attribute from the user table into the hobby table isn't a "foreign key", that's redundancy.
Our performance objectives are not usually met with an approach of avoiding JOIN operations, which are a normal part of how relational databases operate.
I'd go with the normalized design as a first cut. Each attribute should be dependent on the key, the whole key, and nothing but the key. The "firstname" attribute is dependent on the id of the user, not the hobby.
Sometimes, we do gain performance benefits by introducing redundancy into the database. We have to do that in a controlled way, and make sure that we don't get update anomalies. (Consider what changes we want to apply if the value of "firstname" attribute is updated... do we make that change to the user table, the user_hobby table, or both.
Likely, "firstname" is not unique in the user table, so we definitely don't want a foreign key referencing that column; we want foreign keys that reference the user table to reference the PRIMARY KEY of the table.
There's no point in having two foreign keys defined between user_hobby and user, if a user_hobby is related to exactly one user. We only need one foreign key... we just store the id from the user table in the user_hobby table.
if you have two FK in user_hobby2 then you can only ensure that userid and username exist in user_profile, but you have no way to ensure which userid goes with a given username.
if you make (userid, username) a composite FK, then you'll guarantee the consistency of each tuple, but composite FK are generally more complicate to deal with. Depending on the behavior for update and delete cascades I've seen mysql triggering them both and refusing to delete from the parent.
Besides... what's the point of keeping that composite FK? It will only help you when you update or delete from user_profile, but won't help you copy the data when you insert new users or new hobbies for a user.
The join you are trying to avoid is very cheap. Just go with the first approach. It's easier to maintain and will help you keep your data consistent and normalized.
I am in a situation where i have to store key -> value pairs in a table which signifies users who have voted certain products.
UserId ProductID
1 2345
1 1786
6 657
2 1254
1 2187
As you can see that userId keeps on repeating and so can productId. I wanted to know what can be the best way to represent this data. Also is there a necessity of using primary key in here. I've searched a lot but am not able to find the exact specification about my problem. Any help would be appreciated. Thank you.
If you want to enforce that a given user can vote for a given product at most once, create a unique constraint over both columns:
ALTER TABLE mytable ADD UNIQUE INDEX (UserId, ProductID);
Although you can use these two columns together as a key, your app code is often simpler if you define a separate, typically auto increment, key column, but the decision to do this depends on which app code language/library you use.
If you have any tables that hold a foreign key reference to this table, and you intend to use referential integrity, those tables and the SQL used to define the relationship will also be simpler if you create a separate key column - you just end up carting multiple columns around instead of just one.
If I have two tables - table beer and table distributor, each one have a primary key and a third table that have the foreign keys and calls beer_distributor
Is it adequate a new field (primary key) in this table? The other way is with joins, correct? To obtain for example DUVEL De vroliijke drinker?
You've definitely got the right idea. Your beer_distributor table is what's known as a junction table. JOINs and keys/indexes are used together. The database system uses keys to make JOINs work quickly and efficiently. You use this junction table by JOINing both beer and distributor tables to it.
And, your junction table should have a primary key that spans both columns (a multiple-column index / "composite index"), which it looks like it does if I understand that diagram correctly. In that case, it looks good to me. Nicely done.
I would put a primary key in the join table beer_distributor, not a dual primary key of the two foreign keys. IMO, it makes life easier when maintaining the relationship.
UPDATE
To emphasize this point, consider having to change the distributor ACOO9 for beer 163. With the dual primary key, you'd have to remove then reinsert OR know both existing values to update the record. With a separate primary key, you'd simply update the record using this value. Comes in handy when building applications on top this data. If this is strictly a data warehouse, then a dual primary key might make more sense from the DBA perspective.
UPDATE beer_distributor SET distributor_id = XXXXX WHERE beer_id = 163 AND distributor_id = AC009
versus
UPDATE beer_distributor SET distributor_id = XXXXX WHERE id = 1234
I have four Database Tables like these:
Book
ID_Book |ID_Company|Description
BookExtension
ID_BookExtension | ID_Book| ID_Discount
Discount
ID_Discount | Description | ID_Company
Company
ID_Company | Description
Any BookExtension record via foreign keys points indirectly to two different ID_Company fields:
BookExtension.ID_Book references a Book record that contains a Book.ID_Company
BookExtension.ID_Discount references a Discount record that contains a Discount.ID_Company
Is it possible to enforce in Sql Server that any new record in BookExtension must have Book.ID_Company = Discount.ID_Company ?
In a nutshell I want that the following Query must return 0 record!
SELECT count(*) from BookExtension
INNER JOIN Book ON BookExstension.ID_Book = Book.ID_Book
INNER JOIN Discount ON BookExstension.ID_Discount = Discount.ID_Discount
WHERE Book.ID_Company <> Discount.ID_Company
or, in plain English:
I don't want that a BookExtension record references a Book record of a Company and a Discount record of another different Company!
Unless I've misunderstood your intent, the general form of the SQL statement you'd use is
ALTER TABLE FooExtension
ADD CONSTRAINT your-constraint-name
CHECK (ID_Foo = ID_Bar);
That assumes existing data already conforms to the new constraint. If existing data doesn't conform, you can either fix the data (assuming it needs fixing), or you can limit the scope (probably) of the new constraint by also checking the value of ID_FooExtension. (Assuming you can identify "new" rows by the value of ID_FooExtension.)
Later . . .
Thanks, I did indeed misunderstand your situation.
As far as I know, you can't enforce that constraint the way you want to in SQL Server, because it doesn't allow SELECT queries within a CHECK constraint. (I might be wrong about that in SQL Server 2008.) A common workaround is to wrap a SELECT query in a function, and call the function, but that's not reliable according to what I've learned.
You can do this, though.
Create a UNIQUE constraint on Book
(ID_Book, ID_Company). Part of it will look like UNIQUE (ID_Book, ID_Company).
Create a UNIQUE constraint on Discount (ID_Discount, ID_Company).
Add two columns to
BookExtension--Book_ID_Company and
Discount_ID_Company.
Populate those new columns.
Change the foreign key constraints
in BookExtension. You want
BookExtension (ID_Book,
Book_ID_Company) to reference
Book (ID_Book, ID_Company). Similar change for the foreign key
referencing Discount.
Now you can add a check constraint to guarantee that BookExtension.Book_ID_Company is the same as BookExtension.Discount_ID_Company.
I'm not sure how [in]efficient this would be but you could also use an indexed view to achieve this. It needs a helper table with 2 rows as CTEs and UNION are not allowed in indexed views.
CREATE TABLE dbo.TwoNums
(
Num int primary key
)
INSERT INTO TwoNums SELECT 1 UNION ALL SELECT 2
Then the view definition
CREATE VIEW dbo.ConstraintView
WITH SCHEMABINDING
AS
SELECT 1 AS Col FROM dbo.BookExtension
INNER JOIN dbo.Book ON dbo.BookExtension.ID_Book = Book.ID_Book
INNER JOIN dbo.Discount ON dbo.BookExtension.ID_Discount = Discount.ID_Discount
INNER JOIN dbo.TwoNums ON Num = Num
WHERE dbo.Book.ID_Company <> dbo.Discount.ID_Company
And a unique index on the View
CREATE UNIQUE CLUSTERED INDEX [uix] ON [dbo].[ConstraintView]([Col] ASC)