If I don't need to use cascade/restrict and similar constraints in a field which would logically be a foreign key, do I have any reason to explicitly declare it as a foreign key, other than aesthetics?
Wouldn't it actually decrease performance, since it has to test for integrity?
Edit: to clarify, I don't need it because:
I won't edit or delete those values anyway, so I don't need CASCADE and similar checks.
Before calling INSERT, I'll check whether the target key exists anyway, so I don't need RESTRICT checks either.
I understand that this kind of constraint ensures the relation will still be valid if the database somehow becomes corrupted, and that is a good thing. However, I'm wondering whether there is any other reason to use this feature in my case. Am I missing something?
The answers to this question might also apply to your question.
If you have columns in tables which reference rows in other tables, you should always be using foreign keys, since even if you think that you 'do not need' the features offered by those checks, it will still help guarantee data integrity in case you forgot a check in your own code.
The performance impact of foreign key checks is negligible in most cases (see above link), since relational databases use highly optimised algorithms to perform them (after all, they are a key feature: they are what actually defines relations between entities).
Another major advantage of FKs is that they will also help others to understand the layout of your database.
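To make the "forgotten check" argument concrete, here is a minimal sketch using SQLite through Python's built-in sqlite3 module as a stand-in for MySQL (SQLite only enforces foreign keys when `PRAGMA foreign_keys` is on). The `author`/`book` tables and the `make_db`/`add_book` helpers are hypothetical. The insert deliberately skips the application-level existence check, and only the declared FK stops the orphan row.

```python
import sqlite3


def make_db(enforce_fks=True):
    """In-memory SQLite database standing in for MySQL (hypothetical schema).

    SQLite only enforces declared foreign keys when the foreign_keys
    pragma is switched on for the connection.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_fks else 'OFF'}")
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE book (
            id        INTEGER PRIMARY KEY,
            title     TEXT NOT NULL,
            author_id INTEGER NOT NULL REFERENCES author(id)
        );
        INSERT INTO author (id, name) VALUES (1, 'Ada');
    """)
    return conn


def add_book(conn, title, author_id):
    """Insert a book WITHOUT first checking that the author exists,
    i.e. the application-level check has been 'forgotten'."""
    try:
        conn.execute("INSERT INTO book (title, author_id) VALUES (?, ?)",
                     (title, author_id))
        return True
    except sqlite3.IntegrityError:
        # Only reachable when the declared FK is enforced.
        return False


checked = make_db(enforce_fks=True)
print(add_book(checked, "Notes", 1))     # True: author 1 exists
print(add_book(checked, "Ghost", 42))    # False: the FK rejects the orphan

unchecked = make_db(enforce_fks=False)
print(add_book(unchecked, "Ghost", 42))  # True: the orphan row slips in silently
```

The only difference between the two runs is whether the declared constraint is enforced; the application code is identical.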
Edit:
Since the question linked above refers to SQL Server, here's one with very similar replies for MySQL: Does introducing foreign keys to MySQL reduce performance?
You should do it. If it affects write performance at all, it's a negligible ("pixel"-sized) problem.
The main performance concerns are on the read side, and FKs can help the query optimizer choose the best plan, etc. Even if your DBMS (or DBMSs, if you provide a cross-DBMS solution) doesn't gain from this now, it may later.
So the answer is: yes, there is more to it than aesthetics.
Related
I have a question related to this already answered question regarding MySQL DB design. I was wondering what the possible problems/sacrifices are of a decision not to put a "Not Null" constraint on foreign keys in the table. (As mentioned in the linked question, I can have multiple foreign keys in one table and I do not always know all of them when uploading data.)
Here is an example (simplified):
There are three tables in my DB:
Company
Investor
Investment
Investment table has among others following columns:
Company FK
Investor FK
Problem:
I wanted to know what the consequences will be for the end user, e.g. a data analyst, if I allow NULL values for Investor FK.
I think my question was best answered by Vojta F, who showed me both the pros and cons of my solution from a DB user's perspective.
As a DB user (i.e. not a DB admin) I think it is perfectly fine to omit a not null constraint from a foreign key if you don't know its value upon upload. The effect of such an omission is two-fold:
positive: it will be easier for you to upload new data, since you won't be forced to insert an fkey value, which I think is fine as long as you are aware of this when joining on this column,
negative: weaker data integrity: it will be harder to resolve records among multiple tables and you'll have to think about the nulls when joining.
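A minimal sketch of what "thinking about the nulls when joining" means in practice, again using SQLite via Python's sqlite3 as a stand-in for MySQL. The table layout mirrors the simplified Company/Investor/Investment example; column names and data are hypothetical. A NULL foreign key is not checked against the parent table, and an INNER JOIN silently drops those rows, while a LEFT JOIN keeps them.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE company  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE investor (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE investment (
            id          INTEGER PRIMARY KEY,
            company_id  INTEGER NOT NULL REFERENCES company(id),
            investor_id INTEGER REFERENCES investor(id)  -- nullable: may be unknown at upload
        );
        INSERT INTO company  VALUES (1, 'Acme');
        INSERT INTO investor VALUES (1, 'Fund A');
        INSERT INTO investment VALUES (1, 1, 1);     -- investor known
        INSERT INTO investment VALUES (2, 1, NULL);  -- investor unknown; FK not checked for NULL
    """)
    return conn


def investments_inner(conn):
    # INNER JOIN silently drops investments whose investor_id is NULL.
    return conn.execute("""
        SELECT i.id, v.name FROM investment i
        JOIN investor v ON v.id = i.investor_id
        ORDER BY i.id""").fetchall()


def investments_left(conn):
    # LEFT JOIN keeps them, with NULL (None in Python) for the investor columns.
    return conn.execute("""
        SELECT i.id, v.name FROM investment i
        LEFT JOIN investor v ON v.id = i.investor_id
        ORDER BY i.id""").fetchall()


conn = build_db()
print(investments_inner(conn))  # [(1, 'Fund A')]
print(investments_left(conn))   # [(1, 'Fund A'), (2, None)]
```

An analyst counting investments with the INNER JOIN version would undercount by every row whose investor is not yet known.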
In general, the benefit of using NULL when you need it outweighs any performance (or other) loss, or even gain.
The space consumed is so small that it is not worth computing.
The speed considerations are usually non-existent. The Optimizer does a few things differently depending on the NULLability of an indexed column. But, again, your benefit of having (or not having) NULL is likely to exceed any downside.
There are a small number of restrictions. For example, all columns of a PRIMARY KEY must be NOT NULL.
I'm curious whether something like this is possible, and whether it is reasonable at all.
I have a column in a table called ref_table that points to the table the current entry relates to. Let's say that in table table_people, Person ID 1 is a client and Person ID 3 is an employee, so their ref_table values will be "table_clients" and "table_employees" respectively. I shouldn't have a problem keeping the values valid through PHP, but what are some ways of achieving it through SQL?
I tried testing it with a foreign key constraint to INFORMATION_SCHEMA:
FOREIGN KEY `people_constraint_tables` (`ref_table`)
REFERENCES `INFORMATION_SCHEMA`.`COLUMNS`(`COLUMN_NAME`)
ON DELETE RESTRICT
ON UPDATE RESTRICT
No point refining it since it didn't work. It seems like there's one way to make it work but it is a dirty cheat apparently.
Would you do it with triggers? Would you do it at all? If you have experience with MySQL, please tell me whether this is reasonable at all; I'd like to know. Thank you.
MySQL doesn't have the facility to do this easily. Other databases do, through generated columns or table inheritance.
Would I do this with triggers? Well, yes and no. If I had to do this with one table and I had to use MySQL and I wanted to introduce relational integrity, then triggers are the way to go. There is little other choice.
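A minimal sketch of the trigger approach, using SQLite via Python's sqlite3 as a stand-in: SQLite exposes its table list in `sqlite_master` and aborts with `RAISE(ABORT, ...)`. In MySQL itself, the trigger body would look up `INFORMATION_SCHEMA.TABLES` and raise the error with `SIGNAL SQLSTATE '45000'`. Trigger names and helper functions are hypothetical. Note that such a trigger only checks at write time; dropping a referenced table later is not caught.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_clients   (person_id INTEGER PRIMARY KEY);
        CREATE TABLE table_employees (person_id INTEGER PRIMARY KEY);
        CREATE TABLE table_people (
            id        INTEGER PRIMARY KEY,
            ref_table TEXT NOT NULL
        );

        -- Reject any ref_table value that is not the name of an existing table.
        CREATE TRIGGER people_ref_table_ins
        BEFORE INSERT ON table_people
        WHEN NOT EXISTS (SELECT 1 FROM sqlite_master
                         WHERE type = 'table' AND name = NEW.ref_table)
        BEGIN
            SELECT RAISE(ABORT, 'ref_table does not name an existing table');
        END;

        -- Same check when ref_table is changed later.
        CREATE TRIGGER people_ref_table_upd
        BEFORE UPDATE OF ref_table ON table_people
        WHEN NOT EXISTS (SELECT 1 FROM sqlite_master
                         WHERE type = 'table' AND name = NEW.ref_table)
        BEGIN
            SELECT RAISE(ABORT, 'ref_table does not name an existing table');
        END;
    """)
    return conn


def try_insert(conn, person_id, ref_table):
    try:
        conn.execute("INSERT INTO table_people (id, ref_table) VALUES (?, ?)",
                     (person_id, ref_table))
        return True
    except sqlite3.IntegrityError:  # RAISE(ABORT) surfaces as a constraint error
        return False


def try_update(conn, person_id, ref_table):
    try:
        conn.execute("UPDATE table_people SET ref_table = ? WHERE id = ?",
                     (ref_table, person_id))
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_db()
print(try_insert(conn, 1, "table_clients"))    # True
print(try_insert(conn, 3, "table_employees"))  # True
print(try_insert(conn, 4, "table_emplyees"))   # False: typo caught by the trigger
```

This validates only that the named table exists; it says nothing about whether a matching row exists in that table, which is exactly the gap a real foreign key would cover.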
But really, I would simply have a different table for each reference type. There is a little bit of overhead in this (in terms of partially filled tables). And for some applications, a single reference table is quite convenient (internationalization comes to mind). But in general, I would stick with the standard method of a separate table for each entity with properly declared foreign key relationships.
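A minimal sketch of the "separate table for each entity" layout, using SQLite via Python's sqlite3 as a stand-in for MySQL. The `people`/`clients`/`employees` tables, their extra columns, and the `role_of`/`add_client` helpers are hypothetical. Each role table points back to `people` with a declared FK, and the role is derived from which table holds the row instead of being stored as a table name.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- One table per role; each row points back to people with a real FK.
        CREATE TABLE clients (
            person_id  INTEGER PRIMARY KEY REFERENCES people(id),
            account_no TEXT
        );
        CREATE TABLE employees (
            person_id INTEGER PRIMARY KEY REFERENCES people(id),
            hired_on  TEXT
        );
    """)
    return conn


def add_client(conn, person_id, account_no):
    try:
        conn.execute("INSERT INTO clients VALUES (?, ?)", (person_id, account_no))
        return True
    except sqlite3.IntegrityError:  # person_id not in people
        return False


def role_of(conn, person_id):
    """Derive the role from which table holds the row (None if neither)."""
    row = conn.execute("""
        SELECT CASE
            WHEN EXISTS (SELECT 1 FROM clients   WHERE person_id = :id) THEN 'client'
            WHEN EXISTS (SELECT 1 FROM employees WHERE person_id = :id) THEN 'employee'
        END
    """, {"id": person_id}).fetchone()
    return row[0]


conn = build_db()
conn.executescript("""
    INSERT INTO people    VALUES (1, 'Alice'), (3, 'Bob');
    INSERT INTO clients   VALUES (1, 'C-001');
    INSERT INTO employees VALUES (3, '2020-01-01');
""")
print(role_of(conn, 1))             # client
print(role_of(conn, 3))             # employee
print(add_client(conn, 99, "C-9"))  # False: no person 99
```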
Scenario 1: Foreign keys are defined properly.
Scenario 2: No FKs are defined; instead, when deleting and updating records, I ensure in my API code that no orphaned data is left behind. I also check data integrity on insertion.
So what is the difference between these two scenarios? I just want to know what benefits I will get from using FKs (a quantitative analysis). Will I get better performance in Scenario 1 than in Scenario 2? I am a newbie in MySQL database design.
Performance differences...
An FK check must reach into the other table (via an index) to do the integrity check. However...
Situation 1: Simple FK:
In many cases, you can, via understanding the flow of the app code, assure yourself that some FK violations "cannot" happen. For example, when you insert into two tables in a row (and you have checked for errors, etc), and the second table is to point to the first table's row you just inserted, then the FK test is redundant and hurts performance (a little).
If, on the other hand, you "simulate" an FK check by doing an extra SELECT, that would be a noticeable performance hit.
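A minimal sketch contrasting the two approaches, using SQLite via Python's sqlite3 as a stand-in for MySQL. Table and function names are hypothetical, and no timings are measured here: the point is only that the simulated check costs an extra SELECT round trip per insert, while the declared FK lets the engine do the index lookup inside the single INSERT.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
        CREATE TABLE order_lines (
            id       INTEGER PRIMARY KEY,
            order_id INTEGER NOT NULL REFERENCES orders(id),
            sku      TEXT NOT NULL
        );
    """)
    return conn


def insert_order_with_line(conn, customer, sku):
    # Situation 1: the child row points at the row just inserted, so the
    # FK lookup the engine performs here cannot fail (it is redundant).
    cur = conn.execute("INSERT INTO orders (customer) VALUES (?)", (customer,))
    conn.execute("INSERT INTO order_lines (order_id, sku) VALUES (?, ?)",
                 (cur.lastrowid, sku))
    return cur.lastrowid


def add_line_simulated_check(conn, order_id, sku):
    # Simulated FK: an extra SELECT round trip before every INSERT.
    if conn.execute("SELECT 1 FROM orders WHERE id = ?", (order_id,)).fetchone() is None:
        return False
    conn.execute("INSERT INTO order_lines (order_id, sku) VALUES (?, ?)", (order_id, sku))
    return True


def add_line_declared_fk(conn, order_id, sku):
    # Declared FK: one statement; the engine checks the parent via its index.
    try:
        conn.execute("INSERT INTO order_lines (order_id, sku) VALUES (?, ?)",
                     (order_id, sku))
        return True
    except sqlite3.IntegrityError:
        return False
```

Both helpers reject a missing parent; the declared-FK version simply does it without the application issuing a second query.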
Situation 2: FK with cascading delete:
If you use FKs for "cascading delete" (etc), then this may be more efficient than manually doing the delete.
Further, if you can batch the DELETEs, it is probably faster than letting the cascade do them one by one.
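A minimal sketch of the two delete strategies, using SQLite via Python's sqlite3 as a stand-in for MySQL. The `parent`/`child` schema and helper names are hypothetical, and no performance is measured: one path deletes parents one at a time and lets `ON DELETE CASCADE` remove the children, the other issues one batched DELETE for the children and one for the parents.

```python
import sqlite3


def build_db(cascade):
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    # `action` is a trusted constant, so formatting it into the DDL is safe here.
    action = "ON DELETE CASCADE" if cascade else "ON DELETE RESTRICT"
    conn.executescript(f"""
        CREATE TABLE parent (id INTEGER PRIMARY KEY);
        CREATE TABLE child (
            id        INTEGER PRIMARY KEY,
            parent_id INTEGER NOT NULL REFERENCES parent(id) {action}
        );
        INSERT INTO parent VALUES (1), (2), (3);
        INSERT INTO child (parent_id) VALUES (1), (1), (2), (3);
    """)
    return conn


def delete_parents_cascade(conn, ids):
    # One DELETE per parent; the engine removes matching children as a side effect.
    conn.executemany("DELETE FROM parent WHERE id = ?", [(i,) for i in ids])


def delete_parents_batched(conn, ids):
    # Manual alternative: one batched DELETE for children, then one for parents.
    marks = ",".join("?" * len(ids))
    conn.execute(f"DELETE FROM child WHERE parent_id IN ({marks})", ids)
    conn.execute(f"DELETE FROM parent WHERE id IN ({marks})", ids)


def count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

With `ON DELETE RESTRICT`, the first path fails instead, which is also what lets you see the "side effect" explicitly rather than having it happen behind your back.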
Another comment: "Side effects", such as 'cascading' are considered (by some) to be a naughty coding practice.
But... The above differences are probably not enough to make a difference in your app.
I'm working on a database and am using many foreign keys to connect my tables. By default, MySQL sets all of the ON UPDATE and ON DELETE to RESTRICT. This seemed to work fine.
Then one time, I wanted to change the id of several of the rows in a table. This table was involved in many relationships, so I changed the relations to CASCADE so that the changes would be cascaded to the tables that used that id as a foreign key.
Now I think to myself, is there any reason to leave the relations as RESTRICT, since CASCADE seems to make my life easier?
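A minimal sketch of the difference the question describes, using SQLite via Python's sqlite3 as a stand-in for MySQL (the `category`/`product` tables and the `renumber` helper are hypothetical). With `ON UPDATE CASCADE` a changed parent id propagates to the children; with `ON UPDATE RESTRICT` the same UPDATE is refused while children still reference the old id.

```python
import sqlite3


def build_db(on_update):
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    # on_update is expected to be a trusted literal such as "CASCADE" or "RESTRICT".
    conn.executescript(f"""
        CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE product (
            id          INTEGER PRIMARY KEY,
            category_id INTEGER NOT NULL
                REFERENCES category(id) ON UPDATE {on_update}
        );
        INSERT INTO category VALUES (1, 'tools');
        INSERT INTO product (category_id) VALUES (1), (1);
    """)
    return conn


def renumber(conn, old_id, new_id):
    """Try to change a parent id; return False if the FK refuses it."""
    try:
        conn.execute("UPDATE category SET id = ? WHERE id = ?", (new_id, old_id))
        return True
    except sqlite3.IntegrityError:
        return False


def product_categories(conn):
    return [r[0] for r in conn.execute("SELECT category_id FROM product ORDER BY id")]


cascading = build_db("CASCADE")
print(renumber(cascading, 1, 10), product_categories(cascading))    # True [10, 10]

restricted = build_db("RESTRICT")
print(renumber(restricted, 1, 10), product_categories(restricted))  # False [1, 1]
```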
Sometimes you do want to implement all the relationship logic in your code, and you use RESTRICT just as a safeguard: to see an error when you have forgotten some case.
Also keep in mind that CASCADE operations can be very unexpected, and in a huge codebase you always have to keep them in mind. So it can be a good solution not to use them at all; incidentally, that also helps organise the design of your application.
Another consideration: sometimes you have circular relations (for example, with denormalisation), and CASCADE is impossible to use.
Changing IDs should be a pretty rare thing. Resist the temptation to remove gaps or keep your IDs "tidy," if that was your motivation. If it was, then the answer is easy. No, don't. :) It's generally important, particularly in more complex systems, that an ID in a given domain never represents more than one "thing," ever, even separated by time. Note how if you insert a row into an auto-increment table and then roll back the insert, the id isn't reused, even if there were no other competing actions.
If an id is an auto-increment or any other kind of surrogate key, it's rare that there would be a legitimate reason to change it, and cascading updates are more likely to fire only in response to an error made by you or the code... not for an intentional change.
Cascading updates should generally only be considered for natural keys, where there's a possibility that the parent table's primary key might actually need to be updated.
Natural keys are ids that come from the real world, like a vehicle's VIN or the tax authority assigned parcel number of a piece of real estate, seen as primary keys more frequently in theory than in actual practice... while surrogate keys have zero meaning outside the database (and thus should not be exposed to the user), such as auto increments and internally-generated GUIDs.
Cascading deletes are legitimately used where deleting the parent is to be expected in the normal course of events and the sudden disappearance of all the child rows is also desirable.
As a rule, though, if in doubt, RESTRICT is always the safest course.
In databases where foreign key checking has been disabled in the past, how can one check for foreign key constraint violations?
Basically, if you have no foreign key constraints, you can do this:
SELECT * FROM CHILD C WHERE C.PARENT_ID NOT IN (SELECT ID FROM PARENT);
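A minimal sketch running that query against a small SQLite database through Python's sqlite3 (standing in for MySQL; the data is made up). Foreign keys are left off, as they would be after a load with checks disabled. A child with a NULL `PARENT_ID` is not reported, because `NULL NOT IN (...)` evaluates to NULL; that matches FK semantics, where NULL is allowed. Be aware that `NOT IN` returns no rows at all if the subquery yields any NULL, so `NOT EXISTS` is the safer general form when the parent column is nullable.

```python
import sqlite3


def find_orphans(conn):
    # Same shape as the query above: child rows whose PARENT_ID has no match.
    return conn.execute(
        "SELECT C.ID, C.PARENT_ID FROM CHILD C "
        "WHERE C.PARENT_ID NOT IN (SELECT ID FROM PARENT) ORDER BY C.ID"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # foreign_keys left OFF, as after a load with checks disabled
conn.executescript("""
    CREATE TABLE PARENT (ID INTEGER PRIMARY KEY);
    CREATE TABLE CHILD (ID INTEGER PRIMARY KEY, PARENT_ID INTEGER REFERENCES PARENT(ID));
    INSERT INTO PARENT VALUES (1), (2);
    INSERT INTO CHILD VALUES (10, 1), (11, 2), (12, 7), (13, NULL);
""")
print(find_orphans(conn))  # [(12, 7)]
```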
There is no built-in way to do this. The only thing I can think of would be to look at the TABLE_CONSTRAINTS and KEY_COLUMN_USAGE tables in the INFORMATION_SCHEMA database and manually check for rows that don't match.
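A minimal sketch of that metadata-driven idea: read every declared foreign key from the catalog and generate one NOT EXISTS query per constraint. It uses SQLite via Python's sqlite3 as a stand-in, so `PRAGMA foreign_key_list` plays the role of MySQL's `INFORMATION_SCHEMA.KEY_COLUMN_USAGE` (which exposes the same information through its `REFERENCED_TABLE_NAME` and `REFERENCED_COLUMN_NAME` columns). It assumes plain table names and FKs declared with explicit parent columns; the helper names are hypothetical.

```python
import sqlite3


def fk_orphan_queries(conn):
    """Build one orphan-finding query per declared foreign key from catalog metadata."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    queries = []
    for child in tables:
        # Group the (possibly multi-column) FK rows by constraint id.
        groups = {}
        for fk_id, _seq, parent, from_col, to_col, *_ in conn.execute(
                f"PRAGMA foreign_key_list({child})"):
            groups.setdefault((fk_id, parent), []).append((from_col, to_col))
        for (_fk_id, parent), cols in groups.items():
            # NULL foreign keys are allowed, so they are not reported.
            not_null = " AND ".join(f"c.{f} IS NOT NULL" for f, _ in cols)
            match = " AND ".join(f"p.{t} = c.{f}" for f, t in cols)
            sql = (f"SELECT c.rowid FROM {child} c WHERE {not_null} "
                   f"AND NOT EXISTS (SELECT 1 FROM {parent} p WHERE {match}) "
                   f"ORDER BY c.rowid")
            queries.append((child, parent, sql))
    return queries


def find_all_orphans(conn):
    """Run every generated query; map (child, parent) to offending rowids."""
    result = {}
    for child, parent, sql in fk_orphan_queries(conn):
        rows = [r[0] for r in conn.execute(sql)]
        if rows:
            result.setdefault((child, parent), []).extend(rows)
    return result


conn = sqlite3.connect(":memory:")  # foreign_keys OFF, as after a load with checks disabled
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent(id)
    );
    INSERT INTO parent VALUES (1);
    INSERT INTO child VALUES (1, 1), (2, NULL), (3, 99);
""")
print(find_all_orphans(conn))  # {('child', 'parent'): [3]}
```

For SQLite specifically, the built-in `PRAGMA foreign_key_check` reports such violations directly; the generated-query approach is what you fall back on when the DBMS offers nothing comparable.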
It sounds like you could basically reword your question as "How can I ensure referential integrity with foreign keys disabled?"
I imagine the very "headaches" that made you disable the foreign keys are the very thing they were intended to catch. So the simplest answer seems to me to be: don't disable them in the first place. Do it right the first time and you won't have to do it again later.
Adding (or re-adding) the foreign key constraint will check all existing rows, so if something is wrong, you will get an error. Note that in MySQL, merely setting foreign_key_checks back to 1 does not re-scan rows loaded while it was 0.
"Turning on" the FK after the load should indeed do the check already.
If your DBMS doesn't do that, dump it.
If your DBMS doesn't do that and you still want to keep working with such crap, you could query the appropriate SEMIMINUS (antijoin) expression of the relational algebra (RA).
This is likely to look something like:
SELECT ...
FROM table_with_FK
WHERE NOT EXISTS (
SELECT ...
FROM table_with_PK
WHERE PK_attribute1 = FK_attribute1 and PK_attribute2 = FK_attribute2 and ...
) AND <anything here that allows you to identify the loaded rows>
Or, a bit more modern (if your DBMS supports EXCEPT):
SELECT FK_attributes
FROM table_with_FK
WHERE <anything here that allows you to identify the loaded rows>
EXCEPT
SELECT PK_attributes_possibly_renamed
FROM table_with_PK
;
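A minimal sketch filling in both templates for a composite key, using SQLite via Python's sqlite3 as a stand-in (SQLite supports EXCEPT). The `warehouse`/`stock` tables are hypothetical, and the `load_id` column is an assumed way of playing the role of "anything here that allows you to identify the loaded rows". Both queries return the key values of freshly loaded rows that have no parent; EXCEPT returns distinct key values rather than individual rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # FK checks off, as during a bulk load
conn.executescript("""
    CREATE TABLE warehouse (region TEXT, code TEXT, PRIMARY KEY (region, code));
    CREATE TABLE stock (
        id      INTEGER PRIMARY KEY,
        region  TEXT,
        code    TEXT,
        load_id INTEGER,  -- hypothetical column identifying the loaded rows
        FOREIGN KEY (region, code) REFERENCES warehouse (region, code)
    );
    INSERT INTO warehouse VALUES ('eu', 'A'), ('us', 'B');
    INSERT INTO stock (region, code, load_id) VALUES
        ('eu', 'A', 1), ('us', 'X', 1),   -- ('us', 'X') has no warehouse
        ('eu', 'Z', 0);                   -- also bad, but from an earlier load
""")

# NOT EXISTS form: FK attributes matched pairwise against the PK attributes.
NOT_EXISTS_SQL = """
    SELECT s.region, s.code
    FROM stock s
    WHERE NOT EXISTS (
        SELECT 1 FROM warehouse w
        WHERE w.region = s.region AND w.code = s.code
    ) AND s.load_id = ?
    ORDER BY s.region, s.code
"""

# EXCEPT form: loaded FK values minus existing PK values.
EXCEPT_SQL = """
    SELECT region, code FROM stock WHERE load_id = ?
    EXCEPT
    SELECT region, code FROM warehouse
    ORDER BY region, code
"""

print(conn.execute(NOT_EXISTS_SQL, (1,)).fetchall())  # [('us', 'X')]
print(conn.execute(EXCEPT_SQL, (1,)).fetchall())      # [('us', 'X')]
```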
EDIT (replying to "not everyone needs oracle and IBM sized products. "dump it" is not good advice.")
The OP has very clearly indicated that he is DEFINITELY interested in data integrity. So he really should be using a DBMS product that DOES offer a bit of professional-level support for ensuring data integrity. I sincerely hope that "Oracle and IBM sized products" are NOT the only ones who do that.