MySQL: Cascade vs. Restrict

I'm working on a database and am using many foreign keys to connect my tables. By default, MySQL sets all of the ON UPDATE and ON DELETE to RESTRICT. This seemed to work fine.
Then one time, I wanted to change the id of several of the rows in a table. This table was involved in many relationships, so I changed the relations to CASCADE so that the changes would be cascaded to the tables that used that id as a foreign key.
Now I think to myself, is there any reason to leave the relations as RESTRICT, since CASCADE seems to make my life easier?

Sometimes you want to implement all of the relationship logic in your own code. In that case you use RESTRICT as a safety net: it raises an error whenever you forget to handle a case.
Also keep in mind that CASCADE operations can have very unexpected effects, and in a huge codebase you have to remember them at all times. So not using them at all can be a good solution; it also helps you organise the design of your application.
Another point: sometimes you have circular relations (for example, with denormalisation), and then CASCADE is impossible to use.

Changing IDs should be a pretty rare thing. Resist the temptation to remove gaps or keep your IDs "tidy," if that was your motivation. If it was, then the answer is easy. No, don't. :) It's generally important, particularly in more complex systems, that an ID in a given domain never represents more than one distinct "thing," ever, even separated by time. Note how if you insert a row into an auto-increment table, then roll back the insert, the id isn't reused, even if there were no other competing actions.
If an id is an auto-increment or any other kind of surrogate key, it's rare that there would be a legitimate reason to change it, and cascading updates are more likely to fire only in response to an error made by you or the code... not for an intentional change.
Cascading updates should generally only be considered for natural keys, where there's a possibility that the parent table's primary key might actually need to be updated.
Natural keys are ids that come from the real world, like a vehicle's VIN or the tax authority assigned parcel number of a piece of real estate, seen as primary keys more frequently in theory than in actual practice... while surrogate keys have zero meaning outside the database (and thus should not be exposed to the user), such as auto increments and internally-generated GUIDs.
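To make the difference concrete, here is a minimal sketch of what happens when a parent row's natural key changes under ON UPDATE RESTRICT versus ON UPDATE CASCADE. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the `vehicle` and `service_record` tables and the VIN values are made up for illustration, and the error MySQL/InnoDB reports will be worded differently.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; all names are hypothetical.
def build(on_update):
    """Create a parent/child pair whose FK uses the given ON UPDATE action."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.execute("CREATE TABLE vehicle (vin TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE service_record ("
        " id INTEGER PRIMARY KEY,"
        " vin TEXT NOT NULL REFERENCES vehicle(vin) ON UPDATE " + on_update + ")"
    )
    conn.execute("INSERT INTO vehicle VALUES ('VIN-OLD')")
    conn.execute("INSERT INTO service_record (vin) VALUES ('VIN-OLD')")
    return conn

def rename_parent(conn):
    """Try to change the parent key; report the outcome and the child rows' keys."""
    try:
        conn.execute("UPDATE vehicle SET vin = 'VIN-NEW' WHERE vin = 'VIN-OLD'")
        outcome = "updated"
    except sqlite3.IntegrityError:
        outcome = "rejected"
    child = [row[0] for row in conn.execute("SELECT vin FROM service_record")]
    return outcome, child

restrict_result = rename_parent(build("RESTRICT"))
cascade_result = rename_parent(build("CASCADE"))
print(restrict_result)  # ('rejected', ['VIN-OLD'])
print(cascade_result)   # ('updated', ['VIN-NEW'])
```

With RESTRICT the update fails loudly and nothing changes; with CASCADE the child rows silently follow the parent, which is convenient for a real natural-key change and dangerous for an accidental one.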
Cascading deletes are legitimately used where deleting the parent is to be expected in the normal course of events and the sudden disappearance of all the child rows is also desirable.
As a rule, though, if in doubt, RESTRICT is always the safest course.

Related

What if I model a database without defining foreign keys in MySQL?

Scenario 1 : Foreign keys are defined properly
Scenario 2 : No FKs are defined. On deletion and update of records, my API code ensures that no orphaned data is left behind. I also check data integrity on insertion.
So what is the difference between these two scenarios? I just want to know what benefits I will get from using FKs (quantitative analysis). Will I achieve better performance in Scenario 1 than in Scenario 2? I am a newbie in MySQL database design.
Performance differences...
An FK check must reach into the other table (via an index) to do the integrity check. However...
Situation 1: Simple FK:
In many cases, you can, via understanding the flow of the app code, assure yourself that some FK violations "cannot" happen. For example, when you insert into two tables in a row (and you have checked for errors, etc), and the second table is to point to the first table's row you just inserted, then the FK test is redundant and hurts performance (a little).
If, on the other hand, you "simulate" an FK check by doing an extra SELECT, that would be a noticeable performance hit.
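As a rough illustration of the two approaches, the sketch below contrasts a "simulated" FK check (an extra SELECT before each INSERT) with a declared FK. It runs on Python's built-in sqlite3 rather than MySQL, the `customer` and `orders` tables are hypothetical, and it shows only the shape of the code, not a timing measurement.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed in SQLite for FK enforcement
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id))"
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")

def insert_with_manual_check(customer_id):
    """Simulated FK: one extra SELECT round trip before every INSERT."""
    found = conn.execute(
        "SELECT 1 FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    if found is None:
        return False
    conn.execute("INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))
    return True

def insert_with_fk(customer_id):
    """Declared FK: a single statement; the engine does the index lookup itself."""
    try:
        conn.execute("INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))
        return True
    except sqlite3.IntegrityError:
        return False

manual_results = [insert_with_manual_check(1), insert_with_manual_check(99)]
fk_results = [insert_with_fk(1), insert_with_fk(99)]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(manual_results, fk_results, order_count)
```

Both paths reject the orphan row, but the manual version issues two statements per insert and only protects code paths that remember to call it.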
Situation 2: FK with cascading delete:
If you use FKs for "cascading delete" (etc), then this may be more efficient than manually doing the delete.
Further, if you can batch the DELETEs, it is probably faster than letting the cascade do them one by one.
Another comment: "Side effects", such as 'cascading' are considered (by some) to be a naughty coding practice.
But... The above differences are probably not enough to make a difference in your app.

Foreign keys when cascades aren't needed

If I don't need to use cascade/restrict and similar constraints in a field which would logically be a foreign key, do I have any reason to explicitly declare it as a foreign key, other than aesthetics?
Wouldn't it actually decrease performance, since it has to test for integrity?
edit: to clarify, I don't need it since:
I won't edit nor delete those values anyway, so I don't need to do cascade and similar checks
Before calling INSERT, I'll check anyway if the target key exists, so I don't need restrict checks either
I understand that this kind of constraint will ensure that that relation will be still valid if the database becomes somehow corrupted, and that is a good thing. However, I'm wondering if there is any other reason to use this function in my case. Am I missing something?
The answers to this question might actually also apply to your question.
If you have columns in tables which reference rows in other tables, you should always be using foreign keys, since even if you think that you 'do not need' the features offered by those checks, it will still help guarantee data integrity in case you forgot a check in your own code.
The performance impact of foreign key checks is negligible in most cases (see above link), since relational databases use very optimised algorithms to perform them (after all, they are a key feature since they are what actually defines relations between entities).
Another major advantage of FKs is that they will also help others to understand the layout of your database.
Edit:
Since the question linked above is referring to SQL-Server, here's one with replies of a very similar kind for MySQL: Does introducing foreign keys to MySQL reduce performance
You must do it. If it affects write performance at all, that's a "pixel" problem.
The main performance problems are in reads -- FKs can help the query optimizer select the best plan, etc. Even if your DBMS(s) (if you provide a cross-DBMS solution) won't gain from it now, it may later.
So the answer is -- yes, it's not only aesthetics.

Inherited table in SQL Server 2008. Performance issues?

I have this idea I've been mulling around in my head based on another concept I read somewhere. Basically you have a single "Primary" table with very few fields; other tables inherit that primary table through a foreign key. This much has been done before so it's no news. What I would like to do is have virtually every table in the database inherit from that Primary table. This way, every object, every record, every entry in every table can have a fully unique primary key (since the PK is actually stored in the Primary table), and can be simply referenced by ID instead of by table.
Another benefit is that it becomes easy to make relationships that can touch multiple tables. For example: I have a Transaction table, and this table wants to have a FK to whatever it is a transaction for (inventory, account, contact, order, etc.). The Transaction can just have a FK to the Primary table, and the necessary piece of data is referenced through that.
The issue that keeps coming up in my head is whether or not that Primary table will become a bottleneck. The thing is gonna have literally millions of records at one point. I know that gigantic record sets can be handled by good table design, but what's the limit?
Has anyone attempted anything similar to this, and what were your results?
You have to consider that this table will have tons of foreign key relations. These can cause performance issues if you want to delete a row from the root table (which can cause some nasty execution plans on delete).
So if you plan to remove rows, then it could impact performance. I recently had issues with a setup like this, and it was a pain to clean it up (it was referencing 120 other tables - deletes were slow as hell).
To overcome this performance issue, you might consider not enforcing constraints (Bad plan), using no constraints for performance (Bad plan), or try to group all data that belongs to one entity in one row, and stick to the normal normalization practices (Good plan).
Yes, the primary table will almost certainly be a bottleneck.
How do you enforce real referential integrity?
For example, How can you be sure that the transaction's FK is actually linked to an inventory, account, contact or order rather than an apple, orange or pineapple?
I think this is something that would be a horrible bottleneck. Not only that, it would make enforcing the real PK/FK relationships much harder. It could create a data integrity nightmare. I don't see where you gain any benefits at all.
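The integrity concern can be reproduced in a few lines. The sketch below is an assumption-laden illustration, not the asker's schema: it uses Python's built-in sqlite3 instead of SQL Server, and the `primary_entity`, `account`, `fruit` and `txn` tables are invented. It shows that a FK pointing at the shared table accepts any id, whatever kind of thing that id belongs to.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed in SQLite for FK enforcement
conn.executescript("""
    CREATE TABLE primary_entity (id INTEGER PRIMARY KEY);
    CREATE TABLE account (id INTEGER PRIMARY KEY REFERENCES primary_entity(id));
    CREATE TABLE fruit (
        id INTEGER PRIMARY KEY REFERENCES primary_entity(id),
        name TEXT
    );
    CREATE TABLE txn (
        id INTEGER PRIMARY KEY,
        subject_id INTEGER NOT NULL REFERENCES primary_entity(id)
    );
    INSERT INTO primary_entity (id) VALUES (1), (2);
    INSERT INTO account (id) VALUES (1);
    INSERT INTO fruit (id, name) VALUES (2, 'pineapple');
""")

# The FK only proves that id 2 exists in primary_entity, so this is accepted.
conn.execute("INSERT INTO txn (subject_id) VALUES (2)")

is_account = conn.execute("SELECT COUNT(*) FROM account WHERE id = 2").fetchone()[0]
is_fruit = conn.execute("SELECT COUNT(*) FROM fruit WHERE id = 2").fetchone()[0]
print(is_account, is_fruit)  # 0 1 -> the transaction points at a pineapple
```

Keeping the database from accepting such a row would take extra machinery on top of the single FK, which is the "much harder" part the answers refer to.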

What are the Pros and Cons of Cascading delete and updates?

Maybe this is sort of a naive question...but I think that we should always have cascading deletes and updates. But I wanted to know: are there problems with it, and when should we not do it? I really can't think of a case right now where you would not want to do a cascade delete, but I am sure there is one...but what about updates, should they always be done?
So can anyone please list the pros and cons of cascading deletes and updates? Thanks.
Pros:
When you delete a row from the Parent table all the foreign key rows are deleted
This is usually faster than implementing this with triggers
Orphaned rows are unlikely
Cons
Orphans are possible
If by mistake you delete a row in the parent table all the rows in the corresponding child tables will be deleted and it will be PITA to figure out what you deleted
This depends on the entities that are contained in the tables: if the side of the foreign key cannot exist without the side of the primary key, it makes sense to have cascaded delete.
E. g.: An invoice line item does not have any right to survive if the invoice is deleted.
But if you have a foreign key that models a "works for" relationship between an employee and his/her boss, would you want to delete the employee if the boss leaves the company?
In addition: a technical issue is that some ORM (object relational mapping) tools are confused if dependent table entries change without them being responsible for that.
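The two examples above can be sketched side by side. This is a minimal illustration on Python's built-in sqlite3 (standing in for whatever DBMS is in use), with hypothetical table names. Note that ON DELETE SET NULL for the boss relationship is an assumption added here as one possible alternative to cascading; the answer itself only says that cascading would be wrong in that case.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in DBMS; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed in SQLite for FK enforcement
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY);
    CREATE TABLE invoice_line (
        id INTEGER PRIMARY KEY,
        -- a line item cannot outlive its invoice
        invoice_id INTEGER NOT NULL REFERENCES invoice(id) ON DELETE CASCADE
    );
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        -- the employee stays; only the link to the departed boss is cleared
        boss_id INTEGER REFERENCES employee(id) ON DELETE SET NULL
    );
    INSERT INTO invoice (id) VALUES (1);
    INSERT INTO invoice_line (invoice_id) VALUES (1), (1);
    INSERT INTO employee (id, name, boss_id) VALUES (10, 'boss', NULL);
    INSERT INTO employee (id, name, boss_id) VALUES (11, 'report', 10);
""")

conn.execute("DELETE FROM invoice WHERE id = 1")    # removes both line items
conn.execute("DELETE FROM employee WHERE id = 10")  # the boss leaves the company

lines_left = conn.execute("SELECT COUNT(*) FROM invoice_line").fetchone()[0]
staff = conn.execute("SELECT name, boss_id FROM employee").fetchall()
print(lines_left, staff)  # 0 [('report', None)]
```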
Pros:
Data integrity - Can help avoid situations where a record refers to something that is no longer there.
Cons:
Performance - Cascading deletes/updates can be sloooooooooooooooooooow.
Complexity - It seems most people I work with are not used to cascades, so when you give them a new project that has it, they're a bit surprised the first time they trigger one of these cascades.
As others have already mentioned, cascades can really mess things up when used improperly.

Mysql auto increment primary key id's

I have some mysql tables that have auto incrementing id's that are primary keys, but I notice that I never actually use them... I used to think that every table must have a primary key so I guess that is why I created them before. Should I remove them all if I don't use them at all?
Unless you are running into space problems I wouldn't remove them.
They are a life saver in case you by mistake (or oversight) populate the database with repeated/wrong data.
They also help to have related tables, where you reference the content on one table through the autogenerated id.
This is assuming you have indexes for the other columns you use to actually query the data (if you don't, then more reason to keep the autoincrement ids and use them!).
No.
You should keep them; a database always needs something that differentiates a row from another row (a "Key" of some sort).
If you have something that is guaranteed to be unique for each row, then you can use that as a key; otherwise keep the Primary Key and the Auto generated ID.
I'd personally keep them. They will be especially useful at a later date if you expand the database design and need to reference this table.
Interesting!...
I seem to hold a minority opinion here, getting both upvoted and downvoted to currently an even 0, yet no one in the majority opinion (see responses above) seems to make much of a case for keeping the id field, and the downvoters didn't even bother leaving comments hinting at why doing away with the id is such a bad idea.
In their defense, my own original response did not include any strong argument as to why it is ok to do away with the id attribute in some cases (which seem to apply to the OP). Maybe such a gratuitous response makes it, in and of itself, a downvotable response.
Please do educate me, and the OP, by leaving comments for or against the _systematic_ (and I stress "systematic") need to include auto-incremented non-semantic primary keys in all tables. As promised, I returned and added to my response a list of reasons why it may be detrimental to [again, systematically] impose an auto-incremented PK.
My original response:
You bet! you can remove these!
Before you do anything to the database make sure you have a backup, in particular if the DB size is significant.
Use the ALTER TABLE statement to remove the id in the tables where you want to remove it. Specifically
ALTER TABLE myTable DROP COLUMN id
(you also need to remove the PK constraint before removing the id, if the table has such a constraint)
EDIT (Added later)
There are many cases where it just doesn't make sense to carry along an autoincremented ID key, regardless of the relatively little extra storage these keys require.
In all these cases, the underlying implication is that
either the data itself supplies a primary key,
or, the application manages the key generation
The key supplied "natively" in the data doesn't necessarily need to be a single-column key; it can be a composite key, although in these cases one may wish to study the situation more closely, particularly if the overall key is a bit long.
Here are some of the drawbacks of using an auto-incremented primary key in lieu of a native or application-supplied key:
The effective data integrity may go unchecked
i.e. the server may allow record insertions or updates which create a duplicated [native] key (even though the artificial, autoincremented primary key hides this reality)
When relying on the auto-incremented PK for the support of joins between tables, when part of the [native] key values have to be updated...
...we either create the need to delete the record in full and re-insert it with the new values,
...or the risk of keeping outdated/incorrect links.
A common "follow-up" with auto-incremented keys is to create a clustered index on the table for this key.
This does make sense for tables without a native or application-supplied primary key, not so much for data sets that have such keys.
Effectively this prevents choosing a key for the clustered index which may be more beneficial for the most common query patterns.
Migrating tables with an auto-incremented key can be made more difficult depending on the DBMS (need to declare the underlying column as plain integer prior to the copy, then restart the autoincrement...)
For narrow tables, i.e. tables with a few columns only, the relative cost of the auto-incremented PK can be significant, and impact performance in a non negligible fashion.
When inserting new records along with associated records in related tables, the auto-incremented key needs to be obtained after the insertion of the main record, before the related records can be inserted; the logic is simpler when the column values supporting the link are known ahead of time.
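The first drawback in this list (integrity going unchecked) is easy to reproduce. The sketch below, run on Python's built-in sqlite3 as a stand-in for MySQL, uses two hypothetical tables that differ only in whether the native key `parcel_no` carries a UNIQUE constraint. Adding that constraint is one common mitigation and is an assumption of this example, not something the answer prescribes.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcel_loose ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " parcel_no TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE parcel_strict ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " parcel_no TEXT NOT NULL UNIQUE)"
)

def insert_twice(table):
    """Insert the same native key twice; return how many inserts were accepted."""
    accepted = 0
    for _ in range(2):
        try:
            conn.execute("INSERT INTO " + table + " (parcel_no) VALUES ('A-100')")
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # duplicate native key rejected by the UNIQUE constraint
    return accepted

loose_accepted = insert_twice("parcel_loose")
strict_accepted = insert_twice("parcel_strict")
print(loose_accepted, strict_accepted)  # 2 1
```

In the first table the surrogate id makes both rows look distinct, so the duplicate parcel goes unnoticed.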
To summarize, the idea that so long as the storage can carry the [relatively minimal] extra "weight" of the artificial primary key, we should include and use such a key, is not without drawbacks of its own.
A final consideration is that just like it is rather easy to remove such keys when we don't need them, they too can be easily added, post-facto, when/if it becomes apparent that they are useful in a particular situation. Neither form of refactoring (adding vs. removing the auto-incremented columns) is risk free, but neither is a major production either.
Yes, if you can figure out another primary key.
There is obviously a flaw in your table design. For example, suppose you had a table like
relation_id(PK), parent_id, child_id .
If it is known that the combination of parent_id and child_id is unique, then you can make (parent_id, child_id) the primary key and drop the relation_id column.
There may be endless other possible cases, but just bear in mind that a primary key helps you locate data quickly, as well as helping your design make sense.
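A minimal sketch of that redesign, assuming the `relation` table name and using Python's built-in sqlite3 in place of MySQL: the composite primary key replaces relation_id and also rejects duplicate pairs.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the table name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relation ("
    " parent_id INTEGER NOT NULL,"
    " child_id INTEGER NOT NULL,"
    " PRIMARY KEY (parent_id, child_id))"  # composite key, no relation_id column
)
conn.execute("INSERT INTO relation VALUES (1, 2)")
conn.execute("INSERT INTO relation VALUES (1, 3)")  # same parent, different child: fine

try:
    conn.execute("INSERT INTO relation VALUES (1, 2)")  # exact pair already exists
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM relation").fetchone()[0]
print(duplicate_rejected, row_count)  # True 2
```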