Maybe this is a somewhat naive question... but I think we should always have cascading deletes and updates. What I want to know is: are there problems with doing so, and when should we not do it? I really can't think of a case right now where you would not want a cascade delete, but I'm sure there is one... and what about updates, should they always be cascaded?
So can anyone please list out the pros and cons of cascading deletes and updates? Thanks.
Pros:
When you delete a row from the parent table, all the referencing rows in the child tables are deleted automatically
This is usually faster than implementing this with triggers
Orphaned rows are unlikely
Cons:
Orphans are still possible on relationships that don't have the constraint defined
If you delete a row in the parent table by mistake, all the rows in the corresponding child tables will be deleted as well, and it will be a pain to figure out exactly what you deleted
This depends on the entities the tables represent: if the entity on the foreign-key side cannot exist without the entity on the primary-key side, a cascading delete makes sense.
E.g., an invoice line item has no right to survive if its invoice is deleted.
But if the foreign key implements a "works for" relationship between an employee and his/her boss, would you want to delete the employee when the boss leaves the company?
In addition, there is a technical issue: some ORM (object-relational mapping) tools get confused when rows in dependent tables change without the ORM being responsible for the change.
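A minimal sketch of the two cases above (made-up table names, InnoDB assumed, since MyISAM ignores foreign key constraints):

CREATE TABLE invoice (
    id INT PRIMARY KEY
);

CREATE TABLE invoice_item (
    id INT PRIMARY KEY,
    invoice_id INT NOT NULL,
    -- line items have no reason to outlive their invoice
    FOREIGN KEY (invoice_id) REFERENCES invoice(id) ON DELETE CASCADE
);

CREATE TABLE employee (
    id INT PRIMARY KEY,
    boss_id INT NULL,
    -- don't delete the employee just because the boss leaves; clear the link instead
    FOREIGN KEY (boss_id) REFERENCES employee(id) ON DELETE SET NULL
);

Here SET NULL is one reasonable choice for the boss link; RESTRICT would also work if you want the delete to fail instead.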
Pros:
Data integrity - Can help avoid situations where a record refers to something that is no longer there.
Cons:
Performance - Cascading deletes/updates can be painfully slow, especially on large child tables.
Complexity - It seems most people I work with are not used to cascades, so when you hand them a project that uses them, they're a bit surprised the first time they trigger one.
As others have already mentioned, cascades can really mess things up when used improperly.
I'm curious if something like this is possible, if at all reasonable.
I have a column in a table called ref_table, and it names the table that the current entry relates to. Let's say, in table_people, person ID 1 is a client and person ID 3 is an employee, so their ref_table values will be "table_clients" and "table_employees" respectively. I shouldn't have a problem keeping the values valid through PHP, but what would be some ways of achieving it through SQL?
I tried testing it with a foreign key constraint to INFORMATION_SCHEMA:
FOREIGN KEY `people_constraint_tables` (`ref_table`)
REFERENCES `INFORMATION_SCHEMA`.`COLUMNS`(`COLUMN_NAME`)
ON DELETE RESTRICT
ON UPDATE RESTRICT
No point refining it since it didn't work. It seems like there's one way to make it work but it is a dirty cheat apparently.
Would you do it with triggers? Would you do it at all? If someone with MySQL experience could tell me whether that's reasonable at all, I'd like to know. Thank you.
MySQL doesn't have the facility to do this easily. Other databases do, through generated columns or table inheritance.
Would I do this with triggers? Well, yes and no. If I had to do this with one table and I had to use MySQL and I wanted to introduce relational integrity, then triggers are the way to go. There is little other choice.
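If I had to do it in MySQL, a trigger along these lines is a rough sketch of the idea (the table and column names are taken from the question; the allowed values are assumed). A matching BEFORE UPDATE trigger would be needed as well:

DELIMITER $$
CREATE TRIGGER people_check_ref_table
BEFORE INSERT ON table_people
FOR EACH ROW
BEGIN
    -- reject any value that is not one of the known reference tables
    IF NEW.ref_table NOT IN ('table_clients', 'table_employees') THEN
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'ref_table must name a known reference table';
    END IF;
END$$
DELIMITER ;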
But really, I would simply have a different table for each reference type. There is a little bit of overhead in this (in terms of partially filled tables). And for some applications, a single reference table is quite convenient (internationalization comes to mind). But in general, I would stick with the standard method of a separate table for each entity with properly declared foreign key relationships.
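What I mean by the standard method, as a rough sketch (hypothetical names): one table per entity type, each with an ordinary declared foreign key, instead of a ref_table column:

CREATE TABLE table_people (
    id INT PRIMARY KEY
);

CREATE TABLE table_clients (
    person_id INT PRIMARY KEY,
    FOREIGN KEY (person_id) REFERENCES table_people(id)
);

CREATE TABLE table_employees (
    person_id INT PRIMARY KEY,
    FOREIGN KEY (person_id) REFERENCES table_people(id)
);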
Scenario 1: Foreign keys are defined properly.
Scenario 2: No FKs are defined; instead, on deletion and update of records I ensure in the API code that no orphaned data is left behind, and I also check data integrity on insertion.
So what is the difference between these two scenarios? I just want to know what benefits I will get from using FKs (a quantitative analysis, if possible). Will I achieve better performance in Scenario 1 than in Scenario 2? I am a newbie in MySQL database design.
Performance differences...
An FK check must reach into the other table (via an index) to do the integrity check. However...
Situation 1: Simple FK:
In many cases, by understanding the flow of the app code, you can assure yourself that some FK violations "cannot" happen. For example, when you insert into two tables in a row (and you have checked for errors, etc.), and the second insert points to the first table's row you just inserted, then the FK test is redundant and hurts performance (a little).
If, on the other hand, you "simulate" an FK check by doing an extra SELECT, that would be a noticeable performance hit.
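A sketch of the two situations, using made-up tables: in the first pair of statements the declared FK check is redundant because the parent row was just inserted; in the second, the check is simulated with an extra round trip:

-- declared FK: the child insert triggers an index lookup in `orders`,
-- even though we just created that row ourselves
INSERT INTO orders (customer_id) VALUES (42);
INSERT INTO order_items (order_id, sku) VALUES (LAST_INSERT_ID(), 'ABC');

-- simulated FK check: an extra SELECT (another round trip) before the insert
SELECT id FROM orders WHERE id = 123;   -- application verifies the row exists
INSERT INTO order_items (order_id, sku) VALUES (123, 'ABC');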
Situation 2: FK with cascading delete:
If you use FKs for "cascading delete" (etc.), this may be more efficient than doing the deletes one at a time from the application.
On the other hand, if you can batch the child DELETEs yourself, that is probably faster than letting the cascade remove them one row at a time.
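For example (schema names assumed), the cascade handles the children automatically, while the manual version lets you delete the children in one batched statement first:

-- cascading delete declared on the child table
ALTER TABLE order_items
    ADD FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE;
DELETE FROM orders WHERE customer_id = 42;   -- child rows are removed row by row by the engine

-- manual, batched alternative (assuming the FK does not cascade)
DELETE FROM order_items
 WHERE order_id IN (SELECT id FROM orders WHERE customer_id = 42);
DELETE FROM orders WHERE customer_id = 42;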
Another comment: "Side effects", such as 'cascading' are considered (by some) to be a naughty coding practice.
But... The above differences are probably not enough to make a difference in your app.
I'm working on a database and am using many foreign keys to connect my tables. By default, MySQL sets all of the ON UPDATE and ON DELETE to RESTRICT. This seemed to work fine.
Then one time, I wanted to change the id of several of the rows in a table. This table was involved in many relationships, so I changed the relations to CASCADE so that the changes would be cascaded to the tables that used that id as a foreign key.
Now I think to myself, is there any reason to leave the relations as RESTRICT, since CASCADE seems to make my life easier?
Sometimes you do want to implement all of the relationship logic in your own code. Then you use RESTRICT just to keep yourself honest - you see an error whenever you forget to handle some case.
Also keep in mind that CASCADE operations are sometimes very unexpected, and with a huge codebase you have to keep them in mind constantly. So not using them at all can be a good solution - it also helps keep the design of your application organised, by the way.
Another point: sometimes you have circular relations (for example with denormalisation), and then CASCADE is impossible to use.
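A small sketch of RESTRICT acting as that safety net (table names made up): if the code forgets to handle the child rows first, the delete fails loudly instead of silently cascading:

CREATE TABLE account (
    id INT PRIMARY KEY
);

CREATE TABLE payment (
    id INT PRIMARY KEY,
    account_id INT NOT NULL,
    FOREIGN KEY (account_id) REFERENCES account(id) ON DELETE RESTRICT
);

-- if the application forgot to deal with payments first, this fails with
-- error 1451: "Cannot delete or update a parent row: a foreign key constraint fails"
DELETE FROM account WHERE id = 7;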
Changing IDs should be a pretty rare thing. Resist the temptation to remove gaps or keep your IDs "tidy," if that was your motivation. If it was, then the answer is easy: no, don't. :) It's generally important, particularly in more complex systems, that an ID in a given domain never represents more than one "thing," ever, even separated by time. Note how, if you insert a row into an auto-increment table and then roll back the insert, the id isn't reused, even if there were no other competing actions.
If an id is an auto-increment or any other kind of surrogate key, it's rare that there would be a legitimate reason to change it, and cascading updates are more likely to fire only in response to an error made by you or the code... not for an intentional change.
Cascading updates should generally only be considered for natural keys, where there's a possibility that the parent table's primary key might actually need to be updated.
Natural keys are ids that come from the real world, like a vehicle's VIN or the tax authority assigned parcel number of a piece of real estate, seen as primary keys more frequently in theory than in actual practice... while surrogate keys have zero meaning outside the database (and thus should not be exposed to the user), such as auto increments and internally-generated GUIDs.
Cascading deletes are legitimately used where deleting the parent is to be expected in the normal course of events and the sudden disappearance of all the child rows is also desirable.
As a rule, though, if in doubt, RESTRICT is always the safest course.
Say I have a table called 'child' that contains a foreign key referencing another table called 'parent'. If the parent table contains column values I frequently want to access when SELECTing from the child table, is it better to JOIN the tables on the foreign key, or to store a second copy of those frequently accessed parent columns in the child table?
Sometimes I also have a third 'grandchild' table that references the child table, and I want a mixture of information from all three tables. A triple JOIN seems like I'm overcomplicating it.
I feel like there's a much better way to go about this. If anyone has advice or some good resources on this topic, let me know.
This question is based on premature optimization, which is bad.
You're talking about denormalization, which should only be done if there's a genuine and pressing performance problem. While your idea sounds enticing, it's almost always a bad idea, because:
you're only doing it for performance reasons, but databases are pretty fast - you're unlikely to benefit much anyway
denormalizing introduces complexity - if a value changes in the parent, you must keep every copy of it in the child rows updated. This is a big hassle (not going into detail here)
you don't even know if you have a performance problem: if it ain't busted, don't fix it
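For the question above, the normalized alternative is simply the join; a three-table join like the following (names invented) is routine work for a database and usually nothing to worry about:

SELECT p.name        AS parent_name,
       c.title       AS child_title,
       g.created_at  AS grandchild_created
  FROM grandchild AS g
  JOIN child      AS c ON c.id = g.child_id
  JOIN parent     AS p ON p.id = c.parent_id
 WHERE g.id = 42;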
I have this idea I've been mulling over, based on another concept I read somewhere. Basically you have a single "Primary" table with very few fields, and other tables inherit from that primary table through a foreign key. This much has been done before, so it's no news. What I would like to do is have virtually every table in the database inherit from that Primary table. This way, every object, every record, every entry in every table can have a fully unique primary key (since the PK is actually stored in the Primary table) and can be referenced simply by ID instead of by table.
Another benefit is that it becomes easy to make relationships that can touch multiple tables. For example: I have a Transaction table, and this table wants to have an FK to whatever the transaction is for (inventory, account, contact, order, etc.). The Transaction can just have an FK to the Primary table, and the necessary piece of data is referenced through that.
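Roughly what I have in mind, as a sketch with made-up names:

CREATE TABLE primary_record (
    id BIGINT AUTO_INCREMENT PRIMARY KEY
);

-- every other table "inherits" its key from primary_record
CREATE TABLE inventory (
    id BIGINT PRIMARY KEY,
    FOREIGN KEY (id) REFERENCES primary_record(id)
);

CREATE TABLE transactions (
    id BIGINT PRIMARY KEY,
    target_id BIGINT NOT NULL,
    -- whatever the transaction is for: inventory, account, contact, order, ...
    FOREIGN KEY (id) REFERENCES primary_record(id),
    FOREIGN KEY (target_id) REFERENCES primary_record(id)
);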
The issue that keeps coming up in my head is whether that Primary table will become a bottleneck. The thing is going to have literally millions of records at some point. I know that gigantic record sets can be handled with good table design, but what's the limit?
Has anyone attempted anything similar to this, and what were your results?
You have to consider that this table will have a ton of foreign key relations. These can cause performance issues if you want to delete a row from the root table (the delete can produce some nasty execution plans).
So if you plan to remove rows, it could impact performance. I recently had issues with a setup like this, and it was a pain to clean up (the root table was referenced by 120 other tables - deletes were slow as hell).
To overcome this performance issue you might consider not enforcing the constraints (bad plan), dropping the constraints for performance (bad plan), or grouping all the data that belongs to one entity into one row and sticking to normal normalization practices (good plan).
Yes, the primary table will almost certainly be a bottleneck.
How do you enforce real referential integrity?
For example, How can you be sure that the transaction's FK is actually linked to an inventory, account, contact or order rather than an apple, orange or pineapple?
I think this is something that would be a horrible bottleneck. Not only that, it would make enforcing the real PK/FK relationships much harder. It could create a data integrity nightmare. I don't see where you gain any benefits at all.