Guaranteeing a FK relationship through multiple tables - mysql

I'm using MySQL / InnoDB, and using foreign keys to preserve relationships across tables. In the following scenario (depicted below), a 'manager' is associated with a 'recordLabel', and an 'artist' is also associated with a 'recordLabel'. When an 'album' is created, it is associated with an 'artist' and a 'manager', but both the artist and the manager need to be associated with the same recordLabel. How can I guarantee that relationship with the current table setup, or do I need to redesign the tables?

You cannot achieve this result using pure DRI - Declarative Referential Integrity, or the linking of foreign keys to ensure the schema's referential integrity.
There are 2 ways to solve this problem:
Consider the requirement a database problem, and use a trigger on INSERT and UPDATE to validate the requirements, and fail otherwise.
Consider the nested link a business logic requirement, and implement it in your business logic in PHP/C#/whatever.
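The trigger route can be sketched as follows. This is only a minimal illustration, not the asker's actual schema: it runs against SQLite through Python's sqlite3 module so that it is self-contained, and the table and column names (recordLabel, artist, manager, album, recordLabel_id and so on) are assumptions. In MySQL the trigger body would raise the error with SIGNAL rather than SQLite's RAISE.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all table/column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE recordLabel (id INTEGER PRIMARY KEY);
CREATE TABLE artist (
    id INTEGER PRIMARY KEY,
    recordLabel_id INTEGER NOT NULL REFERENCES recordLabel (id)
);
CREATE TABLE manager (
    id INTEGER PRIMARY KEY,
    recordLabel_id INTEGER NOT NULL REFERENCES recordLabel (id)
);
CREATE TABLE album (
    id INTEGER PRIMARY KEY,
    artist_id INTEGER NOT NULL REFERENCES artist (id),
    manager_id INTEGER NOT NULL REFERENCES manager (id)
);

-- Reject an album whose artist and manager belong to different labels.
CREATE TRIGGER album_same_label_ins BEFORE INSERT ON album
BEGIN
    SELECT RAISE(ABORT, 'artist and manager belong to different labels')
    WHERE (SELECT recordLabel_id FROM artist WHERE id = NEW.artist_id)
       <> (SELECT recordLabel_id FROM manager WHERE id = NEW.manager_id);
END;

-- The same validation on UPDATE.
CREATE TRIGGER album_same_label_upd BEFORE UPDATE ON album
BEGIN
    SELECT RAISE(ABORT, 'artist and manager belong to different labels')
    WHERE (SELECT recordLabel_id FROM artist WHERE id = NEW.artist_id)
       <> (SELECT recordLabel_id FROM manager WHERE id = NEW.manager_id);
END;

INSERT INTO recordLabel (id) VALUES (1), (2);
INSERT INTO artist (id, recordLabel_id) VALUES (1, 1);
INSERT INTO manager (id, recordLabel_id) VALUES (1, 1), (2, 2);
""")


def try_exec(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


same_label_ok = try_exec("INSERT INTO album (id, artist_id, manager_id) VALUES (1, 1, 1)")
cross_label_insert_ok = try_exec("INSERT INTO album (id, artist_id, manager_id) VALUES (2, 1, 2)")
cross_label_update_ok = try_exec("UPDATE album SET manager_id = 2 WHERE id = 1")
```

Note that these triggers only guard changes to album; if an artist or manager later moves to another label, existing albums are not re-validated unless similar triggers are added on those tables.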
As a sidenote, I think the structure is rather strange from a practical perspective. As far as I know, an Artist is signed to a RecordLabel and assigned a Manager separately (either from the label or individually; many artists retain their own manager when switching to another label). Linking the Manager to the Album as well only makes sense to record historic managers, enabling you to retrieve who was the artist's manager when the album was released, but that automatically means your requirement is invalid if the artist switches labels and/or managers later on. I therefore think it is wrong from a practical data view to enforce this link.

What you do is add a recordLabel_id column to the albums table. Then you put two two-column foreign keys on albums, (recordLabel_id, artist_id) and (recordLabel_id, managers_id), each referencing a matching unique index on the artist and manager tables.
Because recordLabel_id can only have one value in each row of the albums table, you will have ensured integrity.
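A minimal sketch of this idea, assuming the two-column indexes are meant to back composite foreign keys. It uses SQLite via Python's sqlite3 module purely so it can be run as-is; the table and column names are assumptions, and in MySQL/InnoDB the same FOREIGN KEY clauses would need the matching indexes on the parent tables.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all table/column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE recordLabel (id INTEGER PRIMARY KEY);
CREATE TABLE artists (
    id INTEGER PRIMARY KEY,
    recordLabel_id INTEGER NOT NULL REFERENCES recordLabel (id),
    UNIQUE (recordLabel_id, id)      -- target of the composite FK
);
CREATE TABLE managers (
    id INTEGER PRIMARY KEY,
    recordLabel_id INTEGER NOT NULL REFERENCES recordLabel (id),
    UNIQUE (recordLabel_id, id)      -- target of the composite FK
);
CREATE TABLE albums (
    id INTEGER PRIMARY KEY,
    recordLabel_id INTEGER NOT NULL,
    artist_id INTEGER NOT NULL,
    managers_id INTEGER NOT NULL,
    -- Both keys share recordLabel_id, so artist and manager must have the same label.
    FOREIGN KEY (recordLabel_id, artist_id) REFERENCES artists (recordLabel_id, id),
    FOREIGN KEY (recordLabel_id, managers_id) REFERENCES managers (recordLabel_id, id)
);

INSERT INTO recordLabel (id) VALUES (1), (2);
INSERT INTO artists (id, recordLabel_id) VALUES (1, 1);
INSERT INTO managers (id, recordLabel_id) VALUES (1, 1), (2, 2);
""")


def try_insert_album(album_id, label_id, artist_id, manager_id):
    """Return True if the album row is accepted, False if a foreign key rejects it."""
    try:
        conn.execute(
            "INSERT INTO albums (id, recordLabel_id, artist_id, managers_id) "
            "VALUES (?, ?, ?, ?)",
            (album_id, label_id, artist_id, manager_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


same_label = try_insert_album(1, 1, 1, 1)        # artist 1 and manager 1 are both on label 1
mixed_label_a = try_insert_album(2, 1, 1, 2)     # manager 2 is on label 2
mixed_label_b = try_insert_album(3, 2, 1, 2)     # artist 1 is not on label 2
```

The price of this design is that recordLabel_id is stored redundantly on albums, but no trigger is needed.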

Related

INSERT based on the relationship between the types of entities to be inserted?

I can't find a term for what I'm trying to do so that may be limiting my ability to find info related to my question.
I'm trying to relate product identifiers and product processing codes (orange table in fig.) with validation against what product types and subtypes are valid for each process code based on process type. Importantly, each product identifier is related to a product type (see ProductIdentifier table) and each process code is related to process type (see ProcessCode table). I minimized the attributes in the tables below to only those necessary for my question.
In the above example, when I INSERT INTO the RunProcessTypeOne table, I need to validate that the ProductCode for RoleOneProductIdentifier is present in ProductTypeTwo. Similarly, I need to validate that the ProductCode for RoleTwoProductIdentifier is present in ProductSubtypeOne.
Of course I can use a stored procedure that inserts into the RunProcessTypeOne table after running SELECT to check for the presence of the ProductCode related to RoleOneProductIdentifier and RoleTwoProductIdentifier in the relevant tables. This doesn't seem optimal since I'm having to run three SELECTs for every INSERT. Plus, it seems fishy that the relationship between ProcessTypes and ProductCodes would only be known within the stored procedure and not via relationships established between the tables themselves (foreign key).
Are there alternatives to this approach? Is there a standard for handling this type of validation where you need to validate individual instances (e.g. ProductIdentifiers) of entity types based on the relationships between those types (e.g. the relationship between ProductTypeTwo and ProcessTypeOne)?
If more details are helpful: The relationship between ProductCode and ProcessCode is many-to-many but there are rules that define product roles in each process and only certain product types or subtypes may fulfill those roles. ProductTypeOne might include attributes that define a specific kind of product like color or shape. ProductIdentifier includes the many lots of any ProductCode that are manufactured. ProcessCode includes settings that are put on a machine for processing. ProductType by way of ProductCode determines if a ProductIdentifier is valid for a particular ProcessType. Individual ProcessCodes don't discriminate valid ProductIdentifiers, only the ProcessType related to the ProcessCode would discriminate.
it seems fishy that the relationship between ProcessTypes and ProductCodes would only be known within the stored procedure and not via relationships established between the tables themselves (foreign key).
Yes that's an important observation, good to see you questioning the current schema. The fact of the matter is that SQL is not very powerful when it comes to representing data structures. So often a stored procedure is the only/least worst approach.
I'll make a suggestion for how to achieve this without stored procedures, but I won't call it "optimal": there's likely to be a performance hit for INSERTs (and worse for UPDATEs), because the SQL engine will probably be in effect carrying out the same SELECTs as you'd code in a stored procedure.
Split table ProductIdentifier into two:
ProductIdentifierTypeTwo PK ProductIdentifier, ProductCode FK REFERENCES ProductTypeTwo.ProductCode.
ProductIdentifierTypeOne PK ProductIdentifier, ProductCode FK REFERENCES ProductTypeOne.ProductCode.
Also CREATE VIEW ProductIdentifier UNION the two sub-tables, PK ProductIdentifier. This makes sure ProductIdentifier isn't duplicated between the two types.
IOW this avoids the ProductIdentifier table directly referencing the ProductCode table, where it can only examine ProductType as a column value, not as a referential structure.
Then
RunProcessTypeOne.RoleOneProductIdentifier FK REFERENCES ProductIdentifierTypeTwo.ProductIdentifier.
RunProcessTypeOne.RoleTwoProductIdentifier FK REFERENCES ProductIdentifierTypeOne.ProductIdentifier.
Making the original ProductIdentifier a VIEW is the least non-optimal way to manage updates (I'm guessing from your comment): ProductIdentifiers are less volatile than RunProcesses.
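The split into two sub-tables might look like the following. This is a sketch under assumptions, not the asker's real schema: SQLite (via Python's sqlite3) stands in for the actual database so the example runs on its own, the column types are guesses, and the sample codes and lot names are invented.

```python
import sqlite3

# Assumption: SQLite stands in for the real database; column types and sample data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE ProductTypeOne (ProductCode TEXT PRIMARY KEY);
CREATE TABLE ProductTypeTwo (ProductCode TEXT PRIMARY KEY);

CREATE TABLE ProductIdentifierTypeOne (
    ProductIdentifier TEXT PRIMARY KEY,
    ProductCode TEXT NOT NULL REFERENCES ProductTypeOne (ProductCode)
);
CREATE TABLE ProductIdentifierTypeTwo (
    ProductIdentifier TEXT PRIMARY KEY,
    ProductCode TEXT NOT NULL REFERENCES ProductTypeTwo (ProductCode)
);

-- The original ProductIdentifier table becomes a view over the two sub-tables.
CREATE VIEW ProductIdentifier AS
    SELECT ProductIdentifier, ProductCode FROM ProductIdentifierTypeOne
    UNION ALL
    SELECT ProductIdentifier, ProductCode FROM ProductIdentifierTypeTwo;

-- Each role can only reference identifiers of the permitted product type.
CREATE TABLE RunProcessTypeOne (
    id INTEGER PRIMARY KEY,
    RoleOneProductIdentifier TEXT NOT NULL
        REFERENCES ProductIdentifierTypeTwo (ProductIdentifier),
    RoleTwoProductIdentifier TEXT NOT NULL
        REFERENCES ProductIdentifierTypeOne (ProductIdentifier)
);

INSERT INTO ProductTypeOne (ProductCode) VALUES ('P1');
INSERT INTO ProductTypeTwo (ProductCode) VALUES ('P2');
INSERT INTO ProductIdentifierTypeOne (ProductIdentifier, ProductCode) VALUES ('LOT-A', 'P1');
INSERT INTO ProductIdentifierTypeTwo (ProductIdentifier, ProductCode) VALUES ('LOT-B', 'P2');
""")


def try_run(role_one, role_two):
    """Return True if the run is accepted, False if a foreign key rejects it."""
    try:
        conn.execute(
            "INSERT INTO RunProcessTypeOne "
            "(RoleOneProductIdentifier, RoleTwoProductIdentifier) VALUES (?, ?)",
            (role_one, role_two),
        )
        return True
    except sqlite3.IntegrityError:
        return False


valid_run = try_run("LOT-B", "LOT-A")     # type-two lot in role one, type-one lot in role two
swapped_run = try_run("LOT-A", "LOT-B")   # roles swapped, so both foreign keys fail
all_identifiers = conn.execute(
    "SELECT ProductIdentifier FROM ProductIdentifier ORDER BY ProductIdentifier"
).fetchall()
```

In this sketch the view is only a read-side convenience; it does not by itself stop the same ProductIdentifier value from being inserted into both sub-tables.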
Re your more general question:
Is there a standard for handling this type of validation where you need to validate individual instances (e.g. ProductIdentifiers) of entity types based on the relationships between those types (e.g. the relationship between ProductTypeTwo and ProcessTypeOne)?
There are facilities included in the SQL standard. Most vendors haven't implemented them, or only partially support them -- essentially because implementing them would need running SELECTs with tricky logic as part of table updates.
You should be able to CREATE VIEW with a filter to only the rows that are the target of some FK.
(Your dba is likely to object that VIEWs come with an unacceptable performance hit. In this example, you'd have a single ProductIdentifier table, with the two sub-tables I suggest above as VIEWs. But maintaining those views would need joining to ProductCode to filter by ProductType.)
Then you should be able to define a FK to the VIEW rather than to the base table.
(This is the bit many SQL vendors don't support.)

SQL for one to one between a single table

I'd like to know what the best way of reflecting relations between precisely two rows from a single (my)sql table is?
Exemplified, we have:
table Person { id, name }
If I want to reflect that persons can be married monogamously (in liberal countries at least), is it better to use foreign keys within the Person:
table Person { id, name, spouse_id(FK(Person.id)) }
and then create stored procedures to marry and divorce Persons (ensuring mutual registration of the marriage or annulment of it, plus triggers to handle on_delete events),
or use a mapping table:
table Marriage {
spouse_a(FK(Person.id)),
spouse_b(FK(Person.id)) + constraint(NOT IN spouse_a)
}
This way divorces (delete) would simply be delete queries without triggers to cascade, and marriage wouldn't require stored procedure.
The constraint is to prevent polygamy / multi-marriage
I guess the second option is preferred? What is the best way to do this?
I need to be able to update this relation on and off, so it has to be manageable..
EDIT:
Thanks for the replies - in practice the application is physical point-to-point interfaces in networking, where it really is a 1:1 relationship (monogamous marriage), and change in government, trends etc will not change this :)
I'm going to use a separate table with A & B, having A < B checked..
To ensure monogamy, you simply want to ensure that the spouses are unique. So, this almost does what you want:
create table marriage (
spouse_a int not null unique,
spouse_b int not null unique
);
The only problem is that a given spouse can be in either column. One normally handles this with a check constraint:
check (spouse_a < spouse_b)
Voila! Uniqueness for the relationship.
Unfortunately, MySQL versions before 8.0.16 parse check constraints but do not enforce them. On those versions you can implement this using a trigger or at the application layer.
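To see what the two unique columns plus the check constraint actually accept and reject, here is a small runnable sketch. It uses SQLite through Python's sqlite3 module, which does enforce CHECK, as a stand-in; the person ids are made up.

```python
import sqlite3

# Assumption: SQLite stands in for an engine that enforces CHECK constraints.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE marriage (
    spouse_a INTEGER NOT NULL UNIQUE,
    spouse_b INTEGER NOT NULL UNIQUE,
    CHECK (spouse_a < spouse_b)   -- forces one canonical ordering of each pair
)
""")


def try_marry(a, b):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO marriage (spouse_a, spouse_b) VALUES (?, ?)", (a, b))
        return True
    except sqlite3.IntegrityError:
        return False


first = try_marry(1, 2)            # accepted
reversed_pair = try_marry(2, 1)    # rejected by the CHECK (2 < 1 is false)
same_a_again = try_marry(1, 3)     # rejected: spouse_a = 1 is already used
same_b_again = try_marry(0, 2)     # rejected: spouse_b = 2 is already used
chained = try_marry(2, 3)          # accepted: 2 is new in spouse_a, 3 is new in spouse_b
```

One caveat this sketch makes visible: the pair (2, 3) is still accepted after (1, 2), because person 2 then appears once in each column. Ruling that out as well would take a trigger or an application-level check on top of these constraints.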
Option #1 - Add relationships structurally
You can add one additional table for every conceivable relationship between two people. But then, when someone asks for a new relationship you forgot to add structurally, you'll need to add a new table.
And then, there will be relationship for three people at a time. And then four. And then, variable size relationships. You name it.
Option #2 - Model relationships as tables
To make it fool proof (well... never possible) you could model the relationships into a new table. This table can have several properties such as size, and also you can model restrictions to it. For example, you can decide to have a single person be the "leader of the cult" if you wish to.
This option requires more effort to design, but it will accommodate many more options, including ideas from your client that you never thought of before.

MySQL Database Layout/Modelling/Design Approach / Relationships

Scenario: Multiple Types to a single type; one to many.
So for example:
parent multiple type: students table, suppliers table, customers table, hotels table
child single type: banking details
So a student may have multiple banking details, as can a supplier, etc etc.
Layout Option 1 students table (id) + students_banking_details (student_id) table with the appropriate id relationship, repeat per parent type.
Layout Option 2 students table (+others) + banking_details table. banking_details would have a parent_id column for linking and a parent_type field for determining what the parent is (student / supplier / customers etc).
Layout Option 3 students table (+others) + banking_details table. Then I would create another association table per parent type (eg: students_banking_details) for the linking of student_id and banking_details_id.
Layout Option 4 students table (+others) + banking_details table. banking_details would have a column for each parent type, ie: student_id, supplier_id, customers_id - etc.
Other? Your input...
My thoughts on each of these:
1. Multiple tables of the same type of information seems wrong. If I want to change what gets stored about banking details, that's also several tables I have to change as opposed to one.
2. Seems like the most viable option. Apparently this doesn't maintain 'referential integrity' though. I don't know how important that is to me if I'm just going to be cleaning up children programmatically when I delete the parents?
3. Same as (2) except with an extra table per type, so my logic tells me this would be slower than (2), with more tables and the same outcome.
4. Seems dirty to me with a bunch of null fields in the banking_details table.
Before going any further: if you do decide on a design for storing banking details which lacks referential integrity, please tell me who's going to be running it so I can never, ever do business with them. It's that important. Constraints in your application logic may be followed, but things happen: exceptions, interruptions, and inconsistencies that are later reflected in the data because there are no meaningful safeguards. Constraints in your schema design must be followed. Much safer, and banking data is something to be as safe as possible with.
You're correct in identifying #1 as suboptimal; an account is an account, no matter who owns it. #2 is out because referential integrity is non-negotiable. #3 is, strictly speaking, the most viable approach, although if you know you're never going to need to worry about expanding the number of entities who might have banking details, you could get away with #4 and a CHECK constraint to ensure that each row only has a value for one of the four foreign keys -- but you're using MySQL, which ignores CHECK constraints (before version 8.0.16), so go with #3.
Index your foreign keys and performance will be fine. Views are nice to avoid boilerplate JOINs if you have a need to do that.
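For reference, layout option #3 could be sketched like this. It is an illustration only: SQLite via Python's sqlite3 stands in for MySQL so it runs on its own, only two of the parent types are shown, and the columns (name, account_number) and sample rows are invented.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; columns and sample data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE students  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE banking_details (id INTEGER PRIMARY KEY, account_number TEXT NOT NULL);

-- One association table per parent type; both columns are real foreign keys.
CREATE TABLE students_banking_details (
    student_id INTEGER NOT NULL REFERENCES students (id),
    banking_details_id INTEGER NOT NULL UNIQUE REFERENCES banking_details (id),
    PRIMARY KEY (student_id, banking_details_id)
);
CREATE TABLE suppliers_banking_details (
    supplier_id INTEGER NOT NULL REFERENCES suppliers (id),
    banking_details_id INTEGER NOT NULL UNIQUE REFERENCES banking_details (id),
    PRIMARY KEY (supplier_id, banking_details_id)
);

INSERT INTO students (id, name) VALUES (1, 'Alice');
INSERT INTO banking_details (id, account_number) VALUES (10, 'ACC-10'), (11, 'ACC-11');
INSERT INTO students_banking_details (student_id, banking_details_id) VALUES (1, 10), (1, 11);
""")


def try_link_student(student_id, banking_details_id):
    """Return True if the link row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO students_banking_details (student_id, banking_details_id) "
            "VALUES (?, ?)",
            (student_id, banking_details_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


orphan_link = try_link_student(99, 10)   # student 99 does not exist, so this is rejected

# A student's accounts are one JOIN away.
alice_accounts = conn.execute("""
    SELECT b.account_number
    FROM students_banking_details AS sb
    JOIN banking_details AS b ON b.id = sb.banking_details_id
    WHERE sb.student_id = 1
    ORDER BY b.account_number
""").fetchall()
```

The UNIQUE on banking_details_id is a design choice of this sketch, keeping each account linked to at most one student; drop it if accounts may be shared. Nothing here prevents the same account from also being linked in suppliers_banking_details.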

SQL Table Design Issue

So I am building out a set of tables in an existing database at the moment, and have run into a weird problem.
First things first, the tables in question are called Organizations, Applications, and PostOrganizationsApplicants.
Organizations is a pre-existing table that is already populated with lots of data in regards to an organization's information which has been filled out in another form on another portal. EDIT: I cannot edit this table.
Applications is a table that records all information that a user inputs in the application form of the website. It is a new table.
PostOrganizationsApplicants is basically a copy of Organizations. This is also a new table.
The process goes:
1. Go to website and choose between two different web forms, Form A pertains to companies who are in the Organizations table, and Form B pertains to companies who are not in that table.
2a. If Form A is chosen, a lot of the fields in the application will be auto-populated because of their previous submission.
2b. If Form B is chosen, the company has to fill out the entire application from scratch.
3. Any Form B applicants must go into the PostOrganizationsApplicants table.
Now I am extremely new to SQL and Database Management so I may sound pretty stupid, but when I am linking the Organizations and PostOrganizationsApplicants tables to the Applications table, FK's for the OrganizationsID column and PostOrganizationsApplicantsID columns will have lots of empty spaces.
Is this good practice? Is there a better way to structure my tables? I've been racking my brain over this and just can't figure out a better way.
No, it's not necessarily bad practice to allow NULL values for foreign key columns.
If an instance of an entity doesn't have a relationship to an instance of another entity, then storing a NULL in the foreign key column is the normative practice.
From your description of the use case, a "Form A" Applications won't be associated with a row in Organizations or a row in PostOrganizationsApplicants.
Getting the cardinality right is what is important. How many Organizations can a given Applications be related to? Zero? One? More than one? And vice versa.
If the relationship is many-to-many, then the usual pattern is to introduce a third relationship table.
Less frequently, we will also implement a relationship table for very sparse relationships, when a relationship to another entity is an exception, rather than the rule.
I'm assuming that the OrganizationsID column you are referring to is in the PostOrganizationsApplicants table (which would mean that a PostOrganizationsApplicants can be associated with at most one Organizations).
I'm also assuming that PostOrganizationsApplicantsID column is in the Applications table, which means an instance of Applications can be associated with at most one PostOrganizationsApplicants.
Bottom line, having a "zero-or-one to many" relationship is valid, as long as that supports a suitable representation of the business model.
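A nullable foreign key of this kind behaves as sketched below: NULL means "no related row", while any non-NULL value must still match a parent row. The sketch assumes SQLite (via Python's sqlite3) as a stand-in, and the column names other than the table names from the question are guesses.

```python
import sqlite3

# Assumption: SQLite stands in for the real database; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Organizations (OrganizationsID INTEGER PRIMARY KEY);
CREATE TABLE PostOrganizationsApplicants (PostOrganizationsApplicantsID INTEGER PRIMARY KEY);

-- Both foreign key columns are nullable: zero-or-one parent of each kind.
CREATE TABLE Applications (
    ApplicationsID INTEGER PRIMARY KEY,
    OrganizationsID INTEGER NULL
        REFERENCES Organizations (OrganizationsID),
    PostOrganizationsApplicantsID INTEGER NULL
        REFERENCES PostOrganizationsApplicants (PostOrganizationsApplicantsID)
);

INSERT INTO Organizations (OrganizationsID) VALUES (1);
INSERT INTO PostOrganizationsApplicants (PostOrganizationsApplicantsID) VALUES (7);
""")


def try_apply(org_id, post_org_id):
    """Return True if the application row is accepted, False if a foreign key rejects it."""
    try:
        conn.execute(
            "INSERT INTO Applications (OrganizationsID, PostOrganizationsApplicantsID) "
            "VALUES (?, ?)",
            (org_id, post_org_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


existing_org = try_apply(1, None)     # linked to Organizations only
new_applicant = try_apply(None, 7)    # linked to PostOrganizationsApplicants only
dangling = try_apply(42, None)        # 42 is not a known organization, so this is rejected
```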
Why not just add a column to the Organizations table that indicates that the Organization is a "Post" type of organization and set it for the Form B type of applicants? - then, all your orgs are in one table - with a single property to indicate where they came from.
If you can add a new record to Organizations (I hope you can), just create a FK from Organizations as the PK of PostOrganizationsApplicants. Then if an Organizations row has a corresponding record in PostOrganizationsApplicants, it's "Post"!
Thanks everybody, I think I found the most efficient way to do it inspired by all of your answers.
My solution below, in case anyone else has a similar problem...
Firstly I will make the PK of PostOrganizationsApplicants the FK of Organizations by making a "link" table.
Then I am going to add a column in PostOrganizationsApplicants which will take in a true/false value on whether they completed the form from the other portal or not. Then I will ask a question in the form whether they have already done the other version of the form or not. If the boolean value is true, then I will point those rows to the Organizations table to auto-populate the forms.
Thanks again!

Is it proper to make a grand-parent key, a primary key, in its grand-child, in a multi-level identifying relationship?

Asked this here a couple of days ago, but haven't gotten many views, let alone a response, so I'm reposting to stackoverflow.
I'm modeling a DB for a conference ticketing system. In this system attendees are members of an attendee group, which belong to a conference. These relationships are identifying, and therefore FKs must be PKs in the respective children.
My current model:
Q: Is it proper to have attendeeGroupConferenceId FK, as a PK, in the attendee table, as MySQL Workbench has automatically set up for me?
On one hand, one would get a performance boost by keeping it in there for quick association at "check in". However, it is not strictly necessary, since the combination of id, attendeeGroupId, and a corresponding lookup of conferenceId in the respective attendeeGroup table is enough. (It therefore becomes redundant data.)
To me, it feels like it might violate some form of normalization, but I plan on keeping it in for the speed boost as described. I'm just curious about what proper design says about giving it PK status or not.
You definitely don't need the attendeeGroupConferenceId in your attendee table. It's redundant, and notice that the candidate key is the combination of (attendeeGroupId, personId), not the attendeeGroupConferenceId alone.
The table attendee also seems to violate the Second normal form (2NF) as it is.
My suggestion is to remove the attribute attendeeGroupConferenceId. In any case you can just join the tables in your queries to get extra info rather than keeping an extra attribute.
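As a rough sketch of what that join looks like once the column is removed: the model diagram is not reproduced here, so the table and column names below (attendeeGroup.id, attendeeGroup.conferenceId, attendee.personId) are assumptions, and SQLite via Python's sqlite3 stands in for MySQL.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; schema and sample data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE attendeeGroup (
    id INTEGER PRIMARY KEY,
    conferenceId INTEGER NOT NULL
);
-- No attendeeGroupConferenceId here: the conference is reachable through the group.
CREATE TABLE attendee (
    attendeeGroupId INTEGER NOT NULL REFERENCES attendeeGroup (id),
    personId INTEGER NOT NULL,
    PRIMARY KEY (attendeeGroupId, personId)
);

INSERT INTO attendeeGroup (id, conferenceId) VALUES (1, 100), (2, 200);
INSERT INTO attendee (attendeeGroupId, personId) VALUES (1, 1), (1, 2), (2, 3);
""")

# Look up each attendee's conference at "check in" with a join.
attendee_conferences = conn.execute("""
    SELECT a.personId, g.conferenceId
    FROM attendee AS a
    JOIN attendeeGroup AS g ON g.id = a.attendeeGroupId
    ORDER BY a.personId
""").fetchall()
```

With the primary key on attendeeGroup.id, this join is an indexed lookup, so the speed gained by duplicating the conference id is likely to be small.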