I'm having trouble determining the proper database design involving entities that may or may not have a relationship with a super entity, or parent entity.
I have the tables work_orders, work_order_groups, and contracts for the corresponding entities.
Originally, it started out with work orders having a 1-to-1 relationship with contracts. But the concept of child work orders was introduced, for extra work of a different type. They were still work orders, but branched from a main work order, so a parent_work_order_id was added to represent that relationship as a foreign key referencing the id of another work order. So a work order could have 0-to-1 parent work orders.
Those child work orders shared the same contract with the parent work order, thus the relationship was changed to 1-to-M for contracts to work orders.
Hopefully at this point, the database design sounds acceptable. Now this is where I'm having some doubts. We've introduced a package deal, where there can be multiple work orders grouped together. We need a work order group entity to exist, so that we can record details about that group. The work orders would have a 0-to-1 relationship with the work order groups.
The contract is going to be created on details from the work order group, so I'm thinking the contract should be associated to the work order group. This would give a 1-to-1 relationship between work order groups and contracts. Similar to child work orders, all work orders within a group are going to share the same contract. And so that can cascade down to all child items: You'd have a work order group with a contract, all the work orders share that contract, and then all the child work orders too. I'm unsure about this design. In my head, I've got something like this:
Is this acceptable? I feel like I'm going to be storing the contracts_id in multiple places, although that is indeed how the relationship works out.
0-to-1 -- In the table for the entity that has the optional relationship, store the other entity's id: either a valid id, or NULL (or perhaps 0) when there is no related row.
1:many -- In a single table ('tree'/'parent-child'/'hierarchy') -- The child record has the id of the one parent. The ultimate ancestor (if there is such) will have a parent of NULL or 0. A childless parent is possible, but not obvious by looking at the parent row.
1:many -- In two tables, do something similar.
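The nullable-key approach described above can be sketched against the tables from the question. This is only a minimal sketch using Python's built-in sqlite3 module, not the asker's actual schema: the column names contract_id and work_order_group_id are assumptions (only parent_work_order_id is named in the question), and NULL is used to mean "no group" or "no parent".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE contracts (id INTEGER PRIMARY KEY);
CREATE TABLE work_order_groups (
    id          INTEGER PRIMARY KEY,
    contract_id INTEGER NOT NULL UNIQUE REFERENCES contracts(id)    -- 1-to-1 (assumed column)
);
CREATE TABLE work_orders (
    id                   INTEGER PRIMARY KEY,
    contract_id          INTEGER NOT NULL REFERENCES contracts(id), -- assumed column
    work_order_group_id  INTEGER REFERENCES work_order_groups(id),  -- NULL = not in a group
    parent_work_order_id INTEGER REFERENCES work_orders(id)         -- NULL = no parent
);
INSERT INTO contracts VALUES (1), (2);
INSERT INTO work_order_groups VALUES (1, 1);
INSERT INTO work_orders VALUES (10, 1, 1, NULL);     -- grouped main work order
INSERT INTO work_orders VALUES (11, 1, 1, 10);       -- child of work order 10
INSERT INTO work_orders VALUES (20, 2, NULL, NULL);  -- standalone work order
""")

# A parent id that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO work_orders VALUES (30, 1, NULL, 999)")
    dangling_parent_rejected = False
except sqlite3.IntegrityError:
    dangling_parent_rejected = True

# Work orders without a parent are simply the rows whose parent id is NULL.
top_level = [row[0] for row in conn.execute(
    "SELECT id FROM work_orders WHERE parent_work_order_id IS NULL ORDER BY id")]
```

Note that the sketch stores the contract id on both the group and each work order, as the question describes; nothing here forces the two to agree, so that would need an extra constraint or a decision to keep it in one place only.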
Scenario: Multiple Types to a single type; one to many.
So for example:
parent multiple type: students table, suppliers table, customers table, hotels table
child single type: banking details
So a student may have multiple banking details, as can a supplier, etc etc.
Layout Option 1: students table (id) + students_banking_details (student_id) table with the appropriate id relationship, repeated per parent type.
Layout Option 2: students table (+others) + banking_details table. banking_details would have a parent_id column for linking and a parent_type field for determining what the parent is (student / supplier / customer etc).
Layout Option 3: students table (+others) + banking_details table. Then I would create another association table per parent type (e.g. students_banking_details) for the linking of student_id and banking_details_id.
Layout Option 4: students table (+others) + banking_details table. banking_details would have a column for each parent type, i.e. student_id, supplier_id, customers_id, etc.
Other? Your input...
My thoughts on each of these:
Option 1: Multiple tables holding the same type of information seems wrong. If I want to change what gets stored about banking details, that's also several tables I have to change as opposed to one.
Option 2: Seems like the most viable option. Apparently this doesn't maintain 'referential integrity' though. I don't know how important that is to me if I'm just going to be cleaning up children programmatically when I delete the parents?
Option 3: Same as (2) except with an extra table per type, so my logic tells me this would be slower than (2), with more tables and the same outcome.
Option 4: Seems dirty to me, with a bunch of null fields in the banking_details table.
Before going any further: if you do decide on a design for storing banking details which lacks referential integrity, please tell me who's going to be running it so I can never, ever do business with them. It's that important. Constraints in your application logic may be followed, but things happen: exceptions, interruptions, inconsistencies that are later reflected in the data because there aren't meaningful safeguards. Constraints in your schema design must be followed. Much safer, and banking data is something to be as safe as possible with.
You're correct in identifying #1 as suboptimal; an account is an account, no matter who owns it. #2 is out because referential integrity is non-negotiable. #3 is, strictly speaking, the most viable approach, although if you know you're never going to need to worry about expanding the number of entities who might have banking details, you could get away with #4 and a CHECK constraint to ensure that each row only has a value for one of the four foreign keys -- but you're using MySQL, which ignores CHECK constraints (versions before 8.0.16 parse them but do not enforce them), so go with #3.
Index your foreign keys and performance will be fine. Views are nice to avoid boilerplate JOINs if you have a need to do that.
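Option #3 can be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3 module rather than MySQL; the name and account_number columns, the sample rows, and the ON DELETE CASCADE choice are assumptions made for the example, and only two of the parent types are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE students  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE banking_details (id INTEGER PRIMARY KEY, account_number TEXT NOT NULL);

-- One association table per parent type. The primary key indexes student_id
-- and the UNIQUE constraint indexes banking_details_id.
CREATE TABLE students_banking_details (
    student_id         INTEGER NOT NULL REFERENCES students(id) ON DELETE CASCADE,
    banking_details_id INTEGER NOT NULL UNIQUE REFERENCES banking_details(id),
    PRIMARY KEY (student_id, banking_details_id)
);
CREATE TABLE suppliers_banking_details (
    supplier_id        INTEGER NOT NULL REFERENCES suppliers(id) ON DELETE CASCADE,
    banking_details_id INTEGER NOT NULL UNIQUE REFERENCES banking_details(id),
    PRIMARY KEY (supplier_id, banking_details_id)
);

INSERT INTO students VALUES (1, 'Ada');
INSERT INTO suppliers VALUES (1, 'Acme');
INSERT INTO banking_details VALUES (1, 'ACC-1'), (2, 'ACC-2'), (3, 'ACC-3');
INSERT INTO students_banking_details VALUES (1, 1), (1, 2);
""")

# The schema, not the application, refuses a link to a student that does not exist.
try:
    conn.execute("INSERT INTO students_banking_details VALUES (99, 3)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

student_accounts = conn.execute("""
    SELECT s.name, b.account_number
    FROM students AS s
    JOIN students_banking_details AS sb ON sb.student_id = s.id
    JOIN banking_details AS b ON b.id = sb.banking_details_id
    ORDER BY b.account_number""").fetchall()
```

One limitation of this sketch: the UNIQUE constraints stop an account from being linked twice within one association table, but nothing stops the same account from being linked to both a student and a supplier.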
I have two different types of user, Parent and Child. They have identical fields; however, a Child has a one-to-many relationship with exams, a relationship that does not exist for Parents.
Would Parent and Child best be modelled as separate tables, or combined into a single table?
What if I have two different types of user, Parent and Child, that are the same apart from the fact that a child belongs to a school (a school has many children)?
Again, would Parent and Child best be modelled as separate tables, or combined into a single table?
They have identical fields, however a Child has a one to many relationship with exams
Even when fields are the same, different constraints [1] mean you are dealing with logically separate entities. Absent other factors, separate entities should be put into separate physical tables.
There may, however, be reasons to the contrary: for example, if there is a key that needs to be unique across parents and children combined, or there is another table that needs to reference all of them.
If that's the case, then logically both "parent" and "child" inherit from "person", which contains the common constraints (and fields). Such "inheritance" can be represented by either storing the whole hierarchy in a single table (and setting the unused "portion" to NULL), or by separating all three "classes" into their own tables, and referencing the "base class" from the "inherited classes", for example [2]:
PERSON_ID is unique across all parents and children. In addition to that, OTHER_TABLE can reference it directly, instead of having to separately reference PARENT_ID or CHILD_ID.
[1] A foreign key in this case.
[2] A very simplified model that just illustrates the point above and does not try to model everything you mentioned in your question.
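The three-table "inheritance" layout described in this answer might look roughly like the following. It is a hedged sketch in Python's built-in sqlite3 module, not the answerer's original diagram: the exam table, the name and subject columns, and the sample rows are assumptions added for illustration, while PERSON_ID and OTHER_TABLE follow the names used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
-- "Base class": the common fields live here, and person_id is unique across everyone.
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- "Inherited classes": each row reuses the person_id of its base row.
CREATE TABLE parent (person_id INTEGER PRIMARY KEY REFERENCES person(person_id));
CREATE TABLE child  (person_id INTEGER PRIMARY KEY REFERENCES person(person_id));

-- Only children can have exams, because the foreign key targets child.
CREATE TABLE exam (
    exam_id  INTEGER PRIMARY KEY,
    child_id INTEGER NOT NULL REFERENCES child(person_id),
    subject  TEXT NOT NULL
);

-- Any other table can reference parents and children alike through person.
CREATE TABLE other_table (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(person_id)
);

INSERT INTO person VALUES (1, 'Pat'), (2, 'Kim');
INSERT INTO parent VALUES (1);
INSERT INTO child  VALUES (2);
INSERT INTO exam (child_id, subject) VALUES (2, 'maths');
INSERT INTO other_table (person_id) VALUES (1), (2);
""")

# Giving an exam to a person who is only a parent violates the foreign key.
try:
    conn.execute("INSERT INTO exam (child_id, subject) VALUES (1, 'physics')")
    parent_exam_rejected = False
except sqlite3.IntegrityError:
    parent_exam_rejected = True

referenced_people = conn.execute("SELECT COUNT(*) FROM other_table").fetchone()[0]
```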
Parent and Child are both Persons without a doubt. You should never put them in separate tables.
Only time separates them: what if a Child becomes a parent?
A parent can easily have children; for that you need a relationship table.
Also, a relationship table is the right way to model school membership.
So the tables here are:
person
is_child_of (many to many, join table) --> relations between persons can be is_parent_of
plain and simple
Remember : being a child is a relation from person to person.
How would you model a grandchild if needed? Yet another table? And a great grandchild?
And suppose you are fine with that and you make a lot of tables for a lot of "kinds of" relationships, and all of a sudden you have to add a field (date of birth) or alter a field format: you have to do it in all your different tables.
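The single person table plus is_child_of join table proposed in this answer can be sketched like this, using Python's built-in sqlite3 module. The column names, the CHECK constraint, the sample people, and the grandchildren helper are assumptions made for illustration; the point is that a grandchild needs no extra table, only a second join through the same relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE person (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    date_of_birth TEXT              -- a new field is added in one place only
);
CREATE TABLE is_child_of (
    child_id  INTEGER NOT NULL REFERENCES person(id),
    parent_id INTEGER NOT NULL REFERENCES person(id),
    PRIMARY KEY (child_id, parent_id),
    CHECK (child_id <> parent_id)   -- nobody is their own parent
);
INSERT INTO person (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO is_child_of VALUES (2, 1), (3, 2);  -- Bob is Ann's child, Cid is Bob's
""")

def grandchildren(conn, person_id):
    """Names of the children of person_id's children (hypothetical helper)."""
    rows = conn.execute("""
        SELECT gc.name
        FROM is_child_of AS up
        JOIN is_child_of AS down ON down.parent_id = up.child_id
        JOIN person AS gc ON gc.id = down.child_id
        WHERE up.parent_id = ?
        ORDER BY gc.name""", (person_id,))
    return [row[0] for row in rows]

ann_grandchildren = grandchildren(conn, 1)
bob_grandchildren = grandchildren(conn, 2)
```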
You have described three different entities in your question -- Parents, Child(ren), and Exams. You have shown how the three differ in their relationships to each other.
From everything in your question, I would say that you have three entities and you should set them up as such in your database. That is, Parent and Child should have separate tables.
If the items are modeled differently in your system, I would say that they should be different tables. Just because parents and children both have similar properties (names, ages, etc.), does not mean they will always have the same relationships with other entities in your database. You could have parent and child be in same table with column relating child to parent, but this just leads to awkward self-join queries when trying to represent this relationship. This in itself can become very odd if a child has two parents. So to me you would have a number of different tables:
parent
child
child_to_parent (many-to-many join table)
school
child_to_school (one school to many children)
classes
child_to_classes (many-to-many)
classes_exams (one-to-many table relating exams to classes)
child_to_classes_exams (many-to-many table relating children to exams for specific classes)
* and maybe things like the following
teacher
teacher_to_classes (many-to-many)
Now certainly in your class design (if you are using OOP), you could have child and parent extend from a common "person" class which would handle logic for setting common properties.
I would break children out into a separate table for the simple reason that a parent may have many children (if not now, maybe in the future).
More than the current need, it is also important to consider what may happen in the future even if it doesn't make sense now. You may have two parent users who want to administrate one child and that child's exams.
Consider the most likely future functional requirements and program with them in mind.
I have a table structure for categories,
Each category can have many children, but can only be a child of one category.
I am deciding whether I should just add a parent_id column to the table, or whether I should add a reference table that just has the parent_id and child_id and maps them together.
If it is important: a category can have no parent, in which case it is a root category.
The most common use cases for this table will be:
Selecting one category
Selecting all categories
Selecting all children of a category
Selecting the parent of a category
Selecting the child tree of a category
As far as I can tell, there is no benefit to having a reference table for this setup. Can anyone see any reason for using a reference table?
If you add a reference table, you create an n:n relationship, which you don't want.
So just add a parent_id to the table.
Make it nullable so you can define a category as the root.
Everything you want to select is quite easy, except for the child tree, but an extra table won't help with that. In Oracle you have CONNECT BY to select tree-like data, but MySQL unfortunately doesn't support that (MySQL 8.0 and later do offer recursive common table expressions via WITH RECURSIVE), although alternative solutions are often requested and provided.
There are some obstacles:
Since you cannot make parent_id unique (multiple children can have the same parent), you will have to add a trigger to enforce only one category being the root, although maybe you can live without that check for the moment.
You could theoretically create a loop: Make a the parent of b, b the parent of c, and c the parent of a. To check if this is the case, you should follow the path to the root. If on that path you'll find any category twice, you're in trouble. I think you could use a trigger to validate this as well, although maybe you can live without that check for the moment. It all depends on how you edit your data, but if you are going to query a full tree, you don't want to get into endless loops because of corrupt data.
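The nullable parent_id column, the child-tree query, and the loop check described above can be sketched as follows. This uses Python's built-in sqlite3 module and SQLite's WITH RECURSIVE, so it is an illustration rather than MySQL code; the table contents and the helper names child_tree and would_create_cycle are assumptions, and the check is done in application code instead of a trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.execute("""
    CREATE TABLE categories (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES categories(id)  -- NULL marks a root category
    )""")
conn.executemany("INSERT INTO categories VALUES (?, ?, ?)", [
    (1, "root", None), (2, "books", 1), (3, "fiction", 2), (4, "music", 1)])

def child_tree(conn, category_id):
    """Names of all descendants of category_id, ordered by id."""
    rows = conn.execute("""
        WITH RECURSIVE tree(id, name) AS (
            SELECT id, name FROM categories WHERE parent_id = ?
            UNION  -- UNION (not UNION ALL) drops repeats, so corrupt loops still terminate
            SELECT c.id, c.name FROM categories AS c JOIN tree AS t ON c.parent_id = t.id
        )
        SELECT name FROM tree ORDER BY id""", (category_id,))
    return [row[0] for row in rows]

def would_create_cycle(conn, category_id, new_parent_id):
    """Follow the path from new_parent_id to the root; meeting category_id means a loop."""
    seen = set()
    current = new_parent_id
    while current is not None and current not in seen:
        if current == category_id:
            return True
        seen.add(current)
        row = conn.execute(
            "SELECT parent_id FROM categories WHERE id = ?", (current,)).fetchone()
        current = row[0] if row else None
    return False

descendants_of_root = child_tree(conn, 1)
loop_if_root_under_fiction = would_create_cycle(conn, 1, 3)
loop_if_music_under_books = would_create_cycle(conn, 4, 2)
```

Calling would_create_cycle before every update of parent_id is one way to keep the data loop-free without a trigger, at the cost of relying on the application to always make that call.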
I'm building a simple way to insert customer orders into the db.
We have several products, each one needs different properties.
I've started designing the following tables:
CUSTOMER -> Order (FK to CUSTOMER) -> OrderItem (FK to Order)
Now I'm wondering how I could link product-specific tables to OrderItem.
Suppose I've two products: product1 (room_name, width, height, color) and product2 (number, width, height, type, optionals). I'd create two different tables and link them with the OrderItem, to get specific options, am I wrong? (of course there will be more than just two products)
How can I do this?
I'd have one Product table with a one-to-many relationship between OrderItem and Product. Put a FOREIGN KEY in the OrderItem table that points to its associated Product.
A design like yours would mean you'd have to add a table every time there was a new product. That would not do. You want to add products by inserting new rows.
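A single Product table referenced from OrderItem could be sketched like this, using Python's built-in sqlite3 module. The column names, the quantity column, and the sample rows are assumptions; the order table is called customer_order here only because ORDER is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE customer_order (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id)
);
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE order_item (
    id         INTEGER PRIMARY KEY,
    order_id   INTEGER NOT NULL REFERENCES customer_order(id),
    product_id INTEGER NOT NULL REFERENCES product(id),
    quantity   INTEGER NOT NULL CHECK (quantity > 0)
);
INSERT INTO customer VALUES (1, 'Dana');
INSERT INTO customer_order VALUES (1, 1);
INSERT INTO product VALUES (1, 'window'), (2, 'door');
""")

# A new product is a new row, not a new table.
conn.execute("INSERT INTO product VALUES (3, 'shutter')")
conn.executemany(
    "INSERT INTO order_item (order_id, product_id, quantity) VALUES (?, ?, ?)",
    [(1, 1, 2), (1, 3, 4)])

order_lines = conn.execute("""
    SELECT p.name, oi.quantity
    FROM order_item AS oi
    JOIN product AS p ON p.id = oi.product_id
    WHERE oi.order_id = 1
    ORDER BY p.name""").fetchall()
```

This sketch deliberately leaves out the product-specific properties (room_name, optionals and so on), which is the part the question is really about.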
No approach can resolve all of the issues you may be dealing with; the choice you make depends on which factor is most important to you.
Most people shy away from having multiple tables. One reason is that you don't know how many tables you may end up with in the future. Another is that your queries may also bloat by having to join to multiple tables. And it may become a maintenance headache with multiple queries to update every time you add a table. Finally, adding a table is not even remotely as friendly as adding a record (do you really want your app to be able to create tables?).
One option is just to add more and more fields to the Product table. By making the property fields NULLable, different products can use different fields.
But... You may then need to add logic to ensure that ProductX -always- has a value in FieldA, but that ProductY always has a value in FieldB, etc. And probably some meta-data about each product type so that your application knows which fields to use for which products. You still may need to add new fields, which is possibly tidier than adding new tables, but it is still something you probably don't want the application doing.
An option that totally avoids using DDL to add a product is to further normalise your data, and have the product-specific-properties in an Entity-Attribute-Value table. This is initially very attractive to many people as it is so generic and flexible.
Product(id, name, another-global-property, etc)
Product_Properties(product_id, property_id, property_value)
You'll probably have some meta-data and extra logic to ensure all the correct properties are used. But now you just add records to a generic structure whenever you create a new product.
But what type should "property value" be? It may need to hold strings, dates, numbers, anything. You could make it a string and use the meta-data to know how to CAST the value. Or you may have several value fields, one of each type, and a "field_type_id" or something to indicate which value-field should be read from.
It's also less friendly for certain searches. If you know a product_id, finding the properties is easy. If you want all products where the expiry date is in the past, you need to be careful about how you structure the data and indexes to make the query efficient. But if you want (expiry < today AND cost > 50) then you get a much different query from what you are used to - Each value is in a different ROW instead of a different FIELD.
Search performance really does begin to degrade as query complexity increases, and the design considerations become more technical.
Which way you go depends on the application's functional requirements, architecture and design decisions, and a healthy dash of 'taste'.
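The Entity-Attribute-Value layout and the "each value is in a different row" query can be sketched as follows, using Python's built-in sqlite3 module. This is an illustration under assumptions: property_id is a plain text name here instead of a key into a meta-data table, all values are stored as strings and CAST when read, dates are ISO-8601 strings, and TODAY is a fixed value so the example is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_properties (
    product_id     INTEGER NOT NULL REFERENCES product(id),
    property_id    TEXT NOT NULL,   -- assumption: a text name, not a meta-data key
    property_value TEXT NOT NULL,   -- everything stored as a string
    PRIMARY KEY (product_id, property_id)
);
INSERT INTO product VALUES (1, 'caviar'), (2, 'cheese'), (3, 'wine');
""")
conn.executemany("INSERT INTO product_properties VALUES (?, ?, ?)", [
    (1, "expiry", "2023-12-01"), (1, "cost", "60"),
    (2, "expiry", "2023-11-01"), (2, "cost", "10"),
    (3, "expiry", "2030-01-01"), (3, "cost", "80"),
])

TODAY = "2024-01-01"  # fixed for the example; ISO dates compare correctly as text

# (expiry < today AND cost > 50): one self-join per property, because each
# value sits in its own row rather than its own column.
expired_and_expensive = [row[0] for row in conn.execute("""
    SELECT p.name
    FROM product AS p
    JOIN product_properties AS e
      ON e.product_id = p.id AND e.property_id = 'expiry'
    JOIN product_properties AS c
      ON c.product_id = p.id AND c.property_id = 'cost'
    WHERE e.property_value < ?
      AND CAST(c.property_value AS REAL) > 50
    ORDER BY p.name""", (TODAY,))]
```

Every additional condition adds another join against product_properties, which is the query-shape change the answer warns about.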
You have tagged the question as django, so you should read this recent post:
Coding an inventory system, with polymorphic items and manageable item types
In this post #ThibaultJ explains how to accomplish this with Django model utils.
The idea is that you have a 'product' model and you inherit product1 and product2 from this model, adding specific information for both. #ThibaultJ has posted interesting samples.
I will notify #ThibaultJ about this question. If #ThibaultJ writes an answer I will remove my post.
Here are some options:
1. IMHO I would choose an Inheritance pattern, i.e. a new table called "ProductBase" with a unique Surrogate. ProductBase would have a classification, e.g. "ProductType", which would then allow you to join into the appropriate 'subclass' Product table. OrderItem would reference just the Surrogate. Referential Integrity is enforceable, and it gives the opportunity for extending to additional forms of products. It does however require the use of a common unique surrogate amongst all Product table types. If there are other tables (other than OrderItem) referencing Product, it would also avoid having to FK to composite keys.
2. Nullable Foreign Keys in OrderItem, i.e. OrderItem would have a nullable FK to both (all) types of Product tables, although only one of them would be present on each row.
Inner joining OrderItem to the appropriate Product tables would eliminate the 'wrong' product joins based on the NULLs. RI can still be enforced.
3. If you have the SAME type of Primary Key on all your Product subclass tables, then you could also add a single Product "Foreign" Key and a "ProductType" "Switch" on OrderItem. The problem here is that you can't enforce RI.
That said, I really wouldn't be creating a new table for each and every product - surely there are some broad 'categories' of Product which can be modelled in a uniform manner.
No doubt if you sell Aircraft and Groceries you would probably need an AircraftProduct and a GroceryProduct, but surely A300, Boeing 747 and Cessna Skyhawk would fit as rows inside AircraftProduct, even if there are a few 'optional' nullable fields in each table not applicable to all products in this 'category'?
Edit : First see Dems and Duffmo's posts to see if you can avoid the requirement for having multiple Product tables at all, by using EAV / Multivalue / Metadata patterns to model Product.
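The nullable-foreign-keys option from this answer can be sketched like this, using Python's built-in sqlite3 module. The product1 and product2 columns follow the question; the CHECK constraint requiring exactly one key per row, the column types, and the try_insert helper are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE product1 (
    id INTEGER PRIMARY KEY, room_name TEXT, width REAL, height REAL, color TEXT
);
CREATE TABLE product2 (
    id INTEGER PRIMARY KEY, number INTEGER, width REAL, height REAL, type TEXT
);
CREATE TABLE order_item (
    id          INTEGER PRIMARY KEY,
    product1_id INTEGER REFERENCES product1(id),
    product2_id INTEGER REFERENCES product2(id),
    -- Each comparison yields 0 or 1 in SQLite, so the sum counts the keys that are set.
    CHECK ((product1_id IS NOT NULL) + (product2_id IS NOT NULL) = 1)
);
INSERT INTO product1 VALUES (1, 'kitchen', 1.2, 0.8, 'white');
INSERT INTO product2 VALUES (1, 7, 2.0, 1.0, 'sliding');
""")

def try_insert(product1_id, product2_id):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO order_item (product1_id, product2_id) VALUES (?, ?)",
            (product1_id, product2_id))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = [try_insert(1, None), try_insert(None, 1)]
rejected = [try_insert(None, None),  # no product at all
            try_insert(1, 1),        # two products on one row
            try_insert(99, None)]    # product that does not exist

# The inner join keeps only the order items that point at product1.
product1_items = conn.execute("""
    SELECT COUNT(*) FROM order_item AS oi
    JOIN product1 AS p ON p.id = oi.product1_id""").fetchone()[0]
```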
I am trying to determine how the tables need to be linked.
The employees table is directly linked to a number of tables which provide more information. A few of those tables have even more details.
Employees have a unique employeeid but I understand best practice is to still have an id?
Customers have a unique customerid
Employees have a manager
Managers are employees
Customers have a manager associated with them.
Employees may have academic, certification and/or professional information.
With all of this said what would be the best recommendation for creating primary and foreign keys? Is there a better way to handle the design?
EDIT
Updated diagram to reflect feedback thus far. See comments to understand changes taking place.
Though your question is sensible, before you go any further in design, I would suggest that you spend some time understanding relationships, foreign keys and how they propagate through relationships.
The diagram is utterly wrong. It will help if you start naming primary keys with the full name, TableNameID, like EmployeeID; then it will become obvious how keys propagate through relationships. If you had full names you would have noticed that all your arrows are pointing in the wrong direction; parent and child are reversed. Anyway, it takes some practice. So I would suggest that you rework the diagram and post the new version, so that we can comment on that one. It should start looking something like this (just a small segment)
EDIT
This is supposed to point you towards the next step. See if you can read the description (specification) and follow the diagram at the same time.
Each employee has one manager, a manager manages many employees.
A manager is an employee.
Each customer is managed by an employee who acts as an account manager for that customer.
The account manager for a customer may change over time.
Each employee is a member of one team, each team has many employees.
Employee performance for each employee is tracked over time.
An employee may have many credentials; each credential belongs to one employee only.
Credential is academic or professional.
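The specification above might translate into tables roughly as follows. This is a hedged sketch in Python's built-in sqlite3 module, not the answerer's diagram: every table and column name is an assumption (using the TableNameID naming suggested earlier), manager_id is nullable on the assumption that the top manager has no manager, and performance tracking is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE team (team_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    manager_id  INTEGER REFERENCES employee(employee_id),  -- a manager is an employee
    team_id     INTEGER NOT NULL REFERENCES team(team_id)
);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per assignment, so the account manager can change over time.
CREATE TABLE account_manager (
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    start_date  TEXT NOT NULL,
    PRIMARY KEY (customer_id, start_date)
);
CREATE TABLE credential (
    credential_id INTEGER PRIMARY KEY,
    employee_id   INTEGER NOT NULL REFERENCES employee(employee_id),
    kind          TEXT NOT NULL CHECK (kind IN ('academic', 'professional')),
    title         TEXT NOT NULL
);
INSERT INTO team VALUES (1, 'sales');
INSERT INTO employee VALUES (1, 'Mia', NULL, 1), (2, 'Raj', 1, 1), (3, 'Lee', 1, 1);
INSERT INTO customer VALUES (1, 'Acme');
INSERT INTO account_manager VALUES (1, 2, '2023-01-01'), (1, 3, '2024-01-01');
INSERT INTO credential (employee_id, kind, title) VALUES (2, 'academic', 'BSc');
""")

def current_account_manager(conn, customer_id):
    """Name of the employee with the latest assignment for this customer."""
    row = conn.execute("""
        SELECT e.name
        FROM account_manager AS am
        JOIN employee AS e ON e.employee_id = am.employee_id
        WHERE am.customer_id = ?
        ORDER BY am.start_date DESC
        LIMIT 1""", (customer_id,)).fetchone()
    return row[0] if row else None

acme_manager = current_account_manager(conn, 1)
reports_of_mia = [row[0] for row in conn.execute(
    "SELECT name FROM employee WHERE manager_id = 1 ORDER BY name")]

# A credential that is neither academic nor professional is refused.
try:
    conn.execute(
        "INSERT INTO credential (employee_id, kind, title) VALUES (3, 'hobby', 'chess')")
    bad_kind_rejected = False
except sqlite3.IntegrityError:
    bad_kind_rejected = True
```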
Employees have unique employeeID but I understand best practice is to still have an id?
No. (But keep reading.) You need an id number in two cases: 1) when you don't have any other way to identify an entity, and 2) when you're trying to improve join performance. And that doesn't always work. ID numbers always require more joins, but when the critical information is carried in human-readable codes or in natural keys, you don't need a join to get to it. Also, routine use of ID numbers often introduces data integrity problems, because unique constraints on natural keys are often omitted. For example, USPS postal codes for state names are unique, but tables that use ID numbers instead of USPS postal codes often omit the unique constraint on that two-character column. In your case, you need a unique constraint on employee number regardless. (But keep reading.)
Employees have a manager.
Does the table "team" implement this requirement? If the "manager" table identifies managers, then your other manager columns should reference "manager". (In this diagram, that's customers, team, and customer_orders.)
Managers are employees.
And that information is stored in the table "manager"?
Customers have a manager associated with them.
And so does each order. Did you mean to do that?
Employees may have academic, certification and/or professional information.
But apparently you can't record any of that information unless you store a skill first. That doesn't sound like a good approach. Give it some more thought.
Whenever you design tables with overlapping predicates (overlapping meanings), you need to stop and sit on your hands for a few minutes. In this case, the predicates of the tables "employees" and "customers" overlap.
If employees can be customers, which is the case for almost every business, then you have set the stage for update anomalies. An employee's last name changes. Clearly, you must update "employees". But do you have to update "customers" too? How do you know? You can't really tell, because those two tables have independent id numbers.
An informal rule of thumb: if any real-world entity has more than one primary key identifying it in your database, you have a problem. In this case, an employee who is also a customer would have two independent primary keys identifying that person--an employee id and a customer id.
Consider adding a table of persons, some of whom will be employees, and some of whom will be customers. If your database is well-designed and useful, I'll bet that later some of the persons will be prospects, some will be job applicants, and so on. You will need an id number for persons, because in the most general case all you can count on knowing is their name.
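A table of persons with employees and customers hanging off it could be sketched like this, using Python's built-in sqlite3 module. The column names, the text employee and customer numbers, and the sample row are assumptions; note the unique constraints on the natural keys, in line with the earlier point about not omitting them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,   -- the one key identifying the real-world person
    last_name TEXT NOT NULL
);
CREATE TABLE employee (
    person_id       INTEGER PRIMARY KEY REFERENCES person(person_id),
    employee_number TEXT NOT NULL UNIQUE
);
CREATE TABLE customer (
    person_id       INTEGER PRIMARY KEY REFERENCES person(person_id),
    customer_number TEXT NOT NULL UNIQUE
);
INSERT INTO person   VALUES (1, 'Smith');
INSERT INTO employee VALUES (1, 'E-100');
INSERT INTO customer VALUES (1, 'C-900');  -- the same person is also a customer
""")

# The last name changes once, in one place.
conn.execute("UPDATE person SET last_name = 'Jones' WHERE person_id = 1")

name_as_employee = conn.execute("""
    SELECT p.last_name FROM employee AS e
    JOIN person AS p ON p.person_id = e.person_id
    WHERE e.employee_number = 'E-100'""").fetchone()[0]
name_as_customer = conn.execute("""
    SELECT p.last_name FROM customer AS c
    JOIN person AS p ON p.person_id = c.person_id
    WHERE c.customer_number = 'C-900'""").fetchone()[0]
```

Because both roles share one person row, the update anomaly described above cannot occur in this sketch: there is no second copy of the name to forget.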
At some point, you'll have to take your database design knowledge to the next level. For an early start, read this article on people's names.