I am looking for some feedback/guidance on modeling a hierarchy structure within a relational database. My requirement states that I need to have a tree structure, where every node within the tree can represent a different type of data. For example:
Organization
Department 1
Employee 1
Employee 2
Office Equipment 1
Office Equipment 2
Department 1
Team 1
Office Equipment 3
In the example above, Organization, Department, Employee, Office Equipment, and Team could all be different tables within the database and have different properties associated with them. Additionally, things like Office Equipment may not necessarily be required to be associated to a department - it could be associated to a Team or the Organization.
I have two ideas surrounding modeling this:
The first idea is to have a hierarchy table like below:
hierarchys
hierarchy_id (INT, NOT NULL)
parent_hierarchy_id (INT, NOT NULL)
organization_id (INT, NULL)
department_id (INT, NULL)
team_id (INT, NULL)
office_equipment (INT, NULL)
In the table above, each of the columns would be a nullable field with a foreign key reference to their respectable table. The idea would be that only one column from every row would be populated.
My second idea is to have a single table like below:
hierarchys
hierarchy_id (INT, NOT NULL)
parent_hierarchy_id (INT, NOT NULL)
type (INT, NOT NULL)
In this case, the table above would manage the hierarchy structure, and each "node table" would have a hierarchy_id which would have a foreign key reference back to the hierarchy table (i.e. organizations would have a hierachy_id column). The type column would be a lookup to represent which type node is being represented (i.e. Organization, Employee, etc).
I see pros and cons in both approaches.
Some additional information:
I would like to keep in mind maintainability of this table - there will be additions, deletions, changes, etc.
I will have to display this data on an user interface, which will likely just display an icon to represent the node type, and the name.
I will have to preform some aggregations across the tree for different data requests.
This structure will be backed by a MySQL database.
Does anyone have an experience with a similar scenario? I have searched quite a bit for information and guidance on this approach, but have not been able to find any information. I have a feeling there is a specific term for what I am looking for that I am failing to use.
Thank you in advance for the community's help.
You may want to look into "nested sets". This is a model for representing subsets of an ordered set by two limits, which we can call "left" and "right". In this model, (6,7) is a subset of (5,10) because it is "nested" inside of it. If you use nested sets together with your design of having a separate table for the hierarchy, you'll end up with four columns in your hierarchy table: leftID, rightID, ObjectID (an FK), and level.
There is a good description of the nested set model in Wikipedia, which you can view by clicking here.
I have encountered similar situations throughout different projects, and the approach I've taken in those cases was very similar to your second solution.
I am also a bit biased towards how some Ruby on Rails gems do things, but you can easily figure out how you would implement these techniques with plain SQL and some application logic. So I'm giving you one alternative to your solution:
Using "Multi Table Inheritance" (Implemented in Heritage: https://github.com/dipth/Heritage). In this scenario you would have a Node table which forms the basis of your hierarchy with:
Node (id, parent_node_id, heir_type, heir_id)
Where the heir_type is the name of the table holding the details for the node (e.g., Organization, Employee, team, etc.), and the heir_id is the id of the object in that table.
Then each type of node would have it's own table and it's own unique id. e.g.:
Organization(id, name, address)
Having the rest of the tables independently from the hierarchy (i.e., strong entities) makes your model more flexible to new additions. Also having a separate table with its own unique id to handle the hierarchy makes it easier to render the hierarchy without having to deal with parent types etc. This model is also more flexible in the sense that one entity can be part of many different branches of the hierarchy (e.g., Employee 1 could be a member of Team 1 and Team 2 at the same time.)
Your solution has one mistake: The hierarchys is miss-spelled :P JK. The hierarchys table has no unique id. It looks like the unique id is a composite key (hierarchy_id, type). The parent_hierarchy_id does not capture the type of the parent and thus it may point to multiple nodes and many inconsistencies.
If you'd like me to elaborate more, let me know.
Related
I am stuck trying to develop a Bill of Materials in Access. I have a table call IM_Item_Registry where I have the Item_Code and a boolean for if it's a component. Where I'm stuck is that past sins of the company made several part numbers for the same ingredient from different vendors. A product may use ingredient 1 at the beginning of the run and ingredient 2 at the end of a run depending on inventory and it may switch from job to job (Lack of discipline and random purchasing based on price). It's creating a headache for me because they typically have different inclusions. How would I go about adding in the flexibility to use both? or would it just be easier to make multiple versions and then select those version upon scheduling?
I know this is loaded and I can include more detail if needed but I appreciate your help I've been researching on how to do this for a couple weeks now.
EDIT (3/28/2019)
this is for an injection molding company.
IM_Item_Registry (Fields: Item_Code, Category(Raw, manufactured, customer supplied, assembly component), Description, Component (boolean), active (boolean), Unit of Measure.
for this Bill-of-materials 100011 produces component lets call this a handle. bill 100011 uses raw resin 700049 at 98% inclusion and raw color 600020 at 2% inclusion. However, we may run out of raw color 600020 and have to run it out of 600051 which would change 700049 to 98.5% inclusion because 600051 requires 1.5% inclusion to achieve the same color.
i would like to create a table that would call out for the general term lets say 600020 and 600051 is yellow color additive. then create a "ghost" number to call for either 600020 or 600051 and give both formulation recipes. When production starts they would scan in which color they actually used to create the production BOM themselves and record which color was used and how much. is there a way to do this in access database structuring?
I'm assuming I would need both the item_registry table, a BoM table (fields: BOM#, ParentID, Ghost_ID) and then a components table (Fields: Ghost_ID, item_code, Inclusion Rate).
Database normalization is the guiding principle for designing efficient, useful tables and relationships in a relational database. Access forms, subforms, reports, etc. require properly normalized tables to work as intended. There are various levels of normalization, but the common idea is to avoid duplication of data between rows and columns of data. Having duplicate data requires a lot of overhead in storage and in ensuring that actions on the database do not create inconsistent states (contradictory data values). Well-normalized tables allow useful constraints to be defined between data columns and/or rows to ensure that data is valid.
The [BoM] table as proposed in the question is not normalized. But before we get to that, the ParentID was not defined and it's not clear what it represents. Instead, to help show why it's not normalized, let me add a [Product] column to the [BoM] table. Then if such a handle has two alternative lists of components (ghosts?), the table would look like
BOMID, Product, GhostID
----- ------- -------
1 Handle 1
1 Handle 2
See the duplication? And now if the product is renamed, for instance to "Bronze Handle", then both rows need to be updated for a single conceptual element. It also introduces the possibility of having contradictory data like
BOMID, Product, GhostID
----- ------- -------
1 Handle 1
1 Bronze Handle 2
Enough said about that, since I've already gone on too much about normalization concepts here. Following is a basic normalized schema which would serve you better, but notice that it's not too much different that what you proposed in the question. The only real difference is that the BoM table is normalized by splitting its columns (and purpose) into another table.
I do not list all columns here, only primary and foreign keys and a few other meaningful columns. PK = Primary Key (unique, non-null key), FK = Foreign Key. Proper indices should be defined on the PK and FK columns AND relationships defined with appropriate constraints.
Table: [IM_Item_Registry]
Item_Code (PK)
Table: [BOM]
BOMID (PK)
ProductID (FK)
Table: [BOM_Option]
OptionID (PK)
BOMID (FK)
Primary (boolean) - flags the primary/usual list of components
Description
Table: [Option_Items]
OptionID (FK; part of composite PK)
Item_Code (FK; part of composite PK)
Inclusion_Rate
The [BOM].[ProductID] column alludes to another table with details of the product which should be defined separately from the Bill of Material. If this database really is super-simplistic, then it could just be a string field [Product] containing the name, but I assume there are more useful details to store. Perhaps this is what the ParentID also alluded to? (I suggest choosing names that are not so abstract like "parent" and "ghost", hence my choice of the word "option".)
Really, since [BOM_Option] should be limited to a single option per BOM, it would fulfill proper normalization to create another table like
Table: [BOM_Primary]
[BOMID] (FK and PK) - Primary key so only one primary option can be defined at once
[OptionID] (FK)
I am creating a database for a university dept(for internal use) and this database tracks issues related to people. I get information of only employees at the university and I am tracking them using their university ID. But the database is also intended for people who are not employees at the university or even sometimes people outside the university. I want to assign an ID to these people but store values within same column as university id. Any ideas how I should tackle this issue? I don't know how to keep the univ id and the no. I am going to give to the others in the same column and yet treat them differently (when needed). How do people usually tackle such issues?
PS: I do not like auto numbers since I cannot delete an ID and get it back into the database
Your best bet here is probably to create a common Person table for all people which simply tracks an Id, then relate separate tables for each of the different kinds of people (ex. Employee, Student, etc.) to it - note that all of these names are just examples. Each of these tables would contain a foreign key to the Person table. In effect, this logically make each of them a different "subclass" of person.
For example, Employee my be related to Person via the foreign key
[Person.PersonId] {PK} <==(1-0)==> {FK:Person.PersonId} Employee.PersonId
This is generic for any Other entity:
[Person.PersonId] {PK} <==(1-0)==> {FK:Person.PersonId} Other.PersonId
Note that any columns common to all types of person may exist directly on the Person table, with each of the "subclasses" of person only recording columns specific to its particular type.
In addition, you may create a view with joins these tables to present a "combined" generic record for people. In some cases, it's useful to include a column in the view that simply indicates which table an individual record came from (ex, class or type.
I personally would not want to mix the two different kind of ids within one field. This is just waiting for trouble when you migrate since you need to keep these apart by convention or through other fields.
You will of course use the uni id to search the person ...
(uni id and an id for people not on uni)
Approach 1 - introduce an additional table
Issue
- id
- person_id
Person
- id
- university_person_id (optional)
University Person (table)
- id
Approach 2 - make the university ref id optional
Issue
- person_id
Person
- id
- university_ref (optional)
What do you think?
But if you really want to go along with mixing the two kind of ids within one field I suggest to use a prefix followed by a generated number.
EXTERNAL-123456
You can also introduce an additional field external_contact (boolean) or contact_type 'Uni' or 'External'. Also add a unique index on the two combined.
Hope this will give you some food for thought :)
We have a J2EE content management and e-commerce system, and in this system – for sake of a simple example – let’s say that we have 100 objects. All of these objects extend the same base class, and all share many of the same fields.
Let’s take two objects as an example: a news item that would be posted on a website, and a product that would be sold on a website. Both of these share common properties:
IDs: id, client ID, parent ID (long)
Flags: deleted, archived, inactive (boolean)
Dates: created, modified, deleted (datetime)
Content: name, description
And of course they have some properties that are different:
News item: author, posting date
Product: price, tax
So (finally) here is my question. Let’s say we have 100 objects in our system, and they all follow this pattern. They have many fields that overlap, and some unique fields. In terms of a relational database, would we be better off with:
Option One: Less Tables, Common Tables
table_id: id, client ID, parent ID (long) (id is the primary key, a GUID for all objects)
table_flag: id, deleted, archived, inactive (boolean)
table_date: id, created, modified, deleted (datetime)
table_content: id, name, description
table_news: id, author, posting date
table_product: id, price, tax
Option Two: More Tables, Common Fields Repeated
table_news: id, client ID, parent ID, deleted, archived, inactive, name, description, author, posting date
table_product: id, client ID, parent ID, deleted, archived, inactive, name, description, price, tax
For full disclosure – I am a developer and not a DBA, and because of that I prefer option one. But there is another team member that prefers option two, and I think he makes valid points.
Option One: Pros and Cons
Pro: Encapsulates common fields into common tables.
Pro: Need to change a common field? Change it in one place.
Pro: Only creates new fields/tables when they are needed.
Pro: Easier to create the queries dynamically, less repetitive code
Con: More joining to create objects (not sure of DB impact on that)
Con: More complex queries to store objects (not sure of DB impact on that)
Con: Common tables will become huge over time
Option Two: Pros and Cons
Pro: Perhaps it is better to distribute the load of all objects across tables?
Pro: Could index the news table on the client ID, and index the product table on the parent ID.
Pro: More readable to human eye: easy to see all the fields for an object in one table.
My Two Cents
For me, I much prefer the elegance of the first option – but maybe that is me trying to force object oriented patterns on a relational database. If all things were equal, I would go with option one UNLESS a DB expert told me that when we have millions of objects in the system, option one is going to create a performance problem.
Apologies for the long winded question. I am not great with DB lingo, so I probably could have summarized this more succinctly if I better understood terms like normalization. I tried to search for answers on this topic, and while I found many that were close (I suspect this is a common DB issue) I could not find any that answered all my questions. I read through this article on normalization:
But I did not totally understand it. On the one hand it was saying that you should remove any redundancies. But on the other hand, it was saying that each attribute should define only one object.
Thanks,
John
You should read Patterns of Enterprise Application Architecture by Martin Fowler. He writes about several options for the scenario you describe:
Single Table Inheritance: One table for all object subtypes. Stores all attributes, setting them NULL where they are inapplicable to the row's object subtype.
Class Table Inheritance: One table for column common to all subtypes, then one table for each subtype to store subtype-specific columns.
Concrete Table Inheritance: One table for each subtype, storing both subtype-specific columns and columns common to all subtypes.
Serialized LOB: One table for all object subtypes. Store common attributes as conventional columns, but combine optional or subtype-specific columns as fields in a BLOB that stores XML or JSON or whatever format you want.
Each one of these designs has pros and cons, so choose a solution depending on the most common way you access your data.
However, notice I use the word subtype above. I would use these designs only if the different object types are subtypes of a common base class. I'm assuming that News item and Product don't actually share a logical base class (besides Object); they are not subtypes of a common superclass.
So for the sake of OO design, I would choose Concrete Table Inheritance. This avoids any inappropriate coupling between these subtypes. There are columns the two tables have in common, but they basically amount to bookkeeping, not anything to do with the function of the class and hence the table.
I'm building a simple way to insert customer orders into the db.
We have several products, each one needs different properties.
I've started designing the following tables:
CUSTOMER -> Order (FK to CUSTOMER) -> OrderItem (FK to Order)
Now I'm thinking How could I link product-specific tables to OrderItem.
Suppose I've two products: product1 (room_name, width, height, color) and product2 (number, width, height, type, optionals). I'd create two different tables and link them with the OrderItem, to get specific options, am I wrong? (of course there will be more than just two products)
How can I do this?
I'd have one Product table with a one-to-many relationship between OrderItem and Product. Put a FOREIGN KEY in the OrderItem table that points to its associated Product.
A design like yours would mean you'd have to add a table every time there was a new product. That would not do. You want to add products by inserting new rows.
No approach can resolve all of the issues you may be dealing with, the choice you make depends on which factor is most important to you.
Most people shirk away from having multiple tables. One reason is that you don't know how many tables you may end up with in the future. Another is that your queries may also bloat by having to join to multiple tables. And it may become a maintenance headache with multiple queries to update every time you add a table. Finally, adding a table is not even remotely as friendly as adding a record (Do you really want your App to be able to create tables?).
One option is just to add more and more fields to the Product table. By making the property fields NULLable, different products can use different fields.
But... You may then need to add logic to ensure that ProductX -always- has a value in FieldA, but that ProductY always has a value in FieldB, etc. And probably some meta-data about each product type so that your application knows which fields to use for which products. You still may need to add new fields, which is possibly tidier than adding new tables, but you still probably don't want the Application doing.
An option that totally avoids using DDL to add a product is to further normalise your data, and have the product-specific-properties in an Entity-Attribute-Value table. This is initially very attractive to many people as it is so generic and flexible.
Product(id, name, another-global-property, etc)
Product_Properties(product_id, property_id, property_value)
You'll probably have some meta-data and extra logic to ensure all the correct properties are used. But now you just add records to a generic structure whenever you create a new product.
But what type should "property value" be? It may need to hold strings, dates, numbers, anything. You could make it a string and use the meta-data to know how to CAST the value. Of you may have several value fields, one of each type, and a "field_type_id" or something to indicate which value-field should be read from.
It's also less friendly for certain searches. If you know a product_id, finding the properties is easy. If you want all products where the expiry date is in the past, you need to be careful about how you structure the data and indexes to make the query efficient. But if you want (expiry < today AND cost > 50) then you get a much different query from what you are used to - Each value is in a different ROW instead of a different FIELD.
Search performance really does begin to shrink as query complexity increases and design considerations become more technical.
Which way you go depends on application functional requirement, architecture and design decisions, and a good helpful dash of 'taste'.
You have tagged question as django. Then you should read this recent post:
Coding an inventory system, with polymorphic items and manageable item types
In this post #ThibaultJ explain how to accomplish this with Django model utils.
The idea is that you have a 'product' model and you inherit product1 and product2 from this model adding specific information for both. #ThibaultJ has posted intesting samples.
I will notice #ThibaultJ about this question. If #ThibaultJ writes an answer I will remove my post.
Here are some options
IMHO I would choose an Inheritance pattern, i.e. a new table called "ProductBase" with a unique Surrogate. Product base would have a classification e.g. "ProductType" which would then allow you to join into the appropriate 'subclass' Product table. OrderItem would reference just the Surrogate. Referential Integrity is enforcable, and it gives the opportunity for extending to additional forms of products. It does however require the use of a common unique surrogate amongst all Product table types. If there are other tables (other than OrderItem) referencing Product, it would also avoid the use of having to FK to composite keys.
Nullable Foreign Keys in OrderItem, i.e. OrderItem would have nullable FK to both (all) types of Product Tables, although only one of them would be present on each row.
By inner joining OrderItem to the appropriate Product tables would eliminate the 'wrong' product joins based on the NULLs. RI can still be enforced.
If you have the SAME type of Primary Key on all your Product subclass tables, then you could also add a single Product "Foreign" Key and a "ProductType" "Switch" on OrderItem. The problem here is that you can't enforce RI.
That said, I really wouldn't be creating a new table for each and every product - surely there are some broad 'categories' of Product which can be modelled in a uniform manner.
No doubt if you sell Aircraft and Groceries that you would probably need a AircraftProduct and a GroceryProduct, but surely A300, Boeing 747 and Cessna Skyhawk would fit as rows inside AircraftProduct, even if there are a few 'optional' nullable fields in each table not applicable to all products in this 'category'?
Edit : First see Dems and Duffmo's posts to see if you can avoid the requirement for having multiple Product tables at all, by using EAV / Multivalue / Metadata patterns to model Product.
Part of my schema for a travel project has the following tables
Cruises
Flights
Hotels
CarParking
I need a container that wraps one or more of these products into a package. One Cruise/Hotel etc might be part of many packages. I initially thought of
Package
- PackageId
- Etc
PackageItem
- PackageItemId
- PackageId (fk)
- ItemId (fk)
- ItemType
Where ItemType would indicate whether it's a Cruise, Flight, Hotel etc. I suppose I could use Triggers to enforce referential integrity.
My other idea was
Package
- ...
PackageItem
- PackageItemId
- PackageId (fk)
- CruiseId (nullable fk)
- FlightId (nullable fk)
- HotelId (nullable fk)
- CarParkingId (nullable fk)
- etc
I suppose each has it pros and cons, but I can't decide. Which do you think is better, which would you choose if you had to implement something like this?
Database is MySql. Platform is C# MVC ASP.NET
(I did search and there were a few similar questions but nothing that corresponded all that well)
The first option is the most flexible. And I tend to go with flexibility.
Advantage: Common Queries
If you want a report on cruises, the query is the same as one for hotels, but with a different WHERE clause.
Using the second form you need to join on and select from different tables.
*Advantage: Growth without Schema Changes
If you need to add Excursions to your model (something that can certainly have many associated to a single package), you just create a new Excursions type.
Using the second form you need to add new fields to your tables, creates new tables to hold the data, and update your queries and logic to use those new tables and fields.
Cost: Data moving to a form not friendly for human digestion
Many people could legitimately say that this shouldn't matter at all. I say that it matters in so far as you have to take account of it...
- It can make debugging harder, so you need to be more regimented and methodical
- It means your GUI has to be smarter in transforming your data for display
Also, although this is a cost, it has the benefit of forcing you into a mid-set where you are less likely so make simplistic assumptions and make sloppy mistakes. This is a cost that I like to have.
Falacy: Constraints can't be enforced
Constraint - Each package component must be either Hotel, Packing, Flight or Cruise
Method - Have a component_type table, and FK to that table
Constraint - Only one of each type allowed per package
Method - UNIQUE constraint on (package_id, component_type_id)
Constraint - Each component can only be within one package
Method - UNIQUE constraint on (component_id)
Cost - Deferred complexity
In my opinion, the normalised table to map Packages to Components is actually simple and elegant. The next step, is to decide how to store the associated details of a component.
A single global "component" table could hold all the fields, but allow them to be nullable. Thus a HOTEL would have a NULL Flight_Number. But all components would have a Price.
Or you could create an Entity_Attribute_Value table. This can be formed in such a way as to prevent hotels having a flight number...
- component_attributes table = (id, type_id, attribute_id, attribute_value)
- (type_id, attribute_id) can be foreign keyed to allowable combinations
It's impossible (afaik) to enforce REQUIRED fields, such as Price.
The Value is often stored as a VARCHAR.
For that reason, and others, search the data by Value becomes hard.
final opinion
I would not use option 2, as this is highly constrained and merges two considerations together - How to hold data for different component types (hotels, flights, etc) and how to relate them to their parent packages.
I would instead recommend that you consider the multitude of ways for holding the component data, and make that decision based on your needs. Then relate those components to the packages using a 1:many normalised mapping table. Your option 1.
You haven't mentioned in question whether you need to support multiple products of same type inside a single package - i.e. whether package can contain multiple Hotels, for example.
1) If support for multiple same-type products per package is required then you should go first way, but maybe split relationships into separate tables per product type, i.e.
PackageHotelItem
- PackageItemId
- PackageId (fk)
- HotelId (fk)
PackageCruiseItem
- PackageItemId
- PackageId (fk)
- CruiseId (fk)
... etc.
This way you will be able to have referential integrity via normal FK mechanism.
2) If you don't need such support then you may use your second solution.