I'm trying to use foreign keys properly to maintain data integrity. I'm not really a database guy so I'm wondering if there is some general design principle I don't know about. Here's an example of what I'm trying to do:
Say you want to build a database of vehicles with Type (car, truck, etc.), Make, and Model. A user has to input at least the Type, but the Make and Model are optional (if Model is given, then Make is required). My first idea is to set up the database as such:
Type:
-id (PK)
-description
Make:
-id (PK)
-type_id (FK references Type:id)
-description
Model:
-id (PK)
-make_id (FK references Make:id)
-description
Vehicle:
-id (PK)
-type_id (FK references Type:id)
-make_id (FK references Make:id)
-model_id (FK references Model:id)
How would you set up the FKs for Vehicle to ensure that the Type, Make, and Model all match up? For example, how would you prevent a vehicle having (Type:Motorcycle, Make:Ford, Model:Civic)? Each of those would be valid FKs, but they don't maintain the relationships shown through the other tables' FKs.
Also, because Model isn't required, I can't just store the model_id FK and work backwards from it.
I'm not tied to the database design at all, so I'm open to the possibility of having to change the way the tables are set up. Any ideas?
P.S. - I'm using mysql if anyone's interested, but this is more of a general question about databases.
Edit (Clarifications):
-type_id and make_id are needed in the vehicle table unless there is some way to figure those out in the case that model_id is null;
-the relationships between type_id, make_id, and model_id need to be maintained.
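What you are looking for is a CHECK constraint. Unfortunately, MySQL only began enforcing CHECK constraints in version 8.0.16 (older versions parse and ignore them), and even there a CHECK cannot look at rows in other tables, so it cannot compare a vehicle's type_id with its make's type_id. You can emulate the check with triggers, but you need both an INSERT and an UPDATE trigger for it to work.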
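The trigger emulation can be sketched as follows. This is only a sketch under assumptions: it uses Python's built-in sqlite3 module and SQLite trigger syntax as a stand-in (MySQL triggers would use SIGNAL instead of RAISE), and the trigger names, the `accepted` helper and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE type  (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE make  (id INTEGER PRIMARY KEY,
                    type_id INTEGER NOT NULL REFERENCES type(id),
                    description TEXT);
CREATE TABLE model (id INTEGER PRIMARY KEY,
                    make_id INTEGER NOT NULL REFERENCES make(id),
                    description TEXT);
CREATE TABLE vehicle (id INTEGER PRIMARY KEY,
                      type_id  INTEGER NOT NULL REFERENCES type(id),
                      make_id  INTEGER REFERENCES make(id),
                      model_id INTEGER REFERENCES model(id));

-- Reject a make that is not of the given type, or a model not of the given make.
CREATE TRIGGER vehicle_check_insert BEFORE INSERT ON vehicle
BEGIN
    SELECT RAISE(ABORT, 'make does not belong to type')
    WHERE NEW.make_id IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM make
                      WHERE id = NEW.make_id AND type_id = NEW.type_id);
    SELECT RAISE(ABORT, 'model does not belong to make')
    WHERE NEW.model_id IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM model
                      WHERE id = NEW.model_id AND make_id = NEW.make_id);
END;

-- The same checks again for UPDATE, since one trigger covers one event.
CREATE TRIGGER vehicle_check_update BEFORE UPDATE ON vehicle
BEGIN
    SELECT RAISE(ABORT, 'make does not belong to type')
    WHERE NEW.make_id IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM make
                      WHERE id = NEW.make_id AND type_id = NEW.type_id);
    SELECT RAISE(ABORT, 'model does not belong to make')
    WHERE NEW.model_id IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM model
                      WHERE id = NEW.model_id AND make_id = NEW.make_id);
END;

-- Made-up sample data.
INSERT INTO type  VALUES (1, 'Car'), (2, 'Motorcycle');
INSERT INTO make  VALUES (1, 1, 'Ford'), (2, 1, 'Honda');
INSERT INTO model VALUES (1, 2, 'Civic');
""")

INSERT = "INSERT INTO vehicle (type_id, make_id, model_id) VALUES (?, ?, ?)"

def accepted(sql, params=()):
    """Run one statement; return False if the database rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.DatabaseError:
        return False

print(accepted(INSERT, (1, 2, 1)))  # Car / Honda / Civic
print(accepted(INSERT, (2, 1, 1)))  # Motorcycle / Ford / Civic
```

Because a NULL make_id never equals anything, the second check also rejects a model given without a make, which matches the "if Model is given, then Make is required" rule from the question.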
However, as other answers have indicated, all you should really be storing is the vehicle model. In your application you should be drilling down to the type if it's available.
Like this:
Type:
id (PK)
description
Make:
id (PK)
type_id (FK references Type:id, not null)
description
Model:
id (PK)
make_id (FK references Make:id, not null)
description
Vehicle:
id (PK)
model_id (FK references Model:id)
Basically don't double reference make and type from vehicle as well. You'll run into problems if you do that. You can get the make and type from the model of the vehicle (if defined). Model must have make. Make must have type.
Think about that for a second: if vehicle has a given model but vehicle and model both have a make, those values can be different. This kind of inconsistency can develop because of information redundancy. You want to avoid that generally.
If you need to figure out the make and type of a vehicle the SQL starts to look like this:
SELECT v.id, v.model_id, m.make_id, k.type_id
FROM vehicle v
LEFT JOIN model m ON v.model_id = m.id
LEFT JOIN make k ON m.make_id = k.id
LEFT JOIN type t ON k.type_id = t.id
And so on.
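Here is a runnable variant of that query, as a sketch only: it assumes SQLite through Python's sqlite3 module rather than MySQL, uses LEFT JOINs all the way down so that a vehicle without a model still appears, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE type  (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE make  (id INTEGER PRIMARY KEY,
                    type_id INTEGER NOT NULL REFERENCES type(id),
                    description TEXT);
CREATE TABLE model (id INTEGER PRIMARY KEY,
                    make_id INTEGER NOT NULL REFERENCES make(id),
                    description TEXT);
CREATE TABLE vehicle (id INTEGER PRIMARY KEY,
                      model_id INTEGER REFERENCES model(id));

-- Made-up sample data: one vehicle with a model, one without.
INSERT INTO type    VALUES (1, 'Car');
INSERT INTO make    VALUES (1, 1, 'Honda');
INSERT INTO model   VALUES (1, 1, 'Civic');
INSERT INTO vehicle VALUES (1, 1), (2, NULL);
""")

# Walk from vehicle to model to make to type; every step is optional.
rows = conn.execute("""
    SELECT v.id, m.description, k.description, t.description
    FROM vehicle v
    LEFT JOIN model m ON v.model_id = m.id
    LEFT JOIN make  k ON m.make_id = k.id
    LEFT JOIN type  t ON k.type_id = t.id
    ORDER BY v.id
""").fetchall()

for row in rows:
    print(row)
```

The second vehicle comes back with no make and no type at all, which is exactly the gap the question points out: with only model_id stored, a vehicle whose model is unknown cannot record its type.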
Here is one approach:
- One make (Ford, GM, Honda) can have many models, one model belongs to only one make.
- Model is of a certain type (car, truck, bike).
- Vehicle is of a certain model. One vehicle can be of only one model; there can be many vehicles of a model.
Model table contains columns common to all models; while car, truck, and motorcycle have columns specific to each one.
When modeling a DB, consider data, entities and relationships; don't start from the UI -- there is a business layer in between to sort things out. It is OK to use MySQL; you can enforce check and foreign key constraints in your application layer.
Your design is fine for data integrity. It will be the job of your application to ensure that a Vehicle is made up of a Make from its particular Type and a Model of its particular Make.
If you want to maintain vehicle type/make/model integrity in the database, you could add a check constraint to your Vehicle table that makes sure the Vehicle's make's type id equals the provided type id. And if the model id is not null, make sure its make id is the same as the make id provided.
I see you already accepted an answer, but an alternate approach that handles your actual structural problem and doesn't use triggers or check constraints would be to create dummy entries in the Make and Model tables with a description of "n/a" or such, one for each entry in Type and Make respectively, and then get rid of the redundant columns in Vehicle.
That way, if all you know is the Type of a vehicle, you'd find the dummy entry in Make that references the appropriate Type, then find the dummy entry in Model that references that Make, then reference that Model from the new row in Vehicle.
The main downsides of course would be extra housekeeping to create the dummy rows, either ahead of time when adding a Type or Make, or on demand when adding a Vehicle with missing data.
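A rough sketch of the dummy-row housekeeping, assuming SQLite via Python's sqlite3 module; the helper names (`add_type`, `add_make`, `placeholder_model`, `describe`) are hypothetical, and the "n/a" description is used as the marker as suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;
CREATE TABLE type  (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE make  (id INTEGER PRIMARY KEY,
                    type_id INTEGER NOT NULL REFERENCES type(id),
                    description TEXT);
CREATE TABLE model (id INTEGER PRIMARY KEY,
                    make_id INTEGER NOT NULL REFERENCES make(id),
                    description TEXT);
-- No redundant type_id / make_id columns: the model implies both.
CREATE TABLE vehicle (id INTEGER PRIMARY KEY,
                      model_id INTEGER NOT NULL REFERENCES model(id));
""")

PLACEHOLDER = "n/a"

def add_make(type_id, description):
    """Insert a make together with its placeholder model; return the make id."""
    make_id = conn.execute(
        "INSERT INTO make (type_id, description) VALUES (?, ?)",
        (type_id, description)).lastrowid
    conn.execute("INSERT INTO model (make_id, description) VALUES (?, ?)",
                 (make_id, PLACEHOLDER))
    return make_id

def add_type(description):
    """Insert a type together with its placeholder make (and model)."""
    type_id = conn.execute("INSERT INTO type (description) VALUES (?)",
                           (description,)).lastrowid
    add_make(type_id, PLACEHOLDER)
    return type_id

def placeholder_model(type_id, make_id=None):
    """Find the 'n/a' model of a known make, or of the type's 'n/a' make."""
    if make_id is None:
        make_id = conn.execute(
            "SELECT id FROM make WHERE type_id = ? AND description = ?",
            (type_id, PLACEHOLDER)).fetchone()[0]
    return conn.execute(
        "SELECT id FROM model WHERE make_id = ? AND description = ?",
        (make_id, PLACEHOLDER)).fetchone()[0]

def describe(vehicle_id):
    """Return (type, make, model) descriptions for one vehicle."""
    return conn.execute("""
        SELECT t.description, k.description, m.description
        FROM vehicle v
        JOIN model m ON v.model_id = m.id
        JOIN make  k ON m.make_id = k.id
        JOIN type  t ON k.type_id = t.id
        WHERE v.id = ?""", (vehicle_id,)).fetchone()

truck = add_type("Truck")
ford = add_make(truck, "Ford")
type_only = conn.execute("INSERT INTO vehicle (model_id) VALUES (?)",
                         (placeholder_model(truck),)).lastrowid
make_only = conn.execute("INSERT INTO vehicle (model_id) VALUES (?)",
                         (placeholder_model(truck, ford),)).lastrowid
print(describe(type_only), describe(make_only))
```

Here the placeholder rows are created ahead of time, when the Type or Make is added; creating them on demand would move the same inserts into the code that adds a Vehicle.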
I am stuck trying to develop a Bill of Materials in Access. I have a table called IM_Item_Registry where I have the Item_Code and a boolean for whether it's a component. Where I'm stuck is that past sins of the company created several part numbers for the same ingredient from different vendors. A product may use ingredient 1 at the beginning of a run and ingredient 2 at the end of the run depending on inventory, and it may switch from job to job (lack of discipline and random purchasing based on price). It's creating a headache for me because they typically have different inclusions. How would I go about adding in the flexibility to use both? Or would it just be easier to make multiple versions and then select a version upon scheduling?
I know this is loaded and I can include more detail if needed, but I appreciate your help. I've been researching how to do this for a couple of weeks now.
EDIT (3/28/2019)
This is for an injection molding company.
IM_Item_Registry (Fields: Item_Code, Category (raw, manufactured, customer supplied, assembly component), Description, Component (boolean), active (boolean), Unit of Measure).
Bill of materials 100011 produces a component; let's call it a handle. Bill 100011 uses raw resin 700049 at 98% inclusion and raw color 600020 at 2% inclusion. However, we may run out of raw color 600020 and have to run 600051 instead, which would change 700049 to 98.5% inclusion because 600051 requires only 1.5% inclusion to achieve the same color.
I would like to create a table that calls out the general term; let's say 600020 and 600051 are both "yellow color additive". Then I would create a "ghost" number that calls for either 600020 or 600051 and gives both formulation recipes. When production starts, they would scan in which color they actually used to create the production BOM themselves and record which color was used and how much. Is there a way to do this in Access database structuring?
I'm assuming I would need the item_registry table, a BoM table (fields: BOM#, ParentID, Ghost_ID) and then a components table (fields: Ghost_ID, item_code, Inclusion Rate).
Database normalization is the guiding principle for designing efficient, useful tables and relationships in a relational database. Access forms, subforms, reports, etc. require properly normalized tables to work as intended. There are various levels of normalization, but the common idea is to avoid duplication of data between rows and columns of data. Having duplicate data requires a lot of overhead in storage and in ensuring that actions on the database do not create inconsistent states (contradictory data values). Well-normalized tables allow useful constraints to be defined between data columns and/or rows to ensure that data is valid.
The [BoM] table as proposed in the question is not normalized. But before we get to that, the ParentID was not defined and it's not clear what it represents. Instead, to help show why it's not normalized, let me add a [Product] column to the [BoM] table. Then if such a handle has two alternative lists of components (ghosts?), the table would look like
BOMID  Product  GhostID
-----  -------  -------
1      Handle   1
1      Handle   2
See the duplication? And now if the product is renamed, for instance to "Bronze Handle", then both rows need to be updated for a single conceptual element. It also introduces the possibility of having contradictory data like
BOMID  Product        GhostID
-----  -------------  -------
1      Handle         1
1      Bronze Handle  2
Enough said about that, since I've already gone on too much about normalization concepts here. Following is a basic normalized schema which would serve you better, but notice that it's not too different from what you proposed in the question. The only real difference is that the BoM table is normalized by splitting its columns (and purpose) into another table.
I do not list all columns here, only primary and foreign keys and a few other meaningful columns. PK = Primary Key (unique, non-null key), FK = Foreign Key. Proper indices should be defined on the PK and FK columns AND relationships defined with appropriate constraints.
Table: [IM_Item_Registry]
Item_Code (PK)
Table: [BOM]
BOMID (PK)
ProductID (FK)
Table: [BOM_Option]
OptionID (PK)
BOMID (FK)
Primary (boolean) - flags the primary/usual list of components
Description
Table: [Option_Items]
OptionID (FK; part of composite PK)
Item_Code (FK; part of composite PK)
Inclusion_Rate
The [BOM].[ProductID] column alludes to another table with details of the product which should be defined separately from the Bill of Material. If this database really is super-simplistic, then it could just be a string field [Product] containing the name, but I assume there are more useful details to store. Perhaps this is what the ParentID also alluded to? (I suggest choosing names that are not so abstract like "parent" and "ghost", hence my choice of the word "option".)
Really, since [BOM_Option] should be limited to a single primary option per BOM, it would fulfill proper normalization to drop the Primary flag and create another table like
Table: [BOM_Primary]
[BOMID] (FK and PK) - Primary key so only one primary option can be defined at once
[OptionID] (FK)
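To make the schema concrete, here is a sketch using SQLite through Python's sqlite3 module as a stand-in for Access. The item codes and inclusion rates come from the question; the descriptions, the ProductID value and the helper names `recipe_for` and `option_totals` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;
CREATE TABLE IM_Item_Registry (Item_Code INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE BOM (BOMID INTEGER PRIMARY KEY,
                  ProductID INTEGER NOT NULL);  -- would reference a product table
CREATE TABLE BOM_Option (OptionID INTEGER PRIMARY KEY,
                         BOMID INTEGER NOT NULL REFERENCES BOM(BOMID),
                         Description TEXT);
CREATE TABLE Option_Items (
    OptionID  INTEGER NOT NULL REFERENCES BOM_Option(OptionID),
    Item_Code INTEGER NOT NULL REFERENCES IM_Item_Registry(Item_Code),
    Inclusion_Rate REAL NOT NULL,               -- percent
    PRIMARY KEY (OptionID, Item_Code));
-- BOMID as primary key means at most one primary option per BOM.
CREATE TABLE BOM_Primary (
    BOMID INTEGER PRIMARY KEY REFERENCES BOM(BOMID),
    OptionID INTEGER NOT NULL REFERENCES BOM_Option(OptionID));

INSERT INTO IM_Item_Registry VALUES (700049, 'Raw resin'),
                                    (600020, 'Yellow color additive'),
                                    (600051, 'Yellow color additive, alternate');
INSERT INTO BOM VALUES (100011, 1);             -- ProductID 1 is made up
INSERT INTO BOM_Option VALUES (1, 100011, 'Color 600020'),
                              (2, 100011, 'Color 600051');
INSERT INTO Option_Items VALUES (1, 700049, 98.0), (1, 600020, 2.0),
                                (2, 700049, 98.5), (2, 600051, 1.5);
INSERT INTO BOM_Primary VALUES (100011, 1);
""")

def recipe_for(bom_id, scanned_item):
    """Return the full recipe of every option of this BOM that uses the scanned item."""
    return conn.execute("""
        SELECT oi.Item_Code, oi.Inclusion_Rate
        FROM Option_Items oi
        JOIN BOM_Option o ON o.OptionID = oi.OptionID
        WHERE o.BOMID = ?
          AND oi.OptionID IN (SELECT OptionID FROM Option_Items
                              WHERE Item_Code = ?)
        ORDER BY oi.Item_Code""", (bom_id, scanned_item)).fetchall()

def option_totals():
    """Sum the inclusion rates per option as a sanity check."""
    return conn.execute("""
        SELECT OptionID, SUM(Inclusion_Rate)
        FROM Option_Items GROUP BY OptionID ORDER BY OptionID""").fetchall()

print(recipe_for(100011, 600051))
print(option_totals())
```

Scanning a color that appears in only one option picks out that option's recipe. Scanning the resin, which is shared by both options, would return both recipes, so the scan needs to be of the item that distinguishes the options.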
I have the following database schema:
t_class: Stores metadata about a class (primary key: class_ID)
t_students: linked to class by foreign key class_ID
t_exams: Stores metadata about an exam (no foreign keys so far)
t_grades: Links t_exams to t_students, as they have an n-to-m relationship (it has no primary key, but two foreign keys: exam_ID and student_ID). It also has the column grade, which stores the result of each student in an exam.
The problem: It is possible that t_grades has no entries for a particular exam. With the current schema there's no way to get all exams of a class if t_grades has no entry.
My solution:
a) Add a key class_ID to t_exams. Downsides: It is more or less a redundant key and I can't add it as a foreign key (results in MySQL error 1452)
b) Automatically add all students of a class to t_grades and just leave t_grades.grade empty. This feels very redundant as well.
Question: Is there a better way to solve this specific problem or should I stick with one of my solutions?
Create Code:
Sample database from this post
From the actual database I'm using (it's mostly German, unfortunately)
I really don't like the way you've graphed your database design, but I understand your problem.
In my opinion you should connect the t_exams directly to t_classes, since all students in a class should take the exam. So, that's your first solution.
In simple terms: a student belongs to a class, a class can be given an exam, and an exam that a student took can be graded.
This seems like a perfectly logical design to me. I don't get why you cannot implement it. I guess we need to see your CREATE TABLE queries.
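As a sketch of that first solution, assuming SQLite through Python's sqlite3 module in place of MySQL: t_exams gets a class_ID foreign key, and a LEFT JOIN lists every exam of a class whether or not it has grades yet. The name/title/grade columns, the composite primary key on t_grades and the sample rows are assumptions, since the real CREATE TABLE statements are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;
CREATE TABLE t_class    (class_ID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE t_students (student_ID INTEGER PRIMARY KEY,
                         class_ID INTEGER NOT NULL REFERENCES t_class(class_ID),
                         name TEXT);
-- The exam now knows its class directly.
CREATE TABLE t_exams    (exam_ID INTEGER PRIMARY KEY,
                         class_ID INTEGER NOT NULL REFERENCES t_class(class_ID),
                         title TEXT);
CREATE TABLE t_grades   (exam_ID INTEGER NOT NULL REFERENCES t_exams(exam_ID),
                         student_ID INTEGER NOT NULL REFERENCES t_students(student_ID),
                         grade REAL,
                         PRIMARY KEY (exam_ID, student_ID));

-- Made-up sample data: the second exam has no grades yet.
INSERT INTO t_class    VALUES (1, 'Class A');
INSERT INTO t_students VALUES (1, 1, 'Student 1'), (2, 1, 'Student 2');
INSERT INTO t_exams    VALUES (1, 1, 'Algebra'), (2, 1, 'Geometry');
INSERT INTO t_grades   VALUES (1, 1, 2.0);
""")

def exams_of_class(class_id):
    """List (exam_ID, title, number of grades) for every exam of a class."""
    return conn.execute("""
        SELECT e.exam_ID, e.title, COUNT(g.student_ID)
        FROM t_exams e
        LEFT JOIN t_grades g ON g.exam_ID = e.exam_ID
        WHERE e.class_ID = ?
        GROUP BY e.exam_ID, e.title
        ORDER BY e.exam_ID""", (class_id,)).fetchall()

print(exams_of_class(1))
```

On the MySQL side, error 1452 usually means that existing rows in the child table hold values with no matching parent row, so it may be enough to fill t_exams.class_ID with valid class IDs before adding the constraint.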
I agree: My graph is also far from perfect, but just like with yours: I hope you get the idea.
Part of my schema for a travel project has the following tables
Cruises
Flights
Hotels
CarParking
I need a container that wraps one or more of these products into a package. One Cruise/Hotel etc might be part of many packages. I initially thought of
Package
- PackageId
- Etc
PackageItem
- PackageItemId
- PackageId (fk)
- ItemId (fk)
- ItemType
Where ItemType would indicate whether it's a Cruise, Flight, Hotel etc. I suppose I could use Triggers to enforce referential integrity.
My other idea was
Package
- ...
PackageItem
- PackageItemId
- PackageId (fk)
- CruiseId (nullable fk)
- FlightId (nullable fk)
- HotelId (nullable fk)
- CarParkingId (nullable fk)
- etc
I suppose each has its pros and cons, but I can't decide. Which do you think is better, which would you choose if you had to implement something like this?
Database is MySQL. Platform is C# MVC ASP.NET
(I did search and there were a few similar questions but nothing that corresponded all that well)
The first option is the most flexible. And I tend to go with flexibility.
Advantage: Common Queries
If you want a report on cruises, the query is the same as one for hotels, but with a different WHERE clause.
Using the second form you need to join on and select from different tables.
Advantage: Growth without Schema Changes
If you need to add Excursions to your model (something that can certainly have many associated to a single package), you just create a new Excursions type.
Using the second form you need to add new fields to your tables, create new tables to hold the data, and update your queries and logic to use those new tables and fields.
Cost: Data moving to a form not friendly for human digestion
Many people could legitimately say that this shouldn't matter at all. I say that it matters in so far as you have to take account of it...
- It can make debugging harder, so you need to be more regimented and methodical
- It means your GUI has to be smarter in transforming your data for display
Also, although this is a cost, it has the benefit of forcing you into a mindset where you are less likely to make simplistic assumptions and sloppy mistakes. This is a cost that I like to have.
Fallacy: Constraints can't be enforced
Constraint - Each package component must be either Hotel, Parking, Flight or Cruise
Method - Have a component_type table, and FK to that table
Constraint - Only one of each type allowed per package
Method - UNIQUE constraint on (package_id, component_type_id)
Constraint - Each component can only be within one package
Method - UNIQUE constraint on (component_id)
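The first two constraints can be sketched like this, assuming SQLite through Python's sqlite3 module instead of MySQL; the table name `package_component`, the helper `add_component` and the sample IDs are made up. The third constraint is left out here because the question says one cruise or hotel may be part of many packages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;
CREATE TABLE component_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE package_component (
    package_id        INTEGER NOT NULL REFERENCES package(id),
    component_type_id INTEGER NOT NULL REFERENCES component_type(id),
    component_id      INTEGER NOT NULL,  -- id within the type's own table
    UNIQUE (package_id, component_type_id)  -- one of each type per package
);

INSERT INTO component_type VALUES (1, 'Hotel'), (2, 'Parking'),
                                  (3, 'Flight'), (4, 'Cruise');
INSERT INTO package VALUES (1, 'Package A'), (2, 'Package B');
""")

def add_component(package_id, type_id, component_id):
    """Try to attach a component; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO package_component VALUES (?, ?, ?)",
            (package_id, type_id, component_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_component(1, 1, 500))  # first hotel in package 1
print(add_component(1, 1, 501))  # a second hotel in the same package
print(add_component(1, 9, 1))    # a type that is not in component_type
```

What this mapping table cannot check with a plain foreign key is that component_id actually exists in the Hotels or Cruises table; that part is where triggers or a shared component table would come in.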
Cost - Deferred complexity
In my opinion, the normalised table to map Packages to Components is actually simple and elegant. The next step, is to decide how to store the associated details of a component.
A single global "component" table could hold all the fields, but allow them to be nullable. Thus a HOTEL would have a NULL Flight_Number. But all components would have a Price.
Or you could create an Entity_Attribute_Value table. This can be formed in such a way as to prevent hotels having a flight number...
- component_attributes table = (id, type_id, attribute_id, attribute_value)
- (type_id, attribute_id) can be foreign keyed to allowable combinations
It's impossible (afaik) to enforce REQUIRED fields, such as Price.
The Value is often stored as a VARCHAR.
For that reason, and others, searching the data by Value becomes hard.
Final opinion
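A small sketch of such an Entity_Attribute_Value layout, assuming SQLite through Python's sqlite3 module; the `allowed_attribute` table name, the `component_id` column (used here in place of the bare `id`), the helper `set_attribute` and all sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;
CREATE TABLE component_type (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attribute      (id INTEGER PRIMARY KEY, name TEXT);
-- Which attributes each component type may carry.
CREATE TABLE allowed_attribute (
    type_id      INTEGER NOT NULL REFERENCES component_type(id),
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    PRIMARY KEY (type_id, attribute_id));
CREATE TABLE component_attributes (
    component_id    INTEGER NOT NULL,
    type_id         INTEGER NOT NULL,
    attribute_id    INTEGER NOT NULL,
    attribute_value TEXT,            -- everything is stored as text
    PRIMARY KEY (component_id, attribute_id),
    FOREIGN KEY (type_id, attribute_id)
        REFERENCES allowed_attribute(type_id, attribute_id));

INSERT INTO component_type VALUES (1, 'Hotel'), (2, 'Flight');
INSERT INTO attribute      VALUES (1, 'Price'), (2, 'Flight_Number');
INSERT INTO allowed_attribute VALUES (1, 1), (2, 1), (2, 2);
""")

def set_attribute(component_id, type_id, attribute_id, value):
    """Store one value; return False if the type may not carry the attribute."""
    try:
        conn.execute("INSERT INTO component_attributes VALUES (?, ?, ?, ?)",
                     (component_id, type_id, attribute_id, value))
        return True
    except sqlite3.IntegrityError:
        return False

set_attribute(10, 1, 1, "120")       # hotel price
set_attribute(11, 2, 1, "250")       # flight price
set_attribute(11, 2, 2, "XX123")     # flight number
hotel_with_flight_number = set_attribute(10, 1, 2, "XX123")

# Prices above 99: comparing as text goes wrong, casting to a number works.
as_text = conn.execute("""
    SELECT component_id FROM component_attributes
    WHERE attribute_id = 1 AND attribute_value > '99'
    ORDER BY component_id""").fetchall()
as_number = conn.execute("""
    SELECT component_id FROM component_attributes
    WHERE attribute_id = 1 AND CAST(attribute_value AS REAL) > 99
    ORDER BY component_id""").fetchall()
print(hotel_with_flight_number, as_text, as_number)
```

The composite foreign key keeps a hotel from getting a flight number, but nothing here forces every component to have a Price row, and the text comparison shows why searching by value needs casts.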
I would not use option 2, as this is highly constrained and merges two considerations together - How to hold data for different component types (hotels, flights, etc) and how to relate them to their parent packages.
I would instead recommend that you consider the multitude of ways for holding the component data, and make that decision based on your needs. Then relate those components to the packages using a 1:many normalised mapping table. Your option 1.
You haven't mentioned in the question whether you need to support multiple products of the same type inside a single package, i.e. whether a package can contain multiple Hotels, for example.
1) If support for multiple same-type products per package is required, then you should go the first way, but maybe split the relationships into separate tables per product type, i.e.
PackageHotelItem
- PackageItemId
- PackageId (fk)
- HotelId (fk)
PackageCruiseItem
- PackageItemId
- PackageId (fk)
- CruiseId (fk)
... etc.
This way you will be able to have referential integrity via normal FK mechanism.
2) If you don't need such support then you may use your second solution.
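The per-type link tables from point 1 can be sketched as follows, assuming SQLite via Python's sqlite3 module rather than MySQL; the helper names and the sample IDs are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;
CREATE TABLE Package (PackageId INTEGER PRIMARY KEY);
CREATE TABLE Hotels  (HotelId  INTEGER PRIMARY KEY);
CREATE TABLE Cruises (CruiseId INTEGER PRIMARY KEY);
-- One link table per product type, each with ordinary foreign keys.
CREATE TABLE PackageHotelItem (
    PackageItemId INTEGER PRIMARY KEY,
    PackageId INTEGER NOT NULL REFERENCES Package(PackageId),
    HotelId   INTEGER NOT NULL REFERENCES Hotels(HotelId));
CREATE TABLE PackageCruiseItem (
    PackageItemId INTEGER PRIMARY KEY,
    PackageId INTEGER NOT NULL REFERENCES Package(PackageId),
    CruiseId  INTEGER NOT NULL REFERENCES Cruises(CruiseId));

INSERT INTO Package VALUES (1);
INSERT INTO Hotels  VALUES (1), (2);
INSERT INTO Cruises VALUES (7);
""")

def add_hotel(package_id, hotel_id):
    """Attach a hotel to a package; return False if a foreign key fails."""
    try:
        conn.execute("INSERT INTO PackageHotelItem (PackageId, HotelId) "
                     "VALUES (?, ?)", (package_id, hotel_id))
        return True
    except sqlite3.IntegrityError:
        return False

def add_cruise(package_id, cruise_id):
    """Attach a cruise to a package; return False if a foreign key fails."""
    try:
        conn.execute("INSERT INTO PackageCruiseItem (PackageId, CruiseId) "
                     "VALUES (?, ?)", (package_id, cruise_id))
        return True
    except sqlite3.IntegrityError:
        return False

def package_items(package_id):
    """List every item of a package across the per-type link tables."""
    return conn.execute("""
        SELECT 'Hotel' AS kind, HotelId AS item_id
        FROM PackageHotelItem WHERE PackageId = ?
        UNION ALL
        SELECT 'Cruise', CruiseId
        FROM PackageCruiseItem WHERE PackageId = ?
        ORDER BY kind, item_id""", (package_id, package_id)).fetchall()

add_hotel(1, 1)
add_hotel(1, 2)      # a second hotel in the same package is fine here
add_cruise(1, 7)
missing_hotel = add_hotel(1, 99)
print(package_items(1), missing_hotel)
```

The price of the plain foreign keys is the UNION: listing a whole package means one branch per product type, and a new product type means a new link table plus a new branch.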
This is probably a very stupid question, but I am just not sure which solution is the most elegant and the best(most performant) way to go in the following scenario.
I have the following tables:
Customer, Company, Meter, Reading
All of the tables above are supposed to be linked to one or more records of a "Comment" table. Which is the best way to model this relationship?
I am seeing two solutions here:
1.) use m:n relationships: CustomerComment, CompanyComment, etc. -> easy to extend later on, but a lot of new tables
2.) use 1:n relationships: Comment table has a field for the PK of the tables above (Customer_id, Company_id, ...) -> minimal table approach, but "harder" to extend since I would have to add a new field to the comment table whenever there is a new table that needs to have comments
The target is a modular application, which may or may not have any of those four tables.
Which one is the better one - or are there more?
Thanks!
This is the problem with using integers for primary keys. You have a few solutions you can use.
The true unique ID for any given row for Customer, Company, Meter, Reading is a UUID. Maybe because of the database design the primary key has to be an integer, but that is OK. This means you never have to add fields to the COMMENTS table if you have a new type in your system. It will always reference by the type's ID.
Your tables can look like this:
CUSTOMER
ID UUID
COMPANY
ID UUID
METER
ID UUID
COMMENTS
ID
RELATED_TO UUID
COMMENT TEXT
Now you can attach comments to any table that has a unique ID.
If you want to support referential constraints
OBJECT is a table that holds all of the IDs of all the pieces of data you have in your system. We really start building a system in which you can associate any comment with anything you want. This may not be suitable in your design.
OBJECT
ID UUID
CUSTOMER
ID UUID
FOREIGN_KEY (ID) REFERENCES OBJECT(ID) ON DELETE CASCADE
COMPANY
ID UUID
FOREIGN_KEY (ID) REFERENCES OBJECT(ID) ON DELETE CASCADE
METER
ID UUID
FOREIGN_KEY (ID) REFERENCES OBJECT(ID) ON DELETE CASCADE
COMMENTS
ID
RELATED_TO UUID
COMMENT TEXT
FOREIGN_KEY (RELATED_TO) REFERENCES OBJECT(ID) ON DELETE CASCADE
This complicates the design in order to ensure you don't need to add two tables for each new type in the system. Each design has sacrifices. In this one, you've complicated things by requiring that every entry, whether it is a Company, Customer, or Meter, has an associated ID in the Object table so that you can put a foreign key on it.
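A sketch of the OBJECT approach, assuming SQLite through Python's sqlite3 module with UUIDs stored as text; the helper functions, the extra `name`/`serial` columns and the sample values are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;
CREATE TABLE object   (id TEXT PRIMARY KEY);
CREATE TABLE customer (id TEXT PRIMARY KEY
                          REFERENCES object(id) ON DELETE CASCADE,
                       name TEXT);
CREATE TABLE meter    (id TEXT PRIMARY KEY
                          REFERENCES object(id) ON DELETE CASCADE,
                       serial TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY,
                       related_to TEXT NOT NULL
                          REFERENCES object(id) ON DELETE CASCADE,
                       comment TEXT);
""")

def new_object():
    """Register a fresh UUID in the object table and return it."""
    object_id = str(uuid.uuid4())
    conn.execute("INSERT INTO object (id) VALUES (?)", (object_id,))
    return object_id

def add_customer(name):
    object_id = new_object()
    conn.execute("INSERT INTO customer VALUES (?, ?)", (object_id, name))
    return object_id

def add_meter(serial):
    object_id = new_object()
    conn.execute("INSERT INTO meter VALUES (?, ?)", (object_id, serial))
    return object_id

def add_comment(related_to, text):
    """Attach a comment to any registered object; False if the id is unknown."""
    try:
        conn.execute("INSERT INTO comments (related_to, comment) VALUES (?, ?)",
                     (related_to, text))
        return True
    except sqlite3.IntegrityError:
        return False

def comment_count(related_to):
    return conn.execute("SELECT COUNT(*) FROM comments WHERE related_to = ?",
                        (related_to,)).fetchone()[0]

customer = add_customer("Customer A")
meter = add_meter("M-1")
add_comment(customer, "first comment")
add_comment(meter, "needs inspection")
orphan_ok = add_comment(str(uuid.uuid4()), "points at nothing")

# Deleting the object row removes the customer and its comments with it.
conn.execute("DELETE FROM object WHERE id = ?", (customer,))
print(orphan_ok, comment_count(customer), comment_count(meter))
```

A new commentable table only needs its own foreign key to object; the comments table itself never changes.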
I prefer one for each pair - CustomerComment, CompanyComment, etc. It eventually will speed up your queries, and while it isn't as 'extensible' as a single CommentLink table, you'll need to make schema changes when you add something else that needs comments anyway.
I would use separate tables; that way you can keep the referential constraints simple.
Say we have this scenario:
Artist ==< Album ==< Track
//ie, One Artist can have many albums, and one album can have many tracks
In this case, all 3 entities have basically the same fields:
ID
Name
A foreign key for the one-many relationship to the corresponding children (Artist to Album and Album to Track)
A typical solution to the scenario above would be three tables, with the same fields (ArtistID, AlbumID etc...) and foreign key constraints on the one-many relationship field.
But, can we in this case, incorporate a form of inheritance to avoid the repetition of the same field ? I'm talking something of the sort:
Table: EntityType(EntityTypeID, EntityName)
This table would hold 3 entities (1. Artist, 2. Album, 3. Track)
Table: Entities(EntityID, Name, RelField, EntityTypeID)
This table will hold the name of the entity (like the name of
an artist for example), the one-many field (foreign-key
of EntityID) and EntityTypeID holding 1 for Artist, 2 for Album
and so on.
What do you think about the above design? Does it make sense to incorporate "OOP concepts" in this DB scenario?
And finally, would you prefer having the foreign-key constraints of the first scenario or the more generic approach (with the risk of linking an artist with a Track, for example, since there is no check to see that the inputted foreign-key value is really that of an album)?
..btw, come to think of it, I think you can actually check if an inputted value of the RelField of an Artist corresponds to an Album, with triggers maybe?
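For what it's worth, the trigger idea could look roughly like this. It is a sketch under assumptions: SQLite via Python's sqlite3 module, RelField on a child row pointing at its parent, and the type numbering 1 = Artist, 2 = Album, 3 = Track so that a valid parent is always exactly one type lower. The trigger name and the `add_entity` helper are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EntityType (EntityTypeID INTEGER PRIMARY KEY, EntityName TEXT);
INSERT INTO EntityType VALUES (1, 'Artist'), (2, 'Album'), (3, 'Track');

CREATE TABLE Entities (
    EntityID     INTEGER PRIMARY KEY,
    Name         TEXT,
    RelField     INTEGER REFERENCES Entities(EntityID),
    EntityTypeID INTEGER NOT NULL REFERENCES EntityType(EntityTypeID));

-- The parent row must exist and be exactly one level up
-- (Album under Artist, Track under Album). An UPDATE trigger
-- with the same body would be needed as well.
CREATE TRIGGER entities_parent_type BEFORE INSERT ON Entities
WHEN NEW.RelField IS NOT NULL
BEGIN
    SELECT RAISE(ABORT, 'parent has the wrong entity type')
    WHERE NOT EXISTS (SELECT 1 FROM Entities p
                      WHERE p.EntityID = NEW.RelField
                        AND p.EntityTypeID = NEW.EntityTypeID - 1);
END;
""")

ARTIST, ALBUM, TRACK = 1, 2, 3

def add_entity(name, type_id, parent_id=None):
    """Insert an entity; return its id, or None if the trigger rejects it."""
    try:
        return conn.execute(
            "INSERT INTO Entities (Name, RelField, EntityTypeID) "
            "VALUES (?, ?, ?)", (name, parent_id, type_id)).lastrowid
    except sqlite3.DatabaseError:
        return None

artist = add_entity("Some Artist", ARTIST)
album = add_entity("Some Album", ALBUM, artist)
track_under_artist = add_entity("Misplaced Track", TRACK, artist)
track_under_album = add_entity("Some Track", TRACK, album)
print(album, track_under_artist, track_under_album)
```

Compare this with the three-table design, where a plain foreign key from Track to Album gives the same guarantee without any trigger code.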
I have recently seen this very idea of abstraction implemented consistently, and the application and its database became a monster to maintain and troubleshoot. I will stay away from this technique. The simpler, the better, is my mantra.
There's very little chance that the additional fields that will inevitably accumulate on the various entities will be as obliging. Nothing to be gained by not reflecting reality in a reasonably close fashion.
I don't imagine you'd even likely conflate these entities in your regular OO design.
This reminds me (but only slightly) of an attempt I saw once to implement everything in a single table (named "Entity") with another table (named "Attributes") and a junction table between them.
By sticking all three together, you make your queries less readable (unless you then decompose the three categories as views) and you make searching and indexing more difficult.
Plus, at some point you'll want to add attributes to one category, which aren't attributes for the others. Sticking all three together gives you no room for change without ripping out chunks of your system.
Don't get so clever you trip yourself up.
The only advantage I can see to doing it in your OOP way is if there are other element types added in future (i.e., other than artist, album and track). In that case, you wouldn't need a schema change.
However, I'd tend to opt for the non-OOP way and just change the schema in that case. Some problems you have with the OOP solution are:
what if you want to add the birthdate of artist?
what if you want to store duration of albums and tracks?
what if you want to store track type?
Basically, what if you want to store something that's specific only to one or two of the element types?
If you're into this sort of thing, then take a look at table inheritance in PostgreSQL.
create table Artist (id integer not null primary key, name varchar(50));
-- Primary keys are not inherited, so each child declares its own on the inherited id.
create table Album (parent integer references Artist (id), primary key (id)) inherits (Artist);
create table Track (parent integer references Album (id), primary key (id)) inherits (Artist);
I agree with le dorfier, you might get some reuse out of the notion of a base entity (ID, Name) but beyond that point the concepts of Artist, Album, and Track will diverge.
And a more realistic model would probably have to deal with the fact that multiple artists may contribute to a single track on an album...