Database Design For Recording Test Results - mysql

I work at a manufacturing plant where we assemble 10 different products. Each product is similar in function but requires different parameters to be tested. Originally I created an Access database to store our test results for each unit we build. I laid out the database by having one table for each product. This table stores the production ID along with the test parameters (pressures, temperatures, pass/fail information, etc.). I feel like this was a poor way to approach this, but it seemed to be the only way I could use Access's bound forms for easy data entry. My problem is that now whenever I need to add a new test parameter I have to change the table design as well as the forms.
Soon I will have the ability to recreate this system in MySQL and I'm hoping there is a better way to approach storing these test results. Any insight would be very useful.
Thanks.

Look up "database normalization."
At the most extreme, you could split it into 4 tables:
Product_Types: Product type (VARCHAR/CHAR), id (INT)
Products: id/production id (INT), product type (INT, foreign key bound to Product_Types.id)
Test_Parameters: Type (VARCHAR/CHAR - pressure, temp, etc), id (INT)
Test_Scores: Product (INT, foreign key to Products.id), test (INT, foreign key to Test_Parameters.id), score (INT/whatever seems appropriate), timestamp.
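As a hedged illustration, that four-table layout might look like this in MySQL (all table, column, and type choices here are just one possibility, not something the answer prescribes):

CREATE TABLE product_types (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50) NOT NULL UNIQUE          -- 'chair', 'valve', ...
);

CREATE TABLE products (
    id              INT AUTO_INCREMENT PRIMARY KEY,   -- the production id
    product_type_id INT NOT NULL,
    FOREIGN KEY (product_type_id) REFERENCES product_types (id)
);

CREATE TABLE test_parameters (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50) NOT NULL UNIQUE          -- 'pressure', 'temperature', ...
);

CREATE TABLE test_scores (
    product_id  INT NOT NULL,
    test_id     INT NOT NULL,
    score       DECIMAL(10,2),                -- or whatever type fits the result
    recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (product_id) REFERENCES products (id),
    FOREIGN KEY (test_id)    REFERENCES test_parameters (id)
);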
You could theoretically do without the first and third tables and instead just save the names in each record (e.g. a product entry: id = 12345, type = "chair"). It's very slightly faster for retrieval that way, but it's not robust against people misspelling things (select * from products where type="chair" will miss an entry with type="chiar"), and it takes up more storage space since you're saving the textual name over and over again.
Regardless, this is the basic model for a many-to-one relationship, which is what you're looking for: one product, many tests (or, with all four tables, many-to-many: many products, many test types). You need them in separate tables, with each product given an id, and then a foreign key to link each test result to the product it applies to.
Now, let's talk about constraints.
One that I would probably think about throwing on would be a unique key on the test-result table that indexes both the product id and test type, and then be sure to use "ON DUPLICATE KEY UPDATE" so that old values are overwritten by newer ones. That way, you're certain to only ever have one result for each test for each product. If you want to keep old records as well, disregard this paragraph.
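Building on the sketch above (so the table and column names are carried-over assumptions), the unique key plus upsert could look like:

ALTER TABLE test_scores
    ADD UNIQUE KEY uq_product_test (product_id, test_id);

-- A newer result overwrites the older one for the same product/test pair.
INSERT INTO test_scores (product_id, test_id, score)
VALUES (12345, 3, 98.6)
ON DUPLICATE KEY UPDATE score = VALUES(score),
                        recorded_at = CURRENT_TIMESTAMP;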
The one thing you will definitely lose is the ability to require that all tests are done for a given product. That much will have to be done outside of the database. If you want to require that all the columns are filled in for every single product, then you have to do it pretty much the way you've been doing it (one column for each test in a colossal unified table with NOT NULL constraints on every test column), because now the test results and object id are functionally dependent on each other (neither can exist without the other).

I would use (at least) the following tables:
Product
Id, Name, TestSchedule
Analysis (e.g. measurement of temperature under normal operating parameters, with a 1 Kelvin fault tolerance)
Id, Name, Description, Instruction
Test (e.g. temperature measurement in product p; expected result is 300-360 Kelvin)
Id, ProductId, AnalysisId, LowerLimit, UpperLimit
TestResult (the test result for batch X, e.g. 342 Kelvin, pass)
Id, BatchId, TestId, Result, Status (pass/fail)
The reason for having both an Analysis table and a Test table is normalisation. The analysis is generic, specifying a method. The test specifies acceptable limits when the analysis is performed on a particular product.
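One possible MySQL rendering of those tables (column types are assumptions; the batch table is not shown):

CREATE TABLE product (
    id            INT AUTO_INCREMENT PRIMARY KEY,
    name          VARCHAR(100) NOT NULL,
    test_schedule VARCHAR(100)
);

CREATE TABLE analysis (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    description TEXT,
    instruction TEXT
);

CREATE TABLE test (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    product_id  INT NOT NULL,
    analysis_id INT NOT NULL,
    lower_limit DECIMAL(10,2),                 -- e.g. 300 (Kelvin)
    upper_limit DECIMAL(10,2),                 -- e.g. 360 (Kelvin)
    FOREIGN KEY (product_id)  REFERENCES product (id),
    FOREIGN KEY (analysis_id) REFERENCES analysis (id)
);

CREATE TABLE test_result (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    batch_id INT NOT NULL,                     -- FK to a batch table, not shown
    test_id  INT NOT NULL,
    result   DECIMAL(10,2),                    -- e.g. 342 (Kelvin)
    status   ENUM('pass','fail'),
    FOREIGN KEY (test_id) REFERENCES test (id)
);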

I think you are looking at needing to use a many-to-many table.
So: one table that stores your products, one that stores each unique test, and then a third M2M table that links product A to however many tests it needs. Your M2M table could also store (generically) your test results.

Create a table of products with a unique ID per product. Then create a table of tests with a unique test ID and a column for the applicable product(s). Join these to find which tests apply to which products. You can add new tests at any point.
Further you could have a 'test version' column if you want to store test history, results, etc.

Related

How to structure a Bill of Materials that has multiple options

I am stuck trying to develop a Bill of Materials in Access. I have a table called IM_Item_Registry where I have the Item_Code and a boolean for whether it's a component. Where I'm stuck is that past sins of the company created several part numbers for the same ingredient from different vendors. A product may use ingredient 1 at the beginning of a run and ingredient 2 at the end of a run, depending on inventory, and it may switch from job to job (lack of discipline and random purchasing based on price). It's creating a headache for me because they typically have different inclusions. How would I go about adding in the flexibility to use both? Or would it just be easier to make multiple versions and then select those versions upon scheduling?
I know this is loaded, and I can include more detail if needed, but I appreciate your help; I've been researching how to do this for a couple of weeks now.
EDIT (3/28/2019)
This is for an injection molding company.
IM_Item_Registry (fields: Item_Code, Category (raw, manufactured, customer supplied, assembly component), Description, Component (boolean), Active (boolean), Unit of Measure).
For this bill of materials, 100011 produces a component; let's call it a handle. BOM 100011 uses raw resin 700049 at 98% inclusion and raw color 600020 at 2% inclusion. However, we may run out of raw color 600020 and have to run it with 600051, which would change 700049 to 98.5% inclusion because 600051 requires 1.5% inclusion to achieve the same color.
I would like to create a table that would call out a general term; let's say 600020 and 600051 are both "yellow color additive". Then create a "ghost" number that calls for either 600020 or 600051 and gives both formulation recipes. When production starts, they would scan in which color they actually used, creating the production BOM themselves and recording which color was used and how much. Is there a way to do this with Access database structuring?
I'm assuming I would need both the item_registry table, a BoM table (fields: BOM#, ParentID, Ghost_ID), and then a components table (fields: Ghost_ID, Item_Code, Inclusion_Rate).
Database normalization is the guiding principle for designing efficient, useful tables and relationships in a relational database. Access forms, subforms, reports, etc. require properly normalized tables to work as intended. There are various levels of normalization, but the common idea is to avoid duplication of data between rows and columns of data. Having duplicate data requires a lot of overhead in storage and in ensuring that actions on the database do not create inconsistent states (contradictory data values). Well-normalized tables allow useful constraints to be defined between data columns and/or rows to ensure that data is valid.
The [BoM] table as proposed in the question is not normalized. But before we get to that, the ParentID was not defined and it's not clear what it represents. Instead, to help show why it's not normalized, let me add a [Product] column to the [BoM] table. Then if such a handle has two alternative lists of components (ghosts?), the table would look like
BOMID  Product  GhostID
-----  -------  -------
1      Handle   1
1      Handle   2
See the duplication? And now if the product is renamed, for instance to "Bronze Handle", then both rows need to be updated for a single conceptual element. It also introduces the possibility of having contradictory data like
BOMID  Product        GhostID
-----  -------------  -------
1      Handle         1
1      Bronze Handle  2
Enough said about that, since I've already gone on too much about normalization concepts here. Following is a basic normalized schema which would serve you better, but notice that it's not much different from what you proposed in the question. The only real difference is that the BoM table is normalized by splitting its columns (and purpose) into another table.
I do not list all columns here, only primary and foreign keys and a few other meaningful columns. PK = Primary Key (unique, non-null key), FK = Foreign Key. Proper indices should be defined on the PK and FK columns AND relationships defined with appropriate constraints.
Table: [IM_Item_Registry]
Item_Code (PK)
Table: [BOM]
BOMID (PK)
ProductID (FK)
Table: [BOM_Option]
OptionID (PK)
BOMID (FK)
Primary (boolean) - flags the primary/usual list of components
Description
Table: [Option_Items]
OptionID (FK; part of composite PK)
Item_Code (FK; part of composite PK)
Inclusion_Rate
The [BOM].[ProductID] column alludes to another table with details of the product which should be defined separately from the Bill of Material. If this database really is super-simplistic, then it could just be a string field [Product] containing the name, but I assume there are more useful details to store. Perhaps this is what the ParentID also alluded to? (I suggest choosing names that are not so abstract like "parent" and "ghost", hence my choice of the word "option".)
Really, since only one option per BOM should be flagged as primary, it would better fulfill normalization to create another table like
Table: [BOM_Primary]
[BOMID] (FK and PK) - Primary key so only one primary option can be defined at once
[OptionID] (FK)
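Expressed as generic SQL DDL (in Access you would build these in the table designer and the Relationships window; the column types here are assumptions):

CREATE TABLE BOM (
    BOMID     INT PRIMARY KEY,
    ProductID INT NOT NULL          -- FK to a product table, defined separately
);

CREATE TABLE BOM_Option (
    OptionID    INT PRIMARY KEY,
    BOMID       INT NOT NULL,
    Description VARCHAR(100),
    FOREIGN KEY (BOMID) REFERENCES BOM (BOMID)
);

CREATE TABLE Option_Items (
    OptionID       INT NOT NULL,
    Item_Code      INT NOT NULL,    -- FK to IM_Item_Registry
    Inclusion_Rate DECIMAL(5,2),    -- e.g. 98.00 (%)
    PRIMARY KEY (OptionID, Item_Code),
    FOREIGN KEY (OptionID) REFERENCES BOM_Option (OptionID)
);

-- Replaces the Primary flag on BOM_Option, per the refinement above:
CREATE TABLE BOM_Primary (
    BOMID    INT PRIMARY KEY,       -- PK, so only one primary option per BOM
    OptionID INT NOT NULL,
    FOREIGN KEY (BOMID)    REFERENCES BOM (BOMID),
    FOREIGN KEY (OptionID) REFERENCES BOM_Option (OptionID)
);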

how can I constrain a foreign key relationship where it may refer to multiple other tables?

I have a table called inventory_movements, and I'm planning to save the product movements in and out of the warehouse. It has fields like
1- movement_id(PK)
2- product_id(FK)
3- quantity int
4- unit_price decimal
5- movement ENUM('in','out')
6- date datetime
7- ????????? (reference) (e.g. sell (out), purchase (in), fire loss (out), sales return (in), purchase return (out))
My problem is that I want to store the reference for the movement (the cause of the movement), whether it is an order ID, a purchase ID, a purchase return ID, etc.,
but I also want to put a constraint on this field to make sure that no invalid data (e.g. a nonexistent purchase) is stored in the database. Of course, I can't make one foreign key reference many tables (sales, purchases, purchase returns, etc.).
A very bad solution is to add a column for every reference type (sell ID, purchase ID, sales return ID, etc.), fill in the right one for each movement, and leave the others null, but this is of course against normalization and I couldn't add any more reference types later.
What can I do in this situation?
Please consider that I'm very much a newbie. Thanks.
You have a few approaches. One is to have one foreign key per table type with a constraint that ensures that exactly one is not null. I agree that is clunky but some people prefer it (David Fetter, for example, has blogged about the benefits of this approach).
Another approach is to factor out the common parts of the referenced tables into a single, easily referenced table. If you cannot do this, you can have a trigger-maintained table instead. That would mean something like:
A transaction documents table
A table for sales/purchase data (or maybe different tables for this).
If that cannot be done, then you can have another table which just stores the IDs, the relevant table for each, and a reference ID, maintained with a trigger; then you put your referring constraint on that.
Either way, long-run you are probably going to end up with the second solution (a master transaction journal, and then other tables that extend it).
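A hedged sketch of that master-journal approach (every name here is an assumption): each sale, purchase, return, etc. first gets a row in a common documents table, the detail tables reference it, and inventory_movements then needs only a single foreign key.

CREATE TABLE documents (
    document_id INT AUTO_INCREMENT PRIMARY KEY,
    doc_type    ENUM('sale','purchase','sales_return',
                     'purchase_return','fire_loss') NOT NULL,
    created_at  DATETIME NOT NULL
);

CREATE TABLE sales (
    sale_id     INT AUTO_INCREMENT PRIMARY KEY,
    document_id INT NOT NULL UNIQUE,            -- one document per sale
    FOREIGN KEY (document_id) REFERENCES documents (document_id)
    -- sale-specific columns go here
);

-- The reference column on the movements table now points at one table only:
ALTER TABLE inventory_movements
    ADD COLUMN reference_id INT NOT NULL,
    ADD FOREIGN KEY (reference_id) REFERENCES documents (document_id);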
(Original design-question answer below.)
Depending on how you want to address this I can see one of two ways of doing it.
The first is to use a basic convention of positive numbers coming in and negative numbers going out. This works for global movements (purchases and sales) but it breaks down for local movements (moving between warehouses).
One option here is to have a separate "states" table which represents both global and local states: for example, purchases, sales, different warehouses, etc. Then you represent the transfer as a graph link between the states. You can also have a documents table which can represent purchases and sales, with appropriate classifications etc. This allows a three-way relationship between an in-state, an out-state, and a document. For example, a sale could have an in-state of inventory (or a particular warehouse), an out-state of sale, and a document of the sales invoice.
Of course you can do both, storing global inventory in one way and warehouse movements in the other.

DB design for one-to-one single column table

I'm unsure the best route to take for this example:
A table that holds information for a job; salary, dates of employment etc. The field I am wondering how best to store is 'job_title'.
Job title is going to be used as part of an auto-complete field, so I'll be using a query to fetch results.
The same job title will be used by multiple jobs in the DB.
Job title is going to be a large part of many queries in the application.
A single job only ever has one title.
1. Should I have two tables, job and job_title, with the job table referencing the job_title table for its name?
2. Should I have two tables, job and job_title, but store the title as a direct value in job, with job_title just storing a list of all preexisting values (somewhat redundant)?
3. Or should I not use a reference table at all / other suggestion?
What is your choice of design in this situation, and how would it change in a one to many design?
This is an example, the actual design is much larger however I think this well conveys the issue.
Update, to clarify:
A User (outside the scope of the question) has many Jobs; a Job (start/end date, {job title}) has one Title; a Title has a name (e.g. 'Web Developer').
Your option 1 is the best design choice. Create the two tables along these lines:
jobs (job_id PK, title_id FK not null, start_date, end_date, ...)
job_titles (title_id PK, title)
The PKs should have clustered indexes; jobs.title_id and job_titles.title should have nonclustered or secondary indexes; job_titles.title should have a unique constraint.
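A minimal MySQL sketch of those two tables (types are assumptions; in InnoDB the PK is the clustered index, and the FOREIGN KEY declaration creates the secondary index on title_id automatically):

CREATE TABLE job_titles (
    title_id INT AUTO_INCREMENT PRIMARY KEY,    -- clustered in InnoDB
    title    VARCHAR(100) NOT NULL UNIQUE       -- unique constraint + index
);

CREATE TABLE jobs (
    job_id     INT AUTO_INCREMENT PRIMARY KEY,
    title_id   INT NOT NULL,                    -- every job must have a title
    start_date DATE,
    end_date   DATE,
    FOREIGN KEY (title_id) REFERENCES job_titles (title_id)
);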
This relationship can be modeled as 1-to-1 or 1-to-many (one title, many jobs). To enforce 1-to-1 modeling, apply a unique constraint to jobs.title_id. However, you should not model this as a 1-to-1 relationship, because it's not. You even say so yourself: "The same job title will be used by multiple jobs in the DB" and "A single job only ever has one title." An entry in the jobs table represents a certain position held by a certain user during a certain period of time. Because this is a 1-to-many relationship, a separate table is the correct way to model the data.
Here's a simple example of why this is so. Your company only has one CEO, but what happens if the current one steps down and the board appoints a new one? You'll have two entries in jobs which both reference the same title, even though there's only one CEO "position" and the two users' job date ranges don't overlap. If you enforce a 1-to-1 relationship, modeling this data is impossible.
Why these particular indexes and constraints?
The ID columns are PKs and clustered indexes for hopefully obvious reasons; you use these for joins
jobs.title_id is an FK for hopefully obvious data integrity reasons
jobs.title_id is not null because every job should have a title
jobs.title_id needs an index in order to speed up joins
job_titles.title has an index because you've indicated you'll be querying based on this column (though I wouldn't query in such a fashion, especially since you've said there will be many titles; see below)
job_titles.title has a unique constraint because there's no reason to have duplicates of the same title. You can (and will) have multiple jobs with the same title, but you don't need two entries for "CEO" in job_titles. Enforcing this uniqueness will preserve data integrity useful for reporting purposes (e.g. plot the productivity of IT's web division based on how many "web developer" jobs are filled)
Remarks:
Job title is going to be used as part of an auto-complete field so I'll be using a query to fetch results.
As I mentioned before, use key-value pairs here. Fetch a list of them into memory in your app, and query that list for your autocomplete values. Then send the ID off to the DB for your actual SQL query. The queries will perform better that way; even with indexes, searching integers is generally quicker than searching strings.
You've said that titles will be user created. Put some input sanitation and validation process in place, because you don't want redundant entries like "WEB DEVELOPER", "Web Developer", "web developer", etc. Validation should occur at both the application and DB levels; the unique constraint is part (but not all) of this. Prodigitalson's remark about separate machine and display columns is related to this issue.
Edited: after getting the clarification
A table like this is enough - just add the job_title_id column as foreign key in the main member table
---- "job_title" table ---- (store the job_title)
1. pk - job_title_id
2. unique - job_title_name <- index this
__ original answer __
You need to clarify what the job_title is going to represent:
a person that holds this position?
the division/department that has this position?
a certain set of attributes? (e.g. Sales always has a commission)
or just a string for what it is called?
From what I read so far, you just need the "job_title" as some sort of dimension - make the id for it, make the string searchable - and that's it
example
---- "employee" table ---- (store employee info)
1. pk - employee_id
2. fk - job_title_id
3. other attribute (contract_start_date, salary, sex, ... so on ...)
---- "job_title" table ---- (store the job_title)
1. pk - job_title_id
2. unique - job_title_name <- index this
---- "employee_job_title_history" table ---- (We can check the employee job history here)
1. pk - employee_id
2. pk - job_title_id
3. pk - is_effective
4. effective_date [edited: this needs to be a PK too - thanks to KM.]
I still think you need to provide us a use-case - that will greatly improve both of our understanding I believe
If there are only a few fixed job titles, you might want to use an ENUM in your database.
See http://dev.mysql.com/doc/refman/5.0/en/enum.html
If that's not supported by your version of MySQL, simply encode it with a numerical index and resolve it to a human-readable form in your queries.
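A quick sketch of the ENUM route (the title list here is made up; this only works if the set of titles really is small and fixed):

CREATE TABLE jobs_enum_example (
    job_id    INT AUTO_INCREMENT PRIMARY KEY,
    job_title ENUM('Web Developer','DBA','Sysadmin') NOT NULL
);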

DB design to store different products for each customer order

I'm building a simple way to insert customer orders into the db.
We have several products, each one needs different properties.
I've started designing the following tables:
CUSTOMER -> Order (FK to CUSTOMER) -> OrderItem (FK to Order)
Now I'm thinking about how I could link product-specific tables to OrderItem.
Suppose I've two products: product1 (room_name, width, height, color) and product2 (number, width, height, type, optionals). I'd create two different tables and link them with the OrderItem, to get specific options, am I wrong? (of course there will be more than just two products)
How can I do this?
I'd have one Product table with a one-to-many relationship between OrderItem and Product. Put a FOREIGN KEY in the OrderItem table that points to its associated Product.
A design like yours would mean you'd have to add a table every time there was a new product. That would not do. You want to add products by inserting new rows.
No approach can resolve all of the issues you may be dealing with, the choice you make depends on which factor is most important to you.
Most people shy away from having multiple tables. One reason is that you don't know how many tables you may end up with in the future. Another is that your queries may bloat from having to join to multiple tables. It may also become a maintenance headache, with multiple queries to update every time you add a table. Finally, adding a table is not even remotely as friendly as adding a record (do you really want your app to be able to create tables?).
One option is just to add more and more fields to the Product table. By making the property fields NULLable, different products can use different fields.
But... you may then need to add logic to ensure that ProductX always has a value in FieldA, while ProductY always has a value in FieldB, etc., and probably some metadata about each product type so that your application knows which fields to use for which products. You may still need to add new fields, which is possibly tidier than adding new tables, but is still probably not something you want the application doing.
An option that totally avoids using DDL to add a product is to further normalise your data, and have the product-specific-properties in an Entity-Attribute-Value table. This is initially very attractive to many people as it is so generic and flexible.
Product(id, name, another-global-property, etc)
Product_Properties(product_id, property_id, property_value)
You'll probably have some meta-data and extra logic to ensure all the correct properties are used. But now you just add records to a generic structure whenever you create a new product.
But what type should "property_value" be? It may need to hold strings, dates, numbers, anything. You could make it a string and use the metadata to know how to CAST the value. Or you may have several value fields, one of each type, and a "field_type_id" or something to indicate which value field should be read from.
It's also less friendly for certain searches. If you know a product_id, finding the properties is easy. If you want all products where the expiry date is in the past, you need to be careful about how you structure the data and indexes to make the query efficient. But if you want (expiry < today AND cost > 50) then you get a much different query from what you are used to - Each value is in a different ROW instead of a different FIELD.
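To make that last point concrete, here is a hedged sketch of the (expiry < today AND cost > 50) search against the EAV tables above; the property_id values and the string storage for property_value are assumptions:

-- Each property lives in its own ROW, so an AND across two properties
-- becomes a self-join on Product_Properties.
SELECT p.id, p.name
FROM   Product p
JOIN   Product_Properties expiry
         ON expiry.product_id = p.id AND expiry.property_id = 1  -- 'expiry'
JOIN   Product_Properties cost
         ON cost.product_id = p.id AND cost.property_id = 2      -- 'cost'
WHERE  CAST(expiry.property_value AS DATE) < CURDATE()
AND    CAST(cost.property_value AS DECIMAL(10,2)) > 50;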
Search performance really does degrade as query complexity increases and design considerations become more technical.
Which way you go depends on application functional requirement, architecture and design decisions, and a good helpful dash of 'taste'.
You have tagged the question as django, so you should read this recent post:
Coding an inventory system, with polymorphic items and manageable item types
In this post #ThibaultJ explains how to accomplish this with Django model utils.
The idea is that you have a 'product' model and you inherit product1 and product2 from this model, adding specific information for both. #ThibaultJ has posted interesting samples.
I will notify #ThibaultJ about this question. If #ThibaultJ writes an answer I will remove my post.
Here are some options
IMHO I would choose an inheritance pattern, i.e. a new table called "ProductBase" with a unique surrogate key (sketched at the end of this answer). ProductBase would have a classification, e.g. "ProductType", which would then allow you to join into the appropriate 'subclass' Product table. OrderItem would reference just the surrogate. Referential integrity is enforceable, and it gives the opportunity to extend to additional forms of products. It does, however, require the use of a common unique surrogate amongst all Product table types. If there are other tables (other than OrderItem) referencing Product, it would also avoid having to FK to composite keys.
Nullable Foreign Keys in OrderItem, i.e. OrderItem would have nullable FK to both (all) types of Product Tables, although only one of them would be present on each row.
Inner joining OrderItem to the appropriate Product tables would eliminate the 'wrong' product joins based on the NULLs. RI can still be enforced.
If you have the SAME type of Primary Key on all your Product subclass tables, then you could also add a single Product "Foreign" Key and a "ProductType" "Switch" on OrderItem. The problem here is that you can't enforce RI.
That said, I really wouldn't be creating a new table for each and every product - surely there are some broad 'categories' of Product which can be modelled in a uniform manner.
No doubt if you sell Aircraft and Groceries that you would probably need a AircraftProduct and a GroceryProduct, but surely A300, Boeing 747 and Cessna Skyhawk would fit as rows inside AircraftProduct, even if there are a few 'optional' nullable fields in each table not applicable to all products in this 'category'?
Edit : First see Dems and Duffmo's posts to see if you can avoid the requirement for having multiple Product tables at all, by using EAV / Multivalue / Metadata patterns to model Product.
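For illustration, a hedged MySQL sketch of the inheritance pattern from the first option (product1's columns come from the question; everything else is an assumption):

CREATE TABLE ProductBase (
    product_id   INT AUTO_INCREMENT PRIMARY KEY,   -- the common surrogate
    product_type ENUM('product1','product2') NOT NULL
);

CREATE TABLE Product1 (
    product_id INT PRIMARY KEY,        -- same value as its ProductBase row
    room_name  VARCHAR(50),
    width      DECIMAL(8,2),
    height     DECIMAL(8,2),
    color      VARCHAR(30),
    FOREIGN KEY (product_id) REFERENCES ProductBase (product_id)
);

CREATE TABLE OrderItem (
    order_item_id INT AUTO_INCREMENT PRIMARY KEY,
    order_id      INT NOT NULL,
    product_id    INT NOT NULL,        -- references the surrogate only
    FOREIGN KEY (product_id) REFERENCES ProductBase (product_id)
);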

Different database tables joining on single table

So imagine you have multiple tables in your database, each with its own structure and each with a PRIMARY KEY of its own.
Now you want to have a Favorites table so that users can add items as favorites. Since there are multiple tables the first thing that comes in mind is to create one Favorites table per table:
Say you have a table called Posts with PRIMARY KEY (post_id) and you create a Post_Favorites with PRIMARY KEY (user_id, post_id)
This would probably be the simplest solution, but could it be possible to have one Favorites table joining across multiple tables?
I've though of the following as a possible solution:
Create a new table called Master with primary key (master_id). Add insert triggers on all tables in your database to generate a new master_id and write it along with the row in your table. Let's also say that we record in the Master table where each master_id has been used (i.e. in which table).
Now you can have one Favorites table with PRIMARY KEY (user_id, master_id)
You can select from the Favorites table and join with each individual table on the master_id to get the favorites per table. But would it be possible to get all the favorites with one query (maybe not a query, but a stored procedure)?
Do you think that this is a stupid approach? Since you will perform one query per table what are you gaining by having a single table?
What are your thoughts on the matter?
One way would be to sub-type all possible tables to a generic super-type (Entity) and then link user preferences to that super-type. For example:
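(The original example was presumably a diagram; the following is a hedged MySQL sketch of the same idea, with the concrete item tables assumed:)

CREATE TABLE Entity (
    entity_id   INT AUTO_INCREMENT PRIMARY KEY,
    entity_type ENUM('post','photo','comment') NOT NULL
);

CREATE TABLE Posts (
    post_id INT PRIMARY KEY,           -- same value as its Entity row
    title   VARCHAR(200),
    FOREIGN KEY (post_id) REFERENCES Entity (entity_id)
);

CREATE TABLE Favorites (
    user_id   INT NOT NULL,
    entity_id INT NOT NULL,            -- works for posts, photos, comments...
    PRIMARY KEY (user_id, entity_id),
    FOREIGN KEY (entity_id) REFERENCES Entity (entity_id)
);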
I think you're on the right track, but a table-based inheritance approach would be great here:
Create a table master_ids, with just one column: an int-identity primary key field called master_id.
On your other tables, (users as an example), change the user_id column from being an int-identity primary key to being just an int primary key. Next, make user_id a foreign key to master_ids.master_id.
This largely preserves data integrity. The only place you can trip up is if you have a master_id = 1 with both a user_id = 1 and a post_id = 1. For a given master_id, you should have only one entry across all tables. In this scenario you have no way of knowing whether master_id 1 refers to the user or to the post. A way to make sure this doesn't happen is to add a second column to the master_ids table, a type_id column. type_id 1 can refer to users, type_id 2 can refer to posts, etc. Then you are pretty much good.
Code "gymnastics" may be a bit necessary for inserts. If you're using a good ORM, it shouldn't be a problem. If not, stored procs for inserts are the way to go. But you're having your cake and eating it too.
I'm not sure I really understand the alternative you propose.
But in general, when given the choice of 1) "more tables" or 2) "a mega-table supported by a bunch of fancy code work", your interests are best served by more tables without the code gymnastics.
A red flag was "Add triggers on all tables in your database": each trigger fire is a performance hit of its own.
The database designers have built in all kinds of technology to optimize tables/indexes, much of it behind the scenes without you knowing it. Just sit back and enjoy the ride.
Try these for inspiration: Database Answers (no affiliation to me).
An alternative to your approach might be to have the favorites table as user_id, object_id, object_type. When inserting into the favorites table, just insert the type of the favorite. However, I don't see a simple query being able to work with your approach or mine. One way to go about it might be to use UNION to get one combined result set and then identify what type of record each row is based on the type. Another thing you can do is turn the UNION query into a MySQL VIEW and simply query that VIEW.
The benefit of using a single table for favorites is simplicity, which some might consider to be against the database normalization rules. But on the upside, you don't have to create so many favorites tables, and you can add anything to favorites easily just by coming up with a new object_type identifier.
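A hedged sketch of that VIEW idea, assuming two favoritable tables called posts and photos:

CREATE VIEW favoritable_objects AS
    SELECT post_id  AS object_id, 'post'  AS object_type, title   AS name FROM posts
    UNION ALL
    SELECT photo_id AS object_id, 'photo' AS object_type, caption AS name FROM photos;

-- One query fetches a user's favorites of every type:
SELECT f.user_id, o.object_type, o.name
FROM   favorites f
JOIN   favoritable_objects o
       ON o.object_id = f.object_id AND o.object_type = f.object_type;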
It sounds like you have an is-a relationship that needs to be modeled. All of the items that can be favourited are a type of "item". It sounds like you are on the right track, but I wouldn't use triggers. What could be the right answer, if I have understood correctly, is to pull all the common fields into a single table called items (master is a poor name; master of what?). This should include all the common data that would be needed when you fetch a user's favourite items; I'd expect this to include fields like item_id (primary key), item_type and human_readable_name, and maybe some metadata about when the item was created, modified, etc.

Each of your specific item types would have its own table containing data specific to that item type, with an item_id field that has a foreign key relationship to the items table. Then you'd wrap each item type in its own insertion, update and selection SPs (i.e. InsertItemCheese, UpdateItemMonkey, SelectItemCarKeys). The favourites table would then work as you describe, but you only need to select from the items table. If your app needs the specific data for each item type, it would have to be queried for each item (caching is your friend here).
If MySQL supports SPs with multiple result sets you could write one that outputs all the items as a result set, then a result set for each item type if you need all the specific item data in one go. For most cases I would not expect you to need all the data all the time.
Keep in mind that not EVERY use of a PK column needs a constraint. For example a logging table. Even though a logging table has a copy of the PK column from the table being logged, you can't build a constraint.
What would be the worst possible case? You insert a record for Oprah's TV show into the favorites table, and then next year you delete the Oprah Show from the list of TV shows but don't delete that ID from the Favorites table. Will that break anything? Probably not. When you join favorites to TV shows, that record will simply fall out of the result set.
There are a couple of ways to share values for PKs. Oracle has the advantage of sequences. If you don't have those, you can add a "step" to your autonumber fields. There's always a risk, though.
Say you think you'll never have more than 10 tables of "things which could be favored". Then start your PKs at 0 for the first table and increment by 10, at 1 for the second table and increment by 10, at 2 for the third... and so on. That will guarantee that all the values are unique across those 10 tables. The risk is that a future requirement will add table 11. You can always 'pad' your guesstimate.
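In MySQL specifically, the closest built-in analogue to this "step" trick is the pair of auto-increment variables below; note that they apply per session/server rather than per table (they exist mainly for multi-master replication), so treat this as a rough sketch of the idea only, with a made-up table name:

SET SESSION auto_increment_increment = 10;  -- the step size
SET SESSION auto_increment_offset    = 1;   -- this "lane": ids 1, 11, 21, ...
INSERT INTO tv_shows (name) VALUES ('The Oprah Show');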