DB design for one-to-one single column table - mysql

I'm unsure the best route to take for this example:
A table that holds information for a job; salary, dates of employment etc. The field I am wondering how best to store is 'job_title'.
Job title is going to be used as part of an auto-complete field so
I'll be using a query to fetch results.
The same job title will be used by multiple jobs in the DB.
Job title is going to be a large part of many queries in the
application.
A single job only ever has one title.
1. Should I have 2 tables, job and job_title, with the job table referencing the job_title table for its name?
2. Should I have 2 tables, job and job_title, but store the title as a direct value in job, with job_title just storing a list of all preexisting values (somewhat redundant)?
3. Or should I not use a reference table at all? Other suggestions are welcome.
What is your choice of design in this situation, and how would it change in a one to many design?
This is an example; the actual design is much larger, but I think this conveys the issue well.
Update, to clarify:
A User (outside the scope of the question) has many Jobs; a Job (start/end date, job title) has one title; a title has a name (e.g. 'Web Developer').

Your option 1 is the best design choice. Create the two tables along these lines:
jobs (job_id PK, title_id FK not null, start_date, end_date, ...)
job_titles (title_id PK, title)
The PKs should have clustered indexes; jobs.title_id and job_titles.title should have nonclustered or secondary indexes; job_titles.title should have a unique constraint.
This relationship can be modeled as 1-to-1 or 1-to-many (one title, many jobs). To enforce 1-to-1 modeling, apply a unique constraint to jobs.title_id. However, you should not model this as a 1-to-1 relationship, because it's not. You even say so yourself: "The same job title will be used by multiple jobs in the DB" and "A single job only ever has one title." An entry in the jobs table represents a certain position held by a certain user during a certain period of time. Because this is a 1-to-many relationship, a separate table is the correct way to model the data.
Here's a simple example of why this is so. Your company only has one CEO, but what happens if the current one steps down and the board appoints a new one? You'll have two entries in jobs which both reference the same title, even though there's only one CEO "position" and the two users' job date ranges don't overlap. If you enforce a 1-to-1 relationship, modeling this data is impossible.
Why these particular indexes and constraints?
The ID columns are PKs and clustered indexes for hopefully obvious reasons; you use these for joins
jobs.title_id is an FK for hopefully obvious data integrity reasons
jobs.title_id is not null because every job should have a title
jobs.title_id needs an index in order to speed up joins
job_titles.title has an index because you've indicated you'll be querying based on this column (though I wouldn't query in such a fashion, especially since you've said there will be many titles; see below)
job_titles.title has a unique constraint because there's no reason to have duplicates of the same title. You can (and will) have multiple jobs with the same title, but you don't need two entries for "CEO" in job_titles. Enforcing this uniqueness will preserve data integrity useful for reporting purposes (e.g. plot the productivity of IT's web division based on how many "web developer" jobs are filled)
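A minimal sketch of the two tables and their constraints follows. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it can be run anywhere; the index name idx_jobs_title_id and the sample dates are assumptions for illustration, and the MySQL DDL would differ slightly (for example, InnoDB clusters on the primary key automatically).

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE job_titles (
    title_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL UNIQUE      -- unique constraint (also indexes the column)
);
CREATE TABLE jobs (
    job_id     INTEGER PRIMARY KEY,
    title_id   INTEGER NOT NULL REFERENCES job_titles(title_id),
    start_date TEXT,
    end_date   TEXT
);
CREATE INDEX idx_jobs_title_id ON jobs(title_id);  -- secondary index for joins
""")

conn.execute("INSERT INTO job_titles (title_id, title) VALUES (1, 'CEO')")
# Two successive CEOs: two rows in jobs, one row in job_titles (1-to-many).
conn.executemany(
    "INSERT INTO jobs (title_id, start_date, end_date) VALUES (?, ?, ?)",
    [(1, "2005-01-01", "2010-06-30"), (1, "2010-07-01", None)],
)

def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# A second 'CEO' title violates the unique constraint.
duplicate_title_rejected = rejected("INSERT INTO job_titles (title) VALUES ('CEO')")
# A job pointing at a title that does not exist violates the foreign key.
unknown_title_rejected = rejected("INSERT INTO jobs (title_id) VALUES (999)")

ceo_jobs = conn.execute(
    "SELECT COUNT(*) FROM jobs JOIN job_titles USING (title_id) WHERE title = 'CEO'"
).fetchone()[0]
```

Adding a unique constraint on jobs.title_id would make the second CEO insert fail, which is the 1-to-1 problem described above.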
Remarks:
Job title is going to be used as part of an auto-complete field so I'll be using a query to fetch results.
As I mentioned before, use key-value pairs here. Fetch a list of them into memory in your app, and query that list for your autocomplete values. Then send the ID off to the DB for your actual SQL query. The queries will perform better that way; even with indexes, searching integers is generally quicker than searching strings.
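One way the in-memory lookup could work is sketched below. The titles list is hypothetical sample data standing in for the (title_id, title) pairs fetched from job_titles, and autocomplete is an illustrative helper name, not part of any library.

```python
from bisect import bisect_left

# Hypothetical (title_id, title) pairs, as if fetched once from job_titles.
titles = [(1, "CEO"), (2, "Web Designer"), (3, "Web Developer"), (4, "Welder")]

# Sort by lowercased title so every prefix maps to one contiguous slice.
index = sorted((title.lower(), title_id, title) for title_id, title in titles)
keys = [entry[0] for entry in index]

def autocomplete(prefix, limit=10):
    """Return up to `limit` (title_id, title) pairs whose title starts with prefix."""
    prefix = prefix.lower()
    start = bisect_left(keys, prefix)  # first key that is >= prefix
    matches = []
    for key, title_id, title in index[start:]:
        if not key.startswith(prefix) or len(matches) == limit:
            break
        matches.append((title_id, title))
    return matches
```

The application would then send only the chosen title_id to the database, for example as the parameter of a query filtering on jobs.title_id.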
You've said that titles will be user created. Put some input sanitation and validation process in place, because you don't want redundant entries like "WEB DEVELOPER", "web developer", "Web  Developer", etc. Validation should occur at both the application and DB levels; the unique constraint is part (but not all) of this. Prodigitalson's remark about separate machine and display columns is related to this issue.
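As a rough illustration of combining application-level normalization with a DB-level unique constraint, the sketch below keeps a normalized "machine" column next to the display column. The column names machine_name and display_name and the helper get_or_create_title are assumptions made for this example, and SQLite is used in place of MySQL so it runs as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE job_titles (
    title_id     INTEGER PRIMARY KEY,
    machine_name TEXT NOT NULL UNIQUE,  -- normalized key, enforced by the DB
    display_name TEXT NOT NULL          -- what the user sees
)
""")

def machine_name(display_title):
    """Collapse whitespace and case so near-duplicates share one key."""
    return " ".join(display_title.split()).lower()

def get_or_create_title(display_title):
    """Return the title_id for a title, inserting it only if it is new."""
    key = machine_name(display_title)
    row = conn.execute(
        "SELECT title_id FROM job_titles WHERE machine_name = ?", (key,)
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO job_titles (machine_name, display_name) VALUES (?, ?)",
        (key, " ".join(display_title.split())),
    )
    return cur.lastrowid

ids = {get_or_create_title(t) for t in ["WEB DEVELOPER", "web developer", "Web  Developer "]}
title_count = conn.execute("SELECT COUNT(*) FROM job_titles").fetchone()[0]
```

All three spellings resolve to a single row; the first spelling entered becomes the display name in this sketch, which may or may not be the policy you want.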

Edited after getting the clarification:
A table like this is enough - just add the job_title_id column as foreign key in the main member table
---- "job_title" table ---- (store the job_title)
1. pk - job_title_id
2. unique - job_title_name <- index this
__ original answer __
You need to clarify what the job_title is going to represent:
a person that holds this position?
the division/department that has this position?
a certain set of attributes? e.g. Sales always has a commission
or just a string of what it was called?
From what I read so far, you just need the "job_title" as some sort of dimension - make the id for it, make the string searchable - and that's it
example
---- "employee" table ---- (store employee info)
1. pk - employee_id
2. fk - job_title_id
3. other attribute (contract_start_date, salary, sex, ... so on ...)
---- "job_title" table ---- (store the job_title)
1. pk - job_title_id
2. unique - job_title_name <- index this
---- "employee_job_title_history" table ---- (We can check the employee job history here)
1. pk - employee_id
2. pk - job_title_id
3. pk - is_effective
4. effective_date [edited: this needs to be PK too - thanks to KM.]
I still think you need to provide us with a use case; that would greatly improve our understanding of the problem, I believe.

If there are only a few fixed job titles you might want to use an enum in your database.
See http://dev.mysql.com/doc/refman/5.0/en/enum.html
If that's not supported by your version of MySQL, simply encode it with a numerical index and resolve it to a human-readable form in your queries.

Related

How to structure a Bill of Materials that has multiple options

I am stuck trying to develop a Bill of Materials in Access. I have a table called IM_Item_Registry where I have the Item_Code and a boolean for whether it's a component. Where I'm stuck is that past sins of the company created several part numbers for the same ingredient from different vendors. A product may use ingredient 1 at the beginning of a run and ingredient 2 at the end of a run depending on inventory, and it may switch from job to job (lack of discipline and random purchasing based on price). It's creating a headache for me because the ingredients typically have different inclusion rates. How would I go about adding in the flexibility to use both? Or would it just be easier to make multiple versions and then select a version upon scheduling?
I know this is loaded and I can include more detail if needed but I appreciate your help I've been researching on how to do this for a couple weeks now.
EDIT (3/28/2019)
This is for an injection molding company.
IM_Item_Registry (Fields: Item_Code, Category (raw, manufactured, customer supplied, assembly component), Description, Component (boolean), Active (boolean), Unit of Measure).
For this bill of materials, 100011 produces a component; let's call it a handle. Bill 100011 uses raw resin 700049 at 98% inclusion and raw color 600020 at 2% inclusion. However, we may run out of raw color 600020 and have to run with 600051, which would change 700049 to 98.5% inclusion because 600051 requires only 1.5% inclusion to achieve the same color.
I would like to create a table that calls out the general term; let's say 600020 and 600051 are both "yellow color additive". Then I would create a "ghost" number that calls for either 600020 or 600051 and gives both formulation recipes. When production starts, they would scan in which color they actually used, creating the production BOM themselves and recording which color was used and how much. Is there a way to do this in Access database structuring?
I'm assuming I would need the item_registry table, a BoM table (fields: BOM#, ParentID, Ghost_ID) and a components table (fields: Ghost_ID, item_code, Inclusion Rate).
Database normalization is the guiding principle for designing efficient, useful tables and relationships in a relational database. Access forms, subforms, reports, etc. require properly normalized tables to work as intended. There are various levels of normalization, but the common idea is to avoid duplication of data between rows and columns of data. Having duplicate data requires a lot of overhead in storage and in ensuring that actions on the database do not create inconsistent states (contradictory data values). Well-normalized tables allow useful constraints to be defined between data columns and/or rows to ensure that data is valid.
The [BoM] table as proposed in the question is not normalized. But before we get to that, the ParentID was not defined and it's not clear what it represents. Instead, to help show why it's not normalized, let me add a [Product] column to the [BoM] table. Then if such a handle has two alternative lists of components (ghosts?), the table would look like
BOMID  Product  GhostID
-----  -------  -------
1      Handle   1
1      Handle   2
See the duplication? And now if the product is renamed, for instance to "Bronze Handle", then both rows need to be updated for a single conceptual element. It also introduces the possibility of having contradictory data like
BOMID  Product        GhostID
-----  -------------  -------
1      Handle         1
1      Bronze Handle  2
Enough said about that, since I've already gone on too much about normalization concepts here. Following is a basic normalized schema which would serve you better, but notice that it's not too different from what you proposed in the question. The only real difference is that the BoM table is normalized by splitting its columns (and purpose) into another table.
I do not list all columns here, only primary and foreign keys and a few other meaningful columns. PK = Primary Key (unique, non-null key), FK = Foreign Key. Proper indices should be defined on the PK and FK columns AND relationships defined with appropriate constraints.
Table: [IM_Item_Registry]
Item_Code (PK)
Table: [BOM]
BOMID (PK)
ProductID (FK)
Table: [BOM_Option]
OptionID (PK)
BOMID (FK)
Primary (boolean) - flags the primary/usual list of components
Description
Table: [Option_Items]
OptionID (FK; part of composite PK)
Item_Code (FK; part of composite PK)
Inclusion_Rate
The [BOM].[ProductID] column alludes to another table with details of the product which should be defined separately from the Bill of Material. If this database really is super-simplistic, then it could just be a string field [Product] containing the name, but I assume there are more useful details to store. Perhaps this is what the ParentID also alluded to? (I suggest choosing names that are not so abstract like "parent" and "ghost", hence my choice of the word "option".)
Really, since each BOM should be limited to a single primary option, it would fulfill proper normalization to replace the [Primary] flag on [BOM_Option] with another table like
Table: [BOM_Primary]
[BOMID] (FK and PK) - Primary key so only one primary option can be defined at once
[OptionID] (FK)
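The schema above could be sketched as follows, loaded with the handle example from the question (98% / 2% with color 600020, or 98.5% / 1.5% with color 600051). SQLite via Python is used only so the sketch is runnable; Access DDL and data types differ. The ProductID value, the option descriptions, and the helper recipe_for are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE IM_Item_Registry (Item_Code INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE BOM (
    BOMID     INTEGER PRIMARY KEY,
    ProductID INTEGER NOT NULL          -- FK to a product table not shown here
);
CREATE TABLE BOM_Option (
    OptionID    INTEGER PRIMARY KEY,
    BOMID       INTEGER NOT NULL REFERENCES BOM(BOMID),
    Description TEXT
);
CREATE TABLE Option_Items (
    OptionID       INTEGER NOT NULL REFERENCES BOM_Option(OptionID),
    Item_Code      INTEGER NOT NULL REFERENCES IM_Item_Registry(Item_Code),
    Inclusion_Rate REAL NOT NULL,
    PRIMARY KEY (OptionID, Item_Code)
);
CREATE TABLE BOM_Primary (
    BOMID    INTEGER PRIMARY KEY REFERENCES BOM(BOMID),  -- one primary per BOM
    OptionID INTEGER NOT NULL REFERENCES BOM_Option(OptionID)
);
""")

conn.executemany("INSERT INTO IM_Item_Registry VALUES (?, ?)", [
    (700049, "raw resin"),
    (600020, "yellow color additive"),
    (600051, "yellow color additive"),
])
conn.execute("INSERT INTO BOM (BOMID, ProductID) VALUES (100011, 1)")
conn.executemany("INSERT INTO BOM_Option VALUES (?, ?, ?)", [
    (1, 100011, "with color 600020"),
    (2, 100011, "with color 600051"),
])
conn.executemany("INSERT INTO Option_Items VALUES (?, ?, ?)", [
    (1, 700049, 98.0), (1, 600020, 2.0),
    (2, 700049, 98.5), (2, 600051, 1.5),
])
conn.execute("INSERT INTO BOM_Primary VALUES (100011, 1)")

def recipe_for(bom_id, scanned_item):
    """Return (Item_Code, Inclusion_Rate) rows of the option containing the scanned item.

    Assumes the scanned item appears in only one option of the BOM.
    """
    return conn.execute("""
        SELECT oi.Item_Code, oi.Inclusion_Rate
        FROM Option_Items AS oi
        JOIN BOM_Option AS o ON o.OptionID = oi.OptionID
        WHERE o.BOMID = ?
          AND oi.OptionID IN (SELECT OptionID FROM Option_Items WHERE Item_Code = ?)
        ORDER BY oi.Item_Code
    """, (bom_id, scanned_item)).fetchall()
```

Scanning color 600051 at the start of production would then select the 98.5% / 1.5% recipe, while the BOM_Primary row records that option 1 is the usual one.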

MySQL Database Layout/Modelling/Design Approach / Relationships

Scenario: Multiple Types to a single type; one to many.
So for example:
parent multiple type: students table, suppliers table, customers table, hotels table
child single type: banking details
So a student may have multiple banking details, as can a supplier, etc etc.
Layout Option 1 students table (id) + students_banking_details (student_id) table with the appropriate id relationship, repeat per parent type.
Layout Option 2 students table (+others) + banking_details table. banking_details would have a parent_id column for linking and a parent_type field for determining what the parent is (student / supplier / customers etc).
Layout Option 3 students table (+others) + banking_details table. Then I would create another association table per parent type (eg: students_banking_details) for the linking of student_id and banking_details_id.
Layout Option 4 students table (+others) + banking_details table. banking_details would have a column for each parent type, ie: student_id, supplier_id, customers_id - etc.
Other? Your input...
My thoughts on each of these:
1. Multiple tables of the same type of information seems wrong. If I want to change what gets stored about banking details, that's also several tables I have to change as opposed to one.
2. Seems like the most viable option. Apparently this doesn't maintain 'referential integrity' though. I don't know how important that is to me if I'm just going to be cleaning up children programmatically when I delete the parents?
3. Same as (2) except with an extra table per type, so my logic tells me this would be slower than (2), with more tables and the same outcome.
4. Seems dirty to me, with a bunch of null fields in the banking_details table.
Before going any further: if you do decide on a design for storing banking details which lacks referential integrity, please tell me who's going to be running it so I can never, ever do business with them. It's that important. Constraints in your application logic may or may not be followed: things happen (exceptions, interruptions, inconsistencies) which are later reflected in the data because there aren't meaningful safeguards. Constraints in your schema design must be followed. That is much safer, and banking data is something to be as safe as possible with.
You're correct in identifying #1 as suboptimal; an account is an account, no matter who owns it. #2 is out because referential integrity is non-negotiable. #3 is, strictly speaking, the most viable approach, although if you know you're never going to need to worry about expanding the number of entities who might have banking details, you could get away with #4 and a CHECK constraint to ensure that each row only has a value for one of the four foreign keys -- but you're using MySQL, which parses but ignores CHECK constraints in versions before 8.0.16, so go with #3.
Index your foreign keys and performance will be fine. Views are nice to avoid boilerplate JOINs if you have a need to do that.
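Layout option 3 might look roughly like the sketch below, shown for students and suppliers only. SQLite through Python stands in for MySQL so the example runs; the account_number column and the sample rows are hypothetical. The UNIQUE on banking_details_id is an added assumption that each set of banking details belongs to exactly one owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE students  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE banking_details (
    id             INTEGER PRIMARY KEY,
    account_number TEXT NOT NULL        -- hypothetical column
);
-- One association table per parent type; both columns are real foreign keys.
-- The composite PK and the UNIQUE constraint also index both FK columns.
CREATE TABLE students_banking_details (
    student_id         INTEGER NOT NULL REFERENCES students(id),
    banking_details_id INTEGER NOT NULL UNIQUE REFERENCES banking_details(id),
    PRIMARY KEY (student_id, banking_details_id)
);
CREATE TABLE suppliers_banking_details (
    supplier_id        INTEGER NOT NULL REFERENCES suppliers(id),
    banking_details_id INTEGER NOT NULL UNIQUE REFERENCES banking_details(id),
    PRIMARY KEY (supplier_id, banking_details_id)
);
""")

conn.execute("INSERT INTO students (id, name) VALUES (1, 'Ann')")
conn.executemany("INSERT INTO banking_details VALUES (?, ?)",
                 [(1, "A-001"), (2, "A-002"), (3, "A-003")])
conn.executemany("INSERT INTO students_banking_details VALUES (?, ?)",
                 [(1, 1), (1, 2)])

# Linking banking details to a student that does not exist is refused by the schema.
try:
    conn.execute("INSERT INTO students_banking_details VALUES (42, 3)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

accounts = [row[0] for row in conn.execute("""
    SELECT b.account_number
    FROM students_banking_details AS sb
    JOIN banking_details AS b ON b.id = sb.banking_details_id
    WHERE sb.student_id = 1
    ORDER BY b.id
""")]
```

With option 2 (parent_id plus parent_type) the database has no way to refuse the orphan insert, which is the integrity gap described above.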

SQL: store data field choices in application or in separate table?

Given the following SQL table:
Ticket:
- id (primary)
- state
- created
I want to limit the state to a set of predetermined choices, for example open, pending, closed. Should a given state be stored as a single string for each table row, depending on the application to decide which strings (choices) are allowed? Or should the state be a foreign key of a separate table that stores all allowed values, such as:
Ticket:
- id (primary)
- state (foreign)
- created
TicketState
- id (primary)
- name
The second option seems better, but for large tables with a ton of choices it seems the number of these "extra" tables grows rapidly. What's the most common approach to this?
I would almost always use a foreign key into a lookup table. While you can use a CHECK constraint to limit a column (for example, to 'open', 'pending', 'closed'), that approach both hides business logic and makes it more difficult to add values if your requirements change.
The cost of JOINing to a table on an INT clustered primary key is very small and I think that trying to avoid this at the cost of an inferior design is a clear case of premature optimization.
ONE TABLE: How often do you add, remove, or rename a TicketState name? Do you plan to use any database other than MySQL in the future? If not, ENUM is your friend in MySQL.
TWO TABLES: Even if you make TicketState a separate table, it will not grow as Ticket grows. Its size is fixed by the number of possible states. It may require an additional join, but it's less risky (RECOMMENDED).
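A small sketch of the two-table variant is shown below, using SQLite from Python as a runnable stand-in for MySQL. The lowercase table names and the extra 'on hold' state are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE ticket_state (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE ticket (
    id      INTEGER PRIMARY KEY,
    state   INTEGER NOT NULL REFERENCES ticket_state(id),
    created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO ticket_state (id, name) VALUES (1, 'open'), (2, 'pending'), (3, 'closed');
""")

conn.execute("INSERT INTO ticket (state) VALUES (1)")

# A state id that is not in the lookup table is refused by the foreign key.
try:
    conn.execute("INSERT INTO ticket (state) VALUES (99)")
    invalid_state_rejected = False
except sqlite3.IntegrityError:
    invalid_state_rejected = True

# Adding a new allowed value is a plain INSERT, with no schema change.
new_state_id = conn.execute(
    "INSERT INTO ticket_state (name) VALUES ('on hold')"
).lastrowid
conn.execute("INSERT INTO ticket (state) VALUES (?)", (new_state_id,))

ticket_states = conn.execute("""
    SELECT t.id, s.name
    FROM ticket AS t
    JOIN ticket_state AS s ON s.id = t.state
    ORDER BY t.id
""").fetchall()
```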

Database design for time dependent fields

I am making a MySQL database and am fairly confident I know how to normalize it. However, there is an issue I am not sure how to deal with.
Say I have a table
users
----------
user_id primary key
some_field
some_field2
start_date
user_level
Now, user_level gives the user's level, which can be, say, 1, 2, 3, 4, or 5. But as time passes the user may change levels. Obviously if they change levels I can simply do an UPDATE to the users table. But I want to keep a historical record of the users' past levels.
For this reason, I am considering a new table called user_level_history
user_level_history
--------------
id autoincrement primary key
user_id
user_level
level_start_date
and then modify the users table:
users
----------
user_id primary key
some_field
some_field2
start_date
user_level_history_id
Then to get the user's current level I check the
user_level_history_id = user_level_history.id
And to get the user's history I can SELECT from user_level_history all rows with the user_id and order chronologically.
Is this the standard way to do this? I can't imagine I'm the first person to come across this problem.
One more point: I am imagining less than 5000 users. Would having many, many more users require a different solution?
Thanks in advance.
I think that could be designed like this:
Have a table for level information like value(1,2,3,4,5) , description ...
Have an association table for user_level_history containing user_id, level_id,level_start_date ...
Have a foreign key from the user table to the level table, in the role of the user's active level.
You need to develop a mechanism that when user level is changing, inserting to history table occurs.
No, you aren't the first. Querying temporal data is a common requirement, especially in data warehouse/data mining.
The relational data model doesn't have any native, built in support for storing or querying "temporal data".
A lot of work has been done; I have a book by C.J.Date et al. that covers the topic decently: "Temporal Data and the Relational Model". I've also come across several white papers.
One typical, reasonably simplistic approach to storing a "history" is to have a "current" table (like the one you already have), and then add a "history" table. Whenever a row is changed (inserted, updated, deleted) in the "current" table, you add a row to the "history" table, along with the date the row was changed. (You can store a copy of the pre-change row, or a copy of the post-change row, or both.)
With this approach, there's no need to add any columns to the "current" table.
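One possible way to populate such a history table automatically is a trigger, sketched below with SQLite from Python so it is runnable. This is an illustration under assumptions, not the only mechanism: the trigger name and the old_level, new_level, and changed_at columns are invented for the example, only updates are covered (not inserts or deletes), and MySQL's trigger syntax differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    user_level INTEGER NOT NULL         -- the "current" table is left unchanged
);
CREATE TABLE user_level_history (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    old_level  INTEGER NOT NULL,        -- pre-change value
    new_level  INTEGER NOT NULL,        -- post-change value
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Record a history row whenever a user's level actually changes.
CREATE TRIGGER users_level_changed
AFTER UPDATE OF user_level ON users
WHEN OLD.user_level <> NEW.user_level
BEGIN
    INSERT INTO user_level_history (user_id, old_level, new_level)
    VALUES (OLD.user_id, OLD.user_level, NEW.user_level);
END;
""")

conn.execute("INSERT INTO users (user_id, user_level) VALUES (1, 1)")
for level in (2, 2, 4):  # the repeated 2 is a no-op and leaves no history row
    conn.execute("UPDATE users SET user_level = ? WHERE user_id = 1", (level,))

history = conn.execute("""
    SELECT old_level, new_level
    FROM user_level_history
    WHERE user_id = 1
    ORDER BY id
""").fetchall()
current_level = conn.execute(
    "SELECT user_level FROM users WHERE user_id = 1"
).fetchone()[0]
```

The current level is read straight from users, and the chronological history comes from user_level_history, so neither query needs the other table.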

Database Design For Recording Test Results

I work at a manufacturing plant where we assemble 10 different products. Each product is similar in function but requires different parameters to be tested. Originally I created an Access database to store our test results for each unit we build. I laid out the database by having one table for each product. This table stores the production ID along with the test parameters (pressures, temperatures, pass/fail information, etc.). I feel like this was a poor way to approach this, but it seemed to be the only way I could use Access's bound forms for easy data entry. My problem is that now whenever I need to add a new test parameter I have to change the table design as well as the forms.
Soon I will have the ability to recreate this system in MySQL and I'm hoping there is a better way to approach storing these test results. Any insight would be very useful.
Thanks.
Look up "database normalization."
At the most extreme, you could split it into 4 tables:
Product_Types: Product type (VARCHAR/CHAR), id (INT)
Products: id/production id (INT), product type (INT, foreign key bound to Product_Types.id)
Test_Parameters: Type (VARCHAR/CHAR - pressure, temp, etc), id (INT)
Test_Scores: Product (INT, foreign key to Products.id), test (INT, foreign key to Test_Parameters.id), score (INT/whatever seems appropriate), timestamp.
You could theoretically do without the first and third tables and instead just have the names saved in each record (i.e. Product entry: id = 12345, type = "chair"). It's very slightly faster for retrieval that way, but it's also not robust against people misspelling things (i.e. select * from products where type="chair" will miss an entry with type="chiar"), and takes up more storage space since you're saving the textual name over and over again.
Regardless, this is the basic model for a many-to-one relationship, which is what you're looking for: one product, many tests (or, with all four tables, many-to-many: many products, many test types). You need them in separate tables, with each product given an id, and then a foreign key to link each test result to the product it applies to.
Now, let's talk about constraints.
One that I would probably think about throwing on would be a unique key on the test-result table that indexes both the product id and test type, and then be sure to use "ON DUPLICATE KEY UPDATE" so that old values are overwritten by newer ones. That way, you're certain to only ever have one result for each test for each product. If you want to keep old records as well, disregard this paragraph.
The one thing you will definitely lose is the ability to require that all tests are done for a given product. That much will have to be done outside of the database. If you want to require that all the columns are filled in for every single product, then you have to do it pretty much the way you've been doing it (one column for each test in a colossal unified table with NOT NULL constraints on every test column), because now the test results and object id are functionally dependent on each other (neither can exist without the other).
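The unique key on (product, test) with overwrite-on-duplicate can be sketched as below. SQLite is used so the example runs from Python; its ON CONFLICT ... DO UPDATE clause (SQLite 3.24 or newer) plays the role of MySQL's ON DUPLICATE KEY UPDATE here. Table and column names are simplified assumptions, and Product_Types is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE products        (id INTEGER PRIMARY KEY);
CREATE TABLE test_parameters (id INTEGER PRIMARY KEY, type TEXT NOT NULL UNIQUE);
CREATE TABLE test_scores (
    product_id  INTEGER NOT NULL REFERENCES products(id),
    test_id     INTEGER NOT NULL REFERENCES test_parameters(id),
    score       REAL NOT NULL,
    recorded_at TEXT NOT NULL,
    UNIQUE (product_id, test_id)        -- at most one result per product and test
);
INSERT INTO products (id) VALUES (12345);
INSERT INTO test_parameters (id, type) VALUES (1, 'pressure'), (2, 'temperature');
""")

def record_score(product_id, test_id, score, recorded_at):
    """Insert a score, overwriting any older score for the same product and test."""
    conn.execute("""
        INSERT INTO test_scores (product_id, test_id, score, recorded_at)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (product_id, test_id)
        DO UPDATE SET score = excluded.score, recorded_at = excluded.recorded_at
    """, (product_id, test_id, score, recorded_at))

record_score(12345, 1, 30.0, "2020-01-01 08:00:00")
record_score(12345, 1, 32.5, "2020-01-01 09:00:00")  # retest replaces the old value
record_score(12345, 2, 71.0, "2020-01-01 09:05:00")

scores = conn.execute(
    "SELECT test_id, score FROM test_scores WHERE product_id = 12345 ORDER BY test_id"
).fetchall()
```

Adding a new test parameter is then one more row in test_parameters rather than a new column and a form change.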
I would use (at least) the following tables:
Product
Id, Name, TestSchedule
Analysis E.g. Measurement of temperature with normal operating parameters with a 1 Kelvin fault tolerance
Id, Name, Description, Instruction
Test E.g. Temperature measurement in product p, expected result is 300-360 Kelvin.
Id, ProductId, AnalysisId, LowerLimit, UpperLimit
TestResult Test result for batch X, e.g. 342 Kelvin, pass
Id, BatchId, TestId, Result, Status (pass/fail)
The reason for having both an Analysis table and a Test table is normalisation. The analysis is generic, specifying a method. The test specifies acceptable limits when the analysis is performed on a particular product.
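To make the Analysis / Test split concrete, here is a rough sketch in which the pass/fail status is derived from the limits stored on the Test row. SQLite from Python is used so it runs as-is; the lowercase names, the sample product, and the helper record_result are assumptions, and the TestSchedule and Description columns are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE product  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE analysis (id INTEGER PRIMARY KEY, name TEXT NOT NULL, instruction TEXT);
CREATE TABLE test (
    id          INTEGER PRIMARY KEY,
    product_id  INTEGER NOT NULL REFERENCES product(id),
    analysis_id INTEGER NOT NULL REFERENCES analysis(id),
    lower_limit REAL NOT NULL,
    upper_limit REAL NOT NULL
);
CREATE TABLE test_result (
    id       INTEGER PRIMARY KEY,
    batch_id INTEGER NOT NULL,
    test_id  INTEGER NOT NULL REFERENCES test(id),
    result   REAL NOT NULL,
    status   TEXT NOT NULL CHECK (status IN ('pass', 'fail'))
);
INSERT INTO product  (id, name) VALUES (1, 'product p');
INSERT INTO analysis (id, name) VALUES (1, 'temperature measurement');
-- The generic analysis applied to product p, acceptable range 300-360 Kelvin.
INSERT INTO test (id, product_id, analysis_id, lower_limit, upper_limit)
VALUES (1, 1, 1, 300, 360);
""")

def record_result(batch_id, test_id, result):
    """Store a measurement and return 'pass' or 'fail' based on the test's limits."""
    lower, upper = conn.execute(
        "SELECT lower_limit, upper_limit FROM test WHERE id = ?", (test_id,)
    ).fetchone()
    status = "pass" if lower <= result <= upper else "fail"
    conn.execute(
        "INSERT INTO test_result (batch_id, test_id, result, status) VALUES (?, ?, ?, ?)",
        (batch_id, test_id, result, status),
    )
    return status

first_status = record_result(batch_id=1, test_id=1, result=342)
second_status = record_result(batch_id=2, test_id=1, result=290)
```

The same analysis row can be reused by another product's test with different limits, which is the point of keeping the two tables separate.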
I think you are looking at needing to use a Many To Many Table.
So: one table that stores your products, one that stores each unique test, and a third M2M table that links product A to however many tests it needs. Your M2M table could also store (generically) your test results.
Create a table of products with a unique ID / product. Then create a table of tests with a unique test id and a column for applicable product(s). Join these to find which tests apply to which products. You can add new tests at any point.
Further you could have a 'test version' column if you want to store test history, results, etc.