Database design - table design for modeling a hierarchy - mysql

I am designing a laboratory information system (LIS) and am confused about how to design the tables for the different laboratory tests. How should I deal with a table that has an attribute with multiple values, where each of those values can also have multiple values?
Here's some of the data in my LIS design...
HEMATOLOGY <-------- Lab group
**************************************************************
CBC <-------- Sub group 1
RBC <-------- Component
WBC
Hemoglobin
Hematocrit
MCV
MCH
MCHC
Platelet count
Hemoglobin
Hematocrit
WBC differential
Neutrophils
Lymphocytes
Monocytes
Eosinophils
Basophils
Platelet count
Reticulocyte count
ESR
Bleeding time
Clotting time
Pro-time
Peripheral smear
Malarial smear
ABO
RH typing
CLINICAL MICROSCOPY <-------- Lab Group
**************************************************************
Routine urinalysis <-------- Sub group 1
Visual Examination <-------- Sub group 2
Color <-------- Component
Turbidity
Specific Gravity
Chemical Examination
pH
protein
glucose
ketones
RBC
Hbg
bilirubin
specific gravity
nitrite for bacteria
urobilinogen
leukocyte esterase
Microscopic Examination
Red Blood Cells (RBCs)
White Blood Cells (WBCs)
Epithelial Cells
Microorganisms (bacteria, trichomonads, yeast)
Trichomonads
Casts
Crystals
Occult Blood
Pregnancy Test
...This hierarchy of data also gets repeated in other lab groupings in my design (e.g. Blood chemistry, Serology, etc)...
Another question: how should I deal with a component (for example, RBC) that can be a member of one or more lab groups?
I have already implemented a solution by making separate tables: one for lab group, one for sub group 1, one for sub group 2 and one for component. I then created another table to consolidate them by placing a foreign key to each of them in this table... the only trade-off is that some of the rows in this table may have null values. I'm not satisfied with my design, so I'm hoping someone could give me advice on how to make it right; any help would be greatly appreciated.

Here are a couple options:
If it is just the hierarchy above you are modeling, and there is no other data involved, then you can do it in two tables:
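The diagram from this answer didn't survive. As a rough sketch only, one possible two-table layout is a single table holding every node (lab group, sub group or component) plus a table of parent/child links; the table and column names are hypothetical. It also shows the same component (RBC) sitting under two different parents. It is written with Python's built-in sqlite3 so it runs as-is; the DDL should carry over to MySQL with small syntax changes:

```python
import sqlite3

# Hypothetical two-table layout: every node lives in lab_item,
# and parent/child links live in lab_item_hierarchy.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lab_item (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    item_type TEXT NOT NULL  -- 'lab_group', 'sub_group_1', 'sub_group_2', 'component'
);
CREATE TABLE lab_item_hierarchy (
    parent_id INTEGER NOT NULL REFERENCES lab_item(id),
    child_id  INTEGER NOT NULL REFERENCES lab_item(id),
    PRIMARY KEY (parent_id, child_id)
);
""")
conn.executemany("INSERT INTO lab_item VALUES (?, ?, ?)", [
    (1, "HEMATOLOGY", "lab_group"),
    (2, "CBC", "sub_group_1"),
    (3, "RBC", "component"),
    (4, "CLINICAL MICROSCOPY", "lab_group"),
    (5, "Routine urinalysis", "sub_group_1"),
    (6, "Chemical Examination", "sub_group_2"),
])
# The single RBC row is linked under two different parents (CBC and Chemical Examination).
conn.executemany("INSERT INTO lab_item_hierarchy VALUES (?, ?)",
                 [(1, 2), (2, 3), (4, 5), (5, 6), (6, 3)])

parents_of_rbc = [row[0] for row in conn.execute("""
    SELECT p.name
    FROM lab_item_hierarchy h
    JOIN lab_item p ON p.id = h.parent_id
    WHERE h.child_id = 3
    ORDER BY p.name
""")]
print(parents_of_rbc)  # ['CBC', 'Chemical Examination']
```

Nothing in this layout stops a component from being linked directly under a lab group, which is the lack of enforcement described next.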
One problem with this is that you do not enforce that, for example, a sub_group must be a child of a lab_group, or that a component must be a child of either a sub_group_1 or a sub_group_2, but you could enforce these requirements in your application tier instead.
The plus side of this approach is that the schema is nice and simple. Even if the entities have more data associated with them, it might still be worth modeling the hierarchy like this and have some separate tables for the entities themselves.
If you want to enforce the correct relationships at the data level, then you are going to have to split it out into separate tables. Maybe something like this:
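Again the diagram is missing. Here is a hedged sketch of a separate-tables version (table names taken from the text, columns assumed), with foreign keys switched on so the database itself rejects a sub_group_1 that points at a non-existent lab_group. The CHECK on the link table is an optional extra, not part of the original answer, that requires exactly one of the two sub-group columns to be filled in (MySQL enforces CHECK constraints from 8.0.16 onward):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE lab_group   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sub_group_1 (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          lab_group_id INTEGER NOT NULL REFERENCES lab_group(id));
CREATE TABLE sub_group_2 (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          sub_group_1_id INTEGER NOT NULL REFERENCES sub_group_1(id));
CREATE TABLE component   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Single link table: in each row exactly one sub-group column is filled in.
CREATE TABLE component_link (
    component_id   INTEGER NOT NULL REFERENCES component(id),
    sub_group_1_id INTEGER REFERENCES sub_group_1(id),
    sub_group_2_id INTEGER REFERENCES sub_group_2(id),
    CHECK ((sub_group_1_id IS NULL) <> (sub_group_2_id IS NULL))
);
INSERT INTO lab_group   VALUES (1, 'HEMATOLOGY'), (2, 'CLINICAL MICROSCOPY');
INSERT INTO sub_group_1 VALUES (1, 'CBC', 1), (2, 'Routine urinalysis', 2);
INSERT INTO sub_group_2 VALUES (1, 'Chemical Examination', 2);
INSERT INTO component   VALUES (1, 'RBC');
INSERT INTO component_link VALUES (1, 1, NULL), (1, NULL, 1);
""")

def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# A sub_group_1 must point at an existing lab_group ...
orphan_rejected = rejected("INSERT INTO sub_group_1 VALUES (3, 'Orphan', 99)")
# ... and a link row must name exactly one sub group.
empty_link_rejected = rejected("INSERT INTO component_link VALUES (1, NULL, NULL)")
print(orphan_rejected, empty_link_rejected)  # True True
```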
This assumes that each sub_group_1 is only related to a single lab_group. If this is not the case then add a link table between lab_group and sub_group_1. Likewise for the sub_group_1 -> sub_group_2 relationship.
There is a single link table between component and sub_group_1 and sub_group_2. This allows a single component to be related to several sub_group_1 and sub_group_2 entities. The fact that it is a single table means that a lot of the sub_group_1_id and sub_group_2_id values will be null (as you mentioned in your question). You could prevent the nulls by having two separate link tables:
sub_group_1_component with a foreign key to sub_group_1 and a foreign key to component
sub_group_2_component with a foreign key to sub_group_2 and a foreign key to component
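With two link tables nothing is null, but listing every sub group a component belongs to takes a UNION ALL across both tables, which is the extra querying the next paragraph objects to. A minimal sketch (table names from the list above; the rows are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sub_group_1 (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sub_group_2 (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE component   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- No NULLs: each link table holds exactly two non-null foreign keys.
CREATE TABLE sub_group_1_component (
    sub_group_1_id INTEGER NOT NULL REFERENCES sub_group_1(id),
    component_id   INTEGER NOT NULL REFERENCES component(id),
    PRIMARY KEY (sub_group_1_id, component_id));
CREATE TABLE sub_group_2_component (
    sub_group_2_id INTEGER NOT NULL REFERENCES sub_group_2(id),
    component_id   INTEGER NOT NULL REFERENCES component(id),
    PRIMARY KEY (sub_group_2_id, component_id));
INSERT INTO sub_group_1 VALUES (1, 'CBC');
INSERT INTO sub_group_2 VALUES (1, 'Chemical Examination');
INSERT INTO component   VALUES (1, 'RBC');
INSERT INTO sub_group_1_component VALUES (1, 1);
INSERT INTO sub_group_2_component VALUES (1, 1);
""")

# Every component -> sub group relationship now needs both tables.
rows = conn.execute("""
    SELECT c.name, s.name FROM sub_group_1_component l
      JOIN component c ON c.id = l.component_id
      JOIN sub_group_1 s ON s.id = l.sub_group_1_id
    UNION ALL
    SELECT c.name, s.name FROM sub_group_2_component l
      JOIN component c ON c.id = l.component_id
      JOIN sub_group_2 s ON s.id = l.sub_group_2_id
    ORDER BY 2
""").fetchall()
print(rows)  # [('RBC', 'CBC'), ('RBC', 'Chemical Examination')]
```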
The reason I didn't put this in the diagram is that for me, having to query two tables rather than one to get all the component -> sub_group relationships is too much of a pain. For the sake of a little denormalisation (allowing a few nulls) it is much easier to query a single table. If you find yourself allowing a lot of nulls (like a single link table for the relationships between all the entities here) then that is probably denormalising too much.

Personally, I would create 3 tables using relationships for the values. It gives you the ability to create limitless arrays of values. Just try to make sure you give great column names, or your head will spin for days. :)
Also, null values aren't a problem; look into the different types of joins.

Related

MS Access multiple relationships between two tables

We had an MS Access guru at our company who left for another position. Before she left she gave me a quick introduction to creating queries against a SQL Server database. I am really struggling with this, and as I have no one to turn to at our company, I was hoping you guys could help.
Hope you can help!
Thanks!
Well, keep in mind that when you build a query, it DOES NOT necessarily mean that an enforced relationship exists (it might).
Furthermore, if you imported the tables, then again it's doubtful that relationships are defined in Access unless you used the Relationships window to "enforce" them.
However, when building a query? We will often join on two fields. When you build a query in the query builder, you are free to "make up" any kind of join you want.
Say I was given two different spreadsheets. One had some people, and another had a list of hotels.
Ok, so say we want to generate a list of all people in the same city as the hotels.
You might join the table "People" to, say, "Hotels" on City.
However, WHAT happens if there is more than one state with the same city name?
Well, then just join on City AND State!!!
So you get this:
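The query-builder screenshot isn't reproduced here. As a hedged sketch of the same idea, here are two made-up tables (names and rows are hypothetical) run through Python's sqlite3; in Access SQL the equivalent join condition would read ON (People.City = Hotels.City) AND (People.State = Hotels.State):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (Name TEXT, City TEXT, State TEXT);
CREATE TABLE Hotels (HotelName TEXT, City TEXT, State TEXT);
INSERT INTO People VALUES ('Alice', 'Portland', 'OR'), ('Bob', 'Portland', 'ME'),
                          ('Carol', 'Austin', 'TX');
INSERT INTO Hotels VALUES ('Harbor Inn', 'Portland', 'ME');
""")
# No relationship is declared anywhere; the join is made up in the query itself.
city_only = conn.execute("""
    SELECT p.Name FROM People p JOIN Hotels h ON p.City = h.City ORDER BY p.Name
""").fetchall()
city_and_state = conn.execute("""
    SELECT p.Name FROM People p
    JOIN Hotels h ON p.City = h.City AND p.State = h.State
    ORDER BY p.Name
""").fetchall()
print(city_only)       # [('Alice',), ('Bob',)]
print(city_and_state)  # [('Bob',)]
```

Joining on City alone wrongly matches Alice from Portland, OR; adding State removes her.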
So I don't have any related tables here. I just want to (and need to) join the two tables of data.
As such, we never cared about, set up, or "had" a relationship defined; all we care about is building a working query.
So, don't confuse the simple act of building a query with having set up correct relationships between tables.
For a working application? Yes, you most certainly will set up relationships.
If you set up relationships correctly, then you will not be able to, say, add a customer "invoice" record without FIRST having a customer record. You don't have to do this, but it is a very good idea for a working application.
However, when dealing with imported data? You often won't have any pre-defined relationships.
Now, of course, a query that involves multiple tables will in most cases "follow" what you defined in the Relationships window, but that is not a requirement at all.
As noted, when building a working application? Then yes, of course you want to set up the relationships BEFORE you start adding data.
But for general data processing, and creating queries against say different tables of data you are slicing and dicing and working with?
You are free to cook up and draw lines between the tables in the query builder, and often such queries will have nothing to do with the relationships you defined, or will work even when you don't have any relationships defined at all.
The People and Hotels list above is a great example. It's rather cool that I simply joined on both City and State, and did not have to write one line of data processing code to get my desired results
(a list of people who live in the same city as a hotel in my list).
So don't confuse query joins with what we call "referential integrity" and defined relationships. We define these relationships so that it becomes impossible for you, the developer, to add a customer invoice without first having added the customer. It also means that you, your code, or even editing the tables directly will not allow this to occur.
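A minimal sketch of that rule, using SQLite in place of Access (the Customer/Invoice tables and the customer 'Acme' are made up for illustration). SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, whereas in Access you tick "Enforce Referential Integrity" in the Relationships window:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Invoice (
    InvoiceID  INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID)
);
""")

def add_invoice(invoice_id, customer_id):
    """Return True if the invoice was accepted, False if the FK rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Invoice VALUES (?, ?)", (invoice_id, customer_id))
        return True
    except sqlite3.IntegrityError:
        return False

first_try = add_invoice(1, 42)  # customer 42 does not exist yet
conn.execute("INSERT INTO Customer VALUES (42, 'Acme')")
second_try = add_invoice(1, 42)
print(first_try, second_try)  # False True
```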
However, when dealing with just reporting, or importing data to work on? Well, then often we will not have any relationships defined, but that sure does not stop us from firing up the query builder and drawing join lines between tables.
Between two given tables you can have one relationship involving two (or more) fields, or two (or more) relationships each involving one field. Both cases are possible and have different implications.
The first case, as the first commenter pointed out, is typically used when you have a compound key in the master Table of the relationship.
The second case is typically used when you have two candidate keys in the master table, each of which is used as a master field in each of the two independent relationships.
In MS Access, the case of two independent relationships can be recognized because it requires two table boxes for the same table in the Relationships pane.
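To make the two cases concrete, here is a sketch in SQLite; all table and column names are hypothetical. CourseOffering has a compound key, so Enrollment needs one relationship over two fields. Employee has two candidate keys (EmployeeID and BadgeNo), and Shift has two independent one-field relationships to it, each using a different key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Case 1: one relationship over two fields (compound key in the master table).
CREATE TABLE CourseOffering (CourseCode TEXT, Term TEXT, PRIMARY KEY (CourseCode, Term));
CREATE TABLE Enrollment (
    StudentID INTEGER, CourseCode TEXT, Term TEXT,
    FOREIGN KEY (CourseCode, Term) REFERENCES CourseOffering (CourseCode, Term));

-- Case 2: two independent one-field relationships, each using a different
-- candidate key of the same master table.
CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, BadgeNo TEXT NOT NULL UNIQUE);
CREATE TABLE Shift (
    ShiftID       INTEGER PRIMARY KEY,
    ScheduledByID INTEGER REFERENCES Employee (EmployeeID),
    WorkerBadge   TEXT    REFERENCES Employee (BadgeNo));

INSERT INTO CourseOffering VALUES ('DB101', '2024-FALL');
INSERT INTO Employee VALUES (1, 'B-100'), (2, 'B-200');
""")

def accepted(sql):
    """Return True if the insert passes the foreign key checks."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Both fields must match the same master row together.
ok_enrollment = accepted("INSERT INTO Enrollment VALUES (7, 'DB101', '2024-FALL')")
bad_enrollment = accepted("INSERT INTO Enrollment VALUES (7, 'DB101', '2025-SPRING')")
# Each field is checked on its own, against its own key.
ok_shift = accepted("INSERT INTO Shift VALUES (1, 1, 'B-200')")
bad_shift = accepted("INSERT INTO Shift VALUES (2, 1, 'B-999')")
print(ok_enrollment, bad_enrollment, ok_shift, bad_shift)  # True False True False
```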

Access query is duplicating unique records / Linked table issues

I hope someone can help me with this:
I have a simple query combining a list of names and basic details with another table containing more specific information. Some names will necessarily appear more than once and arbitrary distinctions like "John Smith 1" and "John Smith 2" are not an option, so I have been using an autonumber to keep the records distinct.
The problem is that my query is creating two records for each name that appears more than once. For example, there are two clients named 'Sophoan', each with a different id number, and the query has picked up each one twice resulting in four records (in total there are 122 records when there should only be 102). 'Unique values' is set to 'yes'.
I've researched as much as I can and am completely stuck. I've tried to tinker with the SQL but it always comes back with errors, I presume because there are too many fields in the query.
What am I missing? Or is a query the wrong approach and I need to find another way to combine my tables?
Project in detail: I'm building a database for a charity which has two main activities: social work and training. The database is to record their client information and the results of their interactions with clients (issues they asked for help with, results of training workshops, etc.). Some clients will cross over between activities, which the organisation wants to track, hence all registered clients go into one list and individual tables spin off from it to collect data for each specific activity the client takes part in. This query is supposed to be my solution for combining these tables for data entry by the user.
At present I have the following tables:
AllList (master list of client names and basic contact info; 'Social Work Register' and 'Participant Register' join to this table by 'Name')
Social Work Register (list of social work clients with full details of each case)
Social Work Follow-up Table (used when staff call social work clients to see how their issue is progressing; the register has too many columns to hold this as well; joined to Register by 'Client Name')
Participants Register (list of clients for training and details of which workshops they attended and why they were absent if they missed a session)
Individual workshop tables x14 (each workshop includes a test and these tables record the clients' answers and their score for each individual test; there will be more than 20 of these when the database is finished; all joined to the 'Participants Register' by 'Participant Name')
Queries:
Participant Overview Query (links the attendance data from the 'Register' with the grading data from each Workshop to present a read-only overview; this one seems to work perfectly)
Social Work Query (non-functional; intended to link the 'Client Register' to the 'AllList' for data entry so that when a new client is registered it creates a new record in both tables, with the records matched together)
Participant Query (not yet attempted; as above, intended to link the 'Participant Register' to the 'AllList' for data entry)
BUT I realised that queries can't be used for data entry, so this approach seems to be a dead end. I have had some success with using subforms for data entry but I'm not sure if it's the best way.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
[N.B. There are more tables that store secondary information but aren't relevant to the issue as they are not and will not be linked to any other tables.]
I realised that queries can't be used for data entry
Actually, non-complex queries are usually editable as long as the table whose data you want to edit remains 'at the core' of the query. Access applies a number of factors to determine if a query is editable or not.
Most of the time, it's fairly easy to figure out why a query has become non-editable.
Ask yourself the question: if I edit that data, how will Access ensure that exactly that data will be updated, without ambiguity?
If your tables have defined primary keys and these are part of your query, and if there are no grouping, calculated fields (fields that use some function to change or test the value of that field), or complex joins, then the query should remain editable.
You can read more about that here:
How to troubleshoot errors that may occur when you update data in Access queries and in Access forms
Dealing with Non-Updateable Microsoft Access Queries and the Use of Temporary Tables.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
This remark actually proves that you have design issues in your database.
A basic tenet of Database Design is to remove redundancy as much as possible. One of the reasons is actually to avoid having to update the same data in multiple places.
Another remark: you are using the Client's name as a Natural Key. Frankly, it is not a very good idea. Generally, you want to make sure that what constitutes a Primary key for a table is reliably unique over time.
Using people's names is generally the wrong choice because:
people change name, for instance in many cultures, women change their family name after they get married.
There could also have been a typo when entering the name, and it can be hard to correct if that data is used as a Foreign Key in many different tables.
as your database grows, you are likely to end up with some people having the same name, creating conflicts, or forcing the user to make changes to that name so it doesn't create a duplicate.
The best way to enforce uniqueness of records in a table is to use the default AutoNumber ID field proposed by Access when you create a new table. This is called a Surrogate key.
It's not meant to be edited, changed or even displayed to the user. Its sole purpose is to allow the primary key of a table to be unique and non-changing over time, so it can reliably be used as a way to reference a record from one table to another (if a table needs to refer to a particular record, it will contain a field that holds that ID. That field is called a Foreign Key).
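A small sketch of a surrogate key in practice, using SQLite's INTEGER PRIMARY KEY AUTOINCREMENT as a stand-in for an Access AutoNumber. The table names follow the Client/ClientCase naming suggested in this answer; the data is made up. It shows three clients with the same name staying distinct, and a name correction that doesn't break the link from their case:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- INTEGER PRIMARY KEY AUTOINCREMENT plays the role of an Access AutoNumber.
CREATE TABLE Client (
    ID   INTEGER PRIMARY KEY AUTOINCREMENT,
    Name TEXT NOT NULL               -- deliberately NOT unique
);
CREATE TABLE ClientCase (
    ID       INTEGER PRIMARY KEY AUTOINCREMENT,
    ClientID INTEGER NOT NULL REFERENCES Client(ID)
);
INSERT INTO Client (Name) VALUES ('John Smith'), ('John Smith'), ('John Smith');
INSERT INTO ClientCase (ClientID) VALUES (2);
""")

# Three people with the same name are still three distinct rows.
ids = [r[0] for r in conn.execute(
    "SELECT ID FROM Client WHERE Name = 'John Smith' ORDER BY ID")]

# Correcting a typo or a changed name touches one row; the case stays linked by ID.
conn.execute("UPDATE Client SET Name = 'John Smyth' WHERE ID = 2")
linked_name = conn.execute("""
    SELECT Client.Name FROM ClientCase JOIN Client ON Client.ID = ClientCase.ClientID
""").fetchone()[0]
print(ids, linked_name)  # [1, 2, 3] John Smyth
```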
The names you have for your tables are not precise enough: think of each table as an Entity holding related data.
The fact that you have a table called AllList means that its purpose isn't well thought out; it sounds like a catch-all rather than a carefully crafted entity.
Instead, if this is your list of clients, then simply call it Client. Each record of that table holds the information for a single client (whether to use plural or singular is up to you, just stick to your choice though, being consistent is hugely important).
Instead of using the client's name as a key, create an ID field, an Autonumber, and set it as Primary Key.
Let's also rename the "Social Work Register", which holds the Client's cases, simply as ClientCase. That relationship seems clear from your description of the table but it's not clear in the table name itself (by the way, I know Access allows spaces in table and field names, but it's a really bad idea to use them if you care at least a little bit about the future of your work).
In that table, create a ClientID Number field (a Foreign Key) that will hold the related Client's ID.
You don't talk about the relationship between a Client and its Cases. This is another area where you must be clear: how many cases can a single Client have?
At most 1 Case ? (0 or 1 Case)
exactly 1 Case?
at least one Case? (1 or more Cases)
any number of Cases? (0 or more Cases)
Knowing this is important for selecting the right type of JOIN in your queries. It's a crucial part of the design assumptions when building your database.
For instance, in the most general case, assuming that a Client can have 0 or more cases, you could have a report that displays the Client's Name and the number of cases related to them like this:
SELECT Client.ID,
Client.Name,
Count(ClientCase.ID) AS CountOfCases
FROM Client
LEFT JOIN ClientCase
ON Client.ID = ClientCase.ClientID
GROUP BY Client.ID, Client.Name
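The query above is Access SQL; to see what the LEFT JOIN does, here is a small runnable sketch using Python's sqlite3 with made-up rows (the name 'Dara' and all IDs are hypothetical). It groups by Client.ID as well as the name, so two different clients who share a name are counted separately, and it also shows what happens if you group by the name alone:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Client (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE ClientCase (ID INTEGER PRIMARY KEY, ClientID INTEGER REFERENCES Client(ID));
INSERT INTO Client VALUES (1, 'Sophoan'), (2, 'Sophoan'), (3, 'Dara');
INSERT INTO ClientCase VALUES (10, 1), (11, 1), (12, 2);
""")

# LEFT JOIN keeps clients with no cases; COUNT of a NULL column gives 0 for them.
rows = conn.execute("""
    SELECT Client.ID, Client.Name, COUNT(ClientCase.ID) AS CountOfCases
    FROM Client
    LEFT JOIN ClientCase ON Client.ID = ClientCase.ClientID
    GROUP BY Client.ID, Client.Name
    ORDER BY Client.ID
""").fetchall()
print(rows)  # [(1, 'Sophoan', 2), (2, 'Sophoan', 1), (3, 'Dara', 0)]

# Grouping by the name alone silently merges the two different Sophoans.
by_name_only = conn.execute("""
    SELECT Client.Name, COUNT(ClientCase.ID)
    FROM Client LEFT JOIN ClientCase ON Client.ID = ClientCase.ClientID
    GROUP BY Client.Name ORDER BY Client.Name
""").fetchall()
print(by_name_only)  # [('Dara', 0), ('Sophoan', 3)]
```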
You've described your basic design a bit more, but that's not enough. Show us the actual table structures and the SQL of the queries you tried. From the description you give, it's hard to really understand the actual details of the design and to tell you why it fails and how to make it work.

Method To Create Database for Tv Shows

This is my first question on Stack Overflow, so if I do something wrong please let me know and I will fix it as soon as possible.
So I am trying to make a database for TV shows and I would like to know the best way to make my current database simpler (normalization).
I would like to be able to have the following structure or similar.
Fringe
Season 1
Episodes 1 - 10(whatever there are)
Season 2
Episodes 1 - 10(whatever there are)
... (so on)
Burn Notice
Season 1
Episodes 1 - 10(whatever there are)
Season 2
Episodes 1 - 10(whatever there are)
... (so on)
... (More Tv Shows)
Sorry if this seems unclear. (Please ask for clarification)
But the structure I have right now is 3 tables (tvshow_list, tvshow_episodes, tvshow_link)
//tvshow_list//
TvShow Name | Director | Company_Created | Language | TVDescription | tv_ID
//tvshow_episodes//
tv_ID | EpisodeNum | SeasonNum | EpTitle | EpDescription | Showdate | epid
//tvshow_link//
epid | ep_link
The Director and the company are linked by an id to another table with a list of companies and directors.
I am pretty sure that there is a more simplified way of doing this.
Thanks for the help in advance,
Krishanthan Lingeswaran
The basic concept of Normalization is the idea that you should only store one copy of any item of data that you have. It looks like you've got a good start already.
There are two basic ways to model what you're trying to do here with episodes and shows. In the database world, you might have heard the terms "one to many" and "many to many". Both are useful; which one is correct depends on your specific situation. In your case, the big question to ask yourself is whether a single episode can belong to only one show, or whether an episode can belong to multiple shows at once. I'll explain the two forms, and why you need to know the answer to that question.
The first form is simply a foreign key relationship. If you have two tables, 'episodes' and 'shows', in the episodes table, you would have a column named 'show_id' that contains the ID of one (and only one!) show. Can you see how you could never have an episode belong to more than one show this way? This is called a "one to many" relationship, i.e. a show can have many episodes.
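A minimal sketch of the one-to-many form, assuming tables named shows and episodes with an episodes.show_id column as described (episode titles are placeholders), in Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shows (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE episodes (
    id      INTEGER PRIMARY KEY,
    show_id INTEGER NOT NULL REFERENCES shows(id),  -- exactly one show per episode
    title   TEXT NOT NULL
);
INSERT INTO shows VALUES (1, 'Fringe'), (2, 'Burn Notice');
INSERT INTO episodes VALUES (1, 1, 'Episode 1'), (2, 1, 'Episode 2'), (3, 2, 'Episode 1');
""")
# Two tables, one join: each episode row carries its single show_id.
rows = conn.execute("""
    SELECT s.name, COUNT(e.id) FROM shows s JOIN episodes e ON e.show_id = s.id
    GROUP BY s.id, s.name ORDER BY s.name
""").fetchall()
print(rows)  # [('Burn Notice', 1), ('Fringe', 2)]
```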
The second form is to use an association table, and this is the form you used in your example. This form would allow you to associate an episode with multiple shows and is therefore called a "many to many" relationship.
There is some benefit to using the first form, but it's not really that big of a deal in most cases. Your queries will be a little bit shorter because you only have to join 2 tables to get episodes->shows but the other table is just one more join. It really comes down to figuring out if you need a "one to many" or "many to many" type relationship.
An example of a situation where you would need a many-to-many relationship would be if you were modeling a library and had to keep track of who checked out which book. You'd have a table of books, a table of users, and then a table of "books to users" that would have an id, a book_id, and a user_id and would be a many-to-many relationship.
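The library example as a runnable sketch; the table names books, users and books_users and the sample rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Association table: each row links one book to one user.
CREATE TABLE books_users (
    id      INTEGER PRIMARY KEY,
    book_id INTEGER NOT NULL REFERENCES books(id),
    user_id INTEGER NOT NULL REFERENCES users(id)
);
INSERT INTO books VALUES (1, 'Book A'), (2, 'Book B');
INSERT INTO users VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO books_users (book_id, user_id) VALUES (1, 1), (2, 1), (1, 2);
""")
# A user can have many books, and a book can have many users.
books_for_ann = [r[0] for r in conn.execute("""
    SELECT b.title FROM books_users bu JOIN books b ON b.id = bu.book_id
    WHERE bu.user_id = 1 ORDER BY b.title""")]
users_of_book_a = [r[0] for r in conn.execute("""
    SELECT u.name FROM books_users bu JOIN users u ON u.id = bu.user_id
    WHERE bu.book_id = 1 ORDER BY u.name""")]
print(books_for_ann, users_of_book_a)  # ['Book A', 'Book B'] ['Ann', 'Ben']
```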
Hope that helps!
I am pretty sure that there is a more simplified way of doing this.
Not as far as I know. Your schema is close to the simplest you can make for what I presume is the functionality you're asking for. "Improvements" on it really only make it more complicated, and should be added as you judge the need emerges on your side. The following examples come to mind (none of which really simplify your schema).
I would standardize your foreign key and primary key names. An example would be to have the columns shows.id, episodes.id, episodes.show_id, link.id, link.episode_id.
Putting SeasonNum as what I presume will be an int in the Episodes table, in my opinion, violates the normalization constraint. This is not a major violation, but if you really want to stick to it, I would create a separate Seasons table and associate it many-to-one to the Shows table, and then have the Episodes associate only with the Seasons. This gives you the opportunity to, for instance, attach information to each season. Also, it prevents repetition of information (while the type of the season ID foreign key column in the Episodes table would ostensibly still be an INT, a foreign key philosophically stores an association, what you want, versus dumb data, what you have).
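One way the separate Seasons table could look, sketched in SQLite. The column names (season_number, episode_number) are assumptions, and the UNIQUE constraint is an optional extra that stops a show from getting two "Season 2" rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shows (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE seasons (
    id            INTEGER PRIMARY KEY,
    show_id       INTEGER NOT NULL REFERENCES shows(id),  -- many seasons per show
    season_number INTEGER NOT NULL,
    UNIQUE (show_id, season_number)
);
CREATE TABLE episodes (
    id             INTEGER PRIMARY KEY,
    season_id      INTEGER NOT NULL REFERENCES seasons(id),  -- no show_id or SeasonNum here
    episode_number INTEGER NOT NULL,
    title          TEXT
);
INSERT INTO shows VALUES (1, 'Fringe');
INSERT INTO seasons VALUES (1, 1, 1), (2, 1, 2);
INSERT INTO episodes VALUES (1, 1, 1, NULL), (2, 1, 2, NULL), (3, 2, 1, NULL);
""")
# Show -> season -> episode is reached through the foreign keys.
rows = conn.execute("""
    SELECT sh.name, se.season_number, COUNT(e.id)
    FROM shows sh JOIN seasons se ON se.show_id = sh.id
    LEFT JOIN episodes e ON e.season_id = se.id
    GROUP BY sh.name, se.season_number ORDER BY se.season_number
""").fetchall()
print(rows)  # [('Fringe', 1, 2), ('Fringe', 2, 1)]
```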
You may consider putting language, director, and company in their own tables rather than your TV show list. This is the same concern as above and in your case a minor violation of normalization.
Language, director, and company all have interesting issues attached to them regarding the level of the association. Most TV shows have different directors for different episodes. Many are produced in multiple languages and by several different companies and sometimes networks. So at what level do you plan on storing this information? I'm not a software architect, so someone else can better answer this question than me, but I'd set up a polymorphic many-to-many association for languages, directors, and companies and an inheritance model that allows for these values to be specified on an episode-by-episode, season-by-season, or show-by-show basis, inheriting the value from its parent if none are provided.
Bottom line concerning all these suggestions: Pick what's appropriate for your project. If you don't need the functionality afforded by this level of associations, and you don't mind manually entering in repetitive data (you might end up implementing an auto-complete system to help you), you can gloss over some of the normalization constraints.
Normalization is merely a suggestion. Pick what's right for you and learn from your mistakes.

Organizational chart represented in a table

I have an Access application, in which I have an employee table. The employees are part of several different levels in the organization. The organization has 1 GM and 5 department heads; under each department head are several supervisors, and under those supervisors are the workers.
Depending on the position of the employee, they will only have access to records of those under them.
I wanted to represent the organization in a table with some sort of level system. The problem I saw with that was that there are many people on the same level (for example supervisors), but they shouldn't have access to the records of a supervisor in another department. How should I approach this problem?
One common way of keeping this kind of hierarchical data in a database uses only a single table, with fields something like this:
userId (primary key)
userName
supervisorId (self-referential "foreign key", refers to another userId in this same table)
positionCode (could be simple like 1=lackey, 2=supervisor; or a foreign key pointing to another table of positions and such)
...whatever else you need to store for each employee...
Then your app uses SQL queries to figure out permissions. To figure out the employees that supervisor 'X' (whose userId is '3', for example) is allowed to see, you query for all employees where supervisorId=3.
If you want higher-up bosses to be able to see everyone underneath them, the easiest way is just to do a recursive search. I.e. query for everyone that reports to this big boss, and for each of them query who reports to them, all the way down the tree.
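A sketch of both queries against the single-table design above, with a made-up org chart (the positionCode values 3 and 4 for department head and GM are assumptions). Access SQL has no recursive query syntax, so in Access the walk down the tree would typically be done in VBA; here SQLite's WITH RECURSIVE does the same walk in one statement:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    userId       INTEGER PRIMARY KEY,
    userName     TEXT NOT NULL,
    supervisorId INTEGER REFERENCES employee(userId),  -- NULL for the GM
    positionCode INTEGER                               -- 1=lackey, 2=supervisor, ...
);
INSERT INTO employee VALUES
    (1, 'GM', NULL, 4),
    (2, 'Head A', 1, 3), (6, 'Head B', 1, 3),
    (3, 'Sup A1', 2, 2), (7, 'Sup B1', 6, 2),
    (4, 'Worker 1', 3, 1), (5, 'Worker 2', 3, 1), (8, 'Worker 3', 7, 1);
""")

def direct_reports(boss_id):
    """Employees whose supervisorId is boss_id (one level down)."""
    return [r[0] for r in conn.execute(
        "SELECT userId FROM employee WHERE supervisorId = ? ORDER BY userId", (boss_id,))]

def everyone_under(boss_id):
    """Walks down the tree level by level until no more reports are found."""
    return [r[0] for r in conn.execute("""
        WITH RECURSIVE under(userId) AS (
            SELECT userId FROM employee WHERE supervisorId = ?
            UNION ALL
            SELECT e.userId FROM employee e JOIN under u ON e.supervisorId = u.userId
        )
        SELECT userId FROM under ORDER BY userId""", (boss_id,))]

print(direct_reports(3))   # [4, 5]
print(everyone_under(2))   # [3, 4, 5]  (nobody from Head B's department)
```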
Does that make sense? You let the database do the work of sorting through all the users, because computers are good at that kind of thing.
I put the positionCode in this example in case you wanted some people to have different permissions... for example, you might have a code '99' for HR employees which have the right to see the list of all employees.
Maybe I'll let some other people try to explain it better...
Here's an article from Microsoft's Access Cookbook that explains these queries rather well.
And here is a somewhat chunky explanation of the same.
Here's a completely different method that you might find useful (the single-table design above is what is usually called the "adjacency list model"), and his explanation is pretty good. He also points out some difficulties with both methods (when he talks about the tables being "denormalized").

One-to-many relationship in the same table

I'm trying to define a one-to-many relationship in a single table. For example, let's say I have a Groups table with these entries:
Group:
Group_1:
name: Atlantic Records
Group_2:
name: Capital Records
Group_3:
name: Gnarls Barkley
Group_4:
name: Death Cab For Cutie
Group_5:
name: Coldplay
Group_6:
name: Management Company
The group Coldplay could be a child of the group Capital Records and a child of the group Management Company and Gnarls Barkley could only be a child of Atlantic Records.
What is the best way to represent this relationship. I am using PHP and mySQL. Also I am using PHP-Doctrine as my ORM if that helps.
I was thinking that I would need to create a linking table called group_groups that would have 2 columns, owner_id and group_id. However, I'm not sure if that is the best way to do this.
Any insight would be appreciated. Let me know if I explained my problem well enough.
There are a number of possible issues with this approach, but with a minimal understanding of the requirements, here goes:
There appear to be really three 'entities' here: Artist/Band, Label/Recording Co. and Management Co.
Artists/Bands can have a Label/Recording Co.
Artists/Bands can have a Management Co.
Label/Recording Co can have multiple Artists/Bands
Management Co can have multiple Artists/Bands
So there are one-to-many relationships between Recording Co and Artists and between Management Co and Artists.
Record each entity only once, in its own table, with a unique ID.
Put the key of the "one" in each instance of the "many" - in this case, Artist/Band would have both a Recording Co ID and a Management Co ID
Then your query will ultimately join Artist, Recording Co and Management Co.
With this structure, you don't need intersection tables, there is a clear separation of "entities" and the query is relatively simple.
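A minimal sketch of that three-entity layout in SQLite, reusing the names from the question; the table and column names are hypothetical. LEFT JOINs are used so that an artist with no management company (Gnarls Barkley here) still shows up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recording_co  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE management_co (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE artist (
    id               INTEGER PRIMARY KEY,
    name             TEXT NOT NULL,
    recording_co_id  INTEGER REFERENCES recording_co(id),   -- key of the "one" side
    management_co_id INTEGER REFERENCES management_co(id)   -- key of the "one" side
);
INSERT INTO recording_co  VALUES (1, 'Atlantic Records'), (2, 'Capital Records');
INSERT INTO management_co VALUES (1, 'Management Company');
INSERT INTO artist VALUES (1, 'Gnarls Barkley', 1, NULL), (2, 'Coldplay', 2, 1);
""")
rows = conn.execute("""
    SELECT a.name, r.name, m.name
    FROM artist a
    LEFT JOIN recording_co  r ON r.id = a.recording_co_id
    LEFT JOIN management_co m ON m.id = a.management_co_id
    ORDER BY a.name
""").fetchall()
print(rows)
# [('Coldplay', 'Capital Records', 'Management Company'),
#  ('Gnarls Barkley', 'Atlantic Records', None)]
```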
A couple of options:
Easiest: If each group can only have one parent, then you just need a "ParentID" field in the main table.
If relationships can be more complex than that, then yes, you'd need some sort of linking table. Maybe even a "relationship type" column to define what kind of relationship between the two groups.
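A sketch of such a linking table with a relationship-type column; the table names grp and group_relationship and the type values are hypothetical (grp avoids GROUP, which is a reserved word in SQL). It lets Coldplay have two parents of different kinds:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grp (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE group_relationship (
    parent_id         INTEGER NOT NULL REFERENCES grp(id),
    child_id          INTEGER NOT NULL REFERENCES grp(id),
    relationship_type TEXT NOT NULL,     -- e.g. 'label', 'management'
    PRIMARY KEY (parent_id, child_id, relationship_type)
);
INSERT INTO grp VALUES (1, 'Atlantic Records'), (2, 'Capital Records'),
    (3, 'Gnarls Barkley'), (5, 'Coldplay'), (6, 'Management Company');
INSERT INTO group_relationship VALUES
    (1, 3, 'label'), (2, 5, 'label'), (6, 5, 'management');
""")
parents_of_coldplay = conn.execute("""
    SELECT p.name, r.relationship_type
    FROM group_relationship r JOIN grp p ON p.id = r.parent_id
    WHERE r.child_id = 5 ORDER BY p.name
""").fetchall()
print(parents_of_coldplay)
# [('Capital Records', 'label'), ('Management Company', 'management')]
```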
In this particular instance, you would be wise to follow Ken G's advice, since it does indeed appear that you are modeling three separate entities in one table.
In general, it is possible that this could come up -- If you had a "person" table and were modeling who everybody's friends were, for a contrived example.
In this case, you would indeed have a "linking" or associative or marriage table to manage those relationships.
I agree with Ken G and JohnMcG that you should separate Management and Labels. However they may be forgetting that a band can have multiple managers and/or multiple managers over a period of time. In that case you would need a many to many relationship.
management has many bands
band has many management
label has many bands
band has many labels
In that case your original idea of using a relationship table is correct. That is how many-to-many relationships are done. However, group_groups could be named better.
Ultimately it will depend on your requirements. For instance if you're storing CD titles then perhaps you would rather attach labels to a particular CD rather than a band.
This does appear to be a conflation of STI (single-table inheritance) and nested sets / tree structures. Nested set/trees are one parent to multiple children:
http://jgeewax.wordpress.com/2006/07/18/hierarchical-data-side-note/
http://www.dbmsmag.com/9603d06.html
http://www.sitepoint.com/article/hierarchical-data-database
I think best of all is to use NestedSet
http://www.doctrine-project.org/documentation/manual/1_0/en/hierarchical-data#nested-set
Just set actAs NestedSet
Yes, you would need a bridge that contained the fields you described. However, I would think your table should be split if it is following the same type of entities as you describe.
(I am assuming there is an id column which can be used for references).
You can add a column called parent_id (allow nulls) and store the id of the parent group in it. Then you can join using SQL like: "SELECT parent.*, child.* FROM `group` parent JOIN `group` child ON parent.id = child.parent_id".
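The same self-join as a runnable SQLite sketch, with sample rows from the question. GROUP is a reserved word, so the table name has to be quoted (double quotes here, backticks in MySQL). Note that each row can hold only one parent_id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "group" (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES "group"(id)   -- NULL for top-level groups
);
INSERT INTO "group" VALUES (1, 'Atlantic Records', NULL), (3, 'Gnarls Barkley', 1);
""")
# Join the table to itself: one alias plays the parent, the other the child.
rows = conn.execute("""
    SELECT parent.name, child.name
    FROM "group" parent
    JOIN "group" child ON parent.id = child.parent_id
""").fetchall()
print(rows)  # [('Atlantic Records', 'Gnarls Barkley')]
```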
I do recommend using a separate table for this link because:
1. You cannot support multiple parents with a field. You have to use a separate table.
2. Import/Export/Delete is way more difficult with a field in the table because you may run into key conflicts. For example, if you try to import data, you need to make sure that you first import the parents and then children. With a separate table, you can import all groups and then all relationships without worrying about the actual order of the data.