One-to-many relationship in the same table - mysql

Im trying to use to define a one-to-many relationship in a single table. For example lets say I have a Groups table with these entries:
Group:
Group_1:
name: Atlantic Records
Group_2:
name: Capital Records
Group_3:
name: Gnarls Barkley
Group_4:
name: Death Cab For Cutie
Group_5:
name: Coldplay
Group_6:
name: Management Company
The group Coldplay could be a child of the group Capital Records and a child of the group Management Company and Gnarls Barkley could only be a child of Atlantic Records.
What is the best way to represent this relationship. I am using PHP and mySQL. Also I am using PHP-Doctrine as my ORM if that helps.
I was thinking that I would need to create a linking table called group_groups that would have 2 columns. owner_id and group_id. However i'm not sure if that is best way to do this.
Any insight would be appreciated. Let me know if I explained my problem good enough.

There are a number of possible issues with this approach, but with a minimal understanding of the requirements, here goes:
There appear to be really three 'entities' here: Artist/Band, Label/Recording Co. and Management Co.
Artists/Bands can have a Label/Recording CO
Artists/Bands can have a Management Co.
Label/Recording Co can have multiple Artists/Bands
Management Co can have multiple Artists/Bands
So there are one-to-many relationships between Recording Co and Artists and between Management Co and Artists.
Record each entity only once, in its own table, with a unique ID.
Put the key of the "one" in each instance of the "many" - in this case, Artist/Band would have both a Recording Co ID and a Management Co ID
Then your query will ultimately join Artist, Recording Co and Management Co.
With this structure, you don't need intersection tables, there is a clear separation of "entities" and the query is relatively simple.

A couple of options:
Easiest: If each group can only have one parent, then you just need a "ParentID" field in the main table.
If relationships can be more complex than that, then yes, you'd need some sort of linking table. Maybe even a "relationship type" column to define what kind of relationship between the two groups.

In this particular instance, you would be wise to follow Ken G's advice, since it does indeed appear that you are modeling three separate entities in one table.
In general, it is possible that this could come up -- If you had a "person" table and were modeling who everybody's friends were, for a contrived example.
In this case, you would indeed have a "linking" or associative or marriage table to manage those relationships.

I agree with Ken G and JohnMcG that you should separate Management and Labels. However they may be forgetting that a band can have multiple managers and/or multiple managers over a period of time. In that case you would need a many to many relationship.
management has many bands
band has many management
label has many bands
band has many labels
In that case your orginal idea of using a relationship table is correct. That is home many-to-many relationships are done. However, group_groups could be named better.
Ultimately it will depend on your requirements. For instance if you're storing CD titles then perhaps you would rather attach labels to a particular CD rather than a band.

This does appear to be a conflation of STI (single-table inheritance) and nested sets / tree structures. Nested set/trees are one parent to multiple children:
http://jgeewax.wordpress.com/2006/07/18/hierarchical-data-side-note/
http://www.dbmsmag.com/9603d06.html
http://www.sitepoint.com/article/hierarchical-data-database

I think best of all is to use NestedSet
http://www.doctrine-project.org/documentation/manual/1_0/en/hierarchical-data#nested-set
Just set actAs NestedSet

Yes, you would need a bridge that contained the fields you described. However, I would think your table should be split if it is following the same type of entities as you describe.

(I am assuming there is an id column which can be used for references).
You can add a column called parent_id (allow nulls) and store the id of the parent group in it. Then you can join using sql like: "Select a., b. from group parent join group child on parent.id = child.parent_id".
I do recommend using a separate table for this link because:
1. You cannot support multiple parents with a field. You have to use a separate table.
2. Import/Export/Delete is way more difficult with a field in the table because you may run into key conflicts. For example, if you try to import data, you need to make sure that you first import the parents and then children. With a separate table, you can import all groups and then all relationships without worrying about the actual order of the data.

Related

MS access multiple relationships between two tables

We had an MS Access guru at our company who left for another position. Before she left she gave me a quick introduction on how to create queries from a sql server. I am really struggling with this and as I have no one to turn to at our company I was hoping you guys could help.
Hope you can help!
Thanks!
Well, keep in mind that when you build a query, it DOES NOT necessary mean that a enforced relationship exists here. (it might).
Further more, if you imported the tables, then again its doubtful that relations are defined in Access unless you use the relationships window to "enforce" such relationships.
However, when building a query? We will often join on two fields. When you build a query in the query builder, you are free to "make up" any kind of join you want.
Say I was given two different spreadsheets. One had some people, and another had a list of hotels.
Ok, so say we want to generate a list of all people in the same city as the hotels.
You might join between table "People" and say Hotels with city.
however, WHAT happens if there is more then one state with the same City name?
Well, then just join on City AND State!!!
So you get this:
So I not have some related tables here. I just feel like and want to, and need to join the two tables of data.
As such, we never cared or setup or "had" some relationship defined, but all we care about is creating and building a working query.
So, don't confuse the simple act of building some query with that of having setup a corrrect relatonships between tables.
For a working application? Yes, you most certainly will setup relatonships.
So, if you setup relatonships correctly, then you not be able to say add a customer "invoice" reocrd without FIRST having a customer record. You don't have to do this, but it is a very good idea for a working applicaton.
However, when dealing with imported data? You often may not have an pre-defined relationships.
Now, of course in "most" cases, a query that involves multiple tables will in near all cases "follow" what you defined as relationships in the relationships window but it not necessary a requirement at all.
As noted, when building a working application? Then yes, of course you want to setup the relatonships BEFORE you start adding data.
But for general data processing, and creating queries against say different tables of data you are slicing and dicing and working with?
You are free to cook up and draw lines between the tables in the query builder, and as such, often such quires will have zero to do with the relationships you defined, or in fact even when you don't have any relationships defined at all.
That above People and the list of hotels is a great example. I mean, it rather cool that I simple joined on both City and State, and did not have to write one line of data processing code for my desired results
(a list of people in cities that live in the same city as my hotel list).
So don't confuse what we call "referential integrity" and defined relationships. We define these relationships so it becomes impossible for you the developer to add a customer invoice without first having added the customer. And it also means that you, your code, or even a editing the tables directly will not allow this to occur.
However, when dealing with just reporting, or importing data to work on? Well, then often we will not have any relationships defined, but that sure does not stop us from firing up the query builder and drawing join lines between tables.
Between two given Tables you can have one relationship involving two (of more) fields or two (or more) relationships each involving one field. Both cases are possible and have different implications.
The first case, as the first commenter pointed out, is typically used when you have a compound key in the master Table of the relationship.
The second case is typically used when you have two candidate keys in the master table, each of which is used as a master field in each of the two independent relationships.
In Ms-access the case of two independent relationships may be identified because it implies two table-boxes for the same table in the relationships pane.

Relationship database design - object specific many to many, do I solve with self join table or new table

Being new to relational database design, I am trying to clarify one piece of information to properly design this database. Although I am using Filemaker as the platform, I believe this is a universal question.
Using the logic of ideally having all one to many relationships, and using separate tables or join tables to solve these.
I have a database with multiple products, made by multiple brands, in multiple product categories. I also want this to be as scale-able as possible when it comes to reporting, being able to slice and dice the data in as many ways as possible since the needs of the users are constantly changing.
So when I ask the question "Does each Brand have multiple products" I get a yes, and "Does each product have multiple brands" the answer is no. So this is a one to many relationship, but it also seems that a self-join table might give me everything that I need.
This methodology also seems to go down a rabbit hole for other "product related" information such as product category, each product is tied to one product category, but only one product category is related to a product.
So I see 2 possibilities, make three tables and join them with primary and foreign keys, one for Brand, one for Product Category, and one for Products.
Or the second possibility is to create one table that has the brand and product category and product info all in one table (since they are all product related) and simply do self-joins and other query based tables to give me the future reporting requirements that will be changing over time.
I am looking for input from experiences that might point me in the right direction.
Thanks in advance!
Could you ever want to store additional information about a brand (company URL, phone number, etc.) or about a product category (description, etc.)?
If the answer is yes, you definitely want to use three tables. If you don't, you'll be repeating all that information for every single item that belongs to the same brand or same category.
If the answer is no, there is still an advantage to using three tables - it will prevent typos or other spelling inconsistencies from getting into your database. For example, it would prevent you from writing a brand as "Coca Cola" for some items and as "Coca-Cola" for other items. These inconsistencies get harder and harder to find and correct as your database grows. By having each brand only listed once in it's own table, it will always be written the same way.
The disadvantage of multiple tables is the SQL for your queries is more complicated. There's definitely a tradeoff, but when in doubt, normalize into multiple tables. You'll learn when it's better to de-normalize with more experience.
I am not sure where do you see a room for a self-join here. It seems to me you are saying: I have a table of products; each product has one brand and one (?) category. If that's the case then you need either three tables:
Brands -< Products >- Categories
or - in Filemaker only - you can replace either or both the Brands and the Categories tables with a value list (assuming you won't be renaming brands/categories and at the expense of some reporting capabilities). So really it depends on what type of information you want to get out in the end.
If you truly want your solution to be scalable you need to parse and partition your data now. Otherwise you will be faced with the re-structuring of the solution down the road when the solution grows in size. You will also be faced with parsing and relocating the data to new tables. Since you've also included the SQL and MySQL tags if you plan on connecting Filemaker to an external data source then you will definitely need to up your game structurally.
Building everything in one table is essentially using Filemaker to do Excel work and it won't cut it if you are connecting to SQL, MySQL, etc.
Self join tables are a great tool. However, they should really only be used for calculating small data points and should not be used as pivot points or foundations for your reporting features. It can grow out of control as time goes on and you need to keep your backend clean.
Use summary and sub-summary reporting features to slice product based data.
For retail and general product management solutions, whether it's Filemaker/SQL/or whatever the "Brand" or "Vendor" is it's own table. Then you would have a "Products" table (the match key being the "Brand ID").
The "Product Category" field should be a field in the "Products" table. You can manage the category values by building a standard value list or building a value list based on a "Product Category" table. The second scenario is better for long term administration.

How to set up relational database tables for this many-to-many relationship?

I have a type of data called a chain. Each chain is made up of a specific sequence of another type of data called a step. So a chain is ultimately made up of multiple steps in a specific order. I'm trying to figure out the best way to set this up in MySQL that will allow me to do the following:
Look up all steps in a chain, and get them in the right order
Look up all chains that contain a step
I'm currently considering the following table set up as the appropriate solution:
TABLE chains
id date_created
TABLE steps
id description
TABLE chains_steps (this would be used for joins)
chain_id step_id step_position
In the table chains_steps, the step_position column would be used to order the steps in a chain correctly. It seems unusual for a JOIN table to contain its own distinct piece of data, such as step_position in this case. But maybe it's not unusual at all and I'm just inexperienced/paranoid.
I don't have much experience in all this so I wanted to get some feedback. Are the three tables I suggested the correct way to do this? Are there any viable alternatives and if so, what are the advantages/drawback?
You're doing it right.
Consider a database containing the Employees and Projects tables, and how you'd want to link them in a many-to-many fashion. You'd probably come up with an Assignments table (or Project_Employees in some naming conventions).
At some point you'd decide you want not only to store each project assignment, but you'd also want to store when the assignment started, and when it finished. The natural place to put that is in the assignment itself; it doesn't make sense to store it either with the project or with the employee.
In further designs you might even find it necessary to store further information about the assignment, for example in an employee review process you may wish to store feedback related to their performance in that project, so you'd make the assignment the "one" end of a relationship with a Review table, which would relate back to Assignments with a FK on assignment_id.
So in short, it's perfectly normal to have a junction table that has its own data.
That looks fine, and it's not unusual for the join table to contain a position/rank field.
Look up all steps in a chain, and get them in the right order
SELECT * FROM chains_steps
LEFT JOIN steps ON steps.id = chains_steps.step_id
WHERE chains_steps.chain_id = ?
ORDER BY chains_steps.step_position ASC
Look up all chains that contain a step
SELECT DISTINCT chain_id FROM chains_steps
LEFT JOIN chains ON chains.id = chains_steps.chain_id
I think that the plan you've outlined is the correct approach. Don't worry too much about the presence of step_position on your mapping table. After all the step_position is a bit of data that is directly related to a step in the context of a chain. So the chains_steps table is the right place for it IMHO.
Some things to think about:
Foreign keys - use 'em!
Unique key on the chains_steps table - can a step be present in more than one position in a single chain? What about in different chains?
Good luck!

Database design - table design for modeling a hierarchy

I am designing a laboratory information system (LIS) and am confused on how to design the tables for the different laboratory tests. How should I deal with a table that has an attribute with multiple values and each of the multiple values of that attribute can also have multiple values as well?
Here's some of the data in my LIS design...
HEMATOLOGY <-------- Lab group
**************************************************************
CBC <-------- Sub group 1
RBC <-------- Component
WBC
Hemoglobin
Hematocrit
MCV
MCH
MCHC
Platelet count
Hemoglobin
Hematocrit
WBC differential
Neutrophils
Lymphocytes
Monocytes
Eosinophils
Basophils
Platelet count
Reticulocyte count
ESR
Bleeding time
Clotting time
Pro-time
Peripheral smear
Malarial smear
ABO
RH typing
CLINICAL MICROSCOPY <-------- Lab Group
**************************************************************
Routine urinalysis <-------- Sub group 1
Visual Examination <-------- Sub group 2
Color <-------- Component
Turbidity
Specific Gravity
Chemical Examination
pH
protein
glucose
ketones
RBC
Hbg
bilirubin
specific gravitiy
nitrite for bacteria
urobilinogen
leukocyte esterase
Microscopic Examination
Red Blood Cells (RBCs)
White Blood Cells (WBCs)
Epithelial Cells
Microorganisms (bacteria, trichomonads, yeast)
Trichomonads
Casts
Crystals
Occult Blood
Pregnancy Test
...This hierarchy of data also gets repeated in other lab groupings in my design (e.g. Blood chemistry, Serology, etc)...
Another question is, how am I gonna deal with a component (for example, RBC) which can be a member of one or more lab groups?
I already implemented a solution to my problem by making a separate tables, 1 for lab group, 1 for sub group 1, 1 for sub group 2 and 1 for component. And then created another table to consolidate all of them by placing a foreign key of each in this table...the only trade off is that some of the rows in this table may have null values. Im not satisfied with my design, so I'm hoping someone could give me advise on how to make it right; any help would be greatly appreciated.
Here are a couple options:
If it is just the hierarchy above you are modeling, and there is no other data involved, then you can do it in two tables:
One problem with this is that you do not enforce that, for example, a sub_group must be a child of a lab_group, or that a component must be child of either a sub_group_1 or a sub_group_2, but you could enforce these requirements in your application tier instead.
The plus side of this approach is that the schema is nice and simple. Even if the entities have more data associated with them, it might still be worth modeling the hierarchy like this and have some separate tables for the entities themselves.
If you want to enforce the correct relationships at the data level, then you are going to have to split it out into separate tables. Maybe something like this:
This assumes that each sub_group_1 is only related to a single lab_group. If this is not the case then add a link table between lab_group and sub_group_1. Likewise for the sub_group_1 -> sub_group_2 relationship.
There is a single link table between component and sub_group_1 and sub_group_2. This allows a single component to be related to several sub_group_1 and sub_group_2 entities. The fact it is a single table means that a lot of the sub_group_1_id and sub_group_2_id records will be null (like you mentioned in your question). You could prevent the nulls be having two separate link tables:
sub_group_1_component with a foreign key to sub_group_1 and a foreign key to component
sub_group_2_component with a foreign key to sub_group_2 and a foreign key to component
The reason I didn't put this in the diagram is that for me, having to query two tables rather than one to get all the component -> sub_group relationships is too much of a pain. For the sake of a little denormalisation (allowing a few nulls) it is much easier to query a single table. If you find yourself allowing a lot of nulls (like a single link table for the relationships between all the entities here) then that is probably denormalising too much.
Personally, I would create 3 tables using relationships for the values. It gives you the ability to create limitless arrays of values. Just try to make sure you give great column names, or your head will spin for days. :)
Also, null values aren't a problem look into all the different type of joins

How to store hierarchical information into a database?

I have the following information that should be retrieved by using several dependent select fields on a web form:
Users will be able to add new categories.
Food
- Fruits
- Tropical
- Pineapples
- Pineapples - Brazil
- Pineapples - Hawaii
- Coconuts
- Continental
- Orange
- Fish
....
This data should come from a database.
I realize that creating a table for each category here presented is not a good schema perhaps, so I would to ask, if is there any standard way to deal with this?
I'm also aware of this schema example:
Managing Hierarchical Data in MySQL
Is there any other (perhaps more intuitive way) to store this type of information ?
The link you provided describes the two standard ways for storing this type of information:
Adjacency List
Nested Sets
One issue your question didn't raise is whether all fruits have the same attributes or not.
If all fruits have the same attributes, then the answer that tells you to look at the link you provided and read about adjacency lists and nested sets is correct.
If new fruits can have new attributes, then a user that can add a new fruit can also add a new attribute. This can turn into a mess, real easily. If two users invent the same attribute, but give it a different name, that might be a problem. If two users invent different attributes, but give them the same name, that's another problem.
You might just as well say that, conceptually, each user has their own database, and no meaningful queries can be made that combine data from different users. Problem is, the mission of the database almost always includes, sooner or later, bringing together all the data from the different users.
That's where you face a nearly impossible data management issue.
Kawu gave you the answer.... a recursive relation (the table will be be related to itself) aka Pig's Ear relation.
You example shows a parent with several children, but you didn't say if an item can belong to more that one parent. Can an orange be in 'Tropical' and in 'Citrus'?
Each row has an id and a parent_id with the parent_id pointing to the id of another row.
id=1 name='Fruits' parent_id=0
id=2 name='Citrus' parent_id=1
id=3 name='Bitter Lemon' parent_id=2
id=4 name='Pink Grapefruit' parent_id=2
Here are some examples of schemas using this type of relation to provide unlimited parent-child relations:
Data model for product categories
Data model for organizations and people