I am working on a MySQL-based system to manage data from the processing of food products.
At this point I came across the following specific problem:
I have a table A with some items:
Farmer Quantity
Farmer A 1000 kg
Farmer B 500 kg
Then I have a table B which is an m:n aggregation of data from table A:
Batch Quantity Quality etc.
LI1 200 kg ....
LI2 12000 kg ....
To represent the m:n relation I have a table AB which connects the two:
FK_Farmer FK_Batch
FarmerA LI1
FarmerB LI1
FarmerA LI2
Now the problem: some of the batches in table B are actually made up of other batches, which means they are recursively composed. I am interested to know what the best approach is, in terms of database design, to implement this situation.
Should I include an additional foreign key in table AB referencing back to the batches table? Should I not enforce foreign keys and reference both the farmers and the batch table through the same column (and add a flag to indicate recursion or something)?
Is there any other obvious solution I have overlooked?
Being able to do drill-down queries for all data through direct MySQL would be nice, but is not necessarily required.
The simplest way to represent the data is to add a Parent pointer to the Batch table. The root of a hierarchy would have a null in this field. Any non-root would point to its parent, which might in turn point to another parent, etc, for as many levels as you may have.
Querying such a structure is tricky because older versions of MySQL have no way to process a tree in a single query. Oracle has a proprietary extension in its SQL dialect (CONNECT BY), and standard SQL defines recursive common table expressions (WITH RECURSIVE), which MySQL supports from version 8.0. Without them, to chase the whole tree you have to either write code that loops through queries, or write a query that does multiple joins for some arbitrary maximum number of levels.
On a version without recursive queries I don't know any easier way around it. There I'd plan on chasing the tree with code rather than a single query.
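As a minimal sketch of the parent-pointer design, the snippet below uses Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names (batch, parent_id) and the sample rows are made up for illustration. It chases the tree once with a loop of queries and once with a recursive common table expression (WITH RECURSIVE), which SQLite accepts and which, as far as I know, MySQL accepts from 8.0 on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE batch (
        id        TEXT PRIMARY KEY,
        parent_id TEXT REFERENCES batch(id)  -- NULL marks the root of a hierarchy
    );
    -- hypothetical sample data: LI1 <- LI2 <- LI3, and LI4 on its own
    INSERT INTO batch VALUES ('LI1', NULL), ('LI2', 'LI1'), ('LI3', 'LI2'), ('LI4', NULL);
""")

def descendants_by_loop(conn, root):
    """Chase the tree in application code, one query per level."""
    found, frontier = [], [root]
    while frontier:
        marks = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT id FROM batch WHERE parent_id IN ({marks}) ORDER BY id",
            frontier,
        ).fetchall()
        frontier = [r[0] for r in rows]
        found.extend(frontier)
    return found

def descendants_by_cte(conn, root):
    """Collect the same rows with a single recursive query."""
    rows = conn.execute("""
        WITH RECURSIVE tree(id) AS (
            SELECT id FROM batch WHERE parent_id = ?
            UNION ALL
            SELECT b.id FROM batch b JOIN tree t ON b.parent_id = t.id
        )
        SELECT id FROM tree ORDER BY id
    """, (root,)).fetchall()
    return [r[0] for r in rows]
```

Both functions return the ids below the given batch. The loop version costs one round trip per level, which is the trade-off to weigh against needing recursive query support.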
If a parent batch can have multiple child batches, and a child batch can have multiple parent batches, then you need a new mapping table:
FK_ParentBatch FK_ChildBatch
LI1 LI5
LI1 LI6
LI2 LI5
LI2 LI3
LI3 LI4
Use foreign keys to make sure that the relations are maintained. I don't know whether the database can prevent you from getting into loops, though; you might have to rely on code or stored procedures for that.
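A rough sketch of the mapping table with a loop check done in code, again using Python's sqlite3 module in place of MySQL. The names batch_batch, would_create_loop and add_link are hypothetical. The check rejects a new parent/child link when the parent is already reachable below the child.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE batch (id TEXT PRIMARY KEY);
    CREATE TABLE batch_batch (
        parent_id TEXT NOT NULL REFERENCES batch(id),
        child_id  TEXT NOT NULL REFERENCES batch(id),
        PRIMARY KEY (parent_id, child_id)
    );
    INSERT INTO batch VALUES ('LI1'), ('LI2'), ('LI3'), ('LI4'), ('LI5'), ('LI6');
    INSERT INTO batch_batch VALUES
        ('LI1', 'LI5'), ('LI1', 'LI6'), ('LI2', 'LI5'), ('LI2', 'LI3'), ('LI3', 'LI4');
""")

def would_create_loop(conn, parent, child):
    """True if linking parent -> child would make a batch contain itself."""
    if parent == child:
        return True
    row = conn.execute("""
        WITH RECURSIVE below(id) AS (
            SELECT child_id FROM batch_batch WHERE parent_id = ?
            UNION  -- UNION (not UNION ALL) drops repeats, so the walk always ends
            SELECT bb.child_id FROM batch_batch bb JOIN below ON bb.parent_id = below.id
        )
        SELECT 1 FROM below WHERE id = ? LIMIT 1
    """, (child, parent)).fetchone()
    return row is not None

def add_link(conn, parent, child):
    """Insert the link only after the loop check passes."""
    if would_create_loop(conn, parent, child):
        raise ValueError(f"linking {parent} -> {child} would create a loop")
    conn.execute("INSERT INTO batch_batch VALUES (?, ?)", (parent, child))
```

For example, add_link(conn, 'LI4', 'LI2') is refused because LI2 already contains LI4 through LI3. The check and the insert are separate statements, so under concurrent writers they would need to run in one transaction.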
Suppose I have a table that acts as an inventory of my house: inventory_items, if you will. inventory_items contains everything I own, but only the most general information (i.e. fields that apply to everything I own, like a name and a purchase date).
I then wish to have a separate table, electronics_data, for items that are inventory items but have special information to store (let's say serial_number and wattage), and another, furniture_data, which contains furniture-specific information (number_of_legs, material).
In all instances, items in electronics_data will have a matching item in inventory_items linked by an id field. The same is true of furniture_data.
If I now wish to show a list of my inventory items, but include specific information from the child tables, my first thought is to load the inventory_items rows, find out what type each item is, and load the right information from the right table. I can think of two better ways:
1) Create a foreign key relationship between inventory_items and electronics_data - thus loading all items will get me all of my child data too. But, not all items in inventory_items will have a matching item in electronics_data so does this mean a foreign key can't work?
2) Create a view which loads the extra tables if a matching item exists in them, and load the view in my application. If I have lots of different 'types' of data, will this make my view unnecessarily slow (checking everything) and actually defeat the object of the view in the first place?
These are general questions - particularly 2) I would imagine is very data dependent.
Thanks!
1) Foreign keys will work, since the specialised tables are the child tables, so you need to make sure that each record in the child table has a corresponding record in the overall inventory_items table. The reverse is not necessarily true.
2) The view can left join the child tables on the inventory_items table. If the fields used in the join are indexed in all tables, then the operation is not that resource intensive. The biggest catch could be how you build the view, if you have lots of specialised child tables. But this is probably a wider application design question anyway (if you are looking at your electronic devices, then you probably do not want to see the fields from the furniture items table - in these specialised views I would use inner join, not left join).
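Both points can be tried out in a small sketch. It uses Python's sqlite3 module rather than MySQL, and the view name inventory_overview and the sample rows are invented for the example. The child tables reference inventory_items, not the other way round, and the view left joins them so items without child data still appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed in SQLite to enforce foreign keys
conn.executescript("""
    CREATE TABLE inventory_items (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE electronics_data (
        id INTEGER PRIMARY KEY REFERENCES inventory_items(id),
        serial_number TEXT,
        wattage INTEGER
    );
    CREATE TABLE furniture_data (
        id INTEGER PRIMARY KEY REFERENCES inventory_items(id),
        number_of_legs INTEGER,
        material TEXT
    );
    -- hypothetical view: one row per item, child columns NULL when absent
    CREATE VIEW inventory_overview AS
        SELECT i.id, i.name, e.wattage, f.material
        FROM inventory_items i
        LEFT JOIN electronics_data e ON e.id = i.id
        LEFT JOIN furniture_data  f ON f.id = i.id;
    INSERT INTO inventory_items VALUES (1, 'Toaster'), (2, 'Chair'), (3, 'Rug');
    INSERT INTO electronics_data VALUES (1, 'SN-001', 800);
    INSERT INTO furniture_data VALUES (2, 4, 'oak');
""")

overview = conn.execute("SELECT * FROM inventory_overview ORDER BY id").fetchall()

def insert_orphan(conn):
    """Try to add child data for an item that does not exist; return True on success."""
    try:
        conn.execute("INSERT INTO electronics_data VALUES (99, 'SN-999', 5)")
    except sqlite3.IntegrityError:
        return False
    return True
```

The rug has no child row at all and still shows up in the view, while the orphan insert is rejected by the foreign key.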
It will make your life easier if you join the tables when extracting data. There are many ways to join tables. Since all of your tables have an id column, you can join on it. Note that a plain equijoin across all three tables would only return items that appear in both electronics_data and furniture_data, so use outer joins from inventory_items instead:
SELECT inventory_items.name, electronics_data.wattage, furniture_data.material
FROM inventory_items
LEFT JOIN electronics_data ON electronics_data.id = inventory_items.id
LEFT JOIN furniture_data ON furniture_data.id = inventory_items.id;
With a join like this you can add as many columns as you wish, but qualify each one with the table it comes from, and state in the ON clauses which columns must be equal; otherwise you get every combination of rows rather than the matching ones.
I have posted a fairly detailed response to a similar question here, including how to define the views you mention. Note that the code shown in the view definition is for illustration only. It does not show the most efficient way to write it. Better ways should be fairly straightforward, however.
A word about view performance. Take a view that joins very large tables in such a way that the query
select * from <view>
takes a long time, say 30 minutes. The query
select * from <view> where <criteria>
could take fractions of a second. In most modern DBMSs, the WHERE criteria are merged with the existing query in the view definition to execute the query. The DBMS does not execute the view definition first and then do the filtering. So test view performance with actual queries, not "data dump" queries.
I tried to design a data structure for easy and fast querying (delete, insert and update speed does not really matter for me).
The problem: transitive relations. One entry could be related to others through intermediate entries, and I don't want to save those relations separately for every possibility.
That means: I know that Entry-A is related to Entry-B and that Entry-B is related to Entry-C, so even though I don't know explicitly that Entry-A is related to Entry-C, I want to be able to query it.
What I think the solution is:
Eliminating the transitive part when inserting, deleting or updating.
Entry:
id
representative_id
I would store them as sets, i.e. groups of entries (not the MySQL SET type, but sets in the mathematical sense; sorry if my English is wrong). Every set would have a representative entry, and all of the set elements would be related to the representative element.
A new insert would insert the Entry and set the representative as itself.
If the newly inserted entry should be connected to another, I simply set the representative id of the newly inserted entry to the referred entry's rep.id.
Attach B to A
It doesn't matter if I need to connect it to something that is not a representative entry. The result would be the same, because every entry in the set has the same rep.id.
Attach C to B
Detach B-C: The detached item would have become a representative entry, meaning it would relate to itself.
Detach B-C and attach C to X
Deletion:
If I delete a non-representative entry, it is self-explanatory. But deleting a representative entry is a bit harder: I need to choose a new representative for the set and set every member's rep.id to the new representative's id.
So, deleting A in this:
would result in this:
What do you think about this? Is it a correct approach? Am I missing something? What should I improve?
Edit:
Querying:
So, if I want to query every entry that is related to a certain entry whose id I know:
SELECT *
FROM entries a
LEFT JOIN entries b ON (a.rep_id = b.rep_id)
WHERE a.id = :id
SELECT * FROM AlkReferencia
WHERE rep_id=(SELECT rep_id FROM AlkReferencia
WHERE id=:id);
About the application that requires this:
Basically, I am storing vehicle part numbers (references). One manufacturer can make multiple parts that can replace one another, and another manufacturer can make parts that replace other manufacturers' parts.
Reference: one manufacturer's OEM number for a certain product.
Cross-reference: a manufacturer can make products whose purpose is to replace another product from another manufacturer.
I must connect these references in such a way that, when a customer searches for a number (no matter what kind of number he has), I can list an exact result and the alternative products.
To use the example above (last picture): B, D and E are different products we may have in store. Each one has a manufacturer and a string name/reference (I called it a number before, but it can be almost any character string). If I search for B's reference number, I should return B as an exact result and D and E as alternatives.
So far so good. BUT I need to upload these reference numbers. I can't just migrate them from an all-in-one database. Most of the time, when I upload references I got from a manufacturer (usually entered by hand, though I can use catalogs too), I only get a list in which the manufacturer says which other reference numbers point to his numbers.
Example.:
For the filter manufacturer Asas, the "AS 1" filter has these cross-references (meaning it replaces these):
GOLDEN SUPER --> 1
ALFA ROMEO --> 101000603000
ALFA ROMEO --> 105000603007
ALFA ROMEO --> 1050006040
RENAULT TRUCKS (RVI) --> 122577600
RENAULT TRUCKS (RVI) --> 1225961
ALFA ROMEO --> 131559401
FRAD --> 19.36.03/10
LANDINI --> 1896000
MASSEY FERGUSON --> 1851815M1
...
It would take ages to write all of the AS 1 references down, because there are many (~1500?). And that is ONE filter. There are more than 4000 filters and I need to store their references (and these are only the filters). I think you can see that I can't connect everything explicitly, but I must know that Alfa Romeo 101000603000 and 105000603007 are the same, even when I only know (AS 1 --> Alfa Romeo 101000603000) and (AS 1 --> Alfa Romeo 105000603007).
That is why I want to organize them as sets. Each set member would connect to only one other set member, the representative member, through rep_id. When someone (an admin uploading these references, for example) wants to attach a new reference to a set member, I simply run INSERT INTO References (rep_id, attached_to_originally_id, refnumber) VALUES ([rep_id of the entry I am attaching to], [id of the entry I am attaching to], "16548752324551..");
Another thing: I don't need to worry much about insert, delete and update speed, because these are admin tasks in our system and will be done rarely.
It is not clear what you are trying to do, and it is not clear that you understand how to think & design relationally. But you seem to want rows satisfying "[id] is a member of the set named by member [rep_id]".
Stop thinking in terms of representations and pointers. Just find fill-in-the-(named-)blank statements ("predicates") that say what you know about your application situations and that you can combine to ask about your application situations. Every statement gets a table ("relation"). The columns of the table are the names of the blanks. The rows of the table are the ones that make its statement true. A query has a statement built from its table's statements. The rows of its result are the ones that make its statement true. (When a query has JOIN of table names its statement ANDs the tables' statements. UNION ORs them. EXCEPT puts in AND NOT. WHERE ANDs a condition. Dropping a column by SELECT corresponds to logical EXISTS.)
Maybe your application situations are a bunch of cells with values and pointers. But I suspect that your cells and pointers and connections and attaching and inserting are just your way of explaining & justifying your table design. Your application seems to have something to do with sets or partitions. If you really are trying to represent relations then you should understand that a relational table represents (is) a relation. Regardless, you should determine what your table statements are. If you want design help or criticism tell us more about your application situations, not about representation of them. All relational representation is by tables of rows satisfying statements.
Do you really need to name sets by representative elements? If we don't care what the name is then we typically use a "surrogate" name that is chosen by the DBMS, typically via some integer auto-increment facility. A benefit of using such a membership-independent name for a set is that we don't have to rename, in particular by choosing an element.
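To make the surrogate-name idea concrete, here is a small sketch using Python's sqlite3 module. Everything in it is an assumption for illustration: the table name reference, the column set_id, and the helper functions. A real system would more likely take set ids from an auto-increment column on a separate sets table than from MAX(set_id) + 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reference (
        id        INTEGER PRIMARY KEY,
        refnumber TEXT NOT NULL,
        set_id    INTEGER NOT NULL  -- surrogate name of the set, not a member's id
    );
""")

def add_reference(conn, refnumber):
    """Insert a reference into a brand-new set of its own and return its id."""
    next_set = conn.execute(
        "SELECT COALESCE(MAX(set_id), 0) + 1 FROM reference"
    ).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO reference (refnumber, set_id) VALUES (?, ?)",
        (refnumber, next_set),
    )
    return cur.lastrowid

def set_of(conn, ref_id):
    """Return the set_id that the given reference belongs to."""
    return conn.execute(
        "SELECT set_id FROM reference WHERE id = ?", (ref_id,)
    ).fetchone()[0]

def attach(conn, id_a, id_b):
    """Record that a and b are interchangeable by merging b's whole set into a's."""
    conn.execute(
        "UPDATE reference SET set_id = ? WHERE set_id = ?",
        (set_of(conn, id_a), set_of(conn, id_b)),
    )

def alternatives(conn, ref_id):
    """List every other reference number in the same set."""
    rows = conn.execute("""
        SELECT refnumber FROM reference
        WHERE set_id = (SELECT set_id FROM reference WHERE id = ?) AND id <> ?
        ORDER BY refnumber
    """, (ref_id, ref_id)).fetchall()
    return [r[0] for r in rows]
```

Because no member names the set, deleting any reference is a plain DELETE with no renaming step. Attaching merges two whole sets, which also covers the case where both sides already have other members.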
I am designing a laboratory information system (LIS) and am confused on how to design the tables for the different laboratory tests. How should I deal with a table that has an attribute with multiple values and each of the multiple values of that attribute can also have multiple values as well?
Here's some of the data in my LIS design...
HEMATOLOGY <-------- Lab group
**************************************************************
  CBC <-------- Sub group 1
    RBC <-------- Component
    WBC
    Hemoglobin
    Hematocrit
    MCV
    MCH
    MCHC
    Platelet count
  Hemoglobin
  Hematocrit
  WBC differential
    Neutrophils
    Lymphocytes
    Monocytes
    Eosinophils
    Basophils
  Platelet count
  Reticulocyte count
  ESR
  Bleeding time
  Clotting time
  Pro-time
  Peripheral smear
  Malarial smear
  ABO
  RH typing
CLINICAL MICROSCOPY <-------- Lab group
**************************************************************
  Routine urinalysis <-------- Sub group 1
    Visual Examination <-------- Sub group 2
      Color <-------- Component
      Turbidity
      Specific Gravity
    Chemical Examination
      pH
      protein
      glucose
      ketones
      RBC
      Hbg
      bilirubin
      specific gravity
      nitrite for bacteria
      urobilinogen
      leukocyte esterase
    Microscopic Examination
      Red Blood Cells (RBCs)
      White Blood Cells (WBCs)
      Epithelial Cells
      Microorganisms (bacteria, trichomonads, yeast)
      Trichomonads
      Casts
      Crystals
  Occult Blood
  Pregnancy Test
...This hierarchy of data also gets repeated in other lab groupings in my design (e.g. Blood chemistry, Serology, etc)...
Another question is: how should I deal with a component (for example, RBC) which can be a member of one or more lab groups?
I have already implemented a solution by making separate tables: one for lab group, one for sub group 1, one for sub group 2 and one for component. I then created another table to consolidate them by placing a foreign key to each in it. The only trade-off is that some of the rows in this table may have null values. I'm not satisfied with my design, so I'm hoping someone can advise me on how to make it right; any help would be greatly appreciated.
Here are a couple options:
If it is just the hierarchy above you are modeling, and there is no other data involved, then you can do it in two tables:
One problem with this is that you do not enforce that, for example, a sub_group must be a child of a lab_group, or that a component must be a child of either a sub_group_1 or a sub_group_2, but you could enforce these requirements in your application tier instead.
The plus side of this approach is that the schema is nice and simple. Even if the entities have more data associated with them, it might still be worth modeling the hierarchy like this and have some separate tables for the entities themselves.
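One way to read the two-table option, offered as an assumption rather than the answerer's exact schema, is a table of named nodes plus a table of parent/child links. The sketch uses Python's sqlite3 module; the names entity, entity_link, kind and lab_groups_of are invented for the example, and the rows come from the hierarchy in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        kind TEXT NOT NULL  -- 'lab_group', 'sub_group_1', 'sub_group_2' or 'component'
    );
    CREATE TABLE entity_link (
        parent_id INTEGER NOT NULL REFERENCES entity(id),
        child_id  INTEGER NOT NULL REFERENCES entity(id),
        PRIMARY KEY (parent_id, child_id)
    );
    INSERT INTO entity VALUES
        (1, 'HEMATOLOGY', 'lab_group'),
        (2, 'CBC', 'sub_group_1'),
        (3, 'CLINICAL MICROSCOPY', 'lab_group'),
        (4, 'Routine urinalysis', 'sub_group_1'),
        (5, 'Chemical Examination', 'sub_group_2'),
        (6, 'RBC', 'component');
    -- RBC hangs under both CBC and Chemical Examination
    INSERT INTO entity_link VALUES (1, 2), (2, 6), (3, 4), (4, 5), (5, 6);
""")

def lab_groups_of(conn, name):
    """Walk upwards from an entity and return the lab groups it belongs to."""
    rows = conn.execute("""
        WITH RECURSIVE up(id) AS (
            SELECT id FROM entity WHERE name = ?
            UNION
            SELECT l.parent_id FROM entity_link l JOIN up ON l.child_id = up.id
        )
        SELECT e.name FROM entity e JOIN up ON up.id = e.id
        WHERE e.kind = 'lab_group'
        ORDER BY e.name
    """, (name,)).fetchall()
    return [r[0] for r in rows]
```

Nothing here stops a component from being linked directly under a lab group, which is the enforcement gap described above. In exchange, a component shared by several groups needs only one extra link row.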
If you want to enforce the correct relationships at the data level, then you are going to have to split it out into separate tables. Maybe something like this:
This assumes that each sub_group_1 is only related to a single lab_group. If this is not the case then add a link table between lab_group and sub_group_1. Likewise for the sub_group_1 -> sub_group_2 relationship.
There is a single link table between component and sub_group_1 and sub_group_2. This allows a single component to be related to several sub_group_1 and sub_group_2 entities. The fact that it is a single table means that a lot of the sub_group_1_id and sub_group_2_id values will be null (as you mentioned in your question). You could prevent the nulls by having two separate link tables:
sub_group_1_component with a foreign key to sub_group_1 and a foreign key to component
sub_group_2_component with a foreign key to sub_group_2 and a foreign key to component
The reason I didn't put this in the diagram is that for me, having to query two tables rather than one to get all the component -> sub_group relationships is too much of a pain. For the sake of a little denormalisation (allowing a few nulls) it is much easier to query a single table. If you find yourself allowing a lot of nulls (like a single link table for the relationships between all the entities here) then that is probably denormalising too much.
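A sketch of the separate-tables variant with the single link table, using Python's sqlite3 module. The link table name component_link, the CHECK constraint and the helper sub_groups_of are assumptions added for illustration; the other table and column names follow the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed in SQLite to enforce foreign keys
conn.executescript("""
    CREATE TABLE lab_group   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE sub_group_1 (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                              lab_group_id INTEGER NOT NULL REFERENCES lab_group(id));
    CREATE TABLE sub_group_2 (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                              sub_group_1_id INTEGER NOT NULL REFERENCES sub_group_1(id));
    CREATE TABLE component   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE component_link (
        component_id   INTEGER NOT NULL REFERENCES component(id),
        sub_group_1_id INTEGER REFERENCES sub_group_1(id),
        sub_group_2_id INTEGER REFERENCES sub_group_2(id),
        -- assumption: each link row points at exactly one of the two levels
        CHECK ((sub_group_1_id IS NULL) <> (sub_group_2_id IS NULL))
    );
    INSERT INTO lab_group VALUES (1, 'HEMATOLOGY'), (2, 'CLINICAL MICROSCOPY');
    INSERT INTO sub_group_1 VALUES (1, 'CBC', 1), (2, 'Routine urinalysis', 2);
    INSERT INTO sub_group_2 VALUES (1, 'Chemical Examination', 2);
    INSERT INTO component VALUES (1, 'RBC');
    INSERT INTO component_link VALUES (1, 1, NULL), (1, NULL, 1);
""")

def sub_groups_of(conn, component_id):
    """One query over the single link table returns parents from both levels."""
    rows = conn.execute("""
        SELECT COALESCE(s1.name, s2.name) AS sub_group
        FROM component_link l
        LEFT JOIN sub_group_1 s1 ON s1.id = l.sub_group_1_id
        LEFT JOIN sub_group_2 s2 ON s2.id = l.sub_group_2_id
        WHERE l.component_id = ?
        ORDER BY sub_group
    """, (component_id,)).fetchall()
    return [r[0] for r in rows]
```

The CHECK keeps the nullable pair honest: a row with both ids, or with neither, is rejected.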
Personally, I would create 3 tables using relationships for the values. It gives you the ability to create limitless arrays of values. Just try to make sure you give great column names, or your head will spin for days. :)
Also, null values aren't a problem; look into the different types of joins.
I have a site where some pages (we call them gateway pages) are based loosely on certain departments in the organization. Each department has classes associated with it. Unfortunately some of my pages are not associated with a specific department, but do display information about several classes from a department so I can't just query the database strictly on department alone.
Would it be smarter to create a table called gateway_classes with a fk from the gateway table in each class or form a query to somehow filter out exactly what I need from my existing tables using an array of classes to be pulled during the query?
Here are my tables:
departments_classes | classes_vendors | departments | vendors | classes | products | gateway
Any guidance is greatly appreciated.
More Info: There are roughly 350 classes and 18 departments and 12 gateway pages...
Your indexing table idea sounds like it'd work just fine. The only downside to that is that you've got to maintain it separately, and you want to make sure that the data you hold in that table isn't being duplicated in any of your existing tables.
If you don't want to maintain that data differently than you're currently doing so, you can use CF's arrays (or structs) to hold that correlation data (which you'd have to pull from the db in a separate query) and then loop over it as you construct the query that pulls the classes for a given page.
Either way would work okay, it's more a matter of how you prefer to do it, and what you think would be easiest to build, test, and maintain.
One thing about efficiency - make sure you not only link your tables via Foreign Keys (which helps to maintain data integrity), but also put in (nonclustered) indices, which helps the efficiency of the joins and lookups your queries will be doing.
I've seen dramatic speed improvements in my queries (CFQUERYs operating against MS SQL) with the simple act of putting in indices.
In MS SQL, you do so like this:
CREATE NONCLUSTERED INDEX yourIndexName ON yourTableName(yourFieldName)
I hope this helps!
Your problem sounds similar to a common scenario for determining user rights. A User may belong to some Group that has Rights associated with it or the User may be assigned Rights individually. In your case, the User is the Gateway, the Group is the Department, and the Rights are the Classes. A Gateway can then be linked to any number of Departments and/or Classes.
Using this model, you just need to add the gateway_classes table as you describe along with a gateway_departments table.
You could then use UNION to merge the "gateway classes" query with the "gateway departments" query (or perhaps something more elegant), but I think this schema will do what you need without introducing any redundant information.
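The UNION could look roughly like the sketch below, run through Python's sqlite3 module. The table names come from the question and this answer, but the column names (gateway_id, class_id, department_id), the sample ids and the function classes_for_gateway are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments_classes (department_id INTEGER, class_id INTEGER);
    CREATE TABLE gateway_departments (gateway_id INTEGER, department_id INTEGER);
    CREATE TABLE gateway_classes     (gateway_id INTEGER, class_id INTEGER);
    INSERT INTO departments_classes VALUES (10, 100), (10, 101), (20, 200), (20, 201);
    INSERT INTO gateway_departments VALUES (1, 10);         -- gateway 1 shows all of department 10
    INSERT INTO gateway_classes VALUES (1, 200), (1, 100);  -- plus two hand-picked classes
""")

def classes_for_gateway(conn, gateway_id):
    """Classes linked directly to the gateway, plus those of its linked departments."""
    rows = conn.execute("""
        SELECT class_id FROM gateway_classes WHERE gateway_id = ?
        UNION  -- UNION removes the duplicate when a class arrives by both routes
        SELECT dc.class_id
        FROM gateway_departments gd
        JOIN departments_classes dc ON dc.department_id = gd.department_id
        WHERE gd.gateway_id = ?
        ORDER BY 1
    """, (gateway_id, gateway_id)).fetchall()
    return [r[0] for r in rows]
```

Class 100 is reachable both directly and through department 10, yet it appears once in the result.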
I'm trying to define a one-to-many relationship in a single table. For example, let's say I have a Groups table with these entries:
Group:
  Group_1:
    name: Atlantic Records
  Group_2:
    name: Capital Records
  Group_3:
    name: Gnarls Barkley
  Group_4:
    name: Death Cab For Cutie
  Group_5:
    name: Coldplay
  Group_6:
    name: Management Company
The group Coldplay could be a child of the group Capital Records and a child of the group Management Company and Gnarls Barkley could only be a child of Atlantic Records.
What is the best way to represent this relationship? I am using PHP and MySQL. I am also using PHP-Doctrine as my ORM, if that helps.
I was thinking that I would need to create a linking table called group_groups with two columns, owner_id and group_id. However, I'm not sure if that is the best way to do this.
Any insight would be appreciated. Let me know if I have explained my problem well enough.
There are a number of possible issues with this approach, but with a minimal understanding of the requirements, here goes:
There appear to be really three 'entities' here: Artist/Band, Label/Recording Co. and Management Co.
Artists/Bands can have a Label/Recording Co.
Artists/Bands can have a Management Co.
Label/Recording Co can have multiple Artists/Bands
Management Co can have multiple Artists/Bands
So there are one-to-many relationships between Recording Co and Artists and between Management Co and Artists.
Record each entity only once, in its own table, with a unique ID.
Put the key of the "one" in each instance of the "many" - in this case, Artist/Band would have both a Recording Co ID and a Management Co ID
Then your query will ultimately join Artist, Recording Co and Management Co.
With this structure, you don't need intersection tables, there is a clear separation of "entities" and the query is relatively simple.
A couple of options:
Easiest: If each group can only have one parent, then you just need a "ParentID" field in the main table.
If relationships can be more complex than that, then yes, you'd need some sort of linking table. Maybe even a "relationship type" column to define what kind of relationship between the two groups.
In this particular instance, you would be wise to follow Ken G's advice, since it does indeed appear that you are modeling three separate entities in one table.
In general, it is possible that this could come up -- If you had a "person" table and were modeling who everybody's friends were, for a contrived example.
In this case, you would indeed have a "linking" or associative or marriage table to manage those relationships.
I agree with Ken G and JohnMcG that you should separate Management and Labels. However they may be forgetting that a band can have multiple managers and/or multiple managers over a period of time. In that case you would need a many to many relationship.
management has many bands
band has many management
label has many bands
band has many labels
In that case your original idea of using a relationship table is correct. That is how many-to-many relationships are done. However, group_groups could be named better.
Ultimately it will depend on your requirements. For instance if you're storing CD titles then perhaps you would rather attach labels to a particular CD rather than a band.
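For reference, a relationship table of the kind discussed here might be sketched as follows, using Python's sqlite3 module. The table names grp and group_owner and the function owners_of are hypothetical; the plain names group and groups are avoided because both are reserved words in recent MySQL versions. The rows mirror the example in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grp (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE group_owner (  -- hypothetical, more descriptive name for group_groups
        owner_id INTEGER NOT NULL REFERENCES grp(id),
        group_id INTEGER NOT NULL REFERENCES grp(id),
        PRIMARY KEY (owner_id, group_id)
    );
    INSERT INTO grp VALUES
        (1, 'Atlantic Records'), (2, 'Capital Records'), (3, 'Gnarls Barkley'),
        (4, 'Death Cab For Cutie'), (5, 'Coldplay'), (6, 'Management Company');
    -- Coldplay has two owners, Gnarls Barkley has one
    INSERT INTO group_owner VALUES (2, 5), (6, 5), (1, 3);
""")

def owners_of(conn, name):
    """Return the names of all groups that own the named group."""
    rows = conn.execute("""
        SELECT o.name
        FROM grp g
        JOIN group_owner l ON l.group_id = g.id
        JOIN grp o ON o.id = l.owner_id
        WHERE g.name = ?
        ORDER BY o.name
    """, (name,)).fetchall()
    return [r[0] for r in rows]
```

The composite primary key stops the same owner/group pair from being recorded twice.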
This does appear to be a conflation of STI (single-table inheritance) and nested sets / tree structures. Nested set/trees are one parent to multiple children:
http://jgeewax.wordpress.com/2006/07/18/hierarchical-data-side-note/
http://www.dbmsmag.com/9603d06.html
http://www.sitepoint.com/article/hierarchical-data-database
I think best of all is to use NestedSet
http://www.doctrine-project.org/documentation/manual/1_0/en/hierarchical-data#nested-set
Just set actAs NestedSet
Yes, you would need a bridge that contained the fields you described. However, I would think your table should be split if it is following the same type of entities as you describe.
(I am assuming there is an id column which can be used for references).
You can add a column called parent_id (allow nulls) and store the id of the parent group in it. Then you can join using SQL like: "SELECT parent.*, child.* FROM `group` parent JOIN `group` child ON parent.id = child.parent_id" (group is a reserved word in MySQL, hence the backticks).
I do recommend using a separate table for this link because:
1. You cannot support multiple parents with a field. You have to use a separate table.
2. Import/Export/Delete is way more difficult with a field in the table because you may run into key conflicts. For example, if you try to import data, you need to make sure that you first import the parents and then children. With a separate table, you can import all groups and then all relationships without worrying about the actual order of the data.
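The import-order point can be reproduced in a few lines. This sketch uses Python's sqlite3 module with foreign keys switched on; the table name grp and the helpers make_db and import_rows are made up for the example.

```python
import sqlite3

def make_db():
    """Create an empty database whose group table carries a parent_id column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed in SQLite to enforce foreign keys
    conn.executescript("""
        CREATE TABLE grp (
            id        INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,
            parent_id INTEGER REFERENCES grp(id)
        );
    """)
    return conn

def import_rows(conn, rows):
    """Insert rows in the given order; return False if a key conflict stops the import."""
    try:
        for row in rows:
            conn.execute("INSERT INTO grp VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        return False
    return True

parent_first = [(2, 'Capital Records', None), (5, 'Coldplay', 2)]
child_first = list(reversed(parent_first))
```

Importing parent_first succeeds, while child_first fails on its first row because group 2 does not exist yet. With the links in a separate table, all groups can be loaded first and the relationships afterwards, in any order.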