What table structure to use for nested data? - mysql

Okay, so I'm working on a project with hierarchical data that I'm using for a book-writing app like so:
top-level parent (Act) - contains Act name, position (first Act), description, and text (intro text)
mid-level parent / child (Chapter) - contains Chapter name, position (first Chapter), description, and text (intro text)
bottom-level (Section) - contains Section name, position (first Section), description, and text (actual content text)
Now, I want to have a variable number of levels (for example include sub-sections that will have the full text), but I'm not entirely sure how to create a table like this efficiently. My initial thought was to have them connected by parentId with top-level having a null for parentId.
For example, if I want to call up a top-level parent in the first position, it's not a big deal. Right now, I can search for nulled parent fields and for a position of 1.
To call up a "chapter" (mid-level), I do the same thing but get the id of the result and use it as a parentid.
The problem is that for a section, I'd need several sub-queries to get to the final result. If I want to have a variable number of levels, I'd need numerous sub-queries, which will be a performance hit.
I saw a similar question but did not really understand the answer or how I could use it.
I already considered some kind of taxonomy table or carrying over all parent ids into the children (i.e. a section will have both chapter and act ids listed under parentid) but I'm not set on it. Feedbooks.com uses a similar hierarchy when submitting a book but they don't store data in a database, they just take an input and convert it to an output (a pdf, epub or whatever).
Oh and, I plan on building this in MySQL.
Ideas?
EDIT
An easier way of imagining this scenario is with a family. Let's say you have 3 grandfathers (Bill, Bernard, Boe), who have 5 children between them (John, Joey, Josh, Jeremy, Jackson), who in turn each have 2 children of their own (examples: Donald, Duey, Donnie). And let's say the database does not store THAT relationship but rather the relationship of which child is staying where; we don't care about the biology here.
So let's say Donnie started out living with John who lives with Bill. Donnie is John's first child and John is Bill's second child. How do you query Bill's first grandchild of the second son?
Let's say they move around and now Donald is staying with John instead. Donald is John's second child (the first child position was filled with someone else) and John is still Bill's second child. How do you query Bill's second grandchild of the second son?
What if John moves with all his children to Boe's house. Boe is the second grandfather. How do you query this information now? How would you store this type of information?
What if you throw great-grandchildren into the mix now?

This is a good use case for closure tables, as in the answer you are referencing. You may want to take a look here and here.
Quoting from the first link:
The Closure Table is a design for representing trees in a relational
database by storing all the paths between tree nodes.
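A minimal sketch of how that could look for the Act / Chapter / Section case. The table names `node` and `node_path`, the `depth` column, and the sample rows are all made up for illustration, and Python's built-in sqlite3 stands in for MySQL here; the SQL is plain enough that it should carry over with little change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),  -- direct parent, NULL for an Act
    position  INTEGER NOT NULL,             -- order among siblings
    name      TEXT NOT NULL
);
-- One row per (ancestor, descendant) pair, including each node paired with itself.
CREATE TABLE node_path (
    ancestor   INTEGER NOT NULL REFERENCES node(id),
    descendant INTEGER NOT NULL REFERENCES node(id),
    depth      INTEGER NOT NULL,            -- 0 = self, 1 = child, 2 = grandchild
    PRIMARY KEY (ancestor, descendant)
);
""")

def add_node(node_id, parent_id, position, name):
    conn.execute("INSERT INTO node VALUES (?, ?, ?, ?)",
                 (node_id, parent_id, position, name))
    # Every node is its own ancestor at depth 0.
    conn.execute("INSERT INTO node_path VALUES (?, ?, 0)", (node_id, node_id))
    if parent_id is not None:
        # Copy every path that ends at the parent and extend it by one step.
        conn.execute("""
            INSERT INTO node_path (ancestor, descendant, depth)
            SELECT ancestor, ?, depth + 1 FROM node_path WHERE descendant = ?
        """, (node_id, parent_id))

def descendants(ancestor_id):
    """Whole subtree below a node, any number of levels, in one query."""
    return conn.execute("""
        SELECT n.name, p.depth
        FROM node_path AS p JOIN node AS n ON n.id = p.descendant
        WHERE p.ancestor = ? AND p.depth > 0
        ORDER BY p.depth, n.id
    """, (ancestor_id,)).fetchall()

def ancestors(node_id):
    """Chain of parents above a node, top-most first."""
    rows = conn.execute("""
        SELECT n.name
        FROM node_path AS p JOIN node AS n ON n.id = p.ancestor
        WHERE p.descendant = ? AND p.depth > 0
        ORDER BY p.depth DESC
    """, (node_id,)).fetchall()
    return [r[0] for r in rows]

add_node(1, None, 1, "Act I")
add_node(2, None, 2, "Act II")
add_node(3, 1, 1, "Chapter 1")
add_node(4, 1, 2, "Chapter 2")
add_node(5, 4, 1, "Section 2.1")
add_node(6, 4, 2, "Section 2.2")

print(descendants(1))
print(ancestors(5))
```

The cost is on the write side: moving a chapter to another act means deleting and re-inserting the paths for its whole subtree, which this sketch does not cover.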

Related

Preserve data integrity in a database structure with two paths of association

I have this situation that is as simple as it is annoying.
The requirements are
Every item must have an associated category.
Every item MAY be included in a set.
Sets must be composed of items of the same category.
There may be several sets of the same category.
The desired logic procedure to insert new data is as following:
Categories are inserted.
Items are inserted. For each new item, a category is assigned.
Sets of items of the same category are created.
I'd like to get a design where data integrity between tables is ensured.
I have come up with the following design, but I can't figure out how to maintain data integrity.
If the relationship highlighted in yellow is not taken into account, everything is very simple and data integrity is forced by design: an item acquires a category only when it is assigned to a set and the category is given by the set itself. However, it would not be possible to have items not associated with a set but linked to a category and this is annoying.
I want to avoid using special "bridging sets" to assign a category to an item since it would feel hacky and there is no way to distinguish between real sets and special ones.
So I introduced the relationship in yellow. But now you can create sets of objects of different categories!
How can I avoid this integrity problem using only plain constraints (index, uniques, FK) in MySQL?
I would also like to avoid triggers, as they seem a fragile and not very reliable way to solve this problem...
I've read a similar question, How to preserve data integrity in circular reference database structure?, but I cannot understand how to apply the solution to my case...
Interesting scenario. I don't see a slam-dunk 'best' approach. One consideration here is: what proportion of items are in sets vs attached only to categories?
What you don't want is two fields on items. Because, as you say, there's going to be data anomalies: an item's direct category being different to the category it inherits via its set.
Ideally you'd make a single field on items that is an Algebraic Data Type aka Tagged Union, with a tag saying its payload was a category vs a set. But SQL doesn't support ADTs. So any SQL approach would have to be a bit hacky.
Then I suggest the compromise is to make every item a member of a set, from which it inherits its category. Then data access is consistent: always JOIN items-sets-categories.
To support that, create dummy sets whose only purpose is to link to a category.
To address "there is no way to distinguish between real sets and special ones": put an extra field/indicator on sets: this is a 'real' set vs this is a link-to-category set. (Or a hack: make the set-description as "Category: <category-name>".)
Addit: BTW your "desired logic procedure to insert new data" is just wrong: you must insert sets (Step 3) before items (Step 2).
I think I might have found a solution by looking at the answer from Roger Wolf to a similar situation here:
Ensuring relationship integrity in a database modelling sets and subsets
Essentially, in the items table, I've changed the set_id FK to a composite FK that references both set.id and set.category_id from, respectively, the items.set_id and items.category_id columns.
In this way there is an overlap of the two FKs on items table.
So for each row in items table, once a category_id is chosen, the FK referring to the sets table is forced to point to a set of the same category.
If this condition is not respected, an exception is thrown.
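A small sketch of that overlapping-FK layout, to make the behaviour concrete. The column names follow the description above, but the exact DDL, the sample rows, and the helper `try_insert` are my assumptions, and it runs on Python's built-in sqlite3 rather than MySQL. As far as I know InnoDB behaves the same way for the part that matters here: the referenced column pair needs an index, and a row whose set_id is NULL is not checked against the sets table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sets (
    id          INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES categories(id),
    UNIQUE (id, category_id)               -- target of the composite FK
);
CREATE TABLE items (
    id          INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES categories(id),
    set_id      INTEGER,                   -- NULL = item belongs to no set
    FOREIGN KEY (set_id, category_id) REFERENCES sets (id, category_id)
);
INSERT INTO categories VALUES (1, 'tools'), (2, 'toys');
INSERT INTO sets VALUES (10, 1);           -- set 10 is a set of tools
""")

def try_insert(item_id, category_id, set_id):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO items VALUES (?, ?, ?)",
                         (item_id, category_id, set_id))
        return True
    except sqlite3.IntegrityError:
        return False

ok_no_set = try_insert(1, 2, None)  # a toy that is in no set
ok_same = try_insert(2, 1, 10)      # a tool in the tools set
ok_mixed = try_insert(3, 2, 10)     # a toy in the tools set
print(ok_no_set, ok_same, ok_mixed)
```

Only the last insert is rejected, because no row in sets has the pair (10, 2).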
Now, the original answer came with advice against using this approach.
I am uncertain whether this is a good idea or not.
It certainly works, and I think it is a fairly elegant solution compared to one that uses triggers for such a simple piece of a more complex design.
The same solution may be more difficult to understand and maintain if applied heavily across a large set of tables.
Edit:
As AntC pointed out in the comments below, this technique, although working, can give insidious problems e.g. if you want to change the category_id for a set.
In that case you would have to update the category_id of each item linked to that set.
That means wrapping the updates in a transaction (START TRANSACTION ... COMMIT).
So ultimately it's probably not worth it and it's better to investigate the requirements further in order to find a better schema.

How to organize comic book location information in a database?

I'm working on a comic book database project, and I need to be able to include the various locations within a particular comic issue. There are a couple issues I have to work with:
Locations are more often than not inside other locations (the "Daily
Bugle building" is on "The corner of 39th street and 2nd Avenue" is
in "New York City" is in "New York", etc.)
While the hierarchy of locations is pretty standard
(Universe->Dimension->Galaxy->System->Planet->Continent->Country->State->City->Street->Building->Room),
not all the parent locations are necessarily known for every location
(a comic might involve a named building in an unnamed country in
Africa for instance).
There are a few locations that don't fit into that nice hierarchy but
branch off at some point (for instance, "The Savage Land" is a giant
jungle in Antarctica, so while its parent is a Continent, it is not a
country).
My main goal is to be able to run a search for any location and get all issues that have that location or any locations within that location. A secondary goal is for the administration side of the application to autocomplete full locations (i.e. I type in a new building for an issue and specify that it is in New York City, and it pulls all "New York City" instances -- yes, there is more than one :P -- in the database and lets me choose the one in Earth-616 or the one in Earth-1610, or I can just add a new New York City under different parent locations). All that front-end stuff I can do and figure out when the time comes, I'm just unsure of the database setup at this point.
Any help would be appreciated!
Update:
After a lot of brainstorming with a couple peers, I think I have come up with a solution that is a bit simpler than the nested model that has been suggested.
The location table would look like this:
ID
Name
Type (enum list of the previously mentioned categories, including an
'other' option)
Uni_ID (ID of the parent universe, null if not applicable)
Dim_ID (ID of the parent Dimension, null if not applicable)
Gal_ID (ID of the parent Galaxy, null if not applicable)
...and so on through all the categories...
Bui_ID (ID of the parent Building, null if not applicable)
So while there are a lot of fields, searching and autocomplete work really easily. All the parents of any given location are right there in the row, all the children of any location can be found with a single query, and as soon as a type is defined for a new location, autocomplete would work easily. At this point, I'm leaning towards this approach instead of the nested model, unless anyone can point out any problems with this setup that I haven't seen.
For hierarchical data, I always prefer using a nested set model to a parent->child (adjacency) model. Look here for a good explanation and example queries. It's a more complicated data model, but it makes querying and searching the data much easier.
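To give an idea of what the "issues in this location or anything inside it" search looks like with nested sets, here is a rough sketch. The `lft` / `rgt` numbers, the `issue_locations` table, and the issue names are invented for the example, and it uses Python's built-in sqlite3 in place of whatever database the project ends up on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each location spans the interval (lft, rgt); a child's interval lies inside its parent's.
CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER);
INSERT INTO locations VALUES
  (1, 'Earth-616',            1, 10),
  (2, 'New York',             2,  7),
  (3, 'New York City',        3,  6),
  (4, 'Daily Bugle building', 4,  5),
  (5, 'Antarctica',           8,  9);

-- Hypothetical link table: which issue features which location.
CREATE TABLE issue_locations (issue TEXT, location_id INTEGER REFERENCES locations(id));
INSERT INTO issue_locations VALUES ('Issue A', 4), ('Issue B', 3), ('Issue C', 5);
""")

def issues_within(location_name):
    """Issues set in the named location or in any location nested inside it."""
    rows = conn.execute("""
        SELECT DISTINCT il.issue
        FROM locations AS parent
        JOIN locations AS child ON child.lft BETWEEN parent.lft AND parent.rgt
        JOIN issue_locations AS il ON il.location_id = child.id
        WHERE parent.name = ?
        ORDER BY il.issue
    """, (location_name,)).fetchall()
    return [r[0] for r in rows]

print(issues_within("New York"))
print(issues_within("Earth-616"))
```

Since several locations can share a name (more than one New York City), a real query would filter on the parent's id rather than its name.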
I really like what @King Isaac linked earlier about the nested set model. The only argument I have with what the link said is scalability. If you're defining your lft and rgt boundaries, you have to know how many elements you have, or you have to set arbitrarily large numbers and just hope that you never reach them. I don't know how big this database will be or how many entries you'll have, but it's good to implement a model that doesn't require re-indexing and such. Here's my modified version:
create table #locations (
    id varchar(100),
    name varchar(300),
    descriptn varchar(500),
    depthLevelId int
)
create table #depthLevel (
    id int,
    levelName varchar(300)
)
***Id level structuring***
10--level 1
100 101-- level 2
1000 1001 1010 1011 --level 3
10000 10001 10010 10011 10100 10101 10110 10111 --level 4
Essentially this makes for super simple queries. The important part is the child id is comprised of the parent id plus whatever random id you want to give it. It doesn't even have to be sequential, just unique. You want everything in the universe?
SELECT *
FROM #locations
WHERE id like '10%'
You want something down the 4th level?
SELECT *
FROM #locations
WHERE id like '10000%'
The ids might get a little long once you get down that many levels, but does that really matter when you're writing simple queries? And since it's just a string, you have plenty of room to expand without ever having to reindex.
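Here is a runnable sketch of the same prefix idea, with one change that is mine and not part of the scheme above: a '.' separator between the parent id and the child's suffix, so ids stay unambiguous when suffixes vary in length. The helper names `add_location` and `subtree` and the sample locations are made up, and it uses Python's built-in sqlite3 instead of temp tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations (id TEXT PRIMARY KEY, name TEXT NOT NULL)")

def add_location(parent_id, suffix, name):
    """Child id = parent id + '.' + suffix (the '.' separator is an assumption)."""
    new_id = suffix if parent_id is None else f"{parent_id}.{suffix}"
    conn.execute("INSERT INTO locations VALUES (?, ?)", (new_id, name))
    return new_id

def subtree(location_id):
    """The location itself plus everything below it, found by id prefix."""
    rows = conn.execute(
        "SELECT name FROM locations WHERE id = ? OR id LIKE ? ORDER BY id",
        (location_id, location_id + ".%"),
    ).fetchall()
    return [r[0] for r in rows]

universe = add_location(None, "10", "Earth-616")   # id '10'
state = add_location(universe, "0", "New York")    # id '10.0'
city = add_location(state, "0", "New York City")   # id '10.0.0'
add_location(city, "1", "Daily Bugle building")    # id '10.0.0.1'
add_location(universe, "1", "Antarctica")          # id '10.1'

print(subtree(state))
print(subtree(universe))
```

A prefix LIKE with no leading wildcard can normally use the index on id, which is what keeps this kind of lookup cheap.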

How to store hierarchical information into a database?

I have the following information that should be retrieved by using several dependent select fields on a web form:
Users will be able to add new categories.
Food
- Fruits
  - Tropical
    - Pineapples
      - Pineapples - Brazil
      - Pineapples - Hawaii
    - Coconuts
  - Continental
    - Orange
- Fish
....
This data should come from a database.
I realize that creating a table for each category presented here is perhaps not a good schema, so I would like to ask: is there any standard way to deal with this?
I'm also aware of this schema example:
Managing Hierarchical Data in MySQL
Is there any other (perhaps more intuitive) way to store this type of information?
The link you provided describes the two standard ways for storing this type of information:
Adjacency List
Nested Sets
One issue your question didn't raise is whether all fruits have the same attributes or not.
If all fruits have the same attributes, then the answer that tells you to look at the link you provided and read about adjacency lists and nested sets is correct.
If new fruits can have new attributes, then a user that can add a new fruit can also add a new attribute. This can turn into a mess, real easily. If two users invent the same attribute, but give it a different name, that might be a problem. If two users invent different attributes, but give them the same name, that's another problem.
You might just as well say that, conceptually, each user has their own database, and no meaningful queries can be made that combine data from different users. Problem is, the mission of the database almost always includes, sooner or later, bringing together all the data from the different users.
That's where you face a nearly impossible data management issue.
Kawu gave you the answer: a recursive relation (the table will be related to itself), aka a Pig's Ear relation.
Your example shows a parent with several children, but you didn't say whether an item can belong to more than one parent. Can an orange be in 'Tropical' and in 'Citrus'?
Each row has an id and a parent_id with the parent_id pointing to the id of another row.
id=1 name='Fruits' parent_id=0
id=2 name='Citrus' parent_id=1
id=3 name='Bitter Lemon' parent_id=2
id=4 name='Pink Grapefruit' parent_id=2
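As a rough sketch of how those rows would be used: the table name `categories` and the two helper functions are my own naming, and the example runs on Python's built-in sqlite3. The recursive query relies on WITH RECURSIVE, which MySQL only supports from version 8.0 on; on older versions you would walk the levels from application code instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER NOT NULL   -- 0 marks a top-level row, as in the rows above
);
INSERT INTO categories VALUES
  (1, 'Fruits', 0),
  (2, 'Citrus', 1),
  (3, 'Bitter Lemon', 2),
  (4, 'Pink Grapefruit', 2);
""")

def children(parent_id):
    """Direct children only: what one dependent select field needs."""
    rows = conn.execute(
        "SELECT name FROM categories WHERE parent_id = ? ORDER BY id",
        (parent_id,),
    ).fetchall()
    return [r[0] for r in rows]

def subtree(root_id):
    """A row and everything below it, with its depth relative to the root."""
    return conn.execute("""
        WITH RECURSIVE tree(id, name, depth) AS (
            SELECT id, name, 0 FROM categories WHERE id = ?
            UNION ALL
            SELECT c.id, c.name, t.depth + 1
            FROM categories AS c JOIN tree AS t ON c.parent_id = t.id
        )
        SELECT name, depth FROM tree ORDER BY depth, id
    """, (root_id,)).fetchall()

print(children(2))
print(subtree(1))
```

For a chain of dependent select fields, each field only needs `children()` of whatever was picked in the field before it, so the recursive query is optional.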
Here are some examples of schemas using this type of relation to provide unlimited parent-child relations:
Data model for product categories
Data model for organizations and people

Sql Server Analysis Services Parent Child with non-unique key

I'm currently building our Data Warehouse, primarily using Ralph Kimball's methods and guidance.
We are using the Microsoft stack for this (so SSIS, SSAS).
I am a bit stuck deciding how to handle BOMS (Bill of Materials) which is effectively an unbalanced hierarchy.
The BOM handles assemblies which are a collection of parts. Each part can have its own child parts and each part can also appear more than once in different assemblies.
I'm trying to use a DimBOM table as follows...
Now in SSAS I can join the table to itself (ChildItemNumber to ItemNumber) and create a dimension. The dimension will pick up the relationship and create a parent-child link.
The problem is that the ItemNumber in this case is not necessarily unique (because a child item can be a parent itself). If I try to process the dimension, SSAS warns about a non-unique attribute key.
Is there a way of handling this, short of reverting to an exploded hierarchy e.g.
(source: bimonkey.com)
I had the same problem, in my case fetching hierarchies from SAP tables. After much searching on the Internet and a lot of work, I found the solution. You can find it in my blog here: http://biwithjb.wordpress.com/
It looks a bit complicated due to the SAP data complexities, but overall it is quite simple... just a couple of tricks here and there ;)
Hope it helps.
I think you might be confusing two things here, which are the parts and the assemblies.
One of the key notions in a Parent Child Dimension is that though a father may have many children and grandchildren, a child may only have one parent.
So I think the parts may be a Parent Child Dimension of their own, like:
parent key, child key, business key, name, amount
null, 45, A5286, connection rod,
45, 51, B1452, bolt, 2
45, 52, B5874, rod, 1
(if you need 2 bolts and 1 rod to build a connection rod)
and assemblies may be another Parent Child Dimension:
parent key, child key, business key, name, amount
655, 745, E2497, Motorbike, 2
745, 874, E7482, engine, 1
(if you need 1 engine to build a motorbike)
and they can connect, perhaps, in a sort of fact table where:
child key part, child key assembly, amount
45, 874, 3
(if you need 3 engine rods in one engine)
Always try to connect at the lowest relevant level.
In any case, look at the Adventure Works parent child dimensions (the enterprise solution has a few of them) and also look at their relational tables and data.
Hope this helps you find an answer that's relevant for you,
ella

MYSQL: How can I store/retrieve a value based on the 3+ other values in the same table?

Let's say I'm making a program for an English class. I'd like to store data in this way:
ID Object
0 Present Tense
1 1st person singular
2 To Be
3 I am
How can I retrieve the value for ID 3 based on IDs 0-2? The only thing I can think of is:
ID Object FromIDs
3 I am 0,1,2
The problem with this is that I'd have to do a fulltext index and I think this table is going to get pretty large. I don't want separate tables for different types of objects, if possible, because I don't know what I'll end up doing with these objects and I want as much flexibility as possible. I could have a second table relating IDs to each other, but I've only done that successfully relating a column from one table to a column to another.
Thanks in advance!
You need to break the data into different tables. Have a table that stores the "tense"
and another that stores the type "1st person singular".
Can you explain your problem a little more? From what you have, I'm not sure if you're trying to go down the path of Entity-Attribute-Value or, more likely, a relational database is not a good fit for your problem; you may need to use some sort of tree data structure. If you update, I can try to provide a better answer.
What I've decided to do is a combination of what was suggested and what I originally thought. I'm going to have a master list with IDs that are auto-incremented and copied to other tables. That way, I have properties of different parts of speech separated, but still have everything relating to everything else.
This is really not a good fit for a relational database. Sorry, you're trying to drive a nail using a screwdriver.
When you have no distinction between an attribute type and a value, you're modeling semantic data. The open standard for this type of data modeling is RDF.
My solution (if you really don't want to break up the table):
**ParentChildTable**
ParentID ChildID
0 3
1 3
2 3
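With that table, "give me the object that comes from IDs 0, 1 and 2" becomes a GROUP BY / HAVING query. A sketch, assuming the original table is called `Objects` (the question never names it); rows 4 and 5 are extra rows I added so the query has something to exclude, and it runs on Python's built-in sqlite3 rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Objects (ID INTEGER PRIMARY KEY, Object TEXT NOT NULL);
CREATE TABLE ParentChildTable (
    ParentID INTEGER NOT NULL REFERENCES Objects(ID),
    ChildID  INTEGER NOT NULL REFERENCES Objects(ID),
    PRIMARY KEY (ParentID, ChildID)
);
INSERT INTO Objects VALUES
  (0, 'Present Tense'), (1, '1st person singular'), (2, 'To Be'), (3, 'I am'),
  (4, '3rd person singular'), (5, 'He is');   -- rows 4 and 5 are made up
INSERT INTO ParentChildTable VALUES
  (0, 3), (1, 3), (2, 3),
  (0, 5), (4, 5), (2, 5);
""")

def find_children(parent_ids):
    """Objects linked to every one of the given parent IDs (IDs must be distinct)."""
    placeholders = ", ".join("?" for _ in parent_ids)
    rows = conn.execute(f"""
        SELECT o.Object
        FROM ParentChildTable AS pc
        JOIN Objects AS o ON o.ID = pc.ChildID
        WHERE pc.ParentID IN ({placeholders})
        GROUP BY pc.ChildID, o.Object
        HAVING COUNT(*) = ?          -- matched all requested parents
        ORDER BY pc.ChildID
    """, (*parent_ids, len(parent_ids))).fetchall()
    return [r[0] for r in rows]

print(find_children([0, 1, 2]))
print(find_children([0, 2]))
```

No fulltext index is needed; the primary key on (ParentID, ChildID) is an ordinary index and covers the lookup.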
But well, in one table now you have:
-type of tense
-type of person (1st, 3rd....)
-values
So I think it would be better to split. Right now I can see three tables (values, tensetypes, personetypes) plus a relationship table (value-value for tense/person).