I have a categories table in MySQL, something like this:
categoryId | categoryTitle | definedField | parentId
1 | Title | 123 | NULL
2 | AnotherTitle | 234 | 1
3 | AndAnotherOne | NULL | 1
What I need to do is find the closest definedField value by going up through the parents, like this:
Since category 2 has a definedField, return its value.
Since category 3 does not have a definedField, search up to its parent. The parent has a definedField, so return it. If it didn't have one, keep searching up until one is found.
There will ALWAYS be a topmost category that has definedField set. I only need a good algorithm to search for this in a MySQL InnoDB table.
Before version 8.0, MySQL had no direct way of retrieving hierarchical data (unlike, for example, Postgres's WITH RECURSIVE queries). There is a good article summarizing different ways of storing nested data in MySQL: http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
Most users at one time or another have dealt with hierarchical data in
a SQL database and no doubt learned that the management of
hierarchical data is not what a relational database is intended for.
The tables of a relational database are not hierarchical (like XML),
but are simply a flat list. Hierarchical data has a parent-child
relationship that is not naturally represented in a relational
database table.
The article covers two models: Adjacency List and Nested Set.
The Adjacency List Model
In the adjacency list model, each item in the table contains a pointer
to its parent. The topmost element, in this case electronics, has a
NULL value for its parent. The adjacency list model has the advantage
of being quite simple, it is easy to see that FLASH is a child of mp3
players, which is a child of portable electronics, which is a child of
electronics. While the adjacency list model can be dealt with fairly
easily in client-side code, working with the model can be more
problematic in pure SQL.
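As a minimal client-side sketch of the adjacency list approach, applied to the categories table from the question, the loop below walks up the parentId links until it finds a non-NULL definedField. The dict `categories` and the function name `closest_defined_field` are hypothetical stand-ins for rows fetched from the table; neither comes from the article.

```python
# Rows from the question's categories table, keyed by categoryId.
# Each value is (definedField, parentId); None stands for SQL NULL.
categories = {
    1: (123, None),
    2: (234, 1),
    3: (None, 1),
}


def closest_defined_field(category_id, rows):
    """Walk up the parent chain and return the first non-NULL definedField."""
    seen = set()  # guards against an accidental cycle in parentId
    current = category_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at category {current}")
        seen.add(current)
        defined_field, parent_id = rows[current]
        if defined_field is not None:
            return defined_field
        current = parent_id
    return None  # reached the root without finding a value


print(closest_defined_field(2, categories))  # category 2 has its own value
print(closest_defined_field(3, categories))  # falls back to its parent, category 1
```

Against a live database each loop iteration would likely be one SELECT by primary key, so the cost grows with the depth of the tree rather than its size.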
The Nested Set Model
In the Nested Set Model, we can look at our hierarchy in a new way,
not as nodes and lines, but as nested containers.
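To make the "nested containers" idea concrete, here is a rough sketch that numbers the question's three categories with left and right bounds and then picks the innermost enclosing container that has a definedField. The names `number_nested_sets`, `bounds`, and `closest_defined_field` are assumptions made for illustration; in the article the bounds would be stored as columns and queried with SQL.

```python
# Adjacency list from the question: categoryId -> (definedField, parentId).
categories = {
    1: (123, None),
    2: (234, 1),
    3: (None, 1),
}


def number_nested_sets(rows):
    """Assign (lft, rgt) bounds to every node with a depth-first walk."""
    children = {}
    for node_id, (_, parent_id) in rows.items():
        children.setdefault(parent_id, []).append(node_id)

    bounds = {}
    counter = 0

    def visit(node_id):
        nonlocal counter
        counter += 1
        lft = counter
        for child_id in sorted(children.get(node_id, [])):
            visit(child_id)
        counter += 1
        bounds[node_id] = (lft, counter)

    for root_id in sorted(children.get(None, [])):
        visit(root_id)
    return bounds


def closest_defined_field(node_id, rows, bounds):
    """Pick the innermost container around node_id that has a definedField."""
    lft, rgt = bounds[node_id]
    enclosing = [
        (bounds[other][0], rows[other][0])
        for other in rows
        if bounds[other][0] <= lft
        and bounds[other][1] >= rgt
        and rows[other][0] is not None
    ]
    # The enclosing container with the largest lft is the nearest one.
    return max(enclosing)[1] if enclosing else None


bounds = number_nested_sets(categories)
print(bounds)
print(closest_defined_field(3, categories, bounds))
```

The trade-off is that reads need no recursion, while inserting or moving a node means renumbering the bounds of other rows.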
Related
I am working with a client in manufacturing whose products are configurations of the same bunch of parts. I am creating a database that holds all valid products and their Bill of Materials. I need help on deciding a Bill Of Material schedule to implement.
The obvious solution is a many-to-many relationship with a junction table:
Table 1: Products
Table 2: Parts
Junction Table: products, parts, part quantities
However, there are multiple levels in my client's products:
-Assembly
-Sub-Assembly
-Component
-Part
and items from lower levels are allowed to be associated with any upper-level item:
Assembly |Sub-assembly
Assembly |Component
Assembly |Part
Sub-Assembly |Component
Sub-Assembly |Part
Component |Part
and I suspect the client will want to add more levels in the future when new product lines are added.
Correct me if I am wrong, but I believe the above relation schedule would demand a growing integer sequence of junction tables and queries (0+1+1+2+3...) to display and export the full Bill of Materials which may eventually affect performance.
Someone suggested to put everything in one table:
Table 1: Assemblies, sub-assemblies, components, parts, etc...
Junction table: Children and Parents
This only requires one junction table to create infinite levels of many-to-many relationships. I don't know if I trust this solution, but I can't think of any issues other than accidentally making an item its own parent and creating an infinite loop and that it sounds disorganized.
I lack the experience to determine whether either or neither of these models will work for my client. I am sketching these models in MS Access, but I am open to moving this project to a more powerful platform if necessary. Any input is appreciated. Thank you.
-M
What you are describing is a hierarchy. As such it should take the form:
part_hierarchy:
part_id | parent_part_id | other | attributes | of | this | relationship
So part_id 1 may have a parent part_id 10 ("component"), which may itself have a parent_part_id (when looked up in this table) of 12 ("assembly"). It would look like:
part_id | parent_part_id
1 | 10
10 | 12
and parts table:
part_id | description
1 | widget
10 | widget component
12 | aircraft carrier
That's a little simplified since it doesn't take into account your product/part relationship, but it will all fit together using this methodology.
Nice and simple. Now it doesn't matter how deep the hierarchy goes. It's still just two columns (plus any extra columns needed for attributes of this relationship, like create_date, last_changed_by_user, etc.).
I would suggest something more powerful than Access, though, since it lacks the ability to pick apart a hierarchy using a recursive CTE, something that comes with SQL Server, Postgres, Oracle, and the like.
I would 100% avoid any schema that requires you to add more fields or tables as the hierarchy becomes deeper and more complex. That is a path that leads towards pain and regret.
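For a sense of what picking apart the hierarchy with a recursive CTE could look like, here is a small sketch using the two tables from this answer. It runs on SQLite through Python's sqlite3 module purely so that it is self-contained; the CTE name `bom` and the `depth` column are assumptions, and the syntax may need adjusting for SQL Server, Postgres, or Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parts (part_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE part_hierarchy (
        part_id INTEGER REFERENCES parts(part_id),
        parent_part_id INTEGER REFERENCES parts(part_id)
    );
    INSERT INTO parts VALUES (1, 'widget'), (10, 'widget component'),
                             (12, 'aircraft carrier');
    INSERT INTO part_hierarchy VALUES (1, 10), (10, 12);
""")

# Start at the top-level item and repeatedly join children onto the rows found so far.
rows = conn.execute("""
    WITH RECURSIVE bom(part_id, depth) AS (
        SELECT 12, 0
        UNION ALL
        SELECT h.part_id, bom.depth + 1
        FROM part_hierarchy AS h
        JOIN bom ON h.parent_part_id = bom.part_id
    )
    SELECT bom.depth, p.part_id, p.description
    FROM bom JOIN parts AS p ON p.part_id = bom.part_id
    ORDER BY bom.depth
""").fetchall()

for depth, part_id, description in rows:
    print("  " * depth + f"{part_id}: {description}")
```

The same query works however deep the hierarchy goes. Note that with UNION ALL a part that ends up as its own ancestor would make the recursion run forever, so the data needs a guard against cycles.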
Since the level of nesting is arbitrary, use one table with a self-referencing parent_id foreign key.
While this is technically correct, navigating it requires a recursive query, which not every database supports. However, a simple and effective way to make nested parts easy to access is to store a "path" to each component, which looks like a path in a file system.
For example, say part id 1 is a top level part that has a child whose id is 2, and part id 2 has a child part with id 3, the paths would be:
id parent_id path
1 null /1
2 1 /1/2
3 2 /1/2/3
Doing this means finding the tree of subparts for any part is simply:
select b.id
from parts a
-- the trailing '/' keeps /1 from also matching /10, /11, ...
join parts b on b.path like concat(a.path, '/%')
where a.id = ?
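One detail worth checking with path patterns is the separator. The sketch below loads the three rows from the table above plus one hypothetical extra part (id 10) and compares a bare prefix pattern with a prefix that ends in '/'. It uses SQLite via Python's sqlite3 module so it runs on its own; SQLite concatenates with `||` where MySQL would use `concat()`, and the helper name `subparts` is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parts (id INTEGER PRIMARY KEY, parent_id INTEGER, path TEXT);
    INSERT INTO parts VALUES
        (1, NULL, '/1'),
        (2, 1, '/1/2'),
        (3, 2, '/1/2/3'),
        (10, NULL, '/10');  -- extra row to show the prefix pitfall
""")


def subparts(part_id, pattern_suffix):
    """Return ids whose path starts with the given part's path plus a suffix."""
    rows = conn.execute(
        """
        SELECT b.id
        FROM parts AS a
        JOIN parts AS b ON b.path LIKE a.path || ?
        WHERE a.id = ?
        ORDER BY b.id
        """,
        (pattern_suffix, part_id),
    ).fetchall()
    return [row[0] for row in rows]


print(subparts(1, "%"))   # bare prefix: also matches the part itself and part 10
print(subparts(1, "/%"))  # prefix plus separator: only true descendants
```

Ending the prefix with the separator is what prevents '/1' from matching '/10'. If the part itself should be included in the result, it can be added back with an explicit OR on the id.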
What database (and schema, if applicable) would be most appropriate for storing and retrieving data (location, timestamp) that can be placed at any node of an arbitrarily defined tree? For instance: the location of a book you own:
Book
| |
Home Work
| | | | |
Bedroom Bathroom Den Office Conf room
| | | | |
Closet Underbed EntCtr Closet Desk
| |
Top Shelf Bottom Shelf
XXXX
For each item record, the item's position could conceivably look different. Items would likely share the same root and primary nodes, but beyond that could have different branches and leaves where the item is actually located. And with each added item, the tree itself could conceivably grow (you could add specificity to that "top shelf in the bedroom closet" node eventually, placing newer items in one of 2-3 sub-locations).
I'm thinking a SQL db might not be ideal since the tree could expand arbitrarily and could be entirely different depending on user, but not sure how a NoSQL db like Mongo could handle any updating/expansion (like if the example book is moved from an existing node to a new one a level or two deeper). Maybe the depth/breadth of tree levels could be constrained if using a SQL db, but the column labels could vary, and on the other hand Mongo could simply create a new document for an item if it is moved to a new location.
Any insights from database experts very much appreciated!
Locations, especially those managed by different organizations, are not necessarily hierarchical. For example, Russia is in Europe and Asia. Texarkana is in Texas and Arkansas. US ZIP 42223 is in Kentucky and Tennessee. Geopolitical locations form a graph (a network) rather than a strict tree.
That being said, you can easily model hierarchical data in a SQL database by using an adjacency list:
create table locations (
location_id int primary key,
name text not null,
parent_id int null references locations(location_id)
);
You can then query such a table using Recursive Common Table Expressions (CTEs), which are available in most major databases (MySQL only added them in version 8.0), but it sounds like switching databases is an option for you.
Here's an example: http://blog.databasepatterns.com/2014/02/trees-paths-recursive-cte-postgresql.html
You don't need Nested Set, Materialized Path or Closure Table if your DB supports RCTE.
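As a small sketch of such a recursive CTE over the locations table defined above, the query below starts at one node and climbs to the root, building a readable path along the way. The sample rows reuse names from the question's diagram, the CTE name `ancestry` and column `full_path` are assumptions, and SQLite (via Python's sqlite3) is used only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (
        location_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        parent_id INTEGER NULL REFERENCES locations(location_id)
    );
    INSERT INTO locations VALUES
        (1, 'Home', NULL),
        (2, 'Bedroom', 1),
        (3, 'Closet', 2),
        (4, 'Top Shelf', 3);
""")

# Walk from the chosen node up to the root, prepending each ancestor's name.
row = conn.execute("""
    WITH RECURSIVE ancestry(location_id, parent_id, full_path) AS (
        SELECT location_id, parent_id, name
        FROM locations
        WHERE location_id = ?
        UNION ALL
        SELECT l.location_id, l.parent_id, l.name || ' > ' || a.full_path
        FROM locations AS l
        JOIN ancestry AS a ON l.location_id = a.parent_id
    )
    SELECT full_path FROM ancestry WHERE parent_id IS NULL
""", (4,)).fetchone()

full_path = row[0]
print(full_path)
```

Moving an item to a deeper, newly created node then only means inserting one row into locations and pointing the item at its location_id; no columns change.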
When you say "SQL DB" I think you are referencing a relational database. For this you seem to want a hierarchical database. You can get such a structure in a relational DB. It's called a Nested Set Model. See: http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
I am not a pro in MySQL, but I want to build something like an object layer on top of relational MySQL tables.
I want to have very many "structures" with fields of type "bigint", "longtext", "datetime", and "double" stored in just 7 tables.
entity_types (et_id, et_name) - list of "structures";
entity_types_fields (etf_id, parent_et_id, ....., etf_ident, etf_type) - list of structure properties stored in one table for ALL structures; etf_type contains an int value (0,1,2,3) that refers to one of the 4 tables described below.
entities (e_id, et_id) - list of all available entities (id and type id of entity)
and 4 data tables (containing all data for entities) -
entities_props_bigint (parent_e_id, parent_etf_id, ep_data) - for BIGINT data properties
entities_props_longtext (parent_e_id, parent_etf_id, ep_data) - for LONGTEXT data properties
entities_props_datetime (parent_e_id, parent_etf_id, ep_data) - for DATETIME data properties
entities_props_double (parent_e_id, parent_etf_id, ep_data) - for DOUBLE data properties
What is the best way to select from such a data layer?
Suppose I have a list of e_id values (entity ids), where each entity can have any type. I want to get a predefined list of properties. If some of the entities don't have such a property, I want it to be NULL.
Do you have any information about how to do this? Maybe you have some links or have already dealt with such things.
Thanks!
You're reinventing the wheel by implementing a whole metadata system on top of a relational database. Many developers have tried to do what you're doing and then use SQL to query it, as if it is relational data. But implementing a system of non-relational data and metadata in SQL is harder than you expect.
I've changed the relational tag of your question to eav, because your design is a variation of the Entity-Attribute-Value design. There's a limit of five tags in Stack Overflow. But you should be aware that your design is not relational.
A relational design necessarily has a fixed set of attributes for all instances of an entity. The right way to represent this in a relational database is with columns of a table. This allows you to give a name and a data type to each attribute, and to ensure that the same set of names and their data types apply to every row of the table.
What is the best way to select from such a data layer?
The only scalable way to query your design is to fetch the attribute data and metadata as rows, and reconstruct your object in application code.
SELECT e.e_id, f.etf_ident, f.etf_type,
p0.ep_data AS data0,
p1.ep_data AS data1,
p2.ep_data AS data2,
p3.ep_data AS data3
FROM entities AS e
INNER JOIN entity_types_fields AS f ON e.et_id = f.parent_et_id
LEFT OUTER JOIN entities_props_bigint AS p0 ON (p0.parent_e_id,p0.parent_etf_id) = (e.e_id,f.etf_id)
LEFT OUTER JOIN entities_props_longtext AS p1 ON (p1.parent_e_id,p1.parent_etf_id) = (e.e_id,f.etf_id)
LEFT OUTER JOIN entities_props_datetime AS p2 ON (p2.parent_e_id,p2.parent_etf_id) = (e.e_id,f.etf_id)
LEFT OUTER JOIN entities_props_double AS p3 ON (p3.parent_e_id,p3.parent_etf_id) = (e.e_id,f.etf_id)
In the query above, each entity field should match at most one property, and the other data columns will be null. If all four data columns are null, then the entity field is missing.
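The "reconstruct your object in application code" step might look roughly like the sketch below. It assumes result rows shaped like the query's output and that etf_type (0 to 3) indexes the matching data column; the sample rows and the function name `build_entities` are made up for illustration.

```python
from collections import defaultdict

# Rows shaped like the query's output:
# (e_id, etf_ident, etf_type, data0, data1, data2, data3).
# The sample values are made up for illustration.
rows = [
    (1, "title", 1, None, "Example title", None, None),
    (1, "pages", 0, 328, None, None, None),
    (1, "rating", 3, None, None, None, None),  # field defined, no value stored
    (2, "price", 3, None, None, None, 9.5),
]


def build_entities(result_rows):
    """Collapse one row per (entity, field) into one dict per entity."""
    entities = defaultdict(dict)
    for e_id, ident, etf_type, *data in result_rows:
        # etf_type (0..3) selects which of the four data columns applies;
        # a missing property simply stays None.
        entities[e_id][ident] = data[etf_type]
    return dict(entities)


entities = build_entities(rows)
print(entities)
```

Every field defined for an entity's type appears in its dict, with None standing in for properties that have no stored value, which matches the NULL behaviour asked for in the question.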
Re your comment, okay now I understand better what you are trying to do. You have a collection of entity instances in a tree, but each instance may be a different type.
Here's how I would design it:
Store any attributes that all your entity subtypes have in common in a sort of super-type table.
entities(e_id,entity_type,name,date_created,creator,sku, etc.)
Store any attributes specific to an entity sub-type in their own table, as in Martin Fowler's Class Table Inheritance design.
entity_books(e_id,isbn,pages,publisher,volumes, etc.)
entity_videos(e_id,format,region,discs, etc.)
entity_socks(e_id,fabric,size,color, etc.)
Use the Closure Table design to model the hierarchy of objects.
entity_paths(ancestor_e_id, descendant_e_id, path_length)
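As a rough sketch of how the entity_paths table might be maintained and queried, the helper below inserts a self-reference row for each new entity and copies the parent's ancestor rows with path_length increased by one. The function name `add_entity` and the sample ids are assumptions, and SQLite (via Python's sqlite3) is used only to keep the example runnable on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity_paths (
        ancestor_e_id INTEGER NOT NULL,
        descendant_e_id INTEGER NOT NULL,
        path_length INTEGER NOT NULL,
        PRIMARY KEY (ancestor_e_id, descendant_e_id)
    );
""")


def add_entity(e_id, parent_e_id=None):
    """Insert the self-reference row plus one row per ancestor of the parent."""
    conn.execute("INSERT INTO entity_paths VALUES (?, ?, 0)", (e_id, e_id))
    if parent_e_id is not None:
        conn.execute(
            """
            INSERT INTO entity_paths (ancestor_e_id, descendant_e_id, path_length)
            SELECT ancestor_e_id, ?, path_length + 1
            FROM entity_paths
            WHERE descendant_e_id = ?
            """,
            (e_id, parent_e_id),
        )


add_entity(1)       # root
add_entity(2, 1)    # child of 1
add_entity(3, 2)    # grandchild of 1

# All descendants of entity 1, nearest first, with no recursion needed.
descendants = conn.execute(
    """
    SELECT descendant_e_id, path_length
    FROM entity_paths
    WHERE ancestor_e_id = ? AND path_length > 0
    ORDER BY path_length
    """,
    (1,),
).fetchall()
print(descendants)
```

The table grows with one row per ancestor-descendant pair, which is the price paid for answering subtree and ancestor questions with a plain indexed lookup.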
For more information on Class Table Inheritance and Closure Table, see my presentations Practical Object-Oriented Models in SQL and Models for Hierarchical Data in SQL, or my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming, or Martin Fowler's book Patterns of Enterprise Application Architecture.
I have the following information that should be retrieved by using several dependent select fields on a web form:
Users will be able to add new categories.
Food
- Fruits
  - Tropical
    - Pineapples
      - Pineapples - Brazil
      - Pineapples - Hawaii
    - Coconuts
  - Continental
    - Orange
- Fish
....
This data should come from a database.
I realize that creating a table for each category presented here is perhaps not a good schema, so I would like to ask: is there any standard way to deal with this?
I'm also aware of this schema example:
Managing Hierarchical Data in MySQL
Is there any other (perhaps more intuitive way) to store this type of information ?
The link you provided describes the two standard ways for storing this type of information:
Adjacency List
Nested Sets
One issue your question didn't raise is whether all fruits have the same attributes or not.
If all fruits have the same attributes, then the answer that tells you to look at the link you provided and read about adjacency lists and nested sets is correct.
If new fruits can have new attributes, then a user that can add a new fruit can also add a new attribute. This can turn into a mess, real easily. If two users invent the same attribute, but give it a different name, that might be a problem. If two users invent different attributes, but give them the same name, that's another problem.
You might just as well say that, conceptually, each user has their own database, and no meaningful queries can be made that combine data from different users. Problem is, the mission of the database almost always includes, sooner or later, bringing together all the data from the different users.
That's where you face a nearly impossible data management issue.
Kawu gave you the answer: a recursive relation (the table will be related to itself), aka a Pig's Ear relation.
Your example shows a parent with several children, but you didn't say whether an item can belong to more than one parent. Can an orange be in 'Tropical' and in 'Citrus'?
Each row has an id and a parent_id with the parent_id pointing to the id of another row.
id=1 name='Fruits' parent_id=0
id=2 name='Citrus' parent_id=1
id=3 name='Bitter Lemon' parent_id=2
id=4 name='Pink Grapefruit' parent_id=2
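For the dependent select fields on the web form, the rows above could be indexed by parent_id so that each dropdown is filled with the children of the previous choice. This is only a sketch: the tuple layout and the names `index_children` and `options_for` are assumptions, and parent_id 0 is taken to mark a top-level row as in the example.

```python
# Rows as in the example above: (id, name, parent_id); parent_id 0 marks a root.
rows = [
    (1, "Fruits", 0),
    (2, "Citrus", 1),
    (3, "Bitter Lemon", 2),
    (4, "Pink Grapefruit", 2),
]


def index_children(category_rows):
    """Group (id, name) pairs under the parent_id they point to."""
    children = {}
    for cat_id, name, parent_id in category_rows:
        children.setdefault(parent_id, []).append((cat_id, name))
    return children


def options_for(parent_id, children):
    """Options for the next dependent select once parent_id is chosen."""
    return children.get(parent_id, [])


children = index_children(rows)
print(options_for(0, children))  # first dropdown: top-level categories
print(options_for(2, children))  # dropdown shown after picking 'Citrus'
```

A category added by a user is just one more row with the right parent_id, so no schema change is needed when the tree grows.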
Here are some examples of schemas using this type of relation to provide unlimited parent-child relations:
Data model for product categories
Data model for organizations and people
I have the following parent <-> child datamodel:
(almost every line is a table, indented means child-of)
consumerGoods
  food
    meat
      item
    fruit
      item
    vegetable
      item
The child-items of meat, fruit and vegetables are in the same table (named items) because they have identical attributes. In the items table I have fields that describes the parent and the parentId.
So an item record could be:
id:1
parentType:meat
parentId:4
price:3.25
expDate:2009-12-31
description:bacon
I'm now building a full text MySQL search for the contents of the description field in "items", but I also want each result to have the information of its parent table, so a "bacon-item" has the data that's in its parent record. I also want each returned result to have data that is in the parent food record and the parent consumerGoods record.
I've got the following query now, but I don't know how to join based on the value of a field in a record, or if that's even possible.
SELECT
*
FROM
item
WHERE MATCH
(description)
AGAINST
('searchKey')
One way to do this is to do multiple queries for each matching "item" record, but if I had a lot of results that would be a lot of queries and would also slow down any filtering I'd want to do for facet-based searching. Another option is to make a new table that contains all the parent item info for each item record and search through that, but then I'd have to constantly update that table if I add item records, which is redundant and quite some work.
I'd like to hear it if I'm thinking in the right direction, or if I'm totally misguided. Any suggestions welcome.
As a general rule of thumb your database structure should contain data, but should not itself be data. A sign that you're breaking this is when you feel that you have to join to a different table based on the data you're reading from some other table. At that point you need to back up and consider your overall data model because odds are very good that you're doing something not quite right.
You could join against a subquery containing the union of all parent types:
select *
from item
left join (
select id, 'meat' as type, Redness, '' as Ripeness from meat
union all
select id, 'fruit' as type, -1 as Redness, Ripeness from fruit
union all
select id, 'vegetable' as type, -1 as Redness, Ripeness from vegetable
) parent on parent.type = item.parentType
and parent.id = item.parentId -- assumes each parent table has an id column
But if you can, redesign the database. Instead of the complex model, change it to one table of Items and one table of Categories. The categories should contain one row for meat, one for fruit, and one for vegetables.
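A sketch of what that redesign might look like follows. The column names, the self-referencing parent_id on categories, and the sample rows are all assumptions layered on the question's example, and a LIKE filter stands in for MySQL's MATCH ... AGAINST because the example runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        parent_id INTEGER REFERENCES categories(id)
    );
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL REFERENCES categories(id),
        price REAL,
        expDate TEXT,
        description TEXT
    );
    INSERT INTO categories VALUES
        (1, 'consumerGoods', NULL), (2, 'food', 1),
        (3, 'meat', 2), (4, 'fruit', 2), (5, 'vegetable', 2);
    INSERT INTO items VALUES (1, 3, 3.25, '2009-12-31', 'bacon');
""")

# One ordinary join replaces the per-type join on parentType.
rows = conn.execute("""
    SELECT i.description, i.price, c.name AS category, p.name AS parent_category
    FROM items AS i
    JOIN categories AS c ON c.id = i.category_id
    LEFT JOIN categories AS p ON p.id = c.parent_id
    WHERE i.description LIKE ?
""", ("%bacon%",)).fetchall()
print(rows)
```

Because the category is now a value in a row rather than the name of a table, the search result carries its parent data through plain foreign key joins.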
Since your example is contrived, it's difficult to know what the actual information requirements are in your case. Damir's diagram shows you the correct way to model PKs and FKs when you have a super-type sub-type relationships.
This situation is one case of a pattern called "generalization-specialization". Almost any treatment of object modeling will deal with generalization-specialization, although it may use different terminology. However, if you want to find articles that help you build a relational database that uses specialization-generalization, search for "generalization specialization relational modeling".
The best of the articles will start by teaching you the same concept that Damir's response illustrated for you. From there, you will learn how to create queries and views that can search for either all kinds of items, or for particular kinds of items, if you know what you are searching for.
A sample view follows:
create view FruitItems as
select
c.ConsumerGoodsID,
Price,
Description,
ConsumerGoodType,
ExpiryDate,
FoodType,
IsTropic
from
ConsumerGoods c
INNER JOIN Food f on f.ConsumerGoodsID = c.ConsumerGoodsID
INNER JOIN Fruit fr on fr.ConsumerGoodsID = c.ConsumerGoodsID
Similarly, you could create views for VegetableItems, MeatItems, and HouseSupplyItems, and even one large view, namely Items, that's the union of each of the specialized views.
In the Items view IsTropic would be true for all tropical fruits, false for all non tropical fruits, and null for Meats, Vegetables, and HouseSupplies. I'm not going to show you the entire Item view for a contrived case, but you get the idea. Especially if you read the best of the articles on relational modeling of this pattern.
The Items view might be a little slow, but it could come in handy when you really don't know any better way to search. And if you search for IsTropic = True, you'll automatically exclude all the Meats, Vegetables, and HouseSupplies.
As @Andomar suggested, the design is a bit off; having "multiple parent tables" does not map to the foreign key concept in databases. Here is one possible suggestion, which uses two levels of super-type/subtype relationships. The super-type table contains columns common to all subtypes (categories), while subtype tables contain columns specific only to that category.