Is recursion the optimal solution in this case? - mysql

I'm getting a headache over this. I'm building a system that can handle a number of projects, groups and file references.
Please take a look at this:
A user should be able to create an infinite number of projects, an infinite number of groups, and attach an infinite number of file references, much like an ordinary PC file system works with drive letters, folders and files.
All of the mentioned elements reside inside a MySQL database. However, I'm not sure if this (see below) is the optimal way of structuring the whole thing:
As you can see, it contains one entity called "Xrefs", containing projects and groups. The rows point to other rows in the same table, which probably makes it ideal for a recursive call when retrieving the data.
A different approach could be to create one entity for projects, one for groups and one for file references, as well as a helper entity that ties the three together and also contains a "parent" value that (as in the first solution) refers to the upper-level tuples in order to create a hierarchy.
If you were to build a similar project, what would you do?

You have hit one of the best-known restrictions of MySQL (before version 8.0): it cannot run what are called recursive queries (PostgreSQL) or CTE queries (Oracle). There are some possible workarounds, but for a project with these requirements you would probably suffer a lot from many other well-known MySQL limitations too. Even SQLite would be more useful on this matter (except for its one-writer-at-a-time restriction).
DBIx::Class has some components to help you work around these MySQL limitations; search for Nested Trees, Ordered Trees, and WITH RECURSIVE QUERY, for example DBIx::Class::Tree::NestedSet.
You will need support for something like PostgreSQL's "7.8. WITH Queries (Common Table Expressions)", which MySQL does not offer.
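For reference, a WITH RECURSIVE query that collects every row below a project looks roughly like the sketch below. It runs on SQLite through Python's sqlite3 module purely for illustration (SQLite, PostgreSQL and MySQL 8.0+ accept this syntax); the table xrefs and its columns xref_id, parent_id, kind and name are hypothetical stand-ins for the "Xrefs" table described in the question.

```python
import sqlite3

# Hypothetical stand-in for the "Xrefs" table: one self-referencing table
# holding projects, groups and file references.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE xrefs (
    xref_id   INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES xrefs(xref_id),  -- NULL for a project
    kind      TEXT NOT NULL,                       -- 'project', 'group' or 'file'
    name      TEXT NOT NULL)""")
conn.executemany("INSERT INTO xrefs VALUES (?, ?, ?, ?)", [
    (1, None, "project", "Project A"),
    (2, 1, "group", "Docs"),
    (3, 1, "group", "Drawings"),
    (4, 3, "group", "Sketches"),
    (5, 4, "file", "plan.dwg"),
    (6, 2, "file", "readme.txt"),
])

def descendants(conn, root_id):
    """Return (xref_id, kind, name, depth) for every row below root_id."""
    sql = """
    WITH RECURSIVE subtree(xref_id, kind, name, depth) AS (
        -- anchor member: direct children of the starting row
        SELECT xref_id, kind, name, 1 FROM xrefs WHERE parent_id = ?
        UNION ALL
        -- recursive member: children of the rows found so far
        SELECT x.xref_id, x.kind, x.name, s.depth + 1
        FROM xrefs AS x JOIN subtree AS s ON x.parent_id = s.xref_id
    )
    SELECT xref_id, kind, name, depth FROM subtree ORDER BY xref_id
    """
    return conn.execute(sql, (root_id,)).fetchall()

files = [name for _, kind, name, _ in descendants(conn, 1) if kind == "file"]
print(files)  # ['plan.dwg', 'readme.txt']
```

Each pass of the recursive member adds one level, so nesting depth does not have to be known in advance.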

Your structure is fine - since you are building a tree, not a general graph, there is no need for a separate table that ties entities together. I would put projects into their own table, because they appear to stand on their own, unless you must support hierarchy among projects as well.
However, given that your RDBMS is MySQL, you would have problems building recursive queries. For example, try thinking of a query that would give you all files related to xfer_id 1 (i.e. the project). None of the files is tied to that ID directly, so you need to locate your first-level groups, then your second-level groups, and then tie files to them. Since your groups can be nested to any depth, your query would have to be recursive as well.
Although you can certainly do it, it is currently not simple and requires writing stored procedures. A common approach for situations like that is to build the tree in memory, with some assistance from the RDBMS. The trick is to store the id of the top project in each group, i.e.
xfer_id xfer_fk xfer_top
------- ------- --------
1 - 1
2 1 1
3 1 1
4 3 1
5 3 1
Now a query with the condition WHERE xfer_top=... will give you all the individual "parts", which can be combined in memory without having to bring the entire table into memory.
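A minimal sketch of that in-memory approach, assuming a table named xfer with the xfer_id, xfer_fk and xfer_top columns from the example above (the table name and the build_tree/walk helpers are hypothetical, and the "-" for the project row is stored as NULL). One query fetches every row of the project, and a dictionary keyed by parent id rebuilds the hierarchy.

```python
import sqlite3
from collections import defaultdict

def build_tree(conn, top_id):
    """Fetch all rows of one project with a single query and link them in memory."""
    rows = conn.execute(
        "SELECT xfer_id, xfer_fk FROM xfer WHERE xfer_top = ?", (top_id,)
    ).fetchall()
    children = defaultdict(list)  # parent id -> list of child ids
    for xfer_id, xfer_fk in rows:
        if xfer_fk is not None:   # the project row itself has no parent
            children[xfer_fk].append(xfer_id)
    return children

def walk(children, node, depth=0):
    """Depth-first traversal yielding (depth, id) pairs."""
    yield depth, node
    for child in sorted(children.get(node, [])):
        yield from walk(children, child, depth + 1)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xfer (xfer_id INTEGER PRIMARY KEY, xfer_fk INTEGER, xfer_top INTEGER)")
conn.executemany("INSERT INTO xfer VALUES (?, ?, ?)",
                 [(1, None, 1), (2, 1, 1), (3, 1, 1), (4, 3, 1), (5, 3, 1)])

tree = build_tree(conn, 1)
for depth, node in walk(tree, 1):
    print("  " * depth + str(node))
```

The database only does the filtering on xfer_top (which an index can serve); the recursion happens in application code.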

Related

Best method for storing hierarchy of organisations using Eloquent

I need to store an organisation ownership hierarchy in a Laravel backend. Each node in the hierarchy can be one of a number of types, and each relationship needs to carry the amount of ownership (and potentially more metadata relating to the relationship between nodes). The structure can be arbitrarily deep, and it must be possible to attach a subtree an arbitrary number of times (see C1 below, which appears twice). Below is a sketch of the kind of hierarchy I need.
I am using MySQL 8, so I have access to CTEs for recursion. I have looked into the adjacency-list package (staudenmeir/laravel-adjacency-list), which uses CTEs and looks good, but it uses self-referencing tables. I think this means that I cannot store relationship data, and I don't think I can get the repeated subtree structure you see above.
I am currently exploring many-to-many relationships, with a custom pivot table to store the "relationship weighting". But I am unsure if this is a sensible approach, and perhaps I'm missing some useful design pattern for this.
I am aware that this is a nebulous question, but while I'm trying to crack this myself using Eloquent relationships, I thought I might get a discussion going about design patterns for this type of work.
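One common shape for this is a node table plus an edge (pivot) table that carries the relationship data. Because a child can have several parents, the structure is a directed acyclic graph rather than a tree, and a recursive CTE can still walk it. The sketch below uses SQLite via Python only for illustration (MySQL 8 accepts the same WITH RECURSIVE syntax); the table names, columns and sample percentages are hypothetical, and it multiplies ownership fractions along each path to get indirect ownership.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT NOT NULL, node_type TEXT NOT NULL);
-- edge ("pivot") table: one row per ownership relationship, carrying its own data
CREATE TABLE ownership (
    parent_id INTEGER NOT NULL REFERENCES node(id),
    child_id  INTEGER NOT NULL REFERENCES node(id),
    share     REAL    NOT NULL,  -- fraction owned, 0..1
    PRIMARY KEY (parent_id, child_id)
);
INSERT INTO node VALUES (1,'P','company'),(2,'A','company'),(3,'B','trust'),
                        (4,'C1','company'),(5,'D','company');
-- C1 is reachable via both A and B: the same subtree is attached twice
INSERT INTO ownership VALUES (1,2,0.6),(1,3,0.4),(2,4,0.5),(3,4,1.0),(4,5,0.3);
""")

def effective_ownership(conn, root_id):
    """Sum, over every path from root_id, the product of the shares along the path.

    Assumes the graph has no cycles; a cycle would make the recursion run forever.
    """
    sql = """
    WITH RECURSIVE paths(node_id, share) AS (
        SELECT child_id, share FROM ownership WHERE parent_id = ?
        UNION ALL
        SELECT o.child_id, p.share * o.share
        FROM ownership AS o JOIN paths AS p ON o.parent_id = p.node_id
    )
    SELECT n.name, SUM(p.share) FROM paths AS p JOIN node AS n ON n.id = p.node_id
    GROUP BY n.name
    """
    return dict(conn.execute(sql, (root_id,)).fetchall())

print(effective_ownership(conn, 1))  # C1 is about 0.7 (0.6*0.5 + 0.4*1.0)
```

Keeping the edges in their own table is what allows both the per-relationship data and the shared subtree; a single parent_id column can hold neither.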

How to get multi-levels with a single query? [duplicate]

Given the following table
id  parentID  name   image
--  --------  -----  -----------
0   0                default.jpg
1   0         Jason
2   1         Beth   b.jpg
3   0         Layla  l.jpg
4   2         Hal
5   4         Ben
I am wanting to do the following:
If I search for Ben, I would like to find the image, if there is no image, I would like to find the parent's image, if that does not exist, I would like to go to the grandparent's image... up until we hit the default image.
What is the most efficient way to do this? I know SQL isn't really designed for hierarchical values, but this is what I need to do.
Cheers!
MySQL (before version 8.0) lacks recursive queries, which are part of standard SQL. Many other brands of database support this feature, including PostgreSQL (see http://www.postgresql.org/docs/8.4/static/queries-with.html).
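With a database that does support recursive queries (SQLite, PostgreSQL, or MySQL 8.0 and later), the image fallback can be written as one query that walks up the parent chain and keeps the nearest row that has an image. This is a sketch run on SQLite through Python; the table name person is an assumption, empty images are stored as NULL, and the a.id <> a.parentID guard stops the walk at row 0, which is its own parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, parentID INTEGER, name TEXT, image TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)", [
    (0, 0, None, "default.jpg"),
    (1, 0, "Jason", None),
    (2, 1, "Beth", "b.jpg"),
    (3, 0, "Layla", "l.jpg"),
    (4, 2, "Hal", None),
    (5, 4, "Ben", None),
])

def find_image(conn, name):
    """Image of the named row, or of its nearest ancestor that has one."""
    sql = """
    WITH RECURSIVE ancestry(id, parentID, image, depth) AS (
        SELECT id, parentID, image, 0 FROM person WHERE name = ?
        UNION ALL
        -- step to the parent, but stop once a row points at itself (the root)
        SELECT p.id, p.parentID, p.image, a.depth + 1
        FROM person AS p JOIN ancestry AS a ON p.id = a.parentID
        WHERE a.id <> a.parentID
    )
    SELECT image FROM ancestry WHERE image IS NOT NULL
    ORDER BY depth LIMIT 1
    """
    row = conn.execute(sql, (name,)).fetchone()
    return row[0] if row else None

print(find_image(conn, "Ben"))    # b.jpg (Ben -> Hal -> Beth)
print(find_image(conn, "Jason"))  # default.jpg
```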
There are several techniques for handling hierarchical data in MySQL.
Simplest would be to add a column to note the hierarchy that a given photo belongs to. Then you can search for the photos that belong to the same hierarchy, fetch them all back to your application and figure out the ones you need there. This is slightly wasteful in terms of bandwidth, requires you to write more application code, and it's not good if your trees have many nodes.
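A sketch of that application-side approach, assuming a hypothetical tree_id column that records which hierarchy each row belongs to (all sample rows share tree 0 here). One query fetches the whole hierarchy, and the parent-to-grandparent fallback happens in a dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# tree_id is the hypothetical extra column naming the hierarchy a row belongs to
conn.execute("CREATE TABLE photo (id INTEGER PRIMARY KEY, parentID INTEGER, "
             "name TEXT, image TEXT, tree_id INTEGER)")
conn.executemany("INSERT INTO photo VALUES (?, ?, ?, ?, ?)", [
    (0, 0, None, "default.jpg", 0), (1, 0, "Jason", None, 0), (2, 1, "Beth", "b.jpg", 0),
    (3, 0, "Layla", "l.jpg", 0), (4, 2, "Hal", None, 0), (5, 4, "Ben", None, 0),
])

def image_for(conn, name):
    # 1. find the row and the hierarchy it belongs to
    start = conn.execute("SELECT id, tree_id FROM photo WHERE name = ?", (name,)).fetchone()
    if start is None:
        return None
    node_id, tree_id = start
    # 2. fetch the whole hierarchy in one round trip
    rows = conn.execute(
        "SELECT id, parentID, image FROM photo WHERE tree_id = ?", (tree_id,)
    ).fetchall()
    by_id = {rid: (parent, image) for rid, parent, image in rows}
    # 3. walk up parent pointers in memory until an image is found
    seen = set()
    while node_id not in seen:  # guards against the self-referencing root
        seen.add(node_id)
        parent, image = by_id[node_id]
        if image:
            return image
        node_id = parent
    return None
```

The trade-off the answer mentions shows up in step 2: the whole hierarchy crosses the wire even though only a few rows are used.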
There are also a few clever techniques to store hierarchical data so you can query them:
Path Enumeration stores the list of ancestors with each node. For instance, photo 5 in your example would store "0-1-2-4-5". You can search for ancestors by searching for nodes whose path concatenated with "%" is a match for 5's path with a LIKE predicate.
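A small path-enumeration sketch on the same sample data, run on SQLite via Python, with ids joined by "-" as described above. Appending the separator before the wildcard (path || '-%') is a refinement added here, not part of the answer: it stops "0-1" from matching "0-10-...". In MySQL write CONCAT(path, '-%') instead, because || means OR there by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photo (id INTEGER PRIMARY KEY, name TEXT, image TEXT, path TEXT NOT NULL)")
conn.executemany("INSERT INTO photo VALUES (?, ?, ?, ?)", [
    (0, None, "default.jpg", "0"),
    (1, "Jason", None, "0-1"),
    (2, "Beth", "b.jpg", "0-1-2"),
    (3, "Layla", "l.jpg", "0-3"),
    (4, "Hal", None, "0-1-2-4"),
    (5, "Ben", None, "0-1-2-4-5"),
])

def ancestors(conn, node_path):
    """Rows whose path is a proper prefix of node_path, nearest first."""
    return conn.execute(
        "SELECT id FROM photo WHERE ? LIKE path || '-%' ORDER BY length(path) DESC",
        (node_path,),
    ).fetchall()

def nearest_image(conn, node_path):
    """Image of the node itself or of its closest ancestor that has one."""
    row = conn.execute(
        """SELECT image FROM photo
           WHERE (path = ? OR ? LIKE path || '-%') AND image IS NOT NULL
           ORDER BY length(path) DESC LIMIT 1""",
        (node_path, node_path),
    ).fetchone()
    return row[0] if row else None
```

Because every ancestor's path is a strict prefix, ordering by path length is enough to put the nearest ancestor first.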
Nested Sets is a complex but clever technique popularized by Joe Celko in his articles and his book "Trees and Hierarchies in SQL for Smarties." There are numerous online blogs and articles about it too. It's easy to query trees, but hard to query immediate children or parents and hard to insert or delete nodes.
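A minimal nested-set sketch for the same sample data. The lft/rgt values come from a depth-first numbering computed in Python (number_nested_sets is a helper written for this illustration). A subtree is every row whose lft lies inside the node's interval, and the path to a node is every row whose interval encloses it. Row 0 is treated as the root and its self-reference is skipped.

```python
import sqlite3
from collections import defaultdict

# (id, parentID) pairs from the sample table; row 0 is its own parent
edges = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 2), (5, 4)]

def number_nested_sets(edges, root):
    """Assign (lft, rgt) by a depth-first walk; returns {id: (lft, rgt)}."""
    children = defaultdict(list)
    for node, parent in edges:
        if node != parent:  # skip the root's self-reference
            children[parent].append(node)
    bounds, counter = {}, 0

    def visit(node):
        nonlocal counter
        counter += 1
        lft = counter
        for child in sorted(children[node]):
            visit(child)
        counter += 1
        bounds[node] = (lft, counter)

    visit(root)
    return bounds

bounds = number_nested_sets(edges, 0)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)")
conn.executemany("INSERT INTO tree VALUES (?, ?, ?)", [(n, l, r) for n, (l, r) in bounds.items()])

def subtree(conn, node_id):
    """The node and everything below it: rows whose lft falls inside its interval."""
    sql = """SELECT c.id FROM tree AS p JOIN tree AS c ON c.lft BETWEEN p.lft AND p.rgt
             WHERE p.id = ? ORDER BY c.lft"""
    return [r[0] for r in conn.execute(sql, (node_id,))]

def path_to(conn, node_id):
    """Root-to-node path: rows whose interval encloses the node's lft."""
    sql = """SELECT a.id FROM tree AS n JOIN tree AS a ON n.lft BETWEEN a.lft AND a.rgt
             WHERE n.id = ? ORDER BY a.lft"""
    return [r[0] for r in conn.execute(sql, (node_id,))]

print(subtree(conn, 2))  # [2, 4, 5]
print(path_to(conn, 5))  # [0, 1, 2, 4, 5]
```

The cost is on the write side: inserting a node in the middle shifts the lft/rgt values of every row to its right.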
Closure Table involves storing every ancestor/descendant relationship in a separate table. It's easy to query trees, easy to insert and delete, and easy to query immediate parents or children if you add a pathlength column.
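A closure-table sketch: one row per ancestor/descendant pair, including each node paired with itself at pathlength 0 (a common convention). The table name tree_path and the insert_node helper are illustrations written for this example. Inserting a node copies its parent's ancestor rows one step further away, which is what keeps inserts cheap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tree_path (
    ancestor   INTEGER NOT NULL,
    descendant INTEGER NOT NULL,
    pathlength INTEGER NOT NULL,
    PRIMARY KEY (ancestor, descendant)
);
""")

def insert_node(conn, node_id, name, parent_id=None):
    conn.execute("INSERT INTO node VALUES (?, ?)", (node_id, name))
    # every node is its own ancestor at distance 0
    conn.execute("INSERT INTO tree_path VALUES (?, ?, 0)", (node_id, node_id))
    if parent_id is not None:
        # inherit all of the parent's ancestors, one step further away
        conn.execute(
            """INSERT INTO tree_path (ancestor, descendant, pathlength)
               SELECT ancestor, ?, pathlength + 1 FROM tree_path WHERE descendant = ?""",
            (node_id, parent_id),
        )

for node_id, name, parent in [(0, None, None), (1, "Jason", 0), (2, "Beth", 1),
                              (3, "Layla", 0), (4, "Hal", 2), (5, "Ben", 4)]:
    insert_node(conn, node_id, name, parent)

# ancestors of Ben, nearest first
ancestors_of_ben = [r[0] for r in conn.execute(
    "SELECT ancestor FROM tree_path WHERE descendant = 5 AND pathlength > 0 ORDER BY pathlength")]
# immediate children of the root, thanks to the pathlength column
children_of_root = [r[0] for r in conn.execute(
    "SELECT descendant FROM tree_path WHERE ancestor = 0 AND pathlength = 1 ORDER BY descendant")]
# whole subtree of Beth (including Beth herself)
subtree_of_beth = [r[0] for r in conn.execute(
    "SELECT descendant FROM tree_path WHERE ancestor = 2 ORDER BY descendant")]
```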
You can see more information comparing these methods in my presentation Practical Object-Oriented Models in SQL or my upcoming book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
Perhaps Managing Hierarchical Data in MySQL helps.

graph database for social network

I am building a system similar to a social network. The maximum number of users will eventually be 50,000, or 70,000 at best.
At the moment I am using mysqli with prepared statements. The ERD now has 30 tables and may eventually reach 40.
So, my question is: I have never used a graph database. I have the ERD done in MySQL Workbench and some code already developed. For the expected number of users in this project, is it recommended to change from MySQL to a graph database? Can my SQL code and database model be reused? Is there any advantage to this change?
What do you think?
Thanks
Graphs are nice and fast when stored in SQL if you have access to recursive queries (which is not the case in MySQL before version 8.0, but they are available in PostgreSQL) and your queries involve a maximum-depth criterion (which is probably your case on a social network), or if they're indexed properly.
There are multiple methods to index graphs. In your case your graph probably isn't dense, in that you're dealing with multiple forests which are nearly independent (you'll usually be dealing with tightly clustered groups of users), so you have plenty of options.
The easiest to implement is a transitive closure (which basically means precalculating all of the potential paths). In your case it may very well be partial (say, depth 2 or depth 3). This lets you fully index related nodes in a separate table, for very fast graph queries. Use triggers or stored procedures to keep it in sync.
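A sketch of a partial (depth-limited) transitive closure for an undirected friendship graph: a breadth-first search from each user records every user reachable within max_depth hops, together with the shortest distance. Everything here (the friendships list, the partial_closure helper, the depth of 2) is illustrative; in the database these pairs would live in their own indexed table kept in sync by triggers or stored procedures.

```python
from collections import defaultdict, deque

def partial_closure(friendships, max_depth):
    """Return {(user, other): hops} for every pair at most max_depth hops apart."""
    adjacency = defaultdict(set)
    for a, b in friendships:  # friendships are undirected
        adjacency[a].add(b)
        adjacency[b].add(a)
    closure = {}
    for start in adjacency:
        seen = {start: 0}
        queue = deque([start])
        while queue:
            user = queue.popleft()
            if seen[user] == max_depth:
                continue  # do not expand beyond the depth limit
            for friend in adjacency[user]:
                if friend not in seen:
                    seen[friend] = seen[user] + 1
                    closure[(start, friend)] = seen[friend]
                    queue.append(friend)
    return closure

pairs = partial_closure([("ann", "bob"), ("bob", "cid"), ("cid", "dan")], max_depth=2)
print(sorted(k for k in pairs if k[0] == "ann"))  # [('ann', 'bob'), ('ann', 'cid')]
```

Each (user, other, hops) triple maps directly onto a row of the closure table, so "friends of friends" becomes a single indexed lookup.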
If your graph is denser than that, you may want to look into using a GRIPP index. Much like with nested sets, it works best (as in updates fastest) if you drop the (rgt - lft - 1) / 2 = number of descendants property and use float values for lft/rgt instead of integers. (Doing so avoids reindexing entire chunks of the graph when you insert or move nodes.)
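To illustrate the float trick in the last sentence (this is plain nested sets, not GRIPP itself): when lft/rgt are floats, a new last child can be placed inside the gap between its last sibling's rgt (or the parent's lft when there are no children) and the parent's rgt, so no other row needs renumbering. The insert_last_child helper and the dictionary representation are hypothetical.

```python
def insert_last_child(bounds, children, parent):
    """Return (lft, rgt) for a new last child of `parent` without renumbering.

    bounds:   {node: (lft, rgt)} using floats
    children: {node: [child, ...]} in left-to-right order
    """
    p_lft, p_rgt = bounds[parent]
    kids = children.get(parent, [])
    low = bounds[kids[-1]][1] if kids else p_lft  # left edge of the free gap
    step = (p_rgt - low) / 3
    return (low + step, low + 2 * step)           # strictly inside (low, p_rgt)

bounds = {"root": (1.0, 6.0), "a": (2.0, 3.0), "b": (4.0, 5.0)}
children = {"root": ["a", "b"]}
new_lft, new_rgt = insert_last_child(bounds, children, "root")
print(new_lft, new_rgt)
```

The caller still has to record the new bounds and child. Each insertion into the same gap shrinks it to a third, so float precision eventually runs out and an occasional full renumbering is still needed.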

Database Modeling: How to categorize products like Amazon?

Assume I had a number of products (from a few thousand to hundreds of thousands) that needed to be categorized in a hierarchical manner. How would I model such a solution in a database?
Would a simple parent-child table like this work:
product_category
- id
- parent_id
- category_name
Then in my products table, I would just do this:
product
- id
- product_category_id
- name
- description
- price
My concern is that this won't scale. By the way, I'm using MySQL for now.
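For what it's worth, the parent-child design in the question can be queried to any depth with a recursive CTE on engines that support it (SQLite, PostgreSQL, MySQL 8.0+). The sketch below runs on SQLite through Python and uses the column names from the question; the sample categories and products are invented. It returns every product in a category and all of its subcategories, and the index on parent_id lets each recursion step be an index lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_category (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES product_category(id),
    category_name TEXT NOT NULL
);
CREATE INDEX idx_category_parent ON product_category(parent_id);
CREATE TABLE product (
    id INTEGER PRIMARY KEY,
    product_category_id INTEGER NOT NULL REFERENCES product_category(id),
    name TEXT NOT NULL,
    description TEXT,
    price NUMERIC
);
INSERT INTO product_category VALUES
    (1, NULL, 'Electronics'), (2, 1, 'Computers'), (3, 2, 'Laptops'), (4, NULL, 'Books');
INSERT INTO product (id, product_category_id, name) VALUES
    (10, 3, 'Ultrabook'), (11, 2, 'Desktop'), (12, 4, 'Novel');
""")

def products_in(conn, category_id):
    """Products in category_id or any of its descendant categories."""
    sql = """
    WITH RECURSIVE subcategories(id) AS (
        SELECT id FROM product_category WHERE id = ?
        UNION ALL
        SELECT c.id FROM product_category AS c
        JOIN subcategories AS s ON c.parent_id = s.id
    )
    SELECT p.name FROM product AS p
    JOIN subcategories AS s ON p.product_category_id = s.id
    ORDER BY p.name
    """
    return [row[0] for row in conn.execute(sql, (category_id,))]

print(products_in(conn, 1))  # ['Desktop', 'Ultrabook']
```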
Of course it will scale. That will work just fine; it is a commonly used structure.
Include a level_no. That will assist in the code, but, more importantly, it is required to exclude duplicates.
If you want a really tight structure, you need something like the Unix concept of inodes.
You may have difficulty getting your head around the code required to produce the hierarchy, say from a product, but that is a separate issue.
And please change:
- id in product_category to product_category_id
- id in product to product_id
- parent_id to parent_product_category_id
Responses to Comments
level_no. Have a look at this Data Model; it is for a Directory Tree structure (e.g. the FileManager Explorer window):
Directory Data Model
See if you can make sense of it, that's the Unix inode concept. The FileNames have to be unique within the Node, hence the second Index. That is actually complete, but some developers these days will have a hissy fit writing the code required to navigate the hierarchy, the levels. Those developers need a level_no to identify what level in the hierarchy they are dealing with.
Recommended changes. Yes, it is called Good Naming Conventions. I am rigid about it, and I publish it, so it is a Naming Standard. There are reasons for it, which will become clear to you when you write some SQL with 3 or 4 levels of joins, especially when you reach the same parent by two different paths. If you search SO, you will find many questions about this; always the same answer. It will also be highlighted in the next model I write for you.
I used to struggle with the same problem 10 years ago. Here's my personal solution to this problem. But before I start explaining, I would like to mention its pros and cons.
Pros:
- You can select the subbranches of a given node, to any desired depth, at the lowest imaginable cost.
- The same can be done to select parent nodes.
- No RDBMS-specific feature is needed, so the same technique can be implemented in any of them.
- It is all implemented using a single field.
Cons:
- You must define a maximum depth for your tree. You also need to define the maximum number of direct children per node.
- Restructuring the tree is more expensive than traversing it, but not as expensive as with the Nested Set Model. Adding a new branch is a matter of finding the right value for the field. To move a branch to a new parent, you need to update that node and all of its children (direct and indirect). The good news is that deleting a node and its children is as easy as traversing it (which costs next to nothing).
The technique:
Consider the following table as your tree holder:
CREATE TABLE IF NOT EXISTS `product_category` (
`product_category_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(20) NOT NULL,
`category_code` varchar(62) NOT NULL,
PRIMARY KEY (`product_category_id`),
UNIQUE KEY `uni_category_code` (`category_code`)
) DEFAULT CHARSET=utf8 ;
All the magic is done in the category_code field. You need to encode your branch address into a text value as follows:
**node_name -> category_code**
Root -> 01
First child -> 01:01
Second child -> 01:02
First grandchild -> 01:01:01
First child of second child -> 01:02:01
In the above example, each node can have up to 99 direct children (assuming we are thinking in decimal). And since category_code is of type varchar(62), we can have up to (62-2)/3 = 20 levels below the root (the root code takes 2 characters and every further level adds 3). It's a trade-off between the depth you want, the number of direct children each node can have, and the size of your field. Scientifically speaking, this is an implementation of a complete tree in which unused branches are not actually created but reserved.
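A small sketch of these encoding rules, assuming two decimal digits per level and ":" as the separator (the encode and max_levels helpers are named for this illustration only). It builds a code from a list of 1-based sibling positions and shows where the depth limit of a varchar(62) column comes from.

```python
DIGITS = 2          # two decimal digits per level -> at most 99 children
SEPARATOR = ":"
COLUMN_WIDTH = 62   # varchar(62) in the CREATE TABLE above

def encode(positions):
    """[1, 2, 1] -> '01:02:01'; each position is the 1-based index among siblings."""
    if not positions:
        raise ValueError("a code needs at least one level")
    for p in positions:
        if not 1 <= p <= 10 ** DIGITS - 1:
            raise ValueError(f"position {p} does not fit in {DIGITS} digits")
    code = SEPARATOR.join(str(p).zfill(DIGITS) for p in positions)
    if len(code) > COLUMN_WIDTH:
        raise ValueError("tree is deeper than the column allows")
    return code

def max_levels(width=COLUMN_WIDTH, digits=DIGITS):
    """The root takes `digits` characters, every further level digits + 1."""
    return 1 + (width - digits) // (digits + 1)

print(encode([1, 2, 1]))  # 01:02:01
print(max_levels())       # 21 (the root plus 20 levels below it)
```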
The good parts:
Now imagine you want to select nodes under 01:02. You can do this using a single query:
SELECT *
FROM product_category
WHERE
category_code LIKE '01:02:%'
Selecting direct nodes under the 01:02:
SELECT *
FROM product_category
WHERE
category_code LIKE '01:02:__'
Selecting all the ancestors of 01:02:
SELECT *
FROM product_category
WHERE
'01:02' LIKE CONCAT(category_code, ':%')
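The three queries can be tried on SQLite from Python as below; the rows are invented sample data. Older SQLite versions have no CONCAT(), so the sketch uses the standard || operator instead (in MySQL keep CONCAT(), since || means OR there by default). Note that _ in a LIKE pattern matches exactly one character, which is why the ':__' suffix returns only direct children.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE product_category (
    product_category_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    category_code TEXT NOT NULL UNIQUE)""")
conn.executemany("INSERT INTO product_category (name, category_code) VALUES (?, ?)", [
    ("Root", "01"), ("First child", "01:01"), ("Second child", "01:02"),
    ("First grandchild", "01:01:01"), ("First child of second child", "01:02:01"),
    ("Grandchild of second child", "01:02:01:01"),
])

def codes(sql, param):
    return [r[0] for r in conn.execute(sql, (param,))]

# every node under 01:02, at any depth
subtree = codes("SELECT category_code FROM product_category "
                "WHERE category_code LIKE ? || ':%' ORDER BY category_code", "01:02")
# direct children of 01:02 only: exactly two more characters after the colon
direct = codes("SELECT category_code FROM product_category "
               "WHERE category_code LIKE ? || ':__' ORDER BY category_code", "01:02")
# all ancestors of 01:02:01
ancestors = codes("SELECT category_code FROM product_category "
                  "WHERE ? LIKE category_code || ':%' ORDER BY category_code", "01:02:01")

print(subtree)    # ['01:02:01', '01:02:01:01']
print(direct)     # ['01:02:01']
print(ancestors)  # ['01', '01:02']
```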
The bad parts:
Inserting a new node into the tree is a matter of finding the right category_code. This can be done using a stored procedure or even in a programming language like PHP.
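One way to find the right category_code for a new child, sketched in Python against SQLite (next_child_code is a hypothetical helper; a stored procedure could do the same): take the highest existing direct child's last segment and add one, failing once the 99-child limit is reached. Because sibling codes are zero-padded to the same width, MAX() on the text column picks the highest one. The UNIQUE key on category_code makes a concurrent duplicate insert fail rather than silently collide.

```python
import sqlite3

def next_child_code(conn, parent_code, digits=2):
    """Next free code directly under parent_code, e.g. '01' -> '01:03'."""
    row = conn.execute(
        "SELECT MAX(category_code) FROM product_category "
        "WHERE category_code LIKE ? || ':' || ?",
        (parent_code, "_" * digits),  # '_' matches exactly one character
    ).fetchone()
    last = int(row[0].rsplit(":", 1)[1]) if row[0] else 0
    if last + 1 >= 10 ** digits:
        raise ValueError(f"{parent_code} already has the maximum number of children")
    return f"{parent_code}:{last + 1:0{digits}d}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_category (product_category_id INTEGER PRIMARY KEY, "
             "name TEXT NOT NULL, category_code TEXT NOT NULL UNIQUE)")
codes = ["01", "01:01", "01:02", "02"] + [f"02:{i:02d}" for i in range(1, 100)]
conn.executemany("INSERT INTO product_category (name, category_code) VALUES (?, ?)",
                 [(c, c) for c in codes])

print(next_child_code(conn, "01"))     # 01:03
print(next_child_code(conn, "01:01"))  # 01:01:01
```

Here "02" already has 99 children, so next_child_code(conn, "02") raises ValueError, which is the failure case the answer describes.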
Since the tree is limited in the number of direct children and depth, an insert can fail. But I believe in most practical cases we can assume such a limitation.
Cheers.
Your solution uses the adjacency list model of a hierarchy. It's by far the most common. It will scale OK up to thousands of products. The problem is that it takes either a recursive query or product-specific extensions to SQL to deal with an indefinitely deep hierarchy.
There are other models of a hierarchy. In particular, there's the nested set model. The nested set model is good for retrieving the path of any node in a single query. It's also good for retrieving any desired subtree. It's more work to keep it up to date. A lot more work.
You may want to briefly explore it before you bite off more than you want to chew.
What are you going to do with the hierarchy?
I think your big issue is that this is a deficiency in MySQL (prior to version 8.0, which added WITH RECURSIVE). With most RDBMSs that support WITH and WITH RECURSIVE, you should need only one scan per level. This makes deep hierarchies a bit problematic but usually not too bad.
I think to make this work well you will have to code a fairly extensive stored procedure, or you will have to go to another tree model, or you will have to move to a different RDBMS. For example this is easy to do with PostgreSQL and WITH RECURSIVE and this offers a lot better scalability than many other approaches.
