Best method for storing hierarchy of organisations using eloquent - mysql

I need to store an organisation ownership hierarchy in a Laravel backend. Each node in the hierarchy can be one of a number of types, and each relationship needs to carry the amount of ownership (and potentially more metadata about the relationship between nodes). The structure can be arbitrarily deep, and it must be possible to attach a subtree an arbitrary number of times (see C1 below, which appears twice). Below is a sketch of the kind of hierarchy I need.
I am using MySQL 8, so I have access to CTEs for recursion. I have looked into the adjacency-list package (staudenmeir/laravel-adjacency-list), which uses CTEs and looks good, but it uses self-referencing tables. I think this means that I cannot store relationship data, and I don't think I can get the repeated subtree structure you see above.
I am currently exploring many-to-many relationships, with a custom pivot table to store the "relationship weighting". But I am unsure if this is a sensible approach, and perhaps I'm missing some useful design pattern for this.
I am aware that this is a nebulous question, but while I'm trying to crack this myself using Eloquent relationships, I thought I might get a discussion going about design patterns for this type of work.
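To make the pivot-table idea concrete, here is a minimal sketch of one way it could be modelled: a `nodes` table plus an `ownerships` edge table whose rows carry the share. It uses Python's built-in `sqlite3` purely so that it runs anywhere; the table names, column names, and sample data are all hypothetical. Because ownership lives on the edge rather than on the node, C1 can hang under two parents, and a recursive CTE walks every path. The sketch assumes the graph has no cycles, since the recursion would not terminate otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT NOT NULL            -- hypothetical node types
);
CREATE TABLE ownerships (         -- one row per edge, so an edge can carry data
    parent_id INTEGER NOT NULL REFERENCES nodes(id),
    child_id  INTEGER NOT NULL REFERENCES nodes(id),
    share     REAL NOT NULL,      -- fraction of the child owned by the parent
    PRIMARY KEY (parent_id, child_id)
);
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
    (1, "A", "company"), (2, "B1", "company"), (3, "B2", "trust"),
    (4, "C1", "company"), (5, "D1", "company"),
])
conn.executemany("INSERT INTO ownerships VALUES (?, ?, ?)", [
    (1, 2, 0.5), (1, 3, 0.25),    # A owns part of B1 and B2
    (2, 4, 0.5), (3, 4, 0.5),     # C1 is attached under both B1 and B2
    (4, 5, 1.0),                  # D1 belongs to the shared C1 subtree
])

def descendants(root_id):
    """Return (path, effective_share) for every path below root_id."""
    return conn.execute("""
        WITH RECURSIVE tree(node_id, path, effective) AS (
            SELECT id, name, 1.0 FROM nodes WHERE id = ?
            UNION ALL
            SELECT o.child_id,
                   tree.path || '/' || n.name,
                   tree.effective * o.share   -- multiply shares along the path
            FROM tree
            JOIN ownerships o ON o.parent_id = tree.node_id
            JOIN nodes n ON n.id = o.child_id
        )
        SELECT path, effective FROM tree
        WHERE node_id != ?
        ORDER BY path
    """, (root_id, root_id)).fetchall()

for path, share in descendants(1):
    print(path, share)
```

The C1 subtree shows up once per parent, each time with its own effective share. MySQL 8 accepts the same `WITH RECURSIVE` form, although string concatenation there would normally use `CONCAT()` rather than `||`. On the Eloquent side, this roughly corresponds to a `belongsToMany` relation from the node model to itself, with `withPivot('share')` exposing the edge data.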

Related

Recursive relationship MySQL table with many-to-many relationship

I'm creating a database structure that contains property listings. Each listing contains various amenities that I need to store, in a listing_amenities table.
This listing_amenities table will contain recursive records. For example, a listing amenity might be "internet". Under this record we need to store child records such as what kind of internet it is (ADSL, fibre, etc.), the speed (20 Mbps, 50 Mbps, etc.), as well as the ISP.
My question is whether a recursive model is the right solution here and how I would go about building the database structure for this or if there is a better solution for this kind of problem. In other words, would it be better to simply store all the amenities and their sub_properties in a big JSON blob column on the listings table?
The reason for creating a recursive table is to make querying better so that eventually we can easily query for property listings that have ADSL internet, for example, and sort this by location so that we could possibly target specific areas for marketing purposes to upgrade to fibre.
First, I would recommend against a JSON BLOB, because that will not be easy and reliable to query.
If you identify a finite number of levels you are breaking down your sub_properties into, like two for amenity and sub_property, that will make it easier to deal with.
The reason to use one table with self-references to sub- or super-properties would be to support an indefinite number of levels. However, that may be cumbersome: because you never know how many levels there are, you don't know how many levels to use in a join. If you can avoid this, it might be better.
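As a rough illustration of the fixed two-level design suggested above, the sketch below keeps amenities and their sub-properties in two separate tables, so the "listings with ADSL internet" query needs a known number of joins. It uses Python's built-in `sqlite3` only so it can be run as-is; the `amenity_properties` table, all column names, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listings (
    id       INTEGER PRIMARY KEY,
    location TEXT NOT NULL
);
CREATE TABLE listing_amenities (          -- level 1, e.g. 'internet'
    id         INTEGER PRIMARY KEY,
    listing_id INTEGER NOT NULL REFERENCES listings(id),
    name       TEXT NOT NULL
);
CREATE TABLE amenity_properties (         -- level 2, e.g. type / speed / isp
    amenity_id INTEGER NOT NULL REFERENCES listing_amenities(id),
    name       TEXT NOT NULL,
    value      TEXT NOT NULL,
    PRIMARY KEY (amenity_id, name)
);
""")
conn.executemany("INSERT INTO listings VALUES (?, ?)",
                 [(1, "South"), (2, "North"), (3, "North")])
conn.executemany("INSERT INTO listing_amenities VALUES (?, ?, ?)",
                 [(10, 1, "internet"), (11, 2, "internet"), (12, 3, "internet")])
conn.executemany("INSERT INTO amenity_properties VALUES (?, ?, ?)", [
    (10, "type", "ADSL"),  (10, "speed", "20"),
    (11, "type", "Fibre"), (11, "speed", "50"),
    (12, "type", "ADSL"),  (12, "speed", "20"),
])

def listings_with(amenity, prop, value):
    """Listings that have `amenity` with sub-property `prop` = `value`, sorted by location."""
    return conn.execute("""
        SELECT l.id, l.location
        FROM listings l
        JOIN listing_amenities a  ON a.listing_id = l.id
        JOIN amenity_properties p ON p.amenity_id = a.id
        WHERE a.name = ? AND p.name = ? AND p.value = ?
        ORDER BY l.location, l.id
    """, (amenity, prop, value)).fetchall()

print(listings_with("internet", "type", "ADSL"))
```

The join depth is fixed at two, which is the point of the recommendation: no recursion is needed as long as sub-properties never need children of their own.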

How do Salesforce query using the relationships table behind the scenes?

I'm trying to figure out how Salesforce's metadata architecture works behind the scenes. There's a video they've released ( https://www.youtube.com/watch?v=jrKA3cJmoms ) where he goes through many of the important tables that drive it along (about 18m in).
I've figured out the structure for the basic representation / storage / retrieval of simple stuff, but where I'm hazy is how the relationship pivot table works. I'll be happy when:
a) I know exactly how the pivot table relates to things (RelationId column he mentions is not clear to me)
b) I can construct a query for it.
Screenshot from the video
I've not had any luck finding any resources describing it at this level in the detail I need, or managed to find any packages that emulate it that I can learn from.
Does anyone have any low-level experience with this part of Salesforce that could help?
EDIT: Thank you, David Reed for further details in your edit. So presumably you agree that things aren't exactly as explained?
In the 'value' column, the GUID of the related record is stored
This allows ease of fetching -to-one related records and, with a little bit of simple SQL switching, resolve a group of records in the reverse direction.
I believe Salesforce don't have many-to-many relationships, as opposed to using a 'junction', so the above is still relevant
I guess now though I wonder what the point of the pivot table is at all, as there's a very simple relationship going on here now. Unless the lack of index on the value columns dictates the need for one...
Or, could it be more likely/useful if:
The record's value column stores a GUID to the relationship record and not directly to the related record?
This relationship record holds all necessary information required to put together a decent query and ALSO includes the GUID of the related record?
Neither option clears up the ambiguity for me, unless I'm missing something.
You cannot see, query, or otherwise access the internal tables that underlie Salesforce's on-platform schema. When you build an application on the platform, you query relationships using SOQL relationship queries; there are no pivot tables involved in the work you can see and do on the platform.
While some presentations and documentation discuss at some level the underlying implementation, the precise details of the SQL tables, schemas, query optimizers, and so on are not public.
As a Salesforce developer or developer who interacts with Salesforce via the API, you do not need to worry about the underlying SQL implementation used on Salesforce's servers at almost any time. The main point at which that knowledge can become helpful is when you are working with massive data volumes (multiple millions of records). The most helpful documentation for that use case is Best Practices for Deployments with Large Data Volumes. The underlying schema is briefly discussed under Underlying Concepts. But bear in mind:
As a customer, you also cannot optimize the SQL underlying many application operations because it is generated by the system, not written by each tenant.
The implementation details are also subject to change.
Metadata Tables and Data Tables
When an organisation declares an object’s field with a relationship type, Force.com maps the field to a Value field in MT_Data, and then uses this field to store the ObjID of a related object.
I believe the documentation you mentioned is using the identifier ObjId ambiguously, and here actually means what it refers to earlier in the document as GUID - the Salesforce Id. Another paragraph states
The MT_Name_Denorm table is a lean data table that stores the ObjID and Name of each record in MT_Data. When an application needs to provide a list of records involved in a parent/child relationship, Force.com uses the MT_Name_Denorm table to execute a relatively simple query that retrieves the Name of each referenced record for display in the app, say, as part of a hyperlink.
This also doesn't make sense unless ObjId is being used to mean what is called GUID in the visual depiction of the table above in the document - the Salesforce Id of the record.

Implementing inheritance in MySQL: alternatives and a table with only surrogate keys

This is a question that has probably been asked before, but I'm having some difficulty to find exactly my case, so I'll explain my situation in search for some feedback:
I have an application that will be registering locations, I have several types of locations, each location type has a different set of attributes, but I need to associate notes to locations regardless of their type and also other types of content (mostly multimedia entries and comments) to said notes. With this in mind, I came up with a couple of solutions:
1. Create a table for each location type, and a "notes" table for every location table with a foreign key. This is pretty troublesome because I would have to create a multimedia and a comments table for every notes table, e.g.:

   LocationTypeA
       ID
       Attr1
       Attr2
   LocationTypeA_Notes
       ID
       Attr1
       ...
       LocationTypeA_fk
   LocationTypeA_Notes_Multimedia
       ID
       Attr1
       ...
       LocationTypeA_Notes_fk

   And so on. This would be quite annoying to do, but after it's done, developing on this structure should not be so troublesome.

2. Create a table with a unique identifier for the location and point content there, like so:

   Location
       ID
   LocationTypeA
       ID
       Attr1
       Attr2
       Location_fk
   Notes
       ID
       Attr1
       ...
       Location_fk
   Multimedia
       ID
       Attr1
       ...
       Notes_fk

   As you see, this is far simpler and also easier to develop, but I just don't like the looks of that table with only IDs (yeah, that's truly the only objection I have to this; it's the option I like the most, to be honest).

3. Similar to option 2, but I would have an enormous table of attributes shaped like this:

   Location
       ID
       Type
   Attribute
       Name
       Value

   And so on, or a table for each attribute, à la Drupal. This would be a pain to develop because it would take several insert/update operations to do something on a location, and the Attribute table would be several times bigger than the Location table (or I would end up with an enormous number of attribute tables). It also has the same issue of the surrogate-keys-only table (it just has a "type" now, which I would use to define the behavior of the location programmatically), but it's a pretty solution.
So, to the question: which would be the better solution performance- and scalability-wise? Which would you go with, or which alternatives would you propose? I don't have a problem implementing any of these; options 2 and 3 would be an interesting development, as I've never done something like that, but I don't want to go with an option that will collapse on itself when the content grows a bit. You're probably thinking "why not just use Drupal if you know it works like you expect it to?", and I'm thinking "you obviously don't know how difficult it is to use Drupal, either that or you're an expert, which I'm most definitely not".
Also, now that I've written all of this, do you think option 2 is a good idea overall? Do you know of a better way to group entities / simulate inheritance? (Please don't say "just use inheritance!"; I'm restricted to using MySQL.)
Thanks for your feedback, I'm sorry if I wrote too much and meant too little.
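For reference, option 2 (a parent `location` table that holds nothing but the surrogate key) could look roughly like the sketch below. It is written against Python's built-in `sqlite3` so it runs without a server; the second location type, the `body` column, and the sample rows are invented for illustration. The ID-only table still does real work: it gives `notes` a single foreign-key target regardless of location type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE location (
    id INTEGER PRIMARY KEY                -- surrogate key only
);
CREATE TABLE location_type_a (
    location_id INTEGER PRIMARY KEY REFERENCES location(id),
    attr1 TEXT,
    attr2 TEXT
);
CREATE TABLE location_type_b (            -- hypothetical second type
    location_id INTEGER PRIMARY KEY REFERENCES location(id),
    attr3 TEXT
);
CREATE TABLE notes (
    id          INTEGER PRIMARY KEY,
    location_id INTEGER NOT NULL REFERENCES location(id),
    body        TEXT NOT NULL
);
""")

def new_location_id():
    """Insert a row into the ID-only parent table and return its key."""
    return conn.execute("INSERT INTO location DEFAULT VALUES").lastrowid

a_id = new_location_id()
conn.execute("INSERT INTO location_type_a VALUES (?, ?, ?)", (a_id, "x", "y"))
b_id = new_location_id()
conn.execute("INSERT INTO location_type_b VALUES (?, ?)", (b_id, "z"))

# Notes attach to any location, whatever its type.
conn.execute("INSERT INTO notes (location_id, body) VALUES (?, ?)", (a_id, "note on A"))
conn.execute("INSERT INTO notes (location_id, body) VALUES (?, ?)", (b_id, "note on B"))

def notes_for(location_id):
    rows = conn.execute(
        "SELECT body FROM notes WHERE location_id = ? ORDER BY id", (location_id,))
    return [body for (body,) in rows]

print(notes_for(a_id), notes_for(b_id))
```

Creating a location takes two inserts (parent row, then the type-specific row), which is the main cost of this layout compared with option 1.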
ORM systems usually use the following, mostly the same solutions as you listed there:
One table per hierarchy
Pros:
Simple approach.
Easy to add new classes, you just need to add new columns for the additional data.
Supports polymorphism by simply changing the type of the row.
Data access is fast because the data is in one table.
Ad-hoc reporting is very easy because all of the data is found in one table.
Cons:
Coupling within the class hierarchy is increased because all classes are directly coupled to the same table.
A change in one class can affect the table which can then affect the other classes in the hierarchy.
Space potentially wasted in the database.
Indicating the type becomes complex when significant overlap between types exists.
Table can grow quickly for large hierarchies.
When to use:
This is a good strategy for simple and/or shallow class hierarchies where there is little or no overlap between the types within the hierarchy.
One table per concrete class
Pros:
Easy to do ad-hoc reporting as all the data you need about a single class is stored in only one table.
Good performance to access a single object’s data.
Cons:
When you modify a class you need to modify its table and the table of any of its subclasses. For example if you were to add height and weight to the Person class you would need to add columns to the Customer, Employee, and Executive tables.
Whenever an object changes its role, perhaps you hire one of your customers, you need to copy the data into the appropriate table and assign it a new POID value (or perhaps you could reuse the existing POID value).
It is difficult to support multiple roles and still maintain data integrity. For example, where would you store the name of someone who is both a customer and an employee?
When to use:
When changing types and/or overlap between types is rare.
One table per class
Pros:
Easy to understand because of the one-to-one mapping.
Supports polymorphism very well as you merely have records in the appropriate tables for each type.
Very easy to modify superclasses and add new subclasses as you merely need to modify/add one table.
Data size grows in direct proportion to growth in the number of objects.
Cons:
There are many tables in the database, one for every class (plus tables to maintain relationships).
Potentially takes longer to read and write data using this technique because you need to access multiple tables. This problem can be alleviated if you organize your database intelligently by putting each table within a class hierarchy on different physical disk-drive platters (this assumes that the disk-drive heads all operate independently).
Ad-hoc reporting on your database is difficult, unless you add views to simulate the desired tables.
When to use:
When there is significant overlap between types or when changing types is common.
Generic Schema
Pros:
Works very well when database access is encapsulated by a robust persistence framework.
It can be extended to provide meta data to support a wide range of mappings, including relationship mappings. In short, it is the start of a mapping meta data engine.
It is incredibly flexible, enabling you to quickly change the way that you store objects because you merely need to update the meta data stored in the Class, Inheritance, Attribute, and AttributeType tables accordingly.
Cons:
Very advanced technique that can be difficult to implement at first.
It only works for small amounts of data because you need to access many database rows to build a single object.
You will likely want to build a small administration application to maintain the meta data.
Reporting against this data can be very difficult due to the need to access several rows to obtain the data for a single object.
When to use:
For complex applications that work with small amounts of data, or for applications where your data access isn’t very common or you can pre-load data into caches.

Database schema - organise by object or data?

I'm refactoring a horribly interwoven db schema, it's not that it's overly normalised; just grown ugly over time and not terribly well laid out.
There are several tables (forum boards, forum posts, idea posts, blog entries) that share virtually identical data structures and composition, but are separated simply because they represent different "objects" from the application's perspective. My initial reaction is to put everything that has the same data structure into the same table, and use a "type" column to distinguish data when performing a select.
Am I setting myself up for a fall by adopting this "all into one" approach and allowing (potentially) so many parts of the application to access the same table? FYI, I can't see this database growing to more than ~20 MB over the next year or so...
There are basically three ways to store an object inheritance hierarchy in a relational database. Each has its own pros and cons. See:
http://www.martinfowler.com/eaaCatalog/singleTableInheritance.html
http://www.martinfowler.com/eaaCatalog/classTableInheritance.html
http://www.martinfowler.com/eaaCatalog/concreteTableInheritance.html
The book is great too. Luck would have it that chapter 3 - "Mapping to Relational Databases" - is available freely as a sample chapter. You can read more about the tradeoffs in there.
I used to dislike this "all into one" approach, but after I was forced to use it on a complex project a few years ago, I became a fan. If you index the table correctly, performance should be OK. You'll want an index on the type column to speed up your sort by type operations, for instance.
I now usually recommend that you use a single table to store similar objects. The only question then, is, do you want to use subtables to store data that's specific to a certain type of object? The answer to this question really depends on how different the structure of each object type is, and how many object types you'll have. If you have 50 object types with vastly differing structures, you may want to consider storing just the consistent object parts in the main table and creating a sub table for each object type.
In your example, however, I think you'd be fine just putting it all into a single table.
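A minimal sketch of the single-table layout with an indexed type column follows, using Python's built-in `sqlite3` so it is runnable as-is. The `posts` table name, its columns, and the type values are assumptions based on the objects listed in the question, not a known schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (
    id    INTEGER PRIMARY KEY,
    type  TEXT NOT NULL
          CHECK (type IN ('forum_post', 'idea_post', 'blog_entry')),
    title TEXT NOT NULL,
    body  TEXT NOT NULL
);
CREATE INDEX posts_type_idx ON posts(type);   -- supports filtering by type
""")
conn.executemany("INSERT INTO posts (type, title, body) VALUES (?, ?, ?)", [
    ("forum_post", "Welcome", "..."),
    ("blog_entry", "Release notes", "..."),
    ("idea_post",  "Dark mode", "..."),
    ("forum_post", "Rules", "..."),
])

def titles_of_type(post_type):
    """Titles of all rows of one object type, in insertion order."""
    rows = conn.execute(
        "SELECT title FROM posts WHERE type = ? ORDER BY id", (post_type,))
    return [title for (title,) in rows]

print(titles_of_type("forum_post"))
```

The `CHECK` constraint is optional; it simply keeps stray type values out of the shared table.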
For more info, see here: http://www.agiledata.org/essays/mappingObjects.html
Don't lean too much on the "application's perspective"; it tends to vary over time anyway. Often a database is accessed by several different applications too, and it usually outlives them all.
When similar objects are stored in different tables, the reason may be that they actually represent the same domain object, but in a different state or at a different step in a workflow. Then it often makes sense to keep them in one table and add some simple attributes to flag the state. If the workflow (or whatever it is) changes, it's easier to change the database and application too, since you may not need to add more tables or classes.

Multiple tables in nested sets hierarchy

I have a number of distinct items stored in different MySQL tables, which I'd like to put in a tree hierarchy. Using the adjacency list model, I can add a parent_id field to each table and link the tables using a foreign key relationship.
However, I'd like to use a nested sets/modified preorder tree traversal model. The data will be used in an environment that's heavily biased towards reads, and the kind of queries I expect to run favour this approach.
The problem is that all the information I have on nested sets assumes that you only have one type of item, stored in a single table. The ways round this that I can think of are:
Having multiple foreign key fields in the tree, one for each table/item type.
Storing the name of the item table in the tree structure as well as the item ID.
Both approaches are inelegant to say the least, so is there a better way of doing this?
RDBMSs are not a good match for storing hierarchies to begin with, and your use case makes this even worse. I think slightly more fine-tuned but still ugly variations of your own suggestions are what you are going to get using an RDBMS. IMHO other data models would provide better solutions to your problem, like graph databases or maybe document databases. The article Should you go Beyond Relational Databases? gives a nice introduction to this kind of stuff.
You have several types of tree, and a single table which contains the tree information (i.e. the left/right values) for all tree types?
If you have several types of tree, why not a different table for each type?
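To make the second option from the question concrete (storing the item table's name next to the item ID), here is a small sketch of a single nested-set table that points at rows in several item tables. It uses Python's built-in `sqlite3`; the item table names (`folders`, `documents`, `images`) and the sample tree are hypothetical. The database cannot enforce a foreign key on the `(item_table, item_id)` pair, which is arguably the inelegance the question refers to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tree (
    id         INTEGER PRIMARY KEY,
    lft        INTEGER NOT NULL,
    rgt        INTEGER NOT NULL,
    item_table TEXT NOT NULL,     -- which item table the node points into
    item_id    INTEGER NOT NULL   -- row id within that table (not FK-enforced)
);
""")
# Nested-set numbering for:
#   folders#1
#   ├── documents#1
#   └── folders#2
#       └── images#1
conn.executemany("INSERT INTO tree VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 8, "folders",   1),
    (2, 2, 3, "documents", 1),
    (3, 4, 7, "folders",   2),
    (4, 5, 6, "images",    1),
])

def descendants(node_id):
    """All nodes strictly inside node_id's (lft, rgt) interval, in tree order."""
    return conn.execute("""
        SELECT child.item_table, child.item_id
        FROM tree AS parent
        JOIN tree AS child
          ON child.lft > parent.lft AND child.rgt < parent.rgt
        WHERE parent.id = ?
        ORDER BY child.lft
    """, (node_id,)).fetchall()

print(descendants(1))
```

Reading a subtree stays a single non-recursive query, which suits a read-heavy workload; fetching the item rows themselves then takes one extra query per item table.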