Is hierarchical data in MySQL as fast to retrieve as XML?

I've got a list of all countries -> states -> cities (-> subcities/villages etc.) in an XML file, and retrieving, for example, all the cities of a state is really quick with XML (using an XML parser).
I wonder: if I put all this information in MySQL, is retrieving all the cities of a state as fast as with XML? XML is designed to store hierarchical data, while relational databases like MySQL are not.
The list contains about 500,000 entities, so I wonder if it's as fast as XML using either of:
Adjacency list model
Nested Set model
And which one should I use? Theoretically there could be unlimited levels under a state (I heard that the adjacency list isn't good for unlimited child levels). And which is fastest for a dataset this large?
Thanks!

In this article Quassnoi creates a table with 2,441,405 rows in a hierarchical structure and tests the performance of highly optimized queries for nested sets and adjacency lists. He runs a variety of tests, for example fetching ancestors or descendants, and times the results (read the article for details of exactly what was tested):
Query                                   Nested Sets   Adjacency Lists
All descendants                         300 ms        7000 ms
All ancestors                           15 ms         600 ms
All descendants up to a certain level   5000 ms       600 ms
His conclusion is that for MySQL nested sets are faster to query but have the drawback of being much slower to update. If you have infrequent updates, use nested sets. Otherwise prefer adjacency lists.
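To make the trade-off concrete, here is a minimal sketch of both models on the same tiny tree. It is not taken from the article: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the `place` table, its columns and the sample rows are hypothetical. The nested set lookup is a single range query, while the adjacency list lookup (without recursive query support) needs one query per level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE place (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES place(id),  -- adjacency list pointer
        lft       INTEGER NOT NULL,              -- nested set left bound
        rgt       INTEGER NOT NULL               -- nested set right bound
    )
""")
# Hypothetical sample data; lft/rgt come from a preorder walk of the tree.
conn.executemany("INSERT INTO place VALUES (?, ?, ?, ?, ?)", [
    (1, "USA",         None, 1, 12),
    (2, "California",  1,    2, 9),
    (3, "Los Angeles", 2,    3, 6),
    (4, "Hollywood",   3,    4, 5),
    (5, "San Diego",   2,    7, 8),
    (6, "Texas",       1,    10, 11),
])

def descendants_nested_set(conn, node_id):
    """All descendants in one query: rows whose bounds lie inside the node's."""
    rows = conn.execute("""
        SELECT child.name
        FROM place AS node
        JOIN place AS child
          ON child.lft > node.lft AND child.rgt < node.rgt
        WHERE node.id = ?
        ORDER BY child.lft
    """, (node_id,))
    return [name for (name,) in rows]

def descendants_adjacency(conn, node_id):
    """All descendants with one query per tree level (no recursive SQL)."""
    names, frontier = [], [node_id]
    while frontier:
        marks = ",".join("?" * len(frontier))
        found = conn.execute(
            f"SELECT id, name FROM place WHERE parent_id IN ({marks}) ORDER BY id",
            frontier,
        ).fetchall()
        names.extend(name for _, name in found)
        frontier = [child_id for child_id, _ in found]
    return names
```

Both functions return the same set of names for a given node. The cost shows up on writes instead: inserting a new city under California means shifting the `lft`/`rgt` values of every row to its right, whereas the adjacency list only needs the new row's `parent_id`.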
You might also wish to consider if using another database that supports recursive CTEs is an option for you.
I would imagine that an XML file of this size would take a reasonably long time to parse, but if you can cache the parsed structure in memory rather than reading it from disk each time then queries against it will be very fast.
Note that the main drawback of using MySQL for storing hierarchical data is that it requires some very complex queries. Whilst you can just copy the code from the article I linked to, if you ever need to modify it slightly you will have to understand how it works. If you prefer to keep things simple then XML definitely has an advantage, as it was designed for this type of data, so you should easily be able to create the queries you need.

Related

Adjacency List Model vs Nested Set Model for MySQL hierarchical data?

There are two ways to work with hierarchy data in MySQL:
Adjacency List Model
Nested Set Model
A major problem with the Adjacency List Model is that we need to run one query for each node to get the path through the hierarchy.
In the Nested Set Model this problem does not exist, but every added node requires a MySQL UPDATE of the left and right values of all the other nodes.
My hierarchical data is not static like, say, the product categories of an e-commerce site: users are constantly being registered in a hierarchical sequence.
In my application, while there are many constant user registrations, I also need to get the hierarchical path back up to the first node in the hierarchy.
Given my situation, which of the two alternatives would be best for my application?
The Nested Set Model is not commonly used in databases nowadays, since it is more complex than the Adjacency List Model: it requires managing two "pointers" instead of a single one. In fact, the Nested Set Model was introduced at a time when it was complex or impossible to write recursive queries that traverse a hierarchy.
Since SQL:1999, standard SQL has included so-called recursive common table expressions (recursive CTEs), which make it simpler (and standardized!) to write queries that traverse a recursive path within a hierarchy with any number of levels.
All the major DBMSs have now included this feature, with one notable exception: MySQL. But in MySQL you can overcome this problem with stored procedures. See, for instance, this post on StackOverflow, or this post on dba.stackexchange.
So, in summary, this is my advice:
If you can still decide which DBMS to use, strongly consider some alternatives: for instance, if you want to stick with an open source database, use PostgreSQL, use the Adjacency List Model, and go with recursive CTEs for your queries.
If you cannot change the DBMS, you should still go with the Adjacency List Model, and use stored procedures like those cited in the references.
UPDATE
This situation has changed with MySQL 8.0, which supports recursive CTEs, so from that version on the Adjacency List Model is simpler to use.
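As an illustration of the recursive CTE approach described above, here is a minimal sketch that walks from a user up to the first node of the hierarchy. It runs against SQLite through Python's sqlite3 module purely so that it is self-contained; `WITH RECURSIVE` is the standard form that PostgreSQL and MySQL 8.0 also accept, though details may differ between systems. The `member` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE member (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES member(id)  -- NULL for the first node
    )
""")
# Hypothetical sample data: registering a user is a single INSERT.
conn.executemany("INSERT INTO member VALUES (?, ?, ?)", [
    (1, "root",  None),
    (2, "alice", 1),
    (3, "bob",   2),
    (4, "carol", 3),
    (5, "dave",  1),
])

PATH_TO_ROOT = """
    WITH RECURSIVE path(id, name, parent_id, depth) AS (
        -- anchor: the node we start from
        SELECT id, name, parent_id, 0 FROM member WHERE id = ?
        UNION ALL
        -- recursive step: climb to the parent of the previous row
        SELECT m.id, m.name, m.parent_id, p.depth + 1
        FROM member AS m
        JOIN path AS p ON m.id = p.parent_id
    )
    SELECT name FROM path ORDER BY depth
"""

def path_to_root(conn, node_id):
    """Return the names from the given node up to the first node."""
    return [name for (name,) in conn.execute(PATH_TO_ROOT, (node_id,))]
```

With this layout a new registration touches only one row, and the whole path comes back from a single query regardless of how deep the hierarchy is.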

Structure of MongoDB vs MySQL

As mentioned in the following article: http://www.couchbase.com/why-nosql/nosql-database
When looking up data, the desired information needs to be collected from many tables (often hundreds in today’s enterprise applications) and combined before it can be provided to the application. Similarly, when writing data, the write needs to be coordinated and performed on many tables.
and, of the example it gives of data in JSON format, it says:
ease of efficiently distributing the resulting documents and read and write performance improvements make it an easy trade-off for web-based applications
But what if I capture all my data in a single table in MySQL, as is done in MongoDB [in the link given]? Would the performance then be equivalent to MongoDB's [meaning extracting data from MySQL without JOINs]?
It all depends on the structure you require. The main point of splitting data into tables is being able to index pieces of data, accelerating the retrieval of data.
Another point is that the normalization that a relational database offers ties you to a rigid structure. You can, of course, store JSON in MySQL, but the JSON document won't have its pieces indexed. If you want fast retrieval of a JSON document by its pieces, then you are looking at splitting it into parts.
If the structure of your data can change, meaning it doesn't require a fixed schema, then use MongoDB.
If your data structure doesn't change, then I'd go with MySQL.

Most efficient multi level commenting system

I'm building a multi level commenting system and need a solution for quick reads and writes.
I've looked into adjacency list and nested set and it seems to me that for my particular scenario neither is the right method to use, so I'm looking into non RDBMS solutions as well.
What I would like to achieve:
Multi-level parent/child relationship
Lots of reads and lots of writes
Adding/editing any child at any level
Sorting the entire tree by datetime (old/new) or voting score
I feel like the best solution for an RDBMS is the adjacency list, where you have recursive reads. But this is very inefficient, because there will be thousands of reads per minute. Nested set is great for reads, but I will have a lot of writes too, which will make it really slow and inefficient.
Do you know any other techniques that I could use here? Maybe other types of databases?
Most comment threads are very small, less than a few kilobytes. So rather than storing each comment as its own record in the database, you can store the entire comment graph as a single object. This makes it very easy to read and write the comment tree quickly.
This method lends itself very well to a shared cache such as Redis or Memcached.
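A minimal sketch of that idea, assuming the whole thread for an article is serialized as one JSON document under a single key. The in-memory `store` dict is a hypothetical stand-in for a Redis/Memcached key space or a single text column, and the function names and the `comments:<article_id>` key format are made up for illustration.

```python
import json

# Hypothetical stand-in for a cache or a single text column keyed by article.
store = {}

def load_thread(article_id):
    """Read the whole comment tree for an article in one lookup."""
    raw = store.get(f"comments:{article_id}")
    return json.loads(raw) if raw else {"next_id": 1, "comments": []}

def save_thread(article_id, thread):
    """Write the whole comment tree back as one serialized object."""
    store[f"comments:{article_id}"] = json.dumps(thread)

def find(comments, comment_id):
    """Depth-first search for a comment anywhere in the tree."""
    for comment in comments:
        if comment["id"] == comment_id:
            return comment
        hit = find(comment["replies"], comment_id)
        if hit is not None:
            return hit
    return None

def add_comment(article_id, text, score=0, parent_id=None):
    """Attach a comment at any level and persist the updated tree."""
    thread = load_thread(article_id)
    comment = {"id": thread["next_id"], "text": text, "score": score, "replies": []}
    thread["next_id"] += 1
    if parent_id is None:
        siblings = thread["comments"]
    else:
        parent = find(thread["comments"], parent_id)
        if parent is None:
            raise KeyError(parent_id)
        siblings = parent["replies"]
    siblings.append(comment)
    save_thread(article_id, thread)
    return comment["id"]

def sorted_by_score(comments):
    """Return a copy of the tree with every level sorted by score, highest first."""
    ordered = sorted(comments, key=lambda c: c["score"], reverse=True)
    return [dict(c, replies=sorted_by_score(c["replies"])) for c in ordered]
```

Sorting happens in application code after a single read, so the same stored object can be rendered oldest-first, newest-first or by score. Note that the read-modify-write in `add_comment` is not atomic; with a real shared store, concurrent writers would need some form of locking or optimistic retry.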

What is the best practice for fetching a tree of nodes from database for further rendering?

Let's say we have a table with user comments. First-level comments have a reference to the article they are attached to. Deeper-level comments do not have this reference by design, but they have a reference to their parent comment.
For this database structure, what would be the most efficient way to fetch all comments for a given article and then render them as HTML? (Let's assume that we have approx. 200 first-level comments and a maximum depth of 20.)
I usually recommend a design called Closure Table.
See example in my answer to What is the most efficient/elegant way to parse a flat table into a tree?
I also designed this presentation: Models for Hierarchical Data with SQL and PHP. I developed a PHP app that renders a tree in 0.3 seconds from a collection of hierarchical data with 490k nodes.
I blogged about Closure Table here: Rendering Trees with Closure Table.
I wrote a chapter about different strategies for hierarchical data in my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
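For readers unfamiliar with the design, here is a minimal sketch of a closure table. It is not the code from the linked answer or presentation: it uses Python's sqlite3 module so that it is self-contained, and the `comment` and `comment_path` tables and their columns are hypothetical names. The idea is to store one row for every ancestor/descendant pair (including each node paired with itself), so that a whole subtree can be fetched with one non-recursive query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comment (
        id   INTEGER PRIMARY KEY,
        body TEXT NOT NULL
    );
    -- One row per (ancestor, descendant) pair, including self-pairs at depth 0.
    CREATE TABLE comment_path (
        ancestor   INTEGER NOT NULL REFERENCES comment(id),
        descendant INTEGER NOT NULL REFERENCES comment(id),
        depth      INTEGER NOT NULL,
        PRIMARY KEY (ancestor, descendant)
    );
""")

def add_comment(conn, body, parent_id=None):
    """Insert a comment and link it to itself and to all of its ancestors."""
    new_id = conn.execute(
        "INSERT INTO comment (body) VALUES (?)", (body,)
    ).lastrowid
    conn.execute("INSERT INTO comment_path VALUES (?, ?, 0)", (new_id, new_id))
    if parent_id is not None:
        # Every ancestor of the parent (and the parent itself) gains a descendant.
        conn.execute("""
            INSERT INTO comment_path (ancestor, descendant, depth)
            SELECT ancestor, ?, depth + 1
            FROM comment_path
            WHERE descendant = ?
        """, (new_id, parent_id))
    return new_id

def subtree(conn, root_id):
    """Fetch a comment and all of its descendants in a single query."""
    return conn.execute("""
        SELECT c.id, c.body, p.depth
        FROM comment_path AS p
        JOIN comment AS c ON c.id = p.descendant
        WHERE p.ancestor = ?
        ORDER BY p.depth, c.id
    """, (root_id,)).fetchall()
```

The price is extra storage: a comment at depth d adds d + 1 rows to `comment_path`. In exchange, both reads and inserts stay simple, and no existing rows need to be renumbered when a comment is added.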
For the most efficient way Quassnoi has written a series of articles on this subject.
Hierarchical queries in MySQL
Hierarchical queries in MySQL: adding level
Hierarchical queries in MySQL: adding ancestry chains
Hierarchical queries in MySQL: finding leaves
Hierarchical queries in MySQL: finding loops
I suggest you read the first article and adapt the examples to work with your specific table, but the crux is to make a function that can recurse over the rows you need to fetch. You probably also want the level (depth in the hierarchy), so the second article is probably relevant too.
The other articles may be useful if you need to make other types of queries on your data. He also has an article Adjacency list vs. nested sets: MySQL in which he compares highly optimized queries for both the adjacency model and the nested set model.
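Whichever query is used to fetch the rows, the rendering half of the question can be handled in application code. The following is a minimal sketch, not taken from the articles above, that assumes the comments for one article have already been fetched as flat `(id, parent_id, body)` tuples, with `parent_id` set to `None` for first-level comments. The function name and the row shape are hypothetical.

```python
from collections import defaultdict
from html import escape

def render_comments(rows):
    """Render flat (id, parent_id, body) rows as nested HTML lists.

    parent_id is None for first-level comments. Rows keep the order in
    which they were fetched within each level.
    """
    # Group children by parent in one pass over the fetched rows.
    children = defaultdict(list)
    for comment_id, parent_id, body in rows:
        children[parent_id].append((comment_id, body))

    def render(parent_id):
        items = children.get(parent_id)
        if not items:
            return ""
        parts = [
            f"<li>{escape(body)}{render(comment_id)}</li>"
            for comment_id, body in items
        ]
        return "<ul>" + "".join(parts) + "</ul>"

    return render(None)
```

Grouping by parent first keeps the work linear in the number of rows, and the recursion depth equals the depth of the tree, which is small for the 20 levels mentioned in the question.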

Multiple tables in nested sets hierarchy

I have a number of distinct items stored in different MySQL tables, which I'd like to put in a tree hierarchy. Using the adjacency list model, I can add a parent_id field to each table and link the tables using a foreign key relationship.
However, I'd like to use a nested sets/modified preorder tree traversal model. The data will be used in an environment that's heavily biased towards reads, and the kind of queries I expect to run favour this approach.
The problem is that all the information I have on nested sets assumes that you only have one type of item, stored in a single table. The ways round this that I can think of are:
Having multiple foreign key fields in the tree, one for each table/item type.
Storing the name of the item table in the tree structure as well as the item ID.
Both approaches are inelegant to say the least, so is there a better way of doing this?
RDBMSs are not a good match for storing hierarchies to begin with, and your use case makes this even worse. I think slightly more fine-tuned but still ugly variations of your own suggestions are what you are going to get using an RDBMS. IMHO other data models would provide better solutions to your problem, like graph databases or maybe document databases. The article Should you go Beyond Relational Databases? gives a nice introduction to this kind of stuff.
So you have several types of tree, and a single table that contains the tree information (i.e. the left/right values) for all tree types?
If you have several types of tree, why not use a different table for each type?