I have a table that stores nested sets. It stores different nested sets differentiated by a collectionid (yes, I'm mixing terms here; it really should be nestedsetid). It looks somewhat like this:
id | orgid | leftedge | rightedge | level | collectionid
1 | 123 | 1 | 6 | 1 | 1
2 | 111 | 2 | 3 | 2 | 1
3 | 23 | 4 | 5 | 2 | 1
4 | 67 | 1 | 2 | 1 | 2
5 | 123 | 3 | 4 | 1 | 2
6 | 600 | 1 | 6 | 1 | 3
7 | 11 | 2 | 5 | 2 | 3
8 | 111 | 3 | 4 | 3 | 3
Originally I wanted to take advantage of R-Tree indexes, but the code I have seen for this, LineString(Point(-1, leftedge), Point(1, rightedge)), won't quite work since it doesn't take the collectionid into account, and thus id:1 and id:6 would end up being the same.
Is there a way I can use the R-Tree index with my current set up... Surely you can have different nested sets in the same table? My main aim is to be able to use the MBRWithin and MBRContains functions. Using MySQL 5.1
For single-dimensional data (these are 1d intervals, right?) there exist better index structures than R-trees. R-trees are designed for dynamic data in 2-10 dimensions (at higher dimensions, performance isn't too good, as the split strategies and distance functions don't work very well anymore).
Actually for your use case, classic SQL should work very well, and the database can make use of its indexes efficiently. Having a good index structure is one thing, but you want the database to exploit the indexes it has as well as possible.
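As such, I'd just index leftedge and rightedge and use the <, <=, >, >= operators. They are fast! For the collectionid column, a bitmap index should be good where the database offers one; MySQL does not, so there an ordinary B-tree index on collectionid (or a composite index that starts with it) plays that role.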
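As a rough illustration of the plain-SQL approach, here is a minimal sketch that runs the comparison query from Python against an in-memory SQLite database rather than MySQL 5.1. The table name nested_sets, the index name and the descendants helper are made up for the example, and the composite index on (collectionid, leftedge, rightedge) is one plausible choice, not a measured recommendation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE nested_sets (
        id INTEGER PRIMARY KEY,
        orgid INTEGER,
        leftedge INTEGER,
        rightedge INTEGER,
        level INTEGER,
        collectionid INTEGER
    )
""")
# Assumed index layout: the equality column first, then the range columns.
conn.execute(
    "CREATE INDEX idx_collection_edges "
    "ON nested_sets (collectionid, leftedge, rightedge)"
)
# The sample rows from the question.
conn.executemany(
    "INSERT INTO nested_sets VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 123, 1, 6, 1, 1),
        (2, 111, 2, 3, 2, 1),
        (3, 23, 4, 5, 2, 1),
        (4, 67, 1, 2, 1, 2),
        (5, 123, 3, 4, 1, 2),
        (6, 600, 1, 6, 1, 3),
        (7, 11, 2, 5, 2, 3),
        (8, 111, 3, 4, 3, 3),
    ],
)


def descendants(conn, node_id):
    """Return ids strictly inside node_id's interval, same collection only."""
    sql = """
        SELECT child.id
        FROM nested_sets AS parent
        JOIN nested_sets AS child
          ON child.collectionid = parent.collectionid
         AND child.leftedge > parent.leftedge
         AND child.rightedge < parent.rightedge
        WHERE parent.id = ?
        ORDER BY child.leftedge
    """
    return [row[0] for row in conn.execute(sql, (node_id,))]


print(descendants(conn, 1))
print(descendants(conn, 6))
```

Because the join condition includes collectionid, id:1 and id:6 no longer collide even though both span the interval 1 to 6.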
One very efficient way of storing hierarchies in SQL is by using some kind of 'closure table'.
It's a bit more difficult to edit and uses up more space, but it can usually be traversed with a single JOIN, or with a single query if you are only interested in IDs.
This table would contain 1 record for every possible ancestor/descendant relationship, as well as 1 record per item in the real table for which anc=des=id holds.
For this tree:
1
2 4 7
3 5 6
Our SQL table would contain:
+-----+-----+------+-------+
| anc | des | diff | depth |
+-----+-----+------+-------+
| 1 | 1 | 0 | 0 |
| 1 | 2 | 1 | 0 |
| 2 | 3 | 1 | 1 |
| 1 | 3 | 2 | 0 |
| 1 | 4 | 1 | 0 |
| 2 | 5 | 1 | 1 |
| 1 | 5 | 2 | 0 |
| 4 | 6 | 1 | 1 |
| 1 | 6 | 2 | 0 |
| 1 | 7 | 1 | 0 |
| 2 | 2 | 0 | 1 |
| 3 | 3 | 0 | 2 |
| 4 | 4 | 0 | 1 |
| 5 | 5 | 0 | 2 |
| 6 | 6 | 0 | 2 |
| 7 | 7 | 0 | 1 |
+-----+-----+------+-------+
Then the task: "Get all ancestors of node 5", in other words, the 'path to node 5' can be done with the following query:
SELECT `anc` FROM `closure` WHERE `des` = 5
And the task "Get all descendants of node 1" can be done with this query:
SELECT `des` FROM `closure` WHERE `anc` = 1
While "Get all direct descendants of node 1" is done like so:
SELECT `des` FROM `closure` WHERE `diff` = 1 AND `anc` = 1
Finally, "Get all root nodes" is done like this:
SELECT `anc` FROM `closure` WHERE `depth` = 0 AND `anc` = `des`
These four tasks together form the most utilized ways of selecting things from the tree.
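For reference, the four queries can be tried out with a small sketch like the following. It loads the closure rows listed above into an in-memory SQLite database (an assumption made for convenience; the queries themselves are the ones given above, with an ORDER BY added so the output is deterministic). The helper name ids is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE closure (anc INTEGER, des INTEGER, diff INTEGER, depth INTEGER)")
# Rows copied from the table above: (anc, des, diff, depth).
conn.executemany(
    "INSERT INTO closure VALUES (?, ?, ?, ?)",
    [
        (1, 1, 0, 0), (1, 2, 1, 0), (2, 3, 1, 1), (1, 3, 2, 0),
        (1, 4, 1, 0), (2, 5, 1, 1), (1, 5, 2, 0), (4, 6, 1, 1),
        (1, 6, 2, 0), (1, 7, 1, 0), (2, 2, 0, 1), (3, 3, 0, 2),
        (4, 4, 0, 1), (5, 5, 0, 2), (6, 6, 0, 2), (7, 7, 0, 1),
    ],
)


def ids(conn, sql, params=()):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql, params)]


# Path to node 5: ordering by diff descending lists the root first.
path_to_5 = ids(conn, "SELECT anc FROM closure WHERE des = ? ORDER BY diff DESC", (5,))
# All descendants of node 1 (the self row is included).
below_1 = ids(conn, "SELECT des FROM closure WHERE anc = ? ORDER BY des", (1,))
# Direct descendants of node 1.
children_1 = ids(conn, "SELECT des FROM closure WHERE diff = 1 AND anc = ? ORDER BY des", (1,))
# Root nodes.
roots = ids(conn, "SELECT anc FROM closure WHERE depth = 0 AND anc = des")

print(path_to_5, below_1, children_1, roots)
```

Note that the ancestor and descendant queries include the node itself, because of the anc=des=id rows.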
However, in reality, when categorizing things, people can't decide where to put things. Inevitably, something is required to end up in multiple places in the tree. This throws a spanner in the works; two of these naïve queries no longer work (No 1 and No 2).
Note that, in order to prevent problems with stack overflow and recursion, it is true in the example that our graph remains a kind of 'tree'; there are no cycles in it.
The first problem is that "Get all descendants" now has duplicate results. This can be fixed with a GROUP BY clause.
The second, harder, for me unsolved problem is the 1st question: there are now multiple possible paths to a leaf node. Let's split the question into two possible satisfactory results:
Is there a way, using a single or fixed number of JOINs in a single or fixed number of queries, for an arbitrarily deep tree, to get either:
The Canonical path
This is the path with the fewest nodes, leftmost in the tree representation, to a specific leaf node. Note that it is not necessarily true that the tree is 'sorted' like the example in the data structure, as nodes are arbitrarily inserted and removed.
All paths
Gets all possible paths to a specific leaf node.
An illustrative example of why the naïve method fails:
Consider this tree:
1
2 3
5 4
6 6
Asking the question "What are the ancestors of 6" should have two logical answers:
1-2-5-6 and 1-3-4-6.
Yet, using the naïve query and sorting, we can only really get:
1-2-4-6 or 1-3-5-6.
Which are both not actually valid paths.
In all of the tutorials about closure tables I've read, it's plainly stated that closure tables are capable of handling hierarchies where the same item appears in multiple places, but it's never actually explained how to properly do this, just 'left to the reader'. I run into nontrivial problems trying to, however.
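To make the expected "All paths" result concrete, here is a minimal sketch that enumerates the paths with a recursive common table expression. It does not satisfy the fixed-number-of-JOINs constraint asked about above, and it assumes a database that supports WITH RECURSIVE (SQLite here; MySQL before 8.0 does not). The edge table simply holds the direct parent/child links of the second example tree, which correspond to the diff = 1 rows of a closure table; the names edge, walk and all_paths are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (parent INTEGER, child INTEGER)")
# Direct links of the example: 1 -> 2 -> 5 -> 6 and 1 -> 3 -> 4 -> 6.
conn.executemany(
    "INSERT INTO edge VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 5), (3, 4), (5, 6), (4, 6)],
)


def all_paths(conn, leaf):
    """Return every root-to-leaf path as a string like '1-2-5-6'."""
    sql = """
        WITH RECURSIVE walk(top, path) AS (
            -- Start at the leaf itself.
            SELECT :leaf, CAST(:leaf AS TEXT)
            UNION ALL
            -- Extend each partial path upwards by one parent.
            SELECT e.parent, e.parent || '-' || walk.path
            FROM walk
            JOIN edge AS e ON e.child = walk.top
        )
        SELECT path
        FROM walk
        -- Keep only paths whose topmost node has no parent (a root).
        WHERE NOT EXISTS (SELECT 1 FROM edge WHERE child = walk.top)
        ORDER BY path
    """
    return [row[0] for row in conn.execute(sql, {"leaf": leaf})]


print(all_paths(conn, 6))
```

The recursion terminates only because the graph has no cycles, as stated above. Picking a "canonical" path out of this list would still need an ordering rule for siblings, which the edge table in this sketch does not store.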
I want to build a page like the one shown below, with all data retrieved from a database. The terms, subjects and sentences are all retrieved from the database: three levels of data. Under each term (e.g. Spring 2017) I can pick and choose between all of these sentences.
Spring 2017
Subject1
Sentence 1
Sentence 2
Sentence 3
Subject2
Sentence 13
Sentence 12
Sentence 17
Subject3
Sentence 11
Sentence 14
Sentence 19
Autumn 2017
...
I want to present similar info from the database to the user, and let the user choose between all these sentences. How should I build up my database to achieve this in the best and most efficient way?
One way is:
Table 'subject' Table 'sentences'
| id | subjects | | id | subjectid | name |
| 3 | Subject1 | | 1 | 3 | Sentence 2 |
| 4 | Subject2 | | 2 | 4 | Sentence 13 |
Table 'term'
| id | term | sentenceid |
| 1 | Spring 17 | 1,2,28 |
Another way is maybe using pivot-tables, something like this:
Table 'sentences'
| id | parentid | name |
| 1 | 0 | Subject2 |
| 2 | 3 | Sentence 2 |
| 3 | 0 | Subject1 |
| 4 | 1 | Sentence 13 |
Table 'term'
| id | term | sentenceid |
| 1 | Spring 17 | 2,4,28 |
Notice: Number of terms can be many more than just two in a year.
Would you recommend either of these structures, or is there another way you think I should build my database? Is one of these more efficient? Less demanding? Easier to adjust?
You are doing relational analysis/design:
Find all substantives/nouns of your domain. These are candidates for tables.
Find any relationships/associations between those substantives. "Has", "consists of", "belongs to", "depends on" and so on. Divide them into 1:1, 1:n, n:m associations.
Look hard at the 1:1 ones and check if you can reduce two of your original tables into one.
The 1:n ones lead you to foreign keys in one of the tables.
The n:m ones give you additional association tables, possibly with their own attributes.
That's about it. I would strongly advise against optimizing for speed or space at this point. Any modern RDBMS will be totally indifferent to the number of rows you are likely to encounter in your example. All database-related software (ORMs etc.) expects such a clean model. Packing ids into comma-separated fields is an absolute no-no, as it defeats all the mechanisms your RDBMS has to deal with such data; it makes the application harder to program; it confuses GUIs and so on.
Making weird choices in your table setup so they deviate from a clean model of your domain is the #1 cause of trouble along the way. You can optimize for performance later, if and when you actually get into trouble. Except for extreme cases (huge data sets or throughput), such optimisation primarily takes place inside the RDBMS (indexes, storage parameters, buffer management etc.) or by optimizing your queries, not by changing the tables.
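As a sketch of what the n:m step could look like for this domain, the comma-separated sentenceid column can become an association table. The example below uses an in-memory SQLite database; the table and column names (term_sentence, subject_id and so on) and the helper sentences_for_term are assumptions chosen for illustration, not something the question prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE term    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subject (id INTEGER PRIMARY KEY, name TEXT);
    -- 1:n, each sentence belongs to one subject: a foreign key.
    CREATE TABLE sentence (
        id INTEGER PRIMARY KEY,
        subject_id INTEGER REFERENCES subject(id),
        name TEXT
    );
    -- n:m, a term offers many sentences and a sentence can appear
    -- in many terms: an association table instead of '1,2,28'.
    CREATE TABLE term_sentence (
        term_id INTEGER REFERENCES term(id),
        sentence_id INTEGER REFERENCES sentence(id),
        PRIMARY KEY (term_id, sentence_id)
    );
    INSERT INTO term VALUES (1, 'Spring 2017'), (2, 'Autumn 2017');
    INSERT INTO subject VALUES (3, 'Subject1'), (4, 'Subject2');
    INSERT INTO sentence VALUES (1, 3, 'Sentence 2'), (2, 4, 'Sentence 13');
    INSERT INTO term_sentence VALUES (1, 1), (1, 2), (2, 1);
""")


def sentences_for_term(conn, term_name):
    """Return (subject, sentence) pairs offered in the given term."""
    sql = """
        SELECT subject.name, sentence.name
        FROM term
        JOIN term_sentence ON term_sentence.term_id = term.id
        JOIN sentence ON sentence.id = term_sentence.sentence_id
        JOIN subject ON subject.id = sentence.subject_id
        WHERE term.name = ?
        ORDER BY subject.name, sentence.id
    """
    return conn.execute(sql, (term_name,)).fetchall()


print(sentences_for_term(conn, "Spring 2017"))
```

With this layout, adding or removing a sentence for a term is a single-row insert or delete, and the database can enforce that every referenced sentence exists.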
If the data is hierarchical, consider representing it with a single table, with one column referencing a simple lookup for the "entry type".
Table AcademicEntry
==========================================================
| ID | EntryTypeID | ParentAcademicEntryID | Description |
==========================================================
| 1  | 3           | 3                     | Sentence 1  |
| 2  | 1           | <null>                | Spring 2017 |
| 3  | 2           | 2                     | Subject1    |
Table EntryType
====================
| ID | Description |
====================
| 1  | Semester    |
| 2  | Subject     |
| 3  | Sentence    |
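Since this hierarchy has a fixed depth of three, reading it back takes two self-joins. The following is a minimal sketch under that assumption, using the sample rows above in an in-memory SQLite database; the helper name listing is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EntryType (ID INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE AcademicEntry (
        ID INTEGER PRIMARY KEY,
        EntryTypeID INTEGER REFERENCES EntryType(ID),
        ParentAcademicEntryID INTEGER REFERENCES AcademicEntry(ID),
        Description TEXT
    );
    INSERT INTO EntryType VALUES (1, 'Semester'), (2, 'Subject'), (3, 'Sentence');
    INSERT INTO AcademicEntry VALUES
        (1, 3, 3, 'Sentence 1'),
        (2, 1, NULL, 'Spring 2017'),
        (3, 2, 2, 'Subject1');
""")


def listing(conn):
    """Return (semester, subject, sentence) rows for every sentence."""
    sql = """
        SELECT semester.Description, subject.Description, sentence.Description
        FROM AcademicEntry AS sentence
        JOIN AcademicEntry AS subject
          ON subject.ID = sentence.ParentAcademicEntryID
        JOIN AcademicEntry AS semester
          ON semester.ID = subject.ParentAcademicEntryID
        WHERE sentence.EntryTypeID = 3   -- 3 = 'Sentence' in EntryType
        ORDER BY semester.ID, subject.ID, sentence.ID
    """
    return conn.execute(sql).fetchall()


print(listing(conn))
```

The single-table layout does not by itself stop a sentence from being attached directly to a semester; that rule would have to be enforced elsewhere.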
Start with the terms. Every term has subjects. Every subject has sentences. Then you may need the position of a subject within a term and probably the position of a sentence in a subject.
Table 'term'
id | term
---+------------
1 | Spring 2017
Table 'subject'
id | title | termid | pos
---+----------+--------+----
3 | Subject1 | 1 | 1
4 | Subject2 | 1 | 2
5 | Subject3 | 1 | 3
Table 'sentence'
id | name | subjectid | pos
---+-------------+-----------+-----
1 | Sentence 2 | 3 | 2
2 | Sentence 13 | 4 | 1
3 | Sentence 1 | 3 | 1
4 | Sentence 3 | 3 | 3
5 | Sentence 17 | 4 | 3
...
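This table design should meet your needs.
CREATE TABLE tblSeason
(
SeasonId int PRIMARY KEY,
SeasonName varchar(30)
);
CREATE TABLE tblSubject
(
SubjectId int PRIMARY KEY,
SeasonId int REFERENCES tblSeason (SeasonId), -- FK to tblSeason
SubjectData varchar(max)
);
CREATE TABLE tblSentences
(
SentencesId int PRIMARY KEY,
SubjectId int REFERENCES tblSubject (SubjectId), -- FK to tblSubject
SentenceData varchar(max)
);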
Following the table in the link below, I have a category table and another table named "package" that stores a category id.
http://ftp.nchu.edu.tw/MySQL/tech-resources/articles/hierarchical-data.html
Category
+-------------+----------------------+--------+
| category_id | name | parent |
+-------------+----------------------+--------+
| 1 | ELECTRONICS | NULL |
| 2 | TELEVISIONS | 1 |
| 3 | TUBE | 2 |
| 4 | LCD | 2 |
| 5 | PLASMA | 2 |
| 6 | PORTABLE ELECTRONICS | 1 |
| 7 | MP3 PLAYERS | 6 |
| 8 | FLASH | 7 |
| 9 | CD PLAYERS | 6 |
| 10 | 2 WAY RADIOS | 6 |
+-------------+----------------------+--------+
Is there any way I can left join until there is no parent left, without knowing how many times I have to join?
And a second question: my table "package" only stores the last/smallest category id, for example "8 - FLASH" from the table above. Is it good practice to store only the last/smallest category id and resolve the rest by joining the table? Will this put a heavy load on the database, since it has to be queried back every time?
Thanks in advance!
It is not possible to do such queries in a single statement in MySQL versions before 8.0, which have no recursive queries.
If you need to keep this database structure, then the fastest approach is likely to select the relevant data from the table and then process the data client-side into the appropriate array/join.
The above may not work well if you cannot sufficiently narrow down the number of rows to SELECT out, in which case, recursively running multiple queries may be faster. On your second query, the best approach is to do something like WHERE id IN (list_of_parent_values) rather than 1 query per parent.
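Both client-side variants can be sketched roughly as follows, using the category table from the question in an in-memory SQLite database (an assumption; the same SQL is plain enough for MySQL). The helper names path_to_root and subtree_ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT, parent INTEGER)")
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?)",
    [
        (1, "ELECTRONICS", None), (2, "TELEVISIONS", 1), (3, "TUBE", 2),
        (4, "LCD", 2), (5, "PLASMA", 2), (6, "PORTABLE ELECTRONICS", 1),
        (7, "MP3 PLAYERS", 6), (8, "FLASH", 7), (9, "CD PLAYERS", 6),
        (10, "2 WAY RADIOS", 6),
    ],
)


def path_to_root(conn, category_id):
    """Variant 1: fetch the rows once, then walk the parents in memory."""
    rows = conn.execute("SELECT category_id, name, parent FROM category")
    by_id = {cid: (name, parent) for cid, name, parent in rows}
    path = []
    current = category_id
    while current is not None:
        name, parent = by_id[current]
        path.append(name)
        current = parent
    return list(reversed(path))


def subtree_ids(conn, root_id):
    """Variant 2: one query per level, using WHERE parent IN (...)."""
    found, frontier = [], [root_id]
    while frontier:
        marks = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT category_id FROM category WHERE parent IN ({marks}) "
            "ORDER BY category_id",
            frontier,
        ).fetchall()
        frontier = [row[0] for row in rows]
        found.extend(frontier)
    return found


print(path_to_root(conn, 8))
print(subtree_ids(conn, 6))
```

The second variant issues as many queries as the subtree has levels plus one, rather than one query per node.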
Lastly if you can change your data structure, there is a way of using special tree column values to efficiently select all of the nodes out with a single SQL query. Some more work is required to insert and re-organise the tree however.
There are a number of slightly differing implementations of this, see here for one such discussion:
http://web.archive.org/web/20110606032941/http://dev.mysql.com/tech-resources/articles/hierarchical-data.html
awesome_nested_set is also a ruby implementation of this pattern:
https://github.com/collectiveidea/awesome_nested_set
I would like to model a hierarchy/directory like the one you see below in a MySQL table. Below it is the table schema I was thinking of.
However, the directory I'm talking about would comprise 100,000 elements, and the depth would be ~5-10 levels. Moreover, we will have a pool of tags, and each element of the directory may be linked to one or more tags. So I was wondering if there is a better approach. I have read that some people decide to design tables that are not canonical for the sake of high performance, and I am evaluating this option too.
ps: some people use Multi-way Trees to model this in the programming language level so the question how this ends up in the database remains.
hierarchy:
A
|-> 1
|   |-> 1
|   |-> 2
|-> 2
|-> 3
B
|-> 1
|-> 2
table:
___________________________
| id |element | father |
|---------------------------|
| 000 | A | null |
| 001 | 1 | 000 |
| 002 | 1 | 001 |
| 003 | 2 | 001 |
| 004 | 2 | 000 |
| 005 | 3 | 000 |
| 006 | B | null |
| 007 | 1 | 006 |
| 008 | 2 | 006 |
-----------------------------
A very fast hierarchical tree, when you are on MySQL, is a nested set or Celko tree; it's a bit like a binary tree or a Huffman tree. The disadvantage is expensive deletion and insertion. Other RDBMSs also support recursive queries. In general I haven't seen many nested sets in practice; they seem to be complicated to create and maintain. When the nested set is too complicated and the RDBMS doesn't support recursive queries, there is also the materialized path.
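A materialized path can be sketched in a few lines. In the example below each row stores the ids on the way from its root as a string such as '0/1/3/', so a subtree is a prefix match. The layout (table node, column path, '/' as separator, a trailing separator on every path) is one common convention picked for illustration, not the only one, and the example runs on an in-memory SQLite database rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, element TEXT, path TEXT)")
# The hierarchy from the question: A -> (1 -> (1, 2), 2, 3) and B -> (1, 2).
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [
        (0, "A", "0/"), (1, "1", "0/1/"), (2, "1", "0/1/2/"),
        (3, "2", "0/1/3/"), (4, "2", "0/4/"), (5, "3", "0/5/"),
        (6, "B", "6/"), (7, "1", "6/7/"), (8, "2", "6/8/"),
    ],
)
# An index on path lets a prefix search avoid scanning every row in many engines.
conn.execute("CREATE INDEX idx_node_path ON node (path)")


def descendants(conn, node_id):
    """Return ids of all nodes whose path starts with node_id's path."""
    sql = """
        SELECT d.id
        FROM node AS a
        JOIN node AS d
          ON d.path LIKE a.path || '%'
         AND d.id <> a.id
        WHERE a.id = ?
        ORDER BY d.path
    """
    return [row[0] for row in conn.execute(sql, (node_id,))]


print(descendants(conn, 0))
print(descendants(conn, 1))
```

The trailing separator matters: without it, the prefix '0/1' would also match a path such as '0/10/'. Moving a subtree means rewriting the path of every node in it, which is the maintenance cost of this scheme.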
http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html
http://en.wikipedia.org/wiki/Binary_tree
http://en.wikipedia.org/wiki/Huffman_coding
http://www.postgresql.org/docs/8.4/static/queries-with.html
Is it possible to make a recursive SQL query?
http://www.cybertec.at/pgbook/node122.html
Let's say we have a table that looks like this:
connection_requirements
+-----------------------------------+
| item_id | connector_id | quantity |
+---------+--------------+----------+
| 1 | 4 | 1 |
| 1 | 5 | 1 |
| 1 | 2 | 2 |
+---------+--------------+----------+
This table is a list of connectors that an electronic device requires to operate, and how many of each type of connector it requires. (Think connections on a motherboard requiring certain types of connectors from a power supply.)
Now we also have this table...
connections_compatability
+-------------------------+
| connector_id | works_as |
+--------------+----------+
| 6 | 4 |
| 6 | 5 |
+--------------+----------+
Where the first column is the connector that can also act as the connector id of the second column. (For instance a power supply has connectors such as "6+2 Pin" which can work as "8 Pin" or "6 Pin")
Now finally we have how many of each connectors are available in this table
connector_quantities
+-------------------------+
| connector_id | quantity |
+--------------+----------+
| 1 | 1 |
| 2 | 3 |
| 3 | 2 |
| 4 | 1 |
| 5 | 0 |
| 6 | 4 |
| 7 | 0 |
| 8 | 5 |
+--------------+----------+
Based off these tables, as you can infer, we do have enough connectors for item number 1 to properly operate. Even though we do not have enough of connector #5, we have 4 connector #6s, which can work as connector #4 and #5.
The connection_requirements table is joined onto the items table. How can we filter out items that require more connections than we have available? We already have the code in place to filter items that require connectors that are unavailable.
The problem has many more layers of complexity to it, so we tried to simplify the problem.
Much appreciation for all the help!
One approach is to determine the "real" inventory of items including their substitutions. E.g., the real inventory of part 4 is actually 5: 1-part #4 + 4-part #6. So using that:
Select ...
From connection_requirements As CR
Where Not Exists (
Select 1
From connector_quantities As Q1
Left Join (
Select C2.connector_id, C2.works_as, Q2.quantity
From connections_compatability As C2
Join connector_quantities As Q2
On Q2.connector_id = C2.connector_id
) As Subs1
On Subs1.works_as = Q1.connector_id
Where Q1.connector_id = CR.connector_id
And ( Coalesce(Subs1.quantity, 0) + Q1.quantity ) >= CR.quantity
)
There is of course a catch with this approach. Suppose you have an item with a makeup of: 4 #4 connectors and 2 #6 connectors. Technically, you do have 4 #4 connectors (1 #4, and 3 #6 substitutions) but in combination with the requirement of 2 #6 connectors, you do not have enough parts. To solve this problem you would likely have to use a loop or multiple queries which would determine on-hand inventory after you use up all your primary parts.
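The loop idea could look roughly like the sketch below: use up the primary parts first, then cover any shortfall from what is left of the substitutes. It is written in plain Python over data already fetched from the three tables; the function name can_satisfy and the dictionary/list shapes are assumptions for illustration. This greedy order is a simplification: when several substitutes compete for several requirements it can reject a combination that a smarter assignment would satisfy.

```python
def can_satisfy(requirements, compat, stock):
    """Check one item's requirements against the on-hand inventory.

    requirements: {connector_id: quantity needed}
    compat:       list of (connector_id, works_as) pairs
    stock:        {connector_id: quantity on hand}
    """
    remaining = dict(stock)  # work on a copy; the caller's data is untouched
    shortfall = {}

    # Pass 1: use up the primary parts.
    for connector, needed in requirements.items():
        used = min(needed, remaining.get(connector, 0))
        remaining[connector] = remaining.get(connector, 0) - used
        if needed > used:
            shortfall[connector] = needed - used

    # Pass 2: cover what is missing from the leftover substitutes.
    for connector, missing in shortfall.items():
        for substitute, works_as in compat:
            if works_as != connector or missing == 0:
                continue
            used = min(missing, remaining.get(substitute, 0))
            remaining[substitute] = remaining.get(substitute, 0) - used
            missing -= used
        if missing > 0:
            return False
    return True


# Data from the tables in the question.
stock = {1: 1, 2: 3, 3: 2, 4: 1, 5: 0, 6: 4, 7: 0, 8: 5}
compat = [(6, 4), (6, 5)]

item_1 = {4: 1, 5: 1, 2: 2}
print(can_satisfy(item_1, compat, stock))

# The catch described above: 4 x #4 and 2 x #6.
catch = {4: 4, 6: 2}
print(can_satisfy(catch, compat, stock))
```

Item 1 passes because one #6 stands in for the missing #5, while the second case fails because only two #6 connectors remain after the direct #6 requirement is met.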