Nested Set Query to retrieve all ancestors of each node - mysql

I have a MySQL query that I thought was working fine to retrieve all the ancestors of each node, ordered from the top node down to the node's immediate parent. However, when I added a 5th level to the nested set, it broke.
Below are example tables, queries and SQL Fiddles:
Four Level Nested Set:
CREATE TABLE Tree
(title varchar(20) PRIMARY KEY,
`tree` int,
`left` int,
`right` int);
INSERT Tree
VALUES
("Food", 1, 1, 18),
('Fruit', 1, 2, 11),
('Red', 1, 3, 6),
('Cherry', 1, 4, 5),
('Yellow', 1, 7, 10),
('Banana', 1, 8, 9),
('Meat', 1, 12, 17),
('Beef', 1, 13, 14),
('Pork', 1, 15, 16);
The Query:
SELECT t0.title node
,(SELECT GROUP_CONCAT(t2.title)
FROM Tree t2
WHERE t2.left<t0.left AND t2.right>t0.right
ORDER BY t2.left) ancestors
FROM Tree t0
GROUP BY t0.title;
The returned result for node Banana is Food,Fruit,Yellow - Perfect. You can see this here SQL Fiddle - 4 Levels
When I run the same query on the 5 level table below, the 5th level nodes come back in the wrong order:
CREATE TABLE Tree
(title varchar(20) PRIMARY KEY,
`tree` int,
`left` int,
`right` int);
INSERT Tree
VALUES
("Food", 1, 1, 24),
('Fruit', 1, 2, 13),
('Red', 1, 3, 8),
('Cherry', 1, 4, 7),
('Cherry_pie', 1, 5, 6),
('Yellow', 1, 9, 12),
('Banana', 1, 10, 11),
('Meat', 1, 14, 23),
('Beef', 1, 15, 16),
('Pork', 1, 17, 22),
('Bacon', 1, 18, 21),
('Bacon_Sandwich', 1, 19, 20);
The returned result for Bacon_Sandwich is Bacon,Food,Meat,Pork which is not the right order, it should be Food,Meat,Pork,Bacon - You can see this here SQL Fiddle - 5 Levels
I am not sure what is happening because I don't really understand subqueries well enough. Can anyone shed any light on this?
EDIT AFTER INVESTIGATION:
Woah!! Looks like writing all this out and reading up about ordering with GROUP_CONCAT gave me some inspiration.
Adding ORDER BY to the actual GROUP_CONCAT function and removing from the end of the subquery solved the issue. I now receive Food,Meat,Pork,Bacon for the node Bacon_Sandwich
SELECT t0.title node
,(SELECT GROUP_CONCAT(t2.title ORDER BY t2.left)
FROM Tree t2
WHERE t2.left<t0.left AND t2.right>t0.right
) ancestors
FROM Tree t0
GROUP BY t0.title;
I still have no idea why though. Having ORDER BY at the end of the subquery works for 4 levels but not for 5?!?!
If someone could explain what the issue is and why moving the ORDER BY fixes it, I'd be most grateful.

First, it's important to understand that you have an implicit GROUP BY. From the MySQL documentation:
If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.
To make the point easier to follow, I'll leave out the subqueries and reduce the problem to the banana. Banana is the set [10, 11]. Its ancestors, correctly sorted, are returned by:
SELECT "banana" as node, GROUP_CONCAT(title ORDER by `left`)
FROM Tree WHERE `left` < 10 AND `right` > 11
GROUP BY node;
The ORDER BY must go inside GROUP_CONCAT() because you want the aggregate function itself to sort its input. An ORDER BY outside the function sorts the rows of the result, i.e. the output of GROUP_CONCAT(), which here is a single row. The fact that it worked up to level 4 is just luck. The outer ORDER BY has no effect on the aggregate function; you would get the same results with or without it:
SELECT GROUP_CONCAT(title)
FROM Tree WHERE `left` < 10 AND `right` > 11
/* ORDER BY `left` */
It might help to understand what the following statement does:
SELECT GROUP_CONCAT(title ORDER BY `left`) FROM Tree WHERE … ORDER BY `left`
Get a selection (WHERE) which results in three rows in an undefined order:
("Food")
("Yellow")
("Fruit")
Aggregate the result into one row (implicit GROUP BY) in order to be able to use an aggregate function:
(("Food","Yellow", "Fruit"))
Fire the aggregate function (GROUP_CONCAT(title ORDER BY `left`)) on it, i.e. order by `left` and then concatenate:
("Food,Fruit,Yellow")
And now finally it sorts that result (ORDER BY). As it's only one row, sorting changes nothing.
("Food,Fruit,Yellow")
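The steps above can be tried out on the 5-level table from the question. This is only a small sketch: the first statement lists the rows that feed the aggregate, the second shows the in-function ORDER BY at work. The ' > ' separator is an illustrative choice and not part of the original query.

```sql
-- Rows selected by the WHERE clause for Banana (left = 10, right = 11).
-- Here the outer ORDER BY does matter, because there is no aggregation.
SELECT title, `left`, `right`
FROM Tree
WHERE `left` < 10 AND `right` > 11
ORDER BY `left`;

-- Same rows collapsed into one string; the sort happens inside GROUP_CONCAT().
-- SEPARATOR is optional and only changes the delimiter (default is a comma).
SELECT GROUP_CONCAT(title ORDER BY `left` SEPARATOR ' > ') AS ancestors
FROM Tree
WHERE `left` < 10 AND `right` > 11;
-- ancestors: Food > Fruit > Yellow
```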

You can get the result using JOIN or SUB-QUERY.
Using JOIN:
SELECT t0.title node, GROUP_CONCAT(t2.title ORDER BY t2.left) ancestors
FROM Tree t0
LEFT JOIN Tree t2 ON t2.left < t0.left AND t2.right > t0.right
GROUP BY t0.title;
Check this SQL FIDDLE DEMO
Using SUB-QUERY:
SELECT t0.title node,
(SELECT GROUP_CONCAT(t2.title ORDER BY t2.left)
FROM Tree t2 WHERE t2.left<t0.left AND t2.right>t0.right) ancestors
FROM Tree t0
GROUP BY t0.title;
Check this SQL FIDDLE DEMO
OUTPUT
| NODE | ANCESTORS |
|----------------|-----------------------|
| Bacon | Food,Meat,Pork |
| Bacon_Sandwich | Food,Meat,Pork,Bacon |
| Banana | Food,Fruit,Yellow |
| Beef | Food,Meat |
| Cherry | Food,Fruit,Red |
| Cherry_pie | Food,Fruit,Red,Cherry |
| Food | (null) |
| Fruit | Food |
| Meat | Food |
| Pork | Food,Meat |
| Red | Food,Fruit |
| Yellow | Food,Fruit |
In your sub-query you used ORDER BY after the WHERE clause, which doesn't affect the output. Without an ORDER BY inside the function, GROUP_CONCAT() concatenates the rows in whatever order it happens to read them; that order is not guaranteed. Here title is the primary key, so the rows were most likely read in ascending order of title.
If you check the output of your first query, the ancestors come back in ascending order of the title column. So the returned result for node Banana is Food,Fruit,Yellow, which only happens to match the order by left.
In your second result, Bacon_Sandwich returns Bacon,Food,Meat,Pork because in ascending order of title Bacon comes before Food.
If you want to order the result by the left column, then you have to specify ORDER BY inside the GROUP_CONCAT() function as above. Check both of my queries.
I'd suggest using the JOIN rather than the SUB-QUERY, as it usually performs better.

Related

Group By ignores sorting in subquery

There is a TLDR version at the bottom.
Note: I have based my current solution on the proposed solution in this question here (proposed in the question text itself), however it does not work for me even if it works for that person. So I'm not sure how to handle this, because the question seems like a duplicate but the answer given there doesn't work for me. So I guess something must be different for me. If someone can tell me how to correctly handle this, I'm open to hearing.
I have a table like this one here:
scope_id key_id value
0 0 0_0
0 1 0_1
1 0 1_0
2 0 2_0
2 1 2_1
The scopes have a hierarchy where scope 0 is the parent of scope 2 and scope 2 is the parent of scope 1. (They are unsorted on purpose: the real IDs are UUIDs, I'm only using numbers here for readability.)
My use case is that I want the value of multiple keys in a specific scope (scope 1). However, if there is no value defined for scope 1, I would be fine with a value from its parent (scope 2), and lastly, if there is also no value in scope 2, I would take a value from its parent, scope 0. So if possible I want the value from scope 1; if it doesn't have one, then from scope 2; and lastly I try to get the value from scope 0. (The scopes are a tree structure, so each scope can have at most one parent, but a parent can have multiple children.)
So in the example above, if I want the value of key 0 in scope 1, I'd like to get 1_0 as the key is defined in the scope. If I want the value of key 1 in scope 1, I'd like to get 2_1 as there is no value defined in the scope 1 but in its parent scope 2 there is. And lastly if I want the value of keys 0 and 1 in scope 1, I want to get 1_0 and 2_1.
Currently it is solved by making 3 separate SQL requests and merging it in code. That works fine and fast enough, but I want to see if it would be faster with a single SQL query. I came up with the following query (based on the update in the question text here):
SELECT *
FROM (
SELECT *
FROM test
WHERE key_id IN (0, 1)
AND scope_id IN (1 , 2, 0)
ORDER BY FIELD(scope_id, 1 , 2, 0)
) t1
GROUP BY t1.key_id;
The inner subquery first finds all keys that I want to look at and makes sure they are in the scope that I want to look at or in one of its parent scopes. Then I order the scopes so that the child comes first, then the parent, then the grandparent. Now I expect GROUP BY to keep the value of the first row it finds, so hopefully the child (scope 1). However, this doesn't work. Instead, the first value based on the actual table is used.
TLDR
When grouping with GROUP BY in the query above, why is the order defined by the ORDER BY query ignored? Instead the first value based on the original table is taken when grouping.
Using this code you can try for yourself:
# this group by is rejected when ONLY_FULL_GROUP_BY is enabled, so clear sql_mode
SET sql_mode = '';
CREATE TABLE IF NOT EXISTS test(
scope_id int,
key_id int,
`value` varchar(20),
PRIMARY KEY (scope_id, key_id)
);
INSERT IGNORE INTO test values
(0, 0, "0_0"),
(1, 0, "1_0"),
(2, 0, "2_0"),
(2, 1, "2_1"),
(0, 1, "0_1");
SELECT *
FROM (
SELECT *
FROM test
WHERE key_id IN (0, 1)
AND scope_id IN (1 , 2, 0)
ORDER BY FIELD(scope_id, 1 , 2, 0)
) t1
GROUP BY t1.key_id;
# expected result are the rows that contain value 1_0 and 2_1
I understand your question as a greatest-n-per-group variant.
In this situation, you should not think aggregation, but filtering.
You could solve it with a correlated subquery that selects the first available scope_id per key_id:
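select t.*
from test t
where t.scope_id = (
select t1.scope_id
from test t1
where t1.key_id = t.key_id
-- field() returns 0 for scopes not in the list, which would sort first
and t1.scope_id in (1, 2, 0)
order by field(t1.scope_id, 1, 2, 0)
limit 1
);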
For performance, you want an index on (key_id, scope_id).
Demo on DB Fiddle:
scope_id | key_id | value
-------: | -----: | :----
1 | 0 | 1_0
2 | 1 | 2_1
This will get what you want. Use a row number to effectively "save" your order for the next section of the query.
MySQL 8.0 or newer:
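SELECT *
FROM (
-- ROW_NUMBER() needs an OVER clause; RANK is a reserved word in MySQL 8.0, hence the alias rn
SELECT *, ROW_NUMBER() OVER (ORDER BY FIELD(scope_id, 1 , 2, 0)) AS rn
FROM test
WHERE key_id IN (0, 1)
AND scope_id IN (1 , 2, 0)
) t1
GROUP BY t1.key_id
ORDER BY rn;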
MySQL 5.7 or older:
SET @row_num = 0;
SELECT *
FROM (
SELECT *, @row_num := @row_num + 1 rank
FROM test
WHERE key_id IN (0, 1)
AND scope_id IN (1 , 2, 0)
ORDER BY FIELD(scope_id, 1 , 2, 0)
) t1
GROUP BY t1.key_id
ORDER BY rank;
Soap Box: MySQL results are, in general, horribly unreliable in any query that has 1 or more columns in a group by or aggregate but does not have all columns in a group by or aggregate.
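For what it's worth, on MySQL 8.0 or newer the row-number idea can be written so that it doesn't depend on which row GROUP BY happens to keep. The following is a sketch under that assumption, using the test table from the question; the alias rn and the derived table name ranked are made up for illustration. It numbers the rows per key_id in scope priority order and keeps only the first row of each key.

```sql
SELECT scope_id, key_id, `value`
FROM (
    SELECT t.*,
           -- restart the numbering for every key; 1 = highest-priority scope
           ROW_NUMBER() OVER (PARTITION BY key_id
                              ORDER BY FIELD(scope_id, 1, 2, 0)) AS rn
    FROM test t
    WHERE key_id IN (0, 1)
      AND scope_id IN (1, 2, 0)
) ranked
WHERE rn = 1
ORDER BY key_id;
-- returns (1, 0, '1_0') and (2, 1, '2_1') for the sample data
```

Since there is no GROUP BY, this also runs with ONLY_FULL_GROUP_BY enabled.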

sql query execution order question for group by and select

the table structure:
create table t_hr_ship (
shipment_id int,
shipper_id int,
date_time date,
pickup_state varchar(20),
dropoff_state varchar(20));
Here are some data in this table:
insert into t_hr_ship values
(1, 1, "2018-01-01", "WA", "OR"),
(2, 1, "2018-01-02", "WA", "OR"),
(3, 1, "2018-01-03", "WA", "OR"),
(4, 1, "2018-01-04", "WA", "OR"),
(5, 2, "2018-01-05", "WA", "OR"),
(6, 3, "2018-01-06", "WA", "OR"),
(7, 2, "2018-02-01", "OR", "WA"),
(8, 4, "2018-02-02", "OR", "WA"),
(9, 3, "2018-02-03", "WA", "CA"),
(10, 5, "2018-02-04", "CA", "OR"),
(11, 2, "2018-03-05", "WA", "TX"),
(12, 3, "2018-01-06", "OR", "CA");
The question is to get the top 3 busiest routes in Jan and Feb. Note that "WA" to "OR" and "OR" to "WA" count as the same route (the order of the two end points doesn't matter as long as they are the same two states).
the solution is as below:
select case when s.pickup_state < s.dropoff_state then s.pickup_state else s.dropoff_state end as pickup,
case when s.pickup_state > s.dropoff_state then s.pickup_state else s.dropoff_state end as dropoff,
count(s.shipment_id) as no_of_shipment
from t_hr_ship s
where month(s.date_time) in ("01","02")
group by pickup, dropoff
order by no_of_shipment desc
limit 3;
This does get what I expect. My question is: I read in an online resource that the SQL query execution order is from -> where -> group by -> having -> select -> order by -> limit. If this is true, then this solution should not work, because the pickup and dropoff aliases defined in select can't be used in group by. Am I missing anything?
You can use least() and greatest() to group by consistently:
select
least(pickup_state, dropoff_state) pickup,
greatest(pickup_state, dropoff_state) dropoff,
count(*) as no_of_shipment
from t_hr_ship s
where month(date_time) in (1, 2)
group by pickup, dropoff
order by no_of_shipment desc
limit 3;
Note that, unlike several other RDBMS, MySQL allows the use of column aliases in the GROUP BY clause (and also in the ORDER BY clause, but that is common in most RDBMS).
Demo on DB Fiddle:
pickup | dropoff | no_of_shipment
:----- | :------ | -------------:
OR | WA | 8
CA | OR | 2
CA | WA | 1
Execution order is NOT determined by the query. SQL is not a procedural language, it is a declarative language.
The SELECT statement is describing the result set. In fact, the ultimate execution path may have little resemblance to the actual query -- although MySQL is not as sophisticated as other databases.
What is specified is the order for interpreting the meaning of column aliases in the query. That is what you are referring to.
Some databases, such as MySQL, relax the standard and allow column aliases in the GROUP BY. It is as simple as that.
It works because you are using the alias name. To refer to the select columns in GROUP BY you have 3 options: the full expression, the alias, or the position of the column in the select list:
case when s.pickup_state < s.dropoff_state then s.pickup_state else s.dropoff_state end
pickup
1
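To make the three options concrete, here is a sketch based on the least()/greatest() query from the other answer. Option 1 is active and the other two are shown as comments; swapping one in for the active GROUP BY should give the same result on MySQL.

```sql
SELECT LEAST(pickup_state, dropoff_state)    AS pickup,
       GREATEST(pickup_state, dropoff_state) AS dropoff,
       COUNT(*)                              AS no_of_shipment
FROM t_hr_ship
WHERE MONTH(date_time) IN (1, 2)
-- option 1: repeat the select expressions (portable across databases)
GROUP BY LEAST(pickup_state, dropoff_state),
         GREATEST(pickup_state, dropoff_state)
-- option 2: column aliases (a MySQL extension to standard SQL)
-- GROUP BY pickup, dropoff
-- option 3: positions in the select list
-- GROUP BY 1, 2
ORDER BY no_of_shipment DESC
LIMIT 3;
```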

Getting parent/child/subchild relation in mysql

I have a single table 'tags' with the following fields (id, parent_id, name). Now I've set a limit of 3 levels in the hierarchy, i.e.: parent > child > subchild. A subchild cannot have a further child. So I want a query to retrieve records such as:
Parent-data
(if parent has child) child-data
(if child has subchild) subchild-data
Try something like:
SELECT tparent.id AS parent_id,
tparent.name AS parent_name,
tchild1.id AS child_id,
tchild1.name AS child_name,
tchild2.id AS subchild_id,
tchild2.name AS subchild_name
FROM tags tparent
LEFT JOIN tags tchild1
ON tparent.id = tchild1.parent_id
LEFT JOIN tags tchild2
ON tchild1.id = tchild2.parent_id
According to your comment, you're looking for the following output:
ID | PARENT | NAME
1 | 0 | family
2 | 1 | male
3 | 2 | boy1
4 | 2 | boy2
5 | 1 | female
6 | 5 | girl1
I will assume that the ids won't always be in this order, because if they are, problem solved :)
I'm not sure you can achieve this directly in SQL without adding some additional information that will be used for ordering. For instance, you could add another column where you'd concatenate the ids of parent-child-subchild. Something like:
-- parent
SELECT CONCAT(LPAD(id, 6, '0'), '-000000-000000') AS order_info,
id AS id,
parent_id AS parent,
name AS name
FROM tags
WHERE parent_id = 0
UNION
-- child
SELECT CONCAT_WS('-', LPAD(tparent.id, 6, '0'),
LPAD(tchild1.id, 6, '0'),
'000000'),
tchild1.id,
tparent.id,
tchild1.name
FROM tags tparent
INNER JOIN tags tchild1
ON tparent.id = tchild1.parent_id
WHERE tparent.parent_id = 0
UNION
-- subchild
SELECT CONCAT_WS('-', LPAD(tparent.id, 6, '0'),
LPAD(tchild1.id, 6, '0'),
LPAD(tchild2.id, 6, '0')),
tchild2.id,
tchild1.id,
tchild2.name
FROM tags tparent
INNER JOIN tags tchild1
ON tparent.id = tchild1.parent_id
INNER JOIN tags tchild2
ON tchild1.id = tchild2.parent_id
ORDER BY 1
See the fiddle illustrating this.
Here, I'm zero-padding the ids so that the string ordering stays consistent with the numeric ordering. That requires knowing the maximum length of the ids (I used a length of 6 here), which is easy to derive from the id field type.
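If MySQL 8.0 or newer is available (an assumption, the question doesn't state a version), the same padded-path idea can be written once with a recursive CTE instead of one UNION branch per level. This is only a sketch: the CTE name tag_tree and the CHAR(200) width are illustrative choices, and parent_id = 0 is assumed to mark top-level rows as in the query above.

```sql
WITH RECURSIVE tag_tree AS (
    -- anchor: top-level rows; the CAST fixes the column width so that
    -- longer paths built in the recursive part are not truncated
    SELECT id, parent_id, name,
           CAST(LPAD(id, 6, '0') AS CHAR(200)) AS order_info
    FROM tags
    WHERE parent_id = 0
    UNION ALL
    -- recursive step: append the padded child id to the parent's path
    SELECT c.id, c.parent_id, c.name,
           CONCAT(p.order_info, '-', LPAD(c.id, 6, '0'))
    FROM tags c
    JOIN tag_tree p ON c.parent_id = p.id
)
SELECT id, parent_id AS parent, name
FROM tag_tree
ORDER BY order_info;
```

A parent's path is a prefix of its children's paths, so each parent sorts directly before its own subtree.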

SQL Query for exact match in many to many relation

I have the following tables(only listing the required attributes)
medicine (id, name),
generic (id, name),
med_gen (med_id references medicine(id),gen_id references generic(id), potency)
Sample Data
medicine
(1, 'Crocin')
(2, 'Stamlo')
(3, 'NT Kuf')
generic
(1, 'Hexachlorodine')
(2, 'Methyl Benzoate')
med_gen
(1, 1, '100mg')
(1, 2, '50ml')
(2, 1, '100mg')
(2, 2, '60ml')
(3, 1, '100mg')
(3, 2, '50ml')
I want all the medicines which are equivalent to a given medicine. Medicines are equivalent to each other if they have the same generics as well as the same potency for each generic. In the above sample data, all three have the same generics, but only 1 and 3 also have the same potency for the corresponding generics. So 1 and 3 are equivalent medicines.
I want to find out equivalent medicines given a medicine id.
NOTE : One medicine may have any number of generics. Medicine table has around 102000 records, generic table around 2200 and potency table around 200000 records. So performance is a key point.
NOTE 2 : The database used is MySQL.
One way to do it in MySQL is to leverage the GROUP_CONCAT() function:
SELECT g.med_id
FROM
(
-- potency is ordered by gen_id so that each potency stays paired with its generic
SELECT med_id, GROUP_CONCAT(gen_id ORDER BY gen_id) gen_id, GROUP_CONCAT(potency ORDER BY gen_id) potency
FROM med_gen
WHERE med_id = 1 -- here 1 is med_id for which you're trying to find analogs
) o JOIN
(
SELECT med_id, GROUP_CONCAT(gen_id ORDER BY gen_id) gen_id, GROUP_CONCAT(potency ORDER BY gen_id) potency
FROM med_gen
WHERE med_id <> 1 -- here 1 is med_id for which you're trying to find analogs
GROUP BY med_id
) g
ON o.gen_id = g.gen_id
AND o.potency = g.potency
Output:
| MED_ID |
|--------|
| 3 |
Here is SQLFiddle demo
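As an alternative sketch that avoids building strings (GROUP_CONCAT() output is capped by group_concat_max_len), the same comparison can be done by counting matching rows. It assumes that (med_id, gen_id) is unique in med_gen, which the question doesn't state explicitly.

```sql
SELECT b.med_id
FROM med_gen a
JOIN med_gen b
  ON  b.gen_id  = a.gen_id
  AND b.potency = a.potency
  AND b.med_id <> a.med_id
WHERE a.med_id = 1            -- the medicine to find equivalents for
GROUP BY b.med_id
-- every generic of medicine 1 must be matched ...
HAVING COUNT(*) = (SELECT COUNT(*) FROM med_gen WHERE med_id = 1)
-- ... and the candidate must not have any additional generics
   AND COUNT(*) = (SELECT COUNT(*) FROM med_gen m WHERE m.med_id = b.med_id);
-- returns med_id 3 for the sample data
```

An index on (med_id, gen_id, potency) would probably help with the table sizes mentioned in the question.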

Get result from mysql ordered by IN clause

I have the following query
SELECT * FROM invoice WHERE invoice_id IN (13, 15, 9, 27)
My result is:
invoice_id | invoice_number | ...
------------------------------------
9 | 201006003 |
13 | 201006020 |
15 | 201006022 |
27 | 201006035 |
which is the result set I want, except that it is ordered by the invoice_id (which is an autoincrement value).
Now I want the result in the order I specified in my query (13, 15, ...). Is there a way to achieve that?
The background is that I have a DataTable bound to a DataGridView. The user can filter and sort the result, but if he wants to print the result I don't use the DataTable for printing, because it only contains the most important columns; instead I pull the whole records from the database and pass them to my printing control.
I also tried to extend the existing DataTable with the missing results, but that seems to be slower than using the IN (...) query.
It's ugly, but you could do:
ORDER BY CASE invoice_id WHEN 13 THEN 0 WHEN 15 THEN 1 WHEN 9 THEN 2 WHEN 27 THEN 3 ELSE 4 END
Actually, there's the FIELD function:
ORDER BY FIELD(invoice_id, 13, 15, 9, 27)
The FIELD function returns the position of the first argument in the list of the rest.
Or, if you're generating it dynamically, you could do:
WHERE invoice_id IN ({list}) ORDER BY FIND_IN_SET(invoice_id, '{list}')
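A few standalone calls show how the two functions behave, as a quick sketch using the ids from the question. Note that FIND_IN_SET() compares against the exact items of the string, so a dynamically generated list should not contain spaces after the commas.

```sql
-- FIELD() returns the 1-based position of the first argument in the rest,
-- or 0 if it is not in the list.
SELECT FIELD(9, 13, 15, 9, 27);           -- 3
SELECT FIELD(42, 13, 15, 9, 27);          -- 0

-- FIND_IN_SET() does the same for a comma-separated string.
SELECT FIND_IN_SET(9, '13,15,9,27');      -- 3
SELECT FIND_IN_SET(9, '13, 15, 9, 27');   -- 0, because the item is ' 9', not '9'
```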
You want to order by the FIELD function:
SELECT * FROM invoice WHERE invoice_id IN (13, 15, 9, 27) ORDER BY FIELD(invoice_id, 13, 15, 9, 27)