nested set binary tree, query leftmost and rightmost nodes - mysql

I have a MySQL table storing a binary tree in a nested set model with some extra properties. The binary tree is neither full nor complete, so there may be nodes in the middle with only one child.
It looks something like this (but with much more data):
+----+------+-------+-----------+-------+----------+--------------------+
| id | left | right | parent_id | level | position | number_of_children |
+----+------+-------+-----------+-------+----------+--------------------+
| 1 | 1 | 8 | NULL | 1 | LEFT | 3 |
| 2 | 2 | 3 | 1 | 2 | LEFT | 0 |
| 3 | 4 | 7 | 1 | 2 | RIGHT | 1 |
| 4 | 5 | 6 | 2 | 3 | LEFT | 0 |
+----+------+-------+-----------+-------+----------+--------------------+
How could I query the leftmost or the rightmost node of a tree or sub-tree?
Some graphic explanation: https://imgur.com/w7gxtOC
I tried several approaches, but none of them works in all scenarios (these examples are for the leftmost node):
SELECT * FROM nodes
WHERE position = 'LEFT'
AND number_of_children < 2
HAVING `right` = `left` + 1
ORDER BY `left` ASC
LIMIT 1;
Or this one, which does find the correct node but also returns some false positives:
SELECT * FROM nodes AS element
JOIN nodes AS upline ON element.parent_id = upline.id
WHERE upline.position = 'LEFT'
AND element.position = 'LEFT'
AND element.number_of_children < 2
ORDER BY element.`left` ASC;
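One possible reading of "leftmost" is: start at the root of the (sub)tree and keep following LEFT children until a node has no LEFT child (and likewise with RIGHT for "rightmost"). That definition is an assumption, since the linked picture is not reproduced here. Under it, a recursive CTE over parent_id and position is enough. The sketch below runs against SQLite from Python purely so it is self-contained; MySQL 8.0 also supports WITH RECURSIVE. The helper name extreme_node is made up, the left/right bound columns are omitted because the walk does not need them, and node 4 is attached to node 3, which is what its bounds (5, 6) in the sample table imply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced copy of the nodes table: only the columns the walk needs.
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER, position TEXT)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [(1, None, "LEFT"), (2, 1, "LEFT"), (3, 1, "RIGHT"), (4, 3, "LEFT")],
)

def extreme_node(conn, root_id, side):
    """Follow `side` ('LEFT' or 'RIGHT') children from root_id; return the last id reached."""
    row = conn.execute(
        """
        WITH RECURSIVE spine(id, depth) AS (
            SELECT id, 0 FROM nodes WHERE id = ?          -- start at the subtree root
            UNION ALL
            SELECT n.id, s.depth + 1                      -- step to the child on the chosen side
            FROM nodes AS n
            JOIN spine AS s ON n.parent_id = s.id
            WHERE n.position = ?
        )
        SELECT id FROM spine ORDER BY depth DESC LIMIT 1  -- deepest node on that spine
        """,
        (root_id, side),
    ).fetchone()
    return row[0] if row else None

print(extreme_node(conn, 1, "LEFT"))
print(extreme_node(conn, 1, "RIGHT"))
print(extreme_node(conn, 3, "LEFT"))
```

With these rows the three calls give 2, 3 and 4. If "leftmost" is meant differently (for example, the first node on the left spine with a free slot), the WHERE condition in the recursive step would have to change accordingly.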

Related

SQL query to find set of doc_ids where there is maximum intersection of ent_ids

I have a table with O(1M) rows with columns doc_id and ent_id where (doc_id, ent_id) is the primary key.
+--------+--------+
| doc_id | ent_id |
+--------+--------+
| 1 | a |
| 1 | b |
| 1 | x |
| 1 | y |
| 2 | a |
| 3 | a |
| 3 | x |
| 3 | y |
| 4 | x |
| 4 | y |
+--------+--------+
My question is: how do I efficiently find a set of doc_ids (say I need the top 1000 or 5000 doc_ids) where there is maximum intersection of ent_ids among the selected set of doc_ids?
For example, in the above table:
Say I need the top 2 doc_ids with maximum intersection among their ent_ids. The result would be doc_ids = {1,3} with [ common ent_ids = {a,x,y}, common ent_ids count = 3 ].
Say I need the top 3 doc_ids with maximum intersection among their ent_ids. The result would be doc_ids = {1,3,4} with [ common ent_ids = {x,y}, common ent_ids count = 2 ].
Footnote: if it's not possible to do this efficiently with SQL, any direction towards an alternative method in application code would also be helpful. Say, convert to CSV -> some data structure (inverted index?) or library + Python code -> result set.
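To pin down what is being asked, here is a small brute-force sketch in Python. It tries every group of k doc_ids and keeps the one whose ent_id sets share the most elements. It only illustrates the definition on the sample table: the number of groups grows combinatorially, so this is not a workable approach for a million rows and 1000 documents. The function name best_group is made up for the example.

```python
from itertools import combinations

rows = [
    (1, "a"), (1, "b"), (1, "x"), (1, "y"),
    (2, "a"),
    (3, "a"), (3, "x"), (3, "y"),
    (4, "x"), (4, "y"),
]

def best_group(rows, k):
    """Return (doc_ids, common_ent_ids) for the k docs with the largest shared ent_id set."""
    docs = {}
    for doc_id, ent_id in rows:
        docs.setdefault(doc_id, set()).add(ent_id)

    def common(group):
        # Intersection of the ent_id sets of every doc in the group.
        return set.intersection(*(docs[d] for d in group))

    # Exhaustive search over all k-sized groups (raises ValueError if k > number of docs).
    best = max(combinations(sorted(docs), k), key=lambda g: len(common(g)))
    return set(best), common(best)

print(best_group(rows, 2))
print(best_group(rows, 3))
```

On the sample data this reproduces the two results from the question: {1, 3} sharing {a, x, y}, and {1, 3, 4} sharing {x, y}.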

Joining three tables to get a list of tags

I have these three tables:
user_submitted_value
id | owner_id | value |
-----------------------
1 | 1 | 1337 |
2 | 2 | 1337 |
3 | 2 | 1337 |
4 | 1 | 1337 |
tag
id | owner_id | text |
---------------------------
1 | 1 | 'Tag 01' |
2 | 1 | 'Tag 02' |
3 | 1 | 'Tag 03' |
4 | 2 | 'Tag 04' |
user_submitted_value_tag
id | owner_id | tag_id | value_id |
-----------------------------------
1 | 1 | 1 | 1 |
2 | 1 | 2 | 1 |
3 | 1 | 3 | 1 |
So basically, users can submit values and enter any number of freetext tags to attach to that value. I need to store the tags as belonging to a specific user, and I need to be able to count how many times they've used each tag.
What I want to accomplish is a query that gets rows from user_submitted_value with the tags appended onto them. For example:
Query value with id 1:
id | owner_id | value | tags |
------------------------------------------------------
1 | 1 | 1337 | "'Tag 01','Tag 02','Tag 03'" |
Query all values belonging to user with id 1:
id | owner_id | value | tags |
------------------------------------------------------
1 | 1 | 1337 | "'Tag 01','Tag 02','Tag 03'" |
4 | 1 | 1337 | "" |
I know I need to JOIN one or more times, somehow, but I am not comfortable enough with SQL to figure out exactly how.
This seems like a rather arcane data format -- particularly because owner_id is repeated in all the tables.
In any case, I think the basic query that you want to get the values and tags for a given user looks like this:
select usv.owner_id,
group_concat(distinct usvt.value_id) as value_ids,
group_concat(distinct t.text) as tags
from user_submitted_value usv join
user_submitted_value_tag usvt
on usv.id = usvt.value_id and usv.owner_id = usvt.owner_id join
tag t
on usvt.tag_id = t.id and usvt.owner_id = t.owner_id
group by usv.owner_id;
Here's the final solution in my case. Heavily based on the answer submitted by Gordon Linoff.
SELECT
user_submitted_value.id,
user_submitted_value.creator_id,
user_submitted_value.value,
group_concat(tag.text) AS tags
FROM user_submitted_value
LEFT JOIN user_submitted_value_tag
ON user_submitted_value.id = user_submitted_value_tag.value_id
AND user_submitted_value.creator_id = user_submitted_value_tag.creator_id
LEFT JOIN tag
ON user_submitted_value_tag.tag_id = tag.id
AND user_submitted_value_tag.creator_id = tag.creator_id
WHERE user_submitted_value.id = ?
GROUP BY user_submitted_value.id
The WHERE clause can be modified (for example, to filter on user_submitted_value.creator_id instead of user_submitted_value.id) to get all values for a given user.
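For anyone who wants to try the LEFT JOIN plus GROUP_CONCAT pattern without a MySQL server, here is a minimal sketch using Python's sqlite3 module, which has a compatible group_concat(). It uses the owner_id column name from the question's tables and the sample rows shown there; the function name values_with_tags is made up. The LEFT JOINs are what keep value 4 in the result even though it has no tags.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_submitted_value (id INTEGER PRIMARY KEY, owner_id INTEGER, value INTEGER);
CREATE TABLE tag (id INTEGER PRIMARY KEY, owner_id INTEGER, text TEXT);
CREATE TABLE user_submitted_value_tag (
    id INTEGER PRIMARY KEY, owner_id INTEGER, tag_id INTEGER, value_id INTEGER);
INSERT INTO user_submitted_value VALUES (1, 1, 1337), (2, 2, 1337), (3, 2, 1337), (4, 1, 1337);
INSERT INTO tag VALUES (1, 1, 'Tag 01'), (2, 1, 'Tag 02'), (3, 1, 'Tag 03'), (4, 2, 'Tag 04');
INSERT INTO user_submitted_value_tag VALUES (1, 1, 1, 1), (2, 1, 2, 1), (3, 1, 3, 1);
""")

def values_with_tags(conn, owner_id):
    """All values of one owner, each with a comma-separated tag list (None if untagged)."""
    return conn.execute(
        """
        SELECT v.id, v.owner_id, v.value, group_concat(t.text) AS tags
        FROM user_submitted_value AS v
        LEFT JOIN user_submitted_value_tag AS vt
               ON vt.value_id = v.id AND vt.owner_id = v.owner_id
        LEFT JOIN tag AS t
               ON t.id = vt.tag_id AND t.owner_id = vt.owner_id
        WHERE v.owner_id = ?
        GROUP BY v.id
        ORDER BY v.id
        """,
        (owner_id,),
    ).fetchall()

for row in values_with_tags(conn, 1):
    print(row)
```

Untagged values come back with NULL (None in Python) rather than an empty string; wrapping the aggregate in COALESCE(..., '') would give the empty string shown in the question.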

Making changes to multiple records based on change of single record with SQL

I have a table of food items. They have a "Position" field that represents the order they should appear in on a list (listID is the list they are on, we don't want to re-order items on another list).
+--id--+--listID--+---name---+--position--+
| 1 | 1 | cheese | 0 |
| 2 | 1 | chips | 1 |
| 3 | 1 | bacon | 2 |
| 4 | 1 | apples | 3 |
| 5 | 1 | pears | 4 |
| 6 | 1 | pie | 5 |
| 7 | 2 | carrots | 0 |
| 8,9+ | 3,4+ | ... | ... |
+------+----------+----------+------------+
I want to be able to say "Move Pears to before Chips", which involves setting the position of Pears to 1 and then incrementing all the positions in between by 1, so that my resulting table looks like this:
+--id--+--listID--+---name---+--position--+
| 1 | 1 | cheese | 0 |
| 2 | 1 | chips | 2 |
| 3 | 1 | bacon | 3 |
| 4 | 1 | apples | 4 |
| 5 | 1 | pears | 1 |
| 6 | 1 | pie | 5 |
| 7 | 2 | carrots | 0 |
| 8,9+ | 3,4+ | ... | ... |
+------+----------+----------+------------+
So that all I need to do is SELECT name FROM mytable WHERE listID = 1 ORDER BY position and I'll get all my food in the right order.
Is it possible to do this with a single query? Keep in mind that a record might be moving up or down in the list, and that the table contains records for multiple lists, so we need to isolate the listID.
My knowledge of SQL is pretty limited so right now the only way I know of to do this is to SELECT id, position FROM mytable WHERE listID = 1 AND position BETWEEN 1 AND 5 then I can use Javascript (node.js) to change position 5 to 1, and increment all others +1. Then UPDATE all the records I just changed.
It's just that any time I try to read up on SQL, everyone keeps saying to avoid multiple queries, avoid synchronous code and so on.
Thanks
This calls for a complex query that updates many records. But a small change to your data makes it achievable with a simple query that modifies just one record.
UPDATE mytable SET position = position*10;
In the old days, the BASIC programming language on many systems had line numbers, which encouraged spaghetti code: instead of functions, many people wrote GOTO line_number. Real trouble arose if you numbered the lines sequentially and had to add or delete a few lines. How did people get around it? By incrementing line numbers by 10! That's what we are doing here.
So you want pears to be the second item? After the multiplication cheese is at 0 and chips at 10, so any value in between will do:
UPDATE mytable SET position = 5 WHERE listID = 1 AND name = 'pears';
Worried that the gaps between the items will eventually disappear after multiple reorderings? No fear, just run
UPDATE mytable SET position = position*10;
from time to time.
I do not think this can be conveniently done in fewer than two queries, which is OK: there should be as few queries as possible, but not at any cost. The two queries would look like this (based on what you wrote yourself). Shift the items between the target and the old position first, then move pears into the freed slot:
UPDATE mytable SET position = position + 1 WHERE listID = 1 AND position BETWEEN 1 AND 3;
UPDATE mytable SET position = 1 WHERE listID = 1 AND name = 'pears';
I've mostly figured out my problem, so I've decided to put an answer here in case anyone finds it helpful.
I can make use of a CASE expression in SQL. By using JavaScript beforehand to build my SQL query, I can change multiple records in one statement.
This builds my SQL query:
var sql;
var incrementDirection = (startPos > endPos)? 1 : -1;
sql = "UPDATE mytable SET position = CASE WHEN position = "+startPos+" THEN "+endPos;
for(var i=endPos; i!=startPos; i+=incrementDirection){
sql += " WHEN position = "+i+" THEN "+(i+incrementDirection);
}
sql += " ELSE position END WHERE listID = "+listID;
If I want to move Pears to before Chips. I can set:
startPos = 4;
endPos = 1;
listID = 1;
My code will produce an SQL statement that looks like:
UPDATE mytable
SET position = CASE
WHEN position = 4 THEN 1
WHEN position = 1 THEN 2
WHEN position = 2 THEN 3
WHEN position = 3 THEN 4
ELSE position
END
WHERE listID = 1
I run that code and my final table will look like:
+--id--+--listID--+---name---+--position--+
| 1 | 1 | cheese | 0 |
| 2 | 1 | chips | 2 |
| 3 | 1 | bacon | 3 |
| 4 | 1 | apples | 4 |
| 5 | 1 | pears | 1 |
| 6 | 1 | pie | 5 |
| 7 | 2 | carrots | 0 |
| 8,9+ | 3,4+ | ... | ... |
+------+----------+----------+------------+
After that, all I have to do is run SELECT name FROM mytable WHERE listID = 1 ORDER BY position and the output will be as follows:
cheese
pears
chips
bacon
apples
pie
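A variation on the same idea, in case it helps: the CASE does not need one WHEN per position. The moved row gets the target position and every other row in the affected range shifts by one, so a single parameterized statement covers both directions. The sketch below is written against SQLite from Python only to keep it runnable on its own; the statement uses nothing beyond CASE and BETWEEN, so I would expect it to carry over to MySQL, but that is an assumption I have not tested. The function name move_item is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, listID INTEGER, name TEXT, position INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1, 1, "cheese", 0), (2, 1, "chips", 1), (3, 1, "bacon", 2), (4, 1, "apples", 3),
     (5, 1, "pears", 4), (6, 1, "pie", 5), (7, 2, "carrots", 0)],
)

def move_item(conn, list_id, start_pos, end_pos):
    """Move the row at start_pos to end_pos within one list, shifting the rows in between."""
    step = 1 if start_pos > end_pos else -1      # moving up pushes the others down, and vice versa
    low, high = sorted((start_pos, end_pos))
    conn.execute(
        """
        UPDATE mytable
        SET position = CASE WHEN position = :start THEN :end
                            ELSE position + :step END
        WHERE listID = :list_id AND position BETWEEN :low AND :high
        """,
        {"start": start_pos, "end": end_pos, "step": step,
         "list_id": list_id, "low": low, "high": high},
    )

move_item(conn, 1, 4, 1)   # "Move Pears to before Chips"
names = [r[0] for r in conn.execute(
    "SELECT name FROM mytable WHERE listID = 1 ORDER BY position")]
print(names)
```

Binding the values as parameters instead of concatenating them into the SQL string also avoids injection problems if the positions ever come from user input.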

MySql complex query - SUM on multiple and variable columns

I have the following table structure (simplified version)
+----------------+ +-----------------+ +------+
| fee_definition | | user_fee | | user |
+----------------+ +-----------------+ +------+
| id | | user_id | | id |
| label | | fee_id | | ... |
| case1 | | case | +------+
| case2 | | manual_override |
| case3 | +-----------------+
| case4 |
| case5 |
+----------------+
Based on a pretty simple algorithm, I determine which case fits the user, and that determines the amount of money they have to pay. A user's fee can be based on any number of fee definitions (one or more), which means I can have the following content in the intersection table:
+-----------+----------+--------+-------------------+
| user_id | fee_id | case | manual_override |
+-----------+----------+--------+-------------------+
| 1 | 1 | case1 | |
| 1 | 3 | case1 | |
| 1 | 5 | case1 | 50.22 |
| 2 | 1 | case5 | |
| 3 | 1 | case2 | |
| 3 | 2 | case2 | 18.50 |
+-----------+----------+--------+-------------------+
If a user is set to case 1, all the fees listed under case 1 whose value is different from 0 get picked. The same goes for the four other cases.
Just for reference, here is the actual query I execute. It is written in French (sorry for that, but since we are a team of French-speaking developers, we mostly write our code and queries in French):
SELECT
`etudiant_etu`.*,
`session_etudiant_set`.*,
SUM(ROUND(frais_session_etudiant.fse_frais_manuel*100)/100) AS `fse_frais_manuel`,
`frais_session_etudiant`.`des_colonne`,
SUM(ROUND(definition_frais_des.des_quebecCanada*100)/100) AS `des_quebecCanada`,
SUM(ROUND(definition_frais_des.des_etranger*100)/100) AS `des_etranger`,
SUM(ROUND(definition_frais_des.des_non_credite*100)/100) AS `des_non_credite`,
SUM(ROUND(definition_frais_des.des_visiteur*100)/100) AS `des_visiteur`,
SUM(ROUND(definition_frais_des.des_explore*100)/100) AS `des_explore`,
`type_etudiant_tye`.*,
`type_formation_tyf`.*,
`pays_pys`.*,
`province_prc`.*
FROM `etudiant_etu`
INNER JOIN `session_etudiant_set`
ON session_etudiant_set.etu_id = etudiant_etu.etu_id
INNER JOIN `frais_session_etudiant`
ON frais_session_etudiant.set_id = session_etudiant_set.set_id
INNER JOIN `definition_frais_des`
ON definition_frais_des.des_id = frais_session_etudiant.des_id
LEFT JOIN `type_etudiant_tye`
ON type_etudiant_tye.tye_id = session_etudiant_set.tye_id
LEFT JOIN `type_formation_tyf`
ON type_formation_tyf.tyf_id = session_etudiant_set.tyf_id
LEFT JOIN `pays_pys`
ON pays_pys.pys_code = etudiant_etu.pys_adresse_permanente_code
LEFT JOIN `province_prc`
ON province_prc.prc_code = etudiant_etu.prc_adresse_permanente_code
WHERE (set_session = 'P11')
GROUP BY `session_etudiant_set`.`set_id`
ORDER BY `etu_nom` asc, `etu_prenom` ASC
For reference, here is how the simplified version maps to the actual query:
simplified version          actual version
fee_definition.id           definition_frais_des.des_id
fee_definition.case1        definition_frais_des.des_quebecCanada
fee_definition.case2        definition_frais_des.des_etranger
fee_definition.case3        definition_frais_des.des_non_credite
fee_definition.case4        definition_frais_des.des_visiteur
fee_definition.case5        definition_frais_des.des_explore
user_fee.user_id            frais_session_etudiant.set_id
user_fee.fee_id             frais_session_etudiant.des_id
user_fee.case               frais_session_etudiant.des_colonne
user_fee.manual_override    frais_session_etudiant.fse_frais_manuel
user.id                     session_etudiant_set.set_id
The problem I have is handling the manual override setting. What would be the best way of doing this?
I would rather have this handled in the query itself than in application code.
The logic behind what I am looking for goes as follows:
Get the SUM of the fees to be charged for a user; if an override value has been set, use that value instead of the value set in fee_definition, otherwise use the value in fee_definition.
I don't mind losing the four unused cases and keeping only the right column.
Edited to display final result
This is the query I ended up with, five levels of IFs:
'IF(`frais_session_etudiant`.des_colonne= "des_quebec_canada",
SUM(IF(`frais_session_etudiant`.fse_frais_manuel > 0,
ROUND(`frais_session_etudiant`.fse_frais_manuel*100)/100,
ROUND(definition_frais_des.des_quebec_canada*100)/100)
),
IF(`frais_session_etudiant`.des_colonne= "des_etranger",
SUM(IF(`frais_session_etudiant`.fse_frais_manuel > 0,
ROUND(`frais_session_etudiant`.fse_frais_manuel*100)/100,
ROUND(definition_frais_des.des_etranger*100)/100)
),
IF(`frais_session_etudiant`.des_colonne= "des_non_credite",
SUM(IF(`frais_session_etudiant`.fse_frais_manuel > 0,
ROUND(`frais_session_etudiant`.fse_frais_manuel*100)/100,
ROUND(definition_frais_des.des_non_credite*100)/100)
),
IF(`frais_session_etudiant`.des_colonne= "des_visiteur",
SUM(IF(`frais_session_etudiant`.fse_frais_manuel > 0,
ROUND(`frais_session_etudiant`.fse_frais_manuel*100)/100,
ROUND(definition_frais_des.des_visiteur*100)/100)
),
IF(`frais_session_etudiant`.des_colonne= "des_explore",
SUM(IF(`frais_session_etudiant`.fse_frais_manuel > 0,
ROUND(`frais_session_etudiant`.fse_frais_manuel*100)/100,
ROUND(definition_frais_des.des_explore*100)/100)
),
0
)
)
)
)
) as frais'
That's a monster! as said by Ted Hopp :D
You can use IFNULL(manual_override,non-override-value)
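To make the IFNULL suggestion concrete, here is a rough sketch on the simplified schema. It picks the column named by user_fee.case with a CASE expression and lets the override win whenever it is not NULL, all inside a single SUM. It runs on SQLite from Python so it is self-contained; COALESCE is used because it exists in both SQLite and MySQL (with two arguments it behaves like IFNULL). The amounts in fee_definition are invented, since the question does not show them, and the function name fee_totals is made up. Because case is a reserved word, the column is quoted (double quotes here, backticks in MySQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fee_definition (id INTEGER PRIMARY KEY, label TEXT,
    case1 REAL, case2 REAL, case3 REAL, case4 REAL, case5 REAL);
CREATE TABLE user_fee (user_id INTEGER, fee_id INTEGER, "case" TEXT, manual_override REAL);
-- Hypothetical amounts, not taken from the question.
INSERT INTO fee_definition VALUES
    (1, 'fee 1', 100, 200, 300, 400, 500),
    (2, 'fee 2', 10, 20, 30, 40, 50),
    (3, 'fee 3', 1, 2, 3, 4, 5),
    (5, 'fee 5', 1000, 2000, 3000, 4000, 5000);
-- Rows from the intersection table in the question; blank overrides stored as NULL.
INSERT INTO user_fee VALUES
    (1, 1, 'case1', NULL), (1, 3, 'case1', NULL), (1, 5, 'case1', 50.22),
    (2, 1, 'case5', NULL),
    (3, 1, 'case2', NULL), (3, 2, 'case2', 18.50);
""")

def fee_totals(conn):
    """Total per user: the override when present, otherwise the column named by user_fee.case."""
    rows = conn.execute(
        """
        SELECT uf.user_id,
               SUM(COALESCE(uf.manual_override,
                            CASE uf."case"
                                WHEN 'case1' THEN fd.case1
                                WHEN 'case2' THEN fd.case2
                                WHEN 'case3' THEN fd.case3
                                WHEN 'case4' THEN fd.case4
                                WHEN 'case5' THEN fd.case5
                                ELSE 0
                            END)) AS total
        FROM user_fee AS uf
        JOIN fee_definition AS fd ON fd.id = uf.fee_id
        GROUP BY uf.user_id
        """
    ).fetchall()
    return {user_id: round(total, 2) for user_id, total in rows}

print(fee_totals(conn))
```

One difference from the nested IF version above: that query tests fse_frais_manuel > 0, so an override of 0 falls back to the defined fee, whereas COALESCE/IFNULL only falls back on NULL. Which behaviour is wanted depends on whether 0 is a legitimate override.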

Order by in mysql using second table

I have two tables: one is a list of stores and attributes, the second is a list of allocations based on these attributes.
The attribute table (store_metadata)
| key | store_key | field | value
| 1 | 1 | size | Large
| 2 | 1 | dist | Midlands
| 3 | 2 | size | Medium
| 4 | 3 | dist | South
The allocation table (allocation)
| key | ticket_key | field | value | quantity
| 1 | 1 | size | Large | 10
| 2 | 1 | size | Medium| 5
I've managed to get the allocations working using the code:
SELECT store_key, quantity FROM
allocation
INNER JOIN store_metadata
ON allocation.`field` = store_metadata.`field`
AND allocation.`value` = store_metadata.`value`
This returns a list of the stores and how many items they should receive. What I now need to do is order the stores by the distribution attribute (dist).
Any help would be greatly appreciated.
The question isn't asked very well.
To perform ordering by any column in your result set add ORDER BY [column] to the end of the query. E.g.
SELECT store_key, quantity FROM
allocation
INNER JOIN store_metadata
ON allocation.`field` = store_metadata.`field`
AND allocation.`value` = store_metadata.`value`
ORDER BY allocation.`field`;
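If the goal is to sort by each store's dist value, that value sits in a different row of store_metadata than the one matched by the allocation, so ordering by a column of the existing join will not reach it. One way, assuming that reading of the question is right, is to join the metadata table a second time just for the dist row and order by that. A minimal sketch with the sample rows, run on SQLite from Python so it is self-contained (the key columns are left out, quantity is used as the name of the count column to match the query, and the function name allocations_by_dist is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_metadata (store_key INTEGER, field TEXT, value TEXT);
CREATE TABLE allocation (ticket_key INTEGER, field TEXT, value TEXT, quantity INTEGER);
INSERT INTO store_metadata VALUES
    (1, 'size', 'Large'), (1, 'dist', 'Midlands'), (2, 'size', 'Medium'), (3, 'dist', 'South');
INSERT INTO allocation VALUES (1, 'size', 'Large', 10), (1, 'size', 'Medium', 5);
""")

def allocations_by_dist(conn):
    """Allocated quantity per store, ordered by the store's 'dist' attribute."""
    return conn.execute(
        """
        SELECT sm.store_key, a.quantity, d.value AS dist
        FROM allocation AS a
        JOIN store_metadata AS sm
             ON sm.field = a.field AND sm.value = a.value
        LEFT JOIN store_metadata AS d                  -- second pass: the store's dist row, if any
             ON d.store_key = sm.store_key AND d.field = 'dist'
        ORDER BY d.value, sm.store_key
        """
    ).fetchall()

for row in allocations_by_dist(conn):
    print(row)
```

The LEFT JOIN keeps stores that have no dist row (store 2 in the sample); their dist is NULL, which sorts first in ascending order in both SQLite and MySQL.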