SQL query to get hierarchical tree folder structure - sql-server-2008

I have tables as follows
file_table
f_id file_name
21 abc.xml
13 xyz.xml
folder_table
f_id f_name
15 Main
21 Sub
13 Sub2
group_table
parent child
21 13
15 21
In file_table, the file "xyz.xml" has f_id 13 (Sub2). Based on this f_id, I want to look up its parent node in group_table, i.e. 21 (Sub), then check whether that parent has a parent of its own in group_table, i.e. 15 (Main), and so on. Finally, if a parent node does not appear as a child in group_table, it is the root node: 13 -> 21 -> 15
I want to write a query to select data like below.
f_name file_name
Main/Sub/Sub2 xyz.xml
Main/Sub abc.xml

You could use a recursive common table expression (CTE) in your query to get all the directories in a file's path. In your main query, you can (ab)use a SELECT ... FOR XML subquery to concatenate all those directories into a full path.
WITH CTE (f_id, depth, f_id_depth) AS (
--Anchor statement
--(This provides the starting result set of the CTE.)
SELECT DISTINCT
f_id,
1,
f_id
FROM
file_table
UNION ALL
--Recursive statement
--(Note this contains the CTE itself in the FROM-clause.)
--(This statement gets executed repeatedly until it returns no rows anymore.)
--(Each time, its results are added to the CTE's result set.)
SELECT
CTE.f_id,
CTE.depth + 1,
F.f_id
FROM
CTE
INNER JOIN group_table AS G ON G.child = CTE.f_id_depth
INNER JOIN folder_table AS F ON F.f_id = G.parent
)
SELECT
--Use a SELECT ... FOR XML subquery to concatenate the folder names (each prefixed with a slash) in a single string.
--Additionally, wrap the subquery in a STUFF function to remove the leading slash.
STUFF((SELECT '/' + FF.f_name
FROM CTE INNER JOIN folder_table AS FF ON FF.f_id = CTE.f_id_depth
WHERE CTE.f_id = F.f_id
ORDER BY CTE.depth DESC
FOR XML PATH('')), 1, 1, '') AS f_name,
F.file_name
FROM
file_table AS F;
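As a hedged illustration of the recursive-CTE idea above, here is a minimal sketch that runs the question's sample data through SQLite from Python. SQLite and the `sqlite3` module are stand-ins chosen so the example is self-contained; they are not part of the original answer. Because SQLite has no FOR XML, this variant builds the path inside the CTE by prepending each parent's name. The CTE name `walk` and its column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE file_table   (f_id INTEGER, file_name TEXT);
CREATE TABLE folder_table (f_id INTEGER, f_name TEXT);
CREATE TABLE group_table  (parent INTEGER, child INTEGER);
INSERT INTO file_table   VALUES (21, 'abc.xml'), (13, 'xyz.xml');
INSERT INTO folder_table VALUES (15, 'Main'), (21, 'Sub'), (13, 'Sub2');
INSERT INTO group_table  VALUES (21, 13), (15, 21);
""")

rows = conn.execute("""
WITH RECURSIVE walk (f_id, current_id, path) AS (
    -- Anchor: start at the folder each file lives in.
    SELECT fi.f_id, fo.f_id, fo.f_name
    FROM file_table AS fi
    JOIN folder_table AS fo ON fo.f_id = fi.f_id
    UNION ALL
    -- Recursive step: move one level up and prepend the parent's name.
    SELECT w.f_id, g.parent, p.f_name || '/' || w.path
    FROM walk AS w
    JOIN group_table AS g ON g.child = w.current_id
    JOIN folder_table AS p ON p.f_id = g.parent
)
SELECT w.path, fi.file_name
FROM walk AS w
JOIN file_table AS fi ON fi.f_id = w.f_id
-- Keep only rows that reached a root (a folder that is nobody's child).
WHERE NOT EXISTS (SELECT 1 FROM group_table AS g WHERE g.child = w.current_id)
ORDER BY fi.file_name DESC
""").fetchall()

for path, file_name in rows:
    print(path, file_name)
```

Accumulating the path during recursion avoids the separate concatenation step, at the cost of carrying a growing string through every level.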

Related

MySQL query with assignment operators and variables works with MySQL 5.7 but not in MySQL 8

I have a query that works on rows forming a four-level hierarchy in one table. I query the table with the ID of the lowest level and get four rows as the result: the parents and the child.
Here is a sample from the DB:
id title parent type
1 Germany 0 1
2 Bavaria 1 2
3 Swabia 2 3
4 Augsburg 3 4
This has always worked in MySQL 5.7:
SELECT id, type, title FROM ( SELECT @r AS _id, (SELECT @r := parent FROM category WHERE id = _id) AS parent, @l := @l + 1 AS lvl
FROM (SELECT @r := 4, @l := 0) vars, category h WHERE @r <> 0) T1
JOIN category T2 ON T1._id = T2.id
ORDER BY T1.lvl DESC
Result in MySql 5.7:
id type title
1 1 Germany
2 2 Bavaria
3 3 Swabia
4 4 Augsburg
Under MySQL 8 I don't get an error, just no result.
Additional Info
According to the documentation, in both MySQL 5.7 and MySQL 8.0, assigning to a user variable and reading it within the same statement is not guaranteed to work as expected.
SET @a = @a + 1; With other statements, such as SELECT, you might get
the expected results, but it's not guaranteed. With the following
statement, you might think that MySQL first evaluates @a and then
makes an assignment ...
I suspect that this is why the query does not work: https://dev.mysql.com/doc/refman/5.7/en/user-variables.html
Additional Info 2 (Explanation)
Beginning with MySQL 8.0.22, a reference to a user variable in a
prepared statement has its type determined when the statement is first
prepared, and retains this type each time the statement is executed
thereafter. Similarly, the type of a user variable employed in a
statement within a stored procedure is determined the first time the
stored procedure is invoked, and retains this type with each
subsequent invocation.
Possibly this would be an alternative in MySQL 8, but I need a query that works in both:
with recursive parent_cats (id, parent, title, type) AS (
SELECT id, parent, title, type
FROM category
WHERE id = 4
union all
SELECT t.id, t.parent, t.title, t.type
FROM category t INNER JOIN parent_cats pc
ON t.id = pc.parent
)
SELECT * FROM parent_cats;
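To see what the recursive CTE returns for the sample rows, here is a minimal sketch run on SQLite through Python's `sqlite3` module (a stand-in for MySQL 8, used only so the example is self-contained). The extra `lvl` column is an addition for this example: it counts the steps taken from the starting row so the output can be sorted from root to leaf, like the 5.7 result above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, title TEXT, parent INTEGER, type INTEGER);
INSERT INTO category VALUES
    (1, 'Germany', 0, 1), (2, 'Bavaria', 1, 2),
    (3, 'Swabia', 2, 3), (4, 'Augsburg', 3, 4);
""")

ancestors = conn.execute("""
WITH RECURSIVE parent_cats (id, parent, title, type, lvl) AS (
    -- Anchor: the lowest-level row we start from.
    SELECT id, parent, title, type, 1 FROM category WHERE id = 4
    UNION ALL
    -- Recursive step: fetch the parent of the row found so far.
    SELECT t.id, t.parent, t.title, t.type, pc.lvl + 1
    FROM category AS t
    JOIN parent_cats AS pc ON t.id = pc.parent
)
SELECT id, type, title FROM parent_cats ORDER BY lvl DESC
""").fetchall()

for row in ancestors:
    print(row)
```

The recursion stops on its own at Germany because no row has id 0.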

Recursive search in Mysql 5.7.30

I need to find the list of parent IDs for which a particular text exists, either in the parent's name or in one of its children's names.
Consider the following table
pid parent name
1 null Parent1dynamic
2 null Parent2
3 1 child1-P1
4 2 Child1-P2
5 4 Child-c1p2-dynamic
6 null Parent3
7 null Parent4
8 7 Child-p4-dynamic
I have used the following MySQL query:
SELECT c.*
FROM db.tbl AS c
JOIN ( SELECT DISTINCT IFNULL(c.parent, c.pid) AS id
FROM db.tbl c
WHERE 1=1 AND c.name LIKE '%dyna%'
ORDER BY c.pid ASC ) s ON s.id = c.pid
WHERE parent IS NULL
ORDER BY pid LIMIT 0, 15
Using this query I am searching for the text 'dyna' and getting ids [1 & 7]; it only searches the first level,
but I need the result [1, 2 & 7], i.e. a recursive search.
In MySQL 8+ it may be
WITH RECURSIVE
cte AS ( SELECT pid, parent, name, pid rpid, pid rparent, name rname
FROM test
WHERE parent IS NULL
UNION ALL
SELECT test.pid, test.parent, test.name, cte.pid, cte.rparent, CONCAT(cte.rname, CHAR(0), test.name)
FROM cte
JOIN test ON cte.pid = test.parent )
SELECT DISTINCT rparent pid
FROM cte
WHERE rname LIKE @pattern;
or
WITH RECURSIVE
cte AS ( SELECT pid, parent
FROM test
WHERE name LIKE @pattern
UNION ALL
SELECT test.pid, test.parent
FROM cte
JOIN test ON cte.parent = test.pid )
SELECT DISTINCT pid
FROM cte
WHERE parent IS NULL
In MySQL 5+ use stored procedure:
CREATE PROCEDURE get_rows_like_pattern (IN pattern VARCHAR(255))
BEGIN
CREATE TABLE cte (pid INT PRIMARY KEY, parent INT)
SELECT pid, parent
FROM test
WHERE name LIKE pattern;
WHILE ROW_COUNT() DO
INSERT IGNORE INTO cte
SELECT test.pid, test.parent
FROM cte
JOIN test ON cte.parent = test.pid;
END WHILE;
SELECT DISTINCT pid
FROM cte
WHERE parent IS NULL;
DROP TABLE cte;
END
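The second CTE variant (start at the matching rows, then climb to their ancestors) can be checked against the question's sample table. This is a minimal sketch on SQLite via Python's `sqlite3` module, used here only as a convenient stand-in for MySQL 8; the table name `test` follows the answer, and the pattern is passed as a bound parameter instead of a user variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (pid INTEGER PRIMARY KEY, parent INTEGER, name TEXT);
INSERT INTO test VALUES
    (1, NULL, 'Parent1dynamic'), (2, NULL, 'Parent2'),
    (3, 1, 'child1-P1'), (4, 2, 'Child1-P2'),
    (5, 4, 'Child-c1p2-dynamic'), (6, NULL, 'Parent3'),
    (7, NULL, 'Parent4'), (8, 7, 'Child-p4-dynamic');
""")

query = """
WITH RECURSIVE cte (pid, parent) AS (
    -- Anchor: every row whose own name matches the pattern.
    SELECT pid, parent FROM test WHERE name LIKE ?
    UNION ALL
    -- Recursive step: climb from each found row to its parent.
    SELECT test.pid, test.parent
    FROM cte
    JOIN test ON cte.parent = test.pid
)
SELECT DISTINCT pid FROM cte WHERE parent IS NULL ORDER BY pid
"""
root_ids = [row[0] for row in conn.execute(query, ("%dyna%",))]
print(root_ids)
```

Row 5 matches the pattern and climbs through 4 to root 2, which is why 2 shows up even though neither it nor its direct child contains the text.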

User variable not updating in subquery

I am using a RECURSIVE statement on MariaDB, to get a product category path when I know the product category unique ID, from a self-referencing category table.
This works:
WITH RECURSIVE categories AS (
SELECT * FROM tbl_eav_categories tec2 WHERE tec2.category_id = 1023
UNION
SELECT tec3.* FROM tbl_eav_categories tec3, categories AS c WHERE tec3.category_id = c.parent_category_id
)
SELECT GROUP_CONCAT(CONCAT(category_default_label,' [',category_id,']') ORDER BY category_id ASC SEPARATOR ' >> ') FROM categories
And returns:
Consumables [7] >> Catering Equipment and Supplies [95] >> Tea Bags [1023]
Great.
But now I need to list all category ID's and in the second column, their paths.
I thought this would simply be a matter of doing a SELECT on the primary category table ('tbl_eav_categories') table, and dropping the above query in as a subquery column. Like this:
SELECT
@CatID := category_id AS 'cat_id',
(
WITH RECURSIVE categories AS (
SELECT * FROM tbl_eav_categories tec2 WHERE tec2.category_id = @CatID
UNION
SELECT tec3.* FROM tbl_eav_categories tec3, categories AS c WHERE tec3.category_id = c.parent_category_id
)
SELECT GROUP_CONCAT(CONCAT(category_default_label,' [',category_id,']') ORDER BY category_id ASC SEPARATOR ' >> ') FROM categories
) 'categorypath'
FROM tbl_eav_categories;
However, all I get is:
cat_id categorypath
1 Bearings [1]
2 Bearings [1]
3 Bearings [1]
4 Bearings [1]
5 Bearings [1]
6 Bearings [1]
...
(like this until the bottom of the entire result set).
After some research, I believe it has something to do with the @CatID variable being evaluated before it gets assigned, but I can't work out how to get around it.
I tried to follow Ben English's guidance here: User variable in MySQL subquery but it baffles me :(
Please help! :)
I believe I have found the answer, by including the entire table scan in the actual CTE:
WITH RECURSIVE category_path (id, title, path) AS
(
SELECT category_id, category_default_label, category_default_label path FROM tbl_eav_categories WHERE parent_category_id IS NULL
UNION ALL
SELECT c.category_id, c.category_default_label, CONCAT(cp.path, ' [',c.parent_category_id, '] >> ', c.category_default_label) FROM category_path cp JOIN tbl_eav_categories c ON cp.id = c.parent_category_id
)
SELECT id,path FROM category_path
ORDER BY path;
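A minimal sketch of this "build the path inside the CTE" approach, run on SQLite through Python's `sqlite3` module as a stand-in for MariaDB. The four sample rows are assumptions pieced together from the output shown in the question (the parent links and the root status of Bearings are guesses), and the path format is a variant that appends each category's own id so it matches the `Label [id]` style of the single-category query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_eav_categories (
    category_id INTEGER PRIMARY KEY,
    category_default_label TEXT,
    parent_category_id INTEGER
);
-- Hypothetical sample rows, inferred from the question's output.
INSERT INTO tbl_eav_categories VALUES
    (1, 'Bearings', NULL),
    (7, 'Consumables', NULL),
    (95, 'Catering Equipment and Supplies', 7),
    (1023, 'Tea Bags', 95);
""")

paths = dict(conn.execute("""
WITH RECURSIVE category_path (id, path) AS (
    -- Anchor: root categories start their own path.
    SELECT category_id, category_default_label || ' [' || category_id || ']'
    FROM tbl_eav_categories
    WHERE parent_category_id IS NULL
    UNION ALL
    -- Recursive step: extend the parent's path with the child.
    SELECT c.category_id,
           cp.path || ' >> ' || c.category_default_label || ' [' || c.category_id || ']'
    FROM category_path AS cp
    JOIN tbl_eav_categories AS c ON c.parent_category_id = cp.id
)
SELECT id, path FROM category_path
"""))

print(paths[1023])
```

Walking down from the roots computes every path in one pass, so no per-row variable or correlated subquery is needed.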

single query to print all the rows whose count is greater than 10

I have a table named Table1 whose definition is as below.
Id int False
Source nvarchar(MAX) True
Dest nvarchar(MAX) True
Port nvarchar(MAX) True
DgmLen nvarchar(MAX) True
Flags nvarchar(MAX) True
Payload nvarchar(MAX) True
Now I want to print all the rows of this table whose "source" count is greater than 10.
Firstly I have used this query to fetch the count of sources in the table:
Select Source,count(*) t_count from Table1 group by Source
and it fetched the following data:
Source t_count
2-170.125.32.3 1
2-172.125.32.10 1
2-190.125.32.10 11
2-190.125.32.3 1
2-192.125.32.10 1
2-192.125.32.3 6
Now I want to print all the rows having "Source = 2-190.125.32.10" as its t_count is greater than 10.
How can I write this in a single query?
If I got you correctly, then :-
select * from Table1 where Source in
(
Select Source from Table1 group by Source having count(*) > 10
)
This returns all rows from Table1 whose Source value appears more than 10 times.
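A small runnable check of the IN ... GROUP BY ... HAVING pattern, using SQLite through Python's `sqlite3` module as a stand-in for the asker's database. The table is cut down to two columns and filled with made-up rows whose counts mirror part of the question's output (11, 6 and 1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Id INTEGER PRIMARY KEY, Source TEXT)")

# Hypothetical data: one source appears 11 times, the others fewer.
sample = (
    [("2-190.125.32.10",)] * 11
    + [("2-192.125.32.3",)] * 6
    + [("2-170.125.32.3",)]
)
conn.executemany("INSERT INTO Table1 (Source) VALUES (?)", sample)

matching = conn.execute("""
SELECT Id, Source
FROM Table1
WHERE Source IN (
    -- Sources that occur more than 10 times.
    SELECT Source FROM Table1 GROUP BY Source HAVING COUNT(*) > 10
)
ORDER BY Id
""").fetchall()

print(len(matching), {source for _, source in matching})
```

The subquery yields one row per qualifying source, and the outer query then returns every full row for those sources.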
EDIT :-
select * from Table1 t1 join
(Select Source, Dest from Table1 group by Source, Dest having count(*) > 10) t2
on t1.Source = t2.Source and t1.Dest = t2.Dest
Here, the table t2 returns combination of Source, Dest appearing more than 10 times and joins it with the base table Table1.
having "Source = 2-190.125.32.10"
That is exactly the keyword to use, HAVING:
Select Source,count(*) t_count from Table1 group by Source HAVING t_count > 10
And by the way: if you group by Source, there will always be exactly one result row per source; that's the point of grouping.
Select
s.Source,
s.Dest,
s.Port,
s.DgmLen,
s.Flags,
s.Payload
from Table1 s
join
(
select
source,
count(*) as tot
from Table1
group by source
having tot > 10
)s1
on s1.source = s.source
Your single query should be like this
Select Source,count(*) t_count from Table1 group by Source HAVING t_count > 10

How can we find gaps in sequential numbering in MySQL?

We have a database with a table whose values were imported from another system. There is an auto-increment column, and there aren’t any duplicate values, but there are missing values. For example, running this query:
select count(id) from arrc_vouchers where id between 1 and 100
should return 100, but it returns 87 instead. Is there a query I can run that will return the values of the missing numbers? For example, the records may exist for id 1-70 and 83-100, but there aren’t any records with id's of 71-82. I want to return 71, 72, 73, etc.
Is this possible?
A better answer
JustPlainMJS provided a much better answer in terms of performance.
The (not as fast as possible) answer
Here's a version that works on a table of any size (not just on 100 rows):
SELECT (t1.id + 1) as gap_starts_at,
(SELECT MIN(t3.id) -1 FROM arrc_vouchers t3 WHERE t3.id > t1.id) as gap_ends_at
FROM arrc_vouchers t1
WHERE NOT EXISTS (SELECT t2.id FROM arrc_vouchers t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL
gap_starts_at - first id in current gap
gap_ends_at - last id in current gap
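To make the gap_starts_at / gap_ends_at logic concrete, here is a minimal sketch on SQLite via Python's `sqlite3` module, a stand-in for MySQL. The query is reshaped slightly so that it runs on SQLite: the HAVING filter on the alias becomes a WHERE on an outer query. The sample ids are borrowed from the schema another answer in this thread provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arrc_vouchers (id INTEGER PRIMARY KEY)")
ids = [1, 4, 5, 7, 8, 9, 10, 11] + list(range(15, 30))
conn.executemany("INSERT INTO arrc_vouchers (id) VALUES (?)", [(i,) for i in ids])

gaps = conn.execute("""
SELECT gap_starts_at, gap_ends_at
FROM (
    SELECT t1.id + 1 AS gap_starts_at,
           -- The next existing id after t1.id, minus one, closes the gap.
           (SELECT MIN(t3.id) - 1 FROM arrc_vouchers t3 WHERE t3.id > t1.id) AS gap_ends_at
    FROM arrc_vouchers t1
    -- Only rows whose direct successor is missing start a gap.
    WHERE NOT EXISTS (SELECT 1 FROM arrc_vouchers t2 WHERE t2.id = t1.id + 1)
) AS candidate_gaps
-- The highest id has no successor at all; drop that open-ended row.
WHERE gap_ends_at IS NOT NULL
ORDER BY gap_starts_at
""").fetchall()

print(gaps)
```

Each result row is one contiguous gap, so a run of twelve missing ids costs one row rather than twelve.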
This just worked for me to find the gaps in a table with more than 80k rows:
SELECT
CONCAT(z.expected, IF(z.got-1>z.expected, CONCAT(' thru ',z.got-1), '')) AS missing
FROM (
SELECT
@rownum:=@rownum+1 AS expected,
IF(@rownum=YourCol, 0, @rownum:=YourCol) AS got
FROM
(SELECT @rownum:=0) AS a
JOIN YourTable
ORDER BY YourCol
) AS z
WHERE z.got!=0;
Result:
+------------------+
| missing |
+------------------+
| 1 thru 99 |
| 666 thru 667 |
| 50000 |
| 66419 thru 66456 |
+------------------+
4 rows in set (0.06 sec)
Note that the order of columns expected and got is critical.
If you know that YourCol doesn't start at 1 and that doesn't matter, you can replace
(SELECT @rownum:=0) AS a
with
(SELECT @rownum:=(SELECT MIN(YourCol)-1 FROM YourTable)) AS a
New result:
+------------------+
| missing |
+------------------+
| 666 thru 667 |
| 50000 |
| 66419 thru 66456 |
+------------------+
3 rows in set (0.06 sec)
If you need to perform some kind of shell script task on the missing IDs, you can also use this variant in order to directly produce an expression you can iterate over in Bash.
SELECT GROUP_CONCAT(IF(z.got-1>z.expected, CONCAT('$(seq ',z.expected,' ',z.got-1,')'), z.expected) SEPARATOR " ") AS missing
FROM ( SELECT @rownum:=@rownum+1 AS expected, IF(@rownum=height, 0, @rownum:=height) AS got FROM (SELECT @rownum:=0) AS a JOIN block ORDER BY height ) AS z WHERE z.got!=0;
This produces an output like so
$(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456)
You can then copy and paste it into a for loop in a bash terminal to execute a command for every ID
for ID in $(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456); do
echo $ID
# Fill the gaps
done
It's the same thing as above, only that it's both readable and executable. By changing the "CONCAT" command above, syntax can be generated for other programming languages. Or maybe even SQL.
A quick-and-dirty query that should do the trick:
SELECT a AS id, b AS next_id, (b - a) -1 AS missing_inbetween
FROM
(
SELECT a1.id AS a , MIN(a2.id) AS b
FROM arrc_vouchers AS a1
LEFT JOIN arrc_vouchers AS a2 ON a2.id > a1.id
WHERE a1.id <= 100
GROUP BY a1.id
) AS tab
WHERE
b > a + 1
This will give you a table showing the id that has ids missing above it, and next_id that exists, and how many are missing between... E.g.,
id next_id missing_inbetween
1 4 2
68 70 1
75 87 11
If you are using a MariaDB database, you have a faster (800%) option using the sequence storage engine:
SELECT * FROM seq_1_to_50000 WHERE SEQ NOT IN (SELECT COL FROM TABLE);
If there is a sequence having gap of maximum one between two numbers (like
1,3,5,6) then the query that can be used is:
select s.id+1 from source1 s where s.id+1 not in(select id from source1) and s.id+1<(select max(id) from source1);
table_name - source1
column_name - id
An alternative solution that requires a query + some code doing some processing would be:
select l.id lValue, c.id cValue, r.id rValue
from
arrc_vouchers l
right join arrc_vouchers c on l.id=IF(c.id > 0, c.id-1, null)
left join arrc_vouchers r on r.id=c.id+1
where 1=1
and c.id > 0
and (l.id is null or r.id is null)
order by c.id asc;
Note that the query contains no subselects, which MySQL's planner is known not to handle efficiently.
That will return one entry per centralValue (cValue) that does not have a smaller value (lValue) or a greater value (rValue), i.e.:
lValue |cValue|rValue
-------+------+-------
{null} | 2 | 3
8 | 9 | {null}
{null} | 22 | 23
23 | 24 | {null}
{null} | 29 | {null}
{null} | 33 | {null}
Without going into further details (we'll see them in next paragraphs) this output means that:
No values between 0 and 2
No values between 9 and 22
No values between 24 and 29
No values between 29 and 33
No values between 33 and MAX VALUE
So the basic idea is to do RIGHT and LEFT joins against the same table to see whether each value has adjacent values (i.e., if the central value is '3', then we check for 3-1=2 on the left and 3+1=4 on the right); when a row has a NULL value on the RIGHT or LEFT, we know there is no adjacent value.
The complete raw output of my table is:
select * from arrc_vouchers order by id asc;
0
2
3
4
5
6
7
8
9
22
23
24
29
33
Some notes:
The SQL IF in the join condition is needed if you define the 'id' field as UNSIGNED, since it then cannot be decremented below zero. This is not strictly necessary if you keep the c.id > 0 condition, as stated in the next note, but I'm including it for documentation.
I'm filtering the zero central value as we are not interested in any previous value and we can derive the post value from the next row.
I tried it in a different manner, and the best performance that I found was this simple query:
select a.id+1 gapIni
,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
from arrc_vouchers a
left join arrc_vouchers b on b.id=a.id+1
where b.id is null
order by 1
;
... one left join checks whether the next id exists; only if the next id is not found does the subquery look for the next existing id to find the end of the gap. I did it this way because a query with equality (=) performs better than one with the greater-than (>) operator.
On SQL Fiddle it does not show much of a performance difference compared to the other queries, but on a real database this query was 3 times faster than the others.
The schema:
CREATE TABLE arrc_vouchers (id int primary key)
;
INSERT INTO `arrc_vouchers` (`id`) VALUES (1),(4),(5),(7),(8),(9),(10),(11),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29)
;
Below are all the queries I wrote to compare performance:
select a.id+1 gapIni
,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
from arrc_vouchers a
left join arrc_vouchers b on b.id=a.id+1
where b.id is null
order by 1
;
select *, (gapEnd-gapIni) qt
from (
select id+1 gapIni
,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
from arrc_vouchers a
order by id
) a where gapEnd <> gapIni
;
select id+1 gapIni
,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
#,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
from arrc_vouchers a
where id+1 <> (select x.id from arrc_vouchers x where x.id>a.id limit 1)
order by id
;
select id+1 gapIni
,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
from arrc_vouchers a
order by id
;
select id+1 gapIni
,coalesce((select id from arrc_vouchers x where x.id=a.id+1),concat('*** GAT *** ',(select x.id from arrc_vouchers x where x.id>a.id limit 1))) gapEnd
from arrc_vouchers a
order by id
;
You can see and test my query using this SQL Fiddle:
http://sqlfiddle.com/#!9/6bdca7/1
It is probably not relevant, but I was looking for something like this to list the gaps in a sequence of numbers and found this post that has multiple different solutions depending upon exactly what you are looking for. I was looking for the first available gap in the sequence (i.e., next available number), and this seems to work fine.
SELECT MIN(l.number_sequence + 1) as nextavabile
from patients as l
LEFT OUTER JOIN patients as r on l.number_sequence + 1 = r.number_sequence
WHERE r.number_sequence is NULL
Several other scenarios and solutions discussed there, from 2005!
How to Find Missing Values in a Sequence With SQL
Create a temporary table with 100 rows and a single column containing the values 1-100.
Outer Join this table to your arrc_vouchers table and select the single column values where the arrc_vouchers id is null.
This should work:
select tempid from temptable
left join arrc_vouchers on temptable.tempid = arrc_vouchers.id
where arrc_vouchers.id is null
Although these all seem to work, the result set takes a very long time to return when there are 50,000 records.
I used this, and it finds the gap or the next available id (last used + 1) much faster.
SELECT a.id as beforegap, a.id+1 as avail
FROM table_name a
where (select b.id from table_name b where b.id=a.id+1) is null
limit 1;
Based on the answer given by matt, this stored procedure allows you to specify the table and column names that you wish to test to find non-contiguous records, thus answering the original question and also demonstrating how one could use @var to represent tables and/or columns in a stored procedure.
create definer=`root`@`localhost` procedure `spfindnoncontiguous`(in `param_tbl` varchar(64), in `param_col` varchar(64))
language sql
not deterministic
contains sql
sql security definer
comment ''
begin
declare strsql varchar(1000);
declare tbl varchar(64);
declare col varchar(64);
set @tbl=cast(param_tbl as char character set utf8);
set @col=cast(param_col as char character set utf8);
set @strsql=concat("select
( t1.",@col," + 1 ) as starts_at,
( select min(t3.",@col,") -1 from ",@tbl," t3 where t3.",@col," > t1.",@col," ) as ends_at
from ",@tbl," t1
where not exists ( select t2.",@col," from ",@tbl," t2 where t2.",@col," = t1.",@col," + 1 )
having ends_at is not null");
prepare stmt from @strsql;
execute stmt;
deallocate prepare stmt;
end
A simple, yet effective, solution to find the missing auto-increment values:
SELECT `id`+1
FROM `table_name`
WHERE `id`+1 NOT IN (SELECT id FROM table_name)
Another simple answer that identifies the gaps. We do a query selecting just the odd numbers and we right join it to a query with all the even numbers. As long as you're not missing id 1; this should give you a comprehensive list of where the gaps start.
You'll still have to take a look at that place in the database to figure out how many numbers the gap is. I found this way easier than the solution proposed and much easier to customize to unique situations.
SELECT *
FROM (SELECT * FROM MyTABLE WHERE MYFIELD % 2 > 0) AS A
RIGHT JOIN (SELECT * FROM MyTABLE WHERE MYFIELD % 2 = 0) AS B
ON A.MYFIELD=(B.MYFIELD+1)
WHERE a.id IS NULL;
This works for me:
SELECT distinct(l.membership_no + 1) as nextavabile
from Tablename as l
LEFT OUTER JOIN Tablename as r on l.membership_no + 1 = r.membership_no
WHERE r.membership_no is NULL and l.membership_no is not null order by nextavabile asc;
Starting from the comment posted by user933161,
select l.id + 1 as start from sequence as l inner join sequence as r on l.id + 1 = r.id where r.id is null;
is better in that it will not produce a false positive for the end of the list of records. (I'm not sure why so many are using left outer joins.)
Also,
insert into sequence (id) values (#);
where # is the start value for a gap will fill that start value. (If there are fields that cannot be null, you will have to add those with dummy values.)
You could alternate between querying for start values and filling in each start value until the query for start values returns an empty set.
Of course, this approach would only be helpful if you're working with a small enough data set that manually iterating like that is reasonable. I don't know enough about things like phpMyAdmin to come up with ways to automate it for larger sets with more and larger gaps.
CREATE TABLE arrc_vouchers (id int primary key);
INSERT INTO `arrc_vouchers` (`id`) VALUES (1),(4),(5),(7),(8),(9),(10),(11),(15),(16);
WITH RECURSIVE odd_num_cte (id) AS
(
SELECT (select min(id) from arrc_vouchers)
union all
SELECT id+1 from odd_num_cte where id <(SELECT max(id) from arrc_vouchers)
)
SELECT cte.id
from arrc_vouchers ar right outer join odd_num_cte cte on ar.id=cte.id
where ar.id is null;
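The query above generates every number between the smallest and largest id with a recursive CTE and keeps those that have no matching row. Here is a minimal sketch of the same idea on SQLite via Python's `sqlite3` module, a stand-in for MySQL 8. The CTE is renamed `series` for the example, and the right outer join is written as an equivalent LEFT JOIN starting from the generated series, since older SQLite versions lack RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arrc_vouchers (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO arrc_vouchers (id) VALUES (?)",
    [(i,) for i in (1, 4, 5, 7, 8, 9, 10, 11, 15, 16)],
)

missing = [row[0] for row in conn.execute("""
WITH RECURSIVE series (id) AS (
    -- Anchor: start at the smallest existing id.
    SELECT MIN(id) FROM arrc_vouchers
    UNION ALL
    -- Recursive step: count upward until the largest existing id.
    SELECT id + 1 FROM series
    WHERE id < (SELECT MAX(id) FROM arrc_vouchers)
)
SELECT s.id
FROM series AS s
LEFT JOIN arrc_vouchers AS ar ON ar.id = s.id
-- Generated numbers without a matching row are the missing ids.
WHERE ar.id IS NULL
ORDER BY s.id
""")]

print(missing)
```

Unlike the range-style queries, this returns one row per missing id, which is what the question asked for (71, 72, 73, etc.).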