Recursively generate JSON tree from hierarchical table in Postgres and jOOQ

I have a hierarchical table in a Postgres database, e.g. category. The structure is simple, like this:
id | parent_id | name
---+-----------+-----
 1 | null      | A
 2 | null      | B
 3 | 1         | A1
 4 | 3         | A1a
 5 | 3         | A1b
 6 | 2         | B1
 7 | 2         | B2
What I need to get from this table is a recursive, arbitrarily deep tree structure like this:
[
  {
    "id": 1,
    "name": "A",
    "children": [
      {
        "id": 3,
        "name": "A1",
        "children": [
          {
            "id": 4,
            "name": "A1a",
            "children": []
          },
          {
            "id": 5,
            "name": "A1b",
            "children": []
          }
        ]
      }
    ]
  },
  {
    "id": 2,
    "name": "B",
    "children": [
      {
        "id": 6,
        "name": "B1",
        "children": []
      },
      {
        "id": 7,
        "name": "B2",
        "children": []
      }
    ]
  }
]
Is it possible, with unknown depth, using a combination of WITH RECURSIVE and json_build_array(), or is there some other solution?

I found an answer to this question in this excellent blog post here, as I was wondering how to generalise over this problem in jOOQ. It would be useful if jOOQ could materialise arbitrary recursive object trees in a generic way: https://github.com/jOOQ/jOOQ/issues/12341
In the meantime, use this SQL statement, which was inspired by the above blog post, with a few modifications. Translate to jOOQ if you must, though you might as well store this as a view:
WITH RECURSIVE
  d1 (id, parent_id, name) AS (
    VALUES
      (1, null, 'A'),
      (2, null, 'B'),
      (3, 1, 'A1'),
      (4, 3, 'A1a'),
      (5, 3, 'A1b'),
      (6, 2, 'B1'),
      (7, 2, 'B2')
  ),
  d2 AS (
    SELECT d1.*, 0 AS level
    FROM d1
    WHERE parent_id IS NULL
    UNION ALL
    SELECT d1.*, d2.level + 1
    FROM d1
    JOIN d2 ON d2.id = d1.parent_id
  ),
  d3 AS (
    SELECT d2.*, jsonb_build_array() AS children
    FROM d2
    WHERE level = (SELECT max(level) FROM d2)
    UNION (
      SELECT (branch_parent).*, jsonb_agg(branch_child)
      FROM (
        SELECT
          branch_parent,
          to_jsonb(branch_child) - 'level' - 'parent_id' AS branch_child
        FROM d2 branch_parent
        JOIN d3 branch_child ON branch_child.parent_id = branch_parent.id
      ) branch
      GROUP BY branch.branch_parent
      UNION
      SELECT d2.*, jsonb_build_array()
      FROM d2
      WHERE d2.id NOT IN (
        SELECT parent_id FROM d2 WHERE parent_id IS NOT NULL
      )
    )
  )
SELECT jsonb_pretty(jsonb_agg(to_jsonb(d3) - 'level' - 'parent_id')) AS tree
FROM d3
WHERE level = 0;
See the dbfiddle for a runnable version. Again, read the linked blog post for an explanation of how this works.
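Two jsonb building blocks do most of the work above: the - operator, which strips a key from a jsonb object, and jsonb_agg, which collects one jsonb value per row into a jsonb array. A small, self-contained illustration (the literal values below are just hypothetical demo rows, not part of the original answer):

-- to_jsonb(row) turns a whole row into a jsonb object; '- key' then drops the
-- helper columns that should not appear in the tree.
SELECT to_jsonb(t) - 'level' - 'parent_id' AS node
FROM (VALUES (4, 3, 'A1a', 2)) AS t (id, parent_id, name, level);
-- {"id": 4, "name": "A1a"}

-- jsonb_agg collects the per-row objects into a jsonb array (the children).
SELECT jsonb_agg(to_jsonb(t) - 'parent_id') AS children
FROM (VALUES (4, 3, 'A1a'), (5, 3, 'A1b')) AS t (id, parent_id, name);
-- [{"id": 4, "name": "A1a"}, {"id": 5, "name": "A1b"}]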

Related

How to return result of a join into a single property in a Postgres query?

Suppose the following,
CREATE SCHEMA IF NOT EXISTS my_schema;
CREATE TABLE IF NOT EXISTS my_schema.my_table_a (
id serial PRIMARY KEY
);
CREATE TABLE IF NOT EXISTS my_schema.my_table_b (
id serial PRIMARY KEY,
my_table_a_id BIGINT REFERENCES my_schema.my_table_a (id) NOT NULL
);
INSERT INTO my_schema.my_table_a VALUES
(1);
INSERT INTO my_schema.my_table_b VALUES
(1, 1),
(2, 1),
(3, 1);
If I run the following query,
SELECT
ta.*,
tb as tb
FROM my_schema.my_table_a ta
LEFT JOIN my_schema.my_table_b tb
ON ta.id = tb.my_table_a_id;
Then the result is,
[
  {
    "id": 1,
    "tb": {
      "id": 1,
      "my_table_a_id": 1
    }
  },
  {
    "id": 1,
    "tb": {
      "id": 2,
      "my_table_a_id": 1
    }
  },
  {
    "id": 1,
    "tb": {
      "id": 3,
      "my_table_a_id": 1
    }
  }
]
How can I get it to work like this:
[
  {
    "id": 1,
    "tb": [
      {
        "id": 1,
        "my_table_a_id": 1
      },
      {
        "id": 2,
        "my_table_a_id": 1
      },
      {
        "id": 3,
        "my_table_a_id": 1
      }
    ]
  }
]
SELECT
  ta.*,
  ARRAY_AGG(tb) AS tb
FROM my_schema.my_table_a ta
LEFT JOIN my_schema.my_table_b tb
  ON ta.id = tb.my_table_a_id
GROUP BY ta.id
ORDER BY ta.id;
Example https://www.db-fiddle.com/f/5i97YZ6FMRY48pZaJ255EJ/0
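If the tb property should be real JSON (as in the desired output) rather than a Postgres array of row values, jsonb_agg does the aggregation directly; a minimal sketch, reusing the tables above:

-- jsonb_agg builds a JSON array with one object per joined my_table_b row;
-- the FILTER/COALESCE pair keeps "tb" as [] for parents without children.
SELECT
  ta.id,
  COALESCE(
    jsonb_agg(to_jsonb(tb)) FILTER (WHERE tb.id IS NOT NULL),
    '[]'
  ) AS tb
FROM my_schema.my_table_a ta
LEFT JOIN my_schema.my_table_b tb
  ON ta.id = tb.my_table_a_id
GROUP BY ta.id
ORDER BY ta.id;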

MySQL find in database where value in array in JSON is BETWEEN something

I have a database:
user | info
-----+--------------------------------------------------------------------------------------------------
0    | {"messages": [{"user_to": 1, "timestamp": 1663000000}, {"user_to": 2, "timestamp": 1662000000}]}
1    | {"messages": [{"user_to": 0, "timestamp": 1661000000}, {"user_to": 2, "timestamp": 1660000000}]}
2    | {"messages": []}
And I want to select all users who sent messages with a timestamp between 1662000000 and 1663000000 (any number of matching messages, not all of them).
I don't have an external table of messages, so I can't select from there.
If you're using MySQL v8.0.x, you can use JSON_TABLE to turn the JSON into a relational table inside a subquery, then select your DISTINCT users with the timestamp in a WHERE clause, like this:
SELECT DISTINCT b.`user`
FROM (
  SELECT `user`, a.*
  FROM `sample_table`,
    JSON_TABLE(`info`, '$'
      COLUMNS (
        NESTED PATH '$.messages[*]'
        COLUMNS (
          `user_to` int(11) PATH '$.user_to',
          `timestamp` int(40) PATH '$.timestamp'
        )
      )
    ) a
) b
WHERE b.`timestamp` BETWEEN 1662000000 AND 1663000000
ORDER BY b.`user` ASC
Input:
user | info
-----+--------------------------------------------------------------------------------------------------
0    | {"messages": [{"user_to": 1, "timestamp": 1663000000}, {"user_to": 2, "timestamp": 1662000000}]}
1    | {"messages": [{"user_to": 0, "timestamp": 1661000000}, {"user_to": 2, "timestamp": 1660000000}]}
2    | {"messages": []}
3    | {"messages": [{"user_to": 0, "timestamp": 1662000000}, {"user_to": 2, "timestamp": 1661000000}, {"user_to": 2, "timestamp": 1660000000}, {"user_to": 2, "timestamp": 1663000000}]}
Output:
user
----
0
3
db<>fiddle here
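Since only the timestamp is needed for the filter, the subquery wrapper can also be dropped. A more compact variant along the same lines (a sketch, assuming the same sample_table and info names):

-- Unnest only $.messages[*] and filter directly; DISTINCT keeps one row per
-- user even when several of their messages fall inside the range.
SELECT DISTINCT s.`user`
FROM `sample_table` s,
  JSON_TABLE(s.`info`, '$.messages[*]'
    COLUMNS (`timestamp` BIGINT PATH '$.timestamp')
  ) m
WHERE m.`timestamp` BETWEEN 1662000000 AND 1663000000
ORDER BY s.`user`;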

Convert flat SQL rows into nested JSON array using FOR JSON

So, I have a simple view that looks like this:
Name | Type | Product | QuantitySold
------------------------------------------------------
Walmart | Big Store | Gummy Bears | 10
Walmart | Big Store | Toothbrush | 6
Target | Small Store | Toothbrush | 2
Without using nested queries, can this easily be converted to the following JSON using SQL's FOR JSON clause?
[
  {
    "Type": "Big Store",
    "Stores": [
      {
        "Name": "Walmart",
        "Products": [
          {
            "Name": "Gummy Bears",
            "QuantitySold": 10
          },
          {
            "Name": "Toothbrush",
            "QuantitySold": 6
          }
        ]
      }
    ]
  },
  {
    "Type": "Small Store",
    "Stores": [
      {
        "Name": "Target",
        "Products": [
          {
            "Name": "Toothbrush",
            "QuantitySold": 2
          }
        ]
      }
    ]
  }
]
Essentially: group by Type, then Store, then the line items. My attempt so far is below; I'm not sure how to properly group the rows.
SELECT Type, (
SELECT Store,
(SELECT Product,QuantitySold from MyTable m3 where m3.id=m2.id for json path) as Products
FROM MyTable m2 where m1.ID = m2.ID for json path) as Stores
) as Types FROM MyTable m1
You can try something like this:
DECLARE @Data TABLE (
  Name VARCHAR(20), Type VARCHAR(20), Product VARCHAR(20), QuantitySold INT
);
INSERT INTO @Data ( Name, Type, Product, QuantitySold ) VALUES
  ( 'Walmart', 'Big Store', 'Gummy Bears', 10 ),
  ( 'Walmart', 'Big Store', 'Toothbrush', 6 ),
  ( 'Target', 'Small Store', 'Toothbrush', 2 );

SELECT DISTINCT
  t.[Type],
  Stores
FROM @Data AS t
OUTER APPLY (
  SELECT (
    SELECT DISTINCT [Name], Products
    FROM @Data x
    OUTER APPLY (
      SELECT (
        SELECT Product AS [Name], QuantitySold
        FROM @Data n
        WHERE n.[Name] = x.[Name]
        FOR JSON PATH
      ) AS Products
    ) AS p
    WHERE x.[Type] = t.[Type]
    FOR JSON PATH
  ) AS Stores
) AS Stores
ORDER BY [Type]
FOR JSON PATH;
Returns
[{
    "Type": "Big Store",
    "Stores": [{
        "Name": "Walmart",
        "Products": [{
            "Name": "Gummy Bears",
            "QuantitySold": 10
        }, {
            "Name": "Toothbrush",
            "QuantitySold": 6
        }]
    }]
}, {
    "Type": "Small Store",
    "Stores": [{
        "Name": "Target",
        "Products": [{
            "Name": "Toothbrush",
            "QuantitySold": 2
        }]
    }]
}]
If you had a normalized data structure, you could use another approach.
-- Let's assume that Types are stored like this
DECLARE @Types TABLE (
  id int,
  Type nvarchar(20)
);
INSERT INTO @Types VALUES (1, N'Big Store'), (2, N'Small Store');

-- Stores in a separate table
DECLARE @Stores TABLE (
  id int,
  Name nvarchar(10),
  TypeId int
);
INSERT INTO @Stores VALUES (1, N'Walmart', 1), (2, N'Target', 2),
  (3, N'Tesco', 2); -- I added one more just for fun

-- Products table
DECLARE @Products TABLE (
  id int,
  Name nvarchar(20)
);
INSERT INTO @Products VALUES (1, N'Gummy Bears'), (2, N'Toothbrush'),
  (3, N'Milk'), (4, N'Ball'); -- Added some here

-- And here come the sales
DECLARE @Sales TABLE (
  StoreId int,
  ProductId int,
  QuantitySold int
);
INSERT INTO @Sales VALUES (1, 1, 10), (1, 2, 6), (2, 2, 2),
  (3, 4, 15), (3, 3, 7); -- I added a few more
Now we can join the tables and get the result that you need:
SELECT Type = [Type].Type,
       Name = [Stores].Name,
       Name = Products.Product,
       QuantitySold = Products.QuantitySold
FROM (
  SELECT s.StoreId,
         p.Name Product,
         s.QuantitySold
  FROM @Sales s
  INNER JOIN @Products p
    ON p.id = s.ProductId
) Products
INNER JOIN @Stores Stores
  ON Stores.Id = Products.StoreId
INNER JOIN @Types [Type]
  ON Stores.TypeId = [Type].id
ORDER BY [Type].Type, [Stores].Name
FOR JSON AUTO;
Output:
[
  {
    "Type": "Big Store",
    "Stores": [
      {
        "Name": "Walmart",
        "Products": [
          {
            "Name": "Gummy Bears",
            "QuantitySold": 10
          },
          {
            "Name": "Toothbrush",
            "QuantitySold": 6
          }
        ]
      }
    ]
  },
  {
    "Type": "Small Store",
    "Stores": [
      {
        "Name": "Target",
        "Products": [
          {
            "Name": "Toothbrush",
            "QuantitySold": 2
          }
        ]
      },
      {
        "Name": "Tesco",
        "Products": [
          {
            "Name": "Ball",
            "QuantitySold": 15
          },
          {
            "Name": "Milk",
            "QuantitySold": 7
          }
        ]
      }
    ]
  }
]
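FOR JSON AUTO derives the nesting from the order in which columns from each joined table appear in the SELECT list. If you need explicit control over the shape instead, the same output can be produced with correlated FOR JSON PATH subqueries; a sketch against the table variables above (my variant, not part of the original answer):

-- Each level is an explicit FOR JSON PATH subquery; inner FOR JSON results
-- are embedded as JSON (not escaped strings) by the outer FOR JSON.
SELECT
  ty.Type,
  Stores = (
    SELECT
      st.Name,
      Products = (
        SELECT p.Name, sa.QuantitySold
        FROM @Sales sa
        INNER JOIN @Products p ON p.id = sa.ProductId
        WHERE sa.StoreId = st.id
        FOR JSON PATH
      )
    FROM @Stores st
    WHERE st.TypeId = ty.id
    FOR JSON PATH
  )
FROM @Types ty
ORDER BY ty.Type
FOR JSON PATH;

The trade-off is verbosity: every nesting level becomes its own subquery, but the shape no longer depends on join order or aliases.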

Count json tags in sql

I have this json strings
[{"count": 9, "name": "fixkit", "label": "Repair Kit"}, {"count": 1, "name": "phone", "label": "Telefoon"}]
[{"count": 3, "name": "phone", "label": "Telefoon"}]
[{"count": 5, "name": "kunststof", "label": "Kunststof"}, {"count": 6, "name": "papier", "label": "Papier"}, {"count": 2, "name": "metaal", "label": "Metaal"}, {"count": 2, "name": "inkt", "label": "Inkt"}, {"count": 3, "name": "kabels", "label": "Kabels"}, {"count": 2, "name": "klei", "label": "Klei"}, {"count": 2, "name": "glas", "label": "Glas"}, {"count": 12, "name": "phone", "label": "Telefoon"}]
[{"count": 77, "name": "weed", "label": "Cannabis"}, {"count": 1, "name": "firework1", "label": "Vuurpijl 1"}]
And now I want the following output:
Phone | Number of phones (in this case: 16)
Fixkit | Number of fixkits (in this case: 9)
I want to do this with a SQL query. If you know how to do this, thanks in advance!
If you're not using MySQL 8, this is a bit more complicated. First you have to find a path to a name element that has the value phone (or fixkit); then you can replace name in that path with count and extract the count field from that path; these values can then be summed:
SELECT param,
       SUM(JSON_EXTRACT(counts,
                        REPLACE(JSON_UNQUOTE(JSON_SEARCH(counts, 'one', param, NULL, '$[*].name')),
                                'name', 'count'))) AS count
FROM data
CROSS JOIN (
  SELECT 'phone' AS param
  UNION ALL
  SELECT 'fixkit'
) params
WHERE JSON_SEARCH(counts, 'one', param, NULL, '$[*].name') IS NOT NULL
GROUP BY param
Output:
param  | count
-------+------
fixkit | 9
phone  | 16
Demo on dbfiddle
If you are running MySQL 8.0, you can unnest the arrays into rows with json_table(), then filter on the names you are interested in, and aggregate.
Assuming that your table is mytable and that the json column is called js, that would be:
select j.name, sum(j.cnt) cnt
from mytable t
cross join json_table (
t.js,
'$[*]' columns(
cnt int path '$.count',
name varchar(50) path '$.name'
)
) j
where j.name in ('phone', 'fixkit')
group by j.name
Demo on DB Fiddle:
| name | cnt |
| ------ | --- |
| fixkit | 9 |
| phone | 16 |
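If you later need the totals for every item rather than just phone and fixkit, the same json_table unnesting works without the name filter; a small variant of the query above (same assumed mytable / js names):

-- Same unnesting as above, aggregated for every distinct name.
select j.name, sum(j.cnt) as total
from mytable t
cross join json_table (
  t.js,
  '$[*]' columns(
    cnt  int path '$.count',
    name varchar(50) path '$.name'
  )
) j
group by j.name
order by total desc;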

Modelling multi-valued columns in RDBMS [duplicate]

This question already has answers here:
How to return rows that have the same column values in MySql
(3 answers)
Closed 6 years ago.
I have raw data in JSON as follows:
{
  "id": 1,
  "tags": [{
    "category": "location",
    "values": ["website", "browser"]
  }, {
    "category": "campaign",
    "values": ["christmas_email"]
  }]
},
{
  "id": 2,
  "tags": [{
    "category": "location",
    "values": ["website", "browser", "chrome"]
  }]
},
{
  "id": 3,
  "tags": [{
    "category": "location",
    "values": ["website", "web_view"]
  }]
}
The tag categories and their values are dynamically generated and are not known beforehand. I need to load this data into an RDBMS table and then query it later. The queries may be as follows:
Extract all rows where location has values "website" and "browser". The output of this query should return rows with id 1 and 2.
I need some help in modelling this into a table schema to support such queries. I was thinking of tables as:
Table 1: MAIN
Columns: ID, TAG_LIST_ID
Row1: 1 TL1
Row2: 2 TL2
Row3: 3 TL3
Table 2: TAGS
Columns: TAG_ID, TAG_CATEGORY, TAG_VALUE
Row1: TID1 location website
Row2: TID2 location browser
Row3: TID3 location chrome
Row4: TID4 location web_view
Row5: TID5 campaign christmas_email
Table 3: TAG_MAPPING
Columns: TAG_MAPPING_ID, TAG_LIST_ID, TAG_ID
Row1: TMID1 TL1 TID1
Row2: TMID2 TL1 TID2
Row3: TMID3 TL1 TID5
Row4: TMID4 TL2 TID1
Row5: TMID5 TL2 TID2
Row6: TMID6 TL2 TID3
Row7: TMID7 TL3 TID1
Row8: TMID8 TL3 TID4
Now to query all rows where location has values "website" and "browser", I could write
SELECT * from MAIN m, TAGS t, TAG_MAPPING tm
WHERE m.TAG_LIST_ID=tm.TAG_LIST_ID AND
tm.TAG_ID = t.TAG_ID AND
t.TAG_CATEGORY = "location" AND
(t.TAG_VALUE="website" OR t.TAG_VALUE="browser")
However, this will return all three rows, and changing the OR condition to AND will return no rows. What is the right way to design the schema?
Any pointers appreciated.
Just replace the OR with an IN and add a counter:
SELECT tm.TAG_LIST_ID, count(1) as cnt
FROM MAIN m, TAGS t, TAG_MAPPING tm
WHERE tm.TAG_LIST_ID = m.TAG_LIST_ID
  AND tm.TAG_ID = t.TAG_ID
  AND t.TAG_CATEGORY = "location"
  AND t.TAG_VALUE IN ("website", "browser")
GROUP BY tm.TAG_LIST_ID
HAVING count(1) > 1 -- should be greater than 1 because you are looking for 2 words; this value changes according to the number of words
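If you need the MAIN rows themselves rather than the matching TAG_LIST_IDs, the same idea can sit in a subquery; a sketch along those lines (COUNT(DISTINCT ...) also guards against a value being mapped twice):

-- Relational division: keep only tag lists that cover both required values.
SELECT m.*
FROM MAIN m
WHERE m.TAG_LIST_ID IN (
  SELECT tm.TAG_LIST_ID
  FROM TAG_MAPPING tm
  JOIN TAGS t ON t.TAG_ID = tm.TAG_ID
  WHERE t.TAG_CATEGORY = 'location'
    AND t.TAG_VALUE IN ('website', 'browser')
  GROUP BY tm.TAG_LIST_ID
  HAVING COUNT(DISTINCT t.TAG_VALUE) = 2  -- 2 = number of required values
);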