PostgreSQL Materialized Path / Ltree to hierarchical JSON object

I have this materialized path tree structure built using PostgreSQL's ltree module.
id1
id1.id2
id1.id2.id3
id1.id2.id5
id1.id2.id3.id4 ... etc
I can of course easily use ltree to get all nodes from the entire tree or from a specific path/subpath, but what I get back is naturally a flat list of rows (which ends up as an array/slice of nodes in Golang or whatever programming language you use).
What I'm after is a way to fetch the tree, ideally between a given start and end path, as a hierarchical JSON tree object like this:
{
"id": 1,
"path": "1",
"name": "root",
"children": [
{
"id": 2,
"path": "1.2",
"name": "Node 2",
"children": [
{
"id": 3,
"path": "1.2.3",
"name": "Node 3",
"children": [
{
"id": 4,
"path": "1.2.3.4",
"name": "Node 4",
"children": [
]
}
]
},
{
"id": 5,
"path": "1.2.5",
"name": "Node 5",
"children": [
]
}
]
}
]
}
I know that from a linear (non-hierarchical) row/array/slice result set I can of course split the path in Golang and write the business logic there to build this JSON, but it would certainly be much better if there were a handy way of achieving this with PostgreSQL directly.
So how would you output an ltree tree structure as JSON in PostgreSQL, potentially from a starting path to an ending path?
If you don't know ltree, I guess the question could be generalized to "materialized path tree to hierarchical JSON".
I'm also toying with the idea of adding a parent_id to all nodes in addition to the ltree path, since then I would at least be able to use recursive calls on that id to fetch the JSON. I've also thought about putting a trigger on parent_id to keep the path updated whenever the parent id changes. I know that's another question, but perhaps you could give me your opinion on this as well?
I hope some genius can help me with this. :)
For your convenience here's a sample create script you can use to save time:
CREATE EXTENSION IF NOT EXISTS ltree; -- the ltree column type requires this extension
CREATE TABLE node
(
    id bigserial NOT NULL,
    path ltree NOT NULL,
    name character varying(255),
    CONSTRAINT node_pkey PRIMARY KEY (id)
);
INSERT INTO node (path,name)
VALUES ('1','root');
INSERT INTO node (path,name)
VALUES ('1.2','Node 1');
INSERT INTO node (path,name)
VALUES ('1.2.3','Node 3');
INSERT INTO node (path,name)
VALUES ('1.2.3.4','Node 4');
INSERT INTO node (path,name)
VALUES ('1.2.5','Node 5');

I was able to find a solution and slightly change it to work with ltree's materialized paths instead of the parent ids often used in adjacency-list tree structures.
While I still hope for a better solution, I guess this will get the job done.
I kind of feel I have to add the parent_id in addition to the ltree path, since this is of course nowhere near as fast as referencing parent ids.
Credit goes to this guy's solution; here's my slightly modified code, which uses ltree's subpath, ltree2text and nlevel to achieve exactly the same thing:
WITH RECURSIVE c AS (
    SELECT *, 1 as lvl
    FROM node
    WHERE id=1
    UNION ALL
    SELECT node.*, c.lvl + 1 as lvl
    FROM node
    JOIN c ON ltree2text(subpath(node.path,nlevel(node.path)-2 ,nlevel(node.path))) = CONCAT(subpath(c.path,nlevel(c.path)-1,nlevel(c.path)),'.',node.id)
),
maxlvl AS (
    SELECT max(lvl) maxlvl FROM c
),
j AS (
    SELECT c.*, json '[]' children
    FROM c, maxlvl
    WHERE lvl = maxlvl
    UNION ALL
    SELECT (c).*, json_agg(j) children FROM (
        SELECT c, j
        FROM j
        JOIN c ON ltree2text(subpath(j.path,nlevel(j.path)-2,nlevel(j.path))) = CONCAT(subpath(c.path,nlevel(c.path)-1,nlevel(c.path)),'.',j.id)
    ) v
    GROUP BY v.c
)
SELECT row_to_json(j)::text json_tree
FROM j
WHERE lvl = 1;
There is a big problem with this solution so far, though: Node 5 is missing from the output.

The reason node 5 does not show up is that it is a leaf node that is not at the max level, so the subsequent join condition excludes it.
The true base case for recursing through a tree is a node that is a leaf. Starting at the max level implicitly selects the leaf nodes at that level but misses leaf nodes that occur at a lower level. Here is what we want to do in pseudocode:
for each node:
if node is leaf, then return empty array
else return the aggregated children
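For comparison, the same base case can be sketched outside the database. This is only a minimal Python illustration, assuming the rows have already been fetched as (path, name) pairs; ROWS, parent_of, children_of and subtree are hypothetical names made up for this sketch, not part of ltree or of the queries in this thread.

```python
import json

# Assumed sample rows mirroring the question's node table: (path, name).
ROWS = [
    ("1", "root"),
    ("1.2", "Node 1"),
    ("1.2.3", "Node 3"),
    ("1.2.3.4", "Node 4"),
    ("1.2.5", "Node 5"),
]


def parent_of(path):
    """Drop the last label: '1.2.3' -> '1.2', '1' -> ''."""
    return path.rpartition(".")[0]


def children_of(path, rows):
    """Return the nested children of `path`; a leaf yields an empty list."""
    return [
        {"path": p, "name": n, "children": children_of(p, rows)}
        for p, n in rows
        if parent_of(p) == path
    ]


def subtree(root_path, rows):
    """Build the nested dict for the node at `root_path`."""
    name = dict(rows)[root_path]
    return {"path": root_path, "name": name,
            "children": children_of(root_path, rows)}


tree = subtree("1", ROWS)
print(json.dumps(tree, indent=2))
```

The leaf case needs no special handling here because the list comprehension is simply empty when no row has the node as its parent. The sketch rescans all rows for every node, which is fine for small trees but not tuned for large ones.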
I found this hard to express in SQL though. Instead, I used the same strategy of starting from the max level and then moving up one level at a time. However, I added some code to handle the leaf node base case when I was above the max level.
Here is what I came up with:
WITH RECURSIVE c AS (
    SELECT
        name,
        path,
        nlevel(path) AS lvl
    FROM node
),
maxlvl AS (
    SELECT max(lvl) maxlvl FROM c
),
j AS (
    SELECT
        c.*,
        json '[]' AS children
    FROM c, maxlvl
    WHERE lvl = maxlvl
    UNION ALL
    SELECT
        (c).*,
        CASE
            WHEN COUNT(j) > 0 -- check if returned record is null
                THEN json_agg(j) -- if not null, aggregate
            ELSE json '[]' -- if null, then we are a leaf, so return empty array
        END AS children
    FROM (
        SELECT
            c,
            CASE
                WHEN c.path = subpath(j.path, 0, nlevel(j.path) - 1) -- c is a parent of the child
                    THEN j
                ELSE NULL -- if c is not a parent, return NULL to trigger base case
            END AS j
        FROM j
        JOIN c ON c.lvl = j.lvl - 1
    ) AS v
    GROUP BY v.c
)
SELECT row_to_json(j)::text AS json_tree
FROM j
WHERE lvl = 1;
My solution only uses the path (and the derived level from the path). It does not need name or id to properly recurse.
Here is the result I get (I included a node 6 to make sure I handled multiple leaf nodes at the same level):
{
"name": "root",
"path": "1",
"lvl": 1,
"children": [
{
"name": "Node 1",
"path": "1.2",
"lvl": 2,
"children": [
{
"name": "Node 5",
"path": "1.2.5",
"lvl": 3,
"children": []
},
{
"name": "Node 3",
"path": "1.2.3",
"lvl": 3,
"children": [
{
"name": "Node 6",
"path": "1.2.3.4",
"lvl": 4,
"children": []
},
{
"name": "Node 4",
"path": "1.2.3.4",
"lvl": 4,
"children": []
}
]
}
]
}
]
}

Related

Accessing an Array Inside JSON with a Postgres Query

I have a table with a data_type of json that I need to query one of the properties inside of it.
This is what the data in the column looks like:
{
"id": 7008,
"access_links": [
{
"product_code": "PRODUCT-1",
"link": "https://some.url"
},
{
"product_code": "PRODUCT-2",
"link": "https://someOther.url"
}
],
"library_id": "2d1203db-75b3-43a5-947c-8555b48371db"
}
I need to be able to pull out and filter by the product_code nested inside of the access_links.
I can get one layer deep by using this query:
SELECT
courses.course_metadata -> 'access_links' as access_links
FROM
courses
This seems to get me into the column, but I can't query any further.
The output I receive from the query looks like:
[{"product_code":"PRODUCT-1","link":"https://some.url"},{"product_code":"PRODUCT-2","link":"https://someOther.url"}]
I've tried using the ->> and #>> operators, but they both complain about the array not starting with a {. Also worth noting that the column is a data type of JSON not JSONB, so the #> operator doesn't work.
What am I missing here?
Does this help?
select
json_array_elements (x->'access_links')->'product_code' as product_code
from
(select '{
"id": 7008,
"access_links": [
{
"product_code": "PRODUCT-1",
"link": "https://some.url"
},
{
"product_code": "PRODUCT-2",
"link": "https://someOther.url"
}
],
"library_id": "2d1203db-75b3-43a5-947c-8555b48371db"
}'::json x
) as v
;
product_code
"PRODUCT-1"
"PRODUCT-2"

Postgres - updating an array element in a json column

I have a json column in a postgres table.
The column contains the following json data:
{
"data": {
"id": "1234",
"sites": [
{
"site": {
"code": "1",
"display": "Site1"
}
},
{
"site": {
"code": "2",
"display": "Site2"
},
"externalSite": true
},
{
"site": {
"code": "3",
"display": "Site3"
}
}
]
}
}
I need to create an update query that adds another attribute ('newAttribute' in the sample below) to all array items that have '"externalSite": true', so, after running the update query the second array element will be:
{
"site": {
"code": "2",
"display": "Site2"
},
"externalSite": true,
"newAttribute": true
}
The following query returns the array elements that need to be updated:
select * from myTable, jsonb_array_elements(data -> 'sites') sites
where sites ->'externalSite' = 'true'
What is the syntax of the update query?
Thanks
Kobi
Assuming your table is called test and your column is called data, you can update it like so:
UPDATE test SET data =
    (SELECT jsonb_set(data::jsonb, '{"data","sites"}', sites)
     FROM test
     CROSS JOIN LATERAL (
         SELECT jsonb_agg(CASE WHEN site ? 'externalSite'
                               THEN site || '{"newAttribute": true}'::jsonb -- JSON boolean, as in the question
                               ELSE site
                          END) AS sites
         FROM jsonb_array_elements( (data#>'{"data","sites"}')::jsonb ) AS ja(site)
     ) AS sub
    );
Note that I cast the data to jsonb, as there are more functions and operators available for manipulating jsonb than plain json.
You can run the SELECT statement alone to see what it is doing, but the basic idea is to re-create the sites array by expanding it with jsonb_array_elements and adding the newAttribute attribute if externalSite exists.
The elements are then aggregated back into an array with jsonb_agg and, finally, in the outer select, the sites array is replaced entirely with this newly computed version.
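To make the expand, patch and re-aggregate idea concrete, here is a rough Python sketch of the same transformation on an in-memory document. It is not a substitute for the UPDATE; add_attribute is a hypothetical helper name, and the sketch assumes the attribute should be a JSON boolean, as the question's expected output shows.

```python
import copy

# The document from the question, as a Python dict.
doc = {
    "data": {
        "id": "1234",
        "sites": [
            {"site": {"code": "1", "display": "Site1"}},
            {"site": {"code": "2", "display": "Site2"}, "externalSite": True},
            {"site": {"code": "3", "display": "Site3"}},
        ],
    }
}


def add_attribute(document):
    """Return a copy where every site that has 'externalSite' gains newAttribute."""
    updated = copy.deepcopy(document)
    updated["data"]["sites"] = [
        # Mirrors CASE WHEN site ? 'externalSite' THEN site || {...} ELSE site END
        {**site, "newAttribute": True} if "externalSite" in site else site
        for site in updated["data"]["sites"]
    ]
    return updated


result = add_attribute(doc)
print(result["data"]["sites"][1])
```

Like the SQL, the sketch only checks that the externalSite key is present; it does not check that its value is true.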

Processing Array of Arrays in Azure Stream Analytics

I have this JSON Structure fed to my ASA:
[{
"Stages": [
{
"Name": "Stage 1",
"Count": 45,
"First": "2018-12-17T11:31:12.7448439-04:00",
"Average": 1.0,
"Max": 0.0
},
{
"Name": "Stage 2",
"Count": 7,
"First": "2018-12-17T11:31:12.7448469-04:00",
"Average": 0.0,
"Max": 0.0
}
],
"DateTimeET": "2018-12-17T11:31:12.7448477-04:00",
"Division": "One"
}]
I'm stuck on how to get the Name, Count, First, Average and Max for each element within the Stages Array.
I did this:
WITH CTE AS (
SELECT
event.Division
,event.DateTimeET
,StageElement
FROM
StageSummary AS event
CROSS APPLY getarrayelements(event.Stages) AS StageElement
)
SELECT
event2.Division
,event2.DateTimeET
,event2.StageElement
FROM
CTE AS event2
and I can get the array using GetRecordProperties, but I get the full array again; I can't get something specific such as 'Name' or 'Count'.
Any help is appreciated.
Update:
I'm using the query as follows:
WITH CTE AS (
SELECT
event.Division
,event.DateTimeET
,StageElement
FROM
StageSummary AS event
CROSS APPLY getarrayelements(event.Stages) AS StageElement
)
SELECT
event2.Division
,event2.DateTimeET
,getrecordpropertyvalue(Elements,'Name') AS NameValue
FROM
CTE AS event2
CROSS APPLY getrecordproperties(event2.StageElement) AS Elements
but NameValue returns empty.
Since the structure of my Stages array is fixed, the solution is to use the ArrayValue to query each element like this:
SELECT
event.Division
,event.DateTimeET
,StageElement.ArrayValue.Name
,StageElement.ArrayValue.Count
,StageElement.ArrayValue.First
,StageElement.ArrayValue.Average
,StageElement.ArrayValue.Max
FROM
StageSummary AS event
CROSS APPLY getarrayelements(event.Stages) AS StageElement
That gives me the values I needed. I was missing the ArrayValue needed to reference the actual data, since GetArrayElements returns both ArrayValue and ArrayIndex.
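The shape of what CROSS APPLY with GetArrayElements produces can be mimicked in a few lines of Python. This is only an illustration of the ArrayIndex/ArrayValue pairing described above; get_array_elements is a stand-in written for this sketch, and the zero-based index is an assumption of the sketch rather than something taken from this thread.

```python
# One input event, trimmed to the fields used below.
event = {
    "Stages": [
        {"Name": "Stage 1", "Count": 45, "Average": 1.0, "Max": 0.0},
        {"Name": "Stage 2", "Count": 7, "Average": 0.0, "Max": 0.0},
    ],
    "DateTimeET": "2018-12-17T11:31:12.7448477-04:00",
    "Division": "One",
}


def get_array_elements(array):
    """Stand-in: wrap each element in a record with its index and its value."""
    return [{"ArrayIndex": i, "ArrayValue": v} for i, v in enumerate(array)]


# CROSS APPLY pairs the outer event with every element record.
rows = [
    {
        "Division": event["Division"],
        "Name": element["ArrayValue"]["Name"],    # StageElement.ArrayValue.Name
        "Count": element["ArrayValue"]["Count"],  # StageElement.ArrayValue.Count
    }
    for element in get_array_elements(event["Stages"])
]
print(rows)
```

Selecting the element record itself returns the whole wrapper, which is why the stage fields only become reachable through ArrayValue.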

Select Objects from Array of Objects that match a property in MYSQL JSON

I have a table with 1 JSON type column city in a MySQL database that stores a JSON array of city objects with following structure:
{
"cities": [
{
"id": 1,
"name": "Mumbai",
"countryID": "9"
},
{
"id": 2,
"name": "New Delhi",
"countryID": "9"
},
{
"id": 3,
"name": "Abu Dhabi",
"countryID": "18"
}
]
}
I want to select objects from the cities array having countryID = 90, but I am stuck because the array of objects is stored in a single column, city, which is preventing me from doing a (*) with WHERE JSON_CONTAINS(city->'$.cities', JSON_OBJECT('countryID', '90')).
My query looks like this, and I am not getting anywhere:
SELECT JSON_EXTRACT(city, '$.cities') FROM MyTable WHERE JSON_CONTAINS(city->'$.cities', JSON_OBJECT('countryID', '90'))
It would be a great help if someone could point me in the right direction or give me a solution to this.
Thanks
If you are using MySQL 8.0, there is a feature called JSON table functions. It converts JSON data into tabular form, which you can then filter like any other table.
The query to achieve this is given below:
Select country
FROM json_cal,
JSON_TABLE(
city,
"$.cities[*]" COLUMNS(
country JSON PATH "$",
NESTED PATH '$.countryID' COLUMNS (countryID TEXT PATH '$')
)
) AS jt1
where countryID = 90;
The DB Fiddle can be found here
More information on JSON Table functions can be found here
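The effect of flattening the array into rows and then filtering can be sketched in Python as follows. This is an illustration only, not MySQL code. Because the sample data stores countryID as the strings "9" and "18", the sketch filters on "9"; that value is an assumption made so the example returns something, since no city in the sample has countryID 90.

```python
import json

city = json.loads("""
{
  "cities": [
    {"id": 1, "name": "Mumbai", "countryID": "9"},
    {"id": 2, "name": "New Delhi", "countryID": "9"},
    {"id": 3, "name": "Abu Dhabi", "countryID": "18"}
  ]
}
""")

# One row per element of $.cities[*]: the whole object plus its countryID,
# similar in spirit to the two columns declared in the JSON_TABLE call.
rows = [{"country": c, "countryID": c["countryID"]} for c in city["cities"]]

# Filter on the extracted column, like the WHERE clause.
matches = [row["country"] for row in rows if row["countryID"] == "9"]
print(matches)
```

Each match is the complete city object, so the caller gets the id and name along with the countryID it filtered on.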

Testing to see if a value exists in a nested JSON array

I have a SQL 2016 table that contains a column holding JSON data. A sample JSON document looks as follows:
{
"_id": "5a450f0383cac0d725cd6735",
"firstname": "Nanette",
"lastname": "Mccormick",
"registered": "2016-07-10T01:50:10 +04:00",
"friends": [
{
"id": 0,
"name": "Cote Collins",
"interests": [
"Movies",
"Movies",
"Cars"
]
},
{
"id": 1,
"name": "Ratliff Ellison",
"interests": [
"Birding",
"Birding",
"Chess"
]
},
{
"id": 2,
"name": "William Ratliff",
"interests": [
"Music",
"Chess",
"Software"
]
}
],
"greeting": "Hello, Nanette! You have 4 unread messages.",
"favoriteFruit": "apple"
}
I want to pull all documents in which the interests array of each friends object contains a certain value. I attempted this but got no results:
Select *
From <MyTable>
Where 'Chess' IN (Select value From OPENJSON(JsonValue, '$.friends.interests'))
I should have gotten several rows returned. I must not be referencing the interests array correctly or not understanding how SQL Server deals with a JSON array of this type.
Since interests is a nested array, you need to parse your way through the array levels. To do this, you can use CROSS APPLY with OPENJSON(). The first CROSS APPLY gets you the friend names and the JSON array of interests, and the second CROSS APPLY pulls the interests out of the array and correlates them with the appropriate friend names. Here's an example query:
Select [name]
From #MyTable
CROSS APPLY OPENJSON(JsonValue, '$.friends')
WITH ([name] NVARCHAR(100) '$.name',
interests NVARCHAR(MAX) AS JSON)
CROSS APPLY OPENJSON(interests)
WITH (Interest NVARCHAR(100) '$')
WHERE Interest = 'Chess'
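The two CROSS APPLY steps behave like two nested loops: the outer one over friends, the inner one over each friend's interests. A small Python sketch of that shape, using a trimmed copy of the sample document (the variable names are made up for this illustration):

```python
# Trimmed copy of the sample document: only the fields the query touches.
doc = {
    "firstname": "Nanette",
    "friends": [
        {"id": 0, "name": "Cote Collins", "interests": ["Movies", "Movies", "Cars"]},
        {"id": 1, "name": "Ratliff Ellison", "interests": ["Birding", "Birding", "Chess"]},
        {"id": 2, "name": "William Ratliff", "interests": ["Music", "Chess", "Software"]},
    ],
}

# Outer loop ~ first CROSS APPLY (one row per friend),
# inner loop ~ second CROSS APPLY (one row per interest of that friend).
matches = [
    friend["name"]
    for friend in doc["friends"]
    for interest in friend["interests"]
    if interest == "Chess"
]
print(matches)
```

As with the SQL, a friend who listed the same interest twice would appear twice in the result, so deduplicate if you only need to know whether a match exists.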