Get bottom-up nested json for query in postgresql - json

Given the following table, I want to find a category by ID and get a JSON object that nests each ancestor row under a parent key. If I look up category ID 999, I would like the following JSON structure.
How can I achieve this?
{
  "id": 999,
  "name": "Sprinting",
  "slug": "sprinting",
  "description": "sprinting is fast running",
  "parent": {
    "id": 2,
    "name": "Running",
    "slug": "running",
    "description": "All plans related to running.",
    "parent": {
      "id": 1,
      "name": "Sport",
      "slug": "sport",
      "description": null
    }
  }
}
CREATE TABLE public.categories (
id integer NOT NULL,
name text NOT NULL,
description text,
slug text NOT NULL,
parent_id integer
);
INSERT INTO public.categories (id, name, description, slug, parent_id) VALUES (1, 'Sport', NULL, 'sport', NULL);
INSERT INTO public.categories (id, name, description, slug, parent_id) VALUES (2, 'Running', 'All plans related to running.', 'running', 1);
INSERT INTO public.categories (id, name, description, slug, parent_id) VALUES (999, 'Sprinting', 'sprinting is fast running', 'sprinting', 2);

demo:db<>fiddle
(Explanation below)
WITH RECURSIVE hierarchy AS (
SELECT id, parent_id
FROM categories
WHERE id = 999
UNION
SELECT
c.id, c.parent_id
FROM categories c
JOIN hierarchy h ON h.parent_id = c.id
),
jsonbuilder AS (
SELECT
c.id,
h.parent_id,
jsonb_build_object('id', c.id, 'name', c.name, 'description', c.description, 'slug', c.slug) as jsondata
FROM hierarchy h
JOIN categories c ON c.id = h.id
WHERE h.parent_id IS NULL
UNION
SELECT
c.id,
h.parent_id,
jsonb_build_object('id', c.id, 'name', c.name, 'description', c.description, 'slug', c.slug, 'parent', j.jsondata)
FROM hierarchy h
JOIN categories c ON c.id = h.id
JOIN jsonbuilder j ON j.id = h.parent_id
)
SELECT
jsondata
FROM jsonbuilder
WHERE id = 999
Generally you need a recursive query to create nested JSON objects. The naive approach is:
Get the record with id = 999 and create a JSON object from it.
Get the record whose id equals that record's parent_id (here id = 2), build a JSON object from it, and add it as the parent attribute of the previous object.
Repeat step 2 until parent_id is NULL.
Unfortunately I saw no simple way to add a nested parent like this: each step has to nest the new JSON one level deeper. I am sure there is a way to do it by storing the path of ancestors and calling jsonb_set() every time; that could work.
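For illustration, that top-down jsonb_set() idea could be sketched like this (an untested sketch against the categories table from the question; it carries the growing path of parent keys along in an array column):

```sql
-- Top-down variant (sketch): start at id = 999 and insert each ancestor
-- one level deeper under an ever-growing 'parent' path.
WITH RECURSIVE builder AS (
    SELECT
        c.parent_id,
        ARRAY['parent'] AS path,   -- where the next ancestor will be inserted
        jsonb_build_object('id', c.id, 'name', c.name, 'slug', c.slug, 'description', c.description) AS jsondata
    FROM categories c
    WHERE c.id = 999

    UNION ALL

    SELECT
        c.parent_id,
        b.path || 'parent',        -- extend the path for the next level
        jsonb_set(
            b.jsondata,
            b.path,
            jsonb_build_object('id', c.id, 'name', c.name, 'slug', c.slug, 'description', c.description)
        )
    FROM builder b
    JOIN categories c ON c.id = b.parent_id
)
SELECT jsondata
FROM builder
WHERE parent_id IS NULL;   -- the row in which all ancestors have been merged
```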
On the other hand, it's much simpler to put the JSON object created so far into a new one. In other words, the approach is to build the JSON from the deepest level upwards. To do this, you need the ancestor path as well. But instead of creating and storing it while building the JSON objects, you can compute it first with a separate recursive query:
WITH RECURSIVE hierarchy AS (
SELECT id, parent_id
FROM categories
WHERE id = 999
UNION
SELECT
c.id, c.parent_id
FROM categories c
JOIN hierarchy h ON h.parent_id = c.id
)
SELECT * FROM hierarchy
This fetches the record with id = 999 together with its parent_id; afterwards the parent's record with its id and parent_id, and so on until parent_id is NULL.
This yields:
id | parent_id
--: | --------:
999 | 2
2 | 1
1 | null
Now we have a simple mapping list that shows the traversal path. What is the difference from the original data? If the record with id = 1 had two or more children, we would not know which child to follow to finally reach id 999. This result, however, contains exactly the ancestor relations and no siblings.
Having this, we can traverse the tree from the topmost element, which ends up embedded at the deepest level:
Fetch the record which has no parent. Create a JSON object from its data.
Fetch the child of the previous record. Create a JSON object from its data and embed the previous JSON data as parent.
Continue until there is no child.
How does it work?
This query uses recursive CTEs. The first part is the initial query: the first record, so to speak. The second part, after UNION, is the recursive part, which references the WITH clause itself; that reference always yields the result of the previous iteration.
The JSON part simply creates a JSON object using jsonb_build_object(), which takes an arbitrary number of key/value arguments. So we can use the current record's data and, for the parent attribute, the JSON data already built in the previous iteration.
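To illustrate just those mechanics, here is a minimal recursive CTE that has nothing to do with the categories table:

```sql
WITH RECURSIVE counter AS (
    SELECT 1 AS n          -- initial part: the first record
    UNION
    SELECT n + 1           -- recursive part: references the CTE itself,
    FROM counter           -- i.e. the result of the previous iteration
    WHERE n < 5            -- recursion stops when no new rows are produced
)
SELECT n FROM counter;     -- returns 1, 2, 3, 4, 5
```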

Related

Update postgres values with corresponding keys

I want to lower-case the values for specific keys:
Table:
logs
id bigint , jsondata text
[
{
"loginfo": "somelog1",
"id": "App1",
"identifier":"IDENTIF12"
},
{
"loginfo": "somelog2",
"id": "APP2",
"identifier":"IDENTIF1"
}
]
I need to lower-case only id and identifier.
I need to achieve something like the below:
UPDATE ... SET json_agg(elems.id) = lowered_val...
SELECT
id,
lower(json_agg(elems.id)) as lowered_val
FROM logs,
json_array_elements(jsondata::json) as elems
GROUP BY id;
demo:db<>fiddle
This is not that simple. You need to expand and extract the complete JSON object and have to do this manually:
SELECT
id,
json_agg(new_object) -- 5
FROM (
SELECT
id,
json_object_agg( -- 4
attr.key,
CASE -- 3
WHEN attr.key IN ('id', 'identifier') THEN LOWER(attr.value)
ELSE attr.value
END
) as new_object
FROM logs,
json_array_elements(jsondata::json) WITH ORDINALITY as elems(value, index), -- 1
json_each_text(elems.value) as attr -- 2
GROUP BY id, elems.index -- 4
) s
GROUP BY id
Extract the arrays. WITH ORDINALITY adds an index to the array elements, to be able to group the elements back into their original arrays afterwards.
Extract the elements of each array element into a record of its own. This creates the two columns key and value.
If the key is one of the keys to be modified, lower-case the related value; leave all others unchanged.
Rebuild the JSON objects.
Re-aggregate them into a new JSON array.
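Step 1 can be tried in isolation. In PostgreSQL, WITH ORDINALITY simply appends a running index column to the rows returned by a set-returning function:

```sql
SELECT elems.index, elems.value
FROM json_array_elements('[{"id":"App1"}, {"id":"APP2"}]'::json)
     WITH ORDINALITY AS elems(value, index);

-- index | value
-- ------+----------------
--     1 | {"id":"App1"}
--     2 | {"id":"APP2"}
```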

Count other documents in the same table matching the condition

I've got this query:
SELECT
id,
(
SELECT COUNT(*)
FROM default t USE KEYS default.id
WHERE t.p_id=default.id
) as children_count
FROM default
WHERE 1
I expect this:
[
{
"children_count": 5,
"id": "1"
},
...
]
But i got this:
[
{
"children_count": [
{
"$1": 0
}
],
"id": "1"
},
...
]
What am I doing wrong? I've googled this, but I can't find any clear explanation of COUNT subqueries in N1QL, so any links to documentation would be highly appreciated.
UPD:
I've updated my code according to @prettyvoid's answer. I've also created a minimal example bucket to demonstrate the problem.
SELECT
id,
(
SELECT COUNT(*) as count
FROM test t USE KEYS "p.id"
WHERE t.p_id=p.id
)[0].count as children_count
FROM test p
WHERE 1
The result is following:
[
{
"children_count": 0,
"id": 1
},
{
"children_count": 0,
"id": 2
},
{
"children_count": 0,
"id": 3
}
]
Any SELECT statement will yield an array with objects inside; that's normal. If you want your expected result, index into the array with [0] and access the count field.
Edit: The following query will do what you want; I'm unsure whether there is a better way, though.
SELECT
id,
(
SELECT COUNT(*) as count
FROM default t USE KEYS (SELECT RAW meta().id from default)
WHERE t.p_id=p.id
)[0].count as children_count
FROM default p
It's important to note that in Couchbase, the very fastest way to retrieve a document is via its document key. Using a GSI index is workable, but slower. And as with most other databases, it is best to avoid a full scan. You say you can make id the same as the document key, so I will assume that is the case, so that I can use p_id in the on keys clause.
Is it OK to only list documents with a non-zero number of children? In that case, you can write this as an aggregation query in which you join each child to its parent and group by the parent id (note that my bucket is called default):
select p.id, count(*) as children_count
from default c join default p on keys c.p_id
group by p.id;
If you need to include documents with zero children, you need to UNION with a query that finds those documents as well. In this case we know that:
select raw array_agg(distinct(p_id)) from default where p_id is not null
will give us an array of parent IDs, so we can get the ids not in the list with:
select id, 0 as children_count
from default p
where not array_contains(
(select raw array_agg(distinct(p_id)) from default where p_id is not null)[0],id);
So, if we UNION the two:
(select p.id, count(*) as children_count
from default c join default p on keys c.p_id
group by p.id)
UNION
(select id, 0 as children_count
from default p
where not array_contains(
(select raw array_agg(distinct(p_id)) from default where p_id is not null)[0],id));
We get a list of all the ids and their children_count, zero or not. If you want more than just the id, add more fields or '*' to the select list for each query.

Multiple properties on root level when nesting FOR JSON PATH?

Running SQL Server 2016. Consider the sample below. Nesting FOR JSON PATH is easy as long as you give each query an alias. In my case, I want many (but not all) properties to belong to the root - i.e. no alias!
With unwanted alias a:
DECLARE @SomeID int = 1
SELECT
(SELECT TOP 1 ID, A1, A2 FROM A WHERE ID = @SomeID
FOR JSON PATH) AS 'a', -- Unwanted!
(SELECT TOP 1 ID, B1, B2 FROM B WHERE ID = @SomeID
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER) AS 'b'
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
If you remove the alias, you get this error when running the query:
Column expressions and data sources without names or aliases cannot be
formatted as JSON text using FOR JSON clause. Add alias to the unnamed
column or table.
No alias. Repetitive queries:
SELECT
-- Wanted! But tedious for more complex queries...
(SELECT TOP 1 ID FROM A WHERE ID = @SomeID) AS 'id',
(SELECT TOP 1 A1 FROM A WHERE ID = @SomeID) AS 'a1',
(SELECT TOP 1 A2 FROM A WHERE ID = @SomeID) AS 'a2',
(SELECT TOP 1 ID, B1, B2 FROM B WHERE ID = @SomeID
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER) AS 'b'
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
The latter produces the right JSON. However, in my complex database I cannot repeat the statements like that. Hence, I need a better construct to put many properties on the root - without an alias. How can this be achieved?
(For completeness. Script to create sample tables below.)
CREATE TABLE A(ID int, A1 int, A2 int)
GO
INSERT INTO A(ID, A1, A2)
SELECT 1, 0, 0
UNION
SELECT 1, 1, 1
CREATE TABLE B(ID int, B1 int, B2 int)
GO
INSERT INTO B(ID, B1, B2)
SELECT 1, 100, 100
UNION
SELECT 1, 101, 101
This should produce the JSON you are after, without repeating the queries.
select
top 1
id,
a1,
a2,
(SELECT TOP 1 ID, B1, B2 FROM B WHERE ID = @SomeID FOR JSON PATH, WITHOUT_ARRAY_WRAPPER) AS 'b'
from a
where id = @SomeID
for json path, without_array_wrapper

PostgreSQL json aggregation issue

How can I use a Postgres aggregation function to merge rows of one table as elements of an array field in a parent object?
What I need (the original post showed the Sector and Project tables and the desired result as screenshots):
My SQL query:
select row_to_json(t)
from (
select id, data,
(
select array_to_json(array_agg(row_to_json(p)))
from (
select id, data
from public."Project"
where (s.data ->> 'projectId') :: UUID = id
) p
) as projects
from public."Sector" s
) t;
It doesn't work because projects is null. What I need is to unwind the data field and join the projectId inside data against the Project table, like $unwind and $lookup in MongoDB.

query for a set in a relational database

I would like to query a relational database if a set of items exists.
The data I am modeling are of the following form:
key1 = [ item1, item3, item5 ]
key2 = [ item2, item7 ]
key3 = [ item2, item3, item4, item5 ]
...
I am storing them in a table with the following schema
CREATE TABLE sets (key INTEGER, item INTEGER);
So for example, the following insert statements would insert the above three sets.
INSERT INTO sets VALUES ( key1, item1 );
INSERT INTO sets VALUES ( key1, item3 );
INSERT INTO sets VALUES ( key1, item5 );
INSERT INTO sets VALUES ( key2, item2 );
INSERT INTO sets VALUES ( key2, item7 );
INSERT INTO sets VALUES ( key3, item2 );
INSERT INTO sets VALUES ( key3, item3 );
INSERT INTO sets VALUES ( key3, item4 );
INSERT INTO sets VALUES ( key3, item5 );
Given a set of items, I would like the key associated with the set if it is stored in the table and NULL if it is not. Is it possible to do this with an sql query? If so, please provide details.
Details that may be relevant:
I am primarily interested in the database design / query strategy, though I will eventually implement this in MySQL and perform the query from within Python using the mysql-python package.
I have the freedom to restructure the database schema if a different layout would be more convenient for this type of query.
Each set, if it exists, is supposed to be unique.
I am not interested in partial matches.
The database scale is on the order of < 1000 sets each of which contains < 10 items each, so performance at this point is not a priority.
Thanks in advance.
I won't comment on whether there is a better suited schema for doing this (it's quite possible), but for a schema having columns name and item, the following query should work. (mysql syntax)
SELECT k.name
FROM (SELECT DISTINCT name FROM sets) AS k
INNER JOIN sets i1 ON (k.name = i1.name AND i1.item = 1)
INNER JOIN sets i2 ON (k.name = i2.name AND i2.item = 3)
INNER JOIN sets i3 ON (k.name = i3.name AND i3.item = 5)
LEFT JOIN sets ix ON (k.name = ix.name AND ix.item NOT IN (1, 3, 5))
WHERE ix.name IS NULL;
The idea is that we have all the set keys in k, which we then join with the set item data in sets once for each set item in the set we are searching for, three in this case. Each of the three inner joins with table aliases i1, i2 and i3 filter out all set names that don't contain the item searched for with that join. Finally, we have a left join with sets with table alias ix, which brings in all the extra items in the set, that is, every item we were not searching for. ix.name is NULL in the case that no extra items are found, which is exactly what we want, thus the WHERE clause. The query returns a row containing the set key if the set is found, no rows otherwise.
Edit: The idea behind collapsar's answer seems to be much better than mine, so here's a somewhat shorter version of it with an explanation.
SELECT sets.name
FROM sets
LEFT JOIN (
SELECT DISTINCT name
FROM sets
WHERE item NOT IN (1, 3, 5)
) s1
ON (sets.name = s1.name)
WHERE s1.name IS NULL
GROUP BY sets.name
HAVING COUNT(sets.item) = 3;
The idea here is that subquery s1 selects the keys of all sets that contain items other that the ones we are looking for. Thus, when we left join sets with s1, s1.name is NULL when the set only contains items we are searching for. We then group by set key and filter out any sets having the wrong number of items. We are then left with only sets which contain only items we are searching for and are of the correct length. Since sets can only contain an item once, there can only be one set satisfying that criteria, and that's the one we're looking for.
Edit: It just dawned on me how to do this without the exclusion.
SELECT totals.name
FROM (
SELECT name, COUNT(*) count
FROM sets
GROUP BY name
) totals
INNER JOIN (
SELECT name, COUNT(*) count
FROM sets
WHERE item IN (1, 3, 5)
GROUP BY name
) matches
ON (totals.name = matches.name)
WHERE totals.count = 3 AND matches.count = 3;
The first subquery finds the total count of items in each set and the second one finds out the count of matching items in each set. When matches.count is 3, the set has all the items we're looking for, and if totals.count is also 3, the set doesn't have any extra items.
Aleksi's solution requires a specific query for every possible item set. The following suggestion provides a generic solution in the sense that the item set to be queried can be factored in as the result set of another query; just replace the set containment operators with a suitable subquery.
SELECT CASE COUNT(ddd.key) WHEN 0 THEN NULL ELSE MIN(ddd.key) END
FROM (
SELECT s4.key
, COUNT(*) icount
FROM sets s4
JOIN (
SELECT DISTINCT d.key
FROM (
SELECT s1.key
FROM sets s1
WHERE s1.item IN ('item1', 'item3', 'item5')
MINUS
SELECT s2.key
FROM sets s2
WHERE s2.item NOT IN ('item1', 'item3', 'item5')
) d
) dd ON ( dd.key = s4.key )
GROUP BY s4.key
) ddd
WHERE ddd.icount = (
SELECT COUNT(*)
FROM (
SELECT DISTINCT s3.item
FROM sets s3
WHERE s3.item IN ('item1', 'item3', 'item5')
)
)
;
The result set dd delivers a candidate set of keys that do not associate with any items other than those from the set being tested. The only ambiguity may arise from keys that reference a proper subset of the tested item set. Thus we count the number of items associated with the keys of dd and choose the key where this number matches the cardinality of the tested item set. If such a key exists, it is unique (as we know the item sets are unique).
The CASE expression in the outermost SELECT is just a fancy way to guarantee that there will be no empty result set, i.e. a NULL value is returned if the item set is not represented in the relation.
maybe this solution will be useful to you,
best regards
carsten
This query has a well known name. Google "relational division", "set containment join", "set equality join".
To simplify collapsar's solution, which was already simplified by Aleksi Torhamo:
It isn't necessary to get all keys that DO NOT MATCH, which could be large, just get the ones that do match and call them partial matches.
-- get all partial matches, with how many of the wanted items each key matched
CREATE TEMPORARY VIEW partial_matches AS
SELECT key, COUNT(*) AS matched
FROM sets
WHERE item IN (1, 3, 5)
GROUP BY key;
-- filter for full matches: all 3 wanted items present, and no extra items
SELECT sets.key
FROM sets
JOIN partial_matches ON sets.key = partial_matches.key
WHERE partial_matches.matched = 3
GROUP BY sets.key
HAVING COUNT(sets.key) = 3;
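If creating a temporary view is not convenient, both counts can be computed in a single aggregation pass (same example: searched set {1, 3, 5} of size 3; the conditional SUM is portable SQL):

```sql
SELECT key
FROM sets
GROUP BY key
HAVING COUNT(*) = 3                                             -- no extra items
   AND SUM(CASE WHEN item IN (1, 3, 5) THEN 1 ELSE 0 END) = 3; -- all wanted items present
```

Because items are unique within a set, the two conditions together can only hold for the set that is exactly {1, 3, 5}.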