Unpacking JSON Into Flat Format - json

I have JSON that resembles the following:
{
"ANNOTATIONS": [
{
"Label": "CommingledProduct",
"Text": "NBP"
},
{
"Label": "CommingledVenue",
"Text": "OTC"
}
]
}
I need to unpack this into a flat table with columns matching the annotation labels. So columns based on the above json become:
comingled_product
comingled_venue
The JSON is coming from a json field in a source table and being unpacked into another table.
I know that I could code as follows:
INSERT INTO my_target_table (comingled_product, comingled_venue)
SELECT
payload->'ANNOTATIONS'->0->>'Text',
payload->'ANNOTATIONS'->1->>'Text'
FROM my_source_table;
However, I would rather not use the ordinals of the annotations. I would prefer to use some syntax mirroring the psuedo-code below:
INSERT INTO my_target_table (comingled_product, comingled_venue)
SELECT
payload->'ANNOTATIONS'->'label="ComingledProduct"'->>'Text',
payload->'ANNOTATIONS'->'label="ComingledVenueID"'->>'Text'
FROM my_source_table;
Can anyone tell me if what I'm trying to ahcieve is possible and how to do it? There are more than the two annotations I have included in the sample, so anything that involves multiple joins is probably a no go.
Using PostGres 10.7

demo:db<>fiddle
WITH cte AS (
SELECT
elems.value
FROM
my_source_table,
json_array_elements(payload -> 'ANNOTATIONS') elems
)
SELECT
(SELECT value ->> 'Text' FROM cte WHERE value ->> 'Label' = 'CommingledProduct'),
(SELECT value ->> 'Text' FROM cte WHERE value ->> 'Label' = 'CommingledVenue')
Expanding the array into one row per array element and store this result for further usage into a CTE
This result can be used to query the expected values (without doing the expanding twice)
Could be a little bit faster:
demo:db<>fiddle
SELECT
payload,
MIN(the_text) FILTER (WHERE label = 'CommingledProduct'),
MIN(the_text) FILTER (WHERE label = 'CommingledVenue')
FROM (
SELECT
payload::text AS payload,
elems ->> 'Label' AS label,
elems ->> 'Text' AS the_text
FROM
my_source_table,
json_array_elements(payload -> 'ANNOTATIONS') elems
) s
GROUP BY payload

The answer from #S-Man is great and you should use that for your postgres 10.7. json_path will be added in postgres 12, which will allow you to do something a little bit closer to your pseudo-code, but only with jsonb (not json):
INSERT INTO my_target_table (comingled_product, comingled_venue)
SELECT jsonb_path_query(payload,
'$.ANNOTATIONS[*] ? (#.Label == "CommingledProduct")')->>'Text',
jsonb_path_query(payload,
'$.ANNOTATIONS[*] ? (#.Label == "CommingledVenue")')->>'Text'
FROM my_source_table;
The jsonb_path_query syntax takes a bit to figure out, but it is basically returning elements of the ANNOTATIONS array for which the Label equals either CommingledProduct or CommingledVenue. jsonb_path_query returns a jsonb object, so we can use the ->> operator to grab the value of 'Text' from the object.

Related

Using json_extract in sqlite to pull data from parent and child objects

I'm starting to explore the JSON1 library for sqlite and have been so far successful in the basic queries I've created. I'm now looking to create a more complicated query that pulls data from multiple levels.
Here's the example JSON object I'm starting with (and most of the data is very similar).
{
"height": 140.0,
"id": "cp",
"label": {
"bind": "cp_label"
},
"type": "color_picker",
"user_data": {
"my_property": 2
},
"uuid": "948cb959-74df-4af8-9e9c-c3cb53ac9915",
"value": {
"bind": "cp_color"
},
"width": 200.0
}
This json object is buried about seven levels deep in a json structure and I pulled it from the larger json construct using an sql statement like this:
SELECT value FROM forms, json_tree(forms.formJSON, '$.root')
WHERE type = 'object'
AND json_extract(value, '$.id') = #sControlID
// In this example, #sControlID is a variable that represents the `id` value we're looking for, which is 'cp'
But what I really need to pull from this object are the following:
the value from key type ("color_picker" in this example)
the values from keys bind ("cp_color" and "cp_label" in this example)
the keys value and label (which have values of {"bind":"<string>"} in this example)
For that last item, the key name (value and label in this case) can be any number of keywords, but no matter the keyword, the value will be an object of the form {"bind":"<some_string>"}. Also, there could be multiple keys that have a bind object associated with them, and I'd need to return all of them.
For the first two items, the keywords will always be type and bind.
With the json example above, I'd ideally like to retrieve two rows:
type key value
color_picker value cp_color
color_picker label cp_label
When I use json_extract methods, I end up retrieving the object {"bind":"cp_color"} from the json_tree table, but I also need to retrieve the data from the parent object. I feel like I need to do some kind of union, but my attempts have so far been unsuccessful. Any ideas here?
Note: if the {"bind":"<string>"} object doesn't exist as a child of the parent object, I don't want any rows returned.
Well, I was on the right track and eventually figured out it. I created a separate query for each of the items I was looking for, then INNER JOINed all the json_tree tables from each of the queries to have all the required fields available. Then I json_extracted the required data from each of the json fields I needed data from. In the end, it gave me exactly what I was looking for, though I'm sure it could be written more efficiently.
For anyone interested, this is what hte final query ended up looking like:
SELECT IFNULL(json_extract(parent.value, '$.type'), '_window_'), child.key, json_extract(child.value, '$.bind') FROM (SELECT json_tree.* FROM nui_forms, json_tree(nui_forms.formJSON, '$') WHERE type = 'object' AND json_extract(nui_forms.formJSON, '$.id') = #sWindowID) parent INNER JOIN (SELECT json_tree.* FROM nui_forms, json_tree(nui_forms.formJSON, '$') WHERE type = 'object' AND json_extract(value, '$.bind') != 'NULL' AND json_extract(nui_forms.formJSON, '$.id') = #sWindowID) child ON child.parent = parent.id;
If you have any tips on reducing its complexity, feel free to comment!

Subquery for element of JSON column

I have a big JSON data in one column called response_return in a Postgres DB, with a response like:
{
"customer_payment":{
"OrderId":"123456789",
"Customer":{
"Full_name":"Francis"
},
"Payment":{
"AuthorizationCode":"9874565",
"Recurrent":false,
"Authenticate":false,
...
}
}
}
I tried to use Postgres functions like -> ,->> ,#> or #> to walk through headers to achieve AuthorizationCode for a query.
When I use -> in customer_payment in a SELECT, returns all after them. If I try with OrderId, it's returned NULL.
The alternatives and sources:
Using The JSON Datatype In PostgreSQL
Operator ->
Allows you to select an element based on its name.
Allows you to select an element within an array based on its index.
Can be used sequentially: ::json->'elementL'->'subelementM'->…->'subsubsubelementN'.
Return type is json and the result cannot be used with functions and operators that require a string-based datatype. But the result can be used with operators and functions that require a json datatype.
Query for element of array in JSON column
This is not helpful because I don't want filter and do not believe that need to transform to array.
If you just want to get a single attribute, you can use:
select response_return -> 'customer_payment' -> 'Payment' ->> 'AuthorizationCode'
from the_table;
You need to use -> for the intermediate access to the keys (to keep the JSON type) and ->> for the last key to return the value as a string.
Alternatively you can provide the path to the element as an array and use #>>
select response_return #>> array['customer_payment', 'Payment', 'AuthorizationCode']
from the_table;
Online example

T-SQL - search in filtered JSON array

SQL Server 2017.
Table OrderData has column DataProperties where JSON is stored. JSON example stored there:
{
"Input": {
"OrderId": "abc",
"Data": [
{
"Key": "Files",
"Value": [
"test.txt",
"whatever.jpg"
]
},
{
"Key": "Other",
"Value": [
"a"
]
}
]
}
}
So, it's an object with Input object, which has Data array that's KVP - full of objects with Key string and Value array of strings.
And my problem - I need to query for rows based on values in Files in example JSON - simple LIKE that matches %text%.
This query works:
SELECT TOP 10 *
FROM OrderData CROSS APPLY OPENJSON(DataProperties,'$.Input.Data') dat
WHERE JSON_VALUE(dat.value, '$.Key') = 'Files' and dat.[key] = 0
AND JSON_QUERY(dat.value, '$.Value') LIKE '%2%'
Problem is that this query is very slow, unsurprisingly.
How to make it faster?
I cannot create computed column with JSON_VALUE, because I need to filter in an array.
I cannot create computed column with JSON_QUERY on "$.Input.Data" or "$.Input.Data[0].Values" - because I need specific array item in this array with Key == "Files".
I've searched, but it seems that you cannot create computed column that also filters data, like with this attempt:
ALTER TABLE OrderData
ADD aaaTest AS (select JSON_QUERY(dat.value, '$.Value')
OPENJSON(DataProperties,'$.Input.Data') dat
WHERE JSON_VALUE(dat.value, '$.Key') = 'Files' and dat.[key] = 0 );
Error: Subqueries are not allowed in this context. Only scalar expressions are allowed.
What are my options?
Add Files column with an index and use INSERT/UPDATE triggers that populate this column on inserts/updates?
Create a view that "computes" this column? Can't add index, will still be slow
So far only option 1. has some merit, but I don't like triggers and maybe there's another option?
You might try something along this:
Attention: I've added a 2 to the text2 to fullfill your filter. And I named both to the plural "Values":
DECLARE #mockupTable TABLE(ID INT IDENTITY, DataProperties NVARCHAR(MAX));
INSERT INTO #mockupTable VALUES
(N'{
"Input": {
"OrderId": "abc",
"Data": [
{
"Key": "Files",
"Values": [
"test2.txt",
"whatever.jpg"
]
},
{
"Key": "Other",
"Values": [
"a"
]
}
]
}
}');
The query
SELECT TOP 10 *
FROM #mockupTable t
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
WITH([Key] NVARCHAR(100)
,[Values] NVARCHAR(MAX) AS JSON) dat
WHERE dat.[Key] = 'Files'
AND dat.[Values] LIKE '%2%';
The main difference is the WITH-clause, which is used to return the properties inside an object in a typed way and side-by-side (similar to a naked OPENJSON with a PIVOT for all columns - but much better). This avoids expensive JSON methods in your WHERE...
Hint: As we return the Value with NVARCHAR(MAX) AS JSON we can continue with the nested array and might proceed with something like this:
SELECT TOP 10 *
FROM #mockupTable t
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
WITH([Key] NVARCHAR(100)
,[Values] NVARCHAR(MAX) AS JSON) dat
WHERE dat.[Key] = 'Files'
--we read the array again with `OPENJSON`:
AND 'test2.txt' IN(SELECT [Value] FROM OPENJSON(dat.[Values]));
You might use one more CROSS APPLY to add the array's values and filter this at the WHERE directly.
SELECT TOP 10 *
FROM #mockupTable t
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
WITH([Key] NVARCHAR(100)
,[Values] NVARCHAR(MAX) AS JSON) dat
CROSS APPLY OPENJSON(dat.[Values]) vals
WHERE dat.[Key] = 'Files'
AND vals.[Value]='test2.txt'
Just check it out...
This is an old question, but I would like to revisit it. There isn't any mention of how the source table is actually constructed in terms of indexing. If the original author is still around, can you confirm/deny what indexing strategy you used? For performant json document queries, I've found that having a table using the COLUMSTORE indexing strategy yields very performant JSON queries even with large amounts of data.
https://learn.microsoft.com/en-us/sql/relational-databases/json/store-json-documents-in-sql-tables?view=sql-server-ver15 has an example of different indexing techniques. For my personal solution I've been using COLUMSTORE albeit on a limited NVARCAHR document size. It's fast enough for any purposes I have even under millions of rows of decently sized json documents.

How do I search for a specific string in a JSON Postgres data type column?

I have a column named params in a table named reports which contains JSON.
I need to find which rows contain the text 'authVar' anywhere in the JSON array. I don't know the path or level in which the text could appear.
I want to just search through the JSON with a standard like operator.
Something like:
SELECT * FROM reports
WHERE params LIKE '%authVar%'
I have searched and googled and read the Postgres docs. I don't understand the JSON data type very well, and figure I am missing something easy.
The JSON looks something like this.
[
{
"tileId":18811,
"Params":{
"data":[
{
"name":"Week Ending",
"color":"#27B5E1",
"report":"report1",
"locations":{
"c1":0,
"c2":0,
"r1":"authVar",
"r2":66
}
}
]
}
}
]
In Postgres 11 or earlier it is possible to recursively walk through an unknown json structure, but it would be rather complex and costly. I would propose the brute force method which should work well:
select *
from reports
where params::text like '%authVar%';
-- or
-- where params::text like '%"authVar"%';
-- if you are looking for the exact value
The query is very fast but may return unexpected extra rows in cases when the searched string is a part of one of the keys.
In Postgres 12+ the recursive searching in JSONB is pretty comfortable with the new feature of jsonpath.
Find a string value containing authVar:
select *
from reports
where jsonb_path_exists(params, '$.** ? (#.type() == "string" && # like_regex "authVar")')
The jsonpath:
$.** find any value at any level (recursive processing)
? where
#.type() == "string" value is string
&& and
# like_regex "authVar" value contains 'authVar'
Or find the exact value:
select *
from reports
where jsonb_path_exists(params, '$.** ? (# == "authVar")')
Read in the documentation:
The SQL/JSON Path Language
jsonpath Type

How to get elements from Json array in PostgreSQL

I have searched quite much on this and still unanswerable. I'm using PostgreSQL. Column name is "sections" and column type is json[] in below example.
My column looks like this in database:
sections
[{"name" : "section1",
"attributes": [{"attrkey1": "value1",
"attrkey2": "value2"},
{"attrkey3": "value3",
"attrkey4": "value4"}]
},
{"name" : "section2",
"attributes": [{"attrkey3": "value5",
"attrkey6": "value6"},
{"attrkey1": "value7",
"attrkey8": "value8"}]
}]
It's json array and I want to get "attrkey3" in my result. For getting particular key from Json, I can use json_extract_path_text(json_column, 'json_property') which is working perfectly fine. But I have no idea how to get some property from json[].
If I talk about above example, I want to get value of property "attrkey2" to be shown in my result. I know it's an array so it might work differently than usual, e.g. all the values of my array would act as a different row so I might have to write subquery but no idea how to do it.
Also, I can't write index statically and get property of the json element from some particular index. My query will be generated dynamically so I would never know how many elements are inside json array.
I saw some static examples but don't know how to implement it in my case. Can someone tell me how to do this in query?
I'm not sure you have a json[] (PostgreSQL array of json values) typed column, or a json typed column, which appears to be a JSON array (like in your example).
Either case, you need to expand your array before querying. In case of json[], you need to use unnest(anyarray); in case of JSON arrays in a json typed column, you need to use json_array_elements(json) (and LATERAL joins -- they are implicit in my examples):
select t.id,
each_section ->> 'name' section_name,
each_attribute ->> 'attrkey3' attrkey3
from t
cross join unnest(array_of_json) each_section
cross join json_array_elements(each_section -> 'attributes') each_attribute
where (each_attribute -> 'attrkey3') is not null;
-- use "where each_attribute ? 'attrkey3'" in case of jsonb
select t.id,
each_section ->> 'name' section_name,
each_attribute ->> 'attrkey3' attrkey3
from t
cross join json_array_elements(json_array) each_section
cross join json_array_elements(each_section -> 'attributes') each_attribute
where (each_attribute -> 'attrkey3') is not null;
SQLFiddle
Unfortunately, you cannot use any index with your data. You need to fix your schema first, in order to do that.
If you wish to access a single element then use json_array -> index
For example, if you have json_arr=[1,2,3] then json_array -> 0 will return 1
And also, if there was a key value map data in array:
select each_data -> 'value' as value3
from t cross join jsonb_array_elements(t.sections -> 'attributes') each_attribute
where each_attribute -> 'key' = '"attrkey3"'
I am mentioning this because the great answer also provided a perfect solution for my case. By the way, also be aware of jsonb_array.. method for jsonb type attribute.