How to extract properly when SQLite JSON has a value that is an array

I have a SQLite database, and in one of its fields I have stored a complete JSON object. I need to run some JSON select queries against it. As you can see in my JSON below,
the ALL key has a value which is an array. We need to extract some data, such as all comments where the "pod" field is fb. How do I extract this properly when the SQLite JSON has a value that is an array?
select json_extract(data,'$."json"') from datatable ; gives me the entire thing. Then I do
select json_extract(data,'$."json"[0]') but I don't want to do it manually. I want to iterate.
Kindly suggest some source where I can study and work on it.
MY JSON
{
"ALL": [{
"comments": "your site is awesome",
"pod": "passcode",
"originalDirectory": "case1"
},
{
"comments": "your channel is good",
"data": ["youTube"],
"pod": "library"
},
{
"comments": "you like everything",
"data": ["facebook"],
"pod": "fb"
},
{
"data": ["twitter"],
"pod": "tw",
"ALL": [{
"data": [{
"codeLevel": "3"
}],
"pod": "mo",
"pod2": "p"
}]
}
]
}
create table datatable ( path string , data json1 );
insert into datatable values('1' , json('<abovejson in a single line>'));

Simple List
Where your JSON represents a "simple" list of comments, you want something like:
select key, value
from datatable, json_each( datatable.data, '$.ALL' )
where json_extract( value, '$.pod' ) = 'fb' ;
which, using your sample data, returns:
2|{"comments":"you like everything","data":["facebook"],"pod":"fb"}
The use of json_each() returns a row for every element of the input JSON (datatable.data), starting at the path $.ALL (where $ is the top-level, and ALL is the name of your array: the path can be omitted if the top-level of the JSON object is required). In your case, this returns one row for each comment entry.
The fields of this row are documented at 4.13. The json_each() and json_tree() table-valued functions in the SQLite documentation: the two we're interested in are key (very roughly, the "row number") and value (the JSON for the current element). The latter will contain elements called comments and pod, etc.
Because we are only interested in elements where pod is equal to fb, we add a where clause, using json_extract() to get at pod (where $.pod is relative to value returned by the json_each function).
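If you want to try the query above without the sqlite3 shell, here is a minimal sketch using Python's standard sqlite3 module. It assumes your Python's SQLite build includes the JSON functions (most recent builds do); the DOC constant is just the first three entries of your sample, and the fb_rows helper is a name I made up for illustration.

```python
import json
import sqlite3

# First three entries of the sample JSON from the question.
DOC = {"ALL": [
    {"comments": "your site is awesome", "pod": "passcode", "originalDirectory": "case1"},
    {"comments": "your channel is good", "data": ["youTube"], "pod": "library"},
    {"comments": "you like everything", "data": ["facebook"], "pod": "fb"},
]}


def fb_rows():
    """Run the json_each() query and return the (key, value) rows where pod is fb."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table datatable (path string, data json1)")
    conn.execute("insert into datatable values ('1', json(?))", (json.dumps(DOC),))
    rows = conn.execute(
        "select key, value "
        "from datatable, json_each(datatable.data, '$.ALL') "
        "where json_extract(value, '$.pod') = 'fb'"
    ).fetchall()
    conn.close()
    return rows


for key, value in fb_rows():
    # value is the JSON text of the matching array element.
    print(key, json.loads(value)["comments"])
```

Each returned row carries the array index in key and the element's JSON text in value, so json.loads() turns it back into a dictionary.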
Nested List
If your JSON contains nested elements (something I didn't notice at first), then you need to use the json_tree() function instead of json_each(). Whereas the latter will only iterate over the immediate children of the node specified, json_tree() will descend recursively through all children from the node specified.
To give us some data to work with, I have augmented your test data with an extra element:
create table datatable ( path string , data json1 );
insert into datatable values('1' , json('
{
"ALL": [{
"comments": "your site is awesome",
"pod": "passcode",
"originalDirectory": "case1"
},
{
"comments": "your channel is good",
"data": ["youTube"],
"pod": "library"
},
{
"comments": "you like everything",
"data": ["facebook"],
"pod": "fb"
},
{
"data": ["twitter"],
"pod": "tw",
"ALL": [{
"data": [{
"codeLevel": "3"
}],
"pod": "mo",
"pod2": "p"
},
{
"comments": "inserted by TripeHound",
"data": ["facebook"],
"pod": "fb"
}]
}
]
}
'));
If we were to simply switch to using json_tree(), then we see that a simple query (with no where clause) will return all elements of the source JSON:
select key, value
from datatable, json_tree( datatable.data, '$.ALL' ) limit 10 ;
ALL|[{"comments":"your site is awesome","pod":"passcode","originalDirectory":"case1"},{"comments":"your channel is good","data":["youTube"],"pod":"library"},{"comments":"you like everything","data":["facebook"],"pod":"fb"},{"data":["twitter"],"pod":"tw","ALL":[{"data":[{"codeLevel":"3"}],"pod":"mo","pod2":"p"},{"comments":"inserted by TripeHound","data":["facebook"],"pod":"fb"}]}]
0|{"comments":"your site is awesome","pod":"passcode","originalDirectory":"case1"}
comments|your site is awesome
pod|passcode
originalDirectory|case1
1|{"comments":"your channel is good","data":["youTube"],"pod":"library"}
comments|your channel is good
data|["youTube"]
0|youTube
pod|library
Because JSON objects are mixed in with simple values, we can no longer simply add where json_extract( value, '$.pod' ) = 'fb' because this produces errors when value does not represent an object. The simplest way around this is to look at the type values returned by json_each()/json_tree(): these will be the string object if the row represents a JSON object (see above documentation for other values).
Adding this to the where clause (and relying on "short-circuit evaluation" to prevent json_extract() being called on non-object rows), we get:
select key, value
from datatable, json_tree( datatable.data, '$.ALL' )
where type = 'object'
and json_extract( value, '$.pod' ) = 'fb' ;
which returns:
2|{"comments":"you like everything","data":["facebook"],"pod":"fb"}
1|{"comments":"inserted by TripeHound","data":["facebook"],"pod":"fb"}
If desired, we could use json_extract() to break apart the returned objects:
.mode column
.headers on
.width 30 15 5
select json_extract( value, '$.comments' ) as Comments,
json_extract( value, '$.data' ) as Data,
json_extract( value, '$.pod' ) as POD
from datatable, json_tree( datatable.data, '$.ALL' )
where type = 'object'
and json_extract( value, '$.pod' ) = 'fb' ;
Comments Data POD
------------------------------ --------------- -----
you like everything ["facebook"] fb
inserted by TripeHound ["facebook"] fb
Note: If your structure contained other objects, of different formats, it may not be sufficient to simply select for type = 'object': you may have to devise a more subtle filtering process.
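If you would rather not depend on the order in which the where terms are evaluated, one possible variation is to match the pod rows themselves (key = 'pod' and atom = 'fb') and then fetch the enclosing object through the path column that json_tree() returns. This is only a sketch under assumptions: it uses Python's sqlite3 module with a JSON-enabled SQLite build, a shortened copy of the nested sample, and a helper name (fb_objects) that I made up.

```python
import json
import sqlite3

# Shortened copy of the nested sample data used above.
NESTED = {"ALL": [
    {"comments": "your site is awesome", "pod": "passcode"},
    {"comments": "you like everything", "data": ["facebook"], "pod": "fb"},
    {"data": ["twitter"], "pod": "tw", "ALL": [
        {"data": [{"codeLevel": "3"}], "pod": "mo", "pod2": "p"},
        {"comments": "inserted by TripeHound", "data": ["facebook"], "pod": "fb"},
    ]},
]}


def fb_objects(doc):
    """Return (path, object) pairs for every object whose pod member equals fb."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table datatable (path string, data json1)")
    conn.execute("insert into datatable values ('1', json(?))", (json.dumps(doc),))
    rows = conn.execute(
        # t.path is the path of the container holding the matching pod member,
        # so json_extract() on the whole document returns the enclosing object.
        "select t.path, json_extract(d.data, t.path) "
        "from datatable as d, json_tree(d.data, '$.ALL') as t "
        "where t.key = 'pod' and t.atom = 'fb'"
    ).fetchall()
    conn.close()
    return [(path, json.loads(obj)) for path, obj in rows]


for path, obj in fb_objects(NESTED):
    print(path, obj["comments"])
```

Here json_extract() is only ever called on the full document, never on a bare string value, so no type check is needed in the where clause.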

Related

Postgres jsonb conditional replace of specific property in array of objects

Imagine I have a column data in a postgres table with the following sample data:
[
{
"type": "a",
"name": "Joe"
},
{
"type": "b",
"name": "John"
}
]
I want to perform an update on this table to update the type properties for each object in the json array, converting them from the current text to a corresponding number.
text "a" becomes 1
text "b" becomes 2
and so forth
I got as far as this:
update "table"
set "data" = jsonb_set("data", '{0,type}','1')
I understand this will update whichever object is at position 0 in the array to have value 1 in the type property, which is of course not what I want.
The replace needs to be conditional: if there is an a, it should become a 1; if there is a b, it should become a 2, etc.
Is there any way to accomplish what I'm looking for?
You can use the JSONB_SET() function nested in JSONB_AGG() within an UPDATE statement, after producing consecutive integers with the WITH ORDINALITY keywords following the JSONB_ARRAY_ELEMENTS() function, such as:
UPDATE tab
SET data = (
SELECT JSONB_AGG(JSONB_SET(j, '{type}', ('"'||idx||'"')::JSONB))
FROM JSONB_ARRAY_ELEMENTS(data)
WITH ORDINALITY arr(j,idx)
)
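To make clear what that statement computes, here is a small Python sketch that mirrors it on plain lists and dictionaries. It is an illustration only, not PostgreSQL code: set_type_by_position follows the statement (the 1-based position, stored as a JSON string), while set_type_by_value and the TYPE_CODES lookup are hypothetical names showing the value-based mapping described in the question, for comparison.

```python
def set_type_by_position(elements):
    """Mirror JSONB_SET(j, '{type}', ('"'||idx||'"')::JSONB) with WITH ORDINALITY.

    idx is the 1-based position in the array, and the new value is a string.
    """
    return [{**elem, "type": str(idx)} for idx, elem in enumerate(elements, start=1)]


# Hypothetical lookup table taken from the question: "a" becomes 1, "b" becomes 2.
TYPE_CODES = {"a": 1, "b": 2}


def set_type_by_value(elements, codes=TYPE_CODES):
    """Replace type according to its current value instead of its position."""
    result = []
    for elem in elements:
        updated = dict(elem)
        if updated.get("type") in codes:
            updated["type"] = codes[updated["type"]]
        result.append(updated)
    return result


sample = [{"type": "a", "name": "Joe"}, {"type": "b", "name": "John"}]
print(set_type_by_position(sample))
print(set_type_by_value(sample))
```

The two agree on the numbers only when the array happens to be ordered a, b, and so on; if the order can vary, the position-based version assigns different numbers than the value-based one.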

Update JSON Array in Postgres with specific key

I have a complex array which looks like the following in a table column:
{
"sometag": {},
"where": [
{
"id": "Krishna",
"nick": "KK",
"values": [
"0"
],
"function": "ADD",
"numValue": [
"0"
]
},
{
"id": "Krishna1",
"nick": "KK1",
"values": [
"0"
],
"function": "SUB",
"numValue": [
"0"
]
}
],
"anotherTag": [],
"TagTag": {
"tt": "tttttt",
"tt1": "tttttt"
}
}
In this array, I want to update the function and numValue of id: "Krishna".
Kindly help.
This is really nasty because:
Updating an element inside a JSON array always requires expanding the array.
On top of that, the array is nested.
The identifier for the elements to update is a sibling, not a parent, which means you have to filter by a sibling.
So I came up with a solution, but with a disclaimer: you should avoid doing this as a regular database action! Better would be:
Parsing your JSON in the backend and doing the operations in your backend code.
Normalizing the JSON in your database if this is a common task, meaning: create tables with appropriate columns and extract your JSON into the table structure. Do not store entire JSON objects in the database! That would make every single task much easier and far more performant!
SELECT
jsonb_set( -- 5
(SELECT mydata::jsonb FROM mytable),
'{where}',
updated_array
)::json
FROM (
SELECT
jsonb_agg( -- 4
CASE WHEN array_elem ->> 'id' = 'Krishna' THEN
jsonb_set( -- 3
jsonb_set(array_elem.value::jsonb, '{function}', '"ADDITION"'::jsonb), -- 2
'{numValue}',
'["0","1"]'::jsonb
)
ELSE array_elem::jsonb END
) as updated_array
FROM mytable,
json_array_elements(mydata -> 'where') array_elem -- 1
) s
1. Extract the nested array elements into one element per row.
2. Replace the function value. Note the casts from type json to type jsonb. That is necessary because there's no json_set() function but only jsonb_set(). Naturally, if you just have type jsonb, the casts are not necessary.
3. Replace the numValue value.
4. Reaggregate the array.
5. Replace the where value of the original JSON object with the newly created array object.
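Since doing this in backend code is the suggested alternative, here is a rough Python sketch of the same five steps on an already parsed document. The function name update_where and its parameters are made up for illustration, and the replacement values are the ones used in the query.

```python
import copy


def update_where(doc, target_id, new_function, new_num_value):
    """Return a copy of doc with function and numValue replaced for one id."""
    result = copy.deepcopy(doc)
    updated = []
    for elem in result["where"]:                      # 1: one array element at a time
        if elem.get("id") == target_id:               # filter by the sibling "id"
            elem = {
                **elem,
                "function": new_function,             # 2: replace function
                "numValue": list(new_num_value),      # 3: replace numValue
            }
        updated.append(elem)                          # 4: rebuild the array
    result["where"] = updated                         # 5: put it back into the document
    return result


doc = {
    "sometag": {},
    "where": [
        {"id": "Krishna", "nick": "KK", "values": ["0"], "function": "ADD", "numValue": ["0"]},
        {"id": "Krishna1", "nick": "KK1", "values": ["0"], "function": "SUB", "numValue": ["0"]},
    ],
    "anotherTag": [],
}
print(update_where(doc, "Krishna", "ADDITION", ["0", "1"])["where"][0])
```

The original document is left untouched because the function works on a deep copy.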

How can Postgres extract parts of json, including arrays, into another JSON field?

I'm trying to convince PostgreSQL 13 to pull out parts of a JSON field into another field, including a subset of properties within an array based on a discriminator (type) property. For example, given a data field containing:
{
"id": 1,
"type": "a",
"items": [
{ "size": "small", "color": "green" },
{ "size": "large", "color": "white" }
]
}
I'm trying to generate new_data like this:
{
"items": [
{ "size": "small" },
{ "size": "large"}
]
}
items can contain any number of entries. I've tried variations of SQL something like:
UPDATE my_table
SET new_data = (
CASE data->>'type'
WHEN 'a' THEN
json_build_object(
'items', json_agg(json_array_elements(data->'items') - 'color')
)
ELSE
null
END
);
but I can't seem to get it working. In this case, I get:
ERROR: set-returning functions are not allowed in UPDATE
LINE 6: 'items', json_agg(json_array_elements(data->'items')...
I can get a set of items using json_array_elements(data->'items') and thought I could roll this up into a JSON array using json_agg and remove unwanted keys using the - operator. But now I'm not sure if what I'm trying to do is possible. I'm guessing it's a case of PEBCAK. I've got about a dozen different types each with slightly different rules for how new_data should look, which is why I'm trying to fit the value for new_data into a type-based CASE statement.
Any tips, hints, or suggestions would be greatly appreciated.
One way is to handle the set json_array_elements() returns in a subquery.
UPDATE my_table
SET new_data = CASE
WHEN data->>'type' = 'a' THEN
(SELECT json_build_object('items',
json_agg(jae.item::jsonb - 'color'))
FROM json_array_elements(data->'items') jae(item))
END;
Also note that - isn't defined for json, only for jsonb. So unless your columns are actually jsonb, you need a cast. And you don't need an explicit ... ELSE NULL ... in a CASE expression: NULL is already the result when no WHEN branch matches and there is no ELSE branch.
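For reference, the logic of that UPDATE can be sketched in plain Python as below. This is only an illustration of the intended result, not PostgreSQL code, and build_new_data is a hypothetical name. One difference worth knowing: json_agg() over zero rows yields NULL, whereas this sketch produces an empty list for an empty items array.

```python
def build_new_data(data):
    """Return the new_data value for one row, or None when type is not 'a'."""
    if data.get("type") != "a":
        return None  # same as the CASE expression falling through to NULL
    # Equivalent of json_agg(item::jsonb - 'color') over the items array.
    items = [
        {key: value for key, value in item.items() if key != "color"}
        for item in data.get("items", [])
    ]
    return {"items": items}


row = {
    "id": 1,
    "type": "a",
    "items": [
        {"size": "small", "color": "green"},
        {"size": "large", "color": "white"},
    ],
}
print(build_new_data(row))
```

With about a dozen types, each rule could become its own small function selected by the type value, which is the same idea as adding more WHEN branches.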

MS SQL Query a field containing JSON

I have the following JSON in a SQL field in a table:
{
"type": "info",
"date": "2019/11/12 14:28:51",
"state": {
"6ee8587f-3b8c-4e5c-89a9-9f04752607f0": {
"state": "open",
"color": "#0000ff"
}
},
...
}
I query this in MS SQL using the following:
SELECT
JSON_VALUE(json_data, '$.type') AS msg_type
,JSON_VALUE(json_data, '$."date"') AS event_date
,JSON_QUERY(json_data, '$.state."6ee8587f-3b8c-4e5c-89a9-9f04752607f0".state') AS json_state
,JSON_QUERY(json_data, '$.state."6ee8587f-3b8c-4e5c-89a9-9f04752607f0".color') AS json_color
FROM
[dbo].[tbl_json_dump]
To get the date (a reserved word) back, I have to put the field name in quotes, like $."date"
I cannot seem to get the data back for the state or color fields and I think it has to do with that it is nested under "6ee8587f-3b8c-4e5c-89a9-9f04752607f0" because when I query :
JSON_QUERY(json_data, '$.state."6ee8587f-3b8c-4e5c-89a9-9f04752607f0"') AS json_state
I get the object back -
{"state":"open","color":"#0000ff"}
but using
JSON_QUERY(json_data, '$.state."6ee8587f-3b8c-4e5c-89a9-9f04752607f0".state') AS json_state
it is not working
Any suggestions on what I'm doing wrong??
Just replace JSON_QUERY with JSON_VALUE since you're interested in getting the value.
JSON_QUERY is supposed to return a JSON fragment and designed to work on objects and arrays, not values.
Salman A already provided the answer. Just to add a few points.
JSON_VALUE() - Extracts a Scalar value
JSON_QUERY() - Extracts an object or an array from a JSON string.
If you look at the syntax, JSON_QUERY ( expression [ , path ] ) and JSON_VALUE ( expression , path ), both are more or less the same, except for the [] square brackets around path, which mean it is optional. This is because JSON_QUERY() can extract the whole JSON field if required.
And on the return types,
JSON_VALUE() returns a single text value of type nvarchar(4000)
JSON_QUERY() returns a JSON fragment of type nvarchar(max)
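The distinction can be imitated in a few lines of Python, which may help when reasoning about why one call returns NULL. This is a sketch of the default (lax) behaviour only, with made-up helper names; it takes a list of keys instead of a real JSON path, and it returns Python values rather than nvarchar text.

```python
import json

GUID = "6ee8587f-3b8c-4e5c-89a9-9f04752607f0"
SAMPLE = json.dumps({
    "type": "info",
    "date": "2019/11/12 14:28:51",
    "state": {GUID: {"state": "open", "color": "#0000ff"}},
})


def _walk(text, keys):
    """Follow a list of object keys; return None if any step is missing."""
    node = json.loads(text)
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def json_value(text, keys):
    """Like JSON_VALUE in lax mode: scalars only, otherwise None (NULL)."""
    node = _walk(text, keys)
    return None if isinstance(node, (dict, list)) else node


def json_query(text, keys):
    """Like JSON_QUERY in lax mode: objects and arrays only, otherwise None (NULL)."""
    node = _walk(text, keys)
    return node if isinstance(node, (dict, list)) else None


print(json_value(SAMPLE, ["state", GUID, "state"]))
print(json_query(SAMPLE, ["state", GUID, "state"]))
print(json_query(SAMPLE, ["state", GUID]))
```

The second call prints None, which corresponds to the NULL seen in the question when JSON_QUERY is pointed at a scalar.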
Overall comparison
DECLARE @data NVARCHAR(4000)
SET @data=N'{
"type": "info",
"date": "2019/11/12 14:28:51",
"state": {
"6ee8587f-3b8c-4e5c-89a9-9f04752607f0": {
"state": "open",
"color": "#0000ff"
}
}
}'
SELECT
JSON_VALUE(@data,'$.state."6ee8587f-3b8c-4e5c-89a9-9f04752607f0"') AS 'JSON_VALUE_FAILED',
JSON_QUERY(@data,'$.state."6ee8587f-3b8c-4e5c-89a9-9f04752607f0"') AS 'JSON_QUERY_SUCCEED',
JSON_VALUE(@data,'$.state."6ee8587f-3b8c-4e5c-89a9-9f04752607f0".state') AS 'JSON_VALUE_SUCCEED',
JSON_QUERY(@data,'$.state."6ee8587f-3b8c-4e5c-89a9-9f04752607f0".state') AS 'JSON_QUERY_FAILED';
You may try with another possible approach (more complicated), which parses all nested JSON objects.
Table:
CREATE TABLE Data (
JsonData nvarchar(max)
)
INSERT INTO Data
(JsonData)
VALUES
(N'{
"type": "info",
"date": "2019/11/12 14:28:51",
"state": {
"6ee8587f-3b8c-4e5c-89a9-9f04752607f0": {
"state": "open",
"color": "#0000ff"
},
"6ee8587f-3b8c-4e5c-89a9-9f04752607f1": {
"state": "open",
"color": "#0000ff"
}
}
}')
Statement:
SELECT
j1.[type], j1.[date], j2.[key], j3.state, j3.color
FROM Data d
CROSS APPLY OPENJSON(d.JsonData) WITH (
[type] nvarchar(100) '$.type',
[date] datetime '$.date',
[state] nvarchar(max) '$.state' AS JSON
) j1
CROSS APPLY OPENJSON(j1.state) j2
CROSS APPLY OPENJSON(j2.[value]) WITH (
state nvarchar(10) '$.state',
color nvarchar(10) '$.color'
) j3
Result:
type date key state color
info 12/11/2019 14:28:51 6ee8587f-3b8c-4e5c-89a9-9f04752607f0 open #0000ff
info 12/11/2019 14:28:51 6ee8587f-3b8c-4e5c-89a9-9f04752607f1 open #0000ff
Notes:
If the input JSON has only one key "6ee8587f-3b8c-4e5c-89a9-9f04752607f0" in the "state" JSON object, you may get the value with JSON_VALUE() using the correct path $.state."6ee8587f-3b8c-4e5c-89a9-9f04752607f0".state.
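As a cross-check of what the OPENJSON statement produces, the same flattening can be sketched in Python. This is an illustration with a made-up helper name (state_rows); it keeps the date as the original text instead of converting it to a datetime.

```python
import json

RAW = """{
  "type": "info",
  "date": "2019/11/12 14:28:51",
  "state": {
    "6ee8587f-3b8c-4e5c-89a9-9f04752607f0": {"state": "open", "color": "#0000ff"},
    "6ee8587f-3b8c-4e5c-89a9-9f04752607f1": {"state": "open", "color": "#0000ff"}
  }
}"""


def state_rows(text):
    """Return one (type, date, key, state, color) tuple per key under "state"."""
    doc = json.loads(text)
    return [
        (doc.get("type"), doc.get("date"), key, inner.get("state"), inner.get("color"))
        for key, inner in doc.get("state", {}).items()
    ]


for row in state_rows(RAW):
    print(row)
```

Each key under "state" becomes its own row, which is what the second CROSS APPLY OPENJSON does in the statement above.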

U-SQL - Extract data from complex json object

So I have a lot of json files structured like this:
{
"Id": "2551faee-20e5-41e4-a7e6-57bd20b02a22",
"Timestamp": "2016-12-06T08:09:57.5541438+01:00",
"EventEntry": {
"EventId": 1,
"Payload": [
"1a3e0c9e-ef69-4c6a-ac8c-9b2de2fbc701",
"DHS.PlanCare.Business.BusinessLogic.VisionModels.VisionModelServiceWithoutUnitOfWork.FetchVisionModelsForClientOnReferenceDateAsync(System.Int64 clientId, System.DateTime referenceDate, System.Threading.CancellationToken cancellationToken)",
25,
"DHS.PlanCare.Business.BusinessLogic.VisionModels.VisionModelServiceWithoutUnitOfWork+<FetchVisionModelsForClientOnReferenceDateAsync>d__11.MoveNext\r\nDHS.PlanCare.Core.Extensions.IQueryableExtensions+<ExecuteAndThrowTaskCancelledWhenRequestedAsync>d__16`1.MoveNext\r\n",
false,
"2197, 6-12-2016 0:00:00, System.Threading.CancellationToken"
],
"EventName": "Duration",
"KeyWordsDescription": "Duration",
"PayloadSchema": [
"instanceSessionId",
"member",
"durationInMilliseconds",
"minimalStacktrace",
"hasFailed",
"parameters"
]
},
"Session": {
"SessionId": "0016e54b-6c4a-48bd-9813-39bb040f7736",
"EnvironmentId": "C15E535B8D0BD9EF63E39045F1859C98FEDD47F2",
"OrganisationId": "AC6752D4-883D-42EE-9FEA-F9AE26978E54"
}
}
How can I create a U-SQL query that outputs the
Id,
Timestamp,
EventEntry.EventId and
EventEntry.Payload[2] (value 25 in the example above)?
I can't figure out how to extend my query:
@extract =
EXTRACT
Timestamp DateTime
FROM @"wasb://xxx/2016/12/06/0016e54b-6c4a-48bd-9813-39bb040f7736/yyy/{*}/{*}.json"
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();
@res =
SELECT Timestamp
FROM @extract;
OUTPUT @res TO "/output/result.csv" USING Outputters.Csv();
I have seen some examples like:
U-SQL Unable to extract data from JSON file => this only queries one level of the document, I need data from multiple levels.
U-SQL - Extract data from json-array => this only queries one level of the document, I need data from multiple levels.
JSONTuple supports multiple JSONPaths in one go.
@extract =
EXTRACT
Id String,
Timestamp DateTime,
EventEntry String
FROM @"..."
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();
@res =
SELECT Id, Timestamp, EventEntry,
Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(EventEntry,
"EventId", "Payload[2]") AS Event
FROM @extract;
@res =
SELECT Id,
Timestamp,
Event["EventId"] AS EventId,
Event["Payload[2]"] AS Something
FROM @res;
You may want to look at this Git example: https://github.com/Azure/usql/blob/master/Examples/JsonSample/JsonSample/NestedJsonParsing.usql
This takes two disparate data elements and combines them, like your Payload and PayloadSchema. If you create key-value pairs using the "Donut" or "Cake and Batter" examples, you may be able to match the schema up to the payload and use the CROSS APPLY EXPLODE function.
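To show the target shape of that schema-to-payload matching, here is a small Python sketch (not U-SQL) that pulls out the four requested values and pairs each PayloadSchema name with the Payload entry at the same position. The EVENT constant is a trimmed copy of the file from the question with the long strings shortened to "...", and extract_event is a hypothetical helper name.

```python
import json

# Trimmed copy of the sample file; long payload strings shortened to "...".
EVENT = """{
  "Id": "2551faee-20e5-41e4-a7e6-57bd20b02a22",
  "Timestamp": "2016-12-06T08:09:57.5541438+01:00",
  "EventEntry": {
    "EventId": 1,
    "Payload": ["1a3e0c9e-ef69-4c6a-ac8c-9b2de2fbc701", "...", 25, "...", false, "..."],
    "PayloadSchema": ["instanceSessionId", "member", "durationInMilliseconds",
                      "minimalStacktrace", "hasFailed", "parameters"]
  }
}"""


def extract_event(text):
    """Return the requested fields plus the payload keyed by its schema names."""
    doc = json.loads(text)
    entry = doc["EventEntry"]
    # Pair each schema name with the payload value at the same index.
    named_payload = dict(zip(entry["PayloadSchema"], entry["Payload"]))
    return {
        "Id": doc["Id"],
        "Timestamp": doc["Timestamp"],
        "EventId": entry["EventId"],
        "Payload2": entry["Payload"][2],
        "NamedPayload": named_payload,
    }


event = extract_event(EVENT)
print(event["Id"], event["Timestamp"], event["EventId"], event["Payload2"])
print(event["NamedPayload"]["durationInMilliseconds"])
```

Once the payload is keyed by schema name, Payload[2] can be read as durationInMilliseconds instead of relying on the position.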