I have a Kusto table where one of the columns is of dynamic type and holds nested JSON.
How do I flatten it in Kusto?
mv-expand only expands one level.
column1 : timestamp
column2 : id
column3 : json object
timestamp                    id   value
2020-10-13 22:42:05.0000000  d0   {"value": "0", "max": "0", "min": "0", "avg": "0"}
2020-10-13 22:42:05.0000000  d0   {"sid": "a0", "data": {"x": {"a": {"t1": "2020-10-13T22:46:50.1310000Z", "m1": 446164, "m4": {"m41": "abcd", "m42": 1234}}}}}
Update 2: I was able to flatten the keys, but not the values:
let testJson = datatable(timestamp : datetime, id : string, value : dynamic )
[datetime(2020-10-13T22:42:05Z), 'd0', dynamic({"value":"0","max":"0","min":"0","avg":"0"}),
datetime(2020-10-13T22:42:05Z), 'd1', dynamic({"sid":"a0","data":{"x":{"a":{"t1":"2020-10-13T22:46:50.131Z","m1":446164,"m4":{"m41":"abcd","m42":1234}}}}})];
testJson
| extend key=treepath(value)
| mv-expand key
| extend value1 = value[tostring(key)]
You can invoke mv-expand several times:
let _data = datatable (column1:datetime , column2:string , column3:dynamic )
[
datetime(2020-10-13 22:42:05.0000000), 'd0', dynamic({
"value": "0",
"max": "0",
"min": "0",
"avg": "0"
}),
datetime(2020-10-13 22:42:05.0000000), 'd0', dynamic({
"sid": "a0",
"data": {
"x": {
"a": {
"t1": "2020-10-13T22:46:50.1310000Z",
"m1": 446164,
"m4": {
"m41": "abcd",
"m42": 1234
}
}
}
}
})
];
_data
| mv-expand column3
| mv-expand more_data=column3.data.x.a
| mv-expand more_data_m4=more_data.m4
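If you also want each level as key/value pairs (the part "update 2" was missing), here's a minimal sketch using bag_keys() to pair every expanded property with its value; it handles one bag level at a time, so repeat the pattern for deeper levels:

_data
| mv-expand kv = column3                   // each row becomes one single-property bag
| extend key = tostring(bag_keys(kv)[0])   // the property name
| extend val = kv[key]                     // the property value
| project column1, column2, key, val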
You can also promote dynamic fields into columns using evaluate bag_unpack():
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/bag-unpackplugin
let _data = datatable (column1:datetime , column2:string , column3:dynamic )
[
datetime(2020-10-13 22:42:05.0000000), 'd0', dynamic({
"value": "0",
"max": "0",
"min": "0",
"avg": "0"
}),
datetime(2020-10-13 22:42:05.0000000), 'd0', dynamic({
"sid": "a0",
"data": {
"x": {
"a": {
"t1": "2020-10-13T22:46:50.1310000Z",
"m1": 446164,
"m4": {
"m41": "abcd",
"m42": 1234
}
}
}
}
})
];
_data
| mv-expand column3
| extend expanded_data = column3.data.x.a
| evaluate bag_unpack(expanded_data)
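If the unpacked property names could collide with existing column names, bag_unpack also accepts an output-column prefix; a small variation of the query above:

_data
| mv-expand column3
| extend expanded_data = column3.data.x.a
| evaluate bag_unpack(expanded_data, 'a_')   // columns come out as a_t1, a_m1, a_m4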
We are trying to reformat a JSON document similar to this:
[
{"id": 1,
"type": "A",
"changes": [
{"id": 12},
{"id": 13}
],
"wanted_key": "good",
"unwanted_key": "aaa"
},
{"id": 2,
"type": "A",
"unwanted_key": "aaa"
},
{"id": 3,
"type": "B",
"changes": [
{"id": 31},
{"id": 32}
],
"unwanted_key": "aaa",
"unwanted_key2": "aaa"
},
{"id": 4,
"type": "B",
"unwanted_key3": "aaa"
},
null,
null,
{"id": 7}
]
into something like this:
[
{
"id": 1,
"type": "A",
"wanted_key": true # every record must have this key/value
},
{
"id": 12, # note: this was in the "changes" property of record id 1
"type": "A", # type should be the same type than record id 1
"wanted_key": true
},
{
"id": 13, # note: this was in the "changes" property of record id 1
"type": "A", # type should be the same type than record id 1
"wanted_key": true
},
{
"id": 2,
"type": "A",
"wanted_key": true
},
{
"id": 3,
"type": "B",
"wanted_key": true
},
{
"id": 31, # note: this was in the "changes" property of record id 3
"type": "B", # type should be the same type than record id 3
"wanted_key": true
},
{
"id": 32, # note: this was in the "changes" property of record id 3
"type": "B", # type should be the same type than record id 3
"wanted_key": true
},
{
"id": 4,
"type": "B",
"wanted_key": true
},
{
"id": 7,
"type": "UNKN", # records without a type should have this type
"wanted_key": true
}
]
So far, I've been able to:
remove null records
obtain the keys we need with their default
give records without a type a default type
What we are missing:
from records having a changes key, create new records with the type of their parent record
join all records in a single array
Unfortunately we are not entirely sure how to proceed... Any help would be appreciated.
So far our jq goes like this:
del(..|nulls) | map({id, type: (.type // "UNKN"), wanted_key: (true)}) | del(..|nulls)
Here's our test code:
https://jqplay.org/s/eLAWwP1ha8P
The following should work:
map(select(values))
| map(., .type as $type | (.changes[]? + {$type}))
| map({id, type: (.type // "UNKN"), wanted_key: true})
Only select non-null values
Return the original items followed by their inner changes array (+ outer type)
Extract 3 properties for output
Multiple map calls can usually be combined, so this becomes:
map(
select(values)
| ., (.type as $type | (.changes[]? + {$type}))
| {id, type: (.type // "UNKN"), wanted_key: true}
)
Another option without variables:
map(
select(values)
| ., .changes[]? + {type}
| {id, type: (.type // "UNKN"), wanted_key: true}
)
# or:
map(select(values))
| map(., .changes[]? + {type})
| map({id, type: (.type // "UNKN"), wanted_key: true})
or even with a separate normalization step for the unknown type:
map(select(values))
| map(.type //= "UNKN")
| map(., .changes[]? + {type})
| map({id, type, wanted_key: true})
# condensed to a single line:
map(select(values) | .type //= "UNKN" | ., .changes[]? + {type} | {id, type, wanted_key: true})
Explanation:
Select only non-null values from the array
If type is not set, create the property with value "UNKN"
Produce the original array items, followed by their nested changes elements extended with the parent type
Reshape objects to only contain properties id, type, and wanted_key.
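As an aside, //= is jq's alternative-assignment operator: it assigns the right-hand side only when the current value is null or false, which is exactly what makes the normalization step work:

echo '[{"type":"A"},{"id":7}]' | jq -c 'map(.type //= "UNKN")'
# => [{"type":"A"},{"id":7,"type":"UNKN"}]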
Here's one way:
map(
select(values)
| (.type // "UNKN") as $type
| ., .changes[]?
| {id, $type, wanted_key: true}
)
[
{
"id": 1,
"type": "A",
"wanted_key": true
},
{
"id": 12,
"type": "A",
"wanted_key": true
},
{
"id": 13,
"type": "A",
"wanted_key": true
},
{
"id": 2,
"type": "A",
"wanted_key": true
},
{
"id": 3,
"type": "B",
"wanted_key": true
},
{
"id": 31,
"type": "B",
"wanted_key": true
},
{
"id": 32,
"type": "B",
"wanted_key": true
},
{
"id": 4,
"type": "B",
"wanted_key": true
},
{
"id": 7,
"type": "UNKN",
"wanted_key": true
}
]
Something like the following should work:
map(
select(type == "object") |
( {id}, {id : ( .changes[]? .id )} ) +
{ type: (.type // "UNKN"), wanted_key: true }
)
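To try any of these filters outside a playground, a typical command-line invocation looks like this (input.json is a hypothetical file holding the input array):

jq 'map(
      select(values)
      | .type //= "UNKN"
      | ., .changes[]? + {type}
      | {id, type, wanted_key: true}
    )' input.json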
I have a table with a column that holds valid JSON; here's an example of the JSON structure:
{
"Requirements": {
"$values": [
{
"$type": "List",
"ListId": "956cf9c5-24ab-47d9-8082-940118f2f1a3",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value1",
"Value2",
"Value3"
]
}
},
{
"$type": "List",
"ListId": "D11149DD-A682-4BC7-A87D-567954779234",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value4",
"Value5",
"Value6",
"Value7"
]
}
}
]
}
}
I need to return the values from the MultiSelected collection depending on the value of ListId.
I'm using the following JSON path to return the values:
$.Requirements."$values"[?(@.ListId=='956cf9c5-24ab-47d9-8082-940118f2f1a3')].MultiSelected."$values"
This worked fine in a JSON Expression tester.
But when I try to use it to query the table I get the following error:
JSON path is not properly formatted. Unexpected character '?' is found at position 25.
The query I'm using is as follows:
SELECT ID AS PayloadID,
Items.Item AS ItemsValues
FROM dbo.Payload
CROSS APPLY ( SELECT *
FROM OPENJSON( JSON_QUERY( Payload, '$.Requirements."$values"[?(@.ListId==''956cf9c5-24ab-47d9-8082-940118f2f1a3'')].MultiSelected."$values"' ) )
WITH ( Item nvarchar(200) '$' ) ) AS Items
WHERE ID = 3
I've tried replacing
?(@.ListId==''956cf9c5-24ab-47d9-8082-940118f2f1a3'')
with 0 and it works fine on SQL Server.
My question is: is the filter syntax ?(...) supported in JSON_QUERY, or is there something else I should be doing?
The database is running on Azure, where the database compatibility level is set to SQL Server 2017 (140).
Thanks for your help in advance.
Andy
SQL Server's JSON path implementation doesn't support filter expressions like ?(...), so I would use OPENJSON twice instead:
drop table if exists #payload
create table #payload(ID int,Payload nvarchar(max))
insert into #payload VALUES
(3,N'
{
"Requirements": {
"$values": [
{
"$type": "List",
"ListId": "956cf9c5-24ab-47d9-8082-940118f2f1a3",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value1",
"Value2",
"Value3"
]
}
},
{
"$type": "List",
"ListId": "D11149DD-A682-4BC7-A87D-567954779234",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value4",
"Value5",
"Value6",
"Value7"
]
}
}
]
}
}'
)
SELECT ID AS PayloadID,
Items.[value]
FROM #Payload a
CROSS APPLY OPENJSON( Payload, '$.Requirements."$values"' ) with ( ListId varchar(50),MultiSelected nvarchar(max) as json) b
CROSS APPLY OPENJSON( MultiSelected,'$."$values"' ) Items
where
a.id=3
AND b.listid='956cf9c5-24ab-47d9-8082-940118f2f1a3'
Result:
+-----------+--------+
| PayloadID | value |
+-----------+--------+
| 3 | Value1 |
| 3 | Value2 |
| 3 | Value3 |
+-----------+--------+
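If the ListId needs to vary, you can lift it into a variable instead of baking it into the path literal; a sketch along the same lines:

DECLARE @ListId varchar(50) = '956cf9c5-24ab-47d9-8082-940118f2f1a3';

SELECT a.ID AS PayloadID,
       Items.[value]
FROM #payload a
CROSS APPLY OPENJSON( a.Payload, '$.Requirements."$values"' ) WITH ( ListId varchar(50), MultiSelected nvarchar(max) AS JSON ) b
CROSS APPLY OPENJSON( b.MultiSelected, '$."$values"' ) Items
WHERE a.ID = 3
  AND b.ListId = @ListId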
Input JSON file:
{
"CarBrands": [{
"model": "audi",
"make": " (YEAR == \"2009\" AND CONDITION in (\"Y\") AND RESALE in (\"2015\")) ",
"service": {
"first": null,
"second": [],
"third": []
},
"dealerspot": [{
"dealername": [
"\"first\"",
"\"abc\""
]
},
{
"dealerlat": [
"\"45.00\"",
"\"38.00\""
]
}
],
"type": "ok",
"plate": true
},
{
"model": "bmw",
"make": " (YEAR == \"2010\" AND CONDITION OR (\"N\") AND RESALE in (\"2016\")) ",
"service": {
"first": null,
"second": [],
"third": []
},
"dealerspot": [{
"dealerlat": [
"\"99.00\"",
"\"38.00\""
]
},
{
"dealername": [
"\"sports\"",
"\"abc\""
]
}
],
"type": "ok",
"plate": true
},
{
"model": "toy",
"make": " (YEAR == \"2013\" AND CONDITION in (\"Y\") AND RESALE in (\"2018\")) ",
"service": {
"first": null,
"second": [],
"third": []
},
"dealerspot": [{
"dealerlat": [
"\"35.00\"",
"\"38.00\""
]
},
{
"dealername": [
"\"nelson\"",
"\"abc\""
]
}
],
"type": "ok",
"plate": true
}
]
}
Expected output:
+-------+------------+-----------+
| model | dealername | dealerlat |
+-------+------------+-----------+
| audi  | first      | 45        |
| bmw   | sports     | 99        |
| toy   | nelson     | 35        |
+-------+------------+-----------+
import sparkSession.implicits._
val tagsDF = sparkSession.read.option("multiLine", true).option("inferSchema", true).json("src/main/resources/carbrands.json");
val df = tagsDF.select(explode($"CarBrands") as "car_brands")
val dfd = df
  .withColumn("_tmp", split($"car_brands.make", "\""))
  .select(
    $"car_brands.model".as("model"),
    $"car_brands.dealerspot.dealername"(0)(0).as("dealername"),
    $"car_brands.dealerspot.dealerlat"(0)(0).as("dealerlat")
  )
Note: since the positions of dealername and dealerlat within dealerspot are not fixed, the index (0)(0) doesn't produce the desired output. Please help.
You can convert dealerspot into a JSON string and then use a JSONPath expression with get_json_object():
import org.apache.spark.sql.functions.{get_json_object,to_json,trim,explode}
val df1 = (tagsDF.withColumn("car_brands", explode($"CarBrands"))
.select("car_brands.*")
.withColumn("dealerspot", to_json($"dealerspot")))
//+--------------------+--------------------+-----+-----+----------+----+
//| dealerspot| make|model|plate| service|type|
//+--------------------+--------------------+-----+-----+----------+----+
//|[{"dealername":["...| (YEAR == "2009" ...| audi| true|[, [], []]| ok|
//|[{"dealerlat":["\...| (YEAR == "2010" ...| bmw| true|[, [], []]| ok|
//|[{"dealerlat":["\...| (YEAR == "2013" ...| toy| true|[, [], []]| ok|
//+--------------------+--------------------+-----+-----+----------+----+
df1.select(
$"model"
, trim(get_json_object($"dealerspot", "$[*].dealername[0]"), "\"\\") as "dealername"
, trim(get_json_object($"dealerspot", "$[*].dealerlat[0]"), "\"\\") as "dealerlat"
).show
//+-----+----------+---------+
//|model|dealername|dealerlat|
//+-----+----------+---------+
//| audi| first| 45.00|
//| bmw| sports| 99.00|
//| toy| nelson| 35.00|
//+-----+----------+---------+
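Alternatively, a hedged sketch that avoids JSONPath entirely: explode dealerspot and keep the first non-null entry per model. This assumes Spark's inferred schema gives every dealerspot element both optional fields (null where a field is absent), and that sparkSession.implicits._ from the question is in scope:

import org.apache.spark.sql.functions._

val flat = tagsDF
  .select(explode($"CarBrands").as("car"))
  .select($"car.model".as("model"), explode($"car.dealerspot").as("spot"))
  .groupBy($"model")
  .agg(
    // first non-null dealername/dealerlat per model, with the embedded quotes trimmed off
    trim(first($"spot.dealername"(0), ignoreNulls = true), "\"").as("dealername"),
    trim(first($"spot.dealerlat"(0), ignoreNulls = true), "\"").as("dealerlat")
  )

flat.show()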
I am trying to update several rows in SQL with JSON.
I'd like to match a primary key on a table row to an index nested in an array of JS objects.
Sample data:
let json = [{
"header": object_data,
"items": [{
"id": {
"i": 0,
"name": "item_id"
},
"meta": {
"data": object_data,
"text": "some_text"
}
}, {
"id": {
"i": 4,
"name": "item_id4"
},
"meta": {
"data": object_data,
"text": "some_text"
}
}, {
"id": {
"i": 17,
"name": "item_id17"
},
"meta": {
"data": object_data,
"text": "some_text"
}}]
}]
Sample table:
i | json | item_id
---+---------------------------+---------
0 | entire_object_at_index_0 | item_id
4 | entire_object_at_index_4 | item_id4
17 | entire_object_at_index_17 | item_id17
entire_object_at_index means appending the item data to the header to create a new object for each row:
"header": "some_data",
"items": [{
"id": {
"i": 0,
"name": "item_id1"
},
"meta": {
"data": "some_data",
"text": "some_text"
}
}]
SQL:
update someTable set
json = json_value(@jsons, '$') -- not sure how to join on index here
item_id = json_value(@jsons, '$.items[?].id.name' -- not sure how to select by index here
where [i] = json_query(@jsons, '$.items.id.i')
The requirement to repeat the other properties complicates this a bit, because we need to build a new object explicitly. Even so it's not too hard:
update someTable
set
[json] = (
select (
select
"header" = json_query(#json, '$.header'),
"items" = json_query(N'[' + items.item + N']')
for json path, without_array_wrapper
)
),
item_id = items.item_id
from openjson(@json, '$.items') with (
item nvarchar(max) '$' as json,
item_id varchar(50) '$.id.name',
i int '$.id.i'
) items
join someTable on [someTable].i = items.i
Here I'm assuming the @json has already been unwrapped from its array, as your query seems to assume. If it's not, substitute $[0] for $ in the outer query.
Update:
This is an attempt to improve my answer (I missed the header part of the JSON content in the original answer). Of course, @JeroenMostert's answer is an excellent solution, so this is just another possible approach. Note that if the header part of the JSON content is a scalar value, you should use JSON_VALUE().
Table and JSON:
-- Table
CREATE TABLE #Data (
i int,
[json] nvarchar(max),
item_id nvarchar(100)
)
INSERT INTO #Data
(i, [json], [item_id])
VALUES
(0 , N'entire_object_at_index_0', N'item_id'),
(4 , N'entire_object_at_index_4', N'item_id4'),
(17, N'entire_object_at_index_17', N'item_id17')
-- JSON
DECLARE @json nvarchar(max) = N'[{
"header": {"key": "some_data"},
"items": [{
"id": {
"i": 0,
"name": "item_id"
},
"meta": {
"data": "some_data",
"text": "some_text"
}
}, {
"id": {
"i": 4,
"name": "item_id4"
},
"meta": {
"data": "some_data",
"text": "some_text"
}
}, {
"id": {
"i": 17,
"name": "item_id17"
},
"meta": {
"data": "some_data",
"text": "some_text"
}}]
}]'
Statement:
UPDATE #Data
SET #Data.Json = j.Json
FROM #Data
CROSS APPLY (
SELECT
JSON_QUERY(@json, '$[0].header') AS header,
JSON_QUERY(j.[value], '$') AS items
FROM OPENJSON(@json, '$[0].items') j
WHERE JSON_VALUE(j.[value], '$.id.i') = #Data.[i]
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
) j ([Json])
Original answer:
One possible approach is to use OPENJSON and appropriate join:
Table and JSON:
-- Table
CREATE TABLE #Data (
i int,
[json] nvarchar(max),
item_id nvarchar(100)
)
INSERT INTO #Data
(i, [json], [item_id])
VALUES
(0 , N'entire_object_at_index_0', N'item_id'),
(4 , N'entire_object_at_index_4', N'item_id4'),
(17, N'entire_object_at_index_17', N'item_id17')
-- JSON
DECLARE @json nvarchar(max) = N'[{
"header": "some_data",
"items": [{
"id": {
"i": 0,
"name": "item_id"
},
"meta": {
"data": "some_data",
"text": "some_text"
}
}, {
"id": {
"i": 4,
"name": "item_id4"
},
"meta": {
"data": "some_data",
"text": "some_text"
}
}, {
"id": {
"i": 17,
"name": "item_id17"
},
"meta": {
"data": "some_data",
"text": "some_text"
}}]
}]'
Statement:
UPDATE #Data
SET [json] = j.[value]
FROM #Data
LEFT JOIN (
SELECT
[value],
JSON_VALUE([value], '$.id.i') AS [i]
FROM OPENJSON(@json, '$[0].items')
) j ON (#Data.[i] = j.[i])
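If item_id should be refreshed from the JSON at the same time (the sample table has that column too), a sketch extending the statement above:

UPDATE #Data
SET [json]  = j.[value],
    item_id = JSON_VALUE(j.[value], '$.id.name')
FROM #Data
LEFT JOIN (
    SELECT
        [value],
        JSON_VALUE([value], '$.id.i') AS [i]
    FROM OPENJSON(@json, '$[0].items')
) j ON (#Data.[i] = j.[i])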
I was trying to load this JSON data into Hive
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batters":
{
"batter":
[
{ "id": "1001", "type": "Regular" },
{ "id": "1002", "type": "Chocolate" },
{ "id": "1003", "type": "Blueberry" },
{ "id": "1004", "type": "Devil's Food" }
]
},
"topping":
[
{ "id": "5001", "type": "None" },
{ "id": "5002", "type": "Glazed" },
{ "id": "5005", "type": "Sugar" },
{ "id": "5007", "type": "Powdered Sugar" },
{ "id": "5006", "type": "Chocolate with Sprinkles" },
{ "id": "5003", "type": "Chocolate" },
{ "id": "5004", "type": "Maple" }
]
}
using these DDL commands:
ADD JAR /home/cloudera/Downloads/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar;
CREATE EXTERNAL TABLE format.json_serde (
`id` string,
`type` string,
`name` string,
`ppu` float,
batters` struct < `batter`:array < struct <`bid`:string, `btype`:string >>>,
`topping`:array < struct<`tid`:int, `ttype`:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
is throwing me error
FAILED: ParseException line 7:11 cannot recognize input near ':' 'array' '<' in column type
You've got typos:
`ttype`:string should be ttype:string (backticks aren't allowed inside complex type definitions)
batters` struct is missing its opening backtick: `batters` struct
`topping`:array should be `topping` array (a space, not a colon, separates a column name from its type; that colon is what the parser is choking on)
JSON SerDe mapping is done by name.
Your struct field names should match the actual JSON names, e.g. id and not bid or tid, otherwise you'll get NULL values for those fields.
There is already a JSON SerDe which is part of the Hive installation:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormats&SerDe
create external table json_serde
(
id string
,type string
,name string
,ppu float
,batters struct<batter:array<struct<id:string,type:string>>>
,topping array<struct<id:string,type:string>>
)
row format serde
'org.apache.hive.hcatalog.data.JsonSerDe'
stored as textfile
;
select * from json_serde
;
+------+-------+------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | type | name | ppu | batters | topping |
+------+-------+------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0001 | donut | Cake | 0.550000011920929 | {"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil'sFood"}]} | [{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"PowderedSugar"},{"id":"5006","type":"ChocolatewithSprinkles"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}] |
+------+-------+------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
It worked when I removed the colon near topping. Thanks.
CREATE EXTERNAL TABLE format.json_serde (
id string,
type string,
name string,
ppu float,
batters struct<batter:array<
struct<bid:string, btype:string >>>,
topping array< struct<tid:string, ttype:string>>
)
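As a follow-up, to turn the nested arrays into rows (one row per batter, say), here's a sketch using LATERAL VIEW explode() against the json_serde table defined in the answer above (whose struct fields are named id/type, matching the JSON):

SELECT j.id,
       j.name,
       b.id   AS batter_id,
       b.type AS batter_type
FROM json_serde j
LATERAL VIEW explode(j.batters.batter) bt AS b;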