jq to recursively profile JSON object - json

I have some huge JSON files I need to profile so I can transform them into some tables. I found jq to be really useful in inspecting them, but there are going to be hundreds of these, and I'm pretty new to jq.
I already have some really handy functions in my ~/.jq (big thank you to @mikehwang):
def profile_object:
to_entries | def parse_entry: {"key": .key, "value": .value | type}; map(parse_entry)
| sort_by(.key) | from_entries;
def profile_array_objects:
map(profile_object) | map(to_entries) | reduce .[] as $item ([]; . + $item) | sort_by(.key) | from_entries;
I'm sure I'll have to modify them after I describe my question.
I'd like a jq filter that profiles a single object. If a key maps to an array of objects, collect the unique keys across those objects, and keep descending if they contain further nested arrays of objects. If a value is an object, profile that object as well.
Sorry for the long example, but imagine several GBs of this:
{
"name": "XYZ Company",
"type": "Contractors",
"reporting": [
{
"group_id": "660",
"groups": [
{
"ids": [
987654321,
987654321,
987654321
],
"market": {
"name": "Austin, TX",
"value": "873275"
}
},
{
"ids": [
987654321,
987654321,
987654321
],
"market": {
"name": "Nashville, TN",
"value": "2393287"
}
}
]
}
],
"product_agreements": [
{
"negotiation_arrangement": "FFVII",
"code": "84144",
"type": "DJ",
"type_version": "V10",
"description": "DJ in a mask",
"name": "Claptone",
"negotiated_rates": [
{
"company_references": [
1,
5,
458
],
"negotiated_prices": [
{
"type": "negotiated",
"rate": 17.73,
"expiration_date": "9999-12-31",
"code": [
"11"
],
"billing_modifier_code": [
"124"
],
"billing_class": "professional"
}
]
},
{
"company_references": [
747
],
"negotiated_prices": [
{
"type": "fee",
"rate": 28.42,
"expiration_date": "9999-12-31",
"code": [
"11"
],
"billing_class": "professional"
}
]
}
]
},
{
"negotiation_arrangement": "MGS3",
"name": "David Byrne",
"type": "Producer",
"type_version": "V10",
"code": "654321",
"description": "Frontman from Talking Heads",
"negotiated_rates": [
{
"company_references": [
1,
9,
2344,
8456
],
"negotiated_prices": [
{
"type": "negotiated",
"rate": 68.73,
"expiration_date": "9999-12-31",
"code": [
"11"
],
"billing_class": "professional"
}
]
},
{
"company_references": [
679
],
"negotiated_prices": [
{
"type": "fee",
"rate": 89.25,
"expiration_date": "9999-12-31",
"code": [
"11"
],
"billing_class": "professional"
}
]
}
]
}
],
"version": "1.3.1",
"last_updated_on": "2023-02-01"
}
Desired output:
{
"name": "string",
"type": "string",
"reporting": [
{
"group_id": "string",
"groups": [
{
"ids": [
"number"
],
"market": {
"name": "string",
"value": "string"
}
}
]
}
],
"product_agreements": [
{
"negotiation_arrangement": "string",
"code": "string",
"type": "string",
"type_version": "string",
"description": "string",
"name": "string",
"negotiated_rates": [
{
"company_references": [
"number"
],
"negotiated_prices": [
{
"type": "string",
"rate": "number",
"expiration_date": "string",
"code": [
"string"
],
"billing_modifier_code": [
"string"
],
"billing_class": "string"
}
]
}
]
}
],
"version": "string",
"last_updated_on": "string"
}
Really sorry if there are any errors in that, but I tried to make it all consistent and about as simple as I could.
To restate the need: recursively profile each key in a JSON object whenever its value is an object or an array. The solution needs to be independent of key names. Happy to clarify further if needed.

The jq module schema.jq at https://gist.github.com/pkoppstein/a5abb4ebef3b0f72a6ed was designed to produce the kind of structural schema you describe.
For very large inputs, it might be very slow, so if the JSON is sufficiently regular, it might be possible to use a hybrid strategy - profiling enough of the data to come up with a comprehensive structural schema, and then checking that it does apply.
For conformance testing of structural schemas such as produced by schema.jq, see https://github.com/pkoppstein/JESS

Given your input.json, here is a solution:
jq '
def schema:
if type == "object" then .[] |= schema
elif type == "array" then map(schema)|unique
| if (first | type) == "object" then [add] else . end
else type
end;
schema
' input.json
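To see what this filter does on a small case, here is a sketch that runs it on a made-up input (the sample JSON and the shell variable `profile` are illustrative, not from the question). It assumes `jq` is on the PATH; `-S` sorts keys only to make the output predictable.

```shell
# Hypothetical mini-input: an array of two objects with different keys,
# plus an array of scalars.
profile=$(jq -cS '
  def schema:
    if type == "object" then .[] |= schema
    elif type == "array" then map(schema) | unique
      | if (first | type) == "object" then [add] else . end
    else type
    end;
  schema
' <<'EOF'
{"a": [{"b": 1}, {"c": "x"}], "n": [1, 2]}
EOF
)
echo "$profile"
# prints {"a":[{"b":"number","c":"string"}],"n":["number"]}
```

Because `add` merges the per-element profiles into one object, a key whose type differs between elements ends up with only one of those types in the result.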

Here's a variant of @Philippe's solution: it coalesces objects in map(schema) for arrays in a principled though lossy way. (All these half-solutions trade speed for loss of precision.)
Note that keys_unsorted is used below; if using gojq, then either this would have to be changed to keys, or a def of keys_unsorted provided.
# Use "JSON" as the union of two distinct types
# except combine([]; [ $x ]) => [ $x ]
def combine($a;$b):
if $a == $b then $a elif $a == null then $b elif $b == null then $a
elif ($a == []) and ($b|type) == "array" then $b
elif ($b == []) and ($a|type) == "array" then $a
else "JSON"
end;
# Profile an array by calling mergeTypes(.[] | schema)
# in order to coalesce objects
def mergeTypes(s):
reduce s as $t (null;
if ($t|type) != "object" then .types = (.types + [$t] | unique)
else .object as $o
| .object = reduce ($t | keys_unsorted[]) as $k ($o;
.[$k] = combine( $t[$k]; $o[$k] )
)
end)
| (if .object then [.object] else null end ) + .types ;
def schema:
if type == "object" then .[] |= schema
elif type == "array"
then if . == [] then [] else mergeTypes(.[] | schema) end
else type
end;
schema
Example:
Input:
{"a": [{"b":[1]}, {"c":[2]}, {"c": []}] }
Output:
{
"a": [
{
"b": [
"number"
],
"c": [
"number"
]
}
]
}
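As a further illustration of the "JSON" union type, the sketch below runs the same definitions on a made-up input in which one key has conflicting types across array elements. The input and the shell variable `merged_profile` are assumptions for the example; `jq` is assumed to be on the PATH.

```shell
# Hypothetical input: "b" is a number in one element and a string in the other,
# while "c" is a string in both.
merged_profile=$(jq -cS '
  def combine($a; $b):
    if $a == $b then $a elif $a == null then $b elif $b == null then $a
    elif ($a == []) and ($b|type) == "array" then $b
    elif ($b == []) and ($a|type) == "array" then $a
    else "JSON"
    end;
  def mergeTypes(s):
    reduce s as $t (null;
      if ($t|type) != "object" then .types = (.types + [$t] | unique)
      else .object as $o
        | .object = reduce ($t | keys_unsorted[]) as $k ($o;
            .[$k] = combine( $t[$k]; $o[$k] ) )
      end)
    | (if .object then [.object] else null end) + .types;
  def schema:
    if type == "object" then .[] |= schema
    elif type == "array"
    then if . == [] then [] else mergeTypes(.[] | schema) end
    else type
    end;
  schema
' <<'EOF'
{"a": [{"b": 1, "c": "x"}, {"b": "y", "c": "z"}]}
EOF
)
echo "$merged_profile"
# prints {"a":[{"b":"JSON","c":"string"}]}
```

The conflicting key is reported as "JSON" rather than silently keeping one of the two types, which is the main difference from the shorter filter.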

Related

How to make jq to pick name value pairs

This might be more or less the same ask as "How to get JQ name/value pair from nested (array?) response?", but that question and its example are far more convoluted than what I'm asking.
Given the input JSON at https://jqplay.org/s/jyKBnpx9NYX,
pick out all the name/value pairs under .QueryString and .Params into a single unnested array.
E.g., for an input of
{
"Some": "Random stuff",
"One": {
"QueryString": [
{ "Name": "IsOrdered", "Value": "1" },
{ "Name": "TimeStamp", "Value": "11654116426247" }
]
},
"Two": {
"QueryString": [
{ "Name": "IsOrdered", "Value": "1" },
{ "Name": "TimeStamp", "Value": "11654116426247" }
]
},
"Params": [
{ "Name": "ClassName", "Value": "PRODUCT" },
{ "Name": "ListID", "Value": "Products" },
{ "Name": "Mode ", "Value": "1" },
{ "Name": "Dept" , "Value": "5" },
{ "Name": "HasPrevOrder", "Value": "" }
],
"And": {
"QueryString":[]
},
"More": "like",
"More+": "this"
}
The output would be:
[
{
"Name": "IsOrdered",
"Value": "1"
},
{
"Name": "TimeStamp",
"Value": "11654116426247"
},
{
"Name": "IsOrdered",
"Value": "1"
},
{
"Name": "TimeStamp",
"Value": "11654116426247"
},
{
"Name": "ClassName",
"Value": "PRODUCT"
},
{
"Name": "ListID",
"Value": "Products"
},
...
],
without any empty arrays ([]) in the output, while keeping the repeated values in the array.
I tried to remove empty arrays output ([]) by changing the jq expression from
[( .. | objects | ( .QueryString, .Params ) | select( . != null) )]
to
[( .. | objects | ( .QueryString, .Params ) | select( . != null && . != []) )]
but it failed.
And the final output needs to be unnested into a single array too.
Bonus Q: Would it be possible to output each name/value pair on one line of their own like the following?
{ "Name": "IsOrdered", "Value": "1" },
{ "Name": "TimeStamp", "Value": "11654116426247" },
{ "Name": "IsOrdered", "Value": "1" },
{ "Name": "TimeStamp", "Value": "11654116426247" },
To get the Name/Value objects, one per line, you could go with:
jq -c '.. | objects | (.QueryString, .Params) | .. | objects | select( .Name and .Value)'
or more cavalierly:
jq -c '.. | objects | select( .Name and .Value)'
The && must be replaced with and. On the result you can use | flatten to convert "array of arrays of objects" into just "array of objects".
Bonus A: Use the -c/--compact-output flag of jq together with | flatten[] instead of just | flatten.
Together:
jq -c '
[
..
| objects
| ( .QueryString, .Params )
| select(. != null and . != [])
]
| flatten[]' input.json
This expression can also be simplified to .. | objects | .QueryString[]?, .Params[]?
The output is:
{"Name":"ClassName","Value":"PRODUCT"}
{"Name":"ListID","Value":"Products"}
{"Name":"Mode ","Value":"1"}
{"Name":"Dept","Value":"5"}
{"Name":"HasPrevOrder","Value":""}
{"Name":"IsOrdered","Value":"1"}
{"Name":"TimeStamp","Value":"11654116426247"}
{"Name":"IsOrdered","Value":"1"}
{"Name":"TimeStamp","Value":"11654116426247"}
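The simplified expression can be tried on a cut-down input as follows. This is only a sketch: the sample JSON and the shell variable `pairs` are made up for illustration, and `jq` is assumed to be on the PATH.

```shell
# Hypothetical cut-down input with one QueryString, one Params,
# and one empty QueryString that should produce no output.
pairs=$(jq -c '.. | objects | .QueryString[]?, .Params[]?' <<'EOF'
{
  "One": { "QueryString": [ { "Name": "a", "Value": "1" } ] },
  "Params": [ { "Name": "b", "Value": "" } ],
  "And": { "QueryString": [] }
}
EOF
)
echo "$pairs"
# prints:
# {"Name":"b","Value":""}
# {"Name":"a","Value":"1"}
```

The trailing `?` suppresses the error raised when `.QueryString` or `.Params` is missing (null), and iterating an empty array simply yields nothing, so no explicit `select` is needed.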

How to get key value pairs of the objects from complex JSON using jq and map? (Active Campaign)

I have the following JSON. I want to get key-value pair objects based on their role. In this example there are 3 roles (Presenter, Approver, Customer), but there can be more, as the set is dynamic.
JSON
{
"Presenter Name": "Roney",
"Presenter Email": "roney@domain.com",
"Approver Name": "Tim",
"Approver Email": "tim@domain.com",
"Customer Name": "Alex",
"Customer Email": "alex@domain.com",
"Invoice": "001",
"Date": "2022-02-14"
}
Expected output, using jq and map:
{
"Presenter": {
"email_address": "roney@domain.com",
"name": "Roney",
"role": "Presenter"
},
"Approver": {
"email_address": "tim@domain.com",
"name": "Tim",
"role": "Approver"
},
"Customer": {
"email_address": "alex@domain.com",
"name": "Alex",
"role": "Customer"
}
}
I have got as far as the following but don't know what to do next. Please advise.
to_entries |map( { (.key): { name: .value, email_address:.value, role: .key} } ) | add
This splits the keys at the space character while discarding any items that don't have one in it. Then it assigns the three fields to their values accordingly, using reduce to combine the grouping.
to_entries
| map(.key |= split(" ") | select(.key[1]))
| reduce group_by(.key[0])[] as $g ({};
.[$g[0].key[0]] = (
INDEX($g[]; .key[1]) | {
email_address: .Email.value,
name: .Name.value,
role: .Name.key[0]
}
)
)
{
"Approver": {
"email_address": "tim@domain.com",
"name": "Tim",
"role": "Approver"
},
"Customer": {
"email_address": "alex@domain.com",
"name": "Alex",
"role": "Customer"
},
"Presenter": {
"email_address": "roney@domain.com",
"name": "Roney",
"role": "Presenter"
}
}
Here's another, shorter approach that doesn't use group_by. Instead, it iterates directly over the initial object using reduce and immediately sets the fields whenever the key follows the space-separated role-key pattern.
reduce (to_entries[] | .key /= " ") as {key: [$role, $key], $value} ({};
if $key then
.[$role] += {({Email: "email_address", Name: "name"}[$key]): $value, $role}
else . end
)
{
"Presenter": {
"name": "Roney",
"role": "Presenter",
"email_address": "roney@domain.com"
},
"Approver": {
"name": "Tim",
"role": "Approver",
"email_address": "tim@domain.com"
},
"Customer": {
"name": "Alex",
"role": "Customer",
"email_address": "alex@domain.com"
}
}
{ "Name": "name", "Email": "email_address" } as $key_map |
to_entries |
map (
( .key | split(" ") | select( length == 2 ) ) as [ $role, $raw_key ] |
[ $role, "role", $role ],
[ $role, $key_map[$raw_key], .value ]
) |
reduce .[] as [ $role, $key, $val ] ( {}; .[ $role ][ $key ] = $val )
In the above, we start by making the data uniform. Specifically, we start by producing the following:
[
[ "Presenter", "role", "Presenter" ],
[ "Presenter", "name", "Roney" ],
[ "Presenter", "role", "Presenter" ],
[ "Presenter", "email_address", "roney@domain.com" ],
[ "Approver", "role", "Approver" ],
[ "Approver", "name", "Tim" ],
[ "Approver", "role", "Approver" ],
[ "Approver", "email_address", "tim@domain.com" ],
[ "Customer", "role", "Customer" ],
[ "Customer", "name", "Alex" ],
[ "Customer", "role", "Customer" ],
[ "Customer", "email_address", "alex@domain.com" ]
]
There's redundant information, but that doesn't matter.
Then, the final simple reduce builds the desired structure.
.key | split(" ") | select( length == 2 )
can be replaced with the safer
.key | match("^(.*) (Name|Email)$") | .captures | map( .string )
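Putting the regex-based key test into the full filter might look like the sketch below. The input is made up for illustration (including the "Due Date" key, which contains a space but is not a role field), `roles` is a hypothetical shell variable, and `jq` is assumed to be on the PATH and built with regex support.

```shell
# Hypothetical input: "Due Date" splits into two words but must be ignored.
roles=$(jq -cS '
  { "Name": "name", "Email": "email_address" } as $key_map
  | to_entries
  | map(
      ( .key | match("^(.*) (Name|Email)$") | .captures | map(.string) ) as [ $role, $raw_key ]
      | [ $role, "role", $role ],
        [ $role, $key_map[$raw_key], .value ]
    )
  | reduce .[] as [ $role, $key, $val ] ( {}; .[$role][$key] = $val )
' <<'EOF'
{
  "Presenter Name": "Roney",
  "Presenter Email": "roney@example.com",
  "Due Date": "2022-02-14",
  "Invoice": "001"
}
EOF
)
echo "$roles"
# prints {"Presenter":{"email_address":"roney@example.com","name":"Roney","role":"Presenter"}}
```

Keys that do not end in " Name" or " Email" produce no match and are skipped, whereas the split-based test would accept any two-word key.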

jq ~ collapse specific single object arrays?

Following up on "jq ~ is there a better way to collapse single object arrays?" and "R: Nested data.table to JSON":
how do I collapse only specific elements?
I want to get rid of the "group" arrays in
[
{
"id2": "A",
"group": [
{
"data": [
{
"id1": 1,
"group": [
{
"data": [
{
"a": 1,
"b": 1
},
{
"a": 2,
"b": 2
}
],
"type": "test"
}
],
"type": "B"
}
],
"type": "C"
}
]
},
{
"id2": "C",
"group": [
{
"data": [
{
"id1": 3,
"group": [
{
"data": [
{
"a": 1,
"b": 1
}
],
"type": "test"
}
],
"type": "B"
}
],
"type": "C"
}
]
}
]
desired output
[{
"id2": "A",
"group": {
"data": [{
"id1": 1,
"group": {
"data": [{
"a": 1,
"b": 1
},
{
"a": 2,
"b": 2
}
],
"type": "test"
},
"type": "B"
}],
"type": "C"
}
},
{
"id2": "C",
"group": {
"data": [{
"id1": 3,
"group": {
"data": [{
"a": 1,
"b": 1
}],
"type": "test"
},
"type": "B"
}],
"type": "C"
}
}
]
The filter 'walk(if type=="array" and length==1 then .[0] else . end)' also collapses the single-element "data" arrays, which I want to keep.
Unfortunately, we are not able to install jq 1.6 on our RStudio Server, so I'm not able to use the walk function there (although it works perfectly fine on my local system).
Can anybody help me out with an alternative solution without walk? It would be highly appreciated.
edit
Ok I got it. I can manually add the walk function such as:
'def walk(f):
. as $in
| if type == "object" then
reduce keys_unsorted[] as $key
( {}; . + { ($key): ($in[$key] | walk(f)) } ) | f
elif type == "array" then map( walk(f) ) | f
else f
end; walk(if type=="object"
and has("group")
and (.group | type)=="array"
and (.group | length)==1
then .group = .group[0]
else . end)'
We could operate one level higher in the nesting hierarchy, and test for "group" being a key, then update accordingly .group = .group[0] instead of . = .[0]
jq 'walk(if type=="object"
and has("group")
and (.group | type)=="array"
and (.group | length)==1
then .group = .group[0]
else . end)'
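The difference between the blanket collapse and the key-specific one can be seen on a small made-up input. This is a sketch only: the input and the shell variables `doc`, `blanket` and `targeted` are illustrative, and it assumes a jq with the built-in `walk` (1.6 or later) on the PATH.

```shell
# Hypothetical input: a single-element "group" array wrapping
# a single-element "data" array.
doc='[{"id":"A","group":[{"data":[{"a":1}],"type":"C"}]}]'

# Blanket version: every single-element array is collapsed,
# including "data" and the top-level array.
blanket=$(printf '%s' "$doc" | jq -cS '
  walk(if type=="array" and length==1 then .[0] else . end)')

# Targeted version: only single-element arrays stored under "group" are collapsed.
targeted=$(printf '%s' "$doc" | jq -cS '
  walk(if type=="object"
          and has("group")
          and (.group | type)=="array"
          and (.group | length)==1
       then .group = .group[0]
       else . end)')

echo "$blanket"
# prints {"group":{"data":{"a":1},"type":"C"},"id":"A"}
echo "$targeted"
# prints [{"group":{"data":[{"a":1}],"type":"C"},"id":"A"}]
```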

Appending Geojson with json field using JQ

I have a project I'm working on that creates a choropleth map with all US county borders loaded from file1.json and filled with a color gradient based on values in file2.json. In previous iterations, I just entered values manually into file1.json, but now I want to expand my map and make it more user-friendly.
file1.json is structured like this:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"GEO_ID": "0500000US06001",
"STATE": "06",
"COUNTY": "001",
"NAME": "Alameda",
"LSAD": "County",
"CENSUSAREA": 739.017
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-122.30936,
37.77615
],
[
-122.317215,
37.778527
]
]
]
}
},
...
]
}
file2.json is structured like this:
[
{
"County": "Alameda",
"Count": 25
},
{
"County": "Amador",
"Count": 1
},
{
"County": "Butte",
"Count": 2
},
...
]
I want to create a new file that includes everything from file1.json, extended with the relevant Count field looked up by the County field.
The result would look like this:
[
{
"type": "Feature",
"properties": {
"GEO_ID": "0500000US06001",
"STATE": "06",
"COUNTY": "001",
"NAME": "Alameda",
"Count": "25",
"LSAD": "County",
"CENSUSAREA": 739.017
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-122.30936,
37.77615
],
[
-122.317215,
37.778527
]
]
]
}
},
...
]
I'm new to using jq, but I've played around with it enough to get it running in PowerShell.
Here is a test.jq file which may help
# utility to create lookup table from array of objects
# k is the name to use as the key
# f is a function to compute the value
#
def obj(k;f): reduce .[] as $o ({}; .[$o[k]] = ($o | f));
# create map from county to count
( $file2 | obj("County";.Count) ) as $count
# add .properties.Count to each feature
| .features |= map( .properties.Count = $count[.properties.NAME] )
Example use assuming suitable file1.json and file2.json:
$ jq -M --argfile file2 file2.json -f test.jq file1.json
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"GEO_ID": "0500000US06001",
"STATE": "06",
"COUNTY": "001",
"NAME": "Alameda",
"LSAD": "County",
"CENSUSAREA": 739.017,
"Count": 25
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-122.30936,
37.77615
],
[
-122.317215,
37.778527
]
]
]
}
}
]
}
I notice that "Count" is a string in your example output but it's a number in the sample file2. If you need to convert that to a string you can include a call to tostring. e.g.
.features |= map( .properties.Count = ( $count[.properties.NAME] | tostring ) )
or you could perform the conversion when the lookup table is created, e.g.
( $file2 | obj("County"; .Count | tostring ) ) as $count
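The jq manual advises using `--slurpfile` instead of `--argfile`, and as far as I know newer jq releases have dropped `--argfile` altogether. A minimal sketch of the same lookup with `--slurpfile` follows; the two tiny files, the temporary directory and the shell variable `merged` are made up for illustration. Note that `--slurpfile` binds the variable to an array of the file's JSON values, hence `$file2[0]`.

```shell
# Hypothetical cut-down versions of file1.json and file2.json.
tmp=$(mktemp -d)
cat > "$tmp/file1.json" <<'EOF'
{"type":"FeatureCollection","features":[
  {"type":"Feature","properties":{"NAME":"Alameda"}},
  {"type":"Feature","properties":{"NAME":"Yolo"}}]}
EOF
cat > "$tmp/file2.json" <<'EOF'
[{"County":"Alameda","Count":25}]
EOF

merged=$(jq -c --slurpfile file2 "$tmp/file2.json" '
  # build a lookup table from an array of objects
  def obj(k; f): reduce .[] as $o ({}; .[$o[k]] = ($o | f));
  ($file2[0] | obj("County"; .Count)) as $count
  | .features |= map(.properties.Count = $count[.properties.NAME])
' "$tmp/file1.json")
echo "$merged"
rm -r "$tmp"
```

In this sketch a county that is absent from file2.json (here "Yolo") gets `"Count": null`; append something like `// 0` to the lookup if a default is preferable.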

jq group by property in array

I have an input json document as so:
[
{
"Name": "one",
"Tags": [
{
"Key": "Name",
"Value": "important"
},
{
"Key": "OtherTag",
"Value": "irrelevant"
}
]
},
{
"Name": "two",
"Tags": [
{
"Key": "OtherTag",
"Value": "irrelevant2"
},
{
"Key": "Name",
"Value": "important"
}
]
},
{
"Name": "three",
"Tags": [
{
"Key": "Name",
"Value": "important2"
},
{
"Key": "OtherTag",
"Value": "irrelevant3"
}
]
}
]
I want to use jq to group the three records by tag Value where the Key = "Name". The result would be two arrays, one with two records in it and one with one. The array with two records would have two because both records share the same tag with a value of "important". Here is what the result would look like:
[
[
{
"Name": "one",
"Tags": [
{
"Key": "Name",
"Value": "important"
},
{
"Key": "OtherTag",
"Value": "irrelevant"
}
]
},
{
"Name": "two",
"Tags": [
{
"Key": "OtherTag",
"Value": "irrelevant2"
},
{
"Key": "Name",
"Value": "important"
}
]
}
],
[
{
"Name": "three",
"Tags": [
{
"Key": "Name",
"Value": "important2"
},
{
"Key": "OtherTag",
"Value": "irrelevant3"
}
]
}
]
]
I just can't figure out how to do this with jq. Does anyone have any ideas?
Your proposed solution is fine, but if you don't mind converting the arrays of Key-Value pairs into objects, then the following can be used:
map( .Tags |= from_entries ) | group_by(.Tags.Name)
This at least makes the "group_by" easy to understand; furthermore, it would be easy to convert the .Tags objects back to key-value pairs (with lower-case "key" and "value"):
map( .Tags |= from_entries ) | group_by(.Tags.Name)
| map(map( .Tags |= to_entries))
Key/Value capitalization
One way to recover the capitalized Key/Value tags would be to tweak the above as follows:
def KV: map( {Key: .key, Value: .value} );
map( .Tags |= from_entries ) | group_by(.Tags.Name)
| map(map( .Tags |= (to_entries | KV)))
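A quick way to check the round trip is to run it on a trimmed-down input. The sketch below uses made-up records and a hypothetical shell variable `groups`; it assumes `jq` is on the PATH and, like the filter above, relies on `from_entries` accepting the capitalized `Key`/`Value` field names.

```shell
# Hypothetical input: records "one" and "two" share the Name tag "important".
groups=$(jq -c '
  def KV: map( {Key: .key, Value: .value} );
  map( .Tags |= from_entries )
  | group_by(.Tags.Name)
  | map(map( .Tags |= (to_entries | KV) ))
' <<'EOF'
[
  {"Name": "one",   "Tags": [{"Key": "Name", "Value": "important"}]},
  {"Name": "three", "Tags": [{"Key": "Name", "Value": "important2"}]},
  {"Name": "two",   "Tags": [{"Key": "Name", "Value": "important"}]}
]
EOF
)
# Show which records landed in which group.
printf '%s' "$groups" | jq -c 'map(map(.Name))'
# prints [["one","two"],["three"]]
```

The records in each group keep their original order, and the tags come back in the original `Key`/`Value` form.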