This corresponds to jq: is there a better way to collapse single object arrays? and R: Nested data.table to JSON.
How do I collapse only specific elements? I want to get rid of the "group" arrays in:
[
{
"id2": "A",
"group": [
{
"data": [
{
"id1": 1,
"group": [
{
"data": [
{
"a": 1,
"b": 1
},
{
"a": 2,
"b": 2
}
],
"type": "test"
}
],
"type": "B"
}
],
"type": "C"
}
]
},
{
"id2": "C",
"group": [
{
"data": [
{
"id1": 3,
"group": [
{
"data": [
{
"a": 1,
"b": 1
}
],
"type": "test"
}
],
"type": "B"
}
],
"type": "C"
}
]
}
]
Desired output:
[{
"id2": "A",
"group": {
"data": [{
"id1": 1,
"group": {
"data": [{
"a": 1,
"b": 1
},
{
"a": 2,
"b": 2
}
],
"type": "test"
},
"type": "B"
}],
"type": "C"
}
},
{
"id2": "C",
"group": {
"data": [{
"id1": 3,
"group": {
"data": [{
"a": 1,
"b": 1
}],
"type": "test"
},
"type": "B"
}],
"type": "C"
}
}
]
The filter walk(if type=="array" and length==1 then .[0] else . end) additionally removes the array around the single "data" object, which I want to keep.
Unfortunately, we are not able to install jq 1.6 on our RStudio Server, and therefore I'm not able to use the walk function (although it works perfectly fine on my local system).
Can anybody help me out with an alternative solution without walk? It would be highly appreciated.
Edit
OK, I got it. I can manually add the walk function like so:
'def walk(f):
  . as $in
  | if type == "object" then
      reduce keys_unsorted[] as $key
        ({}; . + { ($key): ($in[$key] | walk(f)) }) | f
    elif type == "array" then map(walk(f)) | f
    else f
    end;
walk(if type == "object"
     and has("group")
     and (.group | type) == "array"
     and (.group | length) == 1
     then .group = .group[0]
     else . end)'
We could operate one level higher in the nesting hierarchy and test for "group" being a key, then update accordingly with .group = .group[0] instead of . = .[0]:
jq 'walk(if type=="object"
and has("group")
and (.group | type)=="array"
and (.group | length)==1
then .group = .group[0]
else . end)'
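If defining walk manually is also not an option, a purpose-built recursive filter can do the same job without it. Here is a sketch along the same lines (with_entries has been available since well before jq 1.5, so it should work on older installations):
def collapse_group:
  if type == "object" then
    # recurse into all values first, then collapse this level's "group"
    with_entries(.value |= collapse_group)
    | if has("group") and (.group | type) == "array" and (.group | length) == 1
      then .group = .group[0]
      else . end
  elif type == "array" then map(collapse_group)
  else . end;
collapse_group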
I have some huge JSON files I need to profile so I can transform them into some tables. I found jq to be really useful in inspecting them, but there are going to be hundreds of these, and I'm pretty new to jq.
I already have some really handy functions in my ~/.jq (big thank you to @mikehwang):
def profile_object:
  def parse_entry: {"key": .key, "value": (.value | type)};
  to_entries | map(parse_entry) | sort_by(.key) | from_entries;

def profile_array_objects:
  map(profile_object) | map(to_entries)
  | reduce .[] as $item ([]; . + $item)
  | sort_by(.key) | from_entries;
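For instance (a minimal check, relying on jq auto-loading defs from ~/.jq):
echo '{"b": "x", "a": 1, "c": [1, 2]}' | jq -c 'profile_object'
# => {"a":"number","b":"string","c":"array"}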
I'm sure I'll have to modify them after I describe my question.
I'd like a jq line to profile a single object. If a key maps to an array of objects, then collect the unique keys across those objects, and keep profiling downward if there are nested arrays of objects. If a value is an object, profile that object.
Sorry for the long example, but imagine several GBs of this:
{
"name": "XYZ Company",
"type": "Contractors",
"reporting": [
{
"group_id": "660",
"groups": [
{
"ids": [
987654321,
987654321,
987654321
],
"market": {
"name": "Austin, TX",
"value": "873275"
}
},
{
"ids": [
987654321,
987654321,
987654321
],
"market": {
"name": "Nashville, TN",
"value": "2393287"
}
}
]
}
],
"product_agreements": [
{
"negotiation_arrangement": "FFVII",
"code": "84144",
"type": "DJ",
"type_version": "V10",
"description": "DJ in a mask",
"name": "Claptone",
"negotiated_rates": [
{
"company_references": [
1,
5,
458
],
"negotiated_prices": [
{
"type": "negotiated",
"rate": 17.73,
"expiration_date": "9999-12-31",
"code": [
"11"
],
"billing_modifier_code": [
"124"
],
"billing_class": "professional"
}
]
},
{
"company_references": [
747
],
"negotiated_prices": [
{
"type": "fee",
"rate": 28.42,
"expiration_date": "9999-12-31",
"code": [
"11"
],
"billing_class": "professional"
}
]
}
]
},
{
"negotiation_arrangement": "MGS3",
"name": "David Byrne",
"type": "Producer",
"type_version": "V10",
"code": "654321",
"description": "Frontman from Talking Heads",
"negotiated_rates": [
{
"company_references": [
1,
9,
2344,
8456
],
"negotiated_prices": [
{
"type": "negotiated",
"rate": 68.73,
"expiration_date": "9999-12-31",
"code": [
"11"
],
"billing_class": "professional"
}
]
},
{
"company_references": [
679
],
"negotiated_prices": [
{
"type": "fee",
"rate": 89.25,
"expiration_date": "9999-12-31",
"code": [
"11"
],
"billing_class": "professional"
}
]
}
]
}
],
"version": "1.3.1",
"last_updated_on": "2023-02-01"
}
Desired output:
{
"name": "string",
"type": "string",
"reporting": [
{
"group_id": "number",
"groups": [
{
"ids": [
"number"
],
"market": {
"type": "string",
"value": "string"
}
}
]
}
],
"product_agreements": [
{
"negotiation_arrangement": "string",
"code": "string",
"type": "string",
"type_version": "string",
"description": "string",
"name": "string",
"negotiated_rates": [
{
"company_references": [
"number"
],
"negotiated_prices": [
{
"type": "string",
"rate": "number",
"expiration_date": "string",
"code": [
"string"
],
"billing_modifier_code": [
"string"
],
"billing_class": "string"
}
]
}
]
}
],
"version": "string",
"last_updated_on": "string"
}
Really sorry if there are any errors in that, but I tried to make it all consistent and about as simple as I could.
To restate the need: recursively profile each key in a JSON object if a value is an object or array. The solution needs to be key-name independent. Happy to clarify further if needed.
The jq module schema.jq at https://gist.github.com/pkoppstein/a5abb4ebef3b0f72a6ed was designed to produce the kind of structural schema you describe.
For very large inputs it might be very slow, so if the JSON is sufficiently regular, it might be possible to use a hybrid strategy: profile enough of the data to come up with a comprehensive structural schema, and then check that it does apply.
For conformance testing of structural schemas such as produced by schema.jq, see https://github.com/pkoppstein/JESS
Given your input.json, here is a solution:
jq '
def schema:
if type == "object" then .[] |= schema
elif type == "array" then map(schema)|unique
| if (first | type) == "object" then [add] else . end
else type
end;
schema
' input.json
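As a quick sanity check of the filter above on a small, made-up document:
echo '{"a": [1, 2], "b": {"c": "x"}, "d": [{"e": true}]}' | jq -c '
  def schema:
    if type == "object" then .[] |= schema
    elif type == "array" then map(schema)|unique
    | if (first | type) == "object" then [add] else . end
    else type
    end;
  schema'
# => {"a":["number"],"b":{"c":"string"},"d":[{"e":"boolean"}]}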
Here's a variant of @Philippe's solution: it coalesces objects in map(schema) for arrays in a principled though lossy way. (All these half-solutions trade precision for speed.)
Note that keys_unsorted is used below; if using gojq, then either this would have to be changed to keys, or a def of keys_unsorted provided.
# Use "JSON" as the union of two distinct types
# except combine([]; [ $x ]) => [ $x ]
def combine($a;$b):
if $a == $b then $a elif $a == null then $b elif $b == null then $a
elif ($a == []) and ($b|type) == "array" then $b
elif ($b == []) and ($a|type) == "array" then $a
else "JSON"
end;
# Profile an array by calling mergeTypes(.[] | schema)
# in order to coalesce objects
def mergeTypes(s):
reduce s as $t (null;
if ($t|type) != "object" then .types = (.types + [$t] | unique)
else .object as $o
| .object = reduce ($t | keys_unsorted[]) as $k ($o;
.[$k] = combine( $t[$k]; $o[$k] )
)
end)
| (if .object then [.object] else null end ) + .types ;
def schema:
if type == "object" then .[] |= schema
elif type == "array"
then if . == [] then [] else mergeTypes(.[] | schema) end
else type
end;
schema
Example:
Input:
{"a": [{"b":[1]}, {"c":[2]}, {"c": []}] }
Output:
{
"a": [
{
"b": [
"number"
],
"c": [
"number"
]
}
]
}
Given the input JSON:
[
{
"name": "foo",
"value": 1
},
{
"name": "bar",
"value": 1
},
{
"name": "foo",
"value": 2
}
]
I'm trying to get the dicts with the name foo, so the expected output is:
{
"name": "foo",
"value": 1
},
{
"name": "foo",
"value": 2
}
Try this:
jq '.[] | select(.name == "foo")'
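This prints the matching objects as a stream, as in your expected output. If you'd rather collect them into a single JSON array, wrap the filter in map (input.json being a hypothetical file name here):
jq 'map(select(.name == "foo"))' input.json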
How can I manipulate this chunk of json:
{
"id": "whatever",
"attributes": [
{
"key": "this",
"value": "A"
},
{
"key": "that",
"value": "B"
},
{
"key": "other",
"value": "C"
}
]
}
so that it matches on "that" and removes both the key and the value in that object, leaving JSON like this:
{
"id": "whatever",
"attributes": [
{
"key": "this",
"value": "A"
},
{
"key": "other",
"value": "C"
}
]
}
I am attempting to use jq on Linux.
Try this:
.attributes |= map(select(.key != "that"))
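As a full command line (assuming the file is test.json):
jq '.attributes |= map(select(.key != "that"))' test.json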
Figured it out.
jq 'del(.attributes[] | select(.key == "that"))' test.json | sponge test.json
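Note that sponge comes from the moreutils package; where it isn't installed, a temporary file works just as well:
jq 'del(.attributes[] | select(.key == "that"))' test.json > test.json.tmp && mv test.json.tmp test.json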
I have been playing around with jq to format a JSON file, but I am having some issues trying to solve a particular transformation. Given a test.json file in this format:
[
{
"name": "A", // This would be the first key
"number": 1,
"type": "apple",
"city": "NYC" // This would be the second key
},
{
"name": "A",
"number": "5",
"type": "apple",
"city": "LA"
},
{
"name": "A",
"number": 2,
"type": "apple",
"city": "NYC"
},
{
"name": "B",
"number": 3,
"type": "apple",
"city": "NYC"
}
]
I was wondering, how can I format it this way using jq?
[
{
"key": "A",
"values": [
{
"key": "NYC",
"values": [
{
"number": 1,
"type": "a"
},
{
"number": 2,
"type": "b"
}
]
},
{
"key": "LA",
"values": [
{
"number": 5,
"type": "b"
}
]
}
]
},
{
"key": "B",
"values": [
{
"key": "NYC",
"values": [
{
"number": 3,
"type": "apple"
}
]
}
]
}
]
I have followed this thread, Using jq, convert array of name/value pairs to object with named keys, and tried to group the JSON using this expression:
jq '. | group_by(.name) | group_by(.city) ' ./test.json
but I have not been able to add the keys in the output.
You'll want to group the items at the different levels, building out your result objects as you go:
group_by(.name) | map({
key: .[0].name,
values: (group_by(.city) | map({
key: .[0].city,
values: map({number,type})
}))
})
Just keep in mind that group_by/1 yields groups in sorted order. If you need to preserve the original input order, you'll want an implementation like this:
def group_by_unsorted(key_selector):
reduce .[] as $i ({};
.["\($i|key_selector)"] += [$i]
)|[.[]];
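Plugged into the filter above (a sketch, assuming the def is in scope), the order-preserving version looks like this:
group_by_unsorted(.name) | map({
  key: .[0].name,
  values: (group_by_unsorted(.city) | map({
    key: .[0].city,
    values: map({number,type})
  }))
})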
I have the two following json as input:
{
"one": {
"vars": [
{
"name": "a",
"value": "a"
},
{
"name": "b",
"value": "b"
}
]
},
"two": {
"vars": [
{
"name": "c",
"value": "c"
},
{
"name": "d",
"value": "d"
}
]
},
"extras": "whatever"
}
{
"one": {
"vars": [
{
"name": "e",
"value": "e"
},
{
"name": "f",
"value": "f"
}
]
},
"two": {
"vars": [
{
"name": "g",
"value": "g"
},
{
"name": "h",
"value": "h"
}
]
}
}
And I'd like to merge them in order to obtain the following result, where the vars arrays of each section are merged together:
{
"one": {
"vars": [
{
"name": "a",
"value": "a"
},
{
"name": "b",
"value": "b"
},
{
"name": "e",
"value": "e"
},
{
"name": "f",
"value": "f"
}
]
},
"two": {
"vars": [
{
"name": "c",
"value": "c"
},
{
"name": "d",
"value": "d"
},
{
"name": "g",
"value": "g"
},
{
"name": "h",
"value": "h"
}
]
},
"extras": "whatever"
}
Ideally (but not mandatory):
the keys (here one and two) would be arbitrary, and an undefined number of them could be present.
the vars array would not contain duplicates (based on name), and right precedence would be applied to override values from the first array.
I managed to merge the two objects and only one array with the following command, but the key is hardcoded and I'm a bit stuck from there:
jq -s '.[0].one.vars=([.[].one.vars]|flatten)|.[0]' file1.json file2.json
First, here is a solution which is oblivious to the top-level key names, but which does not attempt to avoid duplicates:
$A
| reduce keys_unsorted[] as $k (.;
if .[$k] | (type == "object") and has("vars")
then (.[$k]|.vars) += ($B[$k]|.vars) else . end )
Here, of course, $A and $B refer to the two objects. You can set $A and $B in several ways; for example:
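One option is a sketch that binds each file via input, so nothing needs to be passed on the command line:
jq -n '
  input as $A | input as $B
  | $A
  | reduce keys_unsorted[] as $k (.;
      if .[$k] | (type == "object") and has("vars")
      then .[$k].vars += $B[$k].vars
      else . end )
' file1.json file2.json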
If you want to reorder the top-level keys, you can simply extend the above with a filter specifying the order, e.g.: {extras, two, one}.
To avoid duplicates, I'd suggest writing a helper function to do just that, as illustrated in the following section.
Avoiding duplicates
def extend(stream):
reduce stream as $s (.;
(map(.name) | index($s|.name)) as $i
| if $i then .[$i] += $s
else . + [$s]
end) ;
$A
| reduce keys_unsorted[] as $k (.;
if .[$k] | (type == "object") and has("vars")
then (.[$k].vars) = ( .[$k].vars | extend(($B[$k].vars[])))
else . end
)
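As a quick check of extend's semantics (note how += on two objects lets the right-hand object win, giving the desired right precedence):
jq -nc 'def extend(stream):
  reduce stream as $s (.;
    (map(.name) | index($s|.name)) as $i
    | if $i then .[$i] += $s else . + [$s] end);
  [{"name":"a","value":1}] | extend({"name":"a","value":2}, {"name":"b","value":3})'
# => [{"name":"a","value":2},{"name":"b","value":3}]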
jq -n 'input as $b | input
| .one.vars |= . + $b.one.vars
| .two.vars |= . + $b.two.vars' file2.json file1.json
file1.json must come after file2.json in order to preserve extras.
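And if the top-level keys should stay arbitrary, as in your "ideally" list, the same input-based trick can be combined with the reduce approach from the first answer (a sketch, without deduplication):
jq -n 'input as $b | input
  | reduce ($b | keys_unsorted[]) as $k (.;
      if (.[$k] | type) == "object" and (.[$k] | has("vars"))
      then .[$k].vars += $b[$k].vars
      else . end)' file2.json file1.json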