As a follow-up to my question that was answered so well --
Given this sample input --
[
{
"x1": "abc123",
"x2": "Larry",
"attr": {
"f1": "one",
"f2": ["two", "2"],
"f3": "three",
"x1": "Never included in below examples as this is .attr.x1, not .x1"
}
},
{
"x1": "xyz789",
"x2": "Curly",
"attr": {
"f1": ["one", "111"],
"f2": "two"
}
},
{
"x1": "def456",
"x2": "moe",
"attr": {
"f4": {
"a": 1,
"b": "two"
}
}
}
]
I need a jq program that will allow me to pass the paths I want in the output document as --args on the command line. All non-target paths are removed. Examples --
jq -f program.jq test.json --args x1 attr.f1 attr.f3 would produce ---
[
{
"x1": "abc123",
"attr": {
"f1": "one",
"f3": "three"
}
},
{
"x1": "xyz789",
"attr": {
"f1": ["one", "111"],
"f3": null
}
},
{
"x1": "def456",
"attr": {
"f1": null,
"f3": null
}
}
]
whereas jq -f program.jq test.json --args x1 x2 attr.f2 attr.f4 would produce --
[
{
"x1": "abc123",
"x2": "Larry",
"attr": {
"f2": ["two", "2"],
"f4": null
}
},
{
"x1": "xyz789",
"x2": "Curly",
"attr": {
"f2": "two",
"f4": null
}
},
{
"x1": "def456",
"x2": "moe",
"attr": {
"f2": null,
"f4": {
"a": 1,
"b": "two"
}
}
}
]
Notes:
I'm not particularly concerned about the syntax. If --jsonargs, a different way to specify the path, piping multiple commands together, or really any other syntax is more appropriate, that's fine.
Key names can be duplicated at different levels of the hierarchy, but only the path that matches exactly should be included. .attr.x1 is always ignored in the examples provided; in those examples, I want .x1, not .attr.x1.
This example is fairly trivial; a real input document may have up to ~5 levels of hierarchy, and I need to retrieve key/value pairs from any depth (i.e. .attr.level2.level3.level4.mykey).
In practice, the input might range from a small document with a dozen entries and < 20 keys to a reasonably large one with 100k entries and ~100 keys. Performance is important, but not critical.
EDIT - I also wouldn't be averse to using some type of templating (Jinja2, etc.) to generate the jq program at runtime. Not ideal, but perfectly fine if that's the answer.
Any ideas? I've hacked around with map, with_entries and many schemes and can't find the right syntax given the need to pass the "target" path as args.
Thanks!
Read or convert the path arguments as path arrays, and use getpath and setpath to filter and build your objects:
jq '
map(. as $item | reduce ($ARGS.positional[] / ".") as $p ({};
setpath($p; $item | getpath($p))
))
' file.json --args x1 attr.f1 attr.f3
[
{
"x1": "abc123",
"attr": {
"f1": "one",
"f3": "three"
}
},
{
"x1": "xyz789",
"attr": {
"f1": [
"one",
"111"
],
"f3": null
}
},
{
"x1": "def456",
"attr": {
"f1": null,
"f3": null
}
}
]
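As a cross-check outside jq, here is a minimal Python sketch of the same getpath/setpath idea. The helper names get_path, set_path and project are illustrative and not part of jq. One assumed difference: this sketch returns None when a path runs into a non-object value, whereas jq's getpath raises an error in that case.

```python
from typing import Any, Dict, List


def get_path(doc: Any, path: List[str]) -> Any:
    """Follow path through nested dicts; return None if a step is missing."""
    for key in path:
        if not isinstance(doc, dict):
            return None
        doc = doc.get(key)
    return doc


def set_path(doc: Dict[str, Any], path: List[str], value: Any) -> None:
    """Store value at path, creating intermediate dicts as needed."""
    node = doc
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value


def project(items: List[Dict[str, Any]], dotted: List[str]) -> List[Dict[str, Any]]:
    """Keep only the requested dotted paths in every item."""
    paths = [p.split(".") for p in dotted]  # "attr.f1" -> ["attr", "f1"]
    result = []
    for item in items:
        out: Dict[str, Any] = {}
        for path in paths:
            set_path(out, path, get_path(item, path))
        result.append(out)
    return result
```

Because each path is matched from the root, .attr.x1 is never picked up when only x1 is requested, which mirrors the behavior of the jq program above.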
jq '
map(. as $item | reduce ($ARGS.positional[] / ".") as $p ({};
setpath($p; $item | getpath($p))
))
' file.json --args x1 x2 attr.f2 attr.f4
[
{
"x1": "abc123",
"x2": "Larry",
"attr": {
"f2": [
"two",
"2"
],
"f4": null
}
},
{
"x1": "xyz789",
"x2": "Curly",
"attr": {
"f2": "two",
"f4": null
}
},
{
"x1": "def456",
"x2": "moe",
"attr": {
"f2": null,
"f4": {
"a": 1,
"b": "two"
}
}
}
]
I have some huge JSON files I need to profile so I can transform them into some tables. I found jq to be really useful in inspecting them, but there are going to be hundreds of these, and I'm pretty new to jq.
I already have some really handy functions in my ~/.jq (big thank you to @mikehwang)
def profile_object:
to_entries | def parse_entry: {"key": .key, "value": .value | type}; map(parse_entry)
| sort_by(.key) | from_entries;
def profile_array_objects:
map(profile_object) | map(to_entries) | reduce .[] as $item ([]; . + $item) | sort_by(.key) | from_entries;
I'm sure I'll have to modify them after I describe my question.
I'd like a jq line to profile a single object. If a key maps to an array of objects then collect the unique keys across the objects and keep profiling down if there are nested arrays of objects there. If a value is an object, profile that object.
Sorry for the long example, but imagine several GBs of this:
{
"name": "XYZ Company",
"type": "Contractors",
"reporting": [
{
"group_id": "660",
"groups": [
{
"ids": [
987654321,
987654321,
987654321
],
"market": {
"name": "Austin, TX",
"value": "873275"
}
},
{
"ids": [
987654321,
987654321,
987654321
],
"market": {
"name": "Nashville, TN",
"value": "2393287"
}
}
]
}
],
"product_agreements": [
{
"negotiation_arrangement": "FFVII",
"code": "84144",
"type": "DJ",
"type_version": "V10",
"description": "DJ in a mask",
"name": "Claptone",
"negotiated_rates": [
{
"company_references": [
1,
5,
458
],
"negotiated_prices": [
{
"type": "negotiated",
"rate": 17.73,
"expiration_date": "9999-12-31",
"code": [
"11"
],
"billing_modifier_code": [
"124"
],
"billing_class": "professional"
}
]
},
{
"company_references": [
747
],
"negotiated_prices": [
{
"type": "fee",
"rate": 28.42,
"expiration_date": "9999-12-31",
"code": [
"11"
],
"billing_class": "professional"
}
]
}
]
},
{
"negotiation_arrangement": "MGS3",
"name": "David Byrne",
"type": "Producer",
"type_version": "V10",
"code": "654321",
"description": "Frontman from Talking Heads",
"negotiated_rates": [
{
"company_references": [
1,
9,
2344,
8456
],
"negotiated_prices": [
{
"type": "negotiated",
"rate": 68.73,
"expiration_date": "9999-12-31",
"code": [
"11"
],
"billing_class": "professional"
}
]
},
{
"company_references": [
679
],
"negotiated_prices": [
{
"type": "fee",
"rate": 89.25,
"expiration_date": "9999-12-31",
"code": [
"11"
],
"billing_class": "professional"
}
]
}
]
}
],
"version": "1.3.1",
"last_updated_on": "2023-02-01"
}
Desired output:
{
"name": "string",
"type": "string",
"reporting": [
{
"group_id": "string",
"groups": [
{
"ids": [
"number"
],
"market": {
"name": "string",
"value": "string"
}
}
]
}
],
"product_agreements": [
{
"negotiation_arrangement": "string",
"code": "string",
"type": "string",
"type_version": "string",
"description": "string",
"name": "string",
"negotiated_rates": [
{
"company_references": [
"number"
],
"negotiated_prices": [
{
"type": "string",
"rate": "number",
"expiration_date": "string",
"code": [
"string"
],
"billing_modifier_code": [
"string"
],
"billing_class": "string"
}
]
}
]
}
],
"version": "string",
"last_updated_on": "string"
}
Really sorry if there are any errors in that, but I tried to make it all consistent and about as simple as I could.
To restate the need: recursively profile each key in a JSON object whenever its value is an object or an array. The solution needs to be independent of key names. Happy to clarify further if needed.
The jq module schema.jq at https://gist.github.com/pkoppstein/a5abb4ebef3b0f72a6ed was designed to produce the kind of structural schema you describe.
For very large inputs, it might be very slow, so if the JSON is sufficiently regular, it might be possible to use a hybrid strategy - profiling enough of the data to come up with a comprehensive structural schema, and then checking that it does apply.
For conformance testing of structural schemas such as produced by schema.jq, see https://github.com/pkoppstein/JESS
Given your input.json, here is a solution:
jq '
def schema:
if type == "object" then .[] |= schema
elif type == "array" then map(schema)|unique
| if (first | type) == "object" then [add] else . end
else type
end;
schema
' input.json
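To make the recursion easier to follow, here is a rough Python sketch of the same profiling logic. It is an approximation rather than a port: jq's unique also sorts, and because objects sort last, the jq version only merges when every element schema is an object. The sketch assumes the same condition and keeps first-seen order otherwise. The names jq_type and schema are illustrative.

```python
def jq_type(value):
    """Return the jq type name for a decoded JSON value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def schema(value):
    """Replace every leaf with its type name, collapsing array elements."""
    if isinstance(value, dict):
        return {key: schema(item) for key, item in value.items()}
    if isinstance(value, list):
        distinct = []
        for element_schema in map(schema, value):
            if element_schema not in distinct:
                distinct.append(element_schema)
        if distinct and all(isinstance(s, dict) for s in distinct):
            merged = {}
            for s in distinct:
                merged.update(s)  # later objects win on key clashes, like jq's add
            return [merged]
        return distinct
    return jq_type(value)
```

Note that merging with update (or add in jq) is lossy: if the same key has different shapes in two array elements, only the last one survives.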
Here's a variant of @Philippe's solution: it coalesces objects in map(schema) for arrays in a principled though lossy way. (All these half-solutions trade precision for speed.)
Note that keys_unsorted is used below; if using gojq, then either this would have to be changed to keys, or a def of keys_unsorted provided.
# Use "JSON" as the union of two distinct types
# except combine([]; [ $x ]) => [ $x ]
def combine($a;$b):
if $a == $b then $a elif $a == null then $b elif $b == null then $a
elif ($a == []) and ($b|type) == "array" then $b
elif ($b == []) and ($a|type) == "array" then $a
else "JSON"
end;
# Profile an array by calling mergeTypes(.[] | schema)
# in order to coalesce objects
def mergeTypes(s):
reduce s as $t (null;
if ($t|type) != "object" then .types = (.types + [$t] | unique)
else .object as $o
| .object = reduce ($t | keys_unsorted[]) as $k ($o;
.[$k] = combine( $t[$k]; $o[$k] )
)
end)
| (if .object then [.object] else null end ) + .types ;
def schema:
if type == "object" then .[] |= schema
elif type == "array"
then if . == [] then [] else mergeTypes(.[] | schema) end
else type
end;
schema
Example:
Input:
{"a": [{"b":[1]}, {"c":[2]}, {"c": []}] }
Output:
{
"a": [
{
"b": [
"number"
],
"c": [
"number"
]
}
]
}
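The coalescing rule can also be sketched in Python, which may help when checking expected output by hand. This is a hedged approximation of the jq definitions above: combine and merge_types mirror combine and mergeTypes, but the non-object types are kept in first-seen order here, whereas jq's unique sorts them.

```python
def jq_type(value):
    """Return the jq type name for a decoded JSON scalar."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


def combine(a, b):
    """Union of two schemas; "JSON" stands for 'anything' when they disagree."""
    if a == b:
        return a
    if a is None:
        return b
    if b is None:
        return a
    if a == [] and isinstance(b, list):
        return b
    if b == [] and isinstance(a, list):
        return a
    return "JSON"


def merge_types(schemas):
    """Coalesce all object schemas into one object; collect other types once."""
    merged_object = None
    other_types = []
    for s in schemas:
        if isinstance(s, dict):
            if merged_object is None:
                merged_object = {}
            for key, item in s.items():
                merged_object[key] = combine(item, merged_object.get(key))
        elif s not in other_types:
            other_types.append(s)
    return ([merged_object] if merged_object is not None else []) + other_types


def schema(value):
    if isinstance(value, dict):
        return {key: schema(item) for key, item in value.items()}
    if isinstance(value, list):
        return merge_types(schema(item) for item in value)
    return jq_type(value)
```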
I have the below JSON structure :
{
"data": {
"a1": "value1",
"a2": "value1",
"a3": "value1",
"collection": [
{
"events": [
{
"x1": 123,
"x2": "NA",
"x3": 5678
},
{
"x1": 432,
"x2": 854,
"x3": 912
}
]
}
]
}
}
I would like to remove the field x2 whenever its value is "NA" or "NV", using Jolt.
Desired output :
{
"data": {
"a1": "value1",
"a2": "value1",
"a3": "value1",
"collection": [
{
"events": [
{
"x1": 123,
"x3": 5678
},
{
"x1": 432,
"x2": 854,
"x3": 912
}
]
}
]
}
}
You can use the following shift transformation spec with conditional logic:
[
{
"operation": "shift",
"spec": {
"data": {
"*": "&1.&",
"collection": {
"*": {
"events": {
"*": {
"x2": {
"NA|NV": { // | is OR operator
"*": ""
},
"*": {
"@1": "&7.&6[&5].&4[&3].&2" // &7 is the "data" level, &6 the "collection" level, [&5] its index, &4 the "events" level, and so on
}
},
"*": "&5.&4[&3].&2[&1].&" // two levels shallower than the x2 value match above, so each level number is two less
}
}
}
}
}
}
}
]
You can try the spec on the demo site http://jolt-demo.appspot.com/.
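To verify the result of the spec independently of Jolt, the intended transformation can be written as a few lines of Python. This is only a reference sketch of the desired behavior, not a Jolt feature. The function name drop_placeholder_x2 and the placeholders parameter are made up for illustration.

```python
import copy


def drop_placeholder_x2(doc, placeholders=("NA", "NV")):
    """Return a copy of doc with x2 removed from events where it is a placeholder."""
    result = copy.deepcopy(doc)  # leave the caller's document untouched
    for group in result.get("data", {}).get("collection", []):
        for event in group.get("events", []):
            if event.get("x2") in placeholders:
                del event["x2"]
    return result
```

Comparing this function's output with the output of the Jolt spec on the same input is a quick way to confirm that numeric x2 values such as 854 pass through unchanged.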
You can see a demo of the problem in the following jq play: https://jqplay.org/s/Lx7eM2akzp
Given the following object, which contains an inner array:
{
"t": "0",
"d": "12090",
"w": "1",
"s": [
{
"ac": "252",
"$t": "pastas"
},
{
"t": "1280",
"ac": "226",
"$t": "299"
},
{
"t": "2780",
"ac": "252",
"$t": "187"
}
]
}
How can I flatten the inner array such that I can run queries similar to
jq '{ "absolute": .t, "word": .s[]."$t", "relative": .s[].t, }'
so that I get results such as:
{
"absolute": "0",
"word": "pastas",
"relative": null
}
{
"absolute": "0",
"word": "299",
"relative": "1280"
}
{
"absolute": "0",
"word": "187",
"relative": "2780"
}
instead of every combination of the inner properties?
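Each .s[] inside the object constructor is an independent iterator, which is why you get every combination. Here there is really one iterator, .s[], so iterate once and bind each element to a variable: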
.s[] as $s
| { "absolute": .t, "word": $s."$t", "relative": $s.t }
Or, if you want to be a little DRYer:
{"absolute": .t} + (.s[] | {"word": ."$t", "relative": .t})
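The difference between iterating once and iterating twice can be illustrated with a small Python sketch. The function names one_per_element and all_combinations are hypothetical. The second function only mimics the number of results produced by two independent .s[] iterators and makes no claim about the order in which jq emits them.

```python
from itertools import product


def one_per_element(doc):
    """One output per element of doc["s"], like binding .s[] to $s once."""
    return [
        {"absolute": doc["t"], "word": s.get("$t"), "relative": s.get("t")}
        for s in doc["s"]
    ]


def all_combinations(doc):
    """Two independent iterations over doc["s"]: a Cartesian product."""
    return [
        {"absolute": doc["t"], "word": w.get("$t"), "relative": r.get("t")}
        for w, r in product(doc["s"], repeat=2)
    ]
```

With three elements in s, the first function yields three objects and the second yields nine, which is the unwanted behavior described in the question.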
I have the following two JSON documents as input:
{
"one": {
"vars": [
{
"name": "a",
"value": "a"
},
{
"name": "b",
"value": "b"
}
]
},
"two": {
"vars": [
{
"name": "c",
"value": "c"
},
{
"name": "d",
"value": "d"
}
]
},
"extras": "whatever"
}
{
"one": {
"vars": [
{
"name": "e",
"value": "e"
},
{
"name": "f",
"value": "f"
}
]
},
"two": {
"vars": [
{
"name": "g",
"value": "g"
},
{
"name": "h",
"value": "h"
}
]
}
}
And I'd like to merge them in order to obtain the following result where each of the vars array of each section are merged together:
{
"one": {
"vars": [
{
"name": "a",
"value": "a"
},
{
"name": "b",
"value": "b"
},
{
"name": "e",
"value": "e"
},
{
"name": "f",
"value": "f"
}
]
},
"two": {
"vars": [
{
"name": "c",
"value": "c"
},
{
"name": "d",
"value": "d"
},
{
"name": "g",
"value": "g"
},
{
"name": "h",
"value": "h"
}
]
},
"extras": "whatever"
}
Ideally but not mandatory:
the keys (here one and two) would be arbitrary and an undefined number of them could be present.
the vars array would not contain duplicates (based on name), and right-hand precedence would be applied to override values from the first array.
I managed to merge the two objects and only 1 array with the following command but the key is hardcoded and I'm a bit stuck from there:
jq -s '.[0].one.vars=([.[].one.vars]|flatten)|.[0]' file1.json file2.json
First, here is a solution which is oblivious to the top-level key names, but which does not attempt to avoid duplicates:
$A
| reduce keys_unsorted[] as $k (.;
if .[$k] | (type == "object") and has("vars")
then (.[$k]|.vars) += ($B[$k]|.vars) else . end )
Here of course $A and $B refer to the two objects. You can set $A and $B in several ways.
If you want to reorder the top-level keys, you can simply extend the above with a filter specifying the order, e.g.: {extras, two, one}.
To avoid duplicates, I'd suggest writing a helper function to do just that, as illustrated in the following section.
Avoiding duplicates
def extend(stream):
reduce stream as $s (.;
(map(.name) | index($s|.name)) as $i
| if $i then .[$i] += $s
else . + [$s]
end) ;
$A
| reduce keys_unsorted[] as $k (.;
if .[$k] | (type == "object") and has("vars")
then (.[$k].vars) = ( .[$k].vars | extend(($B[$k].vars[])))
else . end
)
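For comparison, the same merge with name-based de-duplication can be sketched in Python. The names extend and merge_sections are illustrative. One assumed difference from the jq program: if a section is missing from the second document, this sketch treats it as having no vars instead of raising an error.

```python
import copy


def extend(existing, incoming):
    """Append incoming entries; an entry with a known name overrides the old one."""
    result = [dict(entry) for entry in existing]
    for new_entry in incoming:
        for entry in result:
            if entry.get("name") == new_entry.get("name"):
                entry.update(new_entry)  # right-hand values take precedence
                break
        else:
            result.append(dict(new_entry))
    return result


def merge_sections(a, b):
    """Merge the vars arrays of every top-level section of a with those of b."""
    merged = copy.deepcopy(a)
    for key, section in merged.items():
        if isinstance(section, dict) and "vars" in section:
            incoming = b.get(key, {}).get("vars", [])
            section["vars"] = extend(section["vars"], incoming)
    return merged
```

Keys of a that are not sections, such as extras, are copied through untouched because the loop only rewrites objects that have a vars key.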
jq -n 'input as $b | input
| .one.vars |= . + $b.one.vars
| .two.vars |= . + $b.two.vars' file2.json file1.json
file1.json must come after file2.json in order to preserve extras.
I'm currently working through an issue, and can't seem to figure this one out. Here's some data so you know what I'm talking about below:
foo.json
{
"Schedule": [
{
"deviceId": 123,
"reservationId": 123456,
"username": "jdoe"
},
{
"deviceId": 456,
"reservationId": 589114,
"username": "jsmith"
}
],
"serverTime": 1522863125.019958
}
bar.json
[
{
"a": {
"b": "10.0.0.1",
"c": "hostname1"
},
"deviceId": 123
},
{
"a": {
"b": "10.0.0.2",
"c": "hostname2"
},
"deviceId": 456
}
]
foobar.json (the desired result)
{
"Schedule": [
{
"deviceId": 123,
"reservationId": 123456,
"username": "jdoe",
"a": {
"b": "10.0.0.1",
"c": "hostname1"
}
},
{
"deviceId": 456,
"reservationId": 589114,
"username": "jsmith",
"a": {
"b": "10.0.0.2",
"c": "hostname2"
}
}
],
"serverTime": 1522863125.019958
}
I'm trying to use jq to do this, and had some help from this post: https://github.com/stedolan/jq/issues/1090
The goal is to be able to combine JSON, using some key as a common point between the documents. The data may be nested any number of levels. In this case foo.json has nested data only two levels deep, but needs to be combined with data nested one level deep.
Any and all suggestions would be super helpful. I'm also happy to clarify and answer questions if needed. Thank you!
With foobar.jq as follows:
def dict(f):
reduce .[] as $o ({}; .[$o | f | tostring] = $o ) ;
($bar | dict(.deviceId)) as $dict
| .Schedule |= map(. + ($dict[.deviceId|tostring] ))
the invocation:
jq -f foobar.jq --argfile bar bar.json foo.json
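yields the output shown below. (The jq 1.6 manual advises using --slurpfile instead of --argfile; with --slurpfile, $bar is an array of the file's JSON texts, so the program would refer to $bar[0].)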
Notice that the referents in the dictionary contain the full object (including the key/value pair for "deviceId"), but it's not necessary to del(.deviceId) because of the way + is defined in jq.
Output
{
"Schedule": [
{
"deviceId": 123,
"reservationId": 123456,
"username": "jdoe",
"a": {
"b": "10.0.0.1",
"c": "hostname1"
}
},
{
"deviceId": 456,
"reservationId": 589114,
"username": "jsmith",
"a": {
"b": "10.0.0.2",
"c": "hostname2"
}
}
],
"serverTime": 1522863125.019958
}
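The dictionary-based join can be checked against a small Python sketch of the same idea. The function name join_on and its key parameter are hypothetical. As in the jq program, the lookup key is converted to a string, and entries without a match are left unchanged.

```python
def join_on(schedule_doc, devices, key="deviceId"):
    """Merge each Schedule entry with the device record that shares its key."""
    # Build the lookup once, so each Schedule entry costs a single dict access.
    lookup = {str(device[key]): device for device in devices}
    result = dict(schedule_doc)
    result["Schedule"] = [
        {**entry, **lookup.get(str(entry[key]), {})}
        for entry in schedule_doc["Schedule"]
    ]
    return result
```

Merging the whole device record is harmless here for the same reason as in jq: the duplicated deviceId simply overwrites itself with an equal value.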