Getting only desired properties from nested array values with jq - json

The structure I ultimately want would be:
{
"catalog": [
{
"name": "X",
"catalog": [
{ "name": "Y", "uniqueId": "Z" },
{ "name": "Q", "uniqueId": "B" }
]
}
]
}
This is what the existing structure looks like except there are many other properties at each level (https://gist.github.com/ajcrites/e0e0ca4ca3a08ff2dc401ec872e6094c). I just want to filter those out and get a JSON format that looks specifically like this.
I have started out with: jq '.catalog', but this returns only the array. I still want the catalog property name there. I can do this with jq '{catalog: .catalog[]}', but this prints out each catalog object individually, which makes the whole output invalid JSON. I still want the properties to be in the array. Is there a way to filter specific property key-values within arrays using jq?

The following transforms the given input to the desired output and may well be what you want:
{catalog}
| .catalog |= map( {name, catalog} )
| .catalog[].catalog |= map( {name, uniqueId} )
| .catalog |= .[0:1]
However, it's not clear to me that this is really what you want, as you don't discuss the duplication in the given JSON input. So maybe you don't really want the last line in the above, or maybe you want duplicates to be handled in some other way, or ....
Anyway, the trick to keeping things simple here is to use |=.
An alternative approach would be to use del to delete the unwanted properties (rather than selecting the ones you want), but in the present case, that would be (at best) tedious.
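For readers who want to cross-check what the filter above selects, the same projection can be sketched in Python. This is only an illustration: the function name project and the sample data are hypothetical, and the sketch deliberately leaves out the final .[0:1] step that trims duplicates.

```python
def project(doc):
    """Keep name/catalog on outer entries and name/uniqueId on inner entries."""
    return {
        "catalog": [
            {
                "name": outer.get("name"),
                "catalog": [
                    {"name": inner.get("name"), "uniqueId": inner.get("uniqueId")}
                    for inner in outer.get("catalog", [])
                ],
            }
            for outer in doc.get("catalog", [])
        ]
    }


# Hypothetical input with extra properties at every level.
sample = {
    "other": 1,
    "catalog": [
        {
            "name": "X",
            "prop1": "",
            "catalog": [
                {"name": "Y", "uniqueId": "Z", "prop11": ""},
                {"name": "Q", "uniqueId": "B", "prop11": ""},
            ],
        }
    ],
}

projected = project(sample)
```

As with jq's {name, uniqueId} shorthand, a key that is missing from the input comes out as a null (None) value rather than being omitted.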

You could start by using tostream to convert your sample.json
into a stream of [path, value] arrays as you can see by running
jq -c tostream sample.json
This will generate
[["catalog",0,"catalog",0,"name"],"Y"]
[["catalog",0,"catalog",0,"prop11"],""]
[["catalog",0,"catalog",0,"uniqueId"],"Z"]
[["catalog",0,"catalog",0,"uniqueId"]]
[["catalog",0,"catalog",1,"name"],"Y"]
[["catalog",0,"catalog",1,"prop11"],""]
...
reduce and setpath can be used to convert back into the
original form with a filter such as:
reduce (tostream|select(length==2)) as [$p,$v] (
{};
setpath($p;$v)
)
Adding conditionals makes it easy to omit properties at any level.
For example the following removes leaf attributes starting with "prop":
reduce (tostream|select(length==2)) as [$p,$v] (
{};
if $p[-1]|startswith("prop")
then .
else setpath($p;$v)
end
)
With your sample.json this produces
{
"catalog": [
{
"catalog": [
{
"name": "Y",
"uniqueId": "Z"
},
{
"name": "Y",
"uniqueId": "Z"
}
],
"name": "X"
},
{
"catalog": [
{
"name": "Y",
"uniqueId": "Z"
},
{
"name": "Y",
"uniqueId": "Z"
}
],
"name": "X"
}
]
}
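The flatten-then-rebuild idea behind tostream, reduce and setpath can be mimicked in Python as a rough sketch. The helper names leaf_paths, set_path and drop_prop_leaves are hypothetical, the sample data is made up, and the sketch assumes the top-level value is a non-empty object, so every path has at least one component.

```python
def leaf_paths(value, path=()):
    """Yield (path, leaf) pairs, roughly like `tostream | select(length==2)`."""
    if isinstance(value, dict) and value:
        for key, child in value.items():
            yield from leaf_paths(child, path + (key,))
    elif isinstance(value, list) and value:
        for index, child in enumerate(value):
            yield from leaf_paths(child, path + (index,))
    else:
        yield path, value  # scalars and empty containers count as leaves


def set_path(root, path, leaf):
    """Store leaf at path, creating dicts and lists on the way (like setpath)."""
    node = root
    for key, next_key in zip(path, path[1:]):
        empty = [] if isinstance(next_key, int) else {}
        if isinstance(node, list):
            while len(node) <= key:
                node.append(None)  # pad the list up to the index
            if node[key] is None:
                node[key] = empty
            node = node[key]
        else:
            node = node.setdefault(key, empty)
    last = path[-1]
    if isinstance(node, list):
        while len(node) <= last:
            node.append(None)
    node[last] = leaf
    return root


def drop_prop_leaves(doc):
    """Rebuild doc, skipping leaves whose key starts with 'prop'."""
    result = {}
    for path, leaf in leaf_paths(doc):
        if isinstance(path[-1], str) and path[-1].startswith("prop"):
            continue
        set_path(result, path, leaf)
    return result


# Hypothetical sample shaped like the input discussed above.
sample = {
    "catalog": [
        {
            "name": "X",
            "prop1": "",
            "catalog": [{"name": "Y", "prop11": "", "uniqueId": "Z"}],
        }
    ]
}
cleaned = drop_prop_leaves(sample)
```

The condition inside drop_prop_leaves plays the role of the if ... then . else setpath(...) end branch in the jq filter.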

If the goal is to remove certain properties, then one could do so using walk/1. For example, to remove properties whose names start with "prop":
walk(if type == "object"
then with_entries(select(.key|startswith("prop") | not))
else . end)
The same approach would also be applicable if the focus is on retaining certain properties, e.g.:
walk(if type == "object"
then with_entries(select(.key == "name" or .key == "uniqueId" or .key == "catalog"))
else . end)
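For comparison, here is a minimal Python sketch of the same "retain only these keys at every level" idea. The function name keep_keys and the sample data are hypothetical. jq's walk visits values bottom-up, whereas this recursion filters on the way down, which should give the same result for this kind of key filter.

```python
def keep_keys(value, wanted):
    """Recursively drop every object key that is not in `wanted`."""
    if isinstance(value, dict):
        return {k: keep_keys(v, wanted) for k, v in value.items() if k in wanted}
    if isinstance(value, list):
        return [keep_keys(item, wanted) for item in value]
    return value  # scalars pass through unchanged


# Hypothetical sample with unwanted properties at each level.
sample = {
    "version": 2,
    "catalog": [
        {
            "name": "X",
            "prop1": "",
            "catalog": [{"name": "Y", "uniqueId": "Z", "prop11": ""}],
        }
    ],
}
filtered = keep_keys(sample, {"name", "uniqueId", "catalog"})
```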

You could build up a file that contains paths into the json (expressed as arrays) that you want to keep. Then filter out values that do not fit in those paths.
paths.json:
["catalog","name"]
["catalog","catalog","name"]
["catalog","catalog","uniqueId"]
Then filter values based on their paths. Using streams is a great way to go for this since it gives you access to these paths directly:
$ jq --slurpfile paths paths.json '
def keep_path($path): any($paths[]; . == [$path[] | select(strings)]);
fromstream(tostream | select(length == 1 or keep_path(.[0])))
' input.json

Related

jq with multiple select statements and an array

I've got some JSON like the following (I've filtered the output here):
[
{
"Tags": [
{
"Key": "Name",
"Value": "example1"
},
{
"Key": "Irrelevant",
"Value": "irrelevant"
}
],
"c7n:MatchedFilters": [
"tag: example_tag_rule"
],
"another_key": "another_value_I_dont_want"
},
{
"Tags": [
{
"Key": "Name",
"Value": "example2"
}
],
"c7n:MatchedFilters": [
"tag:example_tag_rule",
"tag: example_tag_rule2"
]
}
]
I'd like to create a csv file with the value within the Name key and all of the "c7n:MatchedFilters" in the array. I've made a few attempts but still can't get quite the output I expect. There's some example code and the output below:
#Prints the key that I'm after.
cat new.jq | jq '.[] | [.Tags[], {"c7n:MatchedFilters"}] | .[] | select(.Key=="Name")|.Value'
"example1"
"example2"
#Prints all the filters in an array I'm after.
cat new.jq | jq -r '.[] | [.Tags[], {"c7n:MatchedFilters"}] | .[] | select(."c7n:MatchedFilters") | .[]'
[
"tag: example_tag_rule"
]
[
"tag:example_tag_rule",
"tag: example_tag_rule2"
]
#Prints *all* the tags (including ones I don't want) and all the filters in the array I'm after.
cat new.jq | jq '.[] | [.Tags[], {"c7n:MatchedFilters"}] | select((.[].Key=="Name") and (.[]."c7n:MatchedFilters"))'
[
{
"Key": "Name",
"Value": "example1"
},
{
"Key": "Irrelevant",
"Value": "irrelevant"
},
{
"c7n:MatchedFilters": [
"tag: example_tag_rule"
]
}
]
[
{
"Key": "Name",
"Value": "example2"
},
{
"c7n:MatchedFilters": [
"tag:example_tag_rule",
"tag: example_tag_rule2"
]
}
]
I hope this makes sense, let me know if I've missed anything.
Your attempts are not working because you start out with [.Tags[], {"c7n:MatchedFilters"}] to construct one array containing all the tags and an object containing the filters. You are then struggling to find a way to process this entire array at once because it jumbles together these unrelated things without any distinction. You will find it much easier if you don't combine them in the first place!
You want to find the single tag with a Key of "Name". Here's one way to find that:
first(
.Tags[]|
select(.Key=="Name")
).Value as $name
By using a variable binding we can save it for later and worry about constructing the array separately.
You say (in the comments) that you just want to concatenate the filters with spaces. You can do that easily enough:
(
."c7n:MatchedFilters"|
join(" ")
) as $filters
You can combine all of this as follows. Note that each variable binding leaves the input stream unchanged, so it's easy to compose everything.
jq --raw-output '
.[]|
first(
.Tags[]|
select(.Key=="Name")
).Value as $name|
(
."c7n:MatchedFilters"|
join(" ")
) as $filters|
[$name, $filters]|
@csv
'
Hopefully that's easy enough to read and separates out each concept. We break up the array into a stream of objects. For each object, we find the name and bind it to $name, we concatenate the filters and bind them to $filters, then we construct an array containing both, then we convert the array to a CSV string.
We don't need to use variables. We could just have a big array constructor wrapped around the expression to find the name and the expression to find the filters. But I hope you can see the variables make things a bit flatter and easier to understand.
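The same "find the name, join the filters, emit one CSV row" steps can be sketched in Python for comparison. The function names name_and_filters and to_csv are hypothetical, and csv.QUOTE_ALL is used here as an approximation of how @csv quotes string fields.

```python
import csv
import io


def name_and_filters(records):
    """Return one [name, filters] row per record."""
    rows = []
    for record in records:
        # First tag whose Key is "Name", or None if there is no such tag.
        name = next(
            (tag["Value"] for tag in record.get("Tags", []) if tag.get("Key") == "Name"),
            None,
        )
        filters = " ".join(record.get("c7n:MatchedFilters", []))
        rows.append([name, filters])
    return rows


def to_csv(rows):
    """Render rows as CSV text with every field quoted."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


records = [
    {
        "Tags": [
            {"Key": "Name", "Value": "example1"},
            {"Key": "Irrelevant", "Value": "irrelevant"},
        ],
        "c7n:MatchedFilters": ["tag: example_tag_rule"],
        "another_key": "another_value_I_dont_want",
    },
    {
        "Tags": [{"Key": "Name", "Value": "example2"}],
        "c7n:MatchedFilters": ["tag:example_tag_rule", "tag: example_tag_rule2"],
    },
]

rows = name_and_filters(records)
csv_text = to_csv(rows)
```

Keeping the name lookup and the filter join as two separate expressions mirrors the two variable bindings in the jq program.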

Rename duplicate Keys in Json array data

I have a json data as below.
{
"Data":
[
"User": [
{"Name": "Solomon", "Age":20},
{"Name": "Absolom", "Age":30},
]
"Country": [
{"Name" : "US", "Resident" : "Permanent"},
{"Name" : "UK", "Resident" : "Temporary"}
]]}
There are two tags with the same keys: User has a Name key, and Country also has a Name key. I need to preprocess the JSON file to differentiate the keys. My expected result is below. I tried awk and sed commands, but I could not find a proper solution. Any suggestion would be helpful.
Expected result:
{
"Data":
[
"User": [
{"User_Name": "Solomon", "User_Age":20},
{"User_Name": "Absolom", "User_Age":30},
]
"Country": [
{"Country_Name" : "US", "Country_Resident" : "Permanent"},
{"Country_Name" : "UK", "Country_Resident" : "Temporary"}
]]}
Tag name should be appended to the attribute name.
This is what i have tried,
jq '[.[] | .["User_Name"] = .Name]' file_name.json
But it changes both tags, User as well as Country.
With the permission of the OP, here's a jtc-based solution while waiting for the jq ones (assuming the input JSON is fixed):
bash $ <file.json jtc -w'<Data>l[:]<L>k<.*>L:<>k' -u'"{L}_{}";' -tc
{
"Data": {
"Country": [
{ "Country_Name": "US", "Country_Resident": "Permanent" },
{ "Country_Name": "UK", "Country_Resident": "Temporary" }
],
"User": [
{ "User_Age": 20, "User_Name": "Solomon" },
{ "User_Age": 30, "User_Name": "Absolom" }
]
}
}
bash $
Explanation of the jtc parameters:
-w'<Data>l[:]<L>k<.*>L:<>k' :
walk path (-w) selects Data label (<Data>l)
and then each of the nested elements ([:]),
and memorizes its key/label into the namespace L (<L>k),
then finds further each labeled element using REGEX label search (<.*>L:)
and finally reinterpret found element's key/label as the value (<>k)
-u'"{L}_{}";':
for each label found by the walk path, an update operation (-u) is applied using the template "{L}_{}";, where {L} is interpolated with the value preserved in the namespace L and {} is interpolated with the currently found label (at each iteration of the walk path)
the trailing ; (or any other symbol) is required to distinguish the argument of -u from a literal JSON.
-tc is used to display JSON in a semi-compact form.
PS. I'm the creator of jtc unix JSON processing tool. The disclaimer is required by SO.
As originally posted, neither the illustrative input nor the corresponding output is valid JSON, but the following has been tested using JSON based on the shown input:
.Data |= ( (.User |= map(with_entries(.key |= ("User_" + .))))
| (.Country |= map(with_entries(.key |= ("Country_" + .)))) )
Of course, the above may need tweaking depending on the actual requirements, and can be generalized in various ways, e.g. as shown below.
A generalization
.Data |= with_entries( (.key + "_") as $newkey
| .value |= map(with_entries(.key |= ($newkey + .))))
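As a cross-check of the generalized filter, here is a minimal Python sketch of the same renaming. The function name prefix_keys is hypothetical, and the sketch assumes the input has been repaired so that Data is an object whose values are arrays of objects.

```python
def prefix_keys(doc):
    """Prefix each inner key with the name of its group under Data."""
    return {
        **doc,
        "Data": {
            group: [
                {f"{group}_{key}": value for key, value in item.items()}
                for item in items
            ]
            for group, items in doc["Data"].items()
        },
    }


# The question's input, with the JSON syntax repaired (an assumption).
sample = {
    "Data": {
        "User": [
            {"Name": "Solomon", "Age": 20},
            {"Name": "Absolom", "Age": 30},
        ],
        "Country": [
            {"Name": "US", "Resident": "Permanent"},
            {"Name": "UK", "Resident": "Temporary"},
        ],
    }
}
renamed = prefix_keys(sample)
```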
Here is an approach using jq Streaming
fromstream(tostream | .[0] |= if length < 4 then . else .[3]="\(.[1])_\(.[3])" end)
It works by using tostream to convert your input to a stream of arrays
[["Data","Country",0,"Name"],"US"]
[["Data","Country",0,"Resident"],"Permanent"]
[["Data","Country",0,"Resident"]]
[["Data","Country",1,"Name"],"UK"]
[["Data","Country",1,"Resident"],"Temporary"]
[["Data","Country",1,"Resident"]]
[["Data","Country",1]]
[["Data","User",0,"Age"],20]
[["Data","User",0,"Name"],"Solomon"]
[["Data","User",0,"Name"]]
[["Data","User",1,"Age"],30]
[["Data","User",1,"Name"],"Absolom"]
[["Data","User",1,"Name"]]
[["Data","User",1]]
[["Data","User"]]
[["Data"]]
then applying a simple update assignment |= expression to transform the stream into
[["Data","Country",0,"Country_Name"],"US"]
[["Data","Country",0,"Country_Resident"],"Permanent"]
[["Data","Country",0,"Country_Resident"]]
[["Data","Country",1,"Country_Name"],"UK"]
[["Data","Country",1,"Country_Resident"],"Temporary"]
[["Data","Country",1,"Country_Resident"]]
[["Data","Country",1]]
[["Data","User",0,"User_Age"],20]
[["Data","User",0,"User_Name"],"Solomon"]
[["Data","User",0,"User_Name"]]
[["Data","User",1,"User_Age"],30]
[["Data","User",1,"User_Name"],"Absolom"]
[["Data","User",1,"User_Name"]]
[["Data","User",1]]
[["Data","User"]]
[["Data"]]
then reversing the transformation with fromstream.

Use JQ to select specific, arbitrarily nested objects from JSON

I'm looking for an efficient means to search through a large JSON object for "sub-objects" that match a filter (via select(), I imagine). However, the top-level JSON is an object with arbitrary nesting contained within, including more simple values, objects and arrays of objects. For example:
{
"name": "foo",
"class": "system",
"description": "top-level-thing",
"configuration": {
"status": "normal",
"uuid": "id"
},
"children": [
{
"id": "c1",
"class": "c1",
"children": [
{
"id": "c1.1",
"class": "c1.1"
},
{
"id": "c1.1",
"class": "FINDME"
}
]
},
{
"id": "c2",
"class": "FINDME"
}
],
"thing": {
"id": "c3",
"class": "FINDME"
}
}
I have a solution which does part of what I want (and is understandable):
jq -r '.. | arrays | .[] | select(.class=="FINDME"?) | .id'
which returns:
c2
c1.1
... however, it misses c3, plus it changes the order of items output. Additionally, I'm expecting this to operate on potentially very large JSON structures, so I would like to make sure I find an efficient solution. Bonus points for something that remains readable by jq neophytes (myself included).
FWIW, references I was using to help me on the way, in case they help others:
Select objects based on value of variable in object using jq
How to use jq to find all paths to a certain key
Recursive search values by key
For small to modest-sized JSON input, you're on the right track with ..
but it seems you want to select objects, like so:
.. | objects | select(.class=="FINDME"?) | .id
For JSON documents that are very large, this might require too much memory, so it may be worth knowing about jq's streaming parser. Unfortunately it's much more difficult to use, so I'd suggest trying the above, and if you're interested, look in the usual places for documentation about the --stream option.
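To illustrate what the non-streaming filter does, here is a small Python sketch of the same pre-order search over every object in the document. The function name find_ids and the trimmed-down sample are hypothetical. Like the jq filter, it loads the whole document into memory.

```python
def find_ids(value, wanted_class="FINDME"):
    """Yield the id of every object, at any depth, whose class matches."""
    if isinstance(value, dict):
        if value.get("class") == wanted_class:
            yield value.get("id")
        for child in value.values():
            yield from find_ids(child, wanted_class)
    elif isinstance(value, list):
        for child in value:
            yield from find_ids(child, wanted_class)


# Trimmed-down version of the question's input.
doc = {
    "name": "foo",
    "class": "system",
    "children": [
        {
            "id": "c1",
            "class": "c1",
            "children": [
                {"id": "c1.1", "class": "c1.1"},
                {"id": "c1.1", "class": "FINDME"},
            ],
        },
        {"id": "c2", "class": "FINDME"},
    ],
    "thing": {"id": "c3", "class": "FINDME"},
}
found = list(find_ids(doc))
```

Visiting objects directly, rather than only the elements of arrays, is what picks up the object stored under thing.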
Here's a streaming-parser solution. To make sense of it, you'll need to read up on the --stream option, but the key is that the output includes lines of the form: [PATH, VALUE]
program.jq
foreach inputs as $in (null;
if has("id") and has("class") then null
else . as $x
| $in
| if length != 2 then null
elif .[0][-1] == "id" then ($x + {id: .[-1]})
elif .[0][-1] == "class"
and .[-1] == "FINDME" then ($x + {class: .[-1]})
else $x
end
end;
select(has("id") and has("class")) | .id )
Invocation
jq -n --stream -f program.jq input.json
Output with sample input
"c1.1"
"c2"
"c3"

Replace subkey without exact path in jq

Example JSON file:
{
"u": "stuff",
"x": [1,2,3],
"y": {
"field": "value"
},
"z": {
"zz": {
"name": "change me",
"more": "stuff"
},
"randomKey": {
"name": "change me",
"random": "more stuff"
}
}
}
How can I update all the name fields to "something", maintaining the rest of the JSON file the same?
{
"u": "stuff",
"x": [1,2,3],
"y": {
"field": "value"
},
"z": {
"zz": {
"name": "something",
"more": "stuff"
},
"randomKey": {
"name": "something",
"random": "more stuff"
}
}
}
With a direct path, this would be easy, but the parent keys (zz and randomKey in this case) vary.
I tried something like:
jq '.z | .. | .name? |= "something"' file.json
It updates the names, but it also outputs all the values visited by the recursion.
If it is acceptable to change the "name" field wherever it occurs, you could use walk/1:
walk(if type == "object" and has("name") then .name = "something" else . end)
Please note that walk/1 was only included with jq after jq 1.5 was released. If your jq does not have it, then you can find its definition on the jq FAQ, for example.
If you only want to modify the "name" field in the "z" context, then consider:
.z |= with_entries(if .value.name?
then .value.name = "something"
else . end)
Assuming every value within z has a name property, you could do this:
$ jq --arg newname 'something' '.z[].name = $newname' input.json
Using [] on an object will yield all the values contained in that object. And for each of those values, we were simply setting the name to the new name.
If you needed to be more selective with what gets updated, you'll have to add more conditions to what objects to update. In general, I'd use peak's approach, but here's another way it could be achieved using a structure similar to the first approach, assuming we only want to update objects that already have a name property:
$ jq --arg newname 'something' '(.z[] | select(has("name")).name) = $newname' input.json
It's important to wrap the LHS of the assignment in parentheses, we don't want to change the context prior to the assignment, otherwise we won't see the rest of the results.
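The selective update can be sketched in Python as follows. The function name rename_in_z is hypothetical, and unlike jq's has("name"), which raises an error on non-object values, this sketch simply skips values under z that are not objects.

```python
import copy


def rename_in_z(doc, new_name):
    """Return a copy of doc with name replaced in each object directly under z."""
    result = copy.deepcopy(doc)  # leave the caller's data untouched
    for value in result.get("z", {}).values():
        if isinstance(value, dict) and "name" in value:
            value["name"] = new_name
    return result


sample = {
    "u": "stuff",
    "x": [1, 2, 3],
    "y": {"field": "value"},
    "z": {
        "zz": {"name": "change me", "more": "stuff"},
        "randomKey": {"name": "change me", "random": "more stuff"},
    },
}
updated = rename_in_z(sample, "something")
```

Returning the whole document with only the matching fields changed corresponds to wrapping the left-hand side of the jq assignment in parentheses.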

Using jq to list keys in a JSON object

I have a hierarchically deep JSON object created by a scientific instrument, so the file is somewhat large (1.3MB) and not readily readable by people. I would like to get a list of keys, up to a certain depth, for the JSON object. For example, given an input object like this
{
"acquisition_parameters": {
"laser": {
"wavelength": {
"value": 632,
"units": "nm"
}
},
"date": "02/03/2525",
"camera": {}
},
"software": {
"repo": "github.com/username/repo",
"commit": "a7642f",
"branch": "develop"
},
"data": [{},{},{}]
}
I would like an output like such.
{
"acquisition_parameters": [
"laser",
"date",
"camera"
],
"software": [
"repo",
"commit",
"branch"
]
}
This is mainly for the purpose of being able to enumerate what is in a JSON object. After processing the JSON objects from the instrument begin to diverge: for example, some may have a field like .frame.cross_section.stats.fwhm, while others may have .sample.species, so it would be convenient to be able to interrogate the JSON object on the command line.
The following should do exactly what you want
jq '[(keys - ["data"])[] as $key | { ($key): .[$key] | keys }] | add'
This will give the following output, using the input you described above:
{
"acquisition_parameters": [
"camera",
"date",
"laser"
],
"software": [
"branch",
"commit",
"repo"
]
}
Given your purpose you might have an easier time using the paths builtin to list all the paths in the input and then truncate at the desired depth:
$ echo '{"a":{"b":{"c":{"d":true}}}}' | jq -c '[paths|.[0:2]]|unique'
[["a"],["a","b"]]
Here is another variation using reduce and setpath which assumes you have a specific set of top-level keys you want to examine:
. as $v
| reduce ("acquisition_parameters", "software") as $k (
{}; setpath([$k]; $v[$k] | keys)
)
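For comparison, the same "keys of selected top-level sections" listing can be sketched in Python. The function name keys_for is hypothetical. The keys are sorted to match the ordering that jq's keys produces.

```python
def keys_for(doc, sections):
    """Map each requested top-level section to the sorted list of its keys."""
    return {section: sorted(doc[section]) for section in sections}


sample = {
    "acquisition_parameters": {
        "laser": {"wavelength": {"value": 632, "units": "nm"}},
        "date": "02/03/2525",
        "camera": {},
    },
    "software": {
        "repo": "github.com/username/repo",
        "commit": "a7642f",
        "branch": "develop",
    },
    "data": [{}, {}, {}],
}
listing = keys_for(sample, ["acquisition_parameters", "software"])
```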