jq denormalize a nested field - json

Disclaimer: there are already answers to similar questions (like "JQ Join JSON files by key" or "denormalizing JSON with jq"), but none of them has helped me so far; their circumstances differ enough that I was unable to derive a solution from them.
I have 2 files, both lists of objects, where one of them has fields that reference object ids in the other one.
given
[
{
"id": "5b9f50ccdcdf200283f29052",
"reference": {
"id": "5de82d5072f4a72ad5d5dcc1"
}
}
]
and
[
{
"id": "5de82d5072f4a72ad5d5dcc1",
"name": "FooBar"
}
]
my goal would be to get a denormalized object list:
expected
[
{
"id": "5b9f50ccdcdf200283f29052",
"reference": {
"id": "5de82d5072f4a72ad5d5dcc1",
"name": "FooBar"
}
}
]
While I'm able to do the main parts, I haven't managed to bring them together yet:
with
example 1
jq -s '(.[1][] | select(.id == "5de82d5072f4a72ad5d5dcc1"))' objects.json referredObjects.json
I get
{
"id": "5de82d5072f4a72ad5d5dcc1",
"name": "FooBar"
}
and with
example 2
jq -s '.[0][] | .reference = {}' objects.json referredObjects.json
I can manipulate any .reference getting
{
"id": "5b9f50ccdcdf200283f29052",
"reference": {}
}
(even though I lose the list structure)
But I can't do something like
expected "join"
jq -s '.[0][] as $obj | $obj.reference = (.[1][] | select(.id == $obj.reference.id))' objects.json referredObjects.json
Approaches with foreach or reduce looked promising too:
jq -s '[foreach .[0][] as $obj ({}; .reference.id = ""; . + $obj )]' objects.json referredObjects.json
=>
[
{
"reference": {
"id": "5de82d5072f4a72ad5d5dcc1"
},
"id": "5b9f50ccdcdf200283f29052"
}
]
where I expected to get the same result as in the second example.
I've ended up with headaches and am close to writing an inefficient while loop in some other language ... I would appreciate any help on this.
~Marcel

Transform the second file into an object that maps each id to its full object, and use it as a lookup table while updating the first file.
$ jq '(map({(.id): .}) | add) as $idx
| input
| map_values(.reference = $idx[.reference.id])' file2 file1
[
{
"id": "5b9f50ccdcdf200283f29052",
"reference": {
"id": "5de82d5072f4a72ad5d5dcc1",
"name": "FooBar"
}
}
]

The following solution uses the same strategy as the solution by @OguzIsmail, but uses the built-in function INDEX/2 to construct the dictionary from the second file.
The important point is that this strategy allows the arrays in both files to be of arbitrary size.
Invocation
jq --argfile file2 file2.json -f program.jq file1.json
program.jq
INDEX($file2[]; .id) as $dict
| map(.reference.id as $id | .reference = $dict[$id])
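For reference, a minimal self-contained sketch of the same dictionary strategy, assuming jq 1.6 or later for INDEX/2. The scratch directory, the use of --slurpfile in place of --argfile, and the `// .` fallback (which leaves a reference untouched when its id has no match in the dictionary) are assumptions added for illustration, not part of the answers above.

```shell
# Scratch directory and fixtures; the file names follow the question.
tmp=$(mktemp -d)
printf '%s\n' '[{"id":"5b9f50ccdcdf200283f29052","reference":{"id":"5de82d5072f4a72ad5d5dcc1"}}]' > "$tmp/objects.json"
printf '%s\n' '[{"id":"5de82d5072f4a72ad5d5dcc1","name":"FooBar"}]' > "$tmp/referredObjects.json"

# --slurpfile binds $refs to an array of every JSON value in the file,
# so the list of referred objects is $refs[0].
joined=$(jq -c --slurpfile refs "$tmp/referredObjects.json" '
  INDEX($refs[0][]; .id) as $dict            # id -> full object
  | map(.reference |= ($dict[.id] // .))     # keep the reference as-is if unmatched
' "$tmp/objects.json")

printf '%s\n' "$joined"
rm -r "$tmp"
```

Because the outer map is applied to the whole array, the list structure of the first file is preserved.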

Related

jq merge json via dynamic sub keys

I think I'm a step off from figuring out how to jq reduce via filter a key to another objects sub-key.
I'm trying to combine files (simplified from Elasticsearch's ILM Explain & ILM Policy API responses):
$ echo '{".siem-signals-default": {"modified_date": "siem", "version": 1 }, "kibana-event-log-policy": {"modified_date": "kibana", "version": 1 } }' > ip1.json
$ echo '{"indices": {".siem-signals-default-000001": {"action": "complete", "index": ".siem-signals-default-000001", "policy" : ".siem-signals-default"} } }' > ie1.json
Such that the resulting JSON is:
{
".siem-signals-default-000001": {
"modified_date": "siem",
"version": 1,
"action": "complete",
"index": ".siem-signals-default-000001",
"policy": ".siem-signals-default"
}
}
Where ie1 is base JSON and for a child-object, its sub-element policy should line up to ip1's key and copy its sub-elements into itself. I've been trying to build off this, this, and this (from StackOverflow, also this, this, this from external sources). I'll list various rabbit hole attempts building off these, but they're all insufficient:
$ ((cat ie1.json | jq '.indices') && cat ip1.json) | jq -s 'map(to_entries)|flatten|from_entries' | jq '. as $v| reduce keys[] as $k({}; if true then .[$k] += $v[$k] else . end)'
{
".siem-signals-default": {
"modified_date": "siem",
"version": 1
},
".siem-signals-default-000001": {
"action": "complete",
"index": ".siem-signals-default-000001",
"policy": ".siem-signals-default"
},
"kibana-event-log-policy": {
"modified_date": "kibana",
"version": 1
}
}
$ jq --slurpfile ip1 ip1.json '.indices as $ie1|$ie1+{ilm: $ip1 }' ie1.json
{
".siem-signals-default-000001": {
"action": "complete",
"index": ".siem-signals-default-000001",
"policy": ".siem-signals-default"
},
"ilm": [
{
".siem-signals-default": {
"modified_date": "siem",
"version": 1
},
"kibana-event-log-policy": {
"modified_date": "kibana",
"version": 1
}
}
]
}
I also expected something like this to work, but it fails with a compile error:
$ jq -s ip1 ip1.json '. as $ie1|$ie1 + {ilm:(keys[] as $k; $ip1 | select(.policy == $ie1[$k]) | $ie1[$k] )}' ie1.json
jq: error: ip1/0 is not defined at <top-level>, line 1:
ip1
jq: 1 compile error
As you can see, I've found various ways to join the separate files, but the code I thought would handle the filtering is either incorrect or has no effect. Does anyone have an idea how to get the filter part working? TIA
This assumes you are trying to combine the .indices object stored in ie1.json with an object within the object stored in ip1.json. As the keys to match on are different, I further assumed that you want to match the field name from the .indices object, with everything after the last dash - cut off, to the same key in the object from ip1.json.
To this end, ip1.json is read via input and bound to $ip (alternatively you can use jq --argfile ip ip1.json for that). The .indices object is then taken from the first input, ie1.json, and each inner object, accessed via with_entries(.value …), is extended with the result of a lookup in $ip at the correspondingly shortened .key.
jq '
input as $ip | .indices | with_entries(.value += $ip[.key | sub("-[^-]*$";"")])
' ie1.json ip1.json
{
".siem-signals-default-000001": {
"action": "complete",
"index": ".siem-signals-default-000001",
"policy": ".siem-signals-default",
"modified_date": "siem",
"version": 1
}
}
If, instead of the name of the inner field in the .indices object, you want to use the content of the .index field as the reference (which in your sample data has the same value), you can go with map_values instead of with_entries, as you don't need the field's name anymore.
jq '
input as $ip | .indices | map_values(. += $ip[.index | sub("-[^-]*$";"")])
' ie1.json ip1.json
Note: I used sub with a regex to manipulate the key name, which you can easily adjust to your liking if in reality it is more complicated. If, however, the pattern is in fact as simple as cutting off after the last dash, then using .[:rindex("-")] instead will also get the job done.
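As a rough illustration of the .[:rindex("-")] variant mentioned in the note, here is a small sketch using the sample data from the question. Passing the policy object with --argjson and feeding ie1 on stdin are choices made only to keep the example self-contained.

```shell
# Sample documents from the question, held in shell variables for this sketch.
ip='{".siem-signals-default": {"modified_date": "siem", "version": 1}, "kibana-event-log-policy": {"modified_date": "kibana", "version": 1}}'
ie='{"indices": {".siem-signals-default-000001": {"action": "complete", "index": ".siem-signals-default-000001", "policy": ".siem-signals-default"}}}'

# Cut each key off at its last dash and use the remainder as the lookup key in $ip.
merged=$(printf '%s' "$ie" | jq -c --argjson ip "$ip" '
  .indices | with_entries(.value += $ip[.key | .[:rindex("-")]])
')

printf '%s\n' "$merged"
```

This should behave like the sub-based version for keys that contain at least one dash.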
I also received offline feedback with a simple answer that is "workable for my use case" but not exact:
$ jq '.indices | map(. * input[.policy])' ie1.json ip1.json
[
{
"action": "complete",
"index": ".siem-signals-default-000001",
"policy": ".siem-signals-default",
"modified_date": "siem",
"version": 1
}
]
Posting it in case someone runs into something similar, but the other answer is better.
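One caveat with that one-liner: input is evaluated once per element inside map, so it only works while .indices has a single entry. A sketch of a variant that reads ip1.json once and looks each entry up by its .policy field might look like the following. The second index entry and the file name ie2.json are hypothetical, added only to show more than one lookup.

```shell
tmp=$(mktemp -d)
printf '%s\n' '{".siem-signals-default": {"modified_date": "siem", "version": 1}, "kibana-event-log-policy": {"modified_date": "kibana", "version": 1}}' > "$tmp/ip1.json"
# Hypothetical input with two indices, trimmed down to the policy field.
printf '%s\n' '{"indices": {".siem-signals-default-000001": {"policy": ".siem-signals-default"}, "kibana-event-log-000001": {"policy": "kibana-event-log-policy"}}}' > "$tmp/ie2.json"

# Bind the policies once, then merge each index entry with the policy it names.
by_policy=$(jq -c 'input as $ip | .indices | map_values(. + $ip[.policy])' "$tmp/ie2.json" "$tmp/ip1.json")

printf '%s\n' "$by_policy"
rm -r "$tmp"
```

Using map_values instead of map also keeps the index names as keys, which matches the output asked for in the question.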

jq with multiple select statements and an array

I've got some JSON like the following (I've filtered the output here):
[
{
"Tags": [
{
"Key": "Name",
"Value": "example1"
},
{
"Key": "Irrelevant",
"Value": "irrelevant"
}
],
"c7n:MatchedFilters": [
"tag: example_tag_rule"
],
"another_key": "another_value_I_dont_want"
},
{
"Tags": [
{
"Key": "Name",
"Value": "example2"
}
],
"c7n:MatchedFilters": [
"tag:example_tag_rule",
"tag: example_tag_rule2"
]
}
]
I'd like to create a csv file with the value within the Name key and all of the "c7n:MatchedFilters" in the array. I've made a few attempts but still can't get quite the output I expect. There's some example code and the output below:
#Prints the key that I'm after.
cat new.jq | jq '.[] | [.Tags[], {"c7n:MatchedFilters"}] | .[] | select(.Key=="Name")|.Value'
"example1"
"example2"
#Prints all the filters in an array I'm after.
cat new.jq | jq -r '.[] | [.Tags[], {"c7n:MatchedFilters"}] | .[] | select(."c7n:MatchedFilters") | .[]'
[
"tag: example_tag_rule"
]
[
"tag:example_tag_rule",
"tag: example_tag_rule2"
]
#Prints *all* the tags (including ones I don't want) and all the filters in the array I'm after.
cat new.jq | jq '.[] | [.Tags[], {"c7n:MatchedFilters"}] | select((.[].Key=="Name") and (.[]."c7n:MatchedFilters"))'
[
{
"Key": "Name",
"Value": "example1"
},
{
"Key": "Irrelevant",
"Value": "irrelevant"
},
{
"c7n:MatchedFilters": [
"tag: example_tag_rule"
]
}
]
[
{
"Key": "Name",
"Value": "example2"
},
{
"c7n:MatchedFilters": [
"tag:example_tag_rule",
"tag: example_tag_rule2"
]
}
]
I hope this makes sense, let me know if I've missed anything.
Your attempts are not working because you start out with [.Tags[], {"c7n:MatchedFilters"}] to construct one array containing all the tags and an object containing the filters. You are then struggling to find a way to process this entire array at once because it jumbles together these unrelated things without any distinction. You will find it much easier if you don't combine them in the first place!
You want to find the single tag with a Key of "Name". Here's one way to find that:
first(
.Tags[]|
select(.Key=="Name")
).Value as $name
By using a variable binding we can save it for later and worry about constructing the array separately.
You say (in the comments) that you just want to concatenate the filters with spaces. You can do that easily enough:
(
."c7n:MatchedFilters"|
join(" ")
) as $filters
You can combine all this together as follows. Note that each variable binding leaves the input stream unchanged, so it's easy to compose everything.
jq --raw-output '
.[]|
first(
.Tags[]|
select(.Key=="Name")
).Value as $name|
(
."c7n:MatchedFilters"|
join(" ")
) as $filters|
[$name, $filters]|
@csv
' new.jq
Hopefully that's easy enough to read and separates out each concept. We break up the array into a stream of objects. For each object, we find the name and bind it to $name, we concatenate the filters and bind them to $filters, then we construct an array containing both, then we convert the array to a CSV string.
We don't need to use variables. We could just have a big array constructor wrapped around the expression to find the name and the expression to find the filters. But I hope you can see the variables make things a bit flatter and easier to understand.
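The variable-free form described above could be sketched like this, run against the sample data from the question. Feeding the data on stdin from a shell variable is just for the sake of a self-contained example.

```shell
# Sample data from the question, trimmed to the fields that matter here.
data='[
  {"Tags": [{"Key": "Name", "Value": "example1"}, {"Key": "Irrelevant", "Value": "irrelevant"}],
   "c7n:MatchedFilters": ["tag: example_tag_rule"]},
  {"Tags": [{"Key": "Name", "Value": "example2"}],
   "c7n:MatchedFilters": ["tag:example_tag_rule", "tag: example_tag_rule2"]}
]'

# One array constructor per object: the Name tag's value, then the joined filters.
csv=$(printf '%s' "$data" | jq --raw-output '
  .[]
  | [ first(.Tags[] | select(.Key == "Name")).Value,
      (."c7n:MatchedFilters" | join(" ")) ]
  | @csv
')

printf '%s\n' "$csv"
```

Each object yields one CSV row with the name in the first column and the space-separated filters in the second.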

Merge and Sort JSON using JQ

I have a file containing the following structure and unknown number of results:
{
"results": [
[
{
"field": "AccountID",
"value": "5177497"
},
{
"field": "Requests",
"value": "50900"
}
],
[
{
"field": "AccountID",
"value": "pro"
},
{
"field": "Requests",
"value": "251"
}
]
],
"statistics": {
"Matched": 51498,
"Scanned": 8673577,
"ScannedByte": 2.72400814E10
},
"status": "HOLD"
}
{
"results": [
[
{
"field": "AccountID",
"value": "5577497"
},
{
"field": "Requests",
"value": "51900"
}
],
"statistics": {
"Matched": 51498,
"Scanned": 8673577,
"ScannedByte": 2.72400814E10
},
"status": "HOLD"
}
There are multiple such objects, each holding an array under the results key. They are not separated by commas.
I am trying to just print the "AccountID" sorted by "Requests" in ZSH using jq. I have tried flattening them and using:
jq -r '.results[][0] |.value ' filename
jq -r '.results[][1] |.value ' filename
to get the Account ID and Requests separately and sorting them. I don't think bash has a dictionary that can be used. The problem lies in the file, as the field and value are not a key-value pair but two separate pairs. Therefore, extracting them using the above two lines into separate arrays and sorting by the second array seems a bit too long. I was wondering if there is a way to combine both operations.
The other way is to combine it all to a string and sort it in ascending order. Python would probably have the best solution but the code requires to be a zsh or bash script.
Solutions that use sed, jq or any other ZSH supported compilers are welcome. If there is a way to create a dictionary in bash, please do let me know.
The projected output requirement is just the Account ID vs Request Number.
5577497 has 51900 requests
5177497 has 50900 requests
pro has 251 requests
If you don't mind learning a little jq, it will probably be best to write a small jq program to do what you want.
To get you started, consider the following jq program, which assumes your input is a stream of valid JSON objects with a "results" key similar to your sample:
[inputs | .results[] | map( { (.field) : .value} ) | add]
After making minor changes to your input so that it consists of valid JSON objects, an invocation of jq with the -n option produces an array of AccountID/Requests objects:
[
{
"AccountID": "5177497",
"Requests": "50900"
},
{
"AccountID": "pro",
"Requests": "251"
},
{
"AccountID": "5577497",
"Requests": "51900"
}
]
You could (for example) now use jq's group_by to group these objects by AccountID, and thereby produce the result you want.
jq -S '.results[] | map( { (.field) : .value} ) | add' query-results-aggregate \
| jq -s -c 'group_by(.number_of_requests) | .[]'
This does the trick. Thanks to peak for the guidance.
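To get the exact lines asked for in the question, one possible sketch sorts the merged objects numerically by Requests in descending order and formats each one with string interpolation. It assumes the input has been repaired into a stream of valid JSON objects (shown compacted below) and that every Requests value is a numeric string.

```shell
# The sample results as a stream of two valid JSON objects (no commas between them).
report=$(printf '%s\n' \
  '{"results":[[{"field":"AccountID","value":"5177497"},{"field":"Requests","value":"50900"}],[{"field":"AccountID","value":"pro"},{"field":"Requests","value":"251"}]]}' \
  '{"results":[[{"field":"AccountID","value":"5577497"},{"field":"Requests","value":"51900"}]]}' \
  | jq -rn '
      [inputs | .results[] | map({(.field): .value}) | add]   # one object per result row
      | sort_by(.Requests | tonumber) | reverse               # highest request count first
      | .[]
      | "\(.AccountID) has \(.Requests) requests"
    ')

printf '%s\n' "$report"
```

The tonumber call matters because the values are strings; sorting them as strings would order "251" after "50900".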

jq: convert array to object indexed by filename?

Using jq how can I convert an array into object indexed by filename, or read multiple files into one object indexed by their filename?
e.g.
jq -s 'map(select(.roles[]? | contains ("mysql")))' -C dir/file1.json dir/file2.json
This gives me the data I want, but I need to know which file they came from.
So instead of
[
{ "roles": ["mysql"] },
{ "roles": ["mysql", "php"] }
]
for output, I want:
{
"file1": { "roles": ["mysql"] },
"file2": { "roles": ["mysql", "php"] }
}
I do want the ".json" file extension stripped too if possible, and just the basename (dir excluded).
Example
file1.json
{ "roles": ["mysql"] }
file2.json
{ "roles": ["mysql", "php"] }
file3.json
{ }
My real files obviously have other stuff in them too, but that should be enough for this example. file3 is simply to demonstrate "roles" is sometimes missing.
In other words: I'm trying to find files that contain "mysql" in their list of "roles". I need the filename and contents combined into one JSON object.
To simplify the problem further:
jq 'input_filename' f1 f2
Gives me all the filenames like I want, but I don't know how to combine them into one object or array.
Whereas,
jq -s 'map(input_filename)' f1 f2
Gives me the same filename repeated once for each file. e.g. [ "f1", "f1" ] instead of [ "f1", "f2" ]
If your jq has inputs (as does jq 1.5) then the task can be accomplished with just one invocation of jq.
Also, it might be more efficient to use any than to iterate over all the elements of .roles.
The trick is to invoke jq with the -n option, e.g.
jq -n '
[inputs
| select(.roles and any(.roles[]; contains("mysql")))
| {(input_filename | gsub(".*/|\\.json$";"")): .}]
| add' file*.json
jq approach:
jq 'if any(.roles[]?; contains("mysql")) then {(input_filename | gsub(".*/|\\.json$";"")): .}
else empty end' ./file1.json ./file2.json | jq -s 'add'
The expected output:
{
"file1": {
"roles": [
"mysql"
]
},
"file2": {
"roles": [
"mysql",
"php"
]
}
}
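A self-contained way to try the single-invocation approach, including a file without "roles", is sketched below. The scratch directory is created only for this example, and the `// {}` fallback (so that no matches yields an empty object rather than null) is an addition that is not in the answers above.

```shell
# Fixtures from the question, written to a scratch directory.
tmp=$(mktemp -d)
mkdir "$tmp/dir"
printf '%s\n' '{ "roles": ["mysql"] }'        > "$tmp/dir/file1.json"
printf '%s\n' '{ "roles": ["mysql", "php"] }' > "$tmp/dir/file2.json"
printf '%s\n' '{ }'                           > "$tmp/dir/file3.json"

by_file=$(jq -cn '
  [ inputs
    | select(any(.roles[]?; contains("mysql")))           # files without roles are skipped
    | {(input_filename | gsub(".*/|\\.json$"; "")): .} ]  # basename without .json as key
  | add // {}
' "$tmp"/dir/file*.json)

printf '%s\n' "$by_file"
rm -r "$tmp"
```

file3.json contributes nothing because .roles[]? produces no values for it, so any returns false.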

How to update a subitem in a json file using jq?

Using jq I tried to update this json document:
{
"git_defaults": {
"branch": "master",
"email": "jenkins@host",
"user": "Jenkins"
},
"git_namespaces": [
{
"name": "NamespaceX",
"modules": [
"moduleA",
"moduleB",
"moduleC",
"moduleD"
]
},
{
"name": "NamespaceY",
"modules": [
"moduleE"
]
}
]
}
by adding moduleF to NamespaceY. I need to write the file back again to the original source file.
I came close (but no cigar) with:
jq '. | .git_namespaces[] | select(.name=="namespaceY").modules |= (.+ ["moduleF"])' config.json
and
jq '. | select(.git_namespaces[].name=="namespaceY").modules |= (.+ ["moduleF"])' config.json
The following filter should perform the update you want:
(.git_namespaces[] | select(.name=="NamespaceY").modules) += ["moduleF"]
Note that the initial '.|' in your attempt is not needed; that "NamespaceY" is capitalized in config.json; that the parentheses as shown are the key to success; and that += can be used here.
One way to write back to the original file would perhaps be to use 'sponge'; other possibilities are discussed on the jq FAQ https://github.com/stedolan/jq/wiki/FAQ
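If sponge (from moreutils) is not available, a common alternative is to write to a temporary file and move it over the original once jq has succeeded. The sketch below assumes that approach; the scratch directory and the config.json.new name are hypothetical, and the config is a trimmed copy of the one in the question.

```shell
tmp=$(mktemp -d)
cat > "$tmp/config.json" <<'EOF'
{
  "git_defaults": { "branch": "master", "user": "Jenkins" },
  "git_namespaces": [
    { "name": "NamespaceX", "modules": ["moduleA", "moduleB", "moduleC", "moduleD"] },
    { "name": "NamespaceY", "modules": ["moduleE"] }
  ]
}
EOF

# Write the updated document next to the original and replace it only if jq succeeded.
jq '(.git_namespaces[] | select(.name == "NamespaceY").modules) += ["moduleF"]' \
  "$tmp/config.json" > "$tmp/config.json.new" \
  && mv "$tmp/config.json.new" "$tmp/config.json"

# Read the result back to confirm the change landed in the original file.
modules=$(jq -c '.git_namespaces[] | select(.name == "NamespaceY").modules' "$tmp/config.json")
printf '%s\n' "$modules"
rm -r "$tmp"
```

Redirecting jq's output straight onto config.json would truncate the file before jq reads it, which is why the intermediate file is needed.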