How to merge two JSON objects using jq?

I have two JSON files. I want to merge the entries of the array in SomeFile2.json into SomeFile1.json, as shown below.
SomeFile1.json
[
{
"DNSName": "CLB-test-112a877451.ap-northeast-1.elb.amazonaws.com",
"Instances": [
{
"InstanceId": "i-0886ed703de64028a"
}
]
},
{
"DNSName": "CLB-test1-156925981.ap-northeast-1.elb.amazonaws.com",
"Instances": [
{
"InstanceId": "i-0561634c4g3b4fa25"
}
]
}
]
SomeFile2.json
[
{
"InstanceId": "i-0886ed703de64028a",
"State": "InService"
},
{
"InstanceId": "i-0561634c4g3b4fa25",
"State": "InService"
}
]
I want the result as below:
[
{
"DNSName": "CLB-test-112a877451.ap-northeast-1.elb.amazonaws.com",
"Instances": [
{
"InstanceId": "i-0886ed703de64028a"
"State": "InService"
}
]
},
{
"DNSName": "CLB-test1-156925981.ap-northeast-1.elb.amazonaws.com",
"Instances": [
{
"InstanceId": "i-0561634c4g3b4fa25"
"State": "InService"
}
]
}
]
I'm processing this in a bash shell via jq, but without success.

Since the contents of the second file are evidently intended to define a mapping from InstanceId to State, let's start by hypothesizing the following invocation of jq:
jq --argfile dict SomeFile2.json -f program.jq SomeFile1.json
Next, let's create a suitable dictionary:
(reduce $dict[] as $x ({}; . + ($x|{(.InstanceId): .State}))) as $d
Now the rest is easy:
map(.Instances |= map(. + {State: $d[.InstanceId]}))
Putting the pieces together in program.jq:
(reduce $dict[] as $x ({}; . + ($x|{(.InstanceId): .State}))) as $d
| map(.Instances |= map(. + {State: $d[.InstanceId]}))
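If your jq flags --argfile as deprecated, the same logic can be driven with --slurpfile instead; here is a sketch of the whole thing as a single invocation (note that --slurpfile wraps the file's contents in a one-element array, hence $dict[0]):
jq --slurpfile dict SomeFile2.json '
  # build {InstanceId: State, ...} from the second file ($dict[0] is its top-level array)
  (reduce $dict[0][] as $x ({}; . + {($x.InstanceId): $x.State})) as $d
  # add the matching State to every instance in the first file
  | map(.Instances |= map(. + {State: $d[.InstanceId]}))
' SomeFile1.json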
Alternatives
The dictionary as above can be constructed without using reduce, as follows:
($dict | map( {(.InstanceId): .State}) | add) as $d
Another alternative is to use INDEX/2:
(INDEX($dict[]; .InstanceId) | map_values(.State)) as $d
If your jq does not have INDEX/2 you can snarf its def from
https://raw.githubusercontent.com/stedolan/jq/master/src/builtin.jq
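For reference, the definition there reads approximately as follows (consult the linked file for the authoritative text):
def INDEX(stream; idx_expr): reduce stream as $row ({}; .[$row|idx_expr|tostring] = $row);
def INDEX(idx_expr): INDEX(.[]; idx_expr);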

Since I find jq pretty hard, I started in a procedural way, using Ruby's json module:
ruby -rjson -e '
states = JSON.parse(File.read(ARGV.shift)).map {|o| [o["InstanceId"], o["State"]]}.to_h
data = JSON.parse(File.read(ARGV.shift))
data.each do |obj|
obj["Instances"].each do |instance|
instance["State"] = states[instance["InstanceId"]] || "unknown"
end
end
puts JSON.pretty_generate data
' SomeFile2.json SomeFile1.json
But we want jq, so after some trial and error, and after finding the section on complex assignments in the manual (https://stedolan.github.io/jq/manual/#Complexassignments), here is what I came up with. (Note: I changed the state for one of the instances so I could verify the output more easily.)
$ cat SomeFile2.json
[
{
"InstanceId": "i-0886ed703de64028a",
"State": "InService"
},
{
"InstanceId": "i-0561634c4g3b4fa25",
"State": "NOTInService"
}
]
First, extract the states into an object mapping the id to the state:
$ state_map=$( jq -c 'map({"key":.InstanceId, "value":.State}) | from_entries' SomeFile2.json )
$ echo "$state_map"
{"i-0886ed703de64028a":"InService","i-0561634c4g3b4fa25":"NOTInService"}
Then, update the instances in the first file:
jq --argjson states "$state_map" '.[].Instances[] |= . + {"State": ($states[.InstanceId] // "unknown")}' SomeFile1.json
[
{
"DNSName": "CLB-test-112a877451.ap-northeast-1.elb.amazonaws.com",
"Instances": [
{
"InstanceId": "i-0886ed703de64028a",
"State": "InService"
}
]
},
{
"DNSName": "CLB-test1-156925981.ap-northeast-1.elb.amazonaws.com",
"Instances": [
{
"InstanceId": "i-0561634c4g3b4fa25",
"State": "NOTInService"
}
]
}
]
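If you prefer to avoid the intermediate shell variable, the two steps can be folded into a single call; a sketch using --slurpfile (which wraps SomeFile2.json in a one-element array, hence $states[0]):
jq --slurpfile states SomeFile2.json '
  # build the {InstanceId: State} lookup from the second file
  ($states[0] | map({key: .InstanceId, value: .State}) | from_entries) as $map
  # merge the looked-up State into every instance, defaulting to "unknown"
  | .[].Instances[] |= . + {State: ($map[.InstanceId] // "unknown")}
' SomeFile1.json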

Related

Trying to merge 2 JSON documents using JQ

I'm using the jq CLI to merge JSON from one document into another. The issue I am facing is that I have to select by the value of a property, rather than by a numeric array index.
The first file contains a chunk of JSON jqtest.json:
{
"event": [
{
"listen": "test",
"script": {
"exec": [],
"type": "text/javascript"
}
}
]
}
The second file is where I want to merge the JSON into under "accounts" collection.json:
{
"item": [
{
"name": "accounts",
"item": [
{
"name": "Retrieves the collection of Account resources."
}
]
},
{
"name": "accounts mapped",
"item": [
{
"name": "Retrieves the collection of AccountMapped resources."
}
]
}
]
}
What I am trying to do is merge it under "accounts", inside the item named "Retrieves the collection of Account resources." I use the command:
jq -s '
.[0].event += .[1].item |
map(select(.name=="accounts")) |
.[].item
' jqtest.json collection.json
But when executed, nothing is output. What am I doing wrong with jq, or is there another tool I can use to accomplish this? The expected result is:
{
"item": [
{
"name": "accounts",
"item": [
{
"name": "Retrieves the collection of Account resources.",
"event": [
{
"listen": "test",
"script": {
"exec": [],
"type": "text/javascript"
}
}
]
},
{
"name": "accounts mapped",
"item": [
{
"name": "Retrieves the collection of AccountMapped resources."
}
]
}
]
}
]
}
To merge two objects, one can use obj1 + obj2. From this, it follows that obj1 += obj2 can be used to merge an object (obj2) into another existing object (obj1).
Maybe that's what you were trying to use. If so, you were missing parens around the expression producing the object to merge into (causing the code to be misparsed), you had the operands to += backwards, you didn't actually produce the correct objects on either side of += (or even objects at all), and you didn't narrow down your output (accidentally including jqtest in the output).
Fixed:
jq -s '
( .[1].item[] | select( .name == "accounts" ) | .item[] ) += .[0] | .[1]
' jqtest.json collection.json
Demo on jqplay
I find the following clearer (less mental overhead):
jq -s '
.[0] as $to_insert |
.[1] | ( .item[] | select( .name == "accounts" ) | .item[] ) += $to_insert
' jqtest.json collection.json
Demo
That said, I would avoid slurping in favour of --argfile.
jq --argfile to_insert jqtest.json '
( .item[] | select( .name == "accounts" ) | .item[] ) += $to_insert
' collection.json
Demo on jqplay
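Note that newer jq manuals mark --argfile as deprecated; a near-identical sketch with --slurpfile (which wraps the file in an array, hence $to_insert[0]) would be:
jq --slurpfile to_insert jqtest.json '
  ( .item[] | select( .name == "accounts" ) | .item[] ) += $to_insert[0]
' collection.json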

Using jq to fetch key-value pairs from a json file

I am trying to get key#value pairs from the JSON file below using jq.
{
"STUFF_RELATED1": "STUFF_RELATED1",
"STUFF_RELATED2": "STUFF_RELATED2",
"THINGS": {
"THING_2": {
"details": {
"stuff_branch": "user/dev"
},
"repository": "path/to/repo",
"branch": "master",
"revision": "dsfkes4s34jlis4jsj4lis4sli3"
},
"THING_1": {
"details": {
"stuff_branch": "master"
},
"repository": "path/to/repo",
"branch": "master",
"revision": "dsfkes4s34jlis4jsj4lis4sli3"
}
},
"STUFF": {
"revision": "4u324i324iy32g",
"branch": "master"
}
}
The key#value pair should look like this:
THING_1#dsfkes4s34jlis4jsj4lis4sli3
Currently I have tried this on my own:
jq -r ' .THINGS | keys[] as $k | "($k)#(.[$k].revision)" ' things.json
But it does not give the result that I really want. :( Thanks in advance!
You need to prefix the ( with a backslash, since string interpolation in jq is written \(...):
jq -r ' .THINGS | keys[] as $k | "\($k)#\(.[$k].revision)" ' things.json
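For what it's worth, an equivalent formulation that avoids binding $k uses to_entries (a small sketch, not part of the original answer):
jq -r '.THINGS | to_entries[] | "\(.key)#\(.value.revision)"' things.json
Note that keys returns the key names sorted, whereas to_entries preserves their original order.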

Transforming high-redundancy CSV data into nested JSON using jq (or awk)?

Say I have the following CSV data in input.txt:
broker,client,contract_id,task_type,doc_names
alice#company.com,John Doe,33333,prove-employment,important-doc-pdf
alice#company.com,John Doe,33333,prove-employment,paperwork-pdf
alice#company.com,John Doe,33333,submit-application,blah-pdf
alice#company.com,John Doe,00000,prove-employment,test-pdf
alice#company.com,John Doe,00000,submit-application,test-pdf
alice#company.com,Jane Smith,11111,prove-employment,important-doc-pdf
alice#company.com,Jane Smith,11111,submit-application,paperwork-pdf
alice#company.com,Jane Smith,11111,submit-application,unimportant-pdf
bob#company.com,John Doe,66666,submit-application,pdf-I-pdf
bob#company.com,John Doe,77777,submit-application,pdf-J-pdf
And I'd like to transform it into the following JSON:
[
{"broker": "alice#company.com",
"clients": [
{
"client": "John Doe",
"contracts": [
{
"contract_id": 33333,
"documents": [
{
"task_type": "prove-employment",
"doc_names": ["important-doc-pdf", "paperwork-pdf"]
},
{
"task_type": "submit-application",
"doc_names": ["blah-pdf"]
}
]
},
{
"contract_id": 00000,
"documents": [
{
"task_type": "prove-employment",
"doc_names": ["test-pdf"]
},
{
"task_type": "submit-application",
"doc_names": ["test-pdf"]
}
]
}
]
},
{
"client": "Jane Smith",
"contracts": [
{
"contract_id": 11111,
"documents": [
{
"task_type": "prove-employment",
"doc_names": ["important-doc-pdf"]
},
{
"task_type": "submit-application",
"doc_names": ["paperwork-pdf", "unimportant-pdf"]
}
]
}
]
}
]
},
{"broker": "bob#company.com",
"clients": [
{
"client": "John Doe",
"contracts": [
{
"contract_id": 66666,
"documents": [
{
"task_type": "submit-application",
"doc_names": ["pdf-I-pdf"]
}
]
},
{
"contract_id": 77777,
"documents": [
{
"task_type": "submit-application",
"doc_names": ["pdf-J-pdf"]
}
]
}
]
}
]
}
]
Based on a quick search, it seems like people recommend jq for this type of task. I read some of the manual and played around with it for a bit, and I understand that it's meant to be used by composing its filters together to produce the desired output.
So far, I've been able to transform each line of the CSV into a list of strings for example with jq -Rs '. / "\n" | .[] | . / ","'.
But I'm having trouble with something even a bit more complex, like assigning a key to each value on a line (not even the final JSON form I'm looking to get). This is what I tried: jq -Rs '[inputs | . / "\n" | .[] | . / "," as $line | {"broker": $line[0], "client": $line[1], "contract_id": $line[2], "task_type": $line[3], "doc_name": $line[4]}]', and it gives back [].
Maybe jq isn't the best tool for the job here? Perhaps I should be using awk? If all else fails, I'd probably just parse this using Python.
Any help is appreciated.
Here's a jq solution that assumes the CSV input is very simple (e.g., no field has embedded commas), followed by a brief explanation.
To handle arbitrary CSV, you could use a CSV-to-TSV conversion tool in conjunction with the jq program given below with trivial modifications.
A Solution
The following jq program assumes jq is invoked with the -R option.
(The -n option should not be used, because the header row is read as the initial input rather than via input.)
# sort-free drop-in replacement for the built-in group_by/1
def GROUP_BY(f):
reduce .[] as $x ({};
($x|f) as $s
| ($s|type) as $t
| (if $t == "string" then $s else ($s|tojson) end) as $y
| .[$t][$y] += [$x] )
| [.[][]]
;
# input: an array
def obj($keys):
. as $in | reduce range(0; $keys|length) as $i ({}; .[$keys[$i]] = $in[$i]);
# input: an array to be grouped by $keyname
# output: an object
def gather_by($keyname; $newkey):
($keyname + "s") as $plural
| GROUP_BY(.[$keyname])
| {($plural): map({($keyname): .[0][$keyname],
($newkey) : map(del(.[$keyname])) } ) }
;
split(",") as $headers
| [inputs
| split(",")
| obj($headers)
]
| gather_by("broker"; "clients")
| .brokers[].clients |= (gather_by("client"; "contracts") | .clients)
| .brokers[].clients[].contracts |= (gather_by("contract_id"; "documents") | .contract_ids)
| .brokers[].clients[].contracts[].documents |= (gather_by("task_type"; "doc_names") | .task_types)
| .brokers[].clients[].contracts[].documents[].doc_names |= map(.doc_names)
| .brokers
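To run it, one would save the program to a file, say csv2json.jq (the name is just for illustration), and invoke jq with -R but without -n:
jq -R -f csv2json.jq input.txt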
Explanation
The expected output as shown respects the ordering of the input lines, and so jq's built-in group_by may not be appropriate; hence GROUP_BY is defined above as a sort-free drop-in replacement for group_by. It's a bit complicated because it is completely generic in the same way that group_by is.
The obj filter converts an array into an object with keys $keys.
The gather_by filter groups together items in the input array as appropriate for the present problem.
gather_by/2 example
To get a feel for what gather_by does, here's an example:
[ {a:1,b:1}, {a:2, b:2}, {a:1,b:0}] | gather_by("a"; "objects")
produces:
{
"as": [
{
"a": 1,
"objects": [
{
"b": 1
},
{
"b": 0
}
]
},
{
"a": 2,
"objects": [
{
"b": 2
}
]
}
]
}
Output
[
{
"broker": "alice#company.com",
"clients": [
{
"client": "John Doe",
"contracts": [
{
"contract_id": "33333",
"documents": [
{
"task_type": "prove-employment",
"doc_names": [
"important-doc-pdf",
"paperwork-pdf"
]
},
{
"task_type": "submit-application",
"doc_names": [
"blah-pdf"
]
}
]
},
{
"contract_id": "00000",
"documents": [
{
"task_type": "prove-employment",
"doc_names": [
"test-pdf"
]
},
{
"task_type": "submit-application",
"doc_names": [
"test-pdf"
]
}
]
}
]
},
{
"client": "Jane Smith",
"contracts": [
{
"contract_id": "11111",
"documents": [
{
"task_type": "prove-employment",
"doc_names": [
"important-doc-pdf"
]
},
{
"task_type": "submit-application",
"doc_names": [
"paperwork-pdf",
"unimportant-pdf"
]
}
]
}
]
}
]
},
{
"broker": "bob#company.com",
"clients": [
{
"client": "John Doe",
"contracts": [
{
"contract_id": "66666",
"documents": [
{
"task_type": "submit-application",
"doc_names": [
"pdf-I-pdf"
]
}
]
},
{
"contract_id": "77777",
"documents": [
{
"task_type": "submit-application",
"doc_names": [
"pdf-J-pdf"
]
}
]
}
]
}
]
}
]
Here's a jq solution which uses a generic approach that makes no reference to specific header names except for the specification of certain plural forms.
The generic approach is encapsulated in the recursively defined filter nested_group_by($headers; $plural).
The main assumptions are:
The CSV input can be parsed by splitting on commas;
jq is invoked with the -R command-line option.
# Like the built-in group_by/1 but without sorting: group the items of the input array
# by the value of f, which should be a filter that produces exactly one value per item.
def GROUP_BY(f):
reduce .[] as $x ({};
($x|f) as $s
| ($s|type) as $t
| (if $t == "string" then $s else ($s|tojson) end) as $y
| .[$t][$y] += [$x] )
| [.[][]]
;
def obj($headers):
. as $in | reduce range(0; $headers|length) as $i ({}; .[$headers[$i]] = $in[$i]);
def nested_group_by($array; $plural):
def plural: $plural[.] // (. + "s");
if $array == [] then .
elif $array|length == 1 then GROUP_BY(.[$array[0]]) | map(map(.[])[])
else ($array[1] | plural) as $groupkey
| $array[0] as $a0
| GROUP_BY(.[$a0])
| map( { ($a0): .[0][$a0], ($groupkey): map(del( .[$a0] )) } )
| map( .[$groupkey] |= nested_group_by($array[1:]; $plural) )
end
;
split(",") as $headers
| {contract_id: "contracts",
task_type: "documents",
doc_names: "doc_names" } as $plural
| [inputs
| split(",")
| obj($headers)
]
| nested_group_by($headers; $plural)
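This program is driven the same way as the first; assuming it is saved as, say, nested.jq (again a hypothetical file name):
jq -R -f nested.jq input.txt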

jq: sort object values

I want to sort this data structure by the object keys (easy with -S) and sort the object values (the arrays) by the 'foo' property.
I can sort them with
jq -S '
. as $in
| keys[]
| . as $k
| $in[$k] | sort_by(.foo)
' < test.json
... but that loses the keys.
I've tried variations of adding | { "\($k)": . }, but then I end up with a list of objects instead of one object. I also tried variations of adding to $in (same problem) or using $in = $in * { ... }, but that gives me syntax errors.
The one solution I did find was to just have the separate objects and then pipe it into jq -s add, but ... I really wanted it to work the other way. :-)
Test data below:
{
"": [
{ "foo": "d" },
{ "foo": "g" },
{ "foo": "f" }
],
"c": [
{ "foo": "abc" },
{ "foo": "def" }
],
"e": [
{ "foo": "xyz" },
{ "foo": "def" }
],
"ab": [
{ "foo": "def" },
{ "foo": "abc" }
]
}
Maybe this?
jq -S '.[] |= sort_by(.foo)'
Output
{
"": [
{
"foo": "d"
},
{
"foo": "f"
},
{
"foo": "g"
}
],
"ab": [
{
"foo": "abc"
},
{
"foo": "def"
}
],
"c": [
{
"foo": "abc"
},
{
"foo": "def"
}
],
"e": [
{
"foo": "def"
},
{
"foo": "xyz"
}
]
}
#user197693 had a great answer. A suggestion I got in a private message elsewhere was to use
jq -S 'with_entries(.value |= sort_by(.foo))'
If for some reason the -S command-line option is not satisfactory, you can also perform the by-key sort with the to_entries | sort_by(.key) | from_entries idiom. A complete solution to the problem would then be:
.[] |= sort_by(.foo)
| to_entries | sort_by(.key) | from_entries
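Putting both pieces together, a complete invocation (assuming the sample above is saved as test.json) might look like this:
jq '(.[] |= sort_by(.foo)) | to_entries | sort_by(.key) | from_entries' test.json
This yields the same result as the -S variants above, with the by-key sort done inside the filter itself.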

insert a json file into json

I'd like to know a quick way to insert one JSON document into another.
$ cat source.json
{
"AWSEBDockerrunVersion": 2,
"containerDefinitions": [
{
"environment": [
{
"name": "SERVICE_MANIFEST",
"value": ""
},
{
"name": "SERVICE_PORT",
"value": "4321"
}
]
}
]
}
The SERVICE_MANIFEST value should be the content of another JSON file:
$ cat service_manifest.json
{
"connections": {
"port": "1234"
},
"name": "foo"
}
I tried to do this with a jq command:
cat service_manifest.json |jq --arg SERVICE_MANIFEST - < source.json
But it doesn't seem to work.
Any ideas? The final result should still be a valid JSON file:
{
"AWSEBDockerrunVersion": 2,
"containerDefinitions": [
{
"environment": [
{
"name": "SERVICE_MANIFEST",
"value": {
"connections": {
"port": "1234"
},
"name": "foo"
}
},
...
]
}
],
...
}
Update:
Thanks, here is the command I ran based on your sample.
$ jq --slurpfile sm service_manifest.json '.containerDefinitions[].environment[] |= (select(.name=="SERVICE_MANIFEST").value=$sm)' source.json
But the resulting value is an array rather than a plain object.
{
"AWSEBDockerrunVersion": 2,
"containerDefinitions": [
{
"environment": [
{
"name": "SERVICE_MANIFEST",
"value": [
{
"connections": {
"port": "1234"
},
"name": "foo"
}
]
},
{
"name": "SERVICE_PORT",
"value": "4321"
}
]
}
]
}
You can try this jq command:
jq --slurpfile sm SERVICE_MANIFEST '.containerDefinitions[].environment[] |= (select(.name=="SERVICE_MANIFEST").value=$sm[])' file
--slurpfile binds the content of the file, wrapped in an array, to the variable $sm (which is why the filter uses $sm[]).
The filter updates the elements of .containerDefinitions[].environment[], setting .value to the content of the file only on the element whose name is SERVICE_MANIFEST.
A simple solution would use --argfile and avoid select:
< source.json jq --argfile sm service_manifest.json '
.containerDefinitions[0].environment[0].value = $sm '
Or if you want only to update the object(s) with .name == "SERVICE_MANIFEST" you could use the filter:
.containerDefinitions[].environment
|= map(if .name == "SERVICE_MANIFEST"
then .value = $sm
else . end)
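For completeness, here is that filter wired up as a full command, sketched with --slurpfile; since --slurpfile wraps the file in a one-element array, $sm[0] is used, which also avoids the array-valued result mentioned in the question's update:
jq --slurpfile sm service_manifest.json '
  .containerDefinitions[].environment
  |= map(if .name == "SERVICE_MANIFEST"
         then .value = $sm[0]
         else . end)
' source.json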
Variations
There is no need for any "--arg"-style parameter at all, as illustrated by the following:
jq -s '.[1] as $sm
| .[0] | .containerDefinitions[0].environment[0].value = $sm
' source.json service_manifest.json