jq: sort object values - json

I want to sort this data structure by the object keys (easy with -S) and sort the object values (the arrays) by the 'foo' property.
I can sort them with
jq -S '
. as $in
| keys[]
| . as $k
| $in[$k] | sort_by(.foo)
' < test.json
... but that loses the keys.
I've tried variations of adding | { "\($k)": . }, but then I end up with a list of objects instead of one object. I also tried variations of adding to $in (same problem) or using $in = $in * { ... }, but that gives me syntax errors.
The one solution I did find was to just have the separate objects and then pipe it into jq -s add, but ... I really wanted it to work the other way. :-)
Test data below:
{
"": [
{ "foo": "d" },
{ "foo": "g" },
{ "foo": "f" }
],
"c": [
{ "foo": "abc" },
{ "foo": "def" }
],
"e": [
{ "foo": "xyz" },
{ "foo": "def" }
],
"ab": [
{ "foo": "def" },
{ "foo": "abc" }
]
}

Maybe this?
jq -S '.[] |= sort_by(.foo)'
Output
{
"": [
{
"foo": "d"
},
{
"foo": "f"
},
{
"foo": "g"
}
],
"ab": [
{
"foo": "abc"
},
{
"foo": "def"
}
],
"c": [
{
"foo": "abc"
},
{
"foo": "def"
}
],
"e": [
{
"foo": "def"
},
{
"foo": "xyz"
}
]
}

@user197693 had a great answer. A suggestion I got in a private message elsewhere was to use
jq -S 'with_entries(.value |= sort_by(.foo))'

If for some reason using the -S command-line option is not a satisfactory option, you can also perform the by-key sort using the to_entries | sort_by(.key) | from_entries idiom. So a complete solution to the problem would be:
.[] |= sort_by(.foo)
| to_entries | sort_by(.key) | from_entries
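For comparison outside jq, the same two-level sort can be sketched in Python. This is not part of the original answers and the variable names are illustrative; it sorts the top-level keys and sorts each array by the foo property of its items.

```python
import json

# Test data from the question, in its original (unsorted) order.
data = {
    "": [{"foo": "d"}, {"foo": "g"}, {"foo": "f"}],
    "c": [{"foo": "abc"}, {"foo": "def"}],
    "e": [{"foo": "xyz"}, {"foo": "def"}],
    "ab": [{"foo": "def"}, {"foo": "abc"}],
}

# Sort the keys, and sort each array by the "foo" property of its items.
# Keys are unique, so sorting the (key, items) pairs only ever compares keys.
result = {
    key: sorted(items, key=lambda item: item["foo"])
    for key, items in sorted(data.items())
}

print(json.dumps(result, indent=2))
```

Python dictionaries keep insertion order, so building the result from the sorted pairs is enough to get the keys printed in sorted order.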

Related

Reverse flatten nested arrays with jq

Suppose I have the following nested data structure
cat nested.json
[
{
"a": "a",
"b": [
{"c": "c"}
]
},
{
"a": "a",
"b": [
{"c": "c"}
]
}
]
I can flatten it like this
cat nested.json | jq '
[. as $in | reduce paths(scalars) as $path ({};
. + { ($path | map(tostring) | join(".")): $in | getpath($path) }
)]
' > flat.json
cat flat.json
[
{
"0.a": "a",
"0.b.0.c": "c",
"1.a": "a",
"1.b.0.c": "c"
}
]
To reverse the flatten operation with jq I tried this
cat flat.json | jq '
.[0] | reduce to_entries[] as $kv ({};
setpath($kv.key|split("."); $kv.value)
)
'
{
"0": {
"a": "a",
"b": {
"0": {
"c": "c"
}
}
},
"1": {
"a": "a",
"b": {
"0": {
"c": "c"
}
}
}
}
However, I want to convert numbers in the setpath param to create arrays. This doesn't quite work, but I think it's close?
cat flat.json | jq '
def makePath($s): [split(".")[] | if (test("\\d+")) then tonumber else . end];
.[0] | reduce to_entries[] as $kv ({}; setpath(makePath($kv.key); $kv.value))
'
jq: error (at <stdin>:8): split input and separator must be strings
The desired output is the same as the original data in nested.json
Wouldn't it be simpler to do it this way:
Encode your input with
jq '[path(.. | scalars) as $path | {($path | join(".")): getpath($path)}] | add' nested.json
{
"0.a": "a",
"0.b.0.c": "c",
"1.a": "a",
"1.b.0.c": "c"
}
And decode it with
jq 'reduce to_entries[] as $item (null; setpath($item.key / "." | map(tonumber? // .); $item.value))' flat.json
[
{
"a": "a",
"b": [
{
"c": "c"
}
]
},
{
"a": "a",
"b": [
{
"c": "c"
}
]
}
]
However, if you don't care about your special dot notation (e.g. "0.b.0.c") for the encoded keys, you can simply convert the path array into a JSON string instead, which has virtually the same effect, albeit with uglier keys. It also automatically handles input object field names that include dots (e.g. {"a.b":3}) or look like numbers (e.g. {"42":"Panic!"}), which the dot notation cannot decode unambiguously.
Using JSON keys, encode your input with
jq '[path(.. | scalars) as $path | {($path | tojson): getpath($path)}] | add' nested.json
{
"[0,\"a\"]": "a",
"[0,\"b\",0,\"c\"]": "c",
"[1,\"a\"]": "a",
"[1,\"b\",0,\"c\"]": "c"
}
And decode it with
jq 'reduce to_entries[] as $item (null; setpath($item.key | fromjson; $item.value))' flat.json
[
{
"a": "a",
"b": [
{
"c": "c"
}
]
},
{
"a": "a",
"b": [
{
"c": "c"
}
]
}
]
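The JSON-encoded-path idea is not specific to jq. As a rough cross-check, here is a minimal Python sketch of the same round trip; the helper names flatten, encode, decode and set_path are hypothetical and not taken from the answer. Like the jq version, it records only scalar leaves, so empty arrays and objects are not preserved.

```python
import json

def flatten(value, path=()):
    """Yield (path, leaf) pairs for every scalar leaf in a JSON-like value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, path + (key,))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, path + (index,))
    else:
        yield path, value

def encode(doc):
    """Map each JSON-encoded path (e.g. '[0, "a"]') to its leaf value."""
    return {json.dumps(list(path)): leaf for path, leaf in flatten(doc)}

def set_path(node, path, value):
    """Return node with value stored at path, creating lists/dicts as needed."""
    if not path:
        return value
    head, rest = path[0], path[1:]
    if isinstance(head, int):
        # Integer path components address list positions.
        node = [] if node is None else node
        while len(node) <= head:
            node.append(None)
        node[head] = set_path(node[head], rest, value)
    else:
        # String path components address object keys.
        node = {} if node is None else node
        node[head] = set_path(node.get(head), rest, value)
    return node

def decode(flat):
    """Rebuild the nested structure from a mapping produced by encode."""
    root = None
    for key, value in flat.items():
        root = set_path(root, json.loads(key), value)
    return root

nested = [{"a": "a", "b": [{"c": "c"}]}, {"a": "a", "b": [{"c": "c"}]}]
flat = encode(nested)
print(json.dumps(flat, indent=2))
print(decode(flat) == nested)
```

Because each key carries its path as JSON, integers and strings stay distinguishable, which is what lets the decoder choose between a list and a dict without guessing.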

Transforming high-redundancy CSV data into nested JSON using jq (or awk)?

Say I have the following CSV data in input.txt:
broker,client,contract_id,task_type,doc_names
alice#company.com,John Doe,33333,prove-employment,important-doc-pdf
alice#company.com,John Doe,33333,prove-employment,paperwork-pdf
alice#company.com,John Doe,33333,submit-application,blah-pdf
alice#company.com,John Doe,00000,prove-employment,test-pdf
alice#company.com,John Doe,00000,submit-application,test-pdf
alice#company.com,Jane Smith,11111,prove-employment,important-doc-pdf
alice#company.com,Jane Smith,11111,submit-application,paperwork-pdf
alice#company.com,Jane Smith,11111,submit-application,unimportant-pdf
bob#company.com,John Doe,66666,submit-application,pdf-I-pdf
bob#company.com,John Doe,77777,submit-application,pdf-J-pdf
And I'd like to transform it into the following JSON:
[
{"broker": "alice#company.com",
"clients": [
{
"client": "John Doe",
"contracts": [
{
"contract_id": 33333,
"documents": [
{
"task_type": "prove-employment",
"doc_names": ["important-doc-pdf", "paperwork-pdf"]
},
{
"task_type": "submit-application",
"doc_names": ["blah-pdf"]
}
]
},
{
"contract_id": 00000,
"documents": [
{
"task_type": "prove-employment",
"doc_names": ["test-pdf"]
},
{
"task_type": "submit-application",
"doc_names": ["test-pdf"]
}
]
}
]
},
{
"client": "Jane Smith",
"contracts": [
{
"contract_id": 11111,
"documents": [
{
"task_type": "prove-employment",
"doc_names": ["important-doc-pdf"]
},
{
"task_type": "submit-application",
"doc_names": ["paperwork-pdf", "unimportant-pdf"]
}
]
}
]
}
]
},
{"broker": "bob#company.com",
"clients": [
{
"client": "John Doe",
"contracts": [
{
"contract_id": 66666,
"documents": [
{
"task_type": "submit-application",
"doc_names": ["pdf-I-pdf"]
}
]
},
{
"contract_id": 77777,
"documents": [
{
"task_type": "submit-application",
"doc_names": ["pdf-J-pdf"]
}
]
}
]
}
]
}
]
Based on a quick search, it seems like people recommend jq for this type of task. I read some of the manual and played around with it for a bit, and I understand that it's meant to be used by composing its filters together to produce the desired output.
So far, I've been able to transform each line of the CSV into a list of strings for example with jq -Rs '. / "\n" | .[] | . / ","'.
But I'm having trouble with something even a bit more complex, like assigning a key to each value on a line (not even the final JSON form I'm looking to get). This is what I tried: jq -Rs '[inputs | . / "\n" | .[] | . / "," as $line | {"broker": $line[0], "client": $line[1], "contract_id": $line[2], "task_type": $line[3], "doc_name": $line[4]}]', and it gives back [].
Maybe jq isn't the best tool for the job here? Perhaps I should be using awk? If all else fails, I'd probably just parse this using Python.
Any help is appreciated.
Here's a jq solution that assumes the CSV input is very simple (e.g., no field has embedded commas), followed by a brief explanation.
To handle arbitrary CSV, you could use a CSV-to-TSV conversion tool in conjunction with the jq program given below with trivial modifications.
A Solution
The following jq program assumes jq is invoked with the -R option.
(The -n option should not be used as the header row is read without using input.)
# sort-free plug-in replacement for the built-in group_by/1
def GROUP_BY(f):
reduce .[] as $x ({};
($x|f) as $s
| ($s|type) as $t
| (if $t == "string" then $s else ($s|tojson) end) as $y
| .[$t][$y] += [$x] )
| [.[][]]
;
# input: an array
def obj($keys):
. as $in | reduce range(0; $keys|length) as $i ({}; .[$keys[$i]] = $in[$i]);
# input: an array to be grouped by $keyname
# output: an object
def gather_by($keyname; $newkey):
($keyname + "s") as $plural
| GROUP_BY(.[$keyname])
| {($plural): map({($keyname): .[0][$keyname],
($newkey) : map(del(.[$keyname])) } ) }
;
split(",") as $headers
| [inputs
| split(",")
| obj($headers)
]
| gather_by("broker"; "clients")
| .brokers[].clients |= (gather_by("client"; "contracts") | .clients)
| .brokers[].clients[].contracts |= (gather_by("contract_id"; "documents") | .contract_ids)
| .brokers[].clients[].contracts[].documents |= (gather_by("task_type"; "doc_names") | .task_types)
| .brokers[].clients[].contracts[].documents[].doc_names |= map(.doc_names)
| .brokers
Explanation
The expected output as shown respects the ordering of the input lines, and so jq's built-in group_by may not be appropriate; hence GROUP_BY is defined above as a plug-in replacement for group_by. It's a bit complicated because it is completely generic in the same way as group_by.
The obj filter converts an array into an object with keys $keys.
The gather_by filter groups together items in the input array as appropriate for the present problem.
gather_by/2 example
To get a feel for what gather_by does, here's an example:
[ {a:1,b:1}, {a:2, b:2}, {a:1,b:0}] | gather_by("a"; "objects")
produces:
{
"as": [
{
"a": 1,
"objects": [
{
"b": 1
},
{
"b": 0
}
]
},
{
"a": 2,
"objects": [
{
"b": 2
}
]
}
]
}
Output
[
{
"broker": "alice#company.com",
"clients": [
{
"client": "John Doe",
"contracts": [
{
"contract_id": "33333",
"documents": [
{
"task_type": "prove-employment",
"doc_names": [
"important-doc-pdf",
"paperwork-pdf"
]
},
{
"task_type": "submit-application",
"doc_names": [
"blah-pdf"
]
}
]
},
{
"contract_id": "00000",
"documents": [
{
"task_type": "prove-employment",
"doc_names": [
"test-pdf"
]
},
{
"task_type": "submit-application",
"doc_names": [
"test-pdf"
]
}
]
}
]
},
{
"client": "Jane Smith",
"contracts": [
{
"contract_id": "11111",
"documents": [
{
"task_type": "prove-employment",
"doc_names": [
"important-doc-pdf"
]
},
{
"task_type": "submit-application",
"doc_names": [
"paperwork-pdf",
"unimportant-pdf"
]
}
]
}
]
}
]
},
{
"broker": "bob#company.com",
"clients": [
{
"client": "John Doe",
"contracts": [
{
"contract_id": "66666",
"documents": [
{
"task_type": "submit-application",
"doc_names": [
"pdf-I-pdf"
]
}
]
},
{
"contract_id": "77777",
"documents": [
{
"task_type": "submit-application",
"doc_names": [
"pdf-J-pdf"
]
}
]
}
]
}
]
}
]
Here's a jq solution which uses a generic approach that makes no reference to specific header names except for the specification of certain plural forms.
The generic approach is encapsulated in the recursively defined filter nested_group_by($headers; $plural).
The main assumptions are:
The CSV input can be parsed by splitting on commas;
jq is invoked with the -R command-line option.
# Emit an array of arrays, each inner array being a group defined by a value of f,
# which can be any jq filter that produces exactly one value for each item in the input array.
def GROUP_BY(f):
reduce .[] as $x ({};
($x|f) as $s
| ($s|type) as $t
| (if $t == "string" then $s else ($s|tojson) end) as $y
| .[$t][$y] += [$x] )
| [.[][]]
;
def obj($headers):
. as $in | reduce range(0; $headers|length) as $i ({}; .[$headers[$i]] = $in[$i]);
def nested_group_by($array; $plural):
def plural: $plural[.] // (. + "s");
if $array == [] then .
elif $array|length == 1 then GROUP_BY(.[$array[0]]) | map(map(.[])[])
else ($array[1] | plural) as $groupkey
| $array[0] as $a0
| GROUP_BY(.[$a0])
| map( { ($a0): .[0][$a0], ($groupkey): map(del( .[$a0] )) } )
| map( .[$groupkey] |= nested_group_by($array[1:]; $plural) )
end
;
split(",") as $headers
| {contract_id: "contracts",
task_type: "documents",
doc_names: "doc_names" } as $plural
| [inputs
| split(",")
| obj($headers)
]
| nested_group_by($headers; $plural)
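Since the question mentions Python as a fallback, the recursive grouping idea can also be sketched there. This is a loose analogue of nested_group_by, not a translation of it: the function name nest is hypothetical, the CSV is parsed with the standard csv module, groups keep first-seen order, and the innermost level simply lists the remaining column in row order. Only a few of the question's rows are used as sample data.

```python
import csv
import io
import json

def nest(rows, keys, plural):
    """Group rows (dicts) by keys[0], then recurse on the remaining keys."""
    if len(keys) == 1:
        # Innermost level: just collect the values of the last column.
        return [row[keys[0]] for row in rows]
    head, rest = keys[0], keys[1:]
    groups = {}  # dicts preserve insertion order, so input order is respected
    for row in rows:
        groups.setdefault(row[head], []).append(row)
    # Name of the child list: an explicit plural, else the next key plus "s".
    child = plural.get(rest[0], rest[0] + "s")
    return [
        {head: value, child: nest(members, rest, plural)}
        for value, members in groups.items()
    ]

CSV_TEXT = """broker,client,contract_id,task_type,doc_names
alice#company.com,John Doe,33333,prove-employment,important-doc-pdf
alice#company.com,John Doe,33333,prove-employment,paperwork-pdf
alice#company.com,John Doe,33333,submit-application,blah-pdf
bob#company.com,John Doe,66666,submit-application,pdf-I-pdf
"""

reader = csv.DictReader(io.StringIO(CSV_TEXT))
rows = list(reader)
plural = {"contract_id": "contracts",
          "task_type": "documents",
          "doc_names": "doc_names"}
nested = nest(rows, reader.fieldnames, plural)
print(json.dumps(nested, indent=2))
```

As in the jq output shown earlier, contract_id stays a string; converting it to a number would be a separate step.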

insert a json file into json

I'd like to know a quick way to insert one JSON file into another.
$ cat source.json
{
"AWSEBDockerrunVersion": 2,
"containerDefinitions": [
{
"environment": [
{
"name": "SERVICE_MANIFEST",
"value": ""
},
{
"name": "SERVICE_PORT",
"value": "4321"
}
]
}
]
}
The SERVICE_MANIFEST is content of another json file
$ cat service_manifest.json
{
"connections": {
"port": "1234"
},
"name": "foo"
}
I tried to do it with this jq command
cat service_manifest.json |jq --arg SERVICE_MANIFEST - < source.json
But it doesn't seem to work.
Any ideas? The final result should still be a valid JSON file:
{
"AWSEBDockerrunVersion": 2,
"containerDefinitions": [
{
"environment": [
{
"name": "SERVICE_MANIFEST",
"value": {
"connections": {
"port": "1234"
},
"name": "foo"
}
},
...
]
}
],
...
}
Updates.
Thanks, here is the command I run from your sample.
$ jq --slurpfile sm service_manifest.json '.containerDefinitions[].environment[] |= (select(.name=="SERVICE_MANIFEST").value=$sm)' source.json
But the resulting value is an array, not an object.
{
"AWSEBDockerrunVersion": 2,
"containerDefinitions": [
{
"environment": [
{
"name": "SERVICE_MANIFEST",
"value": [
{
"connections": {
"port": "1234"
},
"name": "foo"
}
]
},
{
"name": "SERVICE_PORT",
"value": "4321"
}
]
}
]
}
You can try this jq command:
jq --slurpfile sm service_manifest.json '.containerDefinitions[].environment[] |= (select(.name=="SERVICE_MANIFEST").value=$sm[])' source.json
--slurpfile reads the file and binds the variable sm to an array of the JSON values it contains, which is why the filter uses $sm[] rather than $sm.
The filter visits every element of .containerDefinitions[].environment and sets .value to the content of the file only on the element whose name is SERVICE_MANIFEST; the other elements are left unchanged.
A simple solution would use --argfile and avoid select:
< source.json jq --argfile sm service_manifest.json '
.containerDefinitions[0].environment[0].value = $sm '
Or if you want only to update the object(s) with .name == "SERVICE_MANIFEST" you could use the filter:
.containerDefinitions[].environment
|= map(if .name == "SERVICE_MANIFEST"
then .value = $sm
else . end)
Variations
There is no need for any "--arg"-style parameter at all, as illustrated by the following:
jq -s '.[1] as $sm
| .[0] | .containerDefinitions[0].environment[0].value = $sm
' source.json service_manifest.json
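If jq is not available, the same substitution can be sketched in Python. This is only an illustration, not part of the answers above: the two documents are inlined as strings so the snippet is self-contained, whereas in practice they would be loaded from source.json and service_manifest.json.

```python
import json

# Inlined copies of the two files from the question.
source = json.loads("""
{
  "AWSEBDockerrunVersion": 2,
  "containerDefinitions": [
    {"environment": [
      {"name": "SERVICE_MANIFEST", "value": ""},
      {"name": "SERVICE_PORT", "value": "4321"}
    ]}
  ]
}
""")
manifest = json.loads('{"connections": {"port": "1234"}, "name": "foo"}')

# Replace .value only on the entries named SERVICE_MANIFEST.
for container in source["containerDefinitions"]:
    for entry in container["environment"]:
        if entry["name"] == "SERVICE_MANIFEST":
            entry["value"] = manifest

print(json.dumps(source, indent=2))
```

The manifest is inserted as a nested object, matching the desired output in the question, rather than as a string or a one-element array.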

How do I update a single value in a nested array of objects in a json document using jq?

I have a JSON document that looks like the following. Note this is a simplified example of the real JSON, which is included at bottom of question:
{
"some_array": [
{
"k1": "A",
"k2": "XXX"
},
{
"k1": "B",
"k2": "YYY"
}
]
}
I would like to change the value of all the k2 keys in the some_array array where the value of the k1 key is "B".
Is this possible using jq ?
For reference this is the actual JSON document, which is an environment variable file for use in postman / newman tool. I am attempting this conversion using JQ because the tool does not yet support command line overrides of specific environment variables
Actual JSON
{
"name": "Local-Stack-Env-Config",
"values": [
{
"enabled": true,
"key": "KC_master_host",
"type": "text",
"value": "http://localhost:8087"
},
{
"enabled": true,
"key": "KC_user_guid",
"type": "text",
"value": "11111111-1111-1111-1111-11111111111"
}
],
"timestamp": 1502768145037,
"_postman_variable_scope": "environment",
"_postman_exported_at": "2017-08-15T03:36:41.474Z",
"_postman_exported_using": "Postman/5.1.3"
}
Here is a slightly simpler version of zayquan's filter:
.some_array |= map(if .k1=="B" then .k2="changed" else . end)
Here's another solution.
jq '(.some_array[] | select(.k1 == "B") | .k2) |= "new_value"'
Output
{
"some_array": [
{
"k1": "A",
"k2": "XXX"
},
{
"k1": "B",
"k2": "new_value"
}
]
}
Here is a viable solution:
cat some.json | jq '.some_array = (.some_array | map(if .k1 == "B" then . + {"k2":"changed"} else . end))'
produces the output:
{
"some_array": [
{
"k1": "A",
"k2": "XXX"
},
{
"k1": "B",
"k2": "changed"
}
]
}
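The conditional update that these filters perform can be written out as a plain loop, which may help when reading the jq versions. A minimal Python sketch follows; the function name update_where and its parameters are hypothetical.

```python
import json

def update_where(items, match_key, match_value, target_key, new_value):
    """Return a copy of items with target_key replaced on matching objects."""
    return [
        {**item, target_key: new_value} if item.get(match_key) == match_value else item
        for item in items
    ]

doc = {"some_array": [{"k1": "A", "k2": "XXX"}, {"k1": "B", "k2": "YYY"}]}

# Same effect as: .some_array |= map(if .k1=="B" then .k2="new_value" else . end)
doc["some_array"] = update_where(doc["some_array"], "k1", "B", "k2", "new_value")
print(json.dumps(doc, indent=2))
```

For the Postman environment file in the question, the same call would presumably be made on doc["values"] with match_key "key" and target_key "value".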

jq get the value of x based on y in a complex json file

jq strikes again. Trying to get the value of DATABASES_DEFAULT based on the name in a json file that has a whole lot of names and I'm completely lost.
My file looks like the following (output of an aws ecs describe-task-definition) only much more complex; I've stripped this to the most basic example I can where the structure is still intact.
{
"taskDefinition": {
"status": "bar",
"family": "bar2",
"volumes": [],
"taskDefinitionArn": "bar3",
"containerDefinitions": [
{
"dnsSearchDomains": [],
"environment": [
{
"name": "bar4",
"value": "bar5"
},
{
"name": "bar6",
"value": "bar7"
},
{
"name": "DATABASES_DEFAULT",
"value": "foo"
}
],
"name": "baz",
"links": []
},
{
"dnsSearchDomains": [],
"environment": [
{
"name": "bar4",
"value": "bar5"
},
{
"name": "bar6",
"value": "bar7"
},
{
"name": "DATABASES_DEFAULT",
"value": "foo2"
}
],
"name": "boo",
"links": []
}
],
"revision": 1
}
}
I need the value of DATABASES_DEFAULT where the name is baz. Note that there are a lot of keypairs with name, I'm specifically talking about the one outside of environment.
I've been tinkering with this but only got this far before realizing that I don't understand how to access nested values.
jq '.[] | select(.name==DATABASES_DEFAULT) | .value'
which is returning
jq: error: DATABASES_DEFAULT/0 is not defined at <top-level>, line 1:
.[] | select(.name==DATABASES_DEFAULT) | .value
jq: 1 compile error
Obviously this a) doesn't work, and b) even if it did, it's independent of the name value. My thought was to return all the db defaults and then identify the one with baz, but I don't know if that's the right approach.
I like to think of it as digging down into the structure, so first you open the outer layers:
.taskDefinition.containerDefinitions[]
Now select the one you want:
select(.name =="baz")
Open the inner structure:
.environment[]
Select the desired object:
select(.name == "DATABASES_DEFAULT")
Choose the key you want:
.value
Taken together:
parse.jq
.taskDefinition.containerDefinitions[] |
select(.name =="baz") |
.environment[] |
select(.name == "DATABASES_DEFAULT") |
.value
Run it like this:
<infile jq -f parse.jq
Output:
"foo"
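The "dig down, select, dig down, select" steps above map directly onto nested loops. As a hedged illustration in Python (the function name env_value is hypothetical, and the document is trimmed to the fields the lookup needs):

```python
def env_value(doc, container_name, variable_name):
    """Collect the value of variable_name in each container named container_name."""
    return [
        entry["value"]
        for container in doc["taskDefinition"]["containerDefinitions"]  # open the outer layers
        if container["name"] == container_name                           # select the container
        for entry in container["environment"]                            # open the inner structure
        if entry["name"] == variable_name                                # select the desired object
    ]

doc = {
    "taskDefinition": {
        "containerDefinitions": [
            {"name": "baz", "environment": [
                {"name": "bar4", "value": "bar5"},
                {"name": "DATABASES_DEFAULT", "value": "foo"},
            ]},
            {"name": "boo", "environment": [
                {"name": "bar4", "value": "bar5"},
                {"name": "DATABASES_DEFAULT", "value": "foo2"},
            ]},
        ]
    }
}

print(env_value(doc, "baz", "DATABASES_DEFAULT"))
```

A list is returned because, like the jq filter, the lookup can match zero, one, or several entries.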
The following seems to work:
.taskDefinition.containerDefinitions[] |
select(
select(
.environment[] | .name == "DATABASES_DEFAULT"
).name == "baz"
)
The output is the object with the name key mapped to "baz".
$ jq '.taskDefinition.containerDefinitions[] | select(select(.environment[]|.name == "DATABASES_DEFAULT").name=="baz")' tmp.json
{
"dnsSearchDomains": [],
"environment": [
{
"name": "bar4",
"value": "bar5"
},
{
"name": "bar6",
"value": "bar7"
},
{
"name": "DATABASES_DEFAULT",
"value": "foo"
}
],
"name": "baz",
"links": []
}