How can I merge matching keys into arrays via another key? - json

I have a GraphQL schema file with deeply nested object metadata that I'd like to extract into arrays of child properties. The original file is over 75000 lines long but I was able to successfully extract the Types & fields for each object using this command:
jq '.data.__schema.types[] | {name: .name, fields: .fields[]?.name?}' schema.json > output.json
Output:
{
"name": "UsersConnection",
"fields": "nodes"
}
{
"name": "UsersConnection",
"fields": "edges"
}
{
"name": "UsersConnection",
"fields": "pageInfo"
}
{
"name": "UsersConnection",
"fields": "totalCount"
}
{
"name": "UsersEdge",
"fields": "cursor"
}
{
"name": "UsersEdge",
"fields": "node"
}
...
But the output I want looks more like this:
[{
"name": "UsersConnection",
"fields": [ "nodes", "edges", "pageInfo", "totalCount" ]
},
{
"name": "UsersEdge",
"fields": [ "cursor", "node" ]
}]
I was able to do this by comma-separating each object, surrounding the output with { "data": [ -OUTPUT- ]} & the command:
jq 'map(. |= (group_by(.name) | map(first + {fields: map(.fields)})))' output.json > output2.json
How can I do this with a single command?

Assuming .data.__schema.types is an array, and so is .fields, you could try map in both cases:
.data.__schema.types | map({name: .name, fields: (.fields | map(.name))})
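To try the map-based filter without the full 75000-line file, here is a minimal sketch against a tiny made-up schema. The sample data is hypothetical (type and field names are borrowed from the question), and the `// []` fallback is an addition of mine, not part of the answer above: it is only there in case some types carry `"fields": null`, on which a bare `map` would fail.

```shell
# Hypothetical stand-in for schema.json; "Boolean" models a type without fields.
schema='{"data":{"__schema":{"types":[
  {"name":"UsersConnection","fields":[{"name":"nodes"},{"name":"edges"}]},
  {"name":"UsersEdge","fields":[{"name":"cursor"},{"name":"node"}]},
  {"name":"Boolean","fields":null}
]}}}'

# Build one array of {name, fields} objects.
# "// []" substitutes an empty array when .fields is null.
result=$(printf '%s' "$schema" | jq -c '
  .data.__schema.types
  | map({name: .name, fields: ((.fields // []) | map(.name))})')

printf '%s\n' "$result"
```

Because everything happens inside one `map`, the result is already a single JSON array, so no manual comma-separating or wrapping is needed afterwards.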

I totally missed that I could wrap the fields expression in brackets to collect the names into an array, like this:
jq '.data.__schema.types[] | {name: .name, fields: [.fields[]?.name?]}'
Keeping this up for posterity in case someone else is trying to do the same thing.
Update: I was able to get a cleaner, comma-separated result like this:
jq 'reduce .data.__schema.types[] as $d (null; .[$d.name] += [$d.fields[]?.name?])'
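Note that the reduce version produces a different shape: one object keyed by type name rather than an array of {name, fields} objects. A small sketch on made-up sample data (the schema below is hypothetical and much smaller than the real file):

```shell
# Hypothetical stand-in for schema.json.
schema='{"data":{"__schema":{"types":[
  {"name":"UsersConnection","fields":[{"name":"nodes"},{"name":"edges"}]},
  {"name":"UsersEdge","fields":[{"name":"cursor"},{"name":"node"}]}
]}}}'

# Start from null and append each type's field names under its own key.
result=$(printf '%s' "$schema" | jq -c '
  reduce .data.__schema.types[] as $d (null; .[$d.name] += [$d.fields[]?.name?])')

printf '%s\n' "$result"
```

Each type name becomes a key and its field names become the array stored under that key, so a lookup like `.UsersEdge` on the result returns the fields directly.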

Related

How to get a key/value pair on a sibling object in a JSON array conditionally

I'm interacting with an API that returns an array of objects related to a product in JSON format:
[
{
"type": "category",
"name": "food"
},
{
"type": "category",
"name": "fruit"
},
{
"type": "barcode",
"name": "123456"
}
]
I'm trying to use the jq tool in bash, in the shortest and tidiest form, to check whether a product has a barcode and is categorized as food. In other words, check if an object with type=barcode exists in the array, then check if there is also an object with type=category and name=food.
This outputs true or false based on your requirements:
any(.type=="barcode") and any(.type=="category" and .name=="food")
jq -e will set the exit code of the program accordingly:
if jq -e '...'; then
...
fi
Without -e:
if test "$(jq '...')" = true; then
...
fi
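Put together with the sample array from the question, the -e variant could look like the sketch below. The variable names are only illustrative.

```shell
# Sample API response from the question, inlined for the example.
products='[{"type":"category","name":"food"},
           {"type":"category","name":"fruit"},
           {"type":"barcode","name":"123456"}]'

# jq -e exits with status 0 when the last output is neither false nor null,
# so the filter's boolean result drives the shell conditional directly.
if printf '%s' "$products" |
   jq -e 'any(.type=="barcode") and any(.type=="category" and .name=="food")' >/dev/null
then
  verdict="has barcode and is food"
else
  verdict="no match"
fi

printf '%s\n' "$verdict"
```

Redirecting stdout to /dev/null keeps the literal true/false out of the script's output, since only the exit status is of interest here.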
And not necessarily shorter or tidier, but semantically easier to follow:
group_by(.type)
| map({
key: first.type,
value: map(.name)
})
| from_entries
| .category as $cat
| .barcode and ("food"|IN($cat[]))
This first builds an intermediate object of the form:
{
"barcode": [
"123456"
],
"category": [
"food",
"fruit"
]
}
Which you can then query, e.g. does it have a barcode and is "food" one of the categories.
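The two stages can also be run separately to inspect the intermediate object. This is a sketch on the question's sample data; it assumes a jq version that provides the IN builtin, and the variable names are made up for the example.

```shell
products='[{"type":"category","name":"food"},
           {"type":"category","name":"fruit"},
           {"type":"barcode","name":"123456"}]'

# Stage 1: build the lookup object, one key per type.
lookup=$(printf '%s' "$products" | jq -c '
  group_by(.type)
  | map({key: first.type, value: map(.name)})
  | from_entries')

# Stage 2: query the lookup object.
verdict=$(printf '%s' "$lookup" | jq '
  .category as $cat
  | .barcode and ("food" | IN($cat[]))')

printf '%s\n%s\n' "$lookup" "$verdict"
```

One thing to watch for: if the input has no category entries at all, `$cat` is null and `$cat[]` raises an error, so this form assumes at least one category is present.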

how to denormalise this json structure

I have a json formatted overview of backups, generated using pgbackrest. For simplicity I removed a lot of clutter so the main structures remain. The list can contain multiple backup structures, I reduced here to just 1 for simplicity.
[
{
"backup": [
{
"archive": {
"start": "000000090000000200000075",
"stop": "000000090000000200000075"
},
"info": {
"size": 1200934840
},
"label": "20220103-122051F",
"type": "full"
},
{
"archive": {
"start": "00000009000000020000007D",
"stop": "00000009000000020000007D"
},
"info": {
"size": 1168586300
},
"label": "20220103-153304F_20220104-081304I",
"type": "incr"
}
],
"name": "dbname1"
}
]
Using jq I tried to generate a simpler format out of this, so far without any luck.
What I would like to see is the backup.archive, backup.info, backup.label, backup.type, name combined in one simple structure, without getting into a cartesian product. I would be very happy to get the following output:
[
{
"backup": [
{
"archive": {
"start": "000000090000000200000075",
"stop": "000000090000000200000075"
},
"name": "dbname1",
"info": {
"size": 1200934840
},
"label": "20220103-122051F",
"type": "full"
},
{
"archive": {
"start": "00000009000000020000007D",
"stop": "00000009000000020000007D"
},
"name": "dbname1",
"info": {
"size": 1168586300
},
"label": "20220103-153304F_20220104-081304I",
"type": "incr"
}
]
}
]
where name is redundantly added to the list. How can I use jq to convert the shown input to the requested output? In the end I just want to generate a simple csv from the data. Even with the simplified structure using
'.[].backup[].name + ":" + .[].backup[].type'
I get a cartesian product:
"dbname1:full"
"dbname1:full"
"dbname1:incr"
"dbname1:incr"
How can I solve that?
So, for each object in the top-level array you want to pull in .name into each of its .backup array's elements, right? Then try
jq 'map(.backup[] += {name} | del(.name))'
Then, generating CSV output with jq is easy: there is a builtin called @csv which turns an array into a string of its values, quoted if they are strings and separated by commas. So all you need to do is compose your desired values into one array per row. At this point, removing .name is no longer necessary, as we are piecing together the array for the CSV output anyway. The -r flag makes jq emit raw text rather than JSON-encoded strings.
jq -r '.[]
| .backup[] + {name}
| [(.archive | .start, .stop), .name, .info.size, .label, .type]
| @csv
'
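As a self-contained check, the CSV pipeline can be run against the sample input from the question. This is only a sketch: the JSON is inlined into a shell variable here rather than read from a file.

```shell
# The question's sample input, compacted.
backups='[{"name":"dbname1","backup":[
  {"archive":{"start":"000000090000000200000075","stop":"000000090000000200000075"},
   "info":{"size":1200934840},"label":"20220103-122051F","type":"full"},
  {"archive":{"start":"00000009000000020000007D","stop":"00000009000000020000007D"},
   "info":{"size":1168586300},"label":"20220103-153304F_20220104-081304I","type":"incr"}
]}]'

# One CSV row per backup; {name} copies the parent's name into each row.
csv=$(printf '%s' "$backups" | jq -r '
  .[]
  | .backup[] + {name}
  | [(.archive | .start, .stop), .name, .info.size, .label, .type]
  | @csv')

printf '%s\n' "$csv"
# "000000090000000200000075","000000090000000200000075","dbname1",1200934840,"20220103-122051F","full"
# "00000009000000020000007D","00000009000000020000007D","dbname1",1168586300,"20220103-153304F_20220104-081304I","incr"
```

Strings are quoted and the numeric size is left bare, which is what most CSV consumers expect.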
First navigate to backup and only then “print” the stuff you’re interested in.
.[].backup[] | .name + ":" + .type
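To see why the original attempt multiplies rows, compare both filters on a cut-down version of the simplified structure. The sample below is hypothetical and keeps only the name and type keys.

```shell
# Simplified structure: name already lives inside each backup entry.
flat='[{"backup":[{"name":"dbname1","type":"full"},
                  {"name":"dbname1","type":"incr"}]}]'

# Two independent iterations over .[].backup[]: every name is paired
# with every type, giving 2 x 2 = 4 results.
product=$(printf '%s' "$flat" | jq -r '.[].backup[].name + ":" + .[].backup[].type')

# One iteration; .name and .type are read from the same element.
paired=$(printf '%s' "$flat" | jq -r '.[].backup[] | .name + ":" + .type')

printf 'cartesian:\n%s\n\npaired:\n%s\n' "$product" "$paired"
```

Each appearance of `.[].backup[]` in an expression is its own generator, so iterating once and then reading several keys from `.` is what keeps the values aligned.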

Pair value with all products of a filter

I'm new to jq and I have a json response from a get request that looks like:
[
{
"vs": {
"name": "vs_name",
"pool": {
"p_id_name": "XYZ",
"members": [
{
"m_name": "XXX1",
"id_name": "YYY1",
"address": "ZZZ1"
},
{
"m_name": "XXX2",
"id_name": "YYY2",
"address": "ZZZ2"
}
]
}
}
}
]
I'm trying to get an output that looks like this (repeating the p_id_name for each m_name):
XYZ, XXX1
XYZ, XXX2
I tried the following but it didn't work.
$ jq '.[].vs.pool|[.members[].m_name,.p_id_name]' file
[
"XXX1",
"XXX2",
"XYZ"
]
Between square brackets, all products are collected into a single array. String interpolation doesn't have this effect.
.[].vs.pool | "\(.p_id_name), \(.members[].m_name)"
If you want to output arrays, you need to create a separate array for each m_name.
.[].vs.pool | [.p_id_name] + (.members[] | [.m_name])
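Both variants can be tried locally with a trimmed copy of the response. This is a sketch; the sample keeps only the keys the filters touch.

```shell
# Trimmed, hypothetical copy of the response from the question.
response='[{"vs":{"name":"vs_name","pool":{"p_id_name":"XYZ",
  "members":[{"m_name":"XXX1"},{"m_name":"XXX2"}]}}}]'

# String interpolation: one output string per member.
lines=$(printf '%s' "$response" |
  jq -r '.[].vs.pool | "\(.p_id_name), \(.members[].m_name)"')

# Array form: one two-element array per member.
arrays=$(printf '%s' "$response" |
  jq -c '.[].vs.pool | [.p_id_name] + (.members[] | [.m_name])')

printf '%s\n%s\n' "$lines" "$arrays"
# XYZ, XXX1
# XYZ, XXX2
# ["XYZ","XXX1"]
# ["XYZ","XXX2"]
```

In both cases `.members[]` emits once per member and the surrounding expression is re-evaluated for each of them, which is what repeats the p_id_name.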

jq: Insert values according to mappings from external file

I was wondering how I can complete this task with command-line jq. I made up a file with a similar nested structure, as follows:
{
"item": "item1",
"features": [
{
"feature": "feature_a",
"value": ""
},
{
"feature": "feature_b",
"value": ""
}
]
}
Now I have another file that maps the feature to value:
feature_a value_1
feature_b value_2
So I would like to insert the values into the first JSON file according to the mapping, resulting in the following output:
{
"item": "item1",
"features": [
{
"feature": "feature_a",
"value": "value_1"
},
{
"feature": "feature_b",
"value": "value_2"
}
]
}
How can I achieve the above operation with jq?
Thanks in advance!
Assuming the text file is in dict.txt and the JSON file is in source.json, the invocation
jq -Rs --argfile target source.json '
([ split("\n")[]
| select(length>0)
| split(" ")
| { (.[0]): .[1]} ]
| add) as $dict
| $target
| .features |= map(.value = $dict[.feature])' dict.txt
would yield the desired output.
The main reason for including select(length>0) is to skip any empty strings that might result from using split("\n") to split an entire file.
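A variant of the same idea, sketched with --rawfile (available in jq 1.6 and later) so that the JSON file stays the main input and the mapping file is passed in as a string. This is not the invocation from the answer above, just an alternative arrangement; the temporary directory and the `$lookup` variable name are made up for the example.

```shell
# Recreate the two files from the question in a scratch directory.
workdir=$(mktemp -d)
printf 'feature_a value_1\nfeature_b value_2\n' > "$workdir/dict.txt"
printf '%s\n' '{"item":"item1","features":[{"feature":"feature_a","value":""},{"feature":"feature_b","value":""}]}' > "$workdir/source.json"

# $dict holds the raw text of dict.txt; turn it into a {feature: value}
# object, then fill in each feature's value from it.
result=$(jq -c --rawfile dict "$workdir/dict.txt" '
  ($dict
   | split("\n")
   | map(select(length > 0) | split(" ") | {(.[0]): .[1]})
   | add) as $lookup
  | .features |= map(.value = $lookup[.feature])' "$workdir/source.json")

rm -rf "$workdir"
printf '%s\n' "$result"
```

As in the answer, select(length > 0) drops the empty string left behind by the trailing newline. Features missing from the mapping file would end up with a null value in this sketch.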

Update inner attribute of JSON with jq

Could somebody help me to deal with jq command line utility to update JSON object's inner value?
I want to alter the object interpreterSettings.2B188AQ5T.properties by adding several key-value pairs, like "spark.executor.instances": "16".
So far I only managed to fully replace this object, not add new properties with command:
cat test.json | jq ".interpreterSettings.\"2B188AQ5T\".properties |= { \"spark.executor.instances\": \"16\" }"
This is input JSON:
{
"interpreterSettings": {
"2B263G4Z1": {
"id": "2B263G4Z1",
"name": "sh",
"group": "sh",
"properties": {}
},
"2B188AQ5T": {
"id": "2B188AQ5T",
"name": "spark",
"group": "spark",
"properties": {
"spark.cores.max": "",
"spark.yarn.jar": "",
"master": "yarn-client",
"zeppelin.spark.maxResult": "1000",
"zeppelin.dep.localrepo": "local-repo",
"spark.app.name": "Zeppelin",
"spark.executor.memory": "2560M",
"zeppelin.spark.useHiveContext": "true",
"spark.home": "/usr/lib/spark",
"zeppelin.spark.concurrentSQL": "false",
"args": "",
"zeppelin.pyspark.python": "python"
}
}
},
"interpreterBindings": {
"2AXUMXYK4": [
"2B188AQ5T",
"2AY8SDMRU"
]
}
}
I also tried the following, but it only prints the contents of interpreterSettings.2B188AQ5T.properties, not the full object.
cat test.json | jq ".interpreterSettings.\"2B188AQ5T\".properties + { \"spark.executor.instances\": \"16\" }"
The following works using jq 1.4 or jq 1.5 with a Mac/Linux shell:
jq '.interpreterSettings."2B188AQ5T".properties."spark.executor.instances" = "16" ' test.json
If you have trouble adapting the above for Windows, I'd suggest putting the jq program in a file, say my.jq, and invoking it like so:
jq -f my.jq test.json
Notice that there is no need to use "cat" in this case.
p.s. You were on the right track - try replacing |= with +=
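For adding several properties at once, the += form might look like this sketch. The input is a trimmed copy of the question's JSON, and "spark.executor.cores" is a hypothetical second key used only to show a multi-key merge.

```shell
# Trimmed copy of test.json, keeping two of the existing properties.
settings='{"interpreterSettings":{"2B188AQ5T":{"id":"2B188AQ5T",
  "properties":{"master":"yarn-client","spark.executor.memory":"2560M"}}}}'

# += merges the right-hand object into the existing properties and
# returns the whole document, not just the updated part.
updated=$(printf '%s' "$settings" | jq -c '
  .interpreterSettings."2B188AQ5T".properties += {
    "spark.executor.instances": "16",
    "spark.executor.cores": "2"
  }')

printf '%s\n' "$updated"
```

Existing keys such as master are preserved, whereas |= with an object literal replaces the properties object wholesale, which is the behavior described in the question.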