jq: how to filter nested keys?

I have done a lot of research on Stack Overflow but cannot find any related post.
Assume I have JSON like
{
  "talk": {
    "docs": {
      "count": 22038185,
      "deleted": 626193
    },
    "store": {
      "size_in_bytes": 6885993125,
      "throttle_time_in_millis": 1836569
    }
  },
  "list": {
    "docs": {
      "count": 22038185,
      "deleted": 626193
    },
    "store": {
      "size_in_bytes": 6885993125,
      "throttle_time_in_millis": 1836569
    }
  }
}
I want to remove the "store" field from every top-level key, to get output like
{
  "talk": {
    "docs": {
      "count": 22038185,
      "deleted": 626193
    }
  },
  "list": {
    "docs": {
      "count": 22038185,
      "deleted": 626193
    }
  }
}
How can I achieve it with jq?

Use del and recurse together.
jq 'del(recurse|.store?)' foo.json
You can also use the shorthand .. for recurse with no arguments:
jq 'del(..|.store?)' foo.json
The ? prevents errors when recurse reaches something for which .store is an invalid filter.
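As a quick sanity check, the recursive deletion can be run on a trimmed-down version of the input (numbers shortened here for brevity):

```shell
# Delete every "store" key, wherever it occurs in the structure.
echo '{"talk":{"docs":{"count":1},"store":{"size":2}},"list":{"docs":{"count":1},"store":{"size":2}}}' \
  | jq -c 'del(..|.store?)'
# → {"talk":{"docs":{"count":1}},"list":{"docs":{"count":1}}}
```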

If you only want to remove the "store" key when it occurs at the second level, then consider:
map_values( del(.store) )
Postscript
Subsequently, the OP asked:
But what if the deleted fields are many? can we only keep 'docs'
Answer (in this particular case):
map_values( {docs} )
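To see why this works on a small input (field values shortened here): `{docs}` is shorthand for `{docs: .docs}`, so each top-level value is replaced by an object containing only its docs field.

```shell
# Keep only the "docs" field under each top-level key.
echo '{"talk":{"docs":1,"store":2,"other":3},"list":{"docs":4,"store":5}}' \
  | jq -c 'map_values({docs})'
# → {"talk":{"docs":1},"list":{"docs":4}}
```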

Related

how to denormalise this json structure

I have a JSON-formatted overview of backups, generated using pgbackrest. For simplicity I removed a lot of clutter so only the main structures remain; the list can contain multiple backup structures, but I reduced it to just one here.
[
  {
    "backup": [
      {
        "archive": {
          "start": "000000090000000200000075",
          "stop": "000000090000000200000075"
        },
        "info": {
          "size": 1200934840
        },
        "label": "20220103-122051F",
        "type": "full"
      },
      {
        "archive": {
          "start": "00000009000000020000007D",
          "stop": "00000009000000020000007D"
        },
        "info": {
          "size": 1168586300
        },
        "label": "20220103-153304F_20220104-081304I",
        "type": "incr"
      }
    ],
    "name": "dbname1"
  }
]
Using jq I tried to generate a simpler format from this, so far without any luck.
What I would like to see is the backup.archive, backup.info, backup.label, backup.type, name combined in one simple structure, without getting into a cartesian product. I would be very happy to get the following output:
[
  {
    "backup": [
      {
        "archive": {
          "start": "000000090000000200000075",
          "stop": "000000090000000200000075"
        },
        "name": "dbname1",
        "info": {
          "size": 1200934840
        },
        "label": "20220103-122051F",
        "type": "full"
      },
      {
        "archive": {
          "start": "00000009000000020000007D",
          "stop": "00000009000000020000007D"
        },
        "name": "dbname1",
        "info": {
          "size": 1168586300
        },
        "label": "20220103-153304F_20220104-081304I",
        "type": "incr"
      }
    ]
  }
]
where name is redundantly added to the list. How can I use jq to convert the shown input to the requested output? In the end I just want to generate a simple csv from the data. Even with the simplified structure using
'.[].backup[].name + ":" + .[].backup[].type'
I get a cartesian product:
"dbname1:full"
"dbname1:full"
"dbname1:incr"
"dbname1:incr"
how to solve that?
So, for each object in the top-level array you want to pull in .name into each of its .backup array's elements, right? Then try
jq 'map(.backup[] += {name} | del(.name))'
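For instance, on a single-object input (values shortened here), the filter copies name into each backup element and then drops the now-redundant top-level key:

```shell
echo '[{"backup":[{"type":"full"},{"type":"incr"}],"name":"dbname1"}]' \
  | jq -c 'map(.backup[] += {name} | del(.name))'
# → [{"backup":[{"type":"full","name":"dbname1"},{"type":"incr","name":"dbname1"}]}]
```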
Then, generating CSV output using jq is easy: there is a builtin called @csv which transforms an array into a comma-separated string of its values (quoting the stringy ones). So all you need to do is iteratively compose your desired values into arrays. At this point, removing .name is no longer necessary, as we are piecing together the array for CSV output anyway. We also give jq the -r flag to make the output raw text rather than JSON.
jq -r '.[]
| .backup[] + {name}
| [(.archive | .start, .stop), .name, .info.size, .label, .type]
| @csv
'
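Run against a minimal one-backup input (values shortened here), the pipeline yields one CSV line per backup element; note that @csv quotes string values:

```shell
echo '[{"backup":[{"archive":{"start":"A","stop":"B"},"info":{"size":1},"label":"L1","type":"full"}],"name":"dbname1"}]' \
  | jq -r '.[] | .backup[] + {name}
           | [(.archive | .start, .stop), .name, .info.size, .label, .type]
           | @csv'
# → "A","B","dbname1",1,"L1","full"
```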
First navigate to backup and only then “print” the stuff you’re interested in:
.[].backup[] | .name + ":" + .type

Sorting strings in an array with jq

I have two JSON files that have the same content, only in a different order, and they are to be checked for equality with a diff.
I already sort the keys with jq -S, but now I have to make sure that the strings within the arrays are sorted consistently as well.
Unfortunately I'm stuck at the moment; it's not quite clear to me how to get to the right level and how to sort the content.
Here is an example structure of the JSON; the array 'allowed-test-mapper-data' should be sorted in descending order:
{
  "accessCodeLife": 60,
  "accessCodeLifespan": 1800,
  "accessCodeType": 300,
  "components": {
    "test.data.app": [
      {
        "config": {
          "allow-default-test-scopes": [
            "true"
          ]
        },
        "name": "Allowed Test Client",
        "id": "allowed-testdata",
        "subComponents": {},
        "subType": "testdata"
      },
      {
        "config": {
          "allowed-test-mapper-data": [
            "alfred",
            "usa",
            "canada",
            "somedata",
            "alcohol",
            "brother"
          ]
        }
      }
    ]
  }
}
Can someone help me here? Would be great :)
Use the update assignment |= operator to change a part of the structure:
jq '.components."test.data.app"[].config."allowed-test-mapper-data"
|= if . then sort else empty end' file.json
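The `if . then sort else empty end` guard skips array elements whose config lacks the key. Note that `sort` sorts ascending; for the descending order mentioned above, `sort | reverse` could be used instead. A minimal run (input shortened here to the one element that has the key):

```shell
echo '{"components":{"test.data.app":[{"config":{"allowed-test-mapper-data":["usa","canada","alfred"]}}]}}' \
  | jq -c '.components."test.data.app"[].config."allowed-test-mapper-data"
           |= if . then sort else empty end'
# → {"components":{"test.data.app":[{"config":{"allowed-test-mapper-data":["alfred","canada","usa"]}}]}}
```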

Using `jq` to add key/value to a json file using another json file as a source

Been struggling with this for a while and I'm no closer to a solution. I'm not very experienced using jq.
I'd like to take the values from one json file and add them to another file when other values in the dict match. The example files below demonstrate what I'd like more clearly than an explanation.
hosts.json:
{
  "hosts": [
    {
      "host": "hosta.example.com",
      "hostid": "101",
      "proxy_hostid": "1"
    },
    {
      "host": "hostb.example.com",
      "hostid": "102",
      "proxy_hostid": "1"
    },
    {
      "host": "hostc.example.com",
      "hostid": "103",
      "proxy_hostid": "2"
    }
  ]
}
proxies.json:
{
  "proxies": [
    {
      "host": "proxy1.example.com",
      "proxyid": "1"
    },
    {
      "host": "proxy2.example.com",
      "proxyid": "2"
    }
  ]
}
I also have the above file available with proxyid as the key, if this makes it easier:
{
  "proxies": {
    "1": {
      "host": "proxy1.example.com",
      "proxyid": "1"
    },
    "2": {
      "host": "proxy2.example.com",
      "proxyid": "2"
    }
  }
}
Using these json files above (from the Zabbix API), I'd like to add the value of .proxies[].host (from proxies.json) as .hosts[].proxy_host (to hosts.json).
This would only be when .hosts[].proxy_hostid equals .proxies[].proxyid
Desired output:
{
  "hosts": [
    {
      "host": "hosta.example.com",
      "hostid": "101",
      "proxy_hostid": "1",
      "proxy_host": "proxy1.example.com"
    },
    {
      "host": "hostb.example.com",
      "hostid": "102",
      "proxy_hostid": "1",
      "proxy_host": "proxy1.example.com"
    },
    {
      "host": "hostc.example.com",
      "hostid": "103",
      "proxy_hostid": "2",
      "proxy_host": "proxy2.example.com"
    }
  ]
}
I've tried many different ways of doing this, and think I need to use jq -s or jq --slurpfile, but I've reached a lot of dead-ends and can't find a solution.
jq 'input as $p | map(.[].proxy_host = $p.proxies[].proxyid)' hosts.json proxies.json
I think I would need something like this as well, but not sure how to use it.
if .hosts[].proxy_hostid == .proxies[].proxyid then .hosts[].proxy_host = .proxies[].host else empty end'
I've found these questions but they haven't helped :(
How do I use a value as a key reference in jq? <- I think this one is the closest
Lookup values from one JSON file and replace in another
Using jq find key/value pair based on another key/value pair
This indeed is easier with the alternative version of your proxies.json. All you need to do is store the proxies in a variable as a reference, and retrieve the proxy hosts from it while updating the hosts.
jq 'input as { $proxies } | .hosts[] |= . + { proxy_host: $proxies[.proxy_hostid].host }' hosts.json proxies.json
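Since jq reads a stream of inputs, the same filter can be tried without separate files by feeding both documents on stdin: the first becomes `.` and `input` consumes the second. (Destructuring an object with `{ $proxies }` requires jq 1.6 or later.) Input shortened here to one host and one proxy:

```shell
printf '%s\n%s\n' \
  '{"hosts":[{"host":"hosta.example.com","hostid":"101","proxy_hostid":"1"}]}' \
  '{"proxies":{"1":{"host":"proxy1.example.com","proxyid":"1"}}}' \
  | jq -c 'input as { $proxies }
           | .hosts[] |= . + { proxy_host: $proxies[.proxy_hostid].host }'
# → {"hosts":[{"host":"hosta.example.com","hostid":"101","proxy_hostid":"1","proxy_host":"proxy1.example.com"}]}
```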

jq - retrieve values from json table on one line for specific columns

I'm trying to get cell values from a JSON-formatted table, but only for specific columns, and have each row output as its own object.
json example -
{
  "rows": [
    {
      "id": 409363222161284,
      "rowNumber": 1,
      "cells": [
        {
          "columnId": "nameColumn",
          "value": "name1"
        },
        {
          "columnId": "infoColumn",
          "value": "info1"
        },
        {
          "columnId": "excessColumn",
          "value": "excess1"
        }
      ]
    },
    {
      "id": 11312541213,
      "rowNumber": 2,
      "cells": [
        {
          "columnId": "nameColumn",
          "value": "name2"
        },
        {
          "columnId": "infoColumn",
          "value": "info2"
        },
        {
          "columnId": "excessColumn",
          "value": "excess2"
        }
      ]
    },
    {
      "id": 11312541213,
      "rowNumber": 3,
      "cells": [
        {
          "columnId": "nameColumn",
          "value": "name3"
        },
        {
          "columnId": "infoColumn",
          "value": "info3"
        },
        {
          "columnId": "excessColumn",
          "value": "excess3"
        }
      ]
    }
  ]
}
Ideal output would be filtered by two columns - nameColumn, infoColumn - with each row being a single line of the values.
Output example -
{
  "name": "name1",
  "info": "info1"
}
{
  "name": "name2",
  "info": "info2"
}
{
  "name": "name3",
  "info": "info3"
}
I've tried quite a few different combinations of things with select statements and this is the closest I've come but it only uses one.
jq '.rows[].cells[] | {name: (select(.columnId=="nameColumn") .value), info: "infoHereHere"}'
{
  "name": "name1",
  "info": "infoHere"
}
{
  "name": "name2",
  "info": "infoHere"
}
{
  "name": "name3",
  "info": "infoHere"
}
If I try to combine another one, it's not so happy.
jq -j '.rows[].cells[] | {name: (select(.columnId=="nameColumn") .value), info: (select(.columnId=="infoColumn") .value)}'
Nothing is output.
**Edit**
Apologies for being unclear with this. The final output would ideally be a CSV of the selected columns' values:
name1,info1
name2,info2
Presumably you would want the output to be grouped by row, so let's first consider:
.rows[].cells
| map(select(.columnId=="nameColumn" or .columnId=="infoColumn"))
This produces a stream of JSON arrays, the first of which using your main example would be:
[
  {
    "columnId": "nameColumn",
    "value": "name1"
  },
  {
    "columnId": "infoColumn",
    "value": "info1"
  }
]
If you want the output in some alternative format, then you could tweak the above jq program accordingly.
If you wanted to select a large number of columns, the use of a long "or" expression might become unwieldy, so you might also want to consider using a "whitelist". See e.g. Whitelisting objects using select
Or you might want to use del to delete the unwanted columns.
Producing CSV
One way would be to use @csv with the -r command-line option, e.g. by extending the filter above:
.rows[].cells
| map(select(.columnId=="nameColumn" or .columnId=="infoColumn")
  | {(.columnId): .value} )
| add
| [.nameColumn, .infoColumn]
| @csv
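Putting the pieces together as one command over the original input (shortened here to a single row):

```shell
echo '{"rows":[{"cells":[{"columnId":"nameColumn","value":"name1"},{"columnId":"infoColumn","value":"info1"},{"columnId":"excessColumn","value":"x"}]}]}' \
  | jq -r '.rows[].cells
           | map(select(.columnId=="nameColumn" or .columnId=="infoColumn")
             | {(.columnId): .value})
           | add
           | [.nameColumn, .infoColumn]
           | @csv'
# → "name1","info1"
```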

Collect JSON objects with same attribute value, and create new key/value pairs

Here is a simplified sample of the JSON data I'm working with:
[
  {
    "certname": "one.example.com",
    "name": "fact1",
    "value": "value1"
  },
  {
    "certname": "one.example.com",
    "name": "fact2",
    "value": 42
  },
  {
    "certname": "two.example.com",
    "name": "fact1",
    "value": "value3"
  },
  {
    "certname": "two.example.com",
    "name": "fact2",
    "value": 10000
  },
  {
    "certname": "two.example.com",
    "name": "fact3",
    "value": { "anotherkey": "anothervalue" }
  }
]
The result I want to achieve, using jq preferably, is the following:
[
  {
    "certname": "one.example.com",
    "fact1": "value1",
    "fact2": 42
  },
  {
    "certname": "two.example.com",
    "fact1": "value3",
    "fact2": 10000,
    "fact3": { "anotherkey": "anothervalue" }
  }
]
It's worth pointing out that not all elements have the same name/value pairs, by any means. Also, values are often complex objects in their own right.
If I was doing this in Python, it wouldn't be a big deal (and yes, I can hear the chorus of "do it in Python" ringing in my ears now). I would like to understand how to do this in jq, and it's escaping me at the moment.
... using jq preferably ...
That's the spirit! And in that spirit, here's a concise solution:
map( {certname, (.name): .value} )
| group_by(.certname)
| map(add)
Of course there are other reasonable solutions. If the above is at first puzzling, you might like to add a debug statement here or there, or you might like to explore the pipeline by executing the first line by itself, etc.
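Tracing the pipeline on a two-fact input (shortened here) shows the idea: the first map builds one small object per record, group_by collects records sharing a certname, and add merges each group into a single object.

```shell
echo '[{"certname":"one.example.com","name":"fact1","value":"value1"},{"certname":"one.example.com","name":"fact2","value":42}]' \
  | jq -c 'map({certname, (.name): .value}) | group_by(.certname) | map(add)'
# → [{"certname":"one.example.com","fact1":"value1","fact2":42}]
```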