JSON: using jq with variable keys

I have input JSON data in a bunch of files, with an IP address as one of the keys. I need to iterate over the files and get "stuff" out of them. The IP address differs from file to file, but I'd like to use a single jq command to get the data. I have tried a bunch of things; the closest I've come is this:
jq '.facts | keys | keys as $ip | .[0]' a_file
On my input in a_file of:
{
  "facts": {
    "1.1.1.1": {
      "stuff":"value"
    }
  }
}
it returns the IP address, i.e. 1.1.1.1, but then how do I go back and do something like this (which is obviously wrong, but I hope you get the idea):
jq '.facts.$ip[0].stuff' a_file
In my mind I'm trying to populate a variable, and then use the value of that variable to rewind the input and scan it again.
=== EDIT ===
Note that my input was actually more like this:
{
  "facts": {
    "1.1.1.1": {
      "stuff": {
        "thing1":"value1"
      }
    },
    "outer_thing": "outer_value"
  }
}
So I got an error:
jq: error (at <stdin>:9): Cannot index string with string "stuff"
This fixed it: the question mark after .stuff:
.facts | keys_unsorted[] as $k | .[$k].stuff?

You almost got it right, but you need the object value iterator construct, .[], to get the value corresponding to the key:
.facts | keys_unsorted[] as $k | .[$k].stuff
This assumes that inside facts you have one object with the IP address as the key, and that you want to extract .stuff from it.
Optionally, to guard against objects that don't contain stuff, you can append ? as in .[$k].stuff?. You could also validate the keys against an IP-address regex and emit values only for matching keys.
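Putting the pieces together, a complete invocation against the edited sample input might look like this (a sketch; the heredoc stands in for the a_file from the question, and assumes jq is on PATH):

```shell
jq -c '.facts | keys_unsorted[] as $k | .[$k].stuff?' <<'EOF'
{
  "facts": {
    "1.1.1.1": {
      "stuff": { "thing1": "value1" }
    },
    "outer_thing": "outer_value"
  }
}
EOF
# prints {"thing1":"value1"}
# The trailing ? suppresses the error for "outer_thing", whose value
# is a string and therefore cannot be indexed with "stuff".
```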

Related

Sorting a JSON file by outer object name

I have a json file input.json thus:
{
  "foo":{
    "prefix":"abc",
    "body":[1,2,3]
  },
  "bar":{
    "prefix":"def",
    "body":[4,5,6]
  }
}
I would like to sort it by the outer object names, with "bar" coming before "foo" in alphabetical order like so:
{
  "bar":{
    "prefix":"def",
    "body":[4,5,6]
  },
  "foo":{
    "prefix":"abc",
    "body":[1,2,3]
  }
}
to produce file output.json.
Versions of this question have been asked about Java/JavaScript (here and here).
Is there a way to accomplish this using a command line tool like sed/awk or boost.json?
Using jq, you could use the keys built-in to get the key names in sorted order and form the corresponding value object for each:
jq 'keys[] as $k | { ($k) : .[$k] }' json
Note that jq does have a -S / --sort-keys option, but it cannot be used here, as it sorts the keys of the inner-level objects as well.
Here's a variable-free jq solution:
to_entries | sort_by(.key) | from_entries
It is also worth noting that gojq, the Go implementation of jq, currently always sorts the keys within all JSON objects.
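As a quick end-to-end check of the variable-free variant (the heredoc stands in for input.json; -c is used here only to keep the output on one line):

```shell
# Sort the top-level keys only; inner objects keep their original key order.
jq -c 'to_entries | sort_by(.key) | from_entries' <<'EOF'
{
  "foo":{ "prefix":"abc", "body":[1,2,3] },
  "bar":{ "prefix":"def", "body":[4,5,6] }
}
EOF
# prints {"bar":{"prefix":"def","body":[4,5,6]},"foo":{"prefix":"abc","body":[1,2,3]}}
```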

JQ: Convert Dictionary with List as Values to flat CSV

Original Data
I have the following JSON:
{
  "foo":[
    "asd",
    "fgh"
  ],
  "bar":[
    "abc",
    "xyz",
    "ert"
  ],
  "baz":[
    "something"
  ]
}
Now I want to transform it to a "flat" CSV, such that for every key in my object the list in the value is expanded to n rows with n being the number of entries in the respective list.
Expected Output
foo;asd
foo;fgh
bar;abc
bar;xyz
bar;ert
baz;something
Approaches
I guess I need to use to_entries and then, for each .value, repeat the same .key for the first column. The jq docs state that:
Thus as functions as something of a foreach loop.
So I tried combining
to_entries to give the keys and values from my dictionary an accessible name,
then building a kind of foreach loop around the .values,
and passing the result to @csv:
to_entries|map(.value) as $v|what goes here?|@csv
I prepared something that at least compiles here
You don't need the *_entries functions; a simple key/value lookup with string interpolation should suffice:
keys_unsorted[] as $k | "\($k);\( .[$k][])"
The construct .[$k][] first looks up the value associated with the key stored in $k (e.g. .foo) and then iterates over the resulting array, producing one result per element, so each key yields one output row per list entry.
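The full pipeline, with -r so the rows come out as raw text rather than JSON strings (the heredoc stands in for the input file):

```shell
jq -r 'keys_unsorted[] as $k | "\($k);\(.[$k][])"' <<'EOF'
{
  "foo": ["asd", "fgh"],
  "bar": ["abc", "xyz", "ert"],
  "baz": ["something"]
}
EOF
# prints the six rows foo;asd .. baz;something in input key order
```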

Select lots of known IDs from a big JSON document efficiently

I am trying to get some values out of JSON via jq in bash. With a small input it works fine, but with big JSON it is very slow, around 2-3 seconds per value. Example of my code:
json=$(curl -s -A "some useragent" "url" )
pid=$(cat idlist.json | jq '.page_ids[]')
for id in $pid
do
echo $pagejson|jq -r '.page[]|select(.id=='$id')|.url'>>path.url
done
The "pid" is a list of ids that I supply before running the script. It may contain 700-1000 ids. Example of the JSON:
{
  "page":[
    {
      "url":"some url",
      "id":some numbers
    },
    {
      "url":"some url",
      "id":some numbers
    }
  ]
}
Is there any way to speed it up? In JavaScript this works faster. Example of the JavaScript:
//First sort object with order
var url="";
var sortedjson= ids.map(id => obj.find(page => page.id === id));
//Then collect url
for ( x=0 ; x < sortedjson.length;x++) {
url+=sortedjson[x].url
};
Should I sort the JSON as in the JavaScript for better performance? I haven't tried it because I don't know how.
Edit:
Replaced the "pid" variable with the JSON to use less code, and replaced for id in $(echo $pid) with for id in $pid.
But it still slows down if the id list contains more than about 50 ids.
Calling jq once per id is always going to be slow. Don't do that -- call jq just once, and have it match against the full set.
You can accomplish that by passing the entire comma-separated list of ids into your one copy of jq, and letting jq itself do the work of splitting that string into individual items (and then putting them in a dictionary for fast access)
For example:
pid="24885,73648,38758,8377,747"
jq --arg pidListStr "$pid" '
($pidListStr | [split(",")[] | {(.): true}] | add) as $pidDict |
.page[] | select($pidDict[.id | tostring]) | .url
' <<<"$pagejson"
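As a sketch with invented sample data (the ids and urls below are made up for illustration), the single jq call selects every matching page in one pass:

```shell
pid="24885,747"
pagejson='{"page":[{"url":"http://a.example","id":24885},{"url":"http://b.example","id":99},{"url":"http://c.example","id":747}]}'

# Build a dictionary from the comma-separated id list, then filter once.
echo "$pagejson" | jq -r --arg pidListStr "$pid" '
  ($pidListStr | [split(",")[] | {(.): true}] | add) as $pidDict |
  .page[] | select($pidDict[.id | tostring]) | .url
'
# prints:
# http://a.example
# http://c.example
```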
The following solution uses the same approach as the one posted by Charles Duffy (*) but is only applicable:
if each of the specified id values in $pid appears at most once as an id in the JSON objects in the .page array; or
if the goal is to extract, for each id in $pid, at most one corresponding object from the .page array.
The idea is to remove an id from the dictionary once it is found, and to stop if and when all ids have been found.
jq --arg pidListStr "$pid" '
($pidListStr | [splits(" *, *") | {(.): true}] | add) as $pidDict
| label $finish
| foreach .page[] as $page ($pidDict + {emit:null};
if length == 1 then break $finish
else ($page.id | tostring) as $id
| if .[$id] then delpaths([[$id]]) | .emit = $page.url
else .emit = null
end
end;
.emit // empty )
'
(*) Caveat
Using $pidDict here assumes there are no "collisions"; this condition would hold if all the id values in the .page objects are numeric.
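A quick run with toy data (ids and urls invented here) shows the early exit: once both requested ids have been found, the loop breaks at the fourth page without emitting anything for it:

```shell
pid="1,2"
pagejson='{"page":[{"url":"u1","id":1},{"url":"u3","id":3},{"url":"u2","id":2},{"url":"u4","id":4}]}'

echo "$pagejson" | jq -r --arg pidListStr "$pid" '
  ($pidListStr | [splits(" *, *") | {(.): true}] | add) as $pidDict
  | label $finish
  | foreach .page[] as $page ($pidDict + {emit:null};
      if length == 1 then break $finish        # only "emit" left: all ids found
      else ($page.id | tostring) as $id
      | if .[$id] then delpaths([[$id]]) | .emit = $page.url
        else .emit = null
        end
      end;
      .emit // empty )
'
# prints:
# u1
# u2
```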
The following is a response to the original question, which posited:
pid="24885,73648,38758,8377,747"
echo $pagejson|jq -r '.page[]|select(.id=='$pid')|.url'
(Based on subsequent edits to the question, it would appear that the intent was to iterate over the id values separately, invoking jq once per value. That is a bad idea as well but can be dealt with in a separate response.)
Response to original question
There are several problems with the invocation of jq based on
interpolating $pid as was originally done.
The major problem is that your query, when expanded, includes this select statement:
select(.id==24885,73648,38758,8377,747)
whereas what you evidently intend is:
select(.id==(24885,73648,38758,8377,747))
It's not difficult to see that there's a huge difference, which affects both functionality and performance.
Since you don't give any hints about the expected input, it's not feasible to suggest how the query might be optimized. To illustrate, though, suppose it's known that the .id values in the input are distinct. Then once all the ids in the query have been found, execution can stop.
In general, passing shell variables in by string interpolation is not a great idea. Some alternatives to consider are using --arg or --argjson.
The following solution, which is based on the one posted by Charles
Duffy (*), can be used if each of the specified id values in $pid
appears at most once as an id in the JSON objects in the .page array.
The idea is to stop if and when all the $pid ids have been found.
This can be accomplished with the following helper function:
def first_n(stream; $n):
label $done
| foreach stream as $x (-1; .+1; if . >= $n then break $done else $x end);
The solution can then be written as follows:
($pidListStr | [splits(" *, *") | {(.): true}] | add) as $pidDict
| ($pidDict|length) as $n
| first_n(.page[] | select($pidDict[.id | tostring]) | .url; $n)
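Combining the helper with the filter in one complete call (same invented toy data as above; $pid is passed with --arg, as recommended earlier):

```shell
pid="1,2"
pagejson='{"page":[{"url":"u1","id":1},{"url":"u3","id":3},{"url":"u2","id":2}]}'

echo "$pagejson" | jq -r --arg pidListStr "$pid" '
  # Emit at most $n items from the stream, then stop.
  def first_n(stream; $n):
    label $done
    | foreach stream as $x (-1; .+1; if . >= $n then break $done else $x end);
  ($pidListStr | [splits(" *, *") | {(.): true}] | add) as $pidDict
  | ($pidDict|length) as $n
  | first_n(.page[] | select($pidDict[.id | tostring]) | .url; $n)
'
# prints:
# u1
# u2
```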
This solution is similar to the one using foreach posted elsewhere
on this page, but is simpler and probably slightly more efficient as
the dictionary, once constructed, is unaltered.
The solution using foreach, however, can also be used if the ids of the
objects in the .page array are not unique, and if the goal is to
extract, for each id in $pid, at most one corresponding object from
the .page array.
(*) Caveat
Using $pidDict here assumes there are no "collisions"; this condition would hold if all the id values in the .page objects are numeric.

JSON parse with jq and filter

I am learning jq. I have content as below:
{
  "amphorae_links":[
  ],
  "amphorae":[
    {
      "status":"BOOTING",
      "loadbalancer_id":null,
      "created_at":"2020-06-23T08:56:56",
      "vrrp_id":null,
      "id":"6d66935e-6d39-40c9-bb0d-dd6a734dc77b"
    },
    {
      "status":"ALLOCATED",
      "loadbalancer_id":"79970c9a-b0ba-4cde-a7e6-16b61641a7b8",
      "created_at":"2020-06-25T06:41:56",
      "vrrp_id":1,
      "id":"872c08ee-9b21-4b26-9550-c2ffb4a1ad59"
    }
  ]
}
I want to have an output like
"ALLOCATED=872c08ee-9b21-4b26-9550-c2ffb4a1ad59,79970c9a-b0ba-4cde-a7e6-16b61641a7b8"
I tried the filter below, but I don't know how to drop the entry whose status is "BOOTING".
.amphorae[] | "\(.status)=\(.id),\(.loadbalancer_id)"
Thanks to anyone for helping.
You have a couple of options:
You can hard-code the second index of the amphorae array, provided you are sure the "ALLOCATED" entry will always be the second element:
.amphorae[1] | "\(.status)=\(.id),\(.loadbalancer_id)"
This is a little more complex, but it will always select only the elements of the amphorae array whose status key is "ALLOCATED":
.amphorae | map(select(.status == "ALLOCATED"))[] | "\(.status)=\(.id),\(.loadbalancer_id)"
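With the sample document from the question, the second approach produces exactly the requested line (using -r for raw output; the heredoc stands in for the input):

```shell
jq -r '.amphorae | map(select(.status == "ALLOCATED"))[]
       | "\(.status)=\(.id),\(.loadbalancer_id)"' <<'EOF'
{
  "amphorae_links": [],
  "amphorae": [
    {"status":"BOOTING","loadbalancer_id":null,"created_at":"2020-06-23T08:56:56","vrrp_id":null,"id":"6d66935e-6d39-40c9-bb0d-dd6a734dc77b"},
    {"status":"ALLOCATED","loadbalancer_id":"79970c9a-b0ba-4cde-a7e6-16b61641a7b8","created_at":"2020-06-25T06:41:56","vrrp_id":1,"id":"872c08ee-9b21-4b26-9550-c2ffb4a1ad59"}
  ]
}
EOF
# prints ALLOCATED=872c08ee-9b21-4b26-9550-c2ffb4a1ad59,79970c9a-b0ba-4cde-a7e6-16b61641a7b8
```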

JSON key-globbing

On the jq manual page there are a few examples of output formatting, particularly some shortcuts for when you want to just echo exactly what was in the input JSON.
What if I want to echo exactly what was in the input, but only for keys that match a certain pattern?
For example, given input like so ...
[
{"Name":"Widgets","Size":10,"SymUS":"Widg","SymCN":"Zyin","SymJP":"Kono"},
{"Name":"Blodgets","Size":400,"SymUS":"Blodg","SymAU":"Blod","SymJP":"Kado"},
{"Name":"Fonzes","Size":11,"SymRU":"Fyet","SymBR":"Foao"}
]
Say I want to select all objects where the Name ends in "ets" and then display the Name and all attributes of the form Sym*. All I know about those attributes is that there will be one or more per JSON object, and the names have the format Sym followed by a two-letter ISO country code.
I would like to just do this:
jq '.[] | select(.Name | endswith("ets")) | {Name, Sym*}'
but that's not a thing.
Is this just not something jq is designed to handle in a single operation? Should I do a first pass through the file to collect all the possible keys and then list them all explicitly via a slurpfile?
The key to a simple solution to the problem is to_entries, as described in the online manual. With your example data, the following filter produces the output shown below, in accordance with what I understand to be the expectations:
.[]
| select(.Name | test("ets$"))
| {Name} + (to_entries | map(select(.key|test("^Sym"))) | from_entries)
You might want to refine the regex tests, and/or make other minor adjustments.
Output:
{
"Name": "Widgets",
"SymUS": "Widg",
"SymCN": "Zyin",
"SymJP": "Kono"
}
{
"Name": "Blodgets",
"SymUS": "Blodg",
"SymAU": "Blod",
"SymJP": "Kado"
}
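Run end to end with -c, the filter reproduces that output in compact form (the heredoc stands in for the input file):

```shell
jq -c '.[]
       | select(.Name | test("ets$"))
       | {Name} + (to_entries | map(select(.key|test("^Sym"))) | from_entries)' <<'EOF'
[
  {"Name":"Widgets","Size":10,"SymUS":"Widg","SymCN":"Zyin","SymJP":"Kono"},
  {"Name":"Blodgets","Size":400,"SymUS":"Blodg","SymAU":"Blod","SymJP":"Kado"},
  {"Name":"Fonzes","Size":11,"SymRU":"Fyet","SymBR":"Foao"}
]
EOF
# prints:
# {"Name":"Widgets","SymUS":"Widg","SymCN":"Zyin","SymJP":"Kono"}
# {"Name":"Blodgets","SymUS":"Blodg","SymAU":"Blod","SymJP":"Kado"}
```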