JQ: perform token replacement - json

I'd like to replace the tokens in text with the variables defined in ma. Input JSON:
{
"ma":{
"a":"1",
"b":"2",
"c":"3"
},
"mb":{
"a":"11"
},
"text":"https://ph.com?a={a}&b={b}"
}
Desired result: https://ph.com?a=1&b=2
Extra credit, how can I have mb variables take precedence over ma variables so that my resulting text is: https://ph.com?a=11&b=2 ?
I've tried using combinations of scan and sub and walk but can't figure it out.
Thanks!

Define a function to replace the tokens with the new values.
def format($map): gsub("\\{(?<key>[^}]+)\\}"; "\($map[.key])");
With this, you can then pass in the map for the replacements.
.ma as $map | .text | format($map)
Update the mapping as needed.
(.ma * .mb) as $map | .text | format($map)
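As a rough end-to-end check, the two invocations above can be run from a shell against the question's sample input. This is only a sketch: the helper is renamed to fill (a hypothetical name, chosen so it does not shadow jq's builtin format/1), and the shell variable names are assumptions.

```shell
# Sample input copied from the question.
input='{"ma":{"a":"1","b":"2","c":"3"},"mb":{"a":"11"},"text":"https://ph.com?a={a}&b={b}"}'

# fill/1 is a hypothetical name; its body is the same as the format/1 helper above.
defs='def fill($map): gsub("\\{(?<key>[^}]+)\\}"; "\($map[.key])");'

# Replace tokens using .ma only.
ma_only=$(printf '%s' "$input" | jq -r "$defs"' .ma as $map | .text | fill($map)')

# Merge .mb over .ma first, so that .mb wins on shared keys.
merged=$(printf '%s' "$input" | jq -r "$defs"' (.ma * .mb) as $map | .text | fill($map)')

printf '%s\n' "$ma_only" "$merged"
```

Note that a token with no entry in the map would be rendered as the string null, since the replacement is built by string interpolation.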

If you're stuck with the {a}-style template names, then see @JeffMercado's answer; if, however, you have control over the templating style, it would make things much simpler if you used jq's string-interpolation feature.
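For example, if the template were written as a string literal in the jq program, "https://ph.com?a=\(.a)&b=\(.b)", rather than stored in .text (jq interpolates only string literals in the program, not strings in the input data), then if you just want the text after substitution, you could simply write:
(.ma + .mb) | "https://ph.com?a=\(.a)&b=\(.b)"
Or if you wanted in-place substitution:
.text = ((.ma + .mb) | "https://ph.com?a=\(.a)&b=\(.b)")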

Related

JQ: Using variable for dot notation path

Is it possible to use a variable to hold a dot notation path? (I'm probably not using the correct term.)
For example, given the following json:
{
"people": [
{
"names": {
"given": "Alice",
"family": "Smith"
},
"id": 47
},
{
"id": 42
}
]
}
Is it possible to construct something like:
.names.given as $ng | .people[] | select(.id==47) | ($ng)
and output "Alice"?
The idea is to allow easier modification of a complex expression. I've tried various parens and quotes with increasing literal results ('.names.given' and '$ng')
The answer is no and yes: as you've seen, once you write an expression such as .names.given as $ng, $ng holds the JSON values, not the path.
But jq does support path expressions in the form of arrays of strings and/or non-negative integers. These can be used to access values in conjunction with the built-in getpath/1.
So you could, for example, write something along the lines of:
["names", "given"] as $ng
| .people[]
| select(.id==47)
| getpath($ng)
Converting jq paths to JSON arrays
It's possible to convert a "dot notation" path into an "array path" using path/1; e.g. the assignment to $ng above could be written as:
(null | path(.names.given)) as $ng
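To make the getpath approach concrete, here is a small sketch run against the question's sample data. It shows the path built inside jq with path/1, and the same path supplied from the shell through --argjson, which may suit the goal of keeping the path easy to change. The shell variable names are assumptions.

```shell
# Sample input copied from the question.
input='{"people":[{"names":{"given":"Alice","family":"Smith"},"id":47},{"id":42}]}'

# Build the array path inside jq from dot notation.
from_path=$(printf '%s' "$input" | jq -r '
  (null | path(.names.given)) as $ng
  | .people[] | select(.id == 47) | getpath($ng)')

# Supply the same array path from the shell as a JSON argument.
from_arg=$(printf '%s' "$input" | jq -r --argjson ng '["names","given"]' '
  .people[] | select(.id == 47) | getpath($ng)')

printf '%s\n' "$from_path" "$from_arg"
```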
Your question and the example you provided seem very confusing to me. The gist that I got is that you want to assign a name to a value obtained from dot notation and then use it at a later point in time.
See if this is of any help -
.people | map(select(.id == 47))[0].names.given as $ng | $ng

JSON: using jq with variable keys

I have input JSON data in a bunch of files, with an IP address as one of the keys. I need to iterate over the files, and I need to get "stuff" out of them. The IP address is different for each file, but I'd like to use a single jq command to get the data. I have tried a bunch of things, the closest I've come is this:
jq '.facts | keys | keys as $ip | .[0]' a_file
On my input in a_file of:
{
"facts": {
"1.1.1.1": {
"stuff":"value"
}
}
}
it returns the IP address, i.e. 1.1.1.1, but then how do I go back and do something like this (which is obviously wrong, but I hope you get the idea):
jq '.facts.$ip[0].stuff' a_file
In my mind I'm trying to populate a variable, and then use the value of that variable to rewind the input and scan it again.
=== EDIT ===
Note that my input was actually more like this:
{
"facts": {
"1.1.1.1": {
"stuff": {
"thing1":"value1"
}
},
"outer_thing": "outer_value"
}
}
So I got an error:
jq: error (at <stdin>:9): Cannot index string with string "stuff"
Adding a question mark after .stuff fixed it:
.facts | keys_unsorted[] as $k | .[$k].stuff?
You almost got it right, but you need the iterator, .[], to bind each key to $k in turn, and then .[$k] to get the value corresponding to that key:
.facts | keys_unsorted[] as $k | .[$k].stuff
This assumes that, inside facts you have one object containing the IP address as the key and you want to extract .stuff from it.
Optionally, to guard against objects that don't contain stuff inside, you could add ? as .[$k].stuff?. And also you could optionally validate keys against a valid IP regex condition and filter values only for those keys.
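The optional key validation mentioned above might look like the following sketch, run against the input from the question's edit. The pattern is a deliberately loose IPv4 shape (four dot-separated groups of one to three digits) and is an assumption; it does not check that each group is at most 255.

```shell
# Input from the question's edit: one IP-keyed object plus an unrelated string value.
input='{"facts":{"1.1.1.1":{"stuff":{"thing1":"value1"}},"outer_thing":"outer_value"}}'

# Keep only the keys that look like an IPv4 address, then extract .stuff from each.
result=$(printf '%s' "$input" | jq -c '
  .facts
  | keys_unsorted[] as $k
  | select($k | test("^[0-9]{1,3}(\\.[0-9]{1,3}){3}$"))
  | .[$k].stuff?')

printf '%s\n' "$result"
```

Because outer_thing is filtered out by the key test, it is never indexed with "stuff", so the trailing ? only matters for IP-keyed values that are not objects.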

Select lots of known IDs from a big JSON document efficiently

I am trying to get some values from JSON via jq in bash. With a small input it works nicely, but with big JSON it is too slow: about one value every 2-3 seconds. Example of my code:
pagejson=$(curl -s -A "some useragent" "url")
pid=$(cat idlist.json | jq '.page_ids[]')
for id in $pid
do
echo $pagejson|jq -r '.page[]|select(.id=='$id')|.url'>>path.url
done
The "pid" is the list of ids that I type in before running the script. It may contain 700-1000 ids. Example of the JSON object:
{
"page":[
{
"url":"some url",
"id":some numbers
},
{
"url":"some url",
"id":some numbers
}
]
}
Is there any way to speed it up? In JavaScript it runs faster than this. Example of the JavaScript:
//First sort object with order
var url="";
var sortedjson= ids.map(id => obj.find(page => page.id === id));
//Then collect url
for ( x=0 ; x < sortedjson.length;x++) {
url+=sortedjson[x].url
};
Should I sort the JSON as in the JavaScript version for better performance? I haven't tried it because I don't know how.
Edit:
Replaced the "pid" variable with JSON to use less code, and replaced for id in $(echo $pid) with for id in $pid.
But it still slows down if the id list has more than about 50 entries.
Calling jq once per id is always going to be slow. Don't do that -- call jq just once, and have it match against the full set.
You can accomplish that by passing the entire comma-separated list of ids into your one copy of jq, and letting jq itself do the work of splitting that string into individual items (and then putting them in a dictionary for fast access)
For example:
pid="24885,73648,38758,8377,747"
jq --arg pidListStr "$pid" '
($pidListStr | [split(",")[] | {(.): true}] | add) as $pidDict |
.page[] | select($pidDict[.id | tostring]) | .url
' <<<"$pagejson"
The following solution uses the same approach as the one posted by Charles Duffy (*) but is only applicable:
if each of the specified id values in $pid appears at most once as an id in the JSON objects in the .page array; or
if the goal is to extract, for each id in $pid, at most one corresponding object from the .page array.
The idea is to remove an id from the dictionary once it is found, and to stop if and when all ids have been found.
jq --arg pidListStr "$pid" '
($pidListStr | [splits(" *, *") | {(.): true}] | add) as $pidDict
| label $finish
| foreach .page[] as $page ($pidDict + {emit:null};
if length == 1 then break $finish
else ($page.id | tostring) as $id
| if .[$id] then delpaths([[$id]]) | .emit = $page.url
else .emit = null
end
end;
.emit // empty )
' <<<"$pagejson"
(*) Caveat
Using $pidDict here assumes there are no "collisions"; this condition would hold if all the id values in the .page objects are numeric.
The following is a response to the original question, which posited:
pid="24885,73648,38758,8377,747"
echo $pagejson|jq -r '.page[]|select(.id=='$pid')|.url'
(Based on subsequent edits to the question, it would appear that the intent was to iterate over the id values separately, invoking jq once per value. That is a bad idea as well but can be dealt with in a separate response.)
Response to original question
There are several problems with the invocation of jq based on interpolating $pid as was originally done.
The major problem is that your query, when expanded, includes this select statement:
select(.id==24885,73648,38758,8377,747)
whereas what you evidently intend is:
select(.id==(24885,73648,38758,8377,747))
It's not difficult to see that there's a huge difference, which affects both functionality and performance.
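The difference can be seen on a tiny, made-up input (the ids and urls below are illustrative only). Without the parentheses, the comma separates two conditions, .id==1 and the constant 2, and since 2 is truthy every object passes at least once:

```shell
# Hypothetical three-page input for illustration.
pagejson='{"page":[{"url":"u1","id":1},{"url":"u2","id":2},{"url":"u3","id":3}]}'

# Parsed as select((.id==1), 2): the constant 2 lets every object through.
without_parens=$(printf '%s' "$pagejson" | jq -c '[.page[] | select(.id==1,2) | .url]')

# Parsed as intended: .id is compared against 1 and then against 2.
with_parens=$(printf '%s' "$pagejson" | jq -c '[.page[] | select(.id==(1,2)) | .url]')

printf '%s\n' "$without_parens" "$with_parens"
# prints ["u1","u1","u2","u3"] and then ["u1","u2"]
```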
Since you don't give any hints about the expected input, it's not feasible to suggest how the query might be optimized. To illustrate, though, suppose it's known that the .id values in the input are distinct. Then once all the ids in the query have been found, execution can stop.
In general, passing shell variables in by string interpolation is not a great idea. Some alternatives to consider are using --arg or --argjson.
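As one possible illustration of the --argjson route, the comma-separated id list can be wrapped in brackets and handed to jq as a JSON array, so nothing is spliced into the program text. The page data below is made up, and the variable name ids is an assumption; this assumes the ids are numeric.

```shell
# Hypothetical page data; the id list is the one from the question.
pagejson='{"page":[{"url":"u1","id":747},{"url":"u2","id":5},{"url":"u3","id":8377}]}'
pid="24885,73648,38758,8377,747"

# "[$pid]" is valid JSON as long as $pid is a comma-separated list of numbers.
urls=$(printf '%s' "$pagejson" | jq -r --argjson ids "[$pid]" '
  .page[] | .id as $id | select(any($ids[]; . == $id)) | .url')

printf '%s\n' "$urls"
```

For a long id list, a dictionary lookup as in the other answers would likely scale better than the linear any/2 scan used here.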
The following solution, which is based on the one posted by Charles Duffy (*), can be used if each of the specified id values in $pid appears at most once as an id in the JSON objects in the .page array.
The idea is to stop if and when all the $pid ids have been found.
This can be accomplished with the following helper function:
def first_n(stream; $n):
if $n <= 0 then empty
else label $done
| foreach stream as $x (0; .+1; $x, if . >= $n then break $done else empty end)
end;
The solution can then be written as follows:
($pidListStr | [splits(" *, *") | {(.): true}] | add) as $pidDict
| ($pidDict|length) as $n
| first_n(.page[] | select($pidDict[.id | tostring]) | .url; $n)
This solution is similar to the one using foreach posted elsewhere on this page, but is simpler and probably slightly more efficient as the dictionary, once constructed, is unaltered.
The solution using foreach, however, can also be used if the ids of the objects in the .page array are not unique, and if the goal is to extract, for each id in $pid, at most one corresponding object from the .page array.
(*) Caveat
Using $pidDict here assumes there are no "collisions"; this condition would hold if all the id values in the .page objects are numeric.

How do I filter JSON using jq based on if an attribute value is in an array?

I have some JSON that I need to filter based on whether certain attribute values are present in an array.
I have something that works but it feels like a kludge. Is there a neater way of doing this?
Input
{"potato":4}
Filter
select(.potato as $k | ([1,2,3,4] | any(. == $k)))
Output
{
"potato": 4
}
jqplay link
https://jqplay.org/s/Ts97jkk21K
Does this seem less of a kludge?
[1,2,3,4] as $acceptable
| .potato as $k
| select( any($acceptable[]; . == $k) )
If your jq has IN/1, you could skip the $k:
[1,2,3,4] as $acceptable
| select(.potato | IN($acceptable[]))
This style makes it easy to pass $acceptable in as a command-line parameter, for example.
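For instance, the list could be supplied with --argjson, roughly as in this sketch. The second input object is made up to show a rejected value, and any/2 is used rather than IN so that it does not depend on a jq version that has IN.

```shell
# Two inputs: the one from the question and a made-up one that should be rejected.
result=$(printf '%s\n' '{"potato":4}' '{"potato":9}' \
  | jq -c --argjson acceptable '[1,2,3,4]' '
      .potato as $k | select(any($acceptable[]; . == $k))')

printf '%s\n' "$result"
```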
Temptation
It is easy to be tempted by the simplicity of a select-only solution such as:
[1,2,3,4] as $acceptable
| select($acceptable[] == .potato)
This would be fine under certain circumstances, e.g. if $acceptable is short and does not contain duplicates (assuming we want the semantics of any). But any and IN have short-circuit semantics that may be desirable, e.g. for efficiency.
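The effect of duplicates can be checked with a small sketch that counts how many times the input is emitted; the duplicated list [4,4] is made up for the purpose.

```shell
input='{"potato":4}'

# select-only form: one output per matching element of $acceptable.
plain=$(printf '%s' "$input" | jq -c '
  [ [4,4] as $acceptable | select($acceptable[] == .potato) ] | length')

# any/2 form: at most one output, however many elements match.
with_any=$(printf '%s' "$input" | jq -c '
  [ [4,4] as $acceptable | .potato as $k | select(any($acceptable[]; . == $k)) ] | length')

printf '%s\n' "$plain" "$with_any"
# prints 2 and then 1
```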

JSON key-globbing

On the jq manual page there are a few examples of output formatting, particularly some shortcuts for when you want to just echo exactly what was in the input JSON.
What if I want to echo exactly what was in the input, but only for keys that match a certain pattern?
For example, given input like so ...
[
{"Name":"Widgets","Size":10,"SymUS":"Widg","SymCN":"Zyin","SymJP":"Kono"},
{"Name":"Blodgets","Size":400,"SymUS":"Blodg","SymAU":"Blod","SymJP":"Kado"},
{"Name":"Fonzes","Size":11,"SymRU":"Fyet","SymBR":"Foao"}
]
Say I want to select all objects where the Name ends in "ets" and then display the Name and all attributes of the form Sym*. All I know about those attributes is that there will be one or more per JSON object, and the names have the format Sym followed by a two-letter ISO country code.
I would like to just do this:
jq '.[] | select(.Name | endswith("ets")) | {Name, Sym*}'
but that's not a thing.
Is this just not something jq is designed to handle in a single operation? Should I do a first pass through the file to collect all the possible keys and then list them all explicitly via a slurpfile?
The key to a simple solution to the problem is to_entries, as described in the online manual. With your example data, the following filter produces the output shown below, in accordance with what I understand to be the expectations:
.[]
| select(.Name | test("ets$"))
| {Name} + (to_entries | map(select(.key|test("^Sym"))) | from_entries)
You might want to refine the regex tests, and/or make other minor adjustments.
Output:
{
"Name": "Widgets",
"SymUS": "Widg",
"SymCN": "Zyin",
"SymJP": "Kono"
}
{
"Name": "Blodgets",
"SymUS": "Blodg",
"SymAU": "Blod",
"SymJP": "Kado"
}
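As one way of refining the regex tests, the sketch below tightens the key pattern to Sym followed by exactly two uppercase letters, and uses with_entries, which is shorthand for the to_entries | map(...) | from_entries pipeline in the filter above. The stricter pattern is an assumption about what "a two-letter ISO country code" should match.

```shell
# Sample input copied from the question.
input='[{"Name":"Widgets","Size":10,"SymUS":"Widg","SymCN":"Zyin","SymJP":"Kono"},
{"Name":"Blodgets","Size":400,"SymUS":"Blodg","SymAU":"Blod","SymJP":"Kado"},
{"Name":"Fonzes","Size":11,"SymRU":"Fyet","SymBR":"Foao"}]'

# Keep Name plus every key of the form Sym + two uppercase letters.
out=$(printf '%s' "$input" | jq -c '
  .[]
  | select(.Name | test("ets$"))
  | {Name} + with_entries(select(.key | test("^Sym[A-Z]{2}$")))')

printf '%s\n' "$out"
```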