JSON key-globbing

On the jq manual page there are a few examples of output formatting, particularly some shortcuts for when you want to just echo exactly what was in the input JSON.
What if I want to echo exactly what was in the input, but only for keys that match a certain pattern?
For example, given input like so ...
[
{"Name":"Widgets","Size":10,"SymUS":"Widg","SymCN":"Zyin","SymJP":"Kono"},
{"Name":"Blodgets","Size":400,"SymUS":"Blodg","SymAU":"Blod","SymJP":"Kado"},
{"Name":"Fonzes","Size":11,"SymRU":"Fyet","SymBR":"Foao"}
]
Say I want to select all objects where the Name ends in "ets" and then display the Name and all attributes of the form Sym*. All I know about those attributes is that there will be one or more per JSON object, and the names have the format Sym followed by a two-letter ISO country code.
I would like to just do this:
jq '.[] | select(.Name | endswith("ets")) | {Name, Sym*}'
but that's not a thing.
Is this just not something jq is designed to handle in a single operation? Should I do a first pass through the file to collect all the possible keys and then list them all explicitly via a slurpfile?

The key to a simple solution to the problem is to_entries, as described in the online manual. With your example data, the following filter produces the output shown below, in accordance with what I understand to be the expectations:
.[]
| select(.Name | test("ets$"))
| {Name} + (to_entries | map(select(.key|test("^Sym"))) | from_entries)
You might want to refine the regex tests, and/or make other minor adjustments.
Output:
{
"Name": "Widgets",
"SymUS": "Widg",
"SymCN": "Zyin",
"SymJP": "Kono"
}
{
"Name": "Blodgets",
"SymUS": "Blodg",
"SymAU": "Blod",
"SymJP": "Kado"
}
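As a hedged sketch of one such refinement, assuming the attribute names really are Sym followed by exactly two uppercase letters, the same filter can be written with with_entries (shorthand for to_entries | map(...) | from_entries) and a stricter regex. The shell variable names input and result are only for illustration and are not part of the question.

```shell
# Sample data from the question, held in a shell variable for illustration.
input='[
{"Name":"Widgets","Size":10,"SymUS":"Widg","SymCN":"Zyin","SymJP":"Kono"},
{"Name":"Blodgets","Size":400,"SymUS":"Blodg","SymAU":"Blod","SymJP":"Kado"},
{"Name":"Fonzes","Size":11,"SymRU":"Fyet","SymBR":"Foao"}
]'

# Keep Name plus every key that is "Sym" followed by exactly two capitals.
result=$(printf '%s' "$input" | jq -c '
  .[]
  | select(.Name | endswith("ets"))
  | {Name} + with_entries(select(.key | test("^Sym[A-Z]{2}$")))
')
printf '%s\n' "$result"
```

The stricter pattern would exclude a key such as Symbol or SymUSA, which the looser "^Sym" test would let through.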

Related

Sorting a JSON file by outer object name

I have a json file input.json thus:
{
"foo":{
"prefix":"abc",
"body":[1,2,3]
},
"bar":{
"prefix":"def",
"body":[4,5,6]
}
}
I would like to sort it by the outer object names, with "bar" coming before "foo" in alphabetical order like so:
{
"bar":{
"prefix":"def",
"body":[4,5,6]
},
"foo":{
"prefix":"abc",
"body":[1,2,3]
}
}
to produce file output.json.
Versions of this question have been asked for Java/JavaScript (here and here).
Is there a way to accomplish this using a command line tool like sed/awk or boost.json?
Using jq, you could use the keys built-in to get the key names in sorted order, build a single-key object for each, and merge them into one object:
jq '[keys[] as $k | { ($k) : .[$k] }] | add' input.json
Note that jq does have a --sort-keys option, but it cannot be used here because it also sorts the keys of the inner objects.
Here's a variable-free jq solution:
to_entries | sort_by(.key) | from_entries
It is also worth noting that gojq, the Go implementation of jq, currently always sorts the keys within all JSON objects.
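To make the difference concrete, here is a small sketch that runs both approaches on the sample input. It assumes the C implementation of jq, which keeps object keys in insertion order; the variable names input, top_only and all_levels are illustrative only.

```shell
# Sample input from the question, compacted onto one line.
input='{"foo":{"prefix":"abc","body":[1,2,3]},"bar":{"prefix":"def","body":[4,5,6]}}'

# Sort only the outer keys; the inner objects keep their original key order.
top_only=$(printf '%s' "$input" | jq -c 'to_entries | sort_by(.key) | from_entries')

# --sort-keys sorts the keys at every level, including the inner objects.
all_levels=$(printf '%s' "$input" | jq -c --sort-keys '.')

printf '%s\n%s\n' "$top_only" "$all_levels"
```

With the first filter "prefix" should still precede "body" inside each inner object, whereas with --sort-keys "body" comes first.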

JQ: Using variable for dot notation path

Is it possible to use a variable to hold a dot notation path? (I'm probably not using the correct term.)
For example, given the following json:
{
"people": [
{
"names": {
"given": "Alice",
"family": "Smith"
},
"id": 47
},
{
"id": 42
}
]
}
Is it possible to construct something like:
.names.given as $ng | .people[] | select(.id==47) | ($ng)
and output "Alice"?
The idea is to allow easier modification of a complex expression. I've tried various parens and quotes with increasing literal results ('.names.given' and '$ng')
The answer is no and yes: as you've seen, once you write an expression such as .names.given as $ng, $ng holds the JSON values, not the path.
But jq does support path expressions in the form of arrays of strings and/or non-negative integers. These can be used to access values in conjunction with the built-in getpath/1.
So you could, for example, write something along the lines of:
["names", "given"] as $ng
| .people[]
| select(.id==47)
| getpath($ng)
Converting jq paths to JSON arrays
It's possible to convert a "dot notation" path into an "array path" using path/1; e.g. the assignment to $ng above could be written as:
(null | path(.names.given)) as $ng
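Since an array path is ordinary JSON, it can also be supplied from outside the program. The following is a minimal sketch, assuming the path is passed with --argjson; the shell variable names input and given are hypothetical.

```shell
# Sample document from the question, compacted onto one line.
input='{"people":[{"names":{"given":"Alice","family":"Smith"},"id":47},{"id":42}]}'

# Pass the path as a JSON array and resolve it with getpath/1.
given=$(printf '%s' "$input" | jq -r --argjson ng '["names","given"]' '
  .people[] | select(.id == 47) | getpath($ng)
')
printf '%s\n' "$given"
```

Changing which field is extracted then only requires changing the --argjson value, not the filter itself.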
Your question and the example you provided seem very confusing to me. The gist that I got is that you want to assign a name to a value obtained from dot notation and then use it at a later point in time.
See if this is of any help -
.people | map(select(.id == 47))[0].names.given as $ng | $ng

Select lots of known IDs from a big JSON document efficiently

I am trying to get some values from JSON via jq in bash. With a small JSON document it works nicely, but with a big one it is too slow, around one value every 2-3 seconds. Example of my code:
pagejson=$(curl -s -A "some useragent" "url" )
pid=$(cat idlist.json | jq '.page_ids[]')
for id in $pid
do
echo $pagejson|jq -r '.page[]|select(.id=='$id')|.url'>>path.url
done
The "pid" is the list of ids that I type in before running the script. It may contain 700-1000 ids. Example JSON object:
{
"page":[
{
"url":"some url",
"id":some numbers
},
{
"url":"some url",
"id":some numbers
}
]
}
Is there any way to speed it up? In JavaScript it works faster. Example of the JavaScript:
//First sort object with order
var url="";
var sortedjson= ids.map(id => obj.find(page => page.id === id));
//Then collect url
for ( x=0 ; x < sortedjson.length;x++) {
url+=sortedjson[x].url
};
Should I sort the JSON as in the JavaScript for better performance? I haven't tried it because I don't know how.
Edit:
Replaced the "pid" variable with json to use less code, and for id in $(echo $pid) with for id in $pid.
But it still slows down if the id list has more than about 50 entries.
Calling jq once per id is always going to be slow. Don't do that -- call jq just once, and have it match against the full set.
You can accomplish that by passing the entire comma-separated list of ids into your one copy of jq, and letting jq itself do the work of splitting that string into individual items (and then putting them in a dictionary for fast access)
For example:
pid="24885,73648,38758,8377,747"
jq --arg pidListStr "$pid" '
($pidListStr | [split(",")[] | {(.): true}] | add) as $pidDict |
.page[] | select($pidDict[.id | tostring]) | .url
' <<<"$pagejson"
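If the ids already live in a file such as the question's idlist.json, a similar single-invocation lookup can read them directly. This is only a sketch under assumptions: the file is assumed to have the shape {"page_ids":[...]}, the sample urls and ids are made up, and the temporary directory exists just to keep the example self-contained.

```shell
# Hypothetical sample data standing in for idlist.json and the fetched page JSON.
tmp=$(mktemp -d)
printf '%s' '{"page_ids":[3,1]}' > "$tmp/idlist.json"
pagejson='{"page":[{"url":"u1","id":1},{"url":"u2","id":2},{"url":"u3","id":3}]}'

# One jq call: build a lookup object from the id file, then filter the pages.
urls=$(printf '%s' "$pagejson" | jq -r --slurpfile ids "$tmp/idlist.json" '
  ($ids[0].page_ids | map({(tostring): true}) | add) as $want
  | .page[] | select($want[.id | tostring]) | .url
')
rm -rf "$tmp"
printf '%s\n' "$urls"
```

Note that the urls come out in the order of the .page array, not in the order of the id list.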
The following solution uses the same approach as the one posted by Charles Duffy (*) but is only applicable:
if each of the specified id values in $pid appears at most once as an id in the JSON objects in the .page array; or
if the goal is to extract, for each id in $pid, at most one corresponding object from the .page array.
The idea is to remove an id from the dictionary once it is found, and to stop if and when all ids have been found.
jq --arg pidListStr "$pid" '
($pidListStr | [splits(" *, *") | {(.): true}] | add) as $pidDict
| label $finish
| foreach .page[] as $page ($pidDict + {emit:null};
if length == 1 then break $finish
else ($page.id | tostring) as $id
| if .[$id] then delpaths([[$id]]) | .emit = $page.url
else .emit = null
end
end;
.emit // empty )
'
(*) Caveat
Using $pidDict here assumes there are no "collisions"; this condition would hold if all the id values in the .page objects are numeric.
The following is a response to the original question, which posited:
pid="24885,73648,38758,8377,747"
echo $pagejson|jq -r '.page[]|select(.id=='$pid')|.url'
(Based on subsequent edits to the question, it would appear that the intent was to iterate over the id values separately, invoking jq once per value. That is a bad idea as well but can be dealt with in a separate response.)
Response to original question
There are several problems with the invocation of jq based on interpolating $pid as was originally done.
The major problem is that your query, when expanded, includes this select statement:
select(.id==24885,73648,38758,8377,747)
whereas what you evidently intend is:
select(.id==(24885,73648,38758,8377,747))
It's not difficult to see that there's a huge difference, which affects both functionality and performance.
Since you don't give any hints about the expected input, it's not feasible to suggest how the query might be optimized. To illustrate, though, suppose it's known that the .id values in the input are distinct. Then once all the ids in the query have been found, execution can stop.
In general, passing shell variables in by string interpolation is not a great idea. Some alternatives to consider are using --arg or --argjson.
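As a rough illustration of the --argjson alternative, the comma-separated list can be wrapped in brackets and handed to jq as a JSON array, so that the filter text itself never changes. The sample pagejson value below is invented for the example, and the sketch assumes $pid contains only numbers and commas.

```shell
# Hypothetical sample data; the real pagejson would come from curl.
pagejson='{"page":[{"url":"u1","id":747},{"url":"u2","id":5},{"url":"u3","id":8377}]}'
pid="8377,747"

# The id list is passed as data ($ids), not spliced into the jq program.
urls=$(printf '%s' "$pagejson" | jq -r --argjson ids "[$pid]" '
  .page[] | .id as $id | select(any($ids[]; . == $id)) | .url
')
printf '%s\n' "$urls"
```

Because any/2 stops at the first match, each page object is emitted at most once even if an id were repeated in the list.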
The following solution, which is based on the one posted by Charles Duffy (*), can be used if each of the specified id values in $pid appears at most once as an id in the JSON objects in the .page array.
The idea is to stop if and when all the $pid ids have been found. This can be accomplished with the following helper function:
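# Emit items of stream, stopping as soon as $n of them have been emitted.
def first_n(stream; $n):
  if $n <= 0 then empty
  else label $done
  | foreach stream as $x (0; .+1; $x, if . >= $n then break $done else empty end)
  end;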
The solution can then be written as follows:
($pidListStr | [splits(" *, *") | {(.): true}] | add) as $pidDict
| ($pidDict|length) as $n
| first_n(.page[] | select($pidDict[.id | tostring]) | .url; $n)
This solution is similar to the one using foreach posted elsewhere on this page, but is simpler and probably slightly more efficient as the dictionary, once constructed, is unaltered.
The solution using foreach, however, can also be used if the ids of the objects in the .page array are not unique, and if the goal is to extract, for each id in $pid, at most one corresponding object from the .page array.
(*) Caveat
Using $pidDict here assumes there are no "collisions"; this condition would hold if all the id values in the .page objects are numeric.

How do I filter JSON using jq based on if an attribute value is in an array?

I have some JSON that I need to filter based on whether certain attribute values are present in an array.
I have something that works, but it feels like a kludge. Is there a neater way of doing this?
Input
{"potato":4}
Filter
select(.potato as $k | ([1,2,3,4] | any(. == $k)))
Output
{
"potato": 4
}
jqplay link
https://jqplay.org/s/Ts97jkk21K
Does this seem less of a kludge?
[1,2,3,4] as $acceptable
| .potato as $k
| select( any($acceptable[]; . == $k) )
If your jq has IN/1, you could skip the $k:
[1,2,3,4] as $acceptable
| select(.potato | IN($acceptable[]))
This style makes it easy to pass $acceptable in as a command-line parameter, for example.
Temptation
It is easy to be tempted by the simplicity of a select-only solution such as:
[1,2,3,4] as $acceptable
| select($acceptable[] == .potato)
This would be fine under certain circumstances, e.g. if $acceptable is short and does not contain duplicates (assuming we want the semantics of any). But any and IN have short-circuit semantics that may be desirable, e.g. for efficiency.
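The effect of duplicates can be seen in a small sketch. It deliberately uses [4,4] as the acceptable list, which is an assumption made for illustration and not part of the question; the variable names are likewise hypothetical.

```shell
input='{"potato":4}'

# The select-only form emits the input once per matching element of $acceptable.
naive=$(printf '%s' "$input" | jq -c '
  [4,4] as $acceptable | select($acceptable[] == .potato)
')

# any/2 stops at the first match, so the input is emitted at most once.
with_any=$(printf '%s' "$input" | jq -c '
  [4,4] as $acceptable | .potato as $k | select(any($acceptable[]; . == $k))
')

printf 'naive:\n%s\nwith any:\n%s\n' "$naive" "$with_any"
```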

jq - How do I print a parent value of an object when I am already deep into the object's children?

Say I have the following JSON, stored in my variable jsonVariable.
{
"id": 1,
"details": {
"username": "jamesbrown",
"name": "James Brown"
}
}
I parse this JSON with jq using the following:
echo "$jsonVariable" | jq -r '.details.name | select(. == "James Brown")'
This would give me the output
James Brown
But what if I want to get the id of this person as well? Now, I'm aware this is a rough and simple example - the program I'm working with at the moment is 5 or 6 levels deep with many different JQ functions other than select. I need a way to select a parent's field when I am already 5 or 6 layers deep after carrying out various methods of filtering.
Can anyone help? Is there any way of 'going in reverse', back up to the parent? (Not sure if I'm making sense!)
For a more generic approach, save the value of the "parent" element at the detail level you want, then pipe it at the end of your filter:
jq '. as $parent | .details.name | select(. == "James Brown") | $parent'
Of course, for the trivial case you describe, you could omit this entirely:
jq 'select(.details.name == "James Brown")'
Also, consider that if your selecting filters return many matches for a single parent object, you will receive a copy of the parent object for each match. You may wish to make sure your select filters only return one element at the parent level by wrapping all matches below parent level into an array, or to deduplicate the final result with unique.
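A small sketch of that duplication, using a made-up tags array as the child level (it is not in the question's data): iterating over the children yields the parent once per match, while testing the children with any/2 keeps the parent as the current value and yields it once.

```shell
# Hypothetical parent object with a repeated child value.
input='{"id":1,"tags":["a","b","a"]}'

# One output per matching child: the parent id appears twice.
per_match=$(printf '%s' "$input" | jq -c '
  . as $parent | .tags[] | select(. == "a") | $parent.id
')

# Test the children without descending into them: the parent id appears once.
once=$(printf '%s' "$input" | jq -c 'select(any(.tags[]; . == "a")) | .id')

printf 'per match:\n%s\nonce:\n%s\n' "$per_match" "$once"
```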
Give this a shot:
echo "$jsonVariable" | jq '{Name: .details.name, Id: .id} | select(.Name == "James Brown")'
Rather than querying up to the value you're testing for, query up to the root object that contains the value you're querying on and the values you wish to select.
You need the object that contains both the id and the name.
$ jq --arg name 'James Brown' 'select(.details.name == $name).id' input.json