I am trying to parse a very large file which consists of JSON objects like this:
{"id": "100000002", "title": "some_title", "year": 1988}
Now I also have a very big list of ID's that I want to extract from the file, if they are there.
Now I know that I can do this:
jq '[ .[map(.id)|indices("1", "2")[]] ]' 0.txt > p0.json
Which produces the result I want, namely fills p0.json with only the objects that have "id" 1 and "2". Now comes the problem: my list of id's is very long too (100k or so). So I have a Python programm that outputs the relevant id's. My line of thought was, to first assign that to a variable:
REL_IDS=`echo python3 rel_ids.py`
And then do:
jq --arg ids "$REL_IDS" '[ .[map(.id)|indices($ids)[]] ]' 0.txt > p0.json
I tried both with brackets [$ids] and without brackets, but no luck so far.
My question is, given a big amount of arguments for the filter, how would I proceed with putting them into my jq command?
Thanks a lot in advance!
Since the list of ids is long, the trick is NOT to use --arg. However, the details will depend on the details regarding the "long list of ids".
In general, though, you'd want to present the list of ids to jq as a file so that you could use --rawfile or --slurpfile or some such.
If for some reason you don't want to bother with an actual file, then provided your shell allows it, you could use these file-oriented options with process substitution: <( ... )
Example
Assuming ids.json contains a lising of the ids as JSON strings:
"1"
"2"
"3"
then one could write:
< objects.json jq -c -n --slurpfile ids ids.json '
inputs | . as $in | select( $ids | index($in.id))'
Notice the use of the -n command-line option.
I have a json file locally called pokemini.json. These are the contents of it;
{"name":"Bulbasaur","type":["Grass","Poison"],"total":318,"hp":45,"attack":49}
{"name":"Ivysaur","type":["Grass","Poison"],"total":405,"hp":60,"attack":62}
{"name":"Venusaur","type":["Grass","Poison"],"total":525,"hp":80,"attack":82}
{"name":"VenusaurMega Venusaur","type":["Grass","Poison"],"total":625,"hp":80,"attack":100}
{"name":"Charmander","type":["Fire"],"total":309,"hp":39,"attack":52}
{"name":"Charmeleon","type":["Fire"],"total":405,"hp":58,"attack":64}
{"name":"Charizard","type":["Fire","Flying"],"total":534,"hp":78,"attack":84}
{"name":"CharizardMega Charizard X","type":["Fire","Dragon"],"total":634,"hp":78,"attack":130}
{"name":"CharizardMega Charizard Y","type":["Fire","Flying"],"total":634,"hp":78,"attack":104}
{"name":"Squirtle","type":["Water"],"total":314,"hp":44,"attack":48}
There are a few types of pokemon in here and I want to do some aggregation with jq.
I could, per example, write this command;
> jq -s -c 'group_by(.type[0]) | .[]' pokemini.json
[{"name":"Charmander","type":["Fire"],"total":309,"hp":39,"attack":52},{"name":"Charmeleon","type":["Fire"],"total":405,"hp":58,"attack":64},{"name":"Charizard","type":["Fire","Flying"],"total":534,"hp":78,"attack":84},{"name":"CharizardMega Charizard X","type":["Fire","Dragon"],"total":634,"hp":78,"attack":130},{"name":"CharizardMega Charizard Y","type":["Fire","Flying"],"total":634,"hp":78,"attack":104}]
[{"name":"Bulbasaur","type":["Grass","Poison"],"total":318,"hp":45,"attack":49},{"name":"Ivysaur","type":["Grass","Poison"],"total":405,"hp":60,"attack":62},{"name":"Venusaur","type":["Grass","Poison"],"total":525,"hp":80,"attack":82},{"name":"VenusaurMega Venusaur","type":["Grass","Poison"],"total":625,"hp":80,"attack":100}]
[{"name":"Squirtle","type":["Water"],"total":314,"hp":44,"attack":48}]
I am aware that the -c flag is what is causing it to print line by line and that I need -s to handle the fact that my json file is more like jsonlines that actualy json. It should also be pointed that out there are only three types of pokemon detected because I can grouping over .type[0] (note that [0]).
I don't get why this does not work though;
> jq -s '.[] | group_by(.type[0])' pokemini.json
jq: error (at pokemini.json:10): Cannot index string with string "type"
group_by/1 expects its input to be an array. By calling .[] first, you are effectively undoing the work of the -s option.
By the way, an alternative to using -s is to use inputs with the -n command-line option, but in this case it makes little difference. When you don’t actually need to read all the entire stream of inputs at once, though, using inputs is in general more efficient.
I have a JSON file and I am extracting data from it using jq. One simple use case is pulling out any JSON Object that contains an Id which is provided as an argument.
I use the following simple script to do so:
[.[] | select(.id == $ID)]
The script is stored in a separate file (by_id.jq) which I pass in using the -f argument.
The full command looks something like this:
cat ./my_json_file.json | jq -sf --arg ID "8df993c1-57d5-46b3-a8a3-d95066934e5b" ./by_id.jq
Is there a way by only using jq that a comma separated list of values could be passed as an argument to the jq script and iterate through the ids and check them against the value of .id in the the JSON file with the result being the objects that have that id?
For example if I wanted to pull out three objects by their ids I would want to structure the command in this way:
cat ./my_json_file.json | jq -sf --arg ID "8df993c1-57d5-46b3-a8a3-d95066934e5b,1d5441ca-5758-474d-a9fc-40d0f68aa538,23cc618a-8ad4-4141-bc1c-0251y0663963" ./by_id.jq
Sure. Though you'll need to parse (split) that list of ids to something that jq can work with, such as an array of ids. Then your problem becomes, given an array of keys, select objects that have any of these ids. Which you could use approaches found here.
$ jq --arg ID '8df993c1-57d5-46b3-a8a3-d95066934e5b,1d5441ca-5758-474d-a9fc-40d0f68aa538,23cc618a-8ad4-4141-bc1c-0251y0663963' '
select(.id | IN($ID|split(",")[]))
' ./my_json_file.json
I'm not sure what your input looks like but judging by your use of slurping then filtering the slurped input, it's a stream of objects. The slurping is not necessary here.
Here is an approach that focuses on efficiency.
Your Q indicates that in fact you have a stream of objects, so the first step towards efficiency is to avoid the -s option, and use -n with inputs instead.
The second step it to avoid splitting your comma-separated string of values more than once.
So your script might look like this:
INDEX($ids | splits(","); .) as $dict
| inputs
| select($dict[.id])
And the invocation would look like this:
jq -n --args a,b,c -f by_id.jq
This of course assumes that simply splitting the string of ids on "," will suffice. You might need to trim the values and take care of other potential anomalies.
For efficiency, it would be better to split $ID just once.
So if you have to use the -s option, you could use the following jq program:
INDEX($ID | splits(","); .) as $dict
| .[]
| select($dict[.id])
I'm trying to use jq to get a value from the JSON that cURL returns.
This is the JSON cURL passes to jq (and, FTR, I want jq to return "VALUE-I-WANT" without the quotation marks):
[
{
"success":{
"username":"VALUE-I-WANT"
}
}
]
I initially tried this:
jq ' . | .success | .username'
and got
jq: error (at <stdin>:0): Cannot index array with string "success"
I then tried a bunch of variations, with no luck.
With a bunch of searching the web, I found this SE entry, and thought it might have been my saviour (spoiler, it wasn't). But it led me to try these:
jq -r '.[].success.username'
jq -r '.[].success'
They didn't return an error, they returned "null". Which may or may not be an improvement.
Can anybody tell me what I'm doing wrong here? And why it's wrong?
You need to pipe the output of .[] into the next filter.
jq -r '.[] | .success.username' tmp.json
tl;dr
# Extract .success.username from ALL array elements.
# .[] enumerates all array elements
# -r produces raw (unquoted) output
jq -r '.[].success.username' file.json
# Extract .success.username only from the 1st array element.
jq -r '.[0].success.username' file.json
Your input is an array, so in order to access its elements you need .[], the array/object-value iterator (as the name suggests, it can also enumerate the properties of an object):
Just . | sends the input (.) array as a whole through the pipeline, and an array only has numerical indices, so the attempt to index (access) it with .success.username fails.
Thus, simply replacing . | with .[] | in your original attempt, combined with -r to get raw (unquoted output), should solve your problem, as shown in chepner's helpful answer.
However, peak points out that since at least jq 1.3 (current as of this writing is jq 1.5) you don't strictly need a pipeline, as demonstrated in the commands at the top.
So the 2nd command in your question should work with your sample input, unless you're using an older version.
I am trying to create a menu using the select function in bash. I am accessing an API which will return its output in json format. I will then process what the API returns into a select statement where a user can then interact with.
Here is the API call and how I parse the output:
curl -H "Authorization:Bearer $ACCESS_TOKEN" https://api.runscope.com/buckets \
| python -mjson.tool > output.json
This will send the output from the curl through python's json parsing tool and finally into the output.json file.
I then create an array using this json blob. I had to set IFS to \n in order to parse the file properly:
IFS=$'\n'
BUCKETS=("$(jq '.data | .[].name' output.json)")
I then add an exit option to the array so that users have a way to quit the selection menu:
BUCKETS+=("Exit")
Finally, I create the menu:
select BUCKET in $BUCKETS;
do
case $BUCKET in
"Exit")
echo "Exiting..."
break;;
esac
echo "You picked: $BUCKET"
done
Unfortunately, this does not create the exit option. I am able to see a menu consisting of every other option I want, except the exit option. Every option in the menu and in the array has quotes around them. How do I get the Exit option to show up?
$BUCKETS is expanding to the first element of the BUCKETS array.
Which is then being word-split and used as your select entries.
This makes sense since you wrapped the jq subshell in double quotes which prevented word-splitting from happening there (and means the IFS change isn't doing anything I believe).
Iff your entries can contain spaces and you want that assigned to an array properly the way to do that is by reading the output of jq with a while IFS= read -r entry; do loop.
BUCKETS=()
while IFS= read -r entry; do
BUCKETS+=("$entry")
done < <(jq '.data | .[].name' output.json)
Then appending your exit item to the array.
BUCKETS+=(Exit)
and then using
select BUCKET in "${BUCKETS[#]}"; do
(Both select a in l; do or select a in l\ndo there's no need for ; and \n there.)
That all being said unless you need output.json for something else you can avoid that too.
BUCKETS=()
while IFS= read -r entry; do
BUCKETS+=("$entry")
done < <(curl -H "Authorization:Bearer $ACCESS_TOKEN" https://api.runscope.com/buckets | python -mjson.tool | jq '.data | .[].name')
Instead of
jq '.data | .[].name' output.json
try
jq -r '.data | .[].name' output.json
(-r : raw data without quotes)
And the most important part :
select BUCKET in "${BUCKETS[#]}"
^^^
ARRAY syntax