Using jq with a large JSON file


I have an extremely large JSON file that I am working with on my Linux box. When I run jq on the file I get output in this format, which is perfect:
{
"ID": 12345,
"Name": "joe",
"Address": "123 first street",
"Email": "joe@example.com"
}
My goal is to grep for a particular value but get all of the related fields back. So if I grepped for "123 first street", I would also get the ID, name, and email that belong to that group of data.
Thus far, I have gotten here:
jq . Myfile.json | grep "123 first street"
Can anyone help me get this query right? I would like to stay with this JSON format and stay on the Linux box.

This should return all JSON objects that have a key named "field" (assuming the file is a top-level array of objects):
jq '.[] | select(has("field"))'
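To match on a value rather than on the presence of a key, select can compare a field against the search string. The following is a minimal sketch, assuming the file is a top-level array of flat objects with the keys shown in the question; the sample file and the variable names are hypothetical stand-ins for Myfile.json.

```shell
# Hypothetical sample standing in for Myfile.json (assumed: an array of flat objects).
sample=$(mktemp)
cat > "$sample" <<'EOF'
[
  {"ID": 12345, "Name": "joe", "Address": "123 first street", "Email": "joe@example.com"},
  {"ID": 67890, "Name": "ann", "Address": "9 second avenue", "Email": "ann@example.com"}
]
EOF

# Whole records whose Address equals the search string, one compact object per line.
match=$(jq -c --arg q "123 first street" '.[] | select(.Address == $q)' "$sample")
echo "$match"

# Same idea when the field name is not known: test every value of each object.
match_any=$(jq -c --arg q "123 first street" '.[] | select(any(.[]; . == $q))' "$sample")
echo "$match_any"

rm -f "$sample"
```

Because the whole object is emitted, the ID, name, and email come back together with the matching address, which grep on pretty-printed output cannot do.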

Related

Fetch json data using Regex on linux

I know we should use JQ for parsing json data, but I want to parse it using regex. I want to fetch the value of a json key into a variable in my shell script. As of now, I am using JQ for parsing.
So my abc.json is
{"value1":5.0,"value2":2.5,"value3":"2019-10-24T15:26:00.000Z","modifier":[],"value4":{"value41":{"value411":5}}}
Currently, my XYZ.sh has these lines to fetch the data
data1 =$(cat abc.json | jq -r '.value4.value41.value411')
I want data1 to have value of value411. How can I achieve this?
ps- The JSON is mutable. The above JSON is just a part of the JSON file that I want to fetch.

Is your JSON structure immutable? If you have to use a regex, consider the following:
┌──[root@vms83.liruilongs.github.io]-[~]
└─$ cat abc.json | awk -F: '{print $NF}' | grep -o '[[:digit:]]'
5

I think your problem was that you had a space between data1 and =. There can't be a space there in a shell assignment.
This works as you want it to (I removed the unnecessary cat)
data1=$(jq -r '.value4.value41.value411' abc.json)
echo "$data1"

How to check if element(s) exist in JSON array using jq, and put the corresponding object into a new file

I am running curl commands on ~50 URLs, and each returns JSON that looks like this (with a different value for 'country' in each response, while the values in 'names' may repeat or be unique).
e.g. one curl command can give JSON that looks like this:
{"names":["Mary","Tom","Sue","Rob"],"country":"USA"}
while the next curl command will give this:
{"names":["Sue"],"country":"Russia"}
and the next curl command will give this:
{"names":["Tom","Jenny"],"country":"Nigeria"}
and so on and so forth.
I have a separate list of names (e.g. Tom, Sarah, Jenny, Trinh, Nancy) and I want to find out if they're associated with a country in any of the JSON's I'm running the curl command on. If they exist in "names", I want to put the name of the person and the country into a new text file (or JSON file, doesn't matter - i just want it formatted properly), so at the end I have an output file that associates the name of the person and the country they belong to. If a country has multiple people, there shouldn't be a duplicate value for country in the output file; the names of the people should be listed under that one country.
I've tried multiple ways to solve this, but I'm not able to figure it out as it's my first time trying to write a script.
Last command that I tried:
curl "https://..." | jq -r 'select(.names[] as $a | ["Tom","Sarah","Jenny","Trinh","Nancy"] | index($a) | while read output; do tee -a listOfCountries; done; done
^This gave duplicates, and I wasn't sure how to format the output so that there were spaces between each output and so that each country had only the specific names of the people under it.
The output file (given above example) should be something like:
USA: Tom
Nigeria: Tom, Jenny
Please let me know if you have any suggestions, it'll greatly be appreciated. Thank you!
Side question: If the list of names to search is extremely long (100+ names), what is the best way to script this?

With all your JSON objects in a file, say output.jsons:
jq -c -n --argjson list '["Tom", "Sarah", "Jenny", "Trinh", "Nancy"]' '
  (reduce inputs as $in ({};
     reduce $in.names[] as $name (.; .[$name] += [$in.country]))) as $dict
  | reduce $list[] as $name ({};
      if $dict[$name]
      then reduce $dict[$name][] as $country (.; .[$country] += [$name])
      else . end)
' output.jsons
produces:
{"USA":["Tom"],"Nigeria":["Tom","Jenny"]}
You can easily transform this into the desired output.
One way to ensure uniqueness of the elements of each array would be to append the following to the filter: map_values(unique).
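Re the side question: for a long list, keep the names in a file as a JSON array and load it with --slurpfile instead of --argjson (the jq manual recommends --slurpfile over the older --argfile). Note that --slurpfile binds the variable to an array of all the JSON values in the file, so if the file holds a single array the names are $list[0][] rather than $list[].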
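As a sketch of the transformation into the "Country: names" layout requested in the question, the object produced above can be turned into text lines with to_entries and join. The variable names here are illustrative, and the input is the example result pasted in as a literal rather than piped from the real command.

```shell
# Example result from the answer, used here as a literal for illustration.
result='{"USA":["Tom"],"Nigeria":["Tom","Jenny"]}'

# One line per country: the key, a colon, then the names joined with ", ".
report=$(printf '%s\n' "$result" \
  | jq -r 'to_entries[] | "\(.key): \(.value | join(", "))"')
echo "$report"
```

Redirecting the final echo to a file (for example with > listOfCountries) would give the text file the question asks for.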

Parse JSON file without having to save it in a file

I have a command that returns a JSON dump that won't get saved into any file.
I have to parse a particular field from the JSON response without saving the output.
I am able to achieve it if I save the output of the command and then parse it using jq and grep like this:
platform json_dump platform_id >resp.json
jq . resp.json | grep elbName
But I do not want to write the output of my command platform json_dump platform_id, which is a JSON dump, into any file. I want to parse the elbName directly from the output of the command.
Is there a way I can do that?

Just pipe the program's output to jq:
platform json_dump platform_id | jq .elbName
or whatever.
PS: Use jq to get the value you want, not grep. An example of doing that:
$ echo '{"elbName":"foo"}' | jq .elbName
"foo"

You can try another pipe to pass the result to jq command
platform json_dump platform_id | jq .elbName

I'm assuming you have python :)
platform json_dump platform_id | python -c 'import sys,json; print(json.load(sys.stdin)["elbName"])' # a bit long ? :)

Ignore Unparseable JSON with jq

I'm using jq to parse some of my logs, but some of the log lines can't be parsed for various reasons. Is there a way to have jq ignore those lines? I can't seem to find a solution. I tried to use the --seq argument that was recommended by some people, but --seq ignores all the lines in my file.

Assuming that each log entry is exactly one line, you can use the -R or --raw-input option to tell jq to leave the lines unparsed, after which you can prepend fromjson? | to your filter to make jq try to parse each line as JSON and throw away the ones that error.
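A minimal sketch of that approach, assuming one log entry per line; the sample log content and the .msg field are made up for illustration.

```shell
# Hypothetical log with one unparseable line between two JSON lines.
log=$(mktemp)
cat > "$log" <<'EOF'
{"level":"info","msg":"started"}
this line is not JSON
{"level":"error","msg":"disk full"}
EOF

# -R hands each line to the filter as a raw string; fromjson? parses it
# and silently drops any line that fails to parse.
parsed=$(jq -R -r 'fromjson? | .msg' "$log")
echo "$parsed"

rm -f "$log"
```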

I have log stream where some messages are in json format.
I want to pipe the json messages through jq, and just echo the rest.
The json messages are on a single line.
Solution: use grep and tee to split the lines into two streams. Lines matching "^{" are piped through jq, and the rest are echoed to the terminal.
kubectl logs -f web-svjkn | tee >(grep -v "^{") | grep "^{" | jq .
or
cat logs | tee >(grep -v "^{") | grep "^{" | jq .
Explanation:
tee creates a second stream through the process substitution, where grep -v prints the non-JSON lines; the second grep passes only the lines that start with a JSON opening brace on to jq.

This is an old thread, but here's another solution fully in jq. This allows you to both process proper json lines and also print out non-json lines.
jq -R '. as $line | try (fromjson | <further processing for proper json lines>) catch $line'

There are several Q&As on the FAQ page dealing with the topic of "invalid JSON", but see in particular the Q:
Is there a way to have jq keep going after it hits an error in the input file?
In particular, this shows how to use --seq.
However, from the sparse details you've given (SO recommends that a minimal example be given), it might be better simply to use inputs. The idea is to process one JSON entity at a time, using "try/catch", e.g.
def handle: inputs | [., "length is \(length)"] ;
def process: try handle catch ("Failed", process) ;
process
Don't forget to use the -n option when invoking jq.
See also Processing not-quite-valid JSON.

If JSON in curly braces {}:
grep -Pzo '\{(?>[^\{\}]|(?R))*\}' | jq 'objects'
If JSON in square brackets []:
grep -Pzo '\[(?>[^\[\]]|(?R))*\]' | jq 'arrays'
This works if there are no []{} in non-JSON lines.

Bash script traversing a multi-line JSON object using jq

I have to curl to a site (statuscake.com) that sends multiple items back in a JSON, each line of which contains multiple items. I want to extract from each line two of them, WebsiteName and TestID, so I can check if WebsiteName matches the one I'm interested in, get the TestID out and pass this to a second curl statement to delete the test.
Although it's more complex, the JSON that comes back is essentially of the form
[{"TestID": 123, "WebsiteName": "SomeSite1"}, {"TestID": 1234, "WebsiteName": "SomeSite2"}]
I can't seem to find a magic jq command to do it all in one - if there is one, I'd be really happy to see it.
I've got
cat $data | jq '[.[] | .WebsiteName]'
to get an array of the website names (and a very similar one for the TestIDs), but I think I've done something daft. data holds the information coming back from the curl to get the JSON, and that's populated OK.
I want to be able to assign these to two arrays, names and ids, then search names for the index of the relevant name, grab the id from ids and pass that to the curl. Unless there's a better way.
Any advice please?

My Xidel can do it all at once by selecting the JSON with a XPath-like query:
E.g. return all ids where the WebsiteName contains "site2" from an array of objects:
xidel /tmp/x.json -e '$json()[contains((.).WebsiteName, "site2")]/TestID'
Or e.g. to download the original JSON and then make the HTTP request with the ids:
xidel http://statuscake.com/your-url... -f '$json()[contains((.).WebsiteName, "site2")]/TestID!x"/your-delete-url{.}..."'

If I'm getting your question right, it sounds like what you want is to, for each element, select those where .WebsiteName == "needle", and then get .TestID from it. You can do just that:
.[] | select(.WebsiteName == "needle") | .TestID
If you want an array as the result, you can wrap the above script in square brackets.
The jq filters startswith and endswith may be of interest to you. If you're going to pass the result back to cURL, you may also be interested in the @sh formatting filter and the -r command-line flag.
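Putting the select filter and the -r flag together in a script might look like the sketch below. It assumes the JSON is held in a shell variable named data, as in the question; target is a hypothetical variable, and the delete request is replaced by an echo because the real URL is not given.

```shell
# Sample response in the shape shown in the question.
data='[{"TestID": 123, "WebsiteName": "SomeSite1"}, {"TestID": 1234, "WebsiteName": "SomeSite2"}]'
target="SomeSite2"

# --arg passes the shell value in safely; -r prints the ID without JSON quoting.
ids=$(printf '%s\n' "$data" \
  | jq -r --arg name "$target" '.[] | select(.WebsiteName == $name) | .TestID')

for id in $ids; do
  # Placeholder for the second curl call that deletes the test.
  echo "would delete test $id"
done
```

This avoids building two parallel arrays and looking up an index, since the filter returns the matching TestID directly.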

Assuming you have a bash 4+ and assuming the json is valid (does not contain newlines in strings, etc.) this works:
$ echo "$data"
[{"TestID": 123, "WebsiteName": "SomeSite1"}, {"TestID": 1234, "WebsiteName":
"SomeSite2"}, {"TestID": 555, "WebsiteName": "foo*ba#r blah[54]quux{4,5,6}"}]
$ declare -A arr
$ while IFS= read -r line; do
    eval "$line"
  done < <(jq -M -r '.[] | @sh "arr[\(.WebsiteName)]+=\(.TestID)"' <<<"$data")
$ declare -p arr
declare -A arr='(["foo*ba#r blah[54]quux{4,5,6}"]="555" [SomeSite2]="1234" [SomeSite1]="123" )'

Here is a solution using only jq primitives.
.[]
| if .WebsiteName == "SomeSite1" then .TestID else empty end
This is essentially the same as Santiago's answer but if you are new to jq it may be informative because select/1 is defined as
def select(f): if f then . else empty end;
