Fetch JSON data using regex on Linux

I know jq is the right tool for parsing JSON, but I want to parse it with a regex. I want to read the value of a JSON key into a variable in my shell script. At the moment I am using jq for the parsing.
My abc.json is:
{"value1":5.0,"value2":2.5,"value3":"2019-10-24T15:26:00.000Z","modifier":[],"value4":{"value41":{"value411":5}}}
Currently, my XYZ.sh has these lines to fetch the data
data1 =$(cat abc.json | jq -r '.value4.value41.value411')
I want data1 to hold the value of value411. How can I achieve this?
P.S. The JSON is mutable. The JSON above is only part of the file I want to read.

Is your JSON structure immutable? If you have to use a regex, consider the following:
┌──[root@vms83.liruilongs.github.io]-[~]
└─$ cat abc.json | awk -F: '{print $NF}' | grep -o '[[:digit:]]'
5

I think your problem was the space between data1 and =. The shell does not allow a space there in an assignment.
This works as you want it to (I also removed the unnecessary cat):
data1=$(jq -r '.value4.value41.value411' abc.json)
echo "$data1"
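For the regex route the question actually asks about, here is a minimal sketch using grep and sed. It rests on assumptions that do not hold for JSON in general: the key name value411 appears only once in the input, its value is a bare number (not a string, array or object), and there is no whitespace after the colon. The sample is inlined in a variable here; with a file, grep would read abc.json directly.

```shell
# Sample input copied from the question; in practice grep would read abc.json.
json='{"value1":5.0,"value2":2.5,"value3":"2019-10-24T15:26:00.000Z","modifier":[],"value4":{"value41":{"value411":5}}}'

# grep -o prints only the matched text, i.e. "value411":5
# sed then removes everything up to and including the first colon.
data1=$(printf '%s\n' "$json" | grep -o '"value411":[^,}]*' | sed 's/^[^:]*://')

echo "$data1"
```

If the structure changes (the key moves, gains whitespace, or holds a string containing a comma or brace), the pattern silently returns the wrong text, which is why jq stays the safer choice.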

Related

Parse JSON file without having to save it in a file

I have a command that returns a JSON dump that won't get saved into any file.
I have to parse a particular field from the JSON response without saving the output.
I am able to achieve it if I save the output of the command and then parse it using jq and grep like this:
platform json_dump platform_id >resp.json
jq . resp.json | grep elbName
But I do not want to write the output of my command platform json_dump platform_id, which is a JSON dump, to any file. I want to parse the elbName directly from the output of the command.
Is there a way I can do that?
Just pipe the program's output to jq:
platform json_dump platform_id | jq .elbName
or whatever.
PS: Use jq to get the value you want, not grep. For example:
$ echo '{"elbName":"foo"}' | jq .elbName
"foo"
You can add another pipe to pass the result to the jq command:
platform json_dump platform_id | jq .elbName
I'm assuming you have python :)
platform json_dump platform_id | python -c 'import sys,json; print(json.load(sys.stdin)["elbName"])' # a bit long ? :)
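To keep the parsed value around for later use in a script, the pipe can be wrapped in a command substitution. This is a small sketch: the json_dump function and its payload are made-up stand-ins for platform json_dump platform_id, and it assumes jq is installed.

```shell
# Hypothetical stand-in for `platform json_dump platform_id`; the payload is invented.
json_dump() { printf '%s\n' '{"elbName":"foo","region":"x"}'; }

# -r prints the raw string, without the surrounding JSON double quotes.
elb_name=$(json_dump | jq -r '.elbName')

echo "$elb_name"
```

Nothing is written to disk; the JSON only ever exists in the pipe.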

Extract JSON values from a txt file and write them separated by comma

I have a txt file containing a curl response with information on thousands of downloaded files and the year in which each was downloaded.
I have tried unsuccessfully (with sed and grep) to extract the filename and the year and write them to a separate file ("filname+year.txt"), separated by a comma.
{"status_code":"200",
"status_message":"Results found.",
"results":[{"filename":"test189.pdf",
"year":"2012",
"URL":"https:\/\/www.orkistar.org\/random.php?q=iper.pdf&y=2012"
}
......
Any idea?
Use a JSON-aware tool, e.g. jq:
jq -r '.results[] as $r | $r.filename + "," + $r.year' < file.json
jq has a filter for converting to CSV. Using it ensures various edge cases are handled appropriately, assuming the goal is to generate valid CSV:
jq -r '.results[] | [.filename, .year] | @csv' file.json
In any case, notice that there is no need to introduce any named variables.
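One edge case that @csv takes care of is a comma inside a value. The sketch below uses an inlined, shortened version of the response; the second record (a,b.pdf) is invented purely to show the quoting, and jq is assumed to be installed.

```shell
# Shortened sample; the second result is made up to include a comma in the filename.
json='{"status_code":"200","results":[{"filename":"test189.pdf","year":"2012"},{"filename":"a,b.pdf","year":"2013"}]}'

# @csv wraps every string in double quotes, so the embedded comma stays inside one field.
csv=$(printf '%s\n' "$json" | jq -r '.results[] | [.filename, .year] | @csv')

printf '%s\n' "$csv"
```

With plain string concatenation (.filename + "," + .year) the second line would appear to have three fields instead of two.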

How to create 2 CSV files from 1 JSON using JQ

I have a lot of rather large JSON logs which need to be imported into several DB tables.
I can easily parse them and create 1 CSV for import.
But how can I parse the JSON and get 2 different CSV files as output?
Simple (nonsense) example:
testJQ.log
{"id":1234,"type":"A","group":"games"}
{"id":5678,"type":"B","group":"cars"}
using
cat testJQ.log|jq --raw-output '[.id,.type,.group]|@csv'>testJQ.csv
I get one file testJQ.csv
1234,"A","games"
5678,"B","cars"
But I would like to get this
types.csv
1234,"A"
5678,"B"
groups.csv
1234,"games"
5678,"cars"
Can this be done without having to parse the JSON twice, first time creating the types.csv and second time the groups.csv like this?
cat testJQ.log|jq --raw-output '[.id,.type]|@csv'>types.csv
cat testJQ.log|jq --raw-output '[.id,.group]|@csv'>groups.csv
I suppose one way you could hack this up is to output the contents of one file to stdout and the others to stderr and redirect to separate files. Of course you're limited to two files though.
$ <testJQ.log jq -r '([.id,.type]|@csv),([.id,.group]|@csv|stderr|empty)' \
1>types.csv 2>groups.csv
stderr outputs to stderr but the value propagates to the output, so you'll want to follow that up with empty to swallow that up.
Personally I wouldn't recommend doing this, I would just write a python script (or other language) to parse this if you needed to output to multiple files.
You will either need to run jq twice, or to run jq in conjunction with another program to "split" the output of the call to jq. For example, you could use a pipeline of the form: jq -c ... | awk ...
The potential disadvantage of the pipeline approach is that if JSON is the final output, it will be JSONL; but obviously that doesn't apply here.
There are many ways to craft such a pipeline. For example, assuming there are no raw newlines in the CSV:
< testJQ.log jq -r '
"types", ([.id,.type] |@csv),
"groups", ([.id,.group]|@csv)' |
awk 'NR % 2 == 1 {out=$1; next} {print >> (out ".csv")}'
Or:
< testJQ.log jq -r '([.id,.type],[.id,.group])|@csv' |
awk '{ out = ((NR % 2) == 1) ? "types" : "groups"; print >> (out ".csv")}'
For other examples, see e.g.
Using jq how can I split a very large JSON file into multiple files, each a specific quantity of objects?
Splitting / chunking JSON files with JQ in Bash or Fish shell?
Split JSON into multiple files
Handling raw new-lines
Whether or not you split the CSV into multiple files, there is a potential issue with embedded raw newlines. One approach is to change "\n" in JSON strings to "\\n", e.g.
jq -r '([.id,.type],[.id,.group])
| map(if type == "string" then gsub("\n";"\\n") else . end)
| @csv'
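To see the effect of the newline escaping, here is a small run on one made-up log entry whose type field contains a raw newline. It assumes a jq build with regex support, which gsub needs.

```shell
# Invented record: the JSON escape \n in "type" decodes to a real newline.
json='{"id":1,"type":"A\nB","group":"games"}'

# Each array is mapped separately; strings get their newlines turned into the two characters \ and n.
out=$(printf '%s\n' "$json" | jq -r '([.id,.type],[.id,.group])
  | map(if type == "string" then gsub("\n";"\\n") else . end)
  | @csv')

printf '%s\n' "$out"
```

Without the map step the first record would spill over two physical lines, which breaks any line-based splitter such as the awk commands above.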

Ignore Unparseable JSON with jq

I'm using jq to parse some of my logs, but some of the log lines can't be parsed for various reasons. Is there a way to have jq ignore those lines? I can't seem to find a solution. I tried to use the --seq argument that was recommended by some people, but --seq ignores all the lines in my file.
Assuming that each log entry is exactly one line, you can use the -R or --raw-input option to tell jq to leave the lines unparsed, after which you can prepend fromjson? | to your filter to make jq try to parse each line as JSON and throw away the ones that error.
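A minimal sketch of that approach, using three invented log lines of which the middle one is not JSON. The .msg filter is only a placeholder for whatever the real filter would be.

```shell
# Made-up log: two JSON lines and one plain-text line.
log='{"level":"info","msg":"ok"}
not json at all
{"level":"error","msg":"boom"}'

# -R hands each line to the filter as a string; fromjson? parses it and
# silently drops the lines that fail to parse.
msgs=$(printf '%s\n' "$log" | jq -R -r 'fromjson? | .msg')

printf '%s\n' "$msgs"
```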
I have log stream where some messages are in json format.
I want to pipe the json messages through jq, and just echo the rest.
The json messages are on a single line.
Solution: use grep and tee to split the lines into two streams. Lines starting with { are piped through jq, and the rest are simply echoed to the terminal.
kubectl logs -f web-svjkn | tee >(grep -v "^{") | grep "^{" | jq .
or
cat logs | tee >(grep -v "^{") | grep "^{" | jq .
Explanation:
tee creates a second stream in which grep -v prints the non-JSON lines; the second grep passes only the lines that start with an opening brace on to jq.
This is an old thread, but here's another solution fully in jq. This allows you to both process proper json lines and also print out non-json lines.
jq -R '. as $line | try (fromjson | <further processing for proper json lines>) catch $line'
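A concrete instance of this try/catch pattern, with .msg standing in for the "further processing" part. The log lines are invented, and -r is added so that the unparsed lines are echoed without JSON quoting.

```shell
# Made-up log: two JSON lines and one plain-text line.
log='{"level":"info","msg":"ok"}
not json at all
{"level":"error","msg":"boom"}'

# Valid lines are parsed and reduced to .msg; anything that errors is printed unchanged.
out=$(printf '%s\n' "$log" | jq -R -r '. as $line | try (fromjson | .msg) catch $line')

printf '%s\n' "$out"
```

Note that the catch branch also fires when the line is valid JSON but the processing itself fails, for example when .msg is applied to a bare number.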
There are several Q&As on the FAQ page dealing with the topic of "invalid JSON", but see in particular the Q:
Is there a way to have jq keep going after it hits an error in the input file?
In particular, this shows how to use --seq.
However, from the sparse details you've given (SO recommends a minimal example be given), it would seem it might be better simply to use inputs. The idea is to process one JSON entity at a time, using "try/catch", e.g.
def handle: inputs | [., "length is \(length)"] ;
def process: try handle catch ("Failed", process) ;
process
Don't forget to use the -n option when invoking jq.
See also Processing not-quite-valid JSON.
If the JSON is in curly braces {}:
grep -Pzo '\{(?>[^\{\}]|(?R))*\}' | jq 'objects'
If the JSON is in square brackets []:
grep -Pzo '\[(?>[^\[\]]|(?R))*\]' | jq 'arrays'
This works as long as the non-JSON lines contain no [, ], { or } characters.

Bash script traversing a multi-line JSON object using jq

I have to curl to a site (statuscake.com) that sends multiple items back in a JSON, each line of which contains multiple items. I want to extract from each line two of them, WebsiteName and TestID, so I can check if WebsiteName matches the one I'm interested in, get the TestID out and pass this to a second curl statement to delete the test.
Although it's more complex, the JSON that comes back is essentially of the form
[{"TestID": 123, "WebsiteName": "SomeSite1"}, {"TestID": 1234, "WebsiteName": "SomeSite2"}]
I can't seem to find a magic jq command to do it all in one - if there is one, I'd be really happy to see it.
I've got
cat $data | jq '[.[] | .WebsiteName]'
to get an array of the website names (and a very similar one for the TestIDs), but I think I've done something daft. data is the information coming back from the curl to get the JSON, and that's populated OK.
I want to be able to assign these to two arrays, names and ids, then search names for the index of the relevant name, grab the id from ids and pass that to the curl. Unless there's a better way.
Any advice please?
My Xidel can do it all at once by selecting the JSON with an XPath-like query:
E.g. return all ids where the WebsiteName contains "site2" from an array of objects:
xidel /tmp/x.json -e '$json()[contains((.).WebsiteName, "site2")]/TestID'
Or e.g. to download the original JSON and then make the HTTP request with the ids:
xidel http://statuscake.com/your-url... -f '$json()[contains((.).WebsiteName, "site2")]/TestID!x"/your-delete-url{.}..."'
If I'm getting your question right, it sounds like what you want is to, for each element, select those where .WebsiteName == "needle", and then get .TestID from it. You can do just that:
.[] | select(.WebsiteName == "needle") | .TestID
If you want an array as the result, you can wrap the above script in square brackets.
The jq filters startswith and endswith may be of interest to you. If you're going to pass the result back to cURL, you may also be interested in the @sh formatting filter and the -r command-line flag.
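Put together in a script, the select filter might be used as sketched below. The site name is passed in with --arg so that it does not have to be spliced into the jq program text. Because $data holds the JSON text itself rather than a filename, it is fed to jq with printf instead of cat. The delete URL is a placeholder, not the real statuscake endpoint, and is left commented out.

```shell
# Sample data from the question.
data='[{"TestID": 123, "WebsiteName": "SomeSite1"}, {"TestID": 1234, "WebsiteName": "SomeSite2"}]'
site='SomeSite2'

# --arg exposes the shell value as the jq variable $name (always a string).
test_id=$(printf '%s\n' "$data" | jq -r --arg name "$site" '.[] | select(.WebsiteName == $name) | .TestID')

echo "$test_id"

# Hypothetical follow-up request; the URL below is invented for illustration only.
# curl -X DELETE "https://example.invalid/delete?TestID=$test_id"
```

If several entries share the same WebsiteName, test_id will contain one ID per line, so a real script would need to loop over them.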
Assuming you have bash 4+ and assuming the JSON is valid (does not contain newlines in strings, etc.), this works:
$ echo "$data"
[{"TestID": 123, "WebsiteName": "SomeSite1"}, {"TestID": 1234, "WebsiteName":
"SomeSite2"}, {"TestID": 555, "WebsiteName": "foo*ba#r blah[54]quux{4,5,6}"}]
$ declare -A arr
$ while IFS= read -r line; do
eval "$line"
done < <(jq -M -r '.[] | @sh "arr[\(.WebsiteName)]+=\(.TestID)"' <<<"$data")
$ declare -p arr
declare -A arr='(["foo*ba#r blah[54]quux{4,5,6}"]="555" [SomeSite2]="1234" [SomeSite1]="123" )'
Here is a solution using only jq primitives.
.[]
| if .WebsiteName == "SomeSite1" then .TestID else empty end
This is essentially the same as Santiago's answer but if you are new to jq it may be informative because select/1 is defined as
def select(f): if f then . else empty end;
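The equivalence can be checked quickly by running both forms on the sample data from the question. This is only a sanity check, assuming jq is installed.

```shell
data='[{"TestID": 123, "WebsiteName": "SomeSite1"}, {"TestID": 1234, "WebsiteName": "SomeSite2"}]'

# Form 1: the built-in select.
with_select=$(printf '%s\n' "$data" | jq '.[] | select(.WebsiteName == "SomeSite1") | .TestID')

# Form 2: the same logic spelled out with if/then/else and empty.
with_if=$(printf '%s\n' "$data" | jq '.[] | if .WebsiteName == "SomeSite1" then .TestID else empty end')

echo "$with_select $with_if"
```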