Ignore Unparseable JSON with jq - json

I'm using jq to parse some of my logs, but some of the log lines can't be parsed for various reasons. Is there a way to have jq ignore those lines? I can't seem to find a solution. I tried to use the --seq argument that was recommended by some people, but --seq ignores all the lines in my file.

Assuming that each log entry is exactly one line, you can use the -R or --raw-input option to tell jq to leave the lines unparsed, after which you can prepend fromjson? | to your filter to make jq try to parse each line as JSON and throw away the ones that error.

I have log stream where some messages are in json format.
I want to pipe the json messages through jq, and just echo the rest.
The json messages are on a single line.
Solution: use grep and tee to split the lines in two streams, those starting with "^{" pipe through jq and the rest just echo to terminal.
kubectl logs -f web-svjkn | tee >(grep -v "^{") | grep "^{" | jq .
or
cat logs | tee >(grep -v "^{") | grep "^{" | jq .
Explanation:
tee generates 2nd stream, and grep -v prints non json info, 2nd grep only pipes what looks like json opening bracket to jq.

This is an old thread, but here's another solution fully in jq. This allows you to both process proper json lines and also print out non-json lines.
jq -R . as $line | try (fromjson | <further processing for proper json lines>) catch $line'

There are several Q&As on the FAQ page dealing with the topic of "invalid JSON", but see in particular the Q:
Is there a way to have jq keep going after it hits an error in the input file?
In particular, this shows how to use --seq.
However, from the the sparse details you've given (SO recommends a minimal example be given), it would seem it might be better simply to use inputs. The idea is to process one JSON entity at a time, using "try/catch", e.g.
def handle: inputs | [., "length is \(length)"] ;
def process: try handle catch ("Failed", process) ;
process
Don't forget to use the -n option when invoking jq.
See also Processing not-quite-valid JSON.

If JSON in curly braces {}:
grep -Pzo '\{(?>[^\{\}]|(?R))*\}' | jq 'objects'
If JSON in square brackets []:
grep -Pzo '\[(?>[^\[\]]|(?R))*\]' | jq 'arrays'
This works if there are no []{} in non-JSON lines.

Related

Fetch json data using Regex on linux

I know we should use JQ for parsing json data, but I want to parse it using regex. I want to fetch the value of a json key into a variable in my shell script. As of now, I am using JQ for parsing.
So my abc.json is
{"value1":5.0,"value2":2.5,"value3":"2019-10-24T15:26:00.000Z","modifier":[],"value4":{"value41":{"value411":5}}}
Currently, my XYZ.sh has these lines to fetch the data
data1 =$(cat abc.json | jq -r '.value4.value41.value411')
I want data1 to have value of value411. How can I achieve this?
ps- The JSON is mutable. The above JSON is just a part of the JSON file that I want to fetch.
Is your json structure immutable? If you have to use it, consider the following
┌──[root#vms83.liruilongs.github.io]-[~]
└─$cat abc.json | awk -F: '{print $NF}' | grep -o '[[:digit:]]'
5
I think your problem was you had a space between data and =. There can't be a space there.
This works as you want it to (I removed the unnecessary cat)
data1=$(jq -r '.value4.value41.value411' abc.json)
echo $data1

jq group_by does not play nice with .[]

I have a json file locally called pokemini.json. These are the contents of it;
{"name":"Bulbasaur","type":["Grass","Poison"],"total":318,"hp":45,"attack":49}
{"name":"Ivysaur","type":["Grass","Poison"],"total":405,"hp":60,"attack":62}
{"name":"Venusaur","type":["Grass","Poison"],"total":525,"hp":80,"attack":82}
{"name":"VenusaurMega Venusaur","type":["Grass","Poison"],"total":625,"hp":80,"attack":100}
{"name":"Charmander","type":["Fire"],"total":309,"hp":39,"attack":52}
{"name":"Charmeleon","type":["Fire"],"total":405,"hp":58,"attack":64}
{"name":"Charizard","type":["Fire","Flying"],"total":534,"hp":78,"attack":84}
{"name":"CharizardMega Charizard X","type":["Fire","Dragon"],"total":634,"hp":78,"attack":130}
{"name":"CharizardMega Charizard Y","type":["Fire","Flying"],"total":634,"hp":78,"attack":104}
{"name":"Squirtle","type":["Water"],"total":314,"hp":44,"attack":48}
There are a few types of pokemon in here and I want to do some aggregation with jq.
I could, per example, write this command;
> jq -s -c 'group_by(.type[0]) | .[]' pokemini.json
[{"name":"Charmander","type":["Fire"],"total":309,"hp":39,"attack":52},{"name":"Charmeleon","type":["Fire"],"total":405,"hp":58,"attack":64},{"name":"Charizard","type":["Fire","Flying"],"total":534,"hp":78,"attack":84},{"name":"CharizardMega Charizard X","type":["Fire","Dragon"],"total":634,"hp":78,"attack":130},{"name":"CharizardMega Charizard Y","type":["Fire","Flying"],"total":634,"hp":78,"attack":104}]
[{"name":"Bulbasaur","type":["Grass","Poison"],"total":318,"hp":45,"attack":49},{"name":"Ivysaur","type":["Grass","Poison"],"total":405,"hp":60,"attack":62},{"name":"Venusaur","type":["Grass","Poison"],"total":525,"hp":80,"attack":82},{"name":"VenusaurMega Venusaur","type":["Grass","Poison"],"total":625,"hp":80,"attack":100}]
[{"name":"Squirtle","type":["Water"],"total":314,"hp":44,"attack":48}]
I am aware that the -c flag is what is causing it to print line by line and that I need -s to handle the fact that my json file is more like jsonlines that actualy json. It should also be pointed that out there are only three types of pokemon detected because I can grouping over .type[0] (note that [0]).
I don't get why this does not work though;
> jq -s '.[] | group_by(.type[0])' pokemini.json
jq: error (at pokemini.json:10): Cannot index string with string "type"
group_by/1 expects its input to be an array. By calling .[] first, you are effectively undoing the work of the -s option.
By the way, an alternative to using -s is to use inputs with the -n command-line option, but in this case it makes little difference. When you don’t actually need to read all the entire stream of inputs at once, though, using inputs is in general more efficient.

Split multiple input JSONs with jq

Given a JSON line
{"a":0,"b":{"c":"C"}}{"x":33}{"asd":889}
of 3 independent JSON objects.
And need to handle then one by one. It would be nice to have something like
echo "$json" | jq --first-one
Expected output:
{"a":0,"b":{"c":"C"}}
I found the only command which can remove first object and output others. inputs
echo '{"a":0,"b":{"c":"C"}}{"x":33}{"asd":889}' | jq -c inputs
output:
{"x":33}
{"asd":889}
How to read only first object from input stream and do not touch the rest objects?
Workaround
While writing this Q I found a workaround, but it looks cumbersome
echo '{"a":0,"b":{"c":"C"}}{"x":33}{"asd":889}' | jq -c . | head -1
simply get first line...
Slurping should, in general, be avoided if possible. If your jq has input, you could simply write:
echo '{"a":0,"b":{"c":"C"}}{"x":33}{"asd":889}' |
jq -n input
If your jq does not have input, now would be a great time to upgrade to jq 1.6. If that is not an option, then by all means use the -s option, e.g. jq -s '.[0]'

How to create 2 CSV files from 1 JSON using JQ

I have a lot of rather large JSON logs which need to be imported into several DB tables.
I can easily parse them and create 1 CSV for import.
But how can I parse the JSON and get 2 different CSV files as output?
Simple (nonsense) example:
testJQ.log
{"id":1234,"type":"A","group":"games"}
{"id":5678,"type":"B","group":"cars"}
using
cat testJQ.log|jq --raw-output '[.id,.type,.group]|#csv'>testJQ.csv
I get one file testJQ.csv
1234,"A","games
5678,"B","cars"
But I would like to get this
types.csv
1234,"A"
5678,"B"
groups.csv
1234,"games"
5678,"cars"
Can this be done without having to parse the JSON twice, first time creating the types.csv and second time the groups.csv like this?
cat testJQ.log|jq --raw-output '[.id,.type]|#csv'>types.csv
cat testJQ.log|jq --raw-output '[.id,.group]|#csv'>groups.csv
I suppose one way you could hack this up is to output the contents of one file to stdout and the others to stderr and redirect to separate files. Of course you're limited to two files though.
$ <testJQ.log jq -r '([.id,.type]|#csv),([.id,.group]|#csv|stderr|empty)' \
1>types.csv 2>groups.csv
stderr outputs to stderr but the value propagates to the output, so you'll want to follow that up with empty to swallow that up.
Personally I wouldn't recommend doing this, I would just write a python script (or other language) to parse this if you needed to output to multiple files.
You will either need to run jq twice, or to run jq in conjunction with another program to "split" the output of the call to jq. For example, you could use a pipeline of the form: jq -c ... | awk ...
The potential disadvantage of the pipeline approach is that if JSON is the final output, it will be JSONL; but obviously that doesn't apply here.
There are many ways to craft such a pipeline. For example, assuming there are no raw newlines in the CSV:
< testJQ.log jq -r '
"types", ([.id,.type] |#csv),
"groups", ([.id,.group]|#csv)' |
awk 'NR % 2 == 1 {out=$1; next} {print >> out".csv"}'
Or:
< testJQ.log jq -r '([.id,.type],[.id,.group])|#csv' |
awk '{ out = ((NR % 2) == 1) ? "types" : "groups"; print >> out".csv"}'
For other examples, see e.g.
Using jq how can I split a very large JSON file into multiple files, each a specific quantity of objects?
Splitting / chunking JSON files with JQ in Bash or Fish shell?
Split JSON into multiple files
Handling raw new-lines
Whether or not you split the CSV into multiple files, there is a potential issue with embedded raw newlines. One approach is to change "\n" in JSON strings to "\\n", e.g.
jq -r '([.id,.type],[.id,.group])
| map(if type == "string" then gsub("\n";"\\n") else . end)
| #csv'

I cannot get jq to give me the value I'm looking for.

I'm trying to use jq to get a value from the JSON that cURL returns.
This is the JSON cURL passes to jq (and, FTR, I want jq to return "VALUE-I-WANT" without the quotation marks):
[
{
"success":{
"username":"VALUE-I-WANT"
}
}
]
I initially tried this:
jq ' . | .success | .username'
and got
jq: error (at <stdin>:0): Cannot index array with string "success"
I then tried a bunch of variations, with no luck.
With a bunch of searching the web, I found this SE entry, and thought it might have been my saviour (spoiler, it wasn't). But it led me to try these:
jq -r '.[].success.username'
jq -r '.[].success'
They didn't return an error, they returned "null". Which may or may not be an improvement.
Can anybody tell me what I'm doing wrong here? And why it's wrong?
You need to pipe the output of .[] into the next filter.
jq -r '.[] | .success.username' tmp.json
tl;dr
# Extract .success.username from ALL array elements.
# .[] enumerates all array elements
# -r produces raw (unquoted) output
jq -r '.[].success.username' file.json
# Extract .success.username only from the 1st array element.
jq -r '.[0].success.username' file.json
Your input is an array, so in order to access its elements you need .[], the array/object-value iterator (as the name suggests, it can also enumerate the properties of an object):
Just . | sends the input (.) array as a whole through the pipeline, and an array only has numerical indices, so the attempt to index (access) it with .success.username fails.
Thus, simply replacing . | with .[] | in your original attempt, combined with -r to get raw (unquoted output), should solve your problem, as shown in chepner's helpful answer.
However, peak points out that since at least jq 1.3 (current as of this writing is jq 1.5) you don't strictly need a pipeline, as demonstrated in the commands at the top.
So the 2nd command in your question should work with your sample input, unless you're using an older version.