Convert bash output to JSON - json

I am running the following command:
sudo clustat | grep primary | awk 'NF{print $1",""server:"$2 ",""status:"$3}'
Results are:
service:servicename,server:servername,status:started
service:servicename,server:servername,status:started
service:servicename,server:servername,status:started
service:servicename,server:servername,status:started
service:servicename,server:servername,status:started
My desired result is:
{"service":"servicename","server":"servername","status":"started"}
{"service":"servicename","server":"servername","status":"started"}
{"service":"servicename","server":"servername","status":"started"}
{"service":"servicename","server":"servername","status":"started"}
{"service":"servicename","server":"servername","status":"started"}
I can't seem to put the qoutation marks withour srewing up my output.

Use jq:
sudo clustat | grep primary |
jq -R 'split(" ")|{service:.[0], server:.[1], status:.[2]}'
The input is read as raw text, not JSON. Each line is split on a space (the argument to split may need to be adjusted depending on the actual input). jq ensures that values are properly quoted when constructing the output objects.

Don't do this: Instead, use #chepner's answer, which is guaranteed to generate valid JSON as output with all possible inputs (or fail with a nonzero exit status if no JSON representation is possible).
The below is only tested to generate valid JSON with the specific inputs shown in the question, and will quite certainly generate output that is not valid JSON with numerous possible inputs (strings with literal quotes, strings ending in literal backslashes, etc).
sudo clustat |
awk '/primary/ {
print "{\"service\":\"" $1 "\",\"server\":\"" $2 "\",\"status\":\""$3"\"}"
}'

For JSON conversion of common shell commands, a good option is jc (JSON Convert)
There is no parser for clustat yet though.
clustat output does look table-like, so you may be able to use the --asciitable parser with jc.

Related

convert JSON object to Prometheus metrics format using jq

Consider a JSON object like
{
"foo": 42,
"baz": -12,
"bar{label1=\"value1\"}": 12.34
}
constructed by jq using some data source. The actual key names and their amount may vary, but the result will always be an object with numbers (int or float) as values. The keys may contain quotation marks, but no whitespaces.
Can I use jq to format the object into a Prometheus-compatible format so I can just use the output to push the data to a Prometheus Pushgateway?
The required result would look like
foo 42
bar{label1="value1"} 12.34
baz -12
i.e. space-separated with newlines (no \r) and without quotes except for the label value.
I can't use bash for post-processing and would therefore prefer a pure jq solution if possible.
Use keys_unsorted to get object keys (keys does the same as well but the former is faster), generate desired output by means of string interpolation.
$ jq -r 'keys_unsorted[] as $k | "\($k) \(.[$k])"' file
foo 42
baz -12
bar{label1="value1"} 12.34
And, by adding -j option and printing line feed manually as #peak suggested you can make this portable.
On a Windows platform, jq will normally use CR-LF for newlines; to prevent this, use the -j command-line option and manually insert the desired 'newline' characters like so:
jq -rj 'to_entries[] | "\(.key) \(.value)\n"' file

map, but with newline characters between keypairs

Say I have input like
{"DESCRIPTION": "Need to run script to do stuff", "PRIORITY": "Medium"}
but also get input like
{"STACK_NAME": "applecakes", "BACKEND_OR_INTEGRATIONS": "integrations", "PRIORITY": "Medium"}
ie, the parameters can be completely different.
I need to get the output in a format more friendly to send to Jira to make tickets. Specifically, I would like to strip the json formatting away, and insert a \n between each keypair. Here's what the above samples should look like:
DESCRIPTION: Need to run script to do stuff\nPRIORITY: Medium
STACK_NAME: applecakes\nBACKEND_OR_INTEGRATIONS: integrations\nPRIORITY: Medium
There can be a little flexibility in that if, for example, more spaces were needed or whatever.
So far I've got this worked out (assuming my input is stored in a variable called description
echo $description | jq -r "to_entries|map(\"\(.key)=\(.value|tostring)\")|.[]"
This works to strip away the JSON formatting, but doesn't handle newlines. I'm stumped on how to make sure I split only on each keypair, not on say every space or anything equally messy. What do I need to add to include newlines? Is a map even my best choice?
Just join what the array of strings with \\n (the sequence of the \ character which we need to escape and the n character) and use raw-output :
jq --raw-output 'to_entries | map("\(.key) : \(.value)") | join("\\n")'
Try it here.
Or more efficiently and more simply:
jq -r 'to_entries[] | "\(.key) : \(.value)"'
This produces one line per key-value pair.
The two-character sequence \n as a join-string
With your sample JSON, the invocation:
jq -j -r 'to_entries[] | "\(.key) : \(.value)", "\\n" '
would produce:
STACK_NAME : applecakes\nBACKEND_OR_INTEGRATIONS : integrations\nPRIORITY : Medium\n
Notice the trailing "\n".

Getting JSON value from JSON String using Shell Script

I have this JSON String:
{"name":"http://someUrl/ws/someId","id":"someId"}
I just want to get value for "id" key and store it in some variable. I succesfully tried using jq. But due to some constraints, I need to achieve this just by using grep and string matching.
I tried this so far: grep -Po '"id":.*?[^\\]"'; But that is giving "id":"ws-4c906698-03a2-49c3-8b3e-dea829c7fdbe" as output. I just need the id value. Please help
With a PCRE regex, you may use lookarounds. Thus, you need to put "id":" into the positive lookbehind construct, and then match 1 or more chars other than ":
grep -Po '(?<="id":")[^"]+'
where
(?<="id":") - requires a "id":" to appear immediately to the left of the current position (but the matched text is not added to the match value) and
[^"]+ - matches and adds to the match 1 or more chars other than ".
To get the values with escaped quotes:
grep -Po '(?<="id":")[^"\\]*(?:\\.[^"\\]*)*'
Here, (?<="id":") will still match the position right after "id":" and then the following will get matched:
[^"\\]* - zero or more chars other than " and \
(?:\\.[^"\\]*)* - zero or more consequent sequences of:
\\. - a \ and any char (any escape sequence)
[^"\\]* - zero or more chars other than " and \
See Jshon, it is a command line Json parser for shell script usage.
echo '{"name":"http://someUrl/ws/someId","id":"someId"}' | jshon -e id
"someId"
Just noticed I read past the section stating you needed to use standard tools available, if your admin doesn't allow Jshon it is very likely that the system will have Python available which you could use.
echo '{"name":"http://someUrl/ws/someId","id":"someId"}' | python -c 'import sys, json; print json.load(sys.stdin)["id"]'
someId
Using grep for this is just asking for trouble, I would avoid it and opt for a proper Json parser as above.

Ignore Unparseable JSON with jq

I'm using jq to parse some of my logs, but some of the log lines can't be parsed for various reasons. Is there a way to have jq ignore those lines? I can't seem to find a solution. I tried to use the --seq argument that was recommended by some people, but --seq ignores all the lines in my file.
Assuming that each log entry is exactly one line, you can use the -R or --raw-input option to tell jq to leave the lines unparsed, after which you can prepend fromjson? | to your filter to make jq try to parse each line as JSON and throw away the ones that error.
I have log stream where some messages are in json format.
I want to pipe the json messages through jq, and just echo the rest.
The json messages are on a single line.
Solution: use grep and tee to split the lines in two streams, those starting with "^{" pipe through jq and the rest just echo to terminal.
kubectl logs -f web-svjkn | tee >(grep -v "^{") | grep "^{" | jq .
or
cat logs | tee >(grep -v "^{") | grep "^{" | jq .
Explanation:
tee generates 2nd stream, and grep -v prints non json info, 2nd grep only pipes what looks like json opening bracket to jq.
This is an old thread, but here's another solution fully in jq. This allows you to both process proper json lines and also print out non-json lines.
jq -R . as $line | try (fromjson | <further processing for proper json lines>) catch $line'
There are several Q&As on the FAQ page dealing with the topic of "invalid JSON", but see in particular the Q:
Is there a way to have jq keep going after it hits an error in the input file?
In particular, this shows how to use --seq.
However, from the the sparse details you've given (SO recommends a minimal example be given), it would seem it might be better simply to use inputs. The idea is to process one JSON entity at a time, using "try/catch", e.g.
def handle: inputs | [., "length is \(length)"] ;
def process: try handle catch ("Failed", process) ;
process
Don't forget to use the -n option when invoking jq.
See also Processing not-quite-valid JSON.
If JSON in curly braces {}:
grep -Pzo '\{(?>[^\{\}]|(?R))*\}' | jq 'objects'
If JSON in square brackets []:
grep -Pzo '\[(?>[^\[\]]|(?R))*\]' | jq 'arrays'
This works if there are no []{} in non-JSON lines.

Similar strings, different results

I'm creating a Bash script to parse the air pollution levels from the webpage:
http://aqicn.org/city/beijing/m/
There is a lot of stuff in the file, but this is the relevant bit:
"iaqi":[{"p":"pm25","v":[59,21,112],"i":"Beijing pm25 (fine
particulate matter) measured by U.S Embassy Beijing Air Quality
Monitor
(\u7f8e\u56fd\u9a7b\u5317\u4eac\u5927\u4f7f\u9986\u7a7a\u6c14\u8d28\u91cf\u76d1\u6d4b).
Values are converted from \u00b5g/m3 to AQI levels using the EPA
standard."},{"p":"pm10","v":[15,5,69],"i":"Beijing pm10
(respirable particulate matter) measured by Beijing Environmental
Protection Monitoring Center
I want the script to parse and display 2 numbers: current PM2.5 and PM10 levels (the numbers in bold in the above paragraph).
CITY="beijing"
AQIDATA=$(wget -q 0 http://aqicn.org/city/$CITY/m/ -O -)
PM25=$(awk -v FS="(\"p\":\"pm25\",\"v\":\\\[|,[0-9]+)" '{print $2}' <<< $AQIDATA)
PM100=$(awk -v FS="(\"p\":\"pm10\",\"v\":\\\[|,[0-9]+)" '{print $2}' <<< $AQIDATA)
echo $PM25 $PM100
Even though I can get PM2.5 levels to display correctly, I cannot get PM10 levels to display. I cannot understand why, because the strings are similar.
Anyone here able to explain?
The following approach is based on two steps:
(1) Extracting the relevant JSON;
(2) Extracting the relevant information from the JSON using a JSON-aware tool -- here jq.
(1) Ideally, the web service would provide a JSON API that would allow one to obtain the JSON directly, but as the URL you have is intended for viewing with a browser, some form of screen-scraping is needed. There is a certain amount of brittleness to such an approach, so here I'll just provide something that currently works:
wget -O - http://aqicn.org/city/beijing/m |
gawk 'BEGIN{RS="function"}
$1 ~/getAqiModel/ {
sub(/.*var model=/,"");
sub(/;return model;}/,"");
print}'
(gawk or an awk that supports multi-character RS can be used; if you have another awk, then first split on "function", using e.g.:
sed $'s/function/\\\n/g' # three backslashes )
The output of the above can be piped to the following jq command, which performs the filtering envisioned in (2) above.
(2)
jq -c '.iaqi | .[]
| select(.p? =="pm25" or .p? =="pm10") | [.p, .v[0]]'
The result:
["pm25",59]
["pm10",15]
I think your problem is that you have a single line HTML file that contains a script that contains a variable that contains the data you are looking for.
Your field delimiters are either "p":"pm100", "v":[ or a comma and some digits.
For pm25 this works, because it is the first, and there are no occurrences of ,21 or something similar before it.
However, for pm10, there are some that are associated with pm25 ahead of it. So the second field contains the empty string between ,21 and ,112
#karakfa has a hack that seems to work -- but he doesn't explain very well why it works.
What he does is use awk's record separator (which is usually a newline) and sets it to either of :, ,, or [. So in your case, one of the records would be "pm25", because it is preceded by a colon, which is a separator, and succeeded by a comma, also a separator.
Once it hits the matching content ("pm25") it sets a counter to 4. Then, for this and the next records, it counts this counter down. "pm25" itself, "v", the empty string between : and [, and finally reaches one when hitting the record with the number you want to output: 4 && ! 3 is false, 3 && ! 2 is false, 2 && ! 1 is false, but 1 && ! 0 is true. Since there is no execution block, awk simply prints this record, which is the value you want.
A more robust work would probably be using xpath to find the script, then use some json parser or similar to get the value.
chw21's helpful answer explains why your approach didn't work.
peak's helpful answer is the most robust, because it employs proper JSON parsing.
If you don't want to or can't use third-party utility jq for JSON parsing, I suggest using sed rather than awk, because awk is not a good fit for field-based parsing of this data.
$ sed -E 's/^.*"pm25"[^[]+\[([0-9]+).+"pm10"[^[]+\[([0-9]+).*$/\1 \2/' <<< "$AQIDATA"
59 15
The above should work with both GNU and BSD/OSX sed.
To read the result into variables:
read pm25 pm10 < \
<(sed -E 's/^.*"pm25"[^[]+\[([0-9]+).+"pm10"[^[]+\[([0-9]+).*$/\1 \2/' <<< "$AQIDATA")
Note how I've chosen lowercase variable names, because it's best to avoid all upper-case variables in shell programming, so as to avoid conflicts with special shell and environment variables.
If you can't rely on the order of the values in the source string, use two separate sed commands:
pm25=$(sed -E 's/^.*"pm25"[^[]+\[([0-9]+).*$/\1/' <<< "$AQIDATA")
pm10=$(sed -E 's/^.*"pm10"[^[]+\[([0-9]+).*$/\1/' <<< "$AQIDATA")
awk to the rescue!
If you have to, you can use this hacky way using smart counters with hand-crafted delimiters. Setting RS instead of FS transfers looping through fields to awk itself. Multi-char RS is not available for all awks (gawk supports it).
$ awk -v RS='[:,[]' '$0=="\"pm25\""{c=4} c&&!--c' file
59
$ awk -v RS='[:,[]' '$0=="\"pm10\""{c=4} c&&!--c' file
15