jq - Parsing fields with hyphen - Invalid Numeric Literal - json

I'm trying to pull a list of product categories from an API using jq and some nested for-loops. I have to pull the category ID first, then I'm able to pull product details. Some of the category IDs have hyphens and jq seems to be treating them like math instead of a string. I've tried every manner of quoting but I'm still running into this error. In PowerShell, I'm able to pull the list just fine, but I really need this to work in bash.
Here's the expected list:
aprons
backpacks
beanies
bracelet
coaster
cutting-board
dress-shirts
duffel-bags
earring
full-brim-hats
generic-dropoff
hats
etc...
And trying to recreate the same script in Bash, here's the output:
aprons
backpacks
beanies
bracelet
coaster
parse error: Invalid numeric literal at line 1, column 6
parse error: Invalid numeric literal at line 1, column 7
parse error: Invalid numeric literal at line 1, column 5
earring
parse error: Invalid numeric literal at line 2, column 0
parse error: Invalid numeric literal at line 1, column 5
parse error: Invalid numeric literal at line 1, column 8
hats
etc...
You can see that it's running into this error with all values that contain hyphens. Here's my current script:
#!/bin/bash
CATEGORIES=$(curl -s https://api.scalablepress.com/v2/categories)
IFS=$' \t\n'
for CATEGORY in $(echo $CATEGORIES | jq -rc '.[]')
do
CATEGORY_IDS=$(echo $CATEGORY | jq -rc '."categoryId"')
for CATEGORY_ID in $(echo $CATEGORY_IDS)
do
echo $CATEGORY_ID
PRODUCT_IDS=$(curl -s https://api.scalablepress.com/v2/categories/$CATEGORY_ID | jq -rc '.products[].id')
#for PRODUCT_ID in $(echo $PRODUCT_IDS)
#do
#echo $PRODUCT_ID
#done
done
done
This is a publicly available API so you should be able to copy this script and produce the same results. All of the guides I've seen have said to put double quotes around the field you're trying to parse if it contains hyphens, but I'm having no luck trying that.

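You can loop over the category IDs right away, without the intermediate echos that break the JSON: an unquoted $(...) in a for loop is split on whitespace, so any object containing a space is cut into fragments that are no longer valid JSON. The two loops can be rewritten as: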
#!/bin/bash
CATURL="https://api.scalablepress.com/v2/categories"
curl -s "$CATURL" | jq -rc '.[] | .categoryId' | while read catid; do
echo "$catid"
curl -s "$CATURL/$catid" | jq -rc '.products[].id'
done
This prints each category ID followed by all of its product IDs, which, judging from your code, seems to be your end result:
$ ./pullcat.sh
aprons
port-authority-port-authority-â-medium-length-apron-with-pouch-pockets
port-authority-port-authority-â-full-length-apron-with-pockets
port-authority-easy-care-reversible-waist-apron-with-stain-release
port-authority-easy-care-waist-apron-with-stain-release
backpacks
port-authority-â-wheeled-backpack
nike-performance-backpack
port-authority-â-value-backpack
port-authority-â-basic-backpack
port-authority-â-cyber-backpack
port-authority-â-commuter-backpack
port-authority-â-contrast-honeycomb-backpack
port-authority-â-camo-xtreme-backpack
port-authority-â-xtreme-backpack
port-authority-â-xcapeâ-computer-backpack
port-authority-â-nailhead-backpack
nike-elite-backpack
port-authority-â-urban-backpack
eddie-bauer-eddie-bauer-â-ripstop-backpack
the-north-face-aurora-ii-backpack
the-north-face-fall-line-backpack
the-north-face-groundwork-backpack
the-north-face-connector-backpack
beanies
rabbit-skins-infant-baby-rib-cap
yupoong-adult-cuffed-knit-cap
ultra-club-adult-knit-beanie-with-cuff
ultra-club-adult-knit-beanie
ultra-club-adult-two-tone-knit-beanie
ultra-club-adult-knit-beanie-with-lid
ultra-club-adult-waffle-beanie
ultra-club-adult-knit-pom-pom-beanie-with-cuff
bayside-beanie
...
If you want just the category IDs, you can of course drop the while loop:
#!/bin/bash
CATURL="https://api.scalablepress.com/v2/categories"
curl -s "$CATURL" | jq -rc '.[] | .categoryId'
$ ./pullcat.sh
aprons
backpacks
beanies
bracelet
coaster
cutting-board
dress-shirts
duffel-bags
earring
full-brim-hats
generic-dropoff
hats
hoodies
infant-shirts
ladies-dress-shirts
ladies-dresses
ladies-long-sleeve
ladies-pants
ladies-performance-shirts
ladies-polos
ladies-short-sleeve
ladies-tank-tops
large-bags
...
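To see why the original nested for-loops failed, here is a minimal sketch. The sample JSON and its name field are assumptions standing in for the real API response; a space inside any value is enough to trigger the problem, and my guess is that the categories with hyphenated IDs are simply the ones with multi-word names.

```shell
# Hypothetical sample data standing in for the API response; the real
# payload may differ. Note the space inside the second "name" value.
json='[{"categoryId":"aprons","name":"Aprons"},{"categoryId":"cutting-board","name":"Cutting Board"}]'

# Unquoted command substitution: the shell splits jq's output on every
# space, so the second object arrives as two fragments of invalid JSON.
broken_count=0
for chunk in $(printf '%s' "$json" | jq -c '.[]'); do
  broken_count=$((broken_count + 1))
done

# Line-oriented read: each compact object stays intact on its own line.
safe_ids=$(printf '%s' "$json" | jq -c '.[]' |
  while IFS= read -r obj; do
    printf '%s' "$obj" | jq -r '.categoryId'
  done)

echo "fragments seen by the for loop: $broken_count"
echo "ids seen by the while loop:"
printf '%s\n' "$safe_ids"
```

Two objects go in, but the for loop iterates three times, and the split fragments are what jq then rejects as invalid input.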

You can select the key categoryId for each object in the array by applying the selector: curl -s https://api.scalablepress.com/v2/categories | jq 'map(.categoryId)'
This will give you a JSON array with only the values you're interested in. Then you can use the antislurp filter .[] to turn the array into individual results. jq can then output raw strings with the -r switch.
Combining everything, you can achieve what you're looking for with a one-liner:
curl -s https://api.scalablepress.com/v2/categories | jq -r 'map(.categoryId) | .[]'
Even better, you can antislurp first, and then select the key you're looking for: curl -s https://api.scalablepress.com/v2/categories | jq -r '.[] | .categoryId'
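As a quick check that these filters are interchangeable, the sketch below runs them against a small inline sample (hypothetical data, not the real API response). It also shows where the double-quoting advice from the guides actually applies: to a key name that contains a hyphen, which is not the situation here, since the hyphens are in the values.

```shell
# Hypothetical sample; the real API returns more fields per category.
json='[{"categoryId":"aprons"},{"categoryId":"cutting-board"}]'

a=$(printf '%s' "$json" | jq -r 'map(.categoryId) | .[]')
b=$(printf '%s' "$json" | jq -r '.[] | .categoryId')
c=$(printf '%s' "$json" | jq -r '.[].categoryId')

printf '%s\n' "$a"
[ "$a" = "$b" ] && [ "$b" = "$c" ] && echo "all three filters agree"

# Quoting is only needed when the key itself has a hyphen (made-up key).
d=$(printf '%s' '{"category-id":"hats"}' | jq -r '."category-id"')
echo "$d"
```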

Related

Pretty-print valid JSONs mixed with string keys

I have a Redis hash with keys and values like string key -- serialized JSON value.
Corresponding rediscli query (hgetall some_redis_hash) being dumped in a file:
redis_key1
{"value1__key1": "value1__value1", "value1__key2": "value1__value2" ...}
redis_key2
{"value2__key1": "value2__value1", "value2__key2": "value2__value2" ...}
...
and so on.
So the question is, how do I pretty-print these values enclosed in curly brackets? (Note that the key strings in between make the document invalid if you try to parse it as a whole.)
The first thought is to get particular pairs from Redis, strip parasite keys, and use jq on the remaining valid JSON, as shown below:
rediscli hget some_redis_hash redis_key1 > file && tail -n +2 file
- file now contains valid JSON as value, the first string representing Redis key is stripped by tail -
cat file | jq
- produces pretty-printed value -
So the question is, how to pretty-print without such preprocessing?
Or (would be better in this particular case) how to merge keys and values in one big JSON, where Redis keys, accessible on the upper level, will be followed by dicts of their values?
Like that:
rediscli hgetall some_redis_hash > file
cat file | cool_parser
- prints { "redis_key1": {"value1__key1": "value1__value1", ...}, "redis_key2": ... }
A simple way for just pretty-printing would be the following:
cat file | jq --raw-input --raw-output '. as $raw | try fromjson catch $raw'
It tries to parse each line as json with fromjson, and just outputs the original line (with $raw) if it can't.
(The --raw-input is there so that we can invoke fromjson enclosed in a try instead of running it on every line directly, and --raw-output is there so that any non-json lines are not enclosed in quotes in the output.)
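A small self-contained run of that filter, using a made-up two-key dump in place of the real hgetall output, might look like this:

```shell
# Hypothetical dump: key lines alternating with serialized JSON values.
dump='redis_key1
{"value1__key1": "value1__value1"}
redis_key2
{"value2__key1": "value2__value1"}'

# Lines that parse as JSON are pretty-printed; the rest pass through as-is.
pretty=$(printf '%s\n' "$dump" |
  jq --raw-input --raw-output '. as $raw | try fromjson catch $raw')

printf '%s\n' "$pretty"
```

One caveat with this approach: a key that happens to be valid JSON on its own (for example a bare number) would be parsed rather than passed through.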
A solution for the second part of your question using only jq:
cat file \
| jq --raw-input --null-input '[inputs] | _nwise(2) | {(.[0]): .[1] | fromjson}' \
| jq --null-input '[inputs] | add'
--null-input combined with [inputs] produces the whole input as an array
which _nwise(2) then chunks into groups of two (more info on _nwise)
which {(.[0]): .[1] | fromjson} then transforms into a list of jsons
which | jq --null-input '[inputs] | add' then combines into a single json
Or in a single jq invocation:
cat file | jq --raw-input --null-input \
'[ [inputs] | _nwise(2) | {(.[0]): .[1] | fromjson} ] | add'
...but by that point you might be better off writing an easier to understand python script.
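If staying in jq is preferred, one possible variant avoids the internal _nwise helper by pairing each key line with the line that follows it, using inputs and input. This is only a sketch: it assumes the dump strictly alternates key and value lines, and the sample data is made up.

```shell
# Hypothetical dump: key lines alternating with serialized JSON values.
dump='redis_key1
{"value1__key1": "value1__value1"}
redis_key2
{"value2__key1": "value2__value1"}'

# For every key line read by `inputs`, `input` consumes the next line
# (the value), which is parsed and stored under that key.
merged=$(printf '%s\n' "$dump" |
  jq --raw-input --null-input --compact-output \
    '[inputs as $key | {($key): (input | fromjson)}] | add')

printf '%s\n' "$merged"
```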

Can't put JSON output into CSV format with jq

I'm building a list of AWS EBS volumes attributes so I can store it as CSV in a variable, using jq. I'm going to output the variable to a spread sheet.
The first command gives the values I'm looking for using jq:
aws ec2 describe-volumes | jq -r '.Volumes[] | .VolumeId, .AvailabilityZone, .Attachments[].InstanceId, .Attachments[].State, (.Tags // [] | from_entries.Name)'
Gives output that I want like this:
MIAPRBcdm0002_test_instance
vol-0105a1678373ae440
us-east-1c
i-0403bef9c0f6062e6
attached
MIAPRBcdwb00000_app1_vpc
vol-0d6048ec6b2b6f1a4
us-east-1c
MIAPRBcdwb00001 /carbon
vol-0cfcc6e164d91f42f
us-east-1c
i-0403bef9c0f6062e6
attached
However, if I put it into CSV format so I can output the variable to a spread sheet, the command blows up and doesn't work:
aws ec2 describe-volumes | jq -r '.Volumes[] | .VolumeId, .AvailabilityZone, .Attachments[].InstanceId, .Attachments[].State, (.Tags // [] | from_entries.Name)| @csv'
jq: error (at <stdin>:4418): string ("vol-743d1234") cannot be csv-formatted, only array
Even putting the top level of the JSON into CSV format fails for EBS volumes:
aws ec2 describe-volumes | jq -r '.Volumes[].VolumeId | @csv'
jq: error (at <stdin>:4418): string ("vol-743d1234") cannot be csv-formatted, only array
Here is the AWS EBS Volumes JSON FILE that I am working with, with these commands (the file has been cleaned of company identifiers, but is valid json).
How can I get this json into CSV format using jq?
You can only apply @csv to an array, so enclose your filter in [..] as below:
jq -r '[.Volumes[] | .VolumeId, .AvailabilityZone, .Attachments[].InstanceId, .Attachments[].State, (.Tags // [] | from_entries.Name)]|@csv'
Using the above might still retain the quotes, so using join() would also be appropriate here
jq -r '[.Volumes[] | .VolumeId, .AvailabilityZone, .Attachments[].InstanceId, .Attachments[].State, (.Tags // [] | from_entries.Name)] | join(",")'
The accepted Answer resolves another obscure jq error:
string ("xxx") cannot be csv-formatted, only array
In my case I did not want the entire output of jq on one line, but rather each Elasticsearch document I supplied to jq to be printed as a CSV string on a line of its own. To accomplish this I simply moved the brackets to enclose only the items to be included on each line.
First, by placing my brackets only around items to be included on each line of output, I produced:
jq -r '.hits.hits[]._source | [.syscheck.path, .syscheck.size_after]'
[
"/etc/group-",
"783"
]
[
"/etc/gshadow-",
"640"
]
[
"/etc/group",
"795"
]
[
"/etc/gshadow",
"652"
]
[
"/etc/ssh/sshd_config",
"3940"
]
Piping this to | @csv prints each document's values of .syscheck.path and .syscheck.size_after, quoted and comma-separated, on a separate line:
$ jq -r '.hits.hits[]._source | [.syscheck.path, .syscheck.size_after] | @csv'
"/etc/group-","783"
"/etc/gshadow-","640"
"/etc/group","795"
"/etc/gshadow","652"
"/etc/ssh/sshd_config","3940"
Or to omit quotation marks, following the pattern noted in the accepted Answer:
$ jq -r '.hits.hits[]._source | [.syscheck.path, .syscheck.size_after] | join(",")'
/etc/group-,783
/etc/gshadow-,640
/etc/group,795
/etc/gshadow,652
/etc/ssh/sshd_config,3940
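Applied back to the EBS volumes question, the same per-row bracket placement could be sketched as follows. The sample JSON is invented, and taking only the first attachment of each volume is an assumption made here so that every volume yields exactly one row.

```shell
# Hypothetical, trimmed-down describe-volumes output.
volumes='{"Volumes":[
  {"VolumeId":"vol-1","AvailabilityZone":"us-east-1c",
   "Attachments":[{"InstanceId":"i-1","State":"attached"}],
   "Tags":[{"Key":"Name","Value":"app1"}]},
  {"VolumeId":"vol-2","AvailabilityZone":"us-east-1c","Attachments":[]}
]}'

# One array per volume, so @csv emits one line per volume.
# Missing attachments or tags become null, which @csv renders as empty.
csv=$(printf '%s' "$volumes" | jq -r '
  .Volumes[]
  | [ .VolumeId,
      .AvailabilityZone,
      .Attachments[0].InstanceId,
      .Attachments[0].State,
      ((.Tags // []) | from_entries | .Name) ]
  | @csv')

printf '%s\n' "$csv"
```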

Optimize JSON denormalization using JQ - "cartesian product" from 1:N

I have a JSON database change log, output of wal2json. It looks like this:
{"xid":1190,"timestamp":"2018-07-19 17:18:02.905354+02","change":[
{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}},
{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}]}
...
Each top level entry (xid) is a transaction, each item in change is, well, a change. One row may change multiple times.
To import to an OLAP system with limited feature set, I need to have the order explicitly stated. So I need to add a sn for each change in a transaction.
Also, each change must be a top level entry - the OLAP can't iterate sub-items within one entry.
{"xid":1190, "sn":1, "kind":"update", "data":{"id":401,"name":"Update AA","age":20} }
{"xid":1190, "sn":2, "kind":"update", "data":{"id":401,"name":"Update BB","age":20} }
{"xid":1191, "sn":1, "kind":"insert", "data":{"id":625,"name":"Inserted","age":20} }
{"xid":1191, "sn":2, "kind":"delete", "data":{"id":625} }
(The reason is that the OLAP has limited ability to transform the data during import, and also doesn't have the order as a parameter.)
So, I do this using jq:
function transformJsonDataStructure {
## First let's reformat it to XML, then transform using XPATH, then back to JSON.
## Example input:
# {"xid":1074,"timestamp":"2018-07-18 17:49:54.719475+02","change":[
# {"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}},
# {"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}]}
cat "$1" | while read -r LINE ; do
XID=`echo "$LINE" | jq -c '.xid'`;
export SN=0;
#serr "{xid: $XID, changes: $CHANGES}";
echo "$LINE" | jq -c '.change[]' | while read -r CHANGE ; do
SN=$((SN+=1))
KIND=`echo "$CHANGE" | jq -c --raw-output .kind`;
TABLE=`echo "$CHANGE" | jq -c --raw-output .table`;
DEST_FILE="$TARGET_PATH-$TABLE.json";
case "$KIND" in
update|insert)
MAP=$(convertTwoArraysToMap "$(echo "$CHANGE" | jq -c ".columnnames")" "$(echo "$CHANGE" | jq -c ".columnvalues")") ;;
delete)
MAP=$(convertTwoArraysToMap "$(echo "$CHANGE" | jq -c ".oldkeys.keynames")" "$(echo "$CHANGE" | jq -c ".oldkeys.keyvalues")") ;;
esac
#echo "{\"xid\":$XID, \"table\":\"$TABLE\", \"kind\":\"$KIND\", \"data\":$MAP }" >> "$DEST_FILE"; ;;
echo "{\"xid\":$XID, \"sn\":$SN, \"kind\":\"$KIND\", \"data\":$MAP }" | tee --append "$DEST_FILE";
done;
done;
return;
}
The problem is performance. I am calling jq several times per entry, which is quite slow: around 1000 times slower than without the transformation.
How can I perform the transformation above in just one pass? (jq is not a must; another tool can be used too, but it should be available in the CentOS packages. I want to avoid coding an extra tool for that.)
From man jq it seems that it could be capable of processing the whole file (JSON entry per row) in one go. I could do it in XSLT but I can't wrap my head around jq. Especially the iteration of the change array and combining columnnames and columnvalues to a map.
For the iteration, I think map or map_values could be used.
For converting the two arrays to a map, I see the from_entries and with_entries functions, but can't get them to work.
Any jq master around to advise?
The following helper function converts the incoming array into an object using headers as the keys:
def objectify(headers):
[headers, .] | transpose | map({(.[0]): .[1]}) | add;
The trick now is to use range(0;length) to generate .sn:
{xid} +
(.change
| range(0;length) as $i
| .[$i]
| .columnnames as $header
| {sn: ($i + 1),
kind,
data: (.columnvalues|objectify($header)) } )
Output
For the given log entry, the output would be:
{"xid":1190,"sn":1,"kind":"update","data":{"id":401,"name":"Update AA","age":20}}
{"xid":1190,"sn":2,"kind":"update","data":{"id":401,"name":"Update BB","age":20}}
Moral
If a solution looks too complicated, it probably is.
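Put together as a single jq invocation, the helper and the filter above might be run like this. The branch for delete entries (which carry oldkeys instead of columnnames and columnvalues) is my own extension based on the shell function in the question, and the sample log lines are made up. Splitting the output into per-table files is left out.

```shell
# Hypothetical wal2json log: one transaction per line.
log='{"xid":1190,"change":[{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20]},{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20]}]}
{"xid":1191,"change":[{"kind":"insert","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[625,"Inserted",20]},{"kind":"delete","table":"mytable2","oldkeys":{"keynames":["id"],"keyvalues":[625]}}]}'

# One jq process handles the whole log; -c prints one change per line.
flat=$(printf '%s\n' "$log" | jq -c '
  def objectify(headers):
    [headers, .] | transpose | map({(.[0]): .[1]}) | add;
  {xid} +
  ( .change
    | range(0; length) as $i
    | .[$i]
    # Assumption: deletes take their columns from oldkeys.
    | (if .kind == "delete"
       then {names: .oldkeys.keynames, vals: .oldkeys.keyvalues}
       else {names: .columnnames, vals: .columnvalues}
       end) as $cols
    | {sn: ($i + 1), kind, data: ($cols.vals | objectify($cols.names))} )')

printf '%s\n' "$flat"
```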

Extract json response to shell variable using jq

I have a sample JSON response, shown below, which I am trying to parse using jq in a shell script:
[{"id":1,"notes":"Demo1\nDemo2"}]
This is the command through which I am trying to access notes in the shell script.
value=($(curl $URL | jq -r '.[].notes'))
When I echo "$value" I only get Demo1. How do I get the exact value, Demo1\nDemo2?
To clarify, there is no backslash or n in the notes field. \n is JSON's way of encoding a literal linefeed, so the value you should be expecting is:
Demo1
Demo2
The issue you're seeing is because you have split the value on whitespace and created an array. Each value can be accessed by index:
$ cat myscript
data='[{"id":1,"notes":"Demo1\nDemo2"}]'
value=($(printf '%s' "$data" | jq -r '.[].notes'))
echo "The first value was ${value[0]} and the second ${value[1]}"
$ bash myscript
The first value was Demo1 and the second Demo2
To instead get it as a simple string, remove the parens from value=(..):
$ cat myscript2
data='[{"id":1,"notes":"Demo1\nDemo2"}]'
value=$(printf '%s' "$data" | jq -r '.[].notes')
echo "$value"
$ bash myscript2
Demo1
Demo2
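If what is actually wanted is the escaped, single-line form Demo1\nDemo2 (for logging, say), one possible approach is to re-encode the string as JSON and strip the surrounding quotes. This is a sketch rather than the only way to do it:

```shell
data='[{"id":1,"notes":"Demo1\nDemo2"}]'

# Decoded value: contains a real linefeed.
decoded=$(printf '%s' "$data" | jq -r '.[].notes')

# Escaped value: tojson re-encodes the string, .[1:-1] drops the quotes.
escaped=$(printf '%s' "$data" | jq -r '.[].notes | tojson | .[1:-1]')

printf '%s\n' "$escaped"
```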

extract 2 values from JSON object and use as variables in loop using jq and bash

I am new to jq. I am trying to write a simple script that loops through a JSON file, gets two values within each object and assigns them to two separate variables I can use with a curl REST call. I see both values as output when I echo $i but how can I get value and addr as separate variables?
for i in `cat /Users/egraham/Downloads/test2 | jq .[] | jq ."value,.addr"`; do
You can do this:
jq -rc '.populator.value + " " + .populator.addr' file.json |
while read -r value addr; do
echo do something with "$value" and "$addr"
done
If spaces or tabs or other special characters make using 'read -r' problematic, and if your shell has "readarray", then it could be used:
$ readarray -t v < <(jq -rc '.populator | (.value,.addr)' file.json)
The values would then be available as ${v[0]} and ${v[1]}
This approach is especially useful if there are more than two values of interest, or if the number of values is variable or not known beforehand.
If your shell does not have readarray, then you can still use the array-oriented approach, e.g. along the lines of:
i=-1; while read -r a ; do i=$((i+1)); v[$i]="$a" ; done
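When the values may contain spaces, another option is to have jq emit tab-separated rows and split on tabs only. The sketch below uses an inline array of objects with value and addr fields as a hypothetical stand-in for the real file:

```shell
# Hypothetical input; the real file layout may differ.
json='[{"value":"alpha one","addr":"10.0.0.1"},{"value":"beta","addr":"10.0.0.2"}]'

tab=$(printf '\t')

# @tsv joins each pair with a tab; read then splits on tabs only,
# so the space inside "alpha one" survives.
rows=$(printf '%s' "$json" | jq -r '.[] | [.value, .addr] | @tsv' |
  while IFS="$tab" read -r value addr; do
    echo "value=$value addr=$addr"
  done)

printf '%s\n' "$rows"
```

Because tab counts as whitespace for read, consecutive tabs collapse, so this works best when neither field can be empty.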
First:
for i in $(cat /Users/egraham/Downloads/test2 | jq '.[]' | jq .value); do echo $i; done
Second:
for i in $(cat /Users/egraham/Downloads/test2 | jq '.[]' | jq .addr); do echo $i; done
I don't know any way to get it without running the commands separately. I don't know AWK, but maybe it's something worth considering.