Moving column to the end of JSON in bash - json

I'm officially out of options, I tried everything.
I have a CSV that looks like this:
from/email,to/email,template_id
me@x.com,mike@x.com,12345
me@x.com,pete@x.com,12345
I run a package called csvkit to convert CSV to JSON like this
csvjson input.csv > output.json
and pipe it into curl
curl -X POST http://website.com -d @output.json
and get a big fat error from the server saying "to/email is required"
I check my json in Sublime and it's fine
[
  {
    "from/email": "me@x.com",
    "to/0/email": "mike@x.com",
    "template_id": "12345"
  }
]
but I check my json with the terminal jtbl tool to visualize json
cat output.json | jtbl
and I get
from/email    template_id    to/email
me@x.com      12345          mike@x.com
me@x.com      12345          pete@x.com
which makes no sense. I have no idea what I'm doing wrong. Is there a way to move my template_id column back to the end of the file instead of in the middle?

Since the Q has the jq tag, it might help to note that jq does (unless otherwise instructed) preserve the ordering of keys. Apart from the headers, you could do worse than:
jq -r '(.[0]|keys_unsorted) as $keys | .[] | [.[$keys[]]] | @csv' output.json
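As a minimal sketch of the key-ordering point, the following uses a two-row sample modelled on the question's data (the sample values are an assumption, not the asker's real file) and adds a header row to the CSV. It also contrasts keys, which sorts, with keys_unsorted, which keeps the order found in the input.

```bash
# Sample input modelled on the question; the values are illustrative.
json='[{"from/email":"me@x.com","to/0/email":"mike@x.com","template_id":"12345"},
       {"from/email":"me@x.com","to/0/email":"pete@x.com","template_id":"12345"}]'

# keys sorts alphabetically; keys_unsorted keeps the order found in the input.
sorted=$(printf '%s' "$json" | jq -c '.[0] | keys')
unsorted=$(printf '%s' "$json" | jq -c '.[0] | keys_unsorted')

# Header row first, then one CSV row per object, all in the original key order.
csv=$(printf '%s' "$json" |
  jq -r '(.[0] | keys_unsorted) as $keys | $keys, (.[] | [.[$keys[]]]) | @csv')

printf '%s\n' "$sorted" "$unsorted" "$csv"
```

The sketch assumes every object has the same keys as the first one; rows with extra keys would lose them.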

Related

Filter JSON on the command line

I want to filter JSON on the command line.
Task: print the "name" of each dictionary in the json list.
Example json:
[
{
"id":"d963984c-1075-4d25-8cd0-eae9a7e2d130",
"extra":{
"foo":false,
"bar":null
},
"created_at":"2020-05-06T15:31:59Z",
"name":"NAME1"
},
{
"id":"ee63984c-1075-4d25-8cd0-eae9a7e2d1xx",
"name":"NAME2"
}
]
Desired output:
NAME1
NAME2
This script would work:
#!/usr/bin/env python
import json
import sys
for item in json.loads(sys.stdin.read()):
    print(item['name'])
But since I am very lazy, I am looking for a solution where I need to type less, for example on the command line in a pipe:
curl https://example.com/get-json | MAGIC FILTER
I asked at code golf but they told me that it would make more sense to ask here.
You can use jq https://stedolan.github.io/jq/manual/
% curl https://example.com/get-json | jq -r '.[].name'
NAME1
NAME2
With the -r option, if the filter's result is a string, it is written directly to standard output rather than being formatted as a JSON string with quotes.
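A small sketch of the difference -r makes, using a sample list modelled on the question's JSON (the variable names are only for illustration):

```bash
# Sample list modelled on the question's JSON.
json='[{"id":"a","name":"NAME1"},{"id":"b","name":"NAME2"}]'

# Without -r, strings are printed as JSON, quotes included.
quoted=$(printf '%s' "$json" | jq '.[].name')

# With -r, the raw string is printed.
raw=$(printf '%s' "$json" | jq -r '.[].name')

printf 'default:\n%s\nwith -r:\n%s\n' "$quoted" "$raw"
```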

How to Iterate over an array of objects using jq

I have a javascript file which prints a JSON array of objects:
// myfile.js output
[
{ "id": 1, "name": "blah blah", ... },
{ "id": 2, "name": "xxx", ... },
...
]
In my bash script, I want to iterate through each object.
I've tried the following, but it doesn't work.
#!/bin/bash
output=$(myfile.js)
for row in $(echo ${output} | jq -c '.[]'); do
echo $row
done
You are trying to invoke myfile.js as a command. You need this:
output=$(cat myfile.js)
instead of this:
output=$(myfile.js)
But even then, your current approach isn't going to work well if the data has whitespace in it (which it does, based on the sample you posted). I suggest the following alternative:
jq -c '.[]' < myfile.js |
while read -r row
do
echo "$row"
done
Output:
{"id":1,"name":"blah blah"}
{"id":2,"name":"xxx"}
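To see why the for loop misbehaves on data with whitespace, here is a small sketch that needs no jq: the two lines are hard-coded in the compact form that jq -c '.[]' prints for the sample data. The counter names are made up for the illustration.

```bash
# Two compact JSON lines, hard-coded so that jq is not required here.
lines='{"id":1,"name":"blah blah"}
{"id":2,"name":"xxx"}'

# Unquoted command substitution: the shell splits on every space and newline,
# so the first object is cut in two at the space inside "blah blah".
for_count=0
for row in $(printf '%s\n' "$lines"); do
  for_count=$((for_count + 1))
done

# read -r consumes one whole line per iteration, spaces included.
read_count=0
while IFS= read -r row; do
  read_count=$((read_count + 1))
done <<EOF
$lines
EOF

echo "for: $for_count pieces, while read: $read_count lines"
```

With this input the for loop sees 3 pieces while the while loop sees the 2 intended lines.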
Edit:
If your data is arising from a previous process invocation, such as mongo in your case, you can pipe it directly to jq (to remain portable), like this:
mongo myfile.js |
jq -c '.[]' |
while read -r row
do
echo "$row"
done
How can I make jq -c '.[]' < (mongo myfile.js) work?
In a bash shell, you would write an expression along the following lines:
while read -r line ; do .... done < <(mongo myfile.js | jq -c '.[]')
Note that there are two occurrences of "<" in the above expression.
Also, the above assumes mongo is emitting valid JSON. If it emits //-style comments, those would somehow have to be removed.
Comparison with piping into while
If you use the idiom:
... | while read -r line ; do .... done
then the loop runs in a subshell, so the bindings of any variables set in .... will be lost once the loop ends.
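A minimal sketch of that difference, assuming bash with its default settings (the lastpipe option off). The counter names are hypothetical, and printf stands in for the real data source:

```bash
#!/usr/bin/env bash
# Piping into while: the loop body runs in a subshell,
# so the parent shell never sees the increments.
count_pipe=0
printf 'a\nb\nc\n' | while read -r line; do
  count_pipe=$((count_pipe + 1))
done

# Process substitution: the loop runs in the current shell,
# so the counter keeps its value after the loop.
count_procsub=0
while read -r line; do
  count_procsub=$((count_procsub + 1))
done < <(printf 'a\nb\nc\n')

echo "pipe: $count_pipe, process substitution: $count_procsub"
```

Under those assumptions this prints "pipe: 0, process substitution: 3".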

jq - Parsing fields with hyphen - Invalid Numeric Literal

I'm trying to pull a list of product categories from an API using jq and some nested for-loops. I have to pull the category ID first, then I'm able to pull product details. Some of the category IDs have hyphens and jq seems to be treating them like math instead of a string. I've tried every manner of quoting, but I'm still running into this error. In PowerShell, I'm able to pull the list just fine, but I really need this to work in bash.
Here's the expected list:
aprons
backpacks
beanies
bracelet
coaster
cutting-board
dress-shirts
duffel-bags
earring
full-brim-hats
generic-dropoff
hats
etc...
And trying to recreate the same script in Bash, here's the output:
aprons
backpacks
beanies
bracelet
coaster
parse error: Invalid numeric literal at line 1, column 6
parse error: Invalid numeric literal at line 1, column 7
parse error: Invalid numeric literal at line 1, column 5
earring
parse error: Invalid numeric literal at line 2, column 0
parse error: Invalid numeric literal at line 1, column 5
parse error: Invalid numeric literal at line 1, column 8
hats
etc...
You can see that it's running into this error with all values that contain hyphens. Here's my current script:
#!/bin/bash
CATEGORIES=$(curl -s https://api.scalablepress.com/v2/categories)
IFS=$' \t\n'
for CATEGORY in $(echo $CATEGORIES | jq -rc '.[]')
do
CATEGORY_IDS=$(echo $CATEGORY | jq -rc '."categoryId"')
for CATEGORY_ID in $(echo $CATEGORY_IDS)
do
echo $CATEGORY_ID
PRODUCT_IDS=$(curl -s https://api.scalablepress.com/v2/categories/$CATEGORY_ID | jq -rc '.products[].id')
#for PRODUCT_ID in $(echo $PRODUCT_IDS)
#do
#echo $PRODUCT_ID
#done
done
done
This is a publicly available API so you should be able to copy this script and produce the same results. All of the guides I've seen have said to put double quotes around the field you're trying to parse if it contains hyphens, but I'm having no luck trying that.
You can loop over the category IDs right away, without all the "echos" that break the JSON. The two loops can be rewritten as:
#!/bin/bash
CATURL="https://api.scalablepress.com/v2/categories"
curl -s "$CATURL" | jq -rc '.[] | .categoryId' | while read catid; do
echo "$catid"
curl -s "$CATURL/$catid" | jq -rc '.products[].id'
done
This will print each category ID followed by all of its product IDs, which, judging from your code, seems to be your end result:
$ ./pullcat.sh
aprons
port-authority-port-authority-â-medium-length-apron-with-pouch-pockets
port-authority-port-authority-â-full-length-apron-with-pockets
port-authority-easy-care-reversible-waist-apron-with-stain-release
port-authority-easy-care-waist-apron-with-stain-release
backpacks
port-authority-â-wheeled-backpack
nike-performance-backpack
port-authority-â-value-backpack
port-authority-â-basic-backpack
port-authority-â-cyber-backpack
port-authority-â-commuter-backpack
port-authority-â-contrast-honeycomb-backpack
port-authority-â-camo-xtreme-backpack
port-authority-â-xtreme-backpack
port-authority-â-xcapeâ-computer-backpack
port-authority-â-nailhead-backpack
nike-elite-backpack
port-authority-â-urban-backpack
eddie-bauer-eddie-bauer-â-ripstop-backpack
the-north-face-aurora-ii-backpack
the-north-face-fall-line-backpack
the-north-face-groundwork-backpack
the-north-face-connector-backpack
beanies
rabbit-skins-infant-baby-rib-cap
yupoong-adult-cuffed-knit-cap
ultra-club-adult-knit-beanie-with-cuff
ultra-club-adult-knit-beanie
ultra-club-adult-two-tone-knit-beanie
ultra-club-adult-knit-beanie-with-lid
ultra-club-adult-waffle-beanie
ultra-club-adult-knit-pom-pom-beanie-with-cuff
bayside-beanie
...
If you want just the category IDs, you can of course drop the while loop:
#!/bin/bash
CATURL="https://api.scalablepress.com/v2/categories"
curl -s "$CATURL" | jq -rc '.[] | .categoryId'
$ ./pullcat.sh
aprons
backpacks
beanies
bracelet
coaster
cutting-board
dress-shirts
duffel-bags
earring
full-brim-hats
generic-dropoff
hats
hoodies
infant-shirts
ladies-dress-shirts
ladies-dresses
ladies-long-sleeve
ladies-pants
ladies-performance-shirts
ladies-polos
ladies-short-sleeve
ladies-tank-tops
large-bags
...
You can select the key categoryId for each object in the array by applying the selector:
curl -s https://api.scalablepress.com/v2/categories | jq 'map(.categoryId)'
This will give you a JSON array with only the values you're interested in. Then you can use the antislurp filter .[] to turn the array into individual results. jq can then output raw strings with the -r switch.
Combining everything, you can achieve what you're looking for with a one-liner:
curl -s https://api.scalablepress.com/v2/categories | jq -r 'map(.categoryId) | .[]'
Even better, you can antislurp first, and then select the key you're looking for:
curl -s https://api.scalablepress.com/v2/categories | jq -r '.[] | .categoryId'
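A likely reason the hyphenated IDs fail in the original script, offered as a guess rather than something confirmed against the API: for ... in $(...) splits its input on spaces, and the categories with hyphenated IDs presumably have multi-word names. The object below is hypothetical and only mimics that shape.

```bash
# Hypothetical category object; the real API response may have other fields.
category='{"name":"Cutting Board","categoryId":"cutting-board"}'

# Unquoted expansion splits the object at the space inside "Cutting Board".
pieces=0
for piece in $category; do
  pieces=$((pieces + 1))
  echo "piece $pieces: $piece"
done

# Quoted and passed to jq in one go, the object stays intact
# and the hyphen in the value is harmless.
id=$(printf '%s\n' "$category" | jq -r '.categoryId')
echo "categoryId: $id"
```

Neither piece is valid JSON on its own, which would be consistent with the parse errors in the question appearing only for the hyphenated categories.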

jq raw json output carriage return?

Feel free to edit the title; I'm not sure how to word it. I'm trying to turn shell output into JSON data for a reporting system I'm writing for work. No matter what I do, when I take raw input in slurp mode and output the JSON, the last item in the array is blank (""). I feel like this is some sort of rookie jq issue, but I can't figure out how to word it. This seems to happen no matter what command I run on the shell and pipe to jq:
# rpm -qa | grep kernel | jq -R -s 'split("\n")'
[
"kernel-2.6.32-504.8.1.el6.x86_64",
"kernel-firmware-2.6.32-696.20.1.el6.noarch",
"kernel-headers-2.6.32-696.20.1.el6.x86_64",
"dracut-kernel-004-409.el6_8.2.noarch",
"abrt-addon-kerneloops-2.0.8-43.el6.x86_64",
"kernel-devel-2.6.32-358.11.1.el6.x86_64",
"kernel-2.6.32-131.4.1.el6.x86_64",
"kernel-devel-2.6.32-696.20.1.el6.x86_64",
"kernel-2.6.32-696.20.1.el6.x86_64",
"kernel-devel-2.6.32-504.8.1.el6.x86_64",
"libreport-plugin-kerneloops-2.0.9-33.el6.x86_64",
""
]
Any help is appreciated.
Every line ends with a newline. Either remove the final newline, or omit the empty element at the end of the array.
vnix$ printf 'foo\nbar\n' |
> jq -R -s '.[:-1] | split("\n")'
[
"foo",
"bar"
]
vnix$ printf 'foo\nbar\n' |
> jq -R -s 'split("\n")[:-1]'
[
"foo",
"bar"
]
The notation x[:-1] retrieves the value of a string or array x with the last element removed. This is called "slice notation".
Just to spell this out, if you take the string "foo\n" and split on newline, you get "foo" from before the newline and "" after it.
To make this really robust, maybe trim the last character only if it really is a newline.
vnix$ printf 'foo\nbar\n' |
> jq -R -s 'sub("\n$";"") | split("\n")'
[
"foo",
"bar"
]
vnix$ printf 'foo\nbar' |
> # notice, no final ^ newline
> jq -R -s 'sub("\n$";"") | split("\n")'
[
"foo",
"bar"
]
Assuming you have access to jq 1.5 or later, you can circumvent the problem entirely and economically using inputs:
jq -nR '[inputs]'
Just be sure to include the -n option, otherwise the first line will go missing.
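A short sketch comparing the three behaviours on the same two-line input, assuming jq 1.5 or later (the variable names are only for illustration):

```bash
# Slurp and split: the trailing newline produces a final empty string.
split_result=$(printf 'foo\nbar\n' | jq -c -R -s 'split("\n")')

# inputs with -n: every line is collected, and there is no empty element.
with_n=$(printf 'foo\nbar\n' | jq -c -n -R '[inputs]')

# inputs without -n: jq reads the first line as the main input,
# so only the remaining lines reach inputs.
without_n=$(printf 'foo\nbar\n' | jq -c -R '[inputs]')

printf '%s\n' "$split_result" "$with_n" "$without_n"
```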
You can also use
rpm -qa | grep kernel | jq -R . | jq -s .
to get the desired result.
Please see https://github.com/stedolan/jq/issues/563

Fix "is not valid in a csv row" for jq, by transforming array to string

I try to export a CSV from Neo4j with jq, with:
curl --header "Authorization: Basic myBase64hash=" -H accept:application/json -H content-type:application/json \
-d '{"statements":[{"statement":"MATCH path=(()<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood)) RETURN path"}]}' \
http://localhost:7474/db/data/transaction/commit \
| jq -r '(.results[0]) | .columns,.data[].row | @csv' > '/tmp/export-subset.csv'
But I'm getting this error message:
jq: error (at <stdin>:0): array ([{"email":"...) is not valid in a csv row
I think it's because I have multiple e-mail addresses.
Is it possible to place all of them in one CSV cell, separated by commas?
How can I achieve that with jq?
Edit:
This is an example of my JSON file:
{"results":[{"columns":["path"],"data":[{"row":[[{"email":"gdggdd@gmail.com"},{},{"date_found":"2011-11-29 12:51:14","last_name":"Doe","provider_id":2649,"first_name":"John"},{},{"number":"133","lon":3.21114,"lat":22.8844},{},{"street_name":"Govstreet"},{},{"hood":"Rotterdam"}]],"meta":[[{"id":71390,"type":"node","deleted":false},{"id":226866,"type":"relationship","deleted":false},{"id":63457,"type":"node","deleted":false},{"id":227100,"type":"relationship","deleted":false},{"id":65076,"type":"node","deleted":false},{"id":214799,"type":"relationship","deleted":false},{"id":63915,"type":"node","deleted":false},{"id":226552,"type":"relationship","deleted":false},{"id":71120,"type":"node","deleted":false}]]}]}],"errors":[]}
Forgive me, but I'm not familiar with Cypher syntax or how your data is actually structured; you don't provide much detail about that. From what I can gather from your sample output, each "row" item seems to correspond to what you return in your Cypher query.
Apparently you're returning path which is an entire set of nodes and relationships, and not necessarily just the data you're actually interested in.
MATCH path=(()<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood))
RETURN path
You just want the email addresses so you should probably just return the email. If I understand the syntax correctly, you could change that to this:
MATCH (i)<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood)
RETURN i.email
I believe that should result in something that looks something like this:
{
"results": [
{
"columns": [ "email" ],
"data": [
{
"row": [
"gdggdd@gmail.com"
],
"meta": [
{
"id": 71390,
"type": "string",
"deleted": false
}
]
}
]
}
],
"errors": []
}
Then it should be trivial to export that data to csv using jq since the rows can be converted directly:
.results[0] | .columns, .data[].row | @csv
On the other hand, I could be completely wrong on what that output would actually look like. So just working with your example, if you just want emails, you need to map the rows to just the email.
.results[0] | .columns, (.data[].row | map(.[0].email)) | @csv
In case I misinterpreted, if you were intending to output all values and not just the email, you should select just the values in your Cypher query.
MATCH (i)<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood)
RETURN i.email, p.date_found, p.last_name, p.provider_id, p.first_name,
h.number, h.lon, h.lat, s.street_name, n.hood
Then if my assumptions on the output are correct, the trivial jq query should give you your csv.
Since you want the keys in their original order, use keys_unsorted. This should get you on your way:
$ jq -r -c '.results[0] | .data[] | .row[]
| add
| keys_unsorted as $keys
| ($keys, [.[$keys[]]])
| @csv' input.json
(The newlines here are mainly for legibility.)
With your illustrative input, the output would be:
"email","date_found","last_name","provider_id","first_name","number","lon","lat","street_name","hood"
"gdggdd@gmail.com","2011-11-29 12:51:14","Doe",2649,"John","133",3.21114,22.8844,"Govstreet","Rotterdam"
Of course, in practice, you will probably have multiple lines of data, so in that case, you will probably want to make adjustments to ensure the headers are only printed once.
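One possible adjustment, sketched under the assumption that every row yields the same keys as the first row: merge each path into a single object, collect those objects into an array, and take the header from the first one only. The two-row sample below is made up in the shape of the question's response.

```bash
# Two-row sample in the shape of the question's response; the values are made up.
json='{"results":[{"columns":["path"],"data":[
  {"row":[[{"email":"a@example.com"},{},{"last_name":"Doe","provider_id":1}]]},
  {"row":[[{"email":"b@example.com"},{},{"last_name":"Roe","provider_id":2}]]}
]}],"errors":[]}'

# Merge each path into one object, take the header from the first row only,
# then emit the header followed by one CSV row per merged object.
csv=$(printf '%s' "$json" | jq -r '
  [.results[0].data[] | .row[] | add]
  | (.[0] | keys_unsorted) as $keys
  | $keys, (.[] | [.[$keys[]]])
  | @csv')

printf '%s\n' "$csv"
```

If later rows can carry keys the first row lacks, the header would need to be built from all rows instead.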