Filter JSON on the command line

I want to filter JSON on the command line.
Task: print the "name" of each dictionary in the JSON list.
Example JSON:
[
  {
    "id": "d963984c-1075-4d25-8cd0-eae9a7e2d130",
    "extra": {
      "foo": false,
      "bar": null
    },
    "created_at": "2020-05-06T15:31:59Z",
    "name": "NAME1"
  },
  {
    "id": "ee63984c-1075-4d25-8cd0-eae9a7e2d1xx",
    "name": "NAME2"
  }
]
Desired output:
NAME1
NAME2
This script would work:
#!/usr/bin/env python
import json
import sys

for item in json.loads(sys.stdin.read()):
    print(item['name'])
But since I am very lazy, I am looking for a solution where I need to type less. For example, on the command line in a pipe:
curl https://example.com/get-json | MAGIC FILTER
I asked at code golf but they told me that it would make more sense to ask here.

You can use jq (https://stedolan.github.io/jq/manual/):
% curl https://example.com/get-json | jq -r '.[].name'
NAME1
NAME2
The -r (--raw-output) flag: if the filter's result is a string, it is written directly to standard output rather than being formatted as a JSON string with quotes.
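For instance (a minimal sketch using echo in place of curl), compare the default quoted output with the raw output:
% echo '[{"name":"NAME1"},{"name":"NAME2"}]' | jq '.[].name'
"NAME1"
"NAME2"
% echo '[{"name":"NAME1"},{"name":"NAME2"}]' | jq -r '.[].name'
NAME1
NAME2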

Related

Moving column to the end of JSON in bash

I'm officially out of options; I've tried everything.
I have a CSV that looks like this:
from/email,to/email,template_id
me@x.com,mike@x.com,12345
me@x.com,pete@x.com,12345
I run a package called csvkit to convert CSV to JSON, like this:
csvjson input.csv > output.json
and post it with curl:
curl -X POST http://website.com -d @output.json
and get a big fat error from the server saying "to/email is required"
I check my JSON in Sublime and it's fine:
[
  {
    "from/email": "me@x.com",
    "to/0/email": "mike@x.com",
    "template_id": "12345"
  }
]
but I check my JSON with the terminal jtbl tool to visualize JSON:
cat output.json | jtbl
and I get:
from/email    template_id    to/email
me@x.com      12345          mike@x.com
me@x.com      12345          pete@x.com
which makes no sense. I have no idea what I'm doing wrong. Is there a way to move my template_id column back to the end of the file instead of in the middle?
Since the question has the jq tag, it might help to note that jq does (unless otherwise instructed) preserve the ordering of keys. Apart from the headers, you could do worse than:
jq -r '(.[0]|keys_unsorted) as $keys | [.[][$keys[]]] | @csv' output.json
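For instance, run against the Sublime-checked sample above, a variant that also emits the header row (a sketch along the same lines, assuming every object has the same keys) gives:
$ jq -r '(.[0]|keys_unsorted) as $keys | $keys, (.[] | [.[$keys[]]]) | @csv' output.json
"from/email","to/0/email","template_id"
"me@x.com","mike@x.com","12345"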

Parsing JSON using jq or Python

I have this nested JSON
[
"[[Input=[Name=ABC, createDateTime=2019-30-11, RollNumber=9]]]",
"[[SubjectList=[Summer=, Winter=, Autumn=, Spring=, rList=, sList=, additionalList=, emailList=, FoodList=, sAssignmentList=, summerworkList=, outdoorList=, movielist=]]]",
"[ProcessingDate=2018-10-06]",
"[Hobbies=Football]",
"[Phone=Android,,]"
]
How can I process this JSON and get the value Football or RollNumber using Python?
This is what I tried:
Code
import json
row = '''[
"[[Input=[Name=ABC, createDateTime=2019-30-11, RollNumber=9]]]",
"[[SubjectList=[Summer=, Winter=, Autumn=, Spring=, rList=, sList=, additionalList=, emailList=, FoodList=, sAssignmentList=, summerworkList=, outdoorList=, movielist=]]]",
"[ProcessingDate=2018-10-06]",
"[Hobbies=Football]",
"[Phone=Android,,]"
]'''
row_dict = json.loads(row)
print(row_dict[3])
Using this, I get the following output:
[Hobbies=Football]
But I am missing the next level of parsing to get just Football as output.
Here is an approach that uses capture on the non-JSON strings in the array.
It assumes the [:alnum:] POSIX regex character class suffices to match the values after the =.
Sample execution, assuming the data is in test.json:
$ jq -M '.[] | capture("Hobbies=(?<Hobbies>[[:alnum:]]+)")' test.json
{
  "Hobbies": "Football"
}
Here is a variation which produces exactly Football:
$ jq -Mr '.[] | capture("Hobbies=(?<Hobbies>[[:alnum:]]+)") | .Hobbies' test.json
Football
Here's an example script which uses multiple captures and combines them with add:
[ .[]
| capture("Hobbies=(?<Hobbies>[[:alnum:]]+)")
, capture("RollNumber=(?<RollNumber>[[:alnum:]]+)")
] | add
Sample execution, assuming the script is in test.jq:
$ jq -M -f test.jq test.json
{
  "RollNumber": "9",
  "Hobbies": "Football"
}
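A compact variation along the same lines folds both patterns into one capture and prints only the raw values (note they come out in array order, so the RollNumber appears first):
$ jq -Mr '.[] | capture("(?<k>Hobbies|RollNumber)=(?<v>[[:alnum:]]+)") | .v' test.json
9
Football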

Extract data from json file using grep and sed

I have a json file named output.json. It has a simple key:value format, e.g.:
{
  "key": "value",
  "key": "value",
  "key": "value",
  "key": "value"
}
I want to extract the value part.
If anyone can write me a command, that would be really helpful.
With jq (which is much better suited for parsing and filtering JSON than grep/sed/awk/etc.) you can extract all values with the values function:
$ echo '{"a":1, "b":2, "c":3}' | jq '.[]|values'
1
2
3
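The same works for string values like those in your file; with -r the quotes are stripped (a minimal sketch, using distinct illustrative keys):
$ echo '{"key1":"value1", "key2":"value2"}' | jq -r '.[]|values'
value1
value2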
Alternatively (since you mention you already use Python in your pipeline), you can do it like this:
#!/usr/bin/env python
import json
# json.load() takes a file object, not a file name
with open('output.json') as f:
    for value in json.load(f).values():
        print(value)

Bash jq modify JSON: get and set

I use jq to parse and modify a cURL response, and it works perfectly for all of my requirements except one. I wish to modify a key's value in the JSON, like:
A) Input JSON:
[
  {
    "id": 169,
    "path": "dir1/dir2"
  }
]
B) Output JSON:
[
  {
    "id": 169,
    "path": "dir1"
  }
]
So the last directory is removed from the path. I use the script:
curl --header -X GET -k "${URL}" | jq '[.[] | {id: .id, path: .path_with_namespace}]' | jq '(.[] | .path) = "${.path%/*}"'
The last pipe is of course not correct, and this is where I am stuck. The point is to get the path value and modify it. Any help is appreciated.
One way to do this is to use split and join to process the path, and use |= to bind the correct expression to the .path attribute.
... | jq '.[].path |= (split("/")[:-1] | join("/"))'
split("/") takes a string and returns an array
x[:-1] returns an array consisting of all but the last element of x
join("/") combines the elements of the incoming array with / to return a single string.
.path|=x takes the value of .path, feeds it through the filter x, and assigns the resulting value to .path again.
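Putting it together on the input from A (a minimal sketch, using echo in place of the curl pipeline):
$ echo '[{"id":169,"path":"dir1/dir2"}]' | jq '.[].path |= (split("/")[:-1] | join("/"))'
[
  {
    "id": 169,
    "path": "dir1"
  }
]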

Fix "is not valid in a csv row" for jq, by transforming array to string

I'm trying to export a CSV from Neo4j with jq:
curl --header "Authorization: Basic myBase64hash=" -H accept:application/json -H content-type:application/json \
-d '{"statements":[{"statement":"MATCH path=(()<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood)) RETURN path"}]}' \
http://localhost:7474/db/data/transaction/commit \
| jq -r '(.results[0]) | .columns,.data[].row | @csv' > '/tmp/export-subset.csv'
But I'm getting this error message:
jq: error (at <stdin>:0): array ([{"email":"...) is not valid in a csv row
I think it's because I have multiple e-mail addresses.
Is it possible to place all of them in a CSV cell, separated by commas?
How can I achieve that with jq?
Edit:
This is an example of my JSON file:
{"results":[{"columns":["path"],"data":[{"row":[[{"email":"gdggdd#gmail.com"},{},{"date_found":"2011-11-29 12:51:14","last_name":"Doe","provider_id":2649,"first_name":"John"},{},{"number":"133","lon":3.21114,"lat":22.8844},{},{"street_name":"Govstreet"},{},{"hood":"Rotterdam"}]],"meta":[[{"id":71390,"type":"node","deleted":false},{"id":226866,"type":"relationship","deleted":false},{"id":63457,"type":"node","deleted":false},{"id":227100,"type":"relationship","deleted":false},{"id":65076,"type":"node","deleted":false},{"id":214799,"type":"relationship","deleted":false},{"id":63915,"type":"node","deleted":false},{"id":226552,"type":"relationship","deleted":false},{"id":71120,"type":"node","deleted":false}]]}]}],"errors":[]}
Forgive me, but I'm not familiar with Cypher syntax or how your data is actually structured; you don't provide much detail about that. But from what I can gather based on your sample output, each "row" item seems to correspond to what you return in your Cypher query.
Apparently you're returning path, which is an entire set of nodes and relationships, and not necessarily just the data you're actually interested in.
MATCH path=(()<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood))
RETURN path
You just want the email addresses, so you should probably return just the email. If I understand the syntax correctly, you could change that to this:
MATCH (i)<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood)
RETURN i.email
I believe that should result in something like this:
{
  "results": [
    {
      "columns": [ "email" ],
      "data": [
        {
          "row": [
            "gdggdd@gmail.com"
          ],
          "meta": [
            {
              "id": 71390,
              "type": "string",
              "deleted": false
            }
          ]
        }
      ]
    }
  ],
  "errors": []
}
Then it should be trivial to export that data to csv using jq since the rows can be converted directly:
.results[0] | .columns, .data[].row | @csv
On the other hand, I could be completely wrong about what that output would actually look like. So, just working with your example: if you just want emails, you need to map the rows to just the email.
.results[0] | .columns, (.data[].row | map(.[0].email)) | @csv
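Against the sample response you posted (saved as response.json, an assumed filename), that would print:
$ jq -r '.results[0] | .columns, (.data[].row | map(.[0].email)) | @csv' response.json
"path"
"gdggdd@gmail.com"
(The header still reads "path" because it is taken verbatim from .columns.)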
In case I misinterpreted, if you were intending to output all values and not just the email, you should select just the values in your Cypher query.
MATCH (i)<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood)
RETURN i.email, p.date_found, p.last_name, p.provider_id, p.first_name,
h.number, h.lon, h.lat, s.street_name, n.hood
Then, if my assumptions about the output are correct, the trivial jq query should give you your CSV.
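Concretely, that trivial query is the same one from your own pipeline, with @csv (again assuming the response is saved as response.json):
$ jq -r '.results[0] | .columns, .data[].row | @csv' response.json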
Since you want the keys in their original order, use keys_unsorted. This should get you on your way:
$ jq -r -c '.results[0] | .data[] | .row[]
| add
| keys_unsorted as $keys
| ($keys, [.[$keys[]]])
| @csv' input.json
(The newlines here are mainly for legibility.)
With your illustrative input, the output would be:
"email","date_found","last_name","provider_id","first_name","number","lon","lat","street_name","hood"
"gdggdd#gmail.com","2011-11-29 12:51:14","Doe",2649,"John","133",3.21114,22.8844,"Govstreet","Rotterdam"
Of course, in practice you will probably have multiple rows of data, in which case you will want to adjust this so the headers are only printed once.
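One way to do that (a sketch, assuming every row merges to the same set of keys): collect the merged row objects into an array first, take the headers from the first one, and then emit one CSV line per row:
$ jq -r '[.results[0].data[].row[] | add]
| (.[0] | keys_unsorted) as $keys
| $keys, (.[] | [.[$keys[]]])
| @csv' input.json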