Parsing this JSON to obtain a human-readable list

I have a JSON file with this structure. For every URL field, there is a RESULT field which contains hundreds of LINKS. Is it possible to parse it and obtain a list (e.g. CSV) which contains all the LINKS for every URL?
[{
"url": "https://example.org/yyy",
"result": "{\"links\":[{\"link\":\"https://example.org/xxx/xxx\",\"text\":\"\"
},
{
\"link\":\"https://example.org/xxx/xxx\",\"text\":\"\"
},
{
\"link\":\"https://example.org/xxx/xxx\",\"text\":\"yyy\"}[.......]
Thanks in advance

Here is a solution using jq. If data.json contains the sample data
[{"url": "https://example.org/yyy", "result": "{\"links\":[{\"link\":\"https://example.org/xx1/xx1\",\"text\":\"\"},{\"link\":\"https://example.org/xx2/xx2\",\"text\":\"\"}]}"}]
then the command
$ jq -Mr '.[].result | fromjson | .links[].link' data.json
produces
https://example.org/xx1/xx1
https://example.org/xx2/xx2
If you would like both the url and the links, the command
$ jq -Mr '.[] | .url as $url | .result | fromjson | "\($url),\(.links[].link)"' data.json
produces
https://example.org/yyy,https://example.org/xx1/xx1
https://example.org/yyy,https://example.org/xx2/xx2
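For readers who would rather do this without jq, here is a minimal Python sketch of the same idea. It assumes the structure shown in the sample data, where "result" is a string that itself contains JSON and therefore has to be decoded a second time (the role fromjson plays above). The helper names url_link_rows and to_csv are made up for this illustration.

```python
import csv
import io
import json

# Rebuild the sample data from the answer: "result" is a JSON document
# stored as a string inside the outer JSON document.
inner = {
    "links": [
        {"link": "https://example.org/xx1/xx1", "text": ""},
        {"link": "https://example.org/xx2/xx2", "text": ""},
    ]
}
sample = json.dumps([{"url": "https://example.org/yyy", "result": json.dumps(inner)}])


def url_link_rows(text):
    """Return (url, link) pairs for every link of every entry."""
    rows = []
    for entry in json.loads(text):
        result = json.loads(entry["result"])  # second decode, like jq's fromjson
        for item in result["links"]:
            rows.append((entry["url"], item["link"]))
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text (hypothetical helper)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


csv_text = to_csv(url_link_rows(sample))
print(csv_text, end="")
```

Using the csv module rather than plain string concatenation means that a link containing a comma or a quote would still be quoted correctly.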

Related

Parsing JSON using jq or Python

I have this nested JSON
[
"[[Input=[Name=ABC, createDateTime=2019-30-11, RollNumber=9]]]",
"[[SubjectList=[Summer=, Winter=, Autumn=, Spring=, rList=, sList=, additionalList=, emailList=, FoodList=, sAssignmentList=, summerworkList=, outdoorList=, movielist=]]]",
"[ProcessingDate=2018-10-06]",
"[Hobbies=Football]",
"[Phone=Android,,]"
]
How can I process this JSON and get the value football or rollnumber using Python?
This is what I tried:
Code
import json
row = '''[
"[[Input=[Name=ABC, createDateTime=2019-30-11, RollNumber=9]]]",
"[[SubjectList=[Summer=, Winter=, Autumn=, Spring=, rList=, sList=, additionalList=, emailList=, FoodList=, sAssignmentList=, summerworkList=, outdoorList=, movielist=]]]",
"[ProcessingDate=2018-10-06]",
"[Hobbies=Football]",
"[Phone=Android,,]"
]'''
row_dict = json.loads(row)
print(row_dict[3])
With this I get the following output:
[Hobbies=Football]
But I am missing the next level of parsing to get just Football as output.
Here is an approach that uses capture on the non-JSON strings in the array.
It assumes the [:alnum:] POSIX regex character class suffices to match the values after the =
Sample execution assuming data in test.json
$ jq -M '.[] | capture("Hobbies=(?<Hobbies>[[:alnum:]]+)")' test.json
{
"Hobbies": "Football"
}
Here is a variation which produces exactly Football:
$ jq -Mr '.[] | capture("Hobbies=(?<Hobbies>[[:alnum:]]+)") | .Hobbies' test.json
Football
Here's an example script which uses multiple captures and combines them with add
[ .[]
| capture("Hobbies=(?<Hobbies>[[:alnum:]]+)")
, capture("RollNumber=(?<RollNumber>[[:alnum:]]+)")
] | add
Sample execution assuming script in test.jq
$ jq -M -f test.jq test.json
{
"RollNumber": "9",
"Hobbies": "Football"
}
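Since the question asked about Python, the same capture idea can be sketched with the re module. This is only an illustration under the same assumption as the jq answer, namely that the values after the = consist of ASCII letters and digits only. The function name extract is hypothetical.

```python
import json
import re

row = '''[
"[[Input=[Name=ABC, createDateTime=2019-30-11, RollNumber=9]]]",
"[[SubjectList=[Summer=, Winter=, Autumn=, Spring=, rList=, sList=, additionalList=, emailList=, FoodList=, sAssignmentList=, summerworkList=, outdoorList=, movielist=]]]",
"[ProcessingDate=2018-10-06]",
"[Hobbies=Football]",
"[Phone=Android,,]"
]'''


def extract(strings, keys):
    """Collect key=value pairs for the requested keys from the strings."""
    found = {}
    for s in strings:
        for key in keys:
            # Assumption: the value is a run of ASCII letters and digits.
            match = re.search(re.escape(key) + r"=([A-Za-z0-9]+)", s)
            if match:
                found[key] = match.group(1)
    return found


values = extract(json.loads(row), ["Hobbies", "RollNumber"])
print(values["Hobbies"])     # Football
print(values["RollNumber"])  # 9
```

As with the jq version, the extracted values are strings, so RollNumber comes back as "9" rather than the number 9.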

How to convert json into csv file using jq?

This is my json file:
{
"ClientCountry": "ca",
"ClientASN": 812,
"CacheResponseStatus": 404,
"CacheResponseBytes": 130756,
"CacheCacheStatus": "hit"
}
{
"ClientCountry": "ua",
"ClientASN": 206996,
"CacheResponseStatus": 301,
"CacheResponseBytes": 142,
"CacheCacheStatus": "unknown"
}
{
"ClientCountry": "ua",
"ClientASN": 206996,
"CacheResponseStatus": 0,
"CacheResponseBytes": 0,
"CacheCacheStatus": "unknown"
}
I want to convert these JSON objects into CSV like below.
"ClientCountry", "ClientASN","CacheResponseStatus", "CacheResponseBytes", "CacheCacheStatus"
"ca", 812, 404, 130756, "hit";
"ua", 206996, 301, 142,"unknown";
"ua", 206996, 0,0,"unknown";
Please let me know how to achieve this using jq.
I tried the command below, but it's not working.
jq 'to_entries[] | [.key, .value] | @csv'
Regards
Palani
Since you want all the key-values, and assuming that the keys are presented in a consistent order in the input file, you can simply write:
jq -r '[.[]] | @csv' palanikumar.json
With the given input, this produces the following CSV:
"ca",812,404,130756,"hit"
"ua",206996,301,142,"unknown"
"ua",206996,0,0,"unknown"
Adding the headers and the trailing semicolons (if you really want them) is left as a (very easy) exercise.
Inconsistent ordering
If the ordering of the keys varies or might vary, then the following could be used to produce suitable CSV, assuming that the ordering of the keys in the first object in the input stream should be used:
input
| . as $first
| keys_unsorted as $keys
| $keys, [$first[]], (inputs | [.[$keys[]]]) | @csv
The appropriate invocation of jq would include both the -n and -r command-line options.
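The same "take the key order from the first object" approach can be sketched in Python. The sketch below assumes the input is a stream of concatenated JSON objects, as in the question, and reads them one at a time with json.JSONDecoder.raw_decode. The names iter_json_objects and objects_to_csv are invented for this example, and the second sample object deliberately lists its keys in a different order.

```python
import csv
import io
import json

sample = '''
{"ClientCountry": "ca", "ClientASN": 812, "CacheResponseStatus": 404,
 "CacheResponseBytes": 130756, "CacheCacheStatus": "hit"}
{"CacheCacheStatus": "unknown", "ClientASN": 206996, "ClientCountry": "ua",
 "CacheResponseStatus": 301, "CacheResponseBytes": 142}
'''


def iter_json_objects(text):
    """Yield each top-level JSON value from a concatenated stream."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        if text[pos].isspace():  # raw_decode does not skip leading whitespace
            pos += 1
            continue
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def objects_to_csv(text):
    """Write a header from the first object's keys, then one row per object."""
    buf = io.StringIO()
    writer = None
    for obj in iter_json_objects(text):
        if writer is None:
            writer = csv.DictWriter(
                buf,
                fieldnames=list(obj),          # key order of the first object
                quoting=csv.QUOTE_NONNUMERIC,  # quote strings, leave numbers bare
                lineterminator="\n",
            )
            writer.writeheader()
        writer.writerow(obj)
    return buf.getvalue()


csv_text = objects_to_csv(sample)
print(csv_text, end="")
```

Note that csv.DictWriter raises ValueError if a later object has a key that the first one lacks, which is arguably the safer default for data that is supposed to be uniform.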
Look at these links
How to convert arbitrary simple JSON to CSV using jq?
http://bigdatums.net/2017/09/30/convert-json-to-csv-with-jq/
( jq -r '.myarray | @csv' )

Parsing nested json with jq

I am parsing a nested json to get specific values from the json response. The json response is as follows:
{
"custom_classes": 2,
"images":
[
{
"classifiers":
[
{
"classes":
[
{
"class": "football",
"score": 0.867376
}
],
"classifier_id": "players_367677167",
"name": "players"
}
],
"image": "1496A400EDC351FD.jpg"
}
],
"images_processed": 1
}
Following the path images => classifiers => classes, "class" and "score" are the values that I want to save in a CSV file. I have found how to save the result in a CSV file, but I am unable to parse the images part. I can get custom_classes and images_processed.
I am using jq-1.5.
The different commands I have tried :
curl "Some address"| jq '.["images"]'
curl "Some address"| jq '.[.images]'
curl "Some address"| jq '.[.images["image"]]'
Most of the time the error is about not being able to index the array images.
Any hints?
I must say, I'm not terribly good at jq, so probably all those array iterations could be shorthanded somehow, but this yields the values you mentioned:
cat foo.json | jq ".images | .[] | .classifiers | .[] | .classes | .[] | .[]"
If you want the keys, too, just omit that last .[].
Edit
As @chepner pointed out in the comments, this can indeed be shortened to
cat foo.json | jq ".images[].classifiers[].classes[] | [.class, .score] | @csv"
Depending on the data, this filter, which uses recursive descent (..), objects and has, may work:
.. | objects | select(has("class")) | [.class,.score] | @csv
Sample run (assuming data in data.json)
$ jq -Mr '.. | objects | select(has("class")) | [.class,.score] | @csv' data.json
"football",0.867376
Try it online at jqplay.org
Here is another variation which uses paths and getpath
getpath( paths(has("class")?) ) | [.class,.score] | @csv
Try it online at jqplay.org
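For comparison with the jq filters above, here is a minimal Python sketch that walks the same images => classifiers => classes path and emits one CSV row per class. The response is embedded as a literal so that the example is self-contained; in practice it would come from the curl call. The names class_rows and rows_to_csv are hypothetical.

```python
import csv
import io
import json

response = json.loads('''
{
  "custom_classes": 2,
  "images": [
    {
      "classifiers": [
        {
          "classes": [{"class": "football", "score": 0.867376}],
          "classifier_id": "players_367677167",
          "name": "players"
        }
      ],
      "image": "1496A400EDC351FD.jpg"
    }
  ],
  "images_processed": 1
}
''')


def class_rows(doc):
    """Return (class, score) for every class of every classifier of every image."""
    return [
        (cls["class"], cls["score"])
        for image in doc["images"]
        for classifier in image["classifiers"]
        for cls in classifier["classes"]
    ]


def rows_to_csv(rows):
    """Serialise rows as CSV, quoting strings but not numbers."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


rows = class_rows(response)
print(rows_to_csv(rows), end="")
```

The nested comprehension mirrors .images[].classifiers[].classes[]: each for clause corresponds to one [] iteration in the jq path.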
jq solution to obtain a prepared csv record:
jq -r '.images[0].classifiers[0].classes[0] | [.class, .score] | @csv' input.json
The output:
"football",0.867376

Bash jq modify json : get and set

I use jq to parse and modify a cURL response, and it works perfectly for all of my requirements except one. I wish to modify a key value in the JSON, like:
A) Input json
[
{
"id": 169,
"path": "dir1/dir2"
}
]
B) Output json
[
{
"id": 169,
"path": "dir1"
}
]
So the last directory is removed from the path. I use the script:
curl --header -X GET -k "${URL}" | jq '[.[] | {id: .id, path: .path_with_namespace}]' | jq '(.[] | .path) = "${.path%/*}"'
The last pipe is of course not correct, and this is where I am stuck. The point is to get the path value and modify it. Any help is appreciated.
One way to do this is to use split and join to process the path, and use |= to bind the correct expression to the .path attribute.
... | jq '.[] | .path|=(split("/")[:-1]|join("/"))'
split("/") takes a string and returns an array
x[:-1] returns an array consisting of all but the last element of x
join("/") combines the elements of the incoming array with / to return a single string.
.path|=x takes the value of .path, feeds it through the filter x, and assigns the resulting value to .path again.
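The split / slice / join steps described above translate almost word for word into Python, which may help when checking the logic outside of jq. This is only a sketch using the input from the question; drop_last_segment is a name made up for the illustration.

```python
import json

items = json.loads('[{"id": 169, "path": "dir1/dir2"}]')


def drop_last_segment(path):
    """Split on "/", drop the last element, and join the rest again."""
    return "/".join(path.split("/")[:-1])


# Update each object in place, analogous to .path |= ... in jq.
for item in items:
    item["path"] = drop_last_segment(item["path"])

print(json.dumps(items))
```

In this Python version a path without any slash becomes an empty string, so a guard may be needed if such paths should be left untouched.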

Fix "is not valid in a csv row" for jq, by transforming array to string

I am trying to export a CSV from Neo4j with jq, with:
curl --header "Authorization: Basic myBase64hash=" -H accept:application/json -H content-type:application/json \
-d '{"statements":[{"statement":"MATCH path=(()<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood)) RETURN path"}]}' \
http://localhost:7474/db/data/transaction/commit \
| jq -r '(.results[0]) | .columns,.data[].row | @csv' > '/tmp/export-subset.csv'
But I'm getting this error message:
jq: error (at <stdin>:0): array ([{"email":"...) is not valid in a csv row
I think it's because I have multiple e-mail addresses.
Is it possible to place all of them in one CSV cell, separated by commas?
How can I achieve that with jq?
Edit:
This is an example of my JSON file:
{"results":[{"columns":["path"],"data":[{"row":[[{"email":"gdggdd#gmail.com"},{},{"date_found":"2011-11-29 12:51:14","last_name":"Doe","provider_id":2649,"first_name":"John"},{},{"number":"133","lon":3.21114,"lat":22.8844},{},{"street_name":"Govstreet"},{},{"hood":"Rotterdam"}]],"meta":[[{"id":71390,"type":"node","deleted":false},{"id":226866,"type":"relationship","deleted":false},{"id":63457,"type":"node","deleted":false},{"id":227100,"type":"relationship","deleted":false},{"id":65076,"type":"node","deleted":false},{"id":214799,"type":"relationship","deleted":false},{"id":63915,"type":"node","deleted":false},{"id":226552,"type":"relationship","deleted":false},{"id":71120,"type":"node","deleted":false}]]}]}],"errors":[]}
Forgive me, but I'm not familiar with Cypher syntax or how your data is actually structured, and you don't provide much detail about that. From what I can gather from your sample output, each "row" item seems to correspond to what you return in your Cypher query.
Apparently you're returning path which is an entire set of nodes and relationships, and not necessarily just the data you're actually interested in.
MATCH path=(()<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood))
RETURN path
You just want the email addresses so you should probably just return the email. If I understand the syntax correctly, you could change that to this:
MATCH (i)<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood)
RETURN i.email
I believe that should result in something that looks something like this:
{
"results": [
{
"columns": [ "email" ],
"data": [
{
"row": [
"gdggdd#gmail.com"
],
"meta": [
{
"id": 71390,
"type": "string",
"deleted": false
}
]
}
]
}
],
"errors": []
}
Then it should be trivial to export that data to csv using jq since the rows can be converted directly:
.results[0] | .columns, .data[].row | @csv
On the other hand, I could be completely wrong on what that output would actually look like. So just working with your example, if you just want emails, you need to map the rows to just the email.
.results[0] | .columns, (.data[].row | map(.[0].email)) | @csv
In case I misinterpreted, if you were intending to output all values and not just the email, you should select just the values in your Cypher query.
MATCH (i)<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood)
RETURN i.email, p.date_found, p.last_name, p.provider_id, p.first_name,
h.number, h.lon, h.lat, s.street_name, n.hood
Then if my assumptions on the output are correct, the trivial jq query should give you your csv.
Since you want the keys in their original order, use keys_unsorted. This should get you on your way:
$ jq -r -c '.results[0] | .data[] | .row[]
| add
| keys_unsorted as $keys
| ($keys, [.[$keys[]]])
| @csv' input.json
(The newlines here are mainly for legibility.)
With your illustrative input, the output would be:
"email","date_found","last_name","provider_id","first_name","number","lon","lat","street_name","hood"
"gdggdd#gmail.com","2011-11-29 12:51:14","Doe",2649,"John","133",3.21114,22.8844,"Govstreet","Rotterdam"
Of course, in practice, you will probably have multiple lines of data, so in that case, you will probably want to make adjustments to ensure the headers are only printed once.
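One way to print the headers only once is sketched below in Python rather than jq. It assumes, as in the illustrative input, that every "row" holds a list of paths and that each path is a list of property objects which can be merged into one flat record. The sample is trimmed to the "row" part, the e-mail value is copied as it appears in the example, and the names path_records and records_to_csv are hypothetical.

```python
import csv
import io

# Trimmed version of the illustrative response ("meta" omitted).
response = {
    "results": [{
        "columns": ["path"],
        "data": [{
            "row": [[
                {"email": "gdggdd#gmail.com"}, {},
                {"date_found": "2011-11-29 12:51:14", "last_name": "Doe",
                 "provider_id": 2649, "first_name": "John"}, {},
                {"number": "133", "lon": 3.21114, "lat": 22.8844}, {},
                {"street_name": "Govstreet"}, {},
                {"hood": "Rotterdam"},
            ]],
        }],
    }],
    "errors": [],
}


def path_records(doc):
    """Merge the objects of each path into one flat dict (like jq's add)."""
    for entry in doc["results"][0]["data"]:
        for path in entry["row"]:
            merged = {}
            for part in path:
                merged.update(part)
            yield merged


def records_to_csv(records):
    """Write the header once, using the key order of the first record."""
    buf = io.StringIO()
    writer = None
    for record in records:
        if writer is None:
            writer = csv.DictWriter(buf, fieldnames=list(record),
                                    quoting=csv.QUOTE_NONNUMERIC,
                                    lineterminator="\n")
            writer.writeheader()
        writer.writerow(record)
    return buf.getvalue()


csv_text = records_to_csv(path_records(response))
print(csv_text, end="")
```

If two nodes on a path share a property name, the later value overwrites the earlier one in this sketch, which is also how add behaves on objects in jq.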