Bash jq modify json : get and set - json

I use jq to parse and modify cURL response and it works perfect for all of my requirements except one. I wish to modify a key value in the json, like:
A) Input json
[
{
"id": 169,
"path": "dir1/dir2"
}
]
B) Output json
[
{
"id": 169,
"path": "dir1"
}
]
So the last directory is removed from the path. I use the script:
curl --header -X GET -k "${URL}" | jq '[.[] | {id: .id, path: .path_with_namespace}]' | jq '(.[] | .path) = "${.path%/*}"'
The last pipe is ofcourse not correct and this is where I am stuck. The point is to get the path value and modify it. Any help is appreciated.

One way to do this is to use split and join to process the path, and use |= to bind the correct expression to the .path attribute.
... | jq '.[] | .path|=(split("/")[:-1]|join("/"))
split("/") takes a string and returns an array
x[:-1] returns an array consisting of all but the last element of x
join("/") combines the elements of the incoming array with / to return a single string.
.path|=x takes the value of .path, feeds it through the filter x, and assigns the resulting value to .path again.

Related

How to iterate a JSON array of objects with jq and grab multiple variables from each object in each loop

I need to grab variables from JSON properties.
The JSON array looks like this (GitHub API for repository tags), which I obtain from a curl request.
[
{
"name": "my-tag-name",
"zipball_url": "https://api.github.com/repos/path-to-my-tag-name",
"tarball_url": "https://api.github.com/repos/path-to-my-tag-name-tarball",
"commit": {
"sha": "commit-sha",
"url": "https://api.github.com/repos/path-to-my-commit-sha"
},
"node_id": "node-id"
},
{
"name": "another-tag-name",
"zipball_url": "https://api.github.com/repos/path-to-my-tag-name",
"tarball_url": "https://api.github.com/repos/path-to-my-tag-name-tarball",
"commit": {
"sha": "commit-sha",
"url": "https://api.github.com/repos/path-to-my-commit-sha"
},
"node_id": "node-id"
},
]
In my actual JSON there are 100s of objects like these.
While I loop each one of these I need to grab the name and the commit URL, then perform more operations with these two variables before I get to the next object and repeat.
I tried (with and without -r)
tags=$(curl -s -u "${GITHUB_USERNAME}:${GITHUB_TOKEN}" -H "Accept: application/vnd.github.v3+json" "https://api.github.com/repos/path-to-my-repository/tags?per_page=100&page=${page}")
for row in $(jq -r '.[]' <<< "$tags"); do
tag=$(jq -r '.name' <<< "$row")
# I have also tried with the syntax:
url=$(echo "${row}" | jq -r '.commit.url')
# do stuff with $tag and $url...
done
But I get errors like:
parse error: Unfinished JSON term at EOF at line 2, column 0 jq: error
(at :1): Cannot index string with string "name" } parse error:
Unmatched '}' at line 1, column 1
And from the terminal output it appears that it is trying to parse $row in a strange way, trying to grab .name from every substring? Not sure.
I am assuming the output from $(jq '.[]' <<< "$tags") could be valid JSON, from which I could again use jq to grab the object properties I need, but maybe that is not the case? If I output ${row} it does look like valid JSON to me, and I tried pasting the results in a JSON validator, everything seems to check out...
How do I grab the ".name" and ".commit.url" for each of these object before I move onto the next one?
Thanks
It would be better to avoid calling jq more than once. Consider, for example:
while read -r name ; do
read -r url
echo "$name" "$url"
done < <( curl .... | jq -r '.[] | .name, .commit.url' )
where curl .... signifies the relevant invocation of curl.

How to find something in a json file using Bash

I would like to search a JSON file for some key or value, and have it print where it was found.
For example, when using jq to print out my Firefox' extensions.json, I get something like this (using "..." here to skip long parts) :
{
"schemaVersion": 31,
"addons": [
{
"id": "wetransfer#extensions.thunderbird.net",
"syncGUID": "{e6369308-1efc-40fd-aa5f-38da7b20df9b}",
"version": "2.0.0",
...
},
{
...
}
]
}
Say I would like to search for "wetransfer#extensions.thunderbird.net", and would like an output which shows me where it was found with something like this:
{ "addons": [ {"id": "wetransfer#extensions.thunderbird.net"} ] }
Is there a way to get that with jq or with some other json tool?
I also tried to simply list the various ids in that file, and hoped that I would get it with jq '.id', but that just returned null, because it apparently needs the full path.
In other words, I'm looking for a command-line json parser which I could use in a way similar to Xpath tools
The path() function comes in handy:
$ jq -c 'path(.. | select(. == "wetransfer#extensions.thunderbird.net"))' input.json
["addons",0,"id"]
The resulting path is interpreted as "In the addons field of the initial object, the first array element's id field matches". You can use it with getpath(), setpath(), delpaths(), etc. to get or manipulate the value it describes.
Using your example with modifications to make it valid JSON:
< input.json jq -c --arg s wetransfer#extensions.thunderbird.net '
paths as $p | select(getpath($p) == $s) | null | setpath($p;$s)'
produces:
{"addons":[{"id":"wetransfer#extensions.thunderbird.net"}]}
Note
If there are N paths to the given value, the above will produce N lines. If you want only the first, you could wrap everything in first(...).
Listing all the "id" values
I also tried to simply list the various ids in that file
Assuming that "id" values of false and null are of no interest, you can print all the "id" values of interest using the jq filter:
.. | .id? // empty

Linux CLI - How to get substring from JSON jq + grep?

I need to pull a substring from JSON. In the JSON doc below, I need the end of the value of jq '.[].networkProfile.networkInterfaces[].id' In other words, I need just A10NICvw4konls2vfbw-data to pass to another command. I can't seem to figure out how to pull a substring using grep. I've seem regex examples out there but haven't been successful with them.
[
{
"id": "/subscriptions/blah/resourceGroups/IPv6v2/providers/Microsoft.Compute/virtualMachines/A10VNAvw4konls2vfbw",
"instanceView": null,
"licenseType": null,
"location": "centralus",
"name": "A10VNAvw4konls2vfbw",
"networkProfile": {
"networkInterfaces": [
{
"id": "/subscriptions/blah/resourceGroups/IPv6v2/providers/Microsoft.Network/networkInterfaces/A10NICvw4konls2vfbw-data",
"resourceGroup": "IPv6v2"
}
]
}
}
]
In your case, sub(".*/";"") will do the trick as * is greedy:
.[].networkProfile.networkInterfaces[].id | sub(".*/";"")
Try this:
jq -r '.[]|.networkProfile.networkInterfaces[].id | split("/") | last'
The -r tells JQ to print the output in "raw" form - in this case, that means no double-quotes around the string value.
As for the jq expression, after you access the id you want, piping it (still inside jq) through split("/") turns it into an array of the parts between slashes. Piping that through the last function (thanks, #Thor) returns just the last element of the array.
If you want to do it with grep here is one way:
jq -r '.[].networkProfile.networkInterfaces[].id' | grep -o '[^/]*$'
Output:
A10NICvw4konls2vfbw-data

Fix "is not valid in a csv row" for jq, by transforming array to string

I try to export a CSV from Neo4j with jq, with:
curl --header "Authorization: Basic myBase64hash=" -H accept:application/json -H content-type:application/json \
-d '{"statements":[{"statement":"MATCH path=(()<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood)) RETURN path"}]}' \
http://localhost:7474/db/data/transaction/commit \
| jq -r '(.results[0]) | .columns,.data[].row | #csv' > '/tmp/export-subset.csv'
But I'm getting this error message:
jq: error (at <stdin>:0): array ([{"email":"...) is not valid in a csv row
I think it's because of I have multiple e-mail adresses,
is it possible to place all of them in a CSV cell seperated by comma?
How can I achieve that with jq?
Edit:
This is an example of my JSON file:
{"results":[{"columns":["path"],"data":[{"row":[[{"email":"gdggdd#gmail.com"},{},{"date_found":"2011-11-29 12:51:14","last_name":"Doe","provider_id":2649,"first_name":"John"},{},{"number":"133","lon":3.21114,"lat":22.8844},{},{"street_name":"Govstreet"},{},{"hood":"Rotterdam"}]],"meta":[[{"id":71390,"type":"node","deleted":false},{"id":226866,"type":"relationship","deleted":false},{"id":63457,"type":"node","deleted":false},{"id":227100,"type":"relationship","deleted":false},{"id":65076,"type":"node","deleted":false},{"id":214799,"type":"relationship","deleted":false},{"id":63915,"type":"node","deleted":false},{"id":226552,"type":"relationship","deleted":false},{"id":71120,"type":"node","deleted":false}]]}]}],"errors":[]}
Forgive me but I'm not familiar with Cypher syntax or how your data is actually structured, you don't provide much detail about that. But what I can gather, based on your sample output, each "row" item seems to correspond to what you return in your Cypher query.
Apparently you're returning path which is an entire set of nodes and relationships, and not necessarily just the data you're actually interested in.
MATCH path=(()<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood))
RETURN path
You just want the email addresses so you should probably just return the email. If I understand the syntax correctly, you could change that to this:
MATCH (i)<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood)
RETURN i.email
I believe that should result in something that looks something like this:
{
"results": [
{
"columns": [ "email" ],
"data": [
{
"row": [
"gdggdd#gmail.com"
],
"meta": [
{
"id": 71390,
"type": "string",
"deleted": false
}
]
}
]
}
],
"errors": []
}
Then it should be trivial to export that data to csv using jq since the rows can be converted directly:
.results[0] | .columns, .data[].row | #csv
On the other hand, I could be completely wrong on what that output would actually look like. So just working with your example, if you just want emails, you need to map the rows to just the email.
.results[0] | .columns, (.data[].row | map(.[0].email)) | #csv
In case I misinterpreted, if you were intending to output all values and not just the email, you should select just the values in your Cypher query.
MATCH (i)<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood)
RETURN i.email, p.date_found, p.last_name, p.provider_id, p.first_name,
h.number, h.lon, h.lat, s.street_name, n.hood
Then if my assumptions on the output are correct, the trivial jq query should give you your csv.
Since you want the keys in their original order, use keys_unsorted. This should get you on your way:
$ jq -r -c '.results[0] | .data[] | .row[]
| add
| keys_unsorted as $keys
| ($keys, [.[$keys[]]])
| #csv' input.json
(The newlines here are mainly for legibility.)
With your illustrative input, the output would be:
"email","date_found","last_name","provider_id","first_name","number","lon","lat","street_name","hood"
"gdggdd#gmail.com","2011-11-29 12:51:14","Doe",2649,"John","133",3.21114,22.8844,"Govstreet","Rotterdam"
Of course, in practice, you will probably have multiple lines of data, so in that case, you will probably want to make adjustments to ensure the headers are only printed once.

Flatten nested JSON using jq

I'd like to flatten a nested json object, e.g. {"a":{"b":1}} to {"a.b":1} in order to digest it in solr.
I have 11 TB of json files which are both nested and contains dots in field names, meaning not elasticsearch (dots) nor solr (nested without the _childDocument_ notation) can digest it as is.
The other solutions would be to replace dots in the field names with underscores and push it to elasticsearch, but I have far better experience with solr therefore I prefer the flatten solution (unless solr can digest those nested jsons as is??).
I will prefer elasticsearch only if the digestion process will take far less time than solr, because my priority is digesting as fast as I can (thus I chose jq instead of scripting it in python).
Kindly help.
EDIT:
I think the pair of examples 3&4 solves this for me:
https://lucidworks.com/blog/2014/08/12/indexing-custom-json-data/
I'll try soon.
You can also use the following jq command to flatten nested JSON objects in this manner:
[leaf_paths as $path | {"key": $path | join("."), "value": getpath($path)}] | from_entries
The way it works is: leaf_paths returns a stream of arrays which represent the paths on the given JSON document at which "leaf elements" appear, that is, elements which do not have child elements, such as numbers, strings and booleans. We pipe that stream into objects with key and value properties, where key contains the elements of the path array as a string joined by dots and value contains the element at that path. Finally, we put the entire thing in an array and run from_entries on it, which transforms an array of {key, value} objects into an object containing those key-value pairs.
This is just a variant of Santiago's jq:
. as $in
| reduce leaf_paths as $path ({};
. + { ($path | map(tostring) | join(".")): $in | getpath($path) })
It avoids the overhead of the key/value construction and destruction.
(If you have access to a version of jq later than jq 1.5, you can omit the "map(tostring)".)
Two important points about both these jq solutions:
Arrays are also flattened.
E.g. given {"a": {"b": [0,1,2]}} as input, the output would be:
{
"a.b.0": 0,
"a.b.1": 1,
"a.b.2": 2
}
If any of the keys in the original JSON contain periods, then key collisions are possible; such collisions will generally result in the loss of a value. This would happen, for example, with the following input:
{"a.b":0, "a": {"b": 1}}
Here is a solution that uses tostream, select, join, reduce and setpath
reduce ( tostream | select(length==2) | .[0] |= [join(".")] ) as [$p,$v] (
{}
; setpath($p; $v)
)
I've recently written a script called jqg that flattens arbitrarily complex JSON and searches the results using a regex; to simply flatten the JSON, your regex would be '.', which matches everything. Unlike the answers above, the script will handle embedded arrays, false and null values, and can optionally treat empty arrays and objects ([] & {}) as leaf nodes.
$ jq . test/odd-values.json
{
"one": {
"start-string": "foo",
"null-value": null,
"integer-number": 101
},
"two": [
{
"two-a": {
"non-integer-number": 101.75,
"number-zero": 0
},
"true-boolean": true,
"two-b": {
"false-boolean": false
}
}
],
"three": {
"empty-string": "",
"empty-object": {},
"empty-array": []
},
"end-string": "bar"
}
$ jqg . test/odd-values.json
{
"one.start-string": "foo",
"one.null-value": null,
"one.integer-number": 101,
"two.0.two-a.non-integer-number": 101.75,
"two.0.two-a.number-zero": 0,
"two.0.true-boolean": true,
"two.0.two-b.false-boolean": false,
"three.empty-string": "",
"three.empty-object": {},
"three.empty-array": [],
"end-string": "bar"
}
jqg was tested using jq 1.6
Note: I am the author of the jqg script.
As it turns out, curl -XPOST 'http://localhost:8983/solr/flat/update/json/docs' -d #json_file does just this:
{
"a.b":[1],
"id":"24e3e780-3a9e-4fa7-9159-fc5294e803cd",
"_version_":1535841499921514496
}
EDIT 1: solr 6.0.1 with bin/solr -e cloud. collection name is flat, all the rest are default (with data-driven-schema which is also default).
EDIT 2: The final script I used: find . -name '*.json' -exec curl -XPOST 'http://localhost:8983/solr/collection1/update/json/docs' -d #{} \;.
EDIT 3: Is is also possible to parallel with xargs and to add the id field with jq: find . -name '*.json' -print0 | xargs -0 -n 1 -P 8 -I {} sh -c "cat {} | jq '. + {id: .a.b}' | curl -XPOST 'http://localhost:8983/solr/collection/update/json/docs' -d #-" where -P is the parallelism factor. I used jq to set an id so multiple uploads of the same document won't create duplicates in the collection (when I searched for the optimal value of -P it created duplicates in the collection)
As #hraban mentioned, leaf_paths does not work as expected (furthermore, it is deprecated). leaf_paths is equivalent to paths(scalars), it returns the paths of any values for which scalars returns a truthy value. scalars returns its input value if it is a scalar, or null otherwise. The problem with that is that null and false are not truthy values, so they will be removed from the output. The following code does work, by checking the type of the values directly:
. as $in
| reduce paths(type != "object" and type != "array") as $path ({};
. + { ($path | map(tostring) | join(".")): $in | getpath($path) })