Maintain order of jq CSV output and include empty values

I am currently working on a bash script that combines the output of the aws iam list-users and aws iam list-user-tags commands into a CSV file containing all users along with their respective information and assigned tags. To parse the JSON output of those commands I chose to use jq.
Retrieving, parsing and converting (JSON to CSV) the list-users output works fine and produces the expected comma-separated list of values.
The output of list-user-tags does not quite behave that way. Its JSON output has the following schema:
{
  Tags: [
    {
      Key: "Name",
      Value: "NameOfUser"
    },
    {
      Key: "Email",
      Value: "EmailOfUser"
    },
    {
      Key: "Company",
      Value: "CompanyOfUser"
    }
  ]
}
Unfortunately the order of the tags is not consistent across users (and possibly across queries), which currently makes it impossible for me to maintain the order defined in the CSV file. On top of that, one or more tags may be missing entirely.
What I am looking for is a way to achieve the following (preferably using jq):
Select a tag's "Value" by its "Key"
Check whether it exists and, if not, add an empty entry
Put the value in the exact same place every time (maintain a certain order)
Repeat for every entry in the original output
Convert the resulting array of values into CSV
What I tried so far:
aws iam list-user-tags --user-name abcdef --no-cli-pager \
  | jq -r '[.Tags[] | select(.Key=="Name"), select(.Key=="Email"), select(.Key=="Company") | .Value // ""] | @csv'
Any help is much appreciated!

Let's suppose your sample "schema" is in a file named schema.jq
Then
jq -n -f schema.jq | jq -r '
  def get($key):
    map(select(.Key == $key))
    | if length == 0 then null else .[0].Value end;
  .Tags | [get("Name"), get("Email"), get("Company")] | @csv
'
produces the following CSV:
"NameOfUser","EmailOfUser","CompanyOfUser"
It should be easy to adapt this illustration to your needs.
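For instance, a minimal sketch of that adaptation (assuming the relevant tag keys really are Name, Email and Company, and reusing the aws CLI call from the question) applies the same def directly to the command output:
aws iam list-user-tags --user-name abcdef --no-cli-pager \
  | jq -r '
      # a missing tag yields null, which @csv renders as an empty field,
      # so every value stays in its fixed column
      def get($key):
        map(select(.Key == $key))
        | if length == 0 then null else .[0].Value end;
      .Tags | [get("Name"), get("Email"), get("Company")] | @csv
    '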

Related

Discard JSON objects if they contain substrings from a list

I want to parse a JSON file and extract some values, while also discarding or skipping certain entries if they contain substrings from another list passed in as an argument. The purpose is to exclude objects containing miscellaneous human-readable keywords from a master list.
input.json
{
  "entities": [
    {
      "id": 600,
      "name": "foo-001"
    },
    {
      "id": 601,
      "name": "foo-002"
    },
    {
      "id": 602,
      "name": "foobar-001"
    }
  ]
}
args.json (list of keywords)
"foobar-"
"BANANA"
The output must definitely contain the foo-* entries (but not the excluded foobar- entries), but it can also contain any other names, provided they don't contain foobar- or BANANA. The exclusions are to be based on substrings, not exact matches.
I'm looking for a more performant way of doing this, because currently I just do my normal filters:
jq '[.[].entities[] | select(.name != "")] | walk(if type == "string" then gsub ("\t";"") else . end)' > file
(the input file has some erroneous tab escapes and null fields in it that are preprocessed)
At this stage, the file has only been minimally prepared. Then I iterate through this file line by line in shell and invoke grep -vf with a long list of invalid patterns from the keywords file. This gives a "master list" that is sanitized for later parsing by other applications. This seems intuitively wrong, though.
It seems like this should be done in one fell swoop on the first pass with jq instead of brute forcing it in a loop later.
I tried various invocations of INDEX and --slurpfile, but I seem to be missing something:
jq '.entities | INDEX(.name)[inputs]' input.json args.json
The above is a simplistic way of indexing the input args that at least seems to demonstrate that the patterns in the file can be matched verbatim, but it doesn't account for substrings (contains).
jq '.[] | walk(if type == "object" and (.name | contains($args[])) then empty else . end)' --slurpfile args args.json input.json
This looks to be getting closer to the idea, but something is off here. It seems to regurgitate the entire input file for each argument in the keywords file, returning everything N times for N arguments, rather than actually emptying the matching objects; it just checks the whole file for the presence of a single keyword and then starts over.
It seems like I need to unwrap the $args[] and map it here somehow so that the input file only gets iterated through once, with each keyword being checked against each record, rather than the entire file being processed over and over again.
I found some conflicting information about whether a slurpfile is strictly necessary and can't determine the optimal approach here.
Thanks.
You could use all/2 as follows:
< input.json jq --slurpfile blacklist args.json '
  .entities
  | map(select(.name as $n
               | all($blacklist[]; . as $b | $n | index($b) | not)))
'
or more concisely (but perhaps less obviously correct):
.entities | map( select( all(.name; index( $blacklist[]) | not) ))
You might wish to write .entities |= map( ... ) instead if you want to retain the original structure.
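Run against the sample input.json and args.json above, either variant should keep only the entries whose name contains none of the blacklisted substrings:
[
  {
    "id": 600,
    "name": "foo-001"
  },
  {
    "id": 601,
    "name": "foo-002"
  }
]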

Non-JSON output from gcloud ai-platform predict. Parsing non-JSON outputs

I am using gcloud ai-platform predict to call an endpoint and get predictions as below, sending a JSON request but not getting a JSON response:
gcloud ai-platform predict --json-request instances.json
The response, however, is not JSON and hence cannot be processed further, causing other complications. Below is the response.
VAL HS
0.5 {'hs_1': [[-0.134501, -0.307326, -0.151994, -0.065352, -0.14138]], 'hs_2' : [[-0.134501, -0.307326, -0.151994, -0.065352, 0.020759]]}
Can gcloud ai-platform predict return JSON instead, or can this output perhaps be parsed differently?
Thanks for your help.
Apparently, your output is a table with a header row and two columns: a score and the (alleged) JSON content. You should extract the second column of whichever data row you prefer (your example only has one, but in general you might receive several score/JSON pairs). Maybe your API already offers functionality to extract a certain 'state', e.g. the one with the highest score. If not, a simple awk or sed script can get this job done easily.
Then, the only remaining issue before having proper JSON (which can then be queried by jq) is the quoting style. Your output encloses field names in ' instead of " ('lstm_1' instead of "lstm_1"). Correcting this, unfortunately, is not so easy a task if you have to expect arbitrarily complex JSON data (such as strings containing quotation marks etc.). However, if your JSON will always look as simple as in the example provided, simply substituting the wrong quote character for the right one becomes an easy task again for tools like awk or sed.
For instance, use sed on your example output to select the second line (which is the first data row), drop everything from the beginning up to but not including the first opening curly brace (which marks the beginning of the second column), make said substitutions and pipe the result into jq:
... | sed -n "2{s/^[^{]\+//;s/'/\"/g;p;q}" | jq .
{
  "lstm_1": [
    [
      -0.13450142741203308,
      -0.3073260486125946,
      -0.15199440717697144,
      -0.06535257399082184,
      -0.1413831114768982
    ]
  ],
  "lstm_2": [
    [
      -0.13450142741203308,
      -0.3073260486125946,
      -0.15199440717697144,
      -0.06535257399082184,
      0.02075939252972603
    ]
  ]
}
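An equivalent awk sketch (assuming the output always has this two-line shape; passing the single quote in via -v merely sidesteps shell quoting issues):
... | awk -v q="'" 'NR==2 { sub(/^[^{]*/, ""); gsub(q, "\""); print }' | jq .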
[Edited to reflect upon a comment]
If you want to utilize the score as well, let jq handle it. For instance:
... | sed -n "2{s/'/\"/g;p;q}" | jq -s '{score:first,status:last}'
{
  "score": 0.548,
  "status": {
    "lstm_1": [
      [
        -0.13450142741203308,
        -0.3073260486125946,
        -0.15199440717697144,
        -0.06535257399082184,
        -0.1413831114768982
      ]
    ],
    "lstm_2": [
      [
        -0.13450142741203308,
        -0.3073260486125946,
        -0.15199440717697144,
        -0.06535257399082184,
        0.02075939252972603
      ]
    ]
  }
}
[Edited to reflect upon changes in the OP]
As changes affected only names and values but no structure, the hitherto valid approach still holds:
... | sed -n "2{s/'/\"/g;p;q}" | jq -s '{val:first,hs:last}'
{
  "val": 0.5,
  "hs": {
    "hs_1": [
      [
        -0.134501,
        -0.307326,
        -0.151994,
        -0.065352,
        -0.14138
      ]
    ],
    "hs_2": [
      [
        -0.134501,
        -0.307326,
        -0.151994,
        -0.065352,
        0.020759
      ]
    ]
  }
}

How to find something in a json file using Bash

I would like to search a JSON file for some key or value, and have it print where it was found.
For example, when using jq to print out my Firefox extensions.json, I get something like this (using "..." here to skip long parts):
{
  "schemaVersion": 31,
  "addons": [
    {
      "id": "wetransfer@extensions.thunderbird.net",
      "syncGUID": "{e6369308-1efc-40fd-aa5f-38da7b20df9b}",
      "version": "2.0.0",
      ...
    },
    {
      ...
    }
  ]
}
Say I would like to search for "wetransfer@extensions.thunderbird.net", and would like an output which shows me where it was found with something like this:
{ "addons": [ {"id": "wetransfer@extensions.thunderbird.net"} ] }
Is there a way to get that with jq or with some other json tool?
I also tried to simply list the various ids in that file, and hoped that I would get it with jq '.id', but that just returned null, because it apparently needs the full path.
In other words, I'm looking for a command-line JSON parser which I could use in a way similar to XPath tools.
The path() function comes in handy:
$ jq -c 'path(.. | select(. == "wetransfer@extensions.thunderbird.net"))' input.json
["addons",0,"id"]
The resulting path is interpreted as "In the addons field of the initial object, the first array element's id field matches". You can use it with getpath(), setpath(), delpaths(), etc. to get or manipulate the value it describes.
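For instance, feeding that path back into getpath() returns the matched value (a quick sketch against the input above, assumed to be saved as input.json):
$ jq 'getpath(["addons",0,"id"])' input.json
"wetransfer@extensions.thunderbird.net"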
Using your example with modifications to make it valid JSON:
< input.json jq -c --arg s wetransfer@extensions.thunderbird.net '
  paths as $p | select(getpath($p) == $s) | null | setpath($p;$s)'
produces:
{"addons":[{"id":"wetransfer#extensions.thunderbird.net"}]}
Note
If there are N paths to the given value, the above will produce N lines. If you want only the first, you could wrap everything in first(...).
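For instance (a sketch wrapping the command above):
< input.json jq -c --arg s wetransfer@extensions.thunderbird.net '
  first(paths as $p | select(getpath($p) == $s) | null | setpath($p;$s))'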
Listing all the "id" values
I also tried to simply list the various ids in that file
Assuming that "id" values of false and null are of no interest, you can print all the "id" values of interest using the jq filter:
.. | .id? // empty
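As a complete command (against the sample file, again assumed to be saved as input.json), that would be:
jq -r '.. | .id? // empty' input.json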

replace comma in json file's field with jq-win

I have a problem working with a JSON file. I launch curl in an AutoIt script to download a JSON file from the web and then convert it to CSV format with jq-win:
jq-win32 -r ".[]" -c class.json>class.txt
and the JSON is in the following format:
[
  {
    "id":"1083",
    "name":"AAAAA",
    "channelNumber":8,
    "channelImage":""},
  {
    "id":"1084",
    "name":"bbbbb",
    "channelNumber":7,
    "channelImage":""},
  {
    "id":"1088",
    "name":"CCCCCC",
    "channelNumber":131,
    "channelImage":""},
  {
    "id":"1089",
    "name":"DDD,DDD",
    "channelNumber":132,
    "channelImage":""},
]
after jq-win, the file should become:
{"id":"1083","name":"AAAAA","channelNumber":8,"channelImage":""}
{"id":"1084","name":"bbbbb","channelNumber":7,"channelImage":""}
{"id":"1088","name":"CCCCCC","channelNumber":131,"channelImage":""}
{"id":"1089","name":"DDD,DDD","channelNumber":132,"channelImage":""}
and then the CSV file will be further processed by the AutoIt script to become:
AAAAA,1083
bbbbb,1084
CCCCCC,1088
DDD,DDD,1089
The JSON has around 300 records and among them, 5~6 records have a comma in a field, e.g. DDD,DDD,
so when I try to read the CSV file with _FileReadToArray, the comma in DDD,DDD causes trouble.
My question is: can I replace the comma in that field using jq-win?
(I tried to use fart.exe but it replaces every comma in the JSON file, which is not suitable for me.)
Thanks a lot.
Regds
LAM Chi-fung
can I replace the comma in that field using jq-win?
Yes. For example, use gsub, pretty much as you’d use awk’s gsub, e.g.
gsub(","; "|")
If you want more details, please provide more details, ideally in the form of a minimal reproducible example.
Example
With the given JSON input, the jq program:
.[]
| .name |= gsub(",";";")
| [.[]]
| map(tostring)
| join(",")
yields:
1083,AAAAA,8,
1084,bbbbb,7,
1088,CCCCCC,131,
1089,DDD;DDD,132,
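On the Windows command line, where the jq program has to be wrapped in double quotes and the inner quotes backslash-escaped, the same program might be invoked like this (a sketch only, not tested against your AutoIt setup):
jq-win32 -r ".[] | .name |= gsub(\",\";\";\") | [.[]] | map(tostring) | join(\",\")" class.json>class.txt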

Fix "is not valid in a csv row" for jq, by transforming array to string

I am trying to export a CSV from Neo4j with jq, with:
curl --header "Authorization: Basic myBase64hash=" -H accept:application/json -H content-type:application/json \
-d '{"statements":[{"statement":"MATCH path=(()<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood)) RETURN path"}]}' \
http://localhost:7474/db/data/transaction/commit \
| jq -r '(.results[0]) | .columns,.data[].row | @csv' > '/tmp/export-subset.csv'
But I'm getting this error message:
jq: error (at <stdin>:0): array ([{"email":"...) is not valid in a csv row
I think it's because I have multiple e-mail addresses;
is it possible to place all of them in a single CSV cell, separated by commas?
How can I achieve that with jq?
Edit:
This is an example of my JSON file:
{"results":[{"columns":["path"],"data":[{"row":[[{"email":"gdggdd#gmail.com"},{},{"date_found":"2011-11-29 12:51:14","last_name":"Doe","provider_id":2649,"first_name":"John"},{},{"number":"133","lon":3.21114,"lat":22.8844},{},{"street_name":"Govstreet"},{},{"hood":"Rotterdam"}]],"meta":[[{"id":71390,"type":"node","deleted":false},{"id":226866,"type":"relationship","deleted":false},{"id":63457,"type":"node","deleted":false},{"id":227100,"type":"relationship","deleted":false},{"id":65076,"type":"node","deleted":false},{"id":214799,"type":"relationship","deleted":false},{"id":63915,"type":"node","deleted":false},{"id":226552,"type":"relationship","deleted":false},{"id":71120,"type":"node","deleted":false}]]}]}],"errors":[]}
Forgive me, but I'm not familiar with Cypher syntax or how your data is actually structured; you don't provide much detail about that. But from what I can gather, based on your sample output, each "row" item seems to correspond to what you return in your Cypher query.
Apparently you're returning path which is an entire set of nodes and relationships, and not necessarily just the data you're actually interested in.
MATCH path=(()<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood))
RETURN path
You just want the email addresses so you should probably just return the email. If I understand the syntax correctly, you could change that to this:
MATCH (i)<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood)
RETURN i.email
I believe that should result in something like this:
{
  "results": [
    {
      "columns": [ "email" ],
      "data": [
        {
          "row": [
            "gdggdd@gmail.com"
          ],
          "meta": [
            {
              "id": 71390,
              "type": "string",
              "deleted": false
            }
          ]
        }
      ]
    }
  ],
  "errors": []
}
Then it should be trivial to export that data to csv using jq since the rows can be converted directly:
.results[0] | .columns, .data[].row | @csv
On the other hand, I could be completely wrong on what that output would actually look like. So just working with your example, if you just want emails, you need to map the rows to just the email.
.results[0] | .columns, (.data[].row | map(.[0].email)) | @csv
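Against the sample JSON above, that filter (run with jq -r) should yield roughly:
"path"
"gdggdd@gmail.com"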
In case I misinterpreted, if you were intending to output all values and not just the email, you should select just the values in your Cypher query.
MATCH (i)<--(p:Person)-->(h:House)<--(s:Street)-->(n:Neighbourhood)
RETURN i.email, p.date_found, p.last_name, p.provider_id, p.first_name,
h.number, h.lon, h.lat, s.street_name, n.hood
Then, if my assumptions about the output are correct, the trivial jq query should give you your CSV.
Since you want the keys in their original order, use keys_unsorted. This should get you on your way:
$ jq -r -c '.results[0] | .data[] | .row[]
  | add
  | keys_unsorted as $keys
  | ($keys, [.[$keys[]]])
  | @csv' input.json
(The newlines here are mainly for legibility.)
With your illustrative input, the output would be:
"email","date_found","last_name","provider_id","first_name","number","lon","lat","street_name","hood"
"gdggdd#gmail.com","2011-11-29 12:51:14","Doe",2649,"John","133",3.21114,22.8844,"Govstreet","Rotterdam"
Of course, in practice you will probably have multiple rows of data, in which case you will want to make adjustments to ensure the header is only printed once; one way to do that is sketched below.
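A sketch of that adjustment (assuming every row merges to an object with the same keys in the same order, so the header can be taken from the first one):
$ jq -r '.results[0].data
  | map(.row[] | add)
  | (.[0] | keys_unsorted) as $keys
  | ($keys, (.[] | [.[$keys[]]]))
  | @csv' input.json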