How to parse a JSON file like this?

I'd like to parse this JSON file to get something like the line below, with the 2nd column as Canonical SMILES and the 3rd column as Isomeric SMILES.
5317139<TAB><TAB>CCCC=C1C2=C(C3C(O3)CC2)C(=O)O1<TAB>CCC/C=C\\1/C2=C(C3C(O3)CC2)C(=O)O1
Could anybody show me how to do it in the best way in jq?

The following jq script (run with the -r command-line option) meets the stated requirements, assuming that the occurrence of <TAB><TAB> is a typo:
def getString($TOCHeading):
  .. | objects | select(.TOCHeading == $TOCHeading)
  | .Information[0].Value.StringWithMarkup[0].String;
.Record
| [.RecordNumber,
   getString("Canonical SMILES"),
   getString("Isomeric SMILES")]
| @tsv
This script produces:
5317139 CCCC=C1C2=C(C3C(O3)CC2)C(=O)O1 CCC/C=C\\1/C2=C(C3C(O3)CC2)C(=O)O1
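As a minimal sketch of the same approach on a self-contained input: the JSON below is a hypothetical, heavily trimmed stand-in for the real record (the record number and SMILES strings are placeholders, not data from the question). Wrapping the search in first(...) is an assumption on my part, added so that a heading occurring more than once yields one row rather than several.

```shell
# Hypothetical, trimmed stand-in for the PubChem-style record (placeholder values).
input='{"Record":{"RecordNumber":1,"Section":[{"TOCHeading":"Names","Section":[
  {"TOCHeading":"Canonical SMILES","Information":[{"Value":{"StringWithMarkup":[{"String":"CC=CC"}]}}]},
  {"TOCHeading":"Isomeric SMILES","Information":[{"Value":{"StringWithMarkup":[{"String":"C/C=C/C"}]}}]}]}]}}'

result=$(printf '%s' "$input" | jq -r '
  # Search the whole tree for the object with the given heading;
  # first(...) keeps only the first match.
  def getString($h):
    first(.. | objects | select(.TOCHeading == $h)
          | .Information[0].Value.StringWithMarkup[0].String);
  .Record
  | [.RecordNumber, getString("Canonical SMILES"), getString("Isomeric SMILES")]
  | @tsv')

printf '%s\n' "$result"   # one tab-separated line: number, canonical, isomeric
```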

How to use jq to extract a particular field from a terraform plan to show resources that are updated or changed?

I just want to be able to have a small quick view or list of what is changing with a terraform plan instead of the long output given by a terraform plan.
So far I think it can be done with a terraform plan and jq.
Here is what I have so far -
I run a plan like this:
terraform plan -out=tfplan -no-color -detailed-exitcode
Then I am trying to use jq to get the changes or updates using this:
terraform show -json tfplan | jq '.resource_changes[]
| select( .change.actions
| contains("create") or contains("update") )'
It gives me the error:
jq: error (at <stdin>:1): array (["no-op"]) and string ("create")
cannot have their containment checked
My jq skills are not the best - can anyone update my jq to work or is there an alternative way to do this?
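contains requires its input and its argument to be of the same type, so testing the array .change.actions against the string "create" fails. For arrays, contains checks whether every element of the argument is contained in some element of the input, recursively, and strings are compared as substrings (note the "d" in "create" vs "created"):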
$ jq -n '["created"] | contains(["create"])'
true
You can use the SQL-style IN filter:
$ jq -n '"create" | IN("created", "foo")'
false
$ jq -n '"created" | IN("created", "bar")'
true
So for your concrete use case you would probably want something like the following:
terraform show -json tfplan | jq '
  .resource_changes[]
  | select(
      .change.actions as $actions
      | ("create" | IN($actions[]))
        or ("update" | IN($actions[])))'
Or using any/2:
terraform show -json tfplan | jq '
.resource_changes[]
| select(any(.change.actions[]; .=="create" or .=="update"))'
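Since the goal was a short list rather than full objects, the any/2 filter can be followed by a string interpolation that prints one line per resource. This is only a sketch: the plan JSON is a hypothetical stand-in for the output of terraform show -json tfplan, and the resource addresses are made up.

```shell
# Hypothetical stand-in for `terraform show -json tfplan` (made-up resources).
plan='{"resource_changes":[
  {"address":"aws_s3_bucket.logs","change":{"actions":["no-op"]}},
  {"address":"aws_instance.web","change":{"actions":["update"]}},
  {"address":"aws_iam_role.app","change":{"actions":["delete","create"]}}]}'

result=$(printf '%s' "$plan" | jq -r '
  .resource_changes[]
  # keep resources whose action list mentions create or update
  | select(any(.change.actions[]; . == "create" or . == "update"))
  # one summary line per resource: address, then its actions
  | "\(.address): \(.change.actions | join(","))"')

printf '%s\n' "$result"
# aws_instance.web: update
# aws_iam_role.app: delete,create
```

With a real plan, the same filter would be placed after terraform show -json tfplan | jq -r.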

Using jq with a large JSON file

I have an extremely large JSON file that I am working with, on my Linux box. When I jq the file I get an output in this format, which is perfect:
{
“ID:” 12345
“Name:” joe
“Address:” 123 first street
“Email:” joe@example.com
My goal is to be able to grep for a particular field but get all related fields to return. So if I did a grep for “123 first street” I would also get the ID, name, and email that was with that group of data.
Thus far, I have gotten here:
jq . Myfile.json | grep "123 first street"
Can anyone help me get this query right? I would like to stay with this JSON format and stay on the Linux box.
If the file is an array (or object) of records, the following returns every record that has a key named "field":
jq '.[] | select(has("field"))' Myfile.json
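To match on a value rather than on the presence of a key, select can compare a field directly, which returns the whole record instead of the single matching line that grep would give. The sketch below assumes the top level of the file is an array of flat objects with an "Address" key; the second record is invented for illustration.

```shell
# Sample records shaped like the question's output (second record is hypothetical).
records='[
  {"ID":12345,"Name":"joe","Address":"123 first street","Email":"joe@example.com"},
  {"ID":67890,"Name":"ann","Address":"9 second avenue","Email":"ann@example.com"}]'

# --arg passes the search text as the jq variable $addr, avoiding shell-quoting issues.
result=$(printf '%s' "$records" | jq -c --arg addr "123 first street" '
  .[] | select(.Address == $addr)')

printf '%s\n' "$result"
# {"ID":12345,"Name":"joe","Address":"123 first street","Email":"joe@example.com"}
```

Against the real file, the same filter would be run as jq --arg addr "123 first street" '.[] | select(.Address == $addr)' Myfile.json, adjusting the path if the records are nested deeper.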

Can't combine col selection with CSV conversion

I've successfully been using jq for a while now, to take a JSON payload, select some of the columns, re-name the columns, and finally, create a JSON file. This is awesome bc I do not need a majority of the columns in the input dataset. Here is an example of one of those working commands:
curl -s https://c2.scryfall.com/file/scryfall-bulk/default-cards/default-cards-20220314210303.json \
| jq '[.[] | {oracle_id: .oracle_id, scryfall_id: .id, rarity: .rarity, set_code: .set, latest_price: .prices.usd, scryfall_url: .scryfall_uri, art_crop_url: .image_uris.art_crop, is_digital: .digital, is_promo: .promo, is_variation: .variation}]' > Desktop/printings.json
However, what I really need is to have this data in CSV format. I have been manually working around this by feeding the output of the command above into a free web tool for converting to CSV. But I recently learned that jq can output CSV itself, so I would like to streamline this so I can just get CSV data from jq in the first place. I read the jq documentation, and reviewed several Stack Overflow threads to learn how this works. But none of the examples I've found for generating CSV data with jq involve selecting specific columns or re-naming those columns. So I've not been able to get this to work.
I tried this command below, where I am attempting to 1) read in the JSON file from the scryfall.com endpoint, then 2) map the keys as rows and columns to prep to convert to the CSV format, and 3) apply a filter selecting each of the 10 columns I need. (I could not figure out the column re-naming part, so I removed that part for now, for the sake of simplicity):
curl -s https://c2.scryfall.com/file/scryfall-bulk/default-cards/default-cards-20220314210303.json \
| jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | .oracle_id | .id | .rarity | .set | .prices.usd | .scryfall_uri | .image_uris.art_crop | .digital | .promo | .variation | @csv' > Desktop/printings.csv
The result is this error:
jq: error (at <stdin>:67121): Cannot index array with string "oracle_id"
I'm not sure why "| .oracle_id" would be indexing anything. My intent is to filter the data. However, I think my struggle is an algorithmic one. Should I try to use pipes to sequence the different steps of selecting columns and generating the CSV? Or should I combine them? If I need to separate the steps, what order do they need to come in? I understand that the @csv filter at the end must take an array as input, but that's where I start to lose the plot.
Since the input JSON file is a freely-available, public dataset, you should be able to try this out to see if you get the same error output I showed above.
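In general, you should try breaking out each "group" and testing it separately, to see if it is mapping as you expect it to.
The error occurs because $cols, $rows[] emits arrays, so the following | .oracle_id tries to index an array with a string key. $cols holds all the keys found across all records, while $rows holds the corresponding values of each record. At that point you already have the header and the rows you need, so pass them straight to @csv. Keep in mind that each array passed to @csv may contain only scalars (strings, numbers, booleans, or null); nested arrays or objects raise an error, which is why the values are converted with tostring below.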
(map(keys) | add | unique) as $cols
| map(. as $row | $cols | map($row[.] | tostring)) as $rows
| $cols, $rows[]
| @csv
This however selects all mapped columns. If you only want a subset of them, just change the $cols variable to be what columns you want from the data. You might want to separate the value mapping from this since you have some nested values.
["oracle_id", "id", "rarity", "set", "price", "scryfall_uri", "image_uri", "digital", "promo", "variation"],
(.[] | [.oracle_id, .id, .rarity, .set, .prices.usd, .scryfall_uri, .image_uris.art_crop, .digital, .promo, .variation])
| @csv
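The column renaming asked about in the question only needs a header row with the new names, since the header is written by hand and the values are picked by position. The sketch below uses two made-up card records with a reduced set of fields; the new names are the ones from the question's working JSON command.

```shell
# Two made-up records with only some of the fields from the real dataset.
cards='[{"oracle_id":"o1","id":"s1","rarity":"rare","set":"abc","prices":{"usd":"1.50"},"digital":false},
        {"oracle_id":"o2","id":"s2","rarity":"common","set":"abc","prices":{"usd":null},"digital":true}]'

result=$(printf '%s' "$cards" | jq -r '
  # header row: the renamed columns, in the same order as the values below
  ["oracle_id", "scryfall_id", "rarity", "set_code", "latest_price", "is_digital"],
  # one array of scalars per card
  (.[] | [.oracle_id, .id, .rarity, .set, .prices.usd, .digital])
  | @csv')

printf '%s\n' "$result"
# "oracle_id","scryfall_id","rarity","set_code","latest_price","is_digital"
# "o1","s1","rare","abc","1.50",false
# "o2","s2","common","abc",,true
```

Note that a null price is rendered as an empty field and booleans are written unquoted.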

How to select multiple values in an array in json using jq

I am using jq to get multiple values from a JSON file with the command below.
.components| to_entries[]| "\(.key)- \(.value.status)"
which gives me the output below:
Server2- UP
server1 - UP
Splunk- UP
Datameer - UP
Platfora - UP
diskSpace- Good
But I want to select only a few of them. I tried putting the names in the brackets of to_entries[], but it didn't work.
Expected output:
Server1 - UP
Splunk -UP
Platfora - UP
Is there any way to pick only a few values?
Appreciate your help. Thank you.
With the -r command-line option, the following transforms the given input to the desired output, and is perhaps close to what you're looking for:
.components
| to_entries[]
| select(.key == ("server1", "Splunk", "Platfora"))
| "\(.key)- \(.value.status)"
If the list of components is available as a JSON list, then you could modify the selection criterion accordingly, e.g. using IN (uppercase) or index.
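As a rough illustration of the IN variant, the wanted names can be passed in as a JSON array with --argjson. The input below is reconstructed from the output shown in the question, and the variable name $wanted is arbitrary.

```shell
# Input reconstructed from the question's output; only the shape matters here.
health='{"components":{"Server2":{"status":"UP"},"server1":{"status":"UP"},
  "Splunk":{"status":"UP"},"Datameer":{"status":"UP"},
  "Platfora":{"status":"UP"},"diskSpace":{"status":"Good"}}}'

# --argjson binds the JSON array to $wanted; IN tests each key against its elements.
result=$(printf '%s' "$health" | jq -r --argjson wanted '["server1","Splunk","Platfora"]' '
  .components
  | to_entries[]
  | select(.key | IN($wanted[]))
  | "\(.key) - \(.value.status)"')

printf '%s\n' "$result"   # one "name - status" line per selected component
```

The comparison is case-sensitive, so "Server1" in the list would not match the key "server1".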

Can I get the argmax in jq?

The following dataset contains movies. I want to find the longest title with jq. What I got so far:
$ wget https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json
$ cat movies.json | jq '[.[] | .title | length] | max'
So the longest title has 110 characters. The following query then shows it to me:
$ cat movies.json | jq '.[] | .title | select(length==110)'
"Cornell-Columbia-University of Pennsylvania Boat Race at Ithaca, N.Y., Showing Lehigh Valley Observation Train"
Is it possible to directly get the argmax?
Background
I am currently exploring what I can do with jq for exploratory data analysis. Usually, I would use Pandas for most of it. However, I recently had an example where jq was just super handy. So I want to learn more about it to see how far jq can go / where it is easier to use than Pandas.
Yes, you can use max_by like:
max_by(.title | length).title
or,
map(.title) | max_by(length)
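Because max_by returns the whole element that maximizes the expression, the record and its title length can be reported together. A small sketch on a three-movie sample (not the real dataset); the output key title_length is a name chosen here for illustration.

```shell
# Tiny sample standing in for movies.json.
movies='[{"title":"Up","year":2009},
         {"title":"The Lord of the Rings","year":1978},
         {"title":"Alien","year":1979}]'

result=$(printf '%s' "$movies" | jq -c '
  # argmax: the movie whose title is longest
  max_by(.title | length)
  # report the title together with its length
  | {title, title_length: (.title | length)}')

printf '%s\n' "$result"
# {"title":"The Lord of the Rings","title_length":21}
```

If several titles share the maximum length, max_by still returns only one of them; select(length == ...) as in the question lists them all.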