I've been using jq successfully for a while now to take a JSON payload, select some of the columns, rename them, and finally create a JSON file. This is great because I do not need the majority of the columns in the input dataset. Here is an example of one of those working commands:
curl -s https://c2.scryfall.com/file/scryfall-bulk/default-cards/default-cards-20220314210303.json \
| jq '[.[] | {oracle_id: .oracle_id, scryfall_id: .id, rarity: .rarity, set_code: .set, latest_price: .prices.usd, scryfall_url: .scryfall_uri, art_crop_url: .image_uris.art_crop, is_digital: .digital, is_promo: .promo, is_variation: .variation}]' > Desktop/printings.json
However, what I really need is to have this data in CSV format. I have been manually working around this by feeding the output of the command above into a free web tool for converting to CSV. But I recently learned that jq can output CSV itself, so I would like to streamline this so I can just get CSV data from jq in the first place. I read the jq documentation, and reviewed several Stack Overflow threads to learn how this works. But none of the examples I've found for generating CSV data with jq involve selecting specific columns or re-naming those columns. So I've not been able to get this to work.
I tried this command below, where I am attempting to 1) read in the JSON file from the scryfall.com endpoint, then 2) map the keys as rows and columns to prep to convert to the CSV format, and 3) apply a filter selecting each of the 10 columns I need. (I could not figure out the column re-naming part, so I removed that part for now, for the sake of simplicity):
curl -s https://c2.scryfall.com/file/scryfall-bulk/default-cards/default-cards-20220314210303.json \
| jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | .oracle_id | .id | .rarity | .set | .prices.usd | .scryfall_uri | .image_uris.art_crop | .digital | .promo | .variation | @csv' > Desktop/printings.csv
The result is this error:
jq: error (at <stdin>:67121): Cannot index array with string "oracle_id"
I'm not sure why "| .oracle_id" would be indexing anything. My intent is to filter the data. However, I think my struggle is an algorithmic one. Should I try to use pipes to sequence the different steps of selecting columns and generating the CSV? Or should I combine them? If I need to separate the steps, what order do they need to come in? I understand that the @csv filter at the end must take an array as input, but that's where I start to lose the plot.
Since the input JSON file is a freely-available, public dataset, you should be able to try this out to see if you get the same error output I showed above.
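In general, try breaking out each "group" and testing it separately to see whether it maps the data the way you expect.
$cols collects all the keys across all records, while $rows holds the values of each record in that column order. After "$cols, $rows[]", every item in the stream is an array, so the trailing "| .oracle_id" tries to index an array with a string, which is exactly the error you got. At that point you already have the header and the rows you wanted, so pass them straight to @csv. Keep in mind that @csv only accepts arrays of scalars (strings, numbers, booleans, null), so nested objects or arrays must be converted first, for example with tostring.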
(map(keys) | add | unique) as $cols
| map(. as $row | $cols | map($row[.] | tostring)) as $rows
| $cols, $rows[]
| @csv
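To see what the key-union approach produces, here is a minimal sketch run against two made-up records (the sample data and the variable names are mine, not taken from the Scryfall file). It assumes jq is on the PATH.

```shell
# Two invented records with different key sets (not Scryfall data).
sample='[{"id":"a1","rarity":"common"},{"id":"b2","set":"neo"}]'

csv=$(printf '%s\n' "$sample" | jq -r '
  # union of all keys across records, sorted and de-duplicated
  (map(keys) | add | unique) as $cols
  # one array of stringified values per record, in $cols order
  | map(. as $row | $cols | map($row[.] | tostring)) as $rows
  # header row first, then every data row
  | $cols, $rows[]
  | @csv
')
printf '%s\n' "$csv"
# "id","rarity","set"
# "a1","common","null"
# "b2","null","neo"
```

Note that a key missing from a record shows up as the string "null", because tostring is applied to the missing value.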
This, however, selects every column found in the data. If you only want a subset, replace $cols with the list of columns you want. Because some of your values are nested (.prices.usd, .image_uris.art_crop), it is easier to write the header row and the value row separately:
["oracle_id", "id", "rarity", "set", "price", "scryfall_uri", "image_uri", "digital", "promo", "variation"],
(.[] | [.oracle_id, .id, .rarity, .set, .prices.usd, .scryfall_uri, .image_uris.art_crop, .digital, .promo, .variation])
| @csv
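Since the header row is just an array of strings, renaming columns only means writing different names there. A small sketch of that, using the renamed headers from the question on two invented card records (the record values are placeholders, and only six of the ten columns are shown to keep it short):

```shell
# Two invented card records shaped like the fields used above (not real Scryfall data).
cards='[
  {"oracle_id":"o-1","id":"s-1","rarity":"rare","set":"neo",
   "prices":{"usd":"1.50"},"digital":false},
  {"oracle_id":"o-2","id":"s-2","rarity":"common","set":"dmu",
   "prices":{"usd":null},"digital":true}
]'

renamed_csv=$(printf '%s\n' "$cards" | jq -r '
  # header row: any names you like, in the same order as the values below
  ["oracle_id", "scryfall_id", "rarity", "set_code", "latest_price", "is_digital"],
  # one value row per card; nested fields are reached with ordinary paths
  (.[] | [.oracle_id, .id, .rarity, .set, .prices.usd, .digital])
  | @csv
')
printf '%s\n' "$renamed_csv"
# "oracle_id","scryfall_id","rarity","set_code","latest_price","is_digital"
# "o-1","s-1","rare","neo","1.50",false
# "o-2","s-2","common","dmu",,true
```

Here @csv receives only scalars, so no tostring is needed: booleans are written unquoted and null becomes an empty field.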
I just want to be able to have a small quick view or list of what is changing with a terraform plan instead of the long output given by a terraform plan.
So far I think it can be done with a terraform plan and jq.
Here is what I have so far -
I run a plan like this:
terraform plan -out=tfplan -no-color -detailed-exitcode
Then I am trying to use jq to get the changes or updates using this:
terraform show -json tfplan | jq '.resource_changes[]
| select( .change.actions
| contains("create") or contains("update") )'
It gives me the error :
jq: error (at <stdin>:1): array (["no-op"]) and string ("create")
cannot have their containment checked
My jq skills are not the best - can anyone update my jq to work or is there an alternative way to do this?
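contains requires both operands to be of the same type, which is why comparing the array .change.actions with the string "create" fails. For arrays, it checks whether every element of the argument is contained in some element of the input, recursively, and strings are matched as substrings (note the "d" in "created" vs. "create"):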
$ jq -n '["created"] | contains(["create"])'
true
You can use the SQL-style IN filter:
$ jq -n '"create" | IN("created", "foo")'
false
$ jq -n '"created" | IN("created", "bar")'
true
So for your concrete use case you would probably want something like the following:
terraform show -json tfplan | jq '
.resource_changes[]
| select(
.change.actions as $actions
| ("create" | IN($actions[]))
or ("update" | IN($actions[])))'
Or using any/2:
terraform show -json tfplan | jq '
.resource_changes[]
| select(any(.change.actions[]; .=="create" or .=="update"))'
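For the "small quick view" asked about in the question, the same any/2 filter can be followed by a string that prints one line per resource. The sketch below runs on a hand-written stand-in for the plan JSON. The resource addresses are invented, and it assumes each entry of resource_changes carries an address field next to change.actions.

```shell
# Hand-written stand-in for `terraform show -json tfplan` output; addresses are invented.
plan='{"resource_changes":[
  {"address":"aws_s3_bucket.logs","change":{"actions":["no-op"]}},
  {"address":"aws_instance.web","change":{"actions":["update"]}},
  {"address":"aws_iam_role.app","change":{"actions":["delete","create"]}}
]}'

summary=$(printf '%s\n' "$plan" | jq -r '
  .resource_changes[]
  # keep resources whose action list has an exact "create" or "update" entry
  | select(any(.change.actions[]; . == "create" or . == "update"))
  # one compact line per resource: actions, then address
  | "\(.change.actions | join(",")) \(.address)"
')
printf '%s\n' "$summary"
# update aws_instance.web
# delete,create aws_iam_role.app
```

With a real plan, the printf line would be replaced by terraform show -json tfplan.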
I'd like to parse this JSON file.
to get something like this with the 2nd column as Canonical SMILES and 3rd column as Isomeric SMILES.
5317139<TAB><TAB>CCCC=C1C2=C(C3C(O3)CC2)C(=O)O1<TAB>CCC/C=C\\1/C2=C(C3C(O3)CC2)C(=O)O1
Could anybody show me how to do it in the best way in jq?
The following jq script (run with the -r command-line option) meets the stated requirements, assuming that the occurrence of <TAB><TAB> is a typo:
def getString($TOCHeading):
.. | objects | select( .TOCHeading == $TOCHeading)
| .Information[0].Value.StringWithMarkup[0].String;
.Record
| [.RecordNumber,
getString("Canonical SMILES"),
getString("Isomeric SMILES")]
| @tsv
This script produces:
5317139 CCCC=C1C2=C(C3C(O3)CC2)C(=O)O1 CCC/C=C\\1/C2=C(C3C(O3)CC2)C(=O)O1
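To try the script without the original file, here is a sketch on a heavily trimmed record. The nesting below is my assumption about the input's shape, inferred only from the paths the script uses (TOCHeading, Information, Value, StringWithMarkup, String); the real file has many more fields.

```shell
# Trimmed, hand-written record; the structure is an assumption inferred from the script.
record='{"Record":{"RecordNumber":5317139,"Section":[
  {"TOCHeading":"Names and Identifiers","Section":[
    {"TOCHeading":"Canonical SMILES",
     "Information":[{"Value":{"StringWithMarkup":[{"String":"CCCC=C1C2=C(C3C(O3)CC2)C(=O)O1"}]}}]},
    {"TOCHeading":"Isomeric SMILES",
     "Information":[{"Value":{"StringWithMarkup":[{"String":"CCC/C=C\\1/C2=C(C3C(O3)CC2)C(=O)O1"}]}}]}
  ]}
]}}'

tsv=$(printf '%s\n' "$record" | jq -r '
  # search every nested object for the one with the given heading
  def getString($TOCHeading):
    .. | objects | select(.TOCHeading == $TOCHeading)
    | .Information[0].Value.StringWithMarkup[0].String;
  .Record
  | [.RecordNumber, getString("Canonical SMILES"), getString("Isomeric SMILES")]
  | @tsv
')
printf '%s\n' "$tsv"
```

The doubled backslash in the third column is produced by @tsv, which escapes backslashes, tabs, and newlines in its output.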
I have a JSON file and I am extracting data from it using jq. One simple use case is pulling out any JSON Object that contains an Id which is provided as an argument.
I use the following simple script to do so:
[.[] | select(.id == $ID)]
The script is stored in a separate file (by_id.jq) which I pass in using the -f argument.
The full command looks something like this:
cat ./my_json_file.json | jq -sf --arg ID "8df993c1-57d5-46b3-a8a3-d95066934e5b" ./by_id.jq
Is there a way, using only jq, to pass a comma-separated list of values as an argument to the jq script, iterate through the ids, and check them against the value of .id in the JSON file, with the result being the objects that have one of those ids?
For example if I wanted to pull out three objects by their ids I would want to structure the command in this way:
cat ./my_json_file.json | jq -sf --arg ID "8df993c1-57d5-46b3-a8a3-d95066934e5b,1d5441ca-5758-474d-a9fc-40d0f68aa538,23cc618a-8ad4-4141-bc1c-0251y0663963" ./by_id.jq
Sure, though you'll need to parse (split) that list of ids into something jq can work with, such as an array of ids. The problem then becomes: given an array of ids, select the objects whose id is one of them.
$ jq --arg ID '8df993c1-57d5-46b3-a8a3-d95066934e5b,1d5441ca-5758-474d-a9fc-40d0f68aa538,23cc618a-8ad4-4141-bc1c-0251y0663963' '
select(.id | IN($ID|split(",")[]))
' ./my_json_file.json
I'm not sure what your input looks like but judging by your use of slurping then filtering the slurped input, it's a stream of objects. The slurping is not necessary here.
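A quick way to check this is to run it on a small stream of objects. The sketch below uses three invented objects with short ids in place of the UUIDs. IN is, as far as I know, only available from jq 1.6 on.

```shell
# Invented stream of objects, one per line; short ids stand in for the UUIDs.
objects='{"id":"a","name":"first"}
{"id":"b","name":"second"}
{"id":"c","name":"third"}'

picked=$(printf '%s\n' "$objects" | jq -c --arg ID 'a,c' '
  # split the comma-separated argument and keep objects whose id is in the list
  select(.id | IN($ID | split(",")[]))
')
printf '%s\n' "$picked"
# {"id":"a","name":"first"}
# {"id":"c","name":"third"}
```

No -s is involved: jq applies the filter to each object in the stream in turn.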
Here is an approach that focuses on efficiency.
Your question indicates that you in fact have a stream of objects, so the first step towards efficiency is to avoid the -s option and use -n with inputs instead.
The second step is to avoid splitting your comma-separated string of values more than once.
So your script might look like this:
INDEX($ids | splits(","); .) as $dict
| inputs
| select($dict[.id])
And the invocation would look like this:
jq -n --arg ids a,b,c -f by_id.jq ./my_json_file.json
This of course assumes that simply splitting the string of ids on "," will suffice. You might need to trim the values and take care of other potential anomalies.
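As a sketch of the dictionary approach with some tolerance for stray spaces, the following binds the list with --arg ids and strips whitespace from each id before indexing. It assumes the ids themselves never contain whitespace; the sample objects are invented, and INDEX needs jq 1.6 or later as far as I know.

```shell
# Invented stream of objects; the id list deliberately contains stray spaces.
objects='{"id":"a","name":"first"}
{"id":"b","name":"second"}
{"id":"c","name":"third"}'

matched=$(printf '%s\n' "$objects" | jq -nc --arg ids ' a, c ' '
  # build the lookup object once: split on commas, drop all whitespace from each id
  INDEX($ids | splits(",") | gsub("\\s+"; ""); .) as $dict
  # then stream the objects and keep those whose id is a key of the lookup
  | inputs
  | select($dict[.id])
')
printf '%s\n' "$matched"
# {"id":"a","name":"first"}
# {"id":"c","name":"third"}
```

The lookup object is built a single time, so each incoming object costs one key lookup instead of a pass over the whole id list.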
For efficiency, it would be better to split $ID just once.
So if you have to use the -s option, you could use the following jq program:
INDEX($ID | splits(","); .) as $dict
| .[]
| select($dict[.id])
I have the following JSON snippet:
{
"root_path": "/www",
"core_path": "/www/wp",
"content_path": "/www/content",
"vendor_path": "/www/vendor"
}
I would like to use jq first to get the values sorted in descending order of length:
/www/content
/www/vendor
/www/wp
/www
I need these so I can match against a list of files to find which of the named paths the files exist in.
Then I would like to use jq again to swap properties for values (it can drop duplicate properties, that's okay):
{
"/www": "root_path",
"/www/wp": "core_path",
"/www/content": "content_path",
"/www/vendor": "vendor_path"
}
My use case for this 2nd query is to be able to lookup a matched path value and find its path name, which I will then use in a second JSON snippet with an identical schema to get the named path's value.
My use case is website deployment: I have a config file containing file names as they will exist on the deployment server. These files should be copied from the source server to the deployment server, but the two servers may have different directory layouts.
I need to use Bash for this, but if there is a better way to do what I am looking to do, I am open to it. That said, I really do want to learn how to use jq better, so I would prefer to learn how to do these transforms with jq.
I am using jq version 1.5
the values sorted in descending order of length:
[.[]] | sort_by(length) | reverse[]
swap properties for values
with_entries(.key as $k | .key=.value | .value=$k )
Combining the two requirements
A solution to the combined problem can be crafted by combining the above two solutions, because with_entries is a combination of to_entries and from_entries:
to_entries
| map(.key as $k | .key=.value | .value=$k )
| sort_by(.key|length)
| reverse
| from_entries
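Run against the JSON from the question, the two filters can be checked like this. This is only a sketch; the shell variable names are mine. Neither filter uses anything beyond jq 1.5.

```shell
# The JSON snippet from the question, on one line.
paths='{"root_path":"/www","core_path":"/www/wp","content_path":"/www/content","vendor_path":"/www/vendor"}'

# Values only, longest first.
by_length=$(printf '%s\n' "$paths" | jq -r '[.[]] | sort_by(length) | reverse[]')
printf '%s\n' "$by_length"
# /www/content
# /www/vendor
# /www/wp
# /www

# Keys and values swapped, entries ordered by path length, longest first.
swapped=$(printf '%s\n' "$paths" | jq -c '
  to_entries
  | map(.key as $k | .key = .value | .value = $k)
  | sort_by(.key | length)
  | reverse
  | from_entries
')
printf '%s\n' "$swapped"
# {"/www/content":"content_path","/www/vendor":"vendor_path","/www/wp":"core_path","/www":"root_path"}
```

A matched path can then be looked up in the swapped object with a filter such as .["/www/wp"].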
Depending on the context, my jq query selects one or several elements. If the query returns exactly one element, I want to display one specific field; otherwise, I want to display a different field.
For example, I have this simple query:
jq '.foo | select(.faa | test("word")) | [ .fii, .fuu ]'
Sometimes the selection (select(.faa ...)) returns one element, sometimes several. If there is one element, I want to display only the field .fii; otherwise, I want to display only the field .fuu.
Is there a way to do that with jq, using only one query?
Thanks :)
The only way that .foo | select(.faa | test("word")) could produce
more than one JSON value is if the input consists of a stream of
objects. Thus, the following will assume that that is the case.
One way to solve the given problem is to use the -s command-line option.
You can then simply count the number of solutions, along
the following lines:
map(.foo | select(.faa | test("word")))
| if (length == 1) then map(.fii)
else map(.fuu)
end
If you want a stream of values, rather than an array, then you
could simply append | .[] to the above filter.
Using inputs
jq 1.5 introduced inputs, which avoids some potential problems with "slurping" a large file. Using jq with the -n option, a solution to the given problem would be as above but with the first line replaced by:
[inputs | .foo | select(.faa | test("word")) ]
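Here is a sketch of the inputs variant on an invented three-object stream, run once with a pattern that matches two objects and once with a pattern that matches exactly one (the field values and the second pattern are made up for illustration):

```shell
# Invented stream of objects; field values are placeholders.
stream='{"foo":{"faa":"a word here","fii":1,"fuu":"x"}}
{"foo":{"faa":"nothing","fii":2,"fuu":"y"}}
{"foo":{"faa":"another word","fii":3,"fuu":"z"}}'

# "word" matches two objects, so the .fuu values are returned.
many=$(printf '%s\n' "$stream" | jq -nc '
  [inputs | .foo | select(.faa | test("word"))]
  | if length == 1 then map(.fii) else map(.fuu) end
')
printf '%s\n' "$many"
# ["x","z"]

# "here" matches exactly one object, so its .fii value is returned.
one=$(printf '%s\n' "$stream" | jq -nc '
  [inputs | .foo | select(.faa | test("here"))]
  | if length == 1 then map(.fii) else map(.fuu) end
')
printf '%s\n' "$one"
# [1]
```

Only the selected objects are collected into the array, so memory use depends on the number of matches rather than on the size of the whole input.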