The following dataset contains movies. I want to find the longest title with jq. What I got so far:
$ wget https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json
$ cat movies.json | jq '[.[] | .title | length] | max'
So the longest title has 110 characters. The following query then shows it to me:
$ cat movies.json | jq '.[] | .title | select(length==110)'
"Cornell-Columbia-University of Pennsylvania Boat Race at Ithaca, N.Y., Showing Lehigh Valley Observation Train"
Is it possible to directly get the argmax?
Background
I am currently exploring what I can do with jq for exploratory data analysis. Usually, I would use Pandas for most of it. However, I recently had an example where jq was just super handy. So I want to learn more about it to see how far jq can go, and where it is easier to use than Pandas.
Yes, you can use max_by, like this:
max_by(.title | length).title
or,
map(.title) | max_by(length)
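As a quick check, max_by can be exercised against a tiny stand-in for movies.json (the titles and years below are invented for the example):

```shell
# A tiny stand-in for movies.json (invented sample data)
cat > /tmp/movies_sample.json <<'EOF'
[
  {"title": "Up", "year": 2009},
  {"title": "The Shawshank Redemption", "year": 1994},
  {"title": "Heat", "year": 1995}
]
EOF

# max_by picks the whole element with the longest title (the argmax),
# so .title then extracts just the winning title
jq -r 'max_by(.title | length).title' /tmp/movies_sample.json
# prints: The Shawshank Redemption
```

The first form keeps the whole winning object around, so you could just as easily pull out .year instead of .title.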
Related
I just want a small, quick view or list of what is changing with a terraform plan, instead of its long default output.
So far I think it can be done with terraform plan and jq.
Here is what I have so far -
I run a plan like this:
terraform plan -out=tfplan -no-color -detailed-exitcode
Then I am trying to use jq to get the changes or updates using this:
terraform show -json tfplan | jq '.resource_changes[]
| select( .change.actions
| contains("create") or contains("update") )'
It gives me the error:
jq: error (at <stdin>:1): array (["no-op"]) and string ("create")
cannot have their containment checked
My jq skills are not the best - can anyone update my jq to work or is there an alternative way to do this?
contains checks if one array is a subarray of the other, recursively (substrings are matched too; note the "d" in "create" vs "created"):
$ jq -n '["created"] | contains(["create"])'
true
You can use the SQL-style IN filter:
$ jq -n '"create" | IN("created", "foo")'
false
$ jq -n '"created" | IN("created", "bar")'
true
So for your concrete use case you would probably want something like the following (the parentheses matter: in jq, or binds tighter than |, so without them the expression does not group the way you would expect):
terraform show -json tfplan | jq '
  .resource_changes[]
  | select(
      .change.actions as $actions
      | ("create" | IN($actions[]))
        or ("update" | IN($actions[])))'
Or using any/2:
terraform show -json tfplan | jq '
.resource_changes[]
| select(any(.change.actions[]; .=="create" or .=="update"))'
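To see the any/2 variant in action without a real Terraform project, here is a minimal mock of the `terraform show -json` output (the resource addresses and the exact nesting are invented for the example, trimmed to just the fields the filter touches):

```shell
# Minimal mock of `terraform show -json tfplan` output (invented addresses)
cat > /tmp/plan_sample.json <<'EOF'
{"resource_changes": [
  {"address": "aws_instance.a", "change": {"actions": ["no-op"]}},
  {"address": "aws_instance.b", "change": {"actions": ["create"]}},
  {"address": "aws_instance.c", "change": {"actions": ["delete", "create"]}}
]}
EOF

# any/2 is true as soon as one action is "create" or "update",
# so replaced resources (["delete","create"]) are caught too
jq -c '.resource_changes[]
       | select(any(.change.actions[]; . == "create" or . == "update"))
       | .address' /tmp/plan_sample.json
# prints:
# "aws_instance.b"
# "aws_instance.c"
```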
I have an extremely large JSON file that I am working with, on my Linux box. When I jq the file I get an output in this format, which is perfect:
{
  "ID": 12345,
  "Name": "joe",
  "Address": "123 first street",
  "Email": "joe@example.com"
}
My goal is to be able to grep for a particular field but get all related fields returned. So if I did a grep for "123 first street" I would also get the ID, name, and email that go with that group of data.
Thus far, I have gotten here:
jq . Myfile.json | grep "123 first street"
Can anyone help me with getting this query right? I would like to stay with this JSON format and stay on the Linux box.
Rather than piping jq's output to grep, you can filter within jq itself. The following returns all JSON objects that contain the key "field":
jq '.[] | select(has("field"))' Myfile.json
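To match on a value rather than a key, the field can be tested directly inside select. A self-contained sketch, using an invented sample file in the same shape as the question's data:

```shell
# Invented sample data in the same shape as the question's file
cat > /tmp/people_sample.json <<'EOF'
[
  {"ID": 12345, "Name": "joe", "Address": "123 first street", "Email": "joe@example.com"},
  {"ID": 67890, "Name": "ann", "Address": "456 second ave", "Email": "ann@example.com"}
]
EOF

# select keeps only the objects whose Address matches,
# returning all of their related fields in one go
jq '.[] | select(.Address == "123 first street")' /tmp/people_sample.json
```

This prints the full object for joe, which is exactly the "grep one field, get the whole record" behavior the question asks for.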
I'd like to parse this JSON file to get something like the following, with the 2nd column as Canonical SMILES and the 3rd column as Isomeric SMILES:
5317139<TAB><TAB>CCCC=C1C2=C(C3C(O3)CC2)C(=O)O1<TAB>CCC/C=C\\1/C2=C(C3C(O3)CC2)C(=O)O1
Could anybody show me how to do it in the best way in jq?
The following jq script (run with the -r command-line option) meets the stated requirements, assuming that the occurrence of <TAB><TAB> is a typo:
def getString($TOCHeading):
.. | objects | select( .TOCHeading == $TOCHeading)
| .Information[0].Value.StringWithMarkup[0].String;
.Record
| [.RecordNumber,
getString("Canonical SMILES"),
getString("Isomeric SMILES")]
| @tsv
This script produces:
5317139 CCCC=C1C2=C(C3C(O3)CC2)C(=O)O1 CCC/C=C\\1/C2=C(C3C(O3)CC2)C(=O)O1
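The script can be sanity-checked against a minimal mock of the record layout (the nesting below is an assumption reconstructed from the paths used in the script, and the SMILES strings are invented placeholders):

```shell
# Minimal mock of the JSON layout assumed by the script above
cat > /tmp/record_sample.json <<'EOF'
{"Record": {
  "RecordNumber": 5317139,
  "Section": [
    {"TOCHeading": "Canonical SMILES",
     "Information": [{"Value": {"StringWithMarkup": [{"String": "C1CC1"}]}}]},
    {"TOCHeading": "Isomeric SMILES",
     "Information": [{"Value": {"StringWithMarkup": [{"String": "C/C=C/C"}]}}]}
  ]
}}
EOF

# .. recursively walks the tree; select picks the object whose
# TOCHeading matches, regardless of how deeply it is nested
jq -r '
  def getString($TOCHeading):
    .. | objects | select(.TOCHeading == $TOCHeading)
    | .Information[0].Value.StringWithMarkup[0].String;
  .Record
  | [.RecordNumber, getString("Canonical SMILES"), getString("Isomeric SMILES")]
  | @tsv' /tmp/record_sample.json
# prints the three fields tab-separated: 5317139, C1CC1, C/C=C/C
```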
I have a CSV that needs to be converted to a simple new-line separated format to be fed into another script, but running into a weird issue.
Contents of CSV:
"1. ID","2. Height","3. Gender","4. Age"
"<1111111111>","5ft. 10.0in.","M"," 15.0"
"<2222222222>","6ft. 0in.","M"," 22.0"
Version 1 of CLI command:
cat source.csv | sed 's/[\"<>]//g' | ~/projects/dp/vendor/jq/1.5/jq --raw-input --compact-output 'split("\n") | .[1:] | map(split(",")) | map({"phone_number":.[0],"opt_in":"yes"}) | .[]'
Version 1 output: None
Version 2 of CLI command:
cat source.csv | sed 's/[\"<>]//g' | ~/projects/dp/vendor/jq/1.5/jq --raw-input --compact-output 'split("\n") | .[0:] | map(split(",")) | map({"phone_number":.[0],"opt_in":"yes"}) | .[]'
Version 2 output:
{"phone_number":"1. ID","opt_in":"yes"}
{"phone_number":"1111111111","opt_in":"yes"}
{"phone_number":"2222222222","opt_in":"yes"}
It's my understanding that the .[1:] tells jq to only keep the rows (separated by newlines) after row #1; row #1, however, is what dictates the references (being able to reference phone_number).
So why is version 1 not outputting anything?
Version 1 is missing the -s command-line option. Without -s, --raw-input makes jq process the input one line at a time, so split("\n") yields a single-element array and .[1:] is always empty. With -s, the whole file is read as one string and the split sees every row.
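For reference, here is version 1 with -s added, runnable against an inline copy of the sample CSV (a select(length > 0) is also added here to drop the trailing empty element that split produces from the final newline):

```shell
# Inline copy of the sample CSV from the question
cat > /tmp/source_sample.csv <<'EOF'
"1. ID","2. Height","3. Gender","4. Age"
"<1111111111>","5ft. 10.0in.","M"," 15.0"
"<2222222222>","6ft. 0in.","M"," 22.0"
EOF

# -s slurps the whole input into one string, so split("\n")
# sees every row and .[1:] drops only the header
sed 's/[\"<>]//g' /tmp/source_sample.csv |
  jq -sRc 'split("\n") | .[1:]
           | map(select(length > 0) | split(","))
           | map({phone_number: .[0], opt_in: "yes"}) | .[]'
# prints:
# {"phone_number":"1111111111","opt_in":"yes"}
# {"phone_number":"2222222222","opt_in":"yes"}
```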
Another way to skip the header row is to use inputs without the -n command-line option, as follows: without -n, jq consumes the first line as its regular input (which this program ignores), and inputs then yields the remaining lines. Using inputs is also much more efficient than the -s command-line option, since it streams the input rather than slurping the whole file.
< source.csv sed 's/[\"<>]//g' |
jq -cR 'inputs
| split(",")
| {"phone_number":.[0],"opt_in":"yes"}'
Robustness
Using jq to parse a CSV file is fraught with potential difficulties. In general, it would be better to use a "csv2tsv" tool to convert the CSV to TSV, which jq can easily handle.
I'm taking a modified command from the jq tutorial:
curl 'https://api.github.com/repos/stedolan/jq/commits?per_page=5' \
| jq -r -c '.[] | {message: .commit.message, name: .commit.committer.name} | [.[]] | @csv'
Which does the CSV export well, but it's missing the headers at the top:
"Fix README","Nicolas Williams"
"README: send questions to SO and Freenode","Nicolas Williams"
"usage() should check fprintf() result (fix #771)","Nicolas Williams"
"Use jv_mem_alloc() in compile.c (fix #771)","Nicolas Williams"
"Fix header guards (fix #770)","Nicolas Williams"
How can I add the header (in this case message,name) at the top? (I know it's possible manually, but how to do it within jq?)
Just add the header text in an array in front of the values.
["Commit Message","Committer Name"], (.[].commit | [.message,.committer.name]) | @csv
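The filter can be tried against a small stand-in for the GitHub API response (the commit data below is copied from the sample output earlier in the question; note the operator is @csv):

```shell
# Small stand-in for the GitHub API response
cat > /tmp/commits_sample.json <<'EOF'
[
  {"commit": {"message": "Fix README", "committer": {"name": "Nicolas Williams"}}},
  {"commit": {"message": "Fix header guards (fix #770)", "committer": {"name": "Nicolas Williams"}}}
]
EOF

# The header array is emitted first, then one row per commit;
# @csv formats each array as one CSV line
jq -r '["Commit Message","Committer Name"],
       (.[].commit | [.message, .committer.name])
       | @csv' /tmp/commits_sample.json
# prints:
# "Commit Message","Committer Name"
# "Fix README","Nicolas Williams"
# "Fix header guards (fix #770)","Nicolas Williams"
```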
Based on Anton's comments on Jeff Mercado's answer, this snippet will get the key names of the properties of the first element and output them as an array before the rows, thus using them as headers. If different rows have different properties, then it won't work well; then again, neither would the resulting CSV.
map({message: .commit.message, name: .commit.committer.name}) | (.[0] | to_entries | map(.key)), (.[] | [.[]]) | @csv
While I fully realize OP was looking for a purely jq answer, I found this question looking for any answer. So, let me offer one I found (and found useful) to others like me.
sudo apt install moreutils - if you don't have them yet. Moreutils website.
echo "Any, column, name, that, is, not, in, your, json, object" | cat - your.csv | sponge your.csv
Disadvantages: it requires the moreutils package and is not jq-only, so some would understandably call it less elegant.
Advantages: you choose your headers rather than being stuck with your JSON keys. Also, pure-jq approaches can be affected by key sorting, depending on your jq version.
How does it work?
echo outputs your header
cat - reads the echo output from stdin (that's what the - means) and conCATenates it with your CSV file
sponge waits until that is done, then writes the result back to the same file, overwriting it.
But you could do it with tee without having to install any packages!
No, you could not, as Kos excellently demonstrates here. Not unless you're fine with losing your CSV at some point.