JQ to Parse CSV - How to Skip Headers

I have a CSV that needs to be converted to a simple newline-separated format to be fed into another script, but I'm running into a weird issue.
Contents of CSV:
"1. ID","2. Height","3. Gender","4. Age"
"<1111111111>","5ft. 10.0in.","M"," 15.0"
"<2222222222>","6ft. 0in.","M"," 22.0"
Version 1 of CLI command:
cat source.csv | sed 's/[\"<>]//g' | ~/projects/dp/vendor/jq/1.5/jq --raw-input --compact-output 'split("\n") | .[1:] | map(split(",")) | map({"phone_number":.[0],"opt_in":"yes"}) | .[]'
Version 1 output: None
Version 2 of CLI command:
cat source.csv | sed 's/[\"<>]//g' | ~/projects/dp/vendor/jq/1.5/jq --raw-input --compact-output 'split("\n") | .[0:] | map(split(",")) | map({"phone_number":.[0],"opt_in":"yes"}) | .[]'
Version 2 output:
{"phone_number":"1. ID","opt_in":"yes"}
{"phone_number":"1111111111","opt_in":"yes"}
{"phone_number":"2222222222","opt_in":"yes"}
It's my understanding that .[1:] tells jq to only parse the rows (separated by newlines) after row #1; however, row #1 is what dictates the references (being able to reference phone_number).
So why is version 1 not outputting anything?

Version 1 is missing the -s command-line option: without -s, jq --raw-input reads each line as a separate input, so split("\n") yields a one-element array and .[1:] is always empty.
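With -s added, so that the whole file is read as one string, version 1 should work as intended:
cat source.csv | sed 's/[\"<>]//g' | jq -s --raw-input --compact-output 'split("\n") | .[1:] | map(split(",")) | map({"phone_number":.[0],"opt_in":"yes"}) | .[]'
(Note that if the file ends with a newline, split("\n") also yields a trailing empty string, which becomes an extra {"phone_number":null,"opt_in":"yes"} object; the inputs approach below avoids this.)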
Another way to skip the header row is to use inputs without the -n command-line option, as follows. Using inputs is also much more efficient than using the -s command-line option.
< source.csv sed 's/[\"<>]//g' |
jq -cR 'inputs
| split(",")
| {"phone_number":.[0],"opt_in":"yes"}'
Robustness
Using jq to parse a CSV file is fraught with potential difficulties. In general, it would be better to use a "csv2tsv" tool to convert the CSV to TSV, which jq can easily handle.
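As a rough sketch, assuming some csv2tsv tool is available (the name is illustrative, not a specific package), the pipeline could then become:
csv2tsv source.csv |
jq -cR 'inputs
| split("\t")
| {"phone_number": (.[0] | gsub("[<>]"; "")), "opt_in": "yes"}'
Because TSV fields cannot contain literal tabs, split("\t") is safe even when fields contain commas or quoted strings.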

Related

How to use jq to extract a particular field from a terraform plan to show resources that are updated or changed?

I just want a small, quick view or list of what is changing with a terraform plan, instead of the long output given by terraform plan.
So far I think it can be done with a terraform plan and jq.
Here is what I have so far -
I run a plan like this:
terraform plan -out=tfplan -no-color -detailed-exitcode
Then I am trying to use jq to get the changes or updates using this:
terraform show -json tfplan | jq '.resource_changes[]
| select( .change.actions
| contains("create") or contains("update") )'
It gives me the error :
jq: error (at <stdin>:1): array (["no-op"]) and string ("create") cannot have their containment checked
My jq skills are not the best - can anyone update my jq to work or is there an alternative way to do this?
contains checks whether one array is contained in the other, recursively; substrings are matched too (note the trailing "d" in "created" vs "create"):
$ jq -n '["created"] | contains(["create"])'
true
You can use the SQL-style IN filter:
$ jq -n '"create" | IN("created", "foo")'
false
$ jq -n '"created" | IN("created", "bar")'
true
So for your concrete use case you would probably want something like the following:
terraform show -json tfplan | jq '
.resource_changes[]
| select(
.change.actions as $actions
| "create" | IN($actions[])
or "update" | IN($actions[]))'
Or using any/2:
terraform show -json tfplan | jq '
.resource_changes[]
| select(any(.change.actions[]; .=="create" or .=="update"))'
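To confirm the behavior on the ["no-op"] actions array from the error message:
$ jq -n '["no-op"] | any(.[]; . == "create" or . == "update")'
false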

Jq parse error: Invalid numeric literal at line 1, column 9

I am trying to get values from a json file from a url using curl, and then print specific keys with the jq command (e.g. company). Unfortunately when I use:
jq '.[] | .company' JB.json
I get the error:
parse error: Invalid numeric literal at line 1, column 9
I have checked the downloaded file with less and it looks exactly like in the url.
Some people suggested using the -R option, but it prints:
jq: error: Cannot iterate over string
The url of the file is: https://jobs.github.com/positions.json?description=python&location=new+york
If you retrieve the file with curl -sS, the result contains no '//'-style comments:
curl -Ss 'https://jobs.github.com/positions.json?description=python&location=new+york' | jq '.[] | .company'
"BentoBox"
"Aon Cyber Solutions"
"Sesame"
"New York University"
So presumably your JB.json contains the "//"-style comments. The simplest workaround would probably be to filter out those first two lines (e.g. using sed (or jq!)) first.
Here's a jq-only solution:
< JB.json jq -Rr 'select( test("^//")|not)' |
jq '.[] | .company'
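For completeness, a sed variant of the same workaround could look like this, assuming the comments are whole lines beginning with //:
sed '/^\/\//d' JB.json | jq '.[] | .company'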

Can I get the argmax in jq?

The following dataset contains movies. I want to find the longest title with jq. What I got so far:
$ wget https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json
$ cat movies.json | jq '[.[] | .title | length] | max'
So the longest title has 110 characters. The following query then shows it to me:
$ cat movies.json | jq '.[] | .title | select(length==110)'
"Cornell-Columbia-University of Pennsylvania Boat Race at Ithaca, N.Y., Showing Lehigh Valley Observation Train"
Is it possible to directly get the argmax?
Background
I am currently exploring what I can do with jq for exploratory data analysis. Usually, I would use Pandas for most of it. However, I recently had an example where jq was just super handy. So I want to learn more about it to see how far jq can go / where it is easier to use than Pandas.
Yes, you can use max_by like:
max_by(.title | length).title
or,
map(.title) | max_by(length)
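For example, run against movies.json, either variant should pick out the 110-character title found above:
$ jq 'max_by(.title | length).title' movies.json
"Cornell-Columbia-University of Pennsylvania Boat Race at Ithaca, N.Y., Showing Lehigh Valley Observation Train"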

How to parse a JSON file like this?

I'd like to parse this JSON file to get something like the following, with the 2nd column as the Canonical SMILES and the 3rd column as the Isomeric SMILES:
5317139<TAB><TAB>CCCC=C1C2=C(C3C(O3)CC2)C(=O)O1<TAB>CCC/C=C\\1/C2=C(C3C(O3)CC2)C(=O)O1
Could anybody show me how to do it in the best way in jq?
The following jq script (run with the -r command-line option) meets the stated requirements, assuming that the occurrence of <TAB><TAB> is a typo:
def getString($TOCHeading):
.. | objects | select( .TOCHeading == $TOCHeading)
| .Information[0].Value.StringWithMarkup[0].String;
.Record
| [.RecordNumber,
getString("Canonical SMILES"),
getString("Isomeric SMILES")]
| @tsv
This script produces:
5317139 CCCC=C1C2=C(C3C(O3)CC2)C(=O)O1 CCC/C=C\\1/C2=C(C3C(O3)CC2)C(=O)O1
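For reference, assuming the script is saved as smiles.jq and the downloaded record as record.json (both file names are illustrative), it could be invoked as:
jq -rf smiles.jq record.json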

linux command-line update csv file inline using value from another column that is json

I have a large csv file that contains several columns. One of the columns is a json string. I am trying to extract a specific value from the column that contains the json and add that value to the row as its own column.
I've tinkered around a little with sed and awk to try to do this, but really I'm just spinning my wheels.
I'm also trying to do this as an inline file edit. The csv is tab delimited.
The value I'm trying to put in its own column is the value for destinationIDUsage
Sample row (highly trimmed down for readability here):
2017-03-22 00:00:01 %key%94e901fd3ceef351a0ad770e0be91d38 10 3.0.0 [{"MC_LIVEREPEATER":false},{"environment":"details"},{"feature":"pushPublishUsage","destinationIDUsage":876543}] false
End result for the row should now have 876543 as a value in its own column as such:
2017-03-22 00:00:01 %key%94e901fd3ceef351a0ad770e0be91d38 10 3.0.0 [{"MC_LIVEREPEATER":false},{"environment":"details"},{"feature":"pushPublishUsage","destinationIDUsage":876543}] 876543 false
Any help is greatly appreciated.
Something like this seems to do the job.
$ echo "$a"
2017-03-22 00:00:01 %key%94e901fd3ceef351a0ad770e0be91d38 10 3.0.0 [{MC_LIVEREPEATER:false},{environment:details},{feature:pushPublishUsage,destinationIDUsage:876543}] false
$ echo "$a" |awk '{for (i=1;i<=NF;i++) {if ($i~/destinationIDU/) {match($i,/(.*)(destinationIDUsage:)(.*)(})/,f);extra=f[3]}}}{prev=NF;$(NF+1)=$prev;$(NF-1)=extra}1'
2017-03-22 00:00:01 %key%94e901fd3ceef351a0ad770e0be91d38 10 3.0.0 [{MC_LIVEREPEATER:false},{environment:details},{feature:pushPublishUsage,destinationIDUsage:876543}] 876543 false
It's possible, though, that the awk experts in here will propose something different and maybe better.
With GNU awk for the 3rd arg to match():
$ awk 'BEGIN{FS=OFS="\t"} {match($6,/"destinationIDUsage":([0-9]+)/,a); $NF=a[1] OFS $NF}1' file
2017-03-22 00:00:01 %key%94e901fd3ceef351a0ad770e0be91d38 10 3.0.0 [{"MC_LIVEREPEATER":false},{"environment":"details"},{"feature":"pushPublishUsage","destinationIDUsage":876543}] 876543 false
Add -i inplace for "inplace" editing or just do awk 'script' file > tmp && mv tmp file like you can with any UNIX tool.
Here is a solution using jq
If the file filter.jq contains
split("\n")[] # split string into lines
| select(length>0) # eliminate blanks
| split("\t") # split data rows by tabs
| (.[5] | fromjson | add) as $f # expand json
| .[:-1] + [$f.destinationIDUsage] + .[-1:] # add destinationIDUsage column
| @tsv # convert to tab-separated
and data contains the sample data, then the command
jq -M -R -s -r -f filter.jq data
will produce the output with the additional column
2017-03-22 00:00:01 %key%94e901fd3ceef351a0ad770e0be91d38 10 3.0.0 [{"MC_LIVEREPEATER":false},{"environment":"details"},{"feature":"pushPublishUsage","destinationIDUsage":876543}] 876543 false
To edit the file in place you can make use of a tool like sponge, as described in this answer:
Manipulate JSON with jq
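A minimal sketch of that approach, assuming sponge from moreutils is installed:
jq -M -R -s -r -f filter.jq data | sponge data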