Rewriting a JSON file into a CSV efficiently in Bash [duplicate] - json

This question already has answers here:
Use jq to Convert json File to csv
(1 answer)
Converting json map to csv using jq
(3 answers)
Closed 4 years ago.
I want to efficiently rewrite a large JSON file, whose objects always have the same field names, into a CSV, ignoring its keys.
To give a concrete example, here is a large JSON file (tempST.json):
https://gist.githubusercontent.com/pedro-roberto/b81672a89368bc8674dae21af3173e68/raw/e4afc62b9aa3092c8722cdbc4b4b4b6d5bbc1b4b/tempST.json
If I rewrite just the fields time, ancestorcount and descendantcount from this JSON into a CSV, I should get:
1535995526,1,1
1535974524,1,1
1535974528,1,2
...
1535997274,1,1
The following script tempSpeedTest.sh writes the values of the fields time, ancestorcount and descendantcount into each line of the CSV:
rm tempOutput.csv
jq -c '.[]' < tempST.json | while read line; do
  descendantcount=$(echo $line | jq '.descendantcount')
  ancestorcount=$(echo $line | jq '.ancestorcount')
  time=$(echo $line | jq '.time')
  echo "${time},${ancestorcount},${descendantcount}" >> tempOutput.csv
done
However, the script takes around 3 minutes to run, which is unsatisfactory:
>time bash tempSpeedTest.sh
real 2m50.254s
user 2m43.128s
sys 0m34.811s
What is a faster way to achieve the same result?

jq -r '.[] | [.time, .ancestorcount, .descendantcount] | @csv' < tempST.json > tempOutput.csv
See this running at https://jqplay.org/s/QJz5FCmuc9
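If you also want a header row, a minimal variation on the same idea (assuming the tempST.json layout above) emits the column names first and then one CSV line per object:
jq -r '["time","ancestorcount","descendantcount"],
       (.[] | [.time, .ancestorcount, .descendantcount]) | @csv' \
  < tempST.json > tempOutput.csv
Note that @csv quotes string values, so the header names come out double-quoted, while the numeric fields stay unquoted as in the expected rows.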

Related

How to merge json objects into single array in bash [duplicate]

This question already has an answer here:
"Argument list too long" while slurping JSON files [duplicate]
(1 answer)
Closed 1 year ago.
There are more than 6k JSON files, each containing exactly one JSON object. I want to build a single list (array) of objects from these JSONs.
When I run the jq command below, I get an error.
Kedar.Javalkar@KD2806 MINGW64 /c/zz
$ jq -s '.' inventoryItem_*.json > inventory_items_result_$(date +"%Y%m%d_%H%M%S").json
bash: /usr/bin/jq: Argument list too long
I tried ulimit -s unlimited, but I get the same error.
I am using Git Bash on Windows 10.
This is exactly the job xargs was created for: splitting a list of items into individual command lines that stay within the permitted limit.
Because running jq -s once over everything is not the same as concatenating the results of several smaller jq -s runs, the right move is to use xargs to split up the cat invocation instead, in the manner described in the linked duplicate:
printf '%s\0' inventoryItem_*.json \
| xargs -0 cat \
| jq -s . \
>"inventory_items_result_$(date +"%Y%m%d_%H%M%S").json"

"Argument list too long" while slurping JSON files [duplicate]

This question already has answers here:
Argument list too long error for rm, cp, mv commands
(31 answers)
Closed 1 year ago.
I have thousands of JSON files, and I want to merge them into a single one. I'm using the command below to do this.
jq -s . -- *.json > result.json
But I am getting an "Argument list too long" error, probably because of the number of files I'm trying to merge. Is there any workaround for this issue?
Shell built-in commands are not subject to that limit, and printf is one of them. Combined with xargs and cat, it lets you feed the whole file list through without ever building an oversized argument list:
printf '%s\0' *.json | xargs -0 cat -- | jq -s .
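The same can be done with find instead of a shell glob; this sketch also excludes the output file so that a rerun does not slurp its own result (file names assumed as in the question):
find . -maxdepth 1 -name '*.json' ! -name 'result.json' -print0 \
  | xargs -0 cat -- \
  | jq -s . > result.json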

How to create 2 CSV files from 1 JSON using JQ

I have a lot of rather large JSON logs which need to be imported into several DB tables.
I can easily parse them and create 1 CSV for import.
But how can I parse the JSON and get 2 different CSV files as output?
Simple (nonsense) example:
testJQ.log
{"id":1234,"type":"A","group":"games"}
{"id":5678,"type":"B","group":"cars"}
using
cat testJQ.log | jq --raw-output '[.id,.type,.group]|@csv' > testJQ.csv
I get one file testJQ.csv
1234,"A","games
5678,"B","cars"
But I would like to get this
types.csv
1234,"A"
5678,"B"
groups.csv
1234,"games"
5678,"cars"
Can this be done without having to parse the JSON twice, first time creating the types.csv and second time the groups.csv like this?
cat testJQ.log | jq --raw-output '[.id,.type]|@csv' > types.csv
cat testJQ.log | jq --raw-output '[.id,.group]|@csv' > groups.csv
I suppose one way you could hack this up is to output the contents of one file to stdout and the other to stderr, then redirect each to a separate file. Of course you're limited to two files that way.
$ <testJQ.log jq -r '([.id,.type]|@csv),([.id,.group]|@csv|stderr|empty)' \
    1>types.csv 2>groups.csv
stderr writes its input to stderr, but the value still propagates to the output, so you'll want to follow it with empty to suppress the duplicate.
Personally I wouldn't recommend doing this, I would just write a python script (or other language) to parse this if you needed to output to multiple files.
You will either need to run jq twice, or to run jq in conjunction with another program to "split" the output of the call to jq. For example, you could use a pipeline of the form: jq -c ... | awk ...
The potential disadvantage of the pipeline approach is that if JSON is the final output, it will be JSONL; but obviously that doesn't apply here.
There are many ways to craft such a pipeline. For example, assuming there are no raw newlines in the CSV:
< testJQ.log jq -r '
    "types",  ([.id,.type] |@csv),
    "groups", ([.id,.group]|@csv)' |
  awk 'NR % 2 == 1 { out = $1; next } { print >> (out ".csv") }'
Or:
< testJQ.log jq -r '([.id,.type],[.id,.group])|@csv' |
  awk '{ out = (NR % 2 == 1) ? "types" : "groups"; print >> (out ".csv") }'
For other examples, see e.g.
Using jq how can I split a very large JSON file into multiple files, each a specific quantity of objects?
Splitting / chunking JSON files with JQ in Bash or Fish shell?
Split JSON into multiple files
Handling raw newlines
Whether or not you split the CSV into multiple files, there is a potential issue with embedded raw newlines. One approach is to change "\n" in JSON strings to "\\n", e.g.
jq -r '([.id,.type],[.id,.group])
  | map(if type == "string" then gsub("\n";"\\n") else . end)
  | @csv'
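Putting the two together, a sketch of the full split with newline escaping (same testJQ.log layout assumed; note that the awk program appends with >>, so stale output is removed first):
rm -f types.csv groups.csv
< testJQ.log jq -r '
    ([.id,.type],[.id,.group])
    | map(if type == "string" then gsub("\n";"\\n") else . end)
    | @csv' |
  awk '{ out = (NR % 2 == 1) ? "types" : "groups"; print >> (out ".csv") }'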

Extract data from JSON file using bash [duplicate]

This question already has answers here:
Read JSON data in a shell script [duplicate]
(4 answers)
Closed 7 years ago.
Let's say that we have this kind of JSON file:
{
  ...
  "quotes": {
    "SOMETHING": 10,
    ...
    "SOMETHING_ELSE": 120.4,
    ...
  }
}
How can I obtain those values and use them in order to add them together?
Am I even able to do this?
#!/bin/bash
#code ...
echo "$SOMETHING + $SOMETHING_ELSE" | bc
#code ...
#exit
I will obtain the JSON file with the wget command. All I want is the content from this file.
Can you help me, please? I am a beginner in shell programming.
I usually use jq, a really fast JSON parser, to do this kind of thing (parsing a JSON file with tools like awk or sed is really error-prone).
Given an input file like this:
# file: input.json
{
  "quotes": {
    "SOMETHING": 10,
    "SOMETHING_ELSE": 120.4
  }
}
You can obtain the sum of the 2 fields with a simple filter:
jq '.quotes.SOMETHING + .quotes.SOMETHING_ELSE' input.json
# output -> 130.4
NOTE: jq is available in every major Linux distribution. On a Debian-derived system you can install it with sudo apt-get install jq.
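If you specifically want the values in shell variables so you can hand them to bc, as in the question, a minimal sketch (assuming the same input.json) is:
#!/bin/bash
# capture each value in a shell variable, then add them with bc
something=$(jq '.quotes.SOMETHING' input.json)
something_else=$(jq '.quotes.SOMETHING_ELSE' input.json)
echo "$something + $something_else" | bc   # -> 130.4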
This will print out the sum of the selected lines' floats.
#!/bin/bash
awk '{ if ($1 ~ /"SOMETHING":/) {print}; if ($1 ~ /"SOMETHING_ELSE":/) {print} }' $1 | cut -d: -f2 | cut -d, -f1 | awk '{s+=$1};END{print s}'
This finds the lines you want, then plucks out the numbers and adds them.
You should look up and learn jq, as shown in Read JSON data in a shell script.
The tools in a "normal" shell installation like awk and sed all predate JSON by decades and are a very, very bad fit for it. jq is worth the time to learn.
Or use Python instead.

bash/shell jq parse to variable from aws json [duplicate]

This question already has answers here:
filtering data using parameters
(2 answers)
Closed 7 years ago.
I am trying to parse the JSON result from an AWS command, but I get an error or null when I use $ip; when I use a specific IP it works. Something is wrong when I use the variable inside the jq command.
#!/bin/bash
aws ec2 describe-addresses --region eu-west-1 > 1.txt
ipList=( "52.16.121.238" "52.17.250.188" )
for ip in "${ipList[@]}"; do
    echo $ip
    cat 1.txt | jq '.Addresses | .[] | select (.PublicIp==$ip) | .InstanceId'
    #echo $result
done
Please advise.
You're using single quotes around your jq program, so the shell variable is never interpolated. Even if it were, you would still need string quoting around the interpolated value for jq to treat it as a string literal. Because interpolating shell variables into jq programs is hard and error-prone, jq provides a command-line option for exactly this purpose, --arg, which lifts shell variables into jq variables. Your jq invocation would therefore look like this:
jq --arg ip "$ip" '.Addresses[] | select(.PublicIp == $ip) | .InstanceId'
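A sketch of the complete loop with --arg, keeping the same 1.txt and ipList from the question:
#!/bin/bash
aws ec2 describe-addresses --region eu-west-1 > 1.txt
ipList=( "52.16.121.238" "52.17.250.188" )
for ip in "${ipList[@]}"; do
    echo "$ip"
    # --arg ip "$ip" exposes the shell value inside the jq program as the string $ip
    jq --arg ip "$ip" '.Addresses[] | select(.PublicIp == $ip) | .InstanceId' 1.txt
done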
Thanks for your help. The right format is:
cat 1.txt | jq ".Addresses | .[] | select(.PublicIp==\"$ip\") | .InstanceId"