I currently have a JSON file that is around 15K lines. I want to extract some information, and I can do it individually using the commands below, producing 4 separate CSV files. I can't figure out whether I can reduce this to a single command so that all the data goes neatly into one CSV.
This is what I have so far.
jq -r '.ServiceEngine[]?.data_vnics[]?.vlan_interfaces[]?.vnic_networks[].ip.ip_addr.addr' config_AviCluster.json > config_AviCluster.IP_address.csv
jq -r '.ServiceEngine[]?.data_vnics[]?.vlan_interfaces[].if_name' config_AviCluster.json > config_AviCluster.Interface_Name.csv
jq -r '.ServiceEngine[]?.data_vnics[]?.vlan_interfaces[]?.vnic_networks[].ip.mask' config_AviCluster.json > config_AviCluster.IP_Mask.csv
jq -r '.ServiceEngine[]?.data_vnics[]?.vlan_interfaces[].vlan_id' config_AviCluster.json > config_AviCluster.vlan_ID.csv
That is 4 separate commands, and then I manually merge the CSVs into a single file. Could this be improved? If so, any advice and guidance would be great.
Kind regards
In theory, the following should create a new JSON structure with the combined information, but I'd need a sample dataset to test it accurately.
jq -r '.ServiceEngine[]?.data_vnics[]?.vlan_interfaces[]|{ip_addr: .vnic_networks[].ip.ip_addr.addr, netmask: .vnic_networks[].ip.mask, iface: .if_name, vlan: .vlan_id}' config_AviCluster.json
Output
{
  "ip_addr": "0.0.0.0",
  "netmask": "255.255.255.0",
  "iface": "eth0",
  "vlan": "1"
}
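If the goal is one CSV with a row per network, you can pull all four fields in a single pass. A sketch, assuming each vlan_interface may carry several vnic_networks and that the field paths from your commands are correct:

jq -r '(["if_name","vlan_id","ip_addr","mask"] | @csv),
       (.ServiceEngine[]?.data_vnics[]?.vlan_interfaces[]?
        | .if_name as $iface
        | .vlan_id as $vlan
        | .vnic_networks[]?
        | [$iface, $vlan, .ip.ip_addr.addr, .ip.mask]
        | @csv)' config_AviCluster.json > config_AviCluster.csv

Binding if_name and vlan_id to variables before descending into vnic_networks keeps each row's interface and VLAN aligned with its own IP and mask, which four independent extractions (or iterating .vnic_networks[] twice inside one object) cannot guarantee.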
I am using json2csv to convert multiple JSON files, each structured like
{
"address": "0xe9f6191596bca549e20431978ee09d3f8db959a9",
"copyright": "None",
"created_at": "None"
...
}
The problem is that I need to put multiple JSON files into one CSV file.
In my code I iterate through a file of hashes, call curl with each hash, and write the response to a JSON file. Then I use json2csv to convert each JSON file to CSV.
mkdir -p curl_outs
{ cat hashes.hash; echo; } | while read h; do
echo "Downloading $h"
curl -L https://main.net955305.contentfabric.io/s/main/q/$h/meta/public/nft > curl_outs/$h.json;
node index.js $h;
json2csv -i curl_outs/$h.json -o main.csv;
done
I use -o to write the CSV, but each iteration overwrites the previous data, so I end up with only one row.
I have also used >>, and this does append to the CSV file:
json2csv -i "curl_outs/${h}.json" >> main.csv
But for some reason it also appends the data's keys (the header row) to the end of the CSV file.
I've also tried
cat csv_outs/*.csv > main.csv
However I get the same output.
How do I append multiple json files to one main csv file?
It's not entirely clear from your description what's wrong with >>, but it looks like the CSV may lack a trailing line break, so appending the next file (>>) starts writing directly at the end of the previous file's last row, in its final cell.
I deal with CSVs almost daily and love the GoCSV tool. Its stack subcommand will do just what the name implies: stack multiple CSVs, one on top of the other.
In your case, you could download each JSON and convert it to an individual (intermediate) CSV. Then, at the end, stack all the intermediate CSVs, then delete all the intermediate CSVs.
mkdir -p curl_outs
{ cat hashes.hash; echo; } | while read h; do
echo "Downloading $h"
curl -L https://main.net955305.contentfabric.io/s/main/q/$h/meta/public/nft > curl_outs/$h.json;
node index.js $h;
json2csv -i curl_outs/$h.json -o curl_outs/$h.csv;
done
gocsv stack curl_outs/*.csv > main.csv;
# I suggested deleting the intermediate CSVs
# rm curl_outs/*.csv
# ...
I changed the last line of your loop to json2csv -i curl_outs/$h.json -o curl_outs/$h.csv; to create those intermediate CSVs I mentioned before. Now, gocsv's stack subcommand can take a list of those intermediate CSVs and give you main.csv.
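If you'd rather avoid installing another tool, plain awk can do the same stacking. A sketch, assuming every intermediate CSV has an identical one-line header and no fields with embedded newlines:

# FNR==1 marks each file's first line; NR!=1 means it is not the first line
# overall, so every header except the very first one is skipped
awk 'FNR == 1 && NR != 1 { next } { print }' curl_outs/*.csv > main.csv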
Issue with the Unix split command for splitting large data: split -l 1000 file.json myfile. I want to split this file into multiple files of 1000 records each, but I'm getting the output as a single file - no change.
P.S. The file was created by converting a Pandas DataFrame to JSON.
Edit: It turns out that my JSON is formatted in a way that it contains only one line; wc -l file.json returns 0.
Here is the sample: file.json
[
{"id":683156,"overall_rating":5.0,"hotel_id":220216,"hotel_name":"Beacon Hill Hotel","title":"\u201cgreat hotel, great location\u201d","text":"The rooms here are not palatial","author_id":"C0F"},
{"id":692745,"overall_rating":5.0,"hotel_id":113317,"hotel_name":"Casablanca Hotel Times Square","title":"\u201cabsolutely delightful\u201d","text":"I travelled from Spain...","author_id":"8C1"}
]
Invoking jq once per partition plus once to determine the number of partitions would be extremely inefficient. The following solution suffices to achieve the partitioning deemed acceptable in your answer:
jq -c ".[]" file.json | split -l 1000
If, however, it is deemed necessary for each file to be pretty-printed, you could run jq -s . for each file, which would still be more efficient than running .[N:N+S] multiple times.
If each partition should itself be a single JSON array, then see Splitting / chunking JSON files with JQ in Bash or Fish shell?
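For instance, a sketch of that post-processing step, assuming split's default output names (xaa, xab, ...) in the current directory:

jq -c '.[]' file.json | split -l 1000
# wrap each 1000-line chunk back into a pretty-printed JSON array
for f in x??; do
  jq -s . "$f" > "$f.json" && rm "$f"
done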
After asking elsewhere, it turned out the file was, in fact, a single line.
Reformatting with jq (in compact form) would enable the split, though processing the resulting files would at least require deleting the first and last characters (or adding '[' and ']' to the split files).
I'd recommend splitting the JSON array with jq (see the manual).
cat file.json | jq length             # get the length of the array
cat file.json | jq -c '.[0:1000]'     # first 1000 items
cat file.json | jq -c '.[1000:2000]'  # second 1000 items
...
Notice -c for compact result (not pretty printed).
For automation, you can code a simple bash script to split your file into chunks given the array length (jq length).
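A minimal sketch of such a script, assuming the top-level value is an array; the chunk size and output names are illustrative:

#!/usr/bin/env bash
len=$(jq length file.json)   # total number of array items
size=1000                    # items per chunk
for ((i = 0; i * size < len; i++)); do
  jq -c ".[$((i * size)):$(((i + 1) * size))]" file.json > "chunk_$i.json"
done

As noted above, this re-reads file.json once per chunk, so for very large inputs the jq -c '.[]' | split approach is faster.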
I have a folder that contains subfolders of json files inside.
I need to write a bash script that combine all json files into one big valid json.
1) First I tried to use jq to combine the JSON files within each directory, planning to combine those results into one big file afterwards.
I didn't manage to make it work. I used this command:
jq -rs 'reduce .[] as $item ({}; . * $item)'
2) The other option is to create a JSON file starting with "[" --> process all files from all directories, appending each file's content --> append "]" at the end.
Can I achieve the same result with the first way, using jq only?
A very simple way is:
jq -s 'flatten' $target/*/*.json > $merged_json
An alternative (in case you need to use a pipe):
cat $target/*/*.json | jq -s 'flatten' > $merged_json
or, if there are too many files for the shell to expand:
find $target -name '*.json' -exec cat {} + | jq -s 'flatten' > $merged_json
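To illustrate what -s plus flatten does, a hypothetical two-file example: if a.json contains [{"x":1}] and b.json contains [{"y":2}], slurping yields [[{"x":1}],[{"y":2}]], which flatten collapses into one array:

# hypothetical inputs: a.json = [{"x":1}], b.json = [{"y":2}]
jq -s 'flatten' a.json b.json
# => [{"x":1},{"y":2}]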
I have a lot of rather large JSON logs which need to be imported into several DB tables.
I can easily parse them and create 1 CSV for import.
But how can I parse the JSON and get 2 different CSV files as output?
Simple (nonsense) example:
testJQ.log
{"id":1234,"type":"A","group":"games"}
{"id":5678,"type":"B","group":"cars"}
using
cat testJQ.log | jq --raw-output '[.id,.type,.group]|@csv' > testJQ.csv
I get one file testJQ.csv
1234,"A","games
5678,"B","cars"
But I would like to get this
types.csv
1234,"A"
5678,"B"
groups.csv
1234,"games"
5678,"cars"
Can this be done without having to parse the JSON twice, first creating types.csv and then creating groups.csv, like this?
cat testJQ.log | jq --raw-output '[.id,.type]|@csv' > types.csv
cat testJQ.log | jq --raw-output '[.id,.group]|@csv' > groups.csv
I suppose one way you could hack this up is to output the contents of one file to stdout and the others to stderr and redirect to separate files. Of course you're limited to two files though.
$ <testJQ.log jq -r '([.id,.type]|@csv),([.id,.group]|@csv|stderr|empty)' \
    1>types.csv 2>groups.csv
stderr writes the value to stderr, but the value also propagates to the main output, so you'll want to follow it with empty to swallow the duplicate.
Personally I wouldn't recommend doing this, I would just write a python script (or other language) to parse this if you needed to output to multiple files.
You will either need to run jq twice, or to run jq in conjunction with another program to "split" the output of the call to jq. For example, you could use a pipeline of the form: jq -c ... | awk ...
The potential disadvantage of the pipeline approach is that if JSON is the final output, it will be JSONL; but obviously that doesn't apply here.
There are many ways to craft such a pipeline. For example, assuming there are no raw newlines in the CSV:
< testJQ.log jq -r '
  "types",  ([.id,.type] |@csv),
  "groups", ([.id,.group]|@csv)' |
awk 'NR % 2 == 1 { out = $1; next } { print >> (out ".csv") }'
Or:
< testJQ.log jq -r '([.id,.type],[.id,.group])|@csv' |
awk '{ out = (NR % 2 == 1) ? "types" : "groups"; print >> (out ".csv") }'
For other examples, see e.g.
Using jq how can I split a very large JSON file into multiple files, each a specific quantity of objects?
Splitting / chunking JSON files with JQ in Bash or Fish shell?
Split JSON into multiple files
Handling raw newlines
Whether or not you split the CSV into multiple files, there is a potential issue with embedded raw newlines. One approach is to change "\n" in JSON strings to "\\n", e.g.
jq -r '([.id,.type],[.id,.group])
  | map(if type == "string" then gsub("\n";"\\n") else . end)
  | @csv'
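Putting the newline escaping together with the second pipeline above, a complete sketch might read:

< testJQ.log jq -r '
  ([.id,.type],[.id,.group])
  | map(if type == "string" then gsub("\n";"\\n") else . end)
  | @csv' |
awk '{ out = (NR % 2 == 1) ? "types" : "groups"; print >> (out ".csv") }'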
I have a JSON file exported from MongoDB which looks like:
{"_id":"99919","city":"THORNE BAY"}
{"_id":"99921","city":"CRAIG"}
{"_id":"99922","city":"HYDABURG"}
{"_id":"99923","city":"HYDER"}
There are about 30000 lines, and I want to split each line into its own .json file. (I'm trying to transfer my data onto a Couchbase cluster.)
I tried doing this:
cat cities.json | jq -c -M '.' | \
while read line; do echo $line > .chunks/cities_$(date +%s%N).json; done
but I found that it seems to drop loads of lines - running this command gave me only 50-odd files when I was expecting 30000-odd!
Is there a logical way to make this not drop any data?
Assuming you don't care about the exact filenames, if you want to split input into multiple files, just use split.
jq -c . < cities.json | split -l 1 --additional-suffix=.json - .chunks/cities_
In general, to split any text file into separate files per line, using any awk on any UNIX system, is simply:
awk '{close(f); f=".chunks/cities_"NR".json"; print > f}' cities.json
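For reference, the original loop probably lost data for two reasons: read without -r mangles backslash escapes, and on platforms where date does not support %N the nanosecond field prints as a literal N, so every file written within the same second gets the same name and is overwritten. A sketch of a fixed loop using a plain counter:

mkdir -p .chunks
n=0
jq -c . < cities.json |
while IFS= read -r line; do             # -r keeps backslash escapes intact
  printf '%s\n' "$line" > ".chunks/cities_$n.json"
  n=$((n + 1))
done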