I have a folder that contains subfolders of JSON files.
I need to write a bash script that combines all the JSON files into one big valid JSON file.
1) First I tried to use jq to combine all the JSON files within each directory, planning to combine the per-directory results into one big file later on.
I didn't manage to make it work. I used this command:
jq -rs 'reduce .[] as $item ({}; . * $item)'
2) The other option is to create a file that starts with "[" --> process all files from all directories, appending the content of each --> append "]" at the end.
Can I achieve the same result with the first approach, using jq only?
A very simple way is:
jq -s 'flatten' $target/*/*.json > $merged_json
An alternative (in case you need to use a pipe):
cat $target/*/*.json | jq -s 'flatten' > $merged_json
Or, if there are too many files for the shell to glob:
find "$target" -name '*.json' -exec cat {} + | jq -s 'flatten' > "$merged_json"
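If you do want the two-stage variant of option 1 (one merged object per subdirectory, then one big file), here is a minimal sketch, assuming a single level of subdirectories under $target and that the files within each directory should be deep-merged into one object:
for dir in "$target"/*/; do
    # deep-merge every object in this subdirectory; right-hand values win on conflicts
    jq -s 'reduce .[] as $item ({}; . * $item)' "$dir"*.json > "${dir%/}.merged.json"
done
# collect the per-directory objects into one array
jq -s '.' "$target"/*.merged.json > "$merged_json"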
I have a txt file with a curl response containing information on thousands of downloaded files and the year in which they were downloaded.
I have tried unsuccessfully (sed+grep) to extract the filename and the year and write them to a separate file ("filename+year.txt"), separated by a comma.
{"status_code":"200",
"status_message":"Results found.",
"results":[{"filename":"test189.pdf",
"year":"2012",
"URL":"https:\/\/www.orkistar.org\/random.php?q=iper.pdf&y=2012"
}
......
Any idea?
Use a JSON-aware tool, e.g. jq:
jq -r '.results[] as $r | $r.filename + "," + $r.year' < file.json
jq has a filter for converting to CSV. Using it ensures various edge cases are handled appropriately, assuming the goal is to generate valid CSV:
jq -r '.results[] | [.filename, .year] | @csv' file.json
In any case, notice that there is no need to introduce any named variables.
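For the sample record above, the first command prints plain comma-joined lines:
test189.pdf,2012
while the @csv variant quotes each field:
"test189.pdf","2012"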
I have a lot of rather large JSON logs which need to be imported into several DB tables.
I can easily parse them and create 1 CSV for import.
But how can I parse the JSON and get 2 different CSV files as output?
Simple (nonsense) example:
testJQ.log
{"id":1234,"type":"A","group":"games"}
{"id":5678,"type":"B","group":"cars"}
using
cat testJQ.log|jq --raw-output '[.id,.type,.group]|@csv'>testJQ.csv
I get one file testJQ.csv
1234,"A","games
5678,"B","cars"
But I would like to get this
types.csv
1234,"A"
5678,"B"
groups.csv
1234,"games"
5678,"cars"
Can this be done without having to parse the JSON twice, first time creating the types.csv and second time the groups.csv like this?
cat testJQ.log|jq --raw-output '[.id,.type]|@csv'>types.csv
cat testJQ.log|jq --raw-output '[.id,.group]|@csv'>groups.csv
I suppose one way you could hack this up is to output the contents of one file to stdout and the others to stderr and redirect to separate files. Of course you're limited to two files though.
$ <testJQ.log jq -r '([.id,.type]|@csv),([.id,.group]|@csv|stderr|empty)' \
1>types.csv 2>groups.csv
The stderr filter writes its input to stderr, but the value also propagates to the output, so you'll want to follow it with empty to swallow that up.
Personally I wouldn't recommend doing this, I would just write a python script (or other language) to parse this if you needed to output to multiple files.
You will either need to run jq twice, or to run jq in conjunction with another program to "split" the output of the call to jq. For example, you could use a pipeline of the form: jq -c ... | awk ...
The potential disadvantage of the pipeline approach is that if JSON is the final output, it will be JSONL; but obviously that doesn't apply here.
There are many ways to craft such a pipeline. For example, assuming there are no raw newlines in the CSV:
< testJQ.log jq -r '
"types", ([.id,.type] |#csv),
"groups", ([.id,.group]|#csv)' |
awk 'NR % 2 == 1 {out=$1; next} {print >> out".csv"}'
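For the two sample records, the jq stage emits interleaved label/data lines like this, and the awk step then routes each data line to the file named on the line before it:
types
1234,"A"
groups
1234,"games"
types
5678,"B"
groups
5678,"cars"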
Or:
< testJQ.log jq -r '([.id,.type],[.id,.group])|@csv' |
awk '{ out = ((NR % 2) == 1) ? "types" : "groups"; print >> out".csv"}'
For other examples, see e.g.
Using jq how can I split a very large JSON file into multiple files, each a specific quantity of objects?
Splitting / chunking JSON files with JQ in Bash or Fish shell?
Split JSON into multiple files
Handling raw newlines
Whether or not you split the CSV into multiple files, there is a potential issue with embedded raw newlines. One approach is to change "\n" in JSON strings to "\\n", e.g.
jq -r '([.id,.type],[.id,.group])
| map(if type == "string" then gsub("\n";"\\n") else . end)
| @csv'
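Putting the pieces together, here is a sketch of a single pass that escapes raw newlines and still writes types.csv and groups.csv, using the same labeling scheme as the second pipeline above:
< testJQ.log jq -r '
  ([.id,.type],[.id,.group])
  | map(if type == "string" then gsub("\n";"\\n") else . end)
  | @csv' |
awk '{ out = ((NR % 2) == 1) ? "types" : "groups"; print >> out".csv" }'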
I have a JSON file exported from MongoDB which looks like:
{"_id":"99919","city":"THORNE BAY"}
{"_id":"99921","city":"CRAIG"}
{"_id":"99922","city":"HYDABURG"}
{"_id":"99923","city":"HYDER"}
There are about 30000 lines; I want to split each line into its own .json file. (I'm trying to transfer my data onto a Couchbase cluster.)
I tried doing this:
cat cities.json | jq -c -M '.' | \
while read line; do echo $line > .chunks/cities_$(date +%s%N).json; done
but I found that it seems to drop loads of lines, and running this command only gave me 50-odd files when I was expecting 30000-odd!
Is there a logical way to do this without dropping any data, using whatever tool suits?
Assuming you don't care about the exact filenames, if you want to split input into multiple files, just use split.
jq -c . < cities.json | split -l 1 --additional-suffix=.json - .chunks/cities_
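A usage sketch (assuming GNU split, which provides --additional-suffix; the output directory has to exist before split runs):
mkdir -p .chunks
jq -c . < cities.json | split -l 1 --additional-suffix=.json - .chunks/cities_
ls .chunks | wc -l    # should match the line count of cities.json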
In general, to split any text file into separate per-line files using any awk on any UNIX system, simply:
awk '{close(f); f=".chunks/cities_"NR".json"; print > f}' cities.json
I am trying to create a menu using the select statement in bash. I am accessing an API which returns its output in JSON format. I will then process what the API returns into a select statement which a user can then interact with.
Here is the API call and how I parse the output:
curl -H "Authorization:Bearer $ACCESS_TOKEN" https://api.runscope.com/buckets \
| python -mjson.tool > output.json
This will send the output from the curl through python's json parsing tool and finally into the output.json file.
I then create an array using this json blob. I had to set IFS to \n in order to parse the file properly:
IFS=$'\n'
BUCKETS=("$(jq '.data | .[].name' output.json)")
I then add an exit option to the array so that users have a way to quit the selection menu:
BUCKETS+=("Exit")
Finally, I create the menu:
select BUCKET in $BUCKETS;
do
case $BUCKET in
"Exit")
echo "Exiting..."
break;;
esac
echo "You picked: $BUCKET"
done
Unfortunately, this does not create the exit option. I am able to see a menu consisting of every other option I want, except the exit option. Every option in the menu and in the array has quotes around them. How do I get the Exit option to show up?
$BUCKETS expands to only the first element of the BUCKETS array,
which is then word-split and used as your select entries.
This makes sense, since you wrapped the jq command substitution in double quotes, which prevented word-splitting from happening at the assignment (the IFS change has no effect on that quoted assignment; it only comes into play later, when the unquoted $BUCKETS is split on newlines in the select line).
If your entries can contain spaces and you want them assigned to an array properly, the way to do that is to read the output of jq with a while IFS= read -r entry; do loop.
BUCKETS=()
while IFS= read -r entry; do
BUCKETS+=("$entry")
done < <(jq '.data | .[].name' output.json)
Then appending your exit item to the array.
BUCKETS+=(Exit)
and then using
select BUCKET in "${BUCKETS[@]}"; do
(Either select a in l; do or select a in l followed by do on the next line works; there's no need for both the ; and the newline.)
That all being said, unless you need output.json for something else, you can avoid that file too.
BUCKETS=()
while IFS= read -r entry; do
BUCKETS+=("$entry")
done < <(curl -H "Authorization:Bearer $ACCESS_TOKEN" https://api.runscope.com/buckets | python -mjson.tool | jq '.data | .[].name')
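If you're on bash 4+, mapfile (a.k.a. readarray) is a more compact way to do the same read-into-array step; here's a sketch against the same pipeline (the python -mjson.tool pretty-printing step isn't needed when jq reads the response directly):
mapfile -t BUCKETS < <(curl -H "Authorization:Bearer $ACCESS_TOKEN" https://api.runscope.com/buckets | jq '.data | .[].name')
BUCKETS+=("Exit")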
Instead of
jq '.data | .[].name' output.json
try
jq -r '.data | .[].name' output.json
(-r : raw data without quotes)
And the most important part :
select BUCKET in "${BUCKETS[@]}"
^^^
ARRAY syntax
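Putting both fixes together, a minimal corrected sketch of the whole menu (assuming the same API call and response shape as above):
BUCKETS=()
while IFS= read -r entry; do
    BUCKETS+=("$entry")
done < <(curl -H "Authorization:Bearer $ACCESS_TOKEN" https://api.runscope.com/buckets | jq -r '.data | .[].name')
BUCKETS+=("Exit")

select BUCKET in "${BUCKETS[@]}"; do
    case $BUCKET in
        "Exit")
            echo "Exiting..."
            break;;
    esac
    echo "You picked: $BUCKET"
done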
I would like to read the JSON file from http://freifunk.in-kiel.de/alfred.json in bash and separate it into files named by the hostname of each element in that JSON string.
How do I read json with bash?
You can use jq for that. The first thing you have to do is extract the list of hostnames and save it to a bash array. Looping over that array, you would then run another query for each hostname to extract the matching element and save the data through redirection, with the filename based on the hostname as well.
The easiest way to do this is with two instances of jq -- one listing hostnames, and another (inside the loop) extracting individual entries.
This is, alas, a bit inefficient (since it means rereading the file from the top for each record to extract).
while read -r hostname; do
[[ $hostname = */* ]] && continue # paranoia; see comments
jq --arg hostname "$hostname" \
'.[] | select(.hostname == $hostname)' <alfred.json >"out-${hostname}.json"
done < <(jq -r '.[] | .hostname' <alfred.json)
(The out- prefix prevents alfred.json from being overwritten if it includes an entry for a host named alfred).
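If the per-hostname reruns are too slow, a single-pass alternative is to have jq emit one hostname-plus-record line per element and let awk route each record to its file; a sketch, assuming hostnames contain no raw tabs, newlines, or slashes:
jq -r '.[] | "\(.hostname)\t\(tojson)"' alfred.json |
awk -F'\t' '{ f = "out-" $1 ".json"; print $2 > f; close(f) }'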
You can use a python one-liner in a similar way, like this (I haven't checked):
curl -s http://freifunk.in-kiel.de/alfred.json | python -c '
import json, sys
tbl = json.load(sys.stdin)
for t in tbl:
    # one output file per element, named after its hostname
    with open(tbl[t]["hostname"], "w") as fp:
        json.dump(tbl[t], fp)
'