transform json to key/value file in bash - json

I have a JSON which looks like this:
{
"lorem": "ipsum",
"dolor": "sid",
"data": {
"key1": "value1",
"key2": "value2"
}
}
and I want an output which is ini like where I only need the content of 'data' (which is always flat, no branches). The output should look like this:
key1=value1
key2=value2
I can use jq (just don't get it running) but have to use a bash script for it. Can anyone help?

jq solution:
jq -r '.data | to_entries[] | "\(.key)=\(.value)"' input.json
The output:
key1=value1
key2=value2

This will work in BASH.
#!/bin/bash
all_keys=$( cat input.txt );
while read key
do
grep "$key" ./text.txt | awk -F':' '{ print $1"="$2}' | tr -d '[", ]'
done <<< "$all_keys"
Assuming you values are in text.txt and that you have your keys in input.txt.
Regards!

In python
This reads stdin and outputs the desired "key=value" one per line
#!/usr/bin/python
import json
import sys
data = sys.stdin.read()
json_structure=json.loads(data)
start_point=json_structure["data"]
for k in start_point.keys():
print("%s=%s" % (k, start_point[k]))
If I was using python to unmangle the json input however, I would probably rewrite the bash script in python

Related

Create a json from given list of filenames in unix script

Hello I am trying to write unix script/command where I have to list out all filenames from given directory with filename format string-{number}.txt(eg: filename-1.txt,filename-2.txt) from which I have to form a json object. any pointers would be helpful.
[{
"filenumber": "1",
"name": "filename-1.txt"
},
{
"filenumber": "2",
"name": "filename-2.txt"
}
]
In the above json file-number should be read from {number} format of the each filename
A single call to jq should suffice :
shopt -s extglob
printf "%s\0" *-+([0-9]).txt | \
jq -sR 'split("\u0000") |
map({filenumber:capture(".*-(?<n>.*)\\.txt").n,
name:.})'
Very easy for the command-line tool xidel and its integrated EXPath File Module:
$ xidel -se '
array{
for $x in file:list(.,false(),"*.txt")
return {
"filenumber":extract($x,"(\d+)\.txt",1),
"name":$x
}
}
'
Intuitively, I'd say you can do this with jq. However, in practice I've rarely been able to achieve what I wanted with jq :-)
With some lunch break puzzling, I've come up with this beauty:
ls | jq -R '{filenumber:input_line_number, name:.}' | jq -s .
Instead of ls you could use any other command that produces a newline separated list of strings.
I have tried with multiple examples to achieve exact use case of mine and finally found this working fine exactly how I wanted Thanks
for file in $(ls *.txt); do file_version=$(echo $file | sed 's/\(^.*-\)\(.*\)\(.txt.*$\)/\2/'); jq -n --arg name "$file_version" --arg path "$file" '{name: $name, name: $path}'; done | jq -n '.urls |= [inputs]'

Using JQ to create a JSON file from a list of files

I want to create a JSON file from a list of files like:
$ ls -r *
folder1/
file1.pdf
file2.pdf
file3.pdf
folder2/
file4.pdf
file5.pdf
file6.pdf
I want a json file that looks like this:
{
'folder1' : [ 'file1.pdf', 'file2.pdf', 'file3.pdf' ],
'folder2' : [ 'file4.pdf', 'file5.pdf', 'file6.pdf' ]
}
For now I am able to create the list of files with jq but not sure how to name them. This is what I am doing now:
$ ls folder | jq -R -s 'split("\n") -[""]'
[
"file1.pdf",
"file2.pdf",
"file3.pdf"
]
Thanks a lot for the help!
PS. Additionally, I need to include a prefix in the name of files. I can try to do it with sed but if maybe there is an easier way to do it here, that would be even better.
You shouldn't parse the output of ls. Pass folder1/file1.pdf, folder1/file2.pdf, etc. to JQ as positional arguments instead, and parse them in there.
jq -n 'reduce ($ARGS.positional[] / "/") as [$k, $v] (.; .[$k] += ["prefix-\($v)"])' --args */*

How to Iterate over an array of objets using jq

I have a javascript file which prints a JSON array of objects:
// myfile.js output
[
{ "id": 1, "name": "blah blah", ... },
{ "id": 2, "name": "xxx", ... },
...
]
In my bash script, I want to iterate through each object.
I've tried following, but it doesn't work.
#!/bin/bash
output=$(myfile.js)
for row in $(echo ${output} | jq -c '.[]'); do
echo $row
done
You are trying to invoke myfile.js as a command. You need this:
output=$(cat myfile.js)
instead of this:
output=$(myfile.js)
But even then, your current approach isn't going to work well if the data has whitespace in it (which it does, based on the sample you posted). I suggest the following alternative:
jq -c '.[]' < myfile.js |
while read -r row
do
echo "$row"
done
Output:
{"id":1,"name":"blah blah"}
{"id":2,"name":"xxx"}
Edit:
If your data is arising from a previous process invocation, such as mongo in your case, you can pipe it directly to jq (to remain portable), like this:
mongo myfile.js |
jq -c '.[]' |
while read -r row
do
echo "$row"
done
How can I make jq -c '.[]' < (mongo myfile.js) work?
In a bash shell, you would write an expression along the following lines:
while read -r line ; do .... done < <(mongo myfile.js | jq -c .[])
Note that there are two occurrences of "<" in the above expression.
Also, the above assumes mongo is emitting valid JSON. If it emits //-style comments, those would have somehow to be removed.
Comparison with piping into while
If you use the idiom:
... | while read -r line ; do .... done
then the bindings of any variables in .... will be lost.

How can I transform one line JSON to pretty print in linux shell? [duplicate]

Is there a (Unix) shell script to format JSON in human-readable form?
Basically, I want it to transform the following:
{ "foo": "lorem", "bar": "ipsum" }
... into something like this:
{
"foo": "lorem",
"bar": "ipsum"
}
With Python 2.6+ you can do:
echo '{"foo": "lorem", "bar": "ipsum"}' | python -m json.tool
or, if the JSON is in a file, you can do:
python -m json.tool my_json.json
if the JSON is from an internet source such as an API, you can use
curl http://my_url/ | python -m json.tool
For convenience in all of these cases you can make an alias:
alias prettyjson='python -m json.tool'
For even more convenience with a bit more typing to get it ready:
prettyjson_s() {
echo "$1" | python -m json.tool
}
prettyjson_f() {
python -m json.tool "$1"
}
prettyjson_w() {
curl "$1" | python -m json.tool
}
for all the above cases. You can put this in .bashrc and it will be available every time in shell. Invoke it like prettyjson_s '{"foo": "lorem", "bar": "ipsum"}'.
Note that as #pnd pointed out in the comments below, in Python 3.5+ the JSON object is no longer sorted by default. To sort, add the --sort-keys flag to the end. I.e. ... | python -m json.tool --sort-keys.
You can use: jq
It's very simple to use and it works great! It can handle very large JSON structures, including streams. You can find
their tutorials here.
Usage examples:
$ jq --color-output . file1.json file1.json | less -R
$ command_with_json_output | jq .
$ jq # stdin/"interactive" mode, just enter some JSON
$ jq <<< '{ "foo": "lorem", "bar": "ipsum" }'
{
"bar": "ipsum",
"foo": "lorem"
}
Or use jq with identity filter:
$ jq '.foo' <<< '{ "foo": "lorem", "bar": "ipsum" }'
"lorem"
I use the "space" argument of JSON.stringify to pretty-print JSON in JavaScript.
Examples:
// Indent with 4 spaces
JSON.stringify({"foo":"lorem","bar":"ipsum"}, null, 4);
// Indent with tabs
JSON.stringify({"foo":"lorem","bar":"ipsum"}, null, '\t');
From the Unix command-line with Node.js, specifying JSON on the command line:
$ node -e "console.log(JSON.stringify(JSON.parse(process.argv[1]), null, '\t'));" \
'{"foo":"lorem","bar":"ipsum"}'
Returns:
{
"foo": "lorem",
"bar": "ipsum"
}
From the Unix command-line with Node.js, specifying a filename that contains JSON, and using an indent of four spaces:
$ node -e "console.log(JSON.stringify(JSON.parse(require('fs') \
.readFileSync(process.argv[1])), null, 4));" filename.json
Using a pipe:
echo '{"foo": "lorem", "bar": "ipsum"}' | node -e \
"\
s=process.openStdin();\
d=[];\
s.on('data',function(c){\
d.push(c);\
});\
s.on('end',function(){\
console.log(JSON.stringify(JSON.parse(d.join('')),null,2));\
});\
"
I wrote a tool that has one of the best "smart whitespace" formatters available. It produces more readable and less verbose output than most of the other options here.
underscore-cli
This is what "smart whitespace" looks like:
I may be a bit biased, but it's an awesome tool for printing and manipulating JSON data from the command-line. It's super-friendly to use and has extensive command-line help/documentation. It's a Swiss Army knife that I use for 1001 different small tasks that would be surprisingly annoying to do any other way.
Latest use-case: Chrome, Dev console, Network tab, export all as HAR file, "cat site.har | underscore select '.url' --outfmt text | grep mydomain"; now I have a chronologically ordered list of all URL fetches made during the loading of my company's site.
Pretty printing is easy:
underscore -i data.json print
Same thing:
cat data.json | underscore print
Same thing, more explicit:
cat data.json | underscore print --outfmt pretty
This tool is my current passion project, so if you have any feature requests, there is a good chance I'll address them.
I usually just do:
echo '{"test":1,"test2":2}' | python -mjson.tool
And to retrieve select data (in this case, "test"'s value):
echo '{"test":1,"test2":2}' | python -c 'import sys,json;data=json.loads(sys.stdin.read()); print data["test"]'
If the JSON data is in a file:
python -mjson.tool filename.json
If you want to do it all in one go with curl on the command line using an authentication token:
curl -X GET -H "Authorization: Token wef4fwef54te4t5teerdfgghrtgdg53" http://testsite/api/ | python -mjson.tool
If you use npm and Node.js, you can do npm install -g json and then pipe the command through json. Do json -h to get all the options. It can also pull out specific fields and colorize the output with -i.
curl -s http://search.twitter.com/search.json?q=node.js | json
Thanks to J.F. Sebastian's very helpful pointers, here's a slightly enhanced script I've come up with:
#!/usr/bin/python
"""
Convert JSON data to human-readable form.
Usage:
prettyJSON.py inputFile [outputFile]
"""
import sys
import simplejson as json
def main(args):
try:
if args[1] == '-':
inputFile = sys.stdin
else:
inputFile = open(args[1])
input = json.load(inputFile)
inputFile.close()
except IndexError:
usage()
return False
if len(args) < 3:
print json.dumps(input, sort_keys = False, indent = 4)
else:
outputFile = open(args[2], "w")
json.dump(input, outputFile, sort_keys = False, indent = 4)
outputFile.close()
return True
def usage():
print __doc__
if __name__ == "__main__":
sys.exit(not main(sys.argv))
It is not too simple with a native way with the jq tools.
For example:
cat xxx | jq .
a simple bash script for pretty json printing
json_pretty.sh
#/bin/bash
grep -Eo '"[^"]*" *(: *([0-9]*|"[^"]*")[^{}\["]*|,)?|[^"\]\[\}\{]*|\{|\},?|\[|\],?|[0-9 ]*,?' | awk '{if ($0 ~ /^[}\]]/ ) offset-=4; printf "%*c%s\n", offset, " ", $0; if ($0 ~ /^[{\[]/) offset+=4}'
Example:
cat file.json | json_pretty.sh
With Perl, use the CPAN module JSON::XS. It installs a command line tool json_xs.
Validate:
json_xs -t null < myfile.json
Prettify the JSON file src.json to pretty.json:
< src.json json_xs > pretty.json
If you don't have json_xs, try json_pp . "pp" is for "pure perl" – the tool is implemented in Perl only, without a binding to an external C library (which is what XS stands for, Perl's "Extension System").
On *nix, reading from stdin and writing to stdout works better:
#!/usr/bin/env python
"""
Convert JSON data to human-readable form.
(Reads from stdin and writes to stdout)
"""
import sys
try:
import simplejson as json
except:
import json
print json.dumps(json.loads(sys.stdin.read()), indent=4)
sys.exit(0)
Put this in a file (I named mine "prettyJSON" after AnC's answer) in your PATH and chmod +x it, and you're good to go.
That's how I do it:
curl yourUri | json_pp
It shortens the code and gets the job done.
The JSON Ruby Gem is bundled with a shell script to prettify JSON:
sudo gem install json
echo '{ "foo": "bar" }' | prettify_json.rb
Script download: gist.github.com/3738968
$ echo '{ "foo": "lorem", "bar": "ipsum" }' \
> | python -c'import fileinput, json;
> print(json.dumps(json.loads("".join(fileinput.input())),
> sort_keys=True, indent=4))'
{
"bar": "ipsum",
"foo": "lorem"
}
NOTE: It is not the way to do it.
The same in Perl:
$ cat json.txt \
> | perl -0007 -MJSON -nE'say to_json(from_json($_, {allow_nonref=>1}),
> {pretty=>1})'
{
"bar" : "ipsum",
"foo" : "lorem"
}
Note 2:
If you run
echo '{ "Düsseldorf": "lorem", "bar": "ipsum" }' \
| python -c'import fileinput, json;
print(json.dumps(json.loads("".join(fileinput.input())),
sort_keys=True, indent=4))'
the nicely readable word becomes \u encoded
{
"D\u00fcsseldorf": "lorem",
"bar": "ipsum"
}
If the remainder of your pipeline will gracefully handle unicode and you'd like your JSON to also be human-friendly, simply use ensure_ascii=False
echo '{ "Düsseldorf": "lorem", "bar": "ipsum" }' \
| python -c'import fileinput, json;
print json.dumps(json.loads("".join(fileinput.input())),
sort_keys=True, indent=4, ensure_ascii=False)'
and you'll get:
{
"Düsseldorf": "lorem",
"bar": "ipsum"
}
UPDATE I'm using jq now as suggested in another answer. It's extremely powerful at filtering JSON, but, at its most basic, also an awesome way to pretty print JSON for viewing.
jsonpp is a very nice command line JSON pretty printer.
From the README:
Pretty print web service responses like so:
curl -s -L http://<!---->t.co/tYTq5Pu | jsonpp
and make beautiful the files running around on your disk:
jsonpp data/long_malformed.json
If you're on Mac OS X, you can brew install jsonpp. If not, you can simply copy the binary to somewhere in your $PATH.
Try pjson. It has colors!
Install it with pip:
⚡ pip install pjson
And then pipe any JSON content to pjson.
Or, with Ruby:
echo '{ "foo": "lorem", "bar": "ipsum" }' | ruby -r json -e 'jj JSON.parse gets'
You can use this simple command to achieve the result:
echo "{ \"foo\": \"lorem\", \"bar\": \"ipsum\" }"|python -m json.tool
I use jshon to do exactly what you're describing. Just run:
echo $COMPACTED_JSON_TEXT | jshon
You can also pass arguments to transform the JSON data.
Check out Jazor. It's a simple command line JSON parser written in Ruby.
gem install jazor
jazor --help
JSONLint has an open-source implementation on GitHub that can be used on the command line or included in a Node.js project.
npm install jsonlint -g
and then
jsonlint -p myfile.json
or
curl -s "http://api.twitter.com/1/users/show/user.json" | jsonlint | less
Simply pipe the output to jq ..
Example:
twurl -H ads-api.twitter.com '.......' | jq .
You can simply use standard tools like jq or json_pp.
echo '{ "foo": "lorem", "bar": "ipsum" }' | json_pp
or
echo '{ "foo": "lorem", "bar": "ipsum" }' | jq
will both prettify output like the following (jq even more colorful):
{
"foo": "lorem",
"bar": "ipsum"
}
The huge advantage of jq is that it can do A LOT more if you'd like to parse and process the json.
With Perl, if you install JSON::PP from CPAN you'll get the json_pp command. Stealing the example from B Bycroft you get:
[pdurbin#beamish ~]$ echo '{"foo": "lorem", "bar": "ipsum"}' | json_pp
{
"bar" : "ipsum",
"foo" : "lorem"
}
It's worth mentioning that json_pp comes pre-installed with Ubuntu 12.04 (at least) and Debian in /usr/bin/json_pp
Pygmentize
I combine Python's json.tool with pygmentize:
echo '{"foo": "bar"}' | python -m json.tool | pygmentize -g
There are some alternatives to pygmentize which are listed in my this answer.
Here is a live demo:
You only need to use jq
If jq is not installed then you need to install jq first:
sudo apt-get update
sudo apt-get install jq
After installing jq then only need to use jq:
echo '{ "foo": "lorem", "bar": "ipsum" }' | jq
Output looks like
{
"foo": "lorem",
"bar": "ipsum"
}
I recommend using the json_xs command line utility which is included in the JSON::XS perl module. JSON::XS is a Perl module for serializing/deserializing JSON, on a Debian or Ubuntu machine you can install it like this:
sudo apt-get install libjson-xs-perl
It is obviously also available on CPAN.
To use it to format JSON obtained from a URL you can use curl or wget like this:
$ curl -s http://page.that.serves.json.com/json/ | json_xs
or this:
$ wget -q -O - http://page.that.serves.json.com/json/ | json_xs
and to format JSON contained in a file you can do this:
$ json_xs < file-full-of.json
To reformat as YAML, which some people consider to be more humanly-readable than JSON:
$ json_xs -t yaml < file-full-of.json
jj is super-fast, can handle ginormous JSON documents economically, does not mess with valid JSON numbers, and is easy to use, e.g.
jj -p # for reading from STDIN
or
jj -p -i input.json
It is (2018) still quite new so maybe it won’t handle invalid JSON the way you expect, but it is easy to install on major platforms.
bat is a cat clone with syntax highlighting:
Example:
echo '{"bignum":1e1000}' | bat -p -l json
-p will output without headers, and -l will explicitly specify the language.
It has colouring and formatting for JSON and does not have the problems noted in this comment: How can I pretty-print JSON in a shell script?
Install yajl-tools with the command below:
sudo apt-get install yajl-tools
then,
echo '{"foo": "lorem", "bar": "ipsum"}' | json_reformat

Split a JSON file into separate files

I have a large JSON file that is an object of objects, which I would like to split into separate files name after object keys. Is it possible to achieve this using jq or any other off-the-shelf tools?
The original JSON is in the following format
{ "item1": {...}, "item2": {...}, ...}
Given this input I would like to produce files item1.json, item2.json etc.
This should give you a start:
for f in `cat input.json | jq -r 'keys[]'` ; do
cat input.json | jq ".$f" > $f.json
done
or when you insist on more bashy syntax like some seem to prefer:
for f in $(jq -r 'keys[]') ; do
jq ".[\"$f\"]" < input.json > "$f.json"
done < input.json
Here's a solution that requires only one call to jq:
jq -cr 'keys[] as $k | "\($k)\n\(.[$k])"' input.json |
while read -r key ; do
read -r item
printf "%s\n" "$item" > "/tmp/$key.json"
done
It might be faster to pipe the output of the jq command to awk, e.g.:
jq -cr 'keys[] as $k | "\($k)\t\(.[$k])"' input.json |
awk -F\\t '{ print $2 > "/tmp/" $1 ".json" }'
Of course, these approaches will need to be modified if the key names contain characters that cannot be used in filenames.
Is it possible to achieve this using jq or any other off-the-shelf tools?
It is. xidel can also do this very efficiently.
Let's assume 'input.json' :
{
"item1": {
"a": 1
},
"item2": {
"b": 2
},
"item3": {
"c": 3
}
}
Inefficient Bash method:
for f in $(xidel -s input.json -e '$json()'); do
xidel -s input.json -e '$json("'$f'")' > $f.json
done
For every object key another instance of xidel is called to parse the object. Especially when you have a very large JSON this is pretty slow.
Efficient file:write() method:
xidel -s input.json -e '
$json() ! file:write(
.||".json",
$json(.),
{"method":"json"}
)
'
One xidel call creates 'item{1,2,3}.json'. Their content is a compact/minified object, like {"a": 1} for 'item1.json'.
xidel -s input.json -e '
for $x in $json() return
file:write(
concat($x,".json"),
$json($x),
{
"method":"json",
"indent":true()
}
)
'
One xidel call creates 'item{1,2,3}.json'. Their content is a prettified object (because of {"indent":true()}), like...
{
"a": 1
}
...for 'item1.json'. Different query (for-loop), same result.
This method is multitudes faster!