I'm working on parsing JSON data using JSON.sh. And I wanted to read data from json file (test.json) whose content will be something like,
{
"/home/ukrishnan/projects/test.yml": {
"LOG_DRIVER": "syslog",
"IMAGE": "mysql:5.6"
},
"/home/ukrishnan/projects/mysql/app.xml": {
"ENV_ACCOUNT_BRIDGE_ENDPOINT": "/u01/src/test/sample.txt"
}
}
And I try to parse this JSON using JSON.sh by using,
test_parser=`sh ./lib/JSON.sh < test/test.json`
echo $test_parser
It prints,
["/home/ukrishnan/projects/test.yml","LOG_DRIVER"] "syslog" ["/home/ukrishnan/projects/test.yml","IMAGE"] "mysql:5.6" ["/home/ukrishnan/projects/test.yml"] {"LOG_DRIVER":"syslog","IMAGE":"mysql:5.6"} ["/home/ukrishnan/projects/mysql/app.xml","ENV_ACCOUNT_BRIDGE_ENDPOINT"] "/u01/src/test/sample.txt" ["/home/ukrishnan/projects/mysql/app.xml"] {"ENV_ACCOUNT_BRIDGE_ENDPOINT":"/u01/src/test/sample.txt"} [] {"/home/ukrishnan/projects/test.yml":{"LOG_DRIVER":"syslog","IMAGE":"mysql:5.6"},"/home/ukrishnan/projects/mysql/app.xml":{"ENV_ACCOUNT_BRIDGE_ENDPOINT":"/u01/src/test/sample.txt"}}
Whereas, the same command (sh ./lib/JSON.sh < test/test.json), if I run through terminal, it is printing with line breaks,
["/home/ukrishnan/projects/test.yml","LOG_DRIVER"] "syslog"
["/home/ukrishnan/projects/test.yml","IMAGE"] "mysql:5.6"
["/home/ukrishnan/projects/test.yml"] {"LOG_DRIVER":"syslog","IMAGE":"mysql:5.6"}
["/home/ukrishnan/projects/mysql/app.xml","ENV_ACCOUNT_BRIDGE_ENDPOINT"] "/u01/src/test/sample.txt"
["/home/ukrishnan/projects/mysql/app.xml"] {"ENV_ACCOUNT_BRIDGE_ENDPOINT":"/u01/src/test/sample.txt"}
[] {"/home/ukrishnan/projects/test.yml":{"LOG_DRIVER":"syslog","IMAGE":"mysql:5.6"},"/home/ukrishnan/projects/mysql/app.xml":{"ENV_ACCOUNT_BRIDGE_ENDPOINT":"/u01/src/test/sample.txt"}}
I wanted to read this and assign to bash variables like,
file_name='/home/ukrishnan/projects/test.yml'
key='LOG_DRIVER'
value='syslog'
As I'm almost completely new to shell script and grep or awk, I don't have much idea of how to achieve this. Any help on this would be greatly appreciated.
I wrote a JSON serializer / deserializer for gawk, if you're interested. Save that script and modify it, replacing everything above # === FUNCTIONS === with the following:
#!/usr/bin/gawk -f
# capture JSON string from beginning to end into a scalar variable
{ json = json ORS $0 }
END {
# objectify JSON string to the multilevel array "obj"
deserialize(json, obj)
for (filename in obj) {
print "file_name=" quote(filename)
for (key in obj[filename]) {
# print key="value"
print key "=" quote(obj[filename][key])
}
}
}
Do chmod 755 json.awk and execute it. Output will resemble this:
$ ./json.awk test5.json
file_name="/home/ukrishnan/projects/mysql/app.xml"
ENV_ACCOUNT_BRIDGE_ENDPOINT="/u01/src/test/sample.txt"
file_name="/home/ukrishnan/projects/test.yml"
LOG_DRIVER="syslog"
IMAGE="mysql:5.6"
Hopefully the logic is reasonably easy to follow. If you prefer to output filename=, key=, and value= on every loop iteration, modify the nested for loops accordingly:
for (filename in obj) {
for (key in obj[filename]) {
print "file_name=" quote(filename)
print "key=" quote(key)
print "value=" quote(obj[filename][key])
}
}
That change will result in the following output:
$ ./json.awk test5.json
file_name="/home/ukrishnan/projects/mysql/app.xml"
key="ENV_ACCOUNT_BRIDGE_ENDPOINT"
value="/u01/src/test/sample.txt"
file_name="/home/ukrishnan/projects/test.yml"
key="LOG_DRIVER"
value="syslog"
file_name="/home/ukrishnan/projects/test.yml"
key="IMAGE"
value="mysql:5.6"
Anyway, with that output, you can do something silly in BASH like this to populate and act upon the variables:
#!/bin/bash
./test.awk test5.json | while read -r line; do {
eval $line
[ "${line/=*/}" = "value" ] && {
echo "bash: file_name=$file_name"
echo "bash: key=$key"
echo "bash: value=$value"
echo "------"
}
}; done
It'd probably be more graceful just to do all processing within gawk from start to finish and not mess with the polyglot handoff, though.
Getting back to json.awk, if you prefer to keep json.awk modular for easy reuse in future projects, you could remove everything above # === FUNCTIONS ===, create a separate main.awk containing the code block at the top of this answer, and #include "json.awk" as a helper library pretty much anywhere outside of END {...} (just below the shbang, for example).
JSON.sh (from http://json.org) offers a nice bash friendly means of flattening out a JSON file. Which you've already provided how it looks in your question. So, the flatten form is the format:
[node] tab value
You have to think in UNIX script in extracting the information you want, you'll note the lines you're interested in actually follow this pattern:
["filename","key"] tab ["value"]
In regex notation, we replace:
filename with (.*)
key with (.*)
tab with \t
value with (.*)
We can retrieve the first, second and third matching groups with \1, \2, \3 respectively.
When used in sed we also note that these symbols []() need to be escaped with a backslash \, resulting in the following script:
./lib/JSON.sh < test/test.json | sed 's/\["\(.*\)","\(.*\)\"]\t"\(.*\)"/\1,\2,\3/;t;d'
/home/ukrishnan/projects/test.yml,LOG_DRIVER,syslog
/home/ukrishnan/projects/test.yml,IMAGE,mysql:5.6
/home/ukrishnan/projects/mysql/app.xml,ENV_ACCOUNT_BRIDGE_ENDPOINT,/u01/src/test/sample.txt
Now we put the lines in a loop and for each line, we can extract out filename,key,value:
for line in $(./lib/JSON.sh < test/test.json | sed 's/\["\(.*\)","\(.*\)\"]\t"\(.*\)"/\1,\2,\3/;t;d')
do
IFS="," read -ra arr <<< $line
filename=${arr[0]}
key=${arr[1]}
value=${arr[2]}
cat <<EOF
filename : $filename
key : $key
value : $value
EOF
done
Which outputs:
filename : /home/ukrishnan/projects/test.yml
key : LOG_DRIVER
value : syslog
filename : /home/ukrishnan/projects/test.yml
key : IMAGE
value : mysql:5.6
filename : /home/ukrishnan/projects/mysql/app.xml
key : ENV_ACCOUNT_BRIDGE_ENDPOINT
value : /u01/src/test/sample.txt
I've found some data that someone is downloading into a JSON file (I think! - I'm a newb!). The file contains data on nearly 600 football players.
Here's the file: https://raw.githubusercontent.com/llimllib/fantasypl_stats/f944410c21f90e7c5897cd60ecca5dc72b5ab619/data/players.1426687570.json
Is there a way I can grab some of the data and convert it to .csv? Specifically the 'Fixture History'?
Thanks in advance for any help :)
Here is a solution using jq
If the file filter.jq contains
.[]
| {first_name, second_name, all:.fixture_history.all[]}
| [.first_name, .second_name, .all[]]
| #csv
and data.json contains the sample data then the command
jq -M -r -f filter.jq data.json
will produce the output (note only 10 rows shown here)
"Wojciech","Szczesny","16 Aug 17:30",1,"CRY(H) 2-1",90,0,0,0,1,0,0,0,0,0,1,0,13,7,0,55,2
"Wojciech","Szczesny","23 Aug 17:30",2,"EVE(A) 2-2",90,0,0,0,2,0,0,0,0,0,0,0,5,9,-9306,55,1
"Wojciech","Szczesny","31 Aug 16:00",3,"LEI(A) 1-1",90,0,0,0,1,0,0,0,1,0,2,0,7,15,-20971,55,1
"Wojciech","Szczesny","13 Sep 12:45",4,"MCI(H) 2-2",90,0,0,0,2,0,0,0,0,0,6,0,12,17,-39686,55,3
"Wojciech","Szczesny","20 Sep 15:00",5,"AVL(A) 3-0",90,0,0,1,0,0,0,0,0,0,2,0,14,22,-15931,55,6
"Wojciech","Szczesny","27 Sep 17:30",6,"TOT(H) 1-1",90,0,0,0,1,0,0,0,0,0,4,0,10,13,-5389,55,3
"Wojciech","Szczesny","05 Oct 14:05",7,"CHE(A) 0-2",90,0,0,0,2,0,0,0,0,0,1,0,3,9,-8654,55,1
"Wojciech","Szczesny","18 Oct 15:00",8,"HUL(H) 2-2",90,0,0,0,2,0,0,0,0,0,2,0,7,9,-824,54,1
"Wojciech","Szczesny","25 Oct 15:00",9,"SUN(A) 2-0",90,0,0,1,0,0,0,0,0,0,3,0,16,22,-11582,54,7
JSON is a more detailed data format than CSV - it allows for more complex data structures. Inevitably if you do this, you 'lose detail'.
If you want to fetch it automatically - that's doable, but I've skipped it because 'doing' https URLs is slightly more complicated.
So assuming you've downloaded your file, here's a possible solution in Perl (You've already got one for Python - both are very powerful scripting languages, but can pretty much cover the same ground - so it's as much a matter of taste as to which you use).
#!/usr/bin/perl
use strict;
use warnings;
use JSON;
my $file = 'players.json';
open( my $input, "<", $file ) or die $!;
my $json_data = decode_json(
do { local $/; <$input> }
);
foreach my $player_id ( keys %{$json_data} ) {
foreach my $fixture (
#{ $json_data->{$player_id}->{fixture_history}->{all} } )
{
print join( ",",
$player_id, $json_data->{$player_id}->{web_name},
#{$fixture}, "\n", );
}
}
Hopefully you can see what's going on here - you load the file $input, and decode_json to create a data structure.
This data structure is a nested hash (perl's term for the type of data structure). hashes are key-value pairs.
So we extract the keys from this hash - which is the ID number right at the beginning of each entry.
Then we loop through each of them - extracting the the fixture_history array. And for each element in that array, we print the player ID, their web_name and then the data from fixture_history.
This gives output like:
1,Szczesny,10 Feb 19:45,25,LEI(H) 2-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-2413,52,0,
1,Szczesny,21 Feb 15:00,26,CRY(A) 2-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-2805,52,0,
1,Szczesny,01 Mar 14:05,27,EVE(H) 2-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1862,52,0,
1,Szczesny,04 Mar 19:45,28,QPR(A) 2-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1248,52,0,
1,Szczesny,14 Mar 15:00,29,WHU(H) 3-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1897,52,0,
Does this make sense?
Python has some good libraries for doing this. If you copy the following code into a file and save it as fix_hist.py or something, then save your JSON file as file.json in the same directory, it will create a csv file with the fixture histories each saved as a row. Just run python fix_hist.py in your command prompt (or terminal for mac):
import csv
import json
json_data = open("file.json")
data = json.load(json_data)
f = csv.writer(open("fix_hists.csv","wb+"))
for i in data:
fh = data[i]["fixture_history"]
array = fh["all"]
for j in array:
f.writerow(j)
json_data.close()
To add additional data to the fixture history, you can add insert statements before writing the rows:
import csv
import json
json_data = open("file.json")
data = json.load(json_data)
f = csv.writer(open("fix_hists.csv","wb+"))
arr = []
for i in data:
fh = data[i]["fixture_history"]
array = fh["all"]
for j in array:
try:
j.insert(0,str(data[i]["first_name"]))
except:
j.insert(0,'error')
try:
j.insert(1,data[i]["web_name"])
except:
j.insert(1,'error')
try:
f.writerow(j)
except:
f.writerow(['error','error'])
json_data.close()
With insert(), just indicate the position in the row you want the data point to occupy as the first argument.