I have a json file that requires parsing.
Using a scripting tool such as sed, awk, or Perl, how can I extract value30 and substitute it for value6, prefixed by the string "XX" (i.e. XX + value30)?
Where:
field6 = fixed string
value6 = fixed string
value30 = varying string
[
{"field6" : "value6", "field30" : "value30" },
{ "field6" : "value6", "field30" : "value30" }
]
If I understand you correctly, this program should do what you're after:
use JSON qw(decode_json encode_json);
use strict;
use warnings;
# set the input line separator to undefined so the next read (<>) reads the entire file
undef $/;
# read the entire input (stdin or a file passed on the command line) and parse it as JSON
my $data = decode_json(<>);
my $from_field = "field6";
my $to_field = "field30";
for (@$data) {
$_->{$to_field} = $_->{$from_field};
}
print encode_json($data), "\n";
It relies on the JSON module being installed, which you can install via cpanm (which should be available in most modern Perl distributions):
cpanm JSON
If the program is in the file substitute.pl and your json array is in data.json, then you would run it as:
perl substitute.pl data.json
# or
cat data.json | perl substitute.pl
It should produce:
[{"field30":"value6","field6":"value6"},{"field30":"value6","field6":"value6"}]
Replacing field30's value with field6's.
Is this what you were attempting to do?
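If the goal was instead the literal one in the question (field6 receives "XX" plus field30's value), the same parse, modify, serialize pattern applies. Here is a minimal Python sketch, assuming the field names from the question; the function name prefix_from_field and the sample values abc and def are made up for illustration:

```python
import json


def prefix_from_field(records, src="field30", dst="field6", prefix="XX"):
    """Return new records where dst holds prefix + the value of src."""
    updated = []
    for rec in records:
        rec = dict(rec)  # copy so the input list is left untouched
        if src in rec:
            rec[dst] = prefix + str(rec[src])
        updated.append(rec)
    return updated


# hypothetical sample data in the shape shown in the question
sample = '[{"field6": "value6", "field30": "abc"}, {"field6": "value6", "field30": "def"}]'
result = prefix_from_field(json.loads(sample))
print(json.dumps(result))
```

Swapping the src and dst arguments and passing an empty prefix gives the same behaviour as the Perl program above.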
Some background
Versioning notebooks can become very inefficient if the output is expected to vary a lot. I solved this problem with my Jupyter notebooks using nbstripout, but so far I've found no alternative for Zeppelin notebooks.
Because nbstripout uses nbformat to parse ipynb files, it's not an easy patch to make it support Zeppelin. On the other hand, the goal is not that complex: simply empty out all the "msg": "...".
Goal
Given a JSON file, empty out all 'paragraphs.result.msg' fields.
Sample (schema):
{"paragraphs": [{"result": {"msg": "Very long output..."}}]}
In (1) and (2) below, I'll assume that the incoming JSON looks like this:
{
"paragraphs": [
{
"result": {
"msg": "msg1"
}
},
{
"result": {
"msg": "msg2"
}
}
]
}
1. To set the .result.msg values to ""
.paragraphs[].result.msg = ""
2. To remove the .result.msg fields altogether:
del(.paragraphs[].result.msg)
3. To remove "msg" fields in all objects, wherever they occur:
walk(if type == "object" then del(.msg) else . end)
(If your jq does not have walk, google: jq faq walk)
4. To remove "msg" fields wherever they occur in a .result object in a .paragraphs array:
walk(if type == "object" and (.paragraphs|type) == "array"
then del(.paragraphs[].result?.msg?) else . end)
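For readers without jq at hand, the "remove msg wherever it occurs" idea in (3) can be sketched in Python as a recursive walk. This is only an illustration of the technique, not a drop-in replacement for jq; the helper name strip_key and the extra "code" field in the sample are assumptions added for the demo:

```python
import json


def strip_key(node, key="msg"):
    """Recursively drop `key` from every dict found inside node."""
    if isinstance(node, dict):
        return {k: strip_key(v, key) for k, v in node.items() if k != key}
    if isinstance(node, list):
        return [strip_key(item, key) for item in node]
    return node  # scalars are returned unchanged


# sample shaped like the input above; "code" is a made-up extra field
note = json.loads(
    '{"paragraphs": [{"result": {"msg": "msg1", "code": "SUCCESS"}},'
    ' {"result": {"msg": "msg2"}}]}'
)
cleaned = strip_key(note)
print(json.dumps(cleaned))
```

The function builds a new structure instead of mutating the input, so the original data stays available for comparison.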
jq can do this:
jq '.paragraphs[].result.msg' file
http://stedolan.github.io/jq
Git Filter
The best solution (thanks to @steven-penny) is to run this:
git config filter.znbstripout.clean "jq '.paragraphs[].result.msg = \"\"'"
which will set up a filter called znbstripout that invokes the jq tool. Then, in your .gitattributes file you can just put:
*.json filter=znbstripout
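A git clean filter is simply a command that reads the file content on stdin and writes the cleaned content to stdout, so the same idea can be sketched in Python if jq is not available. The names clean and main are made up for this sketch, and unlike the jq expression it only blanks msg fields that already exist:

```python
import json
import sys


def clean(text):
    """Blank every existing paragraphs[].result.msg in a note's JSON text."""
    note = json.loads(text)
    for paragraph in note.get("paragraphs", []):
        result = paragraph.get("result")
        if isinstance(result, dict) and "msg" in result:
            result["msg"] = ""
    return json.dumps(note, indent=2, sort_keys=True) + "\n"


def main():
    # filter entry point: stdin in, stdout out
    sys.stdout.write(clean(sys.stdin.read()))


# small demonstration on an inline sample instead of stdin
demo = '{"paragraphs": [{"result": {"msg": "Very long output..."}}, {"text": "no result"}]}'
print(clean(demo), end="")
```

To use it as a filter, the script would call main() at the end and be named in the filter.znbstripout.clean setting in place of the jq command.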
Python Script (usable with Git Hooks)
The following can be used as a git hook:
#!/usr/bin/env python3
from glob import glob
import json

files = glob('**/note.json', recursive=True)
for file in files:
    with open(file, 'r') as fp:
        nb = json.load(fp)
    for p in nb['paragraphs']:
        if 'result' in p:
            p['result']['msg'] = ""
    with open(file, 'w') as fp:
        json.dump(nb, fp, sort_keys=True, indent=2)
I'm working on parsing JSON data using JSON.sh. I want to read data from a JSON file (test.json) whose content looks something like this:
{
"/home/ukrishnan/projects/test.yml": {
"LOG_DRIVER": "syslog",
"IMAGE": "mysql:5.6"
},
"/home/ukrishnan/projects/mysql/app.xml": {
"ENV_ACCOUNT_BRIDGE_ENDPOINT": "/u01/src/test/sample.txt"
}
}
And I try to parse this JSON using JSON.sh by using,
test_parser=`sh ./lib/JSON.sh < test/test.json`
echo $test_parser
It prints,
["/home/ukrishnan/projects/test.yml","LOG_DRIVER"] "syslog" ["/home/ukrishnan/projects/test.yml","IMAGE"] "mysql:5.6" ["/home/ukrishnan/projects/test.yml"] {"LOG_DRIVER":"syslog","IMAGE":"mysql:5.6"} ["/home/ukrishnan/projects/mysql/app.xml","ENV_ACCOUNT_BRIDGE_ENDPOINT"] "/u01/src/test/sample.txt" ["/home/ukrishnan/projects/mysql/app.xml"] {"ENV_ACCOUNT_BRIDGE_ENDPOINT":"/u01/src/test/sample.txt"} [] {"/home/ukrishnan/projects/test.yml":{"LOG_DRIVER":"syslog","IMAGE":"mysql:5.6"},"/home/ukrishnan/projects/mysql/app.xml":{"ENV_ACCOUNT_BRIDGE_ENDPOINT":"/u01/src/test/sample.txt"}}
Whereas, the same command (sh ./lib/JSON.sh < test/test.json), if I run through terminal, it is printing with line breaks,
["/home/ukrishnan/projects/test.yml","LOG_DRIVER"] "syslog"
["/home/ukrishnan/projects/test.yml","IMAGE"] "mysql:5.6"
["/home/ukrishnan/projects/test.yml"] {"LOG_DRIVER":"syslog","IMAGE":"mysql:5.6"}
["/home/ukrishnan/projects/mysql/app.xml","ENV_ACCOUNT_BRIDGE_ENDPOINT"] "/u01/src/test/sample.txt"
["/home/ukrishnan/projects/mysql/app.xml"] {"ENV_ACCOUNT_BRIDGE_ENDPOINT":"/u01/src/test/sample.txt"}
[] {"/home/ukrishnan/projects/test.yml":{"LOG_DRIVER":"syslog","IMAGE":"mysql:5.6"},"/home/ukrishnan/projects/mysql/app.xml":{"ENV_ACCOUNT_BRIDGE_ENDPOINT":"/u01/src/test/sample.txt"}}
I wanted to read this and assign to bash variables like,
file_name='/home/ukrishnan/projects/test.yml'
key='LOG_DRIVER'
value='syslog'
As I'm almost completely new to shell script and grep or awk, I don't have much idea of how to achieve this. Any help on this would be greatly appreciated.
I wrote a JSON serializer / deserializer for gawk, if you're interested. Save that script and modify it, replacing everything above # === FUNCTIONS === with the following:
#!/usr/bin/gawk -f
# capture JSON string from beginning to end into a scalar variable
{ json = json ORS $0 }
END {
# objectify JSON string to the multilevel array "obj"
deserialize(json, obj)
for (filename in obj) {
print "file_name=" quote(filename)
for (key in obj[filename]) {
# print key="value"
print key "=" quote(obj[filename][key])
}
}
}
Do chmod 755 json.awk and execute it. Output will resemble this:
$ ./json.awk test5.json
file_name="/home/ukrishnan/projects/mysql/app.xml"
ENV_ACCOUNT_BRIDGE_ENDPOINT="/u01/src/test/sample.txt"
file_name="/home/ukrishnan/projects/test.yml"
LOG_DRIVER="syslog"
IMAGE="mysql:5.6"
Hopefully the logic is reasonably easy to follow. If you prefer to output filename=, key=, and value= on every loop iteration, modify the nested for loops accordingly:
for (filename in obj) {
for (key in obj[filename]) {
print "file_name=" quote(filename)
print "key=" quote(key)
print "value=" quote(obj[filename][key])
}
}
That change will result in the following output:
$ ./json.awk test5.json
file_name="/home/ukrishnan/projects/mysql/app.xml"
key="ENV_ACCOUNT_BRIDGE_ENDPOINT"
value="/u01/src/test/sample.txt"
file_name="/home/ukrishnan/projects/test.yml"
key="LOG_DRIVER"
value="syslog"
file_name="/home/ukrishnan/projects/test.yml"
key="IMAGE"
value="mysql:5.6"
Anyway, with that output, you can do something silly in BASH like this to populate and act upon the variables:
#!/bin/bash
./json.awk test5.json | while read -r line; do {
eval "$line"
[ "${line/=*/}" = "value" ] && {
echo "bash: file_name=$file_name"
echo "bash: key=$key"
echo "bash: value=$value"
echo "------"
}
}; done
It'd probably be more graceful just to do all processing within gawk from start to finish and not mess with the polyglot handoff, though.
Getting back to json.awk, if you prefer to keep json.awk modular for easy reuse in future projects, you could remove everything above # === FUNCTIONS ===, create a separate main.awk containing the code block at the top of this answer, and @include "json.awk" as a helper library pretty much anywhere outside of END {...} (just below the shebang, for example).
JSON.sh (from http://json.org) offers a nice, bash-friendly way of flattening a JSON file; your question already shows what its output looks like. Each line of the flattened form has the format:
[node] tab value
To extract the information you want, think in terms of UNIX text processing. The lines you're interested in follow this pattern:
["filename","key"] tab "value"
In regex notation, we replace:
filename with (.*)
key with (.*)
tab with \t
value with (.*)
We can retrieve the first, second and third matching groups with \1, \2, \3 respectively.
When used in sed we also note that these symbols []() need to be escaped with a backslash \, resulting in the following script:
./lib/JSON.sh < test/test.json | sed 's/\["\(.*\)","\(.*\)\"]\t"\(.*\)"/\1,\2,\3/;t;d'
/home/ukrishnan/projects/test.yml,LOG_DRIVER,syslog
/home/ukrishnan/projects/test.yml,IMAGE,mysql:5.6
/home/ukrishnan/projects/mysql/app.xml,ENV_ACCOUNT_BRIDGE_ENDPOINT,/u01/src/test/sample.txt
Now we put the lines in a loop and for each line, we can extract out filename,key,value:
for line in $(./lib/JSON.sh < test/test.json | sed 's/\["\(.*\)","\(.*\)\"]\t"\(.*\)"/\1,\2,\3/;t;d')
do
IFS="," read -ra arr <<< $line
filename=${arr[0]}
key=${arr[1]}
value=${arr[2]}
cat <<EOF
filename : $filename
key : $key
value : $value
EOF
done
Which outputs:
filename : /home/ukrishnan/projects/test.yml
key : LOG_DRIVER
value : syslog
filename : /home/ukrishnan/projects/test.yml
key : IMAGE
value : mysql:5.6
filename : /home/ukrishnan/projects/mysql/app.xml
key : ENV_ACCOUNT_BRIDGE_ENDPOINT
value : /u01/src/test/sample.txt
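One caveat with joining the fields by commas is that a value which itself contains a comma would be split in the wrong place. Judging by the output shown in the question, both columns of each flattened line are themselves valid JSON text, so a parser can take them apart without a regex. A hedged Python sketch under that assumption (the function name leaf_triples is made up, and the sample is a shortened copy of the output above):

```python
import json


def leaf_triples(flattened):
    """Yield (filename, key, value) from 'path<TAB>value' lines."""
    for line in flattened.splitlines():
        path_text, sep, value_text = line.partition("\t")
        if not sep:
            continue  # not a path/value line
        path = json.loads(path_text)
        value = json.loads(value_text)
        # keep only two-level paths whose value is a plain string
        if len(path) == 2 and isinstance(value, str):
            yield path[0], path[1], value


# two lines in the flattened format, separated by a tab character
sample = (
    '["/home/ukrishnan/projects/test.yml","LOG_DRIVER"]\t"syslog"\n'
    '["/home/ukrishnan/projects/test.yml"]\t{"LOG_DRIVER":"syslog"}\n'
)
for filename, key, value in leaf_triples(sample):
    print(filename, key, value)
```

Lines for intermediate nodes (one-element paths) and for the root (empty path) are skipped because their values are objects, not strings.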
How do I decode a JSON file if a value spans more than one line?
a.json file:
{
"sv1" : {
"output" : "Hostname: abcd
asdkfasfjsl",
"exp_result" : "xyz"
}
}
When I try to read the above JSON file, I get the error "invalid character encountered while parsing JSON string, at character offset 50 (before "\n ...")"
code to read the above json file:
#!/volume/perl/bin/perl -w
use strict;
use warnings;
use JSON;
local $/;
open(AA,"<a.json") or die "can't open json file : $!\n";
my $json = <AA>;
my $data = decode_json($json);
print "reading output $data->{'sv1'}->{'output'}\n";
print "reading output $data->{'sv1'}->{'exp_result'}\n";
close AA;
Apart from whether the JSON is valid (see the comments on the question), check how much of the file is actually being read.
my $json = <AA>;
In scalar context this reads a single record: one line by default, or the whole file when $/ is undefined (as local $/; does in the question's code).
Use an array to get all lines:
my @json = <AA>;
my $json = join "", @json;
or even better: use File::Slurp::read_file to get the whole content of the file with one simple command.
use File::Slurp qw/read_file/;
my $json = read_file( "a.json" );
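Independently of how the file is read, the error message in the question points at the raw line break inside the "output" string. JSON does not allow unescaped control characters inside strings, so the line break has to be written as \n in the file. The following Python sketch illustrates the rule with Python's own json module (it is not the Perl module's behaviour); the variable names are made up for the demo:

```python
import json

# the "\n" below becomes a real newline character inside the JSON string value
raw = '{"sv1": {"output": "Hostname: abcd\nasdkfasfjsl", "exp_result": "xyz"}}'

try:
    json.loads(raw)  # default strict mode rejects raw control characters
    strict_ok = True
except json.JSONDecodeError as err:
    strict_ok = False
    print("strict parse failed:", err)

# strict=False tolerates control characters inside strings
lenient = json.loads(raw, strict=False)
print(lenient["sv1"]["output"])

# re-serializing writes the line break as the two characters \ and n
escaped = json.dumps(lenient)
print(escaped)
```

Writing the file the way json.dumps emits it (with the escaped \n) makes it parse in strict mode as well.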
I have created a JSON file which in this case contains:
{"ipaddr":"10.1.1.2","hostname":"host2","role":"http","status":"active"},
{"ipaddr":"10.1.1.3","hostname":"host3","role":"sql","status":"active"},
{"ipaddr":"10.1.1.4","hostname":"host4","role":"quad","status":"active"},
On other side I have a variable with values for example:
arr="10.1.1.2 10.1.1.3"
which comes from a subsequent check of the server status for example. For those values I want to change the status field to "inactive". In other words to grep the host and change its "status" value.
Expected output:
{"ipaddr":"10.1.1.2","hostname":"host2","role":"http","status":"inactive"},
{"ipaddr":"10.1.1.3","hostname":"host3","role":"sql","status":"inactive"},
{"ipaddr":"10.1.1.4","hostname":"host4","role":"quad","status":"active"},
$ arr="10.1.1.2 10.1.1.3"
$ awk -v arr="$arr" -F, 'BEGIN { gsub(/\./,"\\.",arr); gsub(/ /,"|",arr) }
$1 ~ "\"(" arr ")\"" { sub(/active/,"in&") } 1' file
{"ipaddr":"10.1.1.2","hostname":"host2","role":"http","status":"inactive"},
{"ipaddr":"10.1.1.3","hostname":"host3","role":"sql","status":"inactive"},
{"ipaddr":"10.1.1.4","hostname":"host4","role":"quad","status":"active"},
Here is a quick Perl "wrap-around one-liner" that uses the JSON module and slurps with the -0 switch:
perl -MJSON -n0E '$j = decode_json($_);
for (@{$j->{hosts}}){$_->{status}="inactive" if $_->{ipaddr}=~/2|3/} ;
say to_json( $j->{hosts}, {pretty=>1} )' status_data.json
might be nicer or might violate PBP recommendations for map:
perl -MJSON -n0E '$j = decode_json($_);
map { $_->{status}="inactive" if $_->{ipaddr}=~/2|3/ } @{ $j->{hosts} } ;
say to_json( $j->{hosts} )' status_data.json
A shell script that resets status using jq would also be possible. Here's a quick way to parse and output changes to JSON using jq:
cat status_data.json| jq -r '.hosts |.[] |
select(.ipaddr == "10.1.1.2"//.ipaddr == "10.1.1.3" )' |jq '.status = "inactive"'
EDIT In an earlier comment I was uncertain whether the OP was more interested in an application than a quick search and replace (something about the phrases "On other side..." and "check on the server status"). Here is a (still simple) perl approach in script form:
use v5.16; # enables strict and say (warnings must be enabled separately)
use JSON ;
use IO::All;
my $status_data < io 'status_data.json';
my $network = JSON->new->utf8->decode($status_data) ;
my @changed_hosts= qw/10.1.1.2 10.1.1.3/;
sub status_report {
foreach my $host ( @{ $network->{hosts} }) {
say "$host->{hostname} is $host->{status}";
}
}
sub change_status {
foreach my $host ( @{ $network->{hosts} }){
foreach (@changed_hosts) {
$host->{status} = "inactive" if $host->{ipaddr} eq $_ ;
}
}
status_report;
}
defined $ENV{CHANGE_HAPPENED} ? change_status : status_report ;
The script reads the JSON file status_data.json (using IO::All, which is great fun) and then decodes it with JSON into a hash. It is hard to tell whether this is a complete solution: if you are "monitoring" host status, the script should check the JSON data file periodically, compare it to the hash, and run the main body only when changes have occurred.
To simulate changes occurring, you can define or undefine CHANGE_HAPPENED in your environment with export CHANGE_HAPPENED=1 (or setenv in tcsh) and unset CHANGE_HAPPENED; the script will then either update the statuses in the hash or just "report". For this to be complete, the data in the hash should be updated to match the data file, either periodically or when an event occurs. The status_report() subroutine could be changed so that it builds arrays of @inactive_hosts and @active_hosts when update_status() tells it to do so: if ( something_happened() ) { update_status() }, etc.
Hope that helps.
status_data.json
{
"hosts":[
{"ipaddr":"10.1.1.2","hostname":"host2","role":"http","status":"active"},
{"ipaddr":"10.1.1.3","hostname":"host3","role":"sql","status":"active"},
{"ipaddr":"10.1.1.4","hostname":"host4","role":"quad","status":"active"}
]
}
output:
~/ % perl network_status_json.pl
host2 is active
host3 is active
host4 is active
~/ % export CHANGE_HAPPENED=1
~/ % perl network_status_json.pl
host2 is inactive
host3 is inactive
host4 is active
Version 1:
Using a simple regex-based transformation. This can be done in several ways. In the initial question, the list of ipaddr values is in the variable arr. Example using a Bash environment variable:
$ export arr="... ..."
Alternatively, this information could be provided as a command-line parameter.
#!/usr/bin/perl
my %inact; # ipaddr to inactivate
my $arr=$ENV{arr} ; # from external var (export arr=...)
## $arr=shift; # from command line arg
for( split(/\s+/, $arr)){ $inact{$_}=1 }
while(<>){ # one "json" line at a time
if(/"ipaddr":"(.*?)"/ and $inact{$1}){
s/"active"/"inactive"/}
print $_;
}
Version 2:
Using a JSON parser we can do more complex transformations. As the input is not real JSON, we process one line of "almost JSON" at a time:
use JSON;
use strict;
my ($line, %inact);
my $arr=$ENV{arr} ;
for( split(/\s+/, $arr)){ $inact{$_}=1 }
while(<>){ # one "json" line at a time
if(/^\{.*\},/){
s/,\n//;
$line = from_json( $_);
if($inact{$line->{ipaddr}}){
$line->{status} = "inactive" ;}
print to_json($line), ",\n"; }
else { print $_;}
}
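The same line-at-a-time idea (strip the trailing comma, parse the object, update it, write it back) can be sketched in Python. This is an illustration under the assumption that every record sits on its own line and ends with "},", as in the question; the function name deactivate is made up:

```python
import json


def deactivate(lines, inactive_ips):
    """Mark hosts inactive on each '{...},' line whose ipaddr is listed."""
    out = []
    for line in lines:
        stripped = line.rstrip()
        if stripped.startswith("{") and stripped.endswith("},"):
            host = json.loads(stripped[:-1])  # drop the trailing comma
            if host.get("ipaddr") in inactive_ips:
                host["status"] = "inactive"
            # compact separators keep the one-line layout of the input
            out.append(json.dumps(host, separators=(",", ":")) + ",")
        else:
            out.append(stripped)  # pass anything else through unchanged
    return out


arr = "10.1.1.2 10.1.1.3"
lines = [
    '{"ipaddr":"10.1.1.2","hostname":"host2","role":"http","status":"active"},',
    '{"ipaddr":"10.1.1.4","hostname":"host4","role":"quad","status":"active"},',
]
updated = deactivate(lines, set(arr.split()))
print("\n".join(updated))
```

Because the IP addresses are compared as whole strings after parsing, an address such as 10.1.1.20 is not matched by 10.1.1.2.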
#!/bin/ksh
# your "array" of IP
arr="10.1.1.2 10.1.1.3"
# create and prepare temporary file for sed action
SedAction=/tmp/Action.sed
# --- for/do generating SedAction --------
echo "#sed action" > ${SedAction}
#take each IP from the arr variable one by one
for IP in ${arr}
do
# escape the dots so the IP can be used in a search pattern
IP_RE="$( echo "${IP}" | sed 's/\./\\./g' )"
# generate sed action in temporary file.
# final action will be like:
# s/\("ipaddr":"10\.1\.1\.2".*\)"active"}/\1"inactive"}/;t
# double-escape \ for the generated file; single-escape " for this shell line
echo "s/\\\(\"ipaddr\":\"${IP_RE}\".*\\\)\"active\"}/\\\1\"inactive\"}/;t" >> ${SedAction}
done
# --- sed generating sed action ---------------
echo "${arr}" \
| tr " " "\n" \
| sed 's/\./\\./g
s#.*#s/\\("ipaddr":"&".*\\)"active"}/\\1"inactive"}/;t#
' \
> ${SedAction}
# core of the process (use -i for inline editing or "double" redirection for non GNU sed)
sed -f ${SedAction} YourFile
# clean temporary file
rm ${SedAction}
Self-commented; tested in ksh on AIX.
There are two ways to generate the SedAction file, depending on what else (if anything) you want it to do. You only need one of them; I prefer the second.
This is very simple indeed in Perl, using the JSON module.
use strict;
use warnings;
use JSON qw/ from_json to_json /;
my $json = JSON->new;
my $data = from_json(do { local $/; <DATA> });
my $arr = "10.1.1.2 10.1.1.3";
my %arr = map { $_ => 1 } split ' ', $arr;
for my $item (@$data) {
$item->{status} = 'inactive' if $arr{$item->{ipaddr}};
}
print to_json($data, { pretty => 1 }), "\n";
__DATA__
[
{"ipaddr":"10.1.1.2","hostname":"host2","role":"http","status":"active"},
{"ipaddr":"10.1.1.3","hostname":"host3","role":"sql","status":"active"},
{"ipaddr":"10.1.1.4","hostname":"host4","role":"quad","status":"active"}
]
output
[
{
"role" : "http",
"hostname" : "host2",
"status" : "inactive",
"ipaddr" : "10.1.1.2"
},
{
"hostname" : "host3",
"role" : "sql",
"ipaddr" : "10.1.1.3",
"status" : "inactive"
},
{
"ipaddr" : "10.1.1.4",
"status" : "active",
"hostname" : "host4",
"role" : "quad"
}
]
I am able to convert a hard-coded JSON string into Perl hashes. However, when I try to convert a complete JSON file into Perl data structures that can be processed later, I get the following error.
malformed JSON string, neither array, object, number, string or atom, at character offset 0 (before "(end of string)") at json_vellai.pl line 9
use JSON::PP;
$json= JSON::PP->new()
$json = $json->allow_singlequote([$enable]);
open (FH, "jsonsample.doc") or die "could not open the file\n";
#$fileContents = do { local $/;<FH>};
@fileContents = <FH>;
#print @fileContents;
$str = $json->allow_barekey->decode(@filecontents);
foreach $t (keys %$str)
{
print "\n $t -- $str->{$t}";
}
This is how my code looks. Please help me out.
It looks to me like decode doesn't want a list, it wants a scalar string.
You could slurp the file:
undef $/;
$fileContents = <FH>;
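The "character offset 0 (before "(end of string)")" part of the error suggests that the decoder was handed an empty string. The general pattern (read the whole file into one string, then decode that single string) looks like this in Python; this is a sketch for comparison, not the Perl fix itself, and the helper name load_json_file and the sample content are made up:

```python
import json
import os
import tempfile


def load_json_file(path):
    """Read the whole file as one string, then decode it in a single call."""
    with open(path, encoding="utf-8") as fh:
        return json.loads(fh.read())


# demo with a throwaway file; the content is invented for illustration
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"name": "sample",\n "count": 2}\n')
data = load_json_file(tmp.name)
os.unlink(tmp.name)
for key, value in data.items():
    print(key, "--", value)

# decoding an empty string fails at position 0, much like the error above
try:
    json.loads("")
    empty_error = None
except json.JSONDecodeError as err:
    empty_error = err
    print("empty input:", err)
```

If the decoder reports a failure at offset 0, printing the length of the string just before decoding is a quick way to confirm whether anything was read at all.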