I have a JSON file with content like below:
[
{
"id":"54545-f919-4b0f-930c-0117d6e6c987",
"name":"Inventory_Groups",
"path":"/Groups",
"subGroups":[
{
"id":"343534-394b-429a-834e-f8774240d736",
"name":"UserGroup",
"path":"/Groups/UserGroup",
"subGroups":[
]
}
]
}
]
Now I want to grep the value of the id key from the subGroups section. How can I achieve this? If the id key were not duplicated, it could be done with:
grep -o '"id": "[^"]*' Group.json | grep -o '[^"]*$'
But in my case, how can I get the value of id when it appears twice?
A fair question to ask your employer is why you're in a position to use the shell but not to install the appropriate Linux packages. Compare:
awk -F '[":,]+' '$2=="subGroups" {f=1} f && $2=="id" {print $3; exit}' file
(Brittle solution, will fail if the structure of your JSON changes)
To:
jq '.[].subGroups[].id' file
Which can handle compact JSON in addition to numerous other realistic complications.
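If you want the bare value without the surrounding quotes (for example, to capture it in a shell variable), jq's -r (raw output) flag does that; a minimal sketch against the file shown above:
$ jq -r '.[].subGroups[].id' Group.json
343534-394b-429a-834e-f8774240d736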
Using just standard UNIX tools and assuming your sed can tolerate input without a terminating newline (otherwise we can swap out the tr for an awk command that keeps the last newline):
$ tr -d '\n' < file | sed 's/.*"subGroups":[^]}]*"id":"\([^"]*\)\".*/\1\n/'
343534-394b-429a-834e-f8774240d736
Alternatively with just a call to any awk:
$ awk '
{ rec = (NR>1 ? rec ORS : "") $0 }
END {
gsub(/.*"subGroups":[^]}]*"id":"|".*/,"",rec)
print rec
}
' file
343534-394b-429a-834e-f8774240d736
I am trying to increment a value in a CSV file when the first field matches a search string. Here is the script I used:
awk -i inplace -F',' '$1 == "FL" { print $1, $2+1} ' data.txt
Contents of data.txt:
NY,1
FL,5
CA,1
Current Output:
FL 6
Intended Output:
NY,1
FL,6
CA,1
Thanks.
$ awk 'BEGIN{FS=OFS=","} $1=="FL"{++$2} 1' data.txt
NY,1
FL,6
CA,1
Intended Output:
NY,1 FL,6 CA,1
I would harness GNU AWK for this task the following way. Let file.txt content be
NY,1
FL,5
CA,1
then
awk 'BEGIN{FS=OFS=",";ORS=" "}{print $1,$2+($1=="FL")}' file.txt
gives output
NY,1 FL,6 CA,1
Explanation: I inform GNU AWK that the field separator (FS) and output field separator (OFS) are , and the output record separator (ORS) is a space, in accordance with your requirements. Then for each line I print the 1st field followed by the 2nd field increased by the result of the comparison $1=="FL", which is 1 when it holds and 0 when it does not. If you want to know more about FS, OFS or ORS then read 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
(tested in gawk 4.2.1)
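If the boolean arithmetic is unfamiliar, a minimal sketch shows that an awk comparison simply evaluates to 1 when true and 0 when false:
$ awk 'BEGIN{print ("FL"=="FL"), ("NY"=="FL")}'
1 0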
Use this Perl one-liner:
perl -i -F',' -lane 'if ( $F[0] eq "FL" ) { $F[1]++; } print join ",", @F;' data.txt
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F',' : Split into @F on comma, rather than on whitespace.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak. If you want to skip writing a backup file, just use -i and skip the extension.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
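As a quick, hedged usage check, run it against a scratch copy so the in-place edit does not touch the real data.txt (scratch.csv is just an illustrative name):
$ cp data.txt scratch.csv
$ perl -i -F',' -lane 'if ( $F[0] eq "FL" ) { $F[1]++; } print join ",", @F;' scratch.csv
$ cat scratch.csv
NY,1
FL,6
CA,1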
I want to add a column to multiple (500) CSV files (same dimensionality). The column should act as an identifier for each individual file. I want to create a bash script using awk (I am a newbie in awk). The CSV files come with headers.
For eg.
Input File1.csv
#name,#age,#height
A,12,4.5
B,13,5.0
Input File2.csv
#name,#age,#height
C,11,4.6
D,12,4.3
I want to add a new column "#ID" to both files, where the value of ID is the same within an individual file but differs between files.
Expected Output
File1.csv
#name,#age,#height,#ID
A,12,4.5,1
B,13,5.0,1
Expected File2.csv
#name,#age,#height,#ID
C,11,4.6,2
D,12,4.3,2
Please suggest.
If you don't need to extract the id number from the filename, this should do.
$ c=1; for f in File*.csv;
do
sed -i '1s/$/,#ID/; 2,$s/$/,'$c'/' "$f";
c=$((c+1));
done
Note that this edits the files in place. Perhaps make a backup or test first.
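If you do want that backup, GNU sed accepts a suffix attached to -i, so a lightly adapted sketch of the same loop that leaves File1.csv.bak etc. behind would be:
$ c=1; for f in File*.csv; do sed -i.bak '1s/$/,#ID/; 2,$s/$/,'$c'/' "$f"; c=$((c+1)); done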
UPDATE
If you don't need the individual files to be updated, this may work better for you
$ awk -v OFS=, 'BEGIN {f="allFiles.csv"}
FNR==1 {c++; print $0,"#ID" > f; next}
{print $0,c > f}' File*.csv
awk -F, -v OFS=, '
FNR == 1 {
    $(NF + 1) = "#ID"
    i++
    f = FILENAME
    sub(/Input/, "Output", f)
} FNR != 1 {
    $(NF + 1) = i
} {
    print > f
}' Input*.csv
With GNU awk for inplace editing and ARGIND:
awk -i inplace -v OFS=, '{print $0, (FNR==1 ? "#ID" : ARGIND)}' File*.csv
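If only a non-GNU awk is available, a hedged workaround is to count the files yourself and write to temporary copies, then move them back over the originals:
awk -v OFS=, '
FNR == 1 { if (out) close(out); c++; out = FILENAME ".tmp" }   # new input file: next id, new output file
{ print $0, (FNR == 1 ? "#ID" : c) > out }                     # header gets #ID, data rows get the id
' File*.csv
for f in File*.csv; do mv "$f.tmp" "$f"; done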
I have a directory with 2.5 million small JSON files in it. It's 104gb on disk. They're multi-line files.
I would like to create a set of JSON arrays from the files so that I can import them using mongoimport in a reasonable amount of time. The files can be no bigger than 16mb, but I'd be happy even if I managed to get them in sets of ten.
So far, I can use this to do them one at a time at about 1000/minute:
for i in *.json; do mongoimport --writeConcern 0 --db mydb --collection all --quiet --file $i; done
I think I can use "jq" to do this, but I have no idea how to make the bash loop pass 10 files at a time to jq.
Note that using bash find results in an error as there are too many files.
With jq you can use --slurp to create arrays, and -c to make multiline json single line. However, I can't see how to combine the two into a single command.
Please help with both parts of the problem if possible.
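(In fact -s and -c do combine in one invocation: -s slurps every input into a single array and -c prints it on one line, so for a handful of files a hedged sketch is simply:
jq -s -c '.' file1.json file2.json > batch.json
The harder part, addressed below, is feeding 2.5 million filenames through in batches.)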
Here's one approach. To illustrate, I've used awk as it can read the list of files in small batches and because it has the ability to execute jq and mongoimport. You will probably need to make some adjustments to make the whole thing more robust, to test for errors, and so on.
The idea is either to generate a script that can be reviewed and then executed, or to use awk's system() command to execute the commands directly. First, let's generate the script:
ls *.json | awk -v group=10 -v tmpfile=json.tmp '
function out() {
print "jq -s . " files " > " tmpfile;
print "mongoimport --writeConcern 0 --db mydb --collection all --quiet --file " tmpfile;
print "rm " tmpfile;
files="";
}
BEGIN {n=1; files="";
print "test -r " tmpfile " && rm " tmpfile;
}
n % group == 0 {
out();
}
{ files = files " \""$0 "\"";
n++;
}
END { if (files) {out();}}
'
Once you've verified this works, you can either execute the generated script, or change the "print ..." lines to use "system(....)"
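For instance, a hedged sketch of what out() looks like once the prints are swapped for system() calls (same commands, just executed directly by awk):
function out() {
    system("jq -s . " files " > " tmpfile);
    system("mongoimport --writeConcern 0 --db mydb --collection all --quiet --file " tmpfile);
    system("rm " tmpfile);
    files="";
}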
Using jq to generate the script
Here's a jq-only approach for generating the script.
Since the number of files is very large, the following uses features that were only introduced in jq 1.5, so its memory usage is similar to the awk script above:
# assumed helper, not shown in the original: emits the same mongoimport
# command used by the awk-generated script above
def mongo(f): "mongoimport --writeConcern 0 --db mydb --collection all --quiet --file \(f)";
def read(n):
# state: [answer, hold]
foreach (inputs, null) as $i
([null, null];
if $i == null then .[0] = .[1]
elif .[1]|length == n then [.[1],[$i]]
else [null, .[1] + [$i]]
end;
.[0] | select(.) );
"test -r json.tmp && rm json.tmp",
(read($group|tonumber)
| map("\"\(.)\"")
| join(" ")
| ("jq -s . \(.) > json.tmp", mongo("json.tmp"), "rm json.tmp") )
Invocation:
ls *.json | jq -nRr --arg group 10 -f generate.jq
Here is what I came up with. It seems to work and is importing at roughly 80 a second into an external hard drive.
#!/bin/bash
files=(*.json)
for((I=0;I<${#files[*]};I+=500)); do jq -c '.' ${files[@]:I:500} | mongoimport --writeConcern 0 --numInsertionWorkers 16 --db mydb --collection all --quiet;echo $I; done
However, some are failing. I've imported 105k files but only 98547 appeared in the mongo collection. I think it's because some documents are > 16mb.
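If the 16 MB per-document limit is the suspect, one hedged way to flag the offenders before importing is to let GNU find list anything over that size (adjust the path as needed):
$ find . -maxdepth 1 -name '*.json' -size +16M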
I wrote a script that takes all the user data of an AWS EC2 instance and echoes it to local.json. All this happens when I install my node.js modules.
I don't know how to delete the last comma in the JSON file. Here is the bash script:
#!/bin/bash
export DATA_DIR=/data
export PATH=$PATH:/usr/local/bin
#install package from git repository
sudo -- sh -c "export PATH=$PATH:/usr/local/bin; export DATA_DIR=/data; npm install git+https://reader:secret@bitbucket.org/somebranch/$1.git#$2"
#update config files from instance user-data
InstanceConfig=`cat /instance-config`
echo '{' >> node_modules/$1/config/local.json
while read line
do
if [ ! -z "$line" -a "$line" != " " ]; then
Key=`echo $line | cut -f1 -d=`
Value=`echo $line | cut -f2 -d=`
if [ "$Key" = "Env" ]; then
Env="$Value"
fi
printf '"%s" : "%s",\n' "$Key" "$Value" >> node_modules/*/config/local.json
fi
done <<< "$InstanceConfig"
sed -i '$ s/.$//' node_modules/$1/config/local.json
echo '}' >> node_modules/$1/config/local.json
I run it like this: ./script
I get JSON output, but with a comma at the end of every line. Here is the local.json that I get:
{
"Env" : "dev",
"EngineUrl" : "engine.url.net",
}
All I am trying to do is delete the comma (",") on the last line of the JSON file.
I have tried many approaches that I found on the internet. I know the fix should go after the last "fi" (at the end of the loop), and that it should be something like this line:
sed -i "s?${Key} : ${Value},?${Key} : ${Value}?g" node_modules/$1/config/local.json
Or this:
sed '$ s/,$//' node_modules/$1/config/local.json
But they do not work for me.
Can someone who knows Bash scripting well help me with this?
Thanks!
If you know that it is the last comma that needs to be replaced, a reasonably robust way is to use GNU sed in "slurp" mode like this:
sed -zr 's/,([^,]*$)/\1/' local.json
Output:
{
"Env" : "dev",
"EngineUrl" : "engine.url.net"
}
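The -z flag makes GNU sed read NUL-delimited records, so the whole file arrives as a single pattern space and the regex can see past the newlines; a quick sanity check, assuming GNU sed:
$ printf '{\n"Env" : "dev",\n"EngineUrl" : "engine.url.net",\n}\n' | sed -zr 's/,([^,]*$)/\1/'
{
"Env" : "dev",
"EngineUrl" : "engine.url.net"
}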
If you'd just post some sample input/output it'd remove the guess-work but IF this is your input file:
$ cat file
Env=dev
EngineUrl=engine.url.net
Then IF you're trying to do what I think you are then all you need is:
$ cat tst.awk
BEGIN { FS="="; sep="{\n" }
{
printf "%s \"%s\" : \"%s\"", sep, $1, $2
sep = ",\n"
}
END { print "\n}" }
which you'd execute as:
$ awk -f tst.awk file
{
"Env" : "dev",
"EngineUrl" : "engine.url.net"
}
Or you can execute the awk script inline within a shell script if you prefer:
awk '
BEGIN { FS="="; sep="{\n" }
{
printf "%s \"%s\" : \"%s\"", sep, $1, $2
sep = ",\n"
}
END { print "\n}" }
' file
{
"Env" : "dev",
"EngineUrl" : "engine.url.net"
}
The above is far more robust, portable, efficient and better in every other way than the shell script you posted because it's using the right tool for the job. A UNIX shell is an environment from which to call tools with a language to sequence those calls. It is NOT a language to process text which is why it's so difficult to get it right. The UNIX tool for general text processing is awk so when you need to process text in UNIX, you just have shell call awk, that's all.
Here a jq version if it's available:
jq --raw-input 'split("=") | {(.[0]):.[1]}' /instance-config | jq --slurp 'add'
There might be a way to do it with one jq pass, but I couldn't see it.
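One hedged possibility for a single pass (untested sketch; needs jq 1.5+ for inputs):
jq -n --raw-input '[inputs | split("=") | {(.[0]): .[1]}] | add' /instance-config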
You can remove all trailing commas from invalid JSON with:
sed -i.bak ':begin;$!N;s/,\n}/\n}/g;tbegin;P;D' FILE
sed -i.bak = create a backup of the original file, then apply the changes in place
':begin;$!N;s/,\n}/\n}/g;tbegin;P;D' = wherever a , is followed by a newline and }, remove the , from the previous line
FILE = the file you want to change
If you're willing to use it, xidel is rather forgiving for trailing commas:
xidel -s local.json -e '$json'
{
"Env": "dev",
"EngineUrl": "engine.url.net"
}
xidel - -se '$json' <<< '{"Env":"dev","EngineUrl":"engine.url.net",}'
#or
xidel - -se 'parse-json($raw,{"liberal":true()})' <<< '{"Env":"dev","EngineUrl":"engine.url.net",}'
{
"Env": "dev",
"EngineUrl": "engine.url.net"
}
I am writing a bash script to use with badips.com
This command:
wget https://www.badips.com/get/key -qO -
Will return something like this:
{"err":"","suc":"new key 5f72253b673eb49fc64dd34439531b5cca05327f has been set.","key":"5f72253b673eb49fc64dd34439531b5cca05327f"}
Or like this:
{"err":"","suc":"Your Key was already present! To overwrite, see http:\/\/www.badips.com\/apidoc.","key":"5f72253b673eb49fc64dd34439531b5cca05327f"}
I need to parse the key value out (5f72253b673eb49fc64dd34439531b5cca05327f) into a variable in the script. I would prefer to use grep to do it but can't get it right.
Instead of parsing with some grep, you have the perfect tool for this: jq.
See:
jq '.key' file
or
.... your_commands .... | jq '.key'
will return
"5f72253b673eb49fc64dd34439531b5cca05327f"
See another example, this time to get the suc attribute:
$ cat a
{"err":"","suc":"new key 5f72253b673eb49fc64dd34439531b5cca05327f has been set.","key":"5f72253b673eb49fc64dd34439531b5cca05327f"}
{"err":"","suc":"Your Key was already present! To overwrite, see http:\/\/www.badips.com\/apidoc.","key":"5f72253b673eb49fc64dd34439531b5cca05327f"}
$ jq '.suc' a
"new key 5f72253b673eb49fc64dd34439531b5cca05327f has been set."
"Your Key was already present! To overwrite, see http://www.badips.com/apidoc."
You could try the below grep command,
grep -oP '"key":"\K[^"]*(?=")' file
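This needs a grep built with PCRE support (the -P flag); \K discards everything matched so far and the (?=") lookahead stops before the closing quote. A hedged check against a trimmed copy of the sample response:
$ echo '{"err":"","key":"5f72253b673eb49fc64dd34439531b5cca05327f"}' | grep -oP '"key":"\K[^"]*(?=")'
5f72253b673eb49fc64dd34439531b5cca05327f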
Using perl:
wget https://www.badips.com/get/key -qO - |
perl -MJSON -MFile::Slurp=slurp -le '
my $s = slurp "/dev/stdin";
my $d = JSON->new->decode($s);
print $d->{key}
'
Not as robust as the preceding one, but it doesn't require installing new modules; a stock perl can do it:
wget https://www.badips.com/get/key -qO - |
perl -lne 'print $& if /"key":"\K[[:xdigit:]]+/'
awk keeps it simple
wget ... - | awk -F: '{split($NF,k,"\"");print k[2]}'
the field separator is :;
the key is always in the last field, which awk accesses as $NF (NF being the Number of Fields);
the split function splits $NF and puts the pieces in array k, using the separator "\"", which is just a single double-quote character;
the second element of the k array is what you want.
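A hedged way to see it in action is to pipe a trimmed copy of the sample response through the same awk:
$ echo '{"err":"","suc":"new key has been set.","key":"5f72253b673eb49fc64dd34439531b5cca05327f"}' | awk -F: '{split($NF,k,"\""); print k[2]}'
5f72253b673eb49fc64dd34439531b5cca05327f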