Bourne shell function return variable always empty

The following Bourne shell script, given a path, is supposed to test each component of the path for existence; then set a variable comprising only those components that actually exist.
#! /bin/sh
set -x # for debugging
test_path() {
    path=""
    echo $1 | tr ':' '\012' | while read component
    do
        if [ -d "$component" ]
        then
            if [ -z "$path" ]
            then path="$component"
            else path="$path:$component"
            fi
        fi
    done
    echo "$path" # this prints nothing
}
paths=/usr/share/man:\
/usr/X11R6/man:\
/usr/local/man
MANPATH=`test_path $paths`
echo $MANPATH
When run, it always prints nothing. The trace using set -x is:
+ paths=/usr/share/man:/usr/X11R6/man:/usr/local/man
++ test_path /usr/share/man:/usr/X11R6/man:/usr/local/man
++ path=
++ echo /usr/share/man:/usr/X11R6/man:/usr/local/man
++ tr : '\012'
++ read component
++ '[' -d /usr/share/man ']'
++ '[' -z '' ']'
++ path=/usr/share/man
++ read component
++ '[' -d /usr/X11R6/man ']'
++ read component
++ '[' -d /usr/local/man ']'
++ '[' -z /usr/share/man ']'
++ path=/usr/share/man:/usr/local/man
++ read component
++ echo ''
+ MANPATH=
+ echo
Why is the final echo "$path" empty? The $path variable was clearly being built up on each iteration inside the while loop.

The pipe runs all commands involved in sub-shells, including the entire while ... loop. Therefore, all changes to variables in that loop are confined to the sub-shell and invisible to the parent shell script.
One way to work around that is putting the while ... loop and the echo into a list that executes entirely in the sub-shell, so that the modified variable $path is visible to echo:
test_path()
{
    echo "$1" | tr ':' '\n' | {
        while read component
        do
            if [ -d "$component" ]
            then
                if [ -z "$path" ]
                then
                    path="$component"
                else
                    path="$path:$component"
                fi
            fi
        done
        echo "$path"
    }
}
However, I suggest using something like this:
test_path()
{
    echo "$1" | tr ':' '\n' |
    while read dir
    do
        [ -d "$dir" ] && printf "%s:" "$dir"
    done |
    sed 's/:$/\n/'
}
... but that's a matter of taste.
Edit: As others have said, the behaviour you are observing depends on the shell. The POSIX standard describes pipelined commands as run in sub-shells, but that is not a requirement:
Additionally, each command of a multi-command pipeline is in a subshell environment; as an extension, however, any or all commands in a pipeline may be executed in the current environment.
Bash runs them in sub-shells, but some shells run the last command in the context of the main script, so that only the preceding commands in the pipeline run in sub-shells.
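For instance, a quick way to see which behaviour your shell uses is this classic test (a minimal sketch; run it in the shell you are curious about):
# Prints "original" in shells that run every pipeline element in a subshell
# (bash, dash), and "anything" in shells that run the last element in the
# current shell (ksh93, zsh).
var=original
echo anything | read var
echo "$var"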

This should work in a Bourne shell that understands functions (and would work in Bash and other shells too):
test_path() {
    echo $1 | tr ':' '\012' |
    {
        path=""
        while read component
        do
            if [ -d "$component" ]
            then
                if [ -z "$path" ]
                then path="$component"
                else path="$path:$component"
                fi
            fi
        done
        echo "$path" # now prints the accumulated path
    }
}
The inner set of braces groups the commands into a unit, so path is still set only in the subshell, but it is echoed from that same subshell.

Why is the final echo $path empty?
Until recently, Bash would give all components of a pipeline their own process, separate from the shell process in which the pipeline is run.
Separate process == separate address space, and no variable sharing.
In ksh93 and in recent Bash (where the lastpipe option may need to be enabled with shopt), the shell runs the last component of a pipeline in the calling shell, so any variables changed inside the loop are preserved when the loop exits.
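For example, with Bash 4.2 or later the relevant option is lastpipe (it only takes effect when job control is off, which is the default in scripts); a minimal sketch of the function using it:
#!/bin/bash
# Sketch assuming bash >= 4.2; lastpipe keeps the last pipeline element
# (the while loop) in the current shell, so $path survives the loop.
shopt -s lastpipe
test_path() {
    path=""
    echo "$1" | tr ':' '\012' | while read component
    do
        [ -d "$component" ] && path="${path:+$path:}$component"
    done
    echo "$path"
}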
Another way to accomplish what you want is to make sure that the echo $path is in the same process as the loop, using parentheses:
#! /bin/sh
set -x # for debugging
test_path() {
    path=""
    echo $1 | tr ':' '\012' | (
        while read component
        do
            [ -d "$component" ] || continue
            path="${path:+$path:}$component"
        done
        echo "$path"
    )
}
Note: I simplified the inner if. There was no else, so the test can be replaced with a short-circuit (|| continue). Also, the two path assignments can be combined into one using the ${var:+...} parameter substitution trick.
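To illustrate how the ${var:+...} expansion behaves, here is a throwaway example you can paste into any POSIX shell:
path=""
path="${path:+$path:}/usr/share/man"    # path was empty, so no leading ":"
path="${path:+$path:}/usr/local/man"    # path was set, so ":" is inserted
echo "$path"                            # prints /usr/share/man:/usr/local/man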

Your script works just fine with no change under Solaris 11, and probably also under most commercial Unixes like AIX and HP-UX, because on these OSes the underlying implementation of /bin/sh is provided by ksh. This would also be the case if /bin/sh were backed by zsh.
It likely doesn't work for you because your /bin/sh is implemented by one of bash, dash, mksh or busybox sh, which all run each component of a pipeline in a subshell, while ksh and zsh both keep the last element of a pipeline in the current shell, saving an unnecessary fork.
It is possible to "fix" your script so that it works when sh is provided by bash by adding this line somewhere before the pipeline:
shopt -s lastpipe
or better, if you want to keep portability:
command -v shopt > /dev/null && shopt -s lastpipe
This will keep the script working for ksh and zsh, but it still won't work for dash, mksh or the original Bourne shell.
Note that both bash and ksh behaviors are allowed by the POSIX standard.

Related

How can I write a batch file using jq to find json files with a certain attribute and copy them to a new location

I have 100,000s of JSON files that I need to split out based on whether or not they contain a certain value for an attribute, and then I need to convert them into valid JSON that can be read in by another platform.
I'm using a batch file to do this and I've managed to convert them into valid json using the following:
for /r %%f in (*.json*) do jq -s -c "." "%%f" >> "C:\Users\me\my-folder\%%~nxf.json"
I just can't figure out how to only copy the files that contain a certain value. So logic should be:
Look at all the files in the folders and subfolders
If the file contains an attribute "event" with a value of "abcd123"
then: convert the file into valid json and persist it with the same filename over to location "C:\Users\me\my-folder\"
else: ignore it
Example of files it should select:
{"name":"bob","event":"abcd123"}
and
{"name":"ann","event":"abcd123"},{"name":"bob","event":"8745LLL"}
Example of files it should NOT select:
{"name":"ann","event":"778PPP"}
and
{"name":"ann","event":"778PPP"},{"name":"bob","event":"8745LLL"}
Would love help to figure out the filtering part.
Since there are probably more file names than will fit on the command line, this response will assume a shell loop through the file names will be necessary, as the question itself envisions. Since I'm currently working with a bash shell, I'll present a bash solution, which hopefully can readily be translated to other shells.
The complication in the question is that the input file might contain one or more valid JSON values, or one or more comma-separated JSON values.
The key to a simple solution using jq is jq's -e command-line option, since this sets the return code to 0 if and only if
(a) the program ran normally; and (b) the last result was a truthy value.
For clarity, let's encapsulate the relevant selection criterion in two bash functions:
# If the input is a valid stream of JSON objects
function try {
    jq -e -n 'any( inputs | objects; select( .event == "abcd123") | true)' 2> /dev/null > /dev/null
}

# If the input is a valid JSON array whose elements are to be checked
function try_array {
    jq -e 'any( .[] | objects; select( .event == "abcd123") | true)' 2> /dev/null > /dev/null
}
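For a quick sanity check of these functions against the sample documents from the question (hypothetical inline data, not part of the original answer):
printf '%s\n' '{"name":"bob","event":"abcd123"}' | try && echo "stream matches"
printf '%s\n' '[{"name":"ann","event":"778PPP"}]' | try_array || echo "array does not match"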
Now a comprehensive solution can be constructed along the following lines:
find . -maxdepth 1 -type f -name '*.json' | while read -r f
do
    < "$f" try
    rc=$?                      # capture the exit code before the next test overwrites $?
    if [ $rc = 0 ] ; then
        echo copy "$f"
    elif [ $rc = 5 ] ; then
        (echo '['; cat "$f"; echo ']') | try_array
        if [ $? = 0 ] ; then
            echo copy "$f"
        fi
    fi
done
Have you considered using findstr?
%SystemRoot%\System32\findstr.exe /SRM "\"event\":\"abcd123\"" "C:\Users\me\my-folder\*.json"
Please open a Command Prompt window, type findstr /?, press the ENTER key, and read its usage information. (You may want to consider the /I option too, for instance).
You could then use that within another for loop to propagate those files into a variable for your copy command.
batch-file example:
@For /F "EOL=? Delims=" %%G In (
    '%SystemRoot%\System32\findstr.exe /SRM "\"event\":\"abcd123\"" "C:\Users\me\my-folder\*.json"'
) Do @Copy /Y "%%G" "S:\omewhere Else"
cmd example:
For /F "EOL=? Delims=" %G In ('%SystemRoot%\System32\findstr.exe /SRM "\"event\":\"abcd123\"" "C:\Users\me\my-folder\*.json"') Do @Copy /Y "%G" "S:\omewhere Else"

Bash loop to merge files in batches for mongoimport

I have a directory with 2.5 million small JSON files in it. It's 104gb on disk. They're multi-line files.
I would like to create a set of JSON arrays from the files so that I can import them using mongoimport in a reasonable amount of time. The files can be no bigger than 16mb, but I'd be happy even if I managed to get them in sets of ten.
So far, I can use this to do them one at a time at about 1000/minute:
for i in *.json; do mongoimport --writeConcern 0 --db mydb --collection all --quiet --file $i; done
I think I can use "jq" to do this, but I have no idea how to make the bash loop pass 10 files at a time to jq.
Note that using bash find results in an error as there are too many files.
With jq you can use --slurp to create arrays, and -c to make multiline json single line. However, I can't see how to combine the two into a single command.
Please help with both parts of the problem if possible.
Here's one approach. To illustrate, I've used awk as it can read the list of files in small batches and because it has the ability to execute jq and mongoimport. You will probably need to make some adjustments to make the whole thing more robust, to test for errors, and so on.
The idea is either to generate a script that can be reviewed and then executed, or to use awk's system() command to execute the commands directly. First, let's generate the script:
ls *.json | awk -v group=10 -v tmpfile=json.tmp '
  function out() {
    print "jq -s . " files " > " tmpfile;
    print "mongoimport --writeConcern 0 --db mydb --collection all --quiet --file " tmpfile;
    print "rm " tmpfile;
    files="";
  }
  BEGIN {
    n=1; files="";
    print "test -r " tmpfile " && rm " tmpfile;
  }
  n % group == 0 { out(); }
  { files = files " \"" $0 "\"";
    n++;
  }
  END { if (files) { out(); } }
'
Once you've verified this works, you can either execute the generated script, or change the "print ..." lines to use "system(....)"
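For example, assuming you save the awk program above as batch.awk (a file name chosen here purely for illustration), one way to review and then run the generated commands would be:
ls *.json | awk -v group=10 -v tmpfile=json.tmp -f batch.awk > import.sh
less import.sh        # review the generated commands first
sh -x import.sh       # -x echoes each command as it runs, handy for spotting failures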
Using jq to generate the script
Here's a jq-only approach for generating the script.
Since the number of files is very large, the following uses features that were only introduced in jq 1.5, so its memory usage is similar to the awk script above:
# Helper to build the mongoimport command line (mirrors the awk script above):
def mongo(f):
  "mongoimport --writeConcern 0 --db mydb --collection all --quiet --file \(f)";

# Emit the input lines in batches of n:
def read(n):
  # state: [answer, hold]
  foreach (inputs, null) as $i
    ([null, null];
     if $i == null then .[0] = .[1]
     elif .[1]|length == n then [.[1], [$i]]
     else [null, .[1] + [$i]]
     end;
     .[0] | select(.) );

"test -r json.tmp && rm json.tmp",
(read($group|tonumber)
 | map("\"\(.)\"")
 | join(" ")
 | ("jq -s . \(.) > json.tmp", mongo("json.tmp"), "rm json.tmp") )
Invocation:
ls *.json | jq -nRr --arg group 10 -f generate.jq
Here is what I came up with. It seems to work and is importing at roughly 80 a second into an external hard drive.
#!/bin/bash
files=(*.json)
for ((I=0; I<${#files[*]}; I+=500)); do
    jq -c '.' "${files[@]:I:500}" | mongoimport --writeConcern 0 --numInsertionWorkers 16 --db mydb --collection all --quiet
    echo $I
done
However, some are failing. I've imported 105k files but only 98547 appeared in the mongo collection. I think it's because some documents are > 16mb.
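If the 16 MB document limit is indeed the culprit, one quick way to check (assuming a find that accepts the M size suffix, e.g. GNU find) is to list the oversized files before importing:
# Files in the current directory larger than MongoDB's 16 MB document limit:
find . -maxdepth 1 -name '*.json' -size +16M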

Periodically reading output from async background scripts

Context: I'm making my own i3-Bar script that reads output from other (asynchronous) scripts running in the background, concatenates it and then echoes it to i3-Bar itself.
I'm passing the outputs around in plain files, and I guess (logically) the problem is that the files are sometimes read and written at the same time. The best way to reproduce this behavior is by suspending the computer and then waking it back up; I don't know the exact cause, I can only go on what I see in my debug log files.
Main Code: Added comments for clarity
#!/usr/bin/env bash
cd "${0%/*}";
trap "kill -- -$$" EXIT; #The bg. scripts are on a while [ 1 ] loop, have to kill them.
rm -r ../input/*;
mkdir ../input/; #Just in case.
for tFile in ./*; do
    #Run all of the available scripts in the current directory in the background.
    if [ $(basename $tFile) != "main.sh" ]; then ("$tFile" &); fi;
done;
echo -e '{ "version": 1 }\n['; #I3-Bar can use infinite array of JSON input.
while [ 1 ]; do
    input=../input/*; #All of the scripts put their output in this folder as separate text files
    input=$(sort -nr <(printf "%s\n" $input));
    output="";
    for tFile in $input; do
        #Read and add all of the files to one output string.
        if [ $tFile == "../input/*" ]; then break; fi;
        output+="$(cat $tFile),";
    done;
    if [ "$output" == "" ]; then
        echo -e "[{\"full_text\":\"ERR: No input files found\",\"color\":\"#ff0000\"}],\n";
    else
        echo -e "[${output::-1}],\n";
    fi;
    sleep 0.2s;
done;
Example Input Script:
#!/usr/bin/env bash
cd "${0%/*}";
while [ 1 ]; do
    echo -e "{" \
        "\"name\":\"clock\"," \
        "\"separator_block_width\":12," \
        "\"full_text\":\"$(date +"%H:%M:%S")\"}" > ../input/0_clock;
    sleep 1;
done;
The Problem
The problem isn't the script itself but the fact that i3-Bar receives malformed JSON input (resulting in a parse error) and terminates; I'll show such a log below.
Another problem is that the background scripts need to run asynchronously, because some need to update every second and some only every minute, etc. So a FIFO isn't really an option, unless I create some ugly, inefficient, hacky workaround.
I know there is a need for IPC here, but I have no idea how to do this efficiently.
Script output from a random crash (the wake-up error looks the same):
[{ "separator_block_width":12, "color":"#BAF2F8", "full_text":"192.168.1.104 "},{ "separator_block_width":12, "color":"#BAF2F8", "full_text":"100%"}],
[{ "separator_block_width":12, "color":"#BAF2F8", "full_text":"192.168.1.104 "},,],
(Error is created by the second line)
As you see, the main script tries to read the file, doesn't get any output, but the comma is still there -> malformed JSON.
The immediate error is easy to fix: don't append an entry to output if the corresponding file is empty:
for tFile in $input; do
[[ $tFile != "../input/*" ]] &&
[[ -s $tFile ]] &&
output+="$(<$tFile),"
done
There is a potential race condition here, though. Just because a particular input file exists doesn't mean that the data is fully written to it yet. I would change your input scripts to look something like
#!/usr/bin/env bash
cd "${0%/*}";
while true; do
    o=$(mktemp)
    # -1 = current time; note the quotes around the %(...)T field so the JSON value is a string
    printf '{"name": "clock", "separator_block_width": 12, "full_text": "%(%H:%M:%S)T"}\n' -1 > "$o"
    mv "$o" ../input/0_clock   # rename into place so readers never see a half-written file
    sleep 1
done
Also, ${output%,} is a safer way to trim a trailing comma when necessary.
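Applied to the main loop above, that would look something like this (only the relevant lines shown):
if [ "$output" == "" ]; then
    echo -e "[{\"full_text\":\"ERR: No input files found\",\"color\":\"#ff0000\"}],\n";
else
    echo -e "[${output%,}],\n";   # strips one trailing comma; a no-op if there is none
fi;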

How to delete the last comma in a JSON file using Bash?

I wrote a script that takes all the user data of an AWS EC2 instance and echoes it to local.json. All this happens when I install my node.js modules.
I don't know how to delete the last comma in the JSON file. Here is the bash script:
#!/bin/bash
export DATA_DIR=/data
export PATH=$PATH:/usr/local/bin
#install package from git repository
sudo -- sh -c "export PATH=$PATH:/usr/local/bin; export DATA_DIR=/data; npm install git+https://reader:secret@bitbucket.org/somebranch/$1.git#$2"
#update config files from instance user-data
InstanceConfig=`cat /instance-config`
echo '{' >> node_modules/$1/config/local.json
while read line
do
    if [ ! -z "$line" -a "$line" != " " ]; then
        Key=`echo $line | cut -f1 -d=`
        Value=`echo $line | cut -f2 -d=`
        if [ "$Key" = "Env" ]; then
            Env="$Value"
        fi
        printf '"%s" : "%s",\n' "$Key" "$Value" >> node_modules/*/config/local.json
    fi
done <<< "$InstanceConfig"
sed -i '$ s/.$//' node_modules/$1/config/local.json
echo '}' >> node_modules/$1/config/local.json
To run it I do: ./script
I get the JSON output, but with a comma at the end of every line. Here is the local.json that I get:
{
"Env" : "dev",
"EngineUrl" : "engine.url.net",
}
All I am trying to do is delete the comma (",") on the last line of the JSON file.
I have tried many approaches that I found on the internet. I know the fix should go after the last "fi" (the end of the loop), and I know it should be something like this line:
sed -i "s?${Key} : ${Value},?${Key} : ${Value}?g" node_modules/$1/config/local.json
Or this:
sed '$ s/,$//' node_modules/$1/config/local.json
But they don't work for me.
Can someone who knows Bash scripting well help me with that?
Thanks!
If you know that it is the last comma that needs to be replaced, a reasonably robust way is to use GNU sed in "slurp" mode like this:
sed -zr 's/,([^,]*$)/\1/' local.json
Output:
{
"Env" : "dev",
"EngineUrl" : "engine.url.net"
}
If you'd just post some sample input/output it'd remove the guess-work but IF this is your input file:
$ cat file
Env=dev
EngineUrl=engine.url.net
Then IF you're trying to do what I think you are then all you need is:
$ cat tst.awk
BEGIN { FS="="; sep="{\n" }
{
printf "%s \"%s\" : \"%s\"", sep, $1, $2
sep = ",\n"
}
END { print "\n}" }
which you'd execute as:
$ awk -f tst.awk file
{
"Env" : "dev",
"EngineUrl" : "engine.url.net"
}
Or you can execute the awk script inline within a shell script if you prefer:
awk '
BEGIN { FS="="; sep="{\n" }
{
printf "%s \"%s\" : \"%s\"", sep, $1, $2
sep = ",\n"
}
END { print "\n}" }
' file
{
"Env" : "dev",
"EngineUrl" : "engine.url.net"
}
The above is far more robust, portable, efficient and better in every other way than the shell script you posted because it's using the right tool for the job. A UNIX shell is an environment from which to call tools with a language to sequence those calls. It is NOT a language to process text which is why it's so difficult to get it right. The UNIX tool for general text processing is awk so when you need to process text in UNIX, you just have shell call awk, that's all.
Here is a jq version, if jq is available:
jq --raw-input 'split("=") | {(.[0]):.[1]}' /instance-config | jq --slurp 'add'
There might be a way to do it with one jq pass, but I couldn't see it.
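With the sample input shown above (Env=dev and EngineUrl=engine.url.net), this two-pass pipeline should print something like:
{
  "Env": "dev",
  "EngineUrl": "engine.url.net"
}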
You can remove all trailing commas from invalid JSON with:
sed -i.bak ':begin;$!N;s/,\n}/\n}/g;tbegin;P;D' FILE
sed -i.bak = creates a backup of the original file, then applies changes to the file
':begin;$!N;s/,\n}/\n}/g;tbegin;P;D' = matches anything ending with "," followed by a newline and "}", and removes the "," on the preceding line
FILE = the file you want to make the change to
If you're willing to use it, xidel is rather forgiving for trailing commas:
xidel -s local.json -e '$json'
{
"Env": "dev",
"EngineUrl": "engine.url.net"
}
xidel - -se '$json' <<< '{"Env":"dev","EngineUrl":"engine.url.net",}'
#or
xidel - -se 'parse-json($raw,{"liberal":true()})' <<< '{"Env":"dev","EngineUrl":"engine.url.net",}'
{
"Env": "dev",
"EngineUrl": "engine.url.net"
}

How to parse a JSON response in a shell script?

I am working with a bash shell script. I need to call a URL from the script and then parse the JSON data coming back from it.
This is my URL - http://localhost:8080/test_beat - and the response after hitting the URL will be one of these two:
{"error": "error_message"}
{"success": "success_message"}
Below is my shell script which executes the URL using wget.
#!/bin/bash
DATA=$(wget -O - -q -t 1 http://localhost:8080/test_beat)
#grep $DATA for error and success key
Now I am not sure how to parse the JSON response in $DATA and see whether the key is success or error. If the key is success, I will print "success" along with the $DATA value and exit the script with a zero status code; if the key is error, I will print "error" along with the $DATA value and exit with a non-zero status code.
How can I parse the JSON response and extract the key from it in a shell script?
I don't want to install any library to do this, since my JSON response is fixed and will always be the same as shown above, so any simpler way is fine.
Update:-
Below is my final shell script -
#!/bin/bash
DATA=$(wget -O - -q -t 1 http://localhost:8080/tester)
echo $DATA
#grep $DATA for error and success key
IFS=\" read __ KEY __ MESSAGE __ <<< "$DATA"
case "$KEY" in
success)
exit 0
;;
error)
exit 1
;;
esac
Does this look right?
If you are going to be using any more complicated json from the shell and you can install additional software, jq is going to be your friend.
So, for example, if you want to just extract the error message if present, then you can do this:
$ echo '{"error": "Some Error"}' | jq ".error"
"Some Error"
If you try this on the success case, you get:
$ echo '{"success": "Yay"}' | jq ".error"
null
The main advantage of the tool is simply that it fully understands json. So, no need for concern over corner cases and whatnot.
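Since the question also wants the exit status to reflect which key came back, one way to build on that (a sketch relying on jq's -e flag, which sets the exit code based on whether the last output was null or false) might be:
#!/bin/bash
# Sketch: assumes jq is installed and the response has exactly one of the two keys.
DATA=$(wget -O - -q -t 1 http://localhost:8080/test_beat)
if jq -e '.success' <<< "$DATA" > /dev/null; then
    echo "success: $DATA"
    exit 0
else
    echo "error: $DATA"
    exit 1
fi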
#!/bin/bash
IFS= read -d '' DATA < temp.txt ## Imitates your DATA=$(wget ...). Just replace it.
while IFS=\" read -ra LINE; do
case "${LINE[1]}" in
error)
# ERROR_MSG=${LINE[3]}
printf -v ERROR_MSG '%b' "${LINE[3]}"
;;
success)
# SUCCESS_MSG=${LINE[3]}
printf -v SUCCESS_MSG '%b' "${LINE[3]}"
;;
esac
done <<< "$DATA"
echo "$ERROR_MSG|$SUCCESS_MSG" ## Shows: error_message|success_message
* %b expands backslash escape sequences in the corresponding argument.
Update as I didn't really get the question at first. It should simply be:
IFS=\" read __ KEY __ MESSAGE __ <<< "$DATA"
[[ $KEY == success ]] ## Gives $? = 0 if true or else 1 if false.
And you can examine it further:
case "$KEY" in
success)
echo "Success message: $MESSAGE"
exit 0
;;
error)
echo "Error message: $MESSAGE"
exit 1
;;
esac
Of course similar obvious tests can be done with it:
if [[ $KEY == success ]]; then
    echo "It was successful."
else
    echo "It wasn't."
fi
From your last comment it can be simply done as
IFS=\" read __ KEY __ MESSAGE __ <<< "$DATA"
echo "$DATA" ## Your really need to show $DATA and not $MESSAGE right?
[[ $KEY == success ]]
exit ## Exits with code based from current $?. Not necessary if you're on the last line of the script.
You probably already have python installed, which has json parsing in the standard library. Python is not a great language for one-liners in shell scripts, but here is one way to use it:
#!/bin/bash
DATA=$(wget -O - -q -t 1 http://localhost:8080/test_beat)
if python -c '
import json, sys
exit(1 if "error" in json.loads(sys.stdin.read()) else 0)' <<<"$DATA"
then
echo "SUCCESS: $DATA"
else
echo "ERROR: $DATA"
exit 1
fi
Given that you don't want to use JSON libraries, and that the response you're parsing is simple and the only thing you care about is the presence of the substring "success", I suggest the following simplification:
#!/bin/bash
wget -O - -q -t 1 http://localhost:8080/tester | grep -F -q '"success"'
exit $?
-F tells grep to search for a fixed (literal) string.
-q tells grep to produce no output and instead only reflect via its exit code whether a match was found or not.
exit $? simply exits with grep's exit code ($? is a special variable that reflects the most recently executed command's exit code).
Note that if all you care about is whether wget's output contains "success", the above pipeline will do; there is no need to capture wget's output in an auxiliary variable.
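If you also want to print the response, as the question asks, a small variation that captures the output first could look like this (a sketch):
#!/bin/bash
DATA=$(wget -O - -q -t 1 http://localhost:8080/tester)
if grep -F -q '"success"' <<< "$DATA"; then
    echo "SUCCESS: $DATA"
else
    echo "ERROR: $DATA"
    exit 1
fi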