How to make a shell script function that can either take arguments on the command line or read them from a pipe?

For example, I want to write a function called fooFun, which will do some processing on a PDF file. I'd like it to be callable in both of the following ways:
$ fooFun foo.pdf
$ ls *.pdf | fooFun
Any ideas? Thanks.

I don't think you can easily do this with a shell function. A better idea is to make it a script, let it take command line arguments, and achieve the second style with xargs:
ls *.pdf | xargs fooFun
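A minimal sketch of such a script (the processing step is hypothetical; substitute your real PDF handling):

#!/bin/bash
# fooFun: process each PDF named on the command line
for pdf in "$@"; do
    echo "processing $pdf"    # placeholder for the real work
done

Saved as fooFun and made executable, it works both with direct arguments and via xargs.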

I agree with @larsmans that it's better to stick with passing arguments as parameters. However, here's how to achieve what you're asking:
foofun() {
    local args arg
    if [[ $# -eq 0 ]]; then
        # no arguments given: consume stdin, one argument per line
        args=()
        while IFS= read -r arg; do args+=("$arg"); done
    else
        args=("$@")
    fi
    # do something with "${args[@]}"
}
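With that in place, both invocation styles from the question work (foo.pdf is hypothetical):

$ foofun foo.pdf
$ ls *.pdf | foofun

Note that reading one name per line still breaks on the rare filename that contains a newline.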

Related

How to strip control characters when saving output to variable?

Trying to strip control characters such as ^[[1m and ^[(B^[[m from ^[[1mfoo^[(B^[[m.
$ cat test.sh
#! /bin/bash
bold=$(tput bold)
normal=$(tput sgr0)
printf "%s\n" "Secret:"
printf "$bold%s$normal\n" "foo"
printf "%s\n" "Done"
$ cat test.exp
#!/usr/bin/expect
log_file -noappend ~/Desktop/test.log
spawn ~/Desktop/test.sh
expect {
    -re {Secret:\r\n(.+?)\r\nDone} {
        set secret $expect_out(1,string)
    }
}
$ expect ~/Desktop/test.exp
spawn ~/Desktop/test.sh
Secret:
foo
Done
$ cat -e ~/Desktop/test.log
spawn ~/Desktop/test.sh^M$
Secret:^M$
^[[1mfoo^[(B^[[m^M$
Done^M$
The escape sequences depend on the TERM variable. You can avoid getting them in the first place by pretending to have a dumb terminal:
set env(TERM) dumb
spawn ~/Desktop/test.sh
This works for the provided example. Whether it will work in the real case is impossible to tell from the provided information; that depends on whether the program actually uses termcap/terminfo to generate the escape sequences.
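You can see the effect outside expect as well (a quick check; the exact sequences depend on your terminfo database):

$ TERM=dumb tput bold | cat -v
$ TERM=xterm tput bold | cat -v
^[[1m

With TERM=dumb, tput prints nothing because the dumb terminal has no bold capability.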
I don't see any way in expect to add hooks that manipulate the data being read before it's matched/logged/etc. However, you can add another layer into your pipeline to strip ANSI escapes from what the real program outputs before expect sees it, by adjusting your test.exp:
set csi_re [subst -nocommands {\x1B\\[[\x30-\x3F]*[\x20-\x2F]*[\x40-\x7E]}]
spawn sh -c "~/Desktop/test.sh | sed 's/$csi_re//g'"
This uses sed to strip out all strings that match ANSI terminal CSI escape sequences from test.sh's output.
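You can sanity-check the stripping outside expect (a sketch using bash's $'…' quoting and a simplified pattern; like the CSI regex above, it does not match the ^[(B charset-designation sequence from the log):

$ printf '\x1B[1mfoo\x1B[m\n' | sed $'s/\x1B\\[[0-9;]*[A-Za-z]//g'
foo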

How to download and then use the file in the same tcl script?

I'm new using Tcl and I have the following script:
proc prepare_xml {pdb_id} {
    set filename [exec wget ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/xml/$pdb_id.xml.gz]
    set filename_unzip [exec gunzip "$pdb_id.xml.gz"]
    set ready_xml [exec sed -i "/entry /c\<entry>" "$pdb_id.xml"]
    return $ready_xml
}
The expected output is the file "filename", uncompressed and modified. However, when I execute the proc the first time, it only downloads the file and does not uncompress it. If I execute it a second time, I obtain the expected output plus a second copy of the original downloaded file.
Can anyone help me with this? I've tried the after and vwait commands, but they don't work.
Thank you :)
It's hard to say for sure as you're not describing whether any errors are thrown (that'd be the only reason for the code to not run to completion), but I'd expect something like this to be the right approach:
proc prepare_xml {pdb_id} {
    # Double quotes on next line just because of Stack Overflow highlighter
    set url "ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/xml/$pdb_id.xml.gz"
    set file $pdb_id.xml
    append sedcode {/entry /} "c\\\n" {<entry>}
    exec wget -q -O - $url | gunzip -c | sed $sedcode > $file
    return $file
}
Firstly, I'm keeping complicated bits in (local) variables to stop the exec line from getting too long. Secondly, I've put all the subprocesses together in the one pipeline. Thirdly, I'm using -q and -O - with wget, and -c with gunzip; look up what they do if you don't understand them. Fourthly, I've put the scriptlet for sed in braces where possible to stop there from being trouble with backslashes, but I've used append and a non-backslashed section to make the pattern because the syntax of c in sed is downright weird (it needs a backslash-newline sequence immediately after on at least some platforms…)
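For reference, the exec line corresponds to this shell pipeline (with a hypothetical PDB id 1abc):

wget -q -O - ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/xml/1abc.xml.gz | gunzip -c | sed '/entry /c\
<entry>' > 1abc.xml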
I'd actually use native Tcl code to extract and transform the data if I was doing it for me, but that's a rather larger change.

Split large directory into subdirectories

I have a directory with about 2.5 million files and is over 70 GB.
I want to split this into subdirectories, each with 1000 files in them.
Here's the command I've tried using:
i=0; for f in *; do d=dir_$(printf %03d $((i/1000+1))); mkdir -p $d; mv "$f" $d; let i++; done
That command works for me on a small scale, but I can leave it running for hours on this directory and it doesn't seem to do anything.
I'm open to doing this in any way via the command line: perl, python, etc. Just whatever way would be the fastest to get this done...
I suspect that if you checked, you'd notice your program was actually moving the files, albeit really slowly. Launching a program is rather expensive (at least compared to making a system call), and you do so three or four times per file! As such, the following should be much faster:
perl -e'
    my $base_dir_qfn = ".";
    my $i = 0;
    my $dir_qfn;

    opendir(my $dh, $base_dir_qfn)
        or die("Can'\''t open dir \"$base_dir_qfn\": $!\n");

    while (defined( my $fn = readdir($dh) )) {
        next if $fn =~ /^(?:\.\.?|dir_\d+)\z/;

        my $qfn = "$base_dir_qfn/$fn";

        if ($i % 1000 == 0) {
            $dir_qfn = sprintf("%s/dir_%03d", $base_dir_qfn, int($i/1000)+1);
            mkdir($dir_qfn)
                or die("Can'\''t make directory \"$dir_qfn\": $!\n");
        }

        rename($qfn, "$dir_qfn/$fn")
            or do {
                warn("Can'\''t move \"$qfn\" into \"$dir_qfn\": $!\n");
                next;
            };

        ++$i;
    }
'
Note: ikegami's helpful Perl-based answer is the way to go - it performs the entire operation in a single process and is therefore much faster than the Bash + standard utilities solution below.
A bash-based solution needs to avoid loops in which external utilities are called in order to perform reasonably.
Your own solution calls two external utilities and creates a subshell in each loop iteration, which means that you'll end up creating about 7.5 million processes(!) in total.
The following solution avoids loops, but, given the sheer number of input files, will still take quite a while to complete (you'll end up creating 4 processes for every 1000 input files, i.e., ca. 10,000 processes in total):
printf '%s\0' * | xargs -0 -n 1000 bash -O nullglob -c '
    dirs=( dir_*/ )
    dir=dir_$(printf %04d $(( 1 + ${#dirs[@]} )))
    mkdir "$dir"; mv "$@" "$dir"' -
printf '%s\0' * prints a NUL-separated list of all files in the dir.
Note that since printf is a Bash builtin rather than an external utility, the max. command-line length as reported by getconf ARG_MAX does not apply.
xargs -0 -n 1000 invokes the specified command with chunks of 1000 input filenames.
Note that xargs -0 is nonstandard, but supported on both Linux and BSD/OSX.
Using NUL-separated input robustly passes filenames without fear of inadvertently splitting them into multiple parts, and even works with filenames with embedded newlines (though such filenames are very rare).
bash -O nullglob -c executes the specified command string with option nullglob turned on, which means that a globbing pattern that matches nothing will expand to the empty string.
The command string counts the output directories created so far, so as to determine the name of the next output dir with the next higher index, creates the next output dir, and moves the current batch of (up to) 1000 files there.
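The chunking behaviour of xargs -0 -n is easy to see with toy input:

$ printf '%s\0' a b c d e | xargs -0 -n 2 echo
a b
c d
e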
If the directory is not in use, I suggest the following:
find . -maxdepth 1 -type f | split -l 1000 -d -a 5
This will create about 2,500 chunk files named x00000 - x02499 (5 digits just to be safe, although 4 would work too). You can then move the 1000 files listed in each chunk file to a corresponding directory.
Perhaps use mv -n to eliminate the risk of overwrites in case of a name clash (set -o noclobber only protects shell redirections, not mv).
To move the files, it's easier to use brace expansion to iterate over the chunk-file names:
for c in x{00000..02499}; do
    d="d$c"
    mkdir "$d"
    xargs -I f mv f "$d" < "$c"
done
Moving files around is always a challenge. IMHO all the solutions presented so far have some risk of destroying your files. This may be because the challenge sounds simple, but there is a lot to consider and to test when implementing it.
We must also not underestimate the efficiency of the solution as we are potentially handling a (very) large number of files.
Here is a script, carefully and intensively tested with my own files. But of course, use it at your own risk!
This solution:
is safe with filenames that contain spaces.
does not use xargs -L because this will easily result in "Argument list too long" errors
is based on Bash 4 and does not depend on awk, sed, tr etc.
is scaling well with the amount of files to move.
Here is the code:
if [[ "${BASH_VERSINFO[0]}" -lt 4 ]]; then
echo "$(basename "$0") requires Bash 4+"
exit -1
fi >&2
opt_dir=${1:-.}
opt_max=1000
readarray files <<< "$(find "$opt_dir" -maxdepth 1 -mindepth 1 -type f)"
moved=0 dirnum=0 dirname=''
for ((i=0; i < ${#files[#]}; ++i))
do
if [[ $((i % opt_max)) == 0 ]]; then
((dirnum++))
dirname="$opt_dir/$(printf "%02d" $dirnum)"
fi
# chops the LF printed by "find"
file=${files[$i]::-1}
if [[ -n $file ]]; then
[[ -d $dirname ]] || mkdir -v "$dirname" || exit
mv "$file" "$dirname" || exit
((moved++))
fi
done
echo "moved $moved file(s)"
For example, save this as split_directory.sh. Now let's assume you have 2001 files in some/dir:
$ split_directory.sh some/dir
mkdir: created directory some/dir/01
mkdir: created directory some/dir/02
mkdir: created directory some/dir/03
moved 2001 file(s)
Now the new reality looks like this:
some/dir contains 3 directories and 0 files
some/dir/01 contains 1000 files
some/dir/02 contains 1000 files
some/dir/03 contains 1 file
Calling the script again on the same directory is safe and returns almost immediately:
$ split_directory.sh some/dir
moved 0 file(s)
Finally, let's take a look at the special case where we call the script on one of the generated directories:
$ time split_directory.sh some/dir/01
mkdir: created directory 'some/dir/01/01'
moved 1000 file(s)
real 0m19.265s
user 0m4.462s
sys 0m11.184s
$ time split_directory.sh some/dir/01
moved 0 file(s)
real 0m0.140s
user 0m0.015s
sys 0m0.123s
Note that this test ran on a fairly slow, veteran computer.
Good luck :-)
This is probably slower than a Perl program (1 minute for 10,000 files), but it should work with any POSIX-compliant shell.
#! /bin/sh
nd=0
nf=0
/bin/ls | \
while read -r file
do
    # start a new subdirectory every 1000 files
    case $(expr $nf % 1000) in
    0)
        nd=$(/usr/bin/expr $nd + 1)
        dir=$(printf "dir_%04d" $nd)
        mkdir "$dir"
        ;;
    esac
    mv "$file" "$dir/$file"
    nf=$(/usr/bin/expr $nf + 1)
done
With bash, you can use arithmetic expansion $((...)).
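For example, these are drop-in replacements for the expr calls above:

nf=$((nf + 1))              # instead of nf=$(/usr/bin/expr $nf + 1)
case $((nf % 1000)) in      # instead of case $(expr $nf % 1000) in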
And of course this idea can be improved by using xargs - should not take longer than ~ 45 sec for 2.5 million files.
nd=0
ls | xargs -L 1000 echo | \
while read -r cmd
do
    nd=$((nd+1))
    dir=$(printf "dir_%04d" $nd)
    mkdir "$dir"
    # $cmd is deliberately unquoted so it splits into up to 1000 filenames
    mv $cmd "$dir"
done
I would use the following from the command line:
find . -maxdepth 1 -type f | split -l 1000
for i in x*
do
    mkdir "dir$i"
    mv $(cat "$i") "dir$i" 2>/dev/null &
done
The key is the "&", which runs each mv statement in the background so the batches move in parallel.
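If you background the moves like this, you may want to block until they have all finished before the script exits (a small addition):

wait    # waits for all background mv jobs to complete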
Thanks to karakfa for the split idea.

jq cast result into bash array

Following my attempt to parse a JSON response from curl using bash, I have now decided to give jq a try.
I have checked the documentation, but I could not find a way to iterate through the elements and "do" something with each.
Here's an idea of what I am trying to achieve: cast the result from jq into an array (it doesn't work):
__json=$($omd_response | ~/local-workspace/bash/jq -r '[.]')
for x in "${__json[@]}"
do
    echo "-metadata" $x
done
Any other idea is much appreciated.
Thanks
This:
declare -a things
things=( $(jq tostring myfile.json) )
for x in "${things[@]}"; do
    echo "-metadata" "$x"
done
almost works. It splits things on whitespace.
This works:
declare -a things
OIFS=$IFS
IFS= things=( $(jq -r 'tojson|tostring' myfile.json) )
IFS=$OIFS
for x in "${things[@]}"; do
    echo "-metadata" "$x"
done
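A more robust pattern is to stream jq's compact per-element output line by line instead of capturing it into an array (a sketch, assuming myfile.json holds a JSON array):

while IFS= read -r x; do
    echo "-metadata" "$x"
done < <(jq -c '.[]' myfile.json)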
Really, we need a JSON-aware shell... Something like ksh93's compound variables, but JSON-compatible.

Shell script: call a function with a variable?

Hi, I'm creating a shell script, and an example of the code looks like this:
#!/bin/bash
test_func() {
    echo "It works!"
}
function_name="test_func"
I want to somehow be able to call test_func() using the variable "function_name".
I know that's possible in PHP using call_user_func($function_name) or by saying $function_name().
Is this also possible in shell scripting?
Huge appreciation for the help! :)
You want the bash built-in eval. From man bash:
eval [arg ...]
    The args are read and concatenated together into a single command. This command is then read and executed by the shell, and its exit status is returned as the value of eval. If there are no args, or only null arguments, eval returns 0.
You can also accomplish it with simple variable substitution, as in
#!/bin/bash
test_func() {
    echo "It works!"
}
function_name="test_func"
$function_name
#!/bin/bash
test_func() {
    echo "It works!"
}
function_name="test_func"
eval ${function_name}
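Note that the plain expansion also forwards arguments, so eval is rarely needed for this (a minimal sketch with a hypothetical greet function):

#!/bin/bash
greet() {
    echo "Hello, $1!"
}
function_name="greet"
"$function_name" World    # prints: Hello, World!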