How to strip control characters when saving output to variable? - tcl

Trying to strip control characters such as ^[[1m and ^[(B^[[m from ^[[1mfoo^[(B^[[m.
$ cat test.sh
#! /bin/bash
bold=$(tput bold)
normal=$(tput sgr0)
printf "%s\n" "Secret:"
printf "$bold%s$normal\n" "foo"
printf "%s\n" "Done"
$ cat test.exp
#!/usr/bin/expect
log_file -noappend ~/Desktop/test.log
spawn ~/Desktop/test.sh
expect {
-re {Secret:\r\n(.+?)\r\nDone} {
set secret $expect_out(1,string)
}
}
$ expect ~/Desktop/test.exp
spawn ~/Desktop/test.sh
Secret:
foo
Done
$ cat -e ~/Desktop/test.log
spawn ~/Desktop/test.sh^M$
Secret:^M$
^[[1mfoo^[(B^[[m^M$
Done^M$

The escape sequences depend on the TERM variable. You can avoid getting them in the first place by pretending to have a dumb terminal:
set env(TERM) dumb
spawn ~/Desktop/test.sh
This works for the provided example. Whether it will work in your real case is impossible to tell from the information provided; that depends on whether the program actually uses termcap/terminfo to generate the escape sequences.
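As a quick sanity check (a hedged illustration; the exact output depends on your terminfo database), you can compare what tput emits under the two terminal types:
$ TERM=dumb tput bold | cat -v
$ TERM=xterm tput bold | cat -v
^[[1m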

I don't see any way in expect to add hooks to manipulate the data being read before it's matched/logged/etc. However, you can add another layer into your pipeline to strip ANSI escapes from what the real program being run outputs before expect sees it by adjusting your test.exp:
set csi_re [subst -nocommands {\x1B\\[[\x30-\x3F]*[\x20-\x2F]*[\x40-\x7E]}]
spawn sh -c "~/Desktop/test.sh | sed 's%$csi_re%%g'"
This uses sed to strip all strings matching ANSI terminal CSI escape sequences from test.sh's output (a % delimiter is used for the s command because the bracket expression generated from [\x20-\x2F] contains the / character).
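For example, a hedged check of the same pattern in plain bash (outside expect), using a bold-wrapped sample string:
$ csi_re=$'\x1b\\[[0-?]*[ -/]*[@-~]'
$ printf '\e[1mfoo\e[0m\n' | sed "s%$csi_re%%g" | cat -v
foo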

Related

not able to store sed output to variable

I am new to bash scripting.
I am getting a JSON response and need to extract just one property from it. I want to save it to a variable, but it is not working:
token=$result |sed -n -e 's/^.*access_token":"//p' | cut -d'"' -f1
echo $token
It returns a blank line.
I cannot use jq or any third party tools.
Please let me know what I am missing.
Your command should be:
token=$(echo "$result" | sed -n -e 's/^.*access_token":"//p' | cut -d'"' -f1)
You need to use echo to print the contents of the variable over standard output, and you need to use a command substitution $( ) to assign the output of the pipeline to token.
Quoting your variables is always encouraged, to avoid problems with white space and glob characters like *.
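For instance, a small illustration (with a hypothetical value) of what unquoted expansion does to whitespace:
$ var='a   b'
$ echo $var
a b
$ echo "$var"
a   b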
As an aside, note that you can probably obtain the output using something like:
token=$(jq -r .access_token <<<"$result")
I know you've said that you can't use jq but it's a standalone binary (no need to install it) and treats your JSON in the correct way, not as arbitrary text.
Give this a try:
token="$(sed -E -n -e 's/^.*access_token": ?"//p' <<<"$result" | cut -d'"' -f1)"
Explanation:
token="$( script here )" means that $token is set to the output/result of the script run inside the subshell through a process known as command substituion
-E tells sed to use Extended Regular Expressions, which support the ? quantifier. We want this because JSON output often contains a space after the : and before the next ", and the ? after the space tells sed that the space may or may not be present.
<<<"$result" is a herestring that feeds the data into sed as stdin in place of a file.

How to use non-displaying characters like newline (\n) and tab (\t) with jq's "join" function

I couldn't find this anywhere on the internet, so figured I'd add it as documentation.
I wanted to join a json array around the non-displaying character \30 ("RecordSeparator") so I could safely iterate over it in bash, but I couldn't quite figure out how to do it. I tried echo '["one","two","three"]' | jq 'join("\30")' and a couple permutations of that, but it didn't work.
Turns out the solution is pretty simple.... (See answer)
Use jq -j to eliminate literal newlines between records and use only your own delimiter. This works in your simple case:
#!/usr/bin/env bash
data='["one","two","three"]'
sep=$'\x1e' # works only for non-NUL characters, see NUL version below
while IFS= read -r -d "$sep" rec || [[ $rec ]]; do
printf 'Record: %q\n' "$rec"
done < <(jq -j --arg sep "$sep" 'join($sep)' <<<"$data")
...but it also works in a more interesting scenario where naive answers fail:
#!/usr/bin/env bash
data='["two\nlines","*"]'
while IFS= read -r -d $'\x1e' rec || [[ $rec ]]; do
printf 'Record: %q\n' "$rec"
done < <(jq -j 'join("\u001e")' <<<"$data")
returns (when run on Cygwin, hence the CRLF):
Record: $'two\r\nlines'
Record: \*
That said, if using this in anger, I would suggest using NUL delimiters, and filtering them out from the input values:
#!/usr/bin/env bash
data='["two\nlines","three\ttab-separated\twords","*","nul\u0000here"]'
while IFS= read -r -d '' rec || [[ $rec ]]; do
printf 'Record: %q\n' "$rec"
done < <(jq -j '[.[] | gsub("\u0000"; "#NUL#")] | join("\u0000")' <<<"$data")
NUL is a good choice because it's a character that can't be stored in C strings (like the ones bash uses) at all, so there's no loss in the range of data which can be faithfully conveyed when they're excised -- if they did make it through to the shell, it would (depending on version) either discard them or truncate the string at the point where one first appears.
The recommended way to solve the problem is to use the -c command-line
option, e.g. as follows:
echo "$data" | jq -c '.[]' |
while read -r rec
do
echo "Record: $rec"
done
Output:
Record: "one"
Record: "two"
Record: "three"
Problems with the OP's proposed answer
There are several problems with the proposal in the OP's answer based on $'\30'
First, it doesn't work reliably, e.g. using bash on a Mac
the output is: Record: "one\u0018two\u0018three";
this is because jq correctly converts octal 30 to \u0018
within the JSON string.
Second, RS is ASCII decimal 30, i.e. octal 36, which
would be written as $'\36' in the shell.
If you use this value instead, the program produces:
Record: "one\u001etwo\u001ethree" because that is
the correct JSON string with embedded RS characters. (For the record $'\30' is Control-X.)
Third, as noted by Charles Duffy, "for rec in $(...) is inherently buggy."
Fourth, any approach which assumes jq will in future accept
illegal JSON strings is brittle in the sense that in the
future, jq might disallow them or at least require a command-line
switch to allow them.
Fifth, unset IFS is not guaranteed to restore IFS to its state beforehand.
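If you do need to change IFS, a safer pattern (a sketch, not taken from the answers above) is to confine the change to a subshell, e.g. ( IFS=$'\36'; ... ), or to save and restore the previous value, covering the case where IFS was unset:
unset saved_IFS
[ -n "${IFS+set}" ] && saved_IFS=$IFS
IFS=$'\36'
# ...word-splitting work here...
if [ -n "${saved_IFS+set}" ]; then IFS=$saved_IFS; else unset IFS; fi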
The RS character is special in jq when used with the --seq command-line option. For example, with a JSON array stored in a shell variable named data we could invoke jq as follows:
$ jq -n --seq --argjson arg '[1,2]' '$arg | .[]'
Here is a transcript:
$ data='["one","two","three"]'
$ jq -n --seq --argjson arg "$data" '$arg | .[]' | tr $'\36' X
X"one"
X"two"
X"three"
$
You simply use bash's $'\30' syntax to insert the special character in-line, like so: echo '["one","two","three"]' | jq '. | join("'$'\30''")'.
Here's the whole working example:
data='["one","two","three"]'
IFS=$'\30'
for rec in $(echo "$data" | jq '. | join("'$'\30''")'); do
echo "Record: $rec"
done
unset IFS
This prints
Record: one
Record: two
Record: three
as expected.
NOTE: It's important not to quote the subshell in the for loop. If you quote it, it will be taken as a single argument, regardless of the RecordSeparator characters. If you don't quote it, it will work as expected.

How to download and then use the file in the same tcl script?

I'm new to Tcl and I have the following script:
proc prepare_xml {pdb_id} {
set filename [exec wget ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/xml/$pdb_id.xml.gz]
set filename_unzip [exec gunzip "$pdb_id.xml.gz"]
set ready_xml [exec sed -i "/entry /c\<entry>" "$pdb_id.xml"]
return $ready_xml
}
The expected output is the downloaded file, uncompressed and modified. However, when I execute the script the first time, it only downloads the file and does not uncompress it. If I execute it a second time, I obtain the expected output plus a second copy of the original downloaded file.
Can anyone help me with this? I've tried the after and vwait commands but it doesn't work.
Thank you :)
It's hard to say for sure as you're not describing whether any errors are thrown (that'd be the only reason for the code to not run to completion), but I'd expect something like this to be the right approach:
proc prepare_xml {pdb_id} {
# Double quotes on next line just because of Stack Overflow highlighter
set url "ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/xml/$pdb_id.xml.gz"
set file $pdb_id.xml
append sedcode {/entry /} "c\\\n" {<entry>}
exec wget -q -O - $url | gunzip -c | sed $sedcode > $file
return $file
}
Firstly, I'm keeping complicated bits in (local) variables to stop the exec line from getting too long. Secondly, I've put all the subprocesses together in the one pipeline. Thirdly, I'm using -q and -O - with wget, and -c with gunzip; look up what they do if you don't understand them. Fourthly, I've put the scriptlet for sed in braces where possible to avoid trouble with backslashes, but I've used append with a double-quoted section to build the pattern because the syntax of c in sed is downright weird (it needs a backslash-newline sequence immediately after it on at least some platforms…)
I'd actually use native Tcl code to extract and transform the data if I was doing it for me, but that's a rather larger change.

How to Convert Regex Pattern Match to Lowercase for URL Standardization/Tidying

I am currently trying to convert all links and files and tags on my site from UPPERCASE.ext and CamelCase.ext to lowercase.ext.
I can match the links in pages using a regular expression match for href="[^"]*" and src="[^"]*"
This seems to work fine for identifying the link and images in the HTML.
However what I need to do with this is to take the match and run a ToLowercase() function on the matches. Since I have a lot of pages that I'd like to parse through, I'm looking to make a short shell script that will run on a specified directory and pattern match the specified regexes and perform a lowercase operation on them.
Perl one-liner to rename all regular files to lowercase:
perl -le 'use File::Find; find({wanted=>sub{-f && rename($_, lc)}}, "/path/to/files");'
If you want to be more specific about what files are renamed you could change -f to a regex or something:
perl -le 'use File::Find; find({wanted=>sub{/\.(txt|htm|blah)$/i && rename($_, lc)}}, "/path/to/files");'
EDIT: Sorry, after rereading the question I see you also want to replace occurrences within files as well:
find /path/to/files -name "*.html" -exec perl -pi -e 's/\b(src|href)="(.+)"/$1="\L$2"/gi;' {} \;
EDIT 2: Try this one, as the find command uses + instead of \; which is more efficient since multiple files are passed to perl at once (thanks to @ikegami from another post). It also handles both ' and " around the URL. Finally, it uses {} instead of // for substitutions since you are substituting URLs (maybe the /s in the URL are confusing perl or your shell?). It shouldn't matter, and I tried both on my system with the same effect (both worked fine), but it's worth a shot:
find . -name "*.html" -exec perl -pi -e \
'$q=qr/"|\x39/; s{\b(src|href)=($q?.+$q?)\b}{$1=\L$2}gi;' {} +
PS: I also have a Macbook and tested these using bash shell with Perl versions 5.8.9 and 5.10.0.
With bash, you can declare a variable to only hold lower case values:
declare -l varname
read varname <<< "This Is LOWERCASE"
echo $varname # ==> this is lowercase
Or, you can convert a value to lowercase (bash version 4, I think)
x="This Is LOWERCASE"
echo ${x,,} # ==> this is lowercase
you want this?
kent$ echo "aBcDEF"|sed 's/.*/\L&/g'
abcdef
or this
kent$ echo "aBcDEF"|awk '$0=tolower($0)'
abcdef
with your own regex:
kent$ echo 'FOO src="htTP://wWw.GOOGLE.CoM" BAR BlahBlah'|sed -r 's/src="[^"]*"/\L&/g'
FOO src="http://www.google.com" BAR BlahBlah
You could use sed with -i (in-place edit):
sed -i'' -re's/(href|src)="[^"]*"/\L&/g' /path/to/files/*
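If the pages live in nested directories rather than one folder, the same substitution can be applied recursively with find (a sketch; \L is a GNU sed extension, and BSD/macOS sed handles -i differently):
find /path/to/files -name '*.html' -exec sed -i -E 's/(href|src)="[^"]*"/\L&/g' {} +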

how to pass command line parameter containing '<' to 'exec'

$ date > '< abcd'
$ cat '< abcd'
<something>
$ tclsh8.5
% exec cat {< abcd}
couldn't read file " abcd": no such file or directory
whoops. This is due to the specification of 'exec':
If an arg (or pair of args) has one of the forms described below then it is used by exec to control the flow of input and output among the subprocess(es). Such arguments will not be passed to the subprocess(es). In forms such as "< fileName", fileName may either be in a separate argument from "<" or in the same argument with no intervening space.
Is there a way to work around this?
Does the value have to be passed as an argument? If not, you can use something like this:
set strToPass "< foo"
exec someProgram << $strToPass
For filenames, you can (almost always) pass the fully qualified name instead. The fully qualified name can be obtained with file normalize:
exec someProgram [file normalize "< foo"] ;# Odd filename!
But if you need to pass in an argument where < (or >) is the first character, you're stuck. The exec command always consumes such arguments as redirections; unlike with the Unix shell, you can't just use quoting to work around it.
But you can use a helper program. Thus, on Unix you can do this:
exec /bin/sh -c "exec someProgram \"$strToPass\""
(The subprogram just replaces itself with what you want to run passing in the argument you really wanted. You might need to use string map or regsub to put backslashes in front of problematic metacharacters.)
On Windows, you have to write a batch file and run that, which has a lot of caveats and nasty side issues, especially for GUI applications.
One simple solution: ensure the word does not begin with the redirection character:
exec cat "./< abcd"
One slightly more complex:
exec sh -c {cat '< abcd'}
# also
set f {< abcd}
exec sh -c "cat '$f'"
This page on the Tcl Wiki talks about the issue a bit.
Have you tried this?
% exec {cat < abcd}
Try:
set myfile "< abcd"
exec cat $myfile