Expect: extract specific string from output - tcl

I am navigating a Java-based CLI menu on a remote machine with expect inside a bash script and I am trying to extract something from the output without leaving the expect session.
Expect command in my script is:
expect -c "
spawn ssh user@host
expect \"#\"
send \"java cli menu command here\r\"
expect \"java cli prompt\"
send \"java menu command\"
"
###I want to extract a specific string from the above output###
Expect output is:
Id Name
-------------------
abcd 12  John Smith
I want to extract abcd 12 from the above output into another expect variable for further use within the expect script. That is the first field of the 3rd line, using a double space as the delimiter. The awk equivalent would be: awk -F '  ' 'NR==3 {print $1}'
The big issue is that the environment through which I am navigating with Expect is, as I stated above, a Java CLI based menu so I can't just use awk or anything else that would be available from a bash shell.
Getting out from the Java menu, processing the output and then getting in again is not an option as the login process lasts for 15 seconds so I need to remain inside and extract what I need from the output using expect internal commands only.

You can use a regexp directly in expect itself with the -re flag. Thanks to Donal for pointing out the single-quote and double-quote issues. I have given the solution both ways.
I created a file with the following content,
Id Name
-------------------
abcd 12  John Smith
This is simply your Java program's console output. I tested on my system by simulating your program's output with cat. Just replace the cat command with your program's commands. Simple. :)
Double Quotes :
#!/bin/bash
expect -c "
spawn ssh user@domain
expect \"password\"
send \"mypassword\r\"
expect {\\\$} { puts matched_literal_dollar_sign}
send \"cat input_file\r\"; # Replace this code with your java program commands
expect -re {-\r\n(.*?)\s\s}
set output \$expect_out(1,string)
#puts \$expect_out(1,string)
puts \"Result : \$output\"
"
Single Quotes :
#!/bin/bash
expect -c '
spawn ssh user@domain
expect "password"
send "mypasswordhere\r"
expect "\\\$" { puts matched_literal_dollar_sign}
send "cat input_file\r"; # Replace this code with your java program commands
expect -re {-\r\n(.*?)\s\s}
set output $expect_out(1,string)
#puts $expect_out(1,string)
puts "Result : $output"
'
As you can see, I have used {-\r\n(.*?)\s\s}. The braces prevent any variable substitution. In your output, the 2nd line is full of hyphens, followed by a line break and then the content of the 3rd line. Let's decode the regex.
-\r\n matches one literal hyphen followed by a line break (carriage return plus newline). This matches the last hyphen of the 2nd line and the line break after it, which brings us to the start of the 3rd line. Then .*? matches the required output (i.e. abcd 12) up to the first double space, which is matched by \s\s.
You might be wondering why the parentheses are needed: they capture sub-matches.
In general, expect saves the whole matched string in expect_out(0,string), and saves everything it consumed (the matched text plus any unmatched input before it) in expect_out(buffer). Each parenthesized sub-match is saved under the next index: expect_out(1,string), expect_out(2,string) and so on.
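To check offline what the capture group should end up holding, the same extraction can be sketched in plain shell against a captured copy of the output. This is only a sanity check outside expect, not a replacement for the regexp above. It assumes the terminal delivers \r\n line endings and that the columns are separated by two spaces; the helper name extract_id is made up for the demo.

```shell
#!/bin/bash
# Simulated terminal output: CRLF line endings, two spaces between the columns.
sample=$'Id       Name\r\n-------------------\r\nabcd 12  John Smith\r\n'

# Hypothetical helper: strip carriage returns, then print field 1 of line 3,
# splitting fields on a run of exactly two spaces.
extract_id() {
  tr -d '\r' | awk -F'  ' 'NR==3 {print $1}'
}

result=$(printf '%s' "$sample" | extract_id)
echo "Result : $result"
```

If expect_out(1,string) differs from what this prints for the same captured text, the regexp rather than the data is the first thing to look at.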
As Donal pointed out, the single-quote approach is preferable since it looks less messy. :)
With double quotes, the \r does not need an extra backslash.
Update :
I have changed the regexp from -\r\n(\w+\s+\w+)\s\s to -\r\n(.*?)\s\s.
This way it meets your requirement: match any number of letters and single spaces until the first occurrence of a double space in the output.
Now, let's come to your question. You mentioned that you tried -\r\n(\w+)\s\s. The problem there is \w+: remember that \w+ does not match the space character, and your output contains spaces before the double space.
Which regexp to use depends on what the input string looks like. You can customize the regular expression to your needs.
Update version 2 :
What is the significance of .*? here? Since you asked about it separately, I will expand on what you commented. In regular expressions, * is a greedy operator and ? is our life saver. Consider the string
Stackoverflow is already overflowing with number of users.
Now, see the effect of the regular expression .*flow on it.
.* matches any number of characters. More precisely, it matches the longest string possible while still allowing the pattern as a whole to match. So the .* in the pattern matches the characters Stackoverflow is already over, and the flow in the pattern matches the text flow in the string.
To make .* stop at the first occurrence of flow instead, we add ? to it. This makes the pattern behave in a non-greedy manner, so .*?flow matches just Stackoverflow.
Now, coming back to your question: if we had used .*\s\s, it would have matched as much as possible, running on to the last double space it can find instead of stopping at the first one. This is the common behavior of regular expressions.
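The difference can be tried out in bash as a rough sketch. Note the assumption: bash's =~ operator uses POSIX extended regular expressions, which have no .*? operator, so the non-greedy result is only emulated here with parameter expansion. It is not how expect or Tcl implement it.

```shell
#!/bin/bash
s='Stackoverflow is already overflowing with number of users.'

# Greedy: .* takes the longest possible match that still ends in "flow".
[[ $s =~ .*flow ]] && greedy=${BASH_REMATCH[0]}

# Emulated non-greedy: cut everything from the FIRST "flow" onwards,
# then put "flow" back on the end.
lazy="${s%%flow*}flow"

echo "greedy: $greedy"
echo "lazy: $lazy"
```

The greedy match runs up to the flow inside overflowing, while the emulated lazy match stops at the flow inside Stackoverflow.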
Update version 3:
Structure your code as follows.
x=$(expect -c "
spawn ssh user@host
expect \"password\"
send \"password\r\"
expect {\\\$} { puts matched_literal_dollar_sign}
send \"cat input\r\"
expect -re {-\r\n(.*?)\s\s}
if {![info exists expect_out(1,string)]} {
puts \"Match did not happen :(\"
exit 1
}
set output \$expect_out(1,string)
#puts \$expect_out(1,string)
puts \"Result : \$output\"
")
y=$?
# $x now contains the output from the 'expect' command, and $y contains the
# exit status
echo $x
echo $y;
If everything went well, the exit code is 0; otherwise it is 1. This way you can check the return value in the bash script.
See the Tcl documentation for the info exists command.
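The capture-then-check pattern on the bash side can be sketched without a remote host. Here fake_expect is a made-up stand-in for the expect -c "..." call: it prints a result and succeeds when given a value, and prints the failure message and returns 1 otherwise.

```shell
#!/bin/bash
# Hypothetical stand-in for the expect call above.
fake_expect() {
  if [ -n "$1" ]; then
    echo "Result : $1"
  else
    echo "Match did not happen :("
    return 1
  fi
}

# Success case: $? must be read right after the assignment.
x=$(fake_expect "abcd 12"); y=$?

# Failure case, written as an if so it also behaves under 'set -e'.
if x2=$(fake_expect ""); then y2=0; else y2=$?; fi

echo "$y: $x"
echo "$y2: $x2"
```

One caveat worth knowing: writing local x=$(...) inside a function would make $? report the status of local instead, so the declaration and the assignment are best kept separate there.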

Related

unable to get return value from MariaDB via perl DBI [duplicate]

I'm getting a bunch of text from an outside source, saving it in a variable, and then displaying that variable as part of a larger block of HTML. I need to display it as is, and dollar signs are giving me trouble.
Here's the setup:
# get the incoming text
my $inputText = "This is a $-, as in $100. It is not a 0.";
print <<"OUTPUT";
before-regex: $inputText
OUTPUT
# this regex seems to have no effect
$inputText =~ s/\$/\$/g;
print <<"OUTPUT";
after-regex: $inputText
OUTPUT
In real life, those print blocks are much larger chunks of HTML with variables inserted directly.
I tried escaping the dollar signs using s/\$/\$/g because my understanding is that the first \$ escapes the regex so it searches for $, and the second \$ is what gets inserted and later escapes the Perl so that it just displays $. But I can't get it to work.
Here's what I'm getting:
before-regex: This is a 0, as in . It is not a 0.
after-regex: This is a 0, as in . It is not a 0.
And here's what I want to see:
before-regex: This is a 0, as in . It is not a 0.
after-regex: This is a $-, as in $100. It is not a 0.
Googling brings me to this question. When I try using the array and for loop in the answer, it has no effect.
How can I get the block output to display the variable exactly as it is?
When you construct a string with double-quotes, the variable substitution happens immediately. Your string will never contain the $ character in that case. If you want the $ to appear in the string, either use single-quotes or escape it, and be aware that you will not get any variable substitution if you do that.
As for your regex, that is odd. It is looking for $ and replacing them with $. If you want backslashes, you have to escape those too.
And here's what I want to see:
before-regex: This is a 0, as in . It is not a 0.
after-regex: This is a $-, as in $100. It is not a 0.
hum, well, I'm not sure what the general case is, but maybe the following will do:
s/0/\$-/;
s/in \K/\$100/;
Or did you mean to start with
my $inputText = "This is a \$-, as in \$100. It is not a 0.";
# Produces the string: This is a $-, as in $100. It is not a 0.
or
my $inputText = 'This is a $-, as in $100. It is not a 0.';
# Produces the string: This is a $-, as in $100. It is not a 0.
Your mistake is using double quotes instead of single quotes in the declaration of your variable.
This should be :
# get the incoming text
my $inputText = 'This is a $-, as in $100. It is not a 0.';
Learn the difference between ' and " and `. See http://mywiki.wooledge.org/Quotes and http://wiki.bash-hackers.org/syntax/words
This is for shell, but it's the same in Perl.
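A quick shell sketch of that difference (the function name show_quotes and its argument X are made up for the demo). Inside the function, $1 is the function's first argument, and bash reads $100 as ${1} followed by 00.

```shell
#!/bin/bash
# Hypothetical demo function: prints a single-quoted and a double-quoted string.
show_quotes() {
  # Single quotes: nothing is expanded, the dollar signs survive.
  local single='This is a $-, as in $100. It is not a 0.'
  # Double quotes: $100 is expanded as ${1} followed by "00".
  local double="as in $100."
  printf '%s\n' "$single" "$double"
}

show_quotes X
```

The first line comes out exactly as typed; the second has lost its dollar sign to the expansion.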

Similar strings, different results

I'm creating a Bash script to parse the air pollution levels from the webpage:
http://aqicn.org/city/beijing/m/
There is a lot of stuff in the file, but this is the relevant bit:
"iaqi":[{"p":"pm25","v":[59,21,112],"i":"Beijing pm25 (fine
particulate matter) measured by U.S Embassy Beijing Air Quality
Monitor
(\u7f8e\u56fd\u9a7b\u5317\u4eac\u5927\u4f7f\u9986\u7a7a\u6c14\u8d28\u91cf\u76d1\u6d4b).
Values are converted from \u00b5g/m3 to AQI levels using the EPA
standard."},{"p":"pm10","v":[15,5,69],"i":"Beijing pm10
(respirable particulate matter) measured by Beijing Environmental
Protection Monitoring Center
I want the script to parse and display 2 numbers: the current PM2.5 and PM10 levels (59 and 15 in the excerpt above).
CITY="beijing"
AQIDATA=$(wget -q 0 http://aqicn.org/city/$CITY/m/ -O -)
PM25=$(awk -v FS="(\"p\":\"pm25\",\"v\":\\\[|,[0-9]+)" '{print $2}' <<< $AQIDATA)
PM100=$(awk -v FS="(\"p\":\"pm10\",\"v\":\\\[|,[0-9]+)" '{print $2}' <<< $AQIDATA)
echo $PM25 $PM100
Even though I can get PM2.5 levels to display correctly, I cannot get PM10 levels to display. I cannot understand why, because the strings are similar.
Anyone here able to explain?
The following approach is based on two steps:
(1) Extracting the relevant JSON;
(2) Extracting the relevant information from the JSON using a JSON-aware tool -- here jq.
(1) Ideally, the web service would provide a JSON API that would allow one to obtain the JSON directly, but as the URL you have is intended for viewing with a browser, some form of screen-scraping is needed. There is a certain amount of brittleness to such an approach, so here I'll just provide something that currently works:
wget -O - http://aqicn.org/city/beijing/m |
gawk 'BEGIN{RS="function"}
$1 ~/getAqiModel/ {
sub(/.*var model=/,"");
sub(/;return model;}/,"");
print}'
(gawk or an awk that supports multi-character RS can be used; if you have another awk, then first split on "function", using e.g.:
sed $'s/function/\\\n/g' # three backslashes )
The output of the above can be piped to the following jq command, which performs the filtering envisioned in (2) above.
(2)
jq -c '.iaqi | .[]
| select(.p? =="pm25" or .p? =="pm10") | [.p, .v[0]]'
The result:
["pm25",59]
["pm10",15]
I think your problem is that you have a single line HTML file that contains a script that contains a variable that contains the data you are looking for.
Your field delimiters are either "p":"pm25","v":[ (or its pm10 counterpart) or a comma followed by some digits.
For pm25 this works, because it comes first and nothing like ,21 occurs before it.
For pm10, however, there are comma-and-digit matches belonging to pm25 ahead of it, so the second field is the empty string between ,21 and ,112.
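This can be reproduced on a shortened stand-in for the page content. The sketch below assumes a trimmed-down data string built from the values in the question, and a made-up helper second_field; the separator is the question's, with [[] in place of the escaped bracket to sidestep backslash handling.

```shell
#!/bin/bash
# Shortened stand-in for the page (values taken from the excerpt in the question).
data='"iaqi":[{"p":"pm25","v":[59,21,112]},{"p":"pm10","v":[15,5,69]}]'

# Hypothetical helper: print field 2 using the question's separator for the given key.
second_field() {
  printf '%s\n' "$data" |
    awk -v FS="(\"p\":\"$1\",\"v\":[[]|,[0-9]+)" '{print $2}'
}

pm25=$(second_field pm25)
pm10=$(second_field pm10)
echo "pm25=[$pm25] pm10=[$pm10]"
```

For pm10 the first separator awk finds is ,21 and the second is ,112 right behind it, so field 2 is empty.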
@karakfa has a hack that seems to work, but he doesn't explain very well why it works.
What he does is set awk's record separator (usually a newline) to any one of :, , or [. So in your case, one of the records would be "pm25", because it is preceded by a colon, which is a separator, and followed by a comma, also a separator.
Once it hits the matching content ("pm25"), it sets a counter to 4. Then, for this and the following records, it counts the counter down: "pm25" itself, "v", the empty string between : and [, and finally the record with the number you want, where the counter reaches zero. The test c&&!--c is false while the decremented counter is still 3, 2 or 1, and true once it hits 0. Since there is no action block, awk simply prints this record, which is the value you want.
A more robust approach would probably be to use xpath to find the script, then use a JSON parser or similar to get the value.
chw21's helpful answer explains why your approach didn't work.
peak's helpful answer is the most robust, because it employs proper JSON parsing.
If you don't want to or can't use third-party utility jq for JSON parsing, I suggest using sed rather than awk, because awk is not a good fit for field-based parsing of this data.
$ sed -E 's/^.*"pm25"[^[]+\[([0-9]+).+"pm10"[^[]+\[([0-9]+).*$/\1 \2/' <<< "$AQIDATA"
59 15
The above should work with both GNU and BSD/OSX sed.
To read the result into variables:
read pm25 pm10 < \
<(sed -E 's/^.*"pm25"[^[]+\[([0-9]+).+"pm10"[^[]+\[([0-9]+).*$/\1 \2/' <<< "$AQIDATA")
Note how I've chosen lowercase variable names, because it's best to avoid all upper-case variables in shell programming, so as to avoid conflicts with special shell and environment variables.
If you can't rely on the order of the values in the source string, use two separate sed commands:
pm25=$(sed -E 's/^.*"pm25"[^[]+\[([0-9]+).*$/\1/' <<< "$AQIDATA")
pm10=$(sed -E 's/^.*"pm10"[^[]+\[([0-9]+).*$/\1/' <<< "$AQIDATA")
awk to the rescue!
If you have to, you can use this hacky way using smart counters with hand-crafted delimiters. Setting RS instead of FS transfers looping through fields to awk itself. Multi-char RS is not available for all awks (gawk supports it).
$ awk -v RS='[:,[]' '$0=="\"pm25\""{c=4} c&&!--c' file
59
$ awk -v RS='[:,[]' '$0=="\"pm10\""{c=4} c&&!--c' file
15

Line continuation in command substitution

Given this short code snippet:
set a [list \
foo \
bar
]
When I remove the \ after list I get invalid command name "foo" and when I remove the \ after foo I get invalid command name "bar".
However, the code as I have put it runs fine and I do not get something like invalid command name "]".
Why is no \ required after bar?
The contents of a […] sequence is actually a script, not just a command (though people mostly don't take advantage of this because they're not crazy). Putting extra blank lines at the end of the script doesn't change what the result of the script is: it's the result of the last command in the script (unless the script is empty, when it is the empty string).
Leaving the \ characters out means that there's a multi-command script in there, which is legal (though you've not created the commands foo or bar). Having them in means that you've got an overall script that is equivalent to:
set a [list foo bar
]
With exactly those spaces. That's usually practically identical to this:
set a [list foo bar]
But the difference is there.
There is no error message because the recursive evaluation ends at the close bracket. It's not part of the command substitution script. As far as the recursive evaluation in the command substitution is concerned, that's just a list command with some white space following it.

if all else fails tcl script fails

I am trying to make a script to transfer file to another device. Since I cannot account for every error that may occur, I am trying to make an if-all-else fails situation:
spawn scp filename login@ip:filename
expect "word:"
send "password"
expect {
"100" {
puts "success"
} "\*" {
puts "Failed"
}
}
This always returns a Failed message and does not even transfer the file, where as this piece of code:
spawn scp filename login@ip:filename
expect "word:"
send "password"
expect "100"
puts "success"
shows the transfer of the file and prints a success message.
I can't understand what is wrong with my if-expect statement in the first piece of code.
The problem is because of \*. The backslash will be translated by Tcl, thereby making the \* into * alone which is then passed to expect as
expect *
As you know, * matches anything. This is like saying, "I don't care what's in the input buffer. Throw it away." This pattern always matches, even if nothing is there. Remember that * matches anything, and the empty string is anything! As a corollary of this behavior, this command always returns immediately. It never waits for new data to arrive. It does not have to since it matches everything.
I don't know why you have used *. Suppose, if your intention is to match literal asterisk sign, then use \\*.
The string \\* is translated by Tcl to \*. The pattern matcher then interprets the \* as a request to match a literal *.
expect "*" ;# matches * and ? and X and abc
expect "\*" ;# matches * and ? and X and abc
expect "\\*" ;# matches * but not ? or X or abc
Just remember two rules:
Tcl translates backslash sequences.
The pattern matcher treats backslashed characters as literals.
Note: apart from the question, one observation. You refer to your expect block as an if-else block, but it is not the same thing.
In a traditional if-else block, we know for sure that exactly one branch will be executed. With expect that is not the case: if no pattern matches before the timeout, none of the actions run. It is more like a series of independent if blocks.

Output of array as comma separated BASH

I'm trying to pull variables from an API in json format and then put them back together with one variable changed and fire them back as a put.
Only issue is that every value has quote marks in it and must go back to the API separated by commas only.
example of what it should see with redacted information, variables inside the **'s:
curl -skv -u redacted:redacted -H Content-Type: application/json -X PUT -d'{properties:{basic:{request_rules:[**"/(req) testrule","/test-body","/(req) test - Admin","test-Caching"**]}}}' https://x.x.x.x:9070/api/tm/1.0/config/active/vservers/xxx-xx
Obviously if I fire them as a plain array I get spaces instead of commas. However I tried outputting it as a plain string
longstr=$(echo ${valuez[@]})
output=$(echo $longstr |sed -e 's/" /",/g')
And due to the way bash is interpreted it seems to either interpret the quotes wrong or something else. I guess it might well be the single ticks encapsulating after the PUT -d as well but I'm not sure how I can throw a variable into something that has single ticks.
If I put the raw data in manually it works so it's either the way the variable is being sent or the single ticks. I don't get an error and when I echo the line out it looks perfect.
Any ideas?
valuez=( "/(req) testrule" "/test-body" "/(req) test - Admin" "test-Caching" )
# Temporarily set IFS to some character which is known not to appear in the array.
oifs=$IFS
IFS=$'\014'
# Flatten the array with the * expansion giving a string containing the array's elements separated by the first character of $IFS.
d_arg="${valuez[*]}"
IFS=$oifs
# If necessary, quote or escape embedded quotation marks. (Implementation-specific, using doubled double quotes as an example.)
d_arg="${d_arg//\"/\"\"}"
# Substitute the known-to-be-absent character for the desired quote+separator+quote.
d_arg="${d_arg//$'\014'/\",\"}"
# Prepend and append quotes.
d_arg="\"$d_arg\""
# insert the prepared arg into the final string.
d_arg="{properties:{basic:{request_rules:[${d_arg}]}}}"
curl ... -d"$d_arg" ...
If you have GNU awk version 4 or later, which supports FPAT:
output=$(echo $longstr |awk '$1=$1' FPAT="(\"[^\"]+\")" OFS=",")
Explanation
FPAT
This is a regular expression (as a string) that tells gawk to create the fields based on text that matches the regular expression. Assigning a value to FPAT overrides the use of FS and FIELDWIDTHS for field splitting. See Splitting By Content, for more information.
If gawk is in compatibility mode (see Options), then FPAT has no special meaning, and field-splitting operations occur based exclusively on the value of FS.
valuez=( "/(req) testrule" "/test-body" "/(req) test - Admin" "test-Caching" )
csv="" sep=""
for v in "${valuez[@]}"; do csv+="$sep\"$v\""; sep=,; done
echo "$csv"
"/(req) testrule","/test-body","/(req) test - Admin","test-Caching"
If it's something you need to do repeatedly, put it into a function:
toCSV () {
local csv sep val
for val; do
csv+="$sep\"$val\""
sep=,
done
echo "$csv"
}
csv=$(toCSV "${valuez[@]}")
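Another common bash idiom for the same join, sketched here as an alternative rather than taken from the answers above, relies on printf reusing its format string for every argument. The variable names joined and payload are made up for the example; the payload shape is copied from the curl command in the question.

```shell
#!/bin/bash
valuez=( "/(req) testrule" "/test-body" "/(req) test - Admin" "test-Caching" )

# printf repeats the format for each element, producing "elem", every time.
printf -v joined '"%s",' "${valuez[@]}"
joined=${joined%,}   # drop the single trailing comma

# Build the -d argument in double quotes so the variable is expanded.
payload="{properties:{basic:{request_rules:[${joined}]}}}"
echo "$payload"
```

The result can then be passed as curl ... -d "$payload" ..., which avoids the single-tick problem because the variable is expanded inside double quotes. Note that an empty array would yield a single "" here, so that case may need its own check.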