Same regexp won't match on another machine - tcl

I have this Tcl 8.5 code:
set regexp_str {^[[:blank:]]*\[[[:blank:]]*[0-9]+\][[:blank:]]+0\.0\-([0-9]+\.[0-9]+) sec.+([0-9]+\.[0-9]+) ([MK]?)bits/sec[[:blank:]]*$}
set subject {
[ 5] 0.0- 1.0 sec 680 KBytes 5.57 Mbits/sec
[ 5] 0.0-150.0 sec 153 MBytes 8.56 Mbits/sec
[ 4] 0.0- 1.0 sec 0.00 Bytes 0.00 bits/sec
[ 4] 0.0-150.4 sec 38.6 MBytes 2.15 Mbits/sec
}
set matches [regexp -line -inline -all -- $regexp_str $subject]
$matches is populated with the matched data on one machine, while on the other it is simply an empty list.
Both machines have Tcl 8.5.
Using the -about flag of regexp, the following list is returned: 3 {REG_UUNPORT REG_ULOCALE}
I don't understand how this is possible. What else should I do to debug it?
Edit #1, 17 Feb 07:00 UTC:
@Donal Fellows:
The patch level on the "good" machine is 8.5.15.
The patch level on the "bad" machine is 8.5.10.
I'm familiar with \s and \d, but as far as I know (please correct me), they both match a broader range of characters than I need:
\s includes newlines, which must not exist in my example.
\d includes Unicode digits, which I will not encounter in my example.
In regexps I generally prefer to be as specific as possible, to avoid cases I didn't think of.
There's something I didn't mention that could be important:
The variable $subject is populated from the expect_out(buffer) variable, after a grep command executed in the shell.
expect_out(buffer) holds the output of an ssh session that is tunneled through a proxy called netcat (the binary name is nc):
spawn ssh -o "ProxyCommand nc %h %p" "$username@$ipAddress"
In general, the output received and sent on this session consists only of ASCII/English characters.
The prompt of the destination PC contains control characters like ESC and BEL, and they are contained in $subject.
I don't think that is a problem, because I tested the regular expression with all of these characters and it worked OK.
Thank you guys for the elaborated info!
Edit #2, 17 Feb 11:05 UTC:
Response to @Donal Fellows:
Indeed I've tried:
set regexp_str {^[[:blank:]]*\[[[:blank:]]*[0-9]+\][[:blank:]]+0\.0\-([0-9]+\.[0-9]+) sec.+([0-9]+\.[0-9]+) ([MK]?)bits/sec[[:blank:]]*$}
puts [regexp -line -inline -all -- $regexp_str [string map {\r\n \n \r \n} $subject]]
and got (please ignore the different numbers in the output, the idea is the same):
{[ 5] 0.0-150.0 sec 86.7 MBytes 4.85 Mbits/sec} 150.0 4.85 M {[ 4] 0.0-150.8 sec 60.4 MBytes 3.36 Mbits/sec} 150.8 3.36 M
I also tried replacing the [[:blank:]] at both ends of the regexp string with \s:
set regexp_str {^\s*\[[[:blank:]]*[0-9]+\][[:blank:]]+0\.0\-([0-9]+\.[0-9]+) sec.+([0-9]+\.[0-9]+) ([MK]?)bits/sec\s*$}
puts [regexp -line -inline -all -- $regexp_str $subject]
and it finally found what I needed:
{[ 5] 0.0-150.0 sec 86.7 MBytes 4.85 Mbits/sec
} 150.0 4.85 M {[ 4] 0.0-150.8 sec 60.4 MBytes 3.36 Mbits/sec
} 150.8 3.36 M

Tcl uses the same regular expression engine on all platforms. (But double-check whether you've got the same patchlevel on the two machines; that'll let us examine what exact code changes, if any, there are between the systems.) It also shouldn't be anything related to newline terminators; Tcl automatically normalizes them under anything even remotely resembling normal circumstances (and in particular, does so in scripts).
With respect to the -about flag, only the 3 is useful (it's the number of capture groups). The other item in the list is the set of flags the RE compiler records about the RE, and frankly they're only useful to real RE experts (and our test suite). I've never found a use for them!
You can probably shorten your RE by using \s (mnemonically “spaces”) instead of that cumbersome [[:blank:]] and \d (“digits”) instead of [0-9]. When I do that, I get something quite a lot shorter and so easier to understand.
set regexp_str {^\s*\[\s*\d+\]\s+0\.0-(\d+\.\d+) sec.+(\d+\.\d+) ([MK]?)bits/sec\s*$}
It produces the same match groups.
[EDIT]: Even with the exact version of the code you report, checked out directly from the source code repository tag that was used to build the 8.5.10 distribution, I can't reproduce your problem. However, the fact that the string is actually coming from an Expect buffer is really helpful; the problem may well be that the line separation sequence is not a newline but something else (CRLF, i.e. \r\n, is the number 1 suspect, but a plain carriage return could also be there). Expect is definitely not the same as normal I/O for various reasons (in particular, exact byte sequences are often needed in terminal handling).
The easiest thing might be to manually standardize the line separators before feeding the string into regexp. (This won't affect the string in the buffer; it copies, as usual with Tcl.)
regexp -line -inline -all -- $regexp_str [string map {\r\n \n \r \n} $subject]
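To see why line endings are the prime suspect, here is a small sketch that runs the question's original pattern against a single iperf line before and after that normalization. The CRLF-terminated sample line is an assumption about what the Expect buffer contains, not data taken from the real session.

```tcl
# The question's original pattern: only [[:blank:]] (space or tab) may
# appear between "bits/sec" and the end of the line.
set re {^[[:blank:]]*\[[[:blank:]]*[0-9]+\][[:blank:]]+0\.0\-([0-9]+\.[0-9]+) sec.+([0-9]+\.[0-9]+) ([MK]?)bits/sec[[:blank:]]*$}

# Assumed input: one iperf line terminated by CRLF, as a terminal session
# driven by Expect may deliver it.
set crlf "\[ 5\] 0.0-150.0 sec 153 MBytes 8.56 Mbits/sec\r\n"

# The \r is neither blank nor a line end, so the pattern cannot match.
set before [regexp -line -inline -all -- $re $crlf]

# Normalizing CRLF and lone CR to LF lets the same pattern match.
set after [regexp -line -inline -all -- $re [string map {\r\n \n \r \n} $crlf]]

puts "matches before normalizing: [llength $before]"
puts "matches after normalizing:  $after"
```

This is also consistent with the question's second edit: \s matches a carriage return, while [[:blank:]] only matches spaces and tabs, so swapping the trailing [[:blank:]]* for \s* let the pattern absorb the stray \r.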
It's also possible that there are other, invisible characters in the output. Working out what is really going on can be complex, but in general you can test this theory with a regular expression that checks whether anything outside the set of expected characters is present:
regexp {[^\n [:graph:]]} $subject
When I try with what you pasted, that doesn't match (good!). If it does against your real buffer, it gives you a way to hunt the problem.
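If that test does match, a helper along these lines can show where each unexpected character sits and what its code is, which usually makes stray carriage returns or terminal escape sequences obvious. oddChars is a hypothetical name for illustration, not a Tcl command; it reuses the same character class as the test above.

```tcl
# Hypothetical helper (not part of Tcl): list the index and hex character
# code of every character that is not a newline, a space or a printable glyph.
proc oddChars {s} {
    set found [list]
    # -indices makes each match come back as a {start end} pair
    foreach span [regexp -all -inline -indices {[^\n [:graph:]]} $s] {
        set idx [lindex $span 0]
        scan [string index $s $idx] %c code
        lappend found [list $idx [format 0x%02X $code]]
    }
    return $found
}

# Example: a CRLF line ending followed by an ESC sequence (as in a coloured prompt)
puts [oddChars "ok\r\n\x1b\[0m"]
```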

I saw that you are missing optional space(s) right after the first dash. I inserted those optional spaces in and all is working:
set regexp_str {^[[:blank:]]*\[[[:blank:]]*[0-9]+\][[:blank:]]+0\.0\-[[:blank:]]*([0-9]+\.[0-9]+) sec.+([0-9]+\.[0-9]+) ([MK]?)bits/sec[[:blank:]]*$}
# missing --> ^^^^^^^^^^^^
set subject {
[ 5] 0.0- 1.0 sec 680 KBytes 5.57 Mbits/sec
[ 5] 0.0-150.0 sec 153 MBytes 8.56 Mbits/sec
[ 4] 0.0- 1.0 sec 0.00 Bytes 0.00 bits/sec
[ 4] 0.0-150.4 sec 38.6 MBytes 2.15 Mbits/sec
}
set matches [regexp -line -inline -all -- $regexp_str $subject]
puts "\n\n"
foreach {all a b c} $matches {
puts "- All: >$all<"
puts " >$a<"
puts " >$b<"
puts " >$c<"
}
Output
- All: > [ 5] 0.0- 1.0 sec 680 KBytes 5.57 Mbits/sec<
>1.0<
>5.57<
>M<
- All: > [ 5] 0.0-150.0 sec 153 MBytes 8.56 Mbits/sec<
>150.0<
>8.56<
>M<
- All: > [ 4] 0.0- 1.0 sec 0.00 Bytes 0.00 bits/sec<
>1.0<
>0.00<
><
- All: > [ 4] 0.0-150.4 sec 38.6 MBytes 2.15 Mbits/sec<
>150.4<
>2.15<
>M<
Update
When dealing with a complex regular expression, I often break the expression up into several lines and add comments. The following is equivalent to my previous code, but more verbose and easier to troubleshoot. The key is to pass an additional flag to the regexp command: -expanded, which tells regexp to ignore whitespace (outside bracket expressions) and comments in the expression.
set regexp_str {
# Initial blank
^[[:blank:]]*
# Bracket, number, optional spaces, bracket
\[[[:blank:]]*[0-9]+\]
# Spaces
[[:blank:]]+
# Number, dash, number
0\.0\-[[:blank:]]*([0-9]+\.[0-9]+)
# Unwanted stuff
[[:blank:]]sec.+
# Final number, plus unit
([0-9]+\.[0-9]+)[[:blank:]]([MK]?)bits/sec
# Trailing spaces
[[:blank:]]*$
}
set subject {
[ 5] 0.0- 1.0 sec 680 KBytes 5.57 Mbits/sec
[ 5] 0.0-150.0 sec 153 MBytes 8.56 Mbits/sec
[ 4] 0.0- 1.0 sec 0.00 Bytes 0.00 bits/sec
[ 4] 0.0-150.4 sec 38.6 MBytes 2.15 Mbits/sec
}
set matches [regexp -expanded -line -inline -all -- $regexp_str $subject]
puts "\n\n"
foreach {all a b c} $matches {
puts "- All: >$all<"
puts " >$a<"
puts " >$b<"
puts " >$c<"
}

(ETA: the question is about regular expressions, so why am I talking about massaging a string into a list and picking items out of that? See the end of this answer.)
As a workaround, if you don't really need to use a regular expression, this code gives the exact same result:
set result [list]
foreach line [split [string trim $subject] \n] {
set list [string map {- { } / { }} $line]
lappend result \
$line \
[lindex $list 3] \
[lindex $list 7] \
[string map {Mbits M Kbits K bits {}} [lindex $list 8]]
}
The lines aren't strictly well-formed lists because of the brackets, but it does work.
To clarify:
the string trim command takes out the newlines before and after the data: they would otherwise yield extra empty elements
the split command creates a list of four elements, each corresponding to a line of data
the foreach command processes each of those elements
the string map command changes each - or / character into a space, essentially making it a (part of a) list item separator
the lappend command incrementally builds the result list out of four items per line of data: the whole line, the fourth item in the corresponding list, the eighth item in the corresponding list, and the ninth item in the corresponding list after the string map command has shortened the strings Mbits, Kbits, and bits to M, K, and the empty string, respectively.
The thing is (moderate rant warning): regular expression matching isn't the only tool in the string analysis toolbox, even though it sometimes looks that way. Tcl itself is, among other things, a powerful string and list manipulation language, and usually far more readable than REs. There is also, for instance, scan: the scan format "[ %*d] %*f- %f sec %*f %*s %f %s" captures the relevant fields out of the data strings (provided they are split into lines and processed separately); all that remains is to look at the last captured string to see whether it begins with M, K, or something else (which would be b). This code gives the same result as my solution above and as your example, except that %f stores its values as doubles, so a field such as 0.00 comes back as 0.0:
set result [list]
foreach line [split [string trim $subject] \n] {
scan $line "\[ %*d\] %*f- %f sec %*f %*s %f %s" a b c
lappend result $line $a $b [string map {its/sec {} Mb M Kb K b {}} $c]
}
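A slightly more defensive variant of the scan approach, sketched as a hypothetical parseIperf proc: scan returns the number of conversions it stored (or -1 if the input runs out first), so checking for 3 quietly skips blank lines and anything else that does not fit the format, such as the prompt text that can end up in an Expect buffer.

```tcl
# Hypothetical helper: returns {line seconds rate unit ...} for every line
# of $subject that fits the iperf report format; other lines are skipped.
proc parseIperf {subject} {
    set result [list]
    foreach line [split $subject \n] {
        # Anything other than 3 stored conversions means the line did not fit
        if {[scan $line "\[ %*d\] %*f- %f sec %*f %*s %f %s" secs rate unit] != 3} {
            continue
        }
        # Mbits/sec -> M, Kbits/sec -> K, bits/sec -> empty string
        lappend result $line $secs $rate [string map {its/sec {} Mb M Kb K b {}} $unit]
    }
    return $result
}

set subject "\[ 5\] 0.0-150.0 sec 153 MBytes 8.56 Mbits/sec\nnot a report line\n\[ 4\] 0.0- 1.0 sec 0.00 Bytes 0.00 bits/sec\n"
foreach {line secs rate unit} [parseIperf $subject] {
    puts "$secs $rate >$unit<"
}
```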
Regular expressions are very useful, but they are also hard to get right and to debug when they aren't quite right, and even when you've got them right they're still hard to read and, in the long run, to maintain. Since in very many cases they are actually overkill, it makes sense to at least consider if other tools can't do the job instead.

Related

Why expect does not match as expected?

Judging by the output of the debugging mode, expect does not match as expected in my case, and I don't understand why...
The relevant part of the Tcl script looks like this:
...
set index 0
set p [lindex $propname $index]
send "prove -property {<embedded>::wrapper.$p}\r"
expect {
"*proven\r\n\[<embedded>\] % " {
incr index
if {$index == [llength $propname]} {
send "exit\r"
expect "*bash-4.2$ "
send "exit\r"
close
}
set p [lindex $propname $index]
send "prove -property {<embedded>::wrapper.$p}\r"
exp_continue
}
"*cex\r\n\[<embedded>\] % " {
send "visualize -violation -property <embedded>::wrapper.$p\r"
expect "*\[<embedded>\] % "
send "visualize -save -vcd cex.vcd -force\r"
}
}
...
From the output of the debugging mode:
expect: does "prove -property {<embedded>::wrapper.x0_nouse}\r\nINFO (IPF031): Settings used for this proof:\r\n time_limit = 86400s\r\n per_property_time_limit = 1s * 10 ^ scan\r\n engine_mode = Hp Ht N B \r\n proofgrid_per_engine_max_jobs = 1\r\n proofgrid_mode = local\r\n proofgrid_restarts = 10\r\nINFO (IPF036): Starting proof on task: "<embedded>", 1 properties to prove with 0 already proven/unreachable\r\nINFO (IRS029): Starting reset analysis: phase 1 of 4.\r\nINFO (IRS030): Running reset analysis phase 2 of 4.\r\nINFO (IRS031): Running reset analysis phase 3 of 4.\r\nINFO (IRS020): Starting the reset analysis simulation with a limit of 100 iterations (phase 4 of 4).\r\nINFO (IRS024): Reset iterations 0 to 4 analyzed.\r\nINFO (IRS018): Reset analysis simulation executed for 3 iterations. Assigned values for 280 of 4626 design flops, 0 of 32 design latches, 136 of 2696 internal elements.\r\nUsing multistage preprocessing\r\nStarting reduce\r\nFinished reduce in 0.192s\r\n0.PRE: A proof was found: No trace exists. [0.00 s]\r\nINFO (IPF057): 0.PRE: The property "wrapper.x0_nouse" was proven in 0.00 s.\r\nFound proofs for 1 properties in preprocessing\r\nINFO (IPF059): Completed proof on task: <embedded>\r\nproven\r\n[<embedded>] % " (spawn_id exp4) match glob pattern "*proven\r\n[<embedded>] % "? no
"*cex\r\n[<embedded>] % "? no
Sorry the line is a bit long, but if you scroll all the way to the right, you will see that the end of that line is exactly what is expected in the first case.
You are providing a glob pattern and rightly need to escape the [] so that it is not executed by tcl as a command. You need to further escape [] so that [abc] matches the literal string [abc] and not just a character from the set abc.
However, when quoting with "", the backslash itself needs to be escaped too, so your glob pattern needs to be:
"proven\r\n\\\[<embedded>\\\] % "
As an alternative you can try for an exact match without a glob
-exact "proven\r\n\[<embedded>\] % "
You can also use {} instead of "" but then your \r and so on will not be converted.
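A quick way to check the escaping without spawning a process is Tcl's own string match. This sketch assumes Expect's glob patterns treat brackets and backslashes the same way string match does; the sample buffer is made up. Since string match has to match the whole string, both patterns start with *.

```tcl
# Made-up tail of an expect buffer ending in the tool's prompt
set buffer "Completed proof\r\nproven\r\n\[<embedded>\] % "

# One level of escaping: Tcl turns \[ into [, so the glob sees [<embedded>],
# a character class that matches a single character from that set.
set once "*proven\r\n\[<embedded>\] % "

# Two levels: Tcl turns \\\[ into \[, which the glob treats as a literal [.
set twice "*proven\r\n\\\[<embedded>\\\] % "

puts [string match $once $buffer]   ;# 0
puts [string match $twice $buffer]  ;# 1
```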

How to pass arguments to tcl scripts when using tclsh

This is Tcl code that is meant to compute the factorial of a number given as a parameter by the user.
if {$argc !=1}{
puts stderr "Error! ns called with wrong number of arguments! ($argc)"
exit 1
} else
set f [lindex $argv 0]
proc Factorial {x}{
for {set result 1} {$x>1}{set x [expr $x - 1]}{
set result [expr $result * $x]
}
return $result
}
set res [Factorial $f]
puts "Factorial of $f is $res"
There is a similar SO question, but it does not appear to directly address my problem. I have double-checked the code for syntax errors, but it still fails when run through tclsh in Cygwin, producing the error:
$ tclsh ext1-1.tcl
extra characters after close-brace
while executing
"if {$argc !=1}{
puts stderr "Error! ns called with wrong number of arguments! ($argc)"
exit 1
} else
set f [lindex $argv 0]
proc Factorial {x}{..."
(file "ext1-1.tcl" line 3)
TCL Code from: NS Simulator for Beginners, Sophia-Antipolis, 2003-2004
Tcl is a little bit more sensitive about whitespace than most languages (though not as much as, say, Python). For instance, you can't add unescaped newlines except between commands, where they act as command separators. Two further rules are that 1) every command must be written in the same manner as a proper list (where the elements are separated by whitespace) and 2) a command invocation must have exactly the number of arguments that the command definition specifies.
Since the invocation must look like a proper list, code like
... {$x>1}{incr x -1} ...
won't work: a list element that starts with an open brace must end with a matching close brace, and there can't be any text immediately following the close brace that matches the initial open brace. (This sounds more complicated than it is, really.)
The number-of-arguments requirement means that
for {set result 1} {$x>1}{incr x -1}{
set result [expr $result * $x]
}
won't work because the for command expects four arguments (start test next body) and it's only getting two: start, and a mashup of the other three (and actually not even that, since the mashup is illegal).
To make this work, the arguments need to be separated:
for {set result 1} {$x>1} {incr x -1} {
set result [expr {$result * $x}]
}
Putting in spaces (or tabs, if you want) makes the arguments legal and correct in number.
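Putting these fixes together, a corrected version of the whole script might look like the sketch below. Besides the spacing, the else body must also be a braced word on the same line as else; otherwise the newline ends the if command before its body. One deliberate change, also noted in the code: the wrong-argument branch prints a usage message instead of calling exit 1, so the file can be sourced for a quick test.

```tcl
# Corrected sketch of ext1-1.tcl: every argument of proc, for and if is a
# separate, whitespace-delimited word.
proc Factorial {x} {
    for {set result 1} {$x > 1} {incr x -1} {
        set result [expr {$result * $x}]
    }
    return $result
}

if {$argc != 1} {
    # Changed from the original: report usage instead of calling exit 1
    puts stderr "usage: tclsh ext1-1.tcl N"
} else {
    set f [lindex $argv 0]
    puts "Factorial of $f is [Factorial $f]"
}
```

Run it as `tclsh ext1-1.tcl 5` to get `Factorial of 5 is 120`.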

Looking for a search string in a file and using those lines for processing in TCL

To be more precise:
I need to look into a file abc.txt whose contents are something like this:
files/f1/atmp.c 98 100
files/f1/atmp1.c 89 100
files/f1/atmp2.c !! 75 100
files/f2/btmp.c 92 100
files/f2/btmp2.c !! 85 100
files/f3/xtmp.c 92 100
The script needs to find "!!" and use those lines to print out the following as output:
atmp2.c 75
btmp2.c 85
Any help?
This should do the trick:
set data {files/f1/atmp.c 98 100
files/f1/atmp1.c 89 100
files/f1/atmp2.c !! 75 100
files/f2/btmp.c 92 100
files/f2/btmp2.c !! 85 100
files/f3/xtmp.c 92 100}
set lines [split $data \n]
foreach line $lines {
set match [regexp {(\S+)\s+!!\s+(\d+)} $line -> file num]
if {$match} {puts "$file $num"}
}
Although regexp has an -all switch, it doesn't help with match variables here: with -all, they only receive the values from the last match. (Combined with -inline, however, -all returns every match as a list.)
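regexp's -all switch does work for this when combined with -inline, which returns the matches as a flat list instead of setting variables. A sketch using the same data string: the list holds the whole match followed by the two captures, once per match, so foreach can step through it three items at a time (the variable named -> just receives the whole match).

```tcl
set data {files/f1/atmp.c 98 100
files/f1/atmp1.c 89 100
files/f1/atmp2.c !! 75 100
files/f2/btmp.c 92 100
files/f2/btmp2.c !! 85 100
files/f3/xtmp.c 92 100}

# With -inline, -all returns every match as one flat list:
# whole match, first capture, second capture, then the next match...
set found [list]
foreach {-> name num} [regexp -all -inline {([^\s/]+)\s+!!\s+(\d+)} $data] {
    lappend found $name $num
    puts "$name $num"
}
```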
If your file isn't huge, you can slurp the whole thing into memory, split the lines into a Tcl list, and then iterate through the list looking for a match. For example:
set fh [open abc.txt]
set lines [read $fh]
close $fh
set lines [split $lines "\n"]
foreach line $lines {
if { [regexp {.*/(\S+\.c)\s*!!\s*(\d+)} $line match file data] } {
puts "$file $data"
}
}
This will successfully return just the lines with "!!" in them. With your posted corpus, the results are:
atmp2.c 75
btmp2.c 85
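I might be tempted in this case to exec awk (awk's split() strips the directory part of the path, matching the requested output):
set output [exec awk {$2 == "!!" {n = split($1, p, "/"); print p[n], $3}} abc.txt]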
puts $output
The trick is to combine the code that reads lines from the file with a regular expression that detects matching lines and extracts the relevant parts (a one-step process with regexp). The only tricky part is working out what exactly to use as the regular expression, so that you get exactly what you want. I'm going to guess that you're after the parts of the filenames after the /, that those filenames won't contain spaces, and that the number you're after is the entirety of the first digit sequence after the double exclamation. (Other formats are possible, some of which are easier to extract with other tools such as scan.) That would give us something like this:
set f [open abc.txt]
while {[gets $f line] >= 0} {
if {[regexp {([^\s/]+)\s+!!\s+(\d+)} $line -> name value]} {
# Or do whatever you want with these
puts "$name $value"
}
}
close $f
(The gets command with two arguments returns the length of line read, or -1 on failure. For normal files the only failure mode is EOF, so we can just terminate the loop when we get a negative value. Other kinds of channels can be more complex…)

How to manipulate each line in a file in TCL

I'm trying to write some data from iperf to a file using a Tcl script. The file has more than 100 lines. I need to skip the first 10 lines, print the next 10, skip the next 10, print the 10 after that, and keep going until I reach the end of the file. How could I do this programmatically?
exec c:\\iperf_new\\iperf -c $REF_WLAN_IPAddr -f m -w 2M -i 1 -t $run_time > xx.txt
set fp [open "xx.txt" r]
set file_data [read $fp]
close $fp
set data [split $file_data "\n"]
foreach line $data {
if {[regexp {(MBytes) +([0-9\.]*)} $line match pre tput] == 1} {
puts "Throughput: $tput Mbps"
}
}
Well, as your example shows, you have found out how to split a (slurped) file into lines and process them one-by-one.
Now what's the problem with implementing "skip ten lines, process ten lines, skip another ten lines, etc."? It's just a matter of using a variable that counts the lines seen so far and selecting a branch of code based on its value. There's nothing Tcl-specific about this approach: there are commands available to count, to conditionally select branches of code, and to control looping.
If branching on the current value of a line counter looks too lame, you could implement a state machine around that counter variable. But for this simple case that looks like over-engineering.
Another approach would be to pick the necessary series of lines out of the list returned by split using lrange. This approach can make use of a nice property of lrange: it can be told to return a sublist "from this index to the end of the list", so the solution really boils down to:
set lines [split [read $fd] \n]
parse_header [lrange $lines 0 9]
puts [join [lrange $lines 10 19] \n]
parse_something_else [lrange $lines 20 29]
puts [join [lrange $lines 30 end] \n]
For a small file this solution looks pretty compact and clean.
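For a file of arbitrary length, the same lrange idea can be put in a loop that steps over the list two blocks at a time. This is a sketch; keepAlternateBlocks is a hypothetical name, and the demo uses generated lines instead of the iperf output file.

```tcl
# Hypothetical helper: skip one block of lines, keep the next, and so on.
# With the default blockSize of 10 this keeps lines 11-20, 31-40, 51-60, ...
proc keepAlternateBlocks {lines {blockSize 10}} {
    set kept [list]
    set step [expr {2 * $blockSize}]
    for {set i $blockSize} {$i < [llength $lines]} {incr i $step} {
        # lrange treats an end index past the last element as "end",
        # so a short final block needs no special handling
        lappend kept {*}[lrange $lines $i [expr {$i + $blockSize - 1}]]
    }
    return $kept
}

# Demo with generated lines instead of the iperf output file
set demo [list]
for {set n 1} {$n <= 45} {incr n} {
    lappend demo "line $n"
}
puts [join [keepAlternateBlocks $demo] \n]
```

With real data, the list would come from `split [read $fp] \n` as in the question.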
If I understood you correctly, you want to print lines 11-20, 31-40, 51-60,... The following will do what you want:
package require Tclx
set counter 0
for_file line xx.txt {
if {$counter % 20 >= 10} { puts $line }
incr counter
}
The Tclx package provides a simple way to read lines from a file: the for_file command.

How to ensure my regular expression does not match too much

A file has a few words with numbers at the beginning of them. I want to extract the line with a particular number, but when given 1, it extracts line 1 and also lines 11 and 21.
FILE.txt has contents:
1.sample
lines of
2.sentences
present in
...
...
10.the
11.file
When I execute pro 1 file.txt,
it gives results from lines 1, 10 and 11,
as all three of them contain 1, i.e.
Output of the script:
1.sample
10.the
11.file
Expected output: the output I am expecting
is only the contents of line 1, not line 10 or line 11,
i.e.
Expected output:
1.sample
My current code:
proc pro {pattern filename} {
set file [open $filename r]
set lnum 0
set occ 0
while {[gets $file line] >=0} {
incr lnum
if {[regexp $pattern $line]} {
incr occ
puts "The pattern is present in line: $lnum"
puts "$line"
} else {
puts "not found"
}
}
puts "total number of occurrences: $occ"
close $file
}
The program works, but I am also retrieving lines that I don't want along with the expected line. Because the number I want (1) is also present in other strings such as 11, 21, 14, etc., those lines are printed too.
Please bear with my unclear explanation of the question.
You can solve the problem using word boundaries as suggested by glen, but you can also consider the following:
If every line number is followed by a ., you can use it as a delimiter in the regular expression:
regexp "^$pattern\\." $line
I would also suggest using ^ (match at the beginning of the line) so that the number is not counted when it appears elsewhere in the line.
Word boundaries are well explained at http://www.regular-expressions.info/wordboundaries.html (in Tcl they are written \m and \M, or \y for either).
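Combining the ^ anchor with the trailing dot, the matching part can be sketched as a hypothetical linesNumbered proc that works on a list of lines rather than a file. It assumes the number consists of plain decimal digits, so it contains no regexp metacharacters, and rejects anything else.

```tcl
# Hypothetical helper: return the lines that begin with "<num>."
proc linesNumbered {num lines} {
    # Digits only, so it is safe to interpolate into the regexp
    if {![string is digit -strict $num]} {
        error "expected a decimal number, got \"$num\""
    }
    set hits [list]
    foreach line $lines {
        # "^$num\\." becomes e.g. ^1\. : the number at line start, then a literal dot
        if {[regexp "^$num\\." $line]} {
            lappend hits $line
        }
    }
    return $hits
}

set sample {1.sample {lines of} 2.sentences {present in} 10.the 11.file}
puts [linesNumbered 1 $sample]
```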
You have to ensure your pattern matches only between word boundaries:
if {[regexp "\\m$pattern\\M" $line]} { ...
See the documentation for regular expression syntax.
If what you're looking to do is as constrained as what you're describing, why not just use something like
if { [string range $line 0 [string length $pattern]] eq "${pattern}." } {
...
}