Expect does not match the pattern I think it should, judging by the debug output, and I don't understand why.
The relevant part of the Tcl script looks like this:
...
set index 0
set p [lindex $propname $index]
send "prove -property {<embedded>::wrapper.$p}\r"
expect {
"*proven\r\n\[<embedded>\] % " {
incr index
if {$index == [llength $propname]} {
send "exit\r"
expect "*bash-4.2$ "
send "exit\r"
close
}
set p [lindex $propname $index]
send "prove -property {<embedded>::wrapper.$p}\r"
exp_continue
}
"*cex\r\n\[<embedded>\] % " {
send "visualize -violation -property <embedded>::wrapper.$p\r"
expect "*\[<embedded>\] % "
send "visualize -save -vcd cex.vcd -force\r"
}
}
...
From the output of the debugging mode:
expect: does "prove -property {<embedded>::wrapper.x0_nouse}\r\nINFO (IPF031): Settings used for this proof:\r\n time_limit = 86400s\r\n per_property_time_limit = 1s * 10 ^ scan\r\n engine_mode = Hp Ht N B \r\n proofgrid_per_engine_max_jobs = 1\r\n proofgrid_mode = local\r\n proofgrid_restarts = 10\r\nINFO (IPF036): Starting proof on task: "<embedded>", 1 properties to prove with 0 already proven/unreachable\r\nINFO (IRS029): Starting reset analysis: phase 1 of 4.\r\nINFO (IRS030): Running reset analysis phase 2 of 4.\r\nINFO (IRS031): Running reset analysis phase 3 of 4.\r\nINFO (IRS020): Starting the reset analysis simulation with a limit of 100 iterations (phase 4 of 4).\r\nINFO (IRS024): Reset iterations 0 to 4 analyzed.\r\nINFO (IRS018): Reset analysis simulation executed for 3 iterations. Assigned values for 280 of 4626 design flops, 0 of 32 design latches, 136 of 2696 internal elements.\r\nUsing multistage preprocessing\r\nStarting reduce\r\nFinished reduce in 0.192s\r\n0.PRE: A proof was found: No trace exists. [0.00 s]\r\nINFO (IPF057): 0.PRE: The property "wrapper.x0_nouse" was proven in 0.00 s.\r\nFound proofs for 1 properties in preprocessing\r\nINFO (IPF059): Completed proof on task: <embedded>\r\nproven\r\n[<embedded>] % " (spawn_id exp4) match glob pattern "*proven\r\n[<embedded>] % "? no
"*cex\r\n[<embedded>] % "? no
Sorry the line is a bit long, but if you scroll all the way to the right, you will see that the end of that line is exactly what the first pattern expects.
You are providing a glob pattern and rightly need to escape the [] so that it is not executed by tcl as a command. You need to further escape [] so that [abc] matches the literal string [abc] and not just a character from the set abc.
However, when quoting with "", the backslash needs to be escaped too, so you need as your glob pattern
"proven\r\n\\\[<embedded>\\\] % "
As an alternative you can try for an exact match without a glob
-exact "proven\r\n\[<embedded>\] % "
You can also use {} instead of "" but then your \r and so on will not be converted.
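The three quoting styles can be compared without spawning anything. This is a minimal sketch that assumes Expect's glob matching follows the same rules as Tcl's string match (apart from anchoring, which is why the patterns here start with *). The buffer variable is a shortened stand-in for the real tool output.

```tcl
# Shortened stand-in for what Expect holds in its buffer.
set buffer "INFO (IPF059): Completed proof on task: <embedded>\r\nproven\r\n\[<embedded>\] % "

# One level of escaping: Tcl strips the backslashes, so the glob engine
# sees [<embedded>] and treats it as a character set.
set unescaped [string match "*proven\r\n\[<embedded>\] % " $buffer]

# Three backslashes: Tcl turns \\\[ into \[, which glob reads as a literal [.
set escaped [string match "*proven\r\n\\\[<embedded>\\\] % " $buffer]

# Braces: the brackets are escaped for glob, but \r and \n are no longer
# converted to CR and LF, so the line ending does not match.
set braced [string match {*proven\r\n\[<embedded>\] % } $buffer]

puts "unescaped=$unescaped escaped=$escaped braced=$braced"
```

Only the triple-backslash form reports a match (it prints unescaped=0 escaped=1 braced=0).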
When running commands interactively at the tclsh command line, is there a way to truncate how much of a return value gets printed to stdout?
For example, this will take a very long time because the return value will print to stdout.
tclsh> set a [lrepeat 500000000 x]
I know I can add a dummy command on the same line, but this is an ad hoc solution. Is there something I could set in my ~/.tclshrc to truncate stdout to a finite length?
tclsh> set a [lrepeat 500000000 x] ; puts ""
Maybe this is an XY-problem (as turning off or swallowing prints to stdout seems to satisfy the OP), but the actual question was:
Is there something I could set in my ~/.tclshrc to truncate stdout to a finite length?
You can use an interceptor on stdout (and/ or, stderr) to cap strings to a default limit:
oo::class create capped {
variable max
constructor {m} {
set max $m
}
method initialize {handle mode} {
if {$mode ne "write"} {error "can't handle reading"}
return {finalize initialize write}
}
method finalize {handle} {
# NOOP
}
method write {handle bytes} {
if {[string length $bytes] > $max} {
set enc [encoding system]
set str [encoding convertfrom $enc $bytes]
set newStr [string range $str 0 $max-1]
if {[string index $str end] eq "\n"} {
append newStr "\n"
}
set bytes [encoding convertto $enc $newStr]
}
return $bytes
}
}
Using chan push and chan pop you may turn on/off capping to, e.g., 30 characters:
% chan push stdout [capped new 30]
serial1
% puts [string repeat € 35]
€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€
% chan pop stdout
% puts [string repeat € 35]
€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€
Some remarks:
You can use an object, a namespace, or a proc offering the required interface of channel interceptors (initialize, write, ...); I prefer objects.
Ad write: You want to cap based on a character-based limit, not a byte-level one. However, write receives a string of bytes, not a string of characters. So, you need to be careful when enforcing the limit (back-transform the byte string into a char string, and vice versa, using encoding convertfrom and encoding convertto).
Similarly, consider whether certain values of max are a poor choice and whether the value range should be restricted. E.g., a max of 1 or 0 will effectively turn off the basic REPL (the prompt % ).
As for tclshrc: You may want to place the interceptor definition and the chan push call in it, to enable capping by default.
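For the ~/.tclshrc case, a proc-based interceptor keeps the file short. This is only a sketch: capTransform is a made-up name, the limit of 200 characters is arbitrary, chan push needs Tcl 8.6, and the code assumes (like the class above) that stdout uses the system encoding.

```tcl
# Hypothetical ~/.tclshrc fragment; capTransform is an illustrative name.
# chan push calls the command prefix as: prefix subcommand handle ?arg?
proc capTransform {max op handle args} {
    switch -- $op {
        initialize { return {initialize finalize write} }
        finalize   { return }
        write {
            set bytes [lindex $args 0]
            set enc [encoding system]
            set str [encoding convertfrom $enc $bytes]
            if {[string length $str] > $max} {
                set capped [string range $str 0 [expr {$max - 1}]]
                if {[string index $str end] eq "\n"} {
                    append capped "\n"
                }
                set bytes [encoding convertto $enc $capped]
            }
            return $bytes
        }
    }
}

# Only cap output in interactive sessions.
if {[info exists ::tcl_interactive] && $::tcl_interactive} {
    chan push stdout [list capTransform 200]
}
```

Capping can be switched off again in a running session with chan pop stdout.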
tclsh is a REPL, and the "P" there is what you're seeing. Without digging into the source, I don't know that there's a simple way to accomplish exactly what you're asking.
If I remember to do it, the list command is useful to provide no output
set a [lrepeat 500000000 x]; list
or perhaps something informative
set a [lrepeat 500000000 x]; llength $a
If you want to get programmy:
proc i {val} {set ::tcl_interactive $val}
Then do i off or i 0 or i false to turn off interactivity and then execute the commands with large results. Going non-interactive silences the printing of command results, but it also turns off the prompt which could be confusing. Restore interactivity with i on or i 1 or i true
TCL Program Sample:
proc fun { x } {
puts "$$x = $x"
}
set a 10
fun $a
The program above prints $10 = 10, but I would like to get a = 10 as the output. The procedure should be able to read the name of the variable that was passed in as well as its value. Is there a way to read the variable name?
proc fun name {
upvar 1 $name var
puts "$name = $var"
}
set a 10
fun a
The upvar command takes a name and creates an alias of a variable with that name.
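Because the alias refers to the caller's variable itself, writes go through it as well. A small sketch (the proc name double is only for illustration):

```tcl
# var becomes an alias for whichever variable the caller named.
proc double name {
    upvar 1 $name var
    set var [expr {$var * 2}]
}

set a 10
double a
puts "a = $a"
```

This prints a = 20: the procedure received the name a, not the value 10, and changed the caller's variable.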
Documentation:
proc,
puts,
set,
upvar
If you've got a currently-supported version of Tcl (8.5 or 8.6), you can use info frame -1 to find out some information about the caller. That has all sorts of information in it, but we can do a reasonable approximation like this:
proc fun { x } {
set call [dict get [info frame -1] cmd]
puts "[lindex $call 1] = $x"
}
set a 10
fun $a
# ==> $a = 10
fun $a$a
# ==> $a$a = 1010
Now, the use of lindex there is strictly wrong; it's Tcl code, not a list (and you'll see the difference if you use a complex command substitution). But if you're only ever using a fairly simple word, it works well enough.
% set x a
a
% set a 10
10
% eval puts $x=$$x
a=10
% puts "$x = [subst $$x]"
a = 10
% puts "$x = [set $x]"
a = 10
%
If you are passing the variable to a procedure, then you should rely on upvar.
This is the code in TCL that is meant to produce factorial of a number given as parameter by the user.
if {$argc !=1}{
puts stderr "Error! ns called with wrong number of arguments! ($argc)"
exit 1
} else
set f [lindex $argv 0]
proc Factorial {x}{
for {set result 1} {$x>1}{set x [expr $x - 1]}{
set result [expr $result * $x]
}
return $result
}
set res [Factorial $f]
puts "Factorial of $f is $res"
There is a similar SO question, but it does not appear to directly address my problem. I have double-checked the code for syntax errors, but it does not run successfully under tclsh in Cygwin and produces the error:
$ tclsh ext1-1.tcl
extra characters after close-brace
while executing
"if {$argc !=1}{
puts stderr "Error! ns called with wrong number of arguments! ($argc)"
exit 1
} else
set f [lindex $argv 0]
proc Factorial {x}{..."
(file "ext1-1.tcl" line 3)
TCL Code from: NS Simulator for Beginners, Sophia-Antipolis, 2003-2004
Tcl is a little bit more sensitive about whitespace than most languages (though not as much as, say, Python). For instance, you can't add unescaped newlines except between commands as command separators. Another set of rules are that 1) every command must be written in the same manner as a proper list (where the elements are separated by whitespace) and 2) a command invocation must have exactly the number of arguments that the command definition has specified.
Since the invocation must look like a proper list, code like
... {$x>1}{incr x -1} ...
won't work: a list element that starts with an open brace must end with a matching close brace, and there can't be any text immediately following the close brace that matches the initial open brace. (This sounds more complicated than it is, really.)
The number-of-arguments requirement means that
for {set result 1} {$x>1}{incr x -1}{
set result [expr $result * $x]
}
won't work because the for command expects four arguments (start test next body) and it's only getting two: start, and a mashup of the other three (and actually not even that, since the mashup is illegal).
To make this work, the arguments need to be separated:
for {set result 1} {$x>1} {incr x -1} {
set result [expr {$result * $x}]
}
Putting in spaces (or tabs, if you want) makes the arguments legal and correct in number.
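Both the error and the fix can be reproduced in a few lines. This is a sketch rather than a full rewrite of the original script: the malformed command is kept in a variable and run through eval inside catch, so the parse error is reported instead of aborting the script. The same spacing rule applies to the if {...}{ and proc Factorial {x}{ lines in the question, and the word after else has to start on the same line as else.

```tcl
proc Factorial {x} {
    for {set result 1} {$x > 1} {incr x -1} {
        set result [expr {$result * $x}]
    }
    return $result
}

# The same kind of command with the space between two braced words missing.
set bad {for {set i 0} {$i < 3}{incr i} {}}
catch {eval $bad} msg
puts $msg

puts "Factorial of 5 is [Factorial 5]"
```

The first puts shows the message from the question (extra characters after close-brace); the second prints Factorial of 5 is 120.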
I am new to Tcl and am trying to learn by doing some simple scripting. I set out to write a simple script that generates valid IP addresses from a given starting IP address.
I have managed to write one but have run into two problems:
The last octet gets a zero added in front of the number, as in 192.168.1.025.
When I specify a starting IP such as 250.250.5.1, it fails to generate proper IPs.
Below is my code:
proc generate {start_addr total_addr} {
if {$total_addr == 0} {return}
regexp {([0-9]+\.)([0-9]+\.)([0-9]+\.)([0-9]+)} $start_addr match a b c d
set filename "output.txt"
set fileId [open $filename "a"]
puts $fileId $a$b$c$d
close $fileId
while {$a<255 && $b <255 && $c <255 && $d < 255 } {
set d [expr {$d + 1}];
set filename "output.txt"
set fileId [open $filename "a"]
puts $fileId $a$b$c$d
close $fileId
set total_addr [expr {$total_addr - 1}];
if {$total_addr == 1} {return}
if {$total_addr > 1 && $d == 255} {
set c [expr {$c + 1}];
set d 1
set filename "output.txt"
set fileId [open $filename "a"]
puts $fileId $a$b$c$d
close $fileId
set total_addr [expr {$total_addr - 1}];
}
if {$total_addr > 1 && $c==255 && $d == 255} {
set b [expr {$b + 1}];
set c 1
set d 1
set filename "output.txt"
set fileId [open $filename "a"]
puts $fileId $a$b$c$d
close $fileId
set total_addr [expr {$total_addr - 1}];
}
if {$total_addr > 1 && $b == 255 && $c == 255 && $d == 255} {
set a [expr {$a + 1}];
set b 1
set c 1
set d 1
set filename "output.txt"
set fileId [open $filename "a"]
puts $fileId $a$b$c$d
close $fileId
set total_addr [expr {$total_addr - 1}];
}
}
}
flush stdout
puts "Please enter the starting IPv4 address with . as delimiter EX: 1.1.1.1"
set start_addr [gets stdin]
regexp {([0-9]+\.)([0-9]+\.)([0-9]+\.)([0-9]+)} $start_addr match a b c d
if {$a <= 255 & $b <= 255 & $c <= 255 & $d <= 255} {
puts "this is a valid ip address"
} else {
puts "this not a valid ip address"
}
flush stdout
puts "Please enter the total number of IPv4 address EX: 1000"
set total_addr [gets stdin]
set result [generate $start_addr $total_addr]
For parsing an IP address the simple way, it is better to use scan. If you know C's sscanf() function, Tcl's scan is very similar (in particular, %d matches a decimal number). Like that, we can do:
if {[scan $start_addr "%d.%d.%d.%d" a b c d] != 4} {
error "some components of address are missing"
}
It's a good idea to throw an error when things go wrong. You can catch them later or just let the script exit, depending on what's right for you. (You still need to check the number range.)
More generally, there's a package in Tcllib that does IP address parsing. It is far more complete than you're likely to need, but it's there.
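The scan check and the range check can be combined in one small procedure. This is a sketch under a few assumptions: parseIPv4 is a made-up name, and the trailing %c is an extra added here as a sentinel so that input such as 1.2.3.4.5 is rejected (it only converts if something follows the fourth number).

```tcl
proc parseIPv4 {addr} {
    # Exactly four conversions means four numbers and nothing after them.
    if {[scan $addr "%d.%d.%d.%d%c" a b c d extra] != 4} {
        error "\"$addr\" is not in IPv4 dot-decimal notation"
    }
    foreach octet [list $a $b $c $d] {
        if {$octet < 0 || $octet > 255} {
            error "octet $octet in \"$addr\" is outside 0-255"
        }
    }
    return [list $a $b $c $d]
}

puts [parseIPv4 192.168.1.025]
puts [catch {parseIPv4 250.250.5.300} msg]
```

The first call prints 192 168 1 25 (%d reads 025 as the decimal number 25), and the second prints 1 because the out-of-range octet raises an error.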
Second major thing that you should do? Factor out the code to append a string to a file. It can be a short procedure, short enough that it is obviously right.
proc addAddress {filename address} {
set fileId [open $filename "a"]
puts $fileId $address
close $fileId
}
Then you can replace:
set filename "output.txt"
set fileId [open $filename "a"]
puts $fileId $a$b$c$d
close $fileId
With:
addAddress "output.txt" $a$b$c$d
Less to go wrong. Less noise. (Protip: consider $a.$b.$c.$d there.)
More seriously, your code is just really unlikely to work. It's too complicated. In particular, you should generate one address each time through the loop, and you should concentrate on how to advance the counters right. Using incr to add one to an integer is highly recommended too.
You might try something like this:
incr d
if {$d > 255} {
set d 1
incr c
}
if {$c > 255} {
set c 1
incr b
}
if {$b > 255} {
set b 1
incr a
}
if {$a > 255} {
set a 1
}
But that's less than efficient. We can do better with this:
if {[incr d] > 255} {
set d 1
if {[incr c] > 255} {
set c 1
if {[incr b] > 255} {
set b 1
if {[incr a] > 255} {
set a 1
}
}
}
}
That's better (though actual valid IP addresses have a wider range: you can have a 0 or two in the middle, such as in 127.0.0.1…)
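Put together, the loop generates one address per iteration and advances the counters afterwards. A sketch, with two assumptions that are not in the code above: the octets wrap to 0 rather than 1 (following the closing remark), and the procedure names nextAddress and generateAddresses are invented for the example.

```tcl
# Advance a.b.c.d by one, carrying into the next octet on overflow.
proc nextAddress {a b c d} {
    if {[incr d] > 255} {
        set d 0
        if {[incr c] > 255} {
            set c 0
            if {[incr b] > 255} {
                set b 0
                if {[incr a] > 255} {
                    set a 0
                }
            }
        }
    }
    return [list $a $b $c $d]
}

# Return a list of count addresses, beginning with start itself.
proc generateAddresses {start count} {
    if {[scan $start "%d.%d.%d.%d" a b c d] != 4} {
        error "some components of address are missing"
    }
    set result {}
    for {set i 0} {$i < $count} {incr i} {
        lappend result $a.$b.$c.$d
        lassign [nextAddress $a $b $c $d] a b c d
    }
    return $result
}

puts [generateAddresses 192.168.1.254 3]
```

This prints 192.168.1.254 192.168.1.255 192.168.2.0. Writing the list to a file is then a single call to addAddress per element.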
Splitting the address
Apart from using the ip package in Tcllib, there are a few ways to split up an IPv4 "dot-decimal" address and put the octet values into four variables. The one you used was
regexp {([0-9]+\.)([0-9]+\.)([0-9]+\.)([0-9]+)} $start_addr match a b c d
This basically works, but there are a couple of problems with it. The first problem is that the address 1.234.1.234 will be split up as 1. 234. 1. 234, and then when you try to use the incr command on the first three variables you will get an error message (I suppose that's why you used expr {$x + 1} instead of incr). Instead, write
regexp {(\d+)\.(\d+)\.(\d+)\.(\d+)} $start_addr match a b c d
This expression puts the dots outside the capturing parentheses and places integer values into the variables. It's also a good idea to use the shorthand \d (decimal digit) instead of the [0-9] sets. But you could also do this:
regexp -all -inline -- {\d+} $start_addr
where you simply ask regexp to collect all (-all) unbroken sequences of decimal digits and return them as a list (-inline). Since you get the result as a list, you then need to lassign (list assign) them into variables:
lassign [regexp -all -inline -- {\d+} $start_addr] a b c d
But if you can make do without a regular expression, you should. Donal suggested
scan $start_addr "%d.%d.%d.%d" a b c d
which is fine. Another way is to split the string at the dots:
lassign [split $start_addr .] a b c d
(again you get a list as the result and need to assign it to your variables in a second step).
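The four approaches can be compared side by side. The sketch below uses the address from the question that has a leading zero in its last octet, because that is where they differ: regexp and split hand back substrings, while scan's %d converts to a number. (In Tcl 8.x a string like 025 is treated as octal by expr and incr, so the substring forms need care.)

```tcl
set start_addr 192.168.1.025

regexp {(\d+)\.(\d+)\.(\d+)\.(\d+)} $start_addr -> a b c d
set viaRegexp [list $a $b $c $d]

set viaInline [regexp -all -inline -- {\d+} $start_addr]

set viaSplit [split $start_addr .]

scan $start_addr "%d.%d.%d.%d" a b c d
set viaScan [list $a $b $c $d]

puts "regexp: $viaRegexp"
puts "inline: $viaInline"
puts "split:  $viaSplit"
puts "scan:   $viaScan"
```

The first three print 192 168 1 025; the scan line prints 192 168 1 25.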
Checking the result
As Donal wrote, it's a good idea whenever you create data from user input (and in many other situations as well) to check that you did get what you expected to get. If you use an assigning regexp, the command returns 1 or 0 depending on whether the match succeeded or failed. This result can be plugged directly into an if invocation:
if {![regexp {(\d+)\.(\d+)\.(\d+)\.(\d+)} $start_addr match a b c d]} {
error "input data didn't match IPv4 dot-decimal notation"
}
Donal already gave an example of checking the result of scan. In this case you check against 4 since the command returns the number of successful matches it managed.
if {[scan $start_addr "%d.%d.%d.%d" a b c d] != 4} {
error "input data didn't match IPv4 dot-decimal notation"
}
If you use either of the list-creating commands (inline regexp or split) you can check the list length of the result:
if {[llength [set result [split $start_addr .]]] == 4} {
lassign $result a b c d
} else {
error "input data didn't match IPv4 dot-decimal notation"
}
This check should be followed by checking all variables for octet values (0-255). One convenient way to do this is like this:
proc isoctet args {
::tcl::mathop::* {*}[lmap octet $args {expr {0 <= $octet && $octet <= 255}}]
}
(It's usually a good idea to break out tests as functions; it's practically the law* if you are using the tests in several places in your code.)
This command, isoctet, takes a number of values as arguments, lumping them together as a list in the special parameter args. The lmap command creates a new list with the same number of elements as the original list, where the value of each element is the result of applying the given script to the corresponding element in the original list. In this case, lmap produces a list of ones and zeros depending on whether the value was a true octet value or not. Example:
input list: 1 234 567 89
result list: 1 1 0 1
The resulting list is then expanded by {*} into individual arguments to the ::tcl::mathop::* command, which multiplies them together. Why? Because if 1 and 0 can be taken as true and false values, the product of a list of ones and zeros happens to be exactly the same as the logical conjunction (AND, &&) of the same list.
result 1: 1 1 0 1
product : 0 (false)
result 2: 1 1 1 1
product : 1 (true)
So,
if {![isoctet $a $b $c $d]} {
error "one of the values was outside the 0-255 range"
}
Generating new addresses
Possibly the least sexy way to generate a new address is to use a ready-made facility in Tcl: binary.
binary scan [binary format c* [list $a $b $c $d]] I n
This invocation first converts a list of integer values (while constraining them to octet size) to a byte string, and then interprets that byte string as a big-endian 32-bit integer. The I specifier is big-endian regardless of the machine's native byte order, which is what is needed here because the first octet is the most significant one; the little-endian specifier i would reverse the octets.
Increment the number. Wheee!
incr n
Convert it back to a list of 8-bit values:
binary scan [binary format I $n] c4 parts
The components of parts are now signed 8-bit integers, i.e. the highest value is 127, and the values that should be higher than 127 are now negative values. Convert the values to unsigned (0 - 255) values like this:
lassign [lmap part $parts {expr {$part & 0xff}}] a b c d
and join them up to a dot-decimal string like this:
set addr [join [list $a $b $c $d] .]
If you want more than one new address, repeat the process.
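The steps above fit into one short procedure. A sketch, assuming Tcl 8.6 (for lmap); ipIncrement and its optional step argument are names introduced here. The 32-bit value read with I is signed, so addresses from 128.0.0.0 upwards show up as negative numbers, but binary format I keeps only the low 32 bits and the round trip still works.

```tcl
proc ipIncrement {addr {step 1}} {
    # Four octets -> 4 bytes -> one big-endian 32-bit integer.
    binary scan [binary format c4 [split $addr .]] I n
    incr n $step
    # Back to 4 signed bytes, then mask each one to the 0-255 range.
    binary scan [binary format I $n] c4 parts
    return [join [lmap part $parts {expr {$part & 0xff}}] .]
}

puts [ipIncrement 192.168.1.255]
puts [ipIncrement 10.0.0.1 512]
```

This prints 192.168.2.0 and 10.0.2.1.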
Documentation: binary, error, expr, if, incr, join, lassign, llength, lmap, mathop, proc, regexp, scan, set, split, {*}
lmap is a Tcl 8.6 command. Pure-Tcl implementations for Tcl 8.4 and 8.5 are available here.
*) If there were any laws. What you must learn is that these rules are no different than the rules of the Matrix. Some of them can be bent. Others can be broken.
proc ip_add { ip add } {
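# Brace the pattern so that \. reaches the regexp engine as a literal dot;
# inside double quotes Tcl would reduce it to a plain . that matches anything.
set re {^\s*(\d+)\.(\d+)\.(\d+)\.(\d+)\s*$}
if {[regexp $re $ip match a b c d]} {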
set x [expr {(($a*256+$b)*256+$c)*256+$d+$add}]
set d [expr {int(fmod($x,256))}]
set x [expr {int($x/256)}]
set c [expr {int(fmod($x,256))}]
set x [expr {int($x/256)}]
set b [expr {int(fmod($x,256))}]
set x [expr {int($x/256)}]
set a [expr {int(fmod($x,256))}]
return "$a.$b.$c.$d"
} else {
puts stderr "invalid ip $ip"
exit 1
}
}
set res [ip_add "127.0.0.1" 512]
puts "res=$res"
I have this Tcl8.5 code:
set regexp_str {^[[:blank:]]*\[[[:blank:]]*[0-9]+\][[:blank:]]+0\.0\-([0-9]+\.[0-9]+) sec.+([0-9]+\.[0-9]+) ([MK]?)bits/sec[[:blank:]]*$}
set subject {
[ 5] 0.0- 1.0 sec 680 KBytes 5.57 Mbits/sec
[ 5] 0.0-150.0 sec 153 MBytes 8.56 Mbits/sec
[ 4] 0.0- 1.0 sec 0.00 Bytes 0.00 bits/sec
[ 4] 0.0-150.4 sec 38.6 MBytes 2.15 Mbits/sec
}
set matches [regexp -line -inline -all -- $regexp_str $subject]
$matches populates with the matched data on one machine, while the other simply gets an empty list.
Both machines have Tcl8.5.
Using the -about flag of regexp, the following list is returned: 3 {REG_UUNPORT REG_ULOCALE}
I don't understand how could this be possible and what else should I do to debug it?
Edit #1, 17 Feb 07:00 UTC:
@Donal Fellows:
The patch level on the "good" machine is 8.5.15.
The patch level on the "bad" machine is 8.5.10.
I'm familiar with \s and \d, but as far as I know (please correct me), they both match a broader range of characters than I need:
\s includes newlines, which must not be matched in my example.
\d includes Unicode digits, which I will not encounter in my example.
In regular expressions I generally prefer to be as specific as possible, to avoid cases I didn't think of.
There's something which I didn't specify and could be important:
The variable $subject is populated using the expect_out(buffer) variable, following a grep command executed in shell.
expect_out(buffer) returns the output from a ssh session that is tunneled using a proxy called netcat (binary name is nc):
spawn ssh -o "ProxyCommand nc %h %p" "$username@$ipAddress"
In general, the output received & sent on this session is only ASCII/English characters.
The prompt of the destination PC contains control characters like ESC and BEL and they are contained in $subject.
I don't think this is a problem, because I tested the regular expression with all of these characters and it worked OK.
Thank you guys for the elaborated info!
Edit #2, 17 Feb 11:05 UTC:
Response to @Donal Fellows:
Indeed I've tried:
set regexp_str {^[[:blank:]]*\[[[:blank:]]*[0-9]+\][[:blank:]]+0\.0\-([0-9]+\.[0-9]+) sec.+([0-9]+\.[0-9]+) ([MK]?)bits/sec[[:blank:]]*$}
puts [regexp -line -inline -all -- $regexp_str [string map {\r\n \n \r \n} $subject]]
and got (please ignore the different numbers in the output, the idea is the same):
{[ 5] 0.0-150.0 sec 86.7 MBytes 4.85 Mbits/sec} 150.0 4.85 M {[ 4] 0.0-150.8 sec 60.4 MBytes 3.36 Mbits/sec} 150.8 3.36 M
Also I tried to replace the [[:blank:]] from both sides of regexp string with \s:
set regexp_str {^\s*\[[[:blank:]]*[0-9]+\][[:blank:]]+0\.0\-([0-9]+\.[0-9]+) sec.+([0-9]+\.[0-9]+) ([MK]?)bits/sec\s*$}
puts [regexp -line -inline -all -- $regexp_str $subject]
and it finally found what I needed:
{[ 5] 0.0-150.0 sec 86.7 MBytes 4.85 Mbits/sec
} 150.0 4.85 M {[ 4] 0.0-150.8 sec 60.4 MBytes 3.36 Mbits/sec
} 150.8 3.36 M
Tcl uses the same regular expression engine on all platforms. (But double-check whether you've got the same patchlevel on the two machines; that'll let us examine what — if any — exact code changes might there be between the systems.) It also shouldn't be anything related to newline terminators; Tcl automatically normalizes them under anything even remotely resembling normal circumstances (and in particular, does so in scripts).
With respect to the -about flags, only the 3 is useful (it's the number of capture groups). The other item in the list is the set of state flags set about the RE by the RE compiler, and frankly they're only useful to real RE experts (and our test suite). I've never found a use for them!
You can probably shorten your RE by using \s (mnemonically “spaces”) instead of that cumbersome [[:blank:]] and \d (“digits”) instead of [0-9]. When I do that, I get something quite a lot shorter and so easier to understand.
set regexp_str {^\s*\[\s*\d+\]\s+0\.0-(\d+\.\d+) sec.+(\d+\.\d+) ([MK]?)bits/sec\s*$}
It produces the same match groups.
[EDIT]: Even with the exact version of the code you report, checked out directly from the source code repository tag that was used to drive the 8.5.10 distribution, I can't reproduce your problem. However, the fact that it's really coming from an Expect buffer is really helpful; the problem may well actually be that the line separation sequence is not a newline but rather something else (CRLF — \r\n — is the number 1 suspect, but a plain carriage return could also be there). Expect is definitely not the same as normal I/O for various reasons (in particular, exact byte sequences are often needed in terminal handling).
The easiest thing might be to manually standardize the line separators before feeding the string into regexp. (This won't affect the string in the buffer; it copies, as usual with Tcl.)
regexp -line -inline -all -- $regexp_str [string map {\r\n \n \r \n} $subject]
It's also possible that there are other, invisible characters in the output. Working out what is really going on can be complex, but in general you can use a regular expression to test this theory by looking to see if the inverse of the set of expected characters is matchable:
regexp {[^\n [:graph:]]} $subject
When I try with what you pasted, that doesn't match (good!). If it does against your real buffer, it gives you a way to hunt the problem.
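Here is a small reproduction of the suspected cause. It is a sketch with made-up input: raw imitates one line of an Expect buffer that ends in CRLF, and re is the expression from the question with optional blanks allowed after the dash.

```tcl
set re {^[[:blank:]]*\[[[:blank:]]*[0-9]+\][[:blank:]]+0\.0-[[:blank:]]*([0-9]+\.[0-9]+) sec.+([0-9]+\.[0-9]+) ([MK]?)bits/sec[[:blank:]]*$}

# One line as a terminal session would deliver it: terminated by CR LF.
set raw "\[  5\]  0.0-150.0 sec   153 MBytes  8.56 Mbits/sec\r\n"
set clean [string map {\r\n \n \r \n} $raw]

# The CR sits between "sec" and the end of the line; [[:blank:]] does not cover it.
puts [regexp -line -- $re $raw]
puts [regexp -line -- $re $clean]

# The "unexpected character" probe flags the raw text but not the cleaned copy.
puts [regexp {[^\n [:graph:]]} $raw]
puts [regexp {[^\n [:graph:]]} $clean]
```

The four lines print 0, 1, 1 and 0. This also fits the observation that ending the pattern with \s*$ worked, since \s does match a carriage return.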
I saw that you are missing optional space(s) right after the first dash. I inserted those optional spaces in and all is working:
set regexp_str {^[[:blank:]]*\[[[:blank:]]*[0-9]+\][[:blank:]]+0\.0\-[[:blank:]]*([0-9]+\.[0-9]+) sec.+([0-9]+\.[0-9]+) ([MK]?)bits/sec[[:blank:]]*$}
# missing --> ^^^^^^^^^^^^
set subject {
[ 5] 0.0- 1.0 sec 680 KBytes 5.57 Mbits/sec
[ 5] 0.0-150.0 sec 153 MBytes 8.56 Mbits/sec
[ 4] 0.0- 1.0 sec 0.00 Bytes 0.00 bits/sec
[ 4] 0.0-150.4 sec 38.6 MBytes 2.15 Mbits/sec
}
set matches [regexp -line -inline -all -- $regexp_str $subject]
puts "\n\n"
foreach {all a b c} $matches {
puts "- All: >$all<"
puts " >$a<"
puts " >$b<"
puts " >$c<"
}
Output
- All: > [ 5] 0.0- 1.0 sec 680 KBytes 5.57 Mbits/sec<
>1.0<
>5.57<
>M<
- All: > [ 5] 0.0-150.0 sec 153 MBytes 8.56 Mbits/sec<
>150.0<
>8.56<
>M<
- All: > [ 4] 0.0- 1.0 sec 0.00 Bytes 0.00 bits/sec<
>1.0<
>0.00<
><
- All: > [ 4] 0.0-150.4 sec 38.6 MBytes 2.15 Mbits/sec<
>150.4<
>2.15<
>M<
Update
When dealing with a complex regular expression, I often break up the expression into several lines and add comments. The following is equivalent to my previous code, but more verbose and easier to troubleshoot. The key is to use an additional flag to the regexp command: the -expanded flag, which tells regexp to ignore any whitespace and comments in the expression.
set regexp_str {
# Initial blank
^[[:blank:]]*
# Bracket, number, optional spaces, bracket
\[[[:blank:]]*[0-9]+\]
# Spaces
[[:blank:]]+
# Number, dash, number
0\.0\-[[:blank:]]*([0-9]+\.[0-9]+)
# Unwanted stuff
[[:blank:]]sec.+
# Final number, plus unit
([0-9]+\.[0-9]+)[[:blank:]]([MK]?)bits/sec
# Trailing spaces
[[:blank:]]*$
}
set subject {
[ 5] 0.0- 1.0 sec 680 KBytes 5.57 Mbits/sec
[ 5] 0.0-150.0 sec 153 MBytes 8.56 Mbits/sec
[ 4] 0.0- 1.0 sec 0.00 Bytes 0.00 bits/sec
[ 4] 0.0-150.4 sec 38.6 MBytes 2.15 Mbits/sec
}
set matches [regexp -expanded -line -inline -all -- $regexp_str $subject]
puts "\n\n"
foreach {all a b c} $matches {
puts "- All: >$all<"
puts " >$a<"
puts " >$b<"
puts " >$c<"
}
(ETA: the question is about regular expressions, so why am I talking about massaging a string into a list and picking items out of that? See the end of this answer.)
As a workaround, if you don't really need to use a regular expression, this code gives the exact same result:
set result [list]
foreach line [split [string trim $subject] \n] {
set list [string map {- { } / { }} $line]
lappend result \
$line \
[lindex $list 3] \
[lindex $list 7] \
[string map {Mbits M Kbits K bits {}} [lindex $list 8]]
}
The lines aren't strictly well-formed lists because of the brackets, but it does work.
To clarify:
the string trim command takes out the newlines before and after the data: they would otherwise yield extra empty elements
the split command creates a list of four elements, each corresponding to a line of data
the foreach command processes each of those elements
the string map command changes each - or / character into a space, essentially making it a (part of a) list item separator
the lappend command incrementally builds the result list out of four items per line of data: the items are the whole line, the fourth item in the corresponding list, the eighth item in the corresponding list, and the ninth item in the corresponding list after the string map command has shortened the strings Mbits, Kbits, and bits to M, K, and the empty string, respectively.
The thing is (moderate rant warning): regular expression matching isn't the only tool in the string analysis toolbox, even though it sometimes looks that way. Tcl itself is, among other things, a powerful string and list manipulation language, and usually far more readable than RE. There is also, for instance, scan: the scan expression "[ %*d] %*f- %f sec %*f %*s %f %s" captures the relevant fields out of the data strings (provided they are split into lines and processed separately) -- all that remains is to look at the last captured string to see if it begins with M, K, or something else (which would be b). This code gives the same result as my solution above and as your example:
set result [list]
foreach line [split [string trim $subject] \n] {
scan $line "\[ %*d\] %*f- %f sec %*f %*s %f %s" a b c
lappend result $line $a $b [string map {its/sec {} Mb M Kb K b {}} $c]
}
Regular expressions are very useful, but they are also hard to get right and to debug when they aren't quite right, and even when you've got them right they're still hard to read and, in the long run, to maintain. Since in very many cases they are actually overkill, it makes sense to at least consider if other tools can't do the job instead.