I am trying to write a Tcl script in which I need to match a variable in a regular expression.
For instance, a file has some lines of code containing 'major'. Out of all these lines I need to identify one particular line:
major("major",0x32)
I'm using the variable p1 for 'major' (set p1 major).
How can I write a regexp using variable p1 ($p1) to capture that particular line?
regexp -- "$p1\\(\"$p1\",0x32\\)" $line match
In tclsh:
% set line {major("major",0x32)}
major("major",0x32)
% set p1 major
major
% regexp -- "$p1\\(\"$p1\",0x32\\)" $line match
1
% puts $match
major("major",0x32)
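As the pattern grows, the backslashes inside double quotes get hard to read. One alternative, sketched here on the assumption that p1 holds only literal word characters (no regular-expression metacharacters), is to keep the pattern in a braced template and fill in the variable with format. The variable name pattern and the ^ and $ anchors are additions for illustration.

```tcl
set line {major("major",0x32)}
set p1 major

# Braces keep Tcl from touching the backslashes; %1$s is replaced by $p1 both times.
set pattern [format {^%1$s\("%1$s",0x32\)$} $p1]

if {[regexp -- $pattern $line match]} {
    puts "matched: $match"
}
```

The anchors make the pattern match only when the whole line is the call; drop them to match the call anywhere in a longer line.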
Use a String Match
If you just want to know whether a single line matches, you can test for string match rather than a regular expression. This is often faster and less finicky. For example:
set fh [open /tmp/foo]
set lines [read $fh]
close $fh
set p1 major
set lines [split $lines "\n"]
foreach line $lines {
if {[string match *$p1* $line]} {set match $line}
}
puts $match
Note that this will store the entire line in match, and not just the search pattern. This is probably what you want, but your mileage may vary.
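A tighter glob pattern can single out the exact line from the question. This sketch assumes the lines are already in a list (the sample data is made up). Note the ${p1} form: written as $p1( the text would be parsed as an array reference.

```tcl
set p1 major
# Hypothetical sample lines standing in for the file contents
set lines [list {minor("major",0x10)} {  major("major",0x32)} {major("other",0x32)}]

set match {}
foreach line $lines {
    # Parentheses are ordinary characters in a glob pattern
    if {[string match "*${p1}(\"${p1}\",0x32)*" $line]} {
        set match $line
        break
    }
}
puts $match
```

Initializing match to an empty string also avoids an error from puts when no line matches.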
Related
I'm having a problem that involves an increment of a constant in a VHDL library .vhd file.
I need to create a tcl script that will look for a specific line from the library file:
constant a : integer :=0;
and will increment the 0 to a 1 and the 1 to a 2 with every run of the tcl script.
If the file isn't very large (no more than, oh, 100MB) then the easiest way is to read it all in, find the line, do the update "in place" in memory, and then write it all back out again.
# read the file
set f [open theFile.vhd]
set lines [split [read $f] "\n"]
close $f
# find the line with a regular expression (remember, brace your REs!)
set RE {^\s*constant\s+a\s*:\s*integer\s*:=\s*(\d+)\s*;\s*$}
set idx [lsearch -regexp $lines $RE]
# extract the current value and update it
regexp $RE [lindex $lines $idx] -> value
incr value
# write back into the list of lines
lset lines $idx "constant a:integer := $value;"
# write the lines back to the file
set f [open theFile.vhd w]
puts -nonewline $f [join $lines "\n"]
close $f
In this case we are using the regular expression twice, and I made the RE by taking the line and putting in whitespace matchers (\s+/\s*) and number matchers (\d+) in the sensible places.
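The same idea can preserve the line's original spacing by capturing the text around the number instead of rebuilding the line. This is a sketch, not part of the answer above: bumpConstant is a hypothetical helper, and it assumes the constant's name contains no regular-expression metacharacters.

```tcl
# Hypothetical helper: returns the list of lines with constant $name incremented.
proc bumpConstant {lines name} {
    # Three groups: text before the number, the number, text after it
    set RE [format {^(\s*constant\s+%s\s*:\s*integer\s*:=\s*)(\d+)(\s*;.*)$} $name]
    set idx [lsearch -regexp $lines $RE]
    if {$idx < 0} {
        return $lines   ;# no such constant: leave the lines untouched
    }
    regexp -- $RE [lindex $lines $idx] -> head value tail
    scan $value %d value   ;# read as decimal so that a value such as 08 is not taken as octal
    incr value
    lset lines $idx "$head$value$tail"
    return $lines
}

set lines [list "library ieee;" "  constant a : integer :=0;" "end;"]
set lines [bumpConstant $lines a]
puts [lindex $lines 1]
```

The check on idx matters because lsearch returns -1 when the line is missing, in which case there is nothing to increment.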
I am trying to match certain words in a directory path and extract the value, for example:
C:\working\Ever7\FILE\
I need to extract Ever7 from the path, and this works well:
set seq $name
set aa [split $seq \\]
set bb [lsearch -inline $aa Ev*]
set seq_number $bb
Now my question: Ever7 varies, meaning it can be another word. The possible words are Ever7, Mak, Inge, DM, FP, and Lin.
How can I add to or change the above expression so that it matches Ever7 or Mak or Inge or DM or FP or Lin and, if one of those words matches, sets seq_number to it as in the last line of the code?
It would be nice if you could show the change.
Assuming that the path segment of interest to you can be reliably found at a constant position, you may want to use Tcl's capabilities for introspecting on filesystem paths:
set fp "/working/Ever7/FILE"
set needles [list Ever7 Mak Inge DM FP Lin]
if {[lindex [file split $fp] 2] in $needles} {
incr seq_number
}
Otherwise, without any guarantees, just run a loop over [string match] needles:
foreach needle $needles {
if {[string match *$needle* $fp]} {
incr seq_number
break;
}
}
This doesn't require transforming your haystack into a Tcl list beforehand.
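Since the question asks for seq_number to hold the matched word rather than a count, the same membership test can store the segment itself. A sketch, assuming the Windows-style path from the question and Tcl 8.5 or later for the in operator:

```tcl
set needles [list Ever7 Mak Inge DM FP Lin]
set name "C:\\working\\Ever7\\FILE\\"

set seq_number {}
foreach segment [split $name \\] {
    # Exact comparison against each allowed word
    if {$segment in $needles} {
        set seq_number $segment
        break
    }
}
puts $seq_number
```

If no segment matches, seq_number stays empty, which the calling code can check for.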
I found the answer by changing the set bb line:
set bb [lsearch -inline -regexp $mach {^(DM|Ever7|Inge|FP|Lin|Mak)$}]
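When writing such a pattern, keep in mind that square brackets form a character class (any mix of the listed characters), while a choice between whole words is written with parentheses and |. A short sketch of the difference, using the path segments from the question:

```tcl
set aa [list C: working Ever7 FILE]
set class {^[DM|Ever7|Inge|FP|Lin|Mak]+$}
set words {^(DM|Ever7|Inge|FP|Lin|Mak)$}

puts [regexp -- $class FILE]   ;# 1: F, I, L and E all occur inside the brackets
puts [regexp -- $words FILE]   ;# 0: FILE is not one of the listed words
puts [lsearch -inline -regexp $aa $words]   ;# Ever7
```

The bracketed form can appear to work on a path like this one only because Ever7 comes before FILE in the list.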
I know I have been asking a lot of questions, but I'm still learning Tcl and I haven't found anything similar to this issue anywhere so far. Is it at all possible to replace a set of commands in Tcl with one variable, function0 for example?
I want to be able to replace the following code;
set f [listFromFile $path1]
set f [lsort -unique $f]
set f [lsearch -all -inline $f "test_*"]
set f [regsub -all {,} $f "" ]
set len [llength $f]
set cnt 0
with a variable function0, because this same code appears numerous times within the script. I should mention that it appears both inside a proc and outside of one.
The above code is used together with a loop like this:
while {$cnt < $len} {
puts [lindex $f $cnt]
incr cnt
after 25; #not needed, but for viewing purposes
}
Variables are for storing values. To hide away (encapsulate) some lines of code you need a command procedure, which you define using the proc command.
You wanted to hide away the following lines
set f [listFromFile $path1]
set f [lsort -unique $f]
set f [lsearch -all -inline $f "test_*"]
set f [regsub -all {,} $f "" ]
set len [llength $f]
set cnt 0
to be able to just invoke for instance function0 $path1 and have all those calculations made in one fell swoop. Further, you wanted to use the result of calling the procedure in code like this:
while {$cnt < $len} {
puts [lindex $f $cnt]
# ...
Which means you want function0 to produce three different values, stored in cnt, len, and f. There are several ways to have a command procedure return multiple values, but the cleanest solution here is to make it return a single value; the list that you want to print. The value in len can be calculated from that list with a single command, and the initialization of cnt is better performed outside the command procedure. What you get is this:
proc function0 path {
set f [listFromFile $path]
set f [lsort -unique $f]
set f [lsearch -all -inline $f test_*]
set f [regsub -all , $f {}]
return $f
}
which you can use like this:
set f [function0 $path1]
set len [llength $f]
set cnt 0
while {$cnt < $len} {
puts [lindex $f $cnt]
incr cnt
after 25; #not needed, but for viewing purposes
}
or like this:
set f [function0 $path1]
set len [llength $f]
for {set cnt 0} {$cnt < $len} {incr cnt} {
puts [lindex $f $cnt]
after 25; #not needed, but for viewing purposes
}
or like this:
set f [function0 $path1]
foreach item $f {
puts $item
after 25; #not needed, but for viewing purposes
}
This is why I didn't bother to create a procedure returning three values: you only really needed one.
glenn jackman makes a very good point (or two points, actually) in another answer about the use of regsub. For completeness, I will repeat it here.
Tcl is a bit confusing because it usually allows string operations (like string substitution) on data structures that aren't formally strings. This makes the language very powerful and expressive, but also means that newbies do not always get the kick in the shins that a regular type system would give them.
In this case you created a list structure inside listFromFile by reading a string from a file and then using split on it. From that point on it's a list and you should only perform list operations on it. If you wanted to take out all commas in your data you should either perform that operation on each item in the list, or else perform the operation inside listFromFile, before splitting the text.
String operations on lists will work, but sometimes the result will be garbled, so mixing them should be avoided. The other good point was that in this case string map is preferable to regsub, if nothing else it makes the code a bit clearer.
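Here is a sketch of the second option, removing the commas while the data is still one string. The real listFromFile is not shown in the question, so cleanSplit below is a hypothetical stand-in for the part of it that runs after the file has been read.

```tcl
# Hypothetical helper: strip commas from raw text, then split it into a list of lines.
proc cleanSplit {text} {
    set text [string map {, {}} $text]   ;# string operation, applied to a string
    return [split $text "\n"]            ;# from here on the value is a list
}

set f [cleanSplit "test_b,\ntest_a,\nother,\ntest_a,"]
set f [lsort -unique $f]
set f [lsearch -all -inline $f test_*]
puts $f
```

After the split, only list commands (lsort, lsearch) touch the value, so nothing depends on how the list happens to look as a string.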
Documentation: for, foreach, lindex, llength, lsearch, lsort, proc, puts, regsub, set, split, string, while
(more of a comment than an answer, but I want the formatting)
One thing to be aware of: $f holds a list, then you use the string command regsub on it, then you treat the result of regsub as a list again.
Use list commands with list values. I'd replace the regsub command with
set f [lmap elem $f {string map {"," ""} $elem} ]
For Tcl version 8.5 or earlier, which lack lmap, you could do this:
for {set i 0} {$i < [llength $f]} {incr i} {
lset f $i [string map {, ""} [lindex $f $i]]
}
I am new to Tcl and trying to learn; I need help with the following.
My string in ConfigFileBuf looks like the one below. I am trying to replace the second occurrence of the local-udp-port block (the one with 31001) with XYZ, but the regsub command I tried always replaces the first occurrence (37896). Please help me replace the second occurrence with XYZ.
set ConfigFileBuf "<ConfENB:virtual-phy>
</ConfENB:local-ip-addr>
<ConfENB:local-udp-port>37896</ConfENB:local-udp-port>
</ConfENB:local-ip-addr>
<ConfENB:local-udp-port>31001</ConfENB:local-udp-port>
</ConfENB:virtual-phy>"
regsub -start 1 "</ConfENB:local-ip-addr>\[ \n\t\]+<ConfENB:local-udp-port>\[0-9 \]+</ConfENB:local-udp-port>" $ConfigFileBuf "XYZ" ConfigFileBuf
puts $ConfigFileBuf
You have to use regexp -indices to find where to start the replacement, and only then regsub. It's not too bad if you put the regular expression in its own variable.
set RE "</ConfENB:local-ip-addr>\[ \n\t\]+<ConfENB:local-udp-port>\[0-9 \]+</ConfENB:local-udp-port>"
set start [lindex [regexp -all -indices -inline $RE $ConfigFileBuf] 1 0]
regsub -start $start $RE $ConfigFileBuf "XYZ" ConfigFileBuf
The 1 is the number of submatches in the RE (zero in this case) plus 1. You can compute it with the help of regexp -about, giving this piece of trickiness:
set RE "</ConfENB:local-ip-addr>\[ \n\t\]+<ConfENB:local-udp-port>\[0-9 \]+</ConfENB:local-udp-port>"
set relen [expr {1 + [lindex [regexp -about $RE] 0]}]
set start [lindex [regexp -all -indices -inline $RE $ConfigFileBuf] $relen 0]
regsub -start $start $RE $ConfigFileBuf "XYZ" ConfigFileBuf
If your string was well-formed XML I'd suggest something like tDOM to manipulate it. DOM-style manipulation is almost always better than regular expression-based manipulation on XML markup. (I mention this on the off chance that it's actually supposed to be XML and you just quoted it wrong.)
It looks like you're trying to use -start 1 to tell regsub to skip the first match. The starting index is actually a character index, so in this invocation regsub will just skip the first character in the string. You could set -start further into your string, but that's fragile unless you use regexp to calculate where the first match ends.
I think the best solution would be to get a list of indices to matches by invoking regexp with -all -inline -indices, pick out the second index pair using lindex and finally use string replace to perform the substitution, like this:
set pattern {</ConfENB:local-ip-addr>[ \n\t]+<ConfENB:local-udp-port>[0-9 ]+</ConfENB:local-udp-port>}
set matches [regexp -all -inline -indices -- $pattern $ConfigFileBuf]
set match [lindex $matches 1]
set ConfigFileBuf [string replace $ConfigFileBuf {*}$match XYZ]
The variable match contains a pair of indices (start and end, respectively) for the range of characters you want to replace. As string replace expects those indices to be in different arguments you need to expand $match with the {*} prefix. If you have an earlier version of Tcl than 8.5, you need a slight change to the above code:
foreach {start end} $match break
set ConfigFileBuf [string replace $ConfigFileBuf $start $end XYZ]
In passing, note that you can avoid escaping e.g. character sets in a regular expression if you quote it with braces instead of double quotes.
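The regexp -indices and string replace steps above can be wrapped into a small reusable command. This is only a sketch: replaceNth is a made-up name, n counts matches from 0, and the pattern is assumed to contain no capturing groups (otherwise regexp -indices returns extra index pairs for each match). It needs Tcl 8.5 or later for lassign.

```tcl
# Hypothetical helper: replace only the n-th match (0-based) of re in text.
proc replaceNth {re text n replacement} {
    # One {start end} pair per match, as long as re has no capturing groups
    set matches [regexp -all -inline -indices -- $re $text]
    if {$n < 0 || $n >= [llength $matches]} {
        return $text   ;# fewer matches than requested: return the text unchanged
    }
    lassign [lindex $matches $n] start end
    return [string replace $text $start $end $replacement]
}

puts [replaceNth {[0-9]+} "port 37896 and port 31001" 1 XYZ]
```

For the question's data, the call would be replaceNth $pattern $ConfigFileBuf 1 XYZ, with the result assigned back to ConfigFileBuf.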
Documentation links: regexp, lindex, string
I have a string in this pattern:
2(some_substring) -> 3(some_other_substring)
Now these number can be anything.
I think this answer would solve the problem, but it gives all the integers in one variable. I want them in different variables so that I can analyze them. Can we split it? Splitting would cause a problem, though:
If the numbers are not single-digit, the splitting will be erroneous.
Is there any other way?
You can use a variation of this: instead of removing the non-digit characters, you can extract all digit characters into a list:
set text {2(some_substring) -> 3(some_other_substring)}
set numbers [regexp -all -inline -- {[0-9]+} $text]
puts $numbers
# => 2 3
And to get each number, you can use lindex:
puts [lindex $numbers 0]
# => 2
Or in versions 8.5 and later, you can use lassign to assign them to specific variable names:
lassign $numbers first second
puts $first
# => 2
puts $second
# => 3
In regexp -all -inline -- {[0-9]+} $text, -all extracts all the matches, -inline returns the matches as a list, -- ends the options, and [0-9]+ matches one or more digits.
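If the text in parentheses is needed as well, capturing groups can be added; -inline then returns the whole match followed by each group, for every match. A sketch (the variable names whole, num and label are chosen for illustration, and nested parentheses are assumed not to occur):

```tcl
set text {2(some_substring) -> 3(some_other_substring)}

set pairs {}
# Each match contributes three list elements: whole match, number, parenthesized text
foreach {whole num label} [regexp -all -inline -- {(\d+)\(([^)]*)\)} $text] {
    lappend pairs $num $label
}
puts $pairs
```

Iterating with foreach over three variables at a time keeps each number next to the text it belongs to.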
To extend Jerry's answer, in case digits can appear within the parentheses, a regular expression to only extract digits that are immediately followed by an open parenthesis is: {\d+(?=\()}
% set text {2(some_6substring) -> 3(some_other_5substring)}
2(some_6substring) -> 3(some_other_5substring)
% lassign [regexp -all -inline {\d+(?=\()} $text] first second
% set first
2
% set second
3
This assumes that you don't have nested parentheses.