Extracting integer from a string in TCL - tcl

I have a string in this pattern:
2(some_substring) -> 3(some_other_substring)
These numbers can be anything.
I think this answer would solve the problem, but it puts all the integers into one variable. I want them in separate variables so that I can analyze them. Can we split the result? Splitting would cause a problem, though:
if the numbers are not single-digit, the split will be wrong.
Is there any other way?

You can use a variation of this: instead of removing the non-digit characters, you can extract all digit characters into a list:
set text {2(some_substring) -> 3(some_other_substring)}
set numbers [regexp -all -inline -- {[0-9]+} $text]
puts $numbers
# => 2 3
And to get each number, you can use lindex:
puts [lindex $numbers 0]
# => 2
Or in versions 8.5 and later, you can use lassign to assign them to specific variable names:
lassign $numbers first second
puts $first
# => 2
puts $second
# => 3
In regexp -all -inline -- {[0-9]+} $text, -all extracts all the matches, -inline returns the matches as a list, -- ends the options, and [0-9]+ matches one or more consecutive digits.
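As a quick check that multi-digit numbers stay intact, here is a minimal sketch. The proc name extract_ints and the sample string are made up for illustration, and lassign assumes Tcl 8.5 or later.

```tcl
# Hypothetical helper: return every run of digits in $text as a list.
proc extract_ints {text} {
    return [regexp -all -inline -- {[0-9]+} $text]
}

set text {12(some_substring) -> 345(some_other_substring)}
lassign [extract_ints $text] first second
puts "first=$first second=$second"
# => first=12 second=345
```

Because the pattern grabs whole runs of digits, no splitting of individual characters is involved.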

To extend Jerry's answer, in case digits can appear within the parentheses, a regular expression that only extracts digits immediately followed by an open parenthesis is: {\d+(?=\()}
% set text {2(some_6substring) -> 3(some_other_5substring)}
2(some_6substring) -> 3(some_other_5substring)
% lassign [regexp -all -inline {\d+(?=\()} $text] first second
% set first
2
% set second
3
This assumes that you don't have nested parentheses.
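If the string always has the two-part shape shown in the question, another option is a single regexp with two capture groups. This is only a sketch under that assumption; the sample text is invented, and [^)]* relies on there being no nested parentheses, as noted above.

```tcl
set text {12(some_6substring) -> 345(some_other_5substring)}

# Assumed layout: number, "(...)", "->", number, "(".
# [^)]* skips the first parenthesised part, so digits inside it are ignored.
if {[regexp {(\d+)\([^)]*\)\s*->\s*(\d+)\(} $text -> first second]} {
    puts "first=$first second=$second"
} else {
    puts "pattern not found"
}
# => first=12 second=345
```

The return value of regexp tells you whether the string had the expected shape, which the lookahead version does not check.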


list searching to find exact matches using TCL lsearch

I have a list and need to search some strings in this list. My list is like following:
list1 = {slt0_reg_11.CK slt0_reg_11.Q slt0_reg_12.CK slt0_reg_12.Q}
I am trying to use lsearch to check if above list includes some strings or not. Strings are like:
string1 = {slt0_reg_1 slt0_reg_1}
I am doing the following to check this:
set listInd [lsearch -all -exact -nocase -regexp $list1 $string1]
This command gives the indexes if list1 includes $string1 (this is what I want). However, the problem is that if I have a string like slt0_reg_1, the above command identifies the first two elements of the list (slt0_reg_11.CK slt0_reg_11.Q) because these contain the string I search for.
How can I do an exact search?
It sounds like you want to add word-boundary constraints (\y) to your RE. (Don't use -exact and -regexp at the same time; only one of those modes can be used on any run because they change the comparison engine used.) A little care must be taken because we can't enclose the RE in braces, as we want to do variable substitution within it.
set list1 {slt0_reg_11.CK slt0_reg_11.Q slt0_reg_12.CK slt0_reg_12.Q}
foreach str {slt0_reg_11 slt0_reg_1} {
set matches [lsearch -all -regexp $list1 "\\y$str\\y"]
puts "$str: $matches"
}
Prints:
slt0_reg_11: 0 1
slt0_reg_1:
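If every element has the form name.suffix, a glob pattern is a possible alternative to the regular expression. This is a sketch under that assumption, and it also assumes the search strings contain no glob metacharacters (*, ?, [, ], \).

```tcl
set list1 {slt0_reg_11.CK slt0_reg_11.Q slt0_reg_12.CK slt0_reg_12.Q}
foreach str {slt0_reg_11 slt0_reg_1} {
    # In glob patterns "." is literal and "*" matches any run of characters.
    set matches [lsearch -all -glob -nocase $list1 "$str.*"]
    puts "$str: $matches"
}
# => slt0_reg_11: 0 1
# => slt0_reg_1:
```

Here the dot after the name does the job of the word boundary, so slt0_reg_1 no longer matches slt0_reg_11.CK.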
If you want to compare your list for an exact match of the part before the dot against another list, you may be better off using lmap:
set index -1
set listInd [lmap str $list1 {
incr index
if {[lindex [split $str .] 0] ni $string1} continue
set index
}]
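For reference, here is the same lmap loop as a complete script with sample data. The contents of string1 are chosen for illustration only, and lmap assumes Tcl 8.6 or later.

```tcl
set list1   {slt0_reg_11.CK slt0_reg_11.Q slt0_reg_12.CK slt0_reg_12.Q}
set string1 {slt0_reg_12 slt0_reg_1}   ;# sample names, chosen for illustration

set index -1
set listInd [lmap str $list1 {
    incr index
    # Skip elements whose part before the first dot is not in $string1.
    if {[lindex [split $str .] 0] ni $string1} continue
    set index
}]
puts $listInd
# => 2 3
```

Only the slt0_reg_12 elements are reported; slt0_reg_1 matches nothing because the comparison is on the whole name before the dot.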

Return string after specific character

I have a question regarding possibility of getting string after specific character in TCL.
What I mean is:
Input:
abcdefgh = hgfedcba
Output:
hgfedcba
(return everything after "=" without possible whitespaces)
This is what I was using:
regexp {abcdefgh=\s+"(.*)"} $text_var all variable
In some cases it works (when there are spaces), but when there is no whitespace it does not work.
Assuming
% set s {abcdefgh = hgfedcba}
# => abcdefgh = hgfedcba
(or the same thing without one or both of the blanks) you could do one of these:
% scan $s {%*[^=]= %s}
# => hgfedcba
(Scan the string for a substring not containing "=", then advance past the equals sign and optional whitespace, then return the next run of non-whitespace characters.)
string trim [lindex [split $s =] 1]
(Split the string at the equals sign, return the (whitespace-trimmed) second resulting element.)
string trim [string range $s [string first = $s]+1 end]
(Return the (whitespace-trimmed) substring starting after the equals sign.)
string trim [lindex [regexp -inline {[^=]+$} $s] 0]
(Return the (whitespace-trimmed) first match of one or more characters, not including the equals sign, anchored on the end of the string.)
lindex [regexp -inline -all {[a-h]+} $s] 1
(Return the second match of consecutive characters from the set "a" to "h".)
string trimleft [string trimleft $s {abcdefgh }] {= }
(Remove all characters from the start of the string that occur in the set "a" to "h" and blank, then remove from start of the resulting string any characters that are equals sign or blank.)
% regexp {abcdefgh\s*=\s*(\S+)} "abcdefgh = hgfedcba" all variable
1
% set variable
hgfedcba
% regexp {abcdefgh\s*=\s*(\S+)} "abcdefgh=hgfedcba" all variable
1
% set variable
hgfedcba
%
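The string first / string range variant can be wrapped in a small proc that also copes with input that has no equals sign at all. This is a minimal sketch; the name value_after and its optional sep argument are hypothetical.

```tcl
# Hypothetical helper: return the trimmed text after the first $sep in $s,
# or an empty string when $sep does not occur.
proc value_after {s {sep =}} {
    set pos [string first $sep $s]
    if {$pos < 0} {
        return ""
    }
    return [string trim [string range $s [expr {$pos + [string length $sep]}] end]]
}

puts [value_after {abcdefgh = hgfedcba}]   ;# => hgfedcba
puts [value_after {abcdefgh=hgfedcba}]     ;# => hgfedcba
```

Unlike the regexp with (\S+), this keeps any spaces inside the value and only trims the ends.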

need to remove multiple "-" from the string which is alpha numeric using tcl

I have this string:
svpts-7-40.0001
And I need to remove the second '-' from this.
Basically I am fetching values like these which would come with double '-' SOMETIMES. So if such variables are seen then I have to remove the second '-' and replace the same with '.' , so the string should look like:
svpts-7.40.0001
[EDIT] I have tried:
% set list1 [split $string -]
svpts 7 40.0001
% set var2 [join $list1 .]
svpts.7.40.0001
%
Here's a regular expression that will change only the 2nd hyphen:
% regsub -expanded {( .*? - .*? ) -} "svpts-7-40.0001" {\1.}
svpts-7.40.0001
% regsub -expanded {( .*? - .*? ) -} "svpts-7_40.0001" {\1.}
svpts-7_40.0001
% regsub -expanded {( .*? - .*? ) -} "svpts-7-40.0001-a-b-c" {\1.}
svpts-7.40.0001-a-b-c
Try
% set data svpts-7-40.0001
svpts-7-40.0001
% regexp {([^-]*-)(.*)} $data -> a b
1
% set b [string map {- .} $b]
7.40.0001
% set newdata $a$b
svpts-7.40.0001
The above code changes every hyphen after the first. To change only the second hyphen, one can do this:
set idx [string first - $data [string first - $data]+1]
set newdata [string replace $data $idx $idx .]
or this:
set idxs [lindex [regexp -inline -all -indices -- - $data] 1]
set newdata [string replace $data {*}$idxs .]
The first snippet is well-behaved if the data string doesn't contain at least two hyphens; the other needs some kind of checking to avoid throwing an error.
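One way to add that check to the second snippet is to count the matches before replacing. This is a sketch; the proc name replace_second_hyphen is made up, and lassign assumes Tcl 8.5 or later.

```tcl
# Hypothetical helper: replace only the second "-" in $data with ".".
# Strings with fewer than two hyphens are returned unchanged.
proc replace_second_hyphen {data} {
    set matches [regexp -inline -all -indices -- - $data]
    if {[llength $matches] < 2} {
        return $data
    }
    lassign [lindex $matches 1] from to
    return [string replace $data $from $to .]
}

puts [replace_second_hyphen svpts-7-40.0001]         ;# => svpts-7.40.0001
puts [replace_second_hyphen svpts-7_40.0001]         ;# => svpts-7_40.0001
puts [replace_second_hyphen svpts-7-40.0001-a-b-c]   ;# => svpts-7.40.0001-a-b-c
```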
Documentation: lindex, regexp, set, string, {*} (syntax), Syntax of Tcl regular expressions
Syntax of Tcl index expressions:
integer: zero-based index number
end: the last element
end-N: the Nth element before the last element
end+N: the Nth element after the last element (in practice, N should be negative)
M-N: the Nth element before element M
M+N: the Nth element after element M
There can be no whitespace within the expression.
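A few of these index forms in action, as a small illustration with made-up data (the M+N and M-N forms assume Tcl 8.5 or later):

```tcl
set items {a b c d e}
puts [lindex $items 0]        ;# => a
puts [lindex $items end]      ;# => e
puts [lindex $items end-1]    ;# => d
puts [lindex $items 1+2]      ;# => d
puts [string range abcdef 1+1 end-1]   ;# => cde
```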

How to find and replace the second occurrence of a string using regsub

I am new to Tcl and trying to learn; I need help with the following.
My string is stored in ConfigFileBuf, and I am trying to replace the second occurrence of <ConfENB:local-udp-port> (the one containing 31001) with XYZ, but the regsub command I tried below always replaces the first occurrence (37896). How can I replace the second occurrence with XYZ?
set ConfigFileBuf "<ConfENB:virtual-phy>
</ConfENB:local-ip-addr>
<ConfENB:local-udp-port>37896</ConfENB:local-udp-port>
</ConfENB:local-ip-addr>
<ConfENB:local-udp-port>31001</ConfENB:local-udp-port>
</ConfENB:virtual-phy>"
regsub -start 1 "</ConfENB:local-ip-addr>\[ \n\t\]+<ConfENB:local-udp-port>\[0-9 \]+</ConfENB:local-udp-port>" $ConfigFileBuf "XYZ" ConfigFileBuf
puts $ConfigFileBuf
You have to use regexp -indices to find where to start the replacement, and only then regsub. It's not too bad if you put the regular expression in its own variable.
set RE "</ConfENB:local-ip-addr>\[ \n\t\]+<ConfENB:local-udp-port>\[0-9 \]+</ConfENB:local-udp-port>"
set start [lindex [regexp -all -indices -inline $RE $ConfigFileBuf] 1 0]
regsub -start $start $RE $ConfigFileBuf "XYZ" ConfigFileBuf
The 1 is the number of submatches in the RE (zero in this case) plus 1. You can compute it with the help of regexp -about, giving this piece of trickiness:
set RE "</ConfENB:local-ip-addr>\[ \n\t\]+<ConfENB:local-udp-port>\[0-9 \]+</ConfENB:local-udp-port>"
set relen [expr {1 + [lindex [regexp -about $RE] 0]}]
set start [lindex [regexp -all -indices -inline $RE $ConfigFileBuf] $relen 0]
regsub -start $start $RE $ConfigFileBuf "XYZ" ConfigFileBuf
If your string was well-formed XML I'd suggest something like tDOM to manipulate it. DOM-style manipulation is almost always better than regular expression-based manipulation on XML markup. (I mention this on the off chance that it's actually supposed to be XML and you just quoted it wrong.)
It looks like you're trying to use -start 1 to tell regsub to skip the first match. The starting index is actually a character index, so in this invocation regsub will just skip the first character in the string. You could set -start further into your string, but that's fragile unless you use regexp to calculate where the first match ends.
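To see that -start is a character offset rather than a match counter, here is a tiny illustration using regexp -indices on an invented three-character string:

```tcl
set s {aXa}
# -start 1 begins the search at character index 1, so the "a" at index 0 is
# never considered; the reported indices still count from the string start.
regexp -indices -start 1 -- a $s loc
puts $loc
# => 2 2
```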
I think the best solution would be to get a list of indices to matches by invoking regexp with -all -inline -indices, pick out the second index pair using lindex and finally use string replace to perform the substitution, like this:
set pattern {</ConfENB:local-ip-addr>[ \n\t]+<ConfENB:local-udp-port>[0-9 ]+</ConfENB:local-udp-port>}
set matches [regexp -all -inline -indices -- $pattern $ConfigFileBuf]
set match [lindex $matches 1]
set ConfigFileBuf [string replace $ConfigFileBuf {*}$match XYZ]
The variable match contains a pair of indices (start and end, respectively) for the range of characters you want to replace. As string replace expects those indices to be in different arguments you need to expand $match with the {*} prefix. If you have an earlier version of Tcl than 8.5, you need a slight change to the above code:
foreach {start end} $match break
set ConfigFileBuf [string replace $ConfigFileBuf $start $end XYZ]
In passing, note that you can avoid escaping e.g. character sets in a regular expression if you quote it with braces instead of double quotes.
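The same steps can be generalised to "replace the n-th match". The following is only a sketch: the proc name replace_nth_match and the shortened sample string are invented, and it assumes the pattern contains no capturing parentheses.

```tcl
# Hypothetical helper: replace the n-th match (0-based) of $pattern in $str.
# Assumes $pattern has no capturing parentheses, because -all -inline would
# then also return one index pair per subexpression.
proc replace_nth_match {pattern str n replacement} {
    set matches [regexp -all -inline -indices -- $pattern $str]
    if {$n < 0 || $n >= [llength $matches]} {
        return $str   ;# fewer than n+1 matches: leave the input unchanged
    }
    lassign [lindex $matches $n] from to
    return [string replace $str $from $to $replacement]
}

puts [replace_nth_match {port>[0-9]+<} {<port>37896</port> <port>31001</port>} 1 {port>XYZ<}]
# => <port>37896</port> <port>XYZ</port>
```

Checking the number of matches first avoids the error that {*} expansion of an empty index pair would otherwise cause in string replace.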
Documentation links: regexp, lindex, string

How to match a Variable in a regexp?

I am trying to write a Tcl script in which I need to match a variable in a regular expression.
For instance, file has some lines of code containing 'major'. Out of all these lines I need to identify one particular line:
major("major",0x32)
I am using the variable p1 for 'major' (set p1 major).
How can I write a regexp using variable p1 ($p1) to capture that particular line?
regexp -- "$p1\\(\"$p1\",0x32\\)" $line match
In tclsh:
% set line {major("major",0x32)}
major("major",0x32)
% set p1 major
major
% regexp -- "$p1\\(\"$p1\",0x32\\)" $line match
1
% puts $match
major("major",0x32)
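If the variable may contain regular-expression metacharacters (a dot, for example), it should be escaped before it is substituted into the pattern. A minimal sketch, in which the helper name re_quote and the sample value a.b are hypothetical:

```tcl
# Hypothetical helper: backslash-escape every non-word character so the
# value is matched literally inside a regular expression.
proc re_quote {s} {
    return [regsub -all {\W} $s {\\&}]
}

set p1 {a.b}                       ;# sample value containing a metacharacter
set q  [re_quote $p1]
set re "$q\\(\"$q\",0x32\\)"

puts [regexp -- $re {a.b("a.b",0x32)}]   ;# => 1
puts [regexp -- $re {axb("axb",0x32)}]   ;# => 0
```

Without the escaping, the dot in a.b would match any character and the second line would match as well.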
Use a String Match
If you just want to know whether a single line matches, you can test for string match rather than a regular expression. This is often faster and less finicky. For example:
set fh [open /tmp/foo]
set lines [read $fh]
close $fh
set p1 major
set lines [split $lines "\n"]
foreach line $lines {
if {[string match *$p1* $line]} {set match $line}
}
puts $match
Note that this will store the entire line in match, and not just the search pattern. This is probably what you want, but your mileage may vary.
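Two things to be aware of with that loop: string match treats *, ?, [ and \ in $p1 as glob syntax, and the loop keeps the last matching line rather than the first. If a literal substring test and the first hit are wanted instead, something like this sketch may do; the proc name and the sample text are made up.

```tcl
# Hypothetical helper: return the first line of $text that contains $needle
# as a literal substring, or an empty string when no line does.
proc first_line_containing {text needle} {
    foreach line [split $text "\n"] {
        if {[string first $needle $line] >= 0} {
            return $line
        }
    }
    return ""
}

set text "minor(\"minor\",0x31)\nmajor(\"major\",0x32)\nmajor_rev(1)"
puts [first_line_containing $text major]
# => major("major",0x32)
```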