list searching to find exact matches using TCL lsearch - tcl

I have a list and need to search for some strings in it. My list looks like the following:
list1 = {slt0_reg_11.CK slt0_reg_11.Q slt0_reg_12.CK slt0_reg_12.Q}
I am trying to use lsearch to check if above list includes some strings or not. Strings are like:
string1 = {slt0_reg_1 slt0_reg_1}
I am doing the following to check this:
set listInd [lsearch -all -exact -nocase -regexp $list1 $string1]
This command returns the indexes of the elements of list1 that contain $string1, which is what I want. However, the problem is that if I search for a string like slt0_reg_1, the command also identifies the first two elements of the list (slt0_reg_11.CK slt0_reg_11.Q), because they contain the string I am searching for.
How can I do an exact search?

It sounds like you want to add word-boundary constraints (\y) to your RE. (Don't use -exact and -regexp at the same time; only one of those modes can be used on any run because they change the comparison engine used.) A little care must be taken because we can't enclose the RE in braces, as we want to do variable substitution within it.
set list1 {slt0_reg_11.CK slt0_reg_11.Q slt0_reg_12.CK slt0_reg_12.Q}
foreach str {slt0_reg_11 slt0_reg_1} {
set matches [lsearch -all -regexp $list1 "\\y$str\\y"]
puts "$str: $matches"
}
Prints:
slt0_reg_11: 0 1
slt0_reg_1:
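One caveat when interpolating the search term into the RE: any regular-expression metacharacters in the term (a "." for instance) keep their special meaning. A minimal sketch of one way to guard against that, assuming a helper named reEscape (a name made up for this example) that backslash-escapes every non-word character:

```tcl
# Hypothetical helper: backslash-escape every non-word character so the
# term is matched literally when embedded in a regular expression.
proc reEscape {str} {
    return [regsub -all {\W} $str {\\&}]
}

set sample {a.b axb}
# Unescaped: the "." matches any character, so both elements are found.
set loose [lsearch -all -regexp $sample "\\ya.b\\y"]
# Escaped: only the element containing a literal "." is found.
set strict [lsearch -all -regexp $sample "\\y[reEscape a.b]\\y"]
puts "loose: $loose, strict: $strict"
```

For the names in the question (letters, digits and underscores only) the escaping changes nothing, so it only matters once the search terms themselves may contain punctuation.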

If you want to compare your list for an exact match of the part before the dot against another list, you may be better off using lmap:
set index -1
set listInd [lmap str $list1 {
incr index
if {[lindex [split $str .] 0] ni $string1} continue
set index
}]
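Put together with sample data, the lmap approach might look like this. The values in string1 are assumptions chosen for illustration, and lmap requires Tcl 8.6 or later:

```tcl
set list1 {slt0_reg_11.CK slt0_reg_11.Q slt0_reg_12.CK slt0_reg_12.Q}
# Sample search terms (assumed for this example).
set string1 {slt0_reg_1 slt0_reg_12}

set index -1
set listInd [lmap str $list1 {
    incr index
    # Compare only the part before the first dot, and compare it exactly.
    if {[lindex [split $str .] 0] ni $string1} continue
    set index
}]
puts $listInd
# => 2 3
```

Here slt0_reg_1 finds nothing because no element has exactly that name before the dot, while slt0_reg_12 finds both of its pins.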

Related

Tcl partial string match of one list against another

I'm trying to find the items in list1 that are partial string matches against items from list2 using Tcl.
I'm using this, but it's very slow. Is there any more efficient way to do this?
set list1 [list abc bcd cde]
set list2 [list ab cd]
set l_matchlist [list]
foreach item1 $list1 {
foreach item2 $list2 {
if {[string match -nocase "*${item2}*" $item1]} {
lappend l_matchlist $item1
break
}
}
}
My actual lists are very long and this takes a long time. Is this the best way to do this?
In addition to being slow, there is also a problem if list2 contains elements that have glob wildcard characters, such as '?' and '*'.
I expect the following method will work faster. At least it fixes the issue mentioned above:
set list1 [list abc BCD ace cde]
set list2 [list cd ab de]
set l_matchlist [list]
foreach item2 $list2 {
lappend l_matchlist \
{*}[lsearch -all -inline -nocase -regexp $list1 (?q)$item2]
}
The -regexp option in combination with (?q) may seem strange at first. It uses regexp matching and then tells regexp to treat the pattern as a literal string. But this has the effect of performing the partial match that you're after.
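To illustrate the wildcard issue, here is a small sketch comparing string match with the (?q) form on made-up data in which one search item is a literal "*":

```tcl
set items {a*b ab a?b}

# string match treats "*" as a wildcard, so the pattern "***" matches
# every element.
set globHits {}
foreach item $items {
    if {[string match "***" $item]} { lappend globHits $item }
}

# With (?q) the rest of the pattern is a literal string, so only the
# element that really contains "*" is reported.
set literalHits [lsearch -all -inline -regexp $items {(?q)*}]

puts "glob: $globHits"
puts "literal: $literalHits"
```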
This differs from your version in that it may produce the results in a different order, and the same item from list1 may be reported multiple times if it matches more than one item in list2.
If that is undesired, you can follow up with:
set l_matchlist [lmap item1 $list1 {
if {$item1 ni $l_matchlist} continue
set item1
}]
Of course that will reduce some of the speed gains achieved earlier.
You could cheat a bit and turn it from a list-processing task to a string processing task. The latter are usually quite a bit faster in Tcl.
Below I first turn list1 into a string with the original list elements separated by the ASCII field separator character "\x1F". Then the result can be gotten in a single loop via a regular expression search. The regular expression finds the first substring bounded by the field separator chars that contains item2:
# convert list to string:
set string1 \x1F[join $list1 \x1F]\x1F
set l_matchlist [list]
foreach item2 $list2 {
# escape out regexp special chars:
set item2 [regsub -all {\W} $item2 {\\&}]
# use append to assemble regexp pattern
set item2 [append x {[^\x1F]*} $item2 {[^\x1F]*}][unset x]
if {[regexp -nocase $item2 $string1 match]} {
lappend l_matchlist $match
}
}

How to copy exactly in tcl?

I created a list using:
set list1 { o\\/one o\\/two o\\/three }
Now I want to copy this list to another list, adding { } around each item.
My new list should become:
{ {o\\/one} {o\\/two} {o\\/three} }
I tried using
foreach a $list1 {
set x "{$a}"
append new_list " " "{$a}"
lappend new_list1 $x
}
newlist → {o\/one} {o\/two} {o\/three}
newlist1 → {{o\/one}} {{o\/two}} {{o\/three}}
Please help?
Your original list has these items in it (as you can verify with lindex):
puts [lindex $list1 0] → o\/one
puts [lindex $list1 1] → o\/two
puts [lindex $list1 2] → o\/three
Any list that has those elements in it, however encoded, is pairwise-equivalent. The canonical form (as produced by Tcl's own list operations) of the list is:
{o\/one} {o\/two} {o\/three}
Perhaps the easiest way of obtaining that is:
set list2 [lrange $list1 0 end]
The lrange command uses Tcl's standard list-to-string engine (shared with a great many other commands). That prefers to not add braces, but prefers adding braces to adding backslashes; backslashes are a last resort because they're ugly and hard to read. But it works with arbitrary contents in the elements; just blindly adding braces is vulnerable to tricky edge cases.
Another way of getting the above canonical form is this (provided you're not stuck on versions of Tcl so old they're no longer supported):
set list2 [list {*}$list1]
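A quick way to convince yourself that the copy holds the same elements is to inspect it with llength and lindex. This is only a sketch using the list from the question:

```tcl
set list1 { o\\/one o\\/two o\\/three }
set list2 [list {*}$list1]

puts [llength $list2]     ;# number of elements
puts [lindex $list2 0]    ;# first element, without any list quoting
puts $list2               ;# canonical string form of the whole list
```

The braces only appear when the whole list is printed as a string; the individual elements themselves do not contain them.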
[EDIT]: If you've got a string with some things in it separated by spaces, you might want to convert it into a proper list; this is useful particularly when the input data contains list metacharacters like braces and (relevant in this case) backslashes. There are two main ways to do this:
set theList [split $inputString]
set theList [regexp -all -inline {\S+} $inputString]
They differ in what happens when the input string has two (or more) spaces between two words:
set inputString "a b  c d"; # NB: two spaces between b and c
puts [split $inputString]; # ==> a b {} c d
puts [regexp -all -inline {\S+} $inputString]; # ==> a b c d
There are use-cases for both.

How to find and replace the second occurrence of a string using regsub

I am new to Tcl and trying to learn; I need help with the following.
My string is in ConfigFileBuf, shown below. I am trying to replace the second occurrence of the pattern (the one with <ConfENB:local-udp-port>31001) with XYZ, but the regsub command I tried always replaces the first occurrence (37896). Please help: how can I replace the second occurrence with XYZ?
set ConfigFileBuf "<ConfENB:virtual-phy>
</ConfENB:local-ip-addr>
<ConfENB:local-udp-port>37896</ConfENB:local-udp-port>
</ConfENB:local-ip-addr>
<ConfENB:local-udp-port>31001</ConfENB:local-udp-port>
</ConfENB:virtual-phy>"
regsub -start 1 "</ConfENB:local-ip-addr>\[ \n\t\]+<ConfENB:local-udp-port>\[0-9 \]+</ConfENB:local-udp-port>" $ConfigFileBuf "XYZ" ConfigFileBuf
puts $ConfigFileBuf
You have to use regexp -indices to find where to start the replacement, and only then regsub. It's not too bad if you put the regular expression in its own variable.
set RE "</ConfENB:local-ip-addr>\[ \n\t\]+<ConfENB:local-udp-port>\[0-9 \]+</ConfENB:local-udp-port>"
set start [lindex [regexp -all -indices -inline $RE $ConfigFileBuf] 1 0]
regsub -start $start $RE $ConfigFileBuf "XYZ" ConfigFileBuf
The 1 is the number of submatches in the RE (zero in this case) plus 1. You can compute it with the help of regexp -about, giving this piece of trickiness:
set RE "</ConfENB:local-ip-addr>\[ \n\t\]+<ConfENB:local-udp-port>\[0-9 \]+</ConfENB:local-udp-port>"
set relen [expr {1 + [lindex [regexp -about $RE] 0]}]
set start [lindex [regexp -all -indices -inline $RE $ConfigFileBuf] $relen 0]
regsub -start $start $RE $ConfigFileBuf "XYZ" ConfigFileBuf
If your string was well-formed XML I'd suggest something like tDOM to manipulate it. DOM-style manipulation is almost always better than regular expression-based manipulation on XML markup. (I mention this on the off chance that it's actually supposed to be XML and you just quoted it wrong.)
It looks like you're trying to use -start 1 to tell regsub to skip the first match. The starting index is actually a character index, so in this invocation regsub will just skip the first character in the string. You could set -start further into your string, but that's fragile unless you use regexp to calculate where the first match ends.
I think the best solution would be to get a list of indices to matches by invoking regexp with -all -inline -indices, pick out the second index pair using lindex and finally use string replace to perform the substitution, like this:
set pattern {</ConfENB:local-ip-addr>[ \n\t]+<ConfENB:local-udp-port>[0-9 ]+</ConfENB:local-udp-port>}
set matches [regexp -all -inline -indices -- $pattern $ConfigFileBuf]
set match [lindex $matches 1]
set ConfigFileBuf [string replace $ConfigFileBuf {*}$match XYZ]
The variable match contains a pair of indices (start and end, respectively) for the range of characters you want to replace. As string replace expects those indices to be in different arguments you need to expand $match with the {*} prefix. If you have an earlier version of Tcl than 8.5, you need a slight change to the above code:
foreach {start end} $match break
set ConfigFileBuf [string replace $ConfigFileBuf $start $end XYZ]
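The same idea can be wrapped in a small reusable procedure. This is only a sketch: the name replaceNth and its argument order are made up for this example, n is zero-based, it needs Tcl 8.5 or later for {*}, and it uses regexp -about to skip over the index pairs of any capture groups:

```tcl
# Hypothetical helper: replace the n-th (zero-based) match of re in str.
proc replaceNth {re str n replacement} {
    # Index pairs reported per match: the whole match plus one per capture group.
    set stride [expr {1 + [lindex [regexp -about -- $re] 0]}]
    set matches [regexp -all -inline -indices -- $re $str]
    # Pick the index pair of the whole n-th match.
    set range [lindex $matches [expr {$n * $stride}]]
    if {$range eq ""} {
        # Fewer than n+1 matches: return the input unchanged.
        return $str
    }
    return [string replace $str {*}$range $replacement]
}

puts [replaceNth {a\d} "a1 a2 a3" 1 X]
# => a1 X a3
```

For the question's data, the call would be along the lines of replaceNth $pattern $ConfigFileBuf 1 XYZ.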
In passing, note that you can avoid escaping e.g. character sets in a regular expression if you quote it with braces instead of double quotes.
Documentation links: regexp, lindex, string

Extracting integer from a string in TCL

I have a string in this pattern:
2(some_substring) -> 3(some_other_substring)
Now these number can be anything.
I think this answer would solve the problem, but it gives all the integers in one variable. I want them in different variables so that I can analyze them. Can we split the result? Splitting would cause a problem, though:
If the numbers are not single-digit, then the splitting will be erroneous.
Is there any other way?
You can use a variation of this: instead of removing the non-digit characters, you can extract all digit characters into a list:
set text {2(some_substring) -> 3(some_other_substring)}
set numbers [regexp -all -inline -- {[0-9]+} $text]
puts $numbers
# => 2 3
And to get each number, you can use lindex:
puts [lindex $numbers 0]
# => 2
Or in versions 8.5 and later, you can use lassign to assign them to specific variable names:
lassign $numbers first second
puts $first
# => 2
puts $second
# => 3
In regexp -all -inline -- {[0-9]+} $text, -all extracts all the matches, -inline returns the matches as a list, -- ends the options, and [0-9]+ matches one or more digits.
To extend Jerry's answer, in case digits can appear within the parentheses, a regular expression to only extract digits that are immediately followed by an open parenthesis is: {\d+(?=\()}
% set text {2(some_6substring) -> 3(some_other_5substring)}
2(some_6substring) -> 3(some_other_5substring)
% lassign [regexp -all -inline {\d+(?=\()} $text] first second
% set first
2
% set second
3
This assumes that you don't have nested parentheses.
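If the string always has exactly the two-number shape shown in the question, another option is to use capture groups with match variables. A sketch under that assumption (the sample text and the variable names first and second are illustrative):

```tcl
set text {12(some_substring) -> 345(some_other_substring)}

# "->" is just a throwaway variable name that receives the whole match;
# the two capture groups land in first and second.
if {[regexp {^(\d+)\(.*->\s*(\d+)\(} $text -> first second]} {
    puts "first=$first second=$second"
} else {
    puts "no match"
}
# => first=12 second=345
```

Because each group captures a whole run of digits, multi-digit numbers are handled without any splitting.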

-end index in lsearch in tcl

The lsearch command has -start index as one of the options
-start index: The list is searched starting at position index. If index has the value end, it refers to the last element in the list, and end-integer refers to the last element in the list minus the specified integer offset.
I would like to use -end along with -start. How can it be done?
It can be done by discarding the indices greater than or equal to -end index in the lsearch returned list. But is there any better way?
I'd be tempted in your case to use lrange to produce a copy of the list being searched without the elements you don't want returned.
lsearch [lrange $theList $start $end] $searchTerm
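One thing to keep in mind is that the index returned this way is relative to the sub-list, not to the original list. A minimal sketch of a wrapper that maps it back, assuming start is a plain non-negative integer (not an end-N form) and using a made-up procedure name, lsearchRange:

```tcl
# Hypothetical helper: search only elements start..end of list and
# return the index in terms of the original list, or -1 if not found.
proc lsearchRange {list start end term} {
    # -exact is used here; plain lsearch defaults to glob matching.
    set idx [lsearch -exact [lrange $list $start $end] $term]
    if {$idx < 0} {
        return -1
    }
    return [expr {$idx + $start}]
}

puts [lsearchRange {a b c b a} 2 4 a]
# => 4
```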
The real purpose of the -start option is to allow skipping over previously-found matches, and it is less useful now that we have the -all option (which makes lsearch return a list of all the places where it can match the search term). It was used a bit like this:
for {set idx -1} {[set idx [lsearch -start [incr idx] $list $term]] >= 0} {} {
# Process the match index...
puts "found $term at $idx"
}
And now you'd write:
foreach idx [lsearch -all $list $term] {
puts "found $term at $idx"
}