Adding an additional backslash to elements of a Tcl list

I have a list which looks like:
set list1 {a.b.c.d a.bb.ccd\[0\] a.bb.ccd\[1\] ....}
When I operate on the elements of the list one by one using a foreach loop:
foreach ele $list1 {
    puts $ele
}
I get the following output:
a.b.c.d
a.bb.ccd[0]
a.bb.ccd[1]
Observe that the backslashes go missing (because of how Tcl parses lists).
To preserve them, I want to add an extra backslash to every element of list1 that already contains a backslash.
I tried:
regsub -all {\} $list1 {\\} list1
(I also tried double quotes instead of braces, among other variations.)
Nothing seems to work.
Is there a way to make sure $ele preserves the backslash characters inside the foreach loop? I need the elements with exactly the same characters for further processing.
P.S. I'm a beginner with regexp/regsub.

If your input data has backslashes in it like that which you wish to preserve, you need to do a little extra work when converting that data to a Tcl list, or Tcl will simply assume that you were using the backslashes as conventional Tcl metacharacters. There are a few ways to do it; here's my personal favourite, as it logically selects exactly the characters we want (the non-empty sequences of non-whitespace characters):
set input {a.b.c.d a.bb.ccd\[0\] a.bb.ccd\[1\] ....}
set items [regexp -all -inline {\S+} $input]
foreach item $items {
puts $item
}
As you can see from the output:
a.b.c.d
a.bb.ccd\[0\]
a.bb.ccd\[1\]
....
this keeps the backslashes exactly. (Also, yes, I quite like regular expressions. Especially very simple ones like this.)

As you have defined list1, it is a string. When list1 is used with the foreach command, the string is converted to a list. Remember that lists in Tcl are really just specially formatted strings that are parsed when converted from a string to a list. As the list elements are parsed, the backslashes are removed in accordance with normal Tcl parsing rules. There are several ways to build lists that contain characters that are significant to the Tcl parser. The code below shows two examples, contrasted with your code:
set list1 {a.b.c.d a.bb.ccd\[0\] a.bb.ccd\[1\]}
puts "list1 as a string"
puts $list1
puts "converting the list1 string to a proper list"
foreach ele $list1 {
    puts $ele
}
set list2 [list a.b.c.d {a.bb.ccd\[0\]} {a.bb.ccd\[1\]}]
puts "list2 built by the list command"
puts $list2
puts "list2, element by element"
foreach ele $list2 {
    puts $ele
}
set list3 {a.b.c.d {a.bb.ccd\[0\]} {a.bb.ccd\[1\]}}
puts "list3 built by properly quoting each element"
puts $list3
puts "list3, element by element"
foreach ele $list3 {
    puts $ele
}
Running this yields:
list1 as a string
a.b.c.d a.bb.ccd\[0\] a.bb.ccd\[1\]
converting the list1 string to a proper list
a.b.c.d
a.bb.ccd[0]
a.bb.ccd[1]
list2 built by the list command
a.b.c.d {a.bb.ccd\[0\]} {a.bb.ccd\[1\]}
list2, element by element
a.b.c.d
a.bb.ccd\[0\]
a.bb.ccd\[1\]
list3 built by properly quoting each element
a.b.c.d {a.bb.ccd\[0\]} {a.bb.ccd\[1\]}
list3, element by element
a.b.c.d
a.bb.ccd\[0\]
a.bb.ccd\[1\]
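Your regsub approach can work too, as long as the pattern and replacement are escaped so that each backslash really is replaced by two backslashes (the {\} in your attempt is not a valid braced word). Still, building the list properly is much clearer.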
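For reference, here is a sketch of the escaping that the regsub attempt needed. Inside braces, the regular expression \\ matches one literal backslash, and in a regsub replacement \\ produces one literal backslash, so {\\\\} inserts two. string map does the same job without any regexp syntax. The variable names doubled and mapped are illustrative.

```tcl
set list1 {a.b.c.d a.bb.ccd\[0\] a.bb.ccd\[1\]}

# regsub: pattern {\\} matches a single backslash; replacement {\\\\} emits two
regsub -all {\\} $list1 {\\\\} doubled

# string map: the mapping list {\\ \\\\} parses to the pair "\" -> "\\"
set mapped [string map {\\ \\\\} $list1]

# List parsing now turns each "\\" back into one "\", so the elements keep it
foreach ele $doubled {
    puts $ele
}
```

Both variables end up holding the same string; when iterated, each element still contains its original backslashes.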

Related

Tcl: Parsing input with strings in quotes

I have the following code to split stdin into a list of strings:
set cmd [string toupper [gets stdin]]
set items [split $cmd " "]
This splits the user input into a list (items) using the space as a delimiter. It works fine for simple input such as:
HELLO 1 2 3
What I get in items:
HELLO
1
2
3
But how can I get the quoted string in the example below to become one item in the list (items)?
"HELLO THERE" 1 2 3
What I want in items:
HELLO THERE
1
2
3
How can I do this?
This is where you get into building a more complex parser. The first step towards that is switching to using regular expressions.
regexp -all -inline {"[^\"]*"|[^\"\s]+} $inputData
That will do the right thing... provided the input is well-formed and only uses double quotes for quoting. It also doesn't strip the quotes off the outside of the "words"; you'll want to use string trim $word \" to clean that up.
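A minimal sketch of that approach applied to the question's example input, assuming well-formed input that only uses double quotes for quoting. The variable name inputData follows the answer; words is illustrative.

```tcl
set inputData {"HELLO THERE" 1 2 3}

set words {}
foreach word [regexp -all -inline {"[^\"]*"|[^\"\s]+} $inputData] {
    # Strip the surrounding double quotes from quoted words
    lappend words [string trim $word \"]
}

foreach word $words {
    puts $word
}
```

This prints HELLO THERE, 1, 2 and 3 on separate lines.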
If this is a command that you are parsing, use a safe interpreter. Then you can allow Tcl syntax to be used without exposing the guts of your code. I'm pretty sure there are answers here on how to do that already.
Because Tcl doesn't have strong types, the simplest way to do this is to just treat your stdin string like a list of strings. No need to use split to convert a string into a list.
set cmd {"HELLO THERE" 1 2 3}
foreach item $cmd {
puts $item
}
--> HELLO THERE
1
2
3
Use string is list to check if your $cmd string can be treated as a list.
if {[string is list $cmd]} {
puts "Can be a list"
} else {
puts "Cannot be a list"
}
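A short sketch of that check in use, with one well-formed input and one containing an unbalanced quote. Both sample strings are invented for illustration.

```tcl
set samples [list {"HELLO THERE" 1 2 3} {"HELLO THERE 1 2 3}]

foreach cmd $samples {
    if {[string is list $cmd]} {
        # Well-formed: safe to use list commands on it directly
        puts "[llength $cmd] items: [join $cmd |]"
    } else {
        # Unbalanced quote: this input has to be handled some other way
        puts "not a valid list: $cmd"
    }
}
```

The first sample is reported as 4 items (HELLO THERE|1|2|3); the second fails the check because its opening quote is never closed.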

Tcl partial string match of one list against another

I'm trying to find the items in list1 that are partial string matches against items from list2 using Tcl.
I'm using this, but it's very slow. Is there any more efficient way to do this?
set list1 [list abc bcd cde]
set list2 [list ab cd]
set l_matchlist [list]
foreach item1 $list1 {
foreach item2 $list2 {
if {[string match -nocase "*${item2}*" $item1]} {
lappend l_matchlist $item1
break
}
}
}
My actual lists are very long and this takes a long time. Is this the best way to do this?
In addition to being slow, there is also a problem if list2 contains elements that have glob wildcard characters, such as '?' and '*'.
I expect the following method will work faster. At least it fixes the issue mentioned above:
set list1 [list abc BCD ace cde]
set list2 [list cd ab de]
set l_matchlist [list]
foreach item2 $list2 {
lappend l_matchlist \
{*}[lsearch -all -inline -nocase -regexp $list1 (?q)$item2]
}
The -regexp option in combination with (?q) may seem strange at first. It selects regexp matching and then tells the regexp engine to treat the rest of the pattern as a literal string. Because regexp matching is unanchored, this has the effect of performing the partial (substring) match that you're after.
This differs from your version in that it may produce the results in a different order, and the same item from list1 may be reported multiple times if it matches more than one item in list2.
If that is undesired, you can follow up with:
set l_matchlist [lmap item1 $list1 {
if {$item1 ni $l_matchlist} continue
set item1
}]
Of course that will reduce some of the speed gains achieved earlier.
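To illustrate the wildcard problem mentioned above, here is a small comparison. The sample list and the variable names viaGlob and viaQuoted are made up for illustration. With item2 set to a*c, the original glob pattern becomes *a*c*, where the inner * acts as a wildcard. With (?q), the same search term is matched literally.

```tcl
# Sample data, invented for illustration
set list1 [list abc a*c a?c]

# Glob matching: the "*" in the search term is a wildcard, so all three match
set viaGlob [lsearch -all -inline -glob $list1 "*a*c*"]

# (?q) makes the rest of the regexp literal, so only a real "a*c" matches
set viaQuoted [lsearch -all -inline -regexp $list1 {(?q)a*c}]

puts $viaGlob
puts $viaQuoted
```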
You could cheat a bit and turn it from a list-processing task to a string processing task. The latter are usually quite a bit faster in Tcl.
Below I first turn list1 into a string, with the original list elements separated by the ASCII unit separator character "\x1F". Then the result can be obtained in a single loop via a regular expression search. The regular expression finds the first substring, bounded by separator characters, that contains item2:
# convert list to string:
set string1 \x1F[join $list1 \x1F]\x1F
set l_matchlist [list]
foreach item2 $list2 {
# escape out regexp special chars:
set item2 [regsub -all {\W} $item2 {\\&}]
# use append to assemble regexp pattern
set item2 [append x {[^\x1F]*} $item2 {[^\x1F]*}][unset x]
if {[regexp -nocase $item2 $string1 match]} {
lappend l_matchlist $match
}
}
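The loop above keeps only the first match for each item2, so it can miss elements when several items in list1 contain the same item2. Here is a sketch of a variant that collects every match. It keeps the same \x1F delimiter trick but uses regexp -all -inline and builds the pattern with plain string substitution instead of the append/unset idiom. The variable names escaped and pattern are new.

```tcl
set list1 [list abc BCD ace cde]
set list2 [list cd ab de]

# Delimit every element with the unit separator so matches can't span elements
set string1 \x1F[join $list1 \x1F]\x1F

set l_matchlist [list]
foreach item2 $list2 {
    # Escape regexp special characters so item2 is matched literally
    set escaped [regsub -all {\W} $item2 {\\&}]
    # Any run of non-separator characters around the literal search text
    set pattern "\[^\x1F\]*${escaped}\[^\x1F\]*"
    # -all -inline returns every matching element, not just the first one
    lappend l_matchlist {*}[regexp -all -inline -nocase $pattern $string1]
}
puts $l_matchlist
```

Like the lsearch version, this can report an element more than once if it matches several items of list2.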

How to copy exactly in tcl?

I created a list using:
set list1 { o\\/one o\\/two o\\/three }
Now I want to copy this list to another list, adding { } around each item.
My new list should become:
{ {o\\/one} {o\\/two} {o\\/three} }
I tried using:
foreach a $list1 {
    set x "{$a}"
    append new_list " " "{$a}"
    lappend new_list1 $x
}
new_list → {o\/one} {o\/two} {o\/three}
new_list1 → {{o\/one}} {{o\/two}} {{o\/three}}
Please help?
Your original list has these items in it (as you can verify with lindex):
puts [lindex $list1 0] → o\/one
puts [lindex $list1 1] → o\/two
puts [lindex $list1 2] → o\/three
Any list that has those elements in it, however encoded, is pairwise-equivalent. The canonical form (as produced by Tcl's own list operations) of the list is:
{o\/one} {o\/two} {o\/three}
Perhaps the easiest way of obtaining that is:
set list2 [lrange $list1 0 end]
The lrange command uses Tcl's standard list-to-string engine (shared with a great many other commands). That prefers to not add braces, but prefers adding braces to adding backslashes; backslashes are a last resort because they're ugly and hard to read. But it works with arbitrary contents in the elements; just blindly adding braces is vulnerable to tricky edge cases.
Another way of getting the above canonical form is this (provided you're on Tcl 8.5 or later, which introduced the {*} syntax; older versions are no longer supported anyway):
set list2 [list {*}$list1]
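Here is a quick check that list {*} produces the canonical form while leaving the element values untouched. The loop variable names are illustrative.

```tcl
set list1 { o\\/one o\\/two o\\/three }

# Expanding the elements into a fresh list makes Tcl regenerate the
# string form from the element values, i.e. the canonical form
set list2 [list {*}$list1]
puts $list2

# The elements themselves are identical; only the encoding differs
foreach a $list1 b $list2 {
    puts [expr {$a eq $b}]
}
```

This prints {o\/one} {o\/two} {o\/three}, followed by 1 three times.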
[EDIT]: If you've got a string with some things in it separated by spaces, you might want to convert it into a proper list; this is useful particularly when the input data contains list metacharacters like braces and (relevant in this case) backslashes. There are two main ways to do this:
set theList [split $inputString]
set theList [regexp -all -inline {\S+} $inputString]
They differ in what happens when the input string has two (or more) spaces between two words:
set inputString "a b  c d"; # NB: two spaces between b and c
puts [split $inputString]; # ==> a b {} c d
puts [regexp -all -inline {\S+} $inputString]; # ==> a b c d
There are use-cases for both.

list searching to find exact matches using TCL lsearch

I have a list and need to search for some strings in it. My list looks like this:
set list1 {slt0_reg_11.CK slt0_reg_11.Q slt0_reg_12.CK slt0_reg_12.Q}
I am trying to use lsearch to check whether the above list includes some strings or not. The strings look like:
set string1 {slt0_reg_1 slt0_reg_1}
I am doing the following to check this:
set listInd [lsearch -all -exact -nocase -regexp $list1 $string1]
This command gives the indexes if list1 includes $string1 (this is what I want). However, the problem is that if I search for a string like slt0_reg_1, the above command matches the first two elements of the list (slt0_reg_11.CK slt0_reg_11.Q), because they contain the string I am searching for.
How can I make exact search?
It sounds like you want to add word-boundary constraints (\y) to your RE. (Don't use -exact and -regexp at the same time; only one of those modes can be used on any run, because they change the comparison engine used.) A little care must be taken because we can't enclose the RE in braces, as we want to do variable substitution within it.
set list1 {slt0_reg_11.CK slt0_reg_11.Q slt0_reg_12.CK slt0_reg_12.Q}
foreach str {slt0_reg_11 slt0_reg_1} {
set matches [lsearch -all -regexp $list1 "\\y$str\\y"]
puts "$str: $matches"
}
Prints:
slt0_reg_11: 0 1
slt0_reg_1:
If you want to match the part of each element before the dot exactly against another list, you may be better off using lmap (available from Tcl 8.6):
set index -1
set listInd [lmap str $list1 {
incr index
if {[lindex [split $str .] 0] ni $string1} continue
set index
}]
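Here is a self-contained run of that snippet. It assumes a hypothetical string1 whose names are compared against the part of each element before the dot.

```tcl
set list1 {slt0_reg_11.CK slt0_reg_11.Q slt0_reg_12.CK slt0_reg_12.Q}
# Hypothetical search terms: the part before the dot must match exactly
set string1 {slt0_reg_1 slt0_reg_12}

set index -1
set listInd [lmap str $list1 {
    incr index
    # Skip elements whose prefix before "." is not one of the wanted names
    if {[lindex [split $str .] 0] ni $string1} continue
    # The last command's result is what lmap collects
    set index
}]
puts $listInd
```

Here slt0_reg_1 no longer matches slt0_reg_11.*, so only the indexes of the slt0_reg_12 elements (2 3) are printed.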

How to split string by numerics

I have tried to use split, but it still fails.
set strdata "34a64323R6662w0332665323020346t534r66662v43037333444533053534a64323R6662w0332665323020346t534r66662v430373334445330535"
puts [split $strdata "3334445330535"] ;#<---- this command does not work
The result I need is:
{34a64323R6662w0332665323020346t534r66662v43037} {34a64323R6662w0332665323020346t534r66662v43037}
The split command's optional second argument is interpreted as a set of characters to split on, so it really isn't going to do what you want. However, there are other approaches. One of the simpler methods of doing what you want is to use string map to convert the character sequence into a character that isn't in the input data (Unicode is full of those!) and then split on that:
set strdata "34a64323R6662w0332665323020346t534r66662v43037333444533053534a64323R6662w0332665323020346t534r66662v430373334445330535"
set splitterm "3334445330535"
set items [split [string map [list $splitterm "\uFFFF"] $strdata] "\uFFFF"]
foreach i $items {
    # [list $i] makes an empty element visible as {}
    puts "==> [list $i]"
}
# ==> 34a64323R6662w0332665323020346t534r66662v43037
# ==> 34a64323R6662w0332665323020346t534r66662v43037
# ==> {}
Note that there is a {} (i.e., an empty-string list element) at the end, because that is the (empty) string that comes after the last occurrence of the separator. If you don't want that, add a string trimright between the string map and the split:
# Doing this in steps because the line is a bit long otherwise
set mapped [string map [list $splitterm "\uFFFF"] $strdata]
set trimmed [string trimright $mapped "\uFFFF"]
set items [split $trimmed "\uFFFF"]
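If you need this in more than one place, the steps above could be wrapped in a small procedure. The name splitOnString is hypothetical, and \uFFFF is assumed not to occur in the data.

```tcl
# Hypothetical helper wrapping the string map + string trimright + split steps
proc splitOnString {data sep} {
    # Replace each occurrence of the multi-character separator with \uFFFF,
    # a character assumed to be absent from the input
    set mapped [string map [list $sep \uFFFF] $data]
    # Drop the empty element(s) a trailing separator would produce
    set trimmed [string trimright $mapped \uFFFF]
    return [split $trimmed \uFFFF]
}

set strdata "34a64323R6662w0332665323020346t534r66662v43037333444533053534a64323R6662w0332665323020346t534r66662v430373334445330535"
foreach part [splitOnString $strdata 3334445330535] {
    puts $part
}
```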
The split command doesn't work like that; see the documentation.
Try making the data string into a list like this:
regsub -all 3334445330535 $strdata " "
i.e. replacing the delimiter with a space.
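Here is a sketch of that approach with the question's data. It relies on the pieces containing no spaces or list metacharacters, which holds for this input but not in general. The variable name spaced is illustrative.

```tcl
set strdata "34a64323R6662w0332665323020346t534r66662v43037333444533053534a64323R6662w0332665323020346t534r66662v430373334445330535"

# Replace every occurrence of the delimiter with a space;
# without a variable name argument, regsub returns the result
set spaced [regsub -all 3334445330535 $strdata " "]

# The trailing space is ignored when the string is read as a list
foreach part $spaced {
    puts $part
}
```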
Documentation: regsub, split