I created a list using:
set list1 { o\\/one o\\/two o\\/three }
Now I want to copy this list to another list, wrapping each item in { }.
My new list should become:
{ {o\\/one} {o\\/two} {o\\/three} }
I tried using
foreach a $list1 {
set x "{$a}"
append new_list " " "{$a}"
lappend new_list1 $x
}
new_list → {o\/one} {o\/two} {o\/three}
new_list1 → {{o\/one}} {{o\/two}} {{o\/three}}
Please help?
Your original list has these items in it (as you can verify with lindex):
puts [lindex $list1 0] → o\/one
puts [lindex $list1 1] → o\/two
puts [lindex $list1 2] → o\/three
Any list that has those elements in it, however encoded, is pairwise-equivalent. The canonical form (as produced by Tcl's own list operations) of the list is:
{o\/one} {o\/two} {o\/three}
Perhaps the easiest way of obtaining that is:
set list2 [lrange $list1 0 end]
The lrange command uses Tcl's standard list-to-string engine (shared with a great many other commands). That prefers to not add braces, but prefers adding braces to adding backslashes; backslashes are a last resort because they're ugly and hard to read. But it works with arbitrary contents in the elements; just blindly adding braces is vulnerable to tricky edge cases.
Another way of getting the above canonical form is this (provided you're not stuck on versions of Tcl so old they're no longer supported):
set list2 [list {*}$list1]
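As a small check of the point about equivalent encodings, the sketch below rebuilds the list with list {*} and then compares the two lists element by element. The variable name same is only an illustration; {*} needs Tcl 8.5 or later.

```tcl
# Requires Tcl 8.5+ for the {*} expansion syntax.
set list1 { o\\/one o\\/two o\\/three }

# Rebuilding the list through a list command yields the canonical string form.
set list2 [list {*}$list1]
puts $list2                   ;# {o\/one} {o\/two} {o\/three}

# The two strings look different but hold the same three elements.
set same 1
foreach a $list1 b $list2 {
    if {$a ne $b} { set same 0 }
}
puts "same elements: $same"   ;# same elements: 1
```

So there is no need to add braces by hand; the list commands pick the quoting for you.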
[EDIT]: If you've got a string with some things in it separated by spaces, you might want to convert it into a proper list; this is useful particularly when the input data contains list metacharacters like braces and (relevant in this case) backslashes. There are two main ways to do this:
set theList [split $inputString]
set theList [regexp -all -inline {\S+} $inputString]
They differ in what happens when the input string has two (or more) spaces between two words:
set inputString "a b  c d"; # NB: two spaces between b and c
puts [split $inputString]; # ==> a b {} c d
puts [regexp -all -inline {\S+} $inputString]; # ==> a b c d
There are use-cases for both.
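To see why the conversion matters when the input contains list metacharacters, here is a minimal sketch with a made-up input string that has an unbalanced brace. Using the raw string as a list fails, while split just treats the brace as an ordinary character.

```tcl
# Hypothetical input with an unbalanced open brace (escaped so that this
# snippet can also be pasted inside a braced script body).
set inputString "say \{hello world"

# Treating the raw string as a list raises an error.
set failed [catch {llength $inputString} msg]
puts "raw string is a valid list: [expr {!$failed}]"   ;# ... 0

# split produces a proper three-element list; the brace is plain data.
set theList [split $inputString]
puts [llength $theList]                                ;# 3
puts [string length [lindex $theList 1]]               ;# 6
```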
Related
I'm trying to find the items in list1 that are partial string matches against items from list2 using Tcl.
I'm using this, but it's very slow. Is there any more efficient way to do this?
set list1 [list abc bcd cde]
set list2 [list ab cd]
set l_matchlist [list]
foreach item1 $list1 {
foreach item2 $list2 {
if {[string match -nocase "*${item2}*" $item1]} {
lappend l_matchlist $item1
break
}
}
}
My actual lists are very long and this takes a long time. Is this the best way to do it?
In addition to being slow, there is also a problem if list2 contains elements that have glob wildcard characters, such as '?' and '*'.
I expect the following method will work faster. At least it fixes the issue mentioned above:
set list1 [list abc BCD ace cde]
set list2 [list cd ab de]
set l_matchlist [list]
foreach item2 $list2 {
lappend l_matchlist \
{*}[lsearch -all -inline -nocase -regexp $list1 (?q)$item2]
}
The -regexp option in combination with (?q) may seem strange at first. It uses regexp matching and then tells regexp to treat the pattern as a literal string. But this has the effect of performing the partial match that you're after.
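To illustrate the wildcard problem and what (?q) changes, here is a small sketch with made-up data in which the search term itself contains a *. The names globHits and literalHits are only for this example.

```tcl
set list1 [list a*c abc a.c]
set item2 a*c

# Glob matching: the * inside item2 is treated as a wildcard,
# so every element that has an "a" followed later by a "c" matches.
set globHits {}
foreach item1 $list1 {
    if {[string match -nocase "*${item2}*" $item1]} {
        lappend globHits $item1
    }
}
puts $globHits      ;# a*c abc a.c

# (?q) makes the rest of the pattern a literal string,
# so only elements containing the text "a*c" match.
set literalHits [lsearch -all -inline -nocase -regexp $list1 (?q)$item2]
puts $literalHits   ;# a*c
```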
This differs from your version in that it may produce the results in a different order, and the same item from list1 may be reported multiple times if it matches more than one item in list2.
If that is undesired, you can follow up with:
set l_matchlist [lmap item1 $list1 {
if {$item1 ni $l_matchlist} continue
set item1
}]
Of course that will reduce some of the speed gains achieved earlier.
You could cheat a bit and turn it from a list-processing task to a string processing task. The latter are usually quite a bit faster in Tcl.
Below I first turn list1 into a string with the original list elements separated by the ASCII field separator character "\x1F". Then the result can be gotten in a single loop via a regular expression search. The regular expression finds the first substring bounded by the field separator chars that contains item2:
# convert list to string:
set string1 \x1F[join $list1 \x1F]\x1F
set l_matchlist [list]
foreach item2 $list2 {
# escape out regexp special chars:
set item2 [regsub -all {\W} $item2 {\\&}]
# use append to assemble regexp pattern
set item2 [append x {[^\x1F]*} $item2 {[^\x1F]*}][unset x]
if {[regexp -nocase $item2 $string1 match]} {
lappend l_matchlist $match
}
}
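Note that the loop above records only the first matching element for each item2. If every matching element is wanted, one possible variation (a sketch, not a benchmarked solution) is to let regexp -all -inline return all matches at once. The names quoted and pattern are introduced here for illustration.

```tcl
set list1 [list abc BCD ace cde]
set list2 [list cd ab]

# Join the list into one string, delimited by the field separator character.
set string1 \x1F[join $list1 \x1F]\x1F

set l_matchlist [list]
foreach item2 $list2 {
    # Escape regexp special characters in the search term.
    set quoted [regsub -all {\W} $item2 {\\&}]
    # Pattern text is: [^\x1F]*<term>[^\x1F]*
    set pattern "\[^\\x1F\]*${quoted}\[^\\x1F\]*"
    # -all -inline returns every matching element, not just the first.
    lappend l_matchlist {*}[regexp -all -inline -nocase $pattern $string1]
}
puts $l_matchlist   ;# BCD cde abc
```

As with the lsearch version, the results come out grouped by item2 rather than in list1 order, and an element can appear more than once if it matches several search terms.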
Is there a way to add strings (words) that end with a semi-colon to a list without it being a single-item list itself?
I'm working with strings and breaking them into words; and the punctuation often is attached as a suffix. I need to have the words both ways, that is with and without the punctuation.
Is it okay to simply let it be a list within a list, and just reference all of the words as if they were single-item lists? It appears to work. Or is there a better method altogether?
Thank you.
set list { a; b c d }
chan puts stdout $list; # a; b c d
set new "b;"
lset list 1 $new
chan puts stdout $list; # {a;} {b;} c d
chan puts stdout [lindex [lindex $list 1] 0]; # b;
chan puts stdout [lindex [lindex $list 3] 0]; # d
chan puts stdout [lindex $new 0]; # b;
I'm working with strings and breaking them into words
It is important to use split for this step. Other than that, you should be fine using the list it produces and appending to it. The various list commands will make sure that special characters (your ;) are protected from being misinterpreted.
From the Tcl perspective, there is no difference between a single-element list {a;} and an atomic string a;. To quote the Tclers' Wiki:
No program can tell the difference between the string "a" and the
one-element list "a", because the one-element list "a" is the string
"a".
Don't let the curly braces confuse you.
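A minimal sketch of that workflow, using a made-up input line: split the string into words, then derive the punctuation-free form of each word with string trimright. The set of punctuation characters trimmed here is an assumption; adjust it to your data.

```tcl
set line "alpha; beta gamma; delta"

# split gives a proper list; "alpha;" is simply one element of it.
set words [split $line]

# Build the punctuation-free variants alongside the originals.
set bare {}
foreach w $words {
    lappend bare [string trimright $w ";,.:!?"]
}

puts [llength $words]     ;# 4
puts [lindex $words 0]    ;# alpha;
puts [lindex $bare 0]     ;# alpha
puts [lindex $bare 2]     ;# gamma
```

The braces only show up when the whole list is printed as a string; lindex always hands back the plain word.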
I have a list and need to search some strings in this list. My list is like following:
list1 = {slt0_reg_11.CK slt0_reg_11.Q slt0_reg_12.CK slt0_reg_12.Q}
I am trying to use lsearch to check if above list includes some strings or not. Strings are like:
string1 = {slt0_reg_1 slt0_reg_1}
I am doing the following to check this:
set listInd [lsearch -all -exact -nocase -regexp $list1 $string1]
This command gives the indexes if list1 includes $string1 (this is what I want). However, the problem is that if I have a string like slt0_reg_1, the above command identifies the first two elements of the list (slt0_reg_11.CK slt0_reg_11.Q) because these contain the string I am searching for.
How can I make exact search?
It sounds like you want to add word-boundary constraints (\y) to your RE. (Don't use -exact and -regexp at the same time; only one of those modes can be used on any run because they change the comparison engine used.) A little care must be taken because we can't enclose the RE in braces, as we want to do variable substitution within it.
set list1 {slt0_reg_11.CK slt0_reg_11.Q slt0_reg_12.CK slt0_reg_12.Q}
foreach str {slt0_reg_11 slt0_reg_1} {
set matches [lsearch -all -regexp $list1 "\\y$str\\y"]
puts "$str: $matches"
}
Prints:
slt0_reg_11: 0 1
slt0_reg_1:
If you want to compare your list for an exact match of the part before the dot against another list, you may be better off using lmap:
set index -1
set listInd [lmap str $list1 {
incr index
if {[lindex [split $str .] 0] ni $string1} continue
set index
}]
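For a concrete run of the lmap approach, here is a sketch with the list from the question and a made-up set of names to look for (string1 below is an assumption chosen so that one name matches and one does not). lmap needs Tcl 8.6.

```tcl
# Requires Tcl 8.6+ for lmap.
set list1 {slt0_reg_11.CK slt0_reg_11.Q slt0_reg_12.CK slt0_reg_12.Q}
set string1 {slt0_reg_1 slt0_reg_12}   ;# hypothetical names to search for

set index -1
set listInd [lmap str $list1 {
    incr index
    # Compare only the part before the first dot, as a whole string.
    if {[lindex [split $str .] 0] ni $string1} continue
    set index
}]
puts $listInd   ;# 2 3
```

slt0_reg_1 matches nothing because no element has exactly that name before the dot, while slt0_reg_12 picks up indexes 2 and 3.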
I am comparing two strings, how I can get the part of string which did not match between these two
This is an interesting problem that requires a longest common subsequence algorithm. Tcl's got one of those already in Tcllib, but it's for lists. Fortunately, we can convert a string into a list of characters with split:
package require struct::list
set a "the quick brown fox"
set b "the slow green fox"
set listA [split $a ""]; set lenA [llength $listA]
set listB [split $b ""]; set lenB [llength $listB]
set correspondences [struct::list longestCommonSubsequence $listA $listB]
set differences [struct::list lcsInvertMerge $correspondences $lenA $lenB]
Now we can get the parts that didn't match up by picking the parts from the differences that are added, changed or deleted:
set common {}
set unmatchedA {}
set unmatchedB {}
foreach diff $differences {
lassign $diff type rangeA rangeB
switch $type {
unchanged {
lappend common [join [lrange $listA {*}$rangeA] ""]
}
added {
lappend unmatchedB [join [lrange $listB {*}$rangeB] ""]
}
changed {
lappend unmatchedA [join [lrange $listA {*}$rangeA] ""]
lappend unmatchedB [join [lrange $listB {*}$rangeB] ""]
}
deleted {
lappend unmatchedA [join [lrange $listA {*}$rangeA] ""]
}
}
}
puts common->$common
# common->{the } ow {n fox}
puts A->$unmatchedA
# A->{quick br}
puts B->$unmatchedB
# B->sl { gree}
In this case, we see the following correspondences (. is a spacer I've inserted to help line things up):
the quick br..ow.....n fox
the ........slow green fox
Whether this is exactly what you want, I don't know (and there's more detail in the computed differences; they're just a bit hard to read). You can easily switch to doing a word-by-word correspondence instead if that's more to your taste. It's pretty much just removing the split and join…
If you have a string and you want to remove a fixed substring, for example
set str "this is a larger? string"
set substr "a larger?"
Then you can do this:
set parts [split [string map [list $substr \uffff] $str] \uffff]
# returns the list: {this is } { string}
That globally replaces the substring within the larger string with a single character, then splits the result on that same character.
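One caveat with this trick is that the marker character must not already occur in the input. The sketch below adds a guard for that; the marker variable and the error message are illustrative additions, not part of the original snippet.

```tcl
set str "this is a larger? string"
set substr "a larger?"
set marker \uffff

# Guard: the trick breaks if the marker is already present in the input.
if {[string first $marker $str] >= 0} {
    error "marker character already present in input"
}

# Replace the substring with the marker, then split on the marker.
set parts [split [string map [list $substr $marker] $str] $marker]

puts [llength $parts]          ;# 2
puts "<[lindex $parts 0]>"     ;# <this is >
puts "<[lindex $parts 1]>"     ;# < string>
```

Because string map does plain text replacement, the ? in the substring needs no escaping.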
I am doing :
glob -nocomplain *
as a result I get 4 files:
a b c d
How can I remove b from the list?
I am using this func:
proc lremove {args} {
if {[llength $args] < 2} {
puts stderr {Wrong # args: should be "lremove ?-all? list pattern"}
}
set list [lindex $args end-1]
set elements [lindex $args end]
if [string match -all [lindex $args 0]] {
foreach element $elements {
set list [lsearch -all -inline -not -exact $list $element]
}
} else {
# Using lreplace to truncate the list saves having to calculate
# ranges or offsets from the indexed element. The trimming is
# necessary in cases where the first or last element is the
# indexed element.
foreach element $elements {
set idx [lsearch $list $element]
set list [string trim \
"[lreplace $list $idx end] [lreplace $list 0 $idx]"]
}
}
return $list
}
However, it does not work with glob results, only with strings. Please help.
That lremove procedure is rather dodgy, really, what with swapping the order around, ghetto concatenation and string trim to try to clean up the mess. Yuck. Here's a simpler version (without support for -all, which you don't need for processing the output of glob as that's normally a list of unique elements anyway):
proc lremove {list args} {
foreach toRemove $args {
set index [lsearch -exact $list $toRemove]
set list [lreplace $list $index $index]
}
return $list
}
Let's test it!
% lremove {a b c d e} b d f
a c e
Theoretically it could be made more efficient, but it would take a lot of work and be a PITA to debug. This version is way easier to write and is obviously correct. It should also be substantially faster than what you were working with, as it sticks to purely list operations.
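Applied to the question's case, it might look like the sketch below. The files list is a stand-in for the result of glob -nocomplain * so that the example does not depend on the current directory. The last line shows an alternative using the lsearch -not form from the question's own code, which drops every occurrence of the element in one call.

```tcl
proc lremove {list args} {
    foreach toRemove $args {
        set index [lsearch -exact $list $toRemove]
        set list [lreplace $list $index $index]
    }
    return $list
}

# Stand-in for: set files [glob -nocomplain *]
set files {a b c d}

set kept [lremove $files b]
puts $kept    ;# a c d

# Alternative without a helper proc: keep everything that is not exactly "b".
set kept2 [lsearch -all -inline -not -exact $files b]
puts $kept2   ;# a c d
```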
The results from glob shouldn't be so special that any unusual effort is required to work with them, but there were some really nasty historic bugs that made that not always true. The latest versions of 8.4 and 8.5 (i.e., 8.4.20 and 8.5.15) don't have the bugs. Nor does any release version of 8.6 (8.6.0 or 8.6.1). If stuff is behaving mysteriously, we'll get into asking about versions and telling you to not be quite so behind the times…