How I can get unmatched part of string using TCL? - tcl

I am comparing two strings, how I can get the part of string which did not match between these two

This is an interesting problem that requires a longest common subsequence algorithm. Tcl's got one of those already in Tcllib, but it's for lists. Fortunately, we can convert a string into a list of characters with split:
package require struct::list
set a "the quick brown fox"
set b "the slow green fox"
set listA [split $a ""]; set lenA [llength $listA]
set listB [split $b ""]; set lenB [llength $listB]
set correspondences [struct::list longestCommonSubsequence $listA $listB]
set differences [struct::list lcsInvertMerge $correspondences $lenA $lenB]
Now we can get the parts that didn't match up by picking the parts from the differences that are added, changed or deleted:
set common {}
set unmatchedA {}
set unmatchedB {}
foreach diff $differences {
lassign $diff type rangeA rangeB
switch $type {
unchanged {
lappend common [join [lrange $listA {*}$rangeA] ""]
}
added {
lappend unmatchedB [join [lrange $listB {*}$rangeB] ""]
}
changed {
lappend unmatchedA [join [lrange $listA {*}$rangeA] ""]
lappend unmatchedB [join [lrange $listB {*}$rangeB] ""]
}
deleted {
lappend unmatchedA [join [lrange $listA {*}$rangeA] ""]
}
}
}
puts common->$common
# common->{the } ow {n fox}
puts A->$unmatchedA
# A->{quick br}
puts B->$unmatchedB
# B->sl { gree}
In this case, we see the following correspondences (. is a spacer I've inserted to help line things up):
the quick br..ow.....n fox
the ........slow green fox
Whether this is exactly what you want, I don't know (and there's more detail in the computed differences; they're just a bit hard to read). You can easily switch to doing a word-by-word correspondence instead if that's more to your taste. It's pretty much just removing the split and join…

If you have a string and you want to remove a fixed substring, for example
set str "this is a larger? string"
set substr "a larger?"
Then you can do this:
set parts [split [string map [list $s2 \uffff] $s1] \uffff]
# returns the list: {this is } { string}
That globally replaces the substring within the larger string with a single character, then splits the result on that same character.

Related

Tcl partial string match of one list against another

I'm trying to find the items in list1 that are partial string matches against items from list2 using Tcl.
I'm using this, but it's very slow. Is there any more efficient way to do this?
set list1 [list abc bcd cde]
set list2 [list ab cd]
set l_matchlist [list]
foreach item1 $list1 {
foreach item2 $list2 {
if {[string match -nocase "*${item2}*" $item1]} {
lappend l_matchlist $item1
break
}
}
}
my actual lists are very long and this takes a long time. Is this the best way to do this?
In addition to being slow, there is also a problem if list2 contains elements that have glob wildcard characters, such as '?' and '*'.
I expect the following method will work faster. At least it fixes the issue mentioned above:
set list1 [list abc BCD ace cde]
set list2 [list cd ab de]
set l_matchlist [list]
foreach item2 $list2 {
lappend l_matchlist \
{*}[lsearch -all -inline -nocase -regexp $list1 (?q)$item2]
}
The -regexp option in combination with (?q) may seem strange at first. It uses regexp matching and then tells regexp to treat the pattern as a literal string. But this has the effect of performing the partial match that you're after.
This differs with your version in that it may produce the results in a different order and the same item from list1 may be reported multiple times if it matches more than one item in list2.
If that is undesired, you can follow up with:
set l_matchlist [lmap item1 $list1 {
if {$item1 ni $l_matchlist} continue
set item1
}]
Of course that will reduce some of the speed gains achieved earlier.
You could cheat a bit and turn it from a list-processing task to a string processing task. The latter are usually quite a bit faster in Tcl.
Below I first turn list1 into a string with the original list elements separated by the ASCII field separator character "\x1F". Then the result can be gotten in a single loop via a regular expression search. The regular expression finds the first substring bounded by the field separator chars that contains item2:
# convert list to string:
set string1 \x1F[join $list1 \x1F]\x1F
set l_matchlist [list]
foreach item2 $list2 {
# escape out regexp special chars:
set item2 [regsub -all {\W} $item2 {\\&}]
# use append to assemble regexp pattern
set item2 [append x {[^\x1F]*} $item2 {[^\x1F]*}][unset x]
if {[regexp -nocase $item2 $string1 match]} {
lappend l_matchlist $match
}
}

how to split a file to list of lists TCL

I'm coding TCL and I would like to split a file into two lists of lists,
the file contain:
(1,2) (3,4) (5,6)
(7,8) (9,10) (11,12)
and I would like to get two list
one for each line, that contain lists that each one contain to two number
for example:
puts $list1 #-> {1 2} {3 4} {5 6}
puts [lindex $list1 0] #-> 1 2
puts [lindex $list2 2] #-> 11 12
I tried to use regexp and split but no success
The idea of using regexp is good, but you'll need to do some post-processing on its output.
# This is what you'd read from a file
set inputdata "(1,2) (3,4) (5,6)\n(7,8) (9,10) (11,12)\n"
foreach line [split $inputdata "\n"] {
# Skip empty lines.
# (I often put a comment format in my data files too; this is where I'd handle it.)
if {$line eq ""} continue
# Parse the line.
set bits [regexp -all -inline {\(\s*(\d+)\s*,\s*(\d+)\s*\)} $line]
# Example results of regexp:
# (1,2) 1 2 (3,4) 3 4 (5,6) 5 6
# Post-process to build the lists you really want
set list([incr idx]) [lmap {- a b} $bits {list $a $b}]
}
Note that this is building up an array; long experience says that calling variables list1, list2, …, when you're building them in a loop is a bad idea, and that an array should be used, effectively giving variables like list(1), list(2), …, as that yields a much lower bug rate.
An alternate approach is to use a simpler regexp and then have scan parse the results. This can be more effective when the numbers aren't just digit strings.
foreach line [split $inputdata "\n"] {
if {$line eq ""} continue
set bits [regexp -all -inline {\([^()]+\)} $line]
set list([incr idx]) [lmap substr $bits {scan $substr "(%d,%d)"}]
}
If you're not using Tcl 8.6, you won't have lmap yet. In that case you'd do something like this instead:
foreach line [split $inputdata "\n"] {
if {$line eq ""} continue
set bits [regexp -all -inline {\(\s*(\d+)\s*,\s*(\d+)\s*\)} $line]
set list([incr idx]) {}
foreach {- a b} $bits {
lappend list($idx) [list $a b]
}
}
foreach line [split $inputdata "\n"] {
if {$line eq ""} continue
set bits [regexp -all -inline {\([^()]+\)} $line]
set list([incr idx]) {}
foreach substr $bits {
lappend list($idx) [scan $substr "(%d,%d)"]
# In *very* old Tcl you'd need this:
# scan $substr "(%d,%d)" a b
# lappend list($idx) [list $a $b]
}
}
You have an answer already, but it can actually be done a little bit simpler (or at least without regexp, which is usually a good thing).
Like Donal, I'll assume this to be the text read from a file:
set lines "(1,2) (3,4) (5,6)\n(7,8) (9,10) (11,12)\n"
Clean it up a bit, removing the parentheses and any white space before and after the data:
% set lines [string map {( {} ) {}} [string trim $lines]]
1,2 3,4 5,6
7,8 9,10 11,12
One way to do it with good old-fashioned Tcl, resulting in a cluster of variables named lineN, where N is an integer 1, 2, 3...:
set idx 0
foreach lin [split $lines \n] {
set res {}
foreach li [split $lin] {
lappend res [split $li ,]
}
set line[incr idx] $res
}
A doubly iterative structure like this (a number of lines, each having a number of pairs of numbers separated by a single comma) is easy to process using one foreach within the other. The variable res is used for storing result lines as they are assembled. At the innermost level, the pairs are split and list-appended to the result. For each completed line, a variable is created to store the result: its name consists of the string "line" and an increasing index.
As Donal says, it's not a good idea to use clusters of variables. It's much better to collect them into an array (same code, except for how the result variable is named):
set idx 0
foreach lin [split $lines \n] {
set res {}
foreach li [split $lin] {
lappend res [split $li ,]
}
set line([incr idx]) $res
}
If you have the results in an array, you can use the parray utility command to list them in one fell swoop:
% parray line
line(1) = {1 2} {3 4} {5 6}
line(2) = {7 8} {9 10} {11 12}
(Note that this is printed output, not a function return value.)
You can get whole lines from this result:
% set line(1)
{1 2} {3 4} {5 6}
Or you can access pairs:
% lindex $line(1) 0
1 2
% lindex $line(2) 2
11 12
If you have the lmap command (or the replacement linked to below), you can simplify the solution somewhat (you don't need the res variable):
set idx 0
foreach lin [split $lines \n] {
set line([incr idx]) [lmap li [split $lin] {
split $li ,
}]
}
Still simpler is to let the result be a nested list:
set lineList [lmap lin [split $lines \n] {
lmap li [split $lin] {
split $li ,
}
}]
You can access parts of the result similar to above:
% lindex $lineList 0
{1 2} {3 4} {5 6}
% lindex $lineList 0 0
1 2
% lindex $lineList 1 2
11 12
Documentation:
array,
foreach,
incr,
lappend,
lindex,
lmap (for Tcl 8.5),
lmap,
parray,
set,
split,
string
The code works for windows :
TCL file code is :
proc captureImage {} {
#open the image config file.
set configFile [open "C:/main/image_config.txt" r]
#To retrive the values from the config file.
while {![eof $configFile]} {
set part [split [gets $configFile] "="]
set props([string trimright [lindex $part 0]]) [string trimleft [lindex $part 1]]
}
close $configFile
set time [clock format [clock seconds] -format %Y%m%d_%H%M%S]
set date [clock format [clock seconds] -format %Y%m%d]
#create the folder with the current date
set folderPath $props(folderPath)
append folderDate $folderPath "" $date "/"
set FolderCreation [file mkdir $folderDate]
while {0} {
if { [file exists $date] == 1} {
}
break
}
#camera selection to capture image.
set camera "video"
append cctv $camera "=" $props(cctv)
#set the image resolution (XxY).
set resolutionX $props(resolutionX)
set resolutionY $props(resolutionY)
append resolution $resolutionX "x" $resolutionY
#set the name to the save image
set imagePrefix $props(imagePrefix)
set imageFormat $props(imageFormat)
append filename $folderDate "" $imagePrefix "_" $time "." $imageFormat
set logPrefix "Image_log"
append logFile $folderDate "" $logPrefix "" $date ".txt"
#ffmpeg command to capture image in background
exec ffmpeg -f dshow -benchmark -i $cctv -s $resolution $filename >& $logFile &
after 3000
}
}
captureImage
thext file code is :
cctv=Integrated Webcam
resolutionX=1920
resolutionY=1080
imagePrefix=ImageCapture
imageFormat=jpg
folderPath=c:/test/
//camera=video=Integrated Webcam,Logitech HD Webcam C525
This code works for me me accept the code from text file were list of parameters are passed.

TCL String Manipulation and Extraction

I have a string xxxxxxx-s12345ab7_0_0_xx2.log and need to have an output like AB700_xx2 in TCL.
ab will be the delimiter and need to extract from ab to . (including ab) and also have to remove only the first two underscores.
Tried string trim, string trimleft and string trimright, but not much use. Is there anything like string split in TCL?
The first stage is to extract the basic relevant substring; the easiest way to do that is actually with a regular expression:
set inputString "xxxxxxx-s12345ab7_0_0_xx2.log"
if {![regexp {ab[^.]+} $inputString extracted]} {
error "didn't match!"
}
puts "got $extracted"
# ===> got ab7_0_0_xx2
Then, we want to get rid of those nasty underscores with string map:
set final [string map {"_" ""} $extracted]
puts "got $final"
# ===> ab700xx2
Hmm, not quite what we wanted! We wanted to keep the last underscore and to up-case the first part.
set pieces [split $extracted "_"]
set final [string toupper [join [lrange $pieces 0 2] ""]]_[join [lrange $pieces 3 end] "_"]
puts "got $final"
# ===> got AB700_xx2
(The split command divides a string up into “records” by an optional record specifier — which defaults to any whitespace character — that we can then manipulate easily with list operations. The join command does the reverse, but here I'm using an empty record specifier on one half which makes everything be concatenated. I think you can guess what the string toupper and lrange commands do…)
set a "xxxxxxx-s12345ab7_0_0_xx2.log"
set a [split $a ""]
set trig 0
set extract ""
for {set i 0} {$i < [llength $a]} {incr i} {
if {"ab" eq "[lindex $a $i][lindex $a [expr $i+1]]"} {
set trig 1
}
if {$trig == 1} {
append extract [lindex $a $i]
}
}
set extract "[string toupper [join [lrange [split [lindex [split $extract .] 0] _] 0 end-1] ""]]_[lindex [split [lindex [split $extract .] 0] _] end]"
puts $extract
Only regexp is enough to do the trick.
Set string "xxxxxxx-s12345ab7_0_0_xx2.log"
regexp {(ab)(.*)_(.*)_(.*)_(.*)\\.} $string -> s1 s2 s3 s4 s5
Set rstring "$s1$s2$s3$s4\_$s5"
Puts $rstring

Removing elements from results of glob function in TCL

I am doing :
glob -nocomplain *
as a result I get 4 files:
a b c d
how can I remove from list b?
I am using this func:
proc lremove {args} {
if {[llength $args] < 2} {
puts stderr {Wrong # args: should be "lremove ?-all? list pattern"}
}
set list [lindex $args end-1]
set elements [lindex $args end]
if [string match -all [lindex $args 0]] {
foreach element $elements {
set list [lsearch -all -inline -not -exact $list $element]
}
} else {
# Using lreplace to truncate the list saves having to calculate
# ranges or offsets from the indexed element. The trimming is
# necessary in cases where the first or last element is the
# indexed element.
foreach element $elements {
set idx [lsearch $list $element]
set list [string trim \
"[lreplace $list $idx end] [lreplace $list 0 $idx]"]
}
}
return $list
}
however it does not working with glob results, but only with strings. please help.
That lreplace procedure is rather dodgy, really, what with swapping the order around, ghetto concatenation and string trim to try to clean up the mess. Yuck. Here's a simpler version (without support for -all, which you don't need for processing the output of glob as that's normally a list of unique elements anyway):
proc lremove {list args} {
foreach toRemove $args {
set index [lsearch -exact $list $toRemove]
set list [lreplace $list $index $index]
}
return $list
}
Let's test it!
% lremove {a b c d e} b d f
a c e
Theoretically it could be made more efficient, but it would take a lot of work and be a PITA to debug. This version is way easier to write and is obviously correct. It should also be substantially faster than what you were working with, as it sticks to purely list operations.
The results from glob shouldn't be particularly special that any unusual effort be required to work with them, but there were some really nasty historic bugs that made that not always true. The latest versions of 8.4 and 8.5 (i.e., 8.4.20 and 8.5.15) don't have the bugs. Nor does any release version of 8.6 (8.6.0 or 8.6.1). If stuff is behaving mysteriously, we'll get into asking about versions and telling you to not be quite so behind the times…

TCL Program that Compare String

I'm trying to create a program that the First and last characters are compared, Second and second to the last are compared, Third and third to the last are compared, and so on, and if any of these characters match, the two will be converted to the uppercase of that character.
Example:
Please enter a text: Hello Philippines
finals: HEllo PhIlippinEs
I can't create any piece of code, I'm stuck with
puts "Please enter text:"
set myText [gets stdin]
string index $myText 4
Can someone help me please?
This procedure will also capitalize the first i in Phillipines because it's equidistant from the start and the end of the string.
proc compare_chars {str} {
set letters [split $str ""]
for {set i [expr {[llength $letters] / 2}]} {$i >= 0} {incr i -1} {
set a [lindex $letters $i]
set b [lindex $letters end-$i]
if {$a eq $b} {
lset letters $i [set L [string toupper $a]]
lset letters end-$i $L
}
}
join $letters ""
}
puts [compare_chars "Hello Phillipines"]
# outputs => HEllo PhIllipinEs
The simplest way to code this is to use foreach over the split-up characters. (It's formally not the most efficient, but it's very easy to code correctly.)
puts "Please enter text:"
set myText [gets stdin]
set chars [split $myText ""]
set idx 0
foreach a $chars b [lreverse $chars] {
if {[string equals -nocase $a $b]} {
lset chars $idx [string toupper $a]
}
incr idx
}
set output [join $chars ""]
puts $output
Note that the foreach is iterating over a copy of the list; there are no problems with concurrent modification. In fact, the only vaguely-tricky part from a coding perspective is actually that we need to keep track of the index to modify, in the idx variable above.
With Tcl 8.6 you could write:
set chars [split $myText ""]
set output [join [lmap a $chars b [lreverse $chars] {
expr {[string equals -nocase $a $b] ? [string toupper $a] : $a}
}] ""]
That does depend on having the new lmap command though.
If you're really stuck with 8.3 (it's unsupported and has been so for years, so you should be prioritizing upgrading to something more recent) then try this:
set chars [split $myText ""]
set idx [llength $chars]
set output {}
foreach ch $chars {
if {[string equals -nocase $ch [lindex $chars [incr idx -1]]]} {
append output [string toupper $ch]
} else {
append output [string tolower $ch]
}
}
All the features this uses were present in 8.3 (though some were considerably slower than in later versions).