Searching for a number in a sorted list in Tcl - tcl

I'm using Tcl. I have a sorted list of real numbers. Given the number n I need to find an index of the list element which is:
either less of equal to n;
or greater than n.
Is there any standard way to do this? lsearch expects exact match and can't be used.

With Tcl 8.6 (still in beta) lsearch will do what your asking, the -sorted and new -bisect options allow the following:
-bisect
Inexact search when the list elements are in sorted order. For an increasing list the last index where the element is less than
or equal to the pattern is returned. For a decreasing list the last
index where the element is greater than or equal to the pattern is
returned.
For Tcl versions prior to 8.6 your going to have to roll your own code, given that the list is sorted it should be fairly straightforward to write a binary search with the properties that you require, Rosetta code here contains a description of a pure binary search and also a Tcl implemention. You should be able to use this as your starting point.
Here is a very quick version I created, it returns the index of either the value you search for or the value greater than it. The exception to watch for is the end of the list, searching for a value beyond the largest element returns the largest element. It's only had minimal testing so if you do use it do some additional tests! I also don't stop if the search finds the value, if this is likely to happen often you may want to optimize for this.
set lst [lsort -real [list 1.2 3.4 5.4 7.9 2.3 1.1 0.9 22.7 4.3]]
puts $lst
# Assumes that lst is sorted in ascending order
proc bisect { lst val } {
puts "Looking for $val in $lst"
set len [llength $lst]
# Initial interval - the start to the middle of the list
set start 0
set end [expr $len - 1]
set mid [expr $len / 2]
set lastmid -1
while { $mid != $lastmid } {
if { [expr $val <= [lindex $lst $mid]] } {
# val lies somewhere between the start and the mid
set end $mid
} else {
# val lies somewhere between mid and end
set start [expr $mid + 1]
}
set lastmid $mid
set mid [expr ($start + $end ) / 2]
}
return $mid
}
set res [bisect $lst 2.4]
puts "found [lindex $lst $res] at index $res"
set res [bisect $lst -1]
puts "found [lindex $lst $res] at index $res"
set res [bisect $lst 999]
puts "found [lindex $lst $res] at index $res"
set res [bisect $lst 1.2]
puts "found [lindex $lst $res] at index $res"
set res [bisect $lst 0.9]
puts "found [lindex $lst $res] at index $res"
set res [bisect $lst 22.7]
puts "found [lindex $lst $res] at index $res"

Related

How to create an efficient permutation algorithm in Tcl?

I have written the following proc in tcl which gives a permutation of the set {1, 2, ..., n} for some positive integer n:
proc permu {n} {
set list {}
while {[llength $list] < $n} {
set z [expr 1 + int(rand() * $n)]
if {[lsearch $list $z] == -1} {
lappend list $z
}
}
return $list
}
I have used some code snippets from tcl-codes which I found on other web sites in order to write the above one.
The following part of the code is problematic:
[lsearch $list $z] == -1
This makes the code quite inefficient. For example, if n=10000 then it takes a few seconds
until the result is displayed and if n=100000 then it takes several minutes. On the other hand, this part is required as I need to check whether a newly generated number is already in my list.
I need an efficient code to permute the set {1, 2, ..., n}. How can this be solved in tcl?
Thank you in advance!
Looking up a value in a list is a problem that grows in runtime as the list gets larger. A faster way is to look up a key in a dictionary. Key lookup time does not increase as the size of the dictionary increases.
Taking advantage of the fact the Tcl dictionary keys are ordered by oldest to most recent:
proc permu {n} {
set my_dict [dict create]
while {[dict size $my_dict] < $n} {
set z [expr 1 + int(rand() * $n)]
if {![dict exists $my_dict $z]} {
dict set my_dict $z 1
}
}
return [dict keys $my_dict]
}
This fixes the problem of slow list lookup, but the random number z is now the limiting factor. As the dict size approaches $n you need to wait longer and longer for a new value of z to be a unique value.
A different faster approach is to first assign the numbers 1 to n as value to randomized keys in a dict. Next, you can get values of each sorted key.
proc permu2 {n} {
# Add each number in sequence as a value to a dict for a random key.
set random_key_dict [dict create]
for {set i 1} {$i <= $n} {incr i} {
while {1} {
set random_key [expr int(rand() * $n * 100000)]
if {![dict exists $random_key_dict $random_key]} {
dict set random_key_dict $random_key $i
break
}
}
}
# Sort the random keys to shuffle the values.
set permuted_list [list]
foreach key [lsort -integer [dict keys $random_key_dict]] {
lappend permuted_list [dict get $random_key_dict $key]
}
return $permuted_list
}

To add preceding digits in a list in Tcl

I am trying to incorporate the preceding digits in a numerical list, say 1-14, before the list 15-43.
I am using TCL for writing my code.
I am expecting the list should be 1,2,3,....43, instead the list is coming to be 15,16,17,...43.
I have tried to incorporate the missing numbers as follows:
set nres2 ""
set x 0
set y [lindex $nres $x]
while {$x < [llength $nres]} {
set i [lindex $nres $x]
while {$y < $i} {
lappend nres2 $y
incr y
}
incr x
incr y
}
This will incorporate missing numbers within a list like, if a list 15,16,17...,43 do not have numbers like 18, 22, 34, etc., it will incorporate these numbers in a separate list named as nres2.
But, I cannot include the preceding numbers of a corresponding list.
Any comments/suggestions will be greatly helpful.
Thanks in advance...
Prathit Chatterjee
I would change the y at the start:
set nres2 ""
set x 0
set y 1
while {$x < [llength $nres]} {
set i [lindex $nres $x]
while {$y < $i} {
lappend nres2 $y
incr y
}
incr x
incr y
}
With nres as 4 6 8 9, nres2 becomes 1 2 3 5 7, which is what I think you are trying to get.
Jerry posted already the immediate fix, but you may think about general improvements over your implementation using pairwise comparisons:
proc printMissings1 {lst} {
set b [lrepeat [lindex $lst end] 0]
foreach i $lst {
lset b $i 1
}
lsearch -exact -integer -all $b 0
}
printMissings1 $nres
lrepeat is used to create a Tcl list (array) of [llength $nres] elements.
lset is used to mark the elements whose indices correspond to values in the given range.
lsearch is used to return the indices of the elements left unmarked, which correspond to the missing values.

Finding Median and average of list in tcl

I am having trouble finding a way to calculate the median and average of a list of numbers and the resources online seem to be really limited with Tcl. So far I managed to only print the numbers of the list.
Your help would be greatly appreciated.
proc ladd {l} {
set total 0
set counter 0
foreach nxt $l {
incr total $nxt
incr counter 1
}
puts "$total"
puts "$counter"
set average ($total/$counter)
puts "$average"
}
set a [list 4 3 2 1 15 6 29]
ladd $a
To get the average (i.e., the arithmetic mean) of a list, you can just do:
proc average {list} {
expr {[tcl::mathop::+ {*}$list 0.0] / max(1, [llength $list])}
}
That sums the values in the list (the trailiing 0.0 forces the result to be a floating point value, even if all the added numbers are integers) and divides by the number of elements (or 1 if the list is empty so an empty list gets a mean of 0.0 instead of an error).
To get the median of a list, you have to sort it and pick the middle element.
proc median {list {mode -real}} {
set list [lsort $mode $list]
set len [llength $list]
if {$len & 1} {
# Odd number of elements, unique middle element
return [lindex $list [expr {$len >> 1}]]
} else {
# Even number of elements, average the middle two
return [average [lrange $list [expr {($len >> 1) - 1] [expr {$len >> 1}]]]
}
}
To complete the set, here's how to get the mode of the list if there is a unique one (relevant for some applications where values are selected from a fairly small set):
proc mode {list} {
# Compute a histogram
foreach val $list {dict incr h $val}
# Sort the histogram in descending order of frequency; type-puns the dict as a list
set h [lsort -stride 2 -index 1 -descending -integer $h]
# The mode is now the first element
return [lindex $h 0]
}
I'll leave handling the empty and non-unique cases as an exercise.

TCL String Manipulation and Extraction

I have a string xxxxxxx-s12345ab7_0_0_xx2.log and need to have an output like AB700_xx2 in TCL.
ab will be the delimiter and need to extract from ab to . (including ab) and also have to remove only the first two underscores.
Tried string trim, string trimleft and string trimright, but not much use. Is there anything like string split in TCL?
The first stage is to extract the basic relevant substring; the easiest way to do that is actually with a regular expression:
set inputString "xxxxxxx-s12345ab7_0_0_xx2.log"
if {![regexp {ab[^.]+} $inputString extracted]} {
error "didn't match!"
}
puts "got $extracted"
# ===> got ab7_0_0_xx2
Then, we want to get rid of those nasty underscores with string map:
set final [string map {"_" ""} $extracted]
puts "got $final"
# ===> ab700xx2
Hmm, not quite what we wanted! We wanted to keep the last underscore and to up-case the first part.
set pieces [split $extracted "_"]
set final [string toupper [join [lrange $pieces 0 2] ""]]_[join [lrange $pieces 3 end] "_"]
puts "got $final"
# ===> got AB700_xx2
(The split command divides a string up into “records” by an optional record specifier — which defaults to any whitespace character — that we can then manipulate easily with list operations. The join command does the reverse, but here I'm using an empty record specifier on one half which makes everything be concatenated. I think you can guess what the string toupper and lrange commands do…)
set a "xxxxxxx-s12345ab7_0_0_xx2.log"
set a [split $a ""]
set trig 0
set extract ""
for {set i 0} {$i < [llength $a]} {incr i} {
if {"ab" eq "[lindex $a $i][lindex $a [expr $i+1]]"} {
set trig 1
}
if {$trig == 1} {
append extract [lindex $a $i]
}
}
set extract "[string toupper [join [lrange [split [lindex [split $extract .] 0] _] 0 end-1] ""]]_[lindex [split [lindex [split $extract .] 0] _] end]"
puts $extract
Only regexp is enough to do the trick.
Set string "xxxxxxx-s12345ab7_0_0_xx2.log"
regexp {(ab)(.*)_(.*)_(.*)_(.*)\\.} $string -> s1 s2 s3 s4 s5
Set rstring "$s1$s2$s3$s4\_$s5"
Puts $rstring

How can I cut the substring from a string in tcl

I have string like NYMEX UTBPI. Here I want to fetch the index of white space in middle of NYMEX and UTBPI and then from that index to last index I want to cut the substring. In this case my substring will be UTBPI
I'm using below
set part1 [substr $line [string index $line " "] [string index $line end-1]]
I'm getting below error.
wrong # args: should be "string index string charIndex"
while executing
"string index $line "
("foreach" body line 2)
invoked from within
"foreach line $pollerName {
set part1 [substr $line [string index $line ] [string index $line end-1]]
puts $part1
puts $line
}"
(file "Config.tcl" line 9)
Can you give me the idea on how can I do some other string manupulation as well. Any good link for this.
I would just use string range and pass it the index of the whitespace (that you can find using string first or whatever).
% set s "NYMEX UTBPI"
NYMEX UTBPI
% string range $s 6 end
UTBPI
Or using string first to dynamically find the whitespace:
% set output [string range $s [expr {[string first " " $s] + 1}] end]
UTBPI
If processor time isn't a problem, split it into a list and take the 2nd element:
set part1 [lindex [split $line] 1]
If the string can have an arbitrary number of words,
set new [join [lrange [split $line] 1 end]]
However, I'd use Donal's suggestion and stick with string operations.
I think, the best way to do it in Tcl, is:
set s "NYMEX UTBPI"
regexp -indices " " $s index;
puts [lindex $index 0]
the variable index will contain the first and the last index of your matching pattern. Here, as you are looking for single char, first and last will be the same, so you can use
puts [lindex $index 0]
or
puts [lindex $index 1]
For more info, this is the official doc: http://www.tcl.tk/man/tcl8.5/TclCmd/regexp.htm#M7