How to find the count of uppercase and lowercase letters in Tcl

How do I find the count of uppercase and lowercase letters in Tcl?
With this code I'm only getting ASCII values:
foreach character {H e l l o T C L} {
scan $character %c numeric
puts "ASCII character '$numeric' displays as '$character'."
}

Instead of looping through a string yourself, you can use regexp to give you a count:
set str "Hello Tcl"
puts "Uppercase: [regexp -all {[[:upper:]]} $str]"
puts "Lowercase: [regexp -all {[[:lower:]]} $str]"
I'm using [[:upper:]] and [[:lower:]] instead of [A-Z] and [a-z] because the former correctly match Unicode upper- and lowercase letters, rather than just the ones in the ASCII set.

You can test each character with string is upper $character and string is lower $character. Note that non-alphabetic characters are neither upper nor lower case. For more info check the documentation at https://www.tcl-lang.org/man/tcl8.6/TclCmd/string.htm#M10
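As a rough sketch of the per-character approach, the loop below counts both cases in one pass. The proc name countCase is made up for illustration, and -strict is passed so that an empty string is never counted as a match.

```tcl
# Hypothetical helper: returns a two-element list {uppercase lowercase}
proc countCase {str} {
    set upper 0
    set lower 0
    # split with an empty separator yields one element per character
    foreach ch [split $str ""] {
        if {[string is upper -strict $ch]} {
            incr upper
        } elseif {[string is lower -strict $ch]} {
            incr lower
        }
    }
    return [list $upper $lower]
}

lassign [countCase "Hello Tcl"] upper lower
puts "Uppercase: $upper"
puts "Lowercase: $lower"
```

Spaces, digits and punctuation fall through both tests and are simply not counted.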

Related

How to compare tcl strings with don't care chars in the middle?

I have lists of strings I want to compare
When comparing 2 strings, I want to ignore a single char - making it a don't care.
e.g.
Mister_T_had4_beers
should be equal to:
Mister_Q_had4_beers
but shouldn't be equal to Mister_T_had2_beers
I know that _had\d+ will always appear in the string, so it can be used as an anchor.
I believe I can split the 2 strings using regexp and compare, or use string equal -length to the point and from it onwards, but there must be a nicer way...
Edit
Based on the answer below (must read - pure gold!) the solution comes from regexp:
regexp -line {(.*).(_had\d+.*)\n\1.\2$} $str1\n$str2
If you know which character can vary, the easiest way is to use string match with a ? at the varying position.
if {[string match Mister_?_had4_beers $string1]} {
puts "$string1 matches the pattern"
}
You can also use string range or string replace to get strings to compare:
# Compare substrings; prefixes can be done with [string equal -length] too
if {[string range $string1 0 6] eq [string range $string2 0 6]
&& [string range $string1 8 end] eq [string range $string2 8 end]} {
puts "$string1 and $string2 are equal after ignoring the chars at index 7"
}
# Compare strings with variation point removed
if {[string replace $string1 7 7] eq [string replace $string2 7 7]} {
puts "$string1 and $string2 are equal after ignoring the chars at index 7"
}
To have the varying point be at an arbitrary position is trickier. The easiest approach for that is to select a character that is present in neither string, say a newline, and use that to make a single string that we can run a more elaborate RE against:
regexp -line {^(.*).(.*)\n\1.\2$} $string1\n$string2
The advantage of using a newline is that regexp's -line matching mode makes . not match a newline; we need to match it explicitly (which is great for our purposes).
If the strings you're comparing contain newlines, you'll need to pick something else (and the preferred RE gets more long-winded). There are plenty of rare Unicode characters you could choose, but \u0000 (NUL) is one of the best as it is exceptionally rare in non-binary data.
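A minimal sketch that wraps the newline-separated regexp in a helper, assuming neither string contains a newline. The proc name equalExceptOne is hypothetical. Because the don't-care position is left arbitrary, any single differing character is accepted, so Mister_T_had4_beers and Mister_T_had2_beers would also compare equal here; when the position is known, the string match or _had\d+ anchored forms are the stricter choice.

```tcl
# Hypothetical helper: 1 if a and b are the same length and differ in
# at most one character position, 0 otherwise.
# Assumption: neither string contains a newline.
proc equalExceptOne {a b} {
    return [regexp -line {^(.*).(.*)\n\1.\2$} "$a\n$b"]
}

puts [equalExceptOne Mister_T_had4_beers Mister_Q_had4_beers]
puts [equalExceptOne Mister_T_had4_beers Mister_Q_had2_beers]
```

The first call differs in one position and the second in two, so only the first is reported as a match.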

Why are newline characters removed by concat?

I have this code:
set l [concat a b c "\r\n"]
puts "[llength $l]:$l"
I would like to add "\r\n" as the last element of the list, but it seems to be removed:
>tclsh try.tcl
3:a b c
Is there a reason for that?
Both \r (carriage return) and \n (newline) are whitespace characters according to Tcl's rules, so the whitespace character stripping rules of concat remove them from leading and trailing positions. As the documentation says (emphasis mine):
This command joins each of its arguments together with spaces after trimming leading and trailing white-space from each of them.
If you want that extra two-character EOL-sequence on the end of your list where it won't affect the values in the list, just append it afterwards:
set l [concat a b c]
append l "\r\n"
puts "$l:[llength $l]"
On the other hand, if you want that string as a list element, lappend it as that will automatically add all the quoting required. Also bear in mind that concat isn't a true list concatenation operation (it does complicated string operations); the true list concatenation is:
set concatenated [list {*}$listA {*}$listB]
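To make the difference concrete, here is a small sketch of both suggestions: lappend to add the EOL sequence as a real fourth element, and {*} expansion (Tcl 8.5 or later) for true list concatenation. The variable names are only for illustration.

```tcl
# lappend quotes the value as needed, so "\r\n" becomes its own element
set l [concat a b c]
lappend l "\r\n"
puts [llength $l]

# True list concatenation: every element of both lists is preserved
set listA {a b}
set listB [list c "\r\n"]
set concatenated [list {*}$listA {*}$listB]
puts [llength $concatenated]
```

Both lists end up with four elements, the last of which is the two-character string "\r\n".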
In Tcl, list elements are separated by whitespace, which includes newlines, so a bare newline is treated as a separator rather than as an element of its own. Wrap the value in list so that it is passed as a proper one-element list, and it will work.
% set l [concat a b c [list "\r\n"]]
a b c {
}
% puts "[llength $l]:$l"
4:a b c {
}
%
The concat command returns a string which is a prettified list with a single space between elements and no whitespace before the first element or after the last.
What you're looking for is the list command:
% set l [list a b c "\r\n"]
a b c {
}
% puts [llength $l]:$l
4:a b c {
}
Documentation: concat, list

How to match a colon after a close bracket

Why does the following not match the :
expect {
timeout {puts timedout\n}
\[1\]: {puts matched\n}
}
> expect test.tcl
[1]:
timedout
If I change it and remove the colon the match works:
expect {
timeout {puts timedout\n}
\[1\] {puts matched\n}
}
$ expect test.tcl
[1]
matched
Or if I get rid of the 1st bracket
expect {
timeout {puts timedout\n}
1\]: {puts matched\n}
}
then it matches:
$ expect test.tcl
1]:
matched
The problem is not the :, but the [.
The [ is special to both Tcl and the Expect pattern matcher so it is particularly messy. To match a literal [, you have to backslash once from Tcl and then again so that it is not treated as a range during pattern matching. The first backslash, of course, has to be backslashed to prevent it from turning the next backslash into a literal backslash!
expect "\\\[" ; #matches literal '['
So, your code should be,
expect {
timeout {puts timedout\n}
\\\[1]: {puts matched\n}
}
You can prefix the ] with a backslash if it makes you feel good, but it is not
necessary. Since there is no matching left-hand bracket to be matched within the
double-quoted string, nothing special happens with the right-hand bracket. It stands for itself and is passed on to the Expect command, where it is then interpreted as the end of the range.
The next set of examples shows the behavior of [ as a pattern preceded by differing numbers of backslashes. If the [ is not prefixed by a backslash, Tcl interprets whatever follows as a command. For these examples, imagine that there is a procedure named XY that returns the string n*w.
expect "[XY]" ; # matches n followed by anything
expect "\[XY]" ; # matches X or Y
expect "\\[XY]" ; # matches n followed by anything followed by w
expect "\\\[XY]" ; # matches [XY]
expect "\\\\[XY]" ; # matches \ followed by n followed ...
expect "\\\\\[XY]" ; # matches sequence of \ and X or Y
The \\[XY] case deserves close scrutiny. Tcl interprets the first backslash to mean that the second is a literal character. Tcl then produces n*w as the result of the XY command. The pattern matcher ultimately sees the four-character string \n*w. The pattern matcher interprets this in the usual way. The backslash indicates that the n is to be matched literally (which it would even without the backslash since the n is not special to the pattern matcher).
Source : Exploring Expect
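Expect's default (glob) patterns follow the same rules as Tcl's string match, except that Expect does not anchor them. On that basis, the quoting can be sanity-checked in plain tclsh by wrapping each pattern in * on both sides. This is only an approximation of what expect does with live output, so treat it as a quick check rather than a replacement for testing against the real program.

```tcl
set input {[1]:}

# [1] is a character class matching "1", so this looks for "1:" and fails
puts [string match {*[1]:*} $input]

# In braces, a single backslash reaches the matcher and makes [ literal
puts [string match {*\[1]:*} $input]

# In double quotes, \\ becomes \ and \[ becomes [, giving the same pattern
puts [string match "*\\\[1]:*" $input]
```

Only the first pattern fails to match; the other two reach the matcher as the same string, \[1]:, surrounded by the * wildcards.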
The patterns that worked for me:
-exact {[1]:}
-exact "\[1]:"
{\[1]:}
"\\\[1]:"

How to find and replace the second occurrence of a string using regsub

I am new to Tcl and trying to learn, and I need help with the following.
My string, stored in ConfigFileBuf, looks like the one below. I am trying to replace the second occurrence (the block containing <ConfENB:local-udp-port>31001) with XYZ, but the regsub command I tried always replaces the first occurrence (37896). Please help: how do I replace the second occurrence with XYZ?
set ConfigFileBuf "<ConfENB:virtual-phy>
</ConfENB:local-ip-addr>
<ConfENB:local-udp-port>37896</ConfENB:local-udp-port>
</ConfENB:local-ip-addr>
<ConfENB:local-udp-port>31001</ConfENB:local-udp-port>
</ConfENB:virtual-phy>"
regsub -start 1 "</ConfENB:local-ip-addr>\[ \n\t\]+<ConfENB:local-udp-port>\[0-9 \]+</ConfENB:local-udp-port>" $ConfigFileBuf "XYZ" ConfigFileBuf
puts $ConfigFileBuf
You have to use regexp -indices to find where to start the replacement, and only then regsub. It's not too bad if you put the regular expression in its own variable.
set RE "</ConfENB:local-ip-addr>\[ \n\t\]+<ConfENB:local-udp-port>\[0-9 \]+</ConfENB:local-udp-port>"
set start [lindex [regexp -all -indices -inline $RE $ConfigFileBuf] 1 0]
regsub -start $start $RE $ConfigFileBuf "XYZ" ConfigFileBuf
The 1 is the number of submatches in the RE (zero in this case) plus 1. You can compute it with the help of regexp -about, giving this piece of trickiness:
set RE "</ConfENB:local-ip-addr>\[ \n\t\]+<ConfENB:local-udp-port>\[0-9 \]+</ConfENB:local-udp-port>"
set relen [expr {1 + [lindex [regexp -about $RE] 0]}]
set start [lindex [regexp -all -indices -inline $RE $ConfigFileBuf] $relen 0]
regsub -start $start $RE $ConfigFileBuf "XYZ" ConfigFileBuf
If your string was well-formed XML I'd suggest something like tDOM to manipulate it. DOM-style manipulation is almost always better than regular expression-based manipulation on XML markup. (I mention this on the off chance that it's actually supposed to be XML and you just quoted it wrong.)
It looks like you're trying to use -start 1 to tell regsub to skip the first match. The starting index is actually a character index, so in this invocation regsub will just skip the first character in the string. You could set -start further into your string, but that's fragile unless you use regexp to calculate where the first match ends.
I think the best solution would be to get a list of indices to matches by invoking regexp with -all -inline -indices, pick out the second index pair using lindex and finally use string replace to perform the substitution, like this:
set pattern {</ConfENB:local-ip-addr>[ \n\t]+<ConfENB:local-udp-port>[0-9 ]+</ConfENB:local-udp-port>}
set matches [regexp -all -inline -indices -- $pattern $ConfigFileBuf]
set match [lindex $matches 1]
set ConfigFileBuf [string replace $ConfigFileBuf {*}$match XYZ]
The variable match contains a pair of indices (start and end, respectively) for the range of characters you want to replace. As string replace expects those indices to be in different arguments you need to expand $match with the {*} prefix. If you have an earlier version of Tcl than 8.5, you need a slight change to the above code:
foreach {start end} $match break
set ConfigFileBuf [string replace $ConfigFileBuf $start $end XYZ]
In passing, note that you can avoid escaping e.g. character sets in a regular expression if you quote it with braces instead of double quotes.
Documentation links: regexp, lindex, string
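The two answers can be combined into one small helper. This is a sketch under stated assumptions, not a drop-in utility: the proc name replaceNth and its argument order are made up here, matches are counted from 1, and the replacement is inserted literally (string replace does not interpret & or \1 the way regsub does).

```tcl
# Hypothetical helper: replace the n-th match (1-based) of pattern in str
proc replaceNth {pattern str n replacement} {
    # Each match contributes one index pair plus one per capturing group,
    # so step through the result list in strides of (groups + 1)
    set stride [expr {1 + [lindex [regexp -about -- $pattern] 0]}]
    set matches [regexp -all -inline -indices -- $pattern $str]
    set match [lindex $matches [expr {($n - 1) * $stride}]]
    if {$match eq ""} {
        # Fewer than n matches: leave the string untouched
        return $str
    }
    return [string replace $str {*}$match $replacement]
}

puts [replaceNth {a\d} "a1 a2 a3" 2 XYZ]
```

For the question's data the call would be along the lines of set ConfigFileBuf [replaceNth $pattern $ConfigFileBuf 2 XYZ].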

TCL : find and remove all characters in a string from the first occurrence of a character in a string

I am a newbie in TCL and in need of a TCL method / utility / code which can find and remove all characters (including itself) in a string from its first occurrence in a string.
I have a string like below:
Func::set()->method();
In the above string I need to find the first occurrence of '(' and remove it and everything after it, so that the resultant string would be just:
Func::set
You can do it with a regular expression:
set the_string [regsub {\(.*} $the_string ""]
or if you're not familiar with regexp then you can do it the more traditional way:
set the_string [
string range $the_string 0 [
expr {[string first "(" $the_string]-1}
]
]
For further info, read the manual pages for [string], [regsub] and [re_syntax].
Another technique is to use split and lindex:
set the_string [lindex [split $the_string "("] 0]
This is short and simple, but may do a lot of extra work if your string is very long.
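One caveat worth checking: if the string contains no (, string first returns -1 and the string range version above yields an empty string, whereas the regsub and split versions return the input unchanged. A small sketch of a helper that handles that case explicitly (the name truncateAt is hypothetical):

```tcl
# Hypothetical helper: drop everything from the first occurrence of
# delim onwards; return the string unchanged if delim is absent
proc truncateAt {str delim} {
    set idx [string first $delim $str]
    if {$idx < 0} {
        return $str
    }
    return [string range $str 0 [expr {$idx - 1}]]
}

puts [truncateAt "Func::set()->method();" "("]
puts [truncateAt "no_parens_here" "("]
```

Which behaviour is right for a missing delimiter depends on the caller, so the early return is a design choice rather than the only sensible answer.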