How to make regsub match both uppercase and lowercase letters in Tcl?

I want regsub to ignore letter case, so that I don't have to run it twice for each word.
Current code:
regsub -all "feat" $songtitle "" songtitle
regsub -all "Feat" $songtitle "" songtitle
I want a single regsub line instead:
regsub -all "feat" $songtitle "" songtitle
This is inconvenient when there are many words I want to remove. I would like one case-insensitive regsub line per word, not two lines for the uppercase and lowercase variants. Thanks.

You can specify the -nocase option to regsub to get it to ignore case when matching.
regsub -all -nocase {feat} $songtitle "" songtitle
You can also enable that mode of operation by putting the (?i) marker at the start of the RE:
regsub -all {(?i)feat} $songtitle "" songtitle
You probably should put some \y (a word boundary constraint) in that RE too, so that it doesn't change defeated into deed:
regsub -all {(?i)\yfeat\y} $songtitle "" songtitle
(Once you add either backslashes or square brackets to an RE, it's pretty much essential in Tcl that you put the RE in curly braces. Otherwise you end up using a disgustingly large number of backslashes…)
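For the "many words" case in the question, one possible sketch is to loop over a word list and build the pattern for each word. The word list and sample title below are made up for illustration, and each word is assumed to contain no RE metacharacters:

```tcl
# Hypothetical sample data, not taken from the question
set songtitle "Artist Feat Someone - Defeated (FEAT remix)"
set words {feat remix}

foreach word $words {
    # Double quotes let $word substitute, so the backslashes in \y are doubled.
    # Assumes each word is plain letters/digits (no RE metacharacters).
    regsub -all -nocase "\\y$word\\y" $songtitle "" songtitle
}
puts $songtitle
```

Because of the \y constraints, "Defeated" is left alone while "Feat" and "FEAT" are removed.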

Also be aware of the string map command:
string map {feat "" Feat ""} $songtitle
Useful when you don't actually need regular expressions.
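string map also accepts a -nocase option, so one mapping entry can cover both spellings. A small sketch with a made-up title; note that, unlike the \y version, it has no notion of word boundaries:

```tcl
# Hypothetical sample title
set songtitle "Song FEAT Artist"
set cleaned [string map -nocase {feat ""} $songtitle]
puts $cleaned

# Substrings inside longer words are removed too
set inside [string map -nocase {feat ""} "defeated"]
puts $inside
```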

Related

How to remove a single letter/number

I have single letters and numbers in a variable that I would like to remove
example inputs:
USA-2019-1-aoiwer
USA-A-jowerasf
BB-a_owierlasdf-2019
flsfwer_5_2015-asfdlwer
desired outputs:
USA-2019--aoiwer
USA--jowerasf
BB-_owierlasdf-2019
flsfwer__2015-asfdlwer
my code:
bind pub "-|-" !aa proc:aa
proc proc:aa { nick host handle channel arg } {
set line [lindex $arg 0]
set line [string map {[a-z] """} $line]
set line [string map {[0-9] """} $line]
putnow "PRIVMSG $channel :$line"
}
Unfortunately that does not work, and I have no other ideas.
Regards
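string map does literal substring replacement, not pattern matching, so [a-z] and [0-9] are treated as literal text rather than as character classes; and even if they did work as patterns, they would remove every lowercase letter and digit, not just the isolated ones. You also have unbalanced quotes, which cause an error when the proc runs.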
I would recommend using regsub. The hard part, however, would be to get a proper expression to do the task. I will suggest the following:
bind pub "-|-" !aa proc:aa
proc proc:aa { nick host handle channel arg } {
set line [lindex $arg 0]
regsub -nocase -all {([^a-z0-9]|\y)[a-z0-9]([^a-z0-9]|\y)} $line {\1\2} line
putnow "PRIVMSG $channel :$line"
}
Basically ([^a-z0-9]|\y) matches a character that is non alphanumeric, or a word boundary (which will match at the beginning of a sentence for example if it can, or at the end of a sentence), and stores it (this is the purpose of the parens).
The matched groups are stored in order starting with 1, so in the replace portion of regsub, I'm placing the parts that shouldn't be replaced back where they were.
The above should work fine.
You could technically go a little fancier with a slightly different expression:
regsub -nocase -all {([^a-z0-9]|\y)[a-z0-9](?![a-z0-9])} $line {\1} line
Which uses a negative lookahead ((?! ... )).
Anyway, if you do want to get more in depth, I recommend reading the manual on regular expression syntax
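To check the lookahead-based expression against the inputs from the question, it can be wrapped in a small proc. This is only a sketch: the name removeSingles is made up here, and the IRC-specific bind/putnow parts are left out so it runs in a plain tclsh:

```tcl
# Hypothetical helper wrapping the lookahead-based expression
proc removeSingles {line} {
    regsub -nocase -all {([^a-z0-9]|\y)[a-z0-9](?![a-z0-9])} $line {\1} line
    return $line
}

# The example inputs from the question
foreach input {
    USA-2019-1-aoiwer
    USA-A-jowerasf
    BB-a_owierlasdf-2019
    flsfwer_5_2015-asfdlwer
} {
    puts "$input -> [removeSingles $input]"
}
```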

regsub and join: missing close-brace error for set text [join $text \n]

I am getting missing closing brace error for the line
set text [join $text \n]
my entire code is
proc ProcessText { text} {
regsub -all -- ({) $text {\{} text
set text [join $text \n]
return $text
}
##it starts from here
set text "{a b c"
puts $text
puts [ProcessText $text]
If I use regsub to replace the { with something that should not cause an error, I get this error:
"Missing close-brace while executing"proc ProcessText {}"
If I comment out the regsub, I get this error:
"unmatched open brace in list while executing
"join $text \n"
Can anyone suggest how to proceed with this in Tcl?
FYI:
text is a list that contains a lot of textual information, including a {. If I remove the {, it works; otherwise it does not.
As Donal has sensed already, the value held by the variable text is not formatted as a valid Tcl list, which is what [join] expects.
Your options are:
1) Turn the value into a Tcl list by using [split]:
join [split $text] \n
2) Avoid the conversion into a list and [join] altogether by using [string map]:
string map {" " "\n"} $text
(or use [regsub] as below, if you can't control white-space proliferation in your input)
Sometimes, a string better stays just a string ;)
Varia
Your use of [regsub] is problematic. It is better to use it once to reach your ultimate goal than to sanitize the input string before calling [join]:
regsub -all {\s+} $text "\n"
Background
You run into errors because you do not escape the sentinel { in the regular expression ({) to [regsub] correctly:
regsub -all -- ({) $text {\{} text
This should be:
regsub -all -- {\{} $text {\{} text
In your variant, { is considered an opening brace that is, actually, not matched in the remainder of the script.
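The difference between the list-based and string-based options can be tried out in a plain tclsh. This is a minimal sketch using the sample value from the question; the brace is written as \{ inside the double quotes only so that the snippet stays brace-balanced if it is pasted into a proc body:

```tcl
# Same value as in the question: an open brace followed by "a b c"
set text "\{a b c"

# The unmatched brace means the string is not a valid Tcl list
set isList [string is list $text]
puts "valid list: $isList"

# join has to parse its first argument as a list, so it raises an error
if {[catch {join $text \n} msg]} {
    puts "join failed: $msg"
}

# split treats the value as a plain string, so the brace is just a character
set viaSplit [join [split $text] \n]
puts $viaSplit

# string map never builds a list at all
set viaMap [string map {" " "\n"} $text]
puts $viaMap
```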

How to find and replace the second occurrence of a string using regsub

I am new to Tcl and trying to learn; I need help with the following.
My string is stored in ConfigFileBuf, and I am trying to replace the second occurrence (the ConfENB:local-udp-port element containing 31001) with XYZ, but the regsub command I tried below always replaces the first occurrence (37896). Please help me replace the second occurrence with XYZ.
set ConfigFileBuf "<ConfENB:virtual-phy>
</ConfENB:local-ip-addr>
<ConfENB:local-udp-port>37896</ConfENB:local-udp-port>
</ConfENB:local-ip-addr>
<ConfENB:local-udp-port>31001</ConfENB:local-udp-port>
</ConfENB:virtual-phy>"
regsub -start 1 "</ConfENB:local-ip-addr>\[ \n\t\]+<ConfENB:local-udp-port>\[0-9 \]+</ConfENB:local-udp-port>" $ConfigFileBuf "XYZ" ConfigFileBuf
puts $ConfigFileBuf
You have to use regexp -indices to find where to start the replacement, and only then regsub. It's not too bad if you put the regular expression in its own variable.
set RE "</ConfENB:local-ip-addr>\[ \n\t\]+<ConfENB:local-udp-port>\[0-9 \]+</ConfENB:local-udp-port>"
set start [lindex [regexp -all -indices -inline $RE $ConfigFileBuf] 1 0]
regsub -start $start $RE $ConfigFileBuf "XYZ" ConfigFileBuf
The 1 is the number of submatches in the RE (zero in this case) plus 1. You can compute it with the help of regexp -about, giving this piece of trickiness:
set RE "</ConfENB:local-ip-addr>\[ \n\t\]+<ConfENB:local-udp-port>\[0-9 \]+</ConfENB:local-udp-port>"
set relen [expr {1 + [lindex [regexp -about $RE] 0]}]
set start [lindex [regexp -all -indices -inline $RE $ConfigFileBuf] $relen 0]
regsub -start $start $RE $ConfigFileBuf "XYZ" ConfigFileBuf
If your string was well-formed XML I'd suggest something like tDOM to manipulate it. DOM-style manipulation is almost always better than regular expression-based manipulation on XML markup. (I mention this on the off chance that it's actually supposed to be XML and you just quoted it wrong.)
It looks like you're trying to use -start 1 to tell regsub to skip the first match. The starting index is actually a character index, so in this invocation regsub will just skip the first character in the string. You could set -start further into your string, but that's fragile unless you use regexp to calculate where the first match ends.
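A tiny illustration of that point, using a made-up three-character string rather than the configuration text:

```tcl
# -start 1 skips one character, not one match
set skipChar [regsub -start 1 a "aaa" X]
puts $skipChar

# Without -start, the first match is the one replaced
set first [regsub a "aaa" X]
puts $first
```

The text before the starting index is copied through unchanged, which is why only the second "a" is replaced in the first call.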
I think the best solution would be to get a list of indices to matches by invoking regexp with -all -inline -indices, pick out the second index pair using lindex and finally use string replace to perform the substitution, like this:
set pattern {</ConfENB:local-ip-addr>[ \n\t]+<ConfENB:local-udp-port>[0-9 ]+</ConfENB:local-udp-port>}
set matches [regexp -all -inline -indices -- $pattern $ConfigFileBuf]
set match [lindex $matches 1]
set ConfigFileBuf [string replace $ConfigFileBuf {*}$match XYZ]
The variable match contains a pair of indices (start and end, respectively) for the range of characters you want to replace. As string replace expects those indices to be in different arguments you need to expand $match with the {*} prefix. If you have an earlier version of Tcl than 8.5, you need a slight change to the above code:
foreach {start end} $match break
set ConfigFileBuf [string replace $ConfigFileBuf $start $end XYZ]
In passing, note that you can avoid escaping e.g. character sets in a regular expression if you quote it with braces instead of double quotes.
Documentation links: regexp, lindex, string
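The same idea can be generalized to the n-th match. The following is only a sketch: the proc name replaceNthMatch is made up, it assumes Tcl 8.5 or later (for lassign), and it assumes the pattern has no capturing groups, since those would add extra index pairs to the result list:

```tcl
# Hypothetical helper: replace the n-th (0-based) match of a pattern that has
# no capturing groups; returns the input unchanged if there is no such match.
proc replaceNthMatch {pattern input n replacement} {
    set matches [regexp -all -inline -indices -- $pattern $input]
    if {$n < 0 || $n >= [llength $matches]} {
        return $input
    }
    # Each element is a {start end} pair of character indices
    lassign [lindex $matches $n] start end
    return [string replace $input $start $end $replacement]
}

# Made-up sample text: replace the second run of digits
puts [replaceNthMatch {[0-9]+} "port 37896, port 31001" 1 XYZ]
```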

How to convert a string where the first letter is in uppercase and rest are lower case.

For example, I have a sentence: whAT is yOur hoUSe nUmBer ? Is iT 26. I have to convert the first letter of each word to uppercase and the rest to lowercase. I am supposed to use lsearch, lindex, lreplace and similar commands to write the code. Can someone tell me how to do this?
The string totitle command is close: it lowercases the whole string except for the first char which is uppercase.
set s {whAT is yOur hoUSe nUmBer ? Is iT 26.}
string totitle $s
What is your house number ? is it 26.
To capitalize each word is a little more involved:
proc CapitalizeEachWord {sentence} {
subst -nobackslashes -novariables [regsub -all {\S+} $sentence {[string totitle &]}]
}
set s {whAT is yOur hoUSe nUmBer ? Is iT 26.}
CapitalizeEachWord $s
What Is Your House Number ? Is It 26.
The regsub command takes each space-separated word and replaces it with the literal string "[string totitle word]":
"[string totitle whAT] [string totitle is] [string totitle yOur] [string totitle hoUSe] [string totitle nUmBer] [string totitle ?] [string totitle Is] [string totitle iT] [string totitle 26.]"
Then we use the subst command to evaluate all the individual "string totitle" commands.
When Tcl 8.7 comes out, we'll be able to do:
proc CapitalizeEachWord {sentence} {
regsub -all -command {\S+} $sentence {string totitle}
}
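Since the question mentions list commands, a list-based variant may also be of interest. This is a sketch under the assumption that words are separated by single space characters; the proc name CapitalizeWords is made up here:

```tcl
# Hypothetical list-based alternative: split into words, title-case each, rejoin
proc CapitalizeWords {sentence} {
    set result {}
    # Splitting on a single space keeps runs of spaces as empty elements,
    # so the original spacing survives the join below
    foreach word [split $sentence " "] {
        lappend result [string totitle $word]
    }
    return [join $result " "]
}

set s {whAT is yOur hoUSe nUmBer ? Is iT 26.}
puts [CapitalizeWords $s]
```

Unlike the subst-based version, this never evaluates any part of the input as a script, at the cost of only recognizing spaces as word separators.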
The usual model (in 8.6 and before) for applying a command to a bunch of regular-expression-chosen substrings of a string is this:
subst [regsub -all $REtoFindTheSubstrings [MakeSafe $input] {[TheCommandToApply &]}]
The MakeSafe is needed because subst doesn't just do the bits that you want. Even with some substitution classes disabled (e.g., with the -novariables option), you still need the trickiest one of all, command substitution, and that means that strings like hello[pwd]goodbye can catch you out. To deal with this you make the string “safe” by replacing every Tcl metacharacter (or at least the ones that matter in a subst) by its backslashed version. Here's a classic version of MakeSafe (that you'll often see inlined):
proc MakeSafe {inputString} {
regsub -all {[][$\\{}"" ]} $inputString {\\&}
}
Demonstrating it interactively:
% MakeSafe {hello[pwd]goodbye}
hello\[pwd\]goodbye
With that version, no substitution classes need to be turned off in subst though you could turn off variables, and there's no surprises possible when you're applying the command as things that could crop up in the substituted argument string have been escaped. But there's a big disadvantage: you potentially need to change the regular expression in your transformation to take account of the extra backslashes now present. It's not required for the question's RE (as that just selects sequences of word characters) and indeed that could safely be this reduced version:
subst [regsub -all {\w+} [regsub -all {[][\\$]} $input {\\&}] {[string totitle &]}]
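To see what the escaping step buys, here is a sketch that runs the reduced version on a made-up input containing square brackets. Without the inner regsub, subst would try to run the bracketed text as a command:

```tcl
# Hypothetical input that contains text resembling a command substitution
set input {hello [expr {6*7}] wOrld}

# Step 1: backslash-escape brackets, backslashes and dollar signs
set safe [regsub -all {[][\\$]} $input {\\&}]

# Step 2: wrap each word in a [string totitle ...] call and let subst run them
set result [subst [regsub -all {\w+} $safe {[string totitle &]}]]
puts $result

# For comparison: skipping step 1 makes subst execute the bracketed text
set unsafeFailed [catch {subst [regsub -all {\w+} $input {[string totitle &]}]} msg]
puts "unescaped version failed: $unsafeFailed ($msg)"
```

With the escaping in place, the brackets come through as literal characters and only the words themselves are title-cased.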
In 8.7 onwards, there's a -command option to regsub that avoids all this mess. It's also quite a bit faster, as subst works by compiling its transformations into bytecode (that's not a good win for a one-off substitution!) and regsub -command uses direct command invoking instead, much more likely to be fast.
regsub -all -command {\w+} $input {string totitle}
The internal approach used by regsub -all -command can be emulated in 8.6 (or earlier with more extra shims), but it is non-trivial:
proc regsubAllCommand {RE input command} {
# I'll assume there's no sub-expressions in the RE in order to keep this code shorter
set indices [regexp -all -inline -indices -- $RE $input]
# Gather the replacements first to make state behave right
set replacements [lmap indexPair $indices {
# Safe version of: uplevel $command [string range $input {*}$indexPair]
uplevel 1 [list {*}$command [string range $input {*}$indexPair]]
}]
# Apply the replacements in reverse order
set output $input
foreach indexPair [lreverse $indices] replacement [lreverse $replacements] {
set output [string replace $output {*}$indexPair $replacement]
}
return $output
}
The C implementation of regsub uses working buffers and so on internally, but that's not quite so convenient at the Tcl level.
Note that Tcl has no built-in Initcap function; the closest built-in is string totitle, which makes the first letter of a string uppercase and the rest lowercase.

TCL command - string trim

I was using the command 'string trimright' to trim my string but I found that this command trims more than required.
My string is "dcssss.dcsss". If I use string trimright to trim the trailing ".dcsss", it trims the entire string. How can I deal with this?
Command:
set a [string trimright "dcssss.dcsss" ".dcsss"]
puts $a
Intended output:
dcssss
Actual output
""
The string trimright command treats its (optional) last argument as a set of characters to remove (and so .dcsss is the same as sdc. to it), just like string trim and string trimleft do; indeed, string trim is just like using both string trimright and string trimleft in succession. This makes it unsuitable for what you are trying to do; to remove a suffix if it is present, you can use several techniques:
# It looks like we're stripping a filename extension...
puts [file rootname "dcssss.dcsss"]
# Can use a regular expression if we're careful...
puts [regsub {\.dcsss$} "dcssss.dcsss" {}]
# Do everything by hand...
set str "dcssss.dcsss"
if {[string match "*.dcsss" $str]} {
set str [string range $str 0 end-6]
}
puts $str
If what you're doing really is filename manipulation, as it appears to be, do use the first of these options. The file command has several useful subcommands for working with filenames in a cross-platform manner.
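The character-set behaviour and the by-hand approach can be put side by side. In this sketch the proc name stripSuffix is made up; it simply generalizes the string range technique shown above to any suffix:

```tcl
# trimright removes any trailing characters that appear in the given set
set trimmed [string trimright "hello.txt" ".tx"]
puts $trimmed

# Hypothetical helper: remove an exact suffix if (and only if) it is present
proc stripSuffix {str suffix} {
    set n [string length $suffix]
    if {$n > 0 && [string range $str end-[expr {$n - 1}] end] eq $suffix} {
        return [string range $str 0 end-$n]
    }
    return $str
}

puts [stripSuffix "dcssss.dcsss" ".dcsss"]
puts [stripSuffix "dcssss.other" ".dcsss"]
```

In the first call, trimright stops at the "o" because that is the first character from the right that is not in the set.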