How to trim two words from right of a string - tcl

I want to remove two words from right of a string.
For example:
set str "sachin is the pride of india"
I need to remove india and of from right and there should be no space after that.
I have tried using string trimright.

The string trimright command is exactly the wrong tool for this; it treats its trim argument as a set of characters to remove, not a literal. The simplest way of doing this is with lreplace, provided the string doesn't contain list metacharacters and you don't care about the number of spaces.
set shortened [lreplace $str end-1 end]
If you need to do it reliably, regular expressions are the tool of choice.
set shortened [regsub {\s*\S+\s+\S+\s*$} $str ""]
Use regsub for this. Please.
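To make the difference concrete, here is a small sketch comparing the three commands. The "the land of india" and "a  b c d" inputs are made up for illustration; they show string trimright eating into an earlier word, and lreplace normalising the spacing where regsub leaves it untouched.

```tcl
# string trimright strips ANY trailing characters found in the given set,
# so it can eat into earlier words.
set demo "the land of india"
puts [string trimright $demo " of india"]      ;# prints: the l

set str "sachin is the pride of india"

# List-based: fine for plain words, but the result is re-rendered as a list.
puts [lreplace $str end-1 end]                 ;# prints: sachin is the pride

# RE-based: removes the last two words and the whitespace around them.
puts [regsub {\s*\S+\s+\S+\s*$} $str ""]       ;# prints: sachin is the pride

# With irregular spacing the two approaches differ in what they keep.
set spaced "a  b c d"
puts [lreplace $spaced end-1 end]              ;# prints: a b
puts [regsub {\s*\S+\s+\S+\s*$} $spaced ""]    ;# prints: a  b (two spaces kept)
```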

Related

How to search for 0,a1[4],* where * is a wildcard in a list of 0,a2,4 0,a1[4],3 0,a1[4],5 .... in tcl

I tried lsearch -all $list_ 0,a1[4],*
a1[4] is stored in a variable
So basically I need
set var "a1[4]"
lsearch -all $list_ 0,$var,*
By default lsearch uses glob patterns (as described by the documentation for string match — it's the exact same matching engine being used). That's good because it means that * is a wildcard, but awkward because it means that [ is also special (it starts a character set match). You need some simple escaping, and to keep that sane you should put your whole pattern in {braces} so we don't need to fight with Tcl over what the meanings of bracket and backslash are:
lsearch -all $list_ {0,a1\[4\],*}
You don't need braces; you could write this instead:
lsearch -all $list_ 0,a1\\\[4\\\],*
But that's ugly! And difficult to maintain (trust me on that). So use braces, OK?
In the case where you're pulling the subpattern from a variable, things get more complicated. The fix is to use string map (or regsub) to condition the pattern piece.
# Split into three lines for clarity; qvar = “quoted var”
set ADD_BACKSLASHES {[ {\[} ] {\]}}
set qvar [string map $ADD_BACKSLASHES $var]
lsearch -all $list_ 0,$qvar,*
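If the variable might also contain other glob metacharacters, the same idea can be wrapped in a small helper. This is only a sketch: globEscape is a hypothetical name (not a built-in), and it assumes the characters that need quoting for string match style patterns are backslash, *, ?, [ and ].

```tcl
# Hypothetical helper: backslash-quote every glob metacharacter in $text
# so that it matches literally in [string match] / [lsearch -glob].
proc globEscape {text} {
    return [string map [list \\ \\\\ * \\* ? \\? \[ \\\[ \] \\\]] $text]
}

set list_ {0,a2,4 0,a1[4],3 0,a1[4],5}
set var {a1[4]}   ;# braces stop Tcl from treating [4] as a command

# Only the trailing * remains a wildcard; the variable part is literal.
puts [lsearch -all $list_ "0,[globEscape $var],*"]   ;# prints: 1 2
```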

Can I convert a string with space using totitle?

The Tcl documentation is clear on how to use string totitle:
Returns a value equal to string except that the first character in
string is converted to its Unicode title case variant (or upper case
if there is no title case variant) and the rest of the string is
converted to lower case.
Is there a workaround or method that will convert a string with spaces (the first letter of each word would be upper case)?
For example in Python:
intro : str = "hello world".title()
print(intro) # Will print Hello World, notice the capital H and W.
In Tcl 8.7, the absolutely most canonical way of doing this is to use regsub with the -command option to apply string totitle to the substrings you want to alter:
set str "hello world"
# Very simple RE: (greedy) sequence of word characters
set tcstr [regsub -all -command {\w+} $str {string totitle}]
puts $tcstr
In earlier versions of Tcl, you don't have that option so you need a two stage transformation:
set tcstr [subst [regsub -all {\w+} $str {[string totitle &]}]]
The problem with this is that it will blow up if the input string has certain Tcl metacharacters in it; it is possible to fix this, but it's horrible to do. I added the -command option to regsub precisely because I was fed up with having to do a multi-stage substitution just to make a string I could feed through subst. Here's the safe version (the input stage could also be done with string map):
set tcstr [subst [regsub -all {\w+} [regsub -all {[][$\\]} $str {\\&}] {[string totitle &]}]]
It gets really complicated (well, at least quite non-obvious) when the replacement has to be computed from the matched substrings themselves. That is why regsub -command exists: it sidesteps the whole mess by passing each matched substring to the command as a separate, properly delimited argument, so nothing is re-parsed as script text (the Tcl C API is actually good at that).
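For versions without regsub -command, another way to avoid subst entirely is to locate the words by index and replace them in place. This is a sketch rather than the canonical approach: titleCaseWords is a hypothetical name, and it assumes Tcl 8.5 or later for lassign and lreverse.

```tcl
# Hypothetical helper: title-case every run of word characters without
# ever evaluating the input as a script.
proc titleCaseWords {text} {
    # Each element is a {first last} index pair for one match of \w+.
    set ranges [regexp -all -inline -indices {\w+} $text]
    # Work right to left so earlier indices stay valid after each replace.
    foreach range [lreverse $ranges] {
        lassign $range first last
        set word [string range $text $first $last]
        set text [string replace $text $first $last [string totitle $word]]
    }
    return $text
}

puts [titleCaseWords "hello world"]          ;# prints: Hello World
puts [titleCaseWords {hello [world] $x}]     ;# prints: Hello [World] $X
```

Because the input is only ever treated as data, brackets, dollar signs and backslashes need no escaping step.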
Donal gave you an answer, but there is also a package that does what you want: textutil::string from Tcllib.
package require textutil::string
puts [::textutil::string::capEachWord "hello world"]
> Hello World

TCL regsub multiple special characters in one shot

Is there a way to add escape '\' into a string with multiple special characters?
Example input : a/b[1]/c/d{3}
Desired outcome : a\/b\[1\]\/c\/d\{3\}
I've done it in multiple regsubs one special character at a time. But is there a way to do it in one shot?
I would simply escape all non-word characters:
set input {a/b[1]/c/d{3}}
set output [regsub -all {\W} $input {\\&}]
puts $output
a\/b\[1\]\/c\/d\{3\}
ref: https://tcl.tk/man/tcl8.6/TclCmd/regsub.htm and https://tcl.tk/man/tcl8.6/TclCmd/re_syntax.htm
The general approach to use is to build a RE character set ([…]) and use that. You have to be a bit careful with those in some cases (some characters are special in them, especially ^, ], - and \), but it's not too difficult.
regsub -all {[][/{}]} $input {\\&}
However, if you can use character classes (such as \W or [^\w]) then it's a lot simpler and easier to read. Most common cases of needing to apply backslashes work with those.
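A quick side-by-side of the two approaches, as a sketch. The second input, a-b/c, is made up to show where they differ: \W escapes every non-word character, while the explicit set only touches the characters listed in it.

```tcl
set input {a/b[1]/c/d{3}}

# Explicit character set: ] first, then [, /, { and }.
puts [regsub -all {[][/{}]} $input {\\&}]    ;# prints: a\/b\[1\]\/c\/d\{3\}

# Character class: anything that is not a letter, digit or underscore.
puts [regsub -all {\W} $input {\\&}]         ;# prints: a\/b\[1\]\/c\/d\{3\}

# The two differ when the input has other punctuation.
set other {a-b/c}
puts [regsub -all {[][/{}]} $other {\\&}]    ;# prints: a-b\/c
puts [regsub -all {\W} $other {\\&}]         ;# prints: a\-b\/c
```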

Split camelcase value with TCL

I have this TCL expression:
[string toupper [join [lrange [file split [value [topnode].file]] 1 1]]]
This retrieves companyName value from c:/companyName... and I need to split that value before the first capital letter into Company Name. Any ideas?
Thanks in advance.
That's rather more in one word than I would consider a good idea. It makes the whole thing quite opaque! Let's split it up.
Firstly, I would expect the base company name to be better retrieved with lindex from the split filename.
set companyName [lindex [file split [value [topnode].file]] 1]
Now, we need to process that to get the human-readable version out of it. Alas, that's going to be a bit difficult without knowing what's been done to it, but if we use as our example fooBarBoo_grill then we can see what we can do. First, we get the pieces with some regular expressions (this part might need tweaking if there are non-ASCII characters involved, or if certain critical characters need special treatment):
# set companyName "fooBarBoo_grill"
set pieces [regexp -all -inline {[a-z]+|[A-Z][a-z]*} $companyName]
# pieces = foo Bar Boo grill
Next, we need to capitalise. I'll assume you're using Tcl 8.6 and so have lmap as it is perfect for this task. The string totitle command has been around for a very long time.
set pieces [lmap word $pieces {string totitle $word}]
# pieces = Foo Bar Boo Grill
That list might need a bit more tweaking, or it might be OK as it is. An example of tweaking that might be necessary is if you've got an Irish name like O'Hanrahan, or if you need to insert a comma before and period after Inc.
Finally, we properly ought to set companyName [join $pieces] to get back a true string, but that doesn't have a noticeable effect with a list of words made purely out of letters. Also, more complex joins with regular expressions might be needed if you've done insertion of prefixing punctuation (the , Inc. case).
If I was doing this for real, I'd try to have the proper company name expressed directly elsewhere rather than relying on the filename. Much simpler to get right!
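Put together, the steps above might look like the following sketch. humanizeName is a hypothetical helper name, the regular expression handles ASCII letters only, and lmap requires Tcl 8.6.

```tcl
# Hypothetical helper: turn a camelCase (or snake_case) identifier into
# space-separated, title-cased words.
proc humanizeName {name} {
    # Split into runs of lowercase letters, or a capital plus its lowercase tail.
    set pieces [regexp -all -inline {[a-z]+|[A-Z][a-z]*} $name]
    # Capitalise each piece.
    set pieces [lmap word $pieces {string totitle $word}]
    # Join back into a plain string.
    return [join $pieces " "]
}

puts [humanizeName companyName]        ;# prints: Company Name
puts [humanizeName fooBarBoo_grill]    ;# prints: Foo Bar Boo Grill
```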
To begin with, try using
lindex [file split [value [topnode].file]] 1
The lrange command will return a list, which might cause problems with some directory names. The join command should be pointless if you don't use lrange, and string toupper removes the information you need to do the operation you want to do.
To split before uppercase letters, you can use repetitive matches of either (?:[a-z]+|[A-Z][a-z]+) (ASCII / English alphabet letters only) or (?:[[:lower:]]+|[[:upper:]][[:lower:]]+) (any Unicode letters).
% regexp -all -inline {(?:[a-z]+|[A-Z][a-z]+)} camelCaseWord
camel Case Word
Use string totitle to change the first letter of the first word to upper case.
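One detail worth checking against your data: with [a-z]+ after the capital, a lone uppercase letter is skipped, whereas [a-z]* keeps it as its own piece. A short sketch, using the made-up input getXValue:

```tcl
set name getXValue

# Capital must be followed by at least one lowercase letter: the X is dropped.
puts [regexp -all -inline {[a-z]+|[A-Z][a-z]+} $name]    ;# prints: get Value

# Capital followed by zero or more lowercase letters: the X is kept.
puts [regexp -all -inline {[a-z]+|[A-Z][a-z]*} $name]    ;# prints: get X Value

# Same idea with the Unicode-aware bracket classes.
puts [regexp -all -inline {[[:lower:]]+|[[:upper:]][[:lower:]]*} $name]
```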
Documentation: file, lindex, regexp, string, Syntax of Tcl regular expressions

Is there any Tcl function to add escape character automatically?

Is there any Tcl function to add escape character to a string automatically?
For example, I have a regular expression
"[xy]"
After I call the function, I get
"\[xy]"
After being called again, I get
"\\\[xy]"
I remember there's such a function in some scripting language, but I cannot recall which language it is.
The usual way of adding such escape characters as are “necessary” is to use list (% is my Tcl prompt):
% set s {[xy]}
[xy]
% set s [list $s]
{[xy]}
% set s [list $s]
{{[xy]}}
The list command prefers to leave the value alone if it can, wraps it in braces if it can get away with that, and resorts to backslashing otherwise (because backslashes are really unreadable).
If you really need backslashes, string map or regsub will do what you need. For example:
set s [regsub -all {\W} $s {\\&}]
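As a sketch of what repeated application looks like with the regsub approach: each pass puts a backslash in front of every non-word character, including the backslashes added by the previous pass. Note that this escapes the closing bracket as well, which differs slightly from the output shown in the question.

```tcl
set s {[xy]}

# First pass: escape the two brackets.
set once [regsub -all {\W} $s {\\&}]
puts $once     ;# prints: \[xy\]

# Second pass: the backslashes from the first pass get escaped too.
set twice [regsub -all {\W} $once {\\&}]
puts $twice    ;# prints: \\\[xy\\\]
```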