TCL regsub multiple special characters in one shot

Is there a way to add an escape character '\' before each of several special characters in a string?
Example input : a/b[1]/c/d{3}
Desired outcome : a\/b\[1\]\/c\/d\{3\}
I've done it with multiple regsub calls, one special character at a time. But is there a way to do it in one shot?

I would simply escape all non-word characters:
set input {a/b[1]/c/d{3}}
set output [regsub -all {\W} $input {\\&}]
puts $output
a\/b\[1\]\/c\/d\{3\}
ref: https://tcl.tk/man/tcl8.6/TclCmd/regsub.htm and https://tcl.tk/man/tcl8.6/TclCmd/re_syntax.htm

The general approach to use is to build a RE character set ([…]) and use that. You have to be a bit careful with those in some cases (some characters are special in them, especially ^, ], - and \), but it's not too difficult.
regsub -all {[][/{}]} $input {\\&}
However, if you can use character classes (such as \W or [^\w]) then it's a lot simpler and easier to read. Most common cases of needing to apply backslashes work with those.
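To make the difference between the two approaches concrete, here is a minimal sketch. The input string with a hyphen and a space is made up for illustration and is not from the question; the two regsub calls are the ones shown above.

```tcl
# Hypothetical sample input containing a hyphen and a space
set input {a-b c/d[1]}

# \W escapes every non-word character, including the hyphen and the space
set broad [regsub -all {\W} $input {\\&}]

# An explicit bracket expression escapes only the characters listed in it
set narrow [regsub -all {[][/{}]} $input {\\&}]

puts $broad    ;# a\-b\ c\/d\[1\]
puts $narrow   ;# a-b c\/d\[1\]
```

Which one is right depends on what will consume the escaped string: the class is shorter to write, the explicit set leaves everything else untouched.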

Related

How to search for 0,a1[4],* where * is a wildcard in a list of 0,a2,4 0,a1[4],3 0,a1[4],5 .... in tcl

I tried lsearch -all $list_ 0,a1[4],*
a1[4] is stored in a variable
So basically I need:
set var {a1[4]}
lsearch -all $list_ 0,$var,*
By default lsearch uses glob patterns (as described by the documentation for string match — it's the exact same matching engine being used). That's good because it means that * is a wildcard, but awkward because it means that [ is also special (it starts a character set match). You need some simple escaping, and to keep that sane you should put your whole pattern in {braces} so we don't need to fight with Tcl over what the meanings of bracket and backslash are:
lsearch -all $list_ {0,a1\[4\],*}
You don't need braces; you could write this instead:
lsearch -all $list_ 0,a1\\\[4\\\],*
But that's ugly! And difficult to maintain (trust me on that). So use braces, OK?
In the case where you're pulling the subpattern from a variable, things get more complicated. The fix is to use string map (or regsub) to condition the pattern piece.
# Split into three lines for clarity; qvar = “quoted var”
set ADD_BACKSLASHES {[ {\[} ] {\]}}
set qvar [string map $ADD_BACKSLASHES $var]
lsearch -all $list_ 0,$qvar,*
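If the variable might also contain other glob metacharacters, the same idea extends to a larger map. This is only a sketch: the name GLOB_ESCAPES and the sample list are assumptions made up for illustration, and the map covers the characters that string match treats specially (backslash, *, ?, [ and ]).

```tcl
# Hypothetical map: each glob metacharacter -> its backslash-escaped form.
# The unbraced \\ element is a single backslash; {\\} is two backslashes.
set GLOB_ESCAPES {\\ {\\} * {\*} ? {\?} [ {\[} ] {\]}}

# Sample data, assumed for illustration
set list_ {0,a2,4 0,a1[4],3 0,a1[4],5 0,a1x,7}
set var {a1[4]}

# Condition the variable part, then build the glob pattern around it
set qvar [string map $GLOB_ESCAPES $var]
puts $qvar                              ;# a1\[4\]
puts [lsearch -all $list_ 0,$qvar,*]    ;# 1 2
```

Only the piece that comes from the variable is escaped; the trailing * stays outside the map so it still acts as a wildcard.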

Split camelcase value with TCL

I have this TCL expression:
[string toupper [join [lrange [file split [value [topnode].file]] 1 1]]]
This retrieves companyName value from c:/companyName... and I need to split that value before the first capital letter into Company Name. Any ideas?
Thanks in advance.
That's rather more in one word than I would consider a good idea. It makes the whole thing quite opaque! Let's split it up.
Firstly, I would expect the base company name to be better retrieved with lindex from the split filename.
set companyName [lindex [file split [value [topnode].file]] 1]
Now, we need to process that to get the human-readable version out of it. Alas, that's going to be a bit difficult without knowing what's been done to it, but if we use as our example fooBarBoo_grill then we can see what we can do. First, we get the pieces with some regular expressions (this part might need tweaking if there are non-ASCII characters involved, or if certain critical characters need special treatment):
# set companyName "fooBarBoo_grill"
set pieces [regexp -all -inline {[a-z]+|[A-Z][a-z]*} $companyName]
# pieces = foo Bar Boo grill
Next, we need to capitalise. I'll assume you're using Tcl 8.6 and so have lmap as it is perfect for this task. The string totitle command has been around for a very long time.
set pieces [lmap word $pieces {string totitle $word}]
# pieces = Foo Bar Boo Grill
That list might need a bit more tweaking, or it might be OK as it is. An example of tweaking that might be necessary is if you've got an Irish name like O'Hanrahan, or if you need to insert a comma before and period after Inc.
Finally, we properly ought to set companyName [join $pieces] to get back a true string, but that doesn't have a noticeable effect with a list of words made purely out of letters. Also, more complex joins with regular expressions might be needed if you've done insertion of prefixing punctuation (the , Inc. case).
If I was doing this for real, I'd try to have the proper company name expressed directly elsewhere rather than relying on the filename. Much simpler to get right!
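Putting the steps above together, a minimal sketch could look like this. The proc name humanize is made up for illustration, and lmap assumes Tcl 8.6 or later, as noted above.

```tcl
# Hypothetical helper: split a camelCase name into title-cased words.
proc humanize {name} {
    # Pull out runs of lowercase letters, or a capital plus its lowercase tail
    set pieces [regexp -all -inline {[a-z]+|[A-Z][a-z]*} $name]
    # Capitalise the first letter of every piece
    set pieces [lmap word $pieces {string totitle $word}]
    # Turn the list back into a plain string
    return [join $pieces]
}

puts [humanize companyName]       ;# Company Name
puts [humanize fooBarBoo_grill]   ;# Foo Bar Boo Grill
```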
To begin with, try using
lindex [file split [value [topnode].file]] 1
The lrange command will return a list, which might cause problems with some directory names. The join command should be pointless if you don't use lrange, and string toupper removes the information you need to do the operation you want to do.
To split before uppercase letters, you can use repetitive matches of either (?:[a-z]+|[A-Z][a-z]+) (ASCII / English alphabet letters only) or (?:[[:lower:]]+|[[:upper:]][[:lower:]]+) (any Unicode letters).
% regexp -all -inline {(?:[a-z]+|[A-Z][a-z]+)} camelCaseWord
camel Case Word
Use string totitle to change the first letter of the first word to upper case.
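One detail worth checking against your data: this pattern requires at least one lowercase letter after each capital ([a-z]+), while the pattern in the other answer allows none ([a-z]*). The sketch below shows the difference on a made-up input with a run of capitals; the word parseHTTPResponse is an assumption for illustration only.

```tcl
# Hypothetical input containing an acronym
set word parseHTTPResponse

# With [a-z]+ after the capital, lone capitals are skipped entirely
set plus [regexp -all -inline {(?:[a-z]+|[A-Z][a-z]+)} $word]

# With [a-z]* after the capital, each lone capital becomes its own piece
set star [regexp -all -inline {(?:[a-z]+|[A-Z][a-z]*)} $word]

puts $plus   ;# parse Response
puts $star   ;# parse H T T P Response
```

Neither result is ideal for acronyms, so names like this would probably need extra handling.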
Documentation: file, lindex, regexp, string, Syntax of Tcl regular expressions

Escape square bracket in Tcl_StringCaseMatch

I am using the Tcl_StringCaseMatch function in C++ code for string pattern matching. Everything works fine until the input pattern or string contains square brackets []. For example:
str1 = pq[0]
pattern = pq[*]
Tcl_StringCaseMatch does not work here, i.e. it returns false for the above inputs.
How can I avoid the special treatment of [] in pattern matching?
The problem is that [] are special characters in pattern matching (they delimit a character set). You need to escape them with a backslash to have them treated like plain characters:
pattern= "pq\\[*\\]"
The string being matched does not need the same treatment; only the pattern does. The backslashes are doubled because the C++ compiler consumes one level of escaping, and you want a single backslash to reach the TCL matching engine.
For the casual reader:
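[] also have a special meaning in TCL scripts in general, beyond the pattern matching role they take here: "run command" (like `` or $() in shells). That only applies when TCL evaluates a script, though. A string handed directly to Tcl_StringCaseMatch from C++ is never evaluated, and brackets in the string being matched (as opposed to the pattern) are treated normally, thus the string str1 does not need escaping here.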
For extra confusion:
TCL will interpret ] with no preceding [ as a normal character by default. I feel that's getting too confusing, and would rather that TCL complained about unbalanced brackets. As OP mentions though, this allows you to forgo the final two backslashes and use "pq\\[*]". I dislike this, and would rather make it obvious that both brackets are treated as plain characters and not in the usual TCL way, but to each their own.
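The same behaviour can be tried from a Tcl shell, assuming that string match follows the same glob rules as Tcl_StringCaseMatch (which is how the Tcl documentation describes the pattern syntax). This is only a script-level sketch of the cases discussed above, not the C++ call itself. Inside braces a single backslash is enough, because there is no C++ compiler stripping one level.

```tcl
set str {pq[0]}

# Unescaped: [*] is a character set that matches only a literal asterisk
puts [string match {pq[*]} $str]      ;# 0

# Both brackets escaped: they match themselves, * matches the 0
puts [string match {pq\[*\]} $str]    ;# 1

# Only the opening bracket escaped: the lone ] is an ordinary character
puts [string match {pq\[*]} $str]     ;# 1
```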

How to trim two words from right of a string

I want to remove two words from right of a string.
For example:
set str "sachin is the pride of india"
I need to remove india and of from right and there should be no space after that.
I have tried using string trimright.
The string trimright command is exactly the wrong tool for this; it treats its trim argument as a set of characters to remove, not a literal. The simplest way of doing this is with lreplace, provided the string doesn't contain list metacharacters and you don't care about the number of spaces.
set shortened [lreplace $str end-1 end]
If you need to do it reliably, regular expressions are the tool of choice.
set shortened [regsub {\s*\S+\s+\S+\s*$} $str ""]
Use regsub for this. Please.
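A minimal sketch of how the two approaches above differ when the spacing is irregular. The second string, with extra spaces, is made up for illustration.

```tcl
set str "sachin is the pride of india"
puts [lreplace $str end-1 end]                  ;# sachin is the pride
puts [regsub {\s*\S+\s+\S+\s*$} $str ""]        ;# sachin is the pride

# Hypothetical variant with three spaces after the first word
set spaced "sachin   is the pride of india"

# lreplace treats the string as a list, so the result is re-joined
# with single spaces
set viaList [lreplace $spaced end-1 end]

# regsub only touches the tail, so the original spacing survives
set viaRegsub [regsub {\s*\S+\s+\S+\s*$} $spaced ""]

puts [string length $viaList]     ;# 19
puts [string length $viaRegsub]   ;# 21
```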

Is there any Tcl function to add escape character automatically?

Is there any Tcl function to add escape character to a string automatically?
For example, I have a regular expression
"[xy]"
After I call the function, I get
"\[xy]"
After being called again, I get
"\\\[xy]"
I remember there's such function with some script language, but I cannot recall which language it is.
The usual way of adding such escape characters as are “necessary” is to use list (% is my Tcl prompt):
% set s {[xy]}
[xy]
% set s [list $s]
{[xy]}
% set s [list $s]
{{[xy]}}
The list command prefers to leave its argument alone if it can, wraps it in braces if it can get away with that, and resorts to backslashing only otherwise (because backslashes are really unreadable).
If you really need backslashes, string map or regsub will do what you need. For example:
set s [regsub -all {\W} $s {\\&}]
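As a sketch of what repeated application produces: the regsub form escapes every non-word character, including the closing bracket, so its output differs slightly from the one in the question. A string map with a narrower map (the name MAP and its contents are assumptions for illustration) reproduces the question's output exactly.

```tcl
set s {[xy]}

# regsub with \W: escapes [ and ], and on the second pass the backslashes too
set once  [regsub -all {\W} $s {\\&}]
set twice [regsub -all {\W} $once {\\&}]
puts $once    ;# \[xy\]
puts $twice   ;# \\\[xy\\\]

# Hypothetical map that escapes only backslash and the opening bracket
set MAP {\\ {\\} [ {\[}}
set m1 [string map $MAP $s]
set m2 [string map $MAP $m1]
puts $m1      ;# \[xy]
puts $m2      ;# \\\[xy]
```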