Hi, I am a newbie in Tcl; please help me with code, a method, anything. There is a string, for example:
(abcgfhdskls12345)HELLO(hikjkflklfk)
(bkjopkjjkl)HI(kjkkjjuilpp)
I just want to remove everything between the parentheses (and the parentheses themselves) and print only HI and HELLO.
You could use Tcl regsub to remove anything with parentheses around it:
set x "(abcgfhdskls12345)HELLO(hikjkflklfk) (bkjopkjjkl)HI(kjkkjjuilpp)"
regsub -all {\(.*?\)} $x {} x
puts $x
which yields:
$ tclsh foo.tcl
HELLO HI
My proposal is to use the regsub command, which performs substitutions based on regular expressions. You can write something like this:
set str {(abcgfhdskls12345)HELLO(hikjkflklfk) (bkjopkjjkl)HI(kjkkjjuilpp)}
set result [ regsub -all {\(.*?\)} $str {} ]
The -all option is required because your pattern may appear more than once in the source string and you don't want to strip just the first.
The text inside {...} is the pattern. You want to match anything that is inside parentheses, so you use the \(...\) part; escaping the parentheses is required because unescaped parentheses have a special meaning (grouping) in regular expression syntax.
Between the parentheses you want to match any character repeated zero or more times, so you have .*?, where the dot means any character and *? is the non-greedy zero-or-more quantifier.
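To see why the non-greedy quantifier matters, here is a small sketch comparing three patterns on the same input. The variable names are only illustrative, and the negated character class is a common alternative rather than something the answer above proposed.

```tcl
set str {(abcgfhdskls12345)HELLO(hikjkflklfk) (bkjopkjjkl)HI(kjkkjjuilpp)}

# Non-greedy: each match stops at the nearest closing parenthesis.
set nonGreedy [regsub -all {\(.*?\)} $str {}]

# Greedy: a single match runs from the first "(" to the last ")".
set greedy [regsub -all {\(.*\)} $str {}]

# Negated class: a match can never cross a ")".
set negated [regsub -all {\([^)]*\)} $str {}]

puts "non-greedy: <$nonGreedy>"
puts "greedy:     <$greedy>"
puts "negated:    <$negated>"
```

The non-greedy and negated-class versions both leave HELLO HI, while the greedy version swallows the whole string because it begins and ends with a parenthesised group.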
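You could also split the string on open or close parentheses and throw out every second element (the text that was inside parentheses), skipping the empty or whitespace-only pieces that split leaves between adjacent groups:
set s "(abcgfhdskls12345)HELLO(hikjkflklfk) (bkjopkjjkl)HI(kjkkjjuilpp)"
foreach {out in} [split $s "()"] {if {[string trim $out] ne ""} {puts $out}}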
HELLO
HI
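If you would rather collect the words than print them, the split approach can be wrapped in a proc. This is a minimal sketch: outsideParens is a made-up name, and it assumes the parentheses are balanced and not nested.

```tcl
# Hypothetical helper: return the pieces of $s that lie outside parentheses.
# Assumes balanced, non-nested parentheses.
proc outsideParens {s} {
    set result {}
    # split yields alternating outside/inside pieces, starting with "outside".
    foreach {out in} [split $s "()"] {
        set out [string trim $out]
        if {$out ne ""} {
            lappend result $out
        }
    }
    return $result
}

set words [outsideParens "(abcgfhdskls12345)HELLO(hikjkflklfk) (bkjopkjjkl)HI(kjkkjjuilpp)"]
puts $words
```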
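Here is another way, using only regexp. Note that it hard-codes HELLO and HI in the pattern, so it only helps when you already know the words; p1 receives the whole match and p2 onwards receive the capture groups.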
set p {(abcgfhdskls12345)HELLO(hikjkflklfk) (bkjopkjjkl)HI(kjkkjjuilpp)}
set pol [regexp {(.*)(HELLO)(.*) (.*)(HI)(.*)} $p p1 p2 p3 p4 p5 p6]
puts "$p3 $p6"
Output: HELLO HI
The Tcl documentation is clear on how to use string totitle:
Returns a value equal to string except that the first character in
string is converted to its Unicode title case variant (or upper case
if there is no title case variant) and the rest of the string is
converted to lower case.
Is there a workaround or method that will convert a string with spaces (the first letter of each word would be upper case)?
For example in Python:
intro : str = "hello world".title()
print(intro) # Will print Hello World, notice the capital H and W.
In Tcl 8.7, the absolutely most canonical way of doing this is to use regsub with the -command option to apply string totitle to the substrings you want to alter:
set str "hello world"
# Very simple RE: (greedy) sequence of word characters
set tcstr [regsub -all -command {\w+} $str {string totitle}]
puts $tcstr
In earlier versions of Tcl, you don't have that option so you need a two stage transformation:
set tcstr [subst [regsub -all {\w+} $str {[string totitle &]}]]
The problem with this is that it will blow up if the input string has certain Tcl metacharacters in it; it is possible to fix this, but it's horrible to do. I added the -command option to regsub precisely because I was fed up of having to do a multi-stage substitute just to make a string I could feed through subst. Here's the safe version (the input stage could also be done with string map):
set tcstr [subst [regsub -all {\w+} [regsub -all {[][$\\]} $str {\\&}] {[string totitle &]}]]
It gets really complicated (well, at least quite non-obvious) when you want to do the replacement on substrings that have already been transformed. That is why you can now circumvent the whole mess with regsub -command, which is careful about word boundaries when it runs the replacement command (because the Tcl C API is actually good at that).
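The two approaches can be combined in one proc that uses regsub -command when the interpreter is new enough and falls back to the escaped subst version otherwise. This is only a sketch: titleCase is a made-up name, and the version check assumes -command is available from Tcl 8.7 onwards, as stated above.

```tcl
# Hypothetical wrapper around the two techniques shown above.
proc titleCase {str} {
    if {[package vcompare [info tclversion] 8.7] >= 0} {
        # Tcl 8.7 or later: run string totitle on every word directly.
        return [regsub -all -command {\w+} $str {string totitle}]
    }
    # Older Tcl: backslash-escape [, ], $ and \ so subst leaves them alone,
    # then wrap each word in a [string totitle ...] call and substitute.
    set escaped [regsub -all {[][$\\]} $str {\\&}]
    return [subst [regsub -all {\w+} $escaped {[string totitle &]}]]
}

puts [titleCase "hello world"]
puts [titleCase {price is $5 [approx]}]
```

The second call is there to check that dollar signs and square brackets in the input pass through as plain text instead of being evaluated.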
Donal gave you an answer, but there is also a package that does what you want: textutil::string from Tcllib.
package require textutil::string
puts [::textutil::string::capEachWord "hello world"]
> Hello World
If I have a variable that has values like this:
set var "/abc/def/ghi/jkl/mno/pqr"
How do I use regsub so that it removes everything except the second last value "mno"?
Well, you can do this:
set updatedValue [regsub {^.*(mno).*$} $var {\1}]
which is an old trick from the Unix utility, sed, translated into Tcl. That will remove everything but the last text to match mno. Pick your RE appropriately.
And don't use this for manipulating filenames, please. It might work, but it makes your code more confusing. The file command has utility subcommands for this sort of work and they handle tricky gotchas that you might not be aware of.
Why a regular expression?
How about:
set val [file tail [file dirname $var]]
References: file
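As a small illustration of the file subcommands, file split turns the path into a list of components, which can then be indexed like any other list. The variable names here are just for the example.

```tcl
set var "/abc/def/ghi/jkl/mno/pqr"

# file split returns the path components as a list; the leading "/" is
# kept as its own element.
set parts [file split $var]
puts $parts

# Second-to-last component, obtained two ways.
set viaSplit [lindex $parts end-1]
set viaTail  [file tail [file dirname $var]]
puts "$viaSplit $viaTail"
```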
% regexp {.*/([^/]+)/} $var -> val
1
% set val
mno
Try this, using split and lindex:
% set var "/abc/def/ghi/jkl/mno/pqr"
/abc/def/ghi/jkl/mno/pqr
%
% puts "[lindex [split $var "/"] end-1]"
mno
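One of the gotchas with plain string splitting is a trailing slash. The sketch below, using the same path with a slash added at the end (an assumption for the sake of the example), shows that split and lindex then pick a different element, while the file command ignores the trailing separator.

```tcl
set var "/abc/def/ghi/jkl/mno/pqr/"   ;# note the trailing slash

# split produces an empty last element, so end-1 is now "pqr".
set viaSplit [lindex [split $var "/"] end-1]

# file dirname and file tail ignore the trailing separator.
set viaFile [file tail [file dirname $var]]

puts "split/lindex: $viaSplit"
puts "file command: $viaFile"
```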
I need to pass a regular expression as a variable within a regsub command. I want to eliminate the brackets from cap variable, but I am unable to pass the curly braces within the match variable.
set cap {[equality choice control]}
set match {\[}
regsub -all $match $cap "" cap
puts $cap
The reason for doing this is that I am building a proc and I need to pass the regex as an argument.
Square brackets are special in regular expressions as well as Tcl. To match a literal bracket you need to escape it in the regular expression and it needs separate escaping in Tcl.
% set match {[}
[
% regsub -all $match $cap ""
couldn't compile regular expression pattern: brackets [] not balanced
% set match {\[}
\[
% regsub -all $match $cap ""
equality choice control]
You could also use quotes, but make sure you deal with Tcl's own escaping requirements, i.e. set match "\\\["
Note: for a simple substitution you may find string map easier to use:
set cap [string map [list "\[" ""] $cap]
would achieve the same effect.
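Since the goal was a proc that takes the regular expression as an argument, here is a minimal sketch of that. stripPattern is a made-up name; the -- marks the end of the options so that a pattern beginning with a dash is not mistaken for a switch.

```tcl
# Hypothetical helper: delete every match of $pattern from $text.
proc stripPattern {pattern text} {
    return [regsub -all -- $pattern $text ""]
}

set cap {[equality choice control]}

# Braces keep the backslash intact until the regexp engine sees it.
set noOpen     [stripPattern {\[} $cap]
set noBrackets [stripPattern {\[|\]} $cap]

# string map equivalent for removing both brackets, no regexp involved.
set mapped [string map {[ "" ] ""} $cap]

puts $noOpen
puts $noBrackets
puts $mapped
```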
Is there any Tcl function that adds escape characters to a string automatically?
For example, I have a regular expression
"[xy]"
After I call the function, I get
"\[xy]"
After being called again, I get
"\\\[xy]"
I remember there is such a function in some scripting language, but I cannot recall which one.
The usual way of adding such escape characters as are “necessary” is to use list (% is my Tcl prompt):
% set s {[xy]}
[xy]
% set s [list $s]
{[xy]}
% set s [list $s]
{{[xy]}}
The list command prefers to leave the value alone if it can, wraps it in braces if it can get away with it, and resorts to backslashing otherwise (because backslashes are really unreadable).
If you really need backslashes, string map or regsub will do what you need. For example:
set s [regsub -all {\W} $s {\\&}]
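A typical reason to want the backslashes is to use an arbitrary string as a literal inside a regular expression. The sketch below wraps the regsub in a proc (reEscape is a made-up name) and also shows the ***= prefix, which tells Tcl's regexp engine to treat the rest of the pattern as a literal string.

```tcl
# Hypothetical helper: backslash-escape every non-word character.
proc reEscape {s} {
    return [regsub -all {\W} $s {\\&}]
}

set literal {[xy]}
set pattern [reEscape $literal]
puts $pattern

# Unescaped, [xy] is a bracket expression and matches a lone "x".
set classHit   [regexp -- $literal "x"]
# Escaped, it only matches the literal text "[xy]".
set escapedHit [regexp -- $pattern {a[xy]b}]
set escapedMis [regexp -- $pattern "x"]
# ***= gives the same literal behaviour without any escaping.
set prefixHit  [regexp -- "***=$literal" {a[xy]b}]

puts "$classHit $escapedHit $escapedMis $prefixHit"
```

Note that \W escapes more than strictly necessary (spaces and punctuation too), which is harmless for this purpose.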
How can I extract a word inside double quotes from a file?
e.g.
variable "xxx"
Reading a text file into Tcl is just this:
set fd [open $filename]
set data [read $fd] ;# Now $data is the entire contents of the file
close $fd
To get the first quoted string (under some assumptions, notably a lack of backslashed double quote characters inside the double quotes), use this:
if {[regexp {"([^""]*)"} $data -> substring]} {
# We found one, it's now in $substring
}
(Doubling up the quote in the brackets is totally unnecessary — only one is needed — but it does mean that the highlighter does the right thing here.)
The simplest method of finding all the quoted strings is this:
foreach {- substring} [regexp -inline -all {"([^""]*)"} $data] {
# One of the substrings is $substring at this point
}
Notice that I'm using the same regular expression in each case. Indeed, it's actually good practice to factor such REs (especially if repeatedly used) into a variable of their own so that you can “name” them.
Combining all that stuff above:
set FindQuoted {"([^""]*)"}
set fd [open $filename]
foreach {- substring} [regexp -inline -all $FindQuoted [read $fd]] {
puts "I have found $substring for you"
}
close $fd
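If the input can contain backslashed double quotes, the assumption mentioned earlier no longer holds. One possible extension, sketched here on an in-memory string rather than a file, is a pattern that accepts either an ordinary character or a backslash followed by any character. FindQuotedEsc and the sample data are made up for this example.

```tcl
set data {say "a \"b\" c" and then "d"}

# Inside the quotes: any character except a quote or backslash,
# or a backslash followed by any single character.
set FindQuotedEsc {"((?:[^"\\]|\\.)*)"}

set found {}
foreach {- substring} [regexp -inline -all $FindQuotedEsc $data] {
    lappend found $substring
}

# For comparison, count what the simpler pattern finds on the same data.
set simpleCount [regexp -all {"([^"]*)"} $data]

foreach s $found {puts "found: $s"}
puts "simple pattern match count: $simpleCount"
```

The captured text still contains the backslashes; removing them is a separate step that depends on what escaping rules the file format uses.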
Internal Matching
If you're just looking for a regular expression, then you can use Tcl's capture groups. For example:
set string {variable "xxx"}
regexp {"(.*)"} $string match group1
puts $group1
This will print xxx, discarding the quotes.
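One caveat with "(.*)" is that .* is greedy, so on a line with more than one quoted string it captures everything from the first quote to the last. A short sketch with a made-up input line shows the difference from a negated character class:

```tcl
set line {set "first" to "second"}

# Greedy: runs from the first quote to the last quote on the line.
regexp {"(.*)"} $line -> greedy

# Negated class: stops at the first closing quote.
regexp {"([^"]*)"} $line -> firstOnly

puts "greedy:    $greedy"
puts "firstOnly: $firstOnly"
```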
External Matching
If you want to match data in a file without having to handle reading the file into Tcl directly, you can do that too. For example:
set match [exec sed {s/^variable "\(...\)"/\1/} /tmp/foo]
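This will call sed to find just the parts of the match you want, and assign them to a Tcl variable for further processing. In this example, the match variable is set to xxx as above, but the command operates on an external file rather than a stored string. Note that \(...\) matches exactly three characters, which happens to fit xxx, and that sed prints any non-matching lines unchanged, so this only gives xxx if the file contains just that one line.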
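For comparison, roughly the same extraction can be done in pure Tcl by reading the file line by line, which avoids depending on an external sed. This is a sketch only: it writes its own temporary file so it can run on its own, it uses file tempfile (Tcl 8.6 or later), and the second line of sample data is invented.

```tcl
# Create a small sample file so the example is self-contained.
set chan [file tempfile path]
puts $chan {variable "xxx"}
puts $chan {other "yyy"}
close $chan

# Collect the quoted value from every line that starts with: variable "..."
set names {}
set fd [open $path]
while {[gets $fd line] >= 0} {
    if {[regexp {^variable "([^"]*)"} $line -> name]} {
        lappend names $name
    }
}
close $fd
file delete $path

puts $names
```

Unlike the sed command, lines that do not match are simply skipped, and the quoted value may be of any length.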
When you just want to find all quoted words in a file with grep and do something with them, you can do something like this (in a shell; $file stands for the name of your file):
grep -o '"[^"]*"' "$file" | while read -r word
do
    # do something with $word (it still includes the surrounding quotes)
    echo "extracted: $word"
done