regular expression to treat unbalanced braces as a word - tcl

I am getting an error message in this regex when line contains unbalanced braces.
set line "a b { c{}"
set lst [regexp -all -inline {^(\s*(\S*)\s*)*(\{(.*)\})?(\s*(\S*)\s*)*$} $line]
set lst [lindex $lst 0]
set firstelement [lindex $lst 0]
How to avoid such cases and treat unbalanced braces as a word?

When you have a string from an arbitrary source (like a user) there's no guarantee at all that it is a well-formed list. Now regexp -inline returns a list of what it matched, but the elements of that list are strings (unless you use the -indices option, of course) and that means that you can't safely use lindex on them to pick out the pieces.
The safe way to get the first “word”, assuming you define “word” to be “sequence of non-whitespace characters” (the usual user definition), is to do this:
set firstWord [lindex [regexp -all -inline {\S+} $item] 0]
It's a bit ugly, but it's totally safe. (In fact, for the first word only, use regexp -inline {\S+} $item on its own, but that won't let you get later words.)
Using split to break a string into words is also possible, but that strongly assumes that the word separator is a single (whitespace-by-default) character and does something that you might not expect if you have multi-whitespace separators, or leading and trailing whitespace. Frankly, it's more useful for dividing up non-whitespace separated strings (e.g., a file into lines, an /etc/passwd record into fields) or for turning a string into the list of its characters (with an empty second argument).
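As a minimal sketch of the difference, here is the line from the question run through both approaches. The variable names (words, failed, msg) are only illustrative, and the braces in the string are backslash-escaped purely so the snippet can also be pasted into a proc body; the resulting string is the same as in the question.

```tcl
# The line from the question; it is not a well-formed Tcl list.
set line "a b \{ c\{\}"

# Treating the string as a list fails: catch returns 1 on error.
set failed [catch {lindex $line 0} msg]
puts "lindex failed: $failed"

# Each run of non-whitespace characters becomes one list element.
set words [regexp -all -inline {\S+} $line]
puts [llength $words]    ;# 4
puts [lindex $words 0]   ;# a
puts [lindex $words 3]   ;# c followed by an empty brace pair

# split produces empty elements for leading, trailing and repeated spaces.
puts [llength [split "  a   b " " "]]   ;# 7
```

The third element of words is the lone open brace, which is safely quoted inside the list that regexp builds.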

The regexp command returns a list. You then take the first element of the list. But in the final line you then treat that element as a list - but it is not guaranteed to be so - hence the actual string content matters. Instead, if you want to deal with this item as a list you need to use split and convert it into words:
% split "a b {" " "
a b \{
In your case:
set lst [lindex $lst 0]
set firstelement [lindex [split $lst " "] 0]
You may also want to look into subst. It looks like you are trying to read poorly specified tcl lists as input and doing some parsing to get them as a proper tcl list. In which case, subst -nocommands [lindex $lst 0] might be more helpful to you. For example:
% lindex [subst -nocommands [lindex $lst 0]] 2
c{}
Note that this is the content of the braced part of $line.

Related

how to use lsearch in tcl with numeric

I am new to the Tcl world. I have a list that is numeric, and when I use lsearch with -integer the result is not printed properly. Could you please help me find what's wrong in my command?
set a {12,121,124,21,212}
lsearch -integer $a 12
Expected output : 12
Actual Output : 12,121, 124, 212
You've got a list that is separated by commas, whereas Tcl lists (the kind that lsearch can search and lsort can sort) are separated by spaces. The split command can do the conversion for you:
set a {12,121,124,21,212}
set theList [split $a ","]
lsearch -integer $theList 12
The result of the search is 0, which is the index of the first item in the list (Tcl uses zero-indexing, like a lot of programming languages).
To get the actual value found (not so useful in this case, but definitely more useful in more complex ones) you'd provide the -inline option:
lsearch -inline -integer $theList 12
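A short sketch of what -integer changes, assuming the same comma-separated input. The value 0x15 (hexadecimal for 21) is only there to illustrate that the comparison is numeric rather than textual.

```tcl
set a {12,121,124,21,212}
set theList [split $a ","]

puts [lsearch -integer $theList 21]           ;# 3
puts [lsearch -inline -integer $theList 21]   ;# 21
puts [lsearch -integer $theList 99]           ;# -1 (not found)

# -integer compares numeric values, so another spelling of 21 still matches,
# while a plain string comparison does not.
puts [lsearch -integer $theList 0x15]         ;# 3
puts [lsearch -exact $theList 0x15]           ;# -1
```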

Can I convert a string with space using totitle?

The Tcl documentation is clear on how to use string totitle:
Returns a value equal to string except that the first character in
string is converted to its Unicode title case variant (or upper case
if there is no title case variant) and the rest of the string is
converted to lower case.
Is there a workaround or method that will convert a string with spaces (the first letter of each word would be upper case)?
For example in Python:
intro : str = "hello world".title()
print(intro) # Will print Hello World, notice the capital H and W.
In Tcl 8.7, the absolutely most canonical way of doing this is to use regsub with the -command option to apply string totitle to the substrings you want to alter:
set str "hello world"
# Very simple RE: (greedy) sequence of word characters
set tcstr [regsub -all -command {\w+} $str {string totitle}]
puts $tcstr
In earlier versions of Tcl, you don't have that option so you need a two stage transformation:
set tcstr [subst [regsub -all {\w+} $str {[string totitle &]}]]
The problem with this is that it will blow up if the input string has certain Tcl metacharacters in it; it is possible to fix this, but it's horrible to do; I added the -command option to regsub precisely because I was fed up of having to do a multi-stage substitute just to make a string I could feed through subst. Here's the safe version (the input stage could also be done with string map):
set tcstr [subst [regsub -all {\w+} [regsub -all {[][$\\]} $str {\\&}] {[string totitle &]}]]
It gets really complicated (well, at least quite non-obvious) when you want to actually do the replacement on substrings that have been transformed. Which is why it is now possible to circumvent all that mess with regsub -command, which is careful with word boundaries when running the replacement command (because the Tcl C API is actually good at that).
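For Tcl versions without regsub -command, the string map variant of the input stage mentioned above might look roughly like this. It is a sketch, not the answer's own code: the proc name titleCase is hypothetical, and it assumes a "word" is whatever \w+ matches.

```tcl
# Hypothetical helper for Tcl versions that lack regsub -command.
proc titleCase {str} {
    # Stage 1: backslash-escape the characters subst would act on
    # (backslash, dollar sign, open and close square bracket).
    set escaped [string map [list \\ \\\\ \$ \\\$ \[ \\\[ \] \\\]] $str]
    # Stage 2: wrap each word in a [string totitle ...] call.
    set script [regsub -all {\w+} $escaped {[string totitle &]}]
    # Stage 3: let subst run those calls; the escaped characters come
    # back out as their literal selves.
    return [subst $script]
}

puts [titleCase "hello world"]            ;# Hello World
puts [titleCase {cost: $5 [approx]}]      ;# Cost: $5 [Approx]
```

The second call shows that text which looks like a variable or command substitution passes through as plain characters.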
Donal gave you an answer, but there is also a package that does what you want: textutil::string from Tcllib.
package require textutil::string
puts [::textutil::string::capEachWord "hello world"]
> Hello World

list doesn't contain its own members in Tcl

I have a list containing one member, that member is the string <cmd_stichstudy1>XXDDR0_MA[12]. When I search for that string in the list (using lsearch) I get that the list doesn't contain it. I even get it when I search for the member of the list:
tcl> set nets_names
{<cmd_stichstudy1>XXDDR0_MA[12]}
tcl> lsearch $nets_names [lindex $nets_names 0]
-1
Why does this happen?
If you use -exact it will work the way you want.
% set nets_names {<cmd_stichstudy1>XXDDR0_MA[12]}
<cmd_stichstudy1>XXDDR0_MA[12]
% lsearch -exact $nets_names [lindex $nets_names 0]
0
%
lsearch has an unfortunate property of using glob-style matching by default.
To cite the manual:
If all matching style options are omitted, the default matching style is -glob.
So always pass -exact to lsearch unless you really want -glob.
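To see why the glob default fails here, note that [12] in a glob pattern is a character class meaning "a single 1 or 2", not the literal four characters. A small sketch (the shortened net name in the string match call is made up for illustration):

```tcl
set name {<cmd_stichstudy1>XXDDR0_MA[12]}
set nets_names [list $name]

# Default (glob) matching: the pattern does not match its own text.
puts [lsearch $nets_names $name]                          ;# -1

# The same pattern does match a name ending in a single 1.
puts [string match $name {<cmd_stichstudy1>XXDDR0_MA1}]   ;# 1

# Exact matching compares the strings literally.
puts [lsearch -exact $nets_names $name]                   ;# 0
```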

Grep the word inside double quote

How can I extract a word inside a double quote inside a file?
e.g.
variable "xxx"
Reading a text file into Tcl is just this:
set fd [open $filename]
set data [read $fd] ;# Now $data is the entire contents of the file
close $fd
To get the first quoted string (under some assumptions, notably a lack of backslashed double quote characters inside the double quotes), use this:
if {[regexp {"([^""]*)"} $data -> substring]} {
# We found one, it's now in $substring
}
(Doubling up the quote in the brackets is totally unnecessary — only one is needed — but it does mean that the highlighter does the right thing here.)
The simplest method of finding all the quoted strings is this:
foreach {- substring} [regexp -inline -all {"([^""]*)"} $data] {
# One of the substrings is $substring at this point
}
Notice that I'm using the same regular expression in each case. Indeed, it's actually good practice to factor such REs (especially if repeatedly used) into a variable of their own so that you can “name” them.
Combining all that stuff above:
set FindQuoted {"([^""]*)"}
set fd [open $filename]
foreach {- substring} [regexp -inline -all $FindQuoted [read $fd]] {
puts "I have found $substring for you"
}
close $fd
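If you need the substrings as a list rather than printing them, the same loop can be wrapped in a small proc. This is only a sketch: the name quotedStrings is hypothetical, and it works on a string so it can be tried without a file.

```tcl
# Hypothetical helper: return every double-quoted substring in $data.
# Same assumption as above: no backslashed quotes inside the quotes.
proc quotedStrings {data} {
    set result {}
    # regexp -inline -all returns pairs: full match, then capture group.
    foreach {- substring} [regexp -inline -all {"([^"]*)"} $data] {
        lappend result $substring
    }
    return $result
}

set data "variable \"xxx\"\nother \"yyy\" and \"zzz\""
puts [quotedStrings $data]   ;# xxx yyy zzz
```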
Internal Matching
If you're just looking for a regular expression, then you can use TCL's capture groups. For example:
set string {variable "xxx"}
regexp {"(.*)"} $string match group1
puts $group1
This will print xxx, discarding the quotes.
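One caveat with the greedy .* is worth a quick sketch: when a line holds more than one quoted string, it captures everything between the first and the last quote. The sample text and variable names below are made up for illustration.

```tcl
set text {set "first" and "second"}

# Greedy: runs from the first quote to the last one.
regexp {"(.*)"} $text -> greedy

# Negated character class: stops at the next quote.
regexp {"([^"]*)"} $text -> firstOnly

puts $greedy      ;# first" and "second
puts $firstOnly   ;# first
```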
External Matching
If you want to match data in a file without having to handle reading the file into TCL directly, you can do that too. For example:
set match [exec sed {s/^variable "\(...\)"/\1/} /tmp/foo]
This will call sed to find just the parts of the match you want, and assign them to a TCL variable for further processing. In this example, the match variable is set to xxx as above, but is operating on an external file rather than a stored string.
When you just want to find with grep all words in quotes in a file and do something with the words, you do something like this (in a shell):
grep -o '"[^"]*"' | while read word
do
# do something with $word
echo extracted: $word
done

Tcl: Removing the pound sign commented line

Why can't I remove the pound sign commented line?
#!/usr/bin/tclsh
set lines [list file1.bmp { # file2.bmp} file3.bmp ]
# Now we apply the substitution to get a subst-string that
# will perform the computational parts of the conversion.
set out [regsub -all -line {^\s*#.*$} $lines {}]
puts $out
Output:
file1.bmp { # file2.bmp} file3.bmp
-UPDATE-
Expected output:
file1.bmp {} file3.bmp
{} means empty string.
In fact, it's my first step. My ultimate goal is to eliminate all comment lines and all empty lines. The above question only changes all comment lines into empty lines. For example, if the input is:
set lines [list file1.bmp { # file2.bmp} {} file3.bmp ]
I want my ultimate results to be
file1.bmp file3.bmp
Note: Stack Overflow mistakenly dims everything from the pound (#) sign onwards, thinking that it is a comment. Yet in TCL syntax, these are not comments.
#Tensibai:
I also want to remove empty lines, thus I match any number of spaces before '#' (after removing the '#' and everything following it, what remains is an empty line). In fact, in my data, a comment always appears as a full line by itself. Yet the '#' sign may not be the first character: spaces can lead a comment line.
Edit to answer after edit:
#!/usr/bin/tclsh
set lines [list file1.bmp { # file2.bmp } file3.bmp #test ]
puts $lines
# Now we apply the substitution to get a subst-string that
# will perform the computational parts of the conversion.
set out [lsearch -regexp -all -inline -not $lines {^\s*(#.*)?$}]
puts $out
Output:
file1.bmp file3.bmp
You're working on a list. The string representation of a list is simple text, so you can regsub it, but it's a single line.
If you want to check the elements of this list you have to use list-related commands.
Here lsearch will do what you wish: it checks each item against the regex, and -not combined with -all -inline tells it to return the elements that do not match.
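The same filter applies when the lines come from multi-line text rather than a hand-built list. A minimal sketch, assuming the text is already in a variable (the sample content is made up):

```tcl
set text "file1.bmp\n   # file2.bmp\n\nfile3.bmp\n"

# Turn the text into a real list, one element per line.
set lines [split $text "\n"]

# Keep only elements that are neither blank nor whitespace followed by a comment.
set kept [lsearch -regexp -all -inline -not $lines {^\s*(#.*)?$}]
puts $kept   ;# file1.bmp file3.bmp
```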
Old answer:
Why: because your regex matches a pound sign only when it is preceded by nothing but whitespace (zero or more characters) from the start of the line. Thus it will only match full comment lines and not inline comments.
Have a look at http://regex101.com to test regexes.
A working regex would be:
#!/usr/bin/tclsh
set lines [list file1.bmp { # file2.bmp} file3.bmp ]
# Now we apply the substitution to get a subst-string that
# will perform the computational parts of the conversion.
set out [regsub -all -line {^(.*?)#.*$} $lines {\1}]
puts $out
For the regex (complete details here):
^ Matches start of line
(.*?)# Matches and captures as few characters as possible before the # (the non-greedy operator ? limits the match)
.*$ Matches any number of characters until end of line
And we replace with \1 which is the first capture group (and the only one in this case).
Output:
file1.bmp {
This will also remove full-line comments, but it may leave spaces or tabs if there are some before the pound sign, and so leave blank lines.
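One way to avoid the leftover whitespace and blank lines is to process the text line by line and trim after stripping. This is a sketch of a different technique from the single regsub above, with a hypothetical proc name (stripComments); it assumes that every # starts a comment, which would be wrong for data that contains a literal #.

```tcl
# Hypothetical helper: drop comments (full-line and inline) and blank lines.
proc stripComments {text} {
    set kept {}
    foreach line [split $text "\n"] {
        # Remove everything from the first pound sign to the end of the line,
        # then any whitespace left at the end.
        set line [string trimright [regsub {#.*$} $line {}]]
        if {$line ne ""} {
            lappend kept $line
        }
    }
    return [join $kept "\n"]
}

puts [stripComments "file1.bmp # first\n   # full comment\n\nfile3.bmp"]
```

With the sample input, only file1.bmp and file3.bmp remain, each on its own line.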