Tcl regular expression not working at the end of line

I'm trying to match a file that looks like this:
22.000 abc_/dasdf
23.652 abc_1/dasdf_0/l
The regular expression I used is this:
[regexp { (\S+)\s+(.+) } $line -> number name]
However, it only matches when there is a space after the string in the file. For example, it returns a match when:
22.000 abc_/dasdf<space>
But no match when there is nothing after /dasdf. By default, there are no such spaces after the string inside the file. Any reason why this could be?

That's because you have spaces inside the braces. Those are significant.
Use
regexp {(\S+)\s+(.+)} $line -> number name
# ......^...........^ no spaces here
or if you want whitespace for readability:
regexp -expanded { (\S+) \s+ (.+) } $line -> number name
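As a rough illustration of the point above, the following sketch runs the three patterns against one sample line from the question. The variable names strict, plain, expanded, number2 and name2 are made up for this example.

```tcl
# Sample line from the question, with no trailing space.
set line "22.000 abc_/dasdf"

# Literal spaces inside the braces are part of the pattern, so this
# needs a space before the first group and another after the last one.
set strict [regexp { (\S+)\s+(.+) } $line -> number name]

# Without the stray spaces the same line matches.
set plain [regexp {(\S+)\s+(.+)} $line -> number name]

# With -expanded, whitespace in the pattern is ignored.
set expanded [regexp -expanded { (\S+) \s+ (.+) } $line -> number2 name2]

puts "strict=$strict plain=$plain expanded=$expanded"
puts "number=$number name=$name"
```

This should print strict=0 plain=1 expanded=1, followed by number=22.000 name=abc_/dasdf.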

Related

catch multiple empty lines in file in tcl

There are four empty lines in my file, which is open on the channel wr_fp. I want to detect those four empty lines in code, but the code below is not working.
while {[gets $wr_fp line3] >= 0} {
if {[regexp "\n\s+\n\s+\n\s+\n" $line3]} { puts "found 4 empty lines"}
}
tl;dr: Don't put REs in "quotes", put them in {braces}.
The problem is that you've put your RE in quotes, so that it is actually this:
s+
s+
s+
Because of Tcl's general substitution rules, \n becomes a newline and \s becomes a simple s. Putting the RE in braces inhibits this (unwanted in this case) behaviour.
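A small sketch of the difference, using a shortened pattern; the variable names quoted, braced and sample are only for illustration.

```tcl
# In double quotes Tcl processes the backslashes before regexp sees them:
# \n becomes a real newline and \s becomes a plain "s".
set quoted "\n\s+\n"
# In braces the text is passed through unchanged.
set braced {\n\s+\n}

puts [string length $quoted]   ;# 4
puts [string length $braced]   ;# 7

# One whitespace-only line between two newlines.
set sample "first\n   \nlast"
puts [regexp $quoted $sample]  ;# 0
puts [regexp $braced $sample]  ;# 1
```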
This is my answer; this is what I wanted:
while {[gets $rd_fp line] >= 0} {
if {[string match "" $line]} {
if {[expr $count % 4] == 1} {puts "found 4 space"}
incr count
}
}
The gets / chan gets command reads one line at a time and discards the newline character from each line, so your test will never succeed. You need to read in the full contents of the file at once:
set txt [chan read $wr_fp]
if {[regexp {\n\s+\n\s+\n\s+\n} $txt]} { puts "found 4 empty lines"}
Note that you need to use braces around the regular expression as Donal explains.
Some typical pitfalls of RE formulation:
Do you really intend to specify that there must be at least one whitespace character on each 'empty' line? If you want to allow lines with no characters at all between the newlines, use \s* instead of \s+.
Also note that this regular expression will match ranges with more than four newlines: the extra newlines will be consumed by one of the \s+ groups. If you want to disallow extra newlines, match with (e.g.) [ \t\f\r] (or any other combination of whitespace you want) instead of \s. Note that this means the expression will match exactly three lines with nothing but blanks, tabs, form feeds, and returns, the lines surrounded and separated by newlines: you might want to extend it with one more subgroup to match the fourth line.
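One way the stricter pattern could look is sketched below. The proc name hasFourBlankLines is hypothetical, and the sketch assumes that a "blank" line may also be completely empty (hence * rather than +). Because the pattern is not anchored, it still reports a match when there are more than four blank lines in a row, and it does not see blank lines at the very start of the text, where no newline precedes them.

```tcl
# Hypothetical helper: "blank" here means only spaces, tabs, form feeds
# or carriage returns, possibly nothing at all.
proc hasFourBlankLines {txt} {
    set blank {[ \t\f\r]*}
    # Four blank lines need five newlines: one ending the previous
    # line and one ending each blank line.
    set re "\\n$blank\\n$blank\\n$blank\\n$blank\\n"
    return [regexp $re $txt]
}

puts [hasFourBlankLines "a\n\n\n\n\nb"]       ;# 1
puts [hasFourBlankLines "a\n\n\n\nb"]         ;# 0
puts [hasFourBlankLines "a\n \n\t\n\n \nb"]   ;# 1
```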
I'm a bit mystified by your solution as described in your own answer, since it doesn't do what was specified in the question. With the following text file:
abc
def
ghi
jkl
mno
pqr
stu
vwx
yz.
(where there is a tab character in the second line after "pqr")
and assuming count has the value 0 when the code is called, your code outputs "found 4 space" after reading the blank lines after "def", "pqr", and "vwx", but not after the line before "stu", where your question indicated it should be.
This code
set count 0
while {[gets $rd_fp line] >= 0} {
if {[string is space $line]} {
incr count
if {$count == 4} {puts "found 4 space"}
} else {
set count 0
}
}
does do what you asked for (nearly): it accepts lines containing whitespace as empty, and it prints its message only after finding four consecutive empty lines. The major difference from the specification in your question is that it also accepts lines without any characters as empty. To match your specification, string is space -strict $line should be used instead.
Documentation: chan, gets, if, incr, puts, regexp, set, string, while

Using backslash-newline sequence in Tcl

In Tcl, the backslash is used for escaping special characters as well as for spreading long commands across multiple lines.
For example, a typical if statement can be written as
set some_Variable_here 1
if { $some_Variable_here == 1 } {
puts "it is equal to 1"
} else {
puts "it is not equal to 1"
}
With the help of backslash, it can be written as follows too
set some_Variable_here 1
if { $some_Variable_here == 1 } \
{
puts "it is equal to 1"
} \
else {
puts "it is not equal to 1"
}
So, with a backslash we can make the statements be treated as if they were on the same line.
Let's consider the set statement.
I can write something like the following:
set x Albert\ Einstein;# This works
puts $x
#This one is not working
set y Albert\
Einstein
If I try with double quotes or braces, then the above one will work. So, is it possible to escape the newline with backslashes without double quotes or braces?
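A backslash-newline-whitespace* sequence (i.e., following whitespace is skipped over) is always replaced with a single space. Outside quotes and braces that space acts as a word separator, so your second command is parsed as set y Albert Einstein, which has too many arguments. To get a backslash followed by a newline in the resulting string, use \\ followed by \n instead.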
set y Albert\\\nEinstein
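A brief sketch of both behaviours; the variable names joined and literal are only for illustration.

```tcl
# Inside double quotes, backslash-newline (plus any leading whitespace
# on the next line) collapses to a single space within one word.
set joined "Albert\
            Einstein"
puts $joined                    ;# Albert Einstein

# Escaping the backslash and spelling the newline as \n keeps both
# characters in the value: 6 letters + backslash + newline + 8 letters.
set literal Albert\\\nEinstein
puts [string length $literal]   ;# 16
```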

How to check that the string is single word?

How to check that string is a single word?
Is this right way to do that?
set st "some string"
if { [llength $st] != 1 } {
puts "error"
}
According to one possible definition, you check if a string is one word by using:
catch {set oneWord 0;set oneWord [expr {[llength $string] == 1}]}
That's the Tcl language definition of a word.
On the other hand, if your preferred definition is “is alphanumeric” then you have other possibilities, such as:
# -strict excludes the empty string (normally included for historic reasons)
set oneWord [string is alnum -strict $string]
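The two definitions above can be compared side by side. This is only a sketch: isSingleListWord is a hypothetical helper name, and it treats any string that is not a valid Tcl list as "not one word".

```tcl
# Hypothetical helper: true when the value parses as a one-element list.
proc isSingleListWord {s} {
    # llength raises an error for strings that are not valid lists
    # (for example an unmatched open brace), so guard it with catch.
    if {[catch {llength $s} n]} {
        return 0
    }
    return [expr {$n == 1}]
}

puts [isSingleListWord "some"]            ;# 1
puts [isSingleListWord "some string"]     ;# 0
puts [isSingleListWord "{some string}"]   ;# 1, braces group it into one element
puts [isSingleListWord "\{oops"]          ;# 0, not a valid list
puts [string is alnum -strict "some"]     ;# 1
puts [string is alnum -strict ""]         ;# 0
```

Note that the list-based definition accepts a braced phrase as one word, which may or may not be what you want.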
My answer is based on the assumption that a word contains only alphabet characters.
If you don't mind using some regexp, you can use this:
set st "some string"
if { ![regexp {^[A-Za-z]+$} $st] } {
puts "error"
}
[regexp expression string] returns 0 if there is no match and 1 if there is a match.
The expression I used is ^[A-Za-z]+$, which means the whole string must consist of one or more letters and nothing else. If you want to allow a dash inside (e.g. co-operate is one word), add it to the character class:
^[A-Za-z-]+$
If you are now worried about trailing spaces, I would suggest trimming it first before passing it to the regexp:
set st " some string "
if { ![regexp {^[A-Za-z]+$} [string trim $st]] } {
puts "error"
}
or if you want to directly use the regexp...
set st " some string "
if { ![regexp {^\s*[A-Za-z]+\s*$} $st] } {
puts "error"
}
EDIT: If a word is considered as a string of characters except space, you can do something else: check if the string contains a space.
set st "some strings"
if { [regexp { } $st] } {
puts "error"
}
If it finds a space, regexp will return 1.
regexp provides a straightforward way to match a word with \w and \W. \w matches a word character, while \W matches any character except a word character.
set st "some string"
if { [regexp {\W} $st] } {
puts "error"
}
However, \w matches only letters, digits and _ (in any combination). If your word contains other special characters, this will not work.
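To see where the \W test draws the line, here is a short sketch; hasNonWordChar is a hypothetical helper name.

```tcl
# Hypothetical helper: 1 if the string contains any character that is
# not a letter, digit or underscore.
proc hasNonWordChar {s} {
    return [regexp {\W} $s]
}

puts [hasNonWordChar "word"]          ;# 0
puts [hasNonWordChar "two_parts"]     ;# 0, underscore counts as a word character
puts [hasNonWordChar "co-operate"]    ;# 1, the dash is rejected
puts [hasNonWordChar "some string"]   ;# 1, the space is rejected
```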

TCL command - string trim

I was using the command 'string trimright' to trim my string, but I found that this command trims more than required.
My string is "dcssss.dcsss". If I use string trimright to remove the trailing ".dcsss", it trims the entire string. How can I deal with this?
Command:
set a [string trimright "dcssss.dcsss" ".dcsss"]
puts $a
Intended output:
dcssss
Actual output
""
The string trimright command treats its (optional) last argument as a set of characters to remove (and so .dcsss is the same as sdc. to it), just like string trim and string trimleft do; indeed, string trim is just like using both string trimright and string trimleft in succession. This makes it unsuitable for what you are trying to do; to remove a suffix if it is present, you can use several techniques:
# It looks like we're stripping a filename extension...
puts [file rootname "dcssss.dcsss"]
# Can use a regular expression if we're careful...
puts [regsub {\.dcsss$} "dcssss.dcsss" {}]
# Do everything by hand...
set str "dcssss.dcsss"
if {[string match "*.dcsss" $str]} {
set str [string range $str 0 end-6]
}
puts $str
If what you're doing really is filename manipulation, as it appears to be, do use the first of these options. The file command has some really useful subcommands for working with filenames in a cross-platform manner.
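The set-of-characters behaviour is easy to check directly. The sketch below also wraps the "by hand" approach in a helper; stripSuffix is a hypothetical name, and it compares the tail with eq rather than string match so that glob characters in the suffix are not treated specially.

```tcl
# The second argument is a set of characters, so its order is irrelevant.
puts [string trimright "dcssss.dcsss" ".dcsss"]   ;# empty string
puts [string trimright "dcssss.dcsss" "sdc."]     ;# empty string
puts [string trimright "report.dcsss" ".dcsss"]   ;# report

# Hypothetical helper: remove a literal suffix only when it is present.
proc stripSuffix {str suffix} {
    set n [string length $suffix]
    if {$n > 0 && [string range $str end-[expr {$n - 1}] end] eq $suffix} {
        return [string range $str 0 end-$n]
    }
    return $str
}

puts [stripSuffix "dcssss.dcsss" ".dcsss"]        ;# dcssss
puts [stripSuffix "dcssss.txt" ".dcsss"]          ;# dcssss.txt
```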

TCL : find and remove all characters in a string from the first occurrence of a character in a string

I am a newbie in TCL and need a TCL method / utility / code that finds the first occurrence of a character in a string and removes that character and everything after it.
I have a string like below:
Func::set()->method();
In the above string I need to find the first occurrence of '(' and remove it and everything after it, so that the resultant string would be just:
Func::set
You can do it with a regular expression:
set the_string [regsub {\(.*} $the_string ""]
or if you're not familiar with regexp then you can do it the more traditional way:
set the_string [
string range $the_string 0 [
expr {[string first "(" $the_string]-1}
]
]
For further info, read the manual pages for [string], [regsub] and [re_syntax].
Another technique is to use split and lindex:
set the_string [lindex [split $the_string "("] 0]
This is short and simple, but may do a lot of extra work if your string is very long.
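One difference between the approaches shows up when the string contains no "(" at all: regsub and split leave it unchanged, whereas the unguarded string range version returns an empty string, because string first returns -1. A guarded variant might look like the sketch below; beforeParen is a hypothetical helper name.

```tcl
# Hypothetical helper: everything before the first "(", or the whole
# string when there is no "(".
proc beforeParen {s} {
    set idx [string first "(" $s]
    if {$idx < 0} {
        # No parenthesis: leave the string unchanged.
        return $s
    }
    return [string range $s 0 [expr {$idx - 1}]]
}

puts [beforeParen "Func::set()->method();"]   ;# Func::set
puts [beforeParen "no_parens_here"]           ;# no_parens_here
```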