Tcl: Parsing input with strings in quotes

I have the following code to split stdin into a list of strings:
set cmd [string toupper [gets stdin]]
set items [split $cmd " "]
This splits the user input into a list (items) using the space as a delimiter. It works fine for simple input such as:
HELLO 1 2 3
What I get in items:
HELLO
1
2
3
But how can I get the quoted string in the example below to become one item in the list (items):
"HELLO THERE" 1 2 3
What I want in items:
HELLO THERE
1
2
3
How can I do this?

This is where you get into building a more complex parser. The first step towards that is switching to using regular expressions.
regexp -all -inline {"[^\"]*"|[^\"\s]+} $inputData
That will do the right thing... provided the input is well-formed and only uses double quotes for quoting. It also doesn't strip the quotes off the outside of the "words"; you'll want to use string trim $word \" to clean that up.
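A minimal sketch that puts the regexp and the string trim together, so each quoted phrase ends up as a single item without its quotes. It assumes the line is already in a variable (the name line is just for illustration) instead of being read from stdin, and that the quotes are balanced and never escaped.

```tcl
# Sketch: tokenize a line where a double-quoted phrase counts as one word.
# Assumes well-formed input (balanced double quotes, no escaped quotes).
set line [string toupper {"Hello there" 1 2 3}]

set items {}
foreach word [regexp -all -inline {"[^\"]*"|[^\"\s]+} $line] {
    # Strip the surrounding double quotes from quoted phrases.
    lappend items [string trim $word \"]
}

foreach item $items {
    puts $item
}
```

This prints HELLO THERE, 1, 2 and 3 on separate lines. Note that string trim removes every leading and trailing quote character, so it also applies to bare words that happen to end in a quote.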
If this is a command that you are parsing, use a safe interpreter. Then you can allow Tcl syntax to be used without exposing the guts of your code. I'm pretty sure there are answers here on how to do that already.
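A minimal sketch of the safe-interpreter idea. It assumes the user types commands such as HELLO "some text" 1 2. The safe child interpreter parses the line with ordinary Tcl rules, so the quoted phrase becomes one argument, and the only parent command it can reach is the one we alias in. The HELLO command and the handleHello proc are hypothetical names used only for illustration.

```tcl
# Hypothetical handler that lives in the trusted (parent) interpreter.
proc handleHello {args} {
    # args holds the arguments already parsed by the safe interpreter.
    return "HELLO got [llength $args] args: first is <[lindex $args 0]>"
}

# A safe interpreter has no access to files, exec, sockets, etc.
set safe [interp create -safe]
# Expose exactly one command: HELLO in the child calls handleHello here.
interp alias $safe HELLO {} handleHello

set line {HELLO "HELLO THERE" 1 2 3}
set result [interp eval $safe $line]
puts $result

interp delete $safe
```

A safe interpreter still provides basic commands such as set and while, so with untrusted input you may also want resource limits (see interp limit).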

Because Tcl doesn't have strong types, the simplest way to do this is to just treat your stdin string like a list of strings. No need to use split to convert a string into a list.
set cmd {"HELLO THERE" 1 2 3}
foreach item $cmd {
puts $item
}
--> HELLO THERE
1
2
3
Use string is list to check if your $cmd string can be treated as a list.
if {[string is list $cmd]} {
puts "Can be a list"
} else {
puts "Cannot be a list"
}
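A sketch that combines the two parts of this answer: it checks that a line is a well-formed list before iterating over it. The proc name parseInput is hypothetical, and the literal strings stand in for what gets stdin would return. List parsing also treats braces and backslashes specially, so {HELLO THERE} is accepted in the same way as "HELLO THERE".

```tcl
# Hypothetical helper: returns the words of a line, or raises an error
# if the line is not a well-formed Tcl list (e.g. an unbalanced quote).
proc parseInput {line} {
    if {![string is list $line]} {
        error "malformed input: $line"
    }
    # No split needed: a valid list string can be used as a list directly.
    return [lrange $line 0 end]
}

# In real use this would be: set line [string toupper [gets stdin]]
set line [string toupper {"hello there" 1 2 3}]
foreach item [parseInput $line] {
    puts $item
}

# An unbalanced quote is rejected instead of being silently mis-split.
set ok [catch {parseInput {"HELLO 1 2 3}} msg]
puts "rejected: $ok"
```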

Related

Adding an additional backslash to elements of TCL list

I have a list which looks like:
set list1 {a.b.c.d a.bb.ccd\[0\] a.bb.ccd\[1\] ....}
When I operate on the element of the list one by one using a foreach loop:
foreach ele $list1 {
puts "$ele"
}
I get the following output:
a.b.c.d
a.bb.ccd[0]
a.bb.ccd[1]
Observe that the backslashes go missing (because of the way Tcl parses the list).
To preserve them, I want to add an extra backslash to every element of list1 that already contains a backslash.
I tried:
regsub -all {\} $list1 {\\} list1
(I also tried double quotes instead of braces, among other variations.)
Nothing seems to work.
Is there a way to make sure $ele preserves the backslash characters inside the foreach loop? I need the elements with exactly the same characters for further processing.
P.S. I'm a beginner with regexp/regsub.
If your input data has backslashes in it that you wish to preserve, you need a little extra work when converting that data to a Tcl list, or Tcl will simply assume that you were using the backslashes as conventional Tcl metacharacters. There are a few ways to do it; here's my personal favourite, as it logically selects exactly the chars that we want (the non-empty sequences of non-whitespace characters):
set input {a.b.c.d a.bb.ccd\[0\] a.bb.ccd\[1\] ....}
set items [regexp -all -inline {\S+} $input]
foreach item $items {
puts $item
}
As you can see from the output:
a.b.c.d
a.bb.ccd\[0\]
a.bb.ccd\[1\]
....
this keeps the backslashes exactly. (Also, yes, I quite like regular expressions. Especially very simple ones like this.)
As you have defined it, list1 is a string. When list1 is used with the foreach command, the string is converted to a list. Remember that lists in Tcl are really just specially formatted strings that are parsed when converted from a string to a list. As the list elements are parsed, the backslashes are removed in accordance with normal Tcl parsing rules. There are several ways to build lists that contain characters that are significant to the Tcl parser. The code below shows two examples contrasted with your code:
set list1 {a.b.c.d a.bb.ccd\[0\] a.bb.ccd\[1\]}
puts "list1 as a string"
puts $list1
puts "converting the list1 string to a proper list"
foreach ele $list1 {
puts $ele
}
set list2 [list a.b.c.d {a.bb.ccd\[0\]} {a.bb.ccd\[1\]}]
puts "list2 built by the list command"
puts $list2
puts "list2, element by element"
foreach ele $list2 {
puts $ele
}
set list3 {a.b.c.d {a.bb.ccd\[0\]} {a.bb.ccd\[1\]}}
puts "list3 built by properly quoting each element"
puts $list3
puts "list3, element by element"
foreach ele $list3 {
puts $ele
}
Running this yields:
list1 as a string
a.b.c.d a.bb.ccd\[0\] a.bb.ccd\[1\]
converting the list1 string to a proper list
a.b.c.d
a.bb.ccd[0]
a.bb.ccd[1]
list2 built by the list command
a.b.c.d {a.bb.ccd\[0\]} {a.bb.ccd\[1\]}
list2, element by element
a.b.c.d
a.bb.ccd\[0\]
a.bb.ccd\[1\]
list3 built by properly quoting each element
a.b.c.d {a.bb.ccd\[0\]} {a.bb.ccd\[1\]}
list3, element by element
a.b.c.d
a.bb.ccd\[0\]
a.bb.ccd\[1\]
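A regsub approach does work if it replaces each backslash with two backslashes. Your attempt fails because {\} is not a complete braced word: a backslash stops the following close brace from counting. Even so, building the list properly is much clearer.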
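For completeness, here is a sketch of the regsub route. It uses the list1 string from the question without its trailing "....". Inside braces, the pattern \\ matches one literal backslash. The replacement \\\\ yields two backslashes because regsub itself reads \\ in the substitution as a single backslash.

```tcl
set list1 {a.b.c.d a.bb.ccd\[0\] a.bb.ccd\[1\]}

# Double every backslash. Pattern {\\} = one literal backslash;
# replacement {\\\\} = two backslashes after regsub's own unescaping.
regsub -all {\\} $list1 {\\\\} list1

# List parsing now turns each \\ back into a single \, so it survives.
foreach ele $list1 {
    puts $ele
}
```

string map {\\ \\\\} $list1 does the same thing without a regular expression.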

How to split string by numerics

I have tried to split it but still failed.
set strdata "34a64323R6662w0332665323020346t534r66662v43037333444533053534a64323R6662w0332665323020346t534r66662v430373334445330535"
puts [split $strdata "3334445330535"] ;#<---- this command does not work
The result I need is:
{34a64323R6662w0332665323020346t534r66662v43037} {34a64323R6662w0332665323020346t534r66662v43037}
The split command's optional second argument is interpreted as a set of characters to split on, so it really isn't going to do what you want. However, there are other approaches. One of the simpler methods of doing what you want is to use string map to convert the character sequence into a character that isn't in the input data (Unicode is full of those!) and then split on that:
set strdata "34a64323R6662w0332665323020346t534r66662v43037333444533053534a64323R6662w0332665323020346t534r66662v430373334445330535"
set splitterm "3334445330535"
set items [split [string map [list $splitterm "\uFFFF"] $strdata] "\uFFFF"]
foreach i $items {
puts "==> $i"
}
# ==> 34a64323R6662w0332665323020346t534r66662v43037
# ==> 34a64323R6662w0332665323020346t534r66662v43037
# ==>
Note that the last element is an empty string ({} in list notation), because that's what comes after the final occurrence of the delimiter. If you don't want it, add a string trimright between the string map and the split:
# Doing this in steps because the line is a bit long otherwise
set mapped [string map [list $splitterm "\uFFFF"] $strdata]
set trimmed [string trimright $mapped "\uFFFF"]
set items [split $trimmed "\uFFFF"]
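The same steps can be wrapped in a small reusable proc. This is a sketch: the name splitOnSubstring is hypothetical, and like the code above it assumes the character \uFFFF never occurs in the data.

```tcl
# Hypothetical helper: split $str on every occurrence of the substring $sep.
# Assumes the character \uFFFF does not appear in $str.
proc splitOnSubstring {str sep} {
    set marker \uFFFF
    set mapped [string map [list $sep $marker] $str]
    # Drop the empty element(s) that trailing separators would produce.
    set trimmed [string trimright $mapped $marker]
    return [split $trimmed $marker]
}

set strdata "34a64323R6662w0332665323020346t534r66662v43037333444533053534a64323R6662w0332665323020346t534r66662v430373334445330535"
set items [splitOnSubstring $strdata "3334445330535"]
puts [llength $items]
puts $items
```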
The split command doesn't work like that; see the documentation.
Try making the data string into a list like this:
regsub -all 3334445330535 $strdata " "
i.e. replacing the delimiter with a space.
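Here is a quick check of what that produces, using the strdata value from the question. Without a varName argument, regsub returns the modified string. Tcl's list parsing ignores the trailing space, so the result can be used directly as a two-element list. This only works because the pieces contain no whitespace, braces or quotes.

```tcl
set strdata "34a64323R6662w0332665323020346t534r66662v43037333444533053534a64323R6662w0332665323020346t534r66662v430373334445330535"
# With no varName argument, regsub returns the substituted string.
set result [regsub -all 3334445330535 $strdata " "]
puts [llength $result]
foreach part $result {
    puts $part
}
```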
Documentation: regsub, split

catch multiple empty lines in a file in tcl

There are 4 empty lines in my file, which is open as wr_fp. I want to detect those four empty lines in code, but the code below is not working.
while {[gets $wr_fp line3] >= 0} {
if {[regexp "\n\s+\n\s+\n\s+\n" $line3]} { puts "found 4 empty lines"}
}
tl;dr: Don't put REs in "quotes", put them in {braces}.
The problem is that you've put your RE in quotes, so that it is actually this:
s+
s+
s+
Because of Tcl's general substitution rules, \n becomes a newline and \s becomes a simple s. Putting the RE in braces inhibits this (unwanted in this case) behaviour.
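This small sketch shows the difference. The quoted form has already lost its backslashes before regexp sees it, while the braced form keeps them.

```tcl
set quoted "\n\s+"
set braced {\n\s+}
# The quoted string is newline, "s", "+": 3 characters.
puts [string length $quoted]
# The braced string keeps both backslashes: 5 characters.
puts [string length $braced]

set text "a\n   \nb"
# Quoted pattern: a newline followed by one or more letter "s".
puts [regexp $quoted $text]
# Braced pattern: a newline followed by whitespace.
puts [regexp $braced $text]
```

This prints 3, 5, 0 and 1.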
This is my own answer; it does what I want:
while {[gets $rd_fp line] >= 0} {
if {[string match "" $line]} {
if {[expr $count % 4] == 1} {puts "found 4 space"}
incr count
}
}
The gets / chan gets command reads one line at a time and discards the newline character from each line, so your test will never succeed. You need to read in the full contents of the file at once:
set txt [chan read $wr_fp]
if {[regexp {\n\s+\n\s+\n\s+\n} $txt]} { puts "found 4 empty lines"}
Note that you need to use braces around the regular expression as Donal explains.
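Here is a self-contained sketch of that approach. It assumes Tcl 8.6 or later for file tempfile. The sample text, with three whitespace-only lines between abc and def, is made up so the example runs without the asker's file.

```tcl
# Create a scratch file (Tcl 8.6+) and write sample data to it.
set ch [file tempfile]
puts -nonewline $ch "abc\n \n\t\n  \ndef\n"
# Seeking flushes the written data and rewinds the channel for reading.
seek $ch 0
# Read the whole file at once so the newlines are kept.
set txt [chan read $ch]
close $ch

set found [regexp {\n\s+\n\s+\n\s+\n} $txt]
puts "found: $found"
```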
Some typical pitfalls of RE formulation:
Do you really intend to require at least one whitespace character on each 'empty' line? If you want to allow lines with no characters at all between the newlines, use \s* instead of \s+.
Also note that this regular expression will match ranges with more than four newlines: the extra newlines will be consumed by one of the \s+ groups. If you want to disallow extra newlines, match with (e.g.) [ \t\f\r] (or any other combination of whitespace you want) instead of \s. The expression then matches exactly three lines consisting only of blanks, tabs, form feeds, and carriage returns, each surrounded by newlines; you might want to add one more subgroup to match a fourth line.
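This sketch combines both suggestions. [ \t\f\r]* allows lines that are truly empty or contain only horizontal whitespace, but never a newline. The bound {4} then asks for four such lines after a newline. The helper name hasFourBlankLines is hypothetical. As in the original pattern, a newline must precede the first blank line, and a run of five or more blank lines also matches because it contains four.

```tcl
# Hypothetical helper: true if $txt contains four consecutive lines made
# only of blanks, tabs, form feeds or carriage returns (or nothing at all).
proc hasFourBlankLines {txt} {
    # A newline, then four times: optional horizontal whitespace + newline.
    return [regexp {\n(?:[ \t\f\r]*\n){4}} $txt]
}

puts [hasFourBlankLines "abc\n\n\n\n\ndef"]
puts [hasFourBlankLines "abc\n\n\n\ndef"]
puts [hasFourBlankLines "abc\n \n\t\n\n \ndef"]
```

The first and third calls print 1 (four blank lines). The second prints 0 because it has only three.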
I'm a bit mystified by your solution as described in your own answer, since it doesn't do what was specified in the question. With the following text file:
abc
def
ghi
jkl
mno
pqr
stu
vwx
yz.
(where there is a tab character in the second line after "pqr")
and assuming count has the value 0 when the code is called, your code outputs "found 4 space" after reading the blank lines after "def", "pqr", and "vwx", but not after the line before "stu", where your question indicated it should be.
This code
set count 0
while {[gets $rd_fp line] >= 0} {
if {[string is space $line]} {
incr count
if {$count == 4} {puts "found 4 space"}
} else {
set count 0
}
}
does do what you asked for (nearly): it accepts lines containing whitespace as empty, and it prints its message only after finding four consecutive empty lines. The major difference from the specification in your question is that it also accepts lines without any characters as empty. To match your specification, string is space -strict $line should be used instead.
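To see what difference -strict makes, this sketch runs the same counting loop over an in-memory string instead of a channel. Splitting on \n stands in for repeated gets calls. The proc name findBlankRuns and the sample text are hypothetical.

```tcl
# Hypothetical helper: returns the 0-based indexes of the lines at which a
# run of four consecutive "empty" lines is completed.
# If strict is true, a line must contain at least one whitespace character.
proc findBlankRuns {text strict} {
    set hits {}
    set count 0
    set index 0
    foreach line [split $text \n] {
        if {$strict} {
            set blank [string is space -strict $line]
        } else {
            set blank [string is space $line]
        }
        if {$blank} {
            incr count
            if {$count == 4} {lappend hits $index}
        } else {
            set count 0
        }
        incr index
    }
    return $hits
}

# Four whitespace-only lines, then a separate run of four truly empty lines.
set sample "abc\n \n\t\n \n \ndef\n\n\n\n\nghi"
puts [findBlankRuns $sample 0]
puts [findBlankRuns $sample 1]
```

Without -strict both runs are reported (4 9). With -strict only the run of whitespace-only lines counts (4).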
Documentation: chan, gets, if, incr, puts, regexp, set, string, while

TCL command - string trim

I was using the command 'string trimright' to trim my string but I found that this command trims more than required.
My string is "dcssss.dcsss". If I use the string trimright command to trim the last few characters ".dcsss", it trims the entire string. How can I deal with this?
Command:
set a [string trimright "dcssss.dcsss" ".dcsss"]
puts $a
Intended output:
dcssss
Actual output:
""
The string trimright command treats its (optional) last argument as a set of characters to remove (and so .dcsss is the same as sdc. to it), just like string trim and string trimleft do; indeed, string trim is just like using both string trimright and string trimleft in succession. This makes it unsuitable for what you are trying to do; to remove a suffix if it is present, you can use several techniques:
# It looks like we're stripping a filename extension...
puts [file rootname "dcssss.dcsss"]
# Can use a regular expression if we're careful...
puts [regsub {\.dcsss$} "dcssss.dcsss" {}]
# Do everything by hand...
set str "dcssss.dcsss"
if {[string match "*.dcsss" $str]} {
set str [string range $str 0 end-6]
}
puts $str
If what you're doing really is filename manipulation, as it appears to be, do use the first of these options. The file command has some really useful subcommands for working with filenames in a cross-platform manner.
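For a general-purpose version of the third option, this sketch compares the tail of the string directly. It works for any literal suffix, including one containing glob characters such as * or [, which would trip up string match. The proc name stripSuffix is hypothetical.

```tcl
# Hypothetical helper: remove $suffix from the end of $str if present.
# Compares characters literally, so glob/regexp metacharacters are safe.
proc stripSuffix {str suffix} {
    set n [string length $suffix]
    if {$n > 0 && [string range $str end-[expr {$n - 1}] end] eq $suffix} {
        return [string range $str 0 end-$n]
    }
    return $str
}

puts [stripSuffix "dcssss.dcsss" ".dcsss"]
puts [stripSuffix "dcssss.txt" ".dcsss"]
# For contrast: trimright treats ".dcsss" as a set of characters.
puts "<[string trimright "dcssss.dcsss" ".dcsss"]>"
```

This prints dcssss, dcssss.txt and <>.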

TCL: find and remove all characters in a string from the first occurrence of a character in a string

I am a newbie in Tcl and need a method, utility, or piece of code that finds the first occurrence of a character in a string and removes that character and everything after it.
I have a string like below:
Func::set()->method();
In the above string I need to find the first occurrence of '(' and remove it and everything after it, so that the resulting string is just:
Func::set
You can do it with a regular expression:
set the_string [regsub {\(.*} $the_string ""]
or if you're not familiar with regexp then you can do it the more traditional way:
set the_string [
string range $the_string 0 [
expr {[string first "(" $the_string]-1}
]
]
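The string first version has one edge case. If the string contains no '(' at all, string first returns -1 and the end index becomes -2. string range then returns an empty string, whereas the regsub version leaves such a string unchanged. This sketch guards against that case. The proc name cutAtFirst is hypothetical.

```tcl
# Hypothetical helper: drop the first occurrence of $ch and everything after it.
# Strings that do not contain $ch are returned unchanged.
proc cutAtFirst {str ch} {
    set idx [string first $ch $str]
    if {$idx < 0} {
        return $str
    }
    return [string range $str 0 [expr {$idx - 1}]]
}

puts [cutAtFirst "Func::set()->method();" "("]
puts [cutAtFirst "no parens here" "("]
```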
For further info, read the manual pages for [string], [regsub] and [re_syntax].
Another technique is to use split and lindex:
set the_string [lindex [split $the_string "("] 0]
This is short and simple, but may do a lot of extra work if your string is very long.