TCL expression parsing - why brace and bracket are escaped differently - tcl

I just did the following experiment in TCL 8.6:
% expr \"\{" ne \"x\"
1
% expr \"\[" ne \"x\"
extra characters after close-quote
in expression ""[" ne "x""
The first command makes sense to me:
Because the argument is not braced, first round parsing is script level parsing, backslash escapes are removed: expr "{" ne "x"
expr command continues the parsing, "{" and "x" are 2 quoted literals and the execution goes well.
The error in the 2nd command does not make sense. The only difference is replacing the brace with a bracket, so why does it fail?
I know bracing the arguments is expected for expression, this question is mostly to understand TCL parsing.

The problem with the second command is that the expr command processes […] sequences inside double quotes as command substitutions. It does this itself, independently of whatever substitution the Tcl parser has already performed, and that is part of why it is a really good idea to always brace your overall expressions. Had you instead used:
expr \{\[\} ne \"x\"
then it would have worked; just as with the base Tcl language, expr does not expand command substitutions in brace-quoted terms.
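A small sketch of the two parsing rounds may help here. It rebuilds the expression string roughly the way expr does (Tcl parses each word, then the words are joined with spaces); the variable names are made up for illustration.

```tcl
# Round 1: let Tcl parse the words, then join them as expr would.
set braceExpr   [join [list \"\{" ne \"x\"] " "]
set bracketExpr [join [list \"\[" ne \"x\"] " "]
puts $braceExpr
puts $bracketExpr

# Round 2: hand the resulting strings to expr.
set r1 [expr $braceExpr]                 ;# a brace is inert inside an expr quoted string
set rc [catch {expr $bracketExpr} msg]   ;# expr attempts a command substitution and fails
puts "$r1 $rc: $msg"

# Braced form: Tcl leaves the backslash alone and expr handles it itself.
set r2 [expr {"\[" ne "x"}]
puts $r2
```

In the braced form the backslash reaches expr intact, so the bracket is treated as an ordinary character.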

Related

TCL error: extra characters after close-quote

I'm trying to evaluate a certain expression. I have "pqr" && "xyz" in a command. On evaluating the command, it gives an error: extra characters after close-quote.
I think Tcl is not able to parse && after a double quote. If that is the reason, how should I deal with the double quote and &&?
You're not giving us enough information.
A wild guess is that you were writing something like this:
expr "pqr"&& "xyz"
which does give the error message "extra characters after close-quote". This is because the interpreter tries to parse the command according to Tcl language rules, and one of those rules is that a word that starts with a double quote must end with a matching double quote. In this case, there are two & characters following the matching double quote.
Now,
expr "pqr" && "xyz"
(with a space between the double quote and the ampersand) is no good either. This is because the interpreter will remove any characters that have syntactic function as it prepares the arguments for the command. This means that the argument expr gets is the string pqr && xyz. When the expr command executes, it tries to interpret its argument as a string in a special expression language that isn't Tcl. In particular, unlike in Tcl, strings that aren't boolean values or the names of variables or functions must always be enclosed in braces or double quotes, like this: "pqr" && "xyz". So how do you get that? You always* brace the argument to expr, that's how.
expr {"pqr" && "xyz"}
means that expr gets the legal string "pqr" && "xyz".
But the string "pqr" && "xyz" is still not valid, since the && (logical and) operation isn't defined for strings other than strings that are equal to the string representation of boolean values, such as expr {"true" && "false"}
So, again we're stuck, because what you seem to be trying to do makes no sense. If you show us what you're doing we might be able to help you.
*) except when you shouldn't. Rare, expert level.
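The four cases discussed above can be checked with catch. This is only a sketch: the variable names are mine, and the exact error messages may differ between Tcl versions, so they are printed rather than assumed.

```tcl
# 1. Tcl-level parse error: characters directly follow a closing quote.
set rc1 [catch {expr "pqr"&& "xyz"} msg1]
# 2. Tcl strips the quotes, so expr receives bare words it cannot interpret.
set rc2 [catch {expr "pqr" && "xyz"} msg2]
# 3. Braced: expr receives quoted strings, but && wants boolean operands.
set rc3 [catch {expr {"pqr" && "xyz"}} msg3]
# 4. Strings that spell boolean values are accepted.
set ok [expr {"true" && "false"}]

puts "$rc1 $rc2 $rc3 $ok"
foreach m [list $msg1 $msg2 $msg3] { puts $m }
```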
Documentation:
expr,
Mathematical operators as Tcl commands,
Summary of Tcl language syntax
Brace your expressions
The expr command (and by extension, the commands for, if, and while, which use the same mechanism to evaluate their conditions) interprets an expression string that is constructed from its arguments. Note that the language of the expression string isn't Tcl, but specific to the expr command's expression evaluator: the languages share many syntactic forms, but are fundamentally different with infix, operator-based structure for the expr language and prefix, command-based structure for Tcl.
Letting the Tcl interpreter evaluate the arguments before passing them to expr can lead to
double substitution, which has security problems similar to SQL injection attacks.
iterative commands (for, while) getting constant-valued condition arguments, leading to infinite loops.
all substitutions (and thus their side-effects) always occurring while expr can selectively suppress some of them.
Therefore, it is almost always better to provide the expression string as a braced (escaped) string, which will not be evaluated by the Tcl interpreter, only the expr interpreter.
Note that while unbraced arguments to expr are allowed to be an invalid expression string as long as the argument evaluation transforms them into a valid one, braced expressions must be valid as they are (e.g. variable or command substitutions must be simple operands and not operators or complex expressions).
Another benefit from using braced expression strings is that the byte compiler usually can generate more efficient code (5 - 10x faster) from them.
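Two of the problems listed above, double substitution and the loss of selective evaluation, can be made visible with a counter. A minimal sketch, assuming a hypothetical proc bump that stands in for any command with a side effect:

```tcl
set calls 0
proc bump {} { incr ::calls; return 1 }   ;# hypothetical side-effecting command

set untrusted {[bump]}                    ;# data that happens to look like a command

# Unbraced: Tcl substitutes the variable, then expr substitutes the command.
set r1 [expr $untrusted == 1]
set after1 $calls

# Braced: expr substitutes the variable once and compares its value as a string.
set r2 [expr {$untrusted == 1}]
set after2 $calls

# Braced: expr can skip the right operand of && when the left one is false.
set r3 [expr {0 && [bump]}]
set after3 $calls

# Unbraced: Tcl runs the command before expr ever sees the expression.
set r4 [expr 0 && [bump]]
set after4 $calls

puts "$r1 $r2 $r3 $r4 / calls: $after1 $after2 $after3 $after4"
```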

Exception to the "brace your expr expressions" rule. What's going on?

I have a number, say 10, in one variable and a string like +1 or -2 in another. I need to evaluate 10+1 or 10-2 in the above cases.
So, I have
set foo 10
set garp -1
If I do
expr $foo $garp
all is well (I get 9).
Ah! But in general, you should put braces around the expr expression.
expr {$foo $garp}
fails with missing operator at _#_ in expression $foo _#_$garp.
Similarly,
expr [concat $foo $garp]
works nicely but
expr {[concat $foo $garp]}
returns 10 -1.
I don't want to leave the expression unbraced without really understanding what's going on as I'm afraid that otherwise I, or someone else, is going to put braces around the expression and the code will stop working.
What's the "correct" way to do this?
In general, expr involves two rounds of substitution.
The first round of substitution is performed by the command parser on those arguments of the expr command that are not enclosed in braces. The resulting strings are concatenated (by adding separator spaces between them) into a single expression string, which is then parsed (and later evaluated) by the expression processor.
During parsing, the expression is decomposed into operators and operands. Operands must be delimited with operators. Assuming focus on mathematical expressions (i.e. discarding string operations), an operand may be one of the following:
a numeric value
a Tcl variable, using standard $ notation. The variable's value will be used as the operand.
a Tcl command enclosed in brackets. The command will be executed and its result will be used as the operand.
a parenthesized sub-expression, which is parsed using the same rules.
a mathematical function whose arguments are sub-expressions, parsed using the same rules.
Items 2 and 3 correspond to the second round of substitution, which is performed by the expression processor during evaluation. Each substitution performed at this step is expected to yield a numeric value that is directly usable in further evaluation, without needing to re-parse and re-evaluate it.
Having all this said, let's look at your examples:
expr $foo $garp
The command processor expands this during the 1st round of substitution to expr 10 -1, the expression string after concatenation of the arguments is {10 -1}, and the expression processor parses it into a valid expression 10 subtract 1.
expr [concat $foo $garp]
During the 1st round of substitution the command processor expands this to expr {10 -1}, effectively producing the same expression string as in the previous case.
expr {$foo $garp}
The command processor leaves this intact, and the expression processor sees two consecutive operands (corresponding to clause 2 above), without any operator between them.
expr {[concat $foo $garp]}
Again, the 1st round of substitution is not performed. Parsing this expression extracts a single operand [concat $foo $garp] corresponding to clause 3. Expression processor evaluates the command and substitutes its result (i.e. the string "10 -1") for the result of the full expression.
So the correctly braced version of your expression must read:
expr {$foo + $garp}
which will be parsed as $foo add $garp.
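For reference, the five forms discussed in this answer can be run side by side. The results are the ones reported in the question and above; the variable names are only for illustration.

```tcl
set foo 10
set garp -1

set a  [expr $foo $garp]                  ;# Tcl substitutes first, expr sees 10 -1
set b  [expr [concat $foo $garp]]         ;# same expression string as above
set rc [catch {expr {$foo $garp}} msg]    ;# two operands, no operator
set c  [expr {[concat $foo $garp]}]       ;# one operand whose value is the string 10 -1
set d  [expr {$foo + $garp}]              ;# correctly braced version

puts "a=$a b=$b rc=$rc c=<$c> d=$d"
puts $msg
```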
In this case, the correct form is
expr {$foo + $garp}
The rule "always brace your expressions" stems from the fact that it is a good idea to bypass the argument evaluation step and leave the evaluation of the expression string completely to expr (because it is more secure [1] and results in more effective bytecode [2]).
For this to work, the string passed to expr needs to be legal according to the rules laid out in the expr docs (an unbraced expression doesn't have to be legal as long as the argument evaluation step makes it legal). From this follows that anytime you need the argument evaluation to help you create a legal expression string is an exception to the "always brace" rule (and possibly a hint that you need to rethink the structure of your code [3]).
The string {$foo $garp} is illegal because variable substitutions can only be operands in an expression, meaning that we have two operands without an operator. The string "$foo $garp" is transformed by the argument evaluation into the legal expression 10 -1, in which the minus sign of -1 is reinterpreted as a subtraction operator.
If you have a bunch of values that you are getting in pairs, a and b, and you want to add those, expr $a $b might work if you are sure that they always have a sign. That's brittle, though. It's better to use one of
expr {$a + $b}
tcl::mathop::+ $a $b
expr [join [list $a $b] +]
(The first one is the solution we've discussed above. The second one avoids double substitution by using the + operator outside of expr: the variables are evaluated by the argument evaluator but not by the command. The third variant has all the problems of double substitution and is mentioned mostly for completeness. It's still better than just expr $a $b, though.)
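The brittleness mentioned above shows up as soon as the second value has no explicit sign. A short sketch with made-up values:

```tcl
set a 10
set b 2                                   ;# no explicit sign this time

set rc [catch {expr $a $b} msg]           ;# expr receives two operands and no operator
set s1 [expr {$a + $b}]
set s2 [tcl::mathop::+ $a $b]
set s3 [expr [join [list $a $b] +]]

set b -1                                  ;# the signed case still works when braced
set s4 [expr {$a + $b}]

puts "rc=$rc s1=$s1 s2=$s2 s3=$s3 s4=$s4"
```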
Documentation:
+ (operator),
expr,
join,
list,
Mathematical operators as Tcl commands
1) The argument evaluator, given hostile arguments, could for instance replace $foo in the expression with [exec rm -rf *] or whatever you crazy Linux kids call it, and then the command substitution will be performed inside expr. This is less likely to happen if you disallow double substitution by bracing the expression.
2) The byte compiler can analyze a braced expression and replace the call to expr with more efficient inlined calculations. For an unbraced string, the compiler has no other option than to set up a call to expr whatever the expression is.
3) Seemingly paradoxically, it is not a problem to construct an expression by some trusted method and pass it unbraced via a variable (set myexpr [...] ; expr $myexpr), because this way you are still in full control of the content of the expression, and you are certainly not depending on the argument evaluator to patch it up for you. You won't get the bytecode optimization, though.
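As an illustration of note 3, here is one possible way to build an expression from trusted parts and pass it unbraced. The whitelist and the variable names are assumptions of this sketch, not something taken from the answer.

```tcl
set allowedOps {+ - *}                    ;# hypothetical whitelist of operators
set op -
if {$op ni $allowedOps} { error "operator not allowed: $op" }

set x 10
set y 3
# The dollar signs are escaped so that expr, not Tcl, substitutes the variables.
set myexpr "\$x $op \$y"
set result [expr $myexpr]

puts "$myexpr -> $result"
```

Only the operator is spliced in by Tcl; the operands are still substituted once, by expr.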

How to delete a character in a string using TCL

I have the following string
"TCL is known as "tool command language", TCL is known as "tool
command language", TCL is known as "tool command language""
From the above input I want an output like the one below:
"TCL is known as tool command language, TCL is known as tool command
language, TCL is known as tool command language"
i.e. only the first and last double quotes should appear in the output, and all others should be deleted. Could someone let me know the different methods to accomplish this?
There can be many ways. I have tried with regsub
set str {"TCL is known as "tool command language", TCL is known as "tool command language", TCL is known as "tool command language""}
puts "Input : $str"
regsub -all {(.)"} $str {\1} output
puts "Output : $output"
which will produce the following
Input : "TCL is known as "tool command language", TCL is known as "tool command language", TCL is known as "tool command language""
Output : "TCL is known as tool command language, TCL is known as tool command language, TCL is known as tool command language"
The pattern I have used is (.)". In regular expressions, the atom . matches any single character (the parentheses are explained below), and it is followed by a double quote. So, basically, this matches any single character that has a double quote right after it.
There are 6 matches in total. Take the 2nd match, which is e". Our main intention is to remove the quotes, but each match covers 2 characters. This is why the first character is grouped with parentheses.
With Tcl, we can access 1st subgroup with the help of \1 and 2nd subgroup with \2 and so on. Finally, we are substituting the 2 characters with one character which is nothing but the first letter other than quote. i.e. e" is substituted with character e.
Notice the use of -all flag at the beginning which is responsible for matching all the occurrence of this pattern.
Note : \1 should be used with braces like {\1} as I have mentioned. In case, if you want to access it without braces, you have to use \\1
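To check the match count and the note about {\1} versus \\1, a small sketch along these lines can be used. It relies on regsub returning the number of replacements when a result variable is given; the variable names are mine.

```tcl
set str {"TCL is known as "tool command language", TCL is known as "tool command language", TCL is known as "tool command language""}

# With a result variable, regsub returns the number of replacements made.
set n [regsub -all {(.)"} $str {\1} braced]

# Same substitution with the back-reference written without braces.
regsub -all {(.)"} $str \\1 unbraced

puts "matches: $n"
puts $braced
puts [expr {$braced eq $unbraced}]
```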
Reference : Non-capturing subpatterns
You could remove all quotes and re-add the outer ones. One way:
set new [format {"%s"} [string map {{"} {}} $str]]
You seek to remove all " characters from the string that have at least one character on each side of them. That leads to this regular expression substitution:
set transformedString [regsub -all {(.)[""]+(.)} $inputString {\1\2}]
The " is doubled up and in [brackets] just to make the highlighting here work. You could use {(.)"+(.)} instead.
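A quick comparison of the last two approaches, as a sketch. The first part uses the string from the question. The short string in the second part is my own test input, added because the trailing (.) consumes a character and so behaves differently when a quoted segment is only one character long.

```tcl
set str {"TCL is known as "tool command language", TCL is known as "tool command language", TCL is known as "tool command language""}

set viaMap    [format {"%s"} [string map {{"} {}} $str]]
set viaRegsub [regsub -all {(.)"+(.)} $str {\1\2}]
puts [expr {$viaMap eq $viaRegsub}]

# Edge case with a one-character quoted segment (not from the thread).
set short {"a "b" c"}
set shortMap    [format {"%s"} [string map {{"} {}} $short]]
set shortRegsub [regsub -all {(.)"+(.)} $short {\1\2}]
puts $shortMap
puts $shortRegsub
```

On the question's input both give the same result. On the short string the regsub version leaves one inner quote behind, which may be a reason to prefer string map when the input is not known in advance.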

expr command syntax: string expression must be enclosed?

When I write the following script:
expr "a" ne "ab"
I get an error:
invalid bareword "a"
in expression "a ne ab";
should be "$a" or "{a}" or "a(...)" or ...
I need to change it to expr {"a" ne "ab"}.
Yes, I know it is best practice to always brace-quote the expression arguments, but from syntax point of view, what is wrong in the above script?
I checked out the manual page, https://www.tcl.tk/man/tcl8.6/TclCmd/expr.htm, it does not say there is syntax requirement here.
Look at the man page again, under "Operands". A string operand must be enclosed in double quotes or braces. Those quotes or braces must themselves be quoted in the invocation, otherwise the Tcl interpreter will strip them off before passing the arguments to the command.
If your invocation is
expr "a" ne "ab"
The command will get the argument list
a ne ab
which it can't process.
You could quote the quotes like this:
expr \"a\" ne \"ab\"
But you are much better off bracing it all up into a single argument. This will look neater, preserve your quotes, and avoid some other serious problems as well.
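To see what the command actually receives, a hypothetical helper that joins its words the way expr does can be handy. The proc exprString below is made up for this sketch and is not part of Tcl.

```tcl
# Hypothetical helper: returns the single string expr would build from its words.
proc exprString {args} { return [join $args " "] }

set seen1 [exprString "a" ne "ab"]        ;# quotes stripped by Tcl
set seen2 [exprString \"a\" ne \"ab\"]    ;# quotes protected by backslashes
set seen3 [exprString {"a" ne "ab"}]      ;# quotes protected by braces
puts $seen1
puts $seen2
puts $seen3

set rc [catch {expr "a" ne "ab"} msg]     ;# fails: bare words
set r2 [expr \"a\" ne \"ab\"]
set r3 [expr {"a" ne "ab"}]
puts "$rc $r2 $r3"
```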

Tcl: Is parameter evaluation guaranteed to be left-to-right?

I have a Tcl program where I often find expressions of the following kind:
proc func {} {...}
...
lappend arr([set v [func]]) $v
The intended meaning of the last line is
set v [func]
lappend arr($v) $v
It obviously works. What I would like to know: Does it work "by accident", or does Tcl guarantee, that the first parameter passed to lappend is evaluated before the second?
Tcl is always evaluated from left to right, as you can read in the documentation. I quote the relevant part:
Substitutions take place from left to right, and each substitution is evaluated completely before attempting to evaluate the next. Thus, a sequence like:
set y [set x 0][incr x][incr x]
will always set the variable y to the value, 012.
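Both the documentation example and the lappend idiom from the question can be tried directly. In this sketch, func is a stand-in that simply returns a fixed key.

```tcl
# Documentation example: each substitution finishes before the next one starts.
set y [set x 0][incr x][incr x]
puts $y

# The idiom from the question, with a stand-in for func.
proc func {} { return key }
lappend arr([set v [func]]) $v            ;# v is set while the first word is parsed
puts [array get arr]
```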
Agreed with Jerry. Adding some flavor in it.
Tcl commands are evaluated in two steps : parsing & execution.
First the Tcl interpreter parses the command string into words, performing substitutions along the way.
Then a command procedure processes the words to produce a result string. Each command has a separate command procedure.
Let us consider the following code.
%set input "The cat in the hat"
The cat in the hat
%string match "*at in*" $input
1
In the parsing step the Tcl interpreter applies the rules described in this chapter to divide the command up into words and perform substitutions.
Parsing is done in exactly the same way for every command. During the parsing step the Tcl interpreter does not apply any meaning to the values of the words. Tcl just performs a set of simple string operations such as replacing the characters $a with the string stored in variable a. Tcl does not know or care whether a or the resulting word is a number or the name of a widget or anything else.
In the execution step meaning is applied to the words of the command. Tcl treats the first word as a command name, checking to see if the command is defined and locating a command procedure to carry out its function. If the command is defined then the Tcl interpreter invokes its command procedure, passing all of the words of the command to the command procedure. The command procedure is free to interpret the words in any way that it pleases, and different commands apply very different meanings to their arguments.
Major rule to remember here
Tcl parses a command and makes substitutions in a single pass from left to right. Each character is scanned exactly once.
At most a single layer of substitution occurs for each character; the result of one substitution is not scanned for further substitutions.
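The single-layer rule can be observed with values that look like substitutions themselves. A minimal sketch with made-up variable names; subst is used to request a second round explicitly.

```tcl
set other 42
set inner {$other}                        ;# a value that looks like a variable reference

set copy  $inner                          ;# one layer only: the value is not rescanned
set again [subst $inner]                  ;# an explicit second round of substitution

set cmdText {[string length abc]}         ;# a value that looks like a command substitution
set word "len=$cmdText"                   ;# inserted as text, not executed

puts $copy
puts $again
puts $word
```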
Reference : Tcl and the Tk Toolkit