For the following code:
set str "a bb ccc"
if {[string first bb "$str"] >= 0} {
puts "yes"
}
My colleague said I should not double-quote $str because there is a performance difference: something like Tcl creating a new object internally from $str.
I cannot find a convincing document on this. Do you know if the claim is accurate?
Your colleague is actually wrong, as Tcl's parser is smart enough to know that "$str" is identical to $str. Let's look at the bytecode generated (this is with Tcl 8.6.0, but the part that we're going to look at in detail is actually the same in older versions all the way back to 8.0a1):
% tcl::unsupported::disassemble script {
set str "a bb ccc"
if {[string first bb "$str"] >= 0} {
puts "yes"
}
}
ByteCode 0x0x78710, refCt 1, epoch 15, interp 0x0x2dc10 (epoch 15)
Source "\nset str \"a bb ccc\"\nif {[string first bb \"$str\"] >= 0} "
Cmds 4, src 74, inst 37, litObjs 7, aux 0, stkDepth 2, code/src 0.00
Commands 4:
1: pc 0-5, src 1-18 2: pc 6-35, src 20-72
3: pc 15-20, src 25-46 4: pc 26-31, src 61-70
Command 1: "set str \"a bb ccc\""
(0) push1 0 # "str"
(2) push1 1 # "a bb ccc"
(4) storeScalarStk
(5) pop
Command 2: "if {[string first bb \"$str\"] >= 0} {\n puts \"yes\"\n}"
(6) startCommand +30 2 # next cmd at pc 36, 2 cmds start here
Command 3: "string first bb \"$str\""
(15) push1 2 # "bb"
(17) push1 0 # "str"
(19) loadScalarStk
(20) strfind
(21) push1 3 # "0"
(23) ge
(24) jumpFalse1 +10 # pc 34
Command 4: "puts \"yes\""
(26) push1 4 # "puts"
(28) push1 5 # "yes"
(30) invokeStk1 2
(32) jump1 +4 # pc 36
(34) push1 6 # ""
(36) done
As you can see (look at (17)–(19)), the "$str" is compiled to a push of the name of the variable and a dereference (loadScalarStk). That's the most optimal sequence given that there's no local variable table (i.e., we're not in a procedure). The compiler doesn't do non-local optimizations.
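If you would rather measure than read bytecode, a rough sketch like the following compares the two spellings with the built-in time command. The proc names withQuotes and withoutQuotes are made up for this illustration, and the timings depend on the Tcl version and the machine, so treat them as indicative only.

```tcl
# Hypothetical helper procs, not part of the original question.
proc withQuotes {str} {
    return [expr {[string first bb "$str"] >= 0}]
}
proc withoutQuotes {str} {
    return [expr {[string first bb $str] >= 0}]
}

# Both spellings give the same answer...
puts [withQuotes "a bb ccc"]      ;# 1
puts [withoutQuotes "a bb ccc"]   ;# 1

# ...and their timings can be compared directly (numbers vary by machine).
puts [time {withQuotes "a bb ccc"} 100000]
puts [time {withoutQuotes "a bb ccc"} 100000]
```

If the two forms compile to the same instructions, the two timings should be indistinguishable within measurement noise.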
I think your colleague is correct: if Tcl sees plain $str where a word is expected, it parses out "str" as the name of a variable, looks it up in the appropriate scope, extracts the internal object representing that variable's value, and then asks that object to produce the string representation of the value. At this point the string representation will either be already available and cached in the object (as it will be in your case), or it will be transparently generated by the object and cached.
If you put dereferencing of a variable ($str) in a double quoted string, then Tcl goes like this: when it sees the first " in a place where a word is expected, it enters a mode where it would parse the following characters, performing variable- and command substitutions as it goes until it sees the next unescaped ", at which point the substituted text accumulated since the opening " is considered to be one word and it ends up being in a (newly created) internal object representing that word's value.
As you can see, in the second (your) case the original object holding the value of a variable named "str" will be asked for its value, and it then will be used to construct another value while in the first case the first value would be used right away.
Now there's a more subtle matter. For the scripts it evaluates, Tcl only guarantees that its interpreter obeys certain evaluation rules, and nothing more; everything else is implementation details. These details might change from version to version; for instance, in Tcl 8.6, the engine has been reimplemented using non-recursive evaluation (NRE), and while those were rather radical changes to the Tcl internals, your existing scripts did not notice.
What I'm leading up to is that discussing implicit performance "hacks" such as this one only makes sense with respect to a particular version of the runtime. I very much doubt Tcl currently optimizes "$str" away to just reuse the object from $str, but in theory it could eventually start doing so.
The real "problem" with your approach is not performance degradation but rather a misconception that leads to Tcl code of dubious style. Let me explain. Contrary to "more conventional" languages (usually influenced by C and the like), Tcl does not have special syntax for strings. This is because it does not have string literals: every value that starts its life in a script as a literal is initially a string. The actual type of any value is determined at runtime by the commands operating on it. To demonstrate, set x 10; incr x will put the string "10" into a variable named "x", and then the incr command will force the value in that variable to convert the string "10" it holds to an integer (of value 10); this integer will then be incremented by 1 (producing 11), invalidating the string representation as a side effect. If you later do puts $x, the string representation will be regenerated from the integer (producing "11"), cached in the value and then printed.
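The set x 10; incr x sequence described above can be watched from script level. This is a small sketch that assumes Tcl 8.6 or later, where the tcl::unsupported::representation command is available; as the namespace says, it is unsupported and its output format is not guaranteed, so no expected text is shown for it.

```tcl
set x 10
# Freshly set from a literal: the value only has a string representation.
puts [tcl::unsupported::representation $x]

incr x
# After incr the value carries an integer internal representation.
puts [tcl::unsupported::representation $x]

puts $x   ;# 11
# Printing needed the string form, so it has been regenerated and cached.
puts [tcl::unsupported::representation $x]
```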
Hence the code style you adopted actually tries to make Tcl code look more like Python (or Perl or whatever was your previous language) for no real value, and also look alien to seasoned Tcl developers. Both double quotes and curly braces are used in Tcl for grouping, not for producing string values and code blocks, respectively — these are just particular use cases for different ways of grouping. Consider reading this thread for more background.
Update: various types of grouping are very well explained in the tutorial which is worth reading as a whole.
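To make the grouping point concrete, here is a minimal sketch (the variable names are only for illustration). Quotes and braces differ in whether substitution happens inside them, not in what "type" they produce.

```tcl
set name World

# Double quotes group and allow substitution.
puts "Hello, $name"    ;# Hello, World
# Braces group and suppress substitution.
puts {Hello, $name}    ;# Hello, $name
# A backslash-escaped space groups too; no quotes needed.
puts Hello,\ $name     ;# Hello, World

# All three spellings store the very same value.
set a "10"
set b {10}
set c 10
puts [expr {$a eq $b && $b eq $c}]   ;# 1
```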
I'm learning about Tcl just now. I've seen just a bit of it, I see for instance to create a variable (and initialize it) you can do
set varname value
I am familiarizing with the fact that basically everything is a string, such as "value" above, but "varname" gets kind of a special treatment I guess because of the "set" built-in function, so varname is not interpreted as a string but rather as a name.
I can later on access the value with $varname, and this is fine to me, it is used to specify varname is not to be considered as a string.
I'm now reading about lists and a couple commands make me a bit confused
set colors {"aqua" "maroon" "cyan"}
puts "list length is [llength $colors]"
lappend colors "purple"
So clearly "lappend" is another function like set that interprets its first argument as a name and not a string, but then why didn't they make llength work the same way (no need for $)?
I'm thinking that it's just a convention that, in general, when you "read" a variable you need the $ while you don't for "writing".
A different look at the question: what Tcl commands are appropriate for list literals?
It's valid to count the elements of a list literal:
llength {my dog has fleas}
But it doesn't make sense to append a new element to a literal:
lappend {my dog has fleas} and ticks
(That is actually valid Tcl, but it sets the odd variable ${my dog has fleas}.)
This is more sensible:
set mydog {my dog has fleas}
lappend mydog and ticks
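A short sketch of what the "odd variable" case actually does, assuming it is run in a fresh interpreter where no such variable exists yet:

```tcl
# llength takes a value, so a literal list works directly.
puts [llength {my dog has fleas}]        ;# 4

# lappend takes a variable NAME, so this creates a variable whose
# name is the whole string "my dog has fleas".
lappend {my dog has fleas} and ticks
puts [info exists {my dog has fleas}]    ;# 1
puts ${my dog has fleas}                 ;# and ticks
```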
Names are strings. Or rather a string is a name because it is used as a name. And $ in Tcl means “read this variable right now”, unlike in some other languages where it really means “here is a variable name”.
The $blah syntax for reading from a variable is convenient syntax that approximately stands in for doing [set blah] (with just one argument). For simple names, they become the same bytecode, but the $… form doesn't handle all the weird edge cases (usually with generated names) that the other one does. If a command (such as set, lappend, unset or incr) takes a variable name, it's because it is going to write to that variable and it will typically be documented to take a varName (variable name, of course) or something like that. Things that just read the value (e.g., llength or lindex) will take the value directly and not the name of a variable, and it is up to the caller to provide the value using whatever they want, perhaps $blah or [call something].
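As a minimal illustration of the $blah versus [set blah] relationship, including one of the generated-name cases where only the set form works (variable names here are made up):

```tcl
set blah 42

# For simple names the two forms read the same variable.
puts $blah         ;# 42
puts [set blah]    ;# 42

# A name with a space needs ${...} in the dollar form.
set {my dog} fleas
puts ${my dog}         ;# fleas
puts [set {my dog}]    ;# fleas

# A name computed at runtime: read the variable whose name is stored
# in another variable.
set name blah
puts [set $name]   ;# 42
```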
In particular, if you have:
proc ListRangeBy {from to {by 1}} {
set result {}
for {set x $from} {$x <= $to} {incr x $by} {
lappend result $x
}
return $result
}
then you can do:
llength [ListRangeBy 3 77 8]
and
set listVar [ListRangeBy 3 77 8]
llength $listVar
and get exactly the same value out of the llength. The llength doesn't need to know anything special about what is going on.
I can't understand how assignments and use of variables work in Tcl.
Namely:
If I do something like
set a 5
set b 10
and I do
set c [$a + $b]
According to what the internet says:
You obtain the results of a command by placing the command in square
brackets ([]). This is the functional equivalent of the back single
quote (`) in sh programming, or using the return value of a function
in C.
So my statement should set c to 15, right?
If yes, what's the difference with
set c [expr $a + $b]
?
If no, what does that statement do?
Tcl's a really strict language at its core; it always follows the rules. For your case, we can therefore analyse it like this:
set c [$a + $b]
That's three words, set (i.e., the standard “write to a variable” command), c, and what we get from evaluating the contents of the brackets in [$a + $b]. That in turn is a script formed by a single command invocation with another three words, the contents of the a variable (5), +, and the contents of the b variable (10). That the values look like numbers is irrelevant: the rules are the same in all cases.
Since you probably haven't got a command called 5, that will give you an error. On the other hand, if you did this beforehand:
proc 5 {x y} {
return "flarblegarble fleek"
}
then your script would “work”, writing some (clearly defined) utter nonsense words into the c variable. If you want to evaluate a somewhat mathematical expression, you use the expr command; that's its one job in life, to concatenate all its arguments (with a space between them) and evaluate the result as an expression using the documented little expression language that it understands.
You virtually always want to put braces around the expression, FWIW.
There are other ways to make what you wrote do what you expect, but don't do them. They're slow. OTOH, if you are willing to put the + first, you can make stuff go fast with minimum interference:
# Get extra commands available for Lisp-like math...
namespace path ::tcl::mathop
set c [+ $a $b]
If you're not a fan of Lisp-style prefix math, use expr. It's what most Tcl programmers do, after all.
set c [$a + $b]
Running the above command, you will get invalid command name "5" error message.
For mathematical operations we should rely on expr, because Tcl treats everything as a string.
set c [expr $a + $b]
In this case, the values of a and b are passed and the addition is performed.
Here, it is always safer, and recommended, to brace the expression, as in
set c [expr {$a+$b}]
To avoid any possible surprises in the evaluation.
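One kind of "surprise" that bracing avoids is double substitution. A small sketch, where the variable x stands in for any value that did not come from a trusted literal:

```tcl
set a 5
set b 10
puts [expr {$a + $b}]   ;# 15

# A value that happens to look like a command substitution.
set x {[string length injected]}

# Unbraced: Tcl substitutes $x first, then expr evaluates the text
# it receives, so the embedded command actually runs.
puts [expr $x]      ;# 8

# Braced: expr sees $x itself and only reads its value.
puts [expr {$x}]    ;# [string length injected]
```

Braced expressions are also what the bytecode compiler can handle most efficiently, which is the other common reason given for the habit.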
Update 1 :
In Tcl, everything is based on commands. A command can be a user-defined proc or an existing built-in such as lindex. A bare word at the start of a line is treated as a command name and triggers a command call. Similarly, text inside [ and ] is evaluated as a command.
In your case, $a is replaced with the value of the variable a, and since it is enclosed within square brackets, it triggers a command call; since there is no command named 5, you get the error.
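The error can be observed without aborting the script by wrapping the call in catch. This sketch assumes no command named 5 has been defined in the interpreter.

```tcl
set a 5
set b 10

# [$a + $b] tries to run a command named "5" with arguments + and 10.
if {[catch {set c [$a + $b]} msg]} {
    puts "error: $msg"    ;# error: invalid command name "5"
}

# The expr form does the arithmetic that was intended.
set c [expr {$a + $b}]
puts $c                   ;# 15
```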
I am trying to make a script to transfer file to another device. Since I cannot account for every error that may occur, I am trying to make an if-all-else fails situation:
spawn scp filename login@ip:filename
expect "word:"
send "password"
expect {
"100" {
puts "success"
} "\*" {
puts "Failed"
}
}
This always returns a Failed message and does not even transfer the file, whereas this piece of code:
spawn scp filename login@ip:filename
expect "word:"
send "password"
expect "100"
puts "success"
shows the transfer of the file and prints a success message.
I can't understand what is wrong with my if-expect statement in the first piece of code.
The problem is because of \*. The backslash will be translated by Tcl, thereby making the \* into * alone which is then passed to expect as
expect *
As you know, * matches anything. This is like saying, "I don't care what's in the input buffer. Throw it away." This pattern always matches, even if nothing is there. Remember that * matches anything, and the empty string is anything! As a corollary of this behavior, this command always returns immediately. It never waits for new data to arrive. It does not have to since it matches everything.
I don't know why you have used *. If your intention is to match a literal asterisk, then use \\*.
The string \\* is translated by Tcl to \*. The pattern matcher then interprets the \* as a request to match a literal *.
expect "*" ;# matches * and ? and X and abc
expect "\*" ;# matches * and ? and X and abc
expect "\\*" ;# matches * but not ? or X or abc
Just remember two rules:
Tcl translates backslash sequences.
The pattern matcher treats backslashed characters as literals.
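The two rules can be tried in plain Tcl, without spawning anything, by using string match as a rough stand-in for Expect's glob matcher. That substitution is an assumption made for this sketch: string match follows the same backslash rules, but it matches the whole string rather than a buffer.

```tcl
# Rule 1: Tcl turns "\*" into a bare *, which matches anything.
puts [string length "\*"]        ;# 1
puts [string match "\*" abc]     ;# 1

# Tcl turns "\\*" into the two characters \*.
puts [string length "\\*"]       ;# 2

# Rule 2: the matcher reads \* as a literal asterisk.
puts [string match "\\*" abc]    ;# 0
puts [string match "\\*" "*"]    ;# 1
```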
Note: apart from the question, one observation. You refer to your expect block as an if-else block, but it is not the same thing.
The reason is that in a traditional if-else block we know for sure that at least one of the branches will be executed. With expect that is not the case; it is more like several independent if blocks.
In TCL, I need to split an ipv6 address and port combination in the format [fec1::10]:80 to fec1::10 and 80.
Please suggest a way to do it.
Thanks!
(In the examples below I assume that the address will be subjected to further processing (expansion, etc) because there are a lot of forms that it can take: hence, in this preliminary stage I treat it simply as a string of any character rather than groups of hex digits separated by colons. The ip package mentioned by kostix is excellent for processing the address, just not for separating the address from the port number.)
Given the variable
set addrport {[fec1::10]:80}
There are several possible ways, including brute-force regular expression matching:
regexp -- {\[(.+)\]:(\d+)} $addrport -> addr port
(which means "capture a non-empty sequence of any character that is inside literal brackets, then skip a colon and thereafter capture a non-empty sequence of any digit"; the three variables at the end of the invocation get the whole match, the first captured submatch, and the second captured submatch, respectively)
(note 1: American usage of the word 'brackets' here: for British speakers I mean square brackets, not round brackets/parentheses)
(note 2: I'm using the code fragment -> in two ways: as a variable name in the above example, and as a commenting symbol denoting return value in some of the following examples. I hope you're not confused by it. Both usages are kind of a convention and are seen a lot in Tcl examples.)
regexp -inline -- {\[(.+)\]:(\d+)} $addrport
# -> {[fec1::10]:80} fec1::10 80
will instead give you a list with three elements (again, the whole match, the address, and the port).
Many programmers will stop looking for possible solutions here, but you're still with me, aren't you? Because there are more, possibly better, methods.
Another alternative is to convert the string to a two-element list (where the first element is the address and the second the port number):
split [string map {[ {} ]: { }} $addrport]
# -> fec1::10 80
(which means "replace any left brackets with empty strings (i.e. remove them) and any substrings that consist of a right bracket and a colon with a single space; then split the resulting string into a list")
It can be used to assign to variables like so:
lassign [split [string map {[ {} ]: { }} $addrport]] addr port
(which performs a sequential assign from the resulting list into two variables).
The scan command will also work:
scan $addrport {[%[^]]]:%d} addr port
(which means "after a left bracket, take a sequence of characters that does not include a right bracket, then skip a right bracket and a colon and then take a decimal number")
Want the result as a list instead?
scan $addrport {[%[^]]]:%d}
# -> fec1::10 80
Even split works, in a slightly roundabout way:
set list [split $addrport {[]:}]
# -> {} fec1 {} 10 {} 80
set addr [lindex $list 1]::[lindex $list 3]
set port [lindex $list 5]
(note: this will have to be rewritten for addresses that are expanded to more than two groups).
Take your pick, but remember to be wary of regular expressions. Quicker, easier, more seductive they are, but always bite you in the ass in the end, they will.
(Note: the 'Hoodiecrow' mentioned in the comments is me, I used that nick earlier. Also note that at the time this question appeared I was still sceptical towards the ip module: today I swear by it. One is never too old to learn, hopefully.)
The ip package from the Tcl standard library can do that, and more.
One of the simplest ways to parse these sorts of things is with scan. It's the command that many Tclers forget!
set toParse {[fec1::10]:80}
scan $toParse {[%[a-f0-9:]]:%d} ip port
puts "host is $ip and port is $port"
The trick is that you want “scan characters from a limited set”. And in production code you want to check the result of scan, which should be the number of groups matched (2 in this case).
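A minimal sketch of checking the scan result, wrapped in a helper proc. The name splitHostPort is hypothetical, and the character set only accepts lowercase hex digits and colons, exactly as in the format above, so uppercase addresses would need a wider set.

```tcl
# Hypothetical helper: returns a two-element list {address port},
# or raises an error when the input does not have the expected shape.
proc splitHostPort {s} {
    # scan returns the number of conversions it managed to perform.
    if {[scan $s {[%[a-f0-9:]]:%d} ip port] != 2} {
        error "cannot parse \"$s\" as \[address\]:port"
    }
    return [list $ip $port]
}

puts [splitHostPort {[fec1::10]:80}]   ;# fec1::10 80

# Malformed input is reported instead of silently leaving variables unset.
if {[catch {splitHostPort {fec1::10:80}} msg]} {
    puts "rejected: $msg"
}
```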
I am looking at the TCL source code and trying to understand the mechanism how the TCL variables are managed internally. For example, given the following TCL script:
set a 1
set b 2
I looked at the Tcl_SetObjCmd() function; it sets an object on the interpreter and that is it. So when the first line runs, a Tcl_Obj with value "1" is set on the interpreter, but I cannot find where this object is retrieved, which leads to my ultimate goal: where does that object get stored?
Any pointer is greatly appreciated!
It's more complicated than it appears. The simple version is that the Tcl command there would call Tcl_SetVar2Ex as something like Tcl_SetVar2Ex(interp, "a", NULL, Tcl_NewIntObj(1), 0) to set up a variable called 'a' in the interpreter with the value given. It will also assign this value to the interpreter's result using Tcl_SetObjResult.
However, modern Tcl does byte compilation and executes something else. We can examine this as shown below:
% tcl::unsupported::disassemble script {set a 1}
ByteCode 0x0x10e0110, refCt 1, epoch 3, interp 0x0xde9d00 (epoch 3)
Source "set a 1"
Cmds 1, src 7, inst 6, litObjs 2, aux 0, stkDepth 2, code/src 0.00
Commands 1:
1: pc 0-4, src 0-6
Command 1: "set a 1"
(0) push1 0 # "a"
(2) push1 1 # "1"
(4) storeScalarStk
(5) done
So the byte-compiled version actually pushes the name and the value onto the stack and then executes the storeScalarStk instruction. Some digging in the sources shows this gets executed in generic/tclExecute.c as INST_STORE_SCALAR_STK, which basically just jumps to doCallPtrSetVar, where it calls TclPtrSetVar, which does a similar job to the Tcl_SetVar2Ex function from the public API. The main advantage of byte compilation is on repeat runs, where the syntactic parsing has already been handled, so subsequent execution of a function is much faster than the first run.
Your basic question seems to be about how the value was returned to the interpreter. The interp structure has a result slot that is manipulated with Tcl_SetObjResult and Tcl_GetObjResult. Functions that want to return a result to script level assign a Tcl_Obj to the interp result.
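Seen from script level, the two things described here (the variable stored in the interpreter, and the value placed in the result slot) can be told apart with a short sketch. It only shows the observable behaviour, not the C internals.

```tcl
# set stores the value in the variable AND leaves it as the command's
# result; command substitution [...] picks that result up.
set a 1
puts [set b 2]          ;# 2

# The variables persist in the interpreter after the commands finish.
puts [info exists a]    ;# 1
puts [set a]            ;# 1
puts $b                 ;# 2
```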