Does Tcl eval command prevent byte coding?

I know that in some dynamic, interpreted languages, using eval can slow things down because it prevents bytecode compilation. Is that the case in Tcl 8.5?
Thanks

It doesn't prevent bytecode compilation, but it can slow things down anyway. The key issue is that it can prevent the bytecode compiler from having access to the local variable table (LVT) during compilation, forcing variable accesses to go via a hash lookup. Tcl's got an ultra-fast hash algorithm (we've benchmarked it a lot and tried a lot of alternatives; it's very hot code) but the LVT has it beat as that's just a simple C array lookup when the bytes hit the road. The LVT is only known properly when compiling a whole procedure (or other procedure-like thing, such as a lambda term or TclOO method).
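As a rough illustration (the procedure names direct and via-eval are made up for this demo), you can see the effect with the time command: a proc body known at compile time gets LVT-indexed variable access, while pushing the same code through eval forces the hash-lookup path even though the eval'd script's bytecode gets cached:
proc direct {} {
    set total 0
    for {set i 0} {$i < 1000} {incr i} { incr total $i }
    return $total
}
proc via-eval {} {
    set body {
        set total 0
        for {set i 0} {$i < 1000} {incr i} { incr total $i }
        return $total
    }
    eval $body
}
puts [time {direct} 1000]     ;# LVT: compiled variable slots
puts [time {via-eval} 1000]   ;# cached bytecode, but hash-table variable access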
Now, I have tried making this specific case:
eval {
    # Do stuff in here...
}
be fully bytecode-compiled, and it mostly works (apart from a few oddities that are currently observable but probably shouldn't be), yet for how rarely that construct is used, it's just not worth the effort. In every other case, the script can't be known precisely enough at the point where the compiler is running, which forces the LVT-less mode of operation.
On the other hand, it's not all doom and gloom. Provided the actual script being run inside the eval doesn't change (and that includes not being regenerated through internal concat — multi-argument eval never gets this benefit), Tcl can cache the compilation of the code in the internal representation of the script value, LVT-less though it is, so there's still quite a bit of performance gain there. This means that this isn't too bad, performance-wise:
set script {
    foo bar $boo
}
for {set i 0} {$i < 10} {incr i} {
    eval $script
}
If you have real performance-sensitive code, write it without eval. Expansion syntax — {*} — can help here, as can helper procedures. Or write the critical bits in C or C++ or Fortran or … (see the critcl and ffidl extension packages for details of cool ways to do this, or just load the DLL as needed if it has a suitable *_Init function).
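For example, splicing a list of arguments with {*} keeps everything in compiled code, whereas the old eval-based idiom reparses the whole command (the variable names here are just for illustration):
set words {a b c d}
# eval-based splicing: slower and prone to quoting problems
eval lappend result $words
# expansion syntax: the list is expanded in place, fully bytecode-compiled
lappend result2 {*}$words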

Related

When is a command's compile function called?

My understanding of Tcl execution is that if a command's compile function is defined, it is called first, when the command is about to be executed, before the command's execution function is called.
Take the append command as an example; here is its definition in tclBasic.c:
static CONST CmdInfo builtInCmds[] = {
    {"append", (Tcl_CmdProc *) NULL, Tcl_AppendObjCmd,
        TclCompileAppendCmd, 1},
Here is my testing script:
$ cat t.tcl
set l [list 1 2 3]
append l 4
I added gdb breakpoints at both functions, TclCompileAppendCmd and Tcl_AppendObjCmd. My expectation was that TclCompileAppendCmd would be hit before Tcl_AppendObjCmd.
The gdb target is tclsh8.4, with t.tcl as its argument.
What I see is interesting:
TclCompileAppendCmd does get hit first, but the hit comes from init.tcl rather than from t.tcl.
TclCompileAppendCmd gets hit several times, and all of the hits come from init.tcl.
When t.tcl finally executes, it is Tcl_AppendObjCmd that gets hit, not TclCompileAppendCmd.
I cannot make sense of it:
Why is the compile function called for init.tcl but not for t.tcl?
Each script should be compiled independently, i.e. the object holding the compiled append from init.tcl is not reused for later scripts, is it?
[UPDATE]
Thanks Brad for the tip; after moving the script into a proc, I can see that TclCompileAppendCmd is hit.
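Presumably the reworked script looks something like this (the proc name is arbitrary):
proc main {} {
    set l [list 1 2 3]
    append l 4
}
main
Because the body now lives in a procedure, the bytecode compiler processes it, and that is when TclCompileAppendCmd is invoked.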
The compilation function (TclCompileAppendCmd in your example) is called by the bytecode compiler when it wants to issue bytecode for a particular instance of that particular command. The bytecode compiler also has a fallback if there is no compilation function for a command: it issues instructions to invoke the standard implementation (which would be Tcl_AppendObjCmd in this case; the NULL in the other field causes Tcl to generate a thunk in case someone really insists on using a particular API but you can ignore that). That's a useful behaviour, because it is how operations like I/O are handled; the overhead of calling a standard command implementation is pretty small by comparison with the overhead of doing disk or network I/O.
But when does the bytecode compiler run?
On one level, it runs whenever the rest of Tcl asks for it to be run. Simple! But that's not really helpful to you. More to the point, it runs whenever Tcl evaluates a script value in a Tcl_Obj that doesn't already have bytecode type (or if the saved bytecode indicates that it is for a different resolution context or different compilation epoch) except if the evaluation has asked to not be bytecode compiled by the flag TCL_EVAL_DIRECT to Tcl_EvalObjEx or Tcl_EvalEx (which is a convenient wrapper for Tcl_EvalObjEx). It's that flag which is causing you problems.
When is that flag used?
It's actually pretty simple: it's used when some code is believed to be going to be run only once because then the cost of compilation is larger than the cost of using the interpretation path. It's particularly used by Tk's bind command for running substituted script callbacks, but it is also used by source and the main code of tclsh (essentially anything using Tcl_FSEvalFileEx or its predecessors/wrappers Tcl_FSEvalFile and Tcl_EvalFile). I'm not 100% sure whether that's the right choice for a sourced context, but it is what happens now. However, there is a workaround that is (highly!) worthwhile if you're handling looping: you can put the code in a compiled context within that source using a procedure that you call immediately or use an apply (I recommend the latter these days). init.tcl uses these tricks, which is why you were seeing it compile things.
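For instance, in a file run via source, a loop at file scope is interpreted command by command, but the same loop wrapped in apply is bytecode-compiled as a unit (a minimal sketch; apply needs Tcl 8.5 or later, and in 8.4 you would use a procedure called immediately instead):
# At file scope: evaluated with TCL_EVAL_DIRECT, so interpreted directly.
set total 0
for {set i 0} {$i < 100000} {incr i} { incr total $i }

# Wrapped in apply: the whole lambda body goes through the bytecode compiler.
set total [apply {{} {
    set total 0
    for {set i 0} {$i < 100000} {incr i} { incr total $i }
    return $total
}}]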
And no, we don't normally save compiled scripts between runs of Tcl. Our current compiler is fast enough that it isn't really worthwhile; the cost of verifying that loaded compiled code is correct for the current interpreter is high enough that it's actually faster to recompile from the source code. (I'm working on a slower compiler that generates substantially better code.) There's a commercial tool suite from ActiveState (the Tcl Dev Kit) which includes an ahead-of-time compiler, but that's focused on shrouding code for commercial deployment, not on speed.

How many arguments are passed in a function call?

I wish to analyze assembly code that calls functions, and for each 'call' find out how many arguments are passed to the function. I assume that the target functions themselves are not accessible to me, only the calling code.
I limit myself to code that was compiled with GCC only, and to System V ABI calling convention.
I tried scanning back from each 'call' instruction, but I failed to find a good enough convention (e.g., where to stop scanning? what happens on two subsequent calls with the same arguments?). Assistance is highly appreciated.
Reposting my comments as an answer.
You can't reliably tell in optimized code, and even doing a good job most of the time probably requires human-level AI. E.g., did a function leave a value in RSI because it's a second argument, or was it just using RSI as a scratch register while computing a value for RDI (the first argument)? As Ross says, gcc-generated code for stack-args calling conventions has more obvious patterns, but still nothing easy to detect.
It's also potentially hard to tell the difference between stores that spill locals to the stack vs. stores that store args to the stack (since gcc can and does use mov stores for stack-args sometimes: see -maccumulate-outgoing-args). One way to tell the difference is that locals will be reloaded later, but args are always assumed to be clobbered.
What happens on two subsequent calls with the same arguments?
Compilers always re-write args before making another call, because they assume that functions clobber their args (even on the stack); the ABI says that functions "own" their args. Compilers do emit code that does this (see comments), but compiler-generated code isn't always willing to re-purpose the stack memory holding its args for storing completely different args in order to enable tail-call optimization. :( This is hand-wavy because I don't remember exactly what I've seen as far as missed tail-call-optimization opportunities.
Yet if arguments are passed on the stack, then that should probably be the easier case (and I conclude that all 6 registers are in use as well).
Even that isn't reliable. The System V x86-64 ABI is not simple.
int foo(int, big_struct, int) would pass the two integer args in regs, but pass the big struct by value on the stack. FP args are also a major complication. You can't conclude that seeing stuff on the stack means that all 6 integer arg-passing slots are used.
The Windows x64 ABI is significantly different: For example, if the 2nd arg (after adding a hidden return-value pointer if needed) is integer/pointer, it always goes in RDX, regardless of whether the first arg went in RCX, XMM0, or on the stack. It also requires the caller to leave "shadow space".
So you might be able to come up with some heuristics that will work OK for un-optimized code, but even that will be hard to get right.
For optimized code generated by different compilers, I think it would be more work to implement anything even close to useful than you'd ever save by having it.

Tcl upvar performance improvement vs. direct pass

This pertains to Tcl 8.5
Say I have a very large dictionary.
From a performance point of view (memory footprint, etc.), assuming that I do not modify the dictionary, should upvar provide a massive improvement in terms of memory? I am using an EDA tool that has a Tcl shell, but the vendor disabled the Tcl memory command. I know that Tcl can share strings under the hood for performance... The same dictionary can be passed through several nested proc calls.
Thanks.
As long as you don't modify the dictionary, it won't make much noticeable difference in either performance or memory consumption.
Tcl passes values by immutable reference, and copies them when you write an update to them if they're shared, e.g., between a global variable and a local variable (procedure formal parameters are local variables). If you never change anything, you're going to use a shared reference and everything will be quick. If you do need to change something, you should use upvar or global (or one of the more exotic variants) to make a local variable alias to the caller's/global variable and change via that, as that's fastest. But that's only an issue if you're going to change the value.
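For example, if you do end up needing to modify the dictionary from a helper procedure, aliasing the caller's variable avoids the copy that modifying a passed-in value would otherwise trigger (a small sketch; the names are hypothetical):
proc add-entry {dictVar key value} {
    upvar 1 $dictVar d        ;# alias the caller's variable; no value is copied
    dict set d $key $value    ;# updates in place while the value is unshared
}

set bigdict [dict create a 1 b 2]
add-entry bigdict c 3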
I would imagine that, under the hood, the dictionary isn't copied until it's written to, so if there are no writes you should be okay. Use a global if you want to be absolutely sure:
proc myproc {} {
    global mydictionary
    # read from $mydictionary here; no copy is made while it stays unmodified
}

How to carry out synchronous processes in TCL

I am trying to carry out two processes in parallel. Help me write Tcl code that carries out two processes at the same time.
In Tcl, there are two ways to run a pair of subprocesses “at the same time”.
Simplest: Without control
If you just want to fire off two processes at once without keeping any control over them, put an ampersand (&) as the last argument to exec:
exec process1 "foo.txt" &
exec process2 "bar.txt" &
Note that, apart from the process ID (returned by exec), you've got no control over these subprocesses at all. Once you set them going, you'll essentially never hear from them again (using appropriate redirections to/from standard in/out may well be advisable!).
More complex: With control
To keep control over a subprocess while running it in the background, make it run in a pipeline created with open. The syntax for doing so is rather odd; be sure to follow it exactly (except as noted below):
set pipelineChannel1 [open |[list process1 "foo.txt" ] "r"]
set pipelineChannel2 [open |[list process2 "bar.txt" ] "r"]
These are reader pipelines where you're consuming the output of the subprocesses; that's what the (optional) r means. To get a pipeline that you write to (i.e., that you provide input to) you use w instead, and if you want to both read and write, use r+. The pipelines are then just normal channels that you use with puts, gets, read, fconfigure, etc. Just close when you are done.
The | must come outside and immediately before the [list …]. This matters especially if the name of the command (possibly a full pathname) has any Tcl metacharacters in it, and is because the specification of open says this:
If the first character of fileName is “|” then the remaining characters of fileName are treated as a list of arguments that describe a command pipeline to invoke, in the same style as the arguments for exec.
The main things to beware of when working with a pipeline are these:
The processing of the subprocesses really is asynchronous. You need to take care to avoid forcing too much output through at once, though turning on non-blocking I/O with fconfigure $channel -blocking 0 is usually enough there (a short sketch appears after these notes).
The other processes can (and frequently do) buffer their output differently when outputting to a pipeline than when they're writing to a terminal. If this is a problem, you'll have to consider whether to use a package like Expect (which can also run multiple interactions at once, though that should be used much more sparingly as virtual terminals are a much more expensive and limited system resource than pipelines).
If you're doing truly complex asynchronous interactions with the subprocesses, consider using Tcl 8.6 where there are Tcllib packages built on top of the base coroutine feature that make keeping track of what's going on much easier.
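As promised above, here is a minimal sketch of event-driven reading from such a pipeline (process1 and the callback name are placeholders):
set pipelineChannel1 [open |[list process1 "foo.txt"] "r"]
fconfigure $pipelineChannel1 -blocking 0

proc onReadable {chan} {
    if {[gets $chan line] >= 0} {
        puts "got: $line"
    } elseif {[eof $chan]} {
        close $chan
        set ::done 1
    }
}
fileevent $pipelineChannel1 readable [list onReadable $pipelineChannel1]
vwait ::done    ;# run the event loop until the subprocess closes its output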

Bootstrapping an interpreter?

We know that a compiler can be written in its own language using a trick known as bootstrapping. My question is whether this trick can be applied to interpreters as well.
In theory the answer is certainly yes, but there is a worry that the interpretation of source code will become more and more inefficient with each iteration. Would that be a serious problem?
I'm bootstrapping a very dynamic system where the programs will be constantly changing, which rules out a compiler.
Let me spell it out this way:
Let the i's be interpreters.
Let the L's be programming languages.
We can write i1 in machine code (lowest level), to interpret L1.
We then write i2 in L1, interpreting L2 -- a new language.
We then write i3 in L2, interpreting L3 -- another new language.
and so on...
We don't need any compiler in the above, just interpreters. Right?
It could be inefficient; that is my question, along with how to overcome it if it is indeed inefficient.
That doesn't make sense. An interpreter doesn't produce a binary, so can't create something that can run itself standalone. Somewhere, ultimately, you need to have a binary that is the interpreter.
Example of a compiler bootstrapping. Let's say we have two languages A(ssembler) and C. We want to bootstrap a C compiler written in C. But we only have an assembler to start with.
Write basic C compiler in A
Write C compiler in C and compile with earlier compiler written in A
You now have a C compiler which can compile itself; you don't need A or the original compiler any more.
Later runs become just
Compile C program using compiler written in C
Now let's say you have an interpreted language instead, I'll call it Y. The first version can be called Y1, the next Y2 and so on. Let's try to "bootstrap" it.
First off we don't have anything that can interpret Y programs, we need to write a basic interpreter. Let's say we have a C compiler and write a Y1 interpreter in C.
Write Y1 interpreter in C, compile it
Write Y2 interpreter in Y1, run it on Y1 interpreter written in C
Write Y3 interpreter in Y2, run it on the Y2 interpreter running on the Y1 interpreter... written in C.
The problem is that you can never escape the stack of interpreters, because you never compile a higher-level interpreter. So you're always going to need to compile and run that first-version interpreter written in C; you can never escape it, which I think is the fundamental point of the compiler-bootstrapping process. This is why I say your question does not make sense.
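To make the stacking concrete, here is a deliberately tiny interpreter for a made-up two-command language, written in Tcl (itself an interpreted/bytecoded language); the mini-language and the proc name are hypothetical. Every toy command pays the full cost of the host interpreter underneath, and an interpreter for the next level written in the toy language would multiply that cost again:
# A toy interpreter for a hypothetical mini-language with two commands,
# "print" and "add". Each toy command is dispatched by Tcl code that is
# itself being run by the Tcl interpreter, so the overheads stack.
proc run-toy {script} {
    foreach line [split $script \n] {
        set words [split [string trim $line]]
        if {[llength $words] == 0} continue
        switch -- [lindex $words 0] {
            print   { puts [join [lrange $words 1 end]] }
            add     { puts [expr {[lindex $words 1] + [lindex $words 2]}] }
            default { error "unknown toy command: [lindex $words 0]" }
        }
    }
}

run-toy {
    print hello from one level down
    add 2 3
}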
The answer depends on what is being interpreted. If you're targeting a virtual machine which interprets bytecode, and your language is being developed iteratively while the bytecode doesn't change, then it is not a given that you will lose performance along the way. There are plenty of examples of languages which are bootstrapped on a target VM which wasn't designed particularly for that language, and they don't suffer a significant performance hit as a direct result (Scala on the JVM, for instance).
Using the JVM for example, you'd write the first compiler in Java, which compiles your source language to JVM bytecode. Then you'd rewrite your compiler to do exactly the same thing but in your new source language. The resulting bytecode could be indistinguishable between the two. Note that this is not the same thing as writing an interpreter in an interpreted language, which will become slower with each iteration.
This sentence does not seem to make sense:
I'm bootstrapping a very dynamic system where the programs will be constantly changing, which rules out a compiler.
Whether you have an interpreter or a compiler, both will have to deal with something that is not changing, i.e. with your language. And even if the language is somehow "dynamic", there will still be a meta-language that is fixed. Most probably you also have some low-level code, or at least a data structure that the interpreter works with.
You could first design and formalize this low-level code (whatever it is) and write some program that can "run" it. Once you have that, you can add a stack of interpreters on top, and as long as they all produce this low-level code, efficiency should not be an issue.
You can indeed, and this is the approach used by Squeak (and, I believe, many other Smalltalks). Here is one approach to doing just that: https://github.com/yoshikiohshima/SqueakBootstrapper/blob/master/README