When to use package+namespace vs interp in Tcl?

I'm just starting with Tcl and trying to get my head around how to best define and integrate modules. There seems to be much effort put into the package+namespace concept, but from what I can tell interp is more powerful and lean for every conceivable scenario, in particular when it comes to hiding and renaming procedures, and also avoiding creep in the global namespace. The only reason to use package+namespace seems to be because "once upon a time Sun said so".
When should I ever use package+namespace instead of interp?

Namespaces and packages work together. Interpreters are something else.
A namespace is a small-scale naming context in Tcl. It can contain commands, variables, and other namespaces. You can refer to entities in a namespace via either local names (foo) or qualified names (bar::foo); if a qualified name starts with ::, it is relative to the (interpreter-)global namespace and can be used to refer to that command or variable from anywhere in the interpreter. (FWIW, the TclOO object system builds extensively on top of namespaces; there is one namespace per object.)
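A minimal sketch of those naming rules:

    namespace eval bar {
        variable count 0
        proc foo {} {
            variable count
            incr count          ;# local names work inside the namespace
        }
    }
    bar::foo                    ;# qualified name, from outside
    puts $::bar::count          ;# fully qualified, usable from anywhere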
A package is a high-level concept for a bunch of code supplied by some sort of library. Packages have abstract names (the name does not have to correspond to how the library's implementation is stored on disk) and a distinct version; you can ask for a particular version if necessary, though most of the time you don't bother. Packages can be implemented by multiple mechanisms, but they almost all come down to sourcing some number of Tcl scripts and loading some number of DLLs. Almost all packages declare commands, and they are conventionally encouraged to put those commands in a namespace with the same general name as the package. However, quite a few older packages do not do this for various reasons, mostly to do with compatibility with existing code.
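For example, a script-only package conventionally looks something like this (mypkg is a made-up name, and the client side assumes a pkgIndex.tcl has already put the package on the package path):

    # In the library's script file:
    namespace eval ::mypkg {
        proc greet {who} {
            return "hello, $who"
        }
    }
    package provide mypkg 1.0

    # In client code:
    package require mypkg
    puts [mypkg::greet world]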
An interpreter is a security context in Tcl. By default, Tcl creates one interpreter (plus another if it sets up the console window in wish). Named entities in one interpreter are completely distinct from named entities in another interpreter with a few key exceptions:
Channels have common names across all interpreters. This means that an interpreter can talk about channels owned by another interpreter, but merely being able to mention its name doesn't give permission to access the channel. (The stdin, stdout and stderr channels are shared by default.)
The interp alias command can be used to make alias commands, which are such that invoking a command (the alias) in one interpreter can cause a command (the implementation) in another interpreter to be called, with all arguments safely passed over. This allows one interpreter to expose whatever special calls it wants another interpreter to access without losing control, but it is up to the implementation of those commands to act safely on those arguments.
A safe interpreter is one with the unsafe commands of Tcl hidden (not removed) by default. (That's things like open, socket, source, load, cd, etc.) The parent interpreter that created the safe child interpreter can use the alias mechanism to add in exactly the functionality desired; it's very much analogous to an OS system call, except that you can easily make your own application-specific ones. (A sketch of this follows below.)
Tcl's threading package is designed to create one interpreter per thread (and the aliasing mechanism does not work across threads). That means that there's very little in the way of shared resources by default, and inter-thread communication is done via queued message passing.
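Here is a minimal sketch of the safe-interpreter and alias mechanisms working together (log and log_line are made-up names):

    # Create a safe child; unsafe commands such as [open] are hidden in it
    interp create -safe child
    # Expose one controlled operation to the child via an alias
    proc log_line {msg} { puts stderr "child says: $msg" }
    interp alias child log {} log_line
    # The child can use the alias, but still cannot touch files itself
    child eval { log "hello from the sandbox" }
    interp delete child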
In general, packages are required at most once per interpreter and are how you are recommended to access most third-party functionality. Namespaces are fairly lightweight and are used for all sorts of things, whereas interpreters are considered to be expensive; plenty of thoroughly production-grade Tcl scripts only ever work with a single interpreter. (Threads are even more expensive than interpreters; it's good practice to match the number of threads you create to the hardware load that you wish to impose, probably through the use of suitable thread pools.)

The purpose of a module is to provide modular code, i.e. code that can easily be used by applications beyond the module writer's knowledge and control, and that encapsulates its own internals.
Package/namespace-based and interpreter-based modules are probably equally good at encapsulation, but it's not as easy to make interpreter-based modules that play well with arbitrary applications (though it is of course possible).
My own opinion is that interpreters are application level (I mostly use them for user input and for controlled evaluation), not module level. Both namespaces and packages have their warts, but in most cases they do what is expected of them with a minimum of fuss.
My recommendation is that if you are writing modules for your own benefit and interpreters serve you well, by all means use them. If you write modules that other people are to use, possibly including yourself in 18 months, you should stick with namespaces and packages.

Related

The use of packages to parse command arguments employing options/switches?

I have a couple of questions about adding options/switches (with and without parameters) to procedures/commands. I see that tcllib has cmdline, and Ashok Nadkarni's book on Tcl recommends the parse_args package, stating that handling the arguments in Tcl script is much slower than that package's C implementation. The Nov. 2016 paper on parse_args states that Tcl script methods are, or can be, 50 times slower.
Are Tcl methods really significantly slower? Is there some minimum threshold number of options to be reached before using a package?
Is there any reason to use parse_args (not in tcllib) over cmdline (in tcllib)?
Can both be easily included in a starkit?
Is this included in 8.7a now? (I'd like to use 8.7a but I'm using Manjaro Linux and am afraid that adding it outside the package manager will cause issues that I won't know how to resolve or even just "undo").
Thank you for considering my questions.
Are Tcl methods really significantly slower? Is there some minimum threshold number of options to be reached before using a package?
Potentially. Procedures have overhead to do with managing the stack frame and so on, and code implemented in C can avoid a number of overheads due to the way values are managed in current Tcl implementations. The difference is much more profound for numeric code than for string-based code, as the cost of boxing and unboxing numeric values is quite significant (strings are always boxed in all languages).
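If performance matters for your case, it is easy to measure rather than guess, using Tcl's built-in time command; a minimal sketch with a made-up procedure:

    # Compare a procedure call against the same expression inlined;
    # [time] reports microseconds per iteration
    proc is_even {n} { expr {$n % 2 == 0} }
    puts [time { is_even 12345 } 100000]
    puts [time { expr {12345 % 2 == 0} } 100000]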
As for which is the one to use, it really depends on the details, as you are trading off flexibility for speed. I've never known it to be a problem for command-line parsing.
(If you ask me, fifty options isn't really that many, except that it's quite a lot to pass on an actual command line. It might be easier to design a configuration file format — perhaps a simple Tcl script! — and then to just pass the name of that in as the actual argument.)
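For reference, basic use of tcllib's cmdline looks roughly like this (the option names here are made up):

    package require cmdline

    set optspec {
        {verbose      "turn on verbose output"}
        {level.arg 0  "set the processing level"}
    }
    # getoptions consumes recognized switches from $argv and returns
    # a name/value list suitable for [array set]
    array set params [::cmdline::getoptions argv $optspec]
    if {$params(verbose)} {
        puts "level = $params(level)"
    }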
Is there any reason to use parse_args (not in tcllib) over cmdline (in tcllib)?
Performance? Details of how you describe things to the parser?
Can both be easily included in a starkit?
As long as any C code is built with Tcl stubs enabled (typically not much more than defining USE_TCL_STUBS at compile time and linking against the stub library) then it can go in a starkit as a loadable library. Using the stubbed build means that the compiled code doesn't assume exactly which version of the Tcl library is present or what its path is; those are assumptions that are usually wrong with a starkit.
Tcl-implemented packages can always go in a starkit. Hybrid packages need a little care for their C parts, but are otherwise pretty easy.
Many packages either always build in stubbed mode or have a build configuration option to do so.
Is this included in 8.7a now? (I'd like to use 8.7a but I'm using Manjaro Linux and am afraid that adding it outside the package manager will cause issues that I won't know how to resolve or even just "undo").
We think we're about a month from the feature freeze for 8.7, and builds seem stable in automated testing so the beta phase will probably be fairly short. The list of what's in can be found here (filter for 8.7 and Final). However, bear in mind that we tend to feel that if code can be done in an extension then there's usually no desperate need for it to be in Tcl itself.

How do I find where a function is declared in Tcl?

I think this is more of a Tcl configuration question rather than a Tcl coding question...
I inherited a whole series of Tcl scripts that are used within a simulation tool that my company built in-house. In my scripts, I'm finding numerous instances where there are function calls to functions that don't seem to be declared anywhere. How can I trace the path to these phantom functions?
For example, rather than use source, someone built a custom include function that they named INCLUDE. Tclsh obviously balks when I try to run the scripts directly, but within my simulation software they run fine.
I've tried grep-ing through the entire simulation software for INCLUDE, but I'm not having any luck. Are there any other obvious locations outside the simulation software where a Tcl function might be defined?
The possibilities:
- Within your software (you have checked for this).
- Within some other package included by the software. Check whether the environment variable TCLLIBPATH is set, and also whether the simulation software itself sets TCLLIBPATH. It holds a list of directories to search for Tcl packages, and you will need to search any packages located outside of the main source tree. Package locations may also be specified in pkgIndex.tcl files; check those and look for locations outside the main source tree.
- Within an unknown command handler. This could be in your software or within some other package. You should be able to find some code that processes the INCLUDE statement; a hypothetical sketch of such a handler appears below.
- Within a binary package. These are shared libraries that are loaded by Tcl. If this is the case, there should be some C code used to build the shared library that can be searched.
Since you say there are numerous instances of unknown functions, my first guess is that you have not found all the directories where packages are loaded from. But an unknown command handler is also a possibility.
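For illustration only, an INCLUDE implemented through the unknown mechanism could look roughly like this (entirely hypothetical; your tool's actual approach may differ):

    # Delegate to the standard handler for everything except INCLUDE
    rename unknown _original_unknown
    proc unknown {cmd args} {
        if {$cmd eq "INCLUDE"} {
            # treat INCLUDE as a synonym for [source]
            return [uplevel 1 [list source [lindex $args 0]]]
        }
        uplevel 1 [linsert $args 0 _original_unknown $cmd]
    }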
Edit: One more possibility I forgot. Check and see if your software sets the auto_path variable, and check any directories added to the auto_path for other packages.
This isn't a great answer for you, but I suspect it is the best you're going to get...
The procedure could be defined in a great many places. Your best bet for finding it is to use a tool like findstr (on Windows) or grep -R (on POSIX platforms) to search across all the relevant source files. But that still might not help! It might not be a procedure but a general command, which could be implemented in C rather than as a procedure, or it could be defined in a packaged application archive (which is usually awkward to look inside). There are also other types of script-implemented command, which can complicate matters further. Generally, searching and investigating is your best bet, but it might not work.
Tcl doesn't really differentiate strongly between different types of command except in some introspection operations. If you're lucky, you could find that info body tells you the definition of the procedure (and info args and info default tell you about the arguments) but that won't help with other command types at all. Tcl 8.7 will include a command (info cmdtype) that would help a lot with narrowing down what to do next, but that's no use to you now and it definitely doesn't exist in older versions.
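Concretely, you can interrogate the running interpreter inside the simulation tool itself, along these lines:

    if {[info procs INCLUDE] ne ""} {
        puts "INCLUDE is a proc taking: [info args INCLUDE]"
        puts "body: [info body INCLUDE]"
    } elseif {[info commands INCLUDE] ne ""} {
        puts "INCLUDE exists here, but is not a procedure"
    } else {
        puts "INCLUDE is not defined in this interpreter"
    }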

Advantages of a VM

The majority of languages I have come across utilise a VM, or virtual machine. Languages such as Java (the JVM), Python, Ruby, PHP (the HHVM), etc.
Then there are languages such as C, C++, Haskell, etc. which compile directly to native.
My question is, what is the advantage of using a VM (outside of OS-independence)? Isn't using a VM just creating an extra interpretation step, by going [source code -> bytecode -> native] instead of just [source code -> native]?
Why use a VM when you can compile directly?
EDIT
My understanding is that Python, Ruby, et al. use something akin to a VM, if not exactly fitting under such a definition, where scripts are compiled to an intermediate representation (for Python, e.g. .pyc files).
EDIT 2
Yep. Looked it up. Python, Ruby and PHP all use intermediate representations; the intermediate forms are simply not stored in separate files but are executed by the VM directly. See this question: Java "Virtual Machine" vs. Python "Interpreter" parlance?
" Even though Python uses a virtual machine under the covers, from a
user's perspective, one can ignore this detail most of the time. "
An advantage of a VM is that it is much easier to modify parts of the code at runtime, which is called reflection. It brings some elegant capabilities. For example, you can ask the user which function/class they want to call, and call the function/class by its string name. In Java programs (and maybe some other VM-based languages) users can add an additional library to the program at runtime, and the library can be run immediately!
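As an aside, the same call-by-string-name idea is easy to show in Tcl, the language most of this page concerns:

    # The command name is an ordinary string value chosen at runtime
    proc greet {who} { puts "hello, $who" }
    set cmd "greet"
    $cmd "world"    ;# invokes greet by its string name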
Another advantage is the ability to use advanced garbage collection, because the bytecode's structure is easier to analyze.
Let me note that a virtual machine does not always interpret the code, and therefore it is not always slower than machine code. For example, Java has a component named HotSpot which searches for code blocks that are frequently called and replaces their bytecode with native code (machine code). For instance, if a for loop is called, say, 100+ times, HotSpot converts it to machine code, so that in subsequent calls it will run natively. This ensures that just the bottlenecks of your code run natively, while the rest keeps the advantages described above.
P.S. It is not impossible to compile the code directly to native code. Many VM-based languages have compiler versions (e.g. there is a compiler for PHP: http://www.phpcompiler.org). However, remember that you are disabling some of the above features by compiling the whole program to native code.
P.S. The [source code -> bytecode] part is not a problem; it is compiled once and does not contribute to execution time. I presume you are asking why they do not execute machine code directly when that is possible.
Python, Ruby, and PHP do not utilize VMs. They are, however, interpreted.
To answer your actual question: Java utilizes a VM in order to add some distance between the operating system/hardware and the code being executed. The goal there was security and hardiness (hardiness meaning a lower likelihood of code having an adverse effect on other processes in the system).
All the languages you listed are interpreted, so I think what you may have actually meant to ask was the difference between interpreted and compiled languages. Interpreted languages are cross-platform. That is the biggest, and main, advantage. You need not compile them for each different set of hardware or operating system they operate on; instead, they simply work everywhere an interpreter is available.
The advantage of a compiled language, traditionally, is speed and efficiency.
Because a VM allows the same set of instructions to be run on many different operating systems (provided they have the interpreter).
Let's take Java as an example. Java gets compiled into bytecode, which is basically a set of operations for a computer to follow. However, not all processors understand the same set of instructions the same way; what one set of native instructions means on computer A could be something different on computer B.
As a result, a VM is run, with one specific to each computer. This way, the Java bytecode that is written is standardized, and only the interpreter has to work to convert it to machine language.
OS independence is a big part of it but you also get abstractions from other things like CPUs... the same Java code can execute on ARM, x86, whatever without modification so long as there is a JVM in place.

Tcl extensions: Life Cycle of extensions' ClientData

Non-trivial native extensions will require per-interpreter data structures that are dynamically allocated. I am currently using Tcl_SetAssocData, with a key corresponding to the extension's name and an appropriate deletion routine, to prevent this memory from leaking away.
However, Tcl_PkgProvideEx also allows one to record such information, which can then be retrieved by Tcl_PkgRequireEx. Associating the extension's data structures with its package seems more natural than using the "grab-bag" AssocData; yet the Pkg*Ex routines do not provide an automatically invoked deletion routine, so I think I need to stay with the AssocData approach.
For which situations were the Pkg*Ex routines designed?
Additionally, the Tcl library allows one to install ExitHandlers and ThreadExitHandlers. Paraphrasing the manual, these are for flushing buffers to disk etc.
Are there any other situations requiring the use of ExitHandlers? When Tcl calls exit, are Tcl_PackageUnloadProcs called?
The whole-extension ClientData is intended for extensions that want to publish their own stub table (i.e., an organized list of functions that represent an exact ABI) that other extensions can build against. This is a very rare thing to want to do; leave it as NULL if you don't want it (and contact the Tcl core developers' mailing list directly if you do; we've got quite a bit of experience in this area). Since it is for an ABI structure, it is strongly expected to be purely static data and so doesn't need deletion. Dynamic data should be sent through a different mechanism (e.g., via the Tcl interpreter or through calling functions via the ABI).
Exit handlers (which can be registered at multiple levels) are things that you use when you have to delete some resource at an appropriate time. The typical points of interest are when an interpreter (a Tcl_Interp structure) is being deleted, when a thread is being deleted, and when the whole process is going away. What resources need to be specially deleted? Well, it's usually obvious: file handles, database handles, that sort of thing. It's awkward to answer in general as the details matter very much: ask a more specific question to get tailored advice.
However, package unload callbacks are only called in response to the unload command. Like package load callbacks, they use “special function symbol” registration, and if they are absent then the unload command will refuse to unload the package. Most packages do not use them. The use case is where there are very long-lived processes that need to have extra upgradeable functionality added to them.

What are common conventions for using namespaces in Clojure?

I'm having trouble finding good advice and common practices for the use of namespaces in Clojure. I realize that namespaces are not the same as Java packages so I'm trying to tease out the conventions in Clojure, which seem surprisingly hard to determine.
I think I have a pretty good idea how to split functions into clj files and even roughly how I'd want to organize those files into directories. But beyond that I'm having trouble finding the mechanics for my dev environment. Some inter-related questions:
Do I use the same uniqueness conventions for Clojure namespaces as I would normally use for Java packages? [ie backwards-company-domain.project.subsystem]
Should I save my files in a directory structure that matches my namespaces? [ala Java]
If I have multiple namespaces, do I need to compile all of my code into a jar and add it to my classpath to make it accessible?
Should each namespace compile to one jar? Or should I create a single jar that contains clj code from many namespaces?
Thanks...
I guess it's ok if you think it helps, but many Clojure projects don't do so -- cf. Compojure (using a top-level compojure ns and various compojure.* ns's for specific functionality), Ring, Leiningen... Clojure itself uses clojure.* (and clojure.contrib.* for contrib libraries), but that's a special case, I suppose.
Yes! You absolutely must do so, or else Clojure won't be able to find your namespaces. Also note that you mustn't use the underscore in namespace names or the hyphen in filenames; wherever you use a hyphen in a namespace name, you must use an underscore in the filename (so that the ns my.cool-project is defined in a file called cool_project.clj in a directory called my).
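So, for that example, the on-disk layout would be:

    src/
        my/
            cool_project.clj    ;; defines (ns my.cool-project)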
You need to make sure all your stuff is on the classpath, but it doesn't matter if it's in a jar, multiple jars, a mixture of jars and directories on the filesystem... As long as it obeys the correct naming conventions (your point no. 2) you should be fine.
However, do not compile things ahead-of-time if there's no particular reason to do so -- this may prevent your code from being portable across various versions of Clojure without providing any benefits besides a slightly improved loading time.
You'll still need to use AOT compilation sometimes, notably in some Java interop scenarios -- the documentation of the relevant functions / macros always mentions that. There are examples of things requiring AOT in clojure.contrib; I've never needed it, so I can't provide much in the way of details.
I'd say you should use jars for functional units of code. E.g. Compojure and Ring get packaged as single jars containing many namespaces which together compose the whole package. Also, clojure.contrib is notably packaged as a single jar with multiple unrelated libraries; but that again may be a special case.
On the other hand, a single jar containing all of your project's code together with its dependencies might occasionally be useful for deployment. Check out the Leiningen build tool and its 'uberjar' facility if you think that sort of thing may be useful to you.
Strictly speaking, not necessary, though many Java projects have dropped that convention as well, especially for internal projects or private APIs. Do avoid single-segment namespaces though, which would result in classfiles being generated in the default package.
Yes.
Regarding 3 & 4, packaging and AOT compilation are entirely orthogonal to the question of namespace conventions.