Why make global Lua functions local?

I've been looking at some Lua source code, and I often see things like this at the beginning of the file:
local setmetatable, getmetatable = setmetatable, getmetatable  -- and so on for other globals
Do they only make the functions local to let Lua access them faster when often used?

Local variables live on the stack, so they are indeed accessed faster than globals, which require a table lookup. However, I seriously doubt that the time spent looking up setmetatable is actually a significant issue for any program.
Here are the possible explanations for this:
1. Prevention from polluting the global environment. The modern Lua convention for modules is not to register themselves directly in the global table; a module should build a local table of functions and return it (see the sketch just after this list). Thus, the only way to access the module is through a local variable. This forces a number of things:
   - One module cannot accidentally overwrite another module's functions.
   - If a module does accidentally do this, the original functions in the table returned by the module will still be accessible. Only by using local modname = require "modname" will you be guaranteed to get exactly and only what that module exposed.
   - Modules that include other modules can't interfere with one another. The table you get back from require is always what the module stores.
2. A premature optimization by someone who read "local variables are accessed faster" and then decided to make everything local.
In general, this is good practice. Well, unless it's because of #2.

In addition to Nicol Bolas's answer, I'd add on to the 3rd point:
It allows your code to be run from within a sandbox after it's been loaded.
If functions such as setmetatable have been excluded from the sandbox and the code is loaded from within the sandbox, then it won't work. But if the code is loaded first, the sandbox can still call the loaded code while excluding setmetatable, etc., from the environment it exposes.
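As a rough sketch of that load-then-sandbox ordering, assuming Lua 5.2+ where load accepts an environment table (the file name and the hypothetical mymodule from the sketch above are reused):

-- Load the module's source outside the sandbox, while setmetatable is still visible;
-- the module captures it as a local/upvalue at this point.
local chunk = assert(loadfile("mymodule.lua"))
local mymodule = chunk()

-- A restricted environment that deliberately omits setmetatable.
local sandbox = { print = print, mymodule = mymodule }

-- Code loaded inside the sandbox cannot see setmetatable, but mymodule still works.
local user_code = assert(load([[print(mymodule.describe(mymodule.new("x")))]], "user", "t", sandbox))
user_code()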

I do it because it allows me to see the functions used by each of my modules.
Additionally, it protects you from others changing the functions in the global environment.
That it is a free (premature) optimisation is a bonus.

Another subtle benefit: It clearly documents which variables (functions, modules) are imported by the module. And if you are using the module statement, it enforces such declarations, because the global environment is replaced (so globals are not available).

Related

Clojurescript decoupled file & namespace?

I'm using reagent to build several alternate root components, only one of which will be mounted on any given page; definitely either/or. These have a degree of commonality in their makeup, hence it will be convenient to move what is common among them to a common namespace.
What would be ideal is if, in the file for each of these components, I had the option to switch namespace into common, add the defs particular to that component, then switch back, thus avoiding circular dependencies and any need for inheritance.
I recall this being possible (and wonderful) in Common Lisp, and it also seems possible in Clojure.
From the ClojureScript docs: ns must be the first form and can only be used once, and in-ns is only usable from the REPL.
I'm wondering if there's a way to achieve this kind of thing in clojurescript which is still eluding me.
If not I may need to reconsider my assumptions behind multiple alternate root components; the "many builds within one build" kind of idea, if that makes sense.
Update after some further experimentation and confusion:
Another option might be to split a single namespace across multiple files (is this possible?). I'm not sure which direction to turn here.
The fact that in reagent I am using atoms in the global namespace is what creates the need for circular dependencies if I use a separate namespace for common. Hence the thought of one global namespace, in which case multiple files might help. Or is the way forward one giant file and one namespace?
Update: I've realised there is a great tension between keeping all app state global (in my current case, multiple atoms) and passing app state around. My current pattern is: everything global, don't pass any of it around. Passing the necessary state as parameters to fns in the common namespace would solve the problem here (duh!), but then there's the question of what principles are being followed regarding app state. If I just added a param whenever I needed one, having started with the idea that everything was global, there'd be no real principle to it...
In ClojureScript, everything is pre-compiled into a single static JavaScript "executable", so there is nothing like the REPL you are used to in Clojure. Indeed, in CLJS the "Var" concept doesn't really exist after the compiler has run; Vars are just static (constant) variables and cannot be rebound.
Having said that, CLJS does emulate the behavior of Clojure dynamic variables via the binding form, so that may help you to reach your goal. As in CLJ, it creates what amounts to a (thread-local) global variable. This is a degenerate case in CLJS since there is only one thread. However, the source code looks identical to the CLJ case.
Another way to accomplish this is to just use a plain atom as a global variable so you don't have to pass a parameter around.
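A minimal sketch of both options; the namespace and names (app.state, *theme*, app-db) are invented for the example:

(ns app.state)

;; Option 1: a dynamic var, temporarily rebound with `binding`
(def ^:dynamic *theme* :light)

(defn render-label []
  (str "theme=" (name *theme*)))

(binding [*theme* :dark]
  (render-label))          ;; => "theme=dark"

;; Option 2: a plain atom used as a single global state container
(defonce app-db (atom {:user nil}))

(defn login! [user]
  (swap! app-db assoc :user user))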
As always with a global variable, this reduces the number of parameters in function call trees, but it creates invisible dependencies between different parts of the code. Sometimes convenient, but usually a bad tradeoff.

tcl - when to use package+namespace vs interp?

I'm just starting with Tcl and trying to get my head around how best to define and integrate modules. There seems to be a lot of effort put into the package+namespace concept, but from what I can tell interp is more powerful and leaner for every conceivable scenario, in particular when it comes to hiding and renaming procedures, and also the lack of creep in the global namespace. The only reason to use package+namespace seems to be that "once upon a time Sun said so".
When should I ever use package+namespace instead of interp?
Namespaces and packages work together. Interpreters are something else.
A namespace is a small scale naming context in Tcl. It can contain commands, variables and other namespaces. You can refer to entities in a namespace via either local names (foo) or via qualified names (bar::foo); if a qualified name starts with ::, it is relative to the (interpreter-)global namespace, and can be used to refer to its command or variable from anywhere in the interpreter. (FWIW, the TclOO object system builds extensively on top of namespaces; there is one namespace per object.)
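A tiny sketch of namespace qualification (the counter namespace is invented for the example):

namespace eval counter {
    variable n 0
    proc bump {} {
        variable n
        incr n
    }
}

counter::bump              ;# qualified name, resolved relative to the current namespace
puts $counter::n           ;# => 1
puts [::counter::bump]     ;# fully qualified, usable from anywhere => 2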
A package is a high-level concept for a bunch of code supplied by some sort of library. Packages have abstract names (the name do not have to correspond to how the library's implementation is stored on disk) and a distinct version; you can ask for a particular version if necessary, though most of the time you don't bother. Packages can be implemented by multiple mechanisms, but they almost all come down to sourceing some number of Tcl scripts and loading some number of DLLs. Almost all packages declare commands, and they conventionally are encouraged to put those commands in a namespace with the same general name as the package. However, quite a few older packages do not do this for various reasons, mostly to do with compatibility with existing code.
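For illustration, a rough sketch of how a small script-only package is typically wired up (the name mylib is invented; pkgIndex.tcl files are usually generated with pkg_mkIndex):

# pkgIndex.tcl -- tells Tcl how to load version 1.0 of "mylib" on demand
package ifneeded mylib 1.0 [list source [file join $dir mylib.tcl]]

# mylib.tcl -- the implementation declares its commands in a matching namespace
namespace eval mylib {
    namespace export greet
    proc greet {who} { return "hello, $who" }
}
package provide mylib 1.0

# client code
package require mylib 1.0
puts [mylib::greet world]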
An interpreter is a security context in Tcl. By default, Tcl creates one interpreter (plus another if it sets up the console window in wish). Named entities in one interpreter are completely distinct from named entities in another interpreter with a few key exceptions:
- Channels have common names across all interpreters. This means that an interpreter can talk about channels owned by another interpreter, but merely being able to mention its name doesn't give permission to access the channel. (The stdin, stdout and stderr channels are shared by default.)
- The interp alias command can be used to make alias commands, which are such that invoking a command (the alias) in one interpreter can cause a command (the implementation) in another interpreter to be called, with all arguments safely passed over. This allows one interpreter to expose whatever special calls it wants another interpreter to access without losing control, but it is up to the implementation of those commands to act safely on those arguments.
A safe interpreter is one with the unsafe commands of Tcl hidden by default (things like open, socket, source, load, cd, etc.). The parent interpreter that created the safe child interpreter can use the alias mechanism to add in exactly the functionality desired; it's very much analogous to an OS system call, except that you can easily make your own application-specific ones.
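A rough sketch of that pattern (the readConfig command and the app.conf file name are invented):

# Create a safe child interpreter; unsafe commands such as open are hidden inside it.
set child [interp create -safe]

# A deliberately narrow operation implemented in the trusted parent interpreter.
proc readConfig {} {
    set f [open "app.conf" r]
    set data [read $f]
    close $f
    return $data
}

# Expose it to the child under the same name; arguments are passed over safely.
interp alias $child readConfig {} readConfig

# The child can call readConfig but cannot open arbitrary files itself.
puts [$child eval { string length [readConfig] }]

interp delete $child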
Tcl's threading package is designed to create one interpreter per thread (and the aliasing mechanism does not work across threads). That means that there's very little in the way of shared resources by default, and inter-thread communication is done via queued message passing.
In general, packages are required at most once per interpreter and are how you are recommended to access most third-party functionality. Namespaces are fairly lightweight and are used for all sorts of things, and interpreters are considered to be expensive; lots of quite thoroughly production-grade Tcl scripts only ever work with a single interpreter. (Threads are even more expensive than interpreters; it's good practice to match the number of threads you create to the hardware load that you wish to impose, probably through the use of suitable thread pools.)
The purpose of a module is to provide modular code, i.e. code that can easily be used by applications beyond the module writer's knowledge and control, and that encapsulates its own internals.
Package/namespace-based and interpreter-based modules are probably equally good at encapsulation, but it's not as easy to make interpreter-based modules that play well with arbitrary applications (though it is of course possible).
My own opinion is that interpreters are application level (I mostly use them for user input and for controlled evaluation), not module level. Both namespaces and packages have their warts, but in most cases they do what is expected of them with a minimum of fuss.
My recommendation is that if you are writing modules for your own benefit and interpreters serve you well, by all means use them. If you write modules that other people are to use, possibly including yourself in 18 months, you should stick with namespaces and packages.

Tcl extensions: Life Cycle of extensions' ClientData

Non-trivial native extensions will require per-interpreter data structures that are dynamically allocated. I am currently using Tcl_SetAssocData, with a key corresponding to the extension's name and an appropriate deletion routine, to prevent this memory from leaking away.
However, Tcl_PkgProvideEx also allows one to record such information, which can be retrieved with Tcl_PkgRequireEx. Associating the extension's data structures with its package seems more natural than using the "grab-bag" AssocData, yet the Pkg*Ex routines do not provide an automatically invoked deletion routine, so I think I need to stay with the AssocData approach.
For which situations were the Pkg*Ex routines designed?
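For reference, a minimal sketch of the AssocData approach described above (the extension name myext and its state struct are invented):

#include <tcl.h>

/* Hypothetical per-interpreter state for the extension. */
typedef struct ExtState {
    int counter;
} ExtState;

/* Deletion routine: invoked automatically when the interpreter is deleted. */
static void
ExtStateDelete(ClientData clientData, Tcl_Interp *interp)
{
    (void) interp;
    ckfree((char *) clientData);
}

int
Myext_Init(Tcl_Interp *interp)
{
    if (Tcl_InitStubs(interp, "8.6", 0) == NULL) {
        return TCL_ERROR;
    }
    ExtState *state = (ExtState *) ckalloc(sizeof(ExtState));
    state->counter = 0;

    /* Keyed by the extension name; freed by ExtStateDelete on interp deletion. */
    Tcl_SetAssocData(interp, "myext", ExtStateDelete, state);

    return Tcl_PkgProvide(interp, "myext", "1.0");
}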
Additionally, the Tcl library allows one to install ExitHandlers and ThreadExitHandlers. Paraphrasing the manual, these are for flushing buffers to disk and the like. Are there any other situations that require the use of ExitHandlers? When Tcl calls exit, are Tcl_PackageUnloadProcs called?
The whole-extension ClientData is intended for extensions that want to publish their own stub table (i.e., an organized list of functions that represent an exact ABI) that other extensions can build against. This is a very rare thing to want to do; leave at NULL if you don't want it (and contact the Tcl core developers' mailing list directly if you do; we've got quite a bit of experience in this area). Since it is for an ABI structure, it is strongly expected to be purely static data and so doesn't need deletion. Dynamic data should be sent through a different mechanism (e.g., via the Tcl interpreter or through calling functions via the ABI).
Exit handlers (which can be registered at multiple levels) are things that you use when you have to delete some resource at an appropriate time. The typical points of interest are when an interpreter (a Tcl_Interp structure) is being deleted, when a thread is being deleted, and when the whole process is going away. What resources need to be specially deleted? Well, it's usually obvious: file handles, database handles, that sort of thing. It's awkward to answer in general as the details matter very much: ask a more specific question to get tailored advice.
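As a small illustration of a process-level exit handler (the log file and function names are invented; Tcl_CreateThreadExitHandler works analogously per thread):

#include <stdio.h>
#include <tcl.h>

/* Hypothetical resource that must be flushed and closed exactly once at exit. */
static FILE *logFile = NULL;

static void
CloseLogAtExit(ClientData clientData)
{
    (void) clientData;
    if (logFile != NULL) {
        fflush(logFile);
        fclose(logFile);
        logFile = NULL;
    }
}

void
InitLogging(void)
{
    logFile = fopen("myext.log", "a");
    Tcl_CreateExitHandler(CloseLogAtExit, NULL);
}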
However, package unload callbacks are only called in response to the unload command. Like package load callbacks, they use “special function symbol” registration, and if they are absent then the unload command will refuse to unload the package. Most packages do not use them. The use case is where there are very long-lived processes that need to have extra upgradeable functionality added to them.

Is it bad practice to use the system() function when library functions could be used instead? Why?

Say there is some functionality needed for an application under development that could be achieved either by making a system call to a command-line program or by utilizing a library. Assuming efficiency is not an issue, is it bad practice to simply make a system call to a program instead of utilizing a library? What are the disadvantages of doing this?
To make things more concrete, an example of this scenario would be an application that needs to download a file from a web server; either the cURL program or the libcurl library could be used for this.
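To ground the example, a rough sketch of the library route with libcurl (error handling abbreviated; libcurl's default write callback writes to the FILE * supplied via CURLOPT_WRITEDATA):

#include <curl/curl.h>
#include <stdio.h>

/* Download url to path with libcurl instead of spawning the curl program. */
int download(const char *url, const char *path) {
    FILE *out = fopen(path, "wb");
    if (out == NULL) return 1;

    curl_global_init(CURL_GLOBAL_DEFAULT);   /* strictly, once per program */
    CURL *curl = curl_easy_init();
    if (curl == NULL) { fclose(out); return 1; }

    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);

    CURLcode rc = curl_easy_perform(curl);

    curl_easy_cleanup(curl);
    fclose(out);
    return rc == CURLE_OK ? 0 : 1;
}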
Unless you are writing code for only one OS, there is no way of knowing if your system call will even work. What happens when there is a system update or an OS upgrade?
Never use a system call if there is a library to do the same function.
I prefer libraries because of the dependency issue, namely the executable might not be there when you call it, but the library will be (assuming external library references get taken care of when the process starts on your platform). In other words, using libraries would seem to guarantee a more stable, predictable outcome in more environments than system calls would.
There are several factors to take into account. One key one is the reliability of whether the external program will be present on all systems where your software is installed. If there is a possibility that it will be missing, then maybe it is better to do it inside your program.
Weighing against that, you might consider that the extra code loaded into your program is prohibitive - you don't need the code bloat for such a seldom-used part of your application.
The system() function is convenient, but dangerous, not least because it usually invokes a shell. You may be better off calling the program more directly - on Unix, via the fork() and exec() system calls. [Note that a system call is very different from calling the system() function, incidentally!] OTOH, you may need to worry about ensuring all open file descriptors in your program are closed - especially if your program is some sort of daemon running on behalf of other users; that is less of a problem if you are not using special privileges, but it is still a good idea not to give the invoked program access to anything you did not intend. You may need to look at the fcntl() system call and the FD_CLOEXEC flag.
Generally, it is easier to keep control of things if you build the functionality into your program, but it is not a trivial decision.
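A rough sketch of the fork()/exec() route on Unix, sticking with the cURL example from the question (the output file name and URL are placeholders); any descriptor you mark with FD_CLOEXEC will not leak into the child:

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "curl -s -o out.html https://example.com/" without going through a shell.
   The argument vector is fixed, so there are no shell-quoting issues. */
int run_curl(void) {
    char *const argv[] = {"curl", "-s", "-o", "out.html",
                          "https://example.com/", NULL};
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {                      /* child */
        execvp("curl", argv);
        perror("execvp");                /* reached only if exec fails */
        _exit(127);
    }
    int status;                          /* parent: wait for the child */
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}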
Security is one concern. A malicious cURL could cause havoc in your program. It depends if this is a personal program where coding speed is your main focus, or a commercial application where things like security play a factor.
System calls are much harder to make safely.
All sorts of funny characters need to be correctly encoded to pass arguments in, and the types of encoding may vary by platform or even version of the command. So making a system call that contains any user data at all requires a lot of sanity-checking and it's easy to make a mistake.
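For instance, a deliberately naive sketch of the problem (filename is user-supplied):

#include <stdio.h>
#include <stdlib.h>

/* DANGEROUS: if filename is "foo; rm -rf ~", the shell happily runs both commands. */
void show_file_naive(const char *filename) {
    char cmd[1024];
    snprintf(cmd, sizeof cmd, "cat %s", filename);
    system(cmd);
}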
Yeah, as mentioned above, keep in mind the difference between system calls (like fcntl() and open()) and system() calls. :)
In the early stages of prototyping a C program, I often make external calls to programs like grep and sed for manipulation of files using popen(). It's not safe, it's not secure, and it's certainly not portable. But it can allow you to get going quickly. That's valuable to me. It lets me focus on the really important core of the program, usually the reason I used C in the first place.
In high level languages, you'd better have a pretty good reason. :)
Instead of doing either, I'd Unix it up and build a script framework around your app, using the command line arguments and stdin.
Others have mentioned good points (reliability, security, safety, portability, etc.) - but I'll throw out another: performance. Generally it is many times faster to call a library function, or even spawn a new thread, than it is to start an entire new process (and then you still have to correctly check/verify its execution and parse its output!)

Is the use of a config file equivalent to the use of globals?

I've read many times, and agree with, the advice to avoid globals in order to keep code orthogonal. Is using a config file to keep read-only information that your program uses similar to using globals?
If you're using config files in place of globals, then yes, they are similar.
Config files should only be used in cases where the end-user (presumably a computer-savvy user, like a developer) needs to declare settings for an application or piece of code, while keeping their hands out of the code itself.
My first reaction would be that it is not the same. I think the problem with globals is the read+write scenario; config files are read-only (at least in terms of execution).
In the same way, constants are not considered bad programming practice. Config files, at least in the way I use them, are just easily changeable constants.
Well, since a config file and a global variable can both have the effect of propagating changes throughout a system - they are roughly similar.
But... in the case of a configuration file that change is usually going to take place in a single, highly-visible (to the developer) location, and global variables can affect change in very sneaky and hard to track down ways -- so in this way the two concepts are not similar.
Having a configuration file usually helps with DRY concepts, and it shouldn't hurt the orthogonality of the system, either.
Bonus points for using the $25 word 'orthogonal'. I had to look that one up in Wikipedia to find out the non-Euclidean definition.
Configuration files are really meant to be easily editable by the end user as a way of telling the program how to run.
A more specialized form of configuration files, user preferences, are used to remember things between program executions.
A global relates to a unique object instance that never changes, whereas a config file is used as a container of reference values for objects within the application that can change.
One "global" object never changes during runtime; the other kind of object is initialized through the config file, but can change later on.
Actually, those objects can not only change during the lifetime of the application, they can also monitor the config file in order to support "hot" changes (modifying their value without stopping/restarting the application) when that config file is modified.
They are absolutely not the same, nor replacements for each other. A config file, or config object, can be used non-globally, i.e. passed explicitly.
You can of course have a global variable that refers to a config object, but that would be defeating the purpose.
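A minimal sketch of that explicit-passing style, in Python (the file name app.json and its keys are invented for the example):

import json

def load_config(path):
    with open(path) as f:
        return json.load(f)          # e.g. {"endpoint": "...", "retries": 3}

def fetch(endpoint, retries):
    # Placeholder for real work; shown only to illustrate the call shape.
    return (endpoint, retries)

def run(config):
    # The dependency on configuration is visible in the signature,
    # instead of hiding behind a module-level global.
    return fetch(config["endpoint"], config["retries"])

if __name__ == "__main__":
    run(load_config("app.json"))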