What are some examples of a dynamically scoped language? And what are the reasons for choosing that design? Is it because it is easy to implement?
Mathematica is another language that is dynamically scoped, via the Block construct. This is actually quite useful when working with formulas. It allows you to write things like
In[1]:= expr = a*t^2 + b*t + c;
In[2]:= Block[{a = 1, b = -1, c = 2}, Table[expr, {t, 5}]]
Out[2]= {2, 4, 8, 14, 22}
which wouldn't work at all if variables like a and t were scoped lexically. It works particularly nicely with Mathematica's rule-rewriting system, which will, among other things, leave variables unevaluated (as symbolic expressions) if it doesn't have an existing definition for them.
Mathematica can fake lexical scoping with the Module construct, but what this really does is rewrite the expression in terms of a new, allegedly unique symbol (you can cause clashes if you predict what the next unique symbol will be, which is easy in most cases). This means
Module[{x = 4},
Table[x * t, {t, 5}]]
will be turned into something like this:
Block[{x$134 = 4},
Table[x$134 * t, {t, 5}]]
Emacs Lisp, in one of its libraries, has a construct (really a Lisp macro) called lexical-let that pulls exactly the same trick to fake lexical scoping.
There are performance advantages to real lexical scoping when you're compiling your language that you don't get with the fake lexicals of ELisp or Mathematica: a dynamic variable needs some mapping between the variable and its current value, which means doing lookups (through a hash table or property list or something) and additional layers of indirection.
EDIT: If you have only lexical variables, you can fake dynamic scoping by storing the original value of a global, lexical variable on entering the scope and guaranteeing that the old value is restored upon exiting the scope. In order to ensure that, you'll need something like Lisp's UNWIND-PROTECT or a finally block. I've seen this done using C++ destructors as well, mostly as an exercise.
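As a rough illustration of that save-and-restore trick, here is a minimal Python sketch (the variable and function names are made up for the example); the try/finally plays the role of UNWIND-PROTECT:

import contextlib

current_output = "console"        # a global we want to rebind "dynamically"

@contextlib.contextmanager
def binding(value):
    global current_output
    saved = current_output        # remember the outer value on entry
    current_output = value        # establish the new binding
    try:
        yield
    finally:
        current_output = saved    # restore the old value even if an exception escapes

def report():
    print("writing to", current_output)

report()                          # writing to console
with binding("logfile"):
    report()                      # writing to logfile
report()                          # writing to console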
Dynamically scoped languages are much easier to implement. To access variables that are not in the current activation record / stack frame, one just follows the control links. Static/lexical access links are then not needed, making stack frames smaller.
Dynamic variables can be "unpredictable" at runtime, because one needs to know in which order the actual stack frames occur to know which variable will be used. This information is not available just by looking at the static structure of the code. One could quite easily get caught out if the actual call graph of the program is not easy to predict at implementation time. That's why most languages today have static scoping (most exception systems, however, are dynamic, as this is the most practical choice).
However, in some cases dynamically scoped variables are very useful. For example, when redirecting output, you can use a dynamic variable to set standard output for local code and all code called from there on.
(let ((*standard-output* *some-other-stream*))
(stuff))
In this Common Lisp example (from Seibel), standard output is bound to another stream for the duration of the let form (inside its enclosing parens). When execution leaves the let, it goes back to whatever it was beforehand. See Peter Seibel's free and excellent book Practical Common Lisp, http://gigamonkeys.com/book/variables.html, for a good discussion. In Seibel's own words:
Dynamic bindings make global variables much more manageable, but it's important to notice they still allow action at a distance. Binding a global variable has two at a distance effects--it can change the behavior of downstream code, and it also opens the possibility that downstream code will assign a new value to a binding established higher up on the stack. You should use dynamic variables only when you need to take advantage of one or both of these characteristics.
Well, there are a bunch of websites that discuss the pros and cons, so I'm not going there.
One interesting language with features that faintly resemble dynamic scope is XSLT. Although XSLT's templates, variables and the like are lexically scoped, XSLT is of course all about XML, and the current position in the XML tree is "dynamically scoped" in the sense that the context node is global, so XPath expressions are evaluated not according to XSLT's lexical scope but according to its dynamic evaluation.
Dynamic scope is/was easier to implement with interpreters. Most early Lisp interpreters used dynamic scope. After several years lexical scope was found to have an advantage, but it was at first mostly implemented in Lisp compilers. Several implementations appeared that used dynamic scope in interpreted code and lexical scope in compiled code. Some provided a special language construct to provide closures. Lisp dialects like Scheme and Common Lisp then required that there be no difference between interpreted and compiled code, so interpreter-based implementations had to implement lexical scope, too.
Early Smalltalk implementations implemented dynamic scope. All kinds of Lisp dialect implementations implemented dynamic scope (Interlisp, UCI Lisp, Lisp Machine Lisp, MacLisp, ...).
Almost all new Lisp dialects from the last 20 years use lexical scope by default or even exclusively. Several publications have described in detail how to implement Lisp with lexical scope - so there is no excuse not to use lexical scope.
All the shell languages (bash, ksh, etc.) use dynamic scoping.
Related
Let's say I have a class foo with a variable bar. Now I want that, if there is a class moo which has class foo as a superclass, any attempt to write to, or better yet even refer directly to, bar results in an error. This could prevent situations where someone using my code (which could be compiled to byte-code) accidentally overrides bar by having their own variable with the same name.
TclOO simply does not support the concept. Classes are not security boundaries in TclOO, just as namespaces are not security boundaries in plain Tcl (TclOO objects are really just fancy namespaces). Tcl's security boundaries are between interpreters, and between the Tcl script level and the (usually) C implementation level. We're considering adding “private” instance variables for Tcl 8.7, but even those won't be truly private; their names will still be predictable if you know how (and they will be accessible from outside the class; that's important for when using the variable with third-party code such as Tk). To reiterate: classes are not security boundaries.
If you have something that must be locked out of sight, it is easiest to implement it in C. You can plug in methods implemented in C into TclOO (applying whatever controls you can think of) and those methods can use the (C level only) metadata mechanism to create instance-attached storage that they can use. All the callbacks are in place to do deletion correctly at the right time. Methods in C are not much more complicated than commands in C; the function callback signature is a little different and the usage is a bit more complicated (because there are other standard operations on methods such as copying them) but if you can do one, you can figure out how to do the other too.
I'm new to Julia, and I'm trying to understand, at the language level, what ccall is. At the syntax level, it looks like a normal function, but it clearly doesn't behave the same way in how it takes its arguments:
Note that the argument type tuple must be a literal tuple, and not a
tuple-valued variable or expression.
Additionally, if I evaluate a variable bound to a function in the Julia REPL, I get something like
julia> max
max (generic function with 15 methods)
But if I try to do the same with ccall:
julia> ccall
ERROR: syntax: invalid "ccall" syntax
Clearly, ccall is a special piece of syntax, but it's also not a macro (no # prefix, and invalid macro usage gives a more specific error). So, what is it? Is it something baked into the language, or something I could define myself with some language construct I'm not familiar with?
And if it is some baked-in piece of syntax, why was it decided to use function call notation, instead of implementing it as a macro or designing a more readable and distinct syntax?
In the current nightly (and thus, upcoming 0.6 release), much of the special behavior you observe has been removed (see this pull-request). ccall is no longer a reserved word, so it can be used as a function or macro name.
However there is still a slight oddity: defining a 3- or 4-argument function called ccall is allowed, but actually calling such a function will give an error about ccall argument types (other numbers of arguments are ok). The reasons go directly to your question:
So, what is it? Is it something baked into the language
Yes, ccall, though it will no longer be a keyword in 0.6, is still "baked in" to the language in several ways:
the :ccall([four args...]) expression form is recognized and specially handled during syntax lowering. This lowering step does several things including wrapping arguments in a call to unsafe_convert, which allows for customized conversion from Julia objects to C-compatible objects; as well as pulling out arguments that might need to be rooted to prevent garbage collection of a referenced object during the ccall. (see code_lowered output, or try the expand function; more info on the compiler here).
ccall requires extensive handling in the code generation backend, including: look-up of the requested function name in the specified shared library, and generation of an LLVM call instruction -- which is eventually translated to platform-specific machine code by the LLVM Just-In-Time compiler. (see the different stages with code_llvm and code_native).
And if it is some baked-in piece of syntax, why was it decided to use
function call notation, instead of implementing it as a macro or
designing a more readable and distinct syntax?
For the reasons detailed above, ccall requires special handling whether it looks like a macro or a function. In this mailing list thread, one of the Julia creators (Stefan Karpinski) commented on why not to make it a macro:
I suppose we could reimplement it as a macro, but that would really just be pushing the magic further down.
As far as "a more readable and distinct syntax", perhaps that is a matter of taste. It's not clear to me why some other syntax would be preferable (except for the convenience of a LuaJIT/CFFI-style inline C syntax parsing, of which I am a fan). My only strong personal wish for ccall would be to have arguments and types entered adjacent (e.g. ccall((:foo, :libbar), Void, (x::Int, y::Float))), because working with longer argument lists can be inconvenient. In 0.6 it will be possible to implement this form as a macro!
In Julia 0.5 and earlier.
It is not a function and it is not a macro.
It is indeed something special baked into the language.
It is an Intrinsic.
In Julia 0.6 this changes.
In a lot of ways it is more like a macro than a function call.
But in other ways it is not -- it does not return an AST.
It does call a function, and on a low enough level it looks similar to calling a Julia function.
The history of why it looks the way it does is beyond me; you'd need to hear from one of the people who worked on the earliest code for the language.
Right now it is everywhere, and is one of the harder things to change -- but not impossible. It would trigger up to 3 years of bikeshedding though :-P .
I like to think of ccall as being two things.
Foreign Function Interface, for C and other compiled languages (e.g. Fortran and Rust apparently work)
Way to access the raw guts of the language "runtime".
Foreign Function Interface (FFI)
Most of the time when one uses ccall in a package, one wants to invoke some code that is in a compiled library. In this sense it is C-Call, like R-Call, or Py-Call.
I think mlewe/BlossomV.jl is a nice compact example.
For a more intense example, see oxinabox/SLEEF.jl.
As an FFI, it does not have to share memory space / a process with Julia -- PyCall.jl does; RCall.jl and Matlab.jl don't.
It doesn't matter as long as the result comes back.
In these cases it is theoretically possible to replace ccall with some kind of safe_ccall which would run the called library in a separate process, and would not segfault julia if the library being called segfaulted.
But as of yet, no-one has written such a method/package.
Using ccall for FFI is even done in Base, like for accessing MPFR to define BigFloat.
But this is not the main reason ccall is used in Base.
Accessing the guts of the language.
ccall is really what drives a large portion of the program "doing a thing".
It is used throughout Base, to call the functions from src.
For this, ccall basically triggers a function call at the compiled level, that shifts the instruction pointer directly into the compiled code of the ccalled function. Like calling a function would if the whole thing had been written in say C.
You can see in base/threadingconstructs.jl ccall being used to manage work on threads -- that triggers code from src/threading.c.
It is used to map a section of disk to memory. mmap.jl. -- obviously can't be done from another process.
It is used to make a section of code non-interruptible.
It is used to call libc to do things like malloc to allocate memory (though right now this is mostly used as part of FFI).
There are tricks you can do with ccall to #undef a variable after it has already been assigned.
ccall is in many ways the "master" key to the language.
Conclusion
I've described ccall here as two things: an FFI function and a core part of the language "runtime". This duality is not real, and there is plenty of overlap, like file handling (is it FFI?).
The behaviour many expect ccall to have comes from its FFI uses.
Here ccall could just be a function.
The behaviour it actually has comes from its use as a core part of the language -- linking the Julia code of the standard library in Base to the low-level C code from src.
This allows very direct control over the running of the Julia process.
How are classes implemented in dynamic languages?
I know that JavaScript uses a prototype pattern (there is 'somewhere' a container of unbound JS functions, which are bound when called through an object), but I have no idea how it works in other languages.
I'm curious about this because I can't think of an efficient way to have native bound methods without wasting memory and/or CPU by copying members for each instance.
(By bound method, I mean that the following code should work:)
class Foo { function bar() : return 42; };
var test = new Foo();
var method = test.bar;
method() == 42;
This highly depends on the language and the implementation. I'll tell you what I know about CPython and PyPy.
The general idea, which is also what CPython does for the most part, goes like this:
Every object has a class, specifically a reference to that class object.
Apart from instance members, which are obviously stored in the individual object, the class also has members. This includes methods, so methods don't have a per-object cost.
A class has a method resolution order (MRO) determined by the inheritance relationships, wherein each base class occurs exactly once. If we didn't have multiple inheritance, this would simply be a reference to the base class, but this way the MRO is hard to figure out on the fly (you'd have to start from the most derived class every time).
(Classes are also objects and have classes themselves, but we'll gloss over that for now.)
If attribute lookup on an object fails, the same attribute is looked up on the classes in the MRO, in the order specified by the MRO. (This is the default behavior, which can be changed by defining magic methods like __getattr__ and __getattribute__.)
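To make those lookup rules concrete, here is a tiny Python sketch (the class names are invented for the example):

class Base:
    def ping(self):
        return "Base.ping"

class Left(Base): pass
class Right(Base): pass
class Child(Left, Right): pass

# The MRO linearizes the inheritance graph; each base class occurs exactly once.
print([c.__name__ for c in Child.__mro__])
# ['Child', 'Left', 'Right', 'Base', 'object']

c = Child()
print(c.ping())                    # not in the instance, so the MRO is searched; finds Base.ping

c.ping = lambda: "instance wins"   # instance attributes shadow (non-data) class attributes
print(c.ping())                    # "instance wins"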
So far so simple, and not really an explanation for bound methods. I just wanted to make sure we're talking about the same thing. The missing piece is descriptors. The descriptor protocol is defined in the "deep magic" section of the language reference, but the short and simple story is that lookup on a class can be hijacked by the object it results in via a __get__ method. More importantly, this __get__ method is told whether the lookup started on an instance or on the "owner" (the class).
In Python 2, we have an ugly and unnecessary UnboundMethod descriptor which (apart from the __get__ method) simply wraps the function to throw errors on Class.method(self) if self is not of an acceptable type. In Python 3, the __get__ is simply part of all function objects, and unbound methods are gone. In both cases, the __get__ method returns itself when you look it up on a class (so you can use Class.method, which is useful in a few cases) and a "bound method" object when you look it up on an object. This bound method object does nothing more than storing the raw function and the instance, and passing the latter as first argument to the former in its __call__ (special method overriding the function call syntax).
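A minimal sketch of what this looks like in Python 3, using only documented attributes of bound methods (__func__, __self__) and a by-hand lookup through the descriptor protocol (the Greeter class is invented for the example):

class Greeter:
    def greet(self, name):
        return "hello, " + name

g = Greeter()

plain = Greeter.greet      # looked up on the class: the plain function
bound = g.greet            # looked up on the instance: a bound method object

assert bound.__func__ is plain    # it stores the raw function...
assert bound.__self__ is g        # ...and the instance
assert bound("world") == Greeter.greet(g, "world") == "hello, world"

# The same lookup, spelled out through the descriptor protocol:
assert Greeter.__dict__["greet"].__get__(g, Greeter)("world") == "hello, world"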
So, for CPython: While there is a cost to bound methods, it's smaller than you might think. Only two references are needed space-wise, and the CPU cost is limited to a small memory allocation, and an extra indirection when calling. Note though that this cost applies to all method calls, not just those which actually make use of bound method features. a.f() has to call the descriptor and use its return value, because in a dynamic language we don't know if it's monkey-patched to do something different.
In PyPy, things are more interesting. As it's an implementation which doesn't compromise on correctness, the above model is still correct for reasoning about semantics. However, it's actually faster. Apart from the fact that the JIT compiler inlines and then eliminates the entire mess described above in most cases, they also tackle the problem on bytecode level. There are two new bytecode instructions, which preserve the semantics but omit the allocation of the bound method object in the case of a.f(). There is also a method cache which can simplify the lookup process, but requires some additional bookkeeping (though some of that bookkeeping is already done for the JIT).
I found it interesting to read about one of the ways you can do functional dynamic dispatch in SICP - using a table of type tag + name -> functions that you can fetch from or add to.
I was wondering, is this a typical type dispatch mechanism for a dynamic non-OO language?
Also, what would be the typical way to monkey patch this: using a chained list of tables (if you don't find it in the first table, try the next table recursively)? Rebinding the table within local scope to a modified copy? etc.?
I believe this is a typical type dispatch mechanism, even for non-dynamic non-OO languages, based on this article about the JHC Haskell compiler and how it implements type classes. The implication in the article is that most Haskell compilers implement type classes (a kind of type dispatch) by passing dictionaries. His alternative is direct case analysis, which likely would not be applicable in dynamically typed languages, since you don't know ahead of time what the types of the constituents of your expression will be. On the other hand, this isn't extensible at run-time either.
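To make the dictionary-passing idea concrete, here is a rough Python sketch (all names invented for the example); the "type class" is just a record of functions passed explicitly to generic code:

eq_int = {"eq": lambda a, b: a == b}
eq_str_ci = {"eq": lambda a, b: a.lower() == b.lower()}   # case-insensitive Eq for strings

def element_of(eq_dict, x, xs):
    # Generic membership test, parameterized by an Eq dictionary.
    return any(eq_dict["eq"](x, y) for y in xs)

print(element_of(eq_int, 3, [1, 2, 3]))              # True
print(element_of(eq_str_ci, "Foo", ["bar", "foo"]))  # True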
As for dynamic non-OO languages, I'm not aware of many examples outside Lisp/Scheme. Common Lisp's CLOS makes Lisp a proper OO language and provides dynamic dispatch as well as multiple dispatch (you can add or remove generics and methods at run-time, and they can key off the type of more than just the first parameter). I don't know how this is usually implemented, but I do know that it is usually an add-on facility rather than a built-in facility, which implies it's using functionality available to the would-be monkey-patcher, and also that certain versions have been criticized for their lack of speed (CLISP, I think, but they may have resolved this). Of course, you could implement this type of parallel dispatch mechanism within an OO language as well, and you can probably find plenty of examples of that.
If you were using purely functional persistent maps or dictionaries, you could certainly implement this facility without even needing the chain of inherited maps; as you "modify" the map, you get a new map back, but all the existing references to the old map would still be valid and see it as the old version. If you were implementing a language with this facility, you could interpret it by putting the type->function map in the Reader monad and wrapping your interpreter in it.
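To make the table-plus-chaining idea from the question concrete, here is a rough Python sketch (all names invented); a persistent-map variant would return a new table instead of mutating one:

class DispatchTable:
    def __init__(self, parent=None):
        self.entries = {}              # (op name, type tag) -> function
        self.parent = parent           # fallback table for chained lookup

    def put(self, op, type_tag, fn):
        self.entries[(op, type_tag)] = fn

    def get(self, op, type_tag):
        if (op, type_tag) in self.entries:
            return self.entries[(op, type_tag)]
        if self.parent is not None:
            return self.parent.get(op, type_tag)
        raise LookupError("no entry for %s on %s" % (op, type_tag))

base = DispatchTable()
base.put("area", "circle", lambda r: 3.14 * r * r)

# "Monkey patch" by shadowing in a child table; the base table is untouched.
patched = DispatchTable(parent=base)
patched.put("area", "circle", lambda r: 3.141592653589793 * r * r)

print(base.get("area", "circle")(1.0))     # uses the original entry
print(patched.get("area", "circle")(1.0))  # uses the shadowing entry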
what is the purpose of namespaces?
and, more important, should they be used as objects in Java (things that have data and functions and that try to achieve encapsulation)? is this idea too far-fetched? :)
or should they be used as packages in Java?
or should they be used more generally as a module system or something?
Given that you use the Clojure tag, I suppose that you'll be interested in a Clojure-specific answer:
what is the purpose of namespaces?
Clojure namespaces, Java packages, Haskell / Python / whatever modules... At a very high level, they're all different names for the same basic mechanism whose primary purpose is to prevent name clashes in non-trivial codebases. Of course, each solution has its own little twists and quirks which make sense in the context of a given language and would not make sense outside of it. The rest of this answer will deal with the twists and quirks specific to Clojure.
A Clojure namespace groups Vars, which are containers holding functions (most often), macro functions (functions used by the compiler to generate macroexpansions of appropriate forms, normally defined with defmacro; actually they are just regular Clojure functions, although there is some magic to the way in which they are registered with the compiler) and occasionally various "global parameters" (say, clojure.core/*in* for standard input), Atoms / Refs etc. The protocol facility introduced in Clojure 1.2 has the nice property that protocols are backed by Vars, as are the individual protocol functions; this is key to the way in which protocols present a solution to the expression problem (which is however probably out of the scope of this answer!).
It stands to reason that namespaces should group Vars which are somehow related. In general, creating a namespace is a quick & cheap operation, so it is perfectly fine (and indeed usual) to use a single namespace in early stages of development, then as independent chunks of functionality emerge, factor those out into their own namespaces, rinse & repeat... Only the things which are part of the public API need to be distributed between namespaces up front (or rather: prior to a stable release), since the fact that function such-and-such resides in namespace so-and-so is of course a part of the API.
and, more important, should they be used as objects in Java (things that have data and functions and that try to achieve encapsulation)? is this idea too far-fetched? :)
Normally, the answer is no. You might get a picture not too far from the truth if you approach them as classes with lots of static methods, no instance methods, no public constructors and often no state (though occasionally there may be some "class data members" in the form of Vars holding Atoms / Refs); but arguably it may be more useful not to try to apply Java-ish metaphors to Clojure idioms and to approach a namespace as a group of functions etc. and not "a class holding a group of functions" or some such thing.
There is an important exception to this general rule: namespaces which include :gen-class in their ns form. These are meant precisely to implement a Java class which may later be instantiated, which might have instance methods and per-instance state etc. Note that :gen-class is an interop feature -- pure Clojure code should generally avoid it.
or should they be used as packages in Java?
They serve some of the same purposes packages were designed to serve (as already mentioned above); the analogy, although it's certainly there, is not that useful, however, just because the things which packages group together (Java classes) are not at all like the things which Clojure namespaces group together (Clojure Vars), the various "access levels" (private / package / public in Java, {:private true} or not in Clojure) work very differently etc.
That being said, one has to remember that there is a certain correspondence between namespaces and packages / classes residing in particular packages. A namespace called foo.bar, when compiled, produces a class called bar in the package foo; this means, in particular, that namespace names should contain at least one dot, as so-called single-segment names apparently lead to classes being put in the "default package", leading to all sorts of weirdness. (E.g. I find it impossible to have VisualVM's profiler notice any functions defined in single-segment namespaces.)
Also, deftype / defrecord-created types do not reside in namespaces. A (defrecord Foo [...] ...) form in the file where namespace foo.bar is defined creates a class called Foo in the package foo.bar. To use the type Foo from another namespace, one would have to :import the class Foo from the foo.bar package -- :use / :require would not work, since they pull in Vars from namespaces, which records / types are not.
So, in this particular case, there is a certain correspondence between namespaces and packages which Clojure programmers who wish to take advantage of some of the newer language features need to be aware of. Some find that this gives an "interop flavour" to features which are not otherwise considered to belong in the realm of interop (defrecord / deftype / defprotocol are a good abstraction mechanism even if we forget about their role in achieving platform speed on the JVM) and it is certainly possible that in some future version of Clojure this flavour might be done away with, so that the namespace name / package name correspondence for deftype & Co. can be treated as an implementation detail.
or should they be used more generally as a module system or something?
They are a module system and this is indeed how they should be used.
A package in Java has its own namespace, which provides a logical grouping of classes. It also helps prevent naming collisions. For example, in Java you will find java.util.Date and java.sql.Date - two different classes with the same name, differentiated by their namespace. If you try to import both into a Java file, you will see that it won't compile. At least one version will need to use its explicit namespace.
From a language-independent view, namespaces are a way to isolate things (i.e. to encapsulate, in a sense). It's a more general concept (see XML namespaces, for example). You can "create" namespaces in several ways, depending on the language you use: packages, static classes, modules and so on. All of these provide namespaces for the objects/data/functions they contain. This allows you to organize the code better and to isolate features, and it tends toward better code reuse and adaptability (as encapsulation does).
As stated in the "Zen of Python", "Namespaces are one honking great idea -- let's do more of those!"
Think of them as containers for your classes. As in, if you had a helper class for building strings and you wanted it in your business layer, you would use a namespace such as MyApp.Business.Helpers. This allows your classes to be contained in sensible locations, so that when you or someone else referencing your code wants to consume them, they can be located easily. For another example, if you wanted to consume a SQL connection helper class you would probably use something like:
MyApp.Data.SqlConnectionHelper sqlHelper = new MyApp.Data.SqlConnectionHelper();
In reality you would use a "using" statement so you wouldn't need to fully qualify the namespace just to declare the variable.