I discovered that a lot of "special forms" are just macros that use their asterisks version in the background (fn*, let* and all the others).
In case of fn, for example, it adds the destructuring capability into the mix which fn* alone does not provide. I tried to find some detailed documentation of what fn* can and can't do on its own, but I was not so lucky.
It definitely supports:
the &/catchall indicator
(fn* [x & rest] (do-smth-here...))
and curiously also the overloading on arity, as in:
(fn* ([x] (smth-with-one-arg ...)
([x y] (smth-with-two-args ...))
So my question finally is, why not only define:
(fn& [& all-args] ...)
which would be absolutely minimal and could provide all the arity selection via macros (checking size of parameter list, if/case statement to direct code path, bind first few parameters to desired symbol, etc..).
Is this for performance reasons? Maybe someone even has a link to the actual standard definition of the asterisks special forms handy.
Arity selection leverages the JVM virtual method dispatch: each arity (from 0 to 20 arguments) has its own method and there's a single method for 21+-arg arities.
You may notice the applyTo method which is the generic method akin to what you propose with fn&. It's implementation is just a giant switch to select the correct specialized method.
Yes, you could do all that as macros on top of the ultra-primitive fn& that you propose, and it would certainly simplify the compiler implementation. The reason this is not done is partly for performance reasons (it would be rather slow, and the JVM already has a fast facility for dispatching based on arity), and partly "cosmetic": it means that each arity of a function is a different method of the JVM class that a function is compiled down to, which makes stacktraces nicer. This also helps the JIT "understand" our functions better, so that it can optimize accordingly.
My guess would be that this is convenience/extensibility driven. The compiler (where fn* is actually "defined"/processed) is written in java and handles the minimum needed functionality in order to bootstrap the language while fn is a macro that builds on top of it. Same with some of the other forms. Somewhere there was a statement from Rich that he could rewrite the compiler from java to clojure but does not see the benefit (correct me if wrong).
Related
Let's I have a class foo, with a variable bar. Now... I want that if there is a class moo, which has class foo as a superclass, I want that any attempts to write to, or better yet, even refer directly to bar will be errored out. This could save situations when someone is using my code (which could be compiled to byte-code), to not override, by having one's own variable with the same name
TclOO simply does not support the concept. Classes are not security boundaries in TclOO, just as namespaces are not security boundaries in plain Tcl (TclOO objects are really just fancy namespaces). Tcl's security boundaries are between interpreters, and between the Tcl script level and the (usually) C implementation level. We're considering adding “private” instance variables for Tcl 8.7, but even those won't be truly private; their names will still be predictable if you know how (and they will be accessible from outside the class; that's important for when using the variable with third-party code such as Tk). To reiterate: classes are not security boundaries.
If you have something that must be locked out of sight, it is easiest to implement it in C. You can plug in methods implemented in C into TclOO (applying whatever controls you can think of) and those methods can use the (C level only) metadata mechanism to create instance-attached storage that they can use. All the callbacks are in place to do deletion correctly at the right time. Methods in C are not much more complicated than commands in C; the function callback signature is a little different and the usage is a bit more complicated (because there are other standard operations on methods such as copying them) but if you can do one, you can figure out how to do the other too.
I'm new to Julia, and I'm trying to understand, at the language level, what ccall is. At the syntax level, it looks like a normal function, but it clearly doesn't behave the same way in how it takes its arguments:
Note that the argument type tuple must be a literal tuple, and not a
tuple-valued variable or expression.
Additionally, if I evaluate a variable bound to a function in the Julia REPL, I get something like
julia> max
max (generic function with 15 methods)
But if I try to do the same with ccall:
julia> ccall
ERROR: syntax: invalid "ccall" syntax
Clearly, ccall is a special piece of syntax, but it's also not a macro (no # prefix, and invalid macro usage gives a more specific error). So, what is it? Is it something baked into the language, or something I could define myself with some language construct I'm not familiar with?
And if it is some baked-in piece of syntax, why was it decided to use function call notation, instead of implementing it as a macro or designing a more readable and distinct syntax?
In the current nightly (and thus, upcoming 0.6 release), much of the special behavior you observe has been removed (see this pull-request). ccall is no longer a reserved word, so it can be used as a function or macro name.
However there is still a slight oddity: defining a 3- or 4-argument function called ccall is allowed, but actually calling such a function will give an error about ccall argument types (other numbers of arguments are ok). The reasons go directly to your question:
So, what is it? Is it something baked into the language
Yes, ccall, though it will no longer be a keyword in 0.6, is still "baked in" to the language in several ways:
the :ccall([four args...]) expression form is recognized and specially handled during syntax lowering. This lowering step does several things including wrapping arguments in a call to unsafe_convert, which allows for customized conversion from Julia objects to C-compatible objects; as well as pulling out arguments that might need to be rooted to prevent garbage collection of a referenced object during the ccall. (see code_lowered output, or try the expand function; more info on the compiler here).
ccall requires extensive handling in the code generation backend, including: look-up of the requested function name in the specified shared library, and generation of an LLVM call instruction -- which is eventually translated to platform-specific machine code by the LLVM Just-In-Time compiler. (see the different stages with code_llvm and code_native).
And if it is some baked-in piece of syntax, why was it decided to use
function call notation, instead of implementing it as a macro or
designing a more readable and distinct syntax?
For the reasons detailed above, ccall requires special handling whether it looks like a macro or a function. In this mailing list thread, one of the Julia creators (Stefan Karpinski) commented on why not to make it a macro:
I suppose we could reimplement it as a macro, but that would really just be pushing the magic further down.
As far as "a more readable and distinct syntax", perhaps that is a matter of taste. It's not clear to me why some other syntax would be preferable (except for the convenience of a LuaJIT/CFFI-style inline C syntax parsing, of which I am a fan). My only strong personal wish for ccall would be to have arguments and types entered adjacent (e.g. ccall((:foo, :libbar), Void, (x::Int, y::Float))), because working with longer argument lists can be inconvenient. In 0.6 it will be possible to implement this form as a macro!
In Julia 0.5 and earlier.
It is not a function and it is not a macro.
It is indeed something special baked into the language.
It is an Intrinsic.
In julia 0.6 this changes
It a lot of ways it is more like a Macro than a function call.
But in other ways it is not -- it does not return an AST.
It does call a function and on a low enough level it looks similar to calling a julia function.
The history of why it looks the way it does is beyond me, you'ld need to hear from one of the people who worked on the earliest code for the language.
Right now it is everywhere, and is one of the harder things to change -- but not impossible. It would trigger up for 3 years of bikeshedding though :-P .
I like to think of ccall as being two things.
Foreign Function Interface, for C and other compiled languages (eg Fortran, Rust apparently work)
Way to access the raw guts of the language "runtime".
Foreign Function Interface (FFI)
Most of the time when one uses ccall in a package one wants to invoke some code that is in a compile library. In this sense it is C-Call, like R-Call, or Py-Call.
I think mlewe/BlossomV.jl is a nice compact example.
For a more intense example oxinabox/SLEEF.jl.
As an FFI, it does not have to share memory space/a process with julia -- PyCall.jl does, RCall.jl and Matlab.jl don't.
It doesn't matter as long as the result comes back.
In these cases it is theoretically possible to replace ccall with some kind of safe_ccall which would run the called library in a separate process, and would not segfault julia if the library being called segfaulted.
But as of yet, no-one has written such a method/package.
Using ccall for FFI is even done in Base, like for accessing MPFR to define BigFloat.
But this is not the main reason ccall is used in Base.
Accessing the guts of the language.
ccall is really what drives a large portion of the program "doing a thing".
It is used throughout Base, to call the functions from src.
For this, ccall basically triggers a function call at the compiled level, that shifts the instruction pointer directly into the compiled code of the ccalled function. Like calling a function would if the whole thing had been written in say C.
You can see in base/threadingconstructs.jl ccall being used to manage work on threads -- that triggers code from src/threading.c.
It is used to map a section of disk to memory. mmap.jl. -- obviously can't be done from another process.
It is used to make a section of code non-intruptable
It is used call LibC to do things like malloc to allocate memory (though right now this is mostly used as part of FFI).
There are tricks you can do with ccall to #undef a variable after it has already been assigned.
ccall is in many ways the "master" key to the language.
Conclusion
I've described ccall here as two things, a FFI function and a core part of the language "runtime". This duality is not real, and there is plenty of overlap, like filehandling (is it FFI?).
The behavour many expect ccall to have comes from its FFI uses.
Here ccall could just be a function.
The behaviour it actually has comes from it's use as a core part of the language -- linking the julia code of the standard library in Base to the low level C code from src.
Allowing the very direct control over the running of the julia process.
In C, if I want to see a function that how to work, I open the library which provides the function and analyze the code. How can be implementations of the lisp functions seen? For example, intersection function
You can also look at the source code of lisp functions.
For example, the source files for CLISP, one Common Lisp implementation, are available here: http://www.clisp.org/impnotes/src-files.html
If you want to examine the implementation of functions related to lists, you can look at the file: http://clisp.cvs.sourceforge.net/viewvc/clisp/clisp/src/list.d
The usual answer is "M-."
Assuming you have a properly configured IDE, and the source code of the function, clicking on its name and pressing M-. (that's Meta, or Alt or Option or Escape, and dot/period; or whatever key your IDE uses) should reveal its definition (or, for a generic function, definitions, plural; including any compiler macros that might optimize out some cases). Sometimes it's on a right-click or other mouse menu or toolbar.
If the source isn't available, you can often see the actual compiled form by evaluating (disassemble 'function)
Most IDE's, including perennial favourite Emacs+Slime, have other Inspection operations on the menu as well.
In a non-IDE environment, most compilers have reflection tools of their own (compiler-dependant) which are usually also mapped by the Swank library that Slime uses; one might find useful function in that package.
And this really should be documented in your IDE's manual.
I should postscript this that:
You really shouldn't care about the implementation of the core library functions; their contractual behavior is very well documented in the CLHS standard, which is available online and eg, Quicklisp has an utility to link it to Slime (C-c C-d h on a symbol in the COMMON-LISP package); for all well-written Lisp libraries, there should be documentation attached to functions, variables, classes, etc. accessible via the documentation function in the REPL or the IDE's menus and Inspection windows.
Core library functions are often highly optimized and far more complex than most user-level code should want to be, and often call down into compiler-specific "guts" that one should avoid doing in application code.
what is the purpose of namespaces ?
and, more important, should they be used as objects in java (things that have data and functions and that try to achieve encapsulation) ? is this idea to far fetched ? :)
or should they be used as packages in java ?
or should they be used more generally as a module system or something ?
Given that you use the Clojure tag, I suppose that you'll be interested in a Clojure-specific answer:
what is the purpose of namespaces ?
Clojure namespaces, Java packages, Haskell / Python / whatever modules... At a very high level, they're all different names for the same basic mechanism whose primary purpose is to prevent name clashes in non-trivial codebases. Of course, each solution has its own little twists and quirks which make sense in the context of a given language and would not make sense outside of it. The rest of this answer will deal with the twists and quirks specific to Clojure.
A Clojure namespace groups Vars, which are containers holding functions (most often), macro functions (functions used by the compiler to generate macroexpansions of appropriate forms, normally defined with defmacro; actually they are just regular Clojure functions, although there is some magic to the way in which they are registered with the compiler) and occasionally various "global parameters" (say, clojure.core/*in* for standard input), Atoms / Refs etc. The protocol facility introduced in Clojure 1.2 has the nice property that protocols are backed by Vars, as are the individual protocol functions; this is key to the way in which protocols present a solution to the expression problem (which is however probably out of the scope of this answer!).
It stands to reason that namespaces should group Vars which are somehow related. In general, creating a namespace is a quick & cheap operation, so it is perfectly fine (and indeed usual) to use a single namespace in early stages of development, then as independent chunks of functionality emerge, factor those out into their own namespaces, rinse & repeat... Only the things which are part of the public API need to be distributed between namespaces up front (or rather: prior to a stable release), since the fact that function such-and-such resides in namespace so-and-so is of course a part of the API.
and, more important, should they be used as objects in java (things that have data and functions and that try to achieve encapsulation) ? is this idea to far fetched ? :)
Normally, the answer is no. You might get a picture not too far from the truth if you approach them as classes with lots of static methods, no instance methods, no public constructors and often no state (though occasionally there may be some "class data members" in the form of Vars holding Atoms / Refs); but arguably it may be more useful not to try to apply Java-ish metaphors to Clojure idioms and to approach a namespace as a group of functions etc. and not "a class holding a group of functions" or some such thing.
There is an important exception to this general rule: namespaces which include :gen-class in their ns form. These are meant precisely to implement a Java class which may later be instantiated, which might have instance methods and per-instance state etc. Note that :gen-class is an interop feature -- pure Clojure code should generally avoid it.
or should they be used as packages in java ?
They serve some of the same purposes packages were designed to serve (as already mentioned above); the analogy, although it's certainly there, is not that useful, however, just because the things which packages group together (Java classes) are not at all like the things which Clojure namespaces group together (Clojure Vars), the various "access levels" (private / package / public in Java, {:private true} or not in Clojure) work very differently etc.
That being said, one has to remember that there is a certain correspondence between namespaces and packages / classes residing in particular packages. A namespace called foo.bar, when compiled, produces a class called bar in the package foo; this means, in particular, that namespace names should contain at least one dot, as so-called single-segment names apparently lead to classes being put in the "default package", leading to all sorts of weirdness. (E.g. I find it impossible to have VisualVM's profiler notice any functions defined in single-segment namespaces.)
Also, deftype / defrecord-created types do not reside in namespaces. A (defrecord Foo [...] ...) form in the file where namespace foo.bar is defined creates a class called Foo in the package foo.bar. To use the type Foo from another namespace, one would have to :import the class Foo from the foo.bar package -- :use / :require would not work, since they pull in Vars from namespaces, which records / types are not.
So, in this particular case, there is a certain correspondence between namespaces and packages which Clojure programmers who wish to take advantage of some of the newer language features need to be aware of. Some find that this gives an "interop flavour" to features which are not otherwise considered to belong in the realm of interop (defrecord / deftype / defprotocol are a good abstraction mechanism even if we forget about their role in achieving platform speed on the JVM) and it is certainly possible that in some future version of Clojure this flavour might be done away with, so that the namespace name / package name correspondence for deftype & Co. can be treated as an implementation detail.
or should they be used more generally as a module system or something ?
They are a module system and this is indeed how they should be used.
A package in Java has its own namespace, which provides a logical grouping of classes. It also helps prevent naming collisions. For example in java you will find java.util.Date and java.sql.Date - two different classes with the same name differentiated by their namespace. If you try an import both into a java file, you will see that it wont compile. At least one version will need to use its explicit namespace.
From a language independant view, namespaces are a way to isolate things (i.e. encapsulate in a sens). It's a more general concept (see xml namespaces for example). You can "create" namespaces in several ways, depending on the language you use: packages, static classes, modules and so on. All of these provides namespaces to the objects/data/functions they contain. This allow to organize the code better, to isolate features, tends for better code reuse and adaptability (as encapsulation)
As stated in the "Zen of Python", "Namespaces are one honking great idea -- let's do more of those !".
Think of them as containers for your classes. As in if you had a helper class for building strings and you wanted it in your business layer you would use a namespace such as MyApp.Business.Helpers. This allows your classes to be contained in sensical locations so when you or some else referencing your code wants to cosume them they can be located easily. For another example if you wanted to consume a SQL connection helper class you would probably use something like:
MyApp.Data.SqlConnectionHelper sqlHelper = new MyApp.Data.SqlConnectionHelper();
In reality you would use a "using" statement so you wouldn't need to fully qualify the namespace just to declare the variable.
Paul
What are some examples of a dynamically scoped language? And what are the reasons for choosing that design? Is it because it is easy to implement?
Mathematica is another language that is dynamically scoped, via the Block construct. This is actually quite useful when working with formulas. It allows you to write things like
In[1]:= expr = a*t^2 + b*t+ c;
In[2]:= Block[{a = 1, b = -1, c = 2}, Table[expr, {t, 5}]]
Out[2]= {2, 4, 8, 14, 22}
which wouldn't work at all if variables like a and t were scoped lexically. It works particularly nicely with Mathematica's rule-rewriting system, which will, among other things, leave variables unevaluated (as symbolic expressions) if it doesn't have an existing definition for them.
Mathematica can fake lexical scoping with the Module construct, but what this really does is rewrite the expression in terms of new, allegedly unique symbol (you can cause clashes if you predict what the next unique symbol will be, which is easy in most cases). This means
Module[{x = 4},
Table[x * t, {t, 5}]]
will be turned into something like this:
Block[{x$134 = 4},
Table[x$134 * t, {t, 5}]
Emacs Lisp, in one of its libraries, has a construct (really a Lisp macro) called lexical-let that pulls exactly the same trick to fake lexical scoping.
There are performance advantages to real lexical scoping when you're compiling your language which you don't get with the fake lexicals of ELisp or Mathematica, since you need some mapping between the dynamic variable and its current value, which means doing lookups (through a hash table or property list or something) and additional layers of indirection.
EDIT: If you have only lexical variables, you can fake dynamic scoping by storing the original value of a global, lexical variable on entering the scope and guaranteeing that the old value is restored upon exiting the scope. In order to ensure that, you'll need something like Lisp's UNWIND-PROTECT or a finally block. I've seen this done using C++ destructors as well, mostly as an exercise.
Dynamically scoped languages are much easier to implement. To access variables which is not in the current activaiton record / stack frame, one just follows the control links. Static/lexical access links are then not needed, making stack frames smaller.
Dynamic variables can be "unpredictable" at runtime, because one needs to know in which order the actual stackframes are to know which variable will be used. This information is not available by just looking at the static structure of the code. One could quite easily get caught out if the actual call graph of the program is not easy to predict at implementation time. Thats why most languages today have static scoping (most Exception systems however, are dynamic as this is the most practical).
However in some cases, dynamically scoped variables are very useful. For example when redirecting output, you could using dynamic variables set standard output for local code and all code called from there on.
(let ((*standard-output* *some-other-stream*))
(stuff))
In this common-lisp example (from Seibel), standard output is bound to another stream for the duration of the let form, (inside its enclosing parens). When execution leaves the let, it goes back to whatever it was beforehand. See http://gigamonkeys.com/book/variables.html Peter Seibels free and excellent book, Practical Common Lisp, for a good discussion. In Seibels own words:
Dynamic bindings make global variables much more manageable, but it's important to notice they still allow action at a distance. Binding a global variable has two at a distance effects--it can change the behavior of downstream code, and it also opens the possibility that downstream code will assign a new value to a binding established higher up on the stack. You should use dynamic variables only when you need to take advantage of one or both of these characteristics.
Well, there's a bunch of websites that discuss the pro's and con's, so I'm not going there.
One interesting language that has some features that faintly resemble dynamic scope is XSLT; although XSLT's templates and variables and the like are lexically scoped, XSLT is of course all about XML - and the current position in the xml tree is "dynamically scoped" in the sense that the context node is global and thus that XPath expressions are evaluated not according to XSLT's lexical scope but according to it's dynamic evaluation.
Dynamic scope is/was easier to implement with interpreters. Most early Lisp interpreters were using dynamic scope. After several years lexical scope was found to have an advantage, but was first mostly implemented in Lisp compilers. Several implementations appeared that implemented dynamic scope in interpreted code and lexical scope in compiled code. Some provided a special language construct to provide closures. Lisp dialects like Scheme and Common Lisp required then that there is no difference between interpreted and compiled code and thus interpreted based implementations had to implement lexical scope, too.
Early Smalltalk implementations implemented dynamic scope. All kinds of Lisp dialect implementations implemented dynamic scope (Interlisp, UCI Lisp, Lisp Machine Lisp, MacLisp, ...).
Almost all new Lisp dialects from the last 20 years use lexical scope by default or even exclusively. Several publications have described in detail how to implement Lisp with lexical scope - so there is no excuse not to use lexical scope.
All the shell languages (bash, ksh, etc) use dynamic scoping.