At a language level, what exactly is `ccall`?

I'm new to Julia, and I'm trying to understand, at the language level, what ccall is. At the syntax level, it looks like a normal function, but it clearly doesn't behave the same way in how it takes its arguments:
Note that the argument type tuple must be a literal tuple, and not a tuple-valued variable or expression.
Additionally, if I evaluate a variable bound to a function in the Julia REPL, I get something like
julia> max
max (generic function with 15 methods)
But if I try to do the same with ccall:
julia> ccall
ERROR: syntax: invalid "ccall" syntax
Clearly, ccall is a special piece of syntax, but it's also not a macro (no @ prefix, and invalid macro usage gives a more specific error). So, what is it? Is it something baked into the language, or something I could define myself with some language construct I'm not familiar with?
And if it is some baked-in piece of syntax, why was it decided to use function call notation, instead of implementing it as a macro or designing a more readable and distinct syntax?

In the current nightly (and thus, upcoming 0.6 release), much of the special behavior you observe has been removed (see this pull-request). ccall is no longer a reserved word, so it can be used as a function or macro name.
However, there is still a slight oddity: defining a 3- or 4-argument function called ccall is allowed, but actually calling such a function will give an error about ccall argument types (other numbers of arguments are OK). The reasons go directly to your question:
So, what is it? Is it something baked into the language
Yes, ccall, though it will no longer be a keyword in 0.6, is still "baked in" to the language in several ways:
the :ccall([four args...]) expression form is recognized and specially handled during syntax lowering. This lowering step does several things, including wrapping arguments in a call to unsafe_convert, which allows for customized conversion from Julia objects to C-compatible objects (see the sketch after this list), as well as pulling out arguments that might need to be rooted to prevent garbage collection of a referenced object during the ccall. (see code_lowered output, or try the expand function; more info on the compiler here).
ccall requires extensive handling in the code generation backend, including: look-up of the requested function name in the specified shared library, and generation of an LLVM call instruction -- which is eventually translated to platform-specific machine code by the LLVM Just-In-Time compiler. (see the different stages with code_llvm and code_native).
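To make the unsafe_convert hook concrete, here is a minimal sketch (the MyWrapper type, its field, and the library/function names are all hypothetical; the 0.5-era Ptr{Void} spelling is used to match the rest of this answer):

# A hypothetical type wrapping a raw C pointer.
immutable MyWrapper
    ptr::Ptr{Void}
end

# When a MyWrapper is passed where the ccall signature declares Ptr{Void},
# lowering wraps the argument in unsafe_convert, so this method supplies
# the C-compatible value.
Base.unsafe_convert(::Type{Ptr{Void}}, w::MyWrapper) = w.ptr

# Hypothetical usage:
# ccall((:frob, "libfrob"), Void, (Ptr{Void},), wrapper)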
And if it is some baked-in piece of syntax, why was it decided to use function call notation, instead of implementing it as a macro or designing a more readable and distinct syntax?
For the reasons detailed above, ccall requires special handling whether it looks like a macro or a function. In this mailing list thread, one of the Julia creators (Stefan Karpinski) commented on why not to make it a macro:
I suppose we could reimplement it as a macro, but that would really just be pushing the magic further down.
As far as "a more readable and distinct syntax", perhaps that is a matter of taste. It's not clear to me why some other syntax would be preferable (except for the convenience of a LuaJIT/CFFI-style inline C syntax parsing, of which I am a fan). My only strong personal wish for ccall would be to have arguments and types entered adjacent (e.g. ccall((:foo, :libbar), Void, (x::Int, y::Float))), because working with longer argument lists can be inconvenient. In 0.6 it will be possible to implement this form as a macro!

In Julia 0.5 and earlier.
It is not a function and it is not a macro.
It is indeed something special baked into the language.
It is an Intrinsic.
In Julia 0.6 this changes.
In a lot of ways it is more like a macro than a function call.
But in other ways it is not -- it does not return an AST.
It does call a function and on a low enough level it looks similar to calling a julia function.
The history of why it looks the way it does is beyond me; you'd need to hear from one of the people who worked on the earliest code for the language.
Right now it is everywhere, and is one of the harder things to change -- but not impossible. It would trigger up to 3 years of bikeshedding though :-P.
I like to think of ccall as being two things.
A Foreign Function Interface (FFI) for C and other compiled languages (e.g. Fortran; Rust apparently works too).
A way to access the raw guts of the language "runtime".
Foreign Function Interface (FFI)
Most of the time when one uses ccall in a package, one wants to invoke some code that is in a compiled library. In this sense it is C-Call, like R-Call, or Py-Call.
I think mlewe/BlossomV.jl is a nice compact example.
For a more intense example, see oxinabox/SLEEF.jl.
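At its simplest, such a call looks like the classic clock example from the Julia manual:

# Call the C standard library function `clock`, which takes no arguments
# and returns an Int32. The (:clock, "libc") tuple names the function and
# the shared library in which to look it up.
t = ccall((:clock, "libc"), Int32, ())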
As an FFI, it does not have to share memory space/a process with Julia -- PyCall.jl does; RCall.jl and Matlab.jl don't.
It doesn't matter, as long as the result comes back.
In these cases it is theoretically possible to replace ccall with some kind of safe_ccall, which would run the called library in a separate process and would not segfault Julia if the library being called segfaulted. But as of yet, no one has written such a method/package.
Using ccall for FFI is even done in Base, like for accessing MPFR to define BigFloat.
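The pattern in Base looks roughly like this (a simplified sketch of base/mpfr.jl, using the 0.5-era & argument syntax; the exact rounding-mode handling varies by version):

# Simplified sketch: BigFloat arithmetic delegates to the MPFR C library.
# C signature: int mpfr_add(mpfr_t rop, mpfr_t op1, mpfr_t op2, mpfr_rnd_t rnd)
function add(x::BigFloat, y::BigFloat)
    z = BigFloat()
    ccall((:mpfr_add, :libmpfr), Int32,
          (Ptr{BigFloat}, Ptr{BigFloat}, Ptr{BigFloat}, Int32),
          &z, &x, &y, 0)   # 0 = round to nearest
    return z
end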
But this is not the main reason ccall is used in Base.
Accessing the guts of the language.
ccall is really what drives a large portion of the program "doing a thing".
It is used throughout Base, to call the functions from src.
For this, ccall basically triggers a function call at the compiled level that shifts the instruction pointer directly into the compiled code of the ccalled function -- just as calling a function would if the whole thing had been written in, say, C.
You can see in base/threadingconstructs.jl ccall being used to manage work on threads -- that triggers code from src/threading.c.
It is used to map a section of disk to memory (mmap.jl) -- something that obviously can't be done from another process.
It is used to make a section of code non-interruptible.
It is used to call LibC to do things like malloc to allocate memory (though right now this is mostly used as part of FFI); a sketch follows below.
There are tricks you can do with ccall to #undef a variable after it has already been assigned.
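For instance, the LibC case mentioned above looks like this (a minimal sketch; passing a bare symbol with no library name makes ccall look the symbol up in the running process itself):

# Allocate 100 bytes with the C library's malloc, then free them.
p = ccall(:malloc, Ptr{Void}, (Csize_t,), 100)
# ... use the memory via unsafe_store! / unsafe_load ...
ccall(:free, Void, (Ptr{Void},), p)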
ccall is in many ways the "master" key to the language.
Conclusion
I've described ccall here as two things: an FFI function and a core part of the language "runtime". This duality is not a strict division, and there is plenty of overlap, like file handling (is it FFI?).
The behaviour many expect ccall to have comes from its FFI uses.
Here ccall could just be a function.
The behaviour it actually has comes from its use as a core part of the language -- linking the Julia code of the standard library in Base to the low-level C code from src, and allowing very direct control over the running of the Julia process.

Related

Racket: Using "csv-reading" package within a function

I am using csv-reading to read from a csv file to convert it into a list.
When I call at the top level, like this
> (call-with-input-file "to-be-asked.csv" csv->list)
I am able to read csv file and convert it into list of lists.
However, if I call the same thing within a function, I get the following error.
> (read-from-file "to-be-asked.csv")
csv->list: undefined;
cannot reference an identifier before its definition
in module: top-level
I don't understand what's going wrong. I have added (require csv-reading) before the function call.
My read-from-file code is:
(define (read-from-file file-name)
  (call-with-input-file file-name csv->list))
EDIT
I am using Racket within Emacs via Geiser. When I (exit) the buffer and type C-c C-z, it shows the error.
When I kill the buffer and start Geiser again, it works properly.
Is this a problem with Geiser and Emacs?
You've hit the classic problem with what I'll call resident programming environments (I don't know the right word for them). A resident programming environment is one where you talk to a running instance of the language, successively modifying its state.
The problem with these environments is that the state of the running language instance is more-or-less opaque and in particular it can get out of sync with the state you can see in files or buffers. That means that it can become obscure why something is happening and, worse, you can get into states where the results you get from the resident environment are essentially unreproducible later. This matters a lot for things like Jupyter notebooks where people doing scientific work can end up with results which they can't reproduce because the notebook was evaluated out of sequence or some of it was not evaluated at all.
On the other hand, these environments are an enormous joy to use which is why I use them. That outweighs the problems for me: you just have to be careful you can recreate the session and be willing to do so occasionally.
In this case you probably had something like this in the buffer/file:
(require csv-reading)
(define (read-from-file file-name)
  (call-with-input-file file-name csv->list))
But you either failed to evaluate the first form at all, or (worse!) you evaluated the forms out of order. If you did this in Common Lisp or any traditional Lisp, it would all be fine: evaluating the first form would make the second form work. But Racket decides once and for all what csv->list means (or does not mean) at the point read-from-file is defined, and a later require won't fix that. You then end up in the mysterious situation where the function you defined does not work, but if you define a new function which uses csv->list, it will work. This is because Racket has much cleverer semantics than CL, but also semantics not designed for this kind of interactive development as far as I can tell (certainly DrRacket strongly discourages it).

Are "procedure" and "function" synonymous in Racket?

Are "procedure" and "function" synonymous in Racket (a dialect of Scheme)? It seems to be implied by the documentation. For example, the documentation for compose describes it as a procedure that
[r]eturns a procedure that composes the given functions...The compose function allows the given functions to consume and produce any number of values...
(All of the above italicization was added by me.)
I understand that procedure? is a library procedure and function? is not. My question is whether it is correct to use the terms interchangeably when discussing programs (such as when teaching a class or writing documentation).
TL;DR: It's just lingo and means the same thing. Function, procedure, and static method are the same in programming.
Historically, a function is, in the mathematical sense, a mapping from arguments to a result. A procedure is a block of code that does something, and its output does not need to be tied to any specific input. Thus you could say a function is a procedure with no side effects.
The Scheme standard uses only the term procedure; you won't find any mention of function at all. Racket is historically a standard Scheme implementation made for education purposes, and it is still pretty much compatible with Scheme for the most part today, but it has made a split and does not consider itself to follow a Scheme standard. How to Design Programs and lots of documentation use the term function, and in that documentation it is meant as a synonym for procedure.
Common Lisp uses the term function consistently, as do its predecessors, which predate Scheme.
I think I have even translated an SO answer between languages and changed the code as well as swapped function and procedure for consistency with each language's own lingo. I would love to see Racket clean up some day and stay with one name to rule them all.
The short version: yes.
The longer version: a number of folks have done good work on aligning vocabulary for use in teaching. This is the first paper that comes to mind, although it does not specifically address the procedure/function choice:
https://cs.brown.edu/~sk/Publications/Papers/Published/mfk-measur-effect-error-msg-novice-sigcse/paper.pdf
From a pedagogic standpoint, of course, it's unhelpful to have two names for the same thing, sigh.
Finally, you'll get a more authoritative answer (and frankly, I'd like to know what the state of things here is) if you ask this question on the Racket Mailing List.
[EDIT] Ooh, further, I would not at all say that the word procedure is more likely to denote something defined in a library.

Making superclass variables read-only to children in TCL OO

Let's say I have a class foo, with a variable bar. Now, if there is a class moo which has class foo as a superclass, I want any attempt to write to, or better yet, even refer directly to bar to be errored out. This could prevent situations where someone using my code (which could be compiled to byte-code) overrides bar by having their own variable with the same name.
TclOO simply does not support the concept. Classes are not security boundaries in TclOO, just as namespaces are not security boundaries in plain Tcl (TclOO objects are really just fancy namespaces). Tcl's security boundaries are between interpreters, and between the Tcl script level and the (usually) C implementation level. We're considering adding “private” instance variables for Tcl 8.7, but even those won't be truly private; their names will still be predictable if you know how (and they will be accessible from outside the class; that's important for when using the variable with third-party code such as Tk). To reiterate: classes are not security boundaries.
If you have something that must be locked out of sight, it is easiest to implement it in C. You can plug in methods implemented in C into TclOO (applying whatever controls you can think of) and those methods can use the (C level only) metadata mechanism to create instance-attached storage that they can use. All the callbacks are in place to do deletion correctly at the right time. Methods in C are not much more complicated than commands in C; the function callback signature is a little different and the usage is a bit more complicated (because there are other standard operations on methods such as copying them) but if you can do one, you can figure out how to do the other too.

Understanding complex post-conditions in DbC

I have been reading over design-by-contract posts and examples, and there is something that I cannot seem to wrap my head around. In all of the examples I have seen, DbC is used on a trivial class testing its own state in the post-conditions (e.g. lots of Bank Accounts).
It seems to me that most of the time when you call a method of a class, it does much more work delegating method calls to its external dependencies. I understand how to check for this in a Unit-Test with specific scenarios using dependency inversion and mock objects that focus on the external behavior of the method, but how does this work with DbC and post-conditions?
My second question has to deal with understanding complex post-conditions. It seems to me that to write out a post-condition for many functions, that you basically have to re-write the body of the function for your post-condition to know what the new state is going to be. What is the point of that?
I really do like the notion of DbC and I think that it has great promise, particularly if I can figure out how to reproduce some failure state once I find a validated contract. Over the past couple of hours I have been reading some neat stuff wrt. automatic test generation in Eiffel. I am currently trying to improve my processes in C++ development, but I am open to learning something new if I can figure out how to not lose all of the ground I have made in my current projects. Thanks.
but how does this work with DbC and post-conditions?
Every function is basically one of these:
A sequence of statements
A conditional statement
A loop
The idea is that you should check any postconditions about the results of the function that go beyond the union of the postconditions of all the functions called.
that you basically have to re-write the body of the function for your post-condition to know what the new state is going to be
Think about it the other way round. What made you write the function in the first place? What were you pursuing? Can that be expressed in a postcondition which is more simple than the function body itself? A postcondition will typically use queries (what in C++ are const functions), while the body usually combines commands and queries (methods that modify the object and methods which only get information from it).
In some cases, yes, you will find out that you can really add little value with postconditions. In these cases, writing a bunch of tests will typically be enough.
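As a concrete example, run-time postcondition checks for a sorting routine can be far simpler than the body that establishes them (a minimal sketch in Julia, using plain assertions as stand-ins for contract machinery):

# The body may be a complex algorithm, but the postconditions are just
# "the result is sorted" and "no elements were lost".
function checked_sort(xs::Vector{Int})
    result = sort(xs)                    # stand-in for the real implementation
    @assert issorted(result)             # postcondition: ordering
    @assert length(result) == length(xs) # postcondition: same element count
    return result
end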
See also:
Bertrand Meyer, Contract Driven Development
Related questions 1, 2
Delegation at the contract level
most of the time when you call a method of a class, it does much more work delegating method calls to its external dependencies
As for this first question: the implementation of a function/method may call many other functions/methods, but if the designer of the code had a clear mind, this does not imply that the specification of the caller is the concatenation of the specifications of the callees. For a method that calls many others, the size of the specification can remain contained if the method accomplishes a precise and well-defined task. Which it should, if the whole system was well designed.
You are clearly asking your question from the point of view of run-time assertion checking. In this context, the above would perhaps be expressed as "you don't need to re-check in the post-condition of the caller that all the callees have respected their respective contracts. These checks will already be made on each call. In the post-condition of the caller, only check the functionally visible result of the caller."
Understanding complex post-conditions
You may find this "ACSL by example" document interesting (although probably different from what you're used to). It contains many examples of formal contracts for C functions. The language of the contracts is intended for static verification instead of run-time checking, with all the advantages and the drawbacks that it entails. They are a little more sophisticated than the "Bank Accounts" that you mention — these functions implement real algorithms, although simple ones. The document keeps the contracts short and readable by introducing well-thought-out auxiliary predicates (which would be called queries in Eiffel, as Daniel points out in his answer).

Besides Logo and Emacs Lisp, what are other pure dynamically scoped languages?

What are some examples of a dynamically scoped language? And what are the reasons for choosing that design? Is it because it is easy to implement?
Mathematica is another language that is dynamically scoped, via the Block construct. This is actually quite useful when working with formulas. It allows you to write things like
In[1]:= expr = a*t^2 + b*t+ c;
In[2]:= Block[{a = 1, b = -1, c = 2}, Table[expr, {t, 5}]]
Out[2]= {2, 4, 8, 14, 22}
which wouldn't work at all if variables like a and t were scoped lexically. It works particularly nicely with Mathematica's rule-rewriting system, which will, among other things, leave variables unevaluated (as symbolic expressions) if it doesn't have an existing definition for them.
Mathematica can fake lexical scoping with the Module construct, but what this really does is rewrite the expression in terms of a new, allegedly unique symbol (you can cause clashes if you predict what the next unique symbol will be, which is easy in most cases). This means
Module[{x = 4},
  Table[x * t, {t, 5}]]
will be turned into something like this:
Block[{x$134 = 4},
  Table[x$134 * t, {t, 5}]]
Emacs Lisp, in one of its libraries, has a construct (really a Lisp macro) called lexical-let that pulls exactly the same trick to fake lexical scoping.
There are performance advantages to real lexical scoping when you're compiling your language which you don't get with the fake lexicals of ELisp or Mathematica, since you need some mapping between the dynamic variable and its current value, which means doing lookups (through a hash table or property list or something) and additional layers of indirection.
EDIT: If you have only lexical variables, you can fake dynamic scoping by storing the original value of a global, lexical variable on entering the scope and guaranteeing that the old value is restored upon exiting the scope. In order to ensure that, you'll need something like Lisp's UNWIND-PROTECT or a finally block. I've seen this done using C++ destructors as well, mostly as an exercise.
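A minimal sketch of that save-and-restore trick (here in Julia, with a hypothetical global output_stream; the try/finally block plays the role of Lisp's UNWIND-PROTECT):

# Hypothetical global we want to "dynamically bind".
output_stream = STDOUT

function with_output_stream(f, new_stream)
    global output_stream
    old = output_stream          # save the outer value on "scope entry"
    output_stream = new_stream   # establish the new binding
    try
        return f()               # everything f calls sees the new value
    finally
        output_stream = old      # guaranteed restore, even on error
    end
end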
Dynamically scoped languages are much easier to implement. To access variables which are not in the current activation record / stack frame, one just follows the control links. Static/lexical access links are then not needed, making stack frames smaller.
Dynamic variables can be "unpredictable" at runtime, because one needs to know the order of the actual stack frames to know which variable will be used. This information is not available by just looking at the static structure of the code. One could quite easily get caught out if the actual call graph of the program is not easy to predict at implementation time. That's why most languages today have static scoping (most exception systems, however, are dynamic, as this is the most practical).
However, in some cases dynamically scoped variables are very useful. For example, when redirecting output, you could use dynamic variables to set standard output for local code and all code called from there on.
(let ((*standard-output* *some-other-stream*))
  (stuff))
In this Common Lisp example (from Seibel), standard output is bound to another stream for the duration of the let form (inside its enclosing parens). When execution leaves the let, it goes back to whatever it was beforehand. See http://gigamonkeys.com/book/variables.html in Peter Seibel's free and excellent book, Practical Common Lisp, for a good discussion. In Seibel's own words:
Dynamic bindings make global variables much more manageable, but it's important to notice they still allow action at a distance. Binding a global variable has two at a distance effects--it can change the behavior of downstream code, and it also opens the possibility that downstream code will assign a new value to a binding established higher up on the stack. You should use dynamic variables only when you need to take advantage of one or both of these characteristics.
Well, there are a bunch of websites that discuss the pros and cons, so I'm not going there.
One interesting language that has some features that faintly resemble dynamic scope is XSLT; although XSLT's templates and variables and the like are lexically scoped, XSLT is of course all about XML -- and the current position in the XML tree is "dynamically scoped" in the sense that the context node is global, and thus XPath expressions are evaluated not according to XSLT's lexical scope but according to its dynamic evaluation.
Dynamic scope is/was easier to implement with interpreters. Most early Lisp interpreters used dynamic scope. After several years, lexical scope was found to have an advantage, but it was at first mostly implemented in Lisp compilers. Several implementations appeared that used dynamic scope in interpreted code and lexical scope in compiled code. Some provided a special language construct to create closures. Lisp dialects like Scheme and Common Lisp then required that there be no difference between interpreted and compiled code, and thus interpreter-based implementations had to implement lexical scope, too.
Early Smalltalk implementations implemented dynamic scope. All kinds of Lisp dialect implementations implemented dynamic scope (Interlisp, UCI Lisp, Lisp Machine Lisp, MacLisp, ...).
Almost all new Lisp dialects from the last 20 years use lexical scope by default or even exclusively. Several publications have described in detail how to implement Lisp with lexical scope - so there is no excuse not to use lexical scope.
All the shell languages (bash, ksh, etc.) use dynamic scoping.