Octave JIT compiler. Current state, and minimal example demonstrating effect - octave

I hear very conflicting information about Octave's experimental JIT compiler feature, ranging from "it was a toy project but it basically doesn't work" to "I've used it and I get a significant speedup".
I'm aware that in order to use it successfully one needs to
Compile octave with the --enable-jit at configure time
Launch octave with the --jit-compiler option
Specify jit compilation preference at runtime using jit_enable and jit_startcnt commands
but I have been unable to reproduce the effects convincingly; not sure if this is because I've missed out any other steps I'm unaware of, or it simply doesn't have much of an effect on my machine.
Q: Can someone who has used the feature successfully provide a minimal working example demonstrating its proper use and the effect it has (if any) on their machine?

In short:
you don't need to do anything to use JIT, it should just work and speed up your code if it can;
it's mostly useless because it only works for simple loops;
it's little more than a proof of a concept;
there is no one currently working on improving it because:
it's a complicated problem;
it's mostly a fix for sloppy Octave code;
uses LLVM which is too unstable.
Q: Can someone who has used the feature successfully provide a minimal working example demonstrating its proper use and the effect it has (if any) on their machine?
There is nothing to show. If you build Octave with JIT support, Octave will automatically use faster code for some loops. The only difference is on the speed, and you don't have to change your code (although you can disable jit at runtime):
octave> jit_enable (1) # confirm JIT is enabled
octave> tic; x = 0; for i=1:100000, x += i; endfor, toc
Elapsed time is 0.00490594 seconds.
octave> jit_enable (0) # disable JIT
octave> tic; x = 0; for i=1:100000, x += i; endfor, toc
Elapsed time is 0.747599 seconds.
## but you should probably write it like this
octave> tic; x = sum (1:100000); toc
Elapsed time is 0.00327611 seconds.
## If Octave was built without JIT support, you will get the following
octave> jit_enable (1)
warning: jit_enable: support for JIT was unavailable or disabled when Octave was built
This is a simple example but you can see better examples and more details on the blog of the only person that worked on it, as well as his presentation at OctConf 2012. More details on the (outdated), Octave's JIT wiki page
Note that Octave's JIT works only for very simple loops. So simple loops that no one familiar with the language would write them in the first place. The feature is there as proof of concept, and a starting point for anyone that may want to extend it (I personally prefer to write vectorized code, that's what the language is designed for).
Octave's JIT has one other problem which is that it uses LLVM. Octave developers have found it too unreliable for this purpose because it keeps breaking backwards compatibility. Every minor release of LLVM has broken Octave builds so Octave developers stopped fixing when LLVM 3.5 got released and disabled it by default.

Related

How to disable or remove numba and cuda from python project?

i've cloned a "PointPillars" repo for 3D detection using just point cloud as input. But when I came to run it, I noted it use cuda and numba. With any prior knowledge about these two, I'm asking if there is any way to remove or disable numba and cuda. I want to run it on local server with CPU only, so I want your advice to solve.
The actual code matters here.
If the usage is only of vectorize or guvectorize using the target=cuda parameter, then "removal" of CUDA should be trivial. Just remove the target parameter.
However if there is use of the #cuda.jit decorator, or explicit copying of data between host and device, then other code refactoring would be involved. There is no simple answer here in that case, the code would have to be converted to an alternate serial or parallel realization via refactoring or porting.

When is a command's compile function called?

My understanding of TCL execution is, if a command's compile function is defined, it is first called when it comes to execute the command before its execution function is called.
Take command append as example, here is its definition in tclBasic.c:
static CONST CmdInfo builtInCmds[] = {
{"append", (Tcl_CmdProc *) NULL, Tcl_AppendObjCmd,
TclCompileAppendCmd, 1},
Here is my testing script:
$ cat t.tcl
set l [list 1 2 3]
append l 4
I add gdb breakpoints at both functions, TclCompileAppendCmd and Tcl_AppendObjCmd. My expectation is TclCompileAppendCmd is hit before Tcl_AppendObjCmd.
Gdb's target is tclsh8.4 and argument is t.tcl.
What I see is interesting:
TclCompileAppendCmd does get hit first, but it is not from t.tcl,
rather it is from init.tcl.
TclCompileAppendCmd gets hit several times and they all are from init.tcl.
The first time t.tcl executes, it is Tcl_AppendObjCmd gets hit, not TclCompileAppendCmd.
I cannot make sense of it:
Why is the compile function called for init.tcl but not for t.tcl?
Each script should be independently compiled, i.e. the object with compiled command append at init.tcl is not reused for later scripts, isn't it?
[UPDATE]
Thanks Brad for the tip, after I move the script to a proc, I can see TclCompileAppendCmd is hit.
The compilation function (TclCompileAppendCmd in your example) is called by the bytecode compiler when it wants to issue bytecode for a particular instance of that particular command. The bytecode compiler also has a fallback if there is no compilation function for a command: it issues instructions to invoke the standard implementation (which would be Tcl_AppendObjCmd in this case; the NULL in the other field causes Tcl to generate a thunk in case someone really insists on using a particular API but you can ignore that). That's a useful behaviour, because it is how operations like I/O are handled; the overhead of calling a standard command implementation is pretty small by comparison with the overhead of doing disk or network I/O.
But when does the bytecode compiler run?
On one level, it runs whenever the rest of Tcl asks for it to be run. Simple! But that's not really helpful to you. More to the point, it runs whenever Tcl evaluates a script value in a Tcl_Obj that doesn't already have bytecode type (or if the saved bytecode indicates that it is for a different resolution context or different compilation epoch) except if the evaluation has asked to not be bytecode compiled by the flag TCL_EVAL_DIRECT to Tcl_EvalObjEx or Tcl_EvalEx (which is a convenient wrapper for Tcl_EvalObjEx). It's that flag which is causing you problems.
When is that flag used?
It's actually pretty simple: it's used when some code is believed to be going to be run only once because then the cost of compilation is larger than the cost of using the interpretation path. It's particularly used by Tk's bind command for running substituted script callbacks, but it is also used by source and the main code of tclsh (essentially anything using Tcl_FSEvalFileEx or its predecessors/wrappers Tcl_FSEvalFile and Tcl_EvalFile). I'm not 100% sure whether that's the right choice for a sourced context, but it is what happens now. However, there is a workaround that is (highly!) worthwhile if you're handling looping: you can put the code in a compiled context within that source using a procedure that you call immediately or use an apply (I recommend the latter these days). init.tcl uses these tricks, which is why you were seeing it compile things.
And no, we don't normally save compiled scripts between runs of Tcl. Our compiler is fast enough that that's not really worthwhile; the cost of verifying that the loaded compiled code is correct for the current interpreter is high enough that it's actually faster to recompile from the source code. Our current compiler is fast (I'm working on a slower one that generates enormously better code). There's a commercial tool suite from ActiveState (the Tcl Dev Kit) which includes an ahead-of-time compiler, but that's focused around shrouding code for the purposes of commercial deployment and not speed.

Why is the initialization of the GPU taking very long on Kepler achitecture and how to fix this?

When running my application the very first cuda_malloc takes 40 seconds which is due to the initialization of the GPU. When I build in debug mode this reduces to 5 seconds and when I run the same code on a Fermi device, it takes far less than a second (not even worth measuring in my case).
Now the funny thing is that if I compile for this specific architecture, using the flag sm35 instead of sm20, it becomes fast again. As I should not use any new sm35 features just yet, how can I compile for sm20 and not have this huge delay? Also I am curious what is causing this delay? Is the machine code recompiled on the fly into sm35 code?
Ps. I run on windows but a colleague of mine encountered the same problem, probably on windows. The device is a Kepler, driver version 320.
Yes, the machine code is recompiled on the fly. This is called the JIT-compile step, and it will occur any time the machine code does not match the device that is being used (and assuming valid PTX code exists in the executable.)
You can learn more about JIT-compile here. Note the discussion of the cache which should alleviate the issue after the first run.
If you specify compilation for both sm_20 and sm_35, you can build a binary/executable that will run quickly on both types of devices, and you will also get notification if you are using a sm_35 feature that is not supported on sm_20 (during the compile process).

What is "runtime"?

I have heard about things like "C Runtime", "Visual C++ 2008 Runtime", ".NET Common Language Runtime", etc.
What is "runtime" exactly?
What is it made of?
How does it interact with my code? Or maybe more precisely, how is my code controlled by it?
When coding assembly language on Linux, I could use the INT instruction to make the system call. So, is the runtime nothing but a bunch of pre-fabricated functions that wrap the low level function into more abstract and high level functions? But doesn't this seem more like the definition for the library, not for the runtime?
Are "runtime" and "runtime library" two different things?
ADD 1
These days, I am thinking maybe Runtime has something in common with the so called Virtual Machine, such as JVM. Here's the quotation that leads to such thought:
This compilation process is sufficiently complex to be broken into
several layers of abstraction, and these usually involve three
translators: a compiler, a virtual machine implementation, and an
assembler. --- The Elements of Computing Systems (Introduction,
The Road Down To Hardware Land)
ADD 2
The book Expert C Programming: Deep C Secrets. Chapter 6 Runtime Data Structures is an useful reference to this question.
ADD 3 - 7:31 AM 2/28/2021
Here's some of my perspective after getting some knowledge about processor design. The whole computer thing is just multiple levels of abstraction. It goes from elementary transistors all the way up to the running program. For any level N of abstraction, its runtime is the immediate level N-1 of abstraction that goes below it. And it is God that give us the level 0 of abstraction.
Runtime describes software/instructions that are executed while your program is running, especially those instructions that you did not write explicitly, but are necessary for the proper execution of your code.
Low-level languages like C have very small (if any) runtime. More complex languages like Objective-C, which allows for dynamic message passing, have a much more extensive runtime.
You are correct that runtime code is library code, but library code is a more general term, describing the code produced by any library. Runtime code is specifically the code required to implement the features of the language itself.
Runtime is a general term that refers to any library, framework, or platform that your code runs on.
The C and C++ runtimes are collections of functions.
The .NET runtime contains an intermediate language interpreter, a garbage collector, and more.
As per Wikipedia: runtime library/run-time system.
In computer programming, a runtime library is a special program library used by a compiler, to implement functions built into a programming language, during the runtime (execution) of a computer program. This often includes functions for input and output, or for memory management.
A run-time system (also called runtime system or just runtime) is software designed to support the execution of computer programs written in some computer language. The run-time system contains implementations of basic low-level commands and may also implement higher-level commands and may support type checking, debugging, and even code generation and optimization.
Some services of the run-time system are accessible to the programmer through an application programming interface, but other services (such as task scheduling and resource management) may be inaccessible.
Re: your edit, "runtime" and "runtime library" are two different names for the same thing.
The runtime or execution environment is the part of a language implementation which executes code and is present at run-time; the compile-time part of the implementation is called the translation environment in the C standard.
Examples:
the Java runtime consists of the virtual machine and the standard library
a common C runtime consists of the loader (which is part of the operating system) and the runtime library, which implements the parts of the C language which are not built into the executable by the compiler; in hosted environments, this includes most parts of the standard library
I'm not crazy about the other answers here; they're too vague and abstract for me. I think more in stories. Here's my attempt at a better answer.
a BASIC example
Let's say it's 1985 and you write a short BASIC program on an Apple II:
] 10 PRINT "HELLO WORLD!"
] 20 GOTO 10
So far, your program is just source code. It's not running, and we would say there is no "runtime" involved with it.
But now I run it:
] RUN
How is it actually running? How does it know how to send the string parameter from PRINT to the physical screen? I certainly didn't provide any system information in my code, and PRINT itself doesn't know anything about my system.
Instead, RUN is actually a program itself -- its code tells it how to parse my code, how to execute it, and how to send any relevant requests to the computer's operating system. The RUN program provides the "runtime" environment that acts as a layer between the operating system and my source code. The operating system itself acts as part of this "runtime", but we usually don't mean to include it when we talk about a "runtime" like the RUN program.
Types of compilation and runtime
Compiled binary languages
In some languages, your source code must be compiled before it can be run. Some languages compile your code into machine language -- it can be run by your operating system directly. This compiled code is often called "binary" (even though every other kind of file is also in binary :).
In this case, there is still a minimal "runtime" involved -- but that runtime is provided by the operating system itself. The compile step means that many statements that would cause your program to crash are detected before the code is ever run.
C is one such language; when you run a C program, it's totally able to send illegal requests to the operating system (like, "give me control of all of the memory on the computer, and erase it all"). If an illegal request is hit, usually the OS will just kill your program and not tell you why, and dump the contents of that program's memory at the time it was killed to a .dump file that's pretty hard to make sense of. But sometimes your code has a command that is a very bad idea, but the OS doesn't consider it illegal, like "erase a random bit of memory this program is using"; that can cause super weird problems that are hard to get to the bottom of.
Bytecode languages
Other languages (e.g. Java, Python) compile your code into a language that the operating system can't read directly, but a specific runtime program can read your compiled code. This compiled code is often called "bytecode".
The more elaborate this runtime program is, the more extra stuff it can do on the side that your code did not include (even in the libraries you use) -- for instance, the Java runtime environment ("JRE") and Python runtime environment can keep track of memory assignments that are no longer needed, and tell the operating system it's safe to reuse that memory for something else, and it can catch situations where your code would try to send an illegal request to the operating system, and instead exit with a readable error.
All of this overhead makes them slower than compiled binary languages, but it makes the runtime powerful and flexible; in some cases, it can even pull in other code after it starts running, without having to start over. The compile step means that many statements that would cause your program to crash are detected before the code is ever run; and the powerful runtime can keep your code from doing stupid things (e.g., you can't "erase a random bit of memory this program is using").
Scripting languages
Still other languages don't precompile your code at all; the runtime does all of the work of reading your code line by line, interpreting it and executing it. This makes them even slower than "bytecode" languages, but also even more flexible; in some cases, you can even fiddle with your source code as it runs! Though it also means that you can have a totally illegal statement in your code, and it could sit there in your production code without drawing attention, until one day it is run and causes a crash.
These are generally called "scripting" languages; they include Javascript, Perl, and PHP. Some of these provide cases where you can choose to compile the code to improve its speed (e.g., Javascript's WebAssembly project). So Javascript can allow users on a website to see the exact code that is running, since their browser is providing the runtime.
This flexibility also allows for innovations in runtime environments, like node.js, which is both a code library and a runtime environment that can run your Javascript code as a server, which involves behaving very differently than if you tried to run the same code on a browser.
In my understanding runtime is exactly what it means - the time when the program is run. You can say something happens at runtime / run time or at compile time.
I think runtime and runtime library should be (if they aren't) two separate things. "C runtime" doesn't seem right to me. I call it "C runtime library".
Answers to your other questions:
I think the term runtime can be extended to include also the environment and the context of the program when it is run, so:
it consists of everything that can be called "environment" during the time when the program is run, for example other processes, state of the operating system and used libraries, state of other processes, etc
it doesn't interact with your code in a general sense, it just defines in what circumstances your code works, what is available to it during execution.
This answer is to some extend just my opinion, not a fact or definition.
Matt Ball answered it correctly. I would say about it with examples.
Consider running a program compiled in Turbo-Borland C/C++ (version 3.1 from the year 1991) compiler and let it run under a 32-bit version of windows like Win 98/2000 etc.
It's a 16-bit compiler. And you will see all your programs have 16-bit pointers. Why is it so when your OS is 32bit? Because your compiler has set up the execution environment of 16 bit and the 32-bit version of OS supported it.
What is commonly called as JRE (Java Runtime Environment) provides a Java program with all the resources it may need to execute.
Actually, runtime environment is brain product of idea of Virtual Machines. A virtual machine implements the raw interface between hardware and what a program may need to execute. The runtime environment adopts these interfaces and presents them for the use of the programmer. A compiler developer would need these facilities to provide an execution environment for its programs.
Run time exactly where your code comes into life and you can see lot of important thing your code do.
Runtime has a responsibility of allocating memory , freeing memory , using operating system's sub system like (File Services, IO Services.. Network Services etc.)
Your code will be called "WORKING IN THEORY" until you practically run your code.
and Runtime is a friend which helps in achiving this.
a runtime could denote the current phase of program life (runtime / compile time / load time / link time)
or it could mean a runtime library, which form the basic low level actions that interface with the execution environment.
or it could mean a runtime system, which is the same as an execution environment.
in the case of C programs, the runtime is the code that sets up the stack, the heap etc. which a requirement expected by the C environment. it essentially sets up the environment that is promised by the language. (it could have a runtime library component, crt0.lib or something like that in case of C)
Runtime basically means when program interacts with the hardware and operating system of a machine. C does not have it's own runtime but instead, it requests runtime from an operating system (which is basically a part of ram) to execute itself.
I found that the following folder structure makes a very insightful context for understanding what runtime is:
You can see that there is the 'source', there is the 'SDK' or 'Software Development Kit' and then there is the Runtime, eg. the stuff that gets run - at runtime. It's contents are like:
The win32 zip contains .exe -s and .dll -s.
So eg. the C runtime would be the files like this -- C runtime libraries, .so-s or .dll -s -- you run at runtime, made special by their (or their contents' or purposes') inclusion in the definition of the C language (on 'paper'), then implemented by your C implementation of choice. And then you get the runtime of that implementation, to use it and to build upon it.
That is, with a little polarisation, the runnable files that the users of your new C-based program will need. As a developer of a C-based program, so do you, but you need the C compiler and the C library headers, too; the users don't need those.
If my understanding from reading the above answers is correct, Runtime is basically 'background processes' such as garbage collection, memory-allocation, basically any processes that are invoked indirectly, by the libraries / frameworks that your code is written in, and specifically those processes that occur after compilation, while the application is running.
The fully qualified name of Runtime seems to be the additional environment to provide programming language-related functions required at run time for non-web application software.
Runtime implements programming language-related functions, which remain the same to any application domain, including math operations, memory operations, messaging, OS or DB abstraction service, etc.
The runtime must in some way be connected with the running applications to be useful, such as being loaded into application memory space as a shared dynamic library, a virtual machine process inside which the application runs, or a service process communicating with the application.
Runtime is somewhat opposite to design-time and compile-time/link-time. Historically it comes from slow mainframe environment where machine-time was expensive.

Is there any self-improving compiler around?

I am not aware of any self-improving compiler, but then again I am not much of a compiler-guy.
Is there ANY self-improving compiler out there?
Please note that I am talking about a compiler that improves itself - not a compiler that improves the code it compiles.
Any pointers appreciated!
Side-note: in case you're wondering why I am asking have a look at this post. Even if I agree with most of the arguments I am not too sure about the following:
We have programs that can improve
their code without human input now —
they’re called compilers.
... hence my question.
While it is true that compilers can improve code without human interference, however, the claim that "compilers are self-improving" is rather dubious. These "improvements" that compilers make are merely based on a set of rules that are written by humans (cyborgs anyone?). So the answer to your question is : No.
On a side note, if there was anything like a self improving compiler, we'd know... first the thing would improve the language, then its own code and finally, it would modify its code to become a virus and make all developers use it... and then finally we'd have one of those classic computer-versus-humans-last-hope-for-humanity kind of things... so ... No.
MilepostGCC is a MachineLearning compiler, which improve itself with time in the sense that it is able to change itself in order to become "better" with time. A simpler iterative compilation approach is able to improve pretty much any compiler.
25 years of programming and I have never heard of such a thing (unless you're talking about compilers that auto download software updates!).
Not yet practically implemented, to my knowledge, but yes, the theory is there:
Goedel machines: self-referential universal problem solvers making provably optimal self- improvements.
A self improving compiler would, by definition, have to have self modifying code. If you look around, you can find examples of people doing this (self modifying code). However, it's very uncommon to see - especially on projects as large and complex as a compiler. And it's uncommon for the very good reason that it's ridiculously hard (ie, close to impossible) to guarantee correct functionality. A lot of coders who think they're smart (especially Assembly coders) play around with this at one point or another. The ones who actually are smart mostly move out of this phase. ;)
In some situations, a C compiler is run several times without any human input, getting a "better" compiler each time.
Fortunately (or unfortunately, from another point of view) this process plateaus after a few steps -- further iterations generate exactly the same compiler executable as the last.
We have all the GCC source code, but the only C compiler on this machine available is not GCC.
Alas, parts of GCC use "extensions" that can only be built with GCC.
Fortunately, this machine does have a functional "make" executable, and some random proprietary C compiler.
The human goes to the directory with the GCC source, and manually types "make".
The make utility finds the MAKEFILE, which directs it to run the (proprietary) C compiler to compile GCC, and use the "-D" option so that all the parts of GCC that use "extensions" are #ifdef'ed out. (Those bits of code may be necessary to compile some programs, but not the next stage of GCC.). This produces a very limited cut-down binary executable, that barely has enough functionality to compile GCC (and the people who write GCC code carefully avoid using functionality that this cut-down binary does not support).
The make utility runs that cut-down binary executable with the appropriate option so that all the parts of GCC are compiled in, resulting in a fully-functional (but relatively slow) binary executable.
The make utility runs the fully-functional binary executable on the GCC source code with all the optimization options turned on, resulting in the actual GCC executable that people will use from now on, and installs it in the appropriate location.
The make utility tests to make sure everything is working OK: it runs the GCC executable from the standard location on the GCC source code with all the optimization options turned on. It then compares the resulting binary executable with the GCC executable in the standard location, and confirms that they are identical (with the possible exception of irrelevant timestamps).
After the human types "make", the whole process runs automatically, each stage generating an improved compiler (until it plateaus and generates an identical compiler).
http://gcc.gnu.org/wiki/Top-Level_Bootstrap and http://gcc.gnu.org/install/build.html and Compile GCC with Code Sourcery have a few more details.
I've seen other compilers that have many more stages in this process -- but they all require some human input after each stage or two. Example: "Bootstrapping a simple compiler from nothing" by Edmund Grimley Evans 2001
http://homepage.ntlworld.com/edmund.grimley-evans/bcompiler.html
And there is all the historical work done by the programmers who have worked on GCC, who use previous versions of GCC to compile and test their speculative ideas on possibly improved versions of GCC. While I wouldn't say this is without any human input, the trend seems to be that compilers do more and more "work" per human keystroke.
I'm not sure if it qualifies, but the Java HotSpot compiler improves code at runtime using statistics.
But at compile time? How will that compiler know what's deficient and what's not? What's the measure of goodness?
There are plenty of examples of people using genetic techniques to pit algorithms against each other to come up with "better" solutions. But these are usually well-understood problems that have a metric.
So what metrics could we apply at compile time? Minimum size of compiled code, cyclometric complexity, or something else? Which of these is meaningful at runtime?
Well, there is JIT (just in time) techniques. One could argue that a compiler with some JIT optimizations might re-adjust itself to be more efficient with the program it is compiling ???