What's the difference between using extension types (cdef) and pure Python code? - cython

Note that I'm new to the C language. According to the Basic Tutorial of Cython, I believe there are two ways of using Cython: building an extension from pure Python code, and using C-typed variables (cdef).
What I don't understand is the difference between them. Which of them is the more efficient or proper way to use Cython?

It's mostly historical.
Originally Cython only supported cdef declarations.
Pure Python mode was added as a way of adding declarations to a file to help speed it up while not requiring Cython.
Later, Python itself added type annotations. Cython can increasingly use these (via the annotation_typing directive, which defaults to true). If you like their syntax better than cdef, then use them. Or not.
The cdef version is slightly better tested, and there are still gaps in what you can do in "pure Python" mode, especially with respect to interfacing with native C/C++. But mostly they are different ways to achieve the same thing, and they should generate largely the same code, so use whichever you prefer. You can also use a mixture.
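As a rough sketch of the correspondence, here is the same toy function in both styles (the function itself is my own illustration, not from the tutorial):

```cython
# cdef style, as written in a .pyx file:
def integrate(double a, double b, int n):
    cdef double dx = (b - a) / n
    cdef double s = 0.0
    cdef int i
    for i in range(n):
        s += (a + i * dx) * (a + i * dx)
    return s * dx

# Pure Python style: valid Python that CPython can still run unchanged,
# but that Cython compiles with the same C types as above.
import cython

def integrate_pure(a: cython.double, b: cython.double,
                   n: cython.int) -> cython.double:
    dx: cython.double = (b - a) / n
    s: cython.double = 0.0
    i: cython.int
    for i in range(n):
        s += (a + i * dx) * (a + i * dx)
    return s * dx
```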

Most Python code can be "cythonized" directly, with no changes. Nevertheless, to get the best out of Cython, you need to adapt your Python code by providing cdef declarations and the types of your variables. This is not mandatory, but it is essential to get the decent speed-up you expect from Cython.

Related

Does the implementation of Eigen depend on standard containers?

Eigen is an awesome algebra/matrix computation C++ library and I'm using it in a project under development. But someone told me not to use it because it depends on standard containers, which I find doubtful. The reason not to use standard containers is complicated, so let's just ignore it for now. My question is: does Eigen's implementation really depend on the standard containers? I've searched the Eigen homepage but found nothing. Can anyone help me?
I would rather say no, as there are only two very marginal uses:
The first one is in IncompleteCholesky, where std::vector and std::list are used to hold some temporary objects during the computation, not as members. This class is only used if a user explicitly asks for it.
The second one is in the SuperLUSupport module, which is a module to support a third-party library. Again, you cannot use it accidentally!
The StlSupport module mentioned by Avi is just a helper module to ease the storage of Eigen's matrices within STL containers.
Yes, but only a very little bit. You may not even need those parts, depending on your precise use. You can run a quick grep to see exactly which std:: containers are used and where. In 3.3.0, there is a std::vector member as well as a std::list<>::iterator in ./src/IterativeLinearSolvers/IncompleteCholesky.h; std::vectors are also typically used as input for sparse matrices (SparseMatrix::setFromTriplets, although it really only needs the iterators).
There is also the ./src/StlSupport/ directory, but I'm not sure whether that's the kind of dependency you're trying to avoid.
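For what it's worth, the setFromTriplets pattern mentioned above typically looks like this (a minimal sketch; any iterator pair would do, a std::vector is just the conventional container):

```cpp
#include <vector>
#include <Eigen/Sparse>

int main() {
    // Collect (row, col, value) entries, conventionally in a std::vector.
    std::vector<Eigen::Triplet<double>> entries;
    entries.emplace_back(0, 0, 1.0);
    entries.emplace_back(1, 2, 3.5);

    // setFromTriplets only needs the iterator range, not the vector itself.
    Eigen::SparseMatrix<double> m(3, 3);
    m.setFromTriplets(entries.begin(), entries.end());
}
```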

Extending embedded Python in C++ - Design to interact with C++ instances

There are several packages out there that help automate the task of writing bindings between C/C++ and other languages.
In my case, I'd like to bind Python; some options for such packages are SWIG, Boost.Python, and Robin.
It seems that the straightforward process is to use these packages to create C/C++-linkable libraries (with mostly static functions) and have the higher-level language be extended using them.
However, my situation is that I already have a developed, working system in C++, and I therefore plan to embed Python into it so that future development will be in Python.
It's not clear to me how, if it is possible at all, to use these packages to extend embedded Python in such a way that the Python code can interact with the various singleton instances already running in the system, and instantiate C++ classes and interact with them.
What I'm looking for is insight regarding the design best suited to this situation.
Boost.Python lets you do a lot of those things right out of the box, especially if you use smart pointers. You can even inherit from C++ classes in Python, then pass instances of those back to your C++ code and have everything still work. My favorite resource on how to do various things is this (especially check out the "How To" section): http://wiki.python.org/moin/boost.python/ .
Boost.Python is especially good if you're using smart pointers or intrusive pointers, as those translate transparently into PyObject reference counting. Also, it's very good at making factory functions look like Python constructors, which makes for very clean Python APIs.
If you're not using smart pointers, it's still possible to do all the things you want, but you have to mess with various return and lifetime policies, which can give you a headache.
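As a sketch of that pattern (the Engine class and module name are hypothetical stand-ins for one of the singletons in your existing system):

```cpp
#include <boost/python.hpp>
#include <boost/noncopyable.hpp>

// Stand-in for an existing singleton in the running C++ system.
struct Engine {
    static Engine& instance() { static Engine e; return e; }
    int status() const { return 42; }
};

BOOST_PYTHON_MODULE(app) {
    using namespace boost::python;
    class_<Engine, boost::noncopyable>("Engine", no_init)
        .def("status", &Engine::status);
    // Expose the singleton accessor; the live object is referenced,
    // not copied, so Python talks to the same instance as the C++ code.
    def("engine", &Engine::instance,
        return_value_policy<reference_existing_object>());
}
```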
To make it short: There is the modern alternative pybind11.
Long version: I also had to embed Python. The C++ Python interface is small, so I decided to use the C API. That turned out to be a nightmare. Exposing classes makes you write tons of complicated boilerplate code. Boost.Python largely avoids this by using readable interface definitions. However, I found that Boost lacks sophisticated documentation, and for some things you still have to call the Python API. Furthermore, their build system seems to give people trouble; I can't tell, since I use the packages provided by the system. Finally, I tried the Boost.Python fork pybind11 and have to say that it is really convenient and fixes some shortcomings of Boost: you no longer need to fall back to the Python API, you can use lambdas, the documentation is easy to comprehend, and exceptions are translated automatically. Furthermore, it is header-only and does not pull in the huge Boost dependency on deployment, so I can definitely recommend it.
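To make that concrete, a minimal embedding sketch with pybind11 (the Engine singleton and the module name are again hypothetical):

```cpp
#include <pybind11/embed.h>
namespace py = pybind11;

// Stand-in for an existing singleton in the running C++ system.
struct Engine {
    static Engine& instance() { static Engine e; return e; }
    int status() const { return 42; }
};

// Registers a module that the embedded interpreter can import.
PYBIND11_EMBEDDED_MODULE(app, m) {
    py::class_<Engine>(m, "Engine")
        .def("status", &Engine::status);
    m.def("engine", &Engine::instance,
          py::return_value_policy::reference);
}

int main() {
    py::scoped_interpreter guard{};  // starts and stops the interpreter
    py::exec(R"(
import app
print(app.engine().status())  # talks to the live C++ instance
)");
}
```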

Why don't compilers translate to simpler languages?

Usually compilers translate from the language they support to assembly, or at most to an assembly-like intermediate language, like GIMPLE/GENERIC for GCC or Python/Java/.NET bytecode.
Wouldn't it be simpler for a compiler to translate to a simpler language that already implements a big subset of its grammar?
For example, an Objective-C compiler, which is 100% compatible with C, could add semantics only for the syntax it adds on top of C, translating it into plain C. I can see many advantages of doing this; one could use this Objective-C compiler to translate its code into C in order to compile the generated C code with a different compiler that doesn't support Objective-C (but that optimizes more, or that compiles quicker, or is able to compile for more architectures). Or one would be able to use the generated C code in a project where only C is allowed.
I guess/hope that if things worked like this, it would be a lot easier to write extensions for current languages (e.g., adding keywords to C++ to ease the implementation of common patterns, or, still in C++, removing the declare-before-use rule by moving inline member functions to the end of header files).
What kind of penalties would there be? Would the generated code be very difficult for humans to understand? Would compilers be unable to optimize as much as they can now? What else?
This is actually used by a lot of languages, through the use of intermediate languages. The biggest example of this would be Pascal, which had the Pascal-P system: Pascal was compiled into a hypothetical assembly language. To port Pascal, you only had to write a compiler for this assembly language: a task a lot simpler than porting the entire Pascal compiler. After writing this compiler, you'd only need to compile the (machine-independent) Pascal compiler that was written in Pascal itself.
Bootstrapping is also used quite often in programming language design. Many languages have their compilers written in the same language (Haskell comes to mind here). By doing this, adding new functionality to the language simply means translating that idea into the current language, putting it into the compiler, then recompiling.
I don't think the problem with this method is really the readability of generated code (I don't sift through the assembly or bytecode generated by compilers, personally), but one of optimization. Many ideas in higher-level programming languages (weak typing comes to mind) are hard to translate automatically into lower-level system languages such as C. There's a reason why GCC tends to do its optimization before code generation.
But for the most part, compilers do translate into simpler languages, except for maybe the most basic of system languages.
Incidentally, as a counterexample, Tcl is one language that is known to be very, very hard (if not totally impossible) to translate to C. Over the last 20 years there have been a couple of projects that tried this, even one promise of a commercial product, but none have materialized.
In part it is because Tcl is a very dynamic language (as any language with an eval function is). In part it is because the only way to know if something is code or data is to run the program.
Since Objective-C is a strict superset of C, and C++ contains a very large amount that is a lot like C, to parse either you effectively already need to be able to parse C. In that case, outputting machine code and outputting more C code aren't substantially different in processing cost; the main cost to the user is that compiling now takes as long as it originally did, plus the time a second compiler takes.
Any attempt to copy and paste the stuff that looks like C and translate the rest around it would be prone to problems. Firstly, C++ isn't a strict superset of C, so things that look like C don't necessarily compile exactly the same anyway (especially versus C99). And even if they did, suppose a user made an error in their C code: compilers don't tend to provide error information in a machine-readable format, so it'd be really hard for the Objective-C-to-C layer to give the user a meaningful error after receiving, e.g., "error at line 99".
That said, many compiler suites, like GCC and even more so like the upcoming Clang + LLVM, use an intermediate form to decouple the bit that knows about the specifics of one architecture from the bit that knows the specifics of a particular language. However, it tends to be more of a data structure than something intentionally easy to express as a written language.
So: compilers don't work like this for purely practical reasons.
Haskell is actually compiled this way: the GHC compiler first translates the source code to an intermediate functional language (which is less rich than Haskell itself), performs optimizations, and then lowers the whole thing to C code, which is then compiled by GCC. This solution has problems, though, and projects were started to replace this backend.
http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-llvm.html
There is a compiler construction stack which is fully based on this idea. Any new language is implemented as a trivial translation into a lower-level language, or into a combination of languages already defined within this stack.
http://www.meta-alternative.net/mbase.html
However, in order to be able to do so, you need at least some metaprogramming capabilities in every little language you add to the hierarchy. This requirement places some severe limitations on language semantics.

Are preprocessors obsolete in modern languages?

I'm making a simple compiler for a simple pet language I'm creating, and coming from a C background (though I'm writing it in Ruby) I wondered if a preprocessor is necessary.
What do you think? Is a "dumb" preprocessor still necessary in modern languages? Would C#'s conditional compilation capabilities be considered a "preprocessor"? Does every modern language that doesn't include a preprocessor have the utilities necessary to properly replace it? (For instance, the C++ preprocessor is now mostly obsolete, though still depended upon, because of templates.)
C's preprocessing can do really neat things, but if you look at the things it's used for, you realize it's often just adding another level of abstraction.
Preprocessing for different operations on different platforms? It's basically a layer of abstraction for platform independence.
Preprocessing for easily adding complex code? Abstraction because the language isn't generic enough.
Preprocessing for adding extensions into your code? Abstraction because your code / your language isn't flexible enough.
So my answer is: you don't need a preprocessor if your language is high-level enough *. I wouldn't call preprocessing evil or useless, I just say that the more abstract the language gets, the less reason I can think for it needing preprocessing.
* What's high-level enough? That is, of course, entirely subjective.
EDIT: Of course, I'm only really referring to macros. Using preprocessors for interfacing with other code files or for defining constants is evil.
The preprocessor is a cheap method to provide incomplete metaprogramming facilities to a language in an ugly fashion.
Prefer true metaprogramming or Lisp-style macros instead.
A preprocessor is not necessary. For real metaprogramming, you should have something like MetaML or Template Haskell or hygienic macros à la Scheme. For quick and dirty stuff, if your users absolutely must have it, there's always m4.
However, a modern language should support the equivalent of C's #line directives. Such directives enable the compiler to locate errors in the original source, even when that source is embedded in the output of a parser generator, a lexer generator, or a literate program (see the sketch after this list). In other words,
Design your language so as not to need a preprocessor.
Don't bundle your language with a blessed preprocessor.
But if others have their own reasons for using a preprocessor (parser generation is a popular one), provide support for accurate error messages.
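For example (the file name here is invented for illustration), C's #line remaps both compiler diagnostics and the __FILE__/__LINE__ macros:

```c
#include <stdio.h>

int main(void) {
    printf("%s:%d\n", __FILE__, __LINE__);  /* reports this file's real location */
#line 100 "grammar.y"
    printf("%s:%d\n", __FILE__, __LINE__);  /* reports grammar.y:100, as would any error here */
    return 0;
}
```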
I think that preprocessors are a crutch to keep a language with poor expressive power walking.
I have seen so much abuse of preprocessors that I hate them with a passion.
A preprocessor is a separate phase of compilation.
While preprocessing can be useful in some cases, the headaches and bugs it can cause make it a problem.
In C, the preprocessor is mostly used for:
Including files - While powerful, the most common use-cases do not need such power, and "import"/"using" constructs (like in Java/C#) are much cleaner to use; few people need the remaining cases.
Defining constants - Why not just provide a "const" statement?
Macros - While C-style macros are very powerful (they can include statements such as return), they also harm readability. Generics/templates are cleaner and, while less powerful in a few ways, easier to understand (see the sketch below).
Conditional compilation - This is possibly the most legitimate use-case for preprocessors, but once again it's painful for readability. Separating platform-specific code into platform-specific source files and using ordinary if statements ends up being better for readability.
So my answer is: while powerful, the preprocessor harms readability and/or isn't the best way to deal with some problems. Newer languages tend to consider code maintenance very important, and for those reasons the preprocessor appears to be obsolete.
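To illustrate the readability point about macros with a toy example of my own:

```c
#include <stdio.h>

/* Macro arguments are pasted textually, so SQUARE(i++) would expand to
 * ((i++) * (i++)) - duplicated side effects and undefined behavior. */
#define SQUARE(x) ((x) * (x))

/* The type-checked alternative evaluates its argument exactly once. */
static inline int square(int x) { return x * x; }

int main(void) {
    int i = 3;
    printf("%d\n", SQUARE(i));  /* fine here, but fragile in general */
    printf("%d\n", square(i));  /* same result, no expansion pitfalls */
    return 0;
}
```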
It's your language, so you can build whatever capabilities you want into the language itself, without the need for a preprocessor. I don't think a preprocessor should be necessary, and it adds a layer of complexity and obscurity on top of a language. Most modern languages don't have preprocessors, and in C++ you only use one when you have no other choice.
By the way, I believe D handles conditional compilation without a preprocessor.
It depends on exactly what other features you offer. For example, if I have a const int N, do you let me declare N variables? Have N member variables, and take an argument to construct all of them? Create N functions? Perform N operations that don't necessarily work in loops (for example, pass N arguments)? Use N template arguments? Conditional compilation? Constants that aren't integral?
The C preprocessor is so absurdly powerful in the proper hands that you'd need to make a seriously powerful language not to warrant one.
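One small demonstration of that power is the classic "X-macro" idiom (the color list here is made up), which generates an enum, a string table, and a count from a single definition - exactly the kind of "create N things" trick listed above:

```c
#include <stdio.h>

/* One list drives every construct below: add a color here and the
 * enum, the string table, and the count all update together. */
#define COLOR_LIST \
    X(RED)         \
    X(GREEN)       \
    X(BLUE)

#define X(name) COLOR_##name,
enum color { COLOR_LIST COLOR_COUNT };
#undef X

#define X(name) #name,
static const char *color_names[] = { COLOR_LIST };
#undef X

int main(void) {
    for (int i = 0; i < COLOR_COUNT; i++)
        printf("%d = %s\n", i, color_names[i]);
    return 0;
}
```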
I would say that although you should avoid the pre-processor for most everything you normally do, it's still necessary.
For example, in C++, in order to write a unit-testing library like Catch, a pre-processor is absolutely necessary. They use it in two different ways: one for assertion expansion [1], and one for nesting sections in test cases [2].
But, the pre-processor shouldn't be abused to do compile-time computations in C++ where const-expressions and template meta-programming can be used.
Sorry, I don't have enough reputation to post more than two links, so I'm putting this here:
github.com/philsquared/Catch/blob/master/docs/assertions.md
github.com/philsquared/Catch/blob/master/docs/test-cases-and-sections.md
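For reference, the macro-based surface those links describe looks roughly like this in use (a toy test of my own; the header path assumes the single-header Catch2 v2 distribution):

```cpp
#define CATCH_CONFIG_MAIN  // ask Catch to generate main()
#include <catch2/catch.hpp>
#include <vector>

TEST_CASE("vectors can grow") {
    std::vector<int> v(5);

    SECTION("push_back increases the size") {  // sections nest via the preprocessor
        v.push_back(42);
        REQUIRE(v.size() == 6);  // the macro captures the expression for reporting
    }
    SECTION("reserve does not change the size") {
        v.reserve(10);
        REQUIRE(v.size() == 5);
    }
}
```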
As others have pointed out, much of the functionality provided by the C preprocessor exists to compensate for limitations of the C language. For example, #include and inclusion guards exist due to the lack of an import statement, and macros largely exist due to the lack of inline functions and constant declarations.
However, the one feature of the C preprocessor that would still be beneficial in more modern languages is the #line directive, since this supports the use of semantically rich preprocessors/compilers. As an example, consider yacc, which is a domain-specific language (DSL) for writing a parser as a collection of BNF grammar rules. A central feature of yacc is that chunks of C code, called actions, can be embedded within BNF rules. When a BNF rule is used to parse a piece of an input file, an action embedded in that rule is executed. The yacc compiler generates a C file that implements the BNF-based parser specified in the input file; any actions that appeared in the input yacc file are copied to the generated C file, but each action is surrounded by #line directives. This use of #line directives provides two important benefits.
First, if there is a syntax error in an action, then the error message generated by the C compiler can specify that the error occurred in, say, <input-file-to-yacc>, line 42 rather than in <output-file-generated-by-yacc>.c, line 3967.
Second, the location information provided by #line directives is copied into the object code generated by the C compiler. So if you are using a debugger to investigate a program crash, and the bug that caused the crash originated from an action embedded in a yacc input file, then the debugger will report the location of that buggy line of source code as being in <input-file-to-yacc>, line 42 rather than in <output-file-generated-by-yacc>.c, line 3967.
The designers of C# and Perl wisely provided a #line directive. Unfortunately, the designers of many other languages (Java being one that springs to mind) neglected to provide a #line directive. Because of this, Yacc-like parser generators for many languages are unable to communicate the source location of embedded actions to compilers (and, therefore, to debuggers).
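A compilable sketch of the mechanism described above - the action body is attributed to the grammar file, the surrounding glue to the generated file (all file names, line numbers, and the stack layout are invented for illustration):

```c
#include <stdio.h>

/* Mimics a fragment of a yacc-generated parser. */
int yyval;
int yyvsp[3] = {2, 0, 3};  /* semantic values for "2 + 3" */

void reduce_expr_plus_expr(void) {
#line 42 "calc.y"
    yyval = yyvsp[0] + yyvsp[2];  /* the user's action: $$ = $1 + $3; */
#line 15 "y.tab.c"
}

int main(void) {
    reduce_expr_plus_expr();
    printf("%d\n", yyval);  /* 5 */
    return 0;
}
```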

What features of interpreted languages can a compiled one not have?

Interpreted languages are usually more high-level and therefore have features such as dynamic typing (including creating new variables dynamically without declaration), the infamous eval, and many, many other features that make a programmer's life easier - but why can't compiled languages have these as well?
I don't mean languages like Java that run on a VM, but those that compile to binary like C(++).
I'm not going to make a list now, but if you are going to ask which features I mean, please look into what PHP, Python, Ruby, etc. have to offer.
Which common features of interpreted languages can't/don't/do exist in compiled languages? Why?
Whether source code is compiled - to native binaries, to some kind of intermediate language (Java bytecode/IL) - or interpreted is not a trait of the language at all. It's just a question of the implementation.
You can actually have both compilers and interpreters for the same language, for example:
Haskell: GHC <-> GHCi
C: gcc <-> Ch
VB6: VS IDE <-> VB6 compiler
Certain language features like eval or dynamic typing may suggest a distinction between so-called "dynamic languages" and static ones, but how the code is run can never be the primary question.
Initially, one of the largest benefits of interpreted languages was debugging. That way you can get incredibly accurate and detailed information when looking for the reason a program isn't working. However, most compilers have become advanced enough that that is not too big of a deal any more.
The other main benefit (in my opinion anyway), is that with interpreted languages, you don't have to wait for eternity for your project to compile to test it out.
You couldn't plausibly do eval, for example, for reasons I'd have thought were pretty obvious: exactly how would you implement it? Make the runtime contain a full copy of the compiler? Every time you wanted to evaluate a string (keeping in mind that each time it could be different!) you'd save the string to a file, run the compiler on it to make a DLL/shared-lib, then load that DLL/shared-lib and call your code? You can't see why this might be a wee bit impractical? ;)
Dynamic languages are full of this kind of thing, which you can't do with static code short of, in effect, running an interpreter behind the scenes.
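Just to spell out how clunky that scheme would be, a deliberately crude POSIX sketch (paths, names, and the cc invocation are all illustrative, and error handling is omitted):

```c
/* "eval" for a compiled language: write the string out, shell out to the
 * compiler, load the result. Every single call pays a full compile.
 * Link with -ldl on most systems. */
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

double eval_expr(const char *expr) {
    FILE *f = fopen("/tmp/snippet.c", "w");
    fprintf(f, "double snippet(void) { return %s; }\n", expr);
    fclose(f);

    system("cc -shared -fPIC -o /tmp/snippet.so /tmp/snippet.c");

    void *lib = dlopen("/tmp/snippet.so", RTLD_NOW);
    double (*fn)(void) = (double (*)(void))dlsym(lib, "snippet");
    double result = fn();
    dlclose(lib);
    return result;
}

int main(void) {
    printf("%g\n", eval_expr("6 * 7"));  /* 42 */
    return 0;
}
```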
Continuing on from Dario - I think you are really asking why a compiled program can't evaluate statements at runtime (e.g. eval). Here are some reasons I can think of:
The full compiler would have to be distributed with the program (or be part of the program).
For an eval function to have access to type information and symbols (such as variable names and function names) in the environment where it is used, the original program would have to be compiled with those symbols accessible (compiled languages usually remove them at compile time).
Edit: As noted, neither of these reasons makes it impossible for a language/compiler to evaluate code at runtime, but they are definitely things that need to be taken into consideration when developing a compiler or designing a language.
Maybe the question is not about interpreted/compiled languages ("compiled" is ambiguous anyway) but about languages that do/don't carry their own compiler around with them? For instance, we've said C++ could do eval with a handy compiler floating around in the app, and reflection presumably is similar in some ways.