Difference between PyObject and object? (and possibly other definitions) - cython

I'm looking for a precise definition of the following for the purpose of programming in Cython:
PyObject
object
Python object (as in "cannot convert X to Python object")
Cython object
and the differences between these and a shared object compiled from C/C++ code that Cython generates from a .pxd and .pyx file defining an extension type.
Edit: I mean the difference between the code one writes in C/C++ and the code that Cython generates. Compiling hand-written C/C++ code and Cython-generated code will produce different binaries, right? What does Cython do that makes it worth not writing everything in C/C++?
(bonus: definition of PyObject in CPython source code)
I've tried learning Cython from the official tutorials and by reading the documentation, but this confusion is a major obstacle to further progress.

Everything you manipulate in Python code is ultimately a "Python object". These are implemented/represented in C by the PyObject structure (sketched below), which contains:
A pointer to another PyObject defining its type.
A reference count (used to decide when it can be destroyed).
Some data, which can be basically anything; what it is depends on the type of the object and is what makes the object useful.
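For the bonus question, here is a rough sketch of how that struct can be declared from Cython; the field names match the struct in CPython's Include/object.h (the real header wraps them in macros, and newer CPython versions add extra fields). Cython already ships an equivalent declaration that you can cimport from cpython.ref, so this is only for illustration.
cdef extern from "Python.h":
    ctypedef struct PyTypeObject:
        pass
    ctypedef struct PyObject:
        Py_ssize_t ob_refcnt       # the reference count
        PyTypeObject *ob_type      # pointer to the type object (itself a Python object)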
(Almost) all Python programs will also work in Cython. Thus, if you compile a Python program with Cython you are still using Python objects. Cython generates C code that manipulates these Python objects through the Python C API (i.e. as PyObject*). You mostly don't need to worry about what it's actually doing, since it behaves the same as in Python - objects are automatically reference counted, and so on.
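For example, this plain Python function (a made-up example) compiles unchanged; in the generated C every value in it is handled as a PyObject*:
def greet(name):
    message = "Hello, " + name    # str objects, reference counted by the generated C code
    return message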
In Cython you can then specify types (e.g. cdef int, cdef char* or perhaps a C struct). These are directly C types, and using them appropriately gives you extra speed. They aren't Python objects (and so may need to be converted to Python objects if you want to pass them back to pure Python code - Cython knows how to do this for some, but not all, C types). The general rule in Cython is that everything is a Python object unless you say otherwise.
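A minimal sketch of what using C types looks like in a .pyx file (the function name and body are made up for illustration):
def harmonic(int n):
    cdef double total = 0.0       # a plain C double, not a Python object
    cdef int i                    # a plain C int, so the loop below is pure C
    for i in range(1, n + 1):
        total += 1.0 / i
    return total                  # converted back to a Python float on return
Inside the loop nothing is a Python object; only the argument and the return value cross the Python/C boundary.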
The object keyword in Cython is a way of declaring a variable's type to be "generic Python object". (It also keeps its usual meaning from Python, where object is the base type that every other type inherits from.) You don't normally have to use it, since Cython assumes things are Python objects by default unless you tell it otherwise.
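For illustration (hypothetical names), the two declarations below mean the same thing, because an untyped variable defaults to a Python object:
cdef object items = [1, 2, 3]     # explicitly declared as a generic Python object
stuff = [4, 5, 6]                 # untyped, so also a Python object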
A "Cython object" refers to an object with a type defined as cdef class Something. These are still Python objects, however the "useful data" is stored in a way that Cython knows about which makes it quick to access from Cython. Often this useful data is composed of basic C types (like int or char*).
If you really want, you can use the C-level PyObject* directly in Cython and call the C API functions yourself. If you do this, Cython does not take care of the reference counting for you (as it would if you had declared the type as object or simply not declared it at all). For this reason you shouldn't usually do it.
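A minimal sketch of what managing it yourself looks like (purely illustrative; normally you would just let Cython handle this, and sys.getrefcount gives similar information from plain Python):
from cpython.ref cimport PyObject, Py_INCREF, Py_DECREF
def peek_refcount(obj):
    cdef PyObject *p = <PyObject *>obj    # a raw pointer: no automatic reference counting
    Py_INCREF(obj)                        # every manual INCREF...
    count = p.ob_refcnt
    Py_DECREF(obj)                        # ...must be balanced by a DECREF
    return count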
There's no real difference between writing the C code yourself and letting Cython generate it - anything it can do, you could do by hand. The advantage is that Cython takes care of a lot of tedious and hard-to-get-right work (reference counting, error handling, conversions between C types and Python objects) for you.

Related

How to generate Verilog code with parametized modules in Chisel?

The following module definition in Chisel:
class Mux2 (width: Int = 4) extends Module
does not result in a Verilog module that is parametrized. The generated Verilog RTL will instead substitute the parameter value that the user instantiated the module with.
Is there a way to generate Verilog with actual parametrized module definitions, such as
module Mux2 #(parameter width = 4)
If there is no way to do this, it would be a very useful feature to add.
Unfortunately this is probably an impossible feature to add. Chisel is really just a Scala library of hardware primitives that enables you to write a Scala program to elaborate a circuit. Parameterization of Chisel generators is arbitrary Scala code, which would be impossible to map to Verilog constructs in the general case. In fact, the primary utility of Chisel comes from letting designers use higher-level constructs that do not exist in [synthesizable] Verilog (e.g. object-oriented programming, functional programming).

Chisel Output with SystemVerilog Interfaces/Structs

I'm finding that when generating Verilog output from the Chisel framework, all of the 'structure' defined in the Chisel code is lost at the interface.
This is problematic for instantiating this work in larger SystemVerilog designs.
Are there any extensions or features in Chisel to support this better? For example, automatically converting Chisel "Bundle" objects into SystemVerilog 'struct' ports.
Or creating SV enums, when the Chisel code is written using the Enum class.
Currently, no. However, both suggestions sound like very good candidates for discussion for future implementation in Chisel/FIRRTL.
SystemVerilog Struct Generation
Most Chisel code instantiated inside Verilog/SystemVerilog will use some interface wrapper that deals with converting the necessary signal names that the instantiator wants to use into Chisel-friendly names. As one example of doing this see AcceleratorWrapper. That instantiates a specific accelerator and does the connections to the Verilog names the instantiator expects. You can't currently do this with SystemVerilog structs, but you could accomplish the same thing with a SystemVerilog wrapper that maps the SystemVerilog structs to deterministic Chisel names. This is the same type of problem/solution that most people encounter/solve when integrating external IP in their project.
Kludges aside, what you're talking about is possible in the future...
Some explanation is necessary as to why this is complex:
Chisel is converted to FIRRTL. FIRRTL is then lowered to a reduced subset of FIRRTL called "low" FIRRTL. Low FIRRTL is then mapped to Verilog. Part of this lowering process flattens all bundles using uniquely determined names (typically a.b.c will lower to a_b_c but will be uniquified if a namespace conflict due to the lowering would result). Verilog has no support for structs, so this has to happen. Additionally, and more critically, some optimizations happen at the Low FIRRTL level like Constant Propagation and Dead Code Elimination that are easier to write and handle there.
However, SystemVerilog (or another language targeted by a FIRRTL backend that supports non-flat types) would benefit from using the features of that language to produce more human-readable output. There are two general approaches to rectifying this:
Lowered types retain information about how they were originally constructed via annotations and the SystemVerilog emitter reconstructs those. This seems inelegant due to lowering and then un-lowering.
The SystemVerilog emitter uses a different sequence of FIRRTL transforms that does not go all the way to Low FIRRTL. This would require some of the optimizing transforms run on Low FIRRTL to be rewritten to work on higher forms. This is tractable, but hard.
If you want some more information on what passes are run during each compiler phase, take a look at LoweringCompilers.scala
Enumerated Types
What you mention for Enum is planned for the Verilog backend. The idea here was to have Enums emit annotations describing what they are. The Verilog emitter would then generate localparams. The preliminary work for annotation generation was added as part of StrongEnum (chisel3#885/chisel3#892), but the annotations portion had to be later backed out. A solution to this is actively being worked on. A subsequent PR to FIRRTL will then augment the Verilog emitter to use these. So, look for this going forward.
On Contributions and Outreach
For questions like this with (currently) negative answers, feel free to file an issue on the respective Chisel3 or FIRRTL repository. And even better than that is an RFC followed by an implementation.

Issue when linking cuBLAS subroutine (FORTRAN binding) with FORTRAN subroutines

I'm trying to optimize some molecular simulation code (written completely in Fortran) by using GPUs. I've developed a small subroutine that performs matrix-vector multiplication using the cuBLAS Fortran bindings (non-thunking - /usr/local/cuda/src/fortran.c on Linux).
When I tested the subroutine outside of the rest of the code (i.e. without any other external subroutine calls) everything worked. When I compiled, I used the flags -names uppercase -assume nounderscore. Without them, I would receive undefined reference errors.
When porting this into the main function of the molecular dynamics code, the -assume nounderscore -names uppercase flags mess up all of my other function calls in the main program.
Any idea of a way around this? Please refer to my previous question where -assume nounderscore -names uppercase was suggested here
Thanks in advance!
I would try Fortran-C interop. With something like
interface
   function cublas_alloc(argument list) bind(C, name="the_binding_name")
      ! declarations of the arguments go here
   end function
end interface
the binding name can be uppercase or lowercase, whatever you need - for example, bind(C, name="CUBLAS_ALLOC"). No underscores will be appended to it.
The iso_c_binding module might also be helpful.

Given a pointer to a __global__ function, can I retrieve its name?

Suppose I have a pointer to a __global__ function in CUDA. Is there a way to programmatically ask CUDART for a string containing its name?
I don't believe this is possible by any public API.
I have previously tried poking around in the driver itself, but that doesn't look too promising. The compiler-emitted code for a <<< >>> kernel invocation clearly registers the mangled function name with the runtime via __cudaRegisterFunction, but I couldn't see any obvious way to perform a lookup by name/value in the runtime library. The driver API equivalent cuModuleGetFunction leads to an equally opaque type from which it doesn't seem possible to extract the function name.
Edited to add:
The host compiler itself doesn't support reflection, so there are no obvious fancy language tricks that could be pulled at runtime. One possibility would be to add another preprocessor pass to the compilation trajectory to build a static kernel function lookup table before the final build. That would be rather a lot of work, but it could be done, at least for "classic" compilation where everything winds up in a single translation unit.

Bootstrapping an interpreter?

We know that a compiler can be written in its own language using a trick known as bootstrapping. My question is whether this trick can be applied to interpreters as well?
In theory the answer is certainly yes, but one worry is that interpretation of the source code will become more and more inefficient as we go through the iterations. Would that be a serious problem?
I'm bootstrapping a very dynamical system where the programs will be constantly changing, so it rules out a compiler.
Let me spell it out this way:
Let the i's be interpreters.
Let the L's be programming languages.
We can write i1 in machine code (lowest level), to interpret L1.
We then write i2 in L1, interpreting L2 -- a new language.
We then write i3 in L2, interpreting L3 -- another new language.
and so on...
We don't need any compiler above, just interpreters. Right?
It could be inefficient. That is my question, and how to overcome it if it is indeed inefficient.
That doesn't make sense. An interpreter doesn't produce a binary, so can't create something that can run itself standalone. Somewhere, ultimately, you need to have a binary that is the interpreter.
An example of compiler bootstrapping: let's say we have two languages, A(ssembler) and C. We want to bootstrap a C compiler written in C, but we only have an assembler to start with.
Write basic C compiler in A
Write C compiler in C and compile with earlier compiler written in A
You now have a C compiler which can compile itself, you don't need A or the original compiler any more.
Later runs become just
Compile C program using compiler written in C
Now let's say you have an interpreted language instead, I'll call it Y. The first version can be called Y1, the next Y2 and so on. Let's try to "bootstrap" it.
First off, we don't have anything that can interpret Y programs, so we need to write a basic interpreter. Let's say we have a C compiler and write a Y1 interpreter in C.
Write Y1 interpreter in C, compile it
Write Y2 interpreter in Y1, run it on Y1 interpreter written in C
Write Y3 interpreter in Y2, run it on the Y2 interpreter running on the Y1 interpreter... written in C.
The problem is that you can never escape the stack of interpreters as you never compile a higher level interpreter. So you're always going to need to compile and run that first version interpreter written in C. You can never escape it, which I think is the fundamental point of the compiler bootstrapping process. This is why I say your question does not make sense.
The answer depends on what is being interpreted. If you're targeting a virtual machine which interprets bytecode, and your language is being developed iteratively while the bytecode doesn't change, then it is not a given that you will lose performance along the way. There are plenty of examples of languages which are bootstrapped on a target VM which wasn't designed particularly for that language, and they don't suffer a significant performance hit as a direct result (Scala on the JVM, for instance).
Using the JVM for example, you'd write the first compiler in Java, which compiles your source language to JVM bytecode. Then you'd rewrite your compiler to do exactly the same thing but in your new source language. The resulting bytecode could be indistinguishable between the two. Note that this is not the same thing as writing an interpreter in an interpreted language, which will become slower with each iteration.
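A minimal sketch of that idea in Python (the toy expression language and bytecode are invented for illustration): the VM only ever sees bytecode, so it does not matter which language the compiler that produced the bytecode was written in.
# Toy "compiler": turns nested tuples like ('+', 1, ('*', 2, 3))
# into a flat list of stack-machine instructions.
def compile_expr(expr):
    if isinstance(expr, (int, float)):
        return [("PUSH", expr)]
    op, lhs, rhs = expr
    return compile_expr(lhs) + compile_expr(rhs) + [(op,)]

# Toy "VM": interprets the fixed bytecode, never the source language.
def run(bytecode):
    stack = []
    for instr in bytecode:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "+" else a * b)
    return stack.pop()

print(run(compile_expr(("+", 1, ("*", 2, 3)))))   # prints 7
Whether compile_expr is written in Python, in the toy language itself, or in anything else, the bytecode it emits - and therefore the speed of run() - is the same. That is the sense in which a bytecode-targeting bootstrap avoids stacking interpreters on top of each other.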
This sentence does not seem to make sense:
I'm bootstrapping a very dynamical system where the programs will be constantly changing, so it rules out a compiler.
Whether you have an interpreter or a compiler, it will have to deal with something that is not changing, i.e. with your language. And even if the language is somehow "dynamic", there will be a meta-language that is fixed. Most probably you also have some low-level code, or at least a data structure, that the interpreter is working with.
You could first design and formalize this low-level code (whatever it is) and write some program that can "run" this. Once you have this, you can add a stack of interpreters and as long as they all produce this low level code, efficiency should not be an issue.
You can indeed, and this is the approach used by Squeak (and, I believe, many other Smalltalks). Here is one approach to doing just that: https://github.com/yoshikiohshima/SqueakBootstrapper/blob/master/README