Inlining a cdef method from a cdef class from another cython package - cython

I have a Cython class which looks like this:
cdef class Cls:
    cdef func1(self):
        pass
If I use this class in another library, will I be able to inline func1, which is a method of the class? Or should I find a way around it (for example, by creating a function that takes a Cls pointer as an argument)?

There is bad news and good news: inlining isn't possible from another module, but you don't have to pay the full price of a Python function call.
What is inlining? It is done by the C-compiler: when the C-compiler knows the definition of a function it can decide to inline it. This has two advantages:
You don't have to pay the overhead of calling a function
It makes further optimizations possible.
See for example:
%%cython -a
ctypedef unsigned long long ull
cdef ull doit(ull a):
    return a
def calc_sum_fun():
    cdef ull res=0
    cdef ull i
    for i in range(1000000000):#10**9
        res+=doit(i)
    return res
>>> %timeit calc_sum_fun()
53.4 ns ± 1.4 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
How was it possible to do 10^9 additions in 53 nanoseconds? Because they were not done: the C compiler inlined the cdef doit() and was able to calculate the result of the loop at compile time. So at run time the program simply returns the precomputed result.
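To make the precomputation concrete: the loop sums 0..n-1, which has the closed form n(n-1)/2, so an optimizing compiler can in effect replace the whole loop with a constant. The plain-Python sketch below (the function names are illustrative and not part of the answer) checks the closed form against a real loop for a small n and then evaluates it for n = 10**9:

```python
def loop_sum(n):
    # What the source code asks for: n additions.
    total = 0
    for i in range(n):
        total += i
    return total

def closed_form_sum(n):
    # What the loop reduces to once doit() is inlined: 0 + 1 + ... + (n-1).
    return n * (n - 1) // 2

small_n = 1000
assert loop_sum(small_n) == closed_form_sum(small_n)

# The value for 10**9 iterations still fits into an unsigned long long.
precomputed = closed_form_sum(10**9)
print(precomputed)
```

Whether a given C compiler actually performs this reduction depends on the compiler and its optimization level; the timing above merely suggests that it did here.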
It is fairly obvious from this that the C compiler cannot inline a function from another module, because the definition is hidden from it in another C file/translation unit. As an example:
#simple.pxd:
ctypedef unsigned long long ull
cdef ull doit(ull a)
#simple.pyx:
cdef ull doit(ull a):
    return a
def doit_slow(a):
    return a
and now accessing it from another Cython module:
%%cython
cimport simple
ctypedef unsigned long long ull
def calc_sum_fun():
    cdef ull res=0
    cdef ull i
    for i in range(10000000):#10**7
        res+=simple.doit(i)
    return res
leads to the following timings:
>>> %timeit calc_sum_fun()
17.8 ms ± 208 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Because inlining was not possible, the function really has to run the loop. However, it does so faster than with a normal Python call, which we can measure by replacing cdef doit() with def doit_slow():
%%cython
import simple #import, not cimport
ctypedef unsigned long long ull
def calc_sum_fun_slow():
    cdef ull res=0
    cdef ull i
    for i in range(10000000):#10**7
        res+=simple.doit_slow(i) #slow
    return res
The Python call is about 60 times slower:
>>> %timeit calc_sum_fun_slow()
1.07 s ± 20.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
But you asked about class-methods and not global functions. For class-methods the inlining is not possible even in the same module:
%%cython
ctypedef unsigned long long ull
cdef class A:
    cdef ull doit(self, ull a):
        return a
def calc_sum_class():
    cdef ull res=0
    cdef ull i
    cdef A a=A()
    for i in range(10000000):#10**7
        res+=a.doit(i)
    return res
Leads to:
>>> %timeit calc_sum_class()
18.2 ms ± 264 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
which is basically the same as in the case where the cdef class is defined in another module.
The reason for this behavior is the way a cdef class is built. It is not unlike virtual classes in C++: the class definition has something similar to a virtual table, called __pyx_vtab:
struct __pyx_obj_12simple_class_A {
  PyObject_HEAD
  struct __pyx_vtabstruct_12simple_class_A *__pyx_vtab;
};
where the pointer to cdef doit() is saved:
struct __pyx_vtabstruct_12simple_class_A {
  __pyx_t_12simple_class_ull (*doit)(struct __pyx_obj_12simple_class_A *, __pyx_t_12simple_class_ull);
};
When we call a.doit() we don't call the function directly but via this pointer:
((struct __pyx_vtabstruct_12simple_class_A *)__pyx_v_a->__pyx_vtab)->doit(__pyx_v_a, __pyx_v_i);
which explains why the C-compiler cannot inline the function doit().
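The indirection can be modelled in plain Python as a rough analogy. In the sketch below, VTable, Obj, a_doit and b_doit are hypothetical stand-ins for the generated structs and functions, not Cython API. The call site in calc_sum only sees "whatever the table currently points to", which is why the target cannot be fixed at compile time:

```python
class VTable:
    """Stand-in for the generated vtab struct: one slot per cdef method."""
    def __init__(self, doit):
        self.doit = doit

class Obj:
    """Stand-in for the object struct: it only stores a pointer to a table."""
    def __init__(self, vtab):
        self.vtab = vtab

def a_doit(self_obj, value):
    return value

def b_doit(self_obj, value):
    # A subclass could install a different function in the same slot.
    return 2 * value

def calc_sum(obj, n):
    res = 0
    for i in range(n):
        # Same call site, target looked up through the table on every call.
        res += obj.vtab.doit(obj, i)
    return res

a = Obj(VTable(a_doit))
b = Obj(VTable(b_doit))
sum_a = calc_sum(a, 10)
sum_b = calc_sum(b, 10)
print(sum_a, sum_b)
```

The same loop produces different results depending on which table the object carries, so the function body to inline is not known when calc_sum is compiled.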

Related

How to define a tuple that has a Python object?

The language docs show how to define a ctuple of regular C types, but would it be possible to mix a Python object into a ctuple?
A PyObject * is a quite cumbersome thing in C: every time its value (i.e. the address of a Python-object) is copied the reference counter must be increased and every time a PyObject * gets a new value (or goes out of scope) the reference counter must be decreased.
This is very similar to C++'s std::shared_ptr, except that C offers no (copy) constructor, destructor or assignment operator to do the bookkeeping automatically.
For a variable of type object, Cython manages the reference count, but this doesn't work with C structs out of the box.
So one has to fall back to PyObject * in a ctuple. The main difference between PyObject * and object is that Cython no longer manages the reference counting for PyObject *, and thus it can be used in a ctuple.
How it should be done depends on the usage of ctuple.
If we have a guarantee, that Python objects live longer than our ctuple, we don't have to care about in-/decreasing reference counter (i.e. weak references are enough), e.g.:
%%cython
from cpython cimport PyObject
cdef (PyObject *, PyObject *) create_weak(object a, object b):
    return (<PyObject *>a, <PyObject *>b) # Cython no longer manages ref-counting
def use_weak(a, b):
    cdef (PyObject *, PyObject *) p = create_weak(a,b)
    return <object>p[0], <object>p[1] # casting to object => Cython manages ref-counting
However, if we must ensure that the objects live long enough, we must perform reference counting (and that can be quite error prone):
%%cython
from cpython cimport PyObject, Py_XINCREF, Py_XDECREF
cdef (PyObject *, PyObject *) create(object a, object b):
    cdef PyObject *a_ptr = <PyObject *>a
    cdef PyObject *b_ptr = <PyObject *>b
    Py_XINCREF(a_ptr) # need to ensure that objects
    Py_XINCREF(b_ptr) # stay alive as long as ctuple lives
    return (a_ptr, b_ptr)
cdef void free((PyObject *, PyObject *) p):
    Py_XDECREF(p[0]) # p will go out of scope soon
    Py_XDECREF(p[1]) # no need to keep objects alive
def use(a, b):
    cdef (PyObject *, PyObject *) p = create(a,b)
    # as long as object of p alive use them:
    res0 = <object>p[0]
    res1 = <object>p[1]
    # before p goes out of scope decrease ref count of objects
    free(p)
    # res0, res1 are still alive, because Cython ensured
    # it when casting to <object>
    return res0, res1
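The effect of taking and releasing references can be observed from plain Python on CPython with sys.getrefcount. The sketch below is only an analogy: a list that stores the object twice plays the role of the two Py_XINCREF calls, and deleting the list plays the role of the Py_XDECREF calls. Payload and holder are illustrative names, and only differences between counts are compared, since the absolute values depend on the interpreter version:

```python
import sys

class Payload:
    """A throwaway class, so the instance is a normal ref-counted object."""

obj = Payload()
before = sys.getrefcount(obj)

# Storing the object twice takes two additional references,
# comparable to calling Py_XINCREF twice on the same pointer.
holder = [obj, obj]
during = sys.getrefcount(obj)

# Dropping the container releases both references again,
# comparable to the two Py_XDECREF calls in free().
del holder
after = sys.getrefcount(obj)

print(during - before, after - before)
```

If the release step is forgotten, the count never drops back and the object leaks; if it happens once too often, the object may be freed while still in use. That is the error-prone part of managing PyObject * by hand.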
Another alternative would be to use C++ and to wrap PyObject * in a C++ class which handles the reference counting. Here is a small prototype:
%%cython -+
from cpython cimport PyObject
cdef extern from *:
    """
    #include <Python.h>
    class PyObjectHolder{
    public:
       PyObject *ptr;
       PyObjectHolder():ptr(nullptr){}
       PyObjectHolder(PyObject *o):ptr(o){
          Py_XINCREF(ptr);
       }
       //rule of 3
       ~PyObjectHolder(){
          Py_XDECREF(ptr);
       }
       PyObjectHolder(const PyObjectHolder &h):
          PyObjectHolder(h.ptr){}
       PyObjectHolder& operator=(const PyObjectHolder &other){
          Py_XDECREF(ptr);
          ptr=other.ptr;
          Py_XINCREF(ptr);
          return *this;
       }
    };
    """
    cdef cppclass PyObjectHolder:
        PyObjectHolder(object o)
        PyObject *ptr
cdef (PyObjectHolder, PyObjectHolder) create_cpp(object a, object b):
    return (PyObjectHolder(a), PyObjectHolder(b))
def use_cpp(a, b):
    cdef (PyObjectHolder, PyObjectHolder) p = create_cpp(a,b)
    return <object>(p[0].ptr), <object>(p[1].ptr)
If using c++ is possible, then using a wrapper for PyObject seems to me the saner alternative.

How to call a cdef method

I'd like to call my cdef methods and improve the speed of my program as much as possible. I do not want to use cpdef (I explain why below). Ultimately, I'd like to access cdef methods (some of which return void) that are members of my Cython extensions.
I tried following this example, which gives me the impression that I can call a cdef function by making a Python (def) wrapper for it.
I can't reproduce these results, so I tried a different problem for myself (summing all the numbers from 0 to n).
Of course, I'm looking at the documentation, which says
The directive cpdef makes two versions of the method available; one fast for use from Cython and one slower for use from Python.
and later (emphasis mine),
This does slightly more than providing a python wrapper for a cdef method: unlike a cdef method, a cpdef method is fully overridable by methods and instance attributes in Python subclasses. It adds a little calling overhead compared to a cdef method.
So how does one use a cdef function without the extra calling overhead of a cpdef function?
With the code at the end of this question, I get the following results:
def/cdef:
273.04207632583245
def/cpdef:
304.4114626176919
cpdef/cdef:
0.8969507060538783
Somehow, cpdef is faster than cdef. For n < 100, I can occasionally get cpdef/cdef > 1, but it's rare. I think it has to do with wrapping the cdef function in a def function. This is what the example I link to does, but they claim better performance from using cdef than from using cpdef.
I'm pretty sure this is not how you wrap a cdef function while avoiding the additional overhead (the source of which is not clearly documented) of a cpdef.
And now, the code:
setup.py
from setuptools import setup, Extension
from Cython.Build import cythonize
pkg_name = "tmp"
compile_args=['-std=c++17']
cy_foo = Extension(
    name=pkg_name + '.core.cy_foo',
    sources=[
        pkg_name + '/core/cy_foo.pyx',
    ],
    language='c++',
    extra_compile_args=compile_args,
)
setup(
    name=pkg_name,
    ext_modules=cythonize(cy_foo,
                          annotate=True,
                          build_dir='build'),
    packages=[
        pkg_name,
        pkg_name + '.core',
    ],
)
foo.py
def foo_def(n):
    sum = 0
    for i in range(n):
        sum += i
    return sum
cy_foo.pyx
def foo_cdef(n):
    return foo_cy(n)
cdef int foo_cy(int n):
    cdef int sum = 0
    cdef int i = 0
    for i in range(n):
        sum += i
    return sum
cpdef int foo_cpdef(int n):
    cdef int sum = 0
    cdef int i = 0
    for i in range(n):
        sum += i
    return sum
test.py
import timeit
from tmp.core.foo import foo_def
from tmp.core.cy_foo import foo_cdef
from tmp.core.cy_foo import foo_cpdef
n = 10000
# Python call
start_time = timeit.default_timer()
a = foo_def(n)
pyTime = timeit.default_timer() - start_time
# Call Python wrapper for C function
start_time = timeit.default_timer()
b = foo_cdef(n)
cTime = timeit.default_timer() - start_time
# Call cpdef function, which does more than wrap a cdef function (whatever that means)
start_time = timeit.default_timer()
c = foo_cpdef(n)
cpTime = timeit.default_timer() - start_time
print("def/cdef:")
print(pyTime/cTime)
print("def/cpdef:")
print(pyTime/cpTime)
print("cpdef/cdef:")
print(cpTime/cTime)
The reason for your seemingly anomalous result is that you aren't calling the cdef function foo_cy directly, but rather the def function foo_cdef that wraps it.
When you wrap it inside another def, you are again calling a Python function. However, you should be able to reach results similar to cpdef.
Here is what you could do:
In the def wrapper, give types for both the input and the output:
def foo_cdef(int n):
    cdef int val = 0
    val = foo_cy(n)
    return val
This should give results similar to cpdef; however, you are still calling a Python function. If you want to call the C function directly, you should use ctypes and call it from there.
As for the benchmarking: the way you have written it, it measures only a single run, so the result can fluctuate a lot because of other OS tasks and the timer itself.
It is better to use the built-in timeit function to measure over a number of iterations:
# Python call
pyTime = timeit.timeit('foo_def(n)',globals=globals(), number=10000)
# Call Python wrapper for C function
cTime = timeit.timeit('foo_cdef(n)',globals=globals(), number=10000)
# Call cpdef function, which does more than wrap a cdef function (whatever that means)
cpTime = timeit.timeit('foo_cpdef(n)',globals=globals(), number=10000)
output:
def/cdef:
154.0166154428522
def/cpdef:
154.22669848136132
cpdef/cdef:
0.9986378296327566
This way you get consistent results, and you also see that the ratio is always close to 1, whether Cython generates the wrapper itself (cpdef) or we explicitly wrap the cdef function in a Python function.
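A common refinement, sketched here under the assumption that only the pure-Python foo_def is available (so it runs without compiling anything), is to repeat the whole measurement several times with timeit.repeat and keep the minimum, which is the run least disturbed by other processes. The repeat and number values are arbitrary choices for illustration:

```python
import timeit

def foo_def(n):
    total = 0
    for i in range(n):
        total += i
    return total

n = 1000
calls_per_run = 200

# Five independent runs of 200 calls each; each entry is the total
# wall-clock time of one run in seconds.
runs = timeit.repeat(lambda: foo_def(n), repeat=5, number=calls_per_run)

# The fastest run is the best estimate of the undisturbed cost.
best_per_call = min(runs) / calls_per_run
print(f"best of {len(runs)} runs: {best_per_call * 1e6:.2f} microseconds per call")
```

The same pattern applies to foo_cdef and foo_cpdef once the extension module is built; comparing the minima rather than single measurements makes ratios such as cpdef/cdef far less noisy.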

Cython, using function pointer inside class

I am trying to use a function pointer inside a Cython class.
The outside_class ctypedef works like a charm, but I am unable to get inside_class to work. A "ctypedef statement not allowed here" error is thrown and I don't understand what is wrong.
Why should this work
The outside_class typedef works, so I assumed it should also work inside the class. I was unable to get it to work, so I tried to find more information on it; unfortunately all the information I found is about the outside_class case, so I do not know whether the other variant is allowed or even possible. To me it seems the only difference is the self argument.
Why do I want this to work
This class is going to contain 35+ functions with the same arguments; in use, only some of those functions are called, in a specific order. When initializing, I want to create an array with all the functions in the correct order. Of course, a different way of doing so is also welcome.
updated code sample 14-02
Tests A and B work but C and D do not; the error message is given below.
My code:
ctypedef int (*outside_class)()
ctypedef int (*inside_class)(Preprocess)
cdef int outside_foo():
    return 12
cdef int outside_bar(Preprocess self):
    return 20
cdef class Preprocess:
    cdef int inside_foo(self):
        return 18
    cdef int inside_bar(self):
        return 14
    cdef int inside_sek(self):
        return 16
    def __init__(self):
        cdef outside_class test_A
        test_A = &outside_foo
        print( test_A() )
        cdef inside_class test_B
        test_B = &outside_bar
        print( test_B(self) )
        cdef inside_class test_C
        test_C = &self.inside_foo
        #print( test_C(self) )
        print( "no error, yet.." )
        cdef inside_class test_D
        test_D = &self.inside_foo
        print( test_D(self) )
error
/home/boss/.pyxbld/temp.linux-x86_64-2.7/pyrex/aa/preprocessing/preprocessing.c: In function ‘__pyx_pf_7aa_13preprocessing_13preprocessing_10Preprocess___init__’:
/home/boss/.pyxbld/temp.linux-x86_64-2.7/pyrex/aa/preprocessing/preprocessing.c:938:18: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
__pyx_v_test_C = (&((struct __pyx_vtabstruct_7aa_13preprocessing_13preprocessing_Preprocess *)__pyx_v_se
^
/home/boss/.pyxbld/temp.linux-x86_64-2.7/pyrex/aa/preprocessing/preprocessing.c:955:18: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
__pyx_v_test_D = (&((struct __pyx_vtabstruct_7aa_13preprocessing_13preprocessing_Preprocess *)__pyx_v_se
^
12
20
no error, yet..
Segmentation fault (core dumped)
Cython raises the error as soon as it sees the ctypedef within the class definition. It hasn't even looked at, or run, the &self.inside_foo assignment:
0000:~/mypy/cython3$ cython stack42214943.pyx -a
Error compiling Cython file:
------------------------------------------------------------
...
cdef int outside_foo():
    return 12
cdef class Preprocess:
    ctypedef int (*inside_class)(Preprocess)
    ^
------------------------------------------------------------
stack42214943.pyx:8:4: ctypedef statement not allowed here
If I try cdef int(*)(Preprocess) inside_test, I get a Syntax error in C variable declaration. Again before the self line.
(edit)
With the following code I can create and run both a python list of 3 functions and a C array of the same.
    def __init__(self):
        cdef outside_class test_A
        test_A = &outside_foo
        print( test_A() )
        cdef inside_class test_B
        test_B = &outside_bar
        print( test_B(self) )
        print(self.inside_foo())
    cpdef evalc(self):
        # cdef int (*inside_array[3]) (Preprocess)
        cdef inside_class inside_array[3]
        inside_array[0] = self.inside_foo
        inside_array[1] = self.inside_bar
        inside_array[2] = self.inside_sek
        print('eval inside_array')
        for fn in inside_array:
            print(fn(self))
    def evals(self):
        alist = [self.inside_foo, self.inside_bar, self.inside_sek]
        alist = [fn(self) for fn in alist]
        print(alist)
        self.evalc()
In an Ipython session I can compile and import this, and run it with:
In [3]: p=stack42214943.Preprocess()
12
20
18
In [4]: p.evals()
[18, 14, 16]
eval inside_array
18
14
16
In [5]: p.evalc()
eval inside_array
18
14
16
I haven't figured out how to define and access inside_array outside of the evalc function. But maybe I don't need to. And instead of printing, that function could return the 3 values as some sort of int array or list.
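The underlying design, building the ordered list of steps once at initialization and then just iterating over it, can be sketched in plain Python first. This is not the C function-pointer version: it stores bound methods in a Python list, and step_names, steps and run are hypothetical names introduced only for this illustration:

```python
class Preprocess:
    def __init__(self, step_names):
        # Resolve the names to bound methods once, in the requested order.
        self.steps = [getattr(self, name) for name in step_names]

    def inside_foo(self):
        return 18

    def inside_bar(self):
        return 14

    def inside_sek(self):
        return 16

    def run(self):
        # Call only the selected steps, in the order fixed at init time.
        return [step() for step in self.steps]

p = Preprocess(["inside_sek", "inside_foo"])
result = p.run()
print(result)
```

In Cython the list of bound methods would be replaced by a C array of inside_class pointers stored as an attribute of the cdef class; whether that is worth the extra complexity depends on how much the per-call overhead matters.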

Cython return tuple within cdef?

Hi I am trying to convert a python code into cython in order to speed up its calculation. I am trying to return multiple arrays within the cython code from a cdef to cpdef. Based on classical C, I could either use a pointer or a tuple. I decide to use tuple because the size varies. I know the following code doesn't work, any help? Thank you!
import numpy as np
cimport numpy as np
cdef tuple funA(double[:] X, double[:] Y):
    cdef int nX, nY, i
    nX = len(X)
    nY = len(Y)
    for i in range(nX):
        X[i] = X[i]*X[i]
    for i in range(nY):
        Y[i] = Y[i]*Y[i]
    return X,Y
cpdef Run(double[:] X, double[:] Y)
    cdef Tuple1, Tuple2 = funA(X,Y)
    # Do some calculation with Tuple1 and Tuple2
    # Example
    cdef int i, nTuple1, nTuple2
    nTuple1 = len(Tuple1)
    for i in range(nTuple1):
        Tuple1[i] = Tuple1[i]**2
    nTuple2 = len(Tuple2)
    for i in range(nTuple2):
        Tuple2[i] = Tuple2[i]/2
    return Tuple1, Tuple2
You've got a few indentation errors and missing colons. But your real issue is:
cdef Tuple1, Tuple2 = funA(X,Y)
Remove the cdef and it's fine. It doesn't look like cdef and tuple unpacking quite mix, and since you're treating them as Python objects it should be OK.
However, note that you don't really need to return anything from funA, since you modify X and Y in place there.
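The in-place behaviour can be checked without Cython, because a memoryview in plain Python also refers to the caller's buffer rather than a copy. A minimal sketch, with square_in_place as an illustrative name and array.array standing in for the numpy arrays of the question:

```python
from array import array

def square_in_place(view):
    # Writes go straight into the buffer that the view refers to.
    for i in range(len(view)):
        view[i] = view[i] * view[i]

x = array("d", [1.0, 2.0, 3.0])
square_in_place(memoryview(x))

# x itself has changed, although nothing was returned.
print(x.tolist())
```

The same holds for a double[:] argument in Cython: the caller's array already contains the squared values after funA returns.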

Why can't I pass a C array to a function which expects a memory view in a nogil context?

cdef double testB(double[:] x) nogil:
    return x[0]
def test():
    cdef double xx[2]
    with nogil:
        testB(xx)
# compiler error: Operation not allowed without gil
With the GIL held, it works fine.
Is it because passing in a C array creates a memory view, and that creation actually requires the GIL? So the memory view is not completely a C object?
Update
%%cython --annotate
cimport cython
cdef double testA(double[:] x) nogil:
    return x[0]
cpdef myf():
    cdef double pd[8]
    cdef double[:] x = pd
    testA(x)
cdef double[:] x = pd is compiled to:
__pyx_t_3 = __pyx_format_from_typeinfo(&__Pyx_TypeInfo_double);
__pyx_t_2 = Py_BuildValue((char*) "(" __PYX_BUILD_PY_SSIZE_T ")", ((Py_ssize_t)8));
if (unlikely(!__pyx_t_3 || !__pyx_t_2 || !PyBytes_AsString(__pyx_t_3))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
__Pyx_GOTREF(__pyx_t_3);
__Pyx_GOTREF(__pyx_t_2);
__pyx_t_1 = __pyx_array_new(__pyx_t_2, sizeof(double), PyBytes_AS_STRING(__pyx_t_3), (char *) "fortran", (char *) __pyx_v_pd);
if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
__Pyx_GOTREF(__pyx_t_1);
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
__pyx_t_4 = __Pyx_PyObject_to_MemoryviewSlice_ds_double(((PyObject *)__pyx_t_1));
if (unlikely(!__pyx_t_4.memview)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
__Pyx_DECREF(((PyObject *)__pyx_t_1)); __pyx_t_1 = 0;
__pyx_v_x = __pyx_t_4;
__pyx_t_4.memview = NULL;
__pyx_t_4.data = NULL;
Note the call to __Pyx_PyObject_to_MemoryviewSlice_ds_double in the generated code. So it seems that binding a memory view does require the GIL.
You should use a numpy array, as your cdef double[:] declaration gets wrapped by a Python object, and its use is restricted without the GIL. You can see this by trying to slice a double[:]:
def test():
    cdef double[:] asd
    with nogil:
        asd[:1]
Your output will be:
with nogil:
asd[:1]
^
------------------------------------------------------------
prueba.pyx:16:11: Slicing Python object not allowed without gil
Using a numpy array would compile; numpy uses the Python buffer protocol and is smoothly integrated with Cython (a Google Summer of Code project was funded for this). So no wrapping conflict arises inside the def:
import numpy as np
cdef double testA(double[:] x) nogil:
    return x[0]
cpdef test():
    xx = np.zeros(2, dtype = 'double')
    with nogil:
        a = testB(xx)
    print(a)
This will build your module with test() in it. But it crashes, and in an ugly way (at least on my PC):
Process Python segmentation fault (core dumped)
If I may insist on my (now deleted) previous answer: in my own experience, when dealing with Cython memoryviews and C arrays, passing pointers works just as one would expect in C. Most wrapping is avoided as well (you are writing code that passes exactly the addresses you want, which makes the wrapping unnecessary). This compiles and works as expected:
cdef double testB(double* x) nogil:
    return x[0]
def test():
    cdef double asd[2]
    asd[0] = 1
    asd[1] = 2
    with nogil:
        a = testB(asd)
    print(a)
And, after compiling:
In [5]: import prueba
In [6]: prueba.test()
1.0
Memoryviews are not, by themselves, Python objects, but they can be wrapped in one. I am not a proficient Cython programmer, so sometimes I get unexpected wrappings or code that remains at Python level when I supposed it would be at C. Trial and error got me to the pointer strategy.