Cython overhead on extension types in memoryview - cython

I am compiling a Cython module and checked this piece of code with the cython -a command:
cdef INT_t print_info(Charge[:] electrons):
    cdef INT_t i, index
    for i in range(electrons.shape[0]):
        index = electrons[i].particleindex
    return index
It turns out that
+ index = electrons[i].particleindex
__pyx_t_4 = __pyx_v_i;
__pyx_t_3 = (PyObject *) *((struct __pyx_obj_14particle_class_Charge * *) ( /* dim=0 */ (__pyx_v_electrons.data + __pyx_t_4 * __pyx_v_electrons.strides[0]) ));
__Pyx_INCREF((PyObject*)__pyx_t_3);
__pyx_t_5 = ((struct __pyx_obj_14particle_class_Charge *)__pyx_t_3)->particleindex;
__Pyx_DECREF(((PyObject *)__pyx_t_3)); __pyx_t_3 = 0;
__pyx_v_index = __pyx_t_5;
Charge is a cdef extension type and I am trying to use a memoryview buffer Charge[:] here. It seems that Cython calls some Python API in this case; in particular, __Pyx_INCREF((PyObject*)...) and __Pyx_DECREF(((PyObject *)...)) have been generated.
What causes this, and will it cause a significant slowdown? This is my first post in the forum; any comments or suggestions are greatly appreciated!
PS: Charge object is defined as
charge.pyx
cdef class Charge:
    def __cinit__(Charge self):
        self.particleindex = 0
        self.charge = 0
        self.mass = 0
        self.energy = 0
        self.on_electrode = False
charge.pxd
cdef class Charge:
    cdef INT_t particleindex
    cdef FLOAT_t charge
    cdef FLOAT_t mass
    cdef FLOAT_t energy
    cdef bint on_electrode

Cython will likely be happier with pythonic code. Rewrite your function:
cdef INT_t print_info(Charge[:] electrons):
    cdef INT_t i, index
    for electron in electrons:
        index = electron.particleindex
    return index
and try again.

It's not the memoryview, it's the extension type. Cython extension types are treated with the same reference-counting semantics as Python objects.
You can get and work with pointers to them which do not change refcounts, either with <void*> or with <PyObject*> (which you can cimport from cpython.ref), but the pointers obviously don't have any methods or attributes. The minute you cast back to the extension type, the INCREF/DECREF code reappears. Those instructions are pretty fast, though.
There was some talk on the mailing list about how non-refcounted references (i.e., with access to object data) to extension types might be a new feature in the future, but there seemed to be little enthusiasm for adding a feature that, realistically, is probably going to lead to a lot of horrifyingly buggy code of the "access violation" variety.
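The reference-counting semantics described in this answer can be observed from plain Python. The sketch below is an illustration only: Charge here is a plain-Python stand-in (an assumption, not the cdef class from the question), and sys.getrefcount is used to show that binding a temporary reference adds one to the count and dropping it removes one, which is what the generated __Pyx_INCREF/__Pyx_DECREF pair does around the attribute access.

```python
import sys


class Charge:
    """Plain-Python stand-in for the cdef class (illustration only)."""

    def __init__(self, particleindex=0):
        self.particleindex = particleindex


electrons = [Charge(i) for i in range(3)]

before = sys.getrefcount(electrons[0])
tmp = electrons[0]                      # taking a reference: the INCREF step
index = tmp.particleindex               # use the object through the reference
during = sys.getrefcount(electrons[0])
del tmp                                 # releasing it again: the DECREF step
after = sys.getrefcount(electrons[0])

print(during - before, after - before)
```

The absolute values reported by sys.getrefcount depend on the interpreter version, so only the differences are meaningful here.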

Related

How does one get a pointer to a Cython memoryview's data?

With the new Cython 3.0+ API, the usage of ndarray.data within Cython code is deprecated.
My question is how does one obtain a pointer to the underlying ndarray data then that is held by a memoryview? What is the proper syntax or methodology that abides by the NPY_1_7 C-API and Cython's new 3.0+ API?
My code used to look something like this:
arr = np.zeros((10,))
# arr_data is a pointer to the underlying data
cdef double* arr_data = <double*>arr.data
# I want a pointer because it can be passed efficiently to other functions
do_something_to_arr_data_in_cythonorcpp(arr_data)

cdef do_something_to_arr_data_in_cythonorcpp(double* arr_data):
    for idx in range(10):
        arr_data[idx] += idx
That's easy, just take the address of the first element of the memory view:
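# dtype=np.intc matches the C int element type of the memoryview;
# the default integer dtype of np.arange is usually a 64-bit type and would be rejected
cdef int[::1] mm_view_1 = np.arange(100, dtype=np.intc)
cdef int* p_int_1 = &mm_view_1[0]
cdef int[:, ::1] mm_view_2 = np.arange(100, dtype=np.intc).reshape(10, 10)
cdef int* p_int_2 = &mm_view_2[0, 0]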
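The step of 1 in the memoryview declarations int[::1] and int[:, ::1] makes sure the underlying array is contiguous in memory along the corresponding dimension, so the pointer to the first element can be used to walk the whole buffer.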
You can get more information in the doc https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html#pass-data-from-a-c-function-via-pointer
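The same idea (the address of the first element of a contiguous buffer is a pointer to all of its data) can be tried out without compiling anything. The following is only a plain-Python analogue built on the standard library's array and ctypes modules, not the Cython syntax itself; the helper name add_index is made up for the illustration.

```python
import ctypes
from array import array

buf = array("d", [0.0] * 10)            # contiguous block of C doubles
view = memoryview(buf)
assert view.c_contiguous                # same guarantee that double[::1] declares

# Share the buffer with ctypes and obtain a double* to its first element.
c_arr = (ctypes.c_double * len(buf)).from_buffer(buf)
ptr = ctypes.cast(c_arr, ctypes.POINTER(ctypes.c_double))


def add_index(data, n):
    """Hypothetical stand-in for a C function taking (double*, length)."""
    for idx in range(n):
        data[idx] += idx


add_index(ptr, len(buf))
result = list(buf)                      # the writes went through to buf
first_element_address = ctypes.addressof(c_arr)
```

Because the pointer carries no length or lifetime information, the owning object (buf here, the NumPy array in the Cython version) has to stay alive for as long as the pointer is in use.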

using special function such as __add__ in cython cdef classes

I want to create a Cython object that has convenient operations such as addition, multiplication, and comparisons. But when I compile such classes, they all seem to have a lot of Python overhead.
A simple example:
%%cython -a
cdef class Pair:
    cdef public:
        int a
        int b

    def __init__(self, int a, int b):
        self.a = a
        self.b = b

    def __add__(self, Pair other):
        return Pair(self.a + other.a, self.b + other.b)

p1 = Pair(1, 2)
p2 = Pair(3, 4)
p3 = p1+p2
print(p3.a, p3.b)
But I end up getting quite large readouts from the annotated compiler
It seems like the __add__ function is converting objects from python floats to cython doubles and doing a bunch of type checking. Am I doing something wrong?
There are likely a couple of issues:
I'm assuming that you're using Cython 0.29.x (and not the newer Cython 3 alpha). See https://cython.readthedocs.io/en/stable/src/userguide/special_methods.html#arithmetic-methods, which says:
"This means that you can’t rely on the first parameter of these methods being “self” or being the right type, and you should test the types of both operands before deciding what to do."
Cython is therefore likely treating self as untyped and thus accessing a and b as Python attributes.
The Cython 3 alpha treats special methods differently (see https://cython.readthedocs.io/en/latest/src/userguide/special_methods.html#arithmetic-methods) so you could also consider upgrading to that.
Although the call to __init__ has C-typed arguments, it's still a Python call, so you can't avoid boxing and unboxing the arguments to Python ints. You could avoid this call and do something like:
cdef Pair res = Pair.__new__(Pair)  # allocates the object without calling __init__
res.a = ... # direct assignment to attribute
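Both suggestions (check the operand types yourself, and build the result with __new__ instead of going through __init__) can be sketched in plain Python. This is an untyped illustration of the pattern rather than the compiled cdef class; in Cython 0.29.x the isinstance check on self matters because the operands may arrive swapped, while in plain Python it is merely harmless.

```python
class Pair:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __add__(self, other):
        # Test both operands before touching their attributes.
        if not isinstance(self, Pair) or not isinstance(other, Pair):
            return NotImplemented
        res = Pair.__new__(Pair)        # allocate without calling __init__
        res.a = self.a + other.a        # direct attribute assignment
        res.b = self.b + other.b
        return res


p3 = Pair(1, 2) + Pair(3, 4)
print(p3.a, p3.b)
```

Returning NotImplemented lets Python try the other operand's reflected method or raise TypeError, instead of failing with an attribute error inside __add__.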

Reference counting of memoryviews with nogil

I don't quite understand how reference counting is done with memoryviews in large/longer nogil sections. Let's assume basically all my code is nogil, except for the creation of a numpy-array-to-memoryview deep down. The memoryview is returned and used upwards.
A fairly simple example would be
import numpy as np

cdef:
    double[::1] mv

cdef double[::1] someFun(int nn) nogil:
    cdef:
        double[::1] mvb
    with gil:
        mvb = np.arange(nn, dtype=np.double)
    return mvb

with nogil:
    mv = someFun(30)
    # Here could be MUCH more "nogil" code
    # How is memory management/reference counting done here?
I assume when someFun() returns the memoryview the refcount of the numpy array should still be at 1.
How does Cython handle the refcounting afterwards? I mean, it's not allowed to change the refcount even if the memoryview/array is dereferenced, right? And how would it know to dereference the memoryview if there were several layers of nogil code above and, unlike in someFun(), the memoryview isn't returned upwards?
EDIT: So I figured out a rather crude way to do some more testing. My code now looks like this.
import numpy as np

cdef extern from "stdio.h":
    int getchar() nogil
    int printf(const char* formatt, ...) nogil

cdef:
    double[::1] mv, mv2 = np.ones(3)
    int ii, leng = 140000000

cdef double[::1] someFun(int nn) nogil:
    cdef:
        double[::1] mvb
    with gil:
        mvb = np.ones(nn, dtype=np.double)
    return mvb

with nogil:
    mv = someFun(leng)
    printf("1st stop")
    getchar()
    mv = mv2
    printf("2nd stop")
    getchar()
The interesting part for me is that at the 1st stop the array/memoryview mv is still allocated, but once I dereference it (by rebinding mv), the memory has been freed by the 2nd stop. I only checked memory usage with htop (that's why the array is so large); there is probably a better way.
Obviously that free/refcounting behavior is what I want to happen, but it's weird that it happens while the code doesn't hold the GIL. Maybe memoryviews are not completely nogil?
Can someone explain whether this is reliable behavior?
The reference count of the memoryview is updated inside the nogil block the same way your nogil function someFun creates the array: it acquires the GIL for the update.
The line
with nogil:
    mv = someFun(leng)
is translated to the following C-code:
__pyx_t_3 = __pyx_f_3foo_someFun(__pyx_v_3foo_leng); if (unlikely(!__pyx_t_3.memview)) __PYX_ERR(0, 18, __pyx_L3_error)
__PYX_XDEC_MEMVIEW(&__pyx_v_3foo_mv, 0);
__pyx_v_3foo_mv = __pyx_t_3;
__pyx_t_3.memview = NULL;
__pyx_t_3.data = NULL;
In order to bind to a new value, the reference count for the old value must be updated, which happens in __PYX_XDEC_MEMVIEW. Its implementation in Cython's utility code reads:
static CYTHON_INLINE void __Pyx_XDEC_MEMVIEW({{memviewslice_name}} *memslice,
                                             int have_gil, int lineno) {
    ...
    } else if (likely(old_acquisition_count == 1)) {
        // Last slice => discard owned Python reference to memoryview object.
        if (have_gil) {
            Py_CLEAR(memslice->memview);
        } else {
            PyGILState_STATE _gilstate = PyGILState_Ensure();
            Py_CLEAR(memslice->memview);
            PyGILState_Release(_gilstate);
        }
    ...
}
This means that if we don't hold the GIL (__Pyx_XDEC_MEMVIEW called with second argument = 0), it will be acquired to ensure that the reference counting is done properly.
The consequence is that rebinding a memoryview is not cheap, because it may need to acquire the GIL, and thus should be avoided in tight nogil loops.
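The lifetime behaviour seen in the experiment (the buffer stays alive while a view refers to it and is released once the last view is rebound) can be reproduced with Python's built-in memoryview. This is only an analogue of what the Cython slice does, assuming CPython's reference counting; Buffer and make_view are names made up for the sketch, and the bytearray subclass exists only so that a weak reference can observe the deallocation.

```python
import gc
import weakref


class Buffer(bytearray):
    """bytearray subclass so the owning object can be weakly referenced."""


def make_view(n):
    # The only strong reference to the Buffer ends up inside the view.
    return memoryview(Buffer(n))


mv = make_view(16)
owner = weakref.ref(mv.obj)

alive_while_viewed = owner() is not None    # "1st stop": still allocated

mv = memoryview(Buffer(4))                  # rebinding releases the old view
gc.collect()                                # not needed on CPython, harmless elsewhere
freed_after_rebind = owner() is None        # "2nd stop": memory released

print(alive_while_viewed, freed_after_rebind)
```

In the Cython version the release step is the part that needs the GIL, which is why the rebinding should be kept out of hot nogil loops.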

How to use shared_ptr and make_shared with arrays?

I want to use a C++ shared_ptr as a replacement for raw C pointers. As a simple example the following code seems to work as intended:
from libcpp.memory cimport shared_ptr, allocator
cdef shared_ptr[double] spd
cdef allocator[double] allo
spd.reset(allo.allocate(13))
The size is chosen as 13 here, but in general it is not known at compile time.
I'm not sure if this is correct, but I haven't had any errors (no memory leaks and segfaults yet). I'm curious if there is a more optimal solution with make_shared.
But I can't use C++11 arrays because Cython doesn't allow literals as templates, e.g. something like
cdef shared_ptr[array[double]] spd = make_shared[array[double,13]]()
and "normal" arrays which are supposed to work with C++20 compiler (e.g. gcc 10) are causing problems in one way or another:
# Cython error "Expected an identifier or literal"
cdef shared_ptr[double[]] spd = make_shared[double[]](3)
# Can't get ptr to work
ctypedef double[] darr
cdef shared_ptr[darr] spd = make_shared[darr](13)
cdef double* ptr = spd.get() # or spd.get()[0] or <double*> spd.get()[0] or ...
Is the allocator solution the correct and best one or is there another way how to do it?
Here is what I'm going with
from libc.stdlib cimport free
from libcpp.memory cimport shared_ptr

cdef extern from *:
    """
    #include <memory>

    // Deleter that remembers the array length and forwards to a user-supplied free function.
    template <typename T>
    struct Ptr_deleter{
        size_t nn;
        void (*free_ptr)(T*, size_t);
        Ptr_deleter(size_t nIn, void (*free_ptrIn)(T*, size_t)){
            this->nn = nIn;
            this->free_ptr = free_ptrIn;
        }
        void operator()(T* ptr){
            free_ptr(ptr, nn);
        }
    };
    // Wrap an existing raw pointer in a shared_ptr that uses the custom deleter.
    template <typename T>
    std::shared_ptr<T> ptr_to_sptr (T* ptr, size_t nn, void (*free_ptr)(T*, size_t)) {
        Ptr_deleter<T> dltr(nn, free_ptr);
        std::shared_ptr<T> sp (ptr, dltr);
        return sp;
    }
    """
    shared_ptr[double] d_ptr_to_sptr "ptr_to_sptr"(double* ptr, size_t nn, void (*free_ptr)(double*, size_t) nogil) nogil

cdef void free_d_ptr(double* ptr, size_t nn) nogil:
    free(ptr)

# nullCheckMalloc is my own malloc wrapper (not shown here)
cdef shared_ptr[double] sp_d_empty(size_t nn) nogil:
    return d_ptr_to_sptr(<double*> nullCheckMalloc(nn*sizeof(double)), nn, &free_d_ptr)
My understanding is that the "right" way to handle malloced arrays is to use a custom deleter like I did. I personally prefer sticking with somewhat-raw C pointers (double* instead of double[] or something), since it's more natural in Cython and my projects.
I think it's reasonably easy to see how to change free_ptr for more complicated data types. For simple data types it could be done in fewer lines and in a less convoluted way, but I wanted to have the same base.
I like my solution in the regard that I can just "wrap" existing Cython/C code raw pointers in a shared_ptr.
When working with C++ (especially newer standards like C++20) I think verbatim code is pretty often necessary. But I've intentionally defined free_d_ptr in Cython, so it's easy to use existing Cython code to handle the actual work done to free/clear/whatever the array.
I didn't get C++11 std::arrays to work, and it's apparently not "properly" possible in Cython in general (see Interfacing C++11 array with Cython).
I didn't get double[] or similar to work either (is possible in C++20), but with verbatim C++ code I think this should be doable in Cython. I prefer more C-like pointers/arrays anyway as I said.

issue using deepcopy function for cython classes

I've been playing with Cython recently for the speed-ups, but when I tried to use copy.deepcopy() an error occurred. Here is the code:
from copy import deepcopy

cdef class cy_child:
    cdef public:
        int move[2]
        int Q
        int N

    def __init__(self, move):
        self.move = move
        self.Q = 0
        self.N = 0

a = cy_child((1,2))
b = deepcopy(a)
This is the error:
can't pickle _cython_magic_001970156a2636e3189b2b84ebe80443.cy_child objects
How can I solve the problem for this code?
As hpaulj says in the comments, deepcopy looks to use pickle by default to do its work. Cython cdef classes didn't used to be pickleable. In recent versions of Cython they are where possible (see also http://blog.behnel.de/posts/whats-new-in-cython-026.html) but pickling the array looks to be a problem (and even without the array I didn't get it to work).
The solution is to implement the relevant functions yourself. I've done __deepcopy__ since it's simple, but alternatively you could implement the pickle protocol:
def __deepcopy__(self, memo_dictionary):
    res = cy_child(self.move)
    res.Q = self.Q
    res.N = self.N
    return res
I suspect that you won't need to do that in the future as Cython improves their pickle implementation.
A note on memo_dictionary: Suppose you have
a = [None]
b = [a]
a[0] = b
# i.e. a contains a link to b and b contains a link to a
c = deepcopy(a)
memo_dictionary is used by deepcopy to keep a note of what it's already copied so that it doesn't loop forever. You don't need to do much with it yourself. However, if your cdef class contains a Python object (including another cdef class) you should copy it like this:
cdef class C:
    cdef object o

    def __deepcopy__(self, memo_dictionary):
        # ...
        res.o = deepcopy(self.o, memo_dictionary)
        # ...
(i.e. make sure it gets passed on to further calls of deepcopy)
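To see the memo dictionary at work, here is a small plain-Python sketch of a hand-written __deepcopy__ on objects that refer to each other. Node and its attributes are made up for the illustration. The one addition to the pattern above is that the new object is registered in the memo (keyed by id(self), the key the copy module itself uses) before the recursion starts, so that the cycle resolves to the copy instead of recursing forever.

```python
from copy import deepcopy


class Node:
    """Hypothetical class holding a reference to another Node."""

    def __init__(self, name):
        self.name = name
        self.partner = None

    def __deepcopy__(self, memo):
        res = Node(self.name)
        memo[id(self)] = res                        # register before recursing
        res.partner = deepcopy(self.partner, memo)  # pass the memo on
        return res


a = Node("a")
b = Node("b")
a.partner = b
b.partner = a            # a refers to b and b refers back to a

c = deepcopy(a)
print(c.partner.name, c.partner.partner is c)
```

Without passing memo on to the nested deepcopy call, the copy of b would start a fresh copy of a and the recursion would not terminate for cyclic structures like this one.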