How do I read a C char array into a python bytearray with cython? - cython

I have an array with bytes and its size:
cdef char *bp
cdef size_t size
How do I read the array into a Python bytearray (or another appropriate structure that can easily be pickled)?

Three reasonably straightforward ways to do it:
Use the appropriate C API function as I suggested in the comments:
from cpython.bytes cimport PyBytes_FromStringAndSize
output = PyBytes_FromStringAndSize(bp,size)
This makes a copy, which may be an issue with a sufficiently large string. For Python 2 the functions are similarly named but with PyString rather than PyBytes.
View the char pointer with a typed memoryview, get a numpy array from that:
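import numpy as np
cdef char[::1] mview = <char[:size:1]>(bp)
output = np.asarray(mview)
This shouldn't make a copy, so could be more efficient if large. The array only borrows the memory, so bp must stay valid for as long as output is in use.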
Do the copy manually:
output = bytearray(size)
for i in range(size):
    # cast so that a signed char above 127 is not seen as a negative value
    output[i] = <unsigned char>bp[i]
(this could be somewhat accelerated with Cython if needed)
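The difference between the copying and non-copying options can be seen in plain Python. This is only a sketch by analogy: the bytearray named source stands in for the C buffer (it is not part of the original code), bytes(...) plays the role of PyBytes_FromStringAndSize, and memoryview(...) plays the role of the typed memoryview.

```python
# `source` is an illustrative stand-in for the C buffer behind `bp`.
source = bytearray(b"abcd")

copied = bytes(source)      # independent copy of the data
view = memoryview(source)   # no copy: shares memory with `source`

source[0] = ord("X")        # the underlying buffer changes afterwards

copy_after = copied         # unaffected by the change
view_after = bytes(view)    # reflects the change
```

The view follows the buffer, so in the Cython version bp has to stay alive for as long as the array built from the memoryview is used; the copy has no such requirement.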
The issue I think you're having with ctypes (based on the subsequent question you linked to in the comments) is that you cannot pass a C pointer to the ctypes Python interface. If you try to pass a char* to a Python function, Cython will try to convert it to a Python string. That conversion stops at the first 0 byte (which is why you need size). Therefore you aren't passing ctypes a char*, you're passing it a truncated, meaningless Python string.
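The truncation at the first 0 byte is easy to reproduce from the Python side with ctypes itself. The sketch below is an illustration, not the asker's code: read_char_array is a hypothetical helper name, and the buffer is created in Python rather than coming from C.

```python
import ctypes
import pickle


def read_char_array(buf, size):
    """Copy exactly `size` bytes from a ctypes char buffer into a bytearray."""
    return bytearray(ctypes.string_at(buf, size))


raw = b"ab\x00cd"                        # data with an embedded 0 byte
buf = ctypes.create_string_buffer(raw)   # C char array holding raw plus a trailing NUL

# Treating the buffer as a NUL-terminated C string stops at the first 0 byte.
as_c_string = ctypes.cast(buf, ctypes.c_char_p).value

# Reading with an explicit size keeps every byte.
as_sized = read_char_array(buf, len(raw))

# A bytearray pickles without any extra work.
restored = pickle.loads(pickle.dumps(as_sized))
```

Here as_c_string ends up shorter than the data, while as_sized and restored hold all five bytes.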

Related

How to use C complex numbers in 'language=c++' mode?

Most of my library is written with Cython in the "normal" C mode. Up to now I rarely needed any C++ functionality, but always assumed (and sometimes did!) I could just switch to C++-mode for one module if I wanted to.
So I have 10+ modules in C mode and 1 module in C++ mode.
The problem is how Cython seems to handle complex number definitions. In C mode I think it assumes C complex numbers, and in C++ mode I think it assumes C++ complex numbers. I've read they might even be the same by now, but in any case the build complains that they are not:
openChargeState/utility/cheb.cpp:2895:35: error: cannot convert ‘__pyx_t_double_complex {aka std::complex<double>}’ to ‘__complex__ double’ for argument ‘1’ to ‘double cabs(__complex__ double)’
__pyx_t_5 = ((cabs(__pyx_v_num) == INFINITY) != 0);
In that case I'm trying to use cabs defined in a C-mode module, and calling it from the C++-mode module.
I know there are some obvious workarounds (right now I'm just not using C++-mode; I'd like to use vectors and instead use the slower Python lists for now).
Is there maybe a way to tell my C++-mode module to use C complex numbers, or tell it that they are the same? If there is I couldn't find a working way to ctypedef C complex numbers in my C++-mode module... Or are there any other solutions?
EDIT: Comments of DavidW and ead suggested some reasonable things. First the minimum working example.
setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
from Cython.Build import cythonize
extra_compile_args=['-O3']
compdir = {'language_level' : '3'}
extensions = cythonize([
        Extension("cmod", ["cmod.pyx"]),
        Extension("cppmod", ["cppmod.pyx"], language='c++')
    ],
    compiler_directives = compdir
)
setup(cmdclass = {'build_ext': build_ext},
      ext_modules = extensions
)
import cppmod
cmod.pyx
cdef double complex c_complex_fun(double complex xx):
    return xx**2
cmod.pxd
cdef double complex c_complex_fun(double complex xx)
cdef extern from "complex.h":
    double cabs(double complex zz) nogil
cppmod.pyx
cimport cmod
cdef double complex cpp_complex_fun(double complex xx):
    return cmod.c_complex_fun(xx)*abs(xx) # cmod.cabs(xx) doesn't work here
print(cpp_complex_fun(5.5))
Then just compile with python3 setup.py build_ext --inplace.
Now the interesting part is that (as written in the code) only "indirectly" imported c functions have a problem, in my case cabs. So the suggestion to just use abs actually does help, but I still don't understand the underlying logic. I hope I don't encounter this in another problem. I'm leaving the question up for now. Maybe somebody knows what's happening.
Your problem has nothing to do with the fact that one module is compiled as a C extension and the other as a C++ extension. One can easily reproduce the issue in a C++ extension alone:
%%cython -+
cdef extern from "complex.h":
    double cabs(double complex zz) nogil
def cpp_complex_fun(double complex xx):
    return cabs(xx)
results in your error-message:
error: cannot convert __pyx_t_double_complex {aka
std::complex<double>} to __complex__ double for argument 1 to
double cabs(__complex__ double)
The problem is that the complex numbers are ... well, complex. Cython's strategy for handling complex numbers is to use an available implementation from C/C++ and, if none is found, to use a hand-written fallback:
#if !defined(CYTHON_CCOMPLEX)
  #if defined(__cplusplus)
    #define CYTHON_CCOMPLEX 1
  #elif defined(_Complex_I)
    #define CYTHON_CCOMPLEX 1
  #else
    #define CYTHON_CCOMPLEX 0
  #endif
#endif
....
#if CYTHON_CCOMPLEX
  #ifdef __cplusplus
    typedef ::std::complex< double > __pyx_t_double_complex;
  #else
    typedef double _Complex __pyx_t_double_complex;
  #endif
#else
    typedef struct { double real, imag; } __pyx_t_double_complex;
#endif
In the case of a C++ extension, Cython's double complex is translated to std::complex<double>, which cannot be passed to cabs(double complex z) because std::complex<double> isn't a C double complex.
So actually, it is your "fault": you lied to Cython and told it that cabs has the signature double cabs(std::complex<double> z), but that was not enough to fool the C++ compiler.
That means that in a C++ module you could use std::abs(std::complex<double>), or just Cython's/Python's abs, which is automatically translated to the right function (this is, however, not possible for every standard function).
In the case of the C extension, you have included complex.h as a so-called "early include" with cdef extern from "complex.h". As a result, _Complex_I is already defined when the defines above are evaluated, Cython's complex becomes an alias for double complex, and cabs can be used.
Probably the right thing for Cython would be to use the fallback by default and let the user choose the desired implementation (double complex or std::complex<double>) explicitly.
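The failing line in the question was the check cabs(num) == INFINITY. Written with the built-in abs, the same check does not depend on which complex type the backend picks. The snippet below is plain Python and only shows the behaviour of abs on complex values; is_infinite_magnitude is a hypothetical name, and in a .pyx file with a typed double complex argument the translation to a C or C++ function is left to Cython, as described above.

```python
import math


def is_infinite_magnitude(num):
    """True if the magnitude of the complex number is infinite."""
    return abs(num) == math.inf


finite_case = is_infinite_magnitude(3 + 4j)                    # magnitude is 5.0
infinite_case = is_infinite_magnitude(complex(math.inf, 1.0))  # magnitude is inf
```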

Cython: declare a PyCapsule_Destructor in pyx file

I don't know Python and am trying to wrap an existing C library that provides 200 init functions for some objects and 200 destructors, with the help of PyCapsule. My idea is to return a PyCapsule from the init functions' wrappers and forget about the destructors, which should then be called automatically.
According to documentation PyCapsule_New() accepts:
typedef void (*PyCapsule_Destructor)(PyObject *);
while C-library has destructors in a form of:
int foo(void*);
I'm trying to use cdef in the .pyx file to generate a C function that wraps the library destructor, hides its return type, and passes it the pointer obtained with PyCapsule_GetPointer. (The .pyx file is generated programmatically for the 200 functions.)
After a few experiments I ended up with the following .pyx file:
from cpython.ref cimport PyObject
from cpython.pycapsule cimport PyCapsule_New, PyCapsule_IsValid, PyCapsule_GetPointer
cdef void stateFree( PyObject *capsule ):
    cdef:
        void * _state
    # some code with PyCapsule_GetPointer
def stateInit():
    cdef:
        void * _state
    return PyCapsule_New(_state, "T", stateFree)
And when I'm trying to compile it with cython I'm getting:
Cannot assign type 'void (PyObject *)' to 'PyCapsule_Destructor'
using PyCapsule_New(_state, "T", &stateFree) doesn't help.
Any idea what is wrong?
UPD:
Ok, I think I found a solution. At least it compiles. Will see if it works. I'll bold the places I think I made a mistake:
from cpython.ref cimport PyObject
from cpython.pycapsule cimport PyCapsule_New, PyCapsule_IsValid, PyCapsule_GetPointer, PyCapsule_Destructor
cpdef void stateFree( object capsule ):
    cdef:
        void* _state
    _state = PyCapsule_GetPointer(capsule, "T")
    print('destroyed')
def stateInit():
    cdef:
        int _state = 1
    print ("initialized")
    return PyCapsule_New(_state, "T", < PyCapsule_Destructor >stateFree)
The issue is that Cython distinguishes between
object - a Python object that it knows about and handles the reference-counting for, and
PyObject*, which as far as it's concerned is a mystery type that it knows basically nothing about except that it's a pointer to a struct.
This is despite the fact that the C code generated for Cython's object ends up written in terms of PyObject*.
The signature used by the Cython cimport is ctypedef void (*PyCapsule_Destructor)(object o) (which isn't quite the same as the C definition). Therefore, define the destructor as
cdef void stateFree( object capsule ):
Practically in this case the distinction makes no difference. It matters more in cases where a function steals a reference or returns a borrowed reference. Here capsule has the same reference count on both the input and output of the function whether Cython manages it or not.
In terms of your edited-in solution:
cpdef is wrong for stateFree. Use cdef since it is not a function that should be exposed in a Python interface (and if you use cpdef it isn't obvious whether the Python or C version is passed as a function pointer).
You shouldn't need the cast to PyCapsule_Destructor and should avoid it because casts can easily hide bugs.
Can I just take a moment to express my general dislike for PyCapsule (it's occasionally useful for passing an opaque type through Python code without touching it, but for anything more I think it's usually better to wrap it properly in a cdef class). It's possible you've thought about it and it is the right tool for the job, but I'm putting this warning in to try to discourage people in the future who might be trying to use it on a more "copy-and-paste" basis.
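To make the "wrap it properly" suggestion a bit more concrete, here is the ownership pattern in plain Python. Everything in it is hypothetical: lib_state_init and lib_state_free stand in for one init/destructor pair of the C library, and the handle is just an integer. In Cython the same shape would be a cdef class that stores the pointer and calls the library destructor from __dealloc__.

```python
import weakref

freed = []  # handles released so far, recorded by the stand-in destructor


def lib_state_init():
    """Hypothetical stand-in for a C init function; returns an opaque handle."""
    return 1234


def lib_state_free(handle):
    """Hypothetical stand-in for `int foo(void*)`; returns a status code."""
    freed.append(handle)
    return 0


class State:
    """Owns one handle and releases it exactly once."""

    def __init__(self):
        self._handle = lib_state_init()
        # The finalizer runs at most once: on close() or when the object dies.
        self._finalizer = weakref.finalize(self, lib_state_free, self._handle)

    def close(self):
        # The status code returned by the destructor is deliberately dropped.
        self._finalizer()


state = State()
state.close()
state.close()  # the second call does nothing
```

The class hides the destructor's return value in one place and ties the release to the lifetime of a real Python object, which is what the capsule destructor was meant to achieve.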

Typing a list of lists of strings in Cython

I have a Cython function that's receiving a list of lists of strings:
cdef cbuild(char*** corpus, int state):
    # corpus is a list of lists of strings
    cdef char** run
    for run in corpus:
        # run is a list of strings
        ...
I'd like to be able to type the corpus in order to elicit speedups from Cython. The problem is, it's a pretty complex type, and char*** doesn't seem to work (and thus I have no way of knowing if char** for run works).
This function is the bottleneck for my Python application, which is why I'm rewriting it in Cython. What can I do to get the most out of Cython by typing these complex objects? Is there some other way I can organise my data to avoid these problems?
It's easy with C++:
from libcpp.vector cimport vector
from libcpp.string cimport string
cdef cbuild(vector[vector[string]] corpus, int state):
    cdef vector[string] run
    cdef string word
    for run in corpus:
        for word in run:
            ...
Just make sure that language="c++" is passed to the Cython compiler (e.g. as a kwarg to setuptools.Extension).
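One practical detail on the calling side: as far as I know, Cython converts Python bytes objects to std::string automatically, but under Python 3 it does not accept str unless the c_string_type and c_string_encoding directives are set. A small helper like the one below (encode_corpus is a hypothetical name, and UTF-8 is an assumed encoding) can prepare the nested lists before they are handed to a function taking vector[vector[string]].

```python
def encode_corpus(corpus, encoding="utf-8"):
    """Turn a list of lists of str into a list of lists of bytes."""
    return [[word.encode(encoding) for word in run] for run in corpus]


corpus = [["hello", "world"], ["caf\u00e9"]]
encoded = encode_corpus(corpus)
```

The conversion to C++ containers still copies every string once, so the speedup comes from the typed loops inside the function, not from the call itself.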

non-member operator in Cython

I'm currently doing a Cython wrapper for an existing C++ library. I have an overloaded non-member operator in C++ like
Data operator+(Data const& a, Data const& b)
And in the pxd file describing the header, I wrote
cdef extern from 'blabla.h':
    Data operator+(const Data&, const Data&)
Now how can I use this operator+ in another pyx file?
For very simple cases, like in your example, you can lie to Cython and tell it that the operator is a member function:
cdef extern from 'blabla.h':
    cdef cppclass Data:
        # the rest of the data definitions
        Data operator+(const Data&)
It only uses this information to know that it can translate the code a+b (where a and b are Data objects) to __pyx_v_a + __pyx_v_b and let the C++ compiler do the rest (which it knows how to do because of the include of "blabla.h"). Therefore, the distinction between member and non-member is irrelevant.
However: one of the main reasons to use nonmember operators is to allow things like
Data operator+(int a, const Data& b);
You can make this work, but it's very slightly messy. In your pxd file do
cdef extern from 'blabla.h':
    Data operator+(int, const Data&) # i.e. do nothing special
In your pyx file
from my_pxd_file cimport * # works fine
## but this isn't accepted unfortunately:
from my_pxd_file cimport operator+
If you wanted to avoid too much namespace pollution from doing cimport *, you could probably create a pxd file that only contains operators and not the class definitions (I haven't tested this though).
In conclusion - two methods depending on how complicated your use-case is...

When and how does cython do boundscheck?

C doesn't do bounds checks. So how does Cython do the check if it compiles to C?
%%cython --annotate
cimport cython
@cython.boundscheck(True)
cpdef myf():
    cdef double pd[8]
    for i in range(100):
        pd[i] = 0
        print pd[i]
The above code compiles to the same C code no matter whether I set True or False for boundscheck. And if I run myf() there are no warnings (it happens to not crash...).
Update
So Cython doesn't do bounds checks on C arrays anyway.
http://docs.cython.org/src/reference/compilation.html#compiler-directives
"Cython is free to assume that indexing operations ([]-operator) in the code will not cause any IndexErrors to be raised. Lists, tuples, and strings are affected..."
I think the point is that, in your code, a C double array doesn't store its length anywhere, so it's impossible for Cython to do any useful checks (except in your very trivial example). However, a built-in Python type that can raise IndexError should be different (I'd assume numpy arrays, Python arrays and Cython memoryviews should also be affected, since they all give Cython a way to tell when an index has gone off the end).
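The distinction is visible from plain Python: every container below carries its own length, so indexing one past the end raises IndexError, which is the check that boundscheck switches on or off. This is just an illustration; first_out_of_range is a hypothetical helper, and a C array has no counterpart to len() that it could consult.

```python
import array


def first_out_of_range(container):
    """Return the name of the error raised by indexing one past the end."""
    try:
        container[len(container)]
    except IndexError as exc:
        return type(exc).__name__
    return None


containers = [
    [0.0] * 8,                     # list
    (0.0,) * 8,                    # tuple
    bytearray(8),                  # bytearray
    array.array("d", [0.0] * 8),   # Python array of doubles
    memoryview(bytearray(8)),      # memoryview over a buffer
]
results = [first_out_of_range(c) for c in containers]
```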