Why are libcpp.vector so slow? - cython

I'm making a few experimentations using Cython to determine whether I should use it to speed up some parts of my Django modules.
Since I'm using a lot of strings and lists of strings, I tried this:
from libcpp.string cimport string
from libcpp.vector cimport vector
def cython_test_string():
cdef unsigned short int i = 0
cdef string s
# cdef vector[string] ls
for i in range(10000):
s.append('a')
# ls.push_back(s)
def python_test_string():
s = ''
# ls = []
for i in range(10000):
s += 'a'
# ls.append(s)
Of course, C++ strings are faster than Python objects, so we get these results (for 100 iterations):
Cython : 0.0110609531403 seconds
Python : 0.193608045578 seconds
But when we uncomment the four lines that deal with vector and list, we get this:
Cython : 2.80126094818 seconds
Python : 2.13021802902 seconds
Am I missing something, or are vectors of strings extremely slow?

Related

Cythonize: check if word in list of strings is a substring of another string

I want to iterate over a list of input words list_words and check if any belongs to an input string.
I tried to cythonize the code but when I annotate it I see almost all of it in yellow, suggesting python interactions.
Not sure how I could speedup this:
cpdef cy_check_any_word_is_substring(list_words, string):
cdef unicode w
cdef unicode s_lowered = string.lower()
for w in list_words:
if w in s_lowered:
return True
return False
Example
# all words in list_words are lower cased
list_words = ['cat', 'dog', 'eat', 'seat']
input_string = 'The animal saw the Dog and started to make noises'
# should return true
cy_check_any_word_is_substring(list_words, input_string)
Note I want to make the code work independently if characters are capitalized or not (that is why I do string.lower()), I assume the input list of words is already lowered.
Update
I wonder if a solution that uses C++ could be faster.
I don't know C++ though, I tried
from libcpp.vector cimport vector
from libcpp.string cimport string
cpdef cy_check_any_word_is_substring(vector[string] list_words,string string):
s_lowered = string.lower()
for w in list_words:
if w in s_lowered:
return True
return False
But it produces the error
Invalid types for 'in' (string, Python object)
Update 2
I tried the following to avoid the error presented in the previous section update.
from libcpp.vector cimport vector
from libcpp.string cimport string,npos
cdef bint cy_check_w_substring(string s_lowered, vector[string] list_words):
cdef string w
for w in list_words:
if s_lowered.find(w) !=npos:
return True
return False
cpdef cy3_check_any_word_is_substring(words_bytes, input_string):
cdef bint result = False
s_lowered = input_string.lower()
result = cy_check_w_substring(bytes(s_lowered, 'utf8'), words_bytes)
return result
This can be used changing the original list of words as a list of bytes.
# all words in list_words are lower cased
list_words = ['cat', 'dog', 'eat', 'seat']
list_words_bytes = [bytes(w,'utf8') for w in list_words]
input_string = 'The animal saw the Dog and started to make noises'
# should return true
cy3_check_any_word_is_substring(list_words_bytes, input_string)
Nevertheless this is much slower
%%timeit
cy3_check_any_word_is_substring(list_words_bytes, input_string)
#1.01 µs ± 3.16 ns per loop
%%timeit
cy_check_any_word_is_substring(list_words, input_string)
#190 ns ± 0.773 ns per loop
Note cy3_check_any_word_is_substring internally casts s_lowered as bytes but this already takes 145 ns which is almost the cost of cy_check_any_word_is_substring which makes this approach clearly not viable.
%%timeit
bytes(input_string, 'utf8')
#145 ns ± 0.55 ns per loop
The basic problem with the C++ solution is that if you pass it a Python iterable there's a hidden type conversion. So it has to iterate through the entire list and then convert every string to a C++ string. For this reason I doubt it'll give you much benefit.
If you can generate the data as a C++ vector without the type conversion then it may work better. For this you should use a cdef function instead of a cpdef function (I rarely like cpdef functions because they're usually the worst of both worlds).
The specific problems you have:
The C++ string class does not have a .lower() function, so the line s_lowered = string.lower() is implicitly converting it back to a Python bytes then calling .lower() on that. You'll have to implement .lower yourself (or convert to the C++ string after calling .lower on the Python object).
w in s_lowered isn't implemented for C++ strings. You want s_lowered.find(w) != npos (where npos is cimported from libcpp.string).

Need to store "untyped" memory view and cast to typed quickly. Is there an reinterpret_cast equivalent in Cython?

Actually I have a working version of what I need, but it is awfully slow:
%%cython -a
cimport numpy as cnp
import numpy as np # for np.empty
from time import time
cdef packed struct Bar:
cnp.int64_t dt
double open
double high
double low
double close
cnp.int64_t volume
bar_dtype = np.dtype([('dt', 'i8'), ('open', 'f8'), ('high', 'f8'), ('low', 'f8'), ('close', 'f8'), ('volume', 'i8')])
cpdef f(Bar bar):
cdef cnp.ndarray bars_sar
cdef Bar[:] buffer
bars_sar = np.empty((1000000,), dtype=bar_dtype)
start = time()
for n in range(1000000):
buffer = bars_sar
buffer[0] = bar
print (f'Elapsed: {time() - start}')
return buffer
def test():
sar = f(Bar(1,2,3,4,5,6))
print(sar[0])
return sar
This 1M iterations loop takes about 3 seconds - because of the line takes memory view from numpy structured array:
buffer = bars_sar
The idea is that I need bars_sar untyped. Only when I want to store or read something from it, I want to reinterpret it as a particular type of memory view, and I don't see any problem why it cannot be done fast, but don't know how to do it. I hope, there is something similar to C reinterpret_cast in Cython.
I tried to declare bars_sar as void *, but I'm unable to store memory view address there like:
cdef Bar[:] view = np.empty((5,), dtype=bar_dtype)
bars_sar = <void*>view
or even
cdef Bar[:] view = np.empty((5,), dtype=bar_dtype)
bars_sar = <void*>&view
The first one results in error C2440 that it cannot convert "__Pyx_memviewslice" to "void *"
The second one results in error: "Cannot take address of memoryview slice"
Please suggest

Memoryviews slices in Cython ask for a scalar

I'm trying to create a memoryview to store several vectors as rows, but when I try to change the value of any I got an error, like it is expecting a scalar.
%%cython
import numpy as np
cimport numpy as np
DTYPE = np.float
ctypedef np.float_t DTYPE_t
cdef DTYPE_t[:, ::1] results = np.zeros(shape=(10, 10), dtype=DTYPE)
results[:, 0] = np.random.rand(10)
This trows me the following error:
TypeError: only size-1 arrays can be converted to Python scalars
Which I don't understand given that I want to overwrite the first row with that vector... Any idea about what I am doing wrong?
The operation you would like to use is possible between numpy arrays (Python functionality) or Cython's memory views (C functionality, i.e. Cython generates right for-loops in the C-code), but not when you mix a memory view (on the left-hand side) and a numpy array (on the right-hand side).
So you have either to use Cython's memory-views:
...
cdef DTYPE_t[::1] r = np.random.rand(10)
results[:, 0] = r
#check it worked:
print(results.base)
...
or numpy-arrays (we know .base is a numpy-array):
results.base[:, 0] = np.random.rand(10)
#check it worked:
print(results.base)
Cython's version has less overhead, but for large matrices there won't be much difference.

How to call a cdef method

I'd like to call my cdef methods and improve the speed of my program as much as possible. I do not want to use cpdef (I explain why below). Ultimately, I'd like to access cdef methods (some of which return void) that are members of my Cython extensions.
I tried following this example, which gives me the impression that I can call a cdef function by making a Python (def) wrapper for it.
I can't reproduce these results, so I tried a different problem for myself (summing all the numbers from 0 to n).
Of course, I'm looking at the documentation, which says
The directive cpdef makes two versions of the method available; one fast for use from Cython and one slower for use from Python.
and later (emphasis mine),
This does slightly more than providing a python wrapper for a cdef method: unlike a cdef method, a cpdef method is fully overridable by methods and instance attributes in Python subclasses. It adds a little calling overhead compared to a cdef method.
So how does one use a cdef function without the extra calling overhead of a cpdef function?
With the code at the end of this question, I get the following results:
def/cdef:
273.04207632583245
def/cpdef:
304.4114626176919
cpdef/cdef:
0.8969507060538783
Somehow, cpdef is faster than cdef. For n < 100, I can occasionally get cpdef/cdef > 1, but it's rare. I think it has to do with wrapping the cdef function in a def function. This is what the example I link to does, but they claim better performance from using cdef than from using cpdef.
I'm pretty sure this is not how you wrap a cdef function while avoiding the additional overhead (the source of which is not clearly documented) of a cpdef.
And now, the code:
setup.py
from setuptools import setup, Extension
from Cython.Build import cythonize
pkg_name = "tmp"
compile_args=['-std=c++17']
cy_foo = Extension(
name=pkg_name + '.core.cy_foo',
sources=[
pkg_name + '/core/cy_foo.pyx',
],
language='c++',
extra_compile_args=compile_args,
)
setup(
name=pkg_name,
ext_modules=cythonize(cy_foo,
annotate=True,
build_dir='build'),
packages=[
pkg_name,
pkg_name + '.core',
],
)
foo.py
def foo_def(n):
sum = 0
for i in range(n):
sum += i
return sum
cy_foo.pyx
def foo_cdef(n):
return foo_cy(n)
cdef int foo_cy(int n):
cdef int sum = 0
cdef int i = 0
for i in range(n):
sum += i
return sum
cpdef int foo_cpdef(int n):
cdef int sum = 0
cdef int i = 0
for i in range(n):
sum += i
return sum
test.py
import timeit
from tmp.core.foo import foo_def
from tmp.core.cy_foo import foo_cdef
from tmp.core.cy_foo import foo_cpdef
n = 10000
# Python call
start_time = timeit.default_timer()
a = foo_def(n)
pyTime = timeit.default_timer() - start_time
# Call Python wrapper for C function
start_time = timeit.default_timer()
b = foo_cdef(n)
cTime = timeit.default_timer() - start_time
# Call cpdef function, which does more than wrap a cdef function (whatever that means)
start_time = timeit.default_timer()
c = foo_cpdef(n)
cpTime = timeit.default_timer() - start_time
print("def/cdef:")
print(pyTime/cTime)
print("def/cpdef:")
print(pyTime/cpTime)
print("cpdef/cdef:")
print(cpTime/cTime)
The reason for your seemingly anomalous result is that you aren't calling the cdef function foo_cy directly, but instead the def function foo_cdef wrapping it.
when you are wrapping inside another def indeed you are again calling the python function. However you should be able to reach similar results as the cpdef.
Here is what you could do:
like the python def, give the type for both input and output
def foo_cdef(int n):
cdef int val = 0
val = foo_cy(n)
return val
this should have similar results as cpdef, however again you are calling a python function. If you want to directly call the c function, you should use the ctypes and call from there.
and for the benchmarking, the way that you have written, it only considers one run and could fluctuate a lot due OS other task and as well the timer.
better to use the timeit builtin method to calculate for some iteration:
# Python call
pyTime = timeit.timeit('foo_def(n)',globals=globals(), number=10000)
# Call Python wrapper for C function
cTime = timeit.timeit('foo_cdef(n)',globals=globals(), number=10000)
# Call cpdef function, which does more than wrap a cdef function (whatever that means)
cpTime = timeit.timeit('foo_cpdef(n)',globals=globals(), number=10000)
output:
def/cdef:
154.0166154428522
def/cpdef:
154.22669848136132
cpdef/cdef:
0.9986378296327566
like this, you get consistent results and as well you see always close to 1 for both either cython itself wraps or we explicitly wrap around a python function.

Using Cython extension module to wrap std::vector - How do I program __setitem__() method?

This seems like a question that should have an obvious answer, but for some reason I can't find any examples online.
I am wrapping a vector of C++ objects in a Python class using Cython. I also have a Cython wrapper for the C++ class already coded. I can get several methods such as __len__(), __getitem__(), and resize() to work properly, but the __setitem__() method is giving me problems.
For simplicity, I coded a small example using a vector of ints. I figure if I can get this code to work, then I can build on that to get the solution for my C++ class as well.
MyPyModule.pyx
# distutils: language = c++
from libcpp.vector cimport vector
from cython.operator cimport dereference as deref
cdef class MyArray:
cdef vector[int]* thisptr
def __cinit__(self):
self.thisptr = new vector[int]()
def __dealloc__(self):
del self.thisptr
def __len__(self):
return self.thisptr.size()
def __getitem__(self, size_t key):
return self.thisptr.at(key)
def resize(self, size_t newsize):
self.thisptr.resize(newsize)
def __setitem__(self, size_t key, int value):
# Attempt 1:
# self.thisptr.at(key) = value
# Attempt 2:
# cdef int* itemptr = &(self.thisptr.at(key))
# itemptr[0] = value
# Attempt 3:
# (self.thisptr)[key] = value
# Attempt 4:
self[key] = value
When I tried to cythonize using Attempt 1, I got the error Cannot assign to or delete this. When I tried Attempt 2, the .cpp file was created, but the compiler complained that:
error: cannot convert ‘__Pyx_FakeReference<int>*’ to ‘int*’ in assignment
__pyx_v_itemptr = (&__pyx_t_1);
On Attempt 3, Cython would not build the file because Cannot assign type 'int' to 'vector[int]'. (When I tried this style with the C++ object instead of int, it complained because I had a reference as a left-value.) Attempt 4 compiles, but when I try to use it, I get a segfault.
Cython docs say that returning a reference as a left-value is not supported, which is fine -- but how do I get around it so that I can assign a new value to one of my vector elements?
There are two ways to access the vector through a pointer,
def __setitem__(self, size_t key, int value):
deref(self.thisptr)[key] = value
# or
# self.thisptr[0][key] = value
Cython translates those two cases as follows:
Python: deref(self.thisptr)[key] = value
C++: ((*__pyx_v_self->thisptr)[__pyx_v_key]) = __pyx_v_value;
Python: self.thisptr[0][key] = value
C++: ((__pyx_v_self->thisptr[0])[__pyx_v_key]) = __pyx_v_value;
which are equivalent i.e. access the same vector object.
Instead of trying to handle a pointer from Cython code, you can let Cython itself do it for you:
cdef class MyArray:
cdef vector[int] thisptr
def __len__(self):
return self.thisptr.size()
def __getitem__(self, size_t key):
return self.thisptr[key]
def __setitem__(self, size_t key, int value):
self.thisptr[key] = value
def resize(self, size_t newsize):
self.thisptr.resize(newsize)
Is there any problem with this approach?
I have already accepted J.J. Hakala's answer (many thanks!). I tweaked that method to include an out-of-bounds check, since it uses the [] operator instead of the at() method:
cdef class MyArray:
(....)
def __setitem__(self, size_t key, int value):
if key < self.thisptr.size():
deref(self.thisptr)[key] = value
else:
raise IndexError("Index is out of range.")