Can OpenMP be used like multiprocessing? - Cython

I have a problem that is trivially parallelizable: I need to perform the same operation on 24 cdef objects. I know I could use multiprocessing for this, but copying the data and starting a new process take as long as just doing the computation serially, so nothing is gained. OpenMP might therefore be a better alternative.
The operation I'd like to do would look like this with multiprocessing:
multiprocessing.Pool().map(f, list_of_cython_objects)
Could something like the code below work? Why or why not? I understand I'd have to create an array of pointers to the Cython objects, since I cannot use lists.
from cython.parallel import prange, parallel, threadid

with nogil, parallel():
    for i in prange(len(list_of_cython_objects), schedule='guided'):
        f(list_of_cython_objects[i])

Provided that the majority of f can be done without the GIL (i.e. it uses cdef attributes of the cdef class), this can be made to work pretty well. The only bit that needs the GIL is indexing the list, and you can easily put that in a with gil block.
An illustrative example is:
from cython.parallel import prange, parallel

cdef class C:
    cdef double x

    def __init__(self, x):
        self.x = x

cdef void f(C c) nogil:
    # only touches a cdef attribute, so no GIL is needed
    c.x *= 2

def test_function():
    list_of_cython_objects = [C(x) for x in range(20)]
    cdef int l = len(list_of_cython_objects)
    cdef int i
    cdef C c
    with nogil, parallel():
        for i in prange(l, schedule='guided'):
            with gil:
                # indexing the Python list is the only step that needs the GIL
                c = list_of_cython_objects[i]
            f(c)
So long as the with gil blocks are small (in terms of the proportion of computation time) then you should get most of the parallelization speed-up that you expect.
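For comparison with multiprocessing, here is a pure-Python sketch of the shared-memory behaviour that makes threads attractive for this problem. The C class and f below are plain-Python stand-ins, not the Cython versions. Each worker thread receives a reference to the original object, so nothing is pickled or copied. In pure Python the GIL still serializes the body of f, so this shows the semantics rather than the speed-up; releasing the GIL, as in the Cython version, is what provides the parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


class C:
    """Plain-Python stand-in for the cdef class C (illustrative only)."""

    def __init__(self, x):
        self.x = x


def f(c):
    # Mutates the shared object in place, like the nogil f in the answer.
    c.x *= 2


objects = [C(x) for x in range(20)]

# Worker threads get references to the original objects; unlike
# multiprocessing, nothing is copied into another process.
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(f, objects))  # consume the iterator so any exception is raised

print([c.x for c in objects[:5]])  # [0, 2, 4, 6, 8]
```

Each object is handled by exactly one worker, so there is no data race on x.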

Related

Function that returns two matrices in a convenient way

I would like to create a function in Julia that returns two matrices. One way to do that is as follows:
function AB(n,m)
    A = rand(n,n)
    B = rand(m,m)
    return (A = A, B = B)
end
The output looks like this:
julia> AB(2,3)
(A = [0.7001462182920173 0.5485248069467998; 0.8559801029748708 0.8023848206563642], B = [0.7970654693626167 0.08666821253389378 0.45550050243098306; 0.5436826530244554 0.9204593389763813 0.9270606176586167; 0.7055633627200892 0.3702008285594489 0.670758477684624])
The output is not particularly convenient. What I would like to have is something similar to what the function qr() from LinearAlgebra returns. For example:
julia> qr(rand(3,3))
LinearAlgebra.QRCompactWY{Float64,Array{Float64,2}}
Q factor:
3×3 LinearAlgebra.QRCompactWYQ{Float64,Array{Float64,2}}:
-0.789051 -0.570416 -0.228089
-0.213035 -0.0941769 0.972495
-0.576207 0.815939 -0.0472084
R factor:
3×3 Array{Float64,2}:
-0.929496 -0.563585 -0.787584
0.0 0.377304 -0.505203
0.0 0.0 -0.01765
This output is very useful for getting at least some idea of what these matrices look like.
How can I create a function that returns the matrices like the function qr() from LinearAlgebra?
Any help is much appreciated!
You will need to define your own display method. I present here an abstract approach so you can comfortably reuse it.
We start with a decorator supertype: all structs that have this supertype will be displayed nicely.
abstract type Pretty end
And here is the actual implementation:
function Base.display(x::Pretty)
    for f in fieldnames(typeof(x))
        print(f," is a ")
        display(getfield(x,f))
    end
end
Let us see it in action. Just define any struct with this supertype, e.g.:
struct S{T} <: Pretty
    a::Matrix{T}
    b::Matrix{T}
end
And now you get what you need, e.g.:
julia> S(rand(2,3), rand(3,2))
a is a 2×3 Matrix{Float64}:
0.40661 0.753072 0.708016
0.371099 0.0948791 0.538046
b is a 3×2 Matrix{Float64}:
0.670715 0.457208
0.353189 0.0248713
0.455794 0.136496

Complex Numbers in Cython - I or 1j?

Whenever I attempt to do simple complex arithmetic in Cython, I seem to get some Python overhead; does this have anything to do with using the Pythonic 1j? At this point, I can't find any way to import the C-style imaginary unit into Cython. Is this possible?
Take, for example, the simple Cython function below that converts a complex number from polar to rectangular form. (Note that this uses cos and sin cimported from the "complex.h" header.)
cdef float complex rect(float r, float phi):
    return r*cos(phi) + r*sin(phi)*1j
This code is converted into the following C code by Cython:
__pyx_t_1 = __Pyx_c_sum(__pyx_t_double_complex_from_parts((__pyx_v_r * cos(__pyx_v_phi)), 0), __Pyx_c_prod(__pyx_t_double_complex_from_parts((__pyx_v_r * sin(__pyx_v_phi)), 0), __pyx_t_double_complex_from_parts(0, 1.0)));
__pyx_r = __pyx_t_float_complex_from_parts(__Pyx_CREAL(__pyx_t_1), __Pyx_CIMAG(__pyx_t_1));
goto __pyx_L0;
Given how simple this function is, it seems like it should be converted into pure C, yet it remains partially in the Python realm. Is there something I'm missing that would make this statement convert to pure C?
It's possible to use I from complex.h instead of 1j, as in:
cdef extern from "complex.h":
    double cos(double x) nogil
    double sin(double x) nogil
    float complex I  # the C99 imaginary unit (a macro in complex.h)

def rect(r, phi):
    # Python-visible wrapper around the pure-C function below
    return crect(r, phi)

cdef float complex crect(float r, float phi):
    cdef float rpart = r*cos(phi)
    cdef float ipart = r*sin(phi)
    return rpart + ipart * I
cython -a reports no Python code for crect.
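As a quick sanity check on the arithmetic (not on the generated C), the same polar-to-rectangular formula written in plain Python agrees with the standard library's cmath.rect, which plays the same role as the def rect wrapper above. The rect function here is an illustrative double-precision stand-in; crect computes in single-precision float, so its results will differ in the last few digits.

```python
import cmath
import math


def rect(r, phi):
    # Same arithmetic as crect: real part r*cos(phi), imaginary part r*sin(phi).
    return complex(r * math.cos(phi), r * math.sin(phi))


# The standard library already provides this conversion as cmath.rect.
for r, phi in [(1.0, 0.0), (2.0, math.pi / 2), (3.5, -1.2)]:
    assert cmath.isclose(rect(r, phi), cmath.rect(r, phi))

print(rect(2.0, 0.0))  # (2+0j)
```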

Scala: Passing Vs applying function

Let's say we have the following code snippet:
List(1, 2, 3)
  .map(doubleIt)          // passing function
  .map(x => doubleIt(x))  // applying function

def doubleIt(i: Int): Int = 2 * i
As you can see, we can either pass doubleIt directly or apply it inside another anonymous lambda. I have always wondered which approach is better. I personally prefer passing it directly, as it seems like the second approach would end up creating an extra wrapper lambda for no good reason, but I am not 100% sure my reasoning is correct.
I am curious to know what the pro/cons of each style are and whether one is definitely better than the other.
This might change in Scala 2.12+, but at the moment both approaches are identical. As a test, I created the following:
class Test {
  def testPassingFunction: List[Int] = List(1, 2, 3).map(doubleIt)
  def testApplyingFunction: List[Int] = List(1, 2, 3).map(x => doubleIt(x))
  def doubleIt(i: Int): Int = 2 * i
}
I then compiled it and used javap to disassemble the bytecode. Both functions are identical (except for different Strings). In both cases, a new class that extends Function1 is created, and it calls the appropriate method. As @Mike says in the comments, the Scala compiler converts everything to the second form.
It turns out that it depends somewhat on what your "function" is. If it is actually a function (that is, a function value, defined as val doubleIt = (x: Int) => 2 * x), then your hunch is correct. The version in which you pass a function literal that simply applies doubleIt (i.e., l map { x => doubleIt(x) }) is compiled just as written, resulting in an anonymous function that delegates to doubleIt. Passing doubleIt as a function value takes out the middleman. If doubleIt is a method, on the other hand, then both forms are compiled identically.
You can easily verify this yourself at the REPL. Define the following class:
class A {
  val l = List(1,2,3)
  val f = (x: Int) => 2 * x
  def g(x: Int) = 2 * x
  def m1 = l map f
  def m2 = l map { x => f(x) }
  def m3 = l map g
  def m4 = l map { x => g(x) }
}
Then run :power and :javap -v A.
That said, the distinction is unlikely to make a practical difference in any but the most performance-critical code. In ordinary circumstances, code clarity is the more important consideration, and it depends somewhat on who will be reading your code in the future. Personally, I tend to prefer the concise lst map doubleIt form; it eliminates a bunch of syntactic noise that adds nothing semantically. I suppose the longer form may be considered more explicit, especially for developers who aren't very familiar with the map method. The literal reading matches the intent quite well: "(Given) list, map (each) x to doubleIt(x)". Your team will have to decide what's best for you and your organization.

controlling program flow without if-else / switch-case statements

Let's say I have 1000 functions defined as follows
void dummy1(int a);
void dummy2(int a, int aa);
void dummy3(int a, int aa, int aaa);
...
void dummy1000(int a, int aa, int aaa, ...);
I want to write a function that takes an integer n (n < 1000) and calls the nth dummy function (dummy10 in the case of 10) with exactly n arguments (the arguments can be any integer, say 0). I know this can be achieved with a switch-case statement with 1000 cases, but that is not practical.
In my opinion, this cannot be achieved without recompilation at run time, so languages like Java, C and C++ will never allow it.
Hopefully, there is a way to do this. If so, I am curious.
Note: This is not something I will ever use; I asked this question purely out of curiosity.
In modern functional languages, you can make a list of functions which take a list as an argument. This will arguably solve your problem, but it is also arguably cheating, as it is not quite the statically-typed implementation your question seems to imply. However, it is pretty much what dynamic languages such as Python, Ruby, or Perl do when using "manual" argument handling...
Anyway, the following is in Haskell: it supplies the nth function (from its first argument fs) a list of n copies of the second argument (x), and returns the result. Of course, you will need to put together the list of functions somehow, but unlike a switch statement this list will be reusable as a first-class argument.
selectApplyFunction :: [ [Int] -> a ] -> Int -> Int -> a
selectApplyFunction fs x n = (fs !! (n-1)) (replicate n x)
dummy1 [a] = 5 * a
dummy2 [a, b] = (a + 3) * b
dummy3 [a, b, c] = (a*b*c) `div` (a*b + b*c + c*a)
...
myFunctionList = [ dummy1, dummy2, dummy3, ... ]
-- (myFunction n) provides n copies of the number 42 to the n'th function
myFunction = selectApplyFunction myFunctionList 42
-- call the 666'th function with 666 copies of 42
result = myFunction 666
Of course, you will get an exception if n is greater than the number of functions or if the function can't handle the list it is given. Note, too, that this is poor Haskell style, mainly because it abuses lists to solve your problem...
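The same idea in Python, one of the dynamic languages mentioned above, is a little less of a cheat: argument unpacking with * lets each function take exactly n separate arguments instead of a single list. Only the first three dummies are written out here; the rest are assumed to follow the same pattern.

```python
def dummy1(a):
    return 5 * a


def dummy2(a, b):
    return (a + 3) * b


def dummy3(a, b, c):
    # Integer division, matching the Int-only types of the Haskell version.
    return (a * b * c) // (a * b + b * c + c * a)


# The list of functions; index n - 1 holds dummyN.
my_function_list = [dummy1, dummy2, dummy3]


def select_apply_function(fs, x, n):
    # Call the n-th function with exactly n positional copies of x.
    return fs[n - 1](*([x] * n))


print(select_apply_function(my_function_list, 42, 3))  # 14
```

As in Haskell, an IndexError (n too large) or a TypeError (wrong number of parameters) is raised at run time rather than caught at compile time.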
No, you are incorrect. Most modern languages support some form of Reflection that will allow you to call a function by name and pass params to it.
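In Python, for example, reflection-style lookup by name is just getattr, and inspect.signature reports how many parameters a function expects. The Dummies class below is a hypothetical stand-in for wherever the 1000 functions would live; only three are shown.

```python
import inspect


class Dummies:
    """Hypothetical namespace holding the dummyN functions (only three shown)."""

    @staticmethod
    def dummy1(a):
        return a

    @staticmethod
    def dummy2(a, aa):
        return a + aa

    @staticmethod
    def dummy3(a, aa, aaa):
        return a + aa + aaa


def call_nth(n, value=0):
    # Reflection: look the function up by its name at run time.
    func = getattr(Dummies, f"dummy{n}")
    # Reflection again: ask the function how many parameters it takes.
    arity = len(inspect.signature(func).parameters)
    if arity != n:
        raise TypeError(f"dummy{n} takes {arity} arguments, expected {n}")
    return func(*([value] * n))


print(call_nth(3, 1))  # 3
```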
You can create an array of functions in most of modern languages.
In pseudo code,
var dummy = new Array();
dummy[1] = function(int a);
dummy[2] = function(int a, int aa);
...
var result = dummy[whateveryoucall](1,2,3,...,whateveryoucall);
In functional languages you could do something like the following; in strongly typed ones like Haskell, though, the functions must all have the same type:
funs = [reverse, tail, init]          -- 3 functions of type [a] -> [a]
run fn args = (funs !! fn) $ args     -- applies the function at index fn to args
In object-oriented languages, you can combine function objects and reflection to achieve exactly what you want. The problem of the variable number of arguments is solved by passing an appropriate POJO (reminiscent of a C struct) to the function object.
interface Functor<A,B> {
    public B compute(A input);
}

class SumInput {
    private int x, y;
    public int getX() { return x; }
    public int getY() { return y; }
    public void setX(int x) { this.x = x; }
    public void setY(int y) { this.y = y; }
}

class Sum implements Functor<SumInput, Integer> {
    @Override
    public Integer compute(SumInput input) {
        return input.getX() + input.getY();
    }
}
Now imagine you have a large number of these "functors". You gather them in a configuration file (maybe an XML file with metadata about each functor, usage scenarios, instructions, etc...) and return the list to the user.
The user picks one of them. Using reflection, you can see what input is required and what output to expect. The user fills in the input, and using reflection you instantiate the functor class (newInstance()), call the compute() function and get the output.
When you add a new functor, you just have to change the list of the functors in the config file.
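A Python sketch of the same functor pattern follows. A dataclass stands in for the POJO and a plain dict stands in for the configuration file; both are assumptions for illustration, not part of the answer.

```python
from dataclasses import dataclass, fields


@dataclass
class SumInput:
    # Plays the role of the SumInput POJO: one object carries all the arguments.
    x: int
    y: int


class Sum:
    input_type = SumInput

    def compute(self, inp):
        return inp.x + inp.y


# Hypothetical stand-in for the configuration file: functor name -> class.
REGISTRY = {"Sum": Sum}


def run(name, **values):
    functor_cls = REGISTRY[name]
    # Reflection on the input type tells us which values the functor needs.
    required = {f.name for f in fields(functor_cls.input_type)}
    missing = required - set(values)
    if missing:
        raise ValueError(f"missing inputs: {sorted(missing)}")
    functor = functor_cls()  # analogous to newInstance() in Java
    return functor.compute(functor_cls.input_type(**values))


print(run("Sum", x=2, y=3))  # 5
```

Adding a functor then means writing its class and input type and adding one entry to REGISTRY.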

Using Cython to wrap a library that wraps another library

My goal is to use Cython to wrap the Apophenia library, a C library for scientific computing.
This is an effort to not reinvent the wheel, and Apophenia itself tries to do the same, by basing its structures on those from the GNU Scientific Library:
typedef struct {
    gsl_vector *vector;
    gsl_matrix *matrix;
    gsl_vector *weights;
    apop_names *names;
    ...
} apop_data;
Apophenia provides lots of vector/matrix operations that the GSL either doesn't provide or provides a little awkwardly, but if the GSL has a function, there's no point rewriting it. You should be able to write C code that jumps between the apop_data set as a whole and its GSL parts as often as needed, e.g.:
apop_data *dataset = apop_text_to_data("infile.csv"); //fill the matrix element
gsl_matrix *minv = apop_matrix_inverse(dataset->matrix);
apop_data *dinv = apop_matrix_to_data(minv);
apop_data *identity_matrix = apop_dot(dataset, dinv); // I = D * D^-1
dataset->vector = gsl_vector_alloc(10);
gsl_vector_set_all(dataset->vector, 1);
I'm not sure how to wrap this in Cython. The typical method seems to be to provide a Python-side structure that includes an internal copy of the C struct being wrapped:
"""I'm omitting the Cython declarations of the C structs and functions,
which are just translations of the C declarations. Let those be in c_apop."""
cdef class apop_data:
    cdef c_apop.apop_data *d

    def set(self, row, col, val):
        c_apop.apop_data_set(self.d, row, col, val)

    def get(self, row, col):
        return c_apop.apop_data_get(self.d, row, col)

    # [et cetera]

cdef class gsl_vector:
    cdef c_apop.gsl_vector *v

    def set(self, row, val):
        c_apop.gsl_vector_set(self.v, row, val)

    def get(self, row):
        return c_apop.gsl_vector_get(self.v, row)

    # [et cetera]
But now we're stuck, because if we were to get the vector element from the data set,
pyd = apop_data(10)
v = pyd.d.vector
v is a raw C gsl_vector, not a Python object, so the next line can't be v.get(0) or v.set(0, 1).
We could add methods to the apop_data class named vector_get and vector_set that return a Python-wrapped gsl_vector, but that creates its own issues: if the user reallocates the C vector underlying the Python vector from pyv = pyd.get_vector(), how do we guarantee that pyd.d.vector is reallocated with it?
I've tried a couple of things, and I feel like I'm missing the point every time. Any suggestions on how to best design the Cython classes for this situation?
The C structure should never be exposed to the Python side.
I took a quick look at the library, and it does not seem to have anything out of the ordinary.
The only situation you have to track is when the library actually reallocates the underlying vector. Those functions will usually take a pointer to a pointer and update it to point to the newly allocated structure.
Why do you need to expose the pyd.get_vector ?
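If a vector wrapper really is needed, one possible design (a sketch, not the answerer's proposal) is a wrapper that never caches the C pointer. It holds a reference to its parent and re-reads the parent's vector field on every access, so a reallocation made through the parent is picked up automatically and the C struct stays hidden. The example below simulates this in pure Python so it runs without Apophenia; _CData, ApopData, VectorView and realloc_vector are all hypothetical names. In Cython, the view would hold the parent cdef object and read parent.d.vector inside each method.

```python
class _CData:
    """Pure-Python stand-in for the C apop_data struct (hypothetical)."""

    def __init__(self, size):
        self.vector = [0.0] * size


class ApopData:
    def __init__(self, size):
        self._d = _CData(size)  # never handed out to callers directly

    @property
    def vector(self):
        return VectorView(self)

    def realloc_vector(self, size):
        # Simulates a library call that replaces dataset->vector.
        self._d.vector = [0.0] * size


class VectorView:
    """Borrowed view: keeps the parent alive and re-reads its vector each time."""

    def __init__(self, parent):
        self._parent = parent

    def get(self, i):
        return self._parent._d.vector[i]

    def set(self, i, val):
        self._parent._d.vector[i] = val

    def __len__(self):
        return len(self._parent._d.vector)


pyd = ApopData(10)
v = pyd.vector
v.set(0, 1.0)
pyd.realloc_vector(20)
print(len(v))  # 20: the view follows the reallocation
```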