Using MKL in Cython fails with an undefined symbol in a shared object from MKL, but direct execution is fine

I'm writing a physics simulation in C using the Intel Math Kernel Library (MKL) and want to call it directly from Python code using Cython.
The Cython compilation itself works (the program runs error-free if MKL is not included in the example), and if I compile my C code directly with gcc using
gcc -O3 -Wall -m64 -I"${MKLROOT}/include" bar.c -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl
it works fine. The compiler flags were generated by the MKL Link Line Advisor.
But if I now try to compile the same code with Cython, I get the error message
INTEL MKL ERROR: /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_avx2.so.1: undefined symbol: mkl_sparse_optimize_bsr_trsm_i8. Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so.1 or libmkl_def.so.1.
I've also tried compiling my program (without the main) into a shared library (.so) to get the MKL linking done outside of Cython and then just linking against that, but after fiddling that library into LD_LIBRARY_PATH I got the same behavior again.
Any idea how I can get the linking correct?
Swapping the cblas_ function for a different one (e.g. cblas_drot, among others I tried) gives the same error.
I've read quite a few other questions (many concerning MKL within Anaconda; my MKL is installed manually in /opt, as the path above shows), including this one, which tries to use MKL in Java and gets the same error.
I can reproduce the same results regarding the nm checks (the symbol is undefined in libmkl_avx2.so.1 but defined in libmkl_gnu_thread.so), but I failed to apply the answer from that question to my problem. If I try to add -lmkl_gnu_thread in the setup.py script shown below, I get different unresolved dependencies, fixed by also including -fopenmp, which just gets me back to the old error...
Further information and used files
LD_LIBRARY_PATH=/opt/intel/oneapi/mkl/latest/lib/intel64:/opt/intel/oneapi/compiler/2021.1.1/linux/lib:/opt/intel/oneapi/compiler/2021.1.1/linux/lib/x64:/opt/intel/oneapi/compiler/2021.1.1/linux/lib/emu:/opt/intel/oneapi/compiler/2021.1.1/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/compiler/2021.1.1/linux/compiler/lib:/opt/intel/oneapi/tbb/2021.1.1/env/../lib/intel64/gcc4.8 (set by the mkl setvars script, obtained by echo $LD_LIBRARY_PATH)
Does Python maybe do something weird with this variable?
readelf -d $MKLROOT/lib/intel64/libmkl_avx2.so.1 | grep NEEDED returns 0x0000000000000001 (NEEDED) Shared library: [libdl.so.2], indicating that the library claims to depend only on libdl (I can't find the question where I got that command from, but it was also on this topic; only the .1 was missing in the filename).
Used files
setup.py
(contains a second version as a comment; can be run like this)
from Cython.Distutils import build_ext
from Cython.Build import cythonize
from distutils.extension import Extension
from distutils.core import setup
import numpy

extensions = [
    Extension("foo", ["foo.pyx"],
              include_dirs=[numpy.get_include()],
              extra_compile_args=["-Wall", "-m64", "-I\"${MKLROOT}/include\""],
              extra_link_args=["-fopenmp", "-L${MKLROOT}/lib/intel64", "-Wl,--no-as-needed",
                               "-lmkl_gnu_thread", "-lmkl_intel_lp64", "-lmkl_sequential",
                               "-lmkl_core", "-lpthread", "-lm", "-ldl"])
]
for e in extensions:
    e.cython_directives = {'language_level': "3"}  # all are Python 3

# both variants produce the same behaviour, first:
setup(ext_modules=cythonize(extensions))
# second:
#setup(ext_modules=cythonize(extensions),
#      cmdclass={'build_ext': build_ext})
used with
python setup.py build_ext --inplace
EDIT: another version with better usage of the keyword arguments, as suggested in the comments. It gives the same error.
from Cython.Build import cythonize
from distutils.extension import Extension
from distutils.core import setup
import numpy
extensions = [
    Extension("foo", ["foo.pyx"],
              include_dirs=[numpy.get_include(), "\"${MKLROOT}/include\""],
              libraries=["mkl_intel_lp64", "mkl_sequential", "mkl_core", "pthread", "m", "dl"],
              library_dirs=["${MKLROOT}/lib/intel64"],
              extra_compile_args=["-Wall", "-m64"],
              extra_link_args=["-Wl,--no-as-needed"])
]
for e in extensions:
    e.cython_directives = {'language_level': "3"}  # all are Python 3

setup(ext_modules=cythonize(extensions))
foo.pyx
cimport numpy as np
import numpy as np
import ctypes
cdef extern from "bar.c":
void double_elements(int n, double* vec_y)
def func(np.ndarray[np.double_t, ndim=1] y not None):
double_elements(<int> y.size//2, <double*> <size_t> y.__array_interface__['data'][0])
return y
bar.c
#include <mkl.h>
#include <stdio.h>
void double_elements(int n, double* x) {
    cblas_dscal(n, 2., x, 1);
}

#ifndef PY_VERSION_HEX // compile the main only if not using cython
int main() {
    double x[2] = {1., 2.};
    double_elements(2, x);
    printf("%g %g\n", x[0], x[1]);
    return 0;
}
#endif
run.py
(used for testing, quite boring and only calls the function)
import numpy as np
import foo
x = np.array([1., 2.])
y = foo.func(x)
print(x)
print(y)

I've solved the problem by linking MKL statically instead. Because I also failed to translate the commands for static linking (it uses linker groups) into the setup.py script, I switched to compiling everything on my own with
gcc -O3 -Wall -m64 -I"${MKLROOT}/include" -c bar.c -o build/bar.o
cythonize foo.pyx
# exchange /path/to/numpy/ by value given by numpy.get_include()
gcc -Wall -O2 -fstack-protector-strong -fwrapv -fstack-protector-strong -D_FORTIFY_SOURCE=2 -fPIC -I/path/to/numpy/core/include -I/usr/include/python3.8 -c foo.c -o build/foo.o -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION
gcc -shared build/foo.o build/bar.o -o foo.cpython-38-x86_64-linux-gnu.so -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm -ldl
These correspond to the compile steps done by the setup script (minus its file-modification-time checks), with some compile flags removed (it used many duplicates).
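For reference, here is a sketch of how the same static-link group might be expressed directly in setup.py. I did not get this route working before switching to the manual commands above, so treat it as an untested starting point rather than a verified recipe; it expands MKLROOT in Python, since distutils does not run the compiler through a shell and therefore does not expand ${MKLROOT} itself.
# Untested sketch: mirrors the manual gcc link line above inside setup.py.
# MKLROOT is read from the environment because distutils does not expand shell variables.
import os
import numpy
from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize

mklroot = os.environ["MKLROOT"]
mkl_lib = os.path.join(mklroot, "lib", "intel64")
static_archives = [os.path.join(mkl_lib, "libmkl_intel_lp64.a"),
                   os.path.join(mkl_lib, "libmkl_sequential.a"),
                   os.path.join(mkl_lib, "libmkl_core.a")]

extensions = [
    Extension("foo", ["foo.pyx"],  # bar.c is pulled in by the cdef extern in foo.pyx
              include_dirs=[numpy.get_include(), os.path.join(mklroot, "include")],
              extra_compile_args=["-Wall", "-m64"],
              extra_link_args=(["-Wl,--start-group"] + static_archives
                               + ["-Wl,--end-group", "-lpthread", "-lm", "-ldl"]))
]
for e in extensions:
    e.cython_directives = {'language_level': "3"}
setup(ext_modules=cythonize(extensions))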

Related

Cython Extern C++ Function

I'm trying to extern a C++ function into Cython. Here is my code (all files are in the same directory).
function.cpp
int cfunc(int x){
    return x;
}
wrapper.pyx
cdef extern from "function.cpp":
cpdef int cfunc(int)
def pyfunc(int x):
return cfunc(x)
setup.py
from distutils.core import setup, Extension
from Cython.Build import cythonize
source = ['function.cpp', 'wrapper.pyx']
ext = [Extension('lib', source, language='c++')]
setup(ext_modules=cythonize(ext))
When I run python setup.py build_ext --inplace it gives the following error
/home/hyunix/anaconda3/envs/c-playground/bin/../lib/gcc/x86_64-conda_cos6-linux-gnu/7.3.0/../../../../x86_64-conda_cos6-linux-gnu/bin/ld: build/temp.linux-x86_64-3.7/function.o: in function `cfunc(int)':
function.cpp:(.text._Z5cfunci+0x0): multiple definition of `cfunc(int)'; build/temp.linux-x86_64-3.7/wrapper.o:wrapper.cpp:(.text._Z5cfunci+0x0): first defined here
collect2: error: ld returned 1 exit status
error: command '/home/hyunix/anaconda3/envs/c-playground/bin/x86_64-conda_cos6-linux-gnu-c++' failed with exit status 1
However, if I remove language='c++' from setup.py, it works fine. Why does this happen?
I'm using:
Python 3.7.9
Cython 0.29.21
Ubuntu 20.04
Well, when you use cpdef int cfunc(int), you're explicitly creating a new C function and a new Python function. If you want to refer to cfunc() as it's externally defined in function.cpp, your declaration should be
cdef extern from "function.cpp":
int cfunc(int)
So, when you compile with language='c++', Cython gives you an appropriate error. When you remove the language flag, however, Cython has to decide, based on compiler directives, whether you're asking for a .c or a .cpp, and it defaults to .c. You should notice that your wrapper is compiled to a .c instead of a .cpp when the language argument is removed. In that C compilation, Cython does not recognize the signature in the .cpp, but it does recognize the cpdef. So there is no error, but you get an empty cfunc function instead of the one defined in the .cpp.

Can't compile libpd with emmake (Emscripten SDK)

I'm trying to compile libpd to JavaScript or WebAssembly using the Emscripten SDK. According to some docs, if there is a Makefile, it can be compiled by using emmake make (emconfigure is not used because there is no ./configure file), but I get the following error:
/home/ian/Documents/emsdk/emscripten/1.37.37/emcc.py -DPD -DHAVE_UNISTD_H -DUSEAPI_DUMMY -I./pure-data/src -I./libpd_wrapper -I./libpd_wrapper/util -Wno-int-to-pointer-cast -Wno-pointer-to-int-cast -fPIC -I"/usr/lib/jvm/default-java/include/linux" -DHAVE_LIBDL -ffast-math -funroll-loops -fomit-frame-pointer -O3 -DLIBPD_EXTRA -c -o pure-data/src/d_array.o pure-data/src/d_array.c
pure-data/src/d_array.c:523:2: error: No byte order defined
#error No byte order defined
^
1 error generated.
ERROR:root:compiler frontend failed to generate LLVM bitcode, halting
<builtin>: recipe for target 'pure-data/src/d_array.o' failed
make: *** [pure-data/src/d_array.o] Error 1
Any ideas? Do you think it is possible to compile this library?
UPDATE: After tweaking every complaining file as suggested in @zakki's answer,
I get another error:
libpd_wrapper/util/ringbuffer.c:18:12: fatal error: 'stdatomic.h' file not found
#include <stdatomic.h>
That file has this content:
#if __STDC_VERSION__ >= 201112L // use stdatomic if C11 is available
#include <stdatomic.h> // HERE IS WHERE ERROR GOES
#define SYNC_FETCH(ptr) atomic_fetch_or((_Atomic int *)ptr, 0)
#define SYNC_COMPARE_AND_SWAP(ptr, oldval, newval) \
atomic_compare_exchange_strong((_Atomic int *)ptr, &oldval, newval)
//Some other definitions that I didn't put here
I read some threads a while ago about this problem with C11; how can I fix this?
UPDATE 2: After adding && !defined(__EMSCRIPTEN__) it now compiles, but I'm getting this warning that I don't understand:
WARNING:root:Dynamic libraries (.so, .dylib, .dll) are currently not
supported by Emscripten. For build system emulation purposes,
Emscripten will now generate a static library file (.bc) with the
suffix '.so'. For best practices, please adapt your build system to
directly generate a static LLVM bitcode library by setting the output
suffix to '.bc.')
First, Emscripten has endian.h, so add defined(__EMSCRIPTEN__) to the #if:
#if defined(__linux__) || defined(__CYGWIN__) || defined(__GNU__) || \
defined(ANDROID) || defined(__EMSCRIPTEN__)
#include <endian.h>
#endif
Second, it seems like an Emscripten bug; work around it by excluding Emscripten from the stdatomic path:
#if __STDC_VERSION__ >= 201112L && !defined(__EMSCRIPTEN__)

Cython Error linking Shared Library?

I'm trying to use the Cuhre routine provided in the Cuba library. I previously encountered some errors linking a static library to Cython, so I tried to create a shared library exposing the Cuhre routine. To do this, I have three files: cuhre.c, cuhre.h, and libcuhre.so (created by compiling cuhre.c).
cuhre.c has a routine tryCuhre that essentially calls the Cuhre routine provided in the Cuba library. For simplicity, it is just for 2D integration:
double tryCuhre(integrand_t t, void * ud)
{
    int comp, nregions, neval, fail;
    cubareal integral[NCOMP], error[NCOMP], prob[NCOMP];

    Cuhre(2, 1, t, ud, 1,
          EPSREL, EPSABS, VERBOSE | LAST,
          MINEVAL, MAXEVAL, 13,
          STATEFILE, SPIN,
          &nregions, &neval, &fail, integral, error, prob);

    return (double)integral[0];
}
The variables in all caps (e.g. MINEVAL and SPIN) are all predefined at compile time and constant.
This is my cuhre.h file, which is included by cuhre.c:
#ifndef CUHRE_H_
#define CUHRE_H_

#ifdef __cplusplus
extern "C" {
#endif

typedef double cubareal;
typedef int (*integrand_t)(const int *ndim, const cubareal x[], const int *ncomp,
                           cubareal f[], void *userdata);

double tryCuhre(integrand_t t, void * ud);

#ifdef __cplusplus
}
#endif

#endif
And after running the set of commands
gcc -Wall -fPIC -c cuhre.c
gcc -shared -o libcuhre.so cuhre.o
I am able to create the shared library libcuhre.so. So far so good. It should be noted that up to this point the routine works just as I want, i.e. building the cuhre executable from cuhre.c and running it performs correctly.
I am trying to use the tryCuhre routine in a cython file now (execute.pyx). At the top, I have the declarations:
cdef extern from "math.h":
double sin(double x)
double cos(double x)
double sqrt(double x)
double atan(double x)
double exp(double x)
double log(double x)
cdef extern from "cuhre.h":
ctypedef double cubareal
ctypedef int (*integrand_t)(const int *ndim, const cubareal x[], const int *ncomp, cubareal f[], void *userdata)
double tryCuhre(integrand_t t, void * ud)
Finally, to compile, I am using the command
python setup.py build_ext --inplace
on setup.py, which is as follows:
from distutils.core import setup, Extension
from Cython.Distutils import build_ext
from Cython.Build import cythonize
from distutils.extension import Extension
sourcefiles = ['execute.pyx']
ext_modules = [Extension("execute", sourcefiles,
                         library_dirs=['~/Documents/project/libcuhre.so'],)]
setup(
    name = 'execute',
    cmdclass = {'build_ext': build_ext},
    ext_modules = ext_modules
)
which compiles the file. However, whenever I try the statement
import execute
in the python shell, it raises the error:
/Documents/project/execute.so: undefined symbol: tryCuhre
I have looked all around for ways to link my self-built libcuhre.so library, but so far nothing has worked. How can this issue be fixed? Furthermore, how come my program is able to find all of the functions from the math library (sin, cos, exp, etc.) but none from my libcuhre.so? (It should also be noted that all of these files are in the same directory, ~/Documents/project.)
Thank you so much for any help!
Dependent source code and libraries need to be included in the Extension, either as libraries to link against, or source files to compile with.
library_dirs only adds to the directories the linker searches for libraries; it does not link anything by itself, so it is not sufficient.
In this case, since the C code is self-built and a single .c file, it is easiest to compile it together as a source. This also means cuhre.c will be compiled by setuptools itself, automating its compilation wherever the package is built.
from setuptools import setup, Extension
from Cython.Distutils import build_ext
from Cython.Build import cythonize
sourcefiles = ['execute.pyx', 'cuhre.c']
ext_modules = [Extension("execute", sourcefiles,
                         include_dirs=['.'],
                         depends=['cuhre.h'],
                         )]
setup(
    name='execute',
    cmdclass={'build_ext': build_ext},
    ext_modules=ext_modules
)
The code was also changed to use setuptools; distutils is deprecated and is now provided as part of setuptools.
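For comparison, the other option mentioned above, linking against the prebuilt libcuhre.so instead of compiling cuhre.c, would look roughly like the sketch below. The shared library must then also be findable at import time (e.g. via LD_LIBRARY_PATH or an rpath), which is an extra step the compile-from-source route avoids.
# Sketch: link against the prebuilt shared library instead of compiling cuhre.c.
# Assumes libcuhre.so and cuhre.h sit next to setup.py, as described in the question.
from setuptools import setup, Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("execute", ['execute.pyx'],
                         include_dirs=['.'],          # for cuhre.h
                         libraries=['cuhre'],         # link with -lcuhre
                         library_dirs=['.'],          # where libcuhre.so lives at link time
                         runtime_library_dirs=['.'],  # rpath hint; alternatively set LD_LIBRARY_PATH
                         )]

setup(
    name='execute',
    cmdclass={'build_ext': build_ext},
    ext_modules=ext_modules
)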

Determining which gencode (compute_, arch_) values I need for nvcc - within CMake

I'm using CMake as a build system for my code, which involves CUDA. I was thinking of automating the task of deciding which compute_XX and arch_XX values I need to pass to nvcc in order to compile for the GPU(s) on my current machine.
Is there a way to do this:
With the NVIDIA GPU deployment kit?
Without the NVIDIA GPU deployment kit?
Does CMake's FindCUDA help you in determining the values for these switches?
My strategy has been to compile and run a bash script that probes the card and returns the gencode for CMake. Inspiration came from the University of Chicago's SLURM. To handle errors, multiple GPUs, or other circumstances, modify as necessary.
In your project folder create a file cudaComputeVersion.bash and ensure it is executable from the shell. Into this file put:
#!/bin/bash
# create a 'here document' that is code we compile and use to probe the card
cat << EOF > /tmp/cudaComputeVersion.cu
#include <stdio.h>
int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop,0);
    int v = prop.major * 10 + prop.minor;
    printf("-gencode arch=compute_%d,code=sm_%d\n",v,v);
}
EOF
# probe the card and cleanup
/usr/local/cuda/bin/nvcc /tmp/cudaComputeVersion.cu -o /tmp/cudaComputeVersion
/tmp/cudaComputeVersion
rm /tmp/cudaComputeVersion.cu
rm /tmp/cudaComputeVersion
And in your CMakeLists.txt put:
# at cmake-build-time, probe the card and set a cmake variable
execute_process(COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/cudaComputeVersion.bash OUTPUT_VARIABLE GENCODE)
# at project-compile-time, include the gencode into the compile options
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS}; "${GENCODE}")
# this makes CMake all chatty and allows you to see that GENCODE was set correctly
set(CMAKE_VERBOSE_MAKEFILE TRUE)
cheers
You can use the cuda_select_nvcc_arch_flags() macro in the FindCUDA module for this without any additional scripts when using CMake 3.7 or newer.
include(FindCUDA)
set(CUDA_ARCH_LIST Auto CACHE STRING
"List of CUDA architectures (e.g. Pascal, Volta, etc) or \
compute capability versions (6.1, 7.0, etc) to generate code for. \
Set to Auto for automatic detection (default)."
)
cuda_select_nvcc_arch_flags(CUDA_ARCH_FLAGS ${CUDA_ARCH_LIST})
list(APPEND CUDA_NVCC_FLAGS ${CUDA_ARCH_FLAGS})
The above sets CUDA_ARCH_FLAGS to -gencode arch=compute_61,code=sm_61 on my machine, for example.
The CUDA_ARCH_LIST cache variable can be configured by the user to generate code for specific compute capabilities instead of relying on automatic detection.
Note: the FindCUDA module has been deprecated since CMake 3.10. However, no equivalent alternative to the cuda_select_nvcc_arch_flags() macro appears to be provided yet in the latest CMake release (v3.14). See this relevant issue at the CMake issue tracker for further details.
A slight improvement over @orthopteroid's answer, which pretty much ensures a unique temporary file is generated and requires only one temporary file instead of two.
The following goes into scripts/get_cuda_sm.sh:
#!/bin/bash
#
# Prints the compute capability of the first CUDA device installed
# on the system, or alternatively the device whose index is the
# first command-line argument
device_index=${1:-0}
timestamp=$(date +%s.%N)
gcc_binary=$(which g++)
gcc_binary=${gcc_binary:-g++}
cuda_root=${CUDA_DIR:-/usr/local/cuda}
CUDA_INCLUDE_DIRS=${CUDA_INCLUDE_DIRS:-${cuda_root}/include}
CUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY:-${cuda_root}/lib64/libcudart.so}
generated_binary="/tmp/cuda-compute-version-helper-$$-$timestamp"
# create a 'here document' that is code we compile and use to probe the card
source_code="$(cat << EOF
#include <stdio.h>
#include <cuda_runtime_api.h>
int main()
{
    cudaDeviceProp prop;
    cudaError_t status;
    int device_count;
    status = cudaGetDeviceCount(&device_count);
    if (status != cudaSuccess) {
        fprintf(stderr,"cudaGetDeviceCount() failed: %s\n", cudaGetErrorString(status));
        return -1;
    }
    if (${device_index} >= device_count) {
        fprintf(stderr, "Specified device index %d exceeds the maximum (the device count on this system is %d)\n", ${device_index}, device_count);
        return -1;
    }
    status = cudaGetDeviceProperties(&prop, ${device_index});
    if (status != cudaSuccess) {
        fprintf(stderr,"cudaGetDeviceProperties() for device ${device_index} failed: %s\n", cudaGetErrorString(status));
        return -1;
    }
    int v = prop.major * 10 + prop.minor;
    printf("%d\\n", v);
}
EOF
)"
echo "$source_code" | $gcc_binary -x c++ -I"$CUDA_INCLUDE_DIRS" -o "$generated_binary" - -x none "$CUDA_CUDART_LIBRARY"
# probe the card and cleanup
$generated_binary
rm $generated_binary
and the following goes into CMakeLists.txt or a CMake module:
if (NOT CUDA_TARGET_COMPUTE_CAPABILITY)
    if("$ENV{CUDA_SM}" STREQUAL "")
        set(ENV{CUDA_INCLUDE_DIRS} "${CUDA_INCLUDE_DIRS}")
        set(ENV{CUDA_CUDART_LIBRARY} "${CUDA_CUDART_LIBRARY}")
        set(ENV{CMAKE_CXX_COMPILER} "${CMAKE_CXX_COMPILER}")
        execute_process(COMMAND
            bash -c "${CMAKE_CURRENT_SOURCE_DIR}/scripts/get_cuda_sm.sh"
            OUTPUT_VARIABLE CUDA_TARGET_COMPUTE_CAPABILITY_)
    else()
        set(CUDA_TARGET_COMPUTE_CAPABILITY_ $ENV{CUDA_SM})
    endif()
    set(CUDA_TARGET_COMPUTE_CAPABILITY "${CUDA_TARGET_COMPUTE_CAPABILITY_}"
        CACHE STRING "CUDA compute capability of the (first) CUDA device on \
the system, in XY format (like the X.Y format but no dot); see table \
of features and capabilities by capability X.Y value at \
https://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications")
    execute_process(COMMAND
        bash -c "echo -n $(echo ${CUDA_TARGET_COMPUTE_CAPABILITY})"
        OUTPUT_VARIABLE CUDA_TARGET_COMPUTE_CAPABILITY)
    execute_process(COMMAND
        bash -c "echo ${CUDA_TARGET_COMPUTE_CAPABILITY} | sed 's/^\\([0-9]\\)\\([0-9]\\)/\\1.\\2/;' | xargs echo -n"
        OUTPUT_VARIABLE FORMATTED_COMPUTE_CAPABILITY)
    message(STATUS
        "CUDA device-side code will assume compute capability \
${FORMATTED_COMPUTE_CAPABILITY}")
endif()
set(CUDA_GENCODE
    "arch=compute_${CUDA_TARGET_COMPUTE_CAPABILITY}, code=compute_${CUDA_TARGET_COMPUTE_CAPABILITY}")
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -gencode ${CUDA_GENCODE} )

What's wrong with my Cython?

I tried to write a Cython wrapper around the C++ library http://primesieve.org/
It wraps a single function, count. So far it installs correctly with python setup.py install, but when I import primesieve, the function primesieve.count is missing. Any ideas?
primesieve.pxd (following http://docs.cython.org/src/tutorial/clibraries.html)
cdef extern from "stdint.h":
ctypedef unsigned long long uint64_t
cdef extern from "primesieve/include/primesieve.h":
uint64_t primesieve_count_primes(uint64_t start, uint64_t stop)
primesieve.pyx
cimport primesieve

cpdef int count(self, int n):
    return primesieve.primesieve_count_primes(1, n)
setup.py
from setuptools import setup, Extension
from Cython.Build import cythonize
setup(
    ext_modules = cythonize([Extension("*", ["primesieve.pyx"],
                                       include_dirs = ["primesieve/include"])])
)
Modify setup.py to link against libprimesieve.so by adding libraries = ["primesieve"] to your arguments to the Extension constructor. Without it, you'll get this error:
ImportError: ./primesieve.so: undefined symbol: primesieve_count_primes
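With that change, the Extension call would look roughly like this (a sketch; the rest of the original setup.py is unchanged):
# Sketch: add libraries=["primesieve"] so the extension is linked against libprimesieve.so.
from setuptools import setup, Extension
from Cython.Build import cythonize

setup(
    ext_modules = cythonize([Extension("*", ["primesieve.pyx"],
                                       include_dirs = ["primesieve/include"],
                                       libraries = ["primesieve"])])
)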
Once I changed setup.py, it worked for me:
$ python2 setup.py build
...
$ (cd build/lib.linux-x86_64-2.7 && python2 -c 'import primesieve; print primesieve.count(None, 5)')
3