Converting Octave to Use CuBLAS - cuda

I'd like to convert Octave to use CuBLAS for matrix multiplication. This video seems to indicate this is as simple as typing 28 characters:
Using CUDA Library to Accelerate Applications
In practice it's a bit more complex than this. Does anyone know what additional work must be done to make the modifications made in this video compile?
UPDATE
Here's the method I'm trying
in dMatrix.cc add
#include <cublas.h>
in dMatrix.cc change all occurences of (preserving case)
dgemm
to
cublas_dgemm
in my build terminal set
export CC=nvcc
export CFLAGS="-lcublas -lcudart"
export CPPFLAGS="-I/usr/local/cuda/include"
export LDFLAGS="-L/usr/local/cuda/lib64"
the error I receive is:
libtool: link: g++ -I/usr/include/freetype2 -Wall -W -Wshadow -Wold-style-cast
-Wformat -Wpointer-arith -Wwrite-strings -Wcast-align -Wcast-qual -g -O2
-o .libs/octave octave-main.o -L/usr/local/cuda/lib64
../libgui/.libs/liboctgui.so ../libinterp/.libs/liboctinterp.so
../liboctave/.libs/liboctave.so -lutil -lm -lpthread -Wl,-rpath
-Wl,/usr/local/lib/octave/3.7.5
../liboctave/.libs/liboctave.so: undefined reference to `cublas_dgemm_'

EDIT2:
The method described in this video requires the use of the fortran "thunking library" bindings for cublas.
These steps worked for me:
Download octave 3.6.3 from here:
wget ftp://ftp.gnu.org/gnu/octave/octave-3.6.3.tar.gz
extract all files from the archive:
tar -xzvf octave-3.6.3.tar.gz
change into the octave directory just created:
cd octave-3.6.3
make a directory for your "thunking cublas library"
mkdir mycublas
change into that directory
cd mycublas
build the "thunking cublas library"
g++ -c -fPIC -I/usr/local/cuda/include -I/usr/local/cuda/src -DCUBLAS_GFORTRAN -o fortran_thunking.o /usr/local/cuda/src/fortran_thunking.c
ar rvs libmycublas.a fortran_thunking.o
switch back to the main build directory
cd ..
run octave's configure with additional options:
./configure --disable-docs LDFLAGS="-L/usr/local/cuda/lib64 -lcublas -lcudart -L/home/user2/octave/octave-3.6.3/mycublas -lmycublas"
Note that in the above command line, you will need to change the directory for the second -L switch to that which matches the path to your mycublas directory that you created in step 4
Now edit octave-3.6.3/liboctave/dMatrix.cc according to the instructions given in the video. It should be sufficient to replace every instance of dgemm with cublas_dgemm and every instance of DGEMM with CUBLAS_DGEMM. In the octave 3.6.3 version I used, there were 3 such instances of each (lower case and upper case).
Now you can build octave:
make
(make sure you are in the octave-3.6.3 directory)
At this point, for me, Octave built successfully. I did not pursue make install although I assume that would work. I simply ran octave using the ./run-octave script in the octave-3.6.3 directory.
The above steps assume a proper and standard CUDA 5.0 install. I will try to respond to CUDA-specific questions or issues, but there are any number of problems that may arise with a general Octave install on your platform. I'm not an octave expert and I won't be able to respond to those. I used CentOS 6.2 for this test.
This method, as indicated, involves modification of the C source files of octave.
Another method was covered in some detail in the S3527 session at the GTC 2013 GPU Tech Conference. This session was actually a hands-on laboratory exercise. Unfortunately the materials on that are not conveniently available. However the method there did not involve any modification of GNU Octave source, but instead uses the LD_PRELOAD capability of Linux to intercept the BLAS library calls and re-direct (the appropriate ones) to the cublas library.
A newer, better method (using the NVBLAS intercept library) is discussed in this blog article

I was able to produce a compiled executable using the information supplied. It's a horrible hack, but it works.
The process looks like this:
First produce an object file for fortran_thunking.c
sudo /usr/local/cuda-5.0/bin/nvcc -O3 -c -DCUBLAS_GFORTRAN fortran_thunking.c
Then move that object file to the src subdirectory in octave
cp /usr/local/cuda-5.0/src/fortran_thunking.o ./octave/src
run make. The compile will fail on the last step. Change to the src directory.
cd src
Then execute the failing final line with the addition of ./fortran_thunking.o -lcudart -lcublas just after octave-main.o. This produces the following command
g++ -I/usr/include/freetype2 -Wall -W -Wshadow -Wold-style-cast -Wformat
-Wpointer-arith -Wwrite-strings -Wcast-align -Wcast-qual
-I/usr/local/cuda/include -o .libs/octave octave-main.o
./fortran_thunking.o -lcudart -lcublas -L/usr/local/cuda/lib64
../libgui/.libs/liboctgui.so ../libinterp/.libs/liboctinterp.so
../liboctave/.libs/liboctave.so -lutil -lm -lpthread -Wl,-rpath
-Wl,/usr/local/lib/octave/3.7.5
An octave binary will be created in the src/.libs directory. This is your octave executable.

In a most recent version of CUDA you don't have to recompile anything. At least as I found in Debian. First, create a config file for NVBLAS (a cuBLAS wrapper). It won't work without it, at all.
tee nvblas.conf <<EOF
NVBLAS_CPU_BLAS_LIB $(dpkg -L libopenblas-base | grep libblas)
NVBLAS_GPU_LIST ALL
EOF
Then use Octave as you would usually do running it with:
LD_PRELOAD=libnvblas.so octave
NVBLAS will do what it can on a GPU while relaying everything else to OpenBLAS.
Further reading:
Benchmark for Octave.
Relevant slides for NVBLAS presentation.
Manual for nvblas.conf
Worth noting that you may not enjoy all the benefits of GPU computing depending on used CPU/GPU: OpenBLAS is quite fast with current multi-core processors. So fast that time spend copying data to GPU, working on it, and copying back could come close to time needed to do the job right on CPU. Check for yourself. Though GPUs are usually more energy efficient.

Related

Set default host compiler for nvcc

I have just installed Debian Stretch (9) and Cuda 8 on a new GPU server. Stretch does not come with older versions of gcc, so I need to use clang as the host compiler (nvcc does not support gcc-6). I can do this invoking nvcc as:
nvcc -ccbin clang-3.8
Is there any way to acheive this system wide - e.g. in cuda config or an environment variable?
Documentation of nvcc does not list any env variable to change ccbin, only the option:
http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html
--compiler-bindir directory, -ccbin Specify the directory in which the compiler executable resides. The host compiler executable name can be also specified to ensure that the correct host compiler is selected.
Linux guide have no such info too: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
You may try creating some nvcc wrapper script and putting it earlier in the PATH env var like:
mkdir ~/supernvcc
echo '#!/bin/sh' > ~/supernvcc/nvcc
echo `which nvcc` -ccbin clang-3.8 '$#' >> ~/supernvcc/nvcc
chmod +x ~/supernvcc/nvcc
export PATH=/home/`id -un`/supernvcc:$PATH
(repeat last line with export in every new shell before using nvcc or add it to your .bashrc or other shell init script)
PS: and nvcc is bash script too, you can just copy it and edit:
cat `which nvcc`
UPDATE: People recommend to link correct gcc version to the internal dir /usr/local/cuda/bin/ of cuda:
sudo ln -s /usr/bin/gcc-4.4 /usr/local/cuda/bin/gcc
you can use NVCC_PREPEND_FLAGS and NVCC_APPEND_FLAGS as described in the official docs to inject -ccbin into all calls of nvcc.
For example, I have the following in my ~/.bash_profile:
export NVCC_PREPEND_FLAGS='-ccbin /home/linuxbrew/.linuxbrew/bin/g++-11'

WordNet: file was built for i386 which is not the architecture being linked (x86_64)

I get the following error when running make trying to compile WordNet 3.0:
gcc -m64 -g -O2 -o wishwn wishwn-tkAppInit.o wishwn-stubs.o -L../lib -lWN - F/Library/Frameworks -framework Tk -F/Library/Frameworks -framework Tcl -lpthread -framework CoreFoundation -framework Cocoa -framework Carbon -framework IOKit -lz -lpthread -framework CoreFoundation
ld: warning: ignoring file /Library/Frameworks/Tk.framework/Tk, file was built for i386 which is not the architecture being linked (x86_64): /Library/Frameworks/Tk.framework/Tk
Undefined symbols for architecture x86_64:
"_Tk_Init", referenced from:
_Tcl_AppInit in wishwn-tkAppInit.o
"_Tk_MainEx", referenced from:
_main in wishwn-tkAppInit.o
"_Tk_SafeInit", referenced from:
_Tcl_AppInit in wishwn-tkAppInit.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [wishwn] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
I'm running Tcl 8.6 installed via ActiveTcl.
Any ideas?
Each executable must (under all normal circumstances) be built for a single architecture, e.g., for just i386 (32-bit Intel-ish) or x86_64 (64-bit Intel-ish), as while it is possible to have a single CPU support multiple architectures, switching between them is formally a context switch and so is only done between the OS kernel and your code. (Some platforms — notably OSX — have supported multi-architecture binaries — “fat” binaries — in the past, but that's very much like having a single meta-architecture, and you're not supposed to do that now.) This in turn means that the libraries that are linked in, whether shared libraries or not, must also match the architecture you're building for.
Unfortunately, the libraries that you're trying to actually use are built for a different architectures: the ActiveState build you're trying to link against looks like it is for i386, yet you're trying to make a 64-bit version. That won't work.
The simplest way is probably to get the sources for Tcl and Tk and build them in the configuration you want; that's what I do on OSX, and it's easy. Grab the sources from the official release location on SourceForge. To build Tcl for OSX, run:
cd tcl8.6.1/unix
./configure --prefix=/where/you/will/install --enable-64bit
make
make test
sudo make install
You might pass in --enable-framework to configure (but I don't), you might omit the make test and you might not need to use sudo with make install (it depends on where you're installing to).
For Tk, you do:
cd tk8.6.1/unix
./configure --prefix=/where/you/will/install --enable-64bit --enable-aqua
make
sudo make install
You might need to pass in --with-tcl=/location/of/tclConfig.sh (where that's the tclConfig.sh that you just installed above) but the build scripts find it automatically if you've put the Tk sources next to the Tcl sources. There's a test suite for Tk, but it's much more intrusive than the Tcl one (since it has to pop up windows, steal the focus, grab the mouse pointer, that sort of thing; it's a GUI toolkit, so you should expect that sort of thing) which means I can totally understand you not running the unit tests.
Make sure you have i386 and x86_64 listed in your Architectures in Build settings for your lib. Also set Build Active Architecture Only explicitly to No.

nvcc compiled object on windows not a valid file format

Since I did not have access to a nVIDIA card, I was using GPUOcelot to compile and run my programs. Since I had separated out my cuda kernel and the C++ programs in two separate files (since I was using C++11 features) I was doing the following to run my program.
nvcc -c my_kernel.cu -arch=sm_20
g++ -std=c++0x -c my_main.cpp
g++ my_kernel.o my_main.o -o latest_output.o 'OcelotConfig -l'
I have recently been given access to a Windows box which has a nVIDIA card. I downloaded the CUDA toolkit for windows and mingw g++. Now I run
nvcc -c my_kernel.cu -arch=sm_20
g++ -std=c++0x -c my_main.cpp
The nvcc call now instead of producing my_kernel.o produces my_kernel.obj. And when I try to link them and run using g++ as I did before
g++ my_kernel.obj my_main.o -o m
I get the following error:
my_kernel.obj: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 status
Could you please resolve the problem? Thanks.
nvcc is a compiler wrapper that invokes the device compiler and the host compiler under the hood (it can also invoke the host linker, but you're using -c so not doing linking). On Windows, the supported host compiler is cl.exe from Visual Studio.
Linking two object files created with two different C++ compilers is typically not possible, even if you are just using CPU only. This is because the ABI is different. The error message you are seeing is simply telling you that the object file format from cl.exe (via nvcc) is incompatible with g++.
You need to compile my_main.cpp with cl.exe, if that's producing errors then that's a different question!

C compiler not finding header files

I'm trying to compile with this command line (in OSX):
gcc -Wall upload-comments.c -l -o upload-comments -I/usr/local/mysql/include/ -L/usr/local/mysql/lib/ -lmysqlclient
It comes back with the error:
upload-comments.c:14:20: error: mysql.h: No such file or directory
mysql.h is clearly in /usr/local/mysql/include/
Why is it not finding the header file?
Thanks.
You need to put the -I... bit before the source files that need it.
gcc -Wall
-I/usr/local/mysql/include/
upload-comments.c
-l
-o upload-comments
-L/usr/local/mysql/lib/
-lmysqlclient
The gcc command is position-sensitive, both for that and the libraries (you should list libraries after the files that need them, for example).
I'm also not sure why you have a naked library name specifier (-l without an actual library name). That seems like it may cause problems to me but it's irrelevant to the specific issue raised.
You may want to remove it if, after you fix your command according to this answer, you still have problems.

apxs on MacOSX having trouble with architecture

I'm trying to set up a development environment on my aging Macbook Pro that matches my Linux EC2 production environment. I'm on the home stretch now, need only to get the mod_auth_mysql plugin for apache working. After a few hours of googling and patching and scratching my head, I think I'm almost there, but I've hit something that nothing I've found online has been able to solve.
nathan#ichigo:/usr/local/mod_auth_mysql-2.9.0$ sudo apxs -c -L/usr/local/mysql/lib -I/usr/local/mysql/include/ -lmysqlclient -lm -lz mod_auth_mysql.c
/usr/share/apr-1/build-1/libtool --silent --mode=compile gcc -DDARWIN -DSIGPROCMASK_SETS_THREAD_MASK -I/usr/local/include -I/usr/include/apache2 -I/usr/include/apr-1 -I/usr/include/apr-1 -I/usr/local/mysql/include/ -c -o mod_auth_mysql.lo mod_auth_mysql.c && touch mod_auth_mysql.slo
/usr/share/apr-1/build-1/libtool --silent --mode=link gcc -o mod_auth_mysql.la -L/usr/local/mysql/lib -lmysqlclient -lm -lz -rpath /usr/libexec/apache2 -module -avoid-version mod_auth_mysql.lo
ld: warning: in /usr/local/mysql/lib/libmysqlclient.dylib, file is not of required architecture
ld: warning: in /usr/local/mysql/lib/libz.a, file is not of required architecture
warning: no debug symbols in executable (-arch x86_64)
I think this is complaining because it's trying to build for 64-bit, but I'm on a 32-bit platform? I'm not entirely sure. I've tried forcing a 32-bit build with env ARCHFLAGS and -D arch on apxs, to no avail.
FWIW, I also tried mod_auth_mysql-3.0.0, and hit more or less the same result.
Alternatively, is there a more modern way to auth against mysql in Apache? I didn't find anything else, but this module hasn't gotten any love in a good 5 years, and I had to apply some patches I found scattered about the 'net to even get this far.
First, check what architectures your libraries support with file /usr/local/mysql/lib/libmysqlclient.dylib, etc. Once you know that, I think you can control what apxs builds for by adding flags like -Wc,"-arch i386" -Wl,"-arch i386"