How to cross compile Cython with zig cc

Is there a way to cross-compile a Cython project with zig cc?
According to this blog, zig can cross-compile.
An example that cross-compiles a Cython hello world would be great.

You'd need the python headers for the target.
Once you have them, you should be able to run:
# Compile .pyx to .c
cython helloworld.pyx
# Use zig to compile+link
zig build-lib -dynamic -target x86_64-windows \
-I mingw-w64-x86_64-python/mingw64/include/python3.10/ \
-lc \
mingw-w64-x86_64-python/mingw64/bin/libpython3.10.dll \
helloworld.c
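For completeness, here is a minimal helloworld.pyx to feed into the steps above (the file name and function are just placeholders; plain-Python syntax is valid Cython):

```shell
# Create a minimal Cython source file (name and contents are illustrative)
cat > helloworld.pyx <<'EOF'
def say_hello():
    print("Hello from Cython!")
EOF
# Then run the two steps from the answer:
#   cython helloworld.pyx
#   zig build-lib -dynamic -target x86_64-windows ... helloworld.c
```

The resulting DLL should be importable as `import helloworld` from the target's Python, provided the headers and import library match the target interpreter version.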

Related

How to see the assembly of libc functions in an ELF

How can I see the assembly of standard C library functions in an ELF? For example, I have a binary whose source code I have, and I know that printf is called in the main function. I want to see the assembly of the printf function in this ELF. Please notice that I want to see the assembly in the ELF itself.
I searched a lot but didn't find anything.
You can compile with
~$ gcc -static prog.c
where prog.c uses the functions you want the assembly of.
That will statically link the libraries used into the binary.
Then you can just:
~$ objdump --disassemble a.out
EDIT
You can take an even simpler route:
just objdump the libc library directly:
~$ objdump --disassemble /usr/lib/libc.so.6 // or whatever the path of libc is

CUDA *.cpp files

Is there a flag I can pass nvcc to treat a .cpp file like it would a .cu file? I would rather not have to do a cp x.cpp x.cu; nvcc x.cu; rm x.cu.
I ask because I have cpp files in my library that I would like to compile with/without CUDA based on particular flags passed to the Makefile.
Yes. Referring to the nvcc documentation, the flag is -x:
nvcc -x cu test.cpp
will compile test.cpp as if it were a test.cu file (i.e., pass it through the CUDA toolchain).
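A sketch of the flag-switching the question describes, in shell form (USE_CUDA is a made-up variable; the nvcc and g++ invocations are assumptions to adapt to your build):

```shell
# Pick a compiler command per build flag; .cpp files go through the CUDA
# toolchain only when USE_CUDA=1 (variable name is illustrative)
USE_CUDA=${USE_CUDA:-0}
if [ "$USE_CUDA" = 1 ]; then
    COMPILE="nvcc -x cu"    # treat the .cpp as a .cu
else
    COMPILE="g++"
fi
echo "would run: $COMPILE -c test.ccp"
```

In a Makefile the same idea can be written as `CXX := $(if $(USE_CUDA),nvcc -x cu,g++)`.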

Linker not working in Xcode - CUDA error [duplicate]

I'm currently trying to build a CUDA project with CMake on MacOS 10.9. My C and C++ compilers are gcc, but it seems that since Mavericks, gcc and g++ link to clang, which is not supported by CUDA.
Has anyone found a good solution to use the real gcc, or to make clang work without "dumpspecs"?
The issue with 10.9 is that gcc is actually clang. Please try the latest CUDA toolkit and explicitly point nvcc to /usr/bin/clang (nvcc -ccbin /usr/bin/clang). This way nvcc will know it's dealing with clang.
This is an extension of the answer provided by Eugene:
The CUDA toolkit download for Mac OSX 10.9 has been posted to the CUDA download page
It supports XCode 5 on 10.9, and will automatically use clang instead of gcc, FYI.
If you are using XCode 5 on 10.8, please see the release notes:
On Mac OS X 10.8, if you install XCode 5, including the command-line tools, the gcc compiler will get replaced with clang. You can continue to successfully compile with nvcc from the command-line by using the --ccbin /usr/bin/clang option, which instructs nvcc to use the clang compiler instead of gcc to compile any host code passed to it. However, this solution will not work when building with NSight Eclipse Edition. An alternative solution that will work from the command-line and with NSight Eclipse Edition is to download an older version of the command-line tools package from the Apple Developer website after installing XCode 5, which will re-install gcc to /usr/bin. In order to do this, go to
http://developer.apple.com/downloads/index.action
sign in with your Apple ID, and search for command-line tools using the search pane on the left side of the screen.
The problem actually existed before OS X Mavericks; I had the same problem with Mountain Lion, and the answer from Eugene is 100% correct.
However, it seems that my GeForce card has not been recognized as a CUDA-capable device since the upgrade to Mavericks, and people using software that uses CUDA seem to be having the same problems.
So it's better not to update right now.
I have just downloaded CUDA 5.5, installed under Mac OSX 10.8.5, with Xcode 5.0.2, and updated command line tools in Xcode.
But I couldn't get the CUDA sample "nbody" to compile.
I got all kinds of funny error messages, like clang error: unsupported option '-dumpspecs'
I thought I had solved that one by the help of some other web pages, but then other problems kept creeping up (e.g., GLEW not found, CUDA samples path undefined, ...).
(And the provided makefiles and cmake files seemed just too contrived, so that I couldn't find the bug.)
So I rolled my own makefile. I'm posting it here, in the hope that it might help others save some hours.
#!/usr/bin/make -R
# Simple Makefile for a CUDA sample program
# (because the provided ones don't work! :-( )
#
# We assume that all files in this directory produce one executable.
# Set your GPU version in variable NVCC below
#
# Developed and tested under Mac OS X 10.8.5;
# under Linux, you probably need to adjust a few paths, compiler options, and #ifdef's.
#
# Prerequisites:
# - CUDA 5.5
# - The "Command Line Tools" installed via XCode 5
# - DYLD_FALLBACK_LIBRARY_PATH or DYLD_LIBRARY_PATH must include
# /Developer/NVIDIA/CUDA-5.5/lib:/Developer/NVIDIA/CUDA-5.5/samples/common/lib/darwin
#
# GZ Dec 2013
# -------- variables and settings ---------
#
CUDA := /Developer/NVIDIA/CUDA-5.5
NVCC := nvcc -ccbin /usr/bin/clang -arch=sm_30 --compiler-options -Wall,-ansi,-Wno-extra-tokens
# -ccbin /usr/bin/clang is needed with XCode 5 under OSX 10.8
# -arch=sm_30 is needed for my laptop (it does not provide sm_35)
INCFLAGS := -I $(CUDA)/samples/common/inc
TARGET := nbody
OBJDIR := obj
MAKEFLAGS := kR
.SUFFIXES: .cu .cpp .h
ALLSOURCES := $(wildcard *.cu *.cpp)
ALLFILES := $(basename $(ALLSOURCES))
ALLOBJS := $(addsuffix .o,$(addprefix $(OBJDIR)/,$(ALLFILES)))
DEPDIR := depend
# --------- automatic targets --------------
.PHONY: all
all: $(OBJDIR) $(DEPDIR) $(TARGET)
	@true
$(OBJDIR):
	mkdir $@
# --------- generic rules --------------
UNAME = $(shell uname)
ifeq ($(UNAME), Darwin) # Mac OS X
# OpenGL and GLUT are frameworks
LDFLAGS = -Xlinker -framework,OpenGL,-framework,GLUT,-L,$(CUDA)/samples/common/lib/darwin,-lGLEW
endif
$(TARGET): $(ALLOBJS)
	$(NVCC) $^ $(LDFLAGS) -o $@
$(OBJDIR)/%.o: %.cu
	$(NVCC) -c $(INCFLAGS) $< -o $@
$(DEPDIR)/%.d : %.cpp $(DEPDIR)
	@echo creating dependencies for $< $@
$(DEPDIR):
	mkdir $@
# ------ administrative stuff -------
.PHONY: clean
clean:
	rm -f *.o $(TARGET)
	rm -rf $(DEPDIR) $(OBJDIR)
echo:
	@echo $(ALLSOURCES)
	@echo $(ALLFILES)
	@echo $(ALLOBJS)
Clang became the default compiler with the OS X Mavericks release; the gcc and g++ commands are redirected to the Clang compiler. You can install GCC via Homebrew and link the gcc and g++ commands back to the GCC compiler by following the steps below.
$ brew install gcc48
[...]
$ sudo mkdir /usr/bin/backup && sudo mv /usr/bin/gcc /usr/bin/g++ /usr/bin/backup
$ sudo ln -s /usr/local/bin/gcc-4.8 /usr/bin/gcc
$ sudo ln -s /usr/local/bin/g++-4.8 /usr/bin/g++
I have found the solution on this article:
http://gvsmirnov.ru/blog/tech/2014/02/07/building-openjdk-8-on-osx-maverick.html#all-you-had-to-do-was-follow-the-damn-train-cj
Here's a Homebrew recipe that works for me on Lion.
$ brew cat memtestG80
require "formula"
# Documentation: https://github.com/Homebrew/homebrew/wiki/Formula-Cookbook
# /usr/local/Library/Contributions/example-formula.rb
# PLEASE REMOVE ALL GENERATED COMMENTS BEFORE SUBMITTING YOUR PULL REQUEST!
# Requires at compile time nvcc from https://developer.nvidia.com/cuda-downloads
# Requires popt at compile time
class Memtestg80 < Formula
homepage ""
url "https://github.com/ihaque/memtestG80/archive/master.zip"
sha1 ""
version "c4336a69fff07945c322d6c7fc40b0b12341cc4c"
# depends_on "cmake" => :build
depends_on :x11 # if your formula requires any X11/XQuartz components
def install
# ENV.deparallelize # if your formula fails when building in parallel
system "make", "-f", "Makefiles/Makefile.osx", "NVCC=/usr/local/cuda/bin/nvcc -ccbin /usr/bin/clang", "POPT_DIR=/usr/local/Cellar/popt/1.16/lib"
system "cp", "-a", "memtestG80", "/usr/local/bin"
end
test do
# `test do` will create, run in and delete a temporary directory.
#
# This test will fail and we won't accept that! It's enough to just replace
# "false" with the main program this formula installs, but it'd be nice if you
# were more thorough. Run the test with `brew test memtestG80`.
#
# The installed folder is not in the path, so use the entire path to any
# executables being tested: `system "#{bin}/program", "do", "something"`.
system "false"
end
end

Converting Octave to Use CuBLAS

I'd like to convert Octave to use CuBLAS for matrix multiplication. This video seems to indicate this is as simple as typing 28 characters:
Using CUDA Library to Accelerate Applications
In practice it's a bit more complex than this. Does anyone know what additional work must be done to make the modifications made in this video compile?
UPDATE
Here's the method I'm trying
in dMatrix.cc add
#include <cublas.h>
in dMatrix.cc change all occurrences of (preserving case)
dgemm
to
cublas_dgemm
in my build terminal set
export CC=nvcc
export CFLAGS="-lcublas -lcudart"
export CPPFLAGS="-I/usr/local/cuda/include"
export LDFLAGS="-L/usr/local/cuda/lib64"
the error I receive is:
libtool: link: g++ -I/usr/include/freetype2 -Wall -W -Wshadow -Wold-style-cast
-Wformat -Wpointer-arith -Wwrite-strings -Wcast-align -Wcast-qual -g -O2
-o .libs/octave octave-main.o -L/usr/local/cuda/lib64
../libgui/.libs/liboctgui.so ../libinterp/.libs/liboctinterp.so
../liboctave/.libs/liboctave.so -lutil -lm -lpthread -Wl,-rpath
-Wl,/usr/local/lib/octave/3.7.5
../liboctave/.libs/liboctave.so: undefined reference to `cublas_dgemm_'
EDIT2:
The method described in this video requires the use of the fortran "thunking library" bindings for cublas.
These steps worked for me:
Download octave 3.6.3 from here:
wget ftp://ftp.gnu.org/gnu/octave/octave-3.6.3.tar.gz
extract all files from the archive:
tar -xzvf octave-3.6.3.tar.gz
change into the octave directory just created:
cd octave-3.6.3
make a directory for your "thunking cublas library"
mkdir mycublas
change into that directory
cd mycublas
build the "thunking cublas library"
g++ -c -fPIC -I/usr/local/cuda/include -I/usr/local/cuda/src -DCUBLAS_GFORTRAN -o fortran_thunking.o /usr/local/cuda/src/fortran_thunking.c
ar rvs libmycublas.a fortran_thunking.o
switch back to the main build directory
cd ..
run octave's configure with additional options:
./configure --disable-docs LDFLAGS="-L/usr/local/cuda/lib64 -lcublas -lcudart -L/home/user2/octave/octave-3.6.3/mycublas -lmycublas"
Note that in the above command line, you will need to change the directory for the second -L switch to that which matches the path to your mycublas directory that you created in step 4
Now edit octave-3.6.3/liboctave/dMatrix.cc according to the instructions given in the video. It should be sufficient to replace every instance of dgemm with cublas_dgemm and every instance of DGEMM with CUBLAS_DGEMM. In the octave 3.6.3 version I used, there were 3 such instances of each (lower case and upper case).
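The edit in that step can be scripted with sed; a sketch, run here against a stand-in file (in a real build, point it at octave-3.6.3/liboctave/dMatrix.cc):

```shell
# Stand-in line for dMatrix.cc containing the two spellings to change
printf 'F77_XFCN (dgemm, DGEMM, (...));\n' > dMatrix.sample
# Lower-case first, then upper-case; "cublas_dgemm" contains "dgemm", but the
# second substitution only matches the upper-case spelling, so the order is safe
sed -i 's/dgemm/cublas_dgemm/g; s/DGEMM/CUBLAS_DGEMM/g' dMatrix.sample
cat dMatrix.sample   # -> F77_XFCN (cublas_dgemm, CUBLAS_DGEMM, (...));
```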
Now you can build octave:
make
(make sure you are in the octave-3.6.3 directory)
At this point, for me, Octave built successfully. I did not pursue make install although I assume that would work. I simply ran octave using the ./run-octave script in the octave-3.6.3 directory.
The above steps assume a proper and standard CUDA 5.0 install. I will try to respond to CUDA-specific questions or issues, but there are any number of problems that may arise with a general Octave install on your platform. I'm not an octave expert and I won't be able to respond to those. I used CentOS 6.2 for this test.
This method, as indicated, involves modification of the C source files of octave.
Another method was covered in some detail in the S3527 session at the GTC 2013 GPU Tech Conference. This session was actually a hands-on laboratory exercise. Unfortunately the materials on that are not conveniently available. However the method there did not involve any modification of GNU Octave source, but instead uses the LD_PRELOAD capability of Linux to intercept the BLAS library calls and re-direct (the appropriate ones) to the cublas library.
A newer, better method (using the NVBLAS intercept library) is discussed in this blog article
I was able to produce a compiled executable using the information supplied. It's a horrible hack, but it works.
The process looks like this:
First produce an object file for fortran_thunking.c
sudo /usr/local/cuda-5.0/bin/nvcc -O3 -c -DCUBLAS_GFORTRAN fortran_thunking.c
Then move that object file to the src subdirectory in octave
cp /usr/local/cuda-5.0/src/fortran_thunking.o ./octave/src
run make. The compile will fail on the last step. Change to the src directory.
cd src
Then execute the failing final line with the addition of ./fortran_thunking.o -lcudart -lcublas just after octave-main.o. This produces the following command
g++ -I/usr/include/freetype2 -Wall -W -Wshadow -Wold-style-cast -Wformat
-Wpointer-arith -Wwrite-strings -Wcast-align -Wcast-qual
-I/usr/local/cuda/include -o .libs/octave octave-main.o
./fortran_thunking.o -lcudart -lcublas -L/usr/local/cuda/lib64
../libgui/.libs/liboctgui.so ../libinterp/.libs/liboctinterp.so
../liboctave/.libs/liboctave.so -lutil -lm -lpthread -Wl,-rpath
-Wl,/usr/local/lib/octave/3.7.5
An octave binary will be created in the src/.libs directory. This is your octave executable.
With a recent version of CUDA you don't have to recompile anything; at least that's what I found on Debian. First, create a config file for NVBLAS (a cuBLAS wrapper). It won't work without it at all.
tee nvblas.conf <<EOF
NVBLAS_CPU_BLAS_LIB $(dpkg -L libopenblas-base | grep libblas)
NVBLAS_GPU_LIST ALL
EOF
Then use Octave as you would usually do running it with:
LD_PRELOAD=libnvblas.so octave
NVBLAS will do what it can on a GPU while relaying everything else to OpenBLAS.
Further reading:
Benchmark for Octave.
Relevant slides for NVBLAS presentation.
Manual for nvblas.conf
Worth noting that you may not enjoy all the benefits of GPU computing, depending on the CPU/GPU used: OpenBLAS is quite fast on current multi-core processors. So fast that the time spent copying data to the GPU, working on it, and copying it back can come close to the time needed to do the job right on the CPU. Check for yourself. Though GPUs are usually more energy efficient.

CUDA bandwidthTest.cu

I want to compile and run the bandwidthTest.cu in the CUDA SDK. I face the two following errors when I compile it with:
nvcc -arch=sm_20 bandwidthTest.cu -o bTest
cutil_inline.h: no such file or directory
shrUtils.h: no such file or directory
How can I solve this problem?
Add the current directory to your include search path.
nvcc -I. -arch=sm_20 bandwidthTest.cu -o bTest
Probably the two header files you tried to #include are not available in that directory. If you use the Visual Studio IDE, you can see the red underlining.
Find the path to cutil_inline.h and the path to shrUtils.h and put them in the compilation line in the following way:
nvcc -I<path to cutil_inline.h> -I<path to shrUtils.h> -arch=sm_20 bandwidthTest.cu -o bTest
Also, consider using a makefile for the compilation in case you aren't.