CUDA5 Examples: Has anyone translated some cutil definitions to CUDA5? - cuda

Has anyone started to work with the CUDA5 SDK?
I have an old project that uses some cutil functions, but they've been abandoned in the new one.
The solution was that most functions can be translated from cutil*/cut* to a similarly named sdk* equivalent from the helper*.h headers...
As an example:
cutStartTimer becomes sdkStartTimer
Just that simple...

Has anyone started to work with the CUDA5 SDK?
Probably.
Has anyone translated some cutil definitions to CUDA5?
Maybe. But why not just use the new header files intended to replace it? Quoted from the Beta release notes:
Prior to CUDA 5.0, CUDA Sample projects referenced a utility library
with header and source files called cutil. This has been removed with
the CUDA Samples in CUDA 5.0, and replaced with header files found
in CUDA Samples\v5.0\C\common\inc
helper_cuda.h, helper_cuda_gl.h, helper_cuda_drvapi.h, helper_functions.h,
helper_image.h, helper_math.h, helper_string.h, and helper_timer.h
These files provide utility functions for CUDA device initialization,
CUDA error checking, string parsing, image file loading and saving, and
timing functions. The CUDA Samples projects no longer have references
and dependencies to cutil, and will now use these helper functions
going forward.

Related

Adding CUDA CUB library to a project in Visual Studio

I want to use the CUB library for a CUDA project; I'm using CUDA 11.2 on VS 2019. The CUB version integrated into CUDA 11.2 is 1.10, but I want to use BlockMergeSort, which is not available in that version. Therefore, I chose to use CUB 1.16.0. The problem is that I cannot add the new CUB header files to my project. I simply add .\cub-1.16.0 to Additional Include Directories in the VS project properties, then use #include <cub-1.16.0/cub/block/block_merge_sort.cuh> in my source file (the path is ./cub-1.16.0/cub-1.16.0/cub).
When I do this, several errors in including headers show up, for example:
Cannot include thread_sort.cuh on #include <cub/thread/thread_sort.cuh>, which is at the beginning of block_merge_sort.cuh.
I can tell why this gives an error, and I can fix it manually (by replacing it with #include "../thread/thread_sort.cuh"). But what I don't get is the correct way of adding this library, since no modifications should be required. Am I missing something when adding a header-only library to a VS project? And in the case of CUB, is it possible to add another version of this library when an older one already exists?

Wii Broadway disassembly with libopcodes

I want to disassemble Wii game executable binaries in C. They target the Broadway microprocessor, and unfortunately the only disassembler I am aware of that I can use is libopcodes.
Documentation about this library is scarce, and I'm using this tutorial https://blog.yossarian.net/2019/05/18/Basic-disassembly-with-libopcodes to get a basic disassembler, from which (after reading) I copy-pasted the last complete code snippet. I initially used the default binutils version of Ubuntu 20, which worked for the x86 architecture but immediately segfaulted with no output for my architecture of interest (bfd_arch_powerpc and bfd_mach_ppc_750). I then built the latest binutils version (2.39.50) from source, which now demands an fprintf_styled argument (I provided a very simple one which vprintfs to stdout). Now I am getting a floating point exception in buffer_read_memory (?) when disassembling the tutorial's architecture and a segfault when disassembling mine.
I am not familiar at all with libopcodes and am pretty much blindly following the only tutorial I could find for it on the internet. If anyone could help me create a basic PowerPC disassembler with libopcodes that disassembles a void* buffer (or at least point me to any resource), it would be greatly appreciated.
A ppc example usage of libbfd can be seen in the disasm() function of qtrace-tools/qtdis. This is used to disassemble a buffer of powerpc64 instructions.
I solved my issue. I had to install binutils-multiarch-dev to support bfd_arch_powerpc and bfd_mach_ppc_750. In my case, I also had to remove my custom installation of binutils because the custom build with no flags apparently does not support PowerPC and dis-asm.h from /usr/local/include was taking priority over the one in /usr/include.

Can I use STL in a DriverKit driver?

Can I use, for example, std::vector in a DriverKit driver on macOS / Xcode?
DriverKit has some container class like OSArray
https://developer.apple.com/documentation/driverkit/osarray?language=objc
If I create a new "DriverKit driver" project and include <vector>, I get build errors. Those errors come from including <cstring>, and the errors are:
No member named 'strcpy' in the global namespace
No member named 'strcat' in the global namespace
As far as I can tell, you aren't supposed to. The headers you end up pulling in there aren't from the DriverKit SDK, they're the default ones that come with Xcode's compiler toolchain. They assume a normal macOS build environment, not the dext environment. The DriverKit SDK doesn't include C++ stdlib headers, and only contains some stripped-down headers for a subset of the C standard library. Hence the missing strcpy and strcat.
Linking against libc++ also fails, even the version included in the DriverKit SDK. I don't know why there is a version of that library included with the DriverKit SDK, but it's clearly not intended for being used in dexts.
There's presumably nothing in particular stopping you from including some other container library, or even directly including parts of an STL implementation. You may need to manually wire up memory allocation calls though.
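To make the "wire up memory allocation yourself" idea concrete, here is a minimal sketch of a growable array whose allocator is passed in as a pair of function pointers. Everything in it (SimpleVec, AllocFn, FreeFn, default_alloc, default_free) is a hypothetical name made up for illustration, and it has only been thought through for a normal hosted build, not tried inside a dext. It assumes T is a trivially copyable type such as int or a plain struct, because no constructors or destructors are run. In a dext you would presumably replace the malloc/free defaults with whatever allocation calls the DriverKit environment offers; the free hook takes a size argument on the assumption that such an allocator may want one.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Allocation hooks (hypothetical names). The hosted defaults use malloc/free;
// a driver build would point these at its own allocator instead.
using AllocFn = void* (*)(size_t);
using FreeFn = void (*)(void*, size_t);

inline void* default_alloc(size_t n) { return malloc(n); }
inline void default_free(void* p, size_t) { free(p); }

// Minimal growable array. Assumes T is trivially copyable: elements are
// copied by plain assignment and no constructors or destructors are run.
template <typename T>
class SimpleVec {
public:
    explicit SimpleVec(AllocFn a = default_alloc, FreeFn f = default_free)
        : alloc_(a), free_(f) {}
    ~SimpleVec() {
        if (data_) free_(data_, cap_ * sizeof(T));
    }
    SimpleVec(const SimpleVec&) = delete;
    SimpleVec& operator=(const SimpleVec&) = delete;

    // Returns false instead of throwing when allocation fails.
    bool push_back(const T& v) {
        if (size_ == cap_ && !grow()) return false;
        data_[size_++] = v;
        return true;
    }
    T& operator[](size_t i) {
        assert(i < size_);
        return data_[i];
    }
    size_t size() const { return size_; }

private:
    // Doubles the capacity (starting at 4) and copies the old elements over.
    bool grow() {
        size_t new_cap = cap_ ? cap_ * 2 : 4;
        T* p = static_cast<T*>(alloc_(new_cap * sizeof(T)));
        if (!p) return false;
        for (size_t i = 0; i < size_; ++i) p[i] = data_[i];
        if (data_) free_(data_, cap_ * sizeof(T));
        data_ = p;
        cap_ = new_cap;
        return true;
    }

    AllocFn alloc_;
    FreeFn free_;
    T* data_ = nullptr;
    size_t size_ = 0;
    size_t cap_ = 0;
};
```

Usage would look like `SimpleVec<int> v; v.push_back(42);`, or `SimpleVec<int> v(my_alloc, my_free);` with your own hooks. Whether the C headers used here are available in the DriverKit SDK is something you would have to check.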

Is there a list of headers that can be used in a string to compile with NVRTC? [duplicate]

Specifically, my issue is that I have CUDA code that needs <curand_kernel.h> to run. This isn't included by default in NVRTC. Presumably then when creating the program context (i.e. the call to nvrtcCreateProgram), I have to send in the name of the file (curand_kernel.h) and also the source code of curand_kernel.h? I feel like I shouldn't have to do that.
It's hard to tell; I haven't managed to find an example from NVIDIA of someone needing standard CUDA files like this as a source, so I really don't understand what the syntax is. Some issues: curand_kernel.h also has includes... Do I have to do the same for each of these? I am not even sure the NVRTC compiler will even run correctly on curand_kernel.h, because there are some language features it doesn't support, aren't there?
Next: if you've sent in the source code of a header file to nvrtcCreateProgram, do I still have to #include it in the code to be executed / will it cause an error if I do so?
A link to example code that does this or something like it would be appreciated much more than a straightforward answer; I really haven't managed to find any.
You have to send the "filename" and the source of each header separately.
When the preprocessor does its thing, it'll use any #include filenames as a key to find the source for the header, based on the collection that you provide.
I suspect that, in this case, the compiler (driver) doesn't have file system access, so you have to give it the source in much the same way that you would for shader includes in OpenGL.
So:
1. Include your header's name when calling nvrtcCreateProgram. The compiler will, internally, generate the equivalent of a std::map<string,string> containing the source of each header indexed by the given name.
2. In your kernel source, use #include "foo.cuh" as usual.
3. The compiler will use foo.cuh as an index or key into its internal map (created when you called nvrtcCreateProgram), and will retrieve the header source from that collection.
4. Compilation proceeds as normal.
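The lookup described in these steps can be mimicked with a toy model in plain C++. To be clear, this is not NVRTC code and says nothing about how NVRTC is actually implemented; expand_includes is a hypothetical helper that only illustrates the idea of using the #include name as a key into a name-to-source map that you supplied up front.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Toy model: replaces each line of the form  #include "name"  with the text
// stored under "name" in `headers`, recursing into nested includes.
// A name that was never supplied is an error, just as a header you did not
// pass in cannot be found.
std::string expand_includes(const std::string& source,
                            const std::map<std::string, std::string>& headers,
                            int depth = 0) {
    assert(depth >= 0);  // depth is an internal recursion counter
    if (depth > 16) throw std::runtime_error("include nesting too deep");

    const std::string prefix = "#include \"";
    std::istringstream in(source);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, prefix.size(), prefix) == 0) {
            // Extract the text between the two double quotes.
            std::size_t end = line.find('"', prefix.size());
            if (end == std::string::npos)
                throw std::runtime_error("malformed include: " + line);
            std::string name = line.substr(prefix.size(), end - prefix.size());

            // The include name is the key into the supplied collection.
            auto it = headers.find(name);
            if (it == headers.end())
                throw std::runtime_error("header not provided: " + name);
            out << expand_includes(it->second, headers, depth + 1);
        } else {
            out << line << '\n';
        }
    }
    return out.str();
}
```

With `headers = {{"foo.cuh", "..."}}`, a kernel string containing `#include "foo.cuh"` gets the stored text spliced in, while an include of a name that is not in the map throws. That mirrors why every header (and every header it includes in turn) has to be supplied when you go this route.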
One of the reasons that NVRTC provides only a "subset" of features is that the compiler plays in a somewhat sandboxed environment, without necessarily having all of the supporting tools and utilities lying around that you have with offline compilation. So, you have to manually handle a lot of the stuff that the normal nvcc + (gcc | MSVC | clang) combination provides.
A possible, but non-ideal, solution would be to preprocess the file that you need in your IDE, save the result and then #include that. However, I bet there is a better way to do that. If you just want curand, consider diving into the library and extracting the part you need (blech) or using another GPU-friendly rand implementation. On older CUDA versions, I just generated a big array of random floats on the host, uploaded it to the GPU, and sampled it in the kernels.
This related link may be helpful.
You do not need to load curand_kernel.h yourself and add it to the include "aliases" mechanism.
Instead, you can simply add the CUDA include directory to your (set of) include paths, e.g. by adding --include-path=/usr/local/cuda/include to your NVRTC compiler options.
(I do this in my GPU-kernel-runner test harness, by default, to be on the safe side.)
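As a rough sketch of what that looks like in host code: the helpers below only build the option strings, so they compile without the CUDA toolkit. include_path_options and as_c_strings are hypothetical names, and the nvrtcCompileProgram call is shown only in a comment because it needs an existing nvrtcProgram and the NVRTC headers. The directory is the one from the answer; yours may differ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds one "--include-path=<dir>" option per directory.
std::vector<std::string> include_path_options(const std::vector<std::string>& dirs) {
    std::vector<std::string> opts;
    for (const auto& d : dirs) {
        assert(!d.empty());  // an empty directory would give a useless option
        opts.push_back("--include-path=" + d);
    }
    return opts;
}

// The compile call takes an array of C strings, so expose raw pointers.
// The returned pointers are only valid while `opts` is alive and unmodified.
std::vector<const char*> as_c_strings(const std::vector<std::string>& opts) {
    std::vector<const char*> raw;
    raw.reserve(opts.size());
    for (const auto& o : opts) raw.push_back(o.c_str());
    return raw;
}

// With NVRTC available, and `prog` already created by nvrtcCreateProgram:
//   auto opts = include_path_options({"/usr/local/cuda/include"});
//   auto raw  = as_c_strings(opts);
//   nvrtcCompileProgram(prog, static_cast<int>(raw.size()), raw.data());
```

The kernel source then just says #include <curand_kernel.h> and no header sources need to be passed to nvrtcCreateProgram for it.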

How can I use ord2 function

I'm using Octave right now, but earlier I used MATLAB. In MATLAB there is a function ord2, and according to this document (http://www.obihiro.ac.jp/~suzukim/masuda/octave/html3/octave_159.html/) it should also exist in Octave. Could anyone help me with this issue? I'm new to Octave. How can I import, or otherwise use, the 'ord2' function (second-order system)?
The documentation you mention is for Octave 3.0, which is a very, very old version. All of the Control functions, together with Finance and Quaternion, were removed from core Octave with release 3.2. See the news for Octave 3.2 (if you have a very recent version, this may be in one of the old NEWS files):
** The Control, Finance and Quaternion functions have been removed.
These functions are now available as separate packages from
http://octave.sourceforge.net/packages.html
and can be reinstalled using the Octave package manager (see
the pkg function).
As mentioned there, you can download the packages from the Octave Forge website, but depending on your operating system, you may be better off using your package manager.
Finally, the ord2 function does not actually exist in the control package and I'm even unsure that the version you are referring to in Octave 3.0 was even Matlab compatible. However, you can still see its source online right before it was removed.
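If all you need is the system itself rather than the function, it is easy to build by hand. The sketch below assumes the usual second-order form H(s) = 1 / (s^2 + 2*zeta*wn*s + wn^2), which is my understanding of what MATLAB's ord2 returns for a natural frequency wn and damping ratio zeta; I have not checked it against the old Octave 3.0 version. It is written in C++ only to spell out the arithmetic, and ord2_like and SecondOrderTf are hypothetical names.

```cpp
#include <array>
#include <cassert>

// Transfer function num / (den[0]*s^2 + den[1]*s + den[2]).
struct SecondOrderTf {
    double num;
    std::array<double, 3> den;
};

// Assumed form: H(s) = 1 / (s^2 + 2*zeta*wn*s + wn^2).
SecondOrderTf ord2_like(double wn, double zeta) {
    assert(wn > 0.0);  // natural frequency should be positive
    SecondOrderTf h;
    h.num = 1.0;
    h.den[0] = 1.0;
    h.den[1] = 2.0 * zeta * wn;
    h.den[2] = wn * wn;
    return h;
}
```

In Octave, with the control package installed and loaded, the same coefficients can presumably be passed straight to tf, along the lines of tf(1, [1 2*z*wn wn^2]).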