When does cublasInit() return a NOT_INITIALIZED status? - cuda

During my cuBLAS initialization, I get an error, i.e. not the expected CUBLAS_STATUS_SUCCESS.
Checking the returned status, I found that it is CUBLAS_STATUS_NOT_INITIALIZED, which is not listed among the possible return values of that function.
Does anyone have an idea what may have caused this behavior?

The CUBLAS 4.x documentation mentions CUBLAS_STATUS_NOT_INITIALIZED as error code for cublasCreate with the meaning "the CUDA Runtime initialization failed".
Can you verify that you have a valid CUDA context?
If so, did you create a valid CUBLAS context?
For CUBLAS 3.x and CUBLAS 4.x using the legacy API: did you call cublasInit while a CUDA context was active in the current thread, and did it return CUBLAS_STATUS_SUCCESS?
For CUBLAS 4.x with the new API: did you call cublasCreate, and did it return CUBLAS_STATUS_SUCCESS? Are you using the handle it created when calling the cublas..._v2 methods?
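The checklist above can be sketched in code. This is a minimal example of the new (v2) API initialization pattern: establish a CUDA context first, then create the handle and check the status explicitly before using it (it assumes a CUDA-capable machine with the cuBLAS library installed):

```cpp
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void) {
    // Make sure a CUDA context exists for the current thread first;
    // any runtime call (e.g. cudaSetDevice) establishes one lazily.
    cudaError_t cerr = cudaSetDevice(0);
    if (cerr != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice failed: %s\n", cudaGetErrorString(cerr));
        return 1;
    }

    // New (v2) API: create a handle and check the status explicitly.
    cublasHandle_t handle;
    cublasStatus_t status = cublasCreate(&handle);
    if (status != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr, "cublasCreate failed with status %d\n", (int)status);
        return 1;
    }

    // ... pass `handle` to every cublas..._v2 call ...

    cublasDestroy(handle);
    return 0;
}
```

If cublasCreate itself reports CUBLAS_STATUS_NOT_INITIALIZED here, the failure is in the CUDA runtime initialization, not in cuBLAS proper.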

Related

How to determine the state of peer access without producing a warning in cuda-memcheck

In a multi-GPU system, I use the return value of cudaError_t cudaDeviceDisablePeerAccess(int peerDevice) to determine whether peer access is disabled; in that case, the function returns cudaErrorPeerAccessNotEnabled.
This is not an error in my program, but it produces a warning in both cuda-gdb and cuda-memcheck, since an API call did not return cudaSuccess.
In the same manner cudaDeviceEnablePeerAccess returns cudaErrorPeerAccessAlreadyEnabled if access has already been enabled.
How can one find out if peer access is enabled / disabled without producing a warning?
Summarizing comments into an answer: you can't.
The runtime API has no notion of informational or warning-level status returns; everything that isn't success is treated as an error, and the toolchain utilities like cuda-memcheck cannot be instructed to ignore errors. The default behaviour is to report and continue, so it will not interfere with anything, but it will emit an error message.
If you want to avoid the errors, you will need to build a layer of your own state tracking to preempt the conditions that would otherwise be returned as errors.
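Such state tracking can be sketched as a small host-side helper. The class name and interface below are hypothetical, not part of any CUDA API; the idea is simply to remember which (device, peer) pairs the program has enabled so the enable/disable calls are never made redundantly and never return a non-cudaSuccess status for cuda-memcheck to flag:

```cpp
#include <set>
#include <utility>
#include <cuda_runtime.h>

// Hypothetical helper: track which (device, peer) pairs we have enabled
// ourselves, so we never call the enable/disable APIs redundantly.
class PeerAccessTracker {
    std::set<std::pair<int, int>> enabled_;
public:
    cudaError_t enable(int device, int peer) {
        if (enabled_.count({device, peer})) return cudaSuccess;  // already on
        cudaError_t err = cudaSetDevice(device);
        if (err != cudaSuccess) return err;
        err = cudaDeviceEnablePeerAccess(peer, 0);
        if (err == cudaSuccess) enabled_.insert({device, peer});
        return err;
    }
    cudaError_t disable(int device, int peer) {
        if (!enabled_.count({device, peer})) return cudaSuccess; // already off
        cudaError_t err = cudaSetDevice(device);
        if (err != cudaSuccess) return err;
        err = cudaDeviceDisablePeerAccess(peer);
        if (err == cudaSuccess) enabled_.erase({device, peer});
        return err;
    }
    bool isEnabled(int device, int peer) const {
        return enabled_.count({device, peer}) != 0;
    }
};
```

The tracker only reflects what this process has done; it cannot observe peer access enabled elsewhere, which is consistent with the answer's point that the API itself offers no warning-free query.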

How to find more details on CUDA_ERROR_INVALID_VALUE?

As a side question to Use Vulkan VkImage as a CUDA cuArray, how could I get more details on what's wrong on a CUDA Driver API call that returns CUDA_ERROR_INVALID_VALUE?
Specifically, the call is to cuExternalMemoryGetMappedMipmappedArray() and the documentation does not list CUDA_ERROR_INVALID_VALUE among its return values.
Any suggestions on how to go about debugging this issue?
Specifically, the call is to cuExternalMemoryGetMappedMipmappedArray() and the documentation does not list CUDA_ERROR_INVALID_VALUE among its return values.
That appears to have been a transient documentation error. The current documentation linked in the question (CUDA 11.5 at the time of writing) shows CUDA_ERROR_INVALID_VALUE as an expected return value.
As for the debugging part, the function has only two inputs: the memory object handle and the array descriptor. One of those is invalid. It should be trivial to debug once you know that this function call is returning the error, and not a prior call.
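A first step in isolating the failing call is to wrap every Driver API call so the first failure is reported by name. This is a sketch using the standard cuGetErrorName/cuGetErrorString helpers; it rules out the possibility that an earlier call, not cuExternalMemoryGetMappedMipmappedArray(), produced the CUDA_ERROR_INVALID_VALUE:

```cpp
#include <cstdio>
#include <cuda.h>

// Sketch: report the first failing Driver API call by name and description.
#define CU_CHECK(call)                                                \
    do {                                                              \
        CUresult res_ = (call);                                       \
        if (res_ != CUDA_SUCCESS) {                                   \
            const char *name = nullptr, *desc = nullptr;              \
            cuGetErrorName(res_, &name);                              \
            cuGetErrorString(res_, &desc);                            \
            fprintf(stderr, "%s failed: %s (%s)\n", #call,            \
                    name ? name : "?", desc ? desc : "?");            \
        }                                                             \
    } while (0)
```

With every call wrapped like this, the remaining work is inspecting the two inputs: verify the external memory handle was imported successfully, and print each field of the CUDA_EXTERNAL_MEMORY_MIPMAPPED_ARRAY_DESC to see which one is out of range.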

CUDA - invalid device function, how to know [architecture, code]?

I am getting the following error when running the default generated kernel when creating a CUDA project in VS Community:
addKernel launch failed: invalid device function
addWithCuda failed!
I searched for how to solve it and found out that I have to change Project -> Properties -> CUDA C/C++ -> Device -> Code Generation (the default values for [architecture, code] are compute_20,sm_20), but I couldn't find the values needed for my graphics card (GeForce 8400 GS).
Is there a list of the [architecture, code] values somewhere on the net, or is it possible to get them with a command?
The numeric values in compute_XX and sm_XX are the Compute Capability (CC) of your CUDA device.
You can look up this (possibly incomplete) list of GPUs and their corresponding CC: http://en.wikipedia.org/wiki/CUDA#Supported_GPUs
Your quite old 8400 GS hosts (if I remember correctly) a G86 chip, which supports CC 1.1.
So you have to change the setting to compute_11,sm_11.
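You can also query the compute capability programmatically (the bundled deviceQuery sample does essentially this). A minimal sketch, assuming a working CUDA toolkit installation:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Print the compute capability of every visible device; major.minor is
// the number to use in the compute_XX/sm_XX build settings.
int main(void) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 1;
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) == cudaSuccess)
            printf("Device %d: %s, compute capability %d.%d\n",
                   i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```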

How error handling is done in Jcuda?

In CUDA we can detect errors simply by checking the return value of functions such as cudaMemcpy(), cudaMalloc(), etc., which is a cudaError_t compared against cudaSuccess. Is there any method available in JCuda to check for errors from functions such as cuMemcpyHtoD(), cuMemAlloc(), cuLaunchKernel(), etc.?
First of all, the methods of JCuda (should) behave exactly like the corresponding CUDA functions: they return an error code in the form of an int. These error codes are also defined in...
the cudaError class for the Runtime API
the CUresult class for the Driver API
the cublasStatus class for JCublas
the cufftResult class for JCufft
the curandStatus class for JCurand
the cusparseStatus class for JCusparse
and are the same error codes as in the respective CUDA library.
All these classes additionally have a static method called stringFor(int) - for example, cudaError#stringFor(int) and CUresult#stringFor(int). These methods return a human-readable String representation of the error code.
So you could do manual error checks, for example, like this:
int error = someCudaFunction();
if (error != 0) {
    System.out.println("Error code " + error + ": " + cudaError.stringFor(error));
}
which might print something like
Error code 10: cudaErrorInvalidDevice
But...
...the error checks may be a hassle. You might have noticed in the CUDA samples that NVIDIA introduced some macros that simplify the error checks. And similarly, I added optional exception checks for JCuda: All the libraries offer a static method called setExceptionsEnabled(boolean). When calling
JCudaDriver.setExceptionsEnabled(true);
then all subsequent method calls for the Driver API will automatically check the method return values, and throw a CudaException when there was any error.
(Note that this method exists separately for all libraries. E.g. the call would be JCublas.setExceptionsEnabled(true) when using JCublas)
The samples usually enable exception checks right at the beginning of the main method. I'd recommend doing this as well, at least during the development phase. As soon as it is clear that the program does not contain any errors, one could disable the exceptions, but there's hardly a reason to do so: they conveniently provide clear information about which error occurred, whereas otherwise calls may fail silently.
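Putting the two styles together, a minimal sketch of the exception-based approach might look like this (it assumes the JCuda runtime bindings are on the classpath; the class name is illustrative):

```java
import jcuda.CudaException;
import jcuda.Pointer;
import jcuda.runtime.JCuda;

public class ErrorCheckExample {
    public static void main(String[] args) {
        // Let JCuda throw instead of returning raw int codes.
        JCuda.setExceptionsEnabled(true);
        try {
            Pointer ptr = new Pointer();
            JCuda.cudaMalloc(ptr, 1024);
            JCuda.cudaFree(ptr);
        } catch (CudaException e) {
            System.err.println("CUDA call failed: " + e.getMessage());
        }
    }
}
```

With exceptions disabled, the same calls would silently return their int codes, and you would be back to the manual stringFor-style checking shown above.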

lua can't unwind exception, generated by luaL_error on hp-ux

I have a problem with Lua on HP-UX 11.31.
I have a Lua script that calls a function written in C++.
In that function, luaL_error is called, but the application crashes when luaL_error is called, because the exception isn't unwound by Lua.
On other platforms this application works correctly.
Do you have any idea what may be wrong?
You probably need to compile the Lua library as a C++ library instead of a C library. Then Lua will use C++ exceptions instead of longjmp.
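A rough sketch of such a build, assuming a vanilla Lua source tree (the version, path, and make target shown are illustrative; check your platform's Makefile for the exact invocation):

```shell
# Build the Lua sources with a C++ compiler so that Lua's internal
# error-handling macros (LUAI_THROW/LUAI_TRY in luaconf.h) use C++
# exceptions instead of setjmp/longjmp.
cd lua-5.1.5/src
make generic CC="g++"
# Alternatively, compile by hand:
#   g++ -O2 -c *.c && ar rcs liblua.a *.o
```

Linking your C++ application against a Lua library built this way lets a luaL_error propagate as a C++ exception that Lua catches at its protected-call boundary, instead of a longjmp that skips C++ stack unwinding.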