I am attempting dynamic parallelism on a GTX 980 Ti card.
All attempts at running code return "unknown error".
Simple code is shown below with compilation options.
I can execute kernels at depth=0 with no issues.
The first time a child kernel is launched, the error is given.
The cudaDeviceSynchronize() calls were included after looking at other questions here, but they didn't solve the problem.
Any ideas? Could this be a driver issue?
Edit 1:
OS: Linux-x86_64
NVIDIA driver version: 384.59
nvcc version: 7.5.17
There are two 980 Tis connected via PCIe x16 Gen3. The system also has Windows installed on another RAID-configured SSD.
#include <cuda.h>
#include <fstream>
#include <stdio.h>
#include <stdlib.h>

__global__ void ker_two() {
    int two = 0;
    two++;
}

__global__ void ker_one() {
    int one = 0;
    one++;
    ker_two<<<1, 1>>>();       // child kernel launch (dynamic parallelism)
    cudaDeviceSynchronize();   // device-side wait on the child grid
}

int main() {
    ker_one<<<1, 1>>>();       // parent launch (depth 0)
    cudaDeviceSynchronize();
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("Cuda Error: %s\n", cudaGetErrorString(err));
    return 0;
}
compiled with
nvcc -arch=compute_52 -rdc=true -lcudadevrt test.cu
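For what it's worth, one way to localize the failure is to check the child launch status inside the parent kernel. A minimal sketch, assuming the CDP device runtime's cudaGetLastError/cudaGetErrorString (which the device runtime exposes in CUDA 7.5, where device-side cudaDeviceSynchronize() was still supported):

__global__ void ker_one() {
    ker_two<<<1, 1>>>();                     // child launch
    cudaError_t err = cudaGetLastError();    // device-side check of the launch itself
    if (err != cudaSuccess)
        printf("child launch failed: %s\n", cudaGetErrorString(err));
    cudaDeviceSynchronize();                 // wait for the child grid to finish
}

If the printf fires, the child launch itself is failing; if the error only shows up on the host, the failure is in the device-side synchronize.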
I am able (?) to reproduce the error on a machine with a Maxwell Titan card. It's a Fedora 24 distribution with CUDA 8.0.61 installed manually; the driver version is 375.51.
However, it seems the problem only occurs on my system when I call cudaDeviceSynchronize() within ker_one(), regardless of whether I launch the second kernel or not. So maybe that's the problem you're seeing, rather than dynamic parallelism per se.
Considering @talonmies' comment, this might even be just a driver issue.
Related
After a heavy simulation crashes my GPU and I terminate the program, I cannot successfully call any CUDA API until I reboot my computer.
nvidia-smi works fine.
My computer runs Ubuntu 20.04 with CUDA 11.6 and NVIDIA driver 510.47.03 installed.
The minimal code that reproduces the error on my machine is the following.
Driver API version:
#include "cuda.h"
int main() {
CUresult result;
result = cuInit(0);
const char* pStr;
cuGetErrorName(result, &pStr);
if(result != CUDA_SUCCESS){
std::cerr << pStr << std::endl;
}
}
The compile command is:
nvcc a.cu -lcuda
and I got CUDA_ERROR_UNKNOWN.
Runtime API version:
#include <cuda_runtime.h>   // implicit when compiling with nvcc; made explicit for clarity
#include <iostream>

int main() {
    int n;
    cudaError_t error = cudaGetDeviceCount(&n);
    if (error != cudaSuccess) {
        std::cerr << cudaGetErrorName(error) << std::endl;
    }
}
The compile command is:
nvcc a.cu
and I got cudaErrorUnknown.
Why did I get this error? Is this a driver problem?
It looks like you trashed something (the driver or the CUDA runtime) and you are no longer able to call any CUDA-related function.
In my humble experience, I usually get these errors when my kernels run for too long on a Windows machine and the Windows display driver (the WDDM TDR mechanism) resets my GPU while the kernel is running.
Maybe you are experiencing a similar issue on Linux.
To fix this, have you tried resetting your GPU with the following command?
nvidia-smi --gpu-reset
Please also note that not all CUDA errors are recoverable, and if the previous command doesn't work, you may need to reboot your computer to fix the issue (or simply unload/reload the CUDA driver).
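If nvidia-smi --gpu-reset is not supported on your GPU, unloading and reloading the kernel modules can achieve the same thing. A sketch, assuming nothing (e.g. an X server) is holding the driver; the exact module names can vary by driver packaging:

sudo rmmod nvidia_uvm nvidia
sudo modprobe nvidia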
My simple CUDA hello-world application runs fine when built in 32-bit using Visual Studio 2015 Community on Windows 10. However, if I build it in 64-bit, it is not executed.
GPU: Tesla K40c
Toolkit: CUDA 8
Operating system: Windows 10 64-bit
Visual Studio: Community edition
There is no error message in the output console.
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include<stdio.h>
__global__ void welcome() {
printf("Hello world :)");
}
int main() {
welcome << <1, 1 >> > ();
cudaDeviceSynchronize();
return 0;
}
I faced the same issue and opened a bug with NVIDIA (#1855074).
They reproduced it successfully, and I'm waiting for an update on it.
One thing is sure: it's on their side.
The only workaround I found was to put my card in WDDM mode via nvidia-smi, which broke my taskbar.
I recommend waiting for a fix.
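For reference, the driver model switch mentioned above can be done with nvidia-smi's --driver-model option. A sketch: the GPU index 0 is an assumption, 0 selects WDDM, 1 selects TCC, and a reboot is required for the change to take effect:

nvidia-smi -i 0 -dm 0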
I have a relatively simple CUDA kernel, which I call immediately in the main function of my program, in the following way:
__global__ void block() {
    for (int i = 0; i < 20; i++) {
        printf("a");
    }
}

int main(int argc, char** argv) {
    block<<<1, 1>>>();
    cudaError_t cudaerr = cudaDeviceSynchronize();
    printf("Kernel executed!\n");
    if (cudaerr != cudaSuccess)
        printf("kernel launch failed with error \"%s\".\n",
               cudaGetErrorString(cudaerr));
}
This program is compiled and launched using Visual Studio 2015, and the executed project has been generated with CMake using the following CMakeLists.txt file:
cmake_minimum_required(VERSION 2.6)
project(Comparison)
find_package(CUDA REQUIRED)
set(CUDA_NVCC_FLAGS
    ${CUDA_NVCC_FLAGS};
    -arch=compute_30 -code=sm_30 -g -G
)
cuda_add_executable(Comparison kernel.cu)
I would expect this program to print twenty a's to the console and then end by printing "Kernel executed!". However, the a's are never printed and the "Kernel executed!" line shows up immediately, even if I replace the for loop with a while(true) loop.
Even when running the code with the Nsight debugger attached and a breakpoint set in the kernel's for loop, nothing happens, leading me to believe that the kernel is never actually launched. Does anyone know how to make this kernel behave as expected?
The kernel was not running correctly when compiled with the given CMakeLists.txt file because of these flags:
-arch=compute_30 -code=sm_30
combined with the GPU that was being used (a GTX 970, a cc 5.2 device).
Those flags specify the generation of cc 3.0 SASS code only, and such code is not compatible with a cc 5.2 device. The fix would be to modify the flags to something like:
-arch=sm_30
or
-arch=sm_52
or
-arch=compute_52 -code=sm_52
I would recommend the first or second approach, as those also embed PTX, which provides forward compatibility with future devices.
The kernel error was not evident because the error checking after the kernel launch was incomplete. Refer to the canonical question and answer on proper CUDA error checking.
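A minimal sketch of that canonical check (the CUDA_CHECK macro name is my own):

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper: check a runtime API return code and abort on failure.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            printf("CUDA error %s:%d: %s\n",                          \
                   __FILE__, __LINE__, cudaGetErrorString(e));        \
            exit(1);                                                  \
        }                                                             \
    } while (0)

__global__ void block() { printf("a"); }

int main() {
    block<<<1, 1>>>();
    CUDA_CHECK(cudaGetLastError());       // catches launch-time errors such as "invalid device function"
    CUDA_CHECK(cudaDeviceSynchronize());  // catches errors that occur while the kernel runs
    return 0;
}

With this in place, the compute_30/sm_30 build would have reported the launch failure instead of silently printing nothing.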
I have a very simple CUDA program. When compiled with the -arch=sm_11 option, it works correctly as expected. However, when compiled with -arch=sm_12, the results are unexpected.
Here is the kernel code:
__global__ void dev_test(int *test) {
    *test = 100;
}
I invoke the kernel as below:
int *dev_int, val;
val = 0;
cudaMalloc((void **)&dev_int, sizeof(int));
cudaMemset((void *)dev_int, 0, sizeof(int));
cudaMemcpy(dev_int, &val, sizeof(int), cudaMemcpyHostToDevice);
dev_test <<< 1, 1>>> (dev_int);
int *host_int = (int*)malloc(sizeof(int));
cudaMemcpy(host_int, dev_int, sizeof(int), cudaMemcpyDeviceToHost);
printf("copied back from device %d\n",*host_int);
When compiled with -arch=sm_11, the print statement correctly prints 100.
However, when compiled with -arch=sm_12, it prints 0, i.e. the change made inside the kernel does not take effect. I am guessing this is due to some incompatibility between my CUDA version and the NVIDIA driver.
CUDA version - 3.0
NVRM version: NVIDIA UNIX x86_64 Kernel Module 195.36.24 Thu Apr 22 19:10:14 PDT 2010
GCC version: gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
Any help is highly appreciated.
My problem finally got resolved. I'm not sure which change truly fixed it: I upgraded to CUDA 4.1 and also upgraded my NVIDIA driver, and the combination of the two solved the problem.
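For anyone hitting a similar architecture mismatch, it can help to query the device's compute capability and choose the -arch flag accordingly. A minimal sketch using the runtime API:

#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // properties of device 0
    printf("Compute capability: %d.%d\n", prop.major, prop.minor);
    return 0;
}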