My simple CUDA hello-world application runs fine when built as 32-bit using Visual Studio 2015 Community on Windows 10. However, if I build it as 64-bit, it is not executed.
GPU: Tesla K40c
Toolkit: CUDA 8
Operating System: Windows 10 64-bit
Visual Studio: Community edition
There is no error message in the output console.
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>

__global__ void welcome() {
    printf("Hello world :)");
}

int main() {
    welcome<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
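Note that a kernel launch reports nothing unless it is asked; a minimal error-checking variant of the same program (a sketch, not the original code) would at least print why nothing executes:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>

__global__ void welcome() {
    printf("Hello world :)");
}

int main() {
    welcome<<<1, 1>>>();
    // Kernel launches are asynchronous: check the launch status and the sync status separately.
    cudaError_t launchErr = cudaGetLastError();
    cudaError_t syncErr = cudaDeviceSynchronize();
    if (launchErr != cudaSuccess)
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(launchErr));
    if (syncErr != cudaSuccess)
        fprintf(stderr, "sync failed: %s\n", cudaGetErrorString(syncErr));
    return 0;
}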
I faced the same issue and filed a bug with NVIDIA (#1855074).
They reproduced it successfully, and I'm waiting for an update on it.
One thing is sure: the problem is on their side.
The only workaround I found was to put my card in WDDM mode via nvidia-smi, which broke my taskbar.
I recommend waiting for a fix.
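For reference, the driver model mentioned above is switched with nvidia-smi from an elevated prompt (the device index 0 here is an assumption; a reboot is required for the change to take effect):

nvidia-smi -i 0 -dm 0   (switch device 0 to WDDM)
nvidia-smi -i 0 -dm 1   (switch device 0 back to TCC)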
After a heavy simulation crashed my GPU, I terminated the program and rebooted my computer; now I cannot successfully call any CUDA API that ran correctly before the reboot.
nvidia-smi works fine.
My machine runs Ubuntu 20.04 with CUDA 11.6 and NVIDIA driver 510.47.03 installed.
The minimal code that reproduces the error on my machine is the following.
Driver API version:
#include "cuda.h"
#include <iostream> // for std::cerr

int main() {
    CUresult result = cuInit(0);
    const char *pStr;
    cuGetErrorName(result, &pStr);
    if (result != CUDA_SUCCESS) {
        std::cerr << pStr << std::endl;
    }
}
The compile command is,
nvcc a.cu -lcuda
and I got CUDA_ERROR_UNKNOWN.
Runtime API version:
#include <iostream>

int main() {
    int n;
    cudaError_t error = cudaGetDeviceCount(&n);
    if (error) {
        std::cerr << cudaGetErrorName(error) << std::endl;
    }
}
The compile command is,
nvcc a.cu
and I got cudaErrorUnknown.
Why do I get this error? Is it a driver problem?
It looks like you trashed something (the driver or the CUDA runtime) and are no longer able to call any CUDA-related function.
In my humble experience, I usually get these errors when my kernels run for too long on a Windows machine and the Windows display driver's timeout watchdog resets my GPU while the kernel is running.
Maybe you are experiencing a similar issue on Linux.
To fix this, have you tried resetting your GPU with the following command line?
nvidia-smi --gpu-reset
Please also note that not all CUDA errors are recoverable; if the previous command doesn't work, you may need to reboot your computer to fix the issue (or simply unload and reload the CUDA driver).
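On Linux, "unload and reload the CUDA driver" boils down to removing and re-inserting the kernel modules. A minimal sketch (it assumes no process is still using the GPU; module names can vary with the driver packaging):

sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm

Reloading nvidia_uvm alone is often enough to clear a stuck CUDA state after a crash; if the module is still in use, lsof /dev/nvidia* shows which processes are holding it.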
I am trying to compile and run the following code on an NVIDIA P100. I'm running CentOS 6.9, driver version 396.37, and CUDA 9.2. It appears these driver/CUDA versions are compatible.
#include <stdio.h>
#include <cuda_runtime_api.h>

int main(int argc, char *argv[])
{
    // Declare variables
    int *dimA = NULL; //{2,3};
    cudaMallocManaged(&dimA, 2 * sizeof(int));
    dimA[0] = 2;
    dimA[1] = 3;
    cudaDeviceSynchronize();
    printf("The End\n");
    return 0;
}
It fails with a segmentation fault. When I compile with nvcc -g -G src/get_p100_to_work.cu and load the core file (cuda-gdb ./a.out core.277512), I get
Reading symbols from ./a.out...done.
[New LWP 277512]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./a.out'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000000000040317d in main (argc=1, argv=0x7fff585da548) at src/get_p100_to_work.cu:71
71 dimA[0] = 2;
(cuda-gdb) bt full
#0 0x000000000040317d in main (argc=1, argv=0x7fff585da548) at src/get_p100_to_work.cu:71
dimA = 0x0
(cuda-gdb)
When I run this code on an NVIDIA K40, it runs without error.
QUESTION:
How do I get my code to run on the P100? Judging from this tutorial, this code should run.
Previously, I had cloned an image of a GPU node with two K40s in it and put that image on a node with two P100s. I suspect that installing the driver on the K40 node produced a configuration specific to the graphics cards in that machine (which makes sense), and that configuration was not compatible with the P100s. Since the driver on the P100 machine was essentially corrupted, this would explain why my code failed so catastrophically.
Solution: I ended up having to reinstall the driver, and now it works.
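Independent of the driver issue, the backtrace above (dimA = 0x0) shows that cudaMallocManaged failed and the failure was never checked. A sketch of the same program with return-code checks (the check helper is illustrative, not part of the original code) would have reported the underlying error instead of segfaulting:

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime_api.h>

// Illustrative helper: print the CUDA error and exit instead of dereferencing a NULL pointer.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    int *dimA = NULL;
    check(cudaMallocManaged(&dimA, 2 * sizeof(int)), "cudaMallocManaged");
    dimA[0] = 2;
    dimA[1] = 3;
    check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");
    printf("The End\n");
    return 0;
}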
In order to use the unified memory feature in CUDA 6, the following requirements must be met:
a GPU with SM architecture 3.0 or higher (Kepler class or newer)
a 64-bit host application and operating system, except on Android
Linux or Windows
My setup is:
System: Ubuntu 13.10 (64-bit)
GPU: GTX 770
CUDA: 6.0
Driver version: 331.49
The sample code is taken from page 210 of the programming guide.
#include <stdio.h>

__device__ __managed__ int ret[1000];

__global__ void AplusB(int a, int b) {
    ret[threadIdx.x] = a + b + threadIdx.x;
}

int main() {
    AplusB<<<1, 1000>>>(10, 100);
    cudaDeviceSynchronize();
    for (int i = 0; i < 1000; i++)
        printf("%d: A+B = %d\n", i, ret[i]);
    return 0;
}
The nvcc compile option I used is,
nvcc -m64 -Xptxas=-Werror -arch=compute_30 -code=sm_30 -o UM UnifiedMem.cu
This code compiles perfectly fine, but during execution it produces a segmentation fault at the printf(). It feels as if the unified memory feature didn't take effect: the address of the variable ret is still a GPU address, but printf is called on the CPU, so the CPU is trying to access a piece of data that is not accessible from the host, which produces the segmentation fault. Can anybody help me? What is wrong here?
Though I am not certain (and I can't check it for myself right now), I think it is because Ubuntu 13.10 ships gcc 4.8.1, which I believe is not yet supported even in the newest CUDA Toolkit 6.0. Try to compile your code with host compiler gcc 4.7.3 (that is, the same one included by default in the officially supported Ubuntu 13.04). For that you can install the gcc-4.7 package and point nvcc to /usr/bin/gcc-4.7 as the host compiler. For C++ support I believe you need g++-4.7 as well.
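Concretely, the host compiler can be selected with nvcc's -ccbin flag; a sketch assuming Ubuntu's gcc-4.7 and g++-4.7 packages:

sudo apt-get install gcc-4.7 g++-4.7
nvcc -m64 -Xptxas=-Werror -arch=compute_30 -code=sm_30 -ccbin /usr/bin/gcc-4.7 -o UM UnifiedMem.cu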
If you need a simple step-by-step guide, you might follow http://n00bsys0p.co.uk/blog/2014/01/23/nvidia-cuda-55ubuntu-1310-saucy-salamander. It's for CUDA Toolkit 5.5, but I think it should be relevant for the recent version as well.
I have a very simple CUDA program. When compiled with the -arch=sm_11 option, it works correctly as expected. However, when compiled with -arch=sm_12, the results are unexpected.
Here is the kernel code:
__global__ void dev_test(int *test) {
    *test = 100;
}
I invoke the kernel as follows:

int *dev_int, val;
val = 0;
cudaMalloc((void **)&dev_int, sizeof(int));
cudaMemset((void *)dev_int, 0, sizeof(int));
cudaMemcpy(dev_int, &val, sizeof(int), cudaMemcpyHostToDevice);

dev_test<<<1, 1>>>(dev_int);

int *host_int = (int *)malloc(sizeof(int));
cudaMemcpy(host_int, dev_int, sizeof(int), cudaMemcpyDeviceToHost);
printf("copied back from device %d\n", *host_int);
When compiled with -arch=sm_11, the print statement correctly prints 100.
However, when compiled with -arch=sm_12, it prints 0, i.e. the change made inside the kernel is not taking effect. I am guessing this is due to some incompatibility between my CUDA version and the NVIDIA drivers.
CUDA version - 3.0
NVRM version: NVIDIA UNIX x86_64 Kernel Module 195.36.24 Thu Apr 22 19:10:14 PDT 2010
GCC version: gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
Any help is highly appreciated.
My problem finally got resolved, though I'm not sure which change truly fixed it: I upgraded to CUDA 4.1 and upgraded my NVIDIA driver, and the combination of the two solved the problem.
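As a general note, a silent wrong result like this typically means the kernel never launched at all (for example, because the binary contains no code the device can run), and checking the launch status makes that visible. A minimal sketch built on the snippet above:

dev_test<<<1, 1>>>(dev_int);
cudaError_t err = cudaGetLastError(); // often reports "invalid device function" on an arch mismatch
if (err != cudaSuccess)
    printf("kernel launch failed: %s\n", cudaGetErrorString(err));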