I am trying to compile and run the following code on an NVIDIA P100. I'm running CentOS 6.9, driver version 396.37, and CUDA 9.2; these driver/CUDA versions appear to be compatible.
#include <stdio.h>
#include <cuda_runtime_api.h>

int main(int argc, char *argv[])
{
    // Declare variables
    int *dimA = NULL; //{2,3};
    cudaMallocManaged(&dimA, 2 * sizeof(int)); // return status is not checked
    dimA[0] = 2;
    dimA[1] = 3;
    cudaDeviceSynchronize();
    printf("The End\n");
    return 0;
}
It fails with a segmentation fault. When I compile with nvcc -g -G src/get_p100_to_work.cu and load the resulting core file (cuda-gdb ./a.out core.277512), I get
Reading symbols from ./a.out...done.
[New LWP 277512]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./a.out'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000000000040317d in main (argc=1, argv=0x7fff585da548) at src/get_p100_to_work.cu:71
71 dimA[0] = 2;
(cuda-gdb) bt full
#0 0x000000000040317d in main (argc=1, argv=0x7fff585da548) at src/get_p100_to_work.cu:71
dimA = 0x0
(cuda-gdb)
When I run this code on an NVIDIA K40, it runs without error.
QUESTION:
How do I get my code to run on the P100? It seems, from this tutorial, that this code should run.
Previously, I had cloned an image of a GPU node with two K40s in it. I then put that image on a node with two P100s in it. I suspect that when the driver was installed on the K40 node, some configuration specific to the graphics cards in that machine was applied (which makes sense). That configuration was not compatible with the P100s. Since the driver on the P100 machine was effectively corrupted, this would explain why my code failed so cataclysmically.
Solution: I ended up having to reinstall the driver, and now it works.
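For reference, here is a sketch (my addition, not the original code) of the same allocation with the return status checked; with the corrupted driver, this would have printed an error from cudaMallocManaged instead of dereferencing a null pointer:
#include <stdio.h>
#include <cuda_runtime_api.h>

int main(void)
{
    int *dimA = NULL;
    // Check the allocation status instead of assuming it succeeded.
    cudaError_t err = cudaMallocManaged(&dimA, 2 * sizeof(int));
    if (err != cudaSuccess || dimA == NULL) {
        fprintf(stderr, "cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    dimA[0] = 2;
    dimA[1] = 3;
    cudaDeviceSynchronize();
    cudaFree(dimA);
    printf("The End\n");
    return 0;
}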
Related
After a heavy simulation that crashed my GPU, I terminated the program and rebooted my computer, but now I cannot successfully call any CUDA API that ran correctly before.
nvidia-smi works well.
My computer runs Ubuntu 20.04 with CUDA 11.6 and NVIDIA driver 510.47.03 installed.
The minimal code that reproduces the error on my machine is the following.
Driver API version:
#include "cuda.h"
int main() {
CUresult result;
result = cuInit(0);
const char* pStr;
cuGetErrorName(result, &pStr);
if(result != CUDA_SUCCESS){
std::cerr << pStr << std::endl;
}
}
The compile command is,
nvcc a.cu -lcuda
and I got CUDA_ERROR_UNKNOWN.
Runtime API version:
#include <iostream>

int main() {
    int n;
    cudaError_t error = cudaGetDeviceCount(&n);
    if (error != cudaSuccess) {
        std::cerr << cudaGetErrorName(error) << std::endl;
    }
}
The compile command is,
nvcc a.cu
and I got cudaErrorUnknown.
Why am I getting this error? Is this a driver problem?
It looks like you trashed something (the driver or the CUDA runtime) and are no longer able to call any CUDA-related function.
In my experience, I usually get these errors when a kernel runs for too long on a Windows machine and the Windows display driver timeout (TDR) resets the GPU while the kernel is still running.
Maybe you are experiencing a similar issue on Linux.
To fix this, have you tried resetting your GPU with the following command?
nvidia-smi --gpu-reset
Please also note that not all CUDA errors are recoverable; if the previous command doesn't work, you may have to reboot your computer to fix the issue (or simply unload/reload the CUDA driver modules).
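If that doesn't help, the "unload/reload the driver" route on Linux usually amounts to removing and re-inserting the NVIDIA kernel modules (an assumption on my part; the exact module names can vary with how the driver is packaged, and this only works when nothing is still holding the GPU):
sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm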
I have a relatively simple CUDA kernel and I immediately call the kernel in the main method of my program in the following way:
#include <cstdio>
#include <cuda_runtime.h>

__global__ void block() {
    for (int i = 0; i < 20; i++) {
        printf("a");
    }
}

int main(int argc, char** argv) {
    block<<<1, 1>>>();
    cudaError_t cudaerr = cudaDeviceSynchronize();
    printf("Kernel executed!\n");
    if (cudaerr != cudaSuccess)
        printf("kernel launch failed with error \"%s\".\n",
               cudaGetErrorString(cudaerr));
}
This program is compiled and launched using Visual Studio 2015, and the project being executed was generated with CMake using the following CMakeLists.txt file:
project (Comparison)
cmake_minimum_required (VERSION 2.6)
find_package(CUDA REQUIRED)
set(
CUDA_NVCC_FLAGS
${CUDA_NVCC_FLAGS};
-arch=compute_30 -code=sm_30 -g -G
)
cuda_add_executable(Comparison kernel.cu)
I would expect this program to print 20 a's to the console and then finish by printing Kernel executed!. However, the a's are never printed, and the line Kernel executed shows up immediately. This happens even if I replace the for loop with a while(true) loop.
Even when running the code with the Nsight debugger attached and a breakpoint set inside the kernel's for loop, nothing happens, leading me to believe that the kernel is never actually launched. Does anyone know how to make this kernel behave as expected?
The kernel was not running when compiled with the given CMakeLists.txt file because of these flags:
-arch=compute_30 -code=sm_30
combined with the GPU that was being used (GTX 970, a cc 5.2 GPU).
Those flags specify the generation of cc 3.0 SASS code only, and such code is not compatible with a cc 5.2 device. The fix would be to modify the flags to something like:
-arch=sm_30
or
-arch=sm_52
or
-arch=compute_52 -code=sm_52
I would recommend the first or second approach, as it will include PTX support for future devices.
The kernel error was not evident because the error checking after the kernel launch was incomplete. Refer to the canonical question/answer on proper CUDA error checking.
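For illustration, here is a minimal sketch of that more complete checking (my wording, not the exact code from the linked answer): it checks the launch itself with cudaGetLastError() and the execution with cudaDeviceSynchronize(). With a cc 3.0-only binary on a cc 5.2 device this typically reports something like "no kernel image is available for execution on the device".
#include <cstdio>
#include <cuda_runtime.h>

__global__ void block() {
    for (int i = 0; i < 20; i++) {
        printf("a");
    }
}

int main() {
    block<<<1, 1>>>();
    // Launch errors (e.g. a missing kernel image) are reported here...
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("kernel launch failed: %s\n", cudaGetErrorString(err));
    // ...while errors that occur during execution surface at synchronization.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("kernel execution failed: %s\n", cudaGetErrorString(err));
    printf("Kernel executed!\n");
    return 0;
}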
This is my MCVE using the MySQL C API:
#include <my_global.h>
#include <mysql.h>

int main()
{
    mysql_init( NULL );
}
Compiling with gcc 4.9.3 -O3 -I /usr/include/mysql produces this warning:
/usr/lib64/mysql/libmysqlclient.a(ssl.cpp.o): In function `memset':
/usr/include/bits/string3.h:82: warning: memset used with constant zero length parameter;
this could be due to transposed parameters
Environment:
Using MySQL 5.7.13 Server and C API 6.1.6 on SLES 12 SP1 VM on XenServer.
My question:
Should I report this as a bug to MySQL, or does anybody know how to remove this warning?
Update:
Because a comment suggested I play with the arguments to mysql_library_init(), I updated the question to a more compact/minimal working example.
Update:
This warning appears during the link step! The compilation itself works fine.
In order to use the unified memory feature in CUDA 6, the following requirements must be met:
a GPU with SM architecture 3.0 or higher (Kepler class or newer)
a 64-bit host application and operating system, except on Android
Linux or Windows
My setup is,
System: ubuntu 13.10 (64-bit)
GPU: GTX770
CUDA: 6.0
Driver Version: 331.49
The sample code is taken from the programming guide, page 210.
#include <stdio.h>

__device__ __managed__ int ret[1000];

__global__ void AplusB(int a, int b) {
    ret[threadIdx.x] = a + b + threadIdx.x;
}

int main() {
    AplusB<<< 1, 1000 >>>(10, 100);
    cudaDeviceSynchronize();
    for (int i = 0; i < 1000; i++)
        printf("%d: A+B = %d\n", i, ret[i]);
    return 0;
}
The nvcc compile option I used is,
nvcc -m64 -Xptxas=-Werror -arch=compute_30 -code=sm_30 -o UM UnifiedMem.cu
This code compiles perfectly fine. During execution, it produces a segmentation fault at printf(). It feels like the unified memory feature didn't take effect: the address of ret is still a GPU address, but printf is called on the CPU, so the CPU is trying to access data that is not mapped on the host, which produces the segmentation fault. Can anybody help me? What is wrong here?
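One way to narrow this down (a sketch of mine, not from the original post) is to ask the runtime whether the device and driver actually report managed-memory support, and to check the status returned after the kernel launch instead of going straight to the host-side read. This assumes the cudaDevAttrManagedMemory attribute is available in your toolkit version; if it is not, the error check alone is still informative.
#include <stdio.h>
#include <cuda_runtime.h>

__device__ __managed__ int ret[1000];

__global__ void AplusB(int a, int b) {
    ret[threadIdx.x] = a + b + threadIdx.x;
}

int main() {
    int managed = 0;
    // Ask whether device 0 reports managed-memory support.
    cudaDeviceGetAttribute(&managed, cudaDevAttrManagedMemory, 0);
    printf("Managed memory supported: %d\n", managed);

    AplusB<<<1, 1000>>>(10, 100);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("Launch/sync failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < 1000; i++)
        printf("%d: A+B = %d\n", i, ret[i]);
    return 0;
}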
Though I am not certain (and I can't check it for myself right now), I think this is because Ubuntu 13.10 ships gcc 4.8.1, which I believe is not yet supported even by the newest CUDA Toolkit 6.0. Try compiling your code with gcc 4.7.3 as the host compiler (that is, the same one shipped by default in the officially supported Ubuntu 13.04). For that you can install the gcc-4.7 package and point nvcc at /usr/bin/gcc-4.7 as the host compiler. For C++ support I believe you need g++-4.7 as well.
If you need a simple step-by-step guide, you might follow http://n00bsys0p.co.uk/blog/2014/01/23/nvidia-cuda-55ubuntu-1310-saucy-salamander. It's for CUDA Toolkit 5.5, but I think it should be relevant for more recent versions as well.
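As a concrete illustration of that suggestion (a sketch; the exact path depends on your installation), the host compiler can be selected with nvcc's -ccbin flag:
nvcc -ccbin /usr/bin/gcc-4.7 -m64 -arch=sm_30 -o UM UnifiedMem.cu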
I have a very simple CUDA program. When compiled with the -arch=sm_11 option, it works correctly as expected. However, when compiled with -arch=sm_12, the results are unexpected.
Here is the kernel code:
__global__ void dev_test(int *test) {
    *test = 100;
}
I invoke the kernel code as below :
int *dev_int, val;
val = 0;
cudaMalloc((void **)&dev_int, sizeof(int));
cudaMemset((void *)dev_int, 0, sizeof(int));
cudaMemcpy(dev_int, &val, sizeof(int), cudaMemcpyHostToDevice);
dev_test <<< 1, 1>>> (dev_int);
int *host_int = (int*)malloc(sizeof(int));
cudaMemcpy(host_int, dev_int, sizeof(int), cudaMemcpyDeviceToHost);
printf("copied back from device %d\n",*host_int);
When compiled with -arch=sm_11, the print statement correctly prints 100.
However, when compiled with -arch=sm_12, it prints 0, i.e. the change made inside the kernel is not taking effect. I am guessing this is due to some incompatibility between my CUDA version and the NVIDIA driver.
CUDA version - 3.0
NVRM version: NVIDIA UNIX x86_64 Kernel Module 195.36.24 Thu Apr 22 19:10:14 PDT 2010
GCC version: gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
Any help is highly appreciated.
My problem finally got resolved. I'm not sure which change truly fixed it: I upgraded to CUDA 4.1 and also upgraded my NVIDIA driver, and the combination of the two solved the problem.
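For reference, here is a small sketch of mine (standard runtime API, not from the original posts) that prints the device's compute capability, which is what you need to compare against the -arch/-code values passed to nvcc; checking the status returned by the launch and the copies would likewise have reported a mismatch instead of silently printing 0.
#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);   // query device 0
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // Compare these values against the -arch / -code options given to nvcc.
    printf("Device 0: %s, compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    return 0;
}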