The skeleton of the code is
a_kernel.cu
__constant__ unsigned char carray[256];
a.cu
#include <a_kernel.cu>
...
unsigned char h_carray[256];
...
cudaMemcpyToSymbol("carray", h_carray, 256);
The system configuration is
Windows7 64bit
CUDA toolkit 3.1, SDK 3.1
GeForce GTX 460
rules file in SDK 3.1
I get an "invalid device symbol" error from cudaMemcpyToSymbol.
Any help would be appreciated. :)
It would help if you could post some code to reproduce the problem; perhaps you could do this on the CUDA forums. Having said that, __constant__ variables have static (i.e. translation unit) scope. The simplest structure to follow would be as follows. Note that it may also be worth checking out CUDA 3.2.
host_code.cpp:
#include "cuda_funcs.h"
...
{
unsigned char h_carray[256];
cudaMemcpyToSymbol("carray", h_carray, 256);
processOnGpu(...);
}
...
cuda_funcs.h:
void processOnGpu(...);
cuda_funcs.cu:
__constant__ unsigned char carray[256];
__global__ void kernel(...)
{
...
}
void processOnGpu(...)
{
...
kernel<<<...>>>(...);
...
}
Check the cudaMemcpyToSymbol documentation in the CUDA manual. You can pass the kind (direction) of the memory copy explicitly; the correct enumerator names are cudaMemcpyHostToDevice and cudaMemcpyDeviceToHost (and for cudaMemcpyToSymbol the default is cudaMemcpyHostToDevice):
cudaMemcpyToSymbol("carray", h_carray, 256, 0, cudaMemcpyHostToDevice);
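For reference, on newer toolkits (CUDA 5.0 and later) the string form of the symbol argument was removed; you pass the symbol itself. A minimal sketch, assuming the __constant__ array is visible in the same translation unit as the copy:

```cuda
#include <cstdio>

__constant__ unsigned char carray[256];

__global__ void kernel()
{
    printf("carray[0] = %d\n", carray[0]);
}

int main()
{
    unsigned char h_carray[256] = {42};
    // Pass the symbol itself, not its name as a string (required since CUDA 5.0)
    cudaMemcpyToSymbol(carray, h_carray, 256, 0, cudaMemcpyHostToDevice);
    kernel<<<1, 1>>>();
    cudaDeviceSynchronize(); // flush device-side printf output
    return 0;
}
```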
I know that we can't write CUDA kernels with a variable number of parameters:
Is it possible to have a CUDA kernel with varying number of parameters?
(at least not in the C varargs sense; we can use C++ variadic templates.)
But what about non-kernel device-side code, i.e. __device__ functions? Can these be varargs functions?
Yes, we can write varargs device-side functions.
For example:
#include <stdio.h>
#include <stdarg.h>
__device__ void foo(const char* str, ...)
{
va_list ap;
va_start(ap, str);
int arg = va_arg(ap, int); // making an assumption here
printf("str is \"%s\", first va_list argument is %d\n", str, arg);
va_end(ap); // match every va_start with a va_end
}
This compiles fine with NVCC - and works, provided you actually pass a null-terminated string and an int. I would not be surprised if CUDA's printf() itself were implemented this way.
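For completeness, here is a minimal sketch of a kernel exercising such a varargs device function (the kernel name and launch configuration are illustrative):

```cuda
#include <cstdio>
#include <cstdarg>

// Varargs device function: reads a single int from the va_list.
__device__ void foo(const char* str, ...)
{
    va_list ap;
    va_start(ap, str);
    int arg = va_arg(ap, int); // assumes the first variadic argument is an int
    printf("str is \"%s\", first va_list argument is %d\n", str, arg);
    va_end(ap);
}

__global__ void kernel()
{
    foo("hello", 42);
}

int main()
{
    kernel<<<1, 1>>>();
    cudaDeviceSynchronize(); // flush device-side printf output
    return 0;
}
```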
The problem occurred when I ran the "Gipuma" project, which needs OpenCV and CUDA support. My video card is a GTX 750 Ti, with CUDA 8.0.
I get an "Access Violation" when accessing the "__managed__ int" variables through a host function. In general, a "__managed__" variable can be read and written from both device and host. I am confused and think there may be something wrong in the configuration?
The variables declared in "gipuma.cu":
#ifndef SHARED_HARDCODED
__managed__ int SHARED_SIZE_W_m;
__constant__ int SHARED_SIZE_W;
__managed__ int SHARED_SIZE_H;
__managed__ int SHARED_SIZE = 0;
__managed__ int WIN_RADIUS_W;
__managed__ int WIN_RADIUS_H;
__managed__ int TILE_W;
__managed__ int TILE_H;
#endif
and the host function in "gipuma.cu":
int runcuda(GlobalState &gs)
{
WIN_RADIUS_W = 0; // it goes wrong here: access violation
printf("test is %d\n", WIN_RADIUS_W);
printf("Run cuda\n");
if(gs.params->color_processing)
gipuma<float4>(gs);
else
gipuma<float>(gs);
return 0;
}
and the error message:
0x000000013FA1DCBD has an unhandled exception (in gipuma.exe): 0xC0000005: An access violation occurred when writing to location 0x0000000000000000.
On devices with compute capability below 6.0, host and device may not access __managed__ memory concurrently, because the driver needs an opportunity to copy the data between host and device.
So, as Robert Crovella already pointed out in his comment, you need to insert a call to cudaDeviceSynchronize() after a kernel call before being able to access __managed__ memory from the host again.
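A minimal sketch of that pattern (the variable and kernel names are illustrative):

```cuda
#include <cstdio>

__managed__ int value;

__global__ void increment()
{
    value += 1;
}

int main()
{
    value = 41;              // host write is fine: no kernel is in flight yet
    increment<<<1, 1>>>();
    cudaDeviceSynchronize(); // required before the host touches `value` again
                             // on pre-Pascal (compute capability < 6.0) devices
    printf("value = %d\n", value);
    return 0;
}
```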
When I use the following code, it shows the correct value 3345.
#include <iostream>
#include <cstdio>
__device__ int d_Array[1];
__global__ void foo(){
d_Array[0] = 3345;
}
int main()
{
foo<<<1,1>>>();
cudaDeviceSynchronize();
int h_Array[1];
cudaMemcpyFromSymbol(&h_Array, d_Array, sizeof(int));
std::cout << "values: " << h_Array[0] << std::endl;
}
But if we replace the line of code __device__ int d_Array[1]; by
__device__ int *d_Array; it shows a wrong value. Why?
The problem is in memory allocation. Try the same thing in C++ (on the host) and you will either get an error or an unexpected value.
In addition, you can check for CUDA errors by calling cudaGetLastError() after your kernel. In the first case everything is fine, and the result is cudaSuccess. In the second case there is a cudaErrorLaunchFailure error. Here is the explanation of this error (from the CUDA toolkit documentation):
"An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointer and accessing out of bounds shared memory. The device cannot be used until cudaThreadExit() is called. All existing device memory allocations are invalid and must be reconstructed if the program is to continue using CUDA."
Note that cudaMemcpyFromSymbol (like cudaMemcpyToSymbol) also supports an offset parameter for array indexing.
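If you do want a __device__ pointer symbol, the device storage has to be allocated first and the pointer value copied into the symbol; a minimal sketch:

```cuda
#include <iostream>

__device__ int *d_Array; // a pointer symbol: it holds no storage by itself

__global__ void foo()
{
    d_Array[0] = 3345;
}

int main()
{
    int *d_ptr;
    cudaMalloc(&d_ptr, sizeof(int));                    // allocate device storage
    cudaMemcpyToSymbol(d_Array, &d_ptr, sizeof(int *)); // point the symbol at it
    foo<<<1, 1>>>();
    cudaDeviceSynchronize();
    int h_value;
    cudaMemcpy(&h_value, d_ptr, sizeof(int), cudaMemcpyDeviceToHost);
    std::cout << "value: " << h_value << std::endl;
    cudaFree(d_ptr);
    return 0;
}
```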
Recently I started learning CUDA. Here is my simple code for printing from a kernel.
#include"cuPrintf.cu"
#include"cuPrintf.cuh"
#include<cuda.h>
#include<stdio.h>
__global__ void cuprint()
{
cuPrintf("He he, I am printing from here");
}
main()
{
cuprint<<<1,1>>>cuprint();
}
cuPrintf.cu and cuPrintf.cuh are downloaded and kept in the directory where I wrote this program. I am getting the following error.
cuprint.cu(11): error: expected a "("
cuprint.cu(13): error: expected a declaration
Can anyone tell me why I am getting these errors?
There is no need to use cuPrintf with CUDA 6.0 and a card with compute capability 3.5.
This simple code will work
#include<stdio.h>
__global__ void cuprint()
{
printf("Printing...\n");
}
int main()
{
cuprint<<<1,1>>>();
cudaDeviceSynchronize();
}
You are calling it the wrong way; you should call it like cuprint<<<1,1>>>();
And according to this page: https://code.google.com/p/stanford-cs193g-sp2010/wiki/TutorialHelloWorld you need to add some more setup functions (initialization and so on), but I cannot confirm that because I have no CUDA PC here.
Software: ActiveState ActiveTcl 8.5.13.296436 / Win7 / Dev-C++ 5.4.1.
ActiveTcl is installed at D:/TCL/.
error information:
E:\src\c\tcl\main.o: In function `Tcl_AppInit':
E:\src\c\tcl\main.c:8: undefined reference to `_imp__Tcl_Init'
E:\src\c\tcl\main.o: In function `main':
E:\src\c\tcl\main.c:14: undefined reference to `_imp__Tcl_Main'
collect2.exe: [Error] ld returned 1 exit status
E:\src\c\tcl\Makefile.win:26: recipe for target 'tcl_test.exe' failed
c source code:
#include <stdio.h>
#include <stdlib.h>
#include <tcl.h>
#include <tk.h>
int Tcl_AppInit(Tcl_Interp *interp)
{
return (Tcl_Init(interp) == (TCL_ERROR))?TCL_ERROR:TCL_OK;
}
int main(int argc, char** argv)
{
printf("--- Tcl Third-Party Shell Start ---\n");
Tcl_Main(argc, argv, Tcl_AppInit);
printf("--- Tcl Third-Party Shell End ---\n");
return 0;
}
In order to access those functions (notably, Tcl_Main) you must link your code against the Tcl DLL (which I think will be tcl85.dll in your installation); it's not a symbol that is exported through Tcl's stub linking mechanism. I don't know exactly how you do that on Windows (nor exactly where it will be located) but instructing your build environment to use the DLL should not be too difficult.
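With the MinGW toolchain that Dev-C++ uses, linking against the Tcl import library would look roughly like this (the include/lib paths and library version are assumptions based on the D:/TCL install mentioned above; adjust to your setup):

```shell
# Compile and link against the Tcl import library
gcc main.c -o tcl_test.exe -I"D:/TCL/include" -L"D:/TCL/lib" -ltcl85
```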
FWIW, Tcl_Init always returns either TCL_OK (i.e., 0) or TCL_ERROR (i.e., 1). You can just return the value directly from your AppInit function if you're not going to install your own commands and functionality after basic initialization.
After playing around a bit I could reproduce and solve this in Visual Studio.
You just have to add D:\Tcl\lib\tcl86.lib to the "Additional Dependencies" under "Linker/Input".
This solved the problem for me.
Edit
You could either pass Tcl_Init to Tcl_Main (If you don't have to do any specific initialization) or just return the result of Tcl_Init like this:
int Tcl_AppInit(Tcl_Interp *interp)
{
return Tcl_Init(interp);
}