Unable to understand the error in a simple CUDA function - cuda

Recently I started learning CUDA. Here is my simple code for printing from a kernel.
#include "cuPrintf.cu"
#include "cuPrintf.cuh"
#include <cuda.h>
#include <stdio.h>

__global__ void cuprint()
{
    cuPrintf("He he, I am printing from here");
}

main()
{
    cuprint<<<1,1>>>cuprint();
}
cuPrintf.cu and cuPrintf.cuh are downloaded and kept in the same directory as this program. I am getting the following errors:
cuprint.cu(11): error: expected a "("
cuprint.cu(13): error: expected a declaration
Can anyone tell me why I am getting these errors?

There is no need to use cuPrintf with CUDA 6.0: device-side printf is available on any card with compute capability 2.0 or higher.
This simple code will work
#include <stdio.h>

__global__ void cuprint()
{
    printf("Printing...\n");
}

int main()
{
    cuprint<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}
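One gotcha worth mentioning: device-side printf is only compiled in when you target a real architecture, since old nvcc defaults (sm_10) lack it. A hypothetical build line for the card mentioned above (adjust -arch to your GPU):

```shell
# Build for compute capability 3.5; use sm_20 or higher for device printf
nvcc -arch=sm_35 cuprint.cu -o cuprint
./cuprint
```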

You are calling it the wrong way; you should call it like this: cuprint<<<1,1>>>();
And according to this page: https://code.google.com/p/stanford-cs193g-sp2010/wiki/TutorialHelloWorld you need to add some more functions (for init and so on), but I cannot confirm that because I have no CUDA PC here.
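For what it's worth, the cuPrintf library does need host-side setup calls; a minimal sketch based on the functions shipped in cuPrintf.cuh (cudaPrintfInit, cudaPrintfDisplay, cudaPrintfEnd):

```cuda
#include "cuPrintf.cu"

__global__ void cuprint()
{
    cuPrintf("He he, I am printing from here\n");
}

int main()
{
    cudaPrintfInit();                 // allocate cuPrintf's device-side buffer
    cuprint<<<1,1>>>();
    cudaPrintfDisplay(stdout, true);  // copy the buffer back to the host and print it
    cudaPrintfEnd();                  // free the buffer
    return 0;
}
```

Without the cudaPrintfDisplay call nothing ever reaches the host, which is easy to mistake for a failed kernel launch.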


Declaring a global device array using a pointer with cudaMemcpyFromSymbol

When I use the following code, it shows the correct value 3345.
#include <iostream>
#include <cstdio>
__device__ int d_Array[1];
__global__ void foo()
{
    d_Array[0] = 3345;
}

int main()
{
    foo<<<1,1>>>();
    cudaDeviceSynchronize();
    int h_Array[1];
    cudaMemcpyFromSymbol(&h_Array, d_Array, sizeof(int));
    std::cout << "values: " << h_Array[0] << std::endl;
}
But if we replace the line of code __device__ int d_Array[1]; by
__device__ int *d_Array; it shows a wrong value. Why?
The problem is in memory allocation: __device__ int *d_Array; declares a pointer that is never made to point at any device memory, so the kernel writes through an uninitialized pointer. Try the same thing in C++ on the host and you will get either an error or an unexpected value.
In addition, you can check for CUDA errors by calling cudaGetLastError() after your kernel. In the first case everything is fine and the result is cudaSuccess; in the second case you get a cudaErrorLaunchFailure error. Here is the explanation of this error (from the CUDA Toolkit documentation):
"An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointer and accessing out of bounds shared memory. The device cannot be used until cudaThreadExit() is called. All existing device memory allocations are invalid and must be reconstructed if the program is to continue using CUDA."
Note that cudaMemcpyToSymbol and cudaMemcpyFromSymbol also take an offset parameter, which can be used for array indexing.
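A sketch of how the pointer version could be made to work: allocate device memory first, then copy the resulting pointer value into the __device__ symbol (the one-element buffer size here is an assumption matching the original example):

```cuda
__device__ int *d_Array;

__global__ void foo()
{
    d_Array[0] = 3345;
}

int main()
{
    int *d_buf;
    cudaMalloc(&d_buf, sizeof(int));                    // allocate device memory
    cudaMemcpyToSymbol(d_Array, &d_buf, sizeof(d_buf)); // point the symbol at it
    foo<<<1,1>>>();
    cudaDeviceSynchronize();
    int h_value;
    cudaMemcpy(&h_value, d_buf, sizeof(int), cudaMemcpyDeviceToHost);
    // h_value should now hold 3345
    cudaFree(d_buf);
    return 0;
}
```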

Using the Tcl library with C code gets errors

Software: ActiveState ActiveTcl 8.5.13.296436 / Win7 / Dev-C++ 5.4.1.
ActiveTcl is installed at D:/TCL/.
error information:
E:\src\c\tcl\main.o: In function `Tcl_AppInit':
E:\src\c\tcl\main.c:8: undefined reference to `_imp__Tcl_Init'
E:\src\c\tcl\main.o: In function `main':
E:\src\c\tcl\main.c:14: undefined reference to `_imp__Tcl_Main'
E:\src\c\tcl\collect2.exe [Error] ld returned 1 exit status
E:\src\c\tcl\Makefile.win:26: recipe for target 'tcl_test.exe' failed
c source code:
#include <stdio.h>
#include <stdlib.h>
#include <tcl.h>
#include <tk.h>
int Tcl_AppInit(Tcl_Interp *interp)
{
    return (Tcl_Init(interp) == TCL_ERROR) ? TCL_ERROR : TCL_OK;
}

int main(int argc, char** argv)
{
    printf("--- Tcl Third-Party Shell Start ---\n");
    Tcl_Main(argc, argv, Tcl_AppInit);
    printf("--- Tcl Third-Party Shell End ---\n");
    return 0;
}
In order to access those functions (notably, Tcl_Main) you must link your code against the Tcl DLL (which I think will be tcl85.dll in your installation); it's not a symbol that is exported through Tcl's stub linking mechanism. I don't know exactly how you do that on Windows (nor exactly where it will be located) but instructing your build environment to use the DLL should not be too difficult.
FWIW, Tcl_Init always returns either TCL_OK (i.e., 0) or TCL_ERROR (i.e., 1). You can just return the value directly from your AppInit function if you're not going to install your own commands and functionality after basic initialization.
After playing around a bit I could reproduce and solve this in Visual Studio.
You just have to add D:\Tcl\lib\tcl86.lib to the "Additional Dependencies" under "Linker/Input".
This solved the problem for me.
Edit
You could either pass Tcl_Init to Tcl_Main (If you don't have to do any specific initialization) or just return the result of Tcl_Init like this:
int Tcl_AppInit(Tcl_Interp *interp)
{
    return Tcl_Init(interp);
}
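With the MinGW toolchain that Dev-C++ uses, the equivalent fix is done on the command line; the include/library paths and import-library name below are assumptions based on the D:/TCL install mentioned above:

```shell
# Link against the Tcl import library so Tcl_Init/Tcl_Main resolve
gcc main.c -I D:/TCL/include -L D:/TCL/lib -ltcl85 -o tcl_test.exe
```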

ARM7 TDMI undefined instruction exception manual generation

I am trying to test some exception handling code running on an ARM7 TDMI processor. I want to manually create an instruction opcode which will generate the "Undefined instruction" exception. So far, I've done this:
void createUndefinedException()
{
    static const int instr = 0x26889912; // bad opcode
    ((void(*)(void))instr)();
}
I arrived at the above opcode from a reference page I found on the web that lists undefined instruction opcodes at the very bottom.
The above code generates the prefetch abort exception instead of the undefined instruction exception.
Anyone have an idea of how to create this easily?
I just want to verify my handling of this exception is going to work properly.
Create an asm file:

.globl test_function
test_function:
    .word 0x26889912
    bx lr

Assemble it:

arm-none-linux-gnueabi-as fun.s -o fun.o

Call it from your C code:

extern void test_function ( void );
...
test_function();

Then add the object file to the list you are linking:

arm-none-linux-gnueabi-gcc myprogram.c fun.o -o myprogram

and run it.
You need to create a function out of the address of the int: your original code casts the opcode's value itself to a function pointer, so the CPU branches to address 0x26889912, which is what triggers the prefetch abort. Instead, branch to the address where the opcode is stored:

typedef void (*Exception)(void);
static unsigned long illegalOpcode = 0x26889912;
Exception e = (Exception)&illegalOpcode;
e();

Cython: call function from external C file

After Cython's "Hello World" and the example of calling a function in the C math libraries here, what I really want to do is to have C code of my own in a separate file and use it from Cython. Following this, I modify the setup.py file:
sourcefiles = ['hello2_caller.pyx', 'hello2.c']
This is hello2.c (main is just there so it can be compiled and tested separately; that build product isn't used for this test):
#import <stdio.h>
void f() {
printf("%s", "Hello world!\n");
}
int main(int argc, const char* argv[]) {
f();
return 0;
}
This is hello2_caller.pyx
cdef extern from "hello2.c":
    void f()

cpdef myf():
    f()
I get:
In file included from hello2_caller.c:219:
hello2.c:3: warning: function declaration isn’t a prototype
So I guess I'm failing to provide a header in some way... though just feeding setup.py a standard header like 'hello2.h' doesn't work. Can you point me to a working example or explain what I'm doing wrong? Thanks.
Thanks to help from the Cython users' list here.
My writeup here.
Bottom line: this is only a warning, that is not fixed by a declaration of f(), but the compiled .so works. I'm still not sure how you would provide a .h file to Cython or if there is a better way to do this.
And there are a couple of errors: #import should be #include, and the .c file should not be listed in sourcefiles.

CUDA "invalid device symbol"

The skeleton of the code is
a_kernel.cu
__constant__ unsigned char carray[256];
a.cu
#include <a_kernel.cu>
...
unsigned char h_carray[256];
...
cudaMemcpyToSymbol("carray", h_carray, 256);
The system configuration is
Windows7 64bit
CUDA toolkit 3.1, SDK 3.1
GeForce GTX 460
rules file in SDK 3.1
I get the "invalid device symbol" error string at cudaMemcpyToSymbol.
Any help would be appreciated. :)
It would help if you could post some code to reproduce the problem, perhaps you could do this on the CUDA forums. Having said that, __constant__ variables have static (i.e. translation unit) scope. The simplest structure to follow would be as follows. Note that it may also be worth checking out CUDA 3.2.
host_code.cpp:
#include "cuda_funcs.h"
...
{
    unsigned char h_carray[256];
    cudaMemcpyToSymbol("carray", h_carray, 256);
    processOnGpu(...);
}
...
cuda_funcs.h:
void processOnGpu(...);
cuda_funcs.cu:
__constant__ unsigned char carray[256];

__global__ void kernel(...)
{
    ...
}

void processOnGpu(...)
{
    ...
    kernel<<<...>>>(...);
    ...
}
Check the cudaMemcpyToSymbol entry in the CUDA reference manual.
You can also pass the kind (direction) of the memory copy explicitly; note that the constant is cudaMemcpyHostToDevice, not cudaHostToDevice:
cudaMemcpyToSymbol("carray", h_carray, 256, 0, cudaMemcpyHostToDevice);