Intel C++ Composer and CUDA

I created a default CUDA project in Visual Studio 2008. It builds fine with the Microsoft compiler, but when I switch to Intel C++ Composer it fails as shown below:
1>------ Rebuild All started: Project: testCUDA, Configuration: Debug Win32 ------
1>Deleting intermediate files and output files for project 'testCUDA', configuration 'Debug|Win32'.
1>Compiling with CUDA Build Rule... (Microsoft VC++ Environment)
1>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\\bin\nvcc.exe" -G -gencode=arch=compute_10,code=\"sm_10,compute_10\" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --machine 32 -ccbin "D:\Microsoft Visual Studio 9\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\\include" -maxrregcount=0 --compile -o "Debug/kernel.cu.obj" kernel.cu
1>nvcc : fatal error : A single input file is required for a non-link phase when an outputfile is specified
1>Project testCUDA : error: A tool returned an error code from "Compiling with CUDA Build Rule..."
1>Build log was saved at "file://C:\Users\JSC\Documents\Visual Studio 2008\Projects\testCUDA\testCUDA\Debug\BuildLog.htm"
1>testCUDA - 2 error(s), 0 warning(s), 0 remark(s)
========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========
My platform is Windows 7 (32-bit) with CUDA 5.0. I have tried Intel C++ compiler versions from 11.1 through Composer XE 2011 and even Composer XE 2013, and every version produces the same error.
Your help will be highly appreciated!

As explained in the CUDA 5.0 Release Notes, on Windows only the Visual C++ 9.0 and 10.0 compilers are supported as host compilers.
On Linux only GCC is supported (see the release notes linked above for the specific versions).
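To make a build with an unsupported host compiler fail with a clear message, or simply to see which compiler nvcc actually invoked through -ccbin, a small preprocessor check can help. This is a minimal sketch; the function name is hypothetical and the supported list is only the summary given above (Visual C++ 9.0/10.0 on Windows, GCC on Linux). The order of the tests matters: the Intel compiler also defines _MSC_VER on Windows and __GNUC__ on Linux, and Clang defines __GNUC__, so those are ruled out first.

```cuda
// Hypothetical helper: reports which host compiler built this translation
// unit and whether it is on the CUDA 5.0 supported list as summarized above.
bool host_compiler_supported_by_cuda50(const char **name) {
#if defined(__INTEL_COMPILER)
    // Must come first: Intel C++ also defines _MSC_VER / __GNUC__.
    *name = "Intel C++";
    return false;
#elif defined(__clang__)
    // Clang defines __GNUC__ too, so rule it out before the GCC test.
    *name = "Clang";
    return false;
#elif defined(_MSC_VER)
    *name = "Visual C++";
    // 1500 = Visual C++ 9.0 (VS2008), 1600 = Visual C++ 10.0 (VS2010).
    return _MSC_VER == 1500 || _MSC_VER == 1600;
#elif defined(__GNUC__)
    *name = "GCC";
    return true;  // the exact supported GCC versions are in the release notes
#else
    *name = "unknown";
    return false;
#endif
}
```

Calling it from main() in a .cu file and printing name shows what nvcc used for the host code of that file.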

Related

Cannot build with compute capability above 2.0

I am using VS2010 and CUDA 6.5. When I specify compute_20,sm_20 in Project Properties -> CUDA C/C++ -> Device, the code builds without problems.
However, when I add two more architectures, like this:
compute_20,sm_20
compute_35,sm_35
compute_52,sm_52
the build fails with the following error message:
1>C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\CUDA 6.5.targets(593,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\" -gencode=arch=compute_35,code=\"sm_35,compute_35\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_20,code=\"sm_20,compute_20\" -gencode=arch=compute_35,code=\"sm_35,compute_35\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --use-local-env --cl-version 2010 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\x86_amd64" -IC:\FAWKESBASE\Release\INC -IC:\FAWKESBASE\Release\INC -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" --keep-dir x64\Release -maxrregcount=0 --machine 64 --compile -cudart static -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MD " -o x64\Release\FilterSino.cu.obj "D:\SW_ImageChan_64Slice\RecCWinLibAxial64\FilterSino.cu"" exited with code 1.
1>
I tried 35 and 52, and both failed. It seems the build fails whenever the compute capability is higher than 2.0. Does anyone have a pointer? Thanks a lot.
UPDATE:
It looks like the underlying error is:
nvcc fatal : Unsupported gpu architecture 'compute_52'
So is it possible to target 5.2 at this point? Is this a VS2010 problem or a CUDA 6.5 problem? I have a card with compute capability 5.2, which is why I added this option to my build.
The first release of CUDA 6.5 doesn't support devices of compute capability 5.2 or newer. NVIDIA released an updated version of 6.5 with support for the GTX 9xx family of GPUs (sm_52 architecture), which you could try; otherwise you need to install CUDA 7.0 or newer to compile for that architecture.
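Before adding an architecture to the Code Generation list, it can help to ask the runtime what the installed card actually reports. The sketch below uses cudaGetDeviceProperties and turns the major/minor compute capability into -gencode options; gencode_flags and print_device_gencodes are hypothetical helper names. Keep in mind that a correct flag only helps if the installed toolkit knows the architecture: as the answer says, compute_52 needs the updated CUDA 6.5 release or CUDA 7.0 and newer.

```cuda
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Hypothetical helper: SASS for sm_XY plus PTX for compute_XY, written as
// two -gencode options (the PTX copy lets newer GPUs JIT-compile the code).
std::string gencode_flags(int major, int minor) {
    const std::string cc = std::to_string(major * 10 + minor);
    return "-gencode=arch=compute_" + cc + ",code=sm_" + cc +
           " -gencode=arch=compute_" + cc + ",code=compute_" + cc;
}

// Prints name, compute capability and suggested flags for every device.
// Returns the number of devices found (0 if the runtime reports an error).
int print_device_gencodes() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        std::printf("device %d: %s, compute capability %d.%d\n  %s\n",
                    dev, prop.name, prop.major, prop.minor,
                    gencode_flags(prop.major, prop.minor).c_str());
    }
    return count;
}
```

In the Visual Studio property pages, the same pair of options corresponds to an entry such as compute_52,sm_52 under CUDA C/C++ -> Device -> Code Generation.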

Compiling .cu files with Dynamic Parallelism (CUDA)

I switched to a new GPU, a GeForce GTX 980 with compute capability 5.2, so it should support dynamic parallelism. However, I could not compile even a simple example from the programming guide. I won't reproduce it here since it isn't necessary: it is just a __global__ kernel launching another __global__ kernel.
1) I use VS2013. In Property Pages -> CUDA C/C++ -> Device, I changed the Code Generation property to compute_35,sm_35, and this is the output:
1>------ Build started: Project: testCublas3, Configuration: Debug Win32 ------
1> Compiling CUDA source file kernel.cu...
1>
1> C:\programs\misha\cuda\Projects\test projects\testCublas3\testCublas3>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\bin\nvcc.exe" -gencode=arch=compute_35,code=\"sm_35,compute_35\" --use-local-env --cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" -G --keep-dir Debug -maxrregcount=0 --machine 32 --compile -cudart static -g -DWIN32 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -o Debug\kernel.cu.obj "C:\programs\misha\cuda\Projects\test projects\testCublas3\testCublas3\kernel.cu"
1>C:/programs/misha/cuda/Projects/test projects/testCublas3/testCublas3/kernel.cu(13): error : kernel launch from __device__ or __global__ functions requires separate compilation mode
1> kernel.cu
1>C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V120\BuildCustomizations\CUDA 6.5.targets(593,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\bin\nvcc.exe" -gencode=arch=compute_35,code=\"sm_35,compute_35\" --use-local-env --cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" -G --keep-dir Debug -maxrregcount=0 --machine 32 --compile -cudart static -g -DWIN32 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -o Debug\kernel.cu.obj "C:\programs\misha\cuda\Projects\test projects\testCublas3\testCublas3\kernel.cu"" exited with code 2.
I guess I need an additional option for this compilation, -rdc=true, but I couldn't find where to set it in VS2013.
2) When I set the Code Generation property to compute_52,sm_52, I get the error "Unsupported gpu architecture 'compute_52'", even though my card's compute capability is 5.2. Does that mean I can compile for compute capability 3.5 at most?
Thanks
Regarding item 1, CUDA dynamic parallelism requires separate compilation and linking (-rdc=true), as well as linking in the device runtime library (-lcudadevrt). Dynamic parallelism code that also calls CUBLAS additionally requires the device CUBLAS library (-lcublas_device). Possibly the simplest way to see where all of these settings go in a Visual Studio project is to start from the Visual Studio project of the device CUBLAS sample.
Regarding item 2, compute capability 5.2 (your GTX 980) is not recognized because you need the latest update of the CUDA 6.5 toolkit, which is available here.
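The requirements in item 1 can be illustrated with a minimal parent/child program. This is a sketch under assumptions: a GPU of compute capability 3.5 or higher, a toolkit that supports it, and hypothetical kernel, helper and file names (parent, child, run_dynamic_parallelism_demo, dp.cu). On the command line the relevant switches are -rdc=true and -lcudadevrt. In the Visual Studio integration I believe these correspond to CUDA C/C++ -> Common -> Generate Relocatable Device Code = Yes and adding cudadevrt.lib to Linker -> Input -> Additional Dependencies, but the device CUBLAS sample project mentioned above is the safer reference.

```cuda
// Build sketch (use your own GPU's sm_XX, 35 or higher):
//   nvcc -arch=sm_52 -rdc=true dp.cu -o dp -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

// Child grid: each thread writes the index of the parent that launched it.
__global__ void child(int *out, int parentIdx) {
    out[parentIdx * blockDim.x + threadIdx.x] = parentIdx;
}

// Parent grid: each thread launches its own child grid from the device.
// This device-side launch is what requires -rdc=true and cudadevrt.
__global__ void parent(int *out, int childThreads) {
    child<<<1, childThreads>>>(out, threadIdx.x);
}

// Returns true if every element was written by the expected parent thread.
// A parent grid does not complete until the child grids it launched have
// finished, so one host-side cudaDeviceSynchronize() covers both levels.
bool run_dynamic_parallelism_demo() {
    const int parents = 4, childThreads = 8, n = parents * childThreads;
    int *d_out = NULL;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return false;
    parent<<<1, parents>>>(d_out, childThreads);
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaDeviceSynchronize() == cudaSuccess;
    int h_out[n];
    ok = ok && cudaMemcpy(h_out, d_out, sizeof(h_out),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < n; ++i) ok = (h_out[i] == i / childThreads);
    cudaFree(d_out);
    return ok;
}
```

Without -rdc=true, nvcc rejects the launch inside parent with the "requires separate compilation mode" error shown in the question.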
(Note that the cublas_device capability has been removed from recent versions of CUDA.)

How to change the compute_XX and sm_XX parameters in Visual Studio 2010?

My graphics card is an EVGA GTX 550 Ti with compute capability 2.1. I want to set Code Generation to compute_20,sm_21 in Configuration Properties in Visual Studio so that I can use dynamic global memory allocation on the device. I followed this link, but after changing compute_10,sm_10 to compute_20,sm_21, the compiler still uses the previous parameters. The message in the output window is:
1>C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\CUDA 5.0.targets(498,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2010 -ccbin "E:\Programs\Microsoft Visual Studio 10.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -G --keep-dir "Debug" -maxrregcount=0 --machine 32 --compile -arch=sm_20 -g -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -o "Debug\kernel.cu.obj" "C:\Users\Mahdi\Documents\Visual Studio 2010\Projects\Paralllel SW Algorithm\Paralllel SW Algorithm\kernel.cu"" exited with code 2.
Specifications:
Microsoft visual studio 2010
Nsight Visual studio v3.0
CUDA Toolkit v5.0 64bit
If you added -arch=sm_20 in the "Additional Options" area, that won't work; you must change the setting in the relevant project properties area. Instead of following the question in that link, you should have followed its first (accepted) answer:
1) Delete the addition you made in the "Additional Options" area.
2) With the project properties dialog open, on the left-hand side under "Configuration Properties", select "Device" under CUDA C/C++.
3) In the right-hand pane there is a drop-down box for "Code Generation"; select "compute_20,sm_21". Since you are building the Debug configuration, make sure to make this change for Debug (the configuration is selected at the top of the dialog box). You probably want to make the same change for the Release configuration as well.
That is, instead of following the picture in the question, follow the picture in the accepted answer.
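In-kernel malloc()/free() (the dynamic global memory allocation the question mentions) needs compute_20 or newer, so a stale compute_10 setting like the one still visible in the nvcc command above is the likely reason the build fails. A minimal sketch, with hypothetical kernel and helper names, that turns such a stale setting into an explicit compile-time error via __CUDA_ARCH__. The build line matches the CUDA 5.0 toolkit from the question; later toolkits no longer accept compute_20, so there you would target your own GPU's architecture instead.

```cuda
// Build sketch (CUDA 5.0 era):
//   nvcc -gencode=arch=compute_20,code=sm_21 heap.cu -o heap
#include <cstdio>
#include <cuda_runtime.h>

// __CUDA_ARCH__ is defined only while compiling device code; its value is
// the virtual architecture (100 for compute_10, 200 for compute_20, ...).
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 200
#error "in-kernel malloc/free needs compute_20 or newer; check Code Generation"
#endif

// Each thread allocates a small buffer on the device heap, uses it, frees
// it, and records whether the allocation and the round trip succeeded.
__global__ void alloc_per_thread(int *ok, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int *buf = static_cast<int *>(malloc(4 * sizeof(int)));
    if (buf == NULL) { ok[i] = 0; return; }
    for (int k = 0; k < 4; ++k) buf[k] = i + k;
    ok[i] = (buf[3] == i + 3);
    free(buf);
}

// Returns how many of n threads got a working allocation.
int count_successful_allocs(int n) {
    // The heap size must be set before any kernel that calls malloc() runs.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 16 * 1024 * 1024);
    int *d_ok = NULL;
    if (cudaMalloc(&d_ok, n * sizeof(int)) != cudaSuccess) return 0;
    alloc_per_thread<<<(n + 127) / 128, 128>>>(d_ok, n);
    int total = 0;
    if (cudaDeviceSynchronize() == cudaSuccess) {
        int *h_ok = new int[n];
        if (cudaMemcpy(h_ok, d_ok, n * sizeof(int),
                       cudaMemcpyDeviceToHost) == cudaSuccess)
            for (int i = 0; i < n; ++i) total += h_ok[i];
        delete[] h_ok;
    }
    cudaFree(d_ok);
    return total;
}
```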

GASS.CUDA.CUResult.ErrorInvalidImage error in CUDA.NET

I am trying to develop a Windows Forms application using CUDA.NET in Visual Studio C#. For that I need to create a cubin file, and here is my problem:
I tried to create the cubin file using Visual Studio. I changed the setting in Project Properties->Configuration Properties->Cuda Runtime API->GPU->NVCC Compilation Type from "Generate hybrid object file (--compile / -c)" to "Generate 64 bit .cubin file (-m64 -cubin)".
But then I got the error "fatal error LNK1181: cannot open input file '.\Debug\histogram256.cu.obj'": the linker cannot find the object files for some of my .cu files.
So I changed the "NVCC Compilation Type" setting to (-m64 -cubin) only for those .cu files and compiled them individually by right-clicking them. However, no .obj or .cubin file was created.
Then I tried compiling from the command line, using the command copied from the .cu file's Property Page->Cuda Runtime API->Command Line:
" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --machine 32 -ccbin "c:\Program Files\Microsoft Visual Studio 9.0\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\include" -maxrregcount=0 --compile -o "Debug/device.cubin" "device.cu""
device.cu is my CUDA file.
This created the .cubin file, but CUDA.NET then reports the error "GASS.CUDA.CUResult.ErrorInvalidImage".
Do you have any suggestions?
1: Make sure your video card supports the architecture (the -gencode arch and code values) you specify.
2: You are compiling for 32-bit, so make sure the .NET side is also built as 32-bit and references the 32-bit CUDA.NET.
3: Consider using managedCUDA (on CodePlex) instead; it is much better, and it comes with samples and documentation.
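ErrorInvalidImage presumably surfaces the driver API's CUDA_ERROR_INVALID_IMAGE, which indicates an invalid kernel image or module. One thing worth checking in the command line above: it passes --compile, which produces a host object file regardless of the name given to -o, so Debug/device.cubin is most likely not a cubin at all; nvcc's -cubin option (with a single target architecture) is what emits one. The sketch below uses the plain driver API (cuInit, cuModuleLoad, cuModuleGetFunction) to test a module outside CUDA.NET; try_load_module, report and myKernel are hypothetical names, and the kernel is assumed to be declared extern "C" so its name is not mangled.

```cuda
// Build the module (sketch, one real architecture):
//   nvcc -cubin -arch=sm_20 -o device.cubin device.cu
// where device.cu declares, e.g.:  extern "C" __global__ void myKernel(float *p)
// Build this checker:  nvcc check_cubin.cu -o check_cubin -lcuda
#include <cstdio>
#include <cuda.h>

// Loads `path` as a module on device 0 and looks up `kernel`.
// Returns the first driver API error, or CUDA_SUCCESS.
CUresult try_load_module(const char *path, const char *kernel) {
    CUresult r = cuInit(0);
    if (r != CUDA_SUCCESS) return r;
    CUdevice dev;
    if ((r = cuDeviceGet(&dev, 0)) != CUDA_SUCCESS) return r;
    CUcontext ctx;
    if ((r = cuDevicePrimaryCtxRetain(&ctx, dev)) != CUDA_SUCCESS) return r;
    if ((r = cuCtxSetCurrent(ctx)) == CUDA_SUCCESS) {
        CUmodule mod;
        r = cuModuleLoad(&mod, path);  // a bad or mismatched image fails here
        if (r == CUDA_SUCCESS) {
            CUfunction fn;
            // CUDA_ERROR_NOT_FOUND here usually means a C++-mangled name.
            r = cuModuleGetFunction(&fn, mod, kernel);
            cuModuleUnload(mod);
        }
    }
    cuDevicePrimaryCtxRelease(dev);
    return r;
}

// Prints the symbolic name of a driver API result code.
void report(CUresult r) {
    const char *name = NULL;
    cuGetErrorName(r, &name);
    std::printf("%s\n", name ? name : "unknown CUresult");
}
```

Running report(try_load_module("Debug/device.cubin", "myKernel")) on the file from the question should show whether the driver itself accepts it, independently of CUDA.NET.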

Nsight skips (ignores) breakpoints in VS10: CUDA works fine, but Nsight consistently skips several breakpoints

I'm using Nsight 2.2, CUDA Toolkit 4.2 and the latest NVIDIA driver, with a couple of GPUs in my computer, and the 4.2 build customization. I have set "generate GPU output" in the CUDA project properties, and the Nsight Monitor is on (everything looks fine).
I set several breakpoints in my __global__ kernel function. Nsight stops at the function declaration but skips several breakpoints, as if it decided on its own whether to hit each one. Oddly, Nsight stops at for loops but not at simple assignments.
Another problem is that I can't set focus on variables or add them to the watch list. In this case (see the attached screenshot) I can't resolve the values of the variables "posss" and "testDetctoinRate1", which are held in registers. Shared-memory or block-level variables, on the other hand, are added to the Locals list automatically.
Here is a screen shot of the kernel, before debugging
Here is a screen shot during debugging
I invoke my kernel function with the following call:
checkCUDA<<<1, 32>>>(sumMat->rows, sumMat->cols, (UINT *)pGPUsumMat);
cudaError = cudaGetLastError();
if (cudaError != cudaSuccess)
{
    printf("CUDA error: %s\n", cudaGetErrorString(cudaError));
    exit(-1);
}
The kernel call completes without an error.
Is there any option to force Nsight to stop at all breakpoints? How can I add a thread's register variables to my watch list?
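On the error check in the question: cudaGetLastError() immediately after a launch only reports problems with the launch itself (for example an invalid configuration or no binary for the GPU); an error that occurs while the kernel runs is only reported by a later synchronizing call. A minimal sketch of the usual pattern; fill_index, check and run_fill are hypothetical names, not the poster's checkCUDA kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Prints a failing result with a short label; returns true on success.
static bool check(cudaError_t err, const char *what) {
    if (err != cudaSuccess)
        std::fprintf(stderr, "CUDA error in %s: %s\n", what,
                     cudaGetErrorString(err));
    return err == cudaSuccess;
}

__global__ void fill_index(unsigned int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Runs fill_index on n elements and copies the result into host_out.
bool run_fill(unsigned int *host_out, int n) {
    unsigned int *d_out = NULL;
    if (!check(cudaMalloc(&d_out, n * sizeof(unsigned int)), "cudaMalloc"))
        return false;
    fill_index<<<(n + 31) / 32, 32>>>(d_out, n);
    bool ok = check(cudaGetLastError(), "kernel launch")          // launch errors only
           && check(cudaDeviceSynchronize(), "kernel execution")  // errors while it ran
           && check(cudaMemcpy(host_out, d_out, n * sizeof(unsigned int),
                               cudaMemcpyDeviceToHost), "cudaMemcpy");
    cudaFree(d_out);  // freed on every path after a successful cudaMalloc
    return ok;
}
```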
Update
Initially, my debug command line was:
# Runtime API (NVCC Compilation Type is hybrid object or .c file)
set CUDAFE_FLAGS=--sdk_dir "c:\Program Files\Microsoft SDKs\Windows\v7.0A\"
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin\nvcc.exe" --use-local-env --cl-version 2010 -ccbin "C:\Program Files\Microsoft Visual Studio 10.0\VC\bin" -I"..\..\..\opencv\modules\gpu\src\opencv2\gpu\device" -I"..\..\..\opencv\modules\gpu\include\opencv2\gpu" -I"..\..\..\build\include\\" -G --keep-dir "Debug" -maxrregcount=0 --machine 32 --compile -g -Xcompiler "/EHsc /nologo /Od /Zi /MDd " -o "Debug\%(Filename)%(Extension).obj" "%(FullPath)"
I then set Property Page --> CUDA --> Host --> Generate Host Debug Information to No.
Now my command line no longer contains the -g and -O flags; it is as follows:
# Runtime API (NVCC Compilation Type is hybrid object or .c file)
set CUDAFE_FLAGS=--sdk_dir "c:\Program Files\Microsoft SDKs\Windows\v7.0A\"
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin\nvcc.exe" --use-local-env --cl-version 2010 -ccbin "C:\Program Files\Microsoft Visual Studio 10.0\VC\bin" -I"..\..\..\opencv\modules\gpu\src\opencv2\gpu\device" -I"..\..\..\opencv\modules\gpu\include\opencv2\gpu" -I"..\..\..\build\include\\" -G --keep-dir "Debug" -maxrregcount=0 --machine 32 --compile -Xcompiler "/EHsc /nologo /Od /Zi /MDd " -o "Debug\%(Filename)%(Extension).obj" "%(FullPath)"
I do still build with -o, though; does that matter? Changing it makes no difference.
Right-click the .cu file in Solution Explorer, go to CUDA C/C++ | Device, and set Generate GPU Debug Information to Yes (-G0).
Check whether "Enable CUDA Memory Checker" under Nsight is turned on or off; changing it may allow Nsight to stop at the breakpoints in your CUDA kernel code in Debug mode in Visual C++ 2010. At least, that worked for me.
In the debug build, are you passing both the -O and the -g options to nvcc? If so, try removing the -O.
Background: This sounds like the kind of problem one gets when trying to debug code that has been optimized by the compiler. During optimization, the compiler changes the code in such a way that some lines of source no longer have any machine code instructions associated with them, making it impossible for the debugger to set breakpoints on those lines.
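A quick way to confirm that a .cu file really was built in device-debug mode is to test for it from device code. This sketch assumes an nvcc that defines __CUDACC_DEBUG__ when -G is given (recent nvcc documentation lists that macro; I have not checked the 4.2-era toolkit from the question). built_with_G, scale and device_debug_enabled are hypothetical names; scale only shows the kind of simple assignment that may have no instructions of its own in an optimized build.

```cuda
// Debug build sketch (device and host debug info, no host optimization):
//   nvcc -G -g -O0 -o dbg kernel.cu
// Optimized build, where some source lines may end up with no code:
//   nvcc -O3 -o rel kernel.cu
#include <cuda_runtime.h>

// Writes 1 to *flag if the device code was compiled in device-debug mode.
__global__ void built_with_G(int *flag) {
#ifdef __CUDACC_DEBUG__
    *flag = 1;
#else
    *flag = 0;
#endif
}

// Without -G, `scaled` may be folded straight into the store, leaving no
// instructions for its own line, so a breakpoint there may never be hit.
__global__ void scale(const float *in, float *out, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float scaled = in[i] * k;
    out[i] = scaled;
}

// Returns 1 if this file's kernels were built with -G, 0 if not, -1 on error.
int device_debug_enabled() {
    int *d_flag = NULL;
    int h_flag = -1;
    if (cudaMalloc(&d_flag, sizeof(int)) != cudaSuccess) return -1;
    built_with_G<<<1, 1>>>(d_flag);
    // cudaMemcpy waits for the kernel to finish before copying.
    if (cudaMemcpy(&h_flag, d_flag, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        h_flag = -1;
    cudaFree(d_flag);
    return h_flag;
}
```

If device_debug_enabled() returns 0 in a build you are trying to step through, Generate GPU Debug Information (-G) is not actually in effect for that file.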
I have a similar issue: Nsight does not stop at any of the breakpoints, but the program runs to completion.
If I use -G0 as the debug information option, it gives an error.
I am using NVIDIA 2.2.0.1225 with the CUDA 4.2 and CUDA 5 toolkits, and the 301.42 graphics driver.