Is there a Cuda call or even a Windows 7 call that will help me determine if NVENC is supported on a particular machine?
Thanks.
Related
I know that each CUDA toolkit has a minimum required driver, what I'm wondering is the following: suppose I'm loading each function pointer for each driver API function (e.g. cuInit) via dlsym from libcuda.so. I use no runtime API and neither link against cudart. My kernel uses virtual architectures to be JIT-ted at runtime (and the architecture is quite low, e.g. compute_30 so that I'm content with any kepler-and-above device).
Does the minimum driver required restriction still apply in my case?
Yes, there is still a minimum driver version requirement.
The GPU driver has a CUDA version that it is designed to be compatible with. This can be discovered in a variety of ways, one of which is to run the deviceQuery (or deviceQueryDrv) sample code.
Therefore a particular GPU driver will have a "compatibility" associated with a particular CUDA version.
In order to run correctly, Driver API codes will require an installed GPU Driver that is compatible with (i.e. has a CUDA compatibility version equal to or greater than) the CUDA version that the Driver API code was compiled against.
The CUDA/GPU Driver compatibility relationships, and the concept of forward compatibility, are similar to what is described in this question/answer.
To extend/generalize the ("forward") compatibility relationship statement from the previous answer, newer GPU Driver versions are generally compatible with older CUDA codes, whether those codes were compiled against the CUDA Runtime or CUDA Driver APIs.
While building my CUDA project I get the following error:
cutil_inline_runtime.h(328): error: identifier "CURAND_STATUS_DOUBLE_PRECISION_REQUIRED" is undefined
So I started googling. Since I couldn't find the solution (nor did I find the actual problem) I downloaded from nVIDIA CURAND guide pdf and started reading. In that pdf it says:
Enumerator:
...
**CURAND_STATUS_DOUBLE_PRECISION_REQUIRED** GPU does not have double precision required by MRG32k3a
...
This means I can't perform operations with double data type... right? Well that seems wrong because, I assure you, couple a days ago I made a project using double data type and it worked just fine. Does anyone have a suggestion? Thank you!
EDIT Since I was unable to find the "real" solution for the problem, eventually I commented out the lines with "CURAND_STATUS_DOUBLE_PRECISION_REQUIRED" in cutil_inline_runtime.h and now it works. I know it is "ugly" but it was the only thing that helped...
I also had this problem. The issue was that I was using different versions of the cuda SDK and the cuda toolkit. I was using a newer version of the SDK which referenced the definition of CURAND_STATUS_DOUBLE_PRECISION_REQUIRED, but that definition was undefined in curand.h in the older version of the cuda toolkit that I had installed.
Try installing the version of the toolkit that matches your cuda SDK version.
Try Googling "Compute Capability", it's how nvidia defines the various CUDA capabilities. In the CUDA C Programming guide, it's mentioned a couple of times that devices with compute capability 1.2 or lower do not support double precision FP math: CUDA C Programming, pages 140, and also check table F-1, namely the row about dpfp support.
Gefore 8800 GT is a G80 architecture chip, which is compute capability 1.0. You cannot do double computations on this card. Keep in mind much example code makes use of emulation modes, and the like, which can fool you into thinking it works.
is it possible to check if any CUDA devices are present before all cudaMalloc... commands are called?
im using C++ and i just want to print an error message before the program launches incase the user doesn't support cuda tech.
EDIT: if i can check it from C#, it will be even better.
thanks!
You can use cudaGetDeviceCount to get the number of cuda devices and use cuda device properties to retrieve your necessary compute capabilities.
A rather old version of the API documentation for cudaGetDeviceCount can be found here.
Is there a trace or debug function that I can place in a CUDA kernel? I'm writing a program on Windows 7, VS2010 and I just discovered that to use NSIGHT Monitor I have to have 2 GPUs. I'm on a laptop unfortunately so this isn't really an option. I figured that I would fall back to tried and true debug/trace functions. Something akin to OutputDebugString. I don't see if one exists though.
Any help is appreciated. Thanks,
mj
Parallel Nsight 2.2 supports local single GPU debugging.
CUDA compute capability >= 2.0 support printf from device code.
I upgraded CUDA GPU computing SDK and CUDA computing toolkit to 4.1. I was testing simpleStreams programs, but consistently it is taking more time that non-streamed execution. my device is with compute capability 2.1 and i'm using VS2008,windows OS.
This sample constantly has issues. If you tweak the sample to have equal duration for the kernel and memory copy the overlap will improve. Normally breadth first submission is better for concurrency; however, on WDDM OS this sample will usually have better overlap if you issue the memory copy right after kernel launch.
I noticed this as well. I thought it was just me but I didn't notice any improvement and tried searching the forums but didn't find anyone else with the issue.
I also ran the source code in the Cuda By Example book (which is really helpful and I recommend you pick it up if you're serious about GPU programming).
Chapter 10 examples has the progression of examples showing how streams should be used.
http://developer.nvidia.com/content/cuda-example-introduction-general-purpose-gpu-programming-0
But comparing the,
1. non-streamed version(which is basically the single stream version)
2. the streamed (incorrectly queued asyncmemcpy and kernel launch)
3. the streamed (correctly queued asyncmemcpy and kernel launch)
I find no benefit in using cuda streams. It might be a win7 issue as I found some sources online discussing that win vista didn't support the cuda streams correctly.
Let me know what you find with the example I linked. My setup is: Win7 64bit Pro, Cuda 4.1, Dual Geforce GTX460 cards, 8GB RAM.
I'm pretty new to Cuda so may not be able to help but generally its very hard to help without you posting any code. If posting is not possible then I suggest you take a look at Nvidia's visual profiler. Its cross platform and can show you were your bottlenecks are.