Why does CUDA CUdeviceptr use unsigned int instead of void*? [closed] - cuda

Just curious.
Why do functions in the driver API use unsigned int as CUdeviceptr, instead of void*?
The runtime API uses void*, though.

I believe the underlying reason is because a CUdeviceptr is a handle to an allocation in device memory and not an address in device memory. The driver looks up addresses internally from a memory map using this handle, and the internal driver API requires it to be an unsigned integer.
Tim Murray, who was at one stage in charge of CUDA driver development at NVIDIA, wrote this answer on another forum a few years ago. I think that is about as authoritative an answer as you will find (although Nick Wilt, who was the original CUDA driver author, also answers questions here on Stack Overflow occasionally and might chime in and provide a better answer than mine).
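To make the distinction concrete, here is a minimal sketch contrasting the two allocation paths; it uses the standard cuMemAlloc/cudaMalloc entry points, assumes a driver-API context has already been created, and omits error checking.

```cuda
#include <cuda.h>          // driver API
#include <cuda_runtime.h>  // runtime API

void allocation_example(size_t bytes)
{
    // Driver API: CUdeviceptr is an integer-typed handle, not a void*.
    // Assumes cuInit() has been called and a context is current.
    CUdeviceptr dptr;
    cuMemAlloc(&dptr, bytes);
    cuMemFree(dptr);

    // Runtime API: device memory is exposed directly through a void*.
    void *rptr;
    cudaMalloc(&rptr, bytes);
    cudaFree(rptr);
}
```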

Related

Why are CUDA kernels annotated with `__global__` instead of `__kernel__`? [closed]

Actually, the title already is the full question.
Why did Nvidia decide to call its GPU entry functions kernels, but require that in CUDA they be annotated with __global__ rather than __kernel__?
The goal is to separate the entity (a kernel) from its scope or location.
There are three function qualifiers that relate to your question:
__device__ functions can be called only from the device and execute only on the device.
__global__ functions can be called from the host and execute on the device.
__host__ functions are called from the host and run on the host.
If the qualifier had been named __kernel__, it could not express the distinction drawn above.
__global__ here means roughly "in the space shared between host and device", i.e. in the global area between them.
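A minimal sketch of how the three qualifiers combine (hypothetical kernel and function names, no error checking):

```cuda
#include <cuda_runtime.h>

__device__ float square(float x)               // callable only from device code
{
    return x * x;
}

__global__ void squareAll(float *data, int n)  // callable from host, runs on device
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = square(data[i]);
}

__host__ void launchSquareAll(float *devData, int n)  // ordinary host function
{
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    squareAll<<<blocks, threads>>>(devData, n);
}
```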

Understanding HPC Linpack (CUDA edition) [closed]

I want to know what role the CPUs play when HPC Linpack (the CUDA version) is running. They receive data from other cluster nodes and perform CPU-GPU data exchange, don't they? So their work does not influence performance, does it?
In typical usage, both the GPU and the CPU contribute to the numerical calculations. The host code will use MKL or another BLAS implementation for host-generated numerical results, and the device code will use CUBLAS or something related for device numerical results.
A version of HPL is available to registered developers in source code format, so you can inspect all this yourself.
And as you say the CPUs are also involved in various other administration activities such as internode data exchange in a multinode setting.
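As a rough sketch of that division of labour (this is not HPL's actual code), the same DGEMM can be issued either to a host BLAS such as MKL or to CUBLAS, with an HPL-like driver deciding how much of each update goes to which side:

```cuda
#include <cblas.h>       // host BLAS (MKL, OpenBLAS, ...)
#include <cublas_v2.h>   // device BLAS

// C = A * B on the CPU, column-major, delegated to the host BLAS library.
void dgemm_host(int m, int n, int k,
                const double *A, const double *B, double *C)
{
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, A, m, B, k, 0.0, C, m);
}

// The same operation on the GPU; dA, dB, dC are device pointers.
void dgemm_device(cublasHandle_t handle, int m, int n, int k,
                  const double *dA, const double *dB, double *dC)
{
    const double alpha = 1.0, beta = 0.0;
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                m, n, k, &alpha, dA, m, dB, k, &beta, dC, m);
}
```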

Platform vs Software Framework [closed]

CUDA advertises itself as a parallel computing platform. However, I'm having trouble seeing how it's any different from a software framework (a collection of libraries used for some functionality). I am using CUDA in class and all I'm seeing is that it provides libraries in C for - functions that help in parallel computing on the GPU - which fits my definition of a framework. So tell me, how is a platform like CUDA different from a framework? Thank you.
CUDA, the hardware platform, is the actual GPU and its scheduler (the "CUDA architecture"). However, CUDA is also a programming language, which is very close to C. To work with software written in CUDA you also need an API for calling these functions, allocating memory and so on from your host language. So CUDA is a platform, a language and a set of APIs.
If the latter (a set of APIs) matches your definition of a software framework, then the answer is simply yes, as both options are true.

What CUDA GPU should I buy? [closed]

I am new to CUDA and going to buy a GPU that will be sufficient for my needs without spending much. I will be working on an application that will require graphics rendering as well as other general purpose computations.
What should be my primary considerations when buying?
No. of SMs
No. of CUDA Cores
Core/Shader/Memory Clock
Memory Size
Memory Bus width
How do the above-mentioned specifications affect CUDA performance?

Resource acquisition failure handling [closed]

After years of programming, I haven't had a situation where a reasonable malloc or new would fail (maybe because my mallocs are truly reasonable), though I always check for it.
In my case, apps should gracefully (I hope) close with an appropriate log entry. What would you do in this case? It's interesting to hear your approach: do you wait for resources or close up shop?
I usually have my program shut down as gracefully as it can, with simple logging of the error message. In C++ I do this by having a catch for std::bad_alloc in main(). By the time the catch executes, destructors called by stack unwinding should have freed some memory, so the logging itself is less likely to fail. I avoid memory allocation (for example by using char * strings rather than std::string) in that logging code, to further reduce the chance of the logging failing.
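A minimal sketch of that pattern, assuming the real work lives in a hypothetical run_application() (host-side C++ only):

```cuda
#include <cstdio>
#include <new>       // std::bad_alloc

int run_application();   // hypothetical: the real work, which may allocate

int main()
{
    try {
        return run_application();
    } catch (const std::bad_alloc &) {
        // By the time we get here, stack unwinding has released some memory.
        // Log with a plain char* message so the logging itself does not allocate.
        std::fprintf(stderr, "fatal: out of memory, shutting down\n");
        return 1;
    }
}
```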
There's pretty much nothing you can do if dynamic allocation fails; almost no operations are written to handle that situation. If it fails, just let the app crash.