Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
I know the make_float4 constructor is in vector_functions.h, but which is the header file that implements float4 operation in CUDA?
Thanks.
I don't believe there is a standard cuda header file (i.e. one that will be found by nvcc automatically, such as those in /usr/local/cuda/include) that implements a variety of float4 operators.
However the "helper" header file at:
/usr/local/cuda/samples/common/inc/helper_math.h
(example path on linux) which gets installed with the cuda samples, defines a number of arithmetic operators on float4 quantities.
Related
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 9 days ago.
Improve this question
I have a deep learning algorithm using CUDA. I have an RTX3060 graphics card, but only 13% is used when running code. What should I do to make my code run faster and use my graphics card at full performance?
I did the nvidia's cuda driver installations properly
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 10 months ago.
Improve this question
I'm new to CUDA, and so far all the tutorials I've seen are for arrays.
I am wondering if you can define something like double variable on CUDA, or does something like that have to live on the CPU?
You can have scalar variable as a kernel parameter, as a private variable, as a shared memory variable and even as a global compilation unit variable.
You can have scalar fields in classes, array of structs, struct of arrays, anything that uses a plain old data. You can use typedef, define macro and any bit level hacking as long as the variable is loaded/stored with proper alignment.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 1 year ago.
Improve this question
Actually, the title already is the full question.
Why did Nvidia decide to call its GPU entry functions kernels, but in Cuda they must be annotated with __global__ instead of __kernel__?
The goal is to separate the entity (kernel) and its scope or location.
There three types of the functions which relate to your question:
__device__ functions can be called only from the device, and it is
executed only in the device.
__global__ functions can be called
from the host, and it is executed in the device.
__host__
functions run on the host, called from the host.
If they named functions scope __kernel__, it would be impossible to distinguish them in the way they are separated above.
The __global__ here means "in space shared between host and device" and in these terms in "global area between them".
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 1 year ago.
Improve this question
I tried creating a new enviroment as shown here:
https://github.com/microsoft/computervision-recipes/blob/master/SETUP.md
and I checked cuda is running well, my gpu is detected and everything seems well. but when I fit the model nvdia-smi shows no occupation on GPU and the CPU is at 100%.
Torch requires setting Cuda device for each model and training process, you need to pass all weights and another tensors to Cuda device by hand. Like that:
a = torch.tensor([1, 2, 3])
a = a.to('cuda')
But in your case, maybe you need to use the method from the library. I recommend checking this one https://github.com/microsoft/computervision-recipes/blob/1bb489af757fde7c773e16fab87b24305cff4457/utils_cv/tracking/references/fairmot/trains/base_trainer.py#L32
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
What is the difference these two definitions?
If no, does it mean, I will be never able to run code with sm > 21 on the gpu with compute level 2.1?
That's correct. For a compute capability 2.1 device, the maximum code specification (virtual architecture/target architecture) you can give it is -arch=sm_21 Code compiled for -arch=sm_30 for example, would not run correctly on a cc 2.1 device
For more information, you can take a look at the nvcc manual section which covers virtual architectures, as well as the manual section which covers the compile switches specifying virtual architecture and compile targets (code architecture).