Suppress "stack size cannot be dynamically determined" warnings? - cuda

I'm getting a CUDA warning saying
ptxas warning : Stack size for entry function '_Z13a_test_kernelv' cannot be
statically determined.
Now, I know what it means, and there's an SO question about why it happens. What I want is to suppress the warning (when compiling with nvcc 10.x). Can I? If so, where exactly do I put the warning suppression #pragma for this?

Add --nvlink-options -suppress-stack-size-warning when compiling with nvcc.

Related

How can I disable the ptxas warning about indeterminable stack size?

When compiling CUDA device code, you may get the following warning (line break added for readability):
ptxas warning : Stack size for entry function '_ZN7kernels11print_stuffIiEEvv'
cannot be statically determined
This can have several causes, such as dynamic memory allocation or recursion, but those don't matter right now. I want to disable the warning, within some function at least. The thing is, I don't know which token to use to do so. It's no use searching this list (following the suggestion in an SO answer about disabling specific warnings), because those are warnings from the C/C++ front end of NVCC, not the assembler.
So how can I disable this warning?
The important point to note is that this is an assembler warning, so none of the usual front end warning suppression options are relevant.
ptxas only supports a very limited number of warning control options. Prior to CUDA 9, only the following were supported:
--suppress-double-demote-warning (-suppress-double-demote-warning)
Suppress the warning that is otherwise emitted when a double precision instruction
is encountered in PTX that is targeted for an SM version that does not have
double precision support
--disable-warnings (-w)
Inhibit all warning messages.
--warn-on-double-precision-use (-warn-double-usage)
Warning if double(s) are used in an instruction.
--warn-on-local-memory-usage (-warn-lmem-usage)
Warning if local memory is used.
--warn-on-spills (-warn-spills)
Warning if registers are spilled to local memory.
--warning-as-error (-Werror)
Make all warnings into errors.
In your case the only option would be to suppress all warnings. Adding -Xptxas='-w' to any nvcc invocation should achieve that.
CUDA 9 and newer add another ptxas option, which suppresses the warning you ask about:
--suppress-stack-size-warning (-suppress-stack-size-warning)
Suppress the warning that otherwise is printed when stack size cannot be
determined.
In this case, adding -Xptxas='-suppress-stack-size-warning' to any nvcc invocation should eliminate the warning.
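The version split described above can be captured in a small shell helper. This is a minimal sketch and not part of the original answer: the function name `ptxas_stack_warning_flag` is hypothetical, and it assumes the caller supplies the CUDA toolkit major version (for example, read off `nvcc --version` by hand).

```shell
#!/bin/sh
# Hypothetical helper: print the nvcc argument that silences the
# "stack size cannot be statically determined" ptxas warning.
# $1 is the CUDA toolkit major version (e.g. 8, 9, 10), supplied by the caller.
ptxas_stack_warning_flag() {
    if [ "$1" -ge 9 ]; then
        # CUDA 9 and newer: suppress only the stack-size warning.
        echo "-Xptxas=-suppress-stack-size-warning"
    else
        # Older toolkits: the only choice is to silence every ptxas warning.
        echo "-Xptxas=-w"
    fi
}
```

It could be used as `nvcc $(ptxas_stack_warning_flag 10) -c kernel.cu`, where `kernel.cu` is a placeholder file name. Falling back to `-w` on older toolkits also hides every other ptxas warning, so it may be worth keeping that branch out of regular builds.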

How to disable all unwanted nvcc compiler warnings

I'm getting so many warnings from the nvcc compiler, such as "was set but never used", "was declared but never referenced", "expression has no effect" etc. In one post, someone suggested a brute-force #pragma for each warning. This is impractical because I have so many warnings. I tried -w, but that didn't work. In another post, someone suggested -Xcompiler "-w", but that didn't work for me either. How can I suppress all nvcc warnings?
As per the documentation, --disable-warnings or -w will disable all nvcc (technically CUDA toolchain) generated warnings. As a rule, I counsel against ignoring compiler warnings. They exist for a reason.

CUDA code too large to be expanded [duplicate]

I have a CUDA class, let's call it A, defined in a header file. I have written a test kernel which creates an instance of class A, which compiles fine and produces the expected result.
In addition, I have my main CUDA kernel, which also compiles fine and produces the expected result. However, when I add code to my main kernel to instantiate an instance of class A, the nvcc compiler fails with a segmentation fault.
Update:
To clarify, the segmentation fault happens during compilation, not when running the kernel. The line I am using to compile is:
`nvcc --cubin -arch compute_20 -code sm_20 -I<My include dir> --keep kernel.cu`
where <My include dir> is the path to my local path containing some utility header files.
My question is, before spending a lot of time isolating a minimal example exhibiting the behaviour (not trivial, due to relatively large code base), has anyone encountered a similar issue? Would it be possible for the nvcc compiler to fail and die if the kernel is either too long or uses too many registers?
If an issue such as register count can affect the compiler this way, then I will need to rethink how to implement my kernel to use fewer resources. This would also mean that trimming things down to a minimal example will likely make the problem disappear. However, if this is not even a possibility, I don't want to waste time on a dead-end, but will rather try to cut things down to a minimal example and will file a bug report to NVIDIA.
Update:
As per the suggestion of @njuffa, I reran the compilation with the -v flag enabled. The output ends with the following:
#$ ptxas -arch=sm_20 -m64 -v "/path/to/kernel_ptx/kernel.ptx" -o "kernel.cubin"
Segmentation fault
# --error 0x8b --
This suggests the problem is due to the ptxas program, which is failing to generate a CUDA binary from the ptx file.
This would appear to have been a genuine bug of some sort in the CUDA 5.0 ptxas assembler. It was reported to NVIDIA and we can assume that it was fixed sometime during the more than three years since the question was asked and this answer added.
[This answer has been assembled from comments and added as a community wiki entry to get this question off the unanswered question list ]

Computation between two different kernels in Cuda [duplicate]

I'm writing a CUDA program but I'm getting the obnoxious warning:
Warning: Cannot tell what pointer points to, assuming global memory space
this is coming from nvcc and I can't disable it.
Is there any way to filter out warning from third-party tools (like nvcc)?
I'm asking for a way to filter out of the output window log errors/warnings coming from custom build tools.
I had the same annoying warnings, I found help on this thread: link.
You can either remove the -G flag on the nvcc command line,
or
change the compute_10,sm_10 to compute_20,sm_20 in the Cuda C/C++ options of your project if you're using Visual Studio.
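If neither change is possible and the build runs from a command line rather than inside the IDE, the message can also be filtered out of the tool's output. The following is a minimal sketch under that assumption; the function name `filter_pointer_warning` is hypothetical, and it does nothing for the Visual Studio output window itself.

```shell
#!/bin/sh
# Hypothetical filter: drop lines containing one known warning message
# and pass everything else through unchanged.
filter_pointer_warning() {
    # -F: match a fixed string; -v: keep only non-matching lines.
    # "|| true" prevents a non-zero status when every line is filtered out.
    grep -v -F "Cannot tell what pointer points to" || true
}
```

A possible invocation is `nvcc -c kernel.cu 2>&1 | filter_pointer_warning`, with `kernel.cu` as a placeholder. Note that piping replaces nvcc's exit status with the filter's, so a build script would need to check for failures separately.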

nvcc -Xptxas -v compiler flag has no effect

I have a CUDA project. It consists of several .cpp files that contain my application logic and one .cu file that contains multiple kernels plus a __host__ function that invokes them.
Now I would like to determine the number of registers used by my kernel(s). My normal compiler call looks like this:
nvcc -arch compute_20 -link src/kernel.cu obj/..obj obj/..obj .. -o bin/..exe -l glew32 ...
Adding the "-Xptxas -v" compiler flag to this call unfortunately has no effect. The compiler still produces the same textual output as before. The compiled .exe also works the same way as before with one exception: My framerate jumps to 1800fps, up from 80fps.
I had the same problem, here is my solution:
Compile the *.cu files into device-only *.ptx files; this discards the host code:
nvcc -ptx *.cu
Compile *ptx file:
ptxas -v *.ptx
The second step shows the number of registers used by each kernel and the amount of shared memory used.
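To condense that verbose output into one line per kernel, something like the following could be used. This is a sketch only: the function name `ptxas_register_report` is hypothetical, and it assumes ptxas prints a "Compiling entry function '<name>'" line followed by a "Used <N> registers" line for each kernel, which may vary between toolkit versions.

```shell
#!/bin/sh
# Hypothetical helper: read `ptxas -v` output on stdin and print one
# "<entry function> <register count>" line per kernel.
# Assumes ptxas reports lines of the form
#   ... Compiling entry function '<name>' for '<arch>'
#   ... Used <N> registers, ...
ptxas_register_report() {
    awk '
        /Compiling entry function/ {
            # The kernel name is the text between the first pair of single quotes.
            split($0, parts, "\047")
            name = parts[2]
        }
        /Used [0-9]+ registers/ {
            # Find the word "Used" and take the number that follows it.
            for (i = 1; i < NF; i++)
                if ($i == "Used") print name, $(i + 1)
        }
    '
}
```

A possible invocation is `ptxas -v kernel.ptx 2>&1 | ptxas_register_report`, where `kernel.ptx` is a placeholder for a file produced by the first step. The `2>&1` is there because ptxas normally writes its informational messages to standard error; depending on the PTX target, ptxas may also need an explicit `-arch` option.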
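Change compute_20 to sm_20 in your compiler call. That should fix it. With only a virtual architecture (compute_20), nvcc embeds PTX for JIT compilation and does not run ptxas at build time, so -v has nothing to print.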
When "-Xptxas -v" is combined with "-arch compute_XX", no verbose information (register count, etc.) is printed. To see the verbose output while still specifying the GPU architecture (-arch, -code) up front, do the following: run nvcc -arch compute_XX *.cu -keep, then ptxas -v *.ptx. This leaves many intermediate files behind, though. Certainly, kogut's answer is to the point.
When you compile with
nvcc --ptxas-options=-v
you may also need to check your build tool's default verbosity setting.
For example, in Visual Studio go to:
Tools->Options->ProjectsAndSolutions->BuildAndRun
then set the verbosity output to Normal.
Not exactly what you were looking for, but you can use the CUDA Visual Profiler shipped with the NVIDIA GPU Computing SDK. Besides much other useful information, it shows the number of registers used by each kernel in your application.