Learning PTX from scratch [closed] - cuda

I'd like to start learning PTX; where should I start? Is there a good book or other resource for this?
I already know x86/x64 assembly (more or less), if that helps.

It will help to be familiar with some other assembly language.
The definitive reference is the PTX ISA guide. Although it serves as a reference manual for the instruction set, it's fairly readable, and the first seven or so chapters build up from a relatively basic introduction to parallel thread execution and cover all the key concepts.
You may also be interested in the shorter document:
/usr/local/cuda/doc/pdf/Inline_PTX_Assembly.pdf
(on a standard Linux install; on Windows, just search for "Inline_PTX_Assembly.pdf". The PTX ISA 3.2 document is in the same directory.)
That document covers enough PTX that you can try out small snippets inline, without having to build a complete kernel in PTX if you don't want to.
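For example, here is a minimal sketch of my own (not taken from either document; the file name is just a placeholder) that embeds a single PTX add instruction in a CUDA kernel using the asm() syntax the inline-assembly document describes:

// inline_ptx_example.cu -- minimal illustration of inline PTX (hypothetical file name).
// Build with something like: nvcc inline_ptx_example.cu -o inline_ptx_example
#include <cstdio>
#include <cuda_runtime.h>

__global__ void add_with_ptx(const int *a, const int *b, int *c)
{
    int result;
    // One PTX instruction: signed 32-bit add of two register operands.
    asm("add.s32 %0, %1, %2;" : "=r"(result) : "r"(*a), "r"(*b));
    *c = result;
}

int main()
{
    int ha = 2, hb = 3, hc = 0;
    int *da, *db, *dc;
    cudaMalloc(&da, sizeof(int));
    cudaMalloc(&db, sizeof(int));
    cudaMalloc(&dc, sizeof(int));
    cudaMemcpy(da, &ha, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(db, &hb, sizeof(int), cudaMemcpyHostToDevice);

    add_with_ptx<<<1, 1>>>(da, db, dc);

    cudaMemcpy(&hc, dc, sizeof(int), cudaMemcpyDeviceToHost);
    std::printf("2 + 3 = %d\n", hc);   // expect 5

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}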
You should also be aware of the nvcc options that may be useful, such as -ptx to generate PTX code, -G to disable most optimizations (optimized PTX can be hard to understand), and -src-in-ptx, which interleaves your lines of kernel source code with the generated PTX to further help your understanding.
Finally, be aware that PTX is not actually what the machine runs, although it's close to it. PTX is an intermediate code, which will go through an additional compilation step to create SASS code, which is the actual machine code. You can inspect the SASS code as well, using the cuobjdump utility (cuobjdump -sass mycode), but SASS doesn't have the same level of documentation as PTX. So you should start with an understanding of PTX.
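As a concrete illustration, here is a sketch of my own (mykernel.cu is just a placeholder name) of a trivial kernel, with the commands I would use to generate the PTX and dump the SASS shown in the comments:

// mykernel.cu -- hypothetical file used only to demonstrate the workflow.
//
// Generate readable PTX with the source interleaved (optimizations disabled):
//   nvcc -ptx -G -src-in-ptx mykernel.cu -o mykernel.ptx
// Build an object file and disassemble the embedded machine code (SASS):
//   nvcc -c mykernel.cu -o mykernel.o
//   cuobjdump -sass mykernel.o

__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;   // maps to a small handful of PTX/SASS instructions
}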

Related

GPU Programming, CUDA or OpenCL or? [closed]

What is the best way to do GPU programming?
I know:
CUDA is very good, has a lot of developer support and is very nice to debug, but it only runs on NVIDIA hardware.
OpenCL is very flexible: it runs on NVIDIA, AMD and Intel hardware, on accelerators, GPUs and CPUs, but as far as I know it is no longer supported by NVIDIA.
Coriander (https://github.com/hughperkins/coriander), which converts CUDA to OpenCL.
HIP (https://github.com/ROCm-Developer-Tools/HIP) is made by AMD so that code can be written once and compiled for both AMD and NVIDIA (CUDA) hardware. It can also convert existing CUDA code to HIP.
OpenCL would be my preferred way, since I want to be very flexible in hardware support. But if it is no longer supported by NVIDIA, that is a knockout.
HIP then sounds best to me, with separately released files. But how good will support be for Intel's upcoming hardware?
Are there any other options?
Important for me is broad hardware support and long-term support, so that the code can still be compiled in a few years and is manufacturer-independent.
Additionally: it should be possible to use more than one compiler, with both Linux and Windows supported.
Nvidia won't cancel OpenCL support anytime soon.
A newly emerging approach for portable GPU code is SYCL. It enables higher-level programming from a single source file, which is then compiled twice: once for the CPU and once for the GPU. The GPU part then runs on the GPU via OpenCL, CUDA or some other backend.
As of right now, however, the best-supported GPU framework across platforms is OpenCL 1.2, which is very well established at this point. With it, your code runs on 10-year-old GPUs, on the latest and fastest data-center GPUs, on gaming and workstation GPUs, and even on CPUs if you need more memory. On NVIDIA GPUs there is no performance/efficiency tradeoff at all compared to CUDA; it runs just as fast.
Porting tools like HIP are great if you already have a large code base, but performance could suffer. My advice is to pick one framework and stay fully committed to it, rather than using a tool to generate a possibly poorly optimized port.
If you choose to start with OpenCL, have a look at this OpenCL-Wrapper. The native OpenCL C++ bindings are a bit cumbersome to use, and this lightweight wrapper simplifies learning a lot, while keeping functionality and full performance.

Dynamic Function Analysis [closed]

I found this program, which appears to help with locating when a function is called in a program. It seems quite handy, and I am wondering if there is more out there like it.
http://split-code.com/cda.html
https://www.youtube.com/watch?v=P0UXR861WYM
What exactly would this program be classified as? Are there other programs similar? Is this widely used and I'm just a fool?
As the link you provided states, this tool is a
dynamic code analysis process instrumentation tool
Dynamic: it is used to inspect programs at runtime.
Code analysis: it provides information about the code as it executes (arguably; see below).
Process: it analyzes code running in a process (specifically, a 32-bit x86 process under Windows).
Instrumentation: the tool uses debugging techniques to allow automatic tracing (into every inter-modular function call) and profiling. It also allows for PIN-like callbacks (although probably not as neatly implemented).
I must mention that the author's use of the word analysis is somewhat inaccurate. The software (as far as I understand it) does not analyze code; it only provides information about inter-modular and intra-modular calls gathered at runtime. IDA, on the other hand, is a real analysis tool, because it provides information such as cross-references and string views, which can only be produced by in-depth analysis.
There is no short name for this specific type of program; it would be classified as some sort of instrumentation software.

Why do I have to manually activate my GPUs? [closed]

I installed a new Intel Xeon Phi in a workstation that already has 3 NVIDIA GPUs installed. To make the Phi card work, I have to load Intel's MIC kernel module into my Linux kernel, and with that the Phi card works fine. However, every time we reboot the system we can't use the GPUs: the error message says the system can't find the CUDA driver.
However, the only thing I need to do to fix this is to use sudo to run one of the CUDA binaries or some NVIDIA command, such as "sudo nvidia-smi". Then everything works fine, both CUDA and the Intel Xeon Phi.
Does anybody know why? Without my sudo command, other people simply cannot use the GPUs, which is kind of annoying. How can I fix this?
CUDA requires that certain device resource files (the /dev/nvidia* nodes) be established for GPU usage, and this is covered in the Linux getting started guide (step 6 under runfile installation -- note the recommended startup script).
You may also be interested in this article, which focuses on the same subject -- how to automatically establish the resource files at startup.
Once these files are established correctly, an ordinary user (non-root) will be able to use the GPUs without any other intervention.
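If editing startup scripts is inconvenient, another option (a sketch of my own, not from the guide; the file name is hypothetical) is to run a trivial CUDA program once at boot with root privileges. This should have the same effect as the sudo nvidia-smi workaround, since initializing the CUDA runtime as root lets the driver create the missing device files:

// init_gpus.cu -- hypothetical helper; run once as root at boot (e.g. from an init script).
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA init failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // Touch each device so a context is created and the /dev/nvidia* nodes exist.
    for (int d = 0; d < count; ++d) {
        cudaSetDevice(d);
        cudaFree(0);   // classic idiom to force runtime/driver initialization
    }
    std::printf("Initialized %d CUDA device(s)\n", count);
    return 0;
}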
I have no idea why Xeon Phi installation might have affected this in your particular setup.

what is "SASS" short for? [closed]

what is "SASS" short for ?
I know it is an asembly level native code ISA targeting specific hardware,
exits in between PTX code and binary code.
but anyone could kindly tell me what does each character stands for ?
all that i can find about Fermi hardware native instruction is in cuobjdump.pdf, but it only gives their names, where can I find more information such as their throughput or latency or IPC or CPI, undertaking units corresponding to each intruction, like SFU, FPU ?
Streaming ASSembler... I should know, since I invented the term, led the core G80 streaming processor architecture team, and developed the first SASS assembler ;-)
Since there seems to be no information on this anywhere, I can only guess: Shader ASSembly language
SASS (as Ashwin points out, probably "Shader ASSembly") is the binary code that runs on the metal of Fermi-architecture devices. What cuobjdump (and older third-party tools like decuda and nv50dis) show is a direct disassembly of the cubin payload emitted by the ptxas assembler.
To the best of my knowledge there is no open instruction set documentation for any NVIDIA GPUs.
At some point during the CUDA 5 release cycle, NVIDIA began to provide a summary document which annotates the basic instruction set of supported GPUs (Fermi, Kepler, and Maxwell as of CUDA 7).
Streaming ASSembly, perhaps? After all, NVIDIA calls its cores "Streaming Multiprocessors".

Free implementation of multi-layer perceptron? [closed]

Is there a free (preferably public-domain or BSD-like license, but GPL will do) implementation of a multi-layer perceptron anywhere on the net?
I have textbook examples but the licenses are too restrictive, and although I can just about follow the math in the Wikipedia articles I'm not confident enough of getting it right and it's hard to test.
I've done a quick Google search and found some free (as in beer) binary-only versions. I'm hoping to find an MLP that is part of a larger open-source project.
FANN (Fast Artificial Neural Network Library) is a great general-purpose neural-network library written in C, but it has bindings for just about any language you might want (C++, .NET, Python, and Mathematica, among others). Even better, it's open source and licensed under the LGPL, so I'd imagine that would be fine for you.
Neuron.NET is another good alternative if you're using .NET (also open-source), though it's licensed under the GPL.
Hope that helps.
WEKA includes a multi-layer perceptron implementation. I haven't examined the source code myself, but it's GPL, I believe.
OpenCV has a feedforward neural network implementation.
Have a look at http://neuralensemble.org/trac/PyNN! It is a unified layer over a lot of different free simulators such as BRIAN, NEST, NEURON, etc.