understanding HPC Linpack (CUDA edition) [closed] - cuda

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
I want to know what role play CPUs when HPC Linpack (CUDA version) is runnig. They are recieving data from other cluster nodes and performing CPU-GPU data exchange, arenot they? so thier work doesnot influence on performance, yes?

In typical usage both GPU and CPU are contributing to the numerical calculations. The host code will use MKL or another BLAS implementation for host-generated numerical results, and the device code will use CUBLAS or something related for device numerical results.
A version of HPL is available to registered developers in source code format, so you can inspect all this yourself.
And as you say the CPUs are also involved in various other administration activities such as internode data exchange in a multinode setting.

Related

GPGPU performance in high-level languages [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
For my science fair project I have to write a computationally-intensive algorithm that is well suited to parallelization. I have read about OpenCL and CUDA and it seems they are mainly used from C/C++. While it would not be that difficult for me to pick up a bit of C to write a simple main, I was wondering how big the performance hit would be if I used Java or Python bindings for my GPU computation? Specifically, I was more interested in the performance hit using CUDA because that's the framework I'm planning on using.
In general, every time you add an abstraction layer you're loosing performance but, in the case of CUDA this is not completely true because, whether using Python or Java you'll end up writing your CUDA kernels on C/Fortran, so the performance in the GPU side will be the same as using C/Fortran (check some pyCUDA examples here)
The bad news it that Java and Python will never achieve the performance of compiled languages such as C on certain tasks, see this SO answer for a more detailed discussion about this topic. Here is a good discussion about C versus Java, also on SO.
There are many questions and discussions about performance comparison between interpreted and compiled languages, so I encourage you to read some of them.

Platform vs Software Framework [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
CUDA advertises itself as a parallel computing platform. However, I'm having trouble seeing how it's any different from a software framework (a collection of libraries used for some functionality). I am using CUDA in class and all I'm seeing is that it provides libraries in C for - functions that help in parallel computing on the GPU - which fits my definition of a framework. So tell me, how is a platform like CUDA different from a framework? Thank you.
CUDA the hardware platform, is the actual GPU and its scheduler ("CUDA architecture"). However CUDA is also a programming language, which is very close to C. To work with the software written in CUDA you also need an API for calling these functions, allocating memory etc. from your host language. So CUDA is a platform, a language and a set of APIs.
If the latter (a set of APIs) matches your definition of a software framework, then the answer is simply yes, as both options are true.

Difference between CUDA level and compute level? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
What is the difference these two definitions?
If no, does it mean, I will be never able to run code with sm > 21 on the gpu with compute level 2.1?
That's correct. For a compute capability 2.1 device, the maximum code specification (virtual architecture/target architecture) you can give it is -arch=sm_21 Code compiled for -arch=sm_30 for example, would not run correctly on a cc 2.1 device
For more information, you can take a look at the nvcc manual section which covers virtual architectures, as well as the manual section which covers the compile switches specifying virtual architecture and compile targets (code architecture).

What is the absolutely fastest way to output a signal to external hardware in modern PC? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Improve this question
I was wondering, what is the absolutely fastest way (lowest latency) to produce external signal (for example CMOS state change from 0 to 1 on electrical wire connected to other device etc.) from PC, counting from the moment, where CPU assembler program knows that signal must be produced.
I know that network device, usb, VGA monitor output have some large latency comapred to other interfaces (SATA, PCI-E). Wich of interfaces or what hardware modification can provide a near-0 latency in output from let's suppose assembler program?
I don't know if it is really the fastest interface you can provide, because that also depends on your definition of "external", but http://en.wikipedia.org/wiki/InfiniBand certainly comes close to what your question aims at. Latency is 200 nanoseconds and below in certain scenarios ...

Getting started in Firmware development [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I am a software development guy. Lately I was thinking of trying out some firmware development, as the company I work for is trying to enter that domain.
I have many questions regarding firmware devlopment - like:
What are the tools used - like IDE?
In which language is most of the code written in?
How to port the code into microcontroller?
How to code for different microcontrollers?
How to determine things I would need for building a specific application(choosing the microcontroller etc.)?
Anything else I should know about and where do I start? Sorry if this question is too basic, but I could not find out any satisfactory answers elsewhere.
Most microcontrollers have decent C compilers so are best coded for in C, although you might need to delve into assembly routines for occassional high performance routines. The choice of microcontroller is usually determined by the hardware demands, on board peripherals, performance and cost constraints.
You wouldn't generally be porting code from a Windows/Linux/Mac environment to a microcontroller one; you would generally be writing directly for the microcontroller, so strictly the compiler is a cross compiler - compiling on your PC to run on a different processor. You typically get debuggers, emulators and full editor capabilities in the IDE, so its a similar experience to writing code in a PC environment, but it runs slower, and has to be downloaded to the target hardware or emulated to be tested.
A great authority to start reading about embedded development is Jack Gansle and his firmware handbook. Also www.embedded.com for general articles.