Is GPGPU a hack? - language-agnostic

I started working on GPGPU a few days ago and successfully implemented Cholesky factorization with good performance. I then attended a conference on High Performance Computing where some people said that "GPGPU is a hack".
I am still confused about what this means and why they called it a hack. One person said it is a hack because you convert your problem into a matrix and do operations on it. But I am still unsure: do people really consider it a hack, and if so, why?
Can anyone explain why they called it a hack? I found nothing wrong with it.

One possible reason for such an opinion is that the GPU was not originally intended for general-purpose computation. Also, programming a GPU is less traditional and more hardcore, and therefore more likely to be perceived as a hack.
The point that "you convert the problem into a matrix" is not reasonable at all. Whatever task you solve by writing code, you choose suitable data structures. On a GPU, matrices are often the most suitable data structure, and using them is not a hack but a natural choice.
However, I suppose it is just a matter of time before GPGPU becomes widespread. People just have to get used to the idea. After all, who cares which unit of the computer runs the program?

On the GPU, efficient memory access is paramount to achieving optimal performance. This often involves restructuring existing algorithms and data structures, or even choosing entirely new ones. This is one reason why GPU programming can be perceived as a hack.
Secondly, adapting an existing algorithm to run on the GPU is not in and of itself science. The relatively low scientific contribution of some GPU algorithm-related papers has led to a negative perception of GPU programming as strictly "engineering".
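To make the point about memory access concrete, here is a minimal sketch (the structures and the kernel are invented for illustration) of the kind of data-layout restructuring GPUs often demand: an array-of-structures layout that is perfectly fine on a CPU is typically rearranged into a structure-of-arrays so that neighbouring threads read neighbouring memory locations (coalesced access).

// CPU-friendly layout: array of structures (AoS).
struct ParticleAoS { float x, y, z, mass; };

// GPU-friendly layout: structure of arrays (SoA), so thread i reads mass[i]
// and adjacent threads touch adjacent addresses.
struct ParticlesSoA { float *x, *y, *z, *mass; };

__global__ void scale_mass_soa(ParticlesSoA p, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        p.mass[i] *= factor;   // coalesced: consecutive threads hit consecutive floats
}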

Obviously, only the person who said that can say for certain why he said it, but here's my take:
A "Hack" is not a bad thing.
It forces people to learn new programming languages and concepts. For people who are just trying to model the weather or protein folding or drug reactions, this is an unwelcome annoyance. They didn't really want to learn FORTRAN (or whatever) in the first place, and now they have to learn another programming system.
The programming tools are NOT very mature yet.
The hardware isn't as reliable as CPUs (yet), so all of the calculations have to be done twice to make sure you've got the right answer. One reason for this is that GPUs don't come with error-correcting memory yet, so if you're trying to build a supercomputer with thousands of processors, the probability of a cosmic ray flipping a bit in your numbers approaches certainty.
As for the comment "you are converting your problem into a matrix and doing operations on it", I think that shows a lot of ignorance. Virtually ALL of high-performance computing fits that description!

One of the major problems with GPGPU over the past few years, and probably for the next few, is that programming GPUs for arbitrary tasks is not very easy. Up until DX10 there was no integer support on GPUs, and branching is still very poor. This is very much a situation where, in order to get maximum benefit, you have to write your code in a very awkward manner to extract all sorts of efficiency gains from the GPU. This is because you're running on hardware that is still dedicated to processing polygons and textures, rather than abstract parallel tasks.
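To make the branching point concrete, here is a rough sketch (both kernels are invented for illustration): threads within a warp that take different sides of a branch end up paying for both sides, so GPU code is often contorted to replace control flow with arithmetic or a predicated select.

__global__ void threshold_branchy(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (in[i] > 0.5f)          // threads in one warp may disagree here...
        out[i] = 1.0f;         // ...and the warp then walks through both paths
    else
        out[i] = 0.0f;
}

__global__ void threshold_branchless(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    out[i] = (in[i] > 0.5f) ? 1.0f : 0.0f;   // usually compiles to a predicated select
}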
Obviously, that's my take on it, and YMMV.

GPGPU harks back to the days of the math co-processor. A hack is a shortcut to solving a long-winded problem. GPGPU is a hack just like NAT on top of IPv4 is a hack. Computational problems, just like networks, are getting bigger as we try to do more. GPGPU is a useful interim solution; whether it stays outside the core CPU chip with a separate, cranky API, or gets absorbed into the CPU via the API or the manufacturing process, is up to the pathfinders.

I suppose he meant that using GPGPU forces you to restructure your implementation so that it fits the hardware rather than the problem domain. An elegant implementation should fit the latter.
Note that the word "hack" may have several different meanings:
http://www.urbandictionary.com/define.php?term=hack

Normal Cuda Vs CuBLAS?

Just out of curiosity: cuBLAS is a library for basic matrix computations. But these computations can, in general, also be written in plain CUDA code without using cuBLAS. So what is the major difference between using the cuBLAS library and writing your own CUDA program for matrix computations?
We highly recommend developers use cuBLAS (or cuFFT, cuRAND, cuSPARSE, thrust, NPP) when suitable for many reasons:
We validate correctness across every supported hardware platform, including those which we know are coming up but which maybe haven't been released yet. For complex routines, it is entirely possible to have bugs which show up on one architecture (or even one chip) but not on others. This can even happen with changes to the compiler, the runtime, etc.
We test our libraries for performance regressions across the same wide range of platforms.
We can fix bugs in our code if you find them. Hard for us to do this with your code :)
We are always looking for which reusable and useful bits of functionality can be pulled into a library - this saves you a ton of development time, and makes your code easier to read by coding to a higher level API.
Honestly, at this point, I can probably count on one hand the number of developers out there who actually implement their own dense linear algebra routines rather than calling cuBLAS. It's a good exercise when you're learning CUDA, but for production code it's usually best to use a library.
(Disclosure: I run the CUDA Library team)
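For concreteness, here is a rough sketch of what a single-precision matrix multiply looks like through cuBLAS (the wrapper function, matrix sizes, and variable names are invented for illustration; cuBLAS expects column-major device arrays):

#include <cublas_v2.h>
#include <cuda_runtime.h>

// C = alpha * A * B + beta * C, with A (m x k), B (k x n), C (m x n),
// all stored column-major in device memory.
void gemm_with_cublas(const float *dA, const float *dB, float *dC,
                      int m, int n, int k)
{
    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                m, n, k,
                &alpha, dA, m,    // lda = m
                        dB, k,    // ldb = k
                &beta,  dC, m);   // ldc = m

    cublasDestroy(handle);
}

A hand-written kernel for the same operation is easy to get working, but, as the next answer notes, matching the performance of the tuned library version takes considerable effort.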
There are several reasons you'd choose to use a library instead of writing your own implementation. Three, off the top of my head:
You don't have to write it. Why do work when somebody else has done it for you?
It will be optimised. NVIDIA supported libraries such as cuBLAS are likely to be optimised for all current GPU generations, and later releases will be optimised for later generations. While most BLAS operations may seem fairly simple to implement, to get peak performance you have to optimise for hardware (this is not unique to GPUs). A simple implementation of SGEMM, for example, may be many times slower than an optimised version.
They tend to work. There's probably less chance you'll run up against a bug in a library than there is that you'll create a bug in your own implementation which bites you when you change some parameter or other in the future.
The above isn't just relevant to cuBLAS: if the method you need is in a well-supported library, you'll probably save a lot of time and gain a lot of performance by using it rather than your own implementation.

When does it make sense to use a GPU?

I have code doing a lot of operations with objects which can be represented as arrays.
When does it make sense to use GPGPU environments (like CUDA) in an application? Can I predict performance gains before writing real code?
Whether it is worthwhile depends on a number of factors. Element-wise independent operations on large arrays/matrices are a good candidate.
For your particular problem (machine learning/fuzzy logic), I would recommend reading some related documents, such as
Large Scale Machine Learning using NVIDIA CUDA
and
Fuzzy Logic-Based Image Processing Using Graphics Processor Units
to get a feeling for the speedups achieved by other people.
As already mentioned, you should specify your problem. However, if large parts of your code involve operations on your objects that are independent, in the sense that object n does not have to wait for the results of the operations on objects 0 to n-1, GPUs may enhance performance.
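As a minimal sketch of the kind of independent, element-wise operation described above (a generic SAXPY-style kernel, not taken from the question): each output element depends only on the inputs at the same index, so every thread can proceed without waiting for any other.

// out[i] depends only on a[i] and b[i], so every element can be computed
// by its own thread with no ordering constraints.
__global__ void saxpy(int n, float alpha, const float *a, float *b)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        b[i] = alpha * a[i] + b[i];
}

// Typical launch for n elements:
//   saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_a, d_b);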
You could go to the CUDA Zone to get a general idea of what CUDA can do, and what it can do better than a CPU.
https://developer.nvidia.com/category/zone/cuda-zone
CUDA already provides lots of performance libraries, tools, and an ecosystem to reduce development difficulty. It can also help you understand what kinds of operations CUDA is good at.
https://developer.nvidia.com/cuda-tools-ecosystem
Furthermore, CUDA provides benchmark reports on some of the most common and representative operations. You can check whether your code could benefit from them.
https://developer.nvidia.com/sites/default/files/akamai/cuda/files/CUDADownloads/CUDA_5.0_Math_Libraries_Performance.pdf

Purpose of abstraction

What is the purpose of abstraction in coding:
Programmer's efficiency or program's efficiency?
Our professor said that it is used merely to help the programmer comprehend and modify programs faster to suit different scenarios. He also contended that it adds an extra burden on the program's performance. I am not exactly clear on what this means.
Could someone kindly elaborate?
I would say he's about half right.
The biggest purpose is indeed to help the programmer. The computer couldn't care less how abstracted your program is. However, there is a related, but different, benefit - code reuse. This isn't just for readability though, abstraction is what lets us plug various components into our programs that were written by others. If everything were just mixed together in one code file, and with absolutely no abstraction, you would never be able to write anything even moderately complex, because you'd be starting with the bare metal every single time. Just writing text on the screen could be a week long project.
About performance, that's a questionable claim. I'm sure it depends on the type and depth of the abstraction, but in most cases I don't think the system will notice a hit. This is especially true with modern compiled languages, which actually "un-abstract" the code for you (through things like loop unrolling and function inlining) to make it easier on the system.
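As a small illustration of that "un-abstracting" (a hypothetical C++-style example, not from the question): the accessor below is an abstraction over a raw field read, but a modern optimizing compiler will normally inline it, so the abstraction costs nothing at run time.

struct Point { int x, y; };

// An abstraction over reading the field directly.
static inline int getX(const Point &p) { return p.x; }

int sumX(const Point *pts, int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += getX(pts[i]);   // after inlining, identical to pts[i].x
    return total;
}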
Your professor is correct; abstraction in coding exists to make it easier to do the coding, and it increases the workload of the computer in running the program. The trick, though, is to make the (hopefully very tiny) increase in computer workload be dwarfed by the increase in programmer efficiency.
For example, at an extremely low level, object-oriented code is an abstraction that helps the programmer, but it adds some overhead to the program in the form of extra 'stuff' in memory and extra function calls.
Since abstraction is really the process of pulling out common pieces of functionality into reusable components (be they abstract classes, parent classes, interfaces, etc.), I would say that it is most definitely about the programmer's efficiency.
Saying that abstraction comes at the cost of performance is treading on unstable ground at best, though. With most modern languages, abstraction (and thus enhanced flexibility) can be had at little to no cost to the performance of the application.
What abstraction is is effectively outlined in the link Tesserex posted. To your professor's point about adding an additional burden on the program, this is actually fairly true. However, the burden in modern systems is negligible. Think of it in terms of what actually happens when you call a method: each additional method you call requires pushing a number of additional values onto the stack and then handling the return values that are also placed on the stack. So, for instance, calling
c = add(a, b);
which looks something like
public int add(int a, int b){
return a + b;
}
requires pushing two integers onto the stack for the parameters and then pushing an additional one onto the stack for the return value. By contrast, the bare expression a + b needs no memory interaction if both values are already in registers; it is a single instruction. Given that memory operations are much slower than register operations, you can see where the notion of a performance hit comes from.
Ultimately, every method call you make is going to increase the overhead of your program a little bit. However, as Tesserex points out, it's minute in most modern computer systems, and as Andrew Barber points out, that compromise is usually totally dwarfed by the increase in programmer efficiency.
Abstraction is a tool to make it easier for the programmer. The abstraction may or may not have an effect on the runtime performance of the system.
For an example of an abstraction that doesn't alter performance, consider assembly. Mnemonics like mov and add are an abstraction that makes opcodes easier to remember, compared with memorizing raw byte codes and other instruction-encoding details. However, given the 1-to-1 mapping, I'd suggest it's clear that this abstraction has zero effect on final performance.
It's not a clear-cut situation in which abstraction makes life easier for the programmer at the expense of more work for the computer.
Although a higher level of abstraction typically adds at least a small amount of overhead to executing a discrete unit of code, it's also what allows the programmer to think about a problem in larger "units", so he can do a better job of understanding an entire problem and avoid executing many (or at least some) of those discrete units of code.
Therefore, a higher level of abstraction will often lead to faster-executing programs as long as you avoid adding too much overhead. The problem, of course, is that there's no easy or simple definition of how much overhead is too much. That stems largely from the fact that the amount of overhead that's acceptable depends heavily on the problem being solved, and the degree to which working at a higher level of abstraction allows the programmer to recognize operations that are truly unnecessary, and eliminate them.

How well do common programming tasks translate to GPUs?

I have recently begun working on a project to establish how best to leverage the processing power available in modern graphics cards for general programming. It seems that the field of general-purpose GPU programming (GPGPU) has a large bias towards scientific applications with a lot of heavy math, as this fits well with the GPU computational model. This is all well and good, but most people don't spend all their time running simulation software and the like, so we figured it might be possible to create a common foundation for easily building GPU-enabled software for the masses.
This leads to the question I would like to pose: what are the most common types of work performed by programs? It is not a requirement that the work translate extremely well to GPU programming, as we are willing to accept modest performance improvements (better a little than nothing, right?).
There are a couple of subjects we have in mind already:
Data management - manipulation of large amounts of data from databases and otherwise.
Spreadsheet-type programs (somewhat related to the above).
GUI programming (though it might be impossible to get access to the relevant code).
Common algorithms like sorting and searching.
Common collections (and integrating them with data-manipulation algorithms).
Which other coding tasks are very common? I suspect a lot of the code being written falls into the category of inventory management and other tracking of real 'objects'.
As I have no industry experience, I figured there might be a number of basic types of code which are written more often than I realize but which just don't materialize as external products.
Both high-level programming tasks and specific low-level operations would be appreciated.
General programming translates terribly to GPUs. GPUs are dedicated to performing fairly simple tasks on streams of data at a massive rate, with massive parallelism. They do not deal well with the rich data and control structures of general programming, and there's no point trying to shoehorn that into them.
This isn't too far away from my impression of the situation but at this point we are not concerning ourselves too much with that. We are starting out by getting a broad picture of which options we have to focus on. After that is done we will analyse them a bit deeper and find out which, if any, are plausible options. If we end up determining that it is impossible to do anything within the field, and we are only increasing everybody's electricity bill then that is a valid result as well.
Things that modern computers do a lot of, where a little benefit could go a long way? Let's see...
Data management: relational database management could benefit from faster relational joins (especially joins involving a large number of relations). Involves massive homogeneous data sets.
Tokenising, lexing, parsing text.
Compilation, code generation.
Optimisation (of queries, graphs, etc).
Encryption, decryption, key generation.
Page layout, typesetting.
Full text indexing.
Garbage collection.
I do a lot of simplifying of configuration. That is, I wrap the generation/management of configuration values inside a UI. The primary benefit is that I can control workflow and presentation to make it simpler for non-techie users to configure apps/sites/services.
The other thing to consider when using a GPU is the bus speed. Most graphics cards are designed to have higher bandwidth when transferring data from the CPU out to the GPU, as that's what they do most of the time. The bandwidth from the GPU back to the CPU, which is needed to return results etc., isn't as fast. So they work best in a pipelined mode.
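As a rough sketch of that pipelined style (the kernel, chunk sizes, and two-stream scheme are invented for illustration), chunks can be copied in, processed, and copied back on alternating CUDA streams so that transfers for one chunk overlap computation on another; pinned host memory is required for the asynchronous copies:

#include <cuda_runtime.h>

__global__ void scale(float *d, int n, float f)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= f;
}

int main()
{
    const int chunk = 1 << 20, nChunks = 8;

    float *h;                                    // pinned host memory enables async copies
    cudaMallocHost((void **)&h, (size_t)chunk * nChunks * sizeof(float));
    for (int i = 0; i < chunk * nChunks; ++i) h[i] = 1.0f;

    float *d[2];
    cudaStream_t s[2];
    for (int b = 0; b < 2; ++b) {
        cudaMalloc((void **)&d[b], chunk * sizeof(float));
        cudaStreamCreate(&s[b]);
    }

    // Each chunk: copy in, compute, copy out, issued on alternating streams
    // so one chunk's transfers can overlap the other chunk's kernel.
    for (int c = 0; c < nChunks; ++c) {
        int b = c % 2;
        float *src = h + (size_t)c * chunk;
        cudaMemcpyAsync(d[b], src, chunk * sizeof(float), cudaMemcpyHostToDevice, s[b]);
        scale<<<(chunk + 255) / 256, 256, 0, s[b]>>>(d[b], chunk, 2.0f);
        cudaMemcpyAsync(src, d[b], chunk * sizeof(float), cudaMemcpyDeviceToHost, s[b]);
    }
    cudaDeviceSynchronize();

    for (int b = 0; b < 2; ++b) { cudaFree(d[b]); cudaStreamDestroy(s[b]); }
    cudaFreeHost(h);
    return 0;
}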
You might want to take a look at the March/April issue of ACM's Queue magazine, which has several articles on GPUs and how best to use them (besides doing graphics, of course).

Feasibility of GPU as a CPU? [closed]

What do you think the future of GPU-as-a-CPU initiatives like CUDA is? Do you think they are going to become mainstream and be the next adopted fad in the industry? Apple is building a new framework for using the GPU to do CPU tasks, and there has been a lot of success with Nvidia's CUDA project in the sciences. Would you suggest that a student commit time to this field?
Commit time if you are interested in scientific and parallel computing. Don't think of CUDA as making a GPU appear to be a CPU. It only allows a more direct method of programming GPUs than older GPGPU programming techniques.
General-purpose CPUs derive their ability to work well on a wide variety of tasks from all the work that has gone into branch prediction, pipelining, superscalar execution, etc. This makes it possible for them to achieve good performance on a wide variety of workloads, while making them suck at high-throughput, memory-intensive floating-point operations.
GPUs were originally designed to do one thing, and do it very, very well. Graphics operations are inherently parallel. You can calculate the colour of all pixels on the screen at the same time, because there are no data dependencies between the results. Additionally, the algorithms needed did not have to deal with branches, since nearly any branch that was required could be achieved by setting a coefficient to zero or one. The hardware could therefore be very simple. It is not necessary to worry about branch prediction, and instead of making a processor superscalar, you can simply add as many ALUs as you can cram onto the chip.
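A tiny sketch of the "coefficient of zero or one" trick described above (the function and its arguments are invented for illustration):

// Branch-free selection: instead of  if (inLight) return lit; else return shadowed;
// both values are computed and blended with a 0.0/1.0 coefficient.
__device__ float shade(float lit, float shadowed, float inLight)
{
    return inLight * lit + (1.0f - inLight) * shadowed;
}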
With programmable texture and vertex shaders, GPUs gained a path to general programmability, but they are still limited by the hardware, which is still designed for high-throughput floating-point operations. Some additional circuitry will probably be added to enable more general-purpose computation, but only up to a point. Anything that compromises the ability of a GPU to do graphics won't make it in. After all, GPU companies are still in the graphics business, and the target market is still gamers and people who need high-end visualization.
The GPGPU market is still a drop in the bucket, and to a certain extent will remain so. After all, "it looks pretty" is a much lower standard to meet than "100% guaranteed and reproducible results, every time."
So, in short, GPUs will never be feasible as CPUs. They are simply designed for different kinds of workloads. I expect GPUs will gain features that make them useful for quickly solving a wider variety of problems, but they will always be graphics processing units first and foremost.
It will always be important to match the problem you have with the most appropriate tool available to solve it.
Long-term I think that the GPU will cease to exist, as general purpose processors evolve to take over those functions. Intel's Larrabee is the first step. History has shown that betting against x86 is a bad idea.
Study of massively parallel architectures and vector processing will still be useful.
First of all, I don't think this question really belongs on SO.
In my opinion, the GPU is a very interesting alternative whenever you do vector-based floating-point mathematics. However, this translates to: it will not become mainstream. Most mainstream (desktop) applications do very few floating-point calculations.
It has already gained traction in games (physics engines) and in scientific calculations. If you consider either of those two as "mainstream", then yes, the GPU will become mainstream.
I would not consider these two mainstream, and I therefore think the GPU will not rise to be the next adopted fad in the mainstream industry.
If you, as a student, have any interest in heavily physics-based scientific calculations, you should absolutely commit some time to it (GPUs are very interesting pieces of hardware anyway).
GPUs will never supplant CPUs. A CPU executes a set of sequential instructions, and a GPU does a very specific type of calculation in parallel. These GPUs have great utility in numerical computing and graphics; however, most programs can in no way utilize this flavor of computing.
You will soon begin seeing new processors from Intel and AMD that include GPU-esque floating-point vector computations as well as standard CPU computations.
I think it's the right way to go.
Considering that GPUs have been tapped to create cheap supercomputers, it appears to be the natural evolution of things. With so much computing power and R&D already done for you, why not exploit the available technology?
So go ahead and do it. It will make for some cool research, as well as a legit reason to buy that high-end graphic card so you can play Crysis and Assassin's Creed on full graphic detail ;)
It's one of those things that you see one or two applications for, but soon enough someone will come up with a 'killer app' that figures out how to do something more generally useful with it, at superfast speeds.
Pixel shaders applying routines to large arrays of float values - maybe we'll see some GIS coverage applications, or, well, I don't know. If you don't devote more time to it than I have, then you'll have the same level of insight as me - i.e. very little!
I have a feeling it could be a really big thing, as do Intel and S3; maybe it just needs one little tweak added to the hardware, or someone with a light bulb above their head.
With so much untapped power, I cannot see how it would go unused for too long. The question, though, is how the GPU will be used for this. CUDA seems to be a good guess for now, but other technologies are emerging on the horizon which might make it more approachable for the average developer.
Apple has recently announced OpenCL, which they claim is much more than CUDA yet quite simple. I'm not sure exactly what to make of that, but the Khronos Group (the people working on the OpenGL standard) is working on the OpenCL standard and is trying to make it highly interoperable with OpenGL. This might lead to a technology which is better suited to normal software development.
It's an interesting subject and, incidentally, I'm about to start my master's thesis on how best to make GPU power available to average developers (if possible), with CUDA as the main focus.
A long time ago, it was really hard to do floating-point calculations (thousands/millions of cycles of emulation per instruction on terribly performing (by today's standards) CPUs like the 80386). People who needed floating-point performance could get an FPU (for example, the 80387). The old FPUs were fairly tightly integrated into the CPU's operation, but they were external. Later on they became integrated, with the 80486 having an FPU built in.
The old-time FPU is analogous to GPU computation. We can already get it with AMD's APUs. An APU is a CPU with a GPU built into it.
So, I think the actual answer to your question is that GPUs won't become CPUs; instead, CPUs will have a GPU built in.