OpenCL for GPU vs. FPGA - cuda

I read recently about OpenCL/CUDA for FPGA vs. GPU
As I understood FPGA wins in power criteria.
The explanation for that ,I`ve found in some article:
Reconfigurable devices can have much lower power consumption from peak
values since only configured portions of the chip are active
Based on said above I have a question - does it mean that ,if some CU [Compute Unit] doen`t execute any work-item,it still consumes power? (and if yes - what for it consumes power?)

Yes, idle circuitry still consumes power. It doesn't consume as much, but it still consumes some. The reason for this is down to how transistors work, and how CMOS logic gates consume power.
Classically, CMOS logic (the type on all modern chips) only consumes power when it switches state. This made is very low power when compared to the technologies that came before it which consumed power all the time. Even so, every time a clock edge occurs, some logic changes state even if there's no work to do. The higher the clock rate, the more power used. GPUs tend to have high clock rates so they can do lots of work; FPGAs tend to have low clock rates. That's the first effect, but it can be mitigated by not clocking circuits that have no work to do (called 'clock gating')
As the size of transistors became smaller and smaller, the amount of power used when switching became smaller, but other effects (known as leakage) became more significant. Now we're at a point where the leakage power is very significant, and it's multiplied up by the number of gates you have in a design. Complex designs have high leakage power; Simple designs have low leakage power (in very basic terms). This is a second effect.
Hence, for a simple task it may be more power efficient to have a small dedicated low speed FPGA rather than a large complex, but high speed / general purpose CPU/GPU.

As always, it depends on the workload. For workloads that are well-supported by native GPU hardware (e.g. floating point, texture filtering), I doubt an FPGA can compete. Anecdotally, I've heard about image processing workloads where FPGAs are competitive or better. That makes sense, since GPUs are not optimized to operate on small integers. (For that reason, GPUs often are uncompetitive with CPUs running SSE2-optimized image processing code.)
As for power consumption, for GPUs, suitable workloads generally keep all the execution units busy, so it's a bit of an all-or-nothing proposition.

Based on my research on FPGAs and the way they work, these devices can be designed to be very power efficient and really fine-tuned for one special task (e.g., an algorithm) and use the smallest resources possible (therefore the lower amount of energy consumption among all possible choices except ASIC)
When implementing turning-complete algorithms using FPGAs, the designers have the option of either unrolling their algorithms to use the maximum parallelism offered or use a compact sequential design. Each method has its own cost-benefits; the former helps maximizing performance at the cost of higher resource consumption, and the latter helps minimizing area and resource consumption by reusing hardware at the cost of minimizing the performance.
This level of control over implementation of algorithms doesn’t exist when developing for GPUs. The developers have the control to use the most efficient algorithms yet they are not the one determining the final precise hardware implementation of their algorithms. Unlike FPGA designers who even count “nano-seconds” when calculating their design’s hardware implementation (using post-layout tools), GPU developers rely on available frameworks to enhance all implementation details for them automatically. They develop at much higher levels compared to FPGA designers.
So the well known topic of trade-offs pops up here too; you want exact control over the hardware implementation at the cost of longer development times? Choose FPGAs. You want parallelism, yet have made up your mind to give up exact control over hardware implementation and want to develop using your existing software skills? use OpenCL.

Kudos to #hamzed, but OpenCL is not taking control away from the designer of OpenCL on FPGAs. It actually gives the best of the both worlds: full programmability of FPGA with all custom parallel algorithm benefits as well as much better design closure speed vs. RTL. By being clever about your algorithm moving and not moving data you can get to near theoretical performance of FPGAs. Please see the last chart in this reference: https://www.iwocl.org/wp-content/uploads/iwocl2017-andrew-ling-fpga-sdk.pdf

Related

Does watching HD videos slow down my program using the CUDA CPU? [duplicate]

I'm trying to figure out if I can use OpenACC in place of normal CPU serial execution calls. Usually my programming is all about 3D programming, or uses the GPU normally in some way. I.E. Image processing, or some other type of rendering that requires the use of shaders. I'm trying to figure out if this Library would benefit me or not.
The reason I ask this is because if I'm rendering 3D Graphics (as fast as possible) would it slow down that process in away? Or is it able to maintain it's (in theory) "high frame rates" or not.
If so, what's the trade off, and how much? I'm not willing to loose 3D Graphics (display) performance to enhance operations that can be done on the CPU serially.
Edit:
This is a C++ context.
On the AMD and NVIDIA GPUs that I am familiar with, OpenACC programs will make use of compute resources that would also be used to some degree by shader programs. There are many other pieces of graphics hardware in a GPU that are not shared between compute and graphics, but there are some shared resources. Likewise, the GPU may be connected to the system by PCIE, and so this can also present a shared resource or contention point (however it's the rare compute or graphics program that would even come close to using up the bandwidth of a modern Gen3 x16 PCIE connection.)
So if you were using both graphics (or compute) shaders, as well as OpenACC acceleration, there would be contention for resources, to some degree. The level of contention, or the trade off, is not something that I can generalize about. It will depend very much on the specifics of your program, and the extent and the detail sequencing of the compute functions and the graphics functions.
GPU designers have these types of use-cases in mind, and so GPUs are generally pretty good at rapid context switching between the various tasks that may compete for resources.

GPGPU Applications other than Image processing?

I am in search for few cpu applications which can be ported to gpgpu for better efficiency.
Else where can gpgpu be used other than image processing area ?
This is actually for my graduate project.
The specialized processing architectures of GPU compute engines are useful for just about any data crunching problem where you have:
a non-trivial amount of data
a non-trivial computation to perform on every element of that data,
the input data needed to compute each output element fits in GPU memory, or can be choreographed to arrive in GPU memory when it is needed.
It helps if the computation can be performed independently on all data elements at the same time, but this is not strictly required.
Image processing happens to be one example of that scenario - a finite (but large) number of pixels to process, and many image algorithms can be executed on each pixel in parallel.
Other examples include: generalized signal analysis, such as processing audio signals. Image processing is just a specialized form of signal analysis. Pattern recognition, where much of the challenge is to separate the signal from the noise. Voice recognition, anyone? 3 dimensional surface matching, such as figuring out the shapes of organic compounds based on the flex angles of their chemical bonds, or figuring out if two organic compounds are likely to interact in interesting ways (eg, bioreceptors). Physical modelling of all kinds (collision simulations, seismic analysis, etc). And of course, cryptography, where you can always spend more compute time going over the same data again and again.
GPU compute engines are not well suited for problems where the volume of data significantly dwarfs the computations to be performed. GPUs work well on stuff in memory. Moving data into or out of GPU memory is often the most expensive step of an entire computation, so you want to make sure you have enough computation going on to "make up" for the cost of loading the data into memory. If the data is too big to fit into memory you have to adopt distributed computing tactics.
For example, calculating a primary key index of a petabyte database probably isn't a great fit for a GPU since most of the effort will probably be spent just getting the data off the hard disk into memory. The index computation itself is fairly trivial, which doesn't make for a very interesting GPU win, and while I'm sure the data could be carved up into chunks and the chunks indexed independently by a boatload of GPU cores, variability in the data will likely prevent the GPU from operating at its full capacity. (GPU code works best when all "oarsmen" (processor cores / threads) are pulling in the same direction - uniform execution on separate data) While database indexing might see some benefit by using a GPU approach, it certainly won't be as big of a performance improvement over CPU baseline as something better suited to the GPU execution model constraints - like signal processing.
Brute force crypto attacks? MD5 has been done by Whitepixel and SHA-256 has been done by all the Bitcoin miners. On the other hand, I am not aware of any GPU implementations of bcrypt() or scrypt(), but an academic working in the area is probably a better person to ask.
Easiest way to figure out what kinds of applications are well-suited for GPGPU is to look at the speedups other groups have achieved. Here are a couple of links with that information:
NVIDIA's case studies
AccelerEyes' case studies
Looks like military/aerospace, life science, energy, finance, manufacturing, media, and a few other smaller industries have examples of strong speedups.

GPU-accelerated hardware simulation?

I am investigating if GPGPUs could be used for accelerating simulation of hardware.
My reasoning is this: As hardware by nature is very parallel, why simulate on highly sequential CPUs?
GPUs would be excellent for this, if not for their restrictive style of programming: You have a single kernel running, etc.
I have little experience with GPGPU-programming, but is it possible to use events or queues in OpenCL / CUDA?
Edit: By hardware simulation I don't mean emulation, but bit-accurate behavorial simulation (as in VHDL behavioral simulation).
I am not aware of any approaches regarding VHDL simulation on GPUs (or a general scheme to map discrete-event simulations), but there are certain application areas where discrete-event simulation is typically applied and which can be simulated efficiently on GPUs (e.g. transportation networks, as in this paper or this one, or stochastic simulation of chemical systems, as done in this paper).
Is it possible to re-formulate the problem in a way that makes a discrete time-stepped simulator feasible? In this case, simulation on a GPU should be much simpler (and still faster, even if it seems wasteful because the time steps have to be sufficiently small - see this paper on the GPU-based simulation of cellular automata, for example).
Note, however, that this is still most likely a non-trivial (research) problem, and the reason why there is no general scheme (yet) is what you already assumed: implementing an event queue on a GPU is difficult, and most simulation approaches on GPUs gain speed-up due to clever memory layout and application-specific optimizations and problem modifications.
This is outside my area of expertise, but it seems that while the following paper discusses gate-level simulation rather than behavioral simulation, it may contain some useful ideas:
Debapriya Chatterjee, Andrew Deorio, Valeria Bertacco.
Gate-Level Simulation with GPU Computing
http://web.eecs.umich.edu/~valeria/research/publications/TODAES0611.pdf

How to estimate FPGA utilization for designing a work a like core?

I was considering some older generation FPGA's to interface with a legacy system. So I want a good way of estimating how much space is necessary to replace an ASIC given its transistor count.
Does Verilog versus VHDL affect the utilization? (According to one of our contractors it affects the timing, so utilization seems likely.)
What effect do different vendor's parts have on it? (Actel's architecture is significantly different from Xilinx', for example. I expect some "weighting" based on this.)
This discussion originally from comp.arch.fpga seems to indicate that it's pretty complicated, including factors such as what space vs. speed tradeoffs you've asked the VHDL (or verilog) compiler to make, etc. When you consider that VHDL is source code and an FPGA implementation of it is object code, you'll see why it's not straightforward.
"FPGA vs. ASIC" notes that "a design created to work well on an FPGA is usually horrible on an ASIC and a design created for an ASIC may not work at all on an FPGA (certainly at the original frequency)".
A Google search for FPGA ASIC gates may have more useful info.
Verilog vs. VHDL has little real difference on speed or utilization. It is more related to amount of code you have to type (more for VHDL) and strong vs weak-typing.
The marketing gates for FPGA vendors are inflated. Altera vs. Xilinx are similar utilization. Look at memories (if memory intensive) and number of flip-flops; that will likely be good enough.
Consider what a similar core requires, for example if you need to do an error-coding core, look at a Reed-Solomon core.

Feasibility of GPU as a CPU? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
What do you think the future of GPU as a CPU initiatives like CUDA are? Do you think they are going to become mainstream and be the next adopted fad in the industry? Apple is building a new framework for using the GPU to do CPU tasks and there has been alot of success in the Nvidias CUDA project in the sciences. Would you suggest that a student commit time into this field?
Commit time if you are interested in scientific and parallel computing. Don't think of CUDA and making a GPU appear as a CPU. It only allows a more direct method of programming GPUs than older GPGPU programming techniques.
General purpose CPUs derive their ability to work well on a wide variety of tasks from all the work that has gone into branch prediction, pipelining, superscaler, etc. This makes it possible for them to achieve good performance on a wide variety of workloads, while making them suck at high-throughput memory intensive floating point operations.
GPUs were originally designed to do one thing, and do it very, very well. Graphics operations are inherently parallel. You can calculate the colour of all pixels on the screen at the same time, because there are no data dependencies between the results. Additionally, the algorithms needed did not have to deal with branches, since nearly any branch that would be required could be achieved by setting a co-efficient to zero or one. The hardware could therefore be very simple. It is not necessary to worry about branch prediction, and instead of making a processor superscaler, you can simply add as many ALU's as you can cram on the chip.
With programmable texture and vertex shaders, GPU's gained a path to general programmability, but they are still limited by the hardware, which is still designed for high throughput floating point operations. Some additional circuitry will probably be added to enable more general purpose computation, but only up to a point. Anything that compromises the ability of a GPU to do graphics won't make it in. After all, GPU companies are still in the graphics business and the target market is still gamers and people who need high end visualization.
The GPGPU market is still a drop in the bucket, and to a certain extent will remain so. After all, "it looks pretty" is a much lower standard to meet than "100% guaranteed and reproducible results, every time."
So in short, GPU's will never be feasible as CPU's. They are simply designed for different kinds of workloads. I expect GPU's will gain features that make them useful for quickly solving a wider variety of problems, but they will always be graphics processing units first and foremost.
It will always be important to always match the problem you have with the most appropriate tool you have to solve it.
Long-term I think that the GPU will cease to exist, as general purpose processors evolve to take over those functions. Intel's Larrabee is the first step. History has shown that betting against x86 is a bad idea.
Study of massively parallel architectures and vector processing will still be useful.
First of all I don't think this questions really belongs on SO.
In my opinion the GPU is a very interesting alternative whenever you do vector-based float mathematics. However this translates to: It will not become mainstream. Most mainstream (Desktop) applications do very few floating-point calculations.
It has already gained traction in games (physics-engines) and in scientific calculations. If you consider any of those two as "mainstream", than yes, the GPU will become mainstream.
I would not consider these two as mainstream and I therefore think, the GPU will raise to be the next adopted fad in the mainstream industry.
If you, as a student have any interest in heavily physics based scientific calculations, you should absolutely commit some time to it (GPUs are very interesting pieces of hardware anyway).
GPU's will never supplant CPU's. A CPU executes a set of sequential instructions, and a GPU does a very specific type of calculation in parallel. These GPU's have great utility in numerical computing and graphics; however, most programs can in no way utilize this flavor of computing.
You will soon begin seeing new processers from Intel and AMD that include GPU-esque floating point vector computations as well as standard CPU computations.
I think it's the right way to go.
Considering that GPUs have been tapped to create cheap supercomputers, it appears to be the natural evolution of things. With so much computing power and R&D already done for you, why not exploit the available technology?
So go ahead and do it. It will make for some cool research, as well as a legit reason to buy that high-end graphic card so you can play Crysis and Assassin's Creed on full graphic detail ;)
Its one of those things that you see 1 or 2 applications for, but soon enough someone will come up with a 'killer app' that figures out how to do something more generally useful with it, at superfast speeds.
Pixel shaders to apply routines to large arrays of float values, maybe we'll see some GIS coverage applications or well, I don't know. If you don't devote more time to it than I have then you'll have the same level of insight as me - ie little!
I have a feeling it could be a really big thing, as do Intel and S3, maybe it just needs 1 little tweak adding to the hardware, or someone with a lightbulb above their head.
With so much untapped power I cannot see how it would go unused for too long. The question is, though, how the GPU will be used for this. CUDA seems to be a good guess for now but other techologies are emerging on the horizon which might make it more approachable by the average developer.
Apple have recently announced OpenCL which they claim is much more than CUDA, yet quite simple. I'm not sure what exactly to make of that but the khronos group (The guys working on the OpenGL standard) are working on the OpenCL standard, and is trying to make it highly interoperable with OpenGL. This might lead to a technology which is better suited for normal software development.
It's an interesting subject and, incidentally, I'm about to start my master thesis on the subject of how best to make the GPU power available to the average developers (if possible) with CUDA as the main focus.
A long time ago, it was really hard to do floating point calculations (thousands/millions of cycles of emulation per instruction on terribly performing (by today's standards) CPUs like the 80386). People that needed floating point performance could get an FPU (for example, the 80387. The old FPU were fairly tightly integrated into the CPU's operation, but they were external. Later on they became integrated, with the 80486 having an FPU built-in.
The old-time FPU is analagous to GPU computation. We can already get it with AMD's APUs. An APU is a CPU with a GPU built into it.
So, I think the actual answer to your question is, GPU's won't become CPUs, instead CPU's will have a GPU built in.