OpenCL examples with benchmarks - cuda

I'm looking for some introductory examples to OpenCL which illustrate the types of applications that can experience large (e.g., 50x-1000x) increases in speed. CUDA has lots of nice examples, but I haven't found the same thing for OpenCL.
A nice example might be global optimization of complex functions via particle swarms, simulated annealing, evolutionary algorithms, ant colony optimization, etc.

The algorithms you are describing are neither simple nor introductory from the perspective of GPU programming. The reason CUDA has examples in these areas is that it has been around long enough for people to have developed these examples. There is currently no publicly available version of OpenCL that runs on GPUs. Both ATI and NVIDIA are offering beta versions of their OpenCL drivers, but ATI's supports only CPU computation and NVIDIA's requires signing an NDA to get. Simply put, OpenCL has not been around long enough for comprehensive examples like these to have been developed and demonstrated.
That said, gaining access to NVIDIA's OpenCL drivers is not difficult; their forums explain how to apply for the beta. I assume that the OpenCL distribution contains some sample programs to help you get started.
This also means that it's an excellent opportunity for you to develop some of these benchmarks and post your results. Then people will refer to your work rather than you referring to their work. I wouldn't expect too many surprises though. OpenCL performance should be roughly on par with CUDA performance once it becomes widely available and supported.

There are some great examples in the SDK from NVIDIA:
http://developer.nvidia.com/object/get-opencl.html

Our team has been working on OpenCL algorithms and acceleration and we would like to suggest the article
http://www.cmsoft.com.br/index.php?view=article&catid=1:latest-news&id=247:opencl-simulated-annealing
as a sample implementation of the simulated annealing algorithm for minimization.

You could try the following two books:
Programming Massively Parallel Processors: A Hands-on Approach (NVIDIA) (chapters 1 and 2)
The OpenCL Programming Book: Parallel Programming for Multicore CPU and GPU (the history and components chapters)
Both go into detail explaining why these technologies were developed and where the real benefits can be found.
Not sure about benchmarking though; I haven't had any luck there myself either.

Related

How do I develop a CUDA application on my ATI card, to be later executed on NVIDIA?

My computer has an ATI graphics card, but I need to code an algorithm I already have in CUDA, to accelerate the process. Is that even possible? If so, does anyone have a link or tutorial covering everything from setting up my IDE to coding something simple like processing or passing an image? I also considered OpenCL, but I have not found any information on how to do anything with it.
This answer is more directed toward the part:
I also considered OpenCL, but I have not found any information on how to do anything with it.
Check on this NVIDIA site:
http://developer.nvidia.com/nvidia-gpu-computing-documentation
Scroll down and you will find:
OpenCL Programming Guide
This is a detailed programming guide for OpenCL developers.
OpenCL Best Practices Guide
This is a manual to help developers obtain the best performance from OpenCL.
OpenCL Overview for the CUDA Architecture
This whitepaper summarizes the guidelines for how to choose the best implementations for NVIDIA GPUs.
OpenCL Implementation Notes
This document describes the "implementation defined" behavior of the NVIDIA OpenCL implementation as required by the OpenCL specification version 1.0. The implementation-defined behavior is listed below in the order in which it is referenced in the OpenCL specification and is grouped by the specification's section numbers.
On AMD/ATI you have this site for a brief introduction:
http://www.amd.com/us/products/technologies/stream-technology/opencl/pages/opencl-intro.aspx
And for more resources check:
http://www.amd.com/us/products/technologies/stream-technology/Pages/training-resources.aspx
Unless CUDA is a hard requirement, you should reconsider OpenCL, since you can use it on both platforms, and you say you have one and want to develop for the other.
You might also want to take a look at these:
http://blogs.nvidia.com/2011/06/cuda-now-available-for-multiple-x86-processors/
http://www.pgroup.com/resources/cuda-x86.htm
I haven't tried it myself, but the prospect of running CUDA code on x86 seems pretty attractive.

Using High Level Shader Language for computational algorithms

So, I heard that some people have figured out ways to run programs on the GPU using High Level Shader Language, and I would like to start writing my own programs that run on the GPU rather than on my CPU, but I have been unable to find anything on the subject.
Does anyone have any experience with writing programs for the GPU or know of any documentation on the subject?
Thanks.
For computation, CUDA and OpenCL are more suitable than shader languages. For CUDA, I highly recommend the book CUDA by Example. The book is aimed at absolute beginners to this area of programming.
The best way, I think, to start is to:
Have a CUDA-capable card from NVIDIA
Download the driver + toolkit + SDK
Build the examples
Read the CUDA Programming Guide
Start by recreating the deviceQuery (device info) example
Try to allocate memory on the GPU
Try to create a little kernel (see the sketch after this list)
From there you should be able to gain enough momentum to learn the rest.
Once you learn CUDA, OpenCL and the others are a breeze.
I am suggesting CUDA because it is the most widely supported and tested.
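For the "allocate memory" and "little kernel" steps, a minimal sketch of the runtime-API workflow might look like the following (addOne and the sizes are just illustrative placeholders, not taken from the SDK):

    #include <cstdio>
    #include <cuda_runtime.h>

    // A tiny kernel: each thread increments one element.
    __global__ void addOne(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] += 1.0f;
    }

    int main()
    {
        const int n = 1024;
        float host[n];
        for (int i = 0; i < n; ++i) host[i] = (float)i;

        float *dev = 0;
        cudaMalloc((void**)&dev, n * sizeof(float));                      // allocate on the GPU
        cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice); // copy input over

        addOne<<<(n + 255) / 256, 256>>>(dev, n);                         // launch the little kernel

        cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost); // copy results back
        cudaFree(dev);

        printf("host[0] = %f, host[n-1] = %f\n", host[0], host[n - 1]);
        return 0;
    }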

What is CUDA like? What is it for? What are the benefits? And how to start?

I am interested in developing with some new technology and I was thinking of trying out CUDA. Now... their documentation is too technical and doesn't provide the answers I'm looking for. Also, I'd like to hear those answers from people who already have some experience with CUDA.
Basically my questions are those in the title:
What exactly IS CUDA? (is it a framework? Or an API? What?)
What is it for? (is there something more than just programming to the GPU?)
What is it like?
What are the benefits of programming against CUDA instead of programming to the CPU?
What is a good place to start programming with CUDA?
CUDA brings together several things:
Massively parallel hardware designed to run generic (non-graphic) code, with appropriate drivers for doing so.
A programming language based on C for programming said hardware, and an assembly language that other programming languages can use as a target.
A software development kit that includes libraries, various debugging, profiling and compiling tools, and bindings that let CPU-side programming languages invoke GPU-side code.
The point of CUDA is to write code that can run on compatible massively parallel SIMD architectures: this includes several GPU types as well as dedicated compute hardware such as NVIDIA's Tesla cards. Massively parallel hardware can run a significantly larger number of operations per second than the CPU, at a fairly similar financial cost, yielding performance improvements of 50× or more in situations that allow it.
One of the benefits of CUDA over the earlier methods is that a general-purpose language is available, instead of having to use pixel and vertex shaders to emulate general-purpose computers. That language is based on C with a few additional keywords and concepts, which makes it fairly easy for non-GPU programmers to pick up.
It's also a sign that nVidia is willing to support general-purpose parallelization on their hardware: it now sounds less like "hacking around with the GPU" and more like "using a vendor-supported technology", and that makes its adoption easier in presence of non-technical stakeholders.
To start using CUDA, download the SDK, read the manual (seriously, it's not that complicated if you already know C) and buy CUDA-compatible hardware (you can use the emulator at first, but since performance is the ultimate point of this, it's better if you can actually try your code out).
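As a rough illustration of those few additional keywords (a generic SAXPY sketch, not taken from NVIDIA's material): __global__ marks a function that runs on the GPU, threadIdx and blockIdx tell each thread which element it owns, and the <<<...>>> syntax launches the kernel from ordinary CPU code.

    // y = a*x + y, computed with one GPU thread per element.
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // this thread's element index
        if (i < n)
            y[i] = a * x[i] + y[i];
    }

    // Launched from ordinary CPU code (after copying x and y to the device):
    //   saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);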
(Disclaimer: I have only used CUDA for a semester project in 2008, so things might have changed since then.) CUDA is a development toolchain for creating programs that can run on nVidia GPUs, as well as an API for controlling such programs from the CPU.
The benefit of GPU programming vs. CPU programming is that for some highly parallelizable problems you can gain massive speedups (about two orders of magnitude). However, many problems are difficult or impossible to formulate in a manner that makes them suitable for parallelization.
In one sense, CUDA is fairly straightforward, because you can use regular C to create the programs. However, in order to achieve good performance, a lot of things must be taken into account, including many low-level details of the Tesla GPU architecture.

best way of using CUDA

There are several ways of using CUDA:
1. auto-parallelizing tools such as PGI Workstation;
2. wrappers such as Thrust (in STL style);
3. the NVIDIA GPU SDK (runtime/driver API).
Which one is better in terms of performance, learning curve, or other factors?
Any suggestions?
Performance ranking will likely be 3, 2, 1.
The learning curve (easiest first) is (1+2), then 3.
If you become a CUDA expert, it will be next to impossible to beat the performance of hand-rolled code written against the GPU SDK, using all the tricks in the book, because of the control it gives you.
That said, a wrapper like Thrust is written by NVIDIA engineers and has been shown on several problems to have 90-95+% efficiency compared with hand-rolled CUDA. The reductions, scans, and many cool iterators they have are useful for a wide class of problems too.
Auto-parallelizing tools tend to not do quite as good a job with the different memory types as karlphillip mentioned.
My preferred workflow is using Thrust to write as much as I can and then using the GPU SDK for the rest. This is largely a factor of not trading away too much performance to reduce development time and increase maintainability.
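For a flavour of the Thrust style mentioned above, here is a minimal generic sketch (not one of NVIDIA's samples) using a transform, a reduction, and a scan:

    #include <thrust/device_vector.h>
    #include <thrust/transform.h>
    #include <thrust/reduce.h>
    #include <thrust/scan.h>
    #include <thrust/functional.h>
    #include <cstdio>

    int main()
    {
        // 1M values on the device, initialized to 1.0f.
        thrust::device_vector<float> x(1 << 20, 1.0f);
        thrust::device_vector<float> y(1 << 20);

        // Element-wise transform: y = -x (runs as a CUDA kernel under the hood).
        thrust::transform(x.begin(), x.end(), y.begin(), thrust::negate<float>());

        // Reduction: sum of all elements of x.
        float sum = thrust::reduce(x.begin(), x.end(), 0.0f, thrust::plus<float>());

        // Inclusive prefix sum (scan) of x, written back in place.
        thrust::inclusive_scan(x.begin(), x.end(), x.begin());

        printf("sum = %f, last prefix = %f\n", sum, (float)x.back());
        return 0;
    }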
Go with the traditional CUDA SDK, for both performance and smaller learning curve.
CUDA exposes several types of memory (global, shared, texture) which have a dramatic impact on the performance of your application; there are great articles about this on the web.
This page is very interesting and mentions the great series of articles about CUDA on Dr. Dobb's.
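As a hedged illustration of why the memory types matter, here is a generic sketch (not from the SDK or the articles above) of a per-block sum that stages data in fast on-chip __shared__ memory instead of having every thread repeatedly touch slow global memory:

    // Per-block sum using on-chip shared memory; each block writes one partial sum.
    // Assumes the kernel is launched with 256 threads per block (a power of two).
    __global__ void blockSum(const float *in, float *partial, int n)
    {
        __shared__ float cache[256];                      // fast on-chip memory, one slot per thread

        int i   = blockIdx.x * blockDim.x + threadIdx.x;
        int tid = threadIdx.x;

        cache[tid] = (i < n) ? in[i] : 0.0f;              // one coalesced read from global memory
        __syncthreads();

        // Tree reduction entirely in shared memory.
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (tid < stride)
                cache[tid] += cache[tid + stride];
            __syncthreads();
        }

        if (tid == 0)
            partial[blockIdx.x] = cache[0];               // one write per block back to global memory
    }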
I believe that the NVIDIA GPU SDK is the best, with a few caveats. For example, try to avoid using the cutil.h functions, as these were written solely for use with the SDK samples; I personally, as well as many others, have run into problems and bugs in them that are hard to fix. (There is also no documentation for this "library", and I've heard that NVIDIA does not support it at all.)
Instead, as you mentioned, use one of the two provided APIs. In particular I recommend the Runtime API, as it is a higher-level API, so you don't have to worry quite as much about all of the low-level implementation details as you do with the Driver API.
Both APIs are fully documented in the CUDA Programming Guide and CUDA Reference Guide, both of which are updated and provided with each CUDA release.
It depends on what you want to do on the GPU. If your algorithm would benefit greatly from the things Thrust offers, like reduction and prefix sum (scan), then Thrust is definitely worth a try, and I bet you can't write the code faster yourself in pure CUDA C.
However, if you're porting already parallel algorithms from the CPU to the GPU, it might be easier to write them in plain CUDA C. I have already had successful projects with good speedups going this route, and the CPU/GPU code that does the actual calculations is almost identical.
You can combine the two paradigms to some extent, but as far as I know each Thrust call launches its own kernel; if you want everything in one big fat kernel (taking overly frequent kernel launches out of the equation), you have to use plain CUDA C with the SDK.
I find pure CUDA C actually easier to learn, as it gives you quite a good understanding of what is going on on the GPU. Thrust adds a lot of magic between your lines of code.
I have never used auto-parallelizing tools such as PGI Workstation, but I wouldn't advise adding even more "magic" into the equation.
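To illustrate the point above about the CPU and GPU calculation code being almost identical, here is a minimal generic sketch (body, runOnCpu and runOnGpu are just placeholder names): the numerical routine is marked __host__ __device__ so the same source compiles for both sides, and only the thin loop/kernel wrappers differ.

    // The actual calculation: identical source for CPU and GPU.
    __host__ __device__ float body(float x)
    {
        return x * x + 1.0f;
    }

    // CPU version: a plain loop.
    void runOnCpu(const float *in, float *out, int n)
    {
        for (int i = 0; i < n; ++i)
            out[i] = body(in[i]);
    }

    // GPU version: one thread per element, calling the same body().
    __global__ void runOnGpu(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = body(in[i]);
    }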

Financial applications on GPGPU

I want to know what sort of financial applications can be implemented on a GPGPU. I'm aware of option pricing / stock price estimation using Monte Carlo simulation on a GPGPU with CUDA. Can someone enumerate the various possibilities of utilizing GPGPU for applications in the finance domain?
There are many financial applications that can be run on the GPU in various fields, including pricing and risk. There are some links from NVIDIA's Computational Finance page.
It's true that Monte Carlo is the most obvious starting point for many people. Monte Carlo is a very broad class of applications, many of which are amenable to the GPU. Many lattice-based problems can also be run on the GPU. Explicit finite difference methods run well and are simple to implement; there are many examples on NVIDIA's site as well as in the SDK, and they are also used a lot in oil & gas codes, so there is plenty of material. Implicit finite difference methods can also work well depending on the exact nature of the problem; Mike Giles has a 3D ADI solver on his site, which also has other useful finance material.
GPUs are also good for linear algebra type problems, especially where you can keep the data on the GPU long enough to do a reasonable amount of work. NVIDIA provides cuBLAS with the CUDA Toolkit and you can get cuLAPACK too.
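To show why explicit finite difference maps so naturally onto the GPU, here is a generic 1D explicit step as a sketch (not any particular pricing model; lambda and the fixed-boundary handling are simplifying assumptions): each new grid value depends only on three old neighbours, so one thread per grid point works well.

    // One explicit finite-difference time step on a 1D grid:
    // each interior point is updated from its old value and its two neighbours.
    __global__ void explicitStep(const float *uOld, float *uNew, int n, float lambda)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i > 0 && i < n - 1)
            uNew[i] = uOld[i] + lambda * (uOld[i - 1] - 2.0f * uOld[i] + uOld[i + 1]);
        else if (i == 0 || i == n - 1)
            uNew[i] = uOld[i];                 // keep the boundary values fixed (simplest choice)
    }

    // Host side: time-march by swapping the two buffers between launches, e.g.
    //   for (int t = 0; t < steps; ++t) {
    //       explicitStep<<<blocks, threads>>>(d_uOld, d_uNew, n, lambda);
    //       std::swap(d_uOld, d_uNew);
    //   }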
Basically, anything that requires a lot of parallel mathematics to run. As you originally stated, Monte Carlo simulation of options that cannot be priced with closed-form solutions is an excellent candidate. Anything that involves large matrices and operations upon them will be ideal; after all, 3D graphics uses a lot of matrix mathematics.
Given that many trader desktops sometimes have 'workstation'-class GPUs in order to drive several monitors, possibly with video feeds and limited 3D graphics (volatility surfaces, etc.), it would make sense to run some of the pricing analytics on the GPU rather than pushing the responsibility onto a compute grid; in my experience the compute grids frequently struggle under the weight of everyone in the bank trying to use them, and some of the grid computing products leave a lot to be desired.
Outside of this particular problem, there's not a great deal more that can be easily achieved with GPUs, because the instruction set and pipelines are more limited in their functional scope compared to a regular CISC CPU.
The problem with adoption has been one of standardisation; NVIDIA had CUDA, ATI had Stream. Most banks have enough vendor lock-in to deal with without hooking their derivatives analytics (which many regard as extremely sensitive IP) into a graphics card vendor's acceleration technology. I suppose with the availability of OpenCL as an open standard this may change.
F# is used a lot in finance, so you might check out these links
http://blogs.msdn.com/satnam_singh/archive/2009/12/15/gpgpu-and-x64-multicore-programming-with-accelerator-from-f.aspx
http://tomasp.net/blog/accelerator-intro.aspx
High-end GPUs are starting to offer ECC memory (a serious consideration for financial and, eh, military applications) and high-precision types.
But it really is all about Monte Carlo at the moment.
You can go to workshops on it, and from their descriptions see that they'll focus on Monte Carlo.
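For a flavour of what GPU Monte Carlo looks like, here is a hedged sketch of a European call under geometric Brownian motion using the cuRAND device API (one path per thread; the parameters and the final averaging step are simplified):

    #include <curand_kernel.h>

    // Each thread simulates one terminal stock price and writes its discounted payoff.
    __global__ void mcEuropeanCall(float *payoffs, int nPaths,
                                   float S0, float K, float r, float sigma, float T,
                                   unsigned long long seed)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nPaths) return;

        curandState state;
        curand_init(seed, i, 0, &state);                     // independent random stream per thread

        float z  = curand_normal(&state);                    // standard normal draw
        float ST = S0 * expf((r - 0.5f * sigma * sigma) * T + sigma * sqrtf(T) * z);

        payoffs[i] = expf(-r * T) * fmaxf(ST - K, 0.0f);     // discounted call payoff
    }

    // The host then averages the payoffs (e.g. with thrust::reduce) to get the price estimate.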
A good start would probably be to check NVIDIA's website:
CUDA's Finance Showcases
CUDA's Finance Tutorials
Using a GPU introduces limitations to architecture, deployment and maintenance of your app.
Think twice before you invest efforts in such solution.
E.g. if you're running in a virtualized environment, all physical machines would need GPU hardware installed, plus special vGPU hardware and software support and licenses.
What if you decide to host your service in the cloud (e.g. Azure, Amazon)?
In many cases it is worth building your architecture in advance to support scale out and be flexible and scalable (with some overhead of course) rather than scale up and squeeze as much as you can from your hardware.
Answering the complement of your question: anything that involves accounting can't be done on a GPGPU (or in binary floating point, for that matter).
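A tiny illustration of that last point (host-only code, just as easily compiled by nvcc as by any C++ compiler): amounts like 0.10 are not exactly representable in binary floating point, so repeated arithmetic drifts away from the exact decimal answer.

    #include <cstdio>

    int main()
    {
        float cents = 0.0f;
        for (int i = 0; i < 1000; ++i)
            cents += 0.10f;                 // add ten cents a thousand times

        // Exact decimal arithmetic gives 100.00; binary floating point
        // typically prints something slightly off from that.
        printf("sum = %.6f (expected 100.000000)\n", cents);
        return 0;
    }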