Recommendation for OpenCL GPGPU [closed] - cuda

I first got into GPGPU with my (now aging) NVIDIA 9800GT 512MB via CUDA. It seems these days my GPU just doesn't cut it.
I'm specifically interested in OpenCL, as opposed to CUDA or StreamSDK, though some info on whether either of these are still worth pursuing would be nice.
My budget is around 150 GBP plus/minus 50 GBP. I'm a little out of the loop on which GPUs are best for scientific computing (specifically fluid simulation and 3D medical image processing).
A comparison of ATI vs. NVIDIA may also be helpful, if they are really so disparate.
[I'd also be interested to hear any suggestions on games that make use of GPGPU capabilities, but that's a minor issue next to the potential for scientific computing.]
I'm also a little lost when it comes to weighing the pros and cons of memory speed vs. clock speed vs. memory capacity, etc., so any info on these more technical aspects would be most appreciated.
Cheers.

If OpenCL were purely the requirement, I would say go with ATI, because they have a released version of OpenCL 1.1 drivers, whereas NVIDIA had beta drivers almost instantly when the spec was published but has not updated them since, and they have a couple of bugs, from what I've read on the NVIDIA OpenCL forums.
Personally I chose NVIDIA because it gives me all the options. You really ought to check out CUDA: it's a far more productive approach to leveraging the GPU and CPU with a common language. Down the road, Microsoft's AMP language extensions for C++ will provide the same sort of approach as CUDA in a more platform-agnostic way, and I'm sure they will be more widely adopted by the community than CUDA at that point.
Another reason to choose NVIDIA is that it's what the HPC system builders have been building systems with since NVIDIA made its huge push for GPGPU computing, whereas it's far less backed by AMD/ATI; there is really no answer to the Tesla lineup from that camp. Even Amazon EC2 offers a GPU compute cluster based on Tesla. So, if you're looking for reach and scale beyond the desktop, I think NVIDIA is the better bet.

Related

GPU Programming, CUDA or OpenCL or? [closed]

What is the best way to do GPU programming?
I know:
CUDA is very good, has a lot of developer support, and is very nice to debug, but it only runs on NVIDIA hardware.
OpenCL is very flexible: it runs on NVIDIA, AMD, and Intel hardware, on accelerators, GPUs, and CPUs, but as far as I know it is no longer supported by NVIDIA.
Coriander (https://github.com/hughperkins/coriander), which converts CUDA to OpenCL.
HIP (https://github.com/ROCm-Developer-Tools/HIP) is made by AMD so that code can be written once and built for both AMD and NVIDIA (CUDA). It can also convert existing CUDA code to HIP.
OpenCL would be my preferred way, since I want to stay very flexible in hardware support. But if it is no longer supported by NVIDIA, that is a knockout.
HIP then sounds best to me, with separate release builds per target. But how will it support Intel's upcoming hardware?
Are there any other options?
Important for me: broad hardware support and long-term support, so that the code can still be compiled in a few years and stays manufacturer-independent.
Additionally, it should work with more than one compiler and be supported on both Linux and Windows.
Nvidia won't cancel OpenCL support anytime soon.
A newly emerging approach for portable GPU code is SYCL. It enables higher-level programming from a single source file that is compiled twice, once for the CPU and once for the GPU. The GPU part then runs via either OpenCL, CUDA, or some other backend.
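To make the single-source idea concrete, a minimal SYCL vector add could look like the sketch below. This is only an illustration, assuming a SYCL 2020 implementation (such as DPC++ or AdaptiveCpp) is installed; the names used are placeholders from the SYCL standard, not any particular vendor's API.

```cpp
// Minimal SYCL vector-add sketch (SYCL 2020). Illustrative only.
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
    const size_t n = 1024;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    sycl::queue q;  // default selector: picks a GPU if available, otherwise the CPU

    {   // buffers manage host/device transfers automatically
        sycl::buffer<float> A(a.data(), sycl::range<1>(n));
        sycl::buffer<float> B(b.data(), sycl::range<1>(n));
        sycl::buffer<float> C(c.data(), sycl::range<1>(n));

        q.submit([&](sycl::handler& h) {
            sycl::accessor ra(A, h, sycl::read_only);
            sycl::accessor rb(B, h, sycl::read_only);
            sycl::accessor wc(C, h, sycl::write_only);
            h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                wc[i] = ra[i] + rb[i];  // same C++ source runs on CPU or GPU backends
            });
        });
    }   // buffer destructors copy results back into the host vectors

    std::cout << "c[0] = " << c[0] << '\n';  // expect 3
}
```

The same source can be retargeted to a different backend (e.g. OpenCL or CUDA) simply by switching the SYCL implementation or its compile flags.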
As of right now, however, the best-supported GPU framework across platforms is OpenCL 1.2, which is very well established at this point. With it your code runs on 10-year-old GPUs, on the latest and fastest data-center GPUs, on gaming and workstation GPUs, and even on CPUs if you need more memory. On NVIDIA GPUs there is no performance or efficiency trade-off at all compared to CUDA; it runs just as fast.
Porting tools like HIP are great if you already have a large code base, but performance may suffer. My advice is to pick one framework and commit to it fully, rather than using a tool to generate a possibly poorly optimized port.
If you choose to start with OpenCL, have a look at this OpenCL-Wrapper. The native OpenCL C++ bindings are a bit cumbersome to use, and this lightweight wrapper simplifies learning a lot, while keeping functionality and full performance.
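To get a feel for what the native C++ bindings look like before deciding on a wrapper, here is a small device-enumeration sketch. It assumes the Khronos CL/opencl.hpp header (shipped as CL/cl2.hpp in older SDKs) and an OpenCL 1.2 runtime; it is an illustration only, not a substitute for the wrapper mentioned above.

```cpp
// List every OpenCL platform and device on the system (GPUs, CPUs, accelerators).
// Minimal error handling; illustrative only.
#define CL_HPP_MINIMUM_OPENCL_VERSION 120
#define CL_HPP_TARGET_OPENCL_VERSION 120
#include <CL/opencl.hpp>
#include <iostream>
#include <vector>

int main() {
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);                    // all installed OpenCL runtimes

    for (auto& p : platforms) {
        std::vector<cl::Device> devices;
        p.getDevices(CL_DEVICE_TYPE_ALL, &devices);   // may be empty for this platform
        for (auto& d : devices) {
            std::cout << p.getInfo<CL_PLATFORM_NAME>() << " : "
                      << d.getInfo<CL_DEVICE_NAME>()  << " ("
                      << d.getInfo<CL_DEVICE_GLOBAL_MEM_SIZE>() / (1024 * 1024)
                      << " MB global memory)\n";
        }
    }
}
```

Seeing several platforms and device types show up in that list is exactly the portability argument for OpenCL 1.2 made above.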

Comparison of CUDA-enabled NVIDIA GPUs [closed]

I would like to ask a theoretical question about the compute capabilities of NVIDIA cards.
From my relatively short experience I have noticed that cards with CC 2.0 can perform better than 1.3 cards, though that really depends on the nature of the kernel and the occupancy each SM achieves.
But since everything has its advantages and disadvantages, what are the disadvantages of a 2.0 card and the advantages of a 1.3?
How can a 1.3 card run a certain kernel faster than a 2.0 card, and what characteristics would such a kernel have?
Any personal experience is welcome, and an explanation in terms of each card's architecture would be even better.
Regards
In general, the higher the compute capability, the more features the GPU supports.
Check out Wikipedia
Of course, if you write bad code for a GPU with a CC of 3.5 and great code for a GPU with a CC of 2.0, the 2.0 GPU can outperform the 3.5 GPU.
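If you want to check what your own cards report, a small host-side query along these lines works; this is a hedged sketch using the CUDA runtime API (compile with nvcc), and the file name is just a placeholder.

```cpp
// Print each CUDA device's compute capability and a few key properties.
// Build with e.g.: nvcc query_cc.cpp -o query_cc
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA-capable device found.\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::printf("Device %d: %s, compute capability %d.%d, %d SMs, %.1f GiB\n",
                    i, prop.name, prop.major, prop.minor,
                    prop.multiProcessorCount,
                    prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```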

Which Rendering Engine or Game Engine for Simulation with CUDA and F# Back-end [closed]

I'm putting together a simulation environment geared towards multi-agent/swarm robotic simulation. The parts I'll be doing from scratch are the math library, physics engine, and AI engine. These will be implemented in F#/CUDA. I'm looking for advice on how to best implement everything else, i.e. all of the other parts of a game engine. After a bit of research in the beginning, it seemed like OGRE would be my best bet, but I'm still unsure. Basically, I want to concentrate on the physics/AI stuff and have some game engine do everything else for me: rendering, scene graphs, etc.
Something I'm confused about is how my F#/CUDA back-end will interact with the rendering engine/game engine. It seems like the popular engines are set up so you could replace parts of the architecture with your own implementations fairly easily... Which one would be best for me to use?
Since I'm using CUDA and will be rendering results from the GPU, how will this affect the implementation of a rendering engine like OGRE? I know I can't be stuck just having to use straight OpenGL...
Currently, I'm looking at Ogre, Panda3D, jMonkey, and Gazebo. On the surface, Gazebo seems like it could be exactly what I'm looking for.
I am also considering Python and Lua, though I'm leaning towards the latter.
I'd like for this "simulation environment" to be easily converted to (or just used as) a real game engine. I know all of the elements would pretty much be there already; I'm just mentioning that in case it helps with giving me advice.
Also, I would really like for this thing to be able to run on the CUDA-capable Tegra 4s that are supposed to come out this quarter (last I checked). This may or may not be something that can accurately be determined at this point, but you will know better than I do... i.e. if I want the possibility of future Tegra/Android use, should I do everything in jMonkey? Would it matter?
...Overall I'm most interested in advice on what architecture configuration would work well with an F#/CUDA physics & AI engine.

Starling framework - only for games? [closed]

I'm going to be building these types of applications in AS3, targeting kiosks with multitouch.
https://vimeo.com/8869517
I understand Starling's raison d'etre is providing a layer of abstraction when targeting the GPU.
My first question would be, is there any advantage in targeting the GPU for this type of application?
And if so, would the Starling framework be a good choice? Or is it really only useful for games?
For example, from the description of the free O'Reilly book Introducing Starling: Building GPU Accelerated Applications:
Starling is an ActionScript 3 2D framework developed on top of the Stage3D APIs (available on desktop in Flash Player 11 and Adobe AIR 3). Starling is mainly designed for game development, but could be used for many other use cases. Starling makes it possible to write fast GPU accelerated applications without having to touch the low-level Stage3D APIs.
Most Flash developers want to be able to leverage GPU acceleration (through Stage3D) without the need to write such higher-level frameworks and dig into the low-level Stage3D APIs. Starling is completely designed after the Flash Player APIs and abstracts the complexity of Stage3D (Molehill), allowing easy and intuitive programming for everyone.
Obviously Starling is for ActionScript 3 developers, especially those involved in 2D game development; of course you will need to have a basic understanding of ActionScript 3. By its design (lightweight, flexible and simple), Starling can also be used for other use cases like UI programming. That said, everything is designed to be as intuitive as possible, so any Java™ or .Net™ developer will get the hang of it quickly as well.
Regarding the GPU: clearly any visually oriented runtime can benefit from hardware acceleration.
This also depends on the hardware specs of your kiosk.
There are many performance considerations beyond the GPU, such as leveraging StageVideo in your kiosk apps. You should also weigh the authoring requirements within Flash Pro.

How can one learn multi-threaded parallel programming? [closed]

No matter how proficient you are at programming C or Java, you should consider adding multi-threaded programming to your set of skills.
This is NOT something you should try to learn on your own. It is MUCH harder to learn than sequential programming. If you are a Technical Manager, you SHOULD invest in retraining your key staff in multi-threaded programming. You might also monitor the research activities in concurrent programming languages (like those listed above). You can be sure that your competitors will.
This is a quote from this article. I imagine most of us here are very proficient in teaching ourselves different languages, data structures, algorithms, etc, and I do recognize the mental shift that needs to occur to do parallel programming right.
I reject the idea that one cannot learn parallel programming "right" on their own. So what's the most responsible way to teach oneself parallel programming? What books and other resources are recommended?
Edit: Here are some more details. I would mostly be applying these to scientific computing, but I was looking for general, language-agnostic material and advice. I am also looking for a healthy dose of practical theory. Imagine you have an excellent developer who loves math and computer science but has never taken a course on parallel programming. Now imagine he has a deadline for a problem (say one year), and you have to give him the materials to figure out whether parallelization would be helpful and how to implement it right. What resources would you give him? That is how I (and, I hope, other developers) would be interested in learning parallelization/multi-threading.
If you pitched up in my workplace and asked that question I'd throw a couple of books at you:
Introduction to Parallel Computing and Parallel Scientific Computing
Your response may well be "that's not what I want to learn about!", so come back and be a bit more specific in your question and we'll be able to be a bit more specific in our answers.
But the most 'responsible' way to teach yourself this stuff is the same as the most responsible way to teach yourself any programming stuff: get a problem, get a toolbag, get a deadline, and get cracking.
For Microsoft technology, there is a wealth of information at the MSDN Parallel Computing portal here. You could start out with the Getting Started links.
That article is basically an advertisement for training services. You should treat a salesman's opinion of the value of his own products with a degree of circumspection.
I've no idea how you learned everything else you already know about computers, but if that worked for you I'd stick with the same approach for the next thing you want to learn.
I can't recommend any language/platform agnostic books - I suspect they'd be very academic anyway. If you're actually on .NET, then Jeff Richter writes a lot of good stuff about threading, and I believe the 3rd edition of his C#/CLR book (earlier editions were excellent) has a great deal about parallel programming.
If you read everything Google finds for the topics below, you'll have a pretty good start (a small test-and-set example follows the list). This assumes a general IT background; the topics are not language- or OS-specific:
Peterson's algorithm
atomic test-and-set
critical section
rendezvous
memory barriers
lock-free algorithms
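As a first hands-on exercise covering two of those items (atomic test-and-set and critical sections), a spinlock sketch in C++11 might look like this; it is illustrative only, and all names are made up.

```cpp
// A spinlock built on atomic test-and-set, guarding a critical section.
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

std::atomic_flag lock_flag = ATOMIC_FLAG_INIT;
long counter = 0;                       // shared data protected by the spinlock

void add_many(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        while (lock_flag.test_and_set(std::memory_order_acquire)) {
            // spin: another thread is inside the critical section
        }
        ++counter;                      // critical section
        lock_flag.clear(std::memory_order_release);
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)
        threads.emplace_back(add_many, 100000);
    for (auto& th : threads)
        th.join();
    std::cout << counter << '\n';       // 400000 with correct locking; less without it
}
```

Removing the lock and watching the count come out wrong is a quick way to see a data race in action.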
The mathematical background for much of this is Petri nets.
Read Dijkstra.