Does anyone know any good math functions that cause a lot of load on the CPU? I want to create a simple program that just creates load for X seconds while another program monitors it. I'm just looking for functions, not actual stress-testing programs.
The Computer Language Benchmarks Game has quite a few benchmarks, many of which are math-based. It's a good source because the source code for each algorithm is included, and there are implementations of each benchmark in dozens of languages. That way, you can just use the implementation in whatever language you're comfortable compiling and running.
Try the Lucas-Lehmer primality test. It's what's used in the Prime95 executable, and Prime95's fairly standard for CPU stress testing.
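For scale, the core of the test is tiny. Here's a hedged C++ sketch, limited to exponents p <= 31 so the squaring fits in 64-bit arithmetic (Prime95 itself uses FFT-based big-number squaring to handle huge exponents):

#include <cstdint>
#include <cstdio>

// Lucas-Lehmer test for M_p = 2^p - 1, p an odd prime:
// s_0 = 4, s_{k+1} = s_k^2 - 2 (mod M_p); M_p is prime iff s_{p-2} == 0.
bool lucasLehmer(unsigned p) {
    const std::uint64_t m = (1ULL << p) - 1;  // M_p; needs p <= 31 here
    std::uint64_t s = 4;
    for (unsigned k = 0; k + 2 < p; ++k)      // p - 2 squarings
        s = (s * s + m - 2) % m;              // +m avoids unsigned underflow
    return s == 0;
}

int main() {
    for (unsigned p : {3u, 5u, 7u, 11u, 13u, 17u, 19u, 31u})
        std::printf("M_%u is %s\n", p, lucasLehmer(p) ? "prime" : "composite");
}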
A naive implementation of Fibonacci? Something like:
let fib = Seq.unfold (fun (p, c) -> Some(p, (c, p + c))) (1, 1) // F#: lazy, infinite Fibonacci sequence
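Note that the unfold above is lazy and linear-time, so it won't generate much load by itself. For pure CPU burn, the classically naive exponential recursion is the better fit; here's a hedged C++ sketch (the function names are just illustrative) that pegs one core for a given number of seconds:

#include <chrono>
#include <cstdio>

// Deliberately naive: exponential-time recursion, which is exactly
// what makes it useful as a load generator.
unsigned long long fib(unsigned n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Spin on fib() until `seconds` have elapsed.
void burn(double seconds) {
    using clock = std::chrono::steady_clock;
    auto end = clock::now() + std::chrono::duration<double>(seconds);
    volatile unsigned long long sink = 0;   // keep the optimizer honest
    while (clock::now() < end)
        sink += fib(30);
}

int main() {
    burn(10.0);                             // load one core for ~10 seconds
    std::puts("done");
}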
I have no problem with the IO monad. But I want to understand the following:
In all (or almost all) Haskell tutorials and textbooks, they keep saying that getChar is not a pure function, because it can give you a different result each time. My question is: who said that this is a function in the first place? Unless you give me the implementation of this function, and I study that implementation, I can't tell whether it is pure. So, where is that implementation?
Also in these tutorials and textbooks, it's said that, say, an IO String is an action that, when executed, can give you back a value of type String. This is fine, but who performs this execution, and where does it take place? Of course, the computer is doing the execution. That is OK too, but since I am only a beginner, I hope you forgive me for asking: where is the recipe for this "execution"? I would guess it is not written in Haskell. Does this latter idea mean that, after all, a Haskell program is converted into a C-like program, which will eventually be converted into assembly and then machine code? If so, where can one find the implementation of the IO stuff in Haskell?
Many thanks
Haskell functions are not the same as computations.
A computation is a piece of imperative code (perhaps written in C or assembly, and then compiled to machine code, directly executable on a processor) that is by nature effectful and even unrestricted in its effects. That is, once it is run, a computation may access and alter any memory and perform any operations, such as interacting with the keyboard and screen, or even launching missiles.
By contrast, a function in a pure language, such as Haskell, is unable to alter arbitrary memory and launch missiles. It can only alter its own personal section of memory and return a result that is specified in its type.
So, in a sense, Haskell is a language that cannot do anything. Haskell is useless. This was a major problem during the 1990s, until IO was integrated into Haskell.
Now, an IO a value is a link to a separately prepared computation that will, eventually, hopefully, produce a value of type a. You will not be able to create an IO a out of pure Haskell functions. All the IO primitives are designed separately and packaged into GHC. You can then compose these simple computations into less trivial ones, and eventually your program may have any effects you wish.
One point, though: pure functions are separate from each other; they can only influence one another if you use them together. Computations, on the other hand, may interact with each other freely (as I said, they can generally do anything), and therefore can (and do) accidentally break each other. That's why there are so many bugs in software written in imperative languages! So, in Haskell, computations are kept in IO.
I hope this dispels at least some of your confusion.
Just out of curiosity: cuBLAS is a library for basic matrix computations. But these computations, in general, can also be written in plain CUDA code easily, without using cuBLAS. So what is the major difference between the cuBLAS library and your own CUDA program for matrix computations?
We highly recommend developers use cuBLAS (or cuFFT, cuRAND, cuSPARSE, thrust, NPP) when suitable, for many reasons:
We validate correctness across every supported hardware platform, including those which we know are coming up but which maybe haven't been released yet. For complex routines, it is entirely possible to have bugs which show up on one architecture (or even one chip) but not on others. This can even happen with changes to the compiler, the runtime, etc.
We test our libraries for performance regressions across the same wide range of platforms.
We can fix bugs in our code if you find them. Hard for us to do this with your code :)
We are always looking for reusable and useful bits of functionality that can be pulled into a library - this saves you a ton of development time, and makes your code easier to read by coding to a higher-level API.
Honestly, at this point, I can probably count on one hand the number of developers out there who actually implement their own dense linear algebra routines rather than calling cuBLAS. It's a good exercise when you're learning CUDA, but for production code it's usually best to use a library.
(Disclosure: I run the CUDA Library team)
There are several reasons you'd choose to use a library instead of writing your own implementation. Three, off the top of my head:
You don't have to write it. Why do work when somebody else has done it for you?
It will be optimised. NVIDIA-supported libraries such as cuBLAS are likely to be optimised for all current GPU generations, and later releases will be optimised for later generations. While most BLAS operations may seem fairly simple to implement, to get peak performance you have to optimise for the hardware (this is not unique to GPUs). A simple implementation of SGEMM, for example, may be many times slower than an optimised version (a call sketch follows below).
They tend to work. There's probably less chance you'll run up against a bug in a library than that you'll create a bug in your own implementation which bites you when you change some parameter or other in the future.
The above isn't just relevant to cuBLAS: if you have a method that's in a well-supported library, you'll probably save a lot of time and gain a lot of performance by using it relative to your own implementation.
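To see how little host code the library route takes, here's a hedged C++ sketch of C = A*B with cublasSgemm (error checking omitted for brevity; cuBLAS assumes column-major storage). This one call replaces an entire hand-tuned kernel:

#include <cublas_v2.h>
#include <cuda_runtime.h>

// Multiply two n x n matrices with cuBLAS: C = 1.0*A*B + 0.0*C.
void gemm(int n, const float* hA, const float* hB, float* hC) {
    float *dA, *dB, *dC;
    size_t bytes = size_t(n) * n * sizeof(float);
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);
    cublasDestroy(handle);

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
}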
I have code written in old-style Fortran 95 for combustion modelling. One of the features of this problem is that one has to solve a stiff ODE system to take into account the influence of chemical reactions. For this purpose I use the Fortran SLATEC library, which is also quite old. The solution procedure is straightforward: one just needs to call the subroutine ddriv3 in every cell of the computational domain, so it looks something like this:
do i = 1, Number_of_cells  ! Number of cells is about 2000
  call ddriv3(...)         ! All calls are independent of cell number i
end do
ddriv3 is quite complex and utilizes many other library functions.
Is there any way to get an advantage with CUDA Fortran without searching for another library for this purpose? If I just run this as a "parallel loop", will that be efficient, or is there another way?
I'm sorry for the kind of question that immediately invites the most obvious answer ("Why wouldn't you just try it and find out for yourself?"), but I'm under really tight time constraints. I have no experience with CUDA and I just want to choose the most correct and easiest way to start.
Thanks in advance!
You won't be able to use or parallelize the ddriv3 call without some effort. Your usage of the phrase "parallel loop" suggests to me you may be thinking of using OpenACC directives with Fortran, as opposed to CUDA Fortran, but the general answer isn't any different in either case.
The ddriv3 call, being part of a Fortran library (which is presumably compiled for x86 usage) cannot be directly used in either CUDA Fortran (i.e. using CUDA GPU kernels within Fortran) or in OpenACC Fortran, for essentially the same reason: The library code is x86 code and cannot be used on the GPU.
Since presumably you may have access to the source implementation of ddriv3, you might be able to extract the source code, and work on creating a CUDA version of it (or a version that OpenACC won't choke on), but if it uses many other library routines, it may mean that you have to create CUDA (or direct Fortran source, for OpenACC) versions of each of those library calls as well. If you have no experience with CUDA, this might not be what you want to do (I don't know.) If you go down this path, it would certainly imply learning more about CUDA, or at least converting the library calls to direct Fortran source (for an OpenACC version).
For the above reasons, it might make sense to investigate whether a GPU library replacement (or something similar) might exist for the ddriv3 call (but you specifically excluded that option in your question.) There are certainly GPU libraries that can assist in solving ODE's.
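To illustrate the shape a from-scratch port would take, here's a hedged CUDA C++ sketch (a CUDA Fortran version would be structured the same way). The device solver body is a placeholder, a single explicit Euler step, standing in for whatever stiff-capable GPU solver would replace ddriv3:

#include <cuda_runtime.h>

// Placeholder for the real work: ddriv3's x86 library code cannot run on the
// GPU, so a device-resident solver must be written (or found) to replace it.
// Here: one explicit Euler step of dy/dt = -y, purely for illustration; a
// real replacement needs an implicit, stiff-capable method.
__device__ void solve_cell_ode(double* y, int n_species, double dt) {
    for (int s = 0; s < n_species; ++s)
        y[s] += dt * (-y[s]);
}

// One thread per cell; valid only because the ~2000 per-cell solves are
// independent, exactly as in the original Fortran loop.
__global__ void solve_all_cells(double* y, int n_cells, int n_species, double dt) {
    int cell = blockIdx.x * blockDim.x + threadIdx.x;
    if (cell < n_cells)
        solve_cell_ode(y + (size_t)cell * n_species, n_species, dt);
}

// Launch example:
// solve_all_cells<<<(n_cells + 127) / 128, 128>>>(d_y, n_cells, n_species, dt);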
I've recently started a project, building a physics engine.
I was hoping you could give me some advice related to some documentation and/or best technologies for this.
First of all, I've seen that Game Physics Engine Development is highly recommended for the task at hand, and I was wondering if you could give me a second opinion. Should I get it?
Also, while browsing Amazon, I stumbled onto Game Engine Architecture, and since I want to build my physics engine for games, I thought this might be a good read as well.
Second, I know that simulating physics is highly computation-intensive, so I would like to use either CUDA or OpenCL. Right now I'm leaning towards OpenCL, because it would work on both NVIDIA and ATI chipsets. What do you guys suggest?
PS: I will be implementing this in C++ on Linux.
Thanks.
I would suggest first of all planning a simple game as a test case for your engine. Having a basic game will drive feature and API development. Writing an engine without a clear goal makes the project riskier. While I agree NVIDIA and ATI should be treated as separate targets for performance reasons, I'd recommend you start with neither.
I personally wrote the physics engine for Uncharted: Drake's Fortune, a PS3 game. I did a first pass in C++, and when it worked, made a pass to optimize it for VMX and then put it on the SPUs. Mind you, I did just a fraction of what I wanted to do initially because of time constraints.

After that I made an iteration to split the data stages out and formulate a pipeline of data transformations. This matters because, whether on CPU, GPU or SPU, modern processors running nontrivial code spend most of their time waiting on caches. You have to pay special attention to data structures and pipeline them such that you have a small working set of data at any stage. For example, broadphase comes first, and it doesn't need shapes, only world-space bounding boxes. So I split the bounding boxes into a separate array and compute them all together in another pass that writes them out in an optimal layout. As input to the bbox computation I need shape transformations and some bounds derived from them, but not necessarily the whole shapes. After broadphase, I compute/update simulation islands, at the same time performing the narrow phase, for which I do actually need the shapes. And so on; I described this with pictures in an article I wrote for Game Physics Pearls.
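To make the data-layout point concrete, here's a hedged C++ sketch of the two-pass idea. The sphere shapes and the O(n^2) overlap test are simplifications of my own; a real broadphase would use sweep-and-prune or a grid:

#include <cstdio>
#include <utility>
#include <vector>

struct AABB { float lo[3], hi[3]; };
struct Sphere { float center[3]; float radius; };  // stand-in for "shape"

// Pass 1: compute all world-space bounding boxes into one contiguous array.
// Broadphase never needs to touch the shapes themselves.
std::vector<AABB> computeBounds(const std::vector<Sphere>& shapes) {
    std::vector<AABB> boxes(shapes.size());
    for (std::size_t i = 0; i < shapes.size(); ++i)
        for (int a = 0; a < 3; ++a) {
            boxes[i].lo[a] = shapes[i].center[a] - shapes[i].radius;
            boxes[i].hi[a] = shapes[i].center[a] + shapes[i].radius;
        }
    return boxes;
}

static bool overlap(const AABB& x, const AABB& y) {
    for (int a = 0; a < 3; ++a)
        if (x.hi[a] < y.lo[a] || y.hi[a] < x.lo[a]) return false;
    return true;
}

// Pass 2: broadphase walks only the compact AABB array, so the working
// set at this stage stays small and cache-friendly.
std::vector<std::pair<std::size_t, std::size_t>>
broadphase(const std::vector<AABB>& boxes) {
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t i = 0; i < boxes.size(); ++i)
        for (std::size_t j = i + 1; j < boxes.size(); ++j)
            if (overlap(boxes[i], boxes[j])) pairs.emplace_back(i, j);
    return pairs;
}

int main() {
    std::vector<Sphere> shapes = {{{0, 0, 0}, 1}, {{1.5f, 0, 0}, 1}, {{10, 0, 0}, 1}};
    for (auto& p : broadphase(computeBounds(shapes)))
        std::printf("overlap: %zu %zu\n", p.first, p.second);
}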
I guess what I'm trying to say are the following points:
Make sure you have a clear goal that drives your development; a very basic game with a fleshed-out design would be best in the case of a game physics engine.
Don't try to optimize before you have a working product. Write it in the simplest and fastest way possible first and fix all the bugs in math. Design it so that you can port it to CUDA later, but don't start writing CUDA kernels before you have boxes rolling on the screen.
After you write the first pass in C++, optimize it for the CPU: streamline it such that it doesn't thrash the cache, and compartmentalize the code so that there's no spaghetti of calls and all the code from each stage is localized. This will help you a) port to CUDA, b) port to OpenCL, c) port to a console, d) make it run reasonably fast, and e) make it possible to debug.
While developing, resist the temptation to go off and do something you just thought about unless that feature is necessary for your clear goal (see #1). That's why you need a goal: to steer you towards it and make it possible to finish the actual project. Distractions usually kill projects without clear goals.
Remember that in one way or another, software development is iterative. It's OK to do a rough-in and then refine it. Lather, rinse, repeat; it's the mantra of a programmer :)
It's easy to give advice. If you wanna do something, just go and do it, and we'll sit back and critique :)
Here is an answer regarding the choice of CUDA or OpenCL. I do not have a recommendation for a book.
If you want to run your program on both NVIDIA and ATI chipsets, then OpenCL will make the job easier. However, you will want to write a different version of each kernel to get good performance on each chipset. For example, on ATI cards you'll want to manually vectorize code using the float4/int4 data types (or accept a nearly 4x performance penalty), while NVIDIA works better with scalar data types; a small sketch of the difference follows.
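As a hedged sketch of that difference (the kernel names and the trivial scale operation are made up for illustration), the host-side C++ would carry two variants of the same kernel source, differing only in the data type each work-item handles:

// Scalar variant: one float per work-item; fine on NVIDIA hardware.
const char* scalarSrc = R"(
    __kernel void scale(__global const float* in, __global float* out, float k) {
        size_t i = get_global_id(0);
        out[i] = k * in[i];
    })";

// Hand-vectorized variant for ATI: four floats per work-item via float4,
// which keeps the wide ALUs busy instead of leaving 3 of 4 lanes idle.
const char* vec4Src = R"(
    __kernel void scale4(__global const float4* in, __global float4* out, float k) {
        size_t i = get_global_id(0);
        out[i] = k * in[i];
    })";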
If you're only targeting NVIDIA, then CUDA is somewhat more convenient to program in.
In every Turing-complete language, is it possible to create a working:
Compiler for itself, which first runs on an interpreter written in some other language and then compiles its own source code? (Bootstrapping)
Standards-compliant C++ compiler which outputs binaries for, e.g., Windows?
Regex parser and evaluator?
World of Warcraft clone? (Assuming the language gets the necessary API bindings, for example OpenGL, and the WoW source code is available)
(Everything here theoretical)
Let's take Brainf*ck as an example language.
In every Turing-Complete language, is it possible to create a working...
If one Turing-complete language can do it, then they all can. In this sense, they're all equally "powerful". Since everything you described already exists in at least one Turing-complete language, any of these programs can be written in any other Turing-complete language.
However, merely because something is possible doesn't mean it's easy, or even feasible. That's a hugely important distinction, and it's the crux of why different programming languages exist. They're not all equally good at making specific kinds of software -- if they were, we'd only need one language!
Turing-completeness only expresses computation capability, nothing about I/O capability!
No, Turing completeness has nothing to do with I/O or hardware.
However, you can pretend that I/O, hardware systems and graphics systems exist by using variables (or the "memory tape"). In BF, you could use the first 2 cells (x, y) for the "pretended" screen resolution, then another x times y cells for all the pixels on the screen, then the next cell (n) for the "pretended" filesystem size, then the next n cells for the filesystem contents...
Yes, of course, all of those. That's what "Turing-complete" means, after all: it can compute everything that can be computed.
All that being Turing-complete really requires is that you can do simple math, have some variables, and can do a while loop. Or any number of equivalent things. If you want to write real programs, you need a bit more (notably syscalls), and you have to worry about efficiency too (Turing machines can be very slow...). In theory there's no difference between Turing-equivalent systems, but in practice there is.
If anyone does a WoW client in BF, I will be very impressed!
Any algorithm that can be implemented in one Turing-Complete language can be implemented in any other. Your questions have more to do with the operating system services and APIs which must be made available through the language in question.
In short, the answer is yes to all of the above from a formal language point of view.
Theoretically, yes. But a much more interesting question is whether it would be practically possible given a certain "esoteric" programming language.
Every Turing-complete language can compute the same set of functions.
So, a Turing-complete language can do everything you wrote, because those things are computed with other Turing-complete languages.