Numerical stability of ODE system - numerical-methods

I have to perform a numerical solving of an ODE system which has the following form:
du_j/dt = f_1(u_j, v_j, t) + g_1(t)v_(j-1) + h_1(t)v_(j+1),
dv_j/dt = f_2(u_j, v_j, t) + g_2(t)u_(j-1) + h_2(t)u_(j+1),
where u_j(t) and v_j(t) are complex-valued scalar functions of time t, f_i and g_i are given functions, and j = -N,..N. This is an initial value problem and the task is to find the solution at a certain time T.
If g_i(t) = h_i(t) = 0, then the equations for different values of j can be solved independently. In this case I obtain a stable and accurate solutions with the aid of the fourth-order Runge-Kutta method. However, once I turn on the couplings, the results become very unstable with respect to the time grid step and explicit form of the functions g_i, h_i.
I guess it is reasonable to try to employ an implicit Runge-Kutta scheme, which might be stable in such a case, but if I do so, I will have to evaluate the inverse of a huge matrix of size 4*N*c, where c depends on the order of the method (e. g. c = 3 for the Gauss–Legendre method) at each step. Of course, the matrix will mostly contain zeros and have a block tridiagonal form but it still seems to be very time-consuming.
So I have two questions:
Is there a stable explicit method which works even when the coupling functions g_i and h_i are (very) large?
If an implicit method is, indeed, a good solution, what is the fastest method of the inversion of a block tridiagonal matrix? At the moment I just perform a simple Gauss method avoiding redundant operations which arise due to the specific structure of the matrix.
Additional info and details that might help us:
I use Fortran 95.
I currently consider g_1(t) = h_1(t) = g_2(t) = h_2(t) = -iAF(t)sin(omega*t), where i is the imaginary unit, A and omega are given constants, and F(t) is a smooth envelope going slowly, first, from 0 to 1 and then from 1 to 0, so F(0) = F(T) = 0.
Initially u_j = v_j = 0 unless j = 0. The functions u_j and v_j with great absolute values of j are extremely small for all t, so the initial peak does not reach the "boundaries".

To 1) There will be no stable explicit method if your functions are very large. This is due to the fact that the area of stability of explicit (Runge-Kutta) methods is compact.
To 2) If your matrices are larger then 100x100 you could use this method:
Inverses of Block Tridiagonal Matrices and Rounding Errors.

Related

DM Script, why does the fourier transform of gaussian-kenel needs modulus

Recently I learn DM_Script for TEM image processing
I needed Gaussian blur process and I found one whose name is 'Gaussian Blur' in http://www.dmscripting.com/recent_updates.html
This code implements Gaussian blur algorithm by multiplying the fast fourier transform(FFT) of source image by the FFT of Gaussian-kernel image and finally doing inverse fourier transform of it.
Here is the part of the code,
// Carry out the convolution in Fourier space
compleximage fftkernelimg:=realFFT(kernelimg) (-> FFT of Gaussian-kernel image)
compleximage FFTSource:=realfft(warpimg) (-> FFT of source image)
compleximage FFTProduct:=FFTSource*fftkernelimg.modulus().sqrt()
realimage invFFT:=realIFFT(FFTProduct)
The point I want to ask is this
compleximage FFTProduct:=FFTSource*fftkernelimg.modulus().sqrt()
Why does the FFT of Gaussian-kernel need '.modulus().sqrt()' for the convolution?
It is related to the fact that the fourier transform of a Gaussian function becomes another Gaussian function?
Or It is related to a sort of limitation of discrete fourier transform?
Please answer me
Thanks
This is related to the general precision limitation of any floating point numeric computing. (see f.e. here, or more in depth here)
A rotational (real-valued) Gaussian of stand.dev. sigma should be transformed into a 100% real-values rotational Gaussioan of 1/sigma. However, doing this numerically will show you deviations: Just try the following:
number sigma = 30
number A0 = 1
realimage first := RealImage( "First", 8, 256, 256 )
first = A0 * exp( - (iradius**2/(2*sigma*sigma) ))
first.showimage()
complexImage second := FFT(first)
second.Showimage()
image nonZeroImaginaryMask = ( 0 != second.Imaginary() )
nonZeroImaginaryMask.Showimage()
nonZeroImaginaryMask.SetLimits(0,1)
When you then multiply these complex images (before back-transferring) you are introducing even more errors. By using modulus, one ensures that the forward transformed kernel is purely real and hence a better "damping" curve.
A better implementation of a FFT filtering code would actually create the FFT(Gaussian) directly with a std.dev of 1/sigma, as this is the analytically correct result. Doing a FFT of the kernel only makes sense if the kernel (or its FFT) is not analytically known.
In general: When implementing any "maths" into a program code, it can pay hugely to think it through with numerical computation limits in the back of your head. Reduce actual computation whenever possible (i.e. compute analytically and use the result instead of relying on brute force numerical computation) and try to "reshape" equations when possible, f.e. avoid large sums over many small numbers, be careful about checks against exact numeric values, try to avoid expressions which are very sensitive on small numerica errors etc.

Indirect Kalman Filter for Inertial Navigation System

I'm trying to implement an Inertial Navigation System using an Indirect Kalman Filter. I've found many publications and thesis on this topic, but not too much code as example. For my implementation I'm using the Master Thesis available at the following link:
https://fenix.tecnico.ulisboa.pt/downloadFile/395137332405/dissertacao.pdf
As reported at page 47, the measured values from inertial sensors equal the true values plus a series of other terms (bias, scale factors, ...).
For my question, let's consider only bias.
So:
Wmeas = Wtrue + BiasW (Gyro meas)
Ameas = Atrue + BiasA. (Accelerometer meas)
Therefore,
when I propagate the Mechanization equations (equations 3-29, 3-37 and 3-41)
I should use the "true" values, or better:
Wmeas - BiasW
Ameas - BiasA
where BiasW and BiasA are the last available estimation of the bias. Right?
Concerning the update phase of the EKF,
if the measurement equation is
dzV = VelGPS_est - VelGPS_meas
the H matrix should have an identity matrix in corrispondence of the velocity error state variables dx(VEL) and 0 elsewhere. Right?
Said that I'm not sure how I have to propagate the state variable after update phase.
The propagation of the state variable should be (in my opinion):
POSk|k = POSk|k-1 + dx(POS);
VELk|k = VELk|k-1 + dx(VEL);
...
But this didn't work. Therefore I've tried:
POSk|k = POSk|k-1 - dx(POS);
VELk|k = VELk|k-1 - dx(VEL);
that didn't work too... I tried both solutions, even if in my opinion the "+" should be used. But since both don't work (I have some other error elsewhere)
I would ask you if you have any suggestions.
You can see a snippet of code at the following link: http://pastebin.com/aGhKh2ck.
Thanks.
The difficulty you're running into is the difference between the theory and the practice. Taking your code from the snippet instead of the symbolic version in the question:
% Apply corrections
Pned = Pned + dx(1:3);
Vned = Vned + dx(4:6);
In theory when you use the Indirect form you are freely integrating the IMU (that process called the Mechanization in that paper) and occasionally running the IKF to update its correction. In theory the unchecked double integration of the accelerometer produces large (or for cheap MEMS IMUs, enormous) error values in Pned and Vned. That, in turn, causes the IKF to produce correspondingly large values of dx(1:6) as time evolves and the unchecked IMU integration runs farther and farther away from the truth. In theory you then sample your position at any time as Pned +/- dx(1:3) (the sign isn't important -- you can set that up either way). The important part here is that you are not modifying Pned from the IKF because both are running independent from each other and you add them together when you need the answer.
In practice you do not want to take the difference between two enourmous double values because you will lose precision (because many of the bits of the significand were needed to represent the enormous part instead of the precision you want). You have grasped that in practice you want to recursively update Pned on each update. However, when you diverge from the theory this way, you have to take the corresponding (and somewhat unobvious) step of zeroing out your correction value from the IKF state vector. In other words, after you do Pned = Pned + dx(1:3) you have "used" the correction, and you need to balance the equation with dx(1:3) = dx(1:3) - dx(1:3) (simplified: dx(1:3) = 0) so that you don't inadvertently integrate the correction over time.
Why does this work? Why doesn't it mess up the rest of the filter? As it turns out, the KF process covariance P does not actually depend on the state x. It depends on the update function and the process noise Q and so on. So the filter doesn't care what the data is. (Now that's a simplification, because often Q and R include rotation terms, and R might vary based on other state variables, etc, but in those cases you are actually using state from outside the filter (the cumulative position and orientation) not the raw correction values, which have no meaning by themselves).

Is there a way to avoid creating an array in this Julia expression?

Is there a way to avoid creating an array in this Julia expression:
max((filter(n -> string(n) == reverse(string(n)), [x*y for x = 1:N, y = 1:N])))
and make it behave similar to this Python generator expression:
max(x*y for x in range(N+1) for y in range(x, N+1) if str(x*y) == str(x*y)[::-1])
Julia version is 2.3 times slower then Python due to array allocation and N*N iterations vs. Python's N*N/2.
EDIT
After playing a bit with a few implementations in Julia, the fastest loop style version I've got is:
function f(N) # 320ms for N=1000 Julia 0.2.0 i686-w64-mingw32
nMax = NaN
for x = 1:N, y = x:N
n = x*y
s = string(n)
s == reverse(s) || continue
nMax < n && (nMax = n)
end
nMax
end
but an improved functional version isn't far behind (only 14% slower or significantly faster, if you consider 2x larger domain):
function e(N) # 366ms for N=1000 Julia 0.2.0 i686-w64-mingw32
isPalindrome(n) = string(n) == reverse(string(n))
max(filter(isPalindrome, [x*y for x = 1:N, y = 1:N]))
end
There is 2.6x unexpected performance improvement by defining isPalindrome function, compared to original version on the top of this page.
We have talked about allowing the syntax
max(f(x) for x in itr)
as a shorthand for producing each of the values f(x) in one coroutine while computing the max in another coroutine. This would basically be shorthand for something like this:
max(#task for x in itr; produce(f(x)); end)
Note, however, that this syntax that explicitly creates a task already works, although it is somewhat less pretty than the above. Your problem can be expressed like this:
max(#task for x=1:N, y=x:N
string(x*y) == reverse(string(x*y)) && produce(x*y)
end)
With the hypothetical producer syntax above, it could be reduced to something like this:
max(x*y if string(x*y) == reverse(string(x*y) for x=1:N, y=x:N)
While I'm a fan of functional style, in this case I would probably just use a for loop:
m = 0
for x = 1:N, y = x:N
n = x*y
string(n) == reverse(string(n)) || continue
m < n && (m = n)
end
Personally, I don't find this version much harder to read and it will certainly be quite fast in Julia. In general, while functional style can be convenient and pretty, if your primary focus is on performance, then explicit for loops are your friend. Nevertheless, we should make sure that John's max/filter/product version works. The for loop version also makes other optimizations easier to add, like Harlan's suggestion of reversing the loop ordering and exiting on the first palindrome you find. There are also faster ways to check if a number is a palindrome in a given base than actually creating and comparing strings.
As to the general question of "getting flexible generators and list comprehensions in Julia", the language already has
A general high-performance iteration protocol based on the start/done/next functions.
Far more powerful multidimensional array comprehensions than most languages. At this point, the only missing feature is the if guard, which is complicated by the interaction with multidimensional comprehensions and the need to potentially dynamically grow the resulting array.
Coroutines (aka tasks) which allow, among other patterns, the producer-consumer pattern.
Python has the if guard but doesn't worry about comprehension performance nearly as much – if we're going to add that feature to Julia's comprehensions, we're going to do it in a way that's both fast and interacts well with multidimensional arrays, hence the delay.
Update: The max function is now called maximum (maximum is to max as sum is to +) and the generator syntax and/or filters work on master, so for example, you can do this:
julia> #time maximum(100x - x^2 for x = 1:100 if x % 3 == 0)
0.059185 seconds (31.16 k allocations: 1.307 MB)
2499
Once 0.5 is out, I'll update this answer more thoroughly.
There are two questions being mixed together here: (1) can you filter a list comprehension mid-comprehension (for which the answer is currently no) and (2) can you use a generator that doesn't allocate an array (for which the answer is partially yes). Generators are provided by the Iterators package, but the Iterators package seems to not play well with filter at the moment. In principle, the code below should work:
max((x, y) -> x * y,
filter((x, y) -> string(x * y) == reverse(string(x * y)),
product(1:N, 1:N)))
I don't think so. There aren't currently filters in Julia array comprehensions. See discussion in this issue.
In this particular case, I'd suggest just nested for loops if you want to get faster computation.
(There might be faster approaches where you start with N and count backwards, stopping as soon as you find something that succeeds. Figuring out how to do that correctly is left as an exercise, etc...)
As mentioned, this is now possible (using Julia 0.5.0)
isPalindrome(n::String) = n == reverse(n)
fun(N::Int) = maximum(x*y for x in 1:N for y in x:N if isPalindrome(string(x*y)))
I'm sure there are better ways that others can comment on. Time (after warm-up):
julia> #time fun(1000);
0.082785 seconds (2.03 M allocations: 108.109 MB, 27.35% gc time)

What's So Good About Recursion? [duplicate]

Is there a performance hit if we use a loop instead of recursion or vice versa in algorithms where both can serve the same purpose? Eg: Check if the given string is a palindrome.
I have seen many programmers using recursion as a means to show off when a simple iteration algorithm can fit the bill.
Does the compiler play a vital role in deciding what to use?
Loops may achieve a performance gain for your program. Recursion may achieve a performance gain for your programmer. Choose which is more important in your situation!
It is possible that recursion will be more expensive, depending on if the recursive function is tail recursive (the last line is recursive call). Tail recursion should be recognized by the compiler and optimized to its iterative counterpart (while maintaining the concise, clear implementation you have in your code).
I would write the algorithm in the way that makes the most sense and is the clearest for the poor sucker (be it yourself or someone else) that has to maintain the code in a few months or years. If you run into performance issues, then profile your code, and then and only then look into optimizing by moving over to an iterative implementation. You may want to look into memoization and dynamic programming.
Comparing recursion to iteration is like comparing a phillips head screwdriver to a flat head screwdriver. For the most part you could remove any phillips head screw with a flat head, but it would just be easier if you used the screwdriver designed for that screw right?
Some algorithms just lend themselves to recursion because of the way they are designed (Fibonacci sequences, traversing a tree like structure, etc.). Recursion makes the algorithm more succinct and easier to understand (therefore shareable and reusable).
Also, some recursive algorithms use "Lazy Evaluation" which makes them more efficient than their iterative brothers. This means that they only do the expensive calculations at the time they are needed rather than each time the loop runs.
That should be enough to get you started. I'll dig up some articles and examples for you too.
Link 1: Haskel vs PHP (Recursion vs Iteration)
Here is an example where the programmer had to process a large data set using PHP. He shows how easy it would have been to deal with in Haskel using recursion, but since PHP had no easy way to accomplish the same method, he was forced to use iteration to get the result.
http://blog.webspecies.co.uk/2011-05-31/lazy-evaluation-with-php.html
Link 2: Mastering Recursion
Most of recursion's bad reputation comes from the high costs and inefficiency in imperative languages. The author of this article talks about how to optimize recursive algorithms to make them faster and more efficient. He also goes over how to convert a traditional loop into a recursive function and the benefits of using tail-end recursion. His closing words really summed up some of my key points I think:
"recursive programming gives the programmer a better way of organizing
code in a way that is both maintainable and logically consistent."
https://developer.ibm.com/articles/l-recurs/
Link 3: Is recursion ever faster than looping? (Answer)
Here is a link to an answer for a stackoverflow question that is similar to yours. The author points out that a lot of the benchmarks associated with either recursing or looping are very language specific. Imperative languages are typically faster using a loop and slower with recursion and vice-versa for functional languages. I guess the main point to take from this link is that it is very difficult to answer the question in a language agnostic / situation blind sense.
Is recursion ever faster than looping?
Recursion is more costly in memory, as each recursive call generally requires a memory address to be pushed to the stack - so that later the program could return to that point.
Still, there are many cases in which recursion is a lot more natural and readable than loops - like when working with trees. In these cases I would recommend sticking to recursion.
Typically, one would expect the performance penalty to lie in the other direction. Recursive calls can lead to the construction of extra stack frames; the penalty for this varies. Also, in some languages like Python (more correctly, in some implementations of some languages...), you can run into stack limits rather easily for tasks you might specify recursively, such as finding the maximum value in a tree data structure. In these cases, you really want to stick with loops.
Writing good recursive functions can reduce the performance penalty somewhat, assuming you have a compiler that optimizes tail recursions, etc. (Also double check to make sure that the function really is tail recursive---it's one of those things that many people make mistakes on.)
Apart from "edge" cases (high performance computing, very large recursion depth, etc.), it's preferable to adopt the approach that most clearly expresses your intent, is well-designed, and is maintainable. Optimize only after identifying a need.
Recursion is better than iteration for problems that can be broken down into multiple, smaller pieces.
For example, to make a recursive Fibonnaci algorithm, you break down fib(n) into fib(n-1) and fib(n-2) and compute both parts. Iteration only allows you to repeat a single function over and over again.
However, Fibonacci is actually a broken example and I think iteration is actually more efficient. Notice that fib(n) = fib(n-1) + fib(n-2) and fib(n-1) = fib(n-2) + fib(n-3). fib(n-1) gets calculated twice!
A better example is a recursive algorithm for a tree. The problem of analyzing the parent node can be broken down into multiple smaller problems of analyzing each child node. Unlike the Fibonacci example, the smaller problems are independent of each other.
So yeah - recursion is better than iteration for problems that can be broken down into multiple, smaller, independent, similar problems.
Your performance deteriorates when using recursion because calling a method, in any language, implies a lot of preparation: the calling code posts a return address, call parameters, some other context information such as processor registers might be saved somewhere, and at return time the called method posts a return value which is then retrieved by the caller, and any context information that was previously saved will be restored. the performance diff between an iterative and a recursive approach lies in the time these operations take.
From an implementation point of view, you really start noticing the difference when the time it takes to handle the calling context is comparable to the time it takes for your method to execute. If your recursive method takes longer to execute then the calling context management part, go the recursive way as the code is generally more readable and easy to understand and you won't notice the performance loss. Otherwise go iterative for efficiency reasons.
I believe tail recursion in java is not currently optimized. The details are sprinkled throughout this discussion on LtU and the associated links. It may be a feature in the upcoming version 7, but apparently it presents certain difficulties when combined with Stack Inspection since certain frames would be missing. Stack Inspection has been used to implement their fine-grained security model since Java 2.
http://lambda-the-ultimate.org/node/1333
There are many cases where it gives a much more elegant solution over the iterative method, the common example being traversal of a binary tree, so it isn't necessarily more difficult to maintain. In general, iterative versions are usually a bit faster (and during optimization may well replace a recursive version), but recursive versions are simpler to comprehend and implement correctly.
Recursion is very useful is some situations. For example consider the code for finding the factorial
int factorial ( int input )
{
int x, fact = 1;
for ( x = input; x > 1; x--)
fact *= x;
return fact;
}
Now consider it by using the recursive function
int factorial ( int input )
{
if (input == 0)
{
return 1;
}
return input * factorial(input - 1);
}
By observing these two, we can see that recursion is easy to understand.
But if it is not used with care it can be so much error prone too.
Suppose if we miss if (input == 0), then the code will be executed for some time and ends with usually a stack overflow.
In many cases recursion is faster because of caching, which improves performance. For example, here is an iterative version of merge sort using the traditional merge routine. It will run slower than the recursive implementation because of caching improved performances.
Iterative implementation
public static void sort(Comparable[] a)
{
int N = a.length;
aux = new Comparable[N];
for (int sz = 1; sz < N; sz = sz+sz)
for (int lo = 0; lo < N-sz; lo += sz+sz)
merge(a, lo, lo+sz-1, Math.min(lo+sz+sz-1, N-1));
}
Recursive implementation
private static void sort(Comparable[] a, Comparable[] aux, int lo, int hi)
{
if (hi <= lo) return;
int mid = lo + (hi - lo) / 2;
sort(a, aux, lo, mid);
sort(a, aux, mid+1, hi);
merge(a, aux, lo, mid, hi);
}
PS - this is what was told by Professor Kevin Wayne (Princeton University) on the course on algorithms presented on Coursera.
Using recursion, you're incurring the cost of a function call with each "iteration", whereas with a loop, the only thing you usually pay is an increment/decrement. So, if the code for the loop isn't much more complicated than the code for the recursive solution, loop will usually be superior to recursion.
Recursion and iteration depends on the business logic that you want to implement, though in most of the cases it can be used interchangeably. Most developers go for recursion because it is easier to understand.
It depends on the language. In Java you should use loops. Functional languages optimize recursion.
Recursion has a disadvantage that the algorithm that you write using recursion has O(n) space complexity.
While iterative aproach have a space complexity of O(1).This is the advantange of using iteration over recursion.
Then why do we use recursion?
See below.
Sometimes it is easier to write an algorithm using recursion while it's slightly tougher to write the same algorithm using iteration.In this case if you opt to follow the iteration approach you would have to handle stack yourself.
If you're just iterating over a list, then sure, iterate away.
A couple of other answers have mentioned (depth-first) tree traversal. It really is such a great example, because it's a very common thing to do to a very common data structure. Recursion is extremely intuitive for this problem.
Check out the "find" methods here:
http://penguin.ewu.edu/cscd300/Topic/BSTintro/index.html
Recursion is more simple (and thus - more fundamental) than any possible definition of an iteration. You can define a Turing-complete system with only a pair of combinators (yes, even a recursion itself is a derivative notion in such a system). Lambda calculus is an equally powerful fundamental system, featuring recursive functions. But if you want to define an iteration properly, you'd need much more primitives to start with.
As for the code - no, recursive code is in fact much easier to understand and to maintain than a purely iterative one, since most data structures are recursive. Of course, in order to get it right one would need a language with a support for high order functions and closures, at least - to get all the standard combinators and iterators in a neat way. In C++, of course, complicated recursive solutions can look a bit ugly, unless you're a hardcore user of FC++ and alike.
I would think in (non tail) recursion there would be a performance hit for allocating a new stack etc every time the function is called (dependent on language of course).
it depends on "recursion depth".
it depends on how much the function call overhead will influence the total execution time.
For example, calculating the classical factorial in a recursive way is very inefficient due to:
- risk of data overflowing
- risk of stack overflowing
- function call overhead occupy 80% of execution time
while developing a min-max algorithm for position analysis in the game of chess that will analyze subsequent N moves can be implemented in recursion over the "analysis depth" (as I'm doing ^_^)
Recursion? Where do I start, wiki will tell you “it’s the process of repeating items in a self-similar way"
Back in day when I was doing C, C++ recursion was a god send, stuff like "Tail recursion". You'll also find many sorting algorithms use recursion. Quick sort example: http://alienryderflex.com/quicksort/
Recursion is like any other algorithm useful for a specific problem. Perhaps you mightn't find a use straight away or often but there will be problem you’ll be glad it’s available.
In C++ if the recursive function is a templated one, then the compiler has more chance to optimize it, as all the type deduction and function instantiations will occur in compile time. Modern compilers can also inline the function if possible. So if one uses optimization flags like -O3 or -O2 in g++, then recursions may have the chance to be faster than iterations. In iterative codes, the compiler gets less chance to optimize it, as it is already in the more or less optimal state (if written well enough).
In my case, I was trying to implement matrix exponentiation by squaring using Armadillo matrix objects, in both recursive and iterative way. The algorithm can be found here... https://en.wikipedia.org/wiki/Exponentiation_by_squaring.
My functions were templated and I have calculated 1,000,000 12x12 matrices raised to the power 10. I got the following result:
iterative + optimisation flag -O3 -> 2.79.. sec
recursive + optimisation flag -O3 -> 1.32.. sec
iterative + No-optimisation flag -> 2.83.. sec
recursive + No-optimisation flag -> 4.15.. sec
These results have been obtained using gcc-4.8 with c++11 flag (-std=c++11) and Armadillo 6.1 with Intel mkl. Intel compiler also shows similar results.
Mike is correct. Tail recursion is not optimized out by the Java compiler or the JVM. You will always get a stack overflow with something like this:
int count(int i) {
return i >= 100000000 ? i : count(i+1);
}
You have to keep in mind that utilizing too deep recursion you will run into Stack Overflow, depending on allowed stack size. To prevent this make sure to provide some base case which ends you recursion.
Using just Chrome 45.0.2454.85 m, recursion seems to be a nice amount faster.
Here is the code:
(function recursionVsForLoop(global) {
"use strict";
// Perf test
function perfTest() {}
perfTest.prototype.do = function(ns, fn) {
console.time(ns);
fn();
console.timeEnd(ns);
};
// Recursion method
(function recur() {
var count = 0;
global.recurFn = function recurFn(fn, cycles) {
fn();
count = count + 1;
if (count !== cycles) recurFn(fn, cycles);
};
})();
// Looped method
function loopFn(fn, cycles) {
for (var i = 0; i < cycles; i++) {
fn();
}
}
// Tests
var curTest = new perfTest(),
testsToRun = 100;
curTest.do('recursion', function() {
recurFn(function() {
console.log('a recur run.');
}, testsToRun);
});
curTest.do('loop', function() {
loopFn(function() {
console.log('a loop run.');
}, testsToRun);
});
})(window);
RESULTS
// 100 runs using standard for loop
100x for loop run.
Time to complete: 7.683ms
// 100 runs using functional recursive approach w/ tail recursion
100x recursion run.
Time to complete: 4.841ms
In the screenshot below, recursion wins again by a bigger margin when run at 300 cycles per test
If the iterations are atomic and orders of magnitude more expensive than pushing a new stack frame and creating a new thread and you have multiple cores and your runtime environment can use all of them, then a recursive approach could yield a huge performance boost when combined with multithreading. If the average number of iterations is not predictable then it might be a good idea to use a thread pool which will control thread allocation and prevent your process from creating too many threads and hogging the system.
For example, in some languages, there are recursive multithreaded merge sort implementations.
But again, multithreading can be used with looping rather than recursion, so how well this combination will work depends on more factors including the OS and its thread allocation mechanism.
I found another differences between those approaches.
It looks simple and unimportant, but it has a very important role while you prepare for interviews and this subject arises, so look closely.
In short:
1) iterative post-order traversal is not easy - that makes DFT more complex
2) cycles check easier with recursion
Details:
In the recursive case, it is easy to create pre and post traversals:
Imagine a pretty standard question: "print all tasks that should be executed to execute the task 5, when tasks depend on other tasks"
Example:
//key-task, value-list of tasks the key task depends on
//"adjacency map":
Map<Integer, List<Integer>> tasksMap = new HashMap<>();
tasksMap.put(0, new ArrayList<>());
tasksMap.put(1, new ArrayList<>());
List<Integer> t2 = new ArrayList<>();
t2.add(0);
t2.add(1);
tasksMap.put(2, t2);
List<Integer> t3 = new ArrayList<>();
t3.add(2);
t3.add(10);
tasksMap.put(3, t3);
List<Integer> t4 = new ArrayList<>();
t4.add(3);
tasksMap.put(4, t4);
List<Integer> t5 = new ArrayList<>();
t5.add(3);
tasksMap.put(5, t5);
tasksMap.put(6, new ArrayList<>());
tasksMap.put(7, new ArrayList<>());
List<Integer> t8 = new ArrayList<>();
t8.add(5);
tasksMap.put(8, t8);
List<Integer> t9 = new ArrayList<>();
t9.add(4);
tasksMap.put(9, t9);
tasksMap.put(10, new ArrayList<>());
//task to analyze:
int task = 5;
List<Integer> res11 = getTasksInOrderDftReqPostOrder(tasksMap, task);
System.out.println(res11);**//note, no reverse required**
List<Integer> res12 = getTasksInOrderDftReqPreOrder(tasksMap, task);
Collections.reverse(res12);//note reverse!
System.out.println(res12);
private static List<Integer> getTasksInOrderDftReqPreOrder(Map<Integer, List<Integer>> tasksMap, int task) {
List<Integer> result = new ArrayList<>();
Set<Integer> visited = new HashSet<>();
reqPreOrder(tasksMap,task,result, visited);
return result;
}
private static void reqPreOrder(Map<Integer, List<Integer>> tasksMap, int task, List<Integer> result, Set<Integer> visited) {
if(!visited.contains(task)) {
visited.add(task);
result.add(task);//pre order!
List<Integer> children = tasksMap.get(task);
if (children != null && children.size() > 0) {
for (Integer child : children) {
reqPreOrder(tasksMap,child,result, visited);
}
}
}
}
private static List<Integer> getTasksInOrderDftReqPostOrder(Map<Integer, List<Integer>> tasksMap, int task) {
List<Integer> result = new ArrayList<>();
Set<Integer> visited = new HashSet<>();
reqPostOrder(tasksMap,task,result, visited);
return result;
}
private static void reqPostOrder(Map<Integer, List<Integer>> tasksMap, int task, List<Integer> result, Set<Integer> visited) {
if(!visited.contains(task)) {
visited.add(task);
List<Integer> children = tasksMap.get(task);
if (children != null && children.size() > 0) {
for (Integer child : children) {
reqPostOrder(tasksMap,child,result, visited);
}
}
result.add(task);//post order!
}
}
Note that the recursive post-order-traversal does not require a subsequent reversal of the result. Children printed first and your task in the question printed last. Everything is fine. You can do a recursive pre-order-traversal (also shown above) and that one will require a reversal of the result list.
Not that simple with iterative approach! In iterative (one stack) approach you can only do a pre-ordering-traversal, so you obliged to reverse the result array at the end:
List<Integer> res1 = getTasksInOrderDftStack(tasksMap, task);
Collections.reverse(res1);//note reverse!
System.out.println(res1);
private static List<Integer> getTasksInOrderDftStack(Map<Integer, List<Integer>> tasksMap, int task) {
List<Integer> result = new ArrayList<>();
Set<Integer> visited = new HashSet<>();
Stack<Integer> st = new Stack<>();
st.add(task);
visited.add(task);
while(!st.isEmpty()){
Integer node = st.pop();
List<Integer> children = tasksMap.get(node);
result.add(node);
if(children!=null && children.size() > 0){
for(Integer child:children){
if(!visited.contains(child)){
st.add(child);
visited.add(child);
}
}
}
//If you put it here - it does not matter - it is anyway a pre-order
//result.add(node);
}
return result;
}
Looks simple, no?
But it is a trap in some interviews.
It means the following: with the recursive approach, you can implement Depth First Traversal and then select what order you need pre or post(simply by changing the location of the "print", in our case of the "adding to the result list"). With the iterative (one stack) approach you can easily do only pre-order traversal and so in the situation when children need be printed first(pretty much all situations when you need start print from the bottom nodes, going upwards) - you are in the trouble. If you have that trouble you can reverse later, but it will be an addition to your algorithm. And if an interviewer is looking at his watch it may be a problem for you. There are complex ways to do an iterative post-order traversal, they exist, but they are not simple. Example:https://www.geeksforgeeks.org/iterative-postorder-traversal-using-stack/
Thus, the bottom line: I would use recursion during interviews, it is simpler to manage and to explain. You have an easy way to go from pre to post-order traversal in any urgent case. With iterative you are not that flexible.
I would use recursion and then tell: "Ok, but iterative can provide me more direct control on used memory, I can easily measure the stack size and disallow some dangerous overflow.."
Another plus of recursion - it is simpler to avoid / notice cycles in a graph.
Example (preudocode):
dft(n){
mark(n)
for(child: n.children){
if(marked(child))
explode - cycle found!!!
dft(child)
}
unmark(n)
}
It may be fun to write it as recursion, or as a practice.
However, if the code is to be used in production, you need to consider the possibility of stack overflow.
Tail recursion optimization can eliminate stack overflow, but do you want to go through the trouble of making it so, and you need to know you can count on it having the optimization in your environment.
Every time the algorithm recurses, how much is the data size or n reduced by?
If you are reducing the size of data or n by half every time you recurse, then in general you don't need to worry about stack overflow. Say, if it needs to be 4,000 level deep or 10,000 level deep for the program to stack overflow, then your data size need to be roughly 24000 for your program to stack overflow. To put that into perspective, a biggest storage device recently can hold 261 bytes, and if you have 261 of such devices, you are only dealing with 2122 data size. If you are looking at all the atoms in the universe, it is estimated that it may be less than 284. If you need to deal with all the data in the universe and their states for every millisecond since the birth of the universe estimated to be 14 billion years ago, it may only be 2153. So if your program can handle 24000 units of data or n, you can handle all data in the universe and the program will not stack overflow. If you don't need to deal with numbers that are as big as 24000 (a 4000-bit integer), then in general you don't need to worry about stack overflow.
However, if you reduce the size of data or n by a constant amount every time you recurse, then you can run into stack overflow when n becomes merely 20000. That is, the program runs well when n is 1000, and you think the program is good, and then the program stack overflows when some time in the future, when n is 5000 or 20000.
So if you have a possibility of stack overflow, try to make it an iterative solution.
As far as I know, Perl does not optimize tail-recursive calls, but you can fake it.
sub f{
my($l,$r) = #_;
if( $l >= $r ){
return $l;
} else {
# return f( $l+1, $r );
#_ = ( $l+1, $r );
goto &f;
}
}
When first called it will allocate space on the stack. Then it will change its arguments, and restart the subroutine, without adding anything more to the stack. It will therefore pretend that it never called its self, changing it into an iterative process.
Note that there is no "my #_;" or "local #_;", if you did it would no longer work.
"Is there a performance hit if we use a loop instead of
recursion or vice versa in algorithms where both can serve the same purpose?"
Usually yes if you are writing in a imperative language iteration will run faster than recursion, the performance hit is minimized in problems where the iterative solution requires manipulating Stacks and popping items off of a stack due to the recursive nature of the problem. There are a lot of times where the recursive implementation is much easier to read because the code is much shorter,
so you do want to consider maintainability. Especailly in cases where the problem has a recursive nature. So take for example:
The recursive implementation of Tower of Hanoi:
def TowerOfHanoi(n , source, destination, auxiliary):
if n==1:
print ("Move disk 1 from source",source,"to destination",destination)
return
TowerOfHanoi(n-1, source, auxiliary, destination)
print ("Move disk",n,"from source",source,"to destination",destination)
TowerOfHanoi(n-1, auxiliary, destination, source)
Fairly short and pretty easy to read. Compare this with its Counterpart iterative TowerOfHanoi:
# Python3 program for iterative Tower of Hanoi
import sys
# A structure to represent a stack
class Stack:
# Constructor to set the data of
# the newly created tree node
def __init__(self, capacity):
self.capacity = capacity
self.top = -1
self.array = [0]*capacity
# function to create a stack of given capacity.
def createStack(capacity):
stack = Stack(capacity)
return stack
# Stack is full when top is equal to the last index
def isFull(stack):
return (stack.top == (stack.capacity - 1))
# Stack is empty when top is equal to -1
def isEmpty(stack):
return (stack.top == -1)
# Function to add an item to stack.
# It increases top by 1
def push(stack, item):
if(isFull(stack)):
return
stack.top+=1
stack.array[stack.top] = item
# Function to remove an item from stack.
# It decreases top by 1
def Pop(stack):
if(isEmpty(stack)):
return -sys.maxsize
Top = stack.top
stack.top-=1
return stack.array[Top]
# Function to implement legal
# movement between two poles
def moveDisksBetweenTwoPoles(src, dest, s, d):
pole1TopDisk = Pop(src)
pole2TopDisk = Pop(dest)
# When pole 1 is empty
if (pole1TopDisk == -sys.maxsize):
push(src, pole2TopDisk)
moveDisk(d, s, pole2TopDisk)
# When pole2 pole is empty
else if (pole2TopDisk == -sys.maxsize):
push(dest, pole1TopDisk)
moveDisk(s, d, pole1TopDisk)
# When top disk of pole1 > top disk of pole2
else if (pole1TopDisk > pole2TopDisk):
push(src, pole1TopDisk)
push(src, pole2TopDisk)
moveDisk(d, s, pole2TopDisk)
# When top disk of pole1 < top disk of pole2
else:
push(dest, pole2TopDisk)
push(dest, pole1TopDisk)
moveDisk(s, d, pole1TopDisk)
# Function to show the movement of disks
def moveDisk(fromPeg, toPeg, disk):
print("Move the disk", disk, "from '", fromPeg, "' to '", toPeg, "'")
# Function to implement TOH puzzle
def tohIterative(num_of_disks, src, aux, dest):
s, d, a = 'S', 'D', 'A'
# If number of disks is even, then interchange
# destination pole and auxiliary pole
if (num_of_disks % 2 == 0):
temp = d
d = a
a = temp
total_num_of_moves = int(pow(2, num_of_disks) - 1)
# Larger disks will be pushed first
for i in range(num_of_disks, 0, -1):
push(src, i)
for i in range(1, total_num_of_moves + 1):
if (i % 3 == 1):
moveDisksBetweenTwoPoles(src, dest, s, d)
else if (i % 3 == 2):
moveDisksBetweenTwoPoles(src, aux, s, a)
else if (i % 3 == 0):
moveDisksBetweenTwoPoles(aux, dest, a, d)
# Input: number of disks
num_of_disks = 3
# Create three stacks of size 'num_of_disks'
# to hold the disks
src = createStack(num_of_disks)
dest = createStack(num_of_disks)
aux = createStack(num_of_disks)
tohIterative(num_of_disks, src, aux, dest)
Now the first one is way easier to read because suprise suprise shorter code is usually easier to understand than code that is 10 times longer. Sometimes you want to ask yourself is the extra performance gain really worth it? The amount of hours wasted debugging the code. Is the iterative TowerOfHanoi faster than the Recursive TowerOfHanoi? Probably, but not by a big margin. Would I like to program Recursive problems like TowerOfHanoi using iteration? Hell no. Next we have another recursive function the Ackermann function:
Using recursion:
if m == 0:
# BASE CASE
return n + 1
elif m > 0 and n == 0:
# RECURSIVE CASE
return ackermann(m - 1, 1)
elif m > 0 and n > 0:
# RECURSIVE CASE
return ackermann(m - 1, ackermann(m, n - 1))
Using Iteration:
callStack = [{'m': 2, 'n': 3, 'indentation': 0, 'instrPtr': 'start'}]
returnValue = None
while len(callStack) != 0:
m = callStack[-1]['m']
n = callStack[-1]['n']
indentation = callStack[-1]['indentation']
instrPtr = callStack[-1]['instrPtr']
if instrPtr == 'start':
print('%sackermann(%s, %s)' % (' ' * indentation, m, n))
if m == 0:
# BASE CASE
returnValue = n + 1
callStack.pop()
continue
elif m > 0 and n == 0:
# RECURSIVE CASE
callStack[-1]['instrPtr'] = 'after first recursive case'
callStack.append({'m': m - 1, 'n': 1, 'indentation': indentation + 1, 'instrPtr': 'start'})
continue
elif m > 0 and n > 0:
# RECURSIVE CASE
callStack[-1]['instrPtr'] = 'after second recursive case, inner call'
callStack.append({'m': m, 'n': n - 1, 'indentation': indentation + 1, 'instrPtr': 'start'})
continue
elif instrPtr == 'after first recursive case':
returnValue = returnValue
callStack.pop()
continue
elif instrPtr == 'after second recursive case, inner call':
callStack[-1]['innerCallResult'] = returnValue
callStack[-1]['instrPtr'] = 'after second recursive case, outer call'
callStack.append({'m': m - 1, 'n': returnValue, 'indentation': indentation + 1, 'instrPtr': 'start'})
continue
elif instrPtr == 'after second recursive case, outer call':
returnValue = returnValue
callStack.pop()
continue
print(returnValue)
And once again I will argue that the recursive implementation is much easier to understand. So my conclusion is use recursion if the problem by nature is recursive and requires manipulating items in a stack.
I'm going to answer your question by designing a Haskell data structure by "induction", which is a sort of "dual" to recursion. And then I will show how this duality leads to nice things.
We introduce a type for a simple tree:
data Tree a = Branch (Tree a) (Tree a)
| Leaf a
deriving (Eq)
We can read this definition as saying "A tree is a Branch (which contains two trees) or is a leaf (which contains a data value)". So the leaf is a sort of minimal case. If a tree isn't a leaf, then it must be a compound tree containing two trees. These are the only cases.
Let's make a tree:
example :: Tree Int
example = Branch (Leaf 1)
(Branch (Leaf 2)
(Leaf 3))
Now, let's suppose we want to add 1 to each value in the tree. We can do this by calling:
addOne :: Tree Int -> Tree Int
addOne (Branch a b) = Branch (addOne a) (addOne b)
addOne (Leaf a) = Leaf (a + 1)
First, notice that this is in fact a recursive definition. It takes the data constructors Branch and Leaf as cases (and since Leaf is minimal and these are the only possible cases), we are sure that the function will terminate.
What would it take to write addOne in an iterative style? What will looping into an arbitrary number of branches look like?
Also, this kind of recursion can often be factored out, in terms of a "functor". We can make Trees into Functors by defining:
instance Functor Tree where fmap f (Leaf a) = Leaf (f a)
fmap f (Branch a b) = Branch (fmap f a) (fmap f b)
and defining:
addOne' = fmap (+1)
We can factor out other recursion schemes, such as the catamorphism (or fold) for an algebraic data type. Using a catamorphism, we can write:
addOne'' = cata go where
go (Leaf a) = Leaf (a + 1)
go (Branch a b) = Branch a b

Repeated application of functions

Reading this question got me thinking: For a given function f, how can we know that a loop of this form:
while (x > 2)
x = f(x)
will stop for any value x? Is there some simple criterion?
(The fact that f(x) < x for x > 2 doesn't seem to help since the series may converge).
Specifically, can we prove this for sqrt and for log?
For these functions, a proof that ceil(f(x))<x for x > 2 would suffice. You could do one iteration -- to arrive at an integer number, and then proceed by simple induction.
For the general case, probably the best idea is to use well-founded induction to prove this property. However, as Moron pointed out in the comments, this could be impossible in the general case and the right ordering is, in many cases, quite hard to find.
Edit, in reply to Amnon's comment:
If you wanted to use well-founded induction, you would have to define another strict order, that would be well-founded. In case of the functions you mentioned this is not hard: you can take x << y if and only if ceil(x) < ceil(y), where << is a symbol for this new order. This order is of course well-founded on numbers greater then 2, and both sqrt and log are decreasing with respect to it -- so you can apply well-founded induction.
Of course, in general case such an order is much more difficult to find. This is also related, in some way, to total correctness assertions in Hoare logic, where you need to guarantee similar obligations on each loop construct.
There's a general theorem for when then sequence of iterations will converge. (A convergent sequence may not stop in a finite number of steps, but it is getting closer to a target. You can get as close to the target as you like by going far enough out in the sequence.)
The sequence x, f(x), f(f(x)), ... will converge if f is a contraction mapping. That is, there exists a positive constant k < 1 such that for all x and y, |f(x) - f(y)| <= k |x-y|.
(The fact that f(x) < x for x > 2 doesn't seem to help since the series may converge).
If we're talking about floats here, that's not true. If for all x > n f(x) is strictly less than x, it will reach n at some point (because there's only a limited number of floating point values between any two numbers).
Of course this means you need to prove that f(x) is actually less than x using floating point arithmetic (i.e. proving it is less than x mathematically does not suffice, because then f(x) = x may still be true with floats when the difference is not enough).
There is no general algorithm to determine whether a function f and a variable x will end or not in that loop. The Halting problem is reducible to that problem.
For sqrt and log, we could safely do that because we happen to know the mathematical properties of those functions. Say, sqrt approaches 1, log eventually goes negative. So the condition x < 2 has to be false at some point.
Hope that helps.
In the general case, all that can be said is that the loop will terminate when it encounters xi≤2. That doesn't mean that the sequence will converge, nor does it even mean that it is bounded below 2. It only means that the sequence contains a value that is not greater than 2.
That said, any sequence containing a subsequence that converges to a value strictly less than two will (eventually) halt. That is the case for the sequence xi+1 = sqrt(xi), since x converges to 1. In the case of yi+1 = log(yi), it will contain a value less than 2 before becoming undefined for elements of R (though it is well defined on the extended complex plane, C*, but I don't think it will, in general converge except at any stable points that may exist (i.e. where z = log(z)). Ultimately what this means is that you need to perform some upfront analysis on the sequence to better understand its behavior.
The standard test for convergence of a sequence xi to a point z is that give ε > 0, there is an n such that for all i > n, |xi - z| < ε.
As an aside, consider the Mandelbrot Set, M. The test for a particular point c in C for an element in M is whether the sequence zi+1 = zi2 + c is unbounded, which occurs whenever there is a |zi| > 2. Some elements of M may converge (such as 0), but many do not (such as -1).
Sure. For all positive numbers x, the following inequality holds:
log(x) <= x - 1
(this is a pretty basic result from real analysis; it suffices to observe that the second derivative of log is always negative for all positive x, so the function is concave down, and that x-1 is tangent to the function at x = 1). From this it follows essentially immediately that your while loop must terminate within the first ceil(x) - 2 steps -- though in actuality it terminates much, much faster than that.
A similar argument will establish your result for f(x) = sqrt(x); specifically, you can use the fact that:
sqrt(x) <= x/(2 sqrt(2)) + 1/sqrt(2)
for all positive x.
If you're asking whether this result holds for actual programs, instead of mathematically, the answer is a little bit more nuanced, but not much. Basically, many languages don't actually have hard accuracy requirements for the log function, so if your particular language implementation had an absolutely terrible math library this property might fail to hold. That said, it would need to be a really, really terrible library; this property will hold for any reasonable implementation of log.
I suggest reading this wikipedia entry which provides useful pointers. Without additional knowledge about f, nothing can be said.