Octave -inf and NaN - octave

I searched the forum and found this thread, but it does not cover my question
Two ways around -inf
From a Machine Learning class, week 3, I am getting -inf when using log(0), which later turns into an NaN. The NaN results in no answer being given in a sum formula, so no scalar for J (a cost function which is the result of matrix math).
Here is a test of my function
>> sigmoid([-100;0;100])
ans =
3.7201e-44
5.0000e-01
1.0000e+00
This is as expected. but the hypothesis requires ans = 1-sigmoid
>> 1-ans
ans =
1.00000
0.50000
0.00000
and the Log(0) gives -Inf
>> log(ans)
ans =
0.00000
-0.69315
-Inf
-Inf rows do not add to the cost function, but the -Inf carries through to NaN, and I do not get a result. I cannot find any material on -Inf, but am thinking there is a problem with my sigmoid function.
Can you provide any direction?

The typical way to avoid infinity in these cases is to add eps to the operand:
log(ans + eps)
eps is a very, very small value, and won't affect the output for values of ans unless ans is zero:
>> z = [-100;0;100];
>> g = 1 ./ (1+exp(-z));
>> log(1-g + eps)
ans =
0.0000
-0.6931
-36.0437

Adding to the answers here, I really do hope you would provide some more context to your question (in particular, what are you actually trying to do.
I will go out on a limb and guess the context, just in case this is useful. You are probably doing machine learning, and trying to define a cost function based on the negative log likelihood of a model, and then trying to differentiate it to find the point where this cost is at its minimum.
In general for a reasonable model with a useful likelihood that adheres to Cromwell's rule, you shouldn't have these problems, but, in practice it happens. And presumably in the process of trying to calculate a negative log likelihood of a zero probability you get inf, and trying to calculate a differential between two points produces inf / inf = nan.
In this case, this is an 'edge case', and generally in computer science edge cases need to be spotted as exceptional circumstances and dealt with appropriately. The reality is that you can reasonably expect that inf isn't going to be your function's minimum! Therefore, whether you remove it from the calculations, or replace it by a very large number (whether arbitrarily or via machine precision) doesn't really make a difference.
So in practice you can do either of the two things suggested by others here, or even just detect such instances and skip them from the calculation. The practical result should be the same.

-inf means negative infinity. Which is the correct answer because log of (0) is minus infinity by definition.
The easiest thing to do is to check your intermediate results and if the number is below some threshold (like 1e-12) then just set it to that threshold. The answers won't be perfect but they will still be pretty close.
Using the following as the sigmoid function:
function g = sigmoid(z)
g = 1 ./ (1 + e.^-z);
end
Then the following code runs with no issues. Choose the threshold value in the 'max' statement to be less than the expected noise in your measurements and then you're good to go
>> a = sigmoid([-100, 0, 100])
a =
3.7201e-44 5.0000e-01 1.0000e+00
>> b = 1-a
b =
1.00000 0.50000 0.00000
>> c = max(b, 1e-12)
c =
1.0000e+00 5.0000e-01 1.0000e-12
>> d = log(c)
d =
0.00000 -0.69315 -27.63102

Related

Mathematica Integration taking too long

Using Mathematica I need to evaluate the integral of a function. Since it is taking the program too much to compute it, would it be possible to use parallel computation to shorten the time needed? If so, how can I do it?
I uploaded a picture of the integrand function:
I need to integrate it with respect to (x3, y3, x, y) all of them ranging in a certain interval (x3 and y3 from 0 to 1) (x and y from 0 to 100). The parameters (a,b,c...,o) are preventing the NIntegrate function to work. Any suggestions?
If you evaluate this
expr=E^((-(x-y)^4-(x3-y3)^4)/10^4)*
(f x+e x^2+(m+n x)x3-f y-e y^2-(m+n y)y3)*
((378(x-y)^2(f x+e x^2+(m+n x)x3-f y-e y^2-(m+n y)y3))/
(Pi(1/40+Sqrt[((x-y)^2+(x3-y3)^2)^3]))+
(378(x-y)(x3-y3)(h x+g x^2+(o+p x)x3-h y-g y^2-(o+p y)y3))/
(Pi(1/40+Sqrt[((x-y)^2+(x3-y3)^2)^3])))+
(h x+g x^2+(o+p x)x3-h y-g y^2-(o +p y) y3)*
((378(x-y)(x3-y3)(f x+e x^2+(m+n x)x3-f y-e y^2-(m+n y)y3))/
(Pi(1/40+Sqrt[((x-y)^2+(x3-y3)^2)^3]))+
(378 (x3 - y3)^2 (h x + g x^2 + (o + p x)x3-h y-g y^2-(o+p y)y3))/
(Pi(1/40+Sqrt[((x-y)^2+(x3-y3)^2)^3])));
list=List ## Expand[expr]
then you will get a list of 484 expressions, each very similar in form to this
(378*f*h*x^3*x3)/(Pi*(1/40+Sqrt[(x^2+x3^2-2*x*y+y^2-2*x3*y3+y3^2)^3]))
Notice that you can then use NIntegrate in this way
f*h*NIntegrate[(378*x^3*x3)/(Pi*(1/40+Sqrt[(x^2+x3^2-2*x*y+y^2-2*x3*y3+y3^2)^3])),
{x,0,100},{y,0,100},{x3,0,1},{y3,0,1}]
but it gives warnings and errors about the convergence and accuracy, almost certainly due to your fractional powers in the denominator.
If you can find a way to pull out the scalar multipliers which are independent of x,y,x3,y3 and then perform that integration without warnings and errors and get an accurate result which isn't infinity then you could perhaps perform these integrals in parallel and total the results.
Some of the integrands are scalar multiples of others and if you combine similar integrands then you can reduce this down to 300 unique integrands.
I doubt this is going to lead to an acceptable solution for you.
Please check all this very carefully to make certain that no mistakes have been made.
EDIT
Since the variables that are independent of the integration appear to be easily separated from the dependent variables in the problem posed above, I think this will allow parallel NIntegrate
independentvars[z_] := (z/(z//.{e->1, f->1, g->1, h->1, m->1, n->1, o->1, p->1}))*
NIntegrate[(z//.{e->1, f->1, g->1, h->1, m->1, n->1, o->1, p->1}),
{x, 0, 100}, {y, 0, 100}, {x3, 0, 1}, {y3, 0, 1}]
Total[ParallelMap[independentvars, list]]
As I mentioned previously, the fractional powers in the denominator result in a flood of warnings and errors about convergence failing.
You can test this with the following much simpler example
expr = f x + f g x3 + o^2 x x3;
list = List ## Expand[expr];
Total[ParallelMap[independentvars, list]]
which instantly returns
500000. f + 5000. f g + 250000. o^2
This is a very primitive method of pulling independent symbolic variables outside an NIntegrate. This gives absolutely no warning if one of the integrands is not in a form where this primitive attempt at extraction is not appropriate or fails.
There may be a far better method that someone else has written out there somewhere. If someone could show a far better method of doing this then I would appreciate it.
It might be nice if Wolfram would consider incorporating something like this into NIntegrate itself.

Is there a way to avoid creating an array in this Julia expression?

Is there a way to avoid creating an array in this Julia expression:
max((filter(n -> string(n) == reverse(string(n)), [x*y for x = 1:N, y = 1:N])))
and make it behave similar to this Python generator expression:
max(x*y for x in range(N+1) for y in range(x, N+1) if str(x*y) == str(x*y)[::-1])
Julia version is 2.3 times slower then Python due to array allocation and N*N iterations vs. Python's N*N/2.
EDIT
After playing a bit with a few implementations in Julia, the fastest loop style version I've got is:
function f(N) # 320ms for N=1000 Julia 0.2.0 i686-w64-mingw32
nMax = NaN
for x = 1:N, y = x:N
n = x*y
s = string(n)
s == reverse(s) || continue
nMax < n && (nMax = n)
end
nMax
end
but an improved functional version isn't far behind (only 14% slower or significantly faster, if you consider 2x larger domain):
function e(N) # 366ms for N=1000 Julia 0.2.0 i686-w64-mingw32
isPalindrome(n) = string(n) == reverse(string(n))
max(filter(isPalindrome, [x*y for x = 1:N, y = 1:N]))
end
There is 2.6x unexpected performance improvement by defining isPalindrome function, compared to original version on the top of this page.
We have talked about allowing the syntax
max(f(x) for x in itr)
as a shorthand for producing each of the values f(x) in one coroutine while computing the max in another coroutine. This would basically be shorthand for something like this:
max(#task for x in itr; produce(f(x)); end)
Note, however, that this syntax that explicitly creates a task already works, although it is somewhat less pretty than the above. Your problem can be expressed like this:
max(#task for x=1:N, y=x:N
string(x*y) == reverse(string(x*y)) && produce(x*y)
end)
With the hypothetical producer syntax above, it could be reduced to something like this:
max(x*y if string(x*y) == reverse(string(x*y) for x=1:N, y=x:N)
While I'm a fan of functional style, in this case I would probably just use a for loop:
m = 0
for x = 1:N, y = x:N
n = x*y
string(n) == reverse(string(n)) || continue
m < n && (m = n)
end
Personally, I don't find this version much harder to read and it will certainly be quite fast in Julia. In general, while functional style can be convenient and pretty, if your primary focus is on performance, then explicit for loops are your friend. Nevertheless, we should make sure that John's max/filter/product version works. The for loop version also makes other optimizations easier to add, like Harlan's suggestion of reversing the loop ordering and exiting on the first palindrome you find. There are also faster ways to check if a number is a palindrome in a given base than actually creating and comparing strings.
As to the general question of "getting flexible generators and list comprehensions in Julia", the language already has
A general high-performance iteration protocol based on the start/done/next functions.
Far more powerful multidimensional array comprehensions than most languages. At this point, the only missing feature is the if guard, which is complicated by the interaction with multidimensional comprehensions and the need to potentially dynamically grow the resulting array.
Coroutines (aka tasks) which allow, among other patterns, the producer-consumer pattern.
Python has the if guard but doesn't worry about comprehension performance nearly as much – if we're going to add that feature to Julia's comprehensions, we're going to do it in a way that's both fast and interacts well with multidimensional arrays, hence the delay.
Update: The max function is now called maximum (maximum is to max as sum is to +) and the generator syntax and/or filters work on master, so for example, you can do this:
julia> #time maximum(100x - x^2 for x = 1:100 if x % 3 == 0)
0.059185 seconds (31.16 k allocations: 1.307 MB)
2499
Once 0.5 is out, I'll update this answer more thoroughly.
There are two questions being mixed together here: (1) can you filter a list comprehension mid-comprehension (for which the answer is currently no) and (2) can you use a generator that doesn't allocate an array (for which the answer is partially yes). Generators are provided by the Iterators package, but the Iterators package seems to not play well with filter at the moment. In principle, the code below should work:
max((x, y) -> x * y,
filter((x, y) -> string(x * y) == reverse(string(x * y)),
product(1:N, 1:N)))
I don't think so. There aren't currently filters in Julia array comprehensions. See discussion in this issue.
In this particular case, I'd suggest just nested for loops if you want to get faster computation.
(There might be faster approaches where you start with N and count backwards, stopping as soon as you find something that succeeds. Figuring out how to do that correctly is left as an exercise, etc...)
As mentioned, this is now possible (using Julia 0.5.0)
isPalindrome(n::String) = n == reverse(n)
fun(N::Int) = maximum(x*y for x in 1:N for y in x:N if isPalindrome(string(x*y)))
I'm sure there are better ways that others can comment on. Time (after warm-up):
julia> #time fun(1000);
0.082785 seconds (2.03 M allocations: 108.109 MB, 27.35% gc time)

Determining the input of a function given an output (Calculus involved)

My Calculus teacher gave us a program on to calculate the definite integrals of a given interval using the trapezoidal rule. I know that programmed functions take an input and produce an output as arithmetic functions would but I don't know how to do the inverse: find the input given the output.
The problem states:
"Use the trapezoidal rule with varying numbers, n, of increments to estimate the distance traveled from t=0 to t=9. Find a number D for which the trapezoidal sum is within 0.01 unit of this limit (468) when n > D."
I've estimated the limit through "plug and chug" with the calculator and I know that with a regular algebraic function, I could easily do:
limit (468) = algebraic expression with variable x
(then solve for x)
However, how would I do this for a programmed function? How would I determine the input of a programmed function given output?
I am calculating the definite integral for the polynomial, (x^2+11x+28)/(x+4), between the interval 0 and 9. The trapezoidal rule function in my calculator calculates the definite integral between the interval 0 and 9 using a given number of trapezoids, n.
Overall, I want to know how to do this:
Solve for n:
468 = trapezoidal_rule(a = 0, b = 9, n);
The code for trapezoidal_rule(a, b, n) on my TI-83:
Prompt A
Prompt B
Prompt N
(B-A)/N->D
0->S
A->X
Y1/2->S
For(K,1,N-1,1)
X+D->X
Y1+S->S
End
B->X
Y1/2+S->S
SD->I
Disp "INTEGRAL"
Disp I
Because I'm not familiar with this syntax nor am I familiar with computer algorithms, I was hoping someone could help me turn this code into an algebraic equation or point me in the direction to do so.
Edit: This is not part of my homework—just intellectual curiosity
the polynomial, (x^2+11x+28)/(x+4)
This is equal to x+7. The trapezoidal rule should give exactly correct results for this function! I'm guessing that this isn't actually the function you're working with...
There is no general way to determine, given the output of a function, what its input was. (For one thing, many functions can map multiple different inputs to the same output.)
So, there is a formula for the error when you apply the trapezoidal rule with a given number of steps to a given function, and you could use that here to work out the value of n you need ... but (1) it's not terribly beautiful, and (2) it doesn't seem like a very reasonable thing to expect you to do when you're just starting to look at the trapezoidal rule. I'd guess that your teacher actually just wanted you to "plug and chug".
I don't know (see above) what function you're actually integrating, but let's pretend it's just x^2+11x+28. I'll call this f(x) below. The integral of this from 0 to 9 is actually 940.5. Suppose you divide the interval [0,9] into n pieces. Then the trapezoidal rule gives you: [f(0)/2 + f(1*9/n) + f(2*9/n) + ... + f((n-1)*9/n) + f(9)/2] * 9/n.
Let's separate this out into the contributions from x^2, from 11x, and from 28. It turns out that the trapezoidal approximation gives exactly the right result for the latter two. (Exercise: Work out why.) So the error you get from the trapezoidal rule is exactly the same as the error you'd have got from f(x) = x^2.
The actual integral of x^2 from 0 to 9 is (9^3-0^3)/3 = 243. The trapezoidal approximation is [0/2 + 1^2+2^2+...+(n-1)^2 + n^2/2] * (9/n)^2 * (9/n). (Exercise: work out why.) There's a standard formula for sums of consecutive squares: 1^2 + ... + n^2 = n(n+1/2)(n+1)/3. So our trapezoidal approximation to the integral of x^2 is (9/n)^3 times [(n-1)(n-1/2)n/3 + n^2/2] = (9/n)^3 times [n^3/3+1/6] = 243 + (9/n)^3/6.
In other words, the error in this case is exactly (9/n)^3/6 = (243/2) / n^3.
So, for instance, the error will be less than 0.01 when (243/2) / n^3 < 0.01, which is the same as n^3 > 100*243/2 = 12150, which is true when n >= 23.
[EDITED to add: I haven't checked any of my algebra or arithmetic carefully; there may be small errors. I take it what you're interested is the ideas rather than the specific numbers.]

Multivariate Bisection Method

I need an algorithm to perform a 2D bisection method for solving a 2x2 non-linear problem. Example: two equations f(x,y)=0 and g(x,y)=0 which I want to solve simultaneously. I am very familiar with the 1D bisection ( as well as other numerical methods ). Assume I already know the solution lies between the bounds x1 < x < x2 and y1 < y < y2.
In a grid the starting bounds are:
^
| C D
y2 -+ o-------o
| | |
| | |
| | |
y1 -+ o-------o
| A B
o--+------+---->
x1 x2
and I know the values f(A), f(B), f(C) and f(D) as well as g(A), g(B), g(C) and g(D). To start the bisection I guess we need to divide the points out along the edges as well as the middle.
^
| C F D
y2 -+ o---o---o
| | |
|G o o M o H
| | |
y1 -+ o---o---o
| A E B
o--+------+---->
x1 x2
Now considering the possibilities of combinations such as checking if f(G)*f(M)<0 AND g(G)*g(M)<0 seems overwhelming. Maybe I am making this a little too complicated, but I think there should be a multidimensional version of the Bisection, just as Newton-Raphson can be easily be multidimed using gradient operators.
Any clues, comments, or links are welcomed.
Sorry, while bisection works in 1-d, it fails in higher dimensions. You simply cannot break a 2-d region into subregions using only information about the function at the corners of the region and a point in the interior. In the words of Mick Jagger, "You can't always get what you want".
I just stumbled upon the answer to this from geometrictools.com and C++ code.
edit: the code is now on github.
I would split the area along a single dimension only, alternating dimensions. The condition you have for existence of zero of a single function would be "you have two points of different sign on the boundary of the region", so I'd just check that fro the two functions. However, I don't think it would work well, since zeros of both functions in a particular region don't guarantee a common zero (this might even exist in a different region that doesn't meet the criterion).
For example, look at this image:
There is no way you can distinguish the squares ABED and EFIH given only f() and g()'s behaviour on their boundary. However, ABED doesn't contain a common zero and EFIH does.
This would be similar to region queries using eg. kD-trees, if you could positively identify that a region doesn't contain zero of eg. f. Still, this can be slow under some circumstances.
If you can assume (per your comment to woodchips) that f(x,y)=0 defines a continuous monotone function y=f2(x), i.e. for each x1<=x<=x2 there is a unique solution for y (you just can't express it analytically due to the messy form of f), and similarly y=g2(x) is a continuous monotone function, then there is a way to find the joint solution.
If you could calculate f2 and g2, then you could use a 1-d bisection method on [x1,x2] to solve f2(x)-g2(x)=0. And you can do that by using 1-d bisection on [y1,y2] again for solving f(x,y)=0 for y for any given fixed x that you need to consider (x1, x2, (x1+x2)/2, etc) - that's where the continuous monotonicity is helpful -and similarly for g. You have to make sure to update x1-x2 and y1-y2 after each step.
This approach might not be efficient, but should work. Of course, lots of two-variable functions don't intersect the z-plane as continuous monotone functions.
I'm not much experient on optimization, but I built a solution to this problem with a bisection algorithm like the question describes. I think is necessary to fix a bug in my solution because it compute tow times a root in some cases, but i think it's simple and will try it later.
EDIT: I seem the comment of jpalecek, and now I anderstand that some premises I assumed are wrong, but the methods still works on most cases. More especificaly, the zero is garanteed only if the two functions variate the signals at oposite direction, but is need to handle the cases of zero at the vertices. I think is possible to build a justificated and satisfatory heuristic to that, but it is a little complicated and now I consider more promising get the function given by f_abs = abs(f, g) and build a heuristic to find the local minimuns, looking to the gradient direction on the points of the middle of edges.
Introduction
Consider the configuration in the question:
^
| C D
y2 -+ o-------o
| | |
| | |
| | |
y1 -+ o-------o
| A B
o--+------+---->
x1 x2
There are many ways to do that, but I chose to use only the corner points (A, B, C, D) and not middle or center points liky the question sugests. Assume I have tow function f(x,y) and g(x,y) as you describe. In truth it's generaly a function (x,y) -> (f(x,y), g(x,y)).
The steps are the following, and there is a resume (with a Python code) at the end.
Step by step explanation
Calculate the product each scalar function (f and g) by them self at adjacent points. Compute the minimum product for each one for each direction of variation (axis, x and y).
Fx = min(f(C)*f(B), f(D)*f(A))
Fy = min(f(A)*f(B), f(D)*f(C))
Gx = min(g(C)*g(B), g(D)*g(A))
Gy = min(g(A)*g(B), g(D)*g(C))
It looks to the product through tow oposite sides of the rectangle and computes the minimum of them, whats represents the existence of a changing of signal if its negative. It's a bit of redundance but work's well. Alternativaly you can try other configuration like use the points (E, F, G and H show in the question), but I think make sense to use the corner points because it consider better the whole area of the rectangle, but it is only a impression.
Compute the minimum of the tow axis for each function.
F = min(Fx, Fy)
G = min(Gx, Gy)
It of this values represents the existence of a zero for each function, f and g, within the rectangle.
Compute the maximum of them:
max(F, G)
If max(F, G) < 0, then there is a root inside the rectangle. Additionaly, if f(C) = 0 and g(C) = 0, there is a root too and we do the same, but if the root is in other corner we ignore him, because other rectangle will compute it (I want to avoid double computation of roots). The statement bellow resumes:
guaranteed_contain_zeros = max(F, G) < 0 or (f(C) == 0 and g(C) == 0)
In this case we have to proceed breaking the region recursively ultil the rectangles are as small as we want.
Else, may still exist a root inside the rectangle. Because of that, we have to use some criterion to break this regions ultil the we have a minimum granularity. The criterion I used is to assert the largest dimension of the current rectangle is smaller than the smallest dimension of the original rectangle (delta in the code sample bellow).
Resume
This Python code resume:
def balance_points(x_min, x_max, y_min, y_max, delta, eps=2e-32):
width = x_max - x_min
height = y_max - y_min
x_middle = (x_min + x_max)/2
y_middle = (y_min + y_max)/2
Fx = min(f(C)*f(B), f(D)*f(A))
Fy = min(f(A)*f(B), f(D)*f(C))
Gx = min(g(C)*g(B), g(D)*g(A))
Gy = min(g(A)*g(B), g(D)*g(C))
F = min(Fx, Fy)
G = min(Gx, Gy)
largest_dim = max(width, height)
guaranteed_contain_zeros = max(F, G) < 0 or (f(C) == 0 and g(C) == 0)
if guaranteed_contain_zeros and largest_dim <= eps:
return [(x_middle, y_middle)]
elif guaranteed_contain_zeros or largest_dim > delta:
if width >= height:
return balance_points(x_min, x_middle, y_min, y_max, delta) + balance_points(x_middle, x_max, y_min, y_max, delta)
else:
return balance_points(x_min, x_max, y_min, y_middle, delta) + balance_points(x_min, x_max, y_middle, y_max, delta)
else:
return []
Results
I have used a similar code similar in a personal project (GitHub here) and it draw the rectangles of the algorithm and the root (the system have a balance point at the origin):
Rectangles
It works well.
Improvements
In some cases the algorithm compute tow times the same zero. I thinh it can have tow reasons:
I the case the functions gives exatly zero at neighbour rectangles (because of an numerical truncation). In this case the remedy is to incrise eps (increase the rectangles). I chose eps=2e-32, because 32 bits is a half of the precision (on 64 bits archtecture), then is problable that the function don't gives a zero... but it was more like a guess, I don't now if is the better. But, if we decrease much the eps, it extrapolates the recursion limit of Python interpreter.
The case in witch the f(A), f(B), etc, are near to zero and the product is truncated to zero. I think it can be reduced if we use the product of the signals of f and g in place of the product of the functions.
I think is possible improve the criterion to discard a rectangle. It can be made considering how much the functions are variating in the region of the rectangle and how distante the function is of zero. Perhaps a simple relation between the average and variance of the function values on the corners. In another way (and more complicated) we can use a stack to store the values on each recursion instance and garantee that this values are convergent to stop recursion.
This is a similar problem to finding critical points in vector fields (see http://alglobus.net/NASAwork/topology/Papers/alsVugraphs93.ps).
If you have the values of f(x,y) and g(x,y) at the vertexes of your quadrilateral and you are in a discrete problem (such that you don't have an analytical expression for f(x,y) and g(x,y) nor the values at other locations inside the quadrilateral), then you can use bilinear interpolation to get two equations (for f and g). For the 2D case the analytical solution will be a quadratic equation which, according to the solution (1 root, 2 real roots, 2 imaginary roots) you may have 1 solution, 2 solutions, no solutions, solutions inside or outside your quadrilateral.
If instead you have analytic functions of f(x,y) and g(x,y) and want to use them, this is not useful. Instead you could divide your quadrilateral recursively, however as it was already pointed out by jpalecek (2nd post), you would need a way to stop your divisions by figuring out a test that would assure you would have no zeros inside a quadrilateral.
Let f_1(x,y), f_2(x,y) be two functions which are continuous and monotonic with respect to x and y. The problem is to solve the system f_1(x,y) = 0, f_2(x,y) = 0.
The alternating-direction algorithm is illustrated below. Here, the lines depict sets {f_1 = 0} and {f_2 = 0}. It is easy to see that the direction of movement of the algorithm (right-down or left-up) depends on the order of solving the equations f_i(x,y) = 0 (e.g., solve f_1(x,y) = 0 w.r.t. x then solve f_2(x,y) = 0 w.r.t. y OR first solve f_1(x,y) = 0 w.r.t. y and then solve f_2(x,y) = 0 w.r.t. x).
Given the initial guess, we don't know where the root is. So, in order to find all roots of the system, we have to move in both directions.

Repeated application of functions

Reading this question got me thinking: For a given function f, how can we know that a loop of this form:
while (x > 2)
x = f(x)
will stop for any value x? Is there some simple criterion?
(The fact that f(x) < x for x > 2 doesn't seem to help since the series may converge).
Specifically, can we prove this for sqrt and for log?
For these functions, a proof that ceil(f(x))<x for x > 2 would suffice. You could do one iteration -- to arrive at an integer number, and then proceed by simple induction.
For the general case, probably the best idea is to use well-founded induction to prove this property. However, as Moron pointed out in the comments, this could be impossible in the general case and the right ordering is, in many cases, quite hard to find.
Edit, in reply to Amnon's comment:
If you wanted to use well-founded induction, you would have to define another strict order, that would be well-founded. In case of the functions you mentioned this is not hard: you can take x << y if and only if ceil(x) < ceil(y), where << is a symbol for this new order. This order is of course well-founded on numbers greater then 2, and both sqrt and log are decreasing with respect to it -- so you can apply well-founded induction.
Of course, in general case such an order is much more difficult to find. This is also related, in some way, to total correctness assertions in Hoare logic, where you need to guarantee similar obligations on each loop construct.
There's a general theorem for when then sequence of iterations will converge. (A convergent sequence may not stop in a finite number of steps, but it is getting closer to a target. You can get as close to the target as you like by going far enough out in the sequence.)
The sequence x, f(x), f(f(x)), ... will converge if f is a contraction mapping. That is, there exists a positive constant k < 1 such that for all x and y, |f(x) - f(y)| <= k |x-y|.
(The fact that f(x) < x for x > 2 doesn't seem to help since the series may converge).
If we're talking about floats here, that's not true. If for all x > n f(x) is strictly less than x, it will reach n at some point (because there's only a limited number of floating point values between any two numbers).
Of course this means you need to prove that f(x) is actually less than x using floating point arithmetic (i.e. proving it is less than x mathematically does not suffice, because then f(x) = x may still be true with floats when the difference is not enough).
There is no general algorithm to determine whether a function f and a variable x will end or not in that loop. The Halting problem is reducible to that problem.
For sqrt and log, we could safely do that because we happen to know the mathematical properties of those functions. Say, sqrt approaches 1, log eventually goes negative. So the condition x < 2 has to be false at some point.
Hope that helps.
In the general case, all that can be said is that the loop will terminate when it encounters xi≤2. That doesn't mean that the sequence will converge, nor does it even mean that it is bounded below 2. It only means that the sequence contains a value that is not greater than 2.
That said, any sequence containing a subsequence that converges to a value strictly less than two will (eventually) halt. That is the case for the sequence xi+1 = sqrt(xi), since x converges to 1. In the case of yi+1 = log(yi), it will contain a value less than 2 before becoming undefined for elements of R (though it is well defined on the extended complex plane, C*, but I don't think it will, in general converge except at any stable points that may exist (i.e. where z = log(z)). Ultimately what this means is that you need to perform some upfront analysis on the sequence to better understand its behavior.
The standard test for convergence of a sequence xi to a point z is that give ε > 0, there is an n such that for all i > n, |xi - z| < ε.
As an aside, consider the Mandelbrot Set, M. The test for a particular point c in C for an element in M is whether the sequence zi+1 = zi2 + c is unbounded, which occurs whenever there is a |zi| > 2. Some elements of M may converge (such as 0), but many do not (such as -1).
Sure. For all positive numbers x, the following inequality holds:
log(x) <= x - 1
(this is a pretty basic result from real analysis; it suffices to observe that the second derivative of log is always negative for all positive x, so the function is concave down, and that x-1 is tangent to the function at x = 1). From this it follows essentially immediately that your while loop must terminate within the first ceil(x) - 2 steps -- though in actuality it terminates much, much faster than that.
A similar argument will establish your result for f(x) = sqrt(x); specifically, you can use the fact that:
sqrt(x) <= x/(2 sqrt(2)) + 1/sqrt(2)
for all positive x.
If you're asking whether this result holds for actual programs, instead of mathematically, the answer is a little bit more nuanced, but not much. Basically, many languages don't actually have hard accuracy requirements for the log function, so if your particular language implementation had an absolutely terrible math library this property might fail to hold. That said, it would need to be a really, really terrible library; this property will hold for any reasonable implementation of log.
I suggest reading this wikipedia entry which provides useful pointers. Without additional knowledge about f, nothing can be said.