Say we have x, which is an integer.
Is there any reason to prefer x <= 1 over x < 2? I mean, if one is maybe faster or more readable.
This question is language independent, meaning if the answer is different for two different languages, please let me know.
Usually when I loop over a zero-based collection, I use i < col.Length, as it is more readable than i <= col.Length - 1. If I am iterating from 1 to x, I use for (int i = 1; i <= x ...), as it is more readable than < x + 1. Both of these instructions have the same time requirement (at least on x86 architecture), so yes, it is only about readability.
I would say that it depends on the requirements of your software: was it specified that x needs to be one or less, or that x needs to be less than two?
If you ever changed x to be of a number type that allows decimal points, which way would work best then? This happens more often than you think and can introduce some interesting bugs.
In general, there is no guarantee about what types the literals 1 and 2 have. In most languages they will be the same, but in theory they could be different, and then the results of the two comparisons could differ as well. Also, integer types are bounded in most languages, so the behavior can differ at the boundaries.
In plain C, if the comparisons are x <= -0x80000000 and x < -0x7fffffff (note that -0x80000000 < -0x7fffffff) and x has type int, the results depend on the value of x:
-0x80000000 : 1 1
-0x7fffffff .. -1 : 0 0
0 .. 0x7fffffff: 1 0
In other words, for all non-negative x, the results will be different.
Similarly, with comparisons x <= 0x7fffffff and x < 0x80000000 (the relation between constants 0x7fffffff < 0x80000000 still holds), we get:
-0x80000000 .. -1 : 1 0
0 .. 0x7fffffff: 1 1
Now the results are different for all negative values of x.
Clearly, there are some typing rules and type conversions involved (they are described in the C language standard), but the point is that the two comparisons are not interchangeable in boundary cases.
I'm trying to solve the following puzzle:
Given a stream of numbers (only 1 iteration over them is allowed) in which all numbers appear 3 times, but 1 number appears only 2 times, find this number, using O(1) memory.
I started with the idea that, if all numbers appeared 2 times and 1 number only once, I could use the xor operation between all numbers and the result would be the unknown number.
So I want to extend this idea to solve the puzzle. All I need is a xor-like function (or operator), which would yield 0 on the third application:
SEED xor3 X xor3 X xor3 X = SEED
X xor3 Y xor3 SEED xor3 X xor3 Y xor3 Y xor3 X = SEED
Any ideas for such a function?
Regard XOR as summation on each bit of a number expressed in binary (i.e. a radix of 2), modulo 2.
Now consider a numerical system consisting of tribits 0, 1, and 2. That is, it has a radix of 3.
The operator T now becomes an operation on any number, decomposed into this radix. As in XOR, you sum the tribits, but the difference is that operator T works modulo 3.
You can easily show that a T a T a is zero for any a. You can also show that T is both commutative and associative. That is necessary since, in general, your sequence will have the numbers jumbled up.
Now apply this to your list of numbers. At the end of the operation, the output will be b where b = o T o and o is the number that occurs exactly twice.
Your solution for the simpler case (all numbers appear twice, one number appears once) works since xor operates on each bit x as
x xor x = 0 and 0 xor x = x
xor is basically a bit-wise summation modulo 2. You would need the base-3 equivalent: transform each number into its base-3 representation and then use summation modulo 3 for each digit:
      | 0 1 2
  ----+------
    0 | 0 1 2
    1 | 1 2 0
    2 | 2 0 1
Call this operation xor3. Now you have for each digit x:
x xor3 x xor3 x = 0 and 0 xor3 x = x
If you apply that to all your numbers, then all values that appear 3 times will vanish. The result is x xor3 x, where x is the number that appears twice. To recover x, you need to apply digit-wise division by 2 modulo 3 (which, modulo 3, is the same as multiplying each digit by 2, since 2 * 2 = 4 = 1 mod 3).
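A minimal Python sketch of this idea (the function name and the digit count are just for illustration; it assumes the stream contains non-negative integers that fit in 32 bits):

def find_twice(stream):
    # Digit-wise sum of all numbers in base 3, taken modulo 3, using O(1) extra memory.
    counts = [0] * 21                      # 3**21 > 2**32, enough digits for 32-bit values
    for n in stream:
        for i in range(len(counts)):
            counts[i] = (counts[i] + n % 3) % 3
            n //= 3
    # Each entry now holds twice the corresponding digit of the number occurring twice (mod 3).
    # Division by 2 modulo 3 is the same as multiplication by 2, since 2 * 2 = 4 = 1 (mod 3).
    result, power = 0, 1
    for c in counts:
        result += ((c * 2) % 3) * power
        power *= 3
    return result

print(find_twice([5, 7, 5, 9, 7, 9, 5, 7, 9, 4, 4]))   # prints 4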
I believe there are more efficient ways to implement that. The advantage of the xor function in the first case relies on the fact that xor is a natural base-2 operation. Is there any practical application for that?
This approach is a bit fragile: if the precondition (all numbers appear 3 times except one that appears twice) breaks, the algorithm will not help you.
Take a Map with int keys and int values. Then walk through your numbers and, for each number x, increase the corresponding value, taking 0 as the start value if x is a new key.
Then you can analyze it easily: walk through all keys and check the cardinality. It should be three for all keys except one, for which it should be two. This is more robust, and my gut feeling says it is also faster.
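A sketch of that counting approach in Python (note that, unlike the digit-sum trick, the map needs O(k) memory for k distinct values; the function name is just for illustration):

from collections import Counter

def find_twice_with_map(stream):
    counts = Counter(stream)                     # value -> number of occurrences
    for value, cardinality in counts.items():
        if cardinality == 2:
            return value
    raise ValueError("no value occurs exactly twice")   # precondition violated

print(find_twice_with_map([5, 7, 5, 9, 7, 9, 5, 7, 9, 4, 4]))   # prints 4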
I would like to do some computations with partially defined functions like (x-3)^3 / (x-3). Maple automatically reduces algebraic fractions with no regard to points like x = 3. Unevaluation doesn't help much, because it's stripped before functions (e.g. solve) could ever make use of those points.
How should I work with such functions?
For your given example, you could try working instead with the expanded form of the numerator (and also of the denominator, for some other examples).
I.e.,
nd := (n,d) -> expand(n)/expand(d):
expr := nd( (x-2)^3, x-2 );
    (x^3 - 6*x^2 + 12*x - 8) / (x - 2)
Is there any standard math function for this operation:
f(x)=max(x,0)
I was wondering maybe there is a well-known function for this operation in mathematics literature.
Any idea?
This is usually denoted (x)₊, sometimes also x⊔0 or x∨0, where the symbol alludes to the shape of the kinks in the maximum of two functions, for instance in |x| = max(x, -x).
In Lebesgue integration theory, for example, a function is first split into its positive and negative part, so that the integration theory can be reduced to non-negative functions.
Another application is splines; the cubic B-spline has the representation
B₃(x) = 1/6 * ( (x+2)₊³ - 4*(x+1)₊³ + 6*(x)₊³ - 4*(x-1)₊³ + (x-2)₊³ )
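A tiny Python check of that representation, where pos is just the positive part (x)₊ (the sample points and names are only for illustration):

def pos(x):
    # positive part: (x)_+ = max(x, 0)
    return max(x, 0.0)

def b3(x):
    # cubic B-spline written with truncated third powers (x)_+^3
    return (pos(x + 2)**3 - 4*pos(x + 1)**3 + 6*pos(x)**3
            - 4*pos(x - 1)**3 + pos(x - 2)**3) / 6.0

print(b3(0.0))   # 0.666..., the peak value 2/3
print(b3(1.0))   # 0.1666..., i.e. 1/6
print(b3(2.5))   # 0.0, outside the support [-2, 2]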
I guess you are looking for:
(abs(x)+x)/2
https://www.wolframalpha.com/input/?i=%28%7Cx%7C%2Bx%29%2F2
Another way it might be characterised is as
x H(x)
where H(x) is the Heaviside unit step function.
H(x) = ( x >= 0 ? 1 : 0 )
i.e. 1 for positive x, 0 for negative x, and either 0, 1, or 1/2 at x=0. This is used in control theory, signal processing and Fourier analysis. It's quite common to use f(x) H(x) for functions which start at a particular time, say switching some electronics on. So in this area of study x H(x) might be the best way to answer your question.
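As a quick sanity check, here is a small Python sketch (the function names are only for illustration) showing that max(x, 0), (|x| + x)/2, and x·H(x) agree:

def positive_part(x):
    return max(x, 0.0)

def via_abs(x):
    return (abs(x) + x) / 2.0

def via_heaviside(x):
    H = 1.0 if x >= 0 else 0.0    # Heaviside step, using the convention H(0) = 1
    return x * H

for x in (-3.5, -1.0, 0.0, 0.25, 2.0):
    assert positive_part(x) == via_abs(x) == via_heaviside(x)
print("all three definitions agree on the sample points")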
This is a follow-up to Testing for floating-point value equality: Is there a standard name for the “precision” constant?.
There is a very similar question Double.Epsilon for equality, greater than, less than, less than or equal to, greater than or equal to.
It is well known that an equality test for two floating-point values x and y should look more like this (rather than a straightforward =):
abs(x - y) < epsilon, where epsilon is some very small value.
How to choose a value for epsilon?
It would obviously be preferable to choose for epsilon as small a value as possible, to get the highest-possible precision for the equality check.
As an example, the .NET framework offers a constant System.Double.Epsilon (= 4.94066 × 10⁻³²⁴), which represents the smallest positive System.Double value that is greater than zero.
However, it turns out that this particular value can't be reliably used as epsilon, since:
0 + System.Double.Epsilon ≠ 0
1 + System.Double.Epsilon = 1 (!)
which is, if I understand correctly, because that constant is less than machine epsilon.
→ Is this correct?
→ Does this also mean that I can reliably use epsilon := machine epsilon for equality tests?
Removed these two questions, as they are already adequately answered by the second SO question linked-to above.
The linked-to Wikipedia article says that for 64-bit floating-point numbers (ie. the double type in many languages), machine epsilon is equal to:
2⁻⁵³, or approx. 0.000000000000000111 (a number with 15 zeroes after the decimal point)
→ Does it follow from this that all 64-bit floating point values are guaranteed to be accurate to 14 (if not 15) digits?
How to choose a value for epsilon?
Short Answer: You take a small value which fits your application's needs.
Long Answer: Nobody can know which calculations your application does or how accurate you expect your results to be. Since rounding errors add up, machine epsilon will almost always be far too small, so you have to choose your own value. Depending on your needs, 0.01 may be sufficient, or maybe 0.00000000000001 or less will be.
The question is, do you really want/need to do equality tests on floating point values? Maybe you should redesign your algorithms.
In the past when I have had to use an epsilon value it's been very much bigger than the machine epsilon value.
Although it was for 32-bit floating-point values (rather than 64-bit doubles), we found that an epsilon value of 10⁻⁶ was needed for most (if not all) calculated values in our particular application.
The value of epsilon you choose depends on the scale of your numbers. If you are dealing with the very large (say 10¹⁰) then you might need a larger value of epsilon, as your significant digits don't stretch very far into the fractional part (if at all). If you are dealing with the very small (say 10⁻¹⁰) then obviously you need an epsilon value that's smaller than this.
You need to do some experimentation, performing your calculations and checking the differences between your output values. Only when you know the range of your potential answers will you be able to decide on a suitable value for your application.
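To put numbers on that scale dependence, here is a quick NumPy check (np.spacing gives the gap between a double and the next representable value):

import numpy as np

print(np.spacing(1.0))      # ~2.2e-16: neighbouring doubles near 1 are this far apart
print(np.spacing(1.0e10))   # ~1.9e-06: near 1e10 the gap is roughly ten orders of magnitude larger
# A single absolute epsilon therefore cannot be appropriate at both magnitudes.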
The sad truth is: There is no appropriate epsilon for floating-point comparisons. Use another approach for floating-point equality tests if you don't want to run into serious bugs.
Approximate floating-point comparison is an amazingly tricky field, and the abs(x - y) < eps approach works only for a very limited range of values, mainly because of the absolute difference not taking into account the magnitude of the compared values, but also due to the significant digit cancellation occurring in the subtraction of two floating-point values with different exponents.
There are better approaches, using relative differences or ULPs, but they have their own shortcomings and pitfalls. Read Bruce Dawson's excellent article Comparing Floating Point Numbers, 2012 Edition for a great introduction into how tricky floating-point comparisons really are -- a must-read for anyone doing floating-point programming IMHO! I'm sure countless thousands of man-years have been spent finding out the subtle bugs due to naive floating-point comparisons.
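As an illustration, here is a hedged Python sketch of a combined relative/absolute comparison, similar in spirit to math.isclose from PEP 485 (the default tolerances are placeholders, not recommendations):

def nearly_equal(x, y, rel_tol=1e-9, abs_tol=0.0):
    # The relative tolerance scales with the larger magnitude;
    # abs_tol provides a floor for comparisons against values near zero.
    return abs(x - y) <= max(rel_tol * max(abs(x), abs(y)), abs_tol)

print(nearly_equal(1.0, 1.0 + 1e-12))             # True: tiny relative difference
print(nearly_equal(1e10, 1e10 + 1.0))             # True: still within 1e-9 relative to 1e10
print(nearly_equal(1e-12, 2e-12))                 # False: relatively they differ by a factor of 2
print(nearly_equal(1e-12, 2e-12, abs_tol=1e-9))   # True once an absolute floor is supplied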
I also have questions regarding what would be the correct procedure. However I believe one should do:
abs(x - y) <= 0.5 * eps * max(abs(x), abs(y))
instead of:
abs(x - y) < eps
The reason for this arises from the definition of the machine epsilon. Using Python code:
import numpy as np
real = np.float64
eps = np.finfo(real).eps
## Let's get the machine epsilon for x = 1
x, dx = real(1), real(1)
while x + dx != x: dx /= real(2)
print("eps = %e dx = %e eps*x/2 = %e" % (eps, dx, eps * x / real(2)))
Which gives: eps = 2.220446e-16 dx = 1.110223e-16 eps*x/2 = 1.110223e-16
## Now for x = 16
x, dx = real(16), real(1)
while x + dx != x: dx /= real(2)
print("eps = %e dx = %e eps*x/2 = %e" % (eps, dx, eps * x / real(2)))
Which now gives: eps = 2.220446e-16 dx = 1.776357e-15 eps*x/2 = 1.776357e-15
## For x not equal to 2**n
x, dx = real(36), real(1)
while x + dx != x: dx /= real(2)
print("eps = %e dx = %e eps*x/2 = %e" % (eps, dx, eps * x / real(2)))
Which returns: eps = 2.220446e-16 dx = 3.552714e-15 eps*x/2 = 3.996803e-15
However, despite the difference between dx and eps*x/2, we see that dx <= eps*x/2,
thus it serves the purpose for equality tests, checking for tolerances when testing for convergence in numerical procedures, etc.
Such is similar to what is in:
www.ibiblio.org/pub/languages/fortran/ch1-8.html#02,
however if someone knows of better procedures or if something here is incorrect, please do say.
Reading this question got me thinking: For a given function f, how can we know that a loop of this form:
while (x > 2)
x = f(x)
will stop for any value x? Is there some simple criterion?
(The fact that f(x) < x for x > 2 doesn't seem to help since the sequence may converge).
Specifically, can we prove this for sqrt and for log?
For these functions, a proof that ceil(f(x))<x for x > 2 would suffice. You could do one iteration -- to arrive at an integer number, and then proceed by simple induction.
For the general case, probably the best idea is to use well-founded induction to prove this property. However, as Moron pointed out in the comments, this could be impossible in the general case and the right ordering is, in many cases, quite hard to find.
Edit, in reply to Amnon's comment:
If you wanted to use well-founded induction, you would have to define another strict order that would be well-founded. In the case of the functions you mentioned this is not hard: you can take x << y if and only if ceil(x) < ceil(y), where << is a symbol for this new order. This order is of course well-founded on numbers greater than 2, and both sqrt and log are decreasing with respect to it -- so you can apply well-founded induction.
Of course, in general case such an order is much more difficult to find. This is also related, in some way, to total correctness assertions in Hoare logic, where you need to guarantee similar obligations on each loop construct.
There's a general theorem for when the sequence of iterations will converge. (A convergent sequence may not stop in a finite number of steps, but it is getting closer to a target. You can get as close to the target as you like by going far enough out in the sequence.)
The sequence x, f(x), f(f(x)), ... will converge if f is a contraction mapping. That is, there exists a positive constant k < 1 such that for all x and y, |f(x) - f(y)| <= k |x-y|.
(The fact that f(x) < x for x > 2 doesn't seem to help since the sequence may converge).
If we're talking about floats here, that's not true. If, for all x > n, f(x) is strictly less than x, then x will reach n at some point (because there is only a limited number of floating-point values between any two numbers).
Of course this means you need to prove that f(x) is actually less than x using floating point arithmetic (i.e. proving it is less than x mathematically does not suffice, because then f(x) = x may still be true with floats when the difference is not enough).
There is no general algorithm to determine whether a function f and a variable x will end or not in that loop. The Halting problem is reducible to that problem.
For sqrt and log, we could safely do that because we happen to know the mathematical properties of those functions. Say, sqrt approaches 1, and log eventually goes negative. So the loop condition x > 2 has to become false at some point.
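As a quick empirical check, a small Python sketch of the loop for both functions (the starting value is arbitrary):

import math

def steps_until_guard_fails(x, f):
    # Count how many iterations of x = f(x) it takes until the guard x > 2 fails.
    steps = 0
    while x > 2:
        x = f(x)
        steps += 1
    return steps

print(steps_until_guard_fails(1.0e100, math.sqrt))   # terminates after a handful of steps
print(steps_until_guard_fails(1.0e100, math.log))    # likewise, even faster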
Hope that helps.
In the general case, all that can be said is that the loop will terminate when it encounters xᵢ ≤ 2. That doesn't mean that the sequence converges, nor even that it stays below 2. It only means that the sequence contains a value that is not greater than 2.
That said, any sequence containing a subsequence that converges to a value strictly less than two will (eventually) halt. That is the case for the sequence xᵢ₊₁ = sqrt(xᵢ), since it converges to 1. In the case of yᵢ₊₁ = log(yᵢ), the sequence will contain a value less than 2 before becoming undefined over the reals (it is well defined on the extended complex plane C*, but there I don't think it will, in general, converge, except at any fixed points that may exist, i.e. where z = log(z)). Ultimately, what this means is that you need to perform some upfront analysis of the sequence to better understand its behavior.
The standard test for convergence of a sequence xᵢ to a point z is that, given ε > 0, there is an n such that for all i > n, |xᵢ - z| < ε.
As an aside, consider the Mandelbrot Set, M. The test for whether a particular point c in C is an element of M is whether the sequence zᵢ₊₁ = zᵢ² + c is unbounded, which occurs whenever there is a |zᵢ| > 2. Some elements of M may converge (such as 0), but many do not (such as -1).
Sure. For all positive numbers x, the following inequality holds:
log(x) <= x - 1
(this is a pretty basic result from real analysis; it suffices to observe that the second derivative of log is always negative for all positive x, so the function is concave down, and that x - 1 is tangent to it at x = 1). From this it follows essentially immediately that your while loop must terminate within the first ceil(x) - 2 steps, since each iteration replaces x by log(x) <= x - 1, i.e. decreases x by at least 1 -- though in actuality it terminates much, much faster than that.
A similar argument will establish your result for f(x) = sqrt(x); specifically, you can use the fact that:
sqrt(x) <= x/(2 sqrt(2)) + 1/sqrt(2)
for all positive x.
If you're asking whether this result holds for actual programs, instead of mathematically, the answer is a little bit more nuanced, but not much. Basically, many languages don't actually have hard accuracy requirements for the log function, so if your particular language implementation had an absolutely terrible math library this property might fail to hold. That said, it would need to be a really, really terrible library; this property will hold for any reasonable implementation of log.
I suggest reading this Wikipedia entry, which provides useful pointers. Without additional knowledge about f, nothing can be said.