Theorem 7
When β = 2, if m and n are integers with |m| < 2^(p - 1) and n has the special form n = 2^i + 2^j, then (m ⊘ n) ⊗ n = m, provided floating-point operations are exactly rounded. (Here ⊘ and ⊗ denote floating-point division and multiplication.)
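(A quick numerical sanity check, not part of Goldberg's text: Python floats are IEEE 754 binary64, so p = 53, and the sketch below simply verifies that (m / n) * n recovers m for random integers |m| < 2^(p - 1) and n = 2^i + 2^j.)

import random

P = 53  # precision of IEEE 754 binary64, the format behind Python floats

random.seed(0)
for _ in range(100_000):
    m = random.randint(-(2**(P - 1) - 1), 2**(P - 1) - 1)
    i = random.randint(0, P - 1)
    j = random.randint(0, P - 1)
    n = 2**i + 2**j                  # the special form required by the theorem
    # m / n is correctly rounded; multiplying back by n should give exactly m
    assert (m / n) * n == m, (m, n)
print("no counterexample found")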
If I post the whole proof here, it will be long, unreadable, and ugly, so please follow the link on the right, press Ctrl+F, and search for Theorem 7. There it is! Goldberg91
OK, I have to say that, at least from my own perspective, the proof of Theorem 7 is too strange to follow, even though the author claims it is ingenious.
The only part I do understand is that m has at most 1 bit to the right of the binary point. I know that, but why does it follow that n * q̄ will round to m? I also can't understand the so-called "halfway case", or almost anything from that line on.
Any help is welcome, thank you in advance.
EDIT:
Interestingly, the first comment below solved all my questions in one go, and the second comment suggested that I narrow my post.
That's fair: it's unreasonable to ask someone to explain the whole proof. So now my question becomes this:
Why was the initial unscaled m's low-order bit 0 (as stated in the paragraph below formula 9)? Shouldn't the most significant digit be zero rather than the least significant one? Does this have something to do with 'big-endian' or 'little-endian' byte order?
First off, n is scaled by a power of 2 so that it becomes greater than or equal to 2^(p - 1) and less than 2^p. The scaled n will be denoted by n'. Scaling makes no difference to the argument: only the exponent is modified, since numbers are in binary, so we only have to focus on the significand / mantissa.
Next, m is scaled, producing m', so that q' = m'/n' is less than 1 and greater than 1/2 (in fact, I think it should be 1/2 < q' ≤ 1). Scalings like these are possible because each upper bound is exactly double the corresponding lower bound, and β = 2.
As we can see, 2^(p - 1) ≤ n' < 2^p and 1/2 < q' < 1, while m' = q' * n'. Since q' and n' are positive, the upper bound on m' is the product of the upper bounds of n' and q', that is 2^p * 1 = 2^p. Similarly, the lower bound on m' is 2^(p - 1) * 1/2 = 2^(p - 2).
Now that 2^(p - 2) < m' < 2^p, we can say p - 2 < log2(m') < p, so the number of digits to the left of the binary point is either p - 1 (this happens when log2(m') is between p - 2 and p - 1) or p (when log2(m') is at least p - 1). Since m' has only p significant digits, it has at most one bit to the right of the binary point.
As Mark Dickinson pointed out, it follows that the difference between m' and the next precision-p float up or down is at least 1/2. So to show that some quantity rounds to m', it's enough to show that it lies within 1/4 of m'.
Besides that, "the halfway case", namely the case where that quantity is exactly 1/4 from m', worth a separate discussion: Since the initial unscaled m had |m| < 2*p - 1, it has at least one bit to the right of the binary point, due to the same reason as mentioned above. m is an integer, so all digits to the right of the binary point are zero. Certainly, its low-order bit is 0 for that reason. Because scaling doesn't have effect on the significand / mantissa, the low-order bit of m' is also 0.
Consequently, under round to even, which the original author adopts by writing "Throughout the rest of this paper, round to even will be used." earlier in the paper (you can use Ctrl+F again to find it), a value that is exactly 1/4 (binary 0.01) away from m' still rounds to m', since m''s low-order bit, 0, is even.
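(An aside of mine, not something from the paper: Python's built-in round() happens to use the same round-half-to-even rule, which gives a quick way to see how ties are broken.)

# Values exactly halfway between two integers round to the even neighbour.
for x in (0.5, 1.5, 2.5, 3.5):
    print(x, "->", round(x))
# 0.5 -> 0, 1.5 -> 2, 2.5 -> 2, 3.5 -> 4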
In other words, if q̄ = m ⊘ n (that is, m/n rounded to p digits), proving the theorem requires showing that
|n' * q̄ - m'| ≤ 1/4.
Update 1
q' is a rational number with q' < 1, so we can write q' = 0.q1 q2 ... in binary, where each qi (i = 1, 2, 3, ...) is a binary digit, 0 or 1. Let q̂ = 0.q1 q2 ... qp 1. Be careful: this symbol is "q-hat", not "q-bar", and it's a new variable I've just introduced.
Now, if we shift q̂ left by p + 1 digits, we get an integer, namely q1 q2 ... qp 1, because all of its bits end up to the left of the binary point. I'll use N to denote this integer hereafter. As a result, |q̂ - q'| = |N / 2^(p + 1) - m' / n'|.
The low-order bit of N is 1, and N = 1 + (qp * 2^1 + q(p-1) * 2^2 + ... + q1 * 2^p), so N is an odd integer. Originally, n = 2^i + 2^j. Since scaling a number just multiplies or divides it by powers of 2, n' is still a sum of two powers of 2; write n' = 2^i' + 2^j'. For convenience, assume i' ≥ j'.
2^(p - 1) ≤ n' < 2^p, so 2^(p - 1) ≤ 2^i' + 2^j' < 2^p. As a result, i', which contributes the larger share of n', must be p - 1; and to keep n' below 2^p, j' must not equal i'. For readability, let k = j', so k ≤ p - 2. Thus,
|q̂ - q'| = |N / 2^(p + 1) - m' / n'| = |(2^(p - 1 - k) + 1) * N - 2^(p + 1 - k) * m'| / [n' * 2^(p + 1 - k)]. (This is the displayed formula in Goldberg's proof, with n replaced by n' and m by m'.)
I suggest verifying this formula yourself on scratch paper.
Update 2
Take a look at the numerator |(2^(p - 1 - k) + 1) * N - 2^(p + 1 - k) * m'|. As we have shown, k ≤ p - 2, so p - 1 - k ≥ 1, which makes both 2^(p - 1 - k) and 2^(p + 1 - k) even. Both (2^(p - 1 - k) + 1) and N are odd, so their product (2^(p - 1 - k) + 1) * N is odd, while 2^(p + 1 - k) * m' is even, and an odd integer can't equal an even one. Therefore, (2^(p - 1 - k) + 1) * N - 2^(p + 1 - k) * m' is a non-zero integer, and its absolute value, namely the numerator, is at least 1. Hence,
|q̂ - q'| ≥ 1 / (n' * 2^(p + 1 - k)).
Here q̄ is taken to be 0.q1 q2 ... qp, that is, q' truncated to p bits (see the comment in the derivation below), so q̄ ≤ q'. Multiplying both sides by n' gives n' * q̄ ≤ n' * q' = m'. Consequently,
|n' * q̄ - m'|
= m' - n' * q̄
= n' * (q' - q̄)
/* q̄ has only p digits of precision; here it is assumed to be 0.q1 q2 ... qp, i.e. q' truncated, so q̄ = q̂ - 2^(- p - 1). */
= n' * {q' - [q̂ - 2^(- p - 1)]}
= n' * [q' - q̂ + 2^(- p - 1)]
/* Assume that q' < q̂. The case q' > q̂ is not discussed here. */
= n' * [- |q' - q̂| + 2^(- p - 1)]
/* Since |q̂ - q'| ≥ 1 / (n' * 2^(p + 1 - k)), - |q̂ - q'| ≤ - 1 / (n' * 2^(p + 1 - k)). So */
≤ n' * {- 1 / [n' * 2^(p + 1 - k)] + 2^(- p - 1)}
= n' * {2^(- p - 1) - 1 / [n' * 2^(p + 1 - k)]}
/* We know n' = 2^i' + 2^j' = 2^(p - 1) + 2^k */
= [2^(p - 1) + 2^k] * {2^(- p - 1) - 1 / {[2^(p - 1) + 2^k] * 2^(p + 1 - k)}}
/* The equation is becoming less and less readable. For brevity, routine algebraic steps are omitted. */
= 2^(- 2) + 2^(k - p - 1) - 2^(k - p - 1)   /* since 1 / 2^(p + 1 - k) = 2^(k - p - 1), the last two terms cancel */
= 1/4
By now, |n' * q̄ - m'| ≤ 1/4 has been established. As mentioned above, this proves the theorem.
Q.E.D.
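For anyone who wants an independent check of the final inequality, here is a small exact-arithmetic experiment (my own sketch, not part of the proof) using Python's fractions module at a toy precision p = 6. It scales m and n as in the proof, rounds q' to p bits with ties to even, and confirms |n' * q̄ - m'| ≤ 1/4:

from fractions import Fraction

def scale_to(x, lo, hi):
    # multiply or divide the positive Fraction x by 2 until lo <= x < hi
    while x >= hi:
        x /= 2
    while x < lo:
        x *= 2
    return x

p = 6  # a toy precision; the argument is the same for any p

for i in range(p):
    for j in range(i + 1):
        n = 2**i + 2**j                      # the special form from the theorem
        for m in range(1, 2**(p - 1)):       # |m| < 2^(p - 1)
            n_s = scale_to(Fraction(n), Fraction(2**(p - 1)), Fraction(2**p))
            q_s = scale_to(Fraction(m, n), Fraction(1, 2), Fraction(1))
            m_s = q_s * n_s                  # the scaled m, i.e. m'
            # q-bar: q' correctly rounded to p bits, ties to even
            q_bar = Fraction(round(q_s * 2**p), 2**p)
            assert abs(n_s * q_bar - m_s) <= Fraction(1, 4), (m, n)
print("inequality |n' * q_bar - m'| <= 1/4 holds in all tested cases")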
Questions that still remain:
Why "the case q > q̂ is similar" ? I think things will be totally different without that minus!
Related
I need to prove that 2n^2 - 2n - 7 = O(n^2). When n is 1 or 2, I get a negative value of f(n), and I am not sure whether the way I prove Big-O is correct. Your help and advice are highly appreciated.
f(n) = 2n^2 - 2n -7 = O(n^2) if c=2;
n=1->2(1)^2 - 2(1) -7 = -7 <= 2*(1)^2
n=2->2(2)^2 - 2(2) -7 = -3 <= 2*(2)^2
n=3->2(3)^2 - 2(3) -7 = 5 <= 2*(3)^2
The definition of Big-Oh has more to do with asymptotic behavior than local behavior. If your function went negative for increasing values of n, say it oscillated, there might be more of a concern. For this function, though, there is no problem: you are free to consider the function for all values greater than some n0 which you alone are allowed to choose. So, if the function going negative early on bothers you, write your proof such that those numbers aren't used. For example:
Base case: for n = 3, f(n) = 2*3*3 - 2*3 - 7 = 18 - 6 - 7 = 5 <= 9 * c = c * 3 * 3 = c * n^2. This is true provided that c >= 5/9.
Induction hypothesis: assume f(n) <= c * n^2 for all n starting at 3 up through k.
Induction step: we must show that f(k+1) <= c * (k+1)^2. We have f(k+1) = 2(k+1)^2 - 2(k+1) - 7 = 2k^2+4k+2 - 2k - 2 - 7 = 2k^2 + 2k - 7 < 2k^2 + 4k < 2k^2 + 4k + 2 = 2(k^2 + 2k + 1) = 2(k+1)^2, so the choice c = 2 works here.
In hindsight, it should be obvious that 2n^2 - 2n - 7 is always less than 2n^2 for positive n.
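If it helps to see this numerically, here is a quick check (my own sketch in Python, not part of the original question) that the c = 2 bound holds for every n ≥ 1 and that f(n) only dips below zero at n = 1 and n = 2:

def f(n):
    return 2 * n**2 - 2 * n - 7

for n in range(1, 1000):
    assert f(n) <= 2 * n**2            # the c = 2 bound holds everywhere
    assert (f(n) < 0) == (n <= 2)      # negative values occur only at n = 1, 2
print("checked n = 1 .. 999")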
I am wondering why this Fibonacci recursive function works:
int fibRec(int n)
{
    /* base cases: fib(0) = 0 and fib(1) = 1, so just return n */
    if ((n == 1) || (n == 0))
    {
        return n;
    }
    /* every other value is the sum of the two previous Fibonacci numbers */
    int i = fibRec(n - 1) + fibRec(n - 2);
    return i;
}
I understand what the Fibonacci sequence is and I understand what a recursive function does and how this function is working. I'm just having trouble understanding why it works. I know that when you break it down, you are essentially adding a bunch of 0s and 1s, as this image depicts.
[image: tree of recursive fibRec calls breaking down into 1s and 0s]
But why is it that when I pass a 5 to the function and all the 0 and 1s are added that it will equal the 5th sequence number in the Fibonacci sequence? I've seen this question asked before but never really explained. The responses are all just "because recursion". Yes, I know what a recursive function is and how this one is working. But WHY does this recursive function give you the correct Fibonacci sequence number?
In the Fibonacci sequence the first two numbers are zero and one. Every number after these is the sum of the previous 2 numbers. So the first few numbers are
F(0) ≡ 0
F(1) ≡ 1
F(2) = F(1) + F(0) = 1 + 0 = 1
F(3) = F(2) + F(1) = 1 + 1 = 2
F(4) = F(3) + F(2) = 2 + 1 = 3
F(5) = F(4) + F(3) = 3 + 2 = 5
F(6) = F(5) + F(4) = 5 + 3 = 8
...
F(n) = F(n - 1) + F(n - 2) ∀ n > 1
Therefore, when we calculate a Fibonacci number recursively, we follow this logical procedure (in pseudo-code, out of respect for Stack Overflow).
Integer NthFibonacci(Integer n) {
    if (n < 0) {
        return undefined;
    } else if (n < 2) {
        return n;
    } else {
        return NthFibonacci(n - 1) + NthFibonacci(n - 2);
    }
}
I'm sure you know all this but I think it will help my explanation to have this part as a reference.
Where the Ones and Zeros Come In
The best way to explain this is probably with an example.
Imagine that, as above, we are trying to recursively calculate F(6). Try following the procedure given above. Remember that we will perform recursion only if n > 1.
First we start with F(6) = F(5) + F(4).
Then we find F(5) = F(4) + F(3).
Then we find F(4) = F(3) + F(2).
Then we find F(3) = F(2) + F(1).
Then we find F(2) = F(1) + F(0).
This is where things start to work out!
We have now gotten F(2) in terms of F(1) ≡ 1 and F(0) ≡ 0 (both of which are known), and so we are able to calculate an actual value instead of performing more recursion.
We can now find F(2) = F(1) + F(0) = 1 + 0 = 1.
NOTICE THE 1 AND 0. Those are what people are talking about when they say the whole thing comes down to ones and zeros. Every time we recurse down to a base value we end up finding F(2) = 1 + 0. This leads to more ones and zeros as we move back up the recursion tree, able to calculate higher and higher values, as follows.
F(3) = F(2) + F(1) = (1 + 0) + 1
F(4) = F(3) + F(2) = ((1 + 0) + 1) + (1 + 0)
F(5) = F(4) + F(3) = (((1 + 0) + 1) + (1 + 0)) + ((1 + 0) + 1)
F(6) = F(5) + F(4) = ((((1 + 0) + 1) + (1 + 0)) + ((1 + 0) + 1)) + (((1 + 0) + 1) + (1 + 0))
Now if you add up all the 1's you get a sum of 8, and so F(6) = 8, which is correct!
This is how it works, and this is how it breaks down to ones and zeros.
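To make the "count the ones" idea concrete, here is a small Python sketch (my own illustration, not code from the question) that tallies how often each base case fires; the number of times the recursion returns 1 is exactly F(n):

def fib_counting(n, counts):
    # recursive Fibonacci that also tallies every base-case return
    if n == 0 or n == 1:
        counts[n] += 1
        return n
    return fib_counting(n - 1, counts) + fib_counting(n - 2, counts)

counts = {0: 0, 1: 0}
result = fib_counting(6, counts)
print(result, counts)          # prints: 8 {0: 5, 1: 8}
assert result == counts[1]     # the 1s alone add up to F(6)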
Remember, recursion works by breaking down the problem till we know what the answer is, and then building it up from there.
What do we know about the Fibonacci sequence?
We know that when:
x = 1
and
x = 0
That is the lowest the recursion goes, and that is the important key: when x = 0 the call simply contributes a 0 to the sum, and when x = 1 it contributes a 1. Now start at the top.
0,1,1,2,3,5,8,13...
If we are at 13, what is 13? Why, simply 5 + 8, right? So that is where
int i = fibRec(n - 1) + fibRec(n - 2);
comes from. These calls keep branching out lower and lower until each one reaches a base case.
This is the recursive calling: the method goes back onto the stack and calls fibRec again. You will notice that fibRec(n - 1) and fibRec(n - 2) are added together and assigned to i, so that we don't lose the value. Because of the + sign, the stack keeps returning more and more (n - 1)s and (n - 2)s until we hit the base cases. I hope all of this makes sense; thinking recursively can be very difficult. Here is a visual representation, from top to bottom, of what it would look like.
In short, it just keeps breaking the current Fibonacci number down into the previous two until it reaches the base cases, then adds them back up into the value you asked for.
As we all know, negative numbers in memory are usually represented as two's complement numbers, like this:
from x to ~x + 1
and to get back we don't do the obvious thing like
~([~x + 1] - 1)
but instead we do
~[~x + 1] + 1
Can someone explain why this always works? I think I can prove it for 1-bit, 2-bit, and 3-bit numbers and then use mathematical induction, but that doesn't help me understand how exactly it works.
Thanks!
That's the same thing anyway. That is, ~x + 1 == ~(x - 1). But let's put that aside for now.
f(x) = ~x + 1 is its own inverse. Proof:
~(~x + 1) + 1 =
(definition of subtraction: a - b = ~(~a + b))
x - 1 + 1 =
(you know this step)
x
Also, ~x + 1 == ~(x - 1). Why? Well,
~(x - 1) =
(definition of subtraction: a - b = ~(~a + b))
~(~(~x + 1)) =
(remove double negation)
~x + 1
And that (slightly unusual) definition of subtraction, a - b = ~(~a + b)?
~(~a + b) =
(use definition of two's complement, ~x = -x - 1)
-(~a + b) - 1 =
(move the 1)
-(~a + b + 1) =
(use definition of two's complement, ~x = -x - 1)
-(-a + b) =
(you know this step)
a - b
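To see these identities hold on real fixed-width integers, here is a quick exhaustive check (my own sketch in Python, masking to 16 bits to mimic an unsigned short):

MASK = 0xFFFF  # simulate 16-bit two's-complement wrap-around

def bnot(x):
    return ~x & MASK                        # bitwise NOT within 16 bits

for x in range(1 << 16):
    neg = (bnot(x) + 1) & MASK              # two's-complement negation: ~x + 1
    assert (bnot(neg) + 1) & MASK == x      # ~(~x + 1) + 1 == x
    assert neg == bnot((x - 1) & MASK)      # ~x + 1 == ~(x - 1)
print("identities hold for all 16-bit values")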
This is because if you increment ~x (assuming no overflow) and then convert back to x, you've incremented relative to ~x but decremented relative to x. The same applies vice versa: assuming your variable x has a specific value, every time you increment it, you'll notice that it decrements relative to ~x.
From a programmer's point of view, this is what you'd essentially witness.
Let short int x = 1 (0x0001)
then ~x = 65534 (0xFFFE)
~x + 1 = 65534 + 1 (0xFFFF)
~(~x+1) = 0 (0x0000)
~(~x+1) + 1 = 0 + 1 (0x0001)
As a sub-routine for a script I'm writing, I need a numerical function that behaves in a specific pattern. It takes a single input and provides a single output, such that between 0 and L inclusive, it is the identity function, but then between L+1 and L*2+1, it returns L to 0 respectively, and then from L*2+2 to L*3+2 it returns 0 to L respectively, and so on. I want to do this without any if statements, only using absolute value and modulus. Can anyone give me pseudocode for this function?
Given input I, limit L and result R, and using only basic arithmetic and absolute values, this gives the desired result.
R = ABS(L * ((I - (I % (L + 1))) / (L + 1) % 2) + (L + 1) * (I - (I % (L + 1))) / (L + 1) - I)
This can obviously be hugely simplified by declaring some intermediate variables and using additional methods, e.g. floor to simulate integer division. Here's a JavaScript example:
var factor = Math.floor(input / (limit + 1));
var flag = factor % 2;
var result = Math.abs(limit * flag + (limit + 1) * factor - input)
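To see the wave this produces, here is a small Python port of the same formula (my own sketch); for limit = 3 the output runs 0 1 2 3, back down 3 2 1 0, and so on:

def triangle(i, limit):
    # same formula as the JavaScript above, using only integer division,
    # modulus and absolute value
    factor = i // (limit + 1)
    flag = factor % 2
    return abs(limit * flag + (limit + 1) * factor - i)

print([triangle(i, 3) for i in range(16)])
# [0, 1, 2, 3, 3, 2, 1, 0, 0, 1, 2, 3, 3, 2, 1, 0]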
I'm trying to get x and y coordinates for points along a line (segment) at even intervals. In my test case, it's every 16 pixels, but the idea is to do it programmatically in ActionScript-3.
I know how to get the slope between two points, the y-intercept of a line, and that a^2 + b^2 = c^2; I just can't recall / figure out how to use the slope or angle to get a and b (x and y) given c.
Does anyone know a mathematical formula to figure out a and b given c, y-intercept and slope (or angle)? (AS3 is also fine.)
You have a triangle:
|\ a^2 + b^2 = c^2 = 16^2 = 256
| \
| \ c a = sqrt(256 - b^2)
a | \ b = sqrt(256 - a^2)
| \
|__________\
b
You also know (m is slope):
a/b = m
a = m*b
From your original triangle:
m*b = a = sqrt(256 - b^2)
m^2 * b^2 = 256 - b^2
Collecting the b^2 terms, you can say:
(m^2 + 1) * b^2 = 256
Therefore:
b = 16 / sqrt(m^2 + 1)
I'm lazy so you can find a yourself: a = m * b = 16 * m / sqrt(m^2 + 1) (or equivalently a = sqrt(256 - b^2))
Let s be the slope.
we have: 1) s^2 = a^2/b^2 ==> a^2 = s^2 * b^2
and: 2) a^2 + b^2 = c^2 = 16*16
substitute a^2 in 2) with 1):
b = 16/sqrt(s^2+1)
and
a = sqrt((s^2 * 256)/(s^2 + 1)) = 16*abs(s)/sqrt(s^2+1)
In the above, I assume you want the lengths of a and b. In reality, your s is a signed value, so a could be negative. Therefore, the incremental value of a will really be:
a = 16s/sqrt(s^2+1)
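Tying it together, here is a short Python sketch of the stepping itself (my own illustration; the question is about ActionScript 3, but the arithmetic is identical). It normalizes the direction vector to a length of 16 pixels, which is equivalent to the a and b worked out above and also copes with vertical segments:

import math

def points_along(x1, y1, x2, y2, spacing=16.0):
    # return points every `spacing` pixels from (x1, y1) toward (x2, y2)
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    step_x = spacing * dx / length          # the b from the answers above
    step_y = spacing * dy / length          # the a from the answers above
    count = int(length // spacing)
    return [(x1 + k * step_x, y1 + k * step_y) for k in range(count + 1)]

print([(round(x, 2), round(y, 2)) for x, y in points_along(0, 0, 64, 48)])
# [(0.0, 0.0), (12.8, 9.6), (25.6, 19.2), (38.4, 28.8), (51.2, 38.4), (64.0, 48.0)]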
The Point class built in to Flash has a wonderful set of methods for doing exactly what you want. Define the line using two points and you can use the "interpolate" method to get points further down the line automatically, without any of the trigonometry.
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/geom/Point.html#interpolate()
The slope is dy/dx, or in your terms A/B.
Therefore you can step along the line by adding A to the y coordinate and B to the x coordinate. You can scale A and B to make the steps bigger or smaller.
To calculate the slope and get A and B:
Take two points on the line (X1,Y1) , (X2,Y2)
A= (Y2-Y1)
B= (X2-X1)
If you calculate this with the two points you want to iterate between, simply divide A and B by the number of steps you want to take:
STEPS=10
yStep= A/STEPS
xStep= B/STEPS
for (i=0;i<STEPS;i++)
{
xCur=x1+xStep*i;
yCur=y1+yStep*i;
}
Given the equation for a line as y=slope*x+intercept, you can simply plug in the x-values and read back the y's.
Your problem is computing the step size along the x-axis (how big a change in x results from a 16-pixel move along the line, which is b in your included plot). Given that you know a^2 + b^2 = 16^2 (by definition) and slope = a/b, you can compute this:
slope = a/b => a = b * slope [multiply both sides by b]
a^2 + b^2 = 16 => (b * slope)^2 + b^2 = 16 [by substitution from the previous step]
I'll leave it to you to solve for b. After you have b you can compute (x,y) values by:
for x = 0; x += b
y = slope * x + intercept
echo (x,y)
loop