Understanding how recursive functions work - function

As the title explains I have a very fundamental programming question which I have just not been able to grok yet. Filtering out all of the (extremely clever) "In order to understand recursion, you must first understand recursion." replies from various online threads I still am not quite getting it.
Understanding that when faced with not knowing what we don't know, we can tend to ask the wrong questions or ask the right questions incorrectly I will share what I "think" my question is in hopes that someone with a similar outlook can share some bit of knowledge that will help turn on the recursive light bulb for me!
Here is the function (the syntax is written in Swift):
func sumInts(a: Int, b: Int) -> Int {
    if (a > b) {
        return 0
    } else {
        return a + sumInts(a: a + 1, b: b)
    }
}
We'll use 2 and 5 as our arguments:
print(sumInts(a: 2, b: 5))
Obviously the answer is 14. But I'm not clear on how that value is achieved.
These are my 2 hangups:
The function is called recursively until a condition is met. That condition is a > b. When this condition is met, return 0. At first glance, I would expect the return value to be 0 which is obviously incorrect.
Printing out the value of 'a' on each iteration yields a value which I would expect: 2, 3, 4, 5 (at which point 5+1 > b which meets the first condition: a > b) but I still don't see how the value of 14 is achieved.
My first thought is that something similar to the following is happening magically:
var answer = a;
answer += a+1 until a > b;
return answer;
So ruling out magic, I'm just not getting something. I would love to understand what's happening more than just implicitly.
If someone could kindly explain what technically happens during this kind of function and why the result isn't 0 and how, eventually, a + sumInts(a: a + 1, b: b) = 14, I would be forever in your debt.

1. The function is called recursively until a condition is met. That condition is a > b. When this condition is met, return 0. At first glance, I would expect the return value to be 0 which is obviously incorrect.
Here is what the computer computing sumInts(2,5) would think if it were able to:
I want to compute sumInts(2, 5)
for this, I need to compute sumInts(3, 5)
and add 2 to the result.
I want to compute sumInts(3, 5)
for this, I need to compute sumInts(4, 5)
and add 3 to the result.
I want to compute sumInts(4, 5)
for this, I need to compute sumInts(5, 5)
and add 4 to the result.
I want to compute sumInts(5, 5)
for this, I need to compute sumInts(6, 5)
and add 5 to the result.
I want to compute sumInts(6, 5)
since 6 > 5, this is zero.
The computation yielded 0, therefore I shall return 5 = 5 + 0.
The computation yielded 5, therefore I shall return 9 = 4 + 5.
The computation yielded 9, therefore I shall return 12 = 3 + 9.
The computation yielded 12, therefore I shall return 14 = 2 + 12.
As you can see, one call to the function sumInts does return 0. However, this is not the final value, because the computer still has to add 5 to that 0, then 4 to the result, then 3, then 2, as described by the last four sentences of our computer's thoughts. Note that in a recursion the computer not only has to compute the recursive call, it also has to remember what to do with the value returned by the recursive call. There is a special area of the computer's memory called the stack where this kind of information is saved. This space is limited, and functions that recurse too deeply can exhaust the stack: this is the stack overflow that gives its name to our most loved website.
Your statement seems to make the implicit assumption that the computer forgets what it was doing when it makes a recursive call, but it does not. This is why your conclusion does not match your observation.
2. Printing out the value of 'a' on each iteration yields a value which I would expect: 2, 3, 4, 5 (at which point 5+1 > b which meets the first condition: a > b) but I still don't see how the value of 14 is achieved.
This is because the return value is not a itself but the sum of the value of a and the value returned by the recursive call.

I think the confusion is stemming from thinking of it as "the same function" being called many times. If you think of it as "many copies of the same function being called", then it may be clearer:
Only one copy of the function ever returns 0, and it's not the first one (it's the last one). So the result of calling the first one is not 0.
For the second bit of confusion, I think it will be easier to spell out the recursion in English. Read this line:
return a + sumInts(a: a + 1, b: b)
as "return the value of 'a' plus (the return value of another copy of the function, which is the copy's value of 'a' plus (the return value of another copy of the function, which is the second copy's value of 'a' plus (...", with each copy of the function spawning a new copy of itself with a increased by 1, until the a > b condition is met.
By the time you reach the a > b condition being true, you have a (potentially arbitrarily) long stack of copies of the function all in the middle of being run, all waiting on the result of the next copy to find out what they should add to 'a'.
(edit: also, something to be aware of is that the stack of copies of the function I mention is a real thing that takes up real memory, and will crash your program if it gets too large. The compiler can optimize it out in some cases, but exhausting stack space is a significant and unfortunate limitation of recursive functions in many languages)
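As a rough illustration of the "compiler can optimize it out in some cases" remark: one shape that is friendlier to such optimization carries the running total along as an extra parameter, so nothing is left to do after the recursive call returns. The name sumIntsAcc and the acc parameter are made up for this sketch, and Swift does not guarantee that the tail call is actually optimized away, so treat this as a sketch of the idea and not as a fix for deep recursion.

```swift
// Hypothetical accumulator variant of sumInts.
// The "+ a" is folded into `acc` before the recursive call, so the
// recursive call is the very last thing the function does (a tail call).
func sumIntsAcc(a: Int, b: Int, acc: Int = 0) -> Int {
    if a > b {
        // Base case: the accumulated total is already the answer.
        return acc
    }
    return sumIntsAcc(a: a + 1, b: b, acc: acc + a)
}

print(sumIntsAcc(a: 2, b: 5))  // 14
```

Here no copy has to remember a pending addition; each call just passes the updated total forward.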

To understand recursion you must think of the problem in a different way. Instead of a long logical sequence of steps that makes sense as a whole, you take a large problem, break it up into smaller problems, and solve those. Once you have answers for the sub-problems, you combine them to make the solution to the bigger problem. Think of you and your friends needing to count the number of marbles in a huge bucket. You each take a smaller bucket and count those individually, and when you are done you add the totals together. Now if each of you finds some friends and splits the buckets further, then you just need to wait for these other friends to figure out their totals and bring them back to each of you, and you add them up. And so on. The special case is when you only get 1 marble to count: then you just hand it back and say 1. Let the people above you do the adding; you are done.
You must remember that every time the function calls itself recursively, it creates a new context with a subset of the problem. Once that part is resolved, its result is returned so that the previous call can complete.
Let me show you the steps:
sumInts(a: 2, b: 5) will return: 2 + sumInts(a: 3, b: 5)
sumInts(a: 3, b: 5) will return: 3 + sumInts(a: 4, b: 5)
sumInts(a: 4, b: 5) will return: 4 + sumInts(a: 5, b: 5)
sumInts(a: 5, b: 5) will return: 5 + sumInts(a: 6, b: 5)
sumInts(a: 6, b: 5) will return: 0
once sumInts(a: 6, b: 5) has executed, the results can be computed so going back up the chain with the results you get:
sumInts(a: 6, b: 5) = 0
sumInts(a: 5, b: 5) = 5 + 0 = 5
sumInts(a: 4, b: 5) = 4 + 5 = 9
sumInts(a: 3, b: 5) = 3 + 9 = 12
sumInts(a: 2, b: 5) = 2 + 12 = 14.
Another way to represent the structure of the recursion:
sumInts(a: 2, b: 5) = 2 + sumInts(a: 3, b: 5)
sumInts(a: 2, b: 5) = 2 + 3 + sumInts(a: 4, b: 5)
sumInts(a: 2, b: 5) = 2 + 3 + 4 + sumInts(a: 5, b: 5)
sumInts(a: 2, b: 5) = 2 + 3 + 4 + 5 + sumInts(a: 6, b: 5)
sumInts(a: 2, b: 5) = 2 + 3 + 4 + 5 + 0
sumInts(a: 2, b: 5) = 14
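The marble-counting picture from the start of this answer can also be sketched directly: split the range in half, let each half be summed on its own, and add the two totals. This is not how the original sumInts works (it peels off one number at a time); sumBySplitting and its parameter names are made up for illustration.

```swift
// Hypothetical divide-and-conquer sum of the integers lo...hi.
func sumBySplitting(lo: Int, hi: Int) -> Int {
    if lo > hi {
        return 0      // empty bucket: nothing to count
    }
    if lo == hi {
        return lo     // a single item: report it directly
    }
    // Hand each half to a "friend" (a recursive call), then add the totals.
    let mid = lo + (hi - lo) / 2
    let left = sumBySplitting(lo: lo, hi: mid)
    let right = sumBySplitting(lo: mid + 1, hi: hi)
    return left + right
}

print(sumBySplitting(lo: 2, hi: 5))  // 14
```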

Recursion is a tricky topic to understand and I don't think I can fully do it justice here. Instead, I'll try to focus on the particular piece of code you have here and try to describe both the intuition for why the solution works and the mechanics of how the code computes its result.
The code you've given here solves the following problem: you want to know the sum of all the integers from a to b, inclusive. For your example, you want the sum of the numbers from 2 to 5, inclusive, which is
2 + 3 + 4 + 5
When trying to solve a problem recursively, one of the first steps should be to figure out how to break the problem down into a smaller problem with the same structure. So suppose that you wanted to sum up the numbers from 2 to 5, inclusive. One way to simplify this is to notice that the above sum can be rewritten as
2 + (3 + 4 + 5)
Here, (3 + 4 + 5) happens to be the sum of all the integers between 3 and 5, inclusive. In other words, if you want to know the sum of all the integers between 2 and 5, start by computing the sum of all the integers between 3 and 5, then add 2.
So how do you compute the sum of all the integers between 3 and 5, inclusive? Well, that sum is
3 + 4 + 5
which can be thought of instead as
3 + (4 + 5)
Here, (4 + 5) is the sum of all the integers between 4 and 5, inclusive. So, if you wanted to compute the sum of all the numbers between 3 and 5, inclusive, you'd compute the sum of all the integers between 4 and 5, then add 3.
There's a pattern here! If you want to compute the sum of the integers between a and b, inclusive, you can do the following. First, compute the sum of the integers between a + 1 and b, inclusive. Next, add a to that total. You'll notice that "compute the sum of the integers between a + 1 and b, inclusive" happens to be pretty much the same sort of problem we're already trying to solve, but with slightly different parameters. Rather than computing from a to b, inclusive, we're computing from a + 1 to b, inclusive. That's the recursive step - to solve the bigger problem ("sum from a to b, inclusive"), we reduce the problem to a smaller version of itself ("sum from a + 1 to b, inclusive.").
If you take a look at the code you have above, you'll notice that there's this step in it:
return a + sumInts(a: a + 1, b: b)
This code is simply a translation of the above logic - if you want to sum from a to b, inclusive, start by summing a + 1 to b, inclusive (that's the recursive call to sumInts), then add a.
Of course, by itself this approach won't actually work. For example, how would you compute the sum of all the integers between 5 and 5 inclusive? Well, using our current logic, you'd compute the sum of all the integers between 6 and 5, inclusive, then add 5. So how do you compute the sum of all the integers between 6 and 5, inclusive? Well, using our current logic, you'd compute the sum of all the integers between 7 and 5, inclusive, then add 6. You'll notice a problem here - this just keeps on going and going!
In recursive problem solving, there needs to be some way to stop simplifying the problem and instead just go solve it directly. Typically, you'd find a simple case where the answer can be determined immediately, then structure your solution to solve simple cases directly when they arise. This is typically called a base case or a recursive basis.
So what's the base case in this particular problem? When you're summing up integers from a to b, inclusive, if a happens to be bigger than b, then the answer is 0 - there aren't any numbers in the range! Therefore, we'll structure our solution as follows:
If a > b, then the answer is 0.
Otherwise (a ≤ b), get the answer as follows:
Compute the sum of the integers between a + 1 and b.
Add a to get the answer.
Now, compare this pseudocode to your actual code:
func sumInts(a: Int, b: Int) -> Int {
    if (a > b) {
        return 0
    } else {
        return a + sumInts(a: a + 1, b: b)
    }
}
Notice that there's almost exactly a one-to-one map between the solution outlined in pseudocode and this actual code. The first step is the base case - in the event that you ask for the sum of an empty range of numbers, you get 0. Otherwise, compute the sum between a + 1 and b, then go add a.
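One way to convince yourself that the base case and the recursive step fit together is to compare the recursive function against a plain loop on a few ranges, including an empty one. The sketch below does that; sumWithLoop is a made-up reference helper, and the chosen test ranges are arbitrary.

```swift
func sumInts(a: Int, b: Int) -> Int {
    if a > b {
        return 0
    } else {
        return a + sumInts(a: a + 1, b: b)
    }
}

// Hypothetical reference implementation, used only for comparison.
func sumWithLoop(a: Int, b: Int) -> Int {
    var total = 0
    // stride produces no values when a > b, matching the empty-range base case.
    for value in stride(from: a, through: b, by: 1) {
        total += value
    }
    return total
}

// Check a normal range, a one-element range, an empty range, and negatives.
for (a, b) in [(2, 5), (5, 5), (6, 5), (-3, 3)] {
    assert(sumInts(a: a, b: b) == sumWithLoop(a: a, b: b))
}
print(sumInts(a: 2, b: 5))  // 14
```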
So far, I've given just a high-level idea behind the code. But you had two other, very good questions. First, why doesn't this always return 0, given that the function says to return 0 if a > b? Second, where does the 14 actually come from? Let's look at these in turn.
Let's try a very, very simple case. What happens if you call sumInts(6, 5)? In this case, tracing through the code, you see that the function just returns 0. That's the right thing to do, too - there aren't any numbers in the range. Now, try something harder. What happens when you call sumInts(5, 5)? Well, here's what happens:
You call sumInts(5, 5). We fall into the else branch, which returns the value of a + sumInts(6, 5).
In order for sumInts(5, 5) to determine what sumInts(6, 5) is, we need to pause what we're doing and make a call to sumInts(6, 5).
sumInts(6, 5) gets called. It enters the if branch and returns 0. However, this instance of sumInts was called by sumInts(5, 5), so the return value is communicated back to sumInts(5, 5), not to the top-level caller.
sumInts(5, 5) now can compute 5 + sumInts(6, 5) to get back 5. It then returns it to the top-level caller.
Notice how the value 5 was formed here. We started off with one active call to sumInts. That fired off another recursive call, and the value returned by that call communicated the information back to sumInts(5, 5). The call to sumInts(5, 5) then in turn did some computation and returned a value back to the caller.
If you try this with sumInts(4, 5), here's what will happen:
sumInts(4, 5) tries to return 4 + sumInts(5, 5). To do that, it calls sumInts(5, 5).
sumInts(5, 5) tries to return 5 + sumInts(6, 5). To do that, it calls sumInts(6, 5).
sumInts(6, 5) returns 0 back to sumInts(5, 5).
sumInts(5, 5) now has a value for sumInts(6, 5), namely 0. It then returns 5 + 0 = 5.
sumInts(4, 5) now has a value for sumInts(5, 5), namely 5. It then returns 4 + 5 = 9.
In other words, the value that's returned is formed by summing up values one at a time, each time taking one value returned by a particular recursive call to sumInts and adding on the current value of a. When the recursion bottoms out, the deepest call returns 0. However, that value doesn't immediately exit the recursive call chain; instead, it just hands the value back to the recursive call one layer above it. In that way, each recursive call just adds in one more number and returns it higher up in the chain, culminating with the overall summation. As an exercise, try tracing this out for sumInts(2, 5), which is what you wanted to begin with.
Hope this helps!

You've got some good answers here so far, but I'll add one more that takes a different tack.
First off, I have written many articles on simple recursive algorithms that you might find interesting; see
http://ericlippert.com/tag/recursion/
http://blogs.msdn.com/b/ericlippert/archive/tags/recursion/
Those are in newest-on-top order, so start from the bottom.
Second, so far all of the answers have described recursive semantics by considering function activation. That is, each call makes a new activation, and the recursive call executes in the context of this activation. That is a good way to think of it, but there is another, equivalent way: smart text search-and-replace.
Let me rewrite your function into a slightly more compact form; don't think of this as being in any particular language.
s = (a, b) => a > b ? 0 : a + s(a + 1, b)
I hope that makes sense. If you're not familiar with the conditional operator, it is of the form condition ? consequence : alternative and its meaning will become clear.
Now we wish to evaluate s(2, 5). We do so by textually replacing the call with the function body, then replacing a with 2 and b with 5:
s(2, 5)
---> 2 > 5 ? 0 : 2 + s(2 + 1, 5)
Now evaluate the conditional. We textually replace 2 > 5 with false.
---> false ? 0 : 2 + s(2 + 1, 5)
Now textually replace all false conditionals with the alternative and all true conditionals with the consequence. We have only false conditionals, so we textually replace that expression with the alternative:
---> 2 + s(2 + 1, 5)
Now, to save me having to type all those + signs, textually replace constant arithmetic with its value. (This is a bit of a cheat, but I don't want to have to keep track of all the parentheses!)
---> 2 + s(3, 5)
Now search-and-replace, this time with the body for the call, 3 for a and 5 for b. We'll put the replacement for the call in parentheses:
---> 2 + (3 > 5 ? 0 : 3 + s(3 + 1, 5))
And now we just keep on doing those same textual substitution steps:
---> 2 + (false ? 0 : 3 + s(3 + 1, 5))
---> 2 + (3 + s(3 + 1, 5))
---> 2 + (3 + s(4, 5))
---> 2 + (3 + (4 > 5 ? 0 : 4 + s(4 + 1, 5)))
---> 2 + (3 + (false ? 0 : 4 + s(4 + 1, 5)))
---> 2 + (3 + (4 + s(4 + 1, 5)))
---> 2 + (3 + (4 + s(5, 5)))
---> 2 + (3 + (4 + (5 > 5 ? 0 : 5 + s(5 + 1, 5))))
---> 2 + (3 + (4 + (false ? 0 : 5 + s(5 + 1, 5))))
---> 2 + (3 + (4 + (5 + s(5 + 1, 5))))
---> 2 + (3 + (4 + (5 + s(6, 5))))
---> 2 + (3 + (4 + (5 + (6 > 5 ? 0 : 6 + s(6 + 1, 5)))))
---> 2 + (3 + (4 + (5 + (true ? 0 : 6 + s(6 + 1, 5)))))
---> 2 + (3 + (4 + (5 + 0)))
---> 2 + (3 + (4 + 5))
---> 2 + (3 + 9)
---> 2 + 12
---> 14
All we did here was just straightforward textual substitution. Really I shouldn't have substituted "3" for "2+1" and so on until I had to, but pedagogically it would have gotten hard to read.
Function activation is nothing more than replacing the function call with the body of the call, and replacing the formal parameters with their corresponding arguments. You have to be careful about introducing parentheses intelligently, but aside from that, it's just text replacement.
Of course, most languages do not actually implement activation as text replacement, but logically that's what it is.
So what then is an unbounded recursion? A recursion where the textual substitution doesn't stop! Notice how eventually we got to a step where there was no more s to replace, and we could then just apply the rules for arithmetic.
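To make the substitution view concrete, here is a small Swift sketch that builds the fully expanded expression as text instead of evaluating it. The function name expand is made up for illustration; it only mirrors the substitution steps shown above and is not how any real language implements calls.

```swift
// Hypothetical helper: returns the expression that s(a, b) expands to,
// as a string, once every call has been replaced by its body.
func expand(a: Int, b: Int) -> String {
    if a > b {
        return "0"    // the conditional picks the consequence: no call left
    }
    let rest = expand(a: a + 1, b: b)
    // Parenthesize the substituted body unless it is the bare base-case value.
    let wrapped = (a + 1 > b) ? rest : "(\(rest))"
    return "\(a) + \(wrapped)"
}

print(expand(a: 2, b: 5))  // 2 + (3 + (4 + (5 + 0)))
```

The result contains no further calls, which is exactly the point at which ordinary arithmetic can take over.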

The way that I usually figure out how a recursive function works is by looking at the base case and working backwards. Here's that technique applied to this function.
First the base case:
sumInts(6, 5) = 0
Then the call just above that in the call stack:
sumInts(5, 5) == 5 + sumInts(6, 5)
sumInts(5, 5) == 5 + 0
sumInts(5, 5) == 5
Then the call just above that in the call stack:
sumInts(4, 5) == 4 + sumInts(5, 5)
sumInts(4, 5) == 4 + 5
sumInts(4, 5) == 9
And so on:
sumInts(3, 5) == 3 + sumInts(4, 5)
sumInts(3, 5) == 3 + 9
sumInts(3, 5) == 12
And so on:
sumInts(2, 5) == 2 + sumInts(3, 5)
sumInts(2, 5) == 2 + 12
sumInts(2, 5) == 14
Notice that we've arrived at our original call to the function: sumInts(2, 5) == 14
The order in which these calls are executed:
sumInts(2, 5)
sumInts(3, 5)
sumInts(4, 5)
sumInts(5, 5)
sumInts(6, 5)
The order in which these calls return:
sumInts(6, 5)
sumInts(5, 5)
sumInts(4, 5)
sumInts(3, 5)
sumInts(2, 5)
Note that we came to a conclusion about how the function operates by tracing the calls in the order that they return.
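If you want to see the two orders for yourself, one option is to record when each call starts and when it finishes. The sketch below does this with a small wrapper class; CallLog, callOrder and returnOrder are names invented for this example.

```swift
// Hypothetical wrapper that records the order in which calls start and finish.
final class CallLog {
    var callOrder: [String] = []
    var returnOrder: [String] = []

    func sumInts(a: Int, b: Int) -> Int {
        callOrder.append("sumInts(\(a), \(b))")      // recorded on the way down
        let result = a > b ? 0 : a + sumInts(a: a + 1, b: b)
        returnOrder.append("sumInts(\(a), \(b))")    // recorded on the way back up
        return result
    }
}

let recorder = CallLog()
print(recorder.sumInts(a: 2, b: 5))  // 14
print(recorder.callOrder)            // starts with sumInts(2, 5)
print(recorder.returnOrder)          // starts with sumInts(6, 5)
```

The second list is the first one reversed, which is the last-in, first-out behaviour of the call stack.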

Recursion. In Computer Science recursion is covered in depth under the topic of Finite Automata.
In its simplest form it is a self reference. For example, saying that "my car is a car" is a recursive statement. The problem is that the statement is an infinite recursion in that it will never end. The definition in the statement of a "car" is that it is a "car" so it may be substituted. However, there is no end because in the case of substitution, it still becomes "my car is a car".
This could be different if the statement were "my car is a bentley. my car is blue." In which case the substitution in the second situation for car could be "bentley" resulting in "my bentley is blue". These types of substitutions are mathematically explained in Computer Science through Context-Free Grammars.
The actual substitution is a production rule. Given that the statement is represented by S and that car is a variable which can be a "bentley" this statement can be recursively reconstructed.
S -> "my"S | " "S | CS | "is"S | "blue"S | ε
C -> "bentley"
This can be constructed in multiple ways, as each | means there is a choice. S can be replaced by any one of those choices, and S always starts empty. The ε means to terminate the production. Just as S can be replaced, so can other variables (there is only one and it is C which would represent "bentley").
So starting with S being empty, and replacing it with the first choice "my"S S becomes
"my"S
S can still be substituted as it represents a variable. We could choose "my" again, or ε to end it, but let's continue making our original statement. We choose the space, which means S is replaced with " "S
"my "S
Next let's choose C
"my "CS
And C only has one choice for replacement
"my bentley"S
And the space again for S
"my bentley "S
And so on "my bentley is"S, "my bentley is "S, "my bentley is blue"S, "my bentley is blue" (replacing S for ε ends the production) and we have recursively built our statement "my bentley is blue".
Think of recursion as these productions and replacements. Each step in the process replaces its predecessor in order to produce the end result. In the exact example of the recursive sum from 2 to 5, you end up with the production
S -> 2 + A
A -> 3 + B
B -> 4 + C
C -> 5 + D
D -> 0
This becomes
2 + A
2 + 3 + B
2 + 3 + 4 + C
2 + 3 + 4 + 5 + D
2 + 3 + 4 + 5 + 0
14

Think of recursion as multiple clones doing the same thing...
You ask to clone[1]: "sum numbers between 2 and 5"
+ clone[1] it knows that: result is 2 + "sum numbers between 3 and 5". so it asks to clone[2] to return: "sum numbers between 3 and 5"
| + clone[2] it knows that: result is 3 + "sum numbers between 4 and 5". so it asks to clone[3] to return: "sum numbers between 4 and 5"
| | + clone[3] it knows that: result is 4 + "sum numbers between 5 and 5". so it asks to clone[4] to return: "sum numbers between 5 and 5"
| | | + clone[4] it knows that: result is 5 + "sum numbers between 6 and 5". so it asks to clone[5] to return: "sum numbers between 6 and 5"
| | | | clone[5] it knows that: it can't sum, because 6 is larger than 5. so it returns 0 as result.
| | | + clone[4] it gets the result from clone[5] (=0) and sums: 5 + 0, returning 5
| | + clone[3] it gets the result from clone[4] (=5) and sums: 4 + 5, returning 9
| + clone[2] it gets the result from clone[3] (=9) and sums: 3 + 9, returning 12
+ clone[1] it gets the result from clone[2] (=12) and sums: 2 + 12, returning 14
and voilà!!
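A diagram like the one above can be produced by the function itself if you let each clone print what it is doing. This is only a sketch: sumIntsTraced and its depth parameter are additions made for the tracing and are not part of the original function.

```swift
// Hypothetical tracing variant: `depth` exists only to number and indent the clones.
func sumIntsTraced(a: Int, b: Int, depth: Int = 0) -> Int {
    let indent = String(repeating: "| ", count: depth)
    let name = "clone[\(depth + 1)]"
    if a > b {
        print("\(indent)\(name): \(a) is larger than \(b), returning 0")
        return 0
    }
    print("\(indent)\(name): needs \(a) + (sum of numbers between \(a + 1) and \(b))")
    let rest = sumIntsTraced(a: a + 1, b: b, depth: depth + 1)
    print("\(indent)\(name): got \(rest) back, returning \(a + rest)")
    return a + rest
}

let tracedResult = sumIntsTraced(a: 2, b: 5)
print(tracedResult)  // 14
```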

I'll give it a go.
Executing the equation a + sumInts(a+1, b), I will show how the final answer is 14.
// the sumInts function definition
func sumInts(a: Int, b: Int) -> Int {
    if (a > b) {
        return 0
    } else {
        return a + sumInts(a: a + 1, b: b)
    }
}
Given: a = 2 and b = 5
1) 2 + sumInts(2+1, 5)
2) sumInts(3, 5) = 12
i) 3 + sumInts(3+1, 5)
ii) 4 + sumInts(4+1, 5)
iii) 5 + sumInts(5+1, 5)
iv) return 0
v) return 5 + 0
vi) return 4 + 5
vii) return 3 + 9
3) 2 + 12 = 14.
Let us know if you have any further questions.
Here's another example of a recursive function.
A man has just graduated college.
t is the amount of time in years.
The total actual number of years worked before retiring, can be calculated as follows:
public class DoIReallyWantToKnow
{
    const int DESIRED_RETIREMENT_AGE = 65;
    double collectedMoney = 0.00; // remember, you just graduated college
    double neededMoneyToRetire = 1000000.00;
    int currentAge;

    public int howLongDoIHaveToWork(int currentAge)
    {
        this.currentAge = currentAge;
        int t = 0;
        return work(t + 1);
    }

    public int work(int time)
    {
        collectedMoney = getCollectedMoney(); // assumed to be defined elsewhere
        // currentAge + time is your age after `time` years of work
        if (currentAge + time >= DESIRED_RETIREMENT_AGE
            && collectedMoney >= neededMoneyToRetire)
        {
            return time;
        }
        return work(time + 1);
    }
}
And that should be just enough to depress anyone, lol. ;-P

A little bit off-topic, I know, but... try looking up recursion in Google... You'll see by example what it means :-)
Earlier versions of Google returned the following text (cited from memory):
Recursion
See Recursion
On September 10th 2014, the joke about recursion was updated:
Recursion
Did you mean: Recursion
For another reply, see this answer.

One really good tip I came across in learning and really understanding recursion is to spend some time learning a language that doesn't have any form of loop construct other than via recursion. That way you'll get a great feel for how to USE recursion via practice.
I followed http://www.htdp.org/ which, as well as being a Scheme tutorial, is also a great introduction on how to design programs in terms of the architecture and design.
But basically, you need to invest some time. Without a 'firm' grasp of recursion certain algorithms, such as backtracking, will always seem 'hard' or even 'magic' to you. So, persevere. :-D
I hope this helps and Good Luck!

I think the best way to understand recursive functions is realizing that they are made to process recursive data structures. But in your original function sumInts(a: Int, b: Int) that calculates recursively the sum of numbers from a to b, it seems not to be a recursive data structure... Let's try a slightly modified version sumInts(a: Int, n: Int) where n is how many numbers you'll add.
Now, sumInts is recursive over n, a natural number. Still not recursive data, right? Well, a natural number can be considered a recursive data structure using the Peano axioms:
indirect enum Natural {
    case Zero
    case Successor(Natural)
}
So, 0 = Zero, 1 = Successor(Zero), 2 = Successor(Successor(Zero)), and so on.
Once you have a recursive data structure, you have the template for the function. For each non-recursive case, you can calculate the value directly. For the recursive cases, you assume that the recursive function is already working and use it to calculate the case, but deconstructing the argument. In the case of Natural, it means that instead of Successor(n) we'll use n, or equivalently, instead of n we'll use n - 1.
// sums n numbers beginning from a
func sumInts(a: Int, n: Int) -> Int {
    if (n == 0) {
        // non recursive case
    } else {
        // recursive case. We use sumInts(..., n - 1)
    }
}
Now the recursive function is simpler to program. First, the base case, n=0. What should we return if we want to add no numbers? The answer is, of course 0.
What about the recursive case? If we want to add n numbers beginning with a and we already have a working sumInts function that works for n-1? Well, we need to add a and then invoke sumInts with a + 1, so we end with:
// sums n numbers beginning from a
func sumInts(a: Int, n: Int) -> Int {
    if (n == 0) {
        return 0
    } else {
        return a + sumInts(a: a + 1, n: n - 1)
    }
}
The nice thing is that now you shouldn't need to think in the low level of recursion. You just need to verify that:
For the base cases of the recursive data, it calculates the answer without using recursion.
For the recursive cases of the recursive data, it calculates the answer using recursion over the destructured data.
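To show the "function follows the shape of the data" idea literally, here is a sketch that recurses on the Natural type itself and not on an Int counter. The helper natural(_:) is made up for this example so that a Natural can be built from an ordinary integer; the indirect keyword is what lets a Swift enum contain itself.

```swift
// A natural number is either Zero or the Successor of another natural number.
indirect enum Natural {
    case Zero
    case Successor(Natural)
}

// Hypothetical helper: builds the Natural that corresponds to a non-negative Int.
func natural(_ n: Int) -> Natural {
    precondition(n >= 0, "natural numbers are non-negative")
    if n == 0 {
        return .Zero
    }
    return .Successor(natural(n - 1))
}

// Sums n consecutive integers beginning from a, with one branch per enum case.
func sumInts(a: Int, n: Natural) -> Int {
    switch n {
    case .Zero:
        return 0                                      // non-recursive case
    case .Successor(let predecessor):
        return a + sumInts(a: a + 1, n: predecessor)  // recursive case on the smaller value
    }
}

print(sumInts(a: 2, n: natural(4)))  // 14, i.e. 2 + 3 + 4 + 5
```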

You might be interested in Nisan and Schocken's implementation of functions. The linked pdf is part of a free online course. It describes the second part of a virtual machine implementation in which the student should write a virtual-machine-language-to-machine-language compiler. The function implementation they propose is capable of recursion because it is stack-based.
To introduce you to the function implementation, consider the following virtual machine code.
If Swift compiled to this virtual machine language, then the following Swift expression:
mult(a: 2, b: 3) - 4
would compile down to
push constant 2 // Line 1
push constant 3 // Line 2
call mult // Line 3
push constant 4 // Line 4
sub // Line 5
The virtual machine language is designed around a global stack. push constant n pushes an integer onto this global stack.
After executing lines 1 and 2, the stack looks like:
256: 2 // Argument 0
257: 3 // Argument 1
256 and 257 are memory addresses.
call mult pushes the return line number (3) onto the stack and allocates space for the function's local variables.
256: 2 // argument 0
257: 3 // argument 1
258: 3 // return line number
259: 0 // local 0
...and it goes-to the label function mult. The code inside mult is executed. As a result of executing that code we compute the product of 2 and 3, which is stored in the function's 0th local variable.
256: 2 // argument 0
257: 3 // argument 1
258: 3 // return line number
259: 6 // local 0
Just before returning from mult, you will notice the line:
push local 0 // push result
We will push the product onto the stack.
256: 2 // argument 0
257: 3 // argument 1
258: 3 // return line number
259: 6 // local 0
260: 6 // product
When we return, the following happens:
Pop the last value on the stack to the memory address of the 0th argument (256 in this case). This happens to be the most convenient place to put it.
Discard everything on the stack up to the address of the 0th argument.
Go-to the return line number (3 in this case) and then advance.
After returning we are ready to execute line 4, and our stack looks like this:
256: 6 // product that we just returned
Now we push 4 onto the stack.
256: 6
257: 4
sub is a primitive function of the virtual machine language. It takes two arguments and returns its result in the usual address: that of the 0th argument.
Now we have
256: 2 // 6 - 4 = 2
Now that you know how a function call works, it is relatively simple to understand how recursion works. No magic, just a stack.
I have implemented your sumInts function in this virtual machine language:
function sumInts 0 // `0` means it has no local variables.
label IF
push argument 0
push argument 1
lte
if-goto ELSE_CASE
push constant 0
return
label ELSE_CASE
push argument 0 // the augend, a
push argument 0
push constant 1
add
push argument 1
call sumInts // Line 15
add // Line 16
return // Line 17
// End of function
Now I will call it:
push constant 2
push constant 5
call sumInts // Line 21
The code executes and we get all the way to the stopping point where lte returns false. This is what the stack looks like at this point:
// First invocation
256: 2 // argument 0
257: 5 // argument 1
258: 21 // return line number
259: 2 // augend
// Second
260: 3 // argument 0
261: 5 // argument 1
262: 15 // return line number
263: 3 // augend
// Third
264: 4 // argument 0
265: 5 // argument 1
266: 15 // return line number
267: 4 // augend
// Fourth
268: 5 // argument 0
269: 5 // argument 1
270: 15 // return line number
271: 5 // augend
// Fifth
272: 6 // argument 0
273: 5 // argument 1
274: 15 // return line number
275: 0 // return value
Now let's "unwind" our recursion. return 0 and goto line 15 and advance.
271: 5
272: 0
Line 16: add
271: 5
Line 17: return 5 and goto line 15 and advance.
267: 4
268: 5
Line 16: add
267: 9
Line 17: return 9 and goto line 15 and advance.
263: 3
264: 9
Line 16: add
263: 12
Line 17: return 12 and goto line 15 and advance.
259: 2
260: 12
Line 16: add
259: 14
Line 17: return 14 and goto line 21 and advance.
256: 14
There you have it. Recursion: Glorified goto.
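The same "no magic, just a stack" point can be sketched in Swift by doing the bookkeeping by hand: an array stands in for the call stack, and the function is written without any recursion. The name sumIntsWithExplicitStack and the pendingAugends array are made up for this sketch, and it models only the waiting augends, not the return addresses a real stack frame also holds.

```swift
// Hypothetical non-recursive version: an array plays the role of the call stack.
func sumIntsWithExplicitStack(a: Int, b: Int) -> Int {
    var pendingAugends: [Int] = []
    var current = a

    // "Call" phase: every recursive call would leave its own `a` waiting on the stack.
    while current <= b {
        pendingAugends.append(current)
        current += 1
    }

    // Base case reached (current > b): the deepest call returns 0.
    var result = 0

    // "Return" phase: pop each waiting augend and add it to the returned value.
    while let augend = pendingAugends.popLast() {
        result = augend + result
    }
    return result
}

print(sumIntsWithExplicitStack(a: 2, b: 5))  // 14
```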

There are already a lot of good answers. Still, I'll give it a try.
When called, a function gets a memory space allotted, which is stacked upon the memory space of the calling function. In this memory space, the function keeps the parameters passed to it and its local variables with their values. This memory space vanishes when the function returns. As the idea of a stack goes, the memory space of the calling function then becomes active again.
For recursive calls, the same function gets multiple memory spaces stacked one upon another. That's all. The simple idea of how the stack works in a computer's memory should get you through the idea of how recursion happens in implementation.

Recursion started making sense to me when I stopped reading what others say about it or seeing it as something I can avoid and just wrote code. I found a problem with a solution and tried to duplicate the solution without looking. I only looked at the solution when I got helplessly stuck. Then I went back at trying to duplicate it. I did this again on multiple problems until I developed my own understanding and sense of how to identify a recursive problem and solve it. When I got to this level, I started making up problems and solving them. That helped me more. Sometimes, things can only be learned by trying it out on your own and struggling; until you “get it”.

Many of the answers above are very good. A useful technique for solving recursion, though, is to first spell out what we want to do and then code it the way a human would solve it. In the above case, we want to sum up a sequence of consecutive integers (using the numbers from above):
2, 3, 4, 5 //adding these numbers would sum to 14
Now, note that these lines are confusing (not wrong, but confusing).
if (a > b) {
return 0
}
Why the test a > b, and why return 0?
Let's change the code to reflect more closely what a human does
func sumInts(a: Int, b: Int) -> Int {
if (a == b) {
return b // When 'a equals b' I'm at the most Right integer, return it
}
else {
return a + sumInts(a: a + 1, b: b)
}
}
Can we do it even more human like? Yes! Usually we sum up from left to right (2+3+...). But the above recursion is summing from right to left (...+4+5). Change the code to reflect it (The - can be a little intimidating, but not much)
func sumInts(a: Int, b: Int) -> Int {
if (a == b) {
return b // When I'm at the most Left integer, return it
}
else {
return sumInts(a: a, b: b - 1) + b
}
}
Some may find this function more confusing since we are starting from the 'far' end, but practicing can make it feel natural (and it is another good 'thinking' technique: Trying 'both' sides when solving a recursion). And again, the function reflects what a human (most?) does: Takes the sum of all left integers and adds the 'next' right integer.

I was having a hard time understanding recursion, then I found this blog. I had already seen this question, so I thought I had to share it. You should read the blog; I found it extremely helpful. It explains recursion with the stack, and even shows step by step how two recursive calls work with the stack. I recommend you first understand how the stack works, which it explains very well here: journey-to-the-stack
Then you will be able to understand how recursion works. Now take a look at this post: Understand recursion step by step
Here is a program:
def hello(x):
    if x==1:
        return "op"
    else:
        u=1
        e=12
        s=hello(x-1)
        e+=1
        print(s)
        print(x)
        u+=1
        return e
hello(3)

Let me explain with an example recurrence. (It is not actually the Fibonacci recurrence, which is t(n) = t(n - 1) + t(n - 2); it is a running sum, but it shows the technique.)
t(n) = t(n - 1) + n;
if n = 0 then 1
So let's see how the recursion works. I just replace t(n - 1) with its own definition, and so on. It looks like:
t(n) = t(n - 2) + (n - 1) + n;
t(n) = t(n - 3) + (n - 2) + (n - 1) + n;
t(n) = t(n - 4) + (n - 3) + (n - 2) + (n - 1) + n;
.
.
.
t(n) = t(n - k) + (n - k + 1) + ... + (n - 2) + (n - 1) + n;
We know t(0) = 1, and t(n - k) is t(0) when n - k = 0, so n = k. We replace k with n:
t(n) = t(n - n) + (n - n + 1) + (n - n + 2) + (n - n + 3) + ... + (n - 1) + n;
If we simplify n - n to 0, then:
t(n) = t(0) + 1 + 2 + 3 + ... + (n - 1) + n;
So 1 + 2 + 3 + ... + (n - 1) + n is the sum of the first n natural numbers. It is calculated as Σ i (for i = 1..n) = n(n+1)/2 = (n² + n)/2
So the result is t(n) = 1 + n(n+1)/2, which is O(1 + n²) = O(n²)
This is the best way to understand a recurrence relation
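A quick way to convince yourself of the expansion is to run it. This is a small Python sketch under my own naming (t and t_closed are illustrative names): t follows the recurrence t(n) = t(n - 1) + n with t(0) = 1 literally, and t_closed is the closed form 1 + n(n+1)/2 that the expansion leads to.

```python
def t(n):
    """The recurrence, written exactly as defined: t(0) = 1, t(n) = t(n - 1) + n."""
    if n == 0:
        return 1
    return t(n - 1) + n


def t_closed(n):
    """Closed form obtained by unrolling the recurrence: t(0) plus 1 + 2 + ... + n."""
    return 1 + n * (n + 1) // 2


# Compare both versions for the first few values of n.
for n in range(6):
    print(n, t(n), t_closed(n))
```

Both columns should agree for every n, which is what the unrolling argument claims.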

Related

Understanding recursive function in DART

I have trouble understanding the code of this recursive function. I am new to DART programming. I understand what a recursive function accomplishes, but I have a problem understanding the programming syntax.
int sum(List<int> numberList, int index) {
if (index < 0) {
return 0;
} else {
return numberList[index] + sum(numberList, index - 1);
}
}
main() {
// Driver Code
var result = sum([1, 2, 3, 4, 5], 4);
print(result);
}
Question: where is the value for each step stored? Does the result of the first pass at line 5 equal 9, given the inputs from line 11? Where is the resulting value 9 stored? How does the function know to add 9 + 3 in the second pass?
Does the recursive function have "internal memory" of the values generated by each pass?
My understanding of the programing language would be that var result passes the arguments to the sum function.
The sum function executes the if-else command until the index value is 0, which means it executes 4 times. With the first pass the return command creates a value of 9 (5 + 4 since value of index is 5 and value of index-1 is 4).
Here begins my confusion. The sum function would now do a second if-else pass and execute the return command again.
Now the initial value of numberList[index] would need to be 9 and the value of sum(numberList, index - 1); would need to be 3, to get 9 + 3 = 12. Additional 2 passes gets 12 + 2 = 14 and 14 + 1 = 15 the expected result.
My question here is how (if it does) the index value in "numberList[index]" changes. The index value is defined as 4. Is this internal logic of the recursive function, or am I completely misinterpreting the programming syntax? I would expect that we have a "temporary" variable for the result which increases with each pass.

How does this function calculate?

I've been working through CodeWars katas and I came across a pretty cool solution that someone came up with. The problem I have is I don't understand how it works. I understand some of it, like what it is generally doing, but not the specific details. Is it returning itself? How is it doing the calculation? Can someone explain this to me, because I really want to learn how to do this. And if you know of any other resources I can read or watch, that would be helpful. I didn't see anything like this in the Swift documentation.
func findDigit(_ num: Int, _ nth: Int) -> Int {
let positive = abs(num)
guard nth > 0 else { return -1 }
guard positive > 0 else { return 0 }
guard nth > 1 else { return positive % 10 }
return findDigit(positive / 10, nth - 1) }
For context:
Description:
The function findDigit takes two numbers as input, num and nth. It outputs the nth digit of num (counting from right to left).
Note
If num is negative, ignore its sign and treat it as a positive value.
If nth is not positive, return -1.
Keep in mind that 42 = 00042. This means that findDigit(42, 5) would return 0.
Examples
findDigit(5673, 4) returns 5
findDigit(129, 2) returns 2
findDigit(-2825, 3) returns 8
findDigit(-456, 4) returns 0
findDigit(0, 20) returns 0
findDigit(65, 0) returns -1
findDigit(24, -8) returns -1
Greatly appreciate any help. Thanks.
This is a simple recursive function. Recursive means that it calls itself over and over until a condition is satisfied that ends the recursion. If the condition is never satisfied, you'll end up with an infinite recursion which is not a good thing :)
As you already understand the purpose of the function, here are the details of how it works internally:
// Saves the absolute value (removes the negative sign) of num
let positive = abs(num)
// Returns -1 if nth is 0 or negative
guard nth > 0 else { return -1 }
// Returns 0 if the absolute value of num is 0 (can't be negative)
guard positive > 0 else { return 0 } // Could be guard positive != 0
// nth is a counter that is decremented with every recursion.
// positive % 10 returns the remainder of positive / 10
// For example 23 % 10 = 3
// This line returns a digit from 0 - 9 if nth == 1
guard nth > 1 else { return positive % 10 }
// If none of the above conditions are true, calls itself using
// the current absolute value divided by 10, decreasing nth.
// nth serves to target a different digit in the original number
return findDigit(positive / 10, nth - 1)
Let's run through an example step by step:
findDigit(3454, 3)
num = 3454, positive = 3454, nth = 3
-> return findDigit(3454 / 10, 3 - 1)
num = 345, positive = 345, nth = 2 // 345, not 345.4: integer type
-> return findDigit(345 / 10, 2 - 1)
num = 34, positive = 34, nth = 1 // 34, not 34.5: integer type
-> return 34 % 10
-> return 4
It is a recursive solution. It does not return itself, per se, it calls itself on a simpler case, until it gets to a base case (here a 1 digit number). So for example, let us trace through what it does in your first example:
findDigit(5673, 4) calls
findDigit (567, 3) calls
findDigit (56,2) calls
findDigit (5,1) which is the base case which returns 5 which bubbles all the way back up to the surface.
This is a recursive algorithm. It works by solving the original problem by reducing it to a smaller problem of the same type, then solving that, recursively, until a base case is hit.
I think you'll have a much easier time understanding it if you see the calls being made. Of course, it's best to step through this in the debugger to really see what's going on. I've numbered the sections of interest to refer to them below.
func findDigit(_ num: Int, _ nth: Int) -> Int {
print("findDigit(\(num), \(nth))") //#1
let positive = abs(num) // #2
guard nth > 0 else { return -1 } // #3
guard positive > 0 else { return 0 } // #4
guard nth > 1 else { return positive % 10 } // #5
return findDigit(positive / 10, nth - 1) // #6
}
print(findDigit(5673, 4))
I print out the function and its parameters, so you can see what's going on. Here's what's printed:
findDigit(5673, 4)
findDigit(567, 3)
findDigit(56, 2)
findDigit(5, 1)
5
#2: Take the positive value of num, so the - sign doesn't get in the way.
#3: Assert that the nth variable is greater than 0. Since digit counting in this problem starts at 1, any value less than or equal to 0 is invalid. In such a case, -1 is returned. This is very bad practice in Swift. This is what Optionals exist for. It's much better to make this function return Int? and return nil to represent an error in the nth variable.
#4: Assert that the positive variable is greater than 0. The only other possible case is that positive is 0, in which case its digit (for any position) is 0, so that's why you have return 0.
#5: Assert that nth is greater than 1. If this is not the case, then nth must be 1 (the guard numbered #3 ensures it can't be negative or 0). In such a case, the digit in the first position of a decimal number is that number modulo 10, hence why positive % 10 is returned.
#6: If we reach this line, then we know we have a sane value of nth (> 0), which isn't 1, and we have a positive number greater than 0. Now we can proceed to solve this problem by recursing. We'll divide positive by 10 and make it the new num, and we'll decrement nth, because what is the nth digit of this call will be in the (n-1)th spot of the next call.
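For comparison, here is a rough sketch of the same recursion with the "return nothing instead of -1" idea applied. It is written in Python rather than Swift, so None stands in for Swift's nil and Optional[int] for Int?; the name find_digit is just my transliteration.

```python
from typing import Optional


def find_digit(num: int, nth: int) -> Optional[int]:
    """nth digit of num counting from the right, or None if nth is invalid."""
    if nth <= 0:
        return None              # invalid position: no digit to report
    positive = abs(num)          # ignore the sign
    if positive == 0:
        return 0                 # ran out of digits: leading zeros
    if nth == 1:
        return positive % 10     # base case: the rightmost digit
    # Drop the rightmost digit and look one position closer.
    return find_digit(positive // 10, nth - 1)


print(find_digit(5673, 4))   # 5
print(find_digit(-2825, 3))  # 8
print(find_digit(65, 0))     # None
```

The caller then has to handle the None case explicitly instead of mistaking -1 for a digit.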
Someone by the name of JohanWiltink on CodeWars answered my question. But I chose to accept Nicolas's for the detail.
This was JohanWiltink's explanation:
The function does not return itself as a function; it calls itself with different arguments and returns the result of that recursive call (this is possibly nested until, in this case, nth=1).
findDigit(10,2) thus returns the value of findDigit(1,1).
If you're not seeing how this works, try to work out by hand what e.g. findDigit(312,3) would return.
Thanks so much to everyone that answered! Really appreciate it!

Call by value, name/reference, need in ML

I am studying for a final, and I have a practice problem here.
The question asks for the result of
val y = ref 1;
fun f x = (!y) + (x + x);
(f (y := (!y)+1; !y)) + (!y);
under the following parameter passing techniques:
Call by value
Call by name
Call by need.
It seems to me that for call by value, the answer is 8.
However, I believe the answer for call by name is also 8, but I would expect it to be different. The reason I think it is 8:
y := (!y)+1 derefs y as 1, adds 1, and then sets y to 2
!y in line 3 serves as the argument to f, and since it is being dereferenced it is
passed as a value rather than as a reference (this may be where I am
going wrong?)
The function call returns 6, but does not set y as y was passed in as a value from the previous step
6 is added to the dereferenced value of y, which is 2.
This returns 8
Is this the correct answer, and if not, can someone please point out where I have gone wrong? Also, can someone explain to me how call by need would work in this situation also?
Many thanks.
I found out how it works:
(y := (!y)+1; !y) is the parameter passed to f.
f then looks like:
fun f x = (!y) + ((y:= (!y)+1; !y) + (y:= (!y)+1; !y));
so this ends up being 1+2+3, and the final step + (!y) adds 3 as this is the current value of y, giving 9.
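The three strategies can be simulated with thunks. The following Python sketch is only an illustration under two assumptions of mine: operands are evaluated left to right, and a one-element list stands in for the ML ref cell. The function run and its strategy strings are hypothetical names.

```python
def run(strategy):
    y = [1]  # one-element list standing in for: val y = ref 1

    def arg():
        # The argument expression: (y := (!y)+1; !y)
        y[0] += 1
        return y[0]

    if strategy == "value":
        v = arg()              # evaluated once, before the call
        x = lambda: v
    elif strategy == "name":
        x = arg                # re-evaluated at every use of x
    elif strategy == "need":
        cache = []

        def x():
            if not cache:      # evaluated at first use, then remembered
                cache.append(arg())
            return cache[0]
    else:
        raise ValueError(strategy)

    def f(x):
        # fun f x = (!y) + (x + x)
        return y[0] + (x() + x())

    # (f (...)) + (!y)
    return f(x) + y[0]


for strategy in ("value", "name", "need"):
    print(strategy, run(strategy))
```

Under these assumptions the sketch gives 8 for call by value and 9 for call by name, matching the numbers above, and 7 for call by need (the argument is evaluated once, at its first use inside f, so y ends up as 2).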
Thanks for pointing out that I was still doing call-by-value.

Least amount of voters, given two halves

One of my former students sent me a message about this interview question he got while applying for a job as a Junior Developer.
There are two candidates running for president in a mock classroom election. Given the two percentages of voters, find out the least amount of possible voters in the classroom.
Examples:
Input: 50.00,50.00
Output: 2
Input: 25.00,75.00
Output: 4
Input: 53.23, 46.77
Output: 124 // The first value, 1138 was wrong. Thanks to Loïc for the correct value
Note: The sum of the input percentages are always 100.00%, two decimal places
The last example got me scratching my head. It was the first time I heard about this problem, and I'm kindof stumped on how to solve this.
EDIT: I called my student about the problem, and told me that he was not sure about the last value. He said, and I quote, "It was an absurdly large number output" :( sorry! I should've researched more before posting it online~ I'm guessing 9,797 is the output on the last example though..
You can compute these values by using the best rational approximations of the voter percentages. Wikipedia describes how to obtain these values from the continued fraction (which can be computed using the Euclidean algorithm). The desired result is the first approximation which is within 0.005 percentage points of the expected value, i.e. which rounds to it.
Here's an example with 53.23%:
10000 = 1 * 5323 + 4677
5323 = 1 * 4677 + 646
4677 = 7 * 646 + 155
646 = 4 * 155 + 26
155 = 5 * 26 + 25
26 = 1 * 25 + 1
25 = 25* 1 + 0
Approximations:
1: 1 / 1
-> 1 = 100%
2: 1 / (1 + 1/1)
-> 1/2 = 50%
2.5: 1 / (1 + 1 / (1 + 1/6))
-> 7/13 = 53.85%
3: 1 / (1 + 1 / (1 + 1/7))
-> 8/15 = 53.33%
3.5: 1 / (1 + 1 / (1 + 1 / (7 + 1/3)))
-> 25/47 = 53.19%
4: 1 / (1 + 1 / (1 + 1 / (7 + 1/4)))
-> 33/62 = 53.23%
The reason we have extra values before the 3rd and 4th convergents is that their last terms (7 and 4 respectively) are greater than 1, so we must test the approximation with the last term decremented.
The desired result is the denominator of the first value which rounds to the desired value, which in this case is 62.
Sample Ruby implementation available here (using the formulae from the Wikipedia page here, so it looks slightly different to the above example).
First, you can notice that a trivial solution is to have 10,000 voters. Now let's try to find something lower than that.
(Below, round means rounding to two decimal places.)
For each value of N starting at 1
For each value of i starting at 1
If round(100*i/N) = 46.77
return N
Always choose the minimum of the two percentages to be faster.
Or faster:
For each value of N starting at 1
i = floor(N*46.77/100)
For j = i or i+1
If round(100*j/N) = 46.77 and round(100*(N-j)/N) = 53.23
return N
For the third example :
605/1138 = .5316344464
(1138-605)/1138 = .4683655536
but
606/1138 = .5325131810
(1138-606)/1138 = .4674868190
So it can't be 1138...
But 62 works:
33/62 = .5322580645
(62-33)/62 = .4677419355
Rounded, these give you the right values.
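The brute-force search above can be sketched in Python as follows. To avoid floating-point surprises I pass the percentages as integers in hundredths of a percent (5323 for 53.23%), and I assume rounding half up; both choices, and the names rounded_hundredths and min_voters, are mine.

```python
def rounded_hundredths(j, n):
    """j/n as a percentage, rounded half up to two decimals, in hundredths."""
    # floor(10000*j/n + 1/2), done in pure integer arithmetic
    return (20000 * j + n) // (2 * n)


def min_voters(pct_a, pct_b):
    """Smallest class size n such that some split j / (n - j) rounds to both inputs.

    pct_a and pct_b are in hundredths of a percent, e.g. 5323 for 53.23%.
    """
    for n in range(1, 10001):      # 10,000 voters always works
        for j in range(n + 1):     # j votes for the first candidate
            if (rounded_hundredths(j, n) == pct_a
                    and rounded_hundredths(n - j, n) == pct_b):
                return n
    return None


print(min_voters(5000, 5000))  # 2
print(min_voters(2500, 7500))  # 4
print(min_voters(5323, 4677))  # 62
```

The inner loop checks every j rather than only floor and floor + 1, which is slower but keeps the sketch obviously correct.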
(After some extensive edits:)
If you only have 2 voters, then you can only generate the following percentages for candidates A and B:
0+100, 100+0, or 50+50
If you have 3 voters, then you have
0+100, 100+0, 33.33+66.67, 66.67+33.33 [notice the rounding]
So this is a fun problem about fractions.
If you can make 25% then you have to have at least 4 people (so you can do 1/4, since 1/2 and 1/3 won't cut it). You can do it with more (i.e. 2/8 = 25%) but the problem asks for the least.
However, more interesting fractions require numbers greater than 1 in the numerator:
2/5 = 40%
Since you can't get that with anything but a 2 or more in the numerator (1/x will never cut it).
You can compare at each step and increase either the numerator or denominator, which is much more efficient than iterating over the whole sample space for j and then incrementing i;
i.e. if you have a percentage of 3%, checking solutions all the way up in the fashion of 96/99, 97/99, 98/99 before even getting to x/100 is a waste of time. Instead, you can increment the numerator or denominator based on how well your current guess is doing (greater than or less than) like so
int max = 5000; //we only need to go half-way at most.
public int minVoters (double onePercentage) {
    double checkPercentage = onePercentage;
    if (onePercentage > 50.0)
        checkPercentage = 100-onePercentage; //get the smaller percentage value
    checkPercentage = Math.round(checkPercentage*100)/100.0; //normalize to two decimal places
    int i=1; //numerator
    int j=1; //denominator
    double temp = 0;
    while (j<=max) { //i never exceeds j: when i == j we are at 100% and increase j
        temp = (i*100.0)/j; //percentage; 100.0 forces floating-point division
        temp = Math.round(temp*100)/100.0; //round to two decimal places
        if (temp == checkPercentage)
            return j;
        else if (temp > checkPercentage) //we passed up our value and need to increase the denominator
            j++;
        else //we are too low and increase the numerator
            i++;
    }
    return 0; //no such solution
}
Step-wise example for finding the denominator that can yield 55%
55/100 = 11/20
100-55 = 45 = 9/20 (checkPercentage will be 45.0)
1/1 100.0%
1/2 50.00%
1/3 33.33%
2/3 66.67%
2/4 50.00%
2/5 40.00%
3/5 60.00%
3/6 50.00%
3/7 42.86% (too low, increase numerator)
4/7 57.14% (too high, increase denominator)
4/8 50.00%
4/9 44.44%
5/9 55.56%
5/10 50.00%
5/11 45.45%
5/12 41.67%
6/12 50.00%
6/13 46.15%
6/14 42.86%
7/14 50.00%
7/15 46.67%
7/16 43.75%
8/16 50.00%
8/17 47.06%
8/18 44.44%
9/18 50.00%
9/19 47.37%
9/20 45.00% <-bingo
The nice thing about this method is that it will only take (i+j) steps where i is the numerator and j is the denominator.
I cannot see the relevance of this question to a position as junior developer.
The answer that jumped into my head was more of a brute-force approach. There can be at most 5001 unique answers because there are 5001 unique numbers between 00.00 and 50.00. Consequently, why not create and save a look-up table? Obviously, there won't be 5001 unique answers because some answers will be repeated. The point is, there are only 5001 valid fractions because we are rounding to two digits.
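int[] minPossible = new int[5001]; //index = percentage * 100 (00.00 .. 50.00); 0 means "not found yet"
int numSolutionsFound = 0;
int N = 2;
while(numSolutionsFound < 5001) {
    for(int i = 0 ; i <= N/2 ; i++) {
        //compute i/N as a percentage rounded to two decimals, in hundredths
        int index = (int) Math.round(i * 10000.0 / N);
        //see if the corresponding table entry is set
        //if not write N there and increment numSolutionsFound
        if (minPossible[index] == 0) {
            minPossible[index] = N;
            numSolutionsFound++;
        }
    }
    N++;
}
//Save answer here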
Now the solution is merely a table look up.
FWIW I realize the euclidean solution is "correct". But I'd NEVER come up with that mid interview. However, I'd know something like that was possible -- but I won't be able to whip it out on the spot.

How to reduce calculation of average to sub-sets in a general way?

Edit: Since it appears nobody is reading the original question this links to, let me bring in a synopsis of it here.
The original problem, as asked by someone else, was that, given a large number of values, where the sum would exceed what a data type of Double would hold, how can one calculate the average of those values.
There were several answers that said to calculate in sets, like taking 50 and 50 numbers, and calculating the average inside those sets, and then finally take the average of all those sets and combine those to get the final average value.
My position was that unless you can guarantee that all those values can be split into a number of equally sized sets, you cannot use this approach. Someone dared me to ask the question here, in order to provide the answer, so here it is.
Basically, given an arbitrary number of values, where:
I know the number of values beforehand (but again, how would your answer change if you didn't?)
I cannot gather up all the numbers, nor can I sum them (the sum will be too big for a normal data type in your programming language)
how can I calculate the average?
The rest of the question here outlines how, and the problems with, the approach to split into equally sized sets, but I'd really just like to know how you can do it.
Note that I know math perfectly well enough to know that, in math theory terms, calculating the sum of A[1..N]/N will give me the average. Let's assume that there are reasons that it isn't just that simple, that I need to split up the workload, and that the number of values isn't necessarily going to be divisible by 3, 7, 50, 1000 or whatever.
In other words, the solution I'm after will have to be general.
From this question:
What is a good solution for calculating an average where the sum of all values exceeds a double’s limits?
my position was that splitting the workload up into sets is no good, unless you can ensure that the size of those sets are equal.
Edit: The original question was about the upper limit that a particular data type could hold, and since he was summing up a lot of numbers (count that was given as example was 10^9), the data type could not hold the sum. Since this was a problem in the original solution, I'm assuming (and this is a prerequisite for my question, sorry for missing that) that the numbers are too big to give any meaningful answers.
So, dividing by the total number of values directly is out. The original reason for why a normal SUM/COUNT solution was out was that SUM would overflow, but let's assume, for this question that SET-SET/SET-SIZE will underflow, or whatever.
The important part is that I cannot simply sum, I cannot simply divide by the number of total values. If I cannot do that, will my approach work, or not, and what can I do to fix it?
Let me outline the problem.
Let's assume you're going to calculate the average of the numbers 1 through 6, but you cannot (for whatever reason) do so by summing the numbers, counting the numbers, and then dividing the sum by the count. In other words, you cannot simply do (1+2+3+4+5+6)/6.
In other words, SUM(1..6)/COUNT(1..6) is out. We're not considering NULL's (as in database NULL's) here.
Several of the answers to that question alluded to being able to split the numbers being averaged into sets, say 3 or 50 or 1000 numbers, then calculating some number for that, and then finally combining those values to get the final average.
My position is that this is not possible in the general case, since this will make some numbers, the ones appearing in the final set, more or less valuable than all the ones in the previous sets, unless you can split all the numbers into equally sized sets.
For instance, to calculate the average of 1-6, you can split it up into sets of 3 numbers like this:
(1/3 + 2/3 + 3/3) / 2 + (4/3 + 5/3 + 6/3) / 2
(each number is divided by 3 because there are 3 numbers in the set, and each set by 2 because there are 2 equally sized groups)
Which gives you this:
2/2 + 5/2 = 3.5
(note: (1+2+3+4+5+6)/6 = 3.5, so this is correct here)
However, my point is that once the number of values cannot be split into a number of equally sized sets, this method falls apart. For instance, what about the sequence 1-7, which contains a prime number of values?
Can a similar approach, that won't sum all the values, and count all the values, in one go, work?
So, is there such an approach? How do I calculate the average of an arbitrary number of values in which the following holds true:
I cannot do a normal sum/count approach, for whatever reason
I know the number of values beforehand (what if I don't, will that change the answer?)
Well, suppose you added three numbers and divided by three, and then added two numbers and divided by two. Can you get the average from these?
x = (a + b + c) / 3
y = (d + e) / 2
z = (f + g) / 2
And you want
r = (a + b + c + d + e + f + g) / 7
That is equal to
r = (3 * (a + b + c) / 3 + 2 * (d + e) / 2 + 2 * (f + g) / 2) / 7
r = (3 * x + 2 * y + 2 * z) / 7
Both lines above overflow, of course, but since division is distributive, we do
r = (3.0 / 7.0) * x + (2.0 / 7.0) * y + (2.0 / 7.0) * z
Which guarantees that you won't overflow, as I'm multiplying x, y and z by fractions less than one.
This is the fundamental point here. I'm neither dividing all numbers beforehand by the total count, nor ever exceeding the overflow limit.
So... if you keep adding to an accumulator, keep track of how many numbers you have added, and always test if the next number will cause an overflow, you can then get partial averages, and compute the final average.
And no, if you don't know the number of values beforehand, it doesn't change anything (provided that you can count them as you sum them).
Here is a Scala function that does it. It's not idiomatic Scala, so that it can be more easily understood:
def avg(input: List[Double]): Double = {
var partialAverages: List[(Double, Int)] = Nil
var inputLength = 0
var currentSum = 0.0
var currentCount = 0
var numbers = input
while (numbers.nonEmpty) {
val number = numbers.head
val rest = numbers.tail
if (number > 0 && currentSum > 0 && Double.MaxValue - currentSum < number) {
partialAverages = (currentSum / currentCount, currentCount) :: partialAverages
currentSum = 0
currentCount = 0
} else if (number < 0 && currentSum < 0 && Double.MinValue - currentSum > number) {
partialAverages = (currentSum / currentCount, currentCount) :: partialAverages
currentSum = 0
currentCount = 0
}
currentSum += number
currentCount += 1
inputLength += 1
numbers = rest
}
partialAverages = (currentSum / currentCount, currentCount) :: partialAverages
var result = 0.0
while (partialAverages.nonEmpty) {
val ((partialSum, partialCount) :: rest) = partialAverages
result += partialSum * (partialCount.toDouble / inputLength)
partialAverages = rest
}
result
}
EDIT:
Won't multiplying with 2, and 3, get me back into the range of "not supported by the data type?"
No. If you were dividing by 7 at the end, absolutely. But here you are dividing at each step of the sum. Even in your real case the weights (2/7 and 3/7) would be in the range of manageable numbers (e.g. 1/10 ~ 1/10000) which wouldn't make a big difference compared to your weight (i.e. 1).
PS: I wonder why I'm working on this answer instead of writing mine where I can earn my rep :-)
If you know the number of values beforehand (say it's N), you just add 1/N + 2/N + 3/N etc, supposing that you had values 1, 2, 3. You can split this into as many calculations as you like, and just add up your results. It may lead to a slight loss of precision, but this shouldn't be an issue unless you also need a super-accurate result.
If you don't know the number of items ahead of time, you might have to be more creative. But you can, again, do it progressively. Say the list is 1, 2, 3, 4. Start with mean = 1. Then mean = mean*(1/2) + 2*(1/2). Then mean = mean*(2/3) + 3*(1/3). Then mean = mean*(3/4) + 4*(1/4) etc. It's easy to generalize, and you just have to make sure the bracketed quantities are calculated in advance, to prevent overflow.
Of course, if you want extreme accuracy (say, more than 0.001% accuracy), you may need to be a bit more careful than this, but otherwise you should be fine.
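The progressive update can be sketched in a few lines of Python. This is a minimal illustration (running_mean is my own name); it uses the algebraically equivalent form mean += (x - mean) / count instead of the two separate weights, which is an implementation choice of mine.

```python
def running_mean(values):
    """Mean of an iterable, computed without ever forming the full sum."""
    mean = 0.0
    for count, x in enumerate(values, start=1):
        # Same as mean * ((count - 1) / count) + x * (1 / count)
        mean += (x - mean) / count
    return mean


print(running_mean([1, 2, 3, 4]))   # 2.5
print(running_mean([1e308] * 3))    # 1e+308, although the plain sum would overflow
```

Note that x - mean can itself overflow if the values have huge magnitude and opposite signs, so this is a sketch, not a fully defensive implementation.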
Let X be your sample set. Partition it into two sets A and B in any way that you like. Define delta = m_B - m_A where m_S denotes the mean of a set S. Then
m_X = m_A + delta * |B| / |X|
where |S| denotes the cardinality of a set S. Now you can repeatedly apply this to partition and calculate the mean.
Why is this true? Let s = 1 / |A| and t = 1 / |B| and u = 1 / |X| (for convenience of notation) and let aSigma and bSigma denote the sum of the elements in A and B respectively so that:
m_A + delta * |B| / |X|
= s * aSigma + u * |B| * (t * bSigma - s * aSigma)
= s * aSigma + u * (bSigma - |B| * s * aSigma)
= s * aSigma + u * bSigma - u * |B| * s * aSigma
= s * aSigma * (1 - u * |B|) + u * bSigma
= s * aSigma * (u * |X| - u * |B|) + u * bSigma
= s * u * aSigma * (|X| - |B|) + u * bSigma
= s * u * aSigma * |A| + u * bSigma
= u * aSigma + u * bSigma
= u * (aSigma + bSigma)
= u * (xSigma)
= xSigma / |X|
= m_X
The proof is complete.
From here it is obvious how to use this to either recursively compute a mean (say by repeatedly splitting a set in half) or how to use this to parallelize the computation of the mean of a set.
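As a rough sketch of the recursive use, here is the formula m_X = m_A + delta * |B| / |X| applied by repeatedly splitting the list in half. It is written in Python; the function name mean and the lo/hi index parameters are my own choices.

```python
def mean(xs, lo=0, hi=None):
    """Mean of xs[lo:hi], combining the means of two halves recursively."""
    if hi is None:
        hi = len(xs)
    n = hi - lo
    if n <= 0:
        raise ValueError("mean of an empty range is undefined")
    if n == 1:
        return float(xs[lo])       # base case: a single value is its own mean
    mid = lo + n // 2
    m_a = mean(xs, lo, mid)        # mean of A = xs[lo:mid]
    m_b = mean(xs, mid, hi)        # mean of B = xs[mid:hi]
    delta = m_b - m_a
    return m_a + delta * ((hi - mid) / n)   # m_A + delta * |B| / |X|


print(mean([1, 2, 3, 4, 5, 6, 7]))
```

The halves do not need to be the same size; 7 values split into 3 and 4 here, and the weight |B| / |X| accounts for the difference.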
The well-known on-line algorithm for calculating the mean is just a special case of this. This is the algorithm that if m is the mean of {x_1, x_2, ... , x_n} then the mean of {x_1, x_2, ..., x_n, x_(n+1)} is m + ((x_(n+1) - m)) / (n + 1). So with X = {x_1, x_2, ..., x_(n+1)}, A = {x_(n+1)}, and B = {x_1, x_2, ..., x_n} we recover the on-line algorithm.
Thinking outside the box: Use the median instead. It's much easier to calculate - there are tons of algorithms out there (e.g. using queues), you can often construct good arguments as to why it's more meaningful for data sets (less swayed by extreme values; etc) and you will have zero problems with numerical accuracy. It will be fast and efficient. Plus, for large data sets (which it sounds like you have), unless the distributions are truly weird, the values for the mean and median will be similar.
When you split the numbers into sets you're just dividing by the total number, or am I missing something?
You have written it as
((1/3 + 2/3 + 3/3) / 2) + ((4/3 + 5/3 + 6/3) / 2)
but that's just
(1/6 + 2/6 + 3/6) + (4/6 + 5/6 + 6/6)
so for the numbers from 1 to 7 one possible grouping is just
(1/7 + 2/7 + 3/7) + (4/7 + 5/7 + 6/7) + (7/7)
Average of x_1 .. x_N
= (Sum(i=1,N,x_i)) / N
= (Sum(i=1,M,x_i) + Sum(i=M+1,N,x_i)) / N
= (Sum(i=1,M,x_i)) / N + (Sum(i=M+1,N,x_i)) / N
This can be repeatedly applied, and is true regardless of whether the summations are of equal size. So:
Keep adding terms until both:
adding another one will overflow (or otherwise lose precision)
dividing by N will not underflow
Divide the sum by N
Add the result to the average-so-far
There's one obvious awkward case, which is that there are some very small terms at the end of the sequence, such that you run out of values before you satisfy the condition "dividing by N will not underflow". In which case just discard those values - if their contribution to the average cannot be represented in your floating type, then it is in particular smaller than the precision of your average. So it doesn't make any difference to the result whether you include those terms or not.
There are also some less obvious awkward cases to do with loss of precision on individual summations. For example, what's the average of the values:
10^100, 1, -10^100
Mathematics says it's 1/3 (the sum is 1), but floating-point arithmetic says it depends what order you add up the terms, and in 4 of the 6 possibilities it's 0, because (10^100) + 1 = 10^100. But I think that the non-associativity of floating-point addition is a different and more general problem than this question. If sorting the input is out of the question, I think there are things you can do where you maintain lots of accumulators of different magnitudes, and add each new value to whichever one of them will give best precision. But I don't really know.
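The order dependence is easy to reproduce. The snippet below is a small Python illustration of my own; it uses explicit additions so the evaluation order is unambiguous, and math.fsum from the standard library, which tracks exact partial sums internally, as one existing tool in the spirit of the multiple-accumulator idea.

```python
import math

big = 1e100

absorbed = (big + 1.0) + -big   # the 1.0 is lost when added to big first
kept = (big + -big) + 1.0       # the big terms cancel before 1.0 is added

print(absorbed)                       # 0.0
print(kept)                           # 1.0
print(math.fsum([big, 1.0, -big]))    # 1.0, independent of the order
```

Dividing each result by 3 gives the corresponding average, so the naive orderings yield either 0 or 1/3.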
Here's another approach. You're 'receiving' numbers one-by-one from some source, but you can keep track of the mean at each step.
First, I will write out the formula for mean at step n+1:
mean[n+1] = mean[n] - (mean[n] - x[n+1]) / (n+1)
with the initial condition:
mean[1] = x[1]
(the index starts at one, so mean[n] is the mean of the first n values).
The first equation can be simplified to:
mean[n+1] = n * mean[n] / (n+1) + x[n+1]/(n+1)
The idea is that you keep track of the mean, and when you 'receive' the next value in your sequence, you figure out its offset from the current mean, and divide it equally between the n+1 samples seen so far, and adjust your mean accordingly. If your numbers don't have a lot of variance, your running mean will need to be adjusted very slightly with the new numbers as n becomes large.
Obviously, this method works even if you don't know the total number of values when you start. It has an additional advantage that you know the value of the current mean at all times. One disadvantage that I can think of is that it probably gives more 'weight' to the numbers seen in the beginning (not in a strict mathematical sense, but because of floating point representations).
Finally, all such calculations are bound to run into floating-point 'errors' if one is not careful enough. See my answer to another question for some of the problems with floating point calculations and how to test for potential problems.
As a test, I generated N=100000 normally distributed random numbers with mean zero and variance 1. Then I calculated their mean by three methods.
sum(numbers) / N, call it m1,
my method above, call it m2,
sort the numbers, and then use my method above, call it m3.
Here's what I found: m1 − m2 ∼ −4.6×10^-17, m1 − m3 ∼ −3×10^-15, m2 − m3 ∼ −3×10^-15. So, if your numbers are sorted, the error might not be small enough for you. (Note however that even the worst error is about 10^-15 parts in 1 for 100000 numbers, so it might be good enough anyway.)
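For anyone who wants to repeat an experiment of this kind, a rough Python sketch follows. It is not the code that produced the numbers above: the seed, the random number generator, and the helper name `incremental_mean` are all my own assumptions, so the exact differences will not match, only their general size.

```python
import random

def incremental_mean(values):
    """Mean via the running update, without forming the full sum."""
    mean = 0.0
    for n, x in enumerate(values, start=1):
        mean -= (mean - x) / n
    return mean

random.seed(12345)  # arbitrary seed, chosen only for repeatability
N = 100_000
numbers = [random.gauss(0.0, 1.0) for _ in range(N)]

m1 = sum(numbers) / N                    # plain sum, then divide
m2 = incremental_mean(numbers)           # running mean, original order
m3 = incremental_mean(sorted(numbers))   # running mean, sorted input

print(m1 - m2, m1 - m3, m2 - m3)
```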
Some of the mathematical solutions here are very good. Here's a simple technical solution.
Use a larger data type. This breaks down into two possibilities:
Use a high-precision floating point library. One who encounters a need to average a billion numbers probably has the resources to purchase, or the brain power to write, a 128-bit (or longer) floating point library.
I understand the drawbacks here. It would certainly be slower than using intrinsic types. You still might over/underflow if the number of values grows too high. Yada yada.
If your values are integers or can be easily scaled to integers, keep your sum in a list of integers. When one integer overflows, simply carry into another. This is essentially a simplified implementation of the first option. A simple (untested) example in C# follows:
class BigMeanSet{
    // Multi-word running sum; list[0] is the least significant 32-bit word.
    List<uint> list = new List<uint>();
    public double GetAverage(IEnumerable<uint> values){
        list.Clear();
        list.Add(0);
        uint count = 0;
        foreach(uint value in values){
            Add(0, value);
            count++;
        }
        return DivideBy(count);
    }
    // Adds value to the word at listIndex, carrying into the next word on overflow.
    void Add(int listIndex, uint value){
        if((list[listIndex] += value) < value){ // then overflow has occurred
            if(list.Count == listIndex + 1)
                list.Add(0);
            Add(listIndex + 1, 1);
        }
    }
    // Long division of the multi-word sum by count, most significant word first.
    double DivideBy(uint count){
        const double shift = 4.0 * 1024 * 1024 * 1024; // 2^32
        double rtn = 0;
        long remainder = 0;
        for(int i = list.Count - 1; i >= 0; i--){
            rtn *= shift;
            remainder <<= 32;
            rtn += Math.DivRem(remainder + list[i], count, out remainder);
        }
        rtn += remainder / (double)count;
        return rtn;
    }
}
Like I said, this is untested—I don't have a billion values I really want to average—so I've probably made a mistake or two, especially in the DivideBy function, but it should demonstrate the general idea.
This should provide as much accuracy as a double can represent and should work for any number of 32-bit elements, up to 2^32 - 1. If more elements are needed, then the count variable will need to be expanded and the DivideBy function will increase in complexity, but I'll leave that as an exercise for the reader.
In terms of efficiency, it should be as fast or faster than any other technique here, as it only requires iterating through the list once, only performs one division operation (well, one set of them), and does most of its work with integers. I didn't optimize it, though, and I'm pretty certain it could be made slightly faster still if necessary. Ditching the recursive function call and list indexing would be a good start. Again, an exercise for the reader. The code is intended to be easy to understand.
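As an aside on ditching the recursive call, here is a sketch of the same multi-word idea with the carry handled in a loop. It is in Python rather than C#, purely for illustration. Python integers never overflow, so the 32-bit words are simulated with a mask, and the names `add`, `divide_by` and `big_mean` are hypothetical, not part of the class above.

```python
MASK = 0xFFFFFFFF  # simulate 32-bit unsigned words

def add(limbs, value):
    """Add value into limbs (least significant word first), carrying in a loop."""
    index = 0
    carry = value
    while carry:
        if index == len(limbs):
            limbs.append(0)
        total = limbs[index] + carry
        limbs[index] = total & MASK   # low 32 bits stay in this word
        carry = total >> 32           # anything above moves to the next word
        index += 1

def divide_by(limbs, count):
    """Long division of the multi-word sum by count, most significant word first."""
    result = 0.0
    remainder = 0
    for limb in reversed(limbs):
        result *= 2.0 ** 32
        quotient, remainder = divmod((remainder << 32) + limb, count)
        result += quotient
    return result + remainder / count

def big_mean(values):
    limbs = [0]
    count = 0
    for value in values:
        add(limbs, value)
        count += 1
    return divide_by(limbs, count)

print(big_mean([1, 2, 3, 4]))         # 2.5
print(big_mean([0xFFFFFFFF] * 4))     # 4294967295.0
```

The second call forces carries into a second word, which is the case a single 32-bit sum could not handle.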
If anybody more motivated than I am at the moment feels like verifying the correctness of the code, and fixing whatever problems there might be, please be my guest.
I've now tested this code, and made a couple of small corrections (a missing pair of parentheses in the List<uint> constructor call, and an incorrect divisor in the final division of the DivideBy function).
I tested it by first running it through 1000 sets of random length (ranging between 1 and 1000) filled with random integers (ranging between 0 and 2^32 - 1). These were sets for which I could easily and quickly verify accuracy by also running a canonical mean on them.
I then tested with 100* large series, with random length between 10^5 and 10^9. The lower and upper bounds of these series were also chosen at random, constrained so that the series would fit within the range of a 32-bit integer. For any series, the results are easily verifiable as (lowerbound + upperbound) / 2.
*Okay, that's a little white lie. I aborted the large-series test after about 20 or 30 successful runs. A series of length 10^9 takes just under a minute and a half to run on my machine, so half an hour or so of testing this routine was enough for my tastes.
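The shortcut used for the large-series check rests on the fact that the mean of consecutive integers is the midpoint of the first and last. A tiny Python illustration, with made-up bounds and a hypothetical helper name `series_mean`:

```python
def series_mean(lowerbound, upperbound):
    """Brute-force mean of the integers lowerbound..upperbound, inclusive."""
    count = upperbound - lowerbound + 1
    return sum(range(lowerbound, upperbound + 1)) / count

print(series_mean(10, 20), (10 + 20) / 2)  # 15.0 15.0
print(series_mean(3, 8), (3 + 8) / 2)      # 5.5 5.5
```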
For those interested, my test code is below:
static IEnumerable<uint> GetSeries(uint lowerbound, uint upperbound){
    for(uint i = lowerbound; i <= upperbound; i++)
        yield return i;
}
static void Test(){
    Console.BufferHeight = 1200;
    Random rnd = new Random();
    for(int i = 0; i < 1000; i++){
        uint[] numbers = new uint[rnd.Next(1, 1000)];
        for(int j = 0; j < numbers.Length; j++)
            numbers[j] = (uint)rnd.Next();
        double sum = 0;
        foreach(uint n in numbers)
            sum += n;
        double avg = sum / numbers.Length;
        double ans = new BigMeanSet().GetAverage(numbers);
        Console.WriteLine("{0}: {1} - {2} = {3}", numbers.Length, avg, ans, avg - ans);
        if(avg != ans)
            Debugger.Break();
    }
    for(int i = 0; i < 100; i++){
        uint length = (uint)rnd.Next(100000, 1000000001);
        uint lowerbound = (uint)rnd.Next(int.MaxValue - (int)length);
        uint upperbound = lowerbound + length;
        double avg = ((double)lowerbound + upperbound) / 2;
        double ans = new BigMeanSet().GetAverage(GetSeries(lowerbound, upperbound));
        Console.WriteLine("{0}: {1} - {2} = {3}", length, avg, ans, avg - ans);
        if(avg != ans)
            Debugger.Break();
    }
}