Exponent Binary Numbers

Could someone tell me the logic behind exponentiating binary numbers? For example, I want to take 110^10, but I don't know the logic behind it. If someone could supply me with that, it'd be a great help. (And I want it to be done in pure binary with no conversions and no looping multiplication. Just logic...)

peenut is correct in that exponentiation doesn't care what base you're representing your numbers in, and I don't know what you mean by "just logic," but here's a stab at it.
A quick search over at Wikipedia reveals this algorithm (exponentiation by squaring). The basic idea is to square your base, store the result, and then square the result and repeat. This will give you the factors of your answer, which you can then multiply together. I think of it as a "binary search"-flavored exponentiation algorithm, since you can skip a lot of intermediate steps by squaring and storing.
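A short worked illustration of that square-and-store idea (my own numbers, using the 6^6 = 110^110 case that comes up below): squaring repeatedly gives 6^2 = 36 and 6^4 = 36^2 = 1296; the exponent 6 is 110 in binary, so only the "2" and "4" factors are needed, and 6^6 = 6^4 * 6^2 = 1296 * 36 = 46656.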

Binary exponents are very easy. They are simply additions and shifts only.
the number 110 is where you start.
Working backwards from the number 10 - (i.e. 0) - it's a zero, so this means "do not add it in."
Now you shift left - so 110 becomes 1100
Now you work on the next bit of the 10 (i.e. 1) - it's a one, so this means "add this to the result" - it's 0 so far, because we didn't already add it, so the result is now 1100
there are no more bits to do - so the answer is 1100
If you were doing 110^110 - you would have one more to do - so - you again shift and get 11000 now.
The last bit is again a one, so now you add:
1100 +
11000 =
100100
110^10=1100 i.e. 6^2=12
110^110=100100 i.e. 6^6=36

Exponentiation is an operation that is independent of the actual textual representation of a number (e.g. base 2 - binary, base 10 - decimal).
Maybe you want to ask about the binary XOR (eXclusive OR) operation?

Unfortunately the easiest way for your computer to handle simple exponents is your "looping multiplication" (or the naïve approach), which is the most rudimentary (and literal) way of handling it. As #user1561358 commented, it is NOT just binary adds and shifts. That is multiplication. To raise 6^6 (110^110) the naïve approach has you multiplying the base n times (as below):
110
x 110
--------------
100100 = 36
x 110
--------------
11011000 = 216
x 110
--------------
10100010000 = 1296
x 110
--------------
1111001100000 = 7776
x 110
--------------
01011011001000000 = 46656
The simple recursive code for exponentiation is elegant for most applications:
long long binpow(long long a, long long b) {
    if (b == 0)
        return 1;
    long long res = binpow(a, b / 2);
    if (b % 2)
        return res * res * a;
    else
        return res * res;
}
For larger or arbitrary exponents you can dramatically reduce the number of calculations by applying Horner's Method, explained in great detail in this video specifically calculating binary exponents.
In essence, you only multiply in the factors whose exponent bits are non-zero. Let's look at 110^110 in binary, i.e. 6^6:
110^110 breaks down into the following exponents:
There is no "1" bit set, so 6^1 won't be multiplied in, but we do have the two and four bits to calculate:
6^10 (binary) = 6^2 = 36
6^100 (binary) = 6^4 = 1296
So, 6^6 = 36 x 1296 = 46656
The above code can be modified only slightly to check for non-zero exponent bits with a while loop:
long long binpow(long long a, long long b) {
    long long res = 1;
    while (b > 0) {
        if (b & 1)
            res = res * a;
        a = a * a;
        b >>= 1;
    }
    return res;
}
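A quick usage check of the function above (my addition; note that long long overflows for big results, so the 7^256 example below would need a big-integer type):
binpow(6, 2);   // returns 36,    i.e. 110^10   in binary
binpow(6, 6);   // returns 46656, i.e. 110^110  in binary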
To really see the advantage of this let's try the binary exponentiation of
111^100000000 in binary, which is 7^256.
The naïve approach would require us to make 256 multiplication iterations!
Instead, all the exponent bits except the one for 2^8 = 256 are zero, so the multiply into res is skipped on every pass of the while loop except the last one; the work collapses to repeatedly squaring a (a = a * a), about 8 times in total:
111^100000000 (binary) = 7^256, a 719-bit binary number beginning with 11001101011....
7^256 = 2213595400046048155450188615474945937162517050260073069916366390524704974007989996848003433837940380782794455262312607598867363425940560014856027866381946458951205837379116473663246733509680721264246243189632348313601

Related

Bitwise comparison for 16 bitstrings

I have 16 unrelated binary strings (of the same length), e.g. 100000001010, 010100010010 and so on, and I need to find a bitstring in which position x is a 1 IF position x is 1 for AT LEAST 2 of the 16 bitstrings.
Initially, I tried using bitwise XOR and this works great as long as an even number of strings contain a 1, but when an odd number of strings contain a 1, the answer given is reversed.
A simple example (with 3 strings) would be:
A: 10101010
B: 01010111
C: 11011011
f(A,B,C)= answer
Expected answer: 11011011
Answer I'm getting right now: 11011001
I know I'm wrong somewhere but I'm at a loss on how to proceed
Help much appreciated
You can do something like
unsigned once = x[0], twice = 0;
for (int i = 1; i < 16; ++i) {
    twice |= once & x[i];
    once |= x[i];
}
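After the loop, twice has a 1 in exactly those positions that were set in at least two of the inputs. As a quick check against the 3-string example above (a sketch of mine, with the loop bound changed from 16 to 3):
#include <cassert>

int main() {
    // The three example strings from the question, as binary literals.
    unsigned x[3] = {0b10101010, 0b01010111, 0b11011011};
    unsigned once = x[0], twice = 0;
    for (int i = 1; i < 3; ++i) {   // 3 strings here instead of 16
        twice |= once & x[i];       // set where a bit was seen before and is seen again
        once  |= x[i];              // set where a bit has been seen at least once
    }
    assert(twice == 0b11011011u);   // matches the expected answer above
}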
For exactly three strings, the "at least two" condition is the majority function (A AND B) OR (A AND C) OR (B AND C).
This has higher complexity than what you had originally.

Addition as binary operations

I'm adding a pair of unsigned 32-bit binary integers (including overflow). The addition is expressive rather than actually computed, so there's no need for an efficient algorithm, but since each component is manually specified in terms of individual bits, I need one with a compact representation. Any suggestions?
Edit: In terms of boolean operators. So I'm thinking that carry = a & b; sum = a ^ b; for the first bit, but what about the other 31?
Oh, and subtraction!
You cannot perform addition with simple boolean operators; you need an adder. (Of course the adder can be built using some more complex boolean operators.)
The adder adds two bits plus carry, and passes carry out to next bit.
Pseudocode:
carry = 0
for i = 31 to 0
    sum = a[i] + b[i] + carry
    result[i] = sum & 1
    carry = sum >> 1
next i
Here is an implementation using the macro language of VEDIT text editor.
The two numbers to be added are given as ASCII strings, one on each line.
The results are inserted on the third line.
Reg_Empty(10)                // result as ASCII string
#0 = 0                       // carry bit
for (#9=31; #9>=0; #9--) {
    #1 = CC(#9)-'0'          // a bit from first number
    #2 = CC(#9+34)-'0'       // a bit from second number
    #3 = #0+#1+#2            // add with carry
    #4 = #3 & 1              // resulting bit
    #0 = #3 >> 1             // new carry
    Num_Str(#4, 11, LEFT)    // convert bit to ASCII
    Reg_Set(10, #11, INSERT) // insert bit to start of string
}
Line(2)
Reg_Ins(10) IN
Return
Example input and output:
00010011011111110101000111100001
00110110111010101100101101110111
01001010011010100001110101011000
Edit:
Here is pseudocode where the adder has been implemented with boolean operations:
carry = 0
for i = 31 to 0
    sum[i] = a[i] ^ b[i] ^ carry
    carry = (a[i] & b[i]) | (a[i] & carry) | (b[i] & carry)
next i
Perhaps you can begin by stating addition for two 1-bit numbers, with overflow (=carry):
A | B | SUM | CARRY
===================
0 | 0 |  0  |   0
0 | 1 |  1  |   0
1 | 0 |  1  |   0
1 | 1 |  0  |   1
To generalize this further, you need a "full adder" which also takes a carry as an input, from the preceding stage. Then you can express the 32-bit addition as a chain of 32 such full adders (with the first stage's carry input tied to 0).
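A minimal C++ sketch of such a chain (my own illustration, not from the answer): each loop iteration is one full adder, and the carry out of bit 31 is simply dropped, which gives the usual wrap-around overflow behaviour.
#include <cstdint>

// Ripple-carry addition of two 32-bit values: a chain of 32 full adders,
// using only boolean operations per bit. Overflow wraps, as requested.
uint32_t add32(uint32_t a, uint32_t b) {
    uint32_t result = 0, carry = 0;
    for (int i = 0; i < 32; ++i) {
        uint32_t ai = (a >> i) & 1, bi = (b >> i) & 1;
        uint32_t sum = ai ^ bi ^ carry;                   // full-adder sum bit
        carry = (ai & bi) | (ai & carry) | (bi & carry);  // full-adder carry out
        result |= sum << i;
    }
    return result;                                        // carry out of bit 31 is discarded
}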
Regarding the data structure used to represent these numbers, there are 4 common options:
1) Bit Array
A bit array is an array data structure that compactly stores individual bits.
They are also known as bitmap, bitset or bitstring.
2) Bit Field
A bit field is a common idiom used in computer programming to compactly store multiple logical values as a short series of bits where each of the single bits can be addressed separately.
3) Bit Plane
A bit plane of a digital discrete signal (such as image or sound) is a set of bits corresponding to a given bit position in each of the binary numbers representing the signal.
4) Bit Board
A bitboard or bit field is a format that stuffs a whole group of related boolean variables into the same integer, typically representing positions on a board game.
Regarding implementation, you can check that at each step we have the following:
S = a xor b xor c
where S is the sum bit for the current bits a and b, and c is the input carry.
The output carry is Cout = (a & b) xor (c & (a xor b)).

Translation from Complex-FFT to Finite-Field-FFT

Good afternoon!
I am trying to develop an NTT algorithm based on the naive recursive FFT implementation I already have.
Consider the following code (coefficients' length, let it be m, is an exact power of two):
/// <summary>
/// Calculates the result of the recursive Number Theoretic Transform.
/// </summary>
/// <param name="coefficients"></param>
/// <returns></returns>
private static BigInteger[] Recursive_NTT_Skeleton(
    IList<BigInteger> coefficients,
    IList<BigInteger> rootsOfUnity,
    int step,
    int offset)
{
    // Calculate the length of vectors at the current step of recursion.
    // -
    int n = coefficients.Count / step - offset / step;

    if (n == 1)
    {
        return new BigInteger[] { coefficients[offset] };
    }

    BigInteger[] results = new BigInteger[n];

    IList<BigInteger> resultEvens =
        Recursive_NTT_Skeleton(coefficients, rootsOfUnity, step * 2, offset);
    IList<BigInteger> resultOdds =
        Recursive_NTT_Skeleton(coefficients, rootsOfUnity, step * 2, offset + step);

    for (int k = 0; k < n / 2; k++)
    {
        BigInteger bfly = (rootsOfUnity[k * step] * resultOdds[k]) % NTT_MODULUS;

        results[k]         = (resultEvens[k] + bfly) % NTT_MODULUS;
        results[k + n / 2] = (resultEvens[k] - bfly) % NTT_MODULUS;
    }

    return results;
}
It worked for complex FFT (replace BigInteger with a complex numeric type (I had my own)). It doesn't work here even though I changed the procedure of finding the primitive roots of unity appropriately.
Supposedly, the problem is this: rootsOfUnity parameter passed originally contained only the first half of m-th complex roots of unity in this order:
omega^0 = 1, omega^1, omega^2, ..., omega^(n/2)
It was enough, because on these three lines of code:
BigInteger bfly = (rootsOfUnity[k * step] * resultOdds[k]) % NTT_MODULUS;
results[k] = (resultEvens[k] + bfly) % NTT_MODULUS;
results[k + n / 2] = (resultEvens[k] - bfly) % NTT_MODULUS;
I originally made use of the fact, that at any level of recursion (for any n and i), the complex root of unity -omega^(i) = omega^(i + n/2).
However, that property obviously doesn't hold in finite fields. But is there any analogue of it which would allow me to still compute only the first half of the roots?
Or should I extend the cycle from n/2 to n and pre-compute all the m-th roots of unity?
Maybe there are other problems with this code?..
Thank you very much in advance!
I recently wanted to implement NTT for fast multiplication instead of DFFT too. I read a lot of confusing things, with different letters everywhere and no simple solution, and my finite-fields knowledge is rusty, but today I finally got it right (after 2 days of trying and drawing analogies with DFT coefficients), so here are my insights for NTT:
Computation
X(i) = sum(j=0..n-1) of ( Wn^(i*j)*x(j) );
where X[] is the NTT-transformed x[] of size n and Wn is the NTT basis. All computations are integer modular arithmetic mod p; no complex numbers anywhere.
Important values
Wn = r ^ L mod p is basis for NTT
Wn = r ^ (p-1-L) mod p is basis for INTT
Rn = n ^ (p-2) mod p is scaling multiplicative constant for INTT ~(1/n)
p is a prime such that p mod n == 1 and p > max'
max is the maximum value of x[i] for NTT or X[i] for INTT
r = <1,p)
L = <1,p) and also divides p-1
r,L must be chosen so that r^(L*i) mod p == 1 if i=0 or i=n
r,L must be chosen so that r^(L*i) mod p != 1 if 0 < i < n
max' is the maximum sub-result value and depends on n and the type of computation. For a single (I)NTT it is max' = n*max, but for convolution of two n-sized vectors it is max' = n*max*max, etc. See Implementing FFT over finite fields for more info about it.
The working combination of r,L,p is different for different n.
This is important: you have to recompute or select parameters from a table before each NTT layer (n is always half of the previous recursion's n).
Here is my C++ code that finds the r,L,p parameters (it needs modular arithmetic routines which are not included; you can replace them with (a+b)%c, (a-b)%c, (a*b)%c, ... but in that case beware of overflows, especially for modpow and modmul; a minimal sketch of these helpers follows the parameter-finding code below). The code is not optimized yet; there are ways to speed it up considerably. Also the prime table is fairly limited, so either use a Sieve of Eratosthenes or any other algorithm to obtain primes up to max' in order to work safely.
DWORD _arithmetics_primes[]=
{
2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,
179,181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,383,389,397,401,409,
419,421,431,433,439,443,449,457,461,463,467,479,487,491,499,503,509,521,523,541,547,557,563,569,571,577,587,593,599,601,607,613,617,619,631,641,643,647,653,659,
661,673,677,683,691,701,709,719,727,733,739,743,751,757,761,769,773,787,797,809,811,821,823,827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,
947,953,967,971,977,983,991,997,1009,1013,1019,1021,1031,1033,1039,1049,1051,1061,1063,1069,1087,1091,1093,1097,1103,1109,1117,1123,1129,1151,
0}; // end of table is 0, the more primes are there the bigger numbers and n can be used
// compute NTT consts W=r^L%p for n
int i,j,k,n=16;
long w,W,iW,p,r,L,l,e;
long max=81*n; // edit1: max num for NTT for my multiplication purposes
for (e=1,j=0;e;j++) // find prime p that p%n=1 AND p>max ... 9*9=81
{
    p=_arithmetics_primes[j];
    if (!p) break;
    if ((p>max)&&(p%n==1))
        for (r=2;r<p;r++) // check all r
        {
            for (l=1;l<p;l++) // all l that divide p-1
            {
                L=(p-1);
                if (L%l!=0) continue;
                L/=l;
                W=modpow(r,L,p);
                e=0;
                for (w=1,i=0;i<=n;i++,w=modmul(w,W,p))
                {
                    if ((i==0)&&(w!=1)) { e=1; break; }
                    if ((i==n)&&(w!=1)) { e=1; break; }
                    if ((i>0)&&(i<n)&&(w==1)) { e=1; break; }
                }
                if (!e) break;
            }
            if (!e) break;
        }
}
if (e) { error; } // error: no combination r,l,p for n found
W =modpow(r,    L,p); // Wn for NTT
iW=modpow(r,p-1-L,p); // Wn for INTT
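For completeness, here is one possible minimal sketch of the modular helpers the code above assumes (my own version, not part of the original answer; it is safe as long as the modulus fits in 32 bits, so 64-bit intermediates cannot overflow):
// Minimal modular-arithmetic helpers assumed above (sketch only).
long modadd(long a,long b,long m) { long r=a+b; return (r>=m)?r-m:r; }
long modsub(long a,long b,long m) { long r=a-b; return (r< 0)?r+m:r; }
long modmul(long a,long b,long m) { return (long)(((long long)a*b)%m); }
long modpow(long a,long b,long m)  // right-to-left binary exponentiation mod m
{
    long long r=1,x=a%m;
    for (;b>0;b>>=1)
    {
        if (b&1) r=(r*x)%m;
        x=(x*x)%m;
    }
    return (long)r;
}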
And here are my slow NTT and INTT implementations (I haven't got to fast NTT/INTT yet); they are both tested successfully with Schönhage–Strassen multiplication.
//---------------------------------------------------------------------------
void NTT(long *dst,long *src,long n,long m,long w)
{
    long i,j,wj,wi,a,n2=n>>1;
    for (wj=1,j=0;j<n;j++)
    {
        a=0;
        for (wi=1,i=0;i<n;i++)
        {
            a=modadd(a,modmul(wi,src[i],m),m);
            wi=modmul(wi,wj,m);
        }
        dst[j]=a;
        wj=modmul(wj,w,m);
    }
}
//---------------------------------------------------------------------------
void INTT(long *dst,long *src,long n,long m,long w)
{
    long i,j,wi=1,wj=1,rN,a,n2=n>>1;
    rN=modpow(n,m-2,m);
    for (wj=1,j=0;j<n;j++)
    {
        a=0;
        for (wi=1,i=0;i<n;i++)
        {
            a=modadd(a,modmul(wi,src[i],m),m);
            wi=modmul(wi,wj,m);
        }
        dst[j]=modmul(a,rN,m);
        wj=modmul(wj,w,m);
    }
}
//---------------------------------------------------------------------------
dst is destination array
src is source array
n is array size
m is modulus (p)
w is basis (Wn)
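A quick round-trip sanity check (my sketch, reusing n, p, W, iW from the parameter-finding code above): transforming and then inverse-transforming must return the original values, since INTT already applies the 1/n scaling.
long x[16],X[16],y[16];
for (i=0;i<16;i++) x[i]=i%10;  // sample data, all smaller than the modulus p
NTT (X,x,n,p, W);              // forward transform
INTT(y,X,n,p,iW);              // inverse transform
for (i=0;i<16;i++) if (y[i]!=x[i]) { /* parameters r,L,p are wrong */ }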
Hope this helps someone. If I forgot something, please write ...
[edit1: fast NTT/INTT]
Finally I managed to get fast NTT/INTT to work. It was a little bit trickier than normal FFT:
//---------------------------------------------------------------------------
void _NFTT(long *dst,long *src,long n,long m,long w)
{
    if (n<=1) { if (n==1) dst[0]=src[0]; return; }
    long i,j,a0,a1,n2=n>>1,w2=modmul(w,w,m);

    // reorder even,odd
    for (i=0,j=0;i<n2;i++,j+=2) dst[i]=src[j];
    for (    j=1;i<n ;i++,j+=2) dst[i]=src[j];

    // recursion
    _NFTT(src   ,dst   ,n2,m,w2); // even
    _NFTT(src+n2,dst+n2,n2,m,w2); // odd

    // restore results
    for (w2=1,i=0,j=n2;i<n2;i++,j++,w2=modmul(w2,w,m))
    {
        a0=src[i];
        a1=modmul(src[j],w2,m);
        dst[i]=modadd(a0,a1,m);
        dst[j]=modsub(a0,a1,m);
    }
}
//---------------------------------------------------------------------------
void _INFTT(long *dst,long *src,long n,long m,long w)
{
    long i,rN;
    rN=modpow(n,m-2,m);
    _NFTT(dst,src,n,m,w);
    for (i=0;i<n;i++) dst[i]=modmul(dst[i],rN,m);
}
//---------------------------------------------------------------------------
[edit3]
I have optimized my code (3x faster than the code above), but I am still not satisfied with it, so I started a new question about it. There I have optimized my code even further (about 40x faster than the code above), so it is almost the same speed as a floating-point FFT of the same bit size. The link is here:
Modular arithmetics and NTT (finite field DFT) optimizations
To turn the Cooley-Tukey (complex) FFT into a modular-arithmetic approach, i.e. NTT, you must replace the complex definition of omega. For the approach to be purely recursive, you also need to recalculate omega for each level based on the current signal size. This is possible because the minimum suitable modulus decreases as we move down the call tree, so the modulus used for the root is suitable for the lower layers. Additionally, as we are using the same modulus, the same generator may be used as we move down the call tree. Also, for the inverse transform, you should take an additional step: take the recalculated omega a and instead use b = a^-1 as omega (via the modular inverse). Specifically, b = invMod(a, N) such that b * a == 1 (mod N), where N is the chosen prime modulus.
Rewriting an expression involving omega by exploiting periodicity still works in the modular-arithmetic realm. You also need a way to determine the modulus (a prime) for the problem and a valid generator.
We note that your code works, though it is not a MWE (minimal working example). We extended it using common sense and got the correct result for a polynomial multiplication application. You just have to provide correct values of omega raised to certain powers.
While your code works, like many other sources you double the spacing for each level. This does not lead to recursion that is quite as clean, but it turns out to be identical to recalculating omega based on the current signal size, because the power in the omega definition is inversely proportional to the signal size. To reiterate: halving the signal size is like squaring omega, which is like doubling the powers of omega (which is what doubling the spacing does). The nice thing about recalculating omega is that each subproblem is more cleanly complete in its own right.
There is a paper by Baktir and Sunar from 2006 that shows some of the math for the modular approach; see the reference at the end of this post.
You do not need to extend the cycle from n / 2 to n.
So, yes, some sources which say to just drop in a different omega definition for the modular-arithmetic approach are sweeping many details under the rug.
Another issue is that the signal size must be large enough that the resulting time-domain signal does not overflow when we are performing convolution. Additionally, it is useful to know that fast implementations of modular exponentiation exist, even when the power is quite large.
References
Baktir and Sunar - Achieving efficient polynomial multiplication in Fermat fields using the fast Fourier transform (2006)
You must make sure that roots of unity actually exist. In R there are only 2 roots of unity: 1 and -1, since only for them can x^n = 1 be true.
In C you have infinitely many roots of unity: w = exp(2*pi*i/N) is a primitive N-th root of unity, and all w^k for 0 <= k < N are N-th roots of unity.
Now to your problem: you have to make sure the ring you're working in offers the same property: enough roots of unity.
Schönhage and Strassen (http://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm) use integers modulo 2^N+1. This ring has enough roots of unity. 2^N == -1 is a 2nd root of unity, 2^(N/2) is a 4th root of unity and so on. Furthermore, these roots of unity have the advantage that they are powers of two and can be implemented as binary shifts (with a modulo operation afterwards, which comes down to an add/subtract).
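A tiny numeric illustration of that (my own, with N = 4, so the ring is the integers modulo 2^4 + 1 = 17): here 2^4 = 16 ≡ -1 (mod 17), so 2 is an 8th root of unity, and multiplying by any power of 2 is just a left shift followed by reduction mod 17.
#include <cstdio>

int main() {
    // In Z/(2^4+1) = Z/17 the element 2 is a primitive 8th root of unity:
    // its powers run 1,2,4,8,16,15,13,9 and come back to 1 after 8 steps.
    int p = 17, w = 1;
    for (int k = 0; k <= 8; ++k) {
        printf("2^%d mod 17 = %d\n", k, w);
        w = (w * 2) % p;   // "multiply by 2" = shift left, then reduce mod 17
    }
}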
I think QuickMul (http://www.cs.nyu.edu/exact/doc/qmul.ps) works modulo 2^N-1.

Howto convert decimal (xx.xx) to binary

This isn't necessarily a programming question, but I'm sure you folks know how to do it. How would I convert floating point numbers into binary?
The number I am looking at is 27.625.
27 would be 11011, but what do I do with the .625?
On paper, a good algorithm to convert the fractional part of a decimal number is the "repeated multiplication by 2" algorithm (see details at http://www.exploringbinary.com/base-conversion-in-php-using-bcmath/, under the heading "dec2bin_f()"). For example, 0.8125 converts to binary as follows:
1. 0.8125 * 2 = 1.625
2. 0.625 * 2 = 1.25
3. 0.25 * 2 = 0.5
4. 0.5 * 2 = 1.0
The integer parts are stripped off and saved at each step, forming the binary result: 0.1101.
If you want a tool to do these kinds of conversions automatically, see my decimal/binary converter.
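Applied to the 27.625 from the question, the same steps give: 0.625 * 2 = 1.25 (write 1), 0.25 * 2 = 0.5 (write 0), 0.5 * 2 = 1.0 (write 1), so 27.625 = 11011.101 in binary. A small sketch of the algorithm in C++ (my own illustration, not the converter mentioned above):
#include <iostream>
#include <string>

// Convert the fractional part of a value to binary by repeated multiplication
// by 2, emitting at most maxBits digits (stops early if the fraction hits 0).
std::string fractionToBinary(double frac, int maxBits = 16) {
    std::string bits;
    while (frac > 0.0 && (int)bits.size() < maxBits) {
        frac *= 2.0;
        if (frac >= 1.0) { bits += '1'; frac -= 1.0; }
        else             { bits += '0'; }
    }
    return bits;
}

int main() {
    std::cout << "27.625 -> 11011." << fractionToBinary(0.625) << "\n"; // 11011.101
}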
Assuming you are not thinking about inside a PC, just thinking about binary vs decimal as physically represented on a piece of paper:
You know .1 in binary should be .5 in decimal, so the .1's place is worth .5 (1/2)
the .01 is worth .25 (1/4) (half of the previous one)
the .001 is worth (1/8) (Half of 1/4)
Notice how the denominator is progressing just like the whole numbers to the left of the decimal point do (the standard powers-of-2 pattern)? The next should be 1/16...
So you start with your .625, is it higher than .5? Yes, so set the first bit and subtract the .5
.1 binary with a decimal remainder of .125
Now you have the next spot; it's worth .25 decimal. Is that less than or equal to your current remainder of .125? No, so you don't have enough decimal "money" to buy that second spot; it has to be a 0.
.10 binary, still .125 remainder.
Now go to the third position, etc. (Hint: I don't think there will be too much etc.)
There are several different ways to encode a non-integral number in binary. By far the most common type are floating point representations, especially the one codified in IEEE 754.
The code below works for me; you can use it to convert any type of double value:
private static String doubleToBinaryString( double n ) {
    String val = Integer.toBinaryString((int)n)+"."; // Setting up string for result
    String newN = "0" + (""+n).substring((""+n).indexOf("."));
    n = Double.parseDouble(newN);
    while ( n > 0 ) {        // While the fraction is greater than zero (not equal or less than zero)
        double r = n * 2;    // Multiply current fraction (n) by 2
        if( r >= 1 ) {       // If the ones-place digit >= 1
            val += "1";      // Concat a "1" to the end of the result string (val)
            n = r - 1;       // Remove the 1 from the current fraction (n)
        } else {             // If the ones-place digit == 0
            val += "0";      // Concat a "0" to the end of the result string (val)
            n = r;           // Set the current fraction (n) to the new fraction
        }
    }
    return val;              // return the string result with all appended binary values
}

How to reduce calculation of average to sub-sets in a general way?

Edit: Since it appears nobody is reading the original question this links to, let me bring in a synopsis of it here.
The original problem, as asked by someone else, was that, given a large number of values, where the sum would exceed what a data type of Double would hold, how can one calculate the average of those values.
There were several answers that said to calculate in sets, like taking 50 and 50 numbers, calculating the average inside those sets, and then finally taking the average of all those sets and combining them to get the final average value.
My position was that unless you can guarantee that all those values can be split into a number of equally sized sets, you cannot use this approach. Someone dared me to ask the question here, in order to provide the answer, so here it is.
Basically, given an arbitrary number of values, where:
I know the number of values beforehand (but again, how would your answer change if you didn't?)
I cannot gather up all the numbers, nor can I sum them (the sum will be too big for a normal data type in your programming language)
how can I calculate the average?
The rest of the question here outlines how, and the problems with, the approach to split into equally sized sets, but I'd really just like to know how you can do it.
Note that I know perfectly well enough math to know that, in math theory terms, calculating the sum of A[1..N] divided by N will give me the average; let's assume that there are reasons it isn't just that simple, that I need to split up the workload, and that the number of values isn't necessarily going to be divisible by 3, 7, 50, 1000 or whatever.
In other words, the solution I'm after will have to be general.
From this question:
What is a good solution for calculating an average where the sum of all values exceeds a double’s limits?
my position was that splitting the workload up into sets is no good, unless you can ensure that the sizes of those sets are equal.
Edit: The original question was about the upper limit that a particular data type could hold, and since he was summing up a lot of numbers (count that was given as example was 10^9), the data type could not hold the sum. Since this was a problem in the original solution, I'm assuming (and this is a prerequisite for my question, sorry for missing that) that the numbers are too big to give any meaningful answers.
So, dividing by the total number of values directly is out. The original reason why a normal SUM/COUNT solution was out was that SUM would overflow, but let's assume, for this question, that SET-SUM/SET-SIZE will underflow, or whatever.
The important part is that I cannot simply sum, I cannot simply divide by the number of total values. If I cannot do that, will my approach work, or not, and what can I do to fix it?
Let me outline the problem.
Let's assume you're going to calculate the average of the numbers 1 through 6, but you cannot (for whatever reason) do so by summing the numbers, counting the numbers, and then dividing the sum by the count. In other words, you cannot simply do (1+2+3+4+5+6)/6.
In other words, SUM(1..6)/COUNT(1..6) is out. We're not considering NULL's (as in database NULL's) here.
Several of the answers to that question alluded to being able to split the numbers being averaged into sets, say 3 or 50 or 1000 numbers, then calculating some number for that, and then finally combining those values to get the final average.
My position is that this is not possible in the general case, since this will make some numbers, the ones appearing in the final set, more or less valuable than all the ones in the previous sets, unless you can split all the numbers into equally sized sets.
For instance, to calculate the average of 1-6, you can split it up into sets of 3 numbers like this:
/ 1 2 3 \ / 4 5 6 \
| - + - + - | + | - + - + - |
\ 3 3 3 / \ 3 3 3 / <-- 3 because 3 numbers in the set
---------- -----------
2 2 <-- 2 because 2 equally sized groups
Which gives you this:
2 5
- + - = 3.5
2 2
(note: (1+2+3+4+5+6)/6 = 3.5, so this is correct here)
However, my point is that once the number of values cannot be split into a number of equally sized sets, this method falls apart. For instance, what about the sequence 1-7, which contains a prime number of values.
Can a similar approach, that won't sum all the values, and count all the values, in one go, work?
So, is there such an approach? How do I calculate the average of an arbitrary number of values in which the following holds true:
I cannot do a normal sum/count approach, for whatever reason
I know the number of values beforehand (what if I don't, will that change the answer?)
Well, suppose you added three numbers and divided by three, and then added two numbers and divided by two. Can you get the average from these?
x = (a + b + c) / 3
y = (d + e) / 2
z = (f + g) / 2
And you want
r = (a + b + c + d + e + f + g) / 7
That is equal to
r = (3 * (a + b + c) / 3 + 2 * (d + e) / 2 + 2 * (f + g) / 2) / 7
r = (3 * x + 2 * y + 2 * z) / 7
Both lines above overflow, of course, but since division is distributive, we do
r = (3.0 / 7.0) * x + (2.0 / 7.0) * y + (2.0 / 7.0) * z
Which guarantees that you won't overflow, as I'm multiplying x, y and z by fractions less than one.
This is the fundamental point here. Neither I'm dividing all numbers beforehand by the total count, nor am I ever exceeding the overflow.
So... if you keep adding to an accumulator, keep track of how many numbers you have added, and always test if the next number will cause an overflow, you can then get partial averages, and compute the final average.
And no, if you don't know the values beforehand, it doesn't change anything (provided that you can count them as you sum them).
Here is a Scala function that does it. It's not idiomatic Scala, so that it can be more easily understood:
def avg(input: List[Double]): Double = {
  var partialAverages: List[(Double, Int)] = Nil
  var inputLength = 0
  var currentSum = 0.0
  var currentCount = 0
  var numbers = input

  while (numbers.nonEmpty) {
    val number = numbers.head
    val rest = numbers.tail
    if (number > 0 && currentSum > 0 && Double.MaxValue - currentSum < number) {
      partialAverages = (currentSum / currentCount, currentCount) :: partialAverages
      currentSum = 0
      currentCount = 0
    } else if (number < 0 && currentSum < 0 && Double.MinValue - currentSum > number) {
      partialAverages = (currentSum / currentCount, currentCount) :: partialAverages
      currentSum = 0
      currentCount = 0
    }
    currentSum += number
    currentCount += 1
    inputLength += 1
    numbers = rest
  }
  partialAverages = (currentSum / currentCount, currentCount) :: partialAverages

  var result = 0.0
  while (partialAverages.nonEmpty) {
    val ((partialSum, partialCount) :: rest) = partialAverages
    result += partialSum * (partialCount.toDouble / inputLength)
    partialAverages = rest
  }
  result
}
EDIT:
Won't multiplying with 2 and 3 get me back into the range of "not supported by the data type"?
No. If you were dividing by 7 at the end, absolutely. But here you are dividing at each step of the sum. Even in your real case the weights (2/7 and 3/7) would be in the range of manageable numbers (e.g. 1/10 ~ 1/10000), which wouldn't make a big difference compared to your weight (i.e. 1).
PS: I wonder why I'm working on this answer instead of writing mine where I can earn my rep :-)
If you know the number of values beforehand (say it's N), you just add 1/N + 2/N + 3/N etc, supposing that you had values 1, 2, 3. You can split this into as many calculations as you like, and just add up your results. It may lead to a slight loss of precision, but this shouldn't be an issue unless you also need a super-accurate result.
If you don't know the number of items ahead of time, you might have to be more creative. But you can, again, do it progressively. Say the list is 1, 2, 3, 4. Start with mean = 1. Then mean = mean*(1/2) + 2*(1/2). Then mean = mean*(2/3) + 3*(1/3). Then mean = mean*(3/4) + 4*(1/4) etc. It's easy to generalize, and you just have to make sure the bracketed quantities are calculated in advance, to prevent overflow.
Of course, if you want extreme accuracy (say, more than 0.001% accuracy), you may need to be a bit more careful than this, but otherwise you should be fine.
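A short sketch of that progressive update (my own illustration; the bracketed weights k/(k+1) and 1/(k+1) are formed first, so no large sum is ever accumulated):
#include <vector>

// Running mean: after processing k+1 values, 'mean' holds their average.
// Each step rescales the old mean by k/(k+1) and adds the new value
// weighted by 1/(k+1).
double progressiveMean(const std::vector<double>& values) {
    double mean = 0.0;
    for (std::size_t k = 0; k < values.size(); ++k) {
        double w = 1.0 / (k + 1);               // weight of the new value
        mean = mean * (1.0 - w) + values[k] * w;
    }
    return mean;
}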
Let X be your sample set. Partition it into two sets A and B in any way that you like. Define delta = m_B - m_A where m_S denotes the mean of a set S. Then
m_X = m_A + delta * |B| / |X|
where |S| denotes the cardinality of a set S. Now you can repeatedly apply this to partition and calculate the mean.
Why is this true? Let s = 1 / |A| and t = 1 / |B| and u = 1 / |X| (for convenience of notation) and let aSigma and bSigma denote the sum of the elements in A and B respectively so that:
m_A + delta * |B| / |X|
= s * aSigma + u * |B| * (t * bSigma - s * aSigma)
= s * aSigma + u * (bSigma - |B| * s * aSigma)
= s * aSigma + u * bSigma - u * |B| * s * aSigma
= s * aSigma * (1 - u * |B|) + u * bSigma
= s * aSigma * (u * |X| - u * |B|) + u * bSigma
= s * u * aSigma * (|X| - |B|) + u * bSigma
= s * u * aSigma * |A| + u * bSigma
= u * aSigma + u * bSigma
= u * (aSigma + bSigma)
= u * (xSigma)
= xSigma / |X|
= m_X
The proof is complete.
From here it is obvious how to use this to either recursively compute a mean (say by repeatedly splitting a set in half) or how to use this to parallelize the computation of the mean of a set.
The well-known on-line algorithm for calculating the mean is just a special case of this. This is the algorithm that if m is the mean of {x_1, x_2, ... , x_n} then the mean of {x_1, x_2, ..., x_n, x_(n+1)} is m + ((x_(n+1) - m)) / (n + 1). So with X = {x_1, x_2, ..., x_(n+1)}, A = {x_(n+1)}, and B = {x_1, x_2, ..., x_n} we recover the on-line algorithm.
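A minimal sketch of that merge rule in code (names are mine): each partial result is a (mean, count) pair, and two of them are combined with m_X = m_A + delta * |B| / |X|, so the full sum is never formed.
#include <utility>

// Merge two partial results (mean, count) into one using
// m_X = m_A + (m_B - m_A) * |B| / |X|.
std::pair<double, long long> mergeMeans(std::pair<double, long long> a,
                                        std::pair<double, long long> b) {
    long long total = a.second + b.second;     // |X| = |A| + |B|
    double delta = b.first - a.first;          // m_B - m_A
    double mean = a.first + delta * ((double)b.second / (double)total);
    return {mean, total};
}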
Thinking outside the box: Use the median instead. It's much easier to calculate - there are tons of algorithms out there (e.g. using queues), you can often construct good arguments as to why it's more meaningful for data sets (less swayed by extreme values; etc) and you will have zero problems with numerical accuracy. It will be fast and efficient. Plus, for large data sets (which it sounds like you have), unless the distributions are truly weird, the values for the mean and median will be similar.
When you split the numbers into sets you're just dividing by the total number or am I missing something?
You have written it as
/ 1 2 3 \ / 4 5 6 \
| - + - + - | + | - + - + - |
\ 3 3 3 / \ 3 3 3 /
---------- -----------
2 2
but that's just
/ 1 2 3 \ / 4 5 6 \
| - + - + - | + | - + - + - |
\ 6 6 6 / \ 6 6 6 /
so for the numbers from 1 to 7 one possible grouping is just
/ 1 2 3 \ / 4 5 6 \ / 7 \
| - + - + - | + | - + - + - | + | - |
\ 7 7 7 / \ 7 7 7 / \ 7 /
Average of x_1 .. x_N
= (Sum(i=1,N,x_i)) / N
= (Sum(i=1,M,x_i) + Sum(i=M+1,N,x_i)) / N
= (Sum(i=1,M,x_i)) / N + (Sum(i=M+1,N,x_i)) / N
This can be repeatedly applied, and is true regardless of whether the summations are of equal size. So:
Keep adding terms until both:
adding another one will overflow (or otherwise lose precision)
dividing by N will not underflow
Divide the sum by N
Add the result to the average-so-far
There's one obvious awkward case, which is that there are some very small terms at the end of the sequence, such that you run out of values before you satisfy the condition "dividing by N will not underflow". In which case just discard those values - if their contribution to the average cannot be represented in your floating type, then it is in particular smaller than the precision of your average. So it doesn't make any difference to the result whether you include those terms or not.
There are also some less obvious awkward cases to do with loss of precision on individual summations. For example, what's the average of the values:
10^100, 1, -10^100
Mathematics says it's 1, but floating-point arithmetic says it depends what order you add up the terms, and in 4 of the 6 possibilities it's 0, because (10^100) + 1 = 10^100. But I think that the non-associativity of floating-point addition is a different and more general problem than this question. If sorting the input is out of the question, I think there are things you can do where you maintain lots of accumulators of different magnitudes, and add each new value to whichever one of them will give best precision. But I don't really know.
Here's another approach. You're 'receiving' numbers one-by-one from some source, but you can keep track of the mean at each step.
First, I will write out the formula for mean at step n+1:
mean[n+1] = mean[n] - (mean[n] - x[n+1]) / (n+1)
with the initial condition:
mean[0] = x[0]
(the index starts at zero).
The first equation can be simplified to:
mean[n+1] = n * mean[n] / (n+1) + x[n+1]/(n+1)
The idea is that you keep track of the mean, and when you 'receive' the next value in your sequence, you figure out its offset from the current mean, and divide it equally between the n+1 samples seen so far, and adjust your mean accordingly. If your numbers don't have a lot of variance, your running mean will need to be adjusted very slightly with the new numbers as n becomes large.
Obviously, this method works even if you don't know the total number of values when you start. It has an additional advantage that you know the value of the current mean at all times. One disadvantage that I can think of is that it probably gives more 'weight' to the numbers seen in the beginning (not in a strict mathematical sense, but because of floating point representations).
Finally, all such calculations are bound to run into floating-point 'errors' if one is not careful enough. See my answer to another question for some of the problems with floating point calculations and how to test for potential problems.
As a test, I generated N=100000 normally distributed random numbers with mean zero and variance 1. Then I calculated their mean by three methods.
sum(numbers) / N, call it m1,
my method above, call it m2,
sort the numbers, and then use my method above, call it m3.
Here's what I found: m1 − m2 ∼ −4.6×10^−17, m1 − m3 ∼ −3×10^−15, m2 − m3 ∼ −3×10^−15. So, if your numbers are sorted, the error might not be small enough for you. (Note however that even the worst error is 10^−15 parts in 1 for 100000 numbers, so it might be good enough anyway.)
Some of the mathematical solutions here are very good. Here's a simple technical solution.
Use a larger data type. This breaks down into two possibilities:
Use a high-precision floating point library. One who encounters a need to average a billion numbers probably has the resources to purchase, or the brain power to write, a 128-bit (or longer) floating point library.
I understand the drawbacks here. It would certainly be slower than using intrinsic types. You still might over/underflow if the number of values grows too high. Yada yada.
If your values are integers or can be easily scaled to integers, keep your sum in a list of integers. When you overflow, simply add another integer. This is essentially a simplified implementation of the first option. A simple (untested) example in C# follows
class BigMeanSet{
    List<uint> list = new List<uint>();

    public double GetAverage(IEnumerable<uint> values){
        list.Clear();
        list.Add(0);

        uint count = 0;
        foreach(uint value in values){
            Add(0, value);
            count++;
        }

        return DivideBy(count);
    }

    void Add(int listIndex, uint value){
        if((list[listIndex] += value) < value){ // then overflow has occurred
            if(list.Count == listIndex + 1)
                list.Add(0);
            Add(listIndex + 1, 1);
        }
    }

    double DivideBy(uint count){
        const double shift = 4.0 * 1024 * 1024 * 1024;

        double rtn = 0;
        long remainder = 0;

        for(int i = list.Count - 1; i >= 0; i--){
            rtn *= shift;
            remainder <<= 32;
            rtn += Math.DivRem(remainder + list[i], count, out remainder);
        }

        rtn += remainder / (double)count;

        return rtn;
    }
}
Like I said, this is untested—I don't have a billion values I really want to average—so I've probably made a mistake or two, especially in the DivideBy function, but it should demonstrate the general idea.
This should provide as much accuracy as a double can represent and should work for any number of 32-bit elements, up to 2^32 - 1. If more elements are needed, then the count variable will need to be expanded and the DivideBy function will increase in complexity, but I'll leave that as an exercise for the reader.
In terms of efficiency, it should be as fast or faster than any other technique here, as it only requires iterating through the list once, only performs one division operation (well, one set of them), and does most of its work with integers. I didn't optimize it, though, and I'm pretty certain it could be made slightly faster still if necessary. Ditching the recursive function call and list indexing would be a good start. Again, an exercise for the reader. The code is intended to be easy to understand.
If anybody more motivated than I am at the moment feels like verifying the correctness of the code, and fixing whatever problems there might be, please be my guest.
I've now tested this code, and made a couple of small corrections (a missing pair of parentheses in the List<uint> constructor call, and an incorrect divisor in the final division of the DivideBy function).
I tested it by first running it through 1000 sets of random length (ranging between 1 and 1000) filled with random integers (ranging between 0 and 2^32 - 1). These were sets for which I could easily and quickly verify accuracy by also running a canonical mean on them.
I then tested with 100* large series, with random lengths between 10^5 and 10^9. The lower and upper bounds of these series were also chosen at random, constrained so that the series would fit within the range of a 32-bit integer. For any series, the results are easily verifiable as (lowerbound + upperbound) / 2.
*Okay, that's a little white lie. I aborted the large-series test after about 20 or 30 successful runs. A series of length 10^9 takes just under a minute and a half to run on my machine, so half an hour or so of testing this routine was enough for my tastes.
For those interested, my test code is below:
static IEnumerable<uint> GetSeries(uint lowerbound, uint upperbound){
    for(uint i = lowerbound; i <= upperbound; i++)
        yield return i;
}

static void Test(){
    Console.BufferHeight = 1200;

    Random rnd = new Random();

    for(int i = 0; i < 1000; i++){
        uint[] numbers = new uint[rnd.Next(1, 1000)];
        for(int j = 0; j < numbers.Length; j++)
            numbers[j] = (uint)rnd.Next();

        double sum = 0;
        foreach(uint n in numbers)
            sum += n;

        double avg = sum / numbers.Length;
        double ans = new BigMeanSet().GetAverage(numbers);

        Console.WriteLine("{0}: {1} - {2} = {3}", numbers.Length, avg, ans, avg - ans);

        if(avg != ans)
            Debugger.Break();
    }

    for(int i = 0; i < 100; i++){
        uint length = (uint)rnd.Next(100000, 1000000001);
        uint lowerbound = (uint)rnd.Next(int.MaxValue - (int)length);
        uint upperbound = lowerbound + length;

        double avg = ((double)lowerbound + upperbound) / 2;
        double ans = new BigMeanSet().GetAverage(GetSeries(lowerbound, upperbound));

        Console.WriteLine("{0}: {1} - {2} = {3}", length, avg, ans, avg - ans);

        if(avg != ans)
            Debugger.Break();
    }
}