How do I fix my output for floating-point imprecision?

How do I fix my output for floating-point imprecision? - actionscript-3

I am doing some float manipulation and end up with the following numbers:
-0.5
-0.4
-0.3000000000000000004
-0.2000000000000000004
-0.1000000000000000003
1.10E-16
0.1
0.2
0.30000000000000000004
0.4
0.5
The algorithm is the following:
var inc:Number = nextMultiple(min, stepSize);
trace(String(inc));
private function nextMultiple(x:Number, y:Number) {
return Math.ceil(x/y)*y;
}
I understand the fact the float cannot always be represented accurately in a byte. e.g 1/3. I also know my stepsize being 0.1. If I have the stepsize how could I get a proper output?
The strange thing is that its the first time I've encountered this type of problem.
Maybe I dont play with float enough.

A language agnostic solution would be to store your numbers as an integer number of steps, given that you know your step size, instead of as floats.
A non-language agnostic solution would be to find out what your language's implementation of printf is.
printf ("float: %.1f\n", number);

The limited floating point precision of binary numbers is your problem, as you recognize. One way around this is not to do floating point math. Translate the problem to integers, then translate back for the output.

Either use integers instead of a floating point type, or use a floating point type where the "point" is a decimal point (e.g. System.Decimal in .NET).

If you're using a language with a round function, you can use that.
Edit
In response to comments about rounding, here's a sample in c#:
float value = 1.0F;
for (int i = 0; i < 20; i++)
{
value -= 0.1F;
Console.WriteLine(Math.Round(value, 1).ToString() + " : " + value.ToString());
}
The results are:
0.9 : 0.9
0.8 : 0.8
0.7 : 0.6999999
0.6 : 0.5999999
(etc)
The rounding does resolve the precision problem. I'm not arguing that it's better than doing integer math and then dividing by 10, just that it works.

With your specific problem, count from -5 to 5 and divide by 10 before actually using the value for something.

I did the following,
var digitsNbr:Number = Math.abs(Math.ceil(((Math.log(stepSize) / Math.log(10))) + 1));
tickTxt.text = String(inc.toPrecision(digitsNbr));
Its not efficient but i dont have many steps.
======
I should just get the nbr of steps as an int and multiply by step ...

If you don't have printf, or if the steps are not just powers of 10 (e.g. if you want to round to the nearest 0.2) then it sounds like you want a quantizer:
q(x,u) = u*floor(x/u + 0.5);
"u" is the step size (0.1 in your case), floor() finds the greatest integer not greater than its input, and the "+ 0.5" is to round to the nearest integer.
So basically, you divide by the step size, round to the nearest integer, and multiply by the step size.
edit: oh, never mind, you're basically doing that anyway & the step where it's multiplying by u is introducing rounding error.

Simply scale the numbers to obtain integers then do maths and scale them back to floats for display:
//this will round to 3 decimal places
var NUM_SCALE = 1000;
function scaleUpNumber(n) {
return (Math.floor(n * NUM_SCALE));
}
function scaleDnNumber(n) {
return (n / NUM_SCALE);
}
var answ = scaleUpNumber(2.1) - scaleUpNumber(3.001);
alert(scaleDnNumber(answ)); // displays: -0.901
Change NUM_SCALE to increase/decrease decimap places
|/|ax

Your best bet is to use a Decimal data type if your language supports it. Decimals were added to a number of languages to combat this exact problem.

This is a bit counter-intuitive, but I tested it and it works (example in AS3):
var inc:Number = nextMultiple(min, stepSize);
trace(String(inc));
private function nextMultiple(x:Number, y:Number) {
return Math.ceil(x/y)*(y*10)/10;
}
So the only thing I added is multiplying y by 10, then dividing by 10. Not an universal solution but works with your stepSize.
[edit:] The logic here seems to be that you multiply by a big enough number so as for the last decimal digits to "drop off the scale", then divide again to get a rounded number. That said, the example above which uses Math.round() is more readable and better in the sense that the code explicitly says what will happen to the numbers passed in.

Related

How can I round an integer to the nearest 1000 in Pascal?

I've got a Integer variable in Pascal. Is there any possible function I can use that can round that value to the nearest 1000, for example:
RoundTo(variable, 1000);
Does anything of the sort exist? Or is there another method I should try using?
Thanks!

The general solution for this kind of problem is to scale before and after rounding, e.g.
y = 1000 * ROUND(x / 1000);

Use RoundTo(variable, 3).
The second parameter specifies the digits you want to round to. Since you want to round to 1000 = 103 you need to specifiy 3, not 1000.
The documentation for RoundTo says:
function RoundTo(const AValue: Extended; const ADigit: TRoundToEXRangeExtended): Extended;
Rounds a floating-point value to a specified digit or power of ten using "Banker's rounding".
ADigit indicates the power of ten to which you want AValue rounded. It can be any value from –37 to 37 (inclusive).
The following examples illustrate the use of RoundTo:
RoundTo(1234567, 3) = 1235000
(I left out parts not relevant to your question)
Side-note: RoundTo uses Banker's round, so RoundTo(500, 3) = 0 and RoundTo(1500, 3) = 2000.

x = 1000*(x/1000), or x = x - (x mod 1000)

Translation from Complex-FFT to Finite-Field-FFT

Good afternoon!
I am trying to develop an NTT algorithm based on the naive recursive FFT implementation I already have.
Consider the following code (coefficients' length, let it be m, is an exact power of two):
/// <summary>
/// Calculates the result of the recursive Number Theoretic Transform.
/// </summary>
/// <param name="coefficients"></param>
/// <returns></returns>
private static BigInteger[] Recursive_NTT_Skeleton(
IList<BigInteger> coefficients,
IList<BigInteger> rootsOfUnity,
int step,
int offset)
{
// Calculate the length of vectors at the current step of recursion.
// -
int n = coefficients.Count / step - offset / step;
if (n == 1)
{
return new BigInteger[] { coefficients[offset] };
}
BigInteger[] results = new BigInteger[n];
IList<BigInteger> resultEvens =
Recursive_NTT_Skeleton(coefficients, rootsOfUnity, step * 2, offset);
IList<BigInteger> resultOdds =
Recursive_NTT_Skeleton(coefficients, rootsOfUnity, step * 2, offset + step);
for (int k = 0; k < n / 2; k++)
{
BigInteger bfly = (rootsOfUnity[k * step] * resultOdds[k]) % NTT_MODULUS;
results[k] = (resultEvens[k] + bfly) % NTT_MODULUS;
results[k + n / 2] = (resultEvens[k] - bfly) % NTT_MODULUS;
}
return results;
}
It worked for complex FFT (replace BigInteger with a complex numeric type (I had my own)). It doesn't work here even though I changed the procedure of finding the primitive roots of unity appropriately.
Supposedly, the problem is this: rootsOfUnity parameter passed originally contained only the first half of m-th complex roots of unity in this order:
omega^0 = 1, omega^1, omega^2, ..., omega^(n/2)
It was enough, because on these three lines of code:
BigInteger bfly = (rootsOfUnity[k * step] * resultOdds[k]) % NTT_MODULUS;
results[k] = (resultEvens[k] + bfly) % NTT_MODULUS;
results[k + n / 2] = (resultEvens[k] - bfly) % NTT_MODULUS;
I originally made use of the fact, that at any level of recursion (for any n and i), the complex root of unity -omega^(i) = omega^(i + n/2).
However, that property obviously doesn't hold in finite fields. But is there any analogue of it which would allow me to still compute only the first half of the roots?
Or should I extend the cycle from n/2 to n and pre-compute all the m-th roots of unity?
Maybe there are other problems with this code?..
Thank you very much in advance!

I recently wanted to implement NTT for fast multiplication instead of DFFT too. Read a lot of confusing things, different letters everywhere and no simple solution, and also my finite fields knowledge is rusty , but today i finally got it right (after 2 days of trying and analog-ing with DFT coefficients) so here are my insights for NTT:
Computation
X(i) = sum(j=0..n-1) of ( Wn^(i*j)*x(i) );
where X[] is NTT transformed x[] of size n where Wn is the NTT basis. All computations are on integer modulo arithmetics mod p no complex numbers anywhere.
Important values
Wn = r ^ L mod p is basis for NTT
Wn = r ^ (p-1-L) mod p is basis for INTT
Rn = n ^ (p-2) mod p is scaling multiplicative constant for INTT ~(1/n)
p is prime that p mod n == 1 and p>max'
max is max value of x[i] for NTT or X[i] for INTT
r = <1,p)
L = <1,p) and also divides p-1
r,L must be combined so r^(L*i) mod p == 1 if i=0 or i=n
r,L must be combined so r^(L*i) mod p != 1 if 0 < i < n
max' is the sub-result max value and depends on n and type of computation. For single (I)NTT it is max' = n*max but for convolution of two n sized vectors it is max' = n*max*max etc. See Implementing FFT over finite fields for more info about it.
functional combination of r,L,p is different for different n
this is important, you have to recompute or select parameters from table before each NTT layer (n is always half of the previous recursion).
Here is my C++ code that finds the r,L,p parameters (needs modular arithmetics which is not included, you can replace it with (a+b)%c,(a-b)%c,(a*b)%c,... but in that case beware of overflows especial for modpow and modmul) The code is not optimized yet there are ways to speed it up considerably. Also prime table is fairly limited so either use SoE or any other algo to obtain primes up to max' in order to work safely.
DWORD _arithmetics_primes[]=
{
2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,
179,181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,383,389,397,401,409,
419,421,431,433,439,443,449,457,461,463,467,479,487,491,499,503,509,521,523,541,547,557,563,569,571,577,587,593,599,601,607,613,617,619,631,641,643,647,653,659,
661,673,677,683,691,701,709,719,727,733,739,743,751,757,761,769,773,787,797,809,811,821,823,827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,
947,953,967,971,977,983,991,997,1009,1013,1019,1021,1031,1033,1039,1049,1051,1061,1063,1069,1087,1091,1093,1097,1103,1109,1117,1123,1129,1151,
0}; // end of table is 0, the more primes are there the bigger numbers and n can be used
// compute NTT consts W=r^L%p for n
int i,j,k,n=16;
long w,W,iW,p,r,L,l,e;
long max=81*n; // edit1: max num for NTT for my multiplication purposses
for (e=1,j=0;e;j++) // find prime p that p%n=1 AND p>max ... 9*9=81
{
p=_arithmetics_primes[j];
if (!p) break;
if ((p>max)&&(p%n==1))
for (r=2;r<p;r++) // check all r
{
for (l=1;l<p;l++)// all l that divide p-1
{
L=(p-1);
if (L%l!=0) continue;
L/=l;
W=modpow(r,L,p);
e=0;
for (w=1,i=0;i<=n;i++,w=modmul(w,W,p))
{
if ((i==0) &&(w!=1)) { e=1; break; }
if ((i==n) &&(w!=1)) { e=1; break; }
if ((i>0)&&(i<n)&&(w==1)) { e=1; break; }
}
if (!e) break;
}
if (!e) break;
}
}
if (e) { error; } // error no combination r,l,p for n found
W=modpow(r, L,p); // Wn for NTT
iW=modpow(r,p-1-L,p); // Wn for INTT
and here is my slow NTT and INTT implementations (i havent got to fast NTT,INTT yet) they are both tested with Schönhage–Strassen multiplication successfully.
//---------------------------------------------------------------------------
void NTT(long *dst,long *src,long n,long m,long w)
{
long i,j,wj,wi,a,n2=n>>1;
for (wj=1,j=0;j<n;j++)
{
a=0;
for (wi=1,i=0;i<n;i++)
{
a=modadd(a,modmul(wi,src[i],m),m);
wi=modmul(wi,wj,m);
}
dst[j]=a;
wj=modmul(wj,w,m);
}
}
//---------------------------------------------------------------------------
void INTT(long *dst,long *src,long n,long m,long w)
{
long i,j,wi=1,wj=1,rN,a,n2=n>>1;
rN=modpow(n,m-2,m);
for (wj=1,j=0;j<n;j++)
{
a=0;
for (wi=1,i=0;i<n;i++)
{
a=modadd(a,modmul(wi,src[i],m),m);
wi=modmul(wi,wj,m);
}
dst[j]=modmul(a,rN,m);
wj=modmul(wj,w,m);
}
}
//---------------------------------------------------------------------------
dst is destination array
src is source array
n is array size
m is modulus (p)
w is basis (Wn)
hope this helps to someone. If i forgot something please write ...
[edit1: fast NTT/INTT]
Finally I manage to get fast NTT/INTT to work. Was little bit more tricky than normal FFT:
//---------------------------------------------------------------------------
void _NFTT(long *dst,long *src,long n,long m,long w)
{
if (n<=1) { if (n==1) dst[0]=src[0]; return; }
long i,j,a0,a1,n2=n>>1,w2=modmul(w,w,m);
// reorder even,odd
for (i=0,j=0;i<n2;i++,j+=2) dst[i]=src[j];
for ( j=1;i<n ;i++,j+=2) dst[i]=src[j];
// recursion
_NFTT(src ,dst ,n2,m,w2); // even
_NFTT(src+n2,dst+n2,n2,m,w2); // odd
// restore results
for (w2=1,i=0,j=n2;i<n2;i++,j++,w2=modmul(w2,w,m))
{
a0=src[i];
a1=modmul(src[j],w2,m);
dst[i]=modadd(a0,a1,m);
dst[j]=modsub(a0,a1,m);
}
}
//---------------------------------------------------------------------------
void _INFTT(long *dst,long *src,long n,long m,long w)
{
long i,rN;
rN=modpow(n,m-2,m);
_NFTT(dst,src,n,m,w);
for (i=0;i<n;i++) dst[i]=modmul(dst[i],rN,m);
}
//---------------------------------------------------------------------------
[edit3]
I have optimized my code (3x times faster than code above),but still i am not satisfied with it so i started new question with it. There I have optimized my code even further (about 40x times faster than code above) so its almost the same speed as FFT on floating point of the same bit size. Link to it is here:
Modular arithmetics and NTT (finite field DFT) optimizations

To turn Cooley-Tukey (complex) FFT into modular arithmetic approach, i.e. NTT, you must replace complex definition for omega. For the approach to be purely recursive, you also need to recalculate omega for each level based on current signal size. This is possible because min. suitable modulus decreases as we move down in the call tree, so modulus used for root is suitable for lower layers. Additionally, as we are using same modulus, the same generator may be used as we move down the call tree. Also, for inverse transform, you should take additional step to take recalculated omega a and instead use as omega: b = a ^ -1 (via using inverse modulo operation). Specifically, b = invMod(a, N) s.t. b * a == 1 (mod N), where N is the chosen prime modulus.
Rewriting an expression involving omega by exploiting periodicity still works in modular arithmetic realm. You also need to find a way to determine the modulus (a prime) for the problem and a valid generator.
We note that your code works, though it is not a MWE. We extended it using common sense, and got correct result for a polynomial multiplication application. You just have to provide correct values of omega raised to certain powers.
While your code works, though, like from many other sources, you double spacing for each level. This does not lead to recursion that is as clean, though; this turns out to be identical to recalculating omega based on current signal size because the power for omega definition is inversely proportional to signal size. To reiterate: halving signal size is like squaring omega, which is like giving doubled powers for omega (which is what one would do for doubling of spacing). The nice thing about the approach that deals with recalculating of omega is that each subproblem is more cleanly complete in its own right.
There is a paper that shows some of the math for modular approach; it is a paper by Baktir and Sunar from 2006. See the paper at the end of this post.
You do not need to extend the cycle from n / 2 to n.
So, yes, some sources which say to just drop in a different omega definition for modular arithmetic approach are sweeping under the rug many details.
Another issue is that it is important to acknowledge that the signal size must be large enough if we are to not have overflow for result time-domain signal if we are performing convolution. Additionally, it may be useful to find certain implementations for exponentiation subject to modulus exist that are fast, even if the power is quite large.
References
Baktir and Sunar - Achieving efficient polynomial multiplication in Fermat fields using the fast Fourier transform (2006)

You must make sure that roots of unity actually exist. In R there are only 2 roots of unity: 1 and -1, since only for them x^n=1 can be true.
In C you have infinitely many roots of unity: w=exp(2*pi*i/N) is a primitive N-th roots of unity and all w^k for 0<=k
Now to your problem: you have to make sure the ring you're working in offers the same property: enough roots of unity.
Schönhage and Strassen (http://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm) use integers modulo 2^N+1. This ring has enough roots of unity. 2^N == -1 is a 2nd root of unity, 2^(N/2) is a 4th root of unity and so on. Furthermore, these roots of unity have the advantage that they are powers of two and can be implemented as binary shifts (with a modulo operation afterwards, which comes down to a add/subtract).
I think QuickMul (http://www.cs.nyu.edu/exact/doc/qmul.ps) works modulo 2^N-1.

Prime's notation in RFC3526 about Diffie Hellman Key Exchange

In RFC3526, the prime is presented by
2^1536 - 2^1472 - 1 + 2^64 * { [2^1406 pi] + 741804 } form
I wanna know the "[2^1406 pi]" part means what
THKS

The [] notation represents the greatest integer or floor function. Therefore [2^1406 pi] means multiply pi (3.141...) by 2^1406 and throw away the fractional part. Enter Floor[2^1406 * Pi] into Wolfram Alpha to see the result.

round to arbitrary divisions?

I want to round according to arbitrary divisions, e.g. I get a number from 0 to 1 and I want to round it according to divisions into 48ths, e.g. if I get something like 5/96, i want either 2/48 or 3/48. What's a good formula to do that?

The same as with rounding to decimal signs. If you have n 'divisions' and number x, do round(x*n)/n.
In your example, it'll be round((5/96)*48)/48 = round(2.5)/48 = 2/48
round can be replaced with floor or ceil, depending on the direction you want.

Assumes integer division, so it will round down:
float retVal = scaleRange(in, oldMin, oldMax, newMin, newMax) {
return (in / ((oldMax - oldMin) / (48 - 0))) + 0;
}

Howto convert decimal (xx.xx) to binary

This isn't necessarily a programming question but i'm sure you folks know how to do it. How would i convert floating point numbers into binary.
The number i am looking at is 27.625.
27 would be 11011, but what do i do with the .625?

On paper, a good algorithm to convert the fractional part of a decimal number is the "repeated multiplication by 2" algorithm (see details at http://www.exploringbinary.com/base-conversion-in-php-using-bcmath/, under the heading "dec2bin_f()"). For example, 0.8125 converts to binary as follows:
1. 0.8125 * 2 = 1.625
2. 0.625 * 2 = 1.25
3. 0.25 * 2 = 0.5
4. 0.5 * 2 = 1.0
The integer parts are stripped off and saved at each step, forming the binary result: 0.1101.
If you want a tool to do these kinds of conversions automatically, see my decimal/binary converter.

Assuming you are not thinking about inside a PC, just thinking about binary vs decimal as physically represented on a piece of paper:
You know .1 in binary should be .5 in decimal, so the .1's place is worth .5 (1/2)
the .01 is worth .25 (1/4) (half of the previous one)
the .001 is worth (1/8) (Half of 1/4)
Notice how the denominator is progressing just like the whole numbers to the left of the decimal do--standard ^2 pattern? The next should be 1/16...
So you start with your .625, is it higher than .5? Yes, so set the first bit and subtract the .5
.1 binary with a decimal remainder of .125
Now you have the next spot, it's worth .25dec, is that less than your current remainder of .125? No, so you don't have enough decimal "Money" to buy that second spot, it has to be a 0
.10 binary, still .125 remainder.
Now go to the third position, etc. (Hint: I don't think there will be too much etc.)

There are several different ways to encode a non-integral number in binary. By far the most common type are floating point representations, especially the one codified in IEEE 754.

the code works for me is as below , you can use this code to convert any type of dobule values:
private static String doubleToBinaryString( double n ) {
String val = Integer.toBinaryString((int)n)+"."; // Setting up string for result
String newN ="0" + (""+n).substring((""+n).indexOf("."));
n = Double.parseDouble(newN);
while ( n > 0 ) { // While the fraction is greater than zero (not equal or less than zero)
double r = n * 2; // Multiply current fraction (n) by 2
if( r >= 1 ) { // If the ones-place digit >= 1
val += "1"; // Concat a "1" to the end of the result string (val)
n = r - 1; // Remove the 1 from the current fraction (n)
}else{ // If the ones-place digit == 0
val += "0"; // Concat a "0" to the end of the result string (val)
n = r; // Set the current fraction (n) to the new fraction
}
}
return val; // return the string result with all appended binary values
}

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008