NaN and ±INF in a floating-point number system following IEEE 754 - language-agnostic

In the standard, the representation of NaN and INF is like this:
For NaN: exponent = emax+1 & mantissa != 0;
For INF: exponent = emax+1 & mantissa = 0;
There are many operations and calculations that result in these two values.
But what ACTUALLY is NaN (or INF)?
And HOW does the system "decide" or "judge" to store a value as one of these two?
Here is a case that seems odd to me:
a = b = 1*2^(emax);
then calculating c = a + b, the exact result is 1*2^(emax+1);
Now, c is not a representable FP value according to the standard;
so how does the system store c in the device?
Is it NaN?
If yes, how can this even be reasonable?
I mean, 1*2^(emax+1) IS (or should be) a Number... in the common sense...?
If this is the case, then what does the standard ACTUALLY consider a NaN to be?
If not, then how do we deal with this?
I'm considering something like this:
let eM = emax+1;
then 1d.d...d * 2^(eM-1) = 1d.d...d * 2^(emax)
with 1d.d...d having a legal number of digits for the system.
This is actually similar to the way denormalized numbers are handled.
The thing here is this:
Is the judgement made after or before the calculation completes?
If it's the former, could the above scheme be a problem or not?
If it's the latter, the task seems undoable...
Has anyone ever thought about this issue?
Thanks for considering it!
Note: the same questions apply to ±INF as well.

From Wikipedia:
The five possible exceptions are:
Invalid operation (e.g., square root of a negative number) (returns qNaN by default).
Division by zero (an operation on finite operands gives an exact infinite result, e.g., 1/0 or log(0)) (returns ±infinity by default).
Overflow (a result is too large to be represented correctly) (returns ±infinity by default (for round-to-nearest mode)).
Underflow (a result is very small (outside the normal range) and is inexact) (returns a denormalized value by default).
...
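For the specific case in the question, overflow (not an invalid operation) is what occurs, so the default result is ±infinity rather than NaN. A minimal illustration in Python, assuming the usual IEEE 754 double with emax = 1023:

import math

a = b = 1.0 * 2.0 ** 1023             # 1 * 2^emax, the largest power of two a double can hold
c = a + b                             # exact result would be 1 * 2^(emax+1), which is not representable
print(c)                              # inf  -- overflow rounds to +infinity, not NaN
print(math.isinf(c), math.isnan(c))   # True False
print(float('inf') - float('inf'))    # nan  -- only indeterminate forms produce NaN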

Related

Erlang cowboy reply JSON data, float number precision is wrong?

Code is here:
RstJson = rfc4627:encode({obj, [{"age", 45.99}]}),
{ok, Req3} = cowboy_req:reply(200, [{<<"Content-Type">>, <<"application/json;charset=UTF-8">>}], RstJson, Req2)
Then I get this wrong data on the front-end client:
{
    "age": 45.990000000000002
}
The float number precision has changed!
How can I solve this problem?
Let's have a look at the JSON that rfc4627 generates:
> io:format("~s~n", [rfc4627:encode({obj, [{"age", 45.99}]})]).
{"age":4.59900000000000019895e+01}
It turns out that rfc4627 encodes floating-point values by calling float_to_list/1, which uses "scientific" notation with 20 digits of precision. As Per Hedeland noted on the erlang-questions mailing list in November 2007, that's an odd choice:
A reasonable question could be why float_to_list/1 generates 20 digits
when a 64-bit float (a.k.a. C double), which is what is used internally,
only can hold 15-16 worth of them - I don't know off-hand what a 128-bit
float would have, but presumably significantly more than 20, so it's not
that either. I guess way back in the dark ages, someone thought that 20
was a nice and even number (I hope it wasn't me:-). The 6.30000 form is
of course just the ~p/~w formatting.
However, it turns out this is actually not the problem! In fact, 45.990000000000002 is equal to 45.99, so you do get the correct value in the front end:
> 45.990000000000002 =:= 45.99.
true
As noted above, a 64-bit float can hold 15 or 16 significant digits, but 45.990000000000002 contains 17 digits (count them!). It looks like your front end tries to print the number with more precision than it actually contains, thereby making the number look different even though it is in fact the same number.
The answers to the question Is floating point math broken? go into much more detail about why this actually makes sense, given how computers handle floating point values.
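For instance, a quick check (Python is used here purely for illustration; any IEEE 754 double behaves the same way) shows that the two decimal strings denote the same stored value, and the extra digits only appear when you print with more precision than the value carries:

x = 45.99
y = 45.990000000000002
print(x == y)              # True: both decimal literals round to the same 64-bit double
print(x)                   # 45.99 -- the shortest decimal string that round-trips
print(format(x, '.17g'))   # 45.990000000000002 -- what an over-precise printer shows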
The float-encoding function in rfc4627 is:
encode_number(Num, Acc) when is_float(Num) ->
    lists:reverse(float_to_list(Num), Acc).
I changed it like this:
encode_number(Num, Acc) when is_float(Num) ->
    lists:reverse(io_lib:format("~p", [Num]), Acc).
Problem solved.

Is there a "native" way to convert from numbers to dB in Tcl

dB, or decibel, is a unit used to express a ratio on a logarithmic scale; specifically, the definition of dB I'm interested in is X(dB) = 20*log10(x), where x is the "normal" value and X(dB) is the value in dB. When I wrote code converting between mils and mm, I noticed that if I used the direct approach, i.e. multiplying by the ratio between the units, I got small errors on the reverse conversion: to_mil [to_mm val_in_mil] wasn't equal to val_in_mil, and the same with mm. The units library solved this problem, as the conversions it performs do not have that calculation error. But the library specifically doesn't offer (or I didn't find) the option to convert a number to dB.
Is there another library / command that can transform numbers to dB and dB to numbers without calculation errors?
I did an experiment using the direct math conversion, and what I got is:
>> set a 0.005
0.005
>> set b [expr {20*log10($a)}]
-46.0205999133
>> expr {pow(10,($b/20))}
0.00499999999999
It's all a matter of precision. We often tend to forget that floating point numbers are not real numbers (in the mathematical sense of ℝ).
How many decimal digits do you need?
If, for example, you only need 5 decimal digits, rounding 0.00499999999999 will give you 0.00500, which is what you wanted.
Since rounding fp numbers is not an easy task and may generate even more trouble, you might just change the way you determine whether two numbers are equal:
>> set a 0.005
0.005
>> set b [expr {20*log10($a)}]
-46.0205999133
>> set c [expr {pow(10,($b/20))}]
0.00499999999999
>> expr {abs($a - $c) < 1E-10}
1
>> expr {abs($a - $c) < 1E-20}
0
>> expr {$a - $c}
8.673617379884035e-19
The numbers in your example can be considered "equal" up to an error of about 1e-18. Note that this is just a rough estimate, not a full solution.
If you're really dealing with problems that are sensitive to numerical error propagation, you might look deeper into "numerical analysis". The article What Every Computer Scientist Should Know About Floating-Point Arithmetic or, even better, the site http://floating-point-gui.de might be a start.
In case you need larger precision, you should drop your "native" requirement.
You may use the BigFloat package offered by tcllib (http://tcllib.sourceforge.net/doc/bigfloat.html) or even use GMP (the GNU multiple precision arithmetic library) through ffidl (http://elf.org/ffidl). There's an interface already defined for it: gmp.tcl
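As an illustration of the tolerance-based approach above, here is a minimal sketch (in Python rather than Tcl, purely for illustration; the expr calls in the transcript behave the same way for IEEE 754 doubles) of the dB round-trip plus an approximate comparison:

import math

def to_db(x):
    return 20.0 * math.log10(x)       # X(dB) = 20*log10(x)

def from_db(db):
    return 10.0 ** (db / 20.0)

a = 0.005
c = from_db(to_db(a))
print(c)                                    # e.g. 0.004999999999999999 -- typically one ulp away from a
print(math.isclose(a, c, rel_tol=1e-10))    # True -- equal within a reasonable tolerance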
With the way floating-point numbers are stored, pow(10, ...) cannot map every log10(...) result back to exactly the original value. So you lose precision, just as the integer divisions 89/7 and 88/7 both give 12.
When you put a value into floating-point format, you should forget the ability to know its exact value anymore unless you keep the old, exact value too. If you want exactly 1/200, store it as the integer 1 and the integer 200. If you want exactly the ten-logarithm of 1/200, store it as 1, 200 and the info that a ten-logarithm has been done on it.
You can fill your entire memory with the first x decimal digits of the square root of 2, but it still won't be the square root of 2 you store.
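A small illustration of that idea of keeping the exact value as a pair of integers (Python's fractions module is used here only as an example of exact rational arithmetic; it is not part of the Tcl question):

from fractions import Fraction

x = Fraction(1, 200)      # exactly 1/200, stored as the integers 1 and 200
print(x)                  # 1/200
print(x * 200 == 1)       # True -- exact arithmetic round-trips with no error
print(float(x))           # 0.005 -- rounding only happens when you convert to a float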

Why does division by zero in the IEEE 754 standard result in an infinite value?

I'm just curious: why in IEEE-754 does any non-zero float number divided by zero result in an infinite value? It's nonsense from the mathematical perspective. So I think the correct result for this operation is NaN.
The function f(x) = 1/x is not defined at x = 0 if x is a real number. For example, the function sqrt is not defined for any negative number, and sqrt(-1.0f) in IEEE-754 produces a NaN value. But 1.0f/0 is Inf.
But for some reason this is not the case in IEEE-754. There must be a reason for this, maybe some optimization or compatibility reasons.
So what's the point?
It's nonsense from the mathematical perspective.
Yes. No. Sort of.
The thing is: Floating-point numbers are approximations. You want to use a wide range of exponents and a limited number of digits and get results which are not completely wrong. :)
The idea behind IEEE-754 is that every operation could trigger "traps" which indicate possible problems. They are
Illegal (senseless operation like sqrt of negative number)
Overflow (too big)
Underflow (too small)
Division by zero (The thing you do not like)
Inexact (This operation may give you wrong results because you are losing precision)
Now many people like scientists and engineers do not want to be bothered with writing trap routines. So Kahan, the inventor of IEEE-754, decided that every operation should also return a sensible default value if no trap routines exist.
They are
NaN for illegal values
signed infinities for Overflow
signed zeroes for Underflow
NaN for indeterminate results (0/0), and infinities for x/0 with x != 0
normal operation result for Inexact
The thing is that in 99% of all cases zeroes are caused by underflow, and therefore 99% of the time Infinity is "correct" even though it is wrong from a mathematical perspective.
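A quick demonstration of those default results (a sketch using Python with numpy, chosen here because plain Python raises an exception for float division by a literal zero instead of returning the IEEE 754 default):

import numpy as np

np.seterr(divide='ignore', invalid='ignore')   # keep the default results, silence the warnings
one, zero = np.float64(1.0), np.float64(0.0)
print(one / zero)                  # inf   -- division by zero gives a signed infinity
print(one / -zero)                 # -inf  -- the sign of the zero picks the sign of the infinity
print(zero / zero)                 # nan   -- indeterminate form
print(np.sqrt(np.float64(-1.0)))   # nan   -- invalid ("illegal") operation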
I'm not sure why you would believe this to be nonsense.
The simplistic definition of a / b, at least for non-zero b, is the unique number of bs that has to be subtracted from a before you get to zero.
Expanding that to the case where b can be zero, the number that has to be subtracted from any non-zero number to get to zero is indeed infinite, because you'll never get to zero.
Another way to look at it is to talk in terms of limits. As a positive number n approaches zero, the expression 1 / n approaches "infinity". You'll notice I've quoted that word because I'm a firm believer in not propagating the delusion that infinity is actually a concrete number :-)
NaN is reserved for situations where the number cannot be represented (even approximately) by any other value (including the infinities), it is considered distinct from all those other values.
For example, 0 / 0 (using our simplistic definition above) can have any amount of bs subtracted from a to reach 0. Hence the result is indeterminate - it could be 1, 7, 42, 3.14159 or any other value.
Similarly, things like the square root of a negative number, which has no value on the real line used by IEEE754 (you have to go to the complex plane for that), cannot be represented.
In mathematics, division by zero is undefined because zero has no sign, therefore two results are equally possible, and exclusive: negative infinity or positive infinity (but not both).
In (most) computing, 0.0 has a sign. Therefore we know what direction we are approaching from, and what sign the infinity would have. This is especially true when 0.0 represents a non-zero value too small to be expressed by the system, as is frequently the case.
The only time NaN would be appropriate is if the system knows with certainty that the denominator is truly, exactly zero. And it can't unless there is a special way to designate that, which would add overhead.
NOTE:
I re-wrote this following a valuable comment from @Cubic.
I think the correct answer to this has to come from calculus and the notion of limits. Consider the limit of f(x)/g(x) as x->0 under the assumption that g(0) == 0. There are two broad cases that are interesting here:
If f(0) != 0, then the limit as x->0 is either plus or minus infinity, or it's undefined. If g(x) takes both signs in the neighborhood of x==0, then the limit is undefined (left and right limits don't agree). If g(x) has only one sign near 0, however, the limit will be defined and be either positive or negative infinity. More on this later.
If f(0) == 0 as well, then the limit can be anything, including positive infinity, negative infinity, a finite number, or undefined.
In the second case, generally speaking, you cannot say anything at all. Arguably, in the second case NaN is the only viable answer.
Now in the first case, why choose one particular sign when either is possible or it might be undefined? As a practical matter, it gives you more flexibility in cases where you do know something about the sign of the denominator, at relatively little cost in the cases where you don't. You may have a formula, for example, where you know analytically that g(x) >= 0 for all x, say, for example, g(x) = x*x. In that case the limit is defined and it's infinity with sign equal to the sign of f(0). You might want to take advantage of that as a convenience in your code. In other cases, where you don't know anything about the sign of g, you cannot generally take advantage of it, but the cost here is just that you need to trap for a few extra cases - positive and negative infinity - in addition to NaN if you want to fully error check your code. There is some price there, but it's not large compared to the flexibility gained in other cases.
Why worry about general functions when the question was about "simple division"? One common reason is that if you're computing your numerator and denominator through other arithmetic operations, you accumulate round-off errors. The presence of those errors can be abstracted into the general formula format shown above. For example f(x) = x + e, where x is the analytically correct, exact answer, e represents the error from round-off, and f(x) is the floating point number that you actually have on the machine at execution.

numerical issues causing the difference in outputs of two programs?

I have two programs that theoretically should return the exact same output. However, this does not happen. The issue is that the two programs handle very small numbers (doubles), on the order of 1e-100 or so. I suspect that there could be some numerical issues related to that, which lead to the two outputs being different even though they should theoretically be the same.
Does it indeed make sense that handling numbers on the order of 1e-100 causes such problems? I don't mind the difference in output, if I could safely assume that the source is numerical issues. Does anyone have a good source/reference that talks about issues that come up with stability of algorithms when they handle numbers in such order?
Thanks.
Does anyone have a good source/reference that talks about issues that come up with stability of algorithms when they handle numbers in such order?
The first reference that comes to mind is What Every Computer Scientist Should Know About Floating-Point Arithmetic. It covers floating-point maths in general.
As far as numerical stability is concerned, the best references probably depend on the numerical algorithm in question. Two wide-ranging works that come to mind are:
Numerical Recipes by Press et al;
Matrix Computations by Golub and Van Loan.
It is not necessarily the small numbers that are causing the problem.
How do you check whether the outputs are the "exact same"?
I would check equality with tolerance. You may consider the floating point numbers x and y equal if either fabs(x-y) < 1.0e-6 or fabs(x-y) < fabs(x)*1.0e-6 holds.
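For instance, a minimal sketch of that check (Python; the 1.0e-6 tolerance is just the example value above, not a universal constant):

def nearly_equal(x, y, tol=1.0e-6):
    # absolute OR relative tolerance, as described above
    return abs(x - y) < tol or abs(x - y) < abs(x) * tol

print(nearly_equal(1.0000001, 1.0000002))   # True
print(nearly_equal(1.0, 2.0))               # False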
Usually, there is a HUGE difference between the two algorithms if there are numerical issues. Often, a small change in the input may result in extreme changes in the output, if the algorithm suffers from numerical issues.
What makes you think that there are "numerical issues"?
If possible, change your algorithm to use Kahan Summation (aka compensated summation). From Wikipedia:
function KahanSum(input)
    var sum = 0.0
    var c = 0.0                  // A running compensation for lost low-order bits.
    for i = 1 to input.length do
        y = input[i] - c         // So far, so good: c is zero.
        t = sum + y              // Alas, sum is big, y small, so low-order digits of y are lost.
        c = (t - sum) - y        // (t - sum) recovers the high-order part of y; subtracting y recovers -(low part of y)
        sum = t                  // Algebraically, c should always be zero. Beware eagerly optimising compilers!
                                 // Next time around, the lost low part will be added to y in a fresh attempt.
    return sum
This works by keeping a second running total of the cumulative error, similar to the Bresenham line drawing algorithm. The end result is that you get precision that is nearly double the data type's advertised precision.
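A runnable version of the pseudocode above (Python used for illustration), together with a case where naive left-to-right summation loses the small addends entirely:

import math

def kahan_sum(values):
    total = 0.0
    c = 0.0                      # running compensation for lost low-order bits
    for v in values:
        y = v - c
        t = total + y
        c = (t - total) - y
        total = t
    return total

values = [1.0] + [1e-16] * 1_000_000
print(sum(values))               # 1.0 -- each 1e-16 is lost against the large 1.0
print(kahan_sum(values))         # ~1.0000000001, in agreement with math.fsum(values)
print(math.fsum(values))         # exactly rounded reference result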
Another technique I use is to sort my numbers from small to large (by magnitude, ignoring sign) and add or subtract the small numbers first, then the larger ones. This has the virtue that if you add and subtract the same value multiple times, such numbers may cancel exactly and can be removed from the list.

What are "magic numbers" in computer programming?

When people talk about the use of "magic numbers" in computer programming, what do they mean?
A magic number is any number in your code that isn't immediately obvious to someone with very little knowledge of it.
For example, the following piece of code:
sz = sz + 729;
has a magic number in it and would be far better written as:
sz = sz + CAPACITY_INCREMENT;
Some extreme views state that you should never have any numbers in your code except -1, 0 and 1 but I prefer a somewhat less dogmatic view since I would instantly recognise 24, 1440, 86400, 3.1415, 2.71828 and 1.414 - it all depends on your knowledge.
However, even though I know there are 1440 minutes in a day, I would probably still use a MINS_PER_DAY identifier since it makes searching for them that much easier. Who's to say that the capacity increment mentioned above wouldn't also be 1440, and you'd end up changing the wrong value? This is especially true for the low numbers: the chance of dual use of 37197 is relatively low, but the chance of using 5 for multiple things is pretty high.
Use of an identifier means that you wouldn't have to go through all your 700 source files and change 729 to 730 when the capacity increment changed. You could just change the one line:
#define CAPACITY_INCREMENT 729
to:
#define CAPACITY_INCREMENT 730
and recompile the lot.
Contrast this with magic constants which are the result of naive people thinking that just because they remove the actual numbers from their code, they can change:
x = x + 4;
to:
#define FOUR 4
x = x + FOUR;
That adds absolutely zero extra information to your code and is a total waste of time.
"magic numbers" are numbers that appear in statements like
if days == 365
Assuming you didn't know there were 365 days in a year, you'd find this statement meaningless. Thus, it's good practice to assign all "magic" numbers (numbers that have some kind of significance in your program) to a constant,
DAYS_IN_A_YEAR = 365
And from then on, compare to that instead. It's easier to read, and if the earth ever gets knocked out of alignment, and we gain an extra day... you can easily change it (other numbers might be more likely to change).
There's more than one meaning. The one given by most answers already (an arbitrary unnamed number) is a very common one, and the only thing I'll say about that is that some people go to the extreme of defining...
#define ZERO 0
#define ONE 1
If you do this, I will hunt you down and show no mercy.
Another kind of magic number, though, is used in file formats. It's just a value included as typically the first thing in the file which helps identify the file format, the version of the file format and/or the endian-ness of the particular file.
For example, you might have a magic number of 0x12345678. If you see that magic number, it's a fair guess you're seeing a file of the correct format. If you see, on the other hand, 0x78563412, it's a fair guess that you're seeing an endian-swapped version of the same file format.
The term "magic number" gets abused a bit, though, referring to almost anything that identifies a file format - including quite long ASCII strings in the header.
http://en.wikipedia.org/wiki/File_format#Magic_number
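A small sketch of that check (Python; 0x12345678 is just the hypothetical magic value from the example above, not any real format's magic number):

import struct

MAGIC = 0x12345678

def detect(header: bytes) -> str:
    # Read the first four bytes both ways and compare against the expected magic value.
    (big,) = struct.unpack(">I", header[:4])
    (little,) = struct.unpack("<I", header[:4])
    if big == MAGIC:
        return "expected format, big-endian"
    if little == MAGIC:
        return "expected format, byte-swapped (little-endian)"
    return "unknown format"

print(detect(bytes.fromhex("12345678")))   # expected format, big-endian
print(detect(bytes.fromhex("78563412")))   # expected format, byte-swapped (little-endian)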
Wikipedia is your friend (Magic Number article)
Most of the answers so far have described a magic number as a constant that isn't self describing. Being a little bit of an "old-school" programmer myself, back in the day we described magic numbers as being any constant that is being assigned some special purpose that influences the behaviour of the code. For example, the number 999999 or MAX_INT or something else completely arbitrary.
The big problem with magic numbers is that their purpose can easily be forgotten, or the value used in another perfectly reasonable context.
As a crude and terribly contrived example:
int i = 0;
while (i != 99999)
{
    DoSomeCleverCalculationBasedOnTheValueOf(i);
    if (escapeConditionReached)
    {
        i = 99999;
    }
}
Whether a constant is used, or whether it is named, isn't really the issue. In the case of my awful example, the value influences behaviour, but what if we need to change the value of "i" while looping?
Clearly in the example above, you don't NEED a magic number to exit the loop. You could replace it with a break statement, and that is the real issue with magic numbers, that they are a lazy approach to coding, and without fail can always be replaced by something less prone to either failure, or to losing meaning over time.
Anything that doesn't have a readily apparent meaning to anyone but the application itself.
if (foo == 3) {
    // do something
} else if (foo == 4) {
    // delete all users
}
Magic numbers are special values of certain variables that cause the program to behave in a special manner.
For example, a communication library might take a Timeout parameter and define the magic number "-1" to indicate an infinite timeout.
The term magic number is usually used to describe some numeric constant in code. The number appears without any further description and thus its meaning is esoteric.
The use of magic numbers can be avoided by using named constants.
Magic numbers are numbers used in calculations, other than 0 or 1, that aren't defined by some identifier or variable. Using an identifier not only makes the number easy to change in several places by changing it in one place, but also makes it clear to the reader what the number is for.
In simple words, a magic number here is a three-digit number whose sum of the squares of the first two digits is equal to the third one.
Example: 202,
as 2*2 + 0*0 = 2*2.
Now, write a program in Java to accept an integer and print whether it is a magic number or not.
It may seem a bit banal, but there IS at least one real magic number in every programming language.
0
I argue that it is THE magic wand to rule them all in virtually every programmer's quiver of magic wands.
FALSE is inevitably 0
TRUE is not(FALSE), but not necessarily 1! Could be -1 (0xFFFF)
NULL is inevitably 0 (the pointer)
And most compilers allow it unless their typechecking is utterly rabid.
0 is the base index of array elements, except in languages that are so antiquated that the base index is '1'. One can then conveniently code for(i = 0; i < 32; i++), and expect that 'i' will start at the base (0), and increment to, and stop at 32-1... the 32nd member of an array, or whatever.
0 is the end of many programming language strings. The "stop here" value.
0 is likewise built into the X86 instructions to 'move strings efficiently'. Saves many microseconds.
0 is often used by programmers to indicate that "nothing went wrong" in a routine's execution. It is the "not-an-exception" code value. One can use it to indicate the lack of thrown exceptions.
Zero is the answer most often given by programmers to the amount of work it would take to do something completely trivial, like change the color of the active cell to purple instead of bright pink. "Zero, man, just like zero!"
0 is the count of bugs in a program that we aspire to achieve. 0 exceptions unaccounted for, 0 loops unterminated, 0 recursion pathways that cannot be actually taken. 0 is the asymptote that we're trying to achieve in programming labor, girlfriend (or boyfriend) "issues", lousy restaurant experiences and general idiosyncrasies of one's car.
Yes, 0 is a magic number indeed. FAR more magic than any other value. Nothing ... ahem, comes close.