I'm having trouble with the precision of constant numerics in Fortran.
Do I need to write every 0.1 as 0.1d0 to get double precision? I know the compiler has a flag such as -fdefault-real-8 in gfortran that solves this kind of problem. Would that be a portable and reliable way to do it? And how can I check whether the flag actually took effect for my code?
I was using F2py to call the Fortran code from my Python code, and it doesn't report an error even if I pass an unrecognized flag, which is what worries me.
In a Fortran program 1.0 is always a default real literal constant and 1.0d0 always a double precision literal constant.
However, "double precision" means different things in different contexts.
In Fortran contexts "double precision" refers to a particular kind of real which has greater precision than the default real kind. In more general communication "double precision" is often taken to mean a particular real kind of 64 bits which matches the IEEE floating point specification.
gfortran's compiler flag -fdefault-real-8 means that the default real takes 8 bytes and is likely to be that which the compiler would use to represent IEEE double precision.
So, 1.0 is a default real literal constant, not a double precision literal constant, but a default real may happen to be the same as an IEEE double precision.
Questions like this one come up because of the implications of precision in literal constants. To anyone asking my advice about flags like -fdefault-real-8, I would say: avoid them.
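As a quick check of what the default real kind actually is (and of whether a flag such as -fdefault-real-8 took effect), something like this minimal program works; the kind numbers it prints are processor dependent, but with gfortran the default real is normally kind 4 and double precision kind 8:
program check_kinds
    implicit none
    print *, 'kind(1.0)       = ', kind(1.0)       ! kind of a default real literal
    print *, 'kind(1.0d0)     = ', kind(1.0d0)     ! kind of a double precision literal
    print *, 'precision(1.0)  = ', precision(1.0)  ! decimal precision of default real
    print *, 'precision(1.d0) = ', precision(1.d0) ! decimal precision of double precision
end program check_kinds
Compiling once without and once with -fdefault-real-8 and comparing the first and third lines shows directly whether the flag changed the default real kind.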
Adding to @francescalus's response above: in my opinion, since the meaning of double precision can change across platforms and compilers, it is good practice to explicitly declare the desired kind of a constant using the standard Fortran convention, as in the following example:
program test
    use, intrinsic :: iso_fortran_env, only: RK => real64
    implicit none
    write(*,"(*(g20.15))") "real64: ", 2._RK / 3._RK
    write(*,"(*(g20.15))") "double precision: ", 2.d0 / 3.d0
    write(*,"(*(g20.15))") "single precision: ", 2.e0 / 3.e0
end program test
Compiling this code with gfortran gives:
$gfortran -std=gnu *.f95 -o main
$main
real64: .666666666666667
double precision: .666666666666667
single precision: .666666686534882
Here, the results on the first two lines (the explicit request for the 64-bit real kind, and the double precision kind) are the same. However, in general this may not be the case: the double precision result can depend on compiler flags or hardware, whereas the real64 kind will always give 64-bit real computation, regardless of the default real kind.
Now consider another scenario, where a real variable is declared with the 64-bit kind but the numerical computation is done in 32-bit precision:
program test
    use, intrinsic :: iso_fortran_env, only: RK => real64
    implicit none
    real(RK) :: real_64
    real_64 = 2.e0 / 3.e0
    write(*,"(*(g30.15))") "32-bit accuracy is returned: ", real_64
    real_64 = 2._RK / 3._RK
    write(*,"(*(g30.15))") "64-bit accuracy is returned: ", real_64
end program test
which gives the following output,
$gfortran -std=gnu *.f95 -o main
$main
32-bit accuracy is returned: 0.666666686534882
64-bit accuracy is returned: 0.666666666666667
Even though the variable is declared as real64, the result on the first line is still wrong, in the sense that it does not have the 64-bit precision you want. The reason is that the computation is first done in the precision of the literal constants (the default 32-bit real kind) and only then stored in the 64-bit variable real_64, hence the difference from the more accurate answer on the second line of the output.
So the bottom-line message is: It is always a good practice to explicitly declare the kind of the literal constants in Fortran using the "underscore" convention.
The answer to your question is: yes, you do need to indicate that the constant is double precision. 0.1 is a common example of this problem, as its 4-byte and 8-byte representations are different. Other constants (e.g. 0.5), where the extended precision bits are all zero, don't have this problem.
This behaviour was introduced into Fortran at F90 and has caused problems when converting and reusing many legacy FORTRAN codes. Prior to F90, the result of double precision a = 0.1 could have used a real 0.1 or a double 0.1 constant, although all compilers I used provided a double precision value. This can be a common source of inconsistent results when testing legacy codes against published results; examples are frequently reported, e.g. PI=3.141592654 appeared in code on a forum this week.
However, using 0.1 as a subroutine argument has always caused problems, as this would be transferred as a real constant.
So, given the history of how real constants have been handled, you do need to explicitly specify a double precision constant where one is required. It is not a user-friendly approach.
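A minimal sketch of the 0.1 case (the comments show approximate values on a typical IEEE machine; the exact digits printed depend on the compiler's list-directed output):
program tenth
    implicit none
    double precision :: a, b
    a = 0.1      ! default (single precision) 0.1, widened on assignment
    b = 0.1d0    ! double precision 0.1 from the start
    print *, a   ! prints something like 0.10000000149011612 (carries the single precision error)
    print *, b   ! prints something like 0.10000000000000001 (correct to double precision)
end program tenth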
How do I increase the precision in Tcl?
I am getting b2 below as -0.000001, whereas the actual value is -7.95553e-007:
set b2 [lindex $b1 0]
I tried "set tcl_precision 12" but it did not change anything.
Tcl these days uses a floating point rendering system that means by default it never loses any precision at all when a double-precision floating point number is automatically converted to a string and back, while simultaneously using the minimum number of decimal digits in the string. It has had this code since Tcl 8.5 and uses it whenever the tcl_precision global variable is set to its default value (0 these days). In the future, this may well become a hard-core default, but I don't think it has done so yet.
Older versions of Tcl (all currently unsupported) instead used that tcl_precision global to control the number of decimal digits used; setting it to a non-zero value still has that effect, for backward compatibility. The old default value was 15, which usually did the right thing; 17 ensures that no information is ever lost, even in tricky edge cases, but at the cost of often producing effectively noise digits at the end. (That is a consequence of the differences between arithmetic in base-2 and base-10, and is common to all languages that use IEEE binary floating point math.)
If you want to use a definite number of decimal digits after the point because you are producing output for human consumption, you should use the format command.
format %.5f 1.23; # >>> 1.23000
I understand what a datatype is (intuitively), but I need the formal definition. I don't understand whether it is a set, or whether it's the names 'int', 'float', etc. The formal definition found on Wikipedia is confusing.
In computer programming, a data type is a classification identifying one of various types of data, such as floating-point, integer, or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of that type can be stored.
Can anyone help me with that?
Yep. What that's saying is that a data type has three pieces:
The various possible values. So, for example, an eight-bit signed integer might have -128..127. Think of that as a set of values V.
The operations: so an 8-bit signed integer might have +, -, * (multiply), and / (divide). The full definition would define those as functions from V × V into V, or possibly, for division, as a function from V × V into float.
The way it's stored -- I sort of gave it away when I said "eight bit signed integer". The other detail is that I'm assuming a specific representation by the way I showed the range of values.
You might, if you're into object-oriented programming, notice that this is very much like the definition of a class, which is defined by the storage used by each object and the methods of the class. Providing those parts for some arbitrary thing, but not inheritance rules, gives you what's called an abstract data type.
Update
@Appy, there's some room for differences in the formalities. I was a little subtle because it was late and I was suddenly uncertain whether I'd assumed one's complement or two's complement -- of course it's two's complement. So interpretation is included in my description. Abstractly, though, you'd say it is an algebraic structure T=(V,O) where V is a set of values and O is a set of functions from V into some arbitrary type -- remember that '==', for example, will be a function eq: V × V → {0,1}, so you can't expect every operation to map into V.
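If it helps to see the three pieces in code, here is a small, hypothetical Fortran illustration using the 8-bit integer kind from iso_fortran_env; the value set, the operations, and the representation are all visible:
program int8_example
    use, intrinsic :: iso_fortran_env, only: int8
    implicit none
    integer(int8) :: x
    ! Representation: 8 bits (two's complement on essentially all current hardware)
    print *, 'storage bits  :', bit_size(x)         ! 8
    ! Value set V: -128 .. 127
    print *, 'largest value :', huge(x)             ! 127
    print *, 'smallest value:', -huge(x) - 1_int8   ! -128
    ! Operations O: +, -, * etc. act on V (here 100 + 20 = 120, still within V)
    x = 100_int8
    print *, 'x + 20 =', x + 20_int8
end program int8_example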
I can define it as a classification of a particular type of information. It is easy for humans to distinguish between different types of data. We can usually tell at a glance whether a number is a percentage, a time, or an amount of money. We do this through special symbols %, :, and $.
Basically it's a concept that I am sure you grok. For computers, however, a data type is defined and has various associated attributes: its size, a defining keyword (sometimes), the values it can take (numbers or characters, for example), and the operations that can be done on it, like add and subtract for numbers, append for strings, or compare for characters, etc. These differ from language to language and even from environment to environment (16- vs 32-bit ints, 32- vs 64-bit environments, etc.).
If there is anything I am missing or needs refining please ask as this is fairly open ended.
Consider an untyped buffer, void*, from which I take several bytes to treat them as a floating-point value, float or double. Let's assume that floating-point values are IEEE-754 compatible on my machine. Thus, there might be a binary sequence from the buffer that doesn't represent any valid floating-point value. An attempt to operate on such a floating-point variable stuffed with invalid binary would result in a program fault.
How can I guard against a program abort in such a case - that is, how can I get informed about the invalid binary in a floating-point variable?
p.s. What is the correct way to extract a floating-point value from an untyped buffer? I've heard that the trick with union casting, like
void* buf;
union U {int i; float f;};
U *u = (U*) buf;
u->i = binvalue;
fpvalue = u->f;
is invalid, even if buf is properly aligned.
One of the issues is that on many systems, a float has to be aligned (e.g. to an address multiple of 4 bytes), while an entirely arbitrary pointer might be unaligned (and the pointer could even point into an unmapped address, e.g. if it is close to NULL).
The other issue is that IEEE 754 floats indeed define signalling NaNs (not-a-number values) and so forth. Maybe checking the float values with isnan or isinf could help.
Lastly, the IEEE 754 standard is complex enough that not every system implements it entirely. You would probably have to write very unportable code if you want to be fail-proof. While I did read that standard in the past, I have also forgotten the gory details.
I am using CUDA 4.0 on a GeForce GTX 580 (Fermi). I have numbers as small as 7.721155e-43. I want to multiply them with each other just once, or, better said, I want to calculate 7.721155e-43 * 7.721155e-43.
My experience showed me I can't do it straightforwardly. Could you please give me a suggestion? Do I need to use double precision? How?
The magnitude of the smallest normal IEEE single-precision number is about 1.18e-38; the smallest denormal gets you down to about 1.40e-45. As a consequence, an operand of magnitude 7.72e-43 will comprise only about 9 non-zero bits, which in itself may already be a problem, even before you get to the multiplication (whose result will underflow to zero in single precision). So you may also want to look at any upstream computation that produces these tiny numbers.
If these small numbers are intermediate terms in a mathematical expression, rewriting that expression into a mathematically equivalent one that does not involve tiny intermediates would be one way of addressing the issue. Or you could scale some operands by factors that are powers of two (so as to not incur additional round-off due to the scaling). For example, scale by 2^24 = 16777216.
Lastly, you can switch part of the computation to double precision. To do so, simply introduce temporary variables of type double, perform the computation on them, then convert the final result back to float:
float r, f = 7.721155e-43f;
double d, t;
d = (double)f; // explicit cast is not necessary, since converting to wider type
t = d * d;
[... more intermediate computation, leaving result in 't' ...]
r = (float)t; // since conversion is to narrower type, cast will avoid warnings
In statistics we often have to work with likelihoods that end up being very small numbers, and the standard technique is to use logs for everything. Then multiplication on a log scale is just addition, and all intermediate numbers are stored as logs. Indeed it can take a bit of getting used to, but the alternative will often fail even when doing relatively modest computations. In R (for my convenience!), which uses doubles and prints 7 significant figures by default:
> 7.721155e-43 * 7.721155e-43
[1] 5.961623e-85
> exp(log(7.721155e-43) + log(7.721155e-43))
[1] 5.961623e-85
After much painful debugging, I believe I've found a unique property of Fortran that I'd like to verify here at stackoverflow.
What I've been noticing is that, at the very least, the values of internal logical variables are preserved across function or subroutine calls.
Here is some example code to illustrate my point:
PROGRAM function_variable_preserve
    IMPLICIT NONE
    CHARACTER(len=8) :: func_negative_or_not ! Declares function name
    INTEGER :: input
    CHARACTER(len=8) :: output
    input = -9
    output = func_negative_or_not(input)
    WRITE(*,10) input, " is ", output
10  FORMAT("FUNCTION: ", I2, 2A)
    CALL sub_negative_or_not(input, output)
    WRITE(*,20) input, " is ", output
20  FORMAT("SUBROUTINE: ", I2, 2A)
    WRITE(*,*) 'Expected negative.'
    input = 7
    output = func_negative_or_not(input)
    WRITE(*,10) input, " is ", output
    CALL sub_negative_or_not(input, output)
    WRITE(*,20) input, " is ", output
    WRITE(*,*) 'Expected positive.'
END PROGRAM function_variable_preserve

CHARACTER(len=*) FUNCTION func_negative_or_not(input)
    IMPLICIT NONE
    INTEGER, INTENT(IN) :: input
    LOGICAL :: negative = .FALSE.
    IF (input < 0) THEN
        negative = .TRUE.
    END IF
    IF (negative) THEN
        func_negative_or_not = 'negative'
    ELSE
        func_negative_or_not = 'positive'
    END IF
END FUNCTION func_negative_or_not

SUBROUTINE sub_negative_or_not(input, output)
    IMPLICIT NONE
    INTEGER, INTENT(IN) :: input
    CHARACTER(len=*), INTENT(OUT) :: output
    LOGICAL :: negative = .FALSE.
    IF (input < 0) THEN
        negative = .TRUE.
    END IF
    IF (negative) THEN
        output = 'negative'
    ELSE
        output = 'positive'
    END IF
END SUBROUTINE sub_negative_or_not
This is the output:
FUNCTION: -9 is negative
SUBROUTINE: -9 is negative
Expected negative.
FUNCTION: 7 is negative
SUBROUTINE: 7 is negative
Expected positive.
As you can see, it appears that once the function or subroutine has been called, the logical variable negative, if switched to .TRUE., remains .TRUE. despite being initialized to .FALSE. in the type declaration statement.
I could, of course, correct this problem by just adding a line
negative = .FALSE.
after declaring the variable in my function and subroutine.
However, it seems very odd to me that this be necessary.
For the sake of portability and code reusability, shouldn't the language (or compiler maybe) require re-initialization of all internal variables each time the subroutine or function is called?
To answer your question: yes, Fortran does preserve the values of internal variables through function and subroutine calls.
Under certain conditions ...
If you declare an internal variable with the SAVE attribute, its value is saved from one call to the next. This is, of course, useful in some cases.
However, your question is a common reaction upon first learning about one of Fortran's gotchas: if you initialise an internal variable in its declaration then it automatically acquires the SAVE attribute. You have done exactly that in your subroutines. This is standard-conforming. If you don't want this to happen don't initialise in the declaration.
This is the cause of much surprise and complaint from (some) newcomers to the language. But no matter how hard they complain it's not going to change so you just have to (a) know about it and (b) program in awareness of it.
This isn't too different from static function-scoped variables in C or C++.
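A minimal sketch of the gotcha, with a hypothetical subroutine demo (call it a few times and watch the two counters diverge):
subroutine demo()
    implicit none
    ! Initialised in its declaration: implicitly SAVEd, so the value
    ! persists from one call of demo to the next.
    integer :: ncalls = 0
    ! Declared only and assigned below: starts from 0 on every call.
    integer :: nfresh
    nfresh = 0
    ncalls = ncalls + 1
    nfresh = nfresh + 1
    print *, 'ncalls =', ncalls, '   nfresh =', nfresh
end subroutine demo
Calling demo three times prints ncalls = 1, 2, 3 while nfresh stays at 1, which is exactly the pattern seen with negative in the question.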
Programming language design was in its infancy back when FORTRAN was developed. If it were being designed from scratch today, no doubt many of the design decisions would have been different.
Originally, FORTRAN didn't even support recursion, there was no dynamic memory allocation, programs played all sorts of type-punning games with COMMON blocks and EQUIVALENCE statements, procedures could have multiple entry points... so the memory model was basically for the compiler/linker to lay out everything, even local variables and numeric literal constants, into fixed storage locations, rather than on the stack. If you wanted, you could even write code that changed the value of "2" to "42"!
By now, there is an awful lot of legacy FORTRAN code out there, and compiler writers go to great lengths to preserve backward-compatible semantics. I can't quote chapter and verse from the standard that mandates the behavior you've noted, nor its rationale, but it seems reasonable that backward compatibility trumped modern language design sensibilities, in this instance.
This has been discussed several times here, most recently at Fortran assignment on declaration and SAVE attribute gotcha
You don't have to discover this behavior by experimentation, it is clearly stated in the better textbooks.
Different languages are different and have different behaviors.
There is a historical reason for this behavior. Many compilers for Fortran 77 and earlier preserved the values of ALL local variables across calls of procedures. Programmers weren't supposed to rely upon this behavior, but many did. According to the standard, if you wanted a local (non-COMMON) variable to retain its value you needed to use "SAVE", but many programmers ignored this. In that era programs were less frequently ported to different platforms and compilers, so incorrect assumptions might never be noticed. It is common to find this problem in legacy programs; current Fortran compilers typically provide a compiler switch to cause all variables to be saved.
It would be silly for the language standard to require that all local variables retain their values. But an intermediate requirement that would rescue many programs that were careless with "SAVE" would be to require all variables initialized in their declarations to automatically have the SAVE attribute. Hence what you discovered....