Real number arithmetic in a general purpose language? - language-agnostic

As (hopefully) most of you know, floating point arithmetic is different from real number arithmetic. It's for starters imprecise. Many numbers, especially decimals (0.1, 0.3) cannot be represented, leading to problems like this. A more thorough list can be found here.
Are there any general purpose languages that have built-in support for something closer to real number arithmetic? If not, what are good libraries that support this?
EDIT: Arbitrary precision decimal
datatypes are not what I am looking
for. I want to be able to represent
numbers like 1/3, sqrt(3), or 1 + 2i as well.

Though I hate to say it, Fortran. It has extensive support for arbitrary-precision arithmetic and tons of support for big-number calculations. It's ancient and gross, but it gets the job done.

All the numbers used in your examples are algebraic numbers, and can be represented
finitely as roots of polynomials with integer coefficients.
The same cannot be said of real numbers in general, which is easily seen when one
considers that the reals are uncountable, but the set of computer programs is
countable. Therefore most reals will not have a finite representation in code.

What you are looking for is symbolic calculation (MATLAB and other tools used in math and engineering are good at it).
If you want a general purposed language, I think expression tree in C# is good point to start with. In the essence, the ability to store the expression (instead of evaluate the expression into real values) is the key to be able to perform symbolic calculation. Note that expression tree does not provide symbolic calculation, it just provides the data structure that supports symbolic calculation.

This question is interesting, but raises some issues. First, you will never be able to represent all the real numbers using a (even theoretically infinite) computer, for cardinality reasons.
What you are looking for is a "symbolic numbers" datatype. You can imagine some sort of expression tree, with predefined constants, arithmetical operations, and perhaps algebraic (roots of polynomials) and transcendantal (exp, sin, cos, log, etc) functions.
Now the fun part of the story: you cannot find an algorithm which tells whether two such trees are representing the same number (or equivalently, which test whether such a tree is zero). I won't state anything precise, but as a hint, this is similar to the Halting Problem (for computer scientists) or the Gödel Incompleteness Theorem (for mathematicians).
This renders such a class pretty useless.
For some subfields of the reals, you have canonical forms, like a/b for the rationals, or finite algebraic extensions of the rationals (a/b + ic/d for complex rationals, a/b + sqrt(2) * a/b for Q[sqrt(2)], etc). These can be used to represent some particular sets of algebraic numbers.
In practice, this is the most complicated thing you will need. If you have a particular necessity, like ranges of floating point numbers (to prove some result is whithin a specified interval, this is probably the closest you can get to real numbers), or arbitrary precision numbers, you have freely available classes everywhere. Google boost::range for the former, and gmp for the latter.

There are several languages with support for rational and complex numbers. Scheme, for instance, has support built in for arbitrarily precise rational numbers, and complex numbers with either rational, floating point, or integral coefficients:
> (+ 1/2 1/3)
5/6
> (* 3 1+1/2i)
3+3/2i
> (+ 1/2 .5)
1.0
If you want to go beyond rational numbers or complex numbers with rational coefficients, to algebraic numbers such as sqrt(2) or closed-form numbers like e, you will probably have to look beyond general purpose programming languages, and use a special purpose mathematical language like Mathematica or Maxima.

To cover the real numbers with any flair you'll need a symbolic package.
Boost, the C++ project, has a Rational library, but that's only part of the story.
You have irrational numbers in all sorts of forms (pi, base of the natural logarithm, square and cube roots, the Champernowne constant, to name only a few). The only way I know of to handle arithmetic operations is a symbolic package with smarts as to the relationship amongst all of these numbers. Assuming you could express e^pi, how would you add one to it? Or take the square root of it?
Mathematica might handle these cases.

Java: java.math.BigDecimal
C#: decimal

A lot of languages have support for that: Java has BigDecimal, Perl has Math::BigFloat and Math::BigRat, Haskell has Integer and a lot of libraries and languages are listed in the wikipedia.

Ada natively supports fixed-point math as well as floating-point. Fixed-point can be much more exact than floating-point, as long as the number's exponents remain in range.
If you need floating-points, but more precision than IEEE gives, there are bignum packages around for just about every language.
I think that's about the best you can do. Neither scheme can exactly represent repeating decimals (like 1/3). It would probably be possible to come up with a scheme that does, but I know of no language that supports such a thing with a built-in type. Even that won't help you with irrational numbers (like pi and e). I believe there's even a theorem that says there will always be unrepresentable numbers, no matter what scheme you come up with.

EDIT: Arbitrary precision decimal
datatypes are not what I am looking
for. I want to be able to represent
numbers like 1/3, sqrt(3), or 1 + 2i
as well.
Ruby has a Rational class, so 1/3 can be expressed exactly as Rational(1,3). It also has a Complex class.

Scheme defines rationals, bignums, floating point and complex numbers. An implementation is not required to support them all, but if they are present, you can mix them and they will to "the right thing".

While its not "built-in", I think C++ (maybe C#) is your best bet. There are classes out there that have been written for this purpose.
http://www.oonumerics.org/oon/

Related

Is there a best-known implementation for a Number Theoretic Transform on the 257 (2^8 + 1) finite field?

I'm a bit of a novice when it comes to implementing FFTs in general, but I have most of the basic ideas down I think. In this specific case, I've an implementation of the number theoretic transform on the 257 finite field. It's basically your typical Radix-2 Cooley-Tukey FFT. What Id like to know is either: is there a good alternative to the Cooley-Tukey Radix-2 that's better suited to doing this particular NTT efficiently (if the answer is an unqualified yes or a yes conditional on something not entirely within the scope of this question, I'm interested in hearing about either), or are there things specific to a Mersenne NTT that allow for a more efficient implementation than a more general case?
I'd say that for dyadic length FFT there is nothing better than Cooley-Tukey.
This has nothing directly to to with Mersenne numbers, any number field with modulus 2^(m*2^n)+1 qualifies. I=2^(m*2^(n-1)) is the complex unit, I^2=2^(m*2^n)=-1 mod (2^(m*2^n)+1), and q=2^(2*m) is a primitive 2^n-th root of unity.
For inspiration for the second point see Section 1 of Schönhage: Asymtotically fast algorithms for the numerical multiplication ..., with overall summary of fast multiplications

In programming languages, what is typically the way used to represent binary values?

I'm not asking about a particular language, but just in general. I know that, for example, #0x or simply 0x is put before the number, or an h is placed after the number, to refer to hexadecimal.
Is there a similar "standard" for binary?
Most of the popular languages don't have a way to enter binary literals. Common Lisp does it using #b prefix, and IIRC PL/I uses a b suffix. Those are the only ones I can think of off the top of my head that allow it.
I found a page at RosettaCode that describes how to enter integer literals in many different languages, including specifying radix.
b is used to represent binary numbers. Very few languages provides this support.
One best example is verilog / system verilog e.g. 4'b0101

Handling Double values on CUDA ( Compute Capability 1.1) [duplicate]

I am writing a program for an embedded hardware that only supports 32-bit single-precision floating-point arithmetic. The algorithm I am implementing, however, requires a 64-bit double-precision addition and comparison. I am trying to emulate double datatype using a tuple of two floats. So a double d will be emulated as a struct containing the tuple: (float d.hi, float d.low).
The comparison should be straightforward using a lexicographic ordering. The addition however is a bit tricky because I am not sure which base should I use. Should it be FLT_MAX? And how can I detect a carry?
How can this be done?
Edit (Clarity): I need the extra significant digits rather than the extra range.
double-float is a technique that uses pairs of single-precision numbers to achieve almost twice the precision of single precision arithmetic accompanied by a slight reduction of the single precision exponent range (due to intermediate underflow and overflow at the far ends of the range). The basic algorithms were developed by T.J. Dekker and William Kahan in the 1970s. Below I list two fairly recent papers that show how these techniques can be adapted to GPUs, however much of the material covered in these papers is applicable independent of platform so should be useful for the task at hand.
https://hal.archives-ouvertes.fr/hal-00021443
Guillaume Da Graça, David Defour
Implementation of float-float operators on graphics hardware,
7th conference on Real Numbers and Computers, RNC7.
http://andrewthall.org/papers/df64_qf128.pdf
Andrew Thall
Extended-Precision Floating-Point Numbers for GPU Computation.
This is not going to be simple.
A float (IEEE 754 single-precision) has 1 sign bit, 8 exponent bits, and 23 bits of mantissa (well, effectively 24).
A double (IEEE 754 double-precision) has 1 sign bit, 11 exponent bits, and 52 bits of mantissa (effectively 53).
You can use the sign bit and 8 exponent bits from one of your floats, but how are you going to get 3 more exponent bits and 29 bits of mantissa out of the other?
Maybe somebody else can come up with something clever, but my answer is "this is impossible". (Or at least, "no easier than using a 64-bit struct and implementing your own operations")
It depends a bit on what types of operations you want to perform. If you only care about additions and subtractions, Kahan Summation can be a great solution.
If you need both the precision and a wide range, you'll be needing a software implementation of double precision floating point, such as SoftFloat.
(For addition, the basic principle is to break the representation (e.g. 64 bits) of each value into its three consitituent parts - sign, exponent and mantissa; then shift the mantissa of one part based on the difference in the exponents, add to or subtract from the mantissa of the other part based on the sign bits, and possibly renormalise the result by shifting the mantissa and adjusting the exponent correspondingly. Along the way, there are a lot of fiddly details to account for, in order to avoid unnecessary loss of accuracy, and deal with special values such as infinities, NaNs, and denormalised numbers.)
Given all the constraints for high precision over 23 magnitudes, I think the most fruitful method would be to implement a custom arithmetic package.
A quick survey shows Briggs' doubledouble C++ library should address your needs and then some. See this.[*] The default implementation is based on double to achieve 30 significant figure computation, but it is readily rewritten to use float to achieve 13 or 14 significant figures. That may be enough for your requirements if care is taken to segregate addition operations with similar magnitude values, only adding extremes together in the last operations.
Beware though, the comments mention messing around with the x87 control register. I didn't check into the details, but that might make the code too non-portable for your use.
[*] The C++ source is linked by that article, but only the gzipped tar was not a dead link.
This is similar to the double-double arithmetic used by many compilers for long double on some machines that have only hardware double calculation support. It's also used as float-float on older NVIDIA GPUs where there's no double support. See Emulating FP64 with 2 FP32 on a GPU. This way the calculation will be much faster than a software floating-point library.
However in most microcontrollers there's no hardware support for floats so they're implemented purely in software. Because of that, using float-float may not increase performance and introduce some memory overhead to save the extra bytes of exponent.
If you really need the longer mantissa, try using a custom floating-point library. You can choose whatever is enough for you, for example change the library to adapt a new 48-bit float type of your own if only 40 bits of mantissa and 7 bits of exponent is needed. No need to spend time for calculating/storing the unnecessary 16 bits anymore. But this library should be very efficient because compiler's libraries often have assembly level optimization for their own type of float.
Another software-based solution that might be of use: GNU MPFR
It takes care of many other special cases and allows arbitrary precision (better than 64-bit double) that you would have to otherwise take care of yourself.
That's not practical. If it was, every embedded 32-bit processor (or compiler) would emulate double precision by doing that. As it stands, none do it that I am aware of. Most of them just substitute float for double.
If you need the precision and not the dynamic range, your best bet would be to use fixed point. IF the compiler supports 64-bit this will be easier too.

What exactly is a datatype?

I understand what a datatype is (intuitively). But I need the formal definition. I don't understand if it is a set or it's the names 'int' 'float' etc. The formal definition found on wikipedia is confusing.
In computer programming, a data type is a classification identifying one of various types of data, such as floating-point, integer, or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of that type can be stored.
Can anyone help me with that?
Yep. What that's saying is that a data type has three pieces:
The various possible values. So, for example, an eight bit signed integer might have -127..128. This of that as a set of values V.
The operations: so an 8-bit signed integer might have +, -, * (multiply), and / (divide). The full definition would define those as functions from V into V, or possible as a function from V into float for division.
The way it's stored -- I sort of gave it away when I said "eight bit signed integer". The other detail is that I'm assuming a specific representation by the way I showed the range of values.
You might, if you're into object oriented programming, notice that this is very much like the definition of a class, which is defined by the storage used by each object, adn the methods of the class. Providing those parts for some arbitrary thing, but not inheritance rules, gives you what's called an abstract data type.
Update
#Appy, there's some room for differences in the formalities. I was a little subtle because it was late and I was suddenly uncertain if I'd assumed one's complement or two's complement -- of course it's two's complement. So interpretation is included in my description. Abstractly, though, you'd say it is a algebraic structure T=(V,O) where V is a set of values, O a set of functions from V into some arbitrary type -- remember '==' for example will be a function eq:V × V → {0,1} so you can't expect every operation to be into V.
I can define it as a classification of a particular type of information. It is easy for humans to distinguish between different types of data. We can usually tell at a glance whether a number is a percentage, a time, or an amount of money. We do this through special symbols %, :, and $.
Basically it's the concept that I am sure you grock. For computers however a data type is defined and has various associated attributes, like size, like a definition keywork (sometimes), the values it can take (numbers or characters for example) and operations that can be done on it like add subtract for numbers and append on string or compare on a character, etc. These differ from language to language and even from environment to env. (16 - 32 bit ints/ 32 - 64 envs./ etc).
If there is anything I am missing or needs refining please ask as this is fairly open ended.

Can coordinates of constructable points be represented exactly?

I'd like to write a program that lets users draw points, lines, and circles as though with a straightedge and compass. Then I want to be able to answer the question, "are these three points collinear?" To answer correctly, I need to avoid rounding error when calculating the points.
Is this possible? How can I represent the points in memory?
(I looked into some unusual numeric libraries, but I didn't find anything that claimed to offer both exact arithmetic and exact comparisons that are guaranteed to terminate.)
Yes.
I highly recommend Introduction to constructions, which is a good basic guide.
Basically you need to be able to compute with constructible numbers - numbers that are either rational, or of the form a + b sqrt(c) where a,b,c were previously created (see page 6 on that PDF). This could be done with algebraic data type (e.g. data C = Rational Integer Integer | Root C C C in Haskell, where Root a b c = a + b sqrt(c)). However, I don't know how to perform tests with that representation.
Two possible approaches are:
Constructible numbers are a subset of algebraic numbers, so you can use algebraic numbers.
All algebraic numbers can be represented using polynomials of whose they are roots. The operations are computable, so if you represent a number a with polynomial p and b with polynomial q (p(a) = q(b) = 0), then it is possible to find a polynomial r such that r(a+b) = 0. This is done in some CASes like Mathematica, example. See also: Computional algebraic number theory - chapter 4
Use Tarski's test and represent numbers. It is slow (doubly exponential or so), but works :) Example: to represent sqrt(2), use the formula x^2 - 2 && x > 0. You can write equations for lines there, check if points are colinear etc. See A suite of logic programs, including Tarski's test
If you turn to computable numbers, then equality, colinearity etc. get undecidable.
I think the only way this would be possible is if you used a symbolic representation,
as opposed to trying to represent coordinate values directly -- so you would have
to avoid trying to coerce values like sqrt(2) into some numerical format. You will
be dealing with irrational numbers that are not finitely representable in binary,
decimal, or any other positional notation.
To expand on Jim Lewis's answer slightly, if you want to operate on points that are constructible from the integers with exact arithmetic, you will need to be able to operate on representations of the form:
a + b sqrt(c)
where a, b, and c are either rational numbers, or representations in the form given above. Wikipedia has a pretty decent article on the subject of what points are constructible.
Answering the question of exact equality (as necessary to establish colinearity) with such representations is a rather tricky problem.
If you try to compare co-ordinates for your points, then you have a problem. Leaving aside co-linearity for a moment, how about just working out whether two points are the same or not?
Supposing that one has given co-ordinates, and the other is a compass-straightedge construction starting from certain other co-ordinates, you want to determine with certainty whether they're the same point or not. Either way is a theorem of Euclidean geometry, it's not something you can just measure. You can prove they aren't the same by spotting some difference in their co-ordinates (for example by computing decimal places of each until you encounter a difference). But in general to prove they are the same cannot be done by approximate methods. Compute as many decimal places as you like of some expansions of 1/sqrt(2) and sqrt(2)/2, and you can prove they're very close together but you won't ever prove they're equal. That takes algebra (or geometry).
Similarly, to show that three points are co-linear you will need theorem-proving software. Represent the points A, B, C by their constructions, and attempt to prove the theorem "A, B and C are colinear". This is very hard - your program will prove some theorems but not others. Much easier is to ask the user for a proof that they are co-linear, and then verify (or refute) that proof, but that's probably not what you want.
In general, constructable points may have an arbitrarily complex symbolic form, so you must use a symbolic representation to work them exactly. As Stephen Canon noted above, you often need numbers of the form a+b*sqrt(c), where a and b are rational and c is an integer. All numbers of this form form a closed set under arithmetic operations. I have written some C++ classes (see rational_radical1.h) to work with these numbers if that is all you need.
It is also possible to construct numbers which are sums of any number of terms of rational multiples of radicals. When dealing with more than a single radicand, the numbers are no longer closed under multiplication and division, so you will need to store them as variable length rational coefficient arrays. The time complexity of operations will then be quadratic in the number of terms.
To go even further, you can construct the square root of any given number, so you could potentially have nested square roots. Here, the representations must be tree-like structures to deal with root hierarchy. While difficult to implement, there is nothing in principle preventing you from working with these representations. I'm not sure just what additional numbers can be constructed, but beyond a certain point, your symbolic representation will be expressive enough to handle very large classes of numbers.
Addendum
Found this Google Books link.
If the grid axes are integer valued then the answer is fairly straight forward, the points are either exactly colinear or they are not.
Typically however, one works with real numbers (well, floating points) and then draws the rounded values on the screen which does exist in integer space. In this case you have no choice but to pick a tolerance and use it to determine colinearity. Keep it small and the users will never know the difference.
You seem to be asking, in effect, "Can the normal mathematics (integer or floating point) used by computers be made to represent real numbers perfectly, with no rounding errors?" And, of course, the answer to that is "No." If you want theoretical correctness, then you will be stuck with the much harder problem of symbolic manipulation and coding up the equivalent of the inferences that are done in geometry. (In short, I'm agreeing with Steve Jessop, above.)
Some thoughts in the hope that they might help.
The sort of constructions you're talking about will require multiplication and division, which means that to preserve exactness you'll have to use rational numbers, which are generally easy to implement on top of a suitable sort of big integer (i.e., of unbounded magnitude). (Common Lisp has these built-in, and there have to be other languages.)
Now, you need to represent square roots of arbitrary numbers, and these have to be mixed in.
Therefore, a number is one of: a rational number, a rational number multiplied by a square root of a rational number (or, alternately, just the square root of a rational), or a sum of numbers. In order to prove anything, you're going to have to get these numbers into some sort of canonical form, which for all I can figure offhand may be annoying and computationally expensive.
This of course means that the users will be restricted to rational points and cannot use arbitrary rotations, but that's probably not important.
I would recommend no to try to make it perfectly exact.
The first reason for this is what you are asking here, the rounding error and all that stuff that comes with floating point calculations.
The second one is that you have to round your input as the mouse and screen work with integers. So, initially all user input would be integers, and your output would be integers.
Beside, from a usability point of view, its easier to click in the neighborhood of another point (in a line for example) and that the interface consider you are clicking in the point itself.