Is there a best-known implementation for a Number Theoretic Transform on the 257 (2^8 + 1) finite field?

I'm a bit of a novice when it comes to implementing FFTs in general, but I think I have most of the basic ideas down. In this specific case, I have an implementation of the number theoretic transform on the 257 finite field. It's basically your typical radix-2 Cooley-Tukey FFT. What I'd like to know is either: is there a good alternative to radix-2 Cooley-Tukey that's better suited to doing this particular NTT efficiently (whether the answer is an unqualified yes, or a yes conditional on something not entirely within the scope of this question, I'm interested in hearing about either), or are there things specific to a Mersenne NTT that allow for a more efficient implementation than the more general case?

I'd say that for dyadic (power-of-two) length FFTs there is nothing better than Cooley-Tukey.
This has nothing directly to do with Mersenne numbers (257 = 2^8 + 1 is actually a Fermat number); any modulus of the form 2^(m*2^n)+1 qualifies. I = 2^(m*2^(n-1)) plays the role of the complex unit, since I^2 = 2^(m*2^n) ≡ -1 (mod 2^(m*2^n)+1), and q = 2^(2m) is a primitive 2^n-th root of unity.
For inspiration on the second point, see Section 1 of Schönhage: Asymptotically fast algorithms for the numerical multiplication ..., which also gives an overall summary of fast multiplication algorithms.
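To make the first point concrete, here is a minimal C++ sketch of an iterative radix-2 Cooley-Tukey NTT mod 257 (an illustration, not a tuned implementation). It assumes a transform length dividing 256 and uses the fact that 3 is a primitive root mod 257, so w = 3^(256/n) is a primitive n-th root of unity:

#include <algorithm>
#include <cstdint>
#include <vector>

constexpr uint32_t P = 257;  // the Fermat prime 2^8 + 1

// b^e mod P by square-and-multiply.
uint32_t pow_mod(uint32_t b, uint32_t e) {
    uint32_t r = 1;
    for (; e; e >>= 1, b = b * b % P)
        if (e & 1) r = r * b % P;
    return r;
}

// In-place radix-2 NTT of length n (a power of two dividing 256).
// Pass w = pow_mod(3, 256 / n) for the forward transform; for the
// inverse, pass the inverse root pow_mod(w, n - 1) and scale the
// result by n^{-1} mod 257 afterwards.
void ntt(std::vector<uint32_t>& a, uint32_t w) {
    const size_t n = a.size();
    // Bit-reversal permutation.
    for (size_t i = 1, j = 0; i < n; ++i) {
        size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    // Butterfly passes, doubling the block length each time.
    for (size_t len = 2; len <= n; len <<= 1) {
        uint32_t wl = pow_mod(w, n / len);  // primitive len-th root
        for (size_t i = 0; i < n; i += len) {
            uint32_t wk = 1;
            for (size_t k = 0; k < len / 2; ++k) {
                uint32_t u = a[i + k];
                uint32_t v = a[i + k + len / 2] * wk % P;
                a[i + k] = (u + v) % P;
                a[i + k + len / 2] = (u + P - v) % P;
                wk = wk * wl % P;
            }
        }
    }
}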

Related

Speeding up Berlekamp Welch algorithm using FFT for Shamir Secret Share

I believe the Berlekamp-Welch algorithm can be used to correctly reconstruct the secret in Shamir Secret Sharing as long as $t<n/3$. How can we speed up the BW algorithm implementation using the Fast Fourier transform?
Berlekamp-Welch is used to correct errors for the original encoding scheme for Reed-Solomon codes, where there is a fixed set of data points known to encoder and decoder, and a polynomial based on the message to be transmitted, unknown to the decoder. This approach was mostly replaced by switching to a BCH-type code, where a fixed polynomial known to both encoder and decoder is used instead.
Berlekamp-Welch inverts a matrix, with time complexity O(n^3). Gao improved on this, reducing the time complexity to O(n^2) based on the extended Euclidean algorithm. Note that the R[-1] product series is precomputed based on the fixed set of data points, in order to achieve the O(n^2) time complexity. Link to the Wiki section on "original view" decoders:
https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Reed_Solomon_original_view_decoders
The discrete Fourier transform is essentially the same as the encoding process, except there is a constraint on the fixed data points for encoding (they need to be successive powers of a primitive element of the field) in order for the inverse transform to work. The inverse transform only works if the received data is error free. Lagrange interpolation doesn't have the constraint on the data points, and doesn't require the received data to be error free. Wiki has a section on this also:
https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Discrete_Fourier_transform_and_its_inverse
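To make the encoding-as-DFT point concrete, here is a minimal C++ sketch (assuming, purely for illustration, the small prime field GF(257) with primitive element 3, rather than the binary fields RS codes usually use):

#include <cstdint>
#include <vector>

// "Original view" RS encoding: evaluate the message polynomial at the
// points g^0, g^1, ..., g^(n-1). With this choice of points the
// encoding is exactly a discrete Fourier transform over the field, so
// an O(n log n) NTT can replace this O(n^2) evaluation loop.
std::vector<uint32_t> rs_encode(const std::vector<uint32_t>& msg, size_t n) {
    const uint32_t P = 257, g = 3;  // g is a primitive element of GF(257)
    std::vector<uint32_t> code(n);
    uint32_t x = 1;  // g^i
    for (size_t i = 0; i < n; ++i) {
        uint32_t y = 0;  // Horner evaluation of msg at x
        for (size_t j = msg.size(); j-- > 0; )
            y = (y * x + msg[j]) % P;
        code[i] = y;
        x = x * g % P;
    }
    return code;
}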
In coding theory, the Welch-Berlekamp key equation is an interpolation problem, i.e. w(x)s(x) = n(x) for x = x_1, x_2, ..., x_m, where s(x) is known. Its solution is a polynomial pair (w(x), n(x)) satisfying deg(n(x)) < deg(w(x)) <= m/2 (here m is even).
The Welch-Berlekamp algorithm solves this in O(m^2). On the other hand, D.B. Blake et al. described the solution set as a module of rank 2 and gave another algorithm (called the modular approach), also with complexity O(m^2). You can see the paper (DOI: 10.1109/18.391235).
Over binary fields, FFTs are awkward since the size of the multiplicative group cannot be a power of 2. However, Lin et al. give a new polynomial basis such that FFTs over binary fields have complexity O(n log n). Furthermore, this method has been used in decoding Reed-Solomon (RS) codes, where a modular approach is taken. This modular approach takes advantage of the FFT, so that its complexity is O(n log^2 n). This is the best complexity to date. The details are in (DOI: 10.1109/TCOMM.2022.3215998) and in (https://arxiv.org/abs/2207.11079, open access).
To sum up, there exists a fast modular approach which uses the FFT and is capable of solving the interpolation problem in RS decoding. Note that this method requires the evaluation set to be a subspace V or a coset V + a. Maybe the above information is helpful.

CUDA cublas<t>gbmv understanding

I recently wanted to use a simple CUDA matrix-vector multiplication. I found a proper function in the cuBLAS library: cublas<t>gbmv. Here is the official documentation.
But it is actually very poor, so I didn't manage to understand what the kl and ku parameters mean. Moreover, I have no idea what a stride is (it must also be provided).
There is a brief explanation of these parameters (Page 37), but it looks like I need to know something else.
A search on the internet doesn't provide tons of useful information on this question, mostly references to different versions of the documentation.
So I have several questions to GPU/CUDA/cublas gurus:
How do I find more understandable docs or guides about using cublas?
If you know how to use this particular function, could you explain to me how to use it?
Maybe the cublas library is somewhat extraordinary and everyone uses something more popular, better documented, and so on?
Thanks a lot.
So BLAS (Basic Linear Algebra Subprograms) generally is an API to, as the name says, basic linear algebra routines. It includes vector-vector operations (level 1 blas routines), matrix-vector operations (level 2) and matrix-matrix operations (level 3). There is a "reference" BLAS available that implements everything correctly, but most of the time you'd use an optimized implementation for your architecture. cuBLAS is an implementation for CUDA.
The BLAS API was so successful as an API that describes the basic operations that it's become very widely adopted. However, (a) the names are incredibly cryptic because of architectural limitations of the day (this was 1979, and the API was defined using names of 8 characters or less to ensure it could widely compile), and (b) it is successful because it's quite general, and so even the simplest function calls require a lot of extraneous arguments.
Because it's so widespread, it's often assumed that if you're doing numerical linear algebra, you already know the general gist of the API, so implementation manuals often leave out important details, and I think that's what you're running into.
The Level 2 and 3 routines generally have function names of the form TMMOO.. where T is the numerical type of the matrix/vector (S/D for single/double precision real, C/Z for single/double precision complex), MM is the matrix type (GE for general - eg, just a dense matrix you can't say anything else about; GB for a general banded matrix, SY for symmetric matrices, etc), and OO is the operation.
This all seems slightly ridiculous now, but it worked and works relatively well -- you quickly learn to scan these for familiar operations so that SGEMV is a single-precision general-matrix times vector multiplication (which is probably what you want, not SGBMV -- the GB variant is for banded matrices, where kl and ku give the number of sub- and super-diagonals stored), DGEMM is double-precision matrix-matrix multiply, etc. But it does take some practice.
So if you look at the cublas sgemv instructions, or in the documentation of the original, you can step through the argument list. First, the basic operation is
This function performs the matrix-vector multiplication
y = α op(A) x + β y
where A is an m x n matrix stored in column-major format, x and y are vectors, and α and β are scalars,
where op(A) can be A, A^T (transpose), or A^H (conjugate transpose). So if you just want y = Ax, as is the common case, then α = 1, β = 0, and transa == CUBLAS_OP_N.
incx is the stride between successive elements in x; there are lots of situations where this comes in handy, but if x is just a simple 1D array containing the vector, then the stride will be 1.
And that's about all you need for SGEMV.
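Putting it together, here is a minimal sketch of the common case y = Ax in single precision (illustrative sizes; error checking omitted for brevity):

#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

// y = A*x with cublasSgemv for a dense m x n matrix.
// A must be stored in column-major order, as BLAS expects.
int main() {
    const int m = 4, n = 3;
    std::vector<float> hA(m * n, 1.0f);  // column-major m x n
    std::vector<float> hx(n, 1.0f), hy(m, 0.0f);

    float *dA, *dx, *dy;
    cudaMalloc(&dA, m * n * sizeof(float));
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, m * sizeof(float));
    cudaMemcpy(dA, hA.data(), m * n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;  // y = 1*A*x + 0*y
    // lda = m (leading dimension of column-major A), incx = incy = 1
    cublasSgemv(handle, CUBLAS_OP_N, m, n, &alpha, dA, m, dx, 1, &beta, dy, 1);

    cudaMemcpy(hy.data(), dy, m * sizeof(float), cudaMemcpyDeviceToHost);
    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dx); cudaFree(dy);
    return 0;
}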

Real number arithmetic in a general purpose language?

As (hopefully) most of you know, floating-point arithmetic is different from real number arithmetic. For starters, it's imprecise: many numbers, especially decimal fractions like 0.1 and 0.3, cannot be represented exactly, leading to problems like this. A more thorough list can be found here.
Are there any general purpose languages that have built-in support for something closer to real number arithmetic? If not, what are good libraries that support this?
EDIT: Arbitrary precision decimal datatypes are not what I am looking for. I want to be able to represent numbers like 1/3, sqrt(3), or 1 + 2i as well.
Though I hate to say it, Fortran. It has extensive support for arbitrary-precision arithmetic and tons of support for big-number calculations. It's ancient and gross, but it gets the job done.
All the numbers used in your examples are algebraic numbers, and can be represented finitely as roots of polynomials with integer coefficients. The same cannot be said of real numbers in general, which is easily seen when one considers that the reals are uncountable, but the set of computer programs is countable. Therefore most reals will not have a finite representation in code.
What you are looking for is symbolic calculation (MATLAB and other tools used in math and engineering are good at it).
If you want a general purpose language, I think expression trees in C# are a good point to start with. In essence, the ability to store the expression (instead of evaluating it into floating-point values) is the key to being able to perform symbolic calculation. Note that an expression tree does not provide symbolic calculation by itself; it just provides the data structure that supports symbolic calculation.
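A minimal sketch of such an expression tree in C++ (the names here are illustrative, not from any particular library):

#include <memory>

// A "symbolic number" stored as an expression tree: the expression is
// kept, not evaluated to a float, so 1/3 + sqrt(3) stays exact until
// you explicitly choose to approximate or simplify it.
struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
    enum Kind { Rational, Add, Mul, Sqrt } kind;
    long long num = 0, den = 1;  // payload when kind == Rational
    ExprPtr lhs, rhs;            // children for Add/Mul (Sqrt uses lhs only)
};

ExprPtr rat(long long n, long long d) {
    auto e = std::make_shared<Expr>(); e->kind = Expr::Rational;
    e->num = n; e->den = d; return e;
}
ExprPtr add(ExprPtr a, ExprPtr b) {
    auto e = std::make_shared<Expr>(); e->kind = Expr::Add;
    e->lhs = a; e->rhs = b; return e;
}
ExprPtr sqrt_(ExprPtr a) {
    auto e = std::make_shared<Expr>(); e->kind = Expr::Sqrt;
    e->lhs = a; return e;
}

// Example: represent 1/3 + sqrt(3) with no rounding anywhere.
// ExprPtr x = add(rat(1, 3), sqrt_(rat(3, 1)));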
This question is interesting, but raises some issues. First, you will never be able to represent all the real numbers using a computer (even a theoretically infinite one), for cardinality reasons.
What you are looking for is a "symbolic numbers" datatype. You can imagine some sort of expression tree, with predefined constants, arithmetical operations, and perhaps algebraic (roots of polynomials) and transcendental (exp, sin, cos, log, etc.) functions.
Now the fun part of the story: you cannot find an algorithm which tells whether two such trees represent the same number (or equivalently, which tests whether such a tree is zero). I won't state anything precise, but as a hint, this is similar to the Halting Problem (for computer scientists) or Gödel's Incompleteness Theorem (for mathematicians).
This renders such a class pretty useless.
For some subfields of the reals, you have canonical forms, like a/b for the rationals, or finite algebraic extensions of the rationals (a/b + i c/d for complex rationals, a/b + sqrt(2) * c/d for Q[sqrt(2)], etc). These can be used to represent some particular sets of algebraic numbers.
In practice, this is the most complicated thing you will need. If you have a particular necessity, like ranges of floating-point numbers (to prove some result is within a specified interval; this is probably the closest you can get to real numbers), or arbitrary precision numbers, there are freely available classes everywhere. Google Boost's interval arithmetic library for the former, and GMP for the latter.
There are several languages with support for rational and complex numbers. Scheme, for instance, has support built in for arbitrarily precise rational numbers, and complex numbers with either rational, floating point, or integral coefficients:
> (+ 1/2 1/3)
5/6
> (* 3 1+1/2i)
3+3/2i
> (+ 1/2 .5)
1.0
If you want to go beyond rational numbers or complex numbers with rational coefficients, to algebraic numbers such as sqrt(2) or closed-form numbers like e, you will probably have to look beyond general purpose programming languages, and use a special purpose mathematical language like Mathematica or Maxima.
To cover the real numbers with any flair you'll need a symbolic package.
Boost, the C++ project, has a Rational library, but that's only part of the story.
You have irrational numbers in all sorts of forms (pi, base of the natural logarithm, square and cube roots, the Champernowne constant, to name only a few). The only way I know of to handle arithmetic operations is a symbolic package with smarts as to the relationship amongst all of these numbers. Assuming you could express e^pi, how would you add one to it? Or take the square root of it?
Mathematica might handle these cases.
Java: java.math.BigDecimal
C#: decimal
A lot of languages have support for that: Java has BigDecimal, Perl has Math::BigFloat and Math::BigRat, Haskell has Integer, and a lot of other libraries and languages are listed on Wikipedia.
Ada natively supports fixed-point math as well as floating-point. Fixed-point can be much more exact than floating-point, as long as the number's exponents remain in range.
If you need floating point, but with more precision than IEEE gives, there are bignum packages around for just about every language.
I think that's about the best you can do. Neither scheme can exactly represent repeating decimals (like 1/3). It would probably be possible to come up with a scheme that does, but I know of no language that supports such a thing with a built-in type. Even that won't help you with irrational numbers (like pi and e). I believe there's even a theorem that says there will always be unrepresentable numbers, no matter what scheme you come up with.
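As an example of one such package, GMP (mentioned in another answer above) provides exact rationals through its C++ interface. A minimal sketch (link with -lgmpxx -lgmp):

#include <gmpxx.h>
#include <iostream>

// Exact rational arithmetic with GMP: 1/3 really is 1/3, and
// 1/3 + 1/6 is exactly 1/2, with no rounding anywhere.
int main() {
    mpq_class a(1, 3), b(1, 6);
    mpq_class c = a + b;
    std::cout << c << "\n";  // prints 1/2
    return 0;
}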
EDIT: Arbitrary precision decimal datatypes are not what I am looking for. I want to be able to represent numbers like 1/3, sqrt(3), or 1 + 2i as well.
Ruby has a Rational class, so 1/3 can be expressed exactly as Rational(1,3). It also has a Complex class.
Scheme defines rationals, bignums, floating point and complex numbers. An implementation is not required to support them all, but if they are present, you can mix them and they will do "the right thing".
While it's not "built-in", I think C++ (maybe C#) is your best bet. There are classes out there that have been written for this purpose.
http://www.oonumerics.org/oon/

Can coordinates of constructable points be represented exactly?

I'd like to write a program that lets users draw points, lines, and circles as though with a straightedge and compass. Then I want to be able to answer the question, "are these three points collinear?" To answer correctly, I need to avoid rounding error when calculating the points.
Is this possible? How can I represent the points in memory?
(I looked into some unusual numeric libraries, but I didn't find anything that claimed to offer both exact arithmetic and exact comparisons that are guaranteed to terminate.)
Yes.
I highly recommend Introduction to constructions, which is a good basic guide.
Basically you need to be able to compute with constructible numbers - numbers that are either rational, or of the form a + b sqrt(c) where a, b, c were previously created (see page 6 of that PDF). This could be done with an algebraic data type (e.g. data C = Rational Integer Integer | Root C C C in Haskell, where Root a b c = a + b sqrt(c)). However, I don't know how to perform equality tests with that representation.
Two possible approaches are:
Constructible numbers are a subset of algebraic numbers, so you can use algebraic numbers.
All algebraic numbers can be represented using the polynomials of which they are roots. The operations are computable, so if you represent a number a with polynomial p and b with polynomial q (p(a) = q(b) = 0), then it is possible to find a polynomial r such that r(a+b) = 0. This is done in some CASes like Mathematica, example. See also: Computational algebraic number theory - chapter 4.
Use Tarski's test and represent numbers by formulas. It is slow (doubly exponential or so), but it works :) Example: to represent sqrt(2), use the formula x^2 - 2 = 0 && x > 0. You can write equations for lines there, check if points are collinear, etc. See A suite of logic programs, including Tarski's test.
If you turn to computable numbers, then equality, collinearity, etc. become undecidable.
I think the only way this would be possible is if you used a symbolic representation, as opposed to trying to represent coordinate values directly -- so you would have to avoid trying to coerce values like sqrt(2) into some numerical format. You will be dealing with irrational numbers that are not finitely representable in binary, decimal, or any other positional notation.
To expand on Jim Lewis's answer slightly, if you want to operate on points that are constructible from the integers with exact arithmetic, you will need to be able to operate on representations of the form:
a + b sqrt(c)
where a, b, and c are either rational numbers, or representations in the form given above. Wikipedia has a pretty decent article on the subject of what points are constructible.
Answering the question of exact equality (as necessary to establish collinearity) with such representations is a rather tricky problem.
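For the simplest non-trivial case, where a single fixed radical suffices, exact arithmetic and exact comparison are straightforward. A minimal C++ sketch for Q[sqrt(2)] (illustrative only; the fully general constructible-number case needs the recursive representation described above):

#include <numeric>

// Exact arithmetic in Q[sqrt(2)]: values a + b*sqrt(2) with rational
// a, b. Because sqrt(2) is irrational, this representation is unique,
// so operator== is an exact equality test with no rounding error.
struct Rat {
    long long n, d;  // kept canonical: gcd(n, d) == 1, d > 0
    Rat(long long n_ = 0, long long d_ = 1) : n(n_), d(d_) {
        long long g = std::gcd(n, d);
        if (g) { n /= g; d /= g; }
        if (d < 0) { n = -n; d = -d; }
    }
};
Rat operator+(Rat a, Rat b) { return Rat(a.n * b.d + b.n * a.d, a.d * b.d); }
Rat operator-(Rat a, Rat b) { return Rat(a.n * b.d - b.n * a.d, a.d * b.d); }
Rat operator*(Rat a, Rat b) { return Rat(a.n * b.n, a.d * b.d); }
bool operator==(Rat a, Rat b) { return a.n == b.n && a.d == b.d; }

struct QSqrt2 { Rat a, b; };  // value = a + b*sqrt(2)
QSqrt2 operator+(QSqrt2 x, QSqrt2 y) { return {x.a + y.a, x.b + y.b}; }
QSqrt2 operator-(QSqrt2 x, QSqrt2 y) { return {x.a - y.a, x.b - y.b}; }
QSqrt2 operator*(QSqrt2 x, QSqrt2 y) {
    // (a1 + b1 r)(a2 + b2 r) = a1 a2 + 2 b1 b2 + (a1 b2 + b1 a2) r
    return {x.a * y.a + Rat(2) * (x.b * y.b), x.a * y.b + x.b * y.a};
}
bool operator==(QSqrt2 x, QSqrt2 y) { return x.a == y.a && x.b == y.b; }

// Exact collinearity test: the cross product of (P2-P1) and (P3-P1)
// must be exactly zero.
bool collinear(QSqrt2 x1, QSqrt2 y1, QSqrt2 x2, QSqrt2 y2,
               QSqrt2 x3, QSqrt2 y3) {
    return (x2 - x1) * (y3 - y1) == (x3 - x1) * (y2 - y1);
}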
If you try to compare co-ordinates for your points, then you have a problem. Leaving aside co-linearity for a moment, how about just working out whether two points are the same or not?
Supposing that one has given co-ordinates, and the other is a compass-straightedge construction starting from certain other co-ordinates, you want to determine with certainty whether they're the same point or not. Either way is a theorem of Euclidean geometry, it's not something you can just measure. You can prove they aren't the same by spotting some difference in their co-ordinates (for example by computing decimal places of each until you encounter a difference). But in general to prove they are the same cannot be done by approximate methods. Compute as many decimal places as you like of some expansions of 1/sqrt(2) and sqrt(2)/2, and you can prove they're very close together but you won't ever prove they're equal. That takes algebra (or geometry).
Similarly, to show that three points are co-linear you will need theorem-proving software. Represent the points A, B, C by their constructions, and attempt to prove the theorem "A, B and C are colinear". This is very hard - your program will prove some theorems but not others. Much easier is to ask the user for a proof that they are co-linear, and then verify (or refute) that proof, but that's probably not what you want.
In general, constructible points may have an arbitrarily complex symbolic form, so you must use a symbolic representation to work with them exactly. As Stephen Canon noted above, you often need numbers of the form a + b*sqrt(c), where a and b are rational and c is an integer. For a fixed c, all numbers of this form are closed under the arithmetic operations. I have written some C++ classes (see rational_radical1.h) to work with these numbers if that is all you need.
It is also possible to construct numbers which are sums of any number of terms of rational multiples of radicals. When dealing with more than a single radicand, the numbers are no longer closed under multiplication and division, so you will need to store them as variable-length arrays of rational coefficients. The time complexity of operations will then be quadratic in the number of terms.
To go even further, you can construct the square root of any given number, so you could potentially have nested square roots. Here, the representations must be tree-like structures to deal with root hierarchy. While difficult to implement, there is nothing in principle preventing you from working with these representations. I'm not sure just what additional numbers can be constructed, but beyond a certain point, your symbolic representation will be expressive enough to handle very large classes of numbers.
Addendum
Found this Google Books link.
If the grid axes are integer valued then the answer is fairly straightforward: the points are either exactly collinear or they are not.
Typically, however, one works with real numbers (well, floating point) and then draws the rounded values on the screen, which does exist in integer space. In this case you have no choice but to pick a tolerance and use it to determine collinearity. Keep it small and the users will never know the difference.
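In code, that tolerance-based test might look like this (a sketch; the epsilon must be chosen per application):

#include <cmath>

// Three points are treated as collinear when the cross product of the
// two edge vectors falls below a small tolerance.
bool nearly_collinear(double x1, double y1, double x2, double y2,
                      double x3, double y3, double eps = 1e-9) {
    double cross = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1);
    return std::fabs(cross) < eps;
}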
You seem to be asking, in effect, "Can the normal mathematics (integer or floating point) used by computers be made to represent real numbers perfectly, with no rounding errors?" And, of course, the answer to that is "No." If you want theoretical correctness, then you will be stuck with the much harder problem of symbolic manipulation and coding up the equivalent of the inferences that are done in geometry. (In short, I'm agreeing with Steve Jessop, above.)
Some thoughts in the hope that they might help.
The sort of constructions you're talking about will require multiplication and division, which means that to preserve exactness you'll have to use rational numbers, which are generally easy to implement on top of a suitable sort of big integer (i.e., of unbounded magnitude). (Common Lisp has these built-in, and there have to be other languages.)
Now, you need to represent square roots of arbitrary numbers, and these have to be mixed in.
Therefore, a number is one of: a rational number, a rational number multiplied by a square root of a rational number (or, alternately, just the square root of a rational), or a sum of numbers. In order to prove anything, you're going to have to get these numbers into some sort of canonical form, which for all I can figure offhand may be annoying and computationally expensive.
This of course means that the users will be restricted to rational points and cannot use arbitrary rotations, but that's probably not important.
I would recommend not trying to make it perfectly exact.
The first reason is the one you are asking about here: the rounding error and all the other issues that come with floating point calculations.
The second is that you have to round your input anyway, as the mouse and screen work with integers. So, initially all user input will be integers, and your output will be integers.
Besides, from a usability point of view, it's easier if clicking in the neighborhood of another point (on a line, for example) makes the interface consider that you are clicking on the point itself.

Invert 4x4 matrix - Numerical most stable solution needed

I want to invert a 4x4 matrix. My numbers are stored in fixed-point format (1.15.16 to be exact).
With floating-point arithmetic I usually just build the adjoint matrix and divide by the determinant (i.e., brute-force the solution). That has worked for me so far, but when dealing with fixed-point numbers I get an unacceptable precision loss due to all of the multiplications involved.
Note: in fixed-point arithmetic I always throw away some of the least significant bits of intermediate results.
So - what's the most numerically stable way to invert a matrix? I don't mind much about performance, but simply going to floating point would be too slow on my target architecture.
Meta-answer: Is it really a general 4x4 matrix? If your matrix has a special form, then there are direct formulas for inverting that would be fast and keep your operation count down.
For example, if it's a standard homogeneous coordinate transform from graphics, like:
[ux vx wx tx]
[uy vy wy ty]
[uz vz wz tz]
[ 0 0 0 1]
(assuming a composition of rotation and translation matrices; with a scale component the simple transpose below no longer applies)
then there's an easily-derivable direct formula, which is
[ux uy uz -dot(u,t)]
[vx vy vz -dot(v,t)]
[wx wy wz -dot(w,t)]
[ 0 0 0 1 ]
(ASCII matrices stolen from the linked page.)
You probably can't beat that for loss of precision in fixed point.
If your matrix comes from some domain where you know it has more structure, then there's likely to be an easy answer.
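A minimal sketch of that direct formula in C++ (assuming column-major storage m[column][row] and a pure rotation in the upper-left 3x3):

// Direct inverse of a rigid transform [R | t; 0 0 0 1]: the inverse is
// [R^T | -R^T t; 0 0 0 1]. Columns 0..2 of m are u, v, w; column 3 is t.
void invert_rigid(const float m[4][4], float out[4][4]) {
    // Transpose the rotation part.
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            out[j][i] = m[i][j];
    // New translation = -R^T * t = (-dot(u,t), -dot(v,t), -dot(w,t)).
    for (int i = 0; i < 3; ++i)
        out[3][i] = -(m[3][0] * m[i][0] + m[3][1] * m[i][1] + m[3][2] * m[i][2]);
    // Bottom row stays [0 0 0 1].
    out[0][3] = out[1][3] = out[2][3] = 0.0f;
    out[3][3] = 1.0f;
}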
I think the answer to this depends on the exact form of the matrix. A standard decomposition method (LU, QR, Cholesky, etc.) with pivoting (essential!) works fairly well in fixed point, especially for a small 4x4 matrix. See the book 'Numerical Recipes' by Press et al. for a description of these methods.
This paper gives some useful algorithms, but is behind a paywall unfortunately. They recommend a (pivoted) Cholesky decomposition with some additional features too complicated to list here.
I'd like to second the question Jason S raised: are you certain that you need to invert your matrix? This is almost never necessary. Not only that, it is often a bad idea. If you need to solve Ax = b, it is more numerically stable to solve the system directly than to multiply b by A inverse.
Even if you have to solve Ax = b over and over for many values of b, it's still not a good idea to invert A. You can factor A (say LU factorization or Cholesky factorization) and save the factors so you're not redoing that work every time, but you'd still solve the system each time using the factorization.
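As a sketch of what "solve the system directly" looks like for a 4x4 system (Gaussian elimination with partial pivoting, written in floating point for clarity; the same structure applies in fixed point):

#include <cmath>
#include <utility>

// Solve A*x = b by Gaussian elimination with partial pivoting, instead
// of forming A^{-1}. A and b are overwritten; returns false if A is
// (numerically) singular.
bool solve4x4(double A[4][4], double b[4], double x[4]) {
    const int n = 4;
    for (int k = 0; k < n; ++k) {
        // Partial pivoting: pick the largest |A[i][k]| in column k.
        int piv = k;
        for (int i = k + 1; i < n; ++i)
            if (std::fabs(A[i][k]) > std::fabs(A[piv][k])) piv = i;
        if (std::fabs(A[piv][k]) == 0.0) return false;
        if (piv != k) {
            for (int j = 0; j < n; ++j) std::swap(A[k][j], A[piv][j]);
            std::swap(b[k], b[piv]);
        }
        // Eliminate below the pivot.
        for (int i = k + 1; i < n; ++i) {
            double f = A[i][k] / A[k][k];
            for (int j = k; j < n; ++j) A[i][j] -= f * A[k][j];
            b[i] -= f * b[k];
        }
    }
    // Back substitution.
    for (int i = n - 1; i >= 0; --i) {
        double s = b[i];
        for (int j = i + 1; j < n; ++j) s -= A[i][j] * x[j];
        x[i] = s / A[i][i];
    }
    return true;
}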
You might consider widening to a 1.31 fixed-point format before doing your normal algorithm. It'll double the number of multiplications, but you're doing a matrix inversion and anything you do is going to be pretty tied to the multiplier in your processor.
For anyone interested in finding the equations for a 4x4 inverse, you can use a symbolic math package to derive them for you. Even the TI-89 will do it, although it'll take several minutes.
If you give us an idea of what the matrix invert does for you, and how it fits in with the rest of your processing we might be able to suggest alternatives.
-Adam
Let me ask a different question: do you definitely need to invert the matrix (call it M), or do you need to use the matrix inverse to solve other equations? (e.g. Mx = b for known M, b.) Often there are other ways to do this without explicitly needing to calculate the inverse. And if the matrix M is a function of time and changes slowly, then you could calculate the full inverse once, and there are iterative ways to update it.
If the matrix represents an affine transformation (which is often the case with 4x4 matrices, so long as you don't introduce a scaling component), the inverse is simply the transpose of the upper 3x3 rotation part, with the translation column replaced by that transposed part times the negated original translation. Obviously, if you require a generalized solution then looking into Gaussian elimination is probably the easiest.