Numerical Methods - Checking Domain of Phi(x) - numerical-methods

I have just started a MSc in Scientific Computing, and being an engineer my knowledge of real analysis is somewhat limited.
When rewriting f(x) = 0 as a fixed point formulation Phi(x) = x, it is stressed that we must check that Phi maps the interval [a,b] into itself, i.e. that Phi(x) lies in [a,b] for every x in [a,b].
Is there a general real analysis method of checking this, using the Mean Value Theorem for example, or do I need to use the simpler calculus method of checking the minimum and maximum values of Phi(x)? The simpler calculus method doesn't seem to be satisfactory or formal enough in a real analysis sense.
Any assistance would be appreciated.
Kind regards
John

Related

Speeding up Berlekamp Welch algorithm using FFT for Shamir Secret Share

I believe the Berlekamp Welch algorithm can be used to correctly construct the secret using Shamir Secret Share as long as $t<n/3$. How can we speed up the BW algorithm implementation using Fast Fourier transform?
Berlekamp Welch is used to correct errors for the original encoding scheme for Reed Solomon code, where there is a fixed set of data points known to encoder and decoder, and a polynomial based on the message to be transmitted, unknown to the decoder. This approach was mostly replaced by switching to a BCH type code where a fixed polynomial known to both encoder and decoder is used instead.
Berlekamp Welch inverts a matrix with time complexity O(n^3). Gao improved on this, reducing time complexity to O(n^2) based on extended Euclid algorithm. Note that the R[-1] product series is pre-computed based on the fixed set of data points, in order to achieve the O(n^2) time complexity. Link to the Wiki section on "original view" decoders.
https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Reed_Solomon_original_view_decoders
The discrete Fourier transform is essentially the same as the encoding process, except there is a constraint on the fixed data points for encoding (they need to be successive powers of the field primitive) in order for the inverse transform to work. The inverse transform only works if the received data is error free. Lagrange interpolation doesn't have the constraint on the data points, and doesn't require the received data to be error free. Wiki has a section on this also:
https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Discrete_Fourier_transform_and_its_inverse
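As a rough illustration of the point about the inverse transform, here is a minimal sketch over the small prime field GF(17). The field, the primitive element 3, the message, and the function names are all assumptions chosen for the example, and the transform is the naive O(n^2) one rather than an FFT. It encodes a message by evaluating its polynomial at successive powers of the primitive element, then shows that the inverse transform recovers the message only when the received word is error free.

```python
P = 17        # small prime field GF(17), chosen only for illustration
ALPHA = 3     # primitive element of GF(17): its powers cover all 16 nonzero residues
N = P - 1     # transform length

def dft(coeffs):
    """Evaluate the polynomial with these coefficients at ALPHA**0 .. ALPHA**(N-1)."""
    return [sum(c * pow(ALPHA, i * j, P) for j, c in enumerate(coeffs)) % P
            for i in range(N)]

def inverse_dft(values):
    """Recover the coefficients from the evaluations (valid for error-free input)."""
    n_inv = pow(N, P - 2, P)          # 1/N in GF(P), via Fermat's little theorem
    alpha_inv = pow(ALPHA, P - 2, P)  # 1/ALPHA in GF(P)
    return [n_inv * sum(v * pow(alpha_inv, i * j, P) for i, v in enumerate(values)) % P
            for j in range(N)]

message = [5, 0, 11, 2]                       # hypothetical message symbols
padded = message + [0] * (N - len(message))   # message polynomial has degree < 4
codeword = dft(padded)

recovered = inverse_dft(codeword)             # error-free case

corrupted = list(codeword)
corrupted[6] = (corrupted[6] + 1) % P         # inject a single symbol error
spoiled = inverse_dft(corrupted)              # no longer a degree < 4 polynomial
```

In the error-free case `recovered` equals `padded`. After the single error, every coefficient of `spoiled` is disturbed, including the high-order ones that should be zero, which is why a plain inverse transform cannot be used as a decoder on its own.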
In coding theory, the Welch-Berlekamp key equation is an interpolation problem, i.e. w(x)s(x) = n(x) for x = x_1, x_2, ..., x_m, where s(x) is known. Its solution is a polynomial pair (w(x), n(x)) satisfying deg(n(x)) < deg(w(x)) <= m/2. (Here m is even.)
The Welch-Berlekamp algorithm solves this in O(m^2) time. On the other hand, D.B. Blake et al. described the solution set as a module of rank 2 and gave another algorithm (called the modular approach), also with O(m^2) complexity. See the paper (DOI: 10.1109/18.391235).
Over binary fields, the FFT is complicated because the size of the multiplicative group cannot be a power of 2. However, Lin et al. give a new polynomial basis such that the FFT over binary fields has complexity O(n log n). Furthermore, this method has been used in decoding Reed-Solomon (RS) codes, where a modular approach is taken. This modular approach takes advantage of the FFT so that its complexity is O(n log^2 n). This is the best complexity to date. The details are in (DOI: 10.1109/TCOMM.2022.3215998) and in (https://arxiv.org/abs/2207.11079, open access).
To sum up, there exists a fast modular approach which uses FFT and is capable of solving the interpolation problem in RS decoding. Note that this method requires the evaluation set to be a subspace v or v + a. Maybe the above information is helpful.

Do RL methods converge with epsilon = 0?

My question is simple. Will RL methods converge to an optimal policy with epsilon equal to zero? (The reward function takes both negative and positive values.)
Thanks,
No, it does not. With epsilon=0 there is no exploration, and without exploration there is no guarantee of convergence. This is also intuitively sound, since without exploration you cannot learn the environment well enough to find the optimal policy.
For example for the Q-learning algorithm, you can see the formal proof in
Watkins, Christopher JCH, and Peter Dayan. "Q-learning." Machine learning 8.3-4 (1992): 279-292.
which shows that Q-learning does converge to the optimal values if \epsilon goes to zero as the number of observations goes to infinity.
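To make the exploration argument concrete, here is a minimal sketch on a toy problem. It is a hypothetical three-armed bandit with deterministic rewards, not full Q-learning, and the reward values, zero initialization, and tie-breaking rule are assumptions made for the example. With epsilon = 0 the agent locks onto the first arm that looks better than its initial estimates and never discovers the best one.

```python
import random

# Hypothetical three-armed bandit with deterministic rewards (negative and positive).
REWARDS = [-1.0, 1.0, 2.0]

def run_bandit(epsilon, steps=1000, seed=0):
    """Epsilon-greedy action selection with sample-average value estimates."""
    rng = random.Random(seed)
    q = [0.0] * len(REWARDS)      # value estimates, initialized to zero
    counts = [0] * len(REWARDS)   # how often each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(len(REWARDS))
        else:
            # Exploit: pick the arm with the highest estimate (ties go to the lowest index).
            arm = max(range(len(REWARDS)), key=lambda a: q[a])
        counts[arm] += 1
        q[arm] += (REWARDS[arm] - q[arm]) / counts[arm]   # incremental mean
    return q, counts

greedy_q, greedy_counts = run_bandit(epsilon=0.0)
explore_q, explore_counts = run_bandit(epsilon=0.1)
```

In the greedy run the agent tries the first arm once, switches to the second, and then stays there, so the third arm (the best one) is never pulled. With epsilon = 0.1 the third arm is eventually sampled and its estimate becomes the largest.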

Is there a best-known implementation for a Number Theoretic Transform on the 257 (2^8 + 1) finite field?

I'm a bit of a novice when it comes to implementing FFTs in general, but I have most of the basic ideas down I think. In this specific case, I have an implementation of the number theoretic transform on the 257 finite field. It's basically your typical Radix-2 Cooley-Tukey FFT. What I'd like to know is either: is there a good alternative to the Cooley-Tukey Radix-2 that's better suited to doing this particular NTT efficiently (if the answer is an unqualified yes or a yes conditional on something not entirely within the scope of this question, I'm interested in hearing about either), or are there things specific to a Mersenne NTT that allow for a more efficient implementation than a more general case?
I'd say that for dyadic length FFT there is nothing better than Cooley-Tukey.
This has nothing directly to do with Mersenne numbers; any modulus of the form 2^(m*2^n)+1 qualifies. I=2^(m*2^(n-1)) is the complex unit, I^2=2^(m*2^n)=-1 mod (2^(m*2^n)+1), and q=2^(2*m) is a primitive 2^n-th root of unity.
For inspiration for the second point see Section 1 of Schönhage: Asymptotically fast algorithms for the numerical multiplication ..., with overall summary of fast multiplications
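As a small check of the root-of-unity claim, here is a minimal sketch of a recursive radix-2 Cooley-Tukey NTT modulo 257 = 2^8 + 1 (so m = 1, n = 3). It uses q = 2^(2*m) = 4 as the primitive 8th root of unity. The function names and test data are assumptions for the example, and the code aims at clarity, not speed.

```python
P = 257  # Fermat prime 2**8 + 1

def ntt(a, omega):
    """Radix-2 Cooley-Tukey transform mod P; len(a) must be a power of two
    and omega a primitive len(a)-th root of unity mod P."""
    n = len(a)
    if n == 1:
        return list(a)
    omega_sq = omega * omega % P
    even = ntt(a[0::2], omega_sq)   # transform of even-indexed entries
    odd = ntt(a[1::2], omega_sq)    # transform of odd-indexed entries
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % P
        out[k] = (even[k] + t) % P
        out[k + n // 2] = (even[k] - t) % P   # uses omega**(n/2) == -1
        w = w * omega % P
    return out

def intt(a, omega):
    """Inverse transform: run the forward transform with 1/omega, then scale by 1/n."""
    n_inv = pow(len(a), P - 2, P)
    omega_inv = pow(omega, P - 2, P)
    return [x * n_inv % P for x in ntt(a, omega_inv)]

OMEGA = 4   # 2**(2*m) with m = 1: a primitive 8th root of unity mod 257
data = [1, 2, 3, 4, 5, 6, 7, 8]
spectrum = ntt(data, OMEGA)
roundtrip = intt(spectrum, OMEGA)
```

Because every twiddle factor here is a power of two, the multiplications by `w` could in principle be replaced by bit shifts followed by a reduction mod 257, which is where a modulus of this form may pay off in a tuned implementation.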

Numerical integration of a discontinuous function in multiple dimensions

I have a function f(x) = 1/(x + a + b*I*sign(x)) and I want to calculate the integral of
dx dy dz f(x) f(y) f(z) f(x+y+z) f(x-y-z)
over the entire R^3 (b > 0, and a and b are of order unity). This is just a representative example. In practice I have n<7 variables and 2n-1 instances of f(), n of them involving the n integration variables and n-1 of them involving some linear combination of the integration variables. At this stage I'm only interested in a rough estimate with relative error of 1e-3 or so.
I have tried the following libraries :
Steven Johnson's cubature code: the hcubature algorithm works but is abysmally slow, taking hundreds of millions of integrand evaluations for even n=2.
HintLib: I tried adaptive integration with a Genz-Malik rule, the cubature routines, VEGAS and MISER with the Mersenne twister RNG. For n=3 only the first seems to be a somewhat viable option, but it again takes hundreds of millions of integrand evaluations for relerr = 1e-2, which is not encouraging.
For the region of integration I have tried both approaches: Integrating over [-200, 200]^n (i.e. a region so large that it essentially captures most of the integral) and the substitution x = sinh(t) which seems to be a standard trick.
I do not have much experience with numerical analysis but presumably the difficulty lies in the discontinuities from the sign() term. For n=2 and f(x)f(y)f(x-y) there are discontinuities along x=0, y=0, x=y. These create a very sharp peak around the origin (with a different sign in the various quadrants) and sort of 'ridges' at x=0,y=0,x=y along which the integrand is large in absolute value and changes sign as you cross them. So at least I know which regions are important. I was thinking that maybe I could do Monte Carlo but somehow "tell" the algorithm in advance where to focus. But I'm not quite sure how to do that.
I would be very grateful if you had any advice on how to evaluate the integral with a reasonable amount of computing power or how to make my Monte Carlo "idea" work. I've been stuck on this for a while so any input would be welcome. Thanks in advance.
One thing you can do is to use a guiding function for your Monte Carlo integration: given an integral ∫ f(x) dx (I am writing it in 1D for simplicity), write it as ∫ [f(x)/g(x)] g(x) dx, and use g(x) as the distribution from which you sample x. The estimate is then the average of f(x)/g(x) over the samples, provided g is normalized.
Since g(x) is arbitrary, construct it such that (1) it has peaks where you expect them to be in f(x), and (2) such that you can sample x from g(x) (e.g., a gaussian, or 1/(1+x^2)).
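A minimal 1D sketch of the guiding-function idea follows. The integrand, the choice of a Cauchy density 1/(pi*(1+x^2)) as g, and all function names are assumptions made for illustration. The integrand is a smooth stand-in with a peak at the origin and a known integral of pi/2 over the real line, not the discontinuous function from the question.

```python
import math
import random

def f(x):
    # Illustrative integrand peaked at the origin; its integral over R is pi/2.
    return 1.0 / (1.0 + x * x) ** 2

def g(x):
    # Guiding density: the standard Cauchy distribution, normalized to 1.
    return 1.0 / (math.pi * (1.0 + x * x))

def sample_g(rng):
    # Inverse-CDF sampling of the standard Cauchy distribution.
    return math.tan(math.pi * (rng.random() - 0.5))

def importance_estimate(n_samples, seed=0):
    """Estimate the integral of f over R as the sample mean of f(x)/g(x), x drawn from g."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = sample_g(rng)
        total += f(x) / g(x)
    return total / n_samples

estimate = importance_estimate(200_000)
exact = math.pi / 2
```

The closer g follows the shape of |f|, the smaller the variance of f(x)/g(x). For an integrand with ridges along several lines, one option would be a mixture of densities, each concentrated on one ridge, though that goes beyond this sketch.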
Alternatively, you can use a Metropolis-type Markov chain MC. It will find the relevant regions of the integrand (almost) by itself.
Here are a couple of trivial examples.

CUDA cublas<t>gbmv understanding

I recently wanted to use a simple CUDA matrix-vector multiplication. I found a proper function in cublas library: cublas<t>gbmv. Here is the official documentation
But it is actually very poor, so I didn't manage to understand what the kl and ku parameters mean. Moreover, I have no idea what stride is (it must also be provided).
There is a brief explanation of these parameters (Page 37), but it looks like I need to know something else.
A search on the internet doesn't provide tons of useful information on this question, mostly references to different version of documentation.
So I have several questions to GPU/CUDA/cublas gurus:
How do I find more understandable docs or guides about using cublas?
If you know how to use this very function, could you explain how to use it?
Maybe cublas library is somewhat extraordinary and everyone uses something more popular, better documented and so on?
Thanks a lot.
So BLAS (Basic Linear Algebra Subprograms) generally is an API to, as the name says, basic linear algebra routines. It includes vector-vector operations (level 1 blas routines), matrix-vector operations (level 2) and matrix-matrix operations (level 3). There is a "reference" BLAS available that implements everything correctly, but most of the time you'd use an optimized implementation for your architecture. cuBLAS is an implementation for CUDA.
The BLAS API was so successful as an API that describes the basic operations that it's become very widely adopted. However, (a) the names are incredibly cryptic because of architectural limitations of the day (this was 1979, and the API was defined using names of 6 characters or fewer to ensure it could widely compile), and (b) it is successful because it's quite general, and so even the simplest function calls require a lot of extraneous arguments.
Because it's so widespread, it's often assumed that if you're doing numerical linear algebra, you already know the general gist of the API, so implementation manuals often leave out important details, and I think that's what you're running into.
The Level 2 and 3 routines generally have function names of the form TMMOO.. where T is the numerical type of the matrix/vector (S/D for single/double precision real, C/Z for single/double precision complex), MM is the matrix type (GE for general - eg, just a dense matrix you can't say anything else about; GB for a general banded matrix, SY for symmetric matrices, etc), and OO is the operation.
This all seems slightly ridiculous now, but it worked and works relatively well -- you quickly learn to scan these for familiar operations so that SGEMV is a single-precision general-matrix times vector multiplication (which is probably what you want, not SGBMV), DGEMM is double-precision matrix-matrix multiply, etc. But it does take some practice.
So if you look at the cublas sgemv instructions, or in the documentation of the original, you can step through the argument list. First, the basic operation is
This function performs the matrix-vector multiplication
y = α op(A) x + β y
where A is an m x n matrix stored in column-major format, x and y
are vectors, and α and β are scalars.
where op(A) can be A, A^T, or A^H. So if you just want y = Ax, as is the common case, then α = 1, β = 0, and transa == CUBLAS_OP_N.
incx is the stride between different elements in x; there's lots of situations where this would come in handy, but if x is just a simple 1d array containing the vector, then the stride would be 1.
And that's about all you need for SGEMV.
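To show what the arguments mean, here is a minimal pure-Python reference sketch of the non-transposed operation y = alpha*A*x + beta*y with column-major storage and strides. It is not the cuBLAS API: the function name `gemv_reference` is hypothetical, it handles only op(A) = A and positive strides, and it is meant for reading rather than speed. The argument `lda` is the leading dimension, i.e. the distance in memory between the starts of consecutive columns.

```python
def gemv_reference(m, n, alpha, A, lda, x, incx, beta, y, incy):
    """Compute y := alpha*A*x + beta*y in place and return y.

    A is an m x n matrix stored column-major in a flat list, so element
    (i, j) lives at A[i + j*lda]. The j-th entry of x is x[j*incx] and the
    i-th entry of y is y[i*incy].
    """
    for i in range(m):
        acc = 0.0
        for j in range(n):
            acc += A[i + j * lda] * x[j * incx]
        y[i * incy] = alpha * acc + beta * y[i * incy]
    return y

# The 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] laid out column by column.
A = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]

# Plain case: contiguous vectors, so both strides are 1.
y_plain = gemv_reference(2, 3, 1.0, A, 2, [1.0, 1.0, 1.0], 1, 0.0, [0.0, 0.0], 1)

# Strided case: the vector entries sit at every second position of a longer array.
x_strided = [1.0, 99.0, 1.0, 99.0, 1.0]
y_strided = gemv_reference(2, 3, 1.0, A, 2, x_strided, 2, 0.0, [0.0, 0.0], 1)
```

Both calls produce the row sums of the matrix. The strided call skips the filler values in `x_strided` because `incx` is 2.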