How to compute the maximum of a function?

How to compute the maximum of a smooth function defined on [a,b] in Fortran?
For simplicity, assume a polynomial function.
The background is that almost every numerical flux (a concept in numerical PDEs) involves computing the maximum of a certain function over an interval [a,b].

For a 1-D problem with smooth and readily computed derivatives, use Newton-Raphson to find the zeros of the first derivative, then compare the resulting critical points against the endpoint values f(a) and f(b), since the maximum on a closed interval can sit at the boundary.
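A minimal sketch of that approach in C++ (the question asks about Fortran, but the structure carries over directly; the example polynomial, starting point, and tolerance are illustrative):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Illustrative polynomial f(x) = x^3 - 3x and its derivatives.
double f(double x)   { return x * x * x - 3.0 * x; }
double fp(double x)  { return 3.0 * x * x - 3.0; }  // f'(x)
double fpp(double x) { return 6.0 * x; }            // f''(x)

// Newton-Raphson on f'(x) = 0, started at x0 and clamped to [a, b].
// In general, run this from several starting points, since it may
// land on a minimum or an inflection point rather than a maximum.
double critical_point(double x0, double a, double b) {
    double x = x0;
    for (int i = 0; i < 50; ++i) {
        double step = fp(x) / fpp(x);
        x = std::min(b, std::max(a, x - step));
        if (std::fabs(step) < 1e-12) break;
    }
    return x;
}

int main() {
    const double a = -2.0, b = 2.0;
    double xc = critical_point(-0.5, a, b);
    // On a closed interval the maximum may sit at an endpoint,
    // so compare the critical point against f(a) and f(b).
    double xs[] = { a, b, xc };
    double best_x = a, best = f(a);
    for (double x : xs)
        if (f(x) > best) { best = f(x); best_x = x; }
    std::printf("max of f on [%g,%g] is %g at x = %g\n", a, b, best, best_x);
    return 0;
}
```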
For multiple dimensions with readily computed derivatives, you're better off using a method that approximates the Hessian. There are several methods of this type, but I've found the L-BFGS method to be reliable and efficient. There's a convenient, BSD-licensed package provided by a group at Northwestern University. There's also quite a bit of well-tested code at http://www.netlib.org/

Related

Speeding up Berlekamp Welch algorithm using FFT for Shamir Secret Share

I believe the Berlekamp-Welch algorithm can be used to correctly reconstruct the secret in Shamir Secret Sharing as long as $t<n/3$. How can we speed up the BW algorithm implementation using the Fast Fourier Transform?
Berlekamp-Welch is used to correct errors for the original encoding scheme for Reed-Solomon codes, where there is a fixed set of data points known to encoder and decoder, and a polynomial based on the message to be transmitted, unknown to the decoder. This approach was mostly replaced by switching to a BCH-type code, where a fixed polynomial known to both encoder and decoder is used instead.
Berlekamp-Welch inverts a matrix, with time complexity O(n^3). Gao improved on this, reducing the time complexity to O(n^2) based on the extended Euclidean algorithm. Note that the R[-1] product series is pre-computed from the fixed set of data points in order to achieve the O(n^2) time complexity. Here is a link to the Wiki section on "original view" decoders:
https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Reed_Solomon_original_view_decoders
The discrete Fourier transform is essentially the same as the encoding process, except there is a constraint on the fixed data points for encoding (they need to be successive powers of the field's primitive element) in order for the inverse transform to work. The inverse transform only works if the received data is error-free. Lagrange interpolation has neither the constraint on the data points nor the requirement that the received data be error-free. Wiki has a section on this as well:
https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Discrete_Fourier_transform_and_its_inverse
In coding theory, the Welch-Berlekamp key equation is an interpolation problem, i.e. $w(x)s(x) = n(x)$ for $x = x_1, x_2, \ldots, x_m$, where $s(x)$ is known. Its solution is a polynomial pair $(w(x), n(x))$ satisfying $\deg(n(x)) < \deg(w(x)) \le m/2$ (here $m$ is even).
The Welch-Berlekamp algorithm solves this in O(m^2). On the other hand, D.B. Blake et al. described the solution set as a module of rank 2 and gave another algorithm (called the modular approach), also with O(m^2) complexity. You can see the paper (DOI: 10.1109/18.391235).
Over binary fields, FFT is awkward since the size of the multiplicative group cannot be a power of 2. However, Lin et al. give a new polynomial basis such that FFTs over binary fields have complexity O(n log n). Furthermore, this method has been used for decoding Reed-Solomon (RS) codes, taking the modular approach. This modular approach exploits the FFT so that its complexity is O(n log^2 n). This is the best complexity to date. The details are in (DOI: 10.1109/TCOMM.2022.3215998) and in (https://arxiv.org/abs/2207.11079, open access).
To sum up, there exists a fast modular approach which uses the FFT and is capable of solving the interpolation problem in RS decoding. Note that this method requires the evaluation set to be a subspace $v$ or a coset $v + a$. Maybe the above information is helpful.

Flexible programming for inverse function or root finding in Freepascal

I have a huge library of math functions, like pdfs and cdfs of statistical distributions. But often, e.g., the inverse cdf can only be calculated numerically, e.g. using Newton-Raphson or bisection; in the latter case we need to check whether cdf(x) is greater or less than the target y0.
However, many functions have further parameters, like a Gaussian distribution having a certain mean and sigma, so the cdf is cdf(x, mean, sigma), whereas other functions, such as the standard normal cdf, have no further parameters, and some have even 3 or 4 further parameters.
A similar problem arises if you want to apply bisection to linear functions (2 parameters) or parabolas (3 parameters), or if you want not the inverse function but, e.g., the integral of f.
The easiest implementation would be to define cdf as a global function f(x), check it against y0, and pass the remaining parameters through global variables.
However, this is a very old-fashioned way, and Freepascal also supports procedural parameters, for calls like x := icdf(0.9987, @cdfStdNorm).
Even overloading is supported, to allow calls like x2 := icdf(0.9987, 0, 2, @cdfNorm) that also pass mean and sigma.
But this still ends up as two separate code blocks (even whole functions), because in one case we need to call cdf with x only, and in the second also with mean and sigma.
Is there an elegant solution for this problem in Freepascal? Maybe using variant records? Or an object-oriented approach? I have no clue about OO, but I know the variant-object style would require changing at least the headers of many functions, because I want to apply the technique not only to inverse cdf calculation but also to numerical integration, root finding, optimization, etc.
Or is it "best" just to define a real function type with, e.g., x plus 5 parameters (maybe as an array), and to ignore the unused ones? But it looks to me as if I would then need many "wrapper" functions, or to re-code all the existing functions (to use the arrays, even where they are not needed!).
Maybe macros can help as well? Any Freepascal hints are very welcome!
If you make it a (function ... of object), mean and sigma can be fields of the object, and the function can simply access them internally. Only the parameters that really change during the iteration (read: x) remain parameters.
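For illustration, here is the same pattern sketched in C++ (a Free Pascal version with (function ... of object) is structurally identical; all names below are made up):

```cpp
#include <cmath>
#include <cstdio>

// The fixed distribution parameters live in the object; the callback
// handed to the generic solver takes only the changing argument x.
struct NormalCdf {
    double mean, sigma;
    double operator()(double x) const {
        return 0.5 * std::erfc(-(x - mean) / (sigma * std::sqrt(2.0)));
    }
};

// Generic bisection for an increasing f: works for any callable f(x),
// no matter how many parameters the underlying distribution carries.
template <typename F>
double invert(F f, double y0, double lo, double hi) {
    for (int i = 0; i < 100; ++i) {
        double mid = 0.5 * (lo + hi);
        (f(mid) < y0 ? lo : hi) = mid;
    }
    return 0.5 * (lo + hi);
}

int main() {
    NormalCdf cdf{0.0, 2.0};                        // mean = 0, sigma = 2
    double x = invert(cdf, 0.9987, -100.0, 100.0);  // "icdf(0.9987, 0, 2)"
    std::printf("icdf(0.9987) = %f\n", x);
    return 0;
}
```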
Anonymous methods, as discussed by David and Rudy, are a further step that avoids having to declare a class for each such invocation, but that is a convenience and IMHO not the core of the question. At the expense of declaring the class, your core code is free of global-variable use; anonymous methods may also come with a performance cost, depending on usage.
Free Pascal also supports nested functions (function ... is nested), the original Pascal closure-like mechanism, which was never adopted by the Pascal compilers from Borland. A nested procedure passed as a callback can access local variables of the procedure in which it was declared. The Free Pascal numlib numeric math package uses this in some places for cases similar to yours. For math it is even more natural.
Delphi never implements old constructs because borrowing syntax from other languages looks better on bullet lists and keeps the subscriptions flowing.

Numerical integration of a discontinuous function in multiple dimensions

I have a function f(x) = 1/(x + a + b*I*sign(x)) and I want to calculate the integral of
f(x) f(y) f(z) f(x+y+z) f(x-y-z) dx dy dz
over the entire R^3 (b > 0, and a and b are of order unity). This is just a representative example -- in practice I have n < 7 variables and 2n-1 instances of f(): n of them involving the n integration variables and n-1 of them involving some linear combination of the integration variables. At this stage I'm only interested in a rough estimate, with relative error of 1e-3 or so.
I have tried the following libraries :
Steven Johnson's cubature code: the hcubature algorithm works but is abysmally slow, taking hundreds of millions of integrand evaluations even for n=2.
HIntLib: I tried adaptive integration with a Genz-Malik rule, the cubature routines, and VEGAS and MISER with the Mersenne Twister RNG. For n=3 only the first seems to be a somewhat viable option, but it again takes hundreds of millions of integrand evaluations for n=3 at relerr = 1e-2, which is not encouraging.
For the region of integration I have tried both approaches: integrating over [-200, 200]^n (i.e. a region so large that it essentially captures most of the integral), and the substitution x = sinh(t), which seems to be a standard trick.
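(In 1-D that substitution reads ∫ f(x) dx = ∫ f(sinh t) cosh t dt, with cosh t as the Jacobian; since x grows exponentially in |t|, a modest truncation in t covers a very wide range in x.)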
I do not have much experience with numerical analysis, but presumably the difficulty lies in the discontinuities from the sign() term. For n=2 and f(x)f(y)f(x-y) there are discontinuities along x=0, y=0, and x=y. These create a very sharp peak around the origin (with a different sign in the various quadrants) and 'ridges' at x=0, y=0, x=y, along which the integrand is large in absolute value and changes sign as you cross them. So at least I know which regions are important. I was thinking that maybe I could do Monte Carlo but somehow "tell" the algorithm in advance where to focus, but I'm not quite sure how to do that.
I would be very grateful if you had any advice on how to evaluate the integral with a reasonable amount of computing power or how to make my Monte Carlo "idea" work. I've been stuck on this for a while so any input would be welcome. Thanks in advance.
One thing you can do is use a guiding function for your Monte Carlo integration: given an integral (writing it in 1-D for simplicity) ∫ f(x) dx, write it as ∫ [f(x)/g(x)] g(x) dx, and use g(x) as a distribution from which you sample x.
Since g(x) is arbitrary, construct it such that (1) it has peaks where you expect them in f(x), and (2) you can sample x from g(x) directly (e.g., a Gaussian, or 1/(1+x^2)).
Alternatively, you can use a Metropolis-type Markov chain MC. It will find the relevant regions of the integrand (almost) by itself.
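To make the first idea concrete, here is a minimal 1-D importance-sampling sketch in C++ (the sharply peaked integrand and the Cauchy guide g(x) = 1/(pi (1 + x^2)) are illustrative choices, not from the original problem):

```cpp
#include <cmath>
#include <cstdio>
#include <random>

int main() {
    const double PI = std::acos(-1.0);

    // Sharply peaked integrand (illustrative); exact integral is pi/sqrt(eps).
    const double eps = 1e-4;
    auto f = [eps](double x) { return 1.0 / (x * x + eps); };

    // Guide g(x) = 1/(pi (1 + x^2)): peaked at 0, and easy to sample
    // because it is exactly the standard Cauchy density.
    auto g = [PI](double x) { return 1.0 / (PI * (1.0 + x * x)); };

    std::mt19937_64 rng(12345);
    std::cauchy_distribution<double> sample_g(0.0, 1.0);

    // Monte Carlo estimate of  ∫ f dx = ∫ (f/g) g dx = E_g[ f/g ].
    const long N = 1000000;
    double sum = 0.0, sum2 = 0.0;
    for (long i = 0; i < N; ++i) {
        double x = sample_g(rng);
        double w = f(x) / g(x);
        sum += w;
        sum2 += w * w;
    }
    double mean = sum / N;
    double err = std::sqrt((sum2 / N - mean * mean) / N);
    std::printf("integral ~ %g +- %g (exact %g)\n",
                mean, err, PI / std::sqrt(eps));
    return 0;
}
```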

CUDA cublas<t>gbmv understanding

I recently wanted to use a simple CUDA matrix-vector multiplication. I found a suitable function in the cuBLAS library: cublas<t>gbmv. Here is the official documentation.
But it is actually very poor, so I didn't manage to understand what the kl and ku parameters mean. Moreover, I have no idea what stride is (it must also be provided).
There is a brief explanation of these parameters (page 37), but it looks like I need to know something else.
A search on the internet doesn't turn up much useful information on this question, mostly references to different versions of the documentation.
So I have several questions to GPU/CUDA/cublas gurus:
How do I find more understandable docs or guides about using cublas?
If you know how to use this particular function, could you explain to me how to use it?
Maybe the cublas library is somewhat extraordinary and everyone uses something more popular, better documented, and so on?
Thanks a lot.
So BLAS (Basic Linear Algebra Subprograms) is generally an API for, as the name says, basic linear algebra routines. It includes vector-vector operations (level 1 BLAS routines), matrix-vector operations (level 2), and matrix-matrix operations (level 3). There is a "reference" BLAS available that implements everything correctly, but most of the time you'd use an optimized implementation for your architecture. cuBLAS is an implementation for CUDA.
The BLAS API was so successful as a description of the basic operations that it has become very widely adopted. However, (a) the names are incredibly cryptic because of architectural limitations of the day (this was 1979, and the API was defined using names of 8 characters or fewer to ensure it could compile widely), and (b) because the API is so general, even the simplest function calls require a lot of extraneous arguments.
Because it's so widespread, it's often assumed that if you're doing numerical linear algebra, you already know the general gist of the API, so implementation manuals often leave out important details, and I think that's what you're running into.
The Level 2 and 3 routines generally have function names of the form TMMOO, where T is the numerical type of the matrix/vector (S/D for single/double-precision real, C/Z for single/double-precision complex), MM is the matrix type (GE for general, i.e. a dense matrix you can't say anything else about; GB for a general banded matrix; SY for symmetric matrices; etc.), and OO is the operation.
This all seems slightly ridiculous now, but it worked and still works relatively well -- you quickly learn to scan these for familiar operations, so that SGEMV is a single-precision general-matrix-times-vector multiplication (which is probably what you want, not SGBMV, unless your matrix is actually banded), DGEMM is a double-precision matrix-matrix multiply, etc. But it does take some practice.
So if you look at the cublasSgemv instructions, or the documentation of the original BLAS routine, you can step through the argument list. First, the basic operation is:
This function performs the matrix-vector multiplication
y = α op(A) x + β y
where A is an m × n matrix stored in column-major format, x and y are vectors, and α and β are scalars.
where op(A) can be A, A^T (transpose), or A^H (conjugate transpose). So if you just want y = Ax, as is the common case, then α = 1, β = 0, and transa == CUBLAS_OP_N.
incx is the stride between consecutive elements of x; there are lots of situations where this comes in handy, but if x is just a simple 1-D array containing the vector, then the stride is 1.
And that's about all you need for SGEMV.
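To make that concrete, here is a minimal host-side sketch using the cuBLAS v2 API (error checking omitted for brevity; the 2 × 3 matrix is illustrative):

```cpp
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    // y = A*x for a 2 x 3 matrix A = [[1 2 3], [4 5 6]], stored
    // column-major as cuBLAS expects: columns (1,4), (2,5), (3,6).
    const int m = 2, n = 3;
    std::vector<float> A = {1, 4, 2, 5, 3, 6};
    std::vector<float> x = {1, 1, 1};
    std::vector<float> y(m);

    float *dA, *dx, *dy;
    cudaMalloc(&dA, m * n * sizeof(float));
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, m * sizeof(float));
    cudaMemcpy(dA, A.data(), m * n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 1.0f, beta = 0.0f;  // y = 1*A*x + 0*y
    // lda = m (leading dimension of A); incx = incy = 1 for contiguous vectors.
    cublasSgemv(handle, CUBLAS_OP_N, m, n,
                &alpha, dA, m, dx, 1, &beta, dy, 1);

    cudaMemcpy(y.data(), dy, m * sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("y = [%g, %g]\n", y[0], y[1]);  // expect [6, 15]

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dx); cudaFree(dy);
    return 0;
}
```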

What ODE solver uses calculations in a stepper function for interpolation?

I average over multiple solutions of ODEs that have different initial conditions, so it's important for all of the solutions to have values at the same times; for example, at an increment of 0.01.
I've been using the ODE routines from Numerical Recipes 3 (NR3). They do adaptive step-size control and use the calculated values to do interpolation of the same order. I can't use them because they conflict with Boost. Are there any other similar routines?
I looked at GSL; it's very nice, but it doesn't have built-in interpolation. One way I can do it is to solve the ODE with an adaptive step and then run Akima interpolation, but it seems like the NR3 solution would be faster and more accurate.
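To be concrete, the workaround I have in mind would look roughly like this with GSL's Akima routines (the (t, y) samples below are illustrative stand-ins for adaptive-solver output):

```cpp
#include <cstdio>
#include <gsl/gsl_spline.h>

int main() {
    // (t, y) samples as an adaptive-step ODE solver might produce them
    // (illustrative values; Akima interpolation needs at least 5 points).
    const double t[] = {0.0, 0.13, 0.31, 0.42, 0.60, 0.77, 1.0};
    const double y[] = {1.0, 1.14, 1.36, 1.52, 1.82, 2.16, 2.72};
    const size_t n = sizeof(t) / sizeof(t[0]);

    gsl_interp_accel *acc = gsl_interp_accel_alloc();
    gsl_spline *spline = gsl_spline_alloc(gsl_interp_akima, n);
    gsl_spline_init(spline, t, y, n);

    // Resample onto the common uniform grid with increment 0.01.
    for (int i = 0; i <= 100; ++i) {
        double ti = 0.01 * i;
        std::printf("%.2f %.6f\n", ti, gsl_spline_eval(spline, ti, acc));
    }

    gsl_spline_free(spline);
    gsl_interp_accel_free(acc);
    return 0;
}
```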
You can use odeint. It has dense output for Dopri5, Rosenbrock4, and Bulirsch-Stoer.
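A minimal sketch of that combination (assuming Boost is available; the ODE dx/dt = -x and the tolerances are illustrative): the dense-output stepper steps adaptively, and integrate_const reports interpolated values on the fixed 0.01 grid.

```cpp
#include <cstdio>
#include <vector>
#include <boost/numeric/odeint.hpp>

using namespace boost::numeric::odeint;
typedef std::vector<double> state_type;

// Example ODE dx/dt = -x (illustrative).
void rhs(const state_type &x, state_type &dxdt, double /*t*/) {
    dxdt[0] = -x[0];
}

int main() {
    state_type x(1, 1.0);  // initial condition x(0) = 1
    // Adaptive Dormand-Prince 5 with dense output; integrate_const
    // then reports interpolated values on the uniform grid dt = 0.01.
    auto stepper = make_dense_output(1.0e-8, 1.0e-8,
                                     runge_kutta_dopri5<state_type>());
    integrate_const(stepper, rhs, x, 0.0, 1.0, 0.01,
                    [](const state_type &x, double t) {
                        std::printf("%.2f %.8f\n", t, x[0]);
                    });
    return 0;
}
```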
I have used DOPRI5 from http://www.unige.ch/~hairer/software.html with dense output (i.e. interpolation) and found it reliable. I used the original version (in Fortran); there is also a C version on the same webpage which I haven't used myself, but I seem to remember that people were happy with it.