Speeding up Berlekamp Welch algorithm using FFT for Shamir Secret Share - fft

I believe the Berlekamp Welch algorithm can be used to correctly construct the secret using Shamir Secret Share as long as $t<n/3$. How can we speed up the BW algorithm implementation using Fast Fourier transform?

Berlekamp Welch is used to correct errors for the original encoding scheme for Reed Solomon code, where there is a fixed set of data points known to encoder and decoder, and a polynomial based on the message to be transmitted, unknown to the decoder. This approach was mostly replaced by switching to a BCH type code where a fixed polynomial known to both encoder and decoder is used instead.
Berlekamp Welch inverts a matrix with time complexity O(n^3). Gao improved on this, reducing time complexity to O(n^2) based on extended Euclid algorithm. Note that the R[-1] product series is pre-computed based on the fixed set of data points, in order to achieve the O(n^2) time complexity. Link to the Wiki section on "original view" decoders.
https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Reed_Solomon_original_view_decoders
Discreet Fourier essentially is the same as the encoding process, except there is a constraint on the fixed data points for encoding (they need to be successive powers of the field primitive) in order for the inverse transform to work. The inverse transform only works if the received data is error free. Lagrange interpolation doesn't have the constraint on the data points, and doesn't require the received data to be error free. Wiki has a section on this also:
https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Discrete_Fourier_transform_and_its_inverse

In coding theory, the Welch-Berlekamp key equation is a interpolation problem, i.e. w(x)s(x) = n(x) for x = x_1, x_2, ...,x_m, where s(x) is known. Its solution is a polynomial pair (w(x), n(x)) satisfying deg(n(x)) < deg(w(x)) <= m/2. (Here m is even)
The Welch-Berlekamp algorithm is an algorithm for solving this with O(m^2). On the other hand, D.B. Blake et al. described the solution set as a module of rank 2 and gave an another algorithm (called modular approach) with O(m^2). You can see the paper (DOI: 10.1109/18.391235)
Over binary fields, FFT is complex since the size of the multiplicative group cannot be a power of 2. However, Lin, et al. give a new polynomial basis such that the FFT transforms over binary fields is with complexity O(nlogn). Furthermore, this method has been used in decoding Reed-Solomon (RS) codes in which a modular approach is taken. This modular approach takes the advantages of FFT such that its complexity is O(nlog^2n). This is the best complexity to date. The details are in (DOI: 10.1109/TCOMM.2022.3215998) and in (https://arxiv.org/abs/2207.11079, open access).
To sum up, this exists a fast modular approach which uses FFT and is capable of solving the interpolation problem in RS decoding. You should metion that this method requires that the evaluation set to be a subspace v or v + a. Maybe the above information is helpful.

Related

how to predict query topics using word-topic matrix?

I'm implementing LDA using Java. I know how the algorithm works. In the end of the training (the given iterations) I will get 2 matrices (topic-word and document-topic) that represent the set of the input documents.
My problem is that when I input a new document (query) I want to use these matrices (or any other way) to get the document-topic vector of that query. How would I do that?
Are you using Variational Inference or Gibbs Sampling?
For Gibbs Sampling a typical approach is adding the new document/s to the inference, and only updating its own counters, keeping constant the counters for the documents you used to learn the model.
This is specified in equations 84 and 85 in Parameter Estimation for Text Analysis
I guess there has to be a similar approach in VI LDA.

Is there a best-known implementation for a Number Theoretic Transform on the 257 (2^8 + 1) finite field?

I'm a bit of a novice when it comes to implementing FFTs in general, but I have most of the basic ideas down I think. In this specific case, I've an implementation of the number theoretic transform on the 257 finite field. It's basically your typical Radix-2 Cooley-Tukey FFT. What Id like to know is either: is there a good alternative to the Cooley-Tukey Radix-2 that's better suited to doing this particular NTT efficiently (if the answer is an unqualified yes or a yes conditional on something not entirely within the scope of this question, I'm interested in hearing about either), or are there things specific to a Mersenne NTT that allow for a more efficient implementation than a more general case?
I'd say that for dyadic length FFT there is nothing better than Cooley-Tukey.
This has nothing directly to to with Mersenne numbers, any number field with modulus 2^(m*2^n)+1 qualifies. I=2^(m*2^(n-1)) is the complex unit, I^2=2^(m*2^n)=-1 mod (2^(m*2^n)+1), and q=2^(2*m) is a primitive 2^n-th root of unity.
For inspiration for the second point see Section 1 of Schönhage: Asymtotically fast algorithms for the numerical multiplication ..., with overall summary of fast multiplications

Numerical integration of a discontinuous function in multiple dimensions

I have a function f(x) = 1/(x + a+ b*I*sign(x)) and I want to calculate the
integral of
dx dy dz f(x) f(y) f(z) f(x+y+z) f(x-y - z)
over the entire R^3 (b>0 and a,- b are of order unity). This is just a representative example -- in practice I have n<7 variables and 2n-1 instances of f(), n of them involving the n integration variables and n-1 of them involving some linear combintation of the integration variables. At this stage I'm only interested in a rough estimate with relative error of 1e-3 or so.
I have tried the following libraries :
Steven Johnson's cubature code: the hcubature algorithm works but is abysmally slow, taking hundreds of millions of integrand evaluations for even n=2.
HintLib: I tried adaptive integration with a Genz-Malik rule, the cubature routines, VEGAS and MISER with the Mersenne twister RNG. For n=3 only the first seems to be somewhat viable option but it again takes hundreds of millions of integrand evaluations for n=3 and relerr = 1e-2, which is not encouraging.
For the region of integration I have tried both approaches: Integrating over [-200, 200]^n (i.e. a region so large that it essentially captures most of the integral) and the substitution x = sinh(t) which seems to be a standard trick.
I do not have much experience with numerical analysis but presumably the difficulty lies in the discontinuities from the sign() term. For n=2 and f(x)f(y)f(x-y) there are discontinuities along x=0, y=0, x=y. These create a very sharp peak around the origin (with a different sign in the various quadrants) and sort of 'ridges' at x=0,y=0,x=y along which the integrand is large in absolute value and changes sign as you cross them. So at least I know which regions are important. I was thinking that maybe I could do Monte Carlo but somehow "tell" the algorithm in advance where to focus. But I'm not quite sure how to do that.
I would be very grateful if you had any advice on how to evaluate the integral with a reasonable amount of computing power or how to make my Monte Carlo "idea" work. I've been stuck on this for a while so any input would be welcome. Thanks in advance.
One thing you can do is to use a guiding function for your Monte Carlo integration: given an integral (am writing it in 1D for simplicity) of ∫ f(x) dx, write it as ∫ f(x)/g(x) g(x) dx, and use g(x) as a distribution from which you sample x.
Since g(x) is arbitrary, construct it such that (1) it has peaks where you expect them to be in f(x), and (2) such that you can sample x from g(x) (e.g., a gaussian, or 1/(1+x^2)).
Alternatively, you can use a Metropolis-type Markov chain MC. It will find the relevant regions of the integrand (almost) by itself.
Here are a couple of trivial examples.

CUDA cublas<t>gbmv understanding

I recently wanted to use a simple CUDA matrix-vector multiplication. I found a proper function in cublas library: cublas<<>>gbmv. Here is the official documentation
But it is actually very poor, so I didn't manage to understand what the kl and ku parameters mean. Moreover, I have no idea what stride is (it must also be provided).
There is a brief explanation of these parameters (Page 37), but it looks like I need to know something else.
A search on the internet doesn't provide tons of useful information on this question, mostly references to different version of documentation.
So I have several questions to GPU/CUDA/cublas gurus:
How do I find more understandable docs or guides about using cublas?
If you know how to use this very function, couldn't you explain me how do I use it?
Maybe cublas library is somewhat extraordinary and everyone uses something more popular, better documented and so on?
Thanks a lot.
So BLAS (Basic Linear Algebra Subprograms) generally is an API to, as the name says, basic linear algebra routines. It includes vector-vector operations (level 1 blas routines), matrix-vector operations (level 2) and matrix-matrix operations (level 3). There is a "reference" BLAS available that implements everything correctly, but most of the time you'd use an optimized implementation for your architecture. cuBLAS is an implementation for CUDA.
The BLAS API was so successful as an API that describes the basic operations that it's become very widely adopted. However, (a) the names are incredibly cryptic because of architectural limitations of the day (this was 1979, and the API was defined using names of 8 characters or less to ensure it could widely compile), and (b) it is successful because it's quite general, and so even the simplest function calls require a lot of extraneous arguments.
Because it's so widespread, it's often assumed that if you're doing numerical linear algebra, you already know the general gist of the API, so implementation manuals often leave out important details, and I think that's what you're running into.
The Level 2 and 3 routines generally have function names of the form TMMOO.. where T is the numerical type of the matrix/vector (S/D for single/double precision real, C/Z for single/double precision complex), MM is the matrix type (GE for general - eg, just a dense matrix you can't say anything else about; GB for a general banded matrix, SY for symmetric matrices, etc), and OO is the operation.
This all seems slightly ridiculous now, but it worked and works relatively well -- you quickly learn to scan these for familiar operations so that SGEMV is a single-precision general-matrix times vector multiplication (which is probably what you want, not SGBMV), DGEMM is double-precision matrix-matrix multiply, etc. But it does take some practice.
So if you look at the cublas sgemv instructions, or in the documentation of the original, you can step through the argument list. First, the basic operation is
This function performs the matrix-vector multiplication
y = a op(A)x + b y
where A is a m x n matrix stored in column-major format, x and y
are vectors, and and are scalars.
where op(A) can be A, AT, or AH. So if you just want y = Ax, as is the common case, then a = 1, b = 0. and transa == CUBLAS_OP_N.
incx is the stride between different elements in x; there's lots of situations where this would come in handy, but if x is just a simple 1d array containing the vector, then the stride would be 1.
And that's about all you need for SGEMV.

How to represent stereo audio data for FFT

How should stereo (2 channel) audio data be represented for FFT? Do you
A. Take the average of the two channels and assign it to the real component of a number and leave the imaginary component 0.
B. Assign one channel to the real component and the other channel to the imag component.
Is there a reason to do one or the other? I searched the web but could not find any definite answers on this.
I'm doing some simple spectrum analysis and, not knowing any better, used option A). This gave me an unexpected result, whereas option B) went as expected. Here are some more details:
I have a WAV file of a piano "middle-C". By definition, middle-C is 260Hz, so I would expect the peak frequency to be at 260Hz and smaller peaks at harmonics. I confirmed this by viewing the spectrum via an audio editing software (Sound Forge). But when I took the FFT myself, with option A), the peak was at 520Hz. With option B), the peak was at 260Hz.
Am I missing something? The explanation that I came up with so far is that representing stereo data using a real and imag component implies that the two channels are independent, which, I suppose they're not, and hence the mess-up.
I don't think you're taking the average correctly. :-)
C. Process each channel separately, assigning the amplitude to the real component and leaving the imaginary component as 0.
Option B does not make sense. Option A, which amounts to convert the signal to mono, is OK (if you are interested in a global spectrum).
Your problem (double freq) is surely related to some misunderstanding in the use of your FFT routines.
Once you take the FFT you need to get the Magnitude of the complex frequency spectrum. To get the magnitude you take the absolute of the complex spectrum |X(w)|. If you want to look at the power spectrum you square the magnitude spectrum, |X(w)|^2.
In terms of your frequency shift I think it has to do with you setting the imaginary parts to zero.
If you imagine the complex Frequency spectrum as a series of complex vectors or position vectors in a cartesian space. If you took one discrete frequency bin X(w), there would be one real component representing its direction in the real axis (x -direction), and one imaginary component in the in the imaginary axis (y - direction). There are four important values about this discrete frequency, 1. real value, 2. imaginary value, 3. Magnitude and, 4. phase. If you just take the real value and set imaginary to 0, you are setting Magnitude = real and phase = 0deg or 90deg. You have hence forth modified the resulting spectrum, and applied a bias to every frequency bin. Take a look at the wiki on Magnitude of a vector, also called the Euclidean norm of a vector to brush up on your understanding. Leonbloy was correct, but I hope this was more informative.
Think of the FFT as a way to get information from a single signal. What you are asking is what is the best way to display data from two signals. My answer would be to treat each independently, and display an FFT for each.
If you want a really fast streaming FFT you can read about an algorithm I wrote here: www.depthcharged.us/?p=176