MATLAB Dftmtx for Huge "N" - fft

I have a vector of size M (say 500), which I up-sample by a factor of MM=500, so that my new vector is of size N = 500 x 500 = 250000. I am using an optimisation algorithm and need to carry out the FFT/DFT of the up-sampled vector of size N using the DFT matrix, not the built-in function.
However, this becomes prohibitive due to memory constraints. Is there any way to go about it? I have seen a similar question here, Huge Fourier matrix - MATLAB, but that one is about a huge matrix, where the solution is to break the matrix into columns and do the operation column by column. In my case, the vector has 250000 rows.
Would it be wise to split the rows into pieces, say 500 each, iterate the same thing 500 times, and concatenate the results at the end?

If using the FFT is an option, the matrix of twiddle factors never appears explicitly, so the actual memory requirement is on the order of O(N).
If you must use the explicit DFT matrix, then it is possible to break the computation down using submatrices of the larger DFT matrix. Given an input x of length N, and assuming we wish to divide the large DFT matrix into BlockSize x BlockSize submatrices, this can be done with the following MATLAB code:
y = zeros(size(x));
Imax = ceil(N / BlockSize); % divide the rows into Imax chunks
Jmax = ceil(N / BlockSize); % divide the columns into Jmax chunks
% iterate over the blocks
for i = 0:Imax-1
    imin = i*BlockSize;
    imax = min(i*BlockSize + BlockSize - 1, N-1);
    for j = 0:Jmax-1
        jmin = j*BlockSize;
        jmax = min(j*BlockSize + BlockSize - 1, N-1);
        [XX, YY] = meshgrid(jmin:jmax, imin:imax);
        % compute the DFT submatrix
        W = exp(-2 * pi * 1i * XX .* YY / N);
        % apply the DFT submatrix on a chunk of the input and add to the output
        y([imin:imax] + 1) = y([imin:imax] + 1) + W * x([jmin:jmax] + 1);
    end
end
If needed, it would be fairly easy to adapt the above code to use a different block size along the rows than along the columns.
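For readers working outside MATLAB, here is a minimal C++ sketch of the same blocked idea with separate row and column block sizes (the names rowBlock and colBlock are mine, not from the original answer). It applies one submatrix of the DFT matrix at a time and never forms the full N x N matrix, so peak memory stays at O(N) for the input and output vectors:

#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Blocked DFT: y[i] = sum_j exp(-2*pi*1i*i*j/N) * x[j], computed one
// rowBlock-by-colBlock submatrix of the DFT matrix at a time.
std::vector<std::complex<double>> blockedDFT(const std::vector<std::complex<double>>& x,
                                             std::size_t rowBlock, std::size_t colBlock)
{
    const std::size_t N = x.size();
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> y(N, std::complex<double>(0.0, 0.0));

    for (std::size_t i0 = 0; i0 < N; i0 += rowBlock)          // chunk of output rows
    {
        const std::size_t i1 = std::min(i0 + rowBlock, N);
        for (std::size_t j0 = 0; j0 < N; j0 += colBlock)      // chunk of input columns
        {
            const std::size_t j1 = std::min(j0 + colBlock, N);
            for (std::size_t i = i0; i < i1; ++i)
                for (std::size_t j = j0; j < j1; ++j)
                {
                    // twiddle factor W(i,j) = exp(-2*pi*1i*i*j/N), computed on the fly
                    const double angle = -2.0 * pi * static_cast<double>(i)
                                       * static_cast<double>(j) / static_cast<double>(N);
                    y[i] += std::polar(1.0, angle) * x[j];
                }
        }
    }
    return y;
}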


How to determine width of peaks and make FFT for every peak (and plot it in separate graph)

I have acceleration data for the X axis and a time vector for it. I determined the peaks above a threshold, and now I need to find the FFT for every peak.
As a result I have this:
Peak Value 1 = 458, index 1988
Peak Value 2 = 456, index 1990
Peak Value 3 = 450, index 12081
....
Peak Value 9 = 432, index 12151
To find these peaks I used the peakfinder script.
The command [peakLoc, peakMag] = peakfinder(x0,...) gives me location and magnitude of peaks.
Also I have the time (from the time data vector) for each peak.
So I suppose that I should take every peak, find its width (or some data points around it), and compute the FFT. Am I right? Could you help me with that?
I'm working in Octave and I'm new here :)
Code:
load ("C:\\..patch..\\peakfinder.m");
d =dlmread("C:\\..patch..\\acc2.csv", ";");
T=d(:,1);
Ax=d(:,2);
[peakInd peakVal]=peakfinder(Ax,10,430,1);
peakTime=T(peakInd);
[sortVal sortInd] = sort(peakVal, 'descend');
originInd = peakInd(sortInd);
for k = 1 : length(sortVal)
fprintf(1, 'Peak #%d = %d, index%d\n', k, sortVal(k), originInd (k));
end
plot(T,Ax,'b-',T(peakInd),Ax(peakInd),'rv');
and here you can download the data http://www.filedropper.com/acc2
FFT
d =dlmread("C:\\..path..\\acc2.csv", ";");
T=d(:,1);
Ax=d(:,2);
% sampling frequency
Fs_a=2000;
% length of FFT
Length_Ax=numel(Ax);
% number of lines of Fourier spectrum
fft_L= Fs_a*2;
% an array of time samples
T_Ax=0:1/Fs_a: Length_Ax;
fft_Ax=abs(fft(Ax,fft_L));
fft_Ax=2*fft_Ax./fft_L;
F=0:Fs_a/fft_L:Fs_a/2-1/fft_L;
subplot(3,1,1);
plot(T,Ax);
title('Ax axis');
xlabel('time (s)');
ylabel('amplitude'); grid on;
subplot(3,1,2);
plot(F,fft_Ax(1:length(F)));
title('spectrum max Ax axis');
xlabel('frequency (Hz)');
ylabel('amplitude'); grid on;
It looks like you have two clusters of peaks, so I would plot the data over three plots: one of the whole time series, one zoomed in on the first cluster, and the last one zoomed in on the second cluster (note I have divided all your time values by 1e6, otherwise the tick labels get ugly):
figure
subplot(3,1,1)
plot(T/1e6,Ax,'b-',peakTime/1e6,peakVal,'rv');
subplot(3,1,2)
plot(T/1e6,Ax,'b-',peakTime(1:4)/1e6,peakVal(1:4),'rv');
axis([0.99*peakTime(1)/1e6 1.01*peakTime(4)/1e6 0.9*peakVal(1) 1.1*peakVal(4)])
subplot(3,1,3)
plot(T/1e6,Ax,'b-',peakTime(5:end)/1e6,peakVal(5:end),'rv');
axis([0.995*peakTime(5)/1e6 1.005*peakTime(end)/1e6 0.9*peakVal(5) 1.1*peakVal(end)])
I have set the axes around the extreme time and acceleration values, using some coefficients to add a little "padding" around them (the values of these coefficients were obtained through trial and error). This gives me the following plot; hopefully this is the sort of thing you are after. You can add x and y labels if you wish.
EDIT
Here's how I would do the FFT:
Fs = 2000;
L = length(Ax);
NFFT = 2^nextpow2(L); % Next power of 2 from length of Ax
Ax_FFT = fft(Ax,NFFT)/L;
f = Fs/2*linspace(0,1,NFFT/2+1);
% Plot single-sided amplitude spectrum.
figure
semilogx(f,2*abs(Ax_FFT(1:NFFT/2+1))) % using semilogx as huge DC component
title('Single-Sided Amplitude Spectrum of Ax')
xlabel('Frequency (Hz)')
ylabel('|Ax(f)|')
ylim([0 300])
giving the following result:

How to generate random numbers from log-normal distribution with a given mean and SD in SAS?

Wenping (Wendy) Zhang points out
that the SAS RAND function "basically gives "standard" distribution".
The author describes an interesting SAS %rndnmb macro to generate data from "non-standard" distributions. Unfortunately, the code is unavailable, so I dared to do it myself.
If I understand correctly, Wikipedia says that y is from the log-normal distribution if
y = exp(mu + sigma * Z).
The following formulas connect mu and sigma to the mean and variance of the non-logarithmized sample values:
mu = ln((mean^2)/sqrt(variance + mean^2))
and
sigma = sqrt(ln(1 + (variance)/(mean^2))).
If that is correct, my y will be drawn from the log-normal distribution when Z is drawn from the standard normal distribution (mu' = 0, sigma' = 1).
Finally, is it correct that y is from the log-normal distribution with the given mean and variance if
y = exp(ln((mean^2)/sqrt(variance + mean^2)) + sqrt(ln(1 + (variance)/(mean^2))) * Z)
?
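For reference, here is a quick check of those two formulas (standard log-normal moment identities; my own addition, not part of the original question). If Z is standard normal and y = exp(mu + sigma * Z), then
E[y] = exp(mu + sigma^2/2)
and
Var[y] = (exp(sigma^2) - 1) * exp(2*mu + sigma^2).
Setting E[y] = mean and Var[y] = variance and solving these two equations gives exactly
sigma^2 = ln(1 + variance/(mean^2)) and mu = ln((mean^2)/sqrt(variance + mean^2)),
so the expressions quoted above are consistent.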
My SAS code is:
/*I use StdDev^2 notation instead of variance here. */
DATA nonStLogNorm;
nonStLN = exp(1)**(log((mean**2)/(sqrt(StdDev^2 + mean**2)) +
sqrt(log(1 + (StdDev^2)/(mean**2))) * rand('UNIFORM'));
RUN;
References:
RAND function by Rick Wicklin:
http://blogs.sas.com/content/iml/2013/07/10/stop-using-ranuni/
http://blogs.sas.com/content/iml/2011/08/24/how-to-generate-random-numbers-in-sas/
What you need is the inverse cumulative distribution function. This is the function that is the inverse of the normalized integral of the distribution over the entire domain. So 0% corresponds to your most negative possible value and 100% to your most positive. Practically, though, you would clamp to something like 0.01% and 99.99%, as otherwise you will end up at infinity for a lot of distributions.
Then from there you only need to draw a random number in the range (0,1) and plug it into the function. Remember to clamp it!
double CDF = 0.5 + 0.5*erf((log(x) - center)/(sqrt(2)*sigma))
so
double x = exp(inverf((CDF - 0.5)*2.0)*sqrt(2)*sigma + center);
should give you the requested distribution. inverf is the inverse of the erf function. It is a common function but not in math.h typically.
I once wrote a SIMD-based random number generator that needed to produce various distributions, and this approach worked fine. The above should work, assuming I didn't flub something while typing.
As requested how to clamp:
//This is how I do it with my Random class where the first argument
//is the min value and the second is the max
double CDF = Random::Range(0.0001,0.9999); //Depends on what you are using to random
//How you get there from Random Ints
unsigned int RandomNumber = rand();
//Convert the number to the range [0,1]
double CDF = (double)RandomNumber/(double)RAND_MAX;
//now clamp it to a min, max of your choosing
CDF = CDF*(max - min) + min;
If you want Z to be drawn from the standard normal distribution, shouldn't you obtain it by calling RAND('NORMAL') rather than RAND('UNIFORM')?
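Following that suggestion, here is a minimal C++11 sketch of the whole pipeline (my own illustration, not from the original thread; the mean and standard deviation values are made up): draw Z from a standard normal generator and apply the mean/variance conversion formulas from the question.

#include <cmath>
#include <iostream>
#include <random>

int main()
{
    const double mean   = 10.0;                    // desired mean of the log-normal samples (example value)
    const double stddev = 3.0;                     // desired standard deviation (example value)
    const double variance = stddev * stddev;

    // Parameters of the underlying normal distribution.
    const double mu    = std::log(mean * mean / std::sqrt(variance + mean * mean));
    const double sigma = std::sqrt(std::log(1.0 + variance / (mean * mean)));

    std::mt19937 rng(12345);                       // fixed seed for reproducibility
    std::normal_distribution<double> normal(0.0, 1.0);

    for (int i = 0; i < 5; ++i)
    {
        const double z = normal(rng);              // Z ~ N(0,1)
        const double y = std::exp(mu + sigma * z); // log-normal sample
        std::cout << y << '\n';
    }
    return 0;
}

(std::lognormal_distribution<double>(mu, sigma) would perform the exp step internally as well.)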

Translation from Complex-FFT to Finite-Field-FFT

Good afternoon!
I am trying to develop an NTT algorithm based on the naive recursive FFT implementation I already have.
Consider the following code (coefficients' length, let it be m, is an exact power of two):
/// <summary>
/// Calculates the result of the recursive Number Theoretic Transform.
/// </summary>
/// <param name="coefficients"></param>
/// <returns></returns>
private static BigInteger[] Recursive_NTT_Skeleton(
    IList<BigInteger> coefficients,
    IList<BigInteger> rootsOfUnity,
    int step,
    int offset)
{
    // Calculate the length of vectors at the current step of recursion.
    // -
    int n = coefficients.Count / step - offset / step;

    if (n == 1)
    {
        return new BigInteger[] { coefficients[offset] };
    }

    BigInteger[] results = new BigInteger[n];

    IList<BigInteger> resultEvens =
        Recursive_NTT_Skeleton(coefficients, rootsOfUnity, step * 2, offset);
    IList<BigInteger> resultOdds =
        Recursive_NTT_Skeleton(coefficients, rootsOfUnity, step * 2, offset + step);

    for (int k = 0; k < n / 2; k++)
    {
        BigInteger bfly = (rootsOfUnity[k * step] * resultOdds[k]) % NTT_MODULUS;

        results[k] = (resultEvens[k] + bfly) % NTT_MODULUS;
        results[k + n / 2] = (resultEvens[k] - bfly) % NTT_MODULUS;
    }

    return results;
}
It worked for complex FFT (replace BigInteger with a complex numeric type (I had my own)). It doesn't work here even though I changed the procedure of finding the primitive roots of unity appropriately.
Supposedly, the problem is this: the rootsOfUnity parameter that was passed in originally contained only the first half of the m-th complex roots of unity, in this order:
omega^0 = 1, omega^1, omega^2, ..., omega^(n/2)
It was enough, because on these three lines of code:
BigInteger bfly = (rootsOfUnity[k * step] * resultOdds[k]) % NTT_MODULUS;
results[k] = (resultEvens[k] + bfly) % NTT_MODULUS;
results[k + n / 2] = (resultEvens[k] - bfly) % NTT_MODULUS;
I originally made use of the fact that, at any level of recursion (for any n and i), the complex roots of unity satisfy -omega^(i) = omega^(i + n/2).
However, that property obviously doesn't hold in finite fields. But is there any analogue of it which would allow me to still compute only the first half of the roots?
Or should I extend the cycle from n/2 to n and pre-compute all the m-th roots of unity?
Maybe there are other problems with this code?..
Thank you very much in advance!
I recently wanted to implement NTT for fast multiplication instead of DFFT too. I read a lot of confusing things, different letters everywhere and no simple solution, and my finite-fields knowledge is rusty, but today I finally got it right (after 2 days of trying and drawing analogies with the DFT coefficients), so here are my insights for NTT:
Computation
X(i) = sum(j=0..n-1) of ( Wn^(i*j) * x(j) );
where X[] is the NTT of x[] of size n, and Wn is the NTT basis. All computations are integer modular arithmetic mod p; there are no complex numbers anywhere.
Important values
Wn = r ^ L mod p is basis for NTT
Wn = r ^ (p-1-L) mod p is basis for INTT
Rn = n ^ (p-2) mod p is scaling multiplicative constant for INTT ~(1/n)
p is a prime such that p mod n == 1 and p > max'
max is the maximum value of x[i] for NTT, or of X[i] for INTT
r = <1,p)
L = <1,p) and also divides p-1
r,L must be chosen so that r^(L*i) mod p == 1 if i=0 or i=n
r,L must be chosen so that r^(L*i) mod p != 1 if 0 < i < n
max' is the sub-result max value and depends on n and the type of computation. For a single (I)NTT it is max' = n*max, but for convolution of two n-sized vectors it is max' = n*max*max, etc. See Implementing FFT over finite fields for more info about it.
the working combination of r,L,p is different for different n
this is important: you have to recompute or select the parameters from a table before each NTT layer (n is always half of that of the previous recursion level).
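As a toy check of the recipe above (my own numbers, not from the original answer): take n = 8 and p = 17, so p mod n == 1. With r = 3 (a primitive root mod 17) and L = 2:
Wn = 3^2 mod 17 = 9, which has order exactly 8 (9^4 mod 17 == 16 == -1, 9^8 mod 17 == 1)
iWn = 3^(17-1-2) mod 17 = 3^14 mod 17 = 2, and indeed 9*2 mod 17 == 1
Rn = 8^(17-2) mod 17 = 15, and indeed 8*15 mod 17 == 1
Of course p = 17 only leaves room for very small values, so real use needs a much bigger prime, as found by the code below.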
Here is my C++ code that finds the r,L,p parameters (it needs modular arithmetic routines, which are not included; you can replace them with (a+b)%c, (a-b)%c, (a*b)%c, ..., but in that case beware of overflows, especially in modpow and modmul). The code is not optimized yet; there are ways to speed it up considerably. Also, the prime table is fairly limited, so either use a Sieve of Eratosthenes or any other algorithm to obtain primes up to max' in order to work safely.
DWORD _arithmetics_primes[]=
{
2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,
179,181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,383,389,397,401,409,
419,421,431,433,439,443,449,457,461,463,467,479,487,491,499,503,509,521,523,541,547,557,563,569,571,577,587,593,599,601,607,613,617,619,631,641,643,647,653,659,
661,673,677,683,691,701,709,719,727,733,739,743,751,757,761,769,773,787,797,809,811,821,823,827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,
947,953,967,971,977,983,991,997,1009,1013,1019,1021,1031,1033,1039,1049,1051,1061,1063,1069,1087,1091,1093,1097,1103,1109,1117,1123,1129,1151,
0}; // end of table is 0, the more primes are there the bigger numbers and n can be used
// compute NTT consts W = r^L % p for n
int i,j,k,n=16;
long w,W,iW,p,r,L,l,e;
long max=81*n;              // edit1: max num for NTT for my multiplication purposes
for (e=1,j=0;e;j++)         // find prime p such that p%n==1 AND p>max ... 9*9=81
    {
    p=_arithmetics_primes[j];
    if (!p) break;
    if ((p>max)&&(p%n==1))
     for (r=2;r<p;r++)      // check all r
        {
        for (l=1;l<p;l++)   // all l that divide p-1
            {
            L=(p-1);
            if (L%l!=0) continue;
            L/=l;
            W=modpow(r,L,p);
            e=0;
            for (w=1,i=0;i<=n;i++,w=modmul(w,W,p))
                {
                if ((i==0)&&(w!=1)) { e=1; break; }
                if ((i==n)&&(w!=1)) { e=1; break; }
                if ((i>0)&&(i<n)&&(w==1)) { e=1; break; }
                }
            if (!e) break;
            }
        if (!e) break;
        }
    }
if (e) { error; }           // error: no combination r,L,p found for this n
W =modpow(r,    L,p);       // Wn for NTT
iW=modpow(r,p-1-L,p);       // Wn for INTT
and here are my slow NTT and INTT implementations (I haven't got to the fast NTT/INTT yet); they are both tested successfully with Schönhage–Strassen multiplication.
//---------------------------------------------------------------------------
void NTT(long *dst,long *src,long n,long m,long w)
    {
    long i,j,wj,wi,a,n2=n>>1;
    for (wj=1,j=0;j<n;j++)
        {
        a=0;
        for (wi=1,i=0;i<n;i++)
            {
            a=modadd(a,modmul(wi,src[i],m),m);
            wi=modmul(wi,wj,m);
            }
        dst[j]=a;
        wj=modmul(wj,w,m);
        }
    }
//---------------------------------------------------------------------------
void INTT(long *dst,long *src,long n,long m,long w)
    {
    long i,j,wi=1,wj=1,rN,a,n2=n>>1;
    rN=modpow(n,m-2,m);
    for (wj=1,j=0;j<n;j++)
        {
        a=0;
        for (wi=1,i=0;i<n;i++)
            {
            a=modadd(a,modmul(wi,src[i],m),m);
            wi=modmul(wi,wj,m);
            }
        dst[j]=modmul(a,rN,m);
        wj=modmul(wj,w,m);
        }
    }
//---------------------------------------------------------------------------
dst is destination array
src is source array
n is array size
m is modulus (p)
w is basis (Wn)
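As a usage illustration (my own addition, reusing the variables n, p, W, iW from the parameter-search snippet and the routines above), a round-trip test looks like this:

// round-trip check: INTT(NTT(x)) must reproduce x (all arithmetic is mod p)
long x[16],X[16],y[16];
for (i=0;i<n;i++) x[i]=i%10;    // small test data, values below max
NTT (X,x,n,p, W);               // forward transform with basis Wn
INTT(y,X,n,p,iW);               // inverse transform with basis iWn (includes the ~1/n scaling)
for (i=0;i<n;i++)
 if (x[i]!=y[i]) { /* mismatch: parameters p,W,iW are not valid for this n */ }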
Hope this helps someone. If I forgot something, please write ...
[edit1: fast NTT/INTT]
Finally I managed to get the fast NTT/INTT to work. It was a little bit trickier than the normal FFT:
//---------------------------------------------------------------------------
void _NFTT(long *dst,long *src,long n,long m,long w)
    {
    if (n<=1) { if (n==1) dst[0]=src[0]; return; }
    long i,j,a0,a1,n2=n>>1,w2=modmul(w,w,m);
    // reorder even,odd
    for (i=0,j=0;i<n2;i++,j+=2) dst[i]=src[j];
    for (    j=1;i<n ;i++,j+=2) dst[i]=src[j];
    // recursion
    _NFTT(src   ,dst   ,n2,m,w2);   // even
    _NFTT(src+n2,dst+n2,n2,m,w2);   // odd
    // restore results
    for (w2=1,i=0,j=n2;i<n2;i++,j++,w2=modmul(w2,w,m))
        {
        a0=src[i];
        a1=modmul(src[j],w2,m);
        dst[i]=modadd(a0,a1,m);
        dst[j]=modsub(a0,a1,m);
        }
    }
//---------------------------------------------------------------------------
void _INFTT(long *dst,long *src,long n,long m,long w)
    {
    long i,rN;
    rN=modpow(n,m-2,m);
    _NFTT(dst,src,n,m,w);
    for (i=0;i<n;i++) dst[i]=modmul(dst[i],rN,m);
    }
//---------------------------------------------------------------------------
[edit3]
I have optimized my code (3x faster than the code above), but I still was not satisfied with it, so I started a new question about it. There I have optimized my code even further (about 40x faster than the code above), so it is almost the same speed as a floating-point FFT of the same bit size. The link to it is here:
Modular arithmetics and NTT (finite field DFT) optimizations
To turn the Cooley-Tukey (complex) FFT into a modular-arithmetic approach, i.e. NTT, you must replace the complex definition of omega. For the approach to be purely recursive, you also need to recalculate omega for each level based on the current signal size. This is possible because the minimum suitable modulus decreases as we move down the call tree, so the modulus used for the root is suitable for the lower layers. Additionally, since we are using the same modulus, the same generator may be used as we move down the call tree. Also, for the inverse transform, you should take an additional step: take the recalculated omega a and instead use b = a ^ -1 as omega (via the modular inverse operation). Specifically, b = invMod(a, N) such that b * a == 1 (mod N), where N is the chosen prime modulus.
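For concreteness, here is a small sketch of such an invMod via the extended Euclidean algorithm (my own helper for illustration; when N is prime, b = a^(N-2) mod N computed by fast exponentiation works as well, which is what the Rn = n^(p-2) mod p constant in the other answer uses):

// Modular inverse: returns b such that (a*b) % n == 1, assuming gcd(a,n) == 1.
long long invMod(long long a, long long n)
{
    long long t = 0, newT = 1;     // Bezout coefficients being built up
    long long r = n, newR = a % n; // remainders of the Euclidean algorithm
    while (newR != 0)
    {
        const long long q = r / newR;
        const long long tmpT = t - q * newT; t = newT; newT = tmpT;
        const long long tmpR = r - q * newR; r = newR; newR = tmpR;
    }
    // here r == gcd(a, n), which must be 1 for the inverse to exist
    if (t < 0) t += n;
    return t;
}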
Rewriting an expression involving omega by exploiting periodicity still works in the modular-arithmetic realm. You also need a way to determine the modulus (a prime) for the problem and a valid generator.
We note that your code works, though it is not a MWE (minimal working example). We extended it using common sense and got the correct result for a polynomial multiplication application. You just have to provide the correct values of omega raised to certain powers.
While your code works, like code from many other sources it doubles the spacing at each level. This does not lead to as clean a recursion; it turns out to be identical to recalculating omega based on the current signal size, because the power in the omega definition is inversely proportional to the signal size. To reiterate: halving the signal size is like squaring omega, which is like doubling the powers of omega (which is what doubling the spacing does). The nice thing about the approach that recalculates omega is that each subproblem is more cleanly complete in its own right.
There is a paper by Baktir and Sunar from 2006 that shows some of the math for the modular approach; see the reference at the end of this post.
You do not need to extend the cycle from n / 2 to n.
So, yes, some sources which say to just drop in a different omega definition for the modular-arithmetic approach are sweeping many details under the rug.
Another issue is that, when performing convolution, the signal size must be large enough that the resulting time-domain signal does not overflow. Additionally, it is useful to have a fast implementation of exponentiation subject to the modulus, since the powers involved can be quite large.
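On that last point, here is a minimal square-and-multiply sketch of modular exponentiation (the standard technique, shown as a hypothetical stand-alone helper; note that the intermediate products must not overflow, so for a large modulus you would switch to 128-bit arithmetic or a mulmod routine):

// Computes (base^exp) % mod with O(log exp) multiplications by binary exponentiation.
unsigned long long modpow(unsigned long long base, unsigned long long exp, unsigned long long mod)
{
    unsigned long long result = 1 % mod;
    base %= mod;
    while (exp > 0)
    {
        if (exp & 1ULL)
            result = (result * base) % mod; // overflows if mod >= 2^32; see the caveat above
        base = (base * base) % mod;
        exp >>= 1;
    }
    return result;
}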
References
Baktir and Sunar - Achieving efficient polynomial multiplication in Fermat fields using the fast Fourier transform (2006)
You must make sure that roots of unity actually exist. In R there are only 2 roots of unity, 1 and -1, since x^n = 1 can only be true for them.
In C you have infinitely many roots of unity: w = exp(2*pi*i/N) is a primitive N-th root of unity, and all w^k for 0 <= k < N are N-th roots of unity.
Now to your problem: you have to make sure the ring you're working in offers the same property: enough roots of unity.
Schönhage and Strassen (http://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm) use integers modulo 2^N+1. This ring has enough roots of unity: 2^N == -1 is a 2nd root of unity, 2^(N/2) is a 4th root of unity, and so on. Furthermore, these roots of unity have the advantage that they are powers of two and can be implemented as binary shifts (with a modulo operation afterwards, which comes down to an add/subtract).
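A tiny worked example of this (my own, for illustration): take N = 4, so the ring is the integers modulo 2^4 + 1 = 17. Then 2^4 = 16 ≡ -1 (mod 17) is a 2nd root of unity, 2^2 = 4 is a 4th root of unity (4^2 ≡ -1 mod 17), and 2 itself is a primitive 8th root of unity (2^8 = 256 ≡ 1 mod 17 while 2^4 ≡ -1). Multiplying by any of these powers of two is just a left shift followed by the reduction modulo 2^N + 1.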
I think QuickMul (http://www.cs.nyu.edu/exact/doc/qmul.ps) works modulo 2^N-1.

Interpreting jTransform FFT results

I'm using the JTransforms Java library to perform analysis on a given dataset.
An example of the data is as follows:
980,988,1160,1080,928,1068,1156,1152,1176,1264
I'm using the DoubleFFT_1D function in JTransforms. The data output is as follows:
10952, -152, 80.052, 379.936, -307.691, 12.734, -224.052, 427.607, -48.308, 81.472
I'm having trouble interpreting the output. I understand that the first element in the output array is the total of the 10 inputs (10952). It's the other elements of the output array that I don't understand. Ultimately, I want to plot the power spectral density of the input data on a graph and find the amounts between 0 and 0.5 Hz.
The documentation for the jTransform functions states:
(where a is the data set)
.....................
realForward
public void realForward(double[] a) Computes 1D forward DFT of real data leaving the result in a. The physical layout of the output data is as follows:
if n is even then
a[2*k] = Re[k], 0 <= k < n / 2
a[2*k+1] = Im[k], 0 < k < n / 2
a[1] = Re[n/2]
if n is odd then
a[2*k] = Re[k], 0 <= k < (n+1)/2
a[2*k+1] = Im[k], 0 < k < (n-1)/2
a[1] = Im[(n-1)/2]
This method computes only half of the elements of the real transform. The other half satisfies the symmetry condition. If you want the full real forward transform, use realForwardFull. To get back the original data, use realInverse on the output of this method.
Parameters:
a - data to transform
..................................
So what are the output numbers? What do the values mean?
Any help is appreciated.
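To make the documented packing concrete, here is a small sketch of how I would unpack the even-n layout above and compute per-bin magnitudes (my own illustration, not from the original thread; JTransforms itself is Java, but the packing is just an array convention, shown here in C++):

#include <cmath>
#include <cstddef>
#include <vector>

// a: packed output of a real forward FFT of even length n (layout as documented above).
// Returns the magnitudes |X[k]| for k = 0 .. n/2.
std::vector<double> binMagnitudes(const std::vector<double>& a)
{
    const std::size_t n = a.size();              // assumed even
    std::vector<double> mag(n / 2 + 1);
    mag[0]     = std::fabs(a[0]);                // Re[0]:   DC term, the sum of the inputs
    mag[n / 2] = std::fabs(a[1]);                // Re[n/2]: Nyquist term, stored in a[1]
    for (std::size_t k = 1; k < n / 2; ++k)
        mag[k] = std::sqrt(a[2 * k]     * a[2 * k] +        // Re[k]
                           a[2 * k + 1] * a[2 * k + 1]);    // Im[k]
    return mag;
}

Bin k then corresponds to frequency k*Fs/n for sampling rate Fs, and squaring the magnitudes gives an (unscaled) power spectrum.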

Complexity of a given function

When I analyzed the complexity of the code segment below, I found that it is O(n/2). But while searching the internet I discovered that it is probably O(n). I'd like to know who's correct.
void function(int n) {
    int i = 1, k = 100;
    while (i < n) {
        k++;
        i += 2;
    }
}
What is the point of the variable k in the above method? Regardless, big-O notation talks about behavior in the limit (as the value of n approaches infinity). As such, big-O notation is agnostic to both scaling factors and additive constants. Which is to say, for any constant c and scaling factor s:
O(f(n)) is equivalent to O(s*f(n) + c)
In your case f(n) = n, s = 1/2, and c = 0. So...
O(n) = O(n/2)
O(n) is the same as O(n/2)
The idea of big-O notation is to understand how fast an algorithm will run as you give it a larger input. So, for example, if you double the size of your input, will the program take twice as long or will it take 4 times as long.
Both n and n/2 behave identically as you vary n: if you increase n by a factor of 10, both n and n/2 grow by that same factor, so they describe the same growth rate.
O(n/2) = O(0.5n) = O(n). See Wikipedia for more on this.
If f is O(g), then there exist some c and n such that for all x > n, |f(x)| <= c * |g(x)|. That is, from input n onwards, c * g(x) dominates f(x).
It follows that O(n/2) = O(n), because,
If f(x) = x/2 and g(x) = x, then we set c = 0.5 and n = 0.
If f(x) = x and g(x) = x/2, then we set c = 2 and n = 0.
Note that there are infinitely many values for c and n that you can use to prove this. (In the above I minimized them, but that is not necessary.)