Zero-padded FFT using FFTW

To interpolate a signal in the frequency domain, one can pad it with zeros in the time domain and take an FFT.
Suppose a vector X has N elements and Y is the same as X but padded on one side with N zeros. Then the following two sums give the same result:
$$\hat{x}(k)=\sum_{n=0}^{2N-1} Y(n)\,e^{i2\pi k n/(2N)},\quad k=0,\dots,2N-1,$$
$$\hat{x}(k)=\sum_{n=0}^{N-1} X(n)\,e^{i2\pi k n/(2N)},\quad k=0,\dots,2N-1.$$
Now, if we use the FFTW package, the first form needs 2N memory locations for the input vector while the second needs only N (I do not know if this is even possible with the existing FFTW package)! The computational cost also drops, from roughly 2N log(2N) to 2N log(N). The problem is worse for a 2D or 3D FFT. Is it possible to implement the second approach with the FFTW package? It is fairly easy to do in MATLAB.
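For concreteness, the straightforward first approach with FFTW would look something like the following minimal sketch (names are illustrative; note FFTW_FORWARD uses the e^{-i2πkn/(2N)} sign convention, the conjugate of the one written above, and FFTW_BACKWARD gives the +i convention):

#include <string.h>
#include <complex.h>
#include <fftw3.h>

/* Sketch: 2N-point DFT of x (length N) zero-padded to 2N.
 * Y must have room for 2N complex values.  Link with -lfftw3. */
void padded_fft_direct(int N, const fftw_complex *x, fftw_complex *Y)
{
    fftw_complex *in = fftw_malloc(sizeof(fftw_complex) * 2 * N);
    fftw_plan p = fftw_plan_dft_1d(2 * N, in, Y, FFTW_FORWARD, FFTW_ESTIMATE);

    memcpy(in, x, sizeof(fftw_complex) * N);        /* the N input samples */
    memset(in + N, 0, sizeof(fftw_complex) * N);    /* the N padding zeros */

    fftw_execute(p);

    fftw_destroy_plan(p);
    fftw_free(in);
}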

If x is a 2N-point signal padded with zeros above N, its DFT writes:
$$\hat{x}(k)=\sum_{n=0}^{2N-1} x(n)\,e^{i2\pi k n/(2N)}=\sum_{n=0}^{N-1} x(n)\,e^{i\pi k n/N},\quad k=0,\dots,2N-1.$$
If k is even, k = 2m:
$$\hat{x}(2m)=\sum_{n=0}^{N-1} x(n)\,e^{i2\pi m n/N}.$$
Hence, the coefficients of even frequencies arise from the N-point discrete Fourier transform of x(n).
If k is odd, k = 2m+1:
$$\hat{x}(2m+1)=\sum_{n=0}^{N-1}\left[x(n)\,e^{i\pi n/N}\right]e^{i2\pi m n/N}.$$
Hence, the coefficients of odd frequencies arise from the N-point discrete Fourier transform of x(n)*exp(i*M_PI*n/N).
Thus, the discrete Fourier transform of a zero-padded 2N-point signal reduces to two DFTs of signals of length N, and FFTW can be used to compute them.
The overall computation time will be 2*c*N*ln(N), where c is a constant. This is expected to be faster than the direct computation of the DFT, c*2N*ln(2N). But remember that ln(2N) = ln(2) + ln(N): as N gets large, the extra work of the direct computation is negligible compared to ln(N), so the trick becomes nearly useless, even if the dimension is larger than one. It does not change the asymptotic complexity.
Moreover, FFTW is really efficient, making use of many features of your machine if it is correctly installed, and it will be hard to do better in any case, even with the trick presented here. Finally, if the input signal is real, you may use fftw_plan_dft_r2c_2d: only half the coefficients in Fourier space are computed and stored.
Regarding memory requirements, if you are really short of memory you can perform the transform in place, using the same array for input and output (the FFTW_IN_PLACE flag in FFTW 2; in FFTW 3 you simply pass the same pointer). It is slightly slower, though.
The procedure presented above extends to the DFT of an LN-point signal obtained from an N-point signal padded with (L-1)*N zeros: it reduces to the computation of L DFTs of length N.
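As a minimal sketch of this even/odd split with FFTW (assuming a complex-valued input of length N; since FFTW_FORWARD uses the e^{-i2πkn/N} convention, the odd-bin modulation becomes exp(-i*pi*n/N) here):

#include <complex.h>
#include <math.h>
#include <fftw3.h>

/* 2N-point DFT of x (length N, implicitly zero-padded to 2N),
 * computed as two N-point FFTs: even output bins and odd output bins.
 * Y must have room for 2N complex values.  Link with -lfftw3 -lm. */
void padded_fft_split(int N, const fftw_complex *x, fftw_complex *Y)
{
    fftw_complex *in  = fftw_malloc(sizeof(fftw_complex) * N);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * N);
    fftw_plan p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);

    /* Even bins: Y(2m) = DFT_N{ x(n) }(m) */
    for (int n = 0; n < N; n++)
        in[n] = x[n];
    fftw_execute(p);
    for (int m = 0; m < N; m++)
        Y[2*m] = out[m];

    /* Odd bins: Y(2m+1) = DFT_N{ x(n) * exp(-i*pi*n/N) }(m) */
    for (int n = 0; n < N; n++)
        in[n] = x[n] * cexp(-I * M_PI * n / N);
    fftw_execute(p);
    for (int m = 0; m < N; m++)
        Y[2*m + 1] = out[m];

    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
}

The working space stays at O(N) complex values (plus the 2N output), and the same N-point plan is reused for both passes.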
Do you have any reference showing how MATLAB handles and optimizes the DFT of padded signals compared to FFTW?
EDIT: further research on the 3D case.
The 3D DFT of a 3D signal x(n,m,p) of size NxNxN, zero-padded to 2Nx2Nx2N, is
$$\hat{x}(k_n,k_m,k_p)=\sum_{n=0}^{N-1}\sum_{m=0}^{N-1}\sum_{p=0}^{N-1} x(n,m,p)\,e^{i2\pi (k_n n + k_m m + k_p p)/(2N)}.$$
If k_n, k_m and k_p are all even, this is the N-point 3D DFT of x(n,m,p); if k_n and k_m are even and k_p is odd, it is the N-point 3D DFT of x(n,m,p)*exp(i*pi*p/N); and so on. There are 8 such parity cases.
So the computation of the 3D DFT of an NxNxN signal padded to 2Nx2Nx2N reduces to the computation of 8 3D DFTs of size NxNxN. Since a 3D DFT is a combination of three passes of 1D DFTs, the total number of 1D DFTs of size N is 3x8xNxN, while the direct computation requires 3x(2N)x(2N) 1D DFTs of size 2N. The computation time is 24cN^3ln(N) against 24cN^3ln(2N): only a small gain is possible. Again, FFTW is fast.
Yet, instead of using a black-box 3D FFT, let us compute the 8 DFTs of size N at once by performing the 1D DFTs in each direction:
1D DFTs along n: 2 parity cases, NxN DFTs each => 2cN^3ln(N)
1D DFTs along m: 2 parity cases, 2NxN DFTs each => 4cN^3ln(N)
1D DFTs along p: 2 parity cases, 2Nx2N DFTs each => 8cN^3ln(N)
Hence, the total computation time is expected to be 14cN^3ln(N) against 24cN^3ln(2N): a small gain is possible. Again, FFTW is fast.
Moreover, the computation of the modulated signal x(n)*exp(i*M_PI*n/N) requires only a single call to exp: first compute w = exp(i*M_PI/N), then update wn = wn*w and x(n) = x(n)*wn at each step, or use pow if precision becomes an issue.
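For instance, in the sketch above the per-sample cexp call in the odd-bin loop could be replaced by exactly this recurrence (same in/x/N names as in that sketch, same FFTW_FORWARD sign):

/* One cexp call instead of N: wn accumulates exp(-i*pi*n/N). */
double complex w  = cexp(-I * M_PI / N);
double complex wn = 1.0;
for (int n = 0; n < N; n++) {
    in[n] = x[n] * wn;
    wn *= w;
}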


DM Script: why does the Fourier transform of the Gaussian kernel need modulus?

Recently I have been learning DM Script for TEM image processing.
I needed a Gaussian blur, and I found a script named 'Gaussian Blur' at http://www.dmscripting.com/recent_updates.html
This code implements a Gaussian blur by multiplying the fast Fourier transform (FFT) of the source image by the FFT of a Gaussian-kernel image and finally taking the inverse Fourier transform.
Here is the relevant part of the code:
// Carry out the convolution in Fourier space
compleximage fftkernelimg := realFFT(kernelimg)   // FFT of the Gaussian-kernel image
compleximage FFTSource := realFFT(warpimg)        // FFT of the source image
compleximage FFTProduct := FFTSource * fftkernelimg.modulus().sqrt()
realimage invFFT := realIFFT(FFTProduct)
The point I want to ask about is this line:
compleximage FFTProduct := FFTSource * fftkernelimg.modulus().sqrt()
Why does the FFT of the Gaussian kernel need '.modulus().sqrt()' for the convolution?
Is it related to the fact that the Fourier transform of a Gaussian function is another Gaussian function?
Or is it related to some limitation of the discrete Fourier transform?
Please answer me
Thanks
This is related to the general precision limitation of any floating-point numerical computation. (See, e.g., here, or more in depth here.)
A rotationally symmetric (real-valued) Gaussian of standard deviation sigma should be transformed into a 100% real-valued rotationally symmetric Gaussian of standard deviation 1/sigma. However, doing this numerically will show you deviations. Just try the following:
number sigma = 30
number A0 = 1
realimage first := RealImage( "First", 8, 256, 256 )
first = A0 * exp( - (iradius**2/(2*sigma*sigma) ))
first.showimage()
complexImage second := FFT(first)
second.Showimage()
image nonZeroImaginaryMask = ( 0 != second.Imaginary() )
nonZeroImaginaryMask.Showimage()
nonZeroImaginaryMask.SetLimits(0,1)
When you then multiply these complex images (before transforming back), you introduce even more errors. By using modulus, one ensures that the forward-transformed kernel is purely real and hence a better "damping" curve.
A better implementation of an FFT filtering code would actually create the FFT(Gaussian) directly with a standard deviation of 1/sigma, as this is the analytically correct result. Doing an FFT of the kernel only makes sense if the kernel (or its FFT) is not analytically known.
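For reference, the analytic identity behind that suggestion is the standard Gaussian transform pair
$$\mathcal{F}\left\{e^{-x^{2}/(2\sigma^{2})}\right\}(\omega)\;\propto\;e^{-\sigma^{2}\omega^{2}/2},$$
i.e. a Gaussian of standard deviation sigma in real space maps to a Gaussian of standard deviation 1/sigma in angular-frequency space (up to sampling and normalisation details in the discrete case).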
In general: when implementing any "maths" in program code, it can pay hugely to think it through with numerical computation limits in the back of your head. Reduce actual computation whenever possible (i.e. compute analytically and use the result instead of relying on brute-force numerical computation), and try to "reshape" equations when possible: e.g. avoid large sums over many small numbers, be careful about comparisons against exact numeric values, and try to avoid expressions that are very sensitive to small numerical errors, etc.

QR algorithm repeating eigenvalues

Can the QR algorithm find repeated eigenvalues (https://en.wikipedia.org/wiki/QR_algorithm)? I.e., does it support the case where not all N eigenvalues of a real N x N matrix are distinct?
How can the QR algorithm be extended to find complex eigenvalues?
In principle, yes. It will work cleanly if the repeated eigenvalues are semi-simple, i.e., if their algebraic and geometric multiplicities are the same.
If a multiple eigenvalue occurs in a Jordan block of size s, then the unavoidable floating-point error during the iteration will almost surely perturb it into a star-shaped cluster of eigenvalues with relative error of size mu^(1/s), where mu is the machine precision of the floating-point data type.
The reason this happens is that on the invariant subspace corresponding to a Jordan block of size s, the characteristic polynomial of the restriction of the linear operator to this subspace is (λ-λ[j])^s. During the computation this gets perturbed to (λ-λ[j])^s + μ*q(λ), which to first approximation has roots close to λ[j] + μ^(1/s)*z[k], where the z[k] are the s roots of 0 = z^s + q(λ[j]). What the perturbation function q is is essentially random (accumulated floating-point truncation errors) and depends on the details of the method.
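A standard 2x2 illustration of that mu^(1/s) behaviour (my own example, not from the answer): perturbing a size-2 Jordan block in a single entry by μ,
$$\begin{pmatrix}\lambda & 1\\ \mu & \lambda\end{pmatrix}\qquad\text{has eigenvalues}\qquad \lambda\pm\sqrt{\mu},$$
so a perturbation of size μ ≈ 1e-16 (double precision) already moves the double eigenvalue by about 1e-8.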

STFT Clarification (FFT for real-time input)

I get how the DFT via correlation works, and use that as a basis for understanding the results of the FFT. If I have a discrete signal that was sampled at 44.1 kHz, then 1 s of data gives me 44,100 samples. In order to run the FFT on that, I would have to have an array of 44,100 samples and a DFT with N = 44,100 in order to get the resolution necessary to detect frequencies up to 22 kHz, right? (Because the FFT can only correlate the input with sinusoidal components up to a frequency of N/2 cycles per window, i.e. fs/2.)
That is obviously a lot of data points and calculation time, and I have read that this is where the short-time Fourier transform (STFT) comes in. If I take the first 1024 samples (~23 ms) and run the FFT on that, then take an overlapping 1024 samples, I can track the frequency content of the signal every ~23 ms. Then how do I interpret the output? If the output of the FFT on static data is N/2 data points with fs/(N/2) bandwidth, what is the bandwidth of the STFT's frequency output?
Here's an example that I ran in Mathematica:
100Hz sine wave at 44.1kHz sample rate:
Then I run the FFT on only the first 1024 points:
The frequency of interest is then at data point 3, which should somehow correspond to 100Hz. I think 44100/1024 = 43 is something like a scaling factor, which means that a signal with 1Hz in this little window will then correspond to a signal of 43Hz in the full data array. However, this would give me an output of 43Hz*3 = 129Hz. Is my logic correct but not my implementation?
As I have already stated in my earlier comments, the variable N affects the resolution achievable by the output frequency spectrum, not the range of frequencies you can detect. A larger N gives you higher resolution at the expense of higher computation time, and a lower N gives you lower computation time but can cause spectral leakage, which is the effect you have seen in your last figure.
As for your other question: theoretically the bandwidth of an FFT is infinite, but we band-limit our result to the range [-fs/2, fs/2] because all frequencies outside that band are subject to aliasing and are therefore of no use. Furthermore, if the input signal is real (which is true in most cases, including ours), then the frequencies in [-fs/2, 0] are just a reflection of the frequencies in [0, fs/2], so some FFT procedures only output the spectrum over [0, fs/2], which I think applies to your case. This means that the N/2 data points you received as output represent the frequencies in the range [0, fs/2], so that is the bandwidth you are working with, both for a single FFT and for the STFT (the STFT is just a series of FFTs; each FFT in an STFT gives you a spectrum with data points in this band).
I would also like to point out that the STFT will most likely not reduce your computation time if your input is a varying signal such as music, because in that case you will need to perform it several times over the duration of the song for it to be of any use. It will, however, let you understand the frequency characteristics of your song much better than you would from a single FFT.
To visualise the results of an FFT you use frequency (and/or phase) spectrum plots, but to visualise the results of an STFT you will most probably need to create a spectrogram, which is basically a graph made by putting the individual FFT spectra side by side (see, e.g., Dan Ellis - Introduction to Speech Processing). The spectrogram shows you how your signal's frequency characteristics change over time; how you interpret it depends on what specific features you are looking to extract or detect from the audio. You might want to look at the spectrogram Wikipedia page for more information.
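To make the bookkeeping concrete, here is a minimal STFT sketch in C with FFTW (my own illustration; FRAME, HOP and the mag layout are made-up names). Bin b of any frame corresponds to the frequency b*fs/FRAME, which for fs = 44100 Hz and FRAME = 1024 gives the ~43 Hz bin spacing mentioned in the question.

#include <math.h>
#include <fftw3.h>

#define FRAME 1024   /* samples per FFT (~23 ms at 44.1 kHz) */
#define HOP    512   /* 50% overlap between consecutive frames */

/* Hann-window each frame, FFT it, store the magnitude spectrum.
 * mag must hold nframes * (FRAME/2 + 1) doubles, where
 * nframes = (nsamples - FRAME) / HOP + 1.  Link with -lfftw3 -lm. */
void stft_magnitudes(const double *x, long nsamples, double *mag)
{
    double       *in  = fftw_malloc(sizeof(double) * FRAME);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * (FRAME / 2 + 1));
    fftw_plan p = fftw_plan_dft_r2c_1d(FRAME, in, out, FFTW_ESTIMATE);

    long frame = 0;
    for (long start = 0; start + FRAME <= nsamples; start += HOP, frame++) {
        for (int n = 0; n < FRAME; n++) {
            double hann = 0.5 - 0.5 * cos(2.0 * M_PI * n / (FRAME - 1));
            in[n] = x[start + n] * hann;   /* window to reduce leakage */
        }
        fftw_execute(p);
        for (int b = 0; b <= FRAME / 2; b++)
            mag[frame * (FRAME / 2 + 1) + b] =
                sqrt(out[b][0] * out[b][0] + out[b][1] * out[b][1]);
    }

    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
}

Plotting mag as an image (frames on one axis, bins on the other) is exactly the spectrogram described above.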

FFT Magnitude Output

I'm just getting into signal processing and need to do some DFT/FFT work.
If I take a signal with two frequencies, 2 Hz and 5 Hz: x(t) = sin(2*2pi*t) + sin(5*2pi*t), and sample at 100 Hz for 5 s (so my DFT size is 500).
Because my inputs are real values I get a symmetric DFT, so I can discard the second half and convert the DFT values into magnitudes by taking sqrt(re^2 + im^2).
My bin widths are 100/500 = 0.2Hz, and so I get:
With peaks at 2Hz and 5Hz as expected.
My question is: why are the magnitudes different?
On a related note, why are there not two perfect spikes at 2 Hz and 5 Hz, i.e. the graph has non-zero values at 1.5, 2.5, etc.? Is this spectral leakage?
I expect your 500 data points are being processed as a 512 point FFT (most FFT libraries do not support arbitrary size inputs and so typically they zero pad to the next highest power of 2). If that is the case then you will be seeing the effects of spectral leakage. Applying a window function prior to the FFT should fix this. Note that you will still see "skirts" on either side of your peaks - this is due to the uncertainty introduced by a finite sampling window.
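To make that concrete (assuming the 512-point padding the answer hypothesizes), the bin spacing becomes fs/512 and the two tones land at non-integer bin positions:
$$k_{2\,\mathrm{Hz}}=\frac{2\cdot 512}{100}=10.24,\qquad k_{5\,\mathrm{Hz}}=\frac{5\cdot 512}{100}=25.6,$$
so neither tone falls exactly on a bin; each peak's energy leaks into neighbouring bins by a different amount, which is one reason the two peak magnitudes come out different.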

Why is the knapsack problem pseudo-polynomial?

I know that Knapsack is NP-complete and that it can be solved by DP. They say that the DP solution is pseudo-polynomial, since it is exponential in the "length of the input" (i.e. the number of bits required to encode the input). Unfortunately I did not get it. Can anybody explain that pseudo-polynomial thing to me slowly?
The running time is O(NW) for an unbounded knapsack problem with N items and knapsack of size W. W is not polynomial in the length of the input though, which is what makes it pseudo-polynomial.
Consider W = 1,000,000,000,000. It only takes 40 bits to represent this number, so input size = 40, but the computational runtime uses the factor 1,000,000,000,000, which is O(2^40).
So the runtime is more accurately said to be O(N * 2^(bits in W)), which is exponential.
Also see:
How to understand the knapsack problem is NP-complete?
The NP-Completeness of Knapsack
Complexity of dynamic programming algorithm for the 0-1 knapsack problem
Pseudo-polynomial time
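A minimal sketch of the DP in question (the 0/1 variant, my own illustration) makes the O(NW) shape visible: the inner loop runs W times, so the work scales with the value of W, i.e. with 2^(bits in W).

#include <stdio.h>

/* 0/1 knapsack DP: the table has W+1 columns per item, so the work is
 * O(n*W) -- linear in the *value* of W, exponential in its bit length. */
int knapsack(int n, int W, const int weight[], const int value[])
{
    int dp[W + 1];                    /* dp[w] = best value with capacity w
                                         (VLA for brevity; heap for large W) */
    for (int w = 0; w <= W; w++)
        dp[w] = 0;

    for (int i = 0; i < n; i++)                 /* n iterations ...           */
        for (int w = W; w >= weight[i]; w--)    /* ... times up to W each     */
            if (dp[w - weight[i]] + value[i] > dp[w])
                dp[w] = dp[w - weight[i]] + value[i];

    return dp[W];
}

int main(void)
{
    int weight[] = {1, 3, 4, 5};
    int value[]  = {1, 4, 5, 7};
    /* Doubling W adds a single bit to the input but doubles the inner loop. */
    printf("%d\n", knapsack(4, 7, weight, value));   /* prints 9 */
    return 0;
}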
In most of our problems, we're dealing with large lists of numbers that fit comfortably inside standard int/float data types. Because of the way most processors are built to handle 4-8 byte numbers at a time at no additional cost (relative to numbers that fit in, say, 1 byte), we rarely see a change in running time from scaling our numbers up or down within the ranges we encounter in real problems, so the dominant factor remains just the sheer quantity of data points, the n or m factors that we're used to.
(You can imagine that the Big-O notation is hiding a constant factor that divides out 32 or 64 bits per datum, leaving only the number of data points whenever each of our numbers fits in that many bits or fewer.)
But try reworking other algorithms to act on data sets involving big ints (numbers that require more than 8 bytes to represent) and see what that does to the runtime. The magnitude of the numbers involved always makes a difference, even in other algorithms like binary search, once you expand beyond the buffer of safety conventional processors give us "for free" by handling 4-8 byte batches.
The trick with the Knapsack algorithm that we discussed is that it's unusually sensitive (relative to other algorithms) to the magnitude of a particular parameter, W. Add one bit to W and you double the running time of the algorithm. We haven't seen that kind of dramatic response to changes in value in other algorithms before this one, which is why it might seem like we're treating Knapsack differently, but that's a genuine analysis of how it responds in a non-polynomial fashion to changes in input size.
The way I understand this is that the W factor would have counted as polynomial if the capacity input were an array [1,2,...,W], which has size W. But the capacity input is not an array of numbers; it is a single integer. Time complexity is about the relationship to the size of the input. The size of an integer is NOT its value, but the number of bits representing it. We do later convert this integer W into an array [1,2,...,W] in the algorithm, leading people to mistakenly think W is the size, but this array is not the input; the integer itself is.
Think of the input as "an array of stuff", and the size as "how many items in the array". The item input is actually an array of n items, so size = n. The capacity input is NOT an array of W numbers, but a single integer, represented by an array of about log(W) bits. Increase its size by 1 (adding one meaningful bit) and W doubles, so the run time doubles, hence the exponential time complexity.
The Knapsack algorithm's run time is bounded not only by the size of the input (n, the number of items) but also by the magnitude of the input (W, the knapsack capacity): O(nW), which is exponential in the number of bits used to represent W in binary. Computational complexity (i.e. how processing is done inside a computer through bits) is concerned only with the size of the inputs, not their magnitudes/values.
Disregard the value/weight list for a moment. Let's say we have an instance with knapsack capacity 2. W would take two bits in the input data. Now we increase the knapsack capacity to 4, keeping the rest of the input. Our input has grown by only one bit, but the computational complexity has doubled. If we increase the capacity to 1024, we would have just 11 bits of input for W instead of 2, but the complexity has increased by a factor of 512. Time complexity grows exponentially in the size of W in binary (or decimal) representation.
Another simple example that helped me understand the pseudo-polynomial concept is the naive primality testing algorithm. For a given number n we check whether it is divided evenly by each integer in the range 2..√n, so the algorithm takes about √n − 1 steps. But here, n is the magnitude of the input, not its size.
Now the regular O(n) case:
By contrast, searching an array for a given element runs in polynomial time, O(n). It takes at most n steps, and here n is the size of the input (the length of the array).
[ see here ]
Calculating bits required to store decimal number
Complexity is based on input size. In the knapsack problem, the inputs are the number of items, the maximum capacity W, and the profit and weight arrays. We construct the DP table as size * W, so it feels like polynomial time complexity. But the input W is an integer, not an array, so the running time is O(size * 2^(number of bits required to store W)): if the number of bits increases by 1, the running time doubles. Thus it is exponential in the input length, and thereby pseudo-polynomial.