I would like to ask: what are the units of the magnitude
magnitude = sqrt(real*real + imag*imag);
after performing an FFT on .wav file data? Is it something like V^2?
I have some FFT data: 257 dimensions, one frame every 10 ms, 121 frames in total, i.e. 1.21 s. I think the first dimension is probably something else and the remaining ones are the FFT coefficients.
It's probably just spectrogram data. Judging from a comment about the FFT data, sqrt10 and mean-variance normalization might have been applied to it.
From there, I want to calculate back a PCM signal at 44.1 kHz so I can play the sound. I asked the same question in a more mathematical way here, but maybe StackOverflow is a better place because I actually want to implement this.
I also asked the same question about the theory here on DSP SE.
How would I do that? Maybe I need some more information (which I'd have to find out somehow); if so, which? Maybe this missing information can be guessed intelligently somehow?
This question is about both the theory and the practical implementation. The implementation is trivial, I guess, but a concrete example in some language would help with understanding the theory. Maybe C++ with FFTW? I skimmed through the FFTW docs, but I fail to understand all the terminology and some of the background, e.g. here. Why does it transform from complex to real or the other way around, when I only want real to real? What are those REDFT variants? What are a DCT, DFT, and DST? What is FFTW_HC2R?
I read all the FFT data, i.e. 121 * 257 floats, into a vector freq_bins.
#include <fftw3.h>
#include <cmath>
#include <vector>

typedef float float32_t;
typedef float32_t sample_t;

std::vector<float32_t> freq_bins; // FFT data, 121 * 257 floats
int freq_bins_count = 257;
size_t len = 121;
std::vector<float32_t> pcm; // output, PCM data

// Guessed constants -- I don't know the right values for these:
double offset = 0;         // to undo the mean-variance normalization?
double scale = 1;          // overall output gain
double dt = 0.01;          // one frame every 10 ms
double sampleRate = 44100;

int N = freq_bins_count;
std::vector<double> out(N), orig_in(N);
// inspiration: https://stackoverflow.com/questions/2459295/invertible-stft-and-istft-in-python/6891772#6891772
for(size_t f = 0; f < len; ++f) {
    size_t pos = freq_bins_count * f;
    for(int i = 0; i < N; ++i)
        out[i] = pow(freq_bins[pos + i] + offset, 10); // fft was sqrt10 + mvn
    fftw_plan q = fftw_plan_r2r_1d(N, &out[0], &orig_in[0], FFTW_REDFT00, FFTW_ESTIMATE);
    fftw_execute(q);
    fftw_destroy_plan(q);
    // naive overlap-and-add
    auto start_frame = size_t(f * dt * sampleRate);
    for(int i = 0; i < N; ++i) {
        sample_t frame = orig_in[i] * scale / (2 * (N - 1));
        size_t idx = start_frame + i;
        while(idx >= pcm.size())
            pcm.push_back(0);
        pcm[idx] += frame;
    }
}
But this is wrong, I guess. I just get garbage out.
This question might be related. Or this one.
If the data you have is real, then it is most probably spectrogram data; if the data you are receiving is complex, then you most probably have raw short-time Fourier transform (STFT) data (see the diagram in this post for how STFT/spectrogram data is produced). Spectrogram data is produced by taking the magnitude squared of STFT data and is thus not invertible, because all the phase information in the audio signal has been lost. Raw STFT data, on the other hand, is invertible, so if that is what you have, you might want to look for a library that performs the inverse STFT and try using that.
As for the question of what the FFT dimensions in your data represent, I reckon the 257 data points you are receiving every 10 ms are the result of a 512-point FFT being used in the STFT process. The first value is the 0 Hz (DC) bin, and the remaining 256 data points are one half of the FFT spectrum (the other half has been discarded because the input to the FFT is real, so one half of the FFT data is simply the complex conjugate of the other half).
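As a quick sanity check, here is a minimal sketch of how those 257 bins map to frequencies, assuming a 512-point FFT and a purely hypothetical sample rate fs (the actual rate is not given in the question):

#include <cstdio>

int main() {
    const int N = 512;           // assumed FFT size
    const double fs = 16000.0;   // hypothetical sample rate, just for illustration
    const int bins = N / 2 + 1;  // = 257, matching the 257 dimensions per frame
    // bin k corresponds to frequency k * fs / N
    for (int k = 0; k < bins; k += 64)
        std::printf("bin %3d -> %8.1f Hz\n", k, k * fs / N);
    // bin 0 is DC; bin 256 is the Nyquist frequency fs / 2
    return 0;
}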
In addition to this, I would like to point out that just because you are receiving FFT data every 10 ms, 121 times, does not mean the audio signal is 1.21 s long. The STFT is usually produced using overlapping windows, so your audio signal might be shorter than 1.21 s.
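A small sketch of the relationship between hop size, frame count, and window length (the window length here is an assumption):

#include <cstdio>

int main() {
    const double hop_s = 0.010;    // one frame every 10 ms (given)
    const int frames = 121;        // given
    const double window_s = 0.032; // hypothetical window length (e.g. 512 samples at 16 kHz)
    // time spanned from the start of the first window to the end of the last:
    double span = (frames - 1) * hop_s + window_s;
    std::printf("span covered by the windows: %.3f s\n", span);
    // the actual signal may be shorter if the edges were zero-padded
    return 0;
}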
You'd simply push the data you have through the inverse Fourier transform. All FFT libraries offer forward and backward transform functions.
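For example, with FFTW the backward (complex-to-real) transform looks like the sketch below. Note that this assumes you actually have complex STFT bins; it cannot recover audio from magnitude-only spectrogram data, and the sizes are illustrative:

#include <fftw3.h>
#include <complex>
#include <cstdio>
#include <vector>

int main() {
    const int N = 512;  // assumed FFT size (257 = N/2 + 1 bins)
    std::vector<std::complex<double>> bins(N / 2 + 1); // one frame of complex STFT data
    std::vector<double> samples(N);                    // reconstructed time-domain frame

    // complex-to-real backward transform; std::complex<double> is
    // layout-compatible with fftw_complex
    fftw_plan p = fftw_plan_dft_c2r_1d(
        N, reinterpret_cast<fftw_complex*>(bins.data()), samples.data(),
        FFTW_ESTIMATE);
    fftw_execute(p);
    fftw_destroy_plan(p);

    // FFTW transforms are unnormalized: divide by N to undo the forward scaling
    for (double& s : samples) s /= N;
    std::printf("first sample: %f\n", samples[0]);
    return 0;
}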
I get how the DFT via correlation works, and use that as a basis for understanding the results of the FFT. If I have a discrete signal that was sampled at 44.1 kHz, then if I were to take 1 s of data, I would have 44,100 samples. In order to run the FFT on that, I would have to have an array of 44,100 samples and a DFT with N = 44,100 in order to get the resolution necessary to detect frequencies up to 22 kHz, right? (Because the FFT can only correlate the input with sinusoidal components up to a frequency of N/2.)
That's obviously a lot of data points and calculation time, and I have read that this is where the short-time FT (STFT) comes in. If I take the first 1024 samples (~23 ms) and run the FFT on that, then take an overlapping 1024 samples, and so on, I can get the frequency content of the signal every ~23 ms. Then how do I interpret the output? If the output of the FFT on static data is N/2 data points with fs/(N/2) bandwidth, what is the bandwidth of the STFT's frequency output?
Here's an example that I ran in Mathematica:
100Hz sine wave at 44.1kHz sample rate:
Then I run the FFT on only the first 1024 points:
The frequency of interest is then at data point 3, which should somehow correspond to 100 Hz. I think 44100/1024 ≈ 43 is something like a scaling factor, i.e. a signal at 1 Hz in this little window corresponds to a signal at 43 Hz in the full data array. However, that would give me an output of 43 Hz * 3 = 129 Hz. Is my logic correct but not my implementation?
As I have already stated in my earlier comments, the variable N affects the resolution achievable by the output frequency spectrum, not the range of frequencies you can detect. A larger N gives you higher resolution at the expense of higher computation time, and a lower N gives you lower computation time but can cause spectral leakage, which is the effect you have seen in your last figure.
As for your other question: theoretically the bandwidth of an FFT is infinite, but we band-limit the result to the frequencies in the range [-fs/2, fs/2] because all frequencies outside that band are susceptible to aliasing and are therefore of no use. Furthermore, if the input signal is real (which is true in most cases, including ours), then the frequencies in [-fs/2, 0] are just a reflection of the frequencies in [0, fs/2], so some FFT routines output only the spectrum from [0, fs/2], which I think applies to your case. This means that the N/2 data points you received as output represent the frequencies in the range [0, fs/2]; that is the bandwidth you are working with, both for a single FFT and for the STFT (the STFT is just a series of FFTs, and each FFT in an STFT gives you a spectrum with data points in this band).
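To make the scaling concrete for the 100 Hz / 1024-point example above (and keeping in mind that Mathematica lists are 1-indexed, so "data point 3" is FFT bin index 2), the bin arithmetic works out as in this small sketch:

#include <cmath>
#include <cstdio>

int main() {
    const double fs = 44100.0;  // sample rate
    const int    N  = 1024;     // FFT length
    const double f0 = 100.0;    // input sine frequency

    double bin_width = fs / N;          // ~43.07 Hz per bin
    double exact_bin = f0 / bin_width;  // ~2.32 -- falls between bins
    int nearest = (int)std::lround(exact_bin);

    std::printf("bin width: %.2f Hz\n", bin_width);
    std::printf("100 Hz lands at bin %.2f; nearest bin %d is %.1f Hz\n",
                exact_bin, nearest, nearest * bin_width);
    // because 100 Hz is not an exact bin center, energy leaks into
    // neighbouring bins (spectral leakage)
    return 0;
}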
I would also like to point out that the STFT will most likely not reduce your computation time if your input is a varying signal such as music, because in that case you will need to perform it several times over the duration of the song for it to be of any use. It will, however, let you understand the frequency characteristics of your song much better than a single FFT would.
To visualise the results of an FFT you use frequency (and/or phase) spectrum plots, but to visualise the results of an STFT you will most probably need to create a spectrogram, which is basically a graph made by putting the individual FFT spectra side by side. The process of creating a spectrogram can be seen in the figure below (source: Dan Ellis, Introduction to Speech Processing). The spectrogram will show you how your signal's frequency characteristics change over time; how you interpret it depends on what specific features you are looking to extract/detect from the audio. You might want to look at the spectrogram Wikipedia page for more information.
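As a rough illustration, assembling a spectrogram from a series of FFTs amounts to something like the following sketch (the window and hop sizes are illustrative, and a naive DFT stands in for a real FFT library call):

#include <cmath>
#include <complex>
#include <cstdio>
#include <vector>

// naive O(N^2) DFT, standing in for an FFT library call (FFTW, vDSP, ...)
static std::vector<std::complex<double>> naive_dft(const std::vector<double>& x) {
    const size_t N = x.size();
    std::vector<std::complex<double>> X(N);
    for (size_t k = 0; k < N; ++k)
        for (size_t n = 0; n < N; ++n)
            X[k] += x[n] * std::polar(1.0, -2.0 * M_PI * double(k * n) / double(N));
    return X;
}

// build a spectrogram: one magnitude vector per time slice
std::vector<std::vector<double>> spectrogram(const std::vector<double>& signal) {
    const size_t win = 1024, hop = 512;  // illustrative window and hop sizes
    std::vector<std::vector<double>> spec;
    for (size_t start = 0; start + win <= signal.size(); start += hop) {
        std::vector<double> frame(signal.begin() + start,
                                  signal.begin() + start + win);
        auto bins = naive_dft(frame);        // one transform per window
        std::vector<double> mags(win / 2 + 1);
        for (size_t k = 0; k < mags.size(); ++k)
            mags[k] = std::abs(bins[k]);     // keep magnitude, drop phase
        spec.push_back(mags);                // one time slice, placed side by side
    }
    return spec;
}

int main() {
    std::vector<double> signal(4096, 0.0);   // dummy signal
    auto spec = spectrogram(signal);
    std::printf("%zu frames of %zu bins\n", spec.size(), spec[0].size());
    return 0;
}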
The aim is to do correlation/convolution (with a flip) of two 2D arrays using the iOS Accelerate framework, for speed.
My first attempt was with vImageConvolve_PlanarF/vDSP_imgfir, which was good for smaller arrays. But as the array size increased, performance dropped drastically, because it is an O(n^2) implementation, as mentioned by the Accelerate developers themselves here(1).
I moved to an FFT implementation(2) to reduce the complexity to O(n log n). Using vDSP_fft2d_zip, I gained speed to an extent. But vDSP_fft2d_zip only handles 2D arrays whose dimensions are powers of 2, so for other sizes we need to pad with zeros: e.g., for a 2D array of size 640 * 480, we need to pad zeros to make it 1024 * 512. Other FFT implementations, like FFTW or OpenCV's DFT, allow any size that can be expressed as size = 2^p * 3^q * 5^r, which lets FFTW/OpenCV do the FFT of a 640 * 480 2D array at its original size.
So for 2D arrays of size 640 * 480, an Accelerate vs FFTW/OpenCV comparison is effectively between 1024 * 512 and 640 * 480 transforms. Whatever performance gain I get from Accelerate's FFT on 2D arrays is nullified by its inability to perform the FFT at reasonable dimensions of the form size = 2^p * 3^q * 5^r.
Two queries:
Am I missing any Accelerate functionality that would make this easy? E.g., is there any Accelerate function that can perform a 2D array FFT at size = 2^p * 3^q * 5^r? I assume vDSP_DFT_Execute performs only 1D FFTs.
Are there better approaches to 2D FFT or correlation? Like in this answer(3), which suggests splitting arrays like 480 = 256 + 128 + 64 + 32, with repeated 1D FFTs over rows and then over columns. But this would need too many function calls and, I assume, will not help with performance.
Of lesser importance: I am doing the correlation in tiles, as one of the 2D arrays is far bigger than the other, say 1920 * 1024 vs 150 * 100.
Linear convolution or correlation requires zero padding anyway, otherwise the result will be circular convolution or correlation.
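To put numbers on that: a full linear correlation of an M x N tile with a K x L kernel has size (M+K-1) x (N+L-1), so each FFT dimension must be padded at least that far. A sketch with the sizes from the question:

#include <cstdio>

int main() {
    const int M = 640, N = 480;  // one image tile (from the question)
    const int K = 150, L = 100;  // the smaller array (kernel)
    // linear correlation needs each padded FFT dimension to be at least
    // size + kernel - 1, or the result wraps around (circular correlation)
    int rows = M + K - 1;        // 789
    int cols = N + L - 1;        // 579
    std::printf("minimum padded FFT size: %d x %d\n", rows, cols);
    // a power-of-two FFT would then need 1024 x 1024, while a
    // 2^p * 3^q * 5^r FFT could use e.g. 800 x 600
    return 0;
}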
1D iOS vDSP/Accelerate FFTs do allow N to be a product of small primes, not just 2^M. Not sure about 2D, but one can build a 2D FFT out of 1D FFTs.
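For what it's worth, here is a minimal sketch of that row/column decomposition using FFTW's 1D many-transform interface (the same structure applies to strided 1D transforms in vDSP): a 2D DFT is a 1D DFT over every row followed by a 1D DFT over every column.

#include <fftw3.h>
#include <complex>
#include <vector>

void fft2d(std::vector<std::complex<double>>& a, int rows, int cols) {
    fftw_complex* buf = reinterpret_cast<fftw_complex*>(a.data());
    // 1) 1D FFT over every row (samples contiguous, rows are cols apart)
    fftw_plan row_plan = fftw_plan_many_dft(
        1, &cols, rows,
        buf, nullptr, 1, cols,
        buf, nullptr, 1, cols,
        FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(row_plan);
    fftw_destroy_plan(row_plan);
    // 2) 1D FFT over every column (stride cols between samples, columns 1 apart)
    fftw_plan col_plan = fftw_plan_many_dft(
        1, &rows, cols,
        buf, nullptr, cols, 1,
        buf, nullptr, cols, 1,
        FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(col_plan);
    fftw_destroy_plan(col_plan);
}

int main() {
    int rows = 480, cols = 640;  // sizes from the question
    std::vector<std::complex<double>> a(rows * cols, {1.0, 0.0});
    fft2d(a, rows, cols);
    // a[0] now holds the DC term: the sum of all inputs, (307200, 0)
    return 0;
}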
I have a fundamental question:
I would like to know why the FFT of this time series:
k<-c(4,5,6,2,3,1)
is equal to:
21.0+0.000000i 0.5-6.062178i -1.5-0.866025i 5.0-0.000000i -1.5+0.866025i 0.5+6.062178i
In a time series I have a set of points, but what is the result of fft? Are the results also points?
Fourier says that any (non-pathological) waveform can be decomposed into a bunch of sinewaves. The FFT does that for reasonable samples of a given waveform.
So your FFT results are the coefficients of each sinewave sub-component: the first for 0 Hz (or DC, or the sum), the 2nd for a sinewave with 1 period per aperture, the next for 2 cycles per aperture, and so on. You can consider each coefficient pair x+iy either as a vector in the complex plane giving a sinewave's magnitude and phase, or as multipliers for a cosine and a sine that sum to another sinewave of a specified phase.
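To tie this back to the numbers above, here is a small self-contained sketch that computes the same DFT as R's fft(c(4,5,6,2,3,1)) straight from the definition:

#include <cmath>
#include <complex>
#include <cstdio>
#include <vector>

int main() {
    const std::vector<double> k = {4, 5, 6, 2, 3, 1};
    const size_t N = k.size();
    // X[m] = sum_n k[n] * exp(-2*pi*i*m*n/N) -- the DFT definition
    for (size_t m = 0; m < N; ++m) {
        std::complex<double> X(0, 0);
        for (size_t n = 0; n < N; ++n)
            X += k[n] * std::polar(1.0, -2.0 * M_PI * double(m * n) / double(N));
        std::printf("X[%zu] = %9.6f%+9.6fi\n", m, X.real(), X.imag());
    }
    // prints 21.000000+0.000000i, 0.500000-6.062178i, ... matching R's fft()
    return 0;
}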
I am trying to do an FFT on some data I have captured. I am working in the 10 MHz to 100 MHz range, so a single 8192-sample capture will not be big enough to convey anything meaningful when I do an FFT on it. So I am taking many non-overlapping captures of a sine wave and want to average them together.
What I am currently doing (in Scilab) in a for-loop for every file is:
temp1 = read_csv(filename,"\t");
temp1_fft = fft(temp1);
temp1_fft = temp1_fft .* conj(temp1_fft);
temp1_fft = log10(temp1_fft);
fft_code = fft_code + temp1_fft;
And then when I am done with all the files I:
fft_code = fft_code./numFiles;
But I am not so sure that I am handling this correctly. Is there a better way for non-overlapping samples?
I think you are close, but you should average the magnitudes of the spectra (temp1_fft) before taking the log10. Otherwise you essentially end up multiplying them instead of averaging, because summing logs is the same as taking the log of a product. So instead, just move the log10 outside the for loop, like so (I don't know Scilab syntax exactly):
fft_code = zeros(8192, 1);                     // accumulator, one value per FFT bin
for i = 1:numFiles
    temp1 = read_csv(filenames(i), "\t");
    temp1_fft = fft(temp1);
    temp1_fft = temp1_fft .* conj(temp1_fft);  // power spectrum of this capture
    fft_code = fft_code + temp1_fft;
end
fft_code = fft_code ./ numFiles;               // average the power across captures
fft_code = log10(fft_code);                    // take the log only after averaging
You definitely want to use the magnitude (you are already doing this when you multiply by the conjugate), as the phase information depends on when your sampling began relative to the signal. If you need the phase information, you have to make sure your acquisitions are somehow in sync with the signal.
What this does is called "Power Spectrum Averaging":
Power Spectrum Averaging is also called RMS Averaging. RMS averaging computes the weighted mean of the sum of the squared magnitudes (FFT times its complex conjugate). The weighting is either linear or exponential. RMS averaging reduces fluctuations in the data but does not reduce the actual noise floor. With a sufficient number of averages, a very good approximation of the actual random noise floor can be displayed. Since RMS averaging involves magnitudes only, displaying the real or imaginary part, or phase, of an RMS average has no meaning and the power spectrum average has no phase information.
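The linear weighting is exactly what the averaging loop above computes. For reference, a sketch of the exponential variant (a running average with a hypothetical smoothing factor alpha) could look like this:

#include <vector>

// exponentially weighted RMS (power spectrum) averaging;
// alpha is a hypothetical smoothing factor in (0, 1] -- smaller values
// weight the history more heavily
void rms_average_exp(std::vector<double>& avg_power,
                     const std::vector<double>& new_power,  // |X[k]|^2 per bin
                     double alpha) {
    for (size_t k = 0; k < avg_power.size(); ++k)
        avg_power[k] = (1.0 - alpha) * avg_power[k] + alpha * new_power[k];
}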