I'm just getting into signal processing and need to do some DFT/FFT work.
Say I take a signal with two frequencies, 2Hz and 5Hz: x(t)=sin(2*2pi*t)+sin(5*2pi*t). I sample at 100Hz for 5 sec (so my DFT size is 500).
Because my inputs are real values I get a symmetric DFT, so I can discard the 2nd half and convert the DFT values into magnitudes by computing sqrt(re^2+im^2).
My bin widths are 100/500 = 0.2Hz, and so I get a spectrum with peaks at 2Hz and 5Hz, as expected.
My question is: why are the magnitudes different?
On a related note, why are there not two perfect spikes at 2Hz and 5Hz, i.e. why does the graph have non-zero values at 1.5Hz, 2.5Hz, etc.? Is this spectral leakage?
I expect your 500 data points are being processed as a 512-point FFT (many FFT libraries do not support arbitrary input sizes and so typically zero-pad to the next power of 2). If that is the case, then 2Hz and 5Hz no longer fall exactly on bin centers and you will be seeing the effects of spectral leakage, which also explains the unequal peak heights. Applying a window function prior to the FFT should reduce this. Note that you will still see "skirts" on either side of your peaks - this is due to the uncertainty introduced by a finite sampling window.
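To check whether zero-padding is the culprit, here is a minimal sketch in plain Python. It uses a slow direct DFT rather than a real FFT library, and the helper name `dft_magnitudes` is made up for illustration. It compares the 500-point transform of the signal from the question with the same data zero-padded to 512 points, which is only my guess at what the asker's library does.

```python
import cmath
import math

def dft_magnitudes(samples, n_fft):
    """Magnitudes of bins 0..n_fft//2 of a direct DFT, zero-padding samples to n_fft."""
    padded = list(samples) + [0.0] * (n_fft - len(samples))
    mags = []
    for k in range(n_fft // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n, x in enumerate(padded))
        mags.append(abs(acc))
    return mags

FS = 100   # sample rate in Hz
N = 500    # 5 seconds of samples
x = [math.sin(2 * math.pi * 2 * n / FS) + math.sin(2 * math.pi * 5 * n / FS)
     for n in range(N)]

# 500-point DFT: bin width 0.2 Hz, so 2 Hz is exactly bin 10 and 5 Hz is exactly bin 25.
exact = dft_magnitudes(x, 500)

# Assumed library behaviour: zero-pad to 512, so both tones fall between bins.
padded = dft_magnitudes(x, 512)
peak_2hz_padded = max(padded[8:13])    # bins around 2 Hz
peak_5hz_padded = max(padded[23:29])   # bins around 5 Hz

print(exact[10], exact[25])
print(peak_2hz_padded, peak_5hz_padded)
```

With the unpadded transform both peaks come out at the same height (N/2) and every other bin is essentially zero. With the padded transform the two peaks differ, because each tone sits at a different offset from its nearest bin center.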
I am running a CFFT on a signal. The output seems to show symmetry. I know that an FFT is symmetrical, but the code
arm_cfft_f32(&arm_cfft_sR_f32_len512, &FFTBuf[0], 0, 1);
arm_cmplx_mag_f32(&FFTBuf[0], &FFTMagBuf[0], FFT_LEN);
accounts for this, as FFTMagBuf is half the length of the input array.
The output, though, still appears to show symmetry:
[1]https://imgur.com/K0uMDAm
The arrows point to my whistle, which shows up nicely, surrounded by a lot of noise.
The middle one is probably a harmonic (my whistling is crap), but the left-right symmetry is noticeable.
I am using an STM32F4 Discovery board, and the samples are from the on-board MEMS microphone. Each block of samples (in this case 1024, to give an FFT of length 512) is passed through a Hann window.
I am using a modified version of Tony DiCola's spectrogramui.py for visualization.
According to the documentation, arm_cmplx_mag_f32 computes the magnitude of a complex signal. That's why FFTMagBuf has to be half the size of FFTBuf: both arrays hold real numbers, but each complex sample is made of two reals. It's unrelated to the symmetry of the FFT.
So, the output signal has exactly the same number of samples as the input.
That is, you compute the complex FFT of a real signal, which is conjugate-symmetric (bin N-k is the complex conjugate of bin k), and you take the magnitude, which is therefore symmetric. Of course, the plot is then symmetric too.
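The mirroring is easy to reproduce off the board. The following is only a sketch in plain Python with a direct DFT; it does not call CMSIS-DSP, and the names `dft`, `magnitudes` and `useful` are mine. It feeds a real signal in as complex samples with zero imaginary part, as the complex FFT expects, and checks that the magnitudes are mirrored around N/2.

```python
import cmath
import math

def dft(z):
    """Direct complex DFT: N complex samples in, N complex bins out."""
    n_fft = len(z)
    return [sum(z[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_fft)]

N = 16
# A real test signal: a sine in bin 3 plus a weaker cosine in bin 5.
real_signal = [math.sin(2 * math.pi * 3 * n / N) + 0.5 * math.cos(2 * math.pi * 5 * n / N)
               for n in range(N)]

# Store it the way a complex FFT expects: imaginary parts set to zero.
spectrum = dft([complex(v, 0.0) for v in real_signal])
magnitudes = [abs(c) for c in spectrum]   # one magnitude per complex bin, so N values

# For real input, |X[k]| equals |X[N-k]|.
mirrored = all(abs(magnitudes[k] - magnitudes[N - k]) < 1e-9 for k in range(1, N // 2))

# Only bins 0..N/2 carry independent information, so plot just these.
useful = magnitudes[: N // 2 + 1]
print(mirrored, len(magnitudes), len(useful))
```

So if the plot should not show the mirror image, the usual fix is to display only the first half of the magnitude buffer.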
I'm using the vDSP framework for a real-time audio application based on FFT computation.
After a lot of trouble trying to figure out why the algorithm was producing incorrect results, I found the following comment in the official vDSP FFT sample code (DemonstrateFFT.c, lines 242, 416, 548):
/* Zero the signal before timing because repeated FFTs on non-zero
data can cause abnormalities such as infinities, NaNs, and
subnormal numbers.
*/
To reproduce the error, just comment out line 247 (so the signal is not zeroed) and add something similar to the following line at line 273 (just after the vDSP_fft_zrip call):
if (isnan(Observed.realp[0])) printf("Iteration %lu: NaN\n",i); // it would work with any of the components of Observed
It is interesting to observe that reducing N (i.e. increasing the number of FFTs per unit of time) makes the zrip algorithm fail sooner, which kind of makes sense since the comment warns about performing repeated FFTs.
The behavior is also observed with the vDSP_fft_zrop algorithm.
I'm really wondering what the point is of performing FFTs of "zero data" as advised in the comment. Either I'm missing something important, or the vDSP framework is definitely not suited to real-time audio processing.
Normal 16 and 24-bit "real time" audio samples will not see this issue.
But benchmarks can create bigger and smaller numbers that can exceed the range of double precision floats when iterated enough times, and when using many functions, not just FFTs. Try iterating exp() fed back to itself, that will blow up even faster. It's a problem one encounters using any finite precision computer arithmetic (not just the ARM and x86 CPUs that vDSP uses).
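A small sketch of both effects in plain Python. The second half is an assumed model of a benchmark loop, not vDSP's actual scaling: I assume each unnormalized forward plus inverse FFT pass on the same buffer scales the data by N, which is the textbook behaviour of an unscaled DFT pair.

```python
import math

# Feed exp() back into itself until the result no longer fits in a double.
value = 1.0
exp_steps = 0
try:
    while True:
        value = math.exp(value)
        exp_steps += 1
except OverflowError:
    pass  # math.exp raises once the result exceeds the double range

# Assumed model: every unnormalized round trip multiplies the buffer by N,
# so any nonzero sample grows geometrically until it becomes infinity.
N = 1024
amplitude = 1.0
round_trips = 0
while not math.isinf(amplitude):
    amplitude *= N
    round_trips += 1

# A zeroed buffer stays zero no matter how often it is rescaled,
# which is why a timing loop can safely run on it.
silence = 0.0
for _ in range(200):
    silence *= N

print(exp_steps, round_trips, silence)
```

In a real-time application each FFT runs on a fresh block of bounded audio samples rather than on its own previous output, so this runaway growth does not occur.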
I get how the DFT via correlation works, and use that as a basis for understanding the results of the FFT. If I have a discrete signal that was sampled at 44.1kHz, then if I take 1s of data, I have 44,100 samples. In order to run the FFT on that, I would need an array of 44,100 samples and a DFT with N=44,100 in order to get the resolution necessary to detect frequencies up to 22kHz, right? (Because the FFT can only correlate the input with sinusoidal components up to a frequency of N/2)
That's obviously a lot of data points and calculation time, and I have read that this is where the Short-time FT (STFT) comes in. If I then take the first 1024 samples (~23ms) and run the FFT on that, then take an overlapping 1024 samples, I can get the continuous frequency domain of the signal every 23ms. Then how do I interpret the output? If the output of the FFT on static data is N/2 data points with fs/(N/2) bandwidth, what is the bandwidth of the STFT's frequency output?
Here's an example that I ran in Mathematica:
100Hz sine wave at 44.1kHz sample rate:
Then I run the FFT on only the first 1024 points:
The frequency of interest is then at data point 3, which should somehow correspond to 100Hz. I think 44100/1024 = 43 is something like a scaling factor, which means that a signal with 1Hz in this little window will then correspond to a signal of 43Hz in the full data array. However, this would give me an output of 43Hz*3 = 129Hz. Is my logic correct but not my implementation?
As I have already stated in my earlier comments, the variable N affects the resolution of the output frequency spectrum, not the range of frequencies you can detect. A larger N gives you higher resolution at the expense of more computation time, and a smaller N gives you less computation time but can cause spectral leakage, which is the effect you have seen in your last figure.
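To make the resolution point concrete, here is a short sketch of the bin-to-frequency mapping for the numbers in the question. The function names are mine, and bins are counted from zero with bin 0 as DC. My guess, and it is only a guess, is that "data point 3" in Mathematica is the third element of a 1-indexed list, i.e. zero-based bin 2.

```python
FS = 44100   # sample rate in Hz
N = 1024     # FFT length

def bin_frequency(k, fs=FS, n_fft=N):
    """Center frequency in Hz of zero-based bin k."""
    return k * fs / n_fft

def nearest_bin(freq_hz, fs=FS, n_fft=N):
    """Zero-based bin whose center is closest to freq_hz."""
    return round(freq_hz * n_fft / fs)

bin_width = FS / N          # spacing between adjacent bins, about 43.07 Hz
k = nearest_bin(100.0)      # 100 Hz lies between bin 2 and bin 3

print(bin_width, k, bin_frequency(k), bin_frequency(k + 1))
```

A 100Hz tone cannot land exactly on a bin with N=1024. The closest bin center is 86.13Hz and the next is 129.2Hz, so the energy is shared between them.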
As for your other question, well, theoretically the bandwidth of an FFT is infinite, but we band-limit our result to the frequencies in the range [-fs/2 to fs/2] because all frequencies outside that band are susceptible to aliasing and are therefore of no use. Furthermore, if the input signal is real (which is true in most cases, including ours) then the frequencies from [-fs/2 to 0] are just a reflection of the frequencies from [0 to fs/2], and so some FFT procedures only output the spectrum from [0 to fs/2], which I think applies to your case. This means that the N/2 data points that you received as output represent the frequencies in the range [0 to fs/2], so that is the bandwidth you are working with in the case of the FFT and also in the case of the STFT (the STFT is just a series of FFTs, and each FFT in an STFT will give you a spectrum with data points in this band).
I would also like to point out that the STFT will most likely not reduce your computation time if your input is a varying signal such as music, because in that case you will need to perform it many times over the duration of the song for it to be of any use. It will, however, let you understand the frequency characteristics of your song much better than you would if you just performed one FFT.
To visualise the results of an FFT you use frequency (and/or phase) spectrum plots, but to visualise the results of an STFT you will most probably need to create a spectrogram, which is basically a graph made by placing the individual FFT spectra side by side. The process of creating a spectrogram can be seen in the figure below (Source: Dan Ellis - Introduction to Speech Processing). The spectrogram will show you how your signal's frequency characteristics change over time, and how you interpret it will depend on what specific features you are looking to extract/detect from the audio. You might want to look at the spectrogram Wikipedia page for more information.
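As a rough illustration of "a series of FFTs side by side", here is a minimal STFT sketch in plain Python. The frame length, hop size, Hann window and test signal are all my own choices for the example, and a direct DFT stands in for a real FFT to keep it dependency-free.

```python
import cmath
import math

def hann(n_fft):
    """Periodic Hann window of length n_fft."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]

def frame_magnitudes(frame):
    """Magnitudes of bins 0..N/2 for one frame (direct DFT, fine for small frames)."""
    n_fft = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n, x in enumerate(frame)))
            for k in range(n_fft // 2 + 1)]

def stft(signal, frame_len, hop):
    """One magnitude spectrum per overlapping, windowed frame."""
    window = hann(frame_len)
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        columns.append(frame_magnitudes(frame))
    return columns

FS = 8000
FRAME, HOP = 64, 32     # 50% overlap, bin width FS / FRAME = 125 Hz
# Test signal: 1000 Hz for the first half, 2000 Hz for the second half.
signal = [math.sin(2 * math.pi * (1000 if n < 512 else 2000) * n / FS)
          for n in range(1024)]

spectrogram = stft(signal, FRAME, HOP)            # spectrogram[t][k]
peak_bins = [col.index(max(col)) for col in spectrogram]
print(len(spectrogram), peak_bins[0] * FS / FRAME, peak_bins[-1] * FS / FRAME)
```

Each column covers the same 0 to fs/2 range with FRAME/2 + 1 bins. Plotting the columns next to each other with magnitude as colour gives the spectrogram, and here the peak moves from 1000Hz to 2000Hz halfway through.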
I've been playing around with Web Audio some. I have a simple oscillator node playing at a frequency of context.sampleRate / analyzerNode.fftSize * 5 (107.666015625 in this case). When I call analyzer.getByteFrequencyData I would expect it to have a value in the 5th bin, and nowhere else. What I actually see is [0,0,0,240,255,255,255,240,0,0...]
Why am I getting values in multiple bins?
The webaudio AnalyserNode applies a Blackman window before computing the FFT. This windowing function will smear the single tone.
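The smearing can be reproduced outside the browser. This is a sketch in plain Python rather than Web Audio code. It assumes the usual three-term Blackman coefficients (0.42, 0.5, 0.08), and N = 64 is an arbitrary small size so that a direct DFT is fast enough.

```python
import cmath
import math

N = 64
TONE_BIN = 5   # tone placed exactly on a bin center, as in the question

def magnitudes(samples):
    """Magnitudes of bins 0..N/2 of a direct DFT."""
    n_fft = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n, x in enumerate(samples)))
            for k in range(n_fft // 2 + 1)]

tone = [math.sin(2 * math.pi * TONE_BIN * n / N) for n in range(N)]

# Assumed window: three-term Blackman with coefficients 0.42, 0.5, 0.08.
blackman = [0.42 - 0.5 * math.cos(2 * math.pi * n / N)
            + 0.08 * math.cos(4 * math.pi * n / N) for n in range(N)]

# Bins holding more than rounding noise, without and with the window.
rect_bins = [k for k, m in enumerate(magnitudes(tone)) if m > 1e-9]
windowed = magnitudes([x * w for x, w in zip(tone, blackman)])
blackman_bins = [k for k, m in enumerate(windowed) if m > 1e-9]

print(rect_bins, blackman_bins)
```

Without a window the tone occupies a single bin. With the Blackman window it spreads over five bins centered on the tone, which matches the five non-zero values in the question. As far as I know, getByteFrequencyData also converts to decibels and clamps to the 0..255 range, which would explain why the middle values all read 255.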
That has to do with the fact that your sequence is finite, so your signal is assumed to last for a finite amount of time. You are most likely calculating the FFT with a rectangular window, i.e. your signal is considered to last only for the duration of the generated samples, and that "discontinuity" (i.e. the fact that the signal has a finite number of samples) creates the spectral leakage. To minimise this effect, you could try one of several window functions which, when applied to your data prior to the FFT calculation, reduce it.
It looks like you might be clipping somewhere in your computation by using a test signal too large for your data or arithmetic format. Try again using a floating point format.
How should stereo (2 channel) audio data be represented for FFT? Do you
A. Take the average of the two channels and assign it to the real component of a number and leave the imaginary component 0.
B. Assign one channel to the real component and the other channel to the imag component.
Is there a reason to do one or the other? I searched the web but could not find any definite answers on this.
I'm doing some simple spectrum analysis and, not knowing any better, used option A). This gave me an unexpected result, whereas option B) went as expected. Here are some more details:
I have a WAV file of a piano "middle-C". Middle-C is roughly 260Hz (261.6Hz in standard tuning), so I would expect the peak frequency to be at about 260Hz with smaller peaks at the harmonics. I confirmed this by viewing the spectrum in an audio editor (Sound Forge). But when I took the FFT myself, with option A), the peak was at 520Hz. With option B), the peak was at 260Hz.
Am I missing something? The explanation that I came up with so far is that representing stereo data using a real and imag component implies that the two channels are independent, which, I suppose they're not, and hence the mess-up.
I don't think you're taking the average correctly. :-)
C. Process each channel separately, assigning the amplitude to the real component and leaving the imaginary component as 0.
Option B does not make sense. Option A, which amounts to converting the signal to mono, is OK (if you are interested in a global spectrum).
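For what it's worth, here is a sketch in plain Python of the mono average (option A) next to per-channel processing (option C). The "stereo" data is synthetic and the 250Hz tone, sample rate and helper names are made up for the example. The point is that averaging the channels sample by sample does not move the peak.

```python
import cmath
import math

def magnitudes(samples):
    """Magnitudes of bins 0..N/2 of a direct DFT of real samples."""
    n_fft = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n, x in enumerate(samples)))
            for k in range(n_fft // 2 + 1)]

def peak_frequency(samples, fs):
    """Frequency in Hz of the largest-magnitude bin."""
    mags = magnitudes(samples)
    return mags.index(max(mags)) * fs / len(samples)

FS = 8000
N = 256   # bin width FS / N = 31.25 Hz, so 250 Hz is exactly bin 8

# Hypothetical stereo pair: same tone, different level and phase per channel.
left = [math.sin(2 * math.pi * 250 * n / FS) for n in range(N)]
right = [0.5 * math.sin(2 * math.pi * 250 * n / FS + 0.3) for n in range(N)]

# Option A: average the two channels sample by sample into one real signal.
mono = [(l + r) / 2 for l, r in zip(left, right)]

print(peak_frequency(mono, FS), peak_frequency(left, FS), peak_frequency(right, FS))
```

All three peaks land on the same frequency, so a doubled peak with option A points to a problem in how the samples are read or averaged rather than to the averaging idea itself.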
Your problem (double freq) is surely related to some misunderstanding in the use of your FFT routines.
Once you take the FFT you need to get the magnitude of the complex frequency spectrum. To get the magnitude you take the absolute value of the complex spectrum, |X(w)|. If you want to look at the power spectrum you square the magnitude spectrum, |X(w)|^2.
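In code this is a one-liner per bin. A tiny sketch in Python, with made-up bin values standing in for real FFT output:

```python
import math

# Hypothetical complex bin values, as an FFT routine might return them.
spectrum = [complex(3.0, 4.0), complex(0.0, -2.0), complex(-1.0, 0.0)]

# Magnitude spectrum: |X| = sqrt(re^2 + im^2) for each bin.
magnitude = [math.hypot(c.real, c.imag) for c in spectrum]

# Power spectrum: |X|^2 for each bin.
power = [m ** 2 for m in magnitude]

print(magnitude, power)
```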
In terms of your frequency shift I think it has to do with you setting the imaginary parts to zero.
Imagine the complex frequency spectrum as a series of complex vectors, or position vectors in a Cartesian plane. If you take one discrete frequency bin X(w), it has one real component along the real axis (x-direction) and one imaginary component along the imaginary axis (y-direction). There are four important values for this bin: 1. the real value, 2. the imaginary value, 3. the magnitude and 4. the phase. If you just take the real value and set the imaginary value to 0, you are setting magnitude = |real| and phase = 0deg or 180deg. You have thereby modified the resulting spectrum and applied a bias to every frequency bin. Take a look at the wiki on the magnitude of a vector, also called the Euclidean norm, to brush up on your understanding. Leonbloy was correct, but I hope this was more informative.
Think of the FFT as a way to get information from a single signal. What you are asking is what is the best way to display data from two signals. My answer would be to treat each independently, and display an FFT for each.
If you want a really fast streaming FFT you can read about an algorithm I wrote here: www.depthcharged.us/?p=176