fft output show unexpected symmetry - fft

I am running a cfft on a signal. The output seems to show symmetry. I know that
an fft is symmetrical, but the code
arm_cfft_f32(&arm_cfft_sR_f32_len512, &FFTBuf[0], 0, 1);
arm_cmplx_mag_f32(&FFTBuf[0], &FFTMagBuf[0], FFT_LEN);
accounts for this as the FFTMagBuf is Half the length of the Input array.
The output though, still appears to show symmetry
[1]https://imgur.com/K0uMDAm
arrows point to my whistle, which shows nicely, surrounded by much noise.
the middle one is probably a harmonic(my whistling is crap). but left right symmetry is noticeable.
I am using an stm32f4 disco board, and the samples are from the on-board mems microphone, and each block of samples(in this case 1024, to give an fft of 512 length) is passed through a hann window.
I am using a modified version of tony dicola's spectrogramui.py for visualization.

According to the documentation arm_cmplx_mag_f32 computes the magnitude of a complex signal. That's why FFTMagBuf has to be half the size of FFTBuf: both arrays hold real numbers but, the complex samples are made of two reals. It's unrelated to the simmetry of the FFT.
So, the output signal has exactly the same number of samples as the input.
That is, you compute the complex FFT of a real signal, which has some kind of symmetry (you need to account for complex conjugation too), and you take the magnitude, which is symmetric. Of course, the plot is then symmetric too.

Related

fft: fitting binned data

I want to fit a curve to data obtained from an FFT. While working on this, I remembered that an FFT gives binned data, and therefore I wondered if I should treat this differently with curve-fitting.
If the bins are narrow compared to the structure, I think it should not be necessary to treat the data differently, but for me that is not the case.
I expect the right way to fit binned data is by minimizing not the difference between values of the bin and fit, but between bin area and the area beneath the fitted curve, for each bin, such that the energy in each bin matches the energy in the range of the bin as signified by the curve.
So my question is: am I thinking correctly about this? If not, how should I go about it?
Also, when looking around for information about this subject, I encountered the "Maximum log likelihood" for example, but did not find enough information about it to understand if and how it applied to my situation.
PS: I have no clue if this is the right site for this question, please let me know if there is a better place.
For an unwindowed FFT, the correct interpolation between bins is by using a Sinc (sin(x)/x) or periodic Sinc (Dirichlet) interpolation kernel. For an FFT of samples of a band-limited signal, thus will reconstruct the continuous spectrum.
A very simple and effective way of interpolating the spectrum (from an FFT) is to use zero-padding. It works both with and without windowing prior to the FFT.
Take your input vector of length N and extend it to length M*N, where M is an integer
Set all values beyond the original N values to zeros
Perform an FFT of length (N*M)
Calculate the magnitude of the ouput bins
What you get is the interpolated spectrum.
Best regards,
Jens
This can be done by using maximum log likelihood estimation. This is a method that finds the set of parameters that is most likely to have yielded the measured data - the technique originates in statistics.
I have finally found an understandable source for how to apply this to binned data. Sadly I cannot enter formulas here, so I refer to that source for a full explanation: slide 4 of this slide show.
EDIT:
For noisier signals this method did not seem to work very well. A method that was a bit more robust is a least squares fit, where the difference between the area is minimized, as suggested in the question.
I have not found any literature to defend this method, but it is similar to what happens in the maximum log likelihood estimation, and yields very similar results for noiseless test cases.

understanding getByteTimeDomainData and getByteFrequencyData in web audio

The documentation for both of these methods are both very generic wherever I look. I would like to know what exactly I'm looking at with the returned arrays I'm getting from each method.
For getByteTimeDomainData, what time period is covered with each pass? I believe most oscopes cover a 32 millisecond span for each pass. Is that what is covered here as well? For the actual element values themselves, the range seems to be 0 - 255. Is this equivalent to -1 - +1 volts?
For getByteFrequencyData the frequencies covered is based on the sampling rate, so each index is an actual frequency, but what about the actual element values themselves? Is there a dB range that is equivalent to the values returned in the returned array?
getByteTimeDomainData (and the newer getFloatTimeDomainData) return an array of the size you requested - its frequencyBinCount, which is calculated as half of the requested fftSize. That array is, of course, at the current sampleRate exposed on the AudioContext, so if it's the default 2048 fftSize, frequencyBinCount will be 1024, and if your device is running at 44.1kHz, that will equate to around 23ms of data.
The byte values do range between 0-255, and yes, that maps to -1 to +1, so 128 is zero. (It's not volts, but full-range unitless values.)
If you use getFloatFrequencyData, the values returned are in dB; if you use the Byte version, the values are mapped based on minDecibels/maxDecibels (see the minDecibels/maxDecibels description).
Mozilla 's documentation describes the difference between getFloatTimeDomainData and getFloatFrequencyData, which I summarize below. Mozilla docs reference the Web Audio
experiment ; the voice-change-o-matic. The voice-change-o-matic illustrates the conceptual difference to me (it only works in my Firefox browser; it does not work in my Chrome browser).
TimeDomain/getFloatTimeDomainData
TimeDomain functions are over some span of time.
We often visualize TimeDomain data using oscilloscopes.
In other words:
we visualize TimeDomain data with a line chart,
where the x-axis (aka the "original domain") is time
and the y axis is a measure of a signal (aka the "amplitude").
Change the voice-change-o-matic "visualizer setting" to Sinewave to
see getFloatTimeDomainData(...)
Frequency/getFloatFrequencyData
Frequency functions (GetByteFrequencyData) are at a point in time; i.e. right now; "the current frequency data"
We sometimes see these in mp3 players/ "winamp bargraph style" music players (aka "equalizer" visualizations).
In other words:
we visualize Frequency data with a bar graph
where the x-axis (aka "domain") are frequencies or frequency bands
and the y-axis is the strength of each frequency band
Change the voice-change-o-matic "visualizer setting" to Frequency bars to see getFloatFrequencyData(...)
Fourier Transform (aka Fast Fourier Transform/FFT)
Another way to think about "time domain vs frequency" is shown the diagram below, from Fast Fourier Transform wikipedia
getFloatTimeDomainData gives you the chart on on the top (x-axis is Time)
getFloatFrequencyData gives you the chart on the bottom (x-axis is Frequency)
a Fast Fourier Transform (FFT) converts the Time Domain data into Frequency data, in other words, FFT converts the first chart to the second chart.
cwilso has it backwards.
the time data array is the longer one (fftSize), and the frequency data array is the shorter one (half that, frequencyBinCount).
fftSize of 2048 at the usual sample rate of 44.1kHz means each sample has 1/44100 duration, you have 2048 samples at hand, and thus are covering a duration of 2048/44100 seconds, which 46 milliseconds, not 23 milliseconds. The frequencyBinCount is indeed 1024, but that refers to the frequency domain (as the name suggests), not the time domain, and it the computation 1024/44100, in this context, is about as meaningful as adding your birth date to the fftSize.
A little math illustrating what's happening: Fourier transform is a 'vector space isomorphism', that is, a mapping going bijectively (i.e., reversible) between 2 vector spaces of the same dimension; the 'time domain' and the 'frequency domain.' The vector space dimension we have here (in both cases) is fftSize.
So where does the 'half' come from? The frequency domain coefficients 'count double'. Either because they 'actually are' complex numbers, or because you have the 'sin' and the 'cos' flavor. Or, because you have a 'magnitude' and a 'phase', which you'll understand if you know how complex numbers work. (Those are 3 ways to say the same in a different jargon, so to speak.)
I don't know why the API only gives us half of the relevant numbers when it comes to frequency - I can only guess. And my guess is that those are the 'magnitude' numbers, and the 'phase' numbers are thrown out. The reason that this is my guess is that in applications, magnitude is far more important than phase. Still, I'm quite surprised that the API throws out information, and I'd be glad if some expert who actually knows (and isn't guessing) can confirm that it's indeed the magnitude. Or - even better (I love to learn) - correct me.

why DCT transform is preferred over other transforms in video/image compression

I went through how DCT (discrete cosine transform) is used in image and video compression standards.
But why DCT only is preferred over other transforms like dft or dst?
Because cos(0) is 1, the first (0th) coefficient of DCT-II is the mean of the values being transformed. This makes the first coefficient of each 8x8 block represent the average tone of its constituent pixels, which is obviously a good start. Subsequent coefficients add increasing levels of detail, starting with sweeping gradients and continuing into increasingly fiddly patterns, and it just so happens that the first few coefficients capture most of the signal in photographic images.
Sin(0) is 0, so the DSTs start with an offset of 0.5 or 1, and the first coefficient is a gentle mound rather than a flat plain. That is unlikely to suit ordinary images, and the result is that DSTs require more coefficients than DCTs to encode most blocks.
The DCT just happens to suit. That is really all there is to it.
When performing image compression, our best bet is to perform the KLT or the Karhunen–Loève transform as it results in the least possible mean square error between the original and the compressed image. However, KLT is dependent on the input image, which makes the compression process impractical.
DCT is the closest approximation to the KL Transform. Mostly we are interested in low frequency signals so only even component is necessary hence its computationally feasible to compute only DCT.
Also, the use of cosines rather than sine functions is critical for compression as fewer cosine functions are needed to approximate a typical signal (See Douglas Bagnall's answer for further explanation).
Another advantage of using cosines is the lack of discontinuities. In DFT, since the signal is represented periodically, when truncating representation coefficients, the signal will tend to "lose its form". In DCT, however, due to the continuous periodic structure, the signal can withstand relatively more coefficient truncation but still keep the desired shape.
The DCT of a image macroblock where the top and bottom and/or the left and right edges don't match will have less energy in the higher frequency coefficients than a DFT. Thus allowing greater opportunities for these high coefficients to be removed, more coarsely quantized or compressed, without creating more visible macroblock boundary artifacts.
DCT is preferred over DFT (Discrete Fourier Transformation) and KLT (Karhunen-Loeve Transformation)
1. Fast algorithm
2. Good energy compaction
3. Only real coefficients

Apple FFT Accelerate Framework Inverse FFT from Array of Real Numbers

I am using the accelerate framework FFT functions to produce a spectrogram of a sound sample. This part works great. However, I want to (effectively) manipulate the spectrum directly (ie manipulate the real numbers), and then call the inverse again, how would I go about doing that? It looks like the INVERSE call expects an array of IMAGINARY numbers, but how can I produce that from my manipulated real numbers? I have tried making the realp array my reals, and the imagp part zero, but that doesn't seem to work.
The reason I ask this is because I wish to run an FFT on a voice audio sample, and then run the FFT again and then lifter the low part of the cepstrum (thus hopefully separating the vocal tract components from the pitch) and then run an inverse FFT again to produce a spectrogram showing the vocal tract (formant) information more clearly (ie, without the pitch information). However, I seem to be running into problems on the inverse FFT, into which I am passing in my real values (cepstrum) in the realp array and the imagp is zero. I think I am doing something wrong here and the results are unexpected.
You need to process the complex forward FFT results, rather than the real magnitudes, or else the shape of the IFFT result spectrum will be distorted. Don't consider them imaginary numbers, consider them to be part of a 2D vector containing the required angular phase information.
If your cepstrum lifter/filter alters only the real magnitudes, then you can try using the amount of change of the real magnitudes as scaling factors to alter your forward complex FFT result before doing a complex IFFT.

How to represent stereo audio data for FFT

How should stereo (2 channel) audio data be represented for FFT? Do you
A. Take the average of the two channels and assign it to the real component of a number and leave the imaginary component 0.
B. Assign one channel to the real component and the other channel to the imag component.
Is there a reason to do one or the other? I searched the web but could not find any definite answers on this.
I'm doing some simple spectrum analysis and, not knowing any better, used option A). This gave me an unexpected result, whereas option B) went as expected. Here are some more details:
I have a WAV file of a piano "middle-C". By definition, middle-C is 260Hz, so I would expect the peak frequency to be at 260Hz and smaller peaks at harmonics. I confirmed this by viewing the spectrum via an audio editing software (Sound Forge). But when I took the FFT myself, with option A), the peak was at 520Hz. With option B), the peak was at 260Hz.
Am I missing something? The explanation that I came up with so far is that representing stereo data using a real and imag component implies that the two channels are independent, which, I suppose they're not, and hence the mess-up.
I don't think you're taking the average correctly. :-)
C. Process each channel separately, assigning the amplitude to the real component and leaving the imaginary component as 0.
Option B does not make sense. Option A, which amounts to convert the signal to mono, is OK (if you are interested in a global spectrum).
Your problem (double freq) is surely related to some misunderstanding in the use of your FFT routines.
Once you take the FFT you need to get the Magnitude of the complex frequency spectrum. To get the magnitude you take the absolute of the complex spectrum |X(w)|. If you want to look at the power spectrum you square the magnitude spectrum, |X(w)|^2.
In terms of your frequency shift I think it has to do with you setting the imaginary parts to zero.
If you imagine the complex Frequency spectrum as a series of complex vectors or position vectors in a cartesian space. If you took one discrete frequency bin X(w), there would be one real component representing its direction in the real axis (x -direction), and one imaginary component in the in the imaginary axis (y - direction). There are four important values about this discrete frequency, 1. real value, 2. imaginary value, 3. Magnitude and, 4. phase. If you just take the real value and set imaginary to 0, you are setting Magnitude = real and phase = 0deg or 90deg. You have hence forth modified the resulting spectrum, and applied a bias to every frequency bin. Take a look at the wiki on Magnitude of a vector, also called the Euclidean norm of a vector to brush up on your understanding. Leonbloy was correct, but I hope this was more informative.
Think of the FFT as a way to get information from a single signal. What you are asking is what is the best way to display data from two signals. My answer would be to treat each independently, and display an FFT for each.
If you want a really fast streaming FFT you can read about an algorithm I wrote here: www.depthcharged.us/?p=176