How to interpret FFT data for making a spectrum visualizer - fft

I am trying to visualize a spectrum where the frequency range is divided into N bars, either linearly or logarithmic. The FFT seems to work fine, but I am not sure how to interpret the values in order to decide the max height for the visualization.
I am using FMODAudio, a wrapper for C#. It's set up correctly.
In the case of a linear spectrum, the bars are defined as following:
public int InitializeSpectrum(int windowSize = 1024, int maxBars = 16)
{
numSamplesPerBar_Linear.Clear();
int barSamples = (windowSize / 2) / maxBars;
for (int i = 0; i < maxBars; ++i)
{
numSamplesPerBar_Linear.Add(barSamples);
}
IsInitialized = true;
Data = new float[numSamplesPerBar_Linear.Count];
return numSamplesPerBar_Linear.Count;
}
Data is the array which holds the spectrum values received from the update loop.
The update looks like this:
public unsafe void UpdateSpectrum(ref ParameterFFT* fftData)
{
int length = fftData->Length / 2;
if (length > 0)
{
int indexFFT = 0;
for (int index = 0; index < numSamplesPerBar_Linear.Count; ++index)
{
for (int frec = 0; frec < numSamplesPerBar_Linear[index]; ++frec)
{
for (int channel = 0; channel < fftData->ChannelCount; ++channel)
{
var floatspectrum = fftData->GetSpectrum(channel); //this is a readonlyspan<float> by default.
Data[index] += floatspectrum[indexFFT];
}
++indexFFT;
}
Data[index] /= (float)(numSamplesPerBar_Linear[index] * fftData->ChannelCount); // average of both channels for more meaningful values.
}
}
}
The values I get when testing a song are very low across the bands.
A randomly chosen moment when playing a song gives these values:
16 bars = 0,0326 0,0031 0,001 0,0003 0,0004 0,0003 0,0001 0,0002 0,0001 0,0001 0,0001 0 0 0 0 0
I realize it's more useful to use a logarithmic spectrum in many cases, and I intend to, but I still need to figure how how to find the max values for each bar so that I can setup the visualization on a proper scale.
Q: How can I know the potential max values for each bar based on this setup (it's not 1.0)?

output from FFT call is an array where each element is a complex number ( A + Bi ) where A is the real number component and B the imaginary number component ... element zero of this array represents frequency zero as in DC which is the offset bias can typically be ignored ... as you iterate across each element of this array you increment the frequency ... this freq increment is calculated using
Audio_samples <-- array of raw audio samples in PCM format which gets
fed into FFT call
num_fft_bins := float64(len(Audio_samples)) / 2.0 // using Nyquist theorem
freq_incr_per_bin := (input_audio_sample_rate / 2.0) / num_fft_bins
so to answer your question the output array from FFT call is a linear progression evenly spaced based in above freq increment constant

Depends on your input data to the FFT, and the scaling that your particular FFT implementation uses (not all FFTs use the same scale factor).
With an energy preserving forward-FFT, Parseval's theorem applies. So the energy (sum of squares) of the input vector equals the energy of the FFT result vector. Note that for a single integer periodic in aperture sinusoidal input (a pure tone), all that energy can appear in a single FFT result element. So if you know the maximum possible input energy, you can use that to compute the maximum possible result element magnitude for scaling purposes.
The range is often large enough that visualizers commonly need to use log scaling, or else typical input can get pixel quantized to a graph of all zeros.

Related

How to denormalize values to do the Harmonic Product Spectrum

I'm trying to execute the HPS algorithm and the results are not right. (48000Hz, 16bits)
I've applied to a buffer with the recorded frequency several splits, then a Hanning window, and finally the FFT.
I've obtained a peak in each FFT, that correspond with the frequency I am using, or an octave of it. But when i do the HPS, the results of the fundamental frequency are 0, because the numbers of the array where I make the sum(multiply) are too small, more than my peak in the original FFT.
This is the code of the HPS:
int i_max_h = 0;
double m_max_h = miniBuffer[0];
//m_max is the value of the peak in the original time domain array
m_max_h = m_max;
//array for the sum
double sum [] = new double[miniBuffer.length];
int fund_freq = 0;
//It could be divide by 3, but I'm not going over 500Hz, so it should works
for(int k = 0; k < 24000/48 ; k++)
{
//HPS down sampling and multiply
sum[k] = miniBuffer[k] * miniBuffer[2*k] * miniBuffer[3*k];
// find fundamental frequency (maximum value in plot)
if( sum[k] > m_max_h && k > 0 )
{
m_max_h = sum[k];
i_max_h = k;
}
}
//This should get the fundamental freq. from sum
fund_freq = (i_max_h * Fs / 24000);
System.out.print("Fundamental Freq.: ");
System.out.println(fund_freq);
System.out.println("");
The original HPS code is HERE
I don't know why the sum have little values, when it should be bigger than the previous, and the peak of the sum too. I've applied a RealFordward FFT, maybe there is a problem with the -1 to 1 range, that makes my sum decrease when I multiply it.
Any idea how to fix it, to do the HPS?
How could i do the inverse normalize?
The problem was that I was trying to get a higher value of amplitude on the sum array (the HPS array), and my set of values are normalize since I apply the FFT algorithm to them.
This is the solution I've created, multiplying the individual values of the sum array by 10 before make the multiply.
The number 10 is a coefficient that I have selected, but it could be wrong in some high frequencies cases, this coefficient could be another higher number.
'''
for(int k = 0; k < 24000/48 ; k++)
{
sum[k] = ((miniBuffer[k]*10) * (miniBuffer[2*k]*10) * (miniBuffer[3*k]*10));
// find fundamental frequency (maximum value in plot)
if( sum[k] > m_max_h && k > 0 )
{
m_max_h = sum[k];
i_max_h = k;
}
}
'''
The range of the frequencies is 24000/48 = 500, so it's between 0 and 499 Hz, more than I need in a bass.
If the split of the full array is less than 24000, i should decrease the number 48, and this is admissible, because the down sampled arrays are 24000/3 and 24000/2, so this value could decrease to 3, and it should work well.

3D binary image to 3D mesh using itk

I'm trying to generate a 3d mesh using 3d RLE binary mask.
In itk, I find a class named itkBinaryMask3DMeshSource
it's based on MarchingCubes algorithm
some example, use this class, ExtractIsoSurface et ExtractIsoSurface
in my case, I have a rle 3D binary mask but represented in 1d vector format.
I'm writing a function for this task.
My function takes as parameters :
Inputs : crle 1d vector ( computed rle), dimension Int3
Output : coord + coord indices ( or generate a single file contain both of those array; and next I can use them to visualize the mesh )
as a first step, I decoded this computed rle.
next, I use imageIterator to create an image compatible to BinaryMask3DMeshSource.
I'm blocked, in the last step.
This is my code :
void GenerateMeshFromCrle(const std::vector<int>& crle, const Int3 & dim,
std::vector<float>* coords, std::vector<int>*coord_indices, int* nodes,
int* cells, const char* outputmeshfile) {
std::vector<int> mask(crle.back());
CrleDecode(crle, mask.data());
// here we define our itk Image type with a 3 dimension
using ImageType = itk::Image< unsigned char, 3 >;
ImageType::Pointer image = ImageType::New();
// an Image is defined by start index and size for each axes
// By default, we set the first start index from x=0,y=0,z=0
ImageType::IndexType start;
start[0] = 0; // first index on X
start[1] = 0; // first index on Y
start[2] = 0; // first index on Z
// until here, no problem
// We set the image size on x,y,z from the dim input parameters
// itk takes Z Y X
ImageType::SizeType size;
size[0] = dim.z; // size along X
size[1] = dim.y; // size along Y
size[2] = dim.x; // size along Z
ImageType::RegionType region;
region.SetSize(size);
region.SetIndex(start);
image->SetRegions(region);
image->Allocate();
// Set the pixels to value from rle
// This is a fast way
itk::ImageRegionIterator<ImageType> imageIterator(image, region);
int n = 0;
while (!imageIterator.IsAtEnd() && n < mask.size()) {
// Set the current pixel to the value from rle
imageIterator.Set(mask[n]);
++imageIterator;
++n;
}
// In this step, we launch itkBinaryMask3DMeshSource
using BinaryThresholdFilterType = itk::BinaryThresholdImageFilter< ImageType, ImageType >;
BinaryThresholdFilterType::Pointer threshold =
BinaryThresholdFilterType::New();
threshold->SetInput(image->GetOutput()); // here it's an error, since no GetOutput member for image
threshold->SetLowerThreshold(0);
threshold->SetUpperThreshold(1);
threshold->SetOutsideValue(0);
using MeshType = itk::Mesh< double, 3 >;
using FilterType = itk::BinaryMask3DMeshSource< ImageType, MeshType >;
FilterType::Pointer filter = FilterType::New();
filter->SetInput(threshold->GetOutput());
filter->SetObjectValue(1);
using WriterType = itk::MeshFileWriter< MeshType >;
WriterType::Pointer writer = WriterType::New();
writer->SetFileName(outputmeshfile);
writer->SetInput(filter->GetOutput());
}
any idea
I appreciate your time.
Since image is not a filter you can plug it in directly: threshold->SetInput(image);. At the end of this function, you also need writer->Update();. The rest looks good.
Side-note: it looks like you might benefit from usage of import filter instead of manually iterating the buffer and copying values one at a time.

Relationship between FFT sampling rate, Bandwidth, and Resolution

I´m trying to write a code for omitting one sideband of FFT and shift the other on to the center,
I know the sampling rate (2 GHz) and the number of samples is (10000) the sidebands located in (-55,-355) and (55,355)
I want to know the frequency resolution of each spectral line
this is the code I´ve written...
void compfft (double *source, double *destination, int length)
{
double *realPart = malloc(length*sizeof(double));
double *ImgPart = malloc(length*sizeof(double));
int index,i,j;
for (index= 0;index< length; index++)
{
realPart[index] = source[index]; //data to a local array
}
memset(ImgPart, 0, sizeof(ImgPart));
FFT(realPart, ImgPart, length); //Take fft
//shifting the destination array
for(i=0; i<(length/4) ; i++){
*destination[i]=* realPart[i+749];
}
//filling the destination array with source array values from 55 Hz to 355 Hz
for(j=99; j<(length/5); j++){
destination[j] = realPart[j+750];
}
free(realPart);
free(ImgPart);
}
but my supervisor told me it´s wrong and I need to read more about the basics
I´m really confused plz help ..
The bandwidth and resolution depend not only on sample rate, but on the FFT window length and shape, as well as how you measure or define bandwidth and resolution, and potentially the signal to noise ratio.

Translation from Complex-FFT to Finite-Field-FFT

Good afternoon!
I am trying to develop an NTT algorithm based on the naive recursive FFT implementation I already have.
Consider the following code (coefficients' length, let it be m, is an exact power of two):
/// <summary>
/// Calculates the result of the recursive Number Theoretic Transform.
/// </summary>
/// <param name="coefficients"></param>
/// <returns></returns>
private static BigInteger[] Recursive_NTT_Skeleton(
IList<BigInteger> coefficients,
IList<BigInteger> rootsOfUnity,
int step,
int offset)
{
// Calculate the length of vectors at the current step of recursion.
// -
int n = coefficients.Count / step - offset / step;
if (n == 1)
{
return new BigInteger[] { coefficients[offset] };
}
BigInteger[] results = new BigInteger[n];
IList<BigInteger> resultEvens =
Recursive_NTT_Skeleton(coefficients, rootsOfUnity, step * 2, offset);
IList<BigInteger> resultOdds =
Recursive_NTT_Skeleton(coefficients, rootsOfUnity, step * 2, offset + step);
for (int k = 0; k < n / 2; k++)
{
BigInteger bfly = (rootsOfUnity[k * step] * resultOdds[k]) % NTT_MODULUS;
results[k] = (resultEvens[k] + bfly) % NTT_MODULUS;
results[k + n / 2] = (resultEvens[k] - bfly) % NTT_MODULUS;
}
return results;
}
It worked for complex FFT (replace BigInteger with a complex numeric type (I had my own)). It doesn't work here even though I changed the procedure of finding the primitive roots of unity appropriately.
Supposedly, the problem is this: rootsOfUnity parameter passed originally contained only the first half of m-th complex roots of unity in this order:
omega^0 = 1, omega^1, omega^2, ..., omega^(n/2)
It was enough, because on these three lines of code:
BigInteger bfly = (rootsOfUnity[k * step] * resultOdds[k]) % NTT_MODULUS;
results[k] = (resultEvens[k] + bfly) % NTT_MODULUS;
results[k + n / 2] = (resultEvens[k] - bfly) % NTT_MODULUS;
I originally made use of the fact, that at any level of recursion (for any n and i), the complex root of unity -omega^(i) = omega^(i + n/2).
However, that property obviously doesn't hold in finite fields. But is there any analogue of it which would allow me to still compute only the first half of the roots?
Or should I extend the cycle from n/2 to n and pre-compute all the m-th roots of unity?
Maybe there are other problems with this code?..
Thank you very much in advance!
I recently wanted to implement NTT for fast multiplication instead of DFFT too. Read a lot of confusing things, different letters everywhere and no simple solution, and also my finite fields knowledge is rusty , but today i finally got it right (after 2 days of trying and analog-ing with DFT coefficients) so here are my insights for NTT:
Computation
X(i) = sum(j=0..n-1) of ( Wn^(i*j)*x(i) );
where X[] is NTT transformed x[] of size n where Wn is the NTT basis. All computations are on integer modulo arithmetics mod p no complex numbers anywhere.
Important values
Wn = r ^ L mod p is basis for NTT
Wn = r ^ (p-1-L) mod p is basis for INTT
Rn = n ^ (p-2) mod p is scaling multiplicative constant for INTT ~(1/n)
p is prime that p mod n == 1 and p>max'
max is max value of x[i] for NTT or X[i] for INTT
r = <1,p)
L = <1,p) and also divides p-1
r,L must be combined so r^(L*i) mod p == 1 if i=0 or i=n
r,L must be combined so r^(L*i) mod p != 1 if 0 < i < n
max' is the sub-result max value and depends on n and type of computation. For single (I)NTT it is max' = n*max but for convolution of two n sized vectors it is max' = n*max*max etc. See Implementing FFT over finite fields for more info about it.
functional combination of r,L,p is different for different n
this is important, you have to recompute or select parameters from table before each NTT layer (n is always half of the previous recursion).
Here is my C++ code that finds the r,L,p parameters (needs modular arithmetics which is not included, you can replace it with (a+b)%c,(a-b)%c,(a*b)%c,... but in that case beware of overflows especial for modpow and modmul) The code is not optimized yet there are ways to speed it up considerably. Also prime table is fairly limited so either use SoE or any other algo to obtain primes up to max' in order to work safely.
DWORD _arithmetics_primes[]=
{
2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,
179,181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,383,389,397,401,409,
419,421,431,433,439,443,449,457,461,463,467,479,487,491,499,503,509,521,523,541,547,557,563,569,571,577,587,593,599,601,607,613,617,619,631,641,643,647,653,659,
661,673,677,683,691,701,709,719,727,733,739,743,751,757,761,769,773,787,797,809,811,821,823,827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,
947,953,967,971,977,983,991,997,1009,1013,1019,1021,1031,1033,1039,1049,1051,1061,1063,1069,1087,1091,1093,1097,1103,1109,1117,1123,1129,1151,
0}; // end of table is 0, the more primes are there the bigger numbers and n can be used
// compute NTT consts W=r^L%p for n
int i,j,k,n=16;
long w,W,iW,p,r,L,l,e;
long max=81*n; // edit1: max num for NTT for my multiplication purposses
for (e=1,j=0;e;j++) // find prime p that p%n=1 AND p>max ... 9*9=81
{
p=_arithmetics_primes[j];
if (!p) break;
if ((p>max)&&(p%n==1))
for (r=2;r<p;r++) // check all r
{
for (l=1;l<p;l++)// all l that divide p-1
{
L=(p-1);
if (L%l!=0) continue;
L/=l;
W=modpow(r,L,p);
e=0;
for (w=1,i=0;i<=n;i++,w=modmul(w,W,p))
{
if ((i==0) &&(w!=1)) { e=1; break; }
if ((i==n) &&(w!=1)) { e=1; break; }
if ((i>0)&&(i<n)&&(w==1)) { e=1; break; }
}
if (!e) break;
}
if (!e) break;
}
}
if (e) { error; } // error no combination r,l,p for n found
W=modpow(r, L,p); // Wn for NTT
iW=modpow(r,p-1-L,p); // Wn for INTT
and here is my slow NTT and INTT implementations (i havent got to fast NTT,INTT yet) they are both tested with Schönhage–Strassen multiplication successfully.
//---------------------------------------------------------------------------
void NTT(long *dst,long *src,long n,long m,long w)
{
long i,j,wj,wi,a,n2=n>>1;
for (wj=1,j=0;j<n;j++)
{
a=0;
for (wi=1,i=0;i<n;i++)
{
a=modadd(a,modmul(wi,src[i],m),m);
wi=modmul(wi,wj,m);
}
dst[j]=a;
wj=modmul(wj,w,m);
}
}
//---------------------------------------------------------------------------
void INTT(long *dst,long *src,long n,long m,long w)
{
long i,j,wi=1,wj=1,rN,a,n2=n>>1;
rN=modpow(n,m-2,m);
for (wj=1,j=0;j<n;j++)
{
a=0;
for (wi=1,i=0;i<n;i++)
{
a=modadd(a,modmul(wi,src[i],m),m);
wi=modmul(wi,wj,m);
}
dst[j]=modmul(a,rN,m);
wj=modmul(wj,w,m);
}
}
//---------------------------------------------------------------------------
dst is destination array
src is source array
n is array size
m is modulus (p)
w is basis (Wn)
hope this helps to someone. If i forgot something please write ...
[edit1: fast NTT/INTT]
Finally I manage to get fast NTT/INTT to work. Was little bit more tricky than normal FFT:
//---------------------------------------------------------------------------
void _NFTT(long *dst,long *src,long n,long m,long w)
{
if (n<=1) { if (n==1) dst[0]=src[0]; return; }
long i,j,a0,a1,n2=n>>1,w2=modmul(w,w,m);
// reorder even,odd
for (i=0,j=0;i<n2;i++,j+=2) dst[i]=src[j];
for ( j=1;i<n ;i++,j+=2) dst[i]=src[j];
// recursion
_NFTT(src ,dst ,n2,m,w2); // even
_NFTT(src+n2,dst+n2,n2,m,w2); // odd
// restore results
for (w2=1,i=0,j=n2;i<n2;i++,j++,w2=modmul(w2,w,m))
{
a0=src[i];
a1=modmul(src[j],w2,m);
dst[i]=modadd(a0,a1,m);
dst[j]=modsub(a0,a1,m);
}
}
//---------------------------------------------------------------------------
void _INFTT(long *dst,long *src,long n,long m,long w)
{
long i,rN;
rN=modpow(n,m-2,m);
_NFTT(dst,src,n,m,w);
for (i=0;i<n;i++) dst[i]=modmul(dst[i],rN,m);
}
//---------------------------------------------------------------------------
[edit3]
I have optimized my code (3x times faster than code above),but still i am not satisfied with it so i started new question with it. There I have optimized my code even further (about 40x times faster than code above) so its almost the same speed as FFT on floating point of the same bit size. Link to it is here:
Modular arithmetics and NTT (finite field DFT) optimizations
To turn Cooley-Tukey (complex) FFT into modular arithmetic approach, i.e. NTT, you must replace complex definition for omega. For the approach to be purely recursive, you also need to recalculate omega for each level based on current signal size. This is possible because min. suitable modulus decreases as we move down in the call tree, so modulus used for root is suitable for lower layers. Additionally, as we are using same modulus, the same generator may be used as we move down the call tree. Also, for inverse transform, you should take additional step to take recalculated omega a and instead use as omega: b = a ^ -1 (via using inverse modulo operation). Specifically, b = invMod(a, N) s.t. b * a == 1 (mod N), where N is the chosen prime modulus.
Rewriting an expression involving omega by exploiting periodicity still works in modular arithmetic realm. You also need to find a way to determine the modulus (a prime) for the problem and a valid generator.
We note that your code works, though it is not a MWE. We extended it using common sense, and got correct result for a polynomial multiplication application. You just have to provide correct values of omega raised to certain powers.
While your code works, though, like from many other sources, you double spacing for each level. This does not lead to recursion that is as clean, though; this turns out to be identical to recalculating omega based on current signal size because the power for omega definition is inversely proportional to signal size. To reiterate: halving signal size is like squaring omega, which is like giving doubled powers for omega (which is what one would do for doubling of spacing). The nice thing about the approach that deals with recalculating of omega is that each subproblem is more cleanly complete in its own right.
There is a paper that shows some of the math for modular approach; it is a paper by Baktir and Sunar from 2006. See the paper at the end of this post.
You do not need to extend the cycle from n / 2 to n.
So, yes, some sources which say to just drop in a different omega definition for modular arithmetic approach are sweeping under the rug many details.
Another issue is that it is important to acknowledge that the signal size must be large enough if we are to not have overflow for result time-domain signal if we are performing convolution. Additionally, it may be useful to find certain implementations for exponentiation subject to modulus exist that are fast, even if the power is quite large.
References
Baktir and Sunar - Achieving efficient polynomial multiplication in Fermat fields using the fast Fourier transform (2006)
You must make sure that roots of unity actually exist. In R there are only 2 roots of unity: 1 and -1, since only for them x^n=1 can be true.
In C you have infinitely many roots of unity: w=exp(2*pi*i/N) is a primitive N-th roots of unity and all w^k for 0<=k
Now to your problem: you have to make sure the ring you're working in offers the same property: enough roots of unity.
Schönhage and Strassen (http://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm) use integers modulo 2^N+1. This ring has enough roots of unity. 2^N == -1 is a 2nd root of unity, 2^(N/2) is a 4th root of unity and so on. Furthermore, these roots of unity have the advantage that they are powers of two and can be implemented as binary shifts (with a modulo operation afterwards, which comes down to a add/subtract).
I think QuickMul (http://www.cs.nyu.edu/exact/doc/qmul.ps) works modulo 2^N-1.

Interpreting jTransform FFT results

Merged with Power Spectral Density from jTransforms DoubleFFT_1D.
I'm using Jtransforms java library to perform analysis on a give dataset.
An example of the data is as follows:
980,988,1160,1080,928,1068,1156,1152,1176,1264
I'm using the DoubleFFT_1D funtion in jTransforms. The data output is as follows:
10952, -152, 80.052, 379.936, -307.691, 12.734, -224.052, 427.607, -48.308, 81.472
I'm having trouble interpreting the output. I understand that the first element in the output array is the total of the 10 inputs (10952). It's the other elements of the output array that i don't understand. Ultimately, I want to plot the Power Spectral Density of the input data on a graph and find amounts between 0 and .5 Hz.
The documentation for the jTransform functions states:
(where a is the data set)
.....................
realForward
public void realForward(double[] a)Computes 1D forward DFT of real data leaving the result in a . The physical layout of the output data is as follows:
if n is even then
a[2*k] = Re[k], 0 <= k < n / 2
a[2*k+1] = Im[k], 0 < k < n / 2
a[1] = Re[n/2]
if n is odd then
a[2*k] = Re[k], 0 <= k < (n+1)/2
a[2*k+1] = Im[k], 0 < k< (n-1)/2
a[1] = Im[(n-1)/2]
This method computes only half of the elements of the real transform. The other half satisfies the symmetry condition. If you want the full real forward transform, use realForwardFull. To get back the original data, use realInverse on the output of this method.
Parameters:
a - data to transform
..................................
So what are the output numbers? What do the values mean?
Any help is appreciated.