I understand the complex output of a DFT contains both "amplitude" and "phase" information at discrete frequencies.
Amplitude[n] = sqrt((r[n]*r[n]) + (i[n]*i[n]))
Phase[n] = (atan2(i[n],r[n]))
Frequency[n] = n * (sample_rate / (fft_input_length / 2))
It seems that I should be able to use the frequency, amplitude, and phase information to calculate the amplitude of each output bin as if the input at the corresponding frequency had a zero-phase alignment in the FFT input. But I am drawing a blank.
Hmm, digging deeper into my problem I discovered that the imaginary potion of the FFT output is always 0.0 regardless of the input. So I am guessing my code is flawed or the algorithm is not what I need.
If you want to rotate all DFT result bins to a phase of zero with reference to the start (sample 0): set r[n] = amplitude[n], i[n] = 0; make sure r[n] is symmetric over the full DFT length if you want strictly real data; and compute the IDFT if needed.
Related
This code computes the DFT from time domain.
Can anybody see the code below and help me to get the right answer?
my problem is:
when I change N value, for example, to 4, 5, 10 ,or other values.
X(1) changes with that. but I think X(1) must be the same for every value of N.
just like the shape below: the N value changes but the vertical value is the same.
I appreciate if you help me.
Thank you.
enter image description here
clear; clc;
% %% Analytical
N=4;
k=0:N-1;
X=zeros(N,1);
t=k/N;
x=(5+2*cos(2*pi*t-pi/2)+3*cos(4*pi*t))
%x=abs((1-(0.012.*(pi.*52.*(t-0.3721)).^2)).*exp(-(pi.*52.*(t-0.3721).^2)))
abs(sum(x))
for k=0:N-1
for n=0:N-1
X(k+1)=X(k+1)+x(n+1).*exp(-1i.*2.*pi.*(n).*(k)/N);
end
end
k1=[0:N-1];
stem(k1,abs(X))
% xlim([0 1])
% ylim([-1 1])
xlabel('Frequency');
ylabel('|X(k)|');
title('Frequency domain - Magnitude response')
Your definition of DFT (which is probably the most common definition) does not have the property that X(1) remains constant with N. Instead, it is X(1)/N which will remain constant. To use this DFT to get the magnitudes of the input at various frequencies, you'll need to divide the DFT output by N.
To verify this, you can call Matlab's fft function and compare with your results. You should get the same answer from Matlab's fft. Note that Matlab's fft documentation says:
The resulting FFT amplitude is A*n/2, where A is the original amplitude and n is the number of FFT points.
I am beginner in deep learning.
I am using this dataset and I want my network to detect keypoints of a hand.
How can I make my output layer's nodes to be in range [-1, 1] (range of normalized 2D points)?
Another problem is when I train for more than 1 epoch the loss gets negative values
criterion: torch.nn.MultiLabelSoftMarginLoss() and optimizer: torch.optim.SGD()
Here u can find my repo
net = nnModel.Net()
net = net.to(device)
criterion = nn.MultiLabelSoftMarginLoss()
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer=optimizer, gamma=decay_rate)
You can use the Tanh activation function, since the image of the function lies in [-1, 1].
The problem of predicting key-points in an image is more of a regression problem than a classification problem (especially if you're making your model outputs + targets fall within a continuous interval). Therefore, I suggest you use the L2 Loss.
In fact, it could be a good exercise for you to determine which loss function that is appropriate for regression problems provides the lowest expected generalization error using cross-validation. There's several such functions available in PyTorch.
One way I can think of is to use torch.nn.Sigmoid which produces outputs in [0,1] range and scale outputs to [-1,1] using 2*x-1 transformation.
I get how the DFT via correlation works, and use that as a basis for understanding the results of the FFT. If I have a discrete signal that was sampled at 44.1kHz, then that means if I were to take 1s of data, I would have 44,100 samples. In order to run the FFT on that, I would have to have an array of 44,100 and a DFT with N=44,100 in order to get the resolution necessary to detect a frequencies up to 22kHz, right? (Because the FFT can only correlate the input with sinusoidal components up to a frequency of N/2)
That's obviously a lot of data points and calculation time, and I have read that this is where the Short-time FT (STFT) comes in. If I then take the first 1024 samples (~23ms) and run the FFT on that, then take an overlapping 1024 samples, I can get the continuous frequency domain of the signal every 23ms. Then how do I interpret the output? If the output of the FFT on static data is N/2 data points with fs/(N/2) bandwidth, what is the bandwidth of the STFT's frequency output?
Here's an example that I ran in Mathematica:
100Hz sine wave at 44.1kHz sample rate:
Then I run the FFT on only the first 1024 points:
The frequency of interest is then at data point 3, which should somehow correspond to 100Hz. I think 44100/1024 = 43 is something like a scaling factor, which means that a signal with 1Hz in this little window will then correspond to a signal of 43Hz in the full data array. However, this would give me an output of 43Hz*3 = 129Hz. Is my logic correct but not my implementation?
As I have already stated in my earlier comments, the variable N affects the resolution achievable by the output frequency spectrum and not the range of frequencies you can detect.A larger N gives you a higher resolution at the expense of higher computation time and a lower N gives you lower computation time but can cause spectral leakage, which is the effect you have seen in your last figure.
As for your other question, well, theoretically the bandwidth of an FFT is infinite but we band-limit our result to the band of frequencies in the range [-fs/2 to fs/2] because all frequencies outside that band are susceptible to aliasing and are therefore of no use.Furthermore, if the input signal is real (which is true in most cases including ours) then the frequencies from [-fs/2 to 0] are just a reflection of the frequencies from [0 to fs/2] and so some FFT procedures just output the FFT spectrum from [0 to fs/2], which I think applies to your case.This means that the N/2 data points that you received as output represent the frequencies in the range [0 to fs/2] so that is the bandwidth you are working with in the case of the FFT and also in the case of the STFT (the STFT is just a series of FFT's, each FFT in a STFT will give you a spectrum with data points in this band).
I would also like to point out that the STFT will most likely not reduce your computation time if your input is a varying signal such as music because in that case you will need to take perform it several times over the duration of the song for it to be of any use, it will however enable you to understand the frequency characteristics of your song much better that you would do if you just performed one FFT.
To visualise the results of an FFT you use frequency (and/or phase) spectrum plots but in order to visualise the results of an STFT you will most probably need to create a spectrogram which is basically a graph can is made by just basically putting the individual FFT spectrums side by side.The process of creating a spectrogram can be seen in the figure below (Source: Dan Ellis - Introduction to Speech Processing).The spectrogram will show you how your signal's frequency characteristics change over time and how you interpret it will depend on what specific features you are looking to extract/detect from the audio.You might want to look at the spectrogram wikipedia page for more information.
I have a simple sine function as sin(2*pift+phi). I want to obtain the phase signal phi.
I tried to use FFT to calculate phi. In matlab I do the following
f=200; %frequency of sine wave
overSampRate=30; %oversampling rate
fs=overSampRate*f; %sampling frequency
phase = 3/5*pi; %desired phase shift in radians
nCyl = 5; %to generate five cycles of sine wave
t=0:1/fs:nCyl*1/f; %time base
x=sin(2*pi*f*t+phase); %replace with cos if a cosine wave is desired
NFFT=1024; %NFFT-point DFT
X=fft(x,NFFT); %compute DFT using FFT
XX=2*abs(X(1:NFFT/2+1));
[tt ind]=max(XX);
phase_Estimate=angle(X(ind);
This result makes almost no sense to me. For example, when phi=0.523, phase_Estimate is obtained -0.98.
Using an non-interpolated FFT result phase only works if the period of sinusoid is exactly an integer submultiple of the FFT length. In your example, the sine wave isn't integer periodic in aperture.
If not, you will need to interpolate the phase to get a better estimate. Here's one method to get an better interpolated phase:
First fftshift (rotate by N/2) the data to move the zero phase reference point to the center of the window before doing the FFT. (This is needed to keep the phase from flipping/alternating between adjacent FFT result bins. * )
Then do the FFT and estimate the frequency of the sinusoid by parabolic or, better yet, Sinc interpolation.
Then use the estimated frequency to linearly interpolate the phase between the nearest two FFT result bin phases. Update: Or better yet, use Sinc interpolation of the real and imaginary components of the FFT result separately, then use atan2 on the interpolated IQ components to get an interpolated phase.
Then use the estimated frequency and phase at the center of the window to calculate the phase at some other point, such as the beginning of the FFT window.
Also note that the phase of a sine is different from the phase of a cosine wave by pi/2. atan(im,re) returns the cosine phase.
(* as an alternative to pre-fftshit-ing the data, one could also post-flip the phase of the odd FFT result bins.)
This is actually much more difficult question to answer than it first seems.
The answer #hotpaw2 gives is completely correct and spells it out way better than any other resource I found, but it is still only an outline and it took me a few hours to put all the meat to it's bones.
In the hopes that someone else will also find the question relevant (and for future reference for myself), here is a bit more thorough explanation:
Suppose you have a (local) maximum at index ind (as in the case of question).
Step 1: try to interpolate the more precise location of the maximum by using the two surrounding values. This is well explained in many, many places such as https://www.dsprelated.com/freebooks/sasp/Quadratic_Interpolation_Spectral_Peaks.html has a good explanation for how to do that, but TL:DR version is:
delta = 0.5*(X[ind-1]-X[ind+1])/(X[ind-1]-2*X[ind]+X[ind+1])
p0 = ind+delta
with the estimated peak at p0
(If you want a more precise estimate, use log(X[ind-1]) instead, or go full out and use sinc function, but for most purposes, the delta above is sufficient)
Step 2: the tricky part: use that location to interpolate the phase.
The first instinct is to do simple linear interpolation using the delta we just found:
i0 = floor(p0); w = p0-i0; wp = 1-w
ang = wp*angle(X[i0]) + w*angle(X[i0+1])
This WILL NOT WORK for multiple reasons, most of which were outlined by #hotpaw2. The first of them is that this is not how you average angles, as they wrap modulo 2pi so 0 and 2pi should be similar. The more correct approach is to average the normalized complex numbers instead:
ang = angle(wp*X[i0]/abs(X[i0]) + w*X[i0+1]/abs(X[i0+1]))
However, this is still not correct, because if a peak is between i0 and i0+1, the phase flips 180 degrees (pi radians) there, making the average very misleading. To fix this "phase flip", you either have to (a) perform fftshift before fft (yes, in time-domain) or (b) flip the phase of every odd-indexed value of X (achieved by multiplying the complex number with -1) or (in case you are reluctant to touch the FFT as I was), you can also just (c) mock the approach (b) with the following code:
i0 = floor(p0); w = p0-i0; wp = 1-w
if (i0 % 2 == 1) { w*=-1; wp*=-1 } # Flip both if i0 odd
ang = angle(wp*X[i0]/abs(X[i0]) - w*X[i0+1]/abs(X[i0+1])) # Note the "-" here!
This will give you a (mostly) correct phase, but for a cosine and at the center of fft window.
Step 3 (Optional): If you need the phase for sine and from the beginning of the window, you need to add a correction factor:
ang_beg = ang - (2*pi*p0/N)*N/2 + pi*0.5 = ang - pi*(p0 - 0.5)
(0.5*pi converts cos to sin, and -p0*pi translates to beginning of window).
This seemed to work, at least in the Phase Vocoder I needed it in. Hopefully someone else will also find this useful.
As an aside, the phase interpolation is not needed for a pure sine wave, as angle(X[i0]) = angle(-X[i0+1]) so you can just use it directly. With actual signals, there is likely to be some deviation so interpolation adds some robustness which is usually a good idea, although using w and wp and normalizing is likely overkill and angle(sgn*(X[i0]-X[i0+1)) is usually enough.
Any comments to all this are very welcome. I am not a DSP specialist, so I may well be wrong in some details, bu this does seem to work, so hopefully someone else will also find it useful.
You're trying to get the phase from the power spectrum (XX) when you should be getting it from the FFT (X). Change:
phase_Estimate=angle(XX(ind));
to:
phase_Estimate=angle(X(ind));
It maybe late, but I changed Your script a little
f=200; %frequency of sine wave
overSampRate=30; %oversampling rate
fs=overSampRate*f; %sampling frequency
shift = 30
phase = shift*pi/180; %desired phase shift in radians
nCyl = 5; %to generate five cycles of sine wave
t=0:1/fs:nCyl*1/f; %time base
x=cos(2*pi*f*t+phase); %replace with cos if a cosine wave is desired
NFFT=4096; %NFFT-point DFT
X=fft(x,NFFT); %compute DFT using FFT
XX=2*abs(X(1:NFFT/2+1));
[tt, ind]=max(XX);
phase_Estimate = angle(X(ind)) * 360/(2*pi)
It spits out quite close results to what I would expect.
I changed the x vector generation to cosine, calculated degrees in phase_Estimate instead of radians, and made it easy to change input phase shift.
I am trying to do an FFT on some data I have captured. I am working in the 10MHz-100MHz range, so my 8192 sample captures will not be big enough to convey anything meaningful when doing an FFT on them. So I am taking many non-overlapping captures of a sine wave and want to average them together.
What I am currently doing (in Scilab) in a for-loop for every file is:
temp1 = read_csv(filename,"\t");
temp1_fft = fft(temp1);
temp1_fft = temp1_fft .* conj(temp1_fft);
temp1_fft = log10(temp1_fft);
fft_code = fft_code + temp1_fft;
And then when I am done with all the files I:
fft_code = fft_code./numFiles;
But I am not so sure that I am handling this correctly. Is there a better way for non-overlapping samples?
I think you are close, but you should average the magnitude of the spectrums (temp1_fft) before taking the log10. Otherwise you essentially end up multiplying them instead of averaging. So instead, just move the log10 to outside the for loop like so (I don't know scilab syntax):
for filename in files:
temp1 = read_csv(filename,"\t");
temp1_fft = fft(temp1);
temp1_fft = temp1_fft .* conj(temp1_fft);
fft_code = fft_code + temp1_fft;
fft_code = fft_code./numFiles;
fft_code = log10(fft_code);
You definitely want to use the magnitude (you are already doing this when you multiply by the conj), as the phase information will depend on when your sampling began relative to the signal. If you need the phase information, you have to make sure your acquisitions are in sync with the signal somehow.
What this does is called "Power Spectrum Averaging":
Power Spectrum Averaging is also called RMS Averaging. RMS averaging computes the weighted mean of the sum of the squared magnitudes (FFT times its complex conjugate). The weighting is either linear or exponential. RMS averaging reduces fluctuations in the data but does not reduce the actual noise floor. With a sufficient number of averages, a very good approximation of the actual random noise floor can be displayed. Since RMS averaging involves magnitudes only, displaying the real or imaginary part, or phase, of an RMS average has no meaning and the power spectrum average has no phase information.