Envelope of a signal regarding shifts - fft

Assume I have a Ricker wavelet. I can compute the envelope of this wavelet as shown below:
This is the normal condition we usually see.
However, if I shift the Ricker wavelet so that it is wholly negative and then compute its envelope, something confusing happens: the envelope looks like the opposite (sign-flipped version) of the original wavelet:
Furthermore, if I shift the Ricker wavelet so that it is wholly positive and compute its envelope, the envelope is almost the same as the original wavelet:
Does anybody know the mathematical explanation behind these phenomena?
And how can we avoid the latter two cases? Should we remove the mean value of the wavelet to force it to have zero mean?

I assume that you are working with Python with the Scipy module:
from scipy import signal
import numpy as np
import matplotlib.pyplot as plt
points = 100
a = 4.0
myricker = signal.ricker(points, a)
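The envelope is typically computed as the absolute value of the analytic signal, whose real part is the original signal and whose imaginary part is its Hilbert transform (despite its name, signal.hilbert returns the full analytic signal):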
analytic_signal = signal.hilbert(myricker)
amplitude_envelope = np.abs(analytic_signal)
The analytic signal is a complex quantity.
In order to understand the behavior of envelope, try plotting its real and imaginary parts:
fig, ((ax1, ax2)) = plt.subplots(2, sharey=True)
fig.suptitle('')
ax1.plot(myricker, label='myricker')
ax1.plot(amplitude_envelope, label='envelope')
ax1.legend()
ax2.plot(np.real(analytic_signal), label='real')
ax2.plot(np.imag(analytic_signal), color='black', label='imaginary')
ax2.legend()
This is what you should get:
The real part of the analytic signal is identical to the original signal. The imaginary part is often referred to as the Hilbert transform itself.
Now if you shift your original ricker upwards by adding a small constant value (e.g. 0.2), the real part of the signal will be shifted accordingly, but the imaginary part will remain the same and therefore its contribution to the envelope will be smaller:
As you increase the shift, the contribution of the imaginary part to the envelope becomes smaller and smaller. Here for a shift of 1, it is so small that the envelope looks very close to the original ricker wavelet:
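The same reasoning covers the wholly negative case from the question. A constant offset c only changes the DC bin of the spectrum, so the imaginary part h of the analytic signal is unaffected and the envelope is sqrt((x + c)^2 + h^2). For a large negative c this approaches |x + c| = -(x + c), which is the flipped wavelet. Below is a minimal sketch to check this numerically. The ricker helper is written out with NumPy as an assumption of this example, since signal.ricker is not available in every SciPy release:

```python
import numpy as np
from scipy.signal import hilbert

def ricker(points, a):
    """Ricker (Mexican hat) wavelet, written out by hand for this sketch."""
    t = np.arange(points) - (points - 1) / 2.0
    amp = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return amp * (1.0 - (t / a) ** 2) * np.exp(-t ** 2 / (2.0 * a ** 2))

x = ricker(100, 4.0)
h = np.imag(hilbert(x))  # Hilbert transform of the unshifted wavelet

results = {}
for c in (0.0, 1.0, -1.0):
    analytic = hilbert(x + c)
    envelope = np.abs(analytic)
    results[c] = {
        # the imaginary part ignores the constant offset
        "imag_unchanged": np.allclose(np.imag(analytic), h),
        # so the envelope is sqrt((x + c)^2 + h^2)
        "envelope_formula": np.allclose(envelope, np.sqrt((x + c) ** 2 + h ** 2)),
        # how far the envelope is from the rectified, shifted wavelet |x + c|
        "max_gap_to_abs": float(np.max(np.abs(envelope - np.abs(x + c)))),
    }
    print(c, results[c])
```

Removing the mean before computing the envelope, as suggested in the question, takes the offset c out of this formula.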

Related

DM Script, why does the Fourier transform of the Gaussian kernel need modulus

I have recently been learning DM Script for TEM image processing.
I needed a Gaussian blur and found a script named 'Gaussian Blur' at http://www.dmscripting.com/recent_updates.html
This code implements the Gaussian blur by multiplying the fast Fourier transform (FFT) of the source image by the FFT of a Gaussian-kernel image and then taking the inverse Fourier transform of the product.
Here is the part of the code,
// Carry out the convolution in Fourier space
compleximage fftkernelimg:=realFFT(kernelimg) // FFT of the Gaussian-kernel image
compleximage FFTSource:=realfft(warpimg) // FFT of the source image
compleximage FFTProduct:=FFTSource*fftkernelimg.modulus().sqrt()
realimage invFFT:=realIFFT(FFTProduct)
My question is about this line:
compleximage FFTProduct:=FFTSource*fftkernelimg.modulus().sqrt()
Why does the FFT of the Gaussian kernel need '.modulus().sqrt()' for the convolution?
Is it related to the fact that the Fourier transform of a Gaussian function is another Gaussian function?
Or is it related to some limitation of the discrete Fourier transform?
Please answer me
Thanks
This is related to the general precision limitation of any floating point numeric computing. (see e.g. here, or more in depth here)
A rotationally symmetric (real-valued) Gaussian of std. dev. sigma should be transformed into a 100% real-valued rotationally symmetric Gaussian of std. dev. 1/sigma. However, doing this numerically will show you deviations. Just try the following:
number sigma = 30
number A0 = 1
realimage first := RealImage( "First", 8, 256, 256 )
first = A0 * exp( - (iradius**2/(2*sigma*sigma) ))
first.showimage()
complexImage second := FFT(first)
second.Showimage()
image nonZeroImaginaryMask = ( 0 != second.Imaginary() )
nonZeroImaginaryMask.Showimage()
nonZeroImaginaryMask.SetLimits(0,1)
When you then multiply these complex images (before back-transforming) you are introducing even more errors. By using modulus, one ensures that the forward-transformed kernel is purely real and hence a better "damping" curve.
A better implementation of an FFT filtering code would actually create the FFT(Gaussian) directly with a std. dev. of 1/sigma (up to the scaling convention of the discrete transform), as this is the analytically correct result. Doing an FFT of the kernel only makes sense if the kernel (or its FFT) is not analytically known.
In general: When implementing any "maths" in program code, it can pay hugely to think it through with numerical computation limits in the back of your head. Reduce actual computation whenever possible (i.e. compute analytically and use the result instead of relying on brute-force numerical computation) and try to "reshape" equations when possible, e.g. avoid large sums over many small numbers, be careful about checks against exact numeric values, try to avoid expressions which are very sensitive to small numerical errors, etc.
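The suggestion to build the frequency-domain Gaussian directly can be illustrated outside DM Script. The following is a rough NumPy sketch, not a translation of the script above: the image size, the sigma value and the variable names are assumptions chosen so that the kernel is not truncated by the image border. It compares the modulus of the numerically transformed kernel with the analytic transfer function 2*pi*sigma^2 * exp(-2*pi^2*sigma^2*f^2), where f is the frequency in cycles per pixel:

```python
import numpy as np

n, sigma = 256, 8.0  # assumed image size and kernel width (in pixels)

# Gaussian kernel with its origin at the image centre
y, x = np.indices((n, n)) - n // 2
kernel = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))

# Numerical route: FFT of the sampled kernel
spectrum = np.fft.fft2(kernel)
max_imag = np.max(np.abs(spectrum.imag))   # round-off residue, ideally zero
n_nonzero_imag = np.count_nonzero(spectrum.imag)

# Analytic route: Gaussian written directly in frequency space
f = np.fft.fftfreq(n)                      # cycles per pixel
f2 = f[:, None] ** 2 + f[None, :] ** 2
analytic = 2 * np.pi * sigma ** 2 * np.exp(-2 * np.pi ** 2 * sigma ** 2 * f2)

# The modulus of the numerical spectrum should agree with the analytic curve
max_difference = np.max(np.abs(np.abs(spectrum) - analytic))
print(max_imag, n_nonzero_imag, max_difference)
```

Because the kernel is centred in the image rather than at pixel (0, 0), its raw FFT also carries alternating signs; taking the modulus removes both those and the round-off residue in the imaginary part.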

How can a Fourier transform ever be used for extrapolation?

Please excuse a clueless newbie question.
Since a discrete Fourier Transform on a fixed interval is treated as repeating indefinitely, how can it ever be used to extrapolate a time series? What follows the end of the interval will always be identical to the beginning.
Even a simple least square fit would at least give a trend.
How can all that cycle information in a FT be useless for extrapolation?
How? By changing your initial assumption.
One does not need to assume that the input to a DFT repeats indefinitely, i.e. that it is exactly periodic in the aperture width. Assuming that the input is a rectangular window upon a longer stationary sequence, which may or may not be periodic within the DFT aperture, is also a valid assumption, and is commonly used to interpolate/estimate "between bin" spectra.
(e.g. if the DFT result looks exactly like offset samples of Sinc function corresponding to the window width, one could assume that this is the result of a rectangular window upon a single low degree of freedom oscillator, or extreme luck or an alien intelligence that just happens to order all N bins in just such an interesting pattern. Occam's razor may or may not suggest that the former is a better assumption depending on your model of the input.)
Extending interpolated "between bin" or estimated non-periodic-in-aperture spectra (e.g. after deconvolving the assumed Sinc distortion caused by the rectangular window) beyond the end of the DFT aperture/window may allow extrapolating data not identical with the beginning of the DFT aperture/window.
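As a rough illustration of that idea, the sketch below models the data as a single sinusoid that is not periodic in the aperture, estimates its "between bin" frequency from a zero-padded FFT, fits amplitude and phase by least squares, and extends the model past the window. The test signal, the padding factor and the Hann window (used here only to reduce leakage bias in the frequency estimate) are assumptions of this example, not something the answer prescribes:

```python
import numpy as np

n = 64
t = np.arange(n)
f_true = 5.3 / n                      # 5.3 cycles per window: not periodic in the aperture
x = np.cos(2 * np.pi * f_true * t + 0.4)

# Zero-padding samples the spectrum between the original DFT bins
pad = 64
spectrum = np.fft.rfft(x * np.hanning(n), n * pad)
f_est = np.argmax(np.abs(spectrum)) / (n * pad)   # cycles per sample

# Least-squares fit of a cosine/sine pair at the estimated frequency
basis = np.column_stack([np.cos(2 * np.pi * f_est * t),
                         np.sin(2 * np.pi * f_est * t)])
coef, *_ = np.linalg.lstsq(basis, x, rcond=None)

# Extrapolate one window length beyond the data
t_future = np.arange(n, 2 * n)
model = (coef[0] * np.cos(2 * np.pi * f_est * t_future)
         + coef[1] * np.sin(2 * np.pi * f_est * t_future))
truth = np.cos(2 * np.pi * f_true * t_future + 0.4)

err_model = np.max(np.abs(model - truth))
err_periodic = np.max(np.abs(x - truth))   # plain periodic repetition of the window
print(f_true, f_est, err_model, err_periodic)
```

The periodic repetition jumps in phase at the window boundary, while the fitted model continues the oscillation; how well this works on real data depends on how well the single-oscillator assumption holds.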

why DCT transform is preferred over other transforms in video/image compression

I went through how DCT (discrete cosine transform) is used in image and video compression standards.
But why is the DCT preferred over other transforms such as the DFT or DST?
Because cos(0) is 1, the first (0th) coefficient of DCT-II is proportional to the mean of the values being transformed. This makes the first coefficient of each 8x8 block represent the average tone of its constituent pixels, which is obviously a good start. Subsequent coefficients add increasing levels of detail, starting with sweeping gradients and continuing into increasingly fiddly patterns, and it just so happens that the first few coefficients capture most of the signal in photographic images.
Sin(0) is 0, so the DSTs start with an offset of 0.5 or 1, and the first coefficient is a gentle mound rather than a flat plain. That is unlikely to suit ordinary images, and the result is that DSTs require more coefficients than DCTs to encode most blocks.
The DCT just happens to suit. That is really all there is to it.
When performing image compression, our best bet is to perform the KLT or the Karhunen–Loève transform as it results in the least possible mean square error between the original and the compressed image. However, KLT is dependent on the input image, which makes the compression process impractical.
For the highly correlated signals typical of images, the DCT is a very close approximation to the KL transform. Mostly we are interested in low-frequency signals, so only the even component is necessary; hence it is computationally feasible to compute only the DCT.
Also, the use of cosines rather than sine functions is critical for compression as fewer cosine functions are needed to approximate a typical signal (See Douglas Bagnall's answer for further explanation).
Another advantage of using cosines is the lack of discontinuities. In DFT, since the signal is represented periodically, when truncating representation coefficients, the signal will tend to "lose its form". In DCT, however, due to the continuous periodic structure, the signal can withstand relatively more coefficient truncation but still keep the desired shape.
The DCT of an image macroblock where the top and bottom and/or the left and right edges don't match will have less energy in the higher frequency coefficients than a DFT. This allows greater opportunities for these high coefficients to be removed, more coarsely quantized or compressed, without creating more visible macroblock boundary artifacts.
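A small numerical check of this energy-compaction argument, as a sketch only: the 8-sample ramp below is a made-up stand-in for one row of a block whose two edges do not match, and energy_in_top_k is a helper invented for this example.

```python
import numpy as np
from scipy.fft import dct

def energy_in_top_k(coeffs, k):
    """Fraction of the total energy held by the k largest coefficients."""
    energy = np.abs(coeffs) ** 2
    return np.sort(energy)[::-1][:k].sum() / energy.sum()

block = np.arange(8, dtype=float)              # ramp: left and right edges differ

dct_coeffs = dct(block, type=2, norm="ortho")  # orthonormal DCT-II
dft_coeffs = np.fft.fft(block, norm="ortho")   # orthonormal DFT

k = 2
dct_fraction = energy_in_top_k(dct_coeffs, k)
dft_fraction = energy_in_top_k(dft_coeffs, k)
print(dct_fraction, dft_fraction)
```

Both transforms are orthonormal here, so the two fractions are directly comparable; for the ramp, the DCT packs noticeably more of the energy into its two largest coefficients.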
DCT is preferred over DFT (Discrete Fourier Transformation) and KLT (Karhunen-Loeve Transformation)
1. Fast algorithm
2. Good energy compaction
3. Only real coefficients

How to detect local maxima and curve windows correctly in semi complex scenarios?

I have a series of data and need to detect peak values in the series within a certain number of readings (window size) and excluding a certain level of background "noise." I also need to capture the starting and stopping points of the appreciable curves (ie, when it starts ticking up and then when it stops ticking down).
The data are high precision floats.
Here's a quick sketch that captures the most common scenarios that I'm up against visually:
One method I attempted was to pass a window of size X along the curve going backwards to detect the peaks. It started off working well, but I missed a lot of conditions initially not anticipated. Another method I started to work out was a growing window that would discover the longer duration curves. Yet another approach used a more calculus based approach that watches for some velocity / gradient aspects. None seemed to hit the sweet spot, probably due to my lack of experience in statistical analysis.
Perhaps I need to use some kind of a statistical analysis package to cover my bases vs writing my own algorithm? Or would there be an efficient method for tackling this directly with SQL with some kind of local max techniques? I'm simply not sure how to approach this efficiently. Each method I try it seems that I keep missing various thresholds, detecting too many peak values or not capturing entire events (reporting a peak datapoint too early in the reading process).
Ultimately this is implemented in Ruby and so if you could advise as to the most efficient and correct way to approach this problem with Ruby that would be appreciated, however I'm open to a language agnostic algorithmic approach as well. Or is there a certain library that would address the various issues I'm up against in this scenario of detecting the maximum peaks?
My idea is simple: after you get your window of interest, find all the peaks in that window by comparing each value with the previous and the next one. After this you know where the peaks occur and can decide which is the best peak.
I wrote a simple MATLAB script to show my idea.
My example uses a waveform from an audio file :-)
waveFile='Chick_eco.wav';
[y, fs, nbits]=wavread(waveFile);
subplot(2,2,1); plot(y); legend('Original signal');
startIndex=15000;
WindowSize=100;
endIndex=startIndex+WindowSize-1;
frame = y(startIndex:endIndex);
nframe=length(frame)
%find the peaks
peaks = zeros(nframe,1);
k=3;
while(k <= nframe - 1)
    y1 = frame(k - 1);
    y2 = frame(k);
    y3 = frame(k + 1);
    if (y2 > 0)
        if (y2 > y1 && y2 >= y3)
            peaks(k)=frame(k);
        end
    end
    k=k+1;
end
peaks2=peaks;
peaks2(peaks2<=0)=nan;
subplot(2,2,2); plot(frame); legend('Get Window Length = 100');
subplot(2,2,3); plot(peaks); legend('Where are the PEAKS');
subplot(2,2,4); plot(frame); legend('Peaks in the Window');
hold on; plot(peaks2, '*');
for j = 1 : nframe
    if (peaks(j) > 0)
        fprintf('Local=%i\n', j);
        fprintf('Value=%i\n', peaks(j));
    end
end
%Where the Local Maxima occur
[maxivalue, maxi]=max(peaks)
You can see all the peaks and where they occur:
Local=37
Value=3.266296e-001
Local=51
Value=4.333496e-002
Local=65
Value=5.049438e-001
Local=80
Value=4.286804e-001
Local=84
Value=3.110046e-001
I'll propose a couple of different ideas. One is to use discrete wavelets, the other is to use the geographer's concept of prominence.
Wavelets: Apply some sort of wavelet decomposition to your data. There are multiple choices, with Daubechies wavelets being the most widely used. You want the low frequency peaks. Zero out the high frequency wavelet elements, reconstruct your data, and look for local extrema.
Prominence: Those noisy peaks and valleys are of key interest to geographers. They want to know exactly which of a mountain's multiple little peaks is tallest, and the exact location of the lowest point in the valley. Find the local minima and maxima in your data set. You should have a sequence of min/max/min/max/.../min. (You might want to add arbitrary end points that are lower than your global minimum.) Consider a min/max/min sequence. Classify each of these triples by the difference between the max and the larger of the two minima. Make a reduced sequence that replaces the smallest of these triples with the smaller of the two minima. Iterate until you get down to a single min/max/min triple. In your example, you want the next layer down, the min/max/min/max/min sequence.
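If Python is an option, SciPy's scipy.signal.find_peaks accepts a prominence argument that applies the same topographic idea without hand-coding the iteration. The sketch below is only an illustration: the synthetic signal (one broad bump plus a small ripple standing in for noise) and the 0.5 threshold are assumptions, and the threshold would need tuning to real data.

```python
import numpy as np
from scipy.signal import find_peaks

t = np.arange(100)
# One broad event plus a small periodic ripple standing in for background noise
y = np.exp(-(t - 50) ** 2 / 200.0) + 0.05 * np.sin(2 * np.pi * t / 5.0)

# Every local maximum, ripple included
all_peaks, _ = find_peaks(y)

# Only maxima that rise at least 0.5 above their surrounding valleys
main_peaks, properties = find_peaks(y, prominence=0.5)

print(len(all_peaks), main_peaks, properties["prominences"])
# left_bases / right_bases give the valley positions used for each prominence
print(properties["left_bases"], properties["right_bases"])
```

The left_bases and right_bases entries are only the valleys used in the prominence calculation, so they are a starting point for the start and stop of an event rather than a finished answer.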
Note: I'm going to describe the algorithmic steps as if each pass were distinct. Obviously, in a specific implementation, you can combine steps where it makes sense for your application. For the purposes of my explanation, it makes the text a little more clear.
I'm going to make some assumptions about your problem:
The windows of interest (the signals that you are looking for) cover a fraction of the entire data space (i.e., it's not one long signal).
The windows have significant scope (i.e., they aren't one pixel wide on your picture).
The windows have a minimum peak of interest (i.e., even if the signal exceeds the background noise, the peak must have an additional signal excess of the background).
The windows will never overlap (i.e., each can be examined as a distinct sub-problem out of context of the rest of the signal).
Given those, you can first look through your data stream for a set of windows of interest. You can do this by making a first pass through the data: moving from left to right, look for noise threshold crossing points. If the signal was below the noise floor and exceeds it on the next sample, that's a candidate starting point for a window (vice versa for the candidate end point).
Now make a pass through your candidate windows: compare the scope and contents of each window with the values defined above. To use your picture as an example, the small peaks on the left of the image barely exceed the noise floor and do so for too short a time. However, the window in the center of the screen clearly has a wide time extent and a significant max value. Keep the windows that meet your minimum criteria, discard those that are trivial.
Now to examine your remaining windows in detail (remember, they can be treated individually). The peak is easy to find: pass through the window and keep the local max. With respect to the leading and trailing edges of the signal, you can see in the picture that you have a window that's slightly larger than the actual point at which the signal exceeds the noise floor. In this case, you can use a finite difference approximation to calculate the first derivative of the signal. You know that the leading edge will be somewhat to the left of the window on the chart: look for a point at which the first derivative exceeds a positive noise floor of its own (the slope turns upwards sharply). Do the same for the trailing edge (which will always be to the right of the window).
Result: a set of time windows, the leading and trailing edges of the signals and the peak that occurred in that window.
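The first two passes described above (threshold crossings, then filtering by width and peak height) can be sketched as follows. This is written in Python rather than Ruby for brevity; the function name, parameter names and sample data are all made up for illustration, and the derivative-based refinement of the leading and trailing edges is left out.

```python
import numpy as np

def find_windows(y, noise_floor, min_width, min_peak):
    """Return (start, end, peak_index) for each region above the noise floor
    that is at least min_width samples wide and reaches at least min_peak."""
    y = np.asarray(y, dtype=float)
    above = y > noise_floor
    # Pad with False so regions touching either end are still closed
    padded = np.concatenate(([False], above, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, stops = edges[::2], edges[1::2]   # stops are exclusive
    windows = []
    for s, e in zip(starts, stops):
        peak = int(s) + int(np.argmax(y[s:e]))
        if e - s >= min_width and y[peak] >= min_peak:
            windows.append((int(s), int(e) - 1, peak))
    return windows

# Made-up data: a short blip that should be rejected and one real event
data = np.zeros(50)
data[5:7] = 0.3
data[20:30] = [0.2, 0.4, 0.6, 0.8, 0.9, 1.0, 0.8, 0.6, 0.4, 0.2]

windows = find_windows(data, noise_floor=0.1, min_width=5, min_peak=0.5)
print(windows)
```

Each tuple holds inclusive start and end indices plus the index of the maximum inside the window.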
It looks like the definition of a window is the range of x over which y is above the threshold. So use that to determine the size of the window. Within that, locate the largest value, thus finding the peak.
If that fails, then what additional criteria do you have for defining a region of interest? You may need to nail down your implicit assumptions to more than 'that looks like a peak to me'.

How to represent stereo audio data for FFT

How should stereo (2 channel) audio data be represented for FFT? Do you
A. Take the average of the two channels and assign it to the real component of a number and leave the imaginary component 0.
B. Assign one channel to the real component and the other channel to the imag component.
Is there a reason to do one or the other? I searched the web but could not find any definite answers on this.
I'm doing some simple spectrum analysis and, not knowing any better, used option A). This gave me an unexpected result, whereas option B) went as expected. Here are some more details:
I have a WAV file of a piano "middle-C". Middle C is about 261.6 Hz (roughly 260Hz), so I would expect the peak frequency to be near 260Hz, with smaller peaks at the harmonics. I confirmed this by viewing the spectrum in audio editing software (Sound Forge). But when I took the FFT myself, with option A), the peak was at 520Hz. With option B), the peak was at 260Hz.
Am I missing something? The explanation that I came up with so far is that representing stereo data using a real and imag component implies that the two channels are independent, which, I suppose they're not, and hence the mess-up.
I don't think you're taking the average correctly. :-)
C. Process each channel separately, assigning the amplitude to the real component and leaving the imaginary component as 0.
Option B does not make sense. Option A, which amounts to convert the signal to mono, is OK (if you are interested in a global spectrum).
Your problem (double freq) is surely related to some misunderstanding in the use of your FFT routines.
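Options A and C can be sketched with NumPy as below. The sample rate, the synthetic 260 Hz stereo tone and the peak_frequency helper are assumptions made for this example; a real WAV file would be loaded into an array of shape (frames, 2) instead.

```python
import numpy as np

fs = 8000                                   # assumed sample rate in Hz
t = np.arange(fs) / fs                      # one second of audio
tone = 260.0                                # test tone in Hz

# Stereo array of shape (frames, 2): one column per channel
stereo = np.column_stack([
    np.sin(2 * np.pi * tone * t),              # left channel
    0.5 * np.sin(2 * np.pi * tone * t + 0.3),  # right channel
])

def peak_frequency(samples, fs):
    """Frequency (Hz) of the largest bin in the magnitude spectrum."""
    magnitude = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / fs)
    return freqs[np.argmax(magnitude)]

mono_peak = peak_frequency(stereo.mean(axis=1), fs)   # option A: average to mono
left_peak = peak_frequency(stereo[:, 0], fs)          # option C: each channel alone
right_peak = peak_frequency(stereo[:, 1], fs)
print(mono_peak, left_peak, right_peak)
```

Note that the frequency axis is built from the number of samples actually passed to the FFT and the true sample rate; a mismatch there is one thing worth checking when a peak shows up at the wrong frequency.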
Once you take the FFT you need to get the magnitude of the complex frequency spectrum. To get the magnitude you take the absolute value of the complex spectrum, |X(w)|. If you want to look at the power spectrum you square the magnitude spectrum, |X(w)|^2.
As for your frequency shift, I think it has to do with you setting the imaginary parts to zero.
Imagine the complex frequency spectrum as a series of complex vectors, or position vectors in a Cartesian space. If you take one discrete frequency bin X(w), it has one real component along the real axis (x-direction) and one imaginary component along the imaginary axis (y-direction). There are four important values for this discrete frequency: 1. the real value, 2. the imaginary value, 3. the magnitude and 4. the phase. If you just take the real value and set the imaginary part to 0, you are setting magnitude = |real| and phase = 0deg or 180deg. You have thereby modified the resulting spectrum and applied a bias to every frequency bin. Take a look at the wiki on the magnitude of a vector, also called the Euclidean norm of a vector, to brush up on your understanding. Leonbloy was correct, but I hope this was more informative.
Think of the FFT as a way to get information from a single signal. What you are asking is what is the best way to display data from two signals. My answer would be to treat each independently, and display an FFT for each.
If you want a really fast streaming FFT you can read about an algorithm I wrote here: www.depthcharged.us/?p=176