I need to create a specific custom waveform for an oscillator for use with Web Audio API.
I have a Javascript function to output the desired waveform (calculating a y between -1 and 1 for any given x), and the plotted result looks like this:
However the Web Audio API documentation only lets you create custom wavetables based on harmonic tables via the createPeriodicWave function, which can then be used to configure a custom oscillator via setPeriodicWave. Is there a generic technique that can be used to compute the harmonic tables based on my waveform function?
A DFT (or FFT) with a length of exactly one period of your custom waveform will produce a harmonic table. Just low-pass filter and sample your waveform 2^N times, and feed that to a generic library FFT. (Choose a large enough 2^N to be at least more than 2X the low-pass filter's or your waveform's intrinsic highest frequency content). The magnitudes of the FFT's resulting complex bins will be your harmonic power levels.
Related
I am trying to implement nn.MultiheadAttention in my network. According to the docs,
embed_dim – total dimension of the model.
However, according to the source file,
embed_dim must be divisible by num_heads
and
self.q_proj_weight = Parameter(torch.Tensor(embed_dim, embed_dim))
If I understand properly, this means each head takes only a part of features of each query, as the matrix is quadratic. Is it a bug of realization or is my understanding wrong?
Each head uses a different part of the projected query vector. You can imagine it as if the query gets split into num_heads vectors that are independently used to compute the scaled dot-product attention. So, each head operates on a different linear combination of the features in queries (and keys and values, too). This linear projection is done using the self.q_proj_weight matrix and the projected queries are passed to F.multi_head_attention_forward function.
In F.multi_head_attention_forward, it is implemented by reshaping and transposing the query vector, so that the independent attentions for individual heads can be computed efficiently by matrix multiplication.
The attention head sizes are a design decision of PyTorch. In theory, you could have a different head size, so the projection matrix would have a shape of embedding_dim × num_heads * head_dims. Some implementations of transformers (such as C++-based Marian for machine translation, or Huggingface's Transformers) allow that.
I would like to know how one could generate a wavetable out of a wav file for example.
I know a wavetable can be used in web audio api with setPerdiodic wave and I know how to use it.
But what do I need to do to create my own wavetables? I read about inverse FFT, but I did find nearly nothing. I don't need any code just an idea or a formula of how to get the wavetable from an wav file to a Buffer.
There are a few constraints here and I'm not sure how good the result will be.
Your wav file source can't be too long; the PeriodicWave object
only supports arrays up to size 8192 or so.
I'm going to assume your waveform is intended to be periodic. If the
last sample and the first aren't reasonably close to each other,
there will be a hard-to-reproduce jump.
The waveform must have zero mean, so if it doesn't you should remove
the mean.
With that taken care of, select a power of two greater than the length
of your wave file (not strictly needed, but most FFTs expect powers of
two). Zero-pad the wave file if the length is not a power of two.
Then compute the the FFT. You'll either get an array of complex
numbers or two arrays. Separate these out to real and imaginary
arrays and use them for contructing the PeriodicWave.
All examples that I can find on the Internet just visualize the result array of the function computeSpectrum, but I am tasked with something else.
I generate a music note and I need by analyzing the result array to be able to say what note is playing. I figured out that I need to set the second parameter of the function call 'FFTMode' to true and then it returns sound frequencies. I thought that really it should return only one non-zero value which I could use to determine what note I generated using Math.sin function, but it is not the case.
Can somebody suggest a way how I can accomplish the task? Using the soundMixer.computeSpectrum is a requirement because I am going to analyze more complex sounds later.
FFT will transform your signal window into set of Nyquist sine waves so unless 440Hz is one of them you will obtain more than just one nonzero value! For a single sine wave you would obtain 2 frequencies due to aliasing. Here an example:
As you can see for exact Nyquist frequency the FFT response is single peak but for nearby frequencies there are more peaks.
Due to shape of the signal you can obtain continuous spectrum with peaks instead of discrete values.
Frequency of i-th sample is f(i)=i*samplerate/N where i={0,1,2,3,4,...(N/2)-1} is sample index (first one is DC offset so not frequency for 0) and N is the count of samples passed to FFT.
So in case you want to detect some harmonics (multiples of single fundamental frequency) then set the samplerate and N so samplerate/N is that fundamental frequency or divider of it. That way you would obtain just one peak for harmonics sinwaves. Easing up the computations.
I am using the accelerate framework FFT functions to produce a spectrogram of a sound sample. This part works great. However, I want to (effectively) manipulate the spectrum directly (ie manipulate the real numbers), and then call the inverse again, how would I go about doing that? It looks like the INVERSE call expects an array of IMAGINARY numbers, but how can I produce that from my manipulated real numbers? I have tried making the realp array my reals, and the imagp part zero, but that doesn't seem to work.
The reason I ask this is because I wish to run an FFT on a voice audio sample, and then run the FFT again and then lifter the low part of the cepstrum (thus hopefully separating the vocal tract components from the pitch) and then run an inverse FFT again to produce a spectrogram showing the vocal tract (formant) information more clearly (ie, without the pitch information). However, I seem to be running into problems on the inverse FFT, into which I am passing in my real values (cepstrum) in the realp array and the imagp is zero. I think I am doing something wrong here and the results are unexpected.
You need to process the complex forward FFT results, rather than the real magnitudes, or else the shape of the IFFT result spectrum will be distorted. Don't consider them imaginary numbers, consider them to be part of a 2D vector containing the required angular phase information.
If your cepstrum lifter/filter alters only the real magnitudes, then you can try using the amount of change of the real magnitudes as scaling factors to alter your forward complex FFT result before doing a complex IFFT.
How should stereo (2 channel) audio data be represented for FFT? Do you
A. Take the average of the two channels and assign it to the real component of a number and leave the imaginary component 0.
B. Assign one channel to the real component and the other channel to the imag component.
Is there a reason to do one or the other? I searched the web but could not find any definite answers on this.
I'm doing some simple spectrum analysis and, not knowing any better, used option A). This gave me an unexpected result, whereas option B) went as expected. Here are some more details:
I have a WAV file of a piano "middle-C". By definition, middle-C is 260Hz, so I would expect the peak frequency to be at 260Hz and smaller peaks at harmonics. I confirmed this by viewing the spectrum via an audio editing software (Sound Forge). But when I took the FFT myself, with option A), the peak was at 520Hz. With option B), the peak was at 260Hz.
Am I missing something? The explanation that I came up with so far is that representing stereo data using a real and imag component implies that the two channels are independent, which, I suppose they're not, and hence the mess-up.
I don't think you're taking the average correctly. :-)
C. Process each channel separately, assigning the amplitude to the real component and leaving the imaginary component as 0.
Option B does not make sense. Option A, which amounts to convert the signal to mono, is OK (if you are interested in a global spectrum).
Your problem (double freq) is surely related to some misunderstanding in the use of your FFT routines.
Once you take the FFT you need to get the Magnitude of the complex frequency spectrum. To get the magnitude you take the absolute of the complex spectrum |X(w)|. If you want to look at the power spectrum you square the magnitude spectrum, |X(w)|^2.
In terms of your frequency shift I think it has to do with you setting the imaginary parts to zero.
If you imagine the complex Frequency spectrum as a series of complex vectors or position vectors in a cartesian space. If you took one discrete frequency bin X(w), there would be one real component representing its direction in the real axis (x -direction), and one imaginary component in the in the imaginary axis (y - direction). There are four important values about this discrete frequency, 1. real value, 2. imaginary value, 3. Magnitude and, 4. phase. If you just take the real value and set imaginary to 0, you are setting Magnitude = real and phase = 0deg or 90deg. You have hence forth modified the resulting spectrum, and applied a bias to every frequency bin. Take a look at the wiki on Magnitude of a vector, also called the Euclidean norm of a vector to brush up on your understanding. Leonbloy was correct, but I hope this was more informative.
Think of the FFT as a way to get information from a single signal. What you are asking is what is the best way to display data from two signals. My answer would be to treat each independently, and display an FFT for each.
If you want a really fast streaming FFT you can read about an algorithm I wrote here: www.depthcharged.us/?p=176