How to interpret cuFFT R2C result - fft

I'm in the process of accelerating some data analysis code with GPU and am currently doing some profiling and comparisons between the numpy.fft library and cuFFT (using the skcuda.fft wrapper).
I'm certain I'm just missing something obvious about the FFT implementation in cuFFT, but I'm struggling to find what it is in the cuFFT documentation.
To setup the problem I create 500 ms of data sampled at 100 MS/s with a few spectral components. Then, I declare the GPU arrays, the cufft plan (R2C) and run the fft with a subset of the data. Finally, I have a comparison with numpy.fft.rfft:
time = np.linspace(0, 500E-3, int(50E6))
freq = 100E6
data = np.sin(2*np.pi*time*1E6)
data += np.sin(2*np.pi*time*2E6 + 0.5)
data += np.sin(2*np.pi*time*3E6 - 0.1)
data += np.sin(2*np.pi*time*4E6 - 0.9)
data += np.sin(2*np.pi*time*1E6 - 1.9)
data += np.sin(2*np.pi*time*15E6 - 2.1)
data += np.sin(2*np.pi*time*20E6 - 0.3)
data += np.sin(2*np.pi*time*25E6 - 0.3)
nPtsFFT = int(2**13)
dDev = gp.GPUArray(nPtsFFT, np.float64)
dDev.set(data[:nPtsFFT])
rDev = gp.GPUArray(int(nPtsFFT/2+1), np.float64)
plan = cufft.Plan(nPtsFFT, np.float64, np.complex128)
cufft.fft(dDev, rDev, plan)
rHost = rDev.get()
freqs = np.fft.rfftfreq(nPtsFFT, 1/freq)
hfftRes = np.fft.rfft(data[:nPtsFFT])
plt.loglog(freqs, np.abs(hfftRes), label='npfft')
plt.loglog(freqs, np.abs(rHost), label='cufft')
plt.legend()
plt.show()
I naively assumed that these would be approximately equal, but what I find is that the cufft peaks are all shifted and every other point is lower than expected.
This reminded me of the output of scipy.fftpack.rfft, so checking the documentation there I found the interleaving of Re and Im parts. So, if I modify the plotting to:
plt.loglog(freqs, np.abs(hfftRes), label='npfft')
plt.loglog(freqs[:-1:2]/2, np.abs(rHost[:-1:2] + 1j*rHost[1::2]), label='cufft')
plt.legend()
plt.show()
I now get what I expect but only up to 25 MHz, when I should be able to get data up to 50 MHz given the sampling rate.
Is there a way to extract the data up to the Nyquist frequency from this transform?

Since the R2C interface produces complex outputs you have to provide an array of np.complex128 types to get the entire int(nPtsFFT/2+1) complex values, not just int(nPtsFFT/2+1) floating point values (which would corresponds to only half the quantity of data).
This can be done by changing the rDev definition as follows (and keeping everything else the same):
rDev = gp.GPUArray(int(nPtsFFT/2+1), np.complex128)
plan = cufft.Plan(nPtsFFT, np.float64, np.complex128)
cufft.fft(dDev, rDev, plan)
rHost = rDev.get()
freqs = np.fft.rfftfreq(nPtsFFT, 1/freq)
hfftRes = np.fft.rfft(data[:nPtsFFT])
plt.loglog(freqs, np.abs(hfftRes), label='npfft')
plt.loglog(freqs, np.abs(rHost), label='cufft')
plt.legend()
plt.show()
The result should then go all the way up to 50MHz Nyquist frequency as expected, with the spikes lining up nicely with the reference np.fft.rfft implementation.

I haven't yet found an explanation of the cufft output, but I can get the behaviour I want with cupyx.scipy.fft.rfft, which might be useful if anyone else finds the same problem.
deviceData = cp.array(data, dtype = np.float64)
fftPlan = cpfftPack.get_fft_plan(deviceData[:nPtsFFT], (nPtsFFT,),
value_type = 'R2C')
dfft = cpfft.rfft(deviceData[:nPtsFFT], plan = fftPlan)
h_dfft = dfft.get()
hfft = npfft.rfft(data[:nPtsFFT])
plt.loglog(freqs, np.abs(hfft), label='npfft')
plt.loglog(freqs, np.abs(h_dfft), label='cupyx fft')
plt.legend()
plt.show()

Related

How to define a custom loss function for a multi-dimensional target

I'm working with Tensorflow 2.0 and I'm using a normal sequential layer.
I'm trying to define a custom loss functions which does the following:
takes some elements of the input
computes their sum and invert the result
multiplies the result with a part of y_pred
constrain the result to be as close to 1 as possibile
Thus the loss function would be L() = MSE() + (described above)
My code follows:
def custom_loss_wrapper(input_train):
#tf.function
def summing(row):
return tf.math.reduce_sum(row, 1,keepdims=True)
#tf.function
def custom_loss(y_true, y_pred):
row_M = input_train
row_M = row_M[:, 2:5]
sum_M = summing(row_M)
inv_M = (1/sum_M)
row_B = y_pred[:, :3]
sum_B = summing(row_B)
row_Q = tf.math.multiply(inv_M,row_B)
alpha = 0.01
penalty = K.mean(K.square(sum_Q - 1))
return K.mean(K.square(y_true - y_pred)) + (1/alpha) * penalty
return custom_loss
I would like to understand if what I'm doing is right. There are not errors and the training runs, but I do not know if this piece of code does what I'm trying to define. Mostly if this works correctly considering batches of data and not single records

Initialisation of weights for deeplearning model

I am going through a book on deep learning which initializes weights between two layers of neurons as:
w = np.random.randn(layers[i] + 1, layers[i + 1] + 1)
self.W.append(w / np.sqrt(layers[i]))
As per the book, divison by np.sqrt(layers[i]) in second line of code is done for following reason:
scale w by dividing by the square root of the number of nodes in the current layer, thereby
normalizing the variance of each neuron’s output
What does it exactly mean? And how would it impact if we don't do it?
Weights initialization is very important to tackle the vanishing/Explosion Gradients. In order for the output/gradients(reverse direction) to flow properly, the variance of the outputs of each layer to be equal to the variance of its input. Likewise of gradients in the reverse direction. the input and output flow of a layer is called fan-in and fan-out of the layer.
To better explain what I mean above, let me give you an example. Assume that we have a hundred consecutive layers and we apply a feed forward calculation with linear activation (After all it is just matrix multiplication), the data is 500 samples of 100 features:
neurons, features = 100, 100
n_layers = 100
X = np.random.normal(size=(500, features)) # your input
mean, var = 0, 0
for layer in range(n_layers):
W = np.random.normal(size=(features, neurons))
X = np.dot(X, W)
mean = mean + X.mean()
var = var + X.var()
mean/n_layers, np.sqrt(var/n_layers)
# output:
(-4.055498760574568e+95, 8.424477240271639e+98)
You will see that it will have a huge mean and standard deviations. Lets break this problem down; a property of a matrix multiplication of which the result will have a standard deviation very close to the square root of the number of fan in (input) connections. This property can be verified with this snippet of code:
fan_in = 1000 # change it to any number
X = np.random.normal(size=(100, fan_in))
W = np.random.normal(size=(fan_in, 1))
np.dot(X, W).std()
# result:
32.764359213560454
This happens because we sum fan_in (1000 in the above case) products of the element-wise multiplication of one element of inputs X by one column of W. Therefore, if we scaled every weights by the 1/sqrt(fan_in) to maintain the distribution of the flow as seen in the following snippet:
neurons, features = 100, 100
n_layers = 100
X = np.random.normal(size=(500, features)) # your input
mean, var = 0, 0
for layer in range(n_layers):
W = np.random.normal(size=(features, neurons), scale=np.sqrt(1 / neurons)) # scaled the weights with the fan-in
X = np.dot(X, W)
mean = mean + X.mean()
var = var + X.var()
mean/n_layers, np.sqrt(var/n_layers)
# output:
(0.0002608301398189543, 1.021452570914829)
You can read more about kernel initialization in the following blog

Implementing WNGrad in Pytorch?

I'm trying to implement the WNGrad (technically WN-Adam, algorithm 4 in the paper) optimizier (WNGrad) in pytorch. I've never implemented an optimizer in pytorch before so I don't know if I've done it correctly (I started from the adam implementation). The optimizer does not make much progress and falls down like I would expect (bj values can only monotonically increase, which happens quickly so no progress is made) but I'm guessing I have a bug. Standard optimizers (Adam, SGD) work fine on the same model I'm trying to optimize.
Does this implementation look correct?
from torch.optim import Optimizer
class WNAdam(Optimizer):
"""Implements WNAdam algorithm.
It has been proposed in `WNGrad: Learn the Learning Rate in Gradient Descent`_.
Arguments:
params (iterable): iterable of parameters to optimize or dicts defining
parameter groups
lr (float, optional): learning rate (default: 0.1)
beta1 (float, optional): exponential smoothing coefficient for gradient.
When beta=0 this implements WNGrad.
.. _WNGrad\: Learn the Learning Rate in Gradient Descent:
https://arxiv.org/abs/1803.02865
"""
def __init__(self, params, lr=0.1, beta1=0.9):
if not 0.0 <= beta1 < 1.0:
raise ValueError("Invalid beta1 parameter: {}".format(beta1))
defaults = dict(lr=lr, beta1=beta1)
super().__init__(params, defaults)
def step(self, closure=None):
"""Performs a single optimization step.
Arguments:
closure (callable, optional): A closure that reevaluates the model
and returns the loss.
"""
loss = None
if closure is not None:
loss = closure()
for group in self.param_groups:
for p in group['params']:
if p.grad is None:
continue
grad = p.grad.data
state = self.state[p]
# State initialization
if len(state) == 0:
state['step'] = 0
# Exponential moving average of gradient values
state['exp_avg'] = torch.zeros_like(p.data)
# Learning rate adjustment
state['bj'] = 1.0
exp_avg = state['exp_avg']
beta1 = group['beta1']
state['step'] += 1
state['bj'] += (group['lr']**2)/(state['bj'])*grad.pow(2).sum()
# update exponential moving average
exp_avg.mul_(beta1).add_(1 - beta1, grad)
bias_correction = 1 - beta1 ** state['step']
p.data.sub_(group['lr'] / state['bj'] / bias_correction, exp_avg)
return loss
The paper's author has an open sourced implementation on GitHub.
The WNGrad paper
states it's inspired by batch (and weight) normalization. You should use L2 norm with respect to the weight dimensions (don't sum it all) as show in this algorithm

How to compute Fourier coefficients with MATLAB

I'm trying to compute the Fourier coefficients for a waveform using MATLAB. The coefficients can be computed using the following formulas:
T is chosen to be 1 which gives omega = 2pi.
However I'm having issues performing the integrals. The functions are are triangle wave (Which can be generated using sawtooth(t,0.5) if I'm not mistaking) as well as a square wave.
I've tried with the following code (For the triangle wave):
function [ a0,am,bm ] = test( numTerms )
b_m = zeros(1,numTerms);
w=2*pi;
for i = 1:numTerms
f1 = #(t) sawtooth(t,0.5).*cos(i*w*t);
f2 = #(t) sawtooth(t,0.5).*sin(i*w*t);
am(i) = 2*quad(f1,0,1);
bm(i) = 2*quad(f2,0,1);
end
end
However it's not getting anywhere near the values I need. The b_m coefficients are given for a
triangle wave and are supposed to be 1/m^2 and -1/m^2 when m is odd alternating beginning with the positive term.
The major issue for me is that I don't quite understand how integrals work in MATLAB and I'm not sure whether or not the approach I've chosen works.
Edit:
To clairify, this is the form that I'm looking to write the function on when the coefficients have been determined:
Here's an attempt using fft:
function [ a0,am,bm ] = test( numTerms )
T=2*pi;
w=1;
t = [0:0.1:2];
f = fft(sawtooth(t,0.5));
am = real(f);
bm = imag(f);
func = num2str(f(1));
for i = 1:numTerms
func = strcat(func,'+',num2str(am(i)),'*cos(',num2str(i*w),'*t)','+',num2str(bm(i)),'*sin(',num2str(i*w),'*t)');
end
y = inline(func);
plot(t,y(t));
end
Looks to me that your problem is what sawtooth returns the mathworks documentation says that:
sawtooth(t,width) generates a modified triangle wave where width, a scalar parameter between 0 and 1, determines the point between 0 and 2π at which the maximum occurs. The function increases from -1 to 1 on the interval 0 to 2πwidth, then decreases linearly from 1 to -1 on the interval 2πwidth to 2π. Thus a parameter of 0.5 specifies a standard triangle wave, symmetric about time instant π with peak-to-peak amplitude of 1. sawtooth(t,1) is equivalent to sawtooth(t).
So I'm guessing that's part of your problem.
After you responded I looked into it some more. Looks to me like it's the quad function; not very accurate! I recast the problem like this:
function [ a0,am,bm ] = sotest( t, numTerms )
bm = zeros(1,numTerms);
am = zeros(1,numTerms);
% 2L = 1
L = 0.5;
for ii = 1:numTerms
am(ii) = (1/L)*quadl(#(x) aCos(x,ii,L),0,2*L);
bm(ii) = (1/L)*quadl(#(x) aSin(x,ii,L),0,2*L);
end
ii = 0;
a0 = (1/L)*trapz( t, t.*cos((ii*pi*t)/L) );
% now let's test it
y = ones(size(t))*(a0/2);
for ii=1:numTerms
y = y + am(ii)*cos(ii*2*pi*t);
y = y + bm(ii)*sin(ii*2*pi*t);
end
figure; plot( t, y);
end
function a = aCos(t,n,L)
a = t.*cos((n*pi*t)/L);
end
function b = aSin(t,n,L)
b = t.*sin((n*pi*t)/L);
end
And then I called it like:
[ a0,am,bm ] = sotest( t, 100 );
and I got:
Sweetness!!!
All I really changed was from quad to quadl. I figured that out by using trapz which worked great until the time vector I was using didn't have enough resolution, which led me to believe it was a numerical issue rather than something fundamental. Hope this helps!
To troubleshoot your code I would plot the functions you are using and investigate, how the quad function samples them. You might be undersampling them, so make sure your minimum step size is smaller than the period of the function by at least factor 10.
I would suggest using the FFTs that are built-in to Matlab. Not only is the FFT the most efficient method to compute a spectrum (it is n*log(n) dependent on the length n of the array, whereas the integral in n^2 dependent), it will also give you automatically the frequency points that are supported by your (equally spaced) time data. If you compute the integral yourself (might be needed if datapoints are not equally spaced), you might calculate frequency data that are not resolved (closer spacing than 1/over the spacing in time, i.e. beyond the 'Fourier limit').

Interpreting jTransform FFT results

Merged with Power Spectral Density from jTransforms DoubleFFT_1D.
I'm using Jtransforms java library to perform analysis on a give dataset.
An example of the data is as follows:
980,988,1160,1080,928,1068,1156,1152,1176,1264
I'm using the DoubleFFT_1D funtion in jTransforms. The data output is as follows:
10952, -152, 80.052, 379.936, -307.691, 12.734, -224.052, 427.607, -48.308, 81.472
I'm having trouble interpreting the output. I understand that the first element in the output array is the total of the 10 inputs (10952). It's the other elements of the output array that i don't understand. Ultimately, I want to plot the Power Spectral Density of the input data on a graph and find amounts between 0 and .5 Hz.
The documentation for the jTransform functions states:
(where a is the data set)
.....................
realForward
public void realForward(double[] a)Computes 1D forward DFT of real data leaving the result in a . The physical layout of the output data is as follows:
if n is even then
a[2*k] = Re[k], 0 <= k < n / 2
a[2*k+1] = Im[k], 0 < k < n / 2
a[1] = Re[n/2]
if n is odd then
a[2*k] = Re[k], 0 <= k < (n+1)/2
a[2*k+1] = Im[k], 0 < k< (n-1)/2
a[1] = Im[(n-1)/2]
This method computes only half of the elements of the real transform. The other half satisfies the symmetry condition. If you want the full real forward transform, use realForwardFull. To get back the original data, use realInverse on the output of this method.
Parameters:
a - data to transform
..................................
So what are the output numbers? What do the values mean?
Any help is appreciated.