Relationship between FFT sampling rate, bandwidth, and resolution

I'm trying to write code that omits one sideband of an FFT and shifts the other one to the center.
I know the sampling rate (2 GHz) and the number of samples (10000); the sidebands are located at (-55, -355) and (55, 355).
I want to know the frequency resolution of each spectral line.
This is the code I've written:
#include <stdlib.h>
#include <string.h>

void FFT(double *realPart, double *imagPart, int length); // provided elsewhere

void compfft(double *source, double *destination, int length)
{
    double *realPart = malloc(length * sizeof(double));
    double *ImgPart = malloc(length * sizeof(double));
    int index, i, j;
    for (index = 0; index < length; index++)
    {
        realPart[index] = source[index]; // copy data to a local array
    }
    memset(ImgPart, 0, length * sizeof(double)); // zero the whole buffer, not just sizeof(pointer) bytes
    FFT(realPart, ImgPart, length); // take the FFT
    // shifting the destination array
    for (i = 0; i < (length/4); i++) {
        destination[i] = realPart[i + 749];
    }
    // filling the destination array with source array values from 55 Hz to 355 Hz
    for (j = 99; j < (length/5); j++) {
        destination[j] = realPart[j + 750];
    }
    free(realPart);
    free(ImgPart);
}
But my supervisor told me it's wrong and that I need to read more about the basics.
I'm really confused, please help.

The bandwidth and resolution depend not only on the sample rate, but on the FFT window length and shape, as well as how you measure or define bandwidth and resolution, and potentially the signal-to-noise ratio.
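If by "resolution" you simply mean the FFT bin spacing, that part is easy: adjacent FFT outputs are fs/N apart. A minimal sketch with the numbers from the question (note that bin spacing is not the same as the ability to resolve two nearby tones, which also depends on the window, as noted above):

#include <stdio.h>

int main(void)
{
    double fs = 2e9;   /* sampling rate: 2 GHz */
    int    N  = 10000; /* number of samples */

    /* adjacent FFT bins are fs/N apart */
    double bin_spacing = fs / N; /* = 200 kHz */
    printf("bin spacing: %g Hz\n", bin_spacing);
    return 0;
}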

Related

step doubling Runge Kutta implementation stuck shrinking stepsize to machine precision

I need to integrate a system of ODEs using an adaptive RK4 method with stepsize control via the step-doubling technique.
The problem is that the program keeps shrinking the stepsize down to machine precision forever while not advancing time.
The idea is to step the solution once by a single step and also by two successive half steps, compute the difference of the two results, and store it in eps. So eps is a measure of the error. I then want to determine the next stepsize according to whether eps is greater than a specified accuracy eps0 (as described in the book "Numerical Recipes").
RK4Step(double t, double* Y, double *Yout, void (*RHSFunc)(double, double *, double *), double h) steps the solution vector Y by h and puts the result into Yout using the function RHSFunc.
#include <fstream>
#include <cmath>
#include <algorithm>
using namespace std;

#define NEQ 4 //problem dimension

//defined elsewhere, as described above
void RK4Step(double t, double *Y, double *Yout,
             void (*RHSFunc)(double, double *, double *), double h);
void RHS(double t, double *Y, double *dYdt);

int main(int argc, char* argv[])
{
    ofstream frames("./frames.dat");
    ofstream graphs("./graphs.dat");
    double Y[4] = {2.0, 2.0, 1.0, 0.0}; //initial conditions for solution vector
    double finaltime = 100; //end of integration
    double eps0 = 10e-5; //error to compare with eps
    double t = 0.0;
    double step = 0.01;
    while(t < finaltime)
    {
        double eps = 0.0;
        double Y1[4], Y2[4]; //Y1 will store half step solution
                             //Y2 will store double step solution
        double dt = step; //cache current stepsize
        for(;;)
        {
            //make a step starting from state stored in Y and
            //put solution into Y1. Then from Y1 make another half step
            //and store into Y1.
            RK4Step(t, Y, Y1, RHS, step); //two half steps
            RK4Step(t+step, Y1, Y1, RHS, step);
            RK4Step(t, Y, Y2, RHS, 2*step); //one long step
            //compute eps as maximum of differences between Y1 and Y2
            //(an alternative would be quadrature sums)
            for(int i=0; i<NEQ; i++)
                eps = max(eps, fabs((Y1[i]-Y2[i])/15.0));
            //if error is within tolerance we grow stepsize
            //and advance time
            if(eps < eps0)
            {
                //stepsize is accepted, grow stepsize,
                //save solution from Y1 into Y,
                //advance time by the previous (cached) stepsize
                Y[0] = Y1[0]; Y[1] = Y1[1];
                Y[2] = Y1[2]; Y[3] = Y1[3];
                step = 0.9*step*pow(eps0/eps, 0.20); //(0.9 is the safety factor)
                t += dt;
                break;
            }
            //if the error is too big we shrink stepsize
            step = 0.9*step*pow(eps0/eps, 0.25);
        }
    }
    frames.close();
    graphs.close();
    return 0;
}
You never reset eps in the inner loop. This could be the direct cause of your problem: while the actual error shrinks with ever-decreasing step sizes, the maximum stored in eps stays constant, and above eps0. This results in a constant shrinking factor in the step size update, with no chance to break out of the loop.
Another "wrong thing" is that the error estimate and tolerance are incompatible. The error tolerance eps0 is an error density, or unit-step error. To bring your error estimate eps into that format you need to divide eps by step. Put another way, currently you are forcing the actual step error to be close to 0.5*eps0, so that the global error is 0.5*eps0 times the number of steps taken, with the number of steps loosely proportional to eps0^(-0.2). In the version using the unit-step error, the local error is forced to be "dynamically" close to 0.5*eps0*step, so that the global error is about 0.5*eps0 times the length of the integration interval. I'd say the second variant is more in line with intuition about the expected behavior.
This is not a critical error, but it may lead to sub-optimal step sizes and an actual global error that deviates non-trivially from the desired error tolerance.
You also have a coding inconsistency: in the propagation of the state and the declaration of the state vectors you have hard-coded 4 components, while in the error computation you loop over a variable number NEQ of equations and components. Since you are using C++, you could use a state-vector class that handles all dimension-dependent loops internally. (If taken too far, frequent allocation of instances with a short life span could become an efficiency issue.)
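A minimal sketch of the inner loop with only these two fixes applied (same RK4Step interface and surrounding code as in the question; eps is now reset on every attempt and converted to an error density before comparison):

for(;;)
{
    double eps = 0.0; // reset the error estimate on every attempt
    RK4Step(t, Y, Y1, RHS, step);   // two half steps
    RK4Step(t+step, Y1, Y1, RHS, step);
    RK4Step(t, Y, Y2, RHS, 2*step); // one long step
    for(int i = 0; i < NEQ; i++)
        eps = max(eps, fabs((Y1[i]-Y2[i])/15.0));
    eps /= step; // compare an error density (error per unit step) against eps0
    if(eps < eps0)
    {
        for(int i = 0; i < NEQ; i++) Y[i] = Y1[i]; // accept, using NEQ consistently
        step = 0.9*step*pow(eps0/eps, 0.20); // grow for the next step
        t += dt; // dt cached before the loop, as in the question
        break;
    }
    step = 0.9*step*pow(eps0/eps, 0.25); // reject and shrink
}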

How to interpret FFT data for making a spectrum visualizer

I am trying to visualize a spectrum where the frequency range is divided into N bars, either linearly or logarithmic. The FFT seems to work fine, but I am not sure how to interpret the values in order to decide the max height for the visualization.
I am using FMODAudio, a C# wrapper. It's set up correctly.
In the case of a linear spectrum, the bars are defined as follows:
public int InitializeSpectrum(int windowSize = 1024, int maxBars = 16)
{
    numSamplesPerBar_Linear.Clear();
    int barSamples = (windowSize / 2) / maxBars;
    for (int i = 0; i < maxBars; ++i)
    {
        numSamplesPerBar_Linear.Add(barSamples);
    }
    IsInitialized = true;
    Data = new float[numSamplesPerBar_Linear.Count];
    return numSamplesPerBar_Linear.Count;
}
Data is the array which holds the spectrum values received from the update loop.
The update looks like this:
public unsafe void UpdateSpectrum(ref ParameterFFT* fftData)
{
    int length = fftData->Length / 2;
    if (length > 0)
    {
        int indexFFT = 0;
        for (int index = 0; index < numSamplesPerBar_Linear.Count; ++index)
        {
            for (int frec = 0; frec < numSamplesPerBar_Linear[index]; ++frec)
            {
                for (int channel = 0; channel < fftData->ChannelCount; ++channel)
                {
                    var floatspectrum = fftData->GetSpectrum(channel); // this is a ReadOnlySpan<float> by default
                    Data[index] += floatspectrum[indexFFT];
                }
                ++indexFFT;
            }
            Data[index] /= (float)(numSamplesPerBar_Linear[index] * fftData->ChannelCount); // average of both channels for more meaningful values
        }
    }
}
The values I get when testing a song are very low across the bands.
A randomly chosen moment when playing a song gives these values:
16 bars = 0.0326 0.0031 0.001 0.0003 0.0004 0.0003 0.0001 0.0002 0.0001 0.0001 0.0001 0 0 0 0 0
I realize it's more useful to use a logarithmic spectrum in many cases, and I intend to, but I still need to figure out how to find the max values for each bar so that I can set up the visualization on a proper scale.
Q: How can I know the potential max values for each bar based on this setup (it's not 1.0)?
The output from the FFT call is an array where each element is a complex number (A + Bi), where A is the real component and B the imaginary component. Element zero of this array represents frequency zero, i.e. DC, which is the offset bias and can typically be ignored. As you iterate across the elements of this array, the frequency increments. This frequency increment is calculated using:
Audio_samples <-- array of raw audio samples in PCM format which gets
                  fed into the FFT call
num_fft_bins := float64(len(Audio_samples)) / 2.0 // using Nyquist theorem
freq_incr_per_bin := (input_audio_sample_rate / 2.0) / num_fft_bins
So, to answer your question: the output array from the FFT call is a linear progression, evenly spaced by the above frequency increment.
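As a sketch of the same arithmetic (plain C; the 48 kHz sample rate and 1024-sample window are assumed values for illustration), the frequency represented by bin i is simply i times the increment:

#include <stdio.h>

int main(void)
{
    double sample_rate = 48000.0;         /* assumed input sample rate */
    int    window_size = 1024;            /* FFT window length */
    int    num_bins    = window_size / 2; /* usable bins up to Nyquist */
    double freq_incr   = (sample_rate / 2.0) / num_bins; /* == sample_rate / window_size */

    /* the frequency represented by bin i is i * freq_incr */
    for (int i = 0; i < 4; ++i)
        printf("bin %d -> %.1f Hz\n", i, i * freq_incr);
    return 0;
}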
It depends on your input data to the FFT, and on the scaling that your particular FFT implementation uses (not all FFTs use the same scale factor).
With an energy-preserving forward FFT, Parseval's theorem applies: the energy (sum of squares) of the input vector equals the energy of the FFT result vector. Note that for a single sinusoidal input that is integer-periodic in the aperture (a pure tone), all of that energy can appear in a single FFT result element. So if you know the maximum possible input energy, you can use that to compute the maximum possible result element magnitude for scaling purposes.
The range is often large enough that visualizers commonly need to use log scaling, or else typical input can get pixel quantized to a graph of all zeros.
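As a sketch of such log scaling (the helper name, the full-scale maximum parameter, and the -60 dB floor are all invented for illustration), converting each bar's averaged magnitude to a decibel scale keeps quiet bands visible:

#include <cmath>
#include <algorithm>

// Map a magnitude in (0, maxMagnitude] to a 0..1 bar height on a dB scale.
// maxMagnitude is whatever maximum your FFT scaling allows (see above);
// floorDb sets how far down the quietest visible value sits.
double BarHeight(double magnitude, double maxMagnitude, double floorDb = -60.0)
{
    if (magnitude <= 0.0) return 0.0;
    double db = 20.0 * std::log10(magnitude / maxMagnitude); // 0 dB at full scale
    return std::clamp(1.0 - db / floorDb, 0.0, 1.0);         // 0 at the floor, 1 at full scale
}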

Getting pointers to specific elements of a 1D contiguous array on the device

I am trying to use cuBLAS in C++ to rewrite a python/tensorflow script which operates on batches of input samples (of shape BxD, B: batch size, D: depth of the flattened 2D matrix).
For the first step, I decided to use cublasSgemmBatched to compute MatMul for batches of matrices.
I've found a couple of working sample codes, such as the one in the linked question, but what I want is to allocate one big contiguous device array to store batches of flattened, identically shaped matrices. I do NOT want to store the batches separate from each other in device memory (as they are in the sample code from the linked StackOverflow question).
From what I can imagine, I somehow have to get a list of pointers to the starting elements of each batch in device memory, something like this:
float **device_batch_ptr;
cudaMalloc((void**)&device_batch_ptr, batch_size*sizeof(float *));
for(int i = 0; i < batch_size; i++) {
    // set device_batch_ptr[i] to starting point of i'th batch on device memory array.
}
Note that cublasSgemmBatched needs a float** in which each float* points to the starting element of the corresponding batch in a given input matrix.
Any advice and suggestions will be greatly appreciated.
If your arrays are in contiguous linear memory (device_array) then all you need to do is calculate the offsets using standard pointer arithmetic and store the device addresses in a host array which you then copy to the device. Something like:
float** device_batch_ptr;
float** h_device_batch_ptr = new float*[batch_size];
cudaMalloc((void**)&device_batch_ptr, batch_size*sizeof(float *));
size_t nelementsperarray = N * N;
for(int i = 0; i < batch_size; i++) {
    // set h_device_batch_ptr[i] to the starting point of the i'th batch in the device memory array
    h_device_batch_ptr[i] = device_array + i * nelementsperarray;
}
cudaMemcpy(device_batch_ptr, h_device_batch_ptr, batch_size*sizeof(float *),
           cudaMemcpyHostToDevice);
[Obviously never compiled or tested, use at own risk]
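To connect this to the batched GEMM call, here is a hedged sketch (assuming square N×N column-major matrices, as cuBLAS expects, and pointer arrays built as above; the wrapper function name is just illustration):

#include <cublas_v2.h>

// Each entry of d_Aarray/d_Barray/d_Carray is a device pointer into one big
// contiguous allocation, built exactly as shown above.
void batched_matmul(cublasHandle_t handle,
                    const float *const *d_Aarray,
                    const float *const *d_Barray,
                    float *const *d_Carray,
                    int N, int batch_size)
{
    const float alpha = 1.0f, beta = 0.0f;
    // C_i = alpha * A_i * B_i + beta * C_i, for i = 0 .. batch_size-1
    cublasSgemmBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                       N, N, N,
                       &alpha,
                       d_Aarray, N,
                       d_Barray, N,
                       &beta,
                       d_Carray, N,
                       batch_size);
}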

C - pass array as parameter and change size and content

UPDATE: I solved my problem (scroll down).
I'm writing a small C program and I want to do the following:
The program is connected to a mysql database (that works perfectly) and I want to do something with the data from the database. I get about 20-25 rows per query and I created my own struct, which should contain the information from each row of the query.
So my struct looks like this:
typedef struct {
    int timestamp;
    double rate;
    char* market;
    char* currency;
} Rate;
I want to pass an empty array to a function; the function should calculate the size for the array based on the returned number of rows of the query. E.g. if 20 rows are returned from a single SQL query, the array should contain 20 objects of my Rate struct.
I want something like this:
int main(int argc, char **argv)
{
    Rate *rates = ?; // don't know how to initialize it
    (void) do_something_with_rates(&rates);
    // the size here should be ~20
    printf("size of rates: %d", sizeof(rates)/sizeof(Rate));
}
How does the function do_something_with_rates(Rate **rates) have to look?
EDIT: I did it as Alex said, I made my function return the size of the array as size_t and passed my array to the function as Rate **rates.
In the function you can access and change the values like (*rates)[i].timestamp = 123 for example.
In C, memory is either dynamically or statically allocated.
Something like int fifty_numbers[50] is statically allocated. The size is 50 integers no matter what, so the compiler knows how big the array is in bytes. sizeof(fifty_numbers) will give you 200 bytes here.
Dynamic allocation: int *bunch_of_numbers = malloc(sizeof(int) * varying_size). As you can see, varying_size is not constant, so the compiler can't figure out how big the array is without executing the program. sizeof(bunch_of_numbers) gives you 4 bytes on a 32-bit system, or 8 bytes on a 64-bit system: the size of the pointer, not of the array. The only one who knows how big the array is is the programmer. In your case, it's whoever wrote do_something_with_rates(), but you're discarding that information by not returning it and not taking a size parameter.
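A minimal sketch illustrating the difference (the exact pointer size depends on the platform):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int fifty_numbers[50];                                   /* statically allocated */
    int *bunch_of_numbers = (int *)malloc(sizeof(int) * 50); /* dynamically allocated */

    printf("%zu\n", sizeof(fifty_numbers));    /* 200: the compiler knows the element count */
    printf("%zu\n", sizeof(bunch_of_numbers)); /* 4 or 8: just the size of the pointer */

    free(bunch_of_numbers);
    return 0;
}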
It's not clear how do_something_with_rates() was declared exactly, but something like: void do_something_with_rates(Rate **rates) won't work as the function has no idea how big rates is. I recommend something like: void do_something_with_rates(size_t array_size, Rate **rates). At any rate, going by your requirements, it's still a ways away from working. Possible solutions are below:
You need to either return the new array's size:
size_t do_something_with_rates(size_t old_array_size, Rate **rates) {
    Rate *new_rates = malloc(sizeof(Rate) * n); // allocate n Rate objects
                                                // (n is whatever size your operation determines)
    // carry out your operation on new_rates,
    // modifying the contents
    free(*rates);       // release the memory taken up by the old array
    *rates = new_rates; // make the caller's pointer point to the new array
    return n;           // return the new size so that the caller knows
}
int main() {
    Rate *rates = malloc(sizeof(Rate) * 20);
    size_t new_size = do_something_with_rates(20, &rates);
    // now new_size holds the size of the new array, which may or may not be 20
    return 0;
}
Or pass in a size parameter for the function to set:
void do_something_with_rates(size_t old_array_size, size_t *new_array_size, Rate **rates) {
    Rate *new_rates = malloc(sizeof(Rate) * n); // allocate n Rate objects
    *new_array_size = n; // set the new size so that the caller knows
    // carry out your operation on new_rates,
    // modifying the contents
    free(*rates);       // release the memory taken up by the old array
    *rates = new_rates; // make the caller's pointer point to the new array
}
int main() {
    Rate *rates = malloc(sizeof(Rate) * 20);
    size_t new_size;
    do_something_with_rates(20, &new_size, &rates);
    // now new_size holds the size of the new array, which may or may not be 20
    return 0;
}
Why do I need to pass the old size as a parameter?
void do_something_with_rates(Rate **rates) {
    // You don't know what n is. How would you
    // know how many rate objects the caller wants
    // you to process for any given call to this?
    for (size_t i = 0; i < n; ++i) {
        // carry out your operation on (*rates)[i]
    }
}
Everything changes when you have a size parameter:
void do_something_with_rates(size_t size, Rate **rates) {
    for (size_t i = 0; i < size; ++i) { // Now you know when to stop
        // carry out your operation on (*rates)[i]
    }
}
This is a very fundamental flaw with your program.
What if I also want the function to change the contents of the array?
size_t do_something_with_rates(size_t old_array_size, Rate **rates) {
    Rate *new_rates = malloc(sizeof(Rate) * n); // allocate n Rate objects
    // carry out some operation on new_rates
    for (size_t i = 0; i < n; ++i) {
        new_rates[i].timestamp = time(NULL);
        // you can see the pattern
    }
    free(*rates);       // release the old array
    *rates = new_rates; // hand the new array back to the caller
    return n;           // return the new size so that the caller knows
}
sizeof produces a value (or code to produce a value) of the size of a type or the type of an expression at compile time. The size of an expression can therefore not change during the execution of the program. If you want that feature, use a variable, a terminal (sentinel) value, or a different programming language. Your choice. Whatever. C's better than Java.
char foo[42];
foo has either static storage duration (which is only partially related to the static keyword) or automatic storage duration.
Objects with static storage duration exist from the start of the program to its termination. Those global variables are technically called variables declared at file scope that have static storage duration and internal or external linkage.
Objects with automatic storage duration exist from the beginning of their initialisation to the return of the function. These are usually on the stack, though they could just as easily be on the heap. They're variables declared at block scope that have automatic storage duration and no linkage.
In either case, today's compilers will encode 42 into the machine code. I suppose it'd be possible to modify the machine code, though the several thousand lines you'd put into that task would be much better invested in storing the size externally (see the other answer/s), and this isn't really a C question. If you really want to look into this, the only examples I can think of that change their own machine code are viruses... How are you going to avoid that antivirus heuristic?
Another option is to encode the size information into a struct, use a flexible array member, and then you can carry both the array and the size around as one allocation. Sorry, this is as close as you'll get to what you want. e.g.
#include <stdlib.h>

typedef int T; // element type; any complete type works

struct T_vector {
    size_t size;
    T value[]; // flexible array member
};

T *T_make(struct T_vector **v) {
    size_t index = *v ? (*v)->size++ : 0, size = index + 1;
    if ((index & size) == 0) { // capacity is exhausted at index 0, 1, 3, 7, ...
        void *temp = realloc(*v, sizeof **v + 2 * size * sizeof (T));
        if (!temp) {
            return NULL;
        }
        *v = temp;
        (*v)->size = size;
    }
    return (*v)->value + index;
}

#define T_size(v) ((v) == NULL ? 0 : (v)->size)

int main(void) {
    struct T_vector *v = NULL;    /* T_size(v) == 0 */
    { T *x = T_make(&v); *x = 42; /* T_size(v) == 1 */ }
    { T *y = T_make(&v); *y = 24; /* T_size(v) == 2 */ }
    free(v);
    return 0;
}
Disclaimer: I only wrote this as an example; I don't intend to test or maintain it unless the intent of the example suffers drastically. If you want something I've thoroughly tested, use my push_back.
This may seem innocent, yet even with that disclaimer and this upcoming warning I'll likely see a comment along the lines of: "Each successive call to T_make may render previously returned pointers invalid"... True, and I can't think of much more I could do about that. I would advise calling T_make, modifying the value pointed at by the returned pointer, and then discarding that pointer, as I've done above (rather explicitly).
Some compilers might even allow you to #define sizeof(x) T_size(x)... I'm joking; don't do this.
Technically we aren't changing the size of an array here; we're allocating ahead of time and, where necessary, reallocating and copying to a larger array. It might seem appealing to abstract allocation away this way in C at times... enjoy :)

Low memory copy throughput Host to Device

I have a vector of vectors vector<vector<double>> data.
I want to copy only the information contained in that "2D matrix" as there are no vectors in CUDA.
So the first approach I used was
vector<vector<double>> *values;
vector<vector<double>>::iterator it;
double *d_values;
double *dst;
checkCudaErr(
    cudaMalloc((void**)&d_values, sizeof(double)*M*N)
);
dst = d_values;
for (it = values->begin(); it != values->end(); ++it) {
    double *src = &((*it)[0]);
    size_t s = it->size();
    checkCudaErr(
        cudaMemcpy(dst, src, sizeof(double)*s, cudaMemcpyHostToDevice)
    );
    dst += s;
}
After profiling with NVVP I got a very low cudaMemcpy throughput. I think this is logical, as I'm sending a very small number of bytes in each cudaMemcpy call.
So I decided to change the code a little to try to improve this. The second approach is:
double *h_values = new double[M*N];
dst = h_values;
for (it = values->begin(); it != values->end(); ++it) {
    double *src = &((*it)[0]);
    size_t s = it->size();
    memcpy(dst, src, sizeof(double)*s);
    dst += s;
}
checkCudaErr(
    cudaMemcpy(d_values, h_values, sizeof(double)*M*N, cudaMemcpyHostToDevice)
);
The result after profiling is still a low memcpy throughput.
So, my question is, how can I improve the copies from host to device?
I'm using a Quadro K4000. I'm getting 25 MB/s for the first case and about 2 GB/s on the second one. M = 5 and N = 2000000. I must say the value for M is a common value, but sometimes it can get up to 50.
A reason for your slow throughput can be that you allocate your double matrix with new. This memory is not page-locked. You can either use a system function (I don't know which system you use) or the CUDA function providing this functionality, cudaMallocHost.
Just remove your = new double[M*N] and set h_values with cudaMallocHost((void**)&h_values, sizeof(double)*M*N) (and of course don't delete it, but free it with cudaFreeHost).
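A minimal sketch of the second approach with a pinned staging buffer (same M, N, d_values, values, and it as in the question; error checking mostly elided):

double *h_values = NULL;
checkCudaErr(cudaMallocHost((void**)&h_values, sizeof(double)*M*N)); // page-locked host buffer

double *dst = h_values;
for (it = values->begin(); it != values->end(); ++it) {
    memcpy(dst, &((*it)[0]), sizeof(double) * it->size()); // gather rows on the host
    dst += it->size();
}
checkCudaErr(cudaMemcpy(d_values, h_values, sizeof(double)*M*N,
                        cudaMemcpyHostToDevice)); // one large transfer from pinned memory

cudaFreeHost(h_values); // not delete[]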
Btw., the theoretical top speed is 8 GB/s (PCIe 2.0, 16 lanes); in practice you will stay below it (around 6 GB/s).