Getting Data back from Filtered FFT

I have applied an FFT to some data that I'd like to process using Matlab. The resulting frequencies are quite noisy, so I have applied a moving average filter to the frequency/amplitude vectors. Now I am interested in getting the time domain data based on this filtered frequency domain data, to be used in a spectrogram later.
To get the frequency/amplitude components I used this code from a Mathworks example:
NFFT=2^nextpow2(L);
A=fft(a,NFFT)/L; %a is the data
f=Fs/2*linspace(0,1,NFFT/2+1);
and plotted using:
plot(f,2*abs(A(1:NFFT/2+1)))
Can you recommend a way of getting the time domain data from the filtered FFT results? Is there an inverse FFT involved?
Thank you very much!

An IFFT is the inverse of an FFT. If you don't change the frequency-domain data, ifft(fft(x)) from the same library should give you back the original data.
If you change the data and want to get real data back, you have to filter the imaginary components as well as the real components of the complex FFT result, and make sure that the frequency-domain data is still complex conjugate symmetric before doing the IFFT. If you use the magnitudes only, you throw away the phase information, which can greatly distort the result.
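As a rough illustration of that answer, here is a minimal sketch in Python/NumPy rather than Matlab; the signal, sample rate, and window length are made-up values. Smoothing the complex one-sided spectrum and inverting it with irfft keeps the result real-valued:

import numpy as np

Fs = 1000                              # sample rate in Hz (assumed)
t = np.arange(0, 1, 1 / Fs)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.random.randn(t.size)

X = np.fft.rfft(x)                     # one-sided spectrum of a real signal

# Moving-average the complex bins, so real and imaginary parts
# (i.e. magnitude and phase) are filtered consistently.
win = np.ones(5) / 5
X_filt = np.convolve(X, win, mode='same')

# irfft rebuilds a conjugate-symmetric two-sided spectrum internally,
# so the output is real-valued time-domain data.
x_filt = np.fft.irfft(X_filt, n=x.size)

If you instead filter the two-sided fft output directly, you have to mirror the changes onto the negative-frequency bins yourself before calling ifft, otherwise the result will have a spurious imaginary part.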

Related

Evaluating the performance of variational autoencoder on unlabeled data

I've designed a variational autoencoder (VAE) that clusters sequential time series data.
To evaluate the performance of the VAE on labeled data, I first run KMeans on the raw data and compare the generated labels with the true labels using the Adjusted Mutual Info Score (AMI). Then, after the model is trained, I pass validation data through it, run KMeans on the latent vectors, and compare the generated labels with the true labels of the validation data using AMI. Finally, I compare the two AMI scores with each other to see whether KMeans performs better on the latent vectors than on the raw data.
My question is this: How can we evaluate the performance of VAE when the data is unlabeled?
I know we can run KMeans on the raw data and generate labels for it, but in this case, since we consider the generated labels as true labels, how can we compare the performance of KMeans on the raw data with KMeans on the latent vectors?
Note: The model is totally unsupervised. Labels (if they exist) are not used in the training process; they're used only for evaluation.
In unsupervised learning you evaluate the performance of a model either with labelled data or with visual analysis. In your case you do not have labelled data, so you need to rely on analysis. One way is to look at the predictions: if you know how the raw data should be labelled, you can qualitatively evaluate the accuracy. Another way, since you are using KMeans, is to visualize the clusters. If the clusters are spread apart in distinct groups, that is usually a good sign; if they are close together and overlapping, the labelling of vectors in those areas may be less accurate. Alternatively, you can use an internal clustering metric, such as the silhouette score, to evaluate the clusters, or come up with your own.
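If it helps, here is a minimal sketch of the metric-based option using scikit-learn's silhouette score on both representations. The arrays and the number of clusters are placeholders for your real data, and because the score is computed in two different feature spaces, the comparison is only a rough indication rather than a strict one:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Placeholder arrays standing in for the real inputs: `raw` is the raw
# validation data, `latent` is the VAE's latent vectors for the same samples.
rng = np.random.default_rng(0)
raw = rng.normal(size=(500, 40))
latent = rng.normal(size=(500, 8))

k = 5  # assumed number of clusters

def internal_score(X, k):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # The silhouette score needs no ground-truth labels: it only measures
    # how compact and well separated the discovered clusters are.
    return silhouette_score(X, labels)

print("raw data silhouette:    ", internal_score(raw, k))
print("latent space silhouette:", internal_score(latent, k))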

Forecasting out of sample with Fourier regressors

I'm trying to create a multivariate multi-step-ahead forecast using machine learning (weekly and yearly seasonality).
I use some exogenous variables, including Fourier terms. I'm happy with the results of testing the model on in-sample data, but now I want to go to production and make real forecasts on completely unseen data. While I can update the other regressors (variables), since they are dummy variables and related to time, I don't know how I will generate new Fourier terms for the N steps ahead.
I have an understanding problem here and want to check it with you: when you generate the Fourier terms based on the periodicity and the number of sin/cos pairs used to decompose the time series you want to forecast, this process should be independent of the values of the time series. Is that right?
If so, how do you extend the terms for the N steps?
Just for the sake of completeness, I use R.
Thank you
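For reference, here is a minimal sketch (in Python rather than R, with made-up period, order, and horizon values) of how such Fourier regressors are typically generated purely from the time index, which is why they can be evaluated for future steps the same way as for the training period:

import numpy as np

def fourier_terms(t, period, K):
    # sin/cos pairs of orders 1..K for time index t and a given period
    t = np.asarray(t, dtype=float)
    cols = []
    for k in range(1, K + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

n_train, horizon = 200, 14
X_train  = fourier_terms(np.arange(n_train), period=7, K=3)                     # weekly seasonality
X_future = fourier_terms(np.arange(n_train, n_train + horizon), period=7, K=3)  # same formula, later t

In R, the fourier() function from the forecast package builds the same kind of regressors and, as far as I remember, takes an h argument for the future terms.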
From what I am reading and understanding, you want to get the next N terms of the Fourier series. To do this, you need to shift your calculated time frame to some point in the past (say N-1). This is just simple causality: you cannot model the future with Fourier (for example, you can't have x(N-1) = a*x(N+1) + b*x(N-2) + c*x(N)).

How can I view the sum/average velocity over all timesteps in Paraview?

I imported a point cloud (with velocity, pressure, ...) into Paraview.
What I found so far: a temporal statistics filter that calculates the average of all points per time step.
What I am looking for: a filter that calculates the average of all time steps per particle.
I could write the data into a csv file and load it back into Paraview for visualization. But is there an easier way of doing it directly in Paraview?
What you want is the 'Temporal Statistics' filter.
Be aware that you cannot choose a specific array to compute: it will process all available arrays.
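If you prefer to script it, a rough pvpython sketch is below; the file name is a placeholder and the property names may differ slightly between ParaView versions:

from paraview.simple import *

reader = OpenDataFile('particles.pvd')      # your time-dependent point cloud
stats = TemporalStatistics(Input=reader)

# The filter averages every point array over all timesteps; there is no
# option to restrict it to a single array such as velocity.
stats.ComputeAverage = 1
stats.ComputeMinimum = 0
stats.ComputeMaximum = 0
stats.ComputeStandardDeviation = 0

Show(stats)
Render()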

Rescale data to be sinusoidal

I have some time series data I'm looking at in Python that I know should follow a sin² function, but for various reasons doesn't quite fit it. I'm taking an FFT of it, and it has a fairly broad frequency spread when it should be a very narrow single frequency. However, the errors causing this are quite consistent: if I take data again, it matches the previous data set very closely and gives a very similar FFT.
So I've been trying to come up with a way to rescale the time axis of the data so that it is at a single frequency, and then apply this same rescaling to future data I collect. I've tried various filtering techniques to smooth the data or to cut frequencies from the FFT, without much luck. I've also tried fitting a frequency-varying sin² to the data, but haven't been able to get a good fit (if I could, I would use the frequency-vs-time function to rescale the time axis of the original data so that it has a constant frequency, and then apply the same rescaling to any new data I collect).
Here's a small sample of the data I'm looking at (the full data goes for a few hundred cycles), and the resulting FFT of the full data.
Any suggestions would be greatly appreciated. Thanks!
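One way to make the frequency-varying sin² fit described above concrete is a rough sketch with scipy.optimize.curve_fit on synthetic data; the linear frequency drift and every parameter value here are assumptions for illustration, not a claim about the real data:

import numpy as np
from scipy.optimize import curve_fit

# Model: sin^2 with a slowly drifting frequency f(t) = f0 + f1*t, so the
# phase is the integral 2*pi*(f0*t + 0.5*f1*t**2).
def sin2_chirp(t, amp, f0, f1, phi, offset):
    phase = 2 * np.pi * (f0 * t + 0.5 * f1 * t**2) + phi
    return amp * np.sin(phase) ** 2 + offset

# Synthetic stand-in for the measured data
t = np.linspace(0, 10, 2000)
y = sin2_chirp(t, 1.0, 2.0, 0.02, 0.3, 0.1) + 0.05 * np.random.randn(t.size)

p0 = [1.0, 2.0, 0.01, 0.0, 0.0]                 # rough initial guesses
popt, _ = curve_fit(sin2_chirp, t, y, p0=p0, maxfev=20000)
amp, f0, f1, phi, offset = popt

# Rescale the time axis so the fitted phase advances at the constant rate f0:
# choose new_t such that f0*new_t == f0*t + 0.5*f1*t**2.
new_t = t + 0.5 * (f1 / f0) * t**2

The same new_t mapping, with the fitted f0 and f1 kept fixed, could then be applied to newly collected data, which matches the plan in the question.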

Best practice for storing GPS data of a tracking app in mysql database

I have a data model question for a GPS tracking app. When someone uses our app it will save latitude, longitude, current speed, timestamp and burned_calories every 5 seconds. When a workout is completed, the average speed, total time/distance and burned calories of the workout will be stored in a database. So far, so good.
What we also want is to store the data that is saved every 5 seconds, so we can use it later on to plot graphs/charts of a workout, for example.
How should we store this amount of data in a database? A single workout can contain 720 rows if someone runs for an hour. Perhaps a serialized/gz-compressed data array in a single row? I'm aware, though, that this is bad practice.
Would a relational one-to-many/many-to-many model hold up? I know MySQL can easily handle large amounts of data, but we are talking about 720 rows per workout * 2 workouts a week * 7000 users = over 10 million rows a week.
(Of course we could store data only every 10 seconds to halve the number of rows, or every 20 seconds, etc., but it would still be a large amount of data over time, and the accuracy of the graphs would decrease.)
How would you do this?
Thanks in advance for your input!
Just some ideas:
1. Quantize your lat/lon data. I believe that for technical reasons, the data most likely will be quantized already, so if you can detect that quantization, you might use it. The idea here is to turn double numbers into reasonable integers. In the worst case, you may quantize to the precision double numbers provide, which means using 64 bit integers, but I very much doubt your data is even close to that resolution. Perhaps a simple grid with about one meter edge length is enough for you?
2. Compute differences. Most numbers will be fairly large in terms of absolute values, but also very close together (unless your members run around half the world…). So this will result in rather small numbers. Furthermore, as long as people run with constant speed into a constant direction, you will quite often see the same differences. The coarser your spatial grid in step 1, the more likely you get exactly the same differences here.
3. Compute a Huffman code for these differences. You might try encoding lat and long movement separately, or computing a single code with 2d displacement vectors at its leaves. Try both and compare the results.
4. Store the result in a BLOB, together with the dictionary to decode your Huffman code, and the initial position so you can return data to absolute coordinates.
The result should be a fairly small piece of data for each workout, which you can retrieve and decompress as a whole. Retrieving individual parts from the database is not possible, but it sounds like you won't need that.
The benefit of Huffman coding over gzip is that you won't have to artificially introduce an intermediate byte stream. Directly encoding the actual differences you encounter, with their individual properties, should work much better.
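As a rough illustration of the quantize / delta-encode / Huffman pipeline above, here is a small Python sketch. The sample coordinates, the 1e-5-degree grid (roughly one metre), and the plain-text bit string are placeholders only; a real implementation would pack the bits into bytes before writing the BLOB:

import heapq
from collections import Counter

# A few made-up GPS fixes standing in for one workout
track = [(52.37403, 4.88969), (52.37405, 4.88972), (52.37408, 4.88975),
         (52.37410, 4.88979), (52.37413, 4.88982)]

GRID = 1e-5                                    # quantization step in degrees

# Quantize to integers, then delta-encode each fix against the previous one
quant = [(round(lat / GRID), round(lon / GRID)) for lat, lon in track]
deltas = [(a[0] - b[0], a[1] - b[1]) for a, b in zip(quant[1:], quant[:-1])]

# Build a Huffman code over the 2-D displacement symbols
def huffman_code(symbols):
    freq = Counter(symbols)
    heap = [[w, i, {s: ''}] for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                          # degenerate single-symbol case
        return {next(iter(freq)): '0'}
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in lo[2].items()}
        merged.update({s: '1' + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], lo[1], merged])
    return heap[0][2]

code = huffman_code(deltas)
bits = ''.join(code[d] for d in deltas)

# What would go into the BLOB column: the first absolute fix, the code
# table needed to decode, and the packed bit string.
print(quant[0], code, bits)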