I have a set of varying-length multivariate time series.
Also, each time series corresponds to a static (non-time-series) vector. These vectors are not labels.
I'm trying to model both the time series and the vector data in a single model for extreme rare event classification.
The vector values have some influence on the time series, so this relationship needs to be learned in order to classify.
Could you please tell me how to model this mixture of time series and flat (vector) data?
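One common pattern (a minimal Keras sketch, assuming TensorFlow/Keras, sequences padded with zeros, and illustrative layer sizes and names) is to encode the padded series with a masked LSTM, concatenate the static vector, and pass both through dense layers to a single rare-event probability:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Illustrative shapes: sequences padded to max_len with n_ts features per step,
# plus a static vector of n_static values per sample.
max_len, n_ts, n_static = 50, 8, 12

seq_in = layers.Input(shape=(max_len, n_ts), name="series")
vec_in = layers.Input(shape=(n_static,), name="static_vector")

# Masking lets the LSTM ignore zero-padded timesteps (variable lengths).
x = layers.Masking(mask_value=0.0)(seq_in)
x = layers.LSTM(64)(x)

# Concatenating lets the dense layers learn the interaction between
# the sequence representation and the static vector.
h = layers.Concatenate()([x, vec_in])
h = layers.Dense(32, activation="relu")(h)
out = layers.Dense(1, activation="sigmoid")(h)  # rare-event probability

model = Model([seq_in, vec_in], out)
# class_weight, oversampling, or a focal loss can help with the extreme imbalance.
model.compile(optimizer="adam", loss="binary_crossentropy")
```

This is only a sketch of the wiring; attention over the sequence or feeding the static vector as an initial LSTM state are common variations.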
Related
I've designed a variational autoencoder (VAE) that clusters sequential time series data.
To evaluate the performance of the VAE on labeled data, I first run KMeans on the raw data and compare the generated labels with the true labels using the Adjusted Mutual Information score (AMI). Then, after the model is trained, I pass the validation data through it, run KMeans on the latent vectors, and compare the generated labels with the true labels of the validation data using AMI. Finally, I compare the two AMI scores to see whether KMeans performs better on the latent vectors than on the raw data.
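For reference, a minimal sklearn sketch of that labeled-data comparison (X_raw, Z_latent, and y_true are placeholders standing in for the raw validation data, the VAE latent vectors, and the true labels):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score

# Placeholder data; in practice these come from the validation set and the VAE encoder.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(200, 30))     # raw validation data
Z_latent = rng.normal(size=(200, 8))   # latent vectors from the trained VAE
y_true = rng.integers(0, 3, size=200)  # ground-truth labels, used only for evaluation

k = len(np.unique(y_true))
ami_raw = adjusted_mutual_info_score(y_true, KMeans(n_clusters=k, n_init=10).fit_predict(X_raw))
ami_latent = adjusted_mutual_info_score(y_true, KMeans(n_clusters=k, n_init=10).fit_predict(Z_latent))
print(ami_raw, ami_latent)  # higher AMI on Z_latent suggests the latent space helps clustering
```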
My question is this: How can we evaluate the performance of VAE when the data is unlabeled?
I know we can run KMeans on the raw data and generate labels for it, but in this case, since we consider the generated labels as true labels, how can we compare the performance of KMeans on the raw data with KMeans on the latent vectors?
Note: the model is totally unsupervised. Labels (if they exist) are not used in the training process; they're used only for evaluation.
In unsupervised learning you evaluate the performance of a model either by using labelled data or by visual analysis. In your case you do not have labelled data, so you would need to do the analysis yourself. One way is to look at the predictions: if you know how the raw data should be labelled, you can qualitatively evaluate the accuracy. Another method, since you are using KMeans, is to visualize the clusters. If the clusters are well separated, that is usually a good sign; if they are close together and overlapping, the labelling of vectors in those regions may be less accurate. Alternatively, you can use an internal cluster-validity metric such as the silhouette score or the Davies-Bouldin index, or come up with your own.
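A minimal sketch of that last option using sklearn's internal metrics (no labels required; X_raw and Z_latent are placeholders for the raw data and the VAE latent vectors):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

def cluster_quality(X, k):
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
    # Silhouette: higher is better; Davies-Bouldin: lower is better.
    return silhouette_score(X, labels), davies_bouldin_score(X, labels)

# Placeholders standing in for the raw data and the VAE latent vectors.
rng = np.random.default_rng(0)
X_raw, Z_latent = rng.normal(size=(200, 30)), rng.normal(size=(200, 8))

print(cluster_quality(X_raw, k=3))
print(cluster_quality(Z_latent, k=3))
```

Note that the two scores are computed in different feature spaces (raw vs. latent), so treat the comparison as indicative rather than exact.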
I have multiple records of time series data related to human movement, over multiple repetitions. In this data, rows represent timestamps and columns are the recorded parameters (features) at each timestamp. Most of the records have different numbers of timestamps. I want to create a 3D array of shape (number of repetitions, timestamps, features) for movement modeling. For this, I need to make each record the same length. Can you please suggest a method (other than resampling) to make the lengths equal without losing movement-related information?
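One option that is not resampling is to zero-pad every record to the longest length and keep a mask of which timesteps are real, so no movement data is altered. A minimal numpy sketch with made-up lengths and feature counts:

```python
import numpy as np

rng = np.random.default_rng(0)
# Each record: (timestamps_i, n_features); lengths differ between repetitions.
records = [rng.normal(size=(t, 4)) for t in (90, 120, 75)]

max_len = max(r.shape[0] for r in records)
n_feat = records[0].shape[1]

X = np.zeros((len(records), max_len, n_feat), dtype=np.float32)  # padded cube
mask = np.zeros((len(records), max_len), dtype=bool)             # valid timesteps
for i, r in enumerate(records):
    X[i, :len(r)] = r        # movement data is copied untouched
    mask[i, :len(r)] = True  # marks real timesteps vs. padding

print(X.shape)  # (repetitions, timestamps, features)
```

Downstream models then need to respect the mask (e.g. a masking layer) so the padding does not leak into the modeling.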
I'm looking to build a regression model where I have time-based variables that may or may not exist for each data sample.
For instance, let's say we wanted to build a regression model to predict how long a new car will last. One of the variables is when the car gets its first servicing. However, there are some samples where the car never gets serviced at all. How can I account for this when building the model? Can I even use a linear regression model, or will I have to choose a different kind of regression?
When I think about it, this is basically equivalent to having two fields: one for whether the car was serviced and, if that is true, a second field for when. But I'm not sure how to build a regression with data that is intentionally missing.
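That two-field idea is essentially the standard missing-indicator encoding. A minimal pandas/sklearn sketch with made-up column names and values:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: months until first service (NaN = never serviced).
df = pd.DataFrame({
    "first_service_month": [6.0, np.nan, 12.0, np.nan, 3.0],
    "lifetime_months":     [120, 95, 140, 90, 130],
})

# Indicator for "was serviced", plus the value itself with NaNs filled by a
# constant; the indicator lets the model learn a separate offset for
# never-serviced cars instead of treating the fill value as a real time.
df["was_serviced"] = df["first_service_month"].notna().astype(int)
df["first_service_filled"] = df["first_service_month"].fillna(0.0)

X = df[["was_serviced", "first_service_filled"]]
y = df["lifetime_months"]
model = LinearRegression().fit(X, y)
```

With this encoding an ordinary linear regression can be used; the coefficient on the indicator absorbs the "never serviced" case.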
Apply regression without treating the data as a time series. To capture seasonality, encode the date/time columns into binary columns (representing year, day of year, day of month, day of week, etc.).
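A minimal pandas sketch of that encoding (column names and values are illustrative):

```python
import pandas as pd

# Hypothetical frame with a timestamp column and a target.
df = pd.DataFrame({"date": pd.date_range("2021-01-01", periods=6, freq="7D"),
                   "y": [3.1, 2.9, 3.4, 3.0, 3.6, 3.2]})

parts = pd.DataFrame({
    "year": df["date"].dt.year,
    "dayofyear": df["date"].dt.dayofyear,
    "day": df["date"].dt.day,
    "dayofweek": df["date"].dt.dayofweek,
})
# One-hot (binary) columns for the categorical date parts.
X = pd.get_dummies(parts.astype("category"))
```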
I am building a deep learning model for macro-economic prediction. However, the different indicators vary widely in their sampling frequency, ranging from minutes to annual.
[Dataframe example image]
The picture shows 'Treasury Rates (DGS1-20)', which is sampled daily, and 'Inflation Rate (CPALT...)', which is sampled monthly. These features are essential for training the model, and dropping the NaN rows would leave too little data.
I've read some books and articles about dealing with missing data, including downsampling to a monthly time frame, swapping the NaNs with -1, filling them with the average of the last and next values, etc. But the methods I've read about mostly deal with datasets where about 10% of the values are missing, whereas in my case the monthly-sampled 'Inflation (CPI)' is missing for 90+% of rows once I combine it with the 'Treasury Rate' dataset.
I was wondering whether there is any workaround for handling missing values, particularly for economic data where the sampling intervals range so widely. Thank you.
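One common workaround is to reindex the lower-frequency series onto the higher-frequency index and forward-fill, so the last published value (e.g. the latest CPI print) is carried forward until the next release rather than left as NaN. A minimal pandas sketch with made-up numbers and dates:

```python
import numpy as np
import pandas as pd

# Hypothetical daily Treasury rate and monthly CPI series.
daily_idx = pd.date_range("2022-01-01", "2022-03-31", freq="D")
dgs = pd.Series(np.random.default_rng(0).normal(2.0, 0.1, len(daily_idx)),
                index=daily_idx, name="DGS1")
cpi = pd.Series([7.5, 7.9, 8.5],
                index=pd.date_range("2022-01-01", periods=3, freq="MS"),
                name="CPI")

# Align CPI to the daily index and carry the last known value forward,
# instead of dropping the ~97% of daily rows where CPI would be NaN.
df = pd.concat([dgs, cpi.reindex(daily_idx).ffill()], axis=1)
```

Whether forward-filling is appropriate depends on whether the model should only "know" values that were already published at each timestamp; interpolation would leak future information.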
I have been stuck for days because I cannot find a way to fit a traditional regression model such as ARIMA when giving multiple time series as input.
I have thousands of trajectories of different vehicles (x-y coordinates for each position). Let's say each sample (trajectory) is composed of 10 positions (the trajectories are not necessarily all of the same length). It means I have 10*N different time series (N is the total number of samples). I want to fit a model on all samples for the x coordinates and then predict the future position of any new trajectory (test sample) given as input. Then I plan to do the same with another model for the y coordinates. I'm not claiming the method will work, but I need to implement it to compare it with others (neural networks, ...).
The hypothesis: a number of time series can be modeled with a single ARIMAX (or other) model (i.e. the same parameters work for all the time series). What is wanted: to fit them all simultaneously.
Can someone help me please?
Thank you in advance for your support!
Best regards,
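Classical ARIMA implementations fit one series at a time, so the following is only a hedged sketch of one way to honor the hypothesis (shared parameters, fitted simultaneously): pool all trajectories into a single autoregressive design matrix and estimate one set of AR coefficients by least squares. The AR order p, the data, and the helper names are all illustrative assumptions, not a standard library feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Hypothetical data: N trajectories of varying length, x coordinates only.
trajectories = [np.cumsum(rng.normal(size=rng.integers(8, 12))) for _ in range(1000)]

p = 3  # autoregressive order shared by all trajectories
X, y = [], []
for traj in trajectories:
    for t in range(p, len(traj)):
        X.append(traj[t - p:t])  # previous p positions
        y.append(traj[t])        # next position

# One set of AR coefficients fitted simultaneously on all trajectories.
ar = LinearRegression().fit(np.array(X), np.array(y))

# Predict the next x position of a new (test) trajectory from its last p points.
new_traj = np.array([0.0, 0.4, 0.9, 1.5])
next_x = ar.predict(new_traj[-p:].reshape(1, -1))
```

The same procedure would be repeated for the y coordinates; differencing the positions first would make the setup closer to an ARIMA-style model.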