Difference between time_steps and features in an LSTM input in Keras. Can anyone explain with an example? - deep-learning

I am trying to build an LSTM model to train on sequences using Keras. I have gone through some posts here but was unable to find an explanation of what samples, features, and time steps mean in the context of RNNs or LSTMs.

Samples. One sequence is one sample. A batch consists of one or more samples.
Time Steps. One time step is one point of observation in the sample.
Features. One feature is one observation at a time step.
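
As a minimal sketch of these three terms, the toy array below has the 3D layout (samples, time_steps, features) that a Keras LSTM layer expects as input. The sizes (4 sequences, 10 time steps, 3 features) and the variable names are made up for illustration.

```python
import numpy as np

# Hypothetical toy data: 4 sequences (samples), each observed at
# 10 time steps, with 3 measurements (features) per time step.
n_samples, n_time_steps, n_features = 4, 10, 3

X = np.arange(n_samples * n_time_steps * n_features, dtype="float32")
X = X.reshape(n_samples, n_time_steps, n_features)

first_sample = X[0]        # one whole sequence, shape (10, 3)
first_step = X[0, 0]       # all features at the first time step, shape (3,)
one_feature = X[0, :, 1]   # feature 1 followed over all time steps, shape (10,)

print(X.shape, first_sample.shape, first_step.shape, one_feature.shape)
```

For a wind speed series, for instance, a sample could be one window of consecutive readings, a time step one reading instant, and the features the quantities recorded at that instant.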

Related

How to evaluate the deep learning time series forecasting models?

I am working on a long-term time series (wind speed) forecasting model with different deep learning algorithms. I am using MLP, CNN, and LSTM. I have several questions, and I would appreciate it if you could answer them.
Do I have to do any preprocessing for seasonality for these deep learning models?
Why is my R-square so bad and sometimes negative?
When I plot the model's predictions against the training or test data, it is obvious that the model is not good: the prediction looks like a straight line and does not capture the trend. However, my evaluation metrics look really good. For example, the RMSE, MAE, and MAPE are 0.77, 0.67, and 0.1, respectively. So is it enough to just report these metrics, as many articles do?
And the last one: is it possible to use the proposed model for different datasets? Is it reasonable to use another city's wind speed dataset, with a different pattern and trend, on this model? I have seen many articles that do this, but my models do not work on different datasets.
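
As a hedged illustration of the R-square question, the sketch below computes RMSE, MAE, and R-square by hand for a flat-line prediction. The series and the constant 5.2 are made-up numbers, not the asker's data. Because R-square is 1 - SS_res / SS_tot, it compares the model against simply predicting the mean of the targets, so a flat line that is slightly off the mean gives a negative value even when RMSE and MAE look small.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (rmse, mae, r2) for two 1-D arrays of equal length."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    ss_res = np.sum(err ** 2)                         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # variance around the mean
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2

# Made-up target series with little variance.
y_true = np.array([5.0, 6.0, 5.5, 6.5, 5.0, 6.0])

# A "straight line" prediction that is slightly off the mean.
flat_pred = np.full_like(y_true, 5.2)
rmse, mae, r2 = regression_metrics(y_true, flat_pred)

# Baseline: always predicting the mean gives R-square of zero.
mean_pred = np.full_like(y_true, y_true.mean())
_, _, r2_mean = regression_metrics(y_true, mean_pred)

print(f"flat line: rmse={rmse:.3f} mae={mae:.3f} r2={r2:.3f}")
print(f"mean baseline: r2={r2_mean:.3f}")
```

Scale-dependent errors such as RMSE and MAE can therefore look good on a low-variance series while the model still does no better than a constant.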

Deep Learning for Acoustic Emission concrete fracture specimens: regression of onset time and classification of type of failure

How can I use deep learning for both regression and classification tasks?
I am facing a problem with acoustic emission from fracture in concrete specimens. The objective is to automatically find the onset time instant (the time at which the acoustic emission begins) and the slope up to the peak value, in order to determine the kind of fracture (mode I or mode II, based on the rise angle RA).
I have tried a region-based CNN on images of the signals (fine-tuning Faster R-CNN using PyTorch), but unfortunately the results have not been outstanding so far.
I would like to work with sequences (time series) of amplitude data at a given sampling frequency, but each sequence has a different length. How can I deal with this problem?
Can I build a 1D CNN that performs a kind of anomaly detection based on the supervised point that I can mark manually on the training examples?
I have a certain number of recordings, sampled at 100 Hz, which I would like to use to train the model. In anomaly detection examples such as "Timeseries anomaly detection using an Autoencoder", a single time series is used and a window sliding by 1 time step produces about 3700 windows to train the neural network. Instead, I have a number of separate recordings (time series), each with its own onset time instant and a different total length in seconds. How can I manage this?
I actually need the time instant of the beginning of the signal and the maximum point to define the rise angle and classify the type of fracture. Can a CNN perform the classification simultaneously with the regression of the onset time instant?
Thank you in advance!
I finally solved it, thanks to the fundamental suggestion by @JonNordby, using a Sound Event Detection method. We adopted and adapted the code from GitHub YashNita.
I labelled the data according to the following image:
Then I extracted features by computing the spectrogram of the input signals:
And finally we were able to get a more precise output recognition of the Seismic Event Detection which is directly connected to the Acoustic Emission Event detection, obtaining the following result:
For the moment, only the event recognition phase has been done, but it would be simple to adapt the approach to also classify mode I or mode II cracking.
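
On the question of recordings with different lengths, one common option (not the spectrogram-based method described above) is to right-pad every recording to a common length and keep a mask of the real samples. The sketch below is a minimal version of that idea; the function name `pad_recordings` and the toy recordings are hypothetical.

```python
import numpy as np

def pad_recordings(recordings, pad_value=0.0):
    """Right-pad 1-D recordings to a common length.

    Returns the padded array of shape (n_recordings, max_len) and a boolean
    mask of the same shape that is True where a real sample is present.
    """
    lengths = [len(r) for r in recordings]
    max_len = max(lengths)
    padded = np.full((len(recordings), max_len), pad_value, dtype="float32")
    mask = np.zeros((len(recordings), max_len), dtype=bool)
    for i, r in enumerate(recordings):
        padded[i, :len(r)] = r
        mask[i, :len(r)] = True
    return padded, mask

# Hypothetical recordings of 5, 8, and 3 samples.
recordings = [np.ones(5), np.ones(8), np.ones(3)]
padded, mask = pad_recordings(recordings)

print(padded.shape)       # every recording now has the same length
print(mask.sum(axis=1))   # number of real samples per recording
```

The mask lets a downstream model or loss ignore the padded positions, so onset labels expressed as sample indices stay valid for each recording.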

Multi-attention based supervised Feature Selection in Multivariate time series

I have been working on a multivariate time series problem. The dataset has at least 40 different factors. I tried to select only the appropriate features before training the model. I came across a paper called "A Multiattention-Based Supervised Feature Selection Method for Multivariate Time Series". The link to the paper: https://www.hindawi.com/journals/cin/2021/6911192/
The paper looks promising; however, I could not find an implementation of it. I would like to know if anyone has come across a similar paper and knows how to implement the architecture described in the paper.
If not, I would like to know alternative methods for selecting only the appropriate features of my multivariate time series before training the model.

How to combine the probability (soft) output of different networks and get the hard output?

I have trained three different models separately in caffe, and I can get the probability of belonging to each class for semantic segmentation. I want to get an output based on the 3 probabilities that I am getting (for example, the argmax of three probabilities). This can be done by inferring through net model and deploy.prototxt files. And then based on the final soft output, the hard output shows the final segmentation.
My questions are:
1. How do I get the ensemble output of these networks?
2. How do I train the ensemble of three networks end to end? Are there any resources that could help?
3. How do I get the final segmentation from the final probability (e.g., the argmax of the three probabilities), which is a soft output?
My question may sound very basic, and my apologies for that. I am still trying to learn step by step. I really appreciate your help.
There are at least two ways that I know of to solve (1):
One is to use the pycaffe interface: instantiate the three networks, forward an input image through each of them, fetch the outputs, and perform any operation you like to combine the three probabilities. This is especially useful if you intend to combine them using more complex logic.
The alternative (way less elegant) is to use caffe test and process all your inputs separately through each network saving the probabilities into files. Then combine the probabilities from the files later.
Regarding your second question, I have never trained more than two weight-sharing CNNs (siamese networks). From what I understood, your networks don't share weights, only the architecture. If you want to train all three end-to-end please take a look at this tutorial made for siamese networks. The authors define in their prototxt both paths/branches, connect each branch's layers to the input Data layer and, at the end, with a loss layer.
In your case you would define the three branches (one for each of your networks), connect with input data layers (check if each branch processes the same input or different inputs, for example, the same image pre-processed differently) and unite them with a loss, similarly to the tutorial.
Now, for the last question, it seems Caffe has an ArgMax layer that may be what you are looking for. If you are familiar with Python, you could also use a Python layer, which lets you define with great flexibility how to combine the output probabilities.
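
Once the three probability maps have been fetched (through pycaffe or from saved files), the combination step can be done in plain NumPy. The sketch below averages the maps and takes the per-pixel argmax. Averaging is only one possible combination rule, and the shapes, the helper names, and the random stand-in probabilities are assumptions for illustration, not outputs of real networks.

```python
import numpy as np

def ensemble_argmax(prob_maps):
    """Average class-probability maps and return (mean_prob, labels).

    prob_maps: list of arrays, each of shape (n_classes, height, width),
    one per network. labels has shape (height, width).
    """
    stacked = np.stack(prob_maps, axis=0)   # (n_networks, C, H, W)
    mean_prob = stacked.mean(axis=0)        # soft output, (C, H, W)
    labels = mean_prob.argmax(axis=0)       # hard output, (H, W)
    return mean_prob, labels

def random_probs(rng, n_classes, height, width):
    """Stand-in for one network's softmax output (random, for the demo only)."""
    logits = rng.normal(size=(n_classes, height, width))
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
prob_maps = [random_probs(rng, 4, 6, 6) for _ in range(3)]
mean_prob, labels = ensemble_argmax(prob_maps)

print(mean_prob.shape, labels.shape)
```

Because each input map sums to 1 over the class axis, the averaged map does too, so it can still be read as a per-pixel class probability.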

Is it possible to forward the output of a deep-learning network to another network with caffe / pycaffe?

I am using Caffe, or more precisely pycaffe, to create and train my network. I have a dataset with 5 labels. My idea was to create one network for each label that simply outputs the score for one class. After training the 5 networks, I want to compare their outputs and see which one has the highest score.
Sadly, I only know how to create one network, not how to let them interact, and moreover not how to apply something like a max function at the end. I have added a picture to describe what I want to do.
Moreover, I do not know whether this would give a better outcome than just a normal deep neural network.
I don't see what you expect to have as the input to this "max" function. Even if you use some sort of is / is not boundary training, your approach appears to be an inferior version of the softmax layer available in all popular frameworks.
Yes, you can build a multi-channel model, train each channel with a different data set, and then accept the most confident prediction -- but the result will take longer and be less accurate than a cooperative training pass. Your five channels wind up negotiating their boundaries after they've made other parametric assumptions.
Feed a single model all the information available from the outset; you'll get faster convergence and more accurate classification.
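
To make the softmax point concrete, here is a minimal NumPy sketch of what a softmax output followed by an argmax does with five class scores from a single model. The scores are made up, and this is not Caffe code; it only shows that the "max function" from the question is already part of a standard classifier output.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over the last axis."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

# Made-up raw scores for one input and 5 classes, as a single model
# with a 5-unit output layer would produce.
scores = np.array([[1.0, 0.5, 3.0, -1.0, 0.0]])

probs = softmax(scores)             # class probabilities, each row sums to 1
predicted = probs.argmax(axis=-1)   # index of the most probable class

print(probs.round(3), predicted)
```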