Is it possible to feed the output back to the input in an artificial neural network? - deep-learning

I am currently designing an artificial neural network for a problem with a decay curve.
For example, building a model to predict the durability of some material. The inputs may include environmental conditions such as temperature and humidity.
However, those alone are not adequate to predict the durability of the material. For such a problem, I think it is better to use the output durability of previous time slots as one of the current inputs when predicting the durability of the next time slot.
Moreover, I do not know how to train a model which feeds the output back to the input, since that input column only contains the initial value before training.
For this case, I have tried two methods.
Method 1 (failed)
I tried to fill the input durability of the next row with the predicted output durability of the current row. However, doing so prevents the model from running loss.backward(), so we cannot compute and update the gradients. When I copied the predicted output into the next row of the input data, the gradient function was CopySlices instead of MSELoss.
(Images: the input table with the output fed back in; the gradient function showing CopySlices.)
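To make Method 1 concrete, here is a minimal sketch (the column layout, shapes and the toy linear "model" are made up for illustration, not my real code): writing the prediction for one row into the next row in place pulls the whole input matrix into the autograd graph, which is where the CopySlices gradient function comes from.

import torch

x = torch.rand(5, 3)          # rows = time slots; column 0 holds the durability input (made-up layout)
w = torch.rand(3, 1, requires_grad=True)

pred = (x[0] @ w).sum()       # toy "prediction" for the first time slot
x[1, 0] = pred                # Method 1: copy it into the next row's input, in place
print(x.grad_fn)              # x is now part of the autograd graph via a CopySlices node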
Method 2: fill the input column with the expected output
In this method, I fill the blank input column with the expected output of the previous row before training the model. Filling the input column with the expected output of the previous row is only done for training; for real prediction, I will feed the predicted output back into the input. In this case, I managed to train a model (which overfits) with MSELoss.
However, I strongly believe this is not the right method, because it uses the expected output as the input no matter how badly the model predicts.
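Roughly, the data preparation for Method 2 looks like this (the column names and values below are made up for illustration):

import pandas as pd

# hypothetical training table: one row per time slot
df = pd.DataFrame({
    "temperature": [25.0, 26.0, 27.0, 28.0],
    "humidity":    [0.50, 0.55, 0.60, 0.65],
    "durability":  [1.00, 0.93, 0.85, 0.76],   # measured (expected) output
})

# Method 2: use the previous slot's expected durability as an input column
df["durability_prev"] = df["durability"].shift(1)
df = df.dropna()   # the first slot has no previous value

x_train = df[["durability_prev", "temperature", "humidity"]].values
y_train = df["durability"].values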
Therefore, I want to ask whether it is possible to feed the output back to the input in a linear regression problem using an artificial neural network.
I apologize for not posting code here; I am not able to upload the full code because it may be confidential.

It looks like you need an RNN (recurrent neural network). This tutorial is pretty helpful for understanding an RNN: https://colah.github.io/posts/2015-08-Understanding-LSTMs/.
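For a concrete starting point, here is a minimal PyTorch sketch (the layer sizes, feature count and random data are placeholders, not taken from the question): the recurrent hidden state carries information from previous time slots forward, so the measured or predicted durability never has to be copied back into the input matrix by hand.

import torch
import torch.nn as nn

class DecayRNN(nn.Module):
    def __init__(self, n_features=2, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):          # x: (batch, time, n_features), e.g. temperature and humidity
        out, _ = self.rnn(x)       # out: (batch, time, hidden)
        return self.head(out)      # one durability prediction per time slot

model = DecayRNN()
x = torch.rand(8, 20, 2)           # 8 sequences, 20 time slots, 2 environment conditions
y = torch.rand(8, 20, 1)           # measured durability per slot (placeholder data)
loss = nn.MSELoss()(model(x), y)
loss.backward()                    # gradients flow normally; no manual copying of outputs into inputs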

Related

Can HuggingFace `Trainer` be customised for curriculum learning?

I have been looking for certain features in the HuggingFace transformers Trainer object (in particular Seq2SeqTrainer) and would like to know whether they exist and, if so, how to use them, or whether I would have to write my own training loop to enable them.
I am looking to apply curriculum learning to my training strategy, as well as evaluating the model at regular intervals, and therefore would like to enable the following:
choose the order in which the model sees training samples at each epoch (it seems that the data passed to the train_dataset argument is automatically shuffled by some internal code, and even if I managed to stop that, I would still need to pass differently ordered data at different epochs, as I may want to start training the model on easy samples for a few epochs and then pass a random shuffle of all the data for later epochs)
run custom evaluation at integer multiples of a fixed number of steps. The standard compute_metrics argument of the Trainer takes a function to which the predictions and labels are passed*, and the user can decide how to generate the metrics from these. However, I'd like a finer level of control, for example changing the maximum sequence length for the tokenizer when doing evaluation as opposed to training, which would require including some explicit evaluation code inside compute_metrics that can access the trained model and the data from disk.
Can these two points be achieved by using the Trainer on a multi-GPU machine, or would I have to write my own training loop?
*The function often looks something like this, and I'm not sure it would work with the Trainer if it doesn't have this configuration:
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    ...
You can pass a custom compute_metrics function to the Trainer itself; how often it runs is controlled by the evaluation settings in the training arguments.
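As a sketch of that wiring (the metric and the 500-step interval below are placeholder assumptions, not from the question):

import numpy as np
from transformers import TrainingArguments

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    # placeholder metric: exact-match accuracy on the argmax of the logits
    preds = np.argmax(predictions, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="steps",   # run evaluation every eval_steps optimisation steps
    eval_steps=500,                # placeholder interval
)

# compute_metrics is then passed to the Trainer constructor, e.g.
# Trainer(model=..., args=args, train_dataset=..., eval_dataset=...,
#         compute_metrics=compute_metrics)

For the per-epoch ordering of samples, subclassing Trainer and overriding its get_train_dataloader method is one commonly used route, although at that point you are already writing part of your own loop.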

LSTM Evolution Forecast

I have some confusion about the way LSTM networks work when the forecast horizon is not finite, i.e. I am searching for a prediction at an arbitrary time in the future. In physical terms I would call it the evolution of the system.
Suppose I have a time series $y(t)$ (the output) that I want to forecast, and some external inputs $u_1(t), u_2(t),\cdots u_N(t)$ on which the series $y(t)$ depends.
It's common to use the lagged value of the output $y(t)$ as an input to the network, so that schematically I have something like (for simplicity, consider just lag 1 for the output and no lag for the external inputs):
$$[y(t-1), u_1(t), u_2(t),\cdots u_N(t)] \to y(t)$$
With the network set up this way, when one wants to do a recursive forecast, it is forced to use the predicted value at the previous step as the input for the next step. This causes the error to propagate, which makes the long-term forecast behave badly.
Now, my confusion is this: I think of an RNN as a kind of (simple) implementation of a state-space model, where I have the inputs, my output, and one or more state variables responsible for the memory of the system. These variables are hidden and not observed.
So now the question: if there is this kind of variable that already takes the previous states of the system into account, why would I need to use the lagged output value as an input to my network/model?
If I get rid of it, would my long-term forecast be better, since I would no longer expect propagation of the error in the forecasted output? (I guess there will still be some error in the internal state that propagates.)
Thanks !
Please see DeepAR - an LSTM forecaster that predicts more than one step into the future.
"The main contributions of the paper are twofold: (1) we propose an RNN architecture for probabilistic forecasting, incorporating a negative binomial likelihood for count data as well as special treatment for the case when the magnitudes of the time series vary widely; (2) we demonstrate empirically on several real-world data sets that this model produces accurate probabilistic forecasts across a range of input characteristics, thus showing that modern deep learning-based approaches can effectively address the probabilistic forecasting problem, which is in contrast to common belief in the field and the mixed results [...]"
In this paper, they forecast multiple steps into the future, precisely to counter the error propagation you describe.
Skipping several steps makes it possible to get more accurate predictions further into the future.
One more thing done in this paper is predicting percentiles and interpolating, rather than predicting the value directly. This adds stability and provides an error assessment.
Disclaimer - I read an older version of this paper.
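As a rough illustration of the direct multi-step idea (a generic sketch with placeholder sizes and random data, not the DeepAR architecture itself): the network maps a window of past values to all of the next H steps at once, so no predicted value is ever fed back in as an input.

import numpy as np
from tensorflow import keras

WINDOW, HORIZON, N_FEATURES = 24, 8, 1        # placeholder sizes

# direct multi-step forecaster: the last LSTM state is mapped to all HORIZON future steps
model = keras.Sequential([
    keras.layers.Input(shape=(WINDOW, N_FEATURES)),
    keras.layers.LSTM(32),
    keras.layers.Dense(HORIZON),
])
model.compile(optimizer="adam", loss="mse")

# x: (samples, WINDOW, N_FEATURES) past values; y: (samples, HORIZON) future values
x = np.random.rand(100, WINDOW, N_FEATURES).astype("float32")
y = np.random.rand(100, HORIZON).astype("float32")
model.fit(x, y, epochs=2, verbose=0)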

How to get the accuracy of classifier on test data in DeepLearning

I am trying to use DL4J for deep learning and have provided the training data with labels. I am then trying to send test data by assigning a dummy label; without providing a dummy label, it gives a runtime error. I don't understand why we need to assign a label to the test data.
Additionally, I want to know the accuracy of the predictions that were made. From what I saw in the DL4J docs, there is something known as a confusion matrix which is generated. I understand that this just gives us an idea of how well the training data has trained the system. Is there a way to get the accuracy of predictions on test data? Since we are giving a dummy label for the test data, I feel that the confusion matrix is also not generated correctly.
First, how can you test whether the network outputs the correct labels if you don't know what the correct labels are? You should always have labels for both training and testing, because that way you can check whether the output is correct.
For the second question, I've found this on the DL4J webpage:
Evaluation eval = new Evaluation(3);                      // 3 = number of classes
INDArray output = model.output(testData.getFeatures());  // run the trained model on the test features
eval.eval(testData.getLabels(), output);                  // compare predictions with the true test labels
log.info(eval.stats());                                   // print accuracy, precision, recall, F1 and the confusion matrix
It is stated there that this .stats() method displays the confusion matrix entries (one per line), accuracy, precision, recall and F1 score. Additionally, the Evaluation class can also calculate and return the following values:
Confusion Matrix
False Positive/Negative Rate
True Positive/Negative
Class Counts
F-beta, G-measure, Matthews Correlation Coefficient and more
I hope this helps you.
You may find people who can respond to your question in the DL4J dev community here: https://gitter.im/deeplearning4j/deeplearning4j/tuninghelp

Keras LSTM input - Predicting a parabolic trajectory

I want to predict the trajectory of a falling ball. That trajectory is parabolic. I know that an LSTM may be too much for this (i.e. a simpler method could suffice).
I thought that we could do this with 2 LSTM layers and a Dense layer at the end.
The end result I want is to give the model 3 heights h0, h1, h2 and let it predict h3. Then, I want to give it h1, h2, and the h3 it previously output, to predict h4, and so on, until I can predict the whole trajectory.
Firstly, what would the input shape be for the first LSTM layer?
Would it be input_shape = (3,1)?
Secondly, would the LSTM be able to predict a parabolic path?
I am getting almost a flat line, not a parabola, and I want to rule out the possibility that I am misunderstanding how to feed and shape input.
Thank you
The input shape is in the form (samples, timeSteps, features).
Your only feature is "height", so features = 1.
And since you're going to input sequences with different lengths, you can use timeSteps = None.
So, your input_shape could be (None, 1).
Since we're going to use a stateful=True layer below, we can use batch_input_shape=(1,None,1). Choose the number of "samples" (the first dimension) you want.
Your model can indeed predict the trajectory, but maybe it will need more than one layer. (The exact answer about how many layers and cells depends on knowing how the math inside the LSTM works.)
Training:
Now, first you need to train your network (only then will it be able to start predicting good things).
For training, suppose you have a sequence [h1,h2,h3,h4,h5,h6...] of true values in the correct order. (I suggest you actually have many sequences (samples), so your model learns better.)
For this sequence, you want an output predicting the next step, so your target would be [h2,h3,h4,h5,h6,h7...].
So, suppose you have a data array with shape (manySequences, steps, 1), you make:
x_train = data[:,:-1,:]
y_train = data[:,1:,:]
Now, your layers should be using return_sequences=True (every input step produces an output step), and you train the model with this data.
At this point, whether you're using stateful=True or stateful=False is not very relevant. (But if True, you always need model.reset_states() before every single epoch and sequence.)
Predicting:
For predicting, you can use stateful=True in the model. This means that when you input h1, it will produce h2, and when you input h2 it will remember the "current speed" (the state of the model) to predict the correct h3.
(In the training phase this is not important, because you're inputting the entire sequences at once, so the speed will be understood between steps of the long sequences.)
You can see the method reset_states() as set_current_speed_to(0). You will use it whenever the step you're going to input is the first step of a sequence.
Then you can do loops like this:
model.reset_states() #make speed = 0
nextH = someValueWithShape((1,1,1))
predictions = [nextH]
for i in range(steps):
    nextH = model.predict(nextH)
    predictions.append(nextH)
There is an example here, but using two features. There is a difference in that I use two models, one for training and one for predicting, but you can use only one with return_sequences=True and stateful=True (don't forget to call reset_states() at the beginning of every epoch in training).
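Putting the pieces together, a minimal end-to-end sketch (the layer size, sequence length and random data are placeholders; the layer arguments follow the ones described above):

import numpy as np
from tensorflow import keras

# one model for both phases: stateful, returning one output per input step
model = keras.Sequential([
    keras.layers.LSTM(32, return_sequences=True, stateful=True,
                      batch_input_shape=(1, None, 1)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# training: the input is the sequence, the target is the same sequence shifted one step ahead
data = np.random.rand(1, 50, 1).astype("float32")     # placeholder heights, shape (samples, steps, 1)
x_train, y_train = data[:, :-1, :], data[:, 1:, :]
for epoch in range(10):
    model.reset_states()                               # "speed = 0" before each full sequence
    model.fit(x_train, y_train, epochs=1, batch_size=1, verbose=0)

# prediction: feed one step at a time; the LSTM state carries the "current speed"
model.reset_states()
nextH = x_train[:, :1, :]                              # first known height, shape (1, 1, 1)
predictions = [float(nextH.squeeze())]
for i in range(20):
    nextH = model.predict(nextH, verbose=0)
    predictions.append(float(nextH.squeeze()))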

How to perform multi-label classification (for a CNN)?

I am currently looking into multi-label classification and I have some questions (and I couldn't find clear answers).
For the sake of clarity, let's take an example: I want to classify images of vehicles by type (car, bus, truck, ...) and by make (Audi, Volkswagen, Ferrari, ...).
So I thought about training two independent CNNs (one for the "type" classification and one for the "make" classification), but I thought it might be possible to train only one CNN on all the classes.
I read that people tend to use the sigmoid function instead of softmax to do that. I understand that sigmoid outputs do not sum to 1 like softmax does, but I don't understand how that enables multi-label classification.
My second question is: is it possible to take into account that some classes are completely independent?
Thirdly, in terms of performance (accuracy and time needed to classify a new image), isn't training two independent networks better?
Thank you to those who can give me some answers or ideas :)
Softmax is a special output function; it forces the output vector to have a single large value. Now, training neural networks works by calculating an output vector, comparing that to a target vector, and back-propagating the error. There's no reason to restrict your target vector to a single large value, and for multi-labeling you'd use a 1.0 target for every label that applies. But in that case, using a softmax for the output layer will cause unintended differences between output and target, differences that are then back-propagated.
For the second part: you define the target vectors; you can encode any sort of dependency you like there.
Finally, no - a combined network performs better than the two halves would do independently. You'd only run two networks in parallel when there's a difference in network layout, e.g. a regular NN and CNN in parallel might be viable.
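As a sketch of what the sigmoid/multi-hot setup looks like in practice (the layer sizes, image size and class count are made up for the example): each class gets its own sigmoid output trained with binary cross-entropy against a multi-hot target vector, so a "car" label and an "Audi" label can both be 1 for the same image.

import numpy as np
from tensorflow import keras

N_CLASSES = 7   # e.g. 3 vehicle types + 4 makes, flattened into one label vector (placeholder)

model = keras.Sequential([
    keras.layers.Input(shape=(64, 64, 3)),
    keras.layers.Conv2D(16, 3, activation="relu"),
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(N_CLASSES, activation="sigmoid"),   # independent per-class probabilities
])
# binary cross-entropy treats each class as its own yes/no decision
model.compile(optimizer="adam", loss="binary_crossentropy")

# multi-hot target: this image is a "car" (index 0) made by "Audi" (index 3)
y = np.zeros((1, N_CLASSES), dtype="float32")
y[0, [0, 3]] = 1.0
x = np.random.rand(1, 64, 64, 3).astype("float32")
model.fit(x, y, epochs=1, verbose=0)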