Validation accuracy when training and testing within same loop - deep-learning

I am training and validating within the same loop: after each epoch on the training set, the network is evaluated on the entire validation set.
Does it make sense to take the highest validation accuracy reached at some point (the nth epoch) as my network's accuracy, or should I only use the validation accuracy once the graph has settled and the weights stop changing?

I think you are confusing testing with validation. If possible, you should keep a separate test set of your dataset, used only AFTER training and validation are done.
Although you can use that test set for your validation, it's best practice to do inference on a separate set of data that the model has never seen before.
To answer your question: a spike in validation accuracy may or may not turn out to be the highest, since more epochs generally give more chances of reaching a higher accuracy (assuming the model is not overfitting the dataset).
In this situation, it's best to save the model after every improvement in validation accuracy, so that you are left with the model with the highest validation accuracy once training and validation have completed.
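For example, with Keras this kind of checkpointing can be done with a callback. A minimal sketch, assuming `model`, `x_train`/`y_train` and `x_val`/`y_val` are already defined (the metric name may be `val_acc` on older Keras versions):

```python
from keras.callbacks import ModelCheckpoint

# Save a snapshot only when the monitored validation metric improves,
# so the best-scoring weights are available after training finishes.
checkpoint = ModelCheckpoint(
    "best_model.h5",
    monitor="val_accuracy",  # "val_acc" on older Keras versions
    save_best_only=True,
    mode="max",
)

model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=50,
    callbacks=[checkpoint],
)
```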

Related

CNN: Normal that the validation loss decreases much slower than training loss?

I'm training a CNN U-Net model for semantic segmentation of images; however, the training loss seems to decrease at a much faster rate than the validation loss. Is this normal?
I'm using a loss of 0.002
The training and validation loss can be seen in the image below:
Yes, this is perfectly normal.
As the NN learns, it fits the training samples, which it therefore knows better at each iteration. The validation set is never used to update the weights during training; that is why it is so important.
Basically:
as long as the validation loss decreases (even slightly), it means the NN is still able to learn/generalise better,
as soon as the validation loss stagnates, you should stop training,
if you keep training, the validation loss will likely increase again; this is called overfitting. Put simply, the NN learns the training data "by heart" instead of really generalising to unknown samples (such as those in the validation set).
We usually use early stopping to avoid the last case: basically, if your validation loss doesn't improve for X iterations (or epochs), stop training (X being a value such as 5 or 10).
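A framework-agnostic sketch of that rule, where `model`, `train_loader`, `val_loader`, `train_one_epoch` and `evaluate` are hypothetical placeholders standing in for your own training and validation code:

```python
# Patience-based early stopping sketch.
best_val_loss = float("inf")
patience = 10                  # the "X" above: epochs allowed without improvement
epochs_without_improvement = 0
max_epochs = 100

for epoch in range(max_epochs):
    train_one_epoch(model, train_loader)       # hypothetical training step
    val_loss = evaluate(model, val_loader)     # hypothetical validation step

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Stopping at epoch {epoch}: no improvement for {patience} epochs")
            break
```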

difference between accuracy from confusion matrix and validation accuracy

What is the difference between the accuracy obtained from a confusion matrix and the validation accuracy obtained after several epochs?
I am new to deep learning.
The validation accuracy (from each epoch) is the accuracy obtained for each trial (i.e., each combination of parameters/weights). It is not meant to be used for reporting; it is just a log of the status of the training/learning process.
If you want to measure the accuracy of a trained (fixed/frozen) model (be it deep learning or any other learning algorithm), use a separate held-out set and compute the confusion matrix yourself (there is a metrics package in sklearn).
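For instance, with scikit-learn (toy labels and predictions shown purely for illustration):

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# Ground-truth labels and predictions of a frozen model on a held-out set
# (toy values for illustration).
y_true = [0, 1, 1, 2, 0, 2]
y_pred = [0, 1, 2, 2, 0, 1]

cm = confusion_matrix(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)   # equals cm.trace() / cm.sum()
print(cm)
print("accuracy:", acc)
```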

Is this a valid way to speed up kFold cross validations for deep neural network training?

In the context of convolutional neural network training, I need to do a 10-fold cross-validation of my training set. Training just 1 of the 10 folds takes at least one hour on my GPU, which means the total time for training all 10 folds independently would be at least 10 hours! To speed up training, will my kFold result be valid if I load and tune the trained weights from the fully trained model of the first fold (fold1) for each of the remaining kFold models (fold2, fold3... fold10)? Is there any side effect?
That won't be doing any cross-validation.
The point of retraining the net is to ensure that it's trained on a different subset of your data, and for every full training, it keeps aside a set of validation data that it has never seen. If you reload your weights from a previous training instance, you're going to be validating against data your network has already seen, and your cross-validation score will be inflated.
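For reference, a sketch of the standard procedure, where `X` and `y` are the full training set as arrays and `build_model`, `train` and `evaluate` are hypothetical stand-ins for your own code; each fold gets a freshly initialised model:

```python
import numpy as np
from sklearn.model_selection import KFold

kf = KFold(n_splits=10, shuffle=True, random_state=0)
scores = []

for train_idx, val_idx in kf.split(X):
    model = build_model()                       # fresh weights for every fold
    train(model, X[train_idx], y[train_idx])    # never reuse an earlier fold's weights
    scores.append(evaluate(model, X[val_idx], y[val_idx]))

print("mean 10-fold score:", np.mean(scores))
```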

How we know when to stop training a model on a pre-trained model?

My apologies, since my question may sound like a stupid question, but I am quite new to deep learning and Caffe.
How can we tell how many iterations are required to fine-tune a pre-trained model on our own dataset? For example, I am running fcn32 on my own data with 5 classes. How can I tell when to stop the fine-tuning process by looking at the loss and accuracy of the training phase?
Many thanks
You shouldn't decide by looking at the loss or accuracy of the training phase. In theory, the training accuracy should always increase (and the training loss should always decrease), because that is the objective the network is trained on. But a high training accuracy doesn't necessarily mean a high test accuracy; that's what we refer to as the over-fitting problem. So what you need to find is the point where the accuracy on the test set (or validation set, if you have one) stops increasing. You can do this simply by specifying a relatively large number of iterations up front and then monitoring the test accuracy or test loss: if the test accuracy stops increasing (or the loss stops decreasing) for N consecutive iterations (or epochs), where N could be 10 or another number you choose, stop the training process.
The best thing to do is to track training and validation accuracy and store snapshots of the weights every k iterations. To compute validation accuracy you need a separate set of held-out data that you do not use for training.
Then, you can stop once the validation accuracy stops increasing or starts decreasing. This is called early stopping in the literature. Keras, for example, provides functionality for this: https://keras.io/callbacks/#earlystopping
Also, it's good practice to plot the above quantities, because it gives you important insights into the training process. See http://cs231n.github.io/neural-networks-3/#accuracy for a great illustration (not specific to early stopping).
Hope this helps
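A minimal sketch of that Keras callback (the model and data variables are assumed to exist already; argument availability may vary slightly across Keras versions):

```python
from keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(
    monitor="val_loss",          # watch the held-out loss
    patience=10,                 # stop after 10 epochs without improvement
    restore_best_weights=True,   # roll back to the best snapshot
)

model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=200,
    callbacks=[early_stopping],
)
```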
Normally you converge to a specific validation accuracy for your model. In practice, you usually stop training if the validation loss has not decreased for x epochs. Depending on your epoch duration, x most commonly varies between 5 and 20.
Edit:
An epoch is one iteration over your dataset for training, in ML terms. You do not seem to have a validation set. Normally the data is split into training and validation data so you can see how well your model performs on unseen data and make decisions about which model to keep by looking at it. You might want to take a look at http://caffe.berkeleyvision.org/gathered/examples/mnist.html to see the usage of a validation set, even though they call it a test set.
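If you just need to carve a validation set out of your data, a quick sketch with scikit-learn (assuming `X` and `y` hold your full dataset; the 80/20 split is only an illustrative choice):

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the data for validation; the model never trains on it.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
```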

Testing a network without setting number of iterations

I have a pre-trained network with which I would like to test my data. I defined the network architecture in a .prototxt, and my data layer is a custom Python layer that receives a .txt file with the paths of my data and their labels, preprocesses them, and then feeds them to the network.
At the end of the network, I have a custom Python layer that gets the class prediction made by the net and the label (from the first layer) and prints, for example, the accuracy over all batches.
I would like to run the network until all examples have passed through the net.
However, while searching for the command to test a network, I've found:
caffe test -model architecture.prototxt -weights model.caffemodel -gpu 0 -iterations 100
If I don't set the -iterations, it uses the default value (50).
Does any of you know a way to run caffe test without setting the number of iterations?
Thank you very much for your help!
No, Caffe does not have a facility to detect that it has run exactly one epoch (use each input vector exactly once). You could write a validation input routine to do that, but Caffe expects you to supply the quantity. This way, you can generate easily comparable results for a variety of validation data sets. However, I agree that it would be a convenient feature.
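Since batch_size × iterations inputs are processed during the test pass, the value to pass to -iterations for exactly one epoch can be computed like this (num_examples and batch_size are whatever your list file and data layer actually use; the numbers below are only examples):

```python
import math

num_examples = 10000   # lines in your .txt list file (example value)
batch_size = 50        # batch size configured in the data layer (example value)

# One epoch = each example seen once; round up so no example is skipped.
iterations = math.ceil(num_examples / batch_size)
print(f"caffe test -model architecture.prototxt -weights model.caffemodel -gpu 0 -iterations {iterations}")
```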
The lack of this feature is probably related to its absence in training and in the interstitial testing as well.
In training, we tune the hyper-parameters to get the most accurate model for a given application. As it turns out, this depends more closely on the total number of inputs processed (TOTAL_NUM) than on the number of epochs (given a sufficiently large training set).
With a fixed training set, we often graph accuracy (y-axis) against epochs (x-axis), because that gives tractable results as we adjust batch size. However, if we cut the size of the training set in half, the most comparable graph would scale on TOTAL_NUM rather than the epoch number.
Also, by restricting the size of the test set, we avoid long waits for that feedback during training. For instance, in training against the ImageNet data set (1.2M images), I generally test with around 1000 images, typically no more than 5 times per epoch.