So basically one splits the dataset into training and testing, say 2/3 for training and the rest for testing.
Then in Caffe we split our training data into batches; let's say we have 100 batches of 50 images each, so 5000 training images in total. Now let's say we also have 50 testing batches of 50 images each.
Now let's say Caffe runs 1 epoch and then tests with the testing batches. How does Caffe do this?
Does it take the first training batch and, with it, try to predict the labels of every testing batch?
Like:
training_batch_1 : testing_batch_1 = accuracy xxxx;
training_batch_1 : testing_batch_2 = accuracy xxxx;
....
training_batch_1 : testing_batch_50 = accuracy xxxx;
And then it computes the mean accuracy for training_batch_1, then does the same thing with training_batch_2, and so on?
A test simply runs the input vector through a single forward pass of the trained model. Does the top predicted label match the given test value? If so, score 1 point. At the end of the batch, divide total points by batch size, and that's the batch accuracy.
At the end of the testing run, take the mean of the batch accuracies; that's the testing accuracy.
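In other words, the bookkeeping is just top-1 matching and averaging. Here is a toy NumPy sketch of it (random scores stand in for the forward passes; the batch and class counts are taken from the question):

import numpy as np

def batch_accuracy(scores, labels):
    # Top-1 prediction per image; score 1 point per match, divide by batch size.
    preds = np.argmax(scores, axis=1)
    return float(np.mean(preds == labels))

# Toy run: 50 test batches of 50 images each, 10 classes, random scores/labels.
rng = np.random.default_rng(0)
batch_accs = []
for _ in range(50):
    scores = rng.normal(size=(50, 10))      # stand-in for one forward pass
    labels = rng.integers(0, 10, size=50)
    batch_accs.append(batch_accuracy(scores, labels))

# The reported testing accuracy is the mean of the per-batch accuracies.
print(np.mean(batch_accs))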
Is that what you needed to know?
I'm trying to train a deep neural network model; the output shape of each iteration in one epoch is [64,1600,8] (64 is the batch size). But in the last iteration of the first epoch this output changed to [54,1600,8] and I got a dimension error. Why did the batch size change in the last iteration?
Additionally, if I change the batch size to 32, the last iteration's output is [22,1600,8].
I think the output of the last iteration should have the same shape as the other iterations.
The last iteration batch size changed because you did not have enough data to completely fill the batch. If you have a batch size of 10, for example, and you have 101 entries total in your data, then you will have 10 batches of 10 and 1 batch of 1.
The solution is either to drop the batch if it is not the correct size, or to adapt your model so that it detects the size of the incoming batch and adjusts accordingly, instead of having the batch size hard-coded into your model parameters (see the sketch below).
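A minimal PyTorch sketch of the second option, assuming the hard-coded batch size lived in a flattening step (the layer sizes simply mirror the [batch, 1600, 8] shapes from the question):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(1600 * 8, 10)

    def forward(self, x):
        # Read the batch size from the input instead of hard-coding 64,
        # so the smaller final batch still goes through.
        batch_size = x.size(0)
        return self.fc(x.view(batch_size, -1))

net = Net()
print(net(torch.randn(54, 1600, 8)).shape)   # works for the odd-sized last batch too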
Seeing that you are using PyTorch, I'll add to the answer by Richard: PyTorch DataLoaders have built-in functionality to drop the last (incomplete) batch. Checking the documentation, you can specify drop_last=True while instantiating the DataLoader.
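For example (toy dataset of 101 samples, so the last batch would otherwise hold a single example):

import torch
from torch.utils.data import DataLoader, TensorDataset

data = TensorDataset(torch.randn(101, 8), torch.randint(0, 2, (101,)))

# drop_last=True discards the final incomplete batch, so every batch has size 10.
loader = DataLoader(data, batch_size=10, shuffle=True, drop_last=True)

for x, y in loader:
    print(x.shape)   # always torch.Size([10, 8])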
First of all, thank you in advance for the answers to come. I am confused about the use of K-Fold Cross Validation (CV) with CNN.
When working with CV under normal conditions, as seen in the link below, the original dataset is first split into test and training sets.
https://miro.medium.com/max/875/1*pJ5jQHPfHDyuJa4-7LR11Q.png
Then, within the K-fold cycle, the training dataset is divided into training and validation folds according to the chosen K. In short, if K = 5, training is repeated 5 times, and each time a newly trained model is produced.
Question 1: How can we calculate the overall training/validation accuracy and loss of the 5 different models? Do we need to average the scores of all the models?
Question 2: We separated the TEST set from the original dataset at the beginning of training. How should we evaluate the TEST set against 5 different models? Should we test on all 5 models and average their accuracies, or test only on the most successful model?
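For reference, a minimal sketch of the loop described above, using scikit-learn's KFold; LogisticRegression is just a runnable stand-in for the CNN, and averaging the K validation scores is one common way to report the result asked about in Question 1:

import numpy as np
from sklearn.linear_model import LogisticRegression      # stand-in for the CNN
from sklearn.model_selection import KFold, train_test_split

X, y = np.random.rand(500, 32), np.random.randint(0, 10, 500)

# Hold out the TEST set first, as in the diagram; it is never touched during CV.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
val_scores = []
for train_idx, val_idx in kf.split(X_trainval):
    model = LogisticRegression(max_iter=1000)             # one fresh model per fold
    model.fit(X_trainval[train_idx], y_trainval[train_idx])
    val_scores.append(model.score(X_trainval[val_idx], y_trainval[val_idx]))

# Report the mean (and spread) of the K validation scores.
print(np.mean(val_scores), np.std(val_scores))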
I trained an image classification model with 10 classes by fine-tuning EfficientNet-B4 for 100 epochs. I split my training data 70/30. I used stochastic gradient descent with Nesterov momentum of 0.9, a starting learning rate of 0.001, and a batch size of 10. The test accuracy seemed to be stuck at 84% for the next 50 epochs (51st to 100th). I do not know whether the model was stuck in a local minimum or was overfitting. Below is an image of the test and train loss from the 51st to the 100th epoch. I need your help a lot. Thanks. Train/test loss image from the 51st to the 100th epoch.
From the graph you provided, both validation and training losses are still going down, so your model is still training and there is no overfitting. If your test set is stuck at the same accuracy, the reason is probably that the data you are using for your training/validation split does not generalize well enough to your test set (in your graph the validation only reaches 50% accuracy while your test set reaches 84%).
I looked at your training and validation graph. Yes, your model is training and the losses are going down, but your validation error is near 50%, which amounts to random guessing.
Possible reasons:
1- Judging from your training error (shown in the image between epochs 50 and 100), the error is going down on average but is noisy: the error at epoch 100 is pretty much the same as at epoch 70. This could be because your dataset is too simple and you are forcing a huge network like EfficientNet to overfit it.
2- It could also be the way you are fine-tuning it: for example, which layers you froze and for which layers you take gradients during backpropagation (see the sketch below). I am assuming you are using pre-trained weights.
3- An optimizer issue. Try Adam instead.
It would be great if you could provide the full loss curves (from epoch 1 to 100).
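To illustrate point 2, here is a minimal freezing sketch, assuming a PyTorch/torchvision setup (the question does not state the framework): the pre-trained backbone is frozen and only a new 10-class head receives gradients.

import torch
import torch.nn as nn
from torchvision import models

# Pre-trained EfficientNet-B4 (torchvision >= 0.13 weights API assumed).
model = models.efficientnet_b4(weights=models.EfficientNet_B4_Weights.DEFAULT)

# Freeze the backbone so no gradients flow into it during backprop.
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier head for 10 classes; its parameters stay trainable.
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 10)

# Only hand the trainable parameters to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)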
I'm using Keras. When I run model.fit_generator(...), each step takes about 1.5 seconds, but the last step takes a few minutes.
Epoch 1/50
30/31 [============================>.] - ETA: 0s - loss: 2.0676 - acc: 0.2010
Why?
This happens because you are passing validation data to Keras, through a parameter in model.fit or model.fit_generator.
After each epoch, Keras takes the validation data and evaluates the model on it, which means one forward pass for every validation data point. This can take a lot of time and make it look as though Keras is stuck, but it is necessary when training a model.
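A minimal sketch of what triggers the pause (toy data; the point is only that validation_data is evaluated at the end of each epoch, which is exactly when the progress bar appears to hang):

import numpy as np
from tensorflow import keras

x_train, y_train = np.random.rand(1000, 20), np.random.randint(0, 2, 1000)
x_val, y_val = np.random.rand(5000, 20), np.random.randint(0, 2, 5000)

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The validation forward passes run after the last training step of each epoch;
# with a large validation set, that last step seems to take much longer.
model.fit(x_train, y_train, epochs=2, batch_size=32,
          validation_data=(x_val, y_val))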
I faced this issue while training a CNN, and found that decreasing the image dimensions speeds up the training. The processing time drops because the input is smaller during both the forward pass and backpropagation (when updating the weights). For example, if you are using a CNN for image classification, a 64*64 image is processed much faster than a 256*256 one, though obviously at the cost of losing information due to the lower resolution.
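One way to do this in Keras, as an illustration (the 64x64 target and the tiny model below are arbitrary):

from tensorflow import keras

# Resize 256x256 inputs to 64x64 inside the model, before the convolutional
# stack, trading resolution for speed.
model = keras.Sequential([
    keras.Input(shape=(256, 256, 3)),
    keras.layers.Resizing(64, 64),
    keras.layers.Conv2D(16, 3, activation="relu"),
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(10, activation="softmax"),
])
model.summary()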
I am training a deep autoencoder (for now, 5 encoding layers and 5 decoding layers, using leaky ReLU) to reduce the dimensionality of the data from about 2000 dimensions to 2. I can train my model on 10k data points, and the outcome is acceptable.
The problem arises when I use bigger data (50k to 1M samples). Using the same model with the same optimizer, dropout, etc. does not work, and the training gets stuck after a few epochs.
I am trying to do a hyper-parameter search on the optimizer (I am using Adam), but I am not sure this will solve the problem.
Should I look for something else to change or check? Does the batch size matter in this case? Should I solve the problem by fine-tuning the optimizer? Should I play with the dropout ratio? ...
Any advice is very much appreciated.
p.s. I am using Keras. It is very convenient. If you do not know about it, then check it out: http://keras.io/
I would have the following questions when trying to find a cause of the problem:
1) What happens if you change the size of the middle layer from 2 to something bigger? Does it improve the performance of the model trained on the >50k training set?
2) Are 10k training examples and test examples randomly selected from 1M dataset?
My guess is that your model is simply not able to squeeze your 50k-1M data through just 2 dimensions in the middle layer and reconstruct it. It is easier for the model to fit its parameters on 10k data points, and the middle-layer activations are more meaningful in that case, but for >50k data the activations become random noise.
After some investigation, I have realized that the layer configuration I was using is ill-suited to the problem, and this seems to cause at least part of the problem.
I had been using a sequence of layers for encoding and decoding whose sizes were chosen to decrease linearly, for example:
input: 1764 (dims)
hidden1: 1176
hidden2: 588
encoded: 2
hidden3: 588
hidden4: 1176
output: 1764 (same as input)
However, this seems to work only occasionally and is sensitive to the choice of hyperparameters.
I tried replacing it with exponentially decreasing layer sizes for the encoder (and the reverse for the decoder), so:
1764, 128, 16, 2, 16, 128, 1764
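A minimal Keras sketch of this layout (dense layers with leaky ReLU, as in the original setup; the mean-squared-error loss and the Adam defaults are just placeholders):

from tensorflow import keras
from tensorflow.keras import layers

def build_autoencoder(input_dim=1764, widths=(128, 16), code_dim=2):
    inputs = keras.Input(shape=(input_dim,))
    x = inputs
    # Encoder: exponentially shrinking dense layers with leaky ReLU.
    for w in widths:
        x = layers.Dense(w)(x)
        x = layers.LeakyReLU()(x)
    code = layers.Dense(code_dim, name="code")(x)
    # Decoder mirrors the encoder.
    x = code
    for w in reversed(widths):
        x = layers.Dense(w)(x)
        x = layers.LeakyReLU()(x)
    outputs = layers.Dense(input_dim)(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

autoencoder = build_autoencoder()
autoencoder.summary()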
Now, in this case, the training seems to happen more robustly. I still have to run a hyperparameter search to see whether this configuration is sensitive as well, but a few manual trials seem to show that it is robust.
I will post an update if I encounter some other interesting points.