My solver.prototxt using Adam is as follows. Do I need to add or remove any terms? The loss doesn't seem to decrease.
net: "/home/softwares/caffe-master/examples/hpm/hp.prototxt"
test_iter: 6
test_interval: 1000
base_lr: 0.001
momentum: 0.9
momentum2: 0.999
delta: 0.00000001
lr_policy: "fixed"
regularization_type: "L2"
stepsize: 2000
display: 100
max_iter: 20000
snapshot: 1000
snapshot_prefix: "/home/softwares/caffe-master/examples/hpm/hp"
type: "Adam"
solver_mode: GPU
Comparing with the Caffe MNIST example, 'stepsize' can be deleted since 'lr_policy' is 'fixed'.
How is your work going? If you used Adam, I suggest you look at the settings in the Caffe example. I do not know why you have the L2 regularization and delta values. This is the standard setting:
# The train/test net protocol buffer definition
# this follows "ADAM: A METHOD FOR STOCHASTIC OPTIMIZATION"
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# All parameters are from the cited paper above
base_lr: 0.001
momentum: 0.9
momentum2: 0.999
# since Adam dynamically changes the learning rate, we set the base learning
# rate to a fixed value
lr_policy: "fixed"
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
type: "Adam"
# solver mode: CPU or GPU
solver_mode: GPU
Try a learning rate of 0.1 and a smaller step size such as 300 and watch the behavior. Also check that the lmdb/hdf5 file is well formed and that the data is scaled sensibly to ease learning; one way to check this is to generate the mean file over your dataset.
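As a quick sanity check on the data scale, here is a minimal sketch that reads a few records straight from an LMDB and prints their value ranges. It assumes the Python lmdb package and pycaffe are installed; the database path "examples/hpm/train_lmdb" is only a placeholder for your own training LMDB (the mean file itself is usually produced with Caffe's compute_image_mean tool).
import lmdb
import caffe

# open the training database read-only and inspect the first few images
env = lmdb.open("examples/hpm/train_lmdb", readonly=True)  # placeholder path
with env.begin() as txn:
    for i, (key, value) in enumerate(txn.cursor()):
        datum = caffe.proto.caffe_pb2.Datum()
        datum.ParseFromString(value)
        img = caffe.io.datum_to_array(datum)  # shape: (channels, height, width)
        print(key, img.shape, img.min(), img.max(), img.mean())
        if i >= 9:  # only look at the first ten records
            break
If the minimum/maximum values are far outside the range your network expects (for example raw 0-255 pixels with no scale or mean subtraction in the data layer), that alone can keep the loss from decreasing.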
I'm training a U-Net neural network. During training, each iteration has a "loss value". This value generally converges, but sometimes jumps around. Which weights are finally saved in the .caffemodel file?
What happens if I save it at iteration 20000, and that just so happens to be a point where the loss jumped up a bit and is not the lowest loss seen so far? Are the weights and biases saved from the last iteration, or something smarter like the lowest of the last 5% of iterations?
Thank you
solver.prototxt has a parameter called "snapshot":
net: "path/to/train.prototxt"
.
.
max_iter: 20000
snapshot: 1000
snapshot_prefix: "path/to/caffemodel/"
solver_mode: GPU
For example, if you set snapshot: 1000, then every 1000 iterations one .caffemodel file is saved with the weights of that iteration, regardless of whether the loss was lower at some earlier iteration.
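So the saved weights are simply those of the snapshot iteration; nothing like "best of the last 5%" happens. If you want the snapshot with the lowest validation loss, you can score the saved snapshots yourself. A minimal sketch with pycaffe, where "path/to/val.prototxt", the snapshot directory, and the output blob name "loss" are placeholders/assumptions for your own setup:
import glob
import caffe

caffe.set_mode_gpu()
best_weights, best_loss = None, float("inf")
for weights in sorted(glob.glob("path/to/caffemodel/*.caffemodel")):
    # load the validation net with this snapshot's weights
    net = caffe.Net("path/to/val.prototxt", weights, caffe.TEST)
    # average the loss over a number of validation batches
    losses = [float(net.forward()["loss"]) for _ in range(50)]
    mean_loss = sum(losses) / len(losses)
    print(weights, mean_loss)
    if mean_loss < best_loss:
        best_weights, best_loss = weights, mean_loss
print("best snapshot:", best_weights, best_loss)
You would then deploy or fine-tune from whichever .caffemodel this reports as best.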
I'm about to experiment with a neural network for handwriting recognition, which can be found here:
https://github.com/mnielsen/neural-networks-and-deep-learning/blob/master/src/network.py
If the weights and biases are randomly initialized, it recognizes over 80% of the digits after a few epochs. If I add a small offset of 0.27 to all weights and biases after initialization, learning is much slower, but eventually it reaches the same accuracy of over 80%:
self.biases = [np.random.randn(y, 1)+0.27 for y in sizes[1:]]
self.weights = [np.random.randn(y, x)+0.27 for x, y in zip(sizes[:-1], sizes[1:])]
Epoch 0 : 205 / 2000
Epoch 1 : 205 / 2000
Epoch 2 : 205 / 2000
Epoch 3 : 219 / 2000
Epoch 4 : 217 / 2000
...
Epoch 95 : 1699 / 2000
Epoch 96 : 1706 / 2000
Epoch 97 : 1711 / 2000
Epoch 98 : 1708 / 2000
Epoch 99 : 1730 / 2000
If I instead add an offset of 0.28 to all weights and biases after initialization, the network does not learn at all anymore.
self.biases = [np.random.randn(y, 1)+0.28 for y in sizes[1:]]
self.weights = [np.random.randn(y, x)+0.28 for x, y in zip(sizes[:-1], sizes[1:])]
Epoch 0 : 207 / 2000
Epoch 1 : 209 / 2000
Epoch 2 : 209 / 2000
Epoch 3 : 209 / 2000
Epoch 4 : 209 / 2000
...
Epoch 145 : 234 / 2000
Epoch 146 : 234 / 2000
Epoch 147 : 429 / 2000
Epoch 148 : 234 / 2000
Epoch 149 : 234 / 2000
I think this has to do with the sigmoid function, which gets very flat close to one and zero. But what happens at this point, when the mean of the weights and biases is 0.28? Why is there such a steep drop in the number of recognized digits? And why are there outliers like the 429 above?
Initialization plays a big role in training networks. A good initialization can make training and convergence a lot faster, while a bad one can make it many times slower; it can even determine whether the network converges at all.
You might want to read this for some more information on the topic:
https://towardsdatascience.com/weight-initialization-in-neural-networks-a-journey-from-the-basics-to-kaiming-954fb9b47c79
By adding 0.27 to all weights and biases you probably shift the network away from the optimal solution and increase the gradients. Depending on the layer count this can lead to exploding gradients, so you get very big weight updates every iteration. What could be happening is that you have some weight that is 0.3 (after adding 0.27 to it), and say the optimal value would be 0.1. Now you get an update of -0.4 and you are at -0.1. The next update might be 0.4 (or something close) and you are back at the original problem. So instead of moving slowly towards the optimal value, the optimization just overshoots and bounces back and forth. This might settle after some time, or it can lead to no convergence at all since the network just keeps bouncing around.
Also, in general you want biases to be initialized to 0 or very close to zero. If you experiment further, try not adding 0.27 to the biases and instead keeping them at 0 or something close to it. Maybe by doing this the network can actually learn again.
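To see the saturation effect mentioned in the question, here is a small illustrative sketch (not from the original post) that compares the sigmoid gradient for a centered versus a +0.27-shifted weight vector on an MNIST-sized input. The input values are only a rough stand-in for real pixel data.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.random(784) * 0.5                  # rough stand-in for one MNIST image (784 pixels in [0, 0.5])
w_centered = rng.standard_normal(784)      # randn initialization, mean 0
w_shifted = w_centered + 0.27              # the shifted initialization

for name, w in [("centered", w_centered), ("shifted by 0.27", w_shifted)]:
    z = w @ x                                # pre-activation of one hidden neuron
    grad = sigmoid(z) * (1.0 - sigmoid(z))   # derivative of the sigmoid at z
    print(f"{name}: z = {z:.1f}, sigmoid'(z) = {grad:.2e}")
With the shift, the pre-activation grows roughly with the sum of the inputs, the sigmoid saturates, and its gradient underflows to essentially zero, so backpropagation has almost nothing to work with in that layer.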
Some of my parameters:
base_lr: 0.04
max_iter: 170000
lr_policy: "poly"
batch_size = 8
iter_size = 16
This is how the training process has looked so far. The loss seems stagnant; is there a problem here, or is this normal?
The solution for me was to lower the base learning rate by a factor of 10 before resuming training from a solverstate snapshot.
To achieve this same solution automatically, you can set the "gamma" and "stepsize" parameters in your solver.prototxt:
base_lr: 0.04
lr_policy: "step"
stepsize: 10000
gamma: 0.1
max_iter: 170000
batch_size = 8
iter_size = 16
This will reduce your base_lr by a factor of 10 every 10,000 iterations. (Note that lr_policy must be "step" for gamma and stepsize to take effect; the "poly" policy ignores them.)
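For reference, with the "step" policy Caffe computes the rate as base_lr * gamma^floor(iter / stepsize). A quick sketch of the resulting schedule for the values above:
# learning rate used over the first 50k iterations under lr_policy: "step"
base_lr, gamma, stepsize = 0.04, 0.1, 10000
for it in range(0, 50001, 10000):
    print(it, base_lr * gamma ** (it // stepsize))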
Please note that it is normal for the loss to fluctuate, and even to hover around a constant value before making a dip. This could be the cause of your issue; I would suggest training well beyond 1800 iterations before falling back on the approach above. Look up graphs of Caffe training loss logs.
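If you want to plot your own loss curve, here is a minimal sketch that parses the "Iteration N, loss = X" lines the Caffe solver normally prints; "train.log" is a placeholder for your own console log, and matplotlib is assumed to be installed.
import re
import matplotlib.pyplot as plt

iters, losses = [], []
with open("train.log") as f:
    for line in f:
        # match lines like "... Iteration 100, loss = 0.345"
        m = re.search(r"Iteration (\d+), loss = ([0-9.eE+-]+)", line)
        if m:
            iters.append(int(m.group(1)))
            losses.append(float(m.group(2)))

plt.plot(iters, losses)
plt.xlabel("iteration")
plt.ylabel("training loss")
plt.show()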
Additionally, please direct all future questions to the caffe mailing group. This serves as a central location for all caffe questions and solutions.
I struggled with this myself and didn't find solutions anywhere before I figured it out. Hope what worked for me will work for you!
I have trained FCN32 for semantic segmentation from scratch on my data, and I got the following output:
As can be seen, this is not a good learning curve; it indicates that training on the data is not proceeding properly.
The solver is as follows:
net: "train_val.prototxt"
#test_net: "val.prototxt"
test_iter: 5105 #736
# make test net, but don't invoke it from the solver itself
test_interval: 1000000 #20000
display: 50
average_loss: 50
lr_policy: "step" #"fixed"
stepsize: 50000 #+
gamma: 0.1 #+
# lr for unnormalized softmax
base_lr: 1e-10
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 600000
weight_decay: 0.0005
snapshot: 30000
snapshot_prefix: "snapshot/FCN32s_CNN1"
test_initialization: false
solver_mode: GPU
After changing the learning rate to 0.001, it became worse.
I am wondering what I can do to improve the training. Thanks
You can try varying the learning rate. Good values are normally something between 0.1 and 0.0001.
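One way to compare a few rates quickly is a short probe run per candidate. A hedged sketch with pycaffe, assuming a hypothetical solver_template.prototxt containing a BASE_LR placeholder line and a net whose output blob is named "loss":
import caffe

caffe.set_mode_gpu()
template = open("solver_template.prototxt").read()  # placeholder template file

for lr in [1e-1, 1e-2, 1e-3, 1e-4]:
    # write a temporary solver with the candidate learning rate filled in
    with open("solver_probe.prototxt", "w") as f:
        f.write(template.replace("BASE_LR", str(lr)))
    solver = caffe.get_solver("solver_probe.prototxt")
    solver.step(200)                                 # short probe run
    loss = float(solver.net.blobs["loss"].data)      # assumes the loss blob is called "loss"
    print(f"base_lr = {lr}: training loss after 200 iterations = {loss:.4f}")
The rate whose loss drops fastest in the probe run is usually the best starting point for a full training run.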
I'm trying to implement FCN-8s using my own custom data. While training from scratch, I see that my loss = -nan.
Could someone suggest what is going wrong and how I can correct this? My solver.prototxt is as follows:
The train_val.prototxt is the same as given in the link above. My custom images are of size 3x512x640 and the labels are 1x512x640. There are 11 different types of labels.
net: "/home/ubuntu/CNN/train_val.prototxt"
test_iter: 13
test_interval: 500
display: 20
average_loss: 20
lr_policy: "fixed"
base_lr: 1e-4
momentum: 0.99
iter_size: 1
max_iter: 3000
weight_decay: 0.0005
snapshot: 200
snapshot_prefix: "train"
test_initialization: false