Weird loss pattern when using two losses in caffe - deep-learning

I am training a CNN in caffe and get the following weird loss pattern:
I0425 16:38:58.305482 23335 solver.cpp:398] Test net output #0: loss = nan (* 1 = nan loss)
I0425 16:38:58.305524 23335 solver.cpp:398] Test net output #1: loss_intermediate = inf (* 1 = inf loss)
I0425 16:38:59.235857 23335 solver.cpp:219] Iteration 0 (-4.2039e-45 iter/s, 20.0094s/50 iters), loss = 18284.4
I0425 16:38:59.235926 23335 solver.cpp:238] Train net output #0: loss = 18274.9 (* 1 = 18274.9 loss)
I0425 16:38:59.235942 23335 solver.cpp:238] Train net output #1: loss_intermediate = 9.46859 (* 1 = 9.46859 loss)
I0425 16:38:59.235955 23335 sgd_solver.cpp:105] Iteration 0, lr = 1e-06
I0425 16:39:39.330327 23335 solver.cpp:219] Iteration 50 (1.24704 iter/s, 40.0948s/50 iters), loss = 121737
I0425 16:39:39.330410 23335 solver.cpp:238] Train net output #0: loss = 569.695 (* 1 = 569.695 loss)
I0425 16:39:39.330425 23335 solver.cpp:238] Train net output #1: loss_intermediate = 121168 (* 1 = 121168 loss)
I0425 16:39:39.330433 23335 sgd_solver.cpp:105] Iteration 50, lr = 1e-06
I0425 16:40:19.372197 23335 solver.cpp:219] Iteration 100 (1.24868 iter/s, 40.0421s/50 iters), loss = 34088.4
I0425 16:40:19.372268 23335 solver.cpp:238] Train net output #0: loss = 369.577 (* 1 = 369.577 loss)
I0425 16:40:19.372283 23335 solver.cpp:238] Train net output #1: loss_intermediate = 33718.8 (* 1 = 33718.8 loss)
I0425 16:40:19.372292 23335 sgd_solver.cpp:105] Iteration 100, lr = 1e-06
I0425 16:40:59.501541 23335 solver.cpp:219] Iteration 150 (1.24596 iter/s, 40.1297s/50 iters), loss = 21599.6
I0425 16:40:59.501606 23335 solver.cpp:238] Train net output #0: loss = 478.262 (* 1 = 478.262 loss)
I0425 16:40:59.501621 23335 solver.cpp:238] Train net output #1: loss_intermediate = 21121.3 (* 1 = 21121.3 loss)
...
I0425 17:09:01.895849 23335 solver.cpp:219] Iteration 2200 (1.24823 iter/s, 40.0568s/50 iters), loss = 581.874
I0425 17:09:01.895912 23335 solver.cpp:238] Train net output #0: loss = 532.049 (* 1 = 532.049 loss)
I0425 17:09:01.895926 23335 solver.cpp:238] Train net output #1: loss_intermediate = 49.8377 (* 1 = 49.8377 loss)
I0425 17:09:01.895936 23335 sgd_solver.cpp:105] Iteration 2200, lr = 1e-06
FYI: My network basically consists of two stages, which is why I have two losses. The first stage can be seen as a coarse stage, and the second stage upsamples the output of the coarse stage.
My question is: is this a typical loss pattern? In the first iteration the loss is high and loss_intermediate is low; then it basically flips in the following iterations, so the loss is lower and loss_intermediate is higher. In the end only loss_intermediate converges.

"typical" isn't really an applicable term. There is such a variety of models and topologies that you can find many examples of strange loss progressions.
In your case, the intermediate loss may well start out low because it "doesn't know any better" yet. As the later layers become trained enough to give reliable feedback to the intermediate layers, the intermediate stage starts learning in earnest, and with that comes the chance to make serious mistakes.
The final loss computation is in direct contact with ground truth; it learns from the first iteration, so it has a more understandable progression from high loss to low.
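As an aside, the total loss caffe prints each iteration is just the weighted sum of the two net outputs (the "(* 1 = ...)" factors are the loss_weights). If the intermediate loss dominates early on, one knob is to give it a smaller weight. Below is a rough PyTorch-style sketch of that weighted combination, with hypothetical names and MSE assumed; it is not your actual caffe network:
import torch.nn as nn

mse = nn.MSELoss()  # assumption: both stages use a regression-style loss
w_final, w_intermediate = 1.0, 0.1  # e.g. down-weight the intermediate stage

def total_loss(final_pred, intermediate_pred, target, target_coarse):
    # caffe's "Iteration N, loss" is this weighted sum of the individual outputs
    return w_final * mse(final_pred, target) + w_intermediate * mse(intermediate_pred, target_coarse)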

Related

how to prevent my DNN / MLP converging to average

I want to use several available features to predict a variable. The problem is not related to vision or NLP, though I believe there are good reasons to think that the variable to be predicted is a non-linear function of these features. So I just use a plain MLP like the following:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(53, 200)
        self.fc2 = nn.Linear(200, 100)
        self.fc3 = nn.Linear(100, 36)
        self.fc4 = nn.Linear(36, 1)

    def forward(self, x):
        x = F.leaky_relu(self.fc1(x))
        x = F.leaky_relu(self.fc2(x))
        x = F.leaky_relu(self.fc3(x))
        x = self.fc4(x)
        return x

net = Net().to(device)
loss_function = nn.MSELoss()
optimizer = optim.Adam(net.parameters(), lr=0.001, weight_decay=1e-6)
def train_normal(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.to(device)
        target = target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = loss_function(output, target)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 100)
        optimizer.step()
        if batch_idx % 100 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
At first it seemed to work and did learn something:
Train Epoch: 9 [268800/276316 (97%)] Loss: 0.217219
Train Epoch: 9 [275200/276316 (100%)] Loss: 0.234965
predicted actual diff
-1.18 -1.11 -0.08
0.15 -0.15 0.31
0.19 0.27 -0.08
-0.49 -0.48 -0.01
-0.05 0.08 -0.14
0.44 0.50 -0.06
-0.17 -0.05 -0.12
1.81 1.92 -0.12
1.55 0.76 0.79
-0.05 -0.30 0.26
But as it kept learning, the predictions seemed to converge to roughly the same average value regardless of the input:
predicted actual diff
-0.16 -0.06 -0.10
-0.16 -0.55 0.39
-0.13 -0.26 0.14
-0.15 0.50 -0.66
-0.16 0.02 -0.18
-0.16 -0.12 -0.04
-0.16 -0.40 0.24
-0.01 1.20 -1.21
-0.07 0.33 -0.40
-0.09 0.02 -0.10
What technique / trick can prevent this? Also, to increase the accuracy, should I add more hidden layers or more neurons per layer?
One possible problem is that there is nothing to learn.
Check that your data is standardized and try different learning rates (maybe even a cyclic learning rate). One thing that could be happening is that the optimizer is not able to settle into a minimum and keeps jumping around.
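Here is a minimal sketch of those two suggestions (standardize the inputs, try a cyclic learning rate), assuming the Net from your question; X_train_raw / X_test_raw are hypothetical names for your feature matrices:
import torch.optim as optim
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train_raw)  # fit on training data only
X_test = scaler.transform(X_test_raw)        # reuse the training statistics

# CyclicLR cycles the learning rate between base_lr and max_lr;
# SGD with momentum is used here because CyclicLR cycles momentum by default
optimizer = optim.SGD(net.parameters(), lr=1e-3, momentum=0.9)
scheduler = optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-4, max_lr=1e-2)
# inside the training loop, call scheduler.step() after optimizer.step()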
I am not sure if you are already doing this, but start from a standard implementation that works on another dataset and then adapt it to your problem, just to avoid small development mistakes. You can check the tutorial How to apply Deep Learning on tabular data with FastAi, but if you are really new I would strongly recommend doing this MOOC: https://course.fast.ai/. It should help you gain some experience and understanding.
If you already have everything as tabular data, you can also try a classic machine learning algorithm like linear regression or gradient boosting, just to check whether your data carries any signal (a linear regression example and a gradient-boosting sketch follow below).
>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
>>> # y = 1 * x_0 + 2 * x_1 + 3
>>> y = np.dot(X, np.array([1, 2])) + 3
>>> reg = LinearRegression().fit(X, y)
>>> reg.score(X, y)
1.0
>>> reg.coef_
array([1., 2.])
>>> reg.intercept_
3.0000...
>>> reg.predict(np.array([[3, 5]]))
array([16.])
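In the same spirit, here is a quick gradient-boosting baseline on your own tabular data; X and y stand for the same features and target you feed to the MLP:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

gbr = GradientBoostingRegressor()
scores = cross_val_score(gbr, X, y, cv=5, scoring="r2")
print(scores.mean())  # clearly above 0 suggests there is signal in the features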
Let me know if you find the solution to your problem!

Why adding noise into image data failed CNN regression learning

I am running an object localization task with CNN regression. I got pretty good results with the original data.
However, after I added 5% noise to the original data in the augmentation process, the training loss still looks okay, but the validation loss fluctuates a lot during training. And when I checked the results after training finished, the network always predicted a constant output. That is, the output never changes regardless of the input data, which suggests the network got trapped in a dead valley. How could this happen when I added variability to the data, and how can I avoid it?
The code to add noise:
sindex = 0
for n in range(n_train):
    cndata = normalized_data[n, :, :, :]
    # plt.matshow(np.squeeze(prepped_data[sindex, :, :, :]))
    # add noise, augmentation, and shift in x and y
    for x in range(1):
        noise = np.random.normal(0, noise_sd, (xres, xres, slices))
        ishift = np.random.randint(low=1, high=(row - xres), size=2)
        prepped_data[sindex, :, :, :, 0] = cndata[ishift[0]:ishift[0] + xres, ishift[1]:ishift[1] + xres, :] + noise
        prepped_label[sindex, :] = labelData[:, n] + [-ishift[0], -ishift[1], 0, -ishift[0], -ishift[1], 0, 0]
        sindex = sindex + 1
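Note that noise_sd is not defined in the snippet; if the "5% noise" is meant relative to the spread of the normalized data, one plausible definition (an assumption, not necessarily what was actually used) would be:
import numpy as np

# assumption: Gaussian noise whose std is 5% of the normalized data's std
noise_sd = 0.05 * np.std(normalized_data)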
Loss function definition:
model.compile(optimizer=opt, loss='mean_squared_error', metrics=['accuracy', root_mean_squared_error])
RMSE definition:
def root_mean_squared_error(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))
Here is the original training curve without noise:
Here is the training curve after adding noise:

caffe output the negative loss value with SoftmaxWithLoss layer?

Below is the last layer in my training net:
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "final"
  bottom: "label"
  top: "loss"
  loss_param {
    ignore_label: 255
    normalization: VALID
  }
}
Note that I use a softmax_loss layer. Since it is computed as -log(probability), it is weird that the loss can be negative, as shown below (iteration 80).
I0404 23:32:49.400624 6903 solver.cpp:228] Iteration 79, loss = 0.167006
I0404 23:32:49.400806 6903 solver.cpp:244] Train net output #0: loss = 0.167008 (* 1 = 0.167008 loss)
I0404 23:32:49.400825 6903 sgd_solver.cpp:106] Iteration 79, lr = 0.0001
I0404 23:33:25.660655 6903 solver.cpp:228] Iteration 80, loss = -1.54972e-06
I0404 23:33:25.660845 6903 solver.cpp:244] Train net output #0: loss = 0 (* 1 = 0 loss)
I0404 23:33:25.660862 6903 sgd_solver.cpp:106] Iteration 80, lr = 0.0001
I0404 23:34:00.451464 6903 solver.cpp:228] Iteration 81, loss = 1.89034
I0404 23:34:00.451661 6903 solver.cpp:244] Train net output #0: loss = 1.89034 (* 1 = 1.89034 loss)
Can anyone explain this for me? How can this happen?
Thank you very much!
PS:
The task here is semantic segmentation.
There are 20 object classes plus background in total (so 21 classes). The labels range from 0 to 20. The extra label 255 is ignored, which can be seen in the SoftmaxWithLoss definition at the beginning of this post.
Is caffe running on GPU or CPU?
Print out the prob_data that you get after the softmax operation:
// find the following line in your CPU or GPU Forward function
softmax_layer_->Forward(softmax_bottom_vec_, softmax_top_vec_);
// make sure the data is on the CPU
const Dtype* prob_data = prob_.cpu_data();
for (int i = 0; i < prob_.count(); i++) {
  printf("%f ", prob_data[i]);
}
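If Python is easier, roughly the same inspection can be done with pycaffe. This is only a sketch: the file names are placeholders, and "final" is the bottom blob name from the prototxt above:
import numpy as np
import caffe

# placeholders: point these at your own prototxt and snapshot
net = caffe.Net('train_val.prototxt', 'snapshot.caffemodel', caffe.TEST)
net.forward()

scores = net.blobs['final'].data  # the logits fed into SoftmaxWithLoss
print('NaN in scores:', np.isnan(scores).any(), '| Inf in scores:', np.isinf(scores).any())
print('loss blob:', net.blobs['loss'].data)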

why iteration loss does not match the sum of all train net output in caffe

Iteration 1, loss = 0.0486978
Train net output #0: loss_bbox = 0.236533 (* 0.5 = 0.118266 loss)
Train net output #1: loss_cls = 0.353563 (* 0.5 = 0.176781 loss)
Hi, I am training a CNN using caffe, and I found that the iteration loss does not match the sum of all the train net outputs (see the quick check below).
Any advice? Thanks!
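A quick check with the numbers above:
# the two weighted train net outputs from the log
weighted_sum = 0.5 * 0.236533 + 0.5 * 0.353563
print(weighted_sum)  # 0.295048, whereas the reported iteration loss is 0.0486978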

loss increase fine-tuning caffe

I have a classification problem with 280 classes and ~278,000 images.
I am fine-tuning the GoogleNet model (bvlc_googlenet in caffe) using quick_solver.txt.
My solver is as follows:
test_iter: 1000
test_interval: 4000
test_initialization: false
display: 40
average_loss: 40
base_lr: 0.001
lr_policy: "poly"
power: 0.5
max_iter: 800000
momentum: 0.9
weight_decay: 0.0002
snapshot: 20000
During training I use a batch size of 32, and a test batch size of 32 too. I retrain from scratch only the three layers loss1/classifier, loss2/classifier and loss3/classifier by renaming them. I set the global learning rate to 0.001, i.e. 10 times less than the one used for training from scratch. The last three layers, however, still get a learning rate of 0.01.
Logfile of the very first iterations:
I0515 08:44:41.838122 1279 solver.cpp:228] Iteration 40, loss = 9.72169
I0515 08:44:41.838163 1279 solver.cpp:244] Train net output #0: loss1/loss1 = 5.7261 (* 0.3 = 1.71783 loss)
I0515 08:44:41.838170 1279 solver.cpp:244] Train net output #1: loss2/loss1 = 5.65961 (* 0.3 = 1.69788 loss)
I0515 08:44:41.838173 1279 solver.cpp:244] Train net output #2: loss3/loss3 = 5.46685 (* 1 = 5.46685 loss)
I0515 08:44:41.838179 1279 sgd_solver.cpp:106] Iteration 40, lr = 0.000999975
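As a sanity check, the lr values in the log follow caffe's poly policy, lr = base_lr * (1 - iter/max_iter)^power, with the solver settings above:
base_lr, power, max_iter = 0.001, 0.5, 800000

def poly_lr(it):
    return base_lr * (1.0 - it / max_iter) ** power

print(poly_lr(40))      # ~0.000999975, as printed at iteration 40 above
print(poly_lr(119000))  # ~0.000922632, matching the log at iteration 119000 below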
Up to the 100,000-th iteration, my net obtains 50% top-1 accuracy and ~80% top-5 accuracy:
I0515 13:45:59.789113 1279 solver.cpp:337] Iteration 100000, Testing net (#0)
I0515 13:46:53.914217 1279 solver.cpp:404] Test net output #0: loss1/loss1 = 2.08631 (* 0.3 = 0.625893 loss)
I0515 13:46:53.914274 1279 solver.cpp:404] Test net output #1: loss1/top-1 = 0.458375
I0515 13:46:53.914279 1279 solver.cpp:404] Test net output #2: loss1/top-5 = 0.768781
I0515 13:46:53.914284 1279 solver.cpp:404] Test net output #3: loss2/loss1 = 1.88489 (* 0.3 = 0.565468 loss)
I0515 13:46:53.914288 1279 solver.cpp:404] Test net output #4: loss2/top-1 = 0.494906
I0515 13:46:53.914290 1279 solver.cpp:404] Test net output #5: loss2/top-5 = 0.805906
I0515 13:46:53.914294 1279 solver.cpp:404] Test net output #6: loss3/loss3 = 1.77118 (* 1 = 1.77118 loss)
I0515 13:46:53.914297 1279 solver.cpp:404] Test net output #7: loss3/top-1 = 0.517719
I0515 13:46:53.914299 1279 solver.cpp:404] Test net output #8: loss3/top-5 = 0.827125
At the 119,000-th iteration everything is still normal:
I0515 14:43:38.669674 1279 solver.cpp:228] Iteration 119000, loss = 2.70265
I0515 14:43:38.669777 1279 solver.cpp:244] Train net output #0: loss1/loss1 = 2.41406 (* 0.3 = 0.724217 loss)
I0515 14:43:38.669783 1279 solver.cpp:244] Train net output #1: loss2/loss1 = 2.38374 (* 0.3 = 0.715123 loss)
I0515 14:43:38.669787 1279 solver.cpp:244] Train net output #2: loss3/loss3 = 1.92663 (* 1 = 1.92663 loss)
I0515 14:43:38.669798 1279 sgd_solver.cpp:106] Iteration 119000, lr = 0.000922632
Right after that, the loss suddenly rises back to roughly the initial loss level (from 8 to 9):
I0515 14:43:45.377710 1279 solver.cpp:228] Iteration 119040, loss = 8.3068
I0515 14:43:45.377751 1279 solver.cpp:244] Train net output #0: loss1/loss1 = 5.77026 (* 0.3 = 1.73108 loss)
I0515 14:43:45.377758 1279 solver.cpp:244] Train net output #1: loss2/loss1 = 5.76971 (* 0.3 = 1.73091 loss)
I0515 14:43:45.377763 1279 solver.cpp:244] Train net output #2: loss3/loss3 = 5.70022 (* 1 = 5.70022 loss)
I0515 14:43:45.377768 1279 sgd_solver.cpp:106] Iteration 119040, lr = 0.000922605
And the net cannot reduce that loss even long after the sudden change happened:
I0515 16:51:10.485610 1279 solver.cpp:228] Iteration 161040, loss = 9.01994
I0515 16:51:10.485649 1279 solver.cpp:244] Train net output #0: loss1/loss1 = 5.63485 (* 0.3 = 1.69046 loss)
I0515 16:51:10.485656 1279 solver.cpp:244] Train net output #1: loss2/loss1 = 5.63484 (* 0.3 = 1.69045 loss)
I0515 16:51:10.485661 1279 solver.cpp:244] Train net output #2: loss3/loss3 = 5.62972 (* 1 = 5.62972 loss)
I0515 16:51:10.485666 1279 sgd_solver.cpp:106] Iteration 161040, lr = 0.0008937
I reran the experiment twice and it happened again at exactly the 119,040-th iteration. For further information, I shuffled the data when creating the LMDB database, and I used this database to train a VGG-16 (step learning rate policy, max 80k iterations, 20k iterations per step) without any problem. With VGG I obtain 55% top-1 accuracy.
Has anybody met a similar problem?