How to train a ResNet101 model from scratch in Caffe?

I'm using the DeepLab_v2 version of Caffe to do semantic segmentation. I can fine-tune ResNet101 starting from the ImageNet model, but I cannot train the model from scratch on custom data. Has anyone had a similar experience and managed to solve this issue?
This is what a functional block of the ResNet that I'm currently using for training looks like:
layer {
bottom: "data"
top: "conv1"
name: "conv1"
type: "Convolution"
param {
name: "conv1_0"
lr_mult: 1
decay_mult: 1
}
convolution_param {
num_output: 64
kernel_size: 3
pad: 1
stride: 2
bias_term: false
weight_filler {
type: "msra"
}
}
}
layer {
bottom: "conv1"
top: "conv1"
name: "bn_conv1"
type: "BatchNorm"
batch_norm_param {
use_global_stats: true
}
param {
name: "bn_conv1_0"
lr_mult: 0
}
param {
name: "bn_conv1_1"
lr_mult: 0
}
param {
name: "bn_conv1_2"
lr_mult: 0
}
}
layer {
bottom: "conv1"
top: "conv1"
name: "scale_conv1"
type: "Scale"
scale_param {
bias_term: true
filler {
value: 0.5
}
bias_filler {
value: -2
}
}
param {
name: "scale_conv1_0"
lr_mult: 0
}
param {
name: "scale_conv1_1"
lr_mult: 0
}
}
layer {
top: "conv1"
bottom: "conv1"
name: "conv1_relu"
type: "ReLU"
}
I tried all kinds of variations, including use_global_stats: false. I am able to train a single block of the type above, but when I try to use all 101 layers, the model no longer converges.
Any ideas?
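One thing that may be worth checking in the block above: the Scale layer is frozen (lr_mult: 0) at scale 0.5 and bias -2, and it is followed by a ReLU. The sketch below is not from the original post. It assumes, purely for illustration, that the BatchNorm output is roughly standard normal (only plausible with use_global_stats: false), and it estimates what fraction of activations would survive the ReLU. The function name relu_pass_fraction is hypothetical.

```python
import math


def relu_pass_fraction(gamma, beta):
    """Probability that gamma * z + beta > 0 for z ~ N(0, 1), with gamma > 0.

    Assumption (not from the post): the BatchNorm output z is approximately
    standard normal before the Scale layer applies gamma and beta.
    """
    threshold = -beta / gamma  # z must exceed this value to pass the ReLU
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


# Values taken from the frozen Scale layer in the question.
frozen = relu_pass_fraction(gamma=0.5, beta=-2.0)
# A neutral initialization, shown for comparison.
neutral = relu_pass_fraction(gamma=1.0, beta=0.0)

print("frozen scale/bias :", frozen)
print("scale=1, bias=0   :", neutral)
```

Under that assumption only a few activations in 100,000 would pass the ReLU, so when training from scratch it may help to initialize the scale to 1 and the bias to 0 and to make both trainable (non-zero lr_mult).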

Related

CNN doesn't learn simple geometric patterns

This may be a basic question, but I lack the background to answer it myself and have run out of time to search, so I am asking here. I programmatically generated a training dataset of images of simple geometric shapes (triangles, squares, diamonds, etc.) and built a CNN with two convolutional layers, one pooling layer, and a final fully connected layer to classify these shapes. But the network does not learn: the loss does not decrease. What is the cause?
In Caffe, the neural network configuration file "very_simple_one.prototxt" looks like:
name: "very_simple_one"
layer {
##name: "input"
name: "data"
##type: "Input"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mean_file: "images/train_valid_lmdb_mean.binaryproto"
}
data_param {
source: "images/train_valid_lmdb"
batch_size: 1000
backend: LMDB
}
input_param {
shape {
dim: 1
dim: 3
dim: 200
dim: 200
}
}
}
layer {
##name: "input"
name: "data"
##type: "Input"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mean_file: "images/train_valid_lmdb_mean.binaryproto"
}
data_param {
source: "images/test_lmdb"
batch_size: 100
backend: LMDB
}
input_param {
shape {
dim: 1
dim: 3
dim: 200
dim: 200
}
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 50
kernel_size: 5
stride: 5
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 5
stride: 5
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
convolution_param {
num_output: 3
kernel_size: 8
stride: 8
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "fc3"
type: "InnerProduct"
bottom: "conv2"
top: "fc3"
inner_product_param {
num_output: 3
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc3"
bottom: "label"
}
The "solver.prototxt" looks like:
net: "very_simple_one.prototxt"
type: "SGD"
test_iter: 15
test_interval: 100
base_lr: 0.05
lr_policy: "step"
gamma: 0.9999
stepsize: 100
display: 20
max_iter: 50000
snapshot: 2000
momentum: 0.9
weight_decay: 0.00000000000
solver_mode: GPU
I also tried AdaGrad by commenting out "momentum" and changing "type" to "AdaGrad".
I train the net with the command:
....../caffe/build/tools/caffe train -solver solver.prototxt
Every attempt failed to train: the loss hovers within a very small interval but never really decreases.
Is the dataset simply not learnable, or is there something wrong with the configuration files above?
Following Ibrahim Yousuf's suggestion, I also modified the network by replacing the pooling layer with a convolutional layer:
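For reference, Caffe's "step" learning-rate policy multiplies base_lr by gamma once every stepsize iterations. The sketch below is a hedged illustration rather than part of the original post. It evaluates that schedule for the solver values above, and the helper name step_lr is hypothetical.

```python
def step_lr(base_lr, gamma, stepsize, iteration):
    """Learning rate under the "step" policy: base_lr * gamma^floor(iter / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)


# Solver values from the question.
BASE_LR = 0.05
GAMMA = 0.9999
STEPSIZE = 100
MAX_ITER = 50000

for it in (0, 10000, MAX_ITER):
    print(it, step_lr(BASE_LR, GAMMA, STEPSIZE, it))
```

With these settings the learning rate drops by only about 5% over the whole run, so the schedule behaves almost like a constant rate of 0.05.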
name: "very_simple_one"
layer {
##name: "input"
name: "data"
##type: "Input"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mean_file: "images/train_valid_lmdb_mean.binaryproto"
}
data_param {
source: "images/train_valid_lmdb"
batch_size: 1000
backend: LMDB
}
input_param {
shape {
dim: 1
dim: 3
dim: 200
dim: 200
}
}
}
layer {
##name: "input"
name: "data"
##type: "Input"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mean_file: "images/train_valid_lmdb_mean.binaryproto"
}
data_param {
source: "images/test_lmdb"
batch_size: 100
backend: LMDB
}
input_param {
shape {
dim: 1
dim: 3
dim: 200
dim: 200
}
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 50
kernel_size: 5
##stride: 5
stride: 2
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "conv1.5"
type: "Convolution"
bottom: "conv1"
top: "conv1.5"
convolution_param {
num_output: 10
kernel_size: 5
stride: 2
}
}
layer {
name: "relu1.5"
type: "ReLU"
bottom: "conv1.5"
top: "conv1.5"
}
layer {
name: "conv2"
type: "Convolution"
bottom: "conv1.5"
top: "conv2"
convolution_param {
num_output: 3
kernel_size: 8
stride: 4
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "fc3"
type: "InnerProduct"
bottom: "conv2"
top: "fc3"
inner_product_param {
num_output: 3
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc3"
bottom: "label"
}
But the loss still does not decrease. Should I conclude that the cause is my dataset? The dataset is very small, so if anyone is willing to help, I can upload it to a file-sharing service for testing.
Solved. Classification labels must start from zero, not one: for a three-class problem use 0, 1, 2, not 1, 2, 3.
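The fix above can be sketched as a small check run before the LMDB is written. This is a minimal illustration, not the poster's code: the helper name remap_labels and the sample label list are made up, and it assumes the labels are plain integers held in a Python list.

```python
def remap_labels(labels):
    """Map arbitrary integer class labels onto 0..K-1, preserving order.

    Returns the remapped labels and the old-to-new mapping.
    """
    classes = sorted(set(labels))
    mapping = {old: new for new, old in enumerate(classes)}
    return [mapping[label] for label in labels], mapping


# Hypothetical labels that start from one, as in the question.
raw_labels = [1, 2, 3, 1, 3]
fixed_labels, mapping = remap_labels(raw_labels)

print(fixed_labels)  # labels now lie in 0..2
print(mapping)
```

With num_output: 3 in the last InnerProduct layer, SoftmaxWithLoss expects every label to lie in the range 0 to 2, so a label of 3 falls outside the valid range.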

Using the SPP Layer in caffe results in Check failed: pad_w_ < kernel_w_ (1 vs. 1)

I previously asked a question about using the SPP layer in Caffe.
This question is a follow-up to that one.
When using the SPP Layer I get the error output below.
It seems that the images are getting too small when reaching the spp layer?
The images I use are small. The width ranges between 10 and 20 px and height ranges between 30 and 35px.
I0719 12:18:22.553256 2114932736 net.cpp:406] spatial_pyramid_pooling <- conv2
I0719 12:18:22.553261 2114932736 net.cpp:380] spatial_pyramid_pooling -> pool2
F0719 12:18:22.553505 2114932736 pooling_layer.cpp:74] Check failed: pad_w_ < kernel_w_ (1 vs. 1)
*** Check failure stack trace: ***
# 0x106afcb6e google::LogMessage::Fail()
# 0x106afbfbe google::LogMessage::SendToLog()
# 0x106afc53a google::LogMessage::Flush()
# 0x106aff86b google::LogMessageFatal::~LogMessageFatal()
# 0x106afce55 google::LogMessageFatal::~LogMessageFatal()
# 0x1068dc659 caffe::PoolingLayer<>::LayerSetUp()
# 0x1068ffd98 caffe::SPPLayer<>::LayerSetUp()
# 0x10691123f caffe::Net<>::Init()
# 0x10690fefe caffe::Net<>::Net()
# 0x106927ef8 caffe::Solver<>::InitTrainNet()
# 0x106927325 caffe::Solver<>::Init()
# 0x106926f95 caffe::Solver<>::Solver()
# 0x106935b46 caffe::SGDSolver<>::SGDSolver()
# 0x10693ae52 caffe::Creator_SGDSolver<>()
# 0x1067e78f3 train()
# 0x1067ea22a main
# 0x7fff9a3ad5ad start
# 0x5 (unknown)
I was correct: my images were too small.
I changed my net and it worked: I removed one conv layer and replaced the normal pooling layer with the SPP layer. I also had to set my test batch size to 1. Accuracy was very high, but my F1 score went down. I don't know whether this is related to the small test batch size I had to use.
Net:
name: "TessDigitMean"
layer {
name: "input"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "/Users/rvaldez/Documents/Datasets/Digits/SeperatedProviderV3_1020_SPP/784/caffe/train_lmdb"
batch_size: 1 #64
backend: LMDB
}
}
layer {
name: "input"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
scale: 0.00390625
}
data_param {
source: "/Users/rvaldez/Documents/Datasets/Digits/SeperatedProviderV3_1020_SPP/784/caffe/test_lmdb"
batch_size: 1
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 20
kernel_size: 5
pad_w: 2
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "spatial_pyramid_pooling"
type: "SPP"
bottom: "conv1"
top: "pool2"
spp_param {
pyramid_height: 2
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "pool2"
top: "ip1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "ip1"
top: "ip2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip2"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip2"
bottom: "label"
top: "loss"
}
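The `pad_w_ < kernel_w_ (1 vs. 1)` failure above is consistent with a feature map that is narrower than the number of bins at some pyramid level. The sketch below mirrors the kernel and padding arithmetic as I understand it from Caffe's spp_layer.cpp. Treat that correspondence as an assumption and check it against your Caffe version. The function names are hypothetical.

```python
import math


def spp_pool_params(size, level):
    """Kernel size and padding along one dimension for one pyramid level.

    Assumed to follow Caffe's SPP layer: 2^level bins, kernel = ceil(size / bins),
    padding = half of the overshoot, rounded up.
    """
    num_bins = 2 ** level
    kernel = math.ceil(size / num_bins)
    remainder = kernel * num_bins - size
    pad = (remainder + 1) // 2
    return kernel, pad


def failing_levels(size, pyramid_height):
    """Pyramid levels at which the pooling check pad < kernel would fail."""
    bad = []
    for level in range(pyramid_height):
        kernel, pad = spp_pool_params(size, level)
        if pad >= kernel:
            bad.append(level)
    return bad


# A feature map 3 px wide with pyramid_height 3 fails at level 2 (4 bins).
print(spp_pool_params(3, 2))
print(failing_levels(3, 3))
# A feature map 10 px wide with pyramid_height 2 passes at every level.
print(failing_levels(10, 2))
```

This would explain why removing a conv layer (keeping the feature map larger) and using pyramid_height: 2 avoided the check failure.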

The outputs of the convolutional layer in Caffe are different

I wrote a siamese-like network using caffe with two inputs. The output of the convolutional layer with the first input is always the same, while the second output changes every time. The input layer and the convolutional layers are as follows:
layer {
name: "input"
type: "Input"
top: "data1"
top: "data2"
input_param {
shape {
dim: 1
dim: 1
dim: 28
dim: 28
}
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data1"
top: "conv1_1"
convolution_param {
num_output: 20
kernel_size: 5
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data2"
top: "conv1_2"
convolution_param {
num_output: 20
kernel_size: 5
bias_term: false
weight_filler {
type: "xavier"
}
}
}
Can I build the convolutional layer using a Python layer? If so, how?

caffe fcn pixel-wise segmentation to regression

Hello, I am quite new to deep learning and Caffe, so please bear with me if my question is basic.
I have been looking into pixel-wise classification / segmentation / regression. I have seen that there is a GitHub repo for image segmentation (fcn berkeley) and some other posts such as question 1 and question 2.
What I want to do is similar but slightly different. I have a dataset of images and their corresponding ground_truth as images. My ground_truth images contain values from 0-255 and have only one channel. In the end I want to do a depth prediction task, and I am not sure whether it is better to use pixel-wise classification via SoftmaxWithLoss or regression via EuclideanLoss.
I have been trying the regression task with a fully convolutional network made of a few convolutional layers that preserve the spatial size of the input. Is this approach correct? As a first test I tried to learn the shape of my images, i.e. I set the ground_truth value to 0.5 wherever the input image has a value greater than 0 at the corresponding location. Could anyone help me, please? The network, including the last layer, looks like this:
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 256
kernel_size: 53
stride: 1
pad: 26
weight_filler {
type: "gaussian"
std: 0.011
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "conv2"
type: "Convolution"
bottom: "conv1"
top: "conv2"
convolution_param {
num_output: 128
kernel_size: 15
stride: 1
pad: 7
weight_filler {
type: "gaussian"
std: 0.011
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "conv3"
type: "Convolution"
bottom: "conv2"
top: "conv3"
convolution_param {
num_output: 1
kernel_size: 11
stride: 1
pad: 5
weight_filler {
type: "gaussian"
std: 0.011
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
#layer {
# name: "loss"
# type: "SoftmaxWithLoss"
# bottom: "score"
# bottom: "label"
# top: "loss"
# loss_param {
# ignore_label: 255
# normalize: true
# }
#}
layer {
name: "loss"
type: "EuclideanLoss"
bottom: "conv3"
bottom: "label"
top: "loss"
}
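To make the regression option concrete: Caffe's EuclideanLoss computes the sum of squared differences divided by twice the batch size. The sketch below is an illustration in plain Python, not code from the post. It shows how the loss magnitude depends on whether the 0-255 ground truth is used raw or scaled to [0, 1]. The helper name and the tiny sample values are made up.

```python
def euclidean_loss(predictions, targets):
    """Sum of squared differences over the batch, divided by 2 * N.

    predictions, targets: lists of N samples, each a flat list of pixel values.
    """
    num_samples = len(predictions)
    total = 0.0
    for pred, target in zip(predictions, targets):
        for p, t in zip(pred, target):
            total += (p - t) ** 2
    return total / (2.0 * num_samples)


# One hypothetical 2-pixel sample, predicted as all zeros.
prediction = [[0.0, 0.0]]
raw_target = [[255.0, 255.0]]  # ground truth left in 0-255
scaled_target = [[1.0, 1.0]]   # same ground truth scaled to [0, 1]

print(euclidean_loss(prediction, raw_target))
print(euclidean_loss(prediction, scaled_target))
```

The raw targets give a loss tens of thousands of times larger for the same relative error, so scaling the ground truth (or lowering the learning rate accordingly) is something to consider before comparing the two loss types.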

Discrepancy in results when using batch size 1 in prototxt versus coercing batch size to 1 in pycaffe

I am running the MNIST example with some manual changes to the layers. Training works well and I reach a final test accuracy of ~99%. I am now trying to use the trained model in Python through pycaffe, following the steps given here. I want to compute the confusion matrix, so I loop through the test images from LMDB one by one and run the network on each. Here is the code:
import caffe
import numpy as np

# Load the trained network in test mode
net = caffe.Net(args.proto, args.model, caffe.TEST)
...
# Decode one LMDB record into its label and image array
datum = caffe.proto.caffe_pb2.Datum()
datum.ParseFromString(value)
label = int(datum.label)
image = caffe.io.datum_to_array(datum).astype(np.uint8)
...
net.blobs['data'].reshape(1, 1, 28, 28) # Greyscale 28x28 images
net.blobs['data'].data[...] = image
net.forward()
# Get predicted label
print(net.blobs['label'].data[0]) # use this later for confusion matrix
Here is my network definition prototxt:
name: "MNISTNet"
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "train_lmdb"
batch_size: 64
backend: LMDB
}
}
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
scale: 0.00390625
}
data_param {
source: "test_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 20
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 50
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "fc1"
type: "InnerProduct"
bottom: "pool2"
top: "fc1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "fc1"
top: "fc1"
}
layer {
name: "fc2"
type: "InnerProduct"
bottom: "fc1"
top: "fc2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc2"
bottom: "label"
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fc2"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
Note that the test batch size is 100, which is why I need the reshape in the Python code. Now, if I change the test batch size to 1, the exact same Python code prints different (and mostly correct) predicted class labels. Thus, the code run with batch size 1 produces the expected result with ~99% accuracy, while with batch size 100 the results are terrible.
However, based on the ImageNet pycaffe tutorial, I don't see what I'm doing wrong. As a last resort I can create a copy of my prototxt with a test batch size of 1 for use in my Python code and keep the original for training, but that is not ideal.
Also, I don't think it should be an issue with preprocessing since it doesn't explain why it works well with batch size 1.
Any pointers appreciated!
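The confusion matrix mentioned above can be accumulated with a few lines of plain Python once the true and predicted labels are available. This is a minimal sketch, not the poster's code. It assumes the prediction for each image is the argmax over the final score vector (for example the fc2 output in the prototxt above), and the sample scores and labels are made up.

```python
def argmax(scores):
    """Index of the largest score, i.e. the predicted class."""
    return max(range(len(scores)), key=lambda i: scores[i])


def confusion_matrix(true_labels, predicted_labels, num_classes):
    """matrix[t][p] counts samples with true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


# Hypothetical per-image class scores and ground-truth labels.
scores_per_image = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
true_labels = [0, 1, 1]

predicted = [argmax(scores) for scores in scores_per_image]
for row in confusion_matrix(true_labels, predicted, num_classes=2):
    print(row)
```

Rows index the true class and columns the predicted class, so off-diagonal entries are misclassifications.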