Skip backpropagation in CNN training for faster training in Caffe

My intention is to speed up training by reusing the weights of a previously trained model for the layers that the new model shares with it.
Say I have two models, a 1st model and a 2nd model.
The 1st model is trained from scratch.
The 2nd model differs from the 1st model only by two additional layers (conv_1_1 and relu_1_1); the rest of the layers are the same.
So I would like to back-propagate only through the layers that differ from the 1st model and leave the layers shared with the 1st model untrained, the intention being to speed up training of the 2nd model.
To do that, I set lr_mult and decay_mult to 0 for the shared layers.
But I found that training the 2nd model still takes as long as training the 1st model.
I think the gradients are still computed and merely multiplied by 0, so even though the weights are not updated, the computation is still performed.
How can I skip backpropagation entirely in those shared layers?
I checked the log file and found the following:
I0830 13:40:26.546422 10580 net.cpp:226] mbox_loss needs backward computation.
I0830 13:40:26.546432 10580 net.cpp:228] mbox_priorbox does not need backward computation.
I0830 13:40:26.546437 10580 net.cpp:226] mbox_conf needs backward computation.
I0830 13:40:26.546440 10580 net.cpp:226] mbox_loc needs backward computation.
I0830 13:40:26.546444 10580 net.cpp:228] conv_6_norm_mbox_priorbox does not need backward computation.
I0830 13:40:26.546448 10580 net.cpp:226] conv_6_norm_mbox_conf_flat needs backward computation.
I0830 13:40:26.546452 10580 net.cpp:226] conv_6_norm_mbox_conf_perm needs backward computation.
I0830 13:40:26.546455 10580 net.cpp:226] conv_6_norm_mbox_conf needs backward computation.
I0830 13:40:26.546460 10580 net.cpp:226] conv_6_norm_mbox_loc_flat needs backward computation.
I0830 13:40:26.546464 10580 net.cpp:226] conv_6_norm_mbox_loc_perm needs backward computation.
I0830 13:40:26.546468 10580 net.cpp:226] conv_6_norm_mbox_loc needs backward computation.
I0830 13:40:26.546471 10580 net.cpp:226] conv_6_norm_conv_6_norm_0_split needs backward computation.
I0830 13:40:26.546475 10580 net.cpp:226] conv_6_norm needs backward computation.
I0830 13:40:26.546478 10580 net.cpp:226] relu_6 needs backward computation.
I0830 13:40:26.546481 10580 net.cpp:226] conv_6 needs backward computation.
I0830 13:40:26.546485 10580 net.cpp:226] pool_5 needs backward computation.
I0830 13:40:26.546489 10580 net.cpp:226] relu_5 needs backward computation.
I0830 13:40:26.546492 10580 net.cpp:226] conv_5 needs backward computation.
I0830 13:40:26.546495 10580 net.cpp:226] pool_4 needs backward computation.
I0830 13:40:26.546499 10580 net.cpp:226] relu_4 needs backward computation.
I0830 13:40:26.546502 10580 net.cpp:226] conv_4 needs backward computation.
I0830 13:40:26.546505 10580 net.cpp:226] pool_3 needs backward computation.
I0830 13:40:26.546509 10580 net.cpp:226] relu_3 needs backward computation.
I0830 13:40:26.546512 10580 net.cpp:226] conv_3 needs backward computation.
I0830 13:40:26.546515 10580 net.cpp:226] pool_2 needs backward computation.
I0830 13:40:26.546519 10580 net.cpp:226] relu_2 needs backward computation.
I0830 13:40:26.546522 10580 net.cpp:226] conv_2 needs backward computation.
I0830 13:40:26.546525 10580 net.cpp:226] pool_1 needs backward computation.
I0830 13:40:26.546530 10580 net.cpp:226] relu_1_1 needs backward computation.
I0830 13:40:26.546532 10580 net.cpp:226] conv_1_1 needs backward computation.
I0830 13:40:26.546536 10580 net.cpp:228] relu_1 does not need backward computation.
I0830 13:40:26.546540 10580 net.cpp:228] conv_1 does not need backward computation.
I0830 13:40:26.546545 10580 net.cpp:228] data_data_0_split does not need backward computation.
I0830 13:40:26.546548 10580 net.cpp:228] data does not need backward computation.
So only conv_1 and relu_1 are not back-propagated; the other layers still are.
How can I switch off backpropagation in the following layers?
conv_6_norm_conv_6_norm_0_split, conv_6_norm, relu_6, conv_6, pool_5, relu_5, conv_5, pool_4, relu_4, conv_4, pool_3, relu_3, conv_3, pool_2, relu_2, conv_2
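For reference, Caffe's LayerParameter has a per-bottom propagate_down flag (it already appears in the mbox_loss layer of the train.prototxt below). The following is only a hypothetical sketch of switching that flag off for selected layers by editing the prototxt from Python; the frozen-layer list and file names are assumptions, and note that cutting the backward pass at conv_2 would also stop gradients from ever reaching the trainable conv_1_1 and relu_1_1 underneath it.
# Sketch: add "propagate_down: false" to selected layers of an existing prototxt.
# The frozen-layer set and file names are placeholders, not a tested recipe.
from caffe.proto import caffe_pb2
from google.protobuf import text_format

frozen = {'conv_2', 'conv_3', 'conv_4', 'conv_5', 'conv_6'}

net = caffe_pb2.NetParameter()
with open('train.prototxt') as f:
    text_format.Merge(f.read(), net)

for layer in net.layer:
    if layer.name in frozen:
        del layer.propagate_down[:]                                # clear any existing flags
        layer.propagate_down.extend([False] * len(layer.bottom))   # one flag per bottom blob

with open('train_no_backprop.prototxt', 'w') as f:
    f.write(text_format.MessageToString(net))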
The train.prototxt file is as follows.
name: "RegNet_train_0"
layer {
name: "data"
type: "AnnotatedData"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mirror: true
mean_value: 104.0
mean_value: 117.0
mean_value: 123.0
resize_param {
prob: 1.0
resize_mode: WARP
height: 480
width: 480
interp_mode: LINEAR
interp_mode: AREA
interp_mode: NEAREST
interp_mode: CUBIC
interp_mode: LANCZOS4
height_scale: 480
width_scale: 480
}
emit_constraint {
emit_type: CENTER
}
distort_param {
brightness_prob: 0.5
brightness_delta: 32.0
contrast_prob: 0.5
contrast_lower: 0.5
contrast_upper: 1.5
hue_prob: 0.5
hue_delta: 18.0
saturation_prob: 0.5
saturation_lower: 0.5
saturation_upper: 1.5
random_order_prob: 0.0
}
expand_param {
prob: 0.5
max_expand_ratio: 4.0
}
}
data_param {
source: "/home/coie/data/NumberPlate/lmdb/Nextan_trainval_lmdb"
batch_size: 16
backend: LMDB
}
annotated_data_param {
batch_sampler {
max_sample: 1
max_trials: 1
}
batch_sampler {
sampler {
min_scale: 0.300000011921
max_scale: 1.0
min_aspect_ratio: 0.5
max_aspect_ratio: 2.0
}
sample_constraint {
min_jaccard_overlap: 0.10000000149
}
max_sample: 1
max_trials: 50
}
batch_sampler {
sampler {
min_scale: 0.300000011921
max_scale: 1.0
min_aspect_ratio: 0.5
max_aspect_ratio: 2.0
}
sample_constraint {
min_jaccard_overlap: 0.300000011921
}
max_sample: 1
max_trials: 50
}
batch_sampler {
sampler {
min_scale: 0.300000011921
max_scale: 1.0
min_aspect_ratio: 0.5
max_aspect_ratio: 2.0
}
sample_constraint {
min_jaccard_overlap: 0.5
}
max_sample: 1
max_trials: 50
}
batch_sampler {
sampler {
min_scale: 0.300000011921
max_scale: 1.0
min_aspect_ratio: 0.5
max_aspect_ratio: 2.0
}
sample_constraint {
min_jaccard_overlap: 0.699999988079
}
max_sample: 1
max_trials: 50
}
batch_sampler {
sampler {
min_scale: 0.300000011921
max_scale: 1.0
min_aspect_ratio: 0.5
max_aspect_ratio: 2.0
}
sample_constraint {
min_jaccard_overlap: 0.899999976158
}
max_sample: 1
max_trials: 50
}
batch_sampler {
sampler {
min_scale: 0.300000011921
max_scale: 1.0
min_aspect_ratio: 0.5
max_aspect_ratio: 2.0
}
sample_constraint {
max_jaccard_overlap: 1.0
}
max_sample: 1
max_trials: 50
}
label_map_file: "/home/coie/data/NumberPlate/labelmap_NumberPlate.prototxt"
}
}
layer {
name: "conv_1"
type: "Convolution"
bottom: "data"
top: "conv_1"
param {
lr_mult: 0.0
decay_mult: 0.0
}
param {
lr_mult: 0.0
decay_mult: 0.0
}
convolution_param {
num_output: 8
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "relu_1"
type: "ReLU"
bottom: "conv_1"
top: "conv_1"
}
layer {
name: "conv_1_1"
type: "Convolution"
bottom: "conv_1"
top: "conv_1_1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
convolution_param {
num_output: 8
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "relu_1_1"
type: "ReLU"
bottom: "conv_1_1"
top: "conv_1_1"
}
layer {
name: "pool_1"
type: "Pooling"
bottom: "conv_1_1"
top: "pool_1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv_2"
type: "Convolution"
bottom: "pool_1"
top: "conv_2"
param {
lr_mult: 0.0
decay_mult: 0.0
}
param {
lr_mult: 0.0
decay_mult: 0.0
}
convolution_param {
num_output: 8
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "relu_2"
type: "ReLU"
bottom: "conv_2"
top: "conv_2"
}
layer {
name: "pool_2"
type: "Pooling"
bottom: "conv_2"
top: "pool_2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv_3"
type: "Convolution"
bottom: "pool_2"
top: "conv_3"
param {
lr_mult: 0.0
decay_mult: 0.0
}
param {
lr_mult: 0.0
decay_mult: 0.0
}
convolution_param {
num_output: 16
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "relu_3"
type: "ReLU"
bottom: "conv_3"
top: "conv_3"
}
layer {
name: "pool_3"
type: "Pooling"
bottom: "conv_3"
top: "pool_3"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv_4"
type: "Convolution"
bottom: "pool_3"
top: "conv_4"
param {
lr_mult: 0.0
decay_mult: 0.0
}
param {
lr_mult: 0.0
decay_mult: 0.0
}
convolution_param {
num_output: 16
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "relu_4"
type: "ReLU"
bottom: "conv_4"
top: "conv_4"
}
layer {
name: "pool_4"
type: "Pooling"
bottom: "conv_4"
top: "pool_4"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv_5"
type: "Convolution"
bottom: "pool_4"
top: "conv_5"
param {
lr_mult: 0.0
decay_mult: 0.0
}
param {
lr_mult: 0.0
decay_mult: 0.0
}
convolution_param {
num_output: 32
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "relu_5"
type: "ReLU"
bottom: "conv_5"
top: "conv_5"
}
layer {
name: "pool_5"
type: "Pooling"
bottom: "conv_5"
top: "pool_5"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv_6"
type: "Convolution"
bottom: "pool_5"
top: "conv_6"
param {
lr_mult: 0.0
decay_mult: 0.0
}
param {
lr_mult: 0.0
decay_mult: 0.0
}
convolution_param {
num_output: 32
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "relu_6"
type: "ReLU"
bottom: "conv_6"
top: "conv_6"
}
layer {
name: "conv_6_norm"
type: "Normalize"
bottom: "conv_6"
top: "conv_6_norm"
norm_param {
across_spatial: false
scale_filler {
type: "constant"
value: 20.0
}
channel_shared: false
}
}
layer {
name: "conv_6_norm_mbox_loc"
type: "Convolution"
bottom: "conv_6_norm"
top: "conv_6_norm_mbox_loc"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
convolution_param {
num_output: 12
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "conv_6_norm_mbox_loc_perm"
type: "Permute"
bottom: "conv_6_norm_mbox_loc"
top: "conv_6_norm_mbox_loc_perm"
permute_param {
order: 0
order: 2
order: 3
order: 1
}
}
layer {
name: "conv_6_norm_mbox_loc_flat"
type: "Flatten"
bottom: "conv_6_norm_mbox_loc_perm"
top: "conv_6_norm_mbox_loc_flat"
flatten_param {
axis: 1
}
}
layer {
name: "conv_6_norm_mbox_conf"
type: "Convolution"
bottom: "conv_6_norm"
top: "conv_6_norm_mbox_conf"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
convolution_param {
num_output: 6
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "conv_6_norm_mbox_conf_perm"
type: "Permute"
bottom: "conv_6_norm_mbox_conf"
top: "conv_6_norm_mbox_conf_perm"
permute_param {
order: 0
order: 2
order: 3
order: 1
}
}
layer {
name: "conv_6_norm_mbox_conf_flat"
type: "Flatten"
bottom: "conv_6_norm_mbox_conf_perm"
top: "conv_6_norm_mbox_conf_flat"
flatten_param {
axis: 1
}
}
layer {
name: "conv_6_norm_mbox_priorbox"
type: "PriorBox"
bottom: "conv_6_norm"
bottom: "data"
top: "conv_6_norm_mbox_priorbox"
prior_box_param {
min_size: 25.6000003815
max_size: 48.0
aspect_ratio: 3.0
flip: false
clip: false
variance: 0.10000000149
variance: 0.10000000149
variance: 0.20000000298
variance: 0.20000000298
img_size: 480
step: 32.0
offset: 0.5
}
}
layer {
name: "mbox_loc"
type: "Concat"
bottom: "conv_6_norm_mbox_loc_flat"
top: "mbox_loc"
concat_param {
axis: 1
}
}
layer {
name: "mbox_conf"
type: "Concat"
bottom: "conv_6_norm_mbox_conf_flat"
top: "mbox_conf"
concat_param {
axis: 1
}
}
layer {
name: "mbox_priorbox"
type: "Concat"
bottom: "conv_6_norm_mbox_priorbox"
top: "mbox_priorbox"
concat_param {
axis: 2
}
}
layer {
name: "mbox_loss"
type: "MultiBoxLoss"
bottom: "mbox_loc"
bottom: "mbox_conf"
bottom: "mbox_priorbox"
bottom: "label"
top: "mbox_loss"
include {
phase: TRAIN
}
propagate_down: true
propagate_down: true
propagate_down: false
propagate_down: false
loss_param {
normalization: VALID
}
multibox_loss_param {
loc_loss_type: SMOOTH_L1
conf_loss_type: SOFTMAX
loc_weight: 1.0
num_classes: 2
share_location: true
match_type: PER_PREDICTION
overlap_threshold: 0.5
use_prior_for_matching: true
background_label_id: 1
use_difficult_gt: true
neg_pos_ratio: 3.0
neg_overlap: 0.5
code_type: CENTER_SIZE
ignore_cross_boundary_bbox: false
mining_type: MAX_NEGATIVE
}
}

The Caffe library has param_need_backward, layer_need_backward_ and blob_need_backward_.
param_need_backward controls the weight update: if lr_mult is set to 0, the element for that layer in the param_need_backward array is false, and the weights are then not updated in conv_layer.cpp because param_propagate_down_ is false.
if (this->param_propagate_down_[0] || propagate_down[i]) {
  for (int n = 0; n < this->num_; ++n) {
    // gradient w.r.t. weight. Note that we will accumulate diffs.
    if (this->param_propagate_down_[0]) {
      this->weight_cpu_gemm(bottom_data + n * this->bottom_dim_,
          top_diff + n * this->top_dim_, weight_diff);
    }
    // gradient w.r.t. bottom data, if necessary.
    if (propagate_down[i]) {
      this->backward_cpu_gemm(top_diff + n * this->top_dim_, weight,
          bottom_diff + n * this->bottom_dim_);
    }
  }
}
layer_need_backward_ and blob_need_backward_ apply to layers and blobs as a whole; they are set in Net::Init() according to the network architecture.
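As a rough illustration of that bookkeeping (a simplified Python sketch of the logic described above, not the actual net.cpp code): a layer is marked as needing backward if any of its bottom blobs needs a gradient or if any of its own parameters has a non-zero lr_mult, and its top blobs then inherit that flag for the layers above.
# Simplified illustration of the need-backward propagation described above.
# "layers" is a hypothetical list of dicts, not a real Caffe data structure.
def mark_need_backward(layers):
    blob_need_backward = {}    # blob name -> bool
    layer_need_backward = {}   # layer name -> bool
    for layer in layers:       # assumed to be in forward (bottom-to-top) order
        need = any(blob_need_backward.get(b, False) for b in layer['bottoms'])
        need = need or any(lr != 0 for lr in layer['lr_mults'])
        layer_need_backward[layer['name']] = need
        for t in layer['tops']:
            blob_need_backward[t] = blob_need_backward.get(t, False) or need
    return layer_need_backward
Under such a rule, conv_2 up to conv_6 are still marked as needing backward computation even with lr_mult 0, because their inputs ultimately depend on conv_1_1, which is trainable; this matches the log shown earlier.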

Related

The outputs of the convolutional layer in Caffe are different

I wrote a siamese-like network using Caffe with two inputs. The output of the convolutional layer for the first input is always the same, while the output for the second input changes every time. The input layer and the convolutional layers are as follows:
layer {
name: "input"
type: "Input"
top: "data1"
top: "data2"
input_param {
shape {dim: 1
dim: 1
dim: 28
dim: 28
}
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data1"
top: "conv1_1"
convolution_param {
num_output: 20
kernel_size: 5
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data2"
top: "conv1_2"
convolution_param {
num_output: 20
kernel_size: 5
bias_term: false
weight_filler {
type: "xavier"
}
}
}
Can I build the convolutional layer with a Python layer? If so, how?
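For reference, Caffe's Python layer interface (a caffe.Layer subclass with setup/reshape/forward/backward, hooked in through a layer of type "Python" with a python_param block naming the module and class) can in principle host a custom convolution. The sketch below is a heavily simplified, hypothetical illustration (single bottom, fixed random filters, no padding, stride, or backward pass), not a replacement for the built-in Convolution layer.
import numpy as np
import caffe

class NaiveConvLayer(caffe.Layer):
    """Hypothetical Python layer: a naive valid convolution with fixed filters."""

    def setup(self, bottom, top):
        self.num_output = 20
        self.k = 5
        channels = bottom[0].data.shape[1]
        rng = np.random.RandomState(0)
        # Filters are created here only for illustration; a real layer would
        # register them in self.blobs so the solver could update them.
        self.w = 0.01 * rng.randn(self.num_output, channels,
                                  self.k, self.k).astype(np.float32)

    def reshape(self, bottom, top):
        n, c, h, w = bottom[0].data.shape
        top[0].reshape(n, self.num_output, h - self.k + 1, w - self.k + 1)

    def forward(self, bottom, top):
        x = bottom[0].data
        out = top[0].data
        for i in range(out.shape[2]):
            for j in range(out.shape[3]):
                patch = x[:, :, i:i + self.k, j:j + self.k]      # (n, c, k, k)
                out[:, :, i, j] = np.tensordot(patch, self.w,    # -> (n, num_output)
                                               axes=([1, 2, 3], [1, 2, 3]))

    def backward(self, top, propagate_down, bottom):
        pass  # gradient computation omitted in this sketch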

What is the output dimension in the googlenet after the concat layer?

When looking through the prototxt of GoogLeNet, one finds that the inception modules have a Concat layer at the end which takes several bottom inputs,
e.g.:
layer {
name: "inception_3a/output"
type: "Concat"
bottom: "inception_3a/1x1"
bottom: "inception_3a/3x3"
bottom: "inception_3a/5x5"
bottom: "inception_3a/pool_proj"
top: "inception_3a/output"
}
As can be seen, there is one 1x1 conv layer, one 3x3 conv layer, one 5x5 conv layer and finally a pooling layer. These layers are defined as follows:
layer {
name: "inception_3a/1x1"
type: "Convolution"
bottom: "pool2/3x3_s2"
top: "inception_3a/1x1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 1
weight_filler {
type: "xavier"
std: 0.03
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3a/relu_1x1"
type: "ReLU"
bottom: "inception_3a/1x1"
top: "inception_3a/1x1"
}
layer {
name: "inception_3a/3x3_reduce"
type: "Convolution"
bottom: "pool2/3x3_s2"
top: "inception_3a/3x3_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 1
weight_filler {
type: "xavier"
std: 0.09
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3a/relu_3x3_reduce"
type: "ReLU"
bottom: "inception_3a/3x3_reduce"
top: "inception_3a/3x3_reduce"
}
layer {
name: "inception_3a/3x3"
type: "Convolution"
bottom: "inception_3a/3x3_reduce"
top: "inception_3a/3x3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
std: 0.03
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3a/relu_3x3"
type: "ReLU"
bottom: "inception_3a/3x3"
top: "inception_3a/3x3"
}
layer {
name: "inception_3a/5x5_reduce"
type: "Convolution"
bottom: "pool2/3x3_s2"
top: "inception_3a/5x5_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 16
kernel_size: 1
weight_filler {
type: "xavier"
std: 0.2
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3a/relu_5x5_reduce"
type: "ReLU"
bottom: "inception_3a/5x5_reduce"
top: "inception_3a/5x5_reduce"
}
layer {
name: "inception_3a/5x5"
type: "Convolution"
bottom: "inception_3a/5x5_reduce"
top: "inception_3a/5x5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
pad: 2
kernel_size: 5
weight_filler {
type: "xavier"
std: 0.03
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3a/relu_5x5"
type: "ReLU"
bottom: "inception_3a/5x5"
top: "inception_3a/5x5"
}
layer {
name: "inception_3a/pool"
type: "Pooling"
bottom: "pool2/3x3_s2"
top: "inception_3a/pool"
pooling_param {
pool: MAX
kernel_size: 3
stride: 1
pad: 1
}
}
layer {
name: "inception_3a/pool_proj"
type: "Convolution"
bottom: "inception_3a/pool"
top: "inception_3a/pool_proj"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
kernel_size: 1
weight_filler {
type: "xavier"
std: 0.1
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
It can be seen that these layers have different numbers of outputs and also different filter sizes; anyhow, the documentation on the Concat layer says the following:
Input:
n_i * c_i * h * w for each input blob i from 1 to K.
Output:
if axis = 0: (n_1 + n_2 + ... + n_K) * c_1 * h * w, and all input c_i should be the same.
if axis = 1: n_1 * (c_1 + c_2 + ... + c_K) * h * w, and all input n_i should be the same.
Firstly, I am not sure what the default is, and secondly I am not sure what dimensions the output volume will have, since width and height should stay the same but all three conv layers produce different numbers of channels. Any pointers would be really appreciated.
The default value for the 'Concat' axis is 1, thus concatenating along the channel dimension. In order to do this, all the layers that are concatenated should have the same height and width. Looking at the log, the dimensions are (assuming batch size 32):
inception_3a/1x1 -> [32, 64, 28, 28]
inception_3a/3x3 -> [32, 128, 28, 28]
inception_3a/5x5 -> [32, 32, 28, 28]
inception_3a/pool_proj -> [32, 32, 28, 28]
Thus the final output will have dimensions:
inception_3a/output -> [32, (64+128+32+32), 28, 28] -> [32, 256, 28, 28]
As expected from the Caffe log:
Creating Layer inception_3a/output
inception_3a/output <- inception_3a/1x1
inception_3a/output <- inception_3a/3x3
inception_3a/output <- inception_3a/5x5
inception_3a/output <- inception_3a/pool_proj
inception_3a/output -> inception_3a/output
Setting up inception_3a/output
Top shape: 32 256 28 28 (6422528)
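A quick way to sanity-check that arithmetic (a small sketch; the shapes are the ones listed above, hard-coded rather than read from a live net):
# Concat along axis 1 (channels): channel counts add up, the other dims must match.
shapes = [
    (32, 64, 28, 28),   # inception_3a/1x1
    (32, 128, 28, 28),  # inception_3a/3x3
    (32, 32, 28, 28),   # inception_3a/5x5
    (32, 32, 28, 28),   # inception_3a/pool_proj
]
n, _, h, w = shapes[0]
assert all(s[0] == n and s[2:] == (h, w) for s in shapes)
out_shape = (n, sum(s[1] for s in shapes), h, w)
print(out_shape)  # (32, 256, 28, 28)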

Deploy error axis 2 out of range for alexnet

I am using AlexNet and am trying to deploy my network, but when I do so I get the following error:
I0109 15:16:56.645679 4240 net.cpp:100] Creating Layer fc6
I0109 15:16:56.645681 4240 net.cpp:434] fc6 <- pool5
I0109 15:16:56.645684 4240 net.cpp:408] fc6 -> fc6
I0109 15:16:56.712829 4240 net.cpp:150] Setting up fc6
I0109 15:16:56.712869 4240 net.cpp:157] Top shape: 1 4096 (4096)
I0109 15:16:56.712873 4240 net.cpp:165] Memory required for data: 6778220
I0109 15:16:56.712882 4240 layer_factory.hpp:77] Creating layer relu6
I0109 15:16:56.712890 4240 net.cpp:100] Creating Layer relu6
I0109 15:16:56.712893 4240 net.cpp:434] relu6 <- fc6
I0109 15:16:56.712915 4240 net.cpp:395] relu6 -> fc6 (in-place)
F0109 15:16:56.713158 4240 blob.hpp:122] Check failed: axis_index < num_axes() (2 vs. 2) axis 2 out of range for 2-D Blob with shape 1 4096 (4096)
*** Check failure stack trace: ***
I have no clue why. It has always worked for me, and now this error occurs.
EDIT
layer {
name: "data"
type: "Input"
top: "data"
input_param { shape: { dim: 1 dim: 3 dim: 227 dim: 227 } }
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "norm1"
type: "LRN"
bottom: "conv1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "norm1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "norm2"
type: "LRN"
bottom: "conv2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "norm2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "pool2"
top: "conv3"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "fc6"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6l"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6l"
top: "fc6d"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6d"
top: "fc7"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc8"
type: "InnerProduct"
bottom: "fc7"
top: "fc8"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
inner_product_param {
num_output: 612
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "softmax"
type: "Softmax"
bottom: "fc8"
top: "softmax"
}
Above is my .prototxt (it should be the same as AlexNet).
Check failed: axis_index < num_axes() (2 vs. 2) axis 2 out of range for 2-D Blob with shape 1 4096 (4096)
*** Check failure stack trace: ***
# 0x7f8b8e3125cd google::LogMessage::Fail()
# 0x7f8b8e314433 google::LogMessage::SendToLog()
# 0x7f8b8e31215b google::LogMessage::Flush()
# 0x7f8b8e314e1e google::LogMessageFatal::~LogMessageFatal()
# 0x7f8b8e92c86a caffe::Blob<>::CanonicalAxisIndex()
# 0x7f8b8eaa09c2 caffe::CuDNNReLULayer<>::Reshape()
# 0x7f8b8e97f481 caffe::Net<>::Init()
# 0x7f8b8e980d01 caffe::Net<>::Net()
# 0x7f8b8eac7c5a caffe::Solver<>::InitTrainNet()
# 0x7f8b8eac8fc7 caffe::Solver<>::Init()
# 0x7f8b8eac936a caffe::Solver<>::Solver()
# 0x7f8b8e960c53 caffe::Creator_SGDSolver<>()
# 0x40ac89 train()
# 0x407590 main
# 0x7f8b8d283830 __libc_start_main
# 0x407db9 _start
# (nil) (unknown)
Aborted (core dumped)
Example 2:
I0228 13:03:22.875816 4395 layer_factory.hpp:77] Creating layer relu6
I0228 13:03:22.875828 4395 net.cpp:100] Creating Layer relu6
I0228 13:03:22.875831 4395 net.cpp:434] relu6 <- fc-main
I0228 13:03:22.875855 4395 net.cpp:395] relu6 -> fc-main (in-place)
F0228 13:03:22.876565 4395 blob.hpp:122] Check failed: axis_index < num_axes() (2 vs. 2) axis 2 out of range for 2-D Blob with shape 4 4096 (16384)
*** Check failure stack trace: ***
# 0x7fe1271d85cd google::LogMessage::Fail()
# 0x7fe1271da433 google::LogMessage::SendToLog()
# 0x7fe1271d815b google::LogMessage::Flush()
# 0x7fe1271dae1e google::LogMessageFatal::~LogMessageFatal()
# 0x7fe1277f286a caffe::Blob<>::CanonicalAxisIndex()
# 0x7fe1279669c2 caffe::CuDNNReLULayer<>::Reshape()
# 0x7fe127845481 caffe::Net<>::Init()
# 0x7fe127846d01 caffe::Net<>::Net()
# 0x7fe12798dc5a caffe::Solver<>::InitTrainNet()
# 0x7fe12798efc7 caffe::Solver<>::Init()
# 0x7fe12798f36a caffe::Solver<>::Solver()
# 0x7fe127826c53 caffe::Creator_SGDSolver<>()
# 0x40ac89 train()
# 0x407590 main
# 0x7fe126149830 __libc_start_main
# 0x407db9 _start
# (nil) (unknown)
Aborted (core dumped)
layer {
name: "fc-main"
type: "InnerProduct"
bottom: "pool5"
top: "fc-main"
param {
decay_mult: 1
}
param {
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "xavier"
std: 0.005
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc-main"
top: "fc-main"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc-main"
top: "fc-main"
dropout_param {
dropout_ratio: 0.5
}
}

caffe reshape / upsample fully connected layer

Assuming we have a layer like this:
layer {
name: "fully-connected"
type: "InnerProduct"
bottom: "bottom"
top: "top"
inner_product_param {
num_output: 1
}
}
The output is batch_size x 1. In several papers (for example link1, page 3, picture at the top, or link2, page 4 at the top) I have seen such a layer used at the end to come up with a 2D image for pixel-wise prediction. How is it possible to transform this into a 2D image? I was thinking of reshape or deconvolution, but I cannot figure out how that would work. A simple example would be helpful.
UPDATE: My input images are 304x228 and my ground_truth (depth images) are 75x55.
################# Main net ##################
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "norm1"
type: "LRN"
bottom: "conv1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "norm1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "norm2"
type: "LRN"
bottom: "conv2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "norm2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "pool2"
top: "conv3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "fc6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relufc6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6"
top: "fc7"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4070
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
type: "Reshape"
name: "reshape"
bottom: "fc7"
top: "fc7_reshaped"
reshape_param {
shape { dim: 1 dim: 1 dim: 55 dim: 74 }
}
}
layer {
name: "deconv1"
type: "Deconvolution"
bottom: "fc7_reshaped"
top: "deconv1"
convolution_param {
num_output: 64
kernel_size: 5
pad: 2
stride: 1
#group: 256
weight_filler {
type: "bilinear"
}
bias_term: false
}
}
#########################
layer {
name: "conv6"
type: "Convolution"
bottom: "data"
top: "conv6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 63
kernel_size: 9
stride: 2
pad: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "conv6"
top: "conv6"
}
layer {
name: "pool6"
type: "Pooling"
bottom: "conv6"
top: "pool6"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
########################
layer {
name: "concat"
type: "Concat"
bottom: "deconv1"
bottom: "pool6"
top: "concat"
concat_param {
concat_dim: 1
}
}
layer {
name: "conv7"
type: "Convolution"
bottom: "concat"
top: "conv7"
convolution_param {
num_output: 64
kernel_size: 5
pad: 2
stride: 1
weight_filler {
type: "gaussian"
std: 0.011
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "conv7"
top: "conv7"
relu_param{
negative_slope: 0.01
engine: CUDNN
}
}
layer {
name: "conv8"
type: "Convolution"
bottom: "conv7"
top: "conv8"
convolution_param {
num_output: 64
kernel_size: 5
pad: 2
stride: 1
weight_filler {
type: "gaussian"
std: 0.011
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu8"
type: "ReLU"
bottom: "conv8"
top: "conv8"
relu_param{
negative_slope: 0.01
engine: CUDNN
}
}
layer {
name: "conv9"
type: "Convolution"
bottom: "conv8"
top: "conv9"
convolution_param {
num_output: 1
kernel_size: 5
pad: 2
stride: 1
weight_filler {
type: "gaussian"
std: 0.011
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu9"
type: "ReLU"
bottom: "conv9"
top: "result"
relu_param{
negative_slope: 0.01
engine: CUDNN
}
}
log:
I1108 19:34:57.239722 4277 data_layer.cpp:41] output data size: 1,1,228,304
I1108 19:34:57.243340 4277 data_layer.cpp:41] output data size: 1,1,55,74
I1108 19:34:57.247392 4277 net.cpp:150] Setting up conv1
I1108 19:34:57.247407 4277 net.cpp:157] Top shape: 1 96 55 74 (390720)
I1108 19:34:57.248191 4277 net.cpp:150] Setting up pool1
I1108 19:34:57.248196 4277 net.cpp:157] Top shape: 1 96 27 37 (95904)
I1108 19:34:57.253263 4277 net.cpp:150] Setting up conv2
I1108 19:34:57.253276 4277 net.cpp:157] Top shape: 1 256 27 37 (255744)
I1108 19:34:57.254202 4277 net.cpp:150] Setting up pool2
I1108 19:34:57.254220 4277 net.cpp:157] Top shape: 1 256 13 18 (59904)
I1108 19:34:57.269943 4277 net.cpp:150] Setting up conv3
I1108 19:34:57.269961 4277 net.cpp:157] Top shape: 1 384 13 18 (89856)
I1108 19:34:57.285303 4277 net.cpp:150] Setting up conv4
I1108 19:34:57.285338 4277 net.cpp:157] Top shape: 1 384 13 18 (89856)
I1108 19:34:57.294801 4277 net.cpp:150] Setting up conv5
I1108 19:34:57.294841 4277 net.cpp:157] Top shape: 1 256 13 18 (59904)
I1108 19:34:57.295207 4277 net.cpp:150] Setting up pool5
I1108 19:34:57.295210 4277 net.cpp:157] Top shape: 1 256 6 9 (13824)
I1108 19:34:57.743222 4277 net.cpp:150] Setting up fc6
I1108 19:34:57.743259 4277 net.cpp:157] Top shape: 1 4096 (4096)
I1108 19:34:57.881680 4277 net.cpp:150] Setting up fc7
I1108 19:34:57.881718 4277 net.cpp:157] Top shape: 1 4070 (4070)
I1108 19:34:57.881826 4277 net.cpp:150] Setting up reshape
I1108 19:34:57.881846 4277 net.cpp:157] Top shape: 1 1 55 74 (4070)
I1108 19:34:57.884768 4277 net.cpp:150] Setting up conv6
I1108 19:34:57.885309 4277 net.cpp:150] Setting up pool6
I1108 19:34:57.885327 4277 net.cpp:157] Top shape: 1 63 55 74 (256410)
I1108 19:34:57.885395 4277 net.cpp:150] Setting up concat
I1108 19:34:57.885412 4277 net.cpp:157] Top shape: 1 64 55 74 (260480)
I1108 19:34:57.886759 4277 net.cpp:150] Setting up conv7
I1108 19:34:57.886786 4277 net.cpp:157] Top shape: 1 64 55 74 (260480)
I1108 19:34:57.897269 4277 net.cpp:150] Setting up conv8
I1108 19:34:57.897303 4277 net.cpp:157] Top shape: 1 64 55 74 (260480)
I1108 19:34:57.899129 4277 net.cpp:150] Setting up conv9
I1108 19:34:57.899138 4277 net.cpp:157] Top shape: 1 1 55 74 (4070)
The value of num_output of the last fully connected layer will not be 1 for pixel-wise prediction. It will be equal to w*h of the input image.
What made you think that the value will be 1?
Edit 1:
Below are the dimensions of each layer in the figure on page 3 of link1:
LAYER OUTPUT DIM [c*h*w]
coarse1 96*h1*w1 conv layer
coarse2 256*h2*w2 conv layer
coarse3 384*h3*w3 conv layer
coarse4 384*h4*w4 conv layer
coarse5 256*h5*w5 conv layer
coarse6 4096*1*1 fc layer
coarse7 X*1*1 fc layer, where 'X' can be interpreted as w*h
To understand this further, let's assume we have a network that predicts the pixels of an image. The images are of size 10*10. The final output of the fc layer will then have dimension 100*1*1 (like in coarse7), which can be interpreted as 10*10.
Now the question is how a 1-D array can correctly predict a 2-D image. Note that the loss is computed on this output using labels that correspond to the pixel data, so during training the weights learn to predict the pixel values.
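As a small illustration of that interpretation (a sketch with made-up numbers, following the 10*10 example above):
import numpy as np

# Pretend fc output for a batch of 1: 100 values, one per pixel of a 10x10 image.
fc_output = np.random.rand(1, 100).astype(np.float32)  # shape (N, w*h)
depth_map = fc_output.reshape(1, 1, 10, 10)             # shape (N, C, H, W)
print(depth_map.shape)  # (1, 1, 10, 10)
# The loss is computed against a 10x10 label flattened the same way, so the fc
# weights learn a per-pixel prediction even though the blob itself is 1-D.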
EDIT 2:
Trying to draw the net using draw_net.py in Caffe gives you this:
The ReLU layers connected to conv6 and fc6 have the same name, leading to complicated connectivity in the drawn image. I am not sure whether this will cause issues during training, but I would suggest renaming one of the relu layers to a unique name to avoid unforeseen issues.
Coming back to your question, there doesn't seem to be any upsampling happening after the fully connected layers. As seen in the log:
I1108 19:34:57.881680 4277 net.cpp:150] Setting up fc7
I1108 19:34:57.881718 4277 net.cpp:157] Top shape: 1 4070 (4070)
I1108 19:34:57.881826 4277 net.cpp:150] Setting up reshape
I1108 19:34:57.881846 4277 net.cpp:157] Top shape: 1 1 55 74 (4070)
I1108 19:34:57.884768 4277 net.cpp:150] Setting up conv6
I1108 19:34:57.885309 4277 net.cpp:150] Setting up pool6
I1108 19:34:57.885327 4277 net.cpp:157] Top shape: 1 63 55 74 (256410)
fc7 has an output dimension of 4070*1*1, which is reshaped to 1*55*74 to be passed as an input to the conv6 layer.
The output of the whole network is produced in conv9, which has an output dimension of 1*55*74, exactly the same as the dimensions of the labels (depth data).
Please do pinpoint where you feel the upsampling is happening if my answer is still not clear.
If you simply need fully-connected networks like the conventional multi-layer perceptron, use 2D blobs (shape (N, D)) and call the InnerProductLayer.
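For completeness, a minimal pycaffe NetSpec sketch of that MLP-style usage (the blob shape and layer size here are arbitrary assumptions):
import caffe
from caffe import layers as L

n = caffe.NetSpec()
# 2-D input blob of shape (N, D) = (64, 100), chosen only for illustration.
n.data = L.Input(input_param=dict(shape=dict(dim=[64, 100])))
n.fc = L.InnerProduct(n.data, num_output=10, weight_filler=dict(type='xavier'))
print(n.to_proto())  # emits the equivalent prototxt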

Discrepancy in results when using batch size 1 in prototxt versus coercing batch size to 1 in pycaffe

I am running the MNIST example with some manual changes to the layers. During training everything works great and I reach a final test accuracy of ~99%. I am now trying to work with the generated model in Python using pycaffe, following the steps given here. I want to compute the confusion matrix, so I'm looping through the test images one by one from LMDB and then running the network. Here is the code:
net = caffe.Net(args.proto, args.model, caffe.TEST)
...
datum = caffe.proto.caffe_pb2.Datum()
datum.ParseFromString(value)
label = int(datum.label)
image = caffe.io.datum_to_array(datum).astype(np.uint8)
...
net.blobs['data'].reshape(1, 1, 28, 28) # Greyscale 28x28 images
net.blobs['data'].data[...] = image
net.forward()
# Get predicted label
print net.blobs['label'].data[0] # use this later for confusion matrix
Here is my network definition prototxt
name: "MNISTNet"
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "train_lmdb"
batch_size: 64
backend: LMDB
}
}
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
scale: 0.00390625
}
data_param {
source: "test_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 20
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 50
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "fc1"
type: "InnerProduct"
bottom: "pool2"
top: "fc1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "fc1"
top: "fc1"
}
layer {
name: "fc2"
type: "InnerProduct"
bottom: "fc1"
top: "fc2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc2"
bottom: "label"
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fc2"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
Note that the test batch size is 100, which is why I need that reshape in the Python code. Now, suppose I change the test batch size to 1: the exact same Python code prints different (and mostly correct) predicted class labels. Thus, the code run with batch size 1 produces the expected result with ~99% accuracy, while batch size 100 is horrible.
However, based on the ImageNet pycaffe tutorial, I don't see what I'm doing wrong. As a last resort I can create a copy of my prototxt with batch size 1 for testing and use that in my Python code while keeping the original for training, but that is not ideal.
Also, I don't think it is a preprocessing issue, since that wouldn't explain why it works well with batch size 1.
Any pointers appreciated!
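For what it's worth, a common way to side-step the LMDB Data layer during evaluation is to run the model through a deploy-style prototxt; below is a rough sketch under that assumption (a hypothetical deploy.prototxt with an Input layer of shape 1x1x28x28 in place of the Data layers and without the loss/accuracy layers, plus placeholder file names), reading the prediction from the fc2 blob:
import numpy as np
import caffe

# 'deploy.prototxt' and 'mnist.caffemodel' are placeholders for illustration.
net = caffe.Net('deploy.prototxt', 'mnist.caffemodel', caffe.TEST)

def predict(image_u8):
    # The Data layer's transform_param (scale: 0.00390625) is not applied by an
    # Input layer, so the scaling has to be done manually here.
    x = image_u8.astype(np.float32) * 0.00390625
    net.blobs['data'].data[...] = x.reshape(1, 1, 28, 28)
    net.forward()
    return int(net.blobs['fc2'].data[0].argmax())  # index of the highest score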