How to understand Caffe's bilinear upsampling - caffe

Caffe's documentation says:
layer {
name: "upsample", type: "Deconvolution"
bottom: "{{bottom_name}}" top: "{{top_name}}"
convolution_param {
kernel_size: {{2 * factor - factor % 2}} stride: {{factor}}
num_output: {{C}} group: {{C}}
pad: {{ceil((factor - 1) / 2.)}}
weight_filler: { type: "bilinear" } bias_term: false
}
param { lr_mult: 0 decay_mult: 0 }
}
I don't understand why kernel_size, stride, and pad are set like this.

For upsampling by a factor of 2, plug factor = 2 into the template: kernel_size = 2 * 2 - 2 % 2 = 4, stride = 2, and pad = ceil((2 - 1) / 2.) = 1. So the parameters would be kernel_size: 4, stride: 2, pad: 1.
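To see why these values give an exact resize, here is a minimal sketch that evaluates the template formulas for a few factors and checks the resulting output size. The helper names `upsample_params` and `deconv_out` are made up for illustration, and the output-size formula is the usual transposed-convolution one, stride * (in - 1) + kernel_size - 2 * pad.

```python
import math


def upsample_params(factor):
    """Evaluate the template from the question for an integer factor."""
    kernel_size = 2 * factor - factor % 2
    stride = factor
    pad = math.ceil((factor - 1) / 2.0)
    return kernel_size, stride, pad


def deconv_out(size, kernel_size, stride, pad):
    """Spatial output size of a deconvolution (transposed convolution)."""
    return stride * (size - 1) + kernel_size - 2 * pad


# For each factor, a 10-pixel input should come out as 10 * factor pixels.
for factor in (2, 3, 4):
    k, s, p = upsample_params(factor)
    print(factor, (k, s, p), deconv_out(10, k, s, p))
```

With these settings the kernel covers enough neighbours for linear interpolation, and the padding trims the border so that the output is exactly factor times the input size.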

Related

Caffe Multilabel not working as expected

I've played with Caffe for a long time but have never done multilabel classification, and I'm stuck:
What I'm using
First of all, I'm creating the LMDBs (train_lmdb, val_lmdb), the labels (labels_train_lmdb, labels_val_lmdb) and the mean (mean_lmdb.binaryproto) with Caffe-LMDBCreation-MultiLabel.
The dataset has around 13000 images for 7 classes.
2000 of those images have two classes (for example, the vector is [1, 0, 0, 1, 0, 0, 0]).
The rest of the images have only one class (for example, the vector would be [0, 0, 1, 0, 0, 0, 0]).
What I'm expecting
At the very least, I expect to take an image from the training set, for example:
img1.jpg 0 0 0 1 0 0 0
classify it, and get a value close to [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
What I'm getting instead
For the image above (img1.jpg), I get results like this:
[0.48112139105796814, 0.5486980676651001, 0.5396456122398376, 0.44233766198158264, 0.5605107545852661, 0.3539462387561798, 0.5215630531311035]
which doesn't make sense. I've tried several snapshots (one every 10000 iterations) and the results are similar, all of them very close to 0.50.
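As a quick sanity check on those numbers, here is a small sketch that inverts the sigmoid. It assumes the reported scores are taken from the Sigmoid layer at the end of the deploy.prototxt shown further down; the helper `logit` is only for illustration.

```python
import math


def logit(p):
    """Inverse of the sigmoid: the pre-activation value that yields p."""
    return math.log(p / (1.0 - p))


# Scores reported for img1.jpg in the question.
scores = [
    0.48112139105796814, 0.5486980676651001, 0.5396456122398376,
    0.44233766198158264, 0.5605107545852661, 0.3539462387561798,
    0.5215630531311035,
]

# Scores near 0.5 correspond to pre-sigmoid values near 0.
print([round(logit(p), 3) for p in scores])
```

Under that assumption, scores this close to 0.5 mean the layer feeding the sigmoid outputs values near zero for every class.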
My prototxt
train_val.prototxt
name: "multi-class-alexnet"
# --------------------------------- TRAIN -------------------------------
# -----------------------------------------------------------------------
layer {
name: "data"
type: "Data"
top: "data"
include {
phase: TRAIN
}
transform_param {
mirror: true
crop_size: 180
mean_file: "./mean_lmdb.binaryproto"
}
data_param {
source: "./train_lmdb"
batch_size: 64
backend: LMDB
}
}
# ---------------------------- TRAIN LABELS -----------------------------
# -----------------------------------------------------------------------
layer {
name: "data"
type: "Data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
mean_value: 0
}
data_param {
source: "./labels_train_lmdb"
batch_size: 64
backend: LMDB
}
}
# ---------------------------------- VAL --------------------------------
# -----------------------------------------------------------------------
layer {
name: "data"
type: "Data"
top: "data"
include {
phase: TEST
}
transform_param {
mirror: false
crop_size: 180
mean_file: "./mean_lmdb.binaryproto"
}
data_param {
source: "./val_lmdb"
batch_size: 32
backend: LMDB
}
}
# ----------------------------- VAL LABELS ------------------------------
# -----------------------------------------------------------------------
layer {
name: "data"
type: "Data"
top: "label"
include {
phase: TEST
}
transform_param {
scale: 0.00390625
mean_value: 0
}
data_param {
source: "./labels_val_lmdb"
batch_size: 32
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm1"
type: "LRN"
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "norm1"
top: "conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm2"
type: "LRN"
bottom: "pool2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "norm2"
top: "conv3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6_"
type: "InnerProduct"
bottom: "pool5"
top: "fc6_"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6_"
top: "fc6_"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6_"
top: "fc6_"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6_"
top: "fc7"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "latent"
type: "InnerProduct"
bottom: "fc7"
top: "latent"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 48
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
bottom: "latent"
top: "latent_sigmoid"
name: "latent_sigmoid"
type: "Sigmoid"
}
layer {
name: "fc9"
type: "InnerProduct"
bottom: "latent_sigmoid"
top: "fc9"
param {
lr_mult: 10
decay_mult: 1
}
param {
lr_mult: 20
decay_mult: 0
}
inner_product_param {
num_output: 7
weight_filler {
type: "gaussian"
std: 0.2
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "accuracy"
type: "MultiLabelAccuracy"
bottom: "fc9"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
# ----------------------------------------------------------------
# ----------------- Multi-label Loss Function -------------------
# ----------------------------------------------------------------
layer {
name: "loss"
type: "SigmoidCrossEntropyLoss"
bottom: "fc9"
bottom: "label"
top: "loss"
}
deploy.prototxt:
name: "multi-class-alexnet"
input: "data"
input_shape {
dim: 10
dim: 3
dim: 180
dim: 180
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm1"
type: "LRN"
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "norm1"
top: "conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm2"
type: "LRN"
bottom: "pool2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "norm2"
top: "conv3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6_"
type: "InnerProduct"
bottom: "pool5"
top: "fc6_"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6_"
top: "fc6_"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6_"
top: "fc6_"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6_"
top: "fc7"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "latent_"
type: "InnerProduct"
bottom: "fc7"
top: "latent_"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 7
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
bottom: "latent_"
top: "latent_sigmoid"
name: "latent_sigmoid"
type: "Sigmoid"
}
Loss function
Somehow my model is showing two losses, which I don't understand: a loss on output #0 and a loss on output #5. Here is a grep of the last lines (over 110000 iterations):
output #0:
I0427 10:20:04.475754 1817 solver.cpp:238] Train net output #0: loss = 0.0867133 (* 1 = 0.0867133 loss)
I0427 10:20:38.257825 1817 solver.cpp:238] Train net output #0: loss = 0.0477974 (* 1 = 0.0477974 loss)
I0427 10:21:11.794013 1817 solver.cpp:238] Train net output #0: loss = 0.0390092 (* 1 = 0.0390092 loss)
I0427 10:21:45.620671 1817 solver.cpp:238] Train net output #0: loss = 0.039954 (* 1 = 0.039954 loss)
I0427 10:22:19.271747 1817 solver.cpp:238] Train net output #0: loss = 0.0477802 (* 1 = 0.0477802 loss)
I0427 10:22:53.160802 1817 solver.cpp:238] Train net output #0: loss = 0.0406158 (* 1 = 0.0406158 loss)
I0427 10:23:26.843694 1817 solver.cpp:238] Train net output #0: loss = 0.0355715 (* 1 = 0.0355715 loss)
I0427 10:24:31.727321 1817 solver.cpp:238] Train net output #0: loss = 0.0396538 (* 1 = 0.0396538 loss)
I0427 10:25:05.019598 1817 solver.cpp:238] Train net output #0: loss = 0.037121 (* 1 = 0.037121 loss)
I0427 10:25:38.730303 1817 solver.cpp:238] Train net output #0: loss = 0.0362058 (* 1 = 0.0362058 loss)
output #5:
I0427 09:26:52.251719 1817 solver.cpp:398] Test net output #5: loss = 6.98116 (* 1 = 6.98116 loss)
I0427 09:33:01.639736 1817 solver.cpp:398] Test net output #5: loss = 6.99285 (* 1 = 6.99285 loss)
I0427 09:39:09.991879 1817 solver.cpp:398] Test net output #5: loss = 7.02165 (* 1 = 7.02165 loss)
I0427 09:45:18.013739 1817 solver.cpp:398] Test net output #5: loss = 7.01533 (* 1 = 7.01533 loss)
I0427 09:51:27.065721 1817 solver.cpp:398] Test net output #5: loss = 7.02347 (* 1 = 7.02347 loss)
I0427 09:58:13.271441 1817 solver.cpp:398] Test net output #5: loss = 6.98176 (* 1 = 6.98176 loss)
I0427 10:05:31.896226 1817 solver.cpp:398] Test net output #5: loss = 6.99103 (* 1 = 6.99103 loss)
I0427 10:12:12.693677 1817 solver.cpp:398] Test net output #5: loss = 7.02868 (* 1 = 7.02868 loss)
I0427 10:18:23.250385 1817 solver.cpp:398] Test net output #5: loss = 7.03427 (* 1 = 7.03427 loss)
I0427 10:24:31.239820 1817 solver.cpp:398] Test net output #5: loss = 6.97721 (* 1 = 6.97721 loss)
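The two groups of lines are tagged "Train net" and "Test net", so it can help to separate them when comparing. A minimal sketch for pulling the loss values out of a log in this format, where the regular expression and the function name `parse_losses` are my own and only cover lines shaped like the ones above:

```python
import re

# Matches e.g. "Train net output #0: loss = 0.0867133 (* 1 = ... loss)".
LOSS_LINE = re.compile(r"(Train|Test) net output #(\d+): loss = ([0-9.eE+-]+)")


def parse_losses(log_text):
    """Collect loss values per phase, in the order they appear in the log."""
    losses = {"Train": [], "Test": []}
    for match in LOSS_LINE.finditer(log_text):
        phase, _output_index, value = match.groups()
        losses[phase].append(float(value))
    return losses


sample = (
    "I0427 10:25:38.730303 1817 solver.cpp:238] Train net output #0: "
    "loss = 0.0362058 (* 1 = 0.0362058 loss)\n"
    "I0427 10:24:31.239820 1817 solver.cpp:398] Test net output #5: "
    "loss = 6.97721 (* 1 = 6.97721 loss)\n"
)
print(parse_losses(sample))
```

Comparing the two series side by side makes the gap visible: the training loss is around 0.04 while the test loss stays around 7.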

The outputs of the convolutional layer in Caffe are different

I wrote a siamese-like network using Caffe with two inputs. The output of the convolutional layer for the first input is always the same, while the output for the second input changes every time. The input layer and the convolutional layers are as follows:
layer {
name: "input"
type: "Input"
top: "data1"
top: "data2"
input_param {
shape {dim: 1
dim: 1
dim: 28
dim: 28
}
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data1"
top: "conv1_1"
convolution_param {
num_output: 20
kernel_size: 5
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data2"
top: "conv1_2"
convolution_param {
num_output: 20
kernel_size: 5
bias_term: false
weight_filler {
type: "xavier"
}
}
}
Can I build the convolutional layer with a Python layer? If so, how?

What is the output dimension in the googlenet after the concat layer?

When looking through the prototxt of GoogLeNet, one finds that the inception layers have a Concat layer at the end which takes several bottom inputs,
e.g.:
layer {
name: "inception_3a/output"
type: "Concat"
bottom: "inception_3a/1x1"
bottom: "inception_3a/3x3"
bottom: "inception_3a/5x5"
bottom: "inception_3a/pool_proj"
top: "inception_3a/output"
}
As can be seen, there is one 1x1 conv layer, one 3x3 conv layer, one 5x5 conv layer and finally a pooling layer. These layers are defined as follows:
layer {
name: "inception_3a/1x1"
type: "Convolution"
bottom: "pool2/3x3_s2"
top: "inception_3a/1x1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 1
weight_filler {
type: "xavier"
std: 0.03
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3a/relu_1x1"
type: "ReLU"
bottom: "inception_3a/1x1"
top: "inception_3a/1x1"
}
layer {
name: "inception_3a/3x3_reduce"
type: "Convolution"
bottom: "pool2/3x3_s2"
top: "inception_3a/3x3_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 1
weight_filler {
type: "xavier"
std: 0.09
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3a/relu_3x3_reduce"
type: "ReLU"
bottom: "inception_3a/3x3_reduce"
top: "inception_3a/3x3_reduce"
}
layer {
name: "inception_3a/3x3"
type: "Convolution"
bottom: "inception_3a/3x3_reduce"
top: "inception_3a/3x3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
std: 0.03
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3a/relu_3x3"
type: "ReLU"
bottom: "inception_3a/3x3"
top: "inception_3a/3x3"
}
layer {
name: "inception_3a/5x5_reduce"
type: "Convolution"
bottom: "pool2/3x3_s2"
top: "inception_3a/5x5_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 16
kernel_size: 1
weight_filler {
type: "xavier"
std: 0.2
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3a/relu_5x5_reduce"
type: "ReLU"
bottom: "inception_3a/5x5_reduce"
top: "inception_3a/5x5_reduce"
}
layer {
name: "inception_3a/5x5"
type: "Convolution"
bottom: "inception_3a/5x5_reduce"
top: "inception_3a/5x5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
pad: 2
kernel_size: 5
weight_filler {
type: "xavier"
std: 0.03
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3a/relu_5x5"
type: "ReLU"
bottom: "inception_3a/5x5"
top: "inception_3a/5x5"
}
layer {
name: "inception_3a/pool"
type: "Pooling"
bottom: "pool2/3x3_s2"
top: "inception_3a/pool"
pooling_param {
pool: MAX
kernel_size: 3
stride: 1
pad: 1
}
}
layer {
name: "inception_3a/pool_proj"
type: "Convolution"
bottom: "inception_3a/pool"
top: "inception_3a/pool_proj"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
kernel_size: 1
weight_filler {
type: "xavier"
std: 0.1
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
These layers have different numbers of outputs and different filter sizes. The documentation of the Concat layer says the following:
Input:
n_i * c_i * h * w for each input blob i from 1 to K.
Output:
if axis = 0: (n_1 + n_2 + ... + n_K) * c_1 * h * w, and all input c_i should be the same.
if axis = 1: n_1 * (c_1 + c_2 + ... + c_K) * h * w, and all input n_i should be the same.
Firstly, I am not sure what the default axis is. Secondly, I am not sure which dimensions the output volume will have, since width and height should stay the same but all three conv layers produce different outputs. Any pointers would be really appreciated.
The default axis for the Concat layer is 1, so the blobs are concatenated along the channel dimension. For this to work, all concatenated blobs must have the same height and width. According to the log, the dimensions are (assuming batch size 32):
inception_3a/1x1 -> [32, 64, 28, 28]
inception_3a/3x3 -> [32, 128, 28, 28]
inception_3a/5x5 -> [32, 32, 28, 28]
inception_3a/pool_proj -> [32, 32, 28, 28]
Thus the final output will have dimension:
inception_3a/output -> [32, (64+128+32+32), 28, 28] -> [32, 256, 28, 28]
As expected from the Caffe log:
Creating Layer inception_3a/output
inception_3a/output <- inception_3a/1x1
inception_3a/output <- inception_3a/3x3
inception_3a/output <- inception_3a/5x5
inception_3a/output <- inception_3a/pool_proj
inception_3a/output -> inception_3a/output
Setting up inception_3a/output
Top shape: 32 256 28 28 (6422528)
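The reason all four branches keep the same 28x28 size is the padding in each layer. Here is a small sketch that reproduces the shapes above, assuming a 28x28 input to the module (consistent with the log) and using the standard output-size formulas. The function names are made up for illustration, and `pool_out` leaves out Caffe's extra border-clipping rule, which does not matter for stride 1.

```python
import math
from functools import reduce


def conv_out(size, kernel, stride=1, pad=0):
    """Convolution output size: floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride=1, pad=0):
    """Pooling output size, rounding up instead of down (simplified)."""
    return math.ceil((size + 2 * pad - kernel) / stride) + 1


def concat_shape(shapes, axis=1):
    """Shape after concatenation; every other dimension must match."""
    out = list(shapes[0])
    for shape in shapes[1:]:
        for d, (a, b) in enumerate(zip(out, shape)):
            if d != axis and a != b:
                raise ValueError("dimension %d differs: %d vs %d" % (d, a, b))
    out[axis] = sum(shape[axis] for shape in shapes)
    return tuple(out)


n, hw = 32, 28  # batch size and input height/width assumed from the log
branches = [
    (n, 64, conv_out(hw, 1), conv_out(hw, 1)),                 # 1x1
    (n, 128, conv_out(hw, 3, pad=1), conv_out(hw, 3, pad=1)),  # 3x3, pad 1
    (n, 32, conv_out(hw, 5, pad=2), conv_out(hw, 5, pad=2)),   # 5x5, pad 2
    (n, 32, pool_out(hw, 3, 1, 1), pool_out(hw, 3, 1, 1)),     # pool + 1x1 proj
]
output = concat_shape(branches)
count = reduce(lambda a, b: a * b, output)
print(output, count)
```

The element count matches the number in parentheses in the "Top shape" log line.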

caffe reshape / upsample fully connected layer

Assuming we have a layer like this:
layer {
name: "fully-connected"
type: "InnerProduct"
bottom: "bottom"
top: "top"
inner_product_param {
num_output: 1
}
}
The output is batch_size x 1. In several papers (for example link1, page 3, picture at the top, or link2, page 4, at the top) I have seen such a layer used at the end to produce a 2D image for pixel-wise prediction. How is it possible to transform this into a 2D image? I was thinking of reshape or deconvolution, but I cannot figure out how that would work. A simple example would be helpful.
UPDATE: My input images are 304x228 and my ground_truth (depth images) are 75x55.
################# Main net ##################
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "norm1"
type: "LRN"
bottom: "conv1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "norm1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "norm2"
type: "LRN"
bottom: "conv2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "norm2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "pool2"
top: "conv3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "fc6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relufc6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6"
top: "fc7"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4070
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
type: "Reshape"
name: "reshape"
bottom: "fc7"
top: "fc7_reshaped"
reshape_param {
shape { dim: 1 dim: 1 dim: 55 dim: 74 }
}
}
layer {
name: "deconv1"
type: "Deconvolution"
bottom: "fc7_reshaped"
top: "deconv1"
convolution_param {
num_output: 64
kernel_size: 5
pad: 2
stride: 1
#group: 256
weight_filler {
type: "bilinear"
}
bias_term: false
}
}
#########################
layer {
name: "conv6"
type: "Convolution"
bottom: "data"
top: "conv6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 63
kernel_size: 9
stride: 2
pad: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "conv6"
top: "conv6"
}
layer {
name: "pool6"
type: "Pooling"
bottom: "conv6"
top: "pool6"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
########################
layer {
name: "concat"
type: "Concat"
bottom: "deconv1"
bottom: "pool6"
top: "concat"
concat_param {
concat_dim: 1
}
}
layer {
name: "conv7"
type: "Convolution"
bottom: "concat"
top: "conv7"
convolution_param {
num_output: 64
kernel_size: 5
pad: 2
stride: 1
weight_filler {
type: "gaussian"
std: 0.011
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "conv7"
top: "conv7"
relu_param{
negative_slope: 0.01
engine: CUDNN
}
}
layer {
name: "conv8"
type: "Convolution"
bottom: "conv7"
top: "conv8"
convolution_param {
num_output: 64
kernel_size: 5
pad: 2
stride: 1
weight_filler {
type: "gaussian"
std: 0.011
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu8"
type: "ReLU"
bottom: "conv8"
top: "conv8"
relu_param{
negative_slope: 0.01
engine: CUDNN
}
}
layer {
name: "conv9"
type: "Convolution"
bottom: "conv8"
top: "conv9"
convolution_param {
num_output: 1
kernel_size: 5
pad: 2
stride: 1
weight_filler {
type: "gaussian"
std: 0.011
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu9"
type: "ReLU"
bottom: "conv9"
top: "result"
relu_param{
negative_slope: 0.01
engine: CUDNN
}
}
log:
I1108 19:34:57.239722 4277 data_layer.cpp:41] output data size: 1,1,228,304
I1108 19:34:57.243340 4277 data_layer.cpp:41] output data size: 1,1,55,74
I1108 19:34:57.247392 4277 net.cpp:150] Setting up conv1
I1108 19:34:57.247407 4277 net.cpp:157] Top shape: 1 96 55 74 (390720)
I1108 19:34:57.248191 4277 net.cpp:150] Setting up pool1
I1108 19:34:57.248196 4277 net.cpp:157] Top shape: 1 96 27 37 (95904)
I1108 19:34:57.253263 4277 net.cpp:150] Setting up conv2
I1108 19:34:57.253276 4277 net.cpp:157] Top shape: 1 256 27 37 (255744)
I1108 19:34:57.254202 4277 net.cpp:150] Setting up pool2
I1108 19:34:57.254220 4277 net.cpp:157] Top shape: 1 256 13 18 (59904)
I1108 19:34:57.269943 4277 net.cpp:150] Setting up conv3
I1108 19:34:57.269961 4277 net.cpp:157] Top shape: 1 384 13 18 (89856)
I1108 19:34:57.285303 4277 net.cpp:150] Setting up conv4
I1108 19:34:57.285338 4277 net.cpp:157] Top shape: 1 384 13 18 (89856)
I1108 19:34:57.294801 4277 net.cpp:150] Setting up conv5
I1108 19:34:57.294841 4277 net.cpp:157] Top shape: 1 256 13 18 (59904)
I1108 19:34:57.295207 4277 net.cpp:150] Setting up pool5
I1108 19:34:57.295210 4277 net.cpp:157] Top shape: 1 256 6 9 (13824)
I1108 19:34:57.743222 4277 net.cpp:150] Setting up fc6
I1108 19:34:57.743259 4277 net.cpp:157] Top shape: 1 4096 (4096)
I1108 19:34:57.881680 4277 net.cpp:150] Setting up fc7
I1108 19:34:57.881718 4277 net.cpp:157] Top shape: 1 4070 (4070)
I1108 19:34:57.881826 4277 net.cpp:150] Setting up reshape
I1108 19:34:57.881846 4277 net.cpp:157] Top shape: 1 1 55 74 (4070)
I1108 19:34:57.884768 4277 net.cpp:150] Setting up conv6
I1108 19:34:57.885309 4277 net.cpp:150] Setting up pool6
I1108 19:34:57.885327 4277 net.cpp:157] Top shape: 1 63 55 74 (256410)
I1108 19:34:57.885395 4277 net.cpp:150] Setting up concat
I1108 19:34:57.885412 4277 net.cpp:157] Top shape: 1 64 55 74 (260480)
I1108 19:34:57.886759 4277 net.cpp:150] Setting up conv7
I1108 19:34:57.886786 4277 net.cpp:157] Top shape: 1 64 55 74 (260480)
I1108 19:34:57.897269 4277 net.cpp:150] Setting up conv8
I1108 19:34:57.897303 4277 net.cpp:157] Top shape: 1 64 55 74 (260480)
I1108 19:34:57.899129 4277 net.cpp:150] Setting up conv9
I1108 19:34:57.899138 4277 net.cpp:157] Top shape: 1 1 55 74 (4070)
For pixel-wise prediction, num_output of the last fully connected layer will not be 1. It will be equal to w*h of the image to be predicted.
What made you think the value would be 1?
Edit 1:
Below are the dimensions of each layer shown in the figure on page 3 of link1:
LAYER     OUTPUT DIM [c*h*w]   TYPE
coarse1   96*h1*w1             conv layer
coarse2   256*h2*w2            conv layer
coarse3   384*h3*w3            conv layer
coarse4   384*h4*w4            conv layer
coarse5   256*h5*w5            conv layer
coarse6   4096*1*1             fc layer
coarse7   X*1*1                fc layer, where 'X' could be interpreted as w*h
To understand this further, let's assume we have a network that predicts the pixels of an image. The images are of size 10*10. The final output of the fc layer will then have dimension 100*1*1 (as in coarse7), which can be interpreted as 10*10.
The question is then how a 1D array can predict a 2D image correctly. Note that the loss is computed on this output against labels that correspond to the pixel data. During training, the weights therefore learn to predict the pixel data.
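To make the "interpreted as w*h" step concrete, here is a minimal sketch of reinterpreting a flat fc output as a 2D map, using the 55x74 size from the question's log. It uses plain Python lists and assumes row-major order with width varying fastest; `reshape_hw` is a name made up for illustration.

```python
def reshape_hw(flat, height, width):
    """View a flat list of height*width values as `height` rows of `width`."""
    if len(flat) != height * width:
        raise ValueError("expected %d values, got %d" % (height * width, len(flat)))
    return [flat[r * width:(r + 1) * width] for r in range(height)]


height, width = 55, 74
flat = list(range(height * width))  # stand-in for the 4070 fc7 outputs
image = reshape_hw(flat, height, width)
print(len(flat), len(image), len(image[0]))
```

No values are changed or interpolated here. Output index r * width + c simply becomes pixel (r, c), which is why num_output has to equal height * width.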
EDIT 2:
Drawing the net with draw_net.py in Caffe gives you this:
The ReLU layers connected to conv6 and fc6 have the same name, which leads to complicated connectivity in the drawn image. I am not sure whether this will cause issues during training, but I would suggest renaming one of the ReLU layers to a unique name to avoid unforeseen issues.
Coming back to your question, no upsampling seems to happen after the fully connected layers. As seen in the log:
I1108 19:34:57.881680 4277 net.cpp:150] Setting up fc7
I1108 19:34:57.881718 4277 net.cpp:157] Top shape: 1 4070 (4070)
I1108 19:34:57.881826 4277 net.cpp:150] Setting up reshape
I1108 19:34:57.881846 4277 net.cpp:157] Top shape: 1 1 55 74 (4070)
I1108 19:34:57.884768 4277 net.cpp:150] Setting up conv6
I1108 19:34:57.885309 4277 net.cpp:150] Setting up pool6
I1108 19:34:57.885327 4277 net.cpp:157] Top shape: 1 63 55 74 (256410)
fc7 has an output dimension of 4070*1*1. This is reshaped to 1*55*74 to be passed as input to the conv6 layer.
The output of the whole network is produced in conv9, which has an output dimension of 1*55*74, identical to the dimension of the labels (depth data).
Please point out where you think the upsampling is happening if my answer is still not clear.
If you simply need fully connected networks like the conventional multi-layer perceptron, use 2D blobs (shape (N, D)) and call the InnerProductLayer.

Which part of the deploy.prototxt file in caffe is absolutely necessary for testing?

In a recent discussion, I found out that some parts of the deploy.prototxt exist only because they have been directly copied from the train_test.prototxt and are ignored during testing. For example:
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param { #Starting here
lr_mult: 1
}
param {
lr_mult: 2
} #To here
convolution_param { #is this section useful?
num_output: 20
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
I was told that the section containing the learning rates for weights and biases is useless in deploy files and can be deleted. This got me thinking: is the convolution_param portion absolutely required? If so, do we still have to define the weight and bias fillers, given that this file is only used for testing and fillers are only used to initialize the weights when training a network? Are there any other unnecessary details?
The convolution_param portion is required but you can remove weight_filler and bias_filler if you want.
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 20
kernel_size: 5
stride: 1
}
}
The layer above will run fine at test time.
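If you want to apply this cleanup to a whole file, here is a rough text-based sketch. It is not a Caffe tool: `strip_train_only` is a hypothetical helper that drops `param`, `weight_filler` and `bias_filler` blocks by counting braces, and it assumes one field per line (as in the question's layout) and no '#' inside quoted strings.

```python
import re

# Blocks the answer above says are not needed at test time.
TRAIN_ONLY = re.compile(r"^\s*(param|weight_filler|bias_filler)\s*:?\s*\{")


def strip_train_only(prototxt):
    """Remove training-only blocks from prototxt text, line by line."""
    kept = []
    depth = 0  # > 0 while inside a block that is being dropped
    for line in prototxt.splitlines():
        code = line.split("#", 1)[0]  # ignore braces inside comments
        if depth == 0 and TRAIN_ONLY.match(code):
            depth = code.count("{") - code.count("}")
            continue
        if depth > 0:
            depth += code.count("{") - code.count("}")
            continue
        kept.append(line)
    return "\n".join(kept)


example = """layer {
  name: "conv1"
  type: "Convolution"
  param { #Starting here
    lr_mult: 1
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    weight_filler {
      type: "xavier"
    }
  }
}"""
print(strip_train_only(example))
```

Note that `convolution_param` is kept because the pattern only matches a line that starts with the bare word `param`. For anything beyond a quick cleanup, parsing the file with the protobuf text format would be more robust than this line-based approach.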