How can I define a constant vector layer in Caffe? - caffe

I have data with 10-d label vectors, and I want to use a Caffe model to do regression on these data with a 10-d output. For now, I only want to check the loss of some outputs (for example, entries 1, 3, 4, 5, 6 of the 10-d vector), so I defined a layer with a 5-d output at the bottom of the last output layer. But I have no idea how to get the corresponding 5-d ground-truth label vector. I think maybe I can define a constant layer to indicate which entries I want to get. Please help if you have any ideas.
update: example
This is my original InnerProduct and Loss layer
layer {
name: "score"
type: "InnerProduct"
bottom: "fc7"
top: "score"
inner_product_param {
num_output: 10
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "loss"
type: "EuclideanLoss"
bottom: "score"
bottom: "label"
top: "loss"
include {
phase: TRAIN
}
}
I care more about $n_1$ (like 1,3,4,5,6) entries of the 10-dimension output and their loss, so I want to fetch the loss of these entries, like
layer {
name: "score1"
type: "InnerProduct"
bottom: "fc7"
top: "score1"
inner_product_param {
num_output: 5 # n_1
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "loss1"
type: "EuclideanLoss"
bottom: "score1"
bottom: "label"
top: "loss1"
include {
phase: TRAIN
}
}
How can I get score1 from score directly?

As I understand your question, you want to compute the loss of the output layer with respect to the labels for a regression model, but leave some of the labels out of the computation.
If that is right, then since this is a regression model I would expect your new layer to be similar to EuclideanLossLayer. In that case, the caffe_sub call in the layer could be replaced by the following code segment.
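// Offsets of the entries that should contribute to the loss.
// Note: these offsets address the first sample in the batch only.
int arrayPos[5] = {1,3,4,5,6};
int count = 5;
// Temporary buffers holding the selected predictions and labels.
Dtype *newBottom0=(Dtype*)malloc(sizeof(Dtype)*count);
Dtype *newBottom1=(Dtype*)malloc(sizeof(Dtype)*count);
for(int varI=0; varI<count; varI++)
{
newBottom0[varI] = (Dtype) bottom[0]->cpu_data()[arrayPos[varI]];
newBottom1[varI] = (Dtype) bottom[1]->cpu_data()[arrayPos[varI]];
}
// Difference over the selected entries, written to the first `count` slots of diff_.
caffe_sub( count, newBottom0, newBottom1, diff_.mutable_cpu_data());
free(newBottom0);
free(newBottom1);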
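To make the intended arithmetic concrete, here is a minimal sketch in plain Python rather than Caffe C++. It assumes the positions in arrayPos are 0-based and that the loss is the usual Euclidean loss, 1/(2N) times the sum of squared differences; the helper name selected_euclidean_loss is hypothetical and not part of Caffe. Unlike the snippet above, it loops over every sample in the batch.

```python
# Hypothetical helper, not part of Caffe: Euclidean loss over selected entries.
SELECTED = [1, 3, 4, 5, 6]  # assumed 0-based positions, as in arrayPos above


def selected_euclidean_loss(scores, labels, selected=SELECTED):
    """Return 1/(2N) * sum of squared differences over the selected entries.

    scores, labels: lists of N rows, each row a list of per-sample values.
    """
    n = len(scores)
    total = 0.0
    for score_row, label_row in zip(scores, labels):
        for i in selected:
            d = score_row[i] - label_row[i]
            total += d * d
    return total / (2.0 * n)


# Two samples with 10 outputs each; only the second sample differs from its label.
scores = [[0.0] * 10, [1.0] * 10]
labels = [[0.0] * 10, [0.0] * 10]
print(selected_euclidean_loss(scores, labels))
```

Entries outside the selection never enter the sum, so changing label 0, 2, 7, 8, or 9 leaves the result unchanged.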

Related

Convolution layer in CNN

We know that a convolution layer in a CNN uses filters, and that different filters look for different information in the input image.
But say that in this SSD we have a prototxt file that specifies the convolution layer as
layer {
name: "conv2_1"
type: "Convolution"
bottom: "pool1"
top: "conv2_1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
The convolution layers in different networks (GoogleNet, AlexNet, VGG, etc.) are all more or less similar.
Just by looking at this definition, how can I tell which information of the input image the filters in this convolution layer try to extract?
EDIT:
Let me clarify my question.
I see two convolution layers in the prototxt file, as follows. They are from SSD.
layer {
name: "conv1_1"
type: "Convolution"
bottom: "data"
top: "conv1_1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "conv2_1"
type: "Convolution"
bottom: "pool1"
top: "conv2_1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
Then I printed their outputs:
Data
conv1_1 and conv2_1 images are here and here.
So my question is how these two conv layers produce different outputs when there is no difference between them in the prototxt file.
The filters in the earlier layers represent low-level features like edges. (These features retain higher spatial resolution for precise localization, with low-level visual information similar to the response map of Gabor filters.) The filters in the middle layers, on the other hand, extract more complex features like corners or blobs.
As you go deeper, you can no longer visualize and interpret these features, because filters in mid-level and high-level layers are not directly connected to the input image. For instance, you can visualize the output of the first layer and interpret it as edges, but when you go deeper and apply a second convolution layer to these extracted edges (the output of the first layer), you get something like edges of edges, which captures more semantic information and less fine-grained spatial detail. In the prototxt file, all convolutions and other types of operations can resemble each other, but they extract different kinds of features because they differ in order and weights.
"Convolution" layers differ not only in their parameters (e.g., kernel_size, stride, pad, etc.) but also in their weights: the trainable parameters of the convolution kernels.
You see different output (aka "responses") because the weights of the filters are different.
See this answer regarding the difference between "data" blobs and "parameter/weights" blobs in caffe.
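As a small illustration of that point, the sketch below applies two filters with identical hyperparameters (3x3 kernel, pad 1, stride 1) to the same toy image. The image and both kernels are made up for the example, and the plain-Python conv2d helper is hypothetical and only stands in for what a convolution layer computes for one input channel and one filter.

```python
def conv2d(image, kernel, pad=1):
    """Single-channel 2-D cross-correlation with zero padding and stride 1."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    out_h = h + 2 * pad - k + 1
    out_w = w + 2 * pad - k + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    iy, ix = y + dy - pad, x + dx - pad
                    # Positions outside the image count as zero padding.
                    if 0 <= iy < h and 0 <= ix < w:
                        acc += image[iy][ix] * kernel[dy][dx]
            out[y][x] = acc
    return out


# A 4x4 toy image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1]] * 4

# Same hyperparameters (3x3, pad 1), different weights.
vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
box_blur = [[1 / 9.0] * 3 for _ in range(3)]

edges = conv2d(image, vertical_edge)
blurred = conv2d(image, box_blur)
print(edges[1])
print(blurred[1])
```

Both outputs have the same 4x4 shape, which is all the prototxt determines; the values differ because the weights differ.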

Understanding Faster RCNN processing

I went through the code of Faster RCNN to better understand the implementation.
I used gdb to debug the C++ code behind the Python interface, so I can step through the C++ code line by line.
This paper (page 4, first paragraph) mentions splitting the convolutional map into 2k scores and 4k coordinates.
That is implemented using this prototxt as
layer {
name: "rpn_conv/3x3"
type: "Convolution"
bottom: "conv5_3"
top: "rpn/output"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
convolution_param {
num_output: 512
kernel_size: 3 pad: 1 stride: 1
weight_filler { type: "gaussian" std: 0.01 }
bias_filler { type: "constant" value: 0 }
}
}
layer {
name: "rpn_cls_score"
type: "Convolution"
bottom: "rpn/output"
top: "rpn_cls_score"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
convolution_param {
num_output: 18 # 2(bg/fg) * 9(anchors)
kernel_size: 1 pad: 0 stride: 1
weight_filler { type: "gaussian" std: 0.01 }
bias_filler { type: "constant" value: 0 }
}
}
layer {
name: "rpn_bbox_pred"
type: "Convolution"
bottom: "rpn/output"
top: "rpn_bbox_pred"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
convolution_param {
num_output: 36 # 4 * 9(anchors)
kernel_size: 1 pad: 0 stride: 1
weight_filler { type: "gaussian" std: 0.01 }
bias_filler { type: "constant" value: 0 }
}
}
But going through the code, I see that this is actually implemented in cudnn_conv_layer.cpp and cudnn_conv_layer.cu.
After passing through the rpn_cls_score and rpn_bbox_pred layers, I can see that the output blob shapes are
capacity 4 = {1, 18, 36, 49}
capacity 4 = {1, 36, 36, 49}, so it split the scores and boxes.
(1) How can I understand the process by which the 256- or 512-dimensional feature is split into {1, 18, 36, 49} and {1, 36, 36, 49}? There are lr_mult parameters, but I can't even find how lr_mult is used.
(2) Page 5, first column, discusses the loss implementation. I can't find where this SGD loss minimization is implemented in the code.
It would be better to try some basic tutorials on CNNs and Caffe first. Jumping directly into the Caffe implementation without knowing the background theory might lead to more confusion.
The region of the prototxt you showed is just three convolution layers.
In layer "rpn_cls_score", the 512-plane input feature maps are convolved with 18 filters to get 18-plane output feature maps. In layer "rpn_bbox_pred", the same 512-plane input feature maps are convolved with 36 filters to get 36-plane output feature maps.
Both of these layers are convolution layers.
See the CPU implementation: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cpp
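To illustrate why nothing is really "split", here is a minimal plain-Python sketch of a 1x1 convolution, under the assumption that it acts as a per-pixel linear map over channels. The helper names conv1x1 and shape, the toy sizes, and the weight values are all made up; the real blob has 512 channels and 36 x 49 positions. Each head is its own convolution reading the same input, and the number of output planes is simply its num_output.

```python
def conv1x1(feature_map, weights, bias):
    """Apply a 1x1 convolution.

    feature_map: [C_in][H][W], weights: [C_out][C_in], bias: [C_out].
    Returns a [C_out][H][W] nested list.
    """
    c_in = len(feature_map)
    h, w = len(feature_map[0]), len(feature_map[0][0])
    out = []
    for o in range(len(weights)):
        plane = [[bias[o] + sum(weights[o][c] * feature_map[c][y][x]
                                for c in range(c_in))
                  for x in range(w)] for y in range(h)]
        out.append(plane)
    return out


def shape(blob):
    """Return (channels, height, width) of a nested-list blob."""
    return (len(blob), len(blob[0]), len(blob[0][0]))


# Toy sizes standing in for the real 512 x 36 x 49 blob.
C_IN, H, W = 4, 2, 3
rpn_output = [[[float(c + y + x) for x in range(W)] for y in range(H)]
              for c in range(C_IN)]

# Two independent heads read the same input; 18 and 36 come from num_output.
cls_weights = [[0.01] * C_IN for _ in range(18)]
bbox_weights = [[0.02] * C_IN for _ in range(36)]
cls_score = conv1x1(rpn_output, cls_weights, [0.0] * 18)
bbox_pred = conv1x1(rpn_output, bbox_weights, [0.0] * 36)
print(shape(cls_score), shape(bbox_pred))
```

The spatial size is unchanged by a 1x1 kernel with stride 1 and no padding, which matches the 36 x 49 seen in both output blobs.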
lr_mult is a learning-rate multiplier. Your solver.prototxt contains a base_lr, which is multiplied by the lr_mult of each parameter blob (in the layers above, the first param block applies to the weights and the second to the biases) to get the effective learning rate for that blob. This is part of the parameter update and is hidden from the user. (That is the beauty of machine learning frameworks.)
Once again, the entire backward pass and parameter update are done by Caffe in the background, so the user need not worry about them. Since you are looking for the implementation, see SGD here: https://github.com/BVLC/caffe/blob/master/src/caffe/solvers/sgd_solver.cpp
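As a rough picture of where lr_mult enters, the following is a simplified plain-Python sketch of one momentum-SGD step on a single parameter blob. It is not the Caffe source: the function name sgd_step and the numeric values (base_lr, gradients) are assumptions for illustration, and details such as learning-rate policies and gradient clipping are left out.

```python
def sgd_step(weights, grads, history, base_lr, lr_mult, momentum=0.9,
             weight_decay=0.0, decay_mult=1.0):
    """One momentum-SGD step on a single parameter blob, updating in place.

    Simplified sketch: the effective learning rate is base_lr * lr_mult,
    and L2 weight decay is scaled by decay_mult.
    """
    local_rate = base_lr * lr_mult
    local_decay = weight_decay * decay_mult
    for i in range(len(weights)):
        g = grads[i] + local_decay * weights[i]
        history[i] = momentum * history[i] + local_rate * g
        weights[i] -= history[i]
    return weights


base_lr = 0.001  # hypothetical value standing in for base_lr in solver.prototxt
w = [1.0, 1.0]   # weight blob, lr_mult: 1.0
b = [1.0]        # bias blob, lr_mult: 2.0
sgd_step(w, [0.5, 0.5], [0.0, 0.0], base_lr, lr_mult=1.0)
sgd_step(b, [0.5], [0.0], base_lr, lr_mult=2.0)
print(w, b)
```

With the same gradient, the bias moves twice as far as the weights because its lr_mult is 2.0.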

How to get the probabilities for each class for multi-label classification in Caffe

I'm training a network on a multi-label dataset.
My training file looks like this:
img1 1 0 1 0 0 0 0 1 .... 1
...
...
imgN 0 1 0 1 0 1 0 0 .... 0
From reading the tutorials I understand that I have to use the SigmoidCrossEntropyLoss layer.
My question is: after training, which layer do I need to use to extract the probabilities for each label with the extract_feat.bin script?
Below is the end of my network.
Thank you!
layer {
name: "fc8-1"
type: "InnerProduct"
bottom: "fc7"
top: "fc8-1"
inner_product_param {
num_output: 12400
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "loss"
type: "SigmoidCrossEntropyLoss"
bottom: "fc8-1"
bottom: "label"
top: "loss"
}
When training with a "SigmoidCrossEntropyLoss" layer, you need to replace the loss layer with a simple "Sigmoid" layer at test time:
layer {
type: "Sigmoid"
bottom: "fc8-1"
top: "class_prob"
name: "class_prob"
}
Your test-time output should be a 12,400-dimensional vector (per input), with all entries in the range [0..1], representing class probabilities.
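What the "Sigmoid" layer computes can be sketched in a few lines of plain Python. The scores below are made up, and the 0.5 threshold is an arbitrary choice for the example, not something prescribed by Caffe.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def label_probabilities(logits):
    """Map each raw score to an independent per-label probability."""
    return [sigmoid(v) for v in logits]


logits = [-2.0, 0.0, 3.0]  # made-up fc8-1 scores for three labels
probs = label_probabilities(logits)
predicted = [i for i, p in enumerate(probs) if p > 0.5]  # arbitrary threshold
print(probs, predicted)
```

Each label gets its own probability, so the entries do not need to sum to 1, which is what makes this suitable for multi-label output.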

caffe pixel-wise classification / regression

I want to do a simple pixel-wise classification or regression task, so I have an input image and a ground_truth. The task is an easy segmentation where I have a circle and a rectangle, and I want to train the net to find where the circle or the rectangle is. That means I have ground_truth images which have the value "1" at all locations where the circle is and the value "2" at all locations where the rectangle is. My images and ground_truth images are the input, in the form of .png images.
I think I can do either a regression or a classification task depending on my loss layer. I have been using the fully convolutional AlexNet from fcn alexnet
classification:
layer {
name: "upscore"
type: "Deconvolution"
bottom: "score_fr"
top: "upscore"
param {
lr_mult: 0
}
convolution_param {
num_output: 3 ## <<---- 0 = background 1 = circle 2 = rectangle
bias_term: false
kernel_size: 63
stride: 32
}
}
layer {
name: "score"
type: "Crop"
bottom: "upscore"
bottom: "data"
top: "score"
crop_param {
axis: 2
offset: 18
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss" ## <<----
bottom: "score"
bottom: "ground_truth"
top: "loss"
loss_param {
ignore_label: 0
}
}
regression:
layer {
name: "upscore"
type: "Deconvolution"
bottom: "score_fr"
top: "upscore"
param {
lr_mult: 0
}
convolution_param {
num_output: 1 ## <<---- 1 x height x width
bias_term: false
kernel_size: 63
stride: 32
}
}
layer {
name: "score"
type: "Crop"
bottom: "upscore"
bottom: "data"
top: "score"
crop_param {
axis: 2
offset: 18
}
}
layer {
name: "loss"
type: "EuclideanLoss" ## <<----
bottom: "score"
bottom: "ground_truth"
top: "loss"
}
However, this does not produce the results I want. I think there is something wrong with my understanding of pixel-wise classification / regression. Could you tell me where my mistake is?
EDIT 1
For regression the retrieval of the output would look like this:
import numpy as np
import cv2

# `pred` comes from the forward pass (code not shown here).
output_blob = pred['result'].data
predicated_image_array = np.array(output_blob)
predicated_image_array = predicated_image_array.squeeze()
print(predicated_image_array.shape)
range_value = np.ptp(predicated_image_array)
min_value = predicated_image_array.min()
max_value = predicated_image_array.max()
# shift so that the minimum is 0
predicated_image_array[:] -= min_value
# scale to [0, 1] unless the image is constant
if range_value != 0:
    predicated_image_array /= range_value
# scale to [0, 255] and convert to 8-bit before writing
predicated_image_array *= 255
predicated_image_array = predicated_image_array.astype(np.uint8)
print(predicated_image_array.shape)
cv2.imwrite('predicted_output.jpg', predicated_image_array)
This is easy since the output is 1 x height x width and the values are the actual output values. But how would one retrieve the output for classification with a Softmax layer, where the output is 3 (num labels) x height x width? I do not know what the contents of this shape mean.
First of all, your problem is not regression but classification!
If you want to teach the net to recognize circles and rectangles, you have to make a different data set of images and labels, for example circle - 0 and rectangle - 1. You do this by making a text file that contains the image paths and the image labels, for example: /path/circle1.png 0 /path/circle2.png 0 /path/rectangle1.png 1 /path/rectangle1.png 1. Here is a nice tutorial for a problem like yours. Good luck.
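On the 3 x height x width output asked about in the question: one common reading, assumed here, is that channel c at pixel (y, x) holds the score for label c, so taking the index of the largest channel at each pixel gives a height x width label map. The sketch below uses made-up scores and a hypothetical helper name, argmax_label_map.

```python
def argmax_label_map(scores):
    """scores: [num_labels][H][W] nested list -> [H][W] list of label indices."""
    num_labels = len(scores)
    h, w = len(scores[0]), len(scores[0][0])
    return [[max(range(num_labels), key=lambda c: scores[c][y][x])
             for x in range(w)] for y in range(h)]


# Made-up 3 x 2 x 2 scores: channel 0 = background, 1 = circle, 2 = rectangle.
scores = [
    [[0.9, 0.1], [0.2, 0.1]],    # background
    [[0.05, 0.8], [0.1, 0.2]],   # circle
    [[0.05, 0.1], [0.7, 0.7]],   # rectangle
]
label_map = argmax_label_map(scores)
print(label_map)
```

The resulting label map can be compared with the ground_truth image directly, since both hold one label index per pixel.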

How to use 1-dim vector as input for caffe?

I'd like to train a neural network (NN) on my own 1-dim data, which I stored in an HDF5 database for Caffe. According to the documentation this should work. It also works for me as long as I only use "Fully Connected Layers", "Relu" and "Dropout". However, I get an error when I try to use "Convolution" and "Max Pooling" layers in the NN architecture. The error complains about the input dimension of the data.
I0622 16:44:20.456007 9513 net.cpp:84] Creating Layer conv1
I0622 16:44:20.456015 9513 net.cpp:380] conv1 <- data
I0622 16:44:20.456048 9513 net.cpp:338] conv1 -> conv1
I0622 16:44:20.456061 9513 net.cpp:113] Setting up conv1
F0622 16:44:20.456487 9513 blob.cpp:28] Check failed: shape[i] >= 0 (-9 vs. 0)
This is the error when I only want to use a "Pooling" layer behind an "InnerProduct" layer:
I0622 16:52:44.328660 9585 net.cpp:338] pool1 -> pool1
I0622 16:52:44.328666 9585 net.cpp:113] Setting up pool1
F0622 16:52:44.328680 9585 pooling_layer.cpp:84] Check failed: 4 == bottom[0]->num_axes() (4 vs. 2) Input must have 4 axes, corresponding to (num, channels, height, width)
However I don't know how to change the input dimensions such that it works.
This is the beginning of my prototxt file specifying the network architecture:
name: "LeNet"
layer {
name: "myNet"
type: "HDF5Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
hdf5_data_param {
source: "/path/to/my/data/train.txt"
batch_size: 200
}
}
layer {
name: "myNet"
type: "HDF5Data"
top: "data"
top: "label"
include {
phase: TEST
}
hdf5_data_param {
source: "/path/to/my/data/test.txt"
batch_size: 200
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 1
kernel_h: 11
kernel_w: 1
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_h: 3
kernel_w: 1
stride: 2
}
}
And this is how I write my 4D database (with two singleton dimensions) using MATLAB's h5write function:
h5create('train.h5','/data',[dimFeats 1 1 numSamplesTrain]);
h5write('train.h5','/data', traindata);
You seem to be writing your data using the wrong shape. Caffe blobs have the dimensions (n_samples, n_channels, height, width).
Other than that, your prototxt seems fine for making predictions from a 1D input.
As I have no experience with h5create and h5write in MATLAB, I am not sure whether the training dataset is generated with the dimensions you expect.
The error message for the convolution layer says that shape[i] = -9. This means that the width, height, channels, or number of images in a batch is being set to -9.
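A quick way to see where a value like -9 could come from is the usual output-size arithmetic for a convolution along one spatial axis. The sketch below assumes, and the log does not confirm, that the blob reaches conv1 with a height of 1; one possible cause would be that MATLAB writes the HDF5 dimensions in the reverse of the order a C-ordered reader sees, which would put the feature axis in the width position. The function name conv_output_size and the value 100 are made up for the example.

```python
def conv_output_size(input_size, kernel, pad=0, stride=1):
    """Output length along one spatial axis of a convolution (no dilation)."""
    return (input_size + 2 * pad - kernel) // stride + 1


# Values from the prototxt above: kernel_h 11, kernel_w 1, stride 1, no padding.
# Assumption: the height axis has size 1 when the data reaches conv1.
height_out = conv_output_size(1, kernel=11)    # height 1 under an 11-tall kernel
width_out = conv_output_size(1, kernel=1)      # a singleton axis with a size-1 kernel
feature_out = conv_output_size(100, kernel=11) # hypothetical 100 features on the height axis
print(height_out, width_out, feature_out)
```

If this reading is right, arranging the data so that the feature axis lands on the height dimension, or swapping kernel_h and kernel_w, would give a positive output size.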
The error message when using the pooling layer alone says that the network detected only a 2D input while it expects a 4D input.
The error messages in both layers are related to reshaping the blobs, which is a clear indication that the dimensions of the input are not as expected.
Try debugging the Reshape functions in blob.cpp and layers/pooling_layer.cpp to get insight into which value is actually going rogue.