How can I get top-k accuracy in PyCaffe during the training phase?

I'd like to know if there is a way to get the top-k error in PyCaffe during the training phase.
I know there is the top_k parameter for the .prototxt file, but is there any way I can use it from PyCaffe?
layer {
name: "accuracy"
type: "Accuracy"
bottom: "..."
bottom: "label"
top: "accuracy"
accuracy_param {
top_k: 5
}
include {
phase: TEST
}
}

For anyone wondering, I just found out that you need to add multiple accuracy layers, each with a different top_k value. Here is an example for top-3 accuracy (together with top-1 and top-2).
layer {
name: "accuracy1"
type: "Accuracy"
bottom: "score"
bottom: "label"
top: "accuracy1"
include {
phase: TEST
}
}
layer {
name: "accuracy2"
type: "Accuracy"
bottom: "score"
bottom: "label"
top: "accuracy2"
accuracy_param {
top_k: 2
}
include {
phase: TEST
}
}
layer {
name: "accuracy3"
type: "Accuracy"
bottom: "score"
bottom: "label"
top: "accuracy3"
accuracy_param {
top_k: 3
}
include {
phase: TEST
}
}
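From Python, the values of these accuracy tops can then be read off the TEST net while training through the solver interface. Below is a minimal sketch, assuming a solver.prototxt that points at the net above; the path and the logging interval are placeholders, and each forward() call here only evaluates a single test batch:

import caffe

caffe.set_mode_gpu()
solver = caffe.get_solver('solver.prototxt')  # placeholder path

for it in range(1000):
    solver.step(1)                      # one training iteration
    if it % 100 == 0:
        test_net = solver.test_nets[0]  # TEST-phase net holding the accuracy layers
        test_net.forward()              # evaluates one test batch
        print('iter %d  top-1 %.3f  top-2 %.3f  top-3 %.3f' % (
            it,
            float(test_net.blobs['accuracy1'].data),
            float(test_net.blobs['accuracy2'].data),
            float(test_net.blobs['accuracy3'].data)))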

Related

CNN doesn't learn simple geometric patterns

This may be a very basic question, but I don't have enough background knowledge and no more time to search for the answer, so I have to ask for help here. I programmatically generated a training dataset of images of simple geometric shapes (triangles, squares, diamonds, etc.) and built a CNN with two convolutional layers, one pooling layer, and a final fully connected layer to classify these shapes. But the network simply does not learn: the loss just does not decrease. What is the cause?
In Caffe, the neural network configuration file "very_simple_one.prototxt" looks like:
name: "very_simple_one"
layer {
##name: "input"
name: "data"
##type: "Input"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mean_file: "images/train_valid_lmdb_mean.binaryproto"
}
data_param {
source: "images/train_valid_lmdb"
batch_size: 1000
backend: LMDB
}
input_param {
shape {
dim: 1
dim: 3
dim: 200
dim: 200
}
}
}
layer {
##name: "input"
name: "data"
##type: "Input"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mean_file: "images/train_valid_lmdb_mean.binaryproto"
}
data_param {
source: "images/test_lmdb"
batch_size: 100
backend: LMDB
}
input_param {
shape {
dim: 1
dim: 3
dim: 200
dim: 200
}
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 50
kernel_size: 5
stride: 5
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 5
stride: 5
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
convolution_param {
num_output: 3
kernel_size: 8
stride: 8
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "fc3"
type: "InnerProduct"
bottom: "conv2"
top: "fc3"
inner_product_param {
num_output: 3
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc3"
bottom: "label"
}
The "solver.prototxt" looks like:
net: "very_simple_one.prototxt"
type: "SGD"
test_iter: 15
test_interval: 100
base_lr: 0.05
lr_policy: "step"
gamma: 0.9999
stepsize: 100
display: 20
max_iter: 50000
snapshot: 2000
momentum: 0.9
weight_decay: 0.00000000000
solver_mode: GPU
I also tried AdaGrad by commenting out "momentum" and changing "type" to "AdaGrad".
I train this net with the command:
....../caffe/build/tools/caffe train -solver solver.prototxt
All attempts failed to train; the loss just does not decrease. It hovers within a very small interval but never really goes down.
I just wonder whether the dataset simply cannot be learned, or whether there is something wrong with my configuration files above.
I also modified the network according to what Ibrahim Yousuf said, replacing the pooling layer with a convolutional layer:
name: "very_simple_one"
layer {
##name: "input"
name: "data"
##type: "Input"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mean_file: "images/train_valid_lmdb_mean.binaryproto"
}
data_param {
source: "images/train_valid_lmdb"
batch_size: 1000
backend: LMDB
}
input_param {
shape {
dim: 1
dim: 3
dim: 200
dim: 200
}
}
}
layer {
##name: "input"
name: "data"
##type: "Input"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mean_file: "images/train_valid_lmdb_mean.binaryproto"
}
data_param {
source: "images/test_lmdb"
batch_size: 100
backend: LMDB
}
input_param {
shape {
dim: 1
dim: 3
dim: 200
dim: 200
}
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 50
kernel_size: 5
##stride: 5
stride: 2
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "conv1.5"
type: "Convolution"
bottom: "conv1"
top: "conv1.5"
convolution_param {
num_output: 10
kernel_size: 5
stride: 2
}
}
layer {
name: "relu1.5"
type: "ReLU"
bottom: "conv1.5"
top: "conv1.5"
}
layer {
name: "conv2"
type: "Convolution"
bottom: "conv1.5"
top: "conv2"
convolution_param {
num_output: 3
kernel_size: 8
stride: 4
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "fc3"
type: "InnerProduct"
bottom: "conv2"
top: "fc3"
inner_product_param {
num_output: 3
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc3"
bottom: "label"
}
But the loss still does not decrease. Should I conclude that the cause is my dataset? My dataset is really very small, and if anyone could give me a hand, I can upload it somewhere to be downloaded for testing.
Solved. The classification labels should start from zero, not one, e.g. 0, 1, 2 for a three-class problem, not 1, 2, 3.
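If the LMDB is written with pycaffe, the fix amounts to shifting the labels when the datums are created. A minimal sketch, assuming the dataset is held as (HxWxC uint8 image, 1-based label) pairs; the samples list, the LMDB path, and the map_size are placeholders:

import lmdb
import numpy as np
import caffe

# Placeholder stand-in for the generated dataset: (HxWxC uint8 image, 1-based label) pairs.
samples = [(np.zeros((200, 200, 3), dtype=np.uint8), lbl) for lbl in (1, 2, 3)]

env = lmdb.open('images/train_valid_lmdb', map_size=1 << 30)
with env.begin(write=True) as txn:
    for i, (img, label) in enumerate(samples):
        datum = caffe.io.array_to_datum(
            img.transpose(2, 0, 1),   # HxWxC -> CxHxW, as Caffe expects
            label=int(label) - 1)     # 1, 2, 3 -> 0, 1, 2
        txn.put('{:08d}'.format(i).encode('ascii'), datum.SerializeToString())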

Using the SPP Layer in caffe results in Check failed: pad_w_ < kernel_w_ (1 vs. 1)

OK, I had a previous question about using the SPP layer in Caffe.
This question is a follow-up to that one.
When using the SPP layer I get the error output below.
It seems that the images have become too small by the time they reach the SPP layer?
The images I use are small: the width ranges between 10 and 20 px and the height between 30 and 35 px.
I0719 12:18:22.553256 2114932736 net.cpp:406] spatial_pyramid_pooling <- conv2
I0719 12:18:22.553261 2114932736 net.cpp:380] spatial_pyramid_pooling -> pool2
F0719 12:18:22.553505 2114932736 pooling_layer.cpp:74] Check failed: pad_w_ < kernel_w_ (1 vs. 1)
*** Check failure stack trace: ***
# 0x106afcb6e google::LogMessage::Fail()
# 0x106afbfbe google::LogMessage::SendToLog()
# 0x106afc53a google::LogMessage::Flush()
# 0x106aff86b google::LogMessageFatal::~LogMessageFatal()
# 0x106afce55 google::LogMessageFatal::~LogMessageFatal()
# 0x1068dc659 caffe::PoolingLayer<>::LayerSetUp()
# 0x1068ffd98 caffe::SPPLayer<>::LayerSetUp()
# 0x10691123f caffe::Net<>::Init()
# 0x10690fefe caffe::Net<>::Net()
# 0x106927ef8 caffe::Solver<>::InitTrainNet()
# 0x106927325 caffe::Solver<>::Init()
# 0x106926f95 caffe::Solver<>::Solver()
# 0x106935b46 caffe::SGDSolver<>::SGDSolver()
# 0x10693ae52 caffe::Creator_SGDSolver<>()
# 0x1067e78f3 train()
# 0x1067ea22a main
# 0x7fff9a3ad5ad start
# 0x5 (unknown)
I was correct, my images were too small.
I changed my net and it worked: I removed one conv layer and replaced the normal pooling layer with the SPP layer. I also had to set my test batch size to 1. Accuracy was very high, but my F1 score went down; I don't know if this is related to the small test batch size I had to use.
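For reference, here is a rough sketch of how the SPP layer derives the pooling kernel and padding for each pyramid level (this mirrors my reading of Caffe's SPPLayer, so treat it as an approximation rather than the exact implementation). Once a feature map has shrunk to a single pixel in one dimension, level 1 of a two-level pyramid ends up with pad_w equal to kernel_w, which is exactly the failed check above:

import math

def spp_pooling_params(bottom_size, level):
    # Approximation of Caffe's SPPLayer logic for one pyramid level and one spatial dim.
    num_bins = 2 ** level
    kernel = int(math.ceil(bottom_size / float(num_bins)))
    remainder = kernel * num_bins - bottom_size   # pixels that must be padded
    pad = (remainder + 1) // 2
    return kernel, pad

kernel_w, pad_w = spp_pooling_params(bottom_size=1, level=1)
print(kernel_w, pad_w)  # 1 1 -> pad_w is not < kernel_w, hence "1 vs. 1"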
Net:
name: "TessDigitMean"
layer {
name: "input"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "/Users/rvaldez/Documents/Datasets/Digits/SeperatedProviderV3_1020_SPP/784/caffe/train_lmdb"
batch_size: 1 #64
backend: LMDB
}
}
layer {
name: "input"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
scale: 0.00390625
}
data_param {
source: "/Users/rvaldez/Documents/Datasets/Digits/SeperatedProviderV3_1020_SPP/784/caffe/test_lmdb"
batch_size: 1
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 20
kernel_size: 5
pad_w: 2
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "spatial_pyramid_pooling"
type: "SPP"
bottom: "conv1"
top: "pool2"
spp_param {
pyramid_height: 2
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "pool2"
top: "ip1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "ip1"
top: "ip2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip2"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip2"
bottom: "label"
top: "loss"
}

Discrepancy in results when using batch size 1 in prototxt versus coercing batch size to 1 in pycaffe

I am running the MNIST example with some manual changes to the layers. While training, everything works great and I reach a final test accuracy of ~99%. I am now trying to work with the generated model in Python using pycaffe, following the steps given here. I want to compute the confusion matrix, so I'm looping through the test images one by one from LMDB and then running the network. Here is the code:
net = caffe.Net(args.proto, args.model, caffe.TEST)
...
datum = caffe.proto.caffe_pb2.Datum()
datum.ParseFromString(value)
label = int(datum.label)
image = caffe.io.datum_to_array(datum).astype(np.uint8)
...
net.blobs['data'].reshape(1, 1, 28, 28) # Greyscale 28x28 images
net.blobs['data'].data[...] = image
net.forward()
# Get predicted label
print net.blobs['label'].data[0] # use this later for confusion matrix
Here is my network definition prototxt
name: "MNISTNet"
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "train_lmdb"
batch_size: 64
backend: LMDB
}
}
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
scale: 0.00390625
}
data_param {
source: "test_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 20
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 50
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "fc1"
type: "InnerProduct"
bottom: "pool2"
top: "fc1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "fc1"
top: "fc1"
}
layer {
name: "fc2"
type: "InnerProduct"
bottom: "fc1"
top: "fc2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc2"
bottom: "label"
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fc2"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
Note that the test batch size is 100, which is why I need that reshape in the Python code. Now, if I change the test batch size to 1, the exact same Python code prints different (and mostly correct) predicted class labels. Thus, the code run with batch size 1 produces the expected ~99% accuracy, while batch size 100 is horrible.
However, based on the ImageNet pycaffe tutorial, I don't see what I'm doing wrong. As a last resort, I could create a copy of my prototxt with a test batch size of 1 for the Python code and keep the original for training, but that is not ideal.
Also, I don't think it is a preprocessing issue, since that wouldn't explain why it works well with batch size 1.
Any pointers appreciated!
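For completeness, here is a minimal sketch of the confusion-matrix loop described above, assuming the posted prototxt (predictions read from the 'fc2' blob, ten MNIST classes); the model paths are placeholders, and the scaling mirrors the transform_param in the prototxt:

import lmdb
import numpy as np
import caffe

net = caffe.Net('net.prototxt', 'weights.caffemodel', caffe.TEST)  # placeholder paths
confusion = np.zeros((10, 10), dtype=np.int64)

env = lmdb.open('test_lmdb', readonly=True)
with env.begin() as txn:
    for _, value in txn.cursor():
        datum = caffe.proto.caffe_pb2.Datum()
        datum.ParseFromString(value)
        label = int(datum.label)
        # Apply the same scaling as transform_param in the prototxt.
        image = caffe.io.datum_to_array(datum).astype(np.float32) * 0.00390625

        net.blobs['data'].reshape(1, 1, 28, 28)
        net.blobs['data'].data[...] = image
        net.forward()
        predicted = int(net.blobs['fc2'].data[0].argmax())
        confusion[label, predicted] += 1

print(confusion)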

Implementing a Multi Target Regression Net with Caffe with Non-image data input

I was trying to implement a simple net in Caffe for multi-target regression.
The dimension of my input is 5.
The dimension of my output is 3.
I have this data in a .mat file.
I used h5create and h5write to create the train and test HDF5 files.
load('train_data.mat')
h5create('train_data.h5','/data',[5 10000]);
h5write('train_data.h5', '/data', x');
load('train_label.mat')
h5create('train_label.h5','/label',[3 10000]);
h5write('train_label.h5', '/label', Y');
load('test_data.mat')
h5create('test_data.h5','/data',[5 2500]);
h5write('test_data.h5', '/data', x');
load('test_label.mat')
h5create('test_label.h5','/label',[3 2500]);
h5write('test_label.h5', '/label', Y');
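For anyone doing this from Python instead of MATLAB, a rough h5py equivalent is sketched below. It assumes Caffe's HDF5Data layer reads datasets with the sample index as the first dimension (N x 5 data, N x 3 labels, float) and that the 'source' file in hdf5_data_param is a plain-text list with one HDF5 path per line; the arrays and paths here are placeholders:

import h5py
import numpy as np

# Placeholder arrays standing in for the .mat contents.
x = np.random.randn(10000, 5).astype(np.float32)   # inputs, one row per sample
Y = np.random.randn(10000, 3).astype(np.float32)   # targets, one row per sample

with h5py.File('train_data.h5', 'w') as f:
    f.create_dataset('data', data=x)
with h5py.File('train_label.h5', 'w') as f:
    f.create_dataset('label', data=Y)

# hdf5_data_param 'source' is a text file listing HDF5 files, one per line.
with open('train_data.txt', 'w') as f:
    f.write('examples/my_second_cnn_example/data/train_data.h5\n')
with open('train_label.txt', 'w') as f:
    f.write('examples/my_second_cnn_example/data/train_label.h5\n')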
I designed a network in Caffe as follows:
name: "RegressionNet"
layer {
name: "data"
type: "HDF5Data"
top: "data"
top: "data"
include {
phase: TRAIN
}
hdf5_data_param {
source: "examples/my_second_cnn_example/data/train_data.txt"
batch_size: 10
}
}
layer {
name: "data"
type: "HDF5Data"
top: "label"
top: "label"
include {
phase: TRAIN
}
hdf5_data_param {
source: "examples/my_second_cnn_example/data/train_label.txt"
batch_size: 10
}
}
layer {
name: "data"
type: "HDF5Data"
top: "data"
top: "data"
include {
phase: TEST
}
hdf5_data_param {
source: "examples/my_second_cnn_example/data/test_data.txt"
batch_size: 10
}
}
layer {
name: "data"
type: "HDF5Data"
top: "label"
top: "label"
include {
phase: TEST
}
hdf5_data_param {
source: "examples/my_first_cnn_example/data/test_label.txt"
batch_size: 10
}
}
layer {
name: "fc1"
type: "InnerProduct"
bottom: "data"
top: "fc1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "loss"
type: "EuclideanLoss"
bottom: "fc1"
bottom: "label"
top: "loss"
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fc1"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
Unfortunately, it does not work. I got the following error:
Check failed: outer_num_ * inner_num_ == bottom[1]->count() (10 vs. 30) Number of labels must match number of predictions; e.g., if label axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.
I think the problem is in the way I'm organizing the data, and also in the use of the Accuracy layer.
Any idea how to solve it?

Caffe HDF5 neural network basic example fails to parse the model file

I am trying to build a minimal example of a neural network using the Caffe libraries, with HDF5 data that I have prepared from a CSV file.
My prototxt, wine_train.prototxt, is as follows:
name:"wineclass"
layers {
name: "data"
type: "HDF5Data"
top: "data"
top: "label"
hdf5_data_param {
source: "examples/wine/test.txt"
batch_size: 10
}
include{
phase:TEST
}
}
layer {
name: "data"
type: "HDF5Data"
top: "label"
top: "label"
hdf5_data_param {
source: "examples/wine/train.txt"
batch_size: 2
}
include{
phase:TRAIN
}
}
layers {
name: "ip"
type: "INNER_PRODUCT"
bottom: "data"
top: "ip"
inner_product_param {
num_output: 3
}
}
layers {
name: "loss"
type: "SOFTMAX_LOSS"
bottom: "ip"
bottom: "label"
top: "loss"
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
and my solver is as follows:
net: "examples/wine/wine_train.prototxt"
test_iter: 250
test_interval: 1000
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 5000
display: 1000
max_iter: 10000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "wine/train"
solver_mode: CPU
Each time I get the following error:
shaunak#ubuntu:~/caffe$ build/tools/caffe train -model '/home/shaunak/caffe/examples/wine/wine_train.prototxt' -solver '/home/shaunak/caffe/examples/wine/solver.prototxt'
I0415 04:31:00.154145 57047 caffe.cpp:117] Use CPU.
I0415 04:31:00.154485 57047 caffe.cpp:121] Starting Optimization
I0415 04:31:00.154552 57047 solver.cpp:32] Initializing solver from parameters:
test_iter: 250
test_interval: 1000
base_lr: 0.01
display: 1000
max_iter: 10000
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
stepsize: 5000
snapshot: 10000
snapshot_prefix: "wine/train"
solver_mode: CPU
net: "examples/wine/wine_train.prototxt"
I0415 04:31:00.154660 57047 solver.cpp:70] Creating training net from net file: examples/wine/wine_train.prototxt
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 4:9: Expected integer or identifier.
F0415 04:31:00.154774 57047 upgrade_proto.cpp:928] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: examples/wine/wine_train.prototxt
*** Check failure stack trace: ***
# 0x7f4a30766c3c google::LogMessage::Fail()
# 0x7f4a30766b88 google::LogMessage::SendToLog()
# 0x7f4a3076658a google::LogMessage::Flush()
# 0x7f4a30769521 google::LogMessageFatal::~LogMessageFatal()
# 0x7f4a30b8b1ee caffe::ReadNetParamsFromTextFileOrDie()
# 0x7f4a30b6dfa2 caffe::Solver<>::InitTrainNet()
# 0x7f4a30b6ee63 caffe::Solver<>::Init()
# 0x7f4a30b6f036 caffe::Solver<>::Solver()
# 0x40c3c0 caffe::GetSolver<>()
# 0x406361 train()
# 0x4048f1 main
# 0x7f4a2fe86ec5 (unknown)
# 0x404e9d (unknown)
Aborted (core dumped)
shaunak#ubuntu:~/caffe$
What exactly does the error say and how do I resolve it?
Update
This seems to run:
name: "WineNet"
layer {
name: "data"
type: "HDF5Data"
top: "wine_train_data"
top: "wine_train_label"
hdf5_data_param {
source: "examples/wine/train.txt"
batch_size: 10
}
}
layer {
name: "fc1"
type: "InnerProduct"
bottom: "wine_train_data"
top: "fc1"
inner_product_param {
num_output: 2
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc1"
bottom: "wine_train_label"
top: "loss"
}
Update 2
Virtually identical models: one works, the other doesn't. I can't explain why!
Works:
name: "WineNet"
layer {
name: "data"
type: "HDF5Data"
top: "wine_train_data"
top: "wine_train_label"
hdf5_data_param {
source: "examples/wine/train.txt"
batch_size: 10
}
}
layer {
name: "fc1"
type: "InnerProduct"
bottom: "wine_train_data"
top: "fc1"
inner_product_param {
num_output: 2
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc1"
bottom: "wine_train_label"
top: "loss"
}
Doesn't:
name:"wineclass"
layers {
name: "data"
type: "HDF5Data"
top: "wine_train_data"
top: "wine_train_label"
hdf5_data_param {
source: "examples/wine/train.txt"
batch_size: 10
}
}
layer {
name: "ip"
type: "InnerProduct"
bottom: "wine_train_data"
top: "ip"
inner_product_param {
num_output: 2
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip"
bottom: "wine_train_label"
top: "loss"
}
I ran into the same problem yesterday. You might need to check that your Caffe version is up to date; the protobuf definition was changed quite heavily. In particular, "type" used to be an enum, and now it just takes a string.
[Update] From the discussion in the comments: the answer is to use "layer", not "layers". "layers" was probably present in some old, outdated example.
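As a quick check from Python, a sketch like the one below reproduces the same failure with the protobuf text parser: inside an old-style "layers" block, "type" is an enum, so a quoted string such as "HDF5Data" triggers the "Expected integer or identifier" message. The file path is the one from the question:

from caffe.proto import caffe_pb2
from google.protobuf import text_format

net = caffe_pb2.NetParameter()
with open('examples/wine/wine_train.prototxt') as f:
    try:
        text_format.Merge(f.read(), net)
        print('parsed OK: %d layer(s)' % len(net.layer))
    except text_format.ParseError as e:
        # Mixing 'layers { ... }' with a string 'type' fails here,
        # just like the caffe binary does.
        print('parse failed:', e)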