I'd like to train a neural network (NN) on my own 1-D data, which I stored in an HDF5 database for Caffe. According to the documentation this should work. It does work as long as I only use fully connected ("InnerProduct"), "ReLU" and "Dropout" layers. However, I get an error when I try to use "Convolution" and max "Pooling" layers in the NN architecture. The error complains about the input dimensions of the data.
I0622 16:44:20.456007 9513 net.cpp:84] Creating Layer conv1
I0622 16:44:20.456015 9513 net.cpp:380] conv1 <- data
I0622 16:44:20.456048 9513 net.cpp:338] conv1 -> conv1
I0622 16:44:20.456061 9513 net.cpp:113] Setting up conv1
F0622 16:44:20.456487 9513 blob.cpp:28] Check failed: shape[i] >= 0 (-9 vs. 0)
This is the error when I use only a "Pooling" layer after an "InnerProduct" layer:
I0622 16:52:44.328660 9585 net.cpp:338] pool1 -> pool1
I0622 16:52:44.328666 9585 net.cpp:113] Setting up pool1
F0622 16:52:44.328680 9585 pooling_layer.cpp:84] Check failed: 4 == bottom[0]->num_axes() (4 vs. 2) Input must have 4 axes, corresponding to (num, channels, height, width)
However, I don't know how to change the input dimensions so that it works.
This is the beginning of my prototxt file specifying the network architecture:
name: "LeNet"
layer {
name: "myNet"
type: "HDF5Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
hdf5_data_param {
source: "/path/to/my/data/train.txt"
batch_size: 200
}
}
layer {
name: "myNet"
type: "HDF5Data"
top: "data"
top: "label"
include {
phase: TEST
}
hdf5_data_param {
source: "/path/to/my/data/test.txt"
batch_size: 200
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 1
kernel_h: 11
kernel_w: 1
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_h: 3
kernel_w: 1
stride: 2
}
}
And this is how I write my 4-D dataset (with two singleton dimensions) using MATLAB's h5write function:
h5create('train.h5','/data',[dimFeats 1 1 numSamplesTrain]);
h5write('train.h5','/data', traindata);
You seem to be writing your data with the wrong shape. Caffe blobs have the dimensions (n_samples, n_channels, height, width).
Other than that, your prototxt seems fine for making predictions from a 1-D input.
As I have no experience with h5create and h5write in MATLAB, I am not sure whether the training dataset is generated with the dimensions you expect.
The error message for the convolution layer says that shape[i] = -9. This means that one of the width, height, channels or number of images in a batch is being set to -9.
The error message when using the pooling layer alone says that the network detected a 2-D input where the layer expects a 4-D input.
The error messages in both layers are raised while reshaping the blobs, which is a clear indication that the dimensions of the input are not as expected.
Try debugging the Reshape functions in blob.cpp and layers/pooling_layer.cpp to see which value is actually going wrong.
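The -9 can also be reproduced by hand. The sketch below rests on two assumptions: that MATLAB's column-major HDF5 functions reverse the dimension order as seen by Caffe, so [dimFeats 1 1 numSamplesTrain] arrives as (numSamplesTrain, 1, 1, dimFeats), and that the convolution output size along one axis is (input + 2*pad - kernel) / stride + 1. The helper names and the sizes 100 and 200 are made up for illustration.

```python
def caffe_shape_from_matlab(matlab_dims):
    """Shape Caffe sees for a dataset created with these MATLAB dims (assumed reversed)."""
    return tuple(reversed(matlab_dims))


def conv_output_size(input_size, kernel, stride=1, pad=0):
    """Output length along one axis, using C-style truncating division."""
    return int((input_size + 2 * pad - kernel) / stride) + 1


dim_feats, num_samples = 100, 200  # hypothetical sizes, not from the question
n, c, h, w = caffe_shape_from_matlab([dim_feats, 1, 1, num_samples])

# conv1 from the question: kernel_h = 11, kernel_w = 1, stride = 1
out_h = conv_output_size(h, kernel=11)  # height is 1 here, so the result is negative
out_w = conv_output_size(w, kernel=1)
print((n, c, h, w), "->", (out_h, out_w))

# With the features on the height axis, the 11-tap kernel fits.
fixed_h = conv_output_size(dim_feats, kernel=11)
print("features on height:", fixed_h)
```

If these assumptions hold, writing the data from MATLAB as [1 dimFeats 1 numSamplesTrain] (or applying the long kernel along the width instead) would put the features on the axis the 11x1 kernel slides over.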
I am a beginner with Caffe. I trained MNIST with LeNet and ImageNet with AlexNet by following the tutorials, and got pretty good results. Then I tried to train MNIST with the AlexNet model. The training model is almost the same as models/bvlc_alexnet/train_val.prototxt, with changes in a few places:
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mirror: false  # set to false; crop_size and mean_file deleted
}
data_param {
source: "./mnist_train_lmdb"
batch_size: 64
backend: LMDB
}
}
......
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mirror: false  # set to false; crop_size and mean_file deleted
}
data_param {
source: "./mnist_train_lmdb"
batch_size: 64
backend: LMDB
}
}
......
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 3  # changed to 3
stride: 2  # changed to 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
......
layer {
name: "fc8"
type: "InnerProduct"
bottom: "fc7"
top: "fc8"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 10  # changed to 10
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
and the solver.prototxt is as follows:
net: "./train_val.prototxt"
test_iter: 1000
test_interval: 100
base_lr: 0.01
lr_policy: "inv"
power: 0.75
gamma: 0.1
stepsize: 1000
display: 100
max_iter: 100000
momentum: 0.9
weight_decay: 0.0005
snapshot: 5000
snapshot_prefix: "./caffe_alexnet_train"
solver_mode: GPU
After 100,000 training iterations, the accuracy reached about 0.97:
I0315 19:28:54.827383 26505 solver.cpp:258] Train net output #0: loss = 0.0331752 (* 1 = 0.0331752 loss)
......
I0315 19:28:56.384718 26505 solver.cpp:351] Iteration 100000, Testing net (#0)
I0315 19:28:58.121800 26505 solver.cpp:418] Test net output #0: accuracy = 0.974875
I0315 19:28:58.121834 26505 solver.cpp:418] Test net output #1: loss = 0.0804802 (* 1 = 0.0804802 loss)
Then I used this Python script to predict a single picture from the test set:
import os
import sys
import numpy as np
import matplotlib.pyplot as plt
# Add Caffe's Python bindings to the path before importing caffe.
caffe_root = '/home/ubuntu/pkg/local/caffe'
sys.path.insert(0, os.path.join(caffe_root, 'python'))
import caffe
MODEL_FILE = './deploy.prototxt'
PRETRAINED = './caffe_alexnet_train_iter_100000.caffemodel'
IMAGE_FILE = './4307.png'
caffe.set_mode_cpu()  # select the device before building the net
input_image = caffe.io.load_image(IMAGE_FILE, color=False)
net = caffe.Classifier(MODEL_FILE, PRETRAINED)
prediction = net.predict([input_image], oversample=False)
print('predicted class: ', prediction[0].argmax())
print('predicted class all: ', prediction[0])
But the prediction is wrong (this script predicts well on MNIST with LeNet), and the class probabilities look odd too:
predicted class: 9 <------------- the correct label is 5
predicted class all: [0.01998338 0.14941786 0.09392905 0.07361069 0.07640345 0.10996494 0.03646726 0.12371133 0.15246753 0.16404454]
The deploy.prototxt is almost the same as models/bvlc_alexnet/deploy.prototxt, changed in the same places as train_val.prototxt.
Any suggestions?
AlexNet was designed to discriminate among 1000 classes, training on 1.3M input images of (canonically) 256x256x3 data values each. You're using essentially the same tool to handle 10 classes with 28x28x1 input.
Very simply, you're over-fitting by design.
If you want to use the general AlexNet design to handle this far simpler job, you'll need to scale it down appropriately. It will take some experimentation to find a workable definition of "appropriately": narrow the conv layers by some factor, add a dropout layer, cut out one conv layer entirely, ...
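When deciding what to scale down, it can help to trace the feature-map size through the network. The sketch below assumes that, apart from the conv1 change shown in the question (kernel 3, stride 2), the convolution and pooling settings match models/bvlc_alexnet/train_val.prototxt; the layer table and helper names are written for illustration only.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    """Caffe-style convolution output size along one axis."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Caffe-style pooling output size along one axis (rounds up)."""
    return int(math.ceil((size - kernel) / float(stride))) + 1


# (name, kind, kernel, stride, pad). conv1 uses the question's values;
# everything else is assumed to be the stock AlexNet definition.
LAYERS = [
    ("conv1", "conv", 3, 2, 0),
    ("pool1", "pool", 3, 2, 0),
    ("conv2", "conv", 5, 1, 2),
    ("pool2", "pool", 3, 2, 0),
    ("conv3", "conv", 3, 1, 1),
    ("conv4", "conv", 3, 1, 1),
    ("conv5", "conv", 3, 1, 1),
    ("pool5", "pool", 3, 2, 0),
]


def trace(size):
    """Return (layer name, spatial size) after each layer for a square input."""
    sizes = []
    for name, kind, kernel, stride, pad in LAYERS:
        if kind == "conv":
            size = conv_out(size, kernel, stride, pad)
        else:
            size = pool_out(size, kernel, stride)
        sizes.append((name, size))
    return sizes


for name, size in trace(28):
    print(name, size)
```

Under these assumptions a 28x28 MNIST image shrinks to a single spatial position before the fully connected layers, so if fc6 and fc7 are left at their stock width they are an obvious first candidate for narrowing.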
We know that a convolution layer in a CNN uses filters, and that different filters look for different information in the input image.
But let's say that in SSD we have a prototxt file that specifies a convolution layer as follows:
layer {
name: "conv2_1"
type: "Convolution"
bottom: "pool1"
top: "conv2_1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
The convolution layers in different networks (GoogleNet, AlexNet, VGG, etc.) are all more or less similar.
Looking at just this definition, how can I tell which information of the input image the filters in this convolution layer try to extract?
EDIT:
Let me clarify my question.
I see two convolution layers in the prototxt file, as follows. They are from SSD.
layer {
name: "conv1_1"
type: "Convolution"
bottom: "data"
top: "conv1_1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "conv2_1"
type: "Convolution"
bottom: "pool1"
top: "conv2_1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
Then I printed their outputs:
Data
conv1_1 and conv2_1 images are here and here.
So my question is how these two conv layers produce different outputs when there is no difference between them in the prototxt file.
The filters in earlier layers represent low-level features like edges. (These features retain higher spatial resolution for precise localization, with low-level visual information similar to the response maps of Gabor filters.) On the other hand, the filters in the middle layers extract features like corners or blobs, which are more complex.
And as you go deeper you can no longer visualize and interpret these features directly, because filters in mid-level and high-level layers are not directly connected to the input image. For instance, you can visualize the output of the first layer and interpret it as edges, but when you go deeper and apply a second convolution layer to these extracted edges (the output of the first layer), you get something like edges of edges, capturing more semantic information and less fine-grained spatial detail. In the prototxt file, all convolutions and other types of operations can resemble each other, but they extract different kinds of features because they come in a different order and have different weights.
"Convolution" layers differ not only in their parameters (e.g., kernel_size, stride, pad) but also in their weights: the trainable parameters of the convolution kernels.
You see different output (aka "responses") because the weights of the filters are different.
See this answer regarding the difference between "data" blobs and "parameter/weights" blobs in caffe.
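A small sketch of this point: the same 3x3 convolution, with the same kernel size, stride and padding, applied with two different sets of weights. The image and the two weight sets (Sobel-style edge detectors) are made up for illustration; in a trained network the weights come from the .caffemodel file, not from the prototxt.

```python
def conv2d(image, kernel):
    """Valid cross-correlation of a 2-D image with a 2-D kernel (no bias, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


# A 4x4 image with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1] for _ in range(4)]

# Same "hyperparameters" (3x3 kernel), different weights.
weights_a = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges
weights_b = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges

response_a = conv2d(image, weights_a)
response_b = conv2d(image, weights_b)
print(response_a)
print(response_b)
```

The two calls are indistinguishable at the level of a layer definition, yet only the first set of weights responds to the edge in this image.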
I'm using the following command to draw the block diagram of networks from prototxt files in caffe
python draw_net.py <filename.prototxt> <output.png>
This works fine if I use Alexnet, BVLC Caffenet or even RCNN. But when I use VGG-16 file, it gives a blank output image of size 11x11. No error is thrown. I have verified the paths too. All the files are taken from the Caffe Model Zoo. I'm using the Caffe taken from the master branch.
Your VGG16 file may use the old layer definition format:
layers {
bottom: "data"
top: "conv1_1"
name: "conv1_1"
type: CONVOLUTION
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
}
}
To make it work, you need to use the new layer definition format:
layer {
bottom: "conv1_1"
top: "conv1_2"
name: "conv1_2"
type: "Convolution"
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
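A quick way to check which format a file uses is to look for the two markers shown above: a "layers {" block and an unquoted, upper-case type. The following is only a text-based heuristic, not a prototxt parser, and uses_legacy_layers is a hypothetical helper name. Caffe also ships an upgrade_net_proto_text tool that can convert old files.

```python
import re

# Old format: "layers { ... }" blocks with enum types such as CONVOLUTION.
LEGACY_BLOCK = re.compile(r"^\s*layers\s*\{", re.MULTILINE)
LEGACY_TYPE = re.compile(r"^\s*type:\s*[A-Z_]+\s*$", re.MULTILINE)


def uses_legacy_layers(prototxt_text):
    """Return True if the text looks like it uses the old layer definition format."""
    return bool(LEGACY_BLOCK.search(prototxt_text)
                or LEGACY_TYPE.search(prototxt_text))


old_style = """
layers {
  name: "conv1_1"
  type: CONVOLUTION
}
"""

new_style = """
layer {
  name: "conv1_2"
  type: "Convolution"
}
"""

print(uses_legacy_layers(old_style))
print(uses_legacy_layers(new_style))
```

To check a real file, read it with open(path).read() and pass the text to the function.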
I have, admittedly, a rather large network. It's based on a network from a paper that claims to use Caffe for the implementation. Here's the topology:
To the best of my ability, I've tried to recreate the model. The authors use the term "upconv" which is a combination of 2x2 unpooling followed by 5x5 convolution. I've taken this to mean a deconvolutional layer with stride 2 and kernel size 5 (please do correct me if you believe otherwise). Here's a short snippet from the full model and solver:
...
# upconv2
layer {
name: "upconv2"
type: "Deconvolution"
bottom: "upconv1rec"
top: "upconv2"
convolution_param {
num_output: 65536 # 256x16x16
kernel_size: 5
stride: 2
}
}
layer {
name: "upconv2-rec"
type: "ReLU"
bottom: "upconv2"
top: "upconv2rec"
relu_param {
negative_slope: 0.01
}
}
# upconv3
layer {
name: "upconv3"
type: "Deconvolution"
bottom: "upconv2rec"
top: "upconv3"
convolution_param {
num_output: 94208 # 92x32x32
kernel_size: 5
stride: 2
}
}
...
But it seems this is too large for Caffe to handle:
I0502 10:42:08.859184 13048 net.cpp:86] Creating Layer upconv3
I0502 10:42:08.859184 13048 net.cpp:408] upconv3 <- upconv2rec
I0502 10:42:08.859184 13048 net.cpp:382] upconv3 -> upconv3
F0502 10:42:08.859184 13048 blob.cpp:34] Check failed: shape[i] <= 2147483647 / count_ (94208 vs. 32767) blob size exceeds INT_MAX
How can I get around this limitation?
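The numbers in the failed check can be reproduced with a little arithmetic. The sketch below assumes that the blob being reshaped is the weight blob of upconv3, and that a Deconvolution layer stores its weights as (input channels, num_output, kernel_h, kernel_w). It also reads the comments "# 256x16x16" and "# 92x32x32" as meaning 256 and 92 channels, which is an interpretation rather than something stated above. first_overflow is a hypothetical helper, not a Caffe function.

```python
INT_MAX = 2147483647


def first_overflow(shape):
    """Mimic the check in the log: return (dim, limit) for the first axis
    whose size would push the element count past INT_MAX, else None."""
    count = 1
    for dim in shape:
        if count != 0:
            limit = INT_MAX // count
            if dim > limit:
                return dim, limit
        count *= dim
    return None


# Assumed weight shape of upconv3 with num_output values as written:
# 65536 input channels (num_output of upconv2) and 94208 output channels.
as_written = (65536, 94208, 5, 5)

# The same layer if num_output meant channels only (256 in, 92 out).
channels_only = (256, 92, 5, 5)

print(first_overflow(as_written))
print(first_overflow(channels_only))
```

With the values as written, the check fails on the second axis with the same pair of numbers as in the log. If num_output is meant to be the number of output channels rather than channels times height times width, the weight blob stays far below the limit.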
I have data with 10-D label vectors, and I want to use a Caffe model to do regression on these data with a 10-D output. For now, though, I only want to check the loss on some of the outputs (for example, entries 1, 3, 4, 5 and 6 of the 10-D vector), so I defined an additional layer with a 5-D output. But I have no idea how to get the corresponding 5-D ground-truth label vector. I think maybe I could define a constant layer to indicate which entries I want. Please help me if you have any ideas.
update: example
This is my original InnerProduct and Loss layer
layer {
name: "score"
type: "InnerProduct"
bottom: "fc7"
top: "score"
inner_product_param {
num_output: 10
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "loss"
type: "EuclideanLoss"
bottom: "score"
bottom: "label"
top: "loss"
include {
phase: TRAIN
}
}
I care more about $n_1$ entries (e.g., 1, 3, 4, 5, 6) of the 10-dimensional output and their loss, so I want to compute the loss on those entries only, like this:
layer {
name: "score1"
type: "InnerProduct"
bottom: "fc7"
top: "score1"
inner_product_param {
num_output: 5 # n_1
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "loss1"
type: "EuclideanLoss"
bottom: "score1"
bottom: "label"
top: "loss1"
include {
phase: TRAIN
}
}
How can I get score1 from score directly?
From what I understand of your question, you would like to compute the loss of the output layer with respect to the labels for a regression model, but leave some of the labels out of the equation.
If my interpretation is right, then since this is a regression model, I expect your new layer to be similar to EuclideanLossLayer. If so, the caffe_sub call in that layer could be replaced by the following code segment:
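// Indices of the output entries that should contribute to the loss.
const int arrayPos[5] = {1, 3, 4, 5, 6};
const int count = 5;
// Temporary buffers for the selected predictions and labels.
Dtype* newBottom0 = (Dtype*)malloc(sizeof(Dtype) * count);
Dtype* newBottom1 = (Dtype*)malloc(sizeof(Dtype) * count);
// Note: indexing the flat array this way only covers the first sample of the batch.
for (int varI = 0; varI < count; varI++) {
  newBottom0[varI] = bottom[0]->cpu_data()[arrayPos[varI]];
  newBottom1[varI] = bottom[1]->cpu_data()[arrayPos[varI]];
}
// Write prediction - label for the selected entries into the first `count` elements of diff_.
caffe_sub(count, newBottom0, newBottom1, diff_.mutable_cpu_data());
free(newBottom0);
free(newBottom1);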
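For checking the numbers outside Caffe, the same idea can be sketched in plain Python. This assumes the usual Euclidean loss definition, the sum of squared differences divided by 2N for a batch of N samples, and treats the indices as 0-based. selected_euclidean_loss is a hypothetical helper, and unlike the C++ segment above it loops over every sample in the batch.

```python
def selected_euclidean_loss(scores, labels, indices):
    """Sum of squared differences over `indices` only, divided by 2 * batch size."""
    total = 0.0
    for score_row, label_row in zip(scores, labels):
        for i in indices:
            diff = score_row[i] - label_row[i]
            total += diff * diff
    return total / (2.0 * len(scores))


# Two samples with 10-dimensional outputs and labels (made-up values).
scores = [[1.0] * 10, [2.0] * 10]
labels = [[0.0] * 10, [0.0] * 10]
selected = [1, 3, 4, 5, 6]

loss = selected_euclidean_loss(scores, labels, selected)
print(loss)

# Entries outside `selected` do not affect the result.
scores[0][0] = 100.0
print(selected_euclidean_loss(scores, labels, selected))
```

Because only the listed entries enter the sum, the full 10-D label vector can stay as it is, with no separate 5-D label needed.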