Caffe - multi-class and multi-label image classification

I'm trying to create a single multi-class and multi-label net configuration in Caffe.
Say we are classifying dogs:
Is the dog small or large? (class)
What color is it? (class)
Does it have a collar? (label)
Is this possible using Caffe? What is the proper way to do it? What is the right way to build the lmdb file?
All the publications about multi-label classification I can find are from around 2015; has anything in this area changed since then?
Thanks.

The problem with Caffe's LMDB interface is that it only allows for a single integer label per image.
If you want multiple labels per image you'll have to use a different input layer.
I suggest using an "HDF5Data" layer:
This allows more flexibility in setting the input data; you may have as many "top"s as you want for this layer, multiple labels per input image, and multiple losses for your net to train on.
See this post on how to create HDF5 data for caffe.

Thanks Shai,
I'm just trying to understand the practical way of doing this.
After creating two .txt files (one for training and one for validation) containing all the tags of the images, for example:
/train/img/1.png 0 4 18
/train/img/2.png 1 7 17 33
/train/img/3.png 0 4 17
I run the following Python script:
import h5py, os
import caffe
import numpy as np

SIZE = 227  # fixed size to all images
with open('train.txt', 'r') as T:
    lines = T.readlines()
# If you do not have enough memory split data into
# multiple batches and generate multiple separate h5 files
X = np.zeros((len(lines), 3, SIZE, SIZE), dtype='f4')
y = np.zeros((len(lines), 1), dtype='f4')
for i, l in enumerate(lines):
    sp = l.split(' ')
    img = caffe.io.load_image(sp[0])
    img = caffe.io.resize(img, (SIZE, SIZE, 3))  # resize to fixed size
    # you may apply other input transformations here...
    # Note that the transformation should take img from SIZE-by-SIZE-by-3
    # and transpose it to 3-by-SIZE-by-SIZE, for example:
    transposed_img = img.transpose((2, 0, 1))[::-1, :, :]  # RGB->BGR
    X[i] = transposed_img
    y[i] = float(sp[1])
with h5py.File('train.h5', 'w') as H:
    H.create_dataset('X', data=X)  # note the name X given to the dataset!
    H.create_dataset('y', data=y)  # note the name y given to the dataset!
with open('train_h5_list.txt', 'w') as L:
    L.write('train.h5')  # list all h5 files you are going to use
This creates train.h5 and val.h5 (does the X dataset contain the images and y the labels?).
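One caveat: as written, the script stores only the first tag (sp[1]) of each image in y. For the multi-label case, y needs room for all tags. A minimal sketch, assuming each tag is an index into a fixed vocabulary of NUM_LABELS binary attributes (NUM_LABELS is a placeholder, not from the original post):
import numpy as np

NUM_LABELS = 40  # hypothetical size of the tag vocabulary
y = np.zeros((len(lines), NUM_LABELS), dtype='f4')
for i, l in enumerate(lines):
    sp = l.split()
    for tag in sp[1:]:        # every token after the image path is a tag index
        y[i, int(tag)] = 1.0  # multi-hot encoding, one column per possible tag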
Then I replace my network input layers from:
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "/home/gal/digits/digits/jobs/20181010-191058-21ab/train_db"
    backend: LMDB
    batch_size: 64
  }
  transform_param {
    crop_size: 227
    mean_file: "/home/gal/digits/digits/jobs/20181010-191058-21ab/mean.binaryproto"
    mirror: true
  }
  include: { phase: TRAIN }
}
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "/home/gal/digits/digits/jobs/20181010-191058-21ab/val_db"
    backend: LMDB
    batch_size: 64
  }
  transform_param {
    crop_size: 227
    mean_file: "/home/gal/digits/digits/jobs/20181010-191058-21ab/mean.binaryproto"
    mirror: true
  }
  include: { phase: TEST }
}
to
layer {
  type: "HDF5Data"
  top: "X" # same name as given in create_dataset!
  top: "y"
  hdf5_data_param {
    source: "train_h5_list.txt" # do not give the h5 files directly, but the list
    batch_size: 32
  }
  include { phase: TRAIN }
}
layer {
  type: "HDF5Data"
  top: "X" # same name as given in create_dataset!
  top: "y"
  hdf5_data_param {
    source: "val_h5_list.txt" # do not give the h5 files directly, but the list
    batch_size: 32
  }
  include { phase: TEST }
}
I guess HDF5 doesn't need a mean.binaryproto?
Next, how should the output layers change in order to output multiple label probabilities? I guess I need a cross-entropy layer instead of softmax? These are the current output layers:
layers {
  bottom: "prob"
  bottom: "label"
  top: "loss"
  name: "loss"
  type: SOFTMAX_LOSS
  loss_weight: 1
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "prob"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
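For multiple independent labels, one common option (a sketch, not from the original thread) is to replace the softmax loss with a "SigmoidCrossEntropyLoss" over a NUM_LABELS-dimensional score blob. Generated with pycaffe's NetSpec, such a head could look like this; NUM_LABELS is a placeholder and the feature layers are elided:
import caffe
from caffe import layers as L

NUM_LABELS = 40  # hypothetical number of binary tags

n = caffe.NetSpec()
n.X, n.y = L.HDF5Data(source='train_h5_list.txt', batch_size=32, ntop=2)
# ... convolution / fully connected feature layers elided ...
n.score = L.InnerProduct(n.X, num_output=NUM_LABELS)  # in practice, bottom = last feature blob
n.loss = L.SigmoidCrossEntropyLoss(n.score, n.y)
print(n.to_proto())  # emits the prototxt for these layers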


Slicing an input data layer in caffe - unknown blob input

I am trying to do pixel-wise classification with caffe, so I need to provide a ground-truth image the size of the input image. There are several ways of doing this, and I decided to set up my input as a 4-channel LMDB (according to the 2nd point of this answer). This requires me to add a Slice layer after my input, which is also outlined in the same answer.
I keep getting Unknown blob input data_lmdb to layer 0 as an error message (data_lmdb is supposed to be my very bottom input layer). I found that an unknown blob error (be it top or bottom) is mostly caused by forgetting to define something in one of the TRAIN / TEST phases while defining it in the other (e.g. this question, or this one). But I am using a combination of train.prototxt, inference.prototxt and solver.prototxt files that I have previously used, just replacing the input layers from HDF5 to LMDB (for a bit of practice), so everything should be defined.
Can anybody see why I am getting the Unknown blob input data_lmdb to layer 0 error? From the train log files I can see that it crashes as soon as it reads the train.prototxt file (it doesn't even reach the Creating layer part).
My prototxt files are as follows:
solver.prototxt
net: "train.prototxt" # Change this to the absolute path to your model file
test_initialization: false
test_iter: 1
test_interval: 1000000
base_lr: 0.01
lr_policy: "fixed"
gamma: 1.0
stepsize: 2000
display: 20
momentum: 0.9
max_iter: 10000
weight_decay: 0.0005
snapshot: 100
snapshot_prefix: "set_snapshot_name" # Absolute path to output solver snapshots
solver_mode: GPU
train.prototxt (first two layers only; they are followed by a LNR normalization layer and then a Convolution layer):
name: "my_net"
layer {
  name: "data_lmdb"
  type: "Data"
  top: "slice_input"
  data_param {
    source: "data/train"
    batch_size: 4
    backend: LMDB
  }
}
layer {
  name: "slice_input"
  type: "Slice"
  bottom: "data_lmdb" # 4 channels = rgb + truth
  top: "data"
  top: "label"
  slice_param {
    axis: 1
    slice_point: 3
  }
}
The first few layer definitions in inference.prototxt are identical to train.prototxt (which shouldn't matter anyway, as it is not used in training) except for the following:
in data_lmdb the source path is different (data/test)
the data_lmdb layer uses batch_size: 1
Please let me know if I need to include any more information or layers. I was trying to keep it brief, which didn't really work out in the end.
The message Unknown blob input points to a non-existent blob that some layer wants as input. Your slice_input layer specifies data_lmdb as its input blob, but there is no such blob in your network; instead, you have a layer with that name. Blob names are defined by the top field, which is slice_input in this case.
You should either change top: "slice_input" to top: "data_lmdb" in your data_lmdb layer, or change the Slice layer's bottom to bottom: "slice_input".
However, for more clear naming I would offer you the following:
name: "my_net"
layer {
  name: "data"
  type: "Data"
  top: "data_and_label"
  data_param {
    source: "data/train"
    batch_size: 4
    backend: LMDB
  }
}
layer {
  name: "slice_input"
  type: "Slice"
  bottom: "data_and_label" # 4 channels = rgb + truth
  top: "data"
  top: "label"
  slice_param {
    axis: 1
    slice_point: 3
  }
}
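For reference, a minimal sketch (not from the original answer) of writing one RGB + ground-truth pair into such a 4-channel LMDB with pycaffe; the paths, key and shapes are placeholders:
import lmdb
import numpy as np
import caffe

# placeholder arrays: 3-channel image plus 1-channel ground truth, same height and width
rgb = np.zeros((3, 256, 256), dtype=np.uint8)
truth = np.zeros((1, 256, 256), dtype=np.uint8)
packed = np.concatenate([rgb, truth], axis=0)  # shape (4, 256, 256)

env = lmdb.open('data/train', map_size=1 << 30)
with env.begin(write=True) as txn:
    datum = caffe.io.array_to_datum(packed)  # serialize the 4-channel array as a Datum
    txn.put(b'00000000', datum.SerializeToString())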

Convert Caffe configuration to DeepLearning4J configuration

I need to implement an existing Caffe model with DeepLearning4j. However, I am new to DL4J, so I don't know how to implement it. Searching through the docs and examples was of little help; the terminology of the two frameworks is very different.
How would you write the Caffe prototxt below in DL4J?
Layer1:
layers {
  name: "myLayer1"
  type: CONVOLUTION
  bottom: "data"
  top: "myLayer1"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 20
    kernel_w: 2
    kernel_h: 2
    stride_w: 1
    stride_h: 1
    weight_filler {
      type: "msra"
      variance_norm: AVERAGE
    }
    bias_filler {
      type: "constant"
    }
  }
}
Layer 2
layers {
  name: "myLayer1Relu"
  type: RELU
  relu_param {
    negative_slope: 0.3
  }
  bottom: "myLayer1"
  top: "myLayer1"
}
Layer 3
layers {
  name: "myLayer1_dropout"
  type: DROPOUT
  bottom: "myLayer1"
  top: "myLayer1"
  dropout_param {
    dropout_ratio: 0.2
  }
}
Layer 4
layers {
  name: "final_class"
  type: INNER_PRODUCT
  bottom: "myLayer4"
  top: "final_class"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
      variance_norm: AVERAGE
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
This GitHub repo contains comparisons of the same model between DL4J, Caffe, Tensorflow and Torch.
The 1st layer maps to a DL4J ConvolutionLayer, and you can pass in attributes for nOut, kernel, stride and weightInit. From a quick search it appears msra is equivalent to WeightInit.RELU, and variance_norm is not a feature the model supports yet.
The 2nd layer is part of the ConvolutionLayer, namely the activation attribute; thus, set that attribute to "relu". A negative slope is not a feature that the model supports yet.
The 3rd layer is also an attribute on ConvolutionLayer, dropOut, and you would pass in 0.2. There is work in progress to create a specific DropOutLayer, but it's not merged yet.
The 4th layer would be a DenseLayer if there were another layer after it, but since it's the last layer it becomes an OutputLayer.
blobs_lr applies a multiplier to the weight lr and bias lr respectively. You can change the learning rate on a layer by setting its learningRate and biasLearningRate attributes.
weight_decay sets the l1 or l2 on the layer, which you can set for each layer with the attributes l1 or l2. DL4J defaults to not applying l1 or l2 to the bias, hence the second weight_decay set to 0 in Caffe.
The bias filler already defaults to constant 0.
Below is a quick example of how your code would translate. More information can be found in DL4J examples:
double learningRate = 0.1;
double l2 = 0.005;
int inputHeight = 28;
int inputWidth = 28;
int channels = 1;
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(seed)
    .iterations(iterations)
    .regularization(false).l2(l2)
    .learningRate(learningRate)
    .list()
    .layer(0, new ConvolutionLayer.Builder(new int[]{2, 2}, new int[]{1, 1})
        .name("myLayer1")
        .activation("relu").dropOut(0.2).nOut(20)
        .biasLearningRate(2 * learningRate).weightInit(WeightInit.RELU)
        .build())
    .layer(1, new OutputLayer.Builder()
        .name("myLayer4").nOut(10)
        .activation("softmax").l2(1 * l2).biasLearningRate(2 * learningRate)
        .weightInit(WeightInit.XAVIER).build())
    .setInputType(InputType.convolutionalFlat(inputHeight, inputWidth, channels))
    .build();
There's no automated way to do this, but mapping the builder DSL for only a few layers shouldn't be hard. A bare minimum example is here:
https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/convolution/LenetMnistExample.java
You can see the same primitives, e.g. stride, padding, xavier, biasInit, all in there.
Our upcoming Keras import might be a way for you to bridge caffe -> keras -> dl4j, though.
Edit: I'm not going to build it for you (I'm not sure if that's what you're looking for here).
DL4J has the right primitives already, though. It doesn't have an input layer for variance_norm; you use zero-mean and unit-variance normalization on the input before passing it in.
We have biasInit as part of the config if you just read the javadoc:
http://deeplearning4j.org/doc

caffe pixel-wise classification / regression

What I want to do is a simple pixel-wise classification or regression task, so I have an input image and a ground truth. I want to do an easy segmentation task with a circle and a rectangle and train where the circle and where the rectangle are. That means I have a ground-truth image which has the value "1" at all the locations where the circle is and the value "2" at all the locations where the rectangle is. My images and ground-truth images come as input in the form of .png images.
I think I can then do either a regression or a classification task, depending on my loss layer. I have been using the fully convolutional AlexNet from fcn alexnet.
classification:
layer {
  name: "upscore"
  type: "Deconvolution"
  bottom: "score_fr"
  top: "upscore"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 3 ## <<---- 0 = background, 1 = circle, 2 = rectangle
    bias_term: false
    kernel_size: 63
    stride: 32
  }
}
layer {
  name: "score"
  type: "Crop"
  bottom: "upscore"
  bottom: "data"
  top: "score"
  crop_param {
    axis: 2
    offset: 18
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss" ## <<----
  bottom: "score"
  bottom: "ground_truth"
  top: "loss"
  loss_param {
    ignore_label: 0
  }
}
regression:
layer {
  name: "upscore"
  type: "Deconvolution"
  bottom: "score_fr"
  top: "upscore"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 1 ## <<---- 1 x height x width
    bias_term: false
    kernel_size: 63
    stride: 32
  }
}
layer {
  name: "score"
  type: "Crop"
  bottom: "upscore"
  bottom: "data"
  top: "score"
  crop_param {
    axis: 2
    offset: 18
  }
}
layer {
  name: "loss"
  type: "EuclideanLoss" ## <<----
  bottom: "score"
  bottom: "ground_truth"
  top: "loss"
}
However, this does not produce the results I want. I think there is something wrong with my understanding of pixel-wise classification / regression. Could you tell me where my mistake is?
EDIT 1
For regression the retrieval of the output would look like this:
output_blob = pred['result'].data
predicated_image_array = np.array(output_blob)
predicated_image_array = predicated_image_array.squeeze()
print predicated_image_array.shape

range_value = np.ptp(predicated_image_array)
min_value = predicated_image_array.min()
max_value = predicated_image_array.max()
# make positive, then scale to [0, 255]
predicated_image_array[:] -= min_value
if range_value != 0:
    predicated_image_array /= range_value
predicated_image_array *= 255
predicated_image_array = predicated_image_array.astype(np.int64)
print predicated_image_array.shape
cv2.imwrite('predicted_output.jpg', predicated_image_array)
This is easy, since the output is 1 x height x width and the values are the actual output values. But how would one retrieve the output for classification with the Softmax layer, where the output is 3 (num labels) x height x width? I do not know what the content of that shape means.
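For the classification head, the scores can be turned into a per-pixel label map by taking the arg-max over the channel axis. A minimal sketch, assuming pred is the net's blob dict and the "score" blob has shape 3 x height x width after squeezing:
import numpy as np

scores = np.array(pred['score'].data).squeeze()  # (3, H, W): one score map per class
label_map = scores.argmax(axis=0)                # per-pixel class index: 0, 1 or 2
# optional: per-pixel class probabilities via a softmax over the channel axis
exps = np.exp(scores - scores.max(axis=0, keepdims=True))
probs = exps / exps.sum(axis=0, keepdims=True)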
First of all, your problem is not regression but classification!
If you want to teach the net to recognise circles and rectangles you have to make a different data set: images and labels, for example circle - 0 and rectangle - 1. You do it by making a text file that contains the image paths and the image labels, for example:
/path/circle1.png 0
/path/circle2.png 0
/path/rectangle1.png 1
/path/rectangle2.png 1
Here is a nice tutorial for a problem like yours. Good luck.

How to compute test/validation loss in pycaffe

I am trying to compute the test loss in my own training loop in python. Calling solver.test_nets[0].forward() seems to update the score blob but not the loss one. Any idea how to get it updated?
I am using the following solver config:
net: "/tmp/tmp8ikb9sg2/train.prototxt"
test_net: "/tmp/tmp8ikb9sg2/test.prototxt"
test_iter: 1
test_interval: 2147483647
base_lr: 0.1
lr_policy: "fixed"
test_initialization: false
and train.prototxt and test.prototxt are exactly the same except for the phase definition at the top of the file:
name: "pycaffenet"
state {
  phase: TRAIN # set TEST in test.prototxt
}
...
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score"
  bottom: "output"
  top: "loss"
}
It was actually a different issue than what I thought. The loss blob was being updated, but it remained the same because the weights of solver.test_nets[0] were not changing. It looks like they are not automatically shared with solver.net. This can be done by simply calling:
solver.test_nets[0].share_with(solver.net)
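Putting it together, a sketch of evaluating the validation loss inside a manual training loop; it assumes the loss blob is named "loss", and test_iters is a placeholder for however many batches cover the validation set:
solver.test_nets[0].share_with(solver.net)  # sync weights from the train net
test_iters = 10  # placeholder
test_loss = 0.0
for _ in range(test_iters):
    solver.test_nets[0].forward()
    test_loss += float(solver.test_nets[0].blobs['loss'].data)
print('validation loss:', test_loss / test_iters)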

How to use 1-dim vector as input for caffe?

I'd like to train a neural network (NN) on my own 1-dim data, which I stored in an HDF5 database for caffe. According to the documentation this should work. It also works for me as long as I only use "Fully Connected Layers", "Relu" and "Dropout". However, I get an error when I try to use "Convolution" and "Max Pooling" layers in the NN architecture. The error complains about the input dimension of the data.
I0622 16:44:20.456007 9513 net.cpp:84] Creating Layer conv1
I0622 16:44:20.456015 9513 net.cpp:380] conv1 <- data
I0622 16:44:20.456048 9513 net.cpp:338] conv1 -> conv1
I0622 16:44:20.456061 9513 net.cpp:113] Setting up conv1
F0622 16:44:20.456487 9513 blob.cpp:28] Check failed: shape[i] >= 0 (-9 vs. 0)
This is the error when I only want to use a "Pooling" layer behind an "InnerProduct" layer:
I0622 16:52:44.328660 9585 net.cpp:338] pool1 -> pool1
I0622 16:52:44.328666 9585 net.cpp:113] Setting up pool1
F0622 16:52:44.328680 9585 pooling_layer.cpp:84] Check failed: 4 == bottom[0]->num_axes() (4 vs. 2) Input must have 4 axes, corresponding to (num, channels, height, width)
However I don't know how to change the input dimensions such that it works.
This is the beginning of my prototxt file specifying the network architecture:
name: "LeNet"
layer {
  name: "myNet"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "/path/to/my/data/train.txt"
    batch_size: 200
  }
}
layer {
  name: "myNet"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  hdf5_data_param {
    source: "/path/to/my/data/test.txt"
    batch_size: 200
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 1
    kernel_h: 11
    kernel_w: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_h: 3
    kernel_w: 1
    stride: 2
  }
}
And this is how I output my 4D database (with two singleton dimensions) using Matlab's h5write function:
h5create('train.h5','/data',[dimFeats 1 1 numSamplesTrain]);
h5write('train.h5','/data', traindata);
You seem to be outputting your data using the wrong shape. Caffe blobs have the dimensions (n_samples, n_channels, height, width).
Other than that, your prototxt seems to be fine for doing predictions based on a 1-D input.
As I have no experience using h5create and h5write in Matlab, I am not sure whether the training dataset is generated with the dimensions that you expect.
The error message for the convolution layer says that shape[i] = -9. This means that either the width, height, channels or number of images in a batch is being set to -9.
The error message when using the pooling layer alone says that the network detected a 2D input while it expects a 4D one.
The error messages in both layers are related to reshaping the blobs, which is a clear indication that the dimensions of the input are not as expected.
Try debugging the Reshape functions in blob.cpp and layers/pooling_layer.cpp to get an insight into which value is actually going rogue.
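One likely culprit, offered as an assumption: Matlab stores arrays column-major, so the [dimFeats 1 1 numSamplesTrain] dimensions above appear to Caffe's row-major reader as (numSamplesTrain, 1, 1, dimFeats), i.e. a height of 1, and 1 - 11 + 1 = -9 matches the error. A sketch of writing equivalently shaped data from Python with h5py, so that the 11x1 kernel has enough height to slide over (sizes are placeholders):
import h5py
import numpy as np

n_samples, dim_feats = 1000, 128  # placeholder sizes
X = np.random.randn(n_samples, 1, dim_feats, 1).astype('f4')  # (num, channels, height, width)
y = np.random.randint(0, 2, size=(n_samples, 1)).astype('f4')

with h5py.File('train.h5', 'w') as f:
    f.create_dataset('data', data=X)   # names must match the "top"s of the HDF5Data layer
    f.create_dataset('label', data=y)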