I have a .prototxt file and a .caffemodel file, for a network I built and trained in py-caffe, using self developed Python layers.
Later, I implemented the same layers for the C++ version of Caffe.
The question is: is it possible to change only the layer types (from the Python ones to the C++ ones) in the .prototxt file, while keeping the same already-trained .caffemodel file, so that the trained network can be loaded and used with the C++ layers?
Yes, it's possible. If your layers have parameters, you should add a new layer parameter message to caffe.proto and read it accordingly in your C++ layer.
For example, the PVANet Caffe repository has a C++ implementation of the original Python proposal layer.
As you can see in caffe.proto, a new message was added to hold the Python layer's parameters:
// Message that stores parameters used by ProposalLayer
message ProposalParameter {
  optional uint32 feat_stride = 1 [default = 16];
  optional uint32 base_size = 2 [default = 16];
  optional uint32 min_size = 3 [default = 16];
  repeated float ratio = 4;
  repeated float scale = 5;
  optional uint32 pre_nms_topn = 6 [default = 6000];
  optional uint32 post_nms_topn = 7 [default = 300];
  optional float nms_thresh = 8 [default = 0.7];
}
The original train / test prototxt contains the following layer:
layer {
  name: 'proposal'
  type: 'Python'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info'
  top: 'rpn_rois'
  top: 'rpn_scores'
  include { phase: TRAIN }
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    param_str: "{'feat_stride': 16, 'ratios': [0.333, 0.5, 0.667, 1, 1.5, 2, 3], 'scales': [2, 3, 5, 9, 16, 32]}"
  }
}
while the new C++ one looks like this:
layer {
  name: 'proposal'
  type: 'proposal'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info'
  top: 'rpn_rois'
  top: 'rpn_scores'
  include { phase: TRAIN }
  proposal_param {
    feat_stride: 16
    ratio: [0.333, 0.5, 0.667, 1, 1.5, 2, 3]
    scale: [2, 3, 5, 9, 16, 32]
  }
}
etc.
I'm using the Caffenet model. I did a classification task with 9 classes successfully. Then I tried to change it into a regression network by preparing another LMDB file with labels ranging from 700 to 1400. I changed the original training code, replacing the softmax with EuclideanLoss and setting num_output to 1. I did the same for testing and got this error:
"Check failed: ExactNumBottomBlobs() == bottom.size() (2 vs. 1) EuclideanLoss Layer takes 2 bottom blob(s) as input. * Check failure stack trace: * Aborted (core dumped)"
So I commented out the EuclideanLoss layer:
layer {
  name: "prob"
  type: "EuclideanLoss"
  bottom: "fc8-cats-dogs-n"
  top: "prob"
}
but now I get:
File "testr.py", line 86
    pred_probas = out['prob']
KeyError: 'prob'
Can anyone help me with this please?
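As a side note on the two error messages themselves: "EuclideanLoss" expects two bottoms, the prediction and the label, so in a train/test prototxt it would look something like this sketch (blob names taken from the question; the pairing with "label" is an assumption about the rest of the net):

```
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "fc8-cats-dogs-n"  # predicted value(s)
  bottom: "label"            # ground-truth regression target
  top: "loss"
}
```

In a deploy net there is no loss layer at all, so the regression output would be read from the "fc8-cats-dogs-n" blob directly rather than from a "prob" blob.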
I am facing an issue converting my Caffe model to DLC using SNPE, specifically in the "Scale" layer.
The first two layers are as follows:
name: "First"
input: "data"
input_shape {
  dim: 1
  dim: 3
  dim: xxx
  dim: xxx
}
layer {
  name: "data/Scale"
  type: "Scale"
  bottom: "data"
  top: "data/Scale"
  scale_param {
    filler {
      value: 0.0078125
    }
    bias_term: true
    bias_filler {
      value: -1
    }
  }
  param {
    lr_mult: 0
    decay_mult: 1
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
}
layer {
  name: "Conv2d_0/convolution"
  type: "Convolution"
  convolution_param {
    num_output: 32
    pad: 1
    kernel_size: 3
    stride: 2
  }
  bottom: "data/Scale"
  top: "Conv2d_0/convolution"
}
I get the following error:
('Encountered Error:', 'list index out of range')
Stack Trace:
Traceback (most recent call last):
  File "/home/nithin.ga/SNPE_19/snpe-1.19.2/bin/x86_64-linux-clang/snpe-caffe-to-dlc", line 115, in <module>
    args.enable_strict_validation)
  File "/home/nithin.ga/SNPE_19/snpe-1.19.2/lib/python/snpe/snpe_caffe_to_dlc.py", line 1145, in convert
    self.convert_caffe_new(self.spec)
  File "/home/nithin.ga/SNPE_19/snpe-1.19.2/lib/python/snpe/snpe_caffe_to_dlc.py", line 1327, in convert_caffe_new
    layer_seq = self._blob_connectivity_map.check_s_folding(layer)
  File "/home/nithin.ga/SNPE_19/snpe-1.19.2/lib/python/snpe/snpe_caffe_to_dlc.py", line 459, in check_s_folding
    output_layer = self._blobs[prev_layer_output_blob]['output_of_layers'][0]
IndexError: list index out of range
Here is the documentation for the Scale layer limitation of SNPE:
https://developer.qualcomm.com/docs/snpe/limitations.html
Batch normalization (+ Scaling)
Caffe: Scaling (scale_layer) is optional. If present, it extends functionality of Batch normalization (batch_norm_layer). If not present, batch_norm_layer will still be converted as per Caffe specification. scale_layer used anywhere else in the network but immediately after the batch_norm_layer is not supported.
There is support for scaling, but only if it's part of the data layer:
https://developer.qualcomm.com/docs/snpe/network_layers.html
Scale (Image)
Input image scaling, maintains aspect ratio. This function is
primarily intended for images, but technically any 2D input data can
be processed if it makes sense. Scaling parameters are provided as an
option to the model converter tool.
There is no such Caffe layer by itself. This functionality is
technically part of the Caffe data provider.
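Per the limitation quoted above, the converter only accepts a "Scale" layer that immediately follows a "BatchNorm" layer; a standalone layer like data/Scale does not match that pattern. A minimal sketch of the supported arrangement (layer and blob names here are illustrative):

```
layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1/bn"
}
layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1/bn"
  top: "conv1/bn"
  scale_param { bias_term: true }
}
```

A fixed input transform such as multiplying by 0.0078125 and adding a bias of -1 would instead be supplied as preprocessing options to the converter tool, since SNPE treats input scaling as part of the data provider rather than as a network layer.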
I'm trying to create a single multi-class and multi-label net configuration in caffe.
Let's say classification of dogs:
Is the dog small or large? (class)
What color is it? (class)
Does it have a collar? (label)
Is this possible using Caffe?
What is the proper way to do so?
What is the right way to build the LMDB file?
All the publications about multi-label classification are from around 2015; has anything changed in this area since then?
Thanks.
The problem with Caffe's LMDB interface is that it only allows for a single int label per image.
If you want multiple labels per image you'll have to use a different input layer.
I suggest using an "HDF5Data" layer:
This allows for more flexibility in setting the input data: you may have as many "top"s as you want for this layer, multiple labels per input image, and multiple losses for your net to train on.
See this post on how to create HDF5 data for Caffe.
Thanks Shai,
Just trying to understand the practical way...
After creating two .txt files (one for training and one for validation) containing all the tags of the images, for example:
/train/img/1.png 0 4 18
/train/img/2.png 1 7 17 33
/train/img/3.png 0 4 17
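If the goal is multi-label training (e.g. with a sigmoid cross-entropy loss), lines like those above can be expanded into fixed-length multi-hot vectors before writing them to HDF5. A minimal sketch in plain Python (num_classes = 40 is an arbitrary choice for illustration):

```python
def parse_multilabel_line(line, num_classes):
    # "path idx idx idx ..." -> (path, multi-hot label vector)
    parts = line.split()
    path = parts[0]
    labels = [0.0] * num_classes
    for idx in (int(t) for t in parts[1:]):
        labels[idx] = 1.0  # mark each listed class as present
    return path, labels

path, labels = parse_multilabel_line('/train/img/2.png 1 7 17 33', 40)
# labels now holds 1.0 at indices 1, 7, 17, 33 and 0.0 elsewhere
```

Each such vector becomes one row of the label dataset, so its shape is (num_images, num_classes) rather than (num_images, 1).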
Running the py script:
import h5py, os
import caffe
import numpy as np

SIZE = 227  # fixed size to all images
with open('train.txt', 'r') as T:
    lines = T.readlines()
# If you do not have enough memory split data into
# multiple batches and generate multiple separate h5 files
X = np.zeros((len(lines), 3, SIZE, SIZE), dtype='f4')
y = np.zeros((len(lines), 1), dtype='f4')
for i, l in enumerate(lines):
    sp = l.split(' ')
    img = caffe.io.load_image(sp[0])
    img = caffe.io.resize(img, (SIZE, SIZE, 3))  # resize to fixed size
    # you may apply other input transformations here...
    # Note that the transformation should take img from SIZE-by-SIZE-by-3
    # and transpose it to 3-by-SIZE-by-SIZE, for example:
    transposed_img = img.transpose((2, 0, 1))[::-1, :, :]  # RGB->BGR
    X[i] = transposed_img
    y[i] = float(sp[1])
with h5py.File('train.h5', 'w') as H:
    H.create_dataset('X', data=X)  # note the name X given to the dataset!
    H.create_dataset('y', data=y)  # note the name y given to the dataset!
with open('train_h5_list.txt', 'w') as L:
    L.write('train.h5')  # list all h5 files you are going to use
And creating train.h5 and val.h5 (does the X dataset contain the images and y the labels?).
Then I replace my network's input layers from:
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "/home/gal/digits/digits/jobs/20181010-191058-21ab/train_db"
    backend: LMDB
    batch_size: 64
  }
  transform_param {
    crop_size: 227
    mean_file: "/home/gal/digits/digits/jobs/20181010-191058-21ab/mean.binaryproto"
    mirror: true
  }
  include: { phase: TRAIN }
}
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "/home/gal/digits/digits/jobs/20181010-191058-21ab/val_db"
    backend: LMDB
    batch_size: 64
  }
  transform_param {
    crop_size: 227
    mean_file: "/home/gal/digits/digits/jobs/20181010-191058-21ab/mean.binaryproto"
    mirror: true
  }
  include: { phase: TEST }
}
to
layer {
  type: "HDF5Data"
  top: "X"  # same name as given in create_dataset!
  top: "y"
  hdf5_data_param {
    source: "train_h5_list.txt"  # do not give the h5 files directly, but the list
    batch_size: 32
  }
  include { phase: TRAIN }
}
layer {
  type: "HDF5Data"
  top: "X"  # same name as given in create_dataset!
  top: "y"
  hdf5_data_param {
    source: "val_h5_list.txt"  # do not give the h5 files directly, but the list
    batch_size: 32
  }
  include { phase: TEST }
}
I guess HDF5 doesn't need a mean.binaryproto?
Next, how should the output layer change in order to output multiple label probabilities?
I guess I need a cross-entropy layer instead of softmax?
These are the current output layers:
layers {
  bottom: "prob"
  bottom: "label"
  top: "loss"
  name: "loss"
  type: SOFTMAX_LOSS
  loss_weight: 1
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "prob"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
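For multiple independent labels, a common replacement for the softmax loss is "SigmoidCrossEntropyLoss", which treats each output unit as an independent binary prediction. A sketch (the bottom name "fc8" is an assumption standing for whatever layer currently feeds "prob", with num_output equal to the number of labels; "y" is the label top of the HDF5 layers):

```
layer {
  name: "loss"
  type: "SigmoidCrossEntropyLoss"
  bottom: "fc8"  # hypothetical prediction layer, num_output = number of labels
  bottom: "y"
  top: "loss"
}
```

Note that the stock "Accuracy" layer assumes a single class label per sample, so it does not apply directly to a multi-hot label blob.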
I need to implement an existing Caffe model with DeepLearning4j. However, I am new to DL4J, so I don't know how to do it. Searching through the docs and examples was of little help; the terminology of the two frameworks is very different.
How would you write the Caffe prototxt below in DL4J?
Layer 1:
layers {
  name: "myLayer1"
  type: CONVOLUTION
  bottom: "data"
  top: "myLayer1"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 20
    kernel_w: 2
    kernel_h: 2
    stride_w: 1
    stride_h: 1
    weight_filler {
      type: "msra"
      variance_norm: AVERAGE
    }
    bias_filler {
      type: "constant"
    }
  }
}
Layer 2:
layers {
  name: "myLayer1Relu"
  type: RELU
  relu_param {
    negative_slope: 0.3
  }
  bottom: "myLayer1"
  top: "myLayer1"
}
Layer 3:
layers {
  name: "myLayer1_dropout"
  type: DROPOUT
  bottom: "myLayer1"
  top: "myLayer1"
  dropout_param {
    dropout_ratio: 0.2
  }
}
Layer 4:
layers {
  name: "final_class"
  type: INNER_PRODUCT
  bottom: "myLayer4"
  top: "final_class"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
      variance_norm: AVERAGE
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
This GitHub repo contains comparisons of the same model between DL4J, Caffe, TensorFlow, and Torch.
The 1st layer is a DL4J ConvolutionLayer, and you can pass in attributes for nOut, kernel, stride, and weightInit. From a quick search, it appears msra is equivalent to WeightInit.RELU, and variance_norm is not a feature the model supports yet.
The 2nd layer is part of the ConvolutionLayer, as the activation attribute; thus, set the layer's activation attribute to "relu". Negative slope is not a feature that the model supports yet.
The 3rd layer is also an attribute on ConvolutionLayer, dropOut, and you would pass in 0.2. There is work in progress to create a specific DropOutLayer, but it's not merged yet.
The 4th layer would be a DenseLayer if there were another layer after it, but since it's the last layer, it is an OutputLayer.
blobs_lr applies a multiplier to the weight and bias learning rates, respectively. You can change the learning rate on the layer by setting the learningRate and biasLearningRate attributes on that layer.
weight_decay sets the l1 or l2 on the layer, which you can set for each layer with the attributes l1 or l2. DL4J defaults to not applying l1 or l2 to the bias; hence the second weight_decay is set to 0 in Caffe.
The bias filler already defaults to constant, with a default value of 0.
Below is a quick example of how your code would translate. More information can be found in DL4J examples:
double learningRate = 0.1;
double l2 = 0.005;
int inputHeight = 28;
int inputWidth = 28;
int channels = 1;
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(seed)
    .iterations(iterations)
    .regularization(true).l2(l2)
    .learningRate(learningRate)
    .list()
    .layer(0, new ConvolutionLayer.Builder(new int[]{2, 2}, new int[]{1, 1})
        .name("myLayer1")
        .activation("relu").dropOut(0.2).nOut(20)
        .biasLearningRate(2 * learningRate).weightInit(WeightInit.RELU)
        .build())
    .layer(1, new OutputLayer.Builder()
        .name("myLayer4").nOut(10)
        .activation("softmax").l2(1 * l2).biasLearningRate(2 * learningRate)
        .weightInit(WeightInit.XAVIER).build())
    .setInputType(InputType.convolutionalFlat(inputHeight, inputWidth, channels))
    .build();
There's no automated way to do this, but mapping the builder DSL for only a few layers shouldn't be hard. A bare-minimum example is here:
https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/convolution/LenetMnistExample.java
You can see the same primitives, e.g. stride, padding, xavier, biasInit, all in there.
Our upcoming Keras import might be a way for you to bridge Caffe -> Keras -> DL4J, though.
Edit: I'm not going to build it for you. (I'm not sure if that's what you're looking for here.)
DL4J already has the right primitives, though. It doesn't have an input layer for variance_norm; you use zero-mean, unit-variance normalization on the input before passing it in.
We have biasInit as part of the config, if you just read the javadoc:
http://deeplearning4j.org/doc
I am using Caffe, and it doesn't have a locally connected layer. Is there any example of how to use the im2col, reshape, and inner product layers to implement a locally connected layer? Thanks.
My point of view:
I have also tried to use the Crop, Im2col, Reshape and InnerProduct layers to implement a locally connected layer, but failed.
When I tried to implement a convolution operation using the InnerProduct layer, I found that in the InnerProductLayer<Dtype>::Forward_cpu() function:
caffe_cpu_gemm<Dtype>(CblasNoTrans, transpose_ ? CblasNoTrans : CblasTrans,
    M_, N_, K_, (Dtype)1.,
    bottom_data, weight, (Dtype)0., top_data);
and in BaseConvolutionLayer<Dtype>::forward_cpu_gemm() function:
caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, conv_out_channels_ /
    group_, conv_out_spatial_dim_, kernel_dim_,
    (Dtype)1., weights + weight_offset_ * g, col_buff + col_offset_ * g,
    (Dtype)0., output + output_offset_ * g);
the weights, which should be used as convolution kernels, are passed as different arguments of caffe_cpu_gemm().
So I can't implement a convolution operation using the InnerProductLayer<Dtype>::Forward_cpu() function, and thus can't implement a locally connected layer (I mean local convolution here) using the Crop, Im2col, Reshape and InnerProduct layers.
My solution:
However, I implemented a local convolution layer here; its idea is to divide the input feature maps into an N*N grid (even with overlap) and perform convolution on each grid cell using a different bank of kernels. For example, if the input feature maps have shape (2, 3, 8, 8) and you want to divide the 8*8 spatial map into 16 2*2 local regions and then convolve each local region with a different bank of kernels, you can write a prototxt like this:
layer {
  name: "local_conv"
  type: "LocalConvolution"
  bottom: "bottom"  # shape (2,3,8,8)
  top: "top"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  local_conv_param {
    local_region_number_h: 4
    local_region_number_w: 4
    local_region_ratio_h: 0.3  # determine the height/width of local regions
    local_region_ratio_w: 0.3  # local_region_size = floor(local_region_ratio * input_size)
    local_region_step_h: 2     # step between local regions on the top-left part;
                               # the other regions will lie in the axially symmetric
                               # positions automatically
    local_region_step_w: 2
    num_output: 5
    kernel_h: 3
    kernel_w: 1
    stride: 1
    pad: 0
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
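To make the local-region sizing rule from the comments above concrete, here is a small plain-Python sketch (the 8*8 input and 0.3 ratio are the example values from the text):

```python
import math

def local_region_size(ratio, input_size):
    # local_region_size = floor(local_region_ratio * input_size)
    return math.floor(ratio * input_size)

# 8x8 feature map with ratio 0.3 -> 2x2 local regions;
# with local_region_number 4x4 that gives 16 regions in total.
size_h = local_region_size(0.3, 8)  # 2
size_w = local_region_size(0.3, 8)  # 2
num_regions = 4 * 4                 # 16
```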
You can easily add this layer to your Caffe; the related files are:
include/caffe/layers/local_conv_layer.hpp
src/caffe/layers/local_conv_layer.cpp(cu)
and you should also add the message LocalConvolutionParameter and the optional LocalConvolutionParameter local_conv_param field from src/caffe/proto/caffe.proto to your caffe.proto.