I need to implement an existing Caffe model with DeepLearning4j. However, I am new to DL4J, so I don't know how to implement it. Searching through the docs and examples was of little help; the terminology of the two frameworks is very different.
How would you write the Caffe prototxt below in DL4J?
Layer 1
layers {
name: "myLayer1"
type: CONVOLUTION
bottom: "data"
top: "myLayer1"
blobs_lr: 1
blobs_lr: 2
convolution_param {
num_output: 20
kernel_w: 2
kernel_h: 2
stride_w: 1
stride_h: 1
weight_filler {
type: "msra"
variance_norm: AVERAGE
}
bias_filler {
type: "constant"
}
}
}
Layer 2
layers {
name: "myLayer1Relu"
type: RELU
relu_param {
negative_slope: 0.3
}
bottom: "myLayer1"
top: "myLayer1"
}
Layer 3
layers {
name: "myLayer1_dropout"
type: DROPOUT
bottom: "myLayer1"
top: "myLayer1"
dropout_param {
dropout_ratio: 0.2
}
}
Layer 4
layers {
name: "final_class"
type: INNER_PRODUCT
bottom: "myLayer4"
top: "final_class"
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
variance_norm: AVERAGE
}
bias_filler {
type: "constant"
value: 0
}
}
}
This GitHub repo contains comparisons of the same model across DL4J, Caffe, TensorFlow and Torch.
The 1st layer is a DL4J ConvolutionLayer, and you can pass in attributes for nOut, kernel, stride and weightInit. From a quick search, it appears msra is equivalent to WeightInit.RELU, and variance_norm is not a feature the model supports yet.
The 2nd layer is part of the ConvolutionLayer, namely its activation attribute; thus, set that attribute on the layer to "relu". Negative slope is not a feature that the model supports yet.
The 3rd layer is also an attribute on ConvolutionLayer, dropOut, and you would pass in 0.2. There is work in progress to create a specific DropOutLayer, but it's not merged yet.
The 4th layer would be a DenseLayer if there were another layer after it, but since it's the last layer it is an OutputLayer.
blobs_lr applies a multiplier to the weight learning rate and the bias learning rate respectively. You can change the learning rate on a layer by setting its learningRate and biasLearningRate attributes.
weight_decay sets the l1 or l2 on the layer, which you can set per layer with the attributes l1 or l2. DL4J defaults to not applying l1 or l2 to the bias, which corresponds to the second weight_decay being set to 0 in Caffe.
The bias filler already defaults to constant, with a default value of 0.
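To make the multiplier arithmetic concrete, here is a small Python sketch of how Caffe-style per-blob multipliers turn into effective rates. The base learning rate 0.1 and base decay 0.005 are just illustrative values (they are not in the prototxt), and effective_rates is a made-up helper name.

```python
def effective_rates(base_lr, base_decay, lr_mults, decay_mults):
    """Return ((weight_lr, bias_lr), (weight_decay, bias_decay)).

    lr_mults / decay_mults are the two blobs_lr / weight_decay entries
    of a Caffe layer, in order: first the weights, then the bias.
    """
    lrs = tuple(base_lr * m for m in lr_mults)
    decays = tuple(base_decay * m for m in decay_mults)
    return lrs, decays


# final_class layer: blobs_lr 1 and 2, weight_decay 1 and 0.
# base_lr and base_decay are assumed values, not taken from the model.
lrs, decays = effective_rates(0.1, 0.005, (1, 2), (1, 0))
print("weight lr, bias lr:", lrs)
print("weight decay, bias decay:", decays)
```

So in DL4J terms, learningRate would get the first entry of lrs, biasLearningRate the second, and l2 the first entry of decays.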
Below is a quick example of how your code would translate. More information can be found in DL4J examples:
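import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.weights.WeightInit;

int seed = 123;       // placeholder, not from the Caffe model
int iterations = 1;   // placeholder, not from the Caffe model
double learningRate = 0.1;
double l2 = 0.005;
int inputHeight = 28;
int inputWidth = 28;
int channels = 1;
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(seed)
    .iterations(iterations)
    .regularization(true).l2(l2)  // regularization has to be enabled for l2 to apply
    .learningRate(learningRate)
    .list()
    // Caffe layers 1-3: convolution, ReLU activation and dropout in one DL4J layer
    .layer(0, new ConvolutionLayer.Builder(new int[]{2,2}, new int[]{1,1})
        .name("myLayer1")
        .activation("relu").dropOut(0.2).nOut(20)
        .biasLearningRate(2*learningRate).weightInit(WeightInit.RELU)
        .build())
    // Caffe layer 4: the final inner product becomes the OutputLayer
    .layer(1, new OutputLayer.Builder()
        .name("myLayer4").nOut(10)
        .activation("softmax").l2(1 * l2).biasLearningRate(2*learningRate)
        .weightInit(WeightInit.XAVIER).build())
    .setInputType(InputType.convolutionalFlat(inputHeight,inputWidth,channels))
    .build();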
There's no automated way to do this, but mapping the builder DSL for only a few layers shouldn't be hard. A bare-minimum example is here:
https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/convolution/LenetMnistExample.java
You can see the same primitives, e.g. stride, padding, xavier, biasInit, all in there.
Our upcoming Keras import might be a way for you to bridge Caffe -> Keras -> DL4J, though.
Edit: I'm not going to build it for you. (I'm not sure if that's what you're looking for here)
DL4J has the right primitives already, though. It doesn't have an input layer for variance_norm: you use zero mean and unit variance normalization on the input before passing it in.
We have biasInit as part of the config if you just read the Javadoc:
http://deeplearning4j.org/doc
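For reference on the Caffe side: as far as I can tell from Caffe's filler code, variance_norm picks whether the weight filler scales by fan-in, fan-out, or their average, which is separate from normalizing the input data. A rough Python sketch of that arithmetic for the "msra" and "xavier" fillers follows. The function names are made up, and the conv weight shape assumes a single input channel, which the prototxt does not state.

```python
import math


def fan_in_out(shape):
    """Fan counts for a weight blob shaped (out, in, kh, kw) or (out, in)."""
    count = math.prod(shape)
    return count // shape[0], count // shape[1]


def filler_n(shape, variance_norm="FAN_IN"):
    """The 'n' that the filler divides by, depending on variance_norm."""
    fan_in, fan_out = fan_in_out(shape)
    if variance_norm == "AVERAGE":
        return (fan_in + fan_out) / 2.0
    if variance_norm == "FAN_OUT":
        return float(fan_out)
    return float(fan_in)


def msra_std(shape, variance_norm="FAN_IN"):
    """Standard deviation of the Gaussian used by the msra filler."""
    return math.sqrt(2.0 / filler_n(shape, variance_norm))


def xavier_limit(shape, variance_norm="FAN_IN"):
    """Bound of the uniform range [-limit, limit] used by the xavier filler."""
    return math.sqrt(3.0 / filler_n(shape, variance_norm))


# myLayer1: 20 outputs, 2x2 kernel; 1 input channel is an assumption.
conv_shape = (20, 1, 2, 2)
print(fan_in_out(conv_shape))
print(msra_std(conv_shape, "AVERAGE"))
```

If that reading is right, WeightInit.RELU and WeightInit.XAVIER are the closest matches, with a possibly different scale than variance_norm: AVERAGE gives.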
Related
I am facing an issue in converting my caffe model to dlc using SNPE.
Specifically in the "Scale" layer.
The first two layers are as follows
name: "First"
input: "data"
input_shape {
dim: 1
dim: 3
dim: xxx
dim: xxx
}
layer {
name: "data/Scale"
type: "Scale"
bottom: "data"
top: "data/Scale"
scale_param {
filler: {
value: 0.0078125
}
bias_term: true
bias_filler: {
value: -1
}
}
param {
lr_mult: 0
decay_mult: 1
}
param {
lr_mult: 0
decay_mult: 0
}
}
layer {
name: "Conv2d_0/convolution"
type: "Convolution"
convolution_param {
num_output: 32
pad: 1
kernel_size: 3
stride: 2
}
bottom: 'data/Scale'
top: "Conv2d_0/convolution"
}
I get the following error:
('Encountered Error:', 'list index out of range')
Stack Trace:
Traceback (most recent call last):
File "/home/nithin.ga/SNPE_19/snpe-1.19.2/bin/x86_64-linux-clang/snpe-caffe-to-dlc", line 115, in <module>
args.enable_strict_validation)
File "/home/nithin.ga/SNPE_19/snpe-1.19.2/lib/python/snpe/snpe_caffe_to_dlc.py", line 1145, in convert
self.convert_caffe_new(self.spec)
File "/home/nithin.ga/SNPE_19/snpe-1.19.2/lib/python/snpe/snpe_caffe_to_dlc.py", line 1327, in convert_caffe_new
layer_seq = self._blob_connectivity_map.check_s_folding(layer)
File "/home/nithin.ga/SNPE_19/snpe-1.19.2/lib/python/snpe/snpe_caffe_to_dlc.py", line 459, in check_s_folding
output_layer = self._blobs[prev_layer_output_blob]['output_of_layers'][0]
IndexError: list index out of range
Here is the documentation for the Scale layer limitation of SNPE:
https://developer.qualcomm.com/docs/snpe/limitations.html
Batch normalization (+ Scaling)
Caffe: Scaling (scale_layer) is optional. If present, it extends functionality of Batch normalization (batch_norm_layer). If not present, batch_norm_layer will still be converted as per Caffe specification. scale_layer used anywhere else in the network but immediately after the batch_norm_layer is not supported.
There is support for scaling, but only if it's part of the data layer:
https://developer.qualcomm.com/docs/snpe/network_layers.html
Scale (Image)
Input image scaling, maintains aspect ratio. This function is
primarily intended for images, but technically any 2D input data can
be processed if it makes sense. Scaling parameters are provided as an
option to the model converter tool.
There is no such Caffe layer by itself. This functionality is
technically part of the Caffe data provider.
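Since both param blocks of the Scale layer have lr_mult: 0, its scale and bias are not learned, so (assuming the trained weights still hold the filler values) what the layer computes can be reproduced outside the network. A minimal Python sketch of that arithmetic, using the two values from the prototxt; scale_pixels is a made-up helper, and whether removing the layer and preprocessing the input like this fits your SNPE pipeline is something you would have to verify:

```python
SCALE = 0.0078125  # filler value of the Scale layer, i.e. 1/128
BIAS = -1.0        # bias_filler value of the Scale layer


def scale_pixels(pixels):
    """Apply y = x * SCALE + BIAS to every input value."""
    return [p * SCALE + BIAS for p in pixels]


# 8-bit pixel values end up roughly in the range [-1, 1).
print(scale_pixels([0, 128, 255]))  # [-1.0, 0.0, 0.9921875]
```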
I have, admittedly, a rather large network. It's based on a network from a paper that claims to use Caffe for the implementation. Here's the topology:
To the best of my ability, I've tried to recreate the model. The authors use the term "upconv" which is a combination of 2x2 unpooling followed by 5x5 convolution. I've taken this to mean a deconvolutional layer with stride 2 and kernel size 5 (please do correct me if you believe otherwise). Here's a short snippet from the full model and solver:
...
# upconv2
layer {
name: "upconv2"
type: "Deconvolution"
bottom: "upconv1rec"
top: "upconv2"
convolution_param {
num_output: 65536 # 256x16x16
kernel_size: 5
stride: 2
}
}
layer {
name: "upconv2-rec"
type: "ReLU"
bottom: "upconv2"
top: "upconv2rec"
relu_param {
negative_slope: 0.01
}
}
# upconv3
layer {
name: "upconv3"
type: "Deconvolution"
bottom: "upconv2rec"
top: "upconv3"
convolution_param {
num_output: 94208 # 92x32x32
kernel_size: 5
stride: 2
}
}
...
But it seems this is too large for Caffe to handle:
I0502 10:42:08.859184 13048 net.cpp:86] Creating Layer upconv3
I0502 10:42:08.859184 13048 net.cpp:408] upconv3 <- upconv2rec
I0502 10:42:08.859184 13048 net.cpp:382] upconv3 -> upconv3
F0502 10:42:08.859184 13048 blob.cpp:34] Check failed: shape[i] <= 2147483647 / count_ (94208 vs. 32767) blob size exceeds INT_MAX
How can I get around this limitation?
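One way to read the failed check, sketched in Python: the log's 32767 equals INT_MAX divided by 65536, which suggests the blob being created is the upconv3 weight blob, whose first two axes would be the 65536 input channels and the 94208 outputs. Note that num_output in Caffe is the number of output channels, not the total number of output elements. The weight layout (in_channels, out_channels, kh, kw) and the reading of "256x16x16" as 256 channels at 16x16 are my assumptions.

```python
INT_MAX = 2147483647


def blob_count(shape):
    """Total number of elements in a blob of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def fits_in_caffe_blob(shape):
    """Caffe requires the element count of a blob to fit in an int."""
    return blob_count(shape) <= INT_MAX


# Assumed Deconvolution weight layout: (in_channels, out_channels, kh, kw).
as_written = (65536, 94208, 5, 5)   # num_output used as C*H*W
as_channels = (256, 92, 5, 5)       # num_output used as channel count

print(INT_MAX // 65536)                                   # the 32767 in the log
print(blob_count(as_written), fits_in_caffe_blob(as_written))
print(blob_count(as_channels), fits_in_caffe_blob(as_channels))
```

With num_output set to the channel count, the spatial size would come from the kernel, stride and padding of the layer rather than from num_output.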
I am using Caffe, and it doesn't have a locally connected layer. Is there any example of how to use the im2col, reshape and inner product layers to implement a locally connected layer? Thanks
Personal point of view:
I have also tried to use the Crop, Im2col, Reshape and InnerProduct layers to implement a locally connected layer, but failed.
That is because, when I tried to implement a convolution operation using the InnerProduct layer, I found that in the InnerProductLayer<Dtype>::Forward_cpu() function:
caffe_cpu_gemm<Dtype>(CblasNoTrans, transpose_ ? CblasNoTrans : CblasTrans,
M_, N_, K_, (Dtype)1.,
bottom_data, weight, (Dtype)0., top_data);
and in BaseConvolutionLayer<Dtype>::forward_cpu_gemm() function:
caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, conv_out_channels_ /
group_, conv_out_spatial_dim_, kernel_dim_,
(Dtype)1., weights + weight_offset_ * g, col_buff + col_offset_ * g,
(Dtype)0., output + output_offset_ * g);
the weight(s), which should be used as convolution kernels, are passed to different arguments of caffe_cpu_gemm().
So I can't implement a convolution operation using the InnerProductLayer<Dtype>::Forward_cpu() function, and thus can't implement a locally connected layer (I mean local convolution here) using the Crop, Im2col, Reshape and InnerProduct layers.
My solution:
However, I implemented a local convolution layer here. Its idea is to divide the input feature maps into an N*N grid (possibly with overlap) and perform convolution on each grid cell using different kernels. For example, if the input feature maps have shape (2, 3, 8, 8) and you want to divide the 8*8 spatial feature map into 16 local regions of 2*2 and then perform convolution on each local region with a different bank of kernels, you can write a prototxt like this:
layer {
name: "local_conv"
type: "LocalConvolution"
bottom: "bottom" # shape (2,3,8,8)
top: "top"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
local_conv_param {
local_region_number_h: 4
local_region_number_w: 4
local_region_ratio_h: 0.3 # determine the height/width of local regions
local_region_ratio_w: 0.3 # local_region_size = floor(local_region_ratio * input_size)
local_region_step_h: 2 # step between local regions on the top left part
# and other regions will lie in the axial symmetry positions
# automatically
local_region_step_w: 2
num_output: 5
kernel_h: 3
kernel_w: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
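To illustrate the idea only (this is not the LocalConvolution layer itself), here is a stripped-down Python sketch: one channel, non-overlapping regions, and one kernel per region that is the same size as the region, so each region produces a single output value. All names are made up for the example.

```python
def local_conv2d(feature_map, kernels, grid_h, grid_w):
    """Apply a different kernel to each cell of a grid_h x grid_w grid.

    feature_map: 2D list (H x W), single channel.
    kernels[i][j]: 2D list with the size of one region; weights are
    not shared between regions, unlike an ordinary convolution.
    """
    height, width = len(feature_map), len(feature_map[0])
    region_h, region_w = height // grid_h, width // grid_w
    output = [[0.0] * grid_w for _ in range(grid_h)]
    for i in range(grid_h):
        for j in range(grid_w):
            kernel = kernels[i][j]
            acc = 0.0
            for di in range(region_h):
                for dj in range(region_w):
                    row = i * region_h + di
                    col = j * region_w + dj
                    acc += feature_map[row][col] * kernel[di][dj]
            output[i][j] = acc
    return output


# 8x8 map of ones, 4x4 grid of 2x2 regions, region (i, j) uses a
# constant kernel with value i * 4 + j.
fmap = [[1.0] * 8 for _ in range(8)]
kernels = [[[[float(i * 4 + j)] * 2 for _ in range(2)]
            for j in range(4)] for i in range(4)]
out = local_conv2d(fmap, kernels, 4, 4)
print(out[0])
print(out[3])
```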
You can easily add this layer to your caffe and the related files are:
include/caffe/layers/local_conv_layer.hpp
src/caffe/layers/local_conv_layer.cpp(cu)
and you should also add message LocalConvolutionParameter, optional LocalConvolutionParameter local_conv_param from src/caffe/proto/caffe.proto to your caffe.proto.
I have data with 10-d label vectors, and I want to use a Caffe model to do regression on this data with a 10-d output. But for now, I only want to check the loss of some outputs (for example, entries 1, 3, 4, 5, 6 of the 10-d vector), so I defined a layer with a 5-d output at the bottom of the last output layer. But I've no idea how to get the corresponding 5-d ground-truth label vector. I think maybe I can define a constant layer to indicate which entries I want to get. Please help me if you have any ideas.
update: example
This is my original InnerProduct and Loss layer
layer {
name: "score"
type: "InnerProduct"
bottom: "fc7"
top: "score"
inner_product_param {
num_output: 10
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "loss"
type: "EuclideanLoss"
bottom: "score"
bottom: "label"
top: "loss"
include {
phase: TRAIN
}
}
I care more about $n_1$ (like 1,3,4,5,6) entries of the 10-dimension output and their loss, so I want to fetch the loss of these entries, like
layer {
name: "score1"
type: "InnerProduct"
bottom: "fc7"
top: "score1"
inner_product_param {
num_output: 5 # n_1
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "loss1"
type: "EuclideanLoss"
bottom: "score1"
bottom: "label"
top: "loss1"
include {
phase: TRAIN
}
}
How can I get score1 from score directly?
From how I interpret your question, I think you would like to calculate the loss of the output layer w.r.t. the labels for a regression model, but leave some of the labels out of the equation.
If my interpretation is correct, then since this is a regression model I expect your new layer to be similar to EuclideanLossLayer. If so, the caffe_sub call in that layer could be replaced by the following code segment.
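// Zero-based offsets of the entries that should contribute to the loss.
const int arrayPos[5] = {1, 3, 4, 5, 6};
const int count = 5;
// Temporary buffers for the selected predictions and labels (malloc/free need <cstdlib>).
Dtype* newBottom0 = (Dtype*)malloc(sizeof(Dtype) * count);
Dtype* newBottom1 = (Dtype*)malloc(sizeof(Dtype) * count);
// Note: these offsets index the flat blob data, so with a 10-d output
// only the first sample of the batch is covered.
for (int varI = 0; varI < count; varI++)
{
  newBottom0[varI] = (Dtype) bottom[0]->cpu_data()[arrayPos[varI]];
  newBottom1[varI] = (Dtype) bottom[1]->cpu_data()[arrayPos[varI]];
}
// Only the first `count` values of diff_ are written here.
caffe_sub(count, newBottom0, newBottom1, diff_.mutable_cpu_data());
free(newBottom0);
free(newBottom1);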
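The same idea over a whole batch, as a plain-Python sketch rather than Caffe code: only the selected entries contribute to the loss, and the sum is scaled the way Caffe's EuclideanLoss scales it (sum of squares divided by 2 * batch size). The function name is made up, and the index list assumes "1, 3, 4, 5, 6" in the question was counted from 1.

```python
def masked_euclidean_loss(scores, labels, keep):
    """Euclidean loss over selected output entries.

    scores, labels: lists of per-sample vectors (batch x dims).
    keep: zero-based indices of the entries that count towards the loss.
    """
    total = 0.0
    for score, label in zip(scores, labels):
        for k in keep:
            diff = score[k] - label[k]
            total += diff * diff
    return total / (2.0 * len(scores))


# Two samples with 10-d outputs; entries 1,3,4,5,6 (1-based) are kept.
scores = [[1.0] * 10, [2.0] * 10]
labels = [[0.0] * 10, [0.0] * 10]
keep = [0, 2, 3, 4, 5]
print(masked_euclidean_loss(scores, labels, keep))  # 6.25
```

The entries that are left out add nothing to the loss, so in a real layer their gradient would also have to be zero.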
I'd like to train a neural network (NN) on my own 1-dim data, which I stored in an HDF5 database for Caffe. According to the documentation this should work. It also works for me as long as I only use "Fully Connected Layers", "Relu" and "Dropout". However, I get an error when I try to use "Convolution" and "Max Pooling" layers in the NN architecture. The error complains about the input dimension of the data.
I0622 16:44:20.456007 9513 net.cpp:84] Creating Layer conv1
I0622 16:44:20.456015 9513 net.cpp:380] conv1 <- data
I0622 16:44:20.456048 9513 net.cpp:338] conv1 -> conv1
I0622 16:44:20.456061 9513 net.cpp:113] Setting up conv1
F0622 16:44:20.456487 9513 blob.cpp:28] Check failed: shape[i] >= 0 (-9 vs. 0)
This is the error when I only want to use a "Pooling" layer behind an "InnerProduct" layer:
I0622 16:52:44.328660 9585 net.cpp:338] pool1 -> pool1
I0622 16:52:44.328666 9585 net.cpp:113] Setting up pool1
F0622 16:52:44.328680 9585 pooling_layer.cpp:84] Check failed: 4 == bottom[0]->num_axes() (4 vs. 2) Input must have 4 axes, corresponding to (num, channels, height, width)
However I don't know how to change the input dimensions such that it works.
This is the beginning of my prototxt file specifying the network architecture:
name: "LeNet"
layer {
name: "myNet"
type: "HDF5Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
hdf5_data_param {
source: "/path/to/my/data/train.txt"
batch_size: 200
}
}
layer {
name: "myNet"
type: "HDF5Data"
top: "data"
top: "label"
include {
phase: TEST
}
hdf5_data_param {
source: "/path/to/my/data/test.txt"
batch_size: 200
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 1
kernel_h: 11
kernel_w: 1
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_h: 3
kernel_w: 1
stride: 2
}
}
And this is how I write my 4D database (with two singleton dimensions) using MATLAB's h5write function:
h5create('train.h5','/data',[dimFeats 1 1 numSamplesTrain]);
h5write('train.h5','/data', traindata);
You seem to be writing your data with the wrong shape. Caffe blobs have the dimensions (n_samples, n_channels, height, width).
Other than that, your prototxt seems fine for doing predictions based on a 1D input.
As I have no experience with h5create and h5write in MATLAB, I am not sure whether the training dataset is generated with the dimensions you expect.
The error message for the convolution layer says that shape[i] = -9. This means that the width, height, channels or number of images in a batch is being set to -9.
The error message when using the pooling layer alone says that the network detected only a 2D input while it expects a 4D input.
The error messages in both layers are related to reshaping the blobs, and this is a clear indication that the dimensions of the input are not as expected.
Try debugging the Reshape functions in blob.cpp and layers/pooling_layer.cpp to get insight into which value is actually going rogue.
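A quick way to see where -9 could come from, as a Python sketch of the usual convolution output-size formula. It assumes the dataset written from MATLAB as [dimFeats 1 1 numSamplesTrain] reaches Caffe with the axes reversed, i.e. as (numSamples, 1, 1, dimFeats), so that the height is 1 and the width is dimFeats. The feature length 100 is a made-up value.

```python
def conv_output_size(input_size, kernel, pad=0, stride=1):
    """Output length of a convolution along one axis."""
    return (input_size + 2 * pad - kernel) // stride + 1


# kernel_h: 11 applied to an axis of length 1 gives the -9 from the log.
print(conv_output_size(1, 11))    # -9

# The same kernel along an axis of (hypothetical) length 100 is fine.
print(conv_output_size(100, 11))  # 90
```

If that is what happens here, either swapping kernel_h and kernel_w in the prototxt or writing the data so that the feature axis ends up as the height should make the shapes consistent.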