After some reading about deconvolution in Caffe, I am confused about the FCN's train.prototxt here. The deconvolution layer's default weight filler is 'constant' with a default value of zero. Given how the deconvolution operation works in Caffe, wouldn't all the outputs be zero, since the inputs are multiplied by zero weights?
This model used pretrained parameters for initialization. Otherwise, you should use the 'xavier' filler (as in the MNIST example model):
weight_filler {
  type: "xavier"
}
bias_filler {
  type: "constant"
}
You are absolutely right: the inference of an FCN initialised with zero deconv weights would be all zeros. You don't want that.
Initialising the deconv layer with weight_filler: {type: "bilinear"} would be appropriate. That would initialise the filter weights to a bilinear filter of the required size.
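For reference, here is a minimal pycaffe sketch (assuming pycaffe is installed; the blob names, channel count, kernel size and stride are made up) that generates a Deconvolution layer whose weights are initialised by the bilinear filler. In FCN-style nets such a layer is usually frozen with lr_mult: 0 and used as fixed upsampling:

import caffe
from caffe import layers as L

n = caffe.NetSpec()
n.score = L.Input(shape=dict(dim=[1, 21, 16, 16]))
n.upscore = L.Deconvolution(
    n.score,
    convolution_param=dict(num_output=21, kernel_size=4, stride=2,
                           bias_term=False,
                           weight_filler=dict(type='bilinear')),
    param=[dict(lr_mult=0)])  # keep the bilinear upsampling fixed
print(n.to_proto())           # emits the corresponding prototxt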
As described in the title:
I know the regularization loss in PyTorch is usually defined through the definition of the optimizer (weight_decay):
torch.optim.SGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=1e-5, nesterov=False)
How can I get the regularization loss value so that I can print it?
According to this answer, the regularization loss is never computed explicitly. So, what you need to do is calculate the loss on your own using the parameters. Something like
l2_loss = 0
for param in net.parameters():
    l2_loss += 0.5 * torch.sum(param ** 2)
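Below is a fuller, self-contained sketch (the model, data, and hyperparameters are made up for illustration) that scales the sum by the same coefficient passed as weight_decay, so the printed value matches the penalty the optimizer applies implicitly through the gradients:

import torch
import torch.nn as nn

net = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
weight_decay = 1e-5
optimizer = torch.optim.SGD(net.parameters(), lr=0.01,
                            weight_decay=weight_decay)

inputs = torch.randn(4, 10)
targets = torch.randint(0, 2, (4,))

data_loss = criterion(net(inputs), targets)
# the optimizer never exposes this value, so recompute it from the parameters
l2_loss = sum(0.5 * torch.sum(p ** 2) for p in net.parameters())
print('data loss:', data_loss.item())
print('regularization loss:', (weight_decay * l2_loss).item())

optimizer.zero_grad()
data_loss.backward()
optimizer.step()  # weight_decay * param is added to each gradient here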
I'm looking for the equivalent of the lr_mult parameter from a Caffe prototxt file in Keras. I know we can freeze training using trainable=False in Keras, but what I'd like to do is not set lr_mult: 0.0 (learn nothing), but rather something like lr_mult: 0.1 (learn just a little bit) in some layers.
While defining a prototxt in Caffe, I found that sometimes we use Softmax as the last layer type, and sometimes we use SoftmaxWithLoss. I know the Softmax layer returns the probability that the input data belongs to each class, but it seems that SoftmaxWithLoss also returns the class probabilities, so what's the difference between them? Or did I misunderstand the usage of the two layer types?
While Softmax returns the probability of each target class given the model predictions, SoftmaxWithLoss not only applies the softmax operation to the predictions, but also computes the multinomial logistic loss, returned as output. This is fundamental for the training phase (without a loss there will be no gradient that can be used to update the network parameters).
See SoftmaxWithLossLayer and Caffe Loss for more info.
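As a rough illustration (assuming pycaffe is available; the layer names and shapes below are made up), the same score blob can feed SoftmaxWithLoss in the training net, which also needs a label blob and produces a scalar loss, and a plain Softmax in the deploy net, which only outputs class probabilities:

import caffe
from caffe import layers as L

n = caffe.NetSpec()
n.data = L.Input(shape=dict(dim=[1, 3, 28, 28]))
n.label = L.Input(shape=dict(dim=[1]))
n.fc8 = L.InnerProduct(n.data, num_output=10)

n.loss = L.SoftmaxWithLoss(n.fc8, n.label)  # training: softmax + multinomial logistic loss
n.prob = L.Softmax(n.fc8)                   # deploy: per-class probabilities only

print(n.to_proto())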
I have a bunch of questions about the way regularization and biases work in Caffe.
First, biases exist in the network by default, is that right?
Or do I need to ask Caffe to add them?
Second, when it obtains the loss value, it does not consider the regularization, is that right? I mean the loss just contains the loss function value. As I understand it, regularization is only considered in the gradient calculation. Is that right?
Third, when Caffe obtains the gradient, does it include the bias values in the regularization as well, or does it only regularize the weights of the network?
Thanks in advance,
Afshin
For your 3 questions, my answer is:
Yes. Biases do exist in the network by default. For example, in ConvolutionParameter and InnerProductParameter in caffe.proto, the bias_term's default value is true, which means the convolution/inner-product layers in the network have a bias by default.
Yes. The loss value obtained by the loss layer does not contain the value of the regularization term. Regularization is only taken into account after calling net_->ForwardBackward(), in fact in the ApplyUpdate() function, where the network parameters are updated.
Take a convolution layer in a network as an example:
layer {
  name: "SomeLayer"
  type: "Convolution"
  bottom: "data"
  top: "conv"
  # for the weights
  param {
    lr_mult: 1
    decay_mult: 1.0  # coefficient of regularization for the weights
                     # (default is 1.0, written here for the sake of clarity)
  }
  # for the bias
  param {
    lr_mult: 2
    decay_mult: 1.0  # coefficient of regularization for the bias
                     # (default is 1.0, written here for the sake of clarity)
  }
  ...  # rest of the layer definition left out
}
The answer to this question is: when Caffe obtains the gradient, the solver will include the bias values in the regularization only if both of the two variables, the second decay_mult above and the weight_decay in solver.prototxt, are larger than zero.
Details can be found in the function void SGDSolver::Regularize().
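As a rough Python sketch of what the L2 branch of that function does (the array shapes here are arbitrary): the effective coefficient for each parameter blob is weight_decay * decay_mult, and it is added to the gradient rather than to the reported loss:

import numpy as np

weight_decay = 0.0005          # from solver.prototxt
decay_mult = 1.0               # from the layer's param { decay_mult: ... }

W = np.random.randn(4, 4)      # a parameter blob (data)
dW = np.zeros_like(W)          # its gradient (diff) from backprop

local_decay = weight_decay * decay_mult
if local_decay > 0:            # skipped entirely when either factor is 0
    dW += local_decay * W      # mirrors caffe_axpy(count, local_decay, data, diff)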
Hope this will help you.
While calculating the dimensions of the SSD Object Detection pipeline, we found that for the layer named "pool3", with parameters:
pooling_param {
  pool: MAX
  kernel_size: 2
  stride: 2
}
the input dimensions are 75x75x256 (WxHxC)
and according to the formula Wout = (Win − kernel + 2*padding)/stride + 1, the output dimension for the width comes out to be (75 − 2)/2 + 1 = 37.5.
However, the paper shows the output size at this point as 38, and the same is given by the following code for this network:
net.blobs['pool3'].shape
The answer seems simple, that the Caffe framework 'ceils' it, but according to this post, and this one as well, it should be 'flooring', and the answer should be 37.
So can anyone explain how Caffe treats these non-integral output sizes?
There's something called padding. When the output feature map size is not a whole number, the input feature map is implicitly padded with 0's so the last pooling window fits, which is equivalent to rounding the output size up (Caffe's pooling layer ceils). That's a standard procedure, though it may not be explicitly mentioned.
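Reproducing the arithmetic for pool3 as a quick sketch: rounding up gives the 38 reported by the paper and by net.blobs['pool3'].shape, while the floor convention from the linked posts would give 37:

import math

w_in, kernel, stride, pad = 75, 2, 2, 0
frac = (w_in + 2 * pad - kernel) / stride   # 36.5
print(math.floor(frac) + 1)                 # 37 with the floor convention
print(math.ceil(frac) + 1)                  # 38, which is what Caffe's pooling reports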