I'm looking for the Keras equivalent of the lr_mult parameter in a Caffe prototxt file. I know we can freeze training with trainable=False in Keras, but what I'd like is not lr_mult: 0.0 (learn nothing) but something like lr_mult: 0.1 (learn just a little) in some layers.
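Stock Keras has no direct lr_mult equivalent, but one way to approximate it is a custom training step that routes each layer's gradients to one of two optimizers running at different learning rates. The sketch below is an illustration under my own assumptions, not a built-in API; the layer names slow_dense and fast_dense are hypothetical.

import tensorflow as tf

base_lr = 1e-3
# Hypothetical split: 'slow' layers get lr_mult 0.1, 'fast' layers get 1.0.
slow_opt = tf.keras.optimizers.Adam(learning_rate=0.1 * base_lr)
fast_opt = tf.keras.optimizers.Adam(learning_rate=base_lr)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', name='slow_dense',
                          input_shape=(32,)),
    tf.keras.layers.Dense(10, name='fast_dense'),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

slow_vars = model.get_layer('slow_dense').trainable_variables
fast_vars = model.get_layer('fast_dense').trainable_variables

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, slow_vars + fast_vars)
    # The first chunk of gradients is applied with the reduced learning rate.
    slow_opt.apply_gradients(zip(grads[:len(slow_vars)], slow_vars))
    fast_opt.apply_gradients(zip(grads[len(slow_vars):], fast_vars))
    return loss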
I have two groups of images (concrete cracks and uncracked concrete), so this is a binary classification problem, and I am classifying them using VGG19.
When I used 1 neuron in the output layer with softmax, the accuracy was stuck at 0.5 for 250 epochs, while when I used 2 neurons with softmax the accuracy increased above 0.9.
So, should I use 1 or 2 neurons in the output layer for VGG19 with binary classification?
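For context: a softmax over a single neuron always outputs 1.0 (it normalizes one logit against itself), which is why the 1-neuron softmax setup stalls at 0.5 accuracy. A minimal sketch of the two workable heads, assuming a VGG19 base with average pooling (my own setup, not from the question):

import tensorflow as tf
from tensorflow.keras.applications import VGG19

base = VGG19(weights='imagenet', include_top=False, pooling='avg')

# Option A: 1 output neuron with sigmoid + binary cross-entropy.
model_a = tf.keras.Sequential([base, tf.keras.layers.Dense(1, activation='sigmoid')])
model_a.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Option B: 2 output neurons with softmax + sparse categorical cross-entropy
# on integer labels 0/1. Both options are equivalent for binary problems.
model_b = tf.keras.Sequential([base, tf.keras.layers.Dense(2, activation='softmax')])
model_b.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])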
Knowing that the total number of layers in EfficientNet-B0 is 237 and in EfficientNet-B7 the total comes out to 813, what is the total number of layers in EfficientNet-B2?
If you print len(model.layers) on the EfficientNetB2 model in Keras, you get 342 layers.
import tensorflow as tf
from tensorflow.keras.applications import EfficientNetB2
model = EfficientNetB2(weights='imagenet')
print(len(model.layers))
You can do this with all other versions of EfficientNetBx if you wish.
But as Pradyut said here, normally not all layers are taken into account when counting them:
When counting the number of layers in a neural network we usually count only convolutional layers and fully connected layers. A pooling layer is taken together with the convolutional layer and counted as one layer, and dropout is a regularization technique, so it will also not be counted as a separate layer.

For reference, the VGG16 model is defined as a 16-layer model. Those 16 layers are only the convolutional layers and the fully connected dense layers. If you count all the pooling and activation layers it would become a 41-layer model, which it is not. Reference: VGG16, VGG16 Paper

So as per your code you have 3 layers (1 convolutional layer with 28 neurons, 1 fully connected layer with 128 neurons, and 1 fully connected layer with 10 neurons).

As for making it a 10-layer network, you can add more convolutional or dense layers before the output layer, but it won't be necessary for the MNIST dataset.
I hope I answered your question!
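To make that counting convention concrete, here is a small check (a sketch; the isinstance filter is my own, not from the quoted answer) that counts only convolutional and dense layers in Keras's VGG16:

import tensorflow as tf
from tensorflow.keras.applications import VGG16

model = VGG16(weights=None)  # architecture only, no weight download
countable = (tf.keras.layers.Conv2D, tf.keras.layers.Dense)
print(sum(isinstance(layer, countable) for layer in model.layers))  # 16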
I created a convolutional autoencoder in PyTorch and I'm trying to improve it.
For the encoder I use the first 4 layers of the pre-trained ResNet-18 model from torchvision.models.resnet.
There is a middle layer with just one convolutional layer whose input and output channel sizes are both 512. For the decoder I use convolutional layers, each followed by BatchNorm and a ReLU activation.
The decoder reduces the number of channels at each layer, 512 -> 256 -> 128 -> 64 -> 32 -> 16 -> 3, and increases the resolution of the image with interpolation to match the dimensions of the corresponding layer in the encoder. For the last layer I use sigmoid instead of ReLU.
All convolutional layers look like this:
self.up = nn.Sequential(
    nn.Conv2d(input_channels, output_channels,
              kernel_size=5, stride=1,
              padding=2, bias=False),
    nn.BatchNorm2d(output_channels),
    nn.ReLU()
)
The input images are scaled to the [0, 1] range and have shape 224x224x3. Sample outputs are below (the first is from the training set, the second from the test set):
First image
First image output
Second image
Second image output
Any ideas why the output is blurry? The model has been trained for around 160 epochs on ~16,000 images using the Adam optimizer with lr=0.00005. I'm thinking about adding one more convolutional layer to the self.up block given above. That would increase the complexity of the model, but I'm not sure it is the right way to improve it.
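For what it's worth, the extra convolution being considered would look roughly like this (a sketch under my own assumptions, not the author's final code): two 5x5 convolutions per decoder stage instead of one.

import torch.nn as nn

def up_block(input_channels, output_channels):
    # Deeper variant of self.up: an extra convolution at the input width
    # before the channel-reducing one; the resolution is still changed
    # elsewhere by interpolation, as in the original model.
    return nn.Sequential(
        nn.Conv2d(input_channels, input_channels, kernel_size=5,
                  stride=1, padding=2, bias=False),
        nn.BatchNorm2d(input_channels),
        nn.ReLU(),
        nn.Conv2d(input_channels, output_channels, kernel_size=5,
                  stride=1, padding=2, bias=False),
        nn.BatchNorm2d(output_channels),
        nn.ReLU(),
    )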
After some reading about deconvolution in Caffe, I am confused about the FCN's train.prototxt here. The deconvolution layer's default weight filler is 'constant' with a default value of zero. Given how the deconvolution operation works in Caffe, wouldn't all of the output be zero, since the inputs are multiplied by zero weights?
This model used pretrained parameters for initialization. You should use the 'xavier' filler (as in the MNIST model):
weight_filler {
  type: "xavier"
}
bias_filler {
  type: "constant"
}
You are absolutely right: the inference of an FCN initialised with zero deconv weights would be zero. You don't want that.
Initialising a deconv layer with weight_filler { type: "bilinear" } would be appropriate. That would initialise the filter weights to a bilinear filter of the required size.
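For intuition, the bilinear filler produces the classic bilinear upsampling kernel. A NumPy sketch of that kernel (my own reconstruction of the formula, matching the FCN surgery code):

import numpy as np

def bilinear_kernel(size):
    # Weights fall off linearly with distance from the kernel center.
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

print(bilinear_kernel(4))  # separable kernel built from [0.25, 0.75, 0.75, 0.25]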
I have a bunch of questions about the way regularization and biases work in Caffe.
First, by default biases exist in the network, is that right?
Or do I need to ask Caffe to add them?
Second, when it obtains the loss value, it does not consider the regularization, is that right? I mean the loss just contains the loss function value; as I understand it, regularization is only considered in the gradient calculation. Is that right?
Third, when Caffe obtains the gradient, does it consider the biases in the regularization as well, or does it only consider the weights of the network?
Thanks in advance,
Afshin
For your 3 questions, my answers are:
Yes. Biases do exist in the network by default. For example, for ConvolutionParameter and InnerProductParameter in caffe.proto, the default value of bias_term is true, which means a convolution/inner-product layer in the network will have a bias by default.
Yes. The loss value obtained by the loss layer does not contain the value of the regularization term. The regularization is only taken into account after calling net_->ForwardBackward(), in fact in the ApplyUpdate() function, where the update of the network parameters happens.
Take a convolution layer in a network as an example:
layer {
  name: "SomeLayer"
  type: "Convolution"
  bottom: "data"
  top: "conv"
  # for the weights
  param {
    lr_mult: 1
    decay_mult: 1.0  # coefficient of regularization for the weights
                     # (default is 1.0, written out here for clarity)
  }
  # for the bias
  param {
    lr_mult: 2
    decay_mult: 1.0  # coefficient of regularization for the bias
                     # (default is 1.0, written out here for clarity)
  }
  ...  # remaining fields omitted
}
The answer to this question is: when Caffe computes the gradient, the solver will include the bias in the regularization only if both of two values are larger than zero: the second decay_mult above and the weight_decay in solver.prototxt.
Details can be found in the function void SGDSolver::Regularize().
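For intuition, what Regularize() does for L2 decay amounts to the following sketch (my own NumPy paraphrase, not Caffe code): a blob's gradient only gets a decay term when weight_decay * decay_mult > 0.

import numpy as np

def regularize_l2(grad, param, weight_decay, decay_mult):
    # Effective decay for this blob: weight_decay (from solver.prototxt)
    # times this param's decay_mult; setting decay_mult: 0 on the bias
    # param excludes the bias from regularization.
    local_decay = weight_decay * decay_mult
    if local_decay > 0:
        grad = grad + local_decay * param
    return grad

w = np.array([1.0, -2.0, 0.5])
g = regularize_l2(np.zeros(3), w, weight_decay=0.0005, decay_mult=1.0)
print(g)  # [ 0.0005 -0.001   0.00025]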
Hope this helps.