Caffe: Why is the activation dimension of fc6 4096 in the ImageNet model?

I was looking at the pool5 layer in the ImageNet-trained model (ILSVRC challenge), whose output size is 256*6*6 (9216). It is followed by an fc6 layer whose num_output is 4096. Can anyone please explain how 4096 was chosen?
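To make the sizes in the question concrete, here is a small sketch of the arithmetic. It only computes the flattened pool5 size and the resulting fc6 parameter count from the numbers quoted above; it does not explain why 4096 was picked. The variable names are illustrative, not Caffe identifiers.

```python
# Sizes taken from the question: pool5 output is 256 x 6 x 6, fc6 has 4096 outputs.
pool5_shape = (256, 6, 6)
fc6_outputs = 4096

# A fully connected layer sees the pool5 output as one flat vector.
fc6_inputs = pool5_shape[0] * pool5_shape[1] * pool5_shape[2]

# One weight per (input, output) pair, plus one bias per output.
fc6_weights = fc6_inputs * fc6_outputs
fc6_params = fc6_weights + fc6_outputs

print(fc6_inputs)   # 9216
print(fc6_params)   # 37752832
```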

Related

Neural Network: For Binary Classification use 1 or 2 output neurons with VGG19

I have two groups of images (cracked concrete and uncracked concrete), so this is a binary classification problem, and I am classifying them using VGG19.
When I used 1 neuron in the output layer with softmax, the accuracy was 0.5 and stayed fixed over 250 epochs, while when I used 2 neurons with softmax the accuracy increased above 0.9.
So, should I use 1 or 2 output neurons for VGG19 with binary classification?
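A likely explanation for the fixed 0.5 accuracy is that softmax over a single neuron always returns 1.0, whatever the logit is, so the model predicts the same class for every image. The sketch below shows this in plain Python; the helper names `softmax` and `sigmoid` are written here for illustration, and the 0.5 figure is only expected if the two classes are roughly balanced, which is an assumption.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic function, the usual activation for a single output neuron."""
    return 1.0 / (1.0 + math.exp(-z))

# With one output neuron, softmax normalizes the value by itself.
single_outputs = [softmax([z]) for z in (-3.0, 0.0, 5.0)]
print(single_outputs)          # [[1.0], [1.0], [1.0]]

# With two neurons the probabilities depend on the logits.
two_outputs = softmax([2.0, -1.0])
print(two_outputs)

# A single neuron with sigmoid also depends on the logit.
print(sigmoid(2.0))
```

In practice this points to two workable setups: 1 output neuron with a sigmoid activation and binary cross-entropy, or 2 output neurons with softmax and categorical cross-entropy.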

What is the number of layers in EfficientNetB2?

Knowing that the total number of layers in EfficientNet-B0 is 237 and in EfficientNet-B7 the total comes out to 813, what is the total number of layers in EfficientNetB2 ?
If you print len(model.layers) for the EfficientNetB2 model in Keras, you get 342 layers.
import tensorflow as tf
from tensorflow.keras.applications import EfficientNetB2
model = EfficientNetB2(weights='imagenet')
print(len(model.layers))
You can do this with all other versions of EfficientNetBx if you wish.
But as Pradyut said here, not all layers are normally taken into account when we count them:
While counting the number of layers in a Neural Network we usually
only count convolutional layers and fully connected layers. Pooling
Layer is taken together with the Convolutional Layer and counted as
one layer and Dropout is a regularization technique so it will also
not be counted as a separate layer.
For reference, the VGG16 model is defined as a 16 layer model. Those 16
layers are only the Convolutional Layers and Fully Connected Dense
Layers. If you count all the pooling and activation layers it will
change to a 41 layer model, which it is not. Reference: VGG16, VGG16
Paper
So as per your code you have 3 layers (1 Convolutional Layer with 28
Neurons, 1 Fully Connected Layer with 128 Neurons and 1 Fully
Connected Layer with 10 neurons)
As for making it a 10 layer network you can add more convolutional
layers or dense layers before the output layers, but it won't be
necessary for the MNIST dataset.
I hope I answered your question!
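The counting convention quoted above can be checked on VGG16 with a short sketch. The configuration list below is the standard VGG16 layout (numbers are convolution output channels, "M" marks max pooling); the names `VGG16_CFG` and `FC_LAYERS` are chosen here for illustration.

```python
# Standard VGG16 layout: integers are conv layers, "M" is a max-pooling layer.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]
# Three fully connected layers follow the convolutional part.
FC_LAYERS = [4096, 4096, 1000]

conv_count = sum(1 for item in VGG16_CFG if item != "M")
pool_count = VGG16_CFG.count("M")

# Only layers with weights (conv and fully connected) are counted.
weighted_layers = conv_count + len(FC_LAYERS)

print(conv_count, pool_count, weighted_layers)  # 13 5 16
```

Counting only the weighted layers gives the 16 in the model's name; pooling and activation layers are what inflate len(model.layers) in Keras.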

How to estimate the online training complexity of neural network?

Suppose I have a neural network with the following structure:
Input layer: 10 neurons
Hidden layer 1: 20 neurons with ReLU activation function
Batch normalization
Hidden layer 2: 30 neurons with ReLU activation function
Batch normalization
Hidden layer 3: 40 neurons with ReLU activation function
Batch normalization
Output layer: 4 neurons with logistic regression
How do I calculate the big O complexity of online training for it?
The training is assumed to use backpropagation.
Thank you for your help.
O(2^n) to converge to optimal solution.
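Separately from the cost of converging, the cost of a single online update can be estimated from the layer sizes in the question. The sketch below counts weights and biases for the dense layers only; batch normalization is ignored, and the statement that a backpropagation step costs a small constant multiple of the forward pass is a rule of thumb assumed here, not something stated in the question.

```python
# Layer widths from the question: input, three hidden layers, output.
layer_sizes = [10, 20, 30, 40, 4]

# Each dense layer has one weight per (input, output) pair.
weights_per_layer = [a * b for a, b in zip(layer_sizes, layer_sizes[1:])]
total_weights = sum(weights_per_layer)

# One bias per neuron in every non-input layer.
total_biases = sum(layer_sizes[1:])

print(weights_per_layer)  # [200, 600, 1200, 160]
print(total_weights)      # 2160
print(total_biases)       # 94
```

Under these assumptions, one forward pass for one sample needs about 2160 multiply-adds, and one training step is O(W) in the number of weights W, so processing N samples one at a time is O(N * W).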

Pytorch Convolutional Autoencoder output is blurry. How to improve it?

I created a convolutional autoencoder using PyTorch and I'm trying to improve it.
For the encoding layer I use the first 4 layers of a pre-trained ResNet 18 model from torchvision.models.resnet.
I have a mid-layer with just one convolutional layer with input and output channel sizes of 512. For the decoding layer I use convolutional layers followed by BatchNorm and a ReLU activation function.
The decoding layer reduces the channels at each layer: 512 -> 256 -> 128 -> 64 -> 32 -> 16 -> 3 and increases the resolution of the image with interpolation to match the dimension of the corresponding layer in the encoding part. For the last layer I use sigmoid instead of ReLU.
All Convolutional layers are:
self.up = nn.Sequential(
    nn.Conv2d(input_channels, output_channels,
              kernel_size=5, stride=1,
              padding=2, bias=False),
    nn.BatchNorm2d(output_channels),
    nn.ReLU()
)
The input images are scaled to the [0, 1] range and have shape 224x224x3. Sample outputs are (the first is from the training set, the second from the test set):
First image
First image output
Second image
Second image output
Any ideas why the output is blurry? The provided model has been trained for around 160 epochs on ~16000 images using the Adam optimizer with lr=0.00005. I'm thinking about adding one more convolutional layer to self.up given above. This will increase the complexity of the model, but I'm not sure if it is the right way to improve it.
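As a side note on the self.up block above, the usual output-size formula for a convolution (with dilation 1) shows that kernel_size=5, stride=1, padding=2 keeps the spatial size unchanged, so all upsampling in the decoder comes from the interpolation step. The helper `conv_out_size` is a hypothetical name written for this sketch, and the sizes other than 224 are arbitrary test values.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

# Settings from the decoder block in the question.
sizes_in = [7, 14, 28, 56, 112, 224]
sizes_out = [conv_out_size(s, kernel=5, stride=1, padding=2) for s in sizes_in]

print(sizes_out)  # [7, 14, 28, 56, 112, 224]

# Channel progression of the decoder, as described in the question.
channels = [512, 256, 128, 64, 32, 16, 3]
pairs = list(zip(channels, channels[1:]))
print(pairs)
```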

How to create a CNN for image classification with dynamic input

I would like to create a fully convolutional network for binary image classification in PyTorch that can take dynamic input image sizes, but I don't quite understand conceptually the idea behind changing the final layer from a fully connected layer to a convolutional layer. Here and here both state that this is possible by using a 1x1 convolution.
Suppose I have a 16x16x1 image as input to the CNN. After several convolutions, the output is a 16x16x32. If using a fully connected layer, I can produce a single value output by creating 16*16*32 weights and feeding it to a single neuron. What I don't understand is how you would get a single value output by applying a 1x1 convolution. Wouldn't you end up with 16x16x1 output?
Check this link: http://cs231n.github.io/convolutional-networks/#convert
In this case, your convolution layer should be a 16 x 16 filter with 1 output channel. This will convert the 16 x 16 x 32 input into a single output.
Sample code to test:
from keras.layers import Conv2D, Input
from keras.models import Model
import numpy as np
# a 16x16 kernel on a 16x16 feature map fits in exactly one position
inputs = Input((16, 16, 32))
outputs = Conv2D(1, 16)(inputs)
model = Model(inputs, outputs)
model.summary() # check the output shape: (None, 1, 1, 1)
pred = model.predict(np.zeros((1, 16, 16, 32))) # check on sample data
print(f'output is {np.squeeze(pred)}')
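The same idea can be checked without Keras. The NumPy sketch below, using random values and no bias term (both assumptions made for illustration), shows that a 1x1 convolution with one output channel keeps the 16x16 spatial grid, while a 16x16 kernel collapses the feature map to one number that equals a fully connected neuron applied to the flattened input.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal((16, 16, 32))   # H x W x C feature map

# 1x1 convolution, one output channel: one weight per input channel,
# applied independently at every spatial position.
w_1x1 = rng.standard_normal(32)
out_1x1 = features @ w_1x1                     # shape (16, 16)

# 16x16 "valid" convolution, one output channel: the kernel covers the
# whole feature map, so there is exactly one output position.
w_full = rng.standard_normal((16, 16, 32))
out_full = np.sum(features * w_full)           # a single scalar

# The same weights used as a fully connected neuron on the flattened input.
out_fc = features.reshape(-1) @ w_full.reshape(-1)

print(out_1x1.shape)                 # (16, 16)
print(np.isclose(out_full, out_fc))  # True
```

So the asker's intuition is right: a 1x1 convolution alone gives a 16x16x1 output, and it is the full-size kernel (or a pooling step) that reduces it to a single value.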
This fully convolutional approach is useful in segmentation tasks using patch-based approaches, since you can speed up prediction (inference) by feeding in a bigger portion of the image.
For classification tasks, you usually have a fc layer at the end. In that case, a layer like AdaptiveAvgPool2d is used which ensures the fc layer sees a constant input feature size irrespective of the input image size.
https://pytorch.org/docs/stable/nn.html#adaptiveavgpool2d
See this pull request for torchvision VGG: https://github.com/pytorch/vision/pull/747
In Keras, the equivalent is GlobalAveragePooling2D. See the example "Fine-tune InceptionV3 on a new set of classes".
https://keras.io/applications/
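To illustrate why global average pooling allows dynamic input sizes, here is a minimal NumPy sketch. It is not the PyTorch or Keras layer itself; `global_avg_pool` is a hypothetical helper and the two feature-map sizes are made up. The point is that the output length depends only on the channel count.

```python
import numpy as np

def global_avg_pool(feature_map):
    """Average an H x W x C array over its spatial axes, leaving C values."""
    return feature_map.mean(axis=(0, 1))

# Two feature maps with different spatial sizes but the same 32 channels.
small = np.ones((16, 16, 32))
large = np.ones((40, 25, 32))

pooled_small = global_avg_pool(small)
pooled_large = global_avg_pool(large)

print(pooled_small.shape)  # (32,)
print(pooled_large.shape)  # (32,)
```

A fully connected layer placed after this step always receives 32 features, whatever the input image size was.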
I hope you are familiar with Keras. Your image is 16*16*1. It will be passed to a Keras convolutional layer, but first we have to create the model with model = Sequential(), which gives us a Keras model instance. Then we add a convolutional layer with our parameters, like
model.add(Conv2D(20,(2,2),padding="same"))
Here we apply 20 filters to the image, so it becomes 16*16*20. To extract richer features, we add more conv layers, like
model.add(Conv2D(32,(2,2),padding="same"))
Now we apply 32 filters, after which the image will have size 16*16*32.
Don't forget to put an activation after the conv layers. If you are new to this, you should study activations, optimization, and the loss of the network; these are the basic parts of neural networks.
Now it's time to move on to the fully connected layer. First we need to flatten the image, because a fully connected layer only works on 2D input of shape (no_of_ex, image_dim). In your case,
the image dimension after flattening will be 16*16*32 = 8192.
model.add(Flatten())
After flattening, the network passes the result to the fully connected layers:
model.add(Dense(32))
model.add(Activation("relu"))
model.add(Dense(8))
model.add(Activation("relu"))
model.add(Dense(2))
The last layer has 2 neurons because you have a binary classification problem. If you had to classify 3 classes, the last layer would have 3 neurons; for 10 classes, your last dense layer would have 10 neurons.
model.add(Activation("softmax"))
model.compile(loss='binary_crossentropy',
optimizer=Adam(),
metrics=['accuracy'])
return model
After this you have to fit the model.
estimator = model(2)  # 2 classes for binary classification
estimator.fit(X_train, y_train)
Full code:
from keras.models import Sequential
from keras.layers import Conv2D, Activation, Flatten, Dense
from keras.optimizers import Adam

def model(classes):
    model = Sequential()
    # conv block: Conv2D ====> ReLU (no pooling layer is used here)
    model.add(Conv2D(20, (5, 5), padding="same"))
    model.add(Activation("relu"))
    model.add(Conv2D(32, (5, 5), padding="same"))
    model.add(Activation("relu"))
    # flatten the feature map before the fully connected layers
    model.add(Flatten())
    model.add(Dense(32))
    model.add(Activation("relu"))
    model.add(Dense(8))
    model.add(Activation("relu"))
    # softmax classifier with one output neuron per class
    model.add(Dense(classes))
    model.add(Activation("softmax"))
    # older Keras versions spell this Adam(lr=0.0001, decay=1e-6)
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(learning_rate=0.0001),
                  metrics=['accuracy'])
    return model
You can also refer to this kernel.