Knowing that the total number of layers in EfficientNet-B0 is 237 and in EfficientNet-B7 it comes out to 813, what is the total number of layers in EfficientNet-B2?
If you print len(model.layers) on the Keras EfficientNetB2 model, you get 342 layers:
from tensorflow.keras.applications import EfficientNetB2

# Build the model and count every layer Keras reports (conv, batch norm, activation, padding, ...)
model = EfficientNetB2(weights='imagenet')
print(len(model.layers))  # 342
You can do the same with any of the other EfficientNetBx variants.
But, as Pradyut said here, normally not all layers are taken into account when we count them:
When counting the number of layers in a neural network, we usually count only the convolutional layers and the fully connected layers. A pooling layer is taken together with its convolutional layer and counted as one layer, and dropout is a regularization technique, so it is not counted as a separate layer either.
For reference, the VGG16 model is defined as a 16-layer model. Those 16 layers are only the convolutional layers and the fully connected (dense) layers. If you counted all the pooling and activation layers, it would become a 41-layer model, which it is not. Reference: VGG16, VGG16 Paper
So, as per your code, you have 3 layers (1 convolutional layer with 28 neurons, 1 fully connected layer with 128 neurons, and 1 fully connected layer with 10 neurons).
As for making it a 10-layer network, you can add more convolutional or dense layers before the output layer, but that won't be necessary for the MNIST dataset.
I hope I answered your question!
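If you want to sanity-check the VGG16 claim above, one quick way (a minimal sketch, assuming tensorflow is installed) is to count only the Conv2D and Dense layers Keras reports:

from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Conv2D, Dense

model = VGG16(weights=None)  # weights=None skips the ImageNet download; the architecture is the same

# Total layers Keras reports vs. only the conv / fully connected ones
print(len(model.layers))                                          # includes Input, MaxPooling2D, Flatten, ...
print(sum(isinstance(l, (Conv2D, Dense)) for l in model.layers))  # 16: the 13 conv + 3 dense layers

The same kind of count explains why len(model.layers) for EfficientNetB2 above is much larger than the layer number usually quoted for the architecture.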
I have two groups of images (concrete cracks and uncracked concrete), so this is a binary classification problem, and I am classifying them using VGG19.
When I used 1 neuron in the output layer with softmax, I got an accuracy of 0.5 that stayed fixed for 250 epochs, while when I used 2 neurons with softmax the accuracy increased above 0.9.
So, should I use 1 or 2 neurons in the output layer for VGG19 with binary classification?
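For reference, the two output heads being compared look roughly like this (a Keras-flavoured sketch, not the asker's actual VGG19 code; the 512-dimensional feature input is a placeholder for whatever the base network produces):

from tensorflow.keras import Input, Model, layers

features = Input(shape=(512,))  # placeholder for the pooled VGG19 features

# Option A: 1 output neuron + sigmoid, trained with binary_crossentropy.
# Note that softmax over a single neuron always outputs 1.0, which is one
# reason a 1-neuron softmax head gets stuck at ~0.5 accuracy.
out_sigmoid = layers.Dense(1, activation='sigmoid')(features)

# Option B: 2 output neurons + softmax, trained with (sparse_)categorical_crossentropy.
out_softmax = layers.Dense(2, activation='softmax')(features)

model_a = Model(features, out_sigmoid)
model_b = Model(features, out_softmax)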
Hello everyone, I'm reading the PointNet paper but I can't understand the numbers in the network architecture. Can you explain this to me?
[Image: PointNet architecture diagram]
It means 3 fully connected layers, with 64, 128 and 1024 neurons respectively. Between the layers there is batch normalization as well as ReLU activation (as implemented in the paper).
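In code, that block is often written as a shared per-point MLP; here is a minimal PyTorch-flavoured sketch (the 3 input channels assume raw xyz coordinates, and 1x1 convolutions are the usual way to share the weights across points):

import torch.nn as nn

# mlp(64, 128, 1024) from the answer above: three layers with batch norm and ReLU in between
shared_mlp = nn.Sequential(
    nn.Conv1d(3, 64, kernel_size=1),     nn.BatchNorm1d(64),   nn.ReLU(),
    nn.Conv1d(64, 128, kernel_size=1),   nn.BatchNorm1d(128),  nn.ReLU(),
    nn.Conv1d(128, 1024, kernel_size=1), nn.BatchNorm1d(1024), nn.ReLU(),
)
# Input shape: (batch, 3, num_points); output shape: (batch, 1024, num_points)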
You can find my code here with the results:
https://github.com/shwe87/tfm-asr/blob/master/ASR-Spanish-Bi-RNN-17062020.ipynb
I tested two simple models for ASR in Spanish:
Model 1:
- Layer Normalization
- Bi-directional GRU
- Dropout
- Fully Connected layer
- Dropout
- Fully Connected layer as a classifier (classifies one of the alphabet chars)
Model 2:
- Conv Layer 1
- Conv Layer 2
- Fully Connected
- Dropout
- Bidirectional GRU
- Fully connected layer as a classifier (a rough sketch of this stack follows below)
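For reference, a minimal PyTorch-style sketch of the kind of stack Model 2 describes (the feature size, hidden size, dropout and number of output characters are placeholders, not the values from the notebook):

import torch.nn as nn

class Model2Sketch(nn.Module):
    # Rough shape of Model 2: conv x2 -> fully connected -> dropout -> BiGRU -> classifier
    def __init__(self, n_feats=128, hidden=256, n_classes=29):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(32 * n_feats, hidden)
        self.drop = nn.Dropout(0.2)
        self.gru = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)  # one logit per character (plus blank)

    def forward(self, x):                               # x: (batch, 1, n_feats, time)
        x = self.conv(x)                                # (batch, 32, n_feats, time)
        b, c, f, t = x.size()
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)  # (batch, time, 32 * n_feats)
        x = self.drop(self.fc(x))
        x, _ = self.gru(x)                              # (batch, time, 2 * hidden)
        return self.classifier(x)                       # (batch, time, n_classes)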
I trained for 30 epochs because I have limited GPU resources.
The validation and training loss for these two models:
Model 1 did not perform as well as expected.
Model 2 worked quite well; after 20 epochs it started overfitting (please see the graph in the notebook results), and in the output I could actually see some words forming that look like the labels. Although it is overfitting, it still needs more training because it doesn't predict the full transcript yet. For a start, I am happy with this model.
I also tested a third, more complex model.
You can find it here with the output results:
https://github.com/shwe87/tfm-asr/blob/master/ASR-DNN.ipynb
Model 3:
- Layer Normalization
- RELU
- Bidirectional GRU
- Dropout
- Stack this 10 more times (one such unit is sketched below)
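One such stacked unit might look roughly like this in PyTorch (a sketch only; the hidden size and dropout are placeholders, not the notebook's values):

import torch
import torch.nn as nn

class GRUBlock(nn.Module):
    # One "layer norm -> ReLU -> bidirectional GRU -> dropout" unit; Model 3 stacks ~10 of these
    def __init__(self, dim=512, dropout=0.2):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.gru = nn.GRU(dim, dim // 2, batch_first=True, bidirectional=True)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):               # x: (batch, time, dim)
        x = torch.relu(self.norm(x))
        x, _ = self.gru(x)              # the two directions together keep the feature size at dim
        return self.drop(x)

stack = nn.Sequential(*[GRUBlock() for _ in range(10)])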
The validation loss and training loss for this model:
I trained this for 30 epochs and there were no good results; in fact, all the predictions were blank.
Is this because this complex model needs more epochs for training?
Update:
I modified the model by adding 2 convolutional layers before the stacked GRUs, and the model seems to have improved.
I see that in the first and third models I applied layer normalization, and both of their predictions seem to be very bad. Does layer normalization slow down learning? According to papers like https://www.arxiv-vanity.com/papers/1607.06450/, layer normalization speeds up training and helps the training loss converge faster, so I am really confused. I have limited GPU resources and I am not sure if I should try again without layer normalization.
I created a convolutional autoencoder using PyTorch and I'm trying to improve it.
For the encoder I use the first 4 layers of the pre-trained ResNet-18 model from torchvision.models.resnet.
I have a middle layer with just one convolutional layer, with input and output channel sizes of 512. For the decoder I use convolutional layers followed by BatchNorm and a ReLU activation function.
The decoder reduces the number of channels at each layer: 512 -> 256 -> 128 -> 64 -> 32 -> 16 -> 3, and increases the resolution of the image with interpolation to match the dimensions of the corresponding layer in the encoder. For the last layer I use sigmoid instead of ReLU.
All convolutional blocks in the decoder look like this:
self.up = nn.Sequential(
    nn.Conv2d(input_channels, output_channels,
              kernel_size=5, stride=1,
              padding=2, bias=False),  # 5x5 conv; padding=2 keeps the spatial size unchanged
    nn.BatchNorm2d(output_channels),
    nn.ReLU()
)
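To make the interpolation step explicit, one decoder stage along the lines described above might look roughly like this (a sketch under those assumptions, not the exact model):

import torch.nn.functional as F

def decode_stage(x, up_block, target_hw):
    # Upsample to the spatial size of the matching encoder layer,
    # then refine with a Conv-BN-ReLU block like self.up above
    x = F.interpolate(x, size=target_hw, mode='bilinear', align_corners=False)
    return up_block(x)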
The input images are scaled to the [0, 1] range and have shape 224x224x3. Sample outputs (the first is from the training set, the second from the test set):
[Image: first input]
[Image: first reconstruction]
[Image: second input]
[Image: second reconstruction]
Any ideas why the output is blurry? The model has been trained for around 160 epochs on ~16000 images using the Adam optimizer with lr=0.00005. I'm thinking about adding one more convolutional layer to self.up given above. This would increase the complexity of the model, but I'm not sure it is the right way to improve it.
While reading the book Machine Learning: A Probabilistic Perspective by Murphy and this article by Mike O'Neill, I encountered some calculations about the number of weights in a convolutional neural network that I want to understand. The architecture of the network is like this:
And this is the explanation from the above article:
Layer #2 is also a convolutional layer, but with 50 feature maps. Each feature map is 5x5, and each unit in the feature maps is a 5x5 convolutional kernel of corresponding areas of all 6 of the feature maps of the previous layer, each of which is a 13x13 feature map. There are therefore 5x5x50 = 1250 neurons in Layer #2, (5x5+1)x6x50 = 7800 weights, and 1250x26 = 32500 connections.
The calculation of the number of weights, (5x5+1)x6x50 = 7800, seems strange to me. Shouldn't the actual calculation be (5x5x6+1)x50 = 7550, according to the parameter sharing explained here?
My argument is as follows:
We have 50 filters of size 5x5x6 and 1 bias for each filter, hence the total number of weights is (5x5x6+1)x50 = 7550. And here is PyTorch code which verifies this:
import torch.nn as nn

# Same configuration as Layer #2: 6 input feature maps, 50 output feature maps, 5x5 kernels
model = nn.Conv2d(in_channels=6, out_channels=50, kernel_size=5, stride=2)
params_count = sum(param.numel() for param in model.parameters() if param.requires_grad)
print(params_count)  # 7550
Can anyone explain this and which one is correct?
My calculations:
Layer 1 depth is 6, kernel: 5x5.
Layer 2 depth is 50, kernel: 5x5.
Total number of neurons in Layer 2: 5x5x50 = 1250.
Total number of weights, excluding biases: 5x5x6x50 = 7500.
Finally, biases for Layer 2: 50 (the depth is 50).
I agree with you: the total number of weights must be 7500 + 50 = 7550.
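Spelling the two counts out side by side (plain arithmetic, nothing framework-specific):

article_count = (5 * 5 + 1) * 6 * 50   # 7800: one bias per (input map, output map) pair
shared_bias   = (5 * 5 * 6 + 1) * 50   # 7550: one bias per output feature map, as nn.Conv2d does it
print(article_count - shared_bias)     # 250 = 50 * (6 - 1) extra bias terms in the article's count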