Why VGG16 model cannot be trained with its FC Layers

I am trying to train the VGG16 model below, but the loss does not decrease and it seems that the model's parameters are not being updated.
Here is the model:
import torch
import torch.nn as nn
import math
import torch.nn.functional as F
from utils import AvgPoolConv

cfg = {
    'VGG11': [16, 'M', 32, 'M', 64, 64, 'M', 128, 128, 'M', 128, 128, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}


class VGG(nn.Module):
    def __init__(self, vgg_name, use_bn, num_class=100):
        super(VGG, self).__init__()
        self.features = self._make_layers(cfg[vgg_name], use_bn)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, num_class)
        )
        #self.classifier = nn.Linear(512, num_class)
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                n = m.weight.size(1)
                m.weight.data.normal_(0, 1.0/float(n))
                m.bias.data.zero_()

    def forward(self, x):
        out = self.features(x)
        out = self.classifier(out)
        return out

    def _make_layers(self, cfg, use_bn=True):
        layers = []
        in_channels = 3
        for x in cfg:
            if x == 'M':
                layers += [nn.AvgPool2d(2)]
                #layers += [AvgPoolConv(kernel_size=2, stride=2, input_channel=in_channels)]
            else:
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1),
                           nn.BatchNorm2d(x) if use_bn else nn.Dropout(0.25),
                           nn.ReLU(inplace=True)]
                in_channels = x
        #layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
        return nn.Sequential(*layers)
But if I delete the first two FC layers from the classifier, as shown below, the model trains and the loss decreases:
self.features = self._make_layers(cfg[vgg_name], use_bn)
self.classifier = nn.Linear(512, num_class)
Why does this happen?

First, verify whether the parameters are really not being updated, or whether the change is just very small.
Different architectures may need different tuning (learning rate, weight decay if you use it, etc.). A good debugging test is "can I overfit it?": train on a single batch (or even a single sample) and check whether you can drive the loss to 0. You might need to tweak the optimization parameters mentioned above to get there.
Assuming everything is correct and the gradient flows, I'd say: tune the learning rate and try adding batch normalization between your linear and ReLU layers (this should make training much faster).
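The two checks above (are the parameters changing, and can a single batch be overfit) can be sketched roughly as follows. This is only an illustration: the tiny two-layer network, the random batch, the learning rate and the step count are all hypothetical stand-ins, not the VGG model or data from the question.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the real network; any nn.Module can be checked this way.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# One fixed batch, reused on every step (the "can I overfit it" test).
inputs = torch.randn(4, 8)
targets = torch.tensor([0, 1, 2, 0])

# Snapshot the parameters so the size of the update can be measured afterwards.
before = {name: p.detach().clone() for name, p in model.named_parameters()}

losses = []
for step in range(200):
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# Largest absolute change per parameter tensor; 0.0 would mean "never updated".
deltas = {name: (p.detach() - before[name]).abs().max().item()
          for name, p in model.named_parameters()}

for name, delta in deltas.items():
    print(f"{name}: max change = {delta:.3e}")
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

If every delta is exactly zero, the optimizer is probably not seeing those parameters or no gradient reaches them. If the deltas are tiny but non-zero, the learning rate or the initialization scale is the more likely suspect.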

Related

Pytorch CNN Input for Guitar Tab CNN

I am trying to implement the architecture I have attached.
The output of my DataLoader has size: torch.Size([128, 192, 9, 1])
I am using a batch size of 128
View just reshapes the output of the dense layer
model = nn.Sequential(nn.Conv2d(192, 32, 3),
                      nn.ReLU(),
                      nn.Conv2d(32, 64, 3),
                      nn.ReLU(),
                      nn.Conv2d(64, 64, 3),
                      nn.MaxPool2d(2),
                      nn.Dropout(0.25),
                      nn.Flatten(),
                      nn.Linear(5952, 128),
                      nn.ReLU(),
                      nn.Dropout(0.5),
                      nn.Linear(128, 126),
                      View((6, 21)),
                      nn.Softmax(dim=1))
This is the architecture I currently have, and I don't know whether my inputs to the Conv2d layers are correct.
I keep getting errors about my dimensions and kernel sizes, and I am unsure how to proceed.
CNN Architecture

PyTorch: How to calculate output size of the CNN?

I went through this PyTorch CNN implementation available here: https://machinelearningknowledge.ai/pytorch-conv2d-explained-with-examples/
I am unable to understand how they replace the '?' with some value. What is the formula for calculating the CNN layer output?
This has to be calculated by hand in PyTorch, but not in TensorFlow/Keras. If there is any other blog that explains this well, please drop it in the comments.
import torch

# Implementation of CNN/ConvNet Model
# NOTE: keep_prob is not defined in this snippet; it must be set before the class is instantiated.
class CNN(torch.nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # L1 ImgIn shape=(?, 28, 28, 1)
        # Conv -> (?, 28, 28, 32)
        # Pool -> (?, 14, 14, 32)
        self.layer1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),
            torch.nn.Dropout(p=1 - keep_prob))
        # L2 ImgIn shape=(?, 14, 14, 32)
        # Conv ->(?, 14, 14, 64)
        # Pool ->(?, 7, 7, 64)
        self.layer2 = torch.nn.Sequential(
            torch.nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),
            torch.nn.Dropout(p=1 - keep_prob))
        # L3 ImgIn shape=(?, 7, 7, 64)
        # Conv ->(?, 7, 7, 128)
        # Pool ->(?, 4, 4, 128)
        self.layer3 = torch.nn.Sequential(
            torch.nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2, padding=1),
            torch.nn.Dropout(p=1 - keep_prob))
        # L4 FC 4x4x128 inputs -> 625 outputs
        self.fc1 = torch.nn.Linear(4 * 4 * 128, 625, bias=True)
        torch.nn.init.xavier_uniform_(self.fc1.weight)
        self.layer4 = torch.nn.Sequential(
            self.fc1,
            torch.nn.ReLU(),
            torch.nn.Dropout(p=1 - keep_prob))
        # L5 Final FC 625 inputs -> 10 outputs
        self.fc2 = torch.nn.Linear(625, 10, bias=True)
        torch.nn.init.xavier_uniform_(self.fc2.weight)  # initialize parameters

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = out.view(out.size(0), -1)  # Flatten them for FC
        out = self.fc1(out)
        out = self.fc2(out)
        return out

#instantiate CNN model
model = CNN()
model
Thanks!

I assume your calculation is wrong because:
PyTorch expects images in C x H x W format (e.g. 3x32x32, not 32x32x3).
The first dimension is always the batch dimension and must be left out of the calculation, because all nn.Modules handle it automatically.
So if you want to calculate the input size of the first Linear layer, you can use this trick:
conv = nn.Sequential(self.layer1,self.layer2, self.layer3, nn.Flatten())
out = conv(torch.randn(1,im_height,im_width).unsqueeze(0))
# fc_layer_in_channels = out.shape[1]
self.fc1 = torch.nn.Linear(out.shape[1], 625, bias=True)
This only works if you know im_height and im_width in advance.
The best practice is to use torch.nn.AdaptiveAvgPool2d.
With this layer you always get an output of a fixed spatial size.
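For the formula the question asks about: along each spatial axis, PyTorch documents the output size of Conv2d and MaxPool2d as floor((in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1. The sketch below applies it to the three blocks of the CNN in the question. The helper name conv_out is made up for this illustration.

```python
def conv_out(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis for Conv2d / MaxPool2d (floor mode)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


size = 28  # MNIST-style 28x28 input, as in the question
for name in ("layer1", "layer2", "layer3"):
    # 3x3 convolution with stride 1 and padding 1 keeps the spatial size.
    size = conv_out(size, kernel_size=3, stride=1, padding=1)
    # 2x2 max pooling with stride 2; only layer3 pads the pooling by 1.
    pool_padding = 1 if name == "layer3" else 0
    size = conv_out(size, kernel_size=2, stride=2, padding=pool_padding)
    print(name, "->", size, "x", size)

# The last block has 128 channels, so the first Linear layer needs this many inputs.
fc_in = 128 * size * size
print("fc1 in_features =", fc_in)
```

The spatial size goes 28, 14, 7, 4, which is where the 4 * 4 * 128 in the question's fc1 comes from.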

Implementation Issue: Deep ConvNet for Pattern Recognition

I'm trying to implement a pattern recognition model using a fully convolutional network (Fig. 1 in https://www.sciencedirect.com/science/article/pii/S0031320318304370; I was able to get the full text without signing in, but I can attach a picture if that's a problem). However, I'm getting a size error when moving from the final Conv2d layer to the first fully connected layer.
Here is my error message:
RuntimeError: size mismatch, m1: [4 x 1024], m2: [4 x 1024] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:283
Originally, as in the figure, my first linear layer was:
nn.Linear(4*4*512, 1024)
but after getting the size mismatch, I changed it to:
nn.Linear(4,1024)
Now, I have a strange error message as written above.
For reference (if it helps), here is my code:
import torch
import torch.nn as nn
import torch.utils.model_zoo as model_zoo


class convnet(nn.Module):
    def __init__(self, num_classes=1000):
        super(convnet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.MaxPool2d(kernel_size=1),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),# stride=2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2), #stride=2),
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True), #nn.Dropout(p=0.5)
        )
        self.classifier = nn.Sequential(
            nn.Linear(4, 1024),
            nn.Dropout(p=0.5),
            nn.ReLU(inplace=True),
            #nn.Dropout(p=0.5),
            nn.Linear(1024, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x,1)
        x = self.classifier(x)
        return x
I suspect it's an issue with the padding and stride.
Thanks!

The error is from a matrix multiplication, where m1 should be an m x n matrix and m2 an n x p matrix and the result would be an m x p matrix. In your case it's 4 x 1024 and 4 x 1024, but that doesn't work since 1024 != 4.
That means your input to the first linear layer has size [4, 1024] (4 being the batch size), therefore the input features of the first linear layer should be 1024.
self.classifier = nn.Sequential(
    nn.Linear(1024, 1024),
    nn.Dropout(p=0.5),
    nn.ReLU(inplace=True),
    #nn.Dropout(p=0.5),
    nn.Linear(1024, 1024),
    nn.ReLU(inplace=True),
    nn.Linear(1024, num_classes),
)
If you are uncertain how many features your input has, you can print out its size just before the layer:
x = self.features(x)
x = torch.flatten(x,1)
print(x.size()) # => torch.Size([4, 1024])
x = self.classifier(x)
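The shape rule behind that error message can be reproduced without PyTorch. The sketch below is plain Python with made-up helper names (matmul_shape, linear_out_shape). It assumes only that a linear layer multiplies a [batch x in_features] input by an [in_features x out_features] matrix, which is what the m1/m2 pair in the error reflects.

```python
def matmul_shape(a, b):
    """Return the result shape of a @ b for 2-D shapes, or raise if incompatible."""
    (m, n), (n2, p) = a, b
    if n != n2:
        raise ValueError(f"size mismatch, m1: [{m} x {n}], m2: [{n2} x {p}]")
    return (m, p)


def linear_out_shape(input_shape, in_features, out_features):
    """A linear layer multiplies the input by an [in_features x out_features] matrix."""
    return matmul_shape(input_shape, (in_features, out_features))


flattened = (4, 1024)  # batch of 4 with 1024 features, as printed in the answer

message = None
try:
    linear_out_shape(flattened, in_features=4, out_features=1024)  # nn.Linear(4, 1024)
except ValueError as err:
    message = str(err)
    print(message)

fixed = linear_out_shape(flattened, in_features=1024, out_features=1024)  # nn.Linear(1024, 1024)
print(fixed)
```

With in_features=4 the inner dimensions are 1024 and 4, which reproduces the message from the question. With in_features=1024 they agree and the result is a [4 x 1024] batch.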

Target size (torch.Size([12])) must be the same as input size (torch.Size([12, 1000]))

I am using the models.vgg16(pretrained=True) model for image classification, where the number of classes is 3.
The batch size is 12 (trainloader = torch.utils.data.DataLoader(train_data, batch_size=12, shuffle=True)), which is why the error says Target size (torch.Size([12])) must be the same as input size (torch.Size([12, 1000])).
I changed the parameters of the last FC layer, and it now prints as Linear(in_features=1000, out_features=3, bias=True).
The loss function is BCEWithLogitsLoss():
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(vgg16.parameters(), lr=0.001, momentum=0.9)
Training code is
# zero the parameter gradients
optimizer.zero_grad()
outputs = vgg16(inputs) #----> forward pass
loss = criterion(outputs, labels) #----> compute loss #error occurs here
loss.backward() #----> backward pass
optimizer.step() #----> weights update
While computing the loss, I get this error: Target size (torch.Size([12])) must be the same as input size (torch.Size([12, 1000]))
code is available at: code

Double-check how you modified the linear layer. It seems that the forward pass is not actually going through a 3-output layer.
Your model produces 1000 outputs for each sample, while it should produce 3. That is why the loss cannot be evaluated: you are comparing 1000 outputs against targets for 3 classes. Once the last layer has 3 outputs, it should work.
EDIT
From the code you shared, I think there are two problems.
First, you modified your model this way:
# Load the pretrained model from pytorch
vgg16 = models.vgg16(pretrained=True)
vgg16.classifier[6].in_features = 1000
vgg16.classifier[6].out_features = 3
These assignments only overwrite the in_features and out_features attributes of the existing layer. They do not resize its weight matrix, so the forward pass still produces 1000 outputs per sample.
The usual way to do this properly is to replace the layer itself with a new nn.Linear that has the desired number of outputs, or to define your own class (class myvgg(nn.Module)) that wraps the pretrained network and applies the new layer in its forward().
If that fails, try to unsqueeze(1) your targets (i.e. the labels variable). This is less likely to be the cause of the error, but it is worth a try.
EDIT
Also try converting your target tensor to one-hot vectors, and change the tensor type to float, since BCEWithLogitsLoss expects float targets.
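A small sketch of both points, using a standalone nn.Linear as a stand-in for vgg16.classifier[6] so that no pretrained weights are needed. The random features and labels are placeholders, not data from the question.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for vgg16.classifier[6]; in torchvision's VGG16 that layer is Linear(4096, 1000).
head = nn.Linear(4096, 1000)
features = torch.randn(12, 4096)  # batch of 12, as in the question

# Overwriting the attributes changes what print(head) shows, not the weight tensor.
head.in_features = 1000
head.out_features = 3
shape_after_attr_edit = tuple(head(features).shape)  # still 1000 outputs per sample

# Replacing the module allocates a fresh [3 x 4096] weight.
head = nn.Linear(4096, 3)
logits = head(features)

# BCEWithLogitsLoss needs float targets with the same shape as the logits.
labels = torch.randint(0, 3, (12,))                 # class indices, shape [12]
targets = F.one_hot(labels, num_classes=3).float()  # shape [12, 3]

loss = nn.BCEWithLogitsLoss()(logits, targets)
print(shape_after_attr_edit, tuple(logits.shape), tuple(targets.shape))
```

For a single-label, three-class problem, nn.CrossEntropyLoss is a common alternative: it accepts the class-index labels of shape [12] directly, so the one-hot conversion is not needed.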

Share the code of your model and it will be easier to debug. The problem is almost certainly in your last fully connected layer. The size mismatch says that you are getting 1000 outputs for each of the 12 images (the batch size), while the target has only 12 values to compare them with.
So the fully connected layer is where the problem lies.
Use this and it should solve the problem:
vgg16 = models.vgg16(pretrained=True)
vgg16.classifier[6]= nn.Linear(4096, 3)

if __name__ == "__main__":
    from torchsummary import summary
    model = vgg16
    model = model.cuda()
    print(model)
    summary(model, input_size = (3,120,120))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 120, 120]           1,792
              ReLU-2         [-1, 64, 120, 120]               0
            Conv2d-3         [-1, 64, 120, 120]          36,928
              ReLU-4         [-1, 64, 120, 120]               0
         MaxPool2d-5           [-1, 64, 60, 60]               0
            Conv2d-6          [-1, 128, 60, 60]          73,856
              ReLU-7          [-1, 128, 60, 60]               0
            Conv2d-8          [-1, 128, 60, 60]         147,584
              ReLU-9          [-1, 128, 60, 60]               0
        MaxPool2d-10          [-1, 128, 30, 30]               0
           Conv2d-11          [-1, 256, 30, 30]         295,168
             ReLU-12          [-1, 256, 30, 30]               0
           Conv2d-13          [-1, 256, 30, 30]         590,080
             ReLU-14          [-1, 256, 30, 30]               0
           Conv2d-15          [-1, 256, 30, 30]         590,080
             ReLU-16          [-1, 256, 30, 30]               0
        MaxPool2d-17          [-1, 256, 15, 15]               0
           Conv2d-18          [-1, 512, 15, 15]       1,180,160
             ReLU-19          [-1, 512, 15, 15]               0
           Conv2d-20          [-1, 512, 15, 15]       2,359,808
             ReLU-21          [-1, 512, 15, 15]               0
           Conv2d-22          [-1, 512, 15, 15]       2,359,808
             ReLU-23          [-1, 512, 15, 15]               0
        MaxPool2d-24            [-1, 512, 7, 7]               0
           Conv2d-25            [-1, 512, 7, 7]       2,359,808
             ReLU-26            [-1, 512, 7, 7]               0
           Conv2d-27            [-1, 512, 7, 7]       2,359,808
             ReLU-28            [-1, 512, 7, 7]               0
           Conv2d-29            [-1, 512, 7, 7]       2,359,808
             ReLU-30            [-1, 512, 7, 7]               0
        MaxPool2d-31            [-1, 512, 3, 3]               0
AdaptiveAvgPool2d-32            [-1, 512, 7, 7]               0
           Linear-33                 [-1, 4096]     102,764,544
             ReLU-34                 [-1, 4096]               0
          Dropout-35                 [-1, 4096]               0
           Linear-36                 [-1, 4096]      16,781,312
             ReLU-37                 [-1, 4096]               0
          Dropout-38                 [-1, 4096]               0
           Linear-39                    [-1, 3]          12,291
================================================================
Total params: 134,272,835
Trainable params: 134,272,835
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.16
Forward/backward pass size (MB): 62.84
Params size (MB): 512.21
Estimated Total Size (MB): 575.21

Reshaping image data in Keras to match CNN requirements

I've created a CNN designed to recognize objects.
from keras.preprocessing.image import img_to_array, load_img
img = load_img('newimage.jpg')
x = img_to_array(img)
x = x.reshape( (1,) + x.shape )
scores = model.predict(x, verbose=1)
print(scores)
However I'm getting:
expected convolution2d_input_1 to have shape (None, 3, 108, 192) but got array with shape (1, 3, 192, 108)
My model:
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense

def create_model():
    model = Sequential()
    model.add(Convolution2D(32, 3, 3, input_shape=(3, img_width, img_height)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Convolution2D(32, 3, 3))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Convolution2D(64, 3, 3))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(64))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
    return model
I've looked at related answers and the documentation, but I'm at a loss as to how to reshape the array to match what's expected.

I guess the problem is with how the image width and height are set up. As the error says:
expected convolution2d_input_1 to have shape (None, 3, 108, 192) # expected width = 108 and height = 192
but got array with shape (1, 3, 192, 108) # width = 192, height = 108
Update: I tested your code with a small change and it worked.
Here are just the changed lines:
img_width, img_height = 960, 717
model.add(Convolution2D(32, 3, 3, input_shape=(img_height, img_width, 3)))
The main change is input_shape=(img_height, img_width, 3).
The image I used to run this code had width = 960 and height = 717. I have updated my previous answer because part of it was wrong; sorry about that.
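The ordering issue can be illustrated without Keras. The sketch below assumes that an image converted to an array has shape (height, width, channels) in channels-last ordering and (channels, height, width) in channels-first ordering, so height always comes before width. The helper names are invented for this illustration.

```python
def array_shape(width, height, channels=3, channels_first=False):
    """Shape of one image as an array: rows (height) come before columns (width)."""
    if channels_first:
        return (channels, height, width)
    return (height, width, channels)


def with_batch_axis(shape):
    """Mirror x.reshape((1,) + x.shape): prepend a batch dimension of size 1."""
    return (1,) + tuple(shape)


# A 108-wide, 192-high image in channels-first ordering gives the shape from the error.
got = with_batch_axis(array_shape(width=108, height=192, channels_first=True))
print(got)

# The 960x717 image from the answer, in channels-last ordering.
input_shape = array_shape(width=960, height=717)
print(input_shape)
```

Under that assumption, input_shape has to list height before width, which is the change made in the answer.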
