Implementation Issue: Deep ConvNet for Pattern Recognition - deep-learning

I'm trying to implement a pattern recognition model using a fully convolutional network (Fig. 1 in https://www.sciencedirect.com/science/article/pii/S0031320318304370; I was able to get the full text without signing in, but I can attach a picture if that's a problem!). I'm getting a size error when moving from the final Conv2d layer to the first fully connected layer.
Here is my error message:
RuntimeError: size mismatch, m1: [4 x 1024], m2: [4 x 1024] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:283
Originally, as in the figure, my first linear layer was:
nn.Linear(4*4*512, 1024)
but after getting the size mismatch, I changed it to:
nn.Linear(4,1024)
Now, I have a strange error message as written above.
For reference (if it helps), here is my code:
import torch
import torch.nn as nn
import torch.utils.model_zoo as model_zoo

class convnet(nn.Module):
    def __init__(self, num_classes=1000):
        super(convnet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.MaxPool2d(kernel_size=1),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),  # stride defaults to kernel_size
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.classifier = nn.Sequential(
            nn.Linear(4, 1024),
            nn.Dropout(p=0.5),
            nn.ReLU(inplace=True),
            nn.Linear(1024, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
I suspect it's an issue with the padding and stride.
Thanks!

The error comes from a matrix multiplication, where m1 must be an m x n matrix and m2 an n x p matrix, giving an m x p result. In your case they are 4 x 1024 and 4 x 1024, which doesn't work because the inner dimensions don't match (1024 != 4).
That means the input to your first linear layer has size [4, 1024] (4 being the batch size), so the first linear layer should have 1024 input features:
self.classifier = nn.Sequential(
    nn.Linear(1024, 1024),
    nn.Dropout(p=0.5),
    nn.ReLU(inplace=True),
    nn.Linear(1024, 1024),
    nn.ReLU(inplace=True),
    nn.Linear(1024, num_classes),
)
If you are uncertain how many features your input has, you can print out its size just before the layer:
x = self.features(x)
x = torch.flatten(x, 1)
print(x.size())  # => torch.Size([4, 1024])
x = self.classifier(x)
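If you'd rather not hard-code the flattened size at all, another option is nn.LazyLinear, which infers in_features from the first batch it sees. A minimal sketch, assuming a PyTorch version >= 1.8 (where LazyLinear was introduced):

import torch
import torch.nn as nn

# LazyLinear materializes its weights on the first forward pass,
# so the flattened feature size never has to be computed by hand.
classifier = nn.Sequential(
    nn.LazyLinear(1024),
    nn.Dropout(p=0.5),
    nn.ReLU(inplace=True),
    nn.Linear(1024, 1024),
    nn.ReLU(inplace=True),
    nn.Linear(1024, 1000),
)

x = torch.randn(4, 1024)    # stand-in for the flattened conv features
print(classifier(x).shape)  # torch.Size([4, 1000])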

Related

Why the VGG16 model cannot be trained with its FC layers

I am trying to train the VGG16 model below, but the loss does not decrease and it seems that the model's parameters are not being updated.
Here is the model:
import torch
import torch.nn as nn
import math
import torch.nn.functional as F
# from utils import AvgPoolConv  # only needed for the commented-out pooling variant below

cfg = {
    'VGG11': [16, 'M', 32, 'M', 64, 64, 'M', 128, 128, 'M', 128, 128, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}

class VGG(nn.Module):
    def __init__(self, vgg_name, use_bn, num_class=100):
        super(VGG, self).__init__()
        self.features = self._make_layers(cfg[vgg_name], use_bn)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, num_class)
        )
        #self.classifier = nn.Linear(512, num_class)
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                n = m.weight.size(1)
                m.weight.data.normal_(0, 1.0 / float(n))
                m.bias.data.zero_()

    def forward(self, x):
        out = self.features(x)
        out = self.classifier(out)
        return out

    def _make_layers(self, cfg, use_bn=True):
        layers = []
        in_channels = 3
        for x in cfg:
            if x == 'M':
                layers += [nn.AvgPool2d(2)]
                #layers += [AvgPoolConv(kernel_size=2, stride=2, input_channel=in_channels)]
            else:
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1),
                           nn.BatchNorm2d(x) if use_bn else nn.Dropout(0.25),
                           nn.ReLU(inplace=True)]
                in_channels = x
        return nn.Sequential(*layers)
But if I delete the first two FC layers from the classifier, as shown below, the model trains and the loss can be optimized:
self.features = self._make_layers(cfg[vgg_name], use_bn)
self.classifier = nn.Linear(512, num_class)
Why does this happen?
First, it would be good to verify whether the parameters are really not being updated, or whether the change is just small.
Different architectures might require different tuning (learning rate, weight decay if you use it, etc.). A good thing to try when debugging is a "can I overfit it?" test: use a single batch (or even a single sample) and check whether you can drive the loss to 0; you might need to tweak the optimization parameters mentioned before. See the sketch below.
Assuming everything is correct and the gradient flows, I'd tune the learning rate and try adding batch normalization between your linear and ReLU layers (it should make training much faster).
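A minimal sketch of that single-batch overfit test (assuming 3x32x32 inputs, for which the 512-feature classifier above works out; the random tensors stand in for one real batch from your loader):

import torch
import torch.nn as nn
from torch import optim

model = VGG('VGG16', use_bn=True, num_class=100)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# One fixed batch: if the loss cannot be driven toward 0 on this,
# something is wrong with the architecture or the optimization setup.
inputs = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 100, (8,))

for step in range(200):
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()
    if step % 50 == 0:
        print(step, loss.item())

If this loss plateaus far from 0, suspect the learning rate or the initialization before anything else.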

Changing Learning Rate According to Layer Width in PyTorch

I am trying to train a network where the learning rate for each layer scales with 1/(layer width). Is there a way to do this in PyTorch? I tried changing the learning rate in the optimizer and including it in my training loop, but that didn't work. I've seen some people discuss this for Adam, but I am using SGD. Here are the chunks where I defined my model and training, if that's any help.
class ConvNet2(nn.Module):
    def __init__(self):
        super(ConvNet2, self).__init__()
        self.network = nn.Sequential(
            nn.Conv2d(3, 8, 3),
            nn.ReLU(),
            nn.Conv2d(8, 32, 3),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 32, 3),
            nn.ReLU(),
            nn.Conv2d(32, 32, 3),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Flatten(),
            nn.Linear(800, 10)
        )

    def forward(self, x):
        return self.network(x)
net2 = ConvNet2().to(device)

def train(network, number_of_epochs):
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(network.parameters(), lr=learning_rate)
    for epoch in range(number_of_epochs):  # loop over the dataset multiple times
        running_loss = 0.0
        for i, (inputs, labels) in enumerate(trainloader):
            # get the inputs
            inputs = inputs.to(device)
            labels = labels.to(device)
            # zero the parameter gradients
            optimizer.zero_grad()
            # forward + backward + optimize
            outputs = network(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
In the documentation you can see that you can specify "per-parameter options". Assuming you only want to specify the learning rate for the Conv2d layers (this is easily customizable in the code below) you could do something like this:
import torch
from torch import nn
from torch import optim

class ConvNet2(nn.Module):
    def __init__(self):
        super(ConvNet2, self).__init__()
        self.network = nn.Sequential(
            nn.Conv2d(3, 8, 3),
            nn.ReLU(),
            nn.Conv2d(8, 32, 3),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 32, 3),
            nn.ReLU(),
            nn.Conv2d(32, 32, 3),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Flatten(),
            nn.Linear(800, 10)
        )

    def forward(self, x):
        return self.network(x)

net2 = ConvNet2()

def getParameters(model):
    getWidthConv2D = lambda layer: layer.out_channels
    parameters = []
    for layer in model.children():
        paramdict = {'params': layer.parameters()}
        if isinstance(layer, nn.Conv2d):
            paramdict['lr'] = 0.1 / getWidthConv2D(layer)  # lr scales with 1/(layer width)
        parameters.append(paramdict)
    return parameters

optimizer = optim.SGD(getParameters(net2.network), lr=0.05)
print(optimizer)
You can also do this by passing parameter groups with their associated learning rates directly:
optimizer = optim.SGD(
    [
        {"params": network.layer[0].parameters(), "lr": 1e-1},
        {"params": network.layer[1].parameters(), "lr": 1e-2},
        ...
    ],
    lr=1e-3,
)
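One detail worth noting: any group that doesn't set its own "lr" falls back to the optimizer-level default (lr=1e-3 above). You can check what each group ended up with:

for group in optimizer.param_groups:
    print(group['lr'])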

PyTorch: How to calculate output size of the CNN?

I went through this PyTorch CNN implementation available here: https://machinelearningknowledge.ai/pytorch-conv2d-explained-with-examples/
I am unable to understand how they replace the '?' with some value. What is the formula for calculating the output size of a CNN layer?
This has to be calculated by hand in PyTorch, unlike in TensorFlow/Keras. If there is another blog that explains this well, please drop it in the comments.
import torch

keep_prob = 0.7  # dropout keep probability (value assumed here; defined earlier in the linked post)

# Implementation of CNN/ConvNet Model
class CNN(torch.nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # L1 ImgIn shape=(?, 28, 28, 1)
        #    Conv -> (?, 28, 28, 32)
        #    Pool -> (?, 14, 14, 32)
        self.layer1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),
            torch.nn.Dropout(p=1 - keep_prob))
        # L2 ImgIn shape=(?, 14, 14, 32)
        #    Conv -> (?, 14, 14, 64)
        #    Pool -> (?, 7, 7, 64)
        self.layer2 = torch.nn.Sequential(
            torch.nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),
            torch.nn.Dropout(p=1 - keep_prob))
        # L3 ImgIn shape=(?, 7, 7, 64)
        #    Conv -> (?, 7, 7, 128)
        #    Pool -> (?, 4, 4, 128)
        self.layer3 = torch.nn.Sequential(
            torch.nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2, padding=1),
            torch.nn.Dropout(p=1 - keep_prob))
        # L4 FC 4x4x128 inputs -> 625 outputs
        self.fc1 = torch.nn.Linear(4 * 4 * 128, 625, bias=True)
        torch.nn.init.xavier_uniform_(self.fc1.weight)
        self.layer4 = torch.nn.Sequential(
            self.fc1,
            torch.nn.ReLU(),
            torch.nn.Dropout(p=1 - keep_prob))
        # L5 Final FC 625 inputs -> 10 outputs
        self.fc2 = torch.nn.Linear(625, 10, bias=True)
        torch.nn.init.xavier_uniform_(self.fc2.weight)  # initialize parameters

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = out.view(out.size(0), -1)  # Flatten them for FC
        out = self.fc1(out)  # note: layer4 (fc1 + ReLU + Dropout) is defined but skipped here in the original post
        out = self.fc2(out)
        return out

# instantiate CNN model
model = CNN()
model
Thanks!
I assume your calculation is wrong because:
PyTorch expects images in C x H x W format (e.g. 3x32x32, not 32x32x3).
The first dimension is always the batch dimension, and it must be omitted from the calculation because all nn.Modules handle it by default.
So if you want to calculate the input size for the first Linear layer, you can use this trick:
conv = nn.Sequential(self.layer1, self.layer2, self.layer3, nn.Flatten())
out = conv(torch.randn(1, im_height, im_width).unsqueeze(0))
# fc_layer_in_channels = out.shape[1]
self.fc1 = torch.nn.Linear(out.shape[1], 625, bias=True)
but only if you know im_height and im_width in advance.
The best practice is to use torch.nn.AdaptiveAvgPool2d.
With this layer you can always get an output of fixed spatial size.
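For reference, the formula the question asks for (per spatial dimension, with dilation 1) is output = floor((input + 2*padding - kernel_size) / stride) + 1. A quick sketch tracing the '?' values in the model above:

import math

def out_size(in_size, kernel_size, stride=1, padding=0):
    # standard conv/pool output-size formula (dilation assumed to be 1)
    return math.floor((in_size + 2 * padding - kernel_size) / stride) + 1

h = 28                                   # MNIST-sized input: (?, 1, 28, 28)
h = out_size(h, 3, stride=1, padding=1)  # layer1 conv -> 28
h = out_size(h, 2, stride=2)             # layer1 pool -> 14
h = out_size(h, 3, stride=1, padding=1)  # layer2 conv -> 14
h = out_size(h, 2, stride=2)             # layer2 pool -> 7
h = out_size(h, 3, stride=1, padding=1)  # layer3 conv -> 7
h = out_size(h, 2, stride=2, padding=1)  # layer3 pool -> 4
print(h * h * 128)                       # 4 * 4 * 128 = 2048, matching fc1

And a short illustration of the AdaptiveAvgPool2d suggestion, which removes the dependence on the input size entirely:

import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d((4, 4))             # always outputs 4x4 spatially
print(pool(torch.randn(1, 128, 7, 7)).shape)    # torch.Size([1, 128, 4, 4])
print(pool(torch.randn(1, 128, 15, 15)).shape)  # torch.Size([1, 128, 4, 4])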

RuntimeError: mat1 and mat2 shapes cannot be multiplied (25x340 and 360x1)

I get this error message and I'm not sure why. My input is (batch, 1, 312) from tabular data, and this CNN is constructed for a regression prediction. I worked out the shapes for each step with the formula (input + 2*padding - filter_size)/stride + 1, as in the comments below. The problem appears to occur at x = self.fc(x), and I can't figure out why. Your help is greatly appreciated. Thank you.
class CNNWeather(nn.Module):
    # input (batch, 1, 312)
    def __init__(self):
        super(CNNWeather, self).__init__()
        self.conv1 = nn.Conv1d(in_channels=1, out_channels=8, kernel_size=9, stride=1, padding='valid')    # (312+2*0-9)/1 + 1 = 304
        self.pool1 = nn.AvgPool1d(kernel_size=2, stride=2)                                                 # 304/2 = 302
        self.conv2 = nn.Conv1d(in_channels=8, out_channels=12, kernel_size=3, stride=1, padding='valid')   # (302-3)/1+1 = 300
        self.pool2 = nn.AvgPool1d(kernel_size=2, stride=2)                                                 # 300/2 = 150
        self.conv3 = nn.Conv1d(in_channels=12, out_channels=16, kernel_size=3, stride=1, padding='valid')  # (150-3)/1+1 = 76
        self.pool3 = nn.AvgPool1d(kernel_size=2, stride=2)                                                 # 76/2 = 38
        self.conv4 = nn.Conv1d(in_channels=16, out_channels=20, kernel_size=3, stride=1, padding='valid')  # (38-3)/1+1 = 36
        self.pool4 = nn.AvgPool1d(kernel_size=2, stride=2)                                                 # 36/2 = 18 -> (batch, 20, 18)
        self.fc = nn.Linear(in_features=20*18, out_features=1)

    def forward(self, x):
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool2(F.relu(self.conv2(x)))
        x = self.pool3(F.relu(self.conv3(x)))
        x = self.pool4(F.relu(self.conv4(x)))
        print(x.size())
        x = x.view(x.size(0), -1)  # flatten to (batch, 20*18)
        x = self.fc(x)
        return x
The problem seems to be related to the input size of your FC layer:
self.fc = nn.Linear(in_features=20*18, out_features=1)
The output of the previous layer is 340, so you must use in_features=340.
These are the shapes of the output for the third and fourth layers.
torch.Size([5, 16, 73]) conv3 out
torch.Size([5, 16, 36]) pool3 out
torch.Size([5, 20, 34]) conv4 out
torch.Size([5, 20, 17]) pool4 out
Notice that the "pool4" layer outputs 20 x 17, meaning 340 elements.
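So the one-line fix is:

self.fc = nn.Linear(in_features=20 * 17, out_features=1)  # 340 = 20 channels * 17 positions

The slip in the hand calculation was at the very first pooling step: 304/2 = 152, not 302, and that error propagated through every later size.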

Can the output channels of a Conv2d differ from the input channels of the next Conv2d?

In the code below, the first nn.Sequential's Conv2d has 512 output channels, but the next nn.Sequential's Conv2dTranspose takes 1024 input channels. I wonder how this is possible? I don't really get how the 512 output channels are fed into the 1024 input channels of the next nn.Sequential.
self.face_decoder_blocks = nn.ModuleList([
    nn.Sequential(Conv2d(512, 512, kernel_size=1, stride=1, padding=0),),  # 1, 1

    nn.Sequential(Conv2dTranspose(1024, 512, kernel_size=3, stride=1, padding=0),  # 3, 3
                  Conv2d(512, 512, kernel_size=3, stride=1, padding=1, residual=True),),

    nn.Sequential(Conv2dTranspose(1024, 512, kernel_size=3, stride=2, padding=1, output_padding=1),
                  Conv2d(512, 512, kernel_size=3, stride=1, padding=1, residual=True),
                  Conv2d(512, 512, kernel_size=3, stride=1, padding=1, residual=True),)])
and inside the forward function:

x = face_sequences
for f in self.face_encoder_blocks:
    x = f(x)

My main question is stated above; please help me understand this.
Thanks in advance.
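No answer was posted for this one, but the shapes strongly suggest U-Net-style skip connections (this block structure appears in models such as Wav2Lip): the decoder's forward pass typically concatenates each block's output with the matching encoder feature map along the channel dimension, so 512 decoder channels plus 512 skip channels give the 1024 input channels of the next block. A hypothetical sketch of the idea:

import torch

# Illustration only: concatenating a decoder output with an encoder
# skip connection along dim=1 doubles the channel count.
decoder_out = torch.randn(1, 512, 3, 3)   # output of one decoder block
encoder_feat = torch.randn(1, 512, 3, 3)  # matching encoder feature map

x = torch.cat((decoder_out, encoder_feat), dim=1)
print(x.shape)  # torch.Size([1, 1024, 3, 3]) -> fits Conv2dTranspose(1024, 512, ...)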