Why does the accuracy fluctuate widely after using batch normalization?

I'm training a model that includes a batch normalization layer, and I've noticed that the accuracy can fluctuate widely (from 55% to 31% in just one epoch). Both train and test accuracy fluctuate, so I don't think it's caused by overfitting.
Here is my accuracy over epochs, and here is the joint train/test accuracy graph (plots not reproduced here).
This is my model architecture:
# body of the function that builds the network; momentum and device are defined elsewhere
return nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm2d(64, momentum=momentum),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    Residual(64, 64),
    Residual(64, 64),
    Residual(64, 128, use_1x1=True, stride=2),
    Residual(128, 128),
    Residual(128, 256, use_1x1=True, stride=2),
    Residual(256, 256),
    Residual(256, 512, use_1x1=True, stride=2),
    Residual(512, 512),
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(512, 176)
).to(device)
class Residual(nn.Module):
    def __init__(self, input_channel, output_channel, use_1x1=False, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(input_channel, output_channel, kernel_size=3, padding=1, stride=stride)
        self.conv2 = nn.Conv2d(output_channel, output_channel, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(output_channel, momentum=momentum)
        self.bn2 = nn.BatchNorm2d(output_channel, momentum=momentum)
        if use_1x1:
            self.conv3 = nn.Conv2d(input_channel, output_channel, kernel_size=1, stride=stride)
        else:
            self.conv3 = None

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3 is not None:
            X = self.conv3(X)
        Y += X
        return F.relu(Y)
But, strangely, if I don't call model.eval() in the accuracy-evaluation function, which keeps running_mean and running_var updating during evaluation, the accuracy doesn't fluctuate.
Furthermore, if I do an extra pass over the training set after each epoch, as shown in the code below,
for epoch in range(epochs):
    net.train()
    for X, y in train_iter:
        optimizer.zero_grad()
        y_hat = net(X)
        l = loss(y_hat, y)
        l.backward()
        optimizer.step()
    # extra forward-only pass over the training data (this updates the BN running stats)
    for X, y in train_iter:
        net(X)
    eval_accuracy()
the accuracy doesn't fluctuate either.
I've tried changing the momentum, but that doesn't help.
Now I'm totally confused; I have no idea why the accuracy fluctuates or why the workarounds above work.
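For reference, a minimal sketch of the mechanism in question: in training mode, BatchNorm2d normalizes with the current batch statistics and updates running_mean/running_var as running = (1 - momentum) * running + momentum * batch_stat; in eval mode it normalizes with the stored running statistics and does not update them.
import torch
from torch import nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(3, momentum=0.1)
x = torch.randn(8, 3, 4, 4) * 5 + 2  # batch whose statistics differ from the initial running stats

bn.train()
_ = bn(x)               # train mode: normalizes with batch stats and updates the running stats
print(bn.running_mean)  # moved toward the batch mean

bn.eval()
_ = bn(x)               # eval mode: normalizes with running_mean/running_var, no update
print(bn.running_mean)  # unchanged
If the running statistics have not yet converged to the data statistics, eval-mode accuracy can differ noticeably from train-mode accuracy, which is consistent with the two workarounds described above.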

Related

How to best visualize CNN architecture? (experience using "PlotNeuralNet")

I'm writing a thesis and want to present a visualisation of the CNN architecture used for the analysis (written in PyTorch). I came across this cool repository, PlotNeuralNet, with examples of how to generate LaTeX code for drawing neural networks for reports and presentations. However, I'm having trouble finding out how exactly to define my particular architecture.
Here is an example of how one would define an architecture.
import sys
sys.path.append('../')
from pycore.tikzeng import *

# define your arch
arch = [
    to_head('..'),
    to_cor(),
    to_begin(),
    to_Conv("conv1", 512, 64, offset="(0,0,0)", to="(0,0,0)", height=64, depth=64, width=2),
    to_Pool("pool1", offset="(0,0,0)", to="(conv1-east)"),
    to_Conv("conv2", 128, 64, offset="(1,0,0)", to="(pool1-east)", height=32, depth=32, width=2),
    to_connection("pool1", "conv2"),
    to_Pool("pool2", offset="(0,0,0)", to="(conv2-east)", height=28, depth=28, width=1),
    to_SoftMax("soft1", 10, "(3,0,0)", "(pool1-east)", caption="SOFT"),
    to_connection("pool2", "soft1"),
    to_Sum("sum1", offset="(1.5,0,0)", to="(soft1-east)", radius=2.5, opacity=0.6),
    to_connection("soft1", "sum1"),
    to_end()
]

def main():
    namefile = str(sys.argv[0]).split('.')[0]
    to_generate(arch, namefile + '.tex')

if __name__ == '__main__':
    main()
However, looking at the different blocks available in the pycore module, I'm still not able to use the tool. The documentation isn't very elaborate, so I was hoping someone here would find it trivial to define the architecture below. Otherwise, are there any other good ways to visualize it?
class Net20(nn.Module):
    """ CNN for 20-day Image
    This particular model should have:
    - 3 blocks
    - 64 layers in first block, multiply by 2 each subsequent block
    - filter size (5,3)
    - vertical stride = 3 (but only in first layer)
    - vertical dilation = 2 (but only in first layer)
    - Leaky Relu activation function
    - max pooling (2,1) at the end of each block
    """
    def __init__(self):
        super().__init__()
        # Conv2dSame: a 'same'-padding Conv2d wrapper (not defined in this snippet)
        self.layer1 = nn.Sequential(
            Conv2dSame(1, 64, kernel_size=(5, 3), stride=(3, 1), dilation=(2, 1)),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(negative_slope=0.01, inplace=True),
            nn.MaxPool2d((2, 1), ceil_mode=True)
        )
        self.layer2 = nn.Sequential(
            Conv2dSame(64, 128, kernel_size=(5, 3)),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(negative_slope=0.01, inplace=True),
            nn.MaxPool2d((2, 1), ceil_mode=True)
        )
        self.layer3 = nn.Sequential(
            Conv2dSame(128, 256, kernel_size=(5, 3)),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(negative_slope=0.01, inplace=True),
            nn.MaxPool2d((2, 1), ceil_mode=True)
        )
        self.fc1 = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(46080, 1),
        )

    def forward(self, x):
        x = x.reshape(-1, 1, 64, 60)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = x.reshape(-1, 46080)
        x = self.fc1(x)
        return x
You can try model.summary() or keras.utils.plot_model. You may want to check: https://machinelearningmastery.com/visualize-deep-learning-neural-network-model-keras/
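Not a verified answer, but as a starting point, here is a rough sketch of how Net20's three conv blocks might be mapped onto the same pycore.tikzeng primitives used in the example above. The size/offset/height/depth arguments are purely illustrative, and the final Linear layer is drawn with to_SoftMax as a stand-in because no dense-layer primitive appears in the example.
import sys
sys.path.append('../')
from pycore.tikzeng import *

arch = [
    to_head('..'),
    to_cor(),
    to_begin(),
    # block 1: Conv2dSame(1, 64) + BatchNorm + LeakyReLU + MaxPool
    to_Conv("conv1", 64, 64, offset="(0,0,0)", to="(0,0,0)", height=40, depth=40, width=2),
    to_Pool("pool1", offset="(0,0,0)", to="(conv1-east)"),
    # block 2: Conv2dSame(64, 128) + BatchNorm + LeakyReLU + MaxPool
    to_Conv("conv2", 32, 128, offset="(1,0,0)", to="(pool1-east)", height=28, depth=28, width=3),
    to_connection("pool1", "conv2"),
    to_Pool("pool2", offset="(0,0,0)", to="(conv2-east)"),
    # block 3: Conv2dSame(128, 256) + BatchNorm + LeakyReLU + MaxPool
    to_Conv("conv3", 16, 256, offset="(1,0,0)", to="(pool2-east)", height=20, depth=20, width=4),
    to_connection("pool2", "conv3"),
    to_Pool("pool3", offset="(0,0,0)", to="(conv3-east)"),
    # Dropout + Linear(46080, 1), drawn with to_SoftMax as a stand-in for a dense layer
    to_SoftMax("fc1", 1, "(3,0,0)", "(pool3-east)", caption="FC"),
    to_connection("pool3", "fc1"),
    to_end()
]

def main():
    namefile = str(sys.argv[0]).split('.')[0]
    to_generate(arch, namefile + '.tex')

if __name__ == '__main__':
    main()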

RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x3072 and 1024x512)

I'm trying to create a PyTorch neural network and keep getting this error:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x3072 and 1024x512)
Here is my code where I create the model:
# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(32*32, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()
print(model)
An answer would be much appreciated.
There can be two problems here:
Either the tensor is not reaching the first Linear layer in the shape the model expects: the error shows a 64x3072 input, i.e. 3*32*32 features per sample (3-channel 32x32 images), while nn.Linear(32*32, 512) only expects 1024 features.
Or the problem might be in code unrelated to your model, e.g. in the training loop. Please add that part of the code as well.
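A minimal sketch of the fix, assuming the inputs are 3-channel 32x32 images (so the flattened size is 3*32*32 = 3072):
import torch
from torch import nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(3 * 32 * 32, 512),  # 3 channels * 32 * 32 pixels = 3072 input features
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        return self.linear_relu_stack(x)

# quick shape check with a dummy batch of 64 RGB 32x32 images
model = NeuralNetwork()
print(model(torch.randn(64, 3, 32, 32)).shape)  # torch.Size([64, 10])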

Changing Learning Rate According to Layer Width in PyTorch

I am trying to train a network where the learning rate for each layer scales with 1/(layer width). Is there a way to do this in PyTorch? I tried changing the learning rate in the optimizer and including it in my training loop, but that didn't work. I've seen some people talk about this with Adam, but I am using SGD to train. Here are the chunks where I defined my model and training, if that's any help.
class ConvNet2(nn.Module):
    def __init__(self):
        super(ConvNet2, self).__init__()
        self.network = nn.Sequential(
            nn.Conv2d(3, 8, 3),
            nn.ReLU(),
            nn.Conv2d(8, 32, 3),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 32, 3),
            nn.ReLU(),
            nn.Conv2d(32, 32, 3),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Flatten(),
            nn.Linear(800, 10)
        )

    def forward(self, x):
        return self.network(x)

net2 = ConvNet2().to(device)

def train(network, number_of_epochs):
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(network.parameters(), lr=learning_rate)
    for epoch in range(number_of_epochs):  # loop over the dataset multiple times
        running_loss = 0.0
        for i, (inputs, labels) in enumerate(trainloader):
            # get the inputs
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = network(inputs)
            loss = criterion(outputs, labels)
            # zero the parameter gradients
            optimizer.zero_grad()
            # forward + backward + optimize
            outputs = network(inputs)
            loss.backward()
            optimizer.step()
In the documentation you can see that you can specify "per-parameter options". Assuming you only want to specify the learning rate for the Conv2d layers (this is easily customizable in the code below) you could do something like this:
import torch
from torch import nn
from torch import optim
from pprint import pprint

class ConvNet2(nn.Module):
    def __init__(self):
        super(ConvNet2, self).__init__()
        self.network = nn.Sequential(
            nn.Conv2d(3, 8, 3),
            nn.ReLU(),
            nn.Conv2d(8, 32, 3),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 32, 3),
            nn.ReLU(),
            nn.Conv2d(32, 32, 3),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Flatten(),
            nn.Linear(800, 10)
        )

    def forward(self, x):
        return self.network(x)

net2 = ConvNet2()

def getParameters(model):
    getWidthConv2D = lambda layer: layer.out_channels
    parameters = []
    for layer in model.children():
        paramdict = {'params': layer.parameters()}
        if isinstance(layer, nn.Conv2d):
            paramdict['lr'] = getWidthConv2D(layer) * 0.1  # specify learning rate for Conv2d here
        parameters.append(paramdict)
    return parameters

optimizer = optim.SGD(getParameters(net2.network), lr=0.05)
print(optimizer)
You can do that by passing the relevant parameters with associated learning rates.
optimizer = optim.SGD(
    [
        {"params": network.layer[0].parameters(), "lr": 1e-1},
        {"params": network.layer[1].parameters(), "lr": 1e-2},
        ...
    ],
    lr=1e-3,
)
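Since the question specifically asks for learning rates that scale with 1/(layer width), here is a minimal sketch along the lines of the first answer, reusing the ConvNet2 defined above and assuming "width" means out_channels for Conv2d layers (base_lr is a value you choose):
import torch
from torch import nn, optim

def width_scaled_param_groups(model, base_lr):
    # Conv2d layers get lr = base_lr / out_channels; everything else uses the optimizer default
    groups = []
    for layer in model.children():
        group = {'params': layer.parameters()}
        if isinstance(layer, nn.Conv2d):
            group['lr'] = base_lr / layer.out_channels
        groups.append(group)
    return groups

net2 = ConvNet2()
optimizer = optim.SGD(width_scaled_param_groups(net2.network, base_lr=0.1), lr=0.1)
print(optimizer)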

Why is my loss not decreasing over 10 training epochs?

My hardware is a Ryzen 5000 series CPU with an NVIDIA RTX 3060 GPU. I'm currently working on a school assignment involving using a deep learning model (implemented in PyTorch) to predict COVID diagnosis from CT slice images. The dataset can be found at this URL on GitHub: https://github.com/UCSD-AI4H/COVID-CT
I've written a custom dataset that takes the images from the dataset and resizes them to 224x224. I've also converted all rgba or grayscale images to rgb using skimage.color. Other transforms include random horizontal and vertical flipping, as well as ToTensor(). To evaluate the model I've used sklearn.metrics to compute the AUC, F1 score, and accuracy of the model.
My trouble is that I can't get the model to train. After 10 epochs the loss has not decreased. I've tried adjusting the learning rate of my optimizer but it hasn't helped. Any recommendations/thoughts would be greatly appreciated. Thanks!
class RONANet(nn.Module):
    def __init__(self, classifier_type=None):
        super(RONANet, self).__init__()
        self.classifier_type = classifier_type
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.classifier = self.compose_classifier()
        self.conv_layers = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            self.relu,
            self.maxpool,
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            self.relu,
            self.maxpool,
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(128),
            self.relu,
            self.maxpool,
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            self.relu,
            self.maxpool,
            nn.AdaptiveAvgPool2d(output_size=(1, 1)),
        )

    def compose_classifier(self):
        if 'fc' in self.classifier_type:
            classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(14**2*256, 256),
                self.relu,
                nn.Linear(256, 128),
                self.relu,
                nn.Linear(128, 2))
        elif 'conv' in self.classifier_type:
            classifier = nn.Sequential(
                nn.Conv2d(256, 1, kernel_size=1, stride=1))
        return classifier

    def forward(self, x):
        features = self.conv_layers(x)
        out = self.classifier(features)
        if 'conv' in self.classifier_type:
            out = out.reshape([-1,])
        return out
RONANetv1 = RONANet(classifier_type='conv')
RONANetv1 = RONANetv1.cuda()

RONANetv2 = RONANet(classifier_type='fc')
RONANetv2 = RONANetv2.cuda()

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(RONANetv1.parameters(), lr=0.1)
num_epochs = 100
best_auc = 0.5  # set threshold to random model performance
scores = {}

for epoch in range(num_epochs):
    RONANetv1.train()
    print(f'Current Epoch: {epoch+1}')
    epoch_loss = 0
    for images, labels in train_dataloader:
        batch_loss = 0
        optimizer.zero_grad()
        with torch.set_grad_enabled(True):
            images = images.cuda()
            labels = labels.cuda()
            out = RONANetv1(images)
            loss = criterion(out, labels)
            batch_loss += loss.item()
            loss.backward()
            optimizer.step()
        epoch_loss += batch_loss
    print(f'Loss this epoch: {epoch_loss}\n')
    current_val_auc, current_val_f1, current_val_acc = get_scores(RONANetv1, val_dataloader)
    if current_val_auc > best_auc:
        best_auc = current_val_auc
        torch.save(RONANetv1.state_dict(), 'RONANetv1.pth')
        scores['AUC'] = current_val_auc
        scores['f1'] = current_val_f1
        scores['Accuracy'] = current_val_acc
        print(scores)
Output:
Current Epoch: 1
Loss this epoch: 38.038745045661926
{'AUC': 0.6632183908045978, 'f1': 0.0, 'Accuracy': 0.4915254237288136}
Current Epoch: 2
Loss this epoch: 37.96312761306763
Current Epoch: 3
Loss this epoch: 37.93656861782074
Current Epoch: 4
Loss this epoch: 38.045261442661285
Current Epoch: 5
Loss this epoch: 38.01626980304718
Current Epoch: 6
Loss this epoch: 37.93017905950546
Current Epoch: 7
Loss this epoch: 37.913547694683075
Current Epoch: 8
Loss this epoch: 38.049841582775116
Current Epoch: 9
Loss this epoch: 37.95650988817215
Can you try with this learning rate:
optimizer = torch.optim.Adam(RONANetv1.parameters(), lr=0.001)
and wait for at least 25 epochs?
So the issue is that you're only training the first part of the classifier and not the second.
# this
optimizer = torch.optim.Adam(RONANetv1.parameters(), lr=0.1)
# needs to become this
from itertools import chain
optimizer = torch.optim.Adam(chain(RONANetv1.parameters(), RONANetv2.parameters()))
and you need to incorporate the other CNN in training too:
intermediate_out = RONANetv1(images)
out = RONANetv2(intermediate_out)
loss = criterion(out, labels)
batch_loss += loss.item()
loss.backward()
optimizer.step()
Hope that helps, best of luck!

Pytorch model running out of memory on both CPU and GPU, can’t figure out what I’m doing wrong

Trying to implement a simple multi-label image classifier using Pytorch Lightning. Here's the model definition:
import torch
from torch import nn
import pytorch_lightning as pl

# creates network class
class Net(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # defines conv layers
        self.conv_layer_b1 = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=32,
                      kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Flatten(),
        )
        # passes dummy x matrix to find the input size of the fc layer
        x = torch.randn(1, 3, 800, 600)
        self._to_linear = None
        self.forward(x)
        # defines fc layer
        self.fc_layer = nn.Sequential(
            nn.Linear(in_features=self._to_linear,
                      out_features=256),
            nn.ReLU(),
            nn.Linear(256, 5),
        )
        # defines accuracy metric
        self.accuracy = pl.metrics.Accuracy()
        self.confusion_matrix = pl.metrics.ConfusionMatrix(num_classes=5)

    def forward(self, x):
        x = self.conv_layer_b1(x)
        if self._to_linear is None:
            # does not run fc layer if input size is not determined yet
            self._to_linear = x.shape[1]
        else:
            x = self.fc_layer(x)
        return x

    def cross_entropy_loss(self, logits, y):
        criterion = nn.CrossEntropyLoss()
        return criterion(logits, y)

    def training_step(self, train_batch, batch_idx):
        x, y = train_batch
        logits = self.forward(x)
        train_loss = self.cross_entropy_loss(logits, y)
        train_acc = self.accuracy(logits, y)
        train_cm = self.confusion_matrix(logits, y)
        self.log('train_loss', train_loss)
        self.log('train_acc', train_acc)
        self.log('train_cm', train_cm)
        return train_loss

    def validation_step(self, val_batch, batch_idx):
        x, y = val_batch
        logits = self.forward(x)
        val_loss = self.cross_entropy_loss(logits, y)
        val_acc = self.accuracy(logits, y)
        return {'val_loss': val_loss, 'val_acc': val_acc}

    def validation_epoch_end(self, outputs):
        avg_val_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        avg_val_acc = torch.stack([x['val_acc'] for x in outputs]).mean()
        self.log("val_loss", avg_val_loss)
        self.log("val_acc", avg_val_acc)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=0.0008)
        return optimizer
The issue is probably not the machine since I'm using a cloud instance with 60 GBs of RAM and 12 GBs of VRAM. Whenever I run this model even for a single epoch, I get an out of memory error. On the CPU it looks like this:
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 1966080000 bytes. Error code 12 (Cannot allocate memory)
and on the GPU it looks like this:
RuntimeError: CUDA out of memory. Tried to allocate 7.32 GiB (GPU 0; 11.17 GiB total capacity; 4.00 KiB already allocated; 2.56 GiB free; 2.00 MiB reserved in total by PyTorch)
Clearing the cache and reducing the batch size did not work. I'm a novice so clearly something here is exploding but I can't tell what. Any help would be appreciated.
Thank you!
Indeed, it's not a machine issue; the model itself is simply unreasonably big. Typically, if you take a look at common CNN models, the fc layers occur near the end, after the inputs already pass through quite a few convolutional blocks (and have their spatial resolutions reduced).
Assuming inputs are of shape (batch, 3, 800, 600), after passing through conv_layer_b1 the feature map shape would be (batch, 32, 400, 300) following the MaxPool operation. After flattening, the inputs become (batch, 32 * 400 * 300), i.e., (batch, 3840000).
The immediately following fc_layer thus contains nn.Linear(3840000, 256), which is simply absurd. This single linear layer contains ~983 million trainable parameters! For reference, popular image classification CNNs roughly have 3 to 30 million parameters on average, with larger variants reaching 60 to 80 million. Few ever really cross the 100 million mark.
You can count your model params with this:
def count_params(model):
    return sum(map(lambda p: p.data.numel(), model.parameters()))
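As a quick sanity check on the ~983 million figure, counting just that single layer by hand:
weights = 3_840_000 * 256  # 983,040,000 weights in nn.Linear(3840000, 256)
biases = 256
print(weights + biases)    # 983,040,256 -> roughly 983 million trainable parameters in one layer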
My advice: 800 x 600 is really a massive input size. Reduce it to something like 400 x 300, if possible. Furthermore, add several convolutional blocks similar to conv_layer_b1, before the FC layer. For example:
def get_conv_block(C_in, C_out):
    return nn.Sequential(
        nn.Conv2d(in_channels=C_in, out_channels=C_out,
                  kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2)
    )

class Net(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # defines conv layers
        self.conv_layer_b1 = get_conv_block(3, 16)
        self.conv_layer_b2 = get_conv_block(16, 32)
        self.conv_layer_b3 = get_conv_block(32, 64)
        self.conv_layer_b4 = get_conv_block(64, 128)
        self.conv_layer_b5 = get_conv_block(128, 256)
        # passes dummy x matrix to find the input size of the fc layer
        x = torch.randn(1, 3, 800, 600)
        self._to_linear = None
        self.forward(x)
        # defines fc layer
        self.fc_layer = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=self._to_linear,
                      out_features=256),
            nn.ReLU(),
            nn.Linear(256, 5)
        )
        # defines accuracy metric
        self.accuracy = pl.metrics.Accuracy()
        self.confusion_matrix = pl.metrics.ConfusionMatrix(num_classes=5)

    def forward(self, x):
        x = self.conv_layer_b1(x)
        x = self.conv_layer_b2(x)
        x = self.conv_layer_b3(x)
        x = self.conv_layer_b4(x)
        x = self.conv_layer_b5(x)
        if self._to_linear is None:
            # does not run fc layer if input size is not determined yet
            self._to_linear = nn.Flatten()(x).shape[1]
        else:
            x = self.fc_layer(x)
        return x
Here, because more conv-relu-pool layers are applied, the input is reduced to a feature map of a much smaller shape, (batch, 256, 25, 18), and the overall number of trainable parameters would be reduced to about 30 million.
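For reference, a rough breakdown of where those ~30 million parameters come from, assuming the 800x600 input (so the final feature map is 256 x 25 x 18 = 115200 features):
conv_params = sum([
    3 * 16 * 9 + 16,      # conv_layer_b1
    16 * 32 * 9 + 32,     # conv_layer_b2
    32 * 64 * 9 + 64,     # conv_layer_b3
    64 * 128 * 9 + 128,   # conv_layer_b4
    128 * 256 * 9 + 256,  # conv_layer_b5
])                        # about 0.39 million
fc_params = 115200 * 256 + 256 + 256 * 5 + 5   # about 29.5 million
print(conv_params + fc_params)                 # about 29.9 million in total
Almost all of the remaining parameters still sit in the first Linear layer, which is why reducing the input resolution (or adding more pooling) helps so much.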