Changing Learning Rate According to Layer Width in PyTorch

I am trying to train a network where the learning rate for each layer scales with 1/(layer width). Is there a way to do this in PyTorch? I tried changing the learning rate in the optimizer and including it in my training loop, but that didn't work. I've seen some people discuss this for Adam, but I am using SGD. Here are the chunks where I define my model and training loop, if that's any help.
import torch
from torch import nn, optim

class ConvNet2(nn.Module):
    def __init__(self):
        super(ConvNet2, self).__init__()
        self.network = nn.Sequential(
            nn.Conv2d(3, 8, 3),
            nn.ReLU(),
            nn.Conv2d(8, 32, 3),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 32, 3),
            nn.ReLU(),
            nn.Conv2d(32, 32, 3),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Flatten(),
            nn.Linear(800, 10)
        )

    def forward(self, x):
        return self.network(x)

net2 = ConvNet2().to(device)
def train(network, number_of_epochs):
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(network.parameters(), lr=learning_rate)
    for epoch in range(number_of_epochs):  # loop over the dataset multiple times
        running_loss = 0.0
        for i, (inputs, labels) in enumerate(trainloader):
            # get the inputs
            inputs = inputs.to(device)
            labels = labels.to(device)
            # zero the parameter gradients
            optimizer.zero_grad()
            # forward + backward + optimize (a single forward pass is enough)
            outputs = network(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

The torch.optim documentation describes "per-parameter options": instead of a single iterable of parameters, you pass a list of dicts, each with its own "params" and, optionally, its own "lr". Assuming you only want to set the learning rate for the Conv2d layers (this is easy to customize in the code below), you could do something like this:
import torch
from torch import nn
from torch import optim

class ConvNet2(nn.Module):
    def __init__(self):
        super(ConvNet2, self).__init__()
        self.network = nn.Sequential(
            nn.Conv2d(3, 8, 3),
            nn.ReLU(),
            nn.Conv2d(8, 32, 3),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 32, 3),
            nn.ReLU(),
            nn.Conv2d(32, 32, 3),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Flatten(),
            nn.Linear(800, 10)
        )

    def forward(self, x):
        return self.network(x)

net2 = ConvNet2()

def getParameters(model):
    getWidthConv2D = lambda layer: layer.out_channels
    parameters = []
    for layer in model.children():
        layer_params = list(layer.parameters())
        if not layer_params:
            continue  # ReLU, MaxPool2d and Flatten have nothing to optimize
        paramdict = {'params': layer_params}
        if isinstance(layer, nn.Conv2d):
            # Specify learning rate for Conv2D here: 0.1 scaled by 1 / (layer width)
            paramdict['lr'] = 0.1 / getWidthConv2D(layer)
        parameters.append(paramdict)
    return parameters

optimizer = optim.SGD(getParameters(net2.network), lr=0.05)
print(optimizer)
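A slightly more general sketch of the same idea, assuming "width" means out_channels for Conv2d and out_features for Linear (an interpretation; the question does not pin it down). The helper name width_scaled_param_groups is hypothetical. It walks model.modules() and collects only the parameters each module owns directly, so it also works for nested models and never puts a parameter into two groups:

```python
from torch import nn, optim


def width_scaled_param_groups(model: nn.Module, base_lr: float) -> list:
    """Hypothetical helper: one SGD param group per module, lr = base_lr / width.

    Assumption: "width" is out_channels for Conv2d and out_features for Linear.
    Other modules that own parameters get a group without "lr", so they use
    the optimizer's default learning rate.
    """
    groups = []
    for module in model.modules():
        # recurse=False: only parameters owned directly by this module, so a
        # parameter is never placed in two groups, even for nested models.
        params = list(module.parameters(recurse=False))
        if not params:
            continue  # containers, ReLU, MaxPool2d, Flatten, ...
        if isinstance(module, nn.Conv2d):
            groups.append({"params": params, "lr": base_lr / module.out_channels})
        elif isinstance(module, nn.Linear):
            groups.append({"params": params, "lr": base_lr / module.out_features})
        else:
            groups.append({"params": params})
    return groups


# Usage with a model shaped like ConvNet2.network from the question.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(),
    nn.Conv2d(8, 32, 3), nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(32, 32, 3), nn.ReLU(),
    nn.Conv2d(32, 32, 3), nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Flatten(),
    nn.Linear(800, 10),
)
optimizer = optim.SGD(width_scaled_param_groups(model, base_lr=0.8), lr=0.05)
for group in optimizer.param_groups:
    print(group["lr"])
```

Modules with parameters that are neither Conv2d nor Linear (for example BatchNorm2d) get a group without "lr" and therefore fall back to the default passed to optim.SGD.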

You can do that by passing the relevant parameters with associated learning rates.
optimizer = optim.SGD(
    [
        {"params": network.network[0].parameters(), "lr": 1e-1},
        {"params": network.network[2].parameters(), "lr": 1e-2},
        # ... one dict per remaining layer with parameters
    ],
    lr=1e-3,
)
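A runnable version of this pattern, using a hypothetical stand-in that mirrors ConvNet2.network from the question (the learning-rate values are illustrative only). Indices 1, 3, 4, 6, 8, 9 and 10 are ReLU, MaxPool2d or Flatten and have no parameters, so only the Conv2d and Linear indices appear. The last group omits "lr" and therefore uses the default lr=1e-3:

```python
from torch import nn, optim

# Hypothetical stand-in with the same layout as ConvNet2.network in the question.
network = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(),
    nn.Conv2d(8, 32, 3), nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(32, 32, 3), nn.ReLU(),
    nn.Conv2d(32, 32, 3), nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Flatten(),
    nn.Linear(800, 10),
)

optimizer = optim.SGD(
    [
        {"params": network[0].parameters(), "lr": 1e-1},  # Conv2d(3, 8)
        {"params": network[2].parameters(), "lr": 1e-2},  # Conv2d(8, 32)
        {"params": network[5].parameters(), "lr": 1e-2},  # Conv2d(32, 32)
        {"params": network[7].parameters(), "lr": 1e-2},  # Conv2d(32, 32)
        {"params": network[11].parameters()},  # Linear: no "lr" key, uses lr=1e-3
    ],
    lr=1e-3,
)

for group in optimizer.param_groups:
    print(group["lr"])
```

Because each group's rate lives in optimizer.param_groups, it can also be changed later, for example inside the training loop, by assigning to group["lr"].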

Related

How to best visualize a CNN architecture? (experience using "PlotNeuralNet")

I'm writing a thesis and want to present a visualisation of the CNN architecture used for the analysis (written in PyTorch). I came across this cool repository, PlotNeuralNet, with examples of how to generate LaTeX code for drawing neural networks for reports and presentations. However, I'm having trouble finding out how exactly to define my particular architecture.
Here is an example of how one would define an architecture.
import sys
sys.path.append('../')
from pycore.tikzeng import *

# define your arch
arch = [
    to_head('..'),
    to_cor(),
    to_begin(),
    to_Conv("conv1", 512, 64, offset="(0,0,0)", to="(0,0,0)", height=64, depth=64, width=2),
    to_Pool("pool1", offset="(0,0,0)", to="(conv1-east)"),
    to_Conv("conv2", 128, 64, offset="(1,0,0)", to="(pool1-east)", height=32, depth=32, width=2),
    to_connection("pool1", "conv2"),
    to_Pool("pool2", offset="(0,0,0)", to="(conv2-east)", height=28, depth=28, width=1),
    to_SoftMax("soft1", 10, "(3,0,0)", "(pool1-east)", caption="SOFT"),
    to_connection("pool2", "soft1"),
    to_Sum("sum1", offset="(1.5,0,0)", to="(soft1-east)", radius=2.5, opacity=0.6),
    to_connection("soft1", "sum1"),
    to_end()
]

def main():
    namefile = str(sys.argv[0]).split('.')[0]
    to_generate(arch, namefile + '.tex')

if __name__ == '__main__':
    main()
However, even after looking at the blocks available in the pycore module, I'm still not able to use the tool. The usage documentation is not very elaborate, so I was hoping someone here would find it trivial to define the architecture below. Else, any good ways to
class Net20(nn.Module):
    """ CNN for 20-day Image
    This particular model should have:
    - 3 blocks
    - 64 layers in first block, multiply by 2 each subsequent block
    - filter size (5,3)
    - vertical stride = 3 (but only in first layer)
    - vertical dilation = 2 (but only in first layer)
    - Leaky Relu activation function
    - max pooling (2,1) at the end of each block
    """

    def __init__(self):
        super().__init__()
        # Conv2dSame is a custom "same"-padding convolution; it is not defined in this snippet
        self.layer1 = nn.Sequential(
            Conv2dSame(1, 64, kernel_size=(5,3), stride=(3,1), dilation=(2,1)),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(negative_slope=0.01, inplace=True),
            nn.MaxPool2d((2, 1), ceil_mode=True)
        )
        self.layer2 = nn.Sequential(
            Conv2dSame(64, 128, kernel_size=(5,3)),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(negative_slope=0.01, inplace=True),
            nn.MaxPool2d((2, 1), ceil_mode=True)
        )
        self.layer3 = nn.Sequential(
            Conv2dSame(128, 256, kernel_size=(5,3)),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(negative_slope=0.01, inplace=True),
            nn.MaxPool2d((2, 1), ceil_mode=True)
        )
        self.fc1 = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(46080, 1),
        )

    def forward(self, x):
        x = x.reshape(-1,1,64,60)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = x.reshape(-1,46080)
        x = self.fc1(x)
        return x

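You can try model.summary() or keras.utils.plot_model, although both are Keras utilities and only work on Keras models, not on a PyTorch nn.Module such as Net20. You may want to check: https://machinelearningmastery.com/visualize-deep-learning-neural-network-model-keras/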
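On the PyTorch side, one option is to record each layer's output shape with forward hooks; the shapes can help when choosing the sizes passed to PlotNeuralNet's to_Conv. This is a sketch: trace_output_shapes is a hypothetical helper, and because Conv2dSame is not defined in the question, the example uses a plain nn.Conv2d with an assumed padding of (4, 1), so the shapes are illustrative rather than Net20's exact ones:

```python
import torch
from torch import nn


def trace_output_shapes(model: nn.Module, example_input: torch.Tensor) -> list:
    """Hypothetical helper: run one forward pass and return
    (module name, class name, output shape) for every leaf module, in call order."""
    records, handles = [], []
    for name, module in model.named_modules():
        if list(module.children()):
            continue  # skip containers such as nn.Sequential
        def hook(mod, inputs, output, name=name):
            records.append((name, type(mod).__name__, tuple(output.shape)))
        handles.append(module.register_forward_hook(hook))
    try:
        with torch.no_grad():
            model(example_input)
    finally:
        for handle in handles:
            handle.remove()  # leave the model without hooks afterwards
    return records


# Stand-in for Net20.layer1: plain nn.Conv2d with an ASSUMED padding of (4, 1),
# because Conv2dSame is not defined in the question.
block = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=(5, 3), stride=(3, 1), dilation=(2, 1), padding=(4, 1)),
    nn.BatchNorm2d(64),
    nn.LeakyReLU(negative_slope=0.01),
    nn.MaxPool2d((2, 1), ceil_mode=True),
).eval()

rows = trace_output_shapes(block, torch.zeros(1, 1, 64, 60))
for row in rows:
    print(row)
```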

CrossEntropyLoss equivalence to LogSoftmax + NLLLoss

According to the docs, the CrossEntropyLoss criterion combines the LogSoftmax function and the NLLLoss criterion.
That is all fine and well, but testing it doesn't seem to substantiate this claim (i.e., the assertion fails):
import torch
from torch import nn

model_nll = nn.Sequential(nn.Linear(3072, 1024),
                          nn.Tanh(),
                          nn.Linear(1024, 512),
                          nn.Tanh(),
                          nn.Linear(512, 128),
                          nn.Tanh(),
                          nn.Linear(128, 2),
                          nn.LogSoftmax(dim=1))
model_ce = nn.Sequential(nn.Linear(3072, 1024),
                         nn.Tanh(),
                         nn.Linear(1024, 512),
                         nn.Tanh(),
                         nn.Linear(512, 128),
                         nn.Tanh(),
                         nn.Linear(128, 2),
                         nn.LogSoftmax(dim=1))
loss_fn_ce = nn.CrossEntropyLoss()
loss_fn_nll = nn.NLLLoss()
t = torch.rand(1, 3072)
target = torch.tensor([1])
with torch.no_grad():
    loss_nll = loss_fn_nll(model_nll(t), target)
    loss_ce = loss_fn_ce(model_ce(t), target)
assert torch.eq(loss_nll, loss_ce)
I'm obviously missing something basic here.

The weights of the two models are initialized randomly, so they differ.
One way to make two modules share the same weights is to export the state of one with state_dict and load it into the other with load_state_dict.
This is a one-liner:
>>> model_ce.load_state_dict(model_nll.state_dict())
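A self-contained sketch of this fix, keeping LogSoftmax at the end of both models as in the question (make_model is a hypothetical helper). LogSoftmax has no parameters, so the state_dict keys of the two models match. Applying CrossEntropyLoss on top of LogSoftmax still gives the same value mathematically, because log-probabilities already have a log-sum-exp of zero, so a second log-softmax leaves them unchanged. The extra operations can introduce rounding differences, so torch.allclose is a safer comparison than torch.eq:

```python
import torch
from torch import nn


def make_model() -> nn.Sequential:
    # Hypothetical helper reproducing the question's architecture, LogSoftmax included.
    return nn.Sequential(
        nn.Linear(3072, 1024), nn.Tanh(),
        nn.Linear(1024, 512), nn.Tanh(),
        nn.Linear(512, 128), nn.Tanh(),
        nn.Linear(128, 2),
        nn.LogSoftmax(dim=1),  # no parameters, so both state_dicts have the same keys
    )


model_nll = make_model()
model_ce = make_model()
model_ce.load_state_dict(model_nll.state_dict())  # copy weights: both models now match

t = torch.rand(1, 3072)
target = torch.tensor([1])
with torch.no_grad():
    loss_nll = nn.NLLLoss()(model_nll(t), target)
    loss_ce = nn.CrossEntropyLoss()(model_ce(t), target)

# Compare with a tolerance instead of exact equality.
print(torch.allclose(loss_nll, loss_ce))
```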

The following assertion passes:
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(3072, 1024),
    nn.Tanh(),
    nn.Linear(1024, 512),
    nn.Tanh(),
    nn.Linear(512, 128),
    nn.Tanh(),
    nn.Linear(128, 2),
)
loss_fn_nll = nn.NLLLoss()
loss_fn_ce = nn.CrossEntropyLoss()
t = torch.rand(1, 3072)
target = torch.tensor([1])
with torch.no_grad():
    loss_nll = loss_fn_nll(nn.LogSoftmax(dim=1)(model(t)), target)
    loss_ce = loss_fn_ce(model(t), target)
assert torch.eq(loss_nll, loss_ce)
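I assume the weights are randomly initialized separately in the two networks in the original question. Calling torch.manual_seed(0) once before building both networks does not make them equal either, because the second network draws its initial weights from a later point in the random number stream.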
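The torch.manual_seed(0) observation can be checked with a short sketch (make_body is a hypothetical helper for the layers shared by both models). Seeding once gives the two models different weights, while reseeding immediately before each construction gives them identical ones. This assumes both models create their layers in the same order:

```python
import torch
from torch import nn


def make_body() -> nn.Sequential:
    # Hypothetical helper: the layers shared by both models in the question.
    return nn.Sequential(
        nn.Linear(3072, 1024), nn.Tanh(),
        nn.Linear(1024, 512), nn.Tanh(),
        nn.Linear(512, 128), nn.Tanh(),
        nn.Linear(128, 2),
    )


# Seeding once: the second model continues from a later point in the RNG stream,
# so its initial weights differ from the first model's.
torch.manual_seed(0)
a = make_body()
b = make_body()

# Reseeding right before each construction replays the same random stream,
# so both models start from identical weights.
torch.manual_seed(0)
model_nll = nn.Sequential(make_body(), nn.LogSoftmax(dim=1))
torch.manual_seed(0)
model_ce = make_body()  # no LogSoftmax: CrossEntropyLoss applies it internally

t = torch.rand(1, 3072)
target = torch.tensor([1])
with torch.no_grad():
    loss_nll = nn.NLLLoss()(model_nll(t), target)
    loss_ce = nn.CrossEntropyLoss()(model_ce(t), target)
print(torch.allclose(loss_nll, loss_ce))
```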

import torch
import torch.nn as nn

model_nll = nn.Sequential(nn.Linear(3072, 1024),
                          nn.Tanh(),
                          nn.Linear(1024, 512),
                          nn.Tanh(),
                          nn.Linear(512, 128),
                          nn.Tanh(),
                          nn.Linear(128, 2),
                          nn.LogSoftmax(dim=1))
model_ce = nn.Sequential(nn.Linear(3072, 1024),
                         nn.Tanh(),
                         nn.Linear(1024, 512),
                         nn.Tanh(),
                         nn.Linear(512, 128),
                         nn.Tanh(),
                         nn.Linear(128, 2)
                         )
model_nll.load_state_dict(dict(model_ce.named_parameters()))
loss_fn_ce = nn.CrossEntropyLoss()
loss_fn_nll = nn.NLLLoss()
t = torch.rand(1, 3072)
target = torch.tensor([1])
with torch.no_grad():
    loss_nll = loss_fn_nll(model_nll(t), target)
    loss_ce = loss_fn_ce(model_ce(t), target)
print(loss_nll, loss_ce)
assert torch.eq(loss_nll, loss_ce)
Your code has two problems:
1. The weights must be identical in both models. Initialization is random, so you must force them to be the same.
2. You should not add the LogSoftmax to model_ce: CrossEntropyLoss already computes it internally. Computing both steps together is numerically more stable, because it simplifies the derivative and allows the log-sum-exp trick to be applied.
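To illustrate the stability point, here is a sketch of cross-entropy computed directly from logits with the log-sum-exp trick, compared against F.cross_entropy and against a naive softmax-then-log version. stable_cross_entropy is a hypothetical helper, not a description of how PyTorch implements the loss internally. The last part checks the simplified derivative: the gradient of the summed loss with respect to the logits is softmax(x) minus the one-hot target:

```python
import torch
import torch.nn.functional as F


def stable_cross_entropy(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: mean cross-entropy from raw logits.

    loss_i = logsumexp(x_i) - x_i[y_i], where
    logsumexp(x) = m + log(sum(exp(x - m))) and m = max(x).
    Subtracting m keeps every exp() argument <= 0, so nothing overflows.
    """
    m = logits.max(dim=1, keepdim=True).values
    lse = m.squeeze(1) + torch.log(torch.exp(logits - m).sum(dim=1))
    picked = logits.gather(1, target.unsqueeze(1)).squeeze(1)
    return (lse - picked).mean()


logits = torch.tensor([[1000.0, 0.0], [0.0, 1000.0]])  # exp(1000) overflows float32
target = torch.tensor([1, 1])

# Naive version: softmax via exp, then log. exp(1000) is inf, so the result is not finite.
probs = torch.exp(logits) / torch.exp(logits).sum(dim=1, keepdim=True)
naive = -torch.log(probs.gather(1, target.unsqueeze(1))).mean()

stable = stable_cross_entropy(logits, target)
reference = F.cross_entropy(logits, target)

# Gradient check: d(sum of losses)/d(logits) = softmax(x) - one_hot(y).
x = torch.randn(3, 5, requires_grad=True)
y = torch.tensor([0, 2, 4])
F.cross_entropy(x, y, reduction="sum").backward()
expected_grad = torch.softmax(x.detach(), dim=1) - F.one_hot(y, num_classes=5)
```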

How can I duplicate the Resnet50 to five branches?

Below you see the network structure of the ResNet50. What I want to do is duplicate the last convolution layers into five branches for a specific task, where each branch will consist of two FC layers. How can I do that in PyTorch, where ResNet50 is already loaded as
ResNet50 = torchvision.models.resnet50(pretrained=True)

One way to accomplish this is to index the children of the pretrained model and then attach a Sequential after the desired pair of conv blocks. A good example of this pattern is the implementation below, which is built on VGG16; you can apply the same principle with your ResNet in place of the VGG. Pay close attention to how it slices the model and then adds a linear Sequential.
from torch import nn
from torchvision import models

class BCNN(nn.Module):
    def __init__(self):
        super(BCNN, self).__init__()
        # Load pretrained model
        vgg_model = models.vgg16_bn(pretrained=True)
        self.Conv1 = nn.Sequential(*list(vgg_model.features.children())[0:7])
        self.Conv2 = nn.Sequential(*list(vgg_model.features.children())[7:14])
        # Level-1 classifier after second conv block
        self.level_one_clf = nn.Sequential(nn.Linear(128*56*56, 256),
                                           nn.ReLU(),
                                           nn.BatchNorm1d(256),
                                           nn.Dropout(0.5),
                                           nn.Linear(256, 256),
                                           nn.BatchNorm1d(256),
                                           nn.Dropout(0.5),
                                           nn.Linear(256, 2))
        self.Conv3 = nn.Sequential(*list(vgg_model.features.children())[14:24])
        # Level-2 classifier after third conv block
        self.level_two_clf = nn.Sequential(nn.Linear(256*28*28, 1024),
                                           nn.ReLU(),
                                           nn.BatchNorm1d(1024),
                                           nn.Dropout(0.5),
                                           nn.Linear(1024, 1024),
                                           nn.BatchNorm1d(1024),
                                           nn.Dropout(0.5),
                                           nn.Linear(1024, 7))
        self.Conv4 = nn.Sequential(*list(vgg_model.features.children())[24:34])
        self.Conv5 = nn.Sequential(*list(vgg_model.features.children())[34:44])
        # Level-3 classifier after fifth conv block
        self.level_three_clf = nn.Sequential(nn.Linear(512*7*7, 4096),
                                             nn.ReLU(),
                                             nn.BatchNorm1d(4096),
                                             nn.Dropout(0.5),
                                             nn.Linear(4096, 4096),
                                             nn.BatchNorm1d(4096),
                                             nn.Dropout(0.5),
                                             nn.Linear(4096, 25))

    def forward(self, x):
        x = self.Conv1(x)
        x = self.Conv2(x)
        lvl_one = x.view(x.size(0), -1)
        lvl_one = self.level_one_clf(lvl_one)
        x = self.Conv3(x)
        lvl_two = x.view(x.size(0), -1)
        lvl_two = self.level_two_clf(lvl_two)
        x = self.Conv4(x)
        x = self.Conv5(x)
        lvl_three = x.view(x.size(0), -1)
        lvl_three = self.level_three_clf(lvl_three)
        return lvl_one, lvl_two, lvl_three
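A sketch of the five-branch idea applied to ResNet50, under these assumptions: the shared trunk is everything up to and including layer3; each branch gets its own deep copy of layer4, so all five start from the same pretrained weights but train independently; and each branch ends in global average pooling plus two fully connected layers. MultiBranchHead, the hidden size of 256, and the tiny stand-in modules that let it run without downloading weights are all hypothetical:

```python
import copy

import torch
from torch import nn


class MultiBranchHead(nn.Module):
    """Hypothetical module: shared trunk, then `num_branches` copies of the last
    conv stage, each followed by its own two-layer FC head."""

    def __init__(self, trunk, last_stage, feat_dim, num_classes, num_branches=5, hidden=256):
        super().__init__()
        self.trunk = trunk
        self.branches = nn.ModuleList()
        for _ in range(num_branches):
            self.branches.append(nn.Sequential(
                copy.deepcopy(last_stage),  # independent copy of the (pretrained) stage
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(feat_dim, hidden),  # FC layer 1
                nn.ReLU(),
                nn.Linear(hidden, num_classes),  # FC layer 2
            ))

    def forward(self, x):
        shared = self.trunk(x)  # computed once, reused by every branch
        return [branch(shared) for branch in self.branches]


# With torchvision (not run here), the ResNet50 split could look like:
#   resnet = torchvision.models.resnet50(pretrained=True)
#   trunk = nn.Sequential(*list(resnet.children())[:-3])  # conv1 ... layer3
#   model = MultiBranchHead(trunk, resnet.layer4, feat_dim=2048, num_classes=NUM_CLASSES)

# Tiny stand-in modules so the sketch runs without downloading weights.
trunk = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
last_stage = nn.Sequential(nn.Conv2d(8, 16, 3), nn.ReLU())
model = MultiBranchHead(trunk, last_stage, feat_dim=16, num_classes=4)
outputs = model(torch.randn(2, 3, 32, 32))
```

For a real ResNet50, feat_dim is 2048 because layer4 outputs 2048 channels. Duplicating layer4 five times also multiplies the parameters of that stage by five, so memory use grows accordingly.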

Implement a Network in Network CNN model using pytorch-lightning

I am trying to implement a NiN model, basically replicating the code from d2l. Here is my code.
import pandas as pd
import torch
from torch import nn
import torchmetrics
from torchvision import transforms
from torch.utils.data import DataLoader, random_split
import pytorch_lightning as pl
from torchvision.datasets import FashionMNIST
import wandb
from pytorch_lightning.loggers import WandbLogger

wandb.login()

## class definition
class Lightning_nin(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.accuracy = torchmetrics.Accuracy(top_k=1)
        self.model = nn.Sequential(
            self.nin_block(1, 96, kernel_size=11, strides=4, padding=0),
            nn.MaxPool2d(3, stride=2),
            self.nin_block(96, 256, kernel_size=5, strides=1, padding=2),
            nn.MaxPool2d(3, stride=2),
            self.nin_block(256, 384, kernel_size=3, strides=1, padding=1),
            nn.MaxPool2d(3, stride=2), nn.Dropout(0.5),
            # There are 10 label classes
            self.nin_block(384, 10, kernel_size=3, strides=1, padding=1),
            nn.AdaptiveAvgPool2d((1, 1)),
            # Transform the four-dimensional output into two-dimensional output with a
            # shape of (batch size, 10)
            nn.Flatten())
        for layer in self.model:
            if type(layer) == nn.Linear or type(layer) == nn.Conv2d:
                nn.init.xavier_uniform_(layer.weight)

    def nin_block(self, in_channels, out_channels, kernel_size, strides, padding):
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, strides, padding),
            nn.ReLU(), nn.Conv2d(out_channels, out_channels, kernel_size=1),
            nn.ReLU(), nn.Conv2d(out_channels, out_channels, kernel_size=1),
            nn.ReLU())

    def forward(self, x):
        x = self.model(x)
        return x

    def loss_fn(self, logits, y):
        loss = nn.CrossEntropyLoss()
        return loss(logits, y)

    def training_step(self, train_batch, batch_idx):
        X, y = train_batch
        logits = self.forward(X)
        loss = self.loss_fn(logits, y)
        self.log('train_loss', loss)
        m = nn.Softmax(dim=1)
        output = m(logits)
        self.log('train_acc', self.accuracy(output, y))
        return loss

    def validation_step(self, val_batch, batch_idx):
        X, y = val_batch
        logits = self.forward(X)
        loss = self.loss_fn(logits, y)
        self.log('test_loss', loss)
        m = nn.Softmax(dim=1)
        output = m(logits)
        self.log('test_acc', self.accuracy(output, y))

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.model.parameters(), lr=0.1)
        return optimizer

class Light_DataModule(pl.LightningDataModule):
    def __init__(self, resize=None):
        super().__init__()
        self.resize = resize  # always set, so setup() can test it even when resize is None

    def setup(self, stage):
        # transforms for images
        trans = [transforms.ToTensor()]
        if self.resize:
            trans.insert(0, transforms.Resize(self.resize))
        trans = transforms.Compose(trans)
        # prepare transforms standard to MNIST
        self.mnist_train = FashionMNIST(root="../data", train=True, download=True, transform=trans)
        self.mnist_test = FashionMNIST(root="../data", train=False, download=True, transform=trans)

    def train_dataloader(self):
        return DataLoader(self.mnist_train, batch_size=128, shuffle=True, num_workers=4)

    def val_dataloader(self):
        return DataLoader(self.mnist_test, batch_size=128, num_workers=4)

## Train model
data_module = Light_DataModule(resize=224)
wandb_logger = WandbLogger(project="d2l", name='NIN')
model = Lightning_nin()
trainer = pl.Trainer(logger=wandb_logger, max_epochs=4, gpus=1, progress_bar_refresh_rate=1)
trainer.fit(model, data_module)
wandb.finish()
After running the code, I only get an accuracy of 0.1. I have been able to implement other CNNs (like VGG) using the same template, so I'm not sure where I am going wrong. The accuracy should be close to 0.9 after 10 epochs.

The kernel_size and strides are very big for an image size of 224. They drastically reduce the information that is passed on to subsequent layers. Try reducing them. Also, VGG was a very carefully designed architecture.
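Separately from the kernel-size suggestion, one detail in the question's code may be worth checking: the Xavier loop iterates over self.model, which yields only its direct children. Every Conv2d sits inside a nin_block Sequential, so the top-level type check never matches and the default initialization is kept. Whether this explains the 0.1 accuracy is not established here. A sketch using Module.apply, which visits submodules recursively, with a hypothetical init_weights function:

```python
from torch import nn


def nin_block(in_channels, out_channels, kernel_size, strides, padding):
    # Same layer layout as the question's nin_block.
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size, strides, padding), nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),
    )


model = nn.Sequential(nin_block(1, 96, 11, 4, 0), nn.MaxPool2d(3, stride=2))

# Iterating an nn.Sequential yields only its direct children (here: a Sequential
# and a MaxPool2d), so a top-level type check never finds the nested Conv2d layers.
top_level_convs = [layer for layer in model if isinstance(layer, nn.Conv2d)]

initialized = []


def init_weights(m):
    # Hypothetical init function: Xavier-uniform for every Conv2d/Linear weight.
    if isinstance(m, (nn.Linear, nn.Conv2d)):
        nn.init.xavier_uniform_(m.weight)
        initialized.append(m)


# Module.apply calls init_weights on every submodule recursively, nested ones included.
model.apply(init_weights)
```

In the LightningModule, the equivalent would be calling self.model.apply(init_weights) in place of the for loop.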

Neural Network cannot overfit even one sample

I am using a neural network for a regression task.
My input is a gray image whose size is 100x70x1.
The gray area has a single value, 60.
The input goes through a preprocessing layer, which multiplies every pixel value by 1./255.
My output is just three floating-point numbers: [0.87077969, 0.98989031, 0.98888382]
I used the ResNet152 model shown below:
import tensorflow as tf

class Bottleneck(tf.keras.Model):
    expansion = 4

    def __init__(self, in_channels, out_channels, strides=1):
        super(Bottleneck, self).__init__()
        self.conv1 = tf.keras.layers.Conv2D(out_channels, 1, 1, use_bias=False)
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.conv2 = tf.keras.layers.Conv2D(out_channels, 3, strides, padding="same", use_bias=False)
        self.bn2 = tf.keras.layers.BatchNormalization()
        self.conv3 = tf.keras.layers.Conv2D(out_channels*self.expansion, 1, 1, use_bias=False)
        self.bn3 = tf.keras.layers.BatchNormalization()
        if strides != 1 or in_channels != self.expansion * out_channels:
            self.shortcut = tf.keras.Sequential([
                tf.keras.layers.Conv2D(self.expansion*out_channels, kernel_size=1,
                                       strides=strides, use_bias=False),
                tf.keras.layers.BatchNormalization()]
            )
        else:
            self.shortcut = lambda x, _: x

    def call(self, x, training=False):
        out = tf.nn.elu(self.bn1(self.conv1(x), training))
        out = tf.nn.elu(self.bn2(self.conv2(out), training))
        out = self.bn3(self.conv3(out), training)
        out += self.shortcut(x, training)
        return tf.nn.elu(out)

class ResNet(tf.keras.Model):
    def __init__(self, block, num_blocks):
        super(ResNet, self).__init__()
        self.in_channels = 64
        self.conv1 = tf.keras.layers.Conv2D(64, 7, 2, padding="same", use_bias=False)  # 60x60
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.pool1 = tf.keras.layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same')  # 30x30
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.avg_pool2d = tf.keras.layers.GlobalAveragePooling2D()
        self.flatten = tf.keras.layers.Flatten()

    def _make_layer(self, block, out_channels, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_channels, out_channels, stride))
            self.in_channels = out_channels * block.expansion
        return tf.keras.Sequential(layers)

    def call(self, x, training=False):
        out = self.pool1(tf.nn.elu(self.bn1(self.conv1(x), training)))
        out = self.layer1(out, training=training)
        out = self.layer2(out, training=training)
        out = self.layer3(out, training=training)
        out = self.layer4(out, training=training)
        # For classification
        out = self.flatten(out)
        # out = tf.keras.layers.Reshape((out.shape[-1],))(out)
        # out = self.linear(out)
        return out

    def model(self):
        x = tf.keras.layers.Input(shape=(100,70,1))
        return tf.keras.Model(inputs=[x], outputs=self.call(x))

def ResNet152():
    return ResNet(Bottleneck, [3,8,36,3])
I used ELU as the activation function and replaced the GlobalAveragePooling layer with a Flatten layer at the end of the ResNet.
Before the output, I stack two Dense layers (2048 units and 3 units) on top of the ResNet model.
For training I used the Adam optimizer with an initial learning rate of 1e-4, which is decreased by a factor of 10 whenever val_loss has not decreased for 3 epochs.
The loss is plain MSE.
After early stopping, when the learning rate had reached 1e-8, the MSE loss is still very high: 8.6225.
The prediction is [2.92318237, 5.53124916, 3.00686643], which is far away from the ground truth: [0.87077969, 0.98989031, 0.98888382].
I don't know why such a deep network cannot overfit a single sample.
Is the reason that my input image carries too little information? Could someone help me?
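The learning-rate schedule described in the question maps onto Keras callbacks roughly as follows. This is a sketch: min_lr=1e-8 is an assumption based on the final rate reported, and the early-stopping patience is not given in the question, so the value 10 is hypothetical:

```python
import tensorflow as tf

# Start at 1e-4 and divide the learning rate by 10 whenever val_loss
# has not improved for 3 epochs.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",
    factor=0.1,
    patience=3,
    min_lr=1e-8,  # assumption: floor matching the final lr reported in the question
)
# Assumption: the question does not state the early-stopping patience.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)

# model.compile(optimizer=optimizer, loss="mse")
# model.fit(x, y, validation_data=(x_val, y_val), callbacks=[reduce_lr, early_stop])
```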
