CrossEntropyLoss equivalence to LogSoftmax + NLLLoss - deep-learning

According to the docs, the CrossEntropyLoss criterion combines the LogSoftmax function and the NLLLoss criterion.
That is all fine and well, but testing it doesn't seem to substantiate this claim (i.e., the assertion below fails):
model_nll = nn.Sequential(nn.Linear(3072, 1024),
                          nn.Tanh(),
                          nn.Linear(1024, 512),
                          nn.Tanh(),
                          nn.Linear(512, 128),
                          nn.Tanh(),
                          nn.Linear(128, 2),
                          nn.LogSoftmax(dim=1))

model_ce = nn.Sequential(nn.Linear(3072, 1024),
                         nn.Tanh(),
                         nn.Linear(1024, 512),
                         nn.Tanh(),
                         nn.Linear(512, 128),
                         nn.Tanh(),
                         nn.Linear(128, 2),
                         nn.LogSoftmax(dim=1))

loss_fn_ce = nn.CrossEntropyLoss()
loss_fn_nll = nn.NLLLoss()

t = torch.rand(1, 3072)
target = torch.tensor([1])

with torch.no_grad():
    loss_nll = loss_fn_nll(model_nll(t), target)
    loss_ce = loss_fn_ce(model_ce(t), target)

assert torch.eq(loss_nll, loss_ce)
I'm obviously missing something basic here.

As you noticed, the weights are initialized randomly.
One way to get two modules sharing the same weights is to export the state of one with state_dict and set it on the other with load_state_dict.
This is a one-liner:
>>> model_ce.load_state_dict(model_nll.state_dict())

The following assertion passes:
model = nn.Sequential(
    nn.Linear(3072, 1024),
    nn.Tanh(),
    nn.Linear(1024, 512),
    nn.Tanh(),
    nn.Linear(512, 128),
    nn.Tanh(),
    nn.Linear(128, 2),
)

loss_fn_nll = nn.NLLLoss()
loss_fn_ce = nn.CrossEntropyLoss()

t = torch.rand(1, 3072)
target = torch.tensor([1])

with torch.no_grad():
    loss_nll = loss_fn_nll(nn.LogSoftmax(dim=1)(model(t)), target)
    loss_ce = loss_fn_ce(model(t), target)

assert torch.eq(loss_nll, loss_ce)
I assume the weights are randomly generated for the two networks in the original question. Even with torch.manual_seed(0) the two losses still don't match.

import torch
import torch.nn as nn

model_nll = nn.Sequential(nn.Linear(3072, 1024),
                          nn.Tanh(),
                          nn.Linear(1024, 512),
                          nn.Tanh(),
                          nn.Linear(512, 128),
                          nn.Tanh(),
                          nn.Linear(128, 2),
                          nn.LogSoftmax(dim=1))

model_ce = nn.Sequential(nn.Linear(3072, 1024),
                         nn.Tanh(),
                         nn.Linear(1024, 512),
                         nn.Tanh(),
                         nn.Linear(512, 128),
                         nn.Tanh(),
                         nn.Linear(128, 2))

# copy the weights so both models are identical
model_nll.load_state_dict(dict(model_ce.named_parameters()))

loss_fn_ce = nn.CrossEntropyLoss()
loss_fn_nll = nn.NLLLoss()

t = torch.rand(1, 3072)
target = torch.tensor([1])

with torch.no_grad():
    loss_nll = loss_fn_nll(model_nll(t), target)
    loss_ce = loss_fn_ce(model_ce(t), target)

print(loss_nll, loss_ce)
assert torch.eq(loss_nll, loss_ce)
Your code has two problems:
The weights must be identical in both models. Initialization is random, so you must force them to be the same, for example with load_state_dict as shown above.
You should not add a LogSoftmax layer to model_ce: CrossEntropyLoss already applies it internally. Keeping the two fused makes the computation numerically more stable: it simplifies the derivative and allows the log-sum-exp trick to be applied.
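The equivalence is also easy to check directly on the functional forms, without any model at all. A minimal sketch with random logits:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 2)
target = torch.tensor([1, 0, 1, 1])

# cross_entropy fuses log_softmax and nll_loss internally
loss_a = F.cross_entropy(logits, target)
loss_b = F.nll_loss(F.log_softmax(logits, dim=1), target)

assert torch.allclose(loss_a, loss_b)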

Related

how to best visualize CNN architecture? (experience using "PlotNeuralNet")

I'm writing a thesis and want to present a visualisation of the CNN architecture used for the analysis (written in PyTorch). I came across this cool repository, PlotNeuralNet, with examples of how to generate LaTeX code for drawing neural networks for reports and presentations. However, I'm having trouble finding out exactly how to define my particular architecture.
Here is an example of how one would define an architecture.
import sys
sys.path.append('../')
from pycore.tikzeng import *

# define your arch
arch = [
    to_head( '..' ),
    to_cor(),
    to_begin(),
    to_Conv("conv1", 512, 64, offset="(0,0,0)", to="(0,0,0)", height=64, depth=64, width=2 ),
    to_Pool("pool1", offset="(0,0,0)", to="(conv1-east)"),
    to_Conv("conv2", 128, 64, offset="(1,0,0)", to="(pool1-east)", height=32, depth=32, width=2 ),
    to_connection( "pool1", "conv2"),
    to_Pool("pool2", offset="(0,0,0)", to="(conv2-east)", height=28, depth=28, width=1),
    to_SoftMax("soft1", 10 ,"(3,0,0)", "(pool1-east)", caption="SOFT" ),
    to_connection("pool2", "soft1"),
    to_Sum("sum1", offset="(1.5,0,0)", to="(soft1-east)", radius=2.5, opacity=0.6),
    to_end()
]

def main():
    namefile = str(sys.argv[0]).split('.')[0]
    to_generate(arch, namefile + '.tex')

if __name__ == '__main__':
    main()
However, looking at the different blocks available in the pycore module, I'm still not able to use the tool. The usage documentation is not very elaborate, so I was hoping someone here would find it trivial to define the architecture below. Otherwise, are there any good alternative ways to visualize it?
class Net20(nn.Module):
    """ CNN for 20-day Image

    This particular model should have:
    - 3 blocks
    - 64 layers in first block, multiply by 2 each subsequent block
    - filter size (5,3)
    - vertical stride = 3 (but only in first layer)
    - vertical dilation = 2 (but only in first layer)
    - Leaky Relu activation function
    - max pooling (2,1) at the end of each block
    """
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(
            Conv2dSame(1, 64, kernel_size=(5,3), stride=(3,1), dilation=(2,1)),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(negative_slope=0.01, inplace=True),
            nn.MaxPool2d((2, 1), ceil_mode=True)
        )
        self.layer2 = nn.Sequential(
            Conv2dSame(64, 128, kernel_size=(5,3)),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(negative_slope=0.01, inplace=True),
            nn.MaxPool2d((2, 1), ceil_mode=True)
        )
        self.layer3 = nn.Sequential(
            Conv2dSame(128, 256, kernel_size=(5,3)),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(negative_slope=0.01, inplace=True),
            nn.MaxPool2d((2, 1), ceil_mode=True)
        )
        self.fc1 = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(46080, 1),
        )

    def forward(self, x):
        x = x.reshape(-1, 1, 64, 60)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = x.reshape(-1, 46080)
        x = self.fc1(x)
        return x
If you rebuild the model in Keras, you can try model.summary() or keras.utils.plot_model. You may want to check: https://machinelearningmastery.com/visualize-deep-learning-neural-network-model-keras/
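For PlotNeuralNet itself, a rough starting point for Net20 could reuse the same pycore.tikzeng calls as in the example above. This is only a sketch: the block names and the height/depth/width values are assumptions chosen to roughly reflect the shrinking feature maps, and the offsets would still need tuning by eye.

arch = [
    to_head('..'),
    to_cor(),
    to_begin(),
    # block 1: 64 filters
    to_Conv("conv1", 64, 64, offset="(0,0,0)", to="(0,0,0)", height=32, depth=30, width=2),
    to_Pool("pool1", offset="(0,0,0)", to="(conv1-east)"),
    # block 2: 128 filters
    to_Conv("conv2", 128, 128, offset="(1,0,0)", to="(pool1-east)", height=16, depth=30, width=3),
    to_connection("pool1", "conv2"),
    to_Pool("pool2", offset="(0,0,0)", to="(conv2-east)"),
    # block 3: 256 filters
    to_Conv("conv3", 256, 256, offset="(1,0,0)", to="(pool2-east)", height=8, depth=30, width=4),
    to_connection("pool2", "conv3"),
    to_Pool("pool3", offset="(0,0,0)", to="(conv3-east)"),
    # single-output fully connected head
    to_SoftMax("fc1", 1, "(3,0,0)", "(pool3-east)", caption="FC"),
    to_connection("pool3", "fc1"),
    to_end()
]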

Changing Learning Rate According to Layer Width in PyTorch

I am trying to train a network where the learning rate for each layer scales with 1/(layer width). Is there a way to do this in PyTorch? I tried changing the learning rate in the optimizer and including it in my training loop, but that didn't work. I've seen some people talk about this with Adam, but I am using SGD to train. Here are the chunks where I defined my model and training, if that's any help.
class ConvNet2(nn.Module):
    def __init__(self):
        super(ConvNet2, self).__init__()
        self.network = nn.Sequential(
            nn.Conv2d(3, 8, 3),
            nn.ReLU(),
            nn.Conv2d(8, 32, 3),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 32, 3),
            nn.ReLU(),
            nn.Conv2d(32, 32, 3),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Flatten(),
            nn.Linear(800, 10)
        )

    def forward(self, x):
        return self.network(x)

net2 = ConvNet2().to(device)

def train(network, number_of_epochs):
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(network.parameters(), lr=learning_rate)
    for epoch in range(number_of_epochs):  # loop over the dataset multiple times
        running_loss = 0.0
        for i, (inputs, labels) in enumerate(trainloader):
            # get the inputs
            inputs = inputs.to(device)
            labels = labels.to(device)
            # zero the parameter gradients
            optimizer.zero_grad()
            # forward + backward + optimize
            outputs = network(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
In the documentation you can see that you can specify "per-parameter options". Assuming you only want to specify the learning rate for the Conv2d layers (this is easily customizable in the code below) you could do something like this:
import torch
from torch import nn
from torch import optim

class ConvNet2(nn.Module):
    def __init__(self):
        super(ConvNet2, self).__init__()
        self.network = nn.Sequential(
            nn.Conv2d(3, 8, 3),
            nn.ReLU(),
            nn.Conv2d(8, 32, 3),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 32, 3),
            nn.ReLU(),
            nn.Conv2d(32, 32, 3),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Flatten(),
            nn.Linear(800, 10)
        )

    def forward(self, x):
        return self.network(x)

net2 = ConvNet2()

def getParameters(model):
    getWidthConv2D = lambda layer: layer.out_channels
    parameters = []
    for layer in model.children():
        paramdict = {'params': layer.parameters()}
        if isinstance(layer, nn.Conv2d):
            # scale the learning rate with 1/width, as the question asks
            paramdict['lr'] = 0.1 / getWidthConv2D(layer)
        parameters.append(paramdict)
    return parameters

optimizer = optim.SGD(getParameters(net2.network), lr=0.05)
print(optimizer)
You can do that by passing the relevant parameters with associated learning rates.
optimizer = optim.SGD(
    [
        {"params": network.layer[0].parameters(), "lr": 1e-1},
        {"params": network.layer[1].parameters(), "lr": 1e-2},
        ...
    ],
    lr=1e-3,
)
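To check that the groups came out as intended, you can inspect optimizer.param_groups; after construction every group carries its own lr key, with unspecified groups inheriting the default passed to SGD. A quick sketch, continuing from the optimizer built above:

for group in optimizer.param_groups:
    # groups without an explicit "lr" inherit the default lr given to SGD
    print(group['lr'], sum(p.numel() for p in group['params']))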

How can I duplicate the Resnet50 to five branches?

Below you see the network structure of ResNet50. What I want to do is duplicate the last convolution layers to five branches for some specific task, where each branch will consist of two FC layers. How can I do that in PyTorch, where ResNet50 is already loaded as
ResNet50 = torchvision.models.resnet50(pretrained=True)
One way to accomplish this is to index the children of the resnet model and then attach a sequential after that pair of conv blocks. One great implementation can be found here:
You can use this same principle to replace the vgg with your resnet. Pay close attention to how they slice the model and then add a linear sequential.
class BCNN(nn.Module):
    def __init__(self):
        super(BCNN, self).__init__()
        # Load pretrained model
        vgg_model = models.vgg16_bn(pretrained=True)
        self.Conv1 = nn.Sequential(*list(vgg_model.features.children())[0:7])
        self.Conv2 = nn.Sequential(*list(vgg_model.features.children())[7:14])
        # Level-1 classifier after second conv block
        self.level_one_clf = nn.Sequential(nn.Linear(128*56*56, 256),
                                           nn.ReLU(),
                                           nn.BatchNorm1d(256),
                                           nn.Dropout(0.5),
                                           nn.Linear(256, 256),
                                           nn.BatchNorm1d(256),
                                           nn.Dropout(0.5),
                                           nn.Linear(256, 2))
        self.Conv3 = nn.Sequential(*list(vgg_model.features.children())[14:24])
        # Level-2 classifier after third conv block
        self.level_two_clf = nn.Sequential(nn.Linear(256*28*28, 1024),
                                           nn.ReLU(),
                                           nn.BatchNorm1d(1024),
                                           nn.Dropout(0.5),
                                           nn.Linear(1024, 1024),
                                           nn.BatchNorm1d(1024),
                                           nn.Dropout(0.5),
                                           nn.Linear(1024, 7))
        self.Conv4 = nn.Sequential(*list(vgg_model.features.children())[24:34])
        self.Conv5 = nn.Sequential(*list(vgg_model.features.children())[34:44])
        # Level-3 classifier after fifth conv block
        self.level_three_clf = nn.Sequential(nn.Linear(512*7*7, 4096),
                                             nn.ReLU(),
                                             nn.BatchNorm1d(4096),
                                             nn.Dropout(0.5),
                                             nn.Linear(4096, 4096),
                                             nn.BatchNorm1d(4096),
                                             nn.Dropout(0.5),
                                             nn.Linear(4096, 25))

    def forward(self, x):
        x = self.Conv1(x)
        x = self.Conv2(x)
        lvl_one = x.view(x.size(0), -1)
        lvl_one = self.level_one_clf(lvl_one)
        x = self.Conv3(x)
        lvl_two = x.view(x.size(0), -1)
        lvl_two = self.level_two_clf(lvl_two)
        x = self.Conv4(x)
        x = self.Conv5(x)
        lvl_three = x.view(x.size(0), -1)
        lvl_three = self.level_three_clf(lvl_three)
        return lvl_one, lvl_two, lvl_three
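Applying the same slicing idea to ResNet50 directly, a minimal sketch might keep everything up to the global average pool as a shared trunk and hang five two-FC-layer branches off it. The hidden size of 512 and the per-branch output size of 10 are assumptions; the pooled feature size of 2048 is fixed by the ResNet50 architecture.

import torch.nn as nn
import torchvision

resnet50 = torchvision.models.resnet50(pretrained=True)

class FiveBranchResNet(nn.Module):
    def __init__(self, num_branches=5, hidden=512, out_dim=10):
        super().__init__()
        # everything except the final fc layer; avgpool is included
        self.trunk = nn.Sequential(*list(resnet50.children())[:-1])
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Linear(2048, hidden),
                          nn.ReLU(),
                          nn.Linear(hidden, out_dim))
            for _ in range(num_branches)
        ])

    def forward(self, x):
        feats = self.trunk(x).flatten(1)  # (N, 2048)
        return [branch(feats) for branch in self.branches]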

Tensorflow CNN shape mismatch

import torch
import torchvision
from torchvision import transforms

def load_data(data_path, batch_size, num_workers=2):
    t_m = transforms.Compose(
        [transforms.Grayscale(num_output_channels=1),
         transforms.Resize((400, 400)),
         transforms.ToTensor(),
         # transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        ])
    dataset = torchvision.datasets.ImageFolder(root=data_path, transform=t_m)
    # split
    train, test = torch.utils.data.random_split(
        dataset,
        [int(len(dataset) * 0.7), len(dataset) - int(len(dataset) * 0.7)])
    trainloader = torch.utils.data.DataLoader(train, batch_size=batch_size,
                                              shuffle=True, num_workers=num_workers, drop_last=True)
    testloader = torch.utils.data.DataLoader(test, batch_size=batch_size,
                                             shuffle=False, num_workers=num_workers, drop_last=False)
    return dataset, trainloader, testloader
import torch.nn as nn

model = torch.nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=5, padding=2),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(32, 64, kernel_size=5, padding=2),
    nn.MaxPool2d(2, 2),
    nn.Linear(7 * 7 * 64, 1000),
    nn.Linear(1000, 600),
    nn.Linear(600, 200),
    nn.Linear(200, 10)
)
# Training
total_epochs = 5
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adadelta(model.parameters())

for epoch in tqdm(range(total_epochs)):
    # initialize
    batch_count = 0
    gc.collect()
    loop_loss = 0.0
    for img in trainloader:
        input_, label_ = img
        # print(input_.shape)
        out = model(input_)
        out = nn.functional.relu(out)
        loss = criterion(out, label_)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        loop_loss = loop_loss + loss.item()
        batch_count = batch_count + 1
        print('batch_loss: ', str(loss.item()))
    print('Epochs completed:', epoch + 1, '\n')
    print('epoch_loss =', loop_loss / float(batch_count))
This fails with: size mismatch, m1: [25600 x 100], m2: [3136 x 1000] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:41
Please explain where the shapes went wrong and how I should fix this.
I am new to this, so this might not be a good question, but any detail would help.
The input images are resized to 400×400 and converted from RGB to grayscale.
Your problem is in the first Linear layer. Always structure your code like this so that you can figure these issues out yourself:
class MyModel(nn.Module):
    def __init__(self, params):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(...)
        self.fc = nn.Linear(...)

    def forward(self, x):
        x = self.conv1(x)
        import pdb; pdb.set_trace()
        x = self.fc(x)
        return x
This way you can put the pdb breakpoint wherever you want and check the shapes with x.shape. Your problem is a mismatch between the shape of the conv layer's output and what your first Linear layer expects.
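For this particular model, the shapes can also be traced by hand, assuming the 400×400 grayscale inputs from load_data: each padded 5×5 conv preserves the spatial size and each 2×2 max pool halves it, so the tensor entering the classifier is 64 × 100 × 100. A sketch of a fixed model (the nn.Flatten and the corrected in_features are the changes):

model = torch.nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=5, padding=2),   # (1, 400, 400) -> (32, 400, 400)
    nn.MaxPool2d(2, 2),                           # -> (32, 200, 200)
    nn.Conv2d(32, 64, kernel_size=5, padding=2),  # -> (64, 200, 200)
    nn.MaxPool2d(2, 2),                           # -> (64, 100, 100)
    nn.Flatten(),                                 # -> 64 * 100 * 100 = 640000
    nn.Linear(64 * 100 * 100, 1000),
    nn.Linear(1000, 600),
    nn.Linear(600, 200),
    nn.Linear(200, 10)
)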

Combining Two CNN's

I want to combine two CNNs into just one in Keras. What I mean is that I want the network to take two images and process each one in a separate CNN, then concatenate them together at the flattening layer and use fully connected layers to do the last work. Here is what I did:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense, Concatenate

# Start With First Branch ############################################################
branch_one = Sequential()

# Adding The Convolution
branch_one.add(Conv2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))
branch_one.add(Conv2D(32, (3, 3), activation='relu'))

# Doing The Pooling Phase
branch_one.add(MaxPooling2D(pool_size=(2, 2)))
branch_one.add(Dropout(0.25))
branch_one.add(Flatten())

# Start With Second Branch ############################################################
branch_two = Sequential()

# Adding The Convolution
branch_two.add(Conv2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))
branch_two.add(Conv2D(32, (3, 3), activation='relu'))

# Doing The Pooling Phase
branch_two.add(MaxPooling2D(pool_size=(2, 2)))
branch_two.add(Dropout(0.25))
branch_two.add(Flatten())

# Making The Combination ##########################################################
final = Sequential()
final.add(Concatenate([branch_one, branch_two]))
final.add(Dense(units=128, activation="relu"))
final.add(Dense(units=1, activation="sigmoid"))

# Doing The Compilation
final.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Adding and Pushing The Images to CNN
# use ImageDataGenerator to preprocess the data
from keras.preprocessing.image import ImageDataGenerator

# augment the data that we have
train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

# prepare training data
X1 = train_datagen.flow_from_directory('./ddsm1000_resized/images/train',
                                       target_size=(64, 64),
                                       batch_size=32,
                                       class_mode='binary')
X2 = train_datagen.flow_from_directory('./ddsm1000_resized_canny/images/train',
                                       target_size=(64, 64),
                                       batch_size=32,
                                       class_mode='binary')

# prepare test data
Y1 = test_datagen.flow_from_directory('./ddsm1000_resized/images/test',
                                      target_size=(64, 64),
                                      batch_size=32,
                                      class_mode='binary')
Y2 = test_datagen.flow_from_directory('./ddsm1000_resized_canny/images/test',
                                      target_size=(64, 64),
                                      batch_size=32,
                                      class_mode='binary')

final.fit_generator([X1, X2], steps_per_epoch=(8000 / 32), epochs=1, validation_data=[Y1, Y2], validation_steps=2000)
Keras tells me:
RuntimeError: You must compile your model before using it.
I think the CNN does not know the shapes of the input data, so what can I do here? Thanks.
Make the change pointed out below:
from keras.layers import Merge
...
...
# Making The Combination ##########################################################
final = Sequential()
final.add(Merge([branch_one, branch_two], mode = 'concat'))
...
...
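Note that the Merge layer only exists in Keras 1.x; it was removed in Keras 2. As a rough alternative sketch for Keras 2, the same two-branch concatenation can be written with the functional API (layer sizes copied from the question; pairing the two flow_from_directory generators for training is left out, since that needs a custom generator wrapper):

from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Dropout, Flatten, Dense, concatenate

def make_branch(inp):
    # same stack as branch_one / branch_two above
    x = Conv2D(32, (3, 3), activation='relu')(inp)
    x = Conv2D(32, (3, 3), activation='relu')(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Dropout(0.25)(x)
    return Flatten()(x)

input_one = Input(shape=(64, 64, 3))
input_two = Input(shape=(64, 64, 3))

merged = concatenate([make_branch(input_one), make_branch(input_two)])
x = Dense(128, activation='relu')(merged)
output = Dense(1, activation='sigmoid')(x)

final = Model(inputs=[input_one, input_two], outputs=output)
final.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])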