I'm attempting to convert old code to PyTorch code as an experiment. Ultimately, I will be doing regression on a 10,000+ x 100 Matrix, updating weights and whatnot appropriately.
Trying to learn, I'm slowly scaling up on toy examples. I'm hitting a wall with the following sample code.
import torch
import torch.nn as nn
import torch.nn.functional as funct
from torch.autograd import Variable

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

x_data = Variable( torch.Tensor( [ [1.0, 2.0], [2.0, 3.0], [3.0, 4.0] ] ),
                   requires_grad=True )
y_data = Variable( torch.Tensor( [ [2.0], [4.0], [6.0] ] ) )
w = Variable( torch.randn( 2, 1, requires_grad=True ) )
b = Variable( torch.randn( 1, 1, requires_grad=True ) )

class Model(torch.nn.Module) :
    def __init__(self) :
        super( Model, self).__init__()
        self.linear = torch.nn.Linear(2,1) ## 2 features per entry. 1 output
    def forward(self, x2, w2, b2) :
        y_pred = x2 @ w2 + b2
        return y_pred

model = Model()
criterion = torch.nn.MSELoss( size_average=False )
optimizer = torch.optim.SGD( model.parameters(), lr=0.01 )

for epoch in range(10) :
    y_pred = model( x_data,w,b ) # Get prediction
    loss = criterion( y_pred, y_data ) # Calc loss
    print( epoch, loss.data.item() ) # Print loss
    optimizer.zero_grad() # Zero gradient
    loss.backward() # Calculate gradients
    optimizer.step() # Update w, b
However, when I run this, my loss is always the same, and investigating shows that my w and b never actually change. I'm a bit lost as to what's going on here.
Ultimately, I'd like to be able to store the results of the "new" w and b to compare across iterations and datasets.
It looks like a case of cargo cult programming to me.
Notice that your Model class doesn't make use of self in forward, so it is effectively a "regular" (non-method) function, and model is entirely stateless. In particular, model.parameters() only yields the weight and bias of the unused self.linear, so the optimizer never sees w and b, which is why they never change. The simplest fix to your code is to make optimizer aware of w and b, by creating it as optimizer = torch.optim.SGD([w, b], lr=0.01). I also rewrote model as a plain function:
import torch
import torch.nn as nn

# torch.autograd.Variable is roughly equivalent to requires_grad=True
# and is deprecated in PyTorch 1.0
# your code gives no reason to have `requires_grad=True` on `x_data`
x_data = torch.tensor( [ [1.0, 2.0], [2.0, 3.0], [3.0, 4.0] ])
y_data = torch.tensor( [ [2.0], [4.0], [6.0] ] )
w = torch.randn( 2, 1, requires_grad=True )
b = torch.randn( 1, 1, requires_grad=True )

def model(x2, w2, b2):
    return x2 @ w2 + b2

# reduction='sum' replaces the deprecated size_average=False
criterion = torch.nn.MSELoss( reduction='sum' )
optimizer = torch.optim.SGD([w, b], lr=0.01 )

for epoch in range(10) :
    y_pred = model( x_data,w,b )
    loss = criterion( y_pred, y_data )
    print( epoch, loss.item() )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
That being said, nn.Linear is built to simplify this procedure. It automatically creates an equivalent of both w and b, called self.weight and self.bias, respectively. Calling the layer as model(x) (that is, self.__call__(x)) then does what the forward of your Model was meant to do: it returns x @ self.weight.T + self.bias (the weight is stored with shape (out_features, in_features), hence the transpose). In other words, you can also use the following alternative code:
import torch
import torch.nn as nn

x_data = torch.tensor( [ [1.0, 2.0], [2.0, 3.0], [3.0, 4.0] ] )
y_data = torch.tensor( [ [2.0], [4.0], [6.0] ] )

model = nn.Linear(2, 1)
# reduction='sum' replaces the deprecated size_average=False
criterion = torch.nn.MSELoss( reduction='sum' )
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 )

for epoch in range(10) :
    y_pred = model(x_data)
    loss = criterion( y_pred, y_data )
    print( epoch, loss.item() )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
where model.parameters() can be used to enumerate model parameters (equivalent to the manually created list [w, b] above). To access your parameters (load, save, print, whatever) use model.weight and model.bias.
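To compare the learned w and b across iterations, as the question asks, one option is to snapshot model.weight and model.bias after each optimizer step. The following is a minimal sketch, assuming the nn.Linear setup above; the `history` list is a name introduced here for illustration, and `reduction="sum"` stands in for the deprecated `size_average=False`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible initial weights

x_data = torch.tensor([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
y_data = torch.tensor([[2.0], [4.0], [6.0]])

model = nn.Linear(2, 1)
criterion = nn.MSELoss(reduction="sum")
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Hypothetical bookkeeping: one (weight, bias) snapshot per epoch.
history = []

for epoch in range(10):
    loss = criterion(model(x_data), y_data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # detach() drops the autograd graph; clone() copies the values so that
    # later in-place updates by the optimizer do not overwrite the snapshot.
    history.append((model.weight.detach().clone(), model.bias.detach().clone()))

first_w, last_w = history[0][0], history[-1][0]
print("total weight change:", (last_w - first_w).abs().sum().item())
```

Without the clone(), every entry of the list would point at the same storage and show only the final values.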
I'm trying to compare between 2 models in order to learn about the behaviour of the gradients.
import torch
import torch.nn as nn
import torchinfo

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.Identity = nn.Identity()
        self.GRU = nn.GRU(input_size=3, hidden_size=32, num_layers=2, batch_first=True)
        self.fc = nn.Linear(32, 5)
    def forward(self, input_series):
        self.Identity(input_series)
        output, h = self.GRU(input_series)
        output = output[:, -1, :] # get last state
        output = self.fc(output)
        output = output.view(-1, 5, 1) # reorganize output
        return output

class SecondModel(nn.Module):
    def __init__(self):
        super(SecondModel, self).__init__()
        self.GRU = nn.GRU(input_size=3, hidden_size=32, num_layers=2, batch_first=True)
    def forward(self, input_series):
        output, h = self.GRU(input_series)
        return output
Checking the gradient of the first model gives True (zero gradients):
model = MyModel()
x = torch.rand([2, 10, 3])
y = model(x)
y.retain_grad()
y[:, -1].sum().backward()
print(torch.allclose(y.grad[:, :-1], torch.tensor(0.))) # gradients w.r.t previous outputs are zeroes
Checking the gradient of the second model also gives True (zero gradients):
model = SecondModel()
x = torch.rand([2, 10, 3])
y = model(x)
y.retain_grad()
y[:, -1].sum().backward()
print(torch.allclose(y.grad[:, :-1], torch.tensor(0.))) # gradients w.r.t previous outputs are zeroes
According to the answer here:
Do linear layer after GRU saved the sequence output order?
the second model (with just the GRU layer) should give non-zero gradients.
What am I missing?
When do we get zero gradients, and when non-zero ones?
The values in y.grad[:, :-1] theoretically shouldn't be zero, but here they are because the slices y[:, :-1] don't seem to be the same tensor objects that the GRU implementation uses to compute y[:, -1]. As an illustration, a simple 1-layer GRU implementation looks like this:
import torch
import torch.nn as nn

class GRU(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.lin_r = nn.Linear(input_size + hidden_size, hidden_size)
        self.lin_z = nn.Linear(input_size + hidden_size, hidden_size)
        self.lin_in = nn.Linear(input_size, hidden_size)
        self.lin_hn = nn.Linear(hidden_size, hidden_size)
        self.hidden_size = hidden_size
    def forward(self, x):
        bsz, len_, in_ = x.shape
        h = torch.zeros([bsz, self.hidden_size])
        hs = []
        for i in range(len_):
            r = self.lin_r(torch.cat([x[:, i], h], dim=-1)).sigmoid()
            z = self.lin_z(torch.cat([x[:, i], h], dim=-1)).sigmoid()
            n = (self.lin_in(x[:, i]) + r * self.lin_hn(h)).tanh()
            h = (1.-z)*n + z*h
            hs.append(h)
        # Return the output both as a single tensor and as a list of
        # tensors actually used in computing the hidden vectors
        return torch.stack(hs, dim=1), hs
Then, we have
model = GRU(input_size=3, hidden_size=32)
x = torch.rand([2, 10, 3])
y, hs = model(x)
y.retain_grad()
for h in hs:
    h.retain_grad()
y[:, -1].sum().backward()
print(torch.allclose(y.grad[:, -1], torch.tensor(0.))) # False, as expected (sanity check)
print(torch.allclose(y.grad[:, :-1], torch.tensor(0.))) # True, unexpected
print(any(torch.allclose(h.grad, torch.tensor(0.)) for h in hs)) # False, as expected
It appears PyTorch computes the gradients w.r.t all tensors in hs as expected but not those w.r.t y.
So, to answer your question:
I don't think you are missing anything. The linked answer is just not quite right, as it incorrectly assumes that PyTorch would compute y.grad as expected (that is, with non-zero entries for the earlier time steps).
The theory given as a comment in the linked answer is still right, but not quite complete: gradient is always zero iff the input doesn't matter.
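The same effect can be reproduced without a GRU. The sketch below is a made-up two-step analogue, not taken from either answer: `a` plays the role of an earlier hidden state, `b` depends on it, and `y` is the stacked output.

```python
import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)
b = a * 2                      # b depends on a, like h_t depends on h_{t-1}
y = torch.stack([a, b])        # y[0] holds a copy of a, y[1] a copy of b
y.retain_grad()

y[1].sum().backward()

print(y.grad)   # row 0 is zero: the loss never reads y[0]
print(a.grad)   # non-zero: a still influences the loss through b
```

The gradient stored on the stacked tensor only records how the loss depends on the stacked copies, while the gradient on `a` also includes the path through `b`.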
I've been trying to plot a confusion matrix for the code below (see def train_alexnet()), but I keep getting this error:
IndexError: only integers, slices (`:`), ellipsis (`...`), None and long or byte Variables are valid indices (got float)
So, I tried converting my tensors to an integer tensor but then got the error:
ValueError: only one element tensors can be converted to Python scalars
Can someone suggest how to convert the tensors 'all_preds' and 'source_value' to tensors containing integer values? I found the torch.no_grad option, but I don't know how to use it because I'm new to PyTorch.
Here's the link of the github repo that I'm trying to work with: https://github.com/syorami/DDC-transfer-learning/blob/master/DDC.py
from __future__ import print_function
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
import warnings
warnings.filterwarnings('ignore')
import math
import model
import torch
import dataloader
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.metrics import confusion_matrix
from plotcm import plot_confusion_matrix
from torch import nn
from torch import optim
from torch.autograd import Variable
cuda = torch.cuda.is_available()
def step_decay(epoch, learning_rate):
    # learning rate step decay
    # :param epoch: current training epoch
    # :param learning_rate: initial learning rate
    # :return: learning rate after step decay
    initial_lrate = learning_rate
    drop = 0.8
    epochs_drop = 10.0
    lrate = initial_lrate * math.pow(drop, math.floor((1 + epoch) / epochs_drop))
    return lrate
def train_alexnet(epoch, model, learning_rate, source_loader):
    # train source on alexnet
    # :param epoch: current training epoch
    # :param model: defined alexnet
    # :param learning_rate: initial learning rate
    # :param source_loader: source loader
    # :return:
    log_interval = 10
    LEARNING_RATE = step_decay(epoch, learning_rate)
    print(f'Learning Rate: {LEARNING_RATE}')
    optimizer = optim.SGD([
        {'params': model.features.parameters()},
        {'params': model.classifier.parameters()},
        {'params': model.final_classifier.parameters(), 'lr': LEARNING_RATE}
    ], lr=LEARNING_RATE / 10, momentum=MOMENTUM, weight_decay=L2_DECAY)
    # enter training mode
    model.train()
    iter_source = iter(source_loader)
    num_iter = len(source_loader)
    correct = 0
    total_loss = 0
    clf_criterion = nn.CrossEntropyLoss()
    all_preds = torch.tensor([])
    source_value = torch.tensor([])
    for i in range(1, num_iter):
        source_data, source_label = iter_source.next()
        # print("source label: ", source_label)
        if cuda:
            source_data, source_label = source_data.cuda(), source_label.cuda()
        source_data, source_label = Variable(source_data), Variable(source_label)
        optimizer.zero_grad()
        ##
        source_preds = model(source_data)
        preds = source_preds.data.max(1, keepdim=True)[1]
        correct += preds.eq(source_label.data.view_as(preds)).sum()
        #prediction label
        all_preds = torch.cat(
            (all_preds, preds)
            ,dim=0
        )
        #actual label
        source_value = torch.cat(
            (source_value,source_label)
            ,dim=0
        )
        loss = clf_criterion(source_preds, source_label)
        total_loss += loss
        loss.backward()
        optimizer.step()
        if i % log_interval == 0:
            print('Train Epoch {}: [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, i * len(source_data), len(source_loader) * BATCH_SIZE,
                100. * i / len(source_loader), loss.item()))
    total_loss /= len(source_loader)
    acc_train = float(correct) * 100. / (len(source_loader) * BATCH_SIZE)
    # print('all preds= ',int(all_preds))
    # print("source value", int(source_value))
    stacked = torch.stack(
        (
            source_value
            ,(all_preds.argmax(dim=1))
        )
        ,dim=1
    )
    print("stacked",stacked)
    cmt = torch.zeros(3, 3, dtype=torch.float64)
    with torch.no_grad():
        for p in stacked:
            tl, pl = p.tolist()
            cmt[tl, pl] = cmt[tl, pl] + 1
    print("cmt: ",cmt)
    print('{} set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)'.format(
        SOURCE_NAME, total_loss.item(), correct, len(source_loader.dataset), acc_train))
def test_alexnet(model, target_loader):
    # test target data on fine-tuned alexnet
    # :param model: trained alexnet on source data set
    # :param target_loader: target dataloader
    # :return: correct num
    # enter evaluation mode
    clf_criterion = nn.CrossEntropyLoss()
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in target_test_loader:
        if cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data, volatile=True), Variable(target)
        target_preds = model(data)
        test_loss += clf_criterion(target_preds, target) # sum up batch loss
        pred = target_preds.data.max(1)[1] # get the index of the max log-probability
        correct += pred.eq(target.data.view_as(pred)).cpu().sum()
        stacked = torch.stack(
            (
                target
                ,target_preds.argmax(dim=1)
            )
            ,dim=1
        )
        print("stacked target",stacked)
    test_loss /= len(target_loader)
    print('{} set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        TARGET_NAME, test_loss.item(), correct, len(target_loader.dataset),
        100. * correct / len(target_loader.dataset)))
    return correct
def compute_confusion_matrix(preds, y):
    #round predictions to the closest integer
    rounded_preds = torch.round(torch.sigmoid(preds))
    return confusion_matrix(y, rounded_preds)

if __name__ == '__main__':
    ROOT_PATH = './v1234_combined/pets'
    SOURCE_NAME = 'v123'
    TARGET_NAME = 'v4'
    BATCH_SIZE = 15
    TRAIN_EPOCHS = 1
    learning_rate = 1e-2
    L2_DECAY = 5e-4
    MOMENTUM = 0.9
    source_loader = dataloader.load_training(ROOT_PATH, SOURCE_NAME, BATCH_SIZE)
    #target_train_loader = dataloader.load_training(ROOT_PATH, TARGET_NAME, BATCH_SIZE)
    target_test_loader = dataloader.load_testing(ROOT_PATH, TARGET_NAME, BATCH_SIZE)
    print('Load data complete')
    alexnet = model.Alexnet_finetune(num_classes=3)
    print('Construct model complete')
    # load pretrained alexnet model
    alexnet = model.load_pretrained_alexnet(alexnet)
    print('Load pretrained alexnet parameters complete\n')
    if cuda: alexnet.cuda()
    for epoch in range(1, TRAIN_EPOCHS + 1):
        print(f'Train Epoch {epoch}:')
        train_alexnet(epoch, alexnet, learning_rate, source_loader)
        correct = test_alexnet(alexnet, target_test_loader)
    print(len(source_loader.dataset))
In order to convert all elements of a tensor from floats to ints, you need to use .to():
all_preds_int = all_preds.to(torch.int64)
Note that it appears as if your all_preds are the predicted class probabilities and not the actual labels. You might need to apply torch.argmax along the appropriate dimension. (By the way, the output of argmax is already an integer tensor, so there is no need to convert it.)
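Putting both suggestions together, a confusion matrix can be built from integer class indices only. This is a minimal sketch with made-up values: `logits` and `labels` are hypothetical stand-ins for the raw model outputs and the (float) labels collected during training, and three classes are assumed as in the question.

```python
import torch

num_classes = 3  # matches num_classes=3 in the question

# Hypothetical stand-ins for values collected during training.
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.3, 0.2, 0.9],
                       [1.2, 0.4, 0.1]])
labels = torch.tensor([0.0, 1.0, 2.0, 1.0])  # float labels, as in the question

pred_labels = logits.argmax(dim=1)     # int64 class indices, no conversion needed
true_labels = labels.to(torch.int64)   # explicit float -> int conversion

cmt = torch.zeros(num_classes, num_classes, dtype=torch.int64)
for t, p in zip(true_labels.tolist(), pred_labels.tolist()):
    cmt[t, p] += 1    # rows: true class, columns: predicted class

print(cmt)
```

Because `tolist()` on an int64 tensor yields Python ints, indexing `cmt[t, p]` no longer raises the IndexError about float indices.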
from __future__ import print_function
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
from tensorflow.examples.tutorials.mnist import input_data
import torch.optim as optim
import tensorflow.python.util.deprecation as deprecation
deprecation._PRINT_DEPRECATION_WARNINGS = False
import matplotlib.pyplot as plt
%matplotlib inline
from plot import plot_loss_and_acc
mnist = input_data.read_data_sets("MNIST_data", one_hot=False)
batch_size = 250
epoch_num = 10
lr = 0.0001
disp_freq = 20
def next_batch(train=True):
    # Reads the next batch of MNIST images and labels and returns them
    if train:
        batch_img, batch_label = mnist.train.next_batch(batch_size)
    else:
        batch_img, batch_label = mnist.test.next_batch(batch_size)
    batch_label = torch.from_numpy(batch_label).long() # convert the numpy array into torch tensor
    batch_label = Variable(batch_label) # create a torch variable
    batch_img = torch.from_numpy(batch_img).float() # convert the numpy array into torch tensor
    batch_img = Variable(batch_img) # create a torch variable
    return batch_img, batch_label

class MLP(nn.Module):
    def __init__(self, n_features, n_classes):
        super(MLP, self).__init__()
        self.layer1 = nn.Linear(n_features, 128)
        self.layer2 = nn.Linear(128, 128)
        self.layer3 = nn.Linear(128, n_classes)
    def forward(self, x, training=True):
        # a neural network with 2 hidden layers
        # x -> FC -> relu -> dropout -> FC -> relu -> dropout -> FC -> output
        x = F.relu(self.layer1(x))
        x = F.dropout(x, 0.5, training=training)
        x = F.relu(self.layer2(x))
        x = F.dropout(x, 0.5, training=training)
        x = self.layer3(x)
        return x
    def predict(self, x):
        # a function to predict the labels of a batch of inputs
        x = F.softmax(self.forward(x, training=False), dim=1)
        return x
    def accuracy(self, x, y):
        # a function to calculate the accuracy of label prediction for a batch of inputs
        # x: a batch of inputs
        # y: the true labels associated with x
        prediction = self.predict(x)
        maxs, indices = torch.max(prediction, 1)
        acc = 100 * torch.sum(torch.eq(indices.float(), y.float()).float())/y.size()[0]
        print(acc.data)
        return acc.data
# define the neural network (multilayer perceptron)
net = MLP(784, 10)
# calculate the number of batches per epoch
batch_per_ep = mnist.train.num_examples // batch_size
# define the loss (criterion) and create an optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=lr)
print(' ')
print("__________Training__________________")
xArray = []
yLoss = []
yAcc = []
for ep in range(epoch_num): # epochs loop
    for batch_n in range(batch_per_ep): # batches loop
        features, labels = next_batch()
        # Reset gradients
        optimizer.zero_grad()
        # Forward pass
        output = net(features)
        loss = criterion(output, labels)
        # Backward pass and updates
        loss.backward() # calculate the gradients (backpropagation)
        optimizer.step() # update the weights
        if batch_n % disp_freq == 0:
            print('epoch: {} - batch: {}/{} '.format(ep, batch_n, batch_per_ep))
            xArray.append(ep)
            yLoss.append(loss.data)
            #yAcc.append(acc.data)
            print('loss: ', loss.data)
            print('__________________________________')
# test the accuracy on a batch of test data
features, labels = next_batch(train=False)
print("Result")
print('Test accuracy: ', net.accuracy(features, labels))
print('loss: ', loss.data)
accuracy = net.accuracy(features, labels)
#Loss Plot
# plotting the points
plt.plot(xArray, yLoss)
# naming the x axis
plt.xlabel('epoch')
# naming the y axis
plt.ylabel('loss')
# giving a title to my graph
plt.title('Loss Plot')
# function to show the plot
plt.show()
#Accuracy Plot
# plotting the points
plt.plot(xArray, yAcc)
# naming the x axis
plt.xlabel('epoch')
# naming the y axis
plt.ylabel(' accuracy')
# giving a title to my graph
plt.title('Accuracy Plot ')
# function to show the plot
plt.show()
I want to display the accuracy of my training dataset. I have managed to display and plot the loss but I didn't manage to do it for accuracy. I know I am missing 1 or 2 lines of code and I don't know how to do it.
I mean if I can display the accuracy alongside each epoch like the loss I can do the plotting myself.
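Replace the line print('epoch: {} - batch: {}/{} '.format(ep, batch_n, batch_per_ep)) with
print('epoch: {} - batch: {}/{} - accuracy: {}'.format(ep, batch_n, batch_per_ep, net.accuracy(features,labels)))
To plot the accuracy as well, store that value at the same point, for example with yAcc.append(net.accuracy(features, labels)), so that yAcc ends up with one entry per entry of xArray.
Hope this helps.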
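The bookkeeping can be sketched independently of MNIST. The example below is an illustration under stated assumptions, not the asker's code: the toy data, the small `nn.Sequential` network, and the helper `batch_accuracy` are all hypothetical, and the point is only that loss and accuracy are appended at the same step as plain floats.

```python
import torch
import torch.nn as nn

def batch_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Percentage of rows whose arg-max class equals the label."""
    predicted = logits.argmax(dim=1)
    return 100.0 * (predicted == labels).float().mean().item()

torch.manual_seed(0)
# Hypothetical toy batch: 64 samples, 20 features, 4 classes.
features = torch.randn(64, 20)
labels = torch.randint(0, 4, (64,))

net = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 4))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-2)

x_steps, y_loss, y_acc = [], [], []
for step in range(50):
    optimizer.zero_grad()
    output = net(features)
    loss = criterion(output, labels)
    loss.backward()
    optimizer.step()
    # Store plain Python floats so the lists can go straight into plt.plot.
    x_steps.append(step)
    y_loss.append(loss.item())
    y_acc.append(batch_accuracy(output.detach(), labels))

print(f"first accuracy: {y_acc[0]:.1f}%, last accuracy: {y_acc[-1]:.1f}%")
```

With the three lists the same length, `plt.plot(x_steps, y_acc)` works exactly like the existing loss plot.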
I borrowed code from this GitHub repo for training a DenseNet-121: https://github.com/gaetandi/cheXpert/blob/master/cheXpert_final.ipynb
The github code is for 14 class classification on the CheXpert chest X-ray dataset. I've revised it for binary classification.
# initialize and load the model
pathModel = "/ds2/images/model_ones_2epoch_densenet.tar"#"m-epoch0-07032019-213933.pth.tar"
I initialize the 14 class model so I can use the pretrained weights:
model = DenseNet121(nnClassCount).cuda()
model = torch.nn.DataParallel(model).cuda()
modelCheckpoint = torch.load(pathModel)
model.load_state_dict(modelCheckpoint['state_dict'])
And then convert to binary classification:
nnClassCount = 1
model.module.densenet121.classifier = nn.Sequential(
    nn.Linear(1024, nnClassCount),
    nn.Sigmoid()
).cuda()
model = torch.nn.DataParallel(model).cuda()
And then train via:
batch, losst, losse = CheXpertTrainer.train(model, dataLoaderTrain, dataLoaderVal, nnClassCount, 100, timestampLaunch, checkpoint = None, weight_path = weight_path)
My training data is laid out in a 2 column csv with column headers ('Path' and 'Class-Positive'), with path locations in the first column and 0 or 1 in the second column. I used oversampling when compiling the training list so paths in the csv are roughly a 50/50 split between 0's and 1's...shuffled.
I use livelossplot to monitor training/validation loss and accuracy. My loss plots look as expected, but the accuracy plots are flatlined around 0.5 (which makes sense given the 50/50 data if the net is saying it's 100% positive or negative). I'm assuming I'm doing something wrong in how I'm doing predictions, but maybe something in the training is incorrect.
For predictions and probabilities I'm running:
varOutput = model(varInput)
_, preds = torch.max(varOutput, 1)
print('varshape: ',varOutput.shape)
probs = torch.sigmoid(varOutput)
My issue: preds are all coming out as 0 and probs are all above 0.5.
Here is the initial code from github:
import os
import numpy as np
import time
import sys
import csv
import cv2
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.backends.cudnn as cudnn
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim
import torch.nn.functional as tfunc
from torch.utils.data import Dataset
from torch.utils.data.dataset import random_split
from torch.utils.data import DataLoader
from torch.optim.lr_scheduler import ReduceLROnPlateau
from PIL import Image
import torch.nn.functional as func
from sklearn.metrics import roc_auc_score  # the private sklearn.metrics.ranking module no longer exists
import sklearn.metrics as metrics
import random
use_gpu = torch.cuda.is_available()
# Paths to the files with training, and validation sets.
# Each file contains pairs (path to image, output vector)
pathFileTrain = '../CheXpert-v1.0-small/train.csv'
pathFileValid = '../CheXpert-v1.0-small/valid.csv'
# Neural network parameters:
nnIsTrained = False #pre-trained using ImageNet
nnClassCount = 14 #dimension of the output
# Training settings: batch size, maximum number of epochs
trBatchSize = 64
trMaxEpoch = 3
# Parameters related to image transforms: size of the down-scaled image, cropped image
imgtransResize = (320, 320)
imgtransCrop = 224
# Class names
class_names = ['No Finding', 'Enlarged Cardiomediastinum', 'Cardiomegaly', 'Lung Opacity',
'Lung Lesion', 'Edema', 'Consolidation', 'Pneumonia', 'Atelectasis', 'Pneumothorax',
'Pleural Effusion', 'Pleural Other', 'Fracture', 'Support Devices']
class CheXpertDataSet(Dataset):
def __init__(self, image_list_file, transform=None, policy="ones"):
"""
image_list_file: path to the file containing images with corresponding labels.
transform: optional transform to be applied on a sample.
Upolicy: name the policy with regard to the uncertain labels
"""
image_names = []
labels = []
with open(image_list_file, "r") as f:
csvReader = csv.reader(f)
next(csvReader, None)
k=0
for line in csvReader:
k+=1
image_name= line[0]
label = line[5:]
for i in range(14):
if label[i]:
a = float(label[i])
if a == 1:
label[i] = 1
elif a == -1:
if policy == "ones":
label[i] = 1
elif policy == "zeroes":
label[i] = 0
else:
label[i] = 0
else:
label[i] = 0
else:
label[i] = 0
image_names.append('../' + image_name)
labels.append(label)
self.image_names = image_names
self.labels = labels
self.transform = transform
def __getitem__(self, index):
"""Take the index of item and returns the image and its labels"""
image_name = self.image_names[index]
image = Image.open(image_name).convert('RGB')
label = self.labels[index]
if self.transform is not None:
image = self.transform(image)
return image, torch.FloatTensor(label)
def __len__(self):
return len(self.image_names)
#TRANSFORM DATA
normalize = transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
transformList = []
#transformList.append(transforms.Resize(imgtransCrop))
transformList.append(transforms.RandomResizedCrop(imgtransCrop))
transformList.append(transforms.RandomHorizontalFlip())
transformList.append(transforms.ToTensor())
transformList.append(normalize)
transformSequence=transforms.Compose(transformList)
#LOAD DATASET
dataset = CheXpertDataSet(pathFileTrain ,transformSequence, policy="ones")
datasetTest, datasetTrain = random_split(dataset, [500, len(dataset) - 500])
datasetValid = CheXpertDataSet(pathFileValid, transformSequence)
# Issues with overlapping patients and with using the identical transform?
dataLoaderTrain = DataLoader(dataset=datasetTrain, batch_size=trBatchSize, shuffle=True, num_workers=24, pin_memory=True)
dataLoaderVal = DataLoader(dataset=datasetValid, batch_size=trBatchSize, shuffle=False, num_workers=24, pin_memory=True)
dataLoaderTest = DataLoader(dataset=datasetTest, num_workers=24, pin_memory=True)
class CheXpertTrainer():
def train (model, dataLoaderTrain, dataLoaderVal, nnClassCount, trMaxEpoch, launchTimestamp, checkpoint):
#SETTINGS: OPTIMIZER & SCHEDULER
optimizer = optim.Adam (model.parameters(), lr=0.0001, betas=(0.9, 0.999), eps=1e-08, weight_decay=1e-5)
#SETTINGS: LOSS
loss = torch.nn.BCELoss(size_average = True)
#LOAD CHECKPOINT
if checkpoint != None and use_gpu:
modelCheckpoint = torch.load(checkpoint)
model.load_state_dict(modelCheckpoint['state_dict'])
optimizer.load_state_dict(modelCheckpoint['optimizer'])
#TRAIN THE NETWORK
lossMIN = 100000
for epochID in range(0, trMaxEpoch):
timestampTime = time.strftime("%H%M%S")
timestampDate = time.strftime("%d%m%Y")
timestampSTART = timestampDate + '-' + timestampTime
batchs, losst, losse = CheXpertTrainer.epochTrain(model, dataLoaderTrain, optimizer, trMaxEpoch, nnClassCount, loss)
lossVal = CheXpertTrainer.epochVal(model, dataLoaderVal, optimizer, trMaxEpoch, nnClassCount, loss)
timestampTime = time.strftime("%H%M%S")
timestampDate = time.strftime("%d%m%Y")
timestampEND = timestampDate + '-' + timestampTime
if lossVal < lossMIN:
lossMIN = lossVal
torch.save({'epoch': epochID + 1, 'state_dict': model.state_dict(), 'best_loss': lossMIN, 'optimizer' : optimizer.state_dict()}, 'm-epoch'+str(epochID)+'-' + launchTimestamp + '.pth.tar')
print ('Epoch [' + str(epochID + 1) + '] [save] [' + timestampEND + '] loss= ' + str(lossVal))
else:
print ('Epoch [' + str(epochID + 1) + '] [----] [' + timestampEND + '] loss= ' + str(lossVal))
return batchs, losst, losse
#--------------------------------------------------------------------------------
def epochTrain(model, dataLoader, optimizer, epochMax, classCount, loss):
batch = []
losstrain = []
losseval = []
model.train()
for batchID, (varInput, target) in enumerate(dataLoaderTrain):
varTarget = target.cuda(non_blocking = True)
#varTarget = target.cuda()
varOutput = model(varInput)
lossvalue = loss(varOutput, varTarget)
optimizer.zero_grad()
lossvalue.backward()
optimizer.step()
l = lossvalue.item()
losstrain.append(l)
if batchID%35==0:
print(batchID//35, "% batches computed")
#Fill three arrays to see the evolution of the loss
batch.append(batchID)
le = CheXpertTrainer.epochVal(model, dataLoaderVal, optimizer, trMaxEpoch, nnClassCount, loss).item()
losseval.append(le)
print(batchID)
print(l)
print(le)
return batch, losstrain, losseval
#--------------------------------------------------------------------------------
def epochVal(model, dataLoader, optimizer, epochMax, classCount, loss):
model.eval()
lossVal = 0
lossValNorm = 0
with torch.no_grad():
for i, (varInput, target) in enumerate(dataLoaderVal):
target = target.cuda(non_blocking = True)
varOutput = model(varInput)
losstensor = loss(varOutput, target)
lossVal += losstensor
lossValNorm += 1
outLoss = lossVal / lossValNorm
return outLoss
#--------------------------------------------------------------------------------
#---- Computes area under ROC curve
#---- dataGT - ground truth data
#---- dataPRED - predicted data
#---- classCount - number of classes
def computeAUROC (dataGT, dataPRED, classCount):
outAUROC = []
datanpGT = dataGT.cpu().numpy()
datanpPRED = dataPRED.cpu().numpy()
for i in range(classCount):
try:
outAUROC.append(roc_auc_score(datanpGT[:, i], datanpPRED[:, i]))
except ValueError:
pass
return outAUROC
#--------------------------------------------------------------------------------
def test(model, dataLoaderTest, nnClassCount, checkpoint, class_names):
cudnn.benchmark = True
if checkpoint != None and use_gpu:
modelCheckpoint = torch.load(checkpoint)
model.load_state_dict(modelCheckpoint['state_dict'])
if use_gpu:
outGT = torch.FloatTensor().cuda()
outPRED = torch.FloatTensor().cuda()
else:
outGT = torch.FloatTensor()
outPRED = torch.FloatTensor()
model.eval()
with torch.no_grad():
for i, (input, target) in enumerate(dataLoaderTest):
target = target.cuda()
outGT = torch.cat((outGT, target), 0).cuda()
bs, c, h, w = input.size()
varInput = input.view(-1, c, h, w)
out = model(varInput)
outPRED = torch.cat((outPRED, out), 0)
aurocIndividual = CheXpertTrainer.computeAUROC(outGT, outPRED, nnClassCount)
aurocMean = np.array(aurocIndividual).mean()
print ('AUROC mean ', aurocMean)
for i in range (0, len(aurocIndividual)):
print (class_names[i], ' ', aurocIndividual[i])
return outGT, outPRED
class DenseNet121(nn.Module):
    """Model modified.
    The architecture of our model is the same as standard DenseNet121
    except the classifier layer which has an additional sigmoid function.
    """
    def __init__(self, out_size):
        super(DenseNet121, self).__init__()
        self.densenet121 = torchvision.models.densenet121(pretrained=True)
        num_ftrs = self.densenet121.classifier.in_features
        self.densenet121.classifier = nn.Sequential(
            nn.Linear(num_ftrs, out_size),
            nn.Sigmoid()
        )
    def forward(self, x):
        x = self.densenet121(x)
        return x
# initialize and load the model
model = DenseNet121(nnClassCount).cuda()
model = torch.nn.DataParallel(model).cuda()
timestampTime = time.strftime("%H%M%S")
timestampDate = time.strftime("%d%m%Y")
timestampLaunch = timestampDate + '-' + timestampTime
batch, losst, losse = CheXpertTrainer.train(model, dataLoaderTrain, dataLoaderVal, nnClassCount, trMaxEpoch, timestampLaunch, checkpoint = None)
print("Model trained")
It looks like you have adapted the training correctly for binary classification, but not the prediction, which you are still treating as if it were a multi-class prediction.
The output of your model (varOutput) has the size (batch_size, 1), since there is only one class. The index of the maximum across that dimension (the second value returned by torch.max(varOutput, 1)) will always be 0, since index 0 is the only one available; there is no separate class for 1.
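A quick way to see this is to call torch.max on a tensor of that shape. The values below are made up for illustration and `var_output` is a hypothetical stand-in for varOutput.

```python
import torch

# Hypothetical model output for a batch of 4 samples and a single class.
var_output = torch.tensor([[0.9], [0.2], [0.7], [0.4]])

values, indices = torch.max(var_output, 1)
print(indices)   # every index is 0: there is only one column to choose from
```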
This single class represents both cases (0 and 1), so you can treat its value as the probability of the sample being positive (1). To get a distinct value of either 0 or 1, you simply use a threshold of 0.5, so everything below that receives the class 0 and everything above it the class 1. This can be easily done with torch.round.
But you also have another problem: you're applying the sigmoid function twice in a row, once in the classifier (nn.Sigmoid()) and then again with torch.sigmoid(varOutput). That is problematic because the output of the first sigmoid is always positive, and since sigmoid(0) = 0.5 and sigmoid is increasing, all your probabilities end up above 0.5.
The output of your model is already the probability, so the only thing left is to round it:
probs = model(varInput)
# The .squeeze(1) is to get rid of the singular class dimension
preds = torch.round(probs).squeeze(1)
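The effect of the second sigmoid can be checked numerically. This is a small sketch with made-up logits (the tensor names are hypothetical); it uses an explicit `> 0.5` comparison as one way of thresholding.

```python
import torch

# Hypothetical pre-sigmoid scores, from very negative to very positive.
logits = torch.tensor([[-6.0], [-1.0], [0.0], [1.0], [6.0]])

probs = torch.sigmoid(logits)   # what a classifier ending in nn.Sigmoid returns
twice = torch.sigmoid(probs)    # sigmoid applied a second time by mistake

# After the second sigmoid everything is squeezed into [0.5, sigmoid(1)).
print(twice.min().item(), twice.max().item())

# Thresholding the single-sigmoid probabilities at 0.5 gives the class labels.
preds = (probs > 0.5).long().squeeze(1)
print(preds)
```

Even the sample with a logit of -6 ends up slightly above 0.5 after the second sigmoid, which matches the symptom described in the question.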
The Sklearn documentation contains an example of a polynomial regression which beautifully illustrates the idea of overfitting (link).
The third plot shows a 15th order polynomial that overfits the simulated data. I replicated this model in TensorFlow, but I cannot get it to overfit.
Even when tuning the learning rate and the numbers of learning epochs, I cannot get the model to overfit. What am I missing?
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
def true_fun(X):
    return np.cos(1.5 * np.pi * X)
# Generate dataset
n_samples = 30
np.random.seed(0)
x_train = np.sort(np.random.rand(n_samples)) # Draw from uniform distribution
y_train = true_fun(x_train) + np.random.randn(n_samples) * 0.1
x_test = np.linspace(0, 1, 100)
y_true = true_fun(x_test)
# Helper function
def run_dir(base_dir, dirname='run'):
    "Number log directories incrementally"
    import os
    import re
    pattern = re.compile(dirname + r'_(\d+)')
    try:
        previous_runs = os.listdir(base_dir)
    except FileNotFoundError:
        previous_runs = []
    run_number = 0
    for name in previous_runs:
        match = pattern.search(name)
        if match:
            number = int(match.group(1))
            if number > run_number:
                run_number = number
    run_number += 1
    logdir = os.path.join(base_dir, dirname + '_%02d' % run_number)
    return(logdir)

# Define the polynomial model
def model(X, w):
    """Polynomial model
    param X: data
    param w: coefficients in the polynomial regression
    returns: Polynomial function Y(X, w)
    """
    terms = []
    for i in range(int(w.shape[0])):
        term = tf.multiply(w[i], tf.pow(X, i))
        terms.append(term)
    return(tf.add_n(terms))
# Create the computation graph
order = 15
tf.reset_default_graph()
X = tf.placeholder("float")
Y = tf.placeholder("float")
w = tf.Variable([0.]*order, name="parameters")
lambda_reg = tf.placeholder('float', shape=[])
learning_rate_ph = tf.placeholder('float', shape=[])
y_model = model(X, w)
loss = tf.div(tf.reduce_mean(tf.square(Y-y_model)), 2) # Square error
loss_rg = tf.multiply(lambda_reg, tf.reduce_sum(tf.square(w))) # L2 penalty
loss_total = tf.add(loss, loss_rg)
loss_hist1 = tf.summary.scalar('loss', loss)
loss_hist2 = tf.summary.scalar('loss_rg', loss_rg)
loss_hist3 = tf.summary.scalar('loss_total', loss_total)
summary = tf.summary.merge([loss_hist1, loss_hist2, loss_hist3])
train_op = tf.train.GradientDescentOptimizer(learning_rate_ph).minimize(loss_total)
init = tf.global_variables_initializer()
def train(sess, x_train, y_train, lambda_val=0, epochs=2000, learning_rate=0.01):
    feed_dict={X: x_train, Y: y_train, lambda_reg: lambda_val, learning_rate_ph: learning_rate}
    logdir = run_dir("logs/polynomial_regression2/")
    writer = tf.summary.FileWriter(logdir)
    sess.run(init)
    for epoch in range(epochs):
        _, summary_str = sess.run([train_op, summary], feed_dict=feed_dict)
        writer.add_summary(summary_str, global_step=epoch)
    final_cost, final_cost_rg, w_learned = sess.run([loss, loss_rg, w], feed_dict=feed_dict)
    return final_cost, final_cost_rg, w_learned

def plot_test(w_learned, x_test, x_train, y_train):
    y_learned = calculate_y(x_test, w_learned)
    plt.scatter(x_train, y_train)
    plt.plot(x_test, y_true, label="true function")
    plt.plot(x_test, y_learned,'r', label="learned function")
    #plt.title('$\lambda = {:03.2f}$'.format(lambda_values[i]))
    plt.ylabel('y')
    plt.xlabel('x')
    plt.legend()
    plt.show()

def calculate_y(x, w):
    y = 0
    for i in range(w.shape[0]):
        y += w[i] * np.power(x, i)
    return y
sess = tf.Session()
final_cost, final_cost_rg, w_learned = train(sess, x_train, y_train, lambda_val=0,
learning_rate=0.3, epochs=2000)
sess.close()
plot_test(w_learned, x_test, x_train, y_train)
I had the same problem. When I did polynomial regression, I also couldn't overfit the data using gradient descent (GD) in TensorFlow.
I then compared against the coefficients (weights) fitted by sklearn's LinearRegression and found that, when the polynomial degree is large, the coefficients of the high-order terms are very small (e.g. 1e-4) while those of the low-order terms are relatively large (e.g. 0.1).
This means that when you use GD to search for the best weights, the high-order coefficients are extremely sensitive to changes in value, whereas the low-order coefficients are not.
My guess is that the best (overfitting) coefficients are large for the low-order terms and tiny for the high-order terms. With a large learning rate it is impossible to reach the right answer, and with a tiny learning rate you need a very large number of iterations.
This is especially noticeable when you try to overfit a small data set with GD.
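One way to probe this explanation, offered here as an interpretation rather than something the answer states, is to compare plain gradient descent with a closed-form least-squares fit on the same polynomial features. The sketch below uses only NumPy and regenerates data of the same shape as in the question (the random draws differ from the original seed); all variable names are introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_coeffs = 30, 15          # same sizes as in the question

x = np.sort(rng.random(n_samples))
y = np.cos(1.5 * np.pi * x) + 0.1 * rng.standard_normal(n_samples)

# Design matrix with columns x**0, x**1, ..., x**14.
X = np.vander(x, n_coeffs, increasing=True)
print("condition number of X:", np.linalg.cond(X))

# Closed-form least squares: the best training fit this model can reach.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
mse_ls = np.mean((X @ w_ls - y) ** 2)

# Plain gradient descent on mean((X w - y)^2) / 2, as in the question.
w_gd = np.zeros(n_coeffs)
for _ in range(2000):
    grad = X.T @ (X @ w_gd - y) / n_samples
    w_gd -= 0.3 * grad
mse_gd = np.mean((X @ w_gd - y) ** 2)

print("training MSE, least squares   :", mse_ls)
print("training MSE, gradient descent:", mse_gd)
```

A very large condition number means some directions in weight space barely change the loss, so gradient descent moves along them extremely slowly and stops short of the overfitting solution that the direct solver finds.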