What should I do if my regression model gets stuck at a high loss value? - deep-learning

I'm using a neural network for a regression problem with 3 features, trying to predict one continuous value. The network starts learning well, but after about 10 epochs it gets stuck at a high loss value and cannot improve any further.
I tried Adam and other adaptive optimizers instead of SGD, but that didn't work. I also tried more complex architectures (more layers, more neurons, batch normalization, other activations, etc.), and that didn't work either.
To rule out an implementation bug, I trained on only 10 examples: the model learns those quickly, so the implementation seems fine. I then gradually increased the number of examples while monitoring the results. At around 3000 examples the model starts getting stuck at a high loss value.
I tried adding layers and neurons, other activations, and batch normalization. My input data are normalized to [-1, 1]; the target value is not normalized, since this is regression and I'm predicting a continuous value. I also tried Keras and got the same result.
My real dataset has 40000 examples. I don't know what else to try; I've tried almost every optimization trick I know and none of them worked. I would appreciate any guidance. I'll post my code, although it may be too messy to follow; I'm fairly sure there is no problem with the implementation. I'm using skorch/PyTorch and some scikit-learn functions:
# take all features as independent variables, except the bearing and distance
# when I start small the model learns well, but from 3000 data points it gets stuck at a high value:
# the loss starts at 15, decreases nicely, and then gets stuck around 9
# if I use the whole dataset for training, the loss starts at 47, decreases to 36, and gets stuck there too
X = dataset.iloc[:3000, 0:-2].reset_index(drop=True).to_numpy().astype(np.float32)
# take distance and bearing as the output values:
y = dataset.iloc[:3000, -2:].reset_index(drop=True).to_numpy().astype(np.float32)
y_bearing = y[:, 0].reshape(-1, 1)
y_distance = y[:, 1].reshape(-1, 1)
# normalize the input values
scaler = StandardScaler()
X_norm = scaler.fit_transform(X, y)
X_br_train, X_br_test, y_br_train, y_br_test = train_test_split(X_norm,
                                                                 y_bearing,
                                                                 test_size=0.1,
                                                                 random_state=42,
                                                                 shuffle=True)
X_dis_train, X_dis_test, y_dis_train, y_dis_test = train_test_split(X_norm,
                                                                     y_distance,
                                                                     test_size=0.1,
                                                                     random_state=42,
                                                                     shuffle=True)
bearing_trainset = Dataset(X_br_train, y_br_train)
bearing_testset = Dataset(X_br_test, y_br_test)
distance_trainset = Dataset(X_dis_train, y_dis_train)
distance_testset = Dataset(X_dis_test, y_dis_test)
def root_mse(y_true, y_pred):
    return np.sqrt(mean_squared_error(y_true, y_pred))

class RMSELoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.mse = nn.MSELoss()

    def forward(self, yhat, y):
        return torch.sqrt(self.mse(yhat, y))

class AED(nn.Module):
    """custom average euclidean distance loss"""
    def __init__(self):
        super().__init__()

    def forward(self, yhat, y):
        return torch.dist(yhat, y)
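# Caveat: torch.dist(yhat, y) returns the Euclidean norm of (yhat - y) over the
# whole batch, not an average per-sample distance, so its value grows with the
# batch size. A hypothetical per-sample variant (not used below), assuming yhat
# and y have shape (batch, n_out), might look like:
class MeanEuclideanDistance(nn.Module):
    """Hypothetical per-sample averaged Euclidean distance (illustration only)."""
    def forward(self, yhat, y):
        return torch.norm(yhat - y, dim=1).mean()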
def train(on_target,
          hidden_units,
          batch_size,
          epochs,
          optimizer,
          lr,
          regularisation_factor,
          train_shuffle):
    network = None
    trainset = distance_trainset if on_target.lower() == 'distance' else bearing_trainset
    testset = distance_testset if on_target.lower() == 'distance' else bearing_testset
    print(f"shape of trainset.X = {trainset.X.shape}, shape of trainset.y = {trainset.y.shape}")
    print(f"shape of testset.X = {testset.X.shape}, shape of testset.y = {testset.y.shape}")

    mse = EpochScoring(scoring=mean_squared_error, lower_is_better=True, name='MSE')
    r2 = EpochScoring(scoring=r2_score, lower_is_better=False, name='R2')
    rmse = EpochScoring(scoring=make_scorer(root_mse), lower_is_better=True, name='RMSE')
    checkpoint = Checkpoint(dirname=f'results/{on_target}/checkpoints')
    train_end_checkpoint = TrainEndCheckpoint(dirname=f'results/{on_target}/checkpoints')

    if on_target.lower() == 'bearing':
        network = BearingNetwork(n_features=X_norm.shape[1],
                                 n_hidden=hidden_units,
                                 n_out=y_distance.shape[1])
    elif on_target.lower() == 'distance':
        network = DistanceNetwork(n_features=X_norm.shape[1],
                                  n_hidden=hidden_units,
                                  n_out=1)

    model = NeuralNetRegressor(
        module=network,
        criterion=RMSELoss,
        device='cpu',
        batch_size=batch_size,
        lr=lr,
        optimizer=optim.Adam if optimizer.lower() == 'adam' else optim.SGD,
        optimizer__weight_decay=regularisation_factor,
        max_epochs=epochs,
        iterator_train__shuffle=train_shuffle,
        train_split=predefined_split(testset),
        callbacks=[mse, r2, rmse, checkpoint, train_end_checkpoint]
    )

    print(f"{'*' * 10} start training the {on_target} model {'*' * 10}")
    history = model.fit(trainset, y=None)
    print(f"{'*' * 10} End Training the {on_target} Model {'*' * 10}")
if __name__ == '__main__':
    args = parser.parse_args()
    train(on_target=args.on_target,
          hidden_units=args.hidden_units,
          batch_size=args.batch_size,
          epochs=args.epochs,
          optimizer=args.optimizer,
          lr=args.learning_rate,
          regularisation_factor=args.regularisation_lambda,
          train_shuffle=args.shuffle)
And this is my network declaration:
class DistanceNetwork(nn.Module):
    """separate NN for predicting distance"""
    def __init__(self, n_features=5, n_hidden=16, n_out=1):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(n_features, n_hidden),
            nn.LeakyReLU(),
            nn.Linear(n_hidden, 5),
            nn.LeakyReLU(),
            nn.Linear(5, n_out)
        )

    def forward(self, x):
        # forward pass (not shown in the original post; assumed to simply apply the sequential stack)
        return self.model(x)
here is the log while training:

Related

Training a saved deep learning model in PyTorch

I have a deep learning model in PyTorch (I provide a simplified overview of it here). Since I have to run the model each day, I want to save the model from the previous day and then train the saved model for a small number of additional epochs (3-4). Here is the model:
class NET(nn.Module):
    def forward(self, y, par):
        ...

    def test(self, y, num_samples):
        ...

def train_batch(model, optimizer, device):
    model.train()
    optimizer.zero_grad()
    length = float(batch.size(0))
    ...
    return loss1

def trainv(model, device, epochs, train_iterator, optimizer, validate_iterator):
    for epoch in range(epochs):
        for local_batch, local_labels in train_iterator:
            train_loss = train_batch(model, optimizer, device)
            validate_loss = runs_for_validate(validate_iterator, n_samples)  # check on validate data
    return train_losses, validate_loss

model = NET(inputs).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
train_losses, validate_loss = trainv(model, device, epochs, train_iterator, optimizer, validate_iterator)
My problem is that I don't know how to save the model and then load the saved model again to train it for more epochs.
Previously I could save model.state_dict() to a '.pt' file and load it, but that does not seem to work here. I already saw this post: How can I save my training progress in PyTorch for a certain batch no.? However, I don't know in which format I should save the model, or when I should save it. Could you please help me with this?
You can save the entire model, including its architecture and current state, by passing the model object to torch.save:
torch.save(model, "model.pt")
You can also save just the model's state dictionary, which contains its parameters:
torch.save(model.state_dict(), "model_state.pt")
You can load the full model with torch.load, passing the path of the saved file:
model = torch.load("model.pt")
Alternatively, load the state dictionary separately and use it to restore the model's parameters:
state_dict = torch.load("model_state.pt")
model.load_state_dict(state_dict)
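If the goal is to resume training day after day, it is also common to save the optimizer state (e.g. Adam's moment estimates) together with the model weights. A minimal sketch of that pattern, assuming model, optimizer and epoch already exist (the file name is just an example):
import torch

# save a combined checkpoint: weights, optimizer state and bookkeeping
torch.save({
    "epoch": epoch,
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
}, "checkpoint.pt")

# later: restore everything and continue for a few more epochs
checkpoint = torch.load("checkpoint.pt", map_location=device)
model.load_state_dict(checkpoint["model_state"])
optimizer.load_state_dict(checkpoint["optimizer_state"])
start_epoch = checkpoint["epoch"] + 1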
You can structure the training code like this:
class NET(nn.Module):
    def __init__(self, inputs):
        super(NET, self).__init__()
        self.inputs = inputs

    def forward(self, y, par):
        # Your forward pass here
        pass

    def test(self, y, num_samples):
        # Your test function here
        pass

def train_batch(model, optimizer, device):
    model.train()
    optimizer.zero_grad()
    length = float(batch.size(0))
    ...
    return loss1

def trainv(model, device, epochs, train_iterator, optimizer, validate_iterator, save_path):
    best_loss = float('inf')
    for epoch in range(epochs):
        for local_batch, local_labels in train_iterator:
            train_loss = train_batch(model, optimizer, device)
            validate_loss = runs_for_validate(validate_iterator, n_samples)  # check on validate data
            # Save the model if the validation loss improves
            if validate_loss < best_loss:
                best_loss = validate_loss
                torch.save(model.state_dict(), save_path)
    return train_losses, validate_loss

def load_model(model, load_path, device):
    state_dict = torch.load(load_path, map_location=device)
    model.load_state_dict(state_dict)
    return model

# Initialize the model
model = NET(inputs).to(device)

# Load the model if a save path is provided
if load_path is not None:
    model = load_model(model, load_path, device)

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
train_losses, validate_loss = trainv(model, device, epochs, train_iterator, optimizer, validate_iterator, save_path)

My training and validation loss suddenly increased in power of 3

train function
def train(model, iterator, optimizer, criterion, clip):
    model.train()
    epoch_loss = 0
    for i, batch in enumerate(iterator):
        optimizer.zero_grad()
        output = model(batch.text)
        loss = criterion(output, torch.unsqueeze(batch.labels, 1))
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        optimizer.step()
        epoch_loss += loss.item()
    return epoch_loss / len(iterator)
main_script
def main(
    train_file,
    test_file,
    config_file,
    checkpoint_path,
    best_model_path
):
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    with open(config_file, 'r') as j:
        config = json.loads(j.read())
    for k, v in config['model'].items():
        v = float(v)
        if v < 1.0:
            config['model'][k] = float(v)
        else:
            config['model'][k] = int(v)
    for k, v in config['training'].items():
        v = float(v)
        if v < 1.0:
            config['training'][k] = float(v)
        else:
            config['training'][k] = int(v)
    train_itr, val_itr, test_itr, vocab_size = data_pipeline(
        train_file,
        test_file,
        config['training']['max_vocab'],
        config['training']['min_freq'],
        config['training']['batch_size'],
        device
    )
    model = CNNNLPModel(
        vocab_size,
        config['model']['emb_dim'],
        config['model']['hid_dim'],
        config['model']['model_layer'],
        config['model']['model_kernel_size'],
        config['model']['model_dropout'],
        device
    )
    optimizer = optim.Adam(model.parameters())
    criterion = nn.CrossEntropyLoss()
    num_epochs = config['training']['n_epoch']
    clip = config['training']['clip']
    is_best = False
    best_valid_loss = float('inf')
    model = model.to(device)
    for epoch in tqdm(range(num_epochs)):
        train_loss = train(model, train_itr, optimizer, criterion, clip)
        valid_loss = evaluate(model, val_itr, criterion)
        if (epoch + 1) % 2 == 0:
            print("training loss {}, validation_loss{}".format(train_loss, valid_loss))
I was training a Convolutional Neural Network for binary text classification: given a sentence, it detects whether it is hate speech or not. The training loss and validation loss were fine until epoch 5, after which they suddenly shot up from 0.2 to 10,000.
What could be the reason for such a huge, sudden increase in loss?
The default learning rate of Adam is 0.001, which, depending on the task, might be too high.
It looks like instead of converging, your neural network diverged (it left the previous ~0.2 loss minimum and fell into a different region).
Lowering your learning rate at some point (after 50% or 70% of training) would probably fix the issue.
Usually people divide the learning rate by 10 (0.0001 in your case) or by half (0.0005 in your case). Try dividing by half first and see if the issue persists; in general you want to keep the learning rate as high as possible until divergence occurs, as is probably the case here.
This is what learning rate schedulers are for (gamma specifies the learning rate multiplier; you might want to change it to 0.5 first).
One can think of the lower learning rate phase as fine-tuning an already found solution (placing the weights in a better region of the loss valley), and it might require some patience.
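As a concrete illustration of the scheduler suggestion (a sketch of mine, not from the original answer; StepLR with step_size=5 and gamma=0.5 is just one possible choice, reusing the names from the question's script):
from torch.optim.lr_scheduler import StepLR

optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = StepLR(optimizer, step_size=5, gamma=0.5)  # halve the learning rate every 5 epochs

for epoch in range(num_epochs):
    train_loss = train(model, train_itr, optimizer, criterion, clip)
    valid_loss = evaluate(model, val_itr, criterion)
    scheduler.step()  # apply the learning-rate decay once per epoch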

CNN + RNN architecture for video recognition

I am trying to replicate the ConvNet + LSTM approach presented in this paper using PyTorch, but I am struggling to find the correct way to combine the CNN and the LSTM in my model. Here is my attempt:
class VideoRNN(nn.Module):
    def __init__(self, hidden_size, n_classes):
        super(VideoRNN, self).__init__()
        self.hidden_size = hidden_size
        vgg = models.vgg16(pretrained=True)
        embed = nn.Sequential(*list(vgg.classifier.children())[:-1])
        vgg.classifier = embed
        for param in vgg.parameters():
            param.requires_grad = False
        self.embedding = vgg
        self.GRU = nn.GRU(4096, hidden_size)

    def forward(self, input, hidden=None):
        embedded = self.embedding(input)
        output, hidden = self.gru(output, hidden)
        output = self.classifier(output.view(-1, 4096))
        return output, hidden
As my videos have variable lengths, I provide a PackedSequence as input. It is created from a tensor of shape (M, B, C, H, W), where M is the maximum sequence length and B the batch size; C, H, W are the channels, height and width of each frame.
I want the pre-trained CNN to be part of the model, as I may later unfreeze some layers to finetune the CNN for my task. That's why I didn't compute the embeddings of the images separately.
My questions are the following:
Is the shape of my input data correct for handling batches of videos in my context, or should I use something other than a PackedSequence?
In my forward function, how can I handle the batch of sequences of images with my VGG and my GRU unit? I cannot feed the PackedSequence directly as input to my VGG, so how should I proceed?
Does this approach respect the "PyTorch way of doing things", or is my approach flawed?
I finally found a solution that makes it work. Here is a simplified yet complete example of how I managed to create a VideoRNN able to use a PackedSequence as input:
class VideoRNN(nn.Module):
    def __init__(self, n_classes, batch_size, device):
        super(VideoRNN, self).__init__()
        self.batch = batch_size
        self.device = device
        self.num_layer = 1  # single (bidirectional) recurrent layer

        # Loading a VGG16
        vgg = models.vgg16(pretrained=True)

        # Removing the last layer of VGG16
        embed = nn.Sequential(*list(vgg.classifier.children())[:-1])
        vgg.classifier = embed

        # Freezing all VGG parameters
        for param in vgg.parameters():
            param.requires_grad = False

        self.embedding = vgg
        self.gru = nn.LSTM(4096, 2048, bidirectional=True)

        # Classification layer (*2 because bidirectional)
        self.classifier = nn.Sequential(
            nn.Linear(2048 * 2, 256),
            nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, input):
        hidden = torch.zeros(self.num_layer * 2, self.batch, 2048).to(
            self.device
        )
        c_0 = torch.zeros(self.num_layer * 2, self.batch, 2048).to(
            self.device
        )
        embedded = self.simple_elementwise_apply(self.embedding, input)
        output, hidden = self.gru(embedded, (hidden, c_0))
        hidden = hidden[0].view(-1, 2048 * 2)
        output = self.classifier(hidden)
        return output

    def simple_elementwise_apply(self, fn, packed_sequence):
        # apply fn to the flat .data of the PackedSequence and rewrap the result
        return torch.nn.utils.rnn.PackedSequence(
            fn(packed_sequence.data), packed_sequence.batch_sizes
        )
The key is the simple_elementwise_apply method, which allows feeding the PackedSequence through the CNN and getting back a new PackedSequence of embeddings as output.
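For context, a minimal sketch (my own addition, not from the original answer) of how a padded (M, B, C, H, W) batch of videos might be packed and fed to such a model; the frames, lengths and constructor arguments below are hypothetical, and lengths are sorted in decreasing order so the default pack_padded_sequence behavior applies:
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# random frames just to illustrate shapes: (M, B, C, H, W) with M=12, B=4
frames = torch.randn(12, 4, 3, 224, 224)
lengths = [12, 9, 7, 5]  # true length of each video, in decreasing order

# packing keeps the (C, H, W) frame dimensions; packed.data has shape (sum(lengths), C, H, W)
packed = pack_padded_sequence(frames, lengths)

model = VideoRNN(n_classes=10, batch_size=4, device="cpu")
logits = model(packed)  # shape: (batch, n_classes)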
I hope you'll find it useful.

Disparate result after setting requires_grad=True

I am currently using PyTorch for deep neural networks. I wrote the toy neural network shown below and found that whether or not I set requires_grad=True for the label y makes a huge difference. When y.requires_grad=True, the neural network diverges. I am wondering why this happens.
import torch

x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1)
y = x.pow(2) + 10 * torch.rand(x.size())
x.requires_grad = True
# this is where the problem occurs
y.requires_grad = True

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)
        self.predict = torch.nn.Linear(n_hidden, n_output)

    def forward(self, x):
        x = torch.relu(self.hidden(x))
        x = self.predict(x)
        return x

net = Net(1, 10, 1)
optimizer = torch.optim.SGD(net.parameters(), lr=0.5)
criterion = torch.nn.MSELoss()

for t in range(200):
    y_pred = net(x)
    loss = criterion(y_pred, y)
    optimizer.zero_grad()
    loss.backward()
    print("Epoch {}: {}".format(t, loss))
    optimizer.step()
It seems that you are using an outdated version of PyTorch. In more recent versions (0.4.0+), this throws the following error:
AssertionError: nn criterions don't compute the gradient w.r.t. targets -
please mark these tensors as not requiring gradients
Essentially, it tells you that it will only work if you set the requires_grad flag of your targets to False. The reason why this works at all in prior versions is indeed very interesting, and it is also why it causes the diverging behavior.
My guess would be that the backward pass would then also change your targets (instead of only changing your weights), which is obviously something you do not want.
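A minimal sketch of the fix in the asker's code: keep the target out of the autograd graph, either by never setting requires_grad on y or by detaching it before computing the loss.
y_target = y.detach()  # same values as y, but requires no gradient

for t in range(200):
    y_pred = net(x)
    loss = criterion(y_pred, y_target)  # gradients flow only through y_pred
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()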

Keras' ImageDataGenerator.flow() results in very low training/validation accuracy as opposed to flow_from_directory()

I am trying to train a very simple model for image recognition, nothing spectacular. My first attempt worked just fine when I used image rescaling:
# this is the augmentation configuration to enhance the training dataset
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# validation generator, only rescaling
test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')
Then I simply trained the model as such:
model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)
This works perfectly fine and leads to a reasonable accuracy. Then I thought it might be a good idea to try out mean subtraction, as the VGG16 model uses. Instead of doing it manually, I chose to use ImageDataGenerator.fit(). For that, however, you need to supply it with the training images as numpy arrays, so I first read the images, convert them, and then feed them into it:
train_datagen = ImageDataGenerator(
    featurewise_center=True,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(featurewise_center=True)

def process_images_from_directory(data_dir):
    x = []
    y = []
    for root, dirs, files in os.walk(data_dir, topdown=False):
        class_names = sorted(dirs)
        global class_indices
        if len(class_indices) == 0:
            class_indices = dict(zip(class_names, range(len(class_names))))
        for dir in class_names:
            filenames = os.listdir(os.path.join(root, dir))
            for file in filenames:
                img_array = img_to_array(load_img(os.path.join(root, dir, file), target_size=(224, 224)))[np.newaxis]
                if len(x) == 0:
                    x = img_array
                else:
                    x = np.concatenate((x, img_array))
                y.append(class_indices[dir])
    # this step converts an array of class indices [0,1,2,3...] into one-hot vectors [1,0,0,0], [0,1,0,0], etc.
    y = np.eye(len(class_names))[y]
    return x, y

x_train, y_train = process_images_from_directory(train_data_dir)
x_valid, y_valid = process_images_from_directory(validation_data_dir)

nb_train_samples = x_train.shape[0]
nb_validation_samples = x_valid.shape[0]

train_datagen.fit(x_train)
test_datagen.mean = train_datagen.mean

train_generator = train_datagen.flow(
    x_train,
    y_train,
    batch_size=batch_size,
    shuffle=False)

validation_generator = test_datagen.flow(
    x_valid,
    y_valid,
    batch_size=batch_size,
    shuffle=False)
Then I train the model the same way, simply giving it both iterators. After training completes, the accuracy is basically stuck at ~25%, even after 50 epochs:
80/80 [==============================] - 77s 966ms/step - loss: 12.0886 - acc: 0.2500 - val_loss: 12.0886 - val_acc: 0.2500
When I run predictions with the above model, it classifies only 1 of the 4 classes correctly; all images from the other 3 classes are classified as belonging to the first class. Clearly the 25% figure has something to do with this fact, I just can't figure out what I am doing wrong.
I realize that I could calculate the mean manually and then simply set it for both generators, or that I could use ImageDataGenerator.fit() and then still go with flow_from_directory, but that would be a waste of already-processed images; I would be doing the same processing twice.
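For reference, the manual-mean alternative mentioned above could look roughly like this (my own sketch; it assumes channels-last images and reuses the generators defined earlier):
import numpy as np

# per-channel mean over all training images (channels-last layout assumed)
channel_mean = np.mean(x_train, axis=(0, 1, 2), keepdims=True)  # shape (1, 1, 1, 3)

# skip ImageDataGenerator.fit() and set the statistics directly on both generators
train_datagen.mean = channel_mean
test_datagen.mean = channel_mean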
Any opinions on how to make it work with flow() all the way?
Did you try setting shuffle=True in your generators?
You did not specify shuffling in the first case (it should be True by default) and set it to False in the second case.
Your input data might be sorted by class. Without shuffling, your model first sees only class #1 and simply learns to always predict class #1. It then sees class #2 and learns to always predict class #2, and so on. At the end of an epoch the model has learned to always predict class #4, which gives 25% accuracy on validation.
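Concretely (a small sketch, assuming the variables from the question), that would mean:
train_generator = train_datagen.flow(
    x_train,
    y_train,
    batch_size=batch_size,
    shuffle=True)  # shuffle the training data each epoch

# validation order does not affect learning, so shuffling it is optional
validation_generator = test_datagen.flow(
    x_valid,
    y_valid,
    batch_size=batch_size,
    shuffle=False)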