Train loss is decreasing, but accuracy remains the same - deep-learning

This is the training and validation cell for a multi-label classification task using RoBERTa (a BERT variant). The first part is training and the second part is validation; train_dataloader is my training dataset and dev_dataloader is my development (validation) dataset. My question is: why does the training loss decrease step by step while the accuracy barely improves? In practice, the accuracy increases until about the fourth iteration, but the training loss keeps decreasing until the last epoch. Is this OK, or is there a problem?
train_loss_set = []
iterate = 4
for _ in trange(iterate, desc="Iterate"):
    model.train()
    train_loss = 0
    nu_train_examples, nu_train_steps = 0, 0
    for step, batch in enumerate(train_dataloader):
        batch = tuple(t.to(device) for t in batch)
        batch_input_ids, batch_input_mask, batch_labels = batch
        optimizer.zero_grad()
        output = model(batch_input_ids, attention_mask=batch_input_mask)
        logits = output[0]
        loss_function = BCEWithLogitsLoss()
        loss = loss_function(logits.view(-1, num_labels),
                             batch_labels.type_as(logits).view(-1, num_labels))
        train_loss_set.append(loss.item())
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        nu_train_examples += batch_input_ids.size(0)
        nu_train_steps += 1
    print("Train loss: {}".format(train_loss / nu_train_steps))

    ###########################################################################

    model.eval()
    logits_pred, true_labels, pred_labels, tokenized_texts = [], [], [], []
    # Predict
    for i, batch in enumerate(dev_dataloader):
        batch = tuple(t.to(device) for t in batch)
        batch_input_ids, batch_input_mask, batch_labels = batch
        with torch.no_grad():
            out = model(batch_input_ids, attention_mask=batch_input_mask)
            batch_logit_pred = out[0]
            pred_label = torch.sigmoid(batch_logit_pred)
        batch_logit_pred = batch_logit_pred.detach().cpu().numpy()
        pred_label = pred_label.to('cpu').numpy()
        batch_labels = batch_labels.to('cpu').numpy()
        tokenized_texts.append(batch_input_ids)
        logits_pred.append(batch_logit_pred)
        true_labels.append(batch_labels)
        pred_labels.append(pred_label)

    pred_labels = [item for sublist in pred_labels for item in sublist]
    true_labels = [item for sublist in true_labels for item in sublist]
    threshold = 0.4
    pred_bools = [pl > threshold for pl in pred_labels]
    true_bools = [tl == 1 for tl in true_labels]
    print("Accuracy is: ", jaccard_score(true_bools, pred_bools, average='samples'))
    torch.save(model.state_dict(), 'bert_model')
and the outputs:
Iterate: 0%| | 0/10 [00:00<?, ?it/s]
Train loss: 0.4024542534684801
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py:1272: UndefinedMetricWarning: Jaccard is ill-defined and being set to 0.0 in samples with no true or predicted labels. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Accuracy is: 0.5806403013182674
Iterate: 10%|█ | 1/10 [03:21<30:14, 201.64s/it]
Train loss: 0.2972540049911379
Accuracy is: 0.6091337099811676
Iterate: 20%|██ | 2/10 [06:49<27:07, 203.49s/it]
Train loss: 0.26178574864264137
Accuracy is: 0.608361581920904
Iterate: 30%|███ | 3/10 [10:17<23:53, 204.78s/it]
Train loss: 0.23612180122962365
Accuracy is: 0.6096717783158462
Iterate: 40%|████ | 4/10 [13:44<20:33, 205.66s/it]
Train loss: 0.21416303515434265
Accuracy is: 0.6046892655367231
Iterate: 50%|█████ | 5/10 [17:12<17:11, 206.27s/it]
Train loss: 0.1929110718982203
Accuracy is: 0.6030885122410546
Iterate: 60%|██████ | 6/10 [20:40<13:46, 206.74s/it]
Train loss: 0.17280191068465894
Accuracy is: 0.6003766478342749
Iterate: 70%|███████ | 7/10 [24:08<10:21, 207.04s/it]
Train loss: 0.1517329115446631
Accuracy is: 0.5864783427495291
Iterate: 80%|████████ | 8/10 [27:35<06:54, 207.23s/it]
Train loss: 0.12957811209705325
Accuracy is: 0.5818832391713747
Iterate: 90%|█████████ | 9/10 [31:03<03:27, 207.39s/it]
Train loss: 0.11256680189521162
Accuracy is: 0.5796045197740114
Iterate: 100%|██████████| 10/10 [34:31<00:00, 207.14s/it]

The training loss is decreasing because your model gradually learns your training set. The evaluation accuracy measures how well the model has captured the general features of the training set, i.e. how well it predicts unseen data. So if the loss is decreasing, your model is learning; but it may have learned patterns that are too specific to the training set and is, in fact, overfitting. That means it fits the training data "too well" and cannot make correct predictions on unseen data, because the evaluation data may be a little different. That is why the evaluation accuracy stops increasing.
This could be an explanation.
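A common way to act on this is to track the development metric across epochs and keep the checkpoint with the best score rather than the last one. A minimal sketch, under the assumption that train_one_epoch and evaluate_dev are hypothetical wrappers around the two loops shown above (they are not part of the original code):
best_score, patience, bad_epochs = 0.0, 2, 0
for epoch in range(iterate):
    train_one_epoch(model, train_dataloader, optimizer)      # hypothetical wrapper around the training loop
    dev_score = evaluate_dev(model, dev_dataloader)          # hypothetical wrapper returning the Jaccard score
    if dev_score > best_score:
        best_score, bad_epochs = dev_score, 0
        torch.save(model.state_dict(), 'bert_model_best')    # keep only the best checkpoint
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                           # early stopping once the dev metric stalls
            break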

Related

Validation loss is constant and training loss decreasing

I have a model in training and I got this plot. It is trained on audio (about 70K clips of around 5-10 s each) and no augmentation is being done. I have tried the following to avoid overfitting:
Reducing the complexity of the model (fewer GRU cells and smaller hidden dimensions).
Adding dropout in each layer.
Training on a larger dataset.
What I am not sure about is whether my calculation of the training and validation loss is correct. It looks like this. I am using drop_last=True and the CTC loss criterion.
train_data_len = len(train_loader.dataset)
valid_data_len = len(valid_loader.dataset)
epoch_train_loss = 0
epoch_val_loss = 0
train_losses = []
valid_losses = []

model.train()
for e in range(n_epochs):
    t0 = time.time()
    # batch loop
    running_loss = 0.0
    for batch_idx, _data in enumerate(train_loader, 1):
        # Calculate output ...
        # bla bla
        loss = criterion(output, labels.float(), input_lengths, label_lengths)
        loss.backward()
        optimizer.step()
        scheduler.step()
        # loss stats
        running_loss += loss.item() * specs.size(0)
    t_t = time.time() - t0

    ######################
    # validate the model #
    ######################
    with torch.no_grad():
        model.eval()
        tv = time.time()
        running_val_loss = 0.0
        for batch_idx_v, _data in enumerate(valid_loader, 1):
            # bla, bla
            val_loss = criterion(output, labels.float(), input_lengths, label_lengths)
            running_val_loss += val_loss.item() * specs.size(0)
    print("Epoch {}: Training took {:.2f} [s]\tValidation took: {:.2f} [s]\n".format(e+1, t_t, time.time() - tv))

    epoch_train_loss = running_loss / train_data_len
    epoch_val_loss = running_val_loss / valid_data_len
    train_losses.append(epoch_train_loss)
    valid_losses.append(epoch_val_loss)
    print('Epoch: {} Losses\tTraining Loss: {:.6f}\tValidation Loss: {:.6f}'.format(
        e+1, epoch_train_loss, epoch_val_loss))
    model.train()
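One detail worth double-checking in the averaging (a sketch under the assumption that the omitted parts compute output, specs, input_lengths and label_lengths for each batch, as in the loop above): with drop_last=True the last incomplete batch is skipped, so dividing the running loss by len(train_loader.dataset) divides by more samples than were actually used. Accumulating the number of samples seen avoids that:
running_loss, seen = 0.0, 0
for batch_idx, _data in enumerate(train_loader, 1):
    # ... forward pass, loss, backward, optimizer/scheduler steps as above ...
    running_loss += loss.item() * specs.size(0)
    seen += specs.size(0)                  # count only samples that contributed to the loss
epoch_train_loss = running_loss / seen     # average over samples actually seen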

Why is the accuracy of my pretrained ResNet-152 model so low?

I am fairly new to deep learning and neural networks. I recently built a facial emotion recognition classifier using the FER-2013 dataset. I am using the pretrained ResNet-152 model for classification, but both my training and validation accuracies are very low, around 36%, which is not good. I assumed that with transfer learning the accuracies should be high, so why am I getting such a low accuracy? Should I change the hyperparameters? Here is my code.
model = models.resnet152(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
print(model)

from collections import OrderedDict
classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(2048, 512)),
    ('relu', nn.ReLU()),
    ('dropout1', nn.Dropout(p=0.5)),
    ('fc2', nn.Linear(512, 7)),
    ('output', nn.LogSoftmax(dim=1))
]))
model.fc = classifier
print(classifier)

def train_model(model, criterion, optimizer, scheduler, num_epochs=10):
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(1, num_epochs + 1):
        print('Epoch {}/{}'.format(epoch, num_epochs))
        print('-' * 10)
        for phase in ['train', 'validation']:
            if phase == 'train':
                scheduler.step()
                model.train()
            else:
                model.eval()
            running_loss = 0.0
            running_corrects = 0
            for inputs, labels in dataloaders[phase]:
                inputs, labels = inputs.to(device), labels.to(device)
                optimizer.zero_grad()
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)
                    _, preds = torch.max(outputs, 1)
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
            if phase == 'validation' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best valid accuracy: {:4f}'.format(best_acc))
    model.load_state_dict(best_model_wts)
    return model

use_gpu = torch.cuda.is_available()
num_epochs = 10
if use_gpu:
    print('Using GPU: ' + str(use_gpu))
    model = model.cuda()
criterion = nn.NLLLoss()
optimizer = optim.SGD(model.fc.parameters(), lr=.0006, momentum=0.9)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
model_ft = train_model(model, criterion, optimizer, exp_lr_scheduler, num_epochs=10)
Can someone please guide me? I am a beginner and could really use some help with this.
Preprocess the dataset.
Get more data, since low accuracy can be a result of a small dataset.
Try data augmentation if you have little data, as sketched below.
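For the augmentation point, here is a minimal sketch using torchvision transforms. The specific transforms and the 224x224 crop with ImageNet normalization statistics are typical choices for a pretrained ResNet input pipeline, not values taken from the question's code:
from torchvision import transforms

# Illustrative training-time augmentation for a pretrained ResNet input pipeline.
train_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],   # ImageNet channel means
                         [0.229, 0.224, 0.225]),  # ImageNet channel stds
])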

Binary classification - computing average of accuracy per class does not equal overall accuracy

I have a binary classification problem with a balanced number of examples per class. When testing the performance of the classifier on the test set using all examples from both classes, I get an accuracy of 79.87 %. However, when testing on the classes individually, the accuracy for class 1 is 73.41 % and the accuracy for class 2 is 63.31 %. The problem is that if I compute the average accuracy over the two classes, i.e. (73.41 + 63.31) / 2 = 68.36 %, it does not equal 79.87 %.
How is this possible? I am using the model.evaluate function from Keras in order to obtain the accuracy numbers. My code is as follows:
model.compile(loss='binary_crossentropy',
              optimizer=optim,
              metrics=['accuracy'])

earlystop = EarlyStopping(monitor='val_acc', min_delta=0.001, patience=5, verbose=0, mode='auto')
callbacks_list = [earlystop]

X_train, y_train, X_val, y_val = data()
hist = model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=30,
                 batch_size=batch_size, shuffle=True, callbacks=callbacks_list)

# get training accuracy
training_accuracy = np.mean(hist.history["acc"])
validation_accuracy = np.mean(hist.history["val_acc"])
print("Training accuracy: %.2f%%" % (training_accuracy * 100))
print("Validation accuracy: %.2f%%" % (validation_accuracy * 100))

scores = model.evaluate(X_test, y_test, verbose=2)
y_pred = model.predict_classes(X_test)
print(metrics.classification_report(y_test, y_pred))
print("Testing loss: %.2f%%" % (scores[0]))
print("Testing accuracy: %.2f%%" % (scores[1]*100))
Why do I get results which don't add up? My setup is very trivial so I am sure there is no bug in my code. Thank you!
I can't find where in your code you separate the classes to test each one individually.
But there is a big problem in taking the mean of the history with np.mean(hist.history["val_acc"]): the history evolves over training, so you start with a terrible accuracy and every epoch improves it. The only value that is comparable across runs is the last one.
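For what it's worth, per-class numbers become directly comparable to the overall test accuracy when all of them are derived from the same set of predictions, e.g. via a confusion matrix. A sketch, assuming X_test and y_test as in the question (the overall accuracy is, by construction, the count-weighted average of the per-class accuracies, so it must lie between them):
import numpy as np
from sklearn.metrics import confusion_matrix

y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()   # one set of predictions for every metric
cm = confusion_matrix(y_test, y_pred)

overall_acc = cm.diagonal().sum() / cm.sum()
per_class_acc = cm.diagonal() / cm.sum(axis=1)                # per-class accuracy (recall)
weighted_avg = np.average(per_class_acc, weights=cm.sum(axis=1))  # equals overall_acc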

Low validation accuracy in parallel DenseNet

I've taken the code from https://github.com/flyyufelix/cnn_finetune and remodeled it so that there are now two DenseNet-121 models in parallel, with the layers after each model's last global average pooling removed.
Both models were joined together like this:
print("Begin model 1")
model = densenet121_model(img_rows=img_rows, img_cols=img_cols, color_type=channel, num_classes=num_classes)
print("Begin model 2")
model2 = densenet121_nw_model(img_rows=img_rows, img_cols=img_cols, color_type=channel, num_classes=num_classes)
mergedOut = Add()([model.output,model2.output])
#mergedOut = Flatten()(mergedOut)
mergedOut = Dense(num_classes, name='cmb_fc6')(mergedOut)
mergedOut = Activation('softmax', name='cmb_prob')(mergedOut)
newModel = Model([model.input,model2.input], mergedOut)
adam = Adam(lr=1e-3, decay=1e-6, amsgrad=True)
newModel.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])
# Start Fine-tuning
newModel.fit([X_train, X_train], Y_train,
             batch_size=batch_size,
             nb_epoch=nb_epoch,
             shuffle=True,
             verbose=1,
             validation_data=([X_valid, X_valid], Y_valid))
The first model has its layers frozen (as sketched in the note after the logs below), and the one in parallel is supposed to learn additional features on top of the first model to hopefully improve accuracy.
However, even at 100 epochs the training accuracy is almost 100% while the validation accuracy floats around 9%.
I'm not quite sure what the reason could be or how to fix it, considering I've already changed the optimizer from SGD (same setup: two DenseNets, the first pretrained on ImageNet, the second starting without pretrained weights; same results) to Adam (two DenseNets, both pretrained on ImageNet).
Epoch 101/1000
1000/1000 [==============================] - 1678s 2s/step - loss: 0.0550 - acc: 0.9820 - val_loss: 12.9906 - val_acc: 0.0900
Epoch 102/1000
1000/1000 [==============================] - 1703s 2s/step - loss: 0.0567 - acc: 0.9880 - val_loss: 12.9804 - val_acc: 0.1100
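As an aside on the freezing step mentioned above: in Keras the layers of the first branch stay trainable unless they are explicitly marked otherwise, and changes to the trainable flag only take effect for models compiled afterwards. A minimal sketch against the standard Keras API, assuming model is the branch that should stay frozen:
for layer in model.layers:
    layer.trainable = False            # freeze every layer of the first DenseNet branch
newModel.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])  # recompile so the flags take effect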

Keras: batch training for multiple large datasets

This question concerns the common problem of training in Keras on multiple large files that are jointly too large to fit in GPU memory.
I am using Keras 1.0.5 and I would like a solution that does not require 1.0.6.
One way to do this was described by fchollet
here and
here:
# Create generator that yields (current features X, current labels y)
def BatchGenerator(files):
    for file in files:
        current_data = pickle.load(open(file, "rb"))
        X_train = current_data[:, :-1]
        y_train = current_data[:, -1]
        yield (X_train, y_train)

# train model on each dataset
for epoch in range(n_epochs):
    for (X_train, y_train) in BatchGenerator(files):
        model.fit(X_train, y_train, batch_size=32, nb_epoch=1)
However, I fear that the state of the model is not kept; rather, the model seems to be reinitialized not only between epochs but also between datasets. Each "Epoch 1/1" below represents training on a different dataset:
~~~~~ Epoch 0 ~~~~~~
Epoch 1/1
295806/295806 [==============================] - 13s - loss: 15.7517
Epoch 1/1
407890/407890 [==============================] - 19s - loss: 15.8036
Epoch 1/1
383188/383188 [==============================] - 19s - loss: 15.8130
~~~~~ Epoch 1 ~~~~~~
Epoch 1/1
295806/295806 [==============================] - 14s - loss: 15.7517
Epoch 1/1
407890/407890 [==============================] - 20s - loss: 15.8036
Epoch 1/1
383188/383188 [==============================] - 15s - loss: 15.8130
I am aware that one can use model.fit_generator, but since the method above has been repeatedly suggested as a way of batch training, I would like to know what I am doing wrong.
Thanks for your help,
Max
It has been a while since I faced that problem, but I remember that I used Keras's functionality to provide data through Python generators, i.e. model = Sequential(); model.fit_generator(...).
An example code snippet (it should be self-explanatory):
def generate_batches(files, batch_size):
    counter = 0
    while True:
        fname = files[counter]
        print(fname)
        counter = (counter + 1) % len(files)
        data_bundle = pickle.load(open(fname, "rb"))
        X_train = data_bundle[0].astype(np.float32)
        y_train = data_bundle[1].astype(np.float32)
        y_train = y_train.flatten()
        for cbatch in range(0, X_train.shape[0], batch_size):
            yield (X_train[cbatch:(cbatch + batch_size), :, :], y_train[cbatch:(cbatch + batch_size)])

model = Sequential()
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

train_files = [train_bundle_loc + "bundle_" + cb.__str__() for cb in range(nb_train_bundles)]
gen = generate_batches(files=train_files, batch_size=batch_size)
history = model.fit_generator(gen, samples_per_epoch=samples_per_epoch, nb_epoch=num_epoch,
                              verbose=1, class_weight=class_weights)
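One design note on this snippet (my reading, not something stated above): the while True loop matters because Keras expects the generator to yield batches indefinitely; fit_generator then draws samples_per_epoch samples per epoch for nb_epoch epochs, and since it is a single call on one model object, the weights persist across epochs and files. If the bundle sizes are not known in advance, samples_per_epoch could for instance be computed once up front; a hypothetical sketch, assuming each pickled bundle stores X as its first element:
samples_per_epoch = sum(
    pickle.load(open(f, "rb"))[0].shape[0] for f in train_files)  # total number of samples across all bundles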