this question regards the common problem of training on multiple large files in Keras which are jointly too large to fit on GPU memory.
I am using Keras 1.0.5 and I would like a solution that does not require 1.0.6.
One way to do this was described by fchollet
here and
here:
# Create generator that yields (current features X, current labels y)
def BatchGenerator(files):
for file in files:
current_data = pickle.load(open("file", "rb"))
X_train = current_data[:,:-1]
y_train = current_data[:,-1]
yield (X_train, y_train)
# train model on each dataset
for epoch in range(n_epochs):
for (X_train, y_train) in BatchGenerator(files):
model.fit(X_train, y_train, batch_size = 32, nb_epoch = 1)
However I fear that the state of the model is not saved, rather that the model is reinitialized not only between epochs but also between datasets. Each "Epoch 1/1" represents training on a different dataset below:
~~~~~ Epoch 0 ~~~~~~
Epoch 1/1
295806/295806 [==============================] - 13s - loss: 15.7517
Epoch 1/1
407890/407890 [==============================] - 19s - loss: 15.8036
Epoch 1/1
383188/383188 [==============================] - 19s - loss: 15.8130
~~~~~ Epoch 1 ~~~~~~
Epoch 1/1
295806/295806 [==============================] - 14s - loss: 15.7517
Epoch 1/1
407890/407890 [==============================] - 20s - loss: 15.8036
Epoch 1/1
383188/383188 [==============================] - 15s - loss: 15.8130
I am aware that one can use model.fit_generator but as the method above was repeatedly suggested as a way of batch training I would like to know what I am doing wrong.
Thanks for your help,
Max
It has been a while since I faced that problem but I remember that I used
Kera's functionality to provide data through Python generators, i.e. model = Sequential(); model.fit_generator(...).
An exemplary code snippet (should be self-explanatory)
def generate_batches(files, batch_size):
counter = 0
while True:
fname = files[counter]
print(fname)
counter = (counter + 1) % len(files)
data_bundle = pickle.load(open(fname, "rb"))
X_train = data_bundle[0].astype(np.float32)
y_train = data_bundle[1].astype(np.float32)
y_train = y_train.flatten()
for cbatch in range(0, X_train.shape[0], batch_size):
yield (X_train[cbatch:(cbatch + batch_size),:,:], y_train[cbatch:(cbatch + batch_size)])
model = Sequential()
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
train_files = [train_bundle_loc + "bundle_" + cb.__str__() for cb in range(nb_train_bundles)]
gen = generate_batches(files=train_files, batch_size=batch_size)
history = model.fit_generator(gen, samples_per_epoch=samples_per_epoch, nb_epoch=num_epoch,verbose=1, class_weight=class_weights)
Related
this is the train and development cell for multi-label classification task using Roberta (BERT). the first part is training and second part is development (validation). train_dataloader is my train dataset and dev_dataloader is development dataset. my question is: why train loss is decreasing step by step, but accuracy doesn't increase so much? practically, accuracy is increasing until iterate 4, but train loss is decreasing until the last epoch (iterate). is this ok or there should be a problem?
train_loss_set = []
iterate = 4
for _ in trange(iterate, desc="Iterate"):
model.train()
train_loss = 0
nu_train_examples, nu_train_steps = 0, 0
for step, batch in enumerate(train_dataloader):
batch = tuple(t.to(device) for t in batch)
batch_input_ids, batch_input_mask, batch_labels = batch
optimizer.zero_grad()
output = model(batch_input_ids, attention_mask=batch_input_mask)
logits = output[0]
loss_function = BCEWithLogitsLoss()
loss = loss_function(logits.view(-1,num_labels),batch_labels.type_as(logits).view(-1,num_labels))
train_loss_set.append(loss.item())
loss.backward()
optimizer.step()
train_loss += loss.item()
nu_train_examples += batch_input_ids.size(0)
nu_train_steps += 1
print("Train loss: {}".format(train_loss/nu_train_steps))
###############################################################################
model.eval()
logits_pred,true_labels,pred_labels,tokenized_texts = [],[],[],[]
# Predict
for i, batch in enumerate(dev_dataloader):
batch = tuple(t.to(device) for t in batch)
batch_input_ids, batch_input_mask, batch_labels = batch
with torch.no_grad():
out = model(batch_input_ids, attention_mask=batch_input_mask)
batch_logit_pred = out[0]
pred_label = torch.sigmoid(batch_logit_pred)
batch_logit_pred = batch_logit_pred.detach().cpu().numpy()
pred_label = pred_label.to('cpu').numpy()
batch_labels = batch_labels.to('cpu').numpy()
tokenized_texts.append(batch_input_ids)
logits_pred.append(batch_logit_pred)
true_labels.append(batch_labels)
pred_labels.append(pred_label)
pred_labels = [item for sublist in pred_labels for item in sublist]
true_labels = [item for sublist in true_labels for item in sublist]
threshold = 0.4
pred_bools = [pl>threshold for pl in pred_labels]
true_bools = [tl==1 for tl in true_labels]
print("Accuracy is: ", jaccard_score(true_bools,pred_bools,average='samples'))
torch.save(model.state_dict(), 'bert_model')
and the outputs:
Iterate: 0%| | 0/10 [00:00<?, ?it/s]
Train loss: 0.4024542534684801
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py:1272: UndefinedMetricWarning: Jaccard is ill-defined and being set to 0.0 in samples with no true or predicted labels. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Accuracy is: 0.5806403013182674
Iterate: 10%|█ | 1/10 [03:21<30:14, 201.64s/it]
Train loss: 0.2972540049911379
Accuracy is: 0.6091337099811676
Iterate: 20%|██ | 2/10 [06:49<27:07, 203.49s/it]
Train loss: 0.26178574864264137
Accuracy is: 0.608361581920904
Iterate: 30%|███ | 3/10 [10:17<23:53, 204.78s/it]
Train loss: 0.23612180122962365
Accuracy is: 0.6096717783158462
Iterate: 40%|████ | 4/10 [13:44<20:33, 205.66s/it]
Train loss: 0.21416303515434265
Accuracy is: 0.6046892655367231
Iterate: 50%|█████ | 5/10 [17:12<17:11, 206.27s/it]
Train loss: 0.1929110718982203
Accuracy is: 0.6030885122410546
Iterate: 60%|██████ | 6/10 [20:40<13:46, 206.74s/it]
Train loss: 0.17280191068465894
Accuracy is: 0.6003766478342749
Iterate: 70%|███████ | 7/10 [24:08<10:21, 207.04s/it]
Train loss: 0.1517329115446631
Accuracy is: 0.5864783427495291
Iterate: 80%|████████ | 8/10 [27:35<06:54, 207.23s/it]
Train loss: 0.12957811209705325
Accuracy is: 0.5818832391713747
Iterate: 90%|█████████ | 9/10 [31:03<03:27, 207.39s/it]
Train loss: 0.11256680189521162
Accuracy is: 0.5796045197740114
Iterate: 100%|██████████| 10/10 [34:31<00:00, 207.14s/it]
The training loss is decreasing because you model gradually learns your training set. The evaluation accuracy is how well the model learned the global features of your training set and how well your model predicts "unseen data". So, if the loss is decreasing, your model is learning. Perhaps it has learned too specific information from the training set and it is, in fact, overfitting. This means that it fits "too well" to the training data and is unable to make correct predictions on unseen data, due to the fact that the test data may be a little different. That is why the evaluation accuracy is not increasing any more.
This could be an explanation.
I have a model training and I got this plot. It is over audio (about 70K of around 5-10s) and no augmentation is being done. I have tried the following to avoid overfitting:
Reduce complexity of the model by reducing number of GRU cells and hidden dimensions.
Add dropout in each layer.
I have tried with higher dataset.
What I am not sure is if my calculation of training loss and validation loss is correct. It is something like this. I am using drop_last=True and I am using the CTC loss criterion.
train_data_len = len(train_loader.dataset)
valid_data_len = len(valid_loader.dataset)
epoch_train_loss = 0
epoch_val_loss = 0
train_losses = []
valid_losses = []
model.train()
for e in range(n_epochs):
t0 = time.time()
#batch loop
running_loss = 0.0
for batch_idx, _data in enumerate(train_loader, 1):
# Calculate output ...
# bla bla
loss = criterion(output, labels.float(), input_lengths, label_lengths)
loss.backward()
optimizer.step()
scheduler.step()
# loss stats
running_loss += loss.item() * specs.size(0)
t_t = time.time() - t0
######################
# validate the model #
######################
with torch.no_grad():
model.eval()
tv = time.time()
running_val_loss = 0.0
for batch_idx_v, _data in enumerate(valid_loader, 1):
#bla, bla
val_loss = criterion(output, labels.float(), input_lengths, label_lengths)
running_val_loss += val_loss.item() * specs.size(0)
print("Epoch {}: Training took {:.2f} [s]\tValidation took: {:.2f} [s]\n".format(e+1, t_t, time.time() - tv))
epoch_train_loss = running_loss / train_data_len
epoch_val_loss = running_val_loss / valid_data_len
train_losses.append(epoch_train_loss)
valid_losses.append(epoch_val_loss)
print('Epoch: {} Losses\tTraining Loss: {:.6f}\tValidation Loss: {:.6f}'.format(
e+1, epoch_train_loss, epoch_val_loss))
model.train()
I am fairly new to deep learning and neural networks. I recently built a facial emotions recognition classifier using the FER-2013 dataset. I am using the pretrained resnet-152 model for classification, but the accuracy of my model is very low, both training and validation accuracies. I am getting an accuracy of around 36%, which is not good. I suppose that using transfer learning, the accuracies should be high, why is it that im getting such a low accuracy. should I change the hyperparameters? here is my code.
model= models.resnet152(pretrained=True)
for param in model.parameters():
param.requires_grad= False
print(model)
from collections import OrderedDict
classifier= nn.Sequential(OrderedDict([
('fc1',nn.Linear(2048, 512)),
('relu', nn.ReLU()),
('dropout1', nn. Dropout(p=0.5)),
('fc2', nn.Linear(512, 7)),
('output', nn.LogSoftmax(dim=1))
]))
model.fc= classifier
print(classifier)
def train_model(model, criterion, optimizer, scheduler, num_epochs=10):
since= time.time()
best_model_wts= copy.deepcopy(model.state_dict())
best_acc= 0.0
for epoch in range(1, num_epochs + 1):
print('Epoch {}/{}'.format(epoch, num_epochs))
print('-' * 10)
for phase in ['train', 'validation']:
if phase == 'train':
scheduler.step()
model.train()
else:
model.eval()
running_loss= 0.0
running_corrects=0
for inputs, labels in dataloaders[phase]:
inputs, labels= inputs.to(device), labels.to(device)
optimizer.zero_grad()
with torch.set_grad_enabled(phase== 'train'):
outputs= model(inputs)
loss= criterion(outputs, labels)
_, preds= torch.max(outputs, 1)
if phase == 'train':
loss.backward()
optimizer.step()
running_loss += loss.item() * inputs.size(0)
running_corrects += torch.sum(preds== labels.data)
epoch_loss= running_loss / dataset_sizes[phase]
epoch_acc= running_corrects.double() / dataset_sizes[phase]
print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
if phase == 'validation' and epoch_acc > best_acc:
best_acc= epoch_acc
best_model_wts= copy.deepcopy(model.state_dict())
time_elapsed= time.time() - since
print('Training complete in {:.0f}m {:.0f}s'.format(
time_elapsed // 60, time_elapsed % 60))
print('Best valid accuracy: {:4f}'.format(best_acc))
model.load_state_dict(best_model_wts)
return model
use_gpu= torch.cuda.is_available()
num_epochs= 10
if use_gpu:
print('Using GPU: '+ str(use_gpu))
model= model.cuda()
criterion= nn.NLLLoss()
optimizer= optim.SGD(model.fc.parameters(), lr = .0006, momentum=0.9)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
model_ft = train_model(model, criterion, optimizer, exp_lr_scheduler, num_epochs=10)
Can someone please guide me. I am a beginner at it, and I could really make use of some help in it.
preprocess the dataset.
Get more dataset as low accuracy could be a result of smaller dataset.
Try data-augmentation if you have less data.
Problem
I am trying to implement 2-layer neural network using different methods (TensorFlow, PyTorch and from scratch) and then compare their performance based on MNIST dataset.
I am not sure what mistakes I have made, but the accuracy in PyTorch is only about 10%, which is basically random guess. I think probably the weights does not get updated at all.
Note that I intentionally use the dataset provided by TensorFlow to keep the data I use through 3 different methods consistent for accurate comparison.
from tensorflow.examples.tutorials.mnist import input_data
import torch
class Net(torch.nn.Module):
def __init__(self):
super(Net, self).__init__()
self.fc1 = torch.nn.Linear(784, 100)
self.fc2 = torch.nn.Linear(100, 10)
def forward(self, x):
# x -> (batch_size, 784)
x = torch.relu(x)
# x -> (batch_size, 10)
x = torch.softmax(x, dim=1)
return x
net = Net()
net.zero_grad()
Loss = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
for epoch in range(1000): # loop over the dataset multiple times
batch_xs, batch_ys = mnist_m.train.next_batch(100)
# convert to appropriate settins
# note the input to the linear layer should be (n_sample, n_features)
batch_xs = torch.tensor(batch_xs, requires_grad=True)
# batch_ys -> (batch_size,)
batch_ys = torch.tensor(batch_ys, dtype=torch.int64)
# forward
# output -> (batch_size, 10)
output = net(batch_xs)
# result -> (batch_size,)
result = torch.argmax(output, dim=1)
loss = Loss(output, batch_ys)
# backward
optimizer.zero_grad()
loss.backward()
optimizer.step()
The problem here is that you don't apply your fully connected layers fc1 and fc2.
Your forward() currently looks like:
def forward(self, x):
# x -> (batch_size, 784)
x = torch.relu(x)
# x -> (batch_size, 10)
x = torch.softmax(x, dim=1)
return x
So if you change it to:
def forward(self, x):
# x -> (batch_size, 784)
x = self.fc1(x) # added layer fc1
x = torch.relu(x)
# x -> (batch_size, 10)
x = self.fc2(x) # added layer fc2
x = torch.softmax(x, dim=1)
return x
It should work.
Regarding Umang Guptas answer: As I see it, calling zero_grad() before calling backward() as Mr.Robot did, is just fine. This shouldn't be a problem.
Edit:
So I did a short test - I set iterations from 1000 to 10000 to see the bigger picture if it is really decreasing. ( Of course I also loaded the data to mnist_m as this wasn't included in the code you've posted )
I added a print condition to the code:
if epoch % 1000 == 0:
print('Epoch', epoch, '- Loss:', round(loss.item(), 3))
Which prints out the loss every 1000 iterations:
Epoch 0 - Loss: 2.305
Epoch 1000 - Loss: 2.263
Epoch 2000 - Loss: 2.187
Epoch 3000 - Loss: 2.024
Epoch 4000 - Loss: 1.819
Epoch 5000 - Loss: 1.699
Epoch 6000 - Loss: 1.699
Epoch 7000 - Loss: 1.656
Epoch 8000 - Loss: 1.675
Epoch 9000 - Loss: 1.659
Tested with PyTorch version 0.4.1
So you can see that with the changed forward() the network is learning now, the rest of the code I left untouched.
Good luck further!
I've taken the code from https://github.com/flyyufelix/cnn_finetune and remodeled it so that there is now two DenseNet-121 in parallel, with the layers after each model's last Global Average Pooled removed.
Both models were joined together like this:
print("Begin model 1")
model = densenet121_model(img_rows=img_rows, img_cols=img_cols, color_type=channel, num_classes=num_classes)
print("Begin model 2")
model2 = densenet121_nw_model(img_rows=img_rows, img_cols=img_cols, color_type=channel, num_classes=num_classes)
mergedOut = Add()([model.output,model2.output])
#mergedOut = Flatten()(mergedOut)
mergedOut = Dense(num_classes, name='cmb_fc6')(mergedOut)
mergedOut = Activation('softmax', name='cmb_prob')(mergedOut)
newModel = Model([model.input,model2.input], mergedOut)
adam = Adam(lr=1e-3, decay=1e-6, amsgrad=True)
newModel.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])
# Start Fine-tuning
newModel.fit([X_train,X_train], Y_train,
batch_size=batch_size,
nb_epoch=nb_epoch,
shuffle=True,
verbose=1,
validation_data=([X_valid,X_valid],Y_valid)
)
The first model has its layers frozen, and the one in parallel is suppose to learn additional features on top of the first model to supposedly improve accuracy.
However, even at 100 epochs,
the training accuracy is almost 100% but validation floats around 9%.
I'm not quite sure what could be the reason and how to fix it, considering I've already changed the optimizer from SGD (same concept, 2 densenets with the first trained on ImageNet, the second has no weights to begin with same results) to Adam (2 densenets, both pre-trained on imagenet).
Epoch 101/1000
1000/1000 [==============================] - 1678s 2s/step - loss: 0.0550 - acc: 0.9820 - val_loss: 12.9906 - val_acc: 0.0900
Epoch 102/1000
1000/1000 [==============================] - 1703s 2s/step - loss: 0.0567 - acc: 0.9880 - val_loss: 12.9804 - val_acc: 0.1100