Low validation accuracy in parallel DenseNet - deep-learning

I've taken the code from https://github.com/flyyufelix/cnn_finetune and reworked it so that there are now two DenseNet-121 models in parallel, with the layers after each model's last global average pooling layer removed.
Both models were joined together like this:
print("Begin model 1")
model = densenet121_model(img_rows=img_rows, img_cols=img_cols, color_type=channel, num_classes=num_classes)
print("Begin model 2")
model2 = densenet121_nw_model(img_rows=img_rows, img_cols=img_cols, color_type=channel, num_classes=num_classes)
mergedOut = Add()([model.output,model2.output])
#mergedOut = Flatten()(mergedOut)
mergedOut = Dense(num_classes, name='cmb_fc6')(mergedOut)
mergedOut = Activation('softmax', name='cmb_prob')(mergedOut)
newModel = Model([model.input,model2.input], mergedOut)
adam = Adam(lr=1e-3, decay=1e-6, amsgrad=True)
newModel.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])
# Start Fine-tuning
newModel.fit([X_train,X_train], Y_train,
    batch_size=batch_size,
    nb_epoch=nb_epoch,
    shuffle=True,
    verbose=1,
    validation_data=([X_valid,X_valid],Y_valid)
    )
The first model has its layers frozen, and the one in parallel is supposed to learn additional features on top of the first model to improve accuracy.
However, even at 100 epochs, the training accuracy is almost 100% but the validation accuracy floats around 9%.
I'm not sure what the reason could be or how to fix it. I've already changed the setup from SGD (same concept: two DenseNets, the first pre-trained on ImageNet and the second starting with no weights; same results) to Adam (two DenseNets, both pre-trained on ImageNet).
Epoch 101/1000
1000/1000 [==============================] - 1678s 2s/step - loss: 0.0550 - acc: 0.9820 - val_loss: 12.9906 - val_acc: 0.0900
Epoch 102/1000
1000/1000 [==============================] - 1703s 2s/step - loss: 0.0567 - acc: 0.9880 - val_loss: 12.9804 - val_acc: 0.1100

Related

Train loss is decreasing, but accuracy remains the same

This is the training and development cell for a multi-label classification task using RoBERTa (BERT). The first part is training and the second part is development (validation); train_dataloader is my training dataset and dev_dataloader is my development dataset. My question is: why does the training loss decrease step by step while the accuracy doesn't increase much? In practice, accuracy increases until iteration 4, but the training loss keeps decreasing until the last epoch (iteration). Is this OK, or is there a problem?
train_loss_set = []
iterate = 4
for _ in trange(iterate, desc="Iterate"):
    model.train()
    train_loss = 0
    nu_train_examples, nu_train_steps = 0, 0
    for step, batch in enumerate(train_dataloader):
        batch = tuple(t.to(device) for t in batch)
        batch_input_ids, batch_input_mask, batch_labels = batch
        optimizer.zero_grad()
        output = model(batch_input_ids, attention_mask=batch_input_mask)
        logits = output[0]
        loss_function = BCEWithLogitsLoss()
        loss = loss_function(logits.view(-1,num_labels),batch_labels.type_as(logits).view(-1,num_labels))
        train_loss_set.append(loss.item())
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        nu_train_examples += batch_input_ids.size(0)
        nu_train_steps += 1
    print("Train loss: {}".format(train_loss/nu_train_steps))
    ###############################################################################
    model.eval()
    logits_pred,true_labels,pred_labels,tokenized_texts = [],[],[],[]
    # Predict
    for i, batch in enumerate(dev_dataloader):
        batch = tuple(t.to(device) for t in batch)
        batch_input_ids, batch_input_mask, batch_labels = batch
        with torch.no_grad():
            out = model(batch_input_ids, attention_mask=batch_input_mask)
            batch_logit_pred = out[0]
            pred_label = torch.sigmoid(batch_logit_pred)
        batch_logit_pred = batch_logit_pred.detach().cpu().numpy()
        pred_label = pred_label.to('cpu').numpy()
        batch_labels = batch_labels.to('cpu').numpy()
        tokenized_texts.append(batch_input_ids)
        logits_pred.append(batch_logit_pred)
        true_labels.append(batch_labels)
        pred_labels.append(pred_label)
    pred_labels = [item for sublist in pred_labels for item in sublist]
    true_labels = [item for sublist in true_labels for item in sublist]
    threshold = 0.4
    pred_bools = [pl>threshold for pl in pred_labels]
    true_bools = [tl==1 for tl in true_labels]
    print("Accuracy is: ", jaccard_score(true_bools,pred_bools,average='samples'))
torch.save(model.state_dict(), 'bert_model')
and the outputs:
Iterate: 0%| | 0/10 [00:00<?, ?it/s]
Train loss: 0.4024542534684801
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py:1272: UndefinedMetricWarning: Jaccard is ill-defined and being set to 0.0 in samples with no true or predicted labels. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Accuracy is: 0.5806403013182674
Iterate: 10%|█ | 1/10 [03:21<30:14, 201.64s/it]
Train loss: 0.2972540049911379
Accuracy is: 0.6091337099811676
Iterate: 20%|██ | 2/10 [06:49<27:07, 203.49s/it]
Train loss: 0.26178574864264137
Accuracy is: 0.608361581920904
Iterate: 30%|███ | 3/10 [10:17<23:53, 204.78s/it]
Train loss: 0.23612180122962365
Accuracy is: 0.6096717783158462
Iterate: 40%|████ | 4/10 [13:44<20:33, 205.66s/it]
Train loss: 0.21416303515434265
Accuracy is: 0.6046892655367231
Iterate: 50%|█████ | 5/10 [17:12<17:11, 206.27s/it]
Train loss: 0.1929110718982203
Accuracy is: 0.6030885122410546
Iterate: 60%|██████ | 6/10 [20:40<13:46, 206.74s/it]
Train loss: 0.17280191068465894
Accuracy is: 0.6003766478342749
Iterate: 70%|███████ | 7/10 [24:08<10:21, 207.04s/it]
Train loss: 0.1517329115446631
Accuracy is: 0.5864783427495291
Iterate: 80%|████████ | 8/10 [27:35<06:54, 207.23s/it]
Train loss: 0.12957811209705325
Accuracy is: 0.5818832391713747
Iterate: 90%|█████████ | 9/10 [31:03<03:27, 207.39s/it]
Train loss: 0.11256680189521162
Accuracy is: 0.5796045197740114
Iterate: 100%|██████████| 10/10 [34:31<00:00, 207.14s/it]
The training loss is decreasing because your model gradually learns your training set. The evaluation accuracy measures how well the model has learned the general features of the training set and how well it predicts unseen data. So, if the loss is decreasing, your model is learning. Perhaps it has learned information that is too specific to the training set and is, in fact, overfitting. This means that it fits the training data "too well" and is unable to make correct predictions on unseen data, because the test data may be a little different. That is why the evaluation accuracy is no longer increasing.
This could be an explanation.
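If overfitting is indeed the cause, a common response is to keep the checkpoint from the best dev epoch and stop once the dev score has not improved for a few epochs. The sketch below applies that rule to the ten dev scores copied from the log in the question. The function name best_and_stop_epoch and the patience value of 2 are assumptions made for illustration, not part of the original code.

```python
def best_and_stop_epoch(scores, patience):
    """Return (best_epoch, stop_epoch) for a list of per-epoch dev scores.

    Training stops at the first epoch that is `patience` or more epochs
    past the best score seen so far. Assumes patience >= 1.
    """
    best_epoch = 0
    stop_epoch = len(scores) - 1  # default: ran to the end
    for epoch, score in enumerate(scores):
        if score > scores[best_epoch]:
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            stop_epoch = epoch
            break
    return best_epoch, stop_epoch


# Dev scores (sample-averaged Jaccard) copied from the log in the question.
dev_scores = [
    0.5806403013182674, 0.6091337099811676, 0.608361581920904,
    0.6096717783158462, 0.6046892655367231, 0.6030885122410546,
    0.6003766478342749, 0.5864783427495291, 0.5818832391713747,
    0.5796045197740114,
]

best_epoch, stop_epoch = best_and_stop_epoch(dev_scores, patience=2)
print("best epoch:", best_epoch, "stop epoch:", stop_epoch)
```

With these numbers the best dev score is at index 3 (counting from 0), which matches the observation that the score stops improving after roughly the fourth iteration. In the training loop this would mean saving model.state_dict() only when the dev score improves, instead of once at the very end.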

The output of my regression NN with LSTMs is wrong even with low val_loss

The Model
I am currently working on a stack of LSTMs and trying to solve a regression problem. The architecture of the model is as below:
comp_lstm = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(64, return_sequences = True),
    tf.keras.layers.LSTM(64, return_sequences = True),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(units=128),
    tf.keras.layers.Dense(units=64),
    tf.keras.layers.Dense(units=32),
    tf.keras.layers.Dense(units=1)
])
comp_lstm.compile(optimizer='adam', loss='mae')
When I train the model, it shows some good loss and val_loss figures:
Epoch 6/20
200/200 [==============================] - 463s 2s/step - loss: 1.3793 - val_loss: 1.3578
Epoch 7/20
200/200 [==============================] - 461s 2s/step - loss: 1.3791 - val_loss: 1.3602
Now I run the code to check the output with the code below:
idx = np.random.randint(len(val_X))
sample_X, sample_y = [[val_X[idx,:]]], [[val_y[idx]]]
test = tf.data.Dataset.from_tensor_slices(([sample_X], [sample_y]))
prediction = comp_lstm.predict(test)
print(f'The actual value was {sample_y} and the model predicted {prediction}')
And the output is:
The actual value was [[21.3]] and the model predicted [[2.7479606]]
The next few times I ran it, I got the value:
The actual value was [[23.1]] and the model predicted [[0.8445232]]
The actual value was [[21.2]] and the model predicted [[2.5449793]]
The actual value was [[22.5]] and the model predicted [[1.2662419]]
I am not sure why this is working out the way that it is. The val_loss is super low, but the output is wildly different.
The Data Wrangling
The data wrangling in order to get train_X and val_X etc. is shown below:
hist2 = 128
features2 = np.array(list(map(list,[df["scaled_temp"].shift(x) for x in range(1, hist2+1)]))).T.tolist()
df_feat2 = pd.DataFrame([pd.Series(x) for x in features2], index = df.index)
df_trans2 = df.join(df_feat2).drop(columns=['scaled_temp']).iloc[hist2:]
df_trans2 = df_trans2.sample(frac=1)
target = df_trans2['T (degC)'].values
feat2 = df_trans2.drop(columns = ['T (degC)']).values
The shape of feat2 is (44435, 128), while the shape of target is (44435,)
The column df["scaled_temp"] is shown below (it has been scaled with a standard scaler):
Date Time
2020-04-23T21:14:07.546476Z -0.377905
2020-04-23T21:17:32.406111Z -0.377905
2020-04-23T21:17:52.670373Z -0.377905
2020-04-23T21:18:55.010392Z -0.377905
2020-04-23T21:19:57.327291Z -0.377905
...
2020-06-08T09:13:06.718934Z -0.889968
2020-06-08T09:14:09.170193Z -0.889968
2020-06-08T09:15:11.634954Z -0.889968
2020-06-08T09:16:14.087139Z -0.889968
2020-06-08T09:17:16.549216Z -0.889968
Name: scaled_temp, Length: 44563, dtype: float64
The dataframe for df['T (degC)'] is shown below:
Date Time
2020-05-09T07:30:30.621001Z 24.0
2020-05-11T15:56:30.856851Z 21.3
2020-05-27T05:02:09.407266Z 28.3
2020-05-02T09:33:03.219329Z 20.5
2020-05-31T03:20:04.326902Z 22.4
...
2020-05-31T01:47:45.982819Z 23.1
2020-05-27T08:03:21.456607Z 27.2
2020-05-04T21:58:36.652251Z 20.9
2020-05-17T18:42:39.681050Z 22.5
2020-05-04T22:07:58.350329Z 21.1
Name: T (degC), Length: 44435, dtype: float64
The dataset creation process is as below:
train_X, val_X = feat2[:int(feat2.shape[0]*0.95), :], feat2[int(feat2.shape[0]*0.95):, :]
train_y, val_y = target[:int(target.shape[0]*0.95)], target[int(target.shape[0]*0.95):]
train = tf.data.Dataset.from_tensor_slices(([train_X], [train_y])).batch(BATCH_SIZE).repeat()
val = tf.data.Dataset.from_tensor_slices(([val_X], [val_y])).batch(BATCH_SIZE).repeat()
So I am not sure as to why this is happening.
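One thing that may be worth checking, offered as a possibility rather than a confirmed diagnosis, is the shape of what the datasets actually yield. tf.data.Dataset.from_tensor_slices slices its input along the first axis, so wrapping train_X in a list adds a leading axis of length 1 and the dataset then holds a single element containing the whole array. The plain-Python sketch below only mimics that shape arithmetic with nested lists. The helper nested_shape and the toy sizes are assumptions for illustration.

```python
def nested_shape(value):
    """Return the shape of a regular nested list, like a NumPy array shape."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        value = value[0]
    return tuple(dims)


# Toy stand-ins for train_X (rows x 128 lagged values) and train_y.
n_rows, hist = 6, 128
train_X = [[0.0] * hist for _ in range(n_rows)]
train_y = [0.0] * n_rows

# This is what ([train_X], [train_y]) looks like before slicing.
wrapped_X, wrapped_y = [train_X], [train_y]

print(nested_shape(train_X))    # one row per sample
print(nested_shape(wrapped_X))  # extra leading axis of length 1
print(len(wrapped_X))           # number of slices along the first axis
```

If the real dataset behaves this way, each training step would feed the LSTM one sequence whose length is the number of rows, which is a different shape from the single-row sample passed to predict. Printing the element shapes of train and test (for example via their element_spec) would confirm or rule this out.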

Unable to save weights while using pre-trained VGG16 model

While using the pre-trained VGG16 model I am unable to save the weights of the best model. I use this code:
checkpointer = [
    # Stop if the validation loss has not improved for 3 epochs
    EarlyStopping(monitor='val_loss', patience=3, verbose=1),
    # Save the best model so it can be reused for prediction
    ModelCheckpoint(filepath="C:/Users/skumarravindran/Documents/keras_save_model/vgg16_v1.hdf5", verbose=1, monitor='val_acc', save_best_only=True),
]
And I get the following error:
C:\Users\skumarravindran\AppData\Local\Continuum\Anaconda2\envs\py35gpu1\lib\site-packages\keras\callbacks.py:405: RuntimeWarning: Can save best model only with val_acc available, skipping.
'skipping.' % (self.monitor), RuntimeWarning)
I experienced two situations where this error arises:
introducing a custom metric
using multiple outputs
In both cases the acc and val_acc are not computed. Strangely, Keras does compute an overall loss and val_loss.
You can remedy the first situation by adding accuracy to the metrics, but that may have side effects; I am not sure. In both cases, however, you can add acc and val_acc yourself in a callback. I have added an example for the multi-output case, where I created a custom callback that computes my own acc and val_acc by averaging over the acc and val_acc values of all output layers.
I have a model with 5 dense output layers at the end, labeled D0..D4. The output of one epoch is as follows:
3540/3540 [==============================] - 21s 6ms/step - loss: 14.1437 -
D0_loss: 3.0446 - D1_loss: 2.6544 - D2_loss: 3.0808 - D3_loss: 2.7751 -
D4_loss: 2.5889 - D0_acc: 0.2362 - D1_acc: 0.3681 - D2_acc: 0.1542 - D3_acc: 0.1161 -
D4_acc: 0.3994 - val_loss: 8.7598 - val_D0_loss: 2.0797 - val_D1_loss: 1.4088 -
val_D2_loss: 2.0711 - val_D3_loss: 1.9064 - val_D4_loss: 1.2938 -
val_D0_acc: 0.2661 - val_D1_acc: 0.3924 - val_D2_acc: 0.1763 -
val_D3_acc: 0.1695 - val_D4_acc: 0.4627
As you can see it outputs an overall loss and val_loss and for each output layer: Di_loss, Di_acc, val_Di_loss and val_Di_acc, for i in 0..4. All of this is the content of the logs dictionary which is transmitted as a parameter in on_epoch_begin and on_epoch_end of a callback. Callbacks have more event handlers but for our purpose these two are the most relevant. When you have 5 outputs (as in my case) then the size of the dictionary is 5 times 4(acc, loss, val_acc, val_loss) + 2 (loss+val_loss).
What I did is compute the average of all accuracies and validation accuracies to add two items to logs:
logs['acc'] = som_acc / n_accs
logs['val_acc'] = som_val_acc / n_accs
Be sure to add this callback before the checkpoint callback, otherwise the extra information you provide will not be 'seen'. If everything is implemented correctly, the error message no longer appears and the model is happily checkpointing.
The code of my callback for the multiple output case is provided below.
import time
import keras

class ExtraLogInfo(keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs):
        # Remember when the epoch started so its duration can be logged
        self.timed = time.time()
        return
    def on_epoch_end(self, epoch, logs):
        print(logs.keys())
        som_acc = 0.0
        som_val_acc = 0.0
        # logs holds loss and val_loss plus 4 entries per output layer
        n_accs = (len(logs) - 2) // 4
        for i in range(n_accs):
            acc_ptn = 'D{:d}_acc'.format(i)
            val_acc_ptn = 'val_D{:d}_acc'.format(i)
            som_acc += logs[acc_ptn]
            som_val_acc += logs[val_acc_ptn]
        logs['acc'] = som_acc / n_accs
        logs['val_acc'] = som_val_acc / n_accs
        logs['time'] = time.time() - self.timed
        return
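To make the averaging step concrete without needing Keras, here is a small sketch that applies the same arithmetic to a plain dictionary filled with the values from the epoch output quoted above. The function name add_mean_accuracy is an assumption for illustration; the key names follow the D0..D4 pattern from the log.

```python
def add_mean_accuracy(logs):
    """Add averaged 'acc' and 'val_acc' entries to a Keras-style logs dict."""
    # Two overall entries (loss, val_loss) plus four entries per output layer.
    n_outputs = (len(logs) - 2) // 4
    acc = sum(logs["D{:d}_acc".format(i)] for i in range(n_outputs)) / n_outputs
    val_acc = sum(logs["val_D{:d}_acc".format(i)] for i in range(n_outputs)) / n_outputs
    logs["acc"] = acc
    logs["val_acc"] = val_acc
    return logs


# Values copied from the epoch output shown above.
logs = {
    "loss": 14.1437, "val_loss": 8.7598,
    "D0_loss": 3.0446, "D1_loss": 2.6544, "D2_loss": 3.0808,
    "D3_loss": 2.7751, "D4_loss": 2.5889,
    "D0_acc": 0.2362, "D1_acc": 0.3681, "D2_acc": 0.1542,
    "D3_acc": 0.1161, "D4_acc": 0.3994,
    "val_D0_loss": 2.0797, "val_D1_loss": 1.4088, "val_D2_loss": 2.0711,
    "val_D3_loss": 1.9064, "val_D4_loss": 1.2938,
    "val_D0_acc": 0.2661, "val_D1_acc": 0.3924, "val_D2_acc": 0.1763,
    "val_D3_acc": 0.1695, "val_D4_acc": 0.4627,
}

add_mean_accuracy(logs)
print(round(logs["acc"], 4), round(logs["val_acc"], 4))
```

The count of outputs is derived from the dictionary size, exactly as in the callback, so the helper only works when the dictionary contains nothing beyond the entries listed.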
The following code saves the best model based on validation accuracy:
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
# Note: 'val_acc' is only logged when validation data is also passed to fit_generator
history = model.fit_generator(
    train_datagen.flow(x_train, y_train, batch_size=batch_size),
    steps_per_epoch=x_train.shape[0] // batch_size,
    epochs=epochs,
    callbacks=[ModelCheckpoint('VGG16-transferlearning.model', monitor='val_acc', save_best_only=True)]
)

Why is my accuracy high from the beginning of training?

I am training a neural network to recognize some of attributes on .png pictures, and what I get when I start training is something like this, and it is increasing till the end of the epoch:
32/4817 [..............................] - ETA: 167s - loss: 0.6756 - acc: 0.5
64/4817 [..............................] - ETA: 152s - loss: 0.6214 - acc: 0.7
96/4817 [..............................] - ETA: 145s - loss: 0.6169 - acc: 0.7
128/4817 [.............................] - ETA: 142s - loss: 0.5972 - acc: 0.7
160/4817 [.............................] - ETA: 140s - loss: 0.5734 - acc: 0.7
192/4817 [>............................] - ETA: 138s - loss: 0.5604 - acc: 0.7
224/4817 [>............................] - ETA: 137s - loss: 0.5427 - acc: 0.7
256/4817 [>............................] - ETA: 135s - loss: 0.5160 - acc: 0.7
288/4817 [>............................] - ETA: 134s - loss: 0.5492 - acc: 0.7
320/4817 [>............................] - ETA: 133s - loss: 0.5574 - acc: 0.7
352/4817 [=>...........................] - ETA: 131s - loss: 0.5559 - acc: 0.7
384/4817 [=>...........................] - ETA: 129s - loss: 0.5550 - acc: 0.7
416/4817 [=>...........................] - ETA: 128s - loss: 0.5504 - acc: 0.7
448/4817 [=>...........................] - ETA: 127s - loss: 0.5417 - acc: 0.7
480/4817 [=>...........................] - ETA: 126s - loss: 0.5425 - acc: 0.7
My question is why is the starting accuracy so high? I suppose it should be something around 0.1 and then increasing while learning.
Also, at the end I get:
('Test loss:', 0.42451223436727564)
('Test accuracy:', 0.82572614112830256)
Is that test loss too high?
This is my network:
input_shape = x_train[0].shape
print(input_shape)
model = Sequential()
stoplearn = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0,
                                          patience=0, verbose=0, mode='auto')
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=20,
          verbose=1,
          validation_data=(x_test, y_test),
          callbacks=[stoplearn])
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
It is written in Python using Keras.
You classify your data into two classes (since your output layer is of size 2), so accuracy of 0.5 is not high. In fact, it means that your network behaves randomly, which is what you expect at the beginning. Regarding the loss, there is no absolute answer for that. Your test accuracy seems not bad, and you could try to play with some of the parameters (for example, taking smaller size for the fully connected layer) to see whether you can improve it.
You have two classes. Random choice will lead to 50% accuracy. This is what you get in the beginning. Hence your result is expected.
The reason why it jumps directly to 70% accuracy could be that your problem is simple.
If you want to double-check it, you could
use other classifiers,
check how many examples are used to calculate accuracy,
serialize the trained classifier and manually feed it with new examples and check their results
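The chance-level argument can be put in numbers. For a classifier that assigns equal probability to every class, and assuming the classes are balanced, the expected accuracy is 1/k and the categorical cross-entropy is ln(k). The sketch below computes both; the function name chance_baseline is an assumption for illustration.

```python
import math


def chance_baseline(num_classes):
    """Accuracy and cross-entropy of a uniform guess over balanced classes."""
    accuracy = 1.0 / num_classes
    loss = math.log(num_classes)  # -ln(1 / num_classes)
    return accuracy, loss


for k in (2, 10):
    acc, loss = chance_baseline(k)
    print("{} classes: accuracy {:.3f}, cross-entropy {:.4f}".format(k, acc, loss))
```

For two classes this gives an accuracy of 0.5 and a loss of about 0.693, close to the first logged values (loss 0.6756, acc 0.5). A starting accuracy around 0.1 would be the expectation for ten classes, not two.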

Keras: batch training for multiple large datasets

This question regards the common problem of training on multiple large files in Keras which are jointly too large to fit in GPU memory.
I am using Keras 1.0.5 and I would like a solution that does not require 1.0.6.
One way to do this was described by fchollet here and here:
import pickle

# Create generator that yields (current features X, current labels y)
def BatchGenerator(files):
    for file in files:
        # Open the file named by the loop variable, not the literal string "file"
        with open(file, "rb") as f:
            current_data = pickle.load(f)
        X_train = current_data[:,:-1]
        y_train = current_data[:,-1]
        yield (X_train, y_train)
# train model on each dataset
for epoch in range(n_epochs):
    for (X_train, y_train) in BatchGenerator(files):
        model.fit(X_train, y_train, batch_size = 32, nb_epoch = 1)
However I fear that the state of the model is not saved, rather that the model is reinitialized not only between epochs but also between datasets. Each "Epoch 1/1" represents training on a different dataset below:
~~~~~ Epoch 0 ~~~~~~
Epoch 1/1
295806/295806 [==============================] - 13s - loss: 15.7517
Epoch 1/1
407890/407890 [==============================] - 19s - loss: 15.8036
Epoch 1/1
383188/383188 [==============================] - 19s - loss: 15.8130
~~~~~ Epoch 1 ~~~~~~
Epoch 1/1
295806/295806 [==============================] - 14s - loss: 15.7517
Epoch 1/1
407890/407890 [==============================] - 20s - loss: 15.8036
Epoch 1/1
383188/383188 [==============================] - 15s - loss: 15.8130
I am aware that one can use model.fit_generator but as the method above was repeatedly suggested as a way of batch training I would like to know what I am doing wrong.
Thanks for your help,
Max
It has been a while since I faced that problem, but I remember that I used Keras's functionality to provide data through Python generators, i.e. model = Sequential(); model.fit_generator(...).
An exemplary code snippet (it should be self-explanatory):
import pickle
import numpy as np

def generate_batches(files, batch_size):
    counter = 0
    while True:
        # Cycle through the files forever, one file in memory at a time
        fname = files[counter]
        print(fname)
        counter = (counter + 1) % len(files)
        with open(fname, "rb") as f:
            data_bundle = pickle.load(f)
        X_train = data_bundle[0].astype(np.float32)
        y_train = data_bundle[1].astype(np.float32)
        y_train = y_train.flatten()
        for cbatch in range(0, X_train.shape[0], batch_size):
            yield (X_train[cbatch:(cbatch + batch_size),:,:], y_train[cbatch:(cbatch + batch_size)])
model = Sequential()
# (layers omitted in the original snippet)
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
train_files = [train_bundle_loc + "bundle_" + str(cb) for cb in range(nb_train_bundles)]
gen = generate_batches(files=train_files, batch_size=batch_size)
history = model.fit_generator(gen, samples_per_epoch=samples_per_epoch, nb_epoch=num_epoch,verbose=1, class_weight=class_weights)
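The cycling behaviour of such a generator can be checked on its own, without Keras or large files. The sketch below is a simplified standard-library variant of the same idea. It writes two tiny pickled bundles to a temporary directory and pulls the first six batches. The toy data, the file names and the use of plain lists instead of NumPy arrays are assumptions made so the example runs anywhere.

```python
import itertools
import os
import pickle
import tempfile


def generate_batches(files, batch_size):
    """Cycle over pickled (X, y) bundles forever, yielding one mini-batch at a time."""
    while True:
        for fname in files:
            with open(fname, "rb") as fh:
                X, y = pickle.load(fh)
            for start in range(0, len(X), batch_size):
                yield X[start:start + batch_size], y[start:start + batch_size]


# Hypothetical toy data: two bundles with 5 and 3 rows.
tmp_dir = tempfile.mkdtemp()
files = []
for i, n_rows in enumerate([5, 3]):
    path = os.path.join(tmp_dir, "bundle_{}".format(i))
    X = [[i, r] for r in range(n_rows)]
    y = [r % 2 for r in range(n_rows)]
    with open(path, "wb") as fh:
        pickle.dump((X, y), fh)
    files.append(path)

# Take six batches: all of bundle 0, all of bundle 1, then bundle 0 again.
batches = list(itertools.islice(generate_batches(files, batch_size=2), 6))
batch_sizes = [len(X) for X, _ in batches]
print(batch_sizes)
```

Because the generator never terminates, the consumer decides how many samples make up an epoch, which is the role samples_per_epoch plays in the fit_generator call above.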