LSTM hyperparameter tuning results in overfitting predictions - deep-learning

I am running a hyperparameter search with the following code, but every time it produces predictions that overshoot the actual values by roughly 20x-40x...
Can anybody pinpoint the issue in the code? The code is below all the text descriptions.
The dataset used is limited to ~84 data points.
THE CODE FOR MODEL BUILDING/SEARCH:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

def build_model(hp):
    model = Sequential()
    model.add(LSTM(hp.Int('input_unit', min_value=0, max_value=420, step=6),
                   return_sequences=True,
                   input_shape=(scaled_train_data.shape[0], scaled_train_data.shape[1])))
    for i in range(hp.Int('n_layers', 1, 5)):
        model.add(LSTM(hp.Int(f'lstm_{i}_units', min_value=0, max_value=420, step=6),
                       return_sequences=True))
    model.add(LSTM(hp.Int('layer_2_neurons', min_value=0, max_value=420, step=6)))
    model.add(Dropout(hp.Float('Dropout_rate', min_value=0, max_value=0.6, step=0.1)))
    model.add(Dense(scaled_train_data.shape[1],
                    activation=hp.Choice('dense_activation',
                                         values=['relu', 'sigmoid', 'tanh'],
                                         default='relu')))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mse'])
    return model
CODE FOR TRIALS AND PREDICTIONS:
from kerastuner.tuners import RandomSearch

tuner = RandomSearch(
    build_model,
    objective='mse',
    max_trials=60,
    executions_per_trial=1
)
tuner.search(
    x=scaled_train_data,
    y=scaled_train_data,
    epochs=30,
    batch_size=2,
    validation_data=(scaled_test_data, scaled_test_data),
)
best_model = tuner.get_best_models(num_models=1)[0]

lstm_predictions_scaled = list()
batch = scaled_train_data[-12:]
current_batch = batch.reshape((1, 12, 1))
for i in range(len(test_data)):
    lstm_pred = best_model.predict(current_batch)[0]
    lstm_predictions_scaled.append(lstm_pred)
    current_batch = np.append(current_batch[:, 1:, :], [[lstm_pred]], axis=1)
lstm_predictions = scaler.inverse_transform(lstm_predictions_scaled)
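For context on the shapes involved: a Keras LSTM expects input of shape (samples, timesteps, features), so the series would normally be cut into windows rather than passed in whole. A minimal sketch of the windowing I mean, assuming a univariate series and the same 12-step window as the prediction loop (make_windows is an illustrative name, not from my code):

import numpy as np

def make_windows(series, window=12):
    # Slide a fixed-length window over the series; each window predicts the next value.
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X).reshape(-1, window, 1), np.array(y).reshape(-1, 1)

X_train, y_train = make_windows(scaled_train_data.ravel(), window=12)
# X_train.shape == (n_samples, 12, 1), matching an input_shape of (12, 1)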

Related

Multi-task problem - Print accuracy and MSE after every epoch

I am training a CNN on face images and I want it to perform classification and regression tasks at the same time. I figured out how to train the CNN as below:
resnet = tf.keras.applications.ResNet50(
    include_top=False,
    weights='imagenet',
    input_shape=(96, 96, 3),
    pooling="avg"
)
for layer in resnet.layers:
    layer.trainable = True

inputs = Input(shape=(96, 96, 3), name='main_input')
main_branch = resnet(inputs)
main_branch = Flatten()(main_branch)
expr_branch = Dense(8, activation='softmax', name='expr_output')(main_branch)
va_branch = Dense(2, name='va_output')(main_branch)
model = Model(inputs=inputs,
              outputs=[expr_branch, va_branch])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              loss={'expr_output': 'sparse_categorical_crossentropy',
                    'va_output': 'mean_squared_error'})
After every epoch I want to print the accuracy and MSE metrics for each task. So far I have written the following:
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    model_path,
    save_weights_only=True,
    verbose=1
)
history = model.fit_generator(
    train_generator,
    epochs=2,
    steps_per_epoch=STEP_SIZE_TRAIN_resnet,
    validation_data=test_generator,
    validation_steps=STEP_SIZE_TEST_resnet,
    max_queue_size=1,
    shuffle=True,
    callbacks=[checkpoint],
    verbose=1
)
When I had only the classification task, I would write:
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    model_path,
    monitor='val_accuracy',
    save_best_only=True,
    mode='max',
    verbose=1
)
which printed val_accuracy at every epoch and saved the best weights. How can I do the same (print MSE and accuracy, and save the weights after every epoch) in a multi-task problem?
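For reference, here is what I think the per-output version would look like, a sketch based on my reading of the Keras docs (the logged metric names, e.g. val_expr_output_accuracy, are my assumption):

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              loss={'expr_output': 'sparse_categorical_crossentropy',
                    'va_output': 'mean_squared_error'},
              metrics={'expr_output': 'accuracy', 'va_output': 'mse'})

checkpoint = tf.keras.callbacks.ModelCheckpoint(
    model_path,
    monitor='val_expr_output_accuracy',  # assumed name of the expression head's metric
    save_best_only=True,
    mode='max',
    verbose=1
)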

How to add an additional output node during training in PyTorch?

I am making a class-incremental learning multi-label classifier. The model first trains with 7 labels. After training, another dataset emerges that contains the same labels plus one more. I want to automatically add an extra node to the trained network and continue training on the new dataset. How can I do this?
class FeedForewardNN(nn.Module):
    def __init__(self, input_size, h1_size=264, h2_size=128, num_services=8):
        super().__init__()
        self.input_size = input_size
        self.lin1 = nn.Linear(input_size, h1_size)
        self.lin2 = nn.Linear(h1_size, h2_size)
        self.lin3 = nn.Linear(h2_size, num_services)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.lin1(x)
        x = self.relu(x)
        x = self.lin2(x)
        x = self.relu(x)
        x = self.lin3(x)
        x = self.sigmoid(x)
        return x
This is the architecture of the feedforward neural network.
Then I first train on the dataset with only 7 classes.
# Create NN
input_size = len(x_columns)
net1 = FeedForewardNN(input_size, num_services=7)
alpha = 0.001

# Define optimizer and loss
optimizer = optim.Adam(net1.parameters(), lr=alpha)
criterion = nn.BCELoss()
running_loss = 0

# Training loop
loss_list = []
auc_list = []
for i in range(len(train_data_x)):
    optimizer.zero_grad()
    outputs = net1(train_data_x[i])
    loss = criterion(outputs, train_data_y[i])
    loss.backward()
    optimizer.step()
Then I want to add one additional output node (with newly defined weights) while keeping the old trained weights, and train on this new dataset.

I suggest replacing the layer with a new one of the desired shape, and then partially assigning its parameter values from the old one, as follows:
def increaseClassifier(m: torch.nn.Linear):
    old_shape = m.weight.shape
    # New layer with one extra output; its parameters start randomly initialized.
    m2 = nn.Linear(old_shape[1], old_shape[0] + 1)
    # Keep the trained rows and append the one freshly initialized row.
    m2.weight = nn.parameter.Parameter(torch.cat((m.weight, m2.weight[0:1])))
    m2.bias = nn.parameter.Parameter(torch.cat((m.bias, m2.bias[0:1])))
    return m2

class FeedForewardNN(nn.Module):
    ...
    def incrHere(self):
        self.lin3 = increaseClassifier(self.lin3)
UPD:
Can you explain how the additional weights that come with the new output node are initialized?

The initial weights for the new output come from the creation of the new layer: the layer constructor creates its parameters with a random initialization, we then replace part of them with the trained weights, and the remaining part is ready for new training:
m2.weight = nn.parameter.Parameter(torch.cat((m.weight, m2.weight[0:1])))
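A usage sketch, extrapolating from the answer above (new_train_data_x/new_train_data_y stand for the hypothetical 8-label dataset). Note that the optimizer has to be rebuilt, since the old one still references the replaced parameters:

net1.incrHere()  # lin3 now has 8 outputs; the first 7 rows keep their trained weights
optimizer = optim.Adam(net1.parameters(), lr=alpha)  # rebuild so it tracks the new layer
for i in range(len(new_train_data_x)):  # hypothetical new dataset with 8 labels
    optimizer.zero_grad()
    outputs = net1(new_train_data_x[i])
    loss = criterion(outputs, new_train_data_y[i])
    loss.backward()
    optimizer.step()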

What should I do if my regression model is stuck at a high loss value?

I'm using neural nets for a regression problem where I have 3 features and I'm trying to predict one continuous value. I noticed that my neural net starts learning well, but after 10 epochs it gets stuck at a high loss value and cannot improve any further.
I tried using Adam and other adaptive optimizers instead of SGD, but that didn't work. I tried more complex architectures, adding layers, neurons, batch normalization, other activations, etc., and that didn't work either.
I tried to debug and find out whether something was wrong with the implementation, but when I use only 10 examples of the data my model learns fast, so there is no obvious error. I started increasing the number of examples and monitoring the results as I increased them. When I reach 3000 examples, the model starts to get stuck at a high loss value.
I tried increasing layers and neurons, and also tried other activations and batch normalization. My data are normalized to [-1, 1]; my target value is not normalized, since this is regression and I'm predicting a continuous value. I also tried using Keras, and got the same result.
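One thing I have not tried is scaling the target as well; for reference, a minimal sketch of what that would look like with sklearn (y_distance is defined in the code below; the prediction lines are illustrative):

from sklearn.preprocessing import StandardScaler

y_scaler = StandardScaler()
y_distance_norm = y_scaler.fit_transform(y_distance)  # train on the scaled target instead
# ...after training, undo the scaling on the predictions:
# preds = y_scaler.inverse_transform(model.predict(X_test))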
My real dataset has 40,000 examples. I don't know what else to try; I have tried almost everything I know for optimization, but none of it worked. I would appreciate it if someone could guide me. I'll post my code, though it may be too messy to follow easily; I'm fairly sure there is no problem with my implementation. I'm using skorch/PyTorch and some sklearn functions:
# Take all features as independent variables, except the bearing and distance.
# When I start small the model learns well, but from 3000 data points it gets stuck
# at a high value: the loss starts at 15, decreases nicely, but sticks once it reaches 9.
# If I use the whole dataset, the loss starts at 47, decreases to 36, and sticks there.
X = dataset.iloc[:3000, 0:-2].reset_index(drop=True).to_numpy().astype(np.float32)

# Take distance and bearing as the output values:
y = dataset.iloc[:3000, -2:].reset_index(drop=True).to_numpy().astype(np.float32)
y_bearing = y[:, 0].reshape(-1, 1)
y_distance = y[:, 1].reshape(-1, 1)

# Normalize the input values
scaler = StandardScaler()
X_norm = scaler.fit_transform(X, y)

X_br_train, X_br_test, y_br_train, y_br_test = train_test_split(
    X_norm, y_bearing, test_size=0.1, random_state=42, shuffle=True)

X_dis_train, X_dis_test, y_dis_train, y_dis_test = train_test_split(
    X_norm, y_distance, test_size=0.1, random_state=42, shuffle=True)

bearing_trainset = Dataset(X_br_train, y_br_train)
bearing_testset = Dataset(X_br_test, y_br_test)
distance_trainset = Dataset(X_dis_train, y_dis_train)
distance_testset = Dataset(X_dis_test, y_dis_test)
def root_mse(y_true, y_pred):
    return np.sqrt(mean_squared_error(y_true, y_pred))

class RMSELoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.mse = nn.MSELoss()

    def forward(self, yhat, y):
        return torch.sqrt(self.mse(yhat, y))

class AED(nn.Module):
    """Custom average euclidean distance loss."""
    def __init__(self):
        super().__init__()

    def forward(self, yhat, y):
        return torch.dist(yhat, y)
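For reference, a quick sanity check of the two custom losses on toy tensors (values are illustrative):

yhat = torch.tensor([[1.0], [2.0]])
y = torch.tensor([[0.0], [2.0]])
print(RMSELoss()(yhat, y))  # sqrt(mean([1, 0])) = sqrt(0.5) ~= 0.7071
print(AED()(yhat, y))       # torch.dist defaults to the euclidean norm: 1.0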
def train(on_target,
          hidden_units,
          batch_size,
          epochs,
          optimizer,
          lr,
          regularisation_factor,
          train_shuffle):
    network = None
    trainset = distance_trainset if on_target.lower() == 'distance' else bearing_trainset
    testset = distance_testset if on_target.lower() == 'distance' else bearing_testset
    print(f"shape of trainset.X = {trainset.X.shape}, shape of trainset.y = {trainset.y.shape}")
    print(f"shape of testset.X = {testset.X.shape}, shape of testset.y = {testset.y.shape}")

    mse = EpochScoring(scoring=mean_squared_error, lower_is_better=True, name='MSE')
    r2 = EpochScoring(scoring=r2_score, lower_is_better=False, name='R2')
    rmse = EpochScoring(scoring=make_scorer(root_mse), lower_is_better=True, name='RMSE')
    checkpoint = Checkpoint(dirname=f'results/{on_target}/checkpoints')
    train_end_checkpoint = TrainEndCheckpoint(dirname=f'results/{on_target}/checkpoints')

    if on_target.lower() == 'bearing':
        network = BearingNetwork(n_features=X_norm.shape[1],
                                 n_hidden=hidden_units,
                                 n_out=y_distance.shape[1])
    elif on_target.lower() == 'distance':
        network = DistanceNetwork(n_features=X_norm.shape[1],
                                  n_hidden=hidden_units,
                                  n_out=1)

    model = NeuralNetRegressor(
        module=network,
        criterion=RMSELoss,
        device='cpu',
        batch_size=batch_size,
        lr=lr,
        optimizer=optim.Adam if optimizer.lower() == 'adam' else optim.SGD,
        optimizer__weight_decay=regularisation_factor,
        max_epochs=epochs,
        iterator_train__shuffle=train_shuffle,
        train_split=predefined_split(testset),
        callbacks=[mse, r2, rmse, checkpoint, train_end_checkpoint]
    )

    print(f"{'*' * 10} start training the {on_target} model {'*' * 10}")
    history = model.fit(trainset, y=None)
    print(f"{'*' * 10} End Training the {on_target} Model {'*' * 10}")


if __name__ == '__main__':
    args = parser.parse_args()
    train(on_target=args.on_target,
          hidden_units=args.hidden_units,
          batch_size=args.batch_size,
          epochs=args.epochs,
          optimizer=args.optimizer,
          lr=args.learning_rate,
          regularisation_factor=args.regularisation_lambda,
          train_shuffle=args.shuffle)
and this is my network declaration:
class DistanceNetwork(nn.Module):
    """Separate NN for predicting distance."""
    def __init__(self, n_features=5, n_hidden=16, n_out=1):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(n_features, n_hidden),
            nn.LeakyReLU(),
            nn.Linear(n_hidden, 5),
            nn.LeakyReLU(),
            nn.Linear(5, n_out)
        )

    def forward(self, x):
        # Forward pass (this method was missing from the snippet as posted).
        return self.model(x)
Here is the log while training: [training log screenshot omitted]

Using ImageDataGenerator with your own generator

I have a large dataset that will not fit in memory, and it has multiple inputs, so I created my own generator. But when I wanted to augment my data using ImageDataGenerator, I ran into a problem: I don't know how to combine the two generators.
What I have done so far is:
def data_gen(batch_size=None, nb_epochs=None, sess=None):
    dataset = tf.data.TFRecordDataset(training_filenames)
    dataset = dataset.map(_parse_function_all)
    dataset = dataset.shuffle(buffer_size=1000 + 4 * batch_size)
    dataset = dataset.batch(batch_size).repeat()
    iterator = dataset.make_initializable_iterator()
    next_element = iterator.get_next()
    for i in range(nb_epochs):
        sess.run(iterator.initializer)
        while True:
            try:
                next_val = sess.run(next_element)
                images_a = next_val[0][:, 0]
                images_b = next_val[0][:, 1]
                labels = next_val[1]
                yield [images_a, images_b], labels
            except tf.errors.OutOfRangeError:
                break

mymodel = Model(input=[input_a, input_b], output=out)
mymodel.compile(loss=loss_both_equal, optimizer=rms, metrics=['accuracy', auc_roc])
data_gen_1 = data_gen(batch_size=batch_size, nb_epochs=10, sess=sess)
mymodel.fit_generator(generator=data_gen_1, epochs=epochs,
                      steps_per_epoch=335,
                      callbacks=[tensorboard, alphaChanger])
So if I want to do some augmentation, how can I combine my own generator with ImageDataGenerator?
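One approach would be to keep my generator as-is and use ImageDataGenerator only for its random transforms, via its random_transform method; a minimal sketch, assuming images_a/images_b are batches of channels-last image arrays:

from keras.preprocessing.image import ImageDataGenerator
import numpy as np

augmenter = ImageDataGenerator(rotation_range=15, horizontal_flip=True)

def augmented_data_gen(batch_size=None, nb_epochs=None, sess=None):
    # Wrap the existing generator and augment each image of every batch.
    for [images_a, images_b], labels in data_gen(batch_size, nb_epochs, sess):
        images_a = np.stack([augmenter.random_transform(img) for img in images_a])
        images_b = np.stack([augmenter.random_transform(img) for img in images_b])
        yield [images_a, images_b], labels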

How to build Convolutional Bi-directional LSTM with Keras

I'm trying to build a convolutional bi-directional LSTM to classify DNA sequences, following this paper: DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences (architecture picture on the second page).
The short version is to one-hot encode a DNA sequence:

'ATACG...' =
[[1,0,0,0],
 [0,0,0,1],
 [1,0,0,0],
 [0,1,0,0],
 [0,0,1,0],
 ...]
Then feed it to a convolutional-relu-maxpooling layer to find motifs, then into a bidirectional LSTM network to learn long-distance dependencies.
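A sketch of that encoding in numpy (base order A, C, G, T assumed, matching the example above):

import numpy as np

BASE_INDEX = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def one_hot(seq):
    # Build a (sequence length, 4) array with a single 1 per row.
    arr = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq):
        arr[i, BASE_INDEX[base]] = 1.0
    return arr

one_hot('ATACG')  # reproduces the rows shown above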
The original source code is here.
However, it uses an outdated version of Keras and includes a dependency on Seya, which I'd like to avoid. Here is my first attempt at building the model:
inputs = Input(shape=(500, 4))
convo_1 = Convolution1D(320, border_mode='valid', filter_length=26,
                        activation="relu", subsample_length=1)(inputs)
maxpool_1 = MaxPooling1D(pool_length=13, stride=13)(convo_1)
drop_1 = Dropout(0.2)(maxpool_1)
l_lstm = LSTM(320, return_sequences=True, go_backwards=False)(drop_1)
r_lstm = LSTM(320, return_sequences=True, go_backwards=True)(drop_1)
merged = merge([l_lstm, r_lstm], mode='sum')
drop_2 = Dropout(0.5)(merged)
flat = Flatten()(drop_2)
dense_1 = Dense(320, activation='relu')(flat)
out = Dense(num_classes, activation='sigmoid')(dense_1)
model = Model(inputs, out)

print('compiling model')
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
checkpointer = ModelCheckpoint(filepath=sc_local_dir + "DanQ_bestmodel.hdf5", verbose=1, save_best_only=True)
earlystopper = EarlyStopping(monitor='val_loss', patience=5, verbose=1)
Unfortunately, the loss remained nearly constant during training, and the accuracy stayed constant as well. This leads me to believe that I have set the model up incorrectly, or that 1-dimensional convolution is useless on this kind of input. So I attempted to switch to 2D convolution:
inputs = Input(shape=(1, 500, 4))
convo_1 = Convolution2D(320, nb_row=15, nb_col=4, init='glorot_uniform',
                        activation='relu', border_mode='same')(inputs)
maxpool_1 = MaxPooling2D((15, 4))(convo_1)
flat_1 = Flatten()(maxpool_1)
drop_1 = Dropout(0.2)(flat_1)
l_lstm = LSTM(320, return_sequences=True, go_backwards=False)(drop_1)
r_lstm = LSTM(320, return_sequences=True, go_backwards=True)(drop_1)
merged = merge([l_lstm, r_lstm], mode='sum')
drop_2 = Dropout(0.5)(merged)
flat = Flatten()(drop_2)
dense_1 = Dense(320, activation='relu')(flat)
out = Dense(num_classes, activation='sigmoid')(dense_1)
model = Model(inputs, out)

print('compiling model')
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
checkpointer = ModelCheckpoint(filepath=sc_local_dir + "DanQ_bestmodel.hdf5", verbose=1, save_best_only=True)
earlystopper = EarlyStopping(monitor='val_loss', patience=5, verbose=1)
This gives me the following error when trying to feed the flattened layer into the LSTM:
Exception: Input 0 is incompatible with layer lstm_4: expected ndim=3, found ndim=2
Have I set up my 1D convolution LSTM correctly? If so, then I likely need to upgrade to a 2D convolution LSTM; in that case, how can I fix the input error?
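For comparison, here is how I would expect the 1D version to look in a newer Keras, with the Bidirectional wrapper replacing the manual two-LSTM merge (argument names follow the newer API; a sketch, not a tested equivalent of the DanQ code):

from keras.models import Model
from keras.layers import (Input, Conv1D, MaxPooling1D, Dropout,
                          Bidirectional, LSTM, Flatten, Dense)

inputs = Input(shape=(500, 4))
x = Conv1D(320, kernel_size=26, activation='relu')(inputs)  # was filter_length=26
x = MaxPooling1D(pool_size=13, strides=13)(x)               # was pool_length/stride
x = Dropout(0.2)(x)
x = Bidirectional(LSTM(320, return_sequences=True), merge_mode='sum')(x)  # replaces merge(..., mode='sum')
x = Dropout(0.5)(x)
x = Flatten()(x)
x = Dense(320, activation='relu')(x)
out = Dense(num_classes, activation='sigmoid')(x)
model = Model(inputs, out)
model.compile(loss='binary_crossentropy', optimizer='rmsprop')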