Related
I am developing an AI for video classification, which classifies a video file into one of three labels: Normal, Violent, or Pornography.
Here is a summary of my efforts so far to improve the accuracy of the model:
1. Dataset: I have collected a training dataset of 50,000 videos, consisting of 5000 original videos and 45,000 augmented videos, evenly split between the three labels.
2. Pre-processing: I have used an InceptionV3 model pre-trained on the ImageNet dataset to extract features from the videos for feeding into my main model.
3. Model Architecture: I have tried many different model architectures, but all of them resulted in overfitting problems after a maximum of 15 epochs.
4. Regularization: I have added L1 and L2 regularization, but they did not help improve the model.
5. Early Stopping: I have implemented early stopping, but it stopped training when the validation values were still not good enough to achieve good accuracy.
6. Model Complexity: I have tried both complex and less complex models, but both still resulted in overfitting.
7. Batch Normalization: I have added batch normalization, but it did not solve the overfitting problem.
8. Learning Rate Scheduler: I have tried using ReduceLROnPlateau and LearningRateScheduler togheter and alone, but still no luck.
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, verbose=0, mode='min', min_delta=0.0001, cooldown=0, min_lr=0)
lr_schedule = keras.callbacks.LearningRateScheduler(
lambda epoch: 0.0005* tf.math.exp(-0.05 * epoch),
verbose=True)
9. Computing Resources: I am running the training on AWS Sagemaker ml.t3.2xlarge with 32GB RAM memory.
10. Dataset Size: I would prefer to avoid increasing the size of the dataset as I am running short on time for the project delivery. However, if this is my only option, I am open to suggestions.
11. Tuning regularizer Gradually increase the regularization value in each layer to fine-tune the model.
Please note that these are just examples of the models I have tried, I have experimented with many others with similar results.
x = keras.layers.GRU(32, return_sequences=True, kernel_regularizer=keras.regularizers.l2(0.001))(
frame_features_input, mask=mask_input
)
x = keras.layers.GRU(16, kernel_regularizer=keras.regularizers.l2(0.001))(x)
x = keras.layers.Dropout(0.4)(x)
x = keras.layers.Dense(1024, activation="relu",
kernel_regularizer=keras.regularizers.l2(0.001))(x)
x = keras.layers.Dropout(0.5)(x)
x = keras.layers.Dense(256, activation="relu",
kernel_regularizer=keras.regularizers.l2(0.001))(x)
x = keras.layers.Dropout(0.5)(x)
x = keras.layers.Dense(128, activation="relu",
kernel_regularizer=keras.regularizers.l2(0.001))(x)
output = keras.layers.Dense(len(class_vocab), activation="softmax")(x)
rnn_model = keras.Model([frame_features_input, mask_input], output)
opt = keras.optimizers.experimental.AdamW(
learning_rate=0.0001, # 0.001
weight_decay=0.004, # .004 best perform
beta_1=0.9,
beta_2=0.999,
epsilon=1e-07,
amsgrad=False,
clipnorm=None,
clipvalue=None,
global_clipnorm=None,
use_ema=False,
ema_momentum=0.99,
ema_overwrite_frequency=None,
jit_compile=True,
name="AdamW")
rnn_model.compile(
loss="sparse_categorical_crossentropy", optimizer=opt, metrics=["accuracy"]
)
x = keras.layers.GRU(128, return_sequences=True, recurrent_dropout=0.3)(frame_features_input)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.GRU(64, return_sequences=False, recurrent_dropout=0.3)(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Dense(32, activation="relu", kernel_regularizer=keras.regularizers.l2(0.01))(x)
x = keras.layers.Dropout(0.5)(x)
x = keras.layers.BatchNormalization()(x)
output = keras.layers.Dense(len(class_vocab), activation="softmax")(x)
x = keras.layers.GRU(256, return_sequences=True, recurrent_dropout=0.3)(frame_features_input)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.GRU(128, return_sequences=True, recurrent_dropout=0.3)(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.GRU(64, return_sequences=False, recurrent_dropout=0.3)(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Dense(32, activation="relu", kernel_regularizer=keras.regularizers.l2(0.01))(x)
x = keras.layers.Dropout(0.5)(x)
x = keras.layers.BatchNormalization()(x)
output = keras.layers.Dense
The results
The results using learning rate scheduler
Tried different model architectures, adding regularization, early stopping, and batch normalization, but still faced overfitting issue. Expected improved accuracy, but actual results show overfitting.
I'm using neural nets for a regression problem where I have 3 features and I'm trying to predict one continuous value. I noticed that my neural net start learning good but after 10 epochs it get stuck on a high loss value and could not improve anymore.
I tried to use Adam and other adaptive optimizers instead of SGD but that didn't work. I tried a complex architectures like adding layers, neurons, batch normalization and other activations etc.. and that also didn't work.
I tried to debug and try to find out if something is wrong with the implementation but when I use only 10 examples of the data my model learn fast so there are no errors. I start to increase the examples of the data and monitoring my model results as I increase the data examples. when I reach 3000 data examples my model start to get stuck on a high value loss.
I tried to increase layers, neurons and also to try other activations, batch normalization. My data are also normalized between [-1, 1], my target value is not normalized since it is regression and I'm predicting a continuous value. I also tried using keras but I've got the same result.
My real dataset have 40000 data, I don't know what should I try, I almost try all things that I know for optimization but none of them worked. I would appreciate it if someone can guide me on this. I'll post my Code but maybe it is too messy to try to understand, I'm sure there is no problem with my implementation, I'm using skorch/pytorch and some SKlearn functions:
# take all features as an Independant variable except the bearing and distance
# here when I start small the model learn good but from 3000 data points as you can see the model stuck on a high value. I mean the start loss is 15 and it start to learn good but when it reach 9 it stucks there
# and if I try to use the whole dataset for training then the loss start at 47 and start decreasing until it reach 36 and then stucks there too
X = dataset.iloc[:3000, 0:-2].reset_index(drop=True).to_numpy().astype(np.float32)
# take distance and bearing as the output values:
y = dataset.iloc[:3000, -2:].reset_index(drop=True).to_numpy().astype(np.float32)
y_bearing = y[:, 0].reshape(-1, 1)
y_distance = y[:, 1].reshape(-1, 1)
# normalize the input values
scaler = StandardScaler()
X_norm = scaler.fit_transform(X, y)
X_br_train, X_br_test, y_br_train, y_br_test = train_test_split(X_norm,
y_bearing,
test_size=0.1,
random_state=42,
shuffle=True)
X_dis_train, X_dis_test, y_dis_train, y_dis_test = train_test_split(X_norm,
y_distance,
test_size=0.1,
random_state=42,
shuffle=True)
bearing_trainset = Dataset(X_br_train, y_br_train)
bearing_testset = Dataset(X_br_test, y_br_test)
distance_trainset = Dataset(X_dis_train, y_dis_train)
distance_testset = Dataset(X_dis_test, y_dis_test)
def root_mse(y_true, y_pred):
return np.sqrt(mean_squared_error(y_true, y_pred))
class RMSELoss(nn.Module):
def __init__(self):
super().__init__()
self.mse = nn.MSELoss()
def forward(self, yhat, y):
return torch.sqrt(self.mse(yhat, y))
class AED(nn.Module):
"""custom average euclidean distance loss"""
def __init__(self):
super().__init__()
def forward(self, yhat, y):
return torch.dist(yhat, y)
def train(on_target,
hidden_units,
batch_size,
epochs,
optimizer,
lr,
regularisation_factor,
train_shuffle):
network = None
trainset = distance_trainset if on_target.lower() == 'distance' else bearing_trainset
testset = distance_testset if on_target.lower() == 'distance' else bearing_testset
print(f"shape of trainset.X = {trainset.X.shape}, shape of trainset.y = {trainset.y.shape}")
print(f"shape of testset.X = {testset.X.shape}, shape of testset.y = {testset.y.shape}")
mse = EpochScoring(scoring=mean_squared_error, lower_is_better=True, name='MSE')
r2 = EpochScoring(scoring=r2_score, lower_is_better=False, name='R2')
rmse = EpochScoring(scoring=make_scorer(root_mse), lower_is_better=True, name='RMSE')
checkpoint = Checkpoint(dirname=f'results/{on_target}/checkpoints')
train_end_checkpoint = TrainEndCheckpoint(dirname=f'results/{on_target}/checkpoints')
if on_target.lower() == 'bearing':
network = BearingNetwork(n_features=X_norm.shape[1],
n_hidden=hidden_units,
n_out=y_distance.shape[1])
elif on_target.lower() == 'distance':
network = DistanceNetwork(n_features=X_norm.shape[1],
n_hidden=hidden_units,
n_out=1)
model = NeuralNetRegressor(
module=network,
criterion=RMSELoss,
device='cpu',
batch_size=batch_size,
lr=lr,
optimizer=optim.Adam if optimizer.lower() == 'adam' else optim.SGD,
optimizer__weight_decay=regularisation_factor,
max_epochs=epochs,
iterator_train__shuffle=train_shuffle,
train_split=predefined_split(testset),
callbacks=[mse, r2, rmse, checkpoint, train_end_checkpoint]
)
print(f"{'*' * 10} start training the {on_target} model {'*' * 10}")
history = model.fit(trainset, y=None)
print(f"{'*' * 10} End Training the {on_target} Model {'*' * 10}")
if __name__ == '__main__':
args = parser.parse_args()
train(on_target=args.on_target,
hidden_units=args.hidden_units,
batch_size=args.batch_size,
epochs=args.epochs,
optimizer=args.optimizer,
lr=args.learning_rate,
regularisation_factor=args.regularisation_lambda,
train_shuffle=args.shuffle)
and this is my network declaration:
class DistanceNetwork(nn.Module):
"""separate NN for predicting distance"""
def __init__(self, n_features=5, n_hidden=16, n_out=1):
super().__init__()
self.model = nn.Sequential(
nn.Linear(n_features, n_hidden),
nn.LeakyReLU(),
nn.Linear(n_hidden, 5),
nn.LeakyReLU(),
nn.Linear(5, n_out)
)
here is the log while training:
In keras, is it possible to share weights between two layers, but to have other parameters differ? Consider the following (admittedly a bit contrived) example:
conv1 = Conv2D(64, 3, input_shape=input_shape, padding='same')
conv2 = Conv2D(64, 3, input_shape=input_shape, padding='valid')
Notice that the layers are identical except for the padding. Can I get keras to use the same weights for both? (i.e. also train the network accordingly?)
I've looked at the keras doc, and the section on shared layers seems to imply that sharing works only if the layers are completely identical.
To my knowledge, this cannot be done by the common "API level" of Keras usage.
However, if you dig a bit deeper, there are some (ugly) ways to share the weights.
First of all, the weights of the Conv2D layers are created inside the build() function, by calling add_weight():
self.kernel = self.add_weight(shape=kernel_shape,
initializer=self.kernel_initializer,
name='kernel',
regularizer=self.kernel_regularizer,
constraint=self.kernel_constraint)
For your provided usage (i.e., default trainable/constraint/regularizer/initializer), add_weight() does nothing special but appending the weight variables to _trainable_weights:
weight = K.variable(initializer(shape), dtype=dtype, name=name)
...
self._trainable_weights.append(weight)
Finally, since build() is only called inside __call__() if the layer hasn't been built, shared weights between layers can be created by:
Call conv1.build() to initialize the conv1.kernel and conv1.bias variables to be shared.
Call conv2.build() to initialize the layer.
Replace conv2.kernel and conv2.bias by conv1.kernel and conv1.bias.
Remove conv2.kernel and conv2.bias from conv2._trainable_weights.
Append conv1.kernel and conv1.bias to conv2._trainable_weights.
Finish model definition. Here conv2.__call__() will be called; however, since conv2 has already been built, the weights are not going to be re-initialized.
The following code snippet may be helpful:
def create_shared_weights(conv1, conv2, input_shape):
with K.name_scope(conv1.name):
conv1.build(input_shape)
with K.name_scope(conv2.name):
conv2.build(input_shape)
conv2.kernel = conv1.kernel
conv2.bias = conv1.bias
conv2._trainable_weights = []
conv2._trainable_weights.append(conv2.kernel)
conv2._trainable_weights.append(conv2.bias)
# check if weights are successfully shared
input_img = Input(shape=(299, 299, 3))
conv1 = Conv2D(64, 3, padding='same')
conv2 = Conv2D(64, 3, padding='valid')
create_shared_weights(conv1, conv2, input_img._keras_shape)
print(conv2.weights == conv1.weights) # True
# check if weights are equal after model fitting
left = conv1(input_img)
right = conv2(input_img)
left = GlobalAveragePooling2D()(left)
right = GlobalAveragePooling2D()(right)
merged = concatenate([left, right])
output = Dense(1)(merged)
model = Model(input_img, output)
model.compile(loss='binary_crossentropy', optimizer='adam')
X = np.random.rand(5, 299, 299, 3)
Y = np.random.randint(2, size=5)
model.fit(X, Y)
print([np.all(w1 == w2) for w1, w2 in zip(conv1.get_weights(), conv2.get_weights())]) # [True, True]
One drawback of this hacky weight-sharing is that the weights will not remain shared after model saving/loading. This will not affect prediction, but it may be problematic if you want to load the trained model for further fine-tuning.
I am working on image OCR with my own dataset, I have 1000 images of variable length and I want to feed in images in form of patches of 46X1. I have generated patches of my images and my label values are in Urdu text, so I have encoded them as utf-8. I want to implement CTC in the output layer. I have tried to implement CTC following the image_ocr example at github. But I get the following error in my CTC implementation.
'numpy.ndarray' object has no attribute 'get_shape'
Could anyone guide me about my mistakes? Kindly suggest the solution for it.
My code is:
X_train, X_test, Y_train, Y_test =train_test_split(imageList, labelList, test_size=0.3)
X_train_patches = np.array([image.extract_patches_2d(X_train[i], (46, 1))for i in range (700)]).reshape(700,1,1) #(Samples, timesteps,dimensions)
X_test_patches = np.array([image.extract_patches_2d(X_test[i], (46, 1))for i in range (300)]).reshape(300,1,1)
Y_train=np.array([i.encode("utf-8") for i in str(Y_train)])
Label_length=1
input_length=1
####################Loss Function########
def ctc_lambda_func(args):
y_pred, labels, input_length, label_length = args
# the 2 is critical here since the first couple outputs of the RNN
# tend to be garbage:
y_pred = y_pred[:, 2:, :]
return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
#Building Model
model =Sequential()
model.add(LSTM(20, input_shape=(None, X_train_patches.shape[2]), return_sequences=True))
model.add(Activation('relu'))
model.add(TimeDistributed(Dense(12)))
model.add(Activation('tanh'))
model.add(LSTM(60, return_sequences=True))
model.add(Activation('relu'))
model.add(TimeDistributed(Dense(40)))
model.add(Activation('tanh'))
model.add(LSTM(100, return_sequences=True))
model.add(Activation('relu'))
loss_out = Lambda(ctc_lambda_func, name='ctc')([X_train_patches, Y_train, input_length, Label_length])
The way CTC is modelled currently in Keras is that you need to implement the loss function as a layer, you did that already (loss_out). Your problem is that the inputs you give that layer are not tensors from Theano/TensorFlow but numpy arrays.
To change that one option is to model these values as inputs to your model. This is exactly what the implementation does that you copied the code from:
labels = Input(name='the_labels', shape=[img_gen.absolute_max_string_len], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')
# Keras doesn't currently support loss funcs with extra parameters
# so CTC loss is implemented in a lambda layer
loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])
To make this work you need to ditch the Sequential model and use the functional model API, exactly as done in the code linked above.
I'm trying to build a Convolutional Bi-directional LSTM to classify DNA sequences ala this paper: DanQ: a hybrid convolutional and recurrent deep
neural network for quantifying the function of DNA
sequences (Architecture picture on the second page)
The short version of it is to build to one-hot encode a DNA sequence:
`'ATACG...' = [
[1,0,0,0],
[0,0,0,1],
[1,0,0,0],
[0,1,0,0],
[0,0,1,0],
...],`
Then feed it to a convolutional-relu-maxpooling layer to find motifs, then into a bidirectional LSTM network to learn long-distance dependancies.
The original source code is here.
However, it uses an outdated version of Keras and includes a dependency on Seya, which is what I'd like to avoid doing. Here is my first attempt at building the model:
inputs = Input(shape=(500,4))
convo_1 = Convolution1D(320, border_mode='valid',filter_length=26, activation="relu", subsample_length=1)(inputs)
maxpool_1 = MaxPooling1D(pool_length=13, stride=13)(convo_1)
drop_1 = Dropout(0.2)(maxpool_1)
l_lstm = LSTM(320, return_sequences = True, go_backwards= False)(drop_1)
r_lstm = LSTM(320, return_sequences = True, go_backwards= True)(drop_1)
merged = merge([l_lstm, r_lstm], mode='sum')
drop_2 = Dropout(0.5)(merged)
flat = Flatten()(drop_2)
dense_1 = Dense(320, activation='relu')(flat)
out = Dense(num_classes, activation='sigmoid')(dense_1)
model = Model(inputs, out)
print ('compiling model')
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
checkpointer = ModelCheckpoint(filepath=sc_local_dir+"DanQ_bestmodel.hdf5", verbose=1, save_best_only=True)
earlystopper = EarlyStopping(monitor='val_loss', patience=5, verbose=1)
Unfortunately, the loss remained nearly constant during training, and the accuracy stayed constant as well. This leads me to believe that I have set the model up incorrectly, or that 1-dimensional convolution is useless on this kind of input. So i attempted to make switch to 2D convolution:
inputs = Input(shape=(1, 500,4))
convo_1 = Convolution2D(320, nb_row=15, nb_col=4, init='glorot_uniform', \
activation='relu', border_mode='same')(inputs)
maxpool_1 = MaxPooling2D((15, 4))(convo_1)
flat_1 = Flatten()(maxpool_1)
drop_1 = Dropout(0.2)(flat_1)
l_lstm = LSTM(320, return_sequences = True, go_backwards= False)(drop_1)
r_lstm = LSTM(320, return_sequences = True, go_backwards= True)(drop_1)
merged = merge([l_lstm, r_lstm], mode='sum')
drop_2 = Dropout(0.5)(merged)
flat = Flatten()(drop_2)
dense_1 = Dense(320, activation='relu')(flat)
out = Dense(num_classes, activation='sigmoid')(dense_1)
model = Model(inputs, out)
print ('compiling model')
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
checkpointer = ModelCheckpoint(filepath=sc_local_dir+"DanQ_bestmodel.hdf5", verbose=1, save_best_only=True)
earlystopper = EarlyStopping(monitor='val_loss', patience=5, verbose=1)
Which gives me the following error when trying to feed the flattened layer into the LSTM:
Exception: Input 0 is incompatible with layer lstm_4: expected ndim=3, found ndim=2
Have I set up my 1D Convolution LSTM correctly? If so, then I likely need to upgrade to a 2D Convolution LSTM, in which case, how can I fix the input error?