I am just a beginner in Deep Learning, so I have tried to capture all the details here.
Perplex is derived from the MPI dataset; the other emotions are derived from FER2013. Because the Perplex data is limited, I balanced all emotions to the same split (training: 3171, validation: 816).
Dataset size:

         perplex   happy   sad    neutral   angry
train    3171      3171    3171   3171      3171
test     816       816     816    816       815
FER2013 source: a downsized version of https://www.kaggle.com/msambare/fer2013
MPI source (only camera angles 2, 3 & 4, all actors, emotions such as Clueless, Confusion and Thinking):
https://www.b-tu.de/en/graphic-systems/databases/the-small-mpi-facial-expression-database
https://www.b-tu.de/fg-graphische-systeme/datenbanken/die-grosse-mpi-gesichtsausdrueckedatenbank
Preprocessing steps:
First, all 15855 training images (3171 x 5) and 4079 test images were converted to 48x48 grayscale.
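For reference, the resize/grayscale step looks roughly like this (a sketch using OpenCV; the folder paths and the helper name are illustrative, not the exact ones in my repo):

import os
import cv2

def to_48x48_gray(src_dir, dst_dir, size=48):
    # Convert every image in src_dir to a 48x48 grayscale copy in dst_dir.
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        img = cv2.imread(os.path.join(src_dir, name))
        if img is None:
            continue  # skip files OpenCV cannot read
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        cv2.imwrite(os.path.join(dst_dir, name),
                    cv2.resize(gray, (size, size), interpolation=cv2.INTER_AREA))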
Samples: [sample images for Angry, Happy, Perplex and Sad]
CNN architecture using TFLearn:
import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression

# Input layer: 48x48 grayscale images
convnet = input_data(name="input", shape=[None, 48, 48, 1])

# Convolution / max-pooling blocks
convnet = conv_2d(convnet, 32, 5, activation="relu")
convnet = max_pool_2d(convnet, 5)
convnet = conv_2d(convnet, 64, 5, activation="relu")
convnet = max_pool_2d(convnet, 5)
convnet = conv_2d(convnet, 128, 5, activation="relu")
convnet = max_pool_2d(convnet, 5)
convnet = conv_2d(convnet, 64, 5, activation="relu")
convnet = max_pool_2d(convnet, 5)
convnet = conv_2d(convnet, 32, 5, activation="relu")
convnet = max_pool_2d(convnet, 5)

# Fully connected layer with dropout
convnet = fully_connected(convnet, 1024, activation="relu")
convnet = dropout(convnet, 0.5)

# Output layer: 5 emotion classes
convnet = fully_connected(convnet, 5, activation="softmax")
convnet = regression(convnet, optimizer="SGD", learning_rate=0.001,
                     loss="categorical_crossentropy", name="targets")

# best_cp_path and log_path are defined elsewhere in my script
model = tflearn.DNN(convnet,
                    best_checkpoint_path=best_cp_path + '/',
                    best_val_accuracy="BEST_VAL_ACCURACY",
                    tensorboard_dir=log_path,
                    tensorboard_verbose=3)
import numpy as np

image_size = 48
channel = 1

# Split for training and testing
# `train` and `test` are lists of (image, one-hot label) pairs loaded earlier
train_x = np.array([index[0] for index in train]).reshape(-1, image_size, image_size, channel)
train_y = np.array([index[1] for index in train])
test_x = np.array([index[0] for index in test]).reshape(-1, image_size, image_size, channel)
test_y = np.array([index[1] for index in test])

model.fit(
    train_x,
    train_y,
    validation_set=(test_x, test_y),
    n_epoch=500,
    snapshot_step=500,
    show_metric=True,
    run_id="ED_SGD-0.001",
    snapshot_epoch=True
)
I shuffle all training and validation files before splitting them into the train and test inputs.
The training starts overfitting after 150+ epochs out of 500. I am using raw SGD without momentum or learning-rate decay; I also tried Adam and hit the same overfitting issue. The val_accuracy is 0.62 while the train_accuracy is 0.8+.
Early saving method: I save the 0.62-accuracy model files in the model folder.
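For what it's worth, this is roughly how swapping the raw SGD for TFLearn's Momentum optimizer with learning-rate decay would look (just a sketch of the alternative mentioned above; I have not confirmed that it helps):

# Sketch only: SGD with momentum and step-wise learning-rate decay in TFLearn
momentum = tflearn.Momentum(learning_rate=0.001, momentum=0.9,
                            lr_decay=0.96, decay_step=500)
convnet = regression(convnet, optimizer=momentum,
                     loss="categorical_crossentropy", name="targets")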
Oddities in my mind:
If you look closely, all Perplex images have a black background because they were captured in a lab environment. The other emotions come from FER2013, which is livelier, with different grey-shade backgrounds and some black ones as well.
My questions:
1. How do I overcome this overfitting issue?
2. Which hyper-parameter values should I tune?
3. Should I upscale to 7000+ images per class, as in the FER2013 dataset?
4. Should I randomly apply different grey-shade backgrounds to the Perplex images (see the sketch below)?
5. How can I increase the accuracy?
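To make question 4 concrete, this is the kind of background randomization I have in mind (a rough sketch that assumes the lab background is near-black, i.e. below a small pixel threshold; the threshold and grey range are made up):

import numpy as np

def randomize_background(img, threshold=20, low=60, high=200):
    # img: 48x48 grayscale array; pixels darker than `threshold` are treated
    # as background and replaced with one random grey level per image.
    out = img.copy()
    out[out < threshold] = np.random.randint(low, high)
    return out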
Loss curve: [training & validation curve]
Last epoch values (taken from the TensorBoard graph; some values I did not save from the terminal):
Training Step: 17200+ | total loss: 1.7+ | time:
| SGD | epoch: 500 | **loss: 0.5236** - acc: 0.8270 | val_loss: 1.30219 - val_acc: 0.6239 -- iter: 15855/15855
GitHub (you are warmly welcome): https://github.com/tcsbmogarage/ED.git
Related
I am new to deep learning.
I am running code to train and test a model and to compute its precision, recall, F1-score, support and confusion matrix.
Please look at the code and tell me whether I am computing the F1 score and the other metrics correctly; my accuracy is 0.97.
I am not sure whether I have taken the right predictions or computed the confusion matrix correctly, so please tell me whether the confusion matrix is OK or not.
import os
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import ResNet50V2
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
from tensorflow.keras import optimizers
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

input_shape = (128, 128, 3)
batch_size = 64
epochs = 10
epoch_list = list(range(1, epochs+1))

# Paths to the training & testing sets
train_dir = 'train'
test_dir = 'test'
train_dir_fake, test_dir_fake = os.path.join(train_dir, 'forged'), os.path.join(test_dir, 'forged')
train_dir_real, test_dir_real = os.path.join(train_dir, 'real'), os.path.join(test_dir, 'real')
train_fake_fnames, test_fake_fnames = os.listdir(train_dir_fake), os.listdir(test_dir_fake)
train_real_fnames, test_real_fnames = os.listdir(train_dir_real), os.listdir(test_dir_real)
# Training data generator
train_datagen = ImageDataGenerator(rescale=1./255.)
# Testing data generator
test_datagen = ImageDataGenerator(rescale=1./255.)

# Flow training images in batches of 64 using the train_datagen generator
train_generator = train_datagen.flow_from_directory(train_dir,
                                                    target_size=(128, 128),
                                                    batch_size=batch_size,
                                                    shuffle='False',
                                                    class_mode='binary')
# Flow test images in batches of 64 using the test_datagen generator
test_generator = test_datagen.flow_from_directory(test_dir,
                                                  target_size=(128, 128),
                                                  batch_size=batch_size,
                                                  shuffle='False',
                                                  class_mode='binary')
ResNet50V2_model = ResNet50V2(input_shape=input_shape, include_top=False, weights="imagenet", classes=2)
# Make the first 50 layers of the backbone trainable
for i in range(50):
    l = ResNet50V2_model.get_layer(index=i)
    l.trainable = True

model = Sequential()
model.add(ResNet50V2_model)
model.add(GlobalAveragePooling2D())
model.add(Dense(units=1, activation='sigmoid'))
# Compiling the model
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.Adam(learning_rate=1e-6, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0),
              metrics=['accuracy'])

reduce = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=5, mode='auto')
early_stopping = EarlyStopping(monitor='val_loss', min_delta=1e-4, patience=5, verbose=0, mode='auto')

# Starting the training
# (the two callbacks above are defined but not passed to fit here)
history = model.fit(train_generator, epochs=epochs, validation_data=test_generator)
# Storing the model
network_name = "ResNet50V2"
try:
    os.mkdir("./Reference_Data")
    os.mkdir("./Reference_Data/Graphs")
    os.mkdir("./Reference_Data/Summary")
    os.mkdir("./Reference_Data/Model")
except OSError:
    pass
try:
    os.mkdir(os.path.join("./Reference_Data/Graphs", network_name))
except OSError:
    pass

!dir
from scipy.interpolate import make_interp_spline

acc = np.linspace(min(epoch_list), max(epoch_list), 200)
val_acc = np.linspace(min(epoch_list), max(epoch_list), 200)

# define spline for training accuracy
spl1 = make_interp_spline(epoch_list, history.history['accuracy'], k=3)
y_smooth1 = spl1(acc)
# define spline for validation accuracy
spl2 = make_interp_spline(epoch_list, history.history['val_accuracy'], k=3)
y_smooth2 = spl2(val_acc)

with open("./Reference_Data/Summary/" + network_name + "summary.txt", 'w+') as f:
    model.summary(print_fn=lambda x: f.write(x + '\n'))
# Saving the model for inference purposes
model.save('./Reference_Data/Model/' + network_name + '/')
model.save('./Reference_Data/Model/' + network_name + '/' + network_name + '.h5')

test_generator.reset()
Y_pred = model.predict(test_generator)
classes = test_generator.classes[test_generator.index_array]
y_pred = np.argmax(Y_pred, axis=-1)
y_pred = y_pred.round()
sum(y_pred == classes) / 10000
pred = model.predict(test_generator, verbose=1)
# get_test_data_generator and predict are helper functions defined elsewhere in my project
def get_classification_report(
    model, data_dir, batch_size=64,
    steps=None, threshold=0.5, output_dict=False
):
    data = get_test_data_generator(data_dir, batch_size=batch_size)
    predictions = predict(model, data, steps, threshold)
    predictions = predictions.reshape((predictions.shape[0],))
    return classification_report(data.classes, predictions, output_dict=output_dict)
import sklearn.metrics as metrics
#y_pred = np.argmax(y_pred,axis=0)
#y_true=np.argmax(test_generator.classes,axis=0)
# true_classes and class_labels are defined elsewhere in the notebook
report = metrics.classification_report(true_classes, Y_pred.round(), target_names=class_labels, zero_division=0.0)
print(report)
              precision    recall  f1-score   support

      forged       0.40      0.40      0.40       773
        real       0.60      0.60      0.60      1172

    accuracy                           0.52      1945
   macro avg       0.50      0.50      0.50      1945
weighted avg       0.52      0.52      0.52      1945
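For comparison, this is the thresholding variant I am considering instead of argmax, since the model has a single sigmoid output (just a sketch, and it assumes the test generator does not shuffle so that predictions line up with test_generator.classes):

from sklearn.metrics import classification_report, confusion_matrix

Y_pred = model.predict(test_generator, verbose=1)   # shape (N, 1) of probabilities
y_pred = (Y_pred.ravel() > 0.5).astype(int)         # 0/1 labels in folder order
y_true = test_generator.classes                     # aligned only if shuffling is off

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred,
                            target_names=list(test_generator.class_indices.keys())))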
I'm using a DNN to fit these data, with softmax to classify them into 2 classes; each sample has a dimensionality of 4040. Can someone with experience tell me what's wrong with my network?
It is strange that my initial loss is 7.6 and my initial error is 0.5524, and basically they don't change anymore.
import datetime
import numpy as np
import tensorflow as tf
from tensorflow import keras

# kfold, fold_no, acc_per_fold, loss_per_fold, data_pro and valence_labels
# are set up earlier in the script.
for train, test in kfold.split(data_pro, valence_labels):
    model = keras.Sequential()
    model.add(keras.layers.Dense(5000, activation='relu', input_shape=(4040,)))
    model.add(keras.layers.Dropout(rate=0.25))
    model.add(keras.layers.Dense(500, activation='relu'))
    model.add(keras.layers.Dropout(rate=0.5))
    model.add(keras.layers.Dense(1000, activation='relu'))
    model.add(keras.layers.Dropout(rate=0.5))
    model.add(keras.layers.Dense(2, activation='softmax'))
    model.add(keras.layers.Dropout(rate=0.5))

    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.0001, rho=0.9),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

    print('------------------------------------------------------------------------')
    print(f'Training for fold {fold_no} ...')

    log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

    # Fit data to model
    history = model.fit(data_pro[train], valence_labels[train],
                        batch_size=128,
                        epochs=50,
                        verbose=1,
                        callbacks=[tensorboard_callback])

    # Generate generalization metrics
    scores = model.evaluate(data_pro[test], valence_labels[test], verbose=0)
    print(f'Score for fold {fold_no}: {model.metrics_names[0]} of {scores[0]}; {model.metrics_names[1]} of {scores[1]*100}%')
    acc_per_fold.append(scores[1] * 100)
    loss_per_fold.append(scores[0])

    # Increase fold number
    fold_no = fold_no + 1

# == Provide average scores ==
print('------------------------------------------------------------------------')
print('Score per fold')
for i in range(0, len(acc_per_fold)):
    print('------------------------------------------------------------------------')
    print(f'> Fold {i+1} - Loss: {loss_per_fold[i]} - Accuracy: {acc_per_fold[i]}%')
print('------------------------------------------------------------------------')
print('Average scores for all folds:')
print(f'> Accuracy: {np.mean(acc_per_fold)} (+- {np.std(acc_per_fold)})')
print(f'> Loss: {np.mean(loss_per_fold)}')
print('------------------------------------------------------------------------')
You shouldn't add Dropout after the final Dense; delete that model.add(keras.layers.Dropout(rate=0.5)).
I also think your code may raise an error because your labels' dimension is 1 but your final Dense has 2 units. Change model.add(keras.layers.Dense(2, activation='softmax')) to model.add(keras.layers.Dense(1, activation='sigmoid')).
Read this to learn TensorFlow.
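Putting those two suggestions together, the tail of your model would look roughly like this (a sketch of the change only; the earlier layers stay as they are):

model.add(keras.layers.Dense(1000, activation='relu'))
model.add(keras.layers.Dropout(rate=0.5))
# Single sigmoid unit replaces Dense(2, softmax); no Dropout after the output
model.add(keras.layers.Dense(1, activation='sigmoid'))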
Update 1:
Change

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.00001, momentum=0.9, nesterov=True),
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])

to

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-4),
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])

And change

accAll = []
for epoch in range(1, 50):
    model.fit(train_data, train_labels,
              batch_size=50, epochs=5,
              validation_data=(val_data, val_labels))
    val_loss, val_Accuracy = model.evaluate(val_data, val_labels, batch_size=1)
    accAll.append(val_Accuracy)

to

accAll = model.fit(
    train_data, train_labels,
    batch_size=50, epochs=20,
    validation_data=(val_data, val_labels)
)
I am trying to fit a UNet CNN to a task very similar to image-to-image translation. The input to the network is a binary matrix of size (64, 256) and the output is of size (64, 32). The columns represent the status of a communication channel, where each entry in a column is the status of a subchannel: 1 means the subchannel is occupied and 0 means it is vacant. The horizontal axis represents the flow of time, so the first column is the status of the channel at time slot 1, the second column is the status at time slot 2, and so forth. The task is to predict the status of the channel in the next 32 time slots given the previous 256 time slots, which I treated as image-to-image translation.
The accuracy on the training data is around 90% while the accuracy on the test data is around 50%. By accuracy here I mean the average percentage of correct entries in each image. Also, while training, the validation loss increases while the training loss decreases, which is a clear sign of overfitting. I have tried most of the regularization techniques, and I also tried reducing the capacity of the model, but this only reduces the training error while not improving the generalization error. Any advice or ideas? I have included below the learning curve for training on 1000 samples, the implementation of the network, and samples from the training and test sets.
Learning curves of training on 1000 samples
3 Samples from the training set
3 Samples From the test set
Here is the implementation of the network:
# Keras imports for the UNet-style generator
from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.layers import (Input, Conv2D, Conv2DTranspose, BatchNormalization,
                                     Dropout, Concatenate, Activation, LeakyReLU)
from tensorflow.keras.models import Model

# define an encoder block
def define_encoder_block(layer_in, n_filters, batchnorm=True):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # add downsampling layer
    g = Conv2D(n_filters, (4,4), strides=(2,2), padding='same',
               kernel_initializer=init)(layer_in)
    # conditionally add batch normalization
    if batchnorm:
        g = BatchNormalization()(g, training=True)
    # leaky relu activation
    g = LeakyReLU(alpha=0.2)(g)
    return g

# define a decoder block
def decoder_block(layer_in, skip_in, n_filters, filter_strides, dropout=True, skip=True):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # add upsampling layer
    g = Conv2DTranspose(n_filters, (4,4), strides=filter_strides, padding='same',
                        kernel_initializer=init)(layer_in)
    # add batch normalization
    g = BatchNormalization()(g, training=True)
    # conditionally add dropout
    if dropout:
        g = Dropout(0.5)(g, training=True)
    # conditionally add the skip connection
    if skip:
        g = Concatenate()([g, skip_in])
    # relu activation
    g = Activation('relu')(g)
    return g

# define the standalone generator model
def define_generator(image_shape=(64,256,1)):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # image input
    in_image = Input(shape=image_shape)
    # encoder model
    e1 = define_encoder_block(in_image, 64, batchnorm=False)
    e2 = define_encoder_block(e1, 128)
    e3 = define_encoder_block(e2, 256)
    e4 = define_encoder_block(e3, 512)
    e5 = define_encoder_block(e4, 512)
    e6 = define_encoder_block(e5, 512)
    e7 = define_encoder_block(e6, 512)
    # bottleneck, no batch norm and relu
    b = Conv2D(512, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(e7)
    b = Activation('relu')(b)
    # decoder model
    d1 = decoder_block(b, e7, 512, (1,2))
    d2 = decoder_block(d1, e6, 512, (1,2))
    d3 = decoder_block(d2, e5, 512, (2,2))
    d4 = decoder_block(d3, e4, 512, (2,2), dropout=False)
    d5 = decoder_block(d4, e3, 256, (2,2), dropout=False)
    d6 = decoder_block(d5, e2, 128, (2,1), dropout=False, skip=False)
    d7 = decoder_block(d6, e1, 64, (2,1), dropout=False, skip=False)
    # output
    g = Conv2DTranspose(1, (4,4), strides=(2,1), padding='same', kernel_initializer=init)(d7)
    out_image = Activation('sigmoid')(g)
    # define model
    model = Model(in_image, out_image)
    return model
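For completeness, the generator is built and trained roughly like this (a sketch; the optimizer, batch size and epoch count shown here are illustrative rather than my exact settings, and X/Y stand for binary inputs of shape (N, 64, 256, 1) and targets of shape (N, 64, 32, 1)):

# Sketch: compiling and fitting the generator on binary channel maps
model = define_generator(image_shape=(64, 256, 1))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(X, Y, batch_size=32, epochs=100, validation_split=0.2)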
I'm trying to implement a violence recognizer with a CNN + LSTM, but when I train the model it produces the same accuracy over the epochs, as shown in the attached picture (55% over 20 epochs).
The data consists of 100 violence/non-violence videos from the hockey dataset, preprocessed by cropping dark frames, scaling pixel values to [0, 1], and normalizing so that the mean is zero with the same standard deviation. The input shape is (100, 20, 160, 160, 3), which is the number of videos, frames per video, frame height, frame width, and RGB channels respectively. The labels tensor has shape (100, 2), representing a vector [0 1] or [1 0]; both arrays are given to the model as floats.
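The per-frame preprocessing is roughly the following (just a sketch; the dark-frame threshold and the helper name are mine, not exact values from my code):

import numpy as np

def preprocess_video(frames, dark_threshold=10.0):
    # frames: (num_frames, 160, 160, 3) uint8 array for one video.
    # 1) Crop (drop) frames that are almost completely dark.
    keep = frames.reshape(len(frames), -1).mean(axis=1) > dark_threshold
    frames = frames[keep].astype(np.float32)
    # 2) Scale pixel values to [0, 1].
    frames /= 255.0
    # 3) Normalize to zero mean and unit standard deviation.
    return (frames - frames.mean()) / (frames.std() + 1e-7)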
The code of the model
import numpy as np
import tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, TimeDistributed, Flatten, LSTM,
                                     BatchNormalization, Dense, GlobalAveragePooling1D)
from tensorflow.keras.optimizers import Adam

# NUMBER_OF_FRAMES, IMAGE_SIZE, IMAGE_CHANNELS and the data tensors
# fvideo, flabels are defined elsewhere.

def CNN_LSTM():
    input_shapes = (NUMBER_OF_FRAMES, IMAGE_SIZE, IMAGE_SIZE, 3)
    np.random.seed(1234)
    vg19 = tensorflow.keras.applications.vgg19.VGG19
    base_model = vg19(include_top=False, weights='imagenet', input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
    for layer in base_model.layers:
        layer.trainable = True

    cnn = TimeDistributed(base_model, input_shape=(NUMBER_OF_FRAMES, IMAGE_SIZE, IMAGE_SIZE, 3))
    model = Sequential()
    model.add(Input(shape=(NUMBER_OF_FRAMES, IMAGE_SIZE, IMAGE_SIZE, IMAGE_CHANNELS)))
    model.add(cnn)
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(NUMBER_OF_FRAMES, return_sequences=True))
    model.add(BatchNormalization())
    model.add(TimeDistributed(Dense(90)))
    model.add(BatchNormalization())
    model.add(GlobalAveragePooling1D())
    model.add(BatchNormalization())
    model.add(Dense(512, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(2, activation="sigmoid"))
    adam = Adam(lr=0.01)
    model.compile(loss='binary_crossentropy', optimizer=adam, metrics=["accuracy"])
    return model

x = CNN_LSTM()
x.summary()
history = x.fit(fvideo, flabels, batch_size=10, epochs=15)
How can I solve the problem of the same accuracy? Thanks in advance
I implemented highway networks with Keras and with Lasagne, and the Keras version consistently underperforms the Lasagne version. I am using the same dataset and metaparameters in both of them. Here is the Keras version's code:
# Old Keras (1.x era) API; the Highway layer was removed in later Keras releases
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Highway

X_train, y_train, X_test, y_test, X_all = hacking_script.load_all_data()
data_dim = 144
layer_count = 32
dropout = 0.04
hidden_units = 32
nb_epoch = 10

model = Sequential()
model.add(Dense(hidden_units, input_dim=data_dim))
model.add(Dropout(dropout))
for index in range(layer_count):
    model.add(Highway(activation='relu'))
    model.add(Dropout(dropout))
model.add(Dropout(dropout))
model.add(Dense(2, activation='softmax'))

print 'compiling...'
model.compile(loss='binary_crossentropy', optimizer='adagrad')
model.fit(X_train, y_train, batch_size=100, nb_epoch=nb_epoch,
          show_accuracy=True, validation_data=(X_test, y_test), shuffle=True, verbose=0)
predictions = model.predict_proba(X_test)
And here is the lasagne version's code:
class MultiplicativeGatingLayer(MergeLayer):
def __init__(self, gate, input1, input2, **kwargs):
incomings = [gate, input1, input2]
super(MultiplicativeGatingLayer, self).__init__(incomings, **kwargs)
assert gate.output_shape == input1.output_shape == input2.output_shape
def get_output_shape_for(self, input_shapes):
return input_shapes[0]
def get_output_for(self, inputs, **kwargs):
return inputs[0] * inputs[1] + (1 - inputs[0]) * inputs[2]
def highway_dense(incoming, Wh=Orthogonal(), bh=Constant(0.0),
Wt=Orthogonal(), bt=Constant(-4.0),
nonlinearity=rectify, **kwargs):
num_inputs = int(np.prod(incoming.output_shape[1:]))
l_h = DenseLayer(incoming, num_units=num_inputs, W=Wh, b=bh, nonlinearity=nonlinearity)
l_t = DenseLayer(incoming, num_units=num_inputs, W=Wt, b=bt, nonlinearity=sigmoid)
return MultiplicativeGatingLayer(gate=l_t, input1=l_h, input2=incoming)
# ==== Parameters ====
num_features = X_train.shape[1]
epochs = 10
hidden_layers = 32
hidden_units = 32
dropout_p = 0.04
# ==== Defining the neural network shape ====
l_in = InputLayer(shape=(None, num_features))
l_hidden1 = DenseLayer(l_in, num_units=hidden_units)
l_hidden2 = DropoutLayer(l_hidden1, p=dropout_p)
l_current = l_hidden2
for k in range(hidden_layers - 1):
l_current = highway_dense(l_current)
l_current = DropoutLayer(l_current, p=dropout_p)
l_dropout = DropoutLayer(l_current, p=dropout_p)
l_out = DenseLayer(l_dropout, num_units=2, nonlinearity=softmax)
# ==== Neural network definition ====
net1 = NeuralNet(layers=l_out,
update=adadelta, update_rho=0.95, update_learning_rate=1.0,
objective_loss_function=categorical_crossentropy,
train_split=TrainSplit(eval_size=0), verbose=0, max_epochs=1)
net1.fit(X_train, y_train)
predictions = net1.predict_proba(X_test)[:, 1]
Now the Keras version barely outperforms logistic regression, while the Lasagne version is the best-scoring algorithm so far. Any ideas as to why?
Here are some suggestions (I'm not sure if they will actually close the performance gap you are observing):
According to the Keras documentation the Highway layer is initialized using Glorot Uniform weights while in your Lasagne code you are using Orthogonal weight initialization. Unless you have another part of your code where you set the weight initialization to Orthogonal for the Keras Highway layer, this could be a source of the performance gap.
It also seems like you are using Adagrad for your Keras model, but you are using Adadelta for your Lasagne model.
Also I am not 100% sure about this, but you may also want to verify that your transform bias terms are initialized the same way.
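For example, the Keras side could be adjusted roughly like this (assuming the Highway layer in your Keras version accepts init and transform_bias arguments, which is worth double-checking against its documentation):

# Sketch: align the Keras Highway stack more closely with the Lasagne setup above
for index in range(layer_count):
    model.add(Highway(activation='relu', init='orthogonal', transform_bias=-4))
    model.add(Dropout(dropout))

model.compile(loss='binary_crossentropy', optimizer='adadelta')  # Adadelta, to match Lasagne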