I'm trying to implement this LSTM Architecture from the paper "Dropout improves Recurrent Neural Networks for Handwriting Recognition":
In the paper, the researchers defined Multidirectional LSTM Layers as "Four LSTM layers applied in parallel, each with a particular scanning direction"
Here's what (I think) the network looks like in Keras:
from keras.layers import LSTM, Dropout, Input, Convolution2D, Merge, Dense, Activation, TimeDistributed
from keras.models import Sequential

def build_lstm_dropout(inputdim, outputdim, return_sequences=True, activation='tanh'):
    net_input = Input(shape=(None, inputdim))
    model = Sequential()
    lstm = LSTM(output_dim=outputdim, return_sequences=return_sequences, activation=activation)(net_input)
    model.add(lstm)
    model.add(Dropout(0.5))
    return model

def build_conv(nb_filter, nb_row, nb_col, net_input, border_mode='relu'):
    return TimeDistributed(Convolution2D(nb_filter, nb_row, nb_col, border_mode=border_mode, activation='relu')(net_input))

def build_lstm_conv(lstm, conv):
    model = Sequential()
    model.add(lstm)
    model.add(conv)
    return model

def build_merged_lstm_conv_layer(lstm_conv, mode='concat'):
    return Merge([lstm_conv, lstm_conv, lstm_conv, lstm_conv], mode=mode)

def build_model(feature_dim, loss='ctc_cost_for_train', optimizer='Adadelta'):
    net_input = Input(shape=(1, feature_dim, None))
    lstm = build_lstm_dropout(2, 6)
    conv = build_conv(64, 2, 4, net_input)
    lstm_conv = build_lstm_conv(lstm, conv)
    first_layer = build_merged_lstm_conv_layer(lstm_conv)
    lstm = build_lstm_dropout(10, 20)
    conv = build_conv(128, 2, 4, net_input)
    lstm_conv = build_lstm_conv(lstm, conv)
    second_layer = build_merged_lstm_conv_layer(lstm_conv)
    lstm = build_lstm_dropout(50, 1)
    fully_connected = Dense(1, activation='sigmoid')
    lstm_fc = Sequential()
    lstm_fc.add(lstm)
    lstm_fc.add(fully_connected)
    third_layer = Merge([lstm_fc, lstm_fc, lstm_fc, lstm_fc], mode='concat')
    final_model = Sequential()
    final_model.add(first_layer)
    final_model.add(Activation('tanh'))
    final_model.add(second_layer)
    final_model.add(Activation('tanh'))
    final_model.add(third_layer)
    final_model.compile(loss=loss, optimizer=optimizer, sample_weight_mode='temporal')
    return final_model
And here are my questions:
If my implementation of the architecture is correct, how do you implement the scanning directions for the four LSTM layers?
If my implementation is not correct, is it possible to implement such an architecture in Keras? If not, are there any other frameworks that can help me in implementing such an architecture?
You can check this for the implementation of bidirectional LSTM. Basically, you just set go_backwards=True for the backward-LSTM.
However, in your case, you have to write a "mirror" + reshape layer to reverse the rows. A mirror layer can look like this (I am using a Lambda layer here for convenience): Lambda(lambda x: x[:,::-1,:])
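To make the mirroring idea concrete, here is a minimal NumPy sketch of how four scanning directions can be obtained by flipping a 2D feature map before a single forward scan. The tensor layout (batch, rows, cols, features), the direction names, and the helper `scan_view` are assumptions for illustration; the paper's exact definition of each direction may differ. In Keras, each flip could be wrapped in a Lambda layer as above.

```python
import numpy as np

# Hypothetical naming: each direction is identified by the corner where the scan starts.
# The tuple lists the axes to flip for an input shaped (batch, rows, cols, features).
SCAN_FLIPS = {
    "top-left": (),         # no flip: scan rows downward, columns left to right
    "top-right": (2,),      # mirror the columns
    "bottom-left": (1,),    # mirror the rows
    "bottom-right": (1, 2), # mirror both
}

def scan_view(x, axes):
    """Flip x along the given axes; applying the same flip twice restores x."""
    return np.flip(x, axis=axes) if axes else x

x = np.arange(2 * 3 * 4 * 1).reshape(2, 3, 4, 1)
views = {name: scan_view(x, axes) for name, axes in SCAN_FLIPS.items()}

# After a recurrent layer processes a view, the same flip maps its output
# back to the original orientation so the four outputs can be combined.
restored = {name: scan_view(v, SCAN_FLIPS[name]) for name, v in views.items()}
```

Because a flip is its own inverse, the same helper serves for both the input mirroring and the output realignment.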
I am trying to recreate the models from a study in which CNN-LSTM outperformed LSTM, but my CNN-LSTM produces nearly identical results to the LSTM. So it seems like the addition of the convolutional layers is not doing anything. The study describes the CNN-LSTM model like this:
The model is constructed by a single LSTM layer and two CNN layers. To form the CNN part, two 1D convolutional neural networks are stacked without any pooling layer. The second CNN layer is followed by a Rectified Linear Unit (ReLU) activation function. Each of the flattened output of the CNN’s ReLU layer and the LSTM layer is projected to the same dimension using a fully connected layer. Finally, a dropout layer is placed before the output layer.
Did I make a mistake in the implementation? The results of my CNN-LSTM are almost exactly the same as when I use the LSTM on its own. The LSTM on its own uses the exact same code as below, just without the two Conv1d layers and without the ReLU activation function.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNN_LSTM(nn.Module):
    def __init__(self, input_size, seq_len, params, output_size):
        super(CNN_LSTM, self).__init__()
        self.n_hidden = params['lstm_hidden']  # neurons in each lstm layer
        self.seq_len = seq_len  # length of the input sequence
        self.n_layers = 1  # nr of recurrent layers in the lstm
        self.n_filters = params['n_filters']  # number of filters in the cnn
        self.c1 = nn.Conv1d(in_channels=1, out_channels=params['n_filters'], kernel_size=1, stride=1)
        self.c2 = nn.Conv1d(in_channels=params['n_filters'], out_channels=1, kernel_size=1, stride=1)
        self.lstm = nn.LSTM(
            input_size=input_size,  # nr of input features
            hidden_size=params['lstm_hidden'],
            num_layers=1
        )
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(in_features=seq_len*params['lstm_hidden'], out_features=params['dense_hidden'])
        self.dropout = nn.Dropout(p=.4)
        self.fc2 = nn.Linear(in_features=params['dense_hidden'], out_features=output_size)  # output_size = nr of output features

    def reset_hidden_state(self):
        # `device` is expected to be defined globally elsewhere
        self.hidden = (
            torch.zeros(self.n_layers, self.seq_len, self.n_hidden).to(device=device),
            torch.zeros(self.n_layers, self.seq_len, self.n_hidden).to(device=device),
        )

    def forward(self, sequences):
        out = self.c1(sequences.view(len(sequences), 1, -1))
        out = self.c2(out.view(len(out), self.n_filters, -1))
        out = F.relu(out)
        out, self.hidden = self.lstm(
            out.view(len(out), self.seq_len, -1),
            self.hidden
        )
        out = self.flatten(out)
        out = self.fc1(out)
        out = self.dropout(out)
        out = self.fc2(out)
        return out
Source for the study I am using.
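One possible contributor to the near-identical results, offered as a hedged observation rather than a confirmed diagnosis: two stacked convolutions with kernel_size=1 and no nonlinearity between them, going from 1 channel to n_filters and back to 1, collapse to a single scale-and-shift of each input value. The NumPy sketch below checks that identity; the helper `conv1d_k1` and the random weights are illustrative assumptions, not the study's model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_filters = 4
x = rng.normal(size=(3, 1, 10))  # (batch, channels=1, length)

# Weights of a kernel_size=1 convolution have shape (out_channels, in_channels).
w1 = rng.normal(size=(n_filters, 1))
b1 = rng.normal(size=n_filters)
w2 = rng.normal(size=(1, n_filters))
b2 = rng.normal(size=1)

def conv1d_k1(inputs, weight, bias):
    """Pointwise convolution: mixes channels at each position independently."""
    return np.einsum('oc,bcl->bol', weight, inputs) + bias[None, :, None]

# Two stacked pointwise convolutions with no activation in between.
stacked = conv1d_k1(conv1d_k1(x, w1, b1), w2, b2)

# The same result written as one scalar scale and one scalar shift.
scale = (w2 @ w1).item()
shift = (w2 @ b1 + b2).item()
collapsed = scale * x + shift

print(np.allclose(stacked, collapsed))
```

Under this reading, a kernel wider than 1 or an activation between the two convolutions would be needed before the convolutional part can extract anything the LSTM does not already see.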
I am new to deep learning, trying to implement a neural network using 4-fold cross-validation for training, testing, and validating. The topic is to classify the vehicle using an existing dataset.
The accuracy result is 0.7.
[Figure: training accuracy]
[Figure: example output for epochs]
I also don't know whether the code is correct or what to do to increase the accuracy.
Here is the code:
!pip install category_encoders
import tensorflow as tf
from sklearn.model_selection import KFold
import pandas as pd
import numpy as np
from tensorflow import keras
import category_encoders as ce
from category_encoders import OrdinalEncoder
car_data = pd.read_csv('car_data.csv')
car_data.columns = ['Purchasing', 'Maintenance', 'No_Doors','Capacity','BootSize','Safety','Evaluation']
# Extract the features and labels from the dataset
X = car_data.drop(['Evaluation'], axis=1)
Y = car_data['Evaluation']
encoder = ce.OrdinalEncoder(cols=['Purchasing', 'Maintenance', 'No_Doors','Capacity','BootSize','Safety'])
X = encoder.fit_transform(X)
X = X.to_numpy()
Y_df = pd.DataFrame(Y, columns=['Evaluation'])
encoder = OrdinalEncoder(cols=['Evaluation'])
Y_encoded = encoder.fit_transform(Y_df)
Y = Y_encoded.to_numpy()
input_layer = tf.keras.layers.Input(shape=(X.shape[1]))
# Define the hidden layers
hidden_layer_1 = tf.keras.layers.Dense(units=64, activation='relu', kernel_initializer='glorot_uniform')(input_layer)
hidden_layer_2 = tf.keras.layers.Dense(units=32, activation='relu', kernel_initializer='glorot_uniform')(hidden_layer_1)
# Define the output layer
output_layer = tf.keras.layers.Dense(units=1, activation='sigmoid', kernel_initializer='glorot_uniform')(hidden_layer_2)
# Create the model
model = tf.keras.Model(inputs=input_layer, outputs=output_layer)
# Initialize the 4-fold cross-validation
kfold = KFold(n_splits=4, shuffle=True, random_state=42)
# Initialize a list to store the scores
scores = []
quality_weights= []
# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'],
              sample_weight_mode='temporal')
for train_index, test_index in kfold.split(X,Y):
    # Split the data into train and test sets
    X_train, X_test = X[train_index], X[test_index]
    Y_train, Y_test = Y[train_index], Y[test_index]
    # Fit the model on the training data
    model.fit(X_train, Y_train, epochs=300, batch_size=64, sample_weight=quality_weights)
    # Evaluate the model on the test data
    score = model.evaluate(X_test, Y_test)
    # Append the score to the scores list
    scores.append(score[1])
plt.plot(history.history['accuracy'])
plt.title('Model Training Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train'], loc='upper left')
plt.show()
# Print the mean and standard deviation of the scores
print(f'Mean accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}')
The first thing that caught my attention was here:
model.fit(X_train, Y_train, epochs=300, batch_size=64, sample_weight=quality_weights)
Your quality_weights should be a NumPy array with one weight per training sample (the same length as X_train), not an empty list.
Refer here: https://keras.io/api/models/model_training_apis/#fit-method
If changing that doesn't seem to help, then maybe your network isn't learning from the data. A few possible reasons could be:
The network is a bit too shallow. Try adding just one more hidden layer to see if that improves anything
From the code I can't see the size of your input data. Does it have enough datapoints for 4-fold cross-validation? Can you somehow augment the data?
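As an illustration of the sample_weight point, here is a minimal sketch of building one weight per training sample inside each fold. The helper `balanced_sample_weights` and the inverse-frequency weighting are assumptions chosen for the example; the question does not say what quality_weights was meant to contain.

```python
import numpy as np

def balanced_sample_weights(labels):
    """Return one weight per sample, inversely proportional to class frequency.

    Hypothetical helper: any 1D float array with len(labels) entries would
    satisfy the shape requirement of Keras' sample_weight argument.
    """
    labels = np.asarray(labels).ravel()
    classes, counts = np.unique(labels, return_counts=True)
    per_class = len(labels) / (len(classes) * counts)
    lookup = dict(zip(classes, per_class))
    return np.array([lookup[c] for c in labels], dtype="float32")

# Toy labels shaped like Y_train in the question: (n_samples, 1)
Y_train_example = np.array([[1], [1], [1], [2]])
weights = balanced_sample_weights(Y_train_example)
print(weights.shape, weights)
```

Inside the cross-validation loop the array would be recomputed from Y_train for every fold, so its length always matches X_train.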
I understand that MSE treats (actual - predict) and (predict - actual) the same way. I want to write a custom loss function such that
the penalty for predict > actual is larger than the penalty for actual > predict.
Say I want a 2x larger penalty when predict > actual. How would I implement such a function?
import numpy as np
from keras.models import Model, Sequential
from keras.layers import Input
import keras.backend as K
from keras.engine.topology import Layer
from keras.layers.core import Dense
from keras import objectives
from keras.callbacks import EarlyStopping

def create_model():
    # define the layer sizes
    input_size = 6
    hidden_size = 15
    # define the model
    model = Sequential()
    model.add(Dense(input_size, input_dim=input_size, kernel_initializer='normal', activation='relu'))
    model.add(Dense(hidden_size, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    # mse is used as the loss so the optimiser converges quickly
    # mae is a metric whose magnitude is easy to interpret
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    return model

model = create_model()
early_stop = EarlyStopping(monitor='val_loss', patience=20)
# pass the callback to fit so that early stopping actually takes effect
history = model.fit(train_features, train_label, epochs=200, validation_split=0.2, verbose=0, shuffle=True, callbacks=[early_stop])
predvalue = model.predict(test_features).flatten() * 100
How do I implement such a loss function?
def customLoss(true, pred):
    diff = pred - true
    greater = K.greater(diff, 0)
    greater = K.cast(greater, K.floatx())  # 0 for lower, 1 for greater
    greater = greater + 1  # 1 for lower, 2 for greater
    # use some kind of loss here, such as mse or mae, or pick one from keras
    # using mse:
    return K.mean(greater * K.square(diff))

model.compile(optimizer='adam', loss=customLoss)
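The weighting logic above can be checked outside Keras with a small NumPy sketch. The function name `asymmetric_mse` and the `over_penalty` parameter are illustrative assumptions; the arithmetic mirrors the loss above, with a weight of 2 where the prediction exceeds the target and 1 elsewhere.

```python
import numpy as np

def asymmetric_mse(y_true, y_pred, over_penalty=2.0):
    """Mean squared error where over-predictions are weighted by over_penalty."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    weights = np.where(diff > 0, over_penalty, 1.0)
    return float(np.mean(weights * diff ** 2))

# Same absolute error of 2 in both cases, different sign.
under = asymmetric_mse([10.0], [8.0])   # predict < actual: 1 * 2**2
over = asymmetric_mse([10.0], [12.0])   # predict > actual: 2 * 2**2
print(under, over)
```

Changing `over_penalty` gives any other ratio between the two error directions.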
I want to use Keras to do two-class image classification using the Cats vs. Dogs dataset from Kaggle.com.
But I have a problem with the "class_mode" parameter in the code below.
If I use "binary" mode, accuracy is about 95%, but if I use "categorical" mode, accuracy is abnormally low, only slightly above 50%.
Binary mode means there is only one output in the last layer, with a sigmoid activation to classify. Each sample's label is a single integer.
Categorical mode means there are two outputs in the last layer, with a softmax activation to classify. Each sample's label is in one-hot format, e.g. (1,0), (0,1).
I think these two ways should give similar results. Does anyone know the reason for the difference? Thanks very much!
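The expectation that both setups should behave similarly can be backed by a small numerical check: a two-way softmax gives the same probability for class 1 as a sigmoid applied to the difference of the two logits. This is a sketch with made-up logits, not output from the model in the question.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

# Hypothetical logits for three samples, two classes each.
two_logits = np.array([[0.3, 2.1], [1.7, -0.4], [-1.0, -1.0]])

p_softmax = softmax(two_logits)[:, 1]
p_sigmoid = sigmoid(two_logits[:, 1] - two_logits[:, 0])
print(np.allclose(p_softmax, p_sigmoid))
```

Since the two formulations are equivalent in what they can represent, a large accuracy gap points at training dynamics (optimizer, learning rate, initialization) rather than at the output layer itself.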
import os
import sys
import glob
import argparse
import matplotlib.pyplot as plt
from keras import __version__
from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import SGD
# set some params here
IM_WIDTH, IM_HEIGHT = 299, 299 #fixed size for InceptionV3
NB_EPOCHS = 1
BAT_SIZE = 32
FC_SIZE = 1024
NB_IV3_LAYERS_TO_FREEZE = 172
loss_mode = "binary_crossentropy"
def get_nb_files(directory):
    """Get number of files by searching directory recursively"""
    if not os.path.exists(directory):
        return 0
    cnt = 0
    for r, dirs, files in os.walk(directory):
        for dr in dirs:
            cnt += len(glob.glob(os.path.join(r, dr + "/*")))
    return cnt
# transfer_learn, keep the weights in inception v3
def setup_to_transfer_learn(model, base_model):
    """Freeze all layers and compile the model"""
    for layer in base_model.layers:
        layer.trainable = False
    model.compile(optimizer='rmsprop', loss=loss_mode, metrics=['accuracy'])
# Add last layer to do two-class classification.
def add_new_last_layer(base_model, nb_classes):
    """Add last layer to the convnet
    Args:
        base_model: keras model excluding top
        nb_classes: # of classes
    Returns:
        new keras model with last layer
    """
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(FC_SIZE, activation='relu')(x)  # new FC layer, random init
    if args.class_mode == "binary":
        predictions = Dense(1, activation='sigmoid')(x)  # new sigmoid layer
    else:
        predictions = Dense(nb_classes, activation='softmax')(x)  # new softmax layer
    model = Model(inputs=base_model.input, outputs=predictions)
    return model
# Freeze the bottom NB_IV3_LAYERS and retrain the remaining top layers,
# and fine tune weights.
def setup_to_finetune(model):
    """Freeze the bottom NB_IV3_LAYERS and retrain the remaining top layers.
    note: NB_IV3_LAYERS corresponds to the top 2 inception blocks in the inceptionv3 arch
    Args:
        model: keras model
    """
    for layer in model.layers[:NB_IV3_LAYERS_TO_FREEZE]:
        layer.trainable = False
    for layer in model.layers[NB_IV3_LAYERS_TO_FREEZE:]:
        layer.trainable = True
    model.compile(optimizer="rmsprop", loss=loss_mode, metrics=['accuracy'])
    #model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy', metrics=['accuracy'])
def train(args):
    """Use transfer learning and fine-tuning to train a network on a new dataset"""
    nb_train_samples = get_nb_files(args.train_dir)
    nb_classes = len(glob.glob(args.train_dir + "/*"))
    nb_val_samples = get_nb_files(args.val_dir)
    nb_epoch = int(args.nb_epoch)
    batch_size = int(args.batch_size)
    print("nb_classes:{}".format(nb_classes))

    # data prepare
    train_datagen = ImageDataGenerator(
        preprocessing_function=preprocess_input,
        rotation_range=30,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True
    )
    test_datagen = ImageDataGenerator(
        preprocessing_function=preprocess_input,
        rotation_range=30,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True
    )
    train_generator = train_datagen.flow_from_directory(
        args.train_dir,
        target_size=(IM_WIDTH, IM_HEIGHT),
        batch_size=batch_size,
        #class_mode='binary'
        class_mode=args.class_mode
    )
    validation_generator = test_datagen.flow_from_directory(
        args.val_dir,
        target_size=(IM_WIDTH, IM_HEIGHT),
        batch_size=batch_size,
        #class_mode='binary'
        class_mode=args.class_mode
    )

    # setup model
    base_model = InceptionV3(weights='imagenet', include_top=False)  # include_top=False excludes final FC layer
    model = add_new_last_layer(base_model, nb_classes)

    # transfer learning
    setup_to_transfer_learn(model, base_model)
    #model.summary()
    history_tl = model.fit_generator(
        train_generator,
        epochs=nb_epoch,
        steps_per_epoch=nb_train_samples//BAT_SIZE,
        validation_data=validation_generator,
        validation_steps=nb_val_samples//BAT_SIZE)

    # fine-tuning
    setup_to_finetune(model)
    history_ft = model.fit_generator(
        train_generator,
        steps_per_epoch=nb_train_samples//BAT_SIZE,
        epochs=nb_epoch,
        validation_data=validation_generator,
        validation_steps=nb_val_samples//BAT_SIZE)
    model.save(args.output_model_file)
    if args.plot:
        plot_training(history_ft)
def plot_training(history):
    acc = history.history['acc']
    val_acc = history.history['val_acc']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    epochs = range(len(acc))
    plt.plot(epochs, acc, 'r.')
    plt.plot(epochs, val_acc, 'r')
    plt.title('Training and validation accuracy')
    plt.figure()
    plt.plot(epochs, loss, 'r.')
    plt.plot(epochs, val_loss, 'r-')
    plt.title('Training and validation loss')
    plt.show()
# main func
if __name__=="__main__":
    a = argparse.ArgumentParser()
    a.add_argument("--train_dir", default="train2")
    a.add_argument("--val_dir", default="test2")
    a.add_argument("--nb_epoch", default=NB_EPOCHS)
    a.add_argument("--batch_size", default=BAT_SIZE)
    a.add_argument("--output_model_file", default="inceptionv3-ft.model")
    a.add_argument("--plot", action="store_true")
    a.add_argument("--class_mode", default="binary")
    args = a.parse_args()
    if args.train_dir is None or args.val_dir is None:
        a.print_help()
        sys.exit(1)
    if args.class_mode != "binary" and args.class_mode != "categorical":
        print("set class_mode as 'binary' or 'categorical'")
        sys.exit(1)  # stop here instead of training with an unsupported mode
    if args.class_mode == "categorical":
        loss_mode = "categorical_crossentropy"
    #set class_mode
    print("class_mode:{}, loss_mode:{}".format(args.class_mode, loss_mode))
    if (not os.path.exists(args.train_dir)) or (not os.path.exists(args.val_dir)):
        print("directories do not exist")
        sys.exit(1)
    train(args)
I had this problem on several tasks when the learning rate was too high. Try something like 0.0001 or even less.
According to the Keras documentation, the default rate is 0.001:
keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)
See https://keras.io/optimizers/#rmsprop
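To see why the learning rate matters so directly for RMSprop, here is a NumPy sketch of the textbook update rule (a running average of squared gradients that normalizes the step). It is a simplified illustration under that assumption, not the Keras implementation, and the helper `rmsprop_step` is hypothetical. Because the gradient is divided by its own running magnitude, the step size is governed mostly by lr rather than by how large the gradient is.

```python
import numpy as np

def rmsprop_step(grad, avg_sq, lr=0.001, rho=0.9, eps=1e-7):
    """One textbook RMSprop update; returns the step and the new running average."""
    avg_sq = rho * avg_sq + (1.0 - rho) * grad ** 2
    step = lr * grad / (np.sqrt(avg_sq) + eps)
    return step, avg_sq

grad = np.array([5.0])
step_default, _ = rmsprop_step(grad, np.zeros(1), lr=0.001)
step_small, _ = rmsprop_step(grad, np.zeros(1), lr=0.0001)
step_big_grad, _ = rmsprop_step(100.0 * grad, np.zeros(1), lr=0.001)

# The first step is roughly lr / sqrt(1 - rho), independent of the gradient scale.
print(step_default, step_small, step_big_grad)
```

With pretrained weights that are already close to a good solution, steps of this size on every parameter can be enough to disturb them, which is consistent with the advice to try 0.0001 or lower.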
I found that if I use the SGD or Adam optimizer, the accuracy goes up normally. So is there something wrong with using the RMSprop optimizer with the default learning rate of 0.001?
I am learning about designing Convolutional Neural Networks using Keras. I have developed a simple model using VGG16 as the base. I have about 6 classes of images in the dataset. Here are the code and description of my model.
model = models.Sequential()
conv_base = VGG16(weights='imagenet' ,include_top=False, input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
conv_base.trainable = False
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(6, activation='sigmoid'))
Here is the code for compiling and fitting the model:
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])
model.summary()
callbacks = [
    EarlyStopping(monitor='acc', patience=1, mode='auto'),
    ModelCheckpoint(monitor='val_loss', save_best_only=True, filepath=model_file_path)
]
history = model.fit_generator(
    train_generator,
    steps_per_epoch=10,
    epochs=EPOCHS,
    validation_data=validation_generator,
    callbacks=callbacks,
    validation_steps=10)
Here is the code for prediction of a new image
img = image.load_img(img_path, target_size=(IMAGE_SIZE, IMAGE_SIZE))
plt.figure(index)
imgplot = plt.imshow(img)
x = image.img_to_array(img)
x = x.reshape((1,) + x.shape)
prediction = model.predict(x)[0]
# print(prediction)
Often the model.predict() method predicts more than one class:
[0 1 1 0 0 0]
I have a couple of questions
Is it normal for a multiclass classification model to predict more than one output?
How is accuracy measured during training time if more than one class was predicted?
How can I modify the neural network so that only one class is predicted?
Any help is appreciated. Thank you so much!
You are not doing multi-class classification, but multi-label. This is caused by the use of a sigmoid activation at the output layer. To do multi-class classification properly, use a softmax activation at the output, which will produce a probability distribution over classes.
Taking the class with the biggest probability (argmax) will produce a single class prediction, as expected.
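The difference can be illustrated with a small NumPy sketch. The logits below are made up for the example, chosen so that thresholded sigmoid outputs reproduce the [0 1 1 0 0 0] pattern from the question, while softmax followed by argmax yields exactly one class.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - np.max(z)  # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical pre-activation outputs of the last Dense layer for one image.
logits = np.array([-2.0, 1.5, 0.8, -1.0, -3.0, -0.5])

# Sigmoid scores each class independently, so several can exceed 0.5 at once.
multi_label = (sigmoid(logits) > 0.5).astype(int)

# Softmax yields a distribution over classes; argmax picks a single one.
probs = softmax(logits)
predicted_class = int(np.argmax(probs))

print(multi_label, predicted_class)
```

In the model from the question this corresponds to changing the last layer's activation to 'softmax' and applying argmax to the result of model.predict().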