A3C with LSTM using keras - deep-learning

I'm trying to implement an A3C model with LSTM using Keras. I started from this version of A3C without LSTM: https://github.com/coreylynch/async-rl, and tried to modify only the network code, but I'm struggling to compile the whole model.
Am I missing something?
This is my model:
state = tf.placeholder("float", [None, agent_history_length, resized_width, resized_height])
vision_model = Sequential()
vision_model.add(Conv2D(activation="relu", filters=16, kernel_size=(8, 8), name="conv1", padding="same", strides=(4, 4),input_shape=(agent_history_length,resized_width, resized_height)))
vision_model.add(Conv2D(activation="relu", filters=32, kernel_size=(4, 4), name="conv2", padding="same", strides=(2, 2)))
vision_model.add(Flatten())
vision_model.add(Dense(activation="relu", units=256, name="h1"))
# Now let's get a tensor with the output of our vision model:
state_input = Input(shape=(1,agent_history_length,resized_width,resized_height))
encoded_frame_sequence = TimeDistributed(vision_model)(state_input)
encoded_video = LSTM(256)(encoded_frame_sequence) # the output will be a vector
action_probs = Dense(activation="softmax", units=4, name="p")(encoded_video)
state_value = Dense(activation="linear", units=1, name="v")(encoded_video)
policy_network = Model(inputs=state_input, outputs=action_probs)
value_network = Model(inputs=state_input, outputs=state_value)
p_params = policy_network.trainable_weights
v_params = value_network.trainable_weights
policy_network.summary()
value_network.summary()
p_out = policy_network(state_input)
v_out = value_network(state_input)

Note: the keras-rl examples library does NOT support input shapes with more than two dimensions!
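For reference, here is a minimal sketch that should build cleanly. It is an assumption-heavy rewrite, not the async-rl author's code: it assumes channels-last grayscale frames, made-up values for agent_history_length and the frame size, and a single shared model with two heads instead of two separate policy/value models:

from keras.layers import Input, Conv2D, Flatten, Dense, LSTM, TimeDistributed
from keras.models import Model, Sequential

agent_history_length = 4                 # assumed value
resized_width = resized_height = 84      # assumed value

# Per-frame CNN encoder: input_shape is a single frame, not the whole history
vision_model = Sequential()
vision_model.add(Conv2D(16, (8, 8), strides=(4, 4), padding='same', activation='relu',
                        input_shape=(resized_width, resized_height, 1)))
vision_model.add(Conv2D(32, (4, 4), strides=(2, 2), padding='same', activation='relu'))
vision_model.add(Flatten())
vision_model.add(Dense(256, activation='relu'))

# The frame history is the time axis; TimeDistributed applies the CNN to each frame
state_input = Input(shape=(agent_history_length, resized_width, resized_height, 1))
encoded_frames = TimeDistributed(vision_model)(state_input)
lstm_out = LSTM(256)(encoded_frames)

action_probs = Dense(4, activation='softmax', name='p')(lstm_out)
state_value = Dense(1, activation='linear', name='v')(lstm_out)

# One model with two outputs, so the policy and value heads share the encoder and LSTM
model = Model(inputs=state_input, outputs=[action_probs, state_value])
model.summary()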

Related

Model Doesn't Learn When Used With Grid Search

I have built two models. The first works fine; the code is:
model = Sequential()
model.add(Dense(25, input_dim=8, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(25, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(25, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='tanh'))
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='mean_squared_error', optimizer=opt)
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=1000, verbose=1)
train_mse = model.evaluate(X_train, y_train, verbose=0)
test_mse = model.evaluate(X_test, y_test, verbose=0)
I believe the next model below, implemented using grid search, is identical, except it doesn't seem to learn: the loss starts off astronomically high and barely decreases. The code is:
def build_classifier():
    classifier = Sequential()
    classifier.add(Dense(25, input_dim=8, activation='relu', kernel_initializer='he_uniform'))
    classifier.add(Dense(25, activation='relu', kernel_initializer='he_uniform'))
    classifier.add(Dense(25, activation='relu', kernel_initializer='he_uniform'))
    classifier.add(Dense(1, activation='tanh'))
    opt = SGD(lr=0.01, momentum=0.9)
    classifier.compile(loss='mean_squared_error', optimizer=opt, metrics=["accuracy"])
    return classifier

classifier = KerasClassifier(build_fn=build_classifier)
parameters = {'epochs': [1000]}
grid_search = GridSearchCV(estimator=classifier,
                           param_grid=parameters,
                           cv=5)
grid_search = grid_search.fit(X_train, y_train)
best_parameters = grid_search.best_params_
best_accuracy = grid_search.best_score_
I am aware I am not actually tuning any hyperparameters here. I kept reducing and removing the hyperparameters I was tuning, in the hope that I could identify the problem.
I have categorical cross-entropy models using the same data and do not have this problem (I make sure the model is reasonable with a train/test split, then rebuild it with grid search to tune hyperparameters).
I cannot see what difference between the two models would allow the first to learn reasonably but the second to not work at all.
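One difference that might matter (a hedged guess, not something verified against this data): the first model is trained as a regression (tanh output, mean_squared_error), but the second wraps it in KerasClassifier, which, as far as I can tell, treats y as class labels (mapping them to integer indices) and lets GridSearchCV score it as a classifier. A sketch of the regression-style wrapper instead, reusing build_classifier from above:

from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV

# KerasRegressor leaves the continuous targets untouched
regressor = KerasRegressor(build_fn=build_classifier)
parameters = {'epochs': [1000]}
grid_search = GridSearchCV(estimator=regressor,
                           param_grid=parameters,
                           scoring='neg_mean_squared_error',  # score on MSE, not accuracy
                           cv=5)
grid_search = grid_search.fit(X_train, y_train)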

Error Received while building the Auto encoder

I am trying to build an autoencoder for my term project, using a CNN as the encoder and an LSTM as the decoder. However, when I display the summary of the model, I receive the following error:
ValueError: Input 0 is incompatible with layer lstm_10: expected ndim=3, found ndim=2
x.shape = (45406, 100, 100)
y.shape = (45406,)
I already tried changing the shape of the input for the LSTM, but it didn't work.
def keras_model(image_x, image_y):
    model = Sequential()
    model.add(Lambda(lambda x: x / 127.5 - 1., input_shape=(image_x, image_y, 1)))
    last = model.output
    x = Conv2D(3, (3, 3), padding='same')(last)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = MaxPooling2D((2, 2), padding='valid')(x)
    encoded = Flatten()(x)
    x = LSTM(8, return_sequences=True, input_shape=(100, 100))(encoded)
    decoded = LSTM(64, return_sequences=True)(x)
    x = Dropout(0.5)(decoded)
    x = Dense(400, activation='relu')(x)
    x = Dense(25, activation='relu')(x)
    final = Dense(1, activation='relu')(x)
    autoencoder = Model(model.input, final)
    autoencoder.compile(optimizer="Adam", loss="mse")
    autoencoder.summary()

model = keras_model(100, 100)
Given you are using an LSTM, you need a time dimension. So your input shape should be: (time, image_x, image_y, nb_image_channels).
I would suggest getting a more in-depth understanding of autoencoders, LSTMs and 2D convolutions, as all of these play together here. This is a helpful intro: https://machinelearningmastery.com/lstm-autoencoders/, and so is https://blog.keras.io/building-autoencoders-in-keras.html.
Also have a look at this example, where someone implemented an LSTM with Conv2D: "How to reshape 3 channel dataset for input to neural network". The TimeDistributed layer comes in useful here.
However, just to get your error fixed, you can add a Reshape() layer to fake the extra dimension:
def keras_model(image_x, image_y):
    model = Sequential()
    model.add(Lambda(lambda x: x / 127.5 - 1., input_shape=(image_x, image_y, 1)))
    last = model.output
    x = Conv2D(3, (3, 3), padding='same')(last)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = MaxPooling2D((2, 2), padding='valid')(x)
    encoded = Flatten()(x)
    # (50, 50, 3) is the output shape of the max pooling layer (see model summary)
    encoded = Reshape((50 * 50 * 3, 1))(encoded)
    x = LSTM(8, return_sequences=True)(encoded)  # input_shape can be removed
    decoded = LSTM(64, return_sequences=True)(x)
    x = Dropout(0.5)(decoded)
    x = Dense(400, activation='relu')(x)
    x = Dense(25, activation='relu')(x)
    final = Dense(1, activation='relu')(x)
    autoencoder = Model(model.input, final)
    autoencoder.compile(optimizer="Adam", loss="mse")
    print(autoencoder.summary())

model = keras_model(100, 100)
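And a rough sketch of the TimeDistributed route mentioned above (hypothetical shapes: it assumes each sample really is a short sequence of frames, which is not what x.shape = (45406, 100, 100) suggests):

from keras.layers import Input, TimeDistributed, Conv2D, MaxPooling2D, Flatten, LSTM, Dense
from keras.models import Model

time_steps = 10                      # assumed sequence length
inp = Input(shape=(time_steps, 100, 100, 1))
x = TimeDistributed(Conv2D(3, (3, 3), padding='same', activation='relu'))(inp)
x = TimeDistributed(MaxPooling2D((2, 2)))(x)
x = TimeDistributed(Flatten())(x)    # -> (batch, time_steps, features)
x = LSTM(8)(x)                       # now the LSTM sees a real time dimension
out = Dense(1, activation='linear')(x)
model = Model(inp, out)
model.summary()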

Reinforcement learning, why the performance collapsed?

I am trying to train an agent on the ViZDoom platform on the deadly_corridor scenario with the A3C algorithm and TensorFlow on a TITAN X GPU server; however, the performance collapsed after about 2+ days of training, as you can see in the following picture.
There are 6 demons in the corridor and the agent should kill at least 5 demons to get to the destination and get the vest.
Here is the code of the network:
with tf.variable_scope(scope):
    self.inputs = tf.placeholder(shape=[None, *shape, 1], dtype=tf.float32)
    self.conv_1 = slim.conv2d(activation_fn=tf.nn.relu, inputs=self.inputs, num_outputs=32,
                              kernel_size=[8, 8], stride=4, padding='SAME')
    self.conv_2 = slim.conv2d(activation_fn=tf.nn.relu, inputs=self.conv_1, num_outputs=64,
                              kernel_size=[4, 4], stride=2, padding='SAME')
    self.conv_3 = slim.conv2d(activation_fn=tf.nn.relu, inputs=self.conv_2, num_outputs=64,
                              kernel_size=[3, 3], stride=1, padding='SAME')
    self.fc = slim.fully_connected(slim.flatten(self.conv_3), 512, activation_fn=tf.nn.elu)

    # LSTM
    lstm_cell = tf.contrib.rnn.BasicLSTMCell(cfg.RNN_DIM, state_is_tuple=True)
    c_init = np.zeros((1, lstm_cell.state_size.c), np.float32)
    h_init = np.zeros((1, lstm_cell.state_size.h), np.float32)
    self.state_init = [c_init, h_init]
    c_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.c])
    h_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.h])
    self.state_in = (c_in, h_in)
    rnn_in = tf.expand_dims(self.fc, [0])
    step_size = tf.shape(self.inputs)[:1]
    state_in = tf.contrib.rnn.LSTMStateTuple(c_in, h_in)
    lstm_outputs, lstm_state = tf.nn.dynamic_rnn(lstm_cell,
                                                 rnn_in,
                                                 initial_state=state_in,
                                                 sequence_length=step_size,
                                                 time_major=False)
    lstm_c, lstm_h = lstm_state
    self.state_out = (lstm_c[:1, :], lstm_h[:1, :])
    rnn_out = tf.reshape(lstm_outputs, [-1, 256])

    # Output layers for policy and value estimations
    self.policy = slim.fully_connected(rnn_out,
                                       cfg.ACTION_DIM,
                                       activation_fn=tf.nn.softmax,
                                       biases_initializer=None)
    self.value = slim.fully_connected(rnn_out,
                                      1,
                                      activation_fn=None,
                                      biases_initializer=None)

    if scope != 'global' and not play:
        self.actions = tf.placeholder(shape=[None], dtype=tf.int32)
        self.actions_onehot = tf.one_hot(self.actions, cfg.ACTION_DIM, dtype=tf.float32)
        self.target_v = tf.placeholder(shape=[None], dtype=tf.float32)
        self.advantages = tf.placeholder(shape=[None], dtype=tf.float32)
        self.responsible_outputs = tf.reduce_sum(self.policy * self.actions_onehot, axis=1)

        # Loss functions
        self.policy_loss = -tf.reduce_sum(self.advantages * tf.log(self.responsible_outputs + 1e-10))
        self.value_loss = tf.reduce_sum(tf.square(self.target_v - tf.reshape(self.value, [-1])))
        self.entropy = -tf.reduce_sum(self.policy * tf.log(self.policy + 1e-10))

        # Get gradients from local network using local losses
        local_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope)
        value_var, policy_var = local_vars[:-2] + [local_vars[-1]], local_vars[:-2] + [local_vars[-2]]
        self.var_norms = tf.global_norm(local_vars)
        self.value_gradients = tf.gradients(self.value_loss, value_var)
        value_grads, self.grad_norms_value = tf.clip_by_global_norm(self.value_gradients, 40.0)
        self.policy_gradients = tf.gradients(self.policy_loss, policy_var)
        policy_grads, self.grad_norms_policy = tf.clip_by_global_norm(self.policy_gradients, 40.0)

        # Apply local gradients to global network
        global_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'global')
        global_vars_value, global_vars_policy = \
            global_vars[:-2] + [global_vars[-1]], global_vars[:-2] + [global_vars[-2]]
        self.apply_grads_value = optimizer.apply_gradients(zip(value_grads, global_vars_value))
        self.apply_grads_policy = optimizer.apply_gradients(zip(policy_grads, global_vars_policy))
And the optimizer is
optimizer = tf.train.RMSPropOptimizer(learning_rate=1e-5)
And here are some summaries of the gradients and norms.
I hope someone can help me tackle this problem.
Now, personally, I think the reason the agent's performance collapsed may be over-optimization of the values. I read a paper on Double DQN about this; you can read the paper "Deep Reinforcement Learning with Double Q-Learning".
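One thing worth checking in the code above (a hedged observation, not a verified fix): self.entropy is computed but never added to either loss, and the policy and value gradients are applied separately. The usual A3C formulation optimizes a single combined objective with an entropy bonus that discourages the policy from collapsing prematurely. A sketch, reusing the tensors defined in the snippet above and meant to sit inside the `if scope != 'global'` block (the 0.5 and 0.01 coefficients are conventional defaults, not values from this project):

# Combined A3C loss with an entropy bonus (coefficients are conventional defaults)
self.loss = 0.5 * self.value_loss + self.policy_loss - 0.01 * self.entropy
gradients = tf.gradients(self.loss, local_vars)
grads, self.grad_norms = tf.clip_by_global_norm(gradients, 40.0)
self.apply_grads = optimizer.apply_gradients(zip(grads, global_vars))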

How to Implement "Multidirectional" LSTMs?

I'm trying to implement this LSTM Architecture from the paper "Dropout improves Recurrent Neural Networks for Handwriting Recognition":
In the paper, the researchers defined Multidirectional LSTM Layers as "Four LSTM layers applied in parallel, each with a particular scanning direction"
Here's how (I think) the network looks in Keras:
from keras.layers import LSTM, Dropout, Input, Convolution2D, Merge, Dense, Activation, TimeDistributed
from keras.models import Sequential

def build_lstm_dropout(inputdim, outputdim, return_sequences=True, activation='tanh'):
    net_input = Input(shape=(None, inputdim))
    model = Sequential()
    lstm = LSTM(output_dim=outputdim, return_sequences=return_sequences, activation=activation)(net_input)
    model.add(lstm)
    model.add(Dropout(0.5))
    return model

def build_conv(nb_filter, nb_row, nb_col, net_input, border_mode='relu'):
    return TimeDistributed(Convolution2D(nb_filter, nb_row, nb_col, border_mode=border_mode, activation='relu')(net_input))

def build_lstm_conv(lstm, conv):
    model = Sequential()
    model.add(lstm)
    model.add(conv)
    return model

def build_merged_lstm_conv_layer(lstm_conv, mode='concat'):
    return Merge([lstm_conv, lstm_conv, lstm_conv, lstm_conv], mode=mode)

def build_model(feature_dim, loss='ctc_cost_for_train', optimizer='Adadelta'):
    net_input = Input(shape=(1, feature_dim, None))

    lstm = build_lstm_dropout(2, 6)
    conv = build_conv(64, 2, 4, net_input)
    lstm_conv = build_lstm_conv(lstm, conv)
    first_layer = build_merged_lstm_conv_layer(lstm_conv)

    lstm = build_lstm_dropout(10, 20)
    conv = build_conv(128, 2, 4, net_input)
    lstm_conv = build_lstm_conv(lstm, conv)
    second_layer = build_merged_lstm_conv_layer(lstm_conv)

    lstm = build_lstm_dropout(50, 1)
    fully_connected = Dense(1, activation='sigmoid')
    lstm_fc = Sequential()
    lstm_fc.add(lstm)
    lstm_fc.add(fully_connected)
    third_layer = Merge([lstm_fc, lstm_fc, lstm_fc, lstm_fc], mode='concat')

    final_model = Sequential()
    final_model.add(first_layer)
    final_model.add(Activation('tanh'))
    final_model.add(second_layer)
    final_model.add(Activation('tanh'))
    final_model.add(third_layer)

    final_model.compile(loss=loss, optimizer=optimizer, sample_weight_mode='temporal')
    return final_model
And here are my questions:
If my implementation of the architecture is correct, how do you implement the scanning directions for the four LSTM layers?
If my implementation is not correct, is it possible to implement such an architecture in Keras? If not, are there any other frameworks that can help me implement such an architecture?
You can check this for the implementation of a bidirectional LSTM: basically, you just set go_backwards=True for the backward LSTM.
However, in your case, you have to write a "mirror" + reshape layer to reverse the rows. A mirror layer can look like this (I am using a Lambda layer here for convenience): Lambda(lambda x: x[:, ::-1, :])
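A minimal sketch of that idea (Keras 2 functional API, hypothetical dimensions; a plain 1-D bidirectional pass, not the paper's full multidirectional layer): one LSTM reads the sequence forward, one reads it with go_backwards=True, and the Lambda mirror re-reverses the backward outputs so the two directions line up before merging.

from keras.layers import Input, LSTM, Lambda, concatenate
from keras.models import Model

seq_len, feat_dim = 50, 16           # hypothetical dimensions
inp = Input(shape=(seq_len, feat_dim))

forward = LSTM(32, return_sequences=True)(inp)
backward = LSTM(32, return_sequences=True, go_backwards=True)(inp)
# go_backwards returns the outputs in reversed time order, so mirror them back
backward = Lambda(lambda x: x[:, ::-1, :])(backward)

merged = concatenate([forward, backward])    # (batch, seq_len, 64)
model = Model(inp, merged)
model.summary()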

How to build Convolutional Bi-directional LSTM with Keras

I'm trying to build a convolutional bi-directional LSTM to classify DNA sequences, à la this paper: "DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences" (architecture picture on the second page).
The short version is to one-hot encode a DNA sequence:
'ATACG...' = [
    [1,0,0,0],
    [0,0,0,1],
    [1,0,0,0],
    [0,1,0,0],
    [0,0,1,0],
    ...]
Then feed it into a convolution-ReLU-maxpooling layer to find motifs, then into a bidirectional LSTM network to learn long-distance dependencies.
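(For concreteness, a quick sketch of that encoding with a hypothetical helper, matching the mapping shown above: A, C, G, T map to columns 0, 1, 2, 3.)

import numpy as np

BASES = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def one_hot_encode(seq):
    """Encode a DNA string as a (len(seq), 4) one-hot matrix."""
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq):
        out[i, BASES[base]] = 1.0
    return out

print(one_hot_encode('ATACG'))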
The original source code is here.
However, it uses an outdated version of Keras and includes a dependency on Seya, which I'd like to avoid. Here is my first attempt at building the model:
inputs = Input(shape=(500,4))
convo_1 = Convolution1D(320, border_mode='valid',filter_length=26, activation="relu", subsample_length=1)(inputs)
maxpool_1 = MaxPooling1D(pool_length=13, stride=13)(convo_1)
drop_1 = Dropout(0.2)(maxpool_1)
l_lstm = LSTM(320, return_sequences = True, go_backwards= False)(drop_1)
r_lstm = LSTM(320, return_sequences = True, go_backwards= True)(drop_1)
merged = merge([l_lstm, r_lstm], mode='sum')
drop_2 = Dropout(0.5)(merged)
flat = Flatten()(drop_2)
dense_1 = Dense(320, activation='relu')(flat)
out = Dense(num_classes, activation='sigmoid')(dense_1)
model = Model(inputs, out)
print ('compiling model')
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
checkpointer = ModelCheckpoint(filepath=sc_local_dir+"DanQ_bestmodel.hdf5", verbose=1, save_best_only=True)
earlystopper = EarlyStopping(monitor='val_loss', patience=5, verbose=1)
Unfortunately, the loss remained nearly constant during training, and the accuracy stayed constant as well. This leads me to believe that I have set the model up incorrectly, or that 1-dimensional convolution is useless on this kind of input. So I attempted to switch to 2D convolution:
inputs = Input(shape=(1, 500,4))
convo_1 = Convolution2D(320, nb_row=15, nb_col=4, init='glorot_uniform',
                        activation='relu', border_mode='same')(inputs)
maxpool_1 = MaxPooling2D((15, 4))(convo_1)
flat_1 = Flatten()(maxpool_1)
drop_1 = Dropout(0.2)(flat_1)
l_lstm = LSTM(320, return_sequences = True, go_backwards= False)(drop_1)
r_lstm = LSTM(320, return_sequences = True, go_backwards= True)(drop_1)
merged = merge([l_lstm, r_lstm], mode='sum')
drop_2 = Dropout(0.5)(merged)
flat = Flatten()(drop_2)
dense_1 = Dense(320, activation='relu')(flat)
out = Dense(num_classes, activation='sigmoid')(dense_1)
model = Model(inputs, out)
print ('compiling model')
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
checkpointer = ModelCheckpoint(filepath=sc_local_dir+"DanQ_bestmodel.hdf5", verbose=1, save_best_only=True)
earlystopper = EarlyStopping(monitor='val_loss', patience=5, verbose=1)
Which gives me the following error when trying to feed the flattened layer into the LSTM:
Exception: Input 0 is incompatible with layer lstm_4: expected ndim=3, found ndim=2
Have I set up my 1D Convolution LSTM correctly? If so, then I likely need to upgrade to a 2D Convolution LSTM, in which case, how can I fix the input error?
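For what it's worth, here is a hedged sketch of the 1D setup using Keras 2 names and the Bidirectional wrapper instead of two manually merged LSTMs (num_classes is a placeholder). The key point for the ndim error above is that an LSTM needs 3D input of shape (batch, timesteps, features), so there must be no Flatten() before it:

from keras.layers import Input, Conv1D, MaxPooling1D, Dropout, LSTM, Bidirectional, Flatten, Dense
from keras.models import Model

num_classes = 919                    # placeholder; use your own label count
inputs = Input(shape=(500, 4))
x = Conv1D(320, 26, activation='relu')(inputs)        # 'valid' padding by default
x = MaxPooling1D(pool_size=13, strides=13)(x)
x = Dropout(0.2)(x)
# merge_mode='sum' mirrors the mode='sum' merge in the original attempt
x = Bidirectional(LSTM(320, return_sequences=True), merge_mode='sum')(x)
x = Dropout(0.5)(x)
x = Flatten()(x)
x = Dense(320, activation='relu')(x)
out = Dense(num_classes, activation='sigmoid')(x)
model = Model(inputs, out)
model.compile(loss='binary_crossentropy', optimizer='rmsprop')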