Reinforcement learning: why did the performance collapse? - deep-learning

I am trying to train an agent on the ViZDoom platform, on the deadly_corridor scenario, with the A3C algorithm and TensorFlow on a TITAN X GPU server. However, the performance collapsed after training for about 2+ days, as you can see in the following picture.
There are 6 demons in the corridor, and the agent should kill at least 5 of them to reach the destination and get the vest.
Here is the code of the network:
with tf.variable_scope(scope):
    self.inputs = tf.placeholder(shape=[None, *shape, 1], dtype=tf.float32)
    self.conv_1 = slim.conv2d(activation_fn=tf.nn.relu, inputs=self.inputs, num_outputs=32,
                              kernel_size=[8, 8], stride=4, padding='SAME')
    self.conv_2 = slim.conv2d(activation_fn=tf.nn.relu, inputs=self.conv_1, num_outputs=64,
                              kernel_size=[4, 4], stride=2, padding='SAME')
    self.conv_3 = slim.conv2d(activation_fn=tf.nn.relu, inputs=self.conv_2, num_outputs=64,
                              kernel_size=[3, 3], stride=1, padding='SAME')
    self.fc = slim.fully_connected(slim.flatten(self.conv_3), 512, activation_fn=tf.nn.elu)

    # LSTM
    lstm_cell = tf.contrib.rnn.BasicLSTMCell(cfg.RNN_DIM, state_is_tuple=True)
    c_init = np.zeros((1, lstm_cell.state_size.c), np.float32)
    h_init = np.zeros((1, lstm_cell.state_size.h), np.float32)
    self.state_init = [c_init, h_init]
    c_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.c])
    h_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.h])
    self.state_in = (c_in, h_in)
    rnn_in = tf.expand_dims(self.fc, [0])
    step_size = tf.shape(self.inputs)[:1]
    state_in = tf.contrib.rnn.LSTMStateTuple(c_in, h_in)
    lstm_outputs, lstm_state = tf.nn.dynamic_rnn(lstm_cell,
                                                 rnn_in,
                                                 initial_state=state_in,
                                                 sequence_length=step_size,
                                                 time_major=False)
    lstm_c, lstm_h = lstm_state
    self.state_out = (lstm_c[:1, :], lstm_h[:1, :])
    rnn_out = tf.reshape(lstm_outputs, [-1, 256])  # 256 must match cfg.RNN_DIM above

    # Output layers for policy and value estimations
    self.policy = slim.fully_connected(rnn_out,
                                       cfg.ACTION_DIM,
                                       activation_fn=tf.nn.softmax,
                                       biases_initializer=None)
    self.value = slim.fully_connected(rnn_out,
                                      1,
                                      activation_fn=None,
                                      biases_initializer=None)

    if scope != 'global' and not play:
        self.actions = tf.placeholder(shape=[None], dtype=tf.int32)
        self.actions_onehot = tf.one_hot(self.actions, cfg.ACTION_DIM, dtype=tf.float32)
        self.target_v = tf.placeholder(shape=[None], dtype=tf.float32)
        self.advantages = tf.placeholder(shape=[None], dtype=tf.float32)
        self.responsible_outputs = tf.reduce_sum(self.policy * self.actions_onehot, axis=1)

        # Loss functions
        self.policy_loss = -tf.reduce_sum(self.advantages * tf.log(self.responsible_outputs + 1e-10))
        self.value_loss = tf.reduce_sum(tf.square(self.target_v - tf.reshape(self.value, [-1])))
        self.entropy = -tf.reduce_sum(self.policy * tf.log(self.policy + 1e-10))

        # Get gradients from local network using local losses
        local_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope)
        # Shared trunk plus the value-head weight / shared trunk plus the policy-head weight
        value_var, policy_var = local_vars[:-2] + [local_vars[-1]], local_vars[:-2] + [local_vars[-2]]
        self.var_norms = tf.global_norm(local_vars)
        self.value_gradients = tf.gradients(self.value_loss, value_var)
        value_grads, self.grad_norms_value = tf.clip_by_global_norm(self.value_gradients, 40.0)
        self.policy_gradients = tf.gradients(self.policy_loss, policy_var)
        policy_grads, self.grad_norms_policy = tf.clip_by_global_norm(self.policy_gradients, 40.0)

        # Apply local gradients to global network
        global_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'global')
        global_vars_value, global_vars_policy = \
            global_vars[:-2] + [global_vars[-1]], global_vars[:-2] + [global_vars[-2]]
        self.apply_grads_value = optimizer.apply_gradients(zip(value_grads, global_vars_value))
        self.apply_grads_policy = optimizer.apply_gradients(zip(policy_grads, global_vars_policy))
And the optimizer is
optimizer = tf.train.RMSPropOptimizer(learning_rate=1e-5)
And here are some summaries of the gradients and norms.
I hope someone can help me tackle this problem.

Now, personally, I think the reason the agent's performance collapsed may be over-optimization of the values. I read a paper on Double DQN about this; see Deep Reinforcement Learning with Double Q-Learning.
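For context, the core idea of that paper is to decouple action selection from action evaluation in the bootstrap target, which damps the upward bias of max-based targets. A minimal tabular sketch of that target computation (illustrative only; q_online and q_target are hypothetical Q-tables, not part of the code above):

import numpy as np

# Tabular sketch: rows are states, columns are actions.
q_online = np.random.rand(10, 4)   # table being trained
q_target = np.random.rand(10, 4)   # periodically-synced copy

def double_q_target(reward, next_state, gamma=0.99):
    # The online table SELECTS the action, the target table EVALUATES it.
    a_star = np.argmax(q_online[next_state])
    return reward + gamma * q_target[next_state, a_star]

print(double_q_target(reward=1.0, next_state=3))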

Related

Temporal sequence feature extraction CNN, batches with different dimensions

I am using a CNN to extract features from temporal data of different lengths. I am using pad_sequence to pad the data in a batch. However, as the max length in a batch changes, the padded sequence length differs from batch to batch. This creates errors when I flatten the data for the FC layer (as the dimension of the flattened vector changes). I am currently handling this by using an adaptive average pooling layer before the FC layers. As this is a global averaging, it fixes the output dimension for the FC layer. However, I am not sure if this is the correct thing to do.
Code is:
## pad tensors
def pad_collate(batch):
    sequences = [item[0] for item in batch]
    lengths = [len(seq) for seq in sequences]
    padded_sequences = pad_sequence(sequences, batch_first=True, padding_value=0)
    return padded_sequences, lengths

## Create dataloader
trainData = Sequence(root=path)
trainDataLoader = DataLoader(trainData, batch_size=BATCH_SIZE, collate_fn=pad_collate)

## CNN model
class FeatureExtractor(nn.Module):
    def __init__(self, block, layers):
        super(FeatureExtractor, self).__init__()
        self.inplanes = 6
        ## 1st CONV layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3, stride=2, padding=4)
        self.bn1 = nn.BatchNorm2d(6)
        self.relu1 = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        ## residual blocks
        self.layer0 = self._make_layer(block, 12, layers[0], stride=1)
        self.layer1 = self._make_layer(block, 24, layers[1], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((5, 5))  ##### MY CURRENT SOLUTION #####
        self.fc = nn.Linear(600, 128)

    def _make_layer(self, block, planes, blocks, stride):
        downsample = None
        if stride != 1 or self.inplanes != planes:
            downsample = nn.Sequential(nn.Conv2d(self.inplanes, planes, kernel_size=1, stride=stride),
                                       nn.BatchNorm2d(planes))
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))
        return nn.Sequential(*layers)

    def forward(self, x):
        ## first conv
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu1(x)
        x = self.maxpool1(x)
        ## conv blocks
        x = self.layer0(x)
        x = self.layer1(x)
        ## FC layer
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        output = self.fc(x)
        return output
Any other comments are also welcome (I am self-taught).
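As a sanity check on the adaptive-pooling idea, here is a minimal standalone sketch (independent of the model above) showing that the flattened dimension stays fixed while the input spatial size varies:

import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d((5, 5))
fc = nn.Linear(24 * 5 * 5, 128)

# Two batches whose spatial sizes differ (as happens with per-batch padding):
for h, w in [(17, 31), (40, 9)]:
    x = torch.randn(4, 24, h, w)      # (batch, channels, H, W)
    flat = torch.flatten(pool(x), 1)  # always shape (4, 600)
    print(fc(flat).shape)             # torch.Size([4, 128]) in both cases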

Why am I facing OOM in dali/pytorch inference pipeline?

I have a trained ResNet-50 model and I'm using it for inference on medical images in JPEG 2000 format. I'm using a DALI pipeline to speed up the process, but I run into OOM after inferring 10 batches. I'm deleting tensors after each batch inference and am still facing OOM.
Batch size: 32,
image size: 512 * 512,
GPU: P100 16 GB.
@pipeline_def
def j2k_decode_pipeline():
    jpegs, labels = fn.readers.file(files=j2kfiles)
    images = fn.experimental.decoders.image(jpegs, device='mixed', output_type=types.RGB, dtype=DALIDataType.INT16)
    images = fn.resize(
        images,
        resize_x=IMAGE_SIZE,
        resize_y=IMAGE_SIZE,
        resize_z=3,
        interp_type=types.INTERP_LANCZOS3)
    return images, labels
def get_preds_jk2(max_batch_size):
    cnt = math.ceil(len(j2kfiles) / max_batch_size) - 1
    pipe = j2k_decode_pipeline(batch_size=max_batch_size, num_threads=2, device_id=0, debug=True)
    pipe.build()
    dali_iter = DALIGenericIterator(pipe, ['data', 'label'])

    model_weights_path = '/model-2/model_weights.pth'
    model = models.resnet50()
    num_ftrs = model.fc.in_features
    model.fc = nn.Linear(num_ftrs, 2)
    model.load_state_dict(torch.load(model_weights_path))
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    model.eval()

    preds, labels = [], []
    for i, data in enumerate(dali_iter):
        print('batch ', i)
        labels.append(data[0]['label'])
        # Testing correctness of labels
        imgs = data[0]['data']
        im_temp = imgs - imgs.min()
        im_temp = im_temp / im_temp.max()
        im_temp = im_temp * 255
        imgs_8bit = im_temp.type(torch.uint8).float()
        imgs_tensor = imgs_8bit.permute(0, 3, 1, 2)
        imgs_tensor = imgs_tensor.to(device)
        del imgs, im_temp, imgs_8bit
        gc.collect()
        torch.cuda.empty_cache()
        with torch.no_grad():
            outputs = model(imgs_tensor)
        output_probs = torch.nn.functional.softmax(outputs, dim=1).data.cpu().numpy()[:, 1]
        preds.append(output_probs)
        del imgs_tensor, outputs, output_probs
        gc.collect()
        torch.cuda.empty_cache()
        if i == cnt:
            print(i)
            break
    return preds, labels
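One pattern worth checking in the loop above (a hypothesis, not a confirmed diagnosis): labels.append(data[0]['label']) keeps a reference to every batch's label tensor, and if those tensors live on the GPU they cannot be freed between batches no matter how often empty_cache runs. A minimal change to rule this out:

# Inside the batch loop: move labels to host memory before accumulating,
# so any CUDA buffers backing them can be released between batches.
labels.append(data[0]['label'].detach().cpu())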

How to add an additional output node during training in PyTorch?

I am making a class-incremental learning multi-label classifier. The model first trains with 7 labels. After training, another dataset emerges that contains the same labels plus one more. I want to automatically add an extra output node to the trained network and continue training on this new dataset. How can I do this?
class FeedForewardNN(nn.Module):
    def __init__(self, input_size, h1_size=264, h2_size=128, num_services=8):
        super().__init__()
        self.input_size = input_size
        self.lin1 = nn.Linear(input_size, h1_size)
        self.lin2 = nn.Linear(h1_size, h2_size)
        self.lin3 = nn.Linear(h2_size, num_services)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.lin1(x)
        x = self.relu(x)
        x = self.lin2(x)
        x = self.relu(x)
        x = self.lin3(x)
        x = self.sigmoid(x)
        return x
This is the architecture of the feedforward neural network.
Then I first train on the dataset with only 7 classes:
# Create NN
input_size = len(x_columns)
net1 = FeedForewardNN(input_size, num_services=7)
alpha = 0.001

# Define optimizer
optimizer = optim.Adam(net1.parameters(), lr=alpha)
criterion = nn.BCELoss()
running_loss = 0

# Training loop
loss_list = []
auc_list = []
for i in range(len(train_data_x)):
    optimizer.zero_grad()
    outputs = net1(train_data_x[i])
    loss = criterion(outputs, train_data_y[i])
    loss.backward()
    optimizer.step()
However, I then want to add one additional output node, initialize the new weights while keeping the old trained weights, and train on this new dataset.
I suggest replacing the layer with a new one of the desired shape, and then partially assigning its parameter values from the old layer, as follows:
def increaseClassifier(m: torch.nn.Linear):
    w = m.weight
    b = m.bias
    old_shape = m.weight.shape
    m2 = nn.Linear(old_shape[1], old_shape[0] + 1)
    m2.weight = nn.parameter.Parameter(torch.cat((m.weight, m2.weight[0:1])))
    m2.bias = nn.parameter.Parameter(torch.cat((m.bias, m2.bias[0:1])))
    return m2

class FeedForewardNN(nn.Module):
    ...

    def incrHere(self):
        self.lin3 = increaseClassifier(self.lin3)
UPD:
Can you explain how the additional weights that come with the new output node are initialized?
The initial weights for the new output come from the creation of the new layer: the layer constructor creates new parameters with some random initialization, then we replace part of them with the trained weights, and the remaining part is ready for new training.
m2.weight = nn.parameter.Parameter( torch.cat( (m.weight, m2.weight[0:1]) ) )
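A short usage sketch of the above (new_data_x and new_data_y are hypothetical names for the new 8-label dataset; note the optimizer must be re-created, since incrHere replaces lin3's Parameter objects):

net1.incrHere()  # lin3 now has 8 outputs; the first 7 rows keep their trained weights

# Re-create the optimizer so it tracks the freshly created lin3 parameters
optimizer = optim.Adam(net1.parameters(), lr=alpha)

for i in range(len(new_data_x)):
    optimizer.zero_grad()
    loss = criterion(net1(new_data_x[i]), new_data_y[i])
    loss.backward()
    optimizer.step()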

What does Lambda do in this code (Python/Keras)?

def AdaIN(x):
    # Normalize x[0] (image representation)
    mean = K.mean(x[0], axis=[1, 2], keepdims=True)
    std = K.std(x[0], axis=[1, 2], keepdims=True) + 1e-7
    y = (x[0] - mean) / std

    # Reshape scale and bias parameters
    pool_shape = [-1, 1, 1, y.shape[-1]]
    scale = K.reshape(x[1], pool_shape)
    bias = K.reshape(x[2], pool_shape)

    # Multiply by x[1] (GAMMA) and add x[2] (BETA)
    return y * scale + bias

def g_block(input_tensor, latent_vector, filters):
    gamma = Dense(filters, bias_initializer='ones')(latent_vector)
    beta = Dense(filters)(latent_vector)
    out = UpSampling2D()(input_tensor)
    out = Conv2D(filters, 3, padding='same')(out)
    out = Lambda(AdaIN)([out, gamma, beta])
    out = Activation('relu')(out)
    return out
Please see the code above. I am currently studying StyleGAN and trying to convert this code to PyTorch, but I can't seem to understand what Lambda does in g_block. AdaIN needs only one input based on its declaration, but somehow gamma and beta are also used as inputs? Please explain what Lambda does in this code.
Thank you very much.
Lambda layers in Keras are used to call custom functions inside the model. In g_block, Lambda calls the AdaIN function and passes out, gamma, and beta as arguments inside a list. The AdaIN function receives these three tensors encapsulated in a single list as x, and those tensors are accessed inside AdaIN by indexing into the list (x[0], x[1], x[2]).
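To make that concrete, here is a minimal self-contained sketch of a Lambda wrapping a multi-input function (a toy function, not AdaIN itself; imports use the tf.keras path):

import tensorflow as tf
from tensorflow.keras.layers import Input, Lambda

# A custom function that, like AdaIN above, receives a LIST of tensors as x.
def scaled_shift(x):
    return x[0] * x[1] + x[2]

a = Input(shape=(4,))
b = Input(shape=(4,))
c = Input(shape=(4,))
out = Lambda(scaled_shift)([a, b, c])  # the list arrives as the single argument x
model = tf.keras.Model(inputs=[a, b, c], outputs=out)
model.summary()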
Here's the PyTorch equivalent:
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaIN(nn.Module):
    def forward(self, out, gamma, beta):
        bs, ch = out.size()[:2]
        mean = out.reshape(bs, ch, -1).mean(dim=2).reshape(bs, ch, 1, 1)
        std = out.reshape(bs, ch, -1).std(dim=2).reshape(bs, ch, 1, 1) + 1e-7
        y = (out - mean) / std
        bias = beta.unsqueeze(-1).unsqueeze(-1).expand_as(out)
        scale = gamma.unsqueeze(-1).unsqueeze(-1).expand_as(out)
        return y * scale + bias

class g_block(nn.Module):
    def __init__(self, filters, latent_vector_shape, input_tensor_channels):
        super().__init__()
        self.gamma = nn.Linear(in_features=latent_vector_shape, out_features=filters)
        # Initializes all bias to 1
        self.gamma.bias.data = torch.ones(filters)
        self.beta = nn.Linear(in_features=latent_vector_shape, out_features=filters)
        # calculate appropriate padding
        self.conv = nn.Conv2d(input_tensor_channels, filters, 3, 1, padding=1)  # calc padding
        self.adain = AdaIN()

    def forward(self, input_tensor, latent_vector):
        gamma = self.gamma(latent_vector)
        beta = self.beta(latent_vector)
        # check default interpolation mode in keras and replace mode below if different
        out = F.interpolate(input_tensor, scale_factor=2, mode='nearest')
        out = self.conv(out)
        out = self.adain(out, gamma, beta)
        out = torch.relu(out)
        return out

# Sample:
input_tensor = torch.randn((1, 3, 10, 10))
latent_vector = torch.randn((1, 5))
g = g_block(3, latent_vector.shape[1], input_tensor.shape[1])
out = g(input_tensor, latent_vector)
print(out)
Note: you need to pass the latent_vector and input_tensor shapes when creating g_block.

Why is my model's loss always revolving around 1 in every epoch?

During training, the loss of my model revolves around 1; it is not converging.
I tried various optimizers, but it still shows the same pattern. I am using Keras with the TensorFlow backend. What could be the possible reasons? Any help or reference link would be appreciated.
Here is my model:
def model_vgg19():
    vgg_model = VGG19(weights="imagenet", include_top=False, input_shape=(128, 128, 3))
    for layer in vgg_model.layers[:10]:
        layer.trainable = False
    intermediate_layer_outputs = get_layers_output_by_name(vgg_model, ["block1_pool", "block2_pool", "block3_pool", "block4_pool"])
    convnet_output = GlobalAveragePooling2D()(vgg_model.output)
    for layer_name, output in intermediate_layer_outputs.items():
        output = GlobalAveragePooling2D()(output)
        convnet_output = concatenate([convnet_output, output])
    convnet_output = Dense(2048, activation='relu')(convnet_output)
    convnet_output = Dropout(0.6)(convnet_output)
    convnet_output = Dense(2048, activation='relu')(convnet_output)
    convnet_output = Lambda(lambda x: K.l2_normalize(x, axis=1))(convnet_output)
    final_model = Model(inputs=[vgg_model.input], outputs=convnet_output)
    return final_model

model = model_vgg19()
Here is my loss function:
def hinge_loss(y_true, y_pred):
    y_pred = K.clip(y_pred, _EPSILON, 1.0 - _EPSILON)
    loss = tf.convert_to_tensor(0, dtype=tf.float32)
    g = tf.constant(1.0, shape=[1], dtype=tf.float32)
    for i in range(0, batch_size, 3):
        try:
            q_embedding = y_pred[i + 0]
            p_embedding = y_pred[i + 1]
            n_embedding = y_pred[i + 2]
            D_q_p = K.sqrt(K.sum((q_embedding - p_embedding) ** 2))
            D_q_n = K.sqrt(K.sum((q_embedding - n_embedding) ** 2))
            loss = (loss + g + D_q_p - D_q_n)
        except:
            continue
    loss = loss / (batch_size / 3)
    zero = tf.constant(0.0, shape=[1], dtype=tf.float32)
    return tf.maximum(loss, zero)
What is definitely a problem is that you shuffle your data and then try to learn triplets from it.
As you can see here: https://keras.io/models/model/, model.fit shuffles your data in each epoch, which breaks your triplet setup. Try setting the shuffle parameter to False and see what happens; there might be other errors as well.
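Concretely, that would look like the following (a minimal sketch; x_train and y_train are hypothetical placeholders for the triplet-ordered training arrays):

# shuffle=False keeps consecutive (query, positive, negative) rows together,
# which the hinge_loss above depends on when it steps through y_pred in threes.
model.fit(x_train, y_train, batch_size=batch_size, epochs=10, shuffle=False)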