Index of Embedding layer with zero padding in Keras - deep-learning

I am building an RNN model in Keras for sentences with word embeddings from gensim. I am initializing the embedding layer with GloVe vectors. Since this is a sequential model and sentences have variable lengths, vectors are zero-padded. e.g.
[0, 0, 0, 6, 2, 4]
Let's say the GloVe vectors have dimensions [NUM_VOCAB, EMBEDDING_SIZE]. The zero index is masked (ignored), so to get the proper indexing of words, do we add an extra row to the GloVe matrix, making the dimensions [NUM_VOCAB+1, EMBEDDING_SIZE]?
It seems like there is an unnecessary vector that the model will estimate, unless there is a more elegant way.
from gensim.models import Word2Vec
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Activation

glove = Word2Vec.load_word2vec_format(filename)
# prepend a zero vector so the masked padding index 0 has its own row
embedding_matrix = np.vstack([np.zeros(EMBEDDING_SIZE), glove.syn0])

model = Sequential()
# -- this uses GloVe as inits
model.add(Embedding(NUM_VOCAB, EMBEDDING_SIZE, input_length=maxlen, mask_zero=True,
                    weights=[embedding_matrix]))
# -- sequence layer
model.add(LSTM(32, return_sequences=False, init='orthogonal'))
model.add(Activation('tanh'))
...
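For concreteness, here is a minimal self-contained sketch of the +1 sizing the question is asking about, assuming index 0 is reserved for padding and word i of the GloVe matrix is mapped to row i+1 (the sizes and the random stand-in for glove.syn0 are hypothetical):

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

NUM_VOCAB, EMBEDDING_SIZE, maxlen = 5000, 100, 6           # hypothetical sizes
glove_matrix = np.random.rand(NUM_VOCAB, EMBEDDING_SIZE)   # stand-in for glove.syn0
embedding_matrix = np.vstack([np.zeros(EMBEDDING_SIZE), glove_matrix])  # (NUM_VOCAB+1, EMBEDDING_SIZE)

model = Sequential()
model.add(Embedding(NUM_VOCAB + 1,          # input_dim matches the padded matrix
                    EMBEDDING_SIZE,
                    input_length=maxlen,
                    mask_zero=True,          # index 0 is skipped by downstream layers
                    weights=[embedding_matrix]))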
Thanks

Related

How to use PyTorch nn.BatchNorm1d to get equal normalization across features?

I would like to ask a question regarding nn.BatchNorm1d in PyTorch.
I have one main tensor, which has shape [B, 3, N]. I also have two additional tensors of shape [B, 3, V1] and [B, 3, V2]. I concatenate the main tensor with each of them separately, constructing new tensors of shape [B, 3, N+V1] and [B, 3, N+V2].
I pass these tensors to a plain MLP (consisting of Conv1d and BatchNorm1d layers). Ideally, I want the prediction to be "point-wise": no matter how large dimension 2 is, each point should get a consistent prediction based only on its own values. However, BatchNorm1d gives different results for the inputs [B, 3, N+V1] and [B, 3, N+V2], even though I only care about the first N points along dimension 2.
import torch
import torch.nn as nn

# nn.BatchNorm1d
B = 2
dim = 64
N = 40000
V1 = 1000
V2 = 2000
torch.manual_seed(0)
x = torch.rand(B, dim, N)
v1 = torch.rand(B, dim, V1)
v2 = torch.rand(B, dim, V2)
layer = nn.BatchNorm1d(dim)  # batch norm is done over the channel dimension
out2 = layer(torch.cat((x, v1), dim=2))
out3 = layer(torch.cat((x, v2), dim=2))
torch.equal(out2[:, :, :N], out3[:, :, :N])  # False: the batch statistics differ
Is there any possible way to have consistent prediction of first N points?
Is this more along the lines of what you're looking for? Normalizing just across the channels?
out2 = torch.cat((x, v1), dim=2) / torch.linalg.norm(torch.cat((x, v1), dim=2), dim=1, keepdim=True)
out3 = torch.cat((x, v2), dim=2) / torch.linalg.norm(torch.cat((x, v2), dim=2), dim=1, keepdim=True)
torch.equal(out2[:, :, :N], out3[:, :, :N])
# True
I think if you want to do something like this with the PyTorch nn modules, you'll need to transpose your channel and feature dimensions so that you can use LayerNorm or InstanceNorm. See here for a nice visual comparison of the different normalization techniques.
Updated answer:
In case you want to use an nn module specifically, InstanceNorm or GroupNorm can also get you this behaviour. However, the number of channels now differs between the two inputs, so you'll need two distinct layers.
layer1 = nn.GroupNorm(V1+N, V1+N)
layer2 = nn.GroupNorm(V2+N, V2+N)
out2 = layer1(torch.cat((x, v1), dim=2).transpose(1,2))
out3 = layer2(torch.cat((x, v2), dim=2).transpose(1,2))
torch.equal(out2[:, :N, :], out3[:, :N, :])
# True
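If a single module that works for both input sizes is preferred, a LayerNorm over the feature dimension behaves the same way on the transposed tensors, since its normalized_shape is the number of features rather than the number of points. This is a sketch under the same per-point-normalization assumption, not part of the original answer:

import torch
import torch.nn as nn

B, dim, N, V1, V2 = 2, 64, 40000, 1000, 2000
torch.manual_seed(0)
x = torch.rand(B, dim, N)
v1 = torch.rand(B, dim, V1)
v2 = torch.rand(B, dim, V2)

layer = nn.LayerNorm(dim)  # normalizes each point over its own features only
out2 = layer(torch.cat((x, v1), dim=2).transpose(1, 2))  # [B, N+V1, dim]
out3 = layer(torch.cat((x, v2), dim=2).transpose(1, 2))  # [B, N+V2, dim]
torch.equal(out2[:, :N, :], out3[:, :N, :])
# True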

About the Quick Start of Deep Learning (Knet.jl) in the Julia language

This question is about the quick start for Knet.jl, a deep learning framework for the Julia language:
https://denizyuret.github.io/Knet.jl/latest/tutorial/#Tutorial
ENV ["COLUMNS"] = 72
using Knet, MLDatasets, IterTools
struct Conv; w; b; f; end
(c :: Conv) (x) = c.f. (pool (conv4 (c.w, x). + C.b))
Conv (w1, w2, cx, cy, f = relu) = Conv (param (w1, w2, cx, cy), param0 (1,1, cy, 1), f);
The composite type Conv has three fields: w, b, and f.
The functor (c::Conv)(x) applies the function f with broadcasting.
The convolution of the w array with the input x is computed by conv4(c.w, x), and the bias c.b is added element-wise with .+.
I don't know what pool is doing to that array.
The result of pool(conv4(...)) is then passed through the relu activation function.
I also don't understand what the last line,
Conv(w1, w2, cx, cy, f=relu) = Conv(param(w1, w2, cx, cy), param0(1, 1, cy, 1), f);
is trying to do.
That is my current level of understanding.
What is this code doing, especially the pool call?
Why are there two param calls on the 5th line? I do not know.
Actually, the layer does a convolution followed by max pooling:
Pooling layers reduce the dimensions of the data by combining the outputs of neuron clusters at one layer into a single neuron in the next layer. Local pooling combines small clusters, typically 2 x 2. Global pooling acts on all the neurons of the convolutional layer. There are two common types of pooling: max and average. Max pooling uses the maximum value of each cluster of neurons at the prior layer, while average pooling instead uses the average value. (source: https://en.wikipedia.org/wiki/Convolutional_neural_network#Pooling_layers)
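To make the max-pooling step concrete, here is a small illustrative sketch in Python/NumPy (not Knet; the values are made up) showing a 2 x 2 max pool shrinking a 4 x 4 feature map to 2 x 2:

import numpy as np

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 3, 2],
                 [2, 6, 1, 1]])

# 2 x 2 max pooling: keep the maximum of each non-overlapping 2 x 2 block
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[4 5]
#  [6 3]]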
There are two param calls on the 5th line because a convolutional layer has two trainable parameters: the kernel weights w and the bias b. The functions param (and param0) initialize them with the correct size and mark them as trainable parameters that will be updated during optimization.
For learning about neural networks, I found these examples quite useful: linear regression and a simple feed-forward network (multilayer perceptron).

Output of a deep neural network in Pytorch

I have a neural network and this is the function for it:
import torch.nn as nn

def mlp(sizes, activation=nn.Tanh, output_activation=nn.Identity):
    # Build a feedforward neural network; the outputs are the logits.
    layers = []
    for j in range(len(sizes) - 1):
        act = activation if j < len(sizes) - 2 else output_activation
        layers += [nn.Linear(sizes[j], sizes[j + 1]), act()]
    return nn.Sequential(*layers)
and sizes=[10, 20]
My input is a tensor of shape [100, 1, 10], and with this network I get an output tensor of shape [100, 20]. What I would like is to add two more layers so that, instead of a matrix, I get a vector of size 20 as the output. Is this possible, considering that the input includes a batch of images?
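For reference, a minimal runnable sketch of the setup described (with sizes=[10, 20] and a random stand-in input of shape [100, 1, 10]); it only reproduces the shapes in the question and does not add the extra layers being asked about:

import torch
import torch.nn as nn

def mlp(sizes, activation=nn.Tanh, output_activation=nn.Identity):
    layers = []
    for j in range(len(sizes) - 1):
        act = activation if j < len(sizes) - 2 else output_activation
        layers += [nn.Linear(sizes[j], sizes[j + 1]), act()]
    return nn.Sequential(*layers)

net = mlp([10, 20])
x = torch.rand(100, 1, 10)
print(net(x).shape)  # nn.Linear maps only the last dimension (10 -> 20)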

Why adding noise into image data failed CNN regression learning

I am running an object localization task with CNN regression. I got pretty good results with the original data.
However, after I added 5% noise to the original data in the augmentation step, the training loss looks fine, but the validation loss fluctuates a lot during training. When I check the results after training, the network always predicts a constant output: the prediction never changes regardless of the input, which suggests the network got trapped in a dead valley. How could this happen, given that I added variability to the data, and how can I avoid it?
The code to add noise:
sindex = 0
for n in range(n_train):
    cndata = normalized_data[n, :, :, :]
    # plt.matshow(np.squeeze(prepped_data[sindex, :, :, :]))
    # add noise, augmentation, and shift in x and y
    for x in range(1):
        noise = np.random.normal(0, noise_sd, (xres, xres, slices))
        ishift = np.random.randint(low=1, high=(row - xres), size=2)
        prepped_data[sindex, :, :, :, 0] = cndata[ishift[0]:ishift[0] + xres,
                                                  ishift[1]:ishift[1] + xres, :] + noise
        prepped_label[sindex, :] = labelData[:, n] + [-ishift[0], -ishift[1], 0,
                                                      -ishift[0], -ishift[1], 0, 0]
        sindex = sindex + 1
Loss function definition:
model.compile(optimizer=opt, loss='mean_squared_error', metrics=['accuracy', root_mean_squared_error])
RMSE definition:
def root_mean_squared_error(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))
Here is the original training curve without noise:
Here is the training curve after adding noise:

How to specify the axis when using the softmax activation in a Keras layer?

The Keras docs for the softmax activation state that I can specify which axis the activation is applied along. My model is supposed to output an n by k matrix M where M[i, j] is the probability that the i-th letter is symbol j.
from keras import layers, models

n = 7                  # number of symbols in the output string (fixed)
k = len("0123456789")  # the number of possible symbols

model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(N,)))
...
model.add(layers.Dense(n * k, activation=None))
model.add(layers.Reshape((n, k)))
model.add(layers.Dense(output_dim=n, activation='softmax(x, axis=1)'))
The last line of code doesn't work, because I don't know how to correctly specify the axis (the k axis, in my case) for the softmax activation.
You must use an actual function there, not a string.
Keras allows you to use a few strings for convenience.
The activation functions can be found in keras.activations, and they're listed in the help file.
from keras.activations import softmax

def softMaxAxis1(x):
    return softmax(x, axis=1)

...

model.add(layers.Dense(output_dim=n, activation=softMaxAxis1))
Or even a custom axis:
def softMaxAxis(axis):
    def soft(x):
        return softmax(x, axis=axis)
    return soft

...

model.add(layers.Dense(output_dim=n, activation=softMaxAxis(1)))
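One way to sanity-check the custom activation on its own (a sketch, independent of the model above) is to apply it to a small random tensor and confirm that the values sum to 1 along the chosen axis:

import numpy as np
from keras import backend as K
from keras.activations import softmax

def softMaxAxis1(x):
    return softmax(x, axis=1)

x = K.constant(np.random.rand(2, 7, 10))    # (batch, n, k)
out = K.eval(softMaxAxis1(x))
print(np.allclose(out.sum(axis=1), 1.0))    # True: probabilities sum to 1 over axis 1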