I have a custom LSTM model in PyTorch, shown below:
import torch
import torch.nn as nn
import torch.nn.functional as F
hidden_size = 32
num_layers = 1
num_classes = 2
class customModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(customModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.bilstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, bidirectional=True)
        self.fcl = nn.Linear(hidden_size*2, num_classes)
    def forward(self, x):
        # Set initial hidden and cell states
        h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
        c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
        # Forward propagate LSTM
        out, hidden = self.bilstm(x, (h0, c0))
        fw_bilstm = out[-1, :, :self.hidden_size]
        bk_bilstm = out[0, :, :self.hidden_size]
        concat_fw_bw = torch.cat((fw_bilstm, bk_bilstm), dim = 1)
        fc = self.fcl(concat_fw_bw)
        x = F.softmax(F.relu(fc))
        return x
I can pass an input of type torch.Tensor to this model. The input has length 67349, and each element is a 300-dimensional vector.
After initializing the model and running a prediction, I get an output vector of length 1:
model = customModel(300, hidden_size, num_layers, num_classes)
output = model(input_torch)
The output shows tensor([[0.5020, 0.4980]], grad_fn=<SoftmaxBackward>) when I print it out.
Why is this output of length 1? It seems like I should NOT have batch_first=True in my model, but changing that requires other changes to the input dimensions, which I am not sure how to make.
Please suggest how I can get an output vector of length 67349 (the input length) instead of 1.
Explanation
I see @gorjan suggested some modifications to the forward method of the network, so I want to clarify what I am trying to build:
1. Feed the embedding to a BiLSTM (done)
2. Get the hidden states of the last step in each direction and concatenate them
3. Feed the concatenated output (from step 2) to a fully connected layer with ReLUs
4. Feed the output from step 3 to a softmax layer
I have annotated the forward(...) method of your module; have a look:
def forward(self, x):
    # Set initial hidden and cell states
    h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
    c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
    # Forward propagate LSTM
    out, hidden = self.bilstm(x, (h0, c0)) # out is of size [batch_size, sequence_length, hidden_size * num_directions]
    fw_bilstm = out[-1, :, :self.hidden_size] # This is wrong: you are taking only the last batch element
    bk_bilstm = out[0, :, :self.hidden_size] # This is wrong: you are taking only the first batch element (and :self.hidden_size is the forward half again; the backward half is self.hidden_size:)
    concat_fw_bw = torch.cat((fw_bilstm, bk_bilstm), dim = 1) # This is not needed if you want to obtain the hidden states for all elements in the sequence
    fc = self.fcl(concat_fw_bw) # Because of the issues mentioned above, this is wrong as well
    x = F.softmax(F.relu(fc)) # This is wrong: never stack an activation on top of another activation
    return x
Now, regarding what you asked:
Please suggest how can I get a vector output of length 67349 (input length) instead of 1?
I suppose that you want to obtain the hidden states for each of the elements in your batch. Here is how you should structure your forward pass:
def forward(self, x):
    # Set initial hidden and cell states
    h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
    c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
    # Forward propagate LSTM
    out, hidden = self.bilstm(x, (h0, c0)) # out is of size [batch_size, sequence_length, hidden_size * num_directions]
    fc = self.fcl(out) # fc is of size [batch_size, sequence_length, num_classes]
    x = F.softmax(fc, dim=-1) # Just softmax over the class dimension, so that you get the probabilities for each of your classes
    return x
If we test out the updated model, these are the results:
# Assuming 32 elements in the batch, each element has 177 elements in the sequence, and each sequence element has size 300
inputs = torch.rand(32, 177, 300)
# Obtaining the outputs from the model
outputs = model(inputs)
# The size is as expected: torch.Size([32, 177, 2])
print(outputs.shape)
Another thing to keep in mind: you say
The input is of length 67349, each is a 300 dimension vector.
This is an extremely long sequence. Your model will drastically underperform, and I guess your training would take forever. However, that is a completely different issue and should be discussed in a separate thread.
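The four steps listed in the question's clarification (last hidden state of each direction, concatenated, then a fully connected layer and a softmax) could look roughly like the sketch below. It assumes batch_first=True input of shape [batch, seq_len, 300]; the class name, the size of the extra hidden layer and the test sizes are illustrative choices. The ReLU sits between two linear layers rather than directly under the softmax, in line with the annotation above about not stacking activations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMClassifier(nn.Module):
    """Hypothetical name; sketch of: BiLSTM -> last states of both directions -> FC -> softmax."""
    def __init__(self, input_size=300, hidden_size=32, num_layers=1, num_classes=2):
        super().__init__()
        self.hidden_size = hidden_size
        self.bilstm = nn.LSTM(input_size, hidden_size, num_layers,
                              batch_first=True, bidirectional=True)
        # Hidden FC layer with ReLU, then a linear output layer without activation
        self.fc = nn.Sequential(nn.Linear(2 * hidden_size, hidden_size), nn.ReLU(),
                                nn.Linear(hidden_size, num_classes))

    def forward(self, x):  # x: [batch, seq_len, input_size]
        out, (h_n, c_n) = self.bilstm(x)  # initial states default to zeros
        fw_last = out[:, -1, :self.hidden_size]  # forward direction, last time step
        bw_last = out[:, 0, self.hidden_size:]   # backward direction finishes at time step 0
        features = torch.cat((fw_last, bw_last), dim=1)  # [batch, 2 * hidden_size]
        return self.fc(features)  # logits: [batch, num_classes]

model = BiLSTMClassifier()
x = torch.randn(8, 20, 300)  # 8 sequences of 20 vectors each
logits = model(x)
probs = F.softmax(logits, dim=1)  # class probabilities, each row sums to 1
```

For a single-layer BiLSTM, fw_last and bw_last hold the same values as h_n[0] and h_n[1], so either source can be used. For training with nn.CrossEntropyLoss, pass the logits rather than the softmax output, since that loss applies log-softmax itself.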
Related
Each training example is a sequence of how 8 input variables vary across 5 timesteps,
i.e., the input is [ip0_0, ip1_0,...,ip4_0], [ip0_1, ip1_1,...,ip4_1], ..., [ip0_4, ip1_4,...,ip4_4].
Each training example has a label, 0 or 1.
I want to create an RNN that predicts the label from the inputs.
I see two ways of doing it.
See RNNModelMultiforward below. The high-level idea is:
Have a single torch.RNN()
Initialize the hidden state to 0
Run the following 5 times:
out, h = RNN([ip0_i,...,ip4_i], h), where i = 0,...,4
Run a feedforward layer that predicts the label from the final hidden state h
Is this the right way to do it, or should I use a torch.RNN with num_layers = 5 and run it once to get the output, i.e., hn = RNN([[ip0_0,...,ip4_0],.....,[ip0_4,...,ip4_4]], h0) (see RNNModelMultilayer below)?
RNNModel multiforward
# Create RNN Model
class RNNModelMultiforward(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim, num_epochs, act_fn='relu'):
        super(RNNModelMultiforward, self).__init__()
        # Number of hidden dimensions
        self.hidden_dim = hidden_dim
        # Number of hidden layers
        self.layer_dim = layer_dim
        # Number of times the RNN will be run (the input should be num_epochs X input_dim in size)
        self.num_epochs = num_epochs
        # RNN
        self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity=act_fn)
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)
    def forward(self, x):
        # Initialize hidden state with zeros
        h = Variable(torch.zeros(self.layer_dim, self.hidden_dim))
        for ts in range(0, self.num_epochs):
            out, h = self.rnn(x[ts], h)
        out = self.fc(h)
        return out
RNNModel multilayer
# Create RNN Model
class RNNModelMultilayer(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim, act_fn='relu'):
        super(RNNModelMultilayer, self).__init__()
        # Number of hidden dimensions
        self.hidden_dim = hidden_dim
        # Number of hidden layers
        self.layer_dim = layer_dim
        # RNN
        self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity=act_fn)
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)
    def forward(self, x):
        # Initialize hidden state with zeros
        h0 = Variable(torch.zeros(self.layer_dim, self.hidden_dim))
        out, hn = self.rnn(x, h0)
        out = self.fc(hn[4])
        return out
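A minimal sketch of the difference between the two options, assuming batched input of shape [batch, 5, 8] with batch_first=True and the default tanh nonlinearity (the sizes are illustrative, not from the question). nn.RNN already loops over the time dimension internally, so stepping through the sequence manually with a single-layer RNN gives the same final hidden state as one call over the whole sequence. num_layers=5, by contrast, stacks five RNNs on top of each other; it does not unroll the RNN over five timesteps.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, timesteps, n_features, hidden = 3, 5, 8, 16
x = torch.randn(batch, timesteps, n_features)  # [batch, time, features]

rnn = nn.RNN(n_features, hidden, num_layers=1, batch_first=True)

# Option 1: feed one timestep at a time, carrying the hidden state forward
h = torch.zeros(1, batch, hidden)  # [num_layers, batch, hidden]
for t in range(timesteps):
    _, h = rnn(x[:, t:t + 1, :], h)  # the slice keeps shape [batch, 1, n_features]

# Same module, one call over the whole sequence
_, h_full = rnn(x, torch.zeros(1, batch, hidden))
print(torch.allclose(h, h_full, atol=1e-5))  # True: nn.RNN already loops over time

# num_layers stacks RNNs vertically; it is not the number of timesteps
deep = nn.RNN(n_features, hidden, num_layers=5, batch_first=True)
_, hn = deep(x)  # hn: [num_layers, batch, hidden], the final state of every layer
print(hn.shape)  # torch.Size([5, 3, 16])

fc = nn.Linear(hidden, 1)
logits = fc(hn[-1])  # read out from the top layer's final hidden state: [batch, 1]
```

In other words, option 1 with a single layer is what one call of nn.RNN over the full sequence already does, while option 2 builds a deeper model whose readout (hn[4] in RNNModelMultilayer) is the top layer's final state.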
I have a dataset with 8 features and 4 timesteps. I am trying to implement an LSTM but need help understanding whether I have set up my tensor correctly. The aim is to take the output features from the LSTM and pass them through a NN.
My tensor shape is currently #samples x #timesteps x #features, i.e. 4500x4x8. This works with the code below. I want to make sure that the model is indeed treating each timestep matrix as the next step of the sequence (with matrix 4500x[0]x8 being the first timestep matrix and 4500x[3]x8 being the last timestep). I then take the final timestep output (output[:,-1,:]) to feed through a NN.
Is the code doing what I think it is doing? I ask because performance is marginally worse than a simple RF that only uses the final timestep data. This would be unexpected, as the data has strong time-series correlations (it tracks patients' vitals declining before going on ventilation).
I have the following code:
class LSTM1(nn.Module):
    def __init__(self, num_classes, input_size, hidden_size, num_layers):
        super(LSTM1, self).__init__()
        self.num_classes = num_classes #number of classes
        self.num_layers = num_layers #number of layers
        self.input_size = input_size #input size
        self.hidden_size = hidden_size #hidden state
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True) #lstm
        self.fc_1 = nn.Linear(hidden_size, 32) #fully connected 1
        self.fc_2 = nn.Linear(32, 12) #fully connected 2
        self.fc_3 = nn.Linear(12, 1) #fully connected 3
        self.fc = nn.Sigmoid() #sigmoid applied to the last layer's output
        self.relu = nn.ReLU()
    def forward(self,x):
        h_0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size)) #hidden state
        c_0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size)) #internal state
        # Propagate input through LSTM
        output, (hn, cn) = self.lstm(x, (h_0, c_0)) #lstm with input, hidden, and internal state
        out = output[:,-1,:] #output of the last timestep, fed to the Dense layers next
        out = self.relu(out)
        out = self.fc_1(out) #first Dense
        out = self.relu(out) #relu
        out = self.fc_2(out) #2nd dense
        out = self.relu(out) #relu
        out = self.fc_3(out) #3rd dense
        out = self.relu(out) #relu
        out = self.fc(out) #Final Output
        return out
Error
Your error stems from the last three lines.
Do not use a ReLU activation at the end of your network.
Use nn.Linear -> nn.Sigmoid with BCELoss, or
nn.Linear with nn.BCEWithLogitsLoss (logits are the raw, unnormalized outputs to which the sigmoid would otherwise be applied).
What is going on
With ReLU, you output values in the range [0, +inf).
Applying sigmoid on top of it “squashes” those values into [0.5, 1), because sigmoid(0) = 0.5 (e.g. 0 becomes a 0.5 probability, hence 1 after thresholding at 0.5!).
In effect, you always predict 1 with this code, which is probably not what you want.
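A minimal sketch of both points. It first shows that sigmoid(relu(z)) never drops below 0.5, then keeps the layer sizes of LSTM1 but ends the network in a plain nn.Linear and trains against nn.BCEWithLogitsLoss. The class name, hidden_size=64 and the random data are assumptions for illustration only. It also checks the tensor-layout question: with batch_first=True and a [samples, timesteps, features] input, output[:, -1, :] is the last timestep, which for a single-layer unidirectional LSTM equals hn[-1].

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# The bug in isolation: sigmoid(relu(z)) is never below 0.5
z = torch.randn(1000)
buggy_probs = torch.sigmoid(torch.relu(z))
print(bool((buggy_probs >= 0.5).all()))  # True, so thresholding at 0.5 always gives 1

class LSTMClassifier(nn.Module):
    """Hypothetical variant of LSTM1 whose last layer outputs logits."""
    def __init__(self, input_size=8, hidden_size=64, num_layers=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.head = nn.Sequential(
            nn.ReLU(),
            nn.Linear(hidden_size, 32), nn.ReLU(),
            nn.Linear(32, 12), nn.ReLU(),
            nn.Linear(12, 1),  # no activation here: the output is a logit
        )

    def forward(self, x):  # x: [samples, timesteps, features], e.g. [N, 4, 8]
        output, (hn, cn) = self.lstm(x)  # zero initial states by default
        return self.head(output[:, -1, :])  # use the output of the final timestep

model = LSTMClassifier()
x = torch.randn(16, 4, 8)
y = torch.randint(0, 2, (16, 1)).float()
logits = model(x)
loss = nn.BCEWithLogitsLoss()(logits, y)  # applies the sigmoid internally
preds = (torch.sigmoid(logits) >= 0.5).long()  # sigmoid only when predicting

# Single-layer, unidirectional, batch_first=True: last timestep == final hidden state
output, (hn, _) = model.lstm(x)
print(torch.allclose(output[:, -1, :], hn[-1]))  # True
```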
I am trying to build an RNN from scratch using PyTorch, and I am following this tutorial to build it.
import torch
import torch.nn as nn
import torch.nn.functional as F
class BasicRNN(nn.Module):
    def __init__(self, n_inputs, n_neurons):
        super(BasicRNN, self).__init__()
        self.Wx = torch.randn(n_inputs, n_neurons) # n_inputs X n_neurons
        self.Wy = torch.randn(n_neurons, n_neurons) # n_neurons X n_neurons
        self.b = torch.zeros(1, n_neurons) # 1 X n_neurons
    def forward(self, X0, X1):
        self.Y0 = torch.tanh(torch.mm(X0, self.Wx) + self.b) # batch_size X n_neurons
        self.Y1 = torch.tanh(torch.mm(self.Y0, self.Wy) +
                             torch.mm(X1, self.Wx) + self.b) # batch_size X n_neurons
        return self.Y0, self.Y1
class CleanBasicRNN(nn.Module):
    def __init__(self, batch_size, n_inputs, n_neurons):
        super(CleanBasicRNN, self).__init__()
        self.rnn = BasicRNN(n_inputs, n_neurons)
        self.hx = torch.randn(batch_size, n_neurons) # initialize hidden state
    def forward(self, X):
        output = []
        # for each time step
        for i in range(2):
            self.hx = self.rnn(X[i], self.hx)
            output.append(self.hx)
        return output, self.hx
FIXED_BATCH_SIZE = 4 # our batch size is fixed for now
N_INPUT = 3
N_NEURONS = 5
X_batch = torch.tensor([[[0,1,2], [3,4,5],
                         [6,7,8], [9,0,1]],
                        [[9,8,7], [0,0,0],
                         [6,5,4], [3,2,1]]
                       ], dtype = torch.float) # X0 and X1
model = CleanBasicRNN(FIXED_BATCH_SIZE,N_INPUT,N_NEURONS)
a1,a2 = model(X_batch)
Running this code raises this error:
RuntimeError: size mismatch, m1: [4 x 5], m2: [3 x 5] at /pytorch/..
After some digging, I found that this error happens when passing the hidden state to the BasicRNN model:
N_INPUT = 3 # number of features in input
N_NEURONS = 5 # number of units in layer
X0_batch = torch.tensor([[0,1,2], [3,4,5],
                         [6,7,8], [9,0,1]],
                        dtype = torch.float) #t=0 => 4 X 3
X1_batch = torch.tensor([[9,8,7], [0,0,0],
                         [6,5,4], [3,2,1]],
                        dtype = torch.float) #t=1 => 4 X 3
test_model = BasicRNN(N_INPUT,N_NEURONS)
a1,a2 = test_model(X0_batch,X1_batch)
a1,a2 = test_model(X0_batch,torch.randn(1,N_NEURONS)) # THIS LINE GIVES ERROR
What is happening with the hidden states, and how can I solve this problem?
Maybe the tutorial is wrong: torch.mm(X1, self.Wx) multiplies a 4 x 5 tensor (the hidden state, passed in as X1) by the 3 x 5 tensor Wx, which doesn't work. Even if you make it work by rewriting it as torch.mm(self.Wx, X1.t()), you would expect a 4 x 5 output, but the result is a 3 x 4 tensor.
The BasicRNN is not an implementation of an RNN cell, but rather the full RNN fixed to two time steps, as depicted in the image in the tutorial.
Here Y0, the first time step, does not include a previous hidden state (technically, it is zero), and Y0 is also h0, which is then used for the second time step, Y1 (or h1).
An RNN cell is one of the time steps in isolation, particularly the second one, as it should include the hidden state of the previous time step.
The next hidden state is calculated as described in the nn.RNNCell documentation.
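For reference, the nn.RNNCell documentation gives the update for the default tanh nonlinearity as follows, where x is the input, h the previous hidden state, and W_{ih}, W_{hh}, b_{ih}, b_{hh} the cell's weights and biases:

```latex
h' = \tanh\left(W_{ih}\, x + b_{ih} + W_{hh}\, h + b_{hh}\right)
```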
In your BasicRNN there is only one bias term, but you still have a weight Wx for the input and a weight Wy for the hidden state, which should probably be called Wh instead. As for the forward method, its arguments become the input and the previous hidden state, instead of two inputs at different time steps. This also means that there is only one calculation, corresponding to the nn.RNNCell formula, which was the calculation for Y1, except that it uses the hidden state passed to the forward method.
class BasicRNN(nn.Module):
    def __init__(self, n_inputs, n_neurons):
        super(BasicRNN, self).__init__()
        self.Wx = torch.randn(n_inputs, n_neurons) # n_inputs X n_neurons
        self.Wh = torch.randn(n_neurons, n_neurons) # n_neurons X n_neurons
        self.b = torch.zeros(1, n_neurons) # 1 X n_neurons
    def forward(self, x, hidden):
        return torch.tanh(torch.mm(x, self.Wx) + torch.mm(hidden, self.Wh) + self.b)
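To check that the corrected cell plugs into the time loop from the question, here is a minimal sketch reusing the question's X_batch. Two details are assumptions, not from the tutorial: the weights are wrapped in nn.Parameter so that they appear in model.parameters() and can be trained, and the loop keeps the running hidden state in a local variable instead of overwriting self.hx on every call.

```python
import torch
import torch.nn as nn

class BasicRNN(nn.Module):
    """RNN cell: one time step, h' = tanh(x Wx + h Wh + b)."""
    def __init__(self, n_inputs, n_neurons):
        super().__init__()
        # nn.Parameter registers the tensors so an optimizer can update them
        self.Wx = nn.Parameter(torch.randn(n_inputs, n_neurons))   # n_inputs X n_neurons
        self.Wh = nn.Parameter(torch.randn(n_neurons, n_neurons))  # n_neurons X n_neurons
        self.b = nn.Parameter(torch.zeros(1, n_neurons))           # 1 X n_neurons

    def forward(self, x, hidden):
        return torch.tanh(torch.mm(x, self.Wx) + torch.mm(hidden, self.Wh) + self.b)

class CleanBasicRNN(nn.Module):
    def __init__(self, batch_size, n_inputs, n_neurons):
        super().__init__()
        self.rnn = BasicRNN(n_inputs, n_neurons)
        self.hx = torch.randn(batch_size, n_neurons)  # initial hidden state

    def forward(self, X):
        output = []
        hx = self.hx
        for x_t in X:  # iterate over the time dimension (dim 0)
            hx = self.rnn(x_t, hx)  # [batch, n_inputs] and [batch, n_neurons] -> [batch, n_neurons]
            output.append(hx)
        return output, hx

X_batch = torch.tensor([[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 0, 1]],
                        [[9, 8, 7], [0, 0, 0], [6, 5, 4], [3, 2, 1]]], dtype=torch.float)
model = CleanBasicRNN(4, 3, 5)
outputs, h_last = model(X_batch)
print(len(outputs), h_last.shape)  # 2 torch.Size([4, 5])
```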
In the tutorial, they opted to use nn.RNNCell directly instead of implementing the cell.
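Because the tutorial relies on nn.RNNCell, one way to confirm that the hand-written cell computes the same thing is to copy its weights into an nn.RNNCell and compare outputs. A minimal sketch with random weights: nn.RNNCell stores weight_ih as [hidden, input] and weight_hh as [hidden, hidden], i.e. transposed relative to Wx and Wh, and it has two bias vectors, so the second one is zeroed here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_inputs, n_neurons, batch = 3, 5, 4

# Weights in the "x @ W" layout used by BasicRNN
Wx = torch.randn(n_inputs, n_neurons)
Wh = torch.randn(n_neurons, n_neurons)
b = torch.zeros(1, n_neurons)

cell = nn.RNNCell(n_inputs, n_neurons)  # nonlinearity='tanh' by default
with torch.no_grad():
    cell.weight_ih.copy_(Wx.t())  # [n_neurons, n_inputs]
    cell.weight_hh.copy_(Wh.t())  # [n_neurons, n_neurons]
    cell.bias_ih.copy_(b.squeeze(0))
    cell.bias_hh.zero_()

x = torch.randn(batch, n_inputs)
h = torch.randn(batch, n_neurons)

manual = torch.tanh(x @ Wx + h @ Wh + b)
builtin = cell(x, h)
print(torch.allclose(manual, builtin, atol=1e-5))  # True
```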
Note: The terms of the matrix multiplications are in a different order from the documentation's formula, because the weights there are usually transposed in comparison to your weights, and the formula assumes the input and hidden state to be vectors (not batches). Technically, the batched inputs and hidden states would have to be transposed, and the output transposed back, for it to work with batches. It's easier to just use the transposed weight, as the result is the same due to the transpose property of matrix multiplication, (AB)^T = B^T A^T.
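Applied to this case, with X the batch of inputs stored one example per row (batch x n_inputs) and W_{ih} the weight in the nn.RNNCell layout (n_neurons x n_inputs), the transpose property turns the column-vector form of the documentation into the row-per-example form used by BasicRNN, with W_x = W_{ih}^\top (the hidden-state term works the same way with W_h = W_{hh}^\top):

```latex
\left(W_{ih} X^\top\right)^\top = \left(X^\top\right)^\top W_{ih}^\top = X\, W_{ih}^\top = X\, W_x
```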
I have a set of input sentences. I am using the pretrained word2vec model from gensim to get the embeddings of the input sentences. I want to pass these embeddings as input to a custom PyTorch LSTM model:
hidden_size = 32
num_layers = 1
num_classes = 2
class customModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(customModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.bilstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=False, bidirectional=True)
        self.fcl = nn.Linear(hidden_size*2, num_classes)
    def forward(self, x):
        # Set initial hidden and cell states
        h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
        c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
        # Forward propagate LSTM
        out, hidden = self.bilstm(x, (h0, c0))
        fw_bilstm = out[-1, :, :self.hidden_size]
        bk_bilstm = out[0, :, :self.hidden_size]
        concat_fw_bw = torch.cat((fw_bilstm, bk_bilstm), dim = 1)
        fc = self.fcl(concat_fw_bw)
        x = F.softmax(F.relu(fc))
        return x
Now I initialize the model object.
model = customModel(300, hidden_size, num_layers, num_classes)
I get the embeddings for the input sentences:
sentences = [['my', 'name', 'is', 'nad'], ['i', 'love', 'nlp', 'proc']]
embedding = create_embedding(sentences)
embedding_torch = torch.FloatTensor(embedding)
Now I want to pass these embeddings to the model to get the prediction:
for item in embedding_torch:
    item = item.view((1, item.size()[0], item.size()[1]))
    for epoch in range(1):
        tag_scores = model(item)
        print (tag_scores)
This throws a runtime error:
RuntimeError: Expected hidden[0] size (2, 4, 32), got (2, 1, 32)
I am not sure why this is happening. My understanding is that the line h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device) computes the hidden state dimensions properly.
What am I missing? Please suggest.
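The backbone of your model is nn.LSTM, which (with the default batch_first=False) expects inputs of size [sequence_length, batch_size, embedding_size]. On the other hand, the inputs you are providing to the model have size [1, sequence_length, embedding_size], so the LSTM reads a batch size of 4 from the second dimension, while your h0 and c0 use x.size(0) = 1 as the batch size; hence the error. What I would do is create the nn.LSTM as: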
# With batch_first=True
self.bilstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, bidirectional=True)
That way, the model would expect the inputs to be of size [batch_size, sequence_length, embedding_size]. Then, instead of going through each element in the batch separately, do:
tag_scores = model(embedding_torch)
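A minimal sketch reproducing the mismatch and the fix on the LSTM layer alone, with random tensors standing in for the word2vec embeddings (two sentences of four words, 300 dimensions each). The exact wording of the error message depends on the PyTorch version, so it is only printed here.

```python
import torch
import torch.nn as nn

hidden_size, num_layers = 32, 1
# Stand-in for embedding_torch: [2 sentences, 4 words, 300-dim vectors]
embedding_torch = torch.randn(2, 4, 300)

# batch_first=False: dim 0 is read as the sequence and dim 1 as the batch
lstm_seq_first = nn.LSTM(300, hidden_size, num_layers, batch_first=False, bidirectional=True)
item = embedding_torch[0].unsqueeze(0)  # [1, 4, 300], as built in the question's loop
h0 = torch.zeros(num_layers * 2, item.size(0), hidden_size)  # batch size 1, taken from dim 0
try:
    lstm_seq_first(item, (h0, h0.clone()))
    raised = False
except RuntimeError as err:  # the LSTM expects a batch size of 4 here
    raised = True
    print(err)

# batch_first=True: dim 0 is the batch, so x.size(0) is the right batch size
lstm_batch_first = nn.LSTM(300, hidden_size, num_layers, batch_first=True, bidirectional=True)
h0 = torch.zeros(num_layers * 2, embedding_torch.size(0), hidden_size)
out, (hn, cn) = lstm_batch_first(embedding_torch, (h0, h0.clone()))
print(out.shape)  # torch.Size([2, 4, 64])
print(hn.shape)   # torch.Size([2, 2, 32])
```

Note that with batch_first=True, the indexing out[-1, ...] and out[0, ...] in the question's forward method selects batch elements rather than time steps, so that part of the model needs to change as well.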
I am using PyTorch for an LSTM encoder-decoder sequence-to-sequence prediction problem. As a first step, I would like to forecast 2D trajectories (trajectory x, trajectory y) from multivariate input, 2-D or more (trajectory x, trajectory y, speed, rotation, etc.).
I am following the notebook below (link):
seq2seq with Attention
Here are the excerpts (encoder, decoder, attention):
class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size, n_layers=1, dropout=0.1):
        super(EncoderRNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.dropout = dropout
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, n_layers, dropout=self.dropout, bidirectional=True)
    def forward(self, input_seqs, input_lengths, hidden=None):
        # Note: we run this all at once (over multiple batches of multiple sequences)
        embedded = self.embedding(input_seqs)
        packed = torch.nn.utils.rnn.pack_padded_sequence(embedded, input_lengths)
        outputs, hidden = self.gru(packed, hidden)
        outputs, output_lengths = torch.nn.utils.rnn.pad_packed_sequence(outputs) # unpack (back to padded)
        outputs = outputs[:, :, :self.hidden_size] + outputs[:, : ,self.hidden_size:] # Sum bidirectional outputs
        return outputs, hidden
class LuongAttnDecoderRNN(nn.Module):
    def __init__(self, attn_model, hidden_size, output_size, n_layers=1, dropout=0.1):
        super(LuongAttnDecoderRNN, self).__init__()
        # Keep for reference
        self.attn_model = attn_model
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.n_layers = n_layers
        self.dropout = dropout
        # Define layers
        self.embedding = nn.Embedding(output_size, hidden_size)
        self.embedding_dropout = nn.Dropout(dropout)
        self.gru = nn.GRU(hidden_size, hidden_size, n_layers, dropout=dropout)
        self.concat = nn.Linear(hidden_size * 2, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)
        # Choose attention model
        if attn_model != 'none':
            self.attn = Attn(attn_model, hidden_size)
    def forward(self, input_seq, last_hidden, encoder_outputs):
        # Note: we run this one step at a time
        # Get the embedding of the current input word (last output word)
        batch_size = input_seq.size(0)
        embedded = self.embedding(input_seq)
        embedded = self.embedding_dropout(embedded)
        embedded = embedded.view(1, batch_size, self.hidden_size) # S=1 x B x N
        # Get current hidden state from input word and last hidden state
        rnn_output, hidden = self.gru(embedded, last_hidden)
        # Calculate attention from current RNN state and all encoder outputs;
        # apply to encoder outputs to get weighted average
        attn_weights = self.attn(rnn_output, encoder_outputs)
        context = attn_weights.bmm(encoder_outputs.transpose(0, 1)) # B x S=1 x N
        # Attentional vector using the RNN hidden state and context vector
        # concatenated together (Luong eq. 5)
        rnn_output = rnn_output.squeeze(0) # S=1 x B x N -> B x N
        context = context.squeeze(1)       # B x S=1 x N -> B x N
        concat_input = torch.cat((rnn_output, context), 1)
        concat_output = F.tanh(self.concat(concat_input))
        # Finally predict next token (Luong eq. 6, without softmax)
        output = self.out(concat_output)
        # Return final output, hidden state, and attention weights (for visualization)
        return output, hidden, attn_weights
For calculating attention in the decoder stage, the encoder hidden state and encoder outputs are input and used as below:
class Attn(nn.Module):
    def __init__(self, method, hidden_size):
        super(Attn, self).__init__()
        self.method = method
        self.hidden_size = hidden_size
        if self.method == 'general':
            self.attn = nn.Linear(self.hidden_size, hidden_size)
        elif self.method == 'concat':
            self.attn = nn.Linear(self.hidden_size * 2, hidden_size)
            self.v = nn.Parameter(torch.FloatTensor(1, hidden_size))
    def forward(self, hidden, encoder_outputs):
        max_len = encoder_outputs.size(0)
        this_batch_size = encoder_outputs.size(1)
        # Create variable to store attention energies
        attn_energies = Variable(torch.zeros(this_batch_size, max_len)) # B x S
        if USE_CUDA:
            attn_energies = attn_energies.cuda()
        # For each batch of encoder outputs
        for b in range(this_batch_size):
            # Calculate energy for each encoder output
            for i in range(max_len):
                attn_energies[b, i] = self.score(hidden[:, b], encoder_outputs[i, b].unsqueeze(0))
        # Normalize energies to weights in range 0 to 1, resize to B x 1 x S
        return F.softmax(attn_energies).unsqueeze(1)
    def score(self, hidden, encoder_output):
        if self.method == 'dot':
            energy = hidden.dot(encoder_output)
            return energy
        elif self.method == 'general':
            energy = self.attn(encoder_output)
            energy = hidden.dot(energy)
            return energy
        elif self.method == 'concat':
            energy = self.attn(torch.cat((hidden, encoder_output), 1))
            energy = self.v.dot(energy)
            return energy
My actual goal is to extend the method by adding further information to be fed into the decoder, such as image data at each input time step. Technically, I want to use two (or more) encoders, one for the trajectories as in the link above, and one separate one for image data (convolutional encoder).
I do this by concatenating embeddings produced by the trajectory encoder and the convolutional encoder (as well as the cell states etc.) and feeding the concatenated tensors to the decoder.
For example, image embedding (256-length tensor) concatenated with trajectory data embedding (256-length tensor) yields a 512-length embedding.
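A minimal sketch of the concatenation described above, using random tensors as hypothetical outputs of the two encoders in the notebook's [seq_len, batch, features] layout. The Linear + LayerNorm fusion step and the vectorized dot-product scoring are illustrative assumptions, not part of the notebook. The sketch makes the main shape constraints explicit: both encoders must produce the same number of time steps and the same batch size, and the decoder state used for scoring must have the same feature size as the encoder outputs it is compared against.

```python
import torch
import torch.nn as nn

seq_len, batch, traj_dim, img_dim = 10, 4, 256, 256

# Hypothetical per-time-step outputs of two separate encoders
traj_outputs = torch.randn(seq_len, batch, traj_dim)  # e.g. from the GRU trajectory encoder
img_outputs = torch.randn(seq_len, batch, img_dim)    # e.g. from a CNN applied to each frame

# Concatenate along the feature dimension; seq_len and batch must match
encoder_outputs = torch.cat((traj_outputs, img_outputs), dim=2)  # [10, 4, 512]

# Optional: project back to a common size and normalize the scales of both sources
fuse = nn.Sequential(nn.Linear(traj_dim + img_dim, 256), nn.LayerNorm(256))
fused_outputs = fuse(encoder_outputs)  # [10, 4, 256]

# Vectorized 'dot' attention: decoder state [1, batch, 256] against all encoder steps
decoder_hidden = torch.randn(1, batch, 256)
scores = torch.bmm(fused_outputs.transpose(0, 1),     # [batch, seq_len, 256]
                   decoder_hidden.permute(1, 2, 0))   # [batch, 256, 1]
attn_weights = torch.softmax(scores.squeeze(2), dim=1)  # [batch, seq_len]
context = torch.bmm(attn_weights.unsqueeze(1),          # [batch, 1, seq_len]
                    fused_outputs.transpose(0, 1))      # -> [batch, 1, 256]
```

Without the projection, the decoder's hidden size would have to be 512 to score directly against the concatenated outputs.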
My question is: is it a problem for the attention calculation if I use a concatenated encoder hidden state, concatenated encoder cell state, and concatenated encoder output coming from those different sources rather than hidden states, cells, outputs coming from a single source?
What are the caveats or pre-processing that should happen to make this work?
Thank you very much in advance.