I'm currently working on two models that use different types of data but are connected. I'd like to create a combined model that takes one instance of each data type, runs each through its pre-trained model independently, and then processes the concatenated outputs of the two models through a few feed-forward layers at the top.
So far, I've learned that I can change forward() to accept both inputs, so I've cloned the structures of my individual models into the combined one, processed each input with the corresponding model's layers, and then merged the outputs as described. What I'm having trouble with is figuring out how to actually achieve this.
As I understand your question, you can create the two models, and then you need a third model that combines both networks in its forward(); in __main__ you can then load each model's weights with load_state_dict.
For example, the first model:
import torch
import torch.nn as nn
import torch.nn.functional as F

class FirstM(nn.Module):
    def __init__(self):
        super(FirstM, self).__init__()
        self.fc1 = nn.Linear(20, 2)

    def forward(self, x):
        x = self.fc1(x)
        return x
The second model:
class SecondM(nn.Module):
    def __init__(self):
        super(SecondM, self).__init__()
        self.fc1 = nn.Linear(20, 2)

    def forward(self, x):
        x = self.fc1(x)
        return x
Here you create a model that merges the two models, as follows:
class Combined_model(nn.Module):
    def __init__(self, modelA, modelB):
        super(Combined_model, self).__init__()
        self.modelA = modelA
        self.modelB = modelB
        self.classifier = nn.Linear(4, 2)  # 4 = 2 outputs from each sub-model

    def forward(self, x1, x2):
        x1 = self.modelA(x1)
        x2 = self.modelB(x2)
        x = torch.cat((x1, x2), dim=1)  # concatenate along the feature dimension
        x = self.classifier(F.relu(x))
        return x
Then, outside the classes, in __main__ you can do the following:
# Create models and load state_dicts
modelA = FirstM()
modelB = SecondM()

# Load state dicts
modelA.load_state_dict(torch.load(PATH))
modelB.load_state_dict(torch.load(PATH))

model = Combined_model(modelA, modelB)
# Both sub-models expect 20 input features
x1, x2 = torch.randn(1, 20), torch.randn(1, 20)
output = model(x1, x2)
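Since modelA and modelB are pre-trained, you may also want to train only the new classifier head at first. Here is a minimal sketch of how that could look; the freezing step and the learning rate are assumptions for illustration, not part of the original answer:

# Optionally freeze the pre-trained sub-models so that only the
# new classifier head is updated during training.
for param in model.modelA.parameters():
    param.requires_grad = False
for param in model.modelB.parameters():
    param.requires_grad = False

# Give the optimizer only the parameters that are still trainable.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)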
I am trying to recreate the models from a study in which CNN-LSTM outperformed LSTM, but my CNN-LSTM produces nearly identical results to the LSTM. So it seems like the addition of the convolutional layers is not doing anything. The study describes the CNN-LSTM model like this:
The model is constructed by a single LSTM layer and two CNN layers. To form the CNN part, two 1D convolutional neural networks are stacked without any pooling layer. The second CNN layer is followed by a Rectified Linear Unit (ReLU) activation function. Each of the flattened output of the CNN’s ReLU layer and the LSTM layer is projected to the same dimension using a fully connected layer. Finally, a dropout layer is placed before the output layer.
Did I make a mistake in the implementation? The results of my CNN-LSTM are almost exactly the same as when I use the LSTM on its own. The LSTM on its own is the exact same code as below, just without the two Conv1d layers and without the ReLU activation function.
class CNN_LSTM(nn.Module):
    def __init__(self, input_size, seq_len, params, output_size):
        super(CNN_LSTM, self).__init__()
        self.n_hidden = params['lstm_hidden']  # neurons in each lstm layer
        self.seq_len = seq_len  # length of the input sequence
        self.n_layers = 1  # nr of recurrent layers in the lstm
        self.n_filters = params['n_filters']  # size of filter in cnn
        self.c1 = nn.Conv1d(in_channels=1, out_channels=params['n_filters'], kernel_size=1, stride=1)
        self.c2 = nn.Conv1d(in_channels=params['n_filters'], out_channels=1, kernel_size=1, stride=1)
        self.lstm = nn.LSTM(
            input_size=input_size,  # nr of input features
            hidden_size=params['lstm_hidden'],
            num_layers=1
        )
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(in_features=seq_len * params['lstm_hidden'], out_features=params['dense_hidden'])
        self.dropout = nn.Dropout(p=.4)
        self.fc2 = nn.Linear(in_features=params['dense_hidden'], out_features=output_size)  # output_size = nr of output features

    def reset_hidden_state(self):
        self.hidden = (
            torch.zeros(self.n_layers, self.seq_len, self.n_hidden).to(device=device),
            torch.zeros(self.n_layers, self.seq_len, self.n_hidden).to(device=device),
        )

    def forward(self, sequences):
        out = self.c1(sequences.view(len(sequences), 1, -1))
        out = self.c2(out.view(len(out), self.n_filters, -1))
        out = F.relu(out)
        out, self.hidden = self.lstm(
            out.view(len(out), self.seq_len, -1),
            self.hidden
        )
        out = self.flatten(out)
        out = self.fc1(out)
        out = self.dropout(out)
        out = self.fc2(out)
        return out
Source for the study I am using.
Each training example is a sequence of how 8 input variables vary across 5 timesteps
I.e., the input is [ip0_0, ip1_0, ..., ip7_0], [ip0_1, ip1_1, ..., ip7_1], ..., [ip0_4, ip1_4, ..., ip7_4]
Each training example has a label 0 or 1
I want to create an RNN that predicts the label from the inputs.
I see two ways of doing it
See RNNModelMultiforward below. High level idea is
Have a single torch.RNN()
Initialize hidden state to 0
Run the following 5 times
out, h = RNN([ip0_i, ..., ip7_i], h), where i = 0, ..., 4
Run a feedforward layer that predicts the label from the final hidden state h
Is this the right way to do it, or should I use a torch.RNN with num_layers = 5 and run it once to get the output, i.e. hn = RNN([[ip0_0, ..., ip7_0], ..., [ip0_4, ..., ip7_4]], h0)? (See RNNModelMultilayer below.)
RNNModelMultiforward
# Create RNN Model
class RNNModelMultiforward(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim, num_epochs, act_fn='relu'):
        super(RNNModelMultiforward, self).__init__()
        # Number of hidden dimensions
        self.hidden_dim = hidden_dim
        # Number of hidden layers
        self.layer_dim = layer_dim
        # Number of times the RNN will be run (the input should be num_epochs x input_dim in size)
        self.num_epochs = num_epochs
        # RNN
        self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity=act_fn)
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initialize hidden state with zeros
        h = torch.zeros(self.layer_dim, self.hidden_dim)
        for ts in range(0, self.num_epochs):
            out, h = self.rnn(x[ts], h)
        out = self.fc(h)
        return out
RNNModelMultilayer
# Create RNN Model
class RNNModelMultilayer(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim, act_fn='relu'):
        super(RNNModelMultilayer, self).__init__()
        # Number of hidden dimensions
        self.hidden_dim = hidden_dim
        # Number of hidden layers
        self.layer_dim = layer_dim
        # RNN
        self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity=act_fn)
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initialize hidden state with zeros
        h0 = torch.zeros(self.layer_dim, self.hidden_dim)
        out, hn = self.rnn(x, h0)
        out = self.fc(hn[4])
        return out
I have a net like this (example from here):
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square, you can specify it with a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1)  # flatten all dimensions except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = Net()
and another net like this (example from here):
class binaryClassification(nn.Module):
    def __init__(self):
        super(binaryClassification, self).__init__()
        # Number of input features is 12.
        self.layer_1 = nn.Linear(12, 64)
        self.layer_2 = nn.Linear(64, 64)
        self.layer_out = nn.Linear(64, 1)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(p=0.1)
        self.batchnorm1 = nn.BatchNorm1d(64)
        self.batchnorm2 = nn.BatchNorm1d(64)

    def forward(self, inputs):
        x = self.relu(self.layer_1(inputs))
        x = self.batchnorm1(x)
        x = self.relu(self.layer_2(x))
        x = self.batchnorm2(x)
        x = self.dropout(x)
        x = self.layer_out(x)
        return x
I'd like to change, for example, self.fc2 = nn.Linear(120, 84) so that it has 121 inputs, where the 121st is the x (output) of the binaryClassification network.
The idea is to use the CNN network and the non-CNN network at the same time, training both so that each influences the other.
Is it possible? How can I do that? (Keras or PyTorch examples are both fine.)
Or maybe the idea is crazy and there is an easier way to mix data and images as input to a single network?
It is a perfectly valid approach: you are taking two different input data sources, processing them, and combining the results to solve a common goal (in this case it seems to be 10-class image classification). You can define the input to your Net network as a tuple of the image needed for the original Net and the 12-value feature vector for your binaryClassification. Example code would be:
import torch
import torch.nn as nn
import torch.nn.functional as F

class binaryClassification(nn.Module):
    ...  # same as above

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 from image dimension
        self.binClas = binaryClassification()
        self.fc2 = nn.Linear(121, 84)  # 120 image features + 1 binaryClassification output
        self.fc3 = nn.Linear(84, 10)

    def forward(self, inputs):
        x, features = inputs  # split tuple
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square, you can specify it with a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1)  # flatten all dimensions except the batch dimension
        # Concatenate with the binaryClassification output along the feature dimension
        x = torch.cat([F.relu(self.fc1(x)), self.binClas(features)], dim=1)
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
However! Be careful about training them together; it is hard to balance both branches of the network so that each of them learns. I would recommend training them separately for a while before plugging them together (generally speaking, the hyperparameters of one part of the network will probably not be optimal for the other). To do this, you could freeze one part of the network while training the other, and vice versa (check this link to see how to freeze parts of a torch nn).
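A minimal sketch of what that alternating freezing could look like for the combined Net above (the two-phase schedule itself is just an assumption for illustration):

def set_trainable(module, trainable):
    # Toggle gradient computation for every parameter of a module.
    for param in module.parameters():
        param.requires_grad = trainable

# Phase 1: train the convolutional branch, keep binClas frozen.
set_trainable(net.binClas, False)
# ... train for a few epochs ...

# Phase 2: train binClas, keep the convolutional branch frozen.
set_trainable(net.binClas, True)
for m in (net.conv1, net.conv2, net.fc1):
    set_trainable(m, False)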
The most naive way to do it would be to instantiate both models, sum the two predictions and compute the loss with it. This will backpropagate through both models:
import torch
import torch.optim as optim

net1 = Net1()
net2 = Net2()

bce = torch.nn.BCEWithLogitsLoss()

# Optimize the parameters of both models jointly
params = list(net1.parameters()) + list(net2.parameters())
optimizer = optim.SGD(params, lr=0.01)  # SGD needs an explicit learning rate

for x, ground_truth in your_data_loader:
    optimizer.zero_grad()
    prediction = net1(x) + net2(x)  # the 2 models must output tensors of the same shape
    loss = bce(prediction, ground_truth)
    loss.backward()
    optimizer.step()
You could also, e.g.:
implement the layers of Net1 and Net2 in a single model
train Net1 and Net2 separately and ensemble them later (a sketch follows below)
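For the second option, here is a minimal inference-time ensembling sketch, assuming both nets were trained separately and x is a batch with the input shape they share:

# Hypothetical ensembling of two separately trained models:
# average their logits, then threshold for a binary decision.
net1.eval()
net2.eval()
with torch.no_grad():
    logits = (net1(x) + net2(x)) / 2
    prediction = torch.sigmoid(logits) > 0.5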
I am trying to replicate the ConvNet + LSTM approach presented in this paper using PyTorch, but I am struggling to find the correct way to combine the CNN and the LSTM in my model. Here is my attempt:
class VideoRNN(nn.Module):
    def __init__(self, hidden_size, n_classes):
        super(VideoRNN, self).__init__()
        self.hidden_size = hidden_size

        vgg = models.vgg16(pretrained=True)
        embed = nn.Sequential(*list(vgg.classifier.children())[:-1])
        vgg.classifier = embed
        for param in vgg.parameters():
            param.requires_grad = False

        self.embedding = vgg
        self.gru = nn.GRU(4096, hidden_size)

    def forward(self, input, hidden=None):
        embedded = self.embedding(input)
        output, hidden = self.gru(embedded, hidden)
        output = self.classifier(output.view(-1, 4096))
        return output, hidden
As my videos have variable length, I provide a PackedSequence as an input. It is created from a Tensor with shape (M,B,C,H,W) where M is the maximum sequence length and B the batch size. The C,H,W are the channels, height and width of each frame.
I want the pre-trained CNN to be part of the model as I may later unfreeze some layer to finetune the CNN for my task. That's why I didn't compute the embedding of the images separately.
My questions are then the following :
Is the shape of my input data correct in order to handle batches of videos in my context or should I use something else than a PackedSequence?
In my forward function, how can I handle the batch of sequences of images with my VGG and my GRU unit? I cannot feed the PackedSequence directly as input to my VGG, so how can I proceed?
Does this approach seem to respect the "pytorch way of doing things", or is my approach flawed?
I finally found the solution to make it work. Here is a simplified yet complete example of how I managed to create a VideoRNN able to use a PackedSequence as input:
class VideoRNN(nn.Module):
    def __init__(self, n_classes, batch_size, device):
        super(VideoRNN, self).__init__()

        self.batch = batch_size
        self.device = device
        self.num_layer = 1  # number of recurrent layers

        # Loading a VGG16
        vgg = models.vgg16(pretrained=True)

        # Removing the last layer of VGG16's classifier
        embed = nn.Sequential(*list(vgg.classifier.children())[:-1])
        vgg.classifier = embed

        # Freezing all the VGG parameters
        for param in vgg.parameters():
            param.requires_grad = False

        self.embedding = vgg
        self.gru = nn.LSTM(4096, 2048, bidirectional=True)

        # Classification layer (*2 because bidirectional)
        self.classifier = nn.Sequential(
            nn.Linear(2048 * 2, 256),
            nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, input):
        hidden = torch.zeros(self.num_layer * 2, self.batch, 2048).to(
            self.device
        )

        c_0 = torch.zeros(self.num_layer * 2, self.batch, 2048).to(
            self.device
        )

        # Apply the CNN to every frame inside the PackedSequence
        embedded = self.simple_elementwise_apply(self.embedding, input)
        output, hidden = self.gru(embedded, (hidden, c_0))
        hidden = hidden[0].view(-1, 2048 * 2)

        output = self.classifier(hidden)

        return output

    def simple_elementwise_apply(self, fn, packed_sequence):
        # Run fn on the flat .data tensor and rewrap it as a PackedSequence
        return torch.nn.utils.rnn.PackedSequence(
            fn(packed_sequence.data), packed_sequence.batch_sizes
        )
The key is the simple_elementwise_apply method, which allows feeding the PackedSequence through the CNN and retrieving a new PackedSequence of embeddings as output.
I hope you'll find it useful.
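For completeness, here is a hedged usage sketch; the shapes, lengths, and class count below are made up for illustration, and the class above (with its torchvision import) is assumed to be in scope:

import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Hypothetical batch: 4 videos with at most 10 frames of 3x224x224,
# laid out as (M, B, C, H, W) as described in the question.
frames = torch.randn(10, 4, 3, 224, 224)
lengths = torch.tensor([10, 8, 6, 5])  # true number of frames per video

# pack_padded_sequence keeps only the valid frames of each video;
# by default it expects lengths sorted in decreasing order.
packed = pack_padded_sequence(frames, lengths)

model = VideoRNN(n_classes=5, batch_size=4, device="cpu")
output = model(packed)  # logits of shape (4, 5)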
Below is example code using PyTorch to construct a DNN for two regression tasks. The forward function returns two outputs (x1, x2). What about a network for many regression/classification tasks, e.g., 100 or 1000 outputs? It is definitely not a good idea to hardcode all the outputs (e.g., x1, x2, ..., x100). Is there a simple method to do that? Thank you.
import torch
from torch import nn
import torch.nn.functional as F

class mynet(nn.Module):
    def __init__(self):
        super(mynet, self).__init__()
        self.lin1 = nn.Linear(5, 10)
        self.lin2 = nn.Linear(10, 3)
        self.lin3 = nn.Linear(10, 4)

    def forward(self, x):
        x = self.lin1(x)
        x1 = self.lin2(x)
        x2 = self.lin3(x)
        return x1, x2

if __name__ == '__main__':
    x = torch.randn(1000, 5)
    y1 = torch.randn(1000, 3)
    y2 = torch.randn(1000, 4)
    model = mynet()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
    for epoch in range(100):
        model.train()
        optimizer.zero_grad()
        out1, out2 = model(x)
        loss = 0.2 * F.mse_loss(out1, y1) + 0.8 * F.mse_loss(out2, y2)
        loss.backward()
        optimizer.step()
You can (and should) use nn containers such as nn.ModuleList or nn.ModuleDict to manage an arbitrary number of sub-modules.
For example (using nn.ModuleList):
class MultiHeadNetwork(nn.Module):
    def __init__(self, list_with_number_of_outputs_of_each_head):
        super(MultiHeadNetwork, self).__init__()
        self.backbone = ...  # build the basic "backbone" on top of which all other heads come

        # all other "heads"
        self.heads = nn.ModuleList([])
        for nout in list_with_number_of_outputs_of_each_head:
            self.heads.append(nn.Sequential(
                nn.Linear(10, nout * 2),
                nn.ReLU(inplace=True),
                nn.Linear(nout * 2, nout)))

    def forward(self, x):
        common_features = self.backbone(x)  # compute the shared features
        outputs = []
        for head in self.heads:
            outputs.append(head(common_features))
        return outputs
Note that in this example each head is more complex than a single nn.Linear layer.
The number of different "heads" (and number of outputs) is determined by the length of the argument list_with_number_of_outputs_of_each_head.
Important notice: it is crucial to use nn containers, rather than plain Python lists/dictionaries, to store the sub-modules. Otherwise PyTorch will have difficulty managing them.
See, e.g., this answer, this question and this one.
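To see concretely why the containers matter, here is a small hedged illustration (the module names are hypothetical):

import torch.nn as nn

# A plain Python list hides the sub-modules from PyTorch: their
# parameters are never registered, so an optimizer cannot see them.
class BadHeads(nn.Module):
    def __init__(self):
        super(BadHeads, self).__init__()
        self.heads = [nn.Linear(10, 1) for _ in range(3)]  # not registered!

print(len(list(BadHeads().parameters())))  # prints 0

# The same layers inside an nn.ModuleList are registered correctly.
class GoodHeads(nn.Module):
    def __init__(self):
        super(GoodHeads, self).__init__()
        self.heads = nn.ModuleList([nn.Linear(10, 1) for _ in range(3)])

print(len(list(GoodHeads().parameters())))  # prints 6 (weight and bias per head)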