Do linear layer after GRU saved the sequence output order? - deep-learning

I'm dealing with the following senario:
My input has the shape of: [batch_size, input_sequence_length, input_features]
where:
input_sequence_length = 10
input_features = 3
My output has the shape of: [batch_size, output_sequence_length]
where:
output_sequence_length = 5
i.e: for each time slot of 10 units (each slot with 3 features) I need to predict the next 5 slots values.
I built the following model:
import torch
import torch.nn as nn
import torchinfo
class MyModel(nn.Module):
def __init__(self):
super(MyModel, self).__init__()
self.GRU = nn.GRU(input_size=3, hidden_size=32, num_layers=2, batch_first=True)
self.fc = nn.Linear(32, 5)
def forward(self, input_series):
output, h = self.GRU(input_series)
output = output[:, -1, :] # get last state
output = self.fc(output)
output = output.view(-1, 5, 1) # reorginize output
return output
torchinfo.summary(MyModel(), (512, 10, 3))
==========================================================================================
Layer (type:depth-idx) Output Shape Param #
==========================================================================================
MyModel [512, 5, 1] --
├─GRU: 1-1 [512, 10, 32] 9,888
├─Linear: 1-2 [512, 5] 165
==========================================================================================
I'm getting good results (very small MSE loss, and the predictions looks good),
but I'm not sure if the model output (5 sequence values) are really ordered by the model ?
i.e the second output based on the first output and the third output based on the second output ...
I know that the GRU output based on the learned sequence history.
But I'm also used linear layer, so is the output (after the linear layer) still sorted by time ?

UPDATE
This answer isn't quite right, see this follow-up question. The best way is to write the math and show that the 5 scalar outputs aren't functions of each other.
Old Answer
I'm not sure if the model output (5 sequence values) are really ordered by the model ? i.e the second output based on the first output and the third output based on the second output
No, they aren't. You can check that the gradients of, say, the last output w.r.t to the previous outputs are zeroes, which basically means that the last output isn't a function of the previous outputs.
model = MyModel()
x = torch.rand([2, 10, 3])
y = model(x)
y.retain_grad() # allows accessing y.grad although y is a non-leaf Tensor
y[:, -1].sum().backward() # computes gradients of last output
assert torch.allclose(y.grad[:, :-1], torch.tensor(0.)) # gradients w.r.t previous outputs are zeroes
A popular model to capture dependencies among output labels is conditional random fields. But since you're already happy with the predictions of the current model, perhaps modelling the output dependencies isn't that important.

Related

Why Is accuracy so different when I use evaluate() and predict()?

I have a Convolutional Neural Network, and it's trying to resolve a classification problem using images (2 classes, so binary classification), using sigmoid.
To evaluate the model I use:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
path_dir = '../../dataset/train'
parth_dir_test = '../../dataset/test'
datagen = ImageDataGenerator(
rescale=1./255,
validation_split = 0.2)
test_set = datagen.flow_from_directory(parth_dir_test,
target_size= (150,150),
batch_size = 64,
class_mode = 'binary')
score = classifier.evaluate(test_set, verbose=0)
print('Test Loss', score[0])
print('Test accuracy', score[1])
And it outputs:
When I try to print the classification report I use:
yhat_classes = classifier.predict_classes(test_set, verbose=0)
yhat_classes = yhat_classes[:, 0]
print(classification_report(test_set.classes,yhat_classes))
But now I get this accuracy:
If I print the test_set.classes, it shows the first 344 numbers of the array as 0, and the next 344 as 1. Is this test_set shuffled before feeding into the network?
I think your model is doing just fine both in "training" and "evaluating".Evaluation accuracy comes on the basis of prediction so maybe you are making some logical mistake while using model.predict_classes().Please check if you are using the trained model weights and not any randomly initialized model while evaluating it.
what "evaluate" does: The model sets apart this fraction of data while training, and will not train on it, and will evaluate loss and any other model's metrics on this data after each "epoch".so, model.evaluate() is for evaluating your trained model. Its output is accuracy or loss, not prediction to your input data!
predict: Generates output predictions for the input samples. model.predict() actually predicts, and its output is target value, predicted from your input data.
FYI: if your accurscy in Binary Classification problem is less than 50%, it's worse than the case that you randomly predict one of those classes (acc = 50%)!
I needed to add a shuffle=False. The code that work is:
test_set = datagen.flow_from_directory(parth_dir_test,
target_size=(150,150),
batch_size=64,
class_mode='binary',
shuffle=False)

PyTorch: Multi-class segmentation loss value != 0 when using target image as the prediction

I was performing semantic segmentation using PyTorch. There are a total of 103 different classes in the dataset and the targets are RGB images with only the Red channel containing the labels. I was using nn.CrossEntropyLoss as my loss function. For sanity, I wanted to check if using nn.CrossEntropyLoss is correct for this problem and whether it has the expected behaviour
I pick a random mask from my dataset and create a categorical version of it using this custom transform
class ToCategorical:
def __init__(self, n_classes: int) -> None:
self.n_classes = n_classes
def __call__(self, sample: torch.Tensor):
mask = sample.permute(1, 2, 0)
categories = torch.unique(mask).tolist()[1:] # get all categories other than 0
# build a tensor with `n_classes` channels
one_hot_image = torch.zeros(self.n_classes, *mask.shape[:-1])
for category in categories:
# get spacial locs where the categ is present
rows, cols, _ = torch.where(mask == category)
# in same spacial loc but in `categ` channel fill 1
one_hot_image[category, rows, cols] = 1
return one_hot_image
And then I send this image as the output (prediction) and use the ground truth mask as the target to the loss function.
import torch.nn as nn
mask = T.PILToTensor()(Image.open("path_to_image").convert("RGB"))
categorical_mask = ToCategorical(103)(mask).unsqueeze(0)
mask = mask[0].unsqueeze(0) # get only the red channel, add fake batch_dim
loss_fn = nn.CrossEntropyLoss()
target = mask
output = categorical_mask
print(output.shape, target.shape)
print(loss_fn(output, target.to(torch.long)))
I expected the loss to be zero but to my surprise, the output is as follows
torch.Size([1, 103, 600, 800]) torch.Size([1, 600, 800])
tensor(4.2836)
I verified with other samples in the dataset and I obtained similar values for other masks as well. Am I doing something wrong? I expect the loss to be = 0 when the output is the same as the target.
PS. I also know that nn.CrossEntropyLoss is the same as using log_softmax followed by nn.NLLLoss() but even I obtained the same value by using nllloss as well
For Reference
Dataset used: UECFoodPixComplete
I would like to adress this:
I expect the loss to be = 0 when the output is the same as the target.
If the prediction matches the target, i.e. the prediction corresponds to a one-hot-encoding of the labels contained in the dense target tensor, but the loss itself is not supposed to equal to zero. Actually, it can never be equal to zero because the nn.CrossEntropyLoss function is always positive by definition.
Let us take a minimal example with number of #C classes and a target y_pred and a prediction y_pred consisting of prefect predictions:
As a quick reminder:
The softmax is applied on the logits (q_i) as p_i = log(exp(q_i)/sum_j(exp(q_j)):
>>> p = F.softmax(y_pred, 1)
Similarly if you are using the log-softmax, defined as logp_i = log(p_i):
>>> logp = F.log_softmax(y_pred, 1)
Then comes the negative likelihood function computed between x the input and y the target: -y*x. In association with the softmax, it comes down to -y*p, or -y*logp respectively. In any case, whether you apply the log or not, only the predictions corresponding to the true classes will remain since the others ones are zeroed-out.
That being said, applying the NLLLoss on y_pred would indeed result with a 0 as you expected in your question. However, here we apply it on the probability distribution or log-probability: p, or logp respectively!
In our specific case, p_i = 1 for the true class and p_i = 0 for all other classes (there are #C - 1 of those). This means the softmax of the logit associated with the true class will equal to exp(1)/sum_i(p_i). And since sum_i(p_i) = (#C-1)*exp(0) + exp(1). We therefore have:
softmax(p) = e / (#C - 1 + e)
Similarly for log-softmax:
log-softmax(p) = log(e / (#C-1 + e)) = 1 - log(#C - 1 + e)
If we proceed by applying the negative likelihood function we simply get cross-entropy(y_pred, y_true) = (nllloss o log-softmax)(y_pred, y_true). This results in:
loss = - (1 - log(#C - 1 + e)) = log(#C - 1 + e) - 1
This effectively corresponds to the minimum of the nn.CrossEntropyLoss function.
Regarding your specific case where #C = 103, you may have an issue in your code... since the average loss should equal to log(102 + e) - 1 i.e. around 3.65.
>>> y_true = torch.randint(0,103,(1,1,2,5))
>>> y_pred = torch.zeros(1,103,2,5).scatter(1, y_true, value=1)
You can see for yourself with one of the provided methods:
the builtin function nn.functional.cross_entropy:
>>> F.cross_entropy(y_pred, y_true[:,0])
tensor(3.6513)
manually computing the quantity:
>>> logp = F.log_softmax(y_pred, 1)
>>> -logp.gather(1, y_true).mean()
tensor(3.6513)
analytical result:
>>> log(102 + e) - 1
3.6513

Multi-label classification return more than 1 class in each label

How to train a Multi-label classification model when each label should return more than 1 class?
Example:
Image classification have 2 label: style with 4 classes and layout with 5 classes.
An image in list should return 2 style and 3 layout like [1 0 1 0] [1 1 0 0 1]
My sample net:
class MyModel(nn.Module):
def __init__(self, n__classes=32):
super().__init__()
self.base_model = models.resnet50(pretrained=True).to(device)
last_channel = self.base_model.fc.in_features
self.base_model.fc = nn.Sequential()
self.layout = nn.Sequential(
nn.Dropout(0.2),
nn.Linear(last_channel, n_classes_layout),
nn.Sigmoid()
)
self.style = nn.Sequential(
nn.Dropout(0.2),
nn.Linear(last_channel, n_classes_style),
nn.Sigmoid()
)
def forward(self, x):
base = self.base_model(x)
return self.layout(base), self.style(base)
def loss_fn(outputs, targets):
o1, o2 = outputs
t1, t2 = targets
I am not sure what you referring label to but it seems you have a multi output model predicting on one hand style, and on the other layout. So I am assuming you are dealing with a multi-task network with two "independent" outputs which are supervised separately with two-loss terms.
Consider one of those two: you can consider this multi-label classification task as outputting n values each one estimating the probability that the corresponding label is present in the input. In your first task, you have four classes, on the second task, you have five.
Unlike a single-label classification task, here you wish to output one activation per label: you can use a sigmoid activation per logit. Then apply the Binary Cross Entropy loss on each of those outputs. In PyTorch, you can use BCELoss, which handles multi-dimensional tensors. In practice you could concatenate the two outputs from the two tasks, do the same with the target labels and call the criterion once.

Why/how can model.forward() succeed both on input being mini-batch vs single item?

Why and how does this work?
When I run the forward phase on input
being mini-batch tensor
or alternatively being a single input item
model.__call__() (which AFAIK is calling forward() ) swallows that and spills out adequate output (i.e. a tensor of mini-batch of estimates or a single item of estimate)
Adopting testcode from the Pytorch NN example shows what I mean, but I don't get it.
I would expect it to create problems and me forced to transform the single item input into a mini-batch of size 1( reshape (1,xxx)) or likewise, like I did in the code below.
( I did variations of the test to be sure it is e.g. not depending on execution order )
# -*- coding: utf-8 -*-
import torch
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
#N, D_in, H, D_out = 64, 1000, 100, 10
N, D_in, H, D_out = 64, 10, 4, 3
# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
model = torch.nn.Sequential(
torch.nn.Linear(D_in, H),
torch.nn.ReLU(),
torch.nn.Linear(H, D_out),
)
# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-4
for t in range(1):
# Forward pass: compute predicted y by passing x to the model. Module objects
# override the __call__ operator so you can call them like functions. When
# doing so you pass a Tensor of input data to the Module and it produces
# a Tensor of output data.
model.eval()
print ("###########")
print ("x[0]",x[0])
print ("x[0].size()", x[0].size())
y_1pred = model(x[0])
print ("y_1pred.size()", y_1pred.size())
print (y_1pred)
model.eval()
print ("###########")
print ("x.size()", x.size())
y_pred = model(x)
print ("y_pred.size()", y_pred.size())
print ("y_pred[0]", y_pred[0])
print ("###########")
model.eval()
input_item = x[0]
batch_len1_shape = torch.Size([1,*(input_item.size())])
batch_len1 = input_item.reshape(batch_len1_shape)
y_pred_batch_len1 = model(batch_len1)
print ("input_item",input_item)
print ("input_item.size()", input_item.size())
print ("y_pred_batch_len1.size()", y_pred_batch_len1.size())
print (y_1pred)
raise Exception
This is the output it generates:
###########
x[0] tensor([-1.3901, -0.2659, 0.4352, -0.6890, 0.1098, -0.3124, 0.6419, 1.1004,
-0.7910, -0.5389])
x[0].size() torch.Size([10])
y_1pred.size() torch.Size([3])
tensor([-0.5366, -0.4826, 0.0538], grad_fn=<AddBackward0>)
###########
x.size() torch.Size([64, 10])
y_pred.size() torch.Size([64, 3])
y_pred[0] tensor([-0.5366, -0.4826, 0.0538], grad_fn=<SelectBackward>)
###########
input_item tensor([-1.3901, -0.2659, 0.4352, -0.6890, 0.1098, -0.3124, 0.6419, 1.1004,
-0.7910, -0.5389])
input_item.size() torch.Size([10])
y_pred_batch_len1.size() torch.Size([1, 3])
tensor([-0.5366, -0.4826, 0.0538], grad_fn=<AddBackward0>)
The docs on nn.Linear state that
Input: (N,∗,in_features) where ∗ means any number of additional dimensions
so one would naturally expect that at least two dimensions are necessary. However, if we look under the hood we will see that Linear is implemented in terms of nn.functional.linear, which dispatches to torch.addmm or torch.matmul (depending whether bias == True) which broadcast their argument.
So this behavior is likely a bug (or an error in documentation) and I would not depend on it working in the future, if I were you.

Simple LSTM in PyTorch with Sequential module

In PyTorch, we can define architectures in multiple ways. Here, I'd like to create a simple LSTM network using the Sequential module.
In Lua's torch I would usually go with:
model = nn.Sequential()
model:add(nn.SplitTable(1,2))
model:add(nn.Sequencer(nn.LSTM(inputSize, hiddenSize)))
model:add(nn.SelectTable(-1)) -- last step of output sequence
model:add(nn.Linear(hiddenSize, classes_n))
However, in PyTorch, I don't find the equivalent of SelectTable to get the last output.
nn.Sequential(
nn.LSTM(inputSize, hiddenSize, 1, batch_first=True),
# what to put here to retrieve last output of LSTM ?,
nn.Linear(hiddenSize, classe_n))
Define a class to extract the last cell output:
# LSTM() returns tuple of (tensor, (recurrent state))
class extract_tensor(nn.Module):
def forward(self,x):
# Output shape (batch, features, hidden)
tensor, _ = x
# Reshape shape (batch, hidden)
return tensor[:, -1, :]
nn.Sequential(
nn.LSTM(inputSize, hiddenSize, 1, batch_first=True),
extract_tensor(),
nn.Linear(hiddenSize, classe_n)
)
According to the LSTM cell documentation the outputs parameter has a shape of (seq_len, batch, hidden_size * num_directions) so you can easily take the last element of the sequence in this way:
rnn = nn.LSTM(10, 20, 2)
input = Variable(torch.randn(5, 3, 10))
h0 = Variable(torch.randn(2, 3, 20))
c0 = Variable(torch.randn(2, 3, 20))
output, hn = rnn(input, (h0, c0))
print(output[-1]) # last element
Tensor manipulation and Neural networks design in PyTorch is incredibly easier than in Torch so you rarely have to use containers. In fact, as stated in the tutorial PyTorch for former Torch users PyTorch is built around Autograd so you don't need anymore to worry about containers. However, if you want to use your old Lua Torch code you can have a look to the Legacy package.
As far as I'm concerned there's nothing like a SplitTable or a SelectTable in PyTorch. That said, you are allowed to concatenate an arbitrary number of modules or blocks within a single architecture, and you can use this property to retrieve the output of a certain layer. Let's make it more clear with a simple example.
Suppose I want to build a simple two-layer MLP and retrieve the output of each layer. I can build a custom class inheriting from nn.Module:
class MyMLP(nn.Module):
def __init__(self, in_channels, out_channels_1, out_channels_2):
# first of all, calling base class constructor
super().__init__()
# now I can build my modular network
self.block1 = nn.Linear(in_channels, out_channels_1)
self.block2 = nn.Linear(out_channels_1, out_channels_2)
# you MUST implement a forward(input) method whenever inheriting from nn.Module
def forward(x):
# first_out will now be your output of the first block
first_out = self.block1(x)
x = self.block2(first_out)
# by returning both x and first_out, you can now access the first layer's output
return x, first_out
In your main file you can now declare the custom architecture and use it:
from myFile import MyMLP
import numpy as np
in_ch = out_ch_1 = out_ch_2 = 64
# some fake input instance
x = np.random.rand(in_ch)
my_mlp = MyMLP(in_ch, out_ch_1, out_ch_2)
# get your outputs
final_out, first_layer_out = my_mlp(x)
Moreover, you could concatenate two MyMLP in a more complex model definition and retrieve the output of each one in a similar way.
I hope this is enough to clarify, but in case you have more questions, please feel free to ask, since I may have omitted something.