Shape of ground truth in multiclass image segmentation with PyTorch

I'm working on 128 x 128 x 3 cell images and want to segment them into 5 classes, including background. I first made the target images 128 x 128 with values in {0, 1, 2, 3, 4}. But then I found I have to make the ground truth a 5-channel image where all values are 0 or 1: if a pixel has 1 in the nth channel, it should be classified as the nth class.
But when I feed my data into a U-Net model I forked from GitHub, I get an error while calculating the cross-entropy loss.
I set the number of input channels to 3 and the number of output classes to 5, with a batch size of 2.
Here is my code:
for i, (x, y) in batch_iter:
    input, target = x.to(self.device), y.to(self.device)  # send to device (GPU or CPU)
    self.optimizer.zero_grad()  # zerograd the parameters

    out = self.model(input)  # one forward pass
    loss = self.criterion(out, target)  # calculate loss
    loss_value = loss.item()
    train_losses.append(loss_value)
    loss.backward()  # one backward pass
    self.optimizer.step()  # update the parameters

    batch_iter.set_description(f'Training: (loss {loss_value:.4f})')  # update progressbar

self.training_loss.append(np.mean(train_losses))
self.learning_rate.append(self.optimizer.param_groups[0]['lr'])
batch_iter.close()
And the error message:
RuntimeError: 1only batches of spatial targets supported (3D tensors) but got targets of size: : [2, 5, 128, 128]
How can I solve this?

It seems you are using either nn.CrossEntropyLoss or nn.functional.cross_entropy. I ran into the same error.
CrossEntropyLoss is usually used for classification use cases; for segmentation it expects the target as class indices of shape [batch, H, W], not as one-hot channels.
If your targets are one-hot masks with values in [0, 1], you could instead use nn.BCELoss (on probabilities) or nn.functional.binary_cross_entropy_with_logits (on raw logits). This worked in my case: since there is a separate mask for each class, it becomes a binary cross-entropy problem.
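For reference, a minimal sketch of the two target layouts (shapes taken from the question; the random tensors are only placeholders):

import torch
import torch.nn as nn

out = torch.randn(2, 5, 128, 128)  # raw logits from the model, [batch, classes, H, W]

# Option A: class-index target of shape [batch, H, W] with values in {0..4},
# which is what nn.CrossEntropyLoss expects for segmentation.
target_indices = torch.randint(0, 5, (2, 128, 128))
ce_loss = nn.CrossEntropyLoss()(out, target_indices)

# Option B: keep the 5-channel one-hot masks, [batch, 5, H, W], and treat each
# channel as its own binary problem, as suggested above.
target_onehot = nn.functional.one_hot(target_indices, 5).permute(0, 3, 1, 2).float()
bce_loss = nn.BCEWithLogitsLoss()(out, target_onehot)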

Related

Input and output to LSTMs in PyTorch

I want to implement LSTMs with a CNN in PyTorch, as my data is time-series data, i.e. frames of video for heart rate detection. I am struggling with the input and output dimensions for LSTMs: how should I properly configure the dimensions/parameters/arguments at the input of an LSTM in PyTorch? It's quite confusing when considering time steps, hidden state, etc.
My output from the CNN is "2 batches of 256 frames", which is now the input to the LSTM:
batch size = 2
features = 256
The output is also a batch of 2 with 256 frames.
Generally, the input shape of sequential data takes the form (batch_size, seq_len, num_features). Based on your explanation, I assume your input is of the form (2, 256), where 2 is the batch size and 256 is the sequence length of scalars (1-dimensional tensor). Therefore, you should reshape your input to be (2, 256, 1) by inputs.unsqueeze(2).
To declare and use an LSTM model, simply try
from torch import nn

model = nn.LSTM(
    input_size=1,                  # 1-dimensional features
    batch_first=True,              # batch is the first (zero-th) dimension
    hidden_size=some_hidden_size,  # maybe 64, 128, etc.
    num_layers=some_num_layers,    # maybe 1 or 2
    proj_size=1,                   # output should also be 1-dimensional
)
outputs, (hidden_state, cell_state) = model(inputs)
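Putting it together with the shapes described above (a small sketch; hidden_size=64 and num_layers=1 are just placeholder choices):

import torch
from torch import nn

inputs = torch.randn(2, 256)       # CNN output: batch_size=2, 256 scalar features
inputs = inputs.unsqueeze(2)       # -> (batch, seq_len, features) = (2, 256, 1)

model = nn.LSTM(
    input_size=1,
    hidden_size=64,                # placeholder choice
    num_layers=1,                  # placeholder choice
    batch_first=True,
    proj_size=1,                   # project the output back to 1 feature per step
)

outputs, (hidden_state, cell_state) = model(inputs)
print(outputs.shape)               # torch.Size([2, 256, 1])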

Combine two tensors of same dimension to get a final output tensor using trainable weights

While working on a problem related to question answering (MRC), I have implemented two different architectures that independently give two tensors (probability distributions over the tokens). Both tensors are of dimension (batch_size, 512). I wish to obtain the final output of the form (batch_size, 512). How can I combine the two tensors using trainable weights and then train the model on the final prediction?
Edit (Additional Information):
So in the forward function of my NN model, I have used a BERT model to encode the 512 tokens. These encodings are 768-dimensional. They are then passed to a linear layer nn.Linear(768, 1) to output a tensor of shape (batch_size, 512, 1). Apart from this, I have another model built on top of the BERT encodings that also yields a tensor of shape (batch_size, 512, 1). I wish to combine these two tensors to finally get a tensor of shape (batch_size, 512, 1), which can be trained against the output logits of the same shape using CrossEntropyLoss.
Please share the PyTorch code snippet if possible.
Assume your two vectors are V1 and V2. You need to combine them (ensembling) to get a new vector. You can use a weighted sum like this:
alpha = sigmoid(alpha)
V_final = alpha * V1 + (1 - alpha) * V2
where alpha is a learnable scalar. The sigmoid bounds alpha between 0 and 1,
and you can initialise alpha = 0 so that sigmoid(alpha) is 0.5, meaning you are adding V1 and V2 with equal weights.
This is a linear combination, and there can be non-linear versions as well.
For example, you can have a layer that accepts the concatenation (V1; V2) and outputs a softmaxed combination, e.g. softmax(W * (V1; V2) + b).
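Since a PyTorch snippet was requested, here is a minimal sketch of the weighted sum as a module (the class name and shapes are my own illustrative choices):

import torch
from torch import nn

class WeightedCombine(nn.Module):
    def __init__(self):
        super().__init__()
        # Learnable scalar, initialised to 0 so that sigmoid(alpha) starts at 0.5.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, v1, v2):
        a = torch.sigmoid(self.alpha)
        return a * v1 + (1 - a) * v2

combine = WeightedCombine()
v1 = torch.randn(4, 512, 1)   # e.g. output of the BERT + linear head
v2 = torch.randn(4, 512, 1)   # output of the second model
v_final = combine(v1, v2)     # (4, 512, 1)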

Wasserstein GAN implementation in PyTorch. How to implement the loss?

I'm currently working on a project in PyTorch on Wasserstein GAN (https://arxiv.org/pdf/1701.07875.pdf).
In Wasserstein GAN a new objective function is defined using the Wasserstein distance, W(P_r, P_g) = sup_{||f||_L <= 1} E_{x~P_r}[f(x)] - E_{x~P_g}[f(x)], which leads to the training algorithm given in the paper (Algorithm 1).
My question is: when implementing lines 5 and 6 of the algorithm in PyTorch, should I be multiplying my loss by -1? As in my code (I use RMSprop as the optimizer for both the generator and the critic):
############################
# (1) Update D network: maximize (D(x)) + (D(G(x)))
###########################
for n in range(n_critic):
    D.zero_grad()
    real_cpu = data[0].to(device)
    b_size = real_cpu.size(0)

    output = D(real_cpu)
    #errD_real = -criterion(output, label) #DCGAN
    errD_real = torch.mean(output)
    # Calculate gradients for D in backward pass
    errD_real.backward()
    D_x = output.mean().item()

    ## Train with all-fake batch
    # Generate batch of latent vectors
    noise = torch.randn(b_size, 100, device=device)  # Careful here we changed shape of input (original: torch.randn(4, 100, 1, 1, device=device))
    # Generate fake image batch with G
    fake = G(noise)
    # Classify all fake batch with D
    output = D(fake.detach())
    # Calculate D's loss on the all-fake batch
    errD_fake = torch.mean(output)
    # Calculate the gradients for this batch
    errD_fake.backward()
    D_G_z1 = output.mean().item()

    # Add the gradients from the all-real and all-fake batches
    errD = -(errD_real - errD_fake)
    # Update D
    optimizerD.step()

    # Clipping weights
    for p in D.parameters():
        p.data.clamp_(-0.01, 0.01)
As you can see, I do the operation errD = -(errD_real - errD_fake), with errD_real and errD_fake being respectively the mean of the predictions of the critic on real and fake samples.
To my understanding, RMSprop should optimize the weights of the critic the following way:
w <- w - alpha * gradient(w)
(alpha being the learning rate divided by the square root of the weighted moving average of the squared gradient).
Since the optimization problem requires "going" in the same direction as the gradient, gradient(w) should be multiplied by -1 before updating the weights.
Do you think that my reasoning is right ?
The program runs but my results are quite poor.
I follow the same logic for the generator's weights but this time in order to go in the opposite direction of the gradient:
############################
# (2) Update G network: minimize -D(G(x))
###########################
G.zero_grad()
noise = torch.randn(b_size, 100, device=device)
fake = G(noise)
#label.fill_(fake_label) # fake labels are real for generator cost
# Since we just updated D, perform another forward pass of all-fake batch through D
output = D(fake).view(-1)
# Calculate G's loss based on this output
#errG = criterion(output, label) #DCGAN
errG = -torch.mean(output)
# Calculate gradients for G
errG.backward()
D_G_z2 = output.mean().item()
# Update G
optimizerG.step()
Sorry for the long question; I tried to explain my doubt as clearly as possible. Thank you, everyone.
I noticed some errors in the implementation of your discriminator training protocol: you call backward() twice, backpropagating the real and fake losses at different time steps.
Technically an implementation using this scheme is possible, but it is highly unreadable. There is also a mistake with your errD_real: its sign is positive when it should be negative, so (given that an optimal D(G(z)) > 0) you penalize the critic for being correct. Overall, your model converges simply by predicting D(x) < 0 for all inputs.
To fix this, do not call errD_real.backward() or errD_fake.backward(). Simply calling errD.backward() after you define errD would work perfectly fine. Otherwise, your generator seems to be correct.
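For concreteness, here is a sketch of the critic update with that change, reusing the names and setup from the question (it assumes torch, D, G, data, device, n_critic and optimizerD are already defined as above):

for n in range(n_critic):
    D.zero_grad()
    real_cpu = data[0].to(device)
    b_size = real_cpu.size(0)

    errD_real = torch.mean(D(real_cpu))

    noise = torch.randn(b_size, 100, device=device)
    fake = G(noise)
    errD_fake = torch.mean(D(fake.detach()))

    # Maximise D(x) - D(G(z)) by minimising its negative; a single backward() call.
    errD = -(errD_real - errD_fake)
    errD.backward()
    optimizerD.step()

    # Weight clipping to enforce the Lipschitz constraint.
    for p in D.parameters():
        p.data.clamp_(-0.01, 0.01)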

Pytorch: How do I deal with different input sizes within one batch?

I am implementing something closely related to the DeepSets architecture on point clouds:
https://arxiv.org/abs/1703.06114
That means I am working with a set of inputs (coordinates), have fully connected layers process each of them separately, and then perform average pooling over them (to then do further processing).
The input for each sample i is a tensor of shape [L_i, 3], where L_i is the number of points and the last dimension is 3 because each point has x, y, z coordinates. Crucially, L_i depends on the sample, so I have a different number of points per instance. When I put everything into a batch, I currently have the input in the shape [B, L, 3], where L is larger than L_i for all i. The individual samples are padded with 0s. The issue is that the 0s are not ignored by the network; they are processed and fed into the average pooling. Instead, I would like the average pooling to only consider actual points (not padded 0s). I do have another array which stores the lengths [L_1, L_2, L_3, L_4, ...], but I am not sure how to use it.
My question is: how do you handle different input sizes within one batch in the most graceful manner?
This is how the model is defined:
encoder = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 128))
x = self.encoder(x)
x = x.max(dim=1)[0]
decoder = ...
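A minimal sketch of one way to use the lengths array for masked mean pooling (the batch and point counts here are hypothetical; the mask zeroes out padded rows before summing, and the sum is divided by the true lengths):

import torch
from torch import nn

encoder = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 128))

x = torch.randn(2, 10, 3)              # padded batch [B, L, 3]
lengths = torch.tensor([7, 10])        # actual number of points per sample

h = encoder(x)                                                          # [B, L, 128]
mask = (torch.arange(x.size(1))[None, :] < lengths[:, None]).float()   # [B, L]
h = h * mask.unsqueeze(-1)                                              # zero out padded positions
pooled = h.sum(dim=1) / lengths[:, None].float()                        # [B, 128], mean over real points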

Defining a Keras function

I have recently started to learn Deep Learning and CNNs. I have come across the following code which defines a simple CNN.
Can anyone help me to understand how these lines work:
loss = layer_output[:, :, :, 0] - what is the result of this? The network has not been trained yet and the weights [kernels] are not yet calculated, so what data is it going to return? Does 0 represent the first kernel?
iterate = K.function([input_img], [loss, grads]) - there is not much documentation available on the Keras site. What I understand is that iterate is a function which takes an input tensor and returns a list of tensors, the first one being loss and the second one grads. But they are defined elsewhere!
Define an input image with these dimensions:
import numpy as np

img_data = np.random.uniform(size=(1, 250, 250, 3))
There is a simple CNN, which has one convolutional layer. It uses two 3 x 3 kernels.
from keras.layers import Input, Conv2D, Flatten, Dense
from keras.models import Model
from keras import backend as K

input = Input(shape=(250, 250, 3), name='input_1')
First_Conv2D = Conv2D(2, kernel_size=(3, 3), padding="same", name='conv2d_1', activation='relu')(input)
flat = Flatten(name='flatten_1')(First_Conv2D)
output = Dense(2, name='dense_1', activation='softmax')(flat)
model = Model(inputs=[input], outputs=[output])
layer_dict = dict([(layer.name, layer) for layer in model.layers[0:]])
layer_output = layer_dict['conv2d_1'].output
input_img = model.input
# Calculate loss and gradient.
loss = layer_output[:, :, :, 0]
grads = K.gradients(loss, input_img)[0]
# Define a Keras function
iterate = K.function([input_img], [loss, grads])
# Call iterate function
loss_value, grads_value = iterate([img_data])
Thank You.
This looks like a nasty dissection of Keras as an API. I reckon it leads to more confusion than it serves as an introduction to deep learning. Anyway, addressing your questions:
All tensors are symbolic, meaning that until we run a session they do not contain any values; they instead define a directed computation graph. loss = layer_output[:, :, :, 0] is a slicing operation that takes the first element of the last dimension, returning another tensor with 3 dimensions (i.e. the feature map produced by the first kernel). When you run the session with actual inputs, the tensors get values and these operations are executed. The operations are almost identical to those on NumPy ndarrays (which are not symbolic and do contain values), so you can get an intuition from there.
K.function just glues the inputs to the outputs, returning a single operation that, when given the inputs, follows the computation graph from the inputs to the defined outputs. In this case, given a list with a single input, it returns a list of 2 outputs, loss and gradients. These are still symbolic, remember: if you try to print one, you'll just get what it is, its shape and its data type.
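As a small illustration of both points (a hedged sketch assuming the TF1-style Keras backend where K.gradients is available; the tiny model is hypothetical):

import numpy as np
from keras import backend as K
from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(4,), name='inp')
out = Dense(3, name='dense')(inp)
model = Model(inputs=[inp], outputs=[out])

# Symbolic tensors: no values yet, just nodes in the computation graph.
loss = model.output[:, 0]                  # a slice, analogous to layer_output[:, :, :, 0]
grads = K.gradients(loss, model.input)[0]
print(loss)                                # prints only the tensor's name, shape and dtype

# K.function glues inputs to outputs into a single callable.
iterate = K.function([model.input], [loss, grads])
loss_value, grads_value = iterate([np.random.uniform(size=(2, 4))])
print(loss_value.shape, grads_value.shape) # actual NumPy arrays now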