Image denoising using Unet architecture - deep-learning

I'm new to computer vision and deep learning. I'm trying to train this UNet model https://github.com/kevinlu1211/pytorch-unet-resnet-50-encoder with ResNet-50 as the encoder. I want to implement it so that I pass in two RGB images, each of which is first processed by the ResNet-50 encoder, and the resulting feature maps are concatenated before being passed to the decoder. I tried doing this and changed n_classes in the code to 3 so that the output is a 3-channel RGB image just like the inputs, but it gives me a distorted image like this and I don't understand why. Please help me with this.
The part of the code that I modified so that two RGB inputs are processed by ResNet-50 is here:
for i, block in enumerate(self.down_blocks, 2):  # for all the down blocks
    x = block(x)
    if i == (UNetWithResnet50Encoder.DEPTH - 1):
        continue
    pre_pools[f"layer_{i}"] = x  ## creating all the down sampling layers

pre_pools_inp2 = dict()
pre_pools_inp2[f"layer_0"] = y
y = self.input_block(y)
pre_pools_inp2[f"layer_1"] = y
y = self.input_pool(y)
for i, block in enumerate(self.down_blocks, 2):  # for all the down blocks
    y = block(y)
    if i == (UNetWithResnet50Encoder.DEPTH - 1):
        continue
    pre_pools_inp2[f"layer_{i}"] = y  ## creating all the down sampling layers

x = torch.cat([x, y], 1)
x = self.bridge(x)  # this is now the bridge between down sampling and up sampling
for i, block in enumerate(self.up_blocks, 1):
    key = f"layer_{UNetWithResnet50Encoder.DEPTH - 1 - i}"  # now using that bridge for upsampling
    x = block(x, pre_pools[key])

output_feature_map = x
x = self.out(x)
del pre_pools
if with_output_feature_map:
    return x, output_feature_map
else:
    return x
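One thing worth double-checking (a hedged sketch of my own, not from the linked repo): torch.cat([x, y], 1) doubles the channel count coming out of the two ResNet-50 encoders, so self.bridge has to be built for twice as many input channels; also note that the decoder loop above only reads skip connections from pre_pools, so the features stored in pre_pools_inp2 are never used.

import torch
import torch.nn as nn

# Channel bookkeeping, assuming 2048-channel ResNet-50 outputs at 7x7 resolution (224x224 input)
x = torch.randn(1, 2048, 7, 7)    # deepest feature map for input 1
y = torch.randn(1, 2048, 7, 7)    # deepest feature map for input 2
fused = torch.cat([x, y], dim=1)  # -> (1, 4096, 7, 7)

# The bridge then has to map 4096 -> 2048 channels; hypothetical layer sizes:
bridge = nn.Sequential(
    nn.Conv2d(4096, 2048, kernel_size=3, padding=1),
    nn.BatchNorm2d(2048),
    nn.ReLU(inplace=True),
)
out = bridge(fused)               # back to (1, 2048, 7, 7) for the decoder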

Related

Model.fit gets stopped at the end of first epoch while training the model

Batch generator function:
def batch_generator(x, y, batch_size):
    """
    Generate training images given image paths and associated steering angles
    """
    images = list()
    steers = list()
    while True:
        i = 0
        for index in range(x.shape[0]):
            center, left, right = x[index]
            steering_angle = y[index]
            # augmentation
            image, steering_angle = augument(center, left, right, steering_angle)
            # add the image and steering angle to the batch
            image = preprocess(image)
            images.append(image)
            steers.append(steering_angle)
            i += 1
            if i == batch_size:
                yield (np.array(images), np.array(steers))
                i = 0
                images = list()
                steers = list()
Model training:
train_generator = batch_generator(X_train, y_train, batch_size=10)
model.fit(
    train_generator,
    epochs=20,
    steps_per_epoch=len(X_train),
    validation_data=batch_generator(X_valid, y_valid, batch_size=10),
)
Actually, I searched this on Stack Overflow and found someone saying I should provide a batch_size to model.fit().
But since we are using a generator function, we can't specify batch_size as a model.fit() parameter (if we do, then each epoch shows .../Unknown progress).
Some blogs say there is an issue with the model.fit() argument validation_data and that I should not provide it; when I drop it, model training works fine.
Apart from that, my generator function is working fine. I debugged it, and it generates valid results that are accepted by the model's input layer.
So, what is the problem with the validation_data argument?
Kindly help me!
Thanks :)
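For reference, here is a minimal sketch (an assumption on my part, not confirmed in the post) of how an infinite generator is usually wired into model.fit: Keras cannot infer the length of a plain Python generator, so steps_per_epoch is normally set to the number of batches per pass over the data, and a validation generator needs validation_steps as well, otherwise the validation phase has no defined end.

batch_size = 10
train_generator = batch_generator(X_train, y_train, batch_size=batch_size)
valid_generator = batch_generator(X_valid, y_valid, batch_size=batch_size)

model.fit(
    train_generator,
    epochs=20,
    steps_per_epoch=len(X_train) // batch_size,   # batches per epoch, not samples
    validation_data=valid_generator,
    validation_steps=len(X_valid) // batch_size,  # tells Keras when validation ends
)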

Difference between WGAN and WGAN-GP (Gradient Penalty)

I just find that in the code here:
https://github.com/NUS-Tim/Pytorch-WGAN/tree/master/models
The "generator" loss, G, between WGAN and WGAN-GP is different, for WGAN:
g_loss = self.D(fake_images)
g_loss = g_loss.mean().mean(0).view(1)
g_loss.backward(one) # !!!
g_cost = -g_loss
But for WGAN-GP:
g_loss = self.D(fake_images)
g_loss = g_loss.mean()
g_loss.backward(mone) # !!!
g_cost = -g_loss
Why does one use one = 1 and the other mone = -1?
You might have misread the source code; the first sample you gave does not average the result of D to compute its loss but instead uses the binary cross-entropy.
To be more precise:
The first method ("GAN") uses the BCE loss to compute the loss terms for D and G. The standard GAN optimization objective for D is to maximize E_x[log(D(x))] + E_z[log(1-D(G(z)))], which the code implements by minimizing the corresponding BCE terms. Source code:
outputs = self.D(images)
d_loss_real = self.loss(outputs.flatten(), real_labels) # <- bce loss
real_score = outputs
# Compute BCELoss using fake images
fake_images = self.G(z)
outputs = self.D(fake_images)
d_loss_fake = self.loss(outputs.flatten(), fake_labels) # <- bce loss
fake_score = outputs
# Optimize discriminator
d_loss = d_loss_real + d_loss_fake
self.D.zero_grad()
d_loss.backward()
self.d_optimizer.step()
For d_loss_real you optimize towards 1s (output is considered real), while d_loss_fake optimizes towards 0s (output is considered fake).
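For concreteness, here is a minimal sketch of those targets (the tensor names mirror the snippet above; the batch size and shapes are assumptions of mine):

import torch
import torch.nn as nn

batch_size = 8
real_labels = torch.ones(batch_size)   # d_loss_real pushes D(images) towards 1
fake_labels = torch.zeros(batch_size)  # d_loss_fake pushes D(G(z)) towards 0

bce = nn.BCELoss()
outputs = torch.rand(batch_size)       # stand-in for D(...).flatten(), values in (0, 1)
d_loss_real = bce(outputs, real_labels)
d_loss_fake = bce(outputs, fake_labels)
d_loss = d_loss_real + d_loss_fake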
While the second ("WCGAN") uses the Wasserstein loss (ref), whereby for D we maximise the objective E_x[D(x)] - E_z[D(G(z))]. Source code:
# Train discriminator
# WGAN - Training discriminator more iterations than generator
# Train with real images
d_loss_real = self.D(images)
d_loss_real = d_loss_real.mean()
d_loss_real.backward(mone)
# Train with fake images
z = self.get_torch_variable(torch.randn(self.batch_size, 100, 1, 1))
fake_images = self.G(z)
d_loss_fake = self.D(fake_images)
d_loss_fake = d_loss_fake.mean()
d_loss_fake.backward(one)
# [...]
Wasserstein_D = d_loss_real - d_loss_fake
By doing d_loss_real.backward(mone) you backpropagate with a gradient of the opposite sign, i.e. it's gradient ascent, and you end up maximizing d_loss_real.
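As a minimal illustration (my own sketch, not taken from the repo), passing -1 to backward on a scalar is the same as backpropagating the negated loss, which is why it amounts to gradient ascent:

import torch

one = torch.tensor(1.0)
mone = one * -1

w = torch.randn(3, requires_grad=True)
loss = (2 * w).sum()   # toy scalar loss whose gradient is 2 for every weight

loss.backward(mone)    # accumulates -d(loss)/dw
print(w.grad)          # tensor([-2., -2., -2.]), same as calling (-loss).backward()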
In order to update the D network:
lossD = E[D(fake data)] - E[D(real data)] + gradient penalty
As lossD goes down, D(real data) goes up, so you need to pass minus one as the gradient when backpropagating that term.

Pytorch - Loss for Object Localization

I am trying to perform an object localization task with MNIST based on Andrew Ng's lecture here. I take the MNIST digits, randomly place them into a 90x90 image, and predict both the digit and its center point. When I train, I get very poor results, and my question is whether my loss function is set up correctly. I basically just take the cross-entropy for the digit and the MSE for the coordinates, and add them all up. Is this correct? I don't get any errors, but the performance is just horrendous.
My dataset is defined as follows (it returns the label and the x/y coordinates of the center of the digit):
class CustomMnistDataset_OL(Dataset):
    def __init__(self, df, test=False):
        '''
        df is a pandas dataframe with 28x28 columns for each pixel value in MNIST
        '''
        self.df = df
        self.test = test

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        if self.test:
            image = np.reshape(np.array(self.df.iloc[idx, :]), (28, 28)) / 255.
        else:
            image = np.reshape(np.array(self.df.iloc[idx, 1:]), (28, 28)) / 255.

        # create the new image
        new_img = np.zeros((90, 90))  # images will be 90x90

        # randomly select a bottom left corner to use for img
        x_min, y_min = randrange(90 - image.shape[0]), randrange(90 - image.shape[0])
        x_max, y_max = x_min + image.shape[0], y_min + image.shape[0]
        x_center = x_min + (x_max - x_min) / 2
        y_center = y_min + (y_max - x_min) / 2
        new_img[x_min:x_max, y_min:y_max] = image

        label = [int(self.df.iloc[idx, 0]), x_center, y_center]  # the label consists of the digit and the center of the number
        sample = {"image": new_img, "label": label}
        return sample['image'], sample['label']
My training function is set up as follows:
loss_fn = nn.CrossEntropyLoss()
loss_mse = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

def train(dataloader, model, loss_fn, loss_mse, optimizer):
    model.train()  # very important... This turns the model back to training mode
    size = len(train_dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        X, y0, y1, y2 = X.to(device), y[0].to(device), y[1].to(device), y[2].to(device)
        pred = model(X.float())

        # DEFINE LOSS HERE -------
        loss = loss_fn(pred[0], y0) + loss_mse(pred[1], y1.float()) + loss_mse(pred[2], y2.float())

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")

Building RNN from scratch in pytorch

I am trying to build an RNN from scratch using PyTorch, and I am following this tutorial to build it.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicRNN(nn.Module):
    def __init__(self, n_inputs, n_neurons):
        super(BasicRNN, self).__init__()
        self.Wx = torch.randn(n_inputs, n_neurons)   # n_inputs X n_neurons
        self.Wy = torch.randn(n_neurons, n_neurons)  # n_neurons X n_neurons
        self.b = torch.zeros(1, n_neurons)           # 1 X n_neurons

    def forward(self, X0, X1):
        self.Y0 = torch.tanh(torch.mm(X0, self.Wx) + self.b)  # batch_size X n_neurons
        self.Y1 = torch.tanh(torch.mm(self.Y0, self.Wy) +
                             torch.mm(X1, self.Wx) + self.b)  # batch_size X n_neurons
        return self.Y0, self.Y1

class CleanBasicRNN(nn.Module):
    def __init__(self, batch_size, n_inputs, n_neurons):
        super(CleanBasicRNN, self).__init__()
        self.rnn = BasicRNN(n_inputs, n_neurons)
        self.hx = torch.randn(batch_size, n_neurons)  # initialize hidden state

    def forward(self, X):
        output = []
        # for each time step
        for i in range(2):
            self.hx = self.rnn(X[i], self.hx)
            output.append(self.hx)
        return output, self.hx

FIXED_BATCH_SIZE = 4  # our batch size is fixed for now
N_INPUT = 3
N_NEURONS = 5

X_batch = torch.tensor([[[0, 1, 2], [3, 4, 5],
                         [6, 7, 8], [9, 0, 1]],
                        [[9, 8, 7], [0, 0, 0],
                         [6, 5, 4], [3, 2, 1]]
                        ], dtype=torch.float)  # X0 and X1

model = CleanBasicRNN(FIXED_BATCH_SIZE, N_INPUT, N_NEURONS)
a1, a2 = model(X_batch)
Running this code returns this error
RuntimeError: size mismatch, m1: [4 x 5], m2: [3 x 5] at /pytorch/..
After some digging I found that this error happens when passing the hidden state to the BasicRNN model:
N_INPUT = 3    # number of features in input
N_NEURONS = 5  # number of units in layer

X0_batch = torch.tensor([[0, 1, 2], [3, 4, 5],
                         [6, 7, 8], [9, 0, 1]],
                        dtype=torch.float)  # t=0 => 4 X 3
X1_batch = torch.tensor([[9, 8, 7], [0, 0, 0],
                         [6, 5, 4], [3, 2, 1]],
                        dtype=torch.float)  # t=1 => 4 X 3

test_model = BasicRNN(N_INPUT, N_NEURONS)
a1, a2 = test_model(X0_batch, X1_batch)
a1, a2 = test_model(X0_batch, torch.randn(1, N_NEURONS))  # THIS LINE GIVES ERROR
What is happening with the hidden states, and how can I solve this problem?
Maybe the tutorial is wrong: torch.mm(X1, self.Wx) multiplies a 4 x 5 tensor (the hidden state) with a 3 x 5 tensor, which doesn't work. Even if you made it work by rewriting it as torch.mm(X1, self.Wx.t()), you would expect the output to be 4 x 5, but the result is a 4 x 3 tensor.
The BasicRNN is not an implementation of an RNN cell, but rather the full RNN fixed for two time steps, as depicted in the image of the tutorial. There, Y0, the first time step, does not include the previous hidden state (technically zero), and Y0 is also h0, which is then used for the second time step to produce Y1, i.e. h1.
An RNN cell is one of the time steps in isolation, particularly the second one, as it should include the hidden state of the previous time step.
The next hidden state is calculated as described in the nn.RNNCell documentation: h' = tanh(W_ih * x + b_ih + W_hh * h + b_hh).
In your BasicRNN there is only one bias term, but you still have a weight Wx for the input and the weight Wy for the hidden state, which should probably be called Wh instead. As for the forward method, its arguments become the input and the previous hidden state, instead of being two inputs at different time steps. This also means that you only have one calculation, corresponding to the formula of the nn.RNNCell, which was the calculation for the Y1, except that it uses the hidden state that was passed to the forward method.
class BasicRNN(nn.Module):
    def __init__(self, n_inputs, n_neurons):
        super(BasicRNN, self).__init__()
        self.Wx = torch.randn(n_inputs, n_neurons)   # n_inputs X n_neurons
        self.Wh = torch.randn(n_neurons, n_neurons)  # n_neurons X n_neurons
        self.b = torch.zeros(1, n_neurons)           # 1 X n_neurons

    def forward(self, x, hidden):
        return torch.tanh(torch.mm(x, self.Wx) + torch.mm(hidden, self.Wh) + self.b)
In the tutorial, they opted to use nn.RNNCell directly instead of implementing the cell.
Note: The terms of the matrix multiplications are in a different order, because the weights are usually transposed in comparison to your weights, and the formula assumes the input and hidden state to be vectors (not batches). Technically, the batched inputs and hidden states would have to be transposed, and the output transposed back, for that to work with batches. It's easier to just use the transposed weight, as the result is the same due to the transpose property of matrix multiplication: (AB)^T = B^T A^T.
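For completeness, a small usage sketch (my own, assuming the corrected BasicRNN cell above) showing how such a cell is unrolled over the time dimension:

import torch

FIXED_BATCH_SIZE, N_INPUT, N_NEURONS = 4, 3, 5
cell = BasicRNN(N_INPUT, N_NEURONS)  # the corrected cell defined above

X_batch = torch.randn(2, FIXED_BATCH_SIZE, N_INPUT)  # (time steps, batch, features)
hx = torch.zeros(FIXED_BATCH_SIZE, N_NEURONS)        # initial hidden state

outputs = []
for x_t in X_batch:     # iterate over the time dimension
    hx = cell(x_t, hx)  # each step consumes the previous hidden state
    outputs.append(hx)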

Strange jumps in val_loss for Keras mobile_net based model

I am trying to train a model based on MobileNet to do landmark detection for Dog Faces (output is a tensor with x/y coordinates for the position of the eyes and the nose of the dog).
In my training, I am seeing the following graph for val_loss:
Question: What is going on with the random spikes in val_loss?
My model looks like this:
model = applications.MobileNet(weights="imagenet",
                               include_top=False,
                               input_shape=(224, 224, 3))
for layer in model.layers:
    layer.trainable = True

x = model.output
p = 0.6
x = AveragePooling2D()(x)
x = BatchNormalization(axis=1, name="net_out")(x)
x = Dropout(p / 4)(x)
x = Flatten()(x)
x = Dense(512, activation='relu')(x)
x = BatchNormalization()(x)
x = Dropout(p)(x)
x = Dense(512, activation='relu')(x)
x = BatchNormalization()(x)
x = Dropout(p / 2)(x)

# PREDICT_SIZE is 6 - the number of landmarks I'm extracting
x = Dense(PREDICT_SIZE, activation="linear")(x)
points = Reshape((PREDICT_SIZE,), name="f")(x)

model_final = Model(input=model.input, output=points)
model_final.compile(loss="mse",
                    optimizer=optimizers.Adam(epsilon=1e-7),
                    metrics=["accuracy"])
Further questions: Should I set epsilon to an even bigger value in my optimizer? Will my model suffer if I make epsilon larger?
Should I try a different loss function?
Is the model itself at fault?
Should I try different top layers for my model?
Update: Removing the "relu" activation not only got rid of the funky jumps in val_loss, it also gave a model with lower loss and even better precision! WOOHOO!
Here's the val_loss after removing "relu" from the top layers (and no other changes).
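For reference, a sketch of the change described in the update (this is my reading of "removing the relu activation", i.e. dropping activation='relu' from the two Dense layers in the custom head; everything else stays as defined above):

# Custom head without the ReLU activations (fragment, continues from x = Flatten()(x) above)
x = Dense(512)(x)   # was Dense(512, activation='relu')
x = BatchNormalization()(x)
x = Dropout(p)(x)
x = Dense(512)(x)   # was Dense(512, activation='relu')
x = BatchNormalization()(x)
x = Dropout(p / 2)(x)
x = Dense(PREDICT_SIZE, activation="linear")(x)
points = Reshape((PREDICT_SIZE,), name="f")(x)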