PyTorch convolutional autoencoder output is blurry. How to improve it?

I created a convolutional autoencoder in PyTorch and I'm trying to improve it.
For the encoder I use the first 4 layers of the pre-trained ResNet-18 model from torchvision.models.resnet.
The middle part is a single convolutional layer with 512 input and 512 output channels. For the decoder I use convolutional layers, each followed by BatchNorm and a ReLU activation.
The decoder reduces the number of channels at each layer: 512 -> 256 -> 128 -> 64 -> 32 -> 16 -> 3, and increases the image resolution by interpolation to match the dimensions of the corresponding encoder layer. For the last layer I use sigmoid instead of ReLU.
All Convolutional layers are:
self.up = nn.Sequential(
    nn.Conv2d(input_channels, output_channels,
              kernel_size=5, stride=1,
              padding=2, bias=False),
    nn.BatchNorm2d(output_channels),
    nn.ReLU()
)
The input images are scaled to the [0, 1] range and have shape 224x224x3. Sample outputs (the first is from the training set, the second from the test set):
[Images: first image, first image output, second image, second image output]
Any ideas why the output is blurry? The model has been trained for around 160 epochs on ~16000 images using the Adam optimizer with lr=0.00005. I'm thinking about adding one more convolutional layer to the self.up block given above. This will increase the complexity of the model, but I'm not sure whether it is the right way to improve it.
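To make the description concrete, here is a rough sketch of a decoder built from the block above. It is a guess at the structure, not the original code: the names UpBlock and Decoder, the bilinear interpolation mode, the 512x7x7 encoder output and the list of target resolutions are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class UpBlock(nn.Module):
    """Resize to a target resolution, then Conv -> BatchNorm -> activation."""

    def __init__(self, in_ch, out_ch, last=False):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=5, stride=1,
                              padding=2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        # Sigmoid on the last block keeps outputs in [0, 1]; ReLU elsewhere.
        self.act = nn.Sigmoid() if last else nn.ReLU()

    def forward(self, x, size):
        x = F.interpolate(x, size=size, mode="bilinear", align_corners=False)
        return self.act(self.bn(self.conv(x)))


class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        channels = [512, 256, 128, 64, 32, 16, 3]
        # Assumed target resolution for each block (not from the original post).
        self.sizes = [14, 28, 56, 112, 224, 224]
        pairs = list(zip(channels[:-1], channels[1:]))
        self.blocks = nn.ModuleList([
            UpBlock(c_in, c_out, last=(i == len(pairs) - 1))
            for i, (c_in, c_out) in enumerate(pairs)
        ])

    def forward(self, x):
        for block, s in zip(self.blocks, self.sizes):
            x = block(x, (s, s))
        return x


decoder = Decoder()
out = decoder(torch.randn(2, 512, 7, 7))  # assumed encoder output shape
print(out.shape)  # torch.Size([2, 3, 224, 224])
```

A sketch like this makes it easy to swap one piece at a time (kernel size, interpolation mode, number of convolutions per block) and compare reconstructions.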

Related

Unclear Architecture of MNIST Neural Network

I am trying to reproduce a Neural Network trained to detect whether there is a 0-3 digit in an image with another confounding image. The paper I am following lists the corresponding architecture:
A neural network with 28×56 input neurons and one output neuron is
trained on this task. The input values are coded between −0.5 (black)
and +1.5 (white). The neural network is composed of a first detection
pooling layer with 400 detection neurons sum-pooled into 100 units
(i.e. we sum-pool non-overlapping groups of 4 detection units). A
second detection-pooling layer with 400 detection neurons is applied
to the 100-dimensional output of the previous layer, and activities
are sum-pooled onto a single unit representing the deep network
output. Positive examples (0-3 digit in the image) are assigned target
value 100 and negative examples are assigned target value 0. The
neural network is trained to minimize the mean-square error between
the target values and its output.
My main doubt is what they mean by detection neurons in this context: do they mean filters, or single standard ReLU neurons? Also, if they mean filters, how could they be applied in the second layer to a 100-dimensional output when they are designed to operate on 2x2 matrices?
Reference:
Montavon, G., Bach, S., Binder, A., Samek, W., & Müller, K. (2015).
Explaining NonLinear Classification Decisions with Deep Taylor
Decomposition. arXiv. https://doi.org/10.1016/j.patcog.2016.11.008.
Specifically section 4.C
Thanks a lot for the help!
My best guess at this is something like (code not tested - just rough PyTorch):
from torch import nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Flatten(),             # Flatten row-wise into a 1D sequence.
            nn.Linear(28 * 56, 400),  # Linear layer with 400 outputs.
            nn.AvgPool1d(4, 4),       # Average groups of 4 down to 100 outputs (a sum pool scaled by 1/4).
        )
        self.layer2 = nn.Sequential(
            nn.Linear(100, 400),      # Linear layer with 400 outputs.
            nn.AdaptiveAvgPool1d(1),  # Average down to 1 output (a sum pool scaled by 1/400).
        )

    def forward(self, x):
        return self.layer2(self.layer1(x))
But overall I would agree with the commenter on your post that there are some issues with the language here.
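If a literal sum pool is wanted rather than an average, one option is to reshape into non-overlapping groups and sum over each group. The sketch below is only an illustration: SumPool1d is a hypothetical helper, and treating the "detection neurons" as ReLU units is an assumption about the paper's wording.

```python
import torch
from torch import nn


class SumPool1d(nn.Module):
    """Sum non-overlapping groups of `group` neighbouring units."""

    def __init__(self, group):
        super().__init__()
        self.group = group

    def forward(self, x):
        n, features = x.shape
        return x.reshape(n, features // self.group, self.group).sum(dim=-1)


model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 56, 400), nn.ReLU(),  # 400 detection neurons (ReLU assumed)
    SumPool1d(4),                        # 400 -> 100
    nn.Linear(100, 400), nn.ReLU(),      # second detection layer
    SumPool1d(400),                      # 400 -> 1
)

x = torch.rand(8, 28, 56) * 2 - 0.5      # inputs coded in [-0.5, 1.5]
print(model(x).shape)  # torch.Size([8, 1])
```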

What is the number of layers in EfficientNetB2?

Knowing that the total number of layers in EfficientNet-B0 is 237 and in EfficientNet-B7 it is 813, what is the total number of layers in EfficientNetB2?
If you print len(model.layers) for the EfficientNetB2 model in Keras, you get 342 layers.
import tensorflow as tf
from tensorflow.keras.applications import EfficientNetB2
model = EfficientNetB2(weights='imagenet')
print(len(model.layers))
You can do the same with any other EfficientNetBx version.
However, as Pradyut said here, not all layers are usually taken into account when counting them:
While counting the number of layers in a Neural Network we usually
only count convolutional layers and fully connected layers. Pooling
Layer is taken together with the Convolutional Layer and counted as
one layer and Dropout is a regularization technique so it will also
not be counted as a separate layer.
For reference, the VGG16 model is defined as a 16 layer model. Those 16
layers are only the Convolutional Layers and Fully Connected Dense
Layers. If you count all the pooling and activation layers it will
change to a 41 layer model, which it is not. Reference: VGG16, VGG16
Paper
So as per your code you have 3 layers (1 Convolutional Layer with 28
Neurons, 1 Fully Connected Layer with 128 Neurons and 1 Fully
Connected Layer with 10 neurons)
As for making it a 10 layer network you can add more convolutional
layers or dense layers before the output layers, but it won't be
necessary for the MNIST dataset.
I hope I answered your question!
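Following the convention quoted above, one could count only the convolutional and fully connected layers instead of every Keras layer object. The sketch below assumes that depthwise convolutions should count as convolutional layers; count_weighted_layers is a hypothetical helper, and weights=None is used only to avoid downloading the ImageNet weights.

```python
from tensorflow.keras.applications import EfficientNetB2
from tensorflow.keras.layers import Conv2D, Dense, DepthwiseConv2D


def count_weighted_layers(model):
    """Count only convolutional and fully connected layers."""
    weighted = (Conv2D, DepthwiseConv2D, Dense)
    return sum(isinstance(layer, weighted) for layer in model.layers)


# weights=None builds the architecture without downloading ImageNet weights.
model = EfficientNetB2(weights=None)
print("all Keras layers:", len(model.layers))
print("conv + dense layers:", count_weighted_layers(model))
```

The second number is much smaller than the first, because batch normalization, activation, padding, reshape and add layers are all separate entries in model.layers.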

PyTorch - unexpected shape of model parameters weights

I created a fully connected network in PyTorch with an input layer of shape (1,784) and a first hidden layer of shape (1,256).
In short: nn.Linear(in_features=784, out_features=256, bias=True)
Method 1: model.fc1.weight.data.shape gives me torch.Size([128, 256]), while
Method 2: list(model.parameters())[0].shape gives me torch.Size([256, 784])
In fact, between an input layer of size 784 and a hidden layer of size 256, I was expecting a matrix of shape (784,256).
So, in the first case, I see the shape of the next hidden layer (128), which does not make sense for the weights between the input and first hidden layer, and, in the second case, it looks like PyTorch took the transpose of the weight matrix.
I don't really understand how PyTorch shapes the different weight matrices, or how I can access individual weights after training. Should I use method 1 or 2? When I display the corresponding tensors, they look the same, even though the shapes are different.
In PyTorch, the weight matrix of nn.Linear is stored with shape [out_features, in_features] and is transposed before the matmul with the input. That's why the weight matrix dimensions are flipped relative to what you expect; i.e., instead of [784, 256], you observe [256, 784].
You can see this in the PyTorch source for nn.Linear, where we have:
...
self.weight = Parameter(torch.Tensor(out_features, in_features))
...
def forward(self, input):
    return F.linear(input, self.weight, self.bias)
When looking at the implementation of F.linear, we see the corresponding line that multiplies the input matrix with the transpose of the weight matrix:
output = input.matmul(weight.t())
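A quick way to check this is to print every parameter with its name. The sketch below assumes a second hidden layer of size 128, since the question mentions one; the class name Net and the layer name fc2 are made up for illustration. A shape of [128, 256] is what a 256 -> 128 layer would have, so Method 1 may have been inspecting a different layer than intended, though that is only a guess because the full model is not shown.

```python
import torch
from torch import nn


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)  # input -> first hidden layer
        self.fc2 = nn.Linear(256, 128)  # assumed second hidden layer of size 128

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))


model = Net()
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
# fc1.weight (256, 784)
# fc1.bias (256,)
# fc2.weight (128, 256)
# fc2.bias (128,)

# nn.Linear computes x @ W.T + b, with W stored as (out_features, in_features).
x = torch.randn(1, 784)
manual = x.matmul(model.fc1.weight.t()) + model.fc1.bias
print(torch.allclose(model.fc1(x), manual, atol=1e-5))

# The weight connecting input unit i to hidden unit j sits at row j, column i.
i, j = 5, 10
w_ij = model.fc1.weight[j, i].item()
```

Both access paths return the same tensor for the same layer: model.fc1.weight and list(model.parameters())[0] refer to one parameter here.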

How to create a CNN for image classification with dynamic input

I would like to create a fully convolutional network for binary image classification in PyTorch that can take dynamic input image sizes, but I don't quite understand conceptually the idea behind changing the final layer from a fully connected layer to a convolutional layer. Here and here both state that this is possible by using a 1x1 convolution.
Suppose I have a 16x16x1 image as input to the CNN. After several convolutions, the output is a 16x16x32. If using a fully connected layer, I can produce a single value output by creating 16*16*32 weights and feeding it to a single neuron. What I don't understand is how you would get a single value output by applying a 1x1 convolution. Wouldn't you end up with 16x16x1 output?
Check this link: http://cs231n.github.io/convolutional-networks/#convert
In this case, your convolution layer should be a 16 x 16 filter with 1 output channel. This will convert the 16 x 16 x 32 input into a single output.
Sample code to test:
from keras.layers import Conv2D, Input
from keras.models import Model
import numpy as np

inputs = Input((16, 16, 32))
outputs = Conv2D(1, 16)(inputs)  # one 16x16 filter covering the whole feature map
model = Model(inputs, outputs)
model.summary()  # check the output shape
output = model.predict(np.zeros((1, 16, 16, 32)))  # check on sample data
print(f'output is {np.squeeze(output)}')
Fully convolutional networks of this kind are useful in segmentation tasks that use patch-based approaches, since you can speed up prediction (inference) by feeding in a bigger portion of the image.
For classification tasks, you usually have an fc layer at the end. In that case, a layer like AdaptiveAvgPool2d is used, which ensures the fc layer sees a constant input feature size irrespective of the input image size.
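Since the question is about PyTorch, the same idea can be sketched there. The variable names are arbitrary; the point is that a 16x16 kernel produces a single value for a 16x16 feature map and a map of values for anything larger.

```python
import torch
from torch import nn

# A 16x16 kernel over 32 input channels has 16*16*32 weights, like the fc layer.
head = nn.Conv2d(in_channels=32, out_channels=1, kernel_size=16)

small = torch.zeros(1, 32, 16, 16)
large = torch.zeros(1, 32, 32, 32)
print(head.weight.numel())  # 8192
print(head(small).shape)    # torch.Size([1, 1, 1, 1])
print(head(large).shape)    # torch.Size([1, 1, 17, 17])
```

On the larger input the layer acts like the fully connected classifier slid over every 16x16 window, which is why one forward pass yields 17x17 predictions.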
https://pytorch.org/docs/stable/nn.html#adaptiveavgpool2d
See this pull request for torchvision VGG: https://github.com/pytorch/vision/pull/747
In Keras, the equivalent is GlobalAveragePooling2D. See the example "Fine-tune InceptionV3 on a new set of classes".
https://keras.io/applications/
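A minimal PyTorch sketch of the pooling approach, assuming single-channel inputs and one output logit for the binary decision. The class name TinyClassifier and the layer widths are made up for illustration.

```python
import torch
from torch import nn


class TinyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)  # (N, 32, H, W) -> (N, 32, 1, 1)
        self.fc = nn.Linear(32, 1)           # single logit for binary classification

    def forward(self, x):
        x = self.pool(self.features(x))
        return self.fc(torch.flatten(x, 1))


model = TinyClassifier()
for size in (16, 37, 64):
    logits = model(torch.randn(4, 1, size, size))
    print(size, logits.shape)  # always torch.Size([4, 1])
```

The logit can be trained with nn.BCEWithLogitsLoss. Images within one batch still need a common size, but the size can differ from batch to batch.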
I hope you are familiar with Keras. Your image is 16*16*1. It will be passed to a Keras convolutional layer, but first we have to create the model, e.g. with model=Sequential(), which gives us a Keras model instance. Then we add a convolutional layer with our parameters, like
model.add(Conv2D(20,(2,2),padding="same"))
Here we apply 20 filters to the image, so the output becomes 16*16*20. To extract richer features we add more conv layers, like
model.add(Conv2D(32,(2,2),padding="same"))
This applies 32 filters, so the output will have size 16*16*32.
Don't forget to put an activation after each conv layer. If you are new to this, you should study activations, optimization and the loss of the network; these are the basic parts of neural networks.
Now it's time to move on to the fully connected layers. First we need to flatten the feature map, because a fully connected layer only works on 2D input of shape (no_of_ex, image_dim). In your case
the image dimension after flattening will be 16*16*32:
model.add(Flatten())
After flattening, the network passes the result to the fully connected layers:
model.add(Dense(32))
model.add(Activation("relu"))
model.add(Dense(8))
model.add(Activation("relu"))
model.add(Dense(2))
The last layer has 2 neurons because this is a binary classification problem. If you had to classify 3 classes, the last layer would have 3 neurons; for 10 classes, the last dense layer would have 10 neurons.
model.add(Activation("softmax"))
model.compile(loss='binary_crossentropy',
              optimizer=Adam(),
              metrics=['accuracy'])
return model
After this you have to fit the model:
estimator = model(2)  # 2 classes for binary classification
estimator.fit(X_train, y_train)
Full code:
from keras.models import Sequential
from keras.layers import Activation, Conv2D, Dense, Flatten
from keras.optimizers import Adam

def model(classes):
    model = Sequential()
    # conv block: Conv2D -> ReLU (no pooling is used here)
    model.add(Conv2D(20, (5, 5), padding="same"))
    model.add(Activation("relu"))
    model.add(Conv2D(32, (5, 5), padding="same"))
    model.add(Activation("relu"))
    model.add(Flatten())
    model.add(Dense(32))
    model.add(Activation("relu"))
    model.add(Dense(8))
    model.add(Activation("relu"))
    # softmax classifier with one output neuron per class
    model.add(Dense(classes))
    model.add(Activation("softmax"))
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(lr=0.0001, decay=1e-6),
                  metrics=['accuracy'])
    return model
You can also refer to this kernel for help.

Resizing a convolution layer before concatenating in Keras

I'm reading U-Net: Convolutional Networks for Biomedical Image Segmentation and want to implement it in Keras.
In U-Net, I need to concatenate convolutional layers, one from the contracting path and the other from the expansive path (Fig. 1 in the paper).
However, their sizes don't match, so I have to resize the output of the convolutional layer before concatenating.
How do I do this in Keras?
There is a Cropping2D Layer in Keras: https://keras.io/layers/convolutional/#cropping2d
...
conv_13 = Conv2D(64, (3, 3), padding='same', activation='relu')(conv_12)  # has output size 568x568
...
crop_13 = Cropping2D(cropping=((88, 88), (88, 88)))(conv_13)  # crop 88 px from each side: 568x568 -> 392x392
merge_91 = Concatenate()([crop_13, upsampled_81])  # concatenate both layers with same 2D size
...
Example for concatenating the first size (568x568) to the last upsampled size (392x392).
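Cropping2D takes the number of rows and columns to remove, not the target size, so the margins have to be computed from the two shapes. A small helper for that could look like the sketch below; center_crop_margins is a hypothetical name, and the result has the ((top, bottom), (left, right)) layout that the cropping argument accepts.

```python
def center_crop_margins(src_hw, dst_hw):
    """Return ((top, bottom), (left, right)) margins for a centered crop."""
    margins = []
    for src, dst in zip(src_hw, dst_hw):
        if dst > src:
            raise ValueError("target size must not exceed source size")
        total = src - dst
        # Put the extra pixel on the bottom/right when the difference is odd.
        margins.append((total // 2, total - total // 2))
    return tuple(margins)


print(center_crop_margins((568, 568), (392, 392)))  # ((88, 88), (88, 88))
print(center_crop_margins((65, 64), (56, 56)))      # ((4, 5), (4, 4))
```

The returned tuple can be passed directly as Cropping2D(cropping=...), which avoids hard-coding a separate number for each skip connection.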