Resizing a convolution layer before concatenating in Keras - deep-learning

I'm reading U-Net: Convolutional Networks for Biomedical Image Segmentation and want to implement this in Keras.
In U-Net, I need to concatenate convolutional layers: one is in the contracting path and the other is in the expansive path (Fig. 1 in the paper).
However, their sizes don't match, so I have to resize the output of one convolutional layer before concatenating.
How do I do this in Keras?

There is a Cropping2D Layer in Keras: https://keras.io/layers/convolutional/#cropping2d
...
conv_13 = Conv2D(64, (3, 3), padding='same', activation='relu')(conv_12) # has output size of 568x568
...
crop_13 = Cropping2D(cropping=((88, 88), (88, 88)))(conv_13) # crop 568x568 to 392x392 symmetrically (88 pixels from each border)
merge_91 = Concatenate()([crop_13, upsampled_81]) # concatenate both layers with the same 2D size
...
This example crops the first feature map (568x568) to match the last upsampled feature map (392x392) before concatenating.
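If you would rather not hard-code the crop amounts, they can be derived from the two tensors' static shapes. This is just a minimal sketch under that assumption; the crop_to_match helper and the Input stand-ins are mine, not part of the original answer:

from keras.layers import Input, Cropping2D, Concatenate

def crop_to_match(source, target):
    # Symmetrically crop `source` so its spatial size matches `target`.
    # Assumes channels-last tensors with static (known) height/width.
    dh = int(source.shape[1]) - int(target.shape[1])  # total rows to remove
    dw = int(source.shape[2]) - int(target.shape[2])  # total cols to remove
    return Cropping2D(((dh // 2, dh - dh // 2),
                       (dw // 2, dw - dw // 2)))(source)

# toy stand-ins for the 568x568 contracting-path tensor and the 392x392 upsampled tensor
conv_13 = Input((568, 568, 64))
upsampled_81 = Input((392, 392, 64))

crop_13 = crop_to_match(conv_13, upsampled_81)     # 568x568 -> 392x392
merge_91 = Concatenate()([crop_13, upsampled_81])  # both are now 392x392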

Related

Why is it that when viewing the architecture in Netron, the normalization layer that goes right after the convolutional layer is not shown?

I'm testing some changes to a convolutional neural network architecture. I tried adding a BatchNorm layer right after the conv layer, followed by the activation layer. Then I swapped the activation layer with the BatchNorm layer.
# here example of conv-> b_norm -> activation
...
nn.Conv2d(out_, out_, kernel_size=(1, 3), padding=(0, 1), bias=False),
nn.BatchNorm2d(out_),
nn.LeakyReLU(),
...
# here example of conv-> activation -> b_norm
...
nn.Conv2d(out_, out_, kernel_size=(1, 3), padding=(0, 1), bias=False),
nn.LeakyReLU(),
nn.BatchNorm2d(out_),
...
I have noticed that in Netron (an app for visualizing NN architectures) there is NO batch_norm in the architecture with b_norm right after conv, but there is one in the other architecture with b_norm after the activation.
So my question is: does the normalization layer right after the convolution have any special meaning, or is there something wrong with Netron?
The fact that you cannot see Batch Norm when it directly follows a convolution has to do with Batch Norm folding: a convolution followed by Batch Norm can be replaced with a single convolution that has different weights.
I assume that for the Netron visualization you first convert to ONNX format.
In PyTorch, the Batch Norm folding optimization is performed by default by torch.onnx.export, since eval mode is the default. You can disable this behavior by exporting to ONNX in training mode:
torch.onnx.export(..., training=torch.onnx.TrainingMode.TRAINING, ...)
Then you should be able to see all Batch Norm layers in your graph.
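A minimal export sketch along those lines (the tiny Sequential model and the file name are placeholders, and the exact keyword arguments may vary with your PyTorch version):

import torch
import torch.nn as nn

# A small stand-in for the conv -> b_norm -> activation block from the question.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=(1, 3), padding=(0, 1), bias=False),
    nn.BatchNorm2d(16),
    nn.LeakyReLU(),
)

dummy_input = torch.randn(1, 3, 8, 8)

# Export in TRAINING mode so BatchNorm stays a separate node in the ONNX graph
# instead of being folded into the preceding convolution.
torch.onnx.export(
    model,
    dummy_input,
    "conv_bn_act.onnx",
    training=torch.onnx.TrainingMode.TRAINING,
    do_constant_folding=False,
)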

Pytorch add hyperparameters for 3x3,32 conv2d layer and 2x2 maxpool layer

I am trying to create the conv2d layer below using PyTorch. The hyperparameters are given in the image below. I am unsure how to implement the hyperparameters (3x3, 32) for the first conv2d layer, i.e. how to express them using torch.nn.Conv2d.
Thank you very much.
Conv2d with hyperparameters
The conv2d hyper-parameters (3x3, 32) represent kernel_size=(3, 3) and 32 output channels.
Therefore, this is how you define the first conv layer in your diagram:
conv3x3_32 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
Note that the in_channels hyper-parameter should match the out_channels of the previous layer (or the number of channels of the input).
For more details, see nn.Conv2d.
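Since the title also mentions a 2x2 max-pool layer, here is a minimal sketch of the full conv + pool block; the in_channels=3 (RGB input) and the ReLU in between are assumptions, since the diagram is not reproduced here:

import torch
import torch.nn as nn

# 3x3 kernel, 32 output channels; in_channels=3 assumes an RGB input image.
block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3),
    nn.ReLU(),                    # assumed activation between conv and pool
    nn.MaxPool2d(kernel_size=2),  # the 2x2 max-pool layer (stride defaults to 2)
)

x = torch.randn(1, 3, 64, 64)  # dummy batch: N x C x H x W
print(block(x).shape)          # torch.Size([1, 32, 31, 31])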

Architecture of VGGnet. What is multi-crop, dense evaluation?

I was reading the VGG16 paper, Very Deep Convolutional Networks for Large-Scale Image Recognition.
In 3.2 TESTING, it says that all fully-connected layers are replaced by convolutional layers.
Namely,
the fully-connected layers are first converted to convolutional layers (the first FC layer to a 7 × 7
conv. layer, the last two FC layers to 1 × 1 conv. layers). The resulting fully-convolutional net is
then applied to the whole (uncropped) image. The result is a class score map with the number of
channels equal to the number of classes, and a variable spatial resolution, dependent on the input
image size. Finally, to obtain a fixed-size vector of class scores for the image, the class score map is
spatially averaged (sum-pooled)
So the architecture of VGG16 (Configuration D) when predicting on the test set will be
input=(224, 224)
conv2d(64, (3,3))
conv2d(64, (3,3))
Maxpooling(2, 2)
conv2d(128, (3,3))
conv2d(128, (3,3))
Maxpooling(2, 2)
conv2d(256, (3,3))
conv2d(256, (3,3))
conv2d(256, (3,3))
Maxpooling(2, 2)
conv2d(512, (3,3))
conv2d(512, (3,3))
conv2d(512, (3,3))
Maxpooling(2, 2)
conv2d(512, (3,3))
conv2d(512, (3,3))
conv2d(512, (3,3))
Maxpooling(2, 2)
Dense(4096) is replaced by conv2d((7, 7))
Dense(4096) is replaced by conv2d((1, 1))
Dense(1000) is replaced by conv2d((1, 1))
So is this architecture used only on the test set?
Do the last 3 conv layers all have 1000 channels?
The result is a class score map with the number of channels equal to the number of classes
Since the input size is 224x224, the output size after the last max-pooling layer will be (7, 7). Why does it say a variable spatial resolution? I know it does multi-scale evaluation, but the image will be cropped to (224, 224) before being fed as input.
And how does VGG16 get a (1000,) vector? What does spatially averaged (sum-pooled) mean here? Does it just add a sum-pooling layer of size (7, 7) to get a (1, 1, 1000) array?
the class score map is spatially averaged (sum-pooled)
In 3.2 TESTING
Also, multi-crop evaluation is complementary to dense evaluation due
to different convolution boundary conditions: when applying a ConvNet to a crop, the convolved
feature maps are padded with zeros, while in the case of dense evaluation the padding for the same
crop naturally comes from the neighbouring parts of an image (due to both the convolutions and
spatial pooling), which substantially increases the overall network receptive field, so more context
is captured.
So will multi-crop and dense evaluation be used only on the validation set?
Let's say the input size is (256, 256); multi-crop might take a (224, 224) image, where the position of the crop can differ, say [0:223, 0:223] or [1:224, 1:224]. Is my understanding of multi-crop correct?
And what is dense evaluation? I have tried to google it, but cannot get relevant results.
The main idea of changing the dense layers to convolutional layers is to make inference independent of the input image size. Suppose you have a (224, 224) image: your network with FC layers will work nicely, but as soon as the image size changes, the network will start throwing a size-mismatch error (which means the network is input-size dependent).
Hence, to counter this, a fully convolutional network is built where the class information is stored in the channels, while the spatial size is reduced with an average-pooling layer (or further convolutional steps) down to (channels = number_of_classes, 1, 1). So when you flatten this last output, it comes out as number_of_classes = channels * 1 * 1.
I am not attaching complete code for this, because your questions would need more detailed answers covering lots of basics. I encourage you to read about fully convolutional networks to get the idea. It's easy, and I am sure you will understand the nitty-gritty of it.
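To make the idea concrete, here is a minimal PyTorch sketch of the conversion described in 3.2 TESTING; the layer sizes follow the paper, but the code itself is only an illustration (the features stack below is a toy stand-in for the real VGG-16 convolutional part, not the actual network):

import torch
import torch.nn as nn

# Toy stand-in for the VGG-16 conv stack: a 224x224 input ends up as 512 x 7 x 7
# (the real network reduces the spatial size by a factor of 32 through 5 max-pools).
features = nn.Sequential(
    nn.Conv2d(3, 512, kernel_size=3, padding=1),
    nn.MaxPool2d(32),
)

# FC layers re-expressed as convolutions, as described in the paper:
#   FC-4096 -> 7x7 conv with 4096 channels
#   FC-4096 -> 1x1 conv with 4096 channels
#   FC-1000 -> 1x1 conv with 1000 channels (one channel per class)
head = nn.Sequential(
    nn.Conv2d(512, 4096, kernel_size=7), nn.ReLU(),
    nn.Conv2d(4096, 4096, kernel_size=1), nn.ReLU(),
    nn.Conv2d(4096, 1000, kernel_size=1),
)

x = torch.randn(1, 3, 384, 384)      # inputs larger than 224x224 are now allowed
score_map = head(features(x))        # 1 x 1000 x H' x W' class score map
scores = score_map.mean(dim=(2, 3))  # spatial averaging -> a (1, 1000) score vector
print(score_map.shape, scores.shape)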

Pytorch Convolutional Autoencoder output is blurry. How to improve it?

I created a convolutional autoencoder using PyTorch and I'm trying to improve it.
For the encoder I use the first 4 layers of a pre-trained ResNet-18 model from torchvision.models.resnet.
I have a mid-layer with just one convolutional layer with input and output channel sizes of 512. For the decoder I use convolutional layers followed by BatchNorm and a ReLU activation.
The decoder reduces the channels at each layer: 512 -> 256 -> 128 -> 64 -> 32 -> 16 -> 3 and increases the resolution of the image with interpolation to match the dimensions of the corresponding layer in the encoder. For the last layer I use sigmoid instead of ReLU.
All convolutional blocks are:
self.up = nn.Sequential(
    nn.Conv2d(input_channels, output_channels,
              kernel_size=5, stride=1,
              padding=2, bias=False),
    nn.BatchNorm2d(output_channels),
    nn.ReLU()
)
The input images are scaled to the [0, 1] range and have shape 224x224x3. Sample outputs are below (the first is from the training set, the second from the test set):
First image
First image output
Second image
Second image output
Any ideas why the output is blurry? The model has been trained for around 160 epochs on ~16000 images using the Adam optimizer with lr=0.00005. I'm thinking about adding one more convolutional layer to self.up given above. This would increase the complexity of the model, but I'm not sure if it is the right way to improve it.
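For reference, the deeper up-block I have in mind would look something like the sketch below (just an illustration of the idea, with the extra conv kept at the same channel count; it is not a claim that this will remove the blur):

import torch.nn as nn

def up_block(input_channels, output_channels):
    # Same structure as self.up above, with one extra conv layer inserted
    # before the channel reduction; the extra layer is the change being considered.
    return nn.Sequential(
        nn.Conv2d(input_channels, input_channels,
                  kernel_size=5, stride=1, padding=2, bias=False),
        nn.BatchNorm2d(input_channels),
        nn.ReLU(),
        nn.Conv2d(input_channels, output_channels,
                  kernel_size=5, stride=1, padding=2, bias=False),
        nn.BatchNorm2d(output_channels),
        nn.ReLU(),
    )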

How to create a CNN for image classification with dynamic input

I would like to create a fully convolutional network for binary image classification in PyTorch that can take dynamic input image sizes, but I don't quite understand conceptually the idea behind changing the final layer from a fully connected layer to a convolutional layer. Here and here both state that this is possible by using a 1x1 convolution.
Suppose I have a 16x16x1 image as input to the CNN. After several convolutions, the output is 16x16x32. If using a fully connected layer, I can produce a single-value output by creating 16*16*32 weights and feeding it to a single neuron. What I don't understand is how you would get a single-value output by applying a 1x1 convolution. Wouldn't you end up with a 16x16x1 output?
Check this link: http://cs231n.github.io/convolutional-networks/#convert
In this case, your convolution layer should be a 16 x 16 filter with 1 output channel. This will convert the 16 x 16 x 32 input into a single output.
Sample code to test:
from keras.layers import Conv2D, Input
from keras.models import Model
import numpy as np
input = Input((16,16,32))
output = Conv2D(1, 16)(input)
model = Model(input, output)
print(model.summary()) # check the output shape
output = model.predict(np.zeros((1, 16, 16, 32))) # check on sample data
print(f'output is {np.squeeze(output)}')
This fully convolutional network approach is useful in segmentation tasks with patch-based approaches, since you can speed up prediction (inference) by feeding in a bigger portion of the image.
For classification tasks, you usually have an FC layer at the end. In that case, a layer like AdaptiveAvgPool2d is used, which ensures the FC layer sees a constant input feature size irrespective of the input image size.
https://pytorch.org/docs/stable/nn.html#adaptiveavgpool2d
See this pull request for torchvision VGG: https://github.com/pytorch/vision/pull/747
In case of Keras, GlobalAveragePooling2D. See the example, "Fine-tune InceptionV3 on a new set of classes".
https://keras.io/applications/
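As a quick illustration of the AdaptiveAvgPool2d route (my own sketch, not taken from either linked page): the adaptive pool always produces a 1x1 spatial map, so the final linear layer sees a fixed-size vector no matter the image size.

import torch
import torch.nn as nn

# Binary classifier that accepts variable input sizes.
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),  # -> N x 32 x 1 x 1 for any H, W
    nn.Flatten(),
    nn.Linear(32, 1),         # single logit for binary classification
)

print(model(torch.randn(1, 1, 16, 16)).shape)    # torch.Size([1, 1])
print(model(torch.randn(1, 1, 100, 100)).shape)  # torch.Size([1, 1])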
I hope you are familiar with Keras. Your image is 16*16*1. The image will pass through Keras convolutional layers, but first we have to create the model, e.g. model=Sequential(), which gives us a Keras model instance. Now we add a convolutional layer with our parameters:
model.add(Conv2D(20,(2,2),padding="same"))
Here we are applying 20 filters to the image, so it becomes 16*16*20. For richer features we add more conv layers:
model.add(Conv2D(32,(2,2),padding="same"))
Now we apply 32 filters, after which the feature map has size 16*16*32.
Don't forget to put an activation after the conv layers. If you are new to this, you should study activations, optimization and the loss of the network; these are the basic parts of neural networks.
Now it's time to move to the fully connected layers. First we need to flatten the feature map, because a fully connected layer only works on 2D tensors of shape (no_of_examples, features); in your case
the dimension after flattening will be (16*16*32):
model.add(Flatten())
After flattening, the network feeds it to the fully connected layers:
model.add(Dense(32))
model.add(Activation("relu"))
model.add(Dense(8))
model.add(Activation("relu"))
model.add(Dense(2))
because you have a binary classification problem. If you have to classify 3 classes, the last layer will have 3 neurons; if you have to classify 10 classes, your last dense layer will have 10 neurons.
model.add(Activation("softmax"))
model.compile(loss='binary_crossentropy',
optimizer=Adam(),
metrics=['accuracy'])
return model
After this you have to fit the model:
estimator = model(classes=2)
estimator.fit(X_train, y_train)
Full code:
from keras.models import Sequential
from keras.layers import Conv2D, Activation, Flatten, Dense
from keras.optimizers import Adam

def model(classes):
    model = Sequential()
    # conv block =====> Conv2D =====> relu
    model.add(Conv2D(20, (5, 5), padding="same", input_shape=(16, 16, 1)))
    model.add(Activation("relu"))
    model.add(Conv2D(32, (5, 5), padding="same"))
    model.add(Activation("relu"))
    model.add(Flatten())
    model.add(Dense(32))
    model.add(Activation("relu"))
    model.add(Dense(8))
    model.add(Activation("relu"))
    # softmax classifier: one output neuron per class
    model.add(Dense(classes))
    model.add(Activation("softmax"))
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(lr=0.0001, decay=1e-6),
                  metrics=['accuracy'])
    return model
You can take help from this kernel.