Difference between conv2d and conv2dtranspose with kernel size 1 - deep-learning

I understand that conv2d is used for downsampling and conv2dtranspose is the opposite (upsampling). However, assuming we are not using stride or padding here, is there a difference between the two?
Downsampling means reducing the size of the input's spatial dimensions. For example, if you have an input of shape (Batch Size = 5, Channels = 3, Height = 8, Width = 8) and you reduce the height and width using max pooling (stride=2, kernel_size=2), the output becomes (Batch Size = 5, Channels = 3, Height = 4, Width = 4). That's downsampling; the opposite is upsampling (increasing the height and width dimensions).
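A quick shape check of that max-pooling example, as a minimal PyTorch sketch:

import torch

x = torch.randn(5, 3, 8, 8)                        # (batch, channels, height, width)
pool = torch.nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x).shape)                               # torch.Size([5, 3, 4, 4])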
For example, here are the two layers in question:
classifier1 = torch.nn.Conv2d(in_channels=10, out_channels=5, kernel_size=1)
classifier2 = torch.nn.ConvTranspose2d(in_channels=10, out_channels=5, kernel_size=1)

Operation-wise, there is no difference. ConvTranspose2d() inserts stride - 1 zeros in between all rows and columns, adds kernel_size - padding - 1 zeros of padding, and then does exactly the same thing as Conv2d(). With the default arguments this changes nothing.
Though if you actually run them back to back like this on the same input, the results will differ unless you explicitly equalize the initial weights, of course.
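A minimal sketch of that equalization (assuming bias is disabled for simplicity). Note that Conv2d stores weights as (out, in, kH, kW) while ConvTranspose2d stores them as (in, out, kH, kW), so the in/out axes must be swapped:

import torch

conv = torch.nn.Conv2d(in_channels=10, out_channels=5, kernel_size=1, bias=False)
deconv = torch.nn.ConvTranspose2d(in_channels=10, out_channels=5, kernel_size=1, bias=False)

# Copy the conv weights into the transposed conv, swapping the in/out axes.
with torch.no_grad():
    deconv.weight.copy_(conv.weight.permute(1, 0, 2, 3))

x = torch.randn(5, 10, 8, 8)
print(torch.allclose(conv(x), deconv(x), atol=1e-6))  # True: identical outputs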

Related

MIPS turning pixel into memory address

I have been given an exercise for my course and could use some help with it. We have to turn a pixel (row x column) into its memory address and print it. $gp is pixel (0, 0) and every pixel is 32 bits. How would I go about calculating, let's say, pixel (0, 1)?
(width = 32px, height = 16px)
I've looked everywhere in our course information and can't seem to find anything to help me out.
First you compute y * width + x = index. Then you multiply the index by the size of a pixel (32 bits = 4 bytes) to get the offset, and finally the address is $gp + offset.
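A small sketch of that arithmetic in Python rather than MIPS (the $gp value here is only a placeholder; any base address works the same way):

WIDTH = 32           # pixels per row
BYTES_PER_PIXEL = 4  # 32 bits

def pixel_address(gp_base, row, col):
    index = row * WIDTH + col           # y * width + x
    offset = index * BYTES_PER_PIXEL    # scale by the pixel size
    return gp_base + offset             # final address = $gp + offset

# Example: pixel (0, 1) lives 4 bytes past $gp
print(hex(pixel_address(0x10008000, 0, 1)))  # 0x10008004, assuming $gp = 0x10008000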

Same padding when kernel size is even

When the kernel size is odd, we can manually calculate the necessary padding to get the output in the same dimension as input such that it creates same padding.
But how can we calculate padding dimensions for kernels with even sizes (e.g. 2x2)?
Note these two formulas:
pad = (filter_size - 1) / 2
output feature map dimension = (input feature map dimension - filter_size + 2 * pad) / stride + 1
Let's assume you have an input dimension of 28x28 and you want same padding, which implies your output dimension should also be 28x28. I am also assuming a stride of 1.
Now to the calculation of the padding amount:
pad = (2 - 1) / 2 = 1/2
Substituting this value into the second formula:
output feature map = (28 - 2 + 2 * (1/2)) / 1 + 1 = 28
Hence the output feature map keeps the same dimension, which verifies the formula. The catch is that a padding of 1/2 cannot be passed directly, since padding must be an integer. In practice I used padding = 1 with dilation = 2, which gives an effective kernel size of dilation * (filter_size - 1) + 1 = 3 and results in same padding.
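A minimal PyTorch check of that workaround (the channel counts here are arbitrary):

import torch

# Even 2x2 kernel: padding=1 with dilation=2 keeps the 28x28 spatial size.
conv = torch.nn.Conv2d(1, 1, kernel_size=2, stride=1, padding=1, dilation=2)
x = torch.randn(1, 1, 28, 28)
print(conv(x).shape)  # torch.Size([1, 1, 28, 28])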

How to perform padding operation if the kernel stride is greater than the input shape dimensions in case of Maxpooling

I am trying to perform a max-pooling operation in Caffe.
The input size is 6 x 6 x 1 x 1024, whereas the kernel size is 7 x 7.
Am I supposed to add padding in order to perform the max pooling?
First, you haven't specified the stride; the kernel dimensions are 7x7, larger than your input, but that's the size, not the stride. Stride is how far you move between iterations, such as shifting 2 pixels (which effectively halves the size of the output).
You probably want to pad so that the center of the kernel (element (3, 3) with 0-based indexing) can be over each pixel of your input. This means that you need a 3-pixel pad ( (7-1)/2 ) in each direction.
Is that what you needed?
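As an illustration of that 3-pixel pad (shown here in PyTorch rather than Caffe, and assuming the 6 x 6 dimensions are the spatial ones):

import torch

x = torch.randn(1, 1024, 6, 6)                       # assumed (N, C, H, W) layout
pool = torch.nn.MaxPool2d(kernel_size=7, stride=1, padding=3)
print(pool(x).shape)                                 # torch.Size([1, 1024, 6, 6])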

caffe fully convolutional cnn - how to use the crop parameters

I am trying to train a fully convolutional network for my problem. I am using the implementation https://github.com/shelhamer/fcn.berkeleyvision.org .
I have different image sizes.
I am not sure how to set the 'Offset' param in the 'Crop' layer.
What are the default values for the 'Offset' param?
How to use this param to crop the images around the center?
According to the Crop layer documentation, it takes two bottom blobs and outputs one top blob. Let's call the bottom blobs as A and B, the top blob as T.
A -> 32 x 3 x 224 x 224
B -> 32 x m x n x p
Then,
T -> 32 x m x n x p
Regarding axis parameter, from docs:
Takes a Blob and crop it, to the shape specified by the second input Blob, across all dimensions after the specified axis.
which means, if we set axis = 1, then it will crop dimensions 1, 2, 3. If axis = 2, then T would have been of the size 32 x 3 x n x p. You can also set axis to a negative value, such as -1, which would mean the last dimension, i.e. 3 in this case.
Regarding the offset parameter, I checked $CAFFE_ROOT/src/caffe/proto/caffe.proto (around line 630) and did not find any default value for it, so I assume that you have to provide that parameter, otherwise it will result in an error. However, I may be wrong.
Now, Caffe knows that you need a blob of size m on the first cropped axis. We still need to tell Caffe where to crop from. That's where offset comes in. If offset is 10, then your blob of size m will be cropped starting at index 10 and ending at 10 + m - 1 (for a total size of m). Set a single offset value to crop by that amount in all the cropped dimensions (which are determined by axis, remember? In this case 1, 2, 3). Otherwise, if you want to crop each dimension differently, you have to specify a number of offsets equal to the number of dimensions being cropped (in this case 3). So, to sum up:
If you have a blob of size 32 x 3 x 224 x 224 and you want to crop a center part of size 32 x 3 x 32 x 64, then you would write the crop layer as follows:
layer {
  name: "T"
  type: "Crop"
  bottom: "A"
  bottom: "B"
  top: "T"
  crop_param {
    axis: 2
    offset: 96
    offset: 80
  }
}
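For reference, those offsets come from centering the crop; a tiny sketch of the arithmetic:

in_shape = (32, 3, 224, 224)
out_shape = (32, 3, 32, 64)
axis = 2

# Center crop: offset = (input_size - output_size) // 2 for each cropped axis.
offsets = [(i - o) // 2 for i, o in zip(in_shape[axis:], out_shape[axis:])]
print(offsets)  # [96, 80]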

How to handle boundaries in conv/pool of conv-nets?

When a convolution uses a kernel size of 4 and a stride of 4 while the input size is only 10, it will fail when trying to do the third convolution operation at the boundary of the input. So should the input be implicitly padded with zeros at the boundary to avoid this problem? Is there any problem if I pad with other real numbers? Is it equivalent to automatically increasing the input size?
Besides, if I expect to get an output feature map of the same size, usually a kernel size of 3 and a pad size of 1 can be used; but when the kernel size is an even number, how do I decide the pad size on each side of the input?
Yes, the input must be padded with zeros to overcome the small input size problem. To compute the output feature map size at each level, use the following formulas:
H_out = (H_in + 2 x Padding_Height - Kernel_Height) / Stride_Height + 1
W_out = (W_in + 2 x Padding_Width - Kernel_Width) / Stride_Width + 1
You can choose the padding in accordance with the above formulas.
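A small sketch of that formula for the 10-wide, kernel-4, stride-4 case from the question (pad = 1 here is just one choice that makes the windows fit exactly):

import torch

def out_size(n_in, kernel, pad, stride):
    # H_out = (H_in + 2*pad - kernel) / stride + 1, using floor division
    return (n_in + 2 * pad - kernel) // stride + 1

print(out_size(10, 4, 1, 4))  # 3

# Cross-check with an actual convolution layer:
conv = torch.nn.Conv2d(1, 1, kernel_size=4, stride=4, padding=1)
print(conv(torch.randn(1, 1, 10, 10)).shape)  # torch.Size([1, 1, 3, 3])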