How to handle boundaries in conv/pool of conv-nets? - deep-learning

When a convolution uses a kernel size of 4 and a stride of 4 but the input size is only 10, the third convolution step fails at the boundary of the input. Should the input be implicitly padded with zeros at the boundary to avoid this problem? Is there any problem if I pad with other real numbers instead? Is padding equivalent to automatically increasing the input size?
Besides, if I expect an output feature map of the same size as the input, a kernel size of 3 with a pad size of 1 is the usual choice. But when the kernel size is an odd number, how do I decide the pad size on each side of the input?

Yes, the input must be padded with zeros to overcome the small input size problem. To compute the size of the output feature map at each layer, use the following formulas (the division rounds down):
H_out = (H_in + 2 x Padding_Height - Kernel_Height) / Stride_Height + 1
W_out = (W_in + 2 x Padding_Width - Kernel_Width) / Stride_Width + 1
Choose the padding so that these formulas give the output size you want.
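As a quick check of the formula on the numbers from the question (input 10, kernel 4, stride 4), here is a minimal sketch. The function name conv_output_size is made up for illustration, and it assumes the division rounds down, which is the usual behavior for convolution layers.

```python
def conv_output_size(in_size, kernel, stride=1, padding=0):
    """Output length along one axis, assuming floor division (an assumption,
    although it is the common convention for convolution layers)."""
    return (in_size + 2 * padding - kernel) // stride + 1


# Without padding the third window does not fit, so only 2 outputs are produced.
no_pad = conv_output_size(10, kernel=4, stride=4, padding=0)

# One zero on each side makes the padded length 12, which fits 3 windows exactly.
with_pad = conv_output_size(10, kernel=4, stride=4, padding=1)

print(no_pad, with_pad)
```

With padding 0 the last two input elements are simply never covered by a window; padding 1 lets the windows cover positions 0-3, 4-7 and 8-11 of the padded input.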

Related

MIPS turning pixel into memory address

I have been given an exercise for my course and could use some help with it. We have to turn a pixel (row x column) into its memory address and print it. $gp is pixel 0x0 and every pixel is 32 bits. How would I go about calculating let's say pixel 0,1?
(width = 32px, height = 16px)
I've looked everywhere in our course information and can't seem to find anything to help me out.
First compute the index: index = y * width + x. Then multiply the index by the size of a pixel in bytes (4, since every pixel is 32 bits) to get the offset. The address is $gp + offset.
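The same arithmetic can be sketched outside MIPS to check the numbers. In this sketch GP_BASE is a hypothetical value chosen only for illustration (the real value of $gp depends on your simulator), and pixel 0,1 is read as row 0, column 1, which is also an assumption.

```python
GP_BASE = 0x10008000  # hypothetical value for $gp, for illustration only


def pixel_address(row, col, width=32, bytes_per_pixel=4, base=GP_BASE):
    """Address of a pixel in a row-major framebuffer starting at `base`."""
    index = row * width + col          # position of the pixel in the flat array
    offset = index * bytes_per_pixel   # 32-bit pixels take 4 bytes each
    return base + offset


print(hex(pixel_address(0, 1)))  # one pixel past the base
print(hex(pixel_address(1, 0)))  # one full row (32 pixels) past the base
```

In MIPS the multiplication by 4 is usually done with a shift left by 2.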

Same padding when kernel size is even

When the kernel size is odd, we can manually calculate the padding needed to make the output the same size as the input (same padding).
But how can we calculate the padding for kernels with even sizes (e.g. 2x2)?
Note these two formulas:
pad = (filter_size - 1) / 2
o/p feature map dimension = (i/p feature map dimension - filter_size + 2 * pad) / stride + 1
Let's assume you have an i/p dimension of 28x28 and you want same padding, which implies your o/p dimension should also be 28x28. I am assuming a stride of 1.
Now for the padding amount:
pad = (2 - 1) / 2
= 1 / 2
Substituting this value into the second formula:
o/p feature map dimension = (28 - 2 + 2 * (1/2)) / 1 + 1
= 28
The result matches the i/p dimension, which verifies the padding. A pad of 1/2 per side cannot be applied literally; it corresponds to a total padding of 1 pixel, which in practice is split unevenly (0 on one side, 1 on the other).
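I used a padding of 1 and a dilation of 2, which resulted in same padding. (For a 2x2 kernel, a dilation of 2 makes the kernel span 3 pixels, so a pad of 1 on each side preserves the size.)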
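To make both answers concrete, here is a small sketch that computes the padding per side for stride 1. The function names are hypothetical, and putting the extra pixel on the trailing side is just one possible convention, not something the answers above prescribe.

```python
def same_padding(kernel_size, dilation=1):
    """Padding (before, after) along one axis that keeps the size at stride 1.

    Assumption: when the total is odd, the extra pixel goes on the trailing side.
    """
    effective = dilation * (kernel_size - 1) + 1  # span of the dilated kernel
    total = effective - 1                         # total padding needed
    before = total // 2
    after = total - before
    return before, after


def output_size(in_size, kernel_size, pad_before, pad_after, dilation=1):
    """Output length at stride 1 with possibly uneven padding."""
    effective = dilation * (kernel_size - 1) + 1
    return in_size + pad_before + pad_after - effective + 1


for k, d in [(3, 1), (2, 1), (2, 2)]:
    before, after = same_padding(k, d)
    print(k, d, (before, after), output_size(28, k, before, after, d))
```

For the 2x2 kernel this gives an uneven split of the single padding pixel, and for the 2x2 kernel with dilation 2 it gives 1 pixel on each side; in all three cases the output stays at 28.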

How to perform padding operation if the kernel stride is greater than the input shape dimensions in case of Maxpooling

I am trying to perform a max pooling operation in Caffe.
The input size is 6 x 6 x 1 x 1024, whereas the kernel size is 7 x 7.
Am I supposed to pad the input in order to perform max pooling?
First, you haven't specified the stride; the kernel dimensions are 7x7, larger than your input, but that's the size, not the stride. Stride is how far you move between iterations, such as shifting 2 pixels (which effectively halves the size of the output).
You probably want to pad so that the center of the kernel (element (3, 3) with 0-based indexing) can be over each pixel of your input. This means that you need a 3-pixel pad ( (7-1)/2 ) in each direction.
Is that what you needed?
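The effect of a 3-pixel pad with a 7x7 kernel can be sketched with NumPy on a single 6x6 channel. This is an illustration only, not Caffe code: the function name is made up, the stride is assumed to be 1, and the padding uses negative infinity so that padded cells never win the maximum, which is an assumption about the behavior you want.

```python
import numpy as np


def max_pool_centered(x, k):
    """Stride-1 max pooling with the kernel centered on every input pixel.

    Assumes an odd kernel size k; pads with -inf so padding is never selected.
    """
    x = np.asarray(x, dtype=float)
    pad = (k - 1) // 2                      # 3 for a 7x7 kernel
    padded = np.pad(x, pad, mode="constant", constant_values=-np.inf)
    h, w = x.shape
    out = np.empty((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            # The window in the padded array is centered on input pixel (i, j).
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out


x = np.arange(36).reshape(6, 6)
pooled = max_pool_centered(x, 7)
print(pooled.shape)
```

The output has the same 6x6 shape as the input, because every input pixel gets exactly one window centered on it.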

caffe fully convolutional cnn - how to use the crop parameters

I am trying to train a fully convolutional network for my problem. I am using the implementation https://github.com/shelhamer/fcn.berkeleyvision.org .
I have different image sizes.
I am not sure how to set the 'Offset' param in the 'Crop' layer.
What are the default values for the 'Offset' param?
How to use this param to crop the images around the center?
According to the Crop layer documentation, it takes two bottom blobs and outputs one top blob. Let's call the bottom blobs A and B and the top blob T.
A -> 32 x 3 x 224 x 224
B -> 32 x m x n x p
Then,
T -> 32 x m x n x p
Regarding axis parameter, from docs:
Takes a Blob and crop it, to the shape specified by the second input Blob, across all dimensions after the specified axis.
This means that if we set axis = 1, dimensions 1, 2 and 3 are cropped. If axis = 2, T would have size 32 x 3 x n x p. You can also set axis to a negative value such as -1, which refers to the last dimension, i.e. dimension 3 in this case.
Regarding offset parameter, I checked out $CAFFE_ROOT/src/caffe/proto/caffe.proto (on line 630), I did not find any default value for offset parameter, so I assume that you have to provide that parameter, otherwise it will result in an error. However, I may be wrong.
Now Caffe knows that you need a blob of size m along axis 1. We still need to tell Caffe where to crop from, and that is where offset comes in. If offset is 10, your blob of size m is cropped starting at index 10 and ending at 10 + m - 1 (a total of m elements). Set a single offset value to crop by that amount in all the cropped dimensions (which are determined by axis; in this case 1, 2, 3). If you want to crop each dimension differently, specify as many offsets as there are dimensions being cropped (in this case 3). To sum up:
If you have a blob of size 32 x 3 x 224 x 224 and you want to crop a center part of size 32 x 3 x 32 x 64, you would write the crop layer as follows:
layer {
  name: "T"
  type: "Crop"
  bottom: "A"
  bottom: "B"
  top: "T"
  crop_param {
    axis: 2
    offset: 96
    offset: 80
  }
}
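The offsets 96 and 80 in this example are just half of the size difference along each cropped axis. A minimal sketch of that calculation follows; the helper name center_crop_offsets is hypothetical and not part of Caffe.

```python
def center_crop_offsets(src_shape, dst_shape, axis):
    """Offsets that center dst inside src for every dimension from `axis` on."""
    offsets = []
    for src, dst in zip(src_shape[axis:], dst_shape[axis:]):
        if dst > src:
            raise ValueError("crop target is larger than the source blob")
        offsets.append((src - dst) // 2)  # equal margin on both sides
    return offsets


# Blob A is 32 x 3 x 224 x 224, the target is 32 x 3 x 32 x 64, cropping from axis 2.
print(center_crop_offsets((32, 3, 224, 224), (32, 3, 32, 64), axis=2))
```

Since your images have different sizes, the offsets would have to be recomputed for each input shape if you want the crop to stay centered.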

Practice computing grid size for CUDA

dim3 block(4, 2);
dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
I found this code in Professional CUDA C Programming on page 53. It's meant to be a naive example of matrix multiplication. nx is the number of columns and ny is the number of rows.
Can you explain how the grid size is computed? Why is block.x added to nx and then subtracted by 1?
There is a preview (https://books.google.com/books?id=_Z7rnAEACAAJ&printsec=frontcover#v=onepage&q&f=false) but page 53 is missing.
This is the standard CUDA idiom for determining the minimum number of blocks in each dimension (the "grid") that completely cover the desired input. This could be expressed as ceil(nx/block.x), that is, figure out how many blocks are needed to cover the desired size, then round up.
But full floating point division and ceil are more expensive than necessary. Instead, since C integer division of positive values rounds down (a "floor" operation), you can add the divisor - 1 before dividing to get the effect of a "ceiling" operation.
Try a few examples: if nx = 10, then nx + block.x - 1 is 13, and by integer division 13 / 4 is 3, so you need 3 blocks of size 4.
As you noted in the comment, + block.x pushes the floor up to a ceiling, and the - 1 handles sizes that are exact multiples of the divisor, e.g. (12 + 4) / 4 would be 4 when we actually want (12 + 4 - 1) / 4, which is 3.
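The idiom is easy to check on the host with plain integer arithmetic. The sketch below uses Python's floor division to stand in for C integer division on positive values; the helper names ceil_div and grid_dims are made up for illustration.

```python
def ceil_div(n, d):
    """Ceiling of n / d for positive integers, using only integer division."""
    return (n + d - 1) // d


def grid_dims(nx, ny, block_x=4, block_y=2):
    """Number of blocks needed to cover an nx-by-ny problem."""
    return ceil_div(nx, block_x), ceil_div(ny, block_y)


print(ceil_div(10, 4))   # 10 elements need a partially filled third block
print(ceil_div(12, 4))   # 12 elements fit exactly, no extra block
print((12 + 4) // 4)     # without the - 1 an unnecessary fourth block appears
print(grid_dims(10, 5))
```

Because the last block in each dimension may be only partially used, a kernel launched with such a grid normally needs a bounds check on its computed index.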