This might be very basic, but I'm confused about why VGG net has multiple stacked convolutional layers with 3x3 filters. What specifically happens when we convolve the same image twice or more?
Nothing, if you don't have a non-linear transformation in between. In that case you can always collapse the stack into a single convolutional layer that computes the same thing.
But VGG uses ReLU activation functions. This makes it possible to learn non-linear transformations of the data.
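To make the collapse concrete, here is a small sketch in plain Python. It uses 1-D signals instead of 2-D images for brevity, and the helper names `conv_full` and `relu` are my own, not anything from VGG. Two stacked 3-tap filters without a non-linearity behave exactly like one 5-tap filter (the 1-D analogue of two 3x3 layers covering a 5x5 receptive field), while putting a ReLU between them breaks that equivalence.

```python
def conv_full(signal, kernel):
    """Full 1-D convolution of two lists of floats."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


# Arbitrary illustrative values, not taken from any trained network.
x = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0]
k1 = [1.0, -1.0, 0.5]
k2 = [0.5, 2.0, -1.0]

# Two linear layers applied one after the other.
stacked = conv_full(conv_full(x, k1), k2)

# One layer whose kernel is the convolution of the two kernels.
merged_kernel = conv_full(k1, k2)
collapsed = conv_full(x, merged_kernel)

# Same two layers, but with a ReLU in between.
with_relu = conv_full(relu(conv_full(x, k1)), k2)

same_without_relu = all(abs(a - b) < 1e-9 for a, b in zip(stacked, collapsed))
same_with_relu = all(abs(a - b) < 1e-9 for a, b in zip(with_relu, collapsed))

print(len(merged_kernel), same_without_relu, same_with_relu)
```

The merged kernel has 5 taps, so the purely linear stack gains receptive field but no expressive power; only the version with the ReLU computes something a single filter cannot.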
I'm trying to understand why, for example, the MATLAB page describes AlexNet as:
AlexNet is a convolutional neural network that is 8 layers deep.
After using analyzeNetwork() to check the architecture, there are clearly 25 layers.
How are the 25 layers related to "8 layers deep"? What's the difference between those two values?
I'm sure that I'm missing something, but I don't know what it is.
The MATLAB documentation is probably not clear enough. It should maybe talk about blocks (personally, I prefer this word). If you look at the figure:
Many "layers" end in a number that indicates the block they belong to.
The term "layer" is often ambiguous: some people count convolution + activation + batch norm as a single layer. There is no consensus. MATLAB appears to count only the layers that have weights; for AlexNet that is five convolutional and three fully connected layers, which gives 8.
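As a rough illustration of that counting rule, the sketch below lists the 25 layer names as I recall them from the standard AlexNet layout and counts only the convolutional and fully connected ones. Treat the exact names as an assumption and check them against your own analyzeNetwork() output.

```python
# Layer names assumed from the standard AlexNet layout;
# verify them against what analyzeNetwork() shows on your machine.
layers = [
    "data",
    "conv1", "relu1", "norm1", "pool1",
    "conv2", "relu2", "norm2", "pool2",
    "conv3", "relu3",
    "conv4", "relu4",
    "conv5", "relu5", "pool5",
    "fc6", "relu6", "drop6",
    "fc7", "relu7", "drop7",
    "fc8", "prob", "output",
]

# Only convolutional and fully connected layers carry learnable weights here.
weighted = [name for name in layers if name.startswith(("conv", "fc"))]

print(len(layers))    # every entry, including ReLU, pooling, dropout, etc.
print(len(weighted))  # entries that have weights
```

With this list, the first count is 25 and the second is 8, which matches the two numbers in the question.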
I have a simple encoder-decoder network. The encoder has several conv1d layers with ReLU between them and a linear layer at the end; the decoder consists of conv1d layers with ReLU between them (no batch norm or dropout).
Using this model I am trying to overfit a single example: I work with batch size 1 and always give the same input and the same desired output, but without success. The loss does go down to some threshold, but no matter what I do I can't get it below that bound, and the output is useless. I have tried a more sophisticated encoder/decoder, different hyperparameters, and different preprocessing of my data, but I can never get the loss below that threshold.
Just for the record, if I feed the desired output as the input (so the network learns the identity function), it works, but that doesn't help me.
I would appreciate any help or ideas about what the problem might be.
Try training for more epochs with a lower learning rate.
Try increasing the size of your Dense layers.
Try avoiding any Dropout layers.
These changes make the model more prone to overfitting, which is what you want here.
I'm currently training multiple recurrent convolutional neural networks with deep Q-learning for the first time.
The input is an 11x11x1 matrix. Each network consists of 4 convolutional layers with dimensions 3x3x16, 3x3x32, 3x3x64, and 3x3x64. I use stride=1 and padding=1. Each convolutional layer is followed by a ReLU activation. The output is fed into a fully connected dense layer with 128 units and then into an LSTM layer, also with 128 units. Two subsequent dense layers produce separate advantage and value streams.
Training has now been running for a couple of days, and I've just realized (after reading some related papers) that I didn't add an activation function after the first dense layer, as most of the papers do. Would adding one significantly improve my network? Since I'm training the networks for university, I have a deadline and don't have unlimited time for training. However, I don't have enough experience with training neural networks to decide what to do...
What do you suggest? I'm thankful for every answer!
Speaking generally, an activation function lets you introduce non-linearity into your network.
The purpose of an activation function is to add some kind of non-linear property to the function, which is a neural network. Without the activation functions, the neural network could perform only linear mappings from inputs x to the outputs y. Why is this so?
Without the activation functions, the only mathematical operation during forward propagation would be dot products between an input vector and a weight matrix. Since a single dot product is a linear operation, successive dot products would be nothing more than multiple linear operations repeated one after the other. And successive linear operations can be collapsed into one single linear operation.
A neural network without any activation function would not be able to realize such complex mappings mathematically and would not be able to solve tasks we want the network to solve.
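The argument above can be checked numerically. The following is a minimal sketch in plain Python with made-up 2x2 weight matrices (`W1`, `W2`, and the helper functions are illustrative, not from any particular network): two layers without an activation give the same output as one layer with the product matrix, and inserting a ReLU changes the result.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, t) for t in v]


# Hypothetical weights and input, chosen only for illustration.
W1 = [[1.0, -2.0], [0.5, 1.0]]
W2 = [[2.0, 1.0], [-1.0, 3.0]]
x = [1.0, 2.0]

two_linear_layers = matvec(W2, matvec(W1, x))    # W2 (W1 x)
one_linear_layer = matvec(matmul(W2, W1), x)     # (W2 W1) x
with_activation = matvec(W2, relu(matvec(W1, x)))

print(two_linear_layers, one_linear_layer, with_activation)
```

Here the first two results are identical, while the version with the ReLU differs because the negative hidden value is clipped to zero before the second layer sees it.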
What is the role of the fully connected (FC) layer in deep learning? I've seen networks with 1, 2, or 3 FC layers. Can anyone explain this to me?
Thanks a lot
The fully connected layers are able to very effectively learn non-linear combinations of input features. Let's take a convolutional neural network for example.
The output from the convolutional layers represents high-level features in the data. While that output could be flattened and connected to the output layer, adding a fully-connected layer is a (usually) cheap way of learning non-linear combinations of these features.
Essentially the convolutional layers are providing a meaningful, low-dimensional, and somewhat invariant feature space, and the fully-connected layer is learning a (possibly non-linear) function in that space.
I am using a pre-trained model to which I want to add an element-wise layer that multiplies the outputs of two layers: one is the output of a convolution layer with shape 1x1x256x256, and the other is the output of a convolution layer with shape 1x32x256x256. My question is: if we add an element-wise layer that multiplies the two outputs and sends the result to the next layer, should we train from scratch because the architecture has been modified, or is it still possible to use the pre-trained model?
Thanks
Indeed, making architectural changes puts the learned features at odds.
However, there's no reason not to use the learned weights for the layers below the change: these layers are not affected by the change, so they can benefit from the initialization.
As for the rest of the layers, I suppose initializing from the trained weights should be no worse than random initialization, so why not?
Don't forget to initialize any new layers with random weights (the default in Caffe is zero, and this might cause trouble for learning).
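To see why all-zero initialization can cause trouble, here is a toy sketch in plain Python rather than Caffe. It assumes two consecutive new layers without biases, a tanh hidden activation, and a squared-error loss; the function `gradients` and all values are hypothetical. With both layers at zero, every gradient is zero and nothing can start learning, whereas small non-zero weights give usable gradients.

```python
import math


def gradients(W1, w2, x, target):
    """Gradients of 0.5 * (y - target)**2 for y = w2 . tanh(W1 x)."""
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    h = [math.tanh(v) for v in z]
    y = sum(w * hi for w, hi in zip(w2, h))
    dy = y - target
    grad_w2 = [dy * hi for hi in h]
    grad_W1 = [[dy * w2[i] * (1.0 - h[i] ** 2) * xj for xj in x]
               for i in range(len(W1))]
    return grad_W1, grad_w2


x, target = [1.0, -2.0], 1.0

# Both new layers initialized to zero.
zero_W1, zero_w2 = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
gW1_zero, gw2_zero = gradients(zero_W1, zero_w2, x, target)

# Small hand-picked values standing in for a random initialization.
small_W1, small_w2 = [[0.1, -0.2], [0.3, 0.05]], [0.2, -0.1]
gW1_small, gw2_small = gradients(small_W1, small_w2, x, target)

zero_init_stuck = (all(g == 0.0 for row in gW1_zero for g in row)
                   and all(g == 0.0 for g in gw2_zero))
small_init_learns = (any(g != 0.0 for row in gW1_small for g in row)
                     and any(g != 0.0 for g in gw2_small))

print(zero_init_stuck, small_init_learns)
```

A single zero-initialized layer followed by already-trained layers can behave differently, so this only illustrates the worst case; random initialization avoids the question entirely.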