How necessary are activation functions after dense layers in neural networks? - deep-learning

I'm currently training multiple recurrent convolutional neural networks with deep Q-learning for the first time.
The input is an 11x11x1 matrix; each network consists of 4 convolutional layers with dimensions 3x3x16, 3x3x32, 3x3x64, 3x3x64. I use stride=1 and padding=1. Each conv layer is followed by a ReLU activation. The output is fed into a feedforward fully-connected dense layer with 128 units and after that into an LSTM layer, also containing 128 units. Two subsequent dense layers produce separate advantage and value streams.
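Roughly, the network looks like this (a minimal sketch, assuming a PyTorch-style implementation; names and num_actions are just illustrative):

import torch
import torch.nn as nn

# Sketch of the described network: 4 conv layers + ReLU, dense 128, LSTM 128,
# then separate advantage and value streams (dueling architecture).
class DuelingRecurrentQNet(nn.Module):
    def __init__(self, num_actions):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1, padding=1), nn.ReLU(),
        )
        self.fc1 = nn.Linear(64 * 11 * 11, 128)          # currently no activation after this
        self.lstm = nn.LSTM(128, 128, batch_first=True)
        self.advantage = nn.Linear(128, num_actions)
        self.value = nn.Linear(128, 1)

    def forward(self, x, hidden=None):
        # x: (batch, seq_len, 1, 11, 11)
        b, t = x.shape[:2]
        feats = self.conv(x.view(b * t, 1, 11, 11)).view(b * t, -1)
        feats = self.fc1(feats)                           # the ReLU in question would go here
        feats, hidden = self.lstm(feats.view(b, t, -1), hidden)
        adv, val = self.advantage(feats), self.value(feats)
        return val + adv - adv.mean(dim=-1, keepdim=True), hidden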
Training has been running for a couple of days now, and I've just realized (after reading some related papers) that I didn't add an activation function after the first dense layer (as most of the papers do). I wonder if adding one would significantly improve my network. Since I'm training the networks for university, I don't have unlimited time for training because of a deadline for my work. However, I don't have enough experience training neural networks to decide what to do...
What do you suggest? I'm thankful for every answer!

Generally speaking, using an activation function is what introduces a non-linear property into your network.
The purpose of an activation function is to add some kind of non-linearity to the function that the neural network computes. Without activation functions, the neural network could only perform linear mappings from the inputs x to the outputs y. Why is this so?
Without activation functions, the only mathematical operations during forward propagation would be matrix products between an input vector and the weight matrices. Since a single matrix product is a linear operation, successive products would be nothing more than multiple linear operations applied one after the other, and a composition of linear operations can always be collapsed into a single linear operation.
Most mappings we actually care about (pixels to labels, states to Q-values) are highly non-linear. A neural network without any activation function could not realize such complex mappings mathematically and would not be able to solve the tasks we want the network to solve.
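To see this concretely, here is a tiny check (a minimal sketch, assuming PyTorch): two dense layers with nothing in between collapse into a single linear map, while adding a ReLU breaks that equivalence.

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(5, 10)

# Two dense layers with no activation in between...
fc1, fc2 = nn.Linear(10, 20), nn.Linear(20, 3)
stacked = fc2(fc1(x))

# ...are equivalent to a single linear layer with W = W2 @ W1 and b = W2 @ b1 + b2.
W = fc2.weight @ fc1.weight
b = fc2.weight @ fc1.bias + fc2.bias
collapsed = x @ W.T + b
print(torch.allclose(stacked, collapsed, atol=1e-5))                      # True

# With a ReLU in between, the collapse no longer works:
print(torch.allclose(fc2(torch.relu(fc1(x))), collapsed, atol=1e-5))      # False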

Related

How to Train End to End model with Part RNN and Part Graph Neural Network?

I am currently working on a project in which I need to use RNNs as part of a neural network. Essentially, the RNN would take in a text of variable length and output a feature representation. This feature representation would then be combined with some more feature vectors and fed to a different graph neural network. Some loss would be calculated on the output of the graph neural network, and this loss would be backpropagated across the entire network, including the RNN, to train the whole model end to end.
However, I am not able to wrap my head around how I can use the RNN as part of another, non-sequential model. I use PyTorch for most of my work.
Can anyone suggest a way in which I might address this problem, or point me to any material that might be useful?
Thanks
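For context, the kind of wiring described above might look roughly like this (a minimal sketch, assuming PyTorch; the graph network is just a placeholder nn.Module standing in for whatever GNN is actually used):

import torch
import torch.nn as nn

# The RNN is just an nn.Module like anything else: encode the text, concatenate
# the resulting feature vector with the other features, pass it on, and call
# backward() on the final loss -- gradients flow back into the RNN automatically.
class TextEncoder(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):             # token_ids: (batch, seq_len)
        _, h_n = self.rnn(self.emb(token_ids))
        return h_n[-1]                         # final hidden state as the text feature

class EndToEndModel(nn.Module):
    def __init__(self, graph_net):
        super().__init__()
        self.text_encoder = TextEncoder()
        self.graph_net = graph_net             # any nn.Module; here a placeholder MLP

    def forward(self, token_ids, extra_features):
        text_feat = self.text_encoder(token_ids)
        combined = torch.cat([text_feat, extra_features], dim=-1)
        return self.graph_net(combined)

# Placeholder standing in for the real graph neural network.
graph_net = nn.Sequential(nn.Linear(128 + 32, 64), nn.ReLU(), nn.Linear(64, 1))
model = EndToEndModel(graph_net)
loss = model(torch.randint(0, 1000, (4, 12)), torch.randn(4, 32)).mean()
loss.backward()                                # gradients reach the RNN as well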

How to perform polynomial landmark detection with deep learning

I am trying to build a system to segment vehicles using a deep convolutional neural network. I am familiar with predicting a fixed number of points (i.e. ending a neural architecture with a Dense layer of 4 neurons to predict the (x, y) coordinates of 2 points). However, vehicles come in many different shapes and sizes, and one vehicle may require more segmentation points than another. How can I create a neural network that can output a varying number of values? I imagine I could use an RNN of some sort, but would like a little guidance. Thank you.
For example, in the following image the two vehicles have a different number of labeled keypoints.
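For reference, the fixed-size formulation mentioned above looks something like this (a tiny sketch, assuming PyTorch; handling a variable number of points is the open question):

import torch
import torch.nn as nn

# Fixed-size version: a backbone followed by a dense head with 2 * num_points outputs.
num_points = 2                                # two (x, y) landmarks -> 4 output neurons
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
head = nn.Linear(16, 2 * num_points)

coords = head(backbone(torch.randn(1, 3, 64, 64))).view(-1, num_points, 2)
print(coords.shape)                           # torch.Size([1, 2, 2]), fixed per image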

Effect of multiple convolutional layers in a CNN

It might be very basic, but I got confused trying to understand why VGG net has multiple convolutional layers with 3x3 filters. What specifically happens when we convolve the same image twice or more?
Nothing, if you don't have a non-linear transformation in between. Then you can always collapse the stack into a single convolutional layer that computes the same thing.
But VGG uses ReLU activation functions. This makes it possible to learn non-linear transformations of the data.
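To make the "collapses into a single layer" point concrete, here is a tiny check (a minimal sketch, assuming PyTorch): without a ReLU in between, the stack of convolutions is still a linear map; with a ReLU, it is not.

import torch
import torch.nn as nn

torch.manual_seed(0)
x, y = torch.randn(1, 1, 11, 11), torch.randn(1, 1, 11, 11)

# Two stacked 3x3 convolutions with no activation in between.
stack = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1, bias=False),
    nn.Conv2d(8, 8, 3, padding=1, bias=False),
)
# The composition is still linear: f(x + y) == f(x) + f(y) (up to float precision).
print(torch.allclose(stack(x + y), stack(x) + stack(y), atol=1e-5))                  # True

# Insert a ReLU between them and linearity is gone -- that's what depth buys in VGG.
stack_relu = nn.Sequential(stack[0], nn.ReLU(), stack[1])
print(torch.allclose(stack_relu(x + y), stack_relu(x) + stack_relu(y), atol=1e-5))   # False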

What is the role of a fully connected layer in deep learning?

What is the role of a fully connected (FC) layer in deep learning? I've seen some networks with 1 FC layer, some with 2, and some with 3. Can anyone explain this to me?
Thanks a lot
The fully connected layers are able to very effectively learn non-linear combinations of input features. Let's take a convolutional neural network for example.
The output from the convolutional layers represents high-level features in the data. While that output could be flattened and connected to the output layer, adding a fully-connected layer is a (usually) cheap way of learning non-linear combinations of these features.
Essentially the convolutional layers are providing a meaningful, low-dimensional, and somewhat invariant feature space, and the fully-connected layer is learning a (possibly non-linear) function in that space.
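As a small illustration of where the fully connected layers sit (a minimal sketch, assuming PyTorch; the shapes and class count are arbitrary):

import torch
import torch.nn as nn

# Conv layers extract spatial features; the FC layer then learns non-linear
# combinations of those flattened features before the final output layer.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),                                  # high-level conv features, flattened
    nn.Linear(32 * 8 * 8, 128), nn.ReLU(),         # FC layer: combinations of features
    nn.Linear(128, 10),                            # output layer (e.g. 10 classes)
)
print(model(torch.randn(4, 3, 32, 32)).shape)      # torch.Size([4, 10])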

Neural network: fit a function

Can you approximate a function (different from a line but still in the x,y plane: like cos, sin, arc, exp, etc) using a neural network with just an input, an output and a single layer of hidden neurons?
Yes, you can! In fact, that is what the Universal Approximation Theorem says. In short: a feed-forward network with a single hidden layer can approximate any continuous function (on a compact domain) to arbitrary accuracy. However, it does not say anything about the number of neurons in this layer (which can be very large) or about our ability to algorithmically optimize the weights of such a network. All it says is that such a network exists.
Here is a link to the original publication by Cybenko, who used the sigmoid activation function for the proof: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.441.7873&rep=rep1&type=pdf
And here is a more accessible derivation: http://mcneela.github.io/machine_learning/2017/03/21/Universal-Approximation-Theorem.html
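As a quick practical illustration of the theorem (a minimal sketch, assuming PyTorch; the layer width and training schedule are arbitrary), a single hidden layer can fit a smooth 1-D function such as sin(x) quite closely:

import torch
import torch.nn as nn

# Target: sin(x) on [-pi, pi]; model: one hidden layer of 64 tanh units.
x = torch.linspace(-3.14, 3.14, 256).unsqueeze(1)
y = torch.sin(x)

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()

print(loss.item())   # ends up very small -- the single hidden layer fits sin(x) closely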