Difference Between keras.layers.Dense(32) and keras.layers.SimpleRNN(32)? - deep-learning

What is the difference between keras.layers.Dense() and keras.layers.SimpleRNN()? I understand what a neural network and an RNN are, but from the API alone the intuition is just not clear. When I see keras.layers.Dense(32) I understand it as a layer with 32 neurons, but it is not clear whether SimpleRNN(32) means the same. I am a newbie with Keras.
1. How do Dense() and SimpleRNN() differ from each other?
2. Do Dense() and SimpleRNN() behave the same at any point?
3. If so, when? If not, what exactly is the difference between SimpleRNN() and Dense()?
4. It would be great if someone could help me visualize it.
What exactly is happening in
https://github.com/fchollet/keras/blob/master/examples/addition_rnn.py

Definitely different.
According to the Keras documentation, Dense implements the operation output = activation(dot(input, kernel) + bias); it is the basic building block of a feed-forward neural network.
SimpleRNN, by contrast, is a fully-connected RNN whose output is fed back into the layer's input at the next timestep.
So the structures of a plain feed-forward network and a recurrent network are different.
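As a rough illustration (a minimal NumPy sketch of the two operations, not the actual Keras implementation; the weight names W, U, b are just placeholders), Dense applies its kernel once per sample, while SimpleRNN applies a similar affine transform at every timestep and feeds its own previous output back in:

```python
import numpy as np

def dense_forward(x, W, b):
    # Dense: a single affine transform per sample, no notion of time.
    # x: (features,), W: (features, units), b: (units,)
    return np.tanh(x @ W + b)

def simple_rnn_forward(x_seq, W, U, b):
    # SimpleRNN: the same kind of transform applied at every timestep,
    # with the previous hidden state fed back in through U.
    # x_seq: (timesteps, features), U: (units, units)
    h = np.zeros(W.shape[1])
    for x_t in x_seq:
        h = np.tanh(x_t @ W + h @ U + b)
    return h  # final hidden state, shape (units,)
```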
To answer your questions:
1. The difference between Dense() and SimpleRNN() is the difference between a traditional feed-forward neural network and a recurrent neural network.
2. No. Each just defines the structure of a layer, but they work in different ways (see the shape sketch below).
3. Same as 1.
4. Check resources about neural networks and recurrent neural networks; there are plenty of them on the internet.
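If it helps to make this concrete, here is a small sketch (assuming the standard tf.keras API; the shapes are arbitrary). Dense(32) and SimpleRNN(32) both have 32 units, but Dense maps each sample straight to 32 outputs, while SimpleRNN expects a timestep dimension and also carries a 32x32 recurrent kernel for the feedback connection:

```python
import numpy as np
from tensorflow import keras

dense = keras.layers.Dense(32)
rnn = keras.layers.SimpleRNN(32)

x_flat = np.random.rand(4, 10).astype("float32")    # (batch, features)
x_seq = np.random.rand(4, 7, 10).astype("float32")  # (batch, timesteps, features)

print(dense(x_flat).shape)  # (4, 32): one transform per sample
print(rnn(x_seq).shape)     # (4, 32): 32 units run over 7 timesteps,
                            # only the final hidden state is returned
```

Dense(32) here holds a 10x32 kernel plus a bias, whereas SimpleRNN(32) additionally holds a 32x32 recurrent kernel, which is exactly the feedback path mentioned above.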

Related

How to compute the Hessian of a large neural network in PyTorch?

How to compute the Hessian matrix of a large neural network or transformer model like BERT in PyTorch? I know about torch.autograd.functional.hessian, but it seems to only compute the Hessian of a function, not of a neural network. I also saw the answer in How to compute hessian matrix for all parameters in a network in pytorch?. The problem is, I want to compute the Hessian with respect to the weights, but for large neural networks it is very inefficient to write the loss as a function of the weights. Is there a better way to do this? Any suggestion is appreciated. Thanks.
After some time I finally found a new feature in the PyTorch nightly build that solves this problem. The details are described in this comment: https://github.com/pytorch/pytorch/issues/49171#issuecomment-933814662. The solution uses the function torch.autograd.functional.hessian together with the new feature torch.nn.utils._stateless. Note that you have to install the nightly version of PyTorch to use this new feature.
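For reference, a minimal sketch of that approach (assuming a PyTorch build that ships the stateless utility; the nightly referenced above exposed it as torch.nn.utils._stateless, later releases as torch.nn.utils.stateless; the tiny Linear model is only a stand-in for a real network):

```python
import torch
from torch.nn.utils import stateless

model = torch.nn.Linear(4, 1)            # stand-in for a large network
x = torch.randn(8, 4)
y = torch.randn(8, 1)
names = list(dict(model.named_parameters()).keys())

def loss_wrt_params(*params):
    # Re-bind the given tensors as the module's parameters without mutating it,
    # so the loss becomes a pure function of those tensors.
    out = stateless.functional_call(model, dict(zip(names, params)), (x,))
    return torch.nn.functional.mse_loss(out, y)

hess = torch.autograd.functional.hessian(
    loss_wrt_params, tuple(model.parameters()))
# hess is a nested tuple with one block per pair of parameters
# (weight-weight, weight-bias, bias-weight, bias-bias here).
```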

Is batch normalization useful for small networks?

We know batch normalization (BN) speeds up training of deep neural networks. But does it help with small neural networks as well? I have been experimenting with a 6-layer convolutional-MLP network and I cannot see any benefit for BN in training this network.
Batch normalization is generally used for very deep neural networks. In such networks, the outputs of a layer that sits behind many other layers keep fluctuating with every mini-batch, so the next layers have to keep chasing a moving target.
However, for shallow neural networks, this is not much of a problem, since the fluctuations are within a narrow range and do not pose the problem of a moving target. So for shallow neural networks, you can choose to train without batch normalization too and it will work as expected.
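One practical way to check this for your own network is to build the same small model twice, with and without BN, train both on the same data, and compare the curves. A minimal Keras sketch (the layer sizes are placeholders, not your actual 6-layer network):

```python
from tensorflow import keras

def build_model(use_bn: bool) -> keras.Model:
    # Small conv + MLP network; BatchNormalization layers are optional.
    layers = [keras.Input(shape=(28, 28, 1)),
              keras.layers.Conv2D(16, 3, activation="relu")]
    if use_bn:
        layers.append(keras.layers.BatchNormalization())
    layers.append(keras.layers.Conv2D(32, 3, activation="relu"))
    if use_bn:
        layers.append(keras.layers.BatchNormalization())
    layers += [keras.layers.Flatten(),
               keras.layers.Dense(64, activation="relu"),
               keras.layers.Dense(10, activation="softmax")]
    return keras.Sequential(layers)

# Train build_model(True) and build_model(False) on the same data
# and compare the training histories.
```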

How to train an end-to-end model that combines an RNN and a graph neural network?

I am currently working on a project in which I need to use an RNN as part of a neural network. Essentially, the RNN would take in a text of variable length and output a feature representation. This feature representation would then be combined with some more feature vectors and fed to a graph neural network. A loss would be computed on the output of the graph neural network and backpropagated across the entire network, including the RNN, to train the whole model end to end.
However, I am not able to wrap my head around how I can use the RNN as part of another, non-sequential model. I use PyTorch for most of my work.
Can anyone suggest a way to address this problem, or point me to any material that might be useful?
Thanks
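Not a complete answer, but the usual pattern in PyTorch is to wrap both parts in one nn.Module, so that autograd tracks the whole computation and the loss on the graph network's output backpropagates through the RNN as well. A hedged sketch (the graph_net module is a hypothetical stand-in for whatever GNN you use, e.g. one built with PyTorch Geometric):

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    # Encodes a padded, variable-length token sequence into a fixed-size vector.
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, tokens, lengths):
        x = self.embed(tokens)
        packed = nn.utils.rnn.pack_padded_sequence(
            x, lengths.cpu(), batch_first=True, enforce_sorted=False)
        _, h_n = self.rnn(packed)
        return h_n[-1]                              # (batch, hidden_dim)

class EndToEndModel(nn.Module):
    def __init__(self, encoder, graph_net):
        super().__init__()
        self.encoder = encoder
        self.graph_net = graph_net                  # hypothetical GNN module

    def forward(self, tokens, lengths, extra_features, graph):
        text_feat = self.encoder(tokens, lengths)
        node_feat = torch.cat([text_feat, extra_features], dim=-1)
        return self.graph_net(node_feat, graph)

# One optimizer over model.parameters() updates the RNN and the GNN together,
# because the loss on the final output is differentiable through both parts.
```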

What is the role of fully connected layer in deep learning?

What is the role of the fully connected (FC) layer in deep learning? I've seen networks with one FC layer, some with two, and some with three. Can anyone explain this to me?
Thanks a lot
The fully connected layers are able to very effectively learn non-linear combinations of input features. Let's take a convolutional neural network for example.
The output from the convolutional layers represents high-level features in the data. While that output could be flattened and connected to the output layer, adding a fully-connected layer is a (usually) cheap way of learning non-linear combinations of these features.
Essentially the convolutional layers are providing a meaningful, low-dimensional, and somewhat invariant feature space, and the fully-connected layer is learning a (possibly non-linear) function in that space.
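To make that concrete, a minimal tf.keras sketch (the layer sizes and the 10-way output are arbitrary): the convolutional base extracts features, and the fully-connected layer then learns non-linear combinations of them before the output layer.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.Conv2D(32, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Conv2D(64, 3, activation="relu"),
    keras.layers.Flatten(),
    # Fully connected layer: learns non-linear combinations of the conv features.
    keras.layers.Dense(128, activation="relu"),
    # Output layer (10-way classification here, as an example).
    keras.layers.Dense(10, activation="softmax"),
])
```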

How to train an end-to-end deep neural network (FCN) for MatConvNet?

MatConvNet supports a convolution transpose layer ('convt'), but I cannot find an example in its source code or in its documentation. Is there any guidance?
I noticed there is an example at https://github.com/vlfeat/matconvnet-fcn, but it involves many unrelated things. I would like the example to be as simple as possible.