Can transformer replace graph neural networks in node-level tasks? - deep-learning

Is GNN a special transformer?
Just a thoughtful question.

Related

Initializing the forget gate in LSTM network using Pytorch

I have read that regulating the bias term is important to improve the performance of LSTM networks. Here are some sources:
https://www.exxactcorp.com/blog/Deep-Learning/5-types-of-lstm-recurrent-neural-networks-and-what-to-do-with-them
http://proceedings.mlr.press/v37/jozefowicz15.pdf
Does anyone know how to actually implement this in Pytorch?

Can the attentional mechanism be applied to structures like feedforward neural networks?

Recently, I have learned decoder-encoder network and attention mechanism, and found that many papers and blogs implement attention mechanism on RNN network.
I am interested if other networks can incorporate attentional mechanisms.For example, the encoder is a feedforward neural network and decoder is an RNN. Can feedforward neural networks without time series use attentional mechanisms? If you can, please give me some suggestions.Thank you in advance!
In general Feed forward networks treat features as independent; convolutional networks focus on relative location and proximity; RNNs and LSTMs have memory limitations and tend to read in one direction.
In contrast to these, attention and the transformer can grab context about a word from distant parts of a sentence, both earlier and later than the word appears, in order to encode information to help us understand the word and its role in the system called sentence.
There is a good model for feed-forward network with attention mechanism here:
https://arxiv.org/pdf/1512.08756.pdf
hope to be useful.
Yes it is possible to use attention / self- attention / multi-head attention mechanisms to other feed forward networks. It is also possible to use attention mechanisms with CNN based architectures i.e which part of images should be paid more attention while predicting another part of an image. The mail idea behind attention is giving weight to all the other inputs while predicting a particular output or how we correlate words in a sentence for a NLP problem . You can read about the really famous Transformer architecture which is based on self-attention and has no RNN in it.
For getting a gist of different type of attention mechanism you can read this blog.

What is the role of fully connected layer in deep learning?

What is the role of fully connected layer (FC) in deep learning? I've seen some networks have 1 FC and some have 2 FC and some have 3 FC. Can anyone explain to me?
Thanks a lot
The fully connected layers are able to very effectively learn non-linear combinations of input features. Let's take a convolutional neural network for example.
The output from the convolutional layers represents high-level features in the data. While that output could be flattened and connected to the output layer, adding a fully-connected layer is a (usually) cheap way of learning non-linear combinations of these features.
Essentially the convolutional layers are providing a meaningful, low-dimensional, and somewhat invariant feature space, and the fully-connected layer is learning a (possibly non-linear) function in that space.

conditional random field in semantic segmentation

Are CRF (Conditional Random Fields) still actively used in semantic segmentation tasks or do the current deep neural networks made them unnecessary ?
I've seen both of the answers in academic papers and, since it seems quite complicated to implement and infer, I would like to have opinions on them before trying them out.
Thank you
The CRFs are still used for the tasks of image labeling and semantic image segmentation along with the DNNs. In fact, CRFs and DNNs are not self-excluding techniques and a lot of recent publications use both of them.
CRFs are based on probabilistic graphical models, where graph nodes and edges represent random variables, initialized with potential functions. DNN can be used as such potential function:
Conditional Random Fields Meet Deep Neural Networks for Semantic Segmentation
Conditional Random Fields as Recurrent Neural Networks
Brain Tumor Segmentation with Deep Neural Network (Future Work Section)
DCNN may be used for the feature extraction process, which is an essential step in applying CRFs:
Environmental Microorganism Classification Using Conditional Random Fields and Deep Convolutional Neural Networks
Conditional Random Field and Deep Feature Learning for Hyperspectral Image Segmentation
There are also toolkits, combining both CRFs and DNNs:
Direct graphical models C++ library

Difference Between keras.layer.Dense(32) and keras.layer.SimpleRNN(32)?

What is the difference between keras.layer.Dense() and keras.layer.SimpleRNN()? I do understand what is Neural Network and RNN, but with the api the intuition is just not clear.? When I see keras.layer.Dense(32) I understand it as layer with 32 neurons. But not really clear if SimpleRNN(32) means the same. I am a newbie on Keras.
How Dense() and SimpleRNN differ from each other?
Is Dense() and SimpleRNN() function same at any point of time?
If so then when and if not then what is the difference between SimpleRNN() and Dense()?
Would be great if someone could help in visualizing it?
What's exactly happening in
https://github.com/fchollet/keras/blob/master/examples/addition_rnn.py
Definitely different.
According to Keras Dense Dense implements the operation: output = activation(dot(input, kernel) + bias), it is a base architecture for neural network.
But for SimpleRNN, Keras SimpleRNN Fully-connected RNN where the output is to be fed back to input.
The structure of neural network and recurrent neural network are different.
To answer your question:
The difference between Dense() and SimpleRNN is the differences between traditional neural network and recurrent neural network.
No, they are just define structure for each network, but will work in different way.
Then same as 1
Check resources about neural network and recurrent neural network, there are lots of them on the internet.