Residual Unit with ShakeDrop Depth regularization - deep-learning

I need Python code for the implementation of a Residual Unit with ShakeDrop depth regularization.
Here is the design:
[figure: Design Architecture]
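Below is a minimal PyTorch sketch of one way such a unit could be written. Since the design diagram is not visible here, the two-convolution branch, the class names (ShakeDropFunction, ShakeDropResidualUnit), and the per-sample alpha/beta draws are illustrative assumptions rather than the exact design above. It follows the ShakeDrop rule from Yamada et al. (2018): during training the residual branch is scaled by (gate + alpha * (1 - gate)) in the forward pass and by (gate + beta * (1 - gate)) in the backward pass; at test time the branch is scaled by the expected coefficient. In the paper, p_drop typically follows a linear-decay schedule over depth.

```python
import torch
import torch.nn as nn


class ShakeDropFunction(torch.autograd.Function):
    """Training-time ShakeDrop: scale the branch by (gate + alpha * (1 - gate))
    in the forward pass and by (gate + beta * (1 - gate)) in the backward pass,
    with gate ~ Bernoulli(p_survive), alpha ~ U(-1, 1), beta ~ U(0, 1)."""

    @staticmethod
    def forward(ctx, x, p_survive):
        gate = torch.bernoulli(torch.tensor(p_survive, device=x.device, dtype=x.dtype))
        alpha = torch.empty(x.size(0), 1, 1, 1,
                            device=x.device, dtype=x.dtype).uniform_(-1.0, 1.0)
        ctx.save_for_backward(gate)
        return x * (gate + alpha * (1.0 - gate))

    @staticmethod
    def backward(ctx, grad_output):
        (gate,) = ctx.saved_tensors
        beta = torch.empty(grad_output.size(0), 1, 1, 1,
                           device=grad_output.device,
                           dtype=grad_output.dtype).uniform_(0.0, 1.0)
        # Gradient w.r.t. x, and None for the non-tensor p_survive argument.
        return grad_output * (gate + beta * (1.0 - gate)), None


class ShakeDropResidualUnit(nn.Module):
    """A basic residual unit whose branch output is regularized by ShakeDrop;
    p_drop is the probability of 'shaking' (dropping) the branch."""

    def __init__(self, channels, p_drop=0.5):
        super().__init__()
        self.p_survive = 1.0 - p_drop
        self.branch = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        h = self.branch(x)
        if self.training:
            h = ShakeDropFunction.apply(h, self.p_survive)
        else:
            h = h * self.p_survive  # expected value of the training coefficient
        return x + h


# Quick smoke test.
unit = ShakeDropResidualUnit(channels=16, p_drop=0.5)
out = unit(torch.randn(2, 16, 32, 32))
print(out.shape)  # torch.Size([2, 16, 32, 32])
```

The gate is drawn once per mini-batch here and alpha/beta once per sample; the paper also discusses channel- and pixel-level variants, so adjust the noise shapes to match the intended design.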

Related

How do graph neural networks work for molecular generation?

I am learning AI for the sake of applying it to the field of chemistry, specifically, molecular generation. I finished learning about how to generate novel molecules using RNN-based architectures such as GRU and LSTM. The process in these architectures is as follows:
The input is a character of a known molecule (represented as a string), and the model's task is to predict the next character, so the output is a softmax probability over all characters. The loss is then computed as the cross-entropy between the predicted character and the real one.
I am now moving to graph neural networks, as they seem to offer advantages over RNN-based architectures. Although I did my research, I could not understand how they work for this task (i.e., molecule generation). Similarly to the above, what are the input, the output, and the loss function that we are trying to minimize in GNN-based molecular generation? Thanks in advance.
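For reference, here is a minimal PyTorch sketch of the character-level RNN training step described above. The tiny vocabulary, the CharRNN class, and the single SMILES string "CCO" are made-up illustrations, not a real training setup.

```python
import torch
import torch.nn as nn

# Tiny illustrative character vocabulary for SMILES strings (hypothetical).
vocab = ["C", "O", "N", "(", ")", "=", "1", "2"]
char_to_idx = {c: i for i, c in enumerate(vocab)}


class CharRNN(nn.Module):
    def __init__(self, vocab_size, hidden_size=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, x):
        h, _ = self.gru(self.embed(x))
        return self.out(h)  # logits over all characters at every position


model = CharRNN(len(vocab))
criterion = nn.CrossEntropyLoss()

# One known molecule as a string, e.g. "CCO" (ethanol): the inputs are all
# characters but the last, the targets are all characters but the first,
# so the model predicts the next character at every step.
smiles = "CCO"
ids = torch.tensor([[char_to_idx[c] for c in smiles]])
inputs, targets = ids[:, :-1], ids[:, 1:]

logits = model(inputs)
loss = criterion(logits.reshape(-1, len(vocab)), targets.reshape(-1))
loss.backward()  # cross-entropy between the predicted and real next characters
print(loss.item())
```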

Difference between accuracy from a confusion matrix and validation accuracy

What is the difference between the accuracy obtained from a confusion matrix and the validation accuracy obtained after several epochs?
I am new to deep learning.
The validation accuracy (from each epoch) is the accuracy obtained from each trial (i.e., each combination of parameters/weights). It is not meant for reporting; it is just a log of the status of the training/learning process.
If you want to measure the accuracy of a trained (fixed/frozen) model (be it deep learning or any other learning algorithm), use a separate validation set and compute the accuracy from the confusion matrix yourself (there is a metrics package in sklearn).
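For example, a small sklearn sketch (the labels below are made up for illustration) showing that the accuracy read off the confusion matrix of a frozen model is just correct predictions over total:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true and predicted labels from a frozen model on a held-out set.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)
print(cm)

# Accuracy from the confusion matrix: correct predictions (the diagonal)
# divided by the total number of samples.
print(cm.trace() / cm.sum())

# Same value that accuracy_score reports directly.
print(accuracy_score(y_true, y_pred))
```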

Is the Xception model in Keras the best model described in the paper?

I read the Xception paper, and in Section 4.7 it is mentioned that the best results are achievable without any activation. Now I want to use this network on videos with the Keras toolbox, but the model in Keras uses the ReLU activation function. Does the model in Keras give the best model, or is it better to omit the ReLU layers?
You are confusing the normal activations used for convolutional and dense layers with the ones mentioned in the paper. Section 4.7 only deals with varying the activation between the depthwise and pointwise convolutions; the rest of the activations in the architecture are kept unchanged.
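To make the distinction concrete, here is an illustrative Keras sketch (not the actual keras.applications.Xception code; the separable_block helper and the layer sizes are assumptions) of a depthwise-separable block where only the activation between the depthwise and pointwise convolutions is varied, while the usual ReLU after the block is kept:

```python
import tensorflow as tf
from tensorflow.keras import layers


def separable_block(x, filters, intermediate_activation=None):
    """Depthwise conv -> (optional activation) -> pointwise conv.

    The activation between the depthwise and pointwise convolutions is the
    one Section 4.7 of the Xception paper varies; the ReLU at the end of the
    block is one of the activations that stay unchanged.
    """
    x = layers.DepthwiseConv2D(3, padding="same", use_bias=False)(x)
    if intermediate_activation is not None:
        x = layers.Activation(intermediate_activation)(x)
    x = layers.Conv2D(filters, 1, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)  # usual post-block ReLU, kept as-is
    return x


inputs = layers.Input(shape=(64, 64, 3))
outputs = separable_block(inputs, 32, intermediate_activation=None)
model = tf.keras.Model(inputs, outputs)
model.summary()
```

As far as I know, keras.applications.Xception builds its blocks with SeparableConv2D, which applies no activation between the depthwise and pointwise steps, so it already matches the "no intermediate activation" setting reported as best in Section 4.7.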

I use Caffe to train FCNs for a semantic segmentation task, but always get an all-background result. Why?

I'm using Caffe to train FCN-8s for a semantic segmentation task on the PASCAL VOC 2012 dataset from scratch; the train_val prototxt is the one given in the Caffe Model Zoo.
But no matter what I try (adjusting the learning rate, trying other learning methods), I always get the zero label, which stands for background, for every pixel.
Has anyone met this problem when training a segmentation network from scratch?
I also tried to train other segmentation networks, such as ParseNet and SegNet, from scratch, but the results are also all zeros.
I trained on PASCAL VOC 2012 as well as the augmented PASCAL dataset.

Caffe Autoencoder

I want to compare the performance of a CNN and an autoencoder in Caffe. I'm completely familiar with CNNs in Caffe, but I want to know: does the autoencoder also have a deploy.prototxt file? Are there any differences in using these two models other than the architecture?
Yes, it also has a deploy.prototxt.
Both train_val.prototxt and deploy.prototxt are network architecture description files. The sole difference between them is that train_val.prototxt takes training data as input and produces a loss as output, while deploy.prototxt takes a test image as input and produces the predicted values as output.
Here is an example of a CNN and an autoencoder for MNIST: Caffe Examples. (I have not tried the examples.) Using the two models is generally the same; learning rates etc. depend on the model.
You need to implement an auto-encoder example yourself using Python or MATLAB. The example in Caffe is not a true auto-encoder, because it does not set up a layer-wise training stage, and during training it does not enforce the weight tying W_{L->L+1} = W_{L+1->L+2}^T. It is easy to find a 1D auto-encoder on GitHub, but a 2D auto-encoder may be hard to find.
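For the weight-tying point, here is a minimal PyTorch sketch (not Caffe; the layer sizes are illustrative) of a one-hidden-layer auto-encoder whose decoder reuses the transpose of the encoder weights:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TiedAutoencoder(nn.Module):
    """One-hidden-layer auto-encoder with tied weights: the decoder uses the
    transpose of the encoder weight matrix (W_decode = W_encode^T)."""

    def __init__(self, n_input, n_hidden):
        super().__init__()
        self.encoder = nn.Linear(n_input, n_hidden)
        self.decoder_bias = nn.Parameter(torch.zeros(n_input))

    def forward(self, x):
        h = torch.sigmoid(self.encoder(x))
        # Decode with the transposed encoder weights instead of a free matrix.
        return torch.sigmoid(F.linear(h, self.encoder.weight.t(), self.decoder_bias))


model = TiedAutoencoder(n_input=784, n_hidden=128)
x = torch.rand(4, 784)              # e.g. four flattened 28x28 images
loss = F.mse_loss(model(x), x)      # reconstruction loss: the target is the input
loss.backward()
print(loss.item())
```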
The main differences between autoencoders and a conventional network are:
In an autoencoder, the input image is also the label for training.
An autoencoder tries to approximate an output similar to its input.
Autoencoders do not have a softmax layer during training.
An autoencoder can be used as a pre-trained model for your network, which converges faster than other pre-trained models, because the network has already extracted the features for your data.
Conventional training and testing can then be performed on the pre-trained autoencoder network for faster convergence and better accuracy.
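For the last two points, a hypothetical PyTorch sketch (again not Caffe; the sizes and class count are made up) of reusing a pre-trained encoder as initialization and adding the classification layer that the autoencoder never had:

```python
import torch
import torch.nn as nn

# Hypothetical: after unsupervised auto-encoder training, reuse the encoder
# as a feature extractor and add the classification (softmax) layer that the
# auto-encoder itself never had during training.
encoder = nn.Sequential(nn.Linear(784, 128), nn.Sigmoid())  # assume pre-trained

classifier = nn.Sequential(
    encoder,             # already-learned features -> faster convergence
    nn.Linear(128, 10),  # new classification head for 10 classes
)

x = torch.rand(4, 784)                   # e.g. four flattened 28x28 images
targets = torch.tensor([0, 1, 2, 3])
loss = nn.CrossEntropyLoss()(classifier(x), targets)
loss.backward()                          # conventional supervised fine-tuning
print(loss.item())
```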