I am doing a visualization to get to know what a CNN has learned from an image.
When I visualize the Conv2d output, it does seem the CNN is learning some features. However, when I visualize the output of the activation layer (ReLU in this case), it shows nothing.
[Image: Conv2d visualization]
[Image: Activation layer visualization]
I use a Sequential block of Conv2d, BatchNorm, and ReLU.
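For reference, here is a minimal sketch of one way the two outputs can be captured with forward hooks (the channel counts and input size are placeholders, not my actual model):

```python
import torch
import torch.nn as nn

# Placeholder block matching the Conv2d -> BatchNorm -> ReLU structure described above
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(inplace=False),  # inplace=True would overwrite the pre-ReLU map in place
)

captured = {}

def save_output(name):
    def hook(module, inputs, output):
        captured[name] = output.detach().clone()
    return hook

block[0].register_forward_hook(save_output("conv"))
block[2].register_forward_hook(save_output("relu"))

x = torch.randn(1, 3, 64, 64)  # dummy image batch
block(x)

# ReLU zeroes every negative response, so if the BatchNorm output is mostly
# negative the activation map can look black unless each channel is rescaled
# to its own min/max before plotting.
for name, fmap in captured.items():
    print(name, fmap.min().item(), fmap.max().item())
```

Note the value ranges printed at the end: the ReLU output often needs per-channel rescaling before it shows anything visible.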
I have a problem in which I have a series of observations, each of which is a graph of the same structure, but with different node features. I would like to learn a flat embedding of each graph of size 32x1.
My thought was to do this with an autoencoder. This would take the input graph, apply some graph convolutions, use a dense layer to map the graph to a 32x1 latent space, and then reconstruct the graph (using the same common structure) before applying a few more convolutions.
As far as I am aware, this is in contrast to the typical graph autoencoder framework, in which the latent representation is a graph with the same structure as the input but with a latent representation of each node's features.
For this reason, I am not sure how to implement such an architecture using PyTorch Geometric. Namely, I am unsure how I go from the flat latent space back to a graph.
Is this possible, and if so, roughly how would I do so?
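For concreteness, here is a rough sketch of what I have in mind, assuming one graph per forward pass and GCNConv/global_mean_pool as the building blocks (the hidden sizes are placeholders; the decoder is the part I am unsure about):

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, global_mean_pool

class FlatGraphAE(nn.Module):
    """Autoencoder with a flat 32-d latent. Assumes one graph per forward pass
    and that every graph shares the same fixed edge_index and node count."""

    def __init__(self, num_nodes, in_dim, hidden_dim=64, latent_dim=32):
        super().__init__()
        self.num_nodes = num_nodes
        self.enc1 = GCNConv(in_dim, hidden_dim)
        self.enc2 = GCNConv(hidden_dim, hidden_dim)
        self.to_latent = nn.Linear(hidden_dim, latent_dim)
        # Decoder: expand the flat latent into one hidden vector per node of the
        # shared structure, then refine with a graph convolution.
        self.from_latent = nn.Linear(latent_dim, num_nodes * hidden_dim)
        self.dec = GCNConv(hidden_dim, in_dim)

    def forward(self, x, edge_index):
        h = torch.relu(self.enc1(x, edge_index))
        h = torch.relu(self.enc2(h, edge_index))
        batch = x.new_zeros(x.size(0), dtype=torch.long)   # single graph
        z = self.to_latent(global_mean_pool(h, batch))      # shape [1, 32]
        h = self.from_latent(z).view(self.num_nodes, -1)    # one row per node
        x_hat = self.dec(torch.relu(h), edge_index)
        return x_hat, z
```

Because the structure is common to all observations, the dense "unflatten" layer can map the 32-d vector straight to a fixed num_nodes x hidden matrix; whether this is the right way to do it is exactly what I am asking.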
I am trying to understand the training process of an object detection deep learning algorithm, and I am having some problems understanding how the backbone network (the network that performs feature extraction) is trained.
I understand that it is common to use CNNs like AlexNet, VGGNet, and ResNet, but I don't understand whether these networks are pre-trained or not. If they are not pre-trained, what does the training consist of?
We directly use a pre-trained VGGNet or ResNet backbone. Although the backbone is pre-trained for a classification task, its hidden layers learn features that are also useful for object detection. The initial layers learn low-level features such as lines, dots, and curves; later layers learn high-level features built on top of those low-level features to detect larger shapes and objects in the image.
Then the last layers are modified to output object detection coordinates rather than class scores.
There are object detection specific backbones too. Check these papers:
DetNet: A Backbone network for Object Detection
CBNet: A Novel Composite Backbone Network Architecture for Object Detection
DetNAS: Backbone Search for Object Detection
High-Resolution Network: A universal neural architecture for visual recognition
Lastly, the pretrained weights will only be useful if you are using them on similar images. For example, weights trained on ImageNet will be of little use on ultrasound medical image data; in that case we would rather train from scratch.
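As a concrete illustration (not part of the original discussion), this is roughly how a pretrained backbone is reused and only the head is replaced in torchvision; the class count is a placeholder:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Faster R-CNN with a ResNet-50 FPN backbone whose weights come from
# pre-training; only the detection head needs to be adapted.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# Replace the box predictor so it outputs class scores and box coordinates
# for our own classes (num_classes includes the background class).
num_classes = 3  # placeholder: 2 object classes + background
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Optionally freeze the backbone so only the new head is trained at first.
for p in model.backbone.parameters():
    p.requires_grad = False
```

Training then consists of fine-tuning this model on detection annotations; how much of the backbone you unfreeze depends on how similar your images are to the pre-training data.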
What is the role of the fully connected (FC) layer in deep learning? I've seen some networks with one FC layer, some with two, and some with three. Can anyone explain this to me?
Thanks a lot
Fully connected layers are very effective at learning non-linear combinations of their input features. Let's take a convolutional neural network as an example.
The output from the convolutional layers represents high-level features in the data. While that output could be flattened and connected to the output layer, adding a fully-connected layer is a (usually) cheap way of learning non-linear combinations of these features.
Essentially the convolutional layers are providing a meaningful, low-dimensional, and somewhat invariant feature space, and the fully-connected layer is learning a (possibly non-linear) function in that space.
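A small sketch of that arrangement in PyTorch (the layer sizes are arbitrary, not taken from the answer):

```python
import torch
import torch.nn as nn

# Convolutional feature extractor followed by a fully connected head.
# The flattened conv output could be wired straight to the 10-way output,
# but the intermediate FC layer lets the network learn non-linear
# combinations of those high-level features first.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),                 # 64 * 8 * 8 features for a 32x32 input
    nn.Linear(64 * 8 * 8, 128),   # FC layer: non-linear feature combinations
    nn.ReLU(),
    nn.Linear(128, 10),           # output layer
)

x = torch.randn(4, 3, 32, 32)
print(model(x).shape)  # torch.Size([4, 10])
```

Networks with two or three FC layers simply stack more of these non-linear mixing steps before the output; there is nothing special about any particular count.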
I read the Xception paper, and Section 4.7 mentions that the best results are achievable without any activation. Now I want to use this network on videos with the Keras toolbox, but the Keras model uses the ReLU activation function. Does the Keras model give the best results, or would it be better to omit the ReLU layers?
You are confusing the normal activations used for convolutional and dense layers with the ones mentioned in the paper. Section 4.7 only deals with varying the activation between the depthwise and pointwise convolutions; the rest of the activations in the architecture are kept unchanged.
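To make the distinction concrete, here is a small tf.keras sketch (shapes and filter counts are arbitrary) contrasting the intermediate activation the paper varies with the fused separable convolution:

```python
from tensorflow.keras import layers

x = layers.Input(shape=(64, 64, 32))

# What Section 4.7 varies: an activation *between* the depthwise and
# pointwise steps of a separable convolution.
h = layers.DepthwiseConv2D(3, padding="same")(x)
h = layers.Activation("relu")(h)              # the intermediate activation under discussion
h = layers.Conv2D(64, 1, padding="same")(h)   # pointwise 1x1 convolution

# Keras' SeparableConv2D fuses both steps with no activation in between,
# which corresponds to the paper's best-performing "no intermediate activation" variant.
y = layers.SeparableConv2D(64, 3, padding="same")(x)
```

The ReLUs you see elsewhere in the Keras Xception model sit between whole blocks, not inside the separable convolutions, so they are not the activations the paper is ablating.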
I am using a pre-trained model to which I want to add an elementwise layer that multiplies the outputs of two convolution layers: one output is 1x1x256x256 and the other is 1x32x256x256. My question is: if we add an elementwise layer that multiplies these two outputs and sends the result to the next layer, should we train from scratch because the architecture is modified, or is it still possible to use the pretrained model?
Thanks
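For concreteness, the operation I have in mind looks like this (a PyTorch-style illustration only; my actual model is in Caffe):

```python
import torch

# Elementwise product of the two convolution outputs: the single-channel map
# is broadcast across the 32 channels of the other output.
a = torch.randn(1, 1, 256, 256)    # output of the 1-channel conv layer
b = torch.randn(1, 32, 256, 256)   # output of the 32-channel conv layer

gated = a * b                      # broadcasts to shape [1, 32, 256, 256]
print(gated.shape)
```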
Indeed, making architectural changes does mean the learned features no longer fit the network exactly.
However, there's no reason not to use the learned weights for the layers below the change -- these layers are not affected by it, so they can benefit from the initialization.
As for the rest of the layers, I suppose initializing from the trained weights should not be worse than random initialization, so why not?
Don't forget to initialize any new layers with random weights (the default in Caffe is zero, and this might cause trouble for learning).
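If it helps, the same principle expressed in PyTorch terms (your question is Caffe-specific, so take this only as an illustration of "reuse what matches, randomly initialize what is new"):

```python
import torch
import torch.nn as nn

# Hypothetical modified model: the first conv comes from the pretrained net,
# the 1x1 conv is the newly added branch.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 1),            # newly added layer
)

# Placeholder checkpoint containing only the weights that still match.
pretrained_state = {
    "0.weight": torch.randn(32, 3, 3, 3),
    "0.bias": torch.zeros(32),
}

# strict=False copies whatever matches and leaves the new layers at their
# (random) default initialization instead of zeros.
missing, unexpected = model.load_state_dict(pretrained_state, strict=False)
print("randomly initialized parameters:", missing)
```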