Local and global features extraction - deep-learning

I have a 10-layer CNN architecture with 2 skip connections.
One goes from the 1st layer to the 3rd layer, and the other from the 1st layer to the 10th layer, as shown in the figure. Can we say that skip-1 extracts global features and skip-2 extracts local features from the given input image?
[Figure: sequential CNN network with skip connections]
As the network goes deeper, it extracts increasingly global features. So I'm thinking that at the 9th layer we have global features of the image, while skip-2 brings in local features from the 1st layer. Is this correct?
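For reference, a minimal sketch of the wiring described above, written with pycaffe's NetSpec and Eltwise sums for the two skips. The channel count, kernel size and input shape are placeholders, chosen only so that the summed blobs have matching shapes, which Eltwise requires.

    from caffe import layers as L, params as P
    import caffe

    n = caffe.NetSpec()
    n.data = L.Input(input_param=dict(shape=dict(dim=[1, 3, 224, 224])))
    n.conv1 = L.Convolution(n.data, num_output=64, kernel_size=3, pad=1)
    n.conv2 = L.Convolution(n.conv1, num_output=64, kernel_size=3, pad=1)
    n.conv3 = L.Convolution(n.conv2, num_output=64, kernel_size=3, pad=1)
    n.skip1 = L.Eltwise(n.conv3, n.conv1, operation=P.Eltwise.SUM)    # skip-1: layer 1 -> layer 3
    prev = n.skip1
    for i in range(4, 10):                                            # layers 4..9
        setattr(n, 'conv%d' % i, L.Convolution(prev, num_output=64, kernel_size=3, pad=1))
        prev = getattr(n, 'conv%d' % i)
    n.conv10 = L.Convolution(prev, num_output=64, kernel_size=3, pad=1)
    n.skip2 = L.Eltwise(n.conv10, n.conv1, operation=P.Eltwise.SUM)   # skip-2: layer 1 -> layer 10
    print(n.to_proto())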

Related

Is it possible to forward the output of a deep-learning network to another network with caffe / pycaffe?

I am using Caffe, or more precisely pycaffe, to train and create my network. I have a dataset with 5 labels. My idea was to create one network per label that simply outputs a score for that one class. After training the 5 networks, I want to compare their outputs and see which one has the highest score.
Sadly, I only know how to create a single network, not how to let several networks interact, nor how to implement something like a max function at the end. I added a picture to describe what I want to do.
Moreover, I do not know whether this would perform any better than an ordinary deep neural network.
I don't see what you expect to have as the input to this "max" function. Even if you use some sort of is / is not boundary training, your approach appears to be an inferior version of the softmax layer available in all popular frameworks.
Yes, you can build a multi-channel model, train each channel with a different data set, and then accept the most confident prediction, but training will take longer and the result will be less accurate than a single cooperative training pass. Your five channels end up negotiating their decision boundaries only after each has already committed to its own parametric assumptions.
Feed a single model all the information available from the outset; you'll get faster convergence and more accurate classification.
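For illustration, here is a minimal sketch in pycaffe's NetSpec of what a single 5-way model could look like: one shared network with a 5-output score layer and a SoftmaxWithLoss head, which replaces the hand-rolled max over five separate binary networks. The layer choices, LMDB path and batch size are placeholders, not taken from the original post.

    from caffe import layers as L, params as P
    import caffe

    def multiclass_net(lmdb_path, batch_size=64):
        n = caffe.NetSpec()
        # One data source with integer labels 0..4 instead of five separate datasets.
        n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB,
                                 source=lmdb_path, ntop=2,
                                 transform_param=dict(scale=1.0 / 255))
        n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20,
                                weight_filler=dict(type='xavier'))
        n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
        n.ip1 = L.InnerProduct(n.pool1, num_output=500, weight_filler=dict(type='xavier'))
        n.relu1 = L.ReLU(n.ip1, in_place=True)
        n.score = L.InnerProduct(n.relu1, num_output=5, weight_filler=dict(type='xavier'))
        # Softmax over all 5 classes: at test time the prediction is simply argmax(score).
        n.loss = L.SoftmaxWithLoss(n.score, n.label)
        return n.to_proto()

At test time you would swap the loss for a plain Softmax layer and take the argmax of the 5 probabilities; the classes then compete for probability mass during training instead of being reconciled afterwards.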

net surgery on a custom caffe model

I'm trying to modify the weights of a caffemodel that is part of a Caffe branch called DeepLab. Although there is a tutorial on how to do net surgery, when I try to do the same with my custom caffemodel, the Python kernel always dies on the following line:
# Load the original network and extract the fully connected layers' parameters.
net = caffe.Net('../models/deeplab/train.prototxt',
                '../models/deeplab/train.caffemodel',
                caffe.TRAIN)
I think it's because pycaffe doesn't know their custom layers, such as ImageSegData, Silence and SegAccuracy, so I removed these layers from the prototxt file, but the Python kernel still keeps dying when I try to load the network model. Does anyone know how to load these weights into Python?
I figured it out already. I literally had to remove every custom layer and, in particular, adapt the data layer so that it could read all the input images and thereby determine the input dimensions.
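A minimal sketch of that workflow, assuming you have saved a copy of the prototxt with the custom layers removed as train_stripped.prototxt (a hypothetical filename); loading the original .caffemodel against it recovers the weights of every layer whose name still matches:

    import caffe

    net = caffe.Net('../models/deeplab/train_stripped.prototxt',  # custom layers removed
                    '../models/deeplab/train.caffemodel',
                    caffe.TEST)

    # List the layers that carry learnable parameters (weights / biases).
    for name, blobs in net.params.items():
        print(name, [b.data.shape for b in blobs])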

How can I create a classifier using the feature map of a CNN?

I intend to make a classifier using the feature map obtained from a CNN. Can someone suggest how I can do this?
Would it work if I first train the CNN using positive and negative samples (and hence obtain the weights), and then, every time I need to classify an image, apply the conv and pooling layers to obtain its feature map? The problem I see with this is that the image I want to classify may not have a similar feature map, so I wouldn't be able to compute the distance correctly, as the order of the features may be different within the layer.
You can use the same CNN for classification if you trained it with (for example) the cross-entropy loss (also known as softmax with loss). In this case, take the argmax of your last layer (the node with the highest score), and that is the class predicted by the network. Keep in mind, however, that any architecture used in machine learning expects an input at test time similar to those it saw during training.
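As a concrete illustration, a minimal pycaffe sketch of that recipe: forward one image through the trained network and take the argmax of the softmax output. The file names and the output blob name 'prob' are placeholders for your own deploy prototxt and weights.

    import caffe

    net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

    # Preprocess the image the same way the training data was preprocessed.
    image = caffe.io.load_image('example.jpg')                    # HxWx3 float in [0, 1]
    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    transformer.set_transpose('data', (2, 0, 1))                  # HWC -> CHW

    net.blobs['data'].data[...] = transformer.preprocess('data', image)
    out = net.forward()
    predicted_class = out['prob'][0].argmax()                     # node with the highest score
    print(predicted_class)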

Caffe snapshots: .solverstate vs .caffemodel

When training a network, the snapshots taken every N iterations come in two forms together. One is the .solverstate file, which I presume is exactly what it sounds like, storing the state of the loss functions and gradients, etc. The other is the .caffemodel file which I know stores the trained parameters.
The .caffemodel is the file you need if you want a pre-trained model, so I imagine it's also the file you want if you are going to test your network.
What is the .solverstate good for? In this tutorial it looks like you can restart training from it, but how does that differ from using the .caffemodel? Does .solverstate also include the same info as .caffemodel? Put another way, is .caffemodel just a subset of .solverstate?
The solverstate file, as its name conveys, stores the state of the solver and not any information related to classification results. The model is saved as a caffemodel file, which you can use to obtain classification results for your data. If you want to fine-tune your network, you may start from a pre-trained caffemodel file; this saves time because your network does not need to learn from scratch. But if your current training has to be halted, say due to a power cut or an unexpected reboot, you can resume it from the previous solverstate snapshot. The difference is that the solverstate lets you complete the training exactly as originally planned, whereas restarting from a caffemodel may require changes to certain training parameters, such as the maximum number of iterations.
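A small sketch contrasting the two cases in pycaffe (file names are placeholders); the command-line equivalents are the -snapshot and -weights flags of caffe train:

    import caffe

    solver = caffe.SGDSolver('solver.prototxt')

    # Resume an interrupted run: restores the weights *and* the solver state
    # (iteration count, learning-rate schedule, momentum history), so training
    # continues exactly where it stopped.
    solver.restore('snapshots/net_iter_5000.solverstate')

    # Fine-tune / warm-start instead: copies only the learned weights; the solver
    # starts at iteration 0, so you typically adjust max_iter, base_lr, etc.
    # solver.net.copy_from('snapshots/net_iter_5000.caffemodel')

    solver.solve()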

Why does Convolutional network need multiple feature maps?

I am a beginner in deep learning. In convolutional networks such as LeNet-5, there are 6 feature maps in the C1 layer, each associated with a unique convolution kernel (a 5x5 matrix).
What is the difference between any two feature maps in the same layer? Even for a black-and-white image dataset like MNIST (no RGB channels), people still use 6 feature maps.
I guess that, initially, the 6 convolution kernels are randomly generated 5x5 matrices, so when the same input image is convolved with each of them, the resulting feature maps differ. Is this the main motivation?
Every filter in your convolutional layer extracts a specific feature from the input. One filter could be sensitive to horizontal edges while another is sensitive to vertical edges; a third might respond to a triangular shape. You want the feature maps to be as different from each other as possible to avoid redundancy. Avoiding redundancy improves the network's capacity to capture as many variations in the data as possible.
Random initialization prevents learning duplicate filters.
Why 6 feature maps? This is a result of trying out other numbers of filters. Keep in mind that increasing the number of filters increases computational overhead and can lead to overfitting (memorizing the training data while classifying new images poorly). Another intuition for 6 is that there is not that much variation at the raw-pixel level; more complex features are extracted in subsequent layers. Six feature maps for C1 simply ended up working well on the MNIST dataset.
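For completeness, a minimal sketch (pycaffe NetSpec) of a LeNet-style C1 layer: one grayscale input channel and six 5x5 filters, hence six feature maps computed from the same image; random initialization via the weight filler is what makes the six filters start out, and end up, different. The 28x28 input shape is the standard MNIST size.

    from caffe import layers as L
    import caffe

    n = caffe.NetSpec()
    n.data = L.Input(input_param=dict(shape=dict(dim=[1, 1, 28, 28])))   # one MNIST-sized image
    # Six independent 5x5 kernels -> six feature maps of the same input.
    n.conv1 = L.Convolution(n.data, num_output=6, kernel_size=5,
                            weight_filler=dict(type='xavier'))
    print(n.to_proto())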