I'm new to deep learning, so maybe this is a silly question...
Do any adjustments need to be made for applying Grad-CAM on CNNs that use a Global Average Pooling (GAP) layer right before fully connected ones?
I understand that the GAP layer aggregates the activations of an intermediate layer in order to produce a compact representation of the image, removing information about the features' locations. Is this an obstacle to Grad-CAM backpropagation?
I imagine that, for a CNN that uses, for example, a Max Pooling layer followed by a Flatten layer, Grad-CAM is capable of retrieving the exact location of the relevant features.
I'm sorry if it's a silly question, but I couldn't find the answer anywhere.
Thanks in advance!
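For what it's worth, Grad-CAM computes its heatmap from the activations of the last convolutional layer, i.e. before the GAP layer, so the spatial information is still available at the point where the map is built. A minimal sketch of the computation (TensorFlow/Keras assumed; the model, layer name, and class index are placeholders):

```python
import tensorflow as tf

def grad_cam(model, image, conv_layer_name, class_index):
    # Map the input to both the conv feature maps and the predictions.
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[tf.newaxis, ...])
        score = preds[:, class_index]
    # Gradients of the class score w.r.t. the conv feature maps,
    # global-average-pooled into one weight per channel.
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(1, 2))
    # Weighted sum of the feature maps, followed by ReLU.
    cam = tf.einsum('bhwc,bc->bhw', conv_out, weights)
    return tf.nn.relu(cam)[0]
```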
I have been experimenting with Grad-CAM on some VGGNets and ResNets on different tasks. It could be just my impression, but ResNet seems to highlight larger regions in the image. Both models classify correctly, but the ResNet activation map usually covers a larger area.
This also happens in the original Grad-CAM paper, as shown below, but I can't find any discussion of it. I would like to know why.
Grad-CAM for VGGNet
Grad-CAM for ResNet
I am familiar with the principle of how OverFeat works to not only classify but also localize an object in an image, by using only convolutional layers instead of fully connected layers at the end. However, every tutorial or explanation I have read discusses AlexNet or a very basic network consisting of a few consecutive convolutional layers followed by 2-3 fully connected layers for classification. My question is as follows: is it possible to modify a more complex network such as ResNet or Inception, which don't use the standard consecutive-convolutional-layer design of AlexNet or VGG?
Thanks
Welcome, and yes. Looking at a very simplified diagram like this, everything to the left of the split "FC" ('fully connected', or 'dense') arrows can be any kind of (what is typically called an) image classification network, such as those in Keras Applications, which include VGG, ResNet, Inception, Xception, etc. For these kinds of networks, the input is obviously an image, and the output is sometimes called a 'feature map' (although that's a bit silly; have a look at the output and you'll understand, as it's typically far more akin to a post-modernist map than to a cartographic one).
So the answer to your question is yes: put any kind of network you want before the OverFeat-style ending, whether custom or otherwise, knowing that it's intended to be some general convolutional feature extractor like ResNet, Inception, etc. Any kind of network that takes an image in and spits out a 3-dimensional 'feature map' (which is then pooled or flattened into 1-dimensional form) is what's apparently intended for this OverFeat concept. A sketch of the idea follows.
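A minimal sketch (Keras assumed; the 10-class head is a placeholder): swap in any backbone and rewrite the usual Dense head as 1x1 convolutions, so the whole network stays convolutional, as in OverFeat.

```python
import tensorflow as tf

# Any Keras Applications backbone works as the convolutional part.
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights=None, input_shape=(None, None, 3))
# 1x1 convolutions replace the Dense layers, so larger inputs simply
# yield a spatial map of class scores instead of a single vector.
x = tf.keras.layers.Conv2D(1024, 1, activation='relu')(backbone.output)
x = tf.keras.layers.Conv2D(10, 1)(x)
model = tf.keras.Model(backbone.input, x)
```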
I've been thinking that adding noise to an image can prevent overfitting and also "increase" the dataset by adding variations to it. I'm only trying to add some random 1s to images of shape (256,256,3) that use uint8 to represent color. I don't think that affects the visualization at all (I showed both images with matplotlib and they seem almost the same), and there is only a ~0.01 mean difference in the sum of their values.
But it doesn't seem to help: after training for a long time, the model is still not as good as the one trained without noise.
Has anyone tried using noise for image classification tasks like this? Does it end up helping?
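For concreteness, that kind of noise injection might look like this sketch (numpy assumed; `amount` is a made-up parameter):

```python
import numpy as np

def add_random_ones(image, amount=0.01, rng=np.random.default_rng(0)):
    """Bump a random fraction of pixels in a uint8 image by +1."""
    mask = rng.random(image.shape) < amount
    bumped = image.astype(np.int16) + mask  # widen to avoid wrap-around at 255
    return np.clip(bumped, 0, 255).astype(np.uint8)
```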
I wouldn't add noise to your data. Some papers employ input deformations during training to increase the robustness and convergence speed of models. However, these deformations are statistically inefficient (not just on images but on any kind of data).
You can read Intriguing properties of Neural Networks by Szegedy et al. for more details (and refer to references 9 & 13 for papers that use deformations).
If you want to avoid overfitting, you might be interested in reading about regularization instead.
Yes, you may add noise to extend your dataset and avoid overfitting your training set, but make sure it is random; otherwise your network will take this noise as something it should learn (and that's not something you want). Still, I wouldn't use this method first: I would start by rotating and/or flipping my samples (see the sketch after this answer).
Either way, your augmented network should perform better than, or at least as well as, your previous network.
The first things I would check are: How do you measure your performance? What was it before and after? And did you change anything else?
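A minimal sketch of the rotate/flip suggestion (tf.image assumed; nothing here is specific to your setup):

```python
import tensorflow as tf

def augment(image):
    # Random horizontal/vertical flips plus a random 90-degree rotation.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    k = tf.random.uniform([], minval=0, maxval=4, dtype=tf.int32)
    return tf.image.rot90(image, k)
```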
There are a couple of works that deal with this problem. Because you make the training set harder, the training error will be higher; however, your generalization might be better. It has been shown that adding noise can have stabilizing effects when training Generative Adversarial Networks (adversarial training).
For classification tasks it is not that cut and dried, and not many works have actually dealt with this topic. The closest one, to the best of my knowledge, is this one from Google (https://arxiv.org/pdf/1412.6572.pdf), where they show the limitations of training without noise. They do report a regularization effect, but not actually better results than other methods.
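The method in that paper is the fast gradient sign method; a minimal sketch (TensorFlow assumed; the model and labels are placeholders), whose outputs can be mixed into training for the regularization effect mentioned:

```python
import tensorflow as tf

def fgsm(model, x, y, epsilon=0.01):
    # Perturb the input in the direction that increases the loss.
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x))
    grad = tape.gradient(loss, x)
    return x + epsilon * tf.sign(grad)
```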
I had no choice but to ask here; I have been struggling with this for a long time. I have not been able to get any output from FCN32 :(
I trained FCN32 on my data from scratch and always get a black image. I added Gaussian initialization with std = 0.01 for the convolutional layers, but I still get a black image.
I tried to add a weighted loss layer; however, I was not able to add it correctly. I am not good at Python and C++.
My questions:
Is there a correct PR that would let me easily include this layer?
My data has 5 classes, and the class proportions differ from image to image. How can I create these weight matrices for each image?
I really appreciate any help. Please share if you know of any resource/link, or if I can get it from another network's repository.
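One common way to build such per-image weight matrices is inverse class frequency; a minimal sketch (numpy assumed; independent of any particular framework PR):

```python
import numpy as np

def class_weight_map(label_image, num_classes=5):
    # Per-pixel weights from inverse class frequency within one image;
    # classes absent from the image get weight 0.
    counts = np.bincount(label_image.ravel(), minlength=num_classes)
    freq = counts / counts.sum()
    weights = np.where(counts > 0, 1.0 / (freq + 1e-8), 0.0)
    return weights[label_image]  # same spatial shape as the label image
```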
What exactly is a fully convolutional layer? I mean, why is it 'fully'? The wording in [Long] is quite confusing to me.
Is it because they never use a fully connected layer? Or is it because the convolutional layers obtained by the 'convolutionization' described in Figure 2 have kernels that cover their entire input regions?
Do you see the last part labeled "fully connected" in this image? In a fully convolutional network we remove this part. But then how can we do classification, since we still have many channels with a big activation map?
In the example you mentioned, they do up-sampling, and their cost function measures the error between the reconstructed (up-sampled) image and the ground truth; see the sketch below.
So it is called fully convolutional because there is only convolution there: spatial feature extraction.
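A sketch of that ending (the shapes and the 21-class count are assumptions in the spirit of FCN-32s): a 1x1 convolution produces per-class score maps, a transposed convolution upsamples them 32x back to input resolution, and a pixel-wise cross-entropy compares them with the ground-truth mask.

```python
import tensorflow as tf

features = tf.keras.Input(shape=(None, None, 512))  # coarse conv features
scores = tf.keras.layers.Conv2D(21, 1)(features)    # per-class score maps
upsampled = tf.keras.layers.Conv2DTranspose(
    21, kernel_size=64, strides=32, padding='same')(scores)
head = tf.keras.Model(features, upsampled)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
```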
The phrase comes from a blend of the phrases "fully connected layer" and "convolutional layer". You can think of it as a fully connected layer that acts on a sub-region of an image. Then, instead of getting a single output feature vector for the whole image, you get a set of vectors, one per corresponding image part. These vectors are arranged to form a map, which is reminiscent of convolutional feature maps.
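A tiny sketch of that equivalence (numpy; VGG-style shapes assumed): the weights of a Dense layer on a flattened 7x7x512 map are exactly the weights of one 7x7 convolution, and on larger inputs the same kernel slides to produce one output vector per image sub-region.

```python
import numpy as np

fc_weights = np.random.randn(7 * 7 * 512, 4096)    # Dense: (inputs, outputs)
conv_kernel = fc_weights.reshape(7, 7, 512, 4096)  # Conv2D: (h, w, in_ch, out_ch)
```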
I'm a student majoring in physics as well as CS. One of my tasks is to find supernovae. The discovery of a supernova is tedious and tough.
By comparing a picture of the sky now with an earlier one, we may find some bright spot on the picture, and that may be the supernova.
like this,
The pictures have a lot of noise, and there are always many ghost spots caused by instrument instability or by other lights creating illusions.
However, a supernova has some obvious characteristics: it always shows up near fixed stars, the light has a circular shape, etc. There are already some conventional methods for this, but they don't perform well.
So I wonder if it's worthwhile to try a CNN on this.
What kind of data do CNNs do well on?
Thanks.
So I wonder if it's worthwhile to try a CNN on this.
I think a CNN is overkill for this problem.
What kind of data do CNNs do well on?
Data with complex localised relationships in the structure and a large number of features. You use a convolution across a local frame to learn the representation.
The problem you have is very simple. You don't have many parameters: the images are grayscale, and the representation of a supernova is all contained within the immediate vicinity of its occurrence.
I think you would probably have much more success with some really simple algorithm (a sketch of the blob-detection step is at the end of this answer), such as:
Find all fixed stars
Search for any big 'blobs' of light with specific parameters
Search for any circles of light
These alone will massively reduce the computational size of the problem. From there, there are a number of ML approaches you could take.
CNNs are generally for very big data sets with highly complex non-linear relationships. This may well be a big data set, but it is certainly not complex in this particular task.
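A sketch of the blob-detection step mentioned above (scikit-image assumed; the function name and threshold are placeholders):

```python
import numpy as np
from skimage.feature import blob_log

def find_candidates(before, after, threshold=0.1):
    # Laplacian-of-Gaussian blob detection on the difference of two
    # grayscale exposures; each returned row is (y, x, sigma).
    diff = np.clip(after.astype(float) - before.astype(float), 0, None)
    diff /= diff.max() + 1e-8  # normalize to [0, 1]
    return blob_log(diff, max_sigma=10, threshold=threshold)
```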