As I understand it, ResNet has identity (skip) connections whose task is to make the output of a layer the same as its input. But what is the use of this? What is the benefit of adding layers like this?
Any help will be appreciated
The main purpose of the ResNet architecture was to fix the problem of degrading/saturating accuracy in deeper networks, which is caused primarily by vanishing gradients. Identity layers, or skip connections, help prevent this problem, since it is very easy for such a layer to learn the identity function, where the input equals the output, i.e. f(x) = x. ResNet performed a lot better than other architectures, and one reason, as given by Andrew Ng in his course, is that skip connections learn the function f(x) = x very easily, and if you are lucky they sometimes learn that function plus other features, which helps the network in extracting the final features.
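For intuition, here is a minimal PyTorch sketch of a residual block (simplified: fixed channel count and no downsampling, which real ResNet blocks also handle):

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # the skip connection: if the conv branch learns nothing useful,
        # the block can still fall back to the identity f(x) = x
        return torch.relu(out + x)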
Assume I have a trained torch.nn.Module model, and from now on I only need it for evaluation.
Does PyTorch pass my data through all the layers, or does it compress the model so that it only calculates an equivalent function?
If not, is there a way to do this, to make the calculation faster and the model lighter in memory terms?
I have been looking on the internet for a similar question and didn't find any suitable answer.
You should do two things during inference: set your model to evaluation mode with model.eval(), and wrap your inference code in torch.no_grad() (which disables gradient computation). This will make your code faster and more memory-efficient.
In practice this will look like:
model.eval()
with torch.no_grad():
    # your inference code here
There are many options and it depends on your specific case.
One option is to convert to TorchScript.
Another option is to do quantization on the model.
Third, you could perform knowledge distillation, transferring the knowledge from your existing model to a smaller one.
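As a rough sketch of the first two options (assuming model is your trained module and example_input is a tensor of the shape the model expects; both names are placeholders):

import torch

# Option 1: TorchScript - trace the model into a standalone, optimizable graph
model.eval()
scripted = torch.jit.trace(model, example_input)
scripted.save("model_ts.pt")  # reload later with torch.jit.load("model_ts.pt")

# Option 2: dynamic quantization - store Linear-layer weights as int8
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)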
Currently, for detection (localization + recognition) tasks in computer vision, we mainly use deep learning algorithms. Two types of detector exist:
one stage: SSD, YOLO, RetinaNet, ...
two stage: R-CNN, Fast R-CNN, and Faster R-CNN, for example
Using these detectors on very small objects (10 pixels, for example) is a very challenging task, and it seems the one-stage algorithms do worse than the two-stage ones. But I do not really understand why it works better with Faster R-CNN, for example. In fact, one-stage and two-stage detectors both use the anchor concept, and most of them use the same backbone, like VGG16 or ResNet50/ResNet101, which means the receptive field is the same. For example, I tried to detect very small objects with RetinaNet and with Faster R-CNN. With RetinaNet, small objects are not detected, contrary to Faster R-CNN. I do not understand why. What is the theoretical explanation? (Same backbone: ResNet50.)
I think networks like RetinaNet are, in general, trying to bridge the gap you mention. In one-stage networks we usually have anchor boxes of varying scales on the feature maps produced by the backbone net. These feature maps are produced by heavily downsampling the input image, and a lot of information about small objects can be lost in that operation. In two-stage detectors, on the other hand, the flexibility of the RPN network means it may still propose regions that are small, which may help it perform slightly better than its one-stage counterparts.
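To make the downsampling point concrete, here is a back-of-the-envelope sketch (the strides are the usual FPN pyramid levels; the 10-pixel object size is taken from the question):

object_size = 10                   # very small object, ~10 pixels on the input
for stride in (4, 8, 16, 32, 64):  # typical FPN levels P2-P6
    cells = object_size / stride
    print(f"stride {stride:>2}: object spans {cells:.2f} feature-map cells")

From stride 16 on, the object covers well under one feature-map cell, so most of its signal is gone before the detection head ever sees it.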
I don't think you should be very surprised that both of these use the same backbone; after the conv features are extracted, the two networks use different methods to perform detection.
Hope this helps. Let me know if I wasn't clear enough or if you have questions.
I have implemented a custom OpenAI Gym environment for a game similar to http://curvefever.io/, but with discrete actions instead of continuous ones. So at each step my agent can go in one of four directions: left/up/right/down. However, one of these actions will always lead to the agent crashing into itself, since it can't "reverse".
Currently I just let the agent take any move, and let it die if it makes an invalid move, hoping that it will eventually learn not to take that action in that state. I have, however, read that one can set the probability of an illegal move to zero and then sample an action. Is there any other way to tackle this problem?
You can try to solve this with two changes:
1: Give the current direction as an input, and give a reward of maybe +0.1 when the agent takes a move that does not make it crash, and -0.7 when it takes the backward move that directly makes it crash.
2: If you are using a neural network with a softmax activation in the last layer, multiply all the outputs of the network by a positive factor (a "confidence") before passing them to the softmax; see the sketch below. In my experience a value in the range 0 to 100 is enough, and values above 100 do not change much. The larger the factor, the more confident the agent will be about taking its preferred action in a given state.
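A minimal sketch of both ideas together, the confidence scaling and, as mentioned in the question, zeroing the probability of the illegal move before sampling (all names here are illustrative):

import numpy as np

def action_probs(logits, confidence=10.0, illegal=()):
    z = logits * confidence       # "confidence" scaling before the softmax
    z[list(illegal)] = -np.inf    # masked actions get probability exactly 0
    z = z - z.max()               # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

# 4 actions (left/up/right/down); suppose action 2 is the "reverse" move
logits = np.array([0.1, 0.5, 0.3, 0.2])
p = action_probs(logits, confidence=10.0, illegal=(2,))
action = np.random.choice(4, p=p)  # the illegal action is never sampled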
If you are not using a neural network, or deep learning more generally, I suggest you learn the concepts of deep learning, as your game environment seems complex and a neural network will give the best results.
Note: it will take a huge amount of time, so you have to wait long enough for the algorithm to train. I suggest you don't hurry and let it train. Also, I played the game, it's really interesting :) my best wishes for making an AI for the game :)
Suppose there are parameters in the network that I would like to change manually in pycaffe, rather than have the solver update them automatically. For example, suppose we would like to penalize dense activations; this can be implemented as an additional loss layer. Across the training process, we would like to change the strength of this penalty by multiplying the loss by a coefficient that evolves in a pre-specified way. What would be a good way to do this in Caffe? Is it possible to specify this in the prototxt definition? In the pycaffe interface?
Update: I suppose setting lr_mult and decay_mult to 0 might be a solution, but it seems like a clumsy one. Maybe a DummyDataLayer providing the parameters as a blob would be a better option. But there is so little documentation that it's quite a struggle to write for someone new to Caffe.
Maybe this is a trivial question, but just in case someone else might be interested, here is a successful implementation I ended up using.
In the layer proto definition, set lr_mult and decay_mult to 0, which means that we want to neither learn nor decay the parameters. Use a filler to set the initial values. To change the parameters in Python during training of the network, use a statement like
net.params['name'][index].data[...] = something
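For example, a sketch of the full loop (the layer name "dense_penalty" and the schedule are hypothetical; the coefficient is assumed to live in that layer's first parameter blob):

import caffe

solver = caffe.SGDSolver('solver.prototxt')
for it in range(10000):
    solver.step(1)                      # one SGD iteration
    coeff = 0.01 * (1.0 + it / 1000.0)  # pre-specified schedule for the penalty
    # lr_mult/decay_mult are 0, so the solver leaves this blob untouched
    solver.net.params['dense_penalty'][0].data[...] = coeff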
Does anybody know how to change the learning rate lr_mult of a specific layer in Caffe from the solver prototxt? I know there's base_lr; however, I would like to target the rate of a specific layer, and do it from the solver instead of the network prototxt.
Thanks!
Every layer that requires learning (e.g. convolutional, fully-connected, etc.) has a specific lr_mult parameter that can be controlled individually for that layer. lr_mult is a "multiplier on the global learning rate for this parameter."
Simply define or change the lr_mult for your layer in train_val.prototxt.
This is useful for fine-tuning, where you might want an increased learning rate only for the new layer.
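For example, a sketch of what the new layer might look like in train_val.prototxt (the layer name and multiplier values are hypothetical):

layer {
  name: "fc8_new"                        # the freshly initialized layer
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_new"
  param { lr_mult: 10  decay_mult: 1 }   # weights: 10x the base_lr
  param { lr_mult: 20  decay_mult: 0 }   # biases: 20x the base_lr, no decay
  inner_product_param { num_output: 20 }
}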
For more info, check the Caffe fine-tuning tutorial. (Note: it is a bit outdated, and the deprecated term blobs_lr is used there instead of lr_mult.)
EDIT: To the best of my knowledge, it is not possible to define a layer-specific learning rate from solver.prototxt. Hence, assuming the solver.prototxt restriction is not strict, the method above achieves the same result by a different route.