I'm trying to modify the weights of a caffemodel that is part of a Caffe branch called DeepLab. Although there is a tutorial on how to do net surgery, when I try to do the same with my custom caffemodel, the Python kernel always dies on the following line:
# Load the original network and extract the fully connected layers' parameters.
net = caffe.Net('../models/deeplab/train.prototxt',
'../models/deeplab/train.caffemodel',
caffe.TRAIN)
I think it's because pycaffe doesn't know their custom layers such as ImageSegData, Silence and SegAccuracy, so I removed these layers from the prototxt file, but the Python kernel still dies when I try to load the network model. Does anyone know how to load these weights into Python?
I found the solution. I had to remove literally every custom layer and, in particular, adapt the data layer so that it could read all the input images and thereby determine the input dimensions.
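The layer-removal step can be sketched in plain Python. This is only an illustration, not part of pycaffe: strip_layers is a hypothetical helper, it assumes each layer is written as a top-level "layer { ... }" (or "layers { ... }") block, and it treats the first "type:" field inside a block as the layer's type. It only deletes text; the data layer still has to be replaced by hand with one that pycaffe understands.

```python
import re


def strip_layers(prototxt_text, unwanted_types):
    """Return prototxt_text without the layer blocks whose type is unwanted.

    Hypothetical helper: plain text processing, no Caffe required.
    """
    header = re.compile(r'\blayers?\s*\{')
    out = []
    pos = 0
    n = len(prototxt_text)
    while True:
        m = header.search(prototxt_text, pos)
        if m is None:
            out.append(prototxt_text[pos:])
            break
        # Walk forward from the opening brace to its matching closing brace.
        depth = 0
        end = m.end() - 1
        while end < n:
            ch = prototxt_text[end]
            if ch == '{':
                depth += 1
            elif ch == '}':
                depth -= 1
                if depth == 0:
                    break
            end += 1
        block = prototxt_text[m.start():end + 1]
        out.append(prototxt_text[pos:m.start()])
        # Assumption: the first "type:" in the block is the layer's own type.
        t = re.search(r'type:\s*"?([A-Za-z_]+)"?', block)
        if not (t and t.group(1) in unwanted_types):
            out.append(block)
        pos = end + 1
    return ''.join(out)
```

For example, strip_layers(text, {"ImageSegData", "Silence", "SegAccuracy"}) returns the prototxt text without those three layer types; any bottom blobs they produced must then be supplied some other way.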
I am trying to run code written with PyTorch Lightning on a single machine with multiple GPUs.
The run settings are:
--gpus=1,2,3,4
--strategy=ddp
Training works without problems. But when the model is validated after one epoch of training, it still runs on multiple GPUs (multiple processes), so the validation dataset is split and assigned to different GPUs. As a result, when the code tries to write the prediction file and compute the scores against the gold file, it runs into problems.
So I want to turn off DDP when validating the model and run validation only on local_rank 0.
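One possible workaround, instead of disabling DDP, is to collect the per-rank predictions and put them back into dataset order before writing the file on rank 0. The sketch below covers only the reordering step. It assumes validation uses an unshuffled PyTorch DistributedSampler, which deals sample i to rank i % world_size and pads the tail by repeating the first samples. merge_rank_shards is a hypothetical helper, and how the shards reach rank 0 (for example via torch.distributed.all_gather_object or per-rank files) is left open.

```python
def merge_rank_shards(shards, dataset_len):
    """Interleave per-rank prediction lists back into dataset order.

    shards[r] holds the predictions produced by rank r, in the order that
    rank saw its samples. Assumes round-robin assignment (sample i goes to
    rank i % world_size) and drops the padded duplicates at the end.
    """
    world_size = len(shards)
    merged = []
    for i in range(dataset_len):
        rank = i % world_size          # which process handled sample i
        position = i // world_size     # where it sits in that rank's list
        merged.append(shards[rank][position])
    return merged
```

With the merged list in hand, only the process with rank 0 needs to write the prediction file and compare it with the gold file.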
I have trained a model of YOLOv4 by using my original dataset and the custom yolov4 configuration file, which I will refer to as my 'base' YOLOv4 model.
Now I want to use this base model that I have created to train the model again using images that I have manually augmented. I am trying to retrain my models to try and increase the mAP and AP. So I want to use the weights from my base model to train a new yolov4 model with the manually augmented images.
I have seen on the YOLOv4 wiki page that using stopbackward = 1 freezes the layers so weights in these layers would not be updated, however this reduces accuracy. Also there was another piece of information that I read where ./darknet partial cfg/yolov4.cfg yolov4.weights yolov4.conv.137 137 takes out the first 137 layers. Does this mean that the first 137 layers are frozen in the network or does this mean you are only training on the 137 layers?
My questions are:
Which code actually freezes layers so that I can do transfer learning on the base YOLOv4 model I have created?
Which layers would you recommend freezing: the first 137, before the first YOLO layer in the network?
Thank you in advance!
To answer your questions:
If you want to use transfer learning, you don't have to freeze any layers. You should simply start training with the weights you have stored from your first run. So instead of darknet.exe detector train data/obj.data yolo-obj.cfg yolov4.conv.137 you can run darknet.exe detector train data/obj.data yolo-obj.cfg backup/your_weights_file. The weights are stored in the backup folder build\darknet\x64\backup\. So for example, the command could look like this: darknet.exe detector train data/obj.data yolo-obj.cfg backup/yolov4_2000.weights
Freezing layers can save time during training. A good approach is to first train the model with the first layers frozen, and later unfreeze them to fine-tune the model. I am not sure how many layers should be frozen in the first run; you may have to find out by trial and error.
The command "./darknet partial cfg/yolov4.cfg yolov4.weights yolov4.conv.137 137" dumps the weights from the first 137 layers in "yolov4.weights" into the file "yolov4.conv.137", and has nothing to do with training.
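If you do want to experiment with freezing, the stopbackward = 1 setting mentioned in the question goes into the cfg file, in the section of the layer where backpropagation should stop. Below is a minimal sketch of a script that adds it. add_stopbackward is a hypothetical helper, not part of Darknet, and it assumes layers are numbered from 0 with the [net] section not counted. Exactly which layers stop being updated should be checked against the Darknet version you build.

```python
NON_LAYER_SECTIONS = ("[net]", "[network]")


def add_stopbackward(cfg_text, layer_index):
    """Return cfg_text with 'stopbackward=1' added to one layer section.

    Hypothetical helper: plain text editing of a Darknet cfg file.
    layer_index counts layer sections from 0 and skips [net].
    """
    out = []
    current = -1          # index of the layer section we are currently in
    inserted = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        is_header = stripped.startswith("[") and stripped.endswith("]")
        if is_header and stripped.lower() not in NON_LAYER_SECTIONS:
            current += 1
            out.append(line)
            if current == layer_index:
                # Options inside a section are key=value lines in any order.
                out.append("stopbackward=1")
                inserted = True
        else:
            out.append(line)
    if not inserted:
        raise ValueError("cfg has no layer with index %d" % layer_index)
    return "\n".join(out) + "\n"
```

Writing the result to a copy of the cfg keeps the original configuration available for the later run in which the layers are unfrozen.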
What is going wrong with this code? I have generated adversarial images using the cleverhans API (the generate_np method), and I am using the default cleverhans CNN classifier to classify the images. As expected, the test accuracy is very low when I use the model right after generating the images. But if I save and reload the model, the accuracy is unexpectedly high. Please check the code here.
https://github.com/csesivakumar/Adversarial_Defense/blob/master/Cleverhans_generatenp.ipynb
Python: 3.6
Pasting my answer from the GitHub issue tracker in case others are facing the same issue:
From your code it looks like you are initializing the model's weights, defining the tf session, etc. after having trained the model using Keras. My guess is that the adv_x array does not contain images that are adversarial. This would explain why the accuracy output by [22] is close to random: the model weights are random. When you restore the model, its weights are set again to the values learned during training, so the accuracy is restored (because the images are not adversarial).
In Caffe, what happens if I change some parameters in the solver or train prototxt while training a network using the given files (and e.g. run another training using the updated solver/train prototxt)? Does it affect the training or is the content of the files loaded in the beginning and the training is unaffected by the later changes?
Prototxt files are read from disk the moment you call caffe train from the command line or caffe.Net/caffe.get_solver via the Python interface, and never again. The solver or network is instantiated using those parameters, and any further changes to the files are irrelevant (until you manually reload, of course).
When training a network, the snapshots taken every N iterations come in two forms together. One is the .solverstate file, which I presume is exactly what it sounds like, storing the state of the loss functions and gradients, etc. The other is the .caffemodel file which I know stores the trained parameters.
The .caffemodel is the file you need if you want a pre-trained model, so I imagine it's also the file you want if you are going to test your network.
What is the .solverstate good for? In this tutorial it looks like you can restart training from it, but how does that differ from using the .caffemodel? Does .solverstate also include the same info as .caffemodel? Put another way, is .caffemodel just a subset of .solverstate?
The solverstate file, as its name conveys, stores the state of the solver and not any information related to classification results. The model is saved as a caffemodel file, which you can use to obtain classification results for your data. If you want to fine-tune your network, you may use a pre-trained caffemodel file. This will save time, as your network does not need to learn from scratch. But in case your present training is halted, for example due to a power cut or an unexpected reboot, you may resume your training from the previous snapshot of the solverstate. The difference between using the solverstate and the caffemodel files is that the former allows you to complete your training in the pre-determined manner, while the latter may require changes in certain training parameters such as the maximum number of iterations.
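The two ways of continuing can be sketched with pycaffe. This is a minimal sketch that assumes pycaffe is installed; the function names and any file paths passed to them are hypothetical. caffe is imported inside the functions so that the definitions load even on a machine without it.

```python
def resume_training(solver_prototxt, solverstate_path):
    """Pick up an interrupted run where its last snapshot left off."""
    import caffe  # lazy import: only needed when the function is called
    solver = caffe.get_solver(solver_prototxt)
    # Restores the solver's bookkeeping (iteration counter, update history).
    # As far as I know, Caffe also reloads the weights from the .caffemodel
    # that was written together with this snapshot.
    solver.restore(solverstate_path)
    return solver


def start_finetuning(solver_prototxt, caffemodel_path):
    """Start a fresh run at iteration 0, initialised from trained weights."""
    import caffe  # lazy import: only needed when the function is called
    solver = caffe.get_solver(solver_prototxt)
    # Copies parameters into layers with matching names; no solver state.
    solver.net.copy_from(caffemodel_path)
    return solver
```

In both cases the returned solver can then be run with solver.solve() or advanced a fixed number of iterations with solver.step(n). Because the fine-tuning run starts from iteration 0, settings such as max_iter in the solver prototxt may need adjusting, as noted above.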