I want to overfit my model on a single mini-batch of data to check that my model is correct. My dataset is in LMDB format. The data layer automatically advances to new data whenever I call solver.step(). How can I stop the solver from loading new data in Caffe?
I use this with the Pycaffe interface:
if overfit: lmdb_cursor.first()
I have a flag (overfit) that, when set, calls this method to reset the cursor to the beginning of the database for each batch. Hope this helps.
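A minimal sketch of the pattern, assuming you drive the LMDB cursor yourself (for example, in a custom Python data layer). The path and the load_batch helper are illustrative, not part of pycaffe:

import lmdb

env = lmdb.open('path/to/train_lmdb', readonly=True)
lmdb_cursor = env.begin().cursor()
lmdb_cursor.first()

def load_batch(cursor, n):
    # Read n (key, value) pairs; the values are serialized Datum protos.
    items = []
    for _ in range(n):
        items.append((cursor.key(), cursor.value()))
        if not cursor.next():
            cursor.first()  # wrap around at the end of the database
    return items

overfit = True
for step in range(100):
    if overfit:
        lmdb_cursor.first()  # rewind, so every step re-reads the same mini-batch
    batch = load_batch(lmdb_cursor, 32)
    # decode the batch into the net's input blobs, then call solver.step(1)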
I am trying to run code written with PyTorch Lightning on one machine with multiple GPUs.
The run settings are:
--gpus=1,2,3,4
--strategy=ddp
There is no problem during training. But when we validate the model after one training epoch, it still runs on multiple GPUs (multiple processes), so the validation dataset is split and assigned to different GPUs. When the code then tries to write the prediction file and compute scores against the gold file, it runs into problems. Source Code
So I just want to shut down DDP when I validate the model and run the validation only on local_rank 0.
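This is not from the original post, but one common pattern is to gather the per-GPU predictions and then write the prediction file and compute the scores only on global rank 0. A rough sketch, assuming the predictions are tensors; write_predict_file and compute_scores are hypothetical helpers you would replace with your own code:

import torch
import pytorch_lightning as pl

def write_predict_file(preds):
    pass  # hypothetical: serialize predictions to disk

def compute_scores(pred_path, gold_path):
    pass  # hypothetical: compare the prediction file against the gold file

class MyModule(pl.LightningModule):
    # ... model definition, training_step, etc. ...

    def validation_step(self, batch, batch_idx):
        return self(batch)  # this process's shard of the predictions

    def validation_epoch_end(self, outputs):
        preds = self.all_gather(torch.cat(outputs))  # (world_size, n, ...)
        if self.trainer.is_global_zero:
            preds = preds.flatten(0, 1)  # merge the per-GPU shards
            write_predict_file(preds)
            compute_scores('predict.txt', 'gold.txt')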
I have trained a YOLOv4 model using my original dataset and a custom YOLOv4 configuration file, which I will refer to as my 'base' YOLOv4 model.
Now I want to use this base model to train again on images that I have manually augmented, in order to increase the mAP and AP. So I want to use the weights from my base model to train a new YOLOv4 model on the manually augmented images.
I have seen on the YOLOv4 wiki page that setting stopbackward = 1 freezes layers so that their weights are not updated, but that this reduces accuracy. I also read that ./darknet partial cfg/yolov4.cfg yolov4.weights yolov4.conv.137 137 takes out the first 137 layers. Does this mean that the first 137 layers are frozen in the network, or that you are only training those 137 layers?
My questions are:
Which code actually freezes layers so I can do transfer learning on the base YOLOv4 model I have created?
Which layers would you recommend freezing: the first 137, before the first YOLO layer in the network?
Thank you in advance!
To answer your questions:
If you want to use transfer learning, you don't have to freeze any layers. You can simply start training from the weights you stored from your first run. So instead of darknet.exe detector train data/obj.data yolo-obj.cfg yolov4.conv.137 you run darknet.exe detector train data/obj.data yolo-obj.cfg backup/your_weights_file. The weights are stored in the backup folder build\darknet\x64\backup\, so the command could look like this: darknet.exe detector train data/obj.data yolo-obj.cfg backup/yolov4_2000.weights
Freezing layers can save time during training. A good approach is to first train the model with the early layers frozen, and later unfreeze them to fine-tune the learning. I am not sure how many layers are best frozen in the first run; you may have to find out by trial and error.
The command "./darknet partial cfg/yolov4.cfg yolov4.weights yolov4.conv.137 137" dumps the weights from the first 137 layers in "yolov4.weights" into the file "yolov4.conv.137", and has nothing to do with training.
I am using Theano with Keras. I have a trained DNN and I have dumped its weights to a file. I perform some operations on these weights and dump the new, converted weights into another file.
Now I load my DNN model with these converted weights and want to compare the results between the two.
I used the keras.evaluate method, but I find the accuracy to be exactly the same even though the weights are different.
Is there another approach with which I can compare the accuracy?
Thanks.
Keras performs some operations under the hood for your batch_size, including normalization. So if you only scaled and translated your image, the result will stay the same.
In any case, you can call model.predict(sample, 1) and write your own evaluation metric to circumvent this issue.
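A minimal sketch of such a manual comparison, assuming a classification model with one-hot labels; model, x_test, y_test, and the weight file names are placeholders, not from the original question:

import numpy as np

def manual_accuracy(model, weights_path, x, y_true):
    model.load_weights(weights_path)
    probs = model.predict(x, batch_size=1)  # per-sample predictions
    y_pred = np.argmax(probs, axis=-1)      # predicted class indices
    return np.mean(y_pred == np.argmax(y_true, axis=-1))

acc_original = manual_accuracy(model, 'original_weights.h5', x_test, y_test)
acc_converted = manual_accuracy(model, 'converted_weights.h5', x_test, y_test)
print(acc_original, acc_converted)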
I'm trying to modify the weights of a caffemodel that is part of a Caffe branch called Deep Lab. Although there is a tutorial on how to do net surgery, when I try to do the same with my custom caffemodel, the Python kernel always dies on the following line:
# Load the original network and extract the fully connected layers' parameters.
net = caffe.Net('../models/deeplab/train.prototxt',
'../models/deeplab/train.caffemodel',
caffe.TRAIN)
I think it's because pycaffe doesn't know their custom layers, such as ImageSegData, Silence and SegAccuracy, so I removed these layers from the prototxt file, but the Python kernel still keeps dying when I try to load the network model. Does anyone know how to load these weights into Python?
I found it already. I literally had to remove every custom layer and, in particular, adapt the data layer so that it could read all the input images and thereby calculate the input dimensions.
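For illustration, loading could then look like the sketch below, where train_cleaned.prototxt is a hypothetical copy of the prototxt with the custom layers removed and a standard data layer put in:

import caffe

caffe.set_mode_cpu()
net = caffe.Net('../models/deeplab/train_cleaned.prototxt',
                '../models/deeplab/train.caffemodel',
                caffe.TEST)

# Inspect the loaded weights layer by layer before modifying them.
for name, params in net.params.items():
    print(name, [p.data.shape for p in params])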
I'm working with Caffe and am interested in comparing my training and test errors in order to determine if my network is overfitting or underfitting. However, I can't seem to figure out how to have Caffe report training error. It will show training loss (the value of the loss function computed over the batch), but this is not useful in determining if the network is overfitting/underfitting. Is there a straightforward way to do this?
I'm using the Python interface to Caffe (pycaffe). If I could somehow get access to the raw training set, I could just run batches through with forward passes and evaluate the results. But I can't seem to figure out how to access more than the currently-processing batch of training data. Is this possible? My data is in LMDB format.
In the train_val.prototxt file change the source in the TEST phase to point to the training LMDB database (by default it points to the validation LMDB database) and then run this command:
$ ./build/tools/caffe test -model models/bvlc_reference_caffenet/train_val.prototxt -weights models/bvlc_reference_caffenet/<caffenet_train_iter>.caffemodel -gpu 0
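Alternatively, you can score the training data yourself from pycaffe with forward passes, in the spirit of the sketch below. It assumes your train_val.prototxt has an Accuracy layer named 'accuracy' in the TEST phase (pointed at the training LMDB as above), and <caffenet_train_iter> stands for your snapshot iteration:

import caffe

caffe.set_mode_gpu()
net = caffe.Net('models/bvlc_reference_caffenet/train_val.prototxt',
                'models/bvlc_reference_caffenet/<caffenet_train_iter>.caffemodel',
                caffe.TEST)

n_batches = 100
acc = 0.0
for _ in range(n_batches):
    out = net.forward()  # the data layer feeds the next batch automatically
    acc += float(out['accuracy'])
print('training accuracy:', acc / n_batches)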