How to customize the data loader of YOLOv5 to train on the VisDrone dataset? - deep-learning

I am new to deep learning. I am doing a school project where I am trying to train a YOLOv5 model on the VisDrone dataset. My training set has 4911 images and my validation set has more than 3,000 but fewer than 4,000 images. I am using Google Colab Pro; as far as I know it has 32 GB of RAM and the GPU has 15-16 GB of VRAM. If I let the model load the data automatically, it fails with "CUDA out of memory". What strategy can I take to solve this problem? Should I customize the data loader, which is the dataloaders.py file? How do I do that?

A CUDA out-of-memory error is usually caused by the batch size; it is much better to let the trainer decide the batch size itself. To do this, change line 442 of train.py from:
parser.add_argument('--batch-size', type=int, default=16, help='total batch size for all GPUs, -1 for autobatch')
to this:
parser.add_argument('--batch-size', type=int, default=-1, help='total batch size for all GPUs, -1 for autobatch')
This should solve the out of memory issue.
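Alternatively, you can leave train.py untouched and pass the batch size on the command line. A minimal example, assuming the stock VisDrone.yaml data config and the yolov5s.pt pretrained weights from the YOLOv5 repository:
python train.py --data VisDrone.yaml --weights yolov5s.pt --img 640 --batch-size -1
With --batch-size -1, YOLOv5's AutoBatch picks the largest batch that fits in GPU memory; if you still run out of memory, also try a smaller --img size such as 512.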

Related

Is there any way to use RAM and VRAM at the same time in PyTorch?

When I use Keras, it loads the input dataset into RAM and the network onto VRAM, so I can train very large networks.
However, I cannot find a way to do this with PyTorch, which I suspect leaves me with very little usable memory.
Is there any way to use both RAM and VRAM in PyTorch?
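For what it's worth, this split is also the standard pattern in PyTorch: the Dataset and DataLoader keep the data in CPU RAM, and only the model plus the current batch live in VRAM. A minimal sketch (the tensor shapes and the tiny model here are made up purely for illustration):

import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# The full dataset stays in ordinary CPU RAM (random tensors as a stand-in).
x = torch.randn(10000, 3, 64, 64)
y = torch.randint(0, 10, (10000,))
loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True,
                    num_workers=2, pin_memory=True)

# Only the model (and its optimizer state) lives permanently in VRAM.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for xb, yb in loader:
    # Each batch is copied to VRAM only for this step and freed afterwards.
    xb, yb = xb.to(device, non_blocking=True), yb.to(device, non_blocking=True)
    optimizer.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    optimizer.step()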

How to set (limit) the number of CPU cores used by YOLO on Darknet

I am using Darknet to train image data for object detection using YOLOv3 on a computer I share with others. Is there a way to limit the number of cores (CPUs) used by the network to avoid being ostracized by my colleagues? :)
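One hedged suggestion, assuming a Linux machine and a Darknet binary built with OPENMP=1: pin the process to a subset of cores with taskset and/or cap the OpenMP thread count via an environment variable (the data/cfg/weights paths below are placeholders for your own files):
OMP_NUM_THREADS=4 taskset -c 0-3 ./darknet detector train data/obj.data cfg/yolov3.cfg darknet53.conv.74
taskset restricts the process to cores 0-3, and OMP_NUM_THREADS keeps Darknet's OpenMP loops from spawning more worker threads than that.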

How to extend TensorFlow's GPU memory from system RAM

I want to train Google's object detection code with faster_rcnn_resnet101 on the MS COCO dataset. I used only 10,000 images for training. My graphics card is a GeForce 930M/PCIe/SSE2 with NVIDIA driver version 384.90.
I have 8 GB of system RAM, but TensorFlow reports only 1.96 GB of GPU memory.
How can I extend my GPU's memory? I want to use the full system memory.
You can train on the CPU to take advantage of the RAM on your machine. However, to run something on the GPU it has to be loaded onto the GPU first. You can swap memory in and out, because not all of the results are needed at every step, but you pay with a very long training time, so I would rather advise you to reduce the batch size. Nevertheless, details about this process and an implementation can be found here: https://medium.com/#Synced/how-to-train-a-very-large-and-deep-model-on-one-gpu-7b7edfe2d072.
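If you do decide to go the CPU route, one simple way (my own suggestion, not part of the linked article) is to hide the GPU from TensorFlow before it is imported, so the model and data stay in system RAM:

import os

# Hide the GPU so TensorFlow falls back to the CPU and uses system RAM.
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

import tensorflow as tf
print(tf.test.is_gpu_available())  # should print False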

Running text classification - CNN on GPU

Based on this GitHub project, https://github.com/dennybritz/cnn-text-classification-tf, I want to classify my datasets on Ubuntu 16.04 using the GPU.
To run on the GPU, I changed line 23 of text_cnn.py to this: with tf.device('/gpu:0'), tf.name_scope("embedding"):
My first training dataset has 9,000 documents and is about 120 MB; the second has 1,300 documents and is about 1 MB.
After running on my Titan X server with the GPU, I get errors.
Please guide me: how can I solve this issue?
Thanks.
You are getting an out-of-memory error, so the first thing to try is a smaller batch size
(the default is 64). I would start with:
./train.py --batch_size 32
Most of the memory is used to hold the embedding parameters and convolution parameters. I would suggest reducing:
EMBEDDING_DIM
NUM_FILTERS
BATCH_SIZE
Try embedding_dim=16, batch_size=16, and num_filters=32; if that works, increase them 2x at a time (for example, via the flags shown below).
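Those values can be passed as train.py flags defined in that repository (flag names taken from its FLAGS definitions; double-check them against your copy):
./train.py --embedding_dim 16 --num_filters 32 --batch_size 16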
Also, if you are using a Docker virtual machine to run TensorFlow, you might be limited to only 1 GB of memory by default even though your machine has 16 GB; see here for more details.

Is it possible to reduce the number of images per batch in convert_imageset.cpp to tackle GPU out-of-memory errors?

I was trying to run FCN on my data in Caffe. I was able to convert my image sets into LMDB with Caffe's built-in convert_imageset tool. However, when I tried to train the net, it gave me the following error:
Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
.....
Aborted (core dumped)
I went through many online resources on this memory failure, and most of them suggest reducing the batch size. I even reduced the size of the images to 256x256, but I still could not fix the issue.
I checked the GPU memory with nvidia-smi; the card is an Nvidia GT 730 with 1998 MiB of memory. Since the batch size in train_val.prototxt is already 1, I cannot do anything more in train_val.prototxt. So my questions are:
1. Looking at the log in the terminal, I realized that whenever convert_imageset converts the data into LMDB, it commits 1000 images in a group. Is it possible to change this number in lines 143 and 151 of convert_imageset.cpp to something smaller (for example 2, to take two images at a time), recompile Caffe, and then convert the images to LMDB with convert_imageset? Does that make sense?
2. If the answer to question 1 is yes, how can I compile Caffe again? Should I remove the build folder and do the Caffe installation from scratch again?
3. How does Caffe process the LMDB data? Does it take a batch of those 1000 images shown while running convert_imageset?
Your help is really appreciated.
Thanks...
AFAIK, the number of entries committed to LMDB in each transaction (txn->Commit();) has no effect on the CUDA out-of-memory error.
If you do want to re-compile Caffe for whatever reason, simply run make clean. This will clear everything and let you re-compile from scratch.
Again, AFAIK, Caffe reads batch_size images from LMDB at a time, regardless of the transaction size used when writing the dataset.
Are you sure batch_size is set to 1 for both the TRAIN and TEST phases? It appears in the data layer of each phase; see the sketch below.
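For reference, this is roughly where batch_size lives in a train_val.prototxt; there is one data layer per phase, and both need a small value (the LMDB source path here is a placeholder):

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  data_param {
    source: "path/to/train_lmdb"
    backend: LMDB
    batch_size: 1   # reduce here (and in the TEST layer) to fit in 2 GB of GPU memory
  }
}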