Running text classification - CNN on GPU - deep-learning

Based on this GitHub repository, https://github.com/dennybritz/cnn-text-classification-tf , I want to classify my datasets on Ubuntu 16.04 using the GPU.
To run on the GPU, I changed line 23 of text_cnn.py to this: with tf.device('/gpu:0'), tf.name_scope("embedding"):
My first training dataset has 9,000 documents and is about 120 MB in size;
the second one has 1,300 documents and is about 1 MB.
When I run this on my server with a Titan X GPU, I get errors.
Please guide me: how can I solve this issue?
Thanks.

You are getting an Out of Memory error, so the first thing to try is a smaller batch size
(the default is 64). I would start with:
./train.py --batch_size 32

Most of the memory is used to hold the embedding parameters and the convolution parameters, so I would suggest reducing:
EMBEDDING_DIM
NUM_FILTERS
BATCH_SIZE
Try embedding_dim=16, batch_size=16, and num_filters=32; if that works, increase them 2x at a time.
Also, if you are running TensorFlow inside a Docker virtual machine, you might be limited to 1 GB of memory by default even though your machine has 16 GB. See here for more details.
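To get a feel for how much those knobs matter, here is a rough, standalone estimate of the two largest float32 weight blocks in this kind of text CNN (the vocabulary size and filter sizes below are made-up example values, not taken from the repository):

```python
# Rough float32 parameter-memory estimate for an embedding + conv text CNN.
# vocab_size and filter_sizes are hypothetical example values.
def cnn_param_bytes(vocab_size, embedding_dim, filter_sizes, num_filters):
    embedding = vocab_size * embedding_dim  # embedding lookup table
    convs = sum(fs * embedding_dim * num_filters for fs in filter_sizes)  # conv kernels
    return (embedding + convs) * 4          # 4 bytes per float32 weight

default = cnn_param_bytes(100_000, 128, (3, 4, 5), 128)
reduced = cnn_param_bytes(100_000, 16, (3, 4, 5), 32)
print(f"default ≈ {default / 1024**2:.1f} MiB, reduced ≈ {reduced / 1024**2:.1f} MiB")
```

Shrinking embedding_dim dominates the savings here, which is why it is a good first knob to turn; batch size then mostly controls activation memory rather than weight memory.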

Related

How to customize the data loader of YOLOv5 to train on the VisDrone dataset?

I am new to deep learning. I am doing a school project where I am trying to train a YOLOv5 model on the VisDrone dataset. My training set has 4,911 images and my validation set has between 3,000 and 4,000 images. I am using Google Colab Pro. As far as I know, it has 32 GB of RAM and 15-16 GB of GPU VRAM. If I let the model load data automatically, it shows "CUDA out of memory". What strategy can I take to solve this problem? Should I customize the data loader, which is the dataloaders.py file? How do I do that?
Usually, CUDA out of memory occurs due to the batch size; it is much better if you let the trainer decide the batch size itself. To do this, replace line 442 in train.py from:
parser.add_argument('--batch-size', type=int, default=16, help='total batch size for all GPUs, -1 for autobatch')
to this:
parser.add_argument('--batch-size', type=int, default=-1, help='total batch size for all GPUs, -1 for autobatch')
This should solve the out-of-memory issue.
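As a minimal, self-contained sketch of the same idea (plain argparse, not YOLOv5's actual train.py), a default of -1 means "let autobatch pick", while the flag can still be overridden explicitly:

```python
import argparse

# Minimal sketch of the suggested change: default -1 signals YOLOv5's
# autobatch to pick the largest batch size that fits in GPU memory.
parser = argparse.ArgumentParser()
parser.add_argument('--batch-size', type=int, default=-1,
                    help='total batch size for all GPUs, -1 for autobatch')

print(parser.parse_args([]).batch_size)                     # -1 (autobatch)
print(parser.parse_args(['--batch-size', '8']).batch_size)  # 8  (explicit)
```

Note that for a one-off run you do not have to edit train.py at all: passing --batch-size -1 on the command line has the same effect as changing the default.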

TensorFlow strange memory usage

I'm on an Ubuntu 19.10 machine (with the KDE desktop environment) with 8 GB of RAM, an i5 8250U, and an MX130 GPU (2 GB VRAM), running a Jupyter Notebook with tensorflow-gpu.
I was training some models to test their memory usage, and I can't make any sense of what I'm seeing. I used KSysGuard and NVIDIA System Monitor (https://github.com/congard/nvidia-system-monitor) to monitor my system during training.
As soon as I hit "train", NVIDIA System Monitor shows that memory usage is at 100% (or close to it, like 95-97%), while GPU utilization is fine.
Still in NVIDIA System Monitor, I look at the process list and "python" occupies only around 60 MB of VRAM.
In KSysGuard, Python's memory usage is always around 700 MB.
There might be some explanation for that; the problem is that the GPU's memory usage hits 90% with a model of literally 2 neurons (densely connected, of course xD), just like a model with 200 million parameters does. I'm using a batch size of 128.
I thought about it, and if I'm not wrong, a model with 200 million parameters should occupy 200,000,000 × 4 bytes × 128, which would be about 102 GB.
That means I'm definitely wrong about something, but I can't keep this riddle to myself, so I decided to give you the chance to solve it ;D
PS: English is not my main language.
TensorFlow by default allocates all available VRAM on the target GPU. There is an experimental feature called memory growth that lets you control that: basically, it stops the initialization process from allocating all the VRAM, and instead allocates it only when there is a need for it.
https://www.tensorflow.org/api_docs/python/tf/config/experimental/set_memory_growth
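The full-allocation behavior also explains the arithmetic puzzle in the question: weights are stored once, so batch size does not multiply parameter memory (it multiplies activation memory instead). A quick check in plain Python, assuming float32 parameters:

```python
# Weights occupy n_params * 4 bytes (float32) once, independent of batch size.
def param_gib(n_params, bytes_per_param=4):
    return n_params * bytes_per_param / 1024**3

print(f"200M-parameter model weights ≈ {param_gib(200_000_000):.2f} GiB")
```

So a 200-million-parameter float32 model needs well under 1 GiB for its weights alone; what the monitor shows is TensorFlow's up-front grab of nearly all VRAM, which memory growth turns off.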

What is the issue with the RapidMiner HBOS process?

I have a dataset of more than 1 million transactions, running the HBOS algorithm on Windows with 32 GB of RAM.
The issue is that we are getting an Out of Memory error.
Can anyone help?
The HBOS algorithm can be quite memory-intensive, and the memory needed grows with the number of attributes used. So first of all, reducing the number of attributes might help.
But I couldn't reproduce your error. Perhaps you should reduce the maximum memory used by RapidMiner (under Settings -> Preferences -> System). The JVM always needs a slight overhead, so running RapidMiner with 30 GB max memory should be safe.

How to extend tensorflow's GPU memory from system RAM

I want to train Google's object detection code with faster_rcnn_resnet101 on the MS COCO dataset. I used only 10,000 images for training. My graphics card is a GeForce 930M/PCIe/SSE2, NVIDIA driver version 384.90. Here is a picture of my GeForce.
I have 8 GB of RAM, but TensorFlow shows only 1.96 GB of GPU memory.
How can I extend my GPU's RAM? I want to use the full system memory.
You can train on the CPU to take advantage of the RAM on your machine. However, to run something on the GPU, it has to be loaded to the GPU first. You can swap memory in and out, because not all the results are needed at every step, but you pay with a very long training time, and I would rather advise you to reduce the batch size. Nevertheless, details about this process and an implementation can be found here: https://medium.com/@Synced/how-to-train-a-very-large-and-deep-model-on-one-gpu-7b7edfe2d072.

Is it possible to reduce the number of images per batch in convert_imageset.cpp to tackle GPU out of memory?

I was trying to run an FCN on my data in Caffe. I was able to convert my image sets into LMDB with Caffe's built-in convert_imageset tool. However, once I wanted to train the net, it gave me the following error:
Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
.....
Aborted (core dumped)
I went through many online resources on this memory failure, but most of them suggest reducing the batch size. I even reduced the image size to 256x256, but I could not solve the issue.
I checked the GPU's memory with nvidia-smi; the card is an Nvidia GT 730 with 1998 MiB of memory. Since the batch size in train_val.prototxt is already 1, I cannot do anything more in train_val.prototxt. So my questions are:
1. By looking at the log in the terminal, I realized that whenever convert_imageset converts the data into LMDB, it takes 1000 images per group. Is it possible to change this number in lines 143 and 151 of convert_imageset.cpp to something smaller (for example 2, to take two images at a time), recompile Caffe, and then convert the images to LMDB using convert_imageset? Does that make sense?
2. If the answer to question 1 is yes, how can I compile Caffe again? Should I remove the build folder and do the Caffe installation from scratch?
3. How does Caffe process the LMDB data? Does it take a batch of those 1000 images shown while running convert_imageset?
Your help is really appreciated.
Thanks...
AFAIK, the number of entries committed to LMDB in each transaction (txn->Commit();) has no effect on the CUDA out-of-memory error.
If you do want to recompile Caffe for whatever reason, simply run make clean. This will clear everything and let you recompile from scratch.
Again, AFAIK, Caffe reads batch_size images from LMDB at a time, regardless of the transaction size used when writing the dataset.
Are you sure batch_size is set to 1 for both the TRAIN and TEST phases?