How to free GPU memory in PyTorch CUDA - deep-learning

I am using Colab and PyTorch CUDA for my deep learning project and ran into the problem of not being able to free up GPU memory. I have read some related posts here, but they did not solve my problem. Please guide me on how to free up the GPU memory.
Thank you in advance.

Try this:
torch.cuda.empty_cache()
or this:
with torch.no_grad():
    torch.cuda.empty_cache()
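Note that empty_cache() only returns cached blocks that are no longer referenced, so you usually need to delete the Python references first and run the garbage collector. A minimal sketch ('model', 'optimizer', and 'outputs' are placeholder names for whatever is holding memory in your session):

import gc
import torch

# Drop every reference to the tensors/model you no longer need.
# 'model', 'optimizer' and 'outputs' are placeholders.
del model, optimizer, outputs

gc.collect()               # let Python actually free the objects
torch.cuda.empty_cache()   # return cached blocks to the driver

print(torch.cuda.memory_allocated())  # bytes still held by live tensors
print(torch.cuda.memory_reserved())   # bytes held by the caching allocator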

Related

Unable to utilize full GPU allocation in Colab

I am trying to run a deep-learning-based face recognition model, but when I run it on Google Colab it uses only 1.12 GB of GPU memory out of 42 GB. I have enabled the GPU runtime, checked all of the Colab configuration, and the code uses a PyTorch wrapper.
Please help me understand how I can use the full resources in Colab.
I guess your issue was discussed here on Stack Overflow:
"The high memory setting in the screen controls the system RAM rather than GPU memory. The command !nvidia-smi will show GPU memory."
It is also covered in the Google Colab docs:
https://colab.research.google.com/notebooks/pro.ipynb
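You can also check both figures from inside the notebook with PyTorch; a quick sketch (assuming the Colab GPU is device 0):

import torch

props = torch.cuda.get_device_properties(0)
print(props.name, props.total_memory / 1024**3, "GB total")       # what the GPU has
print(torch.cuda.memory_allocated(0) / 1024**3, "GB allocated")   # what your tensors use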

Resume Training Caffe using Different GPU?

Please forgive what may be an insane question.
My setup: two machines, one with a GTX1080, one with an RX Vega 64. Retraining/fine tuning a bvlc_googlenet model on the GTX1080.
If I build Caffe for the Vega 64, then can I take a snapshot from the GTX1080 machine and restart training on the Vega 64? Would this work in the sense that the training would continue in a normal manner?
What if I moved the GTX1080 snapshot to a Volta V100 in AWS? Would this work?
I know Caffe will to some degree abstract the hardware, but I don't know how well it can do that. I need the GTX 1080 for something else...
Thanks in advance!
To my knowledge, this should work without a problem. Weight files and training snapshots are just blobs of numbers that you should be able to resume from on other hardware (e.g. CPU vs. GPU), on a different machine with a different operating system, or between 32- and 64-bit processes.
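If you drive training through pycaffe, resuming from the copied snapshot would look roughly like this (file names are hypothetical; you would still need a Caffe build that supports the Vega card, e.g. an OpenCL build):

import caffe

caffe.set_device(0)   # GPU index on the new machine
caffe.set_mode_gpu()

# Hypothetical file names: reuse the solver prototxt and copy over the
# .solverstate/.caffemodel pair produced on the GTX 1080.
solver = caffe.SGDSolver('solver.prototxt')
solver.restore('bvlc_googlenet_iter_10000.solverstate')  # weights + solver state

solver.solve()  # training continues from the restored iteration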

CUDA SDK example simpleStreams in SDK 4.1 not working

I upgraded the CUDA GPU Computing SDK and the CUDA Toolkit to 4.1. I was testing the simpleStreams program, but it consistently takes more time than the non-streamed execution. My device has compute capability 2.1, and I'm using VS2008 on Windows.
This sample constantly has issues. If you tweak the sample so the kernel and the memory copy have equal duration, the overlap will improve. Normally breadth-first submission is better for concurrency; however, on a WDDM OS this sample will usually have better overlap if you issue the memory copy right after the kernel launch.
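For reference, the two submission orders described above look like this sketched in Python with PyTorch CUDA streams (the SDK sample itself is CUDA C; the chunk count and sizes here are arbitrary):

import torch

chunks = 4
n = 1 << 20
streams = [torch.cuda.Stream() for _ in range(chunks)]
host = [torch.randn(n, pin_memory=True) for _ in range(chunks)]  # pinned memory so copies can overlap
dev = [torch.empty(n, device='cuda') for _ in range(chunks)]

# Depth-first: copy + kernel per chunk, back to back in the same stream.
for i in range(chunks):
    with torch.cuda.stream(streams[i]):
        dev[i].copy_(host[i], non_blocking=True)
        dev[i].mul_(2.0)

# Breadth-first: issue all copies first, then all kernels.
for i in range(chunks):
    with torch.cuda.stream(streams[i]):
        dev[i].copy_(host[i], non_blocking=True)
for i in range(chunks):
    with torch.cuda.stream(streams[i]):
        dev[i].mul_(2.0)

torch.cuda.synchronize()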
I noticed this as well. I thought it was just me; I didn't notice any improvement, and when I searched the forums I didn't find anyone else with the issue.
I also ran the source code from the CUDA by Example book (which is really helpful, and I recommend you pick it up if you're serious about GPU programming).
Chapter 10 has a progression of examples showing how streams should be used.
http://developer.nvidia.com/content/cuda-example-introduction-general-purpose-gpu-programming-0
But comparing the
1. non-streamed version (which is basically the single-stream version),
2. streamed version (with incorrectly queued async memcpy and kernel launch), and
3. streamed version (with correctly queued async memcpy and kernel launch),
I find no benefit in using CUDA streams. It might be a Windows 7 issue, as I found some sources online discussing that Windows Vista didn't support CUDA streams correctly.
Let me know what you find with the example I linked. My setup is: Win7 64-bit Pro, CUDA 4.1, dual GeForce GTX 460 cards, 8 GB RAM.
I'm pretty new to CUDA so I may not be able to help, but generally it's very hard to help without you posting any code. If posting is not possible, then I suggest you take a look at Nvidia's Visual Profiler. It's cross-platform and can show you where your bottlenecks are.

Is there something like Hadoop, but based on GPU?

Is there something like Hadoop, but based on GPU? I would like to do some research on distributed computing. Thank you for your help!
Yik,
Mars, a framework for GPU MapReduce, comes to mind. Is that what you're thinking of? There are also examples of a traditional Hadoop system using GPUs, and of running Hadoop on GPUs entirely.

Using High Level Shader Language for computational algorithms

So, I heard that some people have figured out ways to run programs on the GPU using the High Level Shader Language, and I would like to start writing my own programs that run on the GPU rather than the CPU, but I have been unable to find anything on the subject.
Does anyone have any experience with writing programs for the GPU or know of any documentation on the subject?
Thanks.
For computation, CUDA and OpenCL are more suitable than shader languages. For CUDA, I highly recommend the book CUDA by Example. The book is aimed at absolute beginners to this area of programming.
The best way to start, I think, is to:
1. Get a CUDA-capable card from Nvidia
2. Download the driver + toolkit + SDK
3. Build the examples
4. Read the CUDA Programming Guide
5. Start by recreating the cudaDeviceInfo example
6. Try to allocate memory on the GPU
7. Try to create a little kernel (a Python sketch of steps 6-7 follows at the end of this answer)
From there you should be able to gain enough momentum to learn the rest.
Once you learn CUDA, OpenCL and the others are a breeze.
I am suggesting CUDA because it is the most widely supported and tested.
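If you want to try those last two steps from Python before setting up the CUDA C toolchain, Numba's CUDA support is one way to do it; the array size and block size below are arbitrary:

import numpy as np
from numba import cuda

@cuda.jit
def add_one(arr):
    i = cuda.grid(1)          # global thread index
    if i < arr.size:
        arr[i] += 1.0

host = np.zeros(1024, dtype=np.float32)
dev = cuda.to_device(host)             # allocate device memory and copy the data over
threads = 256
blocks = (host.size + threads - 1) // threads
add_one[blocks, threads](dev)          # launch the little kernel
result = dev.copy_to_host()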