OpenAI Gym agent trains with GPU on WSL but only at 20% GPU utilization - reinforcement-learning

I want to train my agent on my GPU so training is faster.
It works, but the GPU only runs at about 20% utilization.
Does anyone have instructions on how to give WSL more GPU power for this?
Thanks a lot.
I looked at the WSL config, but I didn't find anything related to GPU usage.
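For what it's worth, a first sanity check is to confirm that the GPU is actually visible from inside WSL. Here is a minimal sketch assuming the agent is built on PyTorch (the question doesn't say which framework is used, so that is an assumption):

import torch  # assumption: the agent uses PyTorch; the check differs for other frameworks

print(torch.cuda.is_available())          # should print True if WSL exposes the GPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the card as seen inside WSL

If the GPU is visible, low utilization in reinforcement learning is often caused by CPU-bound environment stepping and small network/batch sizes rather than by a WSL limit; as far as I know, .wslconfig has settings for memory and CPU cores but nothing that caps or raises GPU usage.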

Related

Running CUDA on a virtual machine without a physical NVidia GPU card

Is it possible to run a CUDA program on a virtual machine without having a physical NVidia GPU card on the host machine?
PCIe passthrough is only viable if the host machine has an NVidia card and that's not available.
One possible option to run CUDA programs without a GPU installed is to use an emulator/simulator (e.g. http://gpgpu-sim.org/ ), but these simulators are usually limited.
I would appreciate a clear answer on that matter.
Thanks!
You can't run any modern version of CUDA (e.g. 6.0 or newer) unless you have actual GPU hardware available on the machine or virtual machine.
The various simulators and other methods all depend on very old versions of CUDA.

Nvidia CUDA Profiler's timeline contains many large gaps

I'm trying to profile my code using the Nvidia Profiler, but I'm getting strange gaps in the timeline as shown below:
Note: the operations on either edge of the gaps are cudaMemcpyAsync (host-to-device) calls.
I'm running on Ubuntu 14.04 with the latest version of CUDA (8.0.61) and the latest Nvidia display driver.
The Intel integrated graphics is used for the display, not the Nvidia card, so the Nvidia GPU is only running the code and nothing else.
I've enabled CPU profiling as well to check these gaps, but nothing is shown!
Also, no debugging options are enabled (neither -G nor -g);
this is a release build.
My laptop's specs:
Intel Core i7 4720HQ
Nvidia GTX 960M
16 GB DDR3 RAM
1 TB Hard Drive
Is there any way to trace what's happening in these empty time slots?
Thanks,
I'm afraid there is no automatic method, but you can add custom traces to your code to find out what's happening:
to do that you can use NVTX.
Follow the links for tutorials and documentation.
These profiling holes are probably due to data loading and memory allocations/initialisations done by the host between your kernel executions.
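For illustration, here is roughly what such custom ranges look like with NVIDIA's nvtx Python bindings (in CUDA C/C++ the equivalent calls are nvtxRangePush/nvtxRangePop); the sleeps below are hypothetical stand-ins for the real host-side work and kernel launches:

import time
import nvtx  # pip install nvtx

with nvtx.annotate("prepare_batch"):        # shows up as a named range on the profiler timeline
    time.sleep(0.01)                        # stand-in for data loading / host allocations

with nvtx.annotate("h2d_copy_and_kernels"):
    time.sleep(0.01)                        # stand-in for cudaMemcpyAsync + kernel launches

Once the ranges appear in the timeline, the gaps should line up with whichever named region the host is busy in.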

How to get maximum GPU memory usage for a process on Ubuntu? (For Nvidia GPU)

I have a server with Ubuntu 16.04 installed. It has a K80 GPU. Multiple processes are using the GPU.
Some processes have unpredictable GPU usage, and I want to reliably monitor their GPU usage.
I know that you can query GPU usage via nvidia-smi, but that only gives you the usage at the moment of the query.
Currently I query the information every 100 ms, but that is just sampling the GPU usage and can miss the peak.
Is there a reliable way to get the maximum GPU memory usage for a given process (PID)?
Try using the NVIDIA Visual Profiler. I am not sure how accurate it is, but it gives you a graph of device memory usage over time while your program is running.
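Another option is to keep the sampling approach but record a running maximum per PID, for example with a small wrapper around nvidia-smi's compute-apps query (a sketch; it still samples, so very short peaks can be missed):

import collections
import subprocess
import time

peak = collections.defaultdict(int)        # PID -> highest device memory seen so far (MiB)
for _ in range(600):                       # watch for about a minute at ~100 ms resolution
    out = subprocess.check_output(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"]).decode()
    for line in out.splitlines():
        pid, mem = (field.strip() for field in line.split(","))
        if mem.isdigit():                  # some drivers report "[N/A]" here
            peak[int(pid)] = max(peak[int(pid)], int(mem))
    time.sleep(0.1)
print(dict(peak))                          # per-PID peak memory observed during the run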

How do I explain performance variability over PCIe bus?

In my CUDA program I see large variability between different runs (up to 50%) in communication time, which includes host-to-device and device-to-host data transfers over PCI Express using pinned memory. How can I explain this variability? Does it happen when the PCIe controller and memory controller are busy performing other PCIe transfers? Any insight/reference is greatly appreciated. The GPU is a Tesla K20c; the host is an AMD Opteron 6168 with 12 cores running the Linux operating system. The PCI Express version is 2.0.
The system you are doing this on is a NUMA system, which means that each of the two discrete CPUs in your host (the Opteron 6168 has two 6-core CPUs in a single package) has its own memory controller, and there may be a different number of HyperTransport hops between each CPU's memory and the PCIe controller hosting your CUDA device.
This means that, depending on CPU affinity, the thread which runs your bandwidth tests may see different latency to both host memory and the GPU. This would explain the timing differences you are seeing.
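A quick way to test this is to bind the benchmark to one NUMA node at a time and compare the transfer times. Here is a sketch using numactl, with the CUDA bandwidthTest sample standing in for the actual transfer benchmark (the path and the two-node layout are assumptions; check numactl --hardware for the real topology):

import subprocess

# Bind both the threads and the host memory allocations to one NUMA node at a time,
# then compare the reported pinned-memory transfer rates between the two nodes.
for node in (0, 1):
    print(f"--- NUMA node {node} ---")
    subprocess.run(["numactl", f"--cpunodebind={node}", f"--membind={node}",
                    "./bandwidthTest", "--memory=pinned"])

If one node is consistently faster, pinning the thread that performs the transfers to that node should remove most of the variability.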

Getting Theano to use the GPU and all CPU cores (at the same time)

I managed to get Theano working with either the GPU or a multicore CPU on Ubuntu 14.04 by following this tutorial.
First I got multicore working (I could verify that in the System Monitor).
Then, after adding the config below to .theanorc, I got the GPU working:
[global]
device = gpu
floatX = float32
I verified it by running the test from the tutorial and checking the execution times, and also by the log message when running my program:
"Using gpu device 0: GeForce GT 525M"
But as soon as the GPU started working, I no longer saw multicore usage in the System Monitor; it uses just one core at 100%, like before.
How can I use both? Is it even possible?
You can't fully utilize both the multicore CPU and the GPU at the same time.
Maybe this can be improved in the future.
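For the ops that still fall back to the CPU you can at least enable Theano's OpenMP support. A sketch setting the flags from Python before Theano is imported (the thread count is an arbitrary example, and ops that run on the GPU won't use the extra cores):

import os

# Must be set before importing theano; openmp only affects ops executed on the CPU.
os.environ["THEANO_FLAGS"] = "device=gpu,floatX=float32,openmp=True"
os.environ["OMP_NUM_THREADS"] = "4"

import theano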