Building MXNet for Windows (both CPU and GPU mode): running into errors

MXNet is supposed to build and run on both CPU and GPU, on multiple operating systems including Windows.
I'm trying to build MXNet from source on a Windows Server 2016 machine with an NVIDIA K80 GPU.
I followed all the instructions in https://mxnet.incubator.apache.org/get_started/windows_setup.html but I can't get past the step of building MXNet in Visual Studio 2013.
The error I'm seeing is:
'mshadow::cuda::AddTakeGrad' : ambiguous call to overloaded function indexing_op.h
If I change this generic call to AddTakeGrad into a specific call to mshadow::cuda::AddTakeGrad, some other overloaded function then fails with the same error, and so on.
I searched extensively for anyone who had successfully built MXNet on Windows (in both CPU and GPU mode) but couldn't find any.
Question: Has anyone successfully built MXNet on Windows? If so, could you help with this error and share any specific instructions for building in both CPU and GPU mode?

These days it should be possible to just install MXNet with pip (pip install mxnet) instead of building it from source.

Related

Installing CUDA as a non-root user with no GPU

I have a desktop without a GPU, on which I would like to develop code; and a machine on some cluster which has a GPU, and CUDA installed, but where I really can't "touch" anything and on which I won't run an IDE etc. I don't have root on any of the machines, woe is me.
So, essentially, I want to be able to compile and build my CUDA code on my own GPU-less desktop machine, then just copy it and test it on the other machine.
Can this be done despite the two obstacles (no GPU on my desktop, no root anywhere)? I seem to recall the CUDA installer requiring the presence of a GPU, playing with the kernel, and doing other root-y stuff.
Notes:
I'll be using the standalone installer, not a package.
I'm on Fedora 22 with an x86_64 CPU.
Assuming you want to develop codes that use the CUDA runtime API, you can install the cuda toolkit on a system that does not have a GPU. Using the runfile installer method, simply answer no when prompted to install the driver.
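As a sketch of what that workflow looks like, a runtime API program such as the one below compiles with nvcc on a machine with no GPU and no driver; a GPU is only needed when it runs. The file name, kernel, and helper names are illustrative, not from the answer. Because nvcc cannot query a GPU that isn't there, the assumption here is that you pass the target GPU's compute capability to -arch yourself (sm_XX is a placeholder).

```cuda
// scale.cu: a minimal CUDA runtime API program (illustrative).
// Build on the GPU-less machine (no driver needed), e.g.:
//   nvcc -arch=sm_XX -o scale scale.cu   (sm_XX: the target GPU's compute capability)
// then recompile or copy it to the GPU machine and run it there.
#include <cstdio>
#include <cuda_runtime.h>

// Multiply every element of x by a.
__global__ void scaleKernel(float* x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

// Copy h[0..n) to the device, scale it, and copy it back.
// Returns the first CUDA error encountered (cudaSuccess if all went well);
// on a machine without a GPU this returns an error instead of crashing.
cudaError_t scaleOnGpu(float* h, int n, float a) {
    float* d = nullptr;
    size_t bytes = n * sizeof(float);
    cudaError_t err = cudaMalloc(&d, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scaleKernel<<<blocks, threads>>>(d, n, a);
        err = cudaGetLastError();  // catches launch errors
    }
    if (err == cudaSuccess) err = cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err;
}

int main() {
    float v[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    cudaError_t err = scaleOnGpu(v, 4, 2.0f);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("%g %g %g %g\n", v[0], v[1], v[2], v[3]);
    return 0;
}
```

Returning the error code rather than aborting keeps the same binary harmless on the development machine, where it simply reports that no device is available.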
If you want to successfully compile and link code that uses the CUDA driver API, that process will require a libcuda.so on your machine. This file is installed by the driver installer. There are various methods to "force" the driver installer to run on a machine without a GPU. You can get started by extracting the driver runfile installer (or downloading it separately) and passing the --help command line switch to the installer to learn about some of the options.
These methods will not allow you to run those codes on a machine with no GPU of course. Furthermore, the process of moving a compiled binary from one machine to another, and expecting it to run correctly, is troublesome in my opinion. Therefore my suggestion would be to re-compile the code on a target machine. Otherwise getting a compiled binary to run from one machine to the next is a question that is not unique to CUDA, and is outside the scope of my answer.
If you have no intention of running the codes on the non-GPU machine, and are willing to recompile on the target machine, then you can probably develop driver API codes even without libcuda.so. Alternatively, the CUDA installer also installs a libcuda.so stub (search for it in /usr/local/cuda/lib64/stubs) that you could try linking against purely for compilation-test purposes. If you don't link your driver API code against -lcuda, you'll of course get a link error, but given the caveats above that should not matter.
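For example, a minimal driver API program like the sketch below (file and helper names are illustrative) can be compiled and linked on the GPU-less machine against the stub library in /usr/local/cuda/lib64/stubs. The stub only satisfies the linker; the assumption is that the binary is run on the GPU machine, where the driver's real libcuda is installed. The /usr/local/cuda prefix should be replaced with wherever you installed the toolkit as a non-root user.

```cuda
// drvcount.cu: a minimal CUDA driver API program (illustrative).
// Compile-and-link test on the GPU-less machine, using the toolkit's stub:
//   nvcc -o drvcount drvcount.cu -L/usr/local/cuda/lib64/stubs -lcuda
// Run it only on the GPU machine, where the driver's real libcuda is present.
#include <cstdio>
#include <cuda.h>

// Initialise the driver API and report how many CUDA devices it sees.
CUresult countDevices(int* count) {
    *count = 0;
    CUresult r = cuInit(0);
    if (r != CUDA_SUCCESS) return r;
    return cuDeviceGetCount(count);
}

int main() {
    int n = 0;
    CUresult r = countDevices(&n);
    if (r != CUDA_SUCCESS) {
        const char* msg = nullptr;
        cuGetErrorString(r, &msg);  // leaves msg null if r is not a known error code
        std::printf("driver API error %d: %s\n", static_cast<int>(r), msg ? msg : "unknown");
        return 1;
    }
    std::printf("%d CUDA device(s)\n", n);
    return 0;
}
```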
Fedora 22 is not officially supported by CUDA 7.5 or prior. YMMV.
If you don't run the driver installer, you don't need to be a root user for any of this. Of course the install locations you pass to the installer must be those that your user privilege allows access to.

"no CUDA-capable device is detected" with CUDA-capable GPU installed Win7

I have installed CUDA 7.0.28 on my laptop and tried to run one of the samples. I ran the deviceQuery project and got this message:
cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL
Then I ran nvidia-smi.exe, and its output says "Not Supported". What should I do?
nvidia-smi returning 'not supported' does not necessarily mean that your GPU does not have the ability to run CUDA code. It means that you don't have the ability to see the active CUDA process name using nvidia-smi.
CUDA-Z might help here; see http://cuda-z.sourceforge.net/ for what it is.
Also, I have to say I had quite a few problems getting CUDA running on Windows. If you really need to run it on Windows, make sure you go through this guide first: http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-microsoft-windows/
Have you tried running it on Linux on the same machine? It was much easier to get working there.
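As a lighter-weight alternative to deviceQuery, the sketch below (file and helper names are illustrative) asks the runtime how many devices it sees and, on failure, prints the symbolic error name and message instead of a bare number such as 38. It also prints the CUDA version the installed driver supports (cudaDriverGetVersion reports 0 when no NVIDIA driver is installed) next to the runtime version, which is a quick way to rule out a driver that is missing or too old for the toolkit. It is a general diagnostic, not a diagnosis of this particular laptop.

```cuda
// devcheck.cu: a stripped-down deviceQuery-style check (illustrative).
//   nvcc -o devcheck devcheck.cu
#include <cstdio>
#include <cuda_runtime.h>

// Returns true if the runtime reports at least one CUDA device.
// On failure, prints the symbolic error name and message.
bool hasUsableDevice(int* count) {
    *count = 0;
    cudaError_t err = cudaGetDeviceCount(count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s (%s)\n",
                    cudaGetErrorName(err), cudaGetErrorString(err));
        *count = 0;
        return false;
    }
    return *count > 0;
}

int main() {
    int driverVer = 0, runtimeVer = 0;
    cudaDriverGetVersion(&driverVer);    // 0 if no NVIDIA driver is installed
    cudaRuntimeGetVersion(&runtimeVer);  // e.g. 7000 for CUDA 7.0
    std::printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
                driverVer / 1000, (driverVer % 1000) / 10,
                runtimeVer / 1000, (runtimeVer % 1000) / 10);

    int n = 0;
    if (!hasUsableDevice(&n)) return 1;
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp p;
        if (cudaGetDeviceProperties(&p, i) == cudaSuccess)
            std::printf("device %d: %s (compute capability %d.%d)\n", i, p.name, p.major, p.minor);
    }
    return 0;
}
```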
NVIDIA now provides a CUDA toolkit installer for Windows (as well as Linux and Mac). It does a handy check of your system to see whether it meets the requirements for CUDA, which helps if you are unsure about your GPU:
https://developer.nvidia.com/cuda-80-ga2-download-archive
I've noticed that I get this message when my NVIDIA driver is updated during a system package update (on Ubuntu). A reboot resolves it, and an X restart probably would too, although I haven't tried that.
This was disconcerting the first time it happened since it was one of those "Hey! My code just ran fine. WTF happened?" moments.

CUDA 5.0 cuda-gdb on Linux Needs dedicated GPU?

With a fresh CUDA 5.0 install on CentOS 5.5 Linux, I am not able to use cuda-gdb. So I am wondering whether you still need a dedicated GPU for cuda-gdb on Linux? I tried it with the VESA device driver for X11 but got the same result. Profiling works and running the app works, but trying to run cuda-gdb gives:
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaaaaaab000
Any suggestions?
cuda-gdb still needs a GPU that is not used by the graphical environment (e.g. if you are running GNOME, KDE, etc., you need a system with several GPUs, though not all of them need to be NVIDIA GPUs).
This particular warning is unrelated to that problem; you can ignore it. cuda-gdb will tell you if it fails because no GPU can be used for debugging.
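One rough way to see which GPUs are probably driving a display is the runtime's kernelExecTimeoutEnabled device property, which is set when a run-time limit (the display watchdog) applies to kernels on that device. The sketch below flags devices accordingly; treating the flag as "attached to the display" is a heuristic assumption, not something cuda-gdb itself reports. A device without the flag can then be selected for debugging, for example by exporting CUDA_VISIBLE_DEVICES with its index before starting cuda-gdb.

```cuda
// whichgpu.cu: list CUDA devices and flag those with a kernel run-time limit (illustrative).
//   nvcc -o whichgpu whichgpu.cu
#include <cstdio>
#include <cuda_runtime.h>

// Heuristic: a device with a kernel execution timeout is usually one that is
// also driving a display (X/GNOME/KDE), which cuda-gdb cannot use for debugging.
bool looksDisplayAttached(const cudaDeviceProp& p) {
    return p.kernelExecTimeoutEnabled != 0;
}

int main() {
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess || n == 0) {
        std::printf("no CUDA devices found\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp p;
        if (cudaGetDeviceProperties(&p, i) != cudaSuccess) continue;
        std::printf("device %d: %s%s\n", i, p.name,
                    looksDisplayAttached(p) ? "  [run-time limit: likely display GPU]"
                                            : "  [candidate for cuda-gdb]");
    }
    return 0;
}
```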

Debugger in CUDA 5

NVIDIA has released an extended Eclipse (Nsight Eclipse Edition) for CUDA 5. They also have an Nsight plugin for VS2010. In VS2010 we can stop program execution at a breakpoint in a kernel, but how do we achieve this in Eclipse on Linux? I don't see any Nsight-specific keys to stop execution. I tried changing perspective, but it debugs as a normal C/C++ application. I'm using a Tesla C2070 on an 8-core Intel Xeon machine with Linux.
I'm from Nsight Eclipse Edition team.
Our goal is specifically for the application to be debugged as a normal C/C++ application. This means that you can set breakpoints, use "run to line", etc. regardless of whether you debug host or device code.
Basically, the process is quite standard for Eclipse:
Create a project (you can also import existing executable)
Click debug button
Debugger will run and by default will break in the main function. Note that no device code is running on the device yet, so you will only see the host thread.
Set a breakpoint in the device code and hit resume (note that the Breakpoints view toolbar also lets you set a breakpoint on any CUDA kernel launch)
Debugger will break when device code reaches the breakpoint. You can inspect your application state using visual debugger UI.
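To try those steps, a tiny program such as the sketch below is enough (file and function names are illustrative). One assumption worth stating: breakpoints in device code generally need device debug information, which nvcc generates with -G (alongside -g for host code). Nsight's Debug build configuration is expected to set this, but if kernel breakpoints are never hit, the flag is worth checking.

```cuda
// squares.cu: a small program for practising kernel breakpoints (illustrative).
// Build with host (-g) and device (-G) debug information, e.g.:
//   nvcc -g -G -o squares squares.cu
#include <cstdio>
#include <cuda_runtime.h>

__global__ void squares(int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int v = i * i;   // set a breakpoint here; inspect i and v per thread
        out[i] = v;
    }
}

// Launch the kernel and copy the results back into out[0..n).
cudaError_t runSquares(int* out, int n) {
    int* d = nullptr;
    cudaError_t err = cudaMalloc(&d, n * sizeof(int));
    if (err != cudaSuccess) return err;
    squares<<<(n + 63) / 64, 64>>>(d, n);
    err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaMemcpy(out, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err;
}

int main() {  // the debugger breaks here first by default; only the host thread exists yet
    int out[8] = {0};
    cudaError_t err = runSquares(out, 8);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < 8; ++i) std::printf("%d ", out[i]);
    std::printf("\n");
    return 0;
}
```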
I changed a couple of things and am not sure which one solved the issue: I updated the drivers to the latest ones that come with 5.0 RC, and I chose to run a VNC server instead of the native X server. With that, the CUDA card(s) are dedicated to my apps and debugging; it works like a charm, and it is now accessible from everywhere.
Eugene,
I just installed CUDA 5, and I wasn't able to break in any kernel code. It was a clean install of CentOS 5.5 with a fresh download of CUDA 5, running on an ASUS G71X laptop with a GTX 260M.
I thought maybe you still can't run the display and debug on one device, so I switched to a non-NVIDIA X display, but I still had the same issue: I can't stop in the kernel code.
Have you tried CUDA 5.0 RC1? It is available for download now. I have tried the Nsight in it, and it works well for debugging.
Best regards!
The 304.43 NVIDIA Driver does not let users other than root debug their CUDA application.
That problem is not present in any past or future public releases. The CUDA documentation recommends using only drivers listed in the CUDA DevZone. The 304.43 driver is not one of them.
That may or may not be the issue you are hitting. But I thought it was worth mentioning.

CUDA Visual Profiler doesn't generate timeline

I'm trying to determine where a slowdown is occurring in my GPU code. I've verified that the code runs correctly on its own (it doesn't throw any errors, outputs are correct, finishes cleanly, etc). When I try to profile the code in Visual Profiler, it seems to run normally, dumping correct intermediate outputs to stdout. The GPU is being used (I've checked with cuda-gdb and dumping printf()s from inside my kernels). Once all the code has completed, Visual Profiler reports that viper has terminated the executable. However, no timeline is generated. Instead, the main window shows 0, 10, 20, 25 microseconds all "collapsed" on top of one another. When I tell the Visual Profiler to run all analysis options, it proceeds through the 24 runs without problems, but still no timeline is generated.
I'm using CUDA 4.2, driver version 295.41 on Ubuntu x86_64 with a GeForce 460.
When the visual profiler fails to generate a timeline it is typically because it cannot locate a component required for profiling. This component is a shared library found in /usr/local/cuda/lib64 called libcuinj.so. Is that path on your LD_LIBRARY_PATH? How are you launching the Visual Profiler? The script in /usr/local/cuda/bin/nvvp should set the path correctly for you.
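A quick way to test whether the dynamic loader can find that library from the environment you launch the profiler in is to load it by bare name, which makes dlopen search LD_LIBRARY_PATH, the ld.so cache, and the default directories. The sketch below (host-only code; file and helper names are illustrative) defaults to the library name given in the answer, libcuinj.so; the assumption is that the exact file name can differ between CUDA versions, so check what is actually present in /usr/local/cuda/lib64 and pass that name instead.

```cuda
// findlib.cu: check whether the dynamic loader can locate a shared library
// by name, as a program launched from this shell would (illustrative).
//   nvcc -o findlib findlib.cu -ldl
//   ./findlib libcuinj.so
#include <cstdio>
#include <dlfcn.h>

// Try to load the library by bare name (no slash), so dlopen searches
// LD_LIBRARY_PATH, the ld.so cache, and the default directories.
bool canLoad(const char* name) {
    void* h = dlopen(name, RTLD_LAZY);
    if (!h) {
        std::printf("cannot load %s: %s\n", name, dlerror());
        return false;
    }
    dlclose(h);
    return true;
}

int main(int argc, char** argv) {
    const char* name = argc > 1 ? argv[1] : "libcuinj.so";
    if (!canLoad(name)) return 1;
    std::printf("%s found\n", name);
    return 0;
}
```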
The 4.2 version of the visual profiler does not do a good job of reporting errors when this shared library is not found. The upcoming 5.0 version of the visual profiler has much better error reporting in this regard.
I don't know if it's the same under Linux, but in Nsight under Windows there are two basic types of profiling that you can run: "Application trace" and "Profile". Only Application trace gives you timelines, since it records timestamps of CUDA API calls and kernel executions. The Profile setting offers options to analyze the kernels: it reads the hardware counters from the GPU and generates performance information for one or more kernels (and no timelines).
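If no profiling mode produces a usable timeline, a manual fallback for locating a slowdown is to bracket individual kernels with CUDA events, which gives GPU-side elapsed times for the calls you care about. This is a swapped-in technique, not what the profiler does internally. The sketch below times one illustrative kernel; busyKernel, timeKernelMs, and the problem size are placeholders for your own code.

```cuda
// timing.cu: time one kernel with CUDA events (illustrative).
//   nvcc -o timing timing.cu
#include <cstdio>
#include <cuda_runtime.h>

__global__ void busyKernel(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 0.5f + 1.0f;
}

// Returns the kernel's elapsed time in milliseconds, or -1 on any CUDA error.
float timeKernelMs(int n) {
    float* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);                        // timestamp before the launch
    busyKernel<<<(n + 255) / 256, 256>>>(d, n);
    cudaEventRecord(stop);                         // timestamp after the kernel
    cudaError_t err = cudaEventSynchronize(stop);  // wait until stop has been reached

    float ms = -1.0f;
    if (err == cudaSuccess && cudaGetLastError() == cudaSuccess &&
        cudaEventElapsedTime(&ms, start, stop) != cudaSuccess)
        ms = -1.0f;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return ms;
}

int main() {
    float ms = timeKernelMs(1 << 20);
    if (ms < 0.0f) {
        std::printf("timing failed\n");
        return 1;
    }
    std::printf("busyKernel: %.3f ms\n", ms);
    return 0;
}
```

Recording both events on the same (default) stream means the elapsed time covers only the work queued between them, so several such pairs around suspect kernels can narrow down where the time goes.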