Does JETSON NANO support RAPIDS? - nvidia-jetson-nano

Is it possible to run data science tools like RAPIDS on a JETSON NANO? After some searching, I am still not very clear... also, if it does, will data analysis run faster on it than on a CPU? Any insights will be appreciated. Thanks.

No, sadly, the Jetson Nano doesn't support RAPIDS. The Nano's GPU uses an older architecture, Maxwell, which RAPIDS does not support (it requires Pascal or newer). The TX2, Jetson NX, and the Xaviers, which have a compatible GPU architecture, are the only ones that community members have seen success with. I know that there are some community members attempting a port of one or two of the RAPIDS libraries to the Nano, but that is a personal effort by them and YMMV.

Related

Can't run CUDA nor OpenCL on GeForce 540M

I have a problem running the samples provided by Nvidia in their GPU Computing SDK (there's a library of compiled sample codes).
For CUDA I get the message "No CUDA-capable device is detected"; for OpenCL there's an error from the function that should find OpenCL-capable units.
I have installed all three parts from Nvidia to develop with OpenCL: devdriver for Win7 64-bit v.301.27, CUDA toolkit 4.2.9 and GPU Computing SDK 4.2.9.
I think this might have to do with Optimus technology, which reroutes output from the Nvidia GPU to the Intel one to render things (this notebook also has an Intel HD 3000 accelerator). In the Nvidia control panel I set it to use the high-performance Nvidia GPU, set the power profile to prefer maximum performance, and for PhysX I changed from automatic selection to the Nvidia processor. Nothing has changed though; those samples won't run (not even those targeted at GF8000 cards).
I would like to play with OpenCL a bit and see what it is capable of, but without the ability to test things it's useless. I have found some info about this on forums, but it was mostly about Linux users, who need Bumblebee to access the Nvidia GPU. There's no such problem on Windows, however; the drivers are better, so you can access it without dark magic (or so I thought until I found this problem).
My laptop has a GeForce 540M as well, in an Optimus configuration since my Sandy Bridge CPU also has Intel's integrated graphics. To run CUDA codes, I have to:
Install NVIDIA Driver
Go to NVIDIA Control Panel
Click 3D Settings -> Manage 3D Settings -> Global Settings
In the Preferred Graphics processor drop down, select "High-performance NVIDIA processor"
Apply the settings
Note that the instructions above apply the settings for all applications, so you don't have to worry about CUDA errors any more. But it will drain more battery.
Here is a video recap as well. Good luck!
OK, this has proven to be a totally crazy solution. I was wondering whether something was hooking in between the hardware and the application, and the only thing that came to mind was AV software. I'm using Comodo with sandbox and Defense+ on, and after turning them off I could run all those samples. What's more, only Defense+ needs to be turned off.
Now I just wonder how many apps could have been blocked from accessing that GPU...
That's most likely because of the architecture of Optimus. So I'd suggest you read the NVIDIA CUDA Developer Guide for NVIDIA Optimus Platforms, especially the section "Querying for a CUDA Device", which I believe addresses this issue.

cuda sdk example simpleStreams in SDK 4.1 not working

I upgraded the CUDA GPU Computing SDK and CUDA toolkit to 4.1. I was testing the simpleStreams program, but it consistently takes more time than non-streamed execution. My device has compute capability 2.1 and I'm using VS2008 on Windows.
This sample constantly has issues. If you tweak the sample to have equal duration for the kernel and memory copy the overlap will improve. Normally breadth first submission is better for concurrency; however, on WDDM OS this sample will usually have better overlap if you issue the memory copy right after kernel launch.
I noticed this as well. I thought it was just me: I didn't see any improvement, and when I searched the forums I didn't find anyone else with the issue.
I also ran the source code from the CUDA by Example book (which is really helpful; I recommend you pick it up if you're serious about GPU programming).
The Chapter 10 examples show, step by step, how streams should be used.
http://developer.nvidia.com/content/cuda-example-introduction-general-purpose-gpu-programming-0
But comparing the following:
1. non-streamed version(which is basically the single stream version)
2. the streamed (incorrectly queued asyncmemcpy and kernel launch)
3. the streamed (correctly queued asyncmemcpy and kernel launch)
I find no benefit in using CUDA streams. It might be a Win7 issue, as I found some sources online saying that Windows Vista didn't support CUDA streams correctly.
Let me know what you find with the example I linked. My setup is: Win7 64-bit Pro, CUDA 4.1, dual GeForce GTX 460 cards, 8 GB RAM.
I'm pretty new to CUDA so I may not be able to help, but generally it's very hard to help without you posting any code. If posting is not possible, then I suggest you take a look at Nvidia's Visual Profiler. It's cross-platform and can show you where your bottlenecks are.

I've got a Nvidia GPU, how can i code on it?

I've never really been into GPUs, not being a gamer, but I'm aware of their parallel ability and wondered how I could get started programming on one. I recall (from somewhere) that there is a CUDA C-style programming language. What IDE do I use, and is it relatively simple to execute code?
There are quick-start guides for getting the dev drivers and libraries set up on different platforms (win/mac/lin) here, there is also a link to the Cuda C programming guide.
http://developer.nvidia.com/object/nsight.html
That said, all the CUDA stuff we do (fluid sims, particle sims, etc.) is done on Linux, essentially with emacs and gcc.
Some suggestions:
(1) Download the CUDA SDK from Nvidia (http://developer.download.nvidia.com/compute/cuda/sdk/website/samples.html). It has an extensive set of application examples that have already been developed, tested and commented. Some useful examples to start with are matrixMul, histogram and convolutionSeparable. For more complex, well-documented code, see the "nbody" example.
(2) If you are very good at C++ programming, then the C++ Thrust library for GPUs is another good place to start. It has extensive STL-like support for doing operations on the GPU, and the overall programming effort is much lower for standard algorithms.
(3) Eclipse with the CUDA plugin is a good IDE to start with.
On Windows, Visual Studio. On Linux, Eclipse, Code::Blocks and others, depending on which you feel more comfortable with.
The IDE is the last thing, though. There are steps preceding it (installing the appropriate display driver and toolkit, running the SDK samples). The manuals and links provided above are really helpful. There is also the Nvidia forum for CUDA development and many getting-started guides.

GPU Emulator for CUDA programming without the hardware [closed]

Question: Is there an emulator for a Geforce card that would allow me to program and test CUDA without having the actual hardware?
Info:
I'm looking to speed up a few simulations of mine in CUDA, but my problem is that I'm not always around my desktop for doing this development. I would like to do some work on my netbook instead, but my netbook doesn't have a GPU. Now as far as I know, you need a CUDA capable GPU to run CUDA. Is there a way to get around this? It would seem like the only way is a GPU emulator (which obviously would be painfully slow, but would work). But whatever way there is to do this I would like to hear.
I'm programming on Ubuntu 10.04 LTS.
For those who are seeking the answer in 2016 (and even 2017) ...
Disclaimer
I've failed to emulate GPU after all.
It might be possible to use gpuocelot if you satisfy its list of
dependencies.
I've tried to get an emulator for BunsenLabs (Linux 3.16.0-4-686-pae #1 SMP
Debian 3.16.7-ckt20-1+deb8u4 (2016-02-29) i686 GNU/Linux).
I'll tell you what I've learnt.
nvcc used to have a -deviceemu option back in CUDA Toolkit 3.0.
I downloaded CUDA Toolkit 3.0, installed it and tried to run a simple
program:
#include <stdio.h>

__global__ void helloWorld() {
    printf("Hello world! I am %d (Warp %d) from %d.\n",
           threadIdx.x, threadIdx.x / warpSize, blockIdx.x);
}

int main() {
    int blocks, threads;
    scanf("%d%d", &blocks, &threads);
    helloWorld<<<blocks, threads>>>();
    cudaDeviceSynchronize();
    return 0;
}
Note that in CUDA Toolkit 3.0 nvcc was located in /usr/local/cuda/bin/.
It turned out that I had difficulties with compiling it:
NOTE: device emulation mode is deprecated in this release
and will be removed in a future release.
/usr/include/i386-linux-gnu/bits/byteswap.h(47): error: identifier "__builtin_bswap32" is undefined
/usr/include/i386-linux-gnu/bits/byteswap.h(111): error: identifier "__builtin_bswap64" is undefined
/home/user/Downloads/helloworld.cu(12): error: identifier "cudaDeviceSynchronize" is undefined
3 errors detected in the compilation of "/tmp/tmpxft_000011c2_00000000-4_helloworld.cpp1.ii".
I've found on the Internet that if I used gcc-4.2 or similarly ancient instead of gcc-4.9.2 the errors might disappear. I gave up.
gpuocelot
The answer by Stringer has a link to a very old gpuocelot project website, so at first I thought that the project was abandoned in 2012 or so. Actually, it was abandoned a few years later.
Here are some up to date websites:
GitHub;
Project's website;
Installation guide.
I tried to install gpuocelot following the guide. I had several errors during installation though and I gave up again. gpuocelot is no longer supported and depends on a set of very specific versions of libraries and software.
You might try to follow this tutorial from July, 2015 but I don't guarantee it'll work. I've not tested it.
MCUDA
The MCUDA translation framework is a linux-based tool designed to
effectively compile the CUDA programming model to a CPU architecture.
It might be useful. Here is a link to the website.
CUDA Waste
It is an emulator to use on Windows 7 and 8. I've not tried it though. It doesn't seem to be developed anymore (the last commit is dated on Jul 4, 2013).
Here's the link to the project's website: https://code.google.com/archive/p/cuda-waste/
CU2CL
Last update: 12.03.2017
As dashesy pointed out in the comments, CU2CL seems to be an interesting project. It seems to be able to translate CUDA code to OpenCL code. So if your GPU is capable of running OpenCL code, the CU2CL project might be of interest to you.
Links:
CU2CL homepage
CU2CL GitHub repository
This response may be too late, but it's worth noting anyway. GPU Ocelot (of which I am one of the core contributors) can be compiled without CUDA device drivers (libcuda.so) installed if you wish to use the Emulator or LLVM backends. I've demonstrated the emulator on systems without NVIDIA GPUs.
The emulator attempts to faithfully implement the PTX 1.4 and PTX 2.1 specifications which may include features older GPUs do not support. The LLVM translator strives for correct and efficient translation from PTX to x86 that will hopefully make CUDA an effective way of programming multicore CPUs as well as GPUs. -deviceemu has been a deprecated feature of CUDA for quite some time, but the LLVM translator has always been faster.
Additionally, several correctness checkers are built into the emulator to verify: aligned memory accesses, accesses to shared memory are properly synchronized, and global memory dereferencing accesses allocated regions of memory. We have also implemented a command-line interactive debugger inspired largely by gdb to single-step through CUDA kernels, set breakpoints and watchpoints, etc... These tools were specifically developed to expedite the debugging of CUDA programs; you may find them useful.
Sorry about the Linux-only aspect. We've started a Windows branch (as well as a Mac OS X port) but the engineering burden is already large enough to stress our research pursuits. If anyone has any time and interest, they may wish to help us provide support for Windows!
Hope this helps.
[1]: GPU Ocelot - https://code.google.com/archive/p/gpuocelot/
[2]: Ocelot Interactive Debugger - http://forums.nvidia.com/index.php?showtopic=174820
You can also check out the gpuocelot project, which is a true emulator in the sense that PTX (the bytecode that CUDA code is compiled to) is emulated.
There's also an LLVM translator; it would be interesting to test whether it's faster than using -deviceemu.
The CUDA toolkit had an emulator built into it until the CUDA 3.0 release cycle. If you use one of these very old versions of CUDA, make sure to use -deviceemu when compiling with nvcc.
https://github.com/hughperkins/cuda-on-cl lets you run NVIDIA® CUDA™ programs on OpenCL 1.2 GPUs (full disclosure: I'm the author)
Be careful when you're programming using -deviceemu as there are operations that nvcc will accept while in emulation mode but not when actually running on a GPU. This is mostly found with device-host interaction.
And as you mentioned, prepare for some slow execution.
GPGPU-Sim is a GPU simulator that can run CUDA programs without a GPU.
I created a docker image with GPGPU-Sim installed for myself in case that is helpful.

Are There any Open Source Real Time Operating Systems for x86? [closed]

Are there any open source real time operating systems out there? I've heard of real-time Linux, but most implementations seem to really be a proprietary RTOS (that you have to pay for) that run Linux as a process -- much the same way Ardence's RTX real-time system works for Windows.
EDIT: I should clarify that I'm looking for RTOS to work with multi-core x86-family CPUs.
FreeRTOS, it provides the underlying kernel. I've used it in some embedded apps and it seems robust. But, it really depends on your application.
http://www.freertos.org/
Check out eCos, a free, open source, real-time operating system. (It supports x86; not sure about multi-core.)
RTLinux is also available
eCos is free (but you can get paid support). It supports the Intel x86 architecture and multi-processor systems. Depending on your timing requirements, real-time Linux may not be enough; my experience with such systems has not been too good. Although the response time may be good on average, I've seen cases where the worst case over a few days was 10 or even 100 times as long. I guess this depends partly on the quality of the drivers and partly on the scheduler itself.
But I guess it boils down to whether your system demands hard or soft real-time, what the timing constraints are, what kind of application you need to run, and how streamlined a development system you require.
There are hard real-time extensions to the Linux kernel. You might want to check some of those out.
Good examples are RTAI and LXRT
RTAI
OpenSolaris has real-time capabilities, however you should watch out if you decide to use it for real-time development: pretty much all I/O can cause priority inversions in the kernel (low-priority system worker threads can starve and cause high priority threads to be blocked, e.g. in STREAMS code).
I have also been using the FreeRTOS operating system that is available either for free under a modified GNU licence, a paid commercial licence version or an expensive safety certified version (SafeRTOS)
From the web-site there is an x86 port as follows
x86
* Supported processor families: Any x86 compatible running in Real mode only, plus a Win32 simulator
* Supported tools: Open Watcom, Borland, Paradigm, plus Visual Studio for the WIN32 simulator
This OS provides the pre-emptive or co-operative task scheduling with queues, semaphores and priority setting for the tasks. It does not provide the sort of I/O or file library functions that come with other larger OS implementations like Linux.
What are your exact requirements? Perhaps you can use vanilla Linux - it doesn't provide real-time guarantees but might be good enough. Some people find that it's not as bad as the real-time vendors try to make out.
Vanilla Linux DOES have different scheduling policies as well, but not a lot of people know that.
Prex is under BSD License.
There is the S.Ha.R.K. Project. It works with x86 CPUs but I don't know if it handles all cores of a CPU.
Well, this is not open source, but did you know that Windows CE is a hard real-time operating system and that it has an x86 port? I don't know, however, whether it can support multi-core CPUs. If it is a commercial project, you should definitely consider it.
There is also MicroC/OS-II, which has an x86 port, but as above, I don't know if it supports multiple cores. It is free for non-commercial applications.
There are real-time extensions to Linux, as already mentioned by someone else. Have a look at xenomai.org.
I'm not so sure about the multiprocessor issue. What exactly do you want to do on your multiple processors?
BeRTOS looks quite interesting. But for x86 it supports "emulator only". Not sure why though.