Cross-compiling OpenNI for a Linux MIPS platform

I am trying to cross-compile OpenNI for a MIPS-based platform, but I see that the header files check the target platform and reject MIPS as unsupported.
Snippet:
#elif defined(ANDROID) && defined(__arm__)
#include "Android-Arm/XnPlatformAndroid-Arm.h"
#elif (linux && (i386 || __x86_64__))
#include "Linux-x86/XnPlatformLinux-x86.h"
#elif (linux && __arm__)
#include "Linux-Arm/XnPlatformLinux-Arm.h"
#elif _ARC
#include "ARC/XnPlatformARC.h"
#elif (__APPLE__)
#include "MacOSX/XnPlatformMacOSX.h"
#else
#error OpenNI Platform Abstraction Layer - Unsupported Platform!
#endif
As far as I have checked, the code does not contain any assembly that might be processor-specific.
Is it safe to ignore the compile error and proceed with the cross-compilation (by commenting out the #error directive), or is there a good reason why MIPS is not supported by OpenNI?
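Based on the pattern above, a minimal sketch of what a MIPS branch could look like, assuming you write a Linux-Mips/XnPlatformLinux-Mips.h header yourself (a hypothetical file, typically created by copying and adapting XnPlatformLinux-Arm.h); __mips__ is GCC's predefined macro for MIPS targets:
#elif (linux && __mips__)
/* Hypothetical header, modeled on the existing Linux-Arm port. */
#include "Linux-Mips/XnPlatformLinux-Mips.h"
This is likely cleaner than commenting out the #error, since the platform headers presumably encode per-architecture assumptions (type sizes, alignment, endianness) that a MIPS port needs to restate anyway.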

Related

cudaErrorUnknown when calling the CUDA runtime or driver API

After a heavy simulation that crashed my GPU, terminating the program and rebooting the computer, I can no longer successfully call any CUDA API that worked fine before the reboot.
nvidia-smi works fine.
On my Ubuntu 20.04 machine, CUDA 11.6 and NVIDIA driver 510.47.03 are installed.
The minimal code that reproduces the error on my machine is the following.
Driver API version:
#include "cuda.h"
int main() {
CUresult result;
result = cuInit(0);
const char* pStr;
cuGetErrorName(result, &pStr);
if(result != CUDA_SUCCESS){
std::cerr << pStr << std::endl;
}
}
The compile command is:
nvcc a.cu -lcuda
and I got CUDA_ERROR_UNKNOWN.
Runtime API version:
#include <cuda_runtime.h>  // nvcc includes this implicitly for .cu files
#include <iostream>

int main() {
    int n;
    // First runtime API call; implicitly initializes the CUDA context.
    cudaError_t error = cudaGetDeviceCount(&n);
    if (error != cudaSuccess) {
        std::cerr << cudaGetErrorName(error) << std::endl;
    }
}
The compile command is:
nvcc a.cu
and I got cudaErrorUnknown.
Why did I get this error? Is it a driver problem?
It looks like you trashed something (the driver or the CUDA runtime) and are no longer able to call anything CUDA-related.
In my humble experience, I usually get these errors when my kernels run for too long on a Windows machine and the display driver's timeout mechanism (TDR) resets the GPU while a kernel is running.
Maybe you are experiencing a similar issue on Linux.
To fix this, have you tried resetting your GPU with the following command?
nvidia-smi --gpu-reset
Please also note that not all CUDA errors are recoverable; if the command above doesn't work, you may have to reboot the computer to fix the issue (or simply unload and reload the CUDA driver modules).
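For the unload/reload route, a minimal sketch (assuming no process, including the display server, still holds the device; module names can vary with the driver packaging):
sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm
Reloading nvidia_uvm alone is often reported to be enough to clear a stuck CUDA state without a full reboot.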

Dynamic Parallelism on GTX 980 ti: Unknown Error

I am attempting dynamic parallelism on a GTX 980 Ti card.
All attempts at running the code return "unknown error".
Simple code is shown below, along with the compilation options.
I can execute kernels at depth 0 with no issues; the error appears the first time a child kernel is launched.
The cudaDeviceSynchronize() calls were included after looking at other questions here, but they didn't solve the problem.
Any ideas? Could this be a driver issue?
Edit 1:
OS: Linux x86_64
NVIDIA driver version: 384.59
nvcc version: 7.5.17
There are two 980 Tis connected via PCIe x16 Gen3. The system also has Windows installed on another RAID-configured SSD.
#include <cuda.h>
#include <fstream>
#include <stdio.h>
#include <stdlib.h>

__global__ void ker_two(){
    int two = 0;
    two++;
}

__global__ void ker_one(){
    int one = 0;
    one++;
    ker_two<<<1,1>>>();       // child launch (dynamic parallelism)
    cudaDeviceSynchronize();  // device-side sync with the child
}

int main(){
    ker_one<<<1,1>>>();
    cudaDeviceSynchronize();
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("Cuda Error: %s\n", cudaGetErrorString(err));
    return 0;
}
compiled with
nvcc -arch=compute_52 -rdc=true -lcudadevrt test.cu
I am able (?) to reproduce the error on a machine with a Maxwell Titan card. It's a Fedora 24 distribution with CUDA 8.0.61 installed manually. The driver version is 375.51.
However, it seems the problem only occurs on my system when I call cudaDeviceSynchronize() within ker_one(), regardless of whether I launch the second kernel or not. So maybe that's the problem you're seeing, rather than dynamic parallelism per se.
Considering @talonmies' comment, this might even be just a driver issue.
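As a general diagnostic (a minimal sketch, not specific to dynamic parallelism), separating the launch error from the execution error on the host side helps localize where things fail:
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void kern() { }

int main() {
    kern<<<1, 1>>>();
    cudaError_t launchErr = cudaGetLastError();      // problems with the launch itself (bad config, etc.)
    cudaError_t execErr = cudaDeviceSynchronize();   // problems raised while the kernel ran
    if (launchErr != cudaSuccess) printf("launch: %s\n", cudaGetErrorString(launchErr));
    if (execErr != cudaSuccess) printf("exec: %s\n", cudaGetErrorString(execErr));
    return 0;
}
If the launch error is already non-success, the kernel never ran, and the driver/toolkit combination is the first suspect.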

cudaMalloc hang when building x64 version binary [duplicate]

My simple CUDA hello-world application runs fine when built as 32-bit using Visual Studio 2015 Community on Windows 10. However, if I build it as 64-bit, it does not execute.
GPU: Tesla K40c
Toolkit: CUDA 8
Operating system: Windows 10 64-bit
Visual Studio: Community edition
There is no error message in the output console.
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include<stdio.h>
__global__ void welcome() {
printf("Hello world :)");
}
int main() {
welcome << <1, 1 >> > ();
cudaDeviceSynchronize();
return 0;
}
I faced the same issue and opened a bug with NVIDIA (#1855074).
They reproduced it successfully and I'm waiting for an update on it.
One thing is sure: it's on their side.
The only workaround I found was to put my card in WDDM mode via nvidia-smi, which broke my taskbar.
I recommend waiting for a fix.
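For reference, a sketch of the driver-model switch mentioned above (my assumption of the flag semantics from nvidia-smi's --driver-model option, where 0 selects WDDM and 1 selects TCC; run from an elevated prompt for device 0, and reboot afterwards):
nvidia-smi -i 0 -dm 0
Note that TCC is normally what you want for a compute-only Tesla card; switching to WDDM is only a workaround here.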

Dynamic Parallelism and CC 2.0 code in the same library

In my library I need to support devices of compute capability 2.0 and higher. For CC 3.5+ devices I've implemented optimized kernels which utilize dynamic parallelism. It seems that the nvcc compiler does not support DP when anything less than "compute_35,sm_35" is specified (I'm getting compiler/linker errors). My question is: what is the best way to support multiple kernel versions in such a case? Having multiple DLLs and choosing between them at runtime would work, but I was wondering if there is a better way.
UPDATE: I'm successfully using #if __CUDA_ARCH__ >= 350 for other things (like __ldg() etc.), but it does not work in the DP case, as I have to link with cudadevrt.lib, which produces the following error:
1>nvlink : fatal error : could not find compatible device code in C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v5.5/lib/Win32/cudadevrt.lib
I believe this issue has been addressed in CUDA 6.
In particular, the link error that was thrown when the -lcudadevrt library was specified for code that does not require dynamic parallelism has been eliminated.
Here's my simple test:
$ cat t264.cu
#include <stdio.h>

__global__ void kernel1(){
    printf("Hello from DP Kernel\n");
}

__global__ void kernel2(){
#if __CUDA_ARCH__ >= 350
    kernel1<<<1,1>>>();
#else
    printf("Hello from non-DP Kernel\n");
#endif
}

int main(){
    kernel2<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}
$ nvcc -O3 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_35,code=sm_35 -rdc=true -o t264 t264.cu -lcudadevrt
$ CUDA_VISIBLE_DEVICES="0" ./t264
Hello from non-DP Kernel
$ CUDA_VISIBLE_DEVICES="1" ./t264
Hello from DP Kernel
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Sat_Jan_25_17:33:19_PST_2014
Cuda compilation tools, release 6.0, V6.0.1
$
In my case, device 0 is a Quadro5000, a cc 2.0 device, and device 1 is a GeForce GT 640, a cc 3.5 device.
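For completeness, a sketch of the host-side counterpart to the __CUDA_ARCH__ trick (not from the answer above): query the device's compute capability at runtime and pick a code path accordingly:
#include <cuda_runtime.h>
#include <stdio.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // properties of device 0
    if (prop.major > 3 || (prop.major == 3 && prop.minor >= 5)) {
        printf("cc %d.%d: can take the dynamic-parallelism path\n", prop.major, prop.minor);
    } else {
        printf("cc %d.%d: fall back to the non-DP kernels\n", prop.major, prop.minor);
    }
    return 0;
}
This is essentially the same dispatch the device-side #if performs, but done once on the host, which is useful when the two paths differ in more than the kernel body.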