What's the replacement for cuModuleGetSurfRef and cuModuleGetTexRef?

CUDA 12 indicates that these two functions:
CUresult cuModuleGetSurfRef (CUsurfref* pSurfRef, CUmodule hmod, const char* name);
CUresult cuModuleGetTexRef (CUtexref* pTexRef, CUmodule hmod, const char* name);
which obtain a reference to a surface or a texture, respectively, from a loaded module - are deprecated.
What are they deprecated in favor of? Are surfaces and textures in modules to be accessed differently? Will they be kept out of modules entirely? If it's the latter, how would one work with them using the CUDA driver API?

So, based on @talonmies' comment, it seems the replacements are "texture objects" and "surface objects". The main difference - as far as is evident in the API - is that the new objects need fewer API calls, which take richer descriptors. Thus, the user fills in the descriptor fields themselves, and does not need the large number of cuTexRefGetXXXX and cuTexRefSetXXXX calls. There are also "tensor map objects", which appear with Compute Capability 9.0 and later.
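For illustration, here is a minimal driver-API sketch (mine, not from the thread) of the object-style replacement: you fill in the descriptor structs yourself and create a CUtexObject, which is then passed to the kernel as an ordinary argument rather than bound to a module-level texref. The CUarray argument is assumed to have been created and filled elsewhere; error checking is omitted.

#include <cuda.h>

CUtexObject makeTexObject(CUarray cuArray)
{
    // Describe the backing resource (here: a CUDA array).
    CUDA_RESOURCE_DESC resDesc = {};
    resDesc.resType = CU_RESOURCE_TYPE_ARRAY;
    resDesc.res.array.hArray = cuArray;

    // Describe the sampling behavior that the texref setters used to configure.
    CUDA_TEXTURE_DESC texDesc = {};
    texDesc.addressMode[0] = CU_TR_ADDRESS_MODE_CLAMP;
    texDesc.addressMode[1] = CU_TR_ADDRESS_MODE_CLAMP;
    texDesc.filterMode     = CU_TR_FILTER_MODE_LINEAR;

    CUtexObject tex = 0;
    cuTexObjectCreate(&tex, &resDesc, &texDesc, /* resource view */ NULL);
    return tex;  // pass this to the kernel as a normal argument
}

Surface objects are created analogously with cuSurfObjectCreate, which takes only the resource descriptor.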


ThreadX module manager memory

While trying to build, using IAR, a sample module and module manager on an STM32-H7, starting from the sample provided in "threadx-6.1.5_rel" and from https://learn.microsoft.com/en-us/azure/rtos/threadx-modules/chapter3, I keep getting "module_fault_handler()" called, seemingly because of a memory error after calling txm_module_manager_start().
In the provided examples, during module manager initialization, I see:
txm_module_manager_initialize((VOID *) 0x90000000, 0xE000);
txm_module_manager_external_memory_enable(&my_module, (void *) 0x90000000, 128, TXM_MODULE_MANAGER_SHARED_ATTRIBUTE_WRITE);
It is not clear to me where the hard-coded values come from or how they are calculated; looking at the file sample_threadx_module.icf, I couldn't figure it out.
Thank you in advance
The hardcoded values are just an example. Please provide memory locations that are compatible with your memory map.
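As a hedged sketch (mine, not from the thread), one way to avoid magic numbers is to derive the module area from symbols your linker configuration actually exports. The symbol names below (__module_mem_start__, __module_mem_end__) are placeholders; you would define an equivalent region in your .icf file to match your part's memory map.

#include "txm_module.h"

extern char __module_mem_start__[];   /* placeholder linker symbols */
extern char __module_mem_end__[];

void module_manager_setup(TXM_MODULE_INSTANCE *my_module)
{
    ULONG size = (ULONG)(__module_mem_end__ - __module_mem_start__);

    /* RAM the module manager may allocate to modules; it must lie in a
       region that actually exists and is writable on your board. */
    txm_module_manager_initialize((VOID *) __module_mem_start__, size);

    /* Shared/external memory must likewise point at real, accessible RAM. */
    txm_module_manager_external_memory_enable(my_module,
        (VOID *) __module_mem_start__, 128,
        TXM_MODULE_MANAGER_SHARED_ATTRIBUTE_WRITE);
}

Note that on an STM32-H7, 0x90000000 is (if I recall correctly) the QSPI memory-mapped region; if your board has nothing writable mapped there, a fault on module start is exactly what you would expect.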

Unable to create a thrust device vector

So I'm trying to start on GPU programming and using the Thrust library to simplify things.
I have created a test program to work with it and see how it works; however, whenever I try to create a thrust::device_vector with non-zero size, the program crashes with "Run-time Check Failure #3 - The variable 'result' is being used without being initialized." (this comes from the allocator_traits.inl file). And... I have no idea how to fix this.
The following is all that is needed to cause this error.
#include <thrust/device_vector.h>

int main()
{
    int N = 100;
    thrust::device_vector<int> d_a(N);
    return 0;
}
I suspect it may be a problem with how the environment is set up so the details on that are...
Created using Visual Studio 2019, in a CUDA 11.0 Runtime project (the example program given when opening this project works fine, however), Thrust version 1.9, and the GPU is a GTX 970.
This issue only seems to manifest with the Thrust version (1.9.x) associated with CUDA 11.0, and only in debug projects on Windows/Visual Studio.
Some workarounds would be to switch to building a release project, or just click "Ignore" on the dialogs that appear at runtime; according to my testing, an ordinary run or debug session proceeds normally at that point.
I have not confirmed it, but I believe this issue is fixed in the latest Thrust (1.10.x) just released (although it is not part of any formal CUDA release at this moment, I would expect it to be part of some future CUDA release).
Following Robert Crovella's answer, I fixed this issue by replacing the corresponding lines of code in the Thrust library with the code from GitHub. More precisely, in the file ...\CUDA\v11.1\include\thrust\detail\allocator\allocator_traits.inl I replaced the following function
template<typename Alloc>
__host__ __device__
typename disable_if<
  has_member_system<Alloc>::value,
  typename allocator_system<Alloc>::type
>::type
system(Alloc &)
{
  // return a copy of a default-constructed system
  typename allocator_system<Alloc>::type result;
  return result;
}
with
template<typename Alloc>
__host__ __device__
typename disable_if<
  has_member_system<Alloc>::value,
  typename allocator_system<Alloc>::type
>::type
system(Alloc &)
{
  // return a value-initialized temporary directly, so no named variable
  // exists for the MSVC run-time check to flag as uninitialized
  return typename allocator_system<Alloc>::type();
}

How to draw a texture on an OBJ model through an OptiX example

I'm very new to OptiX and CUDA.
I'm trying to modify an OptiX SDK example to present a 3D model with ray tracing. I modified the "progressivePhotonMap" example. Because I lack OptiX/CUDA knowledge, I don't know how to draw a texture on the 3D model; could anyone who is familiar with the SDK examples help me?
I read other texture-drawing examples like "swimmingShark" or "cook" and tried to find clues I could use. However, those examples seem to draw textures in a different way.
So far, I know I have to load the texture in the .cpp file
GeometryInstance instance = m_context->createGeometryInstance( mesh, &m_material, &m_material+1 );
instance["diffuse_map"]->setTextureSampler(loadTexture( m_context, ... );
and create a TextureSampler in the CUDA file
rtTextureSampler<float4, 2> diffuse_map; // Corresponds to OBJ mtl params
and give them a texcoord to sample with, like this:
float3 Kd = make_float3( tex2D( diffuse_map, texcoord.x*diffuse_map_scale, texcoord.y*diffuse_map_scale ) );
However, I cannot find where texcoord gets the texture coordinate data in the CUDA file.
It seems there should be some code like this in the .cpp file:
GI["texcoord"]->setBuffer(texcoord)
Could anyone tell me where texcoord gets the texture coordinate data, and how to match the coordinate data and texture to present the 3D model with ray tracing?
I can't find a tutorial on Google; I really need help or a direction to reach my goal. Thank you.
You should read up on the OptiX documentation first, specifically the paragraph regarding attribute variables.
IIRC the texcoord variable is an attribute of the form
rtDeclareVariable( float3, texcoord, attribute texcoord );
that is computed in the intersection program and passed along to the closest hit program (attributes are designed to pass data from the intersection point to the shading points).
Short answer: it is set in another CUDA function (the intersection program) which, conceptually, computes the data consumed by that line.
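To make that concrete, here is a rough sketch (mine, modeled on the SDK's triangle-mesh intersection programs, not taken from the thread) of an intersection program that fills the texcoord attribute from per-vertex UVs. The buffer names vertex_buffer/texcoord_buffer/index_buffer are assumptions for illustration, and the matching bounds program is omitted.

#include <optix_world.h>

using namespace optix;

rtDeclareVariable(optix::Ray, ray, rtCurrentRay, );
rtDeclareVariable(float3, texcoord, attribute texcoord, );
rtBuffer<float3> vertex_buffer;
rtBuffer<float2> texcoord_buffer;
rtBuffer<int3>   index_buffer;

RT_PROGRAM void mesh_intersect(int primIdx)
{
  const int3 idx = index_buffer[primIdx];
  float3 n;
  float t, beta, gamma;
  // intersect_triangle (from optixu_math) returns the barycentric
  // coordinates (beta, gamma) of the hit point on the triangle.
  if (intersect_triangle(ray, vertex_buffer[idx.x], vertex_buffer[idx.y],
                         vertex_buffer[idx.z], n, t, beta, gamma)) {
    if (rtPotentialIntersection(t)) {
      const float2 t0 = texcoord_buffer[idx.x];
      const float2 t1 = texcoord_buffer[idx.y];
      const float2 t2 = texcoord_buffer[idx.z];
      // Interpolate the per-vertex UVs; the closest-hit program then
      // reads this attribute in its tex2D(diffuse_map, ...) call.
      const float2 uv = t0 * (1.0f - beta - gamma) + t1 * beta + t2 * gamma;
      texcoord = make_float3(uv.x, uv.y, 0.0f);
      rtReportIntersection(0);
    }
  }
}

The host side fills texcoord_buffer with the OBJ file's UV data (the SDK's ObjLoader does this for you, if I recall correctly), so no explicit GI["texcoord"]->setBuffer(...) call is needed for the attribute itself.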

Do STL containers support ARC when storing Obj-C objects in Objective-C++?

For example, would this leak?
static std::tuple<CGSize, NSURL *> getThumbnailURL() {
    return std::make_tuple(CGSizeMake(100, 100), [NSURL URLWithString:@"http://examples.com/image.jpg"]);
}
No, it wouldn't leak. That NSURL object would be managed by ARC properly.
http://clang.llvm.org/docs/AutomaticReferenceCounting.html#template-arguments
If a template argument for a template type parameter is a retainable object owner type that does not have an explicit ownership qualifier, it is adjusted to have __strong qualification.
std::tuple<CGSize, NSURL *> is the same as std::tuple<CGSize, NSURL __strong *>. Thus the NSURL object will be released when the std::tuple instance is destroyed.
Yes, they work. STL containers are templated (STL = Standard Template Library), so whenever you use one, it's as if you recompiled its source code with the template arguments substituted in (template instantiation). And if you recompiled that source code with the template arguments substituted in, ARC would perform all the appropriate memory management for the managed pointer types in that code.
Another way to think about it is that ARC-managed pointer types behave like C++ smart pointer types: they have a constructor that initializes the pointer to nil, an assignment operator that releases the existing value and retains (or, for block types, copies) the new value, and a destructor that releases the value. So to the same extent that STL containers work with comparable C++ smart pointer types, they work with ARC-managed pointer types.
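As a toy analogy (my own illustration, not from the thread), here is roughly the behavior ARC synthesizes, written as an explicit C++ RAII type; a std::tuple of such a type balances retains and releases through copies and destruction exactly as described above:

#include <tuple>

struct Ref { int retainCount = 1; };           // stand-in for an Obj-C object

struct strong_ptr {                            // what __strong does, conceptually
    Ref *p = nullptr;                          // zero-initialized, like __strong
    strong_ptr() = default;
    explicit strong_ptr(Ref *r) : p(r) {}      // takes ownership of one reference
    strong_ptr(const strong_ptr &o) : p(o.p) { if (p) ++p->retainCount; }
    strong_ptr &operator=(const strong_ptr &o) {
        if (o.p) ++o.p->retainCount;           // retain the new value first
        release();                             // then release the old one
        p = o.p;
        return *this;
    }
    ~strong_ptr() { release(); }
    void release() { if (p && --p->retainCount == 0) delete p; }
};

int main() {
    auto t = std::make_tuple(42, strong_ptr(new Ref));  // like the CGSize/NSURL tuple
    return 0;  // the tuple's destructor runs strong_ptr's destructor: no leak
}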

CUDA - invalid device function, how to know [architecture, code]?

I am getting the following error when running the default kernel generated when creating a CUDA project in VS Community:
addKernel launch failed: invalid device function
addWithCuda failed!
I searched for how to solve it, and found out that I have to change Project->Properties->CUDA C/C++->Device->Code Generation (the default values for [architecture, code] are compute_20,sm_20), but I couldn't find the values needed for my graphics card (GeForce 8400 GS).
Is there any list on the net of the [architecture, code] values, or is it possible to get them with some command?
The numeric values in compute_XX and sm_XX are the Compute Capability (CC) of your CUDA device.
You can look up this link http://en.wikipedia.org/wiki/CUDA#Supported_GPUs for a (maybe not complete) list of GPUs and their corresponding CC.
Your quite old 8400 GS (if I remember correctly) hosts a G86 chip, which supports CC 1.1.
So you have to change to compute_11,sm_11.
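If you would rather query the value programmatically, here is a small runtime-API utility (my own sketch, not from the thread; the CUDA SDK's deviceQuery sample does the same thing in more detail) that prints the CC of each installed device:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // prop.major and prop.minor form the CC, e.g. 1.1 -> compute_11,sm_11
        printf("Device %d: %s, CC %d.%d -> compute_%d%d,sm_%d%d\n",
               dev, prop.name, prop.major, prop.minor,
               prop.major, prop.minor, prop.major, prop.minor);
    }
    return 0;
}

On the nvcc command line, the equivalent flag has the form -gencode arch=compute_11,code=sm_11 (note that support for CC 1.x was removed from nvcc after CUDA 6.5).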