The Linux kernel source at http://lxr.free-electrons.com/source/kernel/pid_namespace.c gives an idea of how a PID namespace is created.
Can anyone give me an idea of where exactly the PIDs belonging to a namespace are maintained, apart from the global ones, and how I can map those PIDs?
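For reference, the per-namespace bookkeeping lives in struct pid: each struct pid carries one struct upid for every namespace level in which the task is visible, and helpers such as pid_nr_ns() map a struct pid to its number in a given namespace. A simplified sketch of the structures (based on include/linux/pid.h; exact fields vary by kernel version):

    /* Simplified from include/linux/pid.h; layout varies by kernel version. */
    struct upid {
        int nr;                    /* the numeric PID seen in this namespace */
        struct pid_namespace *ns;  /* the namespace that number belongs to */
    };

    struct pid {
        unsigned int level;        /* deepest namespace level of this pid */
        /* ... reference count, per-type task lists, etc. ... */
        struct upid numbers[1];    /* one entry per level; index 0 is global */
    };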
I am trying to optimize my simulator by leveraging run-time compilation. My code is pretty long and complex, but I have identified a specific __device__ function whose performance can be strongly improved by removing all global memory accesses.
Does CUDA allow the dynamic compilation and linking of a single __device__ function (not a __global__), in order to "override" an existing function?
I am pretty sure the really short answer is no.
Although CUDA has dynamic/JIT device linker support, it is important to remember that the linkage process itself is still static.
So you can't delay-load a particular function in an existing compiled GPU payload at runtime the way you can in a conventional dynamic link loading environment. And the linker still requires that a single instance of all code objects and symbols be present at link time, whether that is a priori or at runtime. So you would be free to JIT link together precompiled objects containing different versions of the same code, as long as a single instance of everything is present when the linking session is finalised and the code is loaded into the context. But that is as far as you can go.
It looks like you have a "main" kernel with a part that is "switchable" at run time.
You can definitely do this using nvrtc. You'd need to do something like the following:
Instead of compiling the main kernel ahead of time, store it as a string to be compiled and linked at run time.
Let's say the main kernel calls "myFunc", a __device__ function that is chosen at run time.
You can generate the appropriate "myFunc" source based on your equations at run time.
Now you can create an NVRTC program from these multiple sources using nvrtcCreateProgram.
That's about it. The key is to delay compiling the main kernel until you need it at run time. You may also want to cache your kernels somehow so you end up compiling only once.
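A minimal sketch of those steps might look like this (the kernel source, the generated "myFunc", and all names here are illustrative assumptions, not code from the question; error checking is mostly omitted for brevity):

    #include <cuda.h>
    #include <nvrtc.h>
    #include <cstdio>
    #include <string>
    #include <vector>

    int main()
    {
        // The "switchable" device function, generated at run time.
        std::string myFunc =
            "__device__ float myFunc(float x) { return x * x; }\n";

        // The main kernel, stored as a string instead of compiled ahead of
        // time. extern "C" keeps the name unmangled for cuModuleGetFunction.
        std::string mainKernel =
            "extern \"C\" __global__ void mainKernel(float *out, int n) {\n"
            "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
            "    if (i < n) out[i] = myFunc(out[i]);\n"
            "}\n";

        std::string source = myFunc + mainKernel;

        nvrtcProgram prog;
        nvrtcCreateProgram(&prog, source.c_str(), "main_kernel.cu",
                           0, nullptr, nullptr);

        const char *opts[] = { "--gpu-architecture=compute_52" };
        if (nvrtcCompileProgram(prog, 1, opts) != NVRTC_SUCCESS) {
            size_t logSize;
            nvrtcGetProgramLogSize(prog, &logSize);
            std::vector<char> log(logSize);
            nvrtcGetProgramLog(prog, log.data());
            std::fprintf(stderr, "%s\n", log.data());
            return 1;
        }

        size_t ptxSize;
        nvrtcGetPTXSize(prog, &ptxSize);
        std::vector<char> ptx(ptxSize);
        nvrtcGetPTX(prog, ptx.data());
        nvrtcDestroyProgram(&prog);

        // Load the PTX with the driver API and fetch the kernel handle;
        // launch with cuLaunchKernel as usual.
        cuInit(0);
        CUdevice dev;
        CUcontext ctx;
        CUmodule mod;
        CUfunction kernel;
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);
        cuModuleLoadData(&mod, ptx.data());
        cuModuleGetFunction(&kernel, mod, "mainKernel");
        return 0;
    }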
There is one problem I foresee: nvrtc may not find the curand device calls, which may cause some issues. One workaround would be to look at the header the device function call lives in and use nvcc to compile the appropriate device code to PTX. You can store the resulting PTX as text and use cuLinkAddData to link it into your module. You can find more information in the linking section of the CUDA driver API documentation.
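As a rough sketch of that workaround (the PTX variables and file names are illustrative, and a current CUDA context is assumed):

    #include <cuda.h>
    #include <cstring>

    // Link two PTX images (given as nul-terminated text) into one module.
    CUmodule linkAndLoad(const char *curandPtx, const char *mainPtx)
    {
        CUlinkState link;
        void *cubin = nullptr;
        size_t cubinSize = 0;
        CUmodule module = nullptr;

        cuLinkCreate(0, nullptr, nullptr, &link);
        // PTX compiled offline with nvcc for the curand-using device code...
        cuLinkAddData(link, CU_JIT_INPUT_PTX, (void *)curandPtx,
                      std::strlen(curandPtx) + 1, "curand_part.ptx",
                      0, nullptr, nullptr);
        // ...plus the PTX that nvrtc produced for the main kernel.
        cuLinkAddData(link, CU_JIT_INPUT_PTX, (void *)mainPtx,
                      std::strlen(mainPtx) + 1, "main_part.ptx",
                      0, nullptr, nullptr);
        cuLinkComplete(link, &cubin, &cubinSize);
        cuModuleLoadData(&module, cubin);  // load before destroying the link
        cuLinkDestroy(link);
        return module;
    }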
Is there a way to solve the following thread-related problem?
In the main portion of the program (no threads yet), we source lots of files, one of which is a .tcl file with an itcl class and objects defined.
In the itcl class, a file is opened and its handle is stored in a variable.
I am creating multiple threads and would like to use the same file through the same handle.
The name of the handle can be passed across threads using the tsv::set and tsv::get commands. However, the handle itself is not usable in the other threads; attempts to use it fail with a "channel not found" error.
Could you please help me fix this problem?
While you can use thread::transfer to move a channel between threads, you can't use the same channel in multiple threads at once (and please don't move server sockets at all); there is no thread::copy_channel. (Think about it: if you had the same channel in two threads and both had readable event handlers set, what would happen when the channel gets readable? There'd be a race condition. The same problem would exist with writable channels, though most code doesn't notice that as Tcl's default behaviour is typically Good Enough.)
To use thread::transfer, in the source thread do:
thread::transfer $targetThreadID $channel
In the target thread, after the code above has run in the source thread, you can access the channel as normal, assuming you're in the top-most master interpreter of the thread (sub-interpreters are managed via interp as usual). In practice, you usually follow the transfer with a thread::send or thread::eval from the source thread to tell the target thread's Tcl scripts that they are now in charge of the channel.
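Putting that together, a minimal sketch (the thread ID and channel variable are illustrative):

    package require Thread
    set target [thread::create]
    # Hand the channel over, then tell the target which channel it now owns.
    thread::transfer $target $chan
    thread::send $target [list set chan $chan]
    thread::send $target {
        puts $chan "hello from the owning thread"
        flush $chan
    }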
For logging, have one thread manage the log files and have the other threads just send messages to the logger thread to be written out. Hide the details inside a logger object (“real” in the logging thread, “thread-aware proxy” in the others) so that nothing else in your code knows what you're up to.
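A sketch of that logger pattern (all names here are made up for illustration):

    package require Thread
    # The logger thread owns the log file; nothing else touches it.
    set logger [thread::create {
        set fh [open app.log a]
        proc log {msg} {
            global fh
            puts $fh $msg
            flush $fh
        }
        thread::wait
    }]
    # Worker threads just post messages asynchronously.
    thread::send -async $logger [list log "worker 1 started"]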
Suppose I have a pointer to a __global__ function in CUDA. Is there a way to programmatically ask CUDART for a string containing its name?
I don't believe this is possible by any public API.
I have previously tried poking around in the driver itself, but that doesn't look too promising. The compiler-emitted code for <<< >>> kernel invocation clearly registers the mangled function name with the runtime via __cudaRegisterFunction, but I couldn't see any obvious way to perform a lookup by name/value in the runtime library. The driver API equivalent, cuModuleGetFunction, leads to an equally opaque type from which it doesn't seem possible to extract the function name.
Edited to add:
The host compiler itself doesn't support reflection, so there are no obvious fancy language tricks that could be pulled at runtime. One possibility would be to add another preprocessor pass to the compilation trajectory to build a static kernel function lookup table before the final build. That would be rather a lot of work, but it could be done, at least for "classic" compilation where everything winds up in a single translation unit.
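For illustration, here is a hand-written version of the lookup table such a preprocessor pass could generate (the kernels and names are invented for the example):

    #include <cstdio>
    #include <map>
    #include <string>

    __global__ void kernelA(int *x) { if (x) *x = 1; }
    __global__ void kernelB(int *x) { if (x) *x = 2; }

    // Map host-side kernel stub addresses to their source-level names.
    static const std::map<const void *, std::string> kernelNames = {
        { reinterpret_cast<const void *>(kernelA), "kernelA" },
        { reinterpret_cast<const void *>(kernelB), "kernelB" },
    };

    static const char *nameOf(const void *fn)
    {
        auto it = kernelNames.find(fn);
        return it == kernelNames.end() ? "<unknown>" : it->second.c_str();
    }

    int main()
    {
        std::printf("%s\n", nameOf(reinterpret_cast<const void *>(kernelA)));
        return 0;
    }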
I'm trying to figure out why cudaMemcpyFromSymbol() exists. It seems that everything the symbol version can do, the non-symbol calls can do too.
The symbol version appears to make it easy to move part of an array or a single index, but this could just as easily be done with the non-symbol function. I suspect the non-symbol approach will run faster, as there is no symbol lookup needed. (It is not clear whether the symbol lookup is done at compile time or at run time.)
Why would I use cudaMemcpyFromSymbol() vs cudaMemcpy()?
cudaMemcpyFromSymbol is the canonical way to copy from any statically defined variable in device memory.
cudaMemcpy can't be directly used to copy to or from a statically defined device variable because it requires a device pointer, and that isn't known to host code at runtime. Therefore, an API call which can interrogate the device context symbol table is required. The two choices are either cudaMemcpyFromSymbol, which does the symbol lookup and copy in one operation, or cudaGetSymbolAddress, which returns an address that can be passed to cudaMemcpy. The former is probably more efficient if you only want to do one copy; the latter if you want to use the address multiple times in host code.
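A short sketch contrasting the two options ("devData" is an illustrative name):

    #include <cstdio>
    #include <cuda_runtime.h>

    __device__ float devData[16];

    int main()
    {
        float host[16] = { 0 };

        // One-shot: symbol lookup and copy in a single call.
        cudaMemcpyFromSymbol(host, devData, sizeof(host));

        // Or resolve the address once and reuse it with plain cudaMemcpy.
        float *devPtr = nullptr;
        cudaGetSymbolAddress(reinterpret_cast<void **>(&devPtr), devData);
        cudaMemcpy(host, devPtr, sizeof(host), cudaMemcpyDeviceToHost);

        std::printf("%f\n", host[0]);
        return 0;
    }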
So, I think I have a very weird question.
So, let's say that I already have a program loaded on my GPU, and in that program I call a function X. But that function X is not defined yet.
I want to be able to modify that function X dynamically: completely change its code and put it into the program without recompiling the rest or losing any pointers whatsoever.
To compare it with something most of us know, I want to be able to do what shaders allow in OpenGL: in the middle of execution, I can change the code of one shader, recompile only that shader, activate the program, and use the new version.
So, is it possible? Or do I need to recompile the whole thing every time? And if I have to recompile, do I lose the various arrays that I created in global memory?
If you compile with the -cuda flag using nvcc, you can get the intermediate C++ source that streams the PTX to the processor. In theory, you could post-process this intermediate output to dynamically generate PTX on the fly and send it over. You might even be able to make the PTX self-modifying, but that's way out of my league.
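For reference, the flag is used like this (file names illustrative); the output is the intermediate translation unit you could then post-process:

    nvcc -cuda mykernel.cu -o mykernel.cu.cpp.ii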