What is the difference between ZwOpenFile and NtOpenFile?

ZwOpenFile and NtOpenFile are both functions in ntdll. ZwOpenFile appears to be implemented exactly the same as NtOpenFile, so I don't understand why ZwOpenFile is included in ntdll at all. Can anyone explain the difference?

This is documented in MSDN:
A kernel-mode driver calls the Zw version of a native system services routine to inform the routine that the parameters come from a trusted, kernel-mode source. In this case, the routine assumes that it can safely use the parameters without first validating them. However, if the parameters might be from either a user-mode source or a kernel-mode source, the driver instead calls the Nt version of the routine, which determines, based on the history of the calling thread, whether the parameters originated in user mode or kernel mode. For more information about how the routine distinguishes user-mode parameters from kernel-mode parameters, see PreviousMode.
Basically it relates to how the parameters are validated.

Generally, kernel drivers should only use the ZwXxx() functions.
When called from user mode, the ZwXxx() and NtXxx() functions are exactly the same - they resolve to the same bits of code in ntdll.dll.
When called from a kernel-mode driver, the ZwXxx() variant ensures that a flag used by the kernel is set to indicate that the requestor mode (what's supposed to indicate the caller's mode) is kernel mode. If a kernel driver calls the NtXxx() variant, the requestor mode isn't explicitly set, so it's left alone and might indicate user or kernel mode, depending on what has occurred in the call stack up to that point.
If the requestor mode flag is set to user mode, the kernel will validate parameters, which might not be the right thing to do (especially if the kernel driver is passing in kernel-mode buffers, since the validation will fail in that case). If it's set to kernel mode, the kernel implicitly trusts the parameters.
So the rule for using these API names generally boils down to: if you're writing a kernel driver, call the ZwXxx() version (unless you're dealing with special situations, and you know what you're doing and why). If you're writing a user-mode component, it doesn't matter which set you call.
As far as I know, Microsoft only documents the NtXxx() functions for use in user mode (where it indicates that they are the user-mode equivalent of the corresponding ZwXxx() function).
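To make the kernel-driver rule above concrete, here is a hedged sketch of the kernel-mode case (WDK build environment assumed; the helper name and file path are made up for illustration, not taken from any real driver):

#include <ntddk.h>

/* Sketch only: open a file from kernel mode with ZwOpenFile, so the requestor
 * mode is forced to kernel mode and the kernel-mode handle/buffers are not
 * rejected as untrusted user-mode arguments. Call at PASSIVE_LEVEL. */
NTSTATUS OpenLogFile(PHANDLE FileHandle)        /* hypothetical helper */
{
    UNICODE_STRING    path;
    OBJECT_ATTRIBUTES attrs;
    IO_STATUS_BLOCK   iosb;

    RtlInitUnicodeString(&path, L"\\??\\C:\\mydriver.log");   /* assumed path */

    /* OBJ_KERNEL_HANDLE: the resulting handle is only valid in kernel mode,
     * exactly the kind of argument the NtXxx path would treat as suspect. */
    InitializeObjectAttributes(&attrs, &path,
                               OBJ_KERNEL_HANDLE | OBJ_CASE_INSENSITIVE,
                               NULL, NULL);

    return ZwOpenFile(FileHandle,
                      GENERIC_WRITE | SYNCHRONIZE,
                      &attrs,
                      &iosb,
                      FILE_SHARE_READ,
                      FILE_SYNCHRONOUS_IO_NONALERT);
}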

To add an example to what has already been said, so the OP or anyone else gets a complete picture:
NtXxx calls from user mode result in passing less trusted data (from user mode) to a more privileged layer (kernel mode). So the kernel expects the buffers to have valid user-mode addresses, the handles being passed to be valid user-mode handles, and so on.
If a driver calls an NtXxx API instead of its ZwXxx equivalent, it has to ensure that valid user-mode arguments are being passed, i.e. it cannot pass a kernel-mode address (even if it is valid) or a kernel-mode handle (see OBJ_KERNEL_HANDLE).
As already said, the ZwXxx equivalent of the API explicitly indicates (through the requestor mode) that such parameter validation should be skipped, as the callee is at the same privilege level as the caller.
Here is a link to a good starting point for anyone who wants to go beyond the obvious:
https://www.osronline.com/article.cfm?id=257.

How can I create a CUDA context?
The first CUDA call is slow, and I want to create the context before I launch my kernel.
The canonical way to force runtime API context establishment is to call cudaFree(0). If you have multiple devices, call cudaSetDevice() with the ID of the device you want to establish a context on, then cudaFree(0) to establish the context.
EDIT: Note that as of CUDA 5.0, it appears that the heuristics of context establishment are slightly different and cudaSetDevice() itself establishes a context on the device it is called on. So the explicit cudaFree(0) call is no longer necessary (although it won't hurt anything).
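For completeness, a minimal runtime-API sketch of the original recipe (device 0 is just an assumption here; pick whichever GPU you actually want the context on, and compile/link against the CUDA runtime):

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    /* select the device we want the context on */
    cudaError_t err = cudaSetDevice(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice: %s\n", cudaGetErrorString(err));
        return 1;
    }

    /* force context creation now, so the first real kernel launch is not
     * penalized by lazy initialization */
    err = cudaFree(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFree(0): %s\n", cudaGetErrorString(err));
        return 1;
    }

    /* ... later kernel launches reuse the already-established context ... */
    return 0;
}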
Using the runtime API: cudaDeviceSynchronize, cudaDeviceGetLimit, or anything that actually accesses the context should work.
I'm quite certain you're not using the driver API, as it doesn't do that sort of lazy initialization, but for others' benefit the driver call would be cuCtxCreate.

Where is hardware exception handling entry / exit code stored

I know this question seems very generic, as it can depend on the platform,
but I understand that with procedure/function calls, the assembler code to push the return address onto the stack, set up local variables, etc. can be part of either the caller function or the callee function.
When a hardware exception or interrupt occurs, though, the Program Counter gets the address of the exception handler via the exception table, but where is the actual code to store the state, return address, etc.? Or is this done automatically at the hardware level for interrupts and exceptions?
Thanks in advance
Since you are asking about ARM and you tagged microcontroller, you might be talking about the ARM7TDMI but are probably talking about one of the Cortex-Ms. These work differently than the full-sized ARM architecture. As documented in the architectural reference manual associated with these cores (the ARMv6-M or ARMv7-M, depending on the core), the hardware conforms to the ABI, plus does some extra work for an interrupt. So the return address, the PSR, and registers R0 through R3 (plus R12 and some others) are all put on the stack by the hardware, which is unusual for an architecture to do. R14, instead of getting the return address, gets an invalid address with a specific pattern, which is all part of the architecture. Unlike other processor IP, the address spaces on the Cortex-Ms are encouraged or dictated by ARM; that is why you usually see RAM start at 0x20000000 on these and flash below that, with some exceptions where vendors place RAM in the "executable" range, pretending to be Harvard when it is really modified Harvard. This arrangement works with the 0xFFFFFFFx link-register return value; depending on the manual, they either yada yada over the return address or go into detail about what the patterns you find there mean.
Likewise, the layout of the vector table is spelled out: roughly, the first 16 entries are system/ARM exceptions, and then interrupts follow, of which there can be up to 128 or 256, but you have to look at the chip vendor's (not ARM's) documentation to see how many they exposed and what is tied to what. If you are not using those interrupts, you don't have to leave a huge hole in your flash for vectors; just use that flash for your program (so long as you ensure you are never going to fire that exception or interrupt).
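As a rough illustration of that layout, here is a hedged sketch of a Cortex-M vector table written in C. The section name, stack-top address, and handler names are assumptions for a hypothetical part; the linker script would have to place .vectors at the vector base, and the real IRQ list comes from the chip vendor's documentation:

#include <stdint.h>

extern int main(void);

void Reset_Handler(void)   { main(); for (;;) ; }   /* entry point after reset */
void Default_Handler(void) { for (;;) ; }           /* catch-all for faults/IRQs */

typedef void (*vector_t)(void);

/* entry 0 is not a handler at all: it is the initial stack pointer value,
 * here an assumed top-of-RAM for a part with 16 KB at 0x20000000 */
__attribute__((section(".vectors"), used))
const vector_t vector_table[] = {
    (vector_t)0x20004000u,  /* 0: initial SP */
    Reset_Handler,          /* 1: Reset */
    Default_Handler,        /* 2: NMI */
    Default_Handler,        /* 3: HardFault */
    /* ... remaining system exceptions, then the chip-specific IRQs ... */
};

The hardware itself performs the stacking of R0-R3, R12, LR, PC and xPSR on exception entry; the table only supplies the handler addresses.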
For function calls, which occur at well defined (synchronous) locations in the program, the compiler generates executable instructions to manage the return address, registers and local variables. These instructions are integrated with your function code. The details are hardware and compiler specific.
For a hardware exception or interrupt, which can occur at any location (asynchronous) in the program, managing the return address and registers is all done in hardware. The details are hardware specific.
Think about how a hardware exception/interrupt can occur at any point during the execution of a program. And then consider that if a hardware exception/interrupt required special instructions integrated into the executable code then those special instructions would have to be repeated everywhere throughout the program. That doesn't make sense. Hardware exception/interrupt management is handled in hardware.
The "code" isn't software at all; by definition the CPU has to do it itself internally because interrupts happen asynchronously. (Or for synchronous exceptions caused by instructions being executed, then the internal handling of that instruction is what effectively triggers it).
So it's microcode or hardwired logic inside the CPU that generates the stores of a return address on an exception, and does any other stuff that the architecture defines as happening as part of taking an exception / interrupt.
You might as well ask where the code is that pushes a return address when the call instruction executes, on x86 for example, where the call instruction pushes return info onto the stack instead of overwriting a link register (the way most RISCs do).

How come the macro is used as a function, but is not implemented anywhere?

The following code is in MySQL 5.5 storage/example/ha_example.cc:
MYSQL_READ_ROW_START(table_share->db.str, table_share->table_name.str, TRUE);
rc= HA_ERR_END_OF_FILE;
MYSQL_READ_ROW_DONE(rc);
I searched for the MYSQL_READ_ROW_START definition in the whole project and found it in include/probes_mysql_nodtrace.h:
#define MYSQL_READ_ROW_START(arg0, arg1, arg2)
#define MYSQL_READ_ROW_START_ENABLED() (0)
#define MYSQL_READ_ROW_DONE(arg0)
#define MYSQL_READ_ROW_DONE_ENABLED() (0)
These are just empty macro definitions.
My question is: how come the macro MYSQL_READ_ROW_START is not associated with any function, yet is used as a function in the above code?
Thanks.
These aren't traditional macros: they're probe points for DTrace,
an observability framework for Solaris, OS X, FreeBSD and various
other operating systems.
DTrace revolves around the notion that different providers offer
certain probes at which one can observe running executables or
even the operating system itself. Some providers are time-based;
by firing at regular intervals the probes can, for example, be
used to profile the use of a CPU. Other providers are code-based,
and their probes might, for example, fire at the entrance to and
exit from functions.
The code you highlight is an example of the USDT (User-land
Statically Defined Tracing) provider. The canonical use of the
USDT provider is to expose meaningful events within transactions.
For example, the beginning and end of a transaction might well
occur somewhere deep within different functions; in this case
it's best for the developer to identify exactly what he wants to
reveal and when.
A USDT probe is more than a switchable printf() although it can
of course be used to reveal information, e.g. some local value
such as the intermediate result of a transaction. A USDT probe
can also be used to trigger behaviour. For example, one might
want to activate some network probes for only the duration of a
certain transaction.
Returning to your question, USDT probes are implemented by writing
macros in the code that correspond to a description of the
provider in a ".d" file elsewhere. This is parsed by the
dtrace(1) utility, which generates a header file that is suitable
for compilation. On a system that lacks DTrace it would make
sense to define a header file in which the USDT macros became null
ops, and judging by the given filename (probes_mysql_nodtrace.h)
this is what you are observing.
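A hedged sketch of that pattern in C (the real MySQL header and macro names may differ slightly from what is shown here):

/* probes_mysql.h, hypothetical: pick the generated DTrace header or the
 * do-nothing fallback depending on the build configuration */
#ifdef HAVE_DTRACE
/* generated by `dtrace -h` from the provider description in probes_mysql.d;
 * the macros expand to real USDT probe sites that DTrace can enable */
#include "probes_mysql_dtrace.h"
#else
/* no DTrace on this platform: every probe macro expands to nothing, so the
 * calls in ha_example.cc compile away completely */
#define MYSQL_READ_ROW_START(arg0, arg1, arg2)
#define MYSQL_READ_ROW_START_ENABLED()          (0)
#define MYSQL_READ_ROW_DONE(arg0)
#define MYSQL_READ_ROW_DONE_ENABLED()           (0)
#endif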
See http://dev.mysql.com/tech-resources/articles/getting_started_dtrace_saha.html.
To quote:
DTrace probes are implemented by kernel modules called providers, each
of which performs a particular kind of instrumentation to create
probes. Providers can thus be described as publishers of probes that can
be consumed by DTrace consumers (see below). Providers can be used for
instrumenting kernel and user-level code. For user-level code, there
are two ways in which probes can be defined- User-Level Statically
Defined Tracing (USDT) or PID provider.
So it appears to be up to DTrace providers to implement such a macro.

What is the difference between message-passing and method-invocation?

Is there a difference between message-passing and method-invocation, or can they be considered equivalent? This is probably specific to the language; many languages don't support message-passing (though all the ones I can think of support methods) and the ones that do can have entirely different implementations. Also, there are big differences in method-invocation depending on the language (C vs. Java vs Lisp vs your favorite language). I believe this is language-agnostic. What can you do with a passed-method that you can't do with an invoked-method, and vice-versa (in your favorite language)?
Using Objective-C as an example of messages and Java for methods, the major difference is that when you pass messages, the object decides how it wants to handle that message (this usually results in an instance method on the object being called).
In Java however, method invocation is a more static thing, because you must have a reference to an Object of the type you are calling the method on, and a method with the same name and type signature must exist in that type, or the compiler will complain. What is interesting is the actual call is dynamic, although this is not obvious to the programmer.
For example, consider a class such as
class MyClass {
    void doSomething() {}
}

class AnotherClass {
    void someMethod() {
        Object object = new Object();
        object.doSomething(); // compiler checks and complains that Object contains no such method.

        // However, through an explicit cast, you can calm the compiler down,
        // even though your program will crash at runtime
        ((MyClass) object).doSomething(); // syntactically valid, yet incorrect
    }
}
In Objective-C however, the compiler simply issues you a warning for passing a message to an Object that it thinks the Object may not understand, but ignoring it doesn't stop your program from executing.
While this is very powerful and flexible, it can result in hard-to-find bugs when used incorrectly because of stack corruption.
Adapted from the article here.
Also see this article for more information.
As a first approximation, the answer is: none, as long as you "behave normally".
Even though many people think there is - technically, it is usually the same: a cached lookup of a piece of code to be executed for a particular named-operation (at least for the normal case). Calling the name of the operation a "message" or a "virtual-method" does not make a difference.
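A minimal C sketch of that "lookup of code for a named operation" idea; the names here (Selector, Class, send, and so on) are invented purely for illustration and do not correspond to any particular runtime:

#include <stdio.h>
#include <string.h>

typedef void (*Method)(void *self);

typedef struct { const char *name; Method code; } Selector;
typedef struct { const Selector *table; int count; } Class;
typedef struct { const Class *isa; } Object;

/* "message passing": look the operation up by name at send time,
 * and let the receiver decide what to do when the name is unknown */
static void send(Object *receiver, const char *msg)
{
    for (int i = 0; i < receiver->isa->count; i++) {
        if (strcmp(receiver->isa->table[i].name, msg) == 0) {
            receiver->isa->table[i].code(receiver);
            return;
        }
    }
    printf("doesNotUnderstand: %s\n", msg);
}

static void greet(void *self) { (void)self; printf("hello\n"); }

int main(void)
{
    const Selector methods[] = { { "greet", greet } };
    const Class cls = { methods, 1 };
    Object obj = { &cls };

    send(&obj, "greet");    /* dynamic lookup by name ("message send") */
    methods[0].code(&obj);  /* the lookup result bound directly ("method call") */
    send(&obj, "fly");      /* unknown message: the receiver handles it itself */
    return 0;
}

A real message-passing runtime caches these lookups, which is why, as noted above, the cost usually ends up close to that of a virtual call.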
BUT: Actor languages are really different: by having active objects (every object has an implicit message queue and a worker thread, at least conceptually), parallel processing becomes easier to handle (google also "communicating sequential processes" for more).
BUT: in Smalltalk, it is possible to wrap objects to make them actor-like, without actually changing the compiler, the syntax or even recompiling.
BUT: in Smalltalk, when you try to send a message which is not understood by the receiver (i.e. "someObject foo:arg"), a message object is created, containing the name and the arguments, and that message object is passed as the argument to the "doesNotUnderstand" message. Thus, an object can decide for itself how to deal with unimplemented message sends (aka calls of an unimplemented method). It can, of course, push them into a queue for a worker process to sequentialize them...
Of course, this is impossible with statically typed languages (unless you make very heavy use of reflection), but is actually a VERY useful feature. Proxy objects, code load on demand, remote procedure calls, learning and self-modifying code, adapting and self-optimizing programs, corba and dcom wrappers, worker queues are all built upon that scheme. It can be misused, and lead to runtime bugs - of course.
So it is a double-edged sword: sharp and powerful, but dangerous in the hands of beginners...
EDIT: I am writing about language implementations here (as in Java vs. Smalltalk), not inter-process mechanisms.
IIRC, they've been formally proven to be equivalent. It doesn't take a whole lot of thinking to at least indicate that they should be. About all it takes is ignoring, for a moment, the direct equivalence of the called address with an actual spot in memory, and consider it simply as a number. From this viewpoint, the number is simply an abstract identifier that uniquely identifies a particular type of functionality you wish to invoke.
Even when you are invoking functions in the same machine, there's no real requirement that the called address directly specify the physical (or even virtual) address of the called function. For example, although almost nobody ever really uses them, Intel protected mode task gates allow a call to be made directly to the task gate itself. In this case, only the segment part of the address is treated as an actual address -- i.e., any call to a task gate segment ends up invoking the same address, regardless of the specified offset. If so desired, the processing code can examine the specified offset, and use it to decide upon an individual method to be invoked -- but the relationship between the specified offset and the address of the invoked function can be entirely arbitrary.
A member function call is simply a type of message passing that provides (or at least facilitates) an optimization under the common circumstance that the client and server of the service in question share a common address space. The 1:1 correspondence between the abstract service identifier and the address at which the provider of that service resides allows a trivial, exceptionally fast mapping from one to the other.
At the same time, make no mistake about it: the fact that something looks like a member function call doesn't prevent it from actually executing on another machine or asynchronously, or (frequently) both. The typical mechanism to accomplish this is proxy function that translates the "virtual message" of a member function call into a "real message" that can (for example) be transmitted over a network as needed (e.g., Microsoft's DCOM, and CORBA both do this quite routinely).
They really aren't the same thing in practice. Message passing is a way to transfer data and instructions between two or more parallel processes. Method invocation is a way to call a subroutine. Erlang's concurrency is built on the former concept with its Concurrency Oriented Programming.
Message passing most likely involves a form of method invocation, but method invocation doesn't necessarily involve message passing (if it did, it would be message passing). Message passing is one form of performing synchronization between two parallel processes. Method invocation generally means synchronous activity: the caller waits for the method to finish before it can continue. Message passing is a form of coroutine; method invocation is a form of subroutine.
All subroutines are coroutines, but not all coroutines are subroutines.
Is there a difference between message-passing and method-invocation, or can they be considered equivalent?
They're similar. Some differences:
Messages can be passed synchronously or asynchronously (e.g. the difference between SendMessage and PostMessage in Windows; see the sketch after this list)
You might send a message without knowing exactly which remote object you're sending it to
The target object might be on a remote machine or O/S.
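As a hedged illustration of that first point, a minimal Win32 sketch (hwnd is assumed to be a valid window handle owned by another thread):

#include <windows.h>

void notify_window(HWND hwnd)
{
    /* synchronous: blocks until the target window procedure has handled it */
    SendMessage(hwnd, WM_APP + 1, 0, 0);

    /* asynchronous: queued for the target thread, returns immediately */
    PostMessage(hwnd, WM_APP + 2, 0, 0);
}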

Writing Signal handlers for Shared libraries or DLL?

I have an Application A (by some company X). This application allows me to extend its functionality by letting me write my own functions.
I tell Application A to call my user functions in Application A's configuration file (this is how it knows that it must call the user-written functions). Application A uses function pointers, which I must register with it before it calls my user-written functions.
If there is a bug or fault in my user-written functions in production, Application A will stop functioning. For example, if I have a segmentation fault in my user-written functions.
Application A loads my user-written functions from a shared DLL file. This means that my user-written functions run in Application A's process address space.
I wish to handle certain signals like segmentation fault, divide by zero, and stack overflow, but Application A has its own signal handlers for these.
How can I write my own signal handlers to catch the exceptions in my user-written functions, so that I can clean up gracefully without affecting Application A much? Since my user functions are called in Application A's process, the OS will call the signal handlers written in Application A, not mine.
How can I change this? I want the OS to call the signal handlers in my code, but only for signals raised by my functions, which are asynchronous in nature.
Note: I do not have the source code of Application A and I cannot make any changes to it, because it's controlled by a different company.
I will be using C, and only C, on Linux, Solaris, and probably Windows.
You do not specify which platform you're working with, so I'll answer for Linux, and it should be valid for Windows as well.
When you set your signal handlers, the system call that you use returns the previous handler. It does this so that you can restore it once you are no longer interested in handling that signal.
Linux man page for signal
MSDN entry on signal
Since you are a shared library loaded into the application, you should have no problem manipulating the signal handlers. Just make sure to override the minimum you need, in order to reduce the chances of disrupting the application itself (some applications use signals for async notifications).
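A minimal sketch of that approach for the Linux/Solaris side, using sigaction() so the previous handler is captured explicitly (the function and variable names here are hypothetical, and the handler body is deliberately limited to async-signal-safe work):

#include <signal.h>

static struct sigaction g_prev_segv;    /* Application A's original handler */

static void plugin_segv_handler(int sig, siginfo_t *info, void *ctx)
{
    /* ... do minimal, async-signal-safe cleanup of our own state here ... */

    /* hand the signal back to Application A's original handler */
    if (g_prev_segv.sa_flags & SA_SIGINFO) {
        g_prev_segv.sa_sigaction(sig, info, ctx);
    } else if (g_prev_segv.sa_handler != SIG_IGN &&
               g_prev_segv.sa_handler != SIG_DFL) {
        g_prev_segv.sa_handler(sig);
    } else {
        /* restore the default action and re-raise so the process terminates normally */
        signal(sig, SIG_DFL);
        raise(sig);
    }
}

void plugin_install_handlers(void)      /* hypothetical entry point in the user DLL */
{
    struct sigaction sa;
    sa.sa_sigaction = plugin_segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &g_prev_segv);   /* the old handler is returned here */
}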
The cleanest way to do this would be to run your code in a separate process that communicates with the embedded shared DLL code via some IPC mechanism. You could handle whatever signals you wanted in your process without affecting the other process. Typically the conditions you mention (seg fault, divide by zero, stack overflow) indicate bugs in the program and will result in termination. There isn't much you can do to "handle" these beyond fixing the root cause of the bug.
In C++ on Windows (where structured exceptions can be mapped to C++ exceptions, e.g. with the /EHa compiler option), you can catch these by putting your code in a try-catch:
try
{
    // here goes your code
}
catch ( ... )
{
    // handle segfaults
}