I was looking at a program in IDA as i was trying to figure out how a certain function worked, when I came across something like this:
; C_TestClass::Foo(void)
__text:00000000 __ZN14C_TestClass7FooEv proc near
__text:00000000 jmp __ZN14C_TestClass20Barr ; C_TestClass::Barr(void)
__text:00000000 __ZN14C_TestClass7FooEv endp
__text:00000000
Can anyone explain to me what exactly jumping to a function would do in a case like this?
I am guessing that it acts as a wrapper for the other function?
You are correct that jumping is often a way to efficiently handle wrapper functions that aren't inlined.
Normally, you'd have to read all the function parameters and push them back onto the stack before you can call the sub-function.
But when the wrapper function has the exact same prototype:
Same calling convention
Same parameters (and in the same order)
Same return type
there's no need for all the usual function calling overhead. You can just jump directly to the target. (These categories may not be entirely necessary, as it may still be possible in some other cases.)
All the parameters (either on the stack or register) that were setup when calling the wrapper function will already be in place (and compatible) for the sub-function.
This is what's known as a tail-call. The compiler is able to invoke the function directly and let it return to the original caller without causing any problems. For example:
int f(int x)
{
...
return g(y);
}
The compiler can simply jmp g at the end because there is room for the arguments in the same place as f's arguments and the return value is not modified before returning to f's caller.
Related
I want to know when a function body end in assemby, for example in c you have this brakets {} that tell you when the function body start and when it ends but how do i know this in assembly?
Is there a parser that can extract me all the functions from assembly and start line and endline of their body?
There's no foolproof way, and there might not even be a well-defined correct answer in hand-written asm.
Usually (e.g. in compiler-generated code) you know a function ends when you see the next global symbol, like objdump does to decide when to print a new "banner". But without all function-start symbols being visible, there's no unambigious way. That's why some object file formats have room for size metadata associated with a symbol. Like .size foo, . - foo in GAS syntax.
It's not as easy as looking for a ret; some functions end with a jmp tail-call to another function. And some call a noreturn function like abort or __stack_chk_fail (not tailcall because they want to push a return address for a backtrace.) Or just fall off into whatever's next because that path had undefined behaviour in the source so the compiler assumed it wasn't reachable and stopped generating instructions for it, e.g. a C++ non-void function where execution can/does fall off the end without a return.
In general, assembly can blur the lines of what a function is.
Asm has features you can use to implement the high-level concept of a function, but you're not restricted to that.
e.g. multiple asm functions could all return by jumping to a common block of code that pops some registers before a ret. Is that shared tail a separate function that's called with a tail-called with a special calling convention?
Compilers don't usually do that, but humans could.
As for function entry points, usually some other code somewhere in the program will contain a call to it. But not necessarily; it might only be reachable via a table of function pointers, and you don't know that a block of .rodata holds function pointers until you find some code loading from it and calling or jumping.
But that doesn't work if the lowest-address instruction of the function isn't its entry point. See Does a function with instructions before the entry-point label cause problems for anything (linking)? for an example
Compilers don't generate code like that, but humans can. (It's a handy trick sometimes for https://codegolf.stackexchange.com/ questions.)
Or in the general case, a function might have multiple entry points. Or you could describe that as multiple functions with overlapping implementations. Sometimes it's as simple as one tailcalling another by falling into it without needing a jmp, i.e. it starts a few instructions before another.
I wan't to know when a function body ends in assembly, [...]
There are mainly four ways that the execution of a stream of (userspace) instructions can "end":
An unconditional jump like jmp or a conditional one like Jcc (je,jnz,jg ...)
A ret instruction (meaning the end of a subroutine) which probably comes closest to the intent of your question (including the ExitProcess "ret" command)
The call of another "function"
An exception. Not a C style exception, but rather an exception like "Invalid instruction" or "Division by 0" which terminates the user space program
[...] for example in c you have this brakets {} that tell you when the function body start and when it ends but how do i know this in assembly?
Simple answer: you don't. On the machine level every address can (theoretically) be an entry point to a "function". So there is no unique entry point to a "function" other than defined - and you can define anything.
On a tangent, this relates to self-modifying code and viruses, but it must not. The exit/end is as described in the first part above.
Is there a parser that can extract me all the functions from assembly and
start line and endline of their body?
Disassemblers create some kind of "functions" with entry and exit points. But they are merely assumed. No way to know if that assumption is correct. This may cause problems.
The usual approach is using a disassembler and the work to recombinate the stream of instructions to different "functions" remains to the person that mandated this task (vulgo: you). Some tools exist that claim to simplify this, but I cannot judge their efficacy.
From the perspective of a high level language, there are decompilers that try to reverse the transformation from (for example) C to assembly/machine code that try to automatize that task and will work more or less or in some cases.
Here is a simplified code
func MyHandler(a int) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteCode(a)
})
}
Whenever a http request comes MyHandler will be called, and it will return a function which will be used to handle the request. So whenever a http request comes a new function object will be created. Function is taken as the first class in Go. I'm trying to understand what actually happened when you return a function from memory's perspective. When you return a value, for example an integer, it will occupy 4 bytes in stack. So how about return a function and lots of things inside the function body? Is it an efficient way to do so? What's shortcomings?
If you're not used to closures, they may seem a bit magic. They are, however, easy to implement in compilers:
The compiler finds any variables that must be captured by the closure. It puts them into a working area that will be allocated and remain allocated as long as the closure itself exists.
The compiler then generates the inner function with a secret extra parameter, or some other runtime trickery,1 such that calling the function activates the closure.
Because the returned function accesses its closure variables through the compile-time arrangement, there's nothing special needed. And since Go is a garbage-collected language, there's nothing else needed either: the pointer to the closure keeps the closure data alive until the pointer is gone because the function cannot be called any more, at which point the closure data evaporates (well, at the next GC).
1GCC sometimes uses trampolines to do this for C, where trampolines are executable code generated at runtime. The executable code may set a register or pass an extra parameter or some such. This can be expensive since something treated as data at runtime (generated code) must be turned into executable code at runtime (possibly requiring a system call and potentially requiring that some supervisory code "vet" the resulting runtime code).
Go does not need any of this because the language was defined with closures in mind, so implementors don't, er, "close off" any easy ways to make this all work. Some runtime ABIs are defined with closures in mind as well, e.g., register r1 is reserved as the closure-variables pointer in all pointer-to-function types, or some such.
Actual function size is irrelevant. When you return a function like this, memory will be allocated for the closure, that is, any variables in the scope that the function uses. In this case, a pointer will be returned containing the address of the function and a pointer to the closure, which will contain a reference to the variable a.
void(* resetDevice) (void) = 0;
This was a function declaration I stumbled upon, moreover no definition found later.
What kind of a function is this? What does it do? What does it return?
This is pointer to void function accepting no parameters. Pointer is named resetDevice and is initialized to null. Some library code later will initialize it to point to real function. You will be able to call it with
resetDevice();
Now, what will that function do exactly, I have no idea - it depends on your environment/libraries, which you have not specified. My guess would be it is Arduino, you can read bit more about it here
http://forum.arduino.cc/index.php?topic=385427.0
I use this piece of code to reset my Arduino board.
As I found it, it is not a really elegant method to reset the board, but it works.
If you call "resetDevice()" while it is pointing to address 0 (NULL), it will cause a runtime error or "null pointer exception" as the pointer is not pointing to a valid memory location containing a function. And that error causing a board reboot.
Learning Ada and trying to make a stack ADT and I'm using this webpage to figure it out.
http://www.functionx.com/ada/Lesson06.htm
eightqueens.adb
with Ada.Text_IO;
use Ada.Text_IO;
with Stack;
use Stack;
procedure EightQueens is
begin
put_line ("awd");
end EightQueens;
stack.ads
package Stack is
function awd () return Integer;
end Stack;
stack.adb
package body Stack is
function awd () return integer is
begin
return 1;
end awd;
end Stack;
Error is
stack.ads:2:19: identifier expected
I'm most certain I did everything correctly.
Ada doesn't use empty parentheses, either for defining or for calling functions or procedures.
And for future reference, the phrase "I'm most certain I did everything correctly." is a red flag indicating that you've almost certainly done something wrong.
Just to elaborate, there are some syntactic decisions that Ada made that IMHO are superior to what you may be used to from C-syntax languages.
Functions with no parameters don't use empty parenthesis in their calls. This allows you to change a contant to a function call without having to recode any of the clients.
Arrays use parentheses like function calls do, rather than some unique syntax. This allows you to change an array constant to a function call without having to recode any of the clients.
To look at it another way, a constant is just a simplified version of a parameterless function, for when you can get away with always returning the same value. Likewise, a constant array is a simplified version of a parametered function call, for when you can get away with always returning the same value. If you later discover you need a more complex implementation, that's not the client's concern, and should not affect their code.
Steve Yegge mentioned it in a blog post and I have no idea what it means, could someone fill me in?
Is it the same thing as tail call optimization?
Tail call elimination is an optimization that saves stack space. It replaces a function call with a goto. Tail recursion elimination is the same thing, but with the added constraint that the function is calling itself.
Basically, if the very last thing a function A does is return A(params...) then you can eliminate the allocation of a stack frame and instead set the appropriate registers and jump directly into the body of the function.
Consider an (imaginary) calling convention that passes all parameters on the stack and returns the value in some register.
Some function could compile down to (in an imaginary assembly language):
function:
//Reading params B, C, & D off the stack
pop B
pop C
pop D
//Do something meaningful, including a base case return
...
//Pass new values for B, C, & D to a new invocation of function on the stack
push D*
push C*
push B*
call function
ret
Whatever the above actually does, it takes up a whole new stack frame for each call to function. However, since nothing occurs after the tail call to function except a return we can safely optimize that case away.
Resulting in:
function:
//Reading params B, C, & D off the stack (but only on the first call)
pop B
pop C
pop D
function_tail_optimized:
//Do something meaningful, including a base case return
...
//Instead of a new stack frame, load the new values directly into the registers
load B, B*
load C, C*
load D, D*
//Don't call, instead jump directly back into the function
jump function_tail_optimized
The end result is an equivalent function that saves a lot of stack space, especially for inputs that result in a large number of recursive calls.
There's a lot of imagination required in my answer, but I think you can get the idea.
from here:
"...Tail recursion elimination is a
special case of tail call elimination
in which the tail call is a call to
the function itself. In that case the
call can be replaced by a jump to the
start of the function after moving the
new arguments to the appropriate
registers or stack locations..."
from Wikipedia:
"...When a function is called, the computer must "remember" the place it was called from, the return address, so that it can return to that location with the result once the call is complete. Typically, this information is saved on the stack, a simple list of return locations in order of the times that the call locations they describe were reached. Sometimes, the last thing that a function does after completing all other operations is to simply call a function, possibly itself, and return its result. With tail recursion, there is no need to remember the place we are calling from — instead, we can leave the stack alone, and the newly called function will return its result directly to the original caller. Converting a call to a branch or jump in such a case is called a tail call optimization. Note that the tail call doesn't have to appear lexically after all other statements in the source code; it is only important that its result be immediately returned, since the calling function will never get a chance to do anything after the call if the optimization is performed...."