Why should you keep ESP in EBP inside a function call?

I'm reading in Professional Assembly Language by Richard Blum that when you enter a call you should copy the value of the ESP register to EBP, and he also provided the following template:
function_label:
    pushl %ebp            # save the caller's frame pointer
    movl  %esp, %ebp      # make EBP point at this function's frame
    < normal function code goes here >
    movl  %ebp, %esp      # throw away anything still left on the stack
    popl  %ebp            # restore the caller's frame pointer
    ret
I don't understand why this is necessary. When you push something inside the function, you obviously intend to pop it back, thus restoring ESP to its original value.
So why have this template?
And what's the use of the EBP register anyway?
I'm obviously missing something, but what is it?

When you push something inside the function, you obviously intend to pop it back
That's just part of the reason for using the stack. The far more common usage is the one that's missing from your snippet: storing local variables. The next common thing you see after setting up EBP is a subtraction from ESP, by the amount of space required for local variable storage. That's of course easy to balance as well; just add the same amount back in the function epilogue. It gets more difficult when the code also uses things like C99 variable-length arrays or the non-standard but commonly available _alloca() function. Being able to restore ESP from EBP makes this simple.
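As a rough sketch (not from the book, the sizes and values are made up for illustration), a function with locals might look like this; note how the single movl %ebp, %esp at the end discards everything that was carved out, even a run-time-sized _alloca-style allocation:
function_label:
    pushl %ebp
    movl  %esp, %ebp
    subl  $16, %esp          # reserve 16 bytes for local variables
    movl  $42, -4(%ebp)      # a local lives at a fixed offset from EBP
    subl  %eax, %esp         # an _alloca-style, run-time-sized allocation
    # ...
    movl  %ebp, %esp         # one instruction undoes all of the above
    popl  %ebp
    ret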
More to the point perhaps, it is not necessary to set up the stack frame like this. Just about any x86 compiler supports an optimization option called "frame pointer omission", turned on with -fomit-frame-pointer in GCC and /Oy in MSVC. It makes the EBP register available for general use, which can be very helpful on x86 with its dearth of CPU registers.
That optimization has a very grave disadvantage though. Without the EBP register pointing at the start of a stack frame, it gets very difficult to perform stack walks. That matters when you need to debug your code. A stack trace can be very important to find out how your code ended up crashing. Invaluable when you get a "core dump" of a crash from your customer. So valuable that Microsoft agreed to turn off the optimization on Windows binaries to give their customers a shot at diagnosing crashes.
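A sketch of what the same kind of function might look like with the frame pointer omitted (hand-written for illustration, not actual compiler output): the prologue and epilogue shrink, locals are addressed relative to ESP, and EBP is free to hold ordinary data:
function_label:
    subl  $16, %esp          # reserve the locals, no frame pointer set up
    movl  $42, 12(%esp)      # the same local, now addressed via ESP
    # ... %ebp is usable as an extra general-purpose register here ...
    addl  $16, %esp
    ret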

Assembly Commands Are Running without Me Explicitly Calling Them [duplicate]

I hope this question isn't too stupid, because it may seem obvious.
As I'm doing a little research on buffer overflows, I stumbled over a simple question:
After going to a new instruction address after a call/return/jump:
Will the CPU execute the opcode at that address, then move on to the next address and execute the next opcode, and so on until the next call/return/jump is reached? Or is there something more tricky involved?
A somewhat long-winded explanation (saying the same thing as the comments):
The CPU has a special-purpose register, the instruction pointer eip, which points to the next instruction to execute.
A jmp, call, ret, etc. ends internally with something similar to:
mov eip,<next_instruction_address>.
While the CPU is processing instructions, it increments eip automatically by the size of the last executed instruction (unless that is overridden by one of the jmp/j[condition]/call/ret/int/... instructions).
Wherever you point eip (by whatever means), the CPU will try its best to execute the content of that memory as the next instruction opcode(s), not aware of any context (where or why it arrived at this new eip). Actually, this sort of amnesia happens ahead of every instruction executed. (I'm silently ignoring the modern internal x86 architecture with its various pre-execution queues, branch prediction, translation into micro-instructions, and so on. All of that is an implementation detail quite hidden from the programmer, usually visible only through poor performance if you disturb that machinery by jumping all around mindlessly.) So it's the CPU, eip and the here-and-now, not much else.
Note: some context on x86 can be provided by supervising code (like the OS) defining the memory layout, i.e. marking some areas of memory as non-executable. A CPU detecting that its eip points into such an area will signal a failure and fall into a "trap" handler (usually also managed by the OS, which kills the offending process).
The call instruction saves the address of the instruction after it onto the stack. After that, it simply jumps. It doesn't explicitly tell the CPU to look for a return instruction, since the return will be handled by popping (from the stack) the return address that call saved in the first place. This allows for multiple calls and returns, or to put it simply, nested calls.
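A tiny hand-written sketch of that mechanism (the label names are made up for illustration): each call pushes the address of the instruction following it, and each ret pops whatever return address is on top of the stack, so nesting falls out naturally:
outer:
    call  inner        # pushes the address of back_here, then jumps to inner
back_here:
    ret                # pops outer's own return address (pushed by its caller)
inner:
    ret                # pops back_here off the stack and jumps there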
While the CPU is processing instructions, it increments eip automatically by the size of the last executed instruction (unless that is overridden by one of the jmp/j[condition]/call/ret/int/... instructions).
That's what I wanted to know.
I'm well aware that there's more stuff around (the NX bit, pipelining, etc.).
Thanks everybody for your replies.

How many arguments are passed in a function call?

I wish to analyze assembly code that calls functions, and for each 'call' find out how many arguments are passed to the function. I assume that the target functions are not accessible to me, but only the calling code.
I limit myself to code that was compiled with GCC only, and to System V ABI calling convention.
I tried scanning back from each 'call' instruction, but I failed to find a good enough convention (e.g., where do I stop scanning? what happens on two subsequent calls with the same arguments?). Assistance is highly appreciated.
Reposting my comments as an answer.
You can't reliably tell in optimized code. And even doing a good job most of the time probably requires human-level AI. E.g., did a function leave a value in RSI because it's a second argument, or was it just using RSI as a scratch register while computing a value for RDI (the first argument)? As Ross says, gcc-generated code for stack-args calling conventions has more obvious patterns, but still nothing easy to detect.
It's also potentially hard to tell the difference between stores that spill locals to the stack vs. stores that store args to the stack (since gcc can and does use mov stores for stack-args sometimes: see -maccumulate-outgoing-args). One way to tell the difference is that locals will be reloaded later, but args are always assumed to be clobbered.
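A hand-written 32-bit sketch (not real gcc output; some_function is a made-up name, and the outgoing-arg area is assumed to have been reserved in the prologue) of why the two kinds of stores look alike when arguments are stored with mov rather than pushed:
    movl  %eax, -8(%ebp)     # spilling a local: this slot will be reloaded later
    movl  $5, 4(%esp)        # storing the 2nd outgoing stack arg
    movl  $1, (%esp)         # storing the 1st outgoing stack arg
    call  some_function      # the arg slots are assumed clobbered after this,
                             # but -8(%ebp) may well be read again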
what happens on two subsequent calls with the same arguments?
Compilers always re-write args before making another call, because they assume that functions clobber their args (even on the stack). The ABI says that functions "own" their args. Compilers do make code that does this (see comments), but compiler-generated code isn't always willing to re-purpose the stack memory holding its args for storing completely different args in order to enable tail-call optimization. :( This is hand-wavey because I don't remember exactly what I've seen as far as missed tail-call optimization opportunities.
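For example, a hand-written 32-bit sketch (hypothetical function foo, push-style stack args) of what that re-writing looks like for two back-to-back calls with identical arguments:
    pushl $2
    pushl $1
    call  foo
    addl  $8, %esp           # caller removes its own args
    pushl $2                 # the same values are pushed again, because foo
    pushl $1                 # is assumed to have clobbered its stack args
    call  foo
    addl  $8, %esp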
Yet if arguments are passed on the stack, then that should probably be the easier case (and I conclude that all 6 registers are used as well).
Even that isn't reliable. The System V x86-64 ABI is not simple.
int foo(int, big_struct, int) would pass the two integer args in regs, but pass the big struct by value on the stack. FP args are also a major complication. You can't conclude that seeing stuff on the stack means that all 6 integer arg-passing slots are used.
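A hand-written sketch of such a call site (the struct size, the symbol src and the constant values are made up for illustration; real gcc output will differ in detail):
    # int foo(int a, struct big s, int b);  struct big assumed to be 32 bytes
    subq  $32, %rsp              # room for the by-value struct copy
    movq  src(%rip), %rax        # copy the struct onto the stack, 8 bytes at a time
    movq  %rax, (%rsp)
    movq  src+8(%rip), %rax
    movq  %rax, 8(%rsp)
    movq  src+16(%rip), %rax
    movq  %rax, 16(%rsp)
    movq  src+24(%rip), %rax
    movq  %rax, 24(%rsp)
    movl  $1, %edi               # a: first integer arg goes in a register
    movl  $2, %esi               # b: third arg overall, but only the second integer arg
    call  foo
    addq  $32, %rsp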
The Windows x64 ABI is significantly different: For example, if the 2nd arg (after adding a hidden return-value pointer if needed) is integer/pointer, it always goes in RDX, regardless of whether the first arg went in RCX, XMM0, or on the stack. It also requires the caller to leave "shadow space".
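A sketch of a Windows x64 call site (Intel syntax; bar and some_double are hypothetical names) showing both points, the positional register assignment and the shadow space:
    sub   rsp, 40                           ; 32 bytes of shadow space + 8 to keep 16-byte alignment
    movsd xmm0, qword ptr [some_double]     ; 1st arg is floating point, so it uses XMM0
    mov   edx, 7                            ; 2nd arg is integer, so it still goes in RDX, not RCX
    call  bar
    add   rsp, 40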
So you might be able to come up with some heuristics that work OK for un-optimized code. Even that will be hard to get right.
For optimized code generated by different compilers, I think it would be more work to implement anything even close to useful than you'd ever save by having it.

MIPS: why is ISR surrounded with rdpgpr $sp, $sp; wrpgpr $sp, $sp instructions?

I'm working with PIC32 MCUs (MIPS M4K core), and I'm trying to understand how interrupts work in MIPS; I'm armed with the "See MIPS Run" book, the official MIPS reference and Google. None of them has helped me understand the following:
I have interrupt declared like this:
void __ISR(_CORE_TIMER_VECTOR) my_int_handler(void)
I look at the disassembly, and I see that RDPGPR SP, SP is executed in the ISR prologue (the first instruction, actually), and the balancing WRPGPR SP, SP instruction is executed in the ISR epilogue (before writing the previously-saved Status register back to CP0 and executing ERET).
I see that the purpose of these instructions is to read from and write to the previous shadow register set, so RDPGPR SP, SP reads $sp from the shadow register set and WRPGPR SP, SP writes it back, but I can't understand the reason for this. This ISR is not intended to use a shadow register set, and indeed in the disassembly I see that context is saved to the stack. But, for some reason, $sp is read from and written to the shadow $sp. Why is this?
And, a related question: is there some really comprehensive resource (a book, or something) on MIPS assembly language? "See MIPS Run" seems really good and is a great starting point for digging into the MIPS architecture, but it doesn't cover several topics well enough; a few things off the top of my head:
Very little information about EIC (external interrupt controller) mode: it has the diagram of the Cause register showing that in EIC mode we have RIPL instead of IP7-2, but there is nothing about how it works (say, that an interrupt is taken only if Cause->RIPL is greater than Status->IPL). There's not even an explanation of what RIPL means ("Requested Interrupt Priority Level"; well, Google helped). I understand that EIC is implementation-dependent, but the things I just mentioned are generic.
Assembly language is not covered completely enough: say, there is nothing about macros (the .macro, .endm directives), and I couldn't find anything about some assembler directives I've seen in existing code, say .set mips32r2, and so on.
I can't find anything about using rdpgpr/wrpgpr in an ISR; it covers these instructions (and shadow register sets in general) very briefly.
The official MIPS reference doesn't help much on these topics either. Is there a really good book that covers all possible assembly directives, and so on?
When the MIPS core enters an ISR it can swap the interrupted code's active register set for a new one (there can be several different shadow register sets), specific to that interrupt priority.
Usually the interrupt routines don't have a stack of their own, and because the just-switched-in shadow register set certainly has an sp register holding a different value than the interrupted code's, the ISR copies the sp value from the just-switched-out register set into its own, to be able to use the interrupted code's stack.
If you wish, you could set your ISR's stack to a previously allocated stack of its own, but that is usually not useful.

Self-modifying program: Why does it raise an exception?

Just for the purposes of experimenting and playing around, I wrote the following short x64 assembly program:
.code
AsmFun proc
    mov rax, MyLabel
    mov byte ptr [rax], 0C3h    ; C3 is x64 machine code for "ret"
MyLabel:
    mov rax, 239847             ; This isn't "ret"
AsmFun endp
end
(I then called the code from C.)
It compiles/assembles/links just fine, but when I step through the program, Visual Studio complains that an unhandled exception has been raised: "Access violation writing location [MyLabel]", where of course it doesn't actually say "[MyLabel]", but rather the address that label happens to be at in memory.
Why is this happening? Is it a Windows thing that was put in place to avoid security exploits?
I live in the Linux world, but perhaps you can adapt what I've found out.
Memory pages are generally read-only if they have execute permission. How I got around this was with mmap() and mprotect()... I'm sure there's something similar in Windows. It's a good bet the Mono source code would shed some light.
I used mmap() to allocate a new page with write access (but not read or execute). I populated it, then called mprotect() to change it to read-only and executable.
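To make that concrete, here is a minimal Linux x86-64 sketch (GAS/AT&T syntax, raw syscalls, constants are the usual Linux values) of the map/write/protect/execute dance described above; on Windows the analogous calls would be VirtualAlloc/VirtualProtect:
    .text
    .globl _start
_start:
    # mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
    movq  $9, %rax           # __NR_mmap
    xorq  %rdi, %rdi         # addr = NULL, let the kernel choose
    movq  $4096, %rsi        # one page
    movq  $3, %rdx           # PROT_READ | PROT_WRITE
    movq  $0x22, %r10        # MAP_PRIVATE | MAP_ANONYMOUS
    movq  $-1, %r8           # no backing file
    xorq  %r9, %r9           # offset 0
    syscall
    movq  %rax, %rbx         # keep the page address

    movb  $0xC3, (%rbx)      # write a single "ret" instruction into the page

    # mprotect(page, 4096, PROT_READ|PROT_EXEC): executable, no longer writable
    movq  $10, %rax          # __NR_mprotect
    movq  %rbx, %rdi
    movq  $4096, %rsi
    movq  $5, %rdx           # PROT_READ | PROT_EXEC
    syscall

    call  *%rbx              # run the freshly generated code; it just returns

    movq  $60, %rax          # exit(0)
    xorq  %rdi, %rdi
    syscall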
Don't forget... there are registers you want to avoid trashing. See the ABI documentation for further details.

Can I identify a "function" in an x86 binary?

"Function" meaning a chunk (or a graph of chunks) of the binary that starts at a point (likely arriving from one of the CALL instructions), possibly sets up a stack frame, and has one or more endpoints in the form of RETs (and depending on the calling convention it may also unwind said stack frame).
My current idea is to treat the various conditional branching instructions as junctions in a graph and do a Breadth-first search on the code this way. Is this viable at all? If not, what's a better approach?
My objective with this is just what it is: extract the functions. Purely for the sake of doing it. Maybe doing something fancy later if I have the time and notion.
You can use a disassembler library like BeaEngine to do the hard work for you and then search the resulting mnemonics for call.
Without a symbol table I would say: almost impossible. At least not without false positives/negatives.
What you need first is a disassembler. Just looking for a byte combination won't cut it; the combination might be part of some "random" data. Then, tracing the CALLs is likely the best solution, as a function doesn't necessarily always start with the same opcode sequence. But even a disassembler might have a hard time and get confused by embedded data in the text segment.
Even if you were able to find the functions, you cannot get their names without debug symbols (in the compiled program there's no need for names any more, only addresses).
Also, you'd have a very hard time finding out what kind of parameters the function accepts. For example, a function might accept 2 arguments but use neither. In that case you would need to find a call to the function and look at how the stack is prepared in advance of the call.
You have to look for things like:
push ebp        ; prologue: save the caller's frame pointer
mov  ebp, esp   ; establish this function's frame pointer
sub  esp, ???   ; reserve space for locals
...
...
add  esp, ???   ; epilogue: release the locals
pop  ebp        ; restore the caller's frame pointer
ret