Stack(s), Registers in ActionScript ByteCode AVM2, which all are there? - actionscript-3

From the AVM2 Overview PDF I encountered references to two types of stacks - Scope Stack and Operand Stack.
1) I assume these are two different memory stacks, each handling different things. Are there even more stacks?
2) pushstring "hello" - this would push a start of memory address where "hello" string is located onto Operand Stack. Right?
3) setlocal 0 - this would store a value from the stack (above) into register0 by popping it off. Right?
4) PushScope() - hmm, docs say pop value of stack, push value onto Scope Stack. Why?
I know a little bit of NASM but ABC seems more complex than that. Especially I'm confused about Scope Stack and the whole concept of multiple stacks.

I am no AVM2 expert, but here's what I know:
There are only 2 stacks, the two you mention: scope and operand.
Yes, pushstring "hello" will push the string onto the operand stack.
Also, correct. setlocal0 will pop "hello" off the stack and store it in reg 0.
The scope stack is used by all operations that require a name lookup for scope, for instance closures and exceptions. Often in ASM code you'll see getlocal_0 immediately followed by a pushscope. This is pretty common. You can kind of think of it as adding the "this" object to the scope stack for future reference in method calls, scope for closures, etc.
I highly recommend downloading the Tamarin source and playing with the decompiler there. Also, Yogda looks to be pretty handy for learning: http://www.yogda.com/

Related

MIPS functions and variables in stack

I have come in contact with MIPS-32, and I came with the question if a variable, for example $t0 declared having the value in one function can be altered by another and how this does have to do with stack, this is, the location of the variable in memory. Everything that I am talking is in assembly language. And more, I would like some examples concerning this use, this is, a function altering or not, a variable value of another function, and how this variable "survive" or not in terms of if the variable is given as a copy or a reference.
(I hope we can create an environment where conceptual question like that above can be explored more)
$t0 declared having the value in one function can be altered by another
$t0 is known as a call-clobbered register.  It is no different than the other registers as far as the hardware is concerned — being call clobbered vs. call preserved is an aspect of software convention, call the calling convention, which is a subset of an Application Binary Interface (ABI).
The calling convention, when followed, allows a function, F, to call another function, G, knowing only G's signature — name, parameters & their types, return type.  The function, F, would not have to also be changed if G changes, as long as both follow the convention.
Call clobbered doesn't mean it has to be clobbered, though, and when writing your own code you can use it any way you like (unless your coursework says to follow the MIPS32 calling convention, of course).
By the convention, a call-clobbered register can be used without worry: all you have to do use it is put a value into it!
Call preserved registers can also be used, if desired, but they should be presumed to be already in use by some caller (maybe not the immediate caller, but some distant caller), the values they contain must be restored before exiting the function back to return to its caller.  This is, of course, only possible by saving the original value before repurposing the register for a new use.
The two sets of register (call clobbered/preserved) serve two common use cases, namely cheap temporary storage and long term variables.  The former requires no effort to preserve/restore, while the latter both does require this effort, though gives us registers that will survive a function call, which is useful, for example, when a loop has a function call.
The stack comes into play when we need to first preserve, then restore call-preserved registers.  If we want to use call-preserved registers for some reason, then we need to preserve their original values in order to restore them later.  The most reasonable way to do that is to save them in the stack.  In order to do that we allocate some space from the stack.
To allocate some local memory, the stack pointer is decremented to reserve a function some space.  Since the stack pointer, upon return to caller, must have the same value, this space is necessarily deallocated upon return.  Hence the stack is great for local storage.  Original values of preserved registers must be also restored upon return to caller and so using local storage is appropriate.
https://www.dyncall.org/docs/manual/manualse11.html — search for section "MIPS32".
Let's also make the distinction between variables, a logical concept, and storage, a physical concept.
In high level language, variables are named and have scopes (limited lifetimes).  In machine code, we have physical hardware (storage) resources of registers and memory; these simply exist: they have no concept of lifetime.  In and of themselves these hardware resources are not variables, but places that we can use to hold variables for their lifetime/scope.
As assembly language programmers, we keep a mental (or even written) map of our logical variables to physical resources.  The compiler does the same, knowing the scope/lifetime of program variables and creating that "mental" map of variables to machine code storage.  Variables that have overlapping lifetimes cannot share the same hardware resource, of course, but when a variable is out of scope, its (mapped-to) physical resource can be reused for another purpose.
Logical variables can also move around to different physical resources.  A logical variable that is a parameter, may be passed in a CPU register, e.g. $a0, but then be moved into an $s register or into a (stack) memory location.  Such is the business of machine code.
To allocate some hardware storage to a high level language (or pseudo code) variable, we simply initialize the storage!  Hardware resources are necessarily constantly being repurposed to hold a different logical variable.
See also:
How a recursive function works in MIPS? — for discussion on variable analysis.
Mips/assembly language exponentiation recursivley
What's the difference between caller-saved and callee-saved in RISC-V

Function call and context save to stack

I am very interested in real time operating systems for micro-controllers, so I am doing a deep research on the topic. At the high level I understand all the general mechanisms of an OS.
In order to better learn it I decided to write a very simple kernel that does nothing but the context switch. This raised a lot of additional - practical questions to me. I was able to cope with many of them but I am still in doubt with the main thing - Saving context (all the CPU registers, and stack pointer) of a current task and restore context of a new task.
In general, OS use some function (lets say OSContextSwitch()) that preserves all the actions for the context switch. The body of the OSContextSwitch() is mainly written in assembly (inline assembly in C body function). But when the OSContextSwitch() is called by the scheduler, as far as I know, on a function call some of the CPU registers are preserved on the stack by the compiler (actually by the code generated by the compiler).
Finally, the question is: How to know which of the CPU registers are already preserved by the compiler to the stack so I can preserve the rest ? If I preserved all the registers regardless of the compiler behaviour, obviously there will be some stack leakage.
Such function should be written either as pure assembly (so NOT an assembly block inside a C function) or as a "naked" C function with nothing more than assembly block. Doing anything in between is a straight road to messing things up.
As for the registers which you should save - generally you need to know the ABI for your platform, which says that some registers are saved by caller and some should be saved by callee - you generally need to save/restore only the ones which are normally saved by callee. If you save all of them nothing wrong will happen - your code will only be slightly slower and use a little more RAM, but this will be a good place to start.
Here's a typical context switch implementation for ARM Cortex-M microcontrollers - https://github.com/DISTORTEC/distortos/blob/master/source/architecture/ARM/ARMv6-M-ARMv7-M/ARMv6-M-ARMv7-M-PendSV_Handler.cpp#L76

How many arguments are passed in a function call?

I wish to analyze assembly code that calls functions, and for each 'call' find out how many arguments are passed to the function. I assume that the target functions are not accessible to me, but only the calling code.
I limit myself to code that was compiled with GCC only, and to System V ABI calling convention.
I tried scanning back from each 'call' instruction, but I failed to find a good enough convention (e.g., where to stop scanning? what happen on two subsequent calls with the same arguments?). Assistance is highly appreciated.
Reposting my comments as an answer.
You can't reliably tell in optimized code. And even doing a good job most of the time probably requires human-level AI. e.g. did a function leave a value in RSI because it's a second argument, or was it just using RSI as a scratch register while computing a value for RDI (the first argument)? As Ross says, gcc-generated code for stack-args calling-conventions have more obvious patterns, but still nothing easy to detect.
It's also potentially hard to tell the difference between stores that spill locals to the stack vs. stores that store args to the stack (since gcc can and does use mov stores for stack-args sometimes: see -maccumulate-outgoing-args). One way to tell the difference is that locals will be reloaded later, but args are always assumed to be clobbered.
what happen on two subsequent calls with the same arguments?
Compilers always re-write args before making another call, because they assume that functions clobber their args (even on the stack). The ABI says that functions "own" their args. Compilers do make code that does this (see comments), but compiler-generated code isn't always willing to re-purpose the stack memory holding its args for storing completely different args in order to enable tail-call optimization. :( This is hand-wavey because I don't remember exactly what I've seen as far as missed tail-call optimization opportunities.
Yet if arguments are passed by the stack, then it shall probably be the easier case (and I conclude that all 6 registers are used as well).
Even that isn't reliable. The System V x86-64 ABI is not simple.
int foo(int, big_struct, int) would pass the two integer args in regs, but pass the big struct by value on the stack. FP args are also a major complication. You can't conclude that seeing stuff on the stack means that all 6 integer arg-passing slots are used.
The Windows x64 ABI is significantly different: For example, if the 2nd arg (after adding a hidden return-value pointer if needed) is integer/pointer, it always goes in RDX, regardless of whether the first arg went in RCX, XMM0, or on the stack. It also requires the caller to leave "shadow space".
So you might be able to come up with some heuristics to will work ok for un-optimized code. Even that will be hard to get right.
For optimized code generated by different compilers, I think it would be more work to implement anything even close to useful than you'd ever save by having it.

MIPS: why is ISR surrounded with rdpgpr $sp, $sp; wrpgpr $sp, $sp instructions?

I'm working with PIC32 MCUs (MIPS M4K core), I'm trying to understand how do interrupts work in MIPS; I'm armed with "See MIPS Run" book, official MIPS reference and Google. No one of them can help me understand the following:
I have interrupt declared like this:
void __ISR(_CORE_TIMER_VECTOR) my_int_handler(void)
I look at disassembly, and I see that RDPGPR SP, SP is called in the ISR prologue (first instruction, actually); and balancing WRPGPR SR, SR instruction is called in the ISR epilogue (before writing previously-saved Status register to CP0 and calling ERET).
I see that these instruction purposes are to read from and save to previous shadow register set, so, RDPGPR SP, SP reads $sp from shadow register set and WRPGPR SR, SR writes it back, but I can't understand the reason for this. This ISR intended not to use shadow register set, and actually in disassembly I see that context is saved to the stack. But, for some reason, $sp is read from and written to shadow $sp. Why is this?
And, related question: is there some really comprehensive resource (book, or something) on MIPS assembly language? "See MIPS Run" seems really good, it's great starting point for me to dig into MIPS architecture, but it does not cover several topics good enough, several things off the top of my head:
Very little information about EIC (external interrupt controller) mode: it has the diagram with Cause register that shows that in EIC mode we have RIPL instead of IP7-2, but there is nothing about how does it work (say, that interrupt is caused if only Cause->RIPL is more than Status->IPL. There's even no explanation what RIPL does mean ("Requested Interrupt Priority Level", well, Google helped). I understand that EIC is implementation-dependent, but the things I just mentioned are generic.
Assembly language is covered not completely enough: say, nothing about macro (.macro, .endm directives), I couldn't find anything about some assembler directives I've seen in the existing code, say, .set mips32r2, and so on.
I cant find anything about using rdpgpr/wrpgpr in the ISR, it covers these instructions (and shadow register sets in general) very briefly
Official MIPS reference doesn't help much in these topics as well. Is there really good book that covers all possible assembly directives, and so on?
When the MIPS core enters an ISR it can swap the interrupted code's active register set with a new one (there can be several different shadow register sets), specific for that interrupt priority.
Usually the interrupt routines don't have a stack of their own, and because the just switched-in shadow register set certainly have its sp register with a different value than the interrupted code's, the ISR copies the sp value from the just switched-out shadow register set to its own, to be able to use the interrupted code's stack.
If you wish, you could set your ISR's stack to a previously allocated stack of its own, but that is usually not useful.

Can I identify a "function" in an x86 binary?

"Function" meaning a chunk (or a graph of chunks) of the binary that starts at a point (likely arriving from one of the CALL instructions), possibly sets up a stack frame, and has one or more endpoints in the form of RETs (and depending on the calling convention it may also unwind said stack frame).
My current idea is to treat the various conditional branching instructions as junctions in a graph and do a Breadth-first search on the code this way. Is this viable at all? If not, what's a better approach?
My objective with this is just what it is: extract the functions. Purely for the sake of doing it. Maybe doing something fancy later if I have the time and notion.
You can use a disassembler library like BeaEngine to do the hard work for you and then search on resulting mnemonics for call.
Without a symbol table I would say: almost impossible. At least without false positives/negatives.
What you need first is a disassembler. Just looking for a byte combination won't cut it, the combination might be part of some "random" data. Then, tracing the CALLs is likely the best solution as a function doesn't necessarily always start with the same opcode sequence. But even a disassembler might have a hard time and get confused by embedded data in the text segment.
Even if you were able to find the functions, you cannot get their names without debug symbols (in the compiled program there's no need for names any more, only addresses).
Also, you'd have a very hard time finding out what kind of parameters the function accepts. For example, a function might accept 2 argument but uses neither. In this case you would need a function call and look at how the stack is prepared in advance of calling the function.
You have to look for things like:
push ebp
mov ebp, esp
sub esp, ???
...
...
add esp, ???
pop ebp
ret