Using "saved" registers in the main function at RISC-V Assembly - function

Suppose the following simple main function written in RISC-V assembly:
.globl main
main:
addi s3,zero,10 #Should this register (s3) be saved before using?
Since s3 is a "saved register", the procedure calling conventions should be followed and thus, this register should be pushed to the stack before using it. However, by looking at the source file, no other procedure has used this register and saving the register to the stack seems redundant.
My question is, should these types of registers be saved every time before every usage even if it means writing more (redundant) code just to obey the calling conventions? Can these conventions sometimes be ignored to improve performance?
In the example above, should the register be saved because it is unknown if the main's caller has been using the s3 register?

Yes. main is a function with a real caller that you return to, and that caller might be using s3 for something.
The exception is a main that never returns, either because it's an infinite loop or because it only exits by calling exit or making a system call. If you never return, you don't need to be able to restore the caller's state, or even find your way back (via a return address).
So if it's just as convenient to call exit instead of ever returning from main, doing that allows you to avoid saving anything.
This also applies in cases where there's nothing for main to return to, of course, so returning wasn't even an option. e.g. if it's the entry point in a kernel or other freestanding code.
Also, note that "saved every time before every usage" should mean once per function that uses them (save on entry, restore before returning), not separately around each block that touches the register. And don't save call-clobbered registers around each function call; just let them die.
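The "once per function" point can be sketched with a toy model. This is Python bookkeeping rather than real RISC-V: the dictionary `regs`, the list `stack`, and the function name `run_main` are hypothetical and exist only for illustration.

```python
def run_main(caller_s3):
    """Toy model of a main that returns and follows the convention for s3."""
    regs = {"s3": caller_s3, "a0": 0}   # caller_s3: whatever main's caller kept in s3
    stack = []

    stack.append(regs["s3"])    # prologue: save s3 once, on entry
    regs["s3"] = 10             # like: addi s3, zero, 10
    for _ in range(3):          # s3 is used repeatedly with no extra save/restore
        regs["s3"] += 1
    regs["a0"] = regs["s3"]     # hand the result back in a0
    regs["s3"] = stack.pop()    # epilogue: restore s3 once, before returning

    return regs, stack
```

After `run_main(1234)` the caller's value is back in `s3` and the stack is balanced, even though `s3` was written four times inside the function. A main that ends by calling exit could drop both the save and the restore.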
Can these conventions sometimes be ignored to improve performance?
Yes, if you keep the details invisible to any code you don't control.
If you treat small private helper functions as actually part of one big function, then they can use a "private" custom calling convention. (Even if you do actually call / return instead of just jumping to them, if you want to avoid inlining them at multiple callsites)
Sometimes this is just taking advantage of extra guarantees when you know about the function you're calling. e.g. that it doesn't actually clobber some of its input arg registers. This can be useful in recursion when you're calling yourself: foo(int *p, int a) self calls might take advantage of p still being in the same register unmodified, instead of having to keep p somewhere else for use after the call returns like it would if calling an "unknown" function where you can't assume anything the calling convention doesn't guarantee.
Or if you have a publicly-visible wrapper in front of your actual private recursive function, you can set up some constants, or even have the recursive function treat one register as a static variable, instead of passing around pointers to some shared state in memory. (That's no longer pure recursion, just a loop that uses the asm stack to keep track of some history that happens to include a jump address.)

Related

MIPS functions and variables in stack

I have come in contact with MIPS-32, and I came with the question if a variable, for example $t0 declared having the value in one function can be altered by another and how this does have to do with stack, this is, the location of the variable in memory. Everything that I am talking is in assembly language. And more, I would like some examples concerning this use, this is, a function altering or not, a variable value of another function, and how this variable "survive" or not in terms of if the variable is given as a copy or a reference.
(I hope we can create an environment where conceptual questions like the one above can be explored more.)
$t0 declared having the value in one function can be altered by another
$t0 is known as a call-clobbered register.  It is no different from the other registers as far as the hardware is concerned: being call-clobbered vs. call-preserved is an aspect of software convention, called the calling convention, which is a subset of an Application Binary Interface (ABI).
The calling convention, when followed, allows a function, F, to call another function, G, knowing only G's signature — name, parameters & their types, return type.  The function, F, would not have to also be changed if G changes, as long as both follow the convention.
Call clobbered doesn't mean it has to be clobbered, though, and when writing your own code you can use it any way you like (unless your coursework says to follow the MIPS32 calling convention, of course).
By the convention, a call-clobbered register can be used without worry: all you have to do to use it is put a value into it!
Call-preserved registers can also be used, if desired, but they should be presumed to be already in use by some caller (maybe not the immediate caller, but some distant caller), so the values they contain must be restored before the function returns to its caller.  This is, of course, only possible by saving the original value before repurposing the register for a new use.
The two sets of registers (call-clobbered/call-preserved) serve two common use cases, namely cheap temporary storage and long-term variables.  The former requires no effort to preserve/restore, while the latter does require this effort but gives us registers that will survive a function call, which is useful, for example, when a loop contains a function call.
The stack comes into play when we need to first preserve, then restore call-preserved registers.  If we want to use call-preserved registers for some reason, then we need to preserve their original values in order to restore them later.  The most reasonable way to do that is to save them in the stack.  In order to do that we allocate some space from the stack.
To allocate some local memory, the stack pointer is decremented to reserve a function some space.  Since the stack pointer, upon return to caller, must have the same value, this space is necessarily deallocated upon return.  Hence the stack is great for local storage.  Original values of preserved registers must be also restored upon return to caller and so using local storage is appropriate.
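As a rough illustration of the last few paragraphs, here is a toy Python model (not MIPS code) of a callee that clobbers $t0 freely but preserves $s0 through a stack slot. The dictionaries `regs` and `mem`, the starting stack address, and the function names are all assumptions made up for the sketch.

```python
def callee(regs, mem):
    """Follows the convention: may clobber $t0, must preserve $s0 and $sp."""
    regs["sp"] -= 4                 # allocate one word (like addiu $sp, $sp, -4)
    mem[regs["sp"]] = regs["s0"]    # save the caller's $s0 (like sw $s0, 0($sp))
    regs["t0"] = 111                # call-clobbered: just overwrite it
    regs["s0"] = 222                # call-preserved: usable only after saving
    regs["s0"] = mem[regs["sp"]]    # restore (like lw $s0, 0($sp))
    regs["sp"] += 4                 # deallocate (like addiu $sp, $sp, 4)


def caller():
    """Keeps one value in $t0 and one in $s0 across a call."""
    regs = {"sp": 0x7FFF0000, "t0": 5, "s0": 7}   # arbitrary starting values
    mem = {}
    sp_before = regs["sp"]
    callee(regs, mem)
    return regs, sp_before
```

After the call, `$s0` still holds 7 and `$sp` is back where it started, while the 5 that was in `$t0` is gone. A caller that still needed that 5 would have had to keep it in an `$s` register or in its own stack frame.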
https://www.dyncall.org/docs/manual/manualse11.html — search for section "MIPS32".
Let's also make the distinction between variables, a logical concept, and storage, a physical concept.
In high level language, variables are named and have scopes (limited lifetimes).  In machine code, we have physical hardware (storage) resources of registers and memory; these simply exist: they have no concept of lifetime.  In and of themselves these hardware resources are not variables, but places that we can use to hold variables for their lifetime/scope.
As assembly language programmers, we keep a mental (or even written) map of our logical variables to physical resources.  The compiler does the same, knowing the scope/lifetime of program variables and creating that "mental" map of variables to machine code storage.  Variables that have overlapping lifetimes cannot share the same hardware resource, of course, but when a variable is out of scope, its (mapped-to) physical resource can be reused for another purpose.
Logical variables can also move around to different physical resources.  A logical variable that is a parameter, may be passed in a CPU register, e.g. $a0, but then be moved into an $s register or into a (stack) memory location.  Such is the business of machine code.
To allocate some hardware storage to a high level language (or pseudo code) variable, we simply initialize the storage!  Hardware resources are necessarily constantly being repurposed to hold a different logical variable.
See also:
How a recursive function works in MIPS? — for discussion on variable analysis.
Mips/assembly language exponentiation recursivley
What's the difference between caller-saved and callee-saved in RISC-V

Tcl_DoOneEvent is blocked if tkwait / vwait is called

There is an external C++ function that is called from Tcl/Tk and does some stuff in a noticeable amount of time. Tcl caller has to get the result of that function so it waits until it's finished. To avoid blocking of GUI, that C++ function has some kind of event loop implemented in its body:
while (m_curSyncProc.isRunning()) {
    const clock_t tm = clock();
    while (Tcl_DoOneEvent(TCL_ALL_EVENTS | TCL_DONT_WAIT) > 0) {} // <- stuck here in case of tkwait/vwait
    // Pause for 10 ms to avoid 100% CPU usage
    if (double(clock() - tm) / CLOCKS_PER_SEC < 0.005) {
        nanosleep(10000);
    }
}
Everything works great unless tkwait/vwait is in action in Tcl code.
For example, for dialogs, tkwait variable someVariable is used to wait until the Ok/Close/<whatever> button is pressed. I see that even the standard Tk bgerror uses the same method (it uses vwait).
The problem is that, once called, Tcl_DoOneEvent does not return while Tcl code is waiting at a tkwait/vwait line; otherwise it works well. Is it possible to fix this in that event loop without totally redesigning the C++ code? That code is rather old and complicated, and its author is no longer reachable.
Beware! This is a complex topic!
The Tcl_DoOneEvent() call is essentially what vwait, tkwait and update are thin wrappers around (passing different flags and setting up different callbacks). Nested calls to any of them create nested event loops; you don't really want those unless you're supremely careful. An event loop only terminates when it is not processing any active event callbacks, and if those event callbacks create inner event loops, the outer event loop will not get to do anything at all until the inner one has finished.
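The nesting effect can be imitated with a toy event loop. The sketch below is Python, not Tcl, and it is not how Tcl is implemented: `ToyEventLoop`, `do_one_event`, `vwait`, and `demo` are hypothetical names that only mimic the shape of the problem. One difference is deliberate: where a real vwait would block waiting for more events, the toy version raises an error so that it cannot hang.

```python
from collections import deque


class ToyEventLoop:
    """Hypothetical stand-in for an event loop; not Tcl's real implementation."""

    def __init__(self):
        self.queue = deque()   # pending event callbacks
        self.variables = {}    # stand-in for Tcl variables
        self.log = []

    def do_one_event(self):
        """Run one queued callback. Return 1 if one ran, 0 if none was pending."""
        if not self.queue:
            return 0
        callback = self.queue.popleft()
        callback()
        return 1

    def vwait(self, name):
        """Inner event loop: service events until `name` has been set."""
        while name not in self.variables:
            if not self.do_one_event():
                raise RuntimeError("nothing left that could set " + name)


def demo():
    loop = ToyEventLoop()

    def dialog():
        loop.log.append("dialog opened")
        loop.vwait("answer")               # nested event loop starts here
        loop.log.append("dialog closed")

    def click_ok():
        loop.log.append("Ok clicked")
        loop.variables["answer"] = "ok"

    loop.queue.extend([dialog, click_ok])
    handled = loop.do_one_event()          # the outer loop asks for ONE event
    return handled, loop.log, len(loop.queue)
```

The single outer `do_one_event()` call reports one event handled, yet both callbacks have run by the time it returns, because the dialog callback drove its own inner loop. If the Ok click had not been queued yet, the outer call would have stayed inside the dialog callback, which is the "stuck" behaviour described in the question.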
As you're taking control of the outer event loop (in a very inefficient way, but oh well) you really want the inner event loops to not run at all. There are three possible ways to deal with this; I suspect that the third (coroutines) will be most suitable for you and that the first is what you're really trying to avoid, but that's definitely your call.
1. Continuation Passing
You can rewrite the inner code into continuation-passing style — a big pile of procedures that hands off from step to step through a state machine/workflow — so that it doesn't actually call vwait (and friends). The only one of the family that tends to be vaguely safe is update idletasks (which is really just Tcl_DoOneEvent(TCL_IDLE_EVENTS | TCL_DONT_WAIT)) to process Tk internally-generated alterations.
This option was your main choice up to Tcl 8.5, and it was a lot of work.
2. Threads
You can move to a multi-threaded application. This can be easy… or very difficult; the details depend on an examination of what you're doing throughout the application.
If going this route, remember that Tcl interpreters and Tcl values are totally thread-bound; they internally use thread-specific data so that they can avoid big global locks. This means that threads in Tcl are comparatively expensive to set up, but actually use multiple CPUs very efficiently afterwards; thread pooling is a very common approach.
3. Coroutines
Starting in 8.6, you can put the inner code in a coroutine. Almost everything in 8.6 is coroutine-aware (“non-recursive” in our internal lingo) by default (including commands you wouldn't normally think of, such as source) and once you've done that, you can replace the vwait calls with equivalents from the Tcllib coroutine package and things will typically “just work”. (For example, vwait var becomes coroutine::vwait var, and after 123 becomes coroutine::after 123.)
The only things that don't have direct replacements are tkwait window and tkwait visibility; you'll need to simulate those with waiting for a <Destroy> or <Visibility> event (the latter is uncommon as it is unsupported on some platforms), which you do by binding a trivial callback on those that just sets a variable that you can coroutine::vwait on (which is essentially all that tkwait does internally anyway).
Coroutines can become messy in a few cases, such as when you've got C code that is not coroutine-aware. The main places in Tcl where these come into play are in trace callbacks, inter-interpreter calls, and the scripted implementations of channels; the issue there is that the internal APIs these sit behind are rather complicated already (especially channels) and nobody's felt up to wading in and enabling a non-recursive implementation.

How many arguments are passed in a function call?

I wish to analyze assembly code that calls functions, and for each 'call' find out how many arguments are passed to the function. I assume that the target functions are not accessible to me, but only the calling code.
I limit myself to code that was compiled with GCC only, and to System V ABI calling convention.
I tried scanning back from each 'call' instruction, but I failed to find a good enough convention (e.g., where to stop scanning? what happen on two subsequent calls with the same arguments?). Assistance is highly appreciated.
Reposting my comments as an answer.
You can't reliably tell in optimized code. And even doing a good job most of the time probably requires human-level AI. e.g. did a function leave a value in RSI because it's a second argument, or was it just using RSI as a scratch register while computing a value for RDI (the first argument)? As Ross says, gcc-generated code for stack-args calling conventions has more obvious patterns, but still nothing easy to detect.
It's also potentially hard to tell the difference between stores that spill locals to the stack vs. stores that store args to the stack (since gcc can and does use mov stores for stack-args sometimes: see -maccumulate-outgoing-args). One way to tell the difference is that locals will be reloaded later, but args are always assumed to be clobbered.
what happen on two subsequent calls with the same arguments?
Compilers always re-write args before making another call, because they assume that functions clobber their args (even on the stack). The ABI says that functions "own" their args. Compilers do generate functions that overwrite their own incoming args (see comments), but compiler-generated code isn't always willing to re-purpose the stack memory holding its args for storing completely different args in order to enable tail-call optimization. :( This is hand-wavy because I don't remember exactly what I've seen as far as missed tail-call optimization opportunities.
Yet if arguments are passed by the stack, then it shall probably be the easier case (and I conclude that all 6 registers are used as well).
Even that isn't reliable. The System V x86-64 ABI is not simple.
int foo(int, big_struct, int) would pass the two integer args in regs, but pass the big struct by value on the stack. FP args are also a major complication. You can't conclude that seeing stuff on the stack means that all 6 integer arg-passing slots are used.
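To make the foo(int, big_struct, int) case concrete, here is a deliberately simplified Python sketch of where arguments land. It is not the ABI's full classification algorithm: it assumes every struct larger than 16 bytes goes to memory, does not model small structs at all, and the function name `assign_args` and its `(kind, size)` input format are invented for the example.

```python
INT_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]   # integer/pointer arg registers
SSE_REGS = ["xmm" + str(i) for i in range(8)]          # floating-point arg registers


def assign_args(args):
    """args: list of (kind, size_in_bytes), kind in {"int", "float", "struct"}.

    Simplifying assumption: structs over 16 bytes are passed on the stack;
    smaller structs are rejected rather than classified.
    """
    int_regs = iter(INT_REGS)
    sse_regs = iter(SSE_REGS)
    places = []
    for kind, size in args:
        if kind == "struct" and size > 16:
            places.append("stack")                    # uses up no register slot
        elif kind == "int":
            places.append(next(int_regs, "stack"))    # spills once 6 are taken
        elif kind == "float":
            places.append(next(sse_regs, "stack"))    # spills once 8 are taken
        else:
            raise ValueError("not modelled: " + kind)
    return places
```

For `[("int", 4), ("struct", 64), ("int", 4)]` this gives `["rdi", "stack", "rsi"]`: there is data on the stack even though only two of the six integer registers are in use, which is why stack traffic before a call says little about the register arguments.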
The Windows x64 ABI is significantly different: For example, if the 2nd arg (after adding a hidden return-value pointer if needed) is integer/pointer, it always goes in RDX, regardless of whether the first arg went in RCX, XMM0, or on the stack. It also requires the caller to leave "shadow space".
So you might be able to come up with some heuristics that will work ok for un-optimized code. Even that will be hard to get right.
For optimized code generated by different compilers, I think it would be more work to implement anything even close to useful than you'd ever save by having it.

How is the stack and link register used in an interrupt procedure? (ARM Processor)

The ARM website says that the link register stores the return information for subroutines, function calls, and exceptions (such as interrupts), so what is the stack used for?
The answers to this similar question say that the stack is used to store the return address, and to "push" on local variables that will need to be put back on the core registers after the exception.
But this is what the link register is for, so why is the stack needed? What is the difference between the two, and how are they both used?
Okay I think I understand your question.
So you have some code in a function that calls another function:
main ()
{
    int a,b;
    a = myfun0();
    b=a+7;
    ...
So when we call myfun0(), the link register basically gets us back so we can do the b = a+7. All of this gets compiled to assembly and optimized and such, of course, but this will suffice to understand that the link register is used to return to just after the call.
But what if
myfun0 ()
{
    return(myfun1()+3);
}
When main calls myfun0(), the link register points at some code in the main() function that it needs to return to. Then myfun0() calls myfun1(), and it needs to come back to myfun0() to do some more math before returning to main(), so when it calls myfun1() the link register is set to come back and add 3. The problem is that when we set the link register to return to myfun0(), we trash the address in main() we needed to return to. To prevent that, IF the function is going to call another function, then the link register must be put on the stack, along with any local variables that can't all live within the disposable registers. So now main calls myfun0(); myfun0() is going to call a function (myfun1()), so a copy of the link register (the return address into main()) is saved on the stack. myfun0() calls myfun1(). myfun1() follows the same rule: if calling something else, put lr on the stack; otherwise you don't have to. myfun1() uses lr to return to myfun0(), and myfun0() restores lr from the stack so it can return to main. Follow that simple rule per function and you can't go wrong.
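The sequence above can be traced with a toy model. This is Python rather than ARM code: the string labels standing in for return addresses and the function name `where_myfun0_returns` are invented for the illustration.

```python
def where_myfun0_returns(save_lr):
    """Trace main -> myfun0 -> myfun1 and report where myfun0 ends up returning."""
    stack = []

    lr = "main: after call"        # main does "bl myfun0"; lr points back into main
    if save_lr:
        stack.append(lr)           # myfun0 prologue: push {lr}

    lr = "myfun0: add 3"           # myfun0 does "bl myfun1"; lr is overwritten
    myfun1_returns_to = lr         # myfun1 is a leaf, so it returns through lr as-is
    assert myfun1_returns_to == "myfun0: add 3"

    if save_lr:
        lr = stack.pop()           # myfun0 epilogue: pop {lr}
    return lr                      # myfun0 returns to whatever lr holds now
```

With `save_lr=True` the function reports the address in main. With `save_lr=False` it reports the address inside myfun0 itself, so real code would keep returning to its own "add 3" step and never get back to main.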
Now, interrupts. I'm not sure whether this is related, or maybe I misunderstood your question. ARM has banked registers, at least on non-Cortex-M cores. But in general, when an interrupt/exception occurs, if the exception handler needs to use any resources/registers that are in use by the foreground task, then the handler needs to preserve those on the stack in such a way that the interrupted foreground task has no idea this happened. Since interrupts can in general occur between any two instructions, you even need to go so far as to preserve the flags that were set by the instruction before the one you interrupted.
So apply that to ARM: you have to look at which architecture you are using and see where it describes the interrupt process, which registers you have to preserve and which ones you don't, which stack pointer is used, etc. (something you have to set up well ahead of your first exceptions, if you are using an ARM with separate interrupt and foreground stacks).
The Cortex-M is designed to do some/all of that work for you. It has basically one stack, and on interrupt the hardware pushes the call-clobbered registers (r0-r3, r12, lr, the return address, and the flags in xPSR) on the stack for you, so a plain compiled C function can run as a handler and the hardware cleans up after you (from a preserved-register perspective). Some other processor families do something like this for you; they might have a return-from-interrupt instruction separate from the return instruction, which is there because on interrupt the hardware saves the flags and return address, but for a simple call you don't need the flags preserved.
ARM is a lot more flexible than some other instruction sets. Some others may not have any instructions that let you branch to an address in any register you want; you may have a limitation. You might be limited in what register you use as a stack pointer, or the stack pointer itself is not accessible as a general-purpose register. By convention the sp is r13 on ARM; the assembler allows the pseudo-instructions push and pop, which translate into the proper stmdb r13!,{blah} and ldmia r13!,{blah}, but you could pick your own (if not using a compiler that follows the convention, or if you can change an open-source compiler to use a different stack pointer register). ARM doesn't prevent that. The magic of the link register r14 is nothing more than that a branch link or branch link exchange automatically modifies r14, but the instruction set allows you to use basically any register to branch/return for normal function calls. ARM has just enough general-purpose registers to encourage compilers to do register-based parameter passing vs. stack only. Some processors lean toward stack-only parameter passing and have designed their return-address instructions to be strictly stack-based, avoiding a return register altogether along with the rule of having to save it when nesting functions.
All of these approaches have pros and cons. In the case of ARM, register passing was desirable, and a register-based return address too, but for nested functions you have to preserve the return address at each nesting level to not get lost. Likewise for interrupts, you have to put things back the way you found them, as well as be able to get back to where you interrupted the foreground.

Can I identify a "function" in an x86 binary?

"Function" meaning a chunk (or a graph of chunks) of the binary that starts at a point (likely arriving from one of the CALL instructions), possibly sets up a stack frame, and has one or more endpoints in the form of RETs (and depending on the calling convention it may also unwind said stack frame).
My current idea is to treat the various conditional branching instructions as junctions in a graph and do a Breadth-first search on the code this way. Is this viable at all? If not, what's a better approach?
My objective with this is just what it is: extract the functions. Purely for the sake of doing it. Maybe doing something fancy later if I have the time and notion.
You can use a disassembler library like BeaEngine to do the hard work for you and then search on resulting mnemonics for call.
Without a symbol table I would say: almost impossible. At least without false positives/negatives.
What you need first is a disassembler. Just looking for a byte combination won't cut it, the combination might be part of some "random" data. Then, tracing the CALLs is likely the best solution as a function doesn't necessarily always start with the same opcode sequence. But even a disassembler might have a hard time and get confused by embedded data in the text segment.
Even if you were able to find the functions, you cannot get their names without debug symbols (in the compiled program there's no need for names any more, only addresses).
Also, you'd have a very hard time finding out what kind of parameters the function accepts. For example, a function might accept 2 arguments but use neither. In this case you would need to find a call to the function and look at how the stack is prepared in advance of the call.
You have to look for things like:
push ebp
mov ebp, esp
sub esp, ???
...
...
add esp, ???
pop ebp
ret
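As a minimal sketch of that kind of pattern search, the Python below scans raw bytes for one common 32-bit encoding of the prologue, 55 89 E5 (push ebp; mov ebp, esp). The function name `find_prologues` and the sample bytes are made up for the example, and a raw byte scan has exactly the weaknesses mentioned above: it misses functions built without a frame pointer, misses the alternative encoding 8B EC for the same mov, and can match data that merely looks like code.

```python
PROLOGUE = bytes([0x55, 0x89, 0xE5])   # push ebp ; mov ebp, esp


def find_prologues(code):
    """Return every offset in `code` where the prologue byte pattern occurs."""
    hits = []
    start = 0
    while True:
        offset = code.find(PROLOGUE, start)
        if offset < 0:
            return hits
        hits.append(offset)
        start = offset + 1             # keep searching after this match


# Hand-assembled sample: nop, then two tiny functions back to back.
SAMPLE = bytes([
    0x90,                    # nop
    0x55, 0x89, 0xE5,        # push ebp ; mov ebp, esp
    0x83, 0xEC, 0x10,        # sub esp, 0x10
    0xC9, 0xC3,              # leave ; ret
    0x55, 0x89, 0xE5,        # push ebp ; mov ebp, esp
    0x5D, 0xC3,              # pop ebp ; ret
])
```

`find_prologues(SAMPLE)` returns the two candidate entry points at offsets 1 and 9. Treat such hits as candidates to confirm with a disassembler and CALL targets, not as proof that a function starts there.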