Does sw and lw in MIPS store a value below or above the stack pointer? - mips

My professor gave a video that looks like this:
In the lower right, he wrote $ra at location 124 while the $sp is at 128 which implies that the first sw $ra, 4($sp) instruction stores the $ra value at a location 4 bytes less than the $sp. But my book does it differently:
and
The image implies that the lw instruction stores it at locations larger, more positive numbers than the $sp. So which is right? Does lw and sw offset numbers refer to numbers higher or lower than the $sp?

You are right in observing that the first factorial is storing above the stack pointer, stack storage that it did not allocate, and must have been allocated by the caller.
This is somewhat non-standard usage, but technically legal, since the MIPS calling convention requires giving the top 4 stack locations of any stack frame to the callee. The function is only allocating a 2-word frame, and according to the calling convention (which allows the callee to use the top 4 words of the frame) it should be allocating minimally a 4-word frame.
Still, since the factorial function calls no other except itself, this is ~legal, and in compliance with the calling convention — in the sense that its job is to ensure that one function can call another.
(Note that in RISC V (the open source MIPS follow-on) this requirement of 4-words stack frame for callee to use is not present so similar would not work there.)
The second example is more traditional, however, it also does not allocate a standard sized frame — one that gives the top 4-words to the callee. Still it is also not technically necessary, and less reliant on the original caller (e.g. main) providing a proper stack frame (one with 4 words given to the callee).
Let's further observe that the first code sample stores $ra and $a0 on the stack, which are registers that we expect to be saved — whereas the latter example stores $s0 (which we would expect to be saved as these are dedicated non-volatile), but also $t0 and $t1 which seems non standard as these are dedicated temporaries.

Related

What exactly is the difference between leaf functions and non leaf functions?

As far as I know, the main difference between leaf functions and non-leaf functions is that leaf functions do not call another function, and non-leaf functions call for other functions.
Therefore, leaf functions do not require things like
begin
push $ra
~ ( some random bits of code) ~
pop $ra
end
But what exactly does it mean to 'call for a function'? It seems like I only know the difference between those two as a definition, and not really understand the whole thing.
A leaf function doesn't use jal, or anything else that would run any code you can't see when writing that function. That's what it means to not call any other functions.
In the tree of function calls your program could or does make (more generally a call graph), it's a leaf node. It has callers but no callees. If you're looking at C source vs. asm, inlining small functions can make non-leaf C functions into asm leaf functions.
Thus it doesn't have to save / restore $ra on the stack, it can just leave its own return address in that register and return to its own caller with jr $ra, without having used $ra for any other return address in the meantime. There will never be another function's stack frame below it on the callstack.
A large function might want to use a lot of registers, so it might actually save/restore $ra, and maybe some of $s0..$s7 just to use them as scratch space. But MIPS has lots of registers, so usually leaf functions can get their work done just fine using only $t0..$t9, $v0..1, $a0..3, and $at without needing to touch stack space. Except maybe for local arrays if it needs more scratch space than registers, or space that can be indexed.
A system call doesn't really count as a call if you do it directly with syscall so a leaf function can make a system call. (Most real-world systems are like MARS/SPIM in the fact that the kernel saves all registers around syscall, except for the return value.)
But if you call a wrapper function like jal read defined in libc (on a Unix system for example, not in MARS/SPIM) then that's a real function and you have to assume it follows the standard calling convention, like leaving garbage in all of the $t0..9 and $a / $v registers, as well as $ra.
The only exception might be some private helper functions where this function knows which registers they do/don't use, so you could look at a jal helper as just part of the implementation of this leaf function. In that case you would still have to manage $ra, maybe saving it in $t9 or something.
Related: MIPS: relevant use for a stack pointer ($sp) and the stack for an example of a non-leaf function using stack space to save stuff across two calls to unknown functions.
BTW, MIPS doesn't have push and pop instructions. You normally addiu $sp, $sp, -16 or however much stack space you need, and use sw to store into the space reserved. You wouldn't separately sub or add between every load and store.
And end isn't a real thing in MIPS assembly; you need to run an instruction that jumps back to your caller such as jr $ra. Or tailcall some other function, like j foo or b foo, to effectively call it with the return address being the one your caller originally passed.

Does MIPS store offsets for sw and lw before or above the stack pointer?

This is a diagram in a class of mine and only instruction 0 has executed. In the stack on the lower right of that digram, it stores $ra at an address 4 less than the $sp which is at 128.
In my book, however, they use offsets to mean higher than the $sp. here are the diagrams:
and
What’s strange to me is that in the first image, the lw and sw offsets are positive numbers but cause the values to be stored negative to the $sp at lower addresses.
However, the last two images use offsets to store or load values at higher addresses relative to $sp. The book makes room by subtracting from the $sp first and then storing register values in the freed up stack using positive offset numbers which go to higher addresses.
Are these two inconsistent?

Implementing jump register control to single-cycle MIPS

I am trying to implement jr (jump register) instruction support to a single-cycle MIPS processor. In the following image, I've drawn a simple mux that allows selecting between the normal chain PC or the instruction (jr) address.
How can I know that the instruction is JR to set the mux selection to '1'? I've already done jump and jump_and_link (although the image doesn't show it, as I don't have my project in hands right now), and to control them, I just check if the OP code is 10 (jump) or 11 (jal) in the main control and then set the mux sel to '1'. But I think I can't do the same with jr, as the instruction layout is distinct.
The opcode of a JR instruction has Instruction[31:26] == 0 (special) and Instruction[5:0] == 0x08 (JR). You need to look at both of these bit positions to decide that this is a JR instruction. The Control block on your diagram needs to have an additional input of Instruction[5:0]. The rs field in Instruction[25:21] selects the source register for this instruction. The PC needs to be assigned to rs when a JR instruction is executed.
I think you can improve the performance of the hardware by implementing the JR mux before the Jump mux, since the JR mux is not dependent on the pcnext of the output of the Jump sel mux.

What can a negative offset means in "load word" instruction in MIPS decompiled code

I am reverse engineering a C MIPS application, in some places I can see negative offsets in lw opcode, like:
80032910 lw $v0, -4($s4)
Positive offsets usually indicate some kind of structure, where one of the members is being accessed, but what code can lead to a negative offset?
For example, it can be generated if you read the previous value of a pointer, e.g. traversing an array from the end to the beginning
int *myDataEnd;
... code ...
while(*myDataEnd > *(myDataEnd-1))
myDataEnd--;
Referencing the integer pointed by myDataEnd-1 may generate that instruction.
Position independent code uses $gp as a pointer into the middle of a 64K region of global data, so you will often see lw $t0, -nnn($gp) in code. If the depth of a stack frame is unpredictable at compile-time, a frame pointer may be used to mark the start of the stack frame, causing there to be memory references with negative offsets to the frame pointer.
Hand optimized code can also use negative offsets to save reloading an address into a register.

MIPS - JAL confusion: $ra = PC+4 or PC+8?

I'm having trouble understanding how the instruction jal works in the MIPS processor.
My two questions are:
a) What is the value stored in R31 after "jal": PC+4 or PC+8?
b) If it's really PC+8, what happens to the instruction at PC+4? Is it executed before the jump or is it never executed?
In Patterson and Hennessy (fourth edition), pg 113:
"jump-and-link instruction: An instruction that jumps to and address and simultaneously saves the address of the following instruction in a register ($ra in MIPS)"
"program counter (PC): The register containing the address of the instruction in the program being executed"
After reading those two statements, it follows that the value saved in $ra should be (PC+4).
However, in the MIPS reference data (green card) that comes with the book, the jal instruction's algorithm is defined like this:
"Jump and Link : jal : J : R[31]=PC+8;PC=JumpAddr"
This website also states that "it's really PC+8", but strangely, after that it says that since pipelining is an advanced topic "we'll assume the return address is PC+4".
I come from 8086 assembly, so I'm aware that there's a big difference between returning to an address and to the one following it, because programs won't work if I just assume something that's not true. Thanks.
The address in $ra is really PC+8. The instruction immediately following the jal instruction is in the "branch delay slot". It is executed before the function is entered, so it shouldn't be re-executed when the function returns.
Other branching instructions on the Mips also have branch delay slots.
The delay slot is used to do something useful in the time it takes to execute the jal instruction.
I got the same question. Googled this excellent answer of Richard and also another link I wish to add here.
The link is http://chortle.ccsu.edu/AssemblyTutorial/Chapter-26/ass26_4.html
with this wonderful explanation of double adding 4 to the PC.
So the actual execution has two additions: 1) newPC=PC+4 by pipelining and 2) another addition $ra=newPC+4 by the jal instruction resulting the effective $ra = (address of the jal instruction)+8.