What can a negative offset mean in a "load word" instruction in MIPS decompiled code - reverse-engineering

I am reverse engineering a MIPS application written in C. In some places I can see negative offsets in the lw instruction, for example:
80032910 lw $v0, -4($s4)
Positive offsets usually indicate some kind of structure, where one of the members is being accessed, but what code can lead to a negative offset?

For example, it can be generated when you read the value just before a pointer, e.g. when traversing an array from the end to the beginning:
int *myDataEnd;
... code ...
while (*myDataEnd > *(myDataEnd - 1))
    myDataEnd--;
Referencing the integer pointed to by myDataEnd - 1 may generate that instruction, because with 4-byte ints the element before the pointer lives at offset -4.
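Another C pattern that can plausibly produce a load at -4 from a register is a header word stored directly in front of the data a pointer refers to. The sketch below is only an illustration: the names block_new, block_count and block_free, and the layout itself, are hypothetical and not taken from the application in the question.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: one 32-bit count word sits directly before the payload. */
uint32_t *block_new(uint32_t count)
{
    uint32_t *raw = malloc(((size_t)count + 1u) * sizeof *raw);
    if (raw == NULL)
        return NULL;
    raw[0] = count;   /* header word */
    return raw + 1;   /* callers only ever see the payload pointer */
}

/* Reads the word 4 bytes before the payload. On a 32-bit MIPS target a
   compiler would typically emit something like lw $v0, -4($a0) here. */
uint32_t block_count(const uint32_t *payload)
{
    assert(payload != NULL);
    return payload[-1];
}

void block_free(uint32_t *payload)
{
    if (payload != NULL)
        free(payload - 1);   /* step back to the address malloc returned */
}
```

Allocators and length-prefixed strings often use this kind of layout, so a negative offset from a data pointer is worth checking for a hidden header.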

Position-independent code uses $gp as a pointer into the middle of a 64K region of global data. Because load/store offsets are signed 16-bit values, half of that region is reached with negative offsets, so you will often see lw $t0, -nnn($gp) in code. If the depth of a stack frame is unpredictable at compile time, a frame pointer may be used to mark the start of the stack frame, which leads to memory references with negative offsets from the frame pointer.
Hand-optimized code can also use negative offsets to avoid reloading an address into a register.

Related

What exactly is the difference between leaf functions and non leaf functions?

As far as I know, the main difference between leaf functions and non-leaf functions is that leaf functions do not call another function, and non-leaf functions call for other functions.
Therefore, leaf functions do not require things like
begin
push $ra
~ ( some random bits of code) ~
pop $ra
end
But what exactly does it mean to 'call for a function'? It seems like I only know the difference between those two as a definition, and not really understand the whole thing.
A leaf function doesn't use jal, or anything else that would run any code you can't see when writing that function. That's what it means to not call any other functions.
In the tree of function calls your program could or does make (more generally a call graph), it's a leaf node. It has callers but no callees. If you're looking at C source vs. asm, inlining small functions can make non-leaf C functions into asm leaf functions.
Thus it doesn't have to save / restore $ra on the stack, it can just leave its own return address in that register and return to its own caller with jr $ra, without having used $ra for any other return address in the meantime. There will never be another function's stack frame below it on the callstack.
A large function might want to use a lot of registers, so it might actually save/restore $ra, and maybe some of $s0..$s7 just to use them as scratch space. But MIPS has lots of registers, so usually leaf functions can get their work done just fine using only $t0..$t9, $v0..1, $a0..3, and $at without needing to touch stack space. Except maybe for local arrays if it needs more scratch space than registers, or space that can be indexed.
A system call doesn't really count as a call if you do it directly with syscall, so a leaf function can make a system call. (Most real-world systems are like MARS/SPIM in that the kernel saves all registers around syscall, except for the return value.)
But if you call a wrapper function like jal read defined in libc (on a Unix system for example, not in MARS/SPIM) then that's a real function and you have to assume it follows the standard calling convention, like leaving garbage in all of the $t0..9 and $a / $v registers, as well as $ra.
The only exception might be some private helper functions where this function knows which registers they do/don't use, so you could look at a jal helper as just part of the implementation of this leaf function. In that case you would still have to manage $ra, maybe saving it in $t9 or something.
Related: MIPS: relevant use for a stack pointer ($sp) and the stack for an example of a non-leaf function using stack space to save stuff across two calls to unknown functions.
BTW, MIPS doesn't have push and pop instructions. You normally addiu $sp, $sp, -16 or however much stack space you need, and use sw to store into the space reserved. You wouldn't separately sub or add between every load and store.
And end isn't a real thing in MIPS assembly; you need to run an instruction that jumps back to your caller such as jr $ra. Or tailcall some other function, like j foo or b foo, to effectively call it with the return address being the one your caller originally passed.

Does sw and lw in MIPS store a value below or above the stack pointer?

My professor gave a video that looks like this:
In the lower right, he wrote $ra at location 124 while the $sp is at 128 which implies that the first sw $ra, 4($sp) instruction stores the $ra value at a location 4 bytes less than the $sp. But my book does it differently:
and
The images imply that sw stores at addresses larger (more positive) than $sp. So which is right? Do the lw and sw offsets refer to addresses higher or lower than $sp?
You are right in observing that the first factorial is storing above the stack pointer, stack storage that it did not allocate, and must have been allocated by the caller.
This is somewhat non-standard usage, but technically legal, since the MIPS calling convention requires giving the top 4 stack locations of any stack frame to the callee. The function is only allocating a 2-word frame, and according to the calling convention (which allows the callee to use the top 4 words of the frame) it should be allocating minimally a 4-word frame.
Still, since the factorial function calls no function other than itself, this is more or less legal and in keeping with the purpose of the calling convention, which is to ensure that one function can call another.
(Note that in RISC-V, the open-source follow-on to MIPS, there is no such requirement for a 4-word area that the callee may use, so a similar trick would not work there.)
The second example is more traditional; however, it also does not allocate a standard-sized frame, that is, one that gives the top 4 words to the callee. That is not technically necessary here either, and this version is less reliant on the original caller (e.g. main) providing a proper stack frame (one with 4 words given to the callee).
Let's further observe that the first code sample stores $ra and $a0 on the stack, which are registers that we expect to be saved. The latter example stores $s0 (which we would expect to be saved, as the $s registers are dedicated non-volatile registers), but also $t0 and $t1, which seems non-standard as these are dedicated temporaries.
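Whatever the diagrams show, the address arithmetic is fixed: lw and sw access the base register plus the sign-extended 16-bit offset, so a positive offset always lands at a higher address than the current $sp. The sketch below assumes a frame of 8 bytes opened from $sp = 128, which is only one reading of the numbers quoted in the question; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of lw/sw: base register plus sign-extended offset. */
uint32_t effective_address(uint32_t base, int16_t offset)
{
    return base + (uint32_t)(int32_t)offset;
}

/* Models addiu $sp, $sp, -bytes: the stack grows toward lower addresses. */
uint32_t open_frame(uint32_t sp, uint32_t bytes)
{
    assert(bytes % 4u == 0u);   /* keep the stack word-aligned */
    return sp - bytes;
}
```

Under that assumption, open_frame(128, 8) gives 120, and effective_address(120, 4) gives 124: above the new $sp but below the old one. That would make a slot at 124 consistent with positive offsets.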

Does MIPS store offsets for sw and lw before or above the stack pointer?

This is a diagram from a class of mine, and only instruction 0 has executed. In the stack on the lower right of that diagram, it stores $ra at an address 4 less than the $sp, which is at 128.
In my book, however, they use offsets to mean higher than the $sp. Here are the diagrams:
and
What’s strange to me is that in the first image, the lw and sw offsets are positive numbers but cause the values to be stored negative to the $sp at lower addresses.
However, the last two images use offsets to store or load values at higher addresses relative to $sp. The book makes room by subtracting from the $sp first and then storing register values in the freed up stack using positive offset numbers which go to higher addresses.
Are these two inconsistent?

What does the target operand of the MIPS j & beq instructions represent?

Could you please tell me what the target operand of the following two MIPS instructions represents:
j target
beq $t0,$t1,target
Does target represent a displacement in instructions or in bytes?
In assembly, the target is just a label of your source code.
When assembled, j jumps unconditionally to an address formed from the 26-bit field in the instruction multiplied by 4, with the upper 4 bits taken from the address of the instruction following the jump. This works because every instruction occupies 4 bytes and must be word-aligned, so the encoding does not store the two least significant bits of the target address (which are always 00).
The branch instructions perform a relative jump. In machine code, the instruction stores (as a 16-bit two's complement value) the number of words to move, counted from the address of the instruction following the branch.
In your terms, both fields are counted in instructions rather than bytes, but only the branch field is a displacement; the j field is a word address within the current 256 MB region.
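The two decoding rules can be written out as a short sketch. The helper names are hypothetical; the bit layout is the standard MIPS32 J-type and I-type encoding.

```c
#include <assert.h>
#include <stdint.h>

/* j/jal: the 26-bit field is a word index; the upper 4 bits come from the
   address of the instruction after the jump (pc + 4). */
uint32_t jump_target(uint32_t pc, uint32_t insn)
{
    assert((pc & 3u) == 0u);
    uint32_t index = insn & 0x03FFFFFFu;
    return ((pc + 4u) & 0xF0000000u) | (index << 2);
}

/* beq/bne: the 16-bit field is a signed word count relative to pc + 4.
   The cast to int16_t relies on the usual two's complement conversion. */
uint32_t branch_target(uint32_t pc, uint32_t insn)
{
    assert((pc & 3u) == 0u);
    int32_t words = (int16_t)(insn & 0xFFFFu);
    return pc + 4u + (uint32_t)words * 4u;
}
```

For example, a jal at 0x80019B90 encoded as 0x0C00CBAD decodes to 0x80032EB4, and a beq $t0, $t1 at 0x00400000 encoded as 0x11090003 decodes to 0x00400010.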

What can be the cause of "jal" to the middle of another function in MIPS

I am looking at some very suspicious disassembled MIPS code from a C application:
80019B90 jal loc_80032EB4
loc_80032EB4 is in the middle of another function's body. I have specifically checked that no other code is loaded at this address at runtime, and calling that function this way (skipping some code at the beginning) can be useful. But how is it possible to do this in C? It's not a goto, as you can't goto into another function, and a normal function call will always jal to the beginning. Can this be some hand optimization?
Update:
Simplified layout of both functions, callee:
sub_80032E88 (lz77_decode)
... save registers ...
80032E90 addiu $sp, -8
... allocate memory for decompressed data ...
80032EB0 move DECOMPRESSED_DATA_POINTER_A1, $v0
loc_80032EB4:
80032EB4 lw $t7, 0(PACKED_DATA_POINTER_A0)
... actual data decompression ...
80032F4C jr $ra
caller:
80019ACC addiu $sp, -0x30
... some not related code ...
80019B88 lw $a1, off_80018084 // A predefined buffer is used instead of allocating it for decompressed data
80019B90 jal loc_80032EB4
80019B94 move $a0, $s0
... some other code and function epilogue ...
Update 2:
I've checked if this can be a case of setjmp/longjmp usage, but in my tests I can always see calls to setjmp and longjmp functions in disassembled code, not a direct jump.
Update 3:
I've tried using the GCC-specific ability to take label addresses and cast such a pointer to a function pointer. The result is close to what I want, but the disassembled code is still different: instead of using jal with the exact address, it calculates the address at runtime. Maybe I am just unable to force the compiler to see this value as a constant because of scope issues.
Since it is a data decompression function from a game system, it is very likely that this function is hand optimized assembly with multiple entry points. Multiple entry points aren't commonly used, so it is difficult to find a publicly available example, but here is an old thread from the gcc mailing list that suggests a possible use for this technique.
The gist is that if you have two functions where one function F1 has code that is a subset of the other function, F2's code, then the code for F2 can fall through into the code for F1. In your case, F2 allocates memory for the decompressed data, and F1 assumes that the memory allocation has already been done. I'm pretty sure that GCC 2.9x cannot generate code like this.
It is not possible to directly translate this construct from assembler into standard C, because you cannot goto another function in C, but this is perfectly legal in assembler code. The gcc mailing list thread suggests a couple of work-arounds to express the same idea in C.
If you look at the disassembled code for the decompression, it will likely have a different style than compiler-generated code. There may even be some use of opcodes, such as find first set bit, that the compiler cannot generate from C.
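In standard C, the closest equivalent to the two-entry-point layout is probably two ordinary functions, with the allocating one calling the other. The sketch below is an assumption about how that could look and is not necessarily one of the work-arounds from the mailing list thread. The names are hypothetical, and a plain memcpy stands in for the real decompression loop.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* F1: assumes the output buffer already exists (the loc_80032EB4 entry).
   The memcpy is only a placeholder for the real decoder body. */
size_t unpack_into(const unsigned char *src, size_t len, unsigned char *dst)
{
    assert(src != NULL && dst != NULL);
    memcpy(dst, src, len);
    return len;
}

/* F2: allocates the buffer, then continues with F1's code. In hand-written
   assembly this call can become a plain fall-through into F1. */
unsigned char *unpack_alloc(const unsigned char *src, size_t len)
{
    unsigned char *dst = malloc(len ? len : 1);
    if (dst == NULL)
        return NULL;
    unpack_into(src, len, dst);
    return dst;
}
```

A compiler would normally emit two separate functions here, with a jal or a tail jump from unpack_alloc to unpack_into, rather than one body with a second entry label. That difference is consistent with the suggestion that the original was written by hand.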