What could cause a "jal" into the middle of another function in MIPS?

I am looking at some very suspicious disassembled MIPS code from a C application:
80019B90 jal loc_80032EB4
loc_80032EB4 is in the middle of another function's body. I've specifically checked that no other code is loaded at this address at runtime, and calling that function this way (skipping some code at its beginning) can be useful. But how is this possible in C? It's not a goto, since you can't goto into another function, and a normal function call will always "jal" to the beginning. Could this be some hand optimization?
Update:
Simplified layout of both functions, callee:
sub_80032E88 (lz77_decode)
... save registers ...
80032E90 addiu $sp, -8
... allocate memory for decompressed data ...
80032EB0 move DECOMPRESSED_DATA_POINTER_A1, $v0
loc_80032EB4:
80032EB4 lw $t7, 0(PACKED_DATA_POINTER_A0)
... actual data decompression ...
80032F4C jr $ra
caller:
80019ACC addiu $sp, -0x30
... some not related code ...
80019B88 lw $a1, off_80018084 // A predefined buffer is used instead of allocating it for decompressed data
80019B90 jal loc_80032EB4
80019B94 move $a0, $s0
... some other code and function epilogue ...
Update 2:
I've checked whether this could be a case of setjmp/longjmp usage, but in my tests I always see calls to the setjmp and longjmp functions in the disassembled code, not a direct jump.
Update 3:
I've tried using GCC's labels-as-values extension to get a label's address and casting that pointer to a function pointer. The result is close to what I want, but the disassembled code is still different: instead of a jal with the exact address, it calculates the address at runtime. Maybe I just can't make the compiler treat this value as a constant because of scope issues.

Since it is a data decompression function from a game system, it is very likely hand-optimized assembly with multiple entry points. Multiple entry points aren't commonly used, so it is difficult to find a publicly available example, but here is an old thread from the gcc mailing list that suggests a possible use for this technique.
The gist is that if you have two functions where the code of one function, F1, is a subset of the code of the other, F2, then the code for F2 can fall through into the code for F1. In your case, F2 allocates memory for the decompressed data, and F1 assumes that the memory has already been allocated. I'm pretty sure that GCC 2.9x cannot generate code like this.
It is not possible to directly translate this construct from assembler into standard C, because you cannot goto another function in C, but this is perfectly legal in assembler code. The gcc mailing list thread suggests a couple of work-arounds to express the same idea in C.
If you look at the disassembled code for the decompression, it will likely have a different style from compiler-generated code. There may even be some use of opcodes, such as find-first-set-bit, that the compiler cannot generate from C.

Related

What exactly is the difference between leaf functions and non leaf functions?

As far as I know, the main difference between leaf functions and non-leaf functions is that leaf functions do not call other functions, while non-leaf functions do.
Therefore, leaf functions do not require things like
begin
push $ra
~ ( some random bits of code) ~
pop $ra
end
But what exactly does it mean to 'call a function'? It seems I only know the difference between the two as a definition and don't really understand the whole thing.
A leaf function doesn't use jal, or anything else that would run any code you can't see when writing that function. That's what it means to not call any other functions.
In the tree of function calls your program could or does make (more generally a call graph), it's a leaf node. It has callers but no callees. If you're looking at C source vs. asm, inlining small functions can make non-leaf C functions into asm leaf functions.
Thus it doesn't have to save / restore $ra on the stack, it can just leave its own return address in that register and return to its own caller with jr $ra, without having used $ra for any other return address in the meantime. There will never be another function's stack frame below it on the callstack.
A large function might want to use a lot of registers, so it might actually save/restore $ra, and maybe some of $s0..$s7, just to use them as scratch space. But MIPS has lots of registers, so leaf functions can usually get their work done using only $t0..$t9, $v0..$v1, $a0..$a3, and $at, without touching stack space, except perhaps for local arrays when they need more scratch space than the registers provide, or space that can be indexed.
A system call doesn't really count as a call if you make it directly with syscall, so a leaf function can make a system call. (Most real-world systems are like MARS/SPIM in that the kernel saves all registers around syscall, except for the return value.)
But if you call a wrapper function like jal read defined in libc (on a Unix system for example, not in MARS/SPIM) then that's a real function and you have to assume it follows the standard calling convention, like leaving garbage in all of the $t0..9 and $a / $v registers, as well as $ra.
The only exception might be some private helper functions where this function knows which registers they do/don't use, so you could look at a jal helper as just part of the implementation of this leaf function. In that case you would still have to manage $ra, maybe saving it in $t9 or something.
Related: MIPS: relevant use for a stack pointer ($sp) and the stack for an example of a non-leaf function using stack space to save stuff across two calls to unknown functions.
BTW, MIPS doesn't have push and pop instructions. You normally addiu $sp, $sp, -16 or however much stack space you need, and use sw to store into the space reserved. You wouldn't separately sub or add between every load and store.
And end isn't a real thing in MIPS assembly; you need to run an instruction that jumps back to your caller such as jr $ra. Or tailcall some other function, like j foo or b foo, to effectively call it with the return address being the one your caller originally passed.

MIPS: legal to have two consecutive "load word" instructions into the same register?

Background: We're seeing a very intermittent crash in a function foo(int *p). The crash occurs while dereferencing p, whose value in these cases turns out to be 0xffffffff. An analysis of the core dump shows that foo() is called from the following assembly snippet:
bne ... somewhere else
lw $a0,44(sp)
lw $a0,40(sp)
jal foo()
lui s1, 0x1000
Inspecting memory in the core dump shows that 44(sp) is 0xffffffff, whereas 40(sp) is the correct value we intend to dereference. However, the value of a0 at the time of the crash, inside foo(), is 0xffffffff. (It's important to note that foo() in this case is just accessing a member; so it's literally the first instruction in foo() which is already attempting to access via a0, and crashing. Also, ra is pointing to the instruction following the above snippet, and s1 currently contains 0x10000000, so we're quite confident that foo() was, indeed, called from the above snippet.)
Our only theory at the moment is that the two consecutive lws into a0 are a hazard -- either a documented one, in which case this looks like a compiler bug; or an undocumented one.
So: is the above assembly legal? If it is, any other ideas about what could be going on here?
Thanks!
UPDATE: Well, turns out this was all a wild goose chase: a repeat analysis of the coredump by a colleague turned up a path in the code which I had missed, where there was a jump directly to the jal foo() instruction, immediately after having set a0 to 44(sp). In other words, there is a path in the code which is consistent with the result we're seeing that does not involve hazards, or "skipped instructions" or anything... I thought I checked this, but I guess I either didn't, or missed it... :(
Anyway, I've accepted markgz's answer, since it answers my original question about the legality of these instructions (apparently they are).
A quick search of the MIPS documentation for the MIPS32R2 ISA doesn't show any restrictions on LW after LW instructions.
There might be a bug in the MIPS implementation in your CPU. Things to look at include:
What addresses do 44(sp) and 40(sp) refer to? Are they on a page boundary, a 256 MB boundary, or some other interesting address?
Do either of the loads trigger a page fault?
Does patching the binary to insert a NOP, SSNOP, or a SYNC instruction between the loads make the problem go away?

How to get the current PC register value on MIPS?

I'd like to do backtrace on MIPS.
The problem is how to get the current PC register value, since the PC isn't one of the 32 general-purpose registers.
Thanks for your suggestions.
Make a subroutine that looks somewhat like:
.text
.globl GetIP
GetIP:
move $v0, $ra
jr $ra
Then call the routine; it returns its own return address, which is the address of the instruction after the jal's branch delay slot (the address of the jal + 8).
After a jal, the return address is written to $ra,
so you could save $ra, jal to the next line, read $ra, and then restore $ra.
Although this question isn't tagged c, I figured it might be useful to share a solution utilizing inline assembly in gcc.
__attribute__((noinline)) static void *get_pc(void)
{
    void *pc;
    /* Not inlined, so $ra still holds the return address set by the caller's jal/jalr. */
    asm volatile ("move %0, $ra" : "=r"(pc));
    return pc;
}
Of course, the gist of the solution is the same as in the currently accepted answer. Since the function is very small, it is a good candidate for inlining when optimization is turned on. However, if the function were inlined, its return value would be invalid: it would simply return whatever value $ra holds in the calling function, since no jal or jalr would be generated and $ra would therefore not be set by a call. This is why __attribute__((noinline)) is essential here.

MIPS - JAL confusion: $ra = PC+4 or PC+8?

I'm having trouble understanding how the instruction jal works in the MIPS processor.
My two questions are:
a) What is the value stored in R31 after "jal": PC+4 or PC+8?
b) If it's really PC+8, what happens to the instruction at PC+4? Is it executed before the jump or is it never executed?
In Patterson and Hennessy (fourth edition), pg 113:
"jump-and-link instruction: An instruction that jumps to and address and simultaneously saves the address of the following instruction in a register ($ra in MIPS)"
"program counter (PC): The register containing the address of the instruction in the program being executed"
After reading those two statements, it follows that the value saved in $ra should be (PC+4).
However, in the MIPS reference data (green card) that comes with the book, the jal instruction's algorithm is defined like this:
"Jump and Link : jal : J : R[31]=PC+8;PC=JumpAddr"
This website also states that "it's really PC+8", but strangely, after that it says that since pipelining is an advanced topic "we'll assume the return address is PC+4".
I come from 8086 assembly, so I'm aware that there's a big difference between returning to an address and to the one following it, because programs won't work if I just assume something that's not true. Thanks.
The address in $ra is really PC+8. The instruction immediately following the jal instruction is in the "branch delay slot". It is executed before the function is entered, so it shouldn't be re-executed when the function returns.
Other branch instructions on MIPS also have branch delay slots.
The delay slot is used to do something useful in the time it takes to execute the jal instruction.
I had the same question. I found Richard's excellent answer while searching, and I'd like to add another link here:
http://chortle.ccsu.edu/AssemblyTutorial/Chapter-26/ass26_4.html
It nicely explains why 4 is effectively added to the PC twice.
So the actual execution involves two additions: 1) newPC = PC+4 from pipelining, and 2) $ra = newPC+4 from the jal instruction, resulting in an effective $ra = (address of the jal instruction) + 8.

Microprogramming in MIPS

I am learning about microprogramming and am confused as to what a micro-instruction actually is. I am using the MIPS architecture. My questions are as follows:
Say, for example, I have the ADD instruction: what would the micro-instructions look like for it? How many micro-instructions are there for the ADD instruction? Is there somewhere online I can see the list of micro-instructions for the basic MIPS instructions?
How can I figure out the bit string for an ADD microprogrammed instruction?
Microprogramming is a method of implementing a complex instruction set architecture (such as x86) in terms of simpler "micro instructions". MIPS is a RISC instruction set architecture and is not typically implemented using micro-programming, so there are ZERO microinstructions for the ADD instruction.
To answer your specific question one would have to know what the definition of your particular micro-architecture is.
This is an example of how to load the EPC into one of the registers and add 4 bytes to it:
lw t0, 20(sp) // Load EPC
addi t0, 4 // Add 4 to the return address
sw t0, 20(sp) // Save EPC
There are "a lot" of instructions that you can use; you can see the MIPS instruction set here. In my humble opinion, MIPS is really neat and easy to learn! A fun fact is that the original PlayStation used a MIPS CPU.
Example instructions
lw = load word
la = load address
sw = store word
addi = add immediate
Then you have a lot of conditional instructions, such as:
bne = branch if not equal
bnez = branch if not equal to zero
And you use j to jump unconditionally to an address.
Here is an example from an exception handler that I once wrote for MIPS; this is the handler for external sources:
External:
mfc0 t0, C0_CAUSE // We could just as well load CAUSE from 24(sp)
and t0, t0, 0x02000 // Mask the CAUSE
bnez t0, Puls // If the masked bit is set (not equal zero),
// jump to Puls
j DisMiss // Else jump to DisMiss
In the example above, External is a label, an entry point that other code can jump to, just like DisMiss. To make a loop, you generally jump back to a label of your own.
Some other instructions are used here as well:
mfc0 = move from co-processor 0
To handle labels, I would suggest you check this question/answer out.
Here's a couple of resources on MicroProgramming with MIPS:
Some general information
Here is a more in-depth PowerPoint presentation on the subject from Princeton (PDF).
Here is a paper from another university, which is the best of these three (PDF).