Is $s necessary when implementing a function that swaps while looping through a loop in MIPS? - mips

void sort (int v[], int n)
{
int i, j;
for (i = 0; i<n; i+=1){
for(j=i-1; j>=0 && v[j]>v[j+1]; j-=1){
swap(v,j);
}
}
}
void swap(int v[], int k)
{
int temp;
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
}
below is MIPS code of swap function
swap : sll $t1, $a1, 2 // reg $t1 = k*4
add $t1, $a0, $t1 // reg $t1 = v+(k*4)
// reg $t1 has the address of v[k]
lw $t0, 0($t1) // reg $t0 (temp) =v[k]
lw $t2, 4($t1) // reg $t2 = v[k+1]
// refers to next element of v
sw $t2, 0($t1) // v[k] = reg $t2
sw $t0, 4($t1) // v[k+1] = reg $t0 (temp)
jr $ra // return to calling routine
I am studying computer architecture. Working through several procedures, we learned that values that must not change across a call are saved either on the stack (using the stack pointer) or in the $s registers.
In the MIPS version of the code above, the parameters v[] ($a0) and n ($a1) of the sort function are copied into $s0 and $s1 (with the help of the stack pointer), and before running the innermost loop, $s0 is copied into $a0 and $s1 into $a1.
However, v[] has to keep the changes made inside the swap function, and j has to change on every iteration of the loop anyway, so it seems there is no need to save and restore their previous values.
Is my explanation correct? That is, when implementing the above code in MIPS, is there no problem with not saving these values in $s registers via the stack pointer?

Is $s necessary when implementing a function that swaps while looping through a loop in MIPS?
The $s registers are never strictly necessary.  What is necessary is that the values of v, n, i, and j survive the function call to swap.  There are two ways to make values survive a function call: keep them in (stack) memory, or keep them in $s registers.  (Both approaches use stack memory, and the same amount of it: the former uses the stack directly for the variable values, while the latter uses it to preserve the $s registers' original values.)
However, $s registers are preferred for variables that are used multiple times (measured dynamically, by instructions executed at runtime, rather than statically, as seen in the code listing) and that must also survive a function call.  (Of course, arrays cannot live in $s registers, but a reference to the array (a pointer variable) can.)
Why do these variables have to survive the function call?  Because they are each consumed (used, sourced) after the call, though having been defined (set, targeted) before the function call.  Programs expect continuity of their variables whether or not function calls happen.
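To make that concrete, here is a minimal C sketch of the two routines, with swap written out to match the three-step exchange in the MIPS listing. The comments mark which variables are read again after the call; those are the ones a MIPS version would have to keep in $s registers or in stack memory. This is only an illustration, not the textbook's exact code.

```c
#include <assert.h>
#include <stddef.h>

/* Exchange v[k] and v[k+1]; mirrors the lw/lw/sw/sw sequence in the MIPS listing. */
void swap(int v[], int k)
{
    int temp = v[k];
    v[k] = v[k + 1];
    v[k + 1] = temp;
}

/* Insertion-style sort. v, n, i and j are all read again after swap()
   returns, so each must live somewhere the call cannot clobber. */
void sort(int v[], int n)
{
    assert(v != NULL || n <= 0);
    for (int i = 0; i < n; i += 1) {
        for (int j = i - 1; j >= 0 && v[j] > v[j + 1]; j -= 1) {
            swap(v, j);
            /* Used after the call: j (j -= 1 and the loop test),
               v (the loop test), i and n (the outer loop). */
        }
    }
}
```

Note that j changing on every iteration does not remove the need to preserve it: the decrement after the call still needs the value j had before the call.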
Let's call out the difference between logical variables used by high level language and pseudo code vs. the physical storage of machine code.
Logical variables have names, types, scope/lifetime (they come and go, some frequently during program execution, as local variables and parameters).  There is an infinite number of possible ones — just create a new name as we like (and can also be unnamed, as with heap data structures).  A program using logical variables has an expectation of continuity — that once set, a variable retains its value until changed by the program (and then retains that new value).
The physical storage of the machine is permanent and global and just groups of bits.  The CPU registers are always present, as is memory (let's gloss over virtualization).  The CPU does not see variables, their declarations, their names, their types or their lifetimes — the CPU sees physical storage.  (How does it work then?  The CPU is told, via machine instructions, how to interact with physical storage; it is so told as needed, every time it runs a machine code instruction.)
Part of the job of translating a high level language program into machine code is managing the mapping of the program's logical variables to the hardware's physical storage.  Since physical storage has constraints, machine code programs must work within them in doing those mappings.
Sometimes a logical variable has to be relocated to alternative physical storage (e.g. from $a register to either $s register or stack memory).  A machine code instruction (or sequence) is usually required to do such relocation, plus, the compiler or assembly programmer has to be aware that the mapping from variable to storage can be different at different points in the same program.
(Sometimes a logical variable's value is actually in more than one physical storage location, as is the case when manipulating global variables that live in memory and copies are brought into CPU registers to work with.  Similar is true for elements of arrays.)

Related

What exactly is the difference between leaf functions and non leaf functions?

As far as I know, the main difference between leaf functions and non-leaf functions is that leaf functions do not call another function, and non-leaf functions call for other functions.
Therefore, leaf functions do not require things like
begin
push $ra
~ ( some random bits of code) ~
pop $ra
end
But what exactly does it mean to 'call for a function'? It seems like I only know the difference between those two as a definition, and not really understand the whole thing.
A leaf function doesn't use jal, or anything else that would run any code you can't see when writing that function. That's what it means to not call any other functions.
In the tree of function calls your program could or does make (more generally a call graph), it's a leaf node. It has callers but no callees. If you're looking at C source vs. asm, inlining small functions can make non-leaf C functions into asm leaf functions.
Thus it doesn't have to save / restore $ra on the stack, it can just leave its own return address in that register and return to its own caller with jr $ra, without having used $ra for any other return address in the meantime. There will never be another function's stack frame below it on the callstack.
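As a rough C-level picture of the distinction (the function names here are made up for illustration): the first function contains no calls, the second contains two. The comments describe what a straightforward, non-inlining MIPS translation would need; an optimizing compiler might inline square and turn the second function into a leaf as well.

```c
#include <assert.h>   /* only needed for the check in the note below */

/* Leaf: no calls inside, so a MIPS version can leave its return
   address in $ra the whole time and finish with jr $ra. */
int square(int x)
{
    return x * x;
}

/* Non-leaf: each call below would be a jal, and jal overwrites $ra.
   A MIPS version must save $ra before the first call and restore it
   before returning. */
int sum_of_squares(int a, int b)
{
    int first = square(a);    /* b and the return address must survive this call */
    int second = square(b);   /* first must survive this call */
    return first + second;
}
```

For example, assert(sum_of_squares(3, 4) == 25) holds.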
A large function might want to use a lot of registers, so it might actually save/restore $ra, and maybe some of $s0..$s7 just to use them as scratch space. But MIPS has lots of registers, so usually leaf functions can get their work done just fine using only $t0..$t9, $v0..1, $a0..3, and $at without needing to touch stack space. Except maybe for local arrays if it needs more scratch space than registers, or space that can be indexed.
A system call doesn't really count as a call if you do it directly with syscall so a leaf function can make a system call. (Most real-world systems are like MARS/SPIM in the fact that the kernel saves all registers around syscall, except for the return value.)
But if you call a wrapper function like jal read defined in libc (on a Unix system for example, not in MARS/SPIM) then that's a real function and you have to assume it follows the standard calling convention, like leaving garbage in all of the $t0..9 and $a / $v registers, as well as $ra.
The only exception might be some private helper functions where this function knows which registers they do/don't use, so you could look at a jal helper as just part of the implementation of this leaf function. In that case you would still have to manage $ra, maybe saving it in $t9 or something.
Related: MIPS: relevant use for a stack pointer ($sp) and the stack for an example of a non-leaf function using stack space to save stuff across two calls to unknown functions.
BTW, MIPS doesn't have push and pop instructions. You normally addiu $sp, $sp, -16 or however much stack space you need, and use sw to store into the space reserved. You wouldn't separately sub or add between every load and store.
And end isn't a real thing in MIPS assembly; you need to run an instruction that jumps back to your caller such as jr $ra. Or tailcall some other function, like j foo or b foo, to effectively call it with the return address being the one your caller originally passed.

array address addition in MIPS and translating to C

I just started learning MIPS and don't have much idea what's going on. The following problem asks to translate the last add instruction to C code :
.data
A: .word 0:16 # in C: int A[16];
.text
la $s6, A # in C: int* s6 = A;
li $t0, 3 # in C: int t0 = 3; // integer array index
sll $t0, $t0, 2 # (In MIPS: $t0 = $t0 << 2) ($t0*4 is the byte offset used in MIPS.)
add $s0, $s6, $t0 # <--- What is the equivalent C code
What I understood was this : the address of array A is stored in register $s6, then the constant 3 denoting an array index is stored in $t0, the sll instruction stores 3*2^2 = 12 into register $t0. Then, the add instruction adds the contents of $s6 and $t0 and stores the sum into $s0.
$s6 + $t0 = address of A + 12 ?? I am not able to make sense of this, please help me? Does it mean it adds 12 to the address of A and stores that into $s0?
In C, we do array indexing as follows:
a[i]
which by definition of the operators in C is equivalent to
*(a+i)
This construct represents pointer arithmetic (pointer addition, with binary + operator), followed by dereference (the unary * operator, applied after the addition).
What you're showing is just the pointer arithmetic without the dereference, so basically:
a+i
C knows which one is the pointer and which one is the index, so even i+a would be the same in C (we can also do *(i+a) and i[a] in C, believe it or not, and these are both equivalent to a[i] and *(a+i), indexing into the array to access an element.)
The fundamental difference here between C and assembly (beyond syntax) is that C knows the data type of the pointer, and so automatically scales the index by the element size and you never see that explicitly done in C — whereas in assembly it must be done explicitly, which is sll (shift of two) being used to multiply the index by 4.
So the addition of the array base pointer and index in C is more or less formally called pointer arithmetic and is equivalent to the shift and add in assembly.  Since C hides the scaling, though, there's no direct equivalent to just the add alone.
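Taking the shift and the add together, the closest C for the question's snippet is pointer addition with index 3. The sketch below assumes 4-byte elements (int32_t is used to make that explicit); the helper names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the address of a[i]. The compiler emits the scaling
   (i * 4, i.e. the sll by 2) and the add for us. */
int32_t *element_address(int32_t *a, int i)
{
    assert(a != NULL);
    return a + i;             /* same address as &a[i] */
}

/* Distance in bytes between two addresses inside the same array. */
long byte_offset(const int32_t *base, const int32_t *p)
{
    return (long)((const char *)p - (const char *)base);
}
```

With int32_t A[16] and int32_t *s6 = A, the call element_address(s6, 3) gives the same pointer as &A[3], which sits 12 bytes past the start of A.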
In C, pointer arithmetic also includes subtraction of pointer and index, with the same scaling automatically applied to the index.
In C, pointer arithmetic also includes subtraction of two pointers, so a-b, if they are both pointers to the same type and into the same array or memory block, will compute the index that represents the difference between the two pointers.  In C, this is also inversely scaled to obtain a regular index.  In assembly we can do the same with subtraction followed by shifting in the opposite direction, namely right.
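A small sketch of pointer subtraction, again assuming 4-byte elements. The first function lets C do the inverse scaling; the second does it by hand the way assembly would, subtracting byte addresses and shifting right by 2. Both function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index difference as C computes it: the byte difference is divided
   by the element size automatically. */
ptrdiff_t index_between(const int32_t *lo, const int32_t *hi)
{
    return hi - lo;
}

/* The same result done manually: subtract the byte addresses, then
   shift right by 2 (divide by 4). Restricted to hi >= lo so that the
   shift is applied to a non-negative value. */
ptrdiff_t index_between_manual(const int32_t *lo, const int32_t *hi)
{
    ptrdiff_t bytes = (const char *)hi - (const char *)lo;
    assert(bytes >= 0);
    return bytes >> 2;
}
```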
Of course there is no addition of pointer to pointer (it doesn't make sense).
Most hardware these days is byte addressable, so we need to use byte addressing both in C and in assembly.  (In languages like Java that don't have pointers we don't see the byte addressing that the language does for you, much like C hides index scaling.)
Any pointer is, in some sense, a pointer to a single byte.  Data items that occupy more than one byte are referred to by the address of their first byte, and this includes arrays, structs, etc.
To do array indexing (into an integer array) we convert the index (0,1,2,3) into a byte offset (0,4,8,12) by scaling (multiplying the index by the pointer's element size).  Then add the byte address (base of the array) to that byte offset, yielding another byte address, but this one refers to the element at the index.
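The same steps can be spelled out in C by dropping down to byte pointers. This is only a sketch assuming 4-byte elements; load_word_at_index is a made-up name, and memcpy stands in for the lw.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Index -> byte offset -> byte address -> load, done explicitly. */
int32_t load_word_at_index(const int32_t *base, int index)
{
    assert(base != NULL && index >= 0);
    size_t byte_offset = (size_t)index << 2;   /* 0,1,2,3 -> 0,4,8,12 */
    const unsigned char *address = (const unsigned char *)base + byte_offset;
    int32_t value;
    memcpy(&value, address, sizeof value);     /* plays the role of lw */
    return value;
}
```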

Does sw and lw in MIPS store a value below or above the stack pointer?

My professor gave a video that looks like this:
In the lower right, he wrote $ra at location 124 while the $sp is at 128 which implies that the first sw $ra, 4($sp) instruction stores the $ra value at a location 4 bytes less than the $sp. But my book does it differently:
and
The image implies that the lw and sw instructions access locations at larger, more positive addresses than $sp. So which is right? Do the lw and sw offsets refer to addresses higher or lower than $sp?
You are right in observing that the first factorial is storing above the stack pointer, stack storage that it did not allocate, and must have been allocated by the caller.
This is somewhat non-standard usage, but technically legal, since the MIPS calling convention requires giving the top 4 stack locations of any stack frame to the callee. The function is only allocating a 2-word frame, and according to the calling convention (which allows the callee to use the top 4 words of the frame) it should be allocating minimally a 4-word frame.
Still, since the factorial function calls no function other than itself, this is arguably legal and in compliance with the calling convention, in the sense that the convention's job is to ensure that one function can call another.
(Note that in RISC-V (the open source MIPS follow-on) this requirement of a 4-word stack area for the callee to use is not present, so something similar would not work there.)
The second example is more traditional; however, it also does not allocate a standard-sized frame, one that gives the top 4 words to the callee. Still, that is not technically necessary here, and this version is less reliant on the original caller (e.g. main) providing a proper stack frame (one with 4 words given to the callee).
Let's further observe that the first code sample stores $ra and $a0 on the stack, which are registers that we expect to be saved — whereas the latter example stores $s0 (which we would expect to be saved as these are dedicated non-volatile), but also $t0 and $t1 which seems non standard as these are dedicated temporaries.
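On the underlying question of direction: the offset in lw/sw is simply added to the base register, so a positive offset always addresses memory above $sp. Whether that memory belongs to the current function depends on whether $sp was decremented first. Here is a tiny C model of that arithmetic; the starting value 128 is taken from the question, and the 8-byte frame and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_TOP 128u                       /* assumed initial $sp */

static uint32_t memory[STACK_TOP / 4 + 4];   /* word-sized backing store */

/* sw value, offset(base): the effective address is base + offset. */
void store_word(uint32_t base, int32_t offset, uint32_t value)
{
    uint32_t address = base + (uint32_t)offset;
    assert(address % 4 == 0 && address / 4 < sizeof memory / sizeof memory[0]);
    memory[address / 4] = value;
}

/* lw from offset(base), same address calculation. */
uint32_t load_word(uint32_t base, int32_t offset)
{
    uint32_t address = base + (uint32_t)offset;
    assert(address % 4 == 0 && address / 4 < sizeof memory / sizeof memory[0]);
    return memory[address / 4];
}

/* Conventional prologue: allocate first, then store at non-negative offsets. */
uint32_t push_frame(uint32_t sp, uint32_t ra, uint32_t a0)
{
    sp -= 8;                   /* addi $sp, $sp, -8   -> $sp = 120 */
    store_word(sp, 4, ra);     /* sw $ra, 4($sp)      -> address 124 */
    store_word(sp, 0, a0);     /* sw $a0, 0($sp)      -> address 120 */
    return sp;
}
```

In this model $ra ends up at 124, which is below the old $sp (128) but above the new one (120).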

What can be the cause of "jal" to the middle of another function in MIPS

I am looking at a very suspicious disassembled MIPS code of a C application
80019B90 jal loc_80032EB4
loc_80032EB4 is in the middle of another function's body. I've specifically checked that no other code is loaded at this address at runtime, and calling that function this way (skipping some code at the beginning) can be useful. But how is it possible to do this in C? It's not a goto, as you can't goto into another function, and a normal function call will always "jal" to the beginning. Can this be some hand optimization?
Update:
Simplified layout of both functions, callee:
sub_80032E88 (lz77_decode)
... save registers ...
80032E90 addiu $sp, -8
... allocate memory for decompressed data ...
80032EB0 move DECOMPRESSED_DATA_POINTER_A1, $v0
loc_80032EB4:
80032EB4 lw $t7, 0(PACKED_DATA_POINTER_A0)
... actual data decompression ...
80032F4C jr $ra
caller:
80019ACC addiu $sp, -0x30
... some not related code ...
80019B88 lw $a1, off_80018084 // A predefined buffer is used instead of allocating it for decompressed data
80019B90 jal loc_80032EB4
80019B94 move $a0, $s0
... some other code and function epilogue ...
Update 2:
I've checked if this can be a case of setjmp/longjmp usage, but in my tests I can always see calls to setjmp and longjmp functions in disassembled code, not a direct jump.
Update 3:
I've tried using the GCC-specific ability to get label pointers and cast this pointer to a function. The result is close to what I want, but the disassembled code is still different: instead of using jal with the exact address, it calculates the address at runtime. Maybe I am just unable to force the compiler to see this value as a constant, because of scope issues.
Since it is a data decompression function from a game system, it is very likely that this function is hand optimized assembly with multiple entry points. Multiple entry points aren't commonly used, so it is difficult to find a publicly available example, but here is an old thread from the gcc mailing list that suggests a possible use for this technique.
The gist is that if you have two functions where one function F1 has code that is a subset of the other function, F2's code, then the code for F2 can fall through into the code for F1. In your case, F2 allocates memory for the decompressed data, and F1 assumes that the memory allocation has already been done. I'm pretty sure that GCC 2.9x cannot generate code like this.
It is not possible to directly translate this construct from assembler into standard C, because you cannot goto another function in C, but this is perfectly legal in assembler code. The gcc mailing list thread suggests a couple of work-arounds to express the same idea in C.
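One common way to get the same effect in standard C (not necessarily what the mailing list thread proposes) is to split the routine in two: the shared tail becomes its own function that takes the output buffer as a parameter, and the outer function allocates and then calls it. In the sketch below both function names are hypothetical and a plain copy stands in for the real decompression, which the question does not show.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the code at the inner entry point: works on a buffer
   supplied by the caller. The memcpy is only a placeholder for the
   actual decompression loop. */
void decode_into(const unsigned char *src, size_t len, unsigned char *dst)
{
    assert(src != NULL && dst != NULL);
    memcpy(dst, src, len);
}

/* Stand-in for the outer entry point: allocates the output buffer,
   then runs the shared tail. The caller frees the result. */
unsigned char *decode_alloc(const unsigned char *src, size_t len)
{
    unsigned char *dst = malloc(len ? len : 1);
    if (dst != NULL)
        decode_into(src, len, dst);
    return dst;
}
```

A caller with a predefined buffer calls decode_into directly, which corresponds to the jal into the middle of the function; everyone else calls decode_alloc. The cost compared with the hand-written version is one extra call instead of a fall-through.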
If you look at the disassembled code for the decompression, it will likely have a different style than compiler-generated code. There may even be some use of opcodes, like find first set bit, that the compiler cannot generate from C.

Microprogramming in MIPS

I am learning about micro programming and am confused as to what a micro-instruction actually is. I am using the MIPS architecture. My questions are as follows
Say for example I have the ADD instruction, what would the micro-instructions look like for this? How many micro-instructions are there for the add instruction. Is there somewhere online I can see the list of micro-instructions for the basic instructions of MIPS?
How can I figure out the bit string for an ADD microprogrammed instruction?
Microprogramming is a method of implementing a complex instruction set architecture (such as x86) in terms of simpler "micro instructions". MIPS is a RISC instruction set architecture and is not typically implemented using micro-programming, so there are ZERO microinstructions for the ADD instruction.
To answer your specific question one would have to know what the definition of your particular micro-architecture is.
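If your course does define a microprogrammed (multi-cycle) implementation, a microinstruction is typically just the set of control signals asserted in one clock cycle, and the "bit string" is those signals packed in some fixed order. The sketch below is entirely hypothetical: the signal names, their order and the four-step breakdown are loosely modeled on textbook multi-cycle datapaths and will not match your particular micro-architecture.

```c
#include <assert.h>
#include <stdint.h>

/* One microinstruction = the control signals for one clock cycle.
   Every field here is an assumption made for illustration. */
typedef struct {
    unsigned mem_read;    /* 1 bit  */
    unsigned ir_write;    /* 1 bit  */
    unsigned pc_write;    /* 1 bit  */
    unsigned alu_src_a;   /* 1 bit: 0 = PC, 1 = register rs */
    unsigned alu_src_b;   /* 2 bits: 0 = register rt, 1 = constant 4 */
    unsigned alu_op;      /* 2 bits: 0 = add, 2 = use funct field */
    unsigned reg_dst;     /* 1 bit: 1 = write to rd */
    unsigned reg_write;   /* 1 bit  */
} MicroInstr;

enum { ADD_STEPS = 4 };

/* Hypothetical microprogram for an R-type add. */
static const MicroInstr add_microprogram[ADD_STEPS] = {
    /* fetch: IR = Mem[PC]; PC = PC + 4 */
    { .mem_read = 1, .ir_write = 1, .pc_write = 1, .alu_src_b = 1 },
    /* decode: read rs and rt, nothing asserted in this model */
    { 0 },
    /* execute: ALU result = rs + rt */
    { .alu_src_a = 1, .alu_op = 2 },
    /* write back: rd = ALU result */
    { .reg_dst = 1, .reg_write = 1 },
};

/* Pack the fields into a 10-bit string, most significant field first. */
uint16_t encode(MicroInstr m)
{
    return (uint16_t)(m.mem_read << 9 | m.ir_write << 8 | m.pc_write << 7 |
                      m.alu_src_a << 6 | m.alu_src_b << 4 | m.alu_op << 2 |
                      m.reg_dst << 1 | m.reg_write);
}
```

With this made-up layout the write-back step encodes as 0x003 and the fetch step as 0x390; a real design would have its own signals and widths.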
This is an example of how to load the EPC into one of the registers and add 4 bytes to it:
lw t0, 20(sp) // Load EPC
addi t0, t0, 4 // Add 4 to the return address
sw t0, 20(sp) // Save EPC
There are a lot of instructions that you can use; you can see the MIPS Instruction Set here. In my humble opinion, MIPS is really neat and easy to learn! A fun fact is that the first PlayStation used a MIPS CPU.
Example instructions
lw = load word
la = load address
sw = store word
addi = add immediate
Then you have a lot of conditional instructions such as:
bne = branch not equal
bnez = branch not equal zero
And with these you use j to jump to an address.
Here is an example from an Exception Handler that I wrote once for MIPS, this is the External Source handler:
External:
mfc0 t0, C0_CAUSE // We could as well use 24(sp) to load CAUSE
and t0, t0, 0x02000 // Mask the CAUSE
bnez t0, Puls // If the masked value is
// not equal to zero, jump to Puls
j DisMiss // Else jump to DisMiss
In the above example I define a label called External that I can jump to, just as I jump to DisMiss. To loop, you generally jump back to your own label.
There are some other instructions used here as well:
mfc0 = move from co-processor 0
To handle labels, I would suggest you check this question/answer out.
Here's a couple of resources on MicroProgramming with MIPS:
Some general information
Here is a somewhat heavier PowerPoint presentation on the subject from Princeton ( PDF )
Here is a paper from another university which is one of the best of these three ( PDF ).