I just started learning MIPS and don't have much idea what's going on yet. The following problem asks me to translate the last add instruction into C code:
.data
A: .word 0:16 # in C: int A[16];
.text
la $s6, A # in C: int* s6 = A;
li $t0, 3 # in C: int t0 = 3; // integer array index
sll $t0, $t0, 2 # (In MIPS: $t0 = $t0 << 2) ($t0*4 is the byte offset used in MIPS.)
add $s0, $s6, $t0 # <--- What is the equivalent C code
What I understood is this: the address of array A is stored in register $s6, then the constant 3, denoting an array index, is stored in $t0, and the sll instruction stores 3*2^2 = 12 into register $t0. Then the add instruction adds the contents of $s6 and $t0 and stores the sum into $s0.
So $s6 + $t0 = address of A + 12? I can't make sense of this; can someone help? Does it mean it adds 12 to the address of A and stores the result in $s0?
In C, we do array indexing as follows:
a[i]
which by definition of the operators in C is equivalent to
*(a+i)
This construct represents pointer arithmetic (pointer addition, with binary + operator), followed by dereference (the unary * operator, applied after the addition).
What you're showing is just the pointer arithmetic without the dereference, so basically:
a+i
C knows which one is the pointer and which one is the index, so even i+a would be the same in C (we can also do *(i+a) and i[a] in C, believe it or not, and these are both equivalent to a[i] and *(a+i), indexing into the array to access an element.)
The fundamental difference here between C and assembly (beyond syntax) is that C knows the data type of the pointer, and so it automatically scales the index by the element size; you never see that done explicitly in C. In assembly it must be done explicitly, which is why sll (a left shift by two) is used here to multiply the index by 4.
So the addition of the array base pointer and the index is, more or less formally, called pointer arithmetic in C, and it is equivalent to the shift and the add together in assembly. Since C hides the scaling, though, there's no direct equivalent of the add alone.
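If you do want C that mirrors the four MIPS instructions one by one, including the add on its own, one way is to do the scaling yourself and add the byte offset through a `char *`, which C does not scale. This is a minimal sketch, not the expected answer to the exercise: the function name `mips_style_address` is hypothetical, and it assumes a 4-byte `int` as on MIPS32 (so the shift by 2 multiplies by the element size).

```c
#include <stddef.h>

int A[16];  /* .data  A: .word 0:16 */

_Static_assert(sizeof(int) == 4, "this sketch assumes 4-byte int, as on MIPS32");

/* Mirrors: la $s6, A / li $t0, 3 / sll $t0, $t0, 2 / add $s0, $s6, $t0 */
int *mips_style_address(void)
{
    int *s6 = A;     /* la  $s6, A       : base address of the array        */
    size_t t0 = 3;   /* li  $t0, 3       : array index                      */
    t0 = t0 << 2;    /* sll $t0, $t0, 2  : index * 4 = byte offset 12       */
    /* add $s0, $s6, $t0 : a plain byte addition, so do it on a char pointer */
    int *s0 = (int *)((char *)s6 + t0);
    return s0;       /* same address as &A[3], i.e. A + 3 */
}
```

In ordinary C you would just write `A + 3` (or `&A[3]`) and let the compiler emit the shift and the add for you.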
In C, pointer arithmetic also includes subtraction of pointer and index, with the same scaling automatically applied to the index.
In C, pointer arithmetic also includes subtraction of two pointers: a-b, if they are both pointers to the same type and into the same array or memory block, computes the index that represents the difference between the two pointers. In C, the byte difference is scaled back down (divided by the element size) to obtain a regular index. In assembly we can do the same with a subtraction followed by a shift in the opposite direction, namely right.
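A small sketch comparing C's pointer subtraction with doing it "by hand" on byte addresses, the way assembly would. Both function names are hypothetical; the hand-written version divides by `sizeof(int)`, which for 4-byte ints corresponds to an arithmetic right shift by 2 (sra, so a negative difference keeps its sign).

```c
#include <stddef.h>

/* The C way: the compiler divides the byte difference by sizeof(int) for us. */
ptrdiff_t element_distance(const int *from, const int *to)
{
    return to - from;
}

/* The assembly way: subtract the byte addresses, then scale back down by
   the element size (a right shift by 2 when ints are 4 bytes). */
ptrdiff_t element_distance_by_bytes(const int *from, const int *to)
{
    ptrdiff_t bytes = (const char *)to - (const char *)from;
    return bytes / (ptrdiff_t)sizeof(int);
}
```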
Of course there is no addition of pointer to pointer (it doesn't make sense).
Most hardware these days is byte addressable, so we need to use byte addressing both in C and in assembly. (In languages like Java, which don't have pointers, we don't see the byte addressing the language does for us, much as C hides index scaling.)
Any pointer is, in some sense, a pointer to a single byte. Data items that occupy more than one byte are referred to by the address of their first byte, and this includes arrays, structs, etc.
To do array indexing (into an integer array) we convert the index (0,1,2,3) into a byte offset (0,4,8,12) by scaling (multiplying the index by the pointer's element size). Then add the byte address (base of the array) to that byte offset, yielding another byte address, but this one refers to the element at the index.
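The same recipe works for any element type; only the multiplier changes (4 for a 4-byte int, which is the sll by 2 above; typically 8 for a double, which would be a shift by 3). A minimal sketch with a hypothetical helper `element_address` that performs explicitly what C does implicitly for `base[index]`:

```c
#include <stddef.h>

/* Byte address of element `index` in an array starting at `base` whose
   elements are `elem_size` bytes each: base + index * elem_size. */
const void *element_address(const void *base, size_t index, size_t elem_size)
{
    return (const char *)base + index * elem_size;
}
```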
What happens in the receiving register if I do the following operation
sw $t3,0(t2) ? where t3 and t2 are two already known registers.
Does the value of the t2 register get used during the operation, or does the compiler just use the default answer for any multiplication by zero, which is 0?
Thanks!
You are missing the '$' before 't2'. For sw and other similar instructions, the number outside the parentheses is an offset that is added to the register inside them (it is not multiplied), so the value of $t2 is always used: it is the base address. For example, if the base address of an array is stored in $t2, sw $t3, 0($t2) stores the value in $t3 at offset 0 from that address (index 0 of the array). sw writes to memory, so no register receives anything. Since MIPS words are 32 bits (4 bytes), multiply the index of the array element you want to access by 4 to get the correct offset.
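To make the "offset is added, not multiplied" point concrete, here is a toy C model of sw and lw over a small byte-addressed memory. Everything here (the `memory` array, and the functions `sw` and `lw`) is hypothetical and only illustrates the semantics. It skips the word-alignment check real MIPS performs (a misaligned address raises an exception), does no bounds checking, and stores bytes in the host's byte order.

```c
#include <stdint.h>
#include <string.h>

static uint8_t memory[64];  /* hypothetical tiny byte-addressed memory */

/* Model of "sw rt, offset(base)": effective address = base + offset
   (offset is sign-extended), then 4 bytes of rt are written there.
   Only memory changes; no register is written. */
void sw(uint32_t rt, int16_t offset, uint32_t base)
{
    uint32_t addr = base + (uint32_t)(int32_t)offset;
    memcpy(&memory[addr], &rt, sizeof rt);
}

/* Model of "lw rt, offset(base)": same address computation, then a 4-byte read. */
uint32_t lw(int16_t offset, uint32_t base)
{
    uint32_t addr = base + (uint32_t)(int32_t)offset;
    uint32_t value;
    memcpy(&value, &memory[addr], sizeof value);
    return value;
}
```

For example, with $t2 = 16, `sw(t3, 0, 16)` writes to bytes 16..19, and `sw(x, 4, 16)` (array index 1) writes to bytes 20..23.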
void swap(int v[], int k);  /* declared before use; defined below */
void sort (int v[], int n)
{
int i, j;
for (i = 0; i<n; i+=1){
for (j = i - 1; j >= 0 && v[j] > v[j+1]; j -= 1) {
swap(v,j);
}
}
}
void swap(int v[], int k)
{
int temp;
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
}
Below is the MIPS code for the swap function:
swap: sll $t1, $a1, 2      # reg $t1 = k * 4   (v is in $a0, k is in $a1)
      add $t1, $a0, $t1    # reg $t1 = v + (k * 4)
                           # reg $t1 has the address of v[k]
      lw  $t0, 0($t1)      # reg $t0 (temp) = v[k]
      lw  $t2, 4($t1)      # reg $t2 = v[k+1]
                           # refers to next element of v
      sw  $t2, 0($t1)      # v[k] = reg $t2
      sw  $t0, 4($t1)      # v[k+1] = reg $t0 (temp)
      jr  $ra              # return to calling routine
I am studying computer architecture. Through several procedures, we learned that values that must not be changed by a call are either saved on the stack using the stack pointer or kept in $s registers.
In the MIPS version of the sort function, the parameters v[] ($a0) and n ($a1) are copied into $s0 and $s1 (whose old values are saved on the stack using the stack pointer), and before the innermost loop calls swap, $s0 is copied back into $a0 and $s1 into $a1.
However, v[] is supposed to keep the changes made inside the swap function, and j changes its value on every iteration of the loop anyway, so it seems there is no need to save their previous values and put them back.
Is my reasoning correct? That is, when implementing the above code in MIPS, is it fine not to keep these values in $s registers (saved using the stack pointer)?
Are $s registers necessary when implementing a function that swaps elements inside a loop in MIPS?
The $s registers are never necessary. What is necessary is that the values of the variables v, n, i, and j survive the call to swap. There are two ways to make values survive a function call: one is to keep them in (stack) memory, and the other is to keep them in $s registers, so $s registers are not strictly necessary at all. Both approaches use stack memory, and the same amount of it: the former stores the variable values there directly, while the latter uses it to preserve the original values of the $s registers.
However, $s registers are preferred for variables that have multiple uses (measured dynamically, by instructions executed at runtime, rather than statically, as seen in the code listing) and that must also survive a function call. (Of course, arrays cannot live in $s registers, but a reference to the array, i.e. a pointer variable, can.)
Why do these variables have to survive the function call? Because they are each consumed (used, sourced) after the call, though having been defined (set, targeted) before the function call. Programs expect continuity of their variables whether or not function calls happen.
Let's call out the difference between the logical variables used by high-level languages and pseudocode vs. the physical storage of machine code.
Logical variables have names, types, and scope/lifetime (they come and go, some frequently during program execution, as with local variables and parameters). There is an unlimited number of possible ones: we just create a new name as we like (and they can also be unnamed, as with heap data structures). A program using logical variables expects continuity: once set, a variable retains its value until changed by the program (and then retains that new value).
The physical storage of the machine is permanent and global and just groups of bits. The CPU registers are always present, as is memory (let's gloss over virtualization). The CPU does not see variables, their declarations, their names, their types or their lifetimes; it sees only physical storage. (How does it work, then? The CPU is told, via machine instructions, how to interact with physical storage, and it is told this as needed, every time it runs a machine code instruction.)
Part of the job of translating a high level language program into machine code is managing the mapping of the program's logical variables to the hardware's physical storage. Since physical storage has constraints, machine code programs must work within them in doing those mappings.
Sometimes a logical variable has to be relocated to alternative physical storage (e.g. from an $a register to either an $s register or stack memory). A machine code instruction (or sequence) is usually required to do such a relocation, and the compiler or assembly programmer has to be aware that the mapping from variable to storage can differ at different points in the same program.
(Sometimes a logical variable's value is actually in more than one physical storage location, as is the case when manipulating global variables that live in memory and copies are brought into CPU registers to work with. Similar is true for elements of arrays.)
My professor showed this in a video: [screenshot not preserved]
In the lower right, he wrote $ra at location 124 while $sp is at 128, which implies that the first sw $ra, 4($sp) instruction stores the $ra value at a location 4 bytes below $sp. But my book does it differently:
[two figures from the book, not preserved]
The book's figures imply that the values are stored at larger, more positive addresses than $sp. So which is right? Do the lw and sw offsets refer to addresses higher or lower than $sp?
You are right to observe that the first factorial stores above the stack pointer, into stack storage that it did not allocate and that must have been allocated by the caller.
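Underlying both pictures is one rule: lw and sw compute their address as the base register plus the sign-extended 16-bit offset, so a positive offset like 4($sp) is above $sp (a higher address) and a negative offset is below it. A tiny sketch, with `effective_address` as a hypothetical name. The numbers in the test are only an example: if $sp were 128 and a function first ran `addi $sp, $sp, -8`, then `sw $ra, 4($sp)` would write to 124, which is 4 bytes below the old $sp but 4 bytes above the new one.

```c
#include <stdint.h>

/* Address used by lw/sw: base register + sign-extended 16-bit offset. */
uint32_t effective_address(uint32_t base, int16_t offset)
{
    return base + (uint32_t)(int32_t)offset;
}
```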
This is somewhat non-standard usage, but technically legal, since the MIPS calling convention requires every stack frame to give its top 4 stack locations to the callee. The function allocates only a 2-word frame, and according to the calling convention (which allows the callee to use the top 4 words of the frame) it should allocate at least a 4-word frame.
Still, since the factorial function calls no function other than itself, this is more or less legal and consistent with the calling convention, whose job is to ensure that one function can correctly call another.
(Note that in RISC-V (the open-source follow-on to MIPS) this requirement of a 4-word stack area for the callee to use is not present, so similar code would not work there.)
The second example is more traditional; however, it also does not allocate a standard-sized frame, i.e. one that gives the top 4 words to the callee. Still, that is not technically necessary either, and this version relies less on the original caller (e.g. main) having provided a proper stack frame (one with 4 words given to the callee).
Let's further observe that the first code sample stores $ra and $a0 on the stack, which are registers we expect to be saved, whereas the latter stores $s0 (which we would expect to be saved, as the $s registers are dedicated non-volatile registers), but also $t0 and $t1, which seems non-standard, as those are dedicated temporaries.
Could you please tell me what the target operand of the following two MIPS instructions represents:
j target
beq $t0,$t1,target
Does target represent a displacement in number of instructions or in bytes?
In assembly, the target is just a label of your source code.
When assembled, j jumps unconditionally to the address given by the instruction's 26-bit target field multiplied by 4 (the upper 4 bits of the address come from the address of the next instruction, PC + 4). This is because every instruction occupies 4 bytes and must be word-aligned, so the instruction encoding does not store the two least significant bits of the target address (which are always 00).
The branch instructions perform a relative jump. In machine code, the instruction stores (in two's complement) the number of words to move, counting from the address of the next instruction to be executed (PC + 4).
In your terms, both count instructions (words), not bytes.
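Here is a sketch of how the encoded fields turn back into byte addresses, which shows the "times 4" in both cases. The function names are hypothetical, and the branch delay slot is ignored (it does not change how the target is computed).

```c
#include <stdint.h>

/* "j target": the 26-bit field counts words, so it is shifted left by 2;
   the top 4 bits come from the address of the next instruction (PC + 4). */
uint32_t j_target(uint32_t pc, uint32_t instr_index26)
{
    return ((pc + 4) & 0xF0000000u) | ((instr_index26 & 0x03FFFFFFu) << 2);
}

/* Taken "beq rs, rt, target": the 16-bit field is a signed (two's complement)
   count of instructions, relative to PC + 4. */
uint32_t beq_target(uint32_t pc, int16_t offset16)
{
    return pc + 4 + ((uint32_t)(int32_t)offset16 << 2);
}
```

For instance, a beq at 0x00400000 with an encoded offset of 3 lands at 0x00400004 + 3*4 = 0x00400010, and an offset of -1 branches back to the beq itself.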
I am reverse engineering a MIPS application written in C. In some places I see negative offsets in lw instructions, like:
80032910 lw $v0, -4($s4)
Positive offsets usually indicate some kind of structure, where one of the members is being accessed, but what code can lead to a negative offset?
For example, it can be generated if you read the element just before the one a pointer points to, e.g. when traversing an array from the end to the beginning:
int *myDataEnd;
... code ...
while(*myDataEnd > *(myDataEnd-1))
myDataEnd--;
Referencing the integer pointed to by myDataEnd-1 may generate that instruction.
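A self-contained version of that snippet, with a hypothetical `walk_back` function. It adds an `end > begin` bound that the snippet above leaves out, so the walk never reads before the start of the array. Whether a compiler actually emits `lw $reg, -4($ptr)` for `*(end - 1)` depends on the compiler and optimization level.

```c
/* Walk a pointer backwards while each element is greater than the one
   before it; *(end - 1) reads the int just below the pointer. */
const int *walk_back(const int *begin, const int *end)
{
    while (end > begin && *end > *(end - 1))
        end--;
    return end;
}
```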
Position-independent code uses $gp as a pointer into the middle of a 64K region of global data (so that the signed 16-bit offset can reach 32K on either side), so you will often see lw $t0, -nnn($gp) in code. If the size of a stack frame is unpredictable at compile time, a frame pointer may be used to mark the start of the stack frame, producing memory references with negative offsets from the frame pointer.
Hand optimized code can also use negative offsets to save reloading an address into a register.