How does lw in MIPS actually work?

Is this statement a valid one?
lw $t0, 21($s0)
$s0 contains the decimal value 2022.
In my opinion this is invalid because, as far as I know, the address specified by the offset plus the register should always be a multiple of 4. Is this correct or not?
A follow-up based on the answer provided: will the exception arise just from looking at the address in $s0, or only after the address 21 + $s0 has been computed?

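The instruction itself assembles fine (21 is a legal 16-bit offset), but executing it when $s0 contains the decimal 2022 raises an address error exception, because the effective address 2022 + 21 = 2043 is not a multiple of 4. The alignment check is applied to that computed effective address, not to $s0 on its own, so a misaligned $s0 is harmless if the sum happens to be aligned.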
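The arithmetic above can be sketched in C. This is a minimal model, assuming a 32-bit address space and the usual lw rule that the 16-bit offset is sign-extended before being added to the base register; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Effective address of lw rt, offset(base): base + sign-extended 16-bit offset.
   The addition wraps modulo 2^32, like address arithmetic in hardware. */
uint32_t effective_address(uint32_t base, int16_t offset)
{
    return base + (uint32_t)(int32_t)offset;
}

/* lw needs a word-aligned address: the low two bits must be zero. */
bool word_aligned(uint32_t addr)
{
    return (addr & 3u) == 0;
}
```

With $s0 = 2022 the sum is 2043, whose low two bits are 11, so the load faults; with $s0 = 2023 (itself unaligned) the same instruction would read the aligned word at 2044.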

Related

How does the MIPS assembler manage label addresses?

How do the MIPS assembler's labels and J-type instructions work?
I am currently writing a MIPS simulator in C++ and ran into a big question: how exactly does the MIPS assembler manage labels and their addresses for a J-type instruction?
Let's assume we have the following code, and that start: begins at 0x00400000. The comment after each instruction shows where its machine code will be stored in memory.
start:
andi $t0, $t0, 0 # 0x0040 0000
andi $t1, $t1, 0 # 0x0040 0004
andi $t2, $t2, 0 # 0x0040 0008
addi $t3, $t3, 4 # 0x0040 000C
loop:
addi $t2, $t2, 1 # 0x0040 0010
beq $t2, $t3, exit # 0x0040 0014
j loop # 0x0040 0018
exit:
addi $t0, $t0, 1000 # 0x0040 001C
As I currently understand it, the j loop instruction will set the PC to 0x0040 0010.
A J-type instruction is 32 bits wide and uses its 6 most significant bits as the opcode, so only 26 bits are left to represent the target address. How is it possible to represent an address in a 32-bit address space using only 26 bits?
In the example above, 0x00400010 can be represented with only 24 bits. However, according to references, the text segment extends from 0x00400000 to 0x10000000, which needs more than 26 bits to represent.
I have tried to understand this using the MARS simulator, but it simply shows j loop as j 0x00400010, which seems like nonsense to me since 0x00400010 is a 32-bit value.
My current guess
One of my current guesses is the following.
The assembler saves the address of the loop: label into some memory location that is reachable with 26 bits. Then, when j loop executes, loop is translated to the memory address that contains 0x00400010. For example, 0x00400010 is saved at some address like 0x00300000; j loop is translated into 0x00300000, and the processor reads the value stored at 0x00300000 to reach 0x00400010. (This is just one of my guesses.)
You have a number of questions here.
First, let's try to differentiate between the assembler's operation and the MIPS machine code that it generates and the processor executes.
The assembler manages labels and addresses in two ways. First, it has a symbol table, which is like a dictionary: a data structure of key-value pairs where the names are the keys and the addresses (that those names will refer to when the program is running) are the values.
Second, the assembler manages the code and data sections with a location counter. That location counter advances each time the program provides some code or data. When a new label is defined, the current value of the location counter is used as the address value in a new key-value pair.
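To make the symbol-table idea concrete, here is a small C sketch of a first assembler pass. It assumes every instruction is 4 bytes, the text segment starts at 0x00400000 (as in the question), and a line ending in ':' defines a label; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_SYMBOLS 64
#define NAME_LEN 32

/* One key-value pair of the symbol table: label name -> address. */
struct symbol {
    char name[NAME_LEN];
    uint32_t address;
};

struct symbol_table {
    struct symbol entries[MAX_SYMBOLS];
    int count;
};

/* First pass: walk the lines with a location counter. A line ending in ':'
   defines a label at the current counter and occupies no space; any other
   non-empty line is treated as one 4-byte instruction. */
void first_pass(const char *lines[], int n, uint32_t origin,
                struct symbol_table *table)
{
    uint32_t location = origin;
    table->count = 0;
    for (int i = 0; i < n; i++) {
        size_t len = strlen(lines[i]);
        if (len == 0)
            continue;
        if (lines[i][len - 1] == ':') {
            assert(table->count < MAX_SYMBOLS && len - 1 < NAME_LEN);
            struct symbol *s = &table->entries[table->count++];
            memcpy(s->name, lines[i], len - 1);
            s->name[len - 1] = '\0';
            s->address = location;
        } else {
            location += 4;
        }
    }
}

/* Returns 1 and stores the address if the label is known, else returns 0. */
int lookup(const struct symbol_table *table, const char *name, uint32_t *out)
{
    for (int i = 0; i < table->count; i++) {
        if (strcmp(table->entries[i].name, name) == 0) {
            *out = table->entries[i].address;
            return 1;
        }
    }
    return 0;
}
```

On the question's program this records loop at 0x00400010 and exit at 0x0040001C. A real assembler also needs a second pass (or back-patching) so that forward references such as beq $t2, $t3, exit can be filled in once exit has been seen.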
The processor never sees the labels: they do not execute and they do not occupy any space in the code or data.  The processor sees only machine code instructions, which on MIPS are all 32-bits wide.  Each machine code instruction is divided into fields.  There are instruction types or formats, which on MIPS are straightforward: I-Type, J-Type, and R-Type.  These formats then define the instruction fields, and the assembler follows these encodings.  All the instruction formats share the 6-bit opcode field, and this opcode field tells the processor what format the instruction is, which fields it therefore has, and thus how to interpret and execute the rest of the instruction.
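As an illustration of the shared 6-bit opcode and the I-type fields, here is a C sketch that packs and unpacks an I-type word such as lw $t0, 21($s0). It assumes the standard register numbers ($t0 = 8, $s0 = 16) and the lw opcode 0x23; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* I-type layout: opcode(6) | rs(5) | rt(5) | immediate(16) */
uint32_t encode_itype(uint32_t opcode, uint32_t rs, uint32_t rt, uint16_t imm)
{
    assert(opcode < 64 && rs < 32 && rt < 32);
    return (opcode << 26) | (rs << 21) | (rt << 16) | imm;
}

/* Field extraction as a decoder would do it, starting from the opcode. */
uint32_t field_opcode(uint32_t word) { return word >> 26; }
uint32_t field_rs(uint32_t word)     { return (word >> 21) & 0x1F; }
uint32_t field_rt(uint32_t word)     { return (word >> 16) & 0x1F; }
uint16_t field_imm(uint32_t word)    { return (uint16_t)(word & 0xFFFF); }
```

The top 6 bits (0x23 here) tell the processor how the remaining 26 bits are split up; R-type (opcode 0) and J-type words carve those same bits into different fields.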
The assembler removes labels from the assembly: labels and their names do not exist in the program binary. The label definitions themselves (label:) are omitted, but usages of labels are translated into numbers. A machine code instruction that uses a label will have some numeric instruction field, and the assembler will provide a proper value for that field so that the intended effect (reaching, or otherwise accessing, whatever the label referred to) is accomplished. (The label is no longer in the program binary, but the code or data memory that the label referred to does remain.)
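The assembler sets up branch instructions, j instructions, and la/lw instructions using numbers that tell the processor how far forward or backward to move the program counter, where to jump, or at what address some data of interest is. Branches such as beq hold a signed 16-bit count of instructions relative to the address of the next instruction (PC + 4). A j instruction holds 26 bits of the target's word address: the processor shifts them left by 2 and takes the top 4 bits from PC + 4, so j can reach any instruction within the current 256 MB region without any extra lookup table. The la pseudo-instruction, and lw used with a label, refer to data addresses; the assembler typically expands them into 2 x 32-bit instructions, each holding 16 bits of the address of interest. Between the two instructions, they put together a full 32-bit address for data access. For branches to fully reach any 32-bit address, they would have to put together the 32-bit address in a similar manner (a two-instruction pair) and use an indirect/register branch such as jr.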
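The address arithmetic for the question's program can be sketched in C, assuming the standard MIPS-I encodings: j stores bits 27..2 of the target and the CPU rebuilds the address using the upper 4 bits of PC + 4; beq stores a signed word offset relative to PC + 4; and a 32-bit data address is split into two 16-bit halves for a lui/ori pair. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* J-type: opcode(6) | target(26). The field holds address bits 27..2; the low
   2 bits are always 0 because instructions are word aligned. */
uint32_t jtype_target_field(uint32_t target_addr)
{
    return (target_addr >> 2) & 0x03FFFFFFu;
}

/* CPU side: the upper 4 bits come from the address of the next instruction. */
uint32_t jump_destination(uint32_t pc, uint32_t field)
{
    return ((pc + 4) & 0xF0000000u) | (field << 2);
}

/* Branch (I-type): signed count of instructions, relative to PC + 4. */
int32_t branch_offset(uint32_t pc, uint32_t target_addr)
{
    return ((int32_t)(target_addr - (pc + 4))) / 4;
}

uint32_t branch_destination(uint32_t pc, int16_t offset)
{
    return pc + 4 + (uint32_t)((int32_t)offset * 4);
}

/* la $t0, addr  ->  lui $t0, upper ; ori $t0, $t0, lower  (ori zero-extends) */
uint16_t upper_half(uint32_t addr) { return (uint16_t)(addr >> 16); }
uint16_t lower_half(uint32_t addr) { return (uint16_t)(addr & 0xFFFFu); }
```

For j loop at 0x00400018 the stored field is 0x00100004 (machine word 0x08100004, since the j opcode is 2). The target 0x00400010 can be rebuilt because the whole text segment (0x00400000 up to 0x10000000) lies in the first 256 MB region, where the upper 4 bits are zero; that is also why MARS can display the instruction as j 0x00400010 even though only 26 bits are stored. The beq at 0x00400014 gets offset 1, reaching exit, which actually sits at 0x0040001C (the word right after j loop).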

How does a recursive function work in MIPS?

I'm a newbie in MIPS (I started learning MIPS assembly for college), and I'm having trouble understanding how a recursive function works in MIPS.
For example, I have to write this C program in MIPS:
int fact (int n)
{
if (n < 1) return 0;
else return n * fact(n - 1);
}
Can someone help me with this or another example of a recursive function, and explain how it works?
The first thing I'd like to share is that the complexity in translating this into MIPS comes from the mere presence of a function call, not from recursion being involved; that fact is recursive is IMHO a red herring. To illustrate, here is a non-recursive function that has every bit as much complexity as the recursive function you've stated:
int fact (int n)
{
if (n < 1) return 0;
else return n * other(n - 1); // I've changed the call to "fact" to function "other"
}
My alteration is no longer recursive! However, the MIPS code for this version will look identical to the MIPS code for your fact (except, of course, that jal fact becomes jal other). This illustrates that the complexity in translating this comes from the call within the function, and has nothing to do with who is being called. (Though YMMV with optimization techniques.)
To understand function calling, you need to understand:
the program counter: how the program interacts with the program counter, especially, of course, in the context of function calling
parameter passing
register conventions, generally
In C, we have explicit parameters. These explicit parameters, of course, also appear in assembly/machine language, but there are also parameters passed in machine code that are not visible in C code. Examples of these are the return address value and the stack pointer.
What is needed here is an analysis of the function (independent of recursion):
The parameter n will be in $a0 on function entry. The value of n is required after the function call (to other), because we cannot multiply until that function call returns the right hand operand of *.
Therefore, n (the left hand operand to *) must survive the function call to other, and in $a0 it will not — since our own code will repurpose $a0 in order to call other(n-1), as n-1 must go into $a0 for that.
Also, the (in C, implicit) parameter $ra holds the return address value needed to return to our caller. The call to other will, similarly, repurpose the $ra register, wiping out its previous value.
Therefore, this function (yours or mine) needs two values to survive the function call within its body (i.e. the call to other).
The solution is simple: values we need (that are living in registers that are repurposed or wiped out by something we're doing, or the callee potentially does) need to be moved or copied elsewhere: somewhere that will survive the function call.
Memory can be used for this, and we can obtain some memory for these purposes using the stack.
Based on this, we need to make a stack frame that has space for the two things we need (and would otherwise get wiped out) after calling other. The entry $ra must be saved (and later reloaded) in order for us to use it to return; also, the initial n value needs to be saved so we can use it for the multiply. (Stack frames are typically created in function prologue, and removed in function epilogue.)
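To see which values must survive the inner call, the C sketch below models $a0, $v0, $ra and $sp as global variables and a small word array as stack memory. The inner call clobbers $a0 and $ra the way argument setup and jal would, so the function saves both in its 8-byte frame and reloads them afterwards. All names are hypothetical, and the base case returns 1 (a conventional factorial) rather than the question's 0, because with return 0 every result would be 0.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 1024

/* Hypothetical return-address markers standing in for real code addresses. */
enum { TOP_LEVEL = -1, AFTER_RECURSIVE_CALL = 1 };

/* Simulated machine state, for illustration only. */
static int32_t stack_mem[STACK_WORDS];  /* word-addressed stack memory */
static int32_t sp = STACK_WORDS;        /* top-of-stack index; grows downward */
static int32_t a0, v0, ra;              /* argument, result, return address */

static void fact_sim(void);

/* Models "jal fact": writes a new return address into $ra, then runs fact. */
static void call_fact(int32_t return_marker)
{
    ra = return_marker;
    fact_sim();
}

static void fact_sim(void)
{
    sp -= 2;                        /* addi $sp, $sp, -8 */
    assert(sp >= 0);                /* simulated stack overflow check */
    stack_mem[sp + 1] = a0;         /* sw $a0, 4($sp) */
    stack_mem[sp + 0] = ra;         /* sw $ra, 0($sp) */

    if (a0 < 1) {
        v0 = 1;                     /* base case (the question's code returns 0) */
    } else {
        a0 = a0 - 1;                /* $a0 is repurposed for the argument n-1 */
        call_fact(AFTER_RECURSIVE_CALL);  /* jal overwrites $ra */
        a0 = stack_mem[sp + 1];     /* lw $a0, 4($sp): recover n */
        v0 = a0 * v0;               /* mul $v0, $a0, $v0 */
    }

    ra = stack_mem[sp + 0];         /* lw $ra, 0($sp) */
    sp += 2;                        /* addi $sp, $sp, 8 */
}

int32_t fact_via_frames(int32_t n)
{
    a0 = n;
    call_fact(TOP_LEVEL);
    return v0;
}
```

Deleting the two reloads after call_fact is a quick way to see the analysis in action: n is lost, and the outermost activation ends with $ra pointing back into fact instead of at the top level.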
As is often the case in machine code (or even programming in general) there are also other ways of handling things, though the gist is the same. (This is a good thing, and an optimizing compiler will generally seek the best way given the particular circumstances.)
Presence or absence of recursion does not change the fundamental analysis we need to translate this into assembly/machine language. Recursion dramatically increases the potential for stack overflow, but otherwise does not change this analysis.
Addendum
To be clear, recursion imposes the requirement to use a dynamically expandable call stack — though all modern computer systems provide such a stack for calling, so this requirement is easy to forget or gloss over on today's systems.
For programs without recursion, a call stack is not a requirement — local variables can be allocated to function-private global variables (including the return address), and this was done on certain older systems like the PDP-8, which did not offer specific hardware support for a call stack.
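A small C sketch of why recursion needs a stack: the first function keeps its saved copy of n in a function-private global (a static local), in the spirit of the older approach described above, while the second uses an ordinary automatic variable that lives in a per-call frame. Both use a base case of 1, an assumption made so the results are meaningful; the function names are hypothetical.

```c
#include <assert.h>

/* One saved slot shared by every activation: fine without recursion only. */
int fact_static_slot(int n)
{
    static int saved_n;             /* function-private, but only one copy */
    if (n < 1)
        return 1;
    saved_n = n;
    int r = fact_static_slot(n - 1);
    return saved_n * r;             /* saved_n was overwritten by inner calls */
}

/* Automatic variable: each call gets its own copy in its own stack frame. */
int fact_stack_slot(int n)
{
    if (n < 1)
        return 1;
    int saved_n = n;
    int r = fact_stack_slot(n - 1);
    return saved_n * r;
}
```

For fact_static_slot(5), the innermost non-base call leaves saved_n at 1, so every outer activation multiplies by 1 and the result is 1 instead of 120.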
Systems that use stack memory for passing parameters and/or are register poor may not require the analysis described in this answer, since variables are already being stored in memory that survives nested function calls.
It is the partitioning of registers on modern register-rich machines that creates the requirement for the above analysis. These register-rich machines pass parameters and return values (mostly) in CPU registers, which is efficient but imposes the need to sometimes make copies as registers are repurposed from one function to another.
One way to implement the function you described is to allocate stack memory by moving the stack pointer with addi (subtracting at the start to allocate, adding at the end to free). Then the sw instruction can save registers into that space, and lw restores them after a call and/or when you're ready to return. So we start with this instruction to allocate the memory:
addi $sp, $sp, -8 # subtract 8 from $sp, allocating 8 bytes
That is, we need 8 bytes: 4 for the return address in $ra and 4 for the int n. Now we save both into that space:
sw $a0, 4($sp) # save n (held in $a0) at offset 4
sw $ra, 0($sp) # save the return address ($ra) at offset 0
Next, we need a temporary register holding the constant 1 for the comparison n < 1:
addi $t0, $0, 1 # $t0 = 0 + 1
The comparison instruction is slt:
slt $t0, $a0, $t0 # $t0 = 1 if $a0 < $t0 (n < 1), else 0
If $t0 is zero (n >= 1), we need to branch to the recursive case. Note that else is a label, i.e. the place to continue according to that rule.
Note: $0 always holds zero.
beq $t0, $0, else # if $t0 is zero, branch to else; otherwise fall through to the base case
In the base case we return 0, as follows:
addi $v0, $0, 0
and then we free the stack space and return ($ra was not changed on this path):
addi $sp, $sp, 8
jr $ra
At the label else, we first make n become n - 1, the argument for the recursive call:
else:
addi $a0, $a0, -1 # $a0 = n - 1
Then we use jal, since this is where the recursion happens:
jal fact # the result fact(n-1) comes back in $v0
The next step is to restore the return address ($ra) and the int n ($a0) that we saved, and to free the stack space:
lw $a0, 4($sp) # reload n
lw $ra, 0($sp) # reload the return address
addi $sp, $sp, 8 # free the 8 bytes
Now we do the multiplication, remembering that $v0 holds fact(n-1):
mul $v0, $a0, $v0 # $v0 = n * fact(n-1)
Finally, we use jr $ra to return.
I hope I have cleared up one or two points.

MIPS and $31, can't understand why data is stored in register $31

I don't know much about MIPS, because we only cover it next year at university, but this year we had to work with lex and yacc, and of course we need to know MIPS. I just learned something about it a few hours ago. For example, if we have 'a=-2' and 'b=-a', I know that for 'a=-2' we have something like 'addi $1, $0, -2', and for 'b=-a' we have something like 'move $2, $31'. I understood up to here, but I want to know something: is $31 the register where 'b' will be stored? If yes, what is so special about that register? Why can't it be stored in $30 or $29, for example? Is it because $31 is the last register?
Register assignment is based upon the compiler's allocation scheme, subject to the MIPS ABI: http://www.cs.uwm.edu/classes/cs315/Bacon/Lecture/HTML/ch05s03.html
So, if you have two variables, a and b, the compiler can assign them to any register that is available for the given purpose. Register $31, aka $ra, is the return address register. It's not a good choice for holding a data value, because jal implicitly writes the return address into $ra, overwriting whatever was there.
$0 aka $zero is hardwired to the value of zero. Other registers can be used for any purpose, but most compilers, and most programs, adhere to the register usage conventions of the ABI.
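For example, $1, aka $at, is the "assembler temporary", reserved for the assembler's own use when it expands pseudo-instructions. One reason it is needed: the MIPS branch instructions that compare two registers only test equality/inequality (beq/bne); there is no two-register blt instruction. So the assembler implements the blt pseudo-instruction with slt, which writes its result to an output register, generally $at, followed by bne $at, $zero, label.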
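For reference, the conventional names map onto the 32 register numbers as in this C sketch, which uses the classic o32 naming; the lookup function is hypothetical, e.g. the kind of helper a simulator or compiler back end might contain.

```c
#include <assert.h>
#include <string.h>

/* Conventional o32 names for registers $0..$31, indexed by register number. */
static const char *const reg_names[32] = {
    "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
    "t0",   "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0",   "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8",   "t9", "k0", "k1", "gp", "sp", "fp", "ra"
};

/* Returns the register number for "$name" or "$N" (0..31), or -1 if invalid. */
int reg_number(const char *text)
{
    assert(text != NULL);
    if (text[0] != '$')
        return -1;
    const char *name = text + 1;
    if (*name >= '0' && *name <= '9') {
        int n = 0;
        for (; *name >= '0' && *name <= '9'; name++) {
            n = n * 10 + (*name - '0');
            if (n > 31)
                return -1;          /* only $0..$31 exist */
        }
        return (*name == '\0') ? n : -1;
    }
    for (int i = 0; i < 32; i++)
        if (strcmp(name, reg_names[i]) == 0)
            return i;
    return -1;
}
```

With this table, the $31 in the question is simply $ra and $1 is $at; nothing in the numbering makes $31 a place for data.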
For your sequence:
a = -2;
b = -a;
Let's assume that a has been assigned to $t0 and b to $t1. The generated sequence would be:
addi $t0,$zero,-2 # a = -2
sub $t1,$zero,$t0 # b = -a
Also, for more on what can and cannot be done with $ra, see my answer here: Whether $ra register callee saved or caller saved in mips?
Register $31 in MIPS is the return address register. A function that makes calls of its own must save it (typically in its prologue) before those calls overwrite it; once it has been saved, the register is available for other uses.
But nothing in the hardware checks this convention. $31 can be used in an lw instruction just like any other general-purpose register.

What is the use of a $zero register in MIPS?

What is the use of a $zero register in MIPS?
What does it mean?
lw $t0, myInteger($zero)
The zero register always holds the constant 0. There's not really anything special about it except for the fact that 0 happens to be a very useful constant. So useful that the MIPS designers dedicated a register to holding its value. (This way you don't have to waste another register, or any memory, holding the value.)
EDIT:
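As for what that line of code means: it loads the word from MEMORY[myInteger + 0] into the $t0 register. lw is an I-type instruction whose address is always formed as a base register plus a 16-bit signed offset, so it takes both a constant (myInteger) and a register. Since myInteger was used as the constant, a register still had to be provided, so $zero was used, which makes the offset by itself the address. If myInteger's address does not fit in a 16-bit signed offset, the assembler has to expand the line into more than one machine instruction (or reject it).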
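The address calculation with $zero as the base can be sketched in C, assuming the usual rule that lw sign-extends its 16-bit offset. The sketch also shows one common way a label address that does not fit in 16 bits can be split into a lui upper half plus a signed lw offset, where adding 0x8000 before taking the upper half compensates for the sign extension of the lower half; the helper names and the example address 0x10018004 are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* lw rt, offset(rs): address = rs + sign_extend(offset). With rs = $zero the
   offset alone is the address, so only -32768..32767 is directly reachable. */
uint32_t lw_address(uint32_t rs_value, uint16_t offset_bits)
{
    int32_t offset = (int16_t)offset_bits;   /* sign-extend 16 -> 32 bits */
    return rs_value + (uint32_t)offset;
}

/* Split addr so that (hi << 16) + sign_extend(lo) == addr, as needed for
   lui $at, hi  followed by  lw $t0, lo($at). */
void split_for_lw(uint32_t addr, uint16_t *hi, uint16_t *lo)
{
    *hi = (uint16_t)((addr + 0x8000u) >> 16);
    *lo = (uint16_t)(addr & 0xFFFFu);
}
```

For 0x10018004 the lower half 0x8004 is negative once sign-extended, so the upper half becomes 0x1002 rather than 0x1001.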

Unaligned exception behaviour in MIPS

Say I have this instruction in MIPS
lw $t0, 21($s0)
$s0 contains the decimal 2022 (not a multiple of four)
Here, will the exception arise just from looking at the address in $s0, or only after the address 21 + $s0 has been computed?
The exception will be caused by trying to load from address 21($s0), i.e. 2022 + 21 = 2043 decimal, which is not a multiple of 4; the value of $s0 on its own is not checked. EPC will contain the address of the load instruction when the exception happened, and the BadVAddr register should contain decimal 2043.
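The reported state can be sketched in C as below. This is a simplified model that assumes the faulting lw is not in a branch delay slot and leaves out the Cause register encoding; the struct, the function name, and the instruction address 0x00400000 used in the example are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of what the CPU latches on an address error. */
struct address_error {
    bool raised;
    uint32_t epc;        /* address of the faulting lw */
    uint32_t badvaddr;   /* the misaligned effective address */
};

/* Models only the alignment check of lw rt, offset(base) located at pc. */
struct address_error check_lw(uint32_t pc, uint32_t base, int16_t offset)
{
    struct address_error e = { false, 0, 0 };
    uint32_t ea = base + (uint32_t)(int32_t)offset;   /* computed first */
    if (ea & 3u) {                                     /* then checked */
        e.raised = true;
        e.epc = pc;
        e.badvaddr = ea;
    }
    return e;
}
```

If the lw sits at 0x00400000 with $s0 = 2022, the model reports EPC = 0x00400000 and BadVAddr = 2043.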