I am confused as to how a Store Word Instruction coming after an LW using the same $rt causes a pipeline stall in MIPS.
Consider this block of code:
lw $s0, 0($t0)
sw $s0, 12($t0)
lw $s1, 4($t0)
sw $s1, 16($t0)
lw $s2, 8($t0)
sw $s2, 20($t0)
Here 3 words are being moved around in memory. For example, in the first 2 lines a word is loaded into $s0,
and then its contents are stored back to memory. I'm not sure whether the sw instruction requires $s0 in the EX stage or in the MEM stage. If it is needed in the MEM stage, wouldn't the hazard be resolved just by forwarding, without needing to stall the pipeline?
Hypothetically, yes. Forwarding directly into the MEM stage would make it possible to execute a dependent LW and SW back-to-back, at least as long as the loaded word is the data being stored by the SW. It wouldn't be possible to have the SW use that loaded word as the base of its address without a pipeline bubble; that would require forwarding back in time.
But typically you would see a pipeline like the one in this diagram (source: a model of a 5-stage pipelined MIPS in SIM-PL), with only one forwarder, which feeds into EX. With a setup like that, there is no way to forward from LW into a back-to-back SW: the hardware required for it isn't there.
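The timing argument above can be sketched in a few lines of Python. This is only a rough model of an assumed textbook 5-stage pipeline, not the SIM-PL design; the function name `stalls_back_to_back` and its arguments are made up for illustration.

```python
# Stage indices in a classic 5-stage pipeline (assumed textbook model).
IF, ID, EX, MEM, WB = range(5)

def stalls_back_to_back(producer, consumer_use, mem_forwarding):
    """Bubbles needed between a producer and the very next instruction.

    producer:       "lw" or "alu"
    consumer_use:   "store_data", "address" or "alu_operand"
    mem_forwarding: True if a forwarding path into the MEM stage exists
    """
    # Stage at whose end the producer's result exists.
    ready = MEM if producer == "lw" else EX
    # Stage at whose start the consumer must have the value.
    # Only store data can wait until MEM, and only if the path exists.
    if consumer_use == "store_data" and mem_forwarding:
        need = MEM
    else:
        need = EX
    # The consumer runs one cycle behind the producer, so its stage `need`
    # starts after the producer's stage `ready` ends once it has been
    # delayed by (ready - need) cycles.
    return max(0, ready - need)

print(stalls_back_to_back("lw", "store_data", mem_forwarding=False))
print(stalls_back_to_back("lw", "store_data", mem_forwarding=True))
print(stalls_back_to_back("lw", "address", mem_forwarding=True))
```

Under these assumptions the first call reports one bubble (forwarder into EX only), the second none (forwarding into MEM), and the third one bubble again, because the address is needed in EX regardless of the extra path.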
Related
I just started studying computer organization.
My question is similar to this article
How are the address of the memory and that of the register connected?(AddrConstant MIPS instruction)
lw $t0, AddrConstant4($s1)
The meaning of this instruction is $t0=constant 4
The way I understand this instruction, it adds 4 to the value of register $s1 and loads (4 + value of register $s1) into $t0.
My question is that I don't know what value $s1 already holds.
If $s1 holds 0, it makes sense.
However, if $s1 holds 5, $t0 will get 4+5=9.
Who knows what value is in $s1?
Or is my understanding wrong?
As soon as I wrote this question, another idea came to me.
AddrConstant4($s1) means put 4 into register $s1. (It doesn't matter what value $s1 had before.)
So lw $t0, AddrConstant4($s1) is the same as $t0 = 4.
Is this right?
It's a pseudo-instruction that may expand to two or more instructions.
lw $t0, label($t1)
Expands to something like:
la $at, label
addu $at, $at, $t1
lw $t0, 0($at)
Here la is itself a pseudo-instruction involving lui and perhaps ori.
There are some optimizations possible, but for the best code it is much better to use la to load the label address into a register that is kept for a longer duration, e.g. that can be done once outside a loop. (Or change the algorithm to use pointers.)
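To make the addressing explicit, here is a minimal Python sketch of what the expanded sequence computes. The label address, the value of $t1 and the memory contents are hypothetical values chosen for illustration, and the helper names are not part of any assembler. The point is that lw puts the word found at base + offset into the register, not the sum itself.

```python
def split_for_lui_ori(address):
    """Split a 32-bit address into the halves used by lui (upper) and ori (lower)."""
    return (address >> 16) & 0xFFFF, address & 0xFFFF

def load_word(memory, base, offset):
    """Model lw rt, offset(base): read the word stored at address base + offset."""
    return memory[base + offset]

label = 0x10010004                  # hypothetical address of the label
memory = {0x1001000C: 42}           # hypothetical memory contents
t1 = 8                              # hypothetical value of $t1

hi, lo = split_for_lui_ori(label)
at = (hi << 16) | lo                # la   $at, label    (lui + ori)
at = at + t1                        # addu $at, $at, $t1
value = load_word(memory, at, 0)    # lw   $t0, 0($at)
print(hex(at), value)
```

With these assumed values the address comes out as 0x1001000c and $t0 receives 42, the word stored there.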
I'm studying computer architecture. I'm confused about a quiz question.
Suppose these instructions are executed on a load-store architecture:
lw $t0, 32($s3)
add $t0, $s2, $t0
sw $t0, 48($s3)
What is the number of memory accesses, and the number of instruction accesses?
I think the number of memory accesses is 2 and the number of instruction accesses is 3. Is that right?
Yes, that's right. Just for the sake of understanding it better, here is some explanation.
MIPS uses the load word instruction lw to read a data word from memory into a register, and
the store word instruction sw to write a word to memory.
lw $t0, 32($s3)
This loads a word from memory into register $t0.
add $t0, $s2, $t0
This one works on registers only; no data memory is involved.
sw $t0, 48($s3)
This stores a word to memory.
You are using 3 instructions (so 3 instruction fetches), two of which access data memory.
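The same counting can be written as a small Python sketch. It assumes, as in the quiz, that only lw and sw touch data memory and that every instruction costs exactly one fetch; the function name `count_accesses` is made up for illustration.

```python
def count_accesses(program):
    """Return (instruction_fetches, data_memory_accesses) for a list of MIPS lines."""
    fetches = len(program)                      # one fetch per instruction
    data = sum(1 for line in program
               if line.split()[0] in ("lw", "sw"))  # only loads/stores touch data memory
    return fetches, data

program = [
    "lw $t0, 32($s3)",
    "add $t0, $s2, $t0",
    "sw $t0, 48($s3)",
]
print(count_accesses(program))
```

For the three instructions above this gives 3 instruction fetches and 2 data accesses.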
I am working with MIPS Assembly in the Mars simulator and I am attempting to set it up so that I can enter any 2-digit number (ex: 24) into the Keyboard and Display MMIO Simulator and then take it from MMIO addresses and put it into my registers for manipulation. This technique will use polling, which I understand to some degree.
I can load individual characters and place their ASCII values into my registers using the following code (inside .text):
main:
lui $t0, 0xFFFF #$t0 = 0xFFFF0000
poll: # polling procedure
lw $t1, 0($t0)
andi $t1, $t1, 0x0001
beq $t1, $zero, poll
lw $a0, 4($t0) # load word into register $a0
Is it possible in this case for MMIO to treat the input as an immediate and to take in two at once? If not, then are there any known workarounds to this? Thanks.
On MIPS architecture with pipelining and forwarding:
add $s0, $t1, $t2
sw $s0, 0($sp)
The add instruction will have the result ready at step 3 (execute operation); however, I presume that the sw instruction wants the result at step 2 (instruction decode & register read).
There is a solved exercise in the book Computer Organization and Design by David A. Patterson: Find the hazards in the following code segment and reorder the instructions to avoid any pipeline stalls:
lw $t1, 0($t0)
lw $t2, 4($t0)
add $t3, $t1,$t2
sw $t3, 12($t0)
lw $t4, 8($t0)
add $t5, $t1,$t4
sw $t5, 16($t0)
Solution:
lw $t1, 0($t0)
lw $t2, 4($t0)
lw $t4, 8($t0)
add $t3, $t1,$t2
sw $t3, 12($t0)
add $t5, $t1,$t4
sw $t5, 16($t0)
In the solution it correctly recognizes the load-use hazard and rearranges the code accordingly, but is there an execute-store hazard as well?
Let's consider a MIPS pipeline in which forwarding is enabled.
I think that in that case no hazard occurs: in fact the ADD instruction is an integer operation that in the MIPS architecture requires only one clock cycle.
Look at this diagram:
ADD $t3,$t1,$t2   IF ID EX MEM WB
SW $t3,12($t0)       IF ID EX MEM WB
As you can see, no stall is needed: the SW instruction stores the datum in its MEM stage, two clock cycles after ADD has computed the result for $t3 in its EX stage, so forwarding can deliver it in time.
Actually in similar situations a hazard can occur but only if the unit is a multicycle one (if it requires more than one clock cycle to compute the data).
Look at this example, in which the ADD.D instruction uses a floating point adder that requires 4 clock cycles to perform the calculation:
ADD.D F2,F4,F5     IF ID A0 A1 A2 A3 MEM WB
S.D F2,somewhere      IF ID EX X0 X1 X2 MEM WB
X0 and X1 are RAW stalls, while X2 is a structural stall: in the former case S.D must wait for ADD.D to finish; in the latter, your MIPS cannot access the memory twice in the same clock cycle, so a structural stall occurs.
I have the following MIPS code:
addi $s1, $0, 10
lw $t0, 4($s0)
srl $t1, $t0, 1 [STALL because $t0 depends on lw's $t0]
add $t2, $t1, $s1 [STALL because $t1 depends on srl's $t1]
sw $t2, 4($s0)
How can I rearrange it to avoid any stalls? As far as I can see, the order of lines 2 to 5 can't change. We can only move the first line between srl and add, or between lw and srl. Any ideas?
There are 4 read after write (RAW) dependencies in your code: addi->add, lw->srl, srl->add, add->sw. These can't be fixed as you pointed out.
What you can do is move the addi instruction. I would think the best place for it is right after the lw, because in the original MIPS I architecture load instructions have a load delay slot (later MIPS revisions interlock in hardware instead). This means that the instruction immediately after the load does not have access to the loaded value. If you are running this code in a simulator such as SPIM or MARS this may not be simulated, but assuming you mean to use the loaded value of $t0 in the srl instruction, your assembly above would actually be incorrect on hardware with a load delay slot. For it to work there, there should be a nop between the lw and the srl.
For that reason, it would be best to move the addi in between the lw and srl so as to utilize the lw load delay slot.
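As a quick check of the reordering, here is a small Python sketch that counts load-use pairs, i.e. an instruction reading a register that the lw directly before it loads. It assumes a pipeline where such a pair costs one bubble (or needs a nop on hardware with a load delay slot) and where other RAW dependencies are covered by forwarding; the tuple encoding and the name `load_use_stalls` are made up for illustration.

```python
def load_use_stalls(program):
    """Count instructions that read the destination of the lw directly before them.

    program: list of (opcode, dest, sources) tuples.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return stalls

original = [
    ("addi", "$s1", ("$0",)),
    ("lw",   "$t0", ("$s0",)),
    ("srl",  "$t1", ("$t0",)),
    ("add",  "$t2", ("$t1", "$s1")),
    ("sw",   None,  ("$t2", "$s0")),
]
# Same code with addi moved between lw and srl.
reordered = [original[1], original[0]] + original[2:]

print(load_use_stalls(original), load_use_stalls(reordered))
```

Under these assumptions the original order has one load-use pair (lw followed by srl) and the reordered version has none, while addi still executes before the add that needs $s1.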