Is there an execute-store data hazard in MIPS? - mips

On MIPS architecture with pipelining and forwarding:
add $s0, $t1, $t2
sw $s0, 0($sp)
The add instruction will have the result ready at step 3 (execute operation) however I presume that the sw instruction want the result at step 2 (Instruction decode & register read).
There is a solved exercise in the book Computer Organization and Design by David A. Patterson: Find the hazards in the following code segment and reorder the instructions to avoid any pipeline stalls:
lw $t1, 0($t0)
lw $t2, 4($t0)
add $t3, $t1,$t2
sw $t3, 12($t0)
lw $t4, 8($01)
add $t5, $t1,$t4
sw $t5, 16($t0)
Solution:
lw $t1, 0($t0)
lw $t2, 4($t1)
lw $t4, 8($01)
add $t3, $t1,$t2
sw $t3, 12($t0)
add $t5, $t1,$t4
sw $t5, 16($t0)
In the solution it correctly recognizes the load-use hazard and rearranges the code accordingly, but is there an execute-store hazard as well?

Let's consider a MIPS in which forwarding is activated.
I think that in that case no hazard occurs: in fact the ADD instruction is an integer operation that in the MIPS architecture requires only one clock cycle.
Look at this graph:
ADD $t3,$t1,$t2 IF ID EX MEM WB
SW $t3,12($t0) IF ID EX MEM WB
As you can see no hazard occurs because the SW instruction stores the datum after two clock cycles since the result is put in $t3 by ADD.
Actually in similar situations a hazard can occur but only if the unit is a multicycle one (if it requires more than one clock cycle to compute the data).
Look ad this example, in which the ADD.D instruction uses a floating point adder that requires 4 clock cycles to perform the calculation:
ADD.D F2,F4,F5 IF ID A0 A1 A2 A3 MEM WB
S.D F2,somewhere IF ID EX X0 X1 X2 MEM WB
X0 and X1 are RAW stalls while X2 is a structural stalls: in the former case S.D must wait for ADD.D to finish; in the latter your MIPS cannot access in the same clock cycle to the memory two times, so a structural stall occurs.

Related

Dual-issue scheduling in MIPS

Here are the MIPS code:
Loop: lw $t0, 0($s1) # $t0=array element
addu $t0, $t0, $s2 # add scalar in $s2
sw $t0, 4($s1) # store result
addi $s1, $s1,–4 # decrement pointer
bne $s1, $zero, Loop # branch $s1!=0
After scheduling, we can pack them like below:
Why addi instruction can move before the sw instruction?
They both use the $s1 register.
Before scheduling in the first iteration,we will get 0($s1)-> $t0, 4($s1)<- $t0+$s2.
However the scheduling result may be 0($s1)-> $t0, 0($s1)<- $t0+$s2
There are obviously different.
What I guess there is a magic in pipeline.
I don't know what the name is,so I call it "anti-data hazard".
Since addi instruction will write back in the 5th stage(WB), we could use the data hazard to make the sw instruction get the old data address($s1) in stage3.
(sw instruction will not trigger forwarding)
Is my guess right?Please tell me.

MIPS Pipeline Stalls: SW after LW

I am confused as to how a Store Word Instruction coming after an LW using the same $rt causes a pipeline stall in MIPS.
Consider this block of code:
lw $s0, 0($t0)
sw $s0, 12($t0)
lw $s1, 4($t0)
sw $s1, 16($t0)
lw $s2, 8($t0)
sw $s2, 20($t0)
Here 3 words are being shifted around in memory. For e.g in the first 2 lines, $s0 is loaded into ,
and then its contents are saved back in the memory. I'm not sure if the sw instruction required $s0 in EX stage or in MEM stage. if it is needed in MEM stage, wouldn't it be resolved just by forwarding without needing to stall the pipeline?
Hypothetically, yes. Forwarding into the MEM stage directly would make it possible to execute dependent LW and SW back-to-back. As long as the loaded word is stored by the SW at least. It wouldn't be possible to have the SW use that loaded word as the base of the address without a pipeline bubble, otherwise it would require forwarding back in time.
But typically you would see a pipeline such as below (source: a model of a 5 stage pipelined MIPS in SIM-PL), with only one forwarder which feeds into EX. With a setup like that, there is no way to forward from LW into SW, the hardware required for it isn't there.

how can I know number of instruction access?

I'm studying computer architecture. I'm confused about some quiz.
when executing n instructions in load-store arch.
lw $t0, 32($s3)
add $t0, $s2, $t0
sw $t0, 48($s3)
then what is number of memory access, and number of instruction access?
I think num of memory access is 2 and num of instruction access is 3. Is it right?
Yes it's right, Just for the sake of understanding it better here is some explanation.
MIPS using load word instruction lw to read data word from memory into register and
store word sw to write a word in memory.
lw $t0, 32($s3)
This load a word from memory into register $t0
add $t0, $s2, $t0
This means you are on the register side no memory involved.
sw $t0, 48($s3) This store a word in memory.
You are using 3 instruction which two of them involve with memory access

Write into next byte of a register

I understand that registers in memory are 32 bits. I also understand that lb will load contents from memory into the lower 8 bits of a register, and that if I did
lb $t1, $a3
lb $t1, 4($a3)
The second lb command will overwrite the contents loaded in the first. However, is there a way to write into the second byte of a register (loading from a different part in memory, so not two bytes right next to each other) and preserve the information of the first byte?
I am assuming what you want to use here is lbu (load byte unsigned) and not lb because you don't want the register to be sign extended (e.g. copying the byte AA in the register will result in 000000AA, and not FFFFFFAA).
If you want to write to the second byte of a register you can first use lbu to load the byte from memory to another register, then shift left 8 bits, and addu it to the original register.
For example:
lbu $t1, $a3 # 0x000000AA
lbu $t2, 4($a3) # 0x000000BB
sll $t2, $t2, 8 # 0x000000BB -> 0x0000BB00
addu $t1, $t1, $t2 # 0x000000AA + 0x0000BB00 = 0x0000BBAA

MIPS: Code Scheduling To remove Stalls

I have the following MIPS code:
addi $s1, $0, 10
lw $t0, 4($s0)
srl $t1, $t0, 1 [STALL becausee $t0 depends on lw's $t0]
add $t2, $t1, $s1 [STALL because $t1 depends on srl's $t1]
sw $t2, 4($s0)
How can I rearrange it to avoid any stalls. I see that all the 2 to 5 line's sequence can't change. We can only move the first line in between srl and add OR lw and srl. Any ideas?
There are 4 read after write (RAW) dependencies in your code: addi->add, lw->srl, srl->add, add->sw. These can't be fixed as you pointed out.
What you can do is move the addi instruction. I would think the best place to move this instruction would be after the lw because in the MIPS architecture all load instructions use a load delay slot. This means that the instruction immediately after the load does not have access to the contents of the load. If you are using this code in a simulator such as spim or MARS this may not be simulated, but assuming you mean to use the loaded value of $t0 in the srl instruction, your assembly above is actually incorrect. For this to work, there should be a nop in between the lw and srl.
For that reason, it would be best to move the addi in between the lw and srl so as to utilize the lw load delay slot.