Fixing load-use hazard issue in pipeline (MIPS) - mips

I have been working on some low-level programming with 5-stage pipelining, but I hit a snag.
Assuming this diagram http://i.imgur.com/7kTFi.png
and the MIPS code:
lw $4,1000($6)
sw $4,2000($6)
What would actually happen? I assumed there would be bubbles; I counted two bubbles preceding the ID stage.
Can we fix it by adding inputs to the new forwarding unit? Where can I add muxes and new datapaths to avoid bubbles and errors?

You are right, there would be two bubbles.
Assuming data forwarding:
1. IF ID EX MM WB
2.    IF S  S  ID EX MM WB
(S means stall or bubble).
There is no way you can 'fix' it, because in any case you have to wait for the end of the MM stage to have the value at 1000($6). It could be even worse without data forwarding, where you would have to wait until the WB stage, meaning 3 stalls.
The only way to prevent such behaviour is to have a smart compiler schedule those two instructions differently (i.e. space them apart by placing other instructions in between).
Note that as it is, the program has no real purpose (get value at memory address [1000+Regs[$6]], and copy it at address [2000+Regs[$6]])

You need MEM to ALU forwarding
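As a rough check of the bubble counts discussed above, the following sketch counts stalls for a producer/consumer pair from the stage after which the value exists and the stage in which it is needed. The stage numbering, the function name `stalls_needed` and the four scenarios are assumptions made for illustration; they model a generic five-stage pipeline, not necessarily the datapath in the linked diagram.

```python
# Stage positions inside one instruction (hypothetical numbering for this sketch).
STAGE = {"IF": 0, "ID": 1, "EX": 2, "MEM": 3, "WB": 4}

def stalls_needed(ready_after, needed_in, distance=1):
    """Bubbles needed when a consumer follows its producer by `distance` instructions.

    ready_after: producer stage at whose end the value exists
    needed_in:   consumer stage that must start with the value in hand
    """
    ready_cycle = STAGE[ready_after] + 1        # first cycle the value can be used
    wanted_cycle = distance + STAGE[needed_in]  # cycle the consumer wants it, unstalled
    return max(0, ready_cycle - wanted_cycle)

# lw $4,1000($6) followed immediately by sw $4,2000($6)
print(stalls_needed("WB", "ID"))    # 3: no forwarding, wait until after WB
print(stalls_needed("MEM", "ID"))   # 2: loaded value handed to ID after MEM
print(stalls_needed("MEM", "EX"))   # 1: loaded value forwarded to the EX stage
print(stalls_needed("MEM", "MEM"))  # 0: loaded value forwarded to the store's MEM stage
```

The first two results match the 3-stall and 2-bubble counts given above. The last two show how the count would drop if the datapath had a forwarding path that delivers the loaded value later in the consumer's pipeline, which is what adding muxes and forwarding inputs would have to achieve.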

Related

MIPS Pipeline forwarding: How to forward to the second succeeding instruction?

Say, for example, I have 3 instructions: 1, 2, and 3.
I want to forward data from instruction 1 to instruction 3. The catch is, I can only forward from the EX/MEM register of instruction 1.
So we have:
1: IF ID EX MEM WB
2: IF ID EX MEM WB
3: IF ID EX MEM WB
and I want to forward from EX/MEM of 1 to ID/EX of 3.
This is part of a homework problem, and apparently I need to stall an instruction. I don't see how this would help anything in the slightest, since it already makes no sense for me to forward data forward in time.
Problem in question:
Answer:
Thanks for any help
"... since it already makes no sense for me to forward data forward in time."
Data can only be forwarded forwards in time, not backwards, since as of 2021, we still don't have time machines.
So, forwarding necessarily feeds information that the processor has just generated to somewhere else in the processor, so it can be used in the future (i.e. the next cycle).
Forwarding and stalling are both ways to mitigate a RAW hazard — the idea is simply to get a value from where generated to where needed.
If the "where needed" is earlier in time than the "where generated", then a stall is required.  However, if the "where needed" is later in time than the "where generated" then a forward can mitigate the hazard.  The hazard is caused by the pipeline's assumption that "where needed" is the register file, which is incorrect in back to back operations.
Some hazards require both a forward and a stall, as the best that can be done.
But all hazards can be mitigated with sufficient stalling and without forwarding, though that will reduce performance.  With sufficient stalling, the "where needed" and the "where generated" can both be the register file.
"The catch is, I can only forward from the EX/MEM register of instruction 1. ... and I want to forward from EX/MEM of 1 to ID/EX of 3."
We cannot forward from EX/MEM of 1 directly to ID/EX of 3. Why? Because ID/EX of 3 is two cycles further along, so the data we want to forward (held in EX/MEM for instruction 1) is no longer there: it has moved down the pipeline to the next stage. By the time that ID/EX of 3 wants that data, it is in MEM/WB (for instruction 1); EX/MEM at that time holds instruction 2.
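The timing argument above can be checked with a small sketch that tracks which instruction occupies each stage in a given cycle. It assumes one instruction is fetched per cycle with no stalls, and that a pipeline register such as EX/MEM holds the result of whichever instruction is currently in the stage it feeds; the helper names `stage_of` and `occupant` are made up for this illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_of(instr, cycle):
    """Stage of instruction `instr` in `cycle` (both 1-based, one fetch per cycle, no stalls)."""
    idx = cycle - instr
    return STAGES[idx] if 0 <= idx < len(STAGES) else None

def occupant(stage, cycle, n_instr=3):
    """Which instruction is in `stage` during `cycle`, or None if the stage is empty."""
    for instr in range(1, n_instr + 1):
        if stage_of(instr, cycle) == stage:
            return instr
    return None

ex_cycle_of_3 = 3 + STAGES.index("EX")   # cycle in which instruction 3 is in EX
print(ex_cycle_of_3)                     # 5
print(occupant("MEM", ex_cycle_of_3))    # 2: EX/MEM feeds MEM, so it holds instruction 2's result
print(occupant("WB", ex_cycle_of_3))     # 1: MEM/WB feeds WB, so it holds instruction 1's result
```

Under these assumptions, when instruction 3 reaches EX the EX/MEM register already belongs to instruction 2, so a design restricted to EX/MEM forwarding has to fall back on stalling.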

MIPS pipeline registers length (IF/ID, ID/EX, EX/MEM, MEM/WB)

I am currently studying for my Computer Architecture exam and came across a question that asks to illustrate (bit by bit, I would assume) the values contained in the MIPS pipeline architecture after the 3rd stage of the sub (before the clock commutes), given the following instructions.
add $t0,$t1,$t2
sub $t3,$t3,$t5
beq $t6,$t0,16
add $t0,$t1,$t3
I am not asking for the solution to this problem; however, after some research I haven't had much success wrapping my mind around it, so I am asking for some help/advice.
Firstly, I still don't have a clear understanding of the size of the pipeline registers (IF/ID, ID/EX, EX/MEM, MEM/WB). I do understand that they contain the control unit codes for the next stages and that they contain the result of the previous stage so that it can be passed on to the next one.
So that would be (please correct me if I'm wrong) +9 for ID/EX, +5 for EX/MEM and +2 for MEM/WB, but I haven't managed to find a clear schema of the data that we can expect these registers to contain.
Also, I figure that we would need to use HW forwarding to forward the result of the first add to beq (because of $t0) and to forward the result of sub to the last add (because of $t3). Does this factor into what is contained in the registers?
It would be great if someone could point me in the right direction.
Thanks lots.
The purpose of each of these intermediate registers is to hold data that might be needed in the immediate next stage or in later stages. I'll discuss one possible design, but there are really many possible designs as I'll explain.
In the fetch stage, the next instruction to be executed (to which the current PC points) is fetched from memory and the PC is updated to point to the next instruction to fetch. Therefore, IF/ID would include one 4-byte field to hold the fetched instruction. There are two ways to calculate the new PC: current PC + 4, or PC + 4 + offset in case of a branch. If the fetched instruction is itself a branch instruction, then we would need to pass the new PC so that the branch target address can be calculated in the EX stage. We can add a 4-byte field in IF/ID to hold the new PC value to be passed to the EX stage through the ID stage.
In the decode stage, the opcode and its operands are determined. The opcode is at a fixed location in the instruction in MIPS. A MIPS instruction may operate on a single source register, two source registers, one source register and a sign-extended 32-bit immediate value, a sign-extended 32-bit immediate value, or no operands. We can either prepare only the required operands for the EX stage based on the opcode or prepare all the operands that might be required for any opcode. The latter design is simpler, but it requires a larger ID/EX register. In particular, two 4-byte fields are required to hold two possible source register values (the values are read from the register file in the decode stage) and a 4-byte field for the possible sign-extended immediate value. No opcode will require all of these fields, but let's prepare all of them anyway and store them at fixed locations in the ID/EX register. It simplifies the design.
We also need to pass the new PC value calculated in the fetch stage to the execute stage, just in case the opcode turns out to be a branch. The branch target address is calculated relative to this new PC value (the PC of the instruction following the branch in static program order). There are two possible designs here: either add a bus from the new PC field in IF/ID to the EX stage or add a field in ID/EX to hold the new PC value, which can then be accessed in the EX stage. The latter design adds a 4-byte field in ID/EX.
The EX stage requires the opcode from the ID stage. We can choose to pass only the opcode rather than the whole instruction. But then later stages might require other parts of the instruction. Generally, in RISC pipelines, it is preferable to make the whole instruction available to all stages. In this way, all parts of an instruction are already available when changes are made to any stage of the pipeline in the future. So let's add a 4-byte field to ID/EX to hold the instruction.
The EX stage reads the operands and the opcode from the ID/EX register (the opcode is part of the instruction) and performs the operation specified by the opcode. The EX/MEM register has to be big enough to hold all possible results, which might include the following: a 4-byte value computed by the ALU resulting from an arithmetic or logic operation, a 4-byte value representing the calculated effective address for a memory load or store operation, a 4-byte value representing the branch target address in case of a branch instruction, and a 1-bit condition in case of a conditional branch instruction. We can use a single 4-byte field in EX/MEM for the result (whatever it represents) and a 1-bit field for the condition. In addition, as before, we need a 4-byte field to hold the instruction. Also for store instructions, we need another 4-byte field to hold the value to be stored. One possible alternative design here is that rather than storing the 1-bit condition and 4-byte branch target address in EX/MEM, they can be passed directly to the IF stage.
In the MEM stage, in case of a branch instruction, the branch target address and the branch condition are passed back from EX/MEM to the IF stage to determine the new PC. In case of a memory store operation, the operation is performed and there is no result to be passed to any stage. In case of a memory load operation, the 4-byte value is fetched from memory and stored in a field in the MEM/WB register. In case of an ALU operation, the 4-byte result will just be passed to a field in the MEM/WB register. In addition, as before, we need a 4-byte field in MEM/WB to hold the instruction.
Finally, in the WB stage, the 4-byte result whether loaded from memory or computed by the ALU is stored in the destination register. This only occurs for instructions that produce results. Otherwise, the WB stage can be skipped.
In summary, in the design I've discussed, the sizes of intermediate registers are as follows: IF/ID is 8 bytes in size, ID/EX is 20 bytes in size, EX/MEM is 97 bits in size (three 4-byte fields plus the 1-bit condition), and MEM/WB is 8 bytes in size.
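Tallying the fields named in the paragraphs above gives the totals for this particular design. The field names in the sketch are made up for illustration, and control signals are left out because the description above does not assign them widths.

```python
# Field widths in bits, taken from the design described above (control signals excluded).
FIELDS = {
    "IF/ID":  {"instruction": 32, "new_pc": 32},
    "ID/EX":  {"rs_value": 32, "rt_value": 32, "immediate": 32, "new_pc": 32, "instruction": 32},
    "EX/MEM": {"result": 32, "condition": 1, "instruction": 32, "store_value": 32},
    "MEM/WB": {"result": 32, "instruction": 32},
}

def width_bits(register):
    """Total width of one pipeline register: the sum of its field widths."""
    return sum(FIELDS[register].values())

for name in FIELDS:
    bits = width_bits(name)
    print(f"{name}: {bits} bits ({bits // 8} bytes + {bits % 8} bits)")
```

With these fields, IF/ID and MEM/WB come to 64 bits each, ID/EX to 160 bits, and EX/MEM to 97 bits because of the extra 1-bit branch condition.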
The design decision of whether a field is required in an intermediate register to hold some value or whether it can be passed directly in the same stage to the logic that requires the value is a "circuit-level" decision. If the signals can be guaranteed to not be corrupted, and if it is feasible or convenient to add a dedicated bus, they can be directly connected.

MIPS Datapath Confusion

Been learning about the MIPS datapath and had a couple of questions.
Why is there a writeback stage?
-Thoughts: If it didn't add more latency or make the clock cycles longer it seems like you could move the mux in the writeback stage into the Mem stage and remove the Mem/Writeback buffer and get rid of the writeback stage entirely. Why is this not the case?
Confusion about branch prediction and stalls.
-Thoughts: If an add instruction follows a beq instruction into the pipeline (beq in ID stage, add in fetch stage) but the branch is taken, how does the add instruction then become converted to a no-op? (What control signals are set, how?)
When are the inter-stage buffers updated?
-Thoughts: I think they are updated at the end of the clock cycle but have been unable to verify this. Also, I am trying to understand what exactly happens during a stall. When a stall is needed, does the IF/ID inter-stage buffer get locked? If so, how is this done? Does the instruction then read from the buffer to determine what instruction should be in the ID stage?
Thanks for any help
Here's a picture of the pipeline:
Writeback stage is for writing the result back to registers. MEM/WB buffer is there to hold any data from the previous stage. By getting rid of the writeback stage, what you'll be doing is essentially extending the mem stage. For example in an instruction like,
LW R1, 8(R2)
contents of the memory location addressed by 8(R2) will be stored in the MEM/WB buffer. By copying the contents to the buffer, MEM stage can now accept another LW instruction, hence more ILP.
@Craig Estey has answered this correctly. However, even if you don't do the swapping @Craig has mentioned, you can always use control signals to flush the following instructions in the IF and ID stages.
I am not sure there is a precise answer as to when an inter-stage buffer is updated. The way I see it is, at the beginning of a clock cycle, data in the inter-stage buffer is not relevant and at the end of a clock cycle it is relevant. Control signals are used to control what is happening in each stage of the pipeline, meaning they can be used to tell the IF stage not to fetch any new instruction.
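One way to picture the control signals mentioned above is to model an inter-stage buffer as an edge-triggered register with a write-enable input (for stalls) and a flush input (for squashing an instruction). This is only a sketch of one common arrangement; the class name `PipelineRegister` and the signal names `write_enable` and `flush` are hypothetical, and instructions are represented as plain strings.

```python
NOP = "nop"  # placeholder for a no-op; instructions are plain strings in this sketch

class PipelineRegister:
    """Inter-stage buffer that samples its input once per clock edge."""

    def __init__(self):
        self.value = NOP

    def clock_edge(self, next_value, write_enable=True, flush=False):
        if flush:
            self.value = NOP          # squash: the next stage sees a no-op
        elif write_enable:
            self.value = next_value   # normal operation: latch the new instruction
        # otherwise the old value is held (stall)
        return self.value

if_id = PipelineRegister()
if_id.clock_edge("beq $6,$0,16")                             # beq moves into ID
print(if_id.clock_edge("add $3,$4,$2", write_enable=False))  # stall: beq stays in ID
print(if_id.clock_edge("add $3,$4,$2", flush=True))          # taken branch: add becomes nop
```

In this model a stall is simply a clock edge on which the buffer is not allowed to load, and converting the add into a no-op is a clock edge on which the buffer loads a no-op instead of the fetched instruction.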

Organizing pipeline in MIPS

I'm unsure about how the following properties affect pipeline execution for a 5 stage MIPS design (IF, ID, EX, MEM, WB). I just need some clearing up.
only 1 memory port
no data forwarding.
Branch stalls until end of * stage
Does the 1 memory port mean we cannot fetch or write when we read/write to mem (i.e. MEM stage on lw,sw you can't enter IF or another MEM)?
With no forwarding, does this mean an instruction won't enter the ID stage until after or on the WB stage for the previous instruction it depends on?
Idk what the branch stall means
A common assumption is that you can write in the first half of a cycle, and read in the second half of a cycle.
Let's say that I1 is your first instruction and I2 your second instruction, and I2 is using a register that I1 is modifying.
Only 1 memory port.
This means that you cannot read or write memory at the same time in two different stages of the pipelines.
For instance, if I1 is at the MEM stage, another instruction cannot be at the IF stage at the same time, because both require memory access.
No data forwarding. Data forwarding refers to the fact that at the end of the EX stage for I1, you forward the data to the ID stage of I2.
Consequently, no forwarding means that the pipeline has to wait for the WB stage of I1 before I2 can go to its ID stage. With the assumption above, I2 can be in its ID stage at the same time as I1 is in its WB stage, because WB will write to the register file during the first half of the cycle, and ID will read from the register file during the second half of the cycle.
Branch stalls until end of EX stage. This is a common assumption for a pipeline that doesn't use branch prediction techniques. It simply states that an instruction after a branch has to wait until the end of the branch's EX stage to start its ID stage. Recall that the address of the next instruction to be executed is known only at the EX stage of the branch instruction.
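The single-memory-port constraint described above can be sketched as a small fetch scheduler. It assumes a unified memory whose one port is claimed by lw/sw in their MEM stage, that the memory access has priority over instruction fetch, and that there are no other hazards, so every instruction reaches MEM three cycles after it is fetched. The function name `fetch_cycles` is made up for this example.

```python
def fetch_cycles(program):
    """Cycle (1-based) in which each instruction is fetched with one shared memory port.

    program: list of mnemonics; only lw and sw use the memory port in their MEM stage.
    """
    mem_busy = set()   # cycles in which the port is taken by a load or store
    cycles = []
    cycle = 1
    for op in program:
        while cycle in mem_busy:   # port in use: the fetch waits (structural stall)
            cycle += 1
        cycles.append(cycle)
        if op in ("lw", "sw"):
            mem_busy.add(cycle + 3)   # MEM is three stages after IF
        cycle += 1
    return cycles

print(fetch_cycles(["lw", "add", "add", "add", "add"]))  # [1, 2, 3, 5, 6]
print(fetch_cycles(["lw", "sw", "add", "add", "add"]))   # [1, 2, 3, 6, 7]
```

In the first run the lw is in MEM during cycle 4, so the fourth instruction cannot be fetched until cycle 5; in the second run the lw and sw occupy the port in cycles 4 and 5, pushing that fetch to cycle 6.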
Comment: IF and MEM access separate sections of memory. One is data memory (.data) and the other instruction memory (.code or .text). It is designed this way so that accessing memory during IF and MEM does not cause a structural stall.
The area which is used by .data is also used by the stack, with the stack traditionally placed right at the "end" of the .data sector. This is why, if you don't subtract from the stack pointer address before saving data to the stack, you run the risk of overwriting your program code. As MIPS allows you to designate the stack address manually, some people choose to put the stack a bit "before" the end in order to avoid problems, if they know they will have space and will not overwrite variables in MEM. For instance, placing the stack at 0x300 instead of 0x400 in WinMIPS64. I am not sure if that is good practice or not. But I have heard of people doing it.

Pipelining -Mips instructions

I am confused about pipelining of MIPS instructions. Any help would be great. Thanks in advance.
What are the data dependencies in the next two code sequences? Which of them can be solved by using a stall (bubble) or forwarding? You can use shape 1 for convenience.
shape 1:
If-Id-Ex-Mem-Wb
explanation:
if=instruction fetch
id=instruction decode register fetch
ex=execute
mem=memory access
wb=write back
code 1:
add $3,$4,$2
sub $5,$3,$1
lw $6,200($5)
sw $6,200($2)
lw $6,200($3)
add $7,$4,$6
code 2:
add $3,$4,$2
sub $5,$3,$1
lw $6,200($3)
add $7,$3,$6
(sorry for bad post,but i can't yet post an image)
Thanks.
Let's see the first one:
add $3,$4,$2
sub $5,$3,$1
The result from add is used in sub, therefore there is a data hazard. We have to insert a number of NOPs to resolve it. Assuming all instructions take 5 cycles, we insert 3 NOPs and we're done.
add $3,$4,$2 IF ID EX MEM WB
sub $5,$3,$1 NOP NOP NOP IF ID EX MEM WB
We can do this for all subsequent instructions. Now instructions produce new values in the EX and MEM stages. Those values are not written to a register until the WB stage (for learning purposes, let's assume that's true). Since the registers are read in the ID stage, this leaves a window of three cycles for which old incorrect values are "flowing" through the pipeline. Forwarding can help cure this problem in our case above - forward the result from add:EX to sub:ID.
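To repeat the same reasoning for the remaining instructions, it helps to list every read-after-write dependency first. The sketch below does that for the two code sequences in the question. The parser only understands the add/sub/lw/sw forms with numeric register names used here, and the function names `parse` and `raw_dependencies` are made up for the example.

```python
import re

def parse(line):
    """Return (dest, sources) for the add/sub/lw/sw forms used in this thread."""
    op, args = line.split(None, 1)
    regs = re.findall(r"\$\d+", args)
    if op == "sw":
        return None, regs        # sw reads both of its registers and writes none
    return regs[0], regs[1:]     # otherwise the first register is the destination

def raw_dependencies(program):
    """List (producer_index, consumer_index, register) for each read-after-write pair."""
    deps = []
    last_writer = {}             # register -> index of the latest instruction writing it
    for i, line in enumerate(program):
        dest, sources = parse(line)
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps

code1 = ["add $3,$4,$2", "sub $5,$3,$1", "lw $6,200($5)",
         "sw $6,200($2)", "lw $6,200($3)", "add $7,$4,$6"]
code2 = ["add $3,$4,$2", "sub $5,$3,$1", "lw $6,200($3)", "add $7,$3,$6"]

print(raw_dependencies(code1))
print(raw_dependencies(code2))
```

Each tuple names the producing instruction, the consuming instruction (both counted from 0) and the register involved. Whether a given pair actually needs a stall or can be covered by forwarding then depends on how far apart the two instructions are and on whether the producer is a load.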
Hope this helps.