MIPS Pipeline forwarding: How to forward to the second succeeding instruction? - mips

Say, for example, I have 3 instructions: 1, 2, and 3.
I want to forward data from instruction 1 to instruction 3. The catch is, I can only forward from the EX/MEM register of instruction 1.
So we have:
1: IF ID EX MEM WB
2: IF ID EX MEM WB
3: IF ID EX MEM WB
and I want to forward from EX/MEM of 1 to ID/EX of 3.
This is part of a homework problem, and apparently I need to stall an instruction. I don't see how this would help anything in the slightest, since it already makes no sense for me to forward data forward in time.
Problem in question:
Answer:
Thanks for any help

... since it already makes no sense for me to forward data forward in time.
Data can only be forwarded forwards in time, not backwards, since as of 2021, we still don't have time machines.
So, forwarding necessarily feeds information available in the processor generated just right now, to somewhere else in the processor, so it can be used in the future (i.e. the next cycle).
Forwarding and stalling are both ways to mitigate a RAW hazard — the idea is simply to get a value from where generated to where needed.
If the "where needed" is earlier in time than the "where generated", then a stall is required.  However, if the "where needed" is later in time than the "where generated" then a forward can mitigate the hazard.  The hazard is caused by the pipeline's assumption that "where needed" is the register file, which is incorrect in back to back operations.
Some hazards require both a forward and a stall, as the best that can be done.
But all hazards can be mitigated with sufficient stalling and without forwarding, though that will reduce performance.  With sufficient stalling, the "where needed" and the "where generated" can both be the register file.
The catch is, I can only forward from the EX/MEM register of instruction 1.  and I want to forward from EX/MEM of 1 to ID/EX of 3.
We cannot forward from EX/MEM of 1 directly to ID/EX of 3.  Why?  Because, ID/EX of 3 is two cycles further along, so the data we want to forward from there (EX/MEM of 1) is no longer there: it has moved down the pipeline to the next stage.  By the time that ID/EX of 3 wants that data, that data is now in MEM/WB (e.g. of 1) — EX/MEM at that time is doing instruction 2.

Related

How many instructions need to be killed on a miss-predict in a 6-stage scalar or superscalar MIPS?

I am working on a pipeline with 6 stages: F D I X0 X1 W. I am asked how many instructions need to be killed when a branch miss-predict happens.
I have come up with 4. I think this because the branch resolution happens in X1 and we will need to kill all the instructions that came after the branch. In the pipeline diagram, it looks like it would require killing 4 instructions that are in the process of flowing through the pipeline. Is that correct?
I am also asked how many need to be killed if the pipeline is a three-wide superscalar. This one I am not sure on. I think that it would be 12 because you can fetch 3 instructions at a time. Is that correct?
kill all the instructions that came after the branch
Not if this is a real MIPS. MIPS has one branch-delay slot: The instruction after a branch always executes whether the branch is taken or not. (jal's return address is the end of the delay slot so it doesn't execute twice.)
This was enough to fully hide the 1 cycle of branch latency on classic MIPS I (R2000), which used a scalar classic RISC 5-stage pipeline. It managed that 1 cycle branch latency by forwarding from the first half of an EX clock cycle to an IF starting in the 2nd half of a clock cycle. This is why MIPS branch conditions are all "simple" (don't need carry propagation through the whole word), like beq between two registers but only one-operand bgez / bltz against an implicit 0 for signed 2's complement comparisons. That only has to check the sign bit.
If your pipeline was well-designed, you'd expect it to resolve branches after X0 because the MIPS ISA is already limited to make low-latency branch decision easy for the ALU. But apparently your pipeline is not optimized and branch decisions aren't ready until the end of X1, defeating the purpose of making it run MIPS code instead of RISC-V or whatever other RISC instruction set.
I have come up with 4. I think this because the branch resolution happens in X1 and we will need to kill all the instructions that came after the branch.
I think 4 cycles looks right for a generic scalar pipeline without a branch delay slot.
At the end of that X1 cycle, there's an instruction in each of the previous 4 pipeline stages, waiting to move to the next stage on that clock edge. (Assuming no other pipeline bubbles). The delay-slot instruction is one of those and doesn't need to be killed.
(Unless there was an I-cache miss fetching the delay slot instruction, in which case the delay slot instruction might not even be in the pipeline yet. So it's not as simple as killing the 3 stages before X0, or even killing all but the oldest previous instruction in the pipeline. Delay slots are not free to implement, also complicating exception handling.)
So 0..3 instructions need to be killed in pipeline stages from F to I. (If it's possible for the delay-slot instruction to be in one of those stages, you have to detect that special case. If it isn't, e.g. I-cache miss latency long enough that it's either in X0 or still waiting to be fetched, then the pipeline can just kill those first 3 stages and do something based on X0 being a bubble or not.)
I think that it would be 12 because you can fetch 3 instructions at a time
No. Remember the branch itself is one of a group of 3 instructions that can go through the pipeline. In the predict-not-taken case, presumably the decode stage would have sent all 3 instructions in that fetch/decode group down the pipe.
The worst case is I think when the branch is the first (oldest in program order) instruction in a group. Then 1 (or 2 with no branch delay slot) instructions from that group in X1 have to be killed, as well as all instructions in previous stages. Then (assuming no bubbles) you're cancelling 13 (or 14) instructions, 3 in each previous stage.
The best case is when the branch is last (youngest in program order) in a group of 3. Then you're discarding 11 (or 12 with no delay slot).
So for a 3-wide version of this pipeline with no delay slot, depending on bubbles in previous pipeline stages, you're killing 0..14 instructions that are in the pipeline already.
Implementing a delay slot sucks; there's a reason newer ISAs don't expose that pipeline detail. Long-term pain for short-term gain.

Organizing pipeline in MIPS

I'm unsure about how the following properties affect pipeline execution for a 5 stage MIPS design (IF, ID, EX, MEM, WB). I just need some clearing up.
only 1 memory port
no data fowarding.
Branch stalls until end of * stage
Does the 1 memory port mean we cannot fetch or write when we read/write to mem (i.e. MEM stage on lw,sw you can't enter IF or another MEM)?
With no forwarding does this means an instruction won't enter the ID stage until after or on the WB stage for the previous instruction it depends on?
Idk what the branch stall means
A common assumption is that you can write in the first half of a cycle, and read in the second half of a cycle.
Lets say than I1 is your first instruction and I2 your second instruction, and I2 is using a register that I1 is modifying.
Only 1 memory port.
This means that you cannot read or write memory at the same time in two different stages of the pipelines.
For instance, if I1 is at the MEM stage, another instruction cannot be at the IF stage at the same time, because both require memory access.
No data forwarding. Data forwarding reflects to the fact that at the end of EX stage for I1, you forward the data to the ID cycle of I2.
Consequently, no forwarding means that the pipeline has to wait for the WB stage of the I1 to go to ID stage of I2. With the asumption, you can go to ID stage at the same time as the WB stage of the previous instruction, because WB will write to memory during the first half of the cycle, and ID will read from memory during the second half of the cycle.
Branch stalls until end of EX stage. This is a common asumption, that doesn't use branch prediction techniques. It simply states that an instruction after a branch has to wait until the end of EX stage to start ID stage. Recall that the address of the next instruction to be executed is known only at the EX stage of the branch instruction.
Comment: IF and MEM access separate sections of memory. One is data memory (.data) and the other instruction memory (.code or .text). It is designed this way so that accessing memory during IF and MEM does not cause a structural stall.
The area which is used by .data is that used by the stack, with the stack is traditionally placed right at the "end" of the .data sector. This is why if you don't subtract from the stack pointer address before saving data to the stack you run the risk of overwriting your program code. As MIPS allows you to designate the stack address manually, some people choose put the stack a bit "before" the end in order to avoid problems if they know they will have space and not overwrite variables in MEM. For instance, placing the stack at 0x300 instead of 0x400 in WinMIPS64. I am not sure if that is good practice or not. But I have heard of people doing it.

Fixing load-use hazard issue in pipeline (MIPS)

I have been working on some low level programming with 5 stage pipelining. But I hit a snag.
Assuming this diagram http://i.imgur.com/7kTFi.png
and the mips code:
lw $4,1000($6)
sw $4,2000($6)
what would actually happen? I assumed there would be bubbles, i counted two bubbles proceeding the ID stage.
Can we fix it by adding inputs to the new forwarding unit? Where can I add mux's and new datapaths to to avoid bubble+errors?
You are right, there would be two bubbles.
Assuming data forwarding:
1. IF ID EX MM WB
2. IF S S ID EX MM WB
(S means stall or bubble).
There is no way you can 'fix' it because in any case you have to wait for the end of the MM stage to have the value at 1000($6). It could even be worse without data forwarding where you would have to wait until the WB stage, meaning 3 stalls.
Only way to prevent such behaviour is to have smart compilers that will schedule those two instructions in a different way (i.e space them by adding others in-between).
Note that as it is, the program has no real purpose (get value at memory address [1000+Regs[$6]], and copy it at address [2000+Regs[$6]])
You need MEM to ALU forwarding

Designing A Simplified MIPS Processor

So I'm going over some old quizzes for my Computer Organization final and I must have missed this lecture or something. I'm decently proficient in programming MIPS, but this problem has me completely stumped. Could someone help me understand this?
The diagram is missing lines connecting the various parts of the processor as well as a multiplexor for determining if the next instruction address is coming from the PC+4 or from a register as in a jr ra instruction.
There needs to be a line from the ALU to the write data portion of the register file. This is for R-Type instructions, as their result will need to be written back to the destination register. Going into the ALU needs to be lines from Read data 1 and Read data 2, this is how the values from the registers make their way into the ALU for R-type instructions.
A couple lines have been added from the instruction memory to the registers, you're missing the one to the Write Register though (this specifies the destination register for R-type instructions).
For the PC, the line going into the adder goes into the other input (the above one). The 4 for the adder is constant, as each instruction address is 4 bytes after the previous one, so unless we're jumping to an address we will be executing the instruction immediately after the current instruction. The line from the PC to the read address is also necessary as the PC specifies which instruction address to find the current instruction at. The line going into the PC comes from either the result of the PC+4 addition or from the register specified in a jr ra instruction.
To handle this decision a multiplexor is needed. Multiplexors have two inputs and one output, so this one will have one input from the PC+4 adder (for regular R-type instructions) and another from Read Data 1 (for jr ra instructions). The input from Read Data 1 should be visualized as a split from the line between Read Data 1 and the ALU. The output will go right back to the PC, as it determines the next instruction to execute.
I think that's everything that's wanted for that question, as the prof specifies that control signals are already generated (the multiplexor is a type of control unit, but I think it's necessary nonetheless). Hope that helps!

get wrong epc on MIPS

I know MIPS would get wrong epc register value when it happens at branch delay, and epc = fault_address - 4.
But now, I often get the wrong EPC value which is even NOT in .text segment such as 0xb6000000, what's wrong with the case??
Thanks for your advance..
The CPU does not know anything about the boundaries of the .text region in your program. It simply implements a 2^32 byte address space.
It is possible for an incorrectly programmed jump to go to any address within the 2^32 byte address space. The jump instruction itself will not cause any sort of exception - in fact the MIPS32® Architecture for Programmers Volume II: The MIPS32® Instruction Set explicitly states that jump (J, JR, JALR) instructions do not trigger any exceptions.
When the processor starts executing from the destination of an incorrectly programmed jump, in presumably uninitialized memory, what happens next depends on the contents of that memory. If uninitialized memory is filled with "random" data, that data will be interpreted as instructions which the processor will execute until an illegal instruction is found, or until an instruction triggers some other exception.