Why are the RegDst control signal and the associated mux placed in the execution stage instead of the instruction decode stage? - mips

According to the book Computer Organization and Design by Patterson and Hennessy (5th edition) page 304, the RegDst control signal is being used in the execution stage of the datapath.
It acts as the selection bit for the mux that chooses the destination register from one of the two register addresses passed in the ID/EX pipeline register.
Why are the RegDst control signal and the associated mux not placed in the instruction decode stage (the stage immediately before the execution stage)? That way, only the selected destination register address, taken from bits [20-16] or [15-11] of the instruction, would need to be stored in the ID/EX pipeline register, instead of both of them.
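For reference, the selection described above can be sketched in Python. This only illustrates what the mux computes, not which stage it belongs in; the function name `select_write_register` and the two example encodings are assumptions made for this sketch.

```python
def select_write_register(instruction: int, reg_dst: int) -> int:
    """Model the RegDst mux: pick the destination register number."""
    rt = (instruction >> 16) & 0x1F  # bits [20-16]
    rd = (instruction >> 11) & 0x1F  # bits [15-11]
    return rd if reg_dst else rt


# add $1, $2, $3 (R-type): the destination is rd, so RegDst = 1
print(select_write_register(0x00430820, reg_dst=1))
# lw $5, 0($6) (load): the destination is rt, so RegDst = 0
print(select_write_register(0x8CC50000, reg_dst=0))
```

Both candidate fields are 5 bits wide, so moving the mux into ID would remove 5 bits from the ID/EX register.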

Related

Are there some practical examples that illustrate pipelining in datapath and control?

Pipelining the datapath simply means (conceptually) dividing the resources into stages. But does pipelining the control mean that the resources at each pipeline stage get their own separate control signals?
For instance, most RISC architectures have 5 pipeline stages; does the MEM stage have its own control signal for load or store?
Are there some practical examples of control pipelining?
In a classic 5-stage pipeline, each stage of the pipe has inputs that come from the previous stage (except the first one, of course), and each stage of the pipe has outputs that go to the next stage (except the last one, of course).  It stands to reason that these inputs & outputs are comprised of both data and control signals.
The EX stage needs to know what ALU operation to perform (control: ALUOp) and the ALU input operands (data).
The MEM stage needs to know whether to read memory (control: MemRead) or to write memory (control: MemWrite) (plus size & type for extension, usually glossed over) and where to read (data: Address) and what to write (data: Write Data).
The WB stage needs to know whether to write a register (control: RegWrite) and what register to write (data: Write Register) and what value to write to the register (data: Write Data).
In the single-cycle processor, all these control signals are generated by a lookup on the opcode during decode.  When the processor is pipelined, either those signals are forwarded from one stage to the next, or else each stage would have to repeat the lookup using the opcode. In that case the opcode would need to be forwarded from one stage to the next so that each stage could repeat the lookup (though it is possible that the opcode is forwarded anyway, perhaps for exceptions).  I believe that repeating the lookup in each stage would incur costs (time & hardware) compared with forwarding the control signals, especially for WB, which is supposed to execute in the first half of a cycle.
Because the WB stage needs to know whether to write a register, that information (control: RegWrite) must be passed to it from the MEM stage, which gets it from the EX stage, which gets it from the ID stage, where it is generated by lookup of the opcode.  EX & MEM don't use the RegWrite control signal, but must accept it as an input so as to pass it through as output to the next stage.
The same is true for the control signals needed by MEM: MemRead and MemWrite are generated in ID and passed through EX (which does not use them) to MEM. MEM need not pass them any further, since WB does not use those signals either.
If you look in chapter 4 of Computer Organization and Design RISC-V edition, towards the end of the chapter (Fig 4.44 in the 1st edition), it shows the control signals output from one stage passing through stage pipeline registers and into the next intermediate stage. For example, Instruction [30, 14-12] is fed into ID/EX and then read by ALU Control in the EX stage. That is an example of pipelining a control signal.
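The idea of control signals travelling with an instruction can be sketched as a toy Python model. It is a minimal illustration under stated assumptions: the `Control` class, the `DECODE` table and the `simulate` function are hypothetical names, only a handful of signals are modelled, and the whole bundle is carried to the end for brevity, whereas real hardware drops a signal once the consuming stage has used it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    alu_op: str = "nop"      # consumed in EX
    mem_read: bool = False   # consumed in MEM
    mem_write: bool = False  # consumed in MEM
    reg_write: bool = False  # consumed in WB


# Hypothetical decode table standing in for the opcode lookup done in ID.
DECODE = {
    "lw": Control(alu_op="add", mem_read=True, reg_write=True),
    "sw": Control(alu_op="add", mem_write=True),
    "add": Control(alu_op="add", reg_write=True),
}


def simulate(program):
    """Push instructions through ID/EX, EX/MEM and MEM/WB, logging signal use.

    Cycle 1 is the ID stage of the first instruction (IF is not modelled).
    """
    id_ex = ex_mem = mem_wb = None  # pipeline registers: (name, Control)
    log = []
    # Three extra empty cycles drain the pipeline.
    for cycle, name in enumerate(list(program) + [None] * 3, start=1):
        if mem_wb:
            log.append((cycle, "WB", mem_wb[0],
                        {"RegWrite": mem_wb[1].reg_write}))
        if ex_mem:
            log.append((cycle, "MEM", ex_mem[0],
                        {"MemRead": ex_mem[1].mem_read,
                         "MemWrite": ex_mem[1].mem_write}))
        if id_ex:
            log.append((cycle, "EX", id_ex[0], {"ALUOp": id_ex[1].alu_op}))
        # Clock edge: the only opcode lookup happens here, in ID.
        decoded = (name, DECODE[name]) if name else None
        mem_wb, ex_mem, id_ex = ex_mem, id_ex, decoded
    return log


for entry in simulate(["lw", "add"]):
    print(entry)
```

In this model RegWrite for `lw` is produced once in ID and only read three cycles later in WB, having been copied through each pipeline register on the way.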

ARM interrupts and context saving

I am trying to understand how interrupts work in an ARM architecture (ARM7TDMI to be specific). I know that there are seven exceptions (Reset, Data Abort, FIQ, IRQ, Prefetch Abort, SWI and Undefined Instruction) and that they execute in particular modes (Supervisor, Abort, FIQ, IRQ, Abort, Supervisor and Undefined respectively). I have the following questions.
1. When the I and F bits in the CPSR (status register) are set to 1 to disable external and fast interrupts, are the other 5 exceptions also disabled?
2. If the SWI is not disabled when the I and F bits are set, is it possible to intentionally trigger a SWI exception within the ISR of an external interrupt?
3. When any interrupt is triggered, saving the CPSR to the SPSR and changing the mode are done by the processor itself. So, is it enough to write the ISR handler function and update the vector table with the handler addresses (I don't want to save the r0 to r12 general purpose registers)?
4. Whenever the mode of execution is changed, does context saving happen internally in the processor (even when we change the mode manually)?
5. How can a SWI exception be masked/disabled?
Thank you.
When the I and F bits in the CPSR (status register) are set to 1 to disable external and fast interrupts, are the other 5 exceptions also disabled?
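No. The I and F bits mask only IRQ and FIQ. The other exceptions are caused by the code itself, so avoiding them depends on your code being correct. For instance, a compiler will not normally generate an swi instruction.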
If the SWI is not disabled when the I and F bits are set, is it possible to intentionally trigger a SWI exception within the ISR of an external interrupt?
Yes, it is possible. You may check the mode of the SPSR in your swi handler and abort (or whatever is appropriate) if you want.
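As a rough illustration of that check, here is a Python sketch that decodes the mode field and the I/F bits of a saved status register value. The helper names `describe_psr` and `swi_came_from_irq` are made up for this example; a real handler would do the same test in assembly or C after reading the SPSR with an mrs instruction.

```python
# Mode field encodings (PSR bits [4:0]) on ARM7TDMI.
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}
I_BIT = 1 << 7  # set = IRQ disabled
F_BIT = 1 << 6  # set = FIQ disabled


def describe_psr(psr: int) -> dict:
    """Decode the mode and the interrupt mask bits of a CPSR/SPSR value."""
    return {
        "mode": MODES.get(psr & 0x1F, "reserved"),
        "irq_disabled": bool(psr & I_BIT),
        "fiq_disabled": bool(psr & F_BIT),
    }


def swi_came_from_irq(spsr_svc: int) -> bool:
    """True if the swi was executed while the CPU was in IRQ mode."""
    return (spsr_svc & 0x1F) == 0b10010


# SPSR_svc as it could look when swi is issued inside an IRQ handler:
# I bit set, F bit clear, mode = IRQ (condition flags ignored here).
print(describe_psr(0x92), swi_came_from_irq(0x92))
```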
When any interrupt is triggered, saving the CPSR to the SPSR and changing the mode are done by the processor itself. So, is it enough to write the ISR handler function and update the vector table with the handler addresses (I don't want to save the r0 to r12 general purpose registers)?
No one wants to save registers. However, if your handler uses r0 to r12 without saving them, the state of the interrupted code will be corrupted. The banked sp exists so that you have a stack on which to store these registers. Also, a vector table entry is not a handler address but an instruction (code).
Whenever the mode of execution is changed, does context saving happen internally in the processor (even when we change the mode manually)?
No, the instruction/code in the vector page is responsible for saving the context. If you have a pre-emptible OS, then you need to save the context for the process and restore it later. You may have 1000s of processes, so a CPU could not do this automatically. Your context save area may be rooted in the supervisor mode stack; you can use the IRQ/FIQ sp as a temporary register in this case. For instance, the switch_to function in ARM Linux may be helpful. thread_info is rooted in the supervisor stack for the kernel's management of the user space process/thread. The minimum code (with features removed) is:
__switch_to:
add ip, r1, #TI_CPU_SAVE @ Get save area to `ip`.
stmia ip!, {r4 - sl, fp, sp, lr} @ Store most regs on stack
add r4, r2, #TI_CPU_SAVE @ Get restore area to `r4`
ldmia r4, {r4 - sl, fp, sp, pc} @ Load all regs saved previously
@ note the last instruction returns to a previous
@ switch_to call by the destination thread/process
How can a SWI exception be masked/disabled?
You cannot do this. You could write an swi handler that does nothing but return to the instruction after the swi, and/or you could just jump to the undefined handler, depending on what it does.

MIPS forwarding implementation (tough)

I think I understand the first part (i). I at least have answers for it. I am not sure where this implementation would fail, though, for part (ii). Part (ii) has me completely stumped. Does anyone know situations where this would fail?
If you want to shine some light on part (iii), you would be my entire class's hero. We're all stumped there. Thanks for any input.
Tim FlimFlam, the infamous architect of the MN-4363 processor, is struggling with a pipelined implementation of the basic MIPS ISA.
(i) To implement forwarding, Tim connected the output of logic from EX and MEM stages (these logic outputs represent inputs to EXMEM and MEMWB latches, respectively) to the input of IDEX register. He claims that he will be able to cover any dependency in this manner.
• Would this implementation work?
• Would he need to insert any muxes? Explain for
1. the producer instruction is a load.
2. the producer instruction is of R-type.
3. the consumer instruction is of R-type.
4. the consumer instruction is a branch.
5. the consumer instruction is a store.
(ii) Tim claims that forwarding to EX stage only suffices to cover all dependencies.
• Provide two examples where his implementation would fail.
• Would “fail” in this case correspond to breaking correctness constraints?
(iii) Tim tries to identify the minimum amount of information to be transferred across pipeline stages. Considering R-type, data transfer, and branch instructions, explain how wide each pipeline register should be, demarcating different fields per latch.
Not sure if this is late, but the answer rests on the words "all dependencies" in part (ii). Hazards come in several types, e.g. control hazards and data hazards. Some data hazards can be fixed by forwarding (from the MEM and WB stages to the EX stage). Others, such as a load-use dependency, cannot be fixed by forwarding alone. To see why, note that a LOAD instruction in the MEM stage has its data ready from memory only at the end of that clock cycle. In that same clock cycle, any instruction in the EX stage that needs the loaded value will get the wrong value: at the beginning of the cycle the ALU starts executing while the memory is only starting to fetch the data, and by the end of the cycle, when the memory has finished fetching, the ALU has already finished computing with the wrong operand. To avoid this hazard, the ALU must start computing only after the data memory has finished fetching (i.e. the ALU must stall for 1 cycle, or you must have a nop between the LOAD and the ALU instruction). Hope this helps!
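The stall condition described above can be sketched in Python. This is a simplified illustration, not part of the exercise: the `Instr` tuple, `needs_load_use_stall` and `schedule` are names invented here, registers are plain integers, and the sketch resolves the hazard by inserting a nop rather than by modelling the hardware stall logic.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[int]     # register written, if any
    srcs: Tuple[int, ...]   # registers read


def needs_load_use_stall(in_ex: Instr, in_id: Instr) -> bool:
    """A load directly followed by a consumer of the loaded register."""
    return in_ex.op == "lw" and in_ex.dest in in_id.srcs


def schedule(program):
    """Insert a nop after every load whose result is used immediately."""
    out = []
    for instr in program:
        if out and needs_load_use_stall(out[-1], instr):
            out.append(Instr("nop", None, ()))
        out.append(instr)
    return out


program = [
    Instr("lw", 2, (1,)),     # lw  $2, 0($1)
    Instr("add", 4, (2, 5)),  # add $4, $2, $5  (uses $2 right away)
    Instr("sub", 6, (4, 2)),  # sub $6, $4, $2  (forwarding is enough)
]
print([i.op for i in schedule(program)])
```

With one bubble in place, the loaded value can be forwarded from the MEM/WB register to the ALU input in the following cycle.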

MIPS 32-bit architecture: how can a register in a register file be read from and written to in the same clock cycle?

My computer architecture book explains that
"Since writes to the register file are edge-triggered, our design can
legally read and write the same register within a clock cycle: the
read will get the value written in an earlier clock cycle, while the
value written will be available to read in a subsequent clock cycle."
This makes some sense, and I somewhat understand what's going on with the register file. However, I don't understand when each event happens. Say we're reading from one of the 32 registers in the register file and writing to it in the same cycle. When would the register be read from? When would it be written to? I don't totally understand how events are triggered by the clock edges, so it'd help to have that explained too. Thank you!
Reading the value of a register is asynchronous, whereas in the architecture you are working with in your class, the registers are written synchronously (i.e. the writes are edge-triggered).
This means that you can read the current value of a register, apply some operation to it (e.g. add some immediate) and write the result at the next rising clock edge.
Suppose you want to issue an addiu $1, $1, 123, that is, take the current value of $1, add 123 and store the result back in $1.
At the start of the clock cycle, the control unit instructs the register file to put the contents of $1 on one of the data buses that feed the ALU. The control unit also puts the immediate 123 on the other data bus that feeds the ALU. The addition, which is just a combinational circuit inside the ALU, is computed and the result is put on the data bus that leads back to the register file for storage.
All of this happens before the next rising edge of the clock, and the result of the addition is held on that bus until then. When the rising edge occurs, the result of the addition is written into register $1.
The register file is built from flip-flops. Each flip-flop has a store, an input, an output and a trigger. The output is always presenting the stored value, so can be read all the time.
With a rising edge on the trigger, the input value moves into the store.
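The timing can be mimicked with a small Python model. It is a behavioural sketch under stated assumptions, not a description of any particular hardware: the class `RegisterFile` and its method names are invented for this example, there is a single write port, and $0 is treated as hardwired to zero as in MIPS.

```python
class RegisterFile:
    """Toy register file: combinational read, edge-triggered write."""

    def __init__(self, count: int = 32):
        self.regs = [0] * count
        self.pending = None  # (index, value) currently on the write port

    def read(self, index: int) -> int:
        """Reads always see the value stored at the last clock edge."""
        return self.regs[index]

    def set_write(self, index: int, value: int) -> None:
        """Drive the write port; nothing is stored until the clock edge."""
        self.pending = (index, value)

    def rising_edge(self) -> None:
        """Latch the value on the write port into the selected register."""
        if self.pending is not None:
            index, value = self.pending
            if index != 0:  # $0 stays zero in MIPS
                self.regs[index] = value & 0xFFFFFFFF
            self.pending = None


rf = RegisterFile()
rf.regs[1] = 10                      # assume $1 currently holds 10
rf.set_write(1, rf.read(1) + 123)    # addiu $1, $1, 123 during the cycle
print(rf.read(1))                    # still the old value before the edge
rf.rising_edge()
print(rf.read(1))                    # new value after the edge
```

The read of $1 and the write to $1 happen in the same cycle, yet the read returns the old value because the store only changes on the edge that ends the cycle.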

MIPS Pipelining Question

Is the forwarding (highlighted by the blue arrow) necessary? I figured the add instruction would successfully write back to register before the OR instruction reads it.
add is writing to the register in the same step that or is reading from it, so there's no guarantee that the correct value will be safely in the register at the point or sees it: add is allowed one full clock cycle to make that write and have the signals propagate throughout the hardware. By contrast, xor is safe because it reads from r1 in the clock cycle after add's write.