Why A and B registers are used in multicycle Datapath? - mips

Why are registers A and B whose inputs are ReadData1 and ReadData2 of RegisterFile are necessary? Isn't it possible to use directly the values which are on ReadData1 and ReadData2 outputs of Register File?
Instruction Register is already loaded with an instruction, so the value of IR is fixed, which means that $rs, $rt, $rd reg numbers are obviously the same within a single instruction. Hence, there are always the same values on the ReadRegister1 and ReadRegister2 inputs of the Register File. So the values that are on RD1 and RD2 outputs are the same unless the corresponding registers inside RegisterFile are overwritten.
That means that A and B registers are necessary only for the instructions that require to have the values of $rs or $rd registers that were overwritten on previous cycle. Can anybody give me an example of such an instruction.

The general pattern is that during a clock cycle: at the start of the clock, some register(s) feed values to computational logic which feed values to (the same or other) register(s) by the end of the clock, so that it can start all over again for the next cycle.
In the single cycle datapath, the value in the PC starts the process of the cycle, and by the end of the cycle, the PC register is updated to repeat a new cycle with another value.  Along the way, the register file is both consulted and also (potentially) updated.  You may note that these A & B registers are not present in the single cycle datapath.
You are correct that those values do not change during the execution of any one single instruction on the multicycle datapath.
However, the multicycle processor uses multiple cycles to execute a single instruction (so that it can speed the clock).  In order to support the successive cycles in that processor design, some internal registers are used — they capture the output of a prior cycle in order for a next cycle to do something different.
The problem with the multicycle datapath diagrams is that they don't make it clear what part of the processor runs in what cycles.  Those A & B registers are there to support a cycle boundary, so the decode is happening in one cycle and the arithmetic/ALU in another cycle.  (Without those registers, the processor would have to perform decode again in the subsequent cycle, which would decrease the clock rate and defeat the nature of the multicycle datapath.)
Boundaries are made much more clear in pipeline datapath diagrams.  Search for "MIPS pipeline datapath".  (Note that some pipeline datapath diagrams show registers between stages and others simply outline what's in what stage without showing those registers.)  The large vertical bars are registers and they separate the pipeline stages.  Of course, the pipeline processor executes all pipeline stages in parallel, though in theory, similar boundaries are applicable to the cycles in a multicycle processor.  Note that the ID/EX pipeline register in the pipeline datapath serves the purpose of the A & B registers in the multicycle datapath.

Related

Could this multicycle MIPS avoid a mux for the new PC value?

I came across this multicycle MIPS processor microarchitecture. My query is, is the multiplexer (selected by PCSrc) which selects PC value really needed? What is the harm in only sending the clocked PC value to PC'.
Considering the instruction is either lw, sw, beq or ALU operation, the second cycle would be used to fetch operands from register file. That cycle could be used to flop PC value instead of using first cycle to update PC value. This would save one mux.
Please tell me if my understanding is correct.

MIPS pipeline registers length (IF/ID, ID/EX, EX/MEM, MEM/WB)

I am currently studying for my Computer Architecture exam and came across a question that asks to illustrate (bit by bit i would assume) the values contained in the mips pipeline architecture after the 3rd stage of the sub (before the clock commutes) given the following instructions.
add $t0,$t1,$t2
sub $t3,$t3,$t5
beq $t6,$t0,16
add $t0,$t1,$t3
I am not asking for the solution to this problem however after some research i haven't had much success wrapping my mind around it so i am asking for some help/advice.
Firstly i still don't have a clear understanding of the size of the pipeline registers (IF/ID, ID/EX, EX/MEM, MEM/WB). I do understand that they contain the control unit codes for the next stages and that they contain the result of the previous stage so that it can be passed in to the next one.
So that would be (please correct me if i'm wrong) +9 for ID/EX, +5 for EX/MEM and +2 for MEM/WB but i haven't managed to find a clear schema of the data that we can expect these registers to contain.
Also, i figure that we would need to use HW forwarding to forward the result of the first add to beq (because of $t0) and to forward the result of sub to the last add (because of $t3). Does this factor in to what is contained in the registers?
It would be great if someone could point me in the right direction.
Thanks lots.
The purpose of each of these intermediate registers is to hold data that might be needed in the immediate next stage or in later stages. I'll discuss one possible design, but there are really many possible designs as I'll explain.
In the fetch stage, the next instruction to be execute (to which the current PC points) is fetched from memory and PC is updated to point to the next instruction to fetch. Therefore, IF/ID would include one 4-byte field to hold the fetched instruction. There are two ways to calculate the new PC: current PC + 4 or PC + 4 + offset in case of a branch. If the fetched instruction is itself a branch instruction, then we would need to pass the new PC so that the branch target address can be calculated in the EX stage. We can add a 4-byte field in IF/ID to hold the new PC value to be passed to the EX stage through the ID stage.
In the decode stage, the opcode and its operands are determined. The opcode is at a fixed location in the instruction in MIPS. An MIPS instruction may operate on a single source register, two source registers, one source register and a sign-extended 32-bit immediate value, a sign-extended 32-bit immediate value, or no operands. We can either prepare only the required operands for the EX stage based on the opcode or prepare all the operands that might be required for any opcode. The latter design is simpler, but it requires a larger ID/EX register. In particular, two 4-byte fields are required to hold two possible source register values (the values are read from the register file in the decode stage) and a 4-byte field for the possible sign-extended immediate value. No opcode will require all of these fields, but let's prepare all of them anyway and store them at fixed locations in the ID/EX register. It simplifies the design.
We to also pass the new PC value calculate in the fetch stage to the execute stage just in case the opcode turns out to be a branch. The branch target address is calculated relative to the current PC value (the PC of the instruction following the branch in static program order). There are two possible design here: either add a bus from the new PC field in IF/ID to the EX stage or add a field in ID/EX to hold the new PC value, which can then be accessed in the EX stage. The latter design adds a 4-byte field in ID/EX.
The EX requires the opcode from the ID stage. We can choose to pass only the opcode rather than the whole instruction. But then later stages might require other parts of the instruction. Generally, in RISC pipelines, it preferable to pass to make the whole instruction available to all stages. In this way, all parts of an instruction are already available when changes are made to any stage of the pipeline in the future. So let's add a 4-byte field to ID/EX to hold the instruction.
The EX stage reads the operands and the opcode from the ID/EX register (the opcode is part of the instruction) and performs the operation specified by the opcode. The EX/MEM register has to be big enough to hold all possible results, which might include the following: a 4-byte value computed by the ALU resulting from an arithmetic or logic operation, a 4-byte value representing the calculated effective address for a memory load or store operation, a 4-byte value representing the branch target address in case of a branch instruction, and a 1-bit condition in case of a conditional branch instruction. We can use a single 4-byte field in EX/MEM for the result (whatever it represents) and a 1-bit field for the condition. In addition, as before, we need a 4-byte field to hold the instruction. Also for store instructions, we need another 4-byte field to hold the value to be stored. One possible alternative design here is that rather than storing the 1-bit condition and 4-byte branch target address in EX/MEM, they can be passed directly to the IF stage.
In the MEM stage, in case of a branch instruction, the branch target address and the branch condition are passed back from EX/MEM to the IF fetch to determine the new PC. In case of a memory store operation, the operation is performed and there is no result to be passed to any stage. In case of a memory load operation, the 4-byte value is fetched from memory and stored in a field in the MEM/WB register. In case of an ALU operation, the 4-byte result will be just passed to a field in the MEM/WB register. In addition, as before, we need a 4-byte field in MEM/WB to hold the instruction.
Finally, in the WB stage, the 4-byte result whether loaded from memory or computed by the ALU is stored in the destination register. This only occurs for instructions that produce results. Otherwise, the WB stage can be skipped.
In summary, in the design I've discussed, the sizes of intermediate registers are as follows: IF/ID is 8 bytes in size, ID/EX is 20 bytes in size, EX/MEM is 25 bits in size, and MEM/WB is 8 bytes in size.
The design decision of whether a field is required in an intermediate register to hold some value or whether it can be passed directly in the same stage to the logic that requires the value is a "circuit-level" decision. If the signals can be guaranteed to not be corrupted, and if it feasible or convenient to add a dedicated bus, they can be directly connected.

Organizing pipeline in MIPS

I'm unsure about how the following properties affect pipeline execution for a 5 stage MIPS design (IF, ID, EX, MEM, WB). I just need some clearing up.
only 1 memory port
no data fowarding.
Branch stalls until end of * stage
Does the 1 memory port mean we cannot fetch or write when we read/write to mem (i.e. MEM stage on lw,sw you can't enter IF or another MEM)?
With no forwarding does this means an instruction won't enter the ID stage until after or on the WB stage for the previous instruction it depends on?
Idk what the branch stall means
A common assumption is that you can write in the first half of a cycle, and read in the second half of a cycle.
Lets say than I1 is your first instruction and I2 your second instruction, and I2 is using a register that I1 is modifying.
Only 1 memory port.
This means that you cannot read or write memory at the same time in two different stages of the pipelines.
For instance, if I1 is at the MEM stage, another instruction cannot be at the IF stage at the same time, because both require memory access.
No data forwarding. Data forwarding reflects to the fact that at the end of EX stage for I1, you forward the data to the ID cycle of I2.
Consequently, no forwarding means that the pipeline has to wait for the WB stage of the I1 to go to ID stage of I2. With the asumption, you can go to ID stage at the same time as the WB stage of the previous instruction, because WB will write to memory during the first half of the cycle, and ID will read from memory during the second half of the cycle.
Branch stalls until end of EX stage. This is a common asumption, that doesn't use branch prediction techniques. It simply states that an instruction after a branch has to wait until the end of EX stage to start ID stage. Recall that the address of the next instruction to be executed is known only at the EX stage of the branch instruction.
Comment: IF and MEM access separate sections of memory. One is data memory (.data) and the other instruction memory (.code or .text). It is designed this way so that accessing memory during IF and MEM does not cause a structural stall.
The area which is used by .data is that used by the stack, with the stack is traditionally placed right at the "end" of the .data sector. This is why if you don't subtract from the stack pointer address before saving data to the stack you run the risk of overwriting your program code. As MIPS allows you to designate the stack address manually, some people choose put the stack a bit "before" the end in order to avoid problems if they know they will have space and not overwrite variables in MEM. For instance, placing the stack at 0x300 instead of 0x400 in WinMIPS64. I am not sure if that is good practice or not. But I have heard of people doing it.

Mips datapath procedure for executing an AND instruction?

Based on this figure, executing the AND instruction would cause these values to be assigned to the signals labeled in blue:
RegWrite = 1
ALUSrc = 0
ALU operation = 0000
MemRead = 0
MemWrite = 0
MemtoReg = 0
PCSrc =0
However, I am a little confused which inputs will be used in the Registers block? Can anyone describe the overall AND procedure in the MIPS datapath?
Starting from after the instruction is read from instruction memory, you need to know that AND is an r-type instruction and thus uses 3 registers. Which register is actually used is based off of the encoded instruction. (An R-Type has 3 5-bit fields, one for each reg.) rs and rt go to Read register 1 and 2, while rd goes to Write register. From there, Read data 1 and 2 (the bits of registers s and t) go to the ALU where a bitwise AND is performed on them. The result of that is written to the write register. I traced the path in your picture (omitting the PC incrementing part). I'm taking a class that uses that exact book this semester. If you look a little ahead, it goes deeper into what is going on. The explanation of the control (blue) lines helps a lot. The mux blocks are multiplexers, that is they allow alternating the output between two inputs. In this case, the ALUSrc mux will use Read data 2 because AND is an r-type. If it were i-type, it would switch to use the data coming from the sign extend, because that would be the immediate. The other mux is to allow either memory from data to be written to the write register or the result of an ALU operation. In this case, it will be the result of an ALU operation.
To imply answer your question about the register block, keep in mind that the inputs to the register block are the addresses of the registers your instruction will be using, the register block then either fetches the data in the registers who's addresses were given or write data at the end on this register.
However one note you have an inconsistency in your mux design MemtoReg and ALUSrc should have opposite values, so unless one of the 2 muxes is upside down (which is not advisable) then there is a mistake with your controller logic.

VHDL MIPS 5 stage pipeline Bug

The code for this is too long to post so Ill just describe it. I've created a 5 stage mips pipe that almost works. The catch is that EVERY lw instruction that reaches the instruction decode stage overwrites the control signal values in the execution stage. Not only that it causes the PC to skip can instruction, i.e from 300 -> 308. I just need some idea on where to look for bugs since this is a class assignment. If we take out all the LW instructions the CPU works fine.
Example:
The adder in the EX stage is going to sub $4 $1 $2 which should be 1
Once LW enters the ID stage ALUsrc is asserted AND ALUop is changed from subtract to add
This forces the adder in the EX stage to add $4 $1 $2 resulting in 5 being stored in $4
http://en.wikipedia.org/wiki/File:MIPS_Architecture_%28Pipelined%29.svg
The MIPS 5 Stage Pipeline (annotated to show Write Reg Select and enable)
The bottom line through the pipeline stages represents the register file write (back) port address and write enable and WB is the data from memory.
http://www.mrc.uidaho.edu/mrc/people/jff/digital/MIPSir.html
Load Word Instruction
Description:
A word is loaded into a register from the specified address.
Operation: $t = MEM[$s + offset]; advance_pc (4);
Syntax: lw $t, offset($s)
Encoding:
1000 11ss ssst tttt iiii iiii iiii iiii
Where the write register address ($t) input is read from data memory address comprised of register file register $s offset with the immediate value i which gets sign extended. Your $4 is $t above, $1 or $2 is $s while the remaining register file output lane sounds to be suborned for the sign extended immediate.
From your description it sounds like you aren't using a three port register file with one port a write only port.
With a three port register file the only time you run into conflicts is when you attempt to use the new register file value from memory before it is read from memory and written to the register file. That can be managed by a compiler scheduling NOOPs until the outstanding register file write is retired when a following instruction is trying to use it, or stalling the IF/ID in hardware when it's output contains a reference to an outstanding register file write.
There are three instructions that can be in flight to the right of IF/ID, each with a write to register file address and a write enable. You'd need to compare both instruction decode register file addresses to all three of those and stall IF/ID until those clear out. The write enable stored in each of those three pipeline stages are used to determine whether the write register address in those pipeilne stages should be compared.
Because the ID/EX, EX/MEM and MEM/WB write register file addresses are not used anywhere else the circuitry for doing the comparison can be collocated with IF/ID and the Register File, preventing unnecessary layout delays affecting the minimum clock cycle.
Using a two port register file is much simpler and infers IF/ID stalling until the write enable comes back from MEM/WB, effectively turning any memory reading instructions into 3 cycle instructions (or more, data memory can stall if it's a cache or slow). It makes a three port register file more or less necessary for performance reasons. There's an implied multiplexer to source for at least one of the two register file port controls (write enable, write address) from the MEM/WB stage when IF/ID is stalled (for memory->regfile).
Data memory access can stall MEM/WB, just like instruction memory access can also stall IF/ID. A stalled IF/ID doesn't issue a write enable for the register file to ID/EX nor does a stalled MEM/WB.