I'm given the latency of each stage in a processor:
IF     ID     EX     MEM    WB
250ps  350ps  150ps  300ps  200ps
Now I'm being asked for the total latency of a LW instruction in a pipelined processor.
Here's what I know:
The clock cycle time in a pipelined version is 350ps because that's the longest stage.
The clock cycle time in a non-pipelined version is 1250ps because that's the duration of all the stages added together.
But how does the "latency of a LW instruction" relate to those times?
OK, I'm pretty sure I figured out the answer: you take the duration of the longest stage, which in this case is 350ps, and multiply it by the number of stages, in this case 5.
So
350 * 5 = 1750ps
Yes, your result is correct. Here is the formula:
(Number of Stages)(Longest Stage Time (Unit)) = Latency (Unit)
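As a quick check of the arithmetic above, here is a minimal Python sketch using the stage latencies from the question. The variable names are made up for illustration, and it assumes an LW occupies every stage for exactly one clock cycle.

```python
# Stage latencies in picoseconds, taken from the question.
stage_ps = {"IF": 250, "ID": 350, "EX": 150, "MEM": 300, "WB": 200}

# Pipelined: the clock period must fit the slowest stage.
pipelined_cycle_ps = max(stage_ps.values())

# Non-pipelined (single cycle): one long cycle covers every stage.
single_cycle_ps = sum(stage_ps.values())

# Assumption: an lw passes through all five stages, one clock cycle each.
lw_latency_pipelined_ps = pipelined_cycle_ps * len(stage_ps)

print(pipelined_cycle_ps, single_cycle_ps, lw_latency_pipelined_ps)
# prints: 350 1250 1750
```

Note that the latency of a single LW is higher in the pipelined version (1750ps versus 1250ps); pipelining improves throughput, not the latency of one instruction.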
Why are registers A and B, whose inputs are ReadData1 and ReadData2 of the Register File, necessary? Isn't it possible to directly use the values on the ReadData1 and ReadData2 outputs of the Register File?
The Instruction Register is already loaded with an instruction, so the value of IR is fixed, which means that the $rs, $rt, $rd register numbers are obviously the same within a single instruction. Hence, there are always the same values on the ReadRegister1 and ReadRegister2 inputs of the Register File. So the values on the RD1 and RD2 outputs stay the same unless the corresponding registers inside the Register File are overwritten.
That means that the A and B registers are necessary only for instructions that need the values of $rs or $rt registers that were overwritten on a previous cycle. Can anybody give me an example of such an instruction?
The general pattern is that during a clock cycle: at the start of the clock, some register(s) feed values to computational logic which feed values to (the same or other) register(s) by the end of the clock, so that it can start all over again for the next cycle.
In the single cycle datapath, the value in the PC starts the process of the cycle, and by the end of the cycle, the PC register is updated to repeat a new cycle with another value. Along the way, the register file is both consulted and also (potentially) updated. You may note that these A & B registers are not present in the single cycle datapath.
You are correct that those values do not change during the execution of any one single instruction on the multicycle datapath.
However, the multicycle processor uses multiple cycles to execute a single instruction (so that it can speed the clock). In order to support the successive cycles in that processor design, some internal registers are used — they capture the output of a prior cycle in order for a next cycle to do something different.
The problem with the multicycle datapath diagrams is that they don't make it clear what part of the processor runs in what cycles. Those A & B registers are there to support a cycle boundary, so the decode is happening in one cycle and the arithmetic/ALU in another cycle. (Without those registers, the processor would have to perform decode again in the subsequent cycle, which would decrease the clock rate and defeat the nature of the multicycle datapath.)
Boundaries are made much more clear in pipeline datapath diagrams. Search for "MIPS pipeline datapath". (Note that some pipeline datapath diagrams show registers between stages and others simply outline what's in what stage without showing those registers.) The large vertical bars are registers and they separate the pipeline stages. Of course, the pipeline processor executes all pipeline stages in parallel, though in theory, similar boundaries are applicable to the cycles in a multicycle processor. Note that the ID/EX pipeline register in the pipeline datapath serves the purpose of the A & B registers in the multicycle datapath.
Do all I-type MIPS instructions take the same number of cycles in a multi-cycle datapath? I know R-type have the same number of cycles.
Do all I-type MIPS instructions take the same number of cycles in a multi-cycle datapath?
No.
First, let's look at some of the instructions included in the MIPS I-Type category: addi and lw. These are both I-Type instructions, with identical 16-bit immediate, rs, and rt fields. They are decoded using the same fields, which is why they are both considered I-Type instructions.
Ok, next, let's look at the multicycle processor. This is not a pipelined processor, though, generally speaking, it will have a cycle for each stage in an equivalent pipelined version.
While we would generally find a pipelined processor's performance superior at the same clock frequency, one advantage of a multicycle implementation over a pipelined one is that "stages" that are not needed can be skipped. Skipping a cycle is not viable in a pipelined processor because instruction execution overlaps, whereas the multicycle processor does not overlap the execution of instructions.
So, among IF, ID, EX, MEM, WB stages, addi does not require or make use of the MEM stage, and thus it would be silly not to skip that cycle, making addi a 4 cycle instruction.
However, lw does require the MEM stage (and so all the stages), hence it will be 1 cycle longer than addi.
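The cycle counts above can be written down as a small Python sketch. The `STAGES_USED` table and the `cycles` helper are hypothetical names, and the table simply encodes this answer's assumption of one cycle per stage that the instruction actually uses.

```python
# Stages each instruction needs on the multicycle datapath, following the
# reasoning above (assumption: one clock cycle per stage that is used).
STAGES_USED = {
    "addi": ["IF", "ID", "EX", "WB"],        # skips MEM
    "lw": ["IF", "ID", "EX", "MEM", "WB"],   # uses every stage
}

def cycles(instr):
    """Number of cycles the instruction takes under the assumption above."""
    return len(STAGES_USED[instr])

print(cycles("addi"), cycles("lw"))  # prints: 4 5
```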
I came across this multicycle MIPS processor microarchitecture. My question is: is the multiplexer (selected by PCSrc) that selects the PC value really needed? What is the harm in only sending the clocked PC value to PC'?
Considering the instruction is either lw, sw, beq or ALU operation, the second cycle would be used to fetch operands from register file. That cycle could be used to flop PC value instead of using first cycle to update PC value. This would save one mux.
Please tell me if my understanding is correct.
I am confused about pipelining of MIPS instructions. Any help would be great. Thanks in advance.
What are the data dependencies in the following two code sequences? Which of them can be resolved by using a stall (bubble) or forwarding? You can use shape 1 for convenience.
shape 1:
If-Id-Ex-Mem-Wb
explanation:
if=instruction fetch
id=instruction decode register fetch
ex=execute
mem=memory access
wb=write back
code 1:
add $3,$4,$2
sub $5,$3,$1
lw $6,200($5)
sw $6,200($2)
lw $6,200($3)
add $7,$4,$6
code 2:
add $3,$4,$2
sub $5,$3,$1
lw $6,200($3)
add $7,$3,$6
(Sorry for the bad post, but I can't post an image yet.)
Thanks.
Let's see the first one:
add $3,$4,$2
sub $5,$3,$1
The result of add is used by sub, so there is a data hazard. We have to insert a number of NOPs to resolve it. Assuming all instructions take 5 cycles, we insert 3 NOPs and we're done.
add $3,$4,$2   IF  ID  EX  MEM WB
sub $5,$3,$1   NOP NOP NOP IF  ID  EX  MEM WB
We can do this for all subsequent instructions. Instructions produce new values in the EX and MEM stages, but those values are not written to a register until the WB stage (for learning purposes, let's assume that's true). Since the registers are read in the ID stage, this leaves a window of three cycles during which old, incorrect values are "flowing" through the pipeline. Forwarding can cure this problem in our case above: pass the result of add from the end of its EX stage straight to the ALU input of sub, instead of waiting for the register write.
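To find every dependency in the two code sequences rather than just the first pair, here is a minimal Python sketch that lists the read-after-write (RAW) pairs. The helpers `parse` and `raw_dependencies` are hypothetical and only understand the four opcodes that appear in the question (add, sub, lw, sw). Whether a given pair needs a stall or can be covered by forwarding depends on how far apart the two instructions are and on the pipeline design, which this sketch does not model.

```python
import re

CODE1 = ["add $3,$4,$2", "sub $5,$3,$1", "lw $6,200($5)",
         "sw $6,200($2)", "lw $6,200($3)", "add $7,$4,$6"]
CODE2 = ["add $3,$4,$2", "sub $5,$3,$1", "lw $6,200($3)", "add $7,$3,$6"]

def parse(line):
    """Return (opcode, destination register or None, source registers)."""
    op, operands = line.split(None, 1)
    regs = re.findall(r"\$\d+", operands)
    if op == "sw":
        # sw reads the stored value and the base register; it writes no register.
        return op, None, regs
    # add/sub/lw: the first register is written, the rest are read.
    return op, regs[0], regs[1:]

def raw_dependencies(program):
    """List (producer index, consumer index, register) for each RAW pair."""
    deps = []
    last_writer = {}  # register -> index of the latest instruction writing it
    for i, line in enumerate(program):
        _, dest, srcs = parse(line)
        for reg in srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps

for name, code in (("code 1", CODE1), ("code 2", CODE2)):
    for producer, consumer, reg in raw_dependencies(code):
        print(f"{name}: '{code[consumer]}' reads {reg} written by '{code[producer]}'")
```

Indices are zero-based positions in each list, so the pair (0, 1, '$3') is the add/sub hazard discussed above.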
Hope this helps.
What is the difference between dynamic and static instruction count?
a. Derive an expression to calculate the user CPU time as a function
of following parameters: the dynamic instruction count (N),
clock cycle per instruction (CPI) and clock frequency (f)
b. Explain the reason for choosing ‘dynamic’ instruction count as a parameter in Question 3a
instead of ‘static’ instruction count
The dynamic instruction count is the actual number of instructions executed by the CPU for a specific program execution, whereas the static instruction count is the number of instructions in the program's code.
We usually use the dynamic instruction count because, for example, if you have a loop in your program, some instructions get executed more than once. Also, in the presence of branches, some instructions may not be executed at all.
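A small Python sketch of the difference, using a made-up toy program (one setup instruction, a three-instruction loop body that runs 10 times, and one instruction after the loop). The program and the iteration count are assumptions chosen only for illustration.

```python
# Hypothetical toy program, split into the parts that run once and the loop.
setup = ["addi $t0,$zero,10"]
loop_body = ["addi $t1,$t1,1", "addi $t0,$t0,-1", "bne $t0,$zero,loop"]
after = ["sw $t1,0($sp)"]
iterations = 10  # assumed number of times the loop body executes

# Static count: instructions present in the program text.
static_count = len(setup) + len(loop_body) + len(after)

# Dynamic count: instructions actually executed in this run.
dynamic_count = len(setup) + iterations * len(loop_body) + len(after)

print(static_count, dynamic_count)  # prints: 5 32
```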
Execution time (ET) = clock cycles per instruction (CPI) * number of instructions (IC) * cycle duration (CD).
Since the clock rate (CR) is simply the inverse of the cycle duration (that is, cycles per second), and vice versa:
ET = (CPI * IC) / CR
In the question's notation, IC = N and CR = f, so ET = (N * CPI) / f.
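The formula can be checked with a short Python sketch. The function name and the sample numbers (one million instructions, a CPI of 2, a 1 GHz clock) are assumptions for illustration, not values from the question.

```python
def cpu_time_seconds(n_instructions, cpi, clock_hz):
    """User CPU time: dynamic instruction count * CPI / clock frequency."""
    return n_instructions * cpi / clock_hz

# Assumed example: 1,000,000 executed instructions, CPI of 2, 1 GHz clock.
print(cpu_time_seconds(1_000_000, 2, 1e9))  # about 0.002 seconds
```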