How many stalls do I need to execute the following instructions correctly? I am a little confused by what I did, so I would like to see an expert's answer.
lw $1,0($2);
beq $1,$2,Label;
Note that the check of whether the branch is taken is done in the decode stage. But the source register rs of beq, which is $1 in this case, is only updated in the write-back stage of the lw instruction. So do we need to forward the new data from the memory stage of lw to the decode stage of beq?
Here is the data path diagram:
The value that is fetched from the memory, is written to the register file in the write-back stage of the pipeline. Writes to the register file happen in the first half of the clock cycle, while reads from the register file happen in the second half of the clock cycle.
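The value that is written to the register file can thus be read in the same clock cycle as it is written. Forwarding therefore does not reduce the stall count here: the loaded value does not exist until the end of lw's memory stage, so the earliest cycle in which beq can use it in decode is the cycle in which lw is in write-back.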
As for the number of stalls needed, you need to insert two bubbles into the pipeline, as the lw instruction should be in the write back stage when the beq instruction is in the decode stage.
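The count above can be checked with a small sketch. It assumes the classic five-stage pipeline (IF, ID, EX, MEM, WB) and a register file that writes in the first half of the cycle and reads in the second half; the function name `stalls_needed` and its parameters are made up for illustration.

```python
# Stage positions in an assumed classic 5-stage pipeline.
IF, ID, EX, MEM, WB = range(5)

def stalls_needed(write_stage, read_stage, distance=1, same_cycle_ok=True):
    """Bubbles needed between a producer and a consumer issued
    `distance` instructions later.

    The producer (issued in cycle c) writes in cycle c + write_stage.
    The consumer would read in cycle c + distance + read_stage.
    If a write and a read may share a cycle (write in first half,
    read in second half), the read cycle must be >= the write cycle;
    otherwise it must be strictly later.
    """
    gap = write_stage - (read_stage + distance)
    if not same_cycle_ok:
        gap += 1
    return max(0, gap)

# lw writes $1 in WB, beq reads $1 in ID, beq is the very next instruction.
print(stalls_needed(WB, ID))                        # 2
# Without the split-cycle register file, one more bubble would be needed.
print(stalls_needed(WB, ID, same_cycle_ok=False))   # 3
```

With the split-cycle register file the sketch gives two bubbles for the lw/beq pair, matching the answer above.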
I hope this answers your question.
There are a multitude of different instructions in MIPS. I'm currently learning about data and instruction cache.
Instruction cache simply takes what it can so to say, depending on the block size it might utilize spatial locality and fetch multiple instructions. But for data cache I have a harder time understanding when it fetches things from main memory and when it doesn't.
For example, the instruction lw $t0, 0x4C($0) will fetch a word of data stored at address 0x4C and, depending on the data cache's capacity, sets, block size and so forth, will temporarily store it in a block in the cache if there is no valid block with a matching tag for that address.
In my literature, an addi instruction does not fetch from memory. Why? The only time it seems to need to fetch data from memory is when using the lw instruction. Why?
I also have a question regarding registers in MIPS. If we're simply executing instructions on the registers, then there will be no access to main memory, correct? It will not even go to the data cache, correct? Are the registers the highest level in the memory hierarchy?
The reason addi doesn't "fetch from memory" is that it uses an immediate operand: the value is encoded in the instruction itself, so it has already arrived with the instruction fetch. (Technically that is a fetch from memory, since all code resides in some form of memory, but when literature refers to "memory" it typically means data memory accessed by loads and stores, not the instruction stream.) When MIPS uses something like lw to load from memory, the CPU has no idea what value the destination register will have until the load is finished.
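To make the immediate-operand point concrete, here is a minimal sketch that pulls the fields out of an I-type instruction word. The helper name `decode_i_type` is made up for illustration; the field layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate) is the standard MIPS I-type format, and the example word is the encoding of addi $t0,$t0,5.

```python
def decode_i_type(word):
    """Split a 32-bit MIPS I-type instruction into its fields."""
    opcode = (word >> 26) & 0x3F   # bits 31..26
    rs = (word >> 21) & 0x1F       # bits 25..21
    rt = (word >> 16) & 0x1F       # bits 20..16
    imm = word & 0xFFFF            # bits 15..0
    if imm & 0x8000:               # sign-extend the 16-bit immediate
        imm -= 0x10000
    return opcode, rs, rt, imm

# addi $t0, $t0, 5  ($t0 is register 8, addi is opcode 8)
print(decode_i_type(0x21080005))   # (8, 8, 8, 5)
```

The constant 5 sits in the low 16 bits of the instruction word, so no separate data access is needed to obtain it.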
Just to illustrate this concept further, the original MIPS I architecture (which was used by the PlayStation 1) actually wouldn't finish loading from memory before the next instruction was already being worked on!
lw $t0,0($a0) ;load from the address pointed to by $a0
addi $t0,$t0,5 ;the value in $t0 hasn't been updated yet so this won't have the desired result.
The easiest solution to this was to put a nop after every lw. Chances are the version of MIPS you're using doesn't have this problem, so don't worry about it.
I'd really appreciate a hand with this MIPS CPU assembly exercise.
I have to determine input and output from: ALU(s), Jump-related MUX and from the Register File.
PC is 0x01D0 and the instruction I have to simulate is: beq $3, $7, -120
I have no problem with the ALU(s); my issues are with the MUX and the Register File.
As you can see in the image, for the second jump-related MUX I don't know what to write for jump address [31-0].
The other problem is with the Register File: I don't know what to write as its input. (The instruction should be: 0x1067FFE2)
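The encoding quoted above can be cross-checked with a short sketch. It assumes that -120 is a byte offset relative to PC + 4 (consistent with the 0xFFE2 = -30 word offset in the quoted instruction); the helper names `encode_beq` and `branch_target` are made up for illustration.

```python
def encode_beq(rs, rt, byte_offset):
    """Encode beq rs, rt, offset (opcode 4, I-type)."""
    assert byte_offset % 4 == 0, "branch offsets are word-aligned"
    imm16 = (byte_offset // 4) & 0xFFFF   # word offset, 16-bit two's complement
    return (0x04 << 26) | (rs << 21) | (rt << 16) | imm16

def branch_target(pc, byte_offset):
    """Target if the branch is taken: (PC + 4) + byte offset, as 32 bits."""
    return (pc + 4 + byte_offset) & 0xFFFFFFFF

word = encode_beq(3, 7, -120)
print(hex(word))                          # 0x1067ffe2
print(hex(branch_target(0x01D0, -120)))   # 0x15c
```

Under that assumption, the register file's two read-register inputs are the rs and rt fields (3 and 7), and the taken-branch target is 0x15C. What the jump-address input of the second MUX carries depends on the datapath in the image, so it is not modelled here.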
I understand that, given the latencies of, say, IMem, Add, Mux, ALU, Regs, DMem and Control, a specific MIPS instruction such as add, and a specific datapath to work with, I am to find the critical path of the instruction on the datapath and add up the latencies to get the clock cycle time. However, what if I am only given the latencies and the datapath, but no specific MIPS instruction? Do I just go with the longest single instruction and find its critical path? Or can I just add one instance of each individual latency to get a "general" clock cycle time?
Thanks for the help!
You need to use the latency of the slowest single cycle instruction, since the clock must run slow enough to complete that instruction correctly.
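As a rough sketch of that rule: sum the latencies along each instruction's critical path and take the maximum. The latency values and the per-instruction paths below are hypothetical placeholders, not taken from any particular textbook datapath; substitute the ones from your own exercise.

```python
# Hypothetical component latencies in picoseconds (assumed values).
LATENCY_PS = {"IMem": 200, "Add": 70, "Mux": 20, "ALU": 90,
              "Regs": 90, "DMem": 250, "Control": 40}

# Hypothetical critical paths for a simple single-cycle datapath.
CRITICAL_PATH = {
    "add": ["IMem", "Regs", "Mux", "ALU", "Mux"],
    "lw":  ["IMem", "Regs", "Mux", "ALU", "DMem", "Mux"],
    "sw":  ["IMem", "Regs", "Mux", "ALU", "DMem"],
    "beq": ["IMem", "Regs", "Mux", "ALU", "Mux"],
}

def path_latency(instr):
    """Total latency along one instruction's critical path."""
    return sum(LATENCY_PS[unit] for unit in CRITICAL_PATH[instr])

# The clock must be at least as long as the slowest instruction.
slowest = max(CRITICAL_PATH, key=path_latency)
clock_ps = path_latency(slowest)
print(slowest, clock_ps)   # lw 670
```

With these made-up numbers lw sets the clock period. Simply adding one instance of every latency would give a different figure, because no single instruction passes through every unit in series.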
My computer architecture book explains that
"Since writes to the register file are edge-triggered, our design can
legally read and write the same register within a clock cycle: the
read will get the value written in an earlier clock cycle, while the
value written will be available to read in a subsequent clock cycle."
This makes some sense, and I somewhat understand what's going on with the register file. However, I don't understand when each event happens. Say we're reading from one of the 32 registers in the register file and writing to it in the same cycle. When would the register be read from? When would it be written to? I don't totally understand how events are triggered by the clock edges, so it'd help to have that explained too. Thank you!
Reading the value of a register is asynchronous, whereas in the architecture you are working with in your class, the registers are written synchronously (i.e. the writes are edge-triggered).
This means that you can read the current value of a register, apply some operation to it (e.g. add some immediate) and write the result at the next rising clock edge.
Suppose you want to issue an addiu $1, $1, 123, that is take the current value of $1, add 123 and store the result back in $1.
At the start of the clock cycle the control unit instructs the register file to put the contents of $1 on one of the data buses that feed the ALU. The control unit also puts the immediate 123 on the other data bus that feeds the ALU. The addition, which is just a combinational circuit inside the ALU, computes the sum and puts the result on the data bus that leads back to the register file for storage.
All of this is done before the next rising edge of the clock, and the result of the addition is held at the register file's input until that edge arrives. When the rising edge occurs, the result of the addition is written back into register $1.
The register file is built from flip-flops. Each flip-flop has a store, an input, an output and a trigger. The output is always presenting the stored value, so can be read all the time.
With a rising edge on the trigger, the input value moves into the store.
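The behaviour described above can be modelled in a few lines. This is only a behavioural sketch of one edge-triggered register, not real hardware; the class and method names are made up for illustration.

```python
class EdgeTriggeredRegister:
    """Behavioural model: output always shows the stored value,
    and the input is captured only on a rising clock edge."""

    def __init__(self, value=0):
        self.q = value   # stored value, continuously visible at the output
        self.d = value   # value currently presented at the input

    def read(self):
        return self.q    # asynchronous read: no clock involved

    def drive(self, value):
        self.d = value   # present a new value at the input; nothing is stored yet

    def rising_edge(self):
        self.q = self.d  # the input moves into the store

# addiu $1, $1, 123 within one clock cycle
r1 = EdgeTriggeredRegister(10)
r1.drive(r1.read() + 123)     # read old value, compute, present result
before = r1.read()            # still the old value during this cycle
r1.rising_edge()              # clock edge ends the cycle
after = r1.read()
print(before, after)          # 10 133
```

Reading and writing the same register in one cycle is therefore safe in this model: the read sees the value stored at an earlier edge, and the new value only becomes visible after the next edge.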
Is the forwarding (highlighted by the blue arrow) necessary? I figured the add instruction would successfully write back to register before the OR instruction reads it.
add is writing to register in the same step that or is reading from register, so there's no guarantee that the correct value will be safely in the register at the point or sees it--add is allowed one full clock cycle to make that write and have the signals propagate throughout the hardware. By contrast, xor is safe because it reads from r1 in the next clock cycle after add's write.