Instruction Execution in MIPS - mips

This is an abstract view of the implementation of the MIPS subset showing the
major functional units and the major connections between them
Why we need to add the result of (PC+4) with instruction address?
I know that the PC (Program Counter) is a register in a computer processor that contains the address (location) of the instruction being executed at the current time, but i didn't understand why we add the second adder in this picture?

Some of the operations that can be performed by the CPU are 'jumps'.
If your operation is a Jump, from the second block you get the address of the new instructions OR the lenght of the jump you have to do.

It's not the instruction address, the output of the instruction memory is an instruction itself.
They've obviously hidden most of the components (there's NO control circuitry). What they probably meant is the data path for branches, though they really should have put at least the link with the ALU output in there. Even so it would be better to explicitly decode the instruction, sign extend and shift left. So it's really inaccurate, but I don't see what else they could mean.

Related

What does the TargetWrite/IorD Control Line do on a multicycle MIPS processer

We learned all the main details about control lines and the general functionality of the MIPS chip in single cycle and also with pipelining.
But, in multicycle the control lines aren't identical in addition to other changes.
Specifically what does the TargetWrite (ALUout) and IorD control lines actually modify?
Based on my analysis, TW seems to modify where the PC points to depending on the bits it receives (for Jump, Branch, or standard moving to the next line)... Am I missing something?
Also what exactly does the IorD line do?
I looked at both course textbooks: See Mips Run and the Computer Architecture: A Quantitative Approach by Patterson and Hennessy which don't seem to mention these lines...
First, let's note that this block diagram does not have separate instruction memory and data memory.  That means that it either has a unified cache or goes directly to memory.  Most other block diagrams for MIPS will have separate dedicated Instruction Memory (cache) and Data memory (cache).  The advantage of this is that the processor can read instructions and read/write data in parallel.  In the a simple version of a multicycle processor, there is likely no need to read instructions and data in parallel, so a unified cache simplifies the hardware.
So, what IorD is doing is selecting the source for the address provided to the Memory — as to whether it is doing a fetch cycle for an instruction, or a read/write from/to data.
When IorD=0 then the PC provides the address from which to read (i.e. instruction fetch), and, when IorD=1 then the ALU provides the address to read/write data from.  For data operations, the ALU is computing a base + displacement addressing mode: Reg[rs] + SignExt32(imm16) as the effective address to use for the data read or write operation.
Further, let's note that this block diagram does not contain a separate adder for incrementing the PC by 4, whereas most other block diagrams do.  Lookup any of the first few MIPS single cycle datapath images, and you'll see the dedicated adder for that PC increment.  Using a dedicated adder allows the PC to be incremented in parallel with operations done by the ALU, whereas omitting that dedicated adder means that the main ALU must perform the increment of the PC.  However, this probably saves transistors in a simple version of a multicycle implementation where the ALU is not in use every cycle, and so can be used otherwise.
Since Target has a control TargetWrite, we might presume this is an internal register that might be useful in buffering the intended branch target address, for example, if the branch target is computed in one cycle, and finally used in another.
(I thought this could be about buffering for branch delay slot implementation (since those branches are delayed one instruction), but were that the case, the J-Type instructions would have also gone through Target, and they don't.)
So, it looks to me like the machinery there for this multicycle processor is to handle the branch instructions, say beq, which has to:
compute the next sequential PC address from PC + 4
compute the branch target address from (PC+4) + SignExt32(imm32)
compute the branch condition (does Reg[rs] == Reg[rt] ?)
But what order would they be computed?  It is clear from control signals in state 0 is that: PC+4 is computed first, and written back to the PC, for all instructions (i.e. for branches, whether the branch is taken or not).
It seems to me that in a next cycle, (PC+4) + SignExt32(imm16) is computed (by reusing the prior PC+4 which is now in the PC register — this result is stored in Target to buffer that value since it doesn't yet know if the branch is taken or not.  In a next cycle, contents of rs and rt are compared for equality and if equal, the branch should be taken, so PCSource=1, PCWrite=1 selects the Target from the buffer to update the PC, and if not taken, since the PC already has been updated to PC+4, that PC+4 stands (PCWrite=0, PCSource=don't care) for the start of the next instruction.  In either case the next instruction runs with what address the PC holds.
Alternately, since the processor is multicycle, the order of computation could be: compute PC+4 and store into the PC.  Compute the branch condition, and decide what kind of cycle to run next, namely, for the not-taken condition, go right to the next instruction fetch cycle (with PC+4 in the PC), or, for taken branch condition, compute (PC+4) + SignExt32(imm16) and put that into the PC, and then go on to the next instruction fetch cycle.
This alternative approach would require dynamic alteration of the cycles/state for branches, so would complicate the multicycle state machine somewhat and would also not require buffering of a branch Target — so I think it is more likely the former rather than this alternative.

Assembly Commands Are Running without Me Explicitly Calling Them [duplicate]

I hope this question isn't to stupid cause it may seem obvious.
As I'm doing a little research on Buffer overflows I stumble over a simple question:
After going to a new Instruction Address after a call/return/jump:
Will the CPU execute the OP Code at that address and then move one byte to the next address and execute the next OP Code and so on until the next call/return/jump is reached? Or is there something more tricky involved?
A bit boringly extended explanation (saying the same as those comments):
CPU has special purpose register instruction pointer eip, which points to the next instruction to execute.
A jmp, call, ret, etc. ends internally with something similar to:
mov eip,<next_instruction_address>.
While the CPU is processing instructions, it does increment eip by appropriate size of last executed instruction automatically (unless overridden by one of those jmp/j[condition]/call/ret/int/... instructions).
Wherever you point the eip (by whatever means), CPU will try it's best to execute content of that memory as next instruction opcode(s), not aware of any context (where/why did it come from to this new eip). Actually this amnesia sort of happens ahead of each instruction executed (I'm silently ignoring the modern internal x86 architecture with various pre-execution queues and branch predictions, translation into micro instructions, etc... :) ... all of that are implementation details quite hidden from programmer, usually visible only trough poor performance, if you disturb that architecture much by jumping all around mindlessly). So it's CPU, eip and here&now, not much else.
note: some context on x86 can be provided by defining the memory layout by supervising code (like OS), ie. marking some areas of memory as non-executable. CPU detecting it's eip pointing to such area will signal a failure, and fall into "trap" handler (usually managed by OS also, killing the interfering process).
The call instruction saves (onto the stack) the address to the instruction after it onto the stack. After that, it simply jumps. It doesn't explicitly tell the cpu to look for a return instruction, since that will be handled by popping (from the stack) the return address that call saved in the first place. This allows for multiple calls and returns, or to put it simply, nested calls.
While the CPU is processing instructions, it does increment eip by
appropriate size of last executed instruction automatically (unless
overridden by one of those jmp/j[condition]/call/ret/int/... instructions).
That's what i wanted to know.
I'm well aware that thers more Stuff arround (NX Bit, Pipelining ect).
Thanks everybody for their replys

MIPS: why is ISR surrounded with rdpgpr $sp, $sp; wrpgpr $sp, $sp instructions?

I'm working with PIC32 MCUs (MIPS M4K core), I'm trying to understand how do interrupts work in MIPS; I'm armed with "See MIPS Run" book, official MIPS reference and Google. No one of them can help me understand the following:
I have interrupt declared like this:
void __ISR(_CORE_TIMER_VECTOR) my_int_handler(void)
I look at disassembly, and I see that RDPGPR SP, SP is called in the ISR prologue (first instruction, actually); and balancing WRPGPR SR, SR instruction is called in the ISR epilogue (before writing previously-saved Status register to CP0 and calling ERET).
I see that these instruction purposes are to read from and save to previous shadow register set, so, RDPGPR SP, SP reads $sp from shadow register set and WRPGPR SR, SR writes it back, but I can't understand the reason for this. This ISR intended not to use shadow register set, and actually in disassembly I see that context is saved to the stack. But, for some reason, $sp is read from and written to shadow $sp. Why is this?
And, related question: is there some really comprehensive resource (book, or something) on MIPS assembly language? "See MIPS Run" seems really good, it's great starting point for me to dig into MIPS architecture, but it does not cover several topics good enough, several things off the top of my head:
Very little information about EIC (external interrupt controller) mode: it has the diagram with Cause register that shows that in EIC mode we have RIPL instead of IP7-2, but there is nothing about how does it work (say, that interrupt is caused if only Cause->RIPL is more than Status->IPL. There's even no explanation what RIPL does mean ("Requested Interrupt Priority Level", well, Google helped). I understand that EIC is implementation-dependent, but the things I just mentioned are generic.
Assembly language is covered not completely enough: say, nothing about macro (.macro, .endm directives), I couldn't find anything about some assembler directives I've seen in the existing code, say, .set mips32r2, and so on.
I cant find anything about using rdpgpr/wrpgpr in the ISR, it covers these instructions (and shadow register sets in general) very briefly
Official MIPS reference doesn't help much in these topics as well. Is there really good book that covers all possible assembly directives, and so on?
When the MIPS core enters an ISR it can swap the interrupted code's active register set with a new one (there can be several different shadow register sets), specific for that interrupt priority.
Usually the interrupt routines don't have a stack of their own, and because the just switched-in shadow register set certainly have its sp register with a different value than the interrupted code's, the ISR copies the sp value from the just switched-out shadow register set to its own, to be able to use the interrupted code's stack.
If you wish, you could set your ISR's stack to a previously allocated stack of its own, but that is usually not useful.

MIPS forwarding implementation (tough)

I think I understand the first part
(i). I at least have answers for this. I am not sure about where this implementation would fail though, for part ii? Part ii has me completely stumped. Does anyone know situations where this would fail?
If you want to shine some light on part iii you would be my entire classes hero. Were all stumped there. Thanks for any input.
Tim FlimFlam, the infamous architect of the MN-4363 processor, is struggling with a pipelined implementation of the basic MIPS ISA.
(i) To implement forwarding, Tim connected the output of logic from EX and MEM stages (these logic outputs represent inputs to EXMEM and MEMWB latches, respectively) to the input of IDEX register. He claims that he will be able to cover any dependency in this manner.
• Would this implementation work?
• Would he need to insert any muxes? Explain for
1. the producer instruction is a load.
2. the producer instruction is of R-type. 3. the consumer instruction is of R-type. 4. the consumer instruction is a branch. 5. the consumer instruction is a store.
(ii) Tim claims that forwarding to EX stage only suffices to cover all dependencies.
• Provide two examples where his implementation would fail.
• Would “fail” in this case correspond to breaking correctness constraints?
(iii) Tim tries to identify the minimum amount of information to be transferred acros pipeline stages. Considering R-type, data transfer, and branch instructions, explain how wide each pipeline register should be, demarcating different fields per latch.
Not sure if this is late, but the answer rests in "all dependencies" in part 2. Dependencies/hazards are of multiple types, viz, control, data. Some data hazards can be fixed by forwarding (from the MEM && WB stages to execute stage. Other data hazards like LOAD dependency is not possible to fix by forwarding. To see why this happens, note that a LOAD instruction in the MEM stage will have the output ready from the memory only in the end of that clock cycle. In that same clock cycle, any intstruction in the execute stage which requires the value of the LOAD instruction will get the incorrect value. In such a scenario, at any instant of time within the clock cycle say beginning, the alu is beginning to execute while the memory is 'beginning' to fetch the data. At the end of the cycle, while the memory has finished fetching the data, the alu has also finished computing with the wrong values. To prevent hazards, you need alu to be beginning computing while the data memory has finished fetching (i.e the alu must stall for 1 cycle or you must have a nop between LOAD and ALU instrcution. Hope this helps!

MIPS 32-bit architecture: how can a register in a register file be read from and written to in the same clock cycle?

My computer architecture books explains that
"Since writes to the register file are edge-triggered, our design can
legally read and write the same register within a clock cycle: the
read will get the value written in an earlier clock cycle, while the
value written will be available to read in a subsequent clock cycle."
This makes some sense, and I somewhat understand what's going on with the register file. However, I don't understand when each event happens. Say we're reading from one of the 32 register files and writing to it in the same cycle. When would the register be read from? When would it be written to? I don't totally understand how events are triggered by the clock-edges, so it'd help to have that explained too. Thank you!
Reading the value of a register is asynchronous, whereas in the architecture you are working in your class, the registers are written sychronously (i.e. the writes are edge-triggered).
This means that you can read the current value of a register, apply some operation on it (e.g. add some immediate) and write the result at the next raising clock edge.
Suppose you want to issue an addiu $1, $1, 123, that is take the current value of $1, add 123 and store the result back in $1.
At the start of the clock cycle the control unit would instruct the register file to put the contents of $1 in one of the data buses that gets into the ALU. the control unit would also instruct to put the immediate 123 in the other data bus that also gets into the ALU. The addition which is just a combinatorial circuit implemented inside the ALU would compute the said addition and put the result in the data bus that connects the register file for storage.
All of this is done before the raising edge of the clock happens and the result of the addition gets presented until the next raising edge. At some point the raising edge occurs and the result of the addition is now written back into register $1.
The register file is built from flip-flops. Each flip-flop has a store, an input, an output and a trigger. The output is always presenting the stored value, so can be read all the time.
With a rising edge on the trigger, the input value moves into the store.