How can we process a BEQ (branch equal) instruction in 2 cycles in a multicycle MIPS datapath? - mips

Normally the BEQ instruction is processed in 3 cycles (IF, ID, and EX). How can we process BEQ in just 2 cycles?
I have already built the datapath for the 3-cycle version.

Related

Is there an execute-store data hazard in MIPS?

On MIPS architecture with pipelining and forwarding:
add $s0, $t1, $t2
sw $s0, 0($sp)
The add instruction will have the result ready at step 3 (execute operation); however, I presume that the sw instruction wants the result at step 2 (instruction decode & register read).
There is a solved exercise in the book Computer Organization and Design by David A. Patterson: Find the hazards in the following code segment and reorder the instructions to avoid any pipeline stalls:
lw $t1, 0($t0)
lw $t2, 4($t0)
add $t3, $t1,$t2
sw $t3, 12($t0)
lw $t4, 8($t0)
add $t5, $t1,$t4
sw $t5, 16($t0)
Solution:
lw $t1, 0($t0)
lw $t2, 4($t0)
lw $t4, 8($t0)
add $t3, $t1,$t2
sw $t3, 12($t0)
add $t5, $t1,$t4
sw $t5, 16($t0)
In the solution it correctly recognizes the load-use hazard and rearranges the code accordingly, but is there an execute-store hazard as well?
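The load-use stalls in the two orderings can be counted with a small sketch. It assumes a classic 5-stage pipeline with forwarding, where a load followed immediately by an instruction that reads the loaded register in EX costs one stall, and it assumes the base register is $t0 in every load and store. The tuple layout and the function name are made up for illustration.

```python
# Each instruction: (opcode, destination register or None, registers read in EX).
# For sw only the base register is read in EX; the stored value is needed in MEM.
def load_use_stalls(program):
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_ex_sources = cur
        # A load's data arrives at the end of MEM, one cycle too late
        # for an instruction that needs it in EX right behind it.
        if prev_op == "lw" and prev_dest in cur_ex_sources:
            stalls += 1
    return stalls

original = [
    ("lw", "$t1", ["$t0"]),
    ("lw", "$t2", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw", None, ["$t0"]),
    ("lw", "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw", None, ["$t0"]),
]

reordered = [
    ("lw", "$t1", ["$t0"]),
    ("lw", "$t2", ["$t0"]),
    ("lw", "$t4", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw", None, ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw", None, ["$t0"]),
]

print(load_use_stalls(original))   # 2
print(load_use_stalls(reordered))  # 0
```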
Let's consider a MIPS in which forwarding is activated.
I think that in that case no hazard occurs: in fact the ADD instruction is an integer operation that in the MIPS architecture requires only one clock cycle.
Look at this graph:
ADD $t3,$t1,$t2    IF  ID  EX  MEM WB
SW $t3,12($t0)         IF  ID  EX  MEM WB
As you can see, no hazard occurs: SW needs the value of $t3 only in its MEM stage, two clock cycles after ADD has computed it in EX, so forwarding delivers it in time.
Actually, in similar situations a hazard can occur, but only if the functional unit is a multicycle one (that is, if it requires more than one clock cycle to compute the data).
Look at this example, in which the ADD.D instruction uses a floating-point adder that requires 4 clock cycles to perform the calculation:
ADD.D F2,F4,F5     IF  ID  A0  A1  A2  A3  MEM WB
S.D F2,somewhere       IF  ID  EX  X0  X1  X2  MEM WB
X0 and X1 are RAW stalls while X2 is a structural stall: in the former case S.D must wait for ADD.D to finish; in the latter, your MIPS cannot access the memory twice in the same clock cycle, so a structural stall occurs.
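The two RAW cases above can be checked with a little cycle arithmetic. This is only a sketch: cycles are numbered from 1 at the producer's IF, the consumer is fetched one cycle later, and the store is assumed to need its data at the start of its MEM stage. The function name is hypothetical, and the structural stall X2 is not modelled.

```python
def raw_stalls(producer_ready_cycle, consumer_need_cycle):
    """Stalls needed so the consumer's stage starts after the result exists.

    producer_ready_cycle: cycle at whose end the result is available
    consumer_need_cycle:  cycle in which the unstalled consumer needs it
    """
    return max(0, producer_ready_cycle + 1 - consumer_need_cycle)

# ADD: IF=1 ID=2 EX=3, so the result is ready at the end of cycle 3.
# SW:  IF=2 ID=3 EX=4 MEM=5, so the data is needed in cycle 5.
print(raw_stalls(3, 5))  # 0

# ADD.D: IF=1 ID=2 A0..A3=3..6, so the result is ready at the end of cycle 6.
# S.D:   IF=2 ID=3 EX=4 MEM=5 if it were not stalled.
print(raw_stalls(6, 5))  # 2
```

The two stalls in the second case correspond to X0 and X1 in the diagram.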

Questions about adding jal instruction to mips single cycle datapath

I am trying to add the jal instruction. I understand how it works, but I am having difficulty implementing it in the hardware.
I have this schematic, and it shows that 31 connects to the mux before the register file, but I am not sure what to connect. I see that R[31] is equal to pc+8 or to the jump address; however, those are 32 bits, while the input to the mux is just 5 bits.
It means that the constant 31 should be fed to the mux.
That 5-bit constant is the register number for $ra, which is the register you want to hold the value of $PC + 8 if the MIPS has delayed branching, and $PC + 4 if it does not.
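As a rough model of that mux, the sketch below selects the 5-bit write-register number, while the 32-bit link value travels on the separate write-data path. The 2-bit select encoding (0 = rt, 1 = rd, 2 = constant 31) and the function names are assumptions for illustration, not something taken from the schematic.

```python
RA = 31  # register number of $ra

def write_register(rt, rd, reg_dst):
    """Mux in front of the register file's write-register input (5 bits wide)."""
    # Assumed encoding: 0 -> rt, 1 -> rd, 2 -> the hard-wired constant 31
    return (rt, rd, RA)[reg_dst]

def link_value(pc, delayed_branching):
    """32-bit value written into $ra by jal (goes to the write-data input)."""
    return pc + (8 if delayed_branching else 4)

print(write_register(rt=8, rd=16, reg_dst=2))  # 31
print(hex(link_value(0x00400014, False)))      # 0x400018
print(hex(link_value(0x00400014, True)))       # 0x40001c
```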

nops in superscalar MIPS pipeline

Full disclosure: this is related to a homework question, but is not itself a homework question (if that makes sense).
Let's say I had the following MIPS code:
100 addi $1, $0, 1
104 nop
108 addi $2, $0, 2
112 addi $3, $0, 3
and I was running it on a superscalar datapath that loads two instructions each cycle. Based on what I know about superscalar pipelines, I would say that in cycle one the processor will fetch instructions 100 and 104.
I would also say that at some point before 100/104 finished executing, the CPU would fetch instructions 108 and 112. Now, if instruction 104 were something other than nop, I would say that this would happen in cycle 2 (leaving aside complexities like stalling). However, the fact that it IS a nop is making me pause.
I have two questions:
Am I correct that the processor will fetch both 100 and 104 in the same cycle? In other words, does a superscalar processor typically have any special handling around fetching nop instructions?
Assuming my assumption is correct: will instructions 108 and 112 be fetched in cycle 2, or cycle 3?
I suspect the answer would be cycle 2. Although the intent of nop is to delay the execution of the next instruction, it does this by just doing some empty work. My suspicion is that the processor will just execute the nop in parallel with instruction 100, and then fetch instructions 108 and 112 in the following cycle.

MIPS Minimum Needed Memory Space

What is the minimum amount of memory required to run the program partially shown below, which runs on a MIPS with 5 pipeline stages at 2 nanoseconds per stage for fixed-point operations? For floating-point operations the EX stage takes 16 ns. Each instruction occupies only as many pipeline stages as its execution requires (assume that there are no pipeline conflicts).
.data
Pf1: .word 0x41400000
Vet1: .double 1.0, 2.0, 3.0, 4.0
.text
leaf_example:
addi $sp, $sp, -48
sw $s0, 0($sp)
sll $t0, $s0, 5
label: addu $t0, $t0, $s2
sll $t0, $t0, 3
addu $t0, $a1, $t0
bgt $t0, $s0, label
l.d $f18, 0($t0)
AFAIK, pipelining and the time spent in particular stages affect the dynamic instruction count and/or the instruction processing time, but not the memory that is required to store the program.
.text starts at 0x00400000
.data starts at 0x10010000
It seems reasonable that how much memory you need depends on if the hardware/chipset can virtually present memory at different locations without needing physical memory to fill the gaps.
No Virtual/Logical Memory Management
If there was no chipset or system providing logical memory management, it seems that you would need 4194304 bytes, or 4 MB if you didn't have a .data section. If you have anything in .data, then it would need to be at least 256MB + 64 KB + however many bytes you're storing.
In your example, this would mean that you need 256 MB + 64 KB + 36 bytes = 268501028 bytes, or about 256.06 MB.
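The arithmetic can be reproduced in a few lines. The base addresses are assumptions that match the 4 MB and 256 MB + 64 KB figures above (.text at 0x00400000, .data at 0x10010000), and the data size ignores any alignment padding.

```python
TEXT_BASE = 0x00400000   # assumed start of .text (4 MB)
DATA_BASE = 0x10010000   # assumed start of .data (256 MB + 64 KB)
MB = 1024 * 1024

data_bytes = 4 + 4 * 8   # one .word plus four .double values, no padding

flat_bytes = DATA_BASE + data_bytes  # highest byte that must physically exist
print(TEXT_BASE)                     # 4194304
print(flat_bytes)                    # 268501028
print(round(flat_bytes / MB, 2))     # 256.06
```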
With Virtual Memory Management
Suppose your MIPS program is running on a platform that is doing virtual memory management. Then the system could present memory at location 0x10010000, for example, without actually having all of the previous addresses (like 0x1000ffff) physically located.
Also, this analysis could work if you use a modified MIPS memory layout. In MARS, you can compact the memory by setting .data to start at address 0x0.
Here it would be a straightforward calculation of the instructions plus the data. In your example, since bgt and l.d are pseudo-instructions, they increase the number of instructions from the apparent 8 to 11 real machine instructions. 11 words in .text (44 bytes) plus 9 words in .data (36 bytes) gives 80 bytes.
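The compact total is a one-line sum. In this sketch the instruction count of 11 is taken from the answer rather than derived from an assembler, and the data is assumed to be packed without alignment padding.

```python
WORD = 4                # bytes per MIPS word

text_words = 11         # machine instructions after pseudo-instruction expansion (from the answer)
data_words = 1 + 4 * 2  # one .word plus four .double values, two words each

total_bytes = (text_words + data_words) * WORD
print(text_words * WORD, data_words * WORD, total_bytes)  # 44 36 80
```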

representing the addi $s1, $0, 4 instruction: write down the value of the control signals

I'm doing a homework assignment where I need to write down the values of the control signals for 5 instructions, and I am trying to figure out the sample first (code at the bottom). The 5 instructions I need to do are:
Address     Code        Basic                   Source
0x00400014  0x12120004  beq $16,$18,0x0004      15 beq $s0, $s2, exit
0x00400018  0x8e080000  lw $8,0x0000($16)       16 lw $t0, ($s0)
0x0040001c  0x02118020  add $16,$16,$17         17 add $s0, $s0, $s1
0x00400020  0xae08fffc  sw $8,0xfffc($16)       18 sw $t0, -4($s0)
0x00400024  0x08100005  j 0x00400014            19 j loop
And the example he did is for addi $s1,$0,4 . Right now I have this for it:
Address     Code        Basic                   Source
0x00400028  0x20110004  addi $16,$0,4           20 addi $s1, $0, 4
where I think the 4 in the basic column is incorrect. What would be the right answer?
Here's the sample he did for that, and below that is the diagram he is referring to with the control signals:
##--------------------------
# Example
# addi $s1, $0, 4
# Although not supported as in Figure 4.24, the instruction can be easily
# supported with minor changes in the control circuit.
instruction_address=0x00400028
instruction_encoding=0x20110004
OPcode=0b001000
Jump=0
Branch=0
Jump_address=0x00440010 # not used in this instruction
Branch_address=0x0040003C # not used in this instruction
Read_register_1=0b00000
Read_register_2=0b10001
Sign_extend_output=0x00000004
ALUSrc=1 # pick the value from sign_extend_output
ALUOp=0b00 # assume the same value as load/store instruction
ALU_control_input=0b0010 # add operation, as in load/store instruction
MemRead=0
MemWrite=0
MemtoReg=0 # select the ALU result
RegDst=0
Write_register=0b10001 #register number for $s1
RegWrite=1
##--------------------------
Let's examine the breakdown of the first instruction: beq $s0, $s2, exit.
The instruction address is given under the address column above: 0x00400014. You have the encoding as well: 0x12120004. The encoding is the machine instruction. Let's represent the instruction in binary: 000100 10000 10010 0000000000000100.
This is an I-type instruction. The first group of six bits is the opcode, the second group of five is the source register (rs), the third group of five is the target register (rt), and the last group of sixteen is the immediate value.
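The field split can be reproduced with shifts and masks. This is a minimal sketch; the function name is made up for illustration.

```python
def decode_i_type(word):
    """Split a 32-bit MIPS I-type instruction into its four fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31-26
        "rs": (word >> 21) & 0x1F,      # bits 25-21
        "rt": (word >> 16) & 0x1F,      # bits 20-16
        "imm": word & 0xFFFF,           # bits 15-0
    }

fields = decode_i_type(0x12120004)
print(fields)  # {'opcode': 4, 'rs': 16, 'rt': 18, 'imm': 4}
```

Here rs = 16 is $s0 and rt = 18 is $s2, matching beq $s0, $s2, exit.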
The opcode is then 0b000100. Since this is an I-type instruction, we aren't jumping to a target, thus the Jump signal is 0. However, we are branching, so the Branch signal is 1.
To find the Jump_Address, even though it is ignored, examine the least significant 26 bits: 10000 10010 0000000000000100. Since instructions are word-aligned, the two low bits of every instruction address are always zero, so they are not stored in the instruction; this enlarges the range of reachable addresses. In other words, a byte distance of 8 is represented as 2, and this is why we must shift the field 2 bits to the left. Shifting gives the 28 bits 10 00010 01000 0000000000010000, and the upper four bits are taken from PC + 4 (0x00400018, so 0000). So we end up with Jump_Address = 0x08480010.
To find the Branch_Address, which will be used, examine the least significant 16 bits: 0000000000000100. That's sign extended and shifted 2 bits to the left to get: 0000000000000000 0000000000010000 or 0x00000010. This immediate value will be added to the program counter, which points to the next instruction: 0x00400018. So we finally end with Branch_Address = 0x00400028. I'm assuming the exit label points to the next instruction after the five you've posted above, right after the j instruction.
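Both address calculations can be sketched as follows, assuming the usual MIPS rules: the jump target is the upper four bits of PC + 4 joined with the 26-bit field shifted left by 2, and the branch target is PC + 4 plus the sign-extended immediate shifted left by 2. The function names are hypothetical.

```python
def sign_extend16(imm):
    """Interpret a 16-bit field as a signed value."""
    return imm - 0x10000 if imm & 0x8000 else imm

def jump_address(pc, word):
    # Upper 4 bits of PC + 4, then the low 26 bits of the instruction shifted left by 2
    return ((pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)

def branch_address(pc, word):
    # PC + 4 plus the sign-extended, word-scaled immediate, kept to 32 bits
    return (pc + 4 + (sign_extend16(word & 0xFFFF) << 2)) & 0xFFFFFFFF

# beq $s0, $s2, exit at 0x00400014
print(hex(jump_address(0x00400014, 0x12120004)))    # 0x8480010
print(hex(branch_address(0x00400014, 0x12120004)))  # 0x400028

# addi $s1, $0, 4 at 0x00400028 (the sample)
print(hex(jump_address(0x00400028, 0x20110004)))    # 0x440010
print(hex(branch_address(0x00400028, 0x20110004)))  # 0x40003c
```

The second pair agrees with the Jump_address and Branch_address values in the sample.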
The registers are straightforward. Read_register_1 = 0b10000 and Read_register_2 = 0b10010.
The Sign_extend_output is just the immediate field sign-extended: 0x00000004.
On to the ALU control signals. ALUSrc controls the multiplexer between the register file and ALU. Since a beq instruction requires the use of two registers, we need to select the Read data 2 register from the register file. We aren't using the immediate field for an ALU computation, like with the addi instruction. Therefore, the ALUSrc is 0.
The ALUOp and ALU_control_input are hard-wired values that are created from the opcode. ALUOp = 0b01 and ALU_control_input = 0b0110. Pg. 323 of Computer Organization and Design, 4th Edition Revised, by Patterson and Hennessy and this web page have a table with the appropriate control signals for a beq instruction. Pg. 318 has a table with the ALU control bit mappings.
MemRead and MemWrite are 0 since we aren't accessing memory; RegWrite is 0 since we aren't writing to the register file; MemtoReg and RegDst are both X (don't care) since RegWrite is 0; and lastly, to find Write_register, take bits 20-16 (look at the multiplexer between the instruction memory and register file), which are 0b10010.
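The settings above can be collected into a lookup table. The sketch below encodes the standard single-cycle control values for R-format, lw, sw and beq as given in the textbook's control tables; the dictionary layout, the use of None for don't-care, and the function name are illustrative choices.

```python
X = None  # don't care

MAIN_CONTROL = {
    "R-format": dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                     MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    "lw":       dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                     MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    "sw":       dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0,
                     MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    "beq":      dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0,
                     MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

def alu_control(alu_op, funct=None):
    """Map ALUOp (and funct for R-format) to the 4-bit ALU control input."""
    if alu_op == 0b00:   # lw/sw: add
        return 0b0010
    if alu_op == 0b01:   # beq: subtract
        return 0b0110
    # R-format: add, sub, and, or, slt
    return {0x20: 0b0010, 0x22: 0b0110, 0x24: 0b0000,
            0x25: 0b0001, 0x2A: 0b0111}[funct]

beq = MAIN_CONTROL["beq"]
print(beq["Branch"], beq["ALUSrc"], beq["RegWrite"])  # 1 0 0
print(format(alu_control(beq["ALUOp"]), "04b"))       # 0110
```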