nops in superscalar MIPS pipeline - mips

Full disclosure: this is related to a homework question, but is not itself a homework question (if that makes sense).
Let's say I had the following MIPS code:
100 addi $1, $0, 1
104 nop
108 addi $2, $0, 2
112 addi $3, $0, 3
and I was running it on a superscalar datapath that fetches two instructions each cycle. Based on what I know about superscalar pipelines, I would say that in cycle one the processor will fetch instructions 100 and 104.
I would also say that at some point before 100/104 finished executing, the CPU would fetch instructions 108 and 112. Now, if instruction 104 were something other than nop, I would say that this would happen in cycle 2 (leaving aside complexities like stalling). However, the fact that it IS a nop is making me pause.
I have two questions:
Am I correct that the processor will fetch both 100 and 104 in the same cycle? In other words, does a superscalar processor typically have any special handling around fetching nop instructions?
Assuming my assumption is correct: will instructions 108 and 112 be fetched in cycle 2, or cycle 3?
I suspect the answer would be cycle 2. Although the intent of nop is to delay the execution of the next instruction, it does this by just doing some empty work. My suspicion is that the processor will just execute the nop in parallel with instruction 100, and then fetch instructions 108 and 112 in the following cycle.
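The scenario in the question can be written down as a tiny model. This is only a sketch of the asker's assumption, not a description of any particular processor: it assumes an idealized in-order fetch stage with no stalls, and it assumes a nop is fetched like any other instruction word (on MIPS, nop is the all-zero word, the encoding of sll $0, $0, 0). The function name fetch_schedule is made up for this illustration.

```python
# Idealized in-order fetch model: `width` instructions per cycle, no stalls.
# Assumption: the fetch stage does not look at what it fetches, so a nop
# occupies a fetch slot exactly like any other instruction.
def fetch_schedule(addresses, width=2):
    """Map each cycle number (starting at 1) to the addresses fetched in it."""
    return {
        cycle + 1: addresses[i:i + width]
        for cycle, i in enumerate(range(0, len(addresses), width))
    }

program = [100, 104, 108, 112]  # address 104 holds the nop
schedule = fetch_schedule(program)
print(schedule)
```

Under these assumptions, 100 and 104 share cycle 1 and 108 and 112 share cycle 2, which matches the suspicion above.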

Related

How does MIPS assembler manage label address?

How does MIPS's assembler labels and J type instruction work?
I am currently making a MIPS simulator using C++ and came into a big question. How exactly does MIPS assembler manage label's and their address while on a J type instruction?
Let's assume that we have a following code. Also let's assume that start: starts at 0x00400000. Comments after code represent where the machine codes will be stored in memory.
start:
andi $t0, $t0, 0 # 0x0040 0000
andi $t1, $t1, 0 # 0x0040 0004
andi $t2, $t2, 0 # 0x0040 0008
addi $t3, $t3, 4 # 0x0040 000C
loop:
addi $t2, $t2, 1 # 0x0040 0010
beq $t2, $t3, exit # 0x0040 0014
j loop # 0x0040 0018
exit:
addi $t0, $t0, 1000 # 0x0040 001C
As I currently understand it, j loop will set the PC to 0x0040 0010.
A J-type instruction is 32 bits wide and uses its 6 most significant bits as the opcode, so it has only 26 bits left to represent the address of an instruction. How is it possible to cover a 32-bit address space using only 26 bits?
In the example above, 0x00400010 can be represented with only 24 bits. However, in references, the text segment is located from 0x00400000 to 0x10000000, which needs 32 bits to represent.
I have tried to understand this using the MARS simulator, but it just shows j loop as j 0x00400010, which makes no sense to me since 0x00400010 is 32 bits.
My current guess
One of my current guesses is following.
The assembler saves the address of the loop: label in some memory location that is reachable with 26 bits. Then, when j loop is executed, the label loop is translated to the memory location that contains 0x00400010. For example, 0x00400010 is saved at some address like 0x00300000; when j loop is executed, loop is translated into 0x00300000, and the processor reads the value stored at 0x00300000 to reach 0x00400010. (This is just one of my guesses.)
You have a number of questions here.
First, let's try to differentiate between the assembler's operation and the MIPS machine code that it generates and the processor executes.
The assembler manages labels and address in two ways.  First, it has a symbol table, which is like a dictionary, a data structure of key-value pairs where the names are keys and the addresses (that those names will refer to when the program is running) are the values in the pairs.
Second, the assembler manages the code and data sections with a location counter. That location counter advances each time the program provides some code or data. When a new label is defined, the current location counter is used as the address value in a new key-value pair.
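The symbol table and location counter described above can be sketched as the first pass of a toy assembler. This is a minimal illustration, not how any real assembler is implemented: the function name build_symbol_table is hypothetical, it assumes every instruction occupies exactly 4 bytes (no pseudo-instructions that expand), and it assumes labels sit on their own line as in the question's listing.

```python
def build_symbol_table(lines, base=0x00400000):
    """First pass: record the address each label refers to."""
    symbols = {}
    location = base                      # the location counter
    for line in lines:
        text = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if not text:
            continue
        if text.endswith(":"):
            # A label definition takes no space; it names the current location.
            symbols[text[:-1]] = location
        else:
            # Assumption: each instruction is one 32-bit word.
            location += 4
    return symbols

source = [
    "start:",
    "andi $t0, $t0, 0",
    "andi $t1, $t1, 0",
    "andi $t2, $t2, 0",
    "addi $t3, $t3, 4",
    "loop:",
    "addi $t2, $t2, 1",
    "beq $t2, $t3, exit",
    "j loop",
    "exit:",
    "addi $t0, $t0, 1000",
]
symbols = build_symbol_table(source)
for name, address in symbols.items():
    print(f"{name}: 0x{address:08X}")
```

With the question's listing, loop ends up at 0x00400010 and exit at the word right after the j instruction.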
The processor never sees the labels: they do not execute and they do not occupy any space in the code or data. The processor sees only machine code instructions, which on MIPS are all 32 bits wide. Each machine code instruction is divided into fields. There are instruction types or formats, which on MIPS are straightforward: I-Type, J-Type, and R-Type. These formats then define the instruction fields, and the assembler follows these encodings. All the instruction formats share the 6-bit opcode field, and this opcode field tells the processor what format the instruction is, which fields it therefore has, and thus how to interpret and execute the rest of the instruction.
The assembler removes labels from the assembly: labels and their names do not exist in the program binary. The label definitions themselves (label:) are omitted from the program binary, but usages of labels are translated into numbers. A machine code instruction that uses a label will have some numeric instruction field, and the assembler provides the proper value for that field so that the instruction reaches or otherwise accesses what the label referred to. (The label is no longer in the program binary, but the code or data memory that the label referred to does remain.)
The assembler sets up branch instructions, j instructions, and la/lw instructions using numbers that tell the processor how far forward or backward to move the program counter, or at what address some data of interest is located. The lw/la instructions access data, and each expands to two 32-bit instructions, each holding 16 bits of the address of interest. Between the two instructions, they put together a full 32-bit address for data access. For branches to reach any 32-bit address, they would have to put together the 32-bit address in a similar manner (a two-instruction pair) and use an indirect/register branch.
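For the 26-bit question specifically, here is a small sketch of how the numeric fields can be computed. It relies on standard MIPS32 behavior rather than anything spelled out above: the j field holds the target address divided by 4 (instructions are word-aligned), the processor takes the top 4 address bits from PC + 4, and a branch field holds a signed word offset from PC + 4. The function names are hypothetical.

```python
J_OPCODE = 0x02  # opcode of the MIPS j instruction

def encode_j(target):
    """Encode `j target`: keep bits 27..2 of the target in the 26-bit field."""
    return (J_OPCODE << 26) | ((target >> 2) & 0x03FFFFFF)

def jump_target(pc, instruction):
    """Rebuild the 32-bit target: top 4 bits of PC + 4, then field << 2."""
    return ((pc + 4) & 0xF0000000) | ((instruction & 0x03FFFFFF) << 2)

def branch_offset(pc, target):
    """Signed word distance from the instruction after the branch."""
    return (target - (pc + 4)) // 4

word = encode_j(0x00400010)            # the `j loop` from the question
print(f"0x{word:08X}")                 # machine word for j loop
print(f"0x{jump_target(0x00400018, word):08X}")
print(branch_offset(0x00400014, 0x0040001C))   # beq ... exit
```

So nothing is stored in a side table at run time: the 26-bit field plus two implied zero bits plus four bits borrowed from the PC give 32 bits, which is also why a single j can only reach targets inside the current 256 MB region.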

How do the lines of lw add and sw translate to the chart below?

I'm just getting started learning this material. I don't understand where the 35, 9, 8 and such numbers in the chart come from. All I see in the assembly lines is the number 300.
https://ibb.co/ZYrk42H
35 is the opcode for lw, 9 is the actual register number for $t1 and 8 is the actual register number for $t0.
$t0 is a friendly register name: in the MIPS register numbering, $t0 is also $8 (register number 8), and $t1 is $9. As you can imagine, the hardware doesn't care about the friendly names and only wants to see the actual numbers.
35 is the opcode: lw has the binary opcode 100011, which is 35 in decimal.
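To make the mapping from assembly to chart fields concrete, here is a minimal sketch that packs the I-type fields into one 32-bit word. The linked chart is not reproduced here, so the offset 1200 is an assumption (300 words times 4 bytes per word would give that value); the function name encode_i_type is hypothetical.

```python
def encode_i_type(opcode, rs, rt, immediate):
    """Pack the I-type fields: 6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)

LW = 35      # opcode of lw (0b100011)
T0, T1 = 8, 9  # register numbers of $t0 and $t1

# Assumed instruction: lw $t0, 1200($t1)
word = encode_i_type(LW, rs=T1, rt=T0, immediate=1200)
print(f"0x{word:08X}")
print(format(LW, "06b"))   # the opcode field in binary
```

Each number in the chart is then just one of these fields written in decimal.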

MIPS Structural Hazard

I'm trying to learn about MIPS pipelining and the hazards associated with it. I'm having trouble picturing what a structural hazard looks like in MIPS instructions.
I've read that it is a situation where two (or more) instructions require the use of a given hardware resource at the same time. I've seen examples shown in clock cycles before, but can anyone provide a simple example in MIPS instructions for me to see? I'm having difficulty finding one online; I just see lots of examples for data hazards, and that's not what I'm looking for. Thanks!
It is hard for you to come by this problem because it's usually resolved in the HW architecture...
Here are two examples:
Assume a write is made to the register file (RF) during stage 5 (WB) and a read is made from the same register of the RF in stage 2 (ID) at the same time. This is a structural hazard because two instructions are trying to access the same resource in the same clock cycle (what value will be read?). This can be resolved (in the HW), for instance, by splitting the RF access into two clock phases, write on HIGH and read on LOW. Moreover, if you think about it, structural hazards are why the RF has two separate read ports and one write port.
Assume an instruction is being fetched from memory (stage 1, IF) while another read/write is done to the memory in stage 4 (MEM). Again, the same resource is accessed in the same cycle. This was resolved by separating the data and instruction memories (Harvard architecture). It may look obvious to you, but you can look up the Princeton architecture to see an example of a unified memory.
So if we take the first example: any sequence in which a load (lw) writes a register and an R-type instruction (like add) that reads the same register follows after two other instructions will do the trick:
lw $8, 100($9)
add $10, $11, $12
add $10, $11, $12
add $10, $8, $12
Hope that helps.
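The timing behind that first example can be checked with a short sketch. It assumes the classic five-stage pipeline (IF, ID, EX, MEM, WB) with one instruction issued per cycle and no stalls; the helper names are made up for this illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_in_cycle(index, cycle):
    """Stage occupied in `cycle` (1-based) by the instruction at program
    position `index` (0-based), or None if it is not in the pipeline."""
    position = cycle - 1 - index
    return STAGES[position] if 0 <= position < len(STAGES) else None

def register_file_conflicts(count):
    """List (cycle, writer, reader) triples where one instruction is in WB
    while another is in ID, i.e. both touch the register file."""
    conflicts = []
    for cycle in range(1, count + len(STAGES)):
        writers = [i for i in range(count) if stage_in_cycle(i, cycle) == "WB"]
        readers = [i for i in range(count) if stage_in_cycle(i, cycle) == "ID"]
        conflicts.extend((cycle, w, r) for w in writers for r in readers)
    return conflicts

program = [
    "lw $8, 100($9)",
    "add $10, $11, $12",
    "add $10, $11, $12",
    "add $10, $8, $12",
]
for cycle, writer, reader in register_file_conflicts(len(program)):
    print(f"cycle {cycle}: '{program[writer]}' in WB, '{program[reader]}' in ID")
```

For the four instructions above, the only overlap is in cycle 5, where the lw writes $8 while the last add reads it.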
This might work, but I'm not a big MIPS person:
add $t0, $t1, $t2
sw $t3, 0($t4)
sub $t5, $t6, $t7
sub $t8, $t9, $t0
sw $t0, 0($s0)

Questions about adding jal instruction to mips single cycle datapath

I am trying to add the jal instruction. I understand how it works, but I am having difficulty implementing it in the hardware.
I have this schematic, and it shows that 31 connects to the mux before the register file, but I'm not sure what to connect. I see that R[31] is equal to pc+8 or to the jump address; however, those are 32 bits while the input to the mux is just 5 bits.
It means that the constant 31 is fed to the mux.
That 5-bit constant is the register number for $ra, which is the register you want to hold the value of $PC + 8 if the MIPS has delayed branching, or $PC + 4 if it does not.
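One way to picture this is as two separate multiplexers: a 5-bit one that picks the register number and a 32-bit one that picks the data to write. The sketch below is only an illustration; the schematic is not reproduced here, so the select encodings and function names are hypothetical.

```python
RA = 31  # register number of $ra, the 5-bit constant 0b11111

def write_register_mux(rt, rd, select):
    """5-bit mux: 0 -> rt, 1 -> rd, 2 -> constant 31 (hypothetical encoding)."""
    return (rt, rd, RA)[select]

def write_data_mux(alu_result, mem_data, link_address, select):
    """32-bit mux: 0 -> ALU result, 1 -> memory data, 2 -> link address."""
    return (alu_result, mem_data, link_address)[select]

def link_address(pc, delayed_branching):
    """Return address stored by jal: PC + 8 with a delay slot, else PC + 4."""
    return (pc + (8 if delayed_branching else 4)) & 0xFFFFFFFF

pc = 0x00400020
register = write_register_mux(rt=0, rd=0, select=2)
data = write_data_mux(0, 0, link_address(pc, delayed_branching=False), select=2)
print(format(register, "05b"), f"0x{data:08X}")
```

The 5-bit constant only names the destination; the 32-bit PC-derived value travels on the write-data path.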

representing the addi $s1, $0, 4 instruction: write down the value of the control signals

I'm doing a homework assignment where I need to write down the value of the control signals for 5 instructions, and I'm trying to figure out the sample first (code at the bottom). The 5 instructions I need to do are
Address Code Basic Source
0x00400014 0x12120004 beq $16,$18,0x0004 15 beq $s0, $s2, exit
0x00400018 0x8e080000 lw $8,0x0000($16) 16 lw $t0, ($s0)
0x0040001c 0x02118020 add $16,$16,$17 17 add $s0, $s0, $s1
0x00400020 0xae08fffc sw $8,0xfffc($16) 18 sw $t0, -4($s0)
0x00400024 0x08100005 j 0x00400014 19 j loop
And the example he did is for addi $s1,$0,4 . Right now I have this for it:
Address Code Basic Source
0x00400028 0x20110004 addi $16,$0,4 20 addi $s1, $0, 4
where I think the 4 in the basic column is incorrect. What would be the right answer?
Here's the sample he did for that, and below that is the diagram he is referring to with the control signals:
##--------------------------
# Example
# addi $s1, $0, 4
# Although not supported as in Figure 4.24, the instruction can be easily
# supported with minor changes in the control circuit.
instruction_address=0x00400028
instruction_encoding=0x20110004
OPcode=0b001000
Jump=0
Branch=0
Jump_address=0x00440010 # not used in this instruction
Branch_address=0x0040003C # not used in this instruction
Read_register_1=0b00000
Read_register_2=0b10001
Sign_extend_output=0x00000004
ALUSrc=1 # pick the value from sign_extend_output
ALUOp=0b00 # assume the same value as load/store instruction
ALU_control_input=0b0010 # add operation, as in load/store instruction
MemRead=0
MemWrite=0
MemtoReg=0 # select the ALU result
RegDst=0
Write_register=0b10001 #register number for $s1
RegWrite=1
##--------------------------
Let's examine the breakdown of the first instruction: beq $s0, $s2, exit.
The instruction address is given under the address column above: 0x00400014. You have the encoding as well: 0x12120004. The encoding is the machine instruction. Let's represent the instruction in binary: 000100 10000 10010 0000000000000100.
This is an I-type instruction. The first group of six bits is the opcode, the second group of five is the source register (rs), the third group of five is the target register (rt), and the last group of sixteen is the immediate value.
The opcode is then 0b000100. Since this is an I-type instruction, we aren't jumping to a target, thus the Jump signal is 0. However, we are branching, so the Branch signal is 1.
To find the Jump_Address, even though it is ignored, examine the least significant 26 bits: 10000 10010 0000000000000100. Since instructions are word-aligned, the two low bits of every instruction address are 0, so the field stores the target address divided by 4, and we must shift it 2 bits to the left to recover a byte address. That gives the 28 bits 10 00010 01000 0000000000010000, or 0x8480010. The upper 4 bits are taken from PC + 4 (0x00400018), which are 0000 here, so we end up with Jump_Address = 0x08480010. (A jump target is not PC-relative; it is the branch offset, covered next, that is a signed word distance from the next instruction.)
To find the Branch_Address, which will be used, examine the least significant 16 bits: 0000000000000100. That's sign extended and shifted 2 bits to the left to get: 0000000000000000 0000000000010000 or 0x00000010. This immediate value will be added to the program counter, which points to the next instruction: 0x00400018. So we finally end with Branch_Address = 0x00400028. I'm assuming the exit label points to the next instruction after the five you've posted above, right after the j instruction.
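The field extraction and the two address calculations above can be checked with a short sketch. The function names are hypothetical, and the layout assumed is the standard MIPS I-type/J-type encoding; the two instructions used are the beq from the listing and the instructor's addi sample.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of `value` as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def decode(address, word):
    """Extract the fields and the two candidate next-PC values."""
    immediate = sign_extend16(word)
    next_pc = address + 4
    return {
        "opcode": word >> 26,
        "rs": (word >> 21) & 0x1F,            # Read_register_1
        "rt": (word >> 16) & 0x1F,            # Read_register_2
        "sign_extend": immediate & 0xFFFFFFFF,
        "branch_address": (next_pc + (immediate << 2)) & 0xFFFFFFFF,
        "jump_address": (next_pc & 0xF0000000) | ((word & 0x03FFFFFF) << 2),
    }

beq = decode(0x00400014, 0x12120004)    # beq $s0, $s2, exit
addi = decode(0x00400028, 0x20110004)   # the instructor's addi sample
for name, fields in (("beq", beq), ("addi", addi)):
    print(name, {key: hex(value) for key, value in fields.items()})
```

Running it on the addi sample reproduces the Jump_address and Branch_address values given in the instructor's example, which is a useful sanity check before filling in the other four instructions.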
The registers are straightforward. Read_register_1 = 0b10000 and Read_register_2 = 0b10010.
The Sign_extend_output is just the immediate field sign-extended: 0x00000004.
On to the ALU control signals. ALUSrc controls the multiplexer between the register file and ALU. Since a beq instruction requires the use of two registers, we need to select the Read data 2 register from the register file. We aren't using the immediate field for an ALU computation, like with the addi instruction. Therefore, the ALUSrc is 0.
The ALUOp and ALU_control_input are hard-wired values that are created from the opcode. ALUOp = 0b01 and ALU_control_input = 0b0110 (subtract, so that the Zero output signals equality). Pg. 323 of Computer Organization and Design, 4th Edition Revised, by Patterson and Hennessy and this web page have a table with the appropriate control signals for a beq instruction. Pg. 318 has a table with the ALU control bit mappings.
MemRead and MemWrite are 0 since we aren't accessing memory; RegWrite is 0 since we aren't writing to the register file; MemtoReg and RegDst are X (don't care) since RegWrite is 0; and lastly, to find Write_register, take bits 16-20 (look at the multiplexer between the instruction memory and register file), which are 0b10010.
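For the remaining instructions, the main control signals can be looked up by opcode. The sketch below encodes the usual single-cycle control table from the textbook for R-format, lw, sw and beq, with None standing for X (don't care). It is a lookup aid under that assumption, not the instructor's expected answer format; the j instruction and the addi extension are left out, and the names are hypothetical.

```python
X = None  # don't care

# Main control outputs, keyed by the 6-bit opcode.
CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),  # R-format
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),  # lw
    0b101011: dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),  # sw
    0b000100: dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),  # beq
}

def control_signals(word):
    """Look up the control signals for a 32-bit machine instruction."""
    return CONTROL[word >> 26]

for encoding in (0x12120004, 0x8E080000, 0x02118020, 0xAE08FFFC):
    print(f"0x{encoding:08X}", control_signals(encoding))
```

The beq row agrees with the values worked out above: ALUSrc 0, Branch 1, ALUOp 01, no memory access, no register write.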