I have confused by using pipelining in mips instruction. Any help will be great. Thanks in advance.
What is the data dependency in the next two codes? Which of them can be solved by using stall (bubble) or forwarding. You can use the shape 1 for convenience.
shape 1:
If-Id-Ex-Mem-Wb
explanation:
if=instruction fetch
id=instruction decode register fetch
ex=execute
mem=memory access
wb=write back
code 1:
add $3,$4,$2
sub $5,$3,$1
lw $6,200($5)
sw $6,200($2)
lw $6,200($3)
add $7,$4,$6
code 2:
add $3,$4,$2
sub $5,$3,$1
lw $6,200($3)
add $7,$3,$6
(sorry for bad post,but i can't yet post an image)
Thanks.
Let's see the first one:
add $3,$4,$2
sub $5,$3,$1
The result from add is used in sub, therefore there is a data hazard. We have to insert an amount of NOP stages to resolve it. Assuming all instructions take up 5 cycles, we insert 3 NOPs and we're done.
add $3,$4,$2 IF ID EX MEM WB
sub $5,$3,$1 NOP NOP NOP IF ID EX MEM WB
We can do this for all subsequent instructions. Now instructions produce new values in the EX and MEM stages. Those values are not written to a register until the WB stage (for learning purposes, let's assume that's true). Since the registers are read in the ID stage, this leaves a window of three cycles for which old incorrect values are "flowing" through the pipeline. Forwarding can help cure this problem in our case above - forward the result from add:EX to sub:ID.
Hope this helps.
Related
I have been working on some low level programming with 5 stage pipelining. But I hit a snag.
Assuming this diagram http://i.imgur.com/7kTFi.png
and the mips code:
lw $4,1000($6)
sw $4,2000($6)
what would actually happen? I assumed there would be bubbles, i counted two bubbles proceeding the ID stage.
Can we fix it by adding inputs to the new forwarding unit? Where can I add mux's and new datapaths to to avoid bubble+errors?
You are right, there would be two bubbles.
Assuming data forwarding:
1. IF ID EX MM WB
2. IF S S ID EX MM WB
(S means stall or bubble).
There is no way you can 'fix' it because in any case you have to wait for the end of the MM stage to have the value at 1000($6). It could even be worse without data forwarding where you would have to wait until the WB stage, meaning 3 stalls.
Only way to prevent such behaviour is to have smart compilers that will schedule those two instructions in a different way (i.e space them by adding others in-between).
Note that as it is, the program has no real purpose (get value at memory address [1000+Regs[$6]], and copy it at address [2000+Regs[$6]])
You need MEM to ALU forwarding
So I'm going over some old quizzes for my Computer Organization final and I must have missed this lecture or something. I'm decently proficient in programming MIPS, but this problem has me completely stumped. Could someone help me understand this?
The diagram is missing lines connecting the various parts of the processor as well as a multiplexor for determining if the next instruction address is coming from the PC+4 or from a register as in a jr ra instruction.
There needs to be a line from the ALU to the write data portion of the register file. This is for R-Type instructions, as their result will need to be written back to the destination register. Going into the ALU needs to be lines from Read data 1 and Read data 2, this is how the values from the registers make their way into the ALU for R-type instructions.
A couple lines have been added from the instruction memory to the registers, you're missing the one to the Write Register though (this specifies the destination register for R-type instructions).
For the PC, the line going into the adder goes into the other input (the above one). The 4 for the adder is constant, as each instruction address is 4 bytes after the previous one, so unless we're jumping to an address we will be executing the instruction immediately after the current instruction. The line from the PC to the read address is also necessary as the PC specifies which instruction address to find the current instruction at. The line going into the PC comes from either the result of the PC+4 addition or from the register specified in a jr ra instruction.
To handle this decision a multiplexor is needed. Multiplexors have two inputs and one output, so this one will have one input from the PC+4 adder (for regular R-type instructions) and another from Read Data 1 (for jr ra instructions). The input from Read Data 1 should be visualized as a split from the line between Read Data 1 and the ALU. The output will go right back to the PC, as it determines the next instruction to execute.
I think that's everything that's wanted for that question, as the prof specifies that control signals are already generated (the multiplexor is a type of control unit, but I think it's necessary nonetheless). Hope that helps!
I am looking at a very suspicious disassembled MIPS code of a C application
80019B90 jal loc_80032EB4
loc_80032EB4 is in the middle of another function's body, I've specially checked that no other code is loaded at this address in runtime and calling that function this way(passing some code in the beginning) can be useful. But how is it possible to do in C? It's not a goto as you can't goto to another function and normal function call will always "jal" to the beginning. Can this be some hand optinmimzation?
Update:
Simplified layout of both functions, callee:
sub_80032E88 (lz77_decode)
... save registers ...
80032E90 addiu $sp, -8
... allocate memory for decompressed data ...
80032EB0 move DECOMPRESSED_DATA_POINTER_A1, $v0
loc_80032EB4:
80032EB4 lw $t7, 0(PACKED_DATA_POINTER_A0)
... actual data decompression ...
80032F4C jr $ra
caller:
80019ACC addiu $sp, -0x30
... some not related code ...
80019B88 lw $a1, off_80018084 // A predefined buffer is used instead of allocating it for decompressed data
80019B90 jal loc_80032EB4
80019B94 move $a0, $s0
... some other code and function epilogue ...
Update 2:
I've checked if this can be a case of setjmp/longjmp usage, but in my tests I can always see calls to setjmp and longjmp functions in disassembled code, not a direct jump.
Update 3:
I've tried using GCC-specific ability to get label pointers and casted this pointer to function, result is close to what I want but disassembled code is still different as instead of using jal with exaxct address it calculating it runtime, maybe I am just unable to force compiler to see this value as constant, becouse of scope issues.
Since it is a data decompression function from a game system, it is very likely that this function is hand optimized assembly with multiple entry points. Multiple entry points aren't commonly used, so it is difficult to find a publicly available example, but here is an old thread from the gcc mailing list that suggests a possible use for this technique.
The gist is that if you have two functions where one function F1 has code that is a subset of the other function, F2's code, then the code for F2 can fall through into the code for F1. In your case, F2 allocates memory for the decompressed data, and F1 assumes that the memory allocation has already been done. I'm pretty sure that GCC 2.9x cannot generate code like this.
It is not possible to directly translate this construct from assembler into standard C, because you cannot goto another function in C, but this is perfectly legal in assembler code. The gcc mailing list thread suggests a couple of work-arounds to express the same idea in C.
If you look at the dis-assembled code for the decompression it will likely have a different style than compiler generated code. There may even be some use of opcodes, like find first set bit that the compiler cannot generate from C.
I'm having trouble understanding how the instruction jal works in the MIPS processor.
My two questions are:
a) What is the value stored in R31 after "jal": PC+4 or PC+8?
b) If it's really PC+8, what happens to the instruction at PC+4? Is it executed before the jump or is it never executed?
In Patterson and Hennessy (fourth edition), pg 113:
"jump-and-link instruction: An instruction that jumps to and address and simultaneously saves the address of the following instruction in a register ($ra in MIPS)"
"program counter (PC): The register containing the address of the instruction in the program being executed"
After reading those two statements, it follows that the value saved in $ra should be (PC+4).
However, in the MIPS reference data (green card) that comes with the book, the jal instruction's algorithm is defined like this:
"Jump and Link : jal : J : R[31]=PC+8;PC=JumpAddr"
This website also states that "it's really PC+8", but strangely, after that it says that since pipelining is an advanced topic "we'll assume the return address is PC+4".
I come from 8086 assembly, so I'm aware that there's a big difference between returning to an address and to the one following it, because programs won't work if I just assume something that's not true. Thanks.
The address in $ra is really PC+8. The instruction immediately following the jal instruction is in the "branch delay slot". It is executed before the function is entered, so it shouldn't be re-executed when the function returns.
Other branching instructions on the Mips also have branch delay slots.
The delay slot is used to do something useful in the time it takes to execute the jal instruction.
I got the same question. Googled this excellent answer of Richard and also another link I wish to add here.
The link is http://chortle.ccsu.edu/AssemblyTutorial/Chapter-26/ass26_4.html
with this wonderful explanation of double adding 4 to the PC.
So the actual execution has two additions: 1) newPC=PC+4 by pipelining and 2) another addition $ra=newPC+4 by the jal instruction resulting the effective $ra = (address of the jal instruction)+8.
I am learning about micro programming and am confused as to what a micro-instruction actually is. I am using the MIPS architecture. My questions are as follows
Say for example I have the ADD instruction, what would the micro-instructions look like for this? How many micro-instructions are there for the add instruction. Is there somewhere online I can see the list of micro-instructions for the basic instructions of MIPS?
How can I figure out the bit string for an ADD microprogrammed instruction?
Microprogramming is a method of implementing a complex instruction set architecture (such as x86) in terms of simpler "micro instructions". MIPS is a RISC instruction set architecture and is not typically implemented using micro-programming, so there are ZERO microinstructions for the ADD instruction.
To answer your specific question one would have to know what the definition of your particular micro-architecture is.
This is an example of how to load the EPC into one of the registers and add 4-bytes to it:
lw t0, 20(sp) // Load EPC
addi t0, 4 // Add 4 to the return adress
sw t0, 20(sp) // Save EPC
There are "a lot" of instructions that you can use, you can see the MIPS Instruction Set here. In my humble opinion, MIPS is Really neat and easy to learn! A fun fact is that the first Playstation used a MIPS CPU.
Example instructions
lw = load word
la = load address
sw = save word
addi = add immidate
Then you have a lot of conditional instructions such as:
bne = branch not equal
bnez = branch not equal zero
And with these you use j to jump to an adress.
Here is an example from an Exception Handler that I wrote once for MIPS, this is the External Source handler:
External:
mfc0 t0, C0_CAUSE // We could aswell use 24(sp) to load CAUSE
and t0, t0, 0x02000 // Mask the CAUSE
bnez t0, Puls // If the only character left is
// "not equal zero" jump to Puls
j DisMiss // Else jump to DisMiss
In the above example I define an entry point called External that I can jump to, as I do with DisMiss to loop, you generally jump to yourself.
There are some other instructions used here aswell:
mfc0 = move from co-processor 0
To handle labels, I would suggest you check this question/answer out.
Here's a couple of resources on MicroProgramming with MIPS:
Some general information
Here is a bit more heavy power-point presentation on the subject from Princton ( PDF )
Here is a paper from another university which is one of the best of these three ( PDF ).