MIPS: what is the instruction lui $1,4097 for?

Original:
After Compact:
So how is the process of this? What is lui here for?

lui is "Load upper immediate" and puts the 16-bit immediate in the upper half of a register. In C-like notation: $r = imm16 << 16. The register $1 is used as an assembler temporary. In this case the lw got divided into a lui and lw to load a full 32-bit address.
The instructions li and la are pseudo-instructions (see Wikipedia), and they typically get replaced by a lui followed by an ori or addiu.
Your assembler optimized the lui instructions away because the pseudo instructions and loads only need 16 bit values and addresses and not the full 32 bit values. Without optimization the assembler must assume that the full 32 bit values are needed.
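As a rough check of the arithmetic, here is a minimal Python sketch (not part of the original answer; the function names lui and decode_i_type are made up for illustration). It models what lui does to a 32-bit register and decodes the word 0x3C011001, which is the encoding of lui $1, 4097.

```python
def lui(imm16):
    """Model of lui: place a 16-bit immediate in the upper half of a 32-bit register."""
    return (imm16 & 0xFFFF) << 16

def decode_i_type(word):
    """Split a 32-bit I-type instruction word into (opcode, rs, rt, imm16)."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm16 = word & 0xFFFF
    return opcode, rs, rt, imm16

opcode, rs, rt, imm = decode_i_type(0x3C011001)
print(opcode, rs, rt, imm)   # opcode 15 is lui, rt is the destination register ($1)
print(hex(lui(imm)))         # value left in $1 after the lui
```

The value 4097 is 0x1001, so $1 ends up holding 0x10010000, and the following lw only has to supply the low 16 bits of the address as its offset.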

Related

MIPS Branch Addressing Algorithm and Opcode isolation from instruction binary?

I just want to check my understanding of these two concepts is correct, as I have been trying to finish a project and while everything works to my expectations, it keeps narrowly failing the test cases and introducing a random value...
Basically, the objective of the project is to write out a branch instruction to console in this form:
BranchName $S, [$t, if applicable] 0xAbsoluteAddressOfBranchTargetInstruction
Edit: Clarification: I'm writing this in MIPS. The idea is I get a memory address in $a0 given to the program by my instructor's code (I write the function). The address is for the word containing a MIPS instruction. I'm to do the following:
Get instruction
Isolate instruction opcode and output its name to register (ie: opcode 5, output BNE), do nothing if it isn't a branch instruction.
Isolate $s, $t, and output as applicable (ie: no $t for bgez)
Use offset in the branch instruction to calculate its absolute address (the address of the target instruction following branch) and output in hex. For the purposes of this calculation, the address of the branch instruction ($a0) is assumed to be $pc.
IE:
BEQ $6, $9, 0x00100008
Firstly, is my understanding of branch calculation correct?
PC -> PC + 4
Lower 16 bits of instruction
<< 2 these lower bits
Add PC+4 and the left shifted lower 16 bits (only the lower 16 though).
Secondly, could somebody tell me which bits I need to isolate to know what kind of branch I'm dealing with? I think I have them (first 6 for BEQ/BNE, first 16 with $s masked out for others) but I wanted to double check.
Oh, and finally... should I expect deviation on SPIM from running it on an Intel x86 Windows system and an Intel x86 Linux system? I'm getting a stupid glitch and I cannot seem to isolate it from my hand-worked address calculations, but it only shows up when I run the test scripts my prof gave us on Linux (.sh); running directly in spim on either OS seems to work... provided my understanding of how to do the hand calculations (as listed above) is correct.
This is prefaced by my various comments.
Here is a sample program that does the address calculation correctly. It does not do the branch instruction type decode, so you'll have to combine parts of this and your version together.
Note that it uses the MARS syscall 34 to print values in hex. This isn't available under SPIM, so you may need to output in decimal using syscall 1 or write your own hex value output function [if you haven't already].
.data
msg_best: .asciiz "correct target address: "
msg_tgt: .asciiz "current target address: "
msg_nl: .asciiz "\n"
.text
.globl main
main:
la $s0,inst # pointer to branch instruction
la $s1,einst # get end of instructions
subu $s1,$s1,$s0 # get number of bytes
srl $s1,$s1,2 # get number of instruction words
la $s2,loop # the correct target address
la $a0,msg_best
move $a1,$s2
jal printaddr
loop:
move $a0,$s0
jal showme # decode and print instruction
addiu $s0,$s0,4
sub $s1,$s1,1
bnez $s1,loop # more to do? yes, loop
li $v0,10
syscall
# branch instructions to decode
inst:
bne $s0,$s1,loop
beq $s0,$s1,loop
beqz $s1,loop
bnez $s1,loop
bgtz $s1,loop
bgez $s1,loop
bltz $s1,loop
blez $s1,loop
einst:
# showme -- decode and print data about instruction
#
# NOTE: this does _not_ decode the instruction type
#
# arguments:
# a0 -- instruction address
#
# registers:
# t5 -- raw instruction word
# t4 -- branch offset
# t3 -- absolute address of branch target
showme:
subu $sp,$sp,4
sw $ra,0($sp)
lw $t5,0($a0) # get inst word
addiu $t3,$a0,4 # get PC + 4
sll $t4,$t5,16 # shift offset left
sra $t4,$t4,16 # shift offset right (sign extend)
sll $t4,$t4,2 # get byte offset
addu $t3,$t3,$t4 # add in offset
# NOTE: as a diagnostic, we could compare t3 against s2 -- it should
# always match
la $a0,msg_tgt
move $a1,$t3
jal printaddr
lw $ra,0($sp)
addu $sp,$sp,4
jr $ra
# printaddr -- print address
#
# arguments:
# a0 -- message
# a1 -- address value
printaddr:
li $v0,4
syscall
# NOTE: only mars supports this syscall
# to use spim, use a syscall number of 1, which outputs in decimal and
# then hand convert
# or write your own hex output function
move $a0,$a1
li $v0,34 # output number in hex (mars _only_)
syscall
la $a0,msg_nl
li $v0,4
syscall
jr $ra
The 16 bit immediate value is sign-extended to 32 bits, then shifted. I don't know if that would affect your program; but, that's the only potential "mistake" I noticed.
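To cross-check hand calculations, the same steps (take the low 16 bits, sign-extend, shift left by 2, add to PC + 4) can be sketched in Python. This is only an illustration: the function names and the instruction address 0x00100000 are assumptions, not values from the assignment.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc, instruction):
    """Absolute target of a MIPS branch located at address pc."""
    offset = sign_extend16(instruction)          # signed word offset
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF  # byte offset added to PC + 4

# beq $6, $9 with a word offset of 1, placed at a hypothetical address
print(hex(branch_target(0x00100000, 0x10C90001)))
# a backward branch: offset field 0xFFFB is -5 words
print(hex(branch_target(0x00400010, 0x1100FFFB)))
```

The important detail is that the sign extension happens before the shift, so a negative offset such as 0xFFFB moves the target backwards instead of producing a large forward jump.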

Declaring integer values in MIPS

So I am writing an assembly program that has lots of constant integer values.
I know that in the .data section I can assign a label with a .word data type and type in my number. With this method I still have to load an address in main.
But in main, I could just simply use
li $t1, some_number
Are any one of these methods better than the other and why?
Generally I'd say using li is the better approach. You're avoiding adding a bunch of clutter in your .data section, and you will also get more efficient code in some cases.
Let's look at some examples:
.data
ten: .word 10
million: .word 1000000
.text
main:
lw $t0,ten
li $t1,10
lw $t2,million
li $t3,1000000
It's important to understand here that both lw and li are pseudo-instructions that get translated into one or more actual instructions. lw does exist in the MIPS instruction set, but this particular variant of it doesn't. li doesn't exist in the MIPS instruction set.
If we look at what SPIM generates for the first two instructions, we see:
[0x00400024] 0x3c011001 lui $1, 4097 ; 9: lw $t0,ten
[0x00400028] 0x8c280000 lw $8, 0($1)
[0x0040002c] 0x3409000a ori $9, $0, 10 ; 10: li $t1,10
So that's one additional instruction for the lw variant, as the address first has to be loaded into a register, and then the value is loaded from that address. This also means one additional (potentially slow) memory access (well, two if you count the instruction fetch).
Now let's look at the other two instructions, where the value to be loaded is too large to be encoded in a single instruction:
[0x00400030] 0x3c011001 lui $1, 4097 ; 11: lw $t2,million
[0x00400034] 0x8c2a0004 lw $10, 4($1)
[0x00400038] 0x3c01000f lui $1, 15 ; 12: li $t3,1000000
[0x0040003c] 0x342b4240 ori $11, $1, 16960
Here the immediate 1000000 is loaded using two instructions as (15 << 16) | 16960. So both variants require two instructions, but the li variant doesn't need to read from memory.
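The split into halves can be reproduced with a short Python sketch. It is a simplified model of the expansion seen in the SPIM listing, not SPIM's actual algorithm; the helper names are made up, and it only covers non-negative constants (a real assembler may pick addiu or a lone lui in other cases).

```python
def split_immediate(value):
    """Return the (upper, lower) 16-bit halves of a 32-bit constant."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF

def li_expansion(reg, value):
    """Simplified model of how li is expanded for a non-negative constant."""
    upper, lower = split_immediate(value)
    if upper == 0:
        # fits in 16 bits: a single ori against $0 is enough
        return [f"ori {reg}, $0, {lower}"]
    # otherwise build the upper half in $1 ($at), then OR in the lower half
    return [f"lui $1, {upper}", f"ori {reg}, $1, {lower}"]

print(li_expansion("$9", 10))
print(li_expansion("$11", 1000000))
```

For 1000000 (0xF4240) the halves are 15 and 16960, which matches the lui/ori pair in the listing.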
If you want to assign a meaningful name to a constant to avoid having magic numbers all over your code you can do so with =:
TEN = 10
li $t0, TEN # Expands to li $t0, 10
You could perhaps avoid loading the addresses for lw all the time by using $gp-relative addressing, but I feel that that's beyond the scope of this question.

Exam - Machine code codification

I found this exam question where I have to tell how the beq instruction is encoded in machine code.
This is the code:
loop: addu $8, $9, $10
addiu $8, $8, 0x00FF7A01
beq $8, $0, loop
Although my first guess would be 0x1100FFFD, the correct answer is 0x1100FFFB.
I believe that this is because 0x00FF7A01 is bigger than 16 bit and addiu $8, $8, 0x00FF7A01 must be "decompiled" in more than one instruction.
So here are my questions.
Q1 - Into what is addiu $8, $8, 0x00FF7A01 decompiled?
Q2 - And what if the immediate field on the beq instruction was bigger than 16 bits? Would I have to use jumps?
Q1)
For your first question, you mean assemble, not decompiled.
addiu $8, $8, 0x00FF7A01
Your assessment was correct, since the immediate value was bigger than what we can store in a single instruction, we would need to use multiple instructions. The assembler will use the $at (Assembler Temporary, which is $1) register for that.
lui $at, 0x00FF
ori $at, $at, 0x7A01
addu $t0, $t0, $at
Q2)
The branch instructions use what is called PC-relative addressing - the immediate value does not contain the actual address but rather the word offset. Since we know the instruction must be word (2^2) aligned, the low 2 bits are always zero. The actual address will be calculated on the fly by shifting the offset left by 2 and adding it to the address of the instruction following the branch. The final value will be our PC-relative effective address. So we actually have 17 bits to play with (technically 18, but the offset is signed).
When the offset exceeds that range, the assembler will use far branches - a branch followed by a jump.
beq $x, $y, label
will become
beq $x, $y, temp
# ...
j tskip
temp:
j label
# ...
# ...
tskip:
Not sure how the assembler works here, but to allow addiu $8, $8, 0x00FF7A01 to be assembled correctly, multiple instructions are necessary, as you expected. Normally, addiu is a valid instruction that takes a 16-bit immediate only.
The instruction addiu $8, $8, 0x00FF7A01 is minimally rewritten as these 3 instructions:
addiu $8, $8, 0x7A01
lui $11, 0x00FF # Assume nothing significant is stored in $11
addu $8, $8, $11
Since the target is now 5 instructions before the instruction that follows the branch, we need to put -5 in the immediate field of beq, which is 0xFFFB (detailed explanation here).
If the destination is outside the range of -2^17 to 2^17-1 bytes (or -2^15 to 2^15-1 in terms of number of instructions), then a jump instruction must be used.
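The two candidate encodings from the question can be checked with a small Python sketch. The function encode_beq and its index-based arguments are hypothetical helpers for illustration; instructions are numbered from 0 starting at the loop label.

```python
def encode_beq(rs, rt, branch_index, target_index):
    """Encode beq from the word indices of the branch and of its target."""
    # the offset is counted from the instruction that follows the branch
    offset = target_index - (branch_index + 1)
    return (0x04 << 26) | (rs << 21) | (rt << 16) | (offset & 0xFFFF)

# if addiu were a single instruction: addu, addiu, beq -> beq is at index 2
print(hex(encode_beq(8, 0, branch_index=2, target_index=0)))
# with addiu expanded to three instructions: beq is at index 4
print(hex(encode_beq(8, 0, branch_index=4, target_index=0)))
```

The first call gives the asker's initial guess and the second gives the expected answer, so the difference between 0xFFFD and 0xFFFB comes purely from the two extra instructions the assembler inserts.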

representing the addi $s1, $0, 4 instruction: write down the value of the control signals

I'm doing a homework assignment where I need to write down the value of the control signals for 5 instructions, and I am trying to figure out the sample first (code at the bottom). The 5 instructions I need to do are
Address Code Basic Source
0x00400014 0x12120004 beq $16,$18,0x0004 15 beq $s0, $s2, exit
0x00400018 0x8e080000 lw $8,0x0000($16) 16 lw $t0, ($s0)
0x0040001c 0x02118020 add $16,$16,$17 17 add $s0, $s0, $s1
0x00400020 0xae08fffc sw $8,0xfffc($16) 18 sw $t0, -4($s0)
0x00400024 0x08100005 j 0x00400014 19 j loop
And the example he did is for addi $s1,$0,4 . Right now I have this for it:
Address Code Basic Source
0x00400028 0x20110004 addi $16,$0,4 20 addi $s1, $0, 4
where I think the 4 in the basic column is incorrect. What would be the right answer?
Here's the sample he did for that, and below that is the diagram he is referring to with the control signals:
##--------------------------
# Example
# addi $s1, $0, 4
# Although not supported as in Figure 4.24, the instruction can be easily
# supported with minor changes in the control circuit.
instruction_address=0x00400028
instruction_encoding=0x20110004
OPcode=0b001000
Jump=0
Branch=0
Jump_address=0x00440010 # not used in this instruction
Branch_address=0x0040003C # not used in this instruction
Read_register_1=0b00000
Read_register_2=0b10001
Sign_extend_output=0x00000004
ALUSrc=1 # pick the value from sign_extend_output
ALUOp=0b00 # assume the same value as load/store instruction
ALU_control_input=0b0010 # add operation, as in load/store instruction
MemRead=0
MemWrite=0
MemtoReg=0 # select the ALU result
RegDst=0
Write_register=0b10001 #register number for $s1
RegWrite=1
##--------------------------
Let's examine the breakdown of the first instruction: beq $s0, $s2, exit.
The instruction address is given under the address column above: 0x00400014. You have the encoding as well: 0x12120004. The encoding is the machine instruction. Let's represent the instruction in binary: 000100 10000 10010 0000000000000100.
This is an I-type instruction. The first group of six bits is the opcode, the second group of five is the source register (rs), the third group of five is the target register (rt), and the last group of sixteen is the immediate value.
The opcode is then 0b000100. Since this is an I-type instruction, we aren't jumping to a target, thus the Jump signal is 0. However, we are branching, so the Branch signal is 1.
To find the Jump_Address, even though it is ignored, examine the least significant 26 bits: 10000 10010 0000000000000100. Since instructions are word-aligned, the low 2 bits of every instruction address are zero, so they are not stored in the instruction; the 26-bit field is shifted 2 bits to the left to restore them. This gives 10 00010 01000 0000000000010000, and with the upper four bits taken from the address of the next instruction (all zero here) we end up with Jump_Address = 0x08480010.
To find the Branch_Address, which will be used, examine the least significant 16 bits: 0000000000000100. That's sign extended and shifted 2 bits to the left to get: 0000000000000000 0000000000010000 or 0x00000010. This immediate value will be added to the program counter, which points to the next instruction: 0x00400018. So we finally end with Branch_Address = 0x00400028. I'm assuming the exit label points to the next instruction after the five you've posted above, right after the j instruction.
The registers are straightforward. Read_register_1 = 0b10000 and Read_register_2 = 0b10010.
The Sign_extend_output is just the immediate field sign-extended: 0x00000004.
On to the ALU control signals. ALUSrc controls the multiplexer between the register file and ALU. Since a beq instruction requires the use of two registers, we need to select the Read data 2 register from the register file. We aren't using the immediate field for an ALU computation, like with the addi instruction. Therefore, the ALUSrc is 0.
The ALUOp and ALU_control_input are hard-wired values that are created from the opcode. ALUOp = 0b01 and ALU_control_input = 0b0110. Pg. 323 of Computer Organization and Design, 4th Edition Revised, by Patterson and Hennessy, and this web page have a table with the appropriate control signals for a beq instruction. Pg. 318 has a table with the ALU control bit mappings.
MemRead and MemWrite are 0 since we aren't accessing memory; RegWrite is 0 since we aren't writing to the register file; MemToReg and RegDst are X (don't care) since RegWrite is 0; and lastly, to find Write_register, take bits 16-20 (look at the multiplexer between the instruction memory and register file), which are 0b10010.
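The field values above can be double-checked with a short Python sketch. It only covers the values that follow directly from the instruction bits and the instruction address; the control signals themselves still come from the textbook tables. The function name and dictionary keys are chosen here to mirror the sample and are otherwise arbitrary.

```python
def datapath_fields(word, pc):
    """Derive the bit-field values of a 32-bit MIPS instruction located at pc."""
    imm = word & 0xFFFF
    sign_ext = imm - 0x10000 if imm & 0x8000 else imm
    return {
        "OPcode": word >> 26,
        "Read_register_1": (word >> 21) & 0x1F,
        "Read_register_2": (word >> 16) & 0x1F,
        "Sign_extend_output": sign_ext & 0xFFFFFFFF,
        # branch: PC + 4 plus the sign-extended offset shifted left by 2
        "Branch_address": (pc + 4 + (sign_ext << 2)) & 0xFFFFFFFF,
        # jump: upper 4 bits of PC + 4 combined with the 26-bit field shifted left by 2
        "Jump_address": ((pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2),
    }

for name, value in datapath_fields(0x12120004, 0x00400014).items():
    print(f"{name}=0x{value:08X}")
```

Running the same function on the sample (0x20110004 at 0x00400028) reproduces the Jump_address and Branch_address values given in the instructor's example, which is a useful sanity check before filling in the other four instructions.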

Loading an address in MIPS64

This is probably a simple, obvious thing I'm just not seeing, but how do I load an address in a MIPS64 processor? In a MIPS32 processor the following assembler pseudo-instruction:
la $at, LabelAddr
Expands into:
lui $at, LabelAddr[31:16]
ori $at,$at, LabelAddr[15:0]
Looking at the MIPS64 instruction set, I see that lui still loads a 16-bit immediate into the upper half of a 32-bit word. There doesn't appear to be any kind of expanded instruction that loads an immediate anywhere into the upper area of a 64-bit word. It seems, then, that to do the equivalent of an la pseudo-instruction I'd need to expand into code something like:
lui $at, LabelAddr[63:48]
ori $at, $at, LabelAddr[47:32]
sll $at, 16
ori $at, $at, LabelAddr[31:16]
sll $at, 16
ori $at, $at, LabelAddr[15:0]
This strikes me as a bit ... convoluted for something as basic as loading an address so it leaves me convinced that I've overlooked something.
What is it I've overlooked (if anything)?
I think if you need to load a lot of constants, you should put it in a constant pool (A.K.A "literal pool") near the current code and then load it by an ld instruction.
For example: $s0 contains the pool's base address, and the constant you want to load is at offset 48, you can load it to $t1 by the instruction ld $t1, 48($s0)
This technique is very common in ARM, where instructions could only load a 12-bit immediate (only later versions of ARM can load 16-bit immediates with some restrictions). And it is used in Java too.
However somehow MIPS compilers still always generate multiple instructions to load a 64-bit immediate. For example to load 0xfedcba0987654321 on MIPS gcc uses
li $2,-9568256 # 0xffffffffff6e0000
daddiu $2,$2,23813
dsll $2,$2,17
daddiu $2,$2,-30875
dsll $2,$2,16
daddiu $2,$2,17185
Many other RISC architectures have more efficient ways to load an immediate, so they need fewer instructions, but still at least 4. Maybe the instruction cache cost is lower than the data cache cost in those cases, or maybe someone just doesn't like that idea.
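To see that the six-instruction gcc sequence really builds the constant, it can be replayed in Python with 64-bit wraparound. This is a minimal model written for this illustration: it tracks only register $2 and implements just the three operations that appear in the listing.

```python
MASK64 = (1 << 64) - 1

def run(sequence):
    """Replay li / daddiu / dsll on a single 64-bit register."""
    reg = 0
    for op, operand in sequence:
        if op == "li":        # load a sign-extended constant
            reg = operand & MASK64
        elif op == "daddiu":  # add a sign-extended 16-bit immediate
            reg = (reg + operand) & MASK64
        elif op == "dsll":    # 64-bit logical shift left
            reg = (reg << operand) & MASK64
        else:
            raise ValueError(f"unsupported operation: {op}")
    return reg

gcc_sequence = [
    ("li", -9568256),
    ("daddiu", 23813),
    ("dsll", 17),
    ("daddiu", -30875),
    ("dsll", 16),
    ("daddiu", 17185),
]
print(hex(run(gcc_sequence)))
```

The negative immediates and the odd shift of 17 are there because daddiu sign-extends its operand, so the compiler pre-adjusts the upper parts to compensate for the borrow.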
Here's an example of handwritten constant pool on MIPS
# load pool base address
dla $s0, pool
foo:
# just some placeholder
addu $t0, $t0, $t1
bar:
# load from pool
ld $a0, pool_foo($s0)
ld $a1, pool_bar($s0)
.section pool
# macro helper to define a pool entry
.macro ENTRY label
pool_entry_\label\(): .quad \label
.equ pool_\label\(), pool_entry_\label - pool
.endm
ENTRY foo
ENTRY bar
I failed to persuade any MIPS compilers to emit a literal pool but here's a compiler-generated example on ARM
What you are missing is that even in MIPS64 the instruction size stays at 32 bits (4 bytes). In this 32-bit machine code encoding, the 'la' pseudo-instruction translated to a 'lui' + 'ori' combination can handle at most a 32-bit value (address). There are not enough bits in a 4-byte machine instruction to easily encode a 64-bit address. To deal with a 64-bit address, more iterations of the same pattern (lui + ori) are used along with shifts (dsll).
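The lui / ori / shift pattern from the question can be modelled in Python to confirm that it reassembles a full 64-bit address. This is a sketch under stated assumptions: the function name is made up, the shifts are taken to be 64-bit (dsll), and lui is modelled as sign-extending its 32-bit result, which is how MIPS64 defines it.

```python
MASK64 = (1 << 64) - 1

def load_address_64(addr):
    """Model lui, ori, dsll 16, ori, dsll 16, ori on one 64-bit register."""
    parts = [(addr >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]
    reg = parts[0] << 16                  # lui: bits 63..48 of the address
    if reg & 0x80000000:                  # the 32-bit lui result is sign-extended
        reg |= 0xFFFFFFFF00000000
    reg |= parts[1]                       # ori: bits 47..32
    reg = (reg << 16) & MASK64            # dsll 16
    reg |= parts[2]                       # ori: bits 31..16
    reg = (reg << 16) & MASK64            # dsll 16
    reg |= parts[3]                       # ori: bits 15..0
    return reg

print(hex(load_address_64(0xFEDCBA0987654321)))
```

Any sign-extension bits that lui leaves in the upper half are pushed out by the two 16-bit shifts, so the final register holds exactly the original address.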
Paxym