This is probably a simple, obvious thing I'm just not seeing, but how do I load an address in a MIPS64 processor? In a MIPS32 processor the following assembler pseudo-instruction:
la $at, LabelAddr
Expands into:
lui $at, LabelAddr[31:16]
ori $at,$at, LabelAddr[15:0]
Looking at the MIPS64 instruction set, I see that lui still loads a 16-bit immediate into the upper half of a 32-bit word. There doesn't appear to be any kind of expanded instruction that loads an immediate anywhere into the upper area of a 64-bit word. This seems, then, that to do the equivalent of an la pseudo-instruction I'd need to expand into code something like:
lui $at, LabelAddr[63:48]
ori $at, $at, LabelAddr[47:32]
sll $at, 16
ori $at, $at, LabelAddr[31:16]
sll $at, 16
ori $at, $at, LabelAddr[15:0]
This strikes me as a bit ... convoluted for something as basic as loading an address so it leaves me convinced that I've overlooked something.
What is it I've overlooked (if anything)?
I think if you need to load a lot of constants, you should put it in a constant pool (A.K.A "literal pool") near the current code and then load it by an ld instruction.
For example: $s0 contains the pool's base address, and the constant you want to load is at offset 48, you can load it to $t1 by the instruction ld $t1, 48($s0)
This technique is very common in ARM, where instructions could only load a 12-bit immediate (only later versions of ARM can load 16-bit immediates with some restrictions). And it is used in Java too.
However somehow MIPS compilers still always generate multiple instructions to load a 64-bit immediate. For example to load 0xfedcba0987654321 on MIPS gcc uses
li $2,-9568256 # 0xffffffffff6e0000
daddiu $2,$2,23813
dsll $2,$2,17
daddiu $2,$2,-30875
dsll $2,$2,16
daddiu $2,$2,17185
Many other RISC architectures have more efficient ways to load an immediate so they need less instructions, but still at least 4. Maybe the instruction cache cost is lower than data cache cost in those cases, or maybe someone just don't like that idea
Here's an example of handwritten constant pool on MIPS
# load pool base address
dla $s0, pool
foo:
# just some placeholder
addu $t0, $t0, $t1
bar:
# load from pool
ld $a0, pool_foo($s0)
ld $a1, pool_bar($s0)
.section pool
# macro helper to define a pool entry
.macro ENTRY label
pool_entry_\label\(): .quad \label
.equ pool_\label\(), pool_entry_\label - pool
.endm
ENTRY foo
ENTRY bar
I failed to persuade any MIPS compilers to emit a literal pool but here's a compiler-generated example on ARM
address so it leaves me convinced that I've overlooked something.
What is it I've overlooked (if anything)?
What you are missing is that even in Mips64 the instruction size stays 32bit (4bytes). In this 32bit machine code encoding system, The 'la' translated to 'lui' + 'ori' combination can handle a max of 32 bit value (address). There are not enough bits in the 4byte machine instruction to easily encode a 64bit address. To deal with 64bit address, more iterations of the same (lui+ori) is used along with shifts (dsll).
Paxym
Related
How does MIPS's assembler labels and J type instruction work?
I am currently making a MIPS simulator using C++ and came into a big question. How exactly does MIPS assembler manage label's and their address while on a J type instruction?
Let's assume that we have a following code. Also let's assume that start: starts at 0x00400000. Comments after code represent where the machine codes will be stored in memory.
start:
andi $t0, $t0, 0 # 0x0040 0000
andi $t1, $t1, 0 # 0x0040 0004
andi $t2, $t2, 0 # 0x0040 0008
addi $t3, $t3, 4 # 0x0040 000C
loop:
addi $t2, $t2, 1 # 0x0040 0010
beq $t2, $t3, exit # 0x0040 0014
j loop # 0x0040 0018
exit:
addi $t0, $t0, 1000 # 0x0040 002C
As I am understanding right at the moment, j loop expression will set PC as 0x0040 0010.
When J type instruction uses 32 bits and with MSB 6 bits as its opcode, it only has 26 bits left to represent address of instruction. Then how is it possible to represent 32 bit address system using only 26 bits?
With the example above, it can represent 0x00400010 with only 24bits. However, in references, text segment is located from 0x00400000 to 0x10000000 which needs 32bit to represent.
I have tried to understand this using MARS simulator, however it just represents j loop as j 0x00400010 which seems nonsense to me since 0x00400010 is 32 bits.
My current guess
One of my current guesses is following.
Assembler saves the loop: label's address into some memory address that is reachable by 26 bits. Then when expression j loop is called, label loop is translated to the memory address that contains 0x00400010 For example, 0x00400010 is saved in some address like 0x00300000 and when j loop is called, loop is translated into 0x00300000 and it is able to get value from 0x00300000 and reach out 0x00400010. (This is just one of my guess)
You have a number of questions here.
First, let's try to differentiate between the assembler's operation and the MIPS machine code that it generates and the processor executes.
The assembler manages labels and address in two ways. First, it has a symbol table, which is like a dictionary, a data structure of key-value pairs where the names are keys and the addresses (that those names will refer to when the program is running) are the values in the pairs.
Second, the assembler manages the code and data sections with a location counter. That location counter advances each time the program provides some code or data. When new label is defined, the current location counter is then used as the address value in a new key-value pair.
The processor never sees the labels: they do not execute and they do not occupy any space in the code or data. The processor sees only machine code instructions, which on MIPS are all 32-bits wide. Each machine code instruction is divided into fields. There are instruction types or formats, which on MIPS are straightforward: I-Type, J-Type, and R-Type. These formats then define the instruction fields, and the assembler follows these encodings. All the instruction formats share the 6-bit opcode field, and this opcode field tells the processor what format the instruction is, which fields it therefore has, and thus how to interpret and execute the rest of the instruction.
The assembler removes labels from the assembly — labels and their names do not exist in the program binary. The label definitions themselves (label:) are omitted from the program binary but usages of labels are translated into numbers, so a machine code instruction that uses a label will have some instruction field that is numeric, and the assembler will provide a proper value for that numeric field so that the effect of the reaching or otherwise accessing what the label referred to is accomplished. (The label is no longer in the program binary, but the code or data memory that the label referred does remain).
The assembler sets up branch instructions, j instructions, and la/lw instructions, using numbers that tell the processor how far forward or backward to move the program counter, or, what address some data of interest is at. The lw/la instructions access data, and these use 2 x 32-bit instructions each holding 16 bits of the address of interest. Between the two instructions, they put together a full 32-bit address for data access. For branches to fully reach any 32-bit address, they would have to put together the 32-bit address in a similar manner (two instruction pair) and use an indirect/register branch.
I'm trying to learn about MIPS pipe-lining and the hazards associated to them. I'm having trouble picturing what a structural hazard looks like in MIPS instructions.
I've read that it is a situation where two (or more) instructions
require the use of a given hardware resource at the same time. And I've seen examples shown in clock cycles before. But can anyone just provide a simple MIPS instruction set example for me to see? I'm having difficulty finding one online. Just see lots of examples for data hazards and that's not what I'm looking for. Thanks!
It is hard for you to come by this problem because it's usually resolved in the HW architecture...
Here are two examples:
Assume a write is made to the register file (RF) during stage 5 (WB) and a read is made to the same register on the RF on stage 2 (ID) at the same time. This is a structural hazard because two instructions are trying to access the same resource at the same clock cycle (what value will be read?). This can be resolved (in the HW), for instance, by splitting the RF access to two clock phases, write on HIGH and read on LOW. Moreover, if you think about it, a structural hazard is why there are separate 2 read ports and 1 write port in the RF.
Assume an instruction is being fetch from memory (stage 1, IF) and another read/write is done to the memory on stage 4 (MEM). Again, same resource accessed on the same cycle. This was resolved by separating the data and instruction memories (harvard architecture). It may look obvious to you but you can lookup Princeton Architecture and see an example to a unified memory.
So if we take the first example for instance: any set of instructions with a load (lw) command to the same register as in a R-type command (like add) that follows after two other instructions will do the trick:
lw $8, 100($9)
add $10, $11, $12
add $10, $11, $12
add $10, $8, $12
Hope that helps.
This might work, but I'm not a big MIPS person:
add $t0, $t1, $t2
sw $t3, 0($t4)
sub $t5, $t6, $t7
sub $t8, $t9, $t0
sw $t0, 0($s0)
Original:
After Compact:
So how is the process of this? What is lui here for?
lui is "Load upper immediate" and puts the the 16 bit immediate in the upper half of a register. In C-like notation $r = imm16 << 16. The register $1 is used as an assembler temporary. In this case the lw got divided into a lui and lw to load a full 32 bit address.
The instructions li and la are pseudo instructions (see Wikipedia) and they get replaced by a lui followed by a ori, addiu or andi.
Your assembler optimized the lui instructions away because the pseudo instructions and loads only need 16 bit values and addresses and not the full 32 bit values. Without optimization the assembler must assume that the full 32 bit values are needed.
What is the use of a $zero register in MIPS?
What does it mean?
lw $t0, myInteger($zero)
The zero register always holds the constant 0. There's not really anything special about it except for the fact that 0 happens to be a very useful constant. So useful that the MIPS designers dedicated a register to holding its value. (This way you don't have to waste another register, or any memory, holding the value.)
EDIT:
As for the question of what that line of code means, it loads the word from MEMORY[myInteger + 0] into the $t0 register. The lw command takes both a constant (myInteger) and a register ($zero). Not sure why that is, but that's just how the instructions work. Since myInteger was used as the constant, a register had to be provided, so $zero was used.
How does 'alignment of memory operands' help MIPS to be pipelined?
The book says:
Fourth, as discussed in Chapter 2, operands must be aligned in memory. Hence,
we need not worry about a single data transfer instruction requiring two data
memory accesses; the requested data can be transferred between processor and
memory in a single pipeline stage.
I think I understand that one data transfer instruction does not require two or more data memory aaccesses.
However, I am not sure what does it have to do with the alignment of memory operands.
Thanks, in advance!
The lw instruction requires that the memory address be word aligned.
Therefore, to access an unaligned word, one would need to access the two word boundaries that the required word intersects and mask out the necessary bytes.
For example, suppose you desire to load a word stored at address 0x2. 0x2 is not word aligned, so you would need to load the half word stored at 0x2 and the half-word stored at 0x4.
To do so, one might write:
lh $t0 2($zero)
lh $t1 4($zero)
sll $t1 $t1 16
or $t2 $t0 $t1
This only gets more complicated if you want to load for example a word stored at address 0x3:
# load first byte
lb $t0 3($zero)
# load second word, mask out first 3 bytes
lw $t1 4($zero)
lui $t2 0x0000FFFF
ori $t2 $t2 0xFFFFFFFF
or $t1 $t1 $t2
# combine
sll $t1 $t1 8
or $t2 $t0 $t1
So, it can be seen that the requirement for word alignment doesn't help MIPS to be pipelined, but rather that access to unaligned words requires excess memory accesses — this is a limitation of the ISA.