What is the cause of undefined ARM Exceptions? - exception

One question is when the undefined instruction happens .... Do we need to get the current executing instruction from R14_SVC or R14_UNDEF? . Currently I am working on one problem where an undefined instruction happened. On checking the R14_SVC I found the instruction was like below:
0x46BFD73C cmp r0, #0x0
0x46BFD740 beq 0x46BFD75C
0x46BFD744 ldr r0,0x46BFE358
so in my assumption the undefined instruction would have happened while executing the instruction beq 0x46BFD75C
One thing that puzzles me is I checked the r14_undef and the istruction was different.
0x46bfd4b8 bx r14
0x46bfd4bC mov r0, 0x01
0x46bfd4c0 bx r14
Which one caused the undefined instruction exception?

All of your answers are in the ARM ARM, ARM Architectural Reference Manual. go to infocenter.arm.com under reference manuals find the architecture family you are interested in. The non-cortex-m series all handle these exceptions the same way
When an Undefined Instruction exception occurs, the following actions are performed:
R14_und = address of next instruction after the Undefined instruction
SPSR_und = CPSR
CPSR[4:0] = 0b11011 /* Enter Undefined Instruction mode */
CPSR[5] = 0 /* Execute in ARM state */
/* CPSR[6] is unchanged */
CPSR[7] = 1 /* Disable normal interrupts */
/* CPSR[8] is unchanged */
CPSR[9] = CP15_reg1_EEbit
/* Endianness on exception entry */
if high vectors configured then
PC = 0xFFFF0004
else
PC = 0x00000004
R14_und points at the next instruction AFTER the undefined instruction. you have to examine SPSR_und to determine what mode the processor was in (arm or thumb) to know if you need to subtract 2 or 4 from R14_und and if you need to fetch 2 or 4 bytes. Unfortunately if on a newer architecture that supports thumb2 you may have to fetch 4 bytes even in thumb mode and try to figure out what happened. being variable word length it is very possible to be in a situation where it is impossible to determine what happened. If you are not using thumb2 instructions then it is deterministic.

Related

RISC-V: Illegal instruction exception when switching to supervisor mode

When setting the mstatus.mpp field to switch to supervisor mode, I'm getting an illegal instruction exception when calling mret. I'm testing this in qemu-system-riscv64 version 6.1 with the riscv64-softmmu system.
I recently upgraded from QEMU 5.0 to 6.1. Prior to this upgrade my code worked. I can't see anything relevant in the changelog. I'm assuming that there's a problem in my code that the newer version simply doesn't tolerate.
Here is a snippet of assembly that shows what's happening (unrelated boot code removed):
.setup_hart:
csrw satp, zero # Disable address translation.
li t0, (1 << 11) # Supervisor mode.
csrw mstatus, t0
csrw mie, zero # Disable interrupts.
la sp, __stack_top # Setup stack pointer.
la t0, asm_trap_vector
csrw mtvec, t0
la t0, kernel_main # Jump to kernel_main on trap return.
csrw mepc, t0
la ra, cpu_halt # If we return from main, halt.
mret
If I set the mstatus.mpp field to 0b11 for machine mode, I can get to kernel_main without any problem.
Here's the output from QEMU showing the exception information:
riscv_cpu_do_interrupt: hart:0, async:0, cause:0000000000000002, epc:0x000000008000006c, tval:0x0000000000000000, desc=illegal_instruction
mepc points to the address of the mret instruction where the exception occurs.
I've tested that the machine supports supervisor mode by writing and retrieving the value in mstatus.mpp successfully.
Is there something obvious I'm missing? My code seems very similar to the few examples I can find online, such as https://osblog.stephenmarz.com/ch3.2.html. Any help would be greatly appreciated.
The issue turned out to be RISC-V's Physical Memory Protection (PMP). QEMU will raise an illegal instruction exception when executing an MRET instruction if no PMP rules have been defined. Adding a PMP entry resolved the issue.
This was confusing, as this behaviour is not specified in the Privileged Architecture manual's section on mret.

When do we create base pointer in a function - before or after local variables?

I am reading the Programming From Ground Up book. I see two different examples of how the base pointer %ebp is created from the current stack position %esp.
In one case, it is done before the local variables.
_start:
# INITIALIZE PROGRAM
subl $ST_SIZE_RESERVE, %esp # Allocate space for pointers on the
# stack (file descriptors in this
# case)
movl %esp, %ebp
The _start however is not like other functions, it is the entry point of the program.
In another case it is done after.
power:
pushl %ebp # Save old base pointer
movl %esp, %ebp # Make stack pointer the base pointer
subl $4, %esp # Get room for our local storage
So my question is, do we first reserve space for local variables in the stack and create the base pointer or first create the base pointer and then reserve space for local variables?
Wouldn't both just work even if I mix them up in different functions of a program? One function does it before, the other does it after etc. Does C have a specific convention when it creates the machine code?
My reasoning is that all the code in a function would be relative to the base pointer, so as long as that function follows the convention according to which it created a reference of the stack, it just works?
Few related links for those are interested:
Function Prologue
In your first case you don't care about preservation - this is the entry point. You are trashing %ebp when you exit the program - who cares about the state of the registers? It doesn't matter any more as your application has ended. But in a function, when you return from that function the caller certainly doesn't want %ebp trashed. Now can you modify %esp first then save %ebp then use %ebp? Sure, so long as you unwind the same way on the other end of the function, you may not need to have a frame pointer at all, often that is just a personal choice.
You just need a relative picture of the world. A frame pointer is usually just there to make the compiler author's job easier, actually it is usually there just to waste a register for many instruction sets. Perhaps because some teacher or textbook taught it that way, and nobody asked why.
For coding sanity, the compiler author's sanity etc, it is desirable if you need to use the stack to have a base address from which to offset into your portion of the stack, FOR THE DURATION of the function. Or at least after the setup and before the cleanup. This can be the stack pointer(sp) itself or it can be a frame pointer, sometimes it is obvious from the instruction set. Some have a stack that grows down (in address space toward zero) and the stack pointer can only have positive offsets in sp based address (sane) or some negative only (insane) (unlikely but lets say its there). So you may want a general purpose register. Maybe there are some you cant use the sp in addressing at all and you have to use a general purpose register.
Bottom line, for sanity you want a reference point to offset items in the stack, the more painful way but uses less memory would be to add and remove things as you go:
x is at sp+4
push a
push b
do stuff
x is at sp+12
pop b
x is at sp+8
call something
pop a
x is at sp+4
do stuff
More work but can make a program (compiler) that keeps track and is less error prone than a human by hand, but when debugging the compiler output (a human) it is harder to follow and keep track. So generally we burn the stack space and have one reference point. A frame pointer can be used to separate the incoming parameters and the local variables using base pointer(bp) for example as a static base address within the function and sp as the base address for local variables (athough sp could be used for everything if the instruction set provides that much of an offset). So by pushing bp then modifying sp you are creating this two base address situation, sp can move around perhaps for local stuff (although not usually sane) and bp can be used as a static place to grab parameters if this is a calling convention that dictates all parameters are on the stack (generally when you dont have a lot of general purpose registers) sometimes you see the parameters are copied to local allocation on the stack for later use, but if you have enough registers you may see that instead a register is saved on the stack and used in the function instead of needing to access the stack using a base address and offset.
unsigned int more_fun ( unsigned int x );
unsigned int fun ( unsigned int x )
{
unsigned int y;
y = x;
return(more_fun(x+1)+y);
}
00000000 <fun>:
0: e92d4010 push {r4, lr}
4: e1a04000 mov r4, r0
8: e2800001 add r0, r0, #1
c: ebfffffe bl 0 <more_fun>
10: e0800004 add r0, r0, r4
14: e8bd4010 pop {r4, lr}
18: e12fff1e bx lr
Do not take what you see in a text book, white board (or on answers in StackOverflow) as gospel. Think through the problem, and through alternatives.
Are the alternatives functionally broken?
Are they functionally correct?
Are there disadvantages like readability?
Performance?
Is the performance hit universal or does it depend on just how
slow/fast the memory is?
Do the alternatives generate more code which is a performance hit but
maybe that code is pipelined vs random memory accesses?
If I don't use a frame pointer does the architecture let me regain
that register for general purpose use?
In the first example bp is being trashed, that is bad in general but this is the entry point to the program, there is no need to preserve bp (unless the operating system dictates).
In a function though, based on the calling convention one assumes that bpis used by the caller and must be preserved, so you have to save it on the stack to use it. In this case it appears to want to be used to access parameters passed in by the caller on the stack, then sp is moved to make room for (and possibly access but not necessarily required if bp can be used) local variables.
If you were to modify sp first then push bp, you would basically have two pointers one push width away from each other, does that make much sense? Does it make sense to have two frame pointers anyway and if so does it make sense to have them almost the same address?
By pushing bp first and if the calling convention pushes the first paramemter last then as a compiler author you can make bp+N always or ideally always point at the first parameter for a fixed value N likewise bp+M always points at the second. A bit lazy to me, but if the register is there to be burned then burn it...
In one case, it is done before the local variables.
_start is not a function. It's your entry point. There's no return address, and no caller's value of %ebp to save.
The i386 System V ABI doc suggests (in section 2.3.1 Initial Stack and Register State) that you might want to zero %ebp to mark the deepest stack frame. (i.e. before your first call instruction, so the linked list of saved ebp values has a NULL terminator when that first function pushes the zeroed ebp. See below).
Does C have a specific convention when it creates the machine code?
No, unlike in some other x86 systems, the i386 System V ABI doesn't require much about your stack-frame layout. (Linux uses the System V ABI / calling convention, and the book you're using (PGU) is for Linux.)
In some calling conventions, setting up ebp is not optional, and the function entry sequence has to push ebp just below the return address. This creates a linked list of stack frames which allows an exception handler (or debugger) to backtrace up the stack. (How to generate the backtrace by looking at the stack values?). I think this is required in 32-bit Windows code for SEH (structured exception handling), at least in some cases, but IDK the details.
The i386 SysV ABI defines an alternate mechanism for stack unwinding which makes frame pointers optional, using metadata in another section (.eh_frame and .eh_frame_hdr which contains metadata created by .cfi_... assembler directives, which in theory you could write yourself if you wanted stack-unwinding through your function to work. i.e. if you were calling any C++ code which expected throw to work.)
If you want to use the traditional frame-walking in current gdb, you have to actually do it yourself by defining a GDB function like gdb backtrace by walking frame pointers or Force GDB to use frame-pointer based unwinding. Or apparently if your executable has no .eh_frame section at all, gdb will use the EBP-based stack-walking method.
If you compile with gcc -fno-omit-frame-pointer, your call stack will have this linked-list property, because when C compilers do make proper stack frames, they push ebp first.
IIRC, perf has a mode for using the frame-pointer chain to get backtraces while profiling, and apparently this can be more reliable than the default .eh_frame stuff for correctly accounting which functions are responsible for using the most CPU time. (Or causing the most cache misses, branch mispredicts, or whatever else you're counting with performance counters.)
Wouldn't both just work even if I mix them up in different functions of a program? One function does it before, the other does it after etc.
Yes, it would work fine. In fact setting up ebp at all is optional, but when writing by hand it's easier to have a fixed base (unlike esp which moves around when you push/pop).
For the same reason, it's easier to stick to the convention of mov %esp, %ebp after one push (of the old %ebp), so the first function arg is always at ebp+8. See What is stack frame in assembly? for the usual convention.
But you could maybe save code size by having ebp point in the middle of some space you reserved, so all the memory addressable with an ebp + disp8 addressing mode is usable. (disp8 is a signed 8-bit displacement: -128 to +124 if we're limiting to 4-byte aligned locations). This saves code bytes vs. needing a disp32 to reach farther. So you might do
bigfunc:
push %ebp
lea -112(%esp), %ebp # first arg at ebp+8+112 = 120(%ebp)
sub $236, %esp # locals from -124(%ebp) ... 108(%ebp)
# saved EBP at 112(%ebp), ret addr at 116(%ebp)
# 236 was chosen to leave %esp 16-byte aligned.
Or delay saving any registers until after reserving space for locals, so we aren't using up any of the locations (other than the ret addr) with saved values we never want to address.
bigfunc2: # first arg at 4(%esp)
sub $252, %esp # first arg at 252+4(%esp)
push %ebp # first arg at 252+4+4(%esp)
lea 140(%esp), %ebp # first arg at 260-140 = 120(%ebp)
push %edi # save the other call-preserved regs
push %esi
push %ebx
# %esp is 16-byte aligned after these pushes, in case that matters
(Remember to be careful how you restore registers and clean up. You can't use leave because esp = ebp isn't right. With the "normal" stack frame sequence, you might restore other pushed registers (from near the saved EBP) with mov, then use leave. Or restore esp to point at the last push (with add), and use pop instructions.)
But if you're going to do this, there's no advantage to using ebp instead of ebx or something. In fact, there's a disadvantage to using ebp: the 0(%ebp) addressing mode requires a disp8 of 0, instead of no displacement, but %ebx wouldn't. So use %ebp for a non-pointer scratch register. Or at least one that you don't dereference without a displacement. (This quirk is irrelevant with a real frame pointer: (%ebp) is the saved EBP value. And BTW, the encoding that would mean (%ebp) with no displacement is how the ModRM byte encodes a disp32 with no base register, like (12345) or my_label)
These example are pretty artifical; you usually don't need that much space for locals unless it's an array, and then you'd use indexed addressing modes or pointers, not just a disp8 relative to ebp. But maybe you need space for a few 32-byte AVX vectors. In 32-bit code with only 8 vector registers, that's plausible.
AVX512 compressed disp8 mostly defeats this argument for 64-byte AVX512 vectors, though. (But AVX512 in 32-bit mode can still only use 8 vector registers, zmm0-zmm7, so you could easily need to spill some. You only get x/ymm8-15 and zmm8-31 in 64-bit mode.)

what is the current execution mode/exception level, etc?

I am new to ARMv8 architecture. I have following basic questions on my mind:
How do I know what is the current execution mode AArch32 or AArch64? Should I read CPSR or SPSR to ascertain this?
What is the current Exception level, EL0/1/2/3?
Once an exception comes, can i read any register to determine whether I am in Serror/Synchronous/IRQ/FIQ exception handler.
TIA.
The assembly instructions and their binary encoding are entirely different for 32 and 64 bit. So the information what mode you are currently in is something that you/ the compiler already needs to know during compilation. checking for them at runtime doesn't make sense. For C, C++ checking can be done at compile time (#ifdef) through compiler provided macros like the ones provided by armclang: __aarch64__ for 64 bit, __arm__ for 32 bit
depends on the execution mode:
aarch32: MRS <Rn>, CPSR read the current state into register number n. Then extract bits 3:0 that contain the current mode.
aarch64: MRS <Xn>, CurrentEL read the current EL into register number n
short answer: you can't. long answer: the assumption is that by the structure of the code and the state of any user defined variables, you already know what you are doing. i.e. whether you came to a position in code through regular code or through an exception.
aarch64 C code:
register uint64_t x0 __asm__ ("x0");
__asm__ ("mrs x0, CurrentEL;" : : : "%x0");
printf("EL = %" PRIu64 "\n", x0 >> 2);
arm C code:
register uint32_t r0 __asm__ ("r0");
__asm__ ("mrs r0, CPSR" : : : "%r0");
printf("EL = %" PRIu32 "\n", r0 & 0x1F);
CurrentEL however is not readable from EL0 as shown on the ARMv8 manual C5.2.1 "CurrentEL, Current Exception Level" section "Accessibility". Trying to run it in Linux userland raises SIGILL. You could catch that signal however I suppose...
CPSR is readable from EL0 however.
Tested on QEMU and gem5 with this setup.

get wrong epc on MIPS

I know MIPS would get wrong epc register value when it happens at branch delay, and epc = fault_address - 4.
But now, I often get the wrong EPC value which is even NOT in .text segment such as 0xb6000000, what's wrong with the case??
Thanks for your advance..
The CPU does not know anything about the boundaries of the .text region in your program. It simply implements a 2^32 byte address space.
It is possible for an incorrectly programmed jump to go to any address within the 2^32 byte address space. The jump instruction itself will not cause any sort of exception - in fact the MIPS32® Architecture for Programmers Volume II: The MIPS32® Instruction Set explicitly states that jump (J, JR, JALR) instructions do not trigger any exceptions.
When the processor starts executing from the destination of an incorrectly programmed jump, in presumably uninitialized memory, what happens next depends on the contents of that memory. If uninitialized memory is filled with "random" data, that data will be interpreted as instructions which the processor will execute until an illegal instruction is found, or until an instruction triggers some other exception.

ARM Undefined Instruction error

I'm getting an Undefined Instruction error while running an embedded system, no coprocessor, no MMU, Atmel 9263. The embedded system has memory in the range 0x20000000 - 0x23FFFFFF. I've had two cases so far:
SP 0x0030B840, LR 2000AE78 - the LR points at valid code, so I'm not sure what causes the exception, although the SP is bogus. What other addresses, registers, memory locations should I look at?
SP 0x20D384A8, LR 0x1FFCA59C - SP is ok, LR is bogus. Is there some kind of post mortem that I can do to find out how the LR got crushed? Looks like it rolled backwards off the end of the address space, but I can't figure out how.
Right now I am just replacing large chunks of code with simulations and running the tests agin to try and isolate the issue - the problem is sometimes it takes 4 hours to show the problem.
Any hints out there would be appreciated, thanks!
The chip is the AT91SAM9263, and we are using the IAR EWARM toolchain. I'm pretty sure it is straight ARM, but I will check.
EDIT
Another example of the Undef Instruct - this time SP/LR look fine. LR = 0x2000b0c4, and when I disassemble near there:
2000b0bc e5922000 LDR R2, [R2, #+0]
2000b0c0 e12fff32 BLX R2
2000b0c4 e1b00004 MOVS R0, R4
since LR is the instruction following the Undef Exception - how is BLX identified as Undefined? Note that CPSR is 0x00000013, so this is all ARM mode. However, R2 is 0x226d2a08 which is in the heap area, and I think is incorrect - the disassmbly there is ANDEQ R0,R0,R12, the instruction is 0x0000000C, and the other instructions there look like data to me. So I think the bad R2 is the problem, I'm just trying to understand why the Undef at the BLX?
thanks!
Check the T bit in the CPSR. If you are inadvertently changing from ARM mode to Thumb mode (or vice versa), undefined instructions will occur.
As far as the SP or LR getting corrupted, it could be that you execute a few instructions in the wrong mode that corrupt them before hitting the undefined instruction.
EDIT
Responding to the new error case in the edit of the question:
LR contains the return address from the BLX R2, so it makes sense that it points to one instruction after the BLX.
If R2 was pointing to the heap when the BLX R2 was executed, you'll jump into the heap and start executing the data as if they were instructions. This will cause an undefined instruction exception in short order...
If you want to see the exact instruction that was undefined, look at the R14_und register (defined while you're in the undefined instruction handler) - it contains the address of the next instruction after the Undefined one.
The root cause is the bad value in R2. Assuming this is C code, my guess is a bad pointer dereference, but I'd need to see the source to know for sure.
Is this an undefined instruction or a data abort because you are reading from an unaligned address?
edit:
On an undefined exception CPSR[4:0] should be 0b11011 or 0x1B not 0x13, 0x13 is a reset according to the arm arm.