ARM Undefined Instruction error - exception

I'm getting an Undefined Instruction error while running an embedded system, no coprocessor, no MMU, Atmel 9263. The embedded system has memory in the range 0x20000000 - 0x23FFFFFF. I've had two cases so far:
SP 0x0030B840, LR 2000AE78 - the LR points at valid code, so I'm not sure what causes the exception, although the SP is bogus. What other addresses, registers, memory locations should I look at?
SP 0x20D384A8, LR 0x1FFCA59C - SP is ok, LR is bogus. Is there some kind of post mortem that I can do to find out how the LR got crushed? Looks like it rolled backwards off the end of the address space, but I can't figure out how.
Right now I am just replacing large chunks of code with simulations and running the tests agin to try and isolate the issue - the problem is sometimes it takes 4 hours to show the problem.
Any hints out there would be appreciated, thanks!
The chip is the AT91SAM9263, and we are using the IAR EWARM toolchain. I'm pretty sure it is straight ARM, but I will check.
EDIT
Another example of the Undef Instruct - this time SP/LR look fine. LR = 0x2000b0c4, and when I disassemble near there:
2000b0bc e5922000 LDR R2, [R2, #+0]
2000b0c0 e12fff32 BLX R2
2000b0c4 e1b00004 MOVS R0, R4
since LR is the instruction following the Undef Exception - how is BLX identified as Undefined? Note that CPSR is 0x00000013, so this is all ARM mode. However, R2 is 0x226d2a08 which is in the heap area, and I think is incorrect - the disassmbly there is ANDEQ R0,R0,R12, the instruction is 0x0000000C, and the other instructions there look like data to me. So I think the bad R2 is the problem, I'm just trying to understand why the Undef at the BLX?
thanks!

Check the T bit in the CPSR. If you are inadvertently changing from ARM mode to Thumb mode (or vice versa), undefined instructions will occur.
As far as the SP or LR getting corrupted, it could be that you execute a few instructions in the wrong mode that corrupt them before hitting the undefined instruction.
EDIT
Responding to the new error case in the edit of the question:
LR contains the return address from the BLX R2, so it makes sense that it points to one instruction after the BLX.
If R2 was pointing to the heap when the BLX R2 was executed, you'll jump into the heap and start executing the data as if they were instructions. This will cause an undefined instruction exception in short order...
If you want to see the exact instruction that was undefined, look at the R14_und register (defined while you're in the undefined instruction handler) - it contains the address of the next instruction after the Undefined one.
The root cause is the bad value in R2. Assuming this is C code, my guess is a bad pointer dereference, but I'd need to see the source to know for sure.

Is this an undefined instruction or a data abort because you are reading from an unaligned address?
edit:
On an undefined exception CPSR[4:0] should be 0b11011 or 0x1B not 0x13, 0x13 is a reset according to the arm arm.

Related

Looking for a _one byte_ invalid opcode with x86

I need an invalid opcode with x86 (not x64!) that's exactly one byte in length to overwrite some code in a foreign process. Currently I'm using INT3 (0xCC) but it would be nicer to trap an invalid opcode separately since the foreign process contains a lot of valid INT3.
According to http://ref.x86asm.net/coder32.html, there aren't any in 32-bit mode guaranteed to #UD. Anything that wasn't nailed down has been reused as building material for new extensions.
The ones that exist in 64-bit mode are reserved and not guaranteed to fault on future CPUs; only ud2 is truly guaranteed future-proof. Assuming x86-64 lasts long enough, likely some vendor will make use of that 64-bit-only coding space and stop wasting code-size to also cater to increasingly obsolete 32-bit mode.
If you don't need #UD, you can raise #GP(0) with some privileged instructions in user-space, assuming you're never going to be running in kernel mode.
F4 hlt will always #GP(0) in user-space, not enabled by IOPL, only true CPL=0. (Or #UD if used with a lock prefix). Even if it somehow gets executed in a kernel context, it just stops and waits for the next interrupt, so typically no effect on correctness unless executed with interrupts disabled. (In which case you're stuck until the next NMI).
A similar but worse option is FB sti. But it can execute successfully in a program that's used Linux iopl(), like an X11 server. Unless interrupts were supposed to be disabled, though, that's still not going to lock up your system, it just won't trigger the exception you were looking for. (Unlike cli which could get that CPU stuck, or in al, dx which could do wild IO and even be allowed by ioperm not just iopl, depending on what value is in DX.)
Depending what comes next in memory, 9A callf ptr16:32 might fault on trying to load an invalid value into CS. That value would come from the 2 bytes of machine code 5 and 6 bytes after this one (i.e. after a 32-bit new EIP, since ptr16:32 is stored little-endian). Unlike call rel32 or whatever, it may fault before actually pushing anything and overwriting the current CS:EIP. (But if not, in theory your debugger could simulate popping that far-return address back into CS:EIP after catching the fault.)
Just to be clear, I'm suggesting overwriting a byte with 9A, and leaving the later bytes of machine code unmodified, after checking that the bytes that would be the new CS value are in fact invalid. e.g. by making sure a far call to that address segfaults. Or if this is near the end of a page, and the next is unmapped, it can #PF.
The F0 lock prefix faults with #UD if used on things other than a memory-destination RMW operation, so it can also work if later context would decode as any other instruction. But you can't always use it; you need to check that you aren't creating a valid atomic RMW instruction. e.g. if the ModRM byte was 00 or 01, replacing the opcode with a lock prefix creates a memory-destination add.
#ecm points out that f1 on some CPUs is icebp / int1, but on other CPUs where it isn't, it's undefined but doesn't raise #UD. (http://ref.x86asm.net/coder32.html#xF1)
If the following byte is 0, D4 00 aam 0 is guaranteed to #DE (divide exception). But any other value does immediate 8-bit division of AL.
Depending what byte comes next, CD int n can be used. But not for all following bytes, e.g. int 0x80 won't fault under Linux (unless your kernel is built without CONFIG_IA32_EMULATION). And you might not want some of the other random interrupt numbers. e.g. CD 03 int 3 is pretty much like CC int3.

what is the current execution mode/exception level, etc?

I am new to ARMv8 architecture. I have following basic questions on my mind:
How do I know what is the current execution mode AArch32 or AArch64? Should I read CPSR or SPSR to ascertain this?
What is the current Exception level, EL0/1/2/3?
Once an exception comes, can i read any register to determine whether I am in Serror/Synchronous/IRQ/FIQ exception handler.
TIA.
The assembly instructions and their binary encoding are entirely different for 32 and 64 bit. So the information what mode you are currently in is something that you/ the compiler already needs to know during compilation. checking for them at runtime doesn't make sense. For C, C++ checking can be done at compile time (#ifdef) through compiler provided macros like the ones provided by armclang: __aarch64__ for 64 bit, __arm__ for 32 bit
depends on the execution mode:
aarch32: MRS <Rn>, CPSR read the current state into register number n. Then extract bits 3:0 that contain the current mode.
aarch64: MRS <Xn>, CurrentEL read the current EL into register number n
short answer: you can't. long answer: the assumption is that by the structure of the code and the state of any user defined variables, you already know what you are doing. i.e. whether you came to a position in code through regular code or through an exception.
aarch64 C code:
register uint64_t x0 __asm__ ("x0");
__asm__ ("mrs x0, CurrentEL;" : : : "%x0");
printf("EL = %" PRIu64 "\n", x0 >> 2);
arm C code:
register uint32_t r0 __asm__ ("r0");
__asm__ ("mrs r0, CPSR" : : : "%r0");
printf("EL = %" PRIu32 "\n", r0 & 0x1F);
CurrentEL however is not readable from EL0 as shown on the ARMv8 manual C5.2.1 "CurrentEL, Current Exception Level" section "Accessibility". Trying to run it in Linux userland raises SIGILL. You could catch that signal however I suppose...
CPSR is readable from EL0 however.
Tested on QEMU and gem5 with this setup.

MIPS: legal to have two consecutive "load word" instructions into the same register?

Background: We're seeing a very intermittent crash in a function foo(int *p). The crash occurs while dereferencing p, whose value in these cases turns out to be 0xffffffff. An analysis of the core dump shows that foo() is called from the following assembly snippet:
bne ... somewhere else
lw $a0,44(sp)
lw $a0,40(sp)
jal foo()
lui s1, 0x1000
Inspecting memory in the core dump shows that 44(sp) is 0xffffffff, whereas 40(sp) is the correct value we intend to dereference. However, the value of a0 at the time of the crash, inside foo(), is 0xffffffff. (It's important to note that foo() in this case is just accessing a member; so it's literally the first instruction in foo() which is already attempting to access via a0, and crashing. Also, ra is pointing to the instruction following the above snippet, and s1 currently contains 0x10000000, so we're quite confident that foo() was, indeed, called from the above snippet.)
Our only theory at the moment is that the two consecutive lws into a0 are a hazard -- either a documented one, in which case this looks like a compiler bug; or an undocumented one.
So: is the above assembly legal? If it is, any other ideas about what could be going on here?
Thanks!
UPDATE: Well, turns out this was all a wild goose chase: a repeat analysis of the coredump by a colleague turned up a path in the code which I had missed, where there was a jump directly to the jal foo() instruction, immediately after having set a0 to 44(sp). In other words, there is a path in the code which is consistent with the result we're seeing that does not involve hazards, or "skipped instructions" or anything... I thought I checked this, but I guess I either didn't, or missed it... :(
Anyway, I've accepted markgz's answer, since it answers my original question about the legality of these instructions (apparently they are).
A quick search of the MIPS documentation for the MIPS32R2 ISA doesn't show any restrictions on LW after LW instructions.
There might be a bug in the MIPS implementation in your CPU. Things to look at include:
What address is 44(sp), 40(sp) - are they on a page boundary or a 256MByte boundary, or other interesting address?
Do either of the loads trigger a page fault?
Does patching the binary to insert a NOP, SSNOP, or a SYNC instruction between the loads make the problem go away?

get wrong epc on MIPS

I know MIPS would get wrong epc register value when it happens at branch delay, and epc = fault_address - 4.
But now, I often get the wrong EPC value which is even NOT in .text segment such as 0xb6000000, what's wrong with the case??
Thanks for your advance..
The CPU does not know anything about the boundaries of the .text region in your program. It simply implements a 2^32 byte address space.
It is possible for an incorrectly programmed jump to go to any address within the 2^32 byte address space. The jump instruction itself will not cause any sort of exception - in fact the MIPS32® Architecture for Programmers Volume II: The MIPS32® Instruction Set explicitly states that jump (J, JR, JALR) instructions do not trigger any exceptions.
When the processor starts executing from the destination of an incorrectly programmed jump, in presumably uninitialized memory, what happens next depends on the contents of that memory. If uninitialized memory is filled with "random" data, that data will be interpreted as instructions which the processor will execute until an illegal instruction is found, or until an instruction triggers some other exception.

What is the cause of undefined ARM Exceptions?

One question is when the undefined instruction happens .... Do we need to get the current executing instruction from R14_SVC or R14_UNDEF? . Currently I am working on one problem where an undefined instruction happened. On checking the R14_SVC I found the instruction was like below:
0x46BFD73C cmp r0, #0x0
0x46BFD740 beq 0x46BFD75C
0x46BFD744 ldr r0,0x46BFE358
so in my assumption the undefined instruction would have happened while executing the instruction beq 0x46BFD75C
One thing that puzzles me is I checked the r14_undef and the istruction was different.
0x46bfd4b8 bx r14
0x46bfd4bC mov r0, 0x01
0x46bfd4c0 bx r14
Which one caused the undefined instruction exception?
All of your answers are in the ARM ARM, ARM Architectural Reference Manual. go to infocenter.arm.com under reference manuals find the architecture family you are interested in. The non-cortex-m series all handle these exceptions the same way
When an Undefined Instruction exception occurs, the following actions are performed:
R14_und = address of next instruction after the Undefined instruction
SPSR_und = CPSR
CPSR[4:0] = 0b11011 /* Enter Undefined Instruction mode */
CPSR[5] = 0 /* Execute in ARM state */
/* CPSR[6] is unchanged */
CPSR[7] = 1 /* Disable normal interrupts */
/* CPSR[8] is unchanged */
CPSR[9] = CP15_reg1_EEbit
/* Endianness on exception entry */
if high vectors configured then
PC = 0xFFFF0004
else
PC = 0x00000004
R14_und points at the next instruction AFTER the undefined instruction. you have to examine SPSR_und to determine what mode the processor was in (arm or thumb) to know if you need to subtract 2 or 4 from R14_und and if you need to fetch 2 or 4 bytes. Unfortunately if on a newer architecture that supports thumb2 you may have to fetch 4 bytes even in thumb mode and try to figure out what happened. being variable word length it is very possible to be in a situation where it is impossible to determine what happened. If you are not using thumb2 instructions then it is deterministic.