MMIX: questions on dynamic trap - exception

I have some questions about MMIX interrupts.
The special register rK is defined as the "interrupt mask register".
But it seems to treat different events differently.
For the I/O bits (let's call them the ** bits):
This is the normal case.
When an external interrupt comes in and rK's ** bit is 0, rQ's ** bit stays at '1' until rK's ** bit is set to 1, and then the trap occurs.
For the S bit (program):
According to the source code of mmix-pipe, a security exception occurs even when rK's S bit is 0, and rK's S bit is then changed from 0 to 1.
So at positive addresses rK's S bit should be 1, and at negative addresses rK's S bit has no effect.
Only rQ's S bit is used to tell the system that "there was a security issue"; rK's S bit seems useless.
For the P bit (program):
mmix-doc.pdf says it means "instruction comes from a privileged (negative) virtual address."
But if the PC is at a negative address, rQ's P bit is not always set to 1 (mmix-pipe does not set rQ's P bit either).
I can prove it:
If every instruction executed before the RESUME instruction set rQ's P bit to 1, then RESUME itself would also set rQ's P bit to 1.
As a result, after the RESUME, the 'user' instruction would always trigger a TRAP. That is impossible.
So when the P exception is inhibited, rQ's P bit is not set to 1.
So rK's P bit is not just a mask; it acts as an "option to bypass an exception check".
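The "normal case" described above for the I/O bits can be modeled in a few lines of C: the event latches in rQ whatever rK says, and a trap becomes pending only while rQ & rK is nonzero. This is a hypothetical sketch of the question's own description (the struct, function names, and bit position are made up for illustration); it is not code from mmix-pipe and deliberately ignores the S/P special cases being asked about.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the pure-mask behaviour of the I/O (**) bits.
 * Bit positions are illustrative only, not MMIX's real rQ/rK layout. */
typedef struct {
    uint64_t rQ; /* request bits: events that have happened */
    uint64_t rK; /* mask bits: 1 = this event may cause a trap */
} mmix_intr_state;

/* An external event latches its bit in rQ regardless of rK. */
static void raise_event(mmix_intr_state *s, uint64_t bit)
{
    s->rQ |= bit;
}

/* A trap is pending only when a requested bit is also enabled in rK. */
static bool trap_pending(const mmix_intr_state *s)
{
    return (s->rQ & s->rK) != 0;
}
```

Raising an event with its rK bit clear leaves the bit set in rQ without a pending trap; setting the rK bit afterwards makes the trap pending, which is the sequence the question describes.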
Here are my questions:
Why not make all of rK's bits simple interrupt masks? Is there a reason the S and P bits behave so strangely?
Are the other program bits of rQ (rwxnkb) set to 1 if the corresponding rK bits are 0?
Since a trip does not occur at negative addresses, do the DVWIOUZX bits hold 1 after the event has just happened?

Related

Looking for a _one byte_ invalid opcode with x86

I need an invalid opcode with x86 (not x64!) that's exactly one byte in length to overwrite some code in a foreign process. Currently I'm using INT3 (0xCC) but it would be nicer to trap an invalid opcode separately since the foreign process contains a lot of valid INT3.
According to http://ref.x86asm.net/coder32.html, there aren't any in 32-bit mode guaranteed to #UD. Anything that wasn't nailed down has been reused as building material for new extensions.
The ones that exist in 64-bit mode are reserved and not guaranteed to fault on future CPUs; only ud2 is truly guaranteed future-proof. Assuming x86-64 lasts long enough, likely some vendor will make use of that 64-bit-only coding space and stop wasting code-size to also cater to increasingly obsolete 32-bit mode.
If you don't need #UD, you can raise #GP(0) with some privileged instructions in user-space, assuming you're never going to be running in kernel mode.
F4 hlt will always #GP(0) in user-space, not enabled by IOPL, only true CPL=0. (Or #UD if used with a lock prefix). Even if it somehow gets executed in a kernel context, it just stops and waits for the next interrupt, so typically no effect on correctness unless executed with interrupts disabled. (In which case you're stuck until the next NMI).
A similar but worse option is FB sti. But it can execute successfully in a program that's used Linux iopl(), like an X11 server. Unless interrupts were supposed to be disabled, though, that's still not going to lock up your system, it just won't trigger the exception you were looking for. (Unlike cli which could get that CPU stuck, or in al, dx which could do wild IO and even be allowed by ioperm not just iopl, depending on what value is in DX.)
Depending what comes next in memory, 9A callf ptr16:32 might fault on trying to load an invalid value into CS. That value would come from the 2 bytes of machine code 5 and 6 bytes after this one (i.e. after a 32-bit new EIP, since ptr16:32 is stored little-endian). Unlike call rel32 or whatever, it may fault before actually pushing anything and overwriting the current CS:EIP. (But if not, in theory your debugger could simulate popping that far-return address back into CS:EIP after catching the fault.)
Just to be clear, I'm suggesting overwriting a byte with 9A, and leaving the later bytes of machine code unmodified, after checking that the bytes that would be the new CS value are in fact invalid. e.g. by making sure a far call to that address segfaults. Or if this is near the end of a page, and the next is unmapped, it can #PF.
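To see which selector and offset a 9A byte would pick up, you can decode the untouched bytes that follow it. A minimal sketch, assuming 32-bit code with no operand-size or other prefixes in play; the struct and function names are hypothetical. Whether the resulting selector actually faults still has to be tested as described above (e.g. with a test far call to that address).

```c
#include <stddef.h>
#include <stdint.h>

/* Far pointer that "callf ptr16:32" (9A) would load, 32-bit operand size. */
typedef struct {
    uint32_t offset;   /* new EIP: bytes 1..4 after the opcode, little-endian */
    uint16_t selector; /* new CS:  bytes 5..6 after the opcode, little-endian */
} far_ptr32;

/* code[0] is the byte you would overwrite with 0x9A; code[1..6] stay as-is.
 * Returns 0 on success, -1 if fewer than 7 bytes are available. */
static int decode_callf_target(const uint8_t *code, size_t len, far_ptr32 *out)
{
    if (len < 7)
        return -1;
    out->offset = (uint32_t)code[1] | ((uint32_t)code[2] << 8) |
                  ((uint32_t)code[3] << 16) | ((uint32_t)code[4] << 24);
    out->selector = (uint16_t)(code[5] | (code[6] << 8));
    return 0;
}
```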
The F0 lock prefix faults with #UD if used on things other than a memory-destination RMW operation, so it can also work if later context would decode as any other instruction. But you can't always use it; you need to check that you aren't creating a valid atomic RMW instruction. e.g. if the ModRM byte was 00 or 01, replacing the opcode with a lock prefix creates a memory-destination add.
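The ADD case above can be screened for mechanically. This hypothetical helper checks only that one hazard (the old ModRM byte, now decoded as the opcode, is 00 or 01 and the byte after it is a ModRM with a memory operand). Many other opcodes also accept LOCK with a memory destination, so this is not a complete check that F0 will raise #UD.

```c
#include <stdbool.h>
#include <stdint.h>

/* After overwriting the opcode with F0, the old ModRM byte is decoded as the
 * opcode. 00 = add r/m8, r8 and 01 = add r/m32, r32. If the next byte, now the
 * ModRM, selects memory (mod != 11b), "lock add" is valid and will NOT #UD.
 * With mod == 11b (register destination), LOCK makes it #UD. */
static bool lock_yields_memory_add(uint8_t new_opcode, uint8_t next_modrm)
{
    bool is_add = (new_opcode == 0x00 || new_opcode == 0x01);
    bool mem_operand = (next_modrm >> 6) != 3;
    return is_add && mem_operand;
}
```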
@ecm points out that f1 on some CPUs is icebp / int1, but on other CPUs where it isn't, it's undefined but doesn't raise #UD. (http://ref.x86asm.net/coder32.html#xF1)
If the following byte is 0, D4 00 aam 0 is guaranteed to #DE (divide exception). But any other value does immediate 8-bit division of AL.
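A host-side model of what D4 ib does, assuming the documented AAM semantics (AH = AL / imm8, AL = AL % imm8, #DE when imm8 is 0), shows why only a following 00 byte turns D4 into a guaranteed trap. The function name is hypothetical and the flag effects are ignored.

```c
#include <stdint.h>

/* Model of AAM imm8 (D4 ib): AH = AL / imm8, AL = AL % imm8.
 * Returns -1 where the CPU would raise #DE (imm8 == 0), otherwise 0. */
static int model_aam(uint8_t *al, uint8_t *ah, uint8_t imm8)
{
    if (imm8 == 0)
        return -1; /* divide error: the only case in which D4 faults */
    uint8_t q = *al / imm8;
    uint8_t r = *al % imm8;
    *ah = q;
    *al = r;
    return 0;
}
```

With any nonzero immediate the instruction just completes (for example AL = 123 with D4 0A leaves AH = 12, AL = 3), so the patched code keeps running instead of trapping.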
Depending what byte comes next, CD int n can be used. But not for all following bytes, e.g. int 0x80 won't fault under Linux (unless your kernel is built without CONFIG_IA32_EMULATION). And you might not want some of the other random interrupt numbers. e.g. CD 03 int 3 is pretty much like CC int3.

How to clear an exception in handler in risc-v?

The following is my trap routine on the FE310 (SiFive HiFive1 Rev B board).
my_trap_routine:
// read mcause
csrr t0, mcause;
// read mepc
csrr t1, mepc;
mret;
Now, I generated a load access fault exception and execution jumped into the trap routine. How do I clear the exception inside the handler so that execution doesn't keep jumping into the trap routine again and again?
You have to advance the exception program counter (mepc) so that you return to the next instruction in the user/interrupted code.
This is fairly simple in RISC-V unless the compressed instruction set is in use, in which case you have to decode the faulting instruction to determine how far to advance the PC.
Fortunately, it is a pretty simple decode, but you need to be aware that RISC-V allows variable instruction lengths in 2-byte increments.
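A minimal sketch of that decode, assuming the standard RISC-V length encoding read from the first 16-bit parcel at mepc (the FE310 implements the C extension, so 2-byte instructions can occur). Reading the parcel and writing the result back with csrw mepc are left to the handler; the function names are hypothetical, and formats longer than 32 bits are not handled.

```c
#include <stdint.h>

/* Instruction length in bytes from the first 16-bit parcel:
 *   bits[1:0] != 11                  -> 16-bit (compressed)
 *   bits[1:0] == 11, bits[4:2] != 111 -> 32-bit
 * Longer encodings are reserved/not handled here: returns 0. */
static unsigned rv_insn_length(uint16_t first_parcel)
{
    if ((first_parcel & 0x3) != 0x3)
        return 2;
    if ((first_parcel & 0x1c) != 0x1c)
        return 4;
    return 0;
}

/* Value to write back to mepc so that mret resumes after the faulting
 * instruction. If the length is unsupported, mepc is returned unchanged
 * and the caller must handle it (otherwise the fault repeats forever). */
static uintptr_t next_mepc(uintptr_t mepc, uint16_t first_parcel)
{
    unsigned len = rv_insn_length(first_parcel);
    return len ? mepc + len : mepc;
}
```

Note that skipping a faulting load this way leaves its destination register with whatever value it held before.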

How to diag imprecise bus fault after config of priority bit allocation, Cortex M3 STM32F10x w uC/OS-III

I have an issue in an app written for the ST Microelectronics STM32F103 (ARM Cortex-M3 r1p1). RTOS is uC/OS-III; dev environment is IAR EWARM v. 6.44; it also uses the ST Standard Peripheral Library v. 1.0.1.
The app is not new; it's been in development and in the field for at least a year. It makes use of two UARTs, I2C, and one or two timers. Recently I decided to review interrupt priority assignments, and I rearranged priorities as part of the review (things seemed to work just fine).
I discovered that there was no explicit allocation of group and sub-priority bits in the initialization code, including the RTOS, and so to make the app consistent with another app (same product, different processor) and with the new priority scheme, I added a call to NVIC_PriorityGroupConfig(), passing in NVIC_PriorityGroup_2. This sets the PRIGROUP value in the Application Interrupt and Reset Control Register (AIRCR) to 5, allocating 2 bits for group (preemption) priority and 2 bits for subpriority.
After doing this, I get an imprecise bus fault exception on execution, not immediately but very quickly thereafter. (More on where I suspect it occurs in a moment.) Since it's imprecise (BFSR.IMPRECISERR asserted), there's nothing of use in BFAR (BFSR.BFARVALID clear).
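The BusFault flags mentioned here live in bits 15:8 of CFSR (the BFSR byte). A small decoder makes register dumps easier to read. This sketch lists only the Cortex-M3 BusFault bits, and the macro and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* BusFault status bits within CFSR (0xE000ED28); BFSR is CFSR[15:8]. */
#define CFSR_IBUSERR     (1u << 8)  /* bus error on instruction fetch */
#define CFSR_PRECISERR   (1u << 9)  /* precise data bus error */
#define CFSR_IMPRECISERR (1u << 10) /* imprecise: stacked PC is not the culprit */
#define CFSR_UNSTKERR    (1u << 11) /* bus error while unstacking on return */
#define CFSR_STKERR      (1u << 12) /* bus error while stacking on entry */
#define CFSR_BFARVALID   (1u << 15) /* BFAR holds the faulting address */

/* Writes a space-separated list of the BusFault flags set in cfsr. */
static void describe_bfsr(uint32_t cfsr, char *buf, size_t len)
{
    static const struct { uint32_t mask; const char *name; } flags[] = {
        { CFSR_IBUSERR, "IBUSERR" },     { CFSR_PRECISERR, "PRECISERR" },
        { CFSR_IMPRECISERR, "IMPRECISERR" }, { CFSR_UNSTKERR, "UNSTKERR" },
        { CFSR_STKERR, "STKERR" },       { CFSR_BFARVALID, "BFARVALID" },
    };
    if (len == 0)
        return;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if (!(cfsr & flags[i].mask))
            continue;
        if (buf[0] != '\0')
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, flags[i].name, len - strlen(buf) - 1);
    }
}
```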
The STM32F family group implements 4 bits of priority. While I've not found this mentioned explicitly anywhere, it's apparently the most significant nybble of the priority. This assumption seems to be validated by the PRIGROUP table given in the documentation (STM32F10xxx/20xxx/21xxx/L1xxx Cortex-M3 Programming Manual (Doc 15491, Rev 5), sec. 4.4.5, Application interrupt and control register (SCB_AIRCR), Table 45, Priority grouping, p. 134).
In the ARM scheme, a priority value consists of some number of group (preemption) priority bits and some number of subpriority bits. Group priority bits are the upper bits; subpriority bits are the lower ones. The 3-bit AIRCR.PRIGROUP field controls how the bits are split between the two. PRIGROUP = 0 configures 7 bits of group priority and 1 bit of subpriority; PRIGROUP = 7 configures 0 bits of group priority and 8 bits of subpriority (thus priorities are all subpriority, and no preemption occurs for exceptions with settable priorities).
The reset value of AIRCR.PRIGROUP is defined to be 0.
For the STM32F10x, since only the upper 4 bits are implemented, it seems to follow that PRIGROUP = 0, 1, 2, 3 should all be equivalent, since they all correspond to >= 4 bits of group priority.
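The reasoning above can be checked with a little arithmetic: with PRIGROUP = n, subpriority occupies bits [n:0] of the 8-bit priority field and group priority the bits above, so only the implemented MSBs matter. A sketch under the question's assumption that the top 4 bits are implemented; the names are hypothetical and prigroup must be 0..7.

```c
#include <stdint.h>

typedef struct { unsigned group_bits, sub_bits; } prio_split;

/* Effective group/sub split when only the top impl_bits of the 8-bit
 * priority field are implemented (4 on STM32F10x). */
static prio_split effective_split(unsigned prigroup, unsigned impl_bits)
{
    unsigned group = 7 - prigroup; /* group bits in the full 8-bit field */
    prio_split s;
    s.group_bits = group < impl_bits ? group : impl_bits;
    s.sub_bits = impl_bits - s.group_bits;
    return s;
}

/* Group (preemption) priority: bits above PRIGROUP. */
static unsigned group_priority(uint8_t prio, unsigned prigroup)
{
    return prio >> (prigroup + 1);
}

/* Subpriority: bits [PRIGROUP:0]. */
static unsigned sub_priority(uint8_t prio, unsigned prigroup)
{
    return prio & ((1u << (prigroup + 1)) - 1);
}
```

Under this model PRIGROUP 0 through 3 all give 4 group bits and 0 subpriority bits, and PRIGROUP 5 (NVIC_PriorityGroup_2) gives 2 and 2, matching the values quoted in the question.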
Given that assumption, I also tried calling NVIC_PriorityGroupConfig() with a value of NVIC_PriorityGroup_4, which corresponds to a PRIGROUP value of 3 (4 bits group priority, no subpriority).
This change also results in the bus fault exception.
Unfortunately, the STM32F103 is, I believe, r1p1, and so does not implement the Auxiliary Control Register (ACTLR; introduced in r2p0), so I can't try out the DISDEFWBUF bit (disables use of the write buffer during default memory map accesses, making all bus faults precise at the expense of some performance reduction).
I'm almost certain that the bus fault occurs in an ISR, and most likely in a UART ISR. I've set a breakpoint at a particular place in code, started the app, and had the bus fault before execution hit the breakpoint; however, if I step through code in the debugger, I can get to and past that breakpoint's location, and if I allow it to execute from there, I'll see the bus fault some small amount of time after I continue.
The next step will be to attempt to pin down what ISR is generating the bus fault, so that I can instrument it and/or attempt to catch its invocation and step through it.
So my questions are:
1) Anyone have any suggestions as to how to go about identifying the origin of imprecise bus fault exceptions more intelligently?
2) Why would setting PRIGROUP = 3 change the behavior of the system when PRIGROUP = 0 is the reset default? (PRIGROUP=0 means 7 bits group, 1 bit sub priority; PRIGROUP=3 means 4 bits group, 4 bits sub priority; STM32F10x only implements upper 4 bits of priority.)
Many, many thanks to everyone in advance for any insight or non-NULL pointers!
(And of course if I figure it out beforehand, I'll update this post with any information that might be useful to others encountering the same sort of scenario.)
Even if BFAR is not valid, you can still read other related registers within your bus-fault ISR:
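#include <stdio.h>
#include "stm32f10x.h" /* ST SPL device header; pulls in CMSIS core_cm3.h for SCB */

/* hardfault_args points at the exception frame stacked by the core:
   R0, R1, R2, R3, R12, LR, PC, xPSR (in that order). */
void HardFault_Handler_C(unsigned int* hardfault_args)
{
    printf("R0 = 0x%.8X\r\n",hardfault_args[0]);
    printf("R1 = 0x%.8X\r\n",hardfault_args[1]);
    printf("R2 = 0x%.8X\r\n",hardfault_args[2]);
    printf("R3 = 0x%.8X\r\n",hardfault_args[3]);
    printf("R12 = 0x%.8X\r\n",hardfault_args[4]);
    printf("LR = 0x%.8X\r\n",hardfault_args[5]);
    printf("PC = 0x%.8X\r\n",hardfault_args[6]);
    printf("PSR = 0x%.8X\r\n",hardfault_args[7]);
    printf("BFAR = 0x%.8X\r\n",*(volatile unsigned int*)0xE000ED38);  /* BusFault Address */
    printf("CFSR = 0x%.8X\r\n",*(volatile unsigned int*)0xE000ED28);  /* Configurable Fault Status */
    printf("HFSR = 0x%.8X\r\n",*(volatile unsigned int*)0xE000ED2C);  /* HardFault Status */
    printf("DFSR = 0x%.8X\r\n",*(volatile unsigned int*)0xE000ED30);  /* Debug Fault Status */
    printf("AFSR = 0x%.8X\r\n",*(volatile unsigned int*)0xE000ED3C);  /* Auxiliary Fault Status */
    printf("SHCSR = 0x%.8X\r\n",(unsigned int)SCB->SHCSR);
    while (1); /* stop here so the state can be inspected in the debugger */
}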
If you can't use printf at the point in the execution when this specific Hard-Fault interrupt occurs, then save all the above data in a global buffer instead, so you can view it after reaching the while (1).
Here is the complete description of how to connect this ISR to the interrupt vector (although, as I understand from your question, you already have it implemented):
Jumping from one firmware to another in MCU internal FLASH
You might be able to find additional information on top of what you already know at:
http://www.keil.com/appnotes/files/apnt209.pdf
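For the case where printf can't be used, here is a sketch of the global-buffer approach suggested above: copy the stacked frame and the six consecutive fault registers starting at CFSR (0xE000ED28 through AFSR at 0xE000ED3C, which also picks up MMFAR at 0xE000ED34) into a RAM struct, then spin. The struct, variable, and function names are hypothetical; the register base is passed in so the copy logic can also be exercised off-target.

```c
#include <stdint.h>

/* Hypothetical RAM snapshot of what the handler above prints. */
typedef struct {
    uint32_t frame[8]; /* stacked R0, R1, R2, R3, R12, LR, PC, xPSR */
    uint32_t regs[6];  /* CFSR, HFSR, DFSR, MMFAR, BFAR, AFSR */
    uint32_t valid;    /* written last: 1 once the record is complete */
} fault_record;

/* volatile so the stores are not optimised away before the final hang */
volatile fault_record g_fault;

/* frame:    the stacked exception frame (hardfault_args in the handler).
 * scb_regs: first of six consecutive 32-bit fault registers; on the target
 *           pass (const volatile uint32_t *)0xE000ED28. */
static void capture_fault(const uint32_t *frame, const volatile uint32_t *scb_regs)
{
    for (int i = 0; i < 8; i++)
        g_fault.frame[i] = frame[i];
    for (int i = 0; i < 6; i++)
        g_fault.regs[i] = scb_regs[i];
    g_fault.valid = 1;
}
```

In the handler this would replace the printf calls with `capture_fault((const uint32_t *)hardfault_args, (const volatile uint32_t *)0xE000ED28);` followed by `while (1);`, after which g_fault can be examined in the debugger.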

get wrong epc on MIPS

I know that on MIPS, EPC does not point at the faulting instruction when the exception happens in a branch delay slot: EPC = fault_address - 4.
But now I often get EPC values that are not even in the .text segment, such as 0xb6000000. What's wrong in this case?
Thanks in advance.
The CPU does not know anything about the boundaries of the .text region in your program. It simply implements a 2^32 byte address space.
It is possible for an incorrectly programmed jump to go to any address within the 2^32 byte address space. The jump instruction itself will not cause any sort of exception - in fact the MIPS32® Architecture for Programmers Volume II: The MIPS32® Instruction Set explicitly states that jump (J, JR, JALR) instructions do not trigger any exceptions.
When the processor starts executing from the destination of an incorrectly programmed jump, in presumably uninitialized memory, what happens next depends on the contents of that memory. If uninitialized memory is filled with "random" data, that data will be interpreted as instructions which the processor will execute until an illegal instruction is found, or until an instruction triggers some other exception.
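Since the CPU has no notion of .text, a handler can perform that check itself before trusting EPC. A sketch assuming 32-bit MIPS, with text_start/text_end as hypothetical bounds (for example taken from linker-script symbols). It also applies the delay-slot adjustment from the question: when Cause.BD (bit 31) is set, EPC points at the branch and the instruction that actually faulted is at EPC + 4.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIPS_CAUSE_BD (1u << 31) /* Cause.BD: exception in a branch delay slot */

/* Address of the instruction that actually raised the exception. */
static uint32_t faulting_pc(uint32_t epc, uint32_t cause)
{
    return (cause & MIPS_CAUSE_BD) ? epc + 4 : epc;
}

/* text_start/text_end are hypothetical bounds of the code region. */
static bool pc_in_text(uint32_t pc, uint32_t text_start, uint32_t text_end)
{
    return pc >= text_start && pc < text_end;
}
```

A result outside the code region, like 0xb6000000 in the question, suggests execution arrived there through a bad jump rather than a faulting instruction in your program.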

MIPS exception handling (Specifically branch delay slots)

Say an exception has been hit in the branch delay slot of a conditional branch
e.g.
BEQ a0, zero, _true
BREAK (0000)
sw a0, 0000(t0)
_true:
sw a1, 0000(t0)
My exception handler picks up exception code 9 (breakpoint) from the BREAK instruction. Because BREAK is in the branch delay slot, the CPU sets the BD bit of the Cause register to 1, and EPC holds the address of the branch.
The documentation says that this requires complex processing which it doesn't describe, i.e. getting the target of the branch/jump, doing any required comparison, and then setting the PC to the true or false address.
My solution to get around the complex processing (which is a bit of a hack) is as follows:
Save the instruction in the branch delay slot
Replace the instruction in the branch delay slot with a NOP
Return from the exception handler, restoring all registers
Re-execute the BEQ a0, zero, _true; the delay slot now holds a NOP, so it has no effect
Place a software breakpoint at the target(s) of the branch and set a flag
Once the software breakpoint is hit, restore the branch delay slot and remove all traces of the software breakpoints.
Parsing branches and jumps is fine (which is how I can get the targets), but for conditional branches, once I have parsed them, I then have to do the comparisons to determine whether to jump to the true part or go to the false part (the next line), which feels like more work than I would like. Do I really have to?
My concern with this hacky method is:
Will the CPU already have recorded that it hit the conditional branch, and already decided whether the branch will be taken once the delay slot has executed? If so, when I point the program counter back at the branch and it executes again, instead of evaluating it correctly the CPU might jump to whichever side of the branch was predetermined before the exception occurred (a kind of "double jump").
Do you have the MIPS programmer's documents? If you want a 100% accurate answer, read them; otherwise I can just tell you the important bits as I remember them.
In short: yes, you need to load the instruction from memory, parse it, and interpret the result to figure out where to continue. "Patching" the code as you describe would work too, but you need to make sure the instruction cache gets invalidated; otherwise you will keep running from the cache and end up in an infinite loop.
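For the BEQ in the question, that parse-and-interpret step amounts to a field decode and one comparison. A minimal sketch: regs[] is a hypothetical array of the general-purpose registers saved on exception entry, only BEQ is handled, and 0 is used as an error value purely for illustration. The delay-slot instruction itself still has to be emulated (or skipped) separately.

```c
#include <stdint.h>

/* Resume address after handling an exception in the delay slot of a BEQ.
 * regs[] holds the 32 GPRs saved on exception entry (hypothetical layout).
 * Returns 0 if insn is not a BEQ. */
static uint32_t beq_resume_pc(uint32_t branch_pc, uint32_t insn,
                              const uint32_t regs[32])
{
    if ((insn >> 26) != 0x04) /* primary opcode 000100 = BEQ */
        return 0;
    unsigned rs = (insn >> 21) & 0x1f;
    unsigned rt = (insn >> 16) & 0x1f;
    /* 16-bit signed word offset, relative to the delay-slot address */
    int32_t offset = (int32_t)(int16_t)(insn & 0xffff) * 4;
    if (regs[rs] == regs[rt])
        return branch_pc + 4 + (uint32_t)offset; /* branch taken */
    return branch_pc + 8;                        /* not taken: skip delay slot */
}
```

For the listing in the question, with BEQ a0, zero, _true at 0x1000 and _true at 0x100C, the encoded offset is 2 (instruction word 0x10800002), so the taken case resumes at 0x1004 + 2 * 4 = 0x100C and the not-taken case at 0x1008.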
The PC is updated after the delay slot has executed; until then it points to the branch. There is no special handling during an exception, except that a register tells you whether you are in a delay slot or not.
You'd need to emulate, in your handler, all instructions that can conditionally raise an exception (loads/stores) along with the branch instructions. If the delay slot holds another kind of instruction, you can just restart at the branch (the exception was an external interrupt in that case).
If your concern is performance, then simply don't put exception-raising instructions in a delay slot.
Edit: and no, MIPS stores nothing about interrupted instructions, but the method you are suggesting will likely be slower because the ICache has to be invalidated twice.