How would an empty interrupt service handler be written in x64? - exception

I have read the AMD64 Developer manual on interrupt routines. According to the manual,
The interrupt handler must pop the error code off the stack if one was pushed by the interrupt or exception. IRET restores the interrupted program's rIP, CS and rFLAGS by popping their saved values off of the stack and into their respective registers.
Therefore, would an empty ISR handler look something along this ASM code?
add rsp, 4 ;pop err code off stack
iretq
I am assuming the size of the error code is 4 bytes, as other websites have told me. I'm pretty sure this is totally wrong, but some guidance will help.

In long mode (x64) the error code occupies 8 bytes on the stack, so you need to add 8 to the stack pointer rather than 4.
In addition, not all exceptions push an error code onto the stack. A table showing which exceptions do and which do not can be found here: https://wiki.osdev.org/Exceptions
If an exception does not push an error code, the empty handler is just the iretq instruction, which returns from the handler. If it does push an error code, add 8 to the stack pointer to discard it and then return from the handler:
add rsp, 8
iretq
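To make the "which exceptions push an error code" rule concrete, here is a minimal C sketch. The struct and function names are hypothetical, and the vector list is my reading of the OSDev table linked above, so check it against the manual for your CPU. The struct shows what the handler sees on entry in long mode when an error code is present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stack contents on entry to a long-mode handler, lowest address first.
   The error_code slot is only present for vectors that push one. */
struct interrupt_frame_with_error {
    uint64_t error_code; /* 8 bytes in long mode, hence "add rsp, 8" */
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

static_assert(sizeof(struct interrupt_frame_with_error) == 48,
              "six 8-byte slots");

/* Hypothetical helper: true if the CPU pushes an error code for this
   exception vector (list assumed from the OSDev Exceptions table). */
bool exception_pushes_error_code(unsigned vector)
{
    switch (vector) {
    case 8:  /* #DF */
    case 10: /* #TS */
    case 11: /* #NP */
    case 12: /* #SS */
    case 13: /* #GP */
    case 14: /* #PF */
    case 17: /* #AC */
    case 21: /* #CP */
    case 29: /* #VC */
    case 30: /* #SX */
        return true;
    default:
        return false;
    }
}

/* Number of bytes an empty handler must drop before executing iretq. */
unsigned bytes_to_pop_before_iretq(unsigned vector)
{
    return exception_pushes_error_code(vector) ? 8u : 0u;
}
```

In practice this decision is usually made at assembly time, with one stub per vector, rather than at run time.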
Thanks #MichaelPetch

Related

MIPS exceptions what do they mean

I've been studying assembly lately and I can't seem to understand exactly how exceptions work. More specifically, I get the message Exception 6 occurred and ignored. Can someone please explain what exactly this means? I am using QtSpim.
Exceptions may be caused by hardware or software. An exception is like an unscheduled function call that jumps to a new address.
The program may encounter an error condition such as
an undefined instruction. The program then jumps to code in the operating system (OS), which may choose to terminate the program. Other causes of exceptions are division by zero, attempts to read some nonexistent memory, hardware malfunctions, debugger breakpoints, and arithmetic overflow.
The processor records the cause of an exception and the value of the PC
at the time the exception occurs. It then jumps to the exception handler function. The exception handler is code (usually in the OS) that examines the
cause of the exception and responds appropriately. It then returns to the program that
was executing before the exception took place.
In MIPS, the exception handler is always located at 0x80000180. When an exception occurs, the processor always jumps to this instruction address, regardless of the cause.
The MIPS architecture uses a special-purpose register, called the Cause
register, to record the cause of the exception.
MIPS uses another special-purpose register called the Exception
Program Counter (EPC) to store the value of the PC at the time an
exception takes place. The processor returns to the address in EPC after
handling the exception. This is analogous to using $ra to store the old
value of the PC during a jal instruction.
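As a rough illustration of how a handler reads the Cause register, the sketch below extracts the exception code field in C. It assumes the MIPS32 layout, where ExcCode occupies bits 6..2 of Cause; the function names are hypothetical and the name table only covers a few common codes. Under that assumption, the number in a message such as "Exception 6" would correspond to a bus error on an instruction fetch.

```c
#include <assert.h>
#include <stdint.h>

/* ExcCode is assumed to occupy bits 6..2 of the Cause register (MIPS32). */
unsigned cause_exc_code(uint32_t cause)
{
    return (cause >> 2) & 0x1Fu;
}

/* Hypothetical helper mapping a few common exception codes to names. */
const char *exc_code_name(unsigned code)
{
    assert(code < 32);
    switch (code) {
    case 0:  return "interrupt";
    case 4:  return "address error on load or instruction fetch";
    case 5:  return "address error on store";
    case 6:  return "bus error on instruction fetch";
    case 7:  return "bus error on data load or store";
    case 8:  return "syscall";
    case 9:  return "breakpoint";
    case 10: return "reserved instruction";
    case 12: return "arithmetic overflow";
    default: return "other";
    }
}
```

A bus error on instruction fetch typically means the PC ended up pointing outside the program's text, for example after returning through a bad $ra.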

What's the difference between Software-Generated Interrupt and Software-Generated Exception?

I am reading the Intel Manual 3A Chapter 6 Interrupt and Exception Handling.
Interrupts and exceptions each have three sources.
For Software-Generated Interrupt, it says:
The INT n instruction permits interrupts to be generated from within
software by supplying an interrupt vector number as an operand. For
example, the INT 35 instruction forces an implicit call to the
interrupt handler for interrupt 35. Any of the interrupt vectors from
0 to 255 can be used as a parameter in this instruction. If the
processor’s predefined NMI vector is used, however, the response of
the processor will not be the same as it would be from an NMI
interrupt generated in the normal manner. If vector number 2 (the NMI
vector) is used in this instruction, the NMI interrupt handler is
called, but the processor’s NMI-handling hardware is not activated.
Interrupts generated in software with the INT n instruction cannot be
masked by the IF flag in the EFLAGS register.
For Software-Generated Exceptions, it says:
The INTO, INT 3, and BOUND instructions permit exceptions to be
generated in software. These instructions allow checks for exception
conditions to be performed at points in the instruction stream. For
example, INT 3 causes a breakpoint exception to be generated. The INT
n instruction can be used to emulate exceptions in software; but there
is a limitation. If INT n provides a vector for one of the
architecturally-defined exceptions, the processor generates an
interrupt to the correct vector (to access the exception handler) but
does not push an error code on the stack. This is true even if the
associated hardware-generated exception normally produces an error
code. The exception handler will still attempt to pop an error code
from the stack while handling the exception. Because no error code was
pushed, the handler will pop off and discard the EIP instead (in place
of the missing error code). This sends the return to the wrong
location.
So, what's the difference? Seems both leverage the int n instruction. How can I tell whether it generates an exception or an interrupt in a piece of assembly code?
On the x86 architecture an exception is handled like an interrupt, nominally by an interrupt handler.
So the terms interrupt and exception overlap: exceptions are a kind of interrupt.
Interrupt numbers 0 to 31 are reserved for CPU exceptions; for example, interrupt number 0 is #DE (Divide Error) and interrupt number 13 is #GP (General Protection).
When the CPU detects a condition that should raise an exception (such as an access to a non-present page) it performs two tasks.
One is pushing an error code if needed: some exceptions (like #PF and #GP) do, some (like #DE) don't. On the stack the error code ends up on top, above the saved EIP.
Section 6.15 of the Intel manual, volume 3, lists all the exceptions together with their error code, if any.
The other is "calling" the appropriate interrupt handler, which is like a far call but with EFLAGS also pushed on the stack.
int n does only the latter: it calls an interrupt handler but doesn't push any error code, as there is no error condition in the hardware in the first place (and because int n existed before the concept of error codes).
So it can be used to emulate exceptions, but the software then has to push an appropriate error code itself.
When you see int n in the code, it is never an exception. It is an interrupt, which may be used to steer control flow into a particular OS exception handler.
Trivia: int3 (with no space) is special because it is encoded as CC which is only one byte (normal int n is CD imm8). This is useful for debugging, since the debugger can put it anywhere in the code segment.
into only generates the #OF exception if OF = 1.
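The encodings mentioned above can be told apart mechanically. The following C sketch is only an illustration (the struct and function names are made up here): it recognises the one-byte int3 (CC), the two-byte int n (CD imm8) and into (CE), and reports the vector each one targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result of decoding: length is 0 if the bytes are not a software
   interrupt instruction. */
struct soft_int {
    int vector;
    size_t length;
};

/* Hypothetical decoder for the three software interrupt opcodes. */
struct soft_int decode_soft_int(const uint8_t *code, size_t avail)
{
    struct soft_int r = { -1, 0 };
    assert(code != NULL);

    if (avail >= 1 && code[0] == 0xCC) {        /* int3: one byte, vector 3 */
        r.vector = 3;
        r.length = 1;
    } else if (avail >= 1 && code[0] == 0xCE) { /* into: vector 4, only if OF = 1 */
        r.vector = 4;
        r.length = 1;
    } else if (avail >= 2 && code[0] == 0xCD) { /* int n: opcode plus imm8 */
        r.vector = code[1];
        r.length = 2;
    }
    return r;
}
```

Note that CD 03 and CC both target vector 3, but only CC fits in a single byte, which is why debuggers use it to patch breakpoints over arbitrary instructions.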

Why does this PUSH instruction cause a UNDEFINED_INSTRUCTION exception at my ARM processor?

I am working with a Cortex-A9 and my program crashes because of an UNDEFINED_INSTRUCTION exception. The assembly line that causes this exception is according to my debugger's trace:
Trace #9999 : S:0x022D9A7C E92D4800 ARM PUSH {r11,lr}
Exception: UNDEFINED_INSTRUCTION (9)
I program in C and don't write assembly or binary and I am using gcc. Is this really the instruction that causes the exception, i.e. is the encoding of this PUSH instruction wrong and hence a compiler/assembler bug? Or is the encoding correct and something strange is going on? Scrolling back in the trace I found another PUSH instruction, that does not cause errors and looks like this:
Trace #9966 : S:0x022A65FC E52DB004 ARM PUSH {r11}
And of course there are a lot of other PUSH instruction too. But I did not find any other that tries to push specifically R11 and LR, so I can't compare.
I can't answer my own question, so I edit it:
Sorry guys, I don't know exactly what happened. I tried it several times and got the same error again and again. Then I turned the device off, went away, tried it again later, and now it works fine...
Maybe the memory was corrupted somehow due to overheating or something? I don't know. Thanks for your answers anyway.
I use gcc 4.7.2 btw.
I suspect something is corrupting the SP register. Load/store multiple instructions (of which PUSH is one alias) to unaligned addresses are undefined in the architecture, so if SP gets overwritten with something that's not a multiple of 4, a subsequent push/pop will throw an undef exception.
Now, if you're on ARM Linux, there is (usually) a kernel trap for unaligned accesses left over from the bad old days which if enabled will attempt to fix up most unaligned load/store multiple instructions (despite them being architecturally invalid). However if the address is invalid (as is likely in the case of SP being overwritten with nonsense) it will give up and leave the undef handler to do its thing.
In the (highly unlikely) case that the compiler has somehow generated bad code that is fix-uppable most of the time,
cat /proc/cpu/alignment
would show some non-zero fixup counts, but as I say, it's most likely corruption - a previous function has smashed the stack in such a way that an invalid SP is loaded on return, that then shows up at the next stack operation. Best double-check your pointer and array accesses.
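To rule out a bad encoding, the traced word can be decoded by hand. The C sketch below is a hedged illustration with hypothetical function names: it assumes the classic ARM (A32) encoding in which PUSH of two or more registers is STMDB sp!, {reglist}, with the register list in the low 16 bits. For 0xE92D4800 the list has bits 11 and 14 set, i.e. r11 and lr, which matches the disassembly, so the instruction itself looks fine and the SP value is the more likely suspect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PUSH with a register list is STMDB sp!, {reglist}:
   cond 1001 0010 1101 reglist. Mask out the condition and the list. */
bool is_push_multiple(uint32_t insn)
{
    return (insn & 0x0FFF0000u) == 0x092D0000u;
}

/* Bit n set in the result means register rn is pushed (r14 = lr). */
uint16_t push_register_list(uint32_t insn)
{
    assert(is_push_multiple(insn));
    return (uint16_t)(insn & 0xFFFFu);
}

/* The stack pointer must be a multiple of 4 for push/pop to be valid. */
bool sp_is_word_aligned(uint32_t sp)
{
    return (sp & 3u) == 0;
}
```

The other traced word, 0xE52DB004, is the single-register form (a pre-indexed STR of r11 with writeback to sp), so it is deliberately not matched by is_push_multiple.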

Is using Task Gate with x86 IDT, the only way to handle kernel mode ( ring 0) stack fault exception?

Assume that kernel always executes in Ring 0 privilege level. For the stack fault exception (due to stack overflow or limit violations), which gate should be used for x86 IDT ( Interrupt descriptor table) setup out of Trap Gate, Interrupt Gate and Task Gate?
An x86 processor needs a stack on which to push EFLAGS, CS and EIP before calling the stack fault exception handler, which means a stack switch is needed to call the exception handler.
Is using Task Gate the only way to perform Stack Switch?
Is using Task Gate the only way to write stack fault handler for kernel stack faults?
The Intel manual says:
"A new tss permits the handler to use a new privilege level 0 stack when handling the exception or interrupt. If an exception or interrupt occurs when the current privilege level 0 stack is corrupted, accessing the handler through a task gate can prevent a system crash by providing the handler with a new privilege level 0 stack".
Thanks in advance for your response.
As far as I know, yes. Imagine how it works without one:
1. You overflow your stack.
2. The GPF handler gets called, but it can't do anything because the ring 0 stack is corrupted.
3. The processor tries to call the double fault handler, but can't.
4. The processor triple faults.
If you use a task gate with a separate TSS, the following happens instead:
1. You overflow your stack.
2. The processor tries to call the GPF handler, which results in a task switch.
3. The task switch loads a new stack from the TSS.
4. Everything continues as normal.
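For reference, this is roughly what the difference between the gate types looks like in a 32-bit IDT entry. It is a sketch under assumptions: the struct and helper names are hypothetical, and the selectors you pass in depend on your own GDT layout. A task gate carries only a TSS selector (type 0x5, offset fields unused), whereas an interrupt gate carries a code selector and handler offset (type 0xE).

```c
#include <assert.h>
#include <stdint.h>

/* One 8-byte entry of a 32-bit protected-mode IDT. */
struct idt_gate32 {
    uint16_t offset_low;  /* handler address bits 15..0 (unused by task gates) */
    uint16_t selector;    /* code segment selector, or TSS selector for task gates */
    uint8_t  reserved;    /* must be zero */
    uint8_t  type_attr;   /* present bit, DPL, gate type */
    uint16_t offset_high; /* handler address bits 31..16 (unused by task gates) */
};

static_assert(sizeof(struct idt_gate32) == 8, "IDT entries are 8 bytes");

enum {
    GATE_TASK        = 0x05,
    GATE_INTERRUPT32 = 0x0E,
    GATE_TRAP32      = 0x0F,
    GATE_PRESENT     = 0x80
};

/* Hypothetical helper: ring 0 task gate referring to a dedicated TSS. */
struct idt_gate32 make_task_gate(uint16_t tss_selector)
{
    struct idt_gate32 g = { 0, tss_selector, 0, GATE_PRESENT | GATE_TASK, 0 };
    return g;
}

/* Hypothetical helper: ring 0 interrupt gate for an ordinary handler. */
struct idt_gate32 make_interrupt_gate(uint32_t handler, uint16_t code_selector)
{
    struct idt_gate32 g;
    g.offset_low  = (uint16_t)(handler & 0xFFFFu);
    g.selector    = code_selector;
    g.reserved    = 0;
    g.type_attr   = GATE_PRESENT | GATE_INTERRUPT32;
    g.offset_high = (uint16_t)(handler >> 16);
    return g;
}
```

The TSS named by the task gate supplies its own ESP and SS, which is where the fresh stack comes from.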

OS development: How to avoid an infinite loop after an exception routine

For some months I've been working on a "home-made" operating system.
Currently, it boots and goes into 32-bit protected mode.
I've loaded the interrupt table, but haven't set up paging (yet).
Now while writing my exception routines I've noticed that when an instruction throws an exception, the exception routine is executed, but then the CPU jumps back to the instruction which threw the exception! This does not apply to every exception (for example, a div by zero exception will jump back to the instruction AFTER the division instruction), but let's consider the following general protection exception:
MOV EAX, 0x8
MOV CS, EAX
My routine is simple: it calls a function that displays a red error message.
The result: MOV CS, EAX fails -> My error message is displayed -> CPU jumps back to MOV CS -> infinite loop spamming the error message.
I've talked about this issue with a teacher in operating systems and unix security.
He told me he knows Linux has a way around it, but he doesn't know which one.
The naive solution would be to parse the throwing instruction from within the routine, in order to get the length of that instruction.
That solution is pretty complex, and I feel a bit uncomfortable adding a call to a relatively heavy function in every affected exception routine...
Therefore, I was wondering if there is another way around the problem. Maybe there's a "magic" register that contains a bit that can change this behaviour?
--
Thank you very much in advance for any suggestion/information.
--
EDIT: It seems many people wonder why I want to skip over the problematic instruction and resume normal execution.
I have two reasons for this:
First of all, killing a process would be a possible solution, but not a clean one. That's not how it's done in Linux, for example, where (AFAIK) the kernel sends a signal (I think SIGSEGV) but does not immediately break execution. It makes sense, since the application can block or ignore the signal and resume its own execution. It's a very elegant way to tell the application it did something wrong IMO.
Another reason: what if the kernel itself performs an illegal operation? Could be due to a bug, but could also be due to a kernel extension. As I've stated in a comment: what should I do in that case? Shall I just kill the kernel and display a nice blue screen with a smiley?
That's why I would like to be able to jump over the instruction. "Guessing" the instruction size is obviously not an option, and parsing the instruction seems fairly complex (not that I mind implementing such a routine, but I need to be sure there is no better way).
Different exceptions have different causes. Some exceptions are normal, and the exception only tells the kernel what it needs to do before allowing the software to continue running. Examples of this include a page fault telling the kernel it needs to load data from swap space, an undefined instruction exception telling the kernel it needs to emulate an instruction that the CPU doesn't support, or a debug/breakpoint exception telling the kernel it needs to notify a debugger. For these it's normal for the kernel to fix things up and silently continue.
Some exceptions indicate abnormal conditions (e.g. that the software crashed). The only sane way of handling these types of exceptions is to stop running the software. You may save information (e.g. core dump) or display information (e.g. "blue screen of death") to help with debugging, but in the end the software stops (either the process is terminated, or the kernel goes into a "do nothing until user resets computer" state).
Ignoring abnormal conditions just makes it harder for people to figure out what went wrong. For example, imagine instructions to go to the toilet:
1. enter bathroom
2. remove pants
3. sit
4. start generating output
Now imagine that step 2 fails because you're wearing shorts (a "can't find pants" exception). Do you want to stop at that point (with a nice easy to understand error message or something), or ignore that step and attempt to figure out what went wrong later on, after all the useful diagnostic information has gone?
If I understand correctly, you want to skip the instruction that caused the exception (e.g. mov cs, eax) and continue executing the program at the next instruction.
Why would you want to do this? Normally, shouldn't the rest of the program depend on the effects of that instruction being successfully executed?
Generally speaking, there are three approaches to exception handling:
Treat the exception as an unrepairable condition and kill the process. For example, division by zero is usually handled this way.
Repair the environment and then execute the instruction again. For example, page faults are sometimes handled this way.
Emulate the instruction using software and skip over it in the instruction stream. For example, complicated arithmetic instructions are sometimes handled this way.
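The three approaches differ mainly in what the handler does with the saved instruction pointer before returning. Here is a minimal C sketch of that decision; the types and names are hypothetical, and the instruction length is assumed to come from whatever decoder did the emulation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum fault_action {
    FAULT_KILL,    /* unrepairable: terminate the process */
    FAULT_RETRY,   /* environment repaired: run the same instruction again */
    FAULT_EMULATE  /* instruction emulated in software: step over it */
};

/* Hypothetical slice of the state saved when the exception was taken. */
struct saved_context {
    uint32_t eip;   /* for a fault, points at the faulting instruction */
    int      killed;
};

/* Apply the chosen approach to the saved context before returning. */
void finish_fault(struct saved_context *ctx, enum fault_action action,
                  unsigned insn_len)
{
    assert(ctx != NULL);
    switch (action) {
    case FAULT_KILL:
        ctx->killed = 1;       /* scheduler never resumes this context */
        break;
    case FAULT_RETRY:
        break;                 /* eip unchanged, so iret re-executes it */
    case FAULT_EMULATE:
        ctx->eip += insn_len;  /* skip the instruction that was emulated */
        break;
    }
}
```

Only the emulate path needs the instruction length, which is why it is the only one that requires decoding the faulting instruction.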
What you're seeing is the characteristic of the General Protection Exception. The Intel System Programming Guide clearly states that (6.15 Exception and Interrupt Reference / Interrupt 13 - General Protection Exception (#GP)) :
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the instruction that generated the
exception.
Therefore, you need to write an exception handler that will skip over that instruction (which would be kind of weird), or just simply kill the offending process with "General Protection Exception at $SAVED_EIP" or a similar message.
I can imagine a few situations in which one would want to respond to a GPF by parsing the failed instruction, emulating its operation, and then returning to the instruction after. The normal pattern would be to set things up so that the instruction, if retried, would succeed, but one might e.g. have some code that expects to access some hardware at addresses 0x000A0000-0x000AFFFF and wish to run it on a machine that lacks such hardware. In such a situation, one might not want to ever bank in "real" memory in that space, since every single access must be trapped and dealt with separately. I'm not sure whether there's any way to handle that without having to decode whatever instruction was trying to access that memory, although I do know that some virtual-PC programs seem to manage it pretty well.
Otherwise, I would suggest that you should have for each thread a jump vector which should be used when the system encounters a GPF. Normally that vector should point to a thread-exit routine, but code which was about to do something "suspicious" with pointers could set it to an error handler that was suitable for that code (the code should unset the vector when leaving the region where the error handler would have been appropriate).
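The per-thread jump vector idea could look something like the following C sketch. Everything here is hypothetical (the struct names, the field, and the addresses used when calling it); it only shows the mechanism of saving, setting and restoring the vector, and of the GPF handler redirecting the saved EIP to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread state: where to resume after a GPF. */
struct thread_state {
    uint32_t gpf_vector; /* normally the address of a thread-exit routine */
};

/* Hypothetical slice of the frame saved by the CPU on a GPF. */
struct gpf_frame {
    uint32_t eip;
};

/* Install a new vector and return the previous one so the caller can
   restore it when leaving the region the handler was meant for. */
uint32_t set_gpf_vector(struct thread_state *t, uint32_t handler)
{
    assert(t != NULL);
    uint32_t previous = t->gpf_vector;
    t->gpf_vector = handler;
    return previous;
}

/* Called from the GPF handler: resume at the thread's vector instead of
   at the faulting instruction. */
void redirect_gpf(const struct thread_state *t, struct gpf_frame *frame)
{
    assert(t != NULL && frame != NULL);
    frame->eip = t->gpf_vector;
}
```

Returning the previous vector from set_gpf_vector lets nested regions restore whatever was installed before them.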
I can imagine situations where one might want to emulate an instruction without executing it, and cases where one might want to transfer control to an error-handler routine, but I can't imagine any where one would want to simply skip over an instruction that would have caused a GPF.