I'm trying to use the Win32 fiber API in FreePascal to implement a coroutine class. So far I can allocate a worker context (CreateFiber) and switch between contexts (SwitchToFiber).
However, I couldn't get exceptions working reliably. There should be different exception chains in different contexts. When using the Win32 SEH exception handler chain, chain switching happens automatically in SwitchToFiber. But FreePascal doesn't use the Win32 SEH chain and instead stores its own chain in a threadvar.
I'd like to go ahead and try to save/restore threadvars by hand. So far I can get the ThreadEnvironmentBlock structure:
function GetCurrentTEB: PThreadEnvironmentBlock;
asm
  { On 32-bit Windows, FS:[$18] holds the TEB's self pointer (NT_TIB.Self) }
  mov eax, fs:[$18]
end;
I believe threadvars are stored in thread-local storage, which is somewhere inside the ThreadEnvironmentBlock ;-) Now I'd like to properly save and restore that thread-local storage.
The following information is needed:
Where in the ThreadEnvironmentBlock are threadvars stored?
How can I save them to / restore them from the global heap?
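To make the direction concrete, this is roughly the shape of the save/restore I have in mind, written as a C-style sketch against the Win32 API. It assumes (without having verified FPC's internals) that all threadvars hang off a single TLS slot, and rtlTlsIndex is a made-up placeholder for however that index would be discovered:

#include <windows.h>

// Per-fiber bookkeeping: the fiber handle plus whatever the RTL keeps in its
// TLS slot for "this thread's threadvars" (an assumption, see above).
struct FiberContext {
    LPVOID fiber;           // returned by CreateFiber
    LPVOID threadvarBlock;  // saved TLS slot value for this fiber
};

static DWORD rtlTlsIndex;   // placeholder: the RTL's TLS index, obtained somehow

// Swap the threadvar block together with the CPU/stack context.
void SwitchTo(FiberContext& from, FiberContext& to)
{
    from.threadvarBlock = TlsGetValue(rtlTlsIndex);  // remember the current fiber's block
    TlsSetValue(rtlTlsIndex, to.threadvarBlock);     // install the target fiber's block
    SwitchToFiber(to.fiber);                         // switch stacks/registers
}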
Ada 83 was one of the first languages to have exceptions. (I want to say 'the first', but one thing I have learned from looking into the history of technology is that there is almost always an earlier X.)
From the implementation perspective, the most complex part of implementing exceptions is their interaction with resource cleanup (destructors in C++, try-finally in Java etc.); when an exception is thrown, resource cleanup code needs to be run on exit from every dynamically nested scope on the way out.
Does Ada 83 have any resource cleanup features that are invoked in this way by exceptions? Or can the implementation just do a straight longjmp?
The issue is not really whether exceptions clean up resources, but whether leaving a declarative scope, such as a subprogram body or a block statement, cleans up resources allocated in that scope. The reason for leaving the scope is a secondary issue. It does not much matter whether execution leaves by reaching the "end" of the scope or by propagating an exception that was raised in the scope but not handled in the scope.
Ada 83 has a very limited concept of "resources", but does try to clean up those resources. When a scope is left, the stack frame, with all the local variables of the scope, is removed. If the scope declares a local access type, and especially if the declaration has a Storage_Size clause, the whole "collection" of dynamically allocated objects for that access type may be removed (deallocated, freed) when the scope is left (although I think this is not a strict requirement and some compilers may not have implemented it). If the scope is the master ("owner") of some tasks, the tasks must terminate before the scope can be left (but the programmer is responsible for somehow informing the tasks that they should terminate, or for aborting the tasks).
But for most of what today are considered "resources", such as local heap allocations with non-local access types, open locally declared files, and so on, Ada 83 does not automatically clean up such local resource allocations when the local scope is left. The normal idiom is for such scopes to have a local exception handler that cleans up the resources and then (if needed) re-raises the exception or raises another exception.
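For comparison with the C++/Java cleanup mechanisms mentioned in the question, the shape of that Ada 83 idiom transliterated into C++ looks roughly like this (a sketch only, not Ada; the file is just a stand-in for "a resource"):

#include <cstdio>

// The scope that allocated the resource catches any exception, releases the
// resource by hand, and then re-raises for the enclosing scope: the same shape
// as an Ada 83 block with a local exception handler.
void do_work()
{
    std::FILE* f = std::fopen("data.txt", "r");  // the "resource"
    try {
        // ... work that may throw ...
    } catch (...) {
        if (f) std::fclose(f);  // manual cleanup, as the Ada handler would do
        throw;                  // re-raise
    }
    if (f) std::fclose(f);      // normal-path cleanup
}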
One is allowed to perform resource cleanup as a step before raising an exception, but Ada 83 exceptions do not automatically clean up resources. How would the exception know which resources to clean up and which not to? An exception can be raised and handled in the same block, in which case the programmer can clean up the related resources in the handler, or it can be propagated to an enclosing block or to a block in the scope that called the subprogram raising the exception. Ada 83 exceptions are not semantically connected to a set of resources needing cleanup.
An exception is typically declared outside the scope in which it is raised. If this were not so, the exception could not be handled by name outside the scope in which it is declared, and could only be handled using an others choice in the exception handler.
Suppose the following simple main function written in RISC-V assembly:
.globl main
main:
    addi s3, zero, 10    # Should this register (s3) be saved before use?
Since s3 is a "saved register", the procedure calling convention says it should be pushed to the stack before being used. However, looking at the source file, no other procedure uses this register, so saving it to the stack seems redundant.
My question is: should these registers be saved every time before every use, even if it means writing more (redundant) code just to obey the calling convention? Can these conventions sometimes be ignored to improve performance?
In the example above, should the register be saved because it is unknown whether main's caller has been using the s3 register?
Yes, main is a function that has a real caller you return to, and that caller might be using s3 for something.
Unless your main never returns, either being an infinite loop or only exiting by calling exit or a system call. If you never return, you don't need to be able to restore the caller's state, or even find your way back (via a return address).
So if it's just as convenient to call exit instead of ever returning from main, doing that allows you to avoid saving anything.
This also applies in cases where there's nothing for main to return to, of course, so returning wasn't even an option; e.g. if it's the entry point in a kernel or other freestanding code.
Also, I hope you understand that "saved every time before every usage" means once per function that uses them, not separately around each separate block. And you don't save call-clobbered registers around each function call; just let them die.
Can these conventions sometimes be ignored to improve performance?
Yes, if you keep the details invisible to any code you don't control.
If you treat small private helper functions as actually part of one big function, then they can use a "private" custom calling convention. (Even if you do actually call / return instead of just jumping to them, if you want to avoid inlining them at multiple callsites)
Sometimes this is just taking advantage of extra guarantees when you know about the function you're calling. e.g. that it doesn't actually clobber some of its input arg registers. This can be useful in recursion when you're calling yourself: foo(int *p, int a) self calls might take advantage of p still being in the same register unmodified, instead of having to keep p somewhere else for use after the call returns like it would if calling an "unknown" function where you can't assume anything the calling convention doesn't guarantee.
Or, if you have a publicly-visible wrapper in front of your actual private recursive function, you can set up some constants, or even have the recursive function treat one register as a static variable, instead of passing around pointers to some shared state in memory. (That's no longer pure recursion, just a loop that uses the asm stack to keep track of some history that happens to include a jump address.)
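To pin down the foo(int *p, int a) example above, here is the C shape being described; the register trick itself only exists at the asm level, so the comments just mark where hand-written code could use it (the function body is made up for illustration):

// Hypothetical recursive function matching the signature discussed above.
int foo(int *p, int a)
{
    if (a == 0)
        return *p;
    int r = foo(p, a - 1);  // self-call: hand-written asm that knows this callee
                            // leaves the register holding p untouched could keep
                            // using it here, instead of spilling/reloading p
    return r + *p;          // p is needed again after the call returns
}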
There is an external C++ function that is called from Tcl/Tk and does some work that takes a noticeable amount of time. The Tcl caller has to get the result of that function, so it waits until it has finished. To avoid blocking the GUI, that C++ function has some kind of event loop implemented in its body:
while (m_curSyncProc.isRunning()) {
    const clock_t tm = clock();
    // Drain all pending events without blocking.
    while (Tcl_DoOneEvent(TCL_ALL_EVENTS | TCL_DONT_WAIT) > 0) {} // <- stuck here in case of tkwait/vwait
    // If that pass took less than 5 ms, sleep about 10 ms to avoid 100% CPU usage.
    if (double(clock() - tm) / CLOCKS_PER_SEC < 0.005) {
        const struct timespec ts = {0, 10 * 1000 * 1000}; // 10 ms
        nanosleep(&ts, nullptr);
    }
}
Everything works great unless tkwait/vwait is in action in Tcl code.
For example, for dialogs, tkwait variable someVariable is used to wait until the Ok/Close/<whatever> button is pressed. I see that even the standard Tk bgerror uses the same method (it uses vwait).
The problem is that, once called, Tcl_DoOneEvent does not return while the Tcl code is waiting at a tkwait/vwait line; otherwise it works well. Is it possible to fix this in that event loop without a total redesign of the C++ code? That code is rather old and complicated, and its author is not reachable anymore.
Beware! This is a complex topic!
The Tcl_DoOneEvent() call is essentially what vwait, tkwait and update are thin wrappers around (passing different flags and setting up different callbacks). Nested calls to any of them create nested event loops; you don't really want those unless you're supremely careful. An event loop only terminates when it is not processing any active event callbacks, and if those event callbacks create inner event loops, the outer event loop will not get to do anything at all until the inner one has finished.
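To see why the nesting bites, here is a rough C-level sketch (not the actual Tcl implementation) of what a vwait-style wait boils down to; waitDone stands in for whatever flag the waited-on callback sets:

#include <tcl.h>

// Simplified sketch: loop on Tcl_DoOneEvent until some event callback flips
// the flag. If this runs from inside an event handler that the outer
// Tcl_DoOneEvent loop in the question is currently servicing, that outer loop
// cannot advance until this inner loop finishes.
static int waitDone = 0;   // set to 1 by the callback we are waiting for

void WaitUntilDone(void)
{
    while (!waitDone) {
        Tcl_DoOneEvent(TCL_ALL_EVENTS);  // blocks, servicing one event at a time
    }
}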
As you're taking control of the outer event loop (in a very inefficient way, but oh well), you really want the inner event loops not to run at all. There are three possible ways to deal with this; I suspect that the third (coroutines) will be most suitable for you, and that the first is what you're really trying to avoid, but that's definitely your call.
1. Continuation Passing
You can rewrite the inner code into continuation-passing style — a big pile of procedures that hands off from step to step through a state machine/workflow — so that it doesn't actually call vwait (and friends). The only one of the family that tends to be vaguely safe is update idletasks (which is really just Tcl_DoOneEvent(TCL_IDLE_EVENTS | TCL_DONT_WAIT)) to process Tk internally-generated alterations.
This option was your main choice up to Tcl 8.5, and it was a lot of work.
2. Threads
You can move to a multi-threaded application. This can be easy… or very difficult; the details depend on an examination of what you're doing throughout the application.
If going this route, remember that Tcl interpreters and Tcl values are totally thread-bound; they internally use thread-specific data so that they can avoid big global locks. This means that threads in Tcl are comparatively expensive to set up, but actually use multiple CPUs very efficiently afterwards; thread pooling is a very common approach.
3. Coroutines
Starting in 8.6, you can put the inner code in a coroutine. Almost everything in 8.6 is coroutine-aware (“non-recursive” in our internal lingo) by default (including commands you wouldn't normally think of, such as source) and once you've done that, you can replace the vwait calls with equivalents from the Tcllib coroutine package and things will typically “just work”. (For example, vwait var becomes coroutine::vwait var, and after 123 becomes coroutine::after 123.)
The only things that don't have direct replacements are tkwait window and tkwait visibility; you'll need to simulate those with waiting for a <Destroy> or <Visibility> event (the latter is uncommon as it is unsupported on some platforms), which you do by binding a trivial callback on those that just sets a variable that you can coroutine::vwait on (which is essentially all that tkwait does internally anyway).
Coroutines can become messy in a few cases, such as when you've got C code that is not coroutine-aware. The main places in Tcl where these come into play are in trace callbacks, inter-interpreter calls, and the scripted implementations of channels; the issue there is that the internal APIs these sit behind are rather complicated already (especially channels) and nobody's felt up to wading in and enabling a non-recursive implementation.
The ARM website says that the link register stores the return information for subroutines, function calls, and exceptions (such as interrupts), so what is the stack used for?
The answers to this similar question say that the stack is used to store the return address, and to "push" local variables that will need to be put back in the core registers after the exception.
But this is what the link register is for, so why is the stack needed? What is the difference between the two, and how are they both used?
Okay I think I understand your question.
So you have some code in a function that calls another function:
int main(void)
{
    int a, b;
    a = myfun0();
    b = a + 7;
    ...
}
So when we call myfun0(), the link register basically gets us back so we can do the b = a + 7;. Understanding, of course, that all of this gets compiled to assembly and optimized and such, this will suffice to show that the link register is used to return to just after the call.
But what if
int myfun0(void)
{
    return myfun1() + 3;
}
When main() calls myfun0(), the link register points at the code in main() it needs to return to. Then myfun0() calls myfun1(); it needs to come back into myfun0() to do some more math before returning to main(), so when it calls myfun1() the link register is set to come back and add 3. The problem is that when we set the link register to return into myfun0(), we trash the address in main() we needed to return to. To prevent that, IF a function is going to call another function, then the link register must be put on the stack, along with any local variables that can't all live in the disposable registers. So now: main() calls myfun0(); myfun0() is going to call a function (myfun1()), so a copy of the link register (the return address into main()) is saved on the stack; myfun0() calls myfun1(); myfun1() follows the same rule (if calling something else, put LR on the stack, otherwise you don't have to); myfun1() uses LR to return to myfun0(); and myfun0() restores LR from the stack so it can return to main(). Follow that simple rule per function and you can't go wrong.
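Here is a complete, compilable version of that example (the function bodies are hypothetical, just so it links). If you compile it for ARM and keep the compiler from inlining the calls, the generated code typically shows the rule in action: the leaf function returns through LR directly, while the non-leaf one saves LR in its prologue and restores it (often straight into PC) in its epilogue.

// Complete version of the sketch above; bodies are made up so it compiles.
int myfun1(void)
{
    return 4;             // leaf function: calls nobody, so LR is never clobbered
                          // and does not need to be saved on the stack
}

int myfun0(void)
{
    return myfun1() + 3;  // non-leaf: the call overwrites LR, so the prologue must
                          // save LR on the stack and the epilogue must restore it
}

int main(void)
{
    int a = myfun0();     // after this call LR-based return brought us back here
    int b = a + 7;
    return b;
}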
Now, interrupts (not sure whether this is related, or maybe I misunderstood your question). ARM has banked registers, at least for non-cortex-m cores. But in general, when an interrupt/exception occurs, if the exception handler needs to use any resources/registers that are in use by the foreground task, then that handler needs to preserve them on the stack in such a way that the interrupted foreground task has no idea this happened. Since interrupts can generally occur between any two instructions, you even need to go so far as to preserve the flags that were set by the instruction before the one you interrupted.
So, applying that to ARM: you have to look at which architecture you are using and see where it describes the interrupt process, which registers you have to preserve and which ones you don't, which stack pointer is used, etc. (something you have to set up well ahead of your first exception if you are using an ARM with separate interrupt and foreground stacks).
The cortex-m is designed to do some/all of that work for you: it has one stack, basically, and on interrupt it pushes the call-clobbered registers on the stack for you, so you can simply have a compiled C function run as a handler and the hardware cleans up after you (from a preserved-register perspective). Some other processor families do something like this for you too; they might have a return-from-interrupt instruction separate from the ordinary return instruction. The former exists because on interrupt the hardware saves the flags and return address, whereas for a simple call you don't need the flags preserved.
The ARM is a lot more flexible than some other instruction sets. In some others you may not have any instruction that lets you branch to an address in an arbitrary register; you may have a limitation. You might be limited in which register you can use as a stack pointer, or the stack pointer itself may not be accessible as a general-purpose register. By convention the SP is r13 on ARM; the assembler allows the pseudo-instructions push and pop, which translate into the proper stmdb r13!,{blah} and ldmia r13!,{blah}, but you could pick your own stack register (if not using a compiler that follows the convention, or if you can change an open-source compiler to use a different stack pointer register); the ARM doesn't prevent that. The magic of the link register (r14) is nothing more than that a branch-link (bl) or branch-link-exchange (blx) automatically modifies r14; the instruction set lets you use basically any register to branch/return for normal function calls. The ARM has just enough general-purpose registers to encourage compilers to do register-based parameter passing rather than stack-only. Some processors lean toward stack-only parameter passing and have designed their return instructions to be strictly stack based, avoiding a return register altogether along with the rule about having to save it when nesting functions.
All of these approaches have pros and cons. In the case of ARM, register-based parameter passing was desirable, and a register-based return address too, but for nesting functions you have to preserve the return address at each nesting level so you don't get lost. Likewise, for interrupts you have to put things back the way you found them, as well as be able to get back to where you interrupted the foreground task.
For some months I've been working on a "home-made" operating system.
Currently, it boots and goes into 32-bit protected mode.
I've loaded the interrupt table, but haven't set up paging (yet).
Now while writing my exception routines I've noticed that when an instruction throws an exception, the exception routine is executed, but then the CPU jumps back to the instruction which threw the exception! This does not apply to every exception (for example, a div by zero exception will jump back to the instruction AFTER the division instruction), but let's consider the following general protection exception:
MOV EAX, 0x8
MOV CS, EAX
My routine is simple: it calls a function that displays a red error message.
The result: MOV CS, EAX fails -> My error message is displayed -> CPU jumps back to MOV CS -> infinite loop spamming the error message.
I've talked about this issue with a teacher in operating systems and unix security.
He told me he knows Linux has a way around it, but he doesn't know which one.
The naive solution would be to parse the throwing instruction from within the routine, in order to get the length of that instruction.
That solution is pretty complex, and I feel a bit uncomfortable adding a call to a relatively heavy function in every affected exception routine...
Therefore, I was wondering if there is another way around the problem. Maybe there's a "magic" register that contains a bit that can change this behaviour?
--
Thank you very much in advance for any suggestion/information.
--
EDIT: It seems many people wonder why I want to skip over the problematic instruction and resume normal execution.
I have two reasons for this:
First of all, killing a process would be a possible solution, but not a clean one. That's not how it's done in Linux, for example, where (AFAIK) the kernel sends a signal (I think SIGSEGV) but does not immediately break execution. It makes sense, since the application can block or ignore the signal and resume its own execution. It's a very elegant way to tell the application it did something wrong IMO.
Another reason: what if the kernel itself performs an illegal operation? Could be due to a bug, but could also be due to a kernel extension. As I've stated in a comment: what should I do in that case? Shall I just kill the kernel and display a nice blue screen with a smiley?
That's why I would like to be able to jump over the instruction. "Guessing" the instruction size is obviously not an option, and parsing the instruction seems fairly complex (not that I mind implementing such a routine, but I need to be sure there is no better way).
Different exceptions have different causes. Some exceptions are normal, and the exception only tells the kernel what it needs to do before allowing the software to continue running. Examples of this include a page fault telling the kernel it needs to load data from swap space, an undefined instruction exception telling the kernel it needs to emulate an instruction that the CPU doesn't support, or a debug/breakpoint exception telling the kernel it needs to notify a debugger. For these it's normal for the kernel to fix things up and silently continue.
Some exceptions indicate abnormal conditions (e.g. that the software crashed). The only sane way of handling these types of exceptions is to stop running the software. You may save information (e.g. core dump) or display information (e.g. "blue screen of death") to help with debugging, but in the end the software stops (either the process is terminated, or the kernel goes into a "do nothing until user resets computer" state).
Ignoring abnormal conditions just makes it harder for people to figure out what went wrong. For example, imagine instructions to go to the toilet:
1. enter bathroom
2. remove pants
3. sit
4. start generating output
Now imagine that step 2 fails because you're wearing shorts (a "can't find pants" exception). Do you want to stop at that point (with a nice easy to understand error message or something), or ignore that step and attempt to figure out what went wrong later on, after all the useful diagnostic information has gone?
If I understand correctly, you want to skip the instruction that caused the exception (e.g. mov cs, eax) and continue executing the program at the next instruction.
Why would you want to do this? Normally, shouldn't the rest of the program depend on the effects of that instruction being successfully executed?
Generally speaking, there are three approaches to exception handling (a rough handler sketch follows the list):
1. Treat the exception as an unrepairable condition and kill the process. For example, division by zero is usually handled this way.
2. Repair the environment and then execute the instruction again. For example, page faults are sometimes handled this way.
3. Emulate the instruction in software and skip over it in the instruction stream. For example, complicated arithmetic instructions are sometimes handled this way.
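A minimal sketch of how those three outcomes might look in a protected-mode #GP handler. Everything here is hypothetical: the frame layout assumes an assembly stub that already saved the general registers and passes a pointer to the CPU-pushed part of the frame, and decode_instruction_length, try_to_repair, try_to_emulate and kill_current_process are made-up helpers, not anything from the question:

#include <stddef.h>
#include <stdint.h>

// CPU-pushed part of the #GP frame (no privilege change assumed): the error
// code sits on top of the stack, then the saved EIP, CS and EFLAGS.
struct InterruptFrame {
    uint32_t error_code;
    uint32_t eip;       // points AT the faulting instruction for #GP
    uint32_t cs;
    uint32_t eflags;
};

// Hypothetical helpers assumed to exist elsewhere in the kernel.
extern "C" size_t decode_instruction_length(uint32_t eip); // a real x86 decoder: non-trivial
extern "C" bool   try_to_repair(const InterruptFrame* frame);
extern "C" bool   try_to_emulate(const InterruptFrame* frame);
extern "C" void   kill_current_process(const char* reason);

extern "C" void gp_fault_handler(InterruptFrame* frame)
{
    if (try_to_repair(frame)) {
        return;  // approach 2: the same instruction is re-executed on return
    }
    if (try_to_emulate(frame)) {
        frame->eip += decode_instruction_length(frame->eip); // approach 3: skip it
        return;
    }
    kill_current_process("General Protection Exception");    // approach 1: give up
}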
What you're seeing is characteristic of the General Protection Exception. The Intel System Programming Guide clearly states (6.15 Exception and Interrupt Reference / Interrupt 13 - General Protection Exception (#GP)):
Saved Instruction Pointer
The saved contents of CS and EIP registers point to the instruction that generated the exception.
Therefore, you need to write an exception handler that will skip over that instruction (which would be kind of weird), or just simply kill the offending process with "General Protection Exception at $SAVED_EIP" or a similar message.
I can imagine a few situations in which one would want to respond to a GPF by parsing the failed instruction, emulating its operation, and then returning to the instruction after. The normal pattern would be to set things up so that the instruction, if retried, would succeed, but one might e.g. have some code that expects to access some hardware at addresses 0x000A0000-0x000AFFFF and wish to run it on a machine that lacks such hardware. In such a situation, one might not want to ever bank in "real" memory in that space, since every single access must be trapped and dealt with separately. I'm not sure whether there's any way to handle that without having to decode whatever instruction was trying to access that memory, although I do know that some virtual-PC programs seem to manage it pretty well.
Otherwise, I would suggest that you have, for each thread, a jump vector which should be used when the system encounters a GPF. Normally that vector should point to a thread-exit routine, but code which is about to do something "suspicious" with pointers could set it to an error handler suitable for that code (and the code should unset the vector when leaving the region where that error handler would have been appropriate).
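A tiny sketch of that per-thread jump vector idea (all names here are made up for illustration):

// Per-thread recovery vector for GPFs; names are hypothetical.
struct Thread {
    void (*gpf_vector)(void);  // where to send control when this thread takes a GPF
    // ... other per-thread state ...
};

extern Thread* current_thread();        // assumed kernel helper
extern void    thread_exit_on_fault();  // default action: terminate the thread

// Called from the GPF handler once the fault is attributed to the current thread.
void gpf_dispatch(void)
{
    void (*handler)(void) = current_thread()->gpf_vector;
    if (handler)
        handler();               // code that expected trouble gets to recover
    else
        thread_exit_on_fault();  // nobody volunteered: end the thread
}

// Code about to do something "suspicious" with pointers would bracket it with:
//     current_thread()->gpf_vector = my_recovery_handler;
//     ... risky accesses ...
//     current_thread()->gpf_vector = nullptr;  // unset when leaving the region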
I can imagine situations where one might want to emulate an instruction without executing it, and cases where one might want to transfer control to an error-handler routine, but I can't imagine any where one would want to simply skip over an instruction that would have caused a GPF.