Writing Signal handlers for Shared libraries or DLL? - signal-handling

I have a Application A(by some company X). This application allows me to extend the functionality by allowing me to write my own functions.
I tell the Application A to call my user functions in the Applications A's configuration file (this is how it knows that Appl A must call user written Functions). The appl A uses Function pointers which I must register with Application A prior to calling my user written functions.
If there is a bug or fault in my user written functions in production, the Appl A will stop functioning. For example, if I have a segmentation fault in my User written functions.
So Application A will load my user written function from a shared DLL file. This means that my user written functions will be running in Application A' Process address space.
I wish to handle certain signals like Segmentation fault, divide by zero and stack overflow, but applications A has its own signal handlers written for this,
How can I write my own signal handlers to catch the exceptions in my user written functions, so that I can clean up gracefully w/o affecting much of Application A? Since my user functions will be called in Applications A's process, the OS will call signal handlers written in Application A and not my user functions.
How can I change this? I want OS to call signal handlers written in my functions but only for signal raised by my functions, which is asynchronous in nature.
Note: I do not have the source code of Application A and I cannot make any changes to it, because it's controlled by a different company.
I will be using C , and only C on a Linux, solaris and probably windows platforms.

You do not specify which platform you're working with, so I'll answer for Linux, and it should be valid for Windows as well.
When you set your signal handlers, the system call that you use returns the previous handler. It does it so that you can return it once you are no longer interested in handling that signal.
Linux man page for signal
MSDN entry on signal
Since you are a shared library loaded into the application you should have no problems manipulating the signals handlers. Just make sure to override the minimum you need in order to reduce the chances of disrupting the application itself (some applications use signals for async notifications).

The cleanest way to do this would be run your application code in a separate process that communicates with the embedded shared DLL code via some IPC mechanism. You could handle whatever signals you wanted in your process without affecting the other process. Typically the conditions you mention (seg fault, divide by zero, stack overflow) indicate bugs in the program and will result in termination. There isn't much you can do to "handle" these beyond fixing the root cause of the bug.

in C++, you can catch these by putting your code in a try-catch:
try
{
// here goes your code
}
catch ( ... )
{
// handle segfaults
}

Related

Are there any CPU-state bits indicating being in an exception/interrupt handler in x86 and x86-64?

Are there any CPU-state bits indicating being in an exception/interrupt handler in x86 and x86-64? In other words, can we tell whether the main thread or exception handler is currently executed based only on the CPU registers' state?
Not, there's no bit in the CPU itself (e.g. a control register) that means "we're in an exception or interrupt handler".
But there is hidden state indicating that you're in an NMI (Non-Maskable Interrupt) handler. Since you can't block them by disabling interrupts, and unblockable arbitrary nesting of NMIs would be inconvenient, another NMI won't get delivered until you run an iret. Even if an exception (like #DE div by 0) happens during an NMI handler, and that exception handler itself returns with iret even if you're not done handling the NMI. See The x86 NMI iret problem on LWN.
For normal interrupts, you can disable interrupts (cli) if you don't want another interrupt to be delivered while this one is being handled.
However, the interrupt controller (logically outside the CPU core, but actually part of modern CPUs) may need to be told when you're done handling an external interrupt. (Not a software-interrupt or exception). https://wiki.osdev.org/IDT_problems#I_can_only_receive_one_IRQ shows the outb instructions needed to keep the legacy PIC happy. (I don't know if this applies to more modern ways of doing interrupts, like MSI-X message-signalled interrupts.
That part of the OSdev wiki page might be specific to toy OSes that let the BIOS emulate legacy IBM-PC stuff.) But either way, that's only for external interrupts like PS/2 keyboard controller, hard drive DMA complete, or whatever (not exceptions), so it's unrelated to your Are Linux system calls executed inside an exception handler? question.
The lack of exception-state means there's no special instruction you have to run to "acknowledge" an exception before calling schedule() from what was an interrupt handler. All you have to do is make sure interrupts are enabled or not when they should or shouldn't be. (sti / cli, or pushf / popf to save/restore the old interrupt state.) And of course that your software data structures remain consistent and appropriate for what you're doing. But there isn't anything you have to do specifically to keep the CPU happy.
It's not like with user-space where a signal handler should tell the OS it's done instead of just jumping somewhere and running indefinitely. (In Linux, a signal handler can modify the main-thread program-counter so sigreturn(2) resumes execution somewhere other than where you were when it was delivered.) If POSIX or Linux signals were the (mental) model you were wondering about for interrupts/exceptions, no, it's not like that.
There is an interrupt-priority mechanism (CR8 in x86-64, or the LAPIC TPR (Task Priority Register)), but it does not automatically get set when the CPU delivers an interrupt. You can set it once (e.g. if you have a lot of high-priority interrupts to process on this core) and it persists across interrupts. (How is CR8 register used to prioritize interrupts in an x86-64 CPU?).
It's just a filter on what interrupt-numbers can get delivered to this core when interrupts are enabled (sti, IF=1 bit in RFLAGS). Apparently Windows makes some use of it, or did back in 2007, but Linux doesn't (or didn't).
It's not like you have to tell the CPU / LAPIC that you're done with this interrupt so it's ok for it to deliver another interrupt of this or lower priority.

External FLASH slow verification with STM32cubeProgrammer

I'm working with an STM32F469 chip with a Micron MT25Q Quad_SPI Flash. To program the Flash, there needs to be an external loader program developed. That's all working, but the problem is that verification of the QSPI Flash is extremely slow.
Looking in the log file, it shows that the Flash is being programmed in 150K byte blocks. However, the verification is being done in 1K byte blocks. In addition, the chip is re-initialized before each block check. I've tried this with both through STM32cubeIDE and in STM32cubeProgrammer directly.
The external programmer program include the correct chip configuration information and specifies a 64K page size. I don't see how to get the programmer to use a larger block size. It looks like it understands what part of the SRAM is used and is using the balance of the 256K in the on-board SRAM for programming the QSPI Flash. It could use the same size for reading the data back or use the Verify() function in the external loader. It's calling Read() and then checking the data itself.
Any thought or hints?
Let me add some observations on creating a new external loader. The first observation is "Don't." If you can pick a supported external chip and pin it out to use an existing loader, then do that. STM provides just 4 example programs but they must have 50 external loaders. If the hardware design copies the schematic for a demo board that has an external loader, you should be fine and avoid doing the development work.
The external loader is not a complete executable. It provides a set of functions to do basic operations like Init(), Erase(), Read() and Write(). The trick is that there is no main() and no start-up code is run when the program starts.
The external loader is an ELF file, renamed to "*.stldr". The programming tool looks into the debug information to find the location of the functions. It then sets the registers to provide the parameters, the PC to run the function, and then let's it run. There's some super-clever work going on to make this work. The programmer looks at the returned value (R0) to see if things pass or not. It can also figure out if the function has crashed the core or otherwise timed-out.
What makes writing the external super fun is that the debugger is running the program so there's no debugger available to see what the code is doing. I settled on outputting errors, and encoded information, on the return() from the called functions to give hints as to what was happening.
The external loader isn't a "full" program. Without the startup code, lots of on-chip stuff isn't set up and some just isn't going to work. At least I couldn't figure it out. I'm not sure if it wasn't configured right or the debugger was blocking its use. Looking at the example external loaders, they are written in a very simple way and do not call the HAL or use interrupts. You'll need to provide core set-up functions to configure the clock chains. That Hal_Delay() method will never return as the timers and/or interrupts aren't working. I could never make them work and suspect the NVIC was somehow being disabled. I ended up replacing the HAL_delay() function with a for loop that spun based on core clock rate and a the instruction cycles per loop.
The app note suggests developing a stand-alone program to debug the basic capabilities. That's a good idea but a challenge. Prior to starting the external loader, I had the QSPI doing the needed operations but from a C++ application calling the HAL. Creating an external loader from that was a long exercise in stripping out and replacing functionality. A hint is that the examples are written at a register level. I'm not that good to deal directly with the QuadSPI peripheral and the chip's instruction set at the same time.
The normal start-up of a program is eliminated. Everything that's done before the main() is called (E.g., in startup_stm32f469nihx.s) is up to you. This includes setting the clock chains to boost the core clock and get the peripheral buses working. The program runs in the on-chip SRAM so any initialized variables are loaded correctly. There's no moving data needed but the stack and uninitialized data areas could/should still be zeroed.
I hope this helps someone!
Today I've faced the same issue.
I was able to improve the Verify speed with two simple steps, but Verify still much slower than programming and this is strange...
If anyone find a way to change the 1KB block read of STM32CubeProgrammer I would like to know =).
Follow the changes I made to improve a bit the performance.
Add a kind of lock in the Init Function to avoid multiple initializations. This was the most significant change because I'am checking the Flash ID in my initialization proccess. Other aproachs could be safer but this simple code snippet worked to me.
int Init(void)
{
static uint32_t lock;
if(lock != 0x43213CA5)
{
lock = 0x43213CA5;
/* Init procedure goes here */
}
return(1);
}
Cache a page instead of reading the external memory for each call. This will help more if your external memory page read has too much overhead, otherwise this idea won't give relevant results.

Where is hardware exception handling entry / exit code stored

I know this question seems very generic as it can depend on the platform,
but I understand with procedure / function calls, the assembler code to push return address on the stack and local variables etc. can be part of either the caller function or callee function.
When a hardware exception or interrupt occurs tho, the Program Counter will get the address of the exception handler via the exception table, but where is the actual code to store the state, return address etc. Or is this automatically done at the hardware level for interrupts and exceptions?
Thanks in advance
since you are asking about arm and you tagged microcontroller you might be talking about the arm7tdmi but are probably talking about one of the cortex-ms. these work differently than the full sized arm architecture. as documented in the architectural reference manual that is associated with these cores (the armv6-m or armv7-m depending on the core) it documents that the hardware conforms to the ABI, plus stuff for an interrupt. So the return address the psr and registers 0 through 4 plus some others are all put on the stack, which is unusual for an architecture to do. R14 instead of getting the return address gets an invalid address of a specific pattern which is all part of the architecture, unlike other processor ip, addresses spaces on the cortex-ms are encouraged or dictated by arm, that is why you see ram starts at 0x20000000 usually on these and flash is less than that, there are some exceptions where they place ram in the "executable" range pretending to be harvard when really modified harvard. This helps with the 0xFFFxxxxx link register return address, depending on the manual they either yada yada over the return address or they go into detail as to what the patterns you find mean.
likewise the address in the vector table is spelled out something like the first 16 are system/arm exceptions then interrupts follow after that where it can be up to 128 or 256 possible interrupts, but you have to look at the chip vendor (not arm) documentation for that to see how many they exposed and what is tied to what. if you are not using those interrupts you dont have to leave a huge hole in your flash for vectors, just use that flash for your program (so long as you insure you are never going to fire that exception or interrupt).
For function calls, which occur at well defined (synchronous) locations in the program, the compiler generates executable instructions to manage the return address, registers and local variables. These instructions are integrated with your function code. The details are hardware and compiler specific.
For a hardware exception or interrupt, which can occur at any location (asynchronous) in the program, managing the return address and registers is all done in hardware. The details are hardware specific.
Think about how a hardware exception/interrupt can occur at any point during the execution of a program. And then consider that if a hardware exception/interrupt required special instructions integrated into the executable code then those special instructions would have to be repeated everywhere throughout the program. That doesn't make sense. Hardware exception/interrupt management is handled in hardware.
The "code" isn't software at all; by definition the CPU has to do it itself internally because interrupts happen asynchronously. (Or for synchronous exceptions caused by instructions being executed, then the internal handling of that instruction is what effectively triggers it).
So it's microcode or hardwired logic inside the CPU that generates the stores of a return address on an exception, and does any other stuff that the architecture defines as happening as part of taking an exception / interrupt.
You might as well as where the code is that pushes a return address when the call instruction executes, on x86 for example where the call instruction pushes return info onto the stack instead of overwriting a link register (the way most RISCs do).

Memory access exception handling with MinGW on XP

I am trying to use the MinGW GCC toolchain on XP with some vendor code from an embedded project that accesses high memory (>0xFFFF0000) which is, I believe, beyond the virtual mem address space allowed in 'civilian' processes in XP.
I want to handle the memory access exceptions myself in some way that will permit execution to continue at the instruction following the exception, ie ignore it. Is there some way to do it with MinGW? Or with MS toolchain?
The vastly simplified picture is thus:
/////////////
// MyFile.c
MyFunc(){
VendorFunc_A();
}
/////////////////
// VendorFile.c
VendorFunc_A(){
VendorFunc_DoSomeDesirableSideEffect();
VendorFunc_B();
VendorFunc_DoSomeMoreGoodStuff();
}
VendorFunc_B(){
int *pHW_Reg = 0xFFFF0000;
*pHW_Reg = 1; // Mem Access EXCEPTION HERE
return(0); // I want to continue here
}
More detail:
I am developing an embedded project on an Atmel AVR32 platform with freeRTOS using the AVR32-gcc toolchain. It is desirable to develop/debug high level application code independent of the hardware (and the slow avr32 simulator). Various gcc, makefile and macro tricks permit me to build my Avr32/freeRTOS project in the MinGW/Win32 freeRTOS port enviroment and I can debug in eclipse/gdb. But the high-mem HW access in the (vendor supplied) Avr32 code crashes the MinGW exe (due to the mem access exception).
I am contemplating some combination of these approaches:
1) Manage the access exceptions in SW. Ideally I'd be creating a kind of HW simulator but that'd be difficult and involve some gnarly assembly code, I think. Alot of the exceptions can likely just be ignored.
2) Creating a modified copy of the Avr32 header files so as to relocate the HW register #defines into user process address space (and create some structs and linker sections that commmit those areas of virtual memory space)
3) Conditional compilation of function calls that result in highMem/HW access, or alernatively more macro tricks, so as to minimize code cruft in the 'real' HW target code. (There are other developers on this project.)
Any suggestions or helpful links would be appreciated.
This page is on the right track, but seems overly complicated, and is C++ which I'd like to avoid. But I may try it yet, absent other suggestions.
http://www.programmingunlimited.net/siteexec/content.cgi?page=mingw-seh
You need to figure out why the vendor code wants to write 1 to address 0xFFFF0000 in the first place, and then write a custom VendorFunc_B() function that emulates this behavior. It is likely that 0xFFFF0000 is a hardware register that will do something special when written to (eg. change baud rate on a serial port or power up the laser or ...). When you know what will happen when you write to this register on the target hardware, you can rewrite the vendor code to do something appropriate in the windows code (eg. write the string "Starting laser" to a log file). It is safe to assume that writing 1 to address 0xFFFF0000 on Windows XP will not be the right thing to do, and the Windows XP memory protection system detects this and terminates your program.
I had a similar issue recently, and this is the solution i settled on:
Trap memory accesses inside a standard executable built with MinGW
First of all, you need to find a way to remap those address ranges (maybe some undef/define combos) to some usable memory. If you can't do this, maybe you can hook through a seg-fault and handle the write yourself.
I also use this to "simulate" some specific HW behavior inside a single executable, for some already written code. However, in my case, i found a way to redefine early all the register access macros.

How to determine why a task destroys , VxWorks?

I have a VxWorks application running on ARM uC.
First let me summarize the application;
Application consists of a 3rd party stack and a gateway application.
We have implemented an operating system abstraction layer to support OS in-dependency.
The underlying stack has its own memory management&control facility which holds memory blocks in a doubly linked list.
For instance ; we don't directly perform malloc/new , free/delege .Instead we call OSA layer's routines and it gets the memory from OS and puts it in a list then returns this memory to application.(routines : XXAlloc , XXFree,XXReAlloc)
And when freeing the memory we again use XXFree.
In fact this block is a struct which has
-magic numbers indication the beginning and end of memory block
-size that user requested allocated
-size in reality due to alignment issue previous and next pointers
-pointer to piece of memory given back to application. link register that shows where in the application xxAlloc is called.
With this block structure stack can check if a block is corrupted or not.
Also we have pthread library which is ported from Linux that we use to
-create/terminate threads(currently there are 22 threads)
-synchronization objects(events,mutexes..)
There is main task called by taskSpawn and later this task created other threads.
this was a description of application and its VxWorks interface.
The problem is :
one of tasks suddenly gets destroyed by VxWorks giving no information about what's wrong.
I also have a jtag debugger and it hits the VxWorks taskDestoy() routine but call stack doesn't give any information neither PC or r14.
I'm suspicious of specific routine in code where huge xxAlloc is done but problem occurs
very sporadic giving no clue that I can map it to source code.
I think OS detects and exception and performs its handling silently.
any help would be great
regards
It resolved.
I did an isolated test. Allocated 20MB with malloc and memset with 0x55 and stopped thread of my application.
And I wrote another thread which checks my 20MB if any data else than 0x55 is written.
And quess what!! some other thread which belongs other components in CPU (someone else developed them) write my allocated space.
Thanks 4 your help
If your task exits, taskDestroy() is called. If you are suspicious of huge xxAlloc, verify that the allocation code is not calling exit() when memory is exhausted. I've been bitten by this behavior in a third party OSAL before.
Sounds like you are debugging after integration; this can be a hell of a job.
I suggest breaking the problem into smaller pieces.
Process
1) you can get more insight by instrumenting the code and/or using VxWorks intrumentation (depending on which version). This allows you to get more visibility in what happens. Be sure to log everything to a file, so you move back in time from the point where the task ends. Instrumentation is a worthwile investment as it will be handy in more occasions. Interesting hooks in VxWorks: Taskhooklib
2) memory allocation/deallocation is very fundamental functionality. It would be my first candidate for thorough (unit) testing in a well-defined multi-thread environment. If you have done this and no errors are found, I'd first start to look why the tas has ended.
other possible causes
A task will also end when the work is done.. so it may be a return caused by a not-so-endless loop. Especially if it is always the same task, this would be my guess.
And some versions of VxWorks have MMU support which must be considered.