Ada Exceptions in Safety Critical Embedded Systems [closed]

Ada Exceptions in Safety Critical Embedded Systems [closed] - exception

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
Improve this question
I started learning Ada for its potential use in an embedded device which is safety critical. So far, I'm really liking it. However, in my research on embedded programming, I came across the hot topic of whether to use exception handling in embedded systems. I think I understand why some people seem to avoid it:
depending on its implementation it can introduce either run-time overhead or larger code size (mentioned here under "Implementation")
the time it takes to execute exceptions can be non-deterministic (one of several sources I saw)
Now my question is, Does the Ada language or the GNAT compiler address these concerns? My understanding of safety critical code is that non-deterministic code size and execution time is often not acceptable.
Due Diligence: I am having a bit of trouble finding out exactly how deterministic Ada exceptions can be, but my understanding is their original implementation called for more run-time overhead in exchange for reduced code size impact (above first link mentions Ada explicitly). Beyond the above first link, I have looked into profiles mentioning determinism of code, like the Ravenscar profile and this paper, but nothing seems to mention exception handling determinism. To be fair, I may be looking in the wrong places, as this topic seems quite deep.

There are embedded systems that are safety- or mission-critical, embedded systems that are hard real time, and embedded systems that are both.
Embedded systems that are hard real time may be constrained or not. Colleagues worked on a missile guidance system in the 70s that had about 4 instructions worth of headroom in its main loop! (as you can imagine, it was written in assembler and used a tuned executive, not an RTOS. Exceptions weren't supported). On the other hand, the last one I worked on, on a 1 GHz PowerPC board, had a 2 millisecond deadline for the response to a particular interrupt, and our measured worst case was 1.3 milliseconds (and it was a soft real time requirement anyway, you just didn't have to miss too many in a row).
That system also had safety requirements (I know, I know, safe missile systems, huh) and although we were permitted to use exceptions, an unhandled exception meant that the system had to be shut down, missile in flight or no, resulting in loss of missile. And we were strictly forbidden to say when others => null; to swallow an exception, so any exception we didn't handle would be 'unhandled' and would bounce up to the top level.
The argument is, if an unhandled exception happens, you can no longer know the state of the system, so you can't justify continuing. Of course, the wider safety engineering has to consider what action the overall system should take (for example, perhaps this processor should restart in a recovery mode).
Sometimes people use exceptions as part of their control flow; indeed, for handling random text inputs a commonly used method is, rather than checking for end of file, just carry on until you get an End_Error;
loop
begin
-- read input
-- process input
exception
when End_Error => exit;
end;
end loop;
Jacob's answer discusses using SPARK. You don't have to use SPARK to not handle exceptions, though of course it would be nice to be able to prove to yourself (and your safety auditor!) that there won't be any. Handling exceptions is very tricky, and some RTSs (e.g Cortex GNAT RTS) don't; the configuration pragma
pragma Restrictions (No_Exception_Propagation);
means that exceptions can't be propagated out of the scope where they're raised (the program will crash out with a call to a Last_Chance_Handler).
Propagating exceptions only withon the scope where they're raised isn't, IMO, that useful:
begin
-- do something
if some error condition then
raise Err;
end if;
-- do more
exception
when Err =>
null;
end;
would be a rather confusing way of avoiding the "do more" code. Better to use a label!

Exceptions are deterministic in Ada. (But some checks which can raise an exception have some freedom. If the compiler can provide a correct answer, it doesn't always have to raise an exception, if an intermediate result is out of bounds for the type in question.)
At least one Ada compiler (GNAT) has a "zero cost" exception implementation. This doesn't make exceptions completely free, but you don't pay a run-time cost until you actually raise an exception. You still pay a cost in terms of code space. How large that cost is depends on the architecture.
I haven't worked on safety critical systems myself, but I know for sure that the run-time used for the software in the Ariane 4 inertial navigation system included exceptions.
If you don't want exceptions, one option is to use SPARK (a language derived from Ada). You can still use any Ada compiler you like, but you use the SPARK tools to prove that the program can't raise any exceptions. You should note that SPARK isn't magic. You have to help the tools, by inserting assertions, which the tools can use as intermediate steps for the proofs.

Related

Precise exception

I was going through the book The Design and Implementation of the FreeBSD operating system and I came across this:
This ability to restart an instruction is called a precise exception.
The CPU may implement restarting by saving enough state when an
instruction begins that the state can be restored when a fault is
discovered. Alternatively, instructions could delay any modifications
or side effects until after any faults would be discovered so that the
instruction execution does not need to back up before restarting.
I could not understand what does
modification or side effects
refer to in the passage. Can anyone elaborate?

That description from the FreeBSD book is very OS centric. I even disagree with its definition, This ability to restart an instruction is called a precise exception. You don't restart an instruction after a power failure exception. So rather than try to figure out what McKusick meant, I'm gonna suggest going elsewhere for a better description of exceptions.
BTW, I prefer Smith's definition:
An interrupt is precise if the saved process state corresponds with
the sequential model of program execution where one instruction completes before the next begins.
https://lagunita.stanford.edu/c4x/Engineering/CS316/asset/smith.precise_exceptions.pdf
Almost all modern processors support precise exceptions. So the hard work is already done. What an OS must do is register a trap handler for the exceptions that the hardware takes. For example, there will be a page fault handler, floating point, .... To figure out what is necessary for these handlers you'd have to read the processor theory of operations.
Despite what seems like gritty systems detail, that is fairly high level. It doesn't say anything about what the hardware is doing and so the FreeBSD description is shorthanding a lot.
To really understand precise exceptions, you need to read about that in the context of pipelining, out of order, superscalar, ... in a computer architecture book. I'd recommend Computer Architecture A Quantitative Approach 6th ed. There's a section Dealing With Exceptions p C-38 that presents a taxonomy of the different types of exceptions. The FreeBSD description is only describing some exceptions. It then gets into how each exception type is handled by the pipeline.
Also, The Linux Programming Interface has 3 long chapters on the POSIX signal interface. I know it's not FreeBSD but it covers what an application will see when, for example, a floating point exception is taken and a SIGFPE signal is sent to the process.

Applications of explicitly raising exceptions

What are the applications and advantages of explicitly raising exceptions in a program. For example, if we consider Ada language specifically here provides an interface to raise exceptions in the program. Example:
raise <Exception>;
But what are the advantages and application areas where we would need to raise exceptions explicitly?
For example, in a procedure which accepts one of the parameters as string:
function Fixed_Str_To_Chr_Ptr (Source_String : String) return C.Strings.Chars_Ptr is
...
begin
...
-- Check whether source string is of acceptable length
if Source_String'Length <= 100 then
...
else
...
raise Constraint_Error;
end if;
return Ptr;
exception
when Constraint_Error=>
.. Do Something..
end Fixed_Str_To_Chr_Ptr;
Is there any advantage or good practice if I raise an exception in the above function and handle it when the passed string length bound exceeds the tolerable limits? Or a simple If-else handler logic should do the business?

I'll make my 2 cents an answer in order to bundle the various aspects. Let's start with the general question
But what are the advantages and application areas where we would need to raise exceptions explicitly?
There are a few typical reasons for raising exceptions. Most of them are not Ada-specific.
First of all there may be a general design decision to use or not use exceptions. Some general criteria:
Exception handlers may incur a run time cost even if an exception is actually never thrown (cf. e.g. https://gcc.gnu.org/onlinedocs/gnat_ugn/Exception-Handling-Control.html). That may be unacceptable.
Issues of inter-operability with other languages may preclude the use of exceptions, or at least require that none leave the part programmed in Ada.
To a degree the decision is also a matter of taste. A programmer coming from a language without exceptions may feel more confident with a design which just relies on checking return values.
Some programs will benefit from exceptions more than others. If traditional error handling obscures the actual program structure it may be time for exceptions. If, on the other hand, potential errors are few, easily detected and can be handled locally, exceptions may obscure potential execution paths more than handling errors traditionally would.
Once the general decision to use exceptions has been made the problem arises when and when not it is appropriate to raise them in your code. I mentioned one general criteria in my comment. What comes to mind:
Exceptions should not be part of normal, expected program flow (they are called exceptions, not expectations ;-) ). This is partly because the control flow is harder to see and partly because of the potential run time cost.
Errors which can be handled locally don't need exceptions. (It can still be useful to raise one in order to have a uniform error handling though. I'll discuss that below when I get to your code snippet.)
On the other hand, exceptions are great if a function has no idea how to handle errors. This is particularly true for utility and library functions which can be called from a variety of contexts (GUI, console program, embedded, server ...). Exceptions allow the error to propagate up the call chain until somebody can handle it, without any error handling code in the intervening layers.
Some people say that a library should only expose custom exceptions, at least for any anticipated errors. E.g. when an I/O exception occurs, wrap it in a custom exception and explicitly raise that custom exception instead.
Now to your specific code question:
Is there any advantage or good practice if I raise an exception in the above function and handle it when the passed string length bound exceeds the tolerable limits? Or a simple If-else handler logic should do the business?
I don't particularly like that (although I don't find it terrible) because my general argument above ("if you can handle it locally, don't raise") would indicate that a simple if/else is clearer.1 For example, if the function is long the exception handler will be far away from the error location, so one may wonder where exactly the exception could occur (and finding one raise location is no guarantee that one has found them all, so the reviewer must scrutinize the whole function!).
It depends a bit on the specific circumstances though. Raising an exception may be elegant if an error can happen in several places. For example, if several strings can be too short it may be nice to have a centralized error handling through the exception handler, instead of scattering if/then/elses (nested??) across the function body. The situation is so common that a legitimate case can be made for using goto constructs in languages without exceptions. An exception is then clearly superior.
1But in all reality, how do you handle that error there? Do you have a guaranteed logging facility? What do you return? Does the caller know the result can be invalid? Maybe you should throw and not catch.

There are two problems with the given example:
It's simple enough that control flow doesn't need the exception. That won't always be the case, however, and I'll come back to that in a moment.
Constraint_Error is a spectacularly bad exception to raise, to detect a string length error. The standard exceptions Program_Error, Constraint_Error, Storage_Error ought to be reserved for programming error conditions, and in most circumstances ought to bring down the executable before it can do any damage, with enough debugging information (a stack traceback at the very least) to let you find the mistake and guarantee it never happens again.
It's remarkably satisfying to get a Constraint_Error pointing spookily close to your mistake, instead of whatever undefined behaviour happens much later on... (It's useful to learn how to turn on stack tracebacks, which aren't generally on by default).
Instead, you probably want to define your own String_Size_Error exception, raise that and handle it. Then, anything else in your unshown code that raises Constraint_Error will be properly debugged instead of silently generating a faulty Chars_Ptr.
For a valid use case for raising exceptions, consider a circuit simulator such as SPICE (or a CFD simulator for gas flow, etc). These tools, even when working properly, are prone to failures thanks to numerical problems that happen in matrix computations. (Two terms cancel, producing zero +/- rounding error, which causes infeasibly large numbers or divide-by-zero later on). It's often an iterative approximation, where the error should reduce in each step until it's an acceptably low value. But if a failure occurs, the error term will start growing...
Typically the simulation happens step by step, where each step is a sufficiently small time step, maybe 1 us or 1 ns. The main loop requests a step, and this request is passed to thousands of agents in the simulation representing components in a circuit, or triangles in a CFD mesh.
Any one of those agents may fail to compute a solution, and the cleanest way to handle a failure is to raise an exception, maybe Convergence_Error. There may be thousands of possible points where an exception can be raised.
Testing thousands of return codes would get ugly fast. But with exceptions, the main loop only needs one handler, which takes some corrective action such as reducing the simulation step size and running the step again.
Sanitizing user text input in a browser may be another good use case, closer to the example code.
One word on the runtime cost of exceptions : the Gnat compiler and its RTS supports a "Zero Cost Exception" (ZCX) model - at least for some targets. There's a larger penalty when an exception is raised, as a tradeoff against eliminating the penalty in the normal case. If the penalty matters to you, refer to the documentation to see if it's worthwhile 9or even possible) in your case.

You raise an exception explicitly to control which exception is reported to the user of a subprogram. - Or in some cases just to control the message associated with the raised exception.
In very special cases you may also raise an exception as a program flow control.

Exceptions should stay true to their name, which is to represent exceptional situations.

No LR and SPSR for EL0 in Aarch64

In AArch64, There are 4 exception levels viz EL0-3. ARM site mentions there are 4 Stack pointers (SP_EL0/1/2/3) but only 3 exception Link registers (ELR_EL1/2/3) and only 3 saved program status register(SPSR_EL1/2/3).
Why the ELR_EL0 and SPSR_EL0 are not required?
P.S. Sorry if this is a silly question. I am new to ARM architecture.

By design exceptions cannnot target EL0, so if it can't ever take an exception then it has no use for the machinery to be able to return from one.
To expand on the reasoning a bit (glossing over the optional and more special-purpose higher exception levels), the basic design is that EL1 is where privileged system code runs, and EL0 is where unprivileged user code runs. Thus EL0 is by necessity far more restricted in what it can do, and wouldn't be very useful for handling architectural exceptions, i.e. low-level things requiring detailed knowledge of the system. Only privileged software (typically the OS kernel) should have access to the full hardware and software state necessary to decide whether handling that basic hardware exception means e.g. going and quietly paging something in from swap, versus delivering a "software exception"-type signal to the offending task to tell it off for doing something bad.

Who invented the throw/try/catch[/finally] kind of error handling?

My questions are more of historical nature than practical:
Who invented it?
Which language used it first (and to what extent)?
What was the original idea, the underlying concept (which actual problems had to be solved these days, papers welcome) ?
Is LISPs condition system the ancestor of current exception handling?

Today's Common Lisp condition system is a relative newcomer. The design was based on previous systems, but wasn't included as part of the Common Lisp language until the late 80's around the time of CLTL2
I believe the conditions chapter in that book has a fair amount commentary on the history and background of the design, and references to related research and prior implementations of similar systems.

The VAX CPUs had a stack-based exception handling system. In every call frame, one 32-bit cell was allocated and filled with a zero. If the subroutine being called wanted to handle exceptions, all it had to do was fill in that cell with the address of the exception-handling routine.
When an exception took place, a stack search would occur. This was easy, since the stack frames were all chained together. The first stack frame with a non-zero entry would cause an stack unwind to that point, and the exception handler would be called.
I remember this as being one of the features of the processor which were aimed at higher-level languages, but I don't know that there was a higher-level language that took advantage of the feature. I believe it was used by library code, which would likely have been written in assembler.

Doesn't it go back to the setjmp, longjmp functions in C? Richie, Kernighan, et al?

Real time system exception handling

As always after some research I was unable to find anything of real value. My question is how does one go about handling exceptions in a real time system? As program failure generally is not the best case i.e. nuclear reactor/ heart monitor.
Ok since everyone got lost on the second piece of this, which had NOTHING to do with the main question. I had it in there to show how I normally escape code blocks.

Exception handling in real-time/embedded systems has several layers. Not just the language supported options, but also MMU, CPU exceptions and one of my favorites: watchdogs.
Language exceptions (C/C++)
- not often used, beause it is hard to prove that all exceptions are handled at the right level. Also it is pretty hard to determine what threat/process should be responsible. Instead, programming by contract is preferred.
Programming style:
- i.e. programming by contract. Additional constraints : Misra/C Misra/C++. This can be checked to unsure that all possible cases are somehow handled. (i.e. no if without else)
Hardware support:
- MMU : use of multiple processes which are protected against each other. This allows
- watchdog
- CPU exceptions
- multi core: use of multiple cores to separate cricical processes from the rest. Also allows to have voting mechanisms (you want this and more for your nuclear reactor).
- multi-system
Most important is to define a strategy. Depending on the other nonfunctional requirements (safety, reliability, security) a strategy needs to be thought of. Can be graceful degradation to partial system reboot.

In a 'real-time', 'nuclear reactor' type system, chances are the exception handling allows the system to instead of fail, do the next best thing.
Let's say that we have a heart monitor. If it isn't receiving a signal, that might trigger an exception. In that case, the heart monitor might handle the exception by waiting a few seconds and trying again.
In a nuclear reactor, getting to a certain temperature might trigger an exception. In that case, the handling might shut off various parts of the reactor to start to cool it down, and then start them back up when it gets to a reasonable temperature.
Exceptions are meant to have a lower-level system say that it doesn't know what to do, and to have a higher level system handle it. Like in the nuclear reactor, the system that measures temperature probably doesn't know how to turn on parts of the reactor, so it triggers an exception so that some higher-level system can handle it.

A critical system is like any other system, except it's specified more clearly, passes through more testing phases, and is generally will fail-safe.
Regarding your form, yes it's pretty bad. I do mind the lack of {} very much; and it's been said so-often that this is just plain bad style, and leads to confusion when adding new code.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008