Design by Contract and Fail Fast

Fail Fast -
Fail-fast is a property of a system or module with respect to its
response to failures. A fail-fast system is designed to immediately
report at its interface any failure or condition that is likely to
lead to failure. Fail-fast systems are usually designed to stop normal
operation rather than attempt to continue a possibly flawed process.
Such designs often check the system's state at several points in an
operation, so any failures can be detected early. A fail-fast module
passes the responsibility for handling errors, but not detecting them,
to the next-higher system design level.
Design by Contract -
Design by contract (DbC), also known as contract programming,
programming by contract and design-by-contract programming, is an
approach for designing software. It prescribes that software designers
should define formal, precise and verifiable interface specifications
for software components, which extend the ordinary definition of
abstract data types with preconditions, postconditions and invariants.
These specifications are referred to as "contracts", in accordance
with a conceptual metaphor with the conditions and obligations of
business contracts.
My question is: what are the similarities and differences between the two terms?
I think both are about software design.
Fail fast is more about the response to a system failure, while Design by Contract is more about the guarantees, the minimums, and the expectations of a system.
But how do I actually define the difference between the two, and the similarity?
Thanks for helping!

They are not mutually exclusive: a Java iterator is fail-fast but also designed by contract. Fail fast just means: bomb out in the hope that nothing worse will happen (e.g. throw an exception). Something like fail-safe, by contrast, usually means that when failure happens, you make sure nothing worse follows. You can do this by isolating system components, or by having something that will handle the failure case so that nothing bad happens (e.g. session replication / failover).
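To make the contrast concrete, here is a minimal sketch in Ada (the names and the valid range are made up for illustration):

--  Fail fast: report the bad value immediately and stop normal operation.
function Scale_Fast (Reading : Integer) return Integer is
begin
   if Reading not in 0 .. 1_000 then
      raise Constraint_Error with "sensor reading out of range";
   end if;
   return Reading * 10;
end Scale_Fast;

--  Fail safe: when a bad value arrives, make sure nothing worse happens;
--  here we clamp to the valid range instead of propagating the fault.
function Scale_Safe (Reading : Integer) return Integer is
   Clamped : constant Integer := Integer'Min (Integer'Max (Reading, 0), 1_000);
begin
   return Clamped * 10;
end Scale_Safe;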

Similarities:
- Both can be implemented via assertions (see the sketch below)
- Both are intrinsic to the design of Eiffel
Differences:
- Design by Contract doesn't handle unexpected errors
- Fail fast doesn't handle redundant checks
- Design by Contract doesn't handle bad requirements
- Fail fast doesn't handle requirements mapping
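As an illustration of the assertions point above, here is a minimal sketch using Ada 2012 contract aspects (the Stacks package is made up): the Pre/Post aspects are the contract, and with assertion checking enabled a violated precondition raises Assertion_Error at the call site, which is exactly fail-fast behaviour.

package Stacks is
   type Stack is private;
   function Is_Empty (S : Stack) return Boolean;
   function Is_Full  (S : Stack) return Boolean;

   procedure Push (S : in out Stack; X : Integer)
     with Pre  => not Is_Full (S),    --  the caller's obligation
          Post => not Is_Empty (S);   --  the implementation's guarantee
private
   type Int_Array is array (1 .. 100) of Integer;
   type Stack is record
      Top  : Natural := 0;
      Data : Int_Array;
   end record;
end Stacks;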
References
The Liskov Substitution Principle and Test-Driven Development | Effective Software Design

Related

Ada Exceptions in Safety Critical Embedded Systems

I started learning Ada for its potential use in an embedded device which is safety critical. So far, I'm really liking it. However, in my research on embedded programming, I came across the hot topic of whether to use exception handling in embedded systems. I think I understand why some people seem to avoid it:
- depending on its implementation, it can introduce either run-time overhead or larger code size (mentioned here under "Implementation")
- the time it takes to execute exceptions can be non-deterministic (one of several sources I saw)
Now my question is: does the Ada language or the GNAT compiler address these concerns? My understanding of safety-critical code is that non-deterministic code size and execution time are often not acceptable.
Due Diligence: I am having a bit of trouble finding out exactly how deterministic Ada exceptions can be, but my understanding is that their original implementation called for more run-time overhead in exchange for reduced code-size impact (the first link above mentions Ada explicitly). Beyond that first link, I have looked into profiles concerned with determinism, like the Ravenscar profile and this paper, but nothing seems to mention determinism of exception handling. To be fair, I may be looking in the wrong places, as this topic seems quite deep.
There are embedded systems that are safety- or mission-critical, embedded systems that are hard real time, and embedded systems that are both.
Embedded systems that are hard real time may be constrained or not. Colleagues worked on a missile guidance system in the 70s that had about 4 instructions worth of headroom in its main loop! (as you can imagine, it was written in assembler and used a tuned executive, not an RTOS. Exceptions weren't supported). On the other hand, the last one I worked on, on a 1 GHz PowerPC board, had a 2 millisecond deadline for the response to a particular interrupt, and our measured worst case was 1.3 milliseconds (and it was a soft real time requirement anyway, you just didn't have to miss too many in a row).
That system also had safety requirements (I know, I know, safe missile systems, huh) and although we were permitted to use exceptions, an unhandled exception meant that the system had to be shut down, missile in flight or no, resulting in loss of missile. And we were strictly forbidden to say when others => null; to swallow an exception, so any exception we didn't handle would be 'unhandled' and would bounce up to the top level.
The argument is, if an unhandled exception happens, you can no longer know the state of the system, so you can't justify continuing. Of course, the wider safety engineering has to consider what action the overall system should take (for example, perhaps this processor should restart in a recovery mode).
Sometimes people use exceptions as part of their control flow; indeed, for handling random text inputs a commonly used method is, rather than checking for end of file, to just carry on until you get an End_Error:
loop
   begin
      --  read input (e.g. Ada.Text_IO.Get_Line raises End_Error at end of file)
      --  process input
   exception
      when End_Error => exit;
   end;
end loop;
Jacob's answer discusses using SPARK. You don't have to use SPARK to avoid handling exceptions, though of course it would be nice to be able to prove to yourself (and your safety auditor!) that there won't be any. Handling exceptions is very tricky, and some RTSs (e.g. Cortex GNAT RTS) don't; the configuration pragma
pragma Restrictions (No_Exception_Propagation);
means that exceptions can't be propagated out of the scope where they're raised (the program will crash out with a call to a Last_Chance_Handler).
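On such a runtime, the handler you provide looks roughly like this (a minimal sketch for GNAT bareboard runtimes; what "safe state" means is entirely application-specific):

with System;

procedure Last_Chance_Handler
  (Source_Location : System.Address; Line : Integer);
pragma Export (C, Last_Chance_Handler, "__gnat_last_chance_handler");

procedure Last_Chance_Handler
  (Source_Location : System.Address; Line : Integer)
is
   pragma Unreferenced (Source_Location, Line);
begin
   --  Put the hardware into a safe state here, then never return;
   --  on many boards you simply spin and let the hardware watchdog reset you.
   loop
      null;
   end loop;
end Last_Chance_Handler;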
Propagating exceptions only within the scope where they're raised isn't, IMO, that useful:
begin
   -- do something
   if some error condition then
      raise Err;
   end if;
   -- do more
exception
   when Err =>
      null;
end;
would be a rather confusing way of avoiding the "do more" code. Better to use a label!
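For comparison, here is the same schematic fragment using a label instead of a local exception (the null statement is only there because an Ada label must be attached to a statement):

begin
   -- do something
   if some error condition then
      goto Done;
   end if;
   -- do more
   <<Done>>
   null;
end;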
Exceptions are deterministic in Ada. (But some checks which can raise an exception have some freedom: if the compiler can provide a correct answer, it doesn't always have to raise an exception when an intermediate result is out of bounds for the type in question.)
At least one Ada compiler (GNAT) has a "zero cost" exception implementation. This doesn't make exceptions completely free, but you don't pay a run-time cost until you actually raise an exception. You still pay a cost in terms of code space. How large that cost is depends on the architecture.
I haven't worked on safety critical systems myself, but I know for sure that the run-time used for the software in the Ariane 4 inertial navigation system included exceptions.
If you don't want exceptions, one option is to use SPARK (a language derived from Ada). You can still use any Ada compiler you like, but you use the SPARK tools to prove that the program can't raise any exceptions. You should note that SPARK isn't magic. You have to help the tools, by inserting assertions, which the tools can use as intermediate steps for the proofs.
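To give a flavour of the help the tools need, here is a minimal sketch (Increment is a made-up example; GNATprove is the SPARK proof tool):

procedure Increment (X : in out Integer)
  with Pre  => X < Integer'Last,   --  rules out the overflow check failing
       Post => X = X'Old + 1;

procedure Increment (X : in out Integer) is
begin
   X := X + 1;  --  given the precondition, provably cannot raise Constraint_Error
end Increment;

Without the precondition the tools would report that X + 1 might overflow; with it, the proof of absence of run-time errors goes through, and every caller is in turn obliged to discharge the precondition.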

No LR and SPSR for EL0 in Aarch64

In AArch64, there are four exception levels, viz. EL0-EL3. The ARM site mentions there are four stack pointers (SP_EL0/1/2/3) but only three exception link registers (ELR_EL1/2/3) and only three saved program status registers (SPSR_EL1/2/3).
Why are ELR_EL0 and SPSR_EL0 not required?
P.S. Sorry if this is a silly question. I am new to ARM architecture.
By design, exceptions cannot target EL0, so if EL0 can never take an exception then it has no use for the machinery to return from one.
To expand on the reasoning a bit (glossing over the optional and more special-purpose higher exception levels), the basic design is that EL1 is where privileged system code runs, and EL0 is where unprivileged user code runs. Thus EL0 is by necessity far more restricted in what it can do, and wouldn't be very useful for handling architectural exceptions, i.e. low-level things requiring detailed knowledge of the system. Only privileged software (typically the OS kernel) should have access to the full hardware and software state necessary to decide whether handling that basic hardware exception means e.g. going and quietly paging something in from swap, versus delivering a "software exception"-type signal to the offending task to tell it off for doing something bad.

Core of Verifier in Isabelle/HOL

Question
What is the core algorithm of the Isabelle/HOL verifier?
I'm looking for something on the level of a Scheme metacircular evaluator.
Clarification
I'm only interested in the Verifier, not the strategies for automated theorem proving.
Context
I want to implement a simple proof verifier from scratch (purely for educational reasons, not for production use).
I want to understand the core Verifier algorithm of Isabelle/HOL. I don't care about the strategies / code used for automated theorem proving.
I have a suspicion that the core Verifier algorithm is very simple (and elegant). However, I can't find it.
Thanks!
Isabelle is a member of the "LCF family" of proof checkers, which means you have a special module, the inference kernel, that all inferences are run through to produce values of the abstract datatype thm. This is a bit like an operating system kernel processing system calls. Everything you can produce this way is "correct by construction" relative to the correctness of the kernel implementation. Since the programming language environment of the prover (Standard ML) has very strong static type-correctness properties, the correctness-by-construction of the inference kernel carries over to the rest of the proof assistant implementation, which can be quite huge.
So in principle you have a relatively small "trusted kernel" part and a really big "application user-space". In particular, most of Isabelle/HOL is actually a big collection of library theories and add-on tools (mostly in SML) in Isabelle user-land.
In Isabelle, the kernel infrastructure is quite complex, but still very small compared to the rest of the system. The kernel is in fact layered into a "micro kernel" (the Thm module) and a "nano kernel" (the Context module). Thm produces thm values in the sense of Milner's LCF approach, and Context takes care of theory certificates for any results you produce, as well as proof contexts for local reasoning (notably in the Isar proof language).
If you want to learn more about LCF-style provers, I recommend also looking at HOL-Light, which is probably the smallest realistic system of the LCF family, realistic in the sense that people have done big applications with it. HOL-Light has the big advantage that its implementation can be easily understood, but this minimalism also has some disadvantages: it does not fully protect the user from doing nonsense in its ML environment, which is OCaml instead of SML. For various technical reasons, OCaml is not as "safe" by default as Standard ML.
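To see why this gives "correct by construction", here is a toy sketch of the LCF idea, in Ada rather than SML and with formulas reduced to strings (nothing like the real Isabelle kernel): the private type plays the role of the abstract datatype thm, so the exported inference rules are the only way client code can obtain a theorem value.

with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

package Toy_Kernel is
   subtype Formula is Unbounded_String;  --  stand-in for a real term datatype
   type Thm (<>) is private;             --  clients cannot forge a Thm

   --  The inference rules are the only constructors of Thm values.
   function Refl  (T : Formula) return Thm;        --  |- T = T
   function Trans (AB, BC : Thm) return Thm;       --  |- A = C, from |- A = B and |- B = C
   function Conclusion (Th : Thm) return Formula;  --  inspection only
private
   type Thm is record
      Concl : Formula;
   end record;
end Toy_Kernel;

Inside the package body, Trans would check that the middle terms really match and reject the step otherwise; code outside the package can combine theorems but never invent one.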
If you untar the Isabelle sources, e.g.
http://isabelle.in.tum.de/dist/Isabelle2013_linux.tar.gz
you will find the core definitions in
src/Pure/thm.ML
And there is already such a project, of the kind you want to tackle:
http://www.proof-technologies.com/holzero/
added later: another, more serious project is
https://team.inria.fr/parsifal/proofcert/

Performance evaluation with Message Passing

I have to build a distributed application, using MPI.
One of the decisions I have to take is how to map instances of classes onto processes (and then onto machines), in order to take maximum advantage of a distributed environment.
My question is: is there a model that lets me choose the better mapping? I mean, some arrangements are surely wrong (for example, putting on two different machines two objects that have to process a fairly large amount of data together, sequentially, without a stream of tokens to process), but is there a systematic way to determine such wrong arrangements, based on the flow of execution, message complexity, and the time taken by the computation done by the algorithmic components?
Well, there are data flow diagrams. Those can help identify parallelism's opportunities and pitfalls. The references on the wikipedia page might give you some more theoretical grounding.
When I worked at Lockheed Martin, I was exposed to CSIM, a tool they developed for modeling algorithm mapping to processing blocks.
Another thing you might try is the Join Calculus. I've found examples of programming with it to be surprisingly intuitive, and I think it's well grounded in theory. I'm not sure why it hasn't caught on more.
The other approach is the Pi Calculus, and I think that might be more popular, though it seems harder to understand.
A practical solution to this would be using a different model of distributed-memory parallel programming, that directly addresses your concerns. I work on the Charm++ programming system, whose model is that of individual objects sending messages from one to another. The runtime system facilitates automatic mapping of these objects to available processors, to account for issues of load balance and communication locality.

Real time system exception handling

As always, after some research I was unable to find anything of real value. My question is: how does one go about handling exceptions in a real-time system, given that program failure is generally not acceptable, e.g. in a nuclear reactor or a heart monitor?
OK, since everyone got lost on the second piece of this, which had NOTHING to do with the main question: I had it in there to show how I normally escape code blocks.
Exception handling in real-time/embedded systems has several layers. Not just the language-supported options, but also the MMU, CPU exceptions, and one of my favorites: watchdogs.
Language exceptions (C/C++)
- not often used, because it is hard to prove that all exceptions are handled at the right level. Also, it is pretty hard to determine what thread/process should be responsible. Instead, programming by contract is preferred.
Programming style:
- i.e. programming by contract. Additional constraints: MISRA C / MISRA C++. These can be checked to ensure that all possible cases are somehow handled (i.e. no if without else).
Hardware support:
- MMU: use of multiple processes which are protected against each other, so that a fault in one cannot corrupt the others.
- watchdog (see the sketch below)
- CPU exceptions
- multi-core: use of multiple cores to separate critical processes from the rest. This also allows voting mechanisms (you want this and more for your nuclear reactor).
- multi-system
Most important is to define a strategy. Depending on the other non-functional requirements (safety, reliability, security), a strategy needs to be thought out. This can range from graceful degradation to a partial system reboot.
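As a concrete example of the watchdog layer, here is a minimal sketch in Ada (Kick_Watchdog and All_Subsystems_Healthy stand for board- and application-specific routines, not a real API):

with Ada.Real_Time; use Ada.Real_Time;

task Watchdog_Kicker;

task body Watchdog_Kicker is
   Next : Time := Clock;
begin
   loop
      if All_Subsystems_Healthy then
         Kick_Watchdog;  --  if we ever stop kicking it, the hardware resets us
      end if;
      Next := Next + Milliseconds (100);
      delay until Next;
   end loop;
end Watchdog_Kicker;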
In a 'real-time', 'nuclear reactor' type system, chances are that the exception handling allows the system, instead of failing, to do the next best thing.
Let's say that we have a heart monitor. If it isn't receiving a signal, that might trigger an exception. In that case, the heart monitor might handle the exception by waiting a few seconds and trying again.
In a nuclear reactor, getting to a certain temperature might trigger an exception. In that case, the handling might shut off various parts of the reactor to start to cool it down, and then start them back up when it gets to a reasonable temperature.
Exceptions are meant to have a lower-level system say that it doesn't know what to do, and to have a higher level system handle it. Like in the nuclear reactor, the system that measures temperature probably doesn't know how to turn on parts of the reactor, so it triggers an exception so that some higher-level system can handle it.
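A sketch of the heart-monitor case in Ada (No_Signal and Read_Sensor are hypothetical names):

No_Signal : exception;

procedure Monitor is
begin
   loop
      begin
         Read_Sensor;   --  assumed to raise No_Signal on a dropout
      exception
         when No_Signal =>
            delay 2.0;  --  wait a few seconds, then try again
      end;
   end loop;
end Monitor;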
A critical system is like any other system, except that it is specified more clearly, passes through more testing phases, and will generally fail safe.
Regarding your form, yes, it's pretty bad. I do mind the lack of {} very much; it's been said so often that this is just plain bad style, and it leads to confusion when adding new code.