The more detail I put in an interface, the less reusable it is. On the other hand, the less detail there is, the more ethereal and useless it seems to become. Is there a standard set of recommendations on how to weigh this for various situations?
I'm a big fan of the SOLID principles. The "I" in SOLID leads me to believe that clients shouldn't be forced to implement interfaces they do not need or use. In other words, if you have an abstract class or an interface, then the implementer should not be forced to implement parts that they don't care about.
Ray Houston wrote a good article on it (looking at the Membership Provider) here.
I have just co-authored a paper on granularity (size) of components and one of our conclusions is that there is no simple way to determine "what's right". So no, there is no standard set of recommendations.
I can give you a couple of academic references on the subject just in case you're interested:
Genero, M., Piattini, M., Calero, C. (eds.): Metrics for Software Conceptual Models. Imperial College Press, London, UK (2005)
Shekhovtsov, V.A.: On Conceptualization of Quality. Paper presented at the Dagstuhl Seminar on Conceptual Modelling, April 27-30, 2008 (preprint on the conference website) (2008)
Consider the Human genome as a class.
Each instance (cell object) has available to it all the functions of the genome.
(Although not all cell objects have access to all functions; stem cells are perhaps the exception.)
I'm bringing up this point because I have seen many instances of single classes trying to perform many functions, instead of having multiple classes, each performing a single function.
This is equivalent to a grain of sand having the instructions encoded in it to build a castle. Evolution has had the benefit of billions of years to work out the bugs. Engineers just don't have the capacity or the time to do this.
I wanted some input on an interesting problem I've been assigned. The task is to analyze hundreds, and eventually thousands, of privacy policies and identify core characteristics of them. For example: do they collect the user's location? Do they share or sell data with third parties? And so on.
I've talked to a few people, read a lot about privacy policies, and thought about this myself. Here is my current plan of attack:
First, read a lot of privacy policies and find the major "cues" or indicators that a certain characteristic is met. For example, if hundreds of privacy policies contain the same line, "We will take your location.", that line could be a cue with 100% confidence that the privacy policy includes taking of the user's location. Other cues would give much smaller degrees of confidence about a certain characteristic. For example, the presence of the word "location" might increase the likelihood that the user's location is stored by 25%.
The idea would be to keep developing these cues, and their appropriate confidence intervals to the point where I could categorize all privacy policies with a high degree of confidence. An analogy here could be made to email-spam catching systems that use Bayesian filters to identify which mail is likely commercial and unsolicited.
I wanted to ask whether you guys think this is a good approach to this problem. How exactly would you approach a problem like this? Furthermore, are there any specific tools or frameworks you'd recommend using? Any input is welcome. This is my first time doing a project which touches on artificial intelligence, specifically machine learning and NLP.
This is text classification. Given that you have multiple output categories per document, it's actually multilabel classification. The standard approach is to manually label a set of documents with the classes/labels that you want to predict, then train a classifier on features of the documents; typically word or n-gram occurrences or counts, possibly weighted by tf-idf.
The popular learning algorithms for document classification include naive Bayes and linear SVMs, though other classifier learners may work too. Any classifier can be extended to a multilabel one by the one-vs.-rest (OvR) construction.
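To make that concrete, here is a minimal sketch using scikit-learn (my own choice of library; the answer doesn't name one), with invented documents and labels, tf-idf features, and one-vs.-rest linear SVMs:

    # A hedged sketch of multilabel text classification with scikit-learn.
    # The documents and labels below are made up for illustration.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MultiLabelBinarizer
    from sklearn.svm import LinearSVC

    docs = [
        "We will take your location and share it with partners.",
        "We never collect location data or share anything.",
        "We sell your email address to advertisers.",
    ]
    labels = [["location", "third-party"], [], ["third-party"]]

    mlb = MultiLabelBinarizer()
    y = mlb.fit_transform(labels)              # one binary column per label

    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),   # tf-idf weighted word uni/bigrams
        OneVsRestClassifier(LinearSVC()),      # one binary SVM per label (OvR)
    )
    clf.fit(docs, y)

    pred = clf.predict(["We may record your location."])
    print(mlb.inverse_transform(pred))         # e.g. [('location',)]

With realistic amounts of labelled data the same structure applies; only the corpus and the label set change.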
A very interesting problem indeed!
On a higher level, what you want is summarization: a document has to be reduced to a few key phrases. This is far from being a solved problem. A simple approach would be to search for keywords as opposed to key phrases. You can try something like LDA for topic modelling to find what each document is about. You can then search for topics which are present in all documents; I suspect what will come up is stuff to do with licenses, location, copyright, etc. MALLET has an easy-to-use implementation of LDA.
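For a concrete starting point, here is a hedged sketch of LDA with gensim (MALLET would work similarly; the tokenized documents are invented, and real input needs proper preprocessing):

    # A toy LDA topic-modelling sketch with gensim; documents are invented.
    from gensim import corpora, models

    docs = [
        ["privacy", "location", "share", "third", "party"],
        ["license", "copyright", "terms", "location"],
        ["cookies", "tracking", "third", "party"],
    ]  # pre-tokenized, stop words already removed

    dictionary = corpora.Dictionary(docs)
    corpus = [dictionary.doc2bow(d) for d in docs]

    lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
    for topic_id, words in lda.print_topics():
        print(topic_id, words)   # top words per topic, with weights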
I would approach this as a machine learning problem where you are trying to classify things in multiple ways, i.e. wants location, wants SSN, etc.
You'll need to enumerate the characteristics you want to use (location, SSN), and then for each document say whether that document uses that info or not. Choose your features, train on your data, and then classify and test.
I think simple features like words and n-grams would probably get you pretty far, and a dictionary of words related to things like SSN or location would finish it nicely.
Use the machine learning algorithm of your choice; Naive Bayes is very easy to implement and use, and would work OK as a first stab at the problem.
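As a first stab, one binary classifier per characteristic is easy to set up; here is a hedged scikit-learn sketch (library choice and training data are my own invention):

    # A toy sketch: one binary Naive Bayes classifier per characteristic.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts = [
        "we collect your location to improve service",
        "we never track where you are",
        "we may request your social security number",
    ]
    wants_location = [1, 0, 0]     # labels for the "location" characteristic

    clf = make_pipeline(
        CountVectorizer(ngram_range=(1, 2)),   # word and bigram counts
        MultinomialNB(),
    )
    clf.fit(texts, wants_location)
    print(clf.predict(["your location is stored"]))   # likely [1]

Repeat the same per characteristic (wants SSN, shares with third parties, and so on).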
I am curious, what are the differences between Domain Driven Design and Model Driven Architecture? I have the impression they have certain similarities.
Could you enlighten me?
Thanks
Don't disagree with most of the above although it's perhaps worth expanding a little.
The single most important concept in DDD is to focus on the problem domain. To put technology obsession to the side and concentrate primarily on modelling the problem you're trying to solve. So put ajax, ORMs, databases, frameworks etc. into the background and instead make sure you have a complete, accurate model of the problem first and foremost. (Of course you still need the architectural components - but they're explicitly subservient to the model). DDD calls this "Ubiquitous Language" - a model expressed in terms domain experts and developers alike use and understand. A model where the names of classes, methods etc. are taken from the problem domain.
DDD doesn't mandate /how/ you capture that model, although the book implies using an OO language to do so.
MDA shares that same notion of modelling the problem domain first and foremost (the PIM, Platform-Independent Model). As opposed to DDD, it recommends creating that model with UML. But the intent is the same: understand the problem domain without tainting it with (software) architectural concerns.
MDA's PSM (Platform-Specific Model) is somewhat analogous to applying the architectural patterns in DDD (e.g. aggregate, repository, etc.). Again - while different in specifics - both aim to solve the problem of converting a 'pure' problem domain model into a full software system.
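To illustrate (a hedged sketch; the domain, names, and language are invented, and neither DDD nor MDA mandates any of them): a model whose classes and methods come straight from the problem domain, with a repository keeping persistence subservient to it:

    # A hypothetical sketch of DDD building blocks; all names are invented.
    from dataclasses import dataclass, field

    @dataclass
    class OrderLine:          # a value object, named in the ubiquitous language
        sku: str
        quantity: int

    @dataclass
    class Order:              # an aggregate root from the problem domain
        order_id: str
        lines: list = field(default_factory=list)

        def add_line(self, sku: str, quantity: int) -> None:
            self.lines.append(OrderLine(sku, quantity))

    class OrderRepository:    # repository: persistence behind a domain-level interface
        def __init__(self) -> None:
            self._store = {}

        def save(self, order: Order) -> None:
            self._store[order.order_id] = order

        def get(self, order_id: str) -> Order:
            return self._store[order_id]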
So summing up, I'd say they are similar in two ways:
The centrality of the Model (as #Rui says) - specifically the /Domain/ model.
Applying architectural patterns to the model in order to realise the target system.
hth.
The root of both Domain-Driven Design (DDD) and Model Driven Architecture (MDA) is Model-Driven Engineering (MDE), also known as Model-Driven Software Development (MDSD) when limited to the software development domain. See Wikipedia: http://en.wikipedia.org/wiki/Model-driven_development
All approaches falling under the MDE umbrella have one thing in common: a model. How this model is materialized depends on the specific MDE flavor.
MDA is regarded as overly complex. DDD is considered by some as too abstract. My personal favorite MDE implementations are DSM and ABSE (not listed on the Wikipedia article).
DDD is about approaching a software solution from a business perspective with the intent of keeping the design as close to the real world as possible. This is more of an art than engineering.
MDA solves a different set of problems. More details here: http://xml.coverpages.org/OMG-MDAFAQfinal1.pdf
Each X-Driven approach delivers value for specific aspects and representations of problem-solving activities. From my point of view, the main difference is that DDD is a design technique, while MDA is an infrastructure, which became necessary once the engineering community wanted to use model-driven ideas in real-world industry.
The term Domain in DDD has an is-a relationship to "Problem Domain" and often means the same thing. DDD values domain expertise: decisions depend on how well we understand the problems and how we choose the right path from the initial state to the winning state. Before the final design spec can be written, a great deal of effort goes into studying the problem. Looking at the three main principles of DDD, I map them to things I am familiar with nowadays: (a) focus on the core domain (DDD and MVP seem identical in their focus); (b) explore models in a creative collaboration, with two contributors, the domain expert/designer and the professional software developer (this is Model-Driven/Based Engineering); (c) speak a ubiquitous language within an explicitly bounded context (communicate using a domain-specific language and develop artifacts relevant to the problem domain).
Looking at the development collaboration around MDA and related standards, MDA is an infrastructure for applying Model-Driven Engineering. It reflects the software industry's evolution in describing a software system using models, and demonstrates how we organize CIM/PIM/PSM models and artifacts. Many powerful modeling operations and tools, such as model transformation, domain-specific modeling languages, and automated software engineering techniques, officially emerged with MDA.
People like Alexander Stepanov and Sean Parent argue for a formal and abstract approach to software design.
The idea is to break complex systems down into a directed acyclic graph and hide cyclic behaviour in nodes representing that behaviour.
Parent gave presentations at BoostCon and at Google (the slides from BoostCon, p. 24, introduce the approach; there is also a video of the Google talk).
While I like the approach and think it's a necessary development, I have a problem with imagining how to handle subsystems with amorphous behaviour.
Imagine for example a common pattern for state-machines: using an interface which all states support and having different behaviour in concrete implementations for the states.
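For concreteness, the pattern I mean looks roughly like this (a toy sketch; the states and events are invented):

    # One interface all states support; behaviour differs per concrete state.
    from abc import ABC, abstractmethod

    class State(ABC):
        @abstractmethod
        def handle(self, event: str) -> "State": ...

    class Idle(State):
        def handle(self, event: str) -> "State":
            return Running() if event == "start" else self

    class Running(State):
        def handle(self, event: str) -> "State":
            return Idle() if event == "stop" else self

    class Machine:
        def __init__(self) -> None:
            self.state: State = Idle()

        def dispatch(self, event: str) -> None:
            self.state = self.state.handle(event)   # behaviour depends on current state

    m = Machine()
    m.dispatch("start")
    print(type(m.state).__name__)   # Running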
How would one solve that?
Note that I am just looking for an abstract approach.
I can think of hiding that behaviour behind a node and defining different sub-DAGs for the states, but that complicates the design considerably if you want to influence the behaviour of the main DAG from a sub-DAG.
Your question is not clear. Define amorphous subsystems.
You are "just looking for an abstract approach" but then you seem to want details about an implementation in a conventional programming language ("common pattern for state-machines"). So, what are you asking for? How to implement nested finite state-machines?
Some more detail will help the conversation.
For a real abstract approach, look at something like Stream X-Machines:
... The X-machine model is structurally the same as the finite state machine, except that the symbols used to label the machine's transitions denote relations of type X→X. ...
The Stream X-Machine differs from Eilenberg's model, in that the fundamental data type X = Out* × Mem × In*, where In* is an input sequence, Out* is an output sequence, and Mem is the (rest of the) memory.
The advantage of this model is that it allows a system to be driven, one step at a time, through its states and transitions, while observing the outputs at each step. These are witness values, that guarantee that particular functions were executed on each step. As a result, complex software systems may be decomposed into a hierarchy of Stream X-Machines, designed in a top-down way and tested in a bottom-up way. This divide-and-conquer approach to design and testing is backed by Florentin Ipate's proof of correct integration, which proves how testing the layered machines independently is equivalent to testing the composed system. ...
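Concretely, the "driven one step at a time" idea might be sketched like this (a toy, invented example; real Stream X-Machines come with the testing theory on top):

    # Toy Stream X-Machine step: each transition label is a processing
    # function of type (memory, input) -> (output, memory). All invented.
    def inc(mem, inp):
        return ("incremented", mem + inp)

    def reset(mem, inp):
        return ("reset", 0)

    transitions = {               # (state, label) -> next state
        ("idle", inc): "busy",
        ("busy", reset): "idle",
    }

    def step(state, mem, label, inp):
        out, mem = label(mem, inp)                 # observe an output witness
        return transitions[(state, label)], mem, out

    state, mem = "idle", 0
    state, mem, out = step(state, mem, inc, 5)
    print(state, mem, out)       # busy 5 incremented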
But I don't see how the presentation is related to this. He seems to speak about a quite mainstream approach to programming, nothing similar to X-Machines. Anyway, the presentation is quite confusing and I have no time to see the video right now.
First impression of the talk, reading the slides only
The author touches haphazardly on numerous fields/problems/solutions, apparently without recognizing it: from Peopleware (for example Psychology of programming), to Software Engineering (for example software product lines), to various programming techniques.
How the various parts are linked and what exactly he is advocating is not clear at all (I'm accustomed to just reading slides, and they are usually coherent on their own):
Dataflow programming?
Constraint solving for user interfaces? For practical implementations, see Garnet for Common Lisp, Amulet/OpenAmulet for C++.
What advantages does this "new" concept-based generic programming give us with respect to well-known approaches (for example, tools based on Hoare logic pre/post conditions and invariants or, better, Hoare's Communicating Sequential Processes (CSP), Hehner's Practical Theory of Programming, or some programming language with a sophisticated type system like ATS, Qi or Epigram, and so on)? It seems to me that introducing "concepts" - which, as-is, are specific to C++ - is no simpler than using the alternatives. Is it just about jargon and "politics"? (Finally formal methods... but disguised.)
Why organize program modules as a DAG and not as a tree, as David Parnas advocated decades ago in Designing software for ease of extension and contraction? (here a directly accessible .pdf and here slides from a lecture). The work on X-Machines is probably an answer to this question (going even beyond DAGs), but, again, the author seems to speak about a quite conventional program development regime in which Parnas' approach is the only sensible one.
If/when I see the video, I will update this answer.
I got a glimpse of Hoare Logic in college. What we did was really simple: most of what I did was proving the correctness of simple programs consisting of while loops, if statements, and sequences of instructions, but nothing more. These methods seem very useful!
Are formal methods used in industry widely?
Are these methods used to prove mission-critical software?
Well, Sir Tony Hoare joined Microsoft Research about 10 years ago, and one of the things he started was a formal verification of the Windows NT kernel. Indeed, this was one of the reasons for the long delay of Windows Vista: starting with Vista, large parts of the kernel are actually formally verified with respect to certain properties, like absence of deadlocks, absence of information leaks, etc.
This is certainly not typical, but it is probably the single most important application of formal program verification, in terms of its impact (after all, almost every human being is in some way, shape or form affected by a computer running Windows).
This is a question close to my heart (I'm a researcher in Software Verification using formal logics), so you'll probably not be surprised when I say I think these techniques have a useful place, and are not yet used enough in the industry.
There are many levels of "formal methods", so I'll assume you mean those resting on a rigorous mathematical basis (as opposed to, say, following some 6-Sigma style process). Some types of formal methods have had great success - type systems being one example. Static analysis tools based on data flow analysis are also popular, model checking is almost ubiquitous in hardware design, and computational models like Pi-Calculus and CCS seem to be inspiring some real change in practical language design for concurrency. Termination analysis is one that's had a lot of press recently - the SDV project at Microsoft and work by Byron Cook are recent examples of research/practice crossover in formal methods.
Hoare Reasoning has not, so far, made great inroads in the industry - this is for more reasons than I can list, but I suspect is mostly around the complexity of writing then proving specifications for real programs (they tend to get big, and fail to express properties of many real world environments). Various sub-fields in this type of reasoning are now making big inroads into these problems - Separation Logic being one.
This is partially the nature of ongoing (hard) research. But I must confess that we, as theorists, have entirely failed to educate the industry on why our techniques are useful, to keep them relevant to industry needs, and to make them approachable to software developers. At some level, that's not our problem - we're researchers, often mathematicians, and practical usage is not foremost in our minds. Also, the techniques being developed are often too embryonic for use in large scale systems - we work on small programs, on simplified systems, get the math working, and move on. I don't much buy these excuses though - we should be more active in pushing our ideas, and getting a feedback loop between the industry and our work (one of the main reasons I went back to research).
It's probably a good idea for me to resurrect my weblog, and make some more posts on this stuff...
I cannot comment much on mission-critical software, although I know that the avionics industry uses a wide variety of techniques to validate software, including Hoare-style methods.
Formal methods have suffered because early advocates like Edsger Dijkstra insisted that they ought to be used everywhere. Neither the formalisms nor the software support were up to the job. More sensible advocates believe that these methods should be used on problems that are hard. They are not widely used in industry, but adoption is increasing. Probably the greatest inroads have been in the use of formal methods to check safety properties of software. Some of my favorite examples are the SPIN model checker and George Necula's proof-carrying code.
Moving away from practice and into research, Microsoft's Singularity operating-system project is about using formal methods to provide safety guarantees that ordinarily require hardware support. This in turn leads to faster performance and stronger guarantees. For example, in Singularity they have proved that if a third-party device driver is allowed into the system (which means basic verification conditions have been proved), then it cannot possibly bring down the whole OS; the worst it can do is hose its own device.
Formal methods are not yet widely used in industry, but they are more widely used than they were 20 years ago, and 20 years from now they will be more widely used still. So you are future-proofed :-)
Yes, they are used, but not widely in all areas. There are more methods than just Hoare logic; some are used more, some less, depending on their suitability for a given task. The common problem is that software is biiiiiiig and verifying that all of it is correct is still too hard a problem.
For example, the theorem prover ACL2 (software that aids humans in proving program correctness) has been used to prove that a certain floating-point processing unit does not have a certain type of bug. It was a big task, so this technique is not too common.
Model checking, another kind of formal verification, is used rather widely nowadays, for example Microsoft provides a type of model checker in the driver development kit and it can be used to verify the driver for a set of common bugs. Model checkers are also often used in verifying hardware circuits.
Rigorous testing can also be thought of as formal verification - there are formal specifications of which program paths should be tested, and so on.
"Are formal methods used in industry?"
Yes.
The assert statement in many programming languages is related to formal methods for verifying a program.
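A toy illustration of that relationship, with assertions playing the role of executable Hoare-style pre- and postconditions (the function is invented):

    # Assertions as executable pre/postconditions (toy example).
    def integer_sqrt(n: int) -> int:
        assert n >= 0, "precondition: n must be non-negative"
        r = 0
        while (r + 1) * (r + 1) <= n:
            r += 1
        assert r * r <= n < (r + 1) * (r + 1), "postcondition: r is the integer sqrt"
        return r

    print(integer_sqrt(10))   # 3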
"Are formal methods used in industry widely ?"
No.
"Are these methods used to prove mission-critical software ?"
Sometimes. More often, they're used to prove that the software is secure. More formally, they're used to prove certain security-related assertions about the software.
There are two different approaches to formal methods in the industry.
One approach is to change the development process completely. The Z notation and the B method that were mentioned are in this first category. B was applied to the development of the driverless subway line 14 in Paris (if you get a chance, climb in the front wagon. It's not often that you get a chance to see the rails in front of you).
Another, more incremental, approach is to preserve the existing development and verification processes and to replace only one of the verification tasks at a time by a new method. This is very attractive, but it means developing static analysis tools for existing, widely used languages that are often not easy to analyse (because they were not designed to be).
If you go to (for instance)
http://dblp.uni-trier.de/db/indices/a-tree/d/Delmas:David.html
(sorry, only one hyperlink allowed for new users :( )
you will find instances of practical applications of formal methods to the verification of C programs (with static analyzers Astrée, Caveat, Fluctuat, Frama-C) and binary code (with tools from AbsInt GmbH).
By the way, since you mentioned Hoare Logic, in the above list of tools, only Caveat is based on Hoare logic (and Frama-C has a Hoare logic plug-in). The others rely on abstract interpretation, a different technique with a more automatic approach.
My area of expertise is the use of formal methods for static code analysis to show that software is free of run-time errors. This is implemented using a formal methods technique known as "abstract interpretation". The technique essentially enables you to prove certain attributes of a software program, e.g. prove that a+b will not overflow or that x/(x-y) will not result in a divide by zero. An example static analysis tool that uses this technique is Polyspace.
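To give a flavour of the technique (a toy sketch, not how Polyspace works internally), an interval domain can either prove or fail to prove the divide-by-zero example above:

    # Toy abstract interpretation over intervals; invented, minimal domain.
    class Interval:
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi

        def __sub__(self, other):   # [a,b] - [c,d] = [a-d, b-c]
            return Interval(self.lo - other.hi, self.hi - other.lo)

        def may_be_zero(self):
            return self.lo <= 0 <= self.hi

    x = Interval(1, 10)             # x known to be in [1, 10]
    y = Interval(3, 5)              # y known to be in [3, 5]
    d = x - y                       # then x - y is in [-4, 7]
    print(d.may_be_zero())          # True: x/(x-y) cannot be proved safe here

If the analysis had instead computed an interval excluding zero, the division would be proved safe.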
With respect to your question: "Are formal methods used in industry widely?" and "Are these methods used to prove mission-critical software?"
The answer is yes. This opinion is based on my experience and supporting the Polyspace tool for industries that rely on the use of embedded software to control safety critical systems such as electronic throttle in an automobile, braking system for a train, jet engine controller, drug delivery infusion pump, etc. These industries do indeed use these types of formal methods tools.
I don't believe 100% of these industry segments are using these tools, but the use is increasing. My opinion is that the aerospace and automotive industries lead, with the medical device industry quickly ramping up its use.
Polyspace is a (hideously expensive, but very good) commercial product based on program verification. It's fairly pragmatic, in that it scales up from 'enhanced unit testing that will probably find some bugs' to 'the next three years of your life will be spent showing these 10 files have zero defects'.
It is based more on negative verification ('this program won't corrupt your stack') than positive verification ('this program will do precisely what these 50 pages of equations say it will').
To add to Jorg's answer, here's an interview with Tony Hoare. The tools Jorg's referring to, I think, are PREfast and PREfix. See here for more information.
Besides other, more procedural approaches, Hoare logic was at the basis of Design by Contract, introduced as an object-oriented technique by Bertrand Meyer in Eiffel (see Meyer's article of 1992, page 4). While Design by Contract is not the same as formal verification methods (for one thing, DbC doesn't prove anything until the software is executed), in my opinion it provides a more practical use.
I'm getting married soon and am busy with the seating plan, and am running into the usual issues of who sits where: X and Y must sit together, but A and B cannot stand each other etc.
The numbers I'm dealing with aren't huge (so the manual option will work just fine), but being of the geeky persuasion, I was wondering if there was any software available to do this for me?
Failing an exact match, what should I look for (the problem space, books, reference code) to tweak for my purposes?
I am the developer of PerfectTablePlan. I post here as well as on Joel's Business of Software. ;0)
Combinatorial problems, such as seat assignment, are quite nasty algorithmically. NP-hard in fact. The number of ways to seat 60 guests in 60 seats is 60! (60 factorial) and that is more than the number of atoms in the known universe.
PerfectTablePlan allows you to specify that A must sit next to B, but nowhere near C. It uses a genetic algorithm to automatically assign the seats. This works pretty well in practice - it will usually find a decent solution for 100 guests in a few seconds. You might need to make a coffee for 1000+ guests. In practice some drag-and-drop fine-tuning is also usually required to cope with the vagaries of local customs and family politics (Uncle Bob is a bit deaf, so we had better put him nearer the top table).
You can find out a bit more about the genetic algorithm here.
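For readers curious about the general idea, here is a stripped-down evolutionary sketch (mutation only, no crossover; the constraints and scoring are invented, and this is emphatically not PerfectTablePlan's actual algorithm):

    # Toy evolutionary search for a seating plan; all parameters invented.
    import random

    guests = list(range(8))           # 8 guests, two tables of 4
    together = [(0, 1)]               # pairs who must share a table
    apart = [(2, 3)]                  # pairs who must not

    def table_of(plan, guest):        # plan is a permutation of guests
        return plan.index(guest) // 4

    def score(plan):                  # count satisfied constraints
        s = sum(1 for a, b in together if table_of(plan, a) == table_of(plan, b))
        s += sum(1 for a, b in apart if table_of(plan, a) != table_of(plan, b))
        return s

    def mutate(plan):                 # swap two seats
        child = plan[:]
        i, j = random.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
        return child

    population = [random.sample(guests, len(guests)) for _ in range(20)]
    for _ in range(200):
        population.sort(key=score, reverse=True)
        elite = population[:10]       # keep the fittest, breed mutants from them
        population = elite + [mutate(random.choice(elite)) for _ in range(10)]

    best = max(population, key=score)
    print(best, score(best))          # a plan satisfying both constraints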
PS: The automatic seat assignment is only a small part of creating a good seating plan. See the PerfectTablePlan tour and the tips page for more details.
http://www.perfecttableplan.com/
I believe this is from a guy that usually posts at Joel On Software.
Never tried it though.
Hope it helps.
Try modeling this using GLPK. Integer linear programming is amenable to introducing constraints into graph-based problems with multiple possible outcomes.
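To sketch what that could look like (a hedged example using PuLP as a front-end, with GLPK pluggable as the solver; the guests, tables, and constraints are invented):

    # Toy ILP seating model; binary variable x[g][t] = guest g sits at table t.
    import pulp

    guests = ["A", "B", "C", "D"]
    tables = [0, 1]
    capacity = 2

    prob = pulp.LpProblem("seating", pulp.LpMinimize)
    x = pulp.LpVariable.dicts("x", (guests, tables), cat="Binary")

    prob += pulp.lpSum([])                 # pure feasibility: constant objective
    for g in guests:                       # each guest sits at exactly one table
        prob += pulp.lpSum(x[g][t] for t in tables) == 1
    for t in tables:                       # respect table capacity
        prob += pulp.lpSum(x[g][t] for g in guests) <= capacity
    for t in tables:                       # A and B must sit together
        prob += x["A"][t] == x["B"][t]
    for t in tables:                       # A and C must not sit together
        prob += x["A"][t] + x["C"][t] <= 1

    prob.solve()                           # pulp.GLPK_CMD() can be passed to use GLPK
    for g in guests:
        for t in tables:
            if pulp.value(x[g][t]) == 1:
                print(g, "-> table", t)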
My personal favorite is to not assign seating: allow folks to sit wherever they want.
That might lead to [un]intentional cliquishness, but it means you're not having to worry about it.
I expect this isn't a great answer, but you could research flocking behavior.
If you take away the random jitters at each step, the flock eventually settles where each member has found its optimum position in relation to the rest of the group.
This sounds like a constraint satisfaction problem. You should probably check out logic programming systems that are also equipped with constraint solvers. They're usually like Prolog, only they are actually declarative for problems that are soluble by their solvers.
Hopefully there is one that has an easy interface from your favourite language, to get the data in and out.
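If Prolog feels like too big a jump, a hedged sketch with the python-constraint library (guest names, tables, and constraints invented) shows the declarative style:

    # Toy constraint-satisfaction seating model with python-constraint.
    from constraint import Problem

    problem = Problem()
    tables = [1, 2]
    for guest in ["X", "Y", "A", "B"]:
        problem.addVariable(guest, tables)   # each guest's table is a variable

    problem.addConstraint(lambda x, y: x == y, ("X", "Y"))   # X and Y together
    problem.addConstraint(lambda a, b: a != b, ("A", "B"))   # A and B apart

    print(problem.getSolutions()[:3])    # a few feasible seatings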
I used a program a while ago that would fit this perfectly... it was a Java app; you could define rules, and it would create test cases that satisfied the rules. The file extension was .als, for example:
fact GateRules {
  all g: Gate | one g.loc  // Gates have 1 Location
}
I'll keep wracking my brain for the program name.
EDIT: It was Alloy
Now that I think about it, it may not be ideal - the notion of "seats in a fixed configuration" would be a little difficult to model. I used it differently: defining rules (an airport gate is in one location, only one plane is on a runway), and testing pre- and postconditions for functions (after I land a plane, can I even have more than one plane on a runway?).