Genetic Programming Online Learning - language-agnostic

Has anybody seen a GP implemented with online learning rather than the standard offline learning? I've done some stuff with genetic programs and I simply can't figure out what would be a good way to make the learning process online.
Please let me know if you have any ideas, seen any implementations, or have any references that I can look at.

Per the Wikipedia link, online learning "learns one instance at a time." The online/offline labels usually refer to how training data is feed to a supervised regression or classification algorithm. Since genetic programming is a heuristic search that uses an evaluation function to evaluate the fitness of its solutions, and not a training set with labels, those terms don't really apply.
If what you're asking is if the output of the GP algorithm (i.e. the best phenotype), can be used while it's still "searching" for better solutions, I see no reason why not, assuming it makes sense for your domain/application. Once the fitness of your GA/GP's population reaches a certain threshold, you can apply that solution to your application, and continue to run the GP, switching to a new solution when a better one becomes available.
One approach along this line is an algorithm called rtNEAT, which attempts to use a genetic algorithm to generate and update a neural network in real time.

I found a few examples by doing a Google scholar search for online Genetic Programming.
An On-Line Method to Evolve Behavior and to Control a Miniature Robot in Real Time with Genetic Programming
It actually looks like they found a way to make GP modify the machine code of the robot's control system during actual activities - pretty cool!
Those same authors went on to produce more related work, such as this improvement:
Evolution of a world model for a miniature robot using genetic programming
Hopefully their work will be enough to get you started - I don't have enough experience with genetic programming to be able to give you any specific advice.

It actually looks like they found a way to make GP modify the machine code of the robot's control system during actual activities - pretty cool!
Yes, the department at Uni Dortmund was heavily into linear GP :-)
Direct execution of GP programs vs. interpreted code has some advantages, though in these days you'd probably rather want to go with dynamic languages such as Java, C# or Obj-C that allow you to write classes/methods at runtime while still you can still benefit from some runtime rather than run on the raw CPU.
The online-learning approach doesn't seem like anything absolutely novel or different from 'classic GP' to me.
From my understanding it's just a case of extending the set of training/fitness/test cases during runtime?
Cheers,
Jay

Related

What are important languages to learn to understand different approaches and concepts? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
When all you have is a pair of bolt cutters and a bottle of vodka, everything looks like the lock on the door of Wolf Blitzer's boathouse. (Replace that with a hammer and a nail if you don't read xkcd)
I currently program Clojure, Python, Java and PHP, so I am familiar with the C and LISP syntax as well as the whitespace thing. I know imperative, functional, immutable, OOP and a couple type systems and other things. Now I want more!
What are languages that take a different approach and would be useful for either practical tool choosing or theoretical understanding?
I don't feel like learning another functional language(Haskell) or another imperative OOP language(Ruby), nor do I want to practice impractical fun languages like Brainfuck.
One very interesting thing I found myself are monoiconic stack based languages like Factor.
Only when I feel I understand most concepts and have answers to all my questions, I want to start thinking about my own toy language to contain all my personal preferences.
Matters of practicality are highly subjective, so I will simply say that learning different language paradigms will only serve to make you a better programmer. What is more practical than that?
Functional, Haskell - I know you said that you didn't want to, but you should really really reconsider. You've gotten some functional exposure with Clojure and even Python, but you've not experienced it to its fullest without Haskell. If you're really against Haskell then good compromises are either ML or OCaml.
Declarative, Datalog - Many people would recommend Prolog in this slot, but I think Datalog is a cleaner example of a declarative language.
Array, J - I've only just discovered J, but I find it to be a stunning language. It will twist your mind into a pretzel. You will thank J for that.
Stack, Factor/Forth - Factor is very powerful and I plan to dig into it ASAP. Forth is the grand-daddy of the Stack languages, and as an added bonus it's simple to implement yourself. There is something to be said about learning through implementation.
Dataflow, Oz - I think the influence of Oz is on the upswing and will only continue to grow in the future.
Prototype-based, JavaScript / Io / Self - Self is the grand-daddy and highly influential on every prototype-based language. This is not the same as class-based OOP and shouldn't be treated as such. Many people come to a prototype language and create an ad-hoc class system, but if your goal is to expand your mind, then I think that is a mistake. Use the language to its full capacity. Read Organizing Programs without Classes for ideas.
Expert System, CLIPS - I always recommend this. If you know Prolog then you will likely have the upper-hand in getting up to speed, but it's a very different language.
Frink - Frink is a general purpose language, but it's famous for its system of unit conversions. I find this language to be very inspiring in its unrelenting drive to be the best at what it does. Plus... it's really fun!
Functional+Optional Types, Qi - You say you've experience with some type systems, but do you have experience with "skinnable* type systems? No one has... but they should. Qi is like Lisp in many ways, but its type system will blow your mind.
Actors+Fault-tolerance, Erlang - Erlang's process model gets a lot of the buzz, but its fault-tolerance and hot-code-swapping mechanisms are game-changing. You will not learn much about FP that you wouldn't learn with Clojure, but its FT features will make you wonder why more languages can't seem to get this right.
Enjoy!
What about Prolog (for unification/backtracking etc), Smalltalk (for "everything's a message"), Forth (reverse polish, threaded interpreters etc), Scheme (continuations)?
Not a language, but the Art of the Metaobject Protocol is mind-bending stuff
I second Haskell. Don't think "I know a Lisp, so I know functional programming". Ever heard of type classes? Algebraic data types? Monads? "Modern" (more or less - at least not 50 years old ;) ) functional languages, especially Haskell, have explored a plethora of very powerful useful new concepts. Type classes add ad-hoc polymorphism, but type inference (yet another thing the languages you already know don't have) works like a charm. Algebraic data types are simply awesome, especially for modelling trees-like data structures, but work fine for enums or simple records, too. And monads... well, let's just say people use them to make exceptions, I/O, parsers, list comprehensions and much more - in purely functional ways!
Also, the whole topic is deep enough to keep one busy for years ;)
I currently program Clojure, Python, Java and PHP [...] What are languages that take a different approach and would be useful for either practical tool choosing or theoretical understanding?
C
There's a lot of C code lying around---it's definitely practical. If you learn C++ too, there's a big lot of more code around (and the leap is short once you know C and Java).
It also gives you (or forces you to have) a great understanding of some theoretical issues; for instance, each running program lives in a 4 GB byte array, in some sense. Pointers in C are really just indices into this array---they're just a different kind of integer. No different in Java, Python, PHP, except hidden beneath a surface layer.
Also, you can write object-oriented code in C, you just have to be a bit manual about vtables and such. Simon Tatham's Portable Puzzle Collection is a great example of fairly accessible object-oriented C code; it's also fairly well designed and well worth a read to a beginner/intermediate C programmer. This is what happens in Haskell too---type classes are in some sense "just another vtable".
Another great thing about C: engaging in Q&A with skilled C programmers will get you a lot of answers that explain C in terms of lower-level constructs, which builds your closer-to-the-iron knowledge base.
I may be missing OP's point---I think I am, judging by the other answers---but I think it might be a useful answer to other people who have a similar question and read this thread.
From Peter Norvig's site:
"Learn at least a half dozen programming languages. Include one language that supports class abstractions (like Java or C++), one that supports functional abstraction (like Lisp or ML), one that supports syntactic abstraction (like Lisp), one that supports declarative specifications (like Prolog or C++ templates), one that supports coroutines (like Icon or Scheme), and one that supports parallelism (like Sisal). "
http://norvig.com/21-days.html
I'm amazed that after 6 months and hundreds of votes, noone has mentioned SQL ...
In the types as theorems / advanced type systems: Coq ( I think Agda comes in this category too).
Coq is a proof assistant embedded into a functional programing language.
You can write mathematical proofs and Coq helps to build a solution.
You can write functions and prove properties about it.
It has dependent types, that alone blew my mind. A simple example:
concatenate: forall (A:Set)(n m:nat), (array A m)->(array A n)->(array A (n+m))
is the signature of a function that concatenates two arrays of size n and m of elements of A and returns an array of size (n+m). It won't compile if the function doesn't return that!
Is based on the calculus of inductive constructions, and it has a solid theory behind it.
I'm not smart enough to understand it all, but I think is worth taking a look, specially if you trend towards type theory.
EDIT: I need to mention: you write a function in Coq and then you can PROVE it is correct for any input, that is amazing!
One of the languages which i am interested for have a very different point of view (including a new vocabulary to define the language elements and a radical diff syntax) is J. Haskell would be the obvious choice for me, although it is a functional lang, cause its type system and other unique features open your mind and makes you rethink you previous knowledge in (functional) programming.
Just like fogus has suggested it to you in his list, I advise you too to look at the language OzML/Mozart
Many paradigms, mainly targetted at concurrency/multi agent programming.
Concerning concurrency, and distributed calculus, the equivalent of Lambda calculus (which is behind functionnal programming) is called the Pi Calculus.
I have only started begining to look at some implementation of the Pi calculus. But they already have enlarged my conceptions of computing.
Pict
Nomadic Pict
FunLoft. (this one is pretty recent, conceived at INRIA)
Dataflow programming, aka flow-based programming is a good step ahead on the road. Some buzzwords: paralell processing, rapid prototyping, visual programming (not as bad as sounds first).
Wikipedia's articles are good:
In computer science, flow-based
programming (FBP) is a programming
paradigm that defines applications as
networks of "black box" processes,
which exchange data across predefined
connections by message passing, where
the connections are specified
externally to the processes. These
black box processes can be reconnected
endlessly to form different
applications without having to be
changed internally. FBP is thus
naturally component-oriented.
http://en.wikipedia.org/wiki/Flow-based_programming
http://en.wikipedia.org/wiki/Dataflow_programming
http://en.wikipedia.org/wiki/Actor_model
Read JPM's book: http://jpaulmorrison.com/fbp/
(We've written a simple implementation in C++ for home automation purposes, and we're very happy with it. Documentation is under construction.)
You've learned a lot of languages. Now is the time to focus on one language, and master it.
perhaps you might want to try LabView for it's visual programming, although it's for engineering purposes.
nevertheless, you seem pretty interested in all that's out there, hence the suggestion
also, you could try the android appinventor for visually building stuff
Bruce A. Tate, taking a page from The Pragmatic Programmer wrote a book on exactly that:
Seven Languages in Seven Weeks: A Pragmatic Guide to Learning Programming Languages
In the book, he covers Clojure, Haskell, Io, Prolog, Scala, Erlang, and Ruby.
Mercury: http://www.mercury.csse.unimelb.edu.au/
It's a typed Prolog, with uniqueness types and modes (i.e. specifying that the predicate append(X,Y,Z) meaning X appended to Y is Z yields one Z given an X and Y, but can yield multiple X/Ys for a given Z). Also, no cut or other extra-logical predicates.
If you will, it's to Prolog as Haskell is to Lisp.
Programming does not cover the task of programmers.
New things are always interesting, but there are some very cool old stuff.
The first database system was dBaseIII for me, I was spending about a month to write small examples (dBase/FoxPro/Clipper is a table-based db with indexes). Then, at my first workplace, I met MUMPS, and I got headache. I was young and fresh-brained, but it took 2 weeks to understand the MUMPS database model. There was a moment, like in comics: after 2 weeks, a button has been switched on, and the bulb has just lighten up in my mind. MUMPS is natural, low level, and very-very fast. (It's an unbalanced, unformalized btree without types.) Today's trends shows the way back to it: NoSQL, key-value db, multidimensional db - so there are only some steps left, and we reach Mumps.
Here's a presentation about MUMPS's advantages: http://www.slideshare.net/george.james/mumps-the-internet-scale-database-presentation
A short doc on hierarchical db: http://www.cs.pitt.edu/~chang/156/14hier.html
An introduction to MUMPS globals (in MUMPS, local variables, short: locals are the memory variables, and the global variables, short: globals are the "db variables", setting a global variable goes to the disk immediatelly):
http://gradvs1.mgateway.com/download/extreme1.pdf (PDF)
Say you want to write a love poem...
Instead of using a hammer just because there's one already in your hand, learn the proper tools for the task: learn to speak French.
Once you've reached near-native speaking level, you're ready to start your poem.
While learning new languages on an academical level is an interesting hobby, IMHO you can't really learn to use one until you try to apply it to a real world problem. So, rather than looking for a new language to learn, I'd in your place first look for a new things to build, and only then I'd look for the right language to use for that one specific project. First pick the problem, then the tool, not the other way around..
For anyone who hasn't been around since the mid 80's, I'd suggest learning 8-bit BASIC. It's very low-level, very primitive and it's an interesting exercise to program around its holes.
On the same line, I'd pick an HP-41C series calculator (or emulator, although nothing beats real hardware). It's hard to wrap your brain around it, but well worth it. A TI-57 will do, but will be a completely different experience. If you manage to solve second degree equations on a TI-55, you'll be considered a master (it had no conditionals and no branches except a RST, that jumped the program back to step 0).
And last, I'd pick FORTH (it was mentioned before). It has a nice "build your language" Lisp-ish thing, but is much more bare metal. It will teach you why Rails is interesting and when DSLs make sense and you'll have a glipse on what your non-RPN calculator is thinking while you type.
PostScript. It is a rather interesting language as it's stack based, and it's quite practical once you want to put things on paper and you want either to get it done or troubleshoot why isn't it getting done.
Erlang. The intrinsic parallelism gives it a rather unusual feel and you can again learn useful things from that. I'm not so sure about practicality, but it can be useful for some fast prototyping tasks and highly redundant systems.
Try programming GPUs - either CUDA or OpenCL. It's just C/C++ extensions, but the mental model of the architecture is again completely different from the classic approach, and it definitely gets practical once you need to get some real number crunching done.
Erlang, Forth and some embedded work with assembly language. Really; buy an Arduino kit or something similar, and create a polyphonic beep in assembly. You'll really learn something.
There's also anic:
https://code.google.com/p/anic/
From its site:
Faster than C, Safer than Java, Simpler than *sh
anic is the reference implementation compiler for the experimental, high-performance, implicitly parallel, deadlock-free general-purpose dataflow programming language ANI.
It doesn't seem to be under active development anymore, but it seems to have some interesting concepts (and that, after all, is what you seem to be after).
While not meeting your requirement of "different" - I'd wager that Fantom is a language that a professional programmer should look at. By their own admission, the authors of fantom call it a boring language. It merely shores up the most common use cases of Java and C#, with some borrowed closure syntax from ruby and similar newer languages.
And yet it manages to have its own bootstrapped compiler, provide a platform that has a drop in install with no external dependencies, gets packages right - and works on Java, C# and now the Web (via js).
It may not widen your horizons in terms of new ways of programming, but it will certainly show you better ways of programming.
One thing that I see missing from the other answers: languages based on term-rewriting.
You could take a look at Pure - http://code.google.com/p/pure-lang/ .
Mathematica is also rewriting based, although it's not so easy to figure out what's going on, as it's rather closed.
APL, Forth and Assembly.
Have some fun. Pick up a Lego Mindstorm robot kit and CMU's RobotC and write some robotics code. Things happen when you write code that has to "get dirty" and interact with the real world that you cannot possibly learn in any other way. Yes, same language, but a very different perspective.

Your criteria in using a new technology or programming language

What are your criteria or things that you consider when you are an early adopter of a programming language or technology?
Two of the most common explanations I've heard are:
It should be "fun" (what I've heard from technical people).
It should be capable of solving our problem (what I've heard from business people).
So what's yours?
I've made this change several times over my career spanning various companies, moving from C to Java to Ruby to Haskell for the majority of my software development.
In all cases, I've been looking for more expressive power and better abstractions. This is always driven by business needs: how can I develop better software more cheaply? To me, the challenge of this problem is "fun," so fun rather automatically comes along with it. Justifying the business value to managers can be difficult, however; they often don't have the technical skills to understand why one programming language can be better than another, and are worried about moving to technology that they understand even less than the current one. (I solved this problem by taking over the manager's job as well: I started a company.)
It's hard to say what exactly to look for in a new language. You obviously don't have a detailed grasp of the language, or you would already be using it or know why you're not. Vast experience will bring an instinct that will make certain languages "smell" better than others, but—and this can make it especially hard to convince others to look at a new language—you won't know precisely what features give you big advantages. An example would be pattern matching: it's a feature found in relatively few languages, and though I knew about it, I had no idea when I started in with Haskell that this would be a key contributor to productivity improvement.
While it's negative ("avoid this") advice rather than positive ("do this") advice, one fairly easy rule is to avoid spending a lot of time on languages very similar to ones you already know well. If you already know Ruby, learning Python is not likely to teach you much in the way of big new things; C# and Java would be another example. (Although C# is starting to get a few interesting features that Java doesn't have.)
Looking at what the academic community is doing with a language may be helpful. If it's a fertile area of research for academics, there's almost certainly going to be interesting stuff in there, whereas if it's not it's quite possible that there's nothing interesting there to learn.
My criteria is simple:
wow factor
simple
gets things done
quick
I want it to do something easily that is hard to do with the tools I'm used to. So I moved to Python, and then Ruby, over Java because I could build a program incrementally, add functions easily, and express programs more concisely (esp. with Ruby, where I can pass blocks/Procs and have clean closures, plus the ability to define nice DSLs making use of blocks and yield.)
I took up Erlang because it expresses Actor-based concurrency well; this makes for easier network programs.
I took up Haskell because it fit with a number of formal methods tools I wanted to experiment with.
Open source.
Active developer community
Active user community, with a friendly mailing list or forum.
Some examples and documentation, preferably a tutorial
Desirable features (solves problems).
If it's for my personal fun, I need very little excuse, as I do love learning new things, and the best way of learning is by doing. If it's for an employer, customer, or client, the bar is MUCH higher -- I must be convinced that the "new stuff", even after accounting for ramp-up effects and the costs that come with being at the bleeding edge, will do a substantially better job at delivering value to the client (or customer or employer). It's a matter of professional attitude: my job's to deliver top value to the client -- having fun while so doing is auxiliary and secondary. So, in practice, "new" technologies (including languages) that I introduce in a professional setting will generally be ones I've previously grown comfortable and confident with in my own spare time.
Someone has once said something to the effect of:
"If learning a programming language doesn't change the way you think about programming, it's not worth learning."
That's one metric (out of many) to judge the value of learning new languages (or other technology) by. Using this, one might suggest learning the following languages:
C, because it makes you understand the Von Neumann architecture better than any other language (and it's sort-of random-access Turing Machine like, sorta'...).
LaTeX (as a programming language, not only as a typesetting system) because it makes you learn about string rewriting systems as a model of computation. Here, sed is similar; learn both, because they're also both useful tools :-)
Haskell, because it teaches you about functional programming, lambda calculus (yet another model of computation), lazy evaluation, type inference, algebraic datatypes (done with ease), decidability of type systems (i.e. learn to fear C++)
Scheme `(or (another) ,Lisp) for its macro system, and dynamic typing, and functional programming done somewhat differently.
SmallTalk, to learn Object-orientation (so I hear)
Java, to learn what earning money feels like :D
Forth, because wisdom bestowed forth learned implies.
... that doesn't explain why I learn python or shell scripting, though. I think you should take enlightenment with a grain of salt and a shovelful of pragmatism :)
Should be capable of solving the problem
Should be more adequate to solve the problem than other alternatives
Should be fun
Should have prompt support, either from a community or the company promoting it
A language should be:
Easy to use, to learn and to code in.
Consistent. Many languages have 50 legacy ways of doing things, this increases the learning curve and turns quite annoying. C# for me is one of those languages.
It should provide the most useful solution with the least amount of code. On the other hand sometimes you do need a bit of expressiveness to make sure you're not making a huge mistake.
The right tool for the right job and maybe the right tool for any job
My criteria that the language should have:
1. New ideas - If the language is just another Scheme variant, if you know one than I don't feel the need to learn this new one. I will learn it if I think I will learn something new.
2. Similar to another language, but better. For example, while Java and C++ have many of the same ideas, Java's automatic garbage collection makes it a better choice in many cases.
Gets the most done with the least amount of effort
Extremely interoperable with different protocols, out of the box
Fast
Has lots of libraries built in for stuff 99% of web developers do (PDF's, emailing, reporting, etc..)
It depends on why I'm learning the new language. If I'm learning it for fun, then it has to meet these criteria:
Is well it supported on my platform?
Something that runs only on Linux
isn't interesting to a Windows
programmer.
Will I learn something new? In
other words, does it come up with a
new way of doing things?
Does it look fun? I don't want to learn Ada even if it has new ways of doing things.
If I'm learning it for work, the criteria are different:
How mature is it? Has it been
proven to work in the real world?
How big is the community?
Will it make my job easier? I.e. is
it worth the time investment versus
just doing the task with a language
I already know.

How do you plan an application's architecture before writing any code? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
One thing I struggle with is planning an application's architecture before writing any code.
I don't mean gathering requirements to narrow in on what the application needs to do, but rather effectively thinking about a good way to lay out the overall class, data and flow structures, and iterating those thoughts so that I have a credible plan of action in mind before even opening the IDE. At the moment it is all to easy to just open the IDE, create a blank project, start writing bits and bobs and let the design 'grow out' from there.
I gather UML is one way to do this but I have no experience with it so it seems kind of nebulous.
How do you plan an application's architecture before writing any code? If UML is the way to go, can you recommend a concise and practical introduction for a developer of smallish applications?
I appreciate your input.
I consider the following:
what the system is supposed to do, that is, what is the problem that the system is trying to solve
who is the customer and what are their wishes
what the system has to integrate with
are there any legacy aspects that need to be considered
what are the user interractions
etc...
Then I start looking at the system as a black box and:
what are the interactions that need to happen with that black box
what are the behaviours that need to happen inside the black box, i.e. what needs to happen to those interactions for the black box to exhibit the desired behaviour at a higher level, e.g. receive and process incoming messages from a reservation system, update a database etc.
Then this will start to give you a view of the system that consists of various internal black boxes, each of which can be broken down further in the same manner.
UML is very good to represent such behaviour. You can describe most systems just using two of the many components of UML, namely:
class diagrams, and
sequence diagrams.
You may need activity diagrams as well if there is any parallelism in the behaviour that needs to be described.
A good resource for learning UML is Martin Fowler's excellent book "UML Distilled" (Amazon link - sanitised for the script kiddie link nazis out there (-: ). This book gives you a quick look at the essential parts of each of the components of UML.
Oh. What I've described is pretty much Ivar Jacobson's approach. Jacobson is one of the Three Amigos of OO. In fact UML was initially developed by the other two persons that form the Three Amigos, Grady Booch and Jim Rumbaugh
I really find that a first-off of writing on paper or whiteboard is really crucial. Then move to UML if you want, but nothing beats the flexibility of just drawing it by hand at first.
You should definitely take a look at Steve McConnell's Code Complete-
and especially at his giveaway chapter on "Design in Construction"
You can download it from his website:
http://cc2e.com/File.ashx?cid=336
If you're developing for .NET, Microsoft have just published (as a free e-book!) the Application Architecture Guide 2.0b1. It provides loads of really good information about planning your architecture before writing any code.
If you were desperate I expect you could use large chunks of it for non-.NET-based architectures.
I'll preface this by saying that I do mostly web development where much of the architecture is already decided in advance (WebForms, now MVC) and most of my projects are reasonably small, one-person efforts that take less than a year. I also know going in that I'll have an ORM and DAL to handle my business object and data interaction, respectively. Recently, I've switched to using LINQ for this, so much of the "design" becomes database design and mapping via the DBML designer.
Typically, I work in a TDD (test driven development) manner. I don't spend a lot of time up front working on architectural or design details. I do gather the overall interaction of the user with the application via stories. I use the stories to work out the interaction design and discover the major components of the application. I do a lot of whiteboarding during this process with the customer -- sometimes capturing details with a digital camera if they seem important enough to keep in diagram form. Mainly my stories get captured in story form in a wiki. Eventually, the stories get organized into releases and iterations.
By this time I usually have a pretty good idea of the architecture. If it's complicated or there are unusual bits -- things that differ from my normal practices -- or I'm working with someone else (not typical), I'll diagram things (again on a whiteboard). The same is true of complicated interactions -- I may design the page layout and flow on a whiteboard, keeping it (or capturing via camera) until I'm done with that section. Once I have a general idea of where I'm going and what needs to be done first, I'll start writing tests for the first stories. Usually, this goes like: "Okay, to do that I'll need these classes. I'll start with this one and it needs to do this." Then I start merrily TDDing along and the architecture/design grows from the needs of the application.
Periodically, I'll find myself wanting to write some bits of code over again or think "this really smells" and I'll refactor my design to remove duplication or replace the smelly bits with something more elegant. Mostly, I'm concerned with getting the functionality down while following good design principles. I find that using known patterns and paying attention to good principles as you go along works out pretty well.
http://dn.codegear.com/article/31863
I use UML, and find that guide pretty useful and easy to read. Let me know if you need something different.
UML is a notation. It is a way of recording your design, but not (in my opinion) of doing a design. If you need to write things down, I would recommend UML, though, not because it's the "best" but because it is a standard which others probably already know how to read, and it beats inventing your own "standard".
I think the best introduction to UML is still UML Distilled, by Martin Fowler, because it's concise, gives pratical guidance on where to use it, and makes it clear you don't have to buy into the whole UML/RUP story for it to be useful
Doing design is hard.It can't really be captured in one StackOverflow answer. Unfortunately, my design skills, such as they are, have evolved over the years and so I don't have one source I can refer you to.
However, one model I have found useful is robustness analysis (google for it, but there's an intro here). If you have your use-cases for what the system should do, a domain model of what things are involved, then I've found robustness analysis a useful tool in connecting the two and working out what the key components of the system need to be.
But the best advice is read widely, think hard, and practice. It's not a purely teachable skill, you've got to actually do it.
I'm not smart enough to plan ahead more than a little. When I do plan ahead, my plans always come out wrong, but now I've spend n days on bad plans. My limit seems to be about 15 minutes on the whiteboard.
Basically, I do as little work as I can to find out whether I'm headed in the right direction.
I look at my design for critical questions: when A does B to C, will it be fast enough for D? If not, we need a different design. Each of these questions can be answer with a spike. If the spikes look good, then we have the design and it's time to expand on it.
I code in the direction of getting some real customer value as soon as possible, so a customer can tell me where I should be going.
Because I always get things wrong, I rely on refactoring to help me get them right. Refactoring is risky, so I have to write unit tests as I go. Writing unit tests after the fact is hard because of coupling, so I write my tests first. Staying disciplined about this stuff is hard, and a different brain sees things differently, so I like to have a buddy coding with me. My coding buddy has a nose, so I shower regularly.
Let's call it "Extreme Programming".
"White boards, sketches and Post-it notes are excellent design
tools. Complicated modeling tools have a tendency to be more
distracting than illuminating." From Practices of an Agile Developer
by Venkat Subramaniam and Andy Hunt.
I'm not convinced anything can be planned in advance before implementation. I've got 10 years experience, but that's only been at 4 companies (including 2 sites at one company, that were almost polar opposites), and almost all of my experience has been in terms of watching colossal cluster********s occur. I'm starting to think that stuff like refactoring is really the best way to do things, but at the same time I realize that my experience is limited, and I might just be reacting to what I've seen. What I'd really like to know is how to gain the best experience so I'm able to arrive at proper conclusions, but it seems like there's no shortcut and it just involves a lot of time seeing people doing things wrong :(. I'd really like to give a go at working at a company where people do things right (as evidenced by successful product deployments), to know whether I'm a just a contrarian bastard, or if I'm really as smart as I think I am.
I beg to differ: UML can be used for application architecture, but is more often used for technical architecture (frameworks, class or sequence diagrams, ...), because this is where those diagrams can most easily been kept in sync with the development.
Application Architecture occurs when you take some functional specifications (which describe the nature and flows of operations without making any assumptions about a future implementation), and you transform them into technical specifications.
Those specifications represent the applications you need for implementing some business and functional needs.
So if you need to process several large financial portfolios (functional specification), you may determine that you need to divide that large specification into:
a dispatcher to assign those heavy calculations to different servers
a launcher to make sure all calculation servers are up and running before starting to process those portfolios.
a GUI to be able to show what is going on.
a "common" component to develop the specific portfolio algorithms, independently of the rest of the application architecture, in order to facilitate unit testing, but also some functional and regression testing.
So basically, to think about application architecture is to decide what "group of files" you need to develop in a coherent way (you can not develop in the same group of files a launcher, a GUI, a dispatcher, ...: they would not be able to evolve at the same pace)
When an application architecture is well defined, each of its components is usually a good candidate for a configuration component, that is a group of file which can be versionned as a all into a VCS (Version Control System), meaning all its files will be labeled together every time you need to record a snapshot of that application (again, it would be hard to label all your system, each of its application can not be in a stable state at the same time)
I have been doing architecture for a while. I use BPML to first refine the business process and then use UML to capture various details! Third step generally is ERD! By the time you are done with BPML and UML your ERD will be fairly stable! No plan is perfect and no abstraction is going to be 100%. Plan on refactoring, goal is to minimize refactoring as much as possible!
I try to break my thinking down into two areas: a representation of the things I'm trying to manipulate, and what I intend to do with them.
When I'm trying to model the stuff I'm trying to manipulate, I come up with a series of discrete item definitions- an ecommerce site will have a SKU, a product, a customer, and so forth. I'll also have some non-material things that I'm working with- an order, or a category. Once I have all of the "nouns" in the system, I'll make a domain model that shows how these objects are related to each other- an order has a customer and multiple SKUs, many skus are grouped into a product, and so on.
These domain models can be represented as UML domain models, class diagrams, and SQL ERD's.
Once I have the nouns of the system figured out, I move on to the verbs- for instance, the operations that each of these items go through to commit an order. These usually map pretty well to use cases from my functional requirements- the easiest way to express these that I've found is UML sequence, activity, or collaboration diagrams or swimlane diagrams.
It's important to think of this as an iterative process; I'll do a little corner of the domain, and then work on the actions, and then go back. Ideally I'll have time to write code to try stuff out as I'm going along- you never want the design to get too far ahead of the application. This process is usually terrible if you think that you are building the complete and final architecture for everything; really, all you're trying to do is establish the basic foundations that the team will be sharing in common as they move through development. You're mostly creating a shared vocabulary for team members to use as they describe the system, not laying down the law for how it's gotta be done.
I find myself having trouble fully thinking a system out before coding it. It's just too easy to only bring a cursory glance to some components which you only later realize are much more complicated than you thought they were.
One solution is to just try really hard. Write UML everywhere. Go through every class. Think how it will interact with your other classes. This is difficult to do.
What I like doing is to make a general overview at first. I don't like UML, but I do like drawing diagrams which get the point across. Then I begin to implement it. Even while I'm just writing out the class structure with empty methods, I often see things that I missed earlier, so then I update my design. As I'm coding, I'll realize I need to do something differently, so I'll update my design. It's an iterative process. The concept of "design everything first, and then implement it all" is known as the waterfall model, and I think others have shown it's a bad way of doing software.
Try Archimate.

Development Cost of Procedural Programming vs. OOP?

I come from a fairly strong OO background, the benefits of OOD & OOP are second nature to me, but recently I've found myself in a development shop tied to a procedural programming habits. The implementation language has some OOP features, they are not used in optimal ways.
Update: everyone seems to have an opinion about this topic, as do I, but the question was:
Have there been any good comparative studies contrasting the cost of software development using procedural programming languages versus Object Oriented languages?
Some commenters have pointed out the dubious nature of trying to compare apples to oranges, and I agree that it would be very difficult to accurately measure, however not entirely impossible perhaps.
Most all of these questions are confounded by the problem that individual programmer productivity varies by an order of magnitude or more; if you happen to have an OO programmer who is one of the gruop at productivity x, and a "procedural" programmer who is a 10x programmer, the procedural programmer is liable to win even if OO is faster in some sense.
There's also the problem that coding productivity is usually only 10-20 percent of the total effort in a realistic project, so higher productivity doesn't have much impact; even that hypothetical 10x programmer, or an infinitely fast programmer, can't cut the overall effort by more that 10-20 percent.
You might have a look at Fred Brooks' paper "No Silver Bullet".
After poking around with google I found this paper here. The search terms I used are Productivity object oriented.
The opening paragraphs goes on to say
Introduction of object-oriented
technology does not appear to hinder
overall productivity on new large
commercial projects, but it neither
seems to improve it in the first two
product generations. In practice, the
governing influence may be the
business workflow and not the
methodology.
I think you will find that Object Oriented Programming is better in specific circumstances but neutral for everything else. What sold my bosses on converting my company's CAD/CAM application to a object oriented framework is that I precisely showed the exact areas in which it will help. The focus wasn't on the methodology as a whole but how it will help us sold some specific problem we had. For us was having a extensible framework for adding more shapes, reports, and machine controllers, and using collections to remove the memory limitation of the older design.
OO or procedural offer to different way to develop and both can be costly if badly managed.
If we suppose that the works are done by the best person in both case, I think the result might be equal in term of cost.
I believe the cost difference will be on how you will be the maintenance phase where you will need to add features and modify current features. Procedural project are harder to have automatic testing, are less subject to be able to expand without affecting other part and is more harder to understand the concept part by part (because cohesive part aren't grouped together necessary).
So, I think, the OO cost will be lower in the long run compared to Procedural.
i think S.Lott was referring to the "unrepeatable experiment" phenomenon, i.e. you cannot write application X procedurally then rewind time and write it OO to see what the difference is.
you could write the same app twice two different ways, but
you would learn something about the app doing it the first way that would help you in the second way, and
you may be better at OO than at procedural, or vice-versa, depending on your experience and the nature of the application and the tools chosen
so there really is no direct basis for comparison
empirical studies are likewise useless, for similar reasons - different applications, different teams, etc.
paradigm shifts are difficult, and a small percentage of programmers may never make the transition
if you are free to develop your way, then the solution is simple: develop things your way, and when your co-workers notice that you are coding circles around them and your code doesn't break nearly as often etc. and they ask you how you do it, then teach them OOP (along with TDD and any other good practices you may use)
if not, well, it might be time to polish the resume... ;-)
Good idea. A head-to-head comparison. Write application X in a procedural style, and in an OO style and measure something. Cost to develop. Return on Investment.
What does it mean to write the same application in two styles? It would be a different application, wouldn't it? The procedural people would balk that the OO folks were cheating when they used inheritance or messaging or encapsulation.
There can't be such a comparison. There's no basis for comparing two "versions" of an application. It's like asking if apples or oranges are more cost-effective at being fruit.
Having said that, you have to focus on things other folks can actually see.
Time to build something that works.
Rate of bugs and problems.
If your approach is better, you'll be successful, and people will want to know why.
When you explain that OO leads to your success... well... you've won the argument.
The key is time. How long does it take the company to change the design to add new features or fix existing ones. Any study you make should focus on that area.
My company had a event driven procedure oriented design for a CAM software in the mid 90's created using VB3. It was taking a long time to adapt the software to new machines. A long time to test the effects of bug fixes and new features.
With VB6 came along I was able to graph out the current design and a new design that fixed the testing and adaptation problem. The non-technical boss grasped what I was trying doing right away.
The key is to explain WHY OOP will benefit the project. Use things like Refactoring by Fowler and Design Patterns to show how a new design will lower the time to do things. Also include how you get from Point A to Point B. Refactoring will help with showing how you can have working intermediate stages that can be shipped.
I don't think you'll find a study like that. At least you should define what you mean by "cost". Because OOP designing is somehow slower, so on the short term development is maybe faster with procedural programming. On very short term maybe spaghetti coding is even more faster.
But when project begins growing things are opposite, because OOP designing is best featured to manage code complexity.
So in a small project maybe procedural design MAY be cheaper, because it's faster and you don't have drawbacks.
But in a big project you'll get stick very quickly using only a simple paradigm like procedural programming
I doubt you will find a definitive study. As several people have mentioned this is not a reproducible experiment. You will find anecdotal evidence, a lot of it. Some people may find some statistical studies, but I would examine them carefully. I am not aware of any really good ones.
I also will make another point, there is no such thing as purely object oriented or purely procedural in the real world. Many if not most object methods are written with procedural code. At the same time many procedural programs use OO methodologies such as encapsulation (also call abstraction by some).
Don't get me wrong, OO and procedural programs look and are different, but it is a matter of dark gray vs light gray instead of black and white.
This article says nothing about OOP vs Procedural. But I'd think that you could use similar metrics from your company for a discussion.
I find it interesting as my company is starting to explore the ROWE initiative. In our first session, it was apparent that we don't currently capture enough metrics on outcomes.
So you need to focus on 1) Is the maintenance of current processes impeding future development? 2) How are different methods going to affect #1?

Ratio of real code to supporting code

I'm finding only about 30% of my code actually solves problems, the rest is taken up by logging, tests, parameter checking, exceptions, error handling and so on. Do you find that in your code, and is there an IDE/Editor that allows you to hide code that's not interesting?
OTOH are there languages which make the support code more manageable and smaller in size?
Edit - I think we're all aware of the difference between business logic and other code. I'm not saying that the logging etc is not important. The things is, when I'm coding I'm either implementing business logic, or I'm making sure things don't break. For me that's two different ways of thinking, do others develop like that, and is there an IDE that supports that way of developing?
Supporting code is just as important as the "real code". The quality of your product is determined as much by supporting code as anything else.
Consider an automobile. In terms of just getting from point A to point B, that requires nothing more than a go-cart: a frame, a seat, an engine, a few tires. But modern cars have a lot more than just the basics. Highly efficient engines using electronic engine timing. Automatic transmissions. Bucket seats. Heating and A/C. Rack and pinion steering. Power brakes. Anti-lock brakes. Quiet, comfortable cabins protected from the weather. Air bags. Crumple zones and other advanced safety features. Etc. Etc.
Details and execution are important, even in software. If you find that your "supporting code" tends to look more like kludges and hacks, then it's time to rethink your fundamental approach. But ultimately the fit and finish determines quality of the end product as much as anything else.
Edit: The questions you should ask yourself:
Is your "supporting code":
An umbrella duct taped to a pole or a metal and glass cabin frame?
A piece of pipe tied to the front of the car or an energy absorbing bumper integrated into a crumple zone?
A grappling hook on a rope tied to the frame or 4-wheel anti-lock power brakes?
A pair of goggles and a thick coat or a windshield and a heating system?
Answers to these questions will probably affect how much you care about your "supporting code".
Edit: Response to Dave Turvey's comment:
I'd encourage rereading the original question, one of the examples of "support code" listed is "error handling". Consider this for a moment. Imagine it in the context of, say, an automobile, a microwave oven, or even an operating system. Should error handling be relegated to second class citizenship because it serves a "support" function in some abstract sense? In an automobile the safety features are part of the fundamental design of the vehicle and comprise a substantial part of the value of the car. The safety features and "error handling" of a microwave oven (indeed, of the microwave oven's embedded software as well) are an important part of its value as well. A microwave oven that was improperly shielded could cook food just fine, under the right circumstances, but it would pose a hazard to the operator.
The implicit featureset of every tool (software or otherwise) includes this list:
Robustness
Usability
Performance
Everything anyone has ever built or used has had these features. Failure to understand this will translate to failure to execute well on these features which will make for a poor quality product of low value and low commercial interest. There is no such thing as "support code", there is only a misunderstanding of the nature of what it means for a feature to be complete. A "feature" that works in the abstract only under laboratory conditions is an experiment, not a part of a product.
The idea of pure, pristine features floating on a bog of dirty, ugly support code is the wrong image of software development. Instead, think of elegant, superbly-integrated machinery that is well-built, intuitive to use, and powerful.
The supporting code is important, but you want not to be distracted by it when you don't want to. There are two technologies that can help.
A language with first-class functions will help you modularize your code so that logging, timing, and so on can be implemented once and then combined with many other modules. It will also help you write unit tests. Some good ways to learn the techniques are to read the paper Why Functional Programming Matters and to learn about the QuickCheck tool. (No, I am not a shill for John Hughes, but he does do wonderful work.)
If you cannot use a programming language with powerful capabilities for modularization and reuse, or if you don't want to, Don Knuth's Literate Programming technique will help you organize your code so that you can split up parts the way you want and pay attention only to what you want, when you want. The Noweb literate-programming tool supports any language that can be written in ASCII, and also combinations of those languages.
If my IDE could hide "not interesting code" I would definitely turn the feature off. You wouldn't want that happening, I bet :)
There are certainly languages that minimise the amount of supporting code, but I don't think you could switch from Java to lets say JavaScript simply because in JavaScript you wouldn't have to declare every exception... I think it's quite necessary to have your supporting code where it is.
Oh, and you could have your program formally specified and mathematically proven, then you wouldn't need to support your code too much ;D
The real code you are referring to is usually called "Business Logic".
In a good unit testing system, your unit tests should be in their own classes (and probably their own assemblies) so that shouldn't be an issue.
The rest is language based for the most part. The more advanced a language, the better it's ability to avoid writing support code to some degree. Also, a well-targetted development system can help you avoid writing a lot of code (Visual Basic's screen builder, Ruby on Rails, ...) but these abstractions can break down and cause you to write just as much code as anything else if you use it to develop targets outside it's intended types of apps. (VB & Ruby don't help all that much if you're calculating prime numbers)
Beyond the language/platform, you have refactoring--the art of eliminating all the supporting code that you can (as well as redundancies in your business logic) to keep your code-base clean and small.
When practicing advanced refactoring, you'll probably end up writing tools for yourself.
Sometimes abstracting data out of your code and into a structured file of some sort can eliminate huge piles of support code and move the rest into "Business logic" because now parsing that data and setting it up is part of the "business" your program does.
This is a good trade-off because this type of business logic tends to be more readable and easier to factor. The other advantage of this kind of abstraction is that all your "Configuration" is now done in data which tends to make it somebody elses' problem.
As an example of this type of tooling: Rails itself! It takes a lot of the boilerplate of web development and factors it out of the code and into libraries driven by data and simplistic code (Ruby blurs the line between code and data--their data files actually loop back to being specified in Ruby code!)
It's like you want to take a trip to the top of Pike's Peak. You can take the Winnebago, you can take your SUV, or a motorcycle, or ride up on your bike.
Some ways are a more or less expensive, faster, etc. Sometimes you end up taking along a lot of stuff the isn't there strictly for accomplishing vertical progress; it's nice to have a beer in the cooler. But it pays to remember that you're responsible for everything that goes with you to the top.
Aspect Oriented Programming partly addresses this. It allows you to inject code into existing source/bytecode. This way you can make a task such as logging appear in its own module instead of woven into the business logic.
Work expands to fill it's container. This really sounds like an economics question. (ie. optimizing your outputs- features for users and features for the developer) with expensive inputs (time spent writing features, time spent writing plumbing code.)
You have to include user visible features or you don't have a viable product or job. Once that is done partly done, your remaining budget of time will be split between activities with a visible return on effort and an invisible (but positive!) return on effort, like exception logging, memory management, etc.
What ever language makes it cheaper to implement features will probably increase your features/to plumbing code ratio. Likewise, whatever language makes it cheaper to implement plumbing code will probably increase the feature to plumbing code because you'll have freed up more time to write features.
Like all optimization problems you'd have 2 effects-- the increase in the size of the support code (because say, you're using cheap code generation) and the increase in the size of feature related code (because you have more time left over to write features), so the final ratio might be hard to predict.
I do not begrudge the 90% of my code that is data access plumbing, because it is all testable, code generated and very cheap, compared to the 10% of handwritten of domain specific code.
I don't try to make all routines foolproof, only those exposed to the outside world.
http://en.wikipedia.org/wiki/Folding_editor
Higher and more dynamic languages are usually less verbose. Weak typing also saves a lot of code. Of course there are trade-offs.
I use the #region directive in Visual Studio to collapse blocks of code that are not the primary focus, e.g. unit tests. With log4net logging statements are only ever one line. I haven't found any approaches to reduce the parameter checking code although it sounds like C# 4 has some kind of contract framework that will help there.
I have some coworkers who once, while being chewed out by a client for an overdue and bug-ridden project, bragged to the customer that they had written 5 times as much test code as operational code.
The client was not happy, and by "not happy" I mean their skin turned green, they grew to 5 times their normal size, and their clothes popped off.
You could just make a static class in a utilities assembly that checks your parameters and things. For instance in the Spring Framework (which is where I got the idea) it has an Assert class and it makes it really fast to make sure that string params aren't empty or that object params aren't null. It cleans up validation code and reduces duplicate code which is a win win.