Why are there so many data structures absent from high level languages? - language-agnostic

Why is it that higher level languages (Javascript, PHP, etc.) don't offer data structures such as linked lists, queues, binary trees, etc. as part of their standard library? Is it for historical/practical/cultural reasons or is there something more fundamental that I'm missing.

linked lists
You can implement a linked list fairly easily in most dynamic languages, but they aren't that useful. For most cases, dynamic arrays (which most dynamic languages have built-in support for) are a better fit: they have better usage and cache coherence, better performance on indexed lookup, and decent performance on insert and delete. At the application level, there aren't many use cases where you really need a linked list.
queues
Easily implemented using a dynamic array.
binary trees
Binary trees are easily implemented in most dynamic languages. Binary search trees are, as ever, a chore to implement, but rarely needed. Hashtables will give you about the same performance and often better for many use cases. Most dynamic languages provide a built-in hashtable (or dictionary, or map, or table, or whatever you want to call it).
It's definitely important to know these foundational data structures, and if you're writing low-level code, you'll find yourself using them. At the application level where most dynamic languages are used though, hashtables and dynamic arrays really do cover 95% of your data structure needs.

There are many definitions of "high level" when it comes to languages, but I will take it in this context to refer to languages that are purpose-built for a specific domain (4GL anyone?). Such "specific domains" are typically restricted in scope, for example: web page construction, report writing, database querying, etc. Within that limited scope, there is frequently little need for anything but the most basic data structures.
Is There a Need?
Let's consider the case of Javascript. The scope of this language was originally very bounded, being a scripting language than ran within the confines of a web browser. It was concerned primarily with providing a small amount of dynamic behaviour on otherwise static web pages. Furthermore, the limitations of the technology made it impractical to write large components in that environment (notably performance and the sandbox model).
Since Javascript was confined to address "small problems", there was little need for a rich set of data structures. As data structures go, the Javascript's map is very flexible. You must remember that Basic and FORTRAN went a long way providing only arrays -- maps are considerably more flexible than that. Javascript appears to be undergoing a transformation, escaping the sandbox. Some very ambitious systems are being built in Javascript, both within and outside of the browser. And the technology is advancing to keep up with it (witness the new Javascript engines, persistence models, and so on). I anticipate that the demand for more interesting data structures will increase, and that the demand will be met.
Library capabilities generally appear as needs arise. Many of the basic data structures are so easy to implement that it hardly seems worth adding them to a library -- especially if that library needs to go through some sort of standardization process. This is why so many languages (of all levels) do not provide them out of the box. But I think that there is another force at work that will change all that... the rise of multiprogramming.
A New Need Arising?
It wasn't too long ago that the code that most developers wrote ran within the confines of a single thread. But now, our systems are full of threads, web workers, agents, coroutines, clusters, clouds and all manner of concurrent systems. This changes the whole complexion of implementing data structures from scratch.
In a single-threaded context, it is trivial to implement a linked list in almost any language. But add concurrency to the mix and now it takes a great deal of effort to get it right. One really needs to be a specialist to stand a chance at all. That is why you see rich collection frameworks in all the latest languages. The need to share data structures across thread boundaries (or worse) is being fulfilled.
History... But Not The Future
So, to summarize, I think the reason why rich data structures are conspicuously absent from many languages is largely historical. The need was not great enough to justify the effort. But there are new forces at work, in the form of highly concurrent systems, that force language libraries to provide industrial strength implementations of the richer data structures.

My intuitive answer would be that these languages defer the higher-level data structures to the programmer to implement him/herself. This allows the programmers to custom tailor the specific data structure to the problem being solved by the software. Often in an organization, many of these DSes are packaged in libraries for re-use in a large-scale application.

Related

is there a Universal Model for languages?

Many programming languages share generic and even fairly universal features. For example, if you compared Java, VB6, .NET, PHP, Python, then you would find common functions such as control structures, numeric and string manipulation, etc.
What has been done to define these features at a meta-language (or language-agnostic) level?
UML offers a descriptive reference of software in every aspect, but the real-world focus seems to be data processes. Is UML relevant?
I'm not asking "Why we don't have a single language that replaces the current plethora." We need many different tools (at least in this eon).
I'm not asking that all languages fit a template -- assembly vs. compiled languages are different enough to make that unfeasible (and some folks call HTML a language, though I wouldn't). Any attempt would start with a properly narrow scope. In line with this, I wouldn't expect the model to cover even a small selection with full validity.
I would expect however that such a model could be used to transpose from one language to another (with limited goals -- think jist translation).
There have been many attempts at this, but none have been very successful. The earliest I'm aware of is UNCOL more than 50 years ago.
You've given a list of languages that have a lot in common because they're pretty similar -- they're all procedural languages with common roots and some OO extensions thrown in, so that's not too suprising. If you start looking at different languages like LISP, haskell, erlang, prolog, or even SQL you start seeing very different things.
What you're describing sounds like the formal semantics of programming languages. There are a variety of approaches and each will give a way to formally specify the meaning of a program in some programming language. In some cases, this specification is essentially a translation into another language such as lambda calculus, or compilation for a formally specified abstract machine such as SECD.
There is so much work here it's hard to pick a specific reference. But I hope I've given you some useful keywords to continue your search.
UML is typically used to define algorithms/code in simpler terms before moving on to real code.
To answer what I am guessing to be your question, there is already a defined set of required parts of languages while,for,if,else... Will this ever be set as a standard, or made into a base library that is used by all languages: no, this is because the different developers of languages like to do it themselves.
I think the closest you can get to this without loss of generality is a Turing machine, which is not very useful for practical purposes. But if you allow Turing machine languages to be "labeled" and reused, you could build up the concepts you need, working from low- to high-level.
I think that MOF is the universal language.
You can for example create UML diagrams from MOF via a UML metamodel. If you save this metamodel information into xmi then you can save what ever information you need and even more than in any language. XMI semantic is so rich that there is no limit to its use. If you map UML to xmi on the top of a metamodel live synchronize with MOF then this is for me the universal language.
The author of Pattern Calculus seems to propose such a universal model. I expect that it will turn out to be just as useful as previous attempts to define a universal model, that is to say, good in parts but not the last word.

So was that Data Structures & Algorithms course really useful after all?

I remember when I was in DSA I was like wtf O(n) and wondering where would I use it other than in grad school or if you're not a PhD like Bloch. Somehow uses for it does pop up in business analysis, so I was wondering when have you guys had to call up your Big O skills to see how to write an algorithm, which data structure did you use to fit or whether you had to actually create a new ds (like your own implementation of a splay tree or trie).
Understanding Data Structures has been fundamental to many of the projects I've worked on, and that goes beyond the ten minute song 'n dance one does when asked such a question in an interview situation.
Granted that modern environments with all sorts of collection classes can make light work of storing and accessing large amounts of data, but having an understanding that a particular problem is best solved with a particular data structure can be a great timesaver. And by "timesaver" I mean "the difference between something working and not working".
Honestly, being able to answer that stuff is my biggest criterion for taking interviewees seriously in an interview. Knowing how basic data structures work, basic O(n) analysis, and some light theory is really crucial to being able to write large applications successfully.
It's important in the interview because it's important in the job. I've worked with techs in the past that were self taught, without taking the data structures course or reading a data structures book, and their code is occasionally bad in ways they should have seen coming.
If you don't know that n2 is going to run slowly compared to n log n, you've got more to learn.
As far as the later half of the data structures courses, it isn't generally applicable to most tech jobs, but if you ever do wind up needing it, you'll wish you had paid more attention.
Big-O notation is one of the basic notations used when describing algorithms implemented by a particular library. For example, all documentation on STL that I've seen describes various operations in terms of big-O, so naturally you have to e.g. understand the difference between O(1), O(log n) and O(n) to understand the implications of your choice of STL containers and algorithms. MSDN also does that for .NET classes, and IIRC Java documentation does that for standard Java classes. So, I'd say that knowing the notation is pretty much a requirement for understanding documentation of most popular frameworks out there.
Sure (even though I'm a humble MS in EE -- no PhD, no CS, differently from my colleague Joshua Block), I write a lot of stuff that needs to be highly scalable (or components that may need to be reused in highly scalable apps), so big-O considerations are most always at work in my design (and it's not hard to take them into account). The data structures I use are almost always from Python's simple but rich supply (which I did lend a hand developing;-), rarely is a totally custom one needed (rather than building on top of list, dict, etc); but when it does happen (e.g. the bitvectors in my open source project gmpy), no big deal.
I was able to use B-Trees right when I learned about them in algorithm class (that was about 15 years ago when there were much less open source implementations available). And even later the knowledge about the differences of e. g. container classes came in handy...
Absolutely: even though stacks, queues, etc. are pretty straightforward, it helps to have been introduced to them in a disciplined fashion.
B-Tree's and more advanced sorting are a bit more difficult so learning them early was a big benefit and I have indeed had to implement each of them at various points.
Finally, I created an algorithm for single-connected components a few years back that was significantly better than the one our signal-processing team was using but I couldn't convince them that it was better until I could show that it was O(n) complexity rather than O(nlogn).
...just to name a few examples.
Of course, if you are content to remain a CRUD-system hacker with no real desire to do more than collect a paycheck, then it may not be necessary...
I found my knowledge of data structures very useful when I needed to implement a customizable event-driven system about ten years ago. That's the biggie, but I use that sort of knowledge fairly frequently in lesser ways.
For me, knowing the exact algorithms has been... nice as background knowledge. However, the thing that's been the most useful is the more general background of having to pay attention to how different pieces of an algorithm interact. For instance, there can be places in code where moving one piece of code (ie, outside a loop) can make a huge difference in both time and space.
Its less of the specific knowledge the course taught and, rather, more that it acted like several years of experience. The course took something that might take years to encounter (have drilled into you) all the variations of in pure "real world experience" and condensed it.
The title of your question asks about data structures and algorithms, but the body of your question focuses on complexity analysis, so I'll focus on that too:
There are lots of programming jobs where being able to do complexity analysis is at least occasionally useful. See What career can I hope for if I like algorithms? for some examples of these.
I can think of several instances in my career where either I or a co-worker have discovered a a piece of code where the (usually time, sometimes space) complexity was higher that it should have been. eg: something that was quadratic or cubic when it could have been linear or nlog(n). Such code would work fine when given small inputs, but on larger inputs would quickly become really slow or consume all available memory. Knowing alternative algorithms and data structures, their complexities, and also how to analyze the complexity to build new algorithms is vital in being able to correct these problems (or avoid them in the first place).
Networking is all I've used it: in an implementation of traveling salesman.
Unfortunately I do a lot of "line of business" and "forms over data" apps, so most problems I work on can be solved by hammering together arrays, linked lists, and hash tables. However, I've had the chance to work my data structures magic here and there:
Due to weird complex business rules, I worked on an application which used a custom thread pool implemented as a leftist-heap.
My dev team struggled to write a complex multithreaded app. It was plagued with race conditions, dead locks, and lousy performance due to very fine-grained locking. We re-worked the code to share state between threads, opting to write a very light-weight wrapper to facilitate message passing. Next, we converting our linked lists and hash tables to immutable stacks and immutable style and immutable red-black trees, we had no more problems with thread safety or performance. The resulting code was immaculate and surprisingly readable.
Frequently, a business rules engine requires you to roll your own state machine, which is very naturally modelled as a graph where vertexes and states and edges are transitions between states.
If for no other reasons, I'm glad I took the time to readable about data structures and algorithms simply to be able picture novel problems a little differently, especially combinatorial problems and graph problems. Graph theory is no longer a synonym for "scary".

handling amorphous subsystems in formal software design

People like Alexander Stepanov and Sean Parent vote for a formal and abstract approach on software design.
The idea is to break complex systems down into a directed acyclic graph and hide cyclic behaviour in nodes representing that behaviour.
Parent gave presentations at boost-con and google (sheets from boost-con, p.24 introduces the approach, there is also a video of the google talk).
While i like the approach and think its a neccessary development, i have a problem with imagining how to handle subsystems with amorphous behaviour.
Imagine for example a common pattern for state-machines: using an interface which all states support and having different behaviour in concrete implementations for the states.
How would one solve that?
Note that i am just looking for an abstract approach.
I can think of hiding that behaviour behind a node and defining different sub-DAGs for the states, but that complicates the design considerately if you want to influence the behaviour of the main DAG from a sub-DAG.
Your question is not clear. Define amorphous subsystems.
You are "just looking for an abstract approach" but then you seem to want details about an implementation in a conventional programming language ("common pattern for state-machines"). So, what are you asking for? How to implement nested finite state-machines?
Some more detail will help the conversation.
For a real abstract approach, look at something like Stream X-Machines:
... The X-machine model is structurally the
same as the finite state machine, except
that the symbols used to label the machine's
transitions denote relations of type X→X. ...
The Stream X-Machine differs from Eilenberg's
model, in that the fundamental data type
X = Out* × Mem × In*,
where In* is an input sequence,
Out* is an output sequence, and Mem is the
(rest of the) memory.
The advantage of this model is that it
allows a system to be driven, one step
at a time, through its states and
transitions, while observing the
outputs at each step. These are
witness values, that guarantee that
particular functions were executed on
each step. As a result, complex
software systems may be decomposed
into a hierarchy of Stream
X-Machines, designed in a top-down
way and tested in a bottom-up way.
This divide-and-conquer approach to
design and testing is backed by
Florentin Ipate's proof of correct
integration, which proves how testing
the layered machines independently is
equivalent to testing the composed
system. ...
But I don't see how the presentation is related to this. He seems to speak about a quite mainstream approach to programming, nothing similar to X-Machines. Anyway, the presentation is quite confusing and I have no time to see the video right now.
First impression of the talk, reading the slides only
The author touches haphazardly on numerous fields/problems/solutions, apparently without recognizing it: from Peopleware (for example Psychology of programming), to Software Engineering (for example software product lines), to various programming techniques.
How the various parts are linked and what exactly he is advocating is not clear at all (I'm accustomed to just reading slides and they are usually consequential):
Dataflow programming?
Constraints solving for User Interfaces? For practical implementations, see Garnet for Common Lisp, Amulet/OpenAmulet for C++.
What advantages gives us this "new" concept-based generic programming with respect to well-known approaches (for example, tools based on Hoare logic pre/post conditions and invariants or, better, Hoare's Communicating Sequential Processes (CSP) or Hehner's Practical Theory of Programming or some programming language with a sophisticated type-system like ATS, Qi or Epigram and so on)? It seems to me that introducing "concepts" - which, as-is, are specific to C++ - is not more simple than using the alternatives. Is it just about jargon and "politics"? (Finally formal methods... but disguised).
Why organizing program modules as a DAG and not as a tree, like David Parnas advocated decades ago in Designing software for ease of extension and contraction? (here a directly accessible .pdf and here slides from a lecture). The work on X-Machines probably is an answer to this question (going even beyond DAGs), but, again, the author seems to speak about a quite conventional program development regime in which Parnas' approach is the only sensible.
If/when I will see the video I will update this answer.

How often do you need to create a real class hierarchy in your day to day programming?

I create business applications with heavy database use. Most of the programming work is just to connect components to the database and modifying components to adapt to general interface behaviour. I mostly use Delphi with its rich VCL library, and generally buy components needed. I keep most of the business logic in the database. I rarely get the chance to build a nice class hierarchy from the bottom up as there really is no need. Anyone else have this experience?
For me, occasionally a problem is clearer or easier with subclassing, but not often.
This also changes quite a bit in a given design as it's refactored.
My biggest problem is that programming courses and texts give so much weight to inheritance, hierarchies, and polymorphism through base classes (vs. interfaces or dynamic typing). This helps create legions of programmers that subclass everything and their mother.
The answer to this question is not totally language-agnostic;
Some languages like Java have a fairly limited set of language features available, meaning that subclassing is fairly often used because it's a convenient method for re-use, technical inheritance.
Closures and lambdas of C# make inheritance for technical reasons much less relevant. So normally inheritance is used for semantic reasons (like cat extends animal).
The last C# project I worked on, we more or less made all of the class hierarchies within a few weeks. After that it was more or less over.
On my current java project we create new class hierarchies all of the time.
Other languages will have other features that similarly affect this composition (mixins come to mind)
I put on my architecting/class design hat probably once or twice a month. It's probably the best hat I have and is the most fun to wear.
Depends what stage of the lifecycle your project is in though.
When your tackling problem domains you are well familiar with and already have a common code base to work from, you often have no need to create a new class hierarchy. It's when you stumble upon problems you have no ready solutions for, that you start building your own.
It's also very dependant on the type of applications you develop. If your domain already has well accepted conventions and libraries to work from, there probably isn't any need to reinvent the wheel (other than personal / academic interests). Some areas have inherently less available resources to work with, and in those you'll find yourself building everything from scratch most of the time.
A majority of applications, especially business applications, contains at least some kind of business logic in it. I would contend that business should not be in the database, but should rather be in the application. You can put referential integrity in the database as I think this is a good choice, but business logic should be only in the application.
By class hierarchy, I suppose you mean do you always have to end up with some inheritance in your object model, then the answer is no. But chances are you can often find some common code, factor it out and create a base class to contain the common code.
If you agree with me on the point that business logic should not be in the database, but should be in the application, then I recommend you look into the MVC Design Pattern to guide your design. You will find your design contain classes or objects. Your VCLs will represent your View, and you can have your Model classes map directly to the database table, i.e. each member in the class in the model corresponds to a field in a database table (again, this is the norm but there will be exception, where this simplicity fails to apply). Then you'll need a layer to handle the CRUD (Create, Read, Update, Delete) of the Model classes to the database tables. You will end up with an "layered" application that is easier to maintain and enhance.
It depends on what you mean by hierarchy - inheritance or layering?
When object oriented languages first came out, inheritance was overused. Complicated hierarchies were common. Now, interfaces (as in Java and C#) provide a simpler way to get the benefit of polymorphism without the complications of inheritance. I rarely use inheritance anymore.
Layering, however, is vital when creating a large application. Layering prevents general low-level classes (like lists) from directly referencing specific high-level classes (like web browser windows). As far as I know, there isn't a formal way to describe layering, but there are general guidelines (model-view-controller (MVC), separate GUI logic from business logic, separate data from presentation, etc.).
It really depends on the types/phases of the projects you're working on. I happen to do that everyday because I'm working on database internals for a new database, creating related libraries/frameworks. I'd imagine doing that a lot less if I'm working within a mature framework using other people's libraries.
I'm doing Infrastructure for our companys' product, so I'm writing a lot of code that will be used later by guys in other teams. So I end up writing lots of abstract classes, interfaces, hierarchies and so on. Mostly it's just a pattern of "default behaviour in an abstract/virtual class, which other programmers may override".
Very challenging, I must say.
The time that I find class hierarchies most beneficial is when the relationship between objects actually does match a true "is-a" relationship in the domain.
However if I can avoid large hierarchies I will due to the fact that they are often a little more tricky to map to relational databases and can really complicate your database designs. Since you say most of your applications make heavy use of databases this would be something to take into consideration.

Flow Based Programming

I have been doing a little reading on Flow Based Programming over the last few days. There is a wiki which provides further detail. And wikipedia has a good overview on it too. My first thought was, "Great another proponent of lego-land pretend programming" - a concept harking back to the late 80's. But, as I read more, I must admit I have become intrigued.
Have you used FBP for a real project?
What is your opinion of FBP?
Does FBP have a future?
In some senses, it seems like the holy grail of reuse that our industry has pursued since the advent of procedural languages.
1. Have you used FBP for a real project?
We've designed and implemented a DF server for our automation project (dispatcher, component iterface, a bunch of components, DF language, DF compiler, UI). It is written in bare C++, and runs on several Unix-like systems (Linux x86, MIPS, avr32 etc., Mac OSX). It lacks several features, e.g. sophisticated flow control, complex thread control (there is only a not too advanced component for it), so it is just a prototype, even it works. We're now working on a full-featured server. We've learnt lot during implementing and using the prototype.
Also, we'll make a visual editor some day.
2. What is your opinion of FBP?
2.1. First of all, dataflow programming is ultimate fun
When I met dataflow programming, I was feel like 20 years ago, when I met programming first. Altough, DF programming differs from procedural/OOP programming, it's just a kind of programming. There are lot of things to discover, even sooo simple ones! It's very funny, when, as an experienced programmer, you met a DF problem, which is a very-very basic thing, but it was completely unknown for you before. So, if you jump into DF programming, you will feel like a rookie programmer, who first met the "cycle" or "condition".
2.2. It can be used only for specific architectures
It's just a hammer, which are for hammering nails. DF is not suitable for UIs, web server and so on.
2.3. Dataflow architecture is optimal for some problems
A dataflow framework can make magic things. It can paralellize procedures, which are not originally designed for paralellization. Components are single-threaded, but when they're organized into a DF graph, they became multi-threaded.
Example: did you know, that make is a DF system? Try make -j (see man, what -j is used for). If you have multi-core machine, compile your project with and without -j, and compare times.
2.4. Optimal split of the problem
If you're writing a program, you often split up the problem for smaller sub-problems. There are usual split points for well-known sub-problems, which you don't need to implement, just use the existing solutions, like SQL for DB, or OpenGL for graphics/animation, etc.
DF architecture splits your problem a very interesting way:
the dataflow framework, which provides the architecture (just use an existing one),
the components: the programmer creates components; the components are simple, well-separated units - it's easy to make components;
the configuration: a.k.a. dataflow programming: the configurator puts the dataflow graph (program) together using components provided by the programmer.
If your component set is well-designed, the configurator can build such system, which the programmer has never even dreamed about. Configurator can implement new features without disturbing the programmer. Customers are happy, because they have personalised solution. Software manufacturer is also happy, because he/she don't need to maintain several customer-specific branches of the software, just customer-specific configurations.
2.5. Speed
If the system is built on native components, the DF program is fast. The only time loss is the message dispatching between components compared to a simple OOP program, it's also minimal.
3. Does FBP have a future?
Yes, sure.
The main reason is that it can solve massive multiprocessing issues without introducing brand new strange software architectures, weird languages. Dataflow programming is easy, and I mean both: component programming and dataflow configuration building. (Even dataflow framework writing is not a rocket science.)
Also, it's very economic. If you have a good set of components, you need only put the lego bricks together. A DF program is easy to maintain. The DF config building requires no experienced programmer, just a system integrator.
I would be happy, if native systems spread, with doors open for custom component creating. Also there should be a standard DF language, which means that it can be used with platform-independent visual editors and several DF servers.
Interesting discussion! It occurred to me yesterday that part of the confusion may be due to the fact that many different notations use directed arcs, but use them to mean different things. In FBP, the lines represent bounded buffers, across which travel streams of data packets. Since the components are typically long-running processes, streams may comprise huge numbers of packets, and FBP applications can run for very long periods - perhaps even "perpetually" (see a 2007 paper on a project called Eon, mostly by folks at UMass Amherst). Since a send to a bounded buffer suspends when the buffer is (temporarily) full (or temporarily empty), indefinite amounts of data can be processed using finite resources.
By comparison, the E in Grafcet comes from Etapes, meaning "steps", which is a rather different concept. In this kind of model (and there are a number of these out there), the data flowing between steps is either limited to what can be held in high-speed memory at one time, or has to be held on disk. FBP also supports loops in the network, which is hard to do in step-based systems - see for example http://www.jpaulmorrison.com/cgi-bin/wiki.pl?BrokerageApplication - notice that this application used both MQSeries and CORBA in a natural way. Furthermore, FBP is natively parallel, so it lends itself to programming of grid networks, multicore machines, and a number of the directions of modern computing. One last comment: in the literature I have found many related projects, but few of them have all the characteristics of FBP. A list that I have amassed over the years (a number of them closer than Grafcet) can be found in http://www.jpaulmorrison.com/cgi-bin/wiki.pl?FlowLikeProjects .
I do have to disagree with the comment about FBP being just a means of implementing FSMs: I think FSMs are neat, and I believe they have a definite role in building applications, but the core concept of FBP is of multiple component processes running asynchronously, communicating by means of streams of data chunks which run across what are now called bounded buffers. Yes, definitely FSMs are one way of building component processes, and in fact there is a whole chapter in my book on FBP devoted to this idea, and the related one of PDAs (1) - http://www.jpaulmorrison.com/fbp/compil.htm - but in my opinion an FSM implementing a non-trivial FBP network would be impossibly complex. As an example the diagram shown in
is about 1/3 of a single batch job running on a mainframe. Every one of those blocks is running asynchronously with all the others. By the way, I would be very interested to hearing more answers to the questions in the first post!
1: http://en.wikipedia.org/wiki/Pushdown_automaton Push-down automata
Whenever I hear the term flow based programming I think of LabView, conceptually. Ie component processes who's scheduling is driven primarily by a change to its input data. This really IS lego programming in the sense that the labview platform was used for the latest crop of mindstorm products. However I disagree that this makes it a less useful programming model.
For industrial systems which typically involve data collection, control, and automation, it fits very well. What is any control system if not data in transformed to data out? Ie what component in your control scheme would you not prefer to represent as a black box in a bigger picture, if you could do so. To achieve that level of architectural clarity using other methodologies you might have to draw a data domain class diagram, then a problem domain run time class relationship, then on top of that a use case diagram, and flip back and forth between them. With flow driven systems you have the luxury of being able to collapse a lot of this information together accurately enough that you can realistically design a system visually once the components are build and defined.
One question I never had to ask when looking at an application written in labview is "What piece of code set this value?", as it was inherent and easy to trace backwards from the data, and also mistakes like multiple untintended writers were impossible to create by mistake.
If only that was true of code written in a more typically procedural fashion!
1) I build a small FBP framework for an anomaly detection project, and it turns out to have been a great idea.
You can also have a look at some of the KNIME videos, that give a good idea of what a flow based framework feels like when the framework is put together by a great team. Admittedly, it is batch based and not created for continuous operation.
By far the best example of flow based programming, however, is UNIX pipes which is one of the oldest, most overlooked FBP framework. I don't think I have to elaborate on the power of nix pipes...
2) FBP is a very powerful tool for a large set of problems. The intrinsic parallelism is a great advantage, and any FBP framework can be made completely network transparent by using adapter modules. Smart frameworks are also absurdly fault tolerant, and able to dynamically reload crashed modules when necessary. The conceptual simplicity also allows cleaner communication with everybody involved in a project, and much cleaner code.
3) Absolutely! Pipes are here to stay, and are one of the most powerful feature of unix. The power inherent in a FBP framework compared to a static program are many, and trivialise change, to the point where some frameworks can be reconfigured while running with no special measures.
FBP FTW! ;-)
In automotive development, they have a language agnostic messaging protocol which is part of the MOST specification (Media Oriented Systems Transport), this was designed to communicate between components over a network or within the same device. Systems usually have both a real and visualized message bus - therefore you effectively have a form of flow based programming.
That was what made the light bulb go on for me several years ago and brought me here. It really is a fantastic way to work and so much more fun than conventional programming. The message catalog form the central specification and point of reference. It works well for both developers and management. i.e. Management are able to browse the message catalog instead of looking at source.
With integrated logging also referencing the catalog to produce intelligible analysis things can get really productive. I have real world experience of developing commercial products in this way. I am interested in taking things further, particularly with regards to tools and IDEs. Unfortunately I think many people within the automotive sector have missed the point about how great this is and have failed to build on it. They are now distracted by other fads and failed to realize that there was far more to most development than the physical bus.
I've used Spring Web Flow extensively in Java Web applications to model (typically) application processes, which tend to be complex wizard-like affairs with lots of conditional logic as to which pages to display. Its incredibly powerful. A new product was added and I managed to recut the existing pieces into a completely new application process in an hour or two (with adding a couple of new views/states).
I also looked into using OS Workflow to model business processes but that project got canned for various reasons.
In the Microsoft world you have Windows Workflow Foundation ("WWF"), which is becoming more popular, particularly in conjunction with Sharepoint.
FBP is just a means of implementing a finite state machine. It's nothing new.
I realize that it is not exactly the same thing, but this model has been used for years in PLC programming. ISO calls it Sequential Flow Chart, but many people call it Grafcet after a popular implementation. It offers parallel processing and defines transitions between states.
It's being used in the Business Intelligence world these days to mashup and process data. Data processing steps like ETL, querying, joining , and producing reports can be done by the end-user. I'm a developer on an open system - ComposableAnalytics.com In CA, the flow-based apps can be shared and executed via the browser.
This is what MQ Series, MSMQ and JMS are for.
This is cornerstone of Web Services and Enterprise Service Bus implementations.
Products like TIBCO and Sun's JCAPS are basically flow-based without using this particular buzz-word.
Most of the work of the application is done with small modules that pass messages through a processing network.