Where did the view handler design pattern come from? - language-agnostic

I'm trying to figure out the origin of the view handler design pattern in software engineering. Many of the design patterns in software engineering were inspired by things which pre-date computers, and I was wondering if anybody had any insights on the origin of this particular pattern.

It feels a lot like a rule system for updating construction or planning drawing dependencies. Or to rephrase, it seems like an automation of the work rules and cross-indexes required to keep a set of architectural plans synchronized as changes are made on each drawing.

Look in POSA1 - Pattern-Oriented Software Architecture Volume 1: A System of Patterns (from 1996, I think).
There is a chapter about View Handler. It gives examples of the pattern in use in 1985 (the Macintosh Window Manager) and 1993 (MS Word).
But I am not sure we are talking about the same pattern. Where did you hear about this pattern name?

Related

Domain Driven Design vs Model Driven Architecture

I am curious, what are the differences between Domain Driven Design and Model Driven Architecture? I have the impression they have certain similarities.
Could you enlighten me?
Thanks
Don't disagree with most of the above although it's perhaps worth expanding a little.
The single most important concept in DDD is to focus on the problem domain. To put technology obsession to the side and concentrate primarily on modelling the problem you're trying to solve. So put ajax, ORMs, databases, frameworks etc. into the background and instead make sure you have a complete, accurate model of the problem first and foremost. (Of course you still need the architectural components - but they're explicitly subservient to the model). DDD calls this "Ubiquitous Language" - a model expressed in terms domain experts and developers alike use and understand. A model where the names of classes, methods etc. are taken from the problem domain.
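To make "Ubiquitous Language" concrete, here is a minimal sketch in Python, loosely based on the cargo-shipping example from Evans' book (all class and method names are invented for illustration, not taken from any real system):

    class RouteSpecification:
        # What the customer asked for: move cargo from an origin to a destination.
        def __init__(self, origin, destination):
            self.origin = origin
            self.destination = destination

        def is_satisfied_by(self, itinerary):
            return (itinerary.first_leg_origin() == self.origin and
                    itinerary.last_leg_destination() == self.destination)

    class Itinerary:
        def __init__(self, legs):  # legs: list of (origin, destination) pairs
            self.legs = legs

        def first_leg_origin(self):
            return self.legs[0][0]

        def last_leg_destination(self):
            return self.legs[-1][1]

    class Cargo:
        def __init__(self, tracking_id, route_specification):
            self.tracking_id = tracking_id
            self.route_specification = route_specification
            self.itinerary = None

        def assign_to_route(self, itinerary):
            # "Assign the cargo to a route" is a phrase the domain experts use;
            # the code says exactly that - no ORM, HTTP or framework in sight.
            if not self.route_specification.is_satisfied_by(itinerary):
                raise ValueError("itinerary does not satisfy the route spec")
            self.itinerary = itinerary

Every name above could appear verbatim in a conversation with a shipping clerk - that is the whole point.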
DDD doesn't mandate /how/ you capture that model, although the book implies using an OO language to do so.
MDA shares that same notion of modelling the problem domain first and foremost (the PIM, Platform-Independent Model). As opposed to DDD, it recommends creating that model with UML. But the intent is the same: understand the problem domain without tainting it with (software) architectural concerns.
MDA's PSM (Platform-Specific Model) is somewhat analogous to applying the architectural patterns in DDD (e.g. aggregate, repository, etc.). Again - while different in specifics - both aim to solve the problem of converting a 'pure' problem domain model into a full software system.
So summing up, I'd say they are similar in two ways:
The centrality of the Model (as @Rui says) - specifically the /Domain/ model.
Applying architectural patterns to the model in order to realise the target system.
hth.
The root of both Domain-Driven Design (DDD) and Model Driven Architecture (MDA) is Model-Driven Engineering (MDE), also known as Model-Driven Software Development (MDSD) if limited to the software development domain. See Wikipedia: http://en.wikipedia.org/wiki/Model-driven_development
All approaches falling under the MDE umbrella have one thing in common: a model. How this model is materialized depends on the specific MDE flavor.
MDA is regarded as overly complex. DDD is considered by some as too abstract. My personal favorite MDE implementations are DSM and ABSE (not listed on the Wikipedia article).
DDD is about approaching a software solution from a business perspective with the intent of keeping the design as close to the real world as possible. This is more of an art than engineering.
MDA solves a different set of problems. More details here: http://xml.coverpages.org/OMG-MDAFAQfinal1.pdf
Each X-driven approach helps deliver value for specific aspects and representations of problem-solving activities. From my point of view, the main difference is that DDD is a design technique, while MDA is an infrastructure - something that became necessary once the engineering community wanted to use model-driven ideas in real-world industry.
The term "domain" in DDD has an is-a relationship to "problem domain", and the two often mean the same thing. DDD values domain expertise: decisions depend on how well we understand the problems and how we choose the right path from the initial state to the winning state. Before the final design spec can be written, a great deal of effort goes into studying the problem. Looking at the three main principles of DDD, I can map them to things I am familiar with nowadays: (a) focus on the core domain (DDD and MVP seem identical in where they put the focus); (b) explore models in a creative collaboration (this is Model-Driven/Based Engineering) between two contributors, the domain expert/designer and the professional software developer; (c) speak a ubiquitous language within an explicitly bounded context (communicate using a domain-specific language and develop artifacts relevant to the problem domain).
Looking at the collaborative development of MDA and its related standards, it is an infrastructure for applying Model-Driven Engineering. It represents the software industry's evolution in supporting ways to describe a software system using models, and it demonstrates how we organize CIM/PIM/PSM models and artifacts. Many powerful modeling operations and tools, such as model transformation, domain-specific modeling languages, and automated software engineering techniques, officially emerged with MDA.

Are there any tools that make keeping the UML models in-sync with the code completely seamless?

UML Round-Trip Engineering tools with seamless synchronization?
The Rational suite purports to do it. But it's so pricey and clunky at drawing (worse than the Rose days) that it's not in the reach of most departments.
What’s amazing is that the free Bouml seems to do a fantastic job. It just feels too clunky to use. It has a great deal of functionality, is free (!), very fast, and reverse-engineers complex C++ very well. It also has some nice diagram support, including a very nice sequence diagram. Although the interface is unpolished (and constantly opens dialogs on the rightmost monitor), it does have the beginnings of a very capable product. It's a shame that the interface is so bare-bones and requires the expenditure of a lot of effort. Maybe it's because the author puts most of his time into the actual functionality. Does anyone have experience using Bouml throughout the product lifecycle?
That leaves the pricey MagicDraw, the very-capable yet reasonably-priced Enterprise Architect, and the slick-looking Visual Paradigm. Of these, only Visual Paradigm had an issue reverse-engineering my project's C++ headers.
MagicDraw has a strange, old feel. It does a good job at reverse-engineering on its own, although it remains to be seen whether round-trip engineering of complex C++ projects is seamless. They want over $1800 for the multi-language version, so it's priced similarly to Rational tools.
Enterprise Architect, although far less expensive than most, seems like it may be the most feature-complete. It parses and generates C++ flawlessly. Even the comments and formatting are left intact. There are great training materials. But it doesn't handle Objective-C, so it's less useful for iOS and Mac OS X mixed-code projects. The automatic Sequence Diagram generation sounds awesome, but it appears to work only on Windows .NET projects.
Visual Architect (>$800 for multi-language 2-way) is by far the best-looking software modeling tool I've come across. Although it may have some round-trip issues remaining, it is a pleasure to use for building the models by hand. It's even nicer than Rose was in some ways. It has an intuitive way of bringing up the tools you need right at the cursor. Yet as I mentioned, it currently falls short of the goal of keeping the model in sync with the source. And it often doesn't even give notification that the import didn't fully work, or that duplicate classes have been created (with the same names). It also makes entry of message parameters difficult, using dialogs, whereas others allow the parameters to be changed right on the diagram. (The free Bouml excels at this, as do MagicDraw and others.)
Has anyone found a multi-language (Java, C++, C#, ObjC++, Python, Ruby, SQL) round-trip engineering tool that will hold up to real world projects, where customizations are handled (like custom parameters on messages), yet are not wiped out by the next source code import?
And where all the formatting and comments are completely preserved on generation. Close is not really good enough. If the tools mess up the source code formatting, no developer is going to want the tool run on his source.
Peter Coad's Together-J used to have diagrams and an editor together in one IDE (hence the name). Change a diagram and the code changes; same for the other way as well.
The UML tool and editor were both a bit slow. I think machines of the day were underpowered and didn't show it off to best advantage.
I believe Peter Coad sold it to Borland. Looks like Borland is out of the IDE business. You can still get it here.
I think IntelliJ is the best Java IDE there is. You can generate some nice UML diagrams using it.
The real question is: Why is UML so important? I'd rather have code. I usually do enough UML to get the idea across, write the code with unit tests, and then reverse engineer it for documentation. You can't debug or unit test UML diagrams. Better to have working code.
Bouml ... constantly opens dialogs on the rightmost monitor
In a multiple-monitor configuration, the best thing is to tell Bouml which monitor to use by default; otherwise, Bouml just sees one very large monitor spanning all of your monitors. Of course, setting a default monitor doesn't mean you can't use the other one(s), and it is still possible to move the dialogs/main window wherever you want. The default monitor is specified through the environment dialog.
Enterprise Architect seems to do a good job at this. As you point out, it's reasonably-priced. And it will also generate diagrams and documentation, as well as import/export source code.

Do formal methods of program verification have a place in industry?

I got a glimpse of Hoare Logic in college. What we did was really simple. Most of what I did was proving the correctness of simple programs consisting of while loops, if statements, and sequences of instructions, but nothing more. These methods seem very useful!
Are formal methods used in industry widely?
Are these methods used to prove mission-critical software?
Well, Sir Tony Hoare joined Microsoft Research about 10 years ago, and one of the things he started was a formal verification of the Windows NT kernel. Indeed, this was one of the reasons for the long delay of Windows Vista: starting with Vista, large parts of the kernel are actually formally verified with respect to certain properties like absence of deadlocks, absence of information leaks, etc.
This is certainly not typical, but it is probably the single most important application of formal program verification, in terms of its impact (after all, almost every human being is in some way, shape or form affected by a computer running Windows).
This is a question close to my heart (I'm a researcher in Software Verification using formal logics), so you'll probably not be surprised when I say I think these techniques have a useful place, and are not yet used enough in the industry.
There are many levels of "formal methods", so I'll assume you mean those resting on a rigorous mathematical basis (as opposed to, say, following some 6-Sigma style process). Some types of formal methods have had great success - type systems being one example. Static analysis tools based on data flow analysis are also popular, model checking is almost ubiquitous in hardware design, and computational models like Pi-Calculus and CCS seem to be inspiring some real change in practical language design for concurrency. Termination analysis is one that's had a lot of press recently - the SDV project at Microsoft and work by Byron Cook are recent examples of research/practice crossover in formal methods.
Hoare Reasoning has not, so far, made great inroads in the industry - this is for more reasons than I can list, but I suspect is mostly around the complexity of writing then proving specifications for real programs (they tend to get big, and fail to express properties of many real world environments). Various sub-fields in this type of reasoning are now making big inroads into these problems - Separation Logic being one.
This is partially the nature of ongoing (hard) research. But I must confess that we, as theorists, have entirely failed to educate the industry on why our techniques are useful, to keep them relevant to industry needs, and to make them approachable to software developers. At some level, that's not our problem - we're researchers, often mathematicians, and practical usage is not foremost in our minds. Also, the techniques being developed are often too embryonic for use in large scale systems - we work on small programs, on simplified systems, get the math working, and move on. I don't much buy these excuses though - we should be more active in pushing our ideas, and getting a feedback loop between the industry and our work (one of the main reasons I went back to research).
It's probably a good idea for me to resurrect my weblog, and make some more posts on this stuff...
I cannot comment much on mission-critical software, although I know that the avionics industry uses a wide variety of techniques to validate software, including Hoare-style methods.
Formal methods have suffered because early advocates like Edsger Dijkstra insisted that they ought to be used everywhere. Neither the formalisms nor the software support were up to the job. More sensible advocates believe that these methods should be used on problems that are hard. They are not widely used in industry, but adoption is increasing. Probably the greatest inroads have been in the use of formal methods to check safety properties of software. Some of my favorite examples are the SPIN model checker and George Necula's proof-carrying code.
Moving away from practice and into research, Microsoft's Singularity operating-system project is about using formal methods to provide safety guarantees that ordinarily require hardware support. This in turn leads to faster performance and stronger guarantees. For example, in Singularity they have proved that if a third-party device driver is allowed into the system (which means basic verification conditions have been proved), then it cannot possibly bring down the whole OS - the worst it can do is hose its own device.
Formal methods are not yet widely used in industry, but they are more widely used than they were 20 years ago, and 20 years from now they will be more widely used still. So you are future-proofed :-)
Yes, they are used, but not widely in all areas. There are more methods than just Hoare logic; some are used more, some less, depending on their suitability for a given task. The common problem is that software is biiiiiiig and verifying that all of it is correct is still too hard a problem.
For example, the theorem prover ACL2 (a tool that aids humans in proving program correctness) has been used to prove that a certain floating-point processing unit does not have a certain type of bug. It was a big task, so this technique is not too common.
Model checking, another kind of formal verification, is used rather widely nowadays, for example Microsoft provides a type of model checker in the driver development kit and it can be used to verify the driver for a set of common bugs. Model checkers are also often used in verifying hardware circuits.
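To give a feel for the idea (a toy sketch, nowhere near the scale of SPIN or the driver verifier; the little lock protocol below is invented for illustration): an explicit-state model checker enumerates every reachable state of a model and checks a safety property in each one.

    from collections import deque

    def step(state):
        # state = (pc0, pc1, lock); pc: 0 = idle, 1 = in critical section
        pc0, pc1, lock = state
        if pc0 == 0 and not lock:
            yield (1, pc1, True)    # process 0 takes the lock, enters
        if pc0 == 1:
            yield (0, pc1, False)   # process 0 leaves, releases the lock
        if pc1 == 0 and not lock:
            yield (pc0, 1, True)    # process 1 takes the lock, enters
        if pc1 == 1:
            yield (pc0, 0, False)   # process 1 leaves, releases the lock

    seen, todo = set(), deque([(0, 0, False)])
    while todo:
        s = todo.popleft()
        if s in seen:
            continue
        seen.add(s)
        # Safety property: the two processes are never both critical.
        assert not (s[0] == 1 and s[1] == 1), "mutual exclusion violated!"
        todo.extend(step(s))
    print("verified", len(seen), "reachable states")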
Rigorous testing can also be thought of as formal verification - there are formal specifications of which paths of the program should be tested, and so on.
"Are formal methods used in industry?"
Yes.
The assert statement in many programming languages is related to formal methods for verifying a program.
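As a trivial illustration of that connection (my sketch, not part of the original answer): asserts let you state Hoare-style pre- and postconditions inline, checked at run time rather than proved ahead of time.

    def integer_sqrt(n):
        assert n >= 0, "precondition: n must be non-negative"
        r = 0
        while (r + 1) * (r + 1) <= n:
            r += 1
        # Postcondition: r is the integer square root of n.
        assert r * r <= n < (r + 1) * (r + 1)
        return r

A Hoare-logic proof would establish the postcondition for all inputs once and for all; the assert only checks it for the inputs you actually run.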
"Are formal methods used in industry widely ?"
No.
"Are these methods used to prove mission-critical software ?"
Sometimes. More often, they're used to prove that the software is secure. More formally, they're used to prove certain security-related assertions about the software.
There are two different approaches to formal methods in the industry.
One approach is to change the development process completely. The Z notation and the B method that were mentioned are in this first category. B was applied to the development of the driverless subway line 14 in Paris (if you get a chance, climb in the front wagon. It's not often that you get a chance to see the rails in front of you).
Another, more incremental, approach is to preserve the existing development and verification processes and to replace only one of the verification tasks at a time with a new method. This is very attractive, but it means developing static analysis tools for existing, widely used languages that are often not easy to analyse (because they were not designed to be).
If you go to (for instance)
http://dblp.uni-trier.de/db/indices/a-tree/d/Delmas:David.html
(sorry, only one hyperlink allowed for new users :( )
you will find instances of practical applications of formal methods to the verification of C programs (with static analyzers Astrée, Caveat, Fluctuat, Frama-C) and binary code (with tools from AbsInt GmbH).
By the way, since you mentioned Hoare Logic, in the above list of tools, only Caveat is based on Hoare logic (and Frama-C has a Hoare logic plug-in). The others rely on abstract interpretation, a different technique with a more automatic approach.
My area of expertise is the use of formal methods for static code analysis to show that software is free of run-time errors. This is implemented using a formal methods technique known as "abstract interpretation". The technique essentially enables you to prove certain attributes of a s/w program, e.g. prove that a+b will not overflow or that x/(x-y) will not result in a divide by zero. An example static analysis tool that uses this technique is Polyspace.
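To give a flavour of the idea (a toy sketch of mine, not how Polyspace works internally): abstract interpretation runs the program over an abstract domain - here, intervals instead of concrete numbers - so a single abstract "execution" covers all possible concrete inputs.

    class Interval:
        # Abstract value: every variable is tracked as a (lo, hi) range.
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi

        def __sub__(self, other):
            return Interval(self.lo - other.hi, self.hi - other.lo)

        def may_be_zero(self):
            return self.lo <= 0 <= self.hi

    # Analysing the divisor of  x / (x - y)  for x in [1, 10], y in [0, 5]:
    x = Interval(1, 10)
    y = Interval(0, 5)
    divisor = x - y                 # the interval [-4, 10]
    if divisor.may_be_zero():
        print("possible division by zero")   # the analyser flags this line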
With respect to your question: "Are formal methods used in industry widely?" and "Are these methods used to prove mission-critical software?"
The answer is yes. This opinion is based on my experience supporting the Polyspace tool for industries that rely on embedded software to control safety-critical systems, such as the electronic throttle in an automobile, the braking system for a train, jet engine controllers, drug delivery infusion pumps, etc. These industries do indeed use these types of formal methods tools.
I don't believe all 100% of these industry segments are using these tools, but the use is increasing. My opinion is that the Aerospace and Automotive industries lead with the Medical Device industry quickly ramping up use.
Polyspace is a (hideously expensive, but very good) commercial product based on program verification. It's fairly pragmatic, in that it scales up from 'enhanced unit testing that will probably find some bugs' to 'the next three years of your life will be spent showing these 10 files have zero defects'.
It is based more on negative verification ('this program won't corrupt your stack') instead of positive verification ('this program will do precisely what these 50 pages of equations say it will').
To add to Jorg's answer, here's an interview with Tony Hoare. The tools Jorg's referring to, I think, are PREfast and PREfix. See here for more information.
Besides other, more procedural approaches, Hoare logic was the basis of Design by Contract, introduced as an object-oriented technique by Bertrand Meyer in Eiffel (see Meyer's article of 1992, page 4). While Design by Contract is not the same as formal verification methods (for one thing, DbC doesn't prove anything until the software is executed), in my opinion it provides a more practical use.
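Since most mainstream languages have no built-in contracts, here is a rough, Eiffel-inspired sketch of the idea in Python (the decorator and the Account example are invented for illustration): pre- and postconditions are attached to a method and checked on every call - i.e., as noted above, only when the software is executed.

    def contract(pre=None, post=None):
        # Minimal DbC decorator: check the precondition before the call,
        # and the postcondition (which also sees the result) after it.
        def wrap(fn):
            def checked(*args, **kwargs):
                if pre is not None:
                    assert pre(*args, **kwargs), "precondition violated"
                result = fn(*args, **kwargs)
                if post is not None:
                    assert post(result, *args, **kwargs), "postcondition violated"
                return result
            return checked
        return wrap

    class Account:
        def __init__(self, balance=0):
            self.balance = balance

        @contract(pre=lambda self, amount: amount > 0,
                  post=lambda result, self, amount: self.balance >= 0)
        def withdraw(self, amount):
            if amount > self.balance:
                raise ValueError("insufficient funds")
            self.balance -= amount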

Flow Based Programming

I have been doing a little reading on Flow Based Programming over the last few days. There is a wiki which provides further detail. And Wikipedia has a good overview of it too. My first thought was, "Great, another proponent of lego-land pretend programming" - a concept harking back to the late 80's. But, as I read more, I must admit I have become intrigued.
Have you used FBP for a real project?
What is your opinion of FBP?
Does FBP have a future?
In some senses, it seems like the holy grail of reuse that our industry has pursued since the advent of procedural languages.
1. Have you used FBP for a real project?
We've designed and implemented a DF server for our automation project (dispatcher, component interface, a bunch of components, DF language, DF compiler, UI). It is written in bare C++, and runs on several Unix-like systems (Linux x86, MIPS, avr32 etc., Mac OS X). It lacks several features, e.g. sophisticated flow control and complex thread control (there is only a not-too-advanced component for it), so it is just a prototype, even though it works. We're now working on a full-featured server. We've learnt a lot while implementing and using the prototype.
Also, we'll make a visual editor some day.
2. What is your opinion of FBP?
2.1. First of all, dataflow programming is ultimate fun
When I met dataflow programming, I felt like I did 20 years ago, when I first met programming. Although DF programming differs from procedural/OOP programming, it's still a kind of programming. There are lots of things to discover, even sooo simple ones! It's very funny when, as an experienced programmer, you meet a DF problem that is a very, very basic thing, but was completely unknown to you before. So, if you jump into DF programming, you will feel like a rookie programmer who has just met the "cycle" or the "condition".
2.2. It can be used only for specific architectures
It's just a hammer, and hammers are for hammering nails. DF is not suitable for UIs, web servers and so on.
2.3. Dataflow architecture is optimal for some problems
A dataflow framework can do magic things. It can parallelize procedures that were not originally designed for parallelization. Components are single-threaded, but when they're organized into a DF graph, they become multi-threaded.
Example: did you know that make is a DF system? Try make -j (see the man page for what -j does). If you have a multi-core machine, compile your project with and without -j, and compare the times.
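To make this concrete, here is a minimal sketch of the same idea (mine, in Python rather than the C++ our server uses): each component is plain single-threaded code, and parallelism appears only when components are wired into a graph.

    import threading, queue

    SENTINEL = object()  # end-of-stream marker

    def component(fn, inp, out):
        # A component just reads a packet, transforms it, writes the result.
        def run():
            while True:
                packet = inp.get()
                if packet is SENTINEL:
                    if out is not None:
                        out.put(SENTINEL)
                    return
                result = fn(packet)
                if out is not None:
                    out.put(result)
        return threading.Thread(target=run)

    # Wire a two-component graph: square -> printer. The queues are the arcs.
    q1, q2 = queue.Queue(), queue.Queue()
    square = component(lambda x: x * x, q1, q2)
    printer = component(lambda x: print("got", x), q2, None)
    square.start(); printer.start()

    for n in range(5):
        q1.put(n)          # feed packets into the graph
    q1.put(SENTINEL)
    square.join(); printer.join()

Neither lambda knows anything about threads; the framework's wiring makes the whole graph run concurrently.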
2.4. Optimal split of the problem
If you're writing a program, you often split up the problem for smaller sub-problems. There are usual split points for well-known sub-problems, which you don't need to implement, just use the existing solutions, like SQL for DB, or OpenGL for graphics/animation, etc.
DF architecture splits your problem in a very interesting way:
the dataflow framework, which provides the architecture (just use an existing one),
the components: the programmer creates components; the components are simple, well-separated units - it's easy to make components;
the configuration: a.k.a. dataflow programming: the configurator puts the dataflow graph (program) together using components provided by the programmer.
If your component set is well-designed, the configurator can build systems the programmer has never even dreamed of. The configurator can implement new features without disturbing the programmer. Customers are happy because they get a personalised solution. The software manufacturer is also happy because they don't need to maintain several customer-specific branches of the software, just customer-specific configurations.
2.5. Speed
If the system is built from native components, the DF program is fast. The only overhead compared to a simple OOP program is the message dispatching between components, and even that is minimal.
3. Does FBP have a future?
Yes, sure.
The main reason is that it can solve massive multiprocessing issues without introducing brand-new, strange software architectures or weird languages. Dataflow programming is easy, and I mean both parts: component programming and dataflow configuration building. (Even writing a dataflow framework is not rocket science.)
Also, it's very economical. If you have a good set of components, you only need to put the lego bricks together. A DF program is easy to maintain. Building the DF config requires no experienced programmer, just a system integrator.
I would be happy if native systems spread, with doors open for creating custom components. There should also be a standard DF language, so that it could be used with platform-independent visual editors and several DF servers.
Interesting discussion! It occurred to me yesterday that part of the confusion may be due to the fact that many different notations use directed arcs, but use them to mean different things. In FBP, the lines represent bounded buffers, across which travel streams of data packets. Since the components are typically long-running processes, streams may comprise huge numbers of packets, and FBP applications can run for very long periods - perhaps even "perpetually" (see a 2007 paper on a project called Eon, mostly by folks at UMass Amherst). Since a send to a bounded buffer suspends when the buffer is (temporarily) full (or temporarily empty), indefinite amounts of data can be processed using finite resources.
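The suspension behaviour is easy to see in miniature (a small Python illustration of the point, not from the original answer): a producer feeding a bounded buffer blocks when the buffer is full and resumes as the consumer drains it, so an indefinite stream is processed with finite memory.

    import threading, queue, time

    buf = queue.Queue(maxsize=2)       # a bounded buffer of capacity 2

    def producer():
        for i in range(6):
            buf.put(i)                 # suspends while the buffer is full
            print("sent", i)

    threading.Thread(target=producer).start()
    time.sleep(0.5)                    # the producer stalls after 2 packets...
    for _ in range(6):
        print("received", buf.get())   # ...and resumes as we drain the buffer
        time.sleep(0.1)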
By comparison, the E in Grafcet comes from Etapes, meaning "steps", which is a rather different concept. In this kind of model (and there are a number of these out there), the data flowing between steps is either limited to what can be held in high-speed memory at one time, or has to be held on disk. FBP also supports loops in the network, which is hard to do in step-based systems - see for example http://www.jpaulmorrison.com/cgi-bin/wiki.pl?BrokerageApplication - notice that this application used both MQSeries and CORBA in a natural way. Furthermore, FBP is natively parallel, so it lends itself to programming of grid networks, multicore machines, and a number of the directions of modern computing. One last comment: in the literature I have found many related projects, but few of them have all the characteristics of FBP. A list that I have amassed over the years (a number of them closer than Grafcet) can be found in http://www.jpaulmorrison.com/cgi-bin/wiki.pl?FlowLikeProjects .
I do have to disagree with the comment about FBP being just a means of implementing FSMs: I think FSMs are neat, and I believe they have a definite role in building applications, but the core concept of FBP is of multiple component processes running asynchronously, communicating by means of streams of data chunks which run across what are now called bounded buffers. Yes, definitely FSMs are one way of building component processes, and in fact there is a whole chapter in my book on FBP devoted to this idea, and the related one of PDAs (1) - http://www.jpaulmorrison.com/fbp/compil.htm - but in my opinion an FSM implementing a non-trivial FBP network would be impossibly complex. As an example the diagram shown in
is about 1/3 of a single batch job running on a mainframe. Every one of those blocks is running asynchronously with all the others. By the way, I would be very interested to hear more answers to the questions in the first post!
1: http://en.wikipedia.org/wiki/Pushdown_automaton Push-down automata
Whenever I hear the term flow based programming I think of LabVIEW, conceptually. I.e. component processes whose scheduling is driven primarily by changes to their input data. This really IS lego programming, in the sense that the LabVIEW platform was used for the latest crop of Mindstorms products. However, I disagree that this makes it a less useful programming model.
For industrial systems, which typically involve data collection, control, and automation, it fits very well. What is any control system if not data in, transformed to data out? I.e. what component in your control scheme would you not prefer to represent as a black box in a bigger picture, if you could do so? To achieve that level of architectural clarity using other methodologies you might have to draw a data-domain class diagram, then a problem-domain run-time class relationship, then on top of that a use case diagram, and flip back and forth between them. With flow-driven systems you have the luxury of being able to collapse a lot of this information together accurately enough that you can realistically design a system visually once the components are built and defined.
One question I never had to ask when looking at an application written in LabVIEW is "What piece of code set this value?", as it was inherent and easy to trace backwards from the data, and mistakes like multiple unintended writers were impossible to create by accident.
If only that was true of code written in a more typically procedural fashion!
1) I built a small FBP framework for an anomaly detection project, and it turned out to have been a great idea.
You can also have a look at some of the KNIME videos, which give a good idea of what a flow-based framework feels like when the framework is put together by a great team. Admittedly, it is batch-based and not created for continuous operation.
By far the best example of flow based programming, however, is UNIX pipes - one of the oldest and most overlooked FBP frameworks. I don't think I have to elaborate on the power of *nix pipes...
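For instance (my sketch; the log file name is made up), a classic shell pipeline is already an FBP network: each process is a component, and the OS pipe between them is a bounded buffer.

    import subprocess

    # The pipeline  cat access.log | grep ERROR | sort | uniq -c  as a
    # dataflow graph of four single-purpose components joined by OS pipes.
    p1 = subprocess.Popen(["cat", "access.log"], stdout=subprocess.PIPE)
    p2 = subprocess.Popen(["grep", "ERROR"], stdin=p1.stdout, stdout=subprocess.PIPE)
    p3 = subprocess.Popen(["sort"], stdin=p2.stdout, stdout=subprocess.PIPE)
    p4 = subprocess.Popen(["uniq", "-c"], stdin=p3.stdout, stdout=subprocess.PIPE)
    for p in (p1, p2, p3):
        p.stdout.close()   # let each stage see EOF/SIGPIPE properly
    print(p4.communicate()[0].decode())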
2) FBP is a very powerful tool for a large set of problems. The intrinsic parallelism is a great advantage, and any FBP framework can be made completely network transparent by using adapter modules. Smart frameworks are also absurdly fault tolerant, and able to dynamically reload crashed modules when necessary. The conceptual simplicity also allows cleaner communication with everybody involved in a project, and much cleaner code.
3) Absolutely! Pipes are here to stay, and are one of the most powerful features of Unix. The advantages of an FBP framework over a static program are many, and they trivialise change, to the point where some frameworks can be reconfigured while running, with no special measures.
FBP FTW! ;-)
In automotive development, there is a language-agnostic messaging protocol which is part of the MOST specification (Media Oriented Systems Transport); this was designed for communication between components over a network or within the same device. Systems usually have both a real and a virtualized message bus - therefore you effectively have a form of flow based programming.
That was what made the light bulb go on for me several years ago and brought me here. It really is a fantastic way to work, and so much more fun than conventional programming. The message catalog forms the central specification and point of reference. It works well for both developers and management, i.e. management are able to browse the message catalog instead of looking at source.
With integrated logging also referencing the catalog to produce intelligible analysis, things can get really productive. I have real-world experience of developing commercial products in this way. I am interested in taking things further, particularly with regards to tools and IDEs. Unfortunately I think many people within the automotive sector have missed the point about how great this is and have failed to build on it. They are now distracted by other fads and have failed to realize that there is far more to MOST development than the physical bus.
I've used Spring Web Flow extensively in Java web applications to model (typically) application processes, which tend to be complex wizard-like affairs with lots of conditional logic as to which pages to display. It's incredibly powerful. A new product was added, and I managed to recut the existing pieces into a completely new application process in an hour or two (with the addition of a couple of new views/states).
I also looked into using OS Workflow to model business processes but that project got canned for various reasons.
In the Microsoft world you have Windows Workflow Foundation ("WWF"), which is becoming more popular, particularly in conjunction with Sharepoint.
FBP is just a means of implementing a finite state machine. It's nothing new.
I realize that it is not exactly the same thing, but this model has been used for years in PLC programming. IEC calls it Sequential Function Chart, but many people call it Grafcet after a popular implementation. It offers parallel processing and defines transitions between states.
It's being used in the Business Intelligence world these days to mash up and process data. Data processing steps like ETL, querying, joining, and producing reports can be done by the end user. I'm a developer on an open system - ComposableAnalytics.com. In CA, the flow-based apps can be shared and executed via the browser.
This is what MQ Series, MSMQ and JMS are for.
This is cornerstone of Web Services and Enterprise Service Bus implementations.
Products like TIBCO and Sun's JCAPS are basically flow-based without using this particular buzz-word.
Most of the work of the application is done with small modules that pass messages through a processing network.

How to get started with speech-to-text?

I'm really interested in speech-to-text algorithms, but I'm not sure where to start studying up on them. A bunch of searching around led me to this, but it's from 1996 and I'm fairly certain that there have been improvements since then.
Does anyone who has any experience with this sort of stuff have any recommendations for reading / source code to examine? Or just general advice on what I should be trying to learn about if I want to get into the world of writing speech recognition programs (sometimes it's hard to know what to search for if you don't have much knowledge about the domain).
Edit: I'd like to do something cross-platform, but for the moment I'd be targeting linux.
Edit 2: Thanks csmba for the well-thought-out reply. At this point in time, I'm mainly interested in being able to create applications that allow automation, or execution of different commands through voice. So, a limited set of recognizable commands that can be strung together. An example would be a music player that took commands like "Play the album Hello Everything by Squarepusher", or an application launcher that allowed the user to create voice shortcuts to launch specific apps.
I realize that it's a pretty giant problem, and that I have nowhere near the level of knowledge required right now to tackle implementing an entire recognition engine, although the techniques involved with doing so fascinate me, and it is something I'd like to work myself up to doing. In all likelihood, I'll probably end up picking up a book or two on the subject and studying up / playing with "simple" implementations in my free time.
This is a HUGE question; I wouldn't know how to begin... So let me just try giving you the right "terms" so you can refine your quest:
First, understand that Speech Recognition is a diverse and complicated subject, and it has many different applications. People tend to map this domain to the first thing that comes to their head (usually, that would be computers understanding what you are saying, as in IVR systems). So first let's separate the concept into the main categories:
Human-to-Machine: Applications that deal with understanding what a human is saying, but the human knows he is talking to a machine and the grammar is very limited. Examples are
Computer automation
Specialized: pilots automating some controls, for example (noise is a huge problem)
IVR (Interactive Voice Response) systems like Google-411 or when you call the bank and the computer on the other side says "say 'service' to get customer service"
Human-to-human (spontaneous speech): This is a bigger, more complex problem. Here we can also break it down into different applications:
Call Center: conversation between Agent-Customer, phone quality, compressed
Intelligence: radio/phone/live conversations between 2 or more individuals
Now, Speech-To-Text is not what you should be saying that you care about. What you care about is solving a problem. Different technologies are used to solve different problems. See an overview here of some of them. To summarize, other approaches are phonetic transcription, LVCSR, and direct-based approaches.
Also, are you interested in being the PhD behind the technology? You would need a Masters equivalent involving signal processing, and probably a PhD, to be cutting edge. In which case, you will work for a company that develops the actual speech engine. Companies like Nuance and IBM are the big ones, but Phillips and other startups also exist.
On the other hand, if you want to be the one implementing applications, you will not be working on the engine, but on building applications that USE the engine. A good analogy, I think, is from the gaming industry:
Are you developing the graphic engine (like the Cry engine), or working on one of several hundred games, all use the same graphic engine?
Don't get me wrong, there is plenty of work on the quality of the search to be done outside the IBMs/Nuances of the world. The engine is usually very open, and there is a lot of algorithmic tweaking to be done that can dramatically affect performance. Each business application has different constraints and cost/benefit functions, so you can spend many years experimenting and building better voice-recognition-based applications.
One more thing: in general, the lower in the stack you want to work, the better your statistics background needs to be.
At this point in time, I'm mainly interested in being able to create applications that allow automation
Good, we are converging here... Then you have no interest in "Speech-to-Text". That buzzword takes you to the world of full transcription, a place you do not need to go. You should be focusing on some of the more Human-to-Machine technologies, like VoiceXML and the ones used in IVR systems (Nuance is the biggest player there).
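For the limited-command use case in the question, the hard part (acoustic recognition) stays inside the engine; the application side can then be as simple as matching the engine's text output against a small grammar. A toy sketch (the command phrases and actions are invented for illustration):

    import re

    # Hypothetical voice-command grammar: map recognized phrases to actions.
    COMMANDS = [
        (re.compile(r"play the album (?P<album>.+) by (?P<artist>.+)", re.I),
         lambda m: print("playing", m["album"], "by", m["artist"])),
        (re.compile(r"launch (?P<app>\w+)", re.I),
         lambda m: print("launching", m["app"])),
    ]

    def dispatch(recognized_text):
        # recognized_text would come from the speech engine's output.
        for pattern, action in COMMANDS:
            m = pattern.fullmatch(recognized_text.strip())
            if m:
                action(m)
                return True
        return False   # not in the grammar - ask the user to repeat

    dispatch("Play the album Hello Everything by Squarepusher")
    dispatch("launch firefox")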
I would definitely recommend picking up a book or two if you are new to the field. I've got no experience in the field, so I can't make a recommendation. If you are still in college (or still have close ties), you should find out if any of your professors can make a recommendation.
The survey you linked is probably an excellent resource, too. I'm sure there have been advancements since 1996, but the basics are unlikely to have fundamentally changed. If the survey is well-written, then it would be well worth your time to read it.
For OS X check out this: OS X Speech Technologies
For Windows check out this: Microsoft Speech API
I have worked with IBM's ViaVoice product. It has a good ASR (automated speech recognition) engine and a nice text-to-speech engine.
The website's not very good, but here is a link for the Embedded version: http://www-01.ibm.com/software/voice/support/
It is platform-agnostic though, and everything works through an MVC architecture using VXML, a variant of XML for voice purposes.
What platform are you targeting? There are the Microsoft Speech APIs that you can use if it's for Windows.
There is also the Speech Recognition Service for Android.