Usability: speech recognition versus keypad

We are seeing more and more speech recognition implemented, and more requests for libraries that do good speech recognition. What's the rationale (in terms of usability) behind it versus a keyboard or keypad? What reasons would you have to invest in this development?
For example, take call centers. A few years ago, almost every call center used an IVR that prompted for a key press to navigate the menus. Now we're seeing more and more menus that prompt for a spoken keyword and/or a key press: "please say invoice or press 1 to see your invoice". We're seeing the same thing in companies' phone directories: "please say the name of the person you are trying to reach" ... "Franck Loyd" ... "Did you say Jack Freud? Please say yes if you want to reach this person or say no to try again".
I guess it's a plus when you're in your car and can't hold your phone, but is it worth the additional waiting time? Longer interactions for all the choices, longer prompt times while the system tries to decide whether something was said, and so on? Also, reliability is definitely better than it used to be, but sometimes it feels more like a toy someone decided to plug into the system so it can feel futuristic.
Any experience designing IVR or software that used (or chose not to) speech recognition?
Thanks!

What's the rationale (in term of
usability) behind it versus a keyboard
or keypad?
Usability is a very broad term. If I were to attempt to enter my address with a touch pad, it wouldn't be considered very usable. Some argue that a speech engine with an overall success rate of 70-80% isn't very usable either. As indicated in other posts, hands-free input can be much easier for those on a mobile phone. However, using words rather than numeric input can actually be less intuitive than a touch-tone phone if the topic is somewhat foreign to the caller. A caller hearing terms and phrases that aren't familiar can't remember them within the 10-30 seconds of the prompt, but they can hover over the best-sounding choice with their finger or remember the order of the choices.
What reasons would you have
to invest in this development?
This is an odd question. Usually the decision to use speech or not in an IVR environment is not driven from the development view of the world. Unless you have a specific requirement that really requires speech, you are almost always reducing overall success rates. Speech is usually a factor of corporate image ... or having the latest technological toy.
I guess it's a plus when you're in your car without holding your phone
but is it worth the additional waiting time?
Speech recognition latencies aren't very high these days when using modern ASRs. In most cases, input is handled in parallel with the speech, and the time between end of speech and recognition is 0.5 to 1 s. Be aware that many IVRs then need to perform data look-ups after some inputs, and this can make the system appear slower. Normal inputs pushing beyond 1 s are usually the sign of an under-powered deployment.
It may not have been under-powered when originally implemented, but through tuning efforts you make a lot of performance-versus-accuracy decisions. To get that next 0.1%, resources can be pushed beyond what they should be at peak.
Also, reliability is better than it was, definitely,
but sometime it feels more like an toy someone decided
to plugged into the system so it can feel futuristic.
In general, yes. On the reliability note, you need to really look at the overall numbers to get a sense of the system. It is a battle of statistics where the individual isn't very important (unless they hold the title of VP or above). Through optimization of the input (shifting prompting), resource usage and other speech recognition tuning parameters, you attempt to maximize accuracy. For basic natural language responses, you can get into the upper 90s. However, your overall success rate is much lower. Imagine 5 prompts all at 98% (in reality, you tend to have a bunch at 99 and then a few in the mid 90s or slightly below): .98 * .98 * .98 * .98 * .98 ≈ 90%. That means 1 out of 10 calls failing. That is before caller confusion and business rules. DTMF input is usually very near 100%, even after several inputs.
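To make the compounding effect concrete, here is a minimal sketch in plain Python; the per-prompt accuracies are made up for illustration:

    # Hypothetical per-prompt recognition accuracies for a 5-step call flow.
    per_prompt = [0.99, 0.99, 0.98, 0.97, 0.95]

    overall = 1.0
    for accuracy in per_prompt:
        overall *= accuracy  # a call succeeds only if every prompt succeeds

    print(f"Overall success rate: {overall:.1%}")  # roughly 88-90% for numbers like these

Even with individual prompts in the high 90s, chaining a handful of them drops the whole-call success rate noticeably, which is the point being made above.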
Any experience designing IVR or software that
used (or chose not to) speech recognition?
Yes. But, I suspect that really isn't the question you want. As someone on the technology side, this is usually not your decision and you have limited influence on it. If you are really looking for the pros/cons of speech:
Pros:
Cool/hip (note: speech alone isn't sufficient; you need a great VUI and voice talent)
Good for a highly mobile crowd that shuns ear pieces. The future is supposed to be blending speech with tactile input. Maybe. It probably won't come from the IVR side of the market.
Good for tasks that can't be done with DTMF. Note, many of these problems tend to have low success rates in speech as well. Cost (versus humans) is usually the driving factor not usability. Dropping a call into a voicemail box for things like address change can be very cost effective.
Cons:
Expensive to develop, deploy and maintain. Adding new choices can have a significant impact on success rates if you aren't careful. Always monitor the impact of changes.
Is often deployed inappropriately. For example, "just say your numeric menu choice." This is almost always a case of wanting speech coolness but not being able to afford what it really takes to achieve it.
Success rates will be lower and therefore call center costs will be higher.
Failures tend to focus on specific prompts and individual callers. A caller that regularly experiences problems with your system will be very unhappy with you.
Callers get angry when they aren't understood. Is your goal to identify a subset of your customer base and really make them angry?

I think that speech recognition, like any method of input, has its pros and cons.
Pros
No learning curve; we have been speaking since a very young age.
Very intuitive for users.
On the phone, no need to constantly move the handset away from your ear.
Cons
Longer wait times.
With bad sound quality, it takes multiple attempts to get the selection right.

In some cases a company is required to handle rotary phones. It might be more cost effective to just set up the recognition system instead of both.
Voice recognition has a lot more overhead than touch tones. If you want the best results, you need to constantly tweak the app and train the system on unrecognized word pronunciations. You also need to be very particular about how you prompt the user with voice recognition, or you may get unexpected responses.
Overall touch tone is a lot easier as there are only a limited set of possible options at any given time.
If your app is straightforward enough, voice rec may only complicate it. Press 2 for some other language...

Speech recognition is definitely the wave of the future when combined with touchscreen technology. As an example, I use tazti speech recognition. It's available in XP and Vista versions. Since Microsoft's touchscreen "Surface" platform runs on Vista, I'm sure tazti will work with the touchscreen technology. When I tried tazti, the built-in commands worked great. It also lets me create my own speech commands, and those work great too. Voice-searching Google, Yahoo, Wikipedia, YouTube and many other search engines works great. It has many other features as well, but it doesn't have dictation. I found that it eliminates 70% or more of my internet-generated clicks... maybe more. NOTE: tazti is a free download from their website.

Related

How can I analyze live data from webcam?

I am going to be working on self-chosen project for my college networking class and I just had a couple questions to help get me started in the right direction.
My project will involve creating a new "physical" link over which data, in the form of text, will be transmitted from one computer to another. This link will involve one computer with a webcam that reads a series of flashing colors (black/white) as binary and converts it to text. Each series of flashes will simulate a packet of data. I will be using OS X and the integrated webcam in a MacBook; the flashing computer will be either Windows or OS X.
So my questions are: which programming languages or APIs would be best for reading live webcam data and analyzing the color of a certain area, as well as for programming and timing the flashes? Also, would I need to worry about matching the flash rate of the "writing" computer and the frame capture rate of the "reading" computer?
Thank you for any help you might be able to provide.
Regarding the frame capture rate, the Shannon sampling theorem says that "perfect reconstruction of a signal is possible when the sampling frequency is greater than twice the maximum frequency of the signal being sampled". In other words, if your flashing light switches 10 times per second, you need a camera of more than 20 fps to properly capture that. So basically check your camera specs, divide by 2, lower the result a little, and you have your maximum flashing rate.
Whatever can get the frames will work. If the lighting conditions the camera works in are stable, and the position of the light in the image is static, then it is going to be very easy: just check the average pixel value of a certain area.
If you need additional image processing you should probably also look into OpenCV (it has bindings for most major programming languages).
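As a rough illustration of the average-pixel-value idea, here is a minimal sketch using the OpenCV Python bindings; the region coordinates, threshold and sample count are placeholders you would tune for your setup:

    import cv2

    # Region of interest where the flashing light appears (placeholder coordinates).
    X, Y, W, H = 200, 150, 40, 40
    THRESHOLD = 128  # mid-grey: above = "1" (white flash), below = "0" (black)

    cap = cv2.VideoCapture(0)  # default webcam

    bits = []
    while len(bits) < 64:  # read a fixed number of samples for this demo
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        roi = gray[Y:Y + H, X:X + W]
        bits.append(1 if roi.mean() > THRESHOLD else 0)

    cap.release()
    print(bits)

Note that this reads one sample per frame; per the sampling-theorem point above, the sender would need to flash at well under half the camera's frame rate, and a real link would also need some framing or clock-recovery scheme rather than assuming exactly one bit per frame.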
To answer your question about language choice, I would recommend java. The Java Media Framework is great and easy to use. I have used it for capturing video from webcams in the past. Be warned, however, that everyone you ask will recommend a different language - everyone has their preferences!
What are you using as the flashing device? What kind of distance are you trying to achieve? Something worth thinking about is how are you going to get the receiver to recognise where within the captured image to look for the flashes. Some kind of fiducial marker might be necessary. Longer ranges will make this problem harder to resolve.
If you're thinking about shorter ranges, have you considered using a two-dimensional transmitter? (given that you're using a two-dimensional receiver, it makes sense) and maybe have a transmitter that shows a sequence of QR codes (or similar encodings) on a monitor?
You will have to consider some kind of error-correcting code, such as a Hamming code. While encoding increases the data footprint, it might give you better overall bandwidth, because you can crank the speed up much higher without having to worry about the odd corrupt bit.
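For illustration only, here is a minimal Hamming(7,4) encoder/decoder sketch in Python (one of the simplest codes that can correct a single flipped bit per 7-bit block); a real implementation would also need framing and probably interleaving:

    def hamming74_encode(d):
        """Encode 4 data bits [d1, d2, d3, d4] into 7 bits [p1, p2, d1, p3, d2, d3, d4]."""
        d1, d2, d3, d4 = d
        p1 = d1 ^ d2 ^ d4
        p2 = d1 ^ d3 ^ d4
        p3 = d2 ^ d3 ^ d4
        return [p1, p2, d1, p3, d2, d3, d4]

    def hamming74_decode(c):
        """Correct up to one flipped bit and return the 4 data bits."""
        p1, p2, d1, p3, d2, d3, d4 = c
        s1 = p1 ^ d1 ^ d2 ^ d4           # parity check over positions 1,3,5,7
        s2 = p2 ^ d1 ^ d3 ^ d4           # parity check over positions 2,3,6,7
        s3 = p3 ^ d2 ^ d3 ^ d4           # parity check over positions 4,5,6,7
        error_pos = s1 + 2 * s2 + 4 * s3  # 0 means no detected error
        if error_pos:
            c = c[:]
            c[error_pos - 1] ^= 1        # flip the offending bit back
        return [c[2], c[4], c[5], c[6]]

    # Round trip with one corrupted bit (simulating a misread flash):
    code = hamming74_encode([1, 0, 1, 1])
    code[5] ^= 1
    assert hamming74_decode(code) == [1, 0, 1, 1]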
Some 'evaluation' type material might include you discussing the obvious security risks in using such a channel - anyone with line of sight to the transmitter can eavesdrop! You could suggest in your writeup using some kind of encryption; a block cipher in CBC mode would do, but it would require a key exchange prior to transmission, so you could think about public key encryption.

What's the difference between game development and business development?

Like most developers, I'm a business developer, which in essence consists of slapping a UI onto some back-end data store. (We all know there's a lot more to it than that, but that's usually what it boils down to.)
I understand that game development is very different from business development, but I'm having a hard time explaining it to a friend of mine. I was hoping the SO community could help me out here.
To me, modern game developers deal a lot with manipulating 3-dimensional graphics. In gaming code (and I'm guessing here), you're assembling polygons (or something like that), rotating 'em, etc. This involves a different way of thinking from manipulating relational data (for instance). I don't know, really. I just know it's different.
EDIT:
I should stress that by "development" I mean "programming," not all of the aspects that go into creating a game or piece of business software. I'm sorry I didn't make that clear originally.
Thanks!
I'm in game development but came from business development long ago. Game development is very rigorous in mathematics if you work on the physics or graphics side. Even AI can need quite a bit of mathematics for the low-level stuff. The hardware usually takes care of a lot of the polygon manipulation math as far as drawing on the screen goes. There is also a lot of involvement with generating the in-game data with (often) many tools that are run in a pre-processing step, and that too can be math-intensive if you are generating visibility data.
In terms of programming domains, amongst other things, we deal with:
Graphics programming (including shader development)
Animation
Physics simulation
AI and gameplay
Audio
Networking (typically fairly low-level stuff)
Some of these involve pretty serious maths and algorithms knowledge. On top of all that, we face extremely tough speed constraints, and typically have to be very careful with memory usage too. We face constantly changing hardware, and since we're trying to push hardware to the limit, this can be pretty tough - you can't just abstract it away. Most game development is still quite low-level C++ work. We probably deal with databases less than most other programmers nowadays (although online games are changing this)!
Programmers are often the minority on modern game projects: it's all about content creation (animation, modelling, texturing, audio and design). This means many game programmers are dedicated to making the content creation process efficient, rather than working on the game code itself. This work may have more relaxed speed and memory constraints, although it does have to deal with massive data sets.
Making the game 'fun' is one of the hardest things to do - in business terminology, it means "extremely unstable requirements", as the designers constantly change their minds about how things should work to chase down that elusive fun factor.
Finally, games are generally a ship-once, no chance to fix stuff kind of deal. This actually means there's very little code maintenance involved, so traditionally there may have been less attention paid to code quality issues. This is changing now with the growth in post-launch content addition, online gaming and the sheer size of modern projects.
Overall it's an incredibly exciting field to be in, the downside is that it's often less well paid (because it's a very tough business financially for developers, and because it's popular, there's always a fresh supply of people looking for jobs).
Just some random thoughts about what is different in game development. Note that there might be some sarcasm in it, though I tried to suppress the urge.
Unless you're a lucky employee of one of those new-style studios (like Eidos Montreal or Blizzard), there is always a deadline to fear that is much too short. In business programming, you mostly make the deadline up for yourself.
A business application serves some specific need. A game's intent is to entertain people. You can't really predict if a game will fail until it's out.
Performance is essential, in every aspect of the game. Writing code that is good to maintain is second priority. In business programming, good code that works is top priority.
For a business application, a shiny UI is a bonus. For a game, it is a must.
Debugging games is much harder, because there is always some hardware dependence which results in bugs that can only be reproduced on some machines, none of which is in your company. And a game sucks up much more performance than a typical business application.
You have people dedicated to creating the art, story, music, sound, background and design, none of which necessarily needs programming knowledge (scripting is a little different), i.e. you have a lot of content which is what the users (players) will see. Nobody cares about how good your code is, unless performance is bad or there are bugs. The others get the praise.
For larger games, you have programmers dedicated just to 3D graphics, networking, audio, tools, scripting, physics and so on. Most of them are highly specialized and each of them can lead the game into a disaster. You'll only need advanced math skills if you're the graphics or physics guy. Well, or AI.
Most games are fire-and-forget, apart from some bugfixes, unless it's one of the more successful games, which get an expansion pack or a sequel.
Security is an important issue for online games, since there are many more annoying people trying to put other players off than there are for business applications, many of which are for (more or less) internal use at the customer.
You are expected to work much more than when writing business applications.
To land a job for an AAA title, you need to have worked on at least three shipped AAA titles (no, no typo here, ever read some job descriptions at Blizzard or LucasArts? :P)
But here come the good things:
You can pretend to work when you're playing games.
And finally, programming games is fun. Priceless.
Business development is generally much more forgiving.
The reason is basically this; usually, people ARE PAID to use business software. People PAY to use game software.
This may sound like it's not answering your question, but it really is. When my boss says "use Microsoft Word for that document", they're providing the software, and I'm obligated to use Microsoft Word. And so, when it decides to renumber all my chapter headings "just because", or a save to disk takes 30 seconds while it resolves OLE references (it's JUST ONE FREAKING EXCEL SPREADSHEET, for heaven's sake!), I just grit my teeth and remind myself I'm getting paid to do this.
Whereas, if I'm in a game, I'm expecting entertainment. I'm expecting the experience to work properly, and smoothly, and cleanly, with no major stutters or problems.
Again, getting down to why this is an issue for programming; those loops and structures in the game had better be DAMN good to make sure there is no major slowdown, no stuttering in the game engine, nothing that makes the consumer who just spent X amount of his hard-earned dollars say "this is a piece of crap" and walk away. With business software, you can get away with that sort of thing; in some ways, it's almost expected. Again, look at the performance of Microsoft Word; if it were a game, it would be laughed out of existence.
I know I sound like I'm picking on Microsoft Word, and I generally am, because I find it to be hideous, but the point is true for so many pieces of software. CAD software is another example. Same basic things going on as in games, but in general it's slow and hard to work with without a lot of training.
The difference comes down to polish, and the level of polish that's expected. Yes, there's generally more flexibility in business software than there is in games; but moreover and more importantly from a coding perspective, the code has GOT to work efficiently and cleanly in a game; business software is, generally, more forgiving of sloppy code.
In a business app, unoptimized and slow algorithms are generally accepted; and while they're never preferable, frequently the business decision gets made to add another feature instead of improving the performance. But in games, performance IS a feature, and one which is make-or-break.
One should have infinite loops, one shouldn't.
One should have infinite loops, one shouldn't. - Rich Bradshaw
Rich is right. Fundamentally, from a coding standpoint, a game loop creates a "frame" of action in which actions are taken based on the state of the game, such as controller input, object collisions, etc. This loop repeats indefinitely until the state of some game element or input tells it to stop or "quit." This approach keeps the CPU and graphics card pretty busy, hence the market for gamer machines with fast processors and even faster graphics cards.
Business applications do not have an active loop. Instead, they sit idle waiting for an event such as a click, a message from a web service client, an HTTP GET request, etc. Then they respond to the event.
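As a language-agnostic sketch of that contrast (plain Python, with hypothetical get_input/update_world/render/wait_for_event functions standing in for the real work):

    import time

    # Game style: an active loop that runs every frame, whether or not anything happened.
    def game_loop(get_input, update_world, render, target_fps=60):
        frame_time = 1.0 / target_fps
        running = True
        while running:
            start = time.perf_counter()
            events = get_input()              # poll controller/keyboard state
            running = update_world(events)    # physics, AI, collisions...
            render()                          # draw the new frame
            # Sleep off whatever is left of the frame budget.
            time.sleep(max(0.0, frame_time - (time.perf_counter() - start)))

    # Business style: sit idle and react only when an event arrives.
    def event_loop(wait_for_event, handlers):
        while True:
            event = wait_for_event()          # blocks: click, HTTP request, message...
            handlers[event.kind](event)       # dispatch, then go back to waiting

The game loop must finish all its work inside a fixed frame budget every time, which is where the performance pressure described in these answers comes from; the event loop only pays when something happens.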
Sure, gaming is generally more geometrically intensive than business applications, but that is not entirely true. Consider image editing, CAD and graphics tools. For many, these are business applications. But for the most part, a business application has to do with querying data, displaying that data, accepting user input, and modifying the data based on user input. In many cases, the business application does this across the network or even the Internet, but it's an apt nutshell.
The skillset and mindset of a business application developer and the game developer is often different. The game developer has a limited number of input constructs to consider in terms of creating a user experience with an unlimited choice of context or "world" if you will. The business developer is the opposite, with a limited set of potential contexts, usually the web page or the basic window, and an unlimited (or nearly so) set of input and data display combinations to create a user experience entirely different than the game developer sets out to achieve.
One big difference between business development and game development is the number of disciplines involved. Most business software is created by a team of developers, who all have the same basic skillset. In contrast, a game is created by a team of game designers, visual artists, 3d modelers, animators, musicians, and developers.
Good points about mathematics and integration of artists and other specialists in the team. In addition, I'd say that:
Game development, to some extent, is more hardware dependent. In many cases, games are built simultaneously for several platforms and consoles (not to mention cell phones) with different architectures. That is abstracted away up to a certain point, but developers cannot completely avoid this fact.
Game development is often more performance sensitive, or at least the performance requirements are different. You're dealing with real-time experience, so a lot of time is spent optimizing those pesky fps.
In many cases, game development does not care as much about reuse and maintainability. The game engine will probably be reused, but the application code base will probably not live to see v2.0. In the last stretch of a project, there is a lot of quick and dirty debugging going on. If it looks fine to the end user, there's no added value in making an elegant fix two days before the release.
Let's start from the goal - the goal of game development is to create an entertaining product. It should be accurate to the extent that it looks good and runs smoothly. The goal of a business software solution is to model a work process. It should be a tool that works fast enough, and a stable product that executes its tasks absolutely accurately and securely.
Since we target different goals, we use different approaches to build a game and to build business software. Let's move on to the requirements. For a game, the requirements are determined by the game designer. For a software product, the business defines the process and the requirements. For a game the requirements are not final - whether we have small cartoon figures or realistic human models does not matter to the game engine, for example. But for a business product, the requirements should be strictly followed and clarified in the maximum possible detail before development.
From the different requirements come different software design and development approaches. For a game, performance and gameplay are critical, and the quality of the graphics and sounds (for example) can be reduced just to stay compatible with weaker hardware. The physics model can also be simplified just to run more smoothly and improve the gameplay. For business software, everything should be exact, and cutting features means that your product is no longer as functional as designed.
For a game, security is less important - there is no critical customer data to protect. For business software, a good security story is required, starting from data encryption (when saving to storage or transferring over the network), through a backup system, and, last but not least, compatibility with previous versions.
I could continue with other aspects but I guess this is already too much for one post...
Business software (that isn't shrink-wrap software) can generally be much more poorly written but still considered a commercial success due to the bizarre disconnect between the quality of the product and saleability of the product. Game software, on the other hand, has to actually be good to survive the marketplace.
The bar for quality in specialized business software is generally much lower.
Business software has to be reliable, maintainable, consistent, not be too annoyingly slow, and can build on lots of already written stuff, such as databases, controls, forms etc.
A games programmer often starts with a blank sheet - hardware reference manuals, some documentation about the hardware and usually thin vendor libraries around some advanced hardware that's completely different to the last job.
From this they have to build what you see - and make most of it work within a 20ms time period, reliably, and often within a ridiculously short time period, facing changing requirements and often a very hard deadline, working untold numbers of hours for a comparative pittance.
That's not to mention often having to master some fairly complex mathematics and physics....
Performance is really the difference, from what I can tell.
Technology-wise, games are usually Windows/C++ driven.
Game programming has more in common with scientific programming. You are modeling behavioral systems and anticipating results based upon a limited set of input.

How to measure usability to get hard data?

There are a few posts on usability but none of them was useful to me.
I need a quantitative measure of usability of some part of an application.
I need to estimate it in hard numbers to be able to compare it with future versions (e.g. for reporting purposes). The simplest way is to count clicks and keystrokes, but this seems too simple (for example, is the cost of filling a text field just the sum of typing all the letters? I guess it is more complicated).
I need some mathematical model for that so I can estimate the numbers.
Does anyone know anything about this?
P.S. I don't need links to resources about designing user interfaces. I already have them. What I need is a mathematical apparatus to measure existing applications interface usability in hard numbers.
Thanks in advance.
http://www.techsmith.com/morae.asp
This is what Microsoft used in part when they spent millions redesigning Office 2007 with the ribbon toolbar.
Here is how Office 2007 was analyzed:
http://cs.winona.edu/CSConference/2007proceedings/caty.pdf
Be sure to check out the references at the end of the PDF too, there's a ton of good stuff there. Look up how Microsoft did Office 2007 (regardless of how you feel about it), they spent a ton of money on this stuff.
Your main ideas to approach in this are Effectiveness and Efficiency (and, in some cases, Efficacy). The basic points to remember are outlined on this webpage.
What you really want to look at doing is 'inspection' methods of measuring usability. These are typically more expensive to set up (both in terms of time, and finance), but can yield significant results if done properly. These methods include things like heuristic evaluation, which is simply comparing the system interface, and the usage of the system interface, with your usability heuristics (though, from what you've said above, this probably isn't what you're after).
More suited to your use, however, will be 'testing' methods, whereby you observe users performing tasks on your system. This is partially related to the point of effectiveness and efficiency, but can include various things, such as the "Think Aloud" concept (which works really well in certain circumstances, depending on the software being tested).
Jakob Nielsen has a decent (short) article on his website. There's another one, but it's more related to how to test in order to be representative, rather than how to perform the testing itself.
Consider measuring the time to perform critical tasks (using a new user and an experienced user) and the number of data entry errors for performing those tasks.
First you want to define goals: for example increasing the percentage of users who can complete a certain set of tasks, and reducing the time they need for it.
Then, get two cameras, a few users (5-10) give them a list of tasks to complete and ask them to think out loud. Half of the users should use the "old" system, the rest should use the new one.
Review the tapes, measure the time it took, measure success rates, discuss endlessly about interpretations.
Alternatively, you can develop a system for bucket-testing -- it works the same way, though it makes it far more difficult to find out something new. On the other hand, it's much cheaper, so you can do many more iterations. Of course that's limited to sites you can open to public testing.
That obviously implies you're trying to get comparative data between two designs. I can't think of a way of expressing usability as a value.
You might want to look into the GOMS model (Goals, Operators, Methods, and Selection rules). It is a very difficult research tool to use, in my opinion, but it does provide a "mathematical" basis for measuring performance in a strictly controlled environment. It is best used with "expert" users. See this very interesting case study of Project Ernestine for New England Telephone operators.
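A much simpler cousin of GOMS is the Keystroke-Level Model, which just sums standard operator times for a task. As a rough sketch (the operator times below are the commonly cited textbook approximations, and the task breakdown is hypothetical):

    # Approximate KLM operator times in seconds (textbook values; treat as rough).
    KLM = {
        "K": 0.28,   # press a key or button (average typist)
        "P": 1.10,   # point at a target with a mouse
        "H": 0.40,   # move hands between keyboard and mouse
        "M": 1.35,   # mental preparation
    }

    def estimate_task_time(operators):
        """Sum operator times for a task described as a string like 'MPHKK'."""
        return sum(KLM[op] for op in operators)

    # Hypothetical task: think, point at a field, home to keyboard, type 5 characters.
    task = "M" + "P" + "H" + "K" * 5
    print(f"Estimated time: {estimate_task_time(task):.2f} s")

Like GOMS, this only models error-free expert performance, so it is best used for comparing two candidate designs rather than as an absolute usability number.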
Measuring usability quantitatively is an extremely hard problem. I tackled this as a part of my doctoral work. The short answer is, yes, you can measure it; no, you can't use the results in a vacuum. You have to understand why something took longer or shorter; simply comparing numbers is worse than useless, because it's misleading.
For comparing alternate interfaces it works okay. In a longitudinal study, where users are bringing their past expertise with version 1 into their use of version 2, it's not going to be as useful. You will also need to take into account time to learn the interface, including time to re-understand the interface if the user's been away from it. Finally, if the task is of variable difficulty (and this is the usual case in the real world) then your numbers will be all over the map unless you have some way to factor out this difficulty.
GOMS (mentioned above) is a good method to use during the design phase to get an intuition about whether interface A is better than B at doing a specific task. However, it only addresses error-free performance by expert users, and only measures low-level task execution time. If the user figures out a more efficient way to do their work that you haven't thought of, you won't have a GOMS estimate for it and will have to draft one up.
Some specific measures that you could look into:
Measuring clock time for a standard task is good if you want to know what takes a long time. However, lab tests generally involve test subjects working much harder and concentrating much more than they do in everyday work, so comparing results from the lab to real users is going to be misleading.
Error rate: how often the user makes mistakes or backtracks. Especially if you notice the same sort of error occurring over and over again.
Appearance of workarounds: if your users are working around a feature, or taking a bunch of steps that you think are dumb, it may be a sign that your interface doesn't give them the tools they need to solve their problems.
Don't underestimate simply asking users how well they thought things went. Subjective usability is finicky but can be revealing.

Ratio of real code to supporting code

I'm finding only about 30% of my code actually solves problems, the rest is taken up by logging, tests, parameter checking, exceptions, error handling and so on. Do you find that in your code, and is there an IDE/Editor that allows you to hide code that's not interesting?
OTOH are there languages which make the support code more manageable and smaller in size?
Edit - I think we're all aware of the difference between business logic and other code. I'm not saying that the logging etc. is not important. The thing is, when I'm coding I'm either implementing business logic or I'm making sure things don't break. For me those are two different ways of thinking; do others develop like that, and is there an IDE that supports that way of developing?
Supporting code is just as important as the "real code". The quality of your product is determined as much by supporting code as anything else.
Consider an automobile. In terms of just getting from point A to point B, that requires nothing more than a go-cart: a frame, a seat, an engine, a few tires. But modern cars have a lot more than just the basics. Highly efficient engines using electronic engine timing. Automatic transmissions. Bucket seats. Heating and A/C. Rack and pinion steering. Power brakes. Anti-lock brakes. Quiet, comfortable cabins protected from the weather. Air bags. Crumple zones and other advanced safety features. Etc. Etc.
Details and execution are important, even in software. If you find that your "supporting code" tends to look more like kludges and hacks, then it's time to rethink your fundamental approach. But ultimately the fit and finish determines quality of the end product as much as anything else.
Edit: The questions you should ask yourself:
Is your "supporting code":
An umbrella duct taped to a pole or a metal and glass cabin frame?
A piece of pipe tied to the front of the car or an energy absorbing bumper integrated into a crumple zone?
A grappling hook on a rope tied to the frame or 4-wheel anti-lock power brakes?
A pair of goggles and a thick coat or a windshield and a heating system?
Answers to these questions will probably affect how much you care about your "supporting code".
Edit: Response to Dave Turvey's comment:
I'd encourage rereading the original question, one of the examples of "support code" listed is "error handling". Consider this for a moment. Imagine it in the context of, say, an automobile, a microwave oven, or even an operating system. Should error handling be relegated to second class citizenship because it serves a "support" function in some abstract sense? In an automobile the safety features are part of the fundamental design of the vehicle and comprise a substantial part of the value of the car. The safety features and "error handling" of a microwave oven (indeed, of the microwave oven's embedded software as well) are an important part of its value as well. A microwave oven that was improperly shielded could cook food just fine, under the right circumstances, but it would pose a hazard to the operator.
The implicit featureset of every tool (software or otherwise) includes this list:
Robustness
Usability
Performance
Everything anyone has ever built or used has had these features. Failure to understand this will translate to failure to execute well on these features which will make for a poor quality product of low value and low commercial interest. There is no such thing as "support code", there is only a misunderstanding of the nature of what it means for a feature to be complete. A "feature" that works in the abstract only under laboratory conditions is an experiment, not a part of a product.
The idea of pure, pristine features floating on a bog of dirty, ugly support code is the wrong image of software development. Instead, think of elegant, superbly-integrated machinery that is well-built, intuitive to use, and powerful.
The supporting code is important, but you want not to be distracted by it when you don't want to. There are two technologies that can help.
A language with first-class functions will help you modularize your code so that logging, timing, and so on can be implemented once and then combined with many other modules. It will also help you write unit tests. Some good ways to learn the techniques are to read the paper Why Functional Programming Matters and to learn about the QuickCheck tool. (No, I am not a shill for John Hughes, but he does do wonderful work.)
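For instance, here is a minimal Python sketch of the idea: the logging and timing concerns are written once as a wrapper and then attached to whatever business functions need them (the business function below is made up for illustration):

    import functools
    import logging
    import time

    logging.basicConfig(level=logging.INFO)

    def logged_and_timed(func):
        """Wrap any function with logging and timing - written once, reused everywhere."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            logging.info("calling %s", func.__name__)
            try:
                return func(*args, **kwargs)
            finally:
                logging.info("%s took %.3f s", func.__name__, time.perf_counter() - start)
        return wrapper

    @logged_and_timed
    def calculate_invoice_total(line_items):
        # The "real" business logic stays free of logging/timing noise.
        return sum(qty * price for qty, price in line_items)

    print(calculate_invoice_total([(2, 9.99), (1, 4.50)]))

The business logic stays on its own, which is exactly the separation of "real code" from "support code" the question is asking about.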
If you cannot use a programming language with powerful capabilities for modularization and reuse, or if you don't want to, Don Knuth's Literate Programming technique will help you organize your code so that you can split up parts the way you want and pay attention only to what you want, when you want. The Noweb literate-programming tool supports any language that can be written in ASCII, and also combinations of those languages.
If my IDE could hide "not interesting code" I would definitely turn the feature off. You wouldn't want that happening, I bet :)
There are certainly languages that minimise the amount of supporting code, but I don't think you could switch from Java to, let's say, JavaScript simply because in JavaScript you wouldn't have to declare every exception... I think it's quite necessary to have your supporting code where it is.
Oh, and you could have your program formally specified and mathematically proven, then you wouldn't need to support your code too much ;D
The real code you are referring to is usually called "Business Logic".
In a good unit testing system, your unit tests should be in their own classes (and probably their own assemblies) so that shouldn't be an issue.
The rest is language based for the most part. The more advanced a language, the better its ability to avoid writing support code, to some degree. Also, a well-targeted development system can help you avoid writing a lot of code (Visual Basic's screen builder, Ruby on Rails, ...), but these abstractions can break down and cause you to write just as much code as anything else if you use them to develop targets outside their intended types of apps. (VB & Ruby don't help all that much if you're calculating prime numbers.)
Beyond the language/platform, you have refactoring--the art of eliminating all the supporting code that you can (as well as redundancies in your business logic) to keep your code-base clean and small.
When practicing advanced refactoring, you'll probably end up writing tools for yourself.
Sometimes abstracting data out of your code and into a structured file of some sort can eliminate huge piles of support code and move the rest into "Business logic" because now parsing that data and setting it up is part of the "business" your program does.
This is a good trade-off because this type of business logic tends to be more readable and easier to factor. The other advantage of this kind of abstraction is that all your "Configuration" is now done in data which tends to make it somebody elses' problem.
As an example of this type of tooling: Rails itself! It takes a lot of the boilerplate of web development and factors it out of the code and into libraries driven by data and simplistic code (Ruby blurs the line between code and data--their data files actually loop back to being specified in Ruby code!)
It's like you want to take a trip to the top of Pike's Peak. You can take the Winnebago, you can take your SUV, or a motorcycle, or ride up on your bike.
Some ways are more or less expensive, faster, etc. Sometimes you end up taking along a lot of stuff that isn't there strictly for accomplishing vertical progress; it's nice to have a beer in the cooler. But it pays to remember that you're responsible for everything that goes with you to the top.
Aspect Oriented Programming partly addresses this. It allows you to inject code into existing source/bytecode. This way you can make a task such as logging appear in its own module instead of woven into the business logic.
Work expands to fill its container. This really sounds like an economics question: optimizing your outputs (features for users and features for the developer) given expensive inputs (time spent writing features, time spent writing plumbing code).
You have to include user-visible features or you don't have a viable product or job. Once that is partly done, your remaining budget of time will be split between activities with a visible return on effort and those with an invisible (but positive!) return on effort, like exception logging, memory management, etc.
Whatever language makes it cheaper to implement features will probably increase your features-to-plumbing-code ratio. Likewise, whatever language makes it cheaper to implement plumbing code will probably also increase that ratio, because you'll have freed up more time to write features.
Like all optimization problems you'd have 2 effects-- the increase in the size of the support code (because say, you're using cheap code generation) and the increase in the size of feature related code (because you have more time left over to write features), so the final ratio might be hard to predict.
I do not begrudge the 90% of my code that is data access plumbing, because it is all testable, code generated and very cheap, compared to the 10% of handwritten, domain-specific code.
I don't try to make all routines foolproof, only those exposed to the outside world.
http://en.wikipedia.org/wiki/Folding_editor
Higher and more dynamic languages are usually less verbose. Weak typing also saves a lot of code. Of course there are trade-offs.
I use the #region directive in Visual Studio to collapse blocks of code that are not the primary focus, e.g. unit tests. With log4net, logging statements are only ever one line. I haven't found any approaches to reduce the parameter-checking code, although it sounds like C# 4 has some kind of contract framework that will help there.
I have some coworkers who once, while being chewed out by a client for an overdue and bug-ridden project, bragged to the customer that they had written 5 times as much test code as operational code.
The client was not happy, and by "not happy" I mean their skin turned green, they grew to 5 times their normal size, and their clothes popped off.
You could just make a static class in a utilities assembly that checks your parameters and things. For instance, the Spring Framework (which is where I got the idea) has an Assert class that makes it really quick to check that string params aren't empty or that object params aren't null. It cleans up validation code and reduces duplicate code, which is a win-win.
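As a rough illustration of that pattern (a hypothetical helper inspired by the idea, not Spring's actual API, shown in Python to keep the sketch language-neutral):

    class Assert:
        """Tiny parameter-checking helper so validation is one line at each call site."""

        @staticmethod
        def not_none(value, name):
            if value is None:
                raise ValueError(f"{name} must not be None")
            return value

        @staticmethod
        def has_text(value, name):
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{name} must be a non-empty string")
            return value

    def register_user(username, profile):
        Assert.has_text(username, "username")
        Assert.not_none(profile, "profile")
        # ...the real business logic continues here...

The checks collapse to one readable line per parameter, which keeps the validation noise out of the way of the code that actually solves the problem.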

How to get started with speech-to-text?

I'm really interested in speech-to-text algorithms, but I'm not sure where to start studying up on them. A bunch of searching around led me to this, but it's from 1996 and I'm fairly certain that there have been improvements since then.
Does anyone who has any experience with this sort of stuff have any recommendations for reading / source code to examine? Or just general advice on what I should be trying to learn about if I want to get into the world of writing speech recognition programs (sometimes it's hard to know what to search for if you don't have much knowledge about the domain).
Edit: I'd like to do something cross-platform, but for the moment I'd be targeting linux.
Edit 2: Thanks csmba for the well-thought out reply. At this point in time, I'm mainly interested in being able to create applications that allow automation, or execution of different commands through voice. So, a limited amount of recognizable commands being able to be strung together. An example would be a music player that took commands like "Play the album Hello Everything by Squarepusher", or an application launcher that allowed the user to create voice-shortcuts to launch specific apps.
I realize that it's a pretty giant problem, and that I have nowhere near the level of knowledge required right now to tackle implementing an entire recognition engine, although the techniques involved with doing so fascinate me, and it is something I'd like to work myself up to doing. In all likelihood, I'll probably end up picking up a book or two on the subject and studying up / playing with "simple" implementations in my free time.
This is a HUGE question, and I wouldn't know how to begin... So let me just try giving you the right "terms" so you can refine your quest:
First, understand that speech recognition is a diverse and complicated subject with many different applications. People tend to map this domain to the first thing that comes to their head (usually, computers understanding what you are saying, as in IVR systems). So first let's distinguish the main categories:
Human-to-Machine: Applications that deal with understanding what a human is saying, but the human knows he is talking to a machine and the grammar is very limited. Examples are
Computer automation
Specialized: Pilots automating some controls for example (noise a huge problem)
IVR (Interactive Voice Response) systems like Google-411 or when you call the bank and the computer on the other side says "say 'service' to get customer service"
Human-to-human (spontaneous speech): This is a bigger, more complex problem. Here we can also break it down into different applications:
Call Center: conversation between Agent-Customer, phone quality, compressed
Intelligence: radio/phone/live conversations between 2 or more individuals
Now, "speech-to-text" is not what you should be saying you care about. What you care about is solving a problem, and different technologies are used to solve different problems. See an overview of some of them here. To summarize, other approaches are phonetic transcription, LVCSR and direct-based methods.
Also, are you interested in being the PhD behind the technology? You would need a Masters equivalent involving signal processing, and probably a PhD, to be cutting edge. In that case, you will work for a company that develops the actual speech engine. Companies like Nuance and IBM are the big ones, but Philips and other startups exist as well.
On the other hand, if you want to be the one implementing applications, you will not be working on the engine but on building applications that USE the engine. A good analogy, I think, is from the gaming industry:
Are you developing the graphics engine (like the Cry engine), or working on one of several hundred games that all use the same graphics engine?
Don't get me wrong, there is plenty of work on quality outside the IBMs and Nuances of the world. The engine is usually very open, and there is a lot of algorithmic tweaking to be done that can dramatically affect performance. Each business application has different constraints and a different cost/benefit function, so you can spend many years of experiments building better voice-recognition-based applications.
One more thing: in general, you would also want a good statistics background, the lower in the stack you want to be.
At this point in time, I'm mainly interested in being able to create applications that allow automation
Good, we are converging here... Then you have no interest in "speech-to-text". That buzzword takes you to the world of full transcription, a place you do not need to go. You should be focusing on some of the more human-to-machine technologies like VoiceXML and the ones used in IVR systems (Nuance is the biggest player there).
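To make the command-and-control flavor of the problem concrete, here is a toy Python sketch of the application side only: it assumes a recognizer has already produced a text hypothesis, and the grammar and handlers below are made up for illustration:

    import re

    # A deliberately tiny grammar: the point of command-and-control apps is that
    # only a handful of phrasings need to be recognized, not free-form speech.
    COMMANDS = [
        (re.compile(r"^play the album (?P<album>.+) by (?P<artist>.+)$", re.I),
         lambda m: print(f"Playing album '{m['album']}' by {m['artist']}")),
        (re.compile(r"^launch (?P<app>.+)$", re.I),
         lambda m: print(f"Launching {m['app']}")),
    ]

    def handle_utterance(text):
        """Map a recognizer's text hypothesis onto an action, if it fits the grammar."""
        for pattern, action in COMMANDS:
            match = pattern.match(text.strip())
            if match:
                action(match)
                return True
        print("Sorry, I didn't understand that.")
        return False

    # In a real system this string would come from the speech engine:
    handle_utterance("play the album Hello Everything by Squarepusher")
    handle_utterance("launch the terminal")

The hard part, as the answer explains, lives inside the engine; the application's job is mostly to constrain the grammar so the engine has few choices to confuse.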
I would definitely recommend picking up a book or two if you are new to the field. I've got no experience in the field, so I can't make a recommendation. If you are still in college (or still have close ties), you should find out if any of your professors can make a recommendation.
The survey you linked is probably an excellent resource, too. I'm sure there have been advancements since 1996, but the basics are unlikely to have fundamentally changed. If the survey is well-written, then it would be well worth your time to read it.
For OS X check out this: OS X Speech Technologies
For Windows check out this: Microsoft Speech API
I have worked with IBM's ViaVoice product. It has a good ASR (automated speech recognition) engine and a nice text-to-speech engine.
The website's not very good, but here is a link for the embedded version: http://www-01.ibm.com/software/voice/support/
It is platform agnostic though, and everything works through an MVC architecture using VXML, a variant of XML for voice purposes.
What platform are you targeting? There are the Microsoft Speech APIs that you can use if it's for Windows.
There is also the Speech Recognition Service for Android.