How to get started with speech-to-text? - language-agnostic

I'm really interested in speech-to-text algorithms, but I'm not sure where to start studying up on them. A bunch of searching around led me to this, but it's from 1996 and I'm fairly certain that there have been improvements since then.
Does anyone who has any experience with this sort of stuff have any recommendations for reading / source code to examine? Or just general advice on what I should be trying to learn about if I want to get into the world of writing speech recognition programs (sometimes it's hard to know what to search for if you don't have much knowledge about the domain).
Edit: I'd like to do something cross-platform, but for the moment I'd be targeting linux.
Edit 2: Thanks csmba for the well-thought out reply. At this point in time, I'm mainly interested in being able to create applications that allow automation, or execution of different commands through voice. So, a limited amount of recognizable commands being able to be strung together. An example would be a music player that took commands like "Play the album Hello Everything by Squarepusher", or an application launcher that allowed the user to create voice-shortcuts to launch specific apps.
I realize that it's a pretty giant problem, and that I have nowhere near the level of knowledge required right now to tackle implementing an entire recognition engine, although the techniques involved with doing so fascinate me, and it is something I'd like to work myself up to doing. In all likelihood, I'll probably end up picking up a book or two on the subject and studying up / playing with "simple" implementations in my free time.

This is a HUGE question; I wouldn't know where to begin... So let me just try giving you the right "terms" so you can refine your quest:
First, understand that speech recognition is a diverse and complicated subject, and it has many different applications. People tend to map this domain to the first thing that comes to their head (usually, that would be computers understanding what you are saying, as in IVR systems). So first let's break the concept down into the main categories:
Human-to-Machine: Applications that deal with understanding what a human is saying, but the human knows he is talking to a machine and the grammar is very limited. Examples are
Computer automation
Specialized: Pilots automating some controls, for example (noise is a huge problem)
IVR (Interactive Voice Response) systems like Google-411 or when you call the bank and the computer on the other side says "say 'service' to get customer service"
Human-to-Human (spontaneous speech): This is a bigger, more complex problem. Here we can also break it down into different applications:
Call center: agent-customer conversations, phone quality, compressed audio
Intelligence: radio/phone/live conversations between two or more individuals
Now, speech-to-text is not really what you should say you care about. What you care about is solving a problem, and different technologies are used to solve different problems. See an overview of some of them here. To summarize, other approaches include phonetic transcription, LVCSR, and direct-based approaches.
Also, are you interested in being the PhD behind the technology? You would need a Master's-level background in signal processing and probably a PhD to be cutting edge. In that case, you would work for a company that develops the actual speech engine. Companies like Nuance and IBM are the big ones, but Philips and various startups exist as well.
On the other hand, if you want to be the one implementing applications, you will not be working on the engine but on building applications that USE the engine. A good analogy, I think, is from the gaming industry:
Are you developing the graphics engine (like the CryEngine), or working on one of the several hundred games that all use the same graphics engine?
Don't get me wrong, there is plenty of work on recognition quality outside the IBMs and Nuances of the world. The engine is usually very open, and there is a lot of algorithmic tweaking to be done that can dramatically affect performance. Each business application has different constraints and a different cost/benefit function, so you can spend many years experimenting with building better voice-recognition-based applications.
One more thing: in general, the lower in the stack you want to work, the stronger a statistics background you will need.
At this point in time, I'm mainly interested in being able to create applications that allow automation
Good, we are converging here... Then you have no interest in "speech-to-text". That buzzword takes you to the world of full transcription, a place you do not need to go. You should be focusing on some of the more Human-to-Machine technologies like VoiceXML and the ones used in IVR systems (Nuance is the biggest player there).
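To make the "limited set of recognizable commands" idea concrete, here is a minimal, engine-agnostic sketch (in Python) of dispatching already-recognized text against a small command grammar. The regex patterns, the command names, and the assumption that some ASR engine has already turned audio into a text string are all illustrative, not tied to any particular product:

    import re

    # Assumption: some speech engine has already produced this text for us.
    def handle_command(utterance: str) -> str:
        """Match a recognized utterance against a small, fixed command grammar."""
        rules = [
            (r"^play (?:the )?album (?P<album>.+?) by (?P<artist>.+)$",
             lambda m: f"music_player.play_album({m['album']!r}, {m['artist']!r})"),
            (r"^launch (?P<app>.+)$",
             lambda m: f"launcher.start({m['app']!r})"),
            (r"^(pause|stop|next track)$",
             lambda m: "music_player." + m.group(1).replace(" ", "_") + "()"),
        ]
        text = utterance.lower().strip()
        for pattern, action in rules:
            m = re.match(pattern, text)
            if m:
                return action(m)   # here we just report which call we would make
        return "no_match"          # re-prompt the user, or ignore

    print(handle_command("Play the album Hello Everything by Squarepusher"))
    print(handle_command("launch rhythmbox"))

The point is that with a grammar this small, the hard part (turning audio into text or phonemes) is delegated to an existing engine, and the application code stays simple.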

I would definitely recommend picking up a book or two if you are new to the field. I've got no experience in the field, so I can't make a recommendation. If you are still in college (or still have close ties), you should find out if any of your professors can make a recommendation.
The survey you linked is probably an excellent resource, too. I'm sure there have been advancements since 1996, but the basics are unlikely to have fundamentally changed. If the survey is well-written, then it would be well worth your time to read it.

For OS X check out this: OS X Speech Technologies
For Windows check out this: Microsoft Speech API

I have worked with IBM's ViaVoice product. It has a good ASR (automated speech recognition) engine and a nice text-to-speech engine.
The website's not very good, but here is a link for the Embedded version: http://www-01.ibm.com/software/voice/support/
It is platform agnostic, though, and everything works through an MVC architecture using VXML, a variant of XML for voice purposes.

What platform are you targeting? There are the Microsoft Speech APIs that you can use if it's for Windows.

There is also the Speech Recognition Service for Android.

Related

Programming practice

I've decided to get some experience by working on a project this summer.
Due to local market demand, I would prefer to learn Java (Standard and Enterprise Editions).
But I can't even guess what kind of project to do. Recently I had some ideas about C. With C I could contribute to huge Linux projects. I don't mean that my work would necessarily be committed; I could just get the code and practice with it. But C isn't the right thing for getting a good job in my area. With Java SE there is a chance to develop some desktop applications, but when I think about Java EE I get stuck. I'll be very thankful for answers.
CodingBat.com will give you good core Java practice.
Project Euler is still the best for all around practice. You can use whatever language you'd like to solve the problems there.
For actual projects, I almost always start on something easy like a Twitter client. It gets you exposure to all the basics along with UI and network communication. You can work up from there. Just don't start with something so overwhelming that you can't figure it out and want to give up. That's not going to get you anywhere.
The best advice is: work on a project that you have personal interest in. Something based on your hobbies, maybe.
If that doesn't work, make a blogging / CMS engine. Or an online photo album. Or an eStore. The world doesn't really need another of any of these things, but it will give you some good practical experience with JavaEE.
Another benefit of "re-inventing the wheel" (for learning) is that you have probably already used systems like these described above, and you have a good idea of how it can work, and maybe you have your own ideas of how it could work better. That can make requirements much simpler, and also will give you a sort of benchmark so you can see how close you can come to building a tool like the "real" ones out there. And if yours is really great, well, maybe release it and see what happens. ;)
There are many Java-based projects on SourceForge. Tinker with one you find interesting.
I've implemented either a betting pool or a Baccarat game in almost every language I've learned.
This type of software covers:
Dates and times, with calculations
Currency types and things that can be converted to and from currency.
A discrete set of rules that is easy to test
States, transitions between states, and multiple entities responsible for state transitions (see the sketch at the end of this answer)
Multiple users with different views of the same model
End conditions
Multiple player blackjack and poker would work also.
One caveat is that in my day job I work on financial systems, and there is a huge overlap between things to consider when writing a multiplayer game of chance and a trading system.
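As a purely illustrative sketch of the states-and-transitions point above (the state names and rules are made up; Python is used only for brevity, and the same shape ports directly to Java):

    from enum import Enum, auto

    class GameState(Enum):
        WAITING_FOR_BETS = auto()
        DEALING = auto()
        RESOLVING = auto()
        PAYOUT = auto()

    # Which transitions are legal; attempting anything else is a rules violation.
    ALLOWED = {
        GameState.WAITING_FOR_BETS: {GameState.DEALING},
        GameState.DEALING: {GameState.RESOLVING},
        GameState.RESOLVING: {GameState.PAYOUT},
        GameState.PAYOUT: {GameState.WAITING_FOR_BETS},
    }

    class BaccaratTable:
        def __init__(self) -> None:
            self.state = GameState.WAITING_FOR_BETS

        def transition(self, new_state: GameState) -> None:
            if new_state not in ALLOWED[self.state]:
                raise ValueError(f"illegal transition {self.state} -> {new_state}")
            self.state = new_state

    table = BaccaratTable()
    table.transition(GameState.DEALING)    # fine
    # table.transition(GameState.PAYOUT)   # would raise, and is easy to unit-test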
Build an address book. The concept is simple, so you're not stuck on "what" to write; you can focus on learning your chosen language. You get experience working with a database, Java (insert any language here), and UI design.
When you decide to learn another language, you can create the same thing. Since the database has been created already, you can focus on the language itself.
The concept of inputting data, storing data, and retrieving data is central to a lot of applications.
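As a minimal sketch of that input/store/retrieve cycle (shown here in Python with the standard sqlite3 module purely for brevity; the same shape applies with JDBC in Java, and the table name and fields are just examples):

    import sqlite3

    # In-memory database for illustration; a real address book would use a file.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contacts (name TEXT, email TEXT, phone TEXT)")

    def add_contact(name, email, phone):
        conn.execute("INSERT INTO contacts VALUES (?, ?, ?)", (name, email, phone))
        conn.commit()

    def find_contact(name):
        return conn.execute(
            "SELECT name, email, phone FROM contacts WHERE name = ?", (name,)
        ).fetchall()

    add_contact("Ada Lovelace", "ada@example.org", "555-0100")
    print(find_contact("Ada Lovelace"))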
Have a look around http://openhatch.org/ for a project that sounds interesting.

framework for browser based MMO?

I want to create a browser based MMO similar to "monopoly city streets." Is there a good framework available for this kind of thing?
Generally speaking, browser-based 'MMOs' have little in common on the technical level with MMOs and are usually just websites with a recreational element. As such, your options are much the same as they are for any website, with the added caveat that you probably want a richer client than JavaScript can offer. Flash and Silverlight are your two main contenders there, and there are various libraries and frameworks available for them.
One option I know about that is geared directly towards larger online games is SmartFoxServer, which comes highly recommended. This is better suited to games that require a real-time element, although in practice such games are rarer than you think.
The short answer: no.
The long answer:
Back in 2003 or so, I was using Game Maker extensively. I would frequent the Game Maker Community, and every now and then a question would pop up in the Novice Questions & Answers section: "How I make MMORPG?".
There is no framework for making a browser-based MMORPG because the subject is vast. RuneScape is an MMORPG, and it's Java-based. But so is Kingdom of Loathing, and it's based on PHP (turn-based).
Also, you will need a design that is better than "Our game is going to be like X."
You could use MMO.js... it allows you to build great MMORPGs without worrying about sockets, threads, or the server-side handling...
Monopoly City Streets is itself built upon two publicly available APIs [1], one of which is nicely suited to real-time game development, although neither is comprehensive nor designed for 'non-technical' use.
MMO is a catchall term that can refer to a great deal of different technical approaches and the differing hazards and skills required to attempt them. Effectively it refers to scale, rather than the actual style of game. Whilst a framework might deal with a very specific type of game concept, it's unlikely to be what you had in mind.
Certainly to my knowledge there is no layman's MMO framework for any of the common mapping APIs.
[1] http://en.wikipedia.org/wiki/Monopoly_City_Streets

Usability: speech recognition versus keypad

We are seeing more and more speech recognition implemented, and requests for libraries that do good speech recognition. What's the rationale (in terms of usability) behind it versus a keyboard or keypad? What reasons would you have to invest in this development?
For example, let's take call centers. A few years ago, almost every call center used an IVR that prompted for a key for the menus. Now, we're seeing more and more menus that prompt for a spoken keyword and/or a pressed key: "please say invoice or press 1 to see your invoice". Or we are seeing the same thing in companies' phone directories: "please say the name of the person you are trying to reach" ... "Franck Loyd" ... "Did you say Jack Freud? Please say yes if you want to reach this person or say no to try again".
I guess it's a plus when you're in your car without holding your phone, but is it worth the additional waiting time? Longer interaction for all the choices, longer prompt time while the system tries to determine whether something was said, and so on? Also, reliability is better than it was, definitely, but sometimes it feels more like a toy someone decided to plug into the system so it can feel futuristic.
Any experience designing IVR or software that used (or chose not to) speech recognition?
Thanks!
What's the rationale (in terms of usability) behind it versus a keyboard or keypad?
Usability is a very broad term. If I were to attempt to enter my address with a touch pad, it wouldn't be considered very usable. Some argue that using a speech engine with an overall success rate of 70-80% isn't very usable either. As indicated in other posts, hands free input can be much easier for those on a mobile phone. However, using words versus numeric input can actually be less intuitive than a touch tone phone if the topic is somewhat foreign to the caller. A caller hearing terms and phrases that aren't very familiar can't remember them in the 10-30 seconds of the prompt but they can hover over the best sounding choice with their finger or remember the order of choices.
What reasons would you have to invest in this development?
This is an odd question. Usually the decision to use speech or not in an IVR environment is not driven from the development view of the world. Unless you have a specific requirement that really requires speech, you are almost always reducing overall success rates. Speech is usually a factor of corporate image ... or having the latest technological toy.
I guess it's a plus when you're in your car without holding your phone, but is it worth the additional waiting time?
Speech recognition latencies aren't very high these days when using modern ASRs. In most cases, input is processed in parallel with the speech, and the time between end of speech and a recognition result is 0.5 to 1 s. Be aware that many IVRs then need to perform data look-ups after some inputs, and this can make the system appear slower. Normal inputs pushing beyond 1 s are usually the sign of an under-powered deployment.
It may not have been under-powered when originally implemented, but through tuning efforts you make a lot of performance-versus-accuracy decisions. To get that next 0.1%, resources can be pushed beyond what they should be at peak.
Also, reliability is better than it was, definitely, but sometimes it feels more like a toy someone decided to plug into the system so it can feel futuristic.
In general, yes. On the reliability note, you need to really look at the overall numbers to get a sense of the system. It is a battle of statistics where the individual isn't very important (unless they hold the title of VP or above). Through optimization of the input (shifting prompting), resource usage, and other speech reco tuning parameters you attempt to maximize accuracy. For basic natural language responses, you can get into the upper 90s. However, your overall success rate is much lower. Imagine 5 prompts all at 98% (in reality, you tend to have a bunch at 99 and then a few in the mid 90s or slightly below): 0.98 * 0.98 * 0.98 * 0.98 * 0.98 ≈ 90%. That means about 1 out of 10 calls failing. And that is before caller confusion and business rules. DTMF input is usually very near 100%, even after several inputs.
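To see how per-prompt accuracy compounds into end-to-end success, here is the back-of-the-envelope calculation spelled out (the per-prompt rates are illustrative, not measured figures):

    # End-to-end success is roughly the product of per-prompt success rates.
    per_prompt = [0.98] * 5                  # five prompts, each 98% accurate

    overall = 1.0
    for rate in per_prompt:
        overall *= rate
    print(f"uniform 98%: {overall:.1%}")     # ~90.4%, i.e. roughly 1 in 10 calls fails

    # A more realistic mix: a few near-perfect prompts and a couple of weaker ones.
    mixed = [0.99, 0.99, 0.99, 0.95, 0.94]
    product = 1.0
    for rate in mixed:
        product *= rate
    print(f"mixed grammar: {product:.1%}")   # lower still, before caller confusion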
Any experience designing IVR or software that
used (or chose not to) speech recognition?
Yes. But, I suspect that really isn't the question you want. As someone on the technology side, this is usually not your decision and you have limited influence on it. If you are really looking for the pros/cons of speech:
Pros:
Cool/hip (note, speech alone isn't sufficient. You need a great VUI and voice talent)
Good for a highly mobile crowd that shuns ear pieces. The future is supposed to be blending speech with tactile input. Maybe. It probably won't come from the IVR side of the market.
Good for tasks that can't be done with DTMF. Note, many of these problems tend to have low success rates in speech as well. Cost (versus humans) is usually the driving factor not usability. Dropping a call into a voicemail box for things like address change can be very cost effective.
Cons:
Expensive to develop, deploy, and maintain. Adding new choices can have a significant impact on success rates if you aren't careful. Always monitor the impact of change.
Is often deployed inappropriately. For example, "just say your numeric menu choice." This is almost always a case of wanting speech coolness without being able to afford what it really takes to achieve speech coolness.
Success rates will be lower and therefore call center costs will be higher.
Failures tend to focus on specific prompts and individual callers. A caller that regularly experiences problems with your system will be very unhappy with you.
Callers get angry when they aren't understood. Is your goal to identify a subset of your customer base and really make them angry?
I think that speech recognition, like any method of input, has its pros and cons.
Pros
No learning curve, we have been speaking since a very young age.
Very user-intuitive.
On the phone, no need to constantly move the headset from your ear.
Cons
Longer wait time
If the sound quality is bad, it takes multiple attempts to get the selection right.
In some cases a company is required to handle rotary phones. It might be more cost-effective to just set up the recognition system instead of both.
Voice recognition has a lot more overhead than touch tones. If you want the best results, you need to constantly tweak the app and train the system on unrecognized word pronunciations. You also need to be very particular about how you prompt the user with voice recognition, or you may get unexpected responses.
Overall touch tone is a lot easier as there are only a limited set of possible options at any given time.
If your app is straightforward enough, voice rec may only complicate it. Press 2 for some other language...
Speech recognition is definitely the wave of the future when combined with touchscreen technology. As an example, I use tazti speech recognition. It's available in XP and Vista versions. Since Microsoft's touchscreen "Surface" platform runs on Vista, I'm sure tazti will work with the touchscreen technology. When I tried tazti, the built-in commands worked great. It also lets me create my own speech commands, and those work great too. Voice-searching Google, Yahoo, Wikipedia, YouTube, and many other search engines works great. It has many other features as well, but it doesn't have dictation. I found that I eliminate 70% or more of my internet-generated clicks... maybe more. NOTE: tazti is a free download from their website.

How do you plan an application's architecture before writing any code? [closed]

One thing I struggle with is planning an application's architecture before writing any code.
I don't mean gathering requirements to narrow in on what the application needs to do, but rather effectively thinking about a good way to lay out the overall class, data, and flow structures, and iterating on those thoughts so that I have a credible plan of action in mind before even opening the IDE. At the moment it is all too easy to just open the IDE, create a blank project, start writing bits and bobs, and let the design 'grow out' from there.
I gather UML is one way to do this but I have no experience with it so it seems kind of nebulous.
How do you plan an application's architecture before writing any code? If UML is the way to go, can you recommend a concise and practical introduction for a developer of smallish applications?
I appreciate your input.
I consider the following:
what the system is supposed to do, that is, what is the problem that the system is trying to solve
who is the customer and what are their wishes
what the system has to integrate with
are there any legacy aspects that need to be considered
what are the user interactions
etc...
Then I start looking at the system as a black box and:
what are the interactions that need to happen with that black box
what are the behaviours that need to happen inside the black box, i.e. what needs to happen to those interactions for the black box to exhibit the desired behaviour at a higher level, e.g. receive and process incoming messages from a reservation system, update a database etc.
Then this will start to give you a view of the system that consists of various internal black boxes, each of which can be broken down further in the same manner.
UML is very good to represent such behaviour. You can describe most systems just using two of the many components of UML, namely:
class diagrams, and
sequence diagrams.
You may need activity diagrams as well if there is any parallelism in the behaviour that needs to be described.
A good resource for learning UML is Martin Fowler's excellent book "UML Distilled". This book gives you a quick look at the essential parts of each of the components of UML.
Oh, what I've described is pretty much Ivar Jacobson's approach. Jacobson is one of the Three Amigos of OO. In fact, UML was initially developed by the other two members of the Three Amigos, Grady Booch and Jim Rumbaugh.
I really find that a first-off of writing on paper or whiteboard is really crucial. Then move to UML if you want, but nothing beats the flexibility of just drawing it by hand at first.
You should definitely take a look at Steve McConnell's Code Complete, and especially at his giveaway chapter on "Design in Construction".
You can download it from his website:
http://cc2e.com/File.ashx?cid=336
If you're developing for .NET, Microsoft have just published (as a free e-book!) the Application Architecture Guide 2.0b1. It provides loads of really good information about planning your architecture before writing any code.
If you were desperate I expect you could use large chunks of it for non-.NET-based architectures.
I'll preface this by saying that I do mostly web development where much of the architecture is already decided in advance (WebForms, now MVC) and most of my projects are reasonably small, one-person efforts that take less than a year. I also know going in that I'll have an ORM and DAL to handle my business object and data interaction, respectively. Recently, I've switched to using LINQ for this, so much of the "design" becomes database design and mapping via the DBML designer.
Typically, I work in a TDD (test driven development) manner. I don't spend a lot of time up front working on architectural or design details. I do gather the overall interaction of the user with the application via stories. I use the stories to work out the interaction design and discover the major components of the application. I do a lot of whiteboarding during this process with the customer -- sometimes capturing details with a digital camera if they seem important enough to keep in diagram form. Mainly my stories get captured in story form in a wiki. Eventually, the stories get organized into releases and iterations.
By this time I usually have a pretty good idea of the architecture. If it's complicated or there are unusual bits -- things that differ from my normal practices -- or I'm working with someone else (not typical), I'll diagram things (again on a whiteboard). The same is true of complicated interactions -- I may design the page layout and flow on a whiteboard, keeping it (or capturing via camera) until I'm done with that section. Once I have a general idea of where I'm going and what needs to be done first, I'll start writing tests for the first stories. Usually, this goes like: "Okay, to do that I'll need these classes. I'll start with this one and it needs to do this." Then I start merrily TDDing along and the architecture/design grows from the needs of the application.
Periodically, I'll find myself wanting to write some bits of code over again or think "this really smells" and I'll refactor my design to remove duplication or replace the smelly bits with something more elegant. Mostly, I'm concerned with getting the functionality down while following good design principles. I find that using known patterns and paying attention to good principles as you go along works out pretty well.
http://dn.codegear.com/article/31863
I use UML, and find that guide pretty useful and easy to read. Let me know if you need something different.
UML is a notation. It is a way of recording your design, but not (in my opinion) of doing a design. If you need to write things down, I would recommend UML, though, not because it's the "best" but because it is a standard which others probably already know how to read, and it beats inventing your own "standard".
I think the best introduction to UML is still UML Distilled, by Martin Fowler, because it's concise, gives practical guidance on where to use it, and makes it clear you don't have to buy into the whole UML/RUP story for it to be useful.
Doing design is hard. It can't really be captured in one Stack Overflow answer. Unfortunately, my design skills, such as they are, have evolved over the years, so I don't have one source I can refer you to.
However, one model I have found useful is robustness analysis (google for it, but there's an intro here). If you have your use-cases for what the system should do, a domain model of what things are involved, then I've found robustness analysis a useful tool in connecting the two and working out what the key components of the system need to be.
But the best advice is read widely, think hard, and practice. It's not a purely teachable skill, you've got to actually do it.
I'm not smart enough to plan ahead more than a little. When I do plan ahead, my plans always come out wrong, but by then I've spent n days on bad plans. My limit seems to be about 15 minutes on the whiteboard.
Basically, I do as little work as I can to find out whether I'm headed in the right direction.
I look at my design for critical questions: when A does B to C, will it be fast enough for D? If not, we need a different design. Each of these questions can be answered with a spike. If the spikes look good, then we have the design and it's time to expand on it.
I code in the direction of getting some real customer value as soon as possible, so a customer can tell me where I should be going.
Because I always get things wrong, I rely on refactoring to help me get them right. Refactoring is risky, so I have to write unit tests as I go. Writing unit tests after the fact is hard because of coupling, so I write my tests first. Staying disciplined about this stuff is hard, and a different brain sees things differently, so I like to have a buddy coding with me. My coding buddy has a nose, so I shower regularly.
Let's call it "Extreme Programming".
"White boards, sketches and Post-it notes are excellent design
tools. Complicated modeling tools have a tendency to be more
distracting than illuminating." From Practices of an Agile Developer
by Venkat Subramaniam and Andy Hunt.
I'm not convinced anything can be planned in advance before implementation. I've got 10 years experience, but that's only been at 4 companies (including 2 sites at one company, that were almost polar opposites), and almost all of my experience has been in terms of watching colossal cluster********s occur. I'm starting to think that stuff like refactoring is really the best way to do things, but at the same time I realize that my experience is limited, and I might just be reacting to what I've seen. What I'd really like to know is how to gain the best experience so I'm able to arrive at proper conclusions, but it seems like there's no shortcut and it just involves a lot of time seeing people doing things wrong :(. I'd really like to give a go at working at a company where people do things right (as evidenced by successful product deployments), to know whether I'm just a contrarian bastard, or if I'm really as smart as I think I am.
I beg to differ: UML can be used for application architecture, but it is more often used for technical architecture (frameworks, class or sequence diagrams, ...), because this is where those diagrams can most easily be kept in sync with the development.
Application Architecture occurs when you take some functional specifications (which describe the nature and flows of operations without making any assumptions about a future implementation), and you transform them into technical specifications.
Those specifications represent the applications you need for implementing some business and functional needs.
So if you need to process several large financial portfolios (functional specification), you may determine that you need to divide that large specification into:
a dispatcher to assign those heavy calculations to different servers
a launcher to make sure all calculation servers are up and running before starting to process those portfolios.
a GUI to be able to show what is going on.
a "common" component to develop the specific portfolio algorithms, independently of the rest of the application architecture, in order to facilitate unit testing, but also some functional and regression testing.
So basically, to think about application architecture is to decide what "groups of files" you need to develop in a coherent way (you cannot develop a launcher, a GUI, and a dispatcher in the same group of files: they would not be able to evolve at the same pace).
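As a toy illustration of the dispatcher/launcher split described above (the queue-based design, the thread-per-server model, and all names are assumptions made for the example, not part of the original answer):

    import queue
    import threading

    # Hypothetical portfolio jobs to be spread across calculation servers.
    jobs = queue.Queue()
    for portfolio_id in range(10):
        jobs.put(portfolio_id)

    def calculation_server(server_name):
        # Each "server" pulls work until the queue is drained.
        while True:
            try:
                portfolio = jobs.get_nowait()
            except queue.Empty:
                return
            print(f"{server_name} processing portfolio {portfolio}")

    # The "launcher" starts all servers; the "dispatcher" here is simply the shared queue.
    servers = [threading.Thread(target=calculation_server, args=(f"server-{i}",))
               for i in range(3)]
    for s in servers:
        s.start()
    for s in servers:
        s.join()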
When an application architecture is well defined, each of its components is usually a good candidate for a configuration component, that is, a group of files which can be versioned as a whole in a VCS (Version Control System), meaning all its files will be labeled together every time you need to record a snapshot of that application (again, it would be hard to label your whole system, as each of its applications cannot be in a stable state at the same time).
I have been doing architecture for a while. I use BPML to first refine the business process and then use UML to capture various details. The third step is generally an ERD. By the time you are done with BPML and UML, your ERD will be fairly stable. No plan is perfect and no abstraction is going to be 100%. Plan on refactoring; the goal is to minimize refactoring as much as possible!
I try to break my thinking down into two areas: a representation of the things I'm trying to manipulate, and what I intend to do with them.
When I'm trying to model the stuff I'm trying to manipulate, I come up with a series of discrete item definitions: an ecommerce site will have a SKU, a product, a customer, and so forth. I'll also have some non-material things that I'm working with: an order, or a category. Once I have all of the "nouns" in the system, I'll make a domain model that shows how these objects are related to each other: an order has a customer and multiple SKUs, many SKUs are grouped into a product, and so on.
These domain models can be represented as UML domain models, class diagrams, and SQL ERDs.
Once I have the nouns of the system figured out, I move on to the verbs: for instance, the operations that each of these items goes through to commit an order. These usually map pretty well to use cases from my functional requirements. The easiest way to express these that I've found is UML sequence, activity, or collaboration diagrams, or swimlane diagrams.
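As a tiny sketch of the "nouns" step for the e-commerce example (the class and field names are illustrative only, not a prescribed design):

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Sku:
        code: str
        price: float

    @dataclass
    class Product:
        name: str
        skus: List[Sku] = field(default_factory=list)   # many SKUs group into a product

    @dataclass
    class Customer:
        name: str
        email: str

    @dataclass
    class Order:
        customer: Customer                              # an order has a customer...
        skus: List[Sku] = field(default_factory=list)   # ...and multiple SKUs

        def total(self) -> float:
            return sum(sku.price for sku in self.skus)

    # The "verbs" (commit an order, and so on) then become methods or services over these nouns.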
It's important to think of this as an iterative process; I'll do a little corner of the domain, and then work on the actions, and then go back. Ideally I'll have time to write code to try stuff out as I'm going along- you never want the design to get too far ahead of the application. This process is usually terrible if you think that you are building the complete and final architecture for everything; really, all you're trying to do is establish the basic foundations that the team will be sharing in common as they move through development. You're mostly creating a shared vocabulary for team members to use as they describe the system, not laying down the law for how it's gotta be done.
I find myself having trouble fully thinking a system out before coding it. It's just too easy to give only a cursory glance to some components, which you only later realize are much more complicated than you thought they were.
One solution is to just try really hard. Write UML everywhere. Go through every class. Think how it will interact with your other classes. This is difficult to do.
What I like doing is to make a general overview at first. I don't like UML, but I do like drawing diagrams which get the point across. Then I begin to implement it. Even while I'm just writing out the class structure with empty methods, I often see things that I missed earlier, so then I update my design. As I'm coding, I'll realize I need to do something differently, so I'll update my design. It's an iterative process. The concept of "design everything first, and then implement it all" is known as the waterfall model, and I think others have shown it's a bad way of doing software.
Try ArchiMate.

What's the difference between game development and business development?

Like most developers, I'm a business developer, which in essence consists of slapping a UI onto some back-end data store. (We all know there's a lot more to it than that, but that's usually what it boils down to.)
I understand that game development is very different from business development, but I'm having a hard time explaining it to a friend of mine. I was hoping the SO community could help me out here.
To me, modern game developers deal a lot with manipulating 3-dimensional graphics. In gaming code (and I'm guessing here), you're assembling polygons (or something like that), rotating 'em, etc. This involves a different way of thinking from manipulating relational data (for instance). I don't know, really. I just know it's different.
EDIT:
I should stress that by "development" I mean "programming," not all of the aspects that go into creating a game or piece of business software. I'm sorry I didn't make that clear originally.
Thanks!
I'm in game development but came from business development long ago. Game development is very rigorous in mathematics if you work on the physics or graphics side. Even AI can need quite a bit of mathematics for the low-level stuff. The hardware usually takes care of a lot of the polygon manipulation math as far as drawing on the screen goes. There is also a lot of involvement with generating the in-game data with (often) many tools that are run in a pre-processing step, and that too can be math-intensive if you are generating visibility data.
In terms of programming domains, amongst other things, we deal with:
Graphics programming (including shader development)
Animation
Physics simulation
AI and gameplay
Audio
Networking (typically fairly low-level stuff)
Some of these involve pretty serious maths and algorithms knowledge. On top of all that, we face extremely tough speed constraints, and typically have to be very careful with memory usage too. We face constantly changing hardware, and since we're trying to push hardware to the limit, this can be pretty tough - you can't just abstract it away. Most game development is still quite low-level C++ work. We probably deal with databases less than most other programmers nowadays (although online games are changing this)!
Programmers are often the minority on modern game projects: it's all about content creation (animation, modelling, texturing, audio and design). This means many game programmers are dedicated to making the content creation process efficient, rather than working on the game code itself. This work may have more relaxed speed and memory constraints, although it does have to deal with massive data sets.
Making the game 'fun' is one of the hardest things to do. In business terminology, it means "extremely unstable requirements", as the designers constantly change their minds about how things should work, to chase down that elusive fun factor.
Finally, games are generally a ship-once, no chance to fix stuff kind of deal. This actually means there's very little code maintenance involved, so traditionally there may have been less attention paid to code quality issues. This is changing now with the growth in post-launch content addition, online gaming and the sheer size of modern projects.
Overall it's an incredibly exciting field to be in, the downside is that it's often less well paid (because it's a very tough business financially for developers, and because it's popular, there's always a fresh supply of people looking for jobs).
Just some random thoughts about what is different in game development. Note that there might be some sarcasm in it, though I tried to suppress the urge.
Unless you're a lucky employee of one of those new-style studios (like Eidos Montreal or Blizzard), there is always a deadline to fear that is much too short. In business programming, you mostly make the deadline up for yourself.
A business application serves some specific need. A game's intent is to entertain people. You can't really predict if a game will fail until it's out.
Performance is essential, in every aspect of the game. Writing code that is good to maintain is second priority. In business programming, good code that works is top priority.
For a business application, a shiny UI is a bonus. For a game, it is a must.
Debugging games is much harder, because there is always some hardware dependence which results in bugs that can only be reproduced on some machines, none of which is in your company. And a game sucks up much more performance than a typical business application.
You have people dedicated to creating the art, story, music, sound, background and design, none of which necessarily needs programming knowledge (scripting is a little different), i.e. you have a lot of content which is what the users (players) will see. Nobody cares about how good your code is, unless performance is bad or there are bugs. The others get the praise.
For larger games, you have programmers dedicated just to 3D graphics, networking, audio, tools, scripting, physics and so on. Most of them are highly specialized and each of them can lead the game into a disaster. You'll only need advanced math skills if you're the graphics or physics guy. Well, or AI.
Most games are fire-and-forget, apart from some bugfixes, unless it's one of the more successful games, which get an expansion pack or a sequel.
Security is an important issue for online games, since there are many more annoying people trying to put others off than there are for business applications, many of which are for (more or less) internal use at the customer.
You are expected to work much more than when writing business applications.
To land a job for an AAA title, you need to have worked on at least three shipped AAA titles (no, no typo here, ever read some job descriptions at Blizzard or LucasArts? :P)
But here come the good things:
You can pretend to work when you're playing games.
And finally, programming games is fun. Priceless.
Business development is generally much more forgiving.
The reason is basically this; usually, people ARE PAID to use business software. People PAY to use game software.
This may sound like it's not answering your question, but it really is. When my boss says "use Microsoft Word for that document", they're providing the software, and I'm obligated to use Microsoft Word. And so, when using it, when it decides to renumber all my chapter headings "just because", or a save to disk takes 30 seconds while it resolves OLE references (it's JUST ONE FREAKING EXCEL SPREADSHEET, for heaven's sake!), I just grit my teeth and remind myself I'm getting paid to do this.
Whereas, if I'm in a game, I'm expecting entertainment. I'm expecting the experience to work properly, and smoothly, and cleanly, with no major stutters or problems.
Again, getting down to why this is an issue for programming; those loops and structures in the game had better be DAMN good to make sure there is no major slowdown, no stuttering in the game engine, nothing that makes the consumer who just spent X amount of his hard-earned dollars say "this is a piece of crap" and walk away. With business software, you can get away with that sort of thing; in some ways, it's almost expected. Again, look at the performance of Microsoft Word; if it were a game, it would be laughed out of existence.
I know I sound like I'm picking on Microsoft Word, and I generally am, because I find it to be hideous, but the point is true for so many pieces of software. CAD software is another example. Same basic things going on as in games, but in general it's slow and hard to work with without a lot of training.
The difference comes down to polish, and the level of polish that's expected. Yes, there's generally more flexibility in business software than there is in games; but moreover and more importantly from a coding perspective, the code has GOT to work efficiently and cleanly in a game; business software is, generally, more forgiving of sloppy code.
In a business app, unoptimized and slow algorithms are generally accepted; and while they're never preferable, frequently the business decision gets made to add another feature instead of improving the performance. But in games, performance IS a feature, and one which is make-or-break.
One should have infinite loops, one shouldn't.
One should have infinite loops, one shouldn't. - Rich Bradshaw
Rich is right. Fundamentally, from a coding standpoint, a game loop creates a "frame" of action in which actions are taken based on the state of the game such as controller input, object collisions, etc. This loop repeats infinitely until some state of some game element or input tells it to stop or "quit." This approach keeps the CPU and graphics card pretty busy, hence the market for gamer machines with fast processors and even faster graphics cards.
Business applications do not have an active loop. Instead, they sit idle waiting for an event such as a click, a message from a web service client, an HTTP GET request, etc. Then they respond to the event.
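A minimal, engine-agnostic sketch of that difference (the timing values and function stand-ins are made up for illustration):

    import time

    def game_loop(run_seconds=0.1):
        """Active loop: every frame, read input, update state, and redraw."""
        start = time.monotonic()
        frames = 0
        while time.monotonic() - start < run_seconds:   # a real game loops until "quit"
            # read_input(); update_world(); render_frame()  -- stand-ins for real work
            frames += 1
            time.sleep(1 / 60)                          # aim for roughly 60 frames per second
        print(f"rendered {frames} frames")

    def business_app(events):
        """Event-driven: sit idle until something happens, then respond to it."""
        for event in events:                            # stand-in for a real event queue
            print(f"handling {event}")

    game_loop()
    business_app(["button_click", "HTTP GET /invoices"])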
Sure, gaming is generally more geometrically intensive than business applications, but even that is not entirely true: consider image editing, CAD, and graphics tools. For many, these are business applications. But for the most part, a business application has to do with querying data, displaying that data, accepting user input, and modifying the data based on user input. In many cases, the business application does this across the network or even the Internet, but that's it in a nutshell.
The skillset and mindset of a business application developer and the game developer is often different. The game developer has a limited number of input constructs to consider in terms of creating a user experience with an unlimited choice of context or "world" if you will. The business developer is the opposite, with a limited set of potential contexts, usually the web page or the basic window, and an unlimited (or nearly so) set of input and data display combinations to create a user experience entirely different than the game developer sets out to achieve.
One big difference between business development and game development is the number of disciplines involved. Most business software is created by a team of developers, who all have the same basic skillset. In contrast, a game is created by a team of game designers, visual artists, 3d modelers, animators, musicians, and developers.
Good points about mathematics and integration of artists and other specialists in the team. In addition, I'd say that:
Game development, to some extent, will be more hardware dependent. In many cases, games are built simultaneously for several platforms and consoles (not to mention cell phones) with different architectures. That is abstracted away up to a certain extent, but developers cannot completely avoid this fact.
Game development is often more performance sensitive, or at least the performance requirements are different. You're dealing with real-time experience, so a lot of time is spent optimizing those pesky fps.
In many cases, game development does not care as much about reuse and maintainability. The game engine will probably be reused, but the application code base will probably not live to see v2.0. In the last stretch of a project, there is a lot of quick and dirty debugging going on. If it looks fine to the end user, there's no added value in making an elegant fix two days before the release.
Let's start from the goal: the goal of game development is to create an entertaining product. It should be accurate to the extent that it looks good and runs smoothly. The goal of a business software solution is to model a work process. It should be a tool that works fast enough: a stable product which executes its tasks absolutely accurately and securely.
Since we target different goals, we use different approaches to build a game and a business software product. Let's move to the requirements. For a game, the requirements are determined by the game designer. For a software product, the business defines the process and the requirements. For a game the requirements are not final (shall we have small cartoon figures or real human models? this does not matter to the game engine, for example). But for a software product, the requirements should be strictly followed and clarified to the maximum possible detail before development.
From the different requirements come different software design and development approaches. For a game, the performance and gameplay are critical, and the quality of the graphics and sounds (for example) can be reduced just to be compatible with weaker hardware. Also, the physics model could be simplified just to run more smoothly and improve the gameplay. For business software everything should be exact, and cutting features means that your product will no longer be as functional as designed.
For a game, security is not as important: there is no critical customer data which needs to be protected. For business software, a good security system should be supplied: starting from data encryption (when saving to data storage or transferring over the network), moving through backup systems, and not least, compatibility with previous versions.
I could continue with other aspects but I guess this is already too much for one post...
Business software (that isn't shrink-wrap software) can generally be much more poorly written but still considered a commercial success due to the bizarre disconnect between the quality of the product and saleability of the product. Game software, on the other hand, has to actually be good to survive the marketplace.
The bar for quality in specialized business software is generally much lower.
Business software has to be reliable, maintainable, consistent, not be too annoyingly slow, and can build on lots of already written stuff, such as databases, controls, forms etc.
A games programmer often starts with a blank sheet - hardware reference manuals, some documentation about the hardware and usually thin vendor libraries around some advanced hardware that's completely different to the last job.
From this they have to build what you see - and make most of it work within a 20ms time period, reliably, and often within a ridiculously short time period, facing changing requirements and often a very hard deadline, working untold numbers of hours for a comparative pittance.
That's not to mention often having to master some fairly complex mathematics and physics....
Performance is really the difference, from what I can tell.
Technology-wise, games are usually Windows/C++ driven.
Game programming has more in common with scientific programming. You are modeling behavioral systems and anticipating results based upon a limited set of input.