Natural Language Parsing for ToDo application - language-agnostic

I'm wondering if someone could lead me to any examples of natural language parsing for to do lists. Nothing as intense as real Natural Language Parsing, but something that could process the line:
Go to George's house at 3pm on Tuesday with Kramer
as well as the line:
3 on tuesday go to georges
and get the same output.
I've seen other to do applications that do this sort of work in the past. Is there anything out there with examples or have people just custom written this code themselves?

Somebody on this site pointed out this natural language parsing project. Kudos to whoever you are for posting the link: http://code.gustavonarea.net/booleano/

That's a great idea! As you might imagine, this is vastly complex and can be approached in many different ways. Perhaps check out the Natural Language Toolkit for starters, which is mostly Python but also requires building some OCaml and Java components. I also recommend reading some books and/or papers on lexical semantics.

I wrote something similar to this in Perl. The input would be a day/time with the name of some action. Sentences like: "3pm Run full unit test suite", "reboot servers on dec 25", etc.
I used the Perl module Date::Manip since it's awesome for this sort of thing and coded the rest of the logic manually.
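The "pull out the date/time, treat the rest as the action" approach can be sketched in a few lines. This is only a rough Python illustration, not what Date::Manip does: the function name parse_todo, the filler-word list and the regular expression are all assumptions made up for this example, and it only understands simple times such as "3pm" or a bare "3" plus full weekday names.

```python
import re

WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")

# A time is 1-2 digits, optional minutes, optional am/pm: "3", "3pm", "10:30 am".
TIME_RE = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\s*(am|pm)?\b", re.IGNORECASE)
# Filler words that only glue the date/time to the action (an assumed list).
FILLER = {"at", "on"}


def parse_todo(line):
    """Split a to-do line into a time tuple, a weekday and the remaining task text."""
    text = line.strip()
    time = None
    match = TIME_RE.search(text)
    if match:
        hour, minute, meridiem = match.groups()
        time = (int(hour), int(minute or 0), (meridiem or "").lower() or None)
        # Cut the matched time out of the text so it is not part of the task.
        text = text[:match.start()] + " " + text[match.end():]

    day = None
    words = []
    for word in text.split():
        lowered = word.lower().strip(",.")
        if lowered in WEEKDAYS and day is None:
            day = lowered
        elif lowered not in FILLER:
            words.append(word)
    return {"time": time, "day": day, "task": " ".join(words)}
```

Both example lines from the question yield day "tuesday" and hour 3. The bare "3" comes back with no am/pm marker, so deciding whether it means 3pm is left to the caller, which is exactly the kind of ambiguity a real to-do parser has to settle with its own rules.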

Related

Why no programming in English? What is the difference between natural languages and programming languages?

What is the key difference between natural languages (such as English and French) and programming languages like C++ and Perl?
I am familiar with the ambiguity problem, but can't it be solved using an interactive compiler, or by using a subset of the natural language with a strict grammar while still retaining the essence of the language?
Another issue is context. But lawyers have ways to solve this issue. (This question is not about reducing programming complexity; it's simply about the concrete reasons and roadblocks in using natural languages for instructing a computer.)
Is there any other significant problem besides these two? Or do these two have greater consequences than I mentioned above? Are the interactive solution and lawyers' language technically not feasible for programming?
This is an extremely interesting question and in short, yes, there are some very good reasons why we don't use English to write programs.
It's been said before that the greatest gift computer science has given us is not the ability to talk to computers; it is that, now that formal languages exist for describing algorithms, we have even better tools for communicating these ideas to other people, even if a computer is not involved. Indeed the best software engineers see their jobs primarily as writing software that is readable to other people so as to make maintenance and addition of new features as easy as possible. This is not possible in a language as big and as free-form as any natural, spoken language.
Ambiguity
One reason is that of ambiguity. Have you ever looked at a menu in a restaurant and seen that with your burger you can get "Coleslaw and fries or salad"? What does this mean? Can I get both coleslaw and fries, with the other option being a salad alone? Or do I always get coleslaw and have to choose between the fries and a salad? English is full of these things.
I used to teach a class on this and an example I liked to use to explain ambiguity was as follows. I asked the students to write a one paragraph story ending with the sentence "Tom asked Chris if he could help him". About half the time the stories written indicated that the student interpreted the sentence as Tom asking for assistance from Chris. The other half of the time people thought Tom was offering to lend Chris a hand.
If you think about it, there are a lot of people who do write programs in English. They're called product managers and the compiler they use is software engineers. The problem here is that a software engineer has to inject a lot of his own understanding of the problem to understand what the description really means. And trust me, there is a lot of back and forth. Even on very simple business requirements I must clarify ambiguities.
Context
I would not agree that lawyers have ways to solve the context problem. We continually have ongoing arguments in courts among, in some cases, some of the most educated people in the country, about the meanings of various laws. Sometimes this involves arguing about the context in which the laws were written long ago. Sometimes it involves applying them to a new, previously non-existent context like the Internet. The fact that we have thousands of lawyers working on disambiguating these issues is proof that it cannot be handled by a simple computer program like a compiler. It's just too hard of a problem.
Conciseness
Another issue is just the ability to be concise. Mathematics long ago invented notations for many different concepts in the maths because it's just easier to read if there's a special syntax that is concise and has a well defined meaning. A mathematician knows what it means when I say "f(x) = 3x+1". It means the same thing as "There is a function called f and it has one argument. The value of applying f to a number is the number that is one more than three times the number given." But the former is a lot easier to read once you've learned the syntax. The same is true for programming languages. Programming languages are specialized to describe computations.
Implementation
Creators of programming languages deliberately create very small languages. These are, in fact, subsets of English with some extra syntax. The idea of understanding all of English in all of its free-form ways and, worse yet, its ever-increasing vocabulary is a job for Natural Language Processing (NLP). A very hard job. If you want to be able to assign unambiguous meaning to a program and have the program's behavior never change, you need a well-defined syntax and semantics.
The take-away point here is that English is a very big, very flexible language with no formal specification. Programming languages need to have a well-defined syntax and semantics in order for algorithms to have unambiguous and unchanging meaning. Someone could, in fact, write a formal syntax for a subset of English and give it unambiguous meaning. But this would be a huge, huge job.
It's been done
Check out BabelBuster. The idea here was to take C and convert it to and from a very small subset of very rigorous English such that one could write a program in C and then convert it to English. During the DeCSS DVD decryption arguments, the MPAA was trying to get programs that could decrypt their DVDs declared illegal. BabelBuster fought back with a very interesting idea: create a way to convert English, which is protected under freedom of speech, into working code in C, and thus make the point that C code is also just a language which should be protected as such. Therefore one should be able to publish code, such as DeCSS, that cracks DVD encryption. It's an interesting piece of work relevant to your question regardless of which side you're on.
The problem with BabelBuster is that you need to write your program in a very, very limited subset of English. But it is possible to do this.
Conclusion
English, like all natural languages, allows us to describe a computation or algorithm, but the language is verbose, offers many ways to say the same thing, is dependent on the context of the speaker, and is not formally specified. If your goal is to describe computations, you should take English and choose a minimal workable subset in which you can say everything you need. Formally specify what each word in this subset will mean. Then create a few special notations to make it concise to say the things you say, just like mathematics did. If you do this, you'll end up with a typical programming language, or something like it.
There are three principal reasons.
First, as Gabe says - people have figured out through trial and error that programming in things that are close to English sentences only forces programmers to type more useless cruft. (And yes, COBOL was explicitly designed to read more "naturally".)
To a programmer,
windows++
is more readable than
You should now increment the number of windows by one.
For example, Tetris is a rather easy game to code. I would be terribly surprised if you managed to write an English explanation that is detailed enough for a computer (remember, computers are dumb, so you have to spell it all out) in fewer pages than a short novel.
The second reason is that the range of things a computer knows how to do is rather small, so the number of language constructs that are needed for that is also limited. In contrast, natural languages need to be able to express the entirety of human experience, which does require many language constructs to pull off. For example, "According to his wife, John would have caught the fish yesterday if it hadn't rained" is not expressible in C - and does not need to be.
And third is, indeed, ambiguity, as you yourself note. There are a lot of places where a software error is simply not permissible. People write enough bugs in unambiguous languages; allowing ambiguity would be a disaster waiting to happen. And on the same subject, we are still unable to parse human language sufficiently well: state-of-the-art parsers still have unacceptably high error rates.
It is possible to automatically translate structured English into code, as long as a restricted subset of the English language is used.
As a proof of concept, I have developed a programming language called EngScript, which translates English sentences into Python source code.
Arithmetic operations can be written in plain English:
#print{3 to the power of 2}
#print{3 raised to the power of 2}
#Both of these statements print "9".
print{3 plus (the sum of 1 and 2)}
#This prints "6".
Variables can be initialized in plain English, too:
let x be (x plus 1)
if (x is not equal to 7) :
print x
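To give a feel for how a restricted subset of English can be mapped onto code, here is a toy Python sketch. It is not EngScript's actual implementation: the phrase table and the function name translate are invented for illustration, and it only rewrites a handful of fixed phrases into Python operators (constructs such as "the sum of 1 and 2" are not handled).

```python
import re

# Fixed English phrases and the Python operators they stand for.
# Longer phrases come first so "is not equal to" is replaced before "is equal to".
PHRASES = [
    ("raised to the power of", "**"),
    ("to the power of", "**"),
    ("is not equal to", "!="),
    ("is equal to", "=="),
    ("plus", "+"),
    ("minus", "-"),
    ("times", "*"),
]


def translate(expression):
    """Rewrite a restricted-English arithmetic phrase as a Python expression string."""
    result = expression
    for phrase, symbol in PHRASES:
        # \b keeps the match on whole words, so "plush" is left alone.
        result = re.sub(rf"\b{re.escape(phrase)}\b", symbol, result)
    return result
```

For instance, translate("3 raised to the power of 2") gives the string "3 ** 2", which Python can then evaluate. Everything outside the phrase table passes through untouched, which is the point made above: the approach works only as long as the writer stays inside the agreed subset.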

Where can I learn the basics of writing a lexer?

I want to learn how to write a lexer. My university course had an assignment where we had to write a parser (and a lexer to go along with it) but this was given to us with no instruction or feedback (beyond the mark) so I didn't really learn much from it.
After searching for this topic, I can only find fairly advanced write ups which focus on areas which I feel are a few steps ahead of where I am at. I want a discussion on the basics of writing a lexer for a very simple language which I can use as a basis for investigating tokenising more complex languages.
At this stage I'm not really interested in best practices or optimisation techniques but instead prefer a focus on the essentials. What are some good resources to get me started?
Basically there are two main approaches to writing a lexer:
Creating a hand-written one in which case I recommend this small tutorial.
Using some lexer generator tools such as lex. In this case, I recommend reading the tutorials to the particular tool of choice.
Also I would like to recommend the Kaleidoscope tutorial from the LLVM documentation. It runs through the implementation of a simple language and in particular demonstrates how to write a small lexer. There is a C++ and an Objective Caml version of the tutorial.
The classical textbook on the subject is Compilers: Principles, Techniques, and Tools also known as the Dragon Book. However this probably falls under the category of "fairly advanced write ups".
The Dragon Book is probably the definitive guide on the subject, although it can be a bit overwhelming. Language Implementation Patterns and Programming Language Pragmatics are great resources as well.
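To make the hand-written approach concrete, here is a minimal sketch of a lexer in Python for a very simple language of integers, identifiers, a few operators and parentheses. The token kinds (NUMBER, IDENT, OP, PAREN) and the function name tokenize are choices made for this example, not taken from any of the resources above.

```python
def tokenize(source):
    """Turn a string into a list of (kind, text) tokens, scanning left to right."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1                      # whitespace only separates tokens
        elif ch.isdigit():
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1                  # consume the whole run of digits
            tokens.append(("NUMBER", source[start:i]))
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1                  # letters, digits and "_" continue a name
            tokens.append(("IDENT", source[start:i]))
        elif ch in "+-*/=":
            tokens.append(("OP", ch))
            i += 1
        elif ch in "()":
            tokens.append(("PAREN", ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at position {i}")
    return tokens
```

Calling tokenize("x = 3 * (y1 + 42)") returns nine tokens, starting with ("IDENT", "x") and ("OP", "="). The whole technique is the loop: look at the current character, decide which kind of token starts there, consume characters until that token ends, and repeat. Lexer generators such as lex produce the same kind of loop from a table of regular expressions.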

Programmers dictionary/lexicon for non native speakers

I'm not an English speaker, and I'm not very good at English. I'm self-taught. I have not worked together with others on a common codebase. I don't have any friends who program. I don't work with other programmers (at least nobody who cares about these things).
I guess this might explain some of my problems in finding good unambiguous class names. I have tried to find some sort of "Programmers dictionary" containing words often used and their meanings. When reading others' code I have to look up words quite often, and as many use abbreviations this poses an additional challenge.
My very limited vocabulary "forces" me to use bad class names like xxManager, xxProvider, xxWhatever. It's usually less problematic choosing variable and method names.
Other non English people out here: How have you managed to cope with this? Have you studied English so well it's not a problem? Or have you read so much code naming comes naturally? Or discussed a lot with English speakers? Found any good websites, articles or other publications? As I've never read anything regarding programming in my own language, I often have more problems trying to find the words in my language...
PS: All other posts I've found were about mixing native tongue and English... And I understand this might be a bit off topic and might be closed.
Edit: Some resources from the answers and other stuff I use:
Jargon / The New Hacker's Dictionary
Common design patterns
Google translate
Dictionary
The Jargon file will help with the more obscure references people will give in the industry.
http://catb.org/jargon/html/go01.html
Other than that..finding good names for your variables/classes/etc is hard. Often times, it's harder than actually solving the problem. Here's a good resource for some common design pattern names people like to use: http://en.wikipedia.org/wiki/Design_pattern_%28computer_science%29
Examples:
AbcFactory
XyzBridge
Could be an unorthodox suggestion, but I would recommend studying English more deeply (I am also a non-native speaker).
Expose yourself to as much English as possible! Watch movies, read English fiction, listen to technical podcasts.
Mind you, if you really want to deepen your knowledge of English, you're probably not going to learn a lot watching "Transformers". On the other hand, diving into Ulysses probably is not a good strategy either.
If you're feeling adventurous, you could always get a subscription to the New Yorker magazine. It'll do things to you - yes this is flamebaiting. :P
Other non English people out here:
How have you managed to cope with this?
Good naming in code matters. Using English is preferred, but if you don't know English very well the result could be counterproductive.
I had a friend who just guessed what the correct name would be and the result was horrible, i.e.
String employiiNeim; // employeeName
int eich; // age
The problem with English is that it is not pronounced as written (French has this minor ... ehrm, characteristic too). Other languages like Spanish, German, Dutch, and others write and pronounce every letter in the word.
This becomes particularly relevant when what you are coding are business rules or business models. In this case it is much better to use your native language.
String nombreEmpleado;
int edad;
Much better, especially when you work with others.
Have you studied English so well it's not a problem?
Yeap, there is no other way, and a lot of practice.
You can study English the same way you study programming languages though. You can have a teacher, attend a classroom and study an hour a day. Or (what I did) you can just grab something that is interesting to you and try to understand it. For instance, you have a small document describing something you care about, you read blogs or read content here at StackOverflow, you translate a song you like, etc.
All these are study forms. There is no other way, you won't wake up one day and say: "...I know kung fu" I mean, and say: ..."I know English"
Or have you read so much code naming comes natural?
Also helps, but if you don't understand what the code means, you ... well won't make any progress.
You'll learn the programming language, and that will help you to understand English a bit better, but it won't help you to learn it. That's because when we program we learn the programming language, not the natural language.
Or discussed a lot with English speakers?
Eerhh.. nope. If you have that chance go ahead, it will improve your listening and speaking, but not necessarily your writing.
The most effective way to improve your English vocabulary and grammar is by READING ( reading in your native language also improves your own language btw )
So, I would say, read as much as you can. Use your native language while you gain more confidence, and keep studying.
The English will come with time.
If you can't find the "Programmer's Dictionary" you're looking for, start one. Post a new question: "What entries are missing from this Dictionary for English-as-a-Second-Language-Programmers?" and seed it with 10 or 20 words/definitions you've already discovered. Once posters have suggested enough additions, move it to a wiki somewhere and keep accepting contributions. You might end up creating a valuable resource.
Documenting your code with excellent prose like your question above will go a long way!
If you stick to common design patterns endemic to the language, platform, and architecture with which you're working, other engineers should understand your nomenclature fairly easily.
If you are worried about it in terms of naming your own objects, just think of what your native word is for what you want to do, then go get an English translation dictionary, and use the English version.
How about using your native language?
Of course (like for me as an Austrian) some letters may not be allowed - but who cares if there is Mörder or Moerder (Murder) in the class name :)
Or (as I do) use a dictionary like dict.cc or something else.
What I do: think about what the class does. It manages game sessions (for example), so it becomes GameSessionManager.
Abbreviations are (at least for me) a problem, but what I've learned from other code is that even native speakers use different abbreviations.
And whether the class is called GameSessionMgr or GameSessionMngr doesn't make a difference.
You are not writing books or some kind of "English poem" where spelling, grammar and so on count.
You write code, and if you follow "your special rules", you and others will (after some time) be able to understand your code and class names.
It will come with time and experience. Above all attempt to (like #Mike A says) document things until the code becomes clearer and try to be consistent.
This is an issue that I run into as well, even as a native English speaker. As a programmer, I often find that I need to find a descriptive word for a class, variable, function, etc. I often find myself asking a friend or coworker what verbiage they would use by explaining my idea, carefully excluding any words I myself have considered as a possible choice for the class/function/variable name so as not to inhibit their creativity.
It seems to me that the English Language & Usage site proposal over at Area51 is a good place to ask such questions as "What would you call a class (or thing) that does this, this and that, and has properties x, y, and z?"

Best language and framework for a text based game like mafia wars

Which do you think is the best language/framework to develop a text based adventure game like
Mafia wars? I am proficient in Java/JavaScript and have dabbled in Python, Perl, Erlang, Scheme. Also, any pointers to articles relating to this is very welcome. I am starting from scratch and hence have no constraints. This is a hobby project that I am planning to do to satisfy my coding urge.
The 'best' language doesn't exist.
Try using the one that you feel most comfortable with, after thinking about data structures and functional requirements, and possibly the one where you can get the most support in your immediate (person to person) or close (e.g. stackoverflow) environment.
I'm going to try something original here - give Natural language a try.
Inform is a tool for creating interactive fiction (a.k.a. text-based adventures) that features its own language. It takes care of creating the initial "infrastructure" (taking user input, recognizing verbs, that sort of thing) and lets you concentrate on creating "things", "places" and "actions".
Here's a sample, extracted from its tutorial:
The wood-slatted crate is in the Gazebo. The crate is a container.
Mr Jones wears a top hat. The crate contains a croquet mallet.
It looks deceitfully easy, I know. But try it :)
Inform also allows you to publish it on The Interactive Fiction Database, as well as export it to a standard Z-machine format (I believe the file extension for this is .z8). There's even a JavaScript Z-machine interpreter, in case you prefer to host your adventure on a web page yourself.
Edit: I've found two additional "frameworks". I don't know whether they use a programming language or are completely graphical, since I don't use Windows: Adrift and TADS 3
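For a sense of the "infrastructure" that a tool like Inform takes off your hands (taking user input, recognizing verbs), here is a rough Python sketch of a hand-rolled verb-noun command parser. It has nothing to do with how Inform works internally; the world model, the verb table and the function names parse_command and perform are all invented for this example, loosely mirroring the crate and croquet mallet from the sample above.

```python
# Hypothetical world model: one location, one container, an empty inventory.
WORLD = {
    "location": "Gazebo",
    "containers": {"crate": ["croquet mallet"]},
    "inventory": [],
}

# Several input words map onto the same canonical verb.
VERBS = {"look": "look", "examine": "look", "x": "look",
         "take": "take", "get": "take"}
# Small words that carry no meaning for this parser.
NOISE = {"the", "a", "an", "in", "at", "from"}


def parse_command(line):
    """Return (verb, noun) from player input, or (None, None) if not understood."""
    words = [w for w in line.lower().split() if w not in NOISE]
    if not words or words[0] not in VERBS:
        return None, None
    return VERBS[words[0]], " ".join(words[1:])


def perform(line, world=WORLD):
    """Apply one player command to the world and return the response text."""
    verb, noun = parse_command(line)
    if verb == "look" and noun in world["containers"]:
        contents = world["containers"][noun]
        return f"The {noun} contains: {', '.join(contents) or 'nothing'}."
    if verb == "take":
        for contents in world["containers"].values():
            for item in contents:
                if noun and noun in item:
                    contents.remove(item)
                    world["inventory"].append(item)
                    return f"Taken: {item}."
        return "You can't see that here."
    return "I don't understand."
```

Typing "look in the crate" and then "take mallet" works, but nearly anything else falls through to "I don't understand." Covering synonyms, ambiguous nouns, scope and state changes by hand is where the effort goes, and that is the part a dedicated interactive fiction tool already provides.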
I'm a little confused by your requirements; Mafia Wars is a web game, correct? Text adventures, while they can be played on the web (see this article: http://kooneiform.wordpress.com/tag/if-interpreters/) are usually single-player games, a far cry from Mafia Wars.
I think you mean you want to create a PBBG or web game; based on your experience then I recommend a Python back-end with JavaScript on the client-side. One framework you could look into is the Google App Engine, which has Python support, and would be an excellently scalable solution.
Alternatively you can choose one of the many Python web frameworks available. If you'd like a simple place to start, I recommend web.py, which I've been trying out recently and quite like. I've found that combining Python and JavaScript/AJAX with web.py and something like jQuery is a very enjoyable and friction-free way to develop.
Clojure could be a fun option - Lisps are a classic way of writing natural language processing programs and text adventure games are a good example.
Here's a nice little tutorial for writing a text adventure in Clojure.
Just use what you have learned; there is no specific programming language for that kind of application. It's just more or less easy depending on the language.
Since you seem to be experienced in Python, just go ahead with Python! If you haven't already done a web project, you should take a look at tutorials and resources on the web.
Good luck!

How to write a LISP interpreter for actionscript 3?

I know there is one, but it's not easy to implement the way I want.
I would like to know the steps to interpret the lisp language and what functions are essential to be implemented.
First, you learn Lisp, then read LiSP and (given you know ActionScript well enough) you just start. PAIP also has sections about implementing Lisp interpreters and compilers.
To get an idea how it generally can be approached, you can have a look at Write Yourself a Scheme in 48 hours. It uses Haskell as an implementation language, but it would give you an idea.
It surely wouldn't be trivial, but it has been done quite often, and there is much to learn from doing it.
danlei's recommendations are excellent. If you want to learn Lisp, PAIP is a better choice to start with, because it will teach you a lot about Common Lisp and a smallish chunk of Scheme.
However, my recommendation would be to start with The Structure and Interpretation of Computer Programs, which will teach you at least as much about Lisp as PAIP (you won't learn as much about AI, though), has a longer and more complete section on how to write Lisp interpreters, and is an awesome book all around. In addition, it's available in its entirety online. I had to order both PAIP and LiSP by mail.
Check out the 'Essentials of Programming Languages' book (also known as EoPL).
If you want to implement a basic lisp in a higher level language, you might get some mileage out of the later chapters of The Little Schemer (where you're shown how to write a meta-circular evaluator in Scheme), the entirety of WYAS48 (where you're shown how to implement an R5RS Scheme in Haskell) and these two Norvig articles (wherein he implements a basic lisp-like in Python).
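The steps those resources walk through are roughly: tokenize the text, read the tokens into nested lists, then evaluate the lists against an environment, with a few special forms (quote, if, define, lambda) handled before ordinary function application. Here is a minimal sketch of that pipeline in Python rather than ActionScript 3, in the spirit of the Python articles mentioned above but not copied from them. All names (tokenize, read, evaluate, run, GLOBAL_ENV) are choices made for this example, and it assumes well-formed input with balanced parentheses.

```python
import operator


def tokenize(text):
    """Split source text into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens):
    """Build a nested list (the syntax tree) from a token list, consuming it."""
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(read(tokens))
        tokens.pop(0)                   # discard the closing ")"
        return expr
    try:
        return int(token)
    except ValueError:
        return token                    # anything else is a symbol (a str)


# The primitive functions every expression can use.
GLOBAL_ENV = {"+": operator.add, "-": operator.sub, "*": operator.mul,
              "<": operator.lt, "=": operator.eq}


def evaluate(expr, env=GLOBAL_ENV):
    """Evaluate a parsed expression in an environment (a dict of names)."""
    if isinstance(expr, str):           # symbol: look it up
        return env[expr]
    if not isinstance(expr, list):      # number: evaluates to itself
        return expr
    head, *args = expr
    if head == "quote":                 # (quote x) returns x unevaluated
        return args[0]
    if head == "if":                    # (if test then otherwise)
        test, then, otherwise = args
        return evaluate(then if evaluate(test, env) else otherwise, env)
    if head == "define":                # (define name value)
        name, value = args
        env[name] = evaluate(value, env)
        return None
    if head == "lambda":                # (lambda (params...) body)
        params, body = args
        # On each call, copy the defining environment and add the arguments.
        return lambda *values: evaluate(body, {**env, **dict(zip(params, values))})
    func = evaluate(head, env)          # ordinary application
    return func(*[evaluate(arg, env) for arg in args])


def run(source):
    """Tokenize, read and evaluate a single expression."""
    return evaluate(read(tokenize(source)))
```

After run("(define fact (lambda (n) (if (< n 2) 1 (* n (fact (- n 1))))))"), the call run("(fact 5)") returns 120. The same three stages carry over to ActionScript 3 or any other host language; what grows in a real implementation is the set of primitives, error handling, and features like proper lexical scope chains, tail calls and macros.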
You can check out sporklisp, which is a Lisp variant written in VBA that works in Excel.
https://github.com/spoonix/sporklisp
PS I have been able to port this to MS Access also with no problem.
------ note from the author of sporklisp -------
A lot of the concepts came from Lisp wizards, particularly the Structure and Interpretation of Computer Programs (SICP), Peter Norvig's excellent tutorials for LisPy and JScheme, and Christian Queinnec's Lisp in Small Pieces.
I would recommend reading one of the famous dragon books. It pretty much explains the entire process of parsing, compiling, code generation, optimization etc
If you have to ask - you can't do it.
Implementing a programming language is a terribly complicated thing, even if you don't have to do it from scratch. And I don't think there will be many supporting tools/libraries for Actionscript. On top of that, LISP is a functional programming language which will require quite an amount of extra trickery in addition to the usual language implementation in order to get a decent performance.