Which is a better long term URL design? - language-agnostic

I like Stack Overflow's URLs - specifically the forms:
/questions/{Id}/{Title}
/users/{Id}/{Name}
It's great because as the title of the question changes, the search engines will key in to the new URL but all the old URLs will still work.
Jeff mentioned in one of the podcasts - at the time the Flair feature was being announced - that he regretted some design decisions he had made when it came to these forms. Specifically, he was troubled by his pseudo-verbs, as in:
/users/edit/{Id}
/posts/{Id}/edit
It was a bit unclear which of these verb forms he ended up preferring.
Which pattern do you prefer (1 or 2) and why?

I prefer pattern 2 for the simple reason is that the URL reads better. Compare:
"I want to access the USERS EDIT resource, for this ID" versus
"I want to access the POSTS resource, with this ID and EDIT it"
If you forget the last part of each URL, then in the second URL you have a nice recovery plan.
Hi, get /users/edit... what? what do you want to edit? Error!
Hi, get /posts/id... oh you want the post with this ID hmm? Cool.
My 2 pennies!

My guess would be he preferred #2.
If you put the string first it means it always has to be there. Otherwise you get ugly looking urls like:
/users//4534905
No matter what you need the id of the user so this
/user/4534905/
Ends up looking better. If you want fakie verbs you can add them to the end.
/user/4534905/edit

Neither. Putting a non-English numeric ID in the URL is hardly search engine friendly. You are best to utliize titles with spaces replaced with dashes and all lowercase. So for me the correct form is:
/question/how-do-i-bake-an-apple-pie
/user/frank-krueger

I prefer the 2nd option as well.
But I still believe that the resulting URLs are ugly because there's no meaning whatsoever in there.
That's why I tend to split the url creation into two parts:
/posts/42
/posts/42-goodbye-and-thanks-for-all-the-fish
Both URLs refer to the same document and given the latter only the id is used in the internal query. Thus I can offer somewhat meaningful URLs and still refrain from bloating my Queries.

I like number 2:
also:
/questions/foo == All questions called "foo"
/questions/{id}/foo == A question called "foo"
/users/aiden == All users called aiden
/users/{id}/aiden == A user called aiden
/users/aiden?a=edit or /users/aiden/edit == Edit the list of users called Aiden?
/users/{id}/edit or /users/{id}?a=edit is better
/rss/users/aiden == An RSS update of users called aiden
/rss/users/{id} == An RSS feed of a user's activity
/rss/users/{id}/aiden == An RSS feed of Aiden's profile changes
I don't mind GET arguments personally and think that /x/y/z should refer to a mutable resource and GET/POST/PUT should act upon it.
My 2p

/question/how-do-i-bake-an-apple-pie
/question/how-do-i-bake-an-apple-pie-2
/question/how-do-i-bake-an-apple-pie-...

Related

Rest API design with multiple unique ids

Currently, we are developing an API for our system and there are some resources that may have different kinds of identifiers.
For example, there is a resource called orders, which may have an unique order number and also have an unique id. At the moment, we only have URLs for the id, which are these URLs:
GET /api/orders/{id}
PUT /api/orders/{id}
DELETE /api/orders/{id}
But now we need also the possibility to use order numbers, which normally would result into:
GET /api/orders/{orderNumber}
PUT /api/orders/{orderNumber}
DELETE /api/orders/{orderNumber}
Obviously that won't work, since id and orderNumber are both numbers.
I know that there are some similar questions, but they don't help me out, because the answers don't really fit or their approaches are not really restful or comprehensible (for us and for possible developers using the API). Additionally, the questions and answers are partially older than 7 years.
To name a few:
1. Using a query param
One suggests to use a query param, e.g.
GET /api/orders/?orderNumber={orderNumber}
I think, there are a lot of problems. First, this is a filter on the orders collections, so that the result should be a list as well. However, there is only one order for the unique order number which is a little bit confusing. Secondly, we use such a filter to search/filter for a subset of orders. Additionally, a query params is some kind of a second-class parameter, but should be first-class in this case. This is even a problem, if I the object does not exist. Normally a get would return a 404 (not found), but a GET /api/orders/?orderNumber=1234 would be an empty array, if the order 1234 does not exist.
2. Using a prefix
Some public APIs use some kind of a discriminator to distinguish between different types, e.g. like:
GET /api/orders/id_1234
GET /api/orders/ordernumber_367652
This works for their approach, because id_1234 and ordernumber_367652 are their real unique identifiers that are also returned by other resources. However, that would result in a response object like this:
{
"id": "id_1234",
"ordernumber": "ordernumber_367652"
//...
}
This is not very clean, because the type (id or order number) is modelled twice. And apart from the problem of changing all identifiers and response objects, this would be confusing, if you e.g. want to search for all order numbers greater than 67363 (thus, there is also a string/number clash). If the response does not add the type as a prefix, a user have to add this for some request, which would also be very confusing (sometime you have to add this and sometimes not...)
3. Using a verb
This is what e.g. Twitter does: their URL ends with show.json, so you can use it like:
GET /api/orders/show.json?id=1234
GET /api/orders/show.json?number=367652
I think, this is the most awful solution, since it is not restful. Furthermore, it has some of the problems that I mentioned in the query param approach.
4. Using a subresource
Some people suggest to model this like a subresource, e.g.:
GET /api/orders/1234
GET /api/orders/id/1234 //optional
GET /api/orders/ordernumber/367652
I like the readability of this approach, but I think the meaning of /api/orders/ordernumber/367652 would be "get (just) the order number 367652" and not the order. Finally, this breaks some best practices like using plurals and only real resources.
So finally, my questions are: Did we missed something? And are there are other approaches, because I think that this is not an unusual problem?
to me, the most RESTful way of solving your problem is using the approach number 2 with a slight modification.
From a theoretical point of view, you just have valid identification code to identify your order. At this point of the design process, it isn't important whether your identification code is an id or an order number. It's something that uniquely identify your order and that's enough.
The fact that you have an ambiguity between ids and numbers format is an issue belonging to the implementation phase, not the design phase.
So for now, what we have is:
GET /api/orders/{some_identification_code}
and this is very RESTful.
Of course you still have the problem of solving your ambiguity, so we can proceed with the implementation phase. Unfortunately your order identification_code set is made of two distinct entities that share the format. It's trivial it can't work. But now the problem is in the definition of these entity formats.
My suggestion is very simple: ids will be integers, while numbers will be codes such as N1234567. This approach will make your resource representation acceptable:
{
"id": "1234",
"ordernumber": "N367652"
//...
}
Additionally, it is common in many scenarios such as courier shipments.
Here is an alternate option that I came up with that I found slightly more palatable.
GET /api/orders/1234
GET /api/orders/1234?idType=id //optional
GET /api/orders/367652?idType=ordernumber
The reason being it keeps the pathing consistent with REST standards, and then in the service if they did pass idType=orderNumber (idType of id is the default) you can pick up on that.
I'm struggling with the same issue and haven't found a perfect solution. I ended up using this format:
GET /api/orders/{orderid}
GET /api/orders/bynumber/{orderNumber}
Not perfect, but it is readable.
I'm also struggling with this! In my case, i only really need to be able to GET using the secondary ID, which makes this a little easier.
I am leaning towards using an optional prefix to the ID:
GET /api/orders/{id}
GET /api/orders/id:{id}
GET /api/orders/number:{orderNumber}
or this could be a chance to use an obscure feature of the URI specification, path parameters, which let you attach parameters to particular path elements:
GET /api/orders/{id}
GET /api/orders/{id};id_type=id
GET /api/orders/{orderNumber};id_type=number
The URL using an unqualified ID is the canonical one. There are two options for the behaviour of non-canonical URLs: either return the entity, or redirect to the canonical URL. The latter is more theoretically pure, but it may be inconvenient for users. Or it may be more useful for users, who knows!
Another way to approach this is to model an order number as its own thing:
GET /api/ordernumbers/{orderNumber}
This could return a small object with just the ID, which users could then use to retrieve the entity. Or even just redirect to the order.
If you also want a general search resource, then that can also be used here:
GET /api/orders?number={orderNumber}
In my case, i don't want such a resource (yet), and i could be uncomfortable adding what appears to be a general search resource that only supports one field.
So basically, you want to treat all ids and order numbers as unique identifiers for the order records. The thing about unique identifiers is, of course, they have to be unique! But your ids and order numbers are all numeric; do their ranges overlap? If, say, "1234" could be either an id or an order number, then obviously /api/orders/1234 is not going to reference a unique order.
If the ranges are unique, then you just need discriminator logic in the handler code for /api/orders/{id}, that can tell an id from an order number. This could actually work, say if your order numbers have more digits than your ids ever will. But I expect you would have done this already if you could.
If the ranges might overlap, then you must at least force the references to them to have unique ranges. The simplest way would be to add a prefix when referring to an order number, e.g. the prefix "N". So that if the order with id 1234 has order number 367652, it could be retrieved with either of these calls:
/api/orders/1234
/api/orders/N367652
But then, either the database must change to include the "N" prefix in all order numbers (you say this is not possible) or else the handler code would have to strip off the "N" prefix before converting to int. In that case, the "N" prefix should only be used in the API calls - user facing data-entry forms should not expose it! You can't have a "lookup by any identifier" field where users can enter either id or order number (this would have a non-uniqueness problem anyway.) Instead, you must have separate "lookup by id" and "lookup by order number" options. Then, you should be able to have the order number input handler automatically add the "N" prefix before submitting to the API.
Fundamentally, this is a problem with the database design - if this (using values from both fields as "unique identifiers") was a requirement, then the database fields should have been designed with this in mind (i.e. with non-overlapping ranges) - if you can't change the order number format, then the id format should have been different.

How to realize a context search based on synomyns?

Lets say an internet user searches for "trouble with gmail".
How can I return entries with "problem|problems|issues|issue|trouble|troubles with gmail|googlemail|google mail"?
I don't like to manually add these linkings between different keywords so the links between "issue <> problem <> trouble" and "gmail <> googlemail <> google mail" are completly unknown. They should be found in an automated process.
Approach to solve the problem
I provide a synonyms/thesaurus plattform like thesaurus.com, synonym.com, etc. or use an synomys database/api and use this user generated input for my queries on a third website.
But this won't cover all synonyms like the "gmail"-example.
Which other options do I have? Maybe something based on the given data and logged search phrases of the past?
You have to think of it ignoring the language.
When you show a baby the same thing using two words, he understand that those words are synonym. He might not have understood perfectly, but he will learn when this is repeated.
You type "problem with gmail".
Two choices:
Your search give results: you click on one item.
The system identify that this item was already clicked before when searching for "google mail bug". That's a match, and we will call it a "relative search".
Your search give poor results:
We will search in our history for a matching search:
We propose : "do you mean trouble with yahoo mail? yes/no". You click no, that's a "no match". And we might propose others suggestions like a list of known "relative search" or a list of might be related playing with both full text search in our history and levenshtein distance.
When a term is sufficiently scored to be considered as a "synonym", you can consider it is. Algorithm might be wrong, but in fact it depends on what you really expect.
If i search "sending a message is difficult with google", and "gmail issue", nothing is synonym, but search are relatively the same. This is more important to me than true synonyms.
And if you really want to get the synonym, i would do it in a second phase comparing words inside "relative searches" and would include a manual check.
I think google algorithm use synonym mainly to highlight search terms in page result, but not to do an actual search where they use the relative search terms, except in known situations, as the result for "gmail" and "google mail" are not the same.
But if you identify 10 relative searches for "gmail" which all contains "google mail", that will be a good start point to guess they are synonyms.
This is a bit long for a comment.
What you are looking for is called a "thesaurus" or "synonyms" list in the world of text searching. Apparently, there is a proposal for such functionality in MySQL. It is not yet implemented. (Here is a related question on Stack Overflow, although the link in the question doesn't seem to work.)
The work-around would be to modify queries before sending them to the database. That is, parse the query into words, then look up all the synonyms for those words, and reconstruct the query. This works better for the natural language searches than the boolean searches (which require more careful reconstruction).
Pseudo-code for getting the final word list with synonyms would be something like:
select #finalwords = concat_ws(' ', group_concat(synonyms separator ' ') )
from synonyms s
where find_in_set(s.baseword, #words) > 0;
Seems to me that you have two problems on your hands:
Lemmatisation, which breaks words down into their lemma, sometimes called the headword or root word. This is more difficult than Stemming, as it doesn't just chop suffixes off of words, but tries to find a true root, e.g. "are" => "be". This is something that is often done programatically, although it appears to be a complex task. Here is an online example of text being lemmatized: http://lemmatise.ijs.si/Services
Searching for synonymous lemmas. This is a very complex problem. One approach to this that I have heard of is modifying the lemmatisation engine to return more than one lemma for a given set of words, i.e. "problems" => "problem" and "issue", thereby allowing a more flexible set of results. However, this means that the synonymous lemmas must be provided to the lemmatisation engine from elsewhere. I truly have no idea how you would build a list of synonyms programatically.
So, you may consider a strategy whereby you lemmatise the text to be searched for, then pass each lemma out to your synonym finder (however that works) to get a final list of lemmas to perform your search with.
I think you have bitten off a very large problem for yourself.
If the system in question is a publicly accessible website, one 'out there' option is to ensure all content can be crawled by Google and then use a Google search on your own site, which should give you the synonym capability 'for free'. There would obviously be some vagaries in the results though and lag in getting match results for newly created content, depending upon how regularly the crawlers hit the site. Probably not suitable in your use case, but for some people, this may be sufficient.
Seeing your revised question, what about using a public API?
http://www.programmableweb.com/category/reference/apis?category=20066&keyword=synonym

Banned words checking algo

I am building a text chat system. I want to add the ability to check for banned words/phrases.
The only technique I can think of, and can't believe it could possibly be the best approach is to do a FOR loop through all the words and search for matches in the text. This seems like it would be unbelievably slow once lots of words are added.
I'm using AS3, but an answer in most any language would probably be useful.
take care,
lee
use an AS3 dictionary or a dict in python and just check if the word is in the dict. there is no way I can see to not go over all the words.
Consider concatenating all the entries in your Dictionary into a single RegExp, with which you have to parse the text only once. I've done some testing, and it's going to be way faster than replacing word for word.
function censorWithDictionary ( dict:Dictionary, text:String ) : String {
var reg : String = "";
for (var key:Object in dict)
{
reg += reg=="" ? "" : "|"; // add an "or" for multiple search words
reg += "\\b"+dict[key]+"\\b"; // only whole words
}
var regExp : RegExp = new RegExp ( reg, "gi" );
return text.replace ( regExp, "----" );
}
I had a similar problem - we run a gaming site and wanted to introduce a chat system which was not manually moderated. We went the "banned word" route and it's working really well.
I just counted them and we now have a list of (just) 79 banned words which originated from something I found on-line to which we have added words over time when chat messages crept through.
The way we check things is that we concatenate an entire chat message by removing all spaces and none alpha characters and then search for banned words in what's left.
The key decisions we made are:
Don't tell people why you rejected their messages
Don't let people post chat until you trust them a bit (on our site they have
to have played 3 games)
5 "Bad" messages and we automatically block you
We email a report out daily with all the chat which got through which we scan through
We allow other users to complain about posted messages - if that happens the message is automatically removed so we can check it later.
1+3+5 Hardly ever happen now and it works wonderfully even though - sometimes messages like
"I wish it was hot!"
Are rejected (the clue is the "sh" part of wish and "it") but even that doesn't happen often.
This is more a comment than an answer, but comments are limited in length and there're big issues here.
I believe you are fundamentally asking the wrong question!
Certainly dictionaries and blacklist would highlight words or phrases that you want to ban but would that list be acceptable to users of your system? Would there be text that users of your system find offensive but you do not. Who decides?
For example, would people living here have trouble or indeed people living here. What if you supported this football/soccer team. This person probably never visits the UK.
Then you get into the issue of anagrams and slang. FCUK is a high street brand in the UK (and elsewhere I'm sure). And then there's pr0n (no link!) or NAMBLA.
The real question is - How do I stop people using the system from using language that is generally unacceptable? And that's more a design / social engineering problem than a programming problem. I don't think this site has word / phrase filtering and yet there's nothing here that would cause offense to anyone.
Here's an idea - let your users decide what is acceptable! Use a reputation based system. Allow users to vote up users who behave and vote down users that cause offense (with the option of allowing users to give feedback on the vote to give them a chance to mend their ways) and then have an option to filter out users with low / negative reputations.

Deal with undefined values in code or in the template?

I'm writing a web application (in Python, not that it matters). One of the features is that people can leave comments on things. I have a class for comments, basically like so:
class Comment:
user = ...
# other stuff
where user is an instance of another class,
class User:
name = ...
# other stuff
And of course in my template, I have
<div>${comment.user.name}</div>
Problem: Let's say I allow people to post comments anonymously. In that case comment.user is None (undefined), and of course accessing comment.user.name is going to raise an error. What's the best way to deal with that? I see three possibilities:
Use a conditional in the template to test for that case and display something different. This is the most versatile solution, since I can change the way anonymous comments are displayed to, say, "Posted anonymously" (instead of "Posted by ..."), but I've often been told that templates should be mindless display machines and not include logic like that. Also, other people might wind up writing alternate templates for the same application, and I feel like I should be making things as easy as possible for the template writer.
Implement an accessor method for the user property of a Comment that returns a dummy user object when the real user is undefined. This dummy object would have user.name = 'Anonymous' or something like that and so the template could access it and print its name with no error.
Put an actual record in my database corresponding to a user with user.name = Anonymous (or something like that), and just assign that user to any comment posted when nobody's logged in. I know I've seen some real-world systems that operate this way. (phpBB?)
Is there a prevailing wisdom among people who write these sorts of systems about which of these (or some other solution) is the best? Any pitfalls I should watch out for if I go one way vs. another? Whoever gives the best explanation gets the checkmark.
I'd go with the first option, using an if switch in the template.
Consider the case of localization: You'll possibly have different templates for each language. You can easily localize the "anonymous" case in the template itself.
Also, the data model should have nothing to do with the output side. What would you do in the rest of the code if you wanted to test whether a user has a name or not? Check for == 'Anonymous' each time?
The template should indeed only be concerned with outputting data, but that doesn't mean it has to consist solely of output statements. You usually have some sort of if user is logged in, display "Logout", otherwise display "Register" and "Login" case in the templates. It's almost impossible to avoid these.
Personally, I like for clean code, and agree that templates should not have major logic. So in my implementations I make sure that all values have "safe" default values, typically a blank string, pointer to a base class or equivalent. That allows for two major improvements to the code, first that you don't have to constantly test for null or missing values, and you can output default values without too much logic in your display templates.
So in your situation, making a default pointer to a base value sounds like the best solution.
Your 3rd option: Create a regular User entity that represents an anonymous user.
I'm not a fan of None for database integrity reasons.

How can I program a simple chat bot AI?

I want to build a bot that asks someone a few simple questions and branches based on the answer. I realize parsing meaning from the human responses will be challenging, but how do you setup the program to deal with the "state" of the conversation?
It will be a one-to-one conversation between a human and the bot.
You probably want to look into Markov Chains as the basics for the bot AI. I wrote something a long time ago (the code to which I'm not proud of at all, and needs some mods to run on Python > 1.5) that may be a useful starting place for you: http://sourceforge.net/projects/benzo/
EDIT: Here's a minimal example in Python of a Markov Chain that accepts input from stdin and outputs text based on the probabilities of words succeeding one another in the input. It's optimized for IRC-style chat logs, but running any decent-sized text through it should demonstrate the concepts:
import random, sys
NONWORD = "\n"
STARTKEY = NONWORD, NONWORD
MAXGEN=1000
class MarkovChainer(object):
def __init__(self):
self.state = dict()
def input(self, input):
word1, word2 = STARTKEY
for word3 in input.split():
self.state.setdefault((word1, word2), list()).append(word3)
word1, word2 = word2, word3
self.state.setdefault((word1, word2), list()).append(NONWORD)
def output(self):
output = list()
word1, word2 = STARTKEY
for i in range(MAXGEN):
word3 = random.choice(self.state[(word1,word2)])
if word3 == NONWORD: break
output.append(word3)
word1, word2 = word2, word3
return " ".join(output)
if __name__ == "__main__":
c = MarkovChainer()
c.input(sys.stdin.read())
print c.output()
It's pretty easy from here to plug in persistence and an IRC library and have the basis of the type of bot you're talking about.
Folks have mentioned already that statefulness isn't a big component of typical chatbots:
a pure Markov implementations may express a very loose sort of state if it is growing its lexicon and table in real time—earlier utterances by the human interlocutor may get regurgitated by chance later in the conversation—but the Markov model doesn't have any inherent mechanism for selecting or producing such responses.
a parsing-based bot (e.g. ELIZA) generally attempts to respond to (some of the) semantic content of the most recent input from the user without significant regard for prior exchanges.
That said, you certainly can add some amount of state to a chatbot, regardless of the input-parsing and statement-synthesis model you're using. How to do that depends a lot on what you want to accomplish with your statefulness, and that's not really clear from your question. A couple general ideas, however:
Create a keyword stack. As your human offers input, parse out keywords from their statements/questions and throw those keywords onto a stack of some sort. When your chatbot fails to come up with something compelling to respond to in the most recent input—or, perhaps, just at random, to mix things up—go back to your stack, grab a previous keyword, and use that to seed your next synthesis. For bonus points, have the bot explicitly acknowledge that it's going back to a previous subject, e.g. "Wait, HUMAN, earlier you mentioned foo. [Sentence seeded by foo]".
Build RPG-like dialogue logic into the bot. As your parsing human input, toggle flags for specific conversational prompts or content from the user and conditionally alter what the chatbot can talk about, or how it communicates. For example, a chatbot bristling (or scolding, or laughing) at foul language is fairly common; a chatbot that will get het up, and conditionally remain so until apologized to, would be an interesting stateful variation on this. Switch output to ALL CAPS, throw in confrontational rhetoric or demands or sobbing, etc.
Can you clarify a little what you want the state to help you accomplish?
Imagine a neural network with parsing capabilities in each node or neuron. Depending on rules and parsing results, neurons fire. If certain neurons fire, you get a good idea about topic and semantic of the question and therefore can give a good answer.
Memory is done by keeping topics talked about in a session, adding to the firing for the next question, and therefore guiding the selection process of possible answers at the end.
Keep your rules and patterns in a knowledge base, but compile them into memory at start time, with a neuron per rule. You can engineer synapses using something like listeners or event functions.
I think you can look at the code for Kooky, and IIRC it also uses Markov Chains.
Also check out the kooky quotes, they were featured on Coding Horror not long ago and some are hilarious.
I think to start this project, it would be good to have a database with questions (organized as a tree. In every node one or more questions).
These questions sould be answered with "yes " or "no".
If the bot starts to question, it can start with any question from yuor database of questions marked as a start-question. The answer is the way to the next node in the tree.
Edit: Here is a somple one written in ruby you can start with: rubyBOT
naive chatbot program. No parsing, no cleverness, just a training file and output.
It first trains itself on a text and then later uses the data from that training to generate responses to the interlocutor’s input. The training process creates a dictionary where each key is a word and the value is a list of all the words that follow that word sequentially anywhere in the training text. If a word features more than once in this list then that reflects and it is more likely to be chosen by the bot, no need for probabilistic stuff just do it with a list.
The bot chooses a random word from your input and generates a response by choosing another random word that has been seen to be a successor to its held word. It then repeats the process by finding a successor to that word in turn and carrying on iteratively until it thinks it’s said enough. It reaches that conclusion by stopping at a word that was prior to a punctuation mark in the training text. It then returns to input mode again to let you respond, and so on.
It isn’t very realistic but I hereby challenge anyone to do better in 71 lines of code !! This is a great challenge for any budding Pythonists, and I just wish I could open the challenge to a wider audience than the small number of visitors I get to this blog. To code a bot that is always guaranteed to be grammatical must surely be closer to several hundred lines, I simplified hugely by just trying to think of the simplest rule to give the computer a mere stab at having something to say.
Its responses are rather impressionistic to say the least ! Also you have to put what you say in single quotes.
I used War and Peace for my “corpus” which took a couple of hours for the training run, use a shorter file if you are impatient…
here is the trainer
#lukebot-trainer.py
import pickle
b=open('war&peace.txt')
text=[]
for line in b:
for word in line.split():
text.append (word)
b.close()
textset=list(set(text))
follow={}
for l in range(len(textset)):
working=[]
check=textset[l]
for w in range(len(text)-1):
if check==text[w] and text[w][-1] not in '(),.?!':
working.append(str(text[w+1]))
follow[check]=working
a=open('lexicon-luke','wb')
pickle.dump(follow,a,2)
a.close()
here is the bot
#lukebot.py
import pickle,random
a=open('lexicon-luke','rb')
successorlist=pickle.load(a)
a.close()
def nextword(a):
if a in successorlist:
return random.choice(successorlist[a])
else:
return 'the'
speech=''
while speech!='quit':
speech=raw_input('>')
s=random.choice(speech.split())
response=''
while True:
neword=nextword(s)
response+=' '+neword
s=neword
if neword[-1] in ',?!.':
break
print response
You tend to get an uncanny feeling when it says something that seems partially to make sense.
I would suggest looking at Bayesian probabilities. Then just monitor the chat room for a period of time to create your probability tree.
I'm not sure this is what you're looking for, but there's an old program called ELIZA which could hold a conversation by taking what you said and spitting it back at you after performing some simple textual transformations.
If I remember correctly, many people were convinced that they were "talking" to a real person and had long elaborate conversations with it.
If you're just dabbling, I believe Pidgin allows you to script chat style behavior. Part of the framework probably tacks the state of who sent the message when, and you'd want to keep a log of your bot's internal state for each of the last N messages. Future state decisions could be hardcoded based on inspection of previous states and the content of the most recent few messages. Or you could do something like the Markov chains discussed and use it both for parsing and generating.
If you do not require a learning bot, using AIML (http://www.aiml.net/) will most likely produce the result you want, at least with respect to the bot parsing input and answering based on it.
You would reuse or create "brains" made of XML (in the AIML-format) and parse/run them in a program (parser). There are parsers made in several different languages to choose from, and as far as I can tell the code seems to be open source in most cases.
You can use "ChatterBot", and host it locally using - 'flask-chatterbot-master"
Links:
[ChatterBot Installation]
https://chatterbot.readthedocs.io/en/stable/setup.html
[Host Locally using - flask-chatterbot-master]: https://github.com/chamkank/flask-chatterbot
Cheers,
Ratnakar