Reinforcement learning with pair of actions - reinforcement-learning

I'm learning the reinforcement learning in python and followed some training, and most of them dealing with simple actions (like up, down, right, or left), so basically one action at a time.
In my project I have actions in different ways: It has a pair of actions, means an action in addition to an offset been taken within this action...like (action-type, offset-been-taken).
Action types for example are: u1_set,u1_clear,u2_set,u2_clear,u3_set,u3_clear.
And on each action, there is attenuation offset associated with this implemented action (offset like -1,-0.5,0,+0.5,+1), so as example of some pair of actionswill be like (u2_set, +1), (u2_clear, -0.5),...etc.
Wondering what will be the best way to implement the reinforcement learning in this situation (pair of actions and offset) and if there is a good example available online to share.
Thanks in advance.

By far the easiest approach will be to simply treat every possible pair of "sub-actions" as a single complete action. So, in your example, every action is a pair (U, Offset), where U is one of {u1_set, u1_clear, u2_set, u2_clear, u3_est, u3_clear}, and Offset is one of {-1, -0.5, 0, +0.5, +1}. With this example, there would be a total of 6 x 5 = 30 possible pairs, so 30 different actions. That should be perfectly fine for most RL approaches.
If you move on to more complex situations (too many possible pairs), you could start considering more complex solutions as well. For example, you could treat the problem of selection an action-type as a first RL problem, and then the problem of selecting an offset as an additional, separate RL problem (possibly with an enhanced state representation that also contains the already-selected action type).
Or, if you were to move on to Reinforcement Learning with Neural Networks, you could simply have two separate "heads" as output layers, both connected to otherwise the same architecture.
I suspect those last two paragraphs may be unnecessarily complex, especially if you've only just started learning RL, and the first paragraph may be just fine.

Related

Deep reinforcement learning for similar observations but need totally different actions, how to solve it?

For DRL using neural networks, like DQN, if there is a task that needs total different actions at similar observations, is NN going to show its weakness at this moment? Will two near input to the NN generate similar output? If so, it cannot get the different the task need?
For instance:
the agent can choose discrete action from [A,B,C,D,E], here is the observation by a set of plugs in a binary list [0,0,0,0,0,0,0].
For observation [1,1,1,1,1,1,1] and [1,1,1,1,1,1,0] they are quite similar but if the agent should conduct action A at [1,1,1,1,1,1,1] but action D at [1,1,1,1,1,1,0]. Those two observation are too closed on the distance so the DQN may not easily get the proper action? How to solve?
One more thing:
One hot encoding is a way to improve the distance between observations. It is also a common and useful way for many supervised learning tasks. But one hot will also increase the dimension heavily.
Will two near input to the NN generate similar output ?
Artificial neural networks, by nature, are non-linear function approximators. Meaning that for two given similar inputs, the output can be very different.
You might get an intuition on it considering this example, two very similar pictures (the one on the right just has some light noise added to it) give very different results for the model.
For observation [1,1,1,1,1,1,1] and [1,1,1,1,1,1,0] they are quite similar but if the agent should conduct action A at [1,1,1,1,1,1,1] but action D at [1,1,1,1,1,1,0]. Those two observation are too closed on the distance so the DQN may not easily get the proper action ?
I see no problem with this example, a properly trained NN should be able to map the desired action for both inputs. Furthermore, in your example the input vectors contain binary values, a single difference in these vectors (meaning that they have a Hamming distance of 1) is big enough for the neural net to classify them properly.
Also, the non-linearity in neural networks comes from the activation functions, hope this helps !

What Algo to use to classify my data to 3 classes

I'm looking for a way to differentiate between 3 classes(classification problem) for each OBJECT to classify.
I have a large dataset(millions of lines). There are 2 features, each have 100 values(scaled to 0-1).
Each line refers to one sample of a specific Object(Object_id, 100 columns of my first feature, 100 of my second feature).
Each object(that has to be classified to either 3 classes) have at least 100 samples(1 sample is 1 line)
Unfortunately Classe 3 counts only 1/10 compared to 1 and 2(each object of classe 3 have around 500 samples, however classe 1 and 2 objects have around 2000 and more).
In order to do the classification, I need to take a bach of samples for each object(for exmaple 20, 50, or 100).
I dont know what algo suites better for my case, I'm new to deep learning so bear with me please
Let's break this down to two main questions: how to handle unbalanced datasets and which model to use.
Unbalanced datasets
Most machine learning algorithms are sensitive to some degree on unbalanced datasets. This is a huge challenge for Machine Learning in fields like medical diagnostics or seismology, where you have 98% "normal" readings and 2% "event" readings. There is no silver bullet to this problem. Some algorithms are more resilient to an unbalanced dataset, and some that deliberately unbalance their datasets to encourage a strong model (see bagging), and there are options to augment your data by introducing cloned data with statistical noise. However, your easiest and most effective approach is to decimate your dataset to make it balanced.
You have a class split of 2000|2000|500 datapoints. Randomly sample 500 datapoints from each of the first two classes so you have a balanced 500|500|500 dataset. It is important to randomly sample, instead of simply taking the first 500 as you want a representative sample of the class population. see the numpy.random module for how to select your datapoints.
Model selection
Although Deep Learning is portrayed as the be-all and end-all for machine learning, it represents a significant amount of time and cost to prepare, train and monitor. A typical approach to any new problem is to try some "baseline" shallow learning models. Often you'll see the following scenarios:
Your baseline models fail to train.
Your baseline model trains and fits moderately
Your baseline model trains and fits closely
In the first scenario, your deep learning model is unlikely to train either. In the third scenario there is no need to build a deep learning model when a simpler algorithm can solve it. Scenario 2 is your candidate fro deep learning.
So what models could you use?
Well, we know that it's a supervised problem, that we have a good number of samples, and that we are looking to classify. Your best bet for this kind of question is a Random Forests model. There is a good simple implementation in scikit-learn and hundreds of tutorials.
Alternatively, if you're looking at class fit through clustering, K-means ++ models (and co), or even Gaussian Mixture Models are a good place to start (again, see scikit learn's sklearn.clustering and sklearn.mixture)
If it fits well, then your work is done. If it fits moderately, think about deep learning. If it fails to fit, get add more features (and more diverse features) to your dataset.

Which reinforcement learning algorithm is applicable to a problem with a continuously variable reward and no intermediate rewards?

I think the title says it. A "game" takes a number of moves to complete, at which point a total score is computed. The goal is to maximize this score, and there are no rewards provided for specific moves during the game. Is there an existing algorithm that is geared toward this type of problem?
EDIT: By "continuously variable" reward, I mean it is a floating point number, not a win/loss binary. So you can't, for example, respond to "winning" by reinforcing the moves made to get there. All you have is a number. You can rank different runs in order of preference, but a single result is not especially meaningful.
First of all, in my opinion, the title of your question seems a little confusing when you talk about "continuously variable reward". Maybe you could clarify this aspect.
On the other hand, without taking into account the previous point, it looks your are talking about the temporal credit-assigment problem: How do you distribute credit for a sequence of actions which only obtain a reward (positive or negative) at the end of the sequence?
E.g., a Tic-tac-toe game where the agent doesn't recive any reward until the game ends. In this case, almost any RL algorithm tries to solve the temporal credit-assigment problem. See, for example, Section 1.5 of Sutton and Barto RL book, where they explain the working principles of RL and its advantages over other approaches using as example a Tic-tac-toe game.

Methods for automated synonym detection

I am currently working on a neural network based approach to short document classification, and since the corpuses I am working with are usually around ten words, the standard statistical document classification methods are of limited use. Due to this fact I am attempting to implement some form of automated synonym detection for the matches provided in the training. My question more specifically is about resolving a situation as follows:
Say I have classifications of "Involving Food", and one of "Involving Spheres" and a data set as follows:
"Eating Apples"(Food);"Eating Marbles"(Spheres); "Eating Oranges"(Food, Spheres);
"Throwing Baseballs(Spheres)";"Throwing Apples(Food)";"Throwing Balls(Spheres)";
"Spinning Apples"(Food);"Spinning Baseballs";
I am looking for an incremental method that would move towards the following linkages:
Eating --> Food
Apples --> Food
Marbles --> Spheres
Oranges --> Food, Spheres
Throwing --> Spheres
Baseballs --> Spheres
Balls --> Spheres
Spinning --> Neutral
Involving --> Neutral
I do realize that in this specific case these might be slightly suspect matches, but it illustrates the problems I am having. My general thoughts were that if I incremented a word for appearing opposite the words in a category, but in that case I would end up incidentally linking everything to the word "Involving", I then thought that I would simply decrement a word for appearing in conjunction with multiple synonyms, or with non-synonyms, but then I would lose the link between "Eating" and "Food". Does anyone have any clue as to how I would put together an algorithm that would move me in the directions indicated above?
There is an unsupervized boot-strapping approach that was explained to me to do this.
There are different ways of applying this approach, and variants, but here's a simplified version.
Concept:
Start by a assuming that if two words are synonyms, then in your corpus they will appear in similar settings. (eating grapes, eating sandwich, etc.)
(In this variant I will use co-occurence as the setting).
Boot-Strapping Algorithm:
We have two lists,
one list will contain the words that co-occur with food items
one list will contain the words that are food items
Supervized Part
Start by seeding one of the lists, for instance I might write the word Apple on the food items list.
Now let the computer take over.
Unsupervized Parts
It will first find all words in the corpus that appear just before Apple, and sort them in order of most occuring.
Take the top two (or however many you want) and add them into the co-occur with food items list. For example, perhaps "eating" and "Delicious" are the top two.
Now use that list to find the next two top food words by ranking the words that appear to the right of each word in the list.
Continue this process expanding each list until you are happy with the results.
Once that's done
(you may need to manually remove some things from the lists as you go which are clearly wrong.)
Variants
This procedure can be made quite effective if you take into account the grammatical setting of the keywords.
Subj ate NounPhrase
NounPhrase are/is Moldy
The workers harvested the Apples.
subj verb Apples
That might imply harvested is an important verb for distinguishing foods.
Then look for other occurrences of subj harvested nounPhrase
You can expand this process to move words into categories, instead of a single category at each step.
My Source
This approach was used in a system developed at the University of Utah a few years back which was successful at compiling a decent list of weapon words, victim words, and place words by just looking at news articles.
An interesting approach, and had good results.
Not a neural network approach, but an intriguing methodology.
Edit:
the system at the University of Utah was called AutoSlog-TS, and a short slide about it can be seen here towards the end of the presentation. And a link to a paper about it here
You could try LDA which is unsupervised. There is a supervised version of LDA but I can't remember the name! Stanford parser will have the algorithm which you can play around with. I understand it's not the NN approach you are looking for. But if you are just looking to group information together LDA would seem appropriate, especially if you are looking for 'topics'
The code here (http://ronan.collobert.com/senna/) implements a neural network to perform a variety on NLP tasks. The page also links to a paper that describes one of the most successful approaches so far of applying convolutional neural nets to NLP tasks.
It is possible to modify their code to use the trained networks that they provide to classify sentences, but this may take more work than you were hoping for, and it can be tricky to correctly train neural networks.
I had a lot of success using a similar technique to classify biological sequences, but, in contrast to English language sentences, my sequences had only 20 possible symbols per position rather than 50-100k.
One interesting feature of their network that may be useful to you is their word embeddings. Word embeddings map individual words (each can be considered an indicator vector of length 100k) to real valued vectors of length 50. Euclidean distance between the embedded vectors should reflect semantic distance between words, so this could help you detect synonyms.
For a simpler approach WordNet (http://wordnet.princeton.edu/) provides lists of synonyms, but I have never used this myself.
I'm not sure if I misunderstand your question. Do you require the system to be able to reason based on your input data alone, or would it be acceptable to refer to an external dictionary?
If it is acceptable, I would recommend you to take a look at http://wordnet.princeton.edu/ which is a database of English word relationships. (It also exists for a few other languges.) These relationships include synonyms, antonyms, hyperonyms (which is what you really seem to be looking for, rather than synonyms), hyponyms, etc.
The hyperonym / hyponym relationship links more generic terms to more specific ones. The words "banana" and "orange" are hyponyms of "fruit"; it is a hyperonym of both. http://en.wikipedia.org/wiki/Hyponymy Of course, "orange" is ambiguous, and is also a hyponym of "color".
You asked for a method, but I can only point you to data. Even if this turns out to be useful, you will obviously need quite a bit of work to use it for your particular application. For one thing, how do you know when you have reached a suitable level of abstraction? Unless your input is hevily normalized, you will have a mix of generic and specific terms. Do you stop at "citrus","fruit", "plant", "animate", "concrete", or "noun"? (Sorry, just made up this particular hierarchy.) Still, hope this helps.

Algorithm for online approximation of a slowly-changing, real valued function

I'm tackling an interesting machine learning problem and would love to hear if anyone knows a good algorithm to deal with the following:
The algorithm must learn to approximate a function of N inputs and M outputs
N is quite large, e.g. 1,000-10,000
M is quite small, e.g. 5-10
All inputs and outputs are floating point values, could be positive or negative, likely to be relatively small in absolute value but no absolute guarantees on bounds
Each time period I get N inputs and need to predict the M outputs, at the end of the time period the actual values for the M outputs are provided (i.e. this is a supervised learning situation where learning needs to take place online)
The underlying function is non-linear, but not too nasty (e.g. I expect it will be smooth and continuous over most of the input space)
There will be a small amount of noise in the function, but signal/noise is likely to be good - I expect the N inputs will expain 95%+ of the output values
The underlying function is slowly changing over time - unlikely to change drastically in a single time period but is likely to shift slightly over the 1000s of time periods range
There is no hidden state to worry about (other than the changing function), i.e. all the information required is in the N inputs
I'm currently thinking some kind of back-propagation neural network with lots of hidden nodes might work - but is that really the best approach for this situation and will it handle the changing function?
With your number of inputs and outputs, I'd also go for a neural network, it should do a good approximation. The slight change is good for a back-propagation technique, it should not have to 'de-learn' stuff.
I think stochastic gradient descent (http://en.wikipedia.org/wiki/Stochastic_gradient_descent) would be a straight forward first step, it will probably work nicely given the operating conditions you have.
I'd also go for an ANN. Single layer might do fine since your input space is large. You might wanna give it a shot before adding a lot of hidden layers.
#mikera What is it going to be used for? Is it an assignment in a ML course?