I’m trying to use Reinforcement Learning to solve a problem that involves a ton of simultaneous actions. For example, the agent may take a single action, like shooting, or several actions at once, like shooting while jumping while turning right while doing a karate chop, etc. When all the possible action combinations are enumerated, I end up with a huge action array, say 1 x 2000, so my LSTM network's output array will have that size. Of course I’ll use a dictionary to decode the action array and apply the action(s). So my questions are: is that action array too large? Is this the way to handle simultaneous actions? Is there any other way to do this? Feel free to link any concrete examples you have seen around. Thanks.
I have also been trying to do something similar for my problem. You can check out the following papers:
Exploring Multi-Action Relationship in Reinforcement Learning
Imitation Learning with Concurrent Actions in 3D Games
Action Branching Architectures for Deep Reinforcement Learning
StarCraft II: A New Challenge for Reinforcement Learning
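For instance, following the action-branching idea from the papers above, instead of one flat 1 x 2000 output you can give the network a shared trunk and several small heads, one per independent sub-action (shoot, jump, turn, ...). A minimal sketch, assuming PyTorch and made-up observation and branch sizes:

```python
import torch
import torch.nn as nn

class BranchingPolicy(nn.Module):
    """Shared trunk with one small head per sub-action instead of one huge output."""
    def __init__(self, obs_dim=64, branch_sizes=(2, 2, 3, 2)):
        # branch_sizes is illustrative: shoot (yes/no), jump (yes/no),
        # turn (left/right/none), karate chop (yes/no).
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(128, n) for n in branch_sizes)

    def forward(self, obs):
        h = self.trunk(obs)
        # One categorical distribution per branch; the joint action is their tuple.
        return [torch.distributions.Categorical(logits=head(h)) for head in self.heads]

policy = BranchingPolicy()
dists = policy(torch.randn(1, 64))
action = [d.sample().item() for d in dists]  # e.g. [1, 0, 2, 1]
```

The joint action space still covers 2 x 2 x 3 x 2 combinations, but the output layer only needs 2 + 2 + 3 + 2 units instead of one unit per combination, which is the main point of the branching architectures.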
Related
Is there a way to model action masking for continuous action spaces? I want to model economic problems with reinforcement learning. These problems often have continuous action and state spaces. In addition, the state often influences what actions are possible and, thus, the allowed actions change from step to step.
Simple example:
The agent has wealth (continuous state) and decides how much to spend (continuous action). The next period's wealth is then the current wealth minus spending. But the agent is restricted by the budget constraint: it is not allowed to spend more than its wealth. What is the best way to model this?
What I tried:
For discrete actions it is possible to use action masking, so in each time step I provided the agent with information about which actions are allowed and which are not. I also tried to do it with a continuous action space by providing lower and upper bounds on the allowed actions and clipping the actions sampled from the actor network (e.g. DDPG).
I am wondering if this is a valid thing to do (it works in a simple toy model), because I did not find any RL library that implements it. Or is there a smarter way / best practice to convey the information about allowed actions to the agent?
I think you are on the right track. I've looked into masked actions and found two possible approaches: give a negative reward when trying to take an invalid action (without letting the environment evolve), or dive deeper into the neural network code and let the neural network output only valid actions.
I've always considered this last approach the most efficient, and your approach of introducing boundaries seems very similar to it. So as long as this is the type of mask (boundaries) you are looking for, I think you are good to go.
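For what it is worth, here is a minimal sketch of the boundary idea for the wealth/spending example, assuming a DDPG-style actor whose raw output in (0, 1) is rescaled (rather than clipped) to state-dependent bounds, so spending can never exceed wealth by construction:

```python
import torch
import torch.nn as nn

# Toy actor: maps the 1-dimensional state (wealth) to a raw action in (0, 1).
actor = nn.Sequential(
    nn.Linear(1, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)

def spending_action(wealth: torch.Tensor) -> torch.Tensor:
    """Rescale the actor's output so the budget constraint holds by construction."""
    low, high = torch.zeros_like(wealth), wealth        # state-dependent bounds
    raw = actor(wealth.unsqueeze(-1)).squeeze(-1)
    return low + raw * (high - low)                     # spending in [0, wealth]

wealth = torch.tensor([10.0])
print(spending_action(wealth))  # always <= 10.0
```

Clipping the sampled action to the same bounds, as you describe, has the same effect at execution time; the rescaling variant just keeps gradients flowing over the whole allowed range.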
For my task, I am given a series of frames extracted from a Tom and Jerry video. I need to detect the objects in a frame (in my case, the objects are Tom and Jerry) and their locations. Since my dataset is different from the classes in ImageNet, I am stuck with no training data.
I have searched extensively, and there seem to be some tools where I need to manually mark the location of the objects in the images. Is there any way to do this without such manual work?
Any suggestions would be really helpful, thanks a lot!
"Is there any way to do this without such manual work?"
Welcome to the current state of machine learning, driven by data-hungry networks and a lot of labor on the dataset-creation side :) Labels are here and will stay for a while; they tell your network (via the loss function) what you want it to do.
But you are not in that bad a situation at all, because you can go for a pre-trained net and just fine-tune it on your lovely Tom and Jerry (acquiring the training data will take about 1-2 hours by itself).
So what is this fine-tuning and how does it work? Let's say you take a net pre-trained on ImageNet; it performs reasonably well on the classes defined in ImageNet, and it will be your starting point. This network has already learned quite abstract features about all the objects in ImageNet, which is why it is capable of transfer learning from a reasonably small number of new class samples. Now, when you add Tom and Jerry to the network output and fine-tune it on a small amount of data (20-100 samples), it will not perform that badly (I would guess accuracy somewhere in the 65-85% range). So here is what I suggest (a minimal fine-tuning sketch follows the list):
Google for a pre-trained net that is easy to work with. I found this; see chapter 4, Transfer Learning with Your Own Image Dataset.
Pick a labeling tool.
Label 20-100 Toms and Jerries with bounding boxes. For a small dataset like this, split it into ./train (80%) and ./test (20%). Try to capture different poses, different backgrounds, and frames that are distinct from each other. Add some augmentation.
Remove the last network layer and add a layer with 2 new outputs, Tom and Jerry.
Train it (fine-tune it) and check the accuracy on your test set.
Have fun! Train it again with more data.
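For illustration, here is a minimal sketch of the "replace the last layer and fine-tune" steps, assuming PyTorch/torchvision (the tutorial linked above uses its own framework) and treating the problem as 2-class classification for simplicity:

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained backbone and swap the last layer
# for a new 2-way head: Tom and Jerry.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                        # freeze the pretrained features
model.fc = nn.Linear(model.fc.in_features, 2)      # new trainable head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def fine_tune(train_loader, epochs=10):
    """train_loader is assumed to yield (image batch, label batch); labels 0 = Tom, 1 = Jerry."""
    model.train()
    for _ in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```

If you label bounding boxes rather than whole frames, the same idea applies to a pre-trained detector (e.g. replacing the box-predictor head of a torchvision Faster R-CNN with a 2-class one).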
"Is it possible to perform Object Detection without proper training
data?"
It kind of is, but I can't imagine anything simpler than fine-tuning. We can talk here about:
A. Non-machine-learning approaches: classical computer vision with hand-crafted features and manually defined parameters used as a detector. In your case this is probably not the way you want to go; however, some box sliding plus manual color-histogram thresholding may work for Tom and Jerry (the thresholding parameters naturally might themselves need tuning); a rough sketch of this follows the list below. This is quite often more work than the proposed fine-tuning. Sometimes, though, it is a way to label thousands of samples cheaply, correct the labels, and then train more powerful detectors. For numerous tasks this approach is enough, and the benefits are that it is lightweight and fast.
B. Machine-learning approaches that deal with no proper training data, or that deal with a small amount of data, as humans do. This is mainly an emerging field, currently under active R&D, and a few of my favorites are:
fine-tuning pre-trained nets. Hey, we are using this because it is so simple!
one-shot approaches, like triplet-loss+deep-metrics
memory augmented neural networks used in one/few shot context
unsupervised, semi-supervised approaches
bio-plausible nets, including no-backprop approaches with only the last layer tuned via supervision
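To make approach A above a bit more concrete, here is a rough color-thresholding sketch with OpenCV; the HSV range is a placeholder that you would have to tune for Tom's grey-blue fur (and add a second range for Jerry's brown):

```python
import cv2
import numpy as np

# Placeholder HSV range -- this would need manual tuning per character.
TOM_RANGE = (np.array([95, 40, 40]), np.array([125, 255, 255]))

def detect_by_color(frame_bgr, hsv_range, min_area=500):
    """Return bounding boxes of regions whose color falls inside hsv_range."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, hsv_range[0], hsv_range[1])
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > min_area]

frame = cv2.imread("frame_0001.png")        # hypothetical extracted frame
print(detect_by_color(frame, TOM_RANGE))    # list of (x, y, w, h) boxes
```

It will be brittle compared to a fine-tuned network, but it can be good enough to pre-label frames that you then correct by hand.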
Deep Reinforcement Learning can be very useful for real-world problems that have a highly dynamic nature; a few examples are finance, healthcare, etc. But with these kinds of problems it is hard to have a simulated environment. So what are the possible things to do?
Let me first comment on a couple of concepts, trying to give you future research directions according to your comments:
Probably the term "forecast" is not suitable to describe the kind of problems solved by Reinforcement Learning. In some sense, RL needs to do an internal forecast process to choose the best actions in the long term. But the problem solved is an agent choosing actions in an environment. So, if your problem is a forecast problem, maybe other techniques are more suitable than RL.
Between tabular methods and deep Q-learning there exist many other methods that may be more suitable for your problem. They are probably less powerful but easier to use (more stable, less parameter tuning, etc.). You can combine Q-learning with other function approximators (simpler than a deep neural network); a small sketch of this follows these comments. In general, the best choice is the simplest one able to solve the problem.
I don't know how to simulate the problem of human activities with first-person vision. In fact, I don't fully understand the problem setup.
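As an illustration of the point about simpler function approximators, here is a minimal sketch of semi-gradient Q-learning with a linear approximator; the feature and action sizes are made up for the example:

```python
import numpy as np

n_features, n_actions = 8, 4            # illustrative sizes
w = np.zeros((n_actions, n_features))   # one linear weight vector per action
alpha, gamma = 0.01, 0.99

def q(s_feat, a):
    return w[a] @ s_feat

def update(s_feat, a, r, s_next_feat, done):
    """One semi-gradient Q-learning update with linear function approximation."""
    target = r if done else r + gamma * max(q(s_next_feat, b) for b in range(n_actions))
    td_error = target - q(s_feat, a)
    w[a] += alpha * td_error * s_feat   # the gradient of a linear Q is just the feature vector
```

Compared with a deep net, there is essentially nothing to tune beyond the step size and the features themselves.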
And regarding the original question of applying RL without access to a simulated environment, as I previously said in the comments, if you have enough data you could probably apply an RL algorithm. I'm assuming that you can store data from your environment but cannot easily interact with it. This is typical, for example, in medical domains, where there is a lot of data about [patient status, treatment, next patient status] but you cannot interact with patients by applying random treatments. In this situation, there are some facts to take into account:
RL methods generally consume a very large quantity of data. This is especially true when they are combined with deep nets. How much data is necessary depends entirely on the problem, but be ready to store millions of tuples [state, action, next state] if your environment is complex.
The stored tuples should be collected with a policy that contains some exploratory actions. The RL algorithm will try to find the best possible actions among the ones contained in the data. If the agent can interact with the environment, it should choose exploratory actions to find the best one; similarly, if the agent cannot interact and the data is instead gathered in advance, this data should also contain exploratory actions. The papers "Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method" and "Tree-Based Batch Mode Reinforcement Learning" could be helpful for understanding these concepts.
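To make the batch setting concrete, here is a minimal sketch of fitted Q iteration over a fixed dataset of stored transitions, assuming scikit-learn, a small discrete action set, and illustrative array shapes:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

# Assumed offline dataset (no interaction with the environment):
# states: (N, d), actions: (N,) integer ids, rewards: (N,), next_states: (N, d)
gamma, n_actions, n_iterations = 0.95, 4, 20

def fitted_q_iteration(states, actions, rewards, next_states):
    X = np.column_stack([states, actions])   # regress Q on (state, action) pairs
    y = rewards.copy()                       # first iteration: Q is the immediate reward
    for _ in range(n_iterations):
        model = ExtraTreesRegressor(n_estimators=50).fit(X, y)
        # Bootstrap new targets r + gamma * max_a' Q(s', a') from the previous model.
        q_next = np.column_stack([
            model.predict(np.column_stack([next_states,
                                           np.full(len(next_states), a)]))
            for a in range(n_actions)
        ])
        y = rewards + gamma * q_next.max(axis=1)
    return model
```

Terminal states and other practical details are omitted; the point is only that the whole procedure runs on stored tuples, never on the real environment.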
I'm using joint positions from a Kinect camera as my state space, but I think it's going to be too large (25 joints x 30 frames per second) to just feed into SARSA or Q-learning.
Right now I'm using the Kinect Gesture Builder program, which uses supervised learning to associate user movement with specific gestures. But that requires supervised training, which I'd like to move away from. I figure the algorithm might pick up certain associations between joints that I would otherwise have to define myself when classifying the data (hands up, step left, step right, for example).
I think feeding that data into a deep neural network and then passing that into a reinforcement learning algorithm might give me a better result.
There was a paper on this recently. https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
I know Accord.net has both deep neural networks and RL but has anyone combined them together? Any insights?
If I understand correctly from your question + comment, what you want is to have an agent that performs discrete actions using a visual input (raw pixels from a camera). This looks exactly like what the DeepMind guys recently did, extending the paper you mentioned. Have a look at this; it is the newer (and better) version of playing Atari games. They also provide an official implementation, which you can download here.
There is even an implementation in Neon which works pretty well.
Finally, if you want to use continuous actions, you might be interested in this very recent paper.
To recap: yes, somebody has combined DNNs + RL, it works, and if you want to use raw camera data to train an agent with RL, this is definitely one way to go :)
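Whether the input is raw pixels or your joint positions, the Q-network itself can stay small. A minimal sketch, assuming PyTorch, a flattened 25-joints-x-3-coordinates state vector, and a made-up number of discrete actions:

```python
import torch
import torch.nn as nn

state_dim, n_actions = 25 * 3, 6   # illustrative: 25 joints x 3 coordinates, 6 actions

q_net = nn.Sequential(
    nn.Linear(state_dim, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, n_actions),     # one Q-value per discrete action
)

def act(state: torch.Tensor, eps: float = 0.1) -> int:
    """Epsilon-greedy action selection over the Q-network's outputs."""
    if torch.rand(1).item() < eps:
        return int(torch.randint(n_actions, (1,)).item())
    with torch.no_grad():
        return int(q_net(state.view(1, -1)).argmax(dim=1).item())
```

Much of the DQN paper is about how such a network is trained (experience replay, target network); for joint positions rather than pixels you can skip the convolutional layers entirely.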
I'm interested in writing software that uses machine learning and performs certain actions based on external data.
However, I've run into a problem (one that has always interested me): how is it possible to write machine-learning software that issues orders or sequences of orders?
The problem is that, as I understand it, a neural network gets a bunch of inputs and "recalls" an output based on the results of previous training. Instantly (well, more or less). So I'm not sure how "issuing orders" could fit into that system, especially when the actions performed by the system affect the system itself with a certain delay. I'm also a bit unsure how it is possible to train such a thing.
Examples of such system:
1. First-person shooter enemy controller. As I understand it, it is possible to implement a neural-network controller for the bot that switches between behavior strategies (well, assigns priorities to them) based on some inputs (probably something like health, ammo, etc.). But I don't see a way to make a higher-order controller that could issue a sequence of commands like "go there, then turn left". Also, the bot's actions will affect the variables that control the bot's behavior, i.e. shooting reduces ammo, falling from heights reduces health, etc.
2. Automated market trader. It is certainly possible to make a system that tries to predict the next market price of something. However, I don't see how it is possible to make a system that would issue an order to buy something, watch the trend, and then sell it back to gain a profit / cover losses.
3. Car driver. Again (as I understand it), it is possible to make a system that maintains a desired movement vector based on position/velocity/torque data and the results of previous training. However, I don't see a way to make such a system (learn to) perform a sequence of actions.
I.e., as I understand it, a neural net is technically a matrix: you give it an input, it produces an output. But what about generating sequences of actions that can change the environment the program operates in?
If such tasks are not entirely suitable for neural networks, what else could be used?
P.S. I understand that the question isn't exactly clear, and I suspect that I'm missing some knowledge. So I'll appreciate some pointers (i.e. books/resources to read, etc).
You could try to connect the output neurons to controllers directly, e.g. moving forward, turning, or shooting in the first-person shooter, or buy orders for the trader. However, I think the best results are obtained nowadays when you let the neural net solve one rather specific subproblem and then let a "normal" program interpret its answer. For example, you could let the neural net construct a map overlay of "where do I want to be", which the bot then translates into movements. The neural network for the trader could produce "how much do I want which paper", which the bot then translates into buy or sell orders.
The decision which subproblem should be solved by a neural network is a very central one for its design. The important thing is that good solutions can be taught to the neural network.
Edit: To expand on this with the examples: when the first-person shooter bot gets shot, it should not have wanted to be there; when it gets to shoot someone else, it should have wanted to be there more. When the trader loses money on a paper, it should have wanted it less before; if it gains, it should have wanted it more. These things can be taught.
The problem you are describing is known as Reinforcement Learning. Reinforcement learning is essentially a machine learning algorithm (such as a neural network) coupled with a controller. It has been used for all of the applications you mention, even to drive real cars.
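A tiny tabular Q-learning sketch may make the idea concrete: the agent learns which order to issue in each state purely from (possibly delayed) reward signals, with no hand-written sequence of commands. The env object and its reset()/step() interface are assumptions for illustration, with small discrete state and action spaces:

```python
import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1

def train(env, episodes=500):
    """Learn a state -> action table from delayed rewards by trial and error."""
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy: mostly exploit the current estimate, sometimes explore.
            if np.random.rand() < eps:
                action = np.random.randint(n_actions)
            else:
                action = int(Q[state].argmax())
            next_state, reward, done = env.step(action)
            # The update propagates delayed rewards back to the earlier decisions.
            best_next = 0.0 if done else Q[next_state].max()
            Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])
            state = next_state
    return Q
```

The learned table (or, for large state spaces, a neural network in its place) is exactly the "higher-order controller" from the question: given the current situation, it outputs the next order, and the sequence of orders emerges from repeatedly consulting it.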