avoiding illegal states in openai gym - reinforcement-learning

I'm trying to make a gym environment for a simulation problem. In my gym environment, I have a set of illegal states which I don't want my agent to go into. What is the easiest way to add such logic to my environment? Should I use the wrapper classes? I didn't quite get them. I tried to extend the MultiDiscrete space by inheriting from it and overriding MultiDiscrete.sample to stop the environment from going into the illegal states, but is there a more efficient way to do it?
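For concreteness, the subclassing approach described above might look roughly like this (a sketch; MaskedMultiDiscrete and the is_legal predicate are hypothetical names, not part of gym):

from gym.spaces import MultiDiscrete

# Sketch: subclass MultiDiscrete and re-sample until the draw is legal.
# is_legal is a hypothetical predicate supplied by your environment.
class MaskedMultiDiscrete(MultiDiscrete):
    def __init__(self, nvec, is_legal):
        super().__init__(nvec)
        self.is_legal = is_legal

    def sample(self):
        while True:
            candidate = super().sample()
            if self.is_legal(candidate):
                return candidate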

I had a similar problem where I needed to make a gym environment with a sort of pool in the center of a grid world which I didn't want the agent to enter.
I represented the grid world as a matrix, and the pool had different depths the agent could fall into, so the values at those locations were negative, proportional to the depth of the puddle.
During training, this negative reward kept the agent from falling into the puddle.
The code for the above environment is here and its usage is here
Hope this helps.
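A minimal sketch of that reward-shaping idea (hypothetical grid, depths, and class name; not the linked code):

import gym
import numpy as np
from gym import spaces

class PuddleGridEnv(gym.Env):
    def __init__(self):
        # 0 = dry land; positive entries are puddle depths.
        self.depth = np.array([[0, 0, 0, 0],
                               [0, 1, 2, 0],
                               [0, 2, 3, 0],
                               [0, 0, 0, 0]])
        self.observation_space = spaces.MultiDiscrete(self.depth.shape)
        self.action_space = spaces.Discrete(4)  # up, down, left, right
        self.pos = np.array([0, 0])

    def reset(self):
        self.pos = np.array([0, 0])
        return self.pos.copy()

    def step(self, action):
        moves = np.array([[-1, 0], [1, 0], [0, -1], [0, 1]])
        limits = np.array(self.depth.shape) - 1
        self.pos = np.clip(self.pos + moves[action], 0, limits)
        depth = self.depth[tuple(self.pos)]
        # Penalty proportional to the puddle depth at the new cell.
        reward = -float(depth)
        # Ending the episode on a fall is an assumption; one could also
        # just penalize and let the episode continue.
        done = bool(depth > 0)
        return self.pos.copy(), reward, done, {}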

Related

Can a reinforcement learning algorithm implemented in an RL library for continuous spaces be used for a discrete space by rounding off?

Can we use RL algorithms implemented for continuous action spaces in discrete-action-space environments by simply mapping (or rounding off) the agent's action in the continuous range (of the gym environment) to discrete actions in the OpenAI gym env?
Yes, it works both ways: for an environment E with action space A_E you can define a wrapper W such that W(E) has an action space A_W of your choice, and the wrapper simply translates actions between the two. Is it the most efficient approach? Probably not; exploiting any known structure of the problem usually brings better results.
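A minimal sketch of such a wrapper, assuming a Box (continuous) action space underneath; DiscretizedActionWrapper and bins_per_dim are illustrative names, not part of gym:

import gym
import numpy as np

class DiscretizedActionWrapper(gym.ActionWrapper):
    def __init__(self, env, bins_per_dim=5):
        super().__init__(env)
        low, high = env.action_space.low, env.action_space.high
        # Pre-compute a small grid of candidate continuous actions.
        self._actions = np.linspace(low, high, bins_per_dim)
        self.action_space = gym.spaces.Discrete(len(self._actions))

    def action(self, act):
        # Translate the discrete index chosen by the agent into the
        # underlying continuous action.
        return self._actions[act]

# Usage: a discrete-action agent can now drive a continuous-action env
# (the environment id may vary with your gym version).
env = DiscretizedActionWrapper(gym.make("Pendulum-v1"))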

Best practice for setting Drake's simulator to fixed-step integration when used with reinforcement learning?

I'm using Drake for some model-free reinforcement learning and I noticed that Drake uses non-fixed-step integration when simulating an update. This makes sense for the sake of integrating multiple times over a smaller duration when the accelerations of a body are large, but in the case of reinforcement learning this results in significant compute overhead and slow rollouts. I was wondering if there is a principled way to make the simulation environment operate in a fixed-timestep integration mode, beyond the method I'm currently using (code below). I'm using the PyDrake bindings, and PPO as the RL algorithm.
integrator = simulator.get_mutable_integrator()
integrator.set_fixed_step_mode(True)
One way to change the integrator that is used for continuous-time dynamics is to call ResetIntegratorFromFlags. For example, to use the RungeKutta2Integrator you would call:
ResetIntegratorFromFlags(simulator=simulator, scheme="runge_kutta2", max_step_size=0.01)
The other thing to keep in mind is whether the System(s) you are simulating use continuous- or discrete-time dynamics, and whether that is configurable in those particular System(s). If there are no continuous-time dynamics being simulated, then the choice of integrator does not matter. Only the update period(s) of the discrete systems will matter.
In particular, if you are simulating a MultibodyPlant, it takes a time_step argument in its constructor. When zero, it will use continuous-time dynamics; when greater than zero, it will use discrete-time dynamics.
When I've used Drake for RL, I've almost always put the MultibodyPlant into discrete mode. Starting with time_step=0.001 is usually a safe choice. You might be able to use a larger step depending on the bodies and properties in the scene.
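For reference, a minimal PyDrake sketch of constructing the plant in discrete mode, assuming the usual DiagramBuilder setup:

from pydrake.multibody.plant import AddMultibodyPlantSceneGraph
from pydrake.systems.framework import DiagramBuilder

builder = DiagramBuilder()
# time_step > 0 selects discrete-time dynamics for the plant;
# time_step = 0 would select continuous-time dynamics instead.
plant, scene_graph = AddMultibodyPlantSceneGraph(builder, time_step=0.001)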
I agree with @jwnimmer-tri -- I suspect that for your use case you want to put the MultibodyPlant into discrete mode by specifying the time_step in the constructor.
And to the higher-level question -- I do think it is better to use fixed-step integration in RL. The variable-step integration is more accurate for any one rollout, but could introduce (small) artificial non-smoothness as your algorithm changes the parameters or initial conditions.

Invalid moves in reinforcement learning

I have implemented a custom OpenAI gym environment for a game similar to http://curvefever.io/, but with discrete actions instead of continuous ones. So at each step my agent can go in one of four directions: left/up/right/down. However, one of these actions will always lead to the agent crashing into itself, since it can't "reverse".
Currently I just let the agent take any move and let it die if it makes an invalid move, hoping that it will eventually learn not to take that action in that state. I have, however, read that one can set the probability of an illegal move to zero and then sample an action. Is there any other way to tackle this problem?
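For reference, the masking idea mentioned above might look roughly like this (a sketch with hypothetical names; the logits come from whatever policy network you use):

import numpy as np

# Set the logits of invalid actions to -inf before softmax, so their
# sampling probability becomes exactly zero.
def masked_sample(logits, valid_mask, rng=np.random.default_rng()):
    masked = np.where(valid_mask, logits, -np.inf)  # invalid -> -inf
    probs = np.exp(masked - masked.max())           # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

# Example: four directions, where "reverse" (index 3) would crash the agent.
action = masked_sample(np.array([0.2, 1.5, -0.3, 2.0]),
                       np.array([True, True, True, False]))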
You can try to solve this with two changes:
1: Give the current direction as an input, and give a reward of maybe +0.1 when the agent takes a move that does not make it crash, and -0.7 when it makes the backward move that directly makes it crash.
2: If you are using a neural network with a Softmax activation on the last layer, multiply all outputs of the network by a positive integer (a confidence factor) before passing them to the Softmax. It can be in the range 0 to 100; in my experience, going above 100 does not change much. The larger the integer, the more confident the agent will be in its action for a given state.
If you are not using a neural network, or deep learning in general, I suggest you learn the concepts of deep learning, as your game environment seems complex and a neural network will give the best results.
Note: it will take a huge amount of time, so you have to wait long enough for the algorithm to train. I suggest you don't hurry and let it train. Also, I played the game; it's really interesting :) my best wishes for making an AI for the game :)

Openai gym environment for multi-agent games

Is it possible to use OpenAI's gym environments for multi-agent games? Specifically, I would like to model a card game with four players (agents). The player scoring a turn starts the next turn. How would I model the necessary coordination between the players (e.g. whose turn it is next)? Ultimately, I would like to use reinforcement learning on four agents that play against each other.
Yes, it is possible to use OpenAI gym environments for multi-agent games. Although there is no standardized interface for multi-agent environments in the OpenAI gym community, it is easy enough to build a gym environment that supports this. For instance, in OpenAI's recent work on multi-agent particle environments, they make a multi-agent environment that inherits from gym.Env and takes the following form:
class MultiAgentEnv(gym.Env):
    def step(self, action_n):
        obs_n = list()
        reward_n = list()
        done_n = list()
        info_n = {'n': []}
        # ...
        return obs_n, reward_n, done_n, info_n
We can see that the step function takes a list of actions (one for each agent) and returns a list of observations, a list of rewards, and a list of dones, while stepping the environment forward. This interface is representative of a Markov Game, in which all agents take actions at the same time and each observes its own subsequent observation and reward.
However, this kind of Markov Game interface may not be suitable for all multi-agent environments. In particular, turn-based games (such as card games) might be better cast as an alternating Markov Game, in which agents take turns (i.e. actions) one at a time. For this kind of environment, you may need to include which agent's turn it is in the representation of state; your step function would then take a single action and return a single observation, reward, and done.
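A rough sketch of such an alternating interface, under the assumption of a simple trick-taking card game (all class, method, and field names here are hypothetical):

import gym
import numpy as np
from gym import spaces

class TurnBasedCardEnv(gym.Env):
    def __init__(self, n_players=4, n_cards=52):
        self.n_players = n_players
        self.n_cards = n_cards
        self.action_space = spaces.Discrete(n_cards)  # one action per card
        self.observation_space = spaces.Dict({
            "hand": spaces.MultiBinary(n_cards),
            "current_player": spaces.Discrete(n_players),
        })
        self.reset()

    def reset(self):
        self.current_player = 0
        # Placeholder: real dealing logic omitted.
        self.hands = np.zeros((self.n_players, self.n_cards), dtype=np.int8)
        return self._observe()

    def step(self, action):
        # Apply the single action of the player whose turn it is.
        self.hands[self.current_player, action] = 0  # mark the card as played
        reward = 0.0   # placeholder: score the trick here
        done = False   # placeholder: end of the game
        # Advance the turn; real game logic would let the scorer lead next.
        self.current_player = (self.current_player + 1) % self.n_players
        return self._observe(), reward, done, {"current_player": self.current_player}

    def _observe(self):
        return {"hand": self.hands[self.current_player],
                "current_player": self.current_player}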
A multi-agent deep deterministic policy gradient (MADDPG) approach has also been implemented by the OpenAI team.
This is the repo to get started.
https://github.com/openai/multiagent-particle-envs
What you are looking for is PettingZoo: it's a set of multi-agent environments with a specific class/syntax for handling the multi-agent setting.
It's an interesting library because you can also use it with Ray/RLlib to apply already implemented algorithms like PPO or Q-learning, as in this example.
RLlib also has an implementation for multi-agent environments, but you will have to dig deeper into the documentation to understand it.
There is a specific multi-agent environment for reinforcement learning here. It supports any number of agents written in any programming language. An example game, which happens to be a card game, is already implemented.

Using 10-node tetrahedra, is strain continuous between neighbouring tetrahedra?

I'm trying to implement a Finite Element Analysis algorithm. I solve K u = f to get the displacement u, then calculate the strain from u, and then the stress. Finally, I use the stress to compute the von Mises stress and visualize it. From the result I find that the strain is not continuous between tetrahedra.
I use 10-node tetrahedra as elements, so the displacement is a second-order polynomial within every element. The displacement is enforced to be continuous between tetrahedra, and the strain, which consists of the first derivatives of the displacement, is continuous inside every tetrahedron. But I'm not sure: is this also true across the interface between tetrahedra?
Only the components of strain tangent to the adjoining face are guaranteed continuous.
This follows from displacement continuity: when you take derivatives in the direction of the interface, they are the same on both sides.
Commercial FEM programs typically do some post-process averaging to make the other components look continuous. Note that the strain components normal to an element boundary are only expected to be continuous if the underlying constitutive model is continuous, so such averaging is not always appropriate.
You should not compute the stress and strain at the nodes but inside the elements. You can choose, for example, 4 Gauss points and compute the values there. You then have to think about a scheme for getting the values computed at the Gauss points onto the tet nodes.
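A crude sketch of that post-processing step, assuming strains are already evaluated at 4 Gauss points per element (function and array names are hypothetical; real codes usually extrapolate with the shape functions rather than averaging):

import numpy as np

def nodal_strain(elements, gauss_strains, n_nodes):
    # elements:      (n_elem, 10) node indices of each 10-node tetrahedron
    # gauss_strains: (n_elem, 4, 6) strain in Voigt notation at 4 Gauss points
    accum = np.zeros((n_nodes, 6))
    count = np.zeros(n_nodes)
    for elem_nodes, strains in zip(elements, gauss_strains):
        elem_value = strains.mean(axis=0)  # crude: average over Gauss points
        for n in elem_nodes:
            accum[n] += elem_value
            count[n] += 1
    # Average the contributions of all elements sharing each node.
    return accum / count[:, None]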
There is a Mathematica application example which illustrates this. Unfortunately the web page is no longer available, but the notebooks are here. You'll find the example in the application examples section under Finite Element Method, Structural Mechanics 3D (in the old HelpBrowser). If you have difficulties I could convert it to PDF and send it to you.