Is it possible to modify OpenAI environments? - reinforcement-learning

There are some things that I would like to modify in the OpenAI environments. If we take the CartPole example, we can edit things that are in the class's __init__ function, but with environments that use Box2D it doesn't seem to be as straightforward.
For example, consider the BipedalWalker environment.
In this case, how would I edit things like the SPEED_HIP or SPEED_KNEE variables?

Yes, you can modify or create new environments in gym. The simplest way is to modify the constants directly in your local gym installation, but that's not recommended: the change would affect every project that uses that installation.
A nicer way is to download the bipedal walker environment file (from here) and save it to a file (say, my_bipedal_walker.py).
Then you modify the constants in my_bipedal_walker.py and import it in your code (assuming you put the file in an importable path, such as the same folder as your other code files):
import gym
from my_bipedal_walker import BipedalWalker  # your locally modified copy

env = BipedalWalker()  # instantiates the environment with your constants
The env variable is then an instance of the environment, using your modified constants in the physics computation, and you can plug it into any RL algorithm.
An even nicer way would be to make your custom environment available in the OpenAI gym registry, which you can do by following the instructions here.
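If you go the registry route, a minimal sketch looks like the following (the environment id, module path, and episode limit are placeholders, and the registration API may differ slightly between gym versions):
import gym
from gym.envs.registration import register

# Register the modified walker under a custom id. The entry_point is
# "<module>:<class>" and must point at wherever you saved your copy.
register(
    id="MyBipedalWalker-v0",
    entry_point="my_bipedal_walker:BipedalWalker",
    max_episode_steps=1600,
)

env = gym.make("MyBipedalWalker-v0")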

You can edit the bipedal walker environment just like you can modify the cartpole environment.
All you have to do is modify the SPEED_HIP and SPEED_KNEE constants.
If you want to change how those constants are used in the locomotion of the agent, you might also want to tweak the step method.
After making changes to the code, when you instantiate the environment, the new instance will be using the modifications you made.
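One way to apply such changes without editing the source file at all is a sketch like the following (it relies on SPEED_HIP and SPEED_KNEE being module-level constants in gym's bipedal_walker module, which may change between versions; the values are arbitrary examples):
from gym.envs.box2d import bipedal_walker

# Override the module-level constants before creating the environment;
# the step method reads them at call time. The values below are just
# examples (the defaults are 4 and 6 in current gym versions).
bipedal_walker.SPEED_HIP = 2.0
bipedal_walker.SPEED_KNEE = 3.0

env = bipedal_walker.BipedalWalker()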

Related

Building a Pipeline Model using allennlp

I am pretty new to allennlp and I am struggling to build a model that does not seem to fit perfectly into the standard way of building models in allennlp.
I want to build an NLP pipeline model. The pipeline consists of two models, let's call them A and B. First A is trained, and then B is trained on the predictions of the fully trained A.
What I have seen is that people define two separate models and train both using the command-line interface (allennlp train ...) in a shell script that looks like:
# set a bunch of environment variables
...
allennlp train -s $OUTPUT_BASE_PATH_A --include-package MyModel --force $CONFIG_MODEL_A
# prepare environment variables for model b
...
allennlp train -s $OUTPUT_BASE_PATH_B --include-package MyModel --force $CONFIG_MODEL_B
I have two concerns about that:
This code is hard to debug
It's not very flexible. When I want to do a forward pass of the fully trained pipeline, I have to write another bash script that does that.
Any ideas on how to do that in a better way?
I thought about using a Python script instead of a shell script and invoking allennlp.commands.main(..) directly. That way you at least have a single Python module you can run with a debugger.
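A minimal sketch of that idea (the config paths and the package name are placeholders); allennlp.commands.main parses sys.argv, so you can set it up before calling:
import sys
from allennlp.commands import main

def train(config_path, output_dir):
    # Equivalent to: allennlp train <config> -s <output_dir> --include-package MyModel --force
    sys.argv = [
        "allennlp", "train", config_path,
        "-s", output_dir,
        "--include-package", "MyModel",
        "--force",
    ]
    main()

train("config_model_a.jsonnet", "output/model_a")
# ... produce model A's predictions, then:
train("config_model_b.jsonnet", "output/model_b")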
There are two possibilities.
If you're really just plugging the output of one model into the input of another, you could merge them together into one model and run it that way. You can do this with two already-trained models if you initialize the combined model with the two trained models using a from_file model. To do it at training time is a little harder, but not impossible. You would train the first model like you do now. For the second step, you train the combined model directly, with the inner first model's weights frozen.
The other thing you can do is use AllenNLP as a library, without the config files. We have a template up on GitHub that shows you how to do this. The basic insight is that everything you configure in one of the Jsonnet configuration files corresponds 1:1 to a Python class that you can use directly from Python. There is no requirement to use the configuration files. If you use AllenNLP this way, you have much more flexibility, including chaining things together.
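A rough sketch of the combined-model idea from the first option (the registration name and the way B consumes A's output are assumptions; load_archive and Model.register are real AllenNLP APIs):
import torch
from allennlp.data import Vocabulary
from allennlp.models import Model
from allennlp.models.archival import load_archive

@Model.register("combined_pipeline")
class CombinedPipeline(Model):
    def __init__(self, vocab: Vocabulary, model_a_archive: str, model_b: Model):
        super().__init__(vocab)
        # Load the fully trained model A from its archive and freeze it.
        self.model_a = load_archive(model_a_archive).model
        for param in self.model_a.parameters():
            param.requires_grad = False
        self.model_b = model_b

    def forward(self, **inputs):
        # Run the frozen A, then let B consume A's predictions; how B does
        # that depends entirely on your two models.
        with torch.no_grad():
            a_output = self.model_a(**inputs)
        return self.model_b(a_output=a_output, **inputs)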

Ray RLlib: Export policy for external use

I have a PPO-based policy model that I train with RLlib using the Ray Tune API on some standard gym environments (with no fancy preprocessing). I have model checkpoints saved which I can load and restore for further training.
Now, I want to export my model for production onto a system that should ideally have no dependencies on Ray or RLLib. Is there a simple way to do this?
I know that there is an export_model interface in the rllib.policy.tf_policy class, but it doesn't seem particularly easy to use. For instance, after calling export_model('savedir') in my training script and, in another context, loading via model = tf.saved_model.load('savedir'), the resulting model object is troublesome to feed the correct inputs into for evaluation (something like model.signatures['serving_default'](gym_observation) doesn't work). I'm ideally looking for a method that would allow easy, out-of-the-box model loading and evaluation on observation objects.
Once you have restored from a checkpoint with agent.restore(checkpoint_path), you can use agent.export_policy_model(output_dir) to export the model as a .pb file plus a variables folder.
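A minimal sketch of that flow (the trainer config, environment name, and paths are placeholders, and the import path for PPOTrainer varies across Ray versions):
import tensorflow as tf
from ray.rllib.agents.ppo import PPOTrainer

# Rebuild the trainer with the same config used during training, restore
# the checkpoint, and export the policy as a TensorFlow SavedModel.
agent = PPOTrainer(config={"framework": "tf"}, env="CartPole-v0")
agent.restore("path/to/checkpoint")          # placeholder checkpoint path
agent.export_policy_model("exported_model")  # writes the .pb + variables/

# On the production side, only TensorFlow is needed to load it back.
model = tf.saved_model.load("exported_model")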

How to modify the agent in an openai gym environment?

I want to modify the agent's structure (for example the length of the legs) in the bipedal_walker environment.
Is there any way to do that?
The easiest way would be to download and modify the OpenAI gym source code, and then use your locally modified version of gym in place of the pip package you installed.
For example, if you want to modify the length of the legs in the bipedal_walker environment, you'd probably want to play around with the LEG_H variable on this line of code.
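For instance, the leg dimensions are module-level constants in bipedal_walker.py, so lengthening the legs is a one-line change in your local copy (the exact line and default values may differ between gym versions, and the new value below is an arbitrary example):
# In your local copy of gym/envs/box2d/bipedal_walker.py, the original is:
#   LEG_W, LEG_H = 8 / SCALE, 34 / SCALE
# Increase LEG_H to lengthen the legs, e.g.:
LEG_W, LEG_H = 8 / SCALE, 52 / SCALE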

Use a variable for hierarchical path in verilog configuration

I have a UVM testbench that uses configurations to replace a VHDL component that is deep within the design. Each test that I create has to use a verilog configuration to replace that component. Is there a way to use a variable for the hierarchical path so that I don't have to update each configuration if the VHDL design changes?
There is no way to use a variable to represent a hierarchical path, except for virtual interface variables used to represent the hierarchical path to interface instances.
You will need to show us an example of how each test changes the VHDL component to give us a better idea of a solution; maybe you can use a macro.
I found a solution that does what I would like it to do. I used macros to define the instances that I want to configure. The following is an example of what I have done:
`define USE_TB_COMP instance top.u_mod1.u_sub_mod1.u_comp use tb_comp;
config test1_c;
  design work.top; // a config block requires a design statement; library/cell names here are assumed
  `USE_TB_COMP
endconfig
config test2_c;
  design work.top;
  `USE_TB_COMP
endconfig
....

Configuration Promotion Between Environments

What is a good way to coordinate configuration changes through environments?
In an effort to decouple our code from the environment we've moved all environmental config to external files. So maybe the application will look for ${application.config.dir}/app.properties and app.properties could contain:
user.auth.endpoint=http://some.url/user
user.auth.apikey=abcd12314
The problem is, user.auth.endpoint needs to point to a test resource when on test, a staging resource when on the staging environment, and a production resource when on prod.
We could maintain different copies of the config file but this would violate DRY and become very unwieldy (there are 20+ production environments).
What's a good way to manage this? What tools should I be searching for?
Externalizing config is a good idea; you could externalize it all the way to environment variables.
Env vars are easy to change between deploys without changing any code; unlike config files, there is little chance of them being checked into the code repo accidentally; and unlike custom config files, or other config mechanisms such as Java System Properties, they are a language- and OS-agnostic standard.
From http://12factor.net/config
I know of three approaches to this.
The first approach is to write, say, a Python "wrapper" script for your application. The script will find out some environmental details, such as hostname, user name and values of environment variables, and then construct the appropriate configuration file (or a set of command-line options) that is passed to the real application.
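A minimal sketch of that wrapper idea, assuming a hostname-based environment rule and placeholder file names, endpoints, and launch command:
import os
import socket
import subprocess

def detect_environment():
    # Assumption: hostnames encode the environment (prod-*, stage-*, ...).
    host = socket.gethostname()
    if host.startswith("prod"):
        return "production"
    if host.startswith("stage"):
        return "staging"
    return "test"

ENDPOINTS = {
    "test": "http://test.url/user",
    "staging": "http://staging.url/user",
    "production": "http://some.url/user",
}

env = detect_environment()
with open("app.properties", "w") as f:
    f.write("user.auth.endpoint=%s\n" % ENDPOINTS[env])
    f.write("user.auth.apikey=%s\n" % os.environ["USER_AUTH_APIKEY"])

# Launch the real application, pointing it at the generated config.
subprocess.run(["java", "-Dapplication.config.dir=.", "-jar", "app.jar"], check=True)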
The second approach is to embed an interpreter for a scripting language (Python, Lua and Tcl come to mind) into your application. This makes it possible for you to write a configuration file in the syntax of that embedded scripting language. In this way, the configuration file can make use of the scripting language's features, such as the ability to query environment variables or execute a shell command (such as hostname) and use if-then-else statements to set variables appropriately.
The third approach (if you are using C++ or Java) is to use the open-source Config4* library (disclaimer, I am the main developer of that). I recommend you read Chapter 2 of the "Config4* Getting Started" manual to see examples of how its flexible syntax can enable a single configuration file adapt to multiple environments.
You can take a look at http://www.configapp.com. You work with one configuration file and switch/tab between environments. Internally it is just one configuration file; it handles the environment variables and generates the config file for the specific environment. In Config terminology, you have one Prod environment with 20+ instances. You will have a Prod environment configuration, and you can tweak the 20+ instances accordingly using a web interface.
You moved environment-specific properties to a separate file, but with Config you don't have to do that. With Config you can have one configuration file, with environment-variable support and common configuration applied to all environments.
Note that I'm part of the Config team.