Start OpenAI gym on arbitrary initial state

Does anybody know of any OpenAI Gym environments where we can set the initial state of the game? For example, I found that MountainCarContinuous-v0 allows this, so we can select the point at which the car starts. However, I am looking for a more complex environment. Thanks in advance for your help!

You have to redefine the reset function of the environment class (for example, this one). You may want to define it so that it takes your desired state as input, something like:
def reset(self, state):
    self.state = state
    return np.array(self.state)
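This should work for any OpenAI Gym environment that keeps its state in self.state; environments that store their state elsewhere need their own attributes set instead. If you want to do it for other simulators, things may be different. For instance, MuJoCo allows you to do something like: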
saved_state = env.sim.get_state()
env.sim.set_state(saved_state)
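As a rough illustration of the reset override described above, here is a minimal sketch. ToyCarEnv is a hypothetical stand-in, not a real Gym class: it only mimics the reset/step shape and uses plain Python so that it runs without Gym installed. With a real environment you would subclass it and override reset in the same spirit, assuming that environment keeps its state in self.state.

```python
import random


class ToyCarEnv:
    """Hypothetical stand-in for a Gym-style environment (not part of Gym)."""

    def __init__(self):
        self.state = None

    def reset(self, state=None):
        # Use the caller's state if one is given, otherwise a random start.
        if state is None:
            state = (random.uniform(-0.6, -0.4), 0.0)  # (position, velocity)
        self.state = tuple(state)
        return self.state

    def step(self, action):
        # Toy dynamics: the action nudges the velocity, which moves the car.
        position, velocity = self.state
        velocity += 0.1 * action
        position += velocity
        self.state = (position, velocity)
        return self.state


env = ToyCarEnv()
obs = env.reset(state=(0.25, 0.0))  # start exactly where we want
next_obs = env.step(1.0)
```

Making the state argument optional keeps the usual reset() call working, so code that does not care about the start state is unaffected.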

Querying OpenAI Gym Simulator

How can I use OpenAI's Gym environments "step" function or something equivalent without actually "stepping" the environment?
I just want to know what is the next state given current state and action but not to actually do the "step".

What do we need the "metadata" field of the gym environments for?

I have noticed that the base class Env (from gym) contains a class field called metadata. This field seems to be used to specify how an environment can be rendered. For example, in the case of the FrozenLake environment, metadata is defined as
metadata = {'render.modes': ['human', 'ansi']}
However, this property/field metadata is never used in that file. For example, in this same example, the render method has a parameter where you can specify the render mode (and the render method does not even check that the value passed to this parameter is in the metadata class field), so I am not sure why we would need this metadata field.
So, why do we need the field metadata? Is it used for something else other than specifying the rendering "modes" of an environment? If yes, where?

I will try to be as basic as possible.
I have run some trials and, in my understanding, render has two modes, human and ansi, which deal with the output on the console. In human mode (the default), the output is printed as SFFF, FHFH, etc., with the current observation highlighted by a color tag (human-friendly). In ansi mode, the output is returned as a raw string-like object on which no encoding, decoding, or newline translation is performed, e.g. \nSFFF\n\x1b[41mF\x1b[0mHFH.

First of all, I am not a maintainer of or contributor to the OpenAI Gym repository. Everything below therefore comes from my own investigation, since I was wondering the same thing. The OpenAI Gym team may well have had other reasons to include the metadata property than the ones I wrote down below.
TLDR
The metadata attribute describes some additional information about a gym environment class that is not needed during training but is useful for:
Python tests.
Gym utilities.
Parallel training utilities.
etc.
More in-depth answer
The first important thing to notice is that (in most of the environments) the metadata attribute is defined before the class constructor. This means that it is a class attribute that is shared among all instances of a class. This was done since it is not meant to be used during training but stores some additional information about an environment-class that can be used in other test or gym utilities (this is why it is called META-data).
I think this can be seen by diving a bit into the codebase and checking where the metadata is used. When searching the code, we can see that the frozen lake environment metadata is used in two places:
The env_test script
Here the render.modes metadata is used in the gym/envs/tests/test_env.py script to test whether all the possible render methods in a given environment are working as expected:
for mode in env.metadata.get('render.modes', []):
    env.render(mode=mode)
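To make the test loop concrete, here is a minimal, self-contained sketch. ToyLakeEnv and render_all_modes are hypothetical names, and the board text is made up; only the 'render.modes' key and the render(mode=...) call shape are taken from the code above. Unlike the render method described in the question, this toy version validates the mode against metadata, which is one possible design rather than what Gym does.

```python
class ToyLakeEnv:
    """Hypothetical environment; only the metadata/render shape mimics Gym."""

    # Class attribute: describes the environment class, not one episode.
    metadata = {'render.modes': ['human', 'ansi']}

    def render(self, mode='human'):
        # This toy version checks the mode against the advertised ones.
        if mode not in self.metadata['render.modes']:
            raise ValueError('unsupported render mode: {}'.format(mode))
        board = 'SFFF\nFHFH'
        if mode == 'ansi':
            return board  # hand the text back to the caller
        print(board)  # 'human': print to the console
        return None


def render_all_modes(env):
    # Same idea as the test loop above: try every advertised mode.
    results = {}
    for mode in env.metadata.get('render.modes', []):
        results[mode] = env.render(mode=mode)
    return results


results = render_all_modes(ToyLakeEnv())
```

Because the loop reads the modes from metadata, a utility like this can exercise any environment without knowing in advance which render modes it supports.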
The video_recorder script
The second place where the render.modes metadata is used is the gym/wrappers/monitoring/video_recorder.py. Here it is used to check if a render mode is available in the gym environment, which can be recorded with the video recorder.
if 'rgb_array' not in modes:
    if 'ansi' in modes:
        self.ansi_mode = True
    else:
        logger.info('Disabling video recorder because {} neither supports video mode "rgb_array" nor "ansi".'.format(env))
        # Whoops, turns out we shouldn't be enabled after all
        self.enabled = False
        return
Another example of metadata that is used here is the video.frames_per_second metadata.
Another possible use-case
As mentioned above, in most environments, the metadata is defined as a class attribute. Because every instance shares the same dictionary, a change made to that dictionary in place (or to the attribute on the class itself) is seen by all the other instances as well. This can be handy when you, for example, want to use MPI to train an agent on multiple parallel gym environments at the same time. You could, for example, update the frequency of all the instances at once.
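The sharing behaviour of a class attribute can be checked with a few lines of plain Python. ToyEnv is a hypothetical class used only for this illustration. Note that the sharing applies within a single Python process; separate processes (as with MPI) each hold their own copy of the class.

```python
class ToyEnv:
    """Hypothetical environment class used only to show attribute sharing."""

    metadata = {'video.frames_per_second': 30}


a = ToyEnv()
b = ToyEnv()

# Mutating the shared dictionary in place is visible from every instance.
a.metadata['video.frames_per_second'] = 60
shared_value = b.metadata['video.frames_per_second']

# Rebinding the attribute on one instance only shadows the class attribute;
# other instances keep seeing the class-level dictionary.
a.metadata = {'video.frames_per_second': 15}
after_rebind = b.metadata['video.frames_per_second']
```

So an update reaches all instances only when it goes through the shared dictionary or the class itself, not when it is assigned on a single instance.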
Conclusion
Judging from the above investigation, I think the metadata stores additional information about a gym environment-class, which is not used during training, but that, for example, can be used during:
Python tests.
Other gym utilities.
Parallel training utilities.
etc.
You can, of course, also store the same information as separate attributes on the instance, but I think as it is not used during training, storing it as a metadata class attribute is more correct.

pass loss function and metrics in config

In the official example, both the metrics and the loss function are hard-coded. I am wondering if we can pass those in the Jsonnet config, so I can reuse my model on different datasets with different metrics.

I knew I had seen that question before. Copy and paste from GitHub:
Metric is registrable, so you can easily add a parameter to your model of type List[Metric], and then specify the metrics in Jsonnet. You'll have to make sure those metrics all take exactly the same input.
For the loss, this is a little bit harder. You would create your own Registrable base class, and then implement the losses you want to use this way. You can use the Metric class as an example of how to do this. It would be a bit of typing work, but not difficult.
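The "registrable base class" idea can be sketched without the library at all. The following is a plain-Python illustration of the pattern, not AllenNLP's actual Registrable implementation: the Loss class, its register and by_name methods, the 'mse' and 'mae' names, and the config dictionary are all hypothetical.

```python
class Loss:
    """Hypothetical base class with a name -> subclass registry."""

    _registry = {}

    @classmethod
    def register(cls, name):
        # Decorator that records a subclass under the given name.
        def decorator(subclass):
            cls._registry[name] = subclass
            return subclass
        return decorator

    @classmethod
    def by_name(cls, name):
        return cls._registry[name]

    def __call__(self, predicted, target):
        raise NotImplementedError


@Loss.register('mse')
class MeanSquaredError(Loss):
    def __call__(self, predicted, target):
        return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


@Loss.register('mae')
class MeanAbsoluteError(Loss):
    def __call__(self, predicted, target):
        return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


# A parsed config (for example, from Jsonnet) would supply the name.
config = {'loss': {'type': 'mae'}}
loss_fn = Loss.by_name(config['loss']['type'])()
value = loss_fn([1.0, 2.0, 4.0], [1.0, 3.0, 2.0])
```

With this shape, switching the loss for a new dataset means changing one string in the config rather than editing the model code.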

What's the reason for interface to exist in Actionscript-3 and other languages

What is the purpose of interfaces? Even if we implement an interface on a class, we have to define its functionality again each time we implement it on a different class, so why do interfaces exist in AS3 or in any other language that has them?
Thank you

I basically agree with the answers posted so far, just had a bit to add.
First to answer the easy part, yes other languages have interfaces. Java comes to mind immediately but I'm pretty sure all OOP languages (C++, C#, etc.) include some mechanism for creating interfaces.
As stated by Jake, you can write interfaces as "contracts" for what will be fulfilled, in order to separate work. To take a hypothetical: say I'm working on A, you're working on C, and Bob is working on B. If we define B' as an interface for B, we can quickly and relatively easily define B' (relative to defining B, the implementation) and all go on our way. I can assume that from A I can code to B', you can assume that from C you can code to B', and when Bob is done with B we can just plug it in.
This brings us to Jugg1es's point. The ability to swap out a whole functional piece is made easier by "dependency injection" (if you don't know this phrase, please google it). This is exactly the thing described: you create an interface that defines generally what something will do, say a database connector. You want every database connector to be able to connect to a database and run queries, so you might define an interface that says implementing classes must have a "connect()" method and a "doQuery(stringQuery)" method. Now let's say Bob writes the implementation for MySQL databases, and then your client says, "Well, we just paid 200,000 for new servers and they'll run Microsoft SQL." To take advantage of that with your software, all you'd need to do is swap out the database connector.
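The database connector example can be sketched as follows. This is an illustration in Python using an abstract base class in place of an AS3 interface; connect and doQuery come from the description above, while DatabaseConnector, MySqlConnector, SqlServerConnector, ReportService, and the query text are hypothetical names, and the connectors return strings instead of talking to a real database.

```python
from abc import ABC, abstractmethod


class DatabaseConnector(ABC):
    """The 'interface': what every connector promises to provide."""

    @abstractmethod
    def connect(self):
        ...

    @abstractmethod
    def doQuery(self, stringQuery):
        ...


class MySqlConnector(DatabaseConnector):
    # Stand-in implementation; a real one would talk to a MySQL server.
    def connect(self):
        return 'connected to MySQL'

    def doQuery(self, stringQuery):
        return 'MySQL ran: ' + stringQuery


class SqlServerConnector(DatabaseConnector):
    # Stand-in implementation; a real one would talk to Microsoft SQL Server.
    def connect(self):
        return 'connected to SQL Server'

    def doQuery(self, stringQuery):
        return 'SQL Server ran: ' + stringQuery


class ReportService:
    # The connector is injected, so this class never names a concrete database.
    def __init__(self, connector):
        self.connector = connector

    def user_count(self):
        self.connector.connect()
        return self.connector.doQuery('SELECT COUNT(*) FROM users')


old = ReportService(MySqlConnector()).user_count()
new = ReportService(SqlServerConnector()).user_count()
```

ReportService is identical in both calls; only the object passed to its constructor changes, which is the swap described above.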
In real life, I have a friend who runs a meat packing/distribution company in Chicago. The company that makes their software/hardware setup for scanning packages and weighing things as they come in and out (inventory) is telling them they have to upgrade to a newer OS/server and newer hardware to keep up with the software. The software is not written in a modular way that allows them to maintain backwards compatibility. I've been in this boat plenty of times before, telling someone xyz needs to be upgraded to get abc functionality that will make doing my job 90% easier. Anyhow, I guess the point is that in the real world people don't always make use of these things, and it can bite you in the ass.

Interfaces are vital to OOP, particularly when developing large applications. One example is if you needed a data layer that returns data on, say, Users. What if you eventually change how the data is obtained, say you started with XML web services data, but then switched to a flat file or something. If you created an interface for your data layer, you could create another class that implements it and make all the changes to the data layer without ever having to change the code in your application layer. I don't know if you're using Flex or Flash, but when using Flex, interfaces are very useful.

Interfaces are a way of defining the functionality of a class. It might not make a whole lot of sense when you are working alone (especially starting out), but when you start working in a team it helps people understand how your code works and how to use the classes you wrote (while keeping your code encapsulated). That's the best way to think of them at an intermediate level, in my opinion.

While the existing answers are pretty good, I think they miss the chief advantage of using Interfaces in ActionScript, which is that you can avoid compiling the implementation of that Interface into the Main Document Class.
For example, if you have an ISpaceShip Interface, you now have a choice to do several things to populate a variable typed to that Interface. You could load an external swf whose main Document Class implements ISpaceShip. Once the Loader's contentLoaderInfo's COMPLETE event fires, you cast the content to ISpaceShip, and the implementation of that (whatever it is) is never compiled into your loading swf. This allows you to put real content in front of your users while the load process happens.
By the same token, you could have a timeline instance declared in the parent AS Class of type ISpaceShip with "Export for ActionScript in Frame N" *un*checked. This will compile on the frame where it is first used, so you no longer need to account for this in your preloading time. Do this with enough things and suddenly you don't even need a preloader.
Another advantage of coding to Interfaces is if you're doing unit tests on your code, which you should unless your code is completely trivial. This enables you to make sure that the code is succeeding or failing on its own merits, not based on the merits of the collaborator, or where the collaborator isn't appropriate for a test. For example, if you have a controller that is designed to control a specific type of View, you're not going to want to instantiate the full view for the test, but only the functionality that makes a difference for the test.
If you don't have support in your work situation for writing tests, coding to interfaces helps make sure that your code will be testable once you get to the point where you can write tests.
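The testing point can be illustrated with a small sketch, written in Python rather than ActionScript. CounterController, StubView, and the show method are hypothetical names; the idea is only that the controller depends on "something with a show(text) method" rather than on a concrete view, so a lightweight stand-in can replace the full view in a test.

```python
class CounterController:
    """Hypothetical controller that only needs a view with show(text)."""

    def __init__(self, view):
        self.view = view
        self.count = 0

    def increment(self):
        self.count += 1
        self.view.show('Count: {}'.format(self.count))


class StubView:
    # Test double: records what the controller asked to display.
    def __init__(self):
        self.shown = []

    def show(self, text):
        self.shown.append(text)


view = StubView()
controller = CounterController(view)
controller.increment()
controller.increment()
```

The test can then inspect view.shown to verify the controller on its own merits, without instantiating any real display code.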

The above answers are all very good, the only thing I'd add - and it might not be immediately clear in a language like AS3, where there are several untyped collection classes (Array, Object and Dictionary) and Object/dynamic classes - is that it's a means of grouping otherwise disparate objects by type.
A quick example:
Imagine you had a space shooter, where the player has missiles which lock on to various targets. Suppose, for this purpose, you wanted any type of object which could be locked onto to have internal functions for registering this (aka an interface):
function lockOn():void;//Tells the object something's locked onto it
function getLockData():Object;//Returns information, position, heat, whatever etc
These targets could be anything, a series of totally unrelated classes - enemy, friend, powerup, health.
One solution would be to have them all inherit from a base class which contained these methods - but Enemies and Health Pickups wouldn't logically share a common ancestor (and if you find yourself making bizarre inheritance chains to accommodate your needs then you should rethink your design!), and your missile will also need a reference to the object it's locked onto:
var myTarget:Enemy;//This isn't going to work for the Powerup class!
or
var myTarget:Powerup;//This isn't going to work for the Enemy class!
...but if all lockable classes implement the ILockable interface, you can set this as the type reference:
var myTarget:ILockable;//This can be set as Enemy, Powerup, any class which implements ILockable!
..and have the functions above as the interface itself.
They're also handy when using the Vector class (the name may mislead you, it's just a typed array) - they run much faster than arrays, but only allow a single type of element - and again, an interface can be specified as type:
var lockTargets:Vector.<Enemy> = new Vector.<Enemy>();//New array of lockable objects
lockTargets[0] = new HealthPickup();//Compiler won't like this!
but this...
var lockTargets:Vector.<ILockable> = new Vector.<ILockable>();
lockTargets[0] = new HealthPickup();
lockTargets[1] = new Enemy();
Will, provided Enemy and HealthPickup implement ILockable, work just fine!

What's the use of specifying Interface ICellRenderer in this case

Please refer to the post :
Display checkbox inside Flash List Control ? (Similar to itemrendering in Flex)
As usual, I get somewhat confused about the significance of interfaces.
In the post linked above, I still don't get the significance of specifying ICellRenderer.
Why, instead of:
public class CustomCellRenderer extends CheckBox implements ICellRenderer
can't I write:
public class CustomCellRenderer extends CheckBox
I tried the second version, but it does not work. I MUST specify ICellRenderer.
In my opinion, an interface just tells a class to follow certain rules about which functions it provides. How can omitting it stop the class from working (if I have defined all the necessary functions but not declared the required interface)?
Thanks in advance.
Vishwas.

Not implementing an interface can cause a failure because other code performs checks on your objects. However, this is also the advantage of interfaces: your class doesn't need to extend from something to implement an interface, so it can extend from any class and still implement a particular interface, which means it has some other functionality beyond what it got from its inheritance chain. So Adobe (or whoever) codes something up like:
if(argumentObject is ICellRenderer)
{
    var icr:ICellRenderer = argumentObject as ICellRenderer; //cast so code completion can work
    icr.data = data; //do some stuff on the object that only an ICellRenderer can do
}
Let me know if this explanation isn't clear, it took me a long time to come to grips with interfaces and their use, but now I'm a huge advocate.
Supporting Arguments
Insofar as inheritance itself is concerned, just for clarity the concept is that if something is just like something else but with extra properties (nouns) or actions (verbs) associated with it, you need not re-write the common parts.
To use a common example, say you have a class Animal (with properties such as isAlive and actions like reproduce). Now say you want to make classes for SexualAnimal and AsexualAnimal, where the reproduce method for SexualAnimal takes an argument whereas the one for AsexualAnimal doesn't (SexualAnimal would have an overloaded version of the reproduce method and would override the default one to throw an error). That's all well and good. Now say that somewhere down the line we get to Birds, and we have a choice on how to handle flight: we can (A) add flight as a boolean somewhere in the inheritance chain, or (B) create an interface for flight (perhaps we aren't concerned with where in the inheritance tree an Animal is; we just want to know that it can fly and therefore has certain properties like altitude and actions like takeFlight). Rather than checking for every individual property that you might want to use on a flying thing, you can just check that it is an IFlyingThing.
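Option B can be sketched like this, in Python with an abstract base class standing in for an AS3 interface. Animal, isAlive, IFlyingThing, altitude, and takeFlight come from the description above; Bird as a concrete class, Drone, Dog, launch_if_possible, and the altitude values are hypothetical additions for the illustration.

```python
from abc import ABC, abstractmethod


class Animal:
    def __init__(self):
        self.isAlive = True


class IFlyingThing(ABC):
    """Interface-like base: anything that flies has altitude and takeFlight."""

    altitude = 0

    @abstractmethod
    def takeFlight(self):
        ...


class Bird(Animal, IFlyingThing):
    def takeFlight(self):
        self.altitude = 100


class Drone(IFlyingThing):
    # Not an Animal at all, yet it satisfies the same flying contract.
    def takeFlight(self):
        self.altitude = 50


class Dog(Animal):
    pass


def launch_if_possible(thing):
    # One type check replaces probing for each property individually.
    if isinstance(thing, IFlyingThing):
        thing.takeFlight()
        return thing.altitude
    return None


altitudes = [launch_if_possible(t) for t in (Bird(), Drone(), Dog())]
```

The check does not care where the object sits in the inheritance tree, which is why Bird and Drone are handled the same way while Dog is skipped.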
Another way to see this: as a programmer, I can write up an interface, give it to another team, and we can both operate based on the "contract" established in the interface. For example, say I need to access a database to do some operations, but I don't know which database solution they're going to go with long term. So I write an interface for what I need: the DB must be able to fetch Y data given argument X. Now I'm free to write the rest of my code under the assumption that I supply X and you give me Y; no matter what the underlying implementation is, my code will work.
For a more formal reference to uses of interfaces see: http://en.wikipedia.org/wiki/Design_Patterns
