Use MiniGrid environment with stable-baselines3 - reinforcement-learning

I'm using the MiniGrid library to work with different 2D navigation problems as experiments for my reinforcement learning research. I'm also using the stable-baselines3 library to train PPO models. Unfortunately, while training PPO with stable-baselines3 on a MiniGrid environment, I got the following error.
[Error screenshot]
I imported the environment as follows,
import gymnasium as gym
from minigrid.wrappers import RGBImgObsWrapper, ImgObsWrapper

env = gym.make("MiniGrid-SimpleCrossingS9N1-v0", render_mode="human")
env = RGBImgObsWrapper(env)  # replace the encoded grid observation with an RGB image
env = ImgObsWrapper(env)     # keep only the image, dropping the mission string
The training script using stable-baselines3 is as follows:
model = PPO('CnnPolicy', env, verbose=0)
model.learn(args.timesteps, callback)
I have done some quick debugging and found a potential lead, though I don't know if it is the real cause.
When I load the environment directly from gym, the action space is <class 'gym.spaces.discrete.Discrete'>, but when it is loaded from MiniGrid, the action space is <class 'gymnasium.spaces.discrete.Discrete'>. Any help sorting out the problem is highly appreciated. Thanks in advance.

That's the correct analysis: Stable-Baselines3 doesn't support Gymnasium yet, so its checks against gym.spaces.discrete.Discrete fail for gymnasium.spaces.discrete.Discrete.
The following answer explains how to work around that, based on a currently open PR: OpenAI Gymnasium, are there any libraries with algorithms supporting it?
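Until that PR lands, one common stopgap is to wrap the Gymnasium environment so that it presents the old gym interface to Stable-Baselines3. The sketch below is not the PR's code; the wrapper name and the space conversions are assumptions, and it only covers the Discrete and Box spaces that MiniGrid needs.

import gym
import gym.spaces
import gymnasium

def convert_space(space):
    # Translate Gymnasium spaces into their classic gym equivalents.
    if isinstance(space, gymnasium.spaces.Discrete):
        return gym.spaces.Discrete(space.n)
    if isinstance(space, gymnasium.spaces.Box):
        return gym.spaces.Box(low=space.low, high=space.high, shape=space.shape, dtype=space.dtype)
    raise NotImplementedError(f"No conversion implemented for {type(space)}")

class GymnasiumToGymWrapper(gym.Env):  # hypothetical wrapper name, not from the PR
    def __init__(self, env):
        self._env = env
        self.observation_space = convert_space(env.observation_space)
        self.action_space = convert_space(env.action_space)

    def reset(self):
        obs, _info = self._env.reset()  # Gymnasium's reset returns (obs, info)
        return obs                      # the old gym API returns only obs

    def step(self, action):
        obs, reward, terminated, truncated, info = self._env.step(action)
        return obs, reward, terminated or truncated, info  # collapse to a single done flag

    def render(self, mode="human"):
        return self._env.render()

With something like this in place you would wrap the MiniGrid environment once more, e.g. env = GymnasiumToGymWrapper(env), before handing it to PPO.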

Related

Can I use Gymnasium with Ray's RLlib?

I want to develop a custom reinforcement learning environment. Previously, I have been working with OpenAI's gym library and Ray's RLlib. I noticed that the README.md in OpenAI's gym repository suggests moving to Gymnasium (https://github.com/Farama-Foundation/Gymnasium), but I have yet to find a statement from Ray on using Gymnasium instead of gym.
Will I have problems using Gymnasium and Ray's RLlib?
Yes, at the moment you will. One difference is that when performing an action in Gymnasium with the env.step(action) method, it returns a 5-tuple: the old "done" flag from gym<0.24.1 has been replaced with two final-state flags, "terminated" and "truncated".
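A minimal illustration of that API difference (the environment here is just an example):

import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)  # reset also returns an info dict now
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
done = terminated or truncated  # the old single "done" flag is split in two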
There is an outstanding issue to integrate Gymnasium into RLlib, and I expect it will be resolved soon: https://github.com/ray-project/ray/issues/29697

How does fine-tuning a transformer (T5) work?

I am using PyTorch Lightning to fine-tune a T5 transformer on a specific task. However, I was not able to understand how the fine-tuning works. I always see this code:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained(hparams.model_name_or_path)
model = AutoModelForSeq2SeqLM.from_pretrained(hparams.model_name_or_path)
I don't get how the fine-tuning is done. Are they freezing the whole model and training only the head (if so, how can I change the head), or are they using the pre-trained model as weight initialization? I have been looking for an answer for a couple of days already. Any links or help are appreciated.
If you are using PyTorch Lightning, it won't freeze anything unless you tell it to do so; by default the pre-trained weights are simply used as the initialization and the whole model is trained. Lightning has a callback which you can use to freeze your backbone and train only the head module. See BackboneFinetuning.
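A hedged sketch of that callback in use: it assumes your LightningModule exposes the pre-trained T5 as self.backbone (the attribute name the callback looks for), and the epoch counts and learning-rate factor here are arbitrary examples, not recommendations.

import pytorch_lightning as pl
from pytorch_lightning.callbacks import BackboneFinetuning

finetuning_cb = BackboneFinetuning(
    unfreeze_backbone_at_epoch=5,   # backbone stays frozen for the first 5 epochs
    lambda_func=lambda epoch: 1.5,  # grows the backbone learning rate after it is unfrozen
)

trainer = pl.Trainer(max_epochs=20, callbacks=[finetuning_cb])
# trainer.fit(my_module)  # my_module must define `self.backbone` for the callback to work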
Also check out Lightning-Flash; it allows you to quickly build models for various text tasks and uses the Transformers library for the backbone. You can use its Trainer to specify which kind of fine-tuning strategy you want to apply to your training.
Thanks

Building a Pipeline Model using allennlp

I am pretty new to AllenNLP and I am struggling to build a model that does not seem to fit perfectly into the standard way of building models in AllenNLP.
I want to build an NLP pipeline model. The pipeline consists of two models, let's call them A and B. First A is trained, and then B is trained afterwards on A's predictions over the full training set.
What I have seen is that people define two separate models and train both using the command-line interface (allennlp train ...) in a shell script that looks like:
# set a bunch of environment variables
...
allennlp train -s $OUTPUT_BASE_PATH_A --include-package MyModel --force $CONFIG_MODEL_A
# prepare environment variables for model b
...
allennlp train -s $OUTPUT_BASE_PATH_B --include-package MyModel --force $CONFIG_MODEL_B
I have two concerns about that:
This code is hard to debug.
It's not very flexible. When I want to do a forward pass with the fully trained pipeline, I have to write yet another bash script that does that.
Any ideas on how to do that in a better way?
I thought about using a Python script instead of a shell script and invoking allennlp.commands.main(..) directly. Doing so, you at least have a single Python module that you can run under a debugger.
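Roughly what that could look like (the paths, config names, and package name are placeholders; this assumes allennlp.commands.main parses sys.argv the same way the CLI does):

import sys
from allennlp.commands import main

# Equivalent of:
# allennlp train config_model_a.jsonnet -s output/model_a --include-package MyModel --force
sys.argv = [
    "allennlp", "train", "config_model_a.jsonnet",
    "-s", "output/model_a",
    "--include-package", "MyModel",
    "--force",
]
main()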
There are two possibilities.
If you're really just plugging the output of one model into the input of another, you could merge them into one model and run it that way. You can do this with two already-trained models if you initialize the combined model with the two trained models using a from_file model. Doing it at training time is a little harder, but not impossible: you would train the first model as you do now, and for the second step you train the combined model directly, with the inner first model's weights frozen.
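A rough sketch of that "merge into one model" idea. The class name, the archive paths, and the way A's output is fed into B are all placeholders; the real wiring depends on what your two models consume and produce.

import torch
from allennlp.models import Model
from allennlp.models.archival import load_archive

class CombinedModel(Model):  # hypothetical name
    def __init__(self, vocab, model_a: Model, model_b: Model):
        super().__init__(vocab)
        self.model_a = model_a
        self.model_b = model_b
        for p in self.model_a.parameters():  # freeze A so only B is updated
            p.requires_grad = False

    def forward(self, **inputs):
        with torch.no_grad():
            a_output = self.model_a(**inputs)  # run the frozen first stage
        # Feed A's predictions into B; how exactly depends on your pipeline.
        return self.model_b(a_predictions=a_output, **inputs)

# Combining two already-trained models from their saved archives:
archive_a = load_archive("output/model_a/model.tar.gz")
archive_b = load_archive("output/model_b/model.tar.gz")
combined = CombinedModel(archive_a.model.vocab, archive_a.model, archive_b.model)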
The other thing you can do is use AllenNLP as a library, without the config files. We have a template up on GitHub that shows you how to do this. The basic insight is that everything you configure in one of the Jsonnet configuration files corresponds 1:1 to a Python class that you can use directly from Python; there is no requirement to use the configuration files. If you use AllenNLP this way, you have much more flexibility, including chaining things together.

Ray RLlib: Export policy for external use

I have a PPO policy-based model that I train with RLlib using the Ray Tune API on some standard gym environments (with no fancy preprocessing). I have model checkpoints saved which I can load from and restore for further training.
Now, I want to export my model for production onto a system that should ideally have no dependencies on Ray or RLlib. Is there a simple way to do this?
I know that there is an export_model interface in the rllib.policy.tf_policy class, but it doesn't seem particularly easy to use. For instance, after calling export_model('savedir') in my training script and, in another context, loading via model = tf.saved_model.load('savedir'), the resulting model object is troublesome to feed the correct inputs into for evaluation (something like model.signatures['serving_default'](gym_observation) doesn't work). I'm ideally looking for a method that allows easy, out-of-the-box model loading and evaluation on observation objects.
Once you have restored from a checkpoint with agent.restore(checkpoint_path), you can use agent.export_policy_model(output_dir) to export the model as a .pb file plus a variables folder.
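A sketch of that restore-then-export flow, assuming a TensorFlow-based PPO trainer; the config, checkpoint path, and export directory are placeholders, and the exact trainer class name varies between Ray versions.

import tensorflow as tf
from ray.rllib.agents.ppo import PPOTrainer

agent = PPOTrainer(config={"env": "CartPole-v0", "framework": "tf"})
agent.restore("checkpoints/checkpoint-100")   # placeholder checkpoint path
agent.export_policy_model("exported_policy")  # writes saved_model.pb + variables/

# The exported SavedModel can then be loaded on a machine without Ray/RLlib:
loaded = tf.saved_model.load("exported_policy")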

INFO:tensorflow:Summary name conv2d_1/kernel:0 is illegal

I am trying to use the TensorBoard callback in Keras. When I run the pretrained InceptionV3 model with the TensorBoard callback, I get the following warning:
INFO:tensorflow:Summary name conv2d_95/kernel:0 is illegal; using conv2d_95/kernel_0 instead.
I saw a comment on GitHub addressing this issue. SeaFX pointed out in his comment that he solved it by replacing variable.name with variable.name.replace(':', '_'). I am unsure how to do that. Can anyone please help me? Thanks in advance :)
I'm not sure how to get the name replacement working, but a workaround that may be sufficient for your needs is:
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.WARN)  # suppress INFO-level messages (TF 1.x logging API)
import keras
This will turn off all INFO-level logging but keep warnings, errors, etc.
See this question for a discussion of the various log levels and how to change them. Personally I found that setting the TF_CPP_MIN_LOG_LEVEL environment variable didn't work under a Jupyter notebook, but I haven't tested it with base Python.
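For reference, that environment-variable approach looks like the following (it must be set before TensorFlow is imported; as noted above, it reportedly didn't help in my Jupyter setup):

import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"  # 0 = all, 1 = filter INFO, 2 = also WARNING, 3 = also ERROR
import tensorflow as tf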