How can I change n_steps while using stable baselines3 (PPO implementation)? - reinforcement-learning

I am implementing PPO from Stable-Baselines3 for my custom environment. Right now n_steps = 2048, so the model update happens after 2048 time-steps. How can I change this so that my model updates after n_steps = 1000?

Try using this as a parameter:
PPO(n_steps=1000)
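For reference, a minimal sketch of how this could look (here `env` is assumed to be your custom environment instance). Per the SB3 docs, n_steps is the number of steps collected per environment before each update, so with a single environment the update happens every 1000 steps:

from stable_baselines3 import PPO

# env is assumed to be your custom environment instance
model = PPO("MlpPolicy", env, n_steps=1000, verbose=1)
model.learn(total_timesteps=100_000)

Note that SB3 may warn if n_steps * n_envs is not a multiple of batch_size (64 by default), so you may also want to pass a matching batch_size.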

Related

BentoML - Serving a CatBoostClassifier with cat_features

I am trying to create a BentoML service for a CatBoostClassifier model that was trained using a column as a categorical feature. If I save the model and try to make some predictions with the saved model (not as a BentoML service), everything works as expected, but when I create the service using BentoML I get an error:
_catboost.CatBoostError: Bad value for num_feature[non_default_doc_idx=0,feature_idx=2]="Tertiary": Cannot convert 'b'Tertiary'' to float
The value is found in a column named 'road_type' and the model was trained using 'object' as the data type for the column.
If I try to give a float or an integer for the 'road_type' column, I get the following error:
_catboost.CatBoostError: catboost/libs/data/model_dataset_compatibility.cpp:53: Feature road_type is Categorical in model but marked different in the dataset
If someone has encountered the same issue and found a solution, I would appreciate it. Thanks!
I have tried different approaches to saving and loading the model, but unfortunately none of them worked.
You can try to explicitly pass the cat_features to the BentoML runner.
It would be something like this:
import bentoml
from catboost import Pool

runner = bentoml.catboost.get("bentoml_catboost_model:latest").to_runner()
cat_features = [2]  # indexes of your categorical feature columns
prediction = runner.predict.run(Pool(input_data, cat_features=cat_features))
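For context, here is a minimal sketch of how the input could be prepared so CatBoost treats the column as categorical rather than numeric (the DataFrame layout and the non-'road_type' column names are made up for illustration; only 'road_type' comes from the question, and `runner` is the runner defined above):

import pandas as pd
from catboost import Pool

# hypothetical single-row input; 'road_type' is the categorical column from the question
input_data = pd.DataFrame({
    "speed": [50.0],
    "length": [120.0],
    "road_type": ["Tertiary"],
})

# make sure the categorical column stays a string and flag it explicitly
input_data["road_type"] = input_data["road_type"].astype(str)
pool = Pool(input_data, cat_features=["road_type"])  # names or indexes both work

prediction = runner.predict.run(pool)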

Rollout summary statistics not being monitored for CustomEnv using Stable-Baselines3

I am trying to train a custom environment using PPO via Stable-Baselines3 and OpenAI Gym. For some reason the rollout statistics are not being reported for this custom environment when I try to train the PPO model.
The code that I am using is below (I have not included the code for CustomEnv for brevity):
env = CustomEnv(mode = "discrete")
env = Monitor(env, log_dir)
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log = log_dir)
timesteps = 5000
for i in range(3):
    model.learn(total_timesteps = timesteps, reset_num_timesteps = False, tb_log_name = "PPO")
    model.save(f"{models_dir}/car_model_{timesteps * i}")
Below is an image showing the output from the above code (on the right) next to the usual output from a dummy environment that I am using for debugging (on the left).
I have already tried adding the line of code:
env = Monitor(env, log_dir)
But that doesn't change the output.
SOLVED: There was an edge case where the environment was not ending, and the done variable remained False indefinitely.
After fixing this bug, the Rollout statistics reappeared.
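For anyone hitting the same symptom: SB3's Monitor wrapper only records episode reward and length when an episode actually ends, so rollout/ep_rew_mean never appears if done stays False forever. Below is a minimal sketch of a step() that guarantees termination via a step limit (the class body, spaces, and reward are placeholders, not the asker's actual environment):

import gym
import numpy as np

class CustomEnv(gym.Env):
    def __init__(self, mode="discrete", max_steps=200):
        super().__init__()
        self.action_space = gym.spaces.Discrete(3)
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.max_steps = max_steps
        self.step_count = 0

    def reset(self):
        self.step_count = 0
        return self.observation_space.sample()

    def step(self, action):
        self.step_count += 1
        obs = self.observation_space.sample()
        reward = 0.0  # placeholder reward
        # ensure the episode always ends so Monitor can log rollout statistics
        done = self.step_count >= self.max_steps
        return obs, reward, done, {}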

Resource Allocation for Incremental Pipelines

There are times when an incremental pipeline in Palantir Foundry has to be built as a snapshot. If the data size is large, the resources for the build are increased to reduce run time, and the configuration is removed after the first snapshot run. Is there a way to set conditional configuration? For example: if the pipeline is running in incremental mode, use the default resource allocation; if not, use a specified set of resources.
Example:
If the pipeline runs as a snapshot transaction, the configuration below has to be applied:
@configure(profile=["NUM_EXECUTORS_8", "EXECUTOR_MEMORY_MEDIUM", "DRIVER_MEMORY_MEDIUM"])
If it runs incrementally, then use the default one.
The @configure and @incremental decorators are evaluated during CI execution, while the actual code inside the function annotated by @transform_df or @transform runs at build time.
This means that you can't programmatically switch between them after CI has passed. What you can do, however, is keep a constant or configuration setting in your repo and switch at code level whenever you want to change this. Please make sure you understand how semantic versioning works before attempting this, i.e.:
from transforms.api import transform_df, configure, incremental, Input, Output

IS_INCREMENTAL = True
SEMANTIC_VERSION = 1

def mytransform(input1, input2):
    return input1.join(input2, "foo", "left")

if IS_INCREMENTAL:
    @incremental(semantic_version=SEMANTIC_VERSION)
    @transform_df(
        Output("foo"),
        input1=Input("bar"),
        input2=Input("foobar"))
    def compute(input1, input2):
        return mytransform(input1, input2)
else:
    @configure(profile=["NUM_EXECUTORS_8", "EXECUTOR_MEMORY_MEDIUM", "DRIVER_MEMORY_MEDIUM"])
    @transform_df(
        Output("foo"),
        input1=Input("bar"),
        input2=Input("foobar"))
    def compute(input1, input2):
        return mytransform(input1, input2)

How to safely clone a PyTorch module? Is creating a new one faster? #Pytorch

I need to repeatedly create some modules that are exactly the same.
In the following code, I fill two lists with N identical modules.
MIM_N_cell = []
MIM_S_cell = []
for i in range(self.num_layers - 1):
    new_MIM_N_cell = MIM_NS_cell(input_dim=self.hidden_dim,
                                 hidden_dim=self.hidden_dim,
                                 kernel_size=self.kernel_size,
                                 model_cfg=model_cfg)
    new_MIM_S_cell = MIM_NS_cell(input_dim=self.hidden_dim,
                                 hidden_dim=self.hidden_dim,
                                 kernel_size=self.kernel_size,
                                 model_cfg=model_cfg)
    MIM_N_cell.append(new_MIM_N_cell)
    MIM_S_cell.append(new_MIM_S_cell)
self.MIM_N_cell = nn.ModuleList(MIM_N_cell)
self.MIM_S_cell = nn.ModuleList(MIM_S_cell)
Should I use .clone() instead of creating a new module each time?
I guess cloning may be faster, but I am not aware of its side effects. If it is worth cloning, how do I do it safely (i.e. without changing the training/testing behaviour of the network)?
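For what it's worth, nn.Module itself has no .clone() method (tensors do); the usual way to duplicate an existing module is copy.deepcopy. A minimal sketch under the question's setup (MIM_NS_cell and its constructor arguments come from the code above) is below. Note that deep copies share the template's initial parameter values, so unlike freshly constructed cells they all start from the same initialization, which can change training behaviour; construction cost is rarely the bottleneck, so building each cell anew is usually the safer choice.

import copy
import torch.nn as nn

# build one template cell, then deep-copy it; each copy owns its own parameters
template = MIM_NS_cell(input_dim=self.hidden_dim,
                       hidden_dim=self.hidden_dim,
                       kernel_size=self.kernel_size,
                       model_cfg=model_cfg)

MIM_N_cell = [copy.deepcopy(template) for _ in range(self.num_layers - 1)]
MIM_S_cell = [copy.deepcopy(template) for _ in range(self.num_layers - 1)]

self.MIM_N_cell = nn.ModuleList(MIM_N_cell)
self.MIM_S_cell = nn.ModuleList(MIM_S_cell)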

Difference between freezing layer with requires_grad and not passing params to optim in PyTorch

Let's say I train an autoencoder.
I want to freeze the parameters of the encoder for the training, so only the decoder trains.
I can do this using:
# assuming it's a single layer called 'encoder'
model.encoder.weight.requires_grad = False
Or I can pass only the decoder's parameters to the optimizer. Is there a difference?
The most practical way is to iterate through all parameters of the module you want to freeze and set requires_grad to False. This gives you the flexibility to switch your modules on and off without having to initialize a new optimizer each time. You can do this using the parameters generator available on all nn.Modules:
for param in module.parameters():
    param.requires_grad = False
This method is model agnostic since you don't have to worry whether your module contains multiple layers or sub-modules.
Alternatively, you can call the function nn.Module.requires_grad_ once as:
module.requires_grad_(False)
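To make the practical difference concrete, here is a small sketch (the AutoEncoder class is a stand-in for the question's model, not code from it). With requires_grad_(False) the encoder's gradients are never computed, which also saves work in the backward pass; if you instead only hand the decoder's parameters to the optimizer, gradients for the encoder are still computed on backward() but never applied by optimizer.step():

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(10, 4)
        self.decoder = nn.Linear(4, 10)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()

# Option 1: freeze the encoder; no encoder gradients are computed at all
model.encoder.requires_grad_(False)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Option 2: leave requires_grad as-is, but only optimize the decoder's parameters;
# encoder gradients are still computed during backward(), just never applied
optimizer = torch.optim.Adam(model.decoder.parameters(), lr=1e-3)

Either way the encoder's weights stay fixed; the requires_grad route additionally avoids computing and storing the unused gradients.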