stable-baselines3, gym: train while also calling step/predict - reinforcement-learning

With stable-baselines3, given an agent, we can call "action, _states = agent.predict(obs)". And then with Gym, this would be "new_obs, reward, done, info = env.step(action)" (more or less; I may have missed an input or an output).
We also have "agent.learn(10_000)" as an example, but there we are less involved in the process and don't call the environment ourselves.
I'm looking for a way to train the agent while still calling "env.step" myself. If you wonder why: I'm trying to implement self-play (the agent against a previous version of itself) in a single environment (for example a turn-based game such as chess).
WKR, Oren.

But why do you need it? If you take a look at the implementation of any learn method, you will see it is nothing more than a loop over time steps that calls collect_rollouts and train, plus some logging and some setup at the beginning (e.g., for saving the agent later). Your env.step is called inside collect_rollouts.
I'd rather suggest writing a callback based on CheckpointCallback, which saves your agent (model) every N training steps, and attaching this callback to your learn call. In your environment, you could then instantiate a "new previous" version of your model every N steps by calling ModelClass.load(file) on the file saved by the callback, and use that loaded model to select the other player's actions, which gives you self-play inside the environment.
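To make the idea concrete, here is a minimal, library-free sketch of the pattern. ToyAgent, SelfPlayEnv and this learn function are hypothetical stand-ins, not stable-baselines3 classes; copy.deepcopy plays the role of "save a checkpoint, then ModelClass.load(file)". In real code the snapshot would come from the file written by the checkpoint callback.

```python
import copy
import random


class ToyAgent:
    """Hypothetical stand-in for a model: predict() plus a one-step train()."""

    def __init__(self):
        self.version = 0  # bumped on every training update

    def predict(self, obs):
        # A real model would use obs; this toy just picks a random move.
        return random.choice([0, 1])

    def train(self):
        self.version += 1


class SelfPlayEnv:
    """Toy two-player env: the opponent's move is chosen inside step()."""

    def __init__(self, refresh_every):
        self.refresh_every = refresh_every
        self.opponent = None
        self.steps = 0

    def set_opponent(self, agent):
        # Stand-in for loading a saved checkpoint: freeze a copy of the agent.
        self.opponent = copy.deepcopy(agent)

    def step(self, action):
        self.steps += 1
        opponent_action = self.opponent.predict(obs=None)
        reward = 1.0 if action != opponent_action else 0.0
        done = self.steps % 10 == 0
        return None, reward, done, {"opponent_version": self.opponent.version}


def learn(agent, env, total_timesteps):
    """Same shape as a learn loop: act, step the env, train, refresh opponent."""
    env.set_opponent(agent)
    versions_seen = []
    for t in range(1, total_timesteps + 1):
        action = agent.predict(obs=None)
        _, _, _, info = env.step(action)
        versions_seen.append(info["opponent_version"])
        agent.train()
        if t % env.refresh_every == 0:
            env.set_opponent(agent)  # the "checkpoint + reload" moment
    return versions_seen


versions = learn(ToyAgent(), SelfPlayEnv(refresh_every=5), total_timesteps=12)
print(versions)
```

The opponent stays frozen between refreshes, so the learning agent always plays against a slightly older copy of itself.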

Related

Discord.py assistance - How does the library forcibly call an asynchronous command function?

I have used many discord API wrappers, but as an experienced python developer, unfortunately I somehow still do not understand how a command gets called!
@client.command()
async def demo(ctx):
    channel = ctx.channel
    await channel.send('Demonstration')
Above, a command (a function) has been created, and it is placed after its decorator @client.command().
To my understanding, the decorator is, in a way, a "check" performed before running the function (demo), but I do not understand how the discord.py library seemingly "calls" the demo function. Is there some form of short/long polling system in the locally imported discord.py library which polls the Discord API, receives a list of jobs/messages and checks these against the functions the user has created?
I would love to know how this works, as I don't understand what "calls" the functions that the user makes, and this would allow me to make my own wrapper for another similar social media platform! Many thanks in advance.
I am trying to work out how functions created by the user are seemingly "called" by the discord.py library. I have worked with the discord.py wrapper and other API wrappers before.
(See source code attached at the bottom of the answer)
The @bot.command() decorator adds a command to the internal lists/mappings of commands stored in the Bot instance.
Whenever a message is received, this runs through Bot.process_commands. It can then look through every command stored to check if the message starts with one of them (prefix is checked beforehand). If it finds a match, then it can invoke it (the underlying callback is stored in the Command instance).
If you've ever overridden an on_message event and your commands stopped working, then this is why: that method is no longer being called, so it no longer tries to look through your commands to find a match.
This uses a dictionary to make it far more efficient: instead of having to iterate over every single command and alias available, it only has to look up the invoked name (the first word after the prefix) in that dictionary.
The commands.command() decorator used in Cogs works slightly differently. It turns your function into a Command instance, and when adding a cog (using Bot.add_cog()) the library checks every attribute to see if any of them are Command instances.
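The register-then-dispatch mechanism described above can be sketched in a few lines. This is a simplified illustration, not discord.py's actual code: MiniBot and its methods are hypothetical names, and the message is passed in as a plain string instead of arriving as an event from Discord.

```python
import asyncio


class MiniBot:
    """Hypothetical miniature of the registry + dispatch idea."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.commands = {}  # command name -> coroutine function

    def command(self, name=None):
        # Calling bot.command() returns the real decorator, which stores the
        # function in the registry and hands it back unchanged.
        def decorator(func):
            self.commands[name or func.__name__] = func
            return func
        return decorator

    async def process_commands(self, message):
        # 1. Check the prefix first.
        if not message.startswith(self.prefix):
            return None
        # 2. Take the first word after the prefix and look it up.
        invoked, _, rest = message[len(self.prefix):].partition(" ")
        callback = self.commands.get(invoked)
        if callback is None:
            return None
        # 3. Invoke the stored callback.
        return await callback(rest)


bot = MiniBot(prefix="!")


@bot.command()
async def demo(args):
    return f"Demonstration {args}".strip()


reply = asyncio.run(bot.process_commands("!demo"))
ignored = asyncio.run(bot.process_commands("hello"))
print(reply, ignored)
```

Nothing "forces" the call: the decorator only stores a reference, and the dispatcher awaits that reference when an incoming message matches.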
References to source code
GroupMixin.command() (called when you use @client.command()): https://github.com/Rapptz/discord.py/blob/24bdb44d54686448a336ea6d72b1bf8600ef7220/discord/ext/commands/core.py#L1493
As you can see, it calls add_command() internally to add it to the list of commands.
Adding commands (GroupMixin.add_command()): https://github.com/Rapptz/discord.py/blob/24bdb44d54686448a336ea6d72b1bf8600ef7220/discord/ext/commands/core.py#L1315
Bot.process_commands(): https://github.com/Rapptz/discord.py/blob/master/discord/ext/commands/bot.py#L1360
You'll have to follow the chain - most of the processing actually happens in get_context which tries to create a Context instance out of the message: https://github.com/Rapptz/discord.py/blob/24bdb44d54686448a336ea6d72b1bf8600ef7220/discord/ext/commands/bot.py#L1231
commands.command(): https://github.com/Rapptz/discord.py/blob/master/discord/ext/commands/core.py#L1745

Accessing regmap RegFields

I am trying to find a clean way to access the regmap that is used with *RegisterNode for creating documentation and testing files. The TLRegisterNode has methods for generating the json through some Annotations. These are done in the regmap method by adding them to the ElaborationArtefacts object. Other protocols don't seem to have these annotations.
Is there any way to iterate over the "regmap" Register Fields during or after elaboration?
I cannot just access the regmap as it's not really a val/var since it's a method. I can't quite figure out where this information is being stored. I don't really believe it's actually "storing" any information as much as it is simply creating the hardware to attach the specified logic to the RegisterNode based logic.
The JSON output is actually fine for me as I could just write a post processing script to convert JSON to my required formats, but I'm wondering if I can access this information OR if I could add a custom function call at the end. I cannot extend the case class *RegisterNode, but I'm not sure if it's possible to add custom functions to run at the end of the regmap method.
Here is something I threw together quickly:
// in *RegisterRouter.scala
def customregmap(customFunc: (RegField.Map*) => Unit, mapping: RegField.Map*) = {
  regmap(mapping:_*)
  customFunc(mapping:_*)
}
def regmap(mapping: RegField.Map*) = {
  // normal stuff
}
A user could then create a custom function to run and pass it to the regmap or to the RegisterRouter
def myFunc(mapping: RegField.Map*): Unit = {
  println("I'm doing my custom function for regmap!")
}
// ...
node.customregmap(myFunc,
  0x0 -> coreControlRegFields,
  0x4 -> fdControlRegFields,
  0x8 -> fdControl2RegFields,
)
This is just a quick example I have. I believe what would be better, if something like this were possible, would be to have a Seq of functions that could be added to the RegisterNode and that are run at the end of the regmap method, similar to how TLRegisterNode currently works. That way a user could add an arbitrary number of functions and still use the regular regmap call.
Background (not directly part of question):
I have a unified register script that I have built over the years in which I describe the registers for a particular IP. It works very similar to the RegField/node.regmap, except it obviously doesn't know about diplomacy and the like. It will generate the Verilog, but also a variety of files for DV (basic `defines for simple verilog simulations and more complex uvm_reg_block defines also with the ability to describe multiple of the IPs for a subsystem all the way up to an SoC level). It will also print out C Header files for SW and Sphinx reStructuredText for documentation.
Diplomacy actually solves one of the main issues I've been dealing with so I'm obviously trying to push most of my newer designs to Chisel/Diplo.
I ended up solving this by creating my own RegisterNode which is the same as the rocketchip RegisterNodes except that I use a different Elaboration Artifact to grab the info and store it for later.

Force lex to elicit specific slot

I have multiple slots in my intent. Is it possible to force Lex to elicit a specific slot using the AWS SDK PostText call, and not worry about the priority of slots?
Example:
pizzaordering - intent
toppings - slot
pizzasize - slot
cheesequanity - slot
pizzaquantity - slot
When I post "25" to Lex, I want it to match pizzaquantity instead of cheesequanity.
Eliciting a specific slot needs to happen after Lex processes the input and sends the Event/Request to your Lambda Function during the "initialization and validation" code hook.
Without a Lambda Function, Lex will only Delegate which slots to elicit based on which ones are checked as required.
So to have more control like this, you will need a Lambda Function. You will want to read the Lambda Function Input Event and Response Formats. That shows you how Lex will pass the processed user input to your Lambda Function, and how to respond in certain ways so that you can tell Lex what to do next, such as ElicitSlot
To be clear, this is not done with the PostText API.
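As a rough sketch of what such a code hook could look like, assuming the Lex V1 Lambda event and response format and the slot names from the question (the prompt text and the rule "ask for pizzaquantity first" are made up for illustration):

```python
def elicit_slot(event, slot_to_elicit, prompt):
    """Build a Lex V1 response that asks the user for one specific slot."""
    intent = event["currentIntent"]
    return {
        "sessionAttributes": event.get("sessionAttributes") or {},
        "dialogAction": {
            "type": "ElicitSlot",
            "intentName": intent["name"],
            "slots": intent["slots"],
            "slotToElicit": slot_to_elicit,
            "message": {"contentType": "PlainText", "content": prompt},
        },
    }


def lambda_handler(event, context):
    slots = event["currentIntent"]["slots"]
    # Assumption for this sketch: always ask for pizzaquantity first,
    # regardless of the slot priority configured in the console.
    if event["invocationSource"] == "DialogCodeHook" and slots.get("pizzaquantity") is None:
        return elicit_slot(event, "pizzaquantity", "How many pizzas would you like?")
    # Otherwise let Lex decide the next step.
    return {
        "sessionAttributes": event.get("sessionAttributes") or {},
        "dialogAction": {"type": "Delegate", "slots": slots},
    }


# Hand-written sample event, trimmed to the fields the handler reads.
sample_event = {
    "invocationSource": "DialogCodeHook",
    "sessionAttributes": None,
    "currentIntent": {
        "name": "pizzaordering",
        "slots": {
            "toppings": None,
            "pizzasize": None,
            "cheesequanity": None,
            "pizzaquantity": None,
        },
    },
}
response = lambda_handler(sample_event, None)
print(response["dialogAction"]["slotToElicit"])
```

Because Lex is now explicitly waiting for pizzaquantity, the next user input (such as "25") is interpreted as a value for that slot.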
If you are already using a Lambda Function, you can post the code you are using and say where you want it to elicit a specific slot; then I can offer a more specific solution. If you don't use a Lambda Function yet, try to set one up and you may see how to use ElicitSlot yourself.
If you run into more problems, just ask another question.

Lambda function calling another Lambda function

I want to create a Lambda function that runs through S3 files and if needed triggers other Lambda functions to parse the files in parallel.
Is this possible?
Yes it's possible. You would use the AWS SDK (which is included in the Lambda runtime environment for you) to invoke other Lambda functions, just like you would do in code running anywhere else.
You'll have to specify which language you are writing the Lambda function in if you want a more detailed answer.
If I understand your problem correctly, you want one lambda that goes through a list of files in an S3 bucket. Some condition will decide whether a file should be parsed or not. For the files that should be parsed, you want another 'file-parsing' lambda to parse those files.
To do this you will need two lambdas - one 'S3 reader' and one 'S3 file parser'.
For triggering the 'S3 file parser' lambda you have a few different options. Here are two:
Trigger it using an SNS topic. (Here is an article on how to do that). If you have a very long list of files this might be an issue, as you will most likely exceed the number of instances of a lambda that can run in parallel.
Trigger it by invoking it with the AWS SDK. (See the article 'Leon' posted as comment to see how to do that.) What you need to consider here is that a long list of files might cause the 'S3 reader' lambda that controls the invocation to time out, since a lambda has a maximum runtime (5 minutes when this was written; the limit has since been raised to 15 minutes).
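A minimal Python sketch of the second option, assuming a parser function named "s3-file-parser" and a made-up rule that only .csv files need parsing (both are hypothetical). The fan-out helper takes the Lambda client as a parameter, so it can be tried with a recording stub before it is wired to boto3:

```python
import json


def should_parse(key):
    # Hypothetical condition: only CSV files need parsing.
    return key.endswith(".csv")


def fan_out(keys, lambda_client, parser_name="s3-file-parser"):
    """Asynchronously invoke the parser lambda once per matching key."""
    invoked = []
    for key in keys:
        if not should_parse(key):
            continue
        lambda_client.invoke(
            FunctionName=parser_name,
            InvocationType="Event",  # fire and forget: do not wait for the result
            Payload=json.dumps({"key": key}).encode("utf-8"),
        )
        invoked.append(key)
    return invoked


def lambda_handler(event, context):
    # boto3 is imported here so the helper above can be tested without AWS.
    import boto3

    s3 = boto3.client("s3")
    # list_objects_v2 returns at most 1000 keys per call; a real reader
    # would paginate.
    listing = s3.list_objects_v2(Bucket=event["bucket"])
    keys = [obj["Key"] for obj in listing.get("Contents", [])]
    return fan_out(keys, boto3.client("lambda"))


class RecordingClient:
    """Stub used only for a local dry run; records invoke() calls."""

    def __init__(self):
        self.calls = []

    def invoke(self, **kwargs):
        self.calls.append(kwargs)


stub = RecordingClient()
dispatched = fan_out(["a.csv", "notes.txt", "b.csv"], stub)
print(dispatched)
```

With InvocationType="Event" the reader does not wait for each parser to finish, which helps keep it within its own runtime limit.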
Depending on the actual use case another potential solution is to just have one lambda that gets triggered when a file gets uploaded to the S3 bucket and let it decide whether it should get parsed or not and then parse it if needed. More info about how to do that can be found in this article and this tutorial.

Making a custom reporter for JSCS results in Gulp4

Please correct me where I'm wrong (still learning Gulp, Streams, etc.). I'd like to create a custom reporter for my gulp-jscs results. For example, let's say I have 3 files in my gulp.src() stream. To my knowledge, each is piped one at a time through jscs, which attaches a .jscs object onto the file with its results; one property of that object is .errorCount.
What I'd like to do is have a variable I create, e.g. maxErrors, which I set to, say, 5. Since we're processing 3 files, let's say the first file passes with 0 errors, but the next has 3 errors. I don't want to prematurely stop processing, since the maxErrors tally has not been reached (3/5 currently). So it should continue to process the next file, which let's say has 3 errors as well, putting us over our max. At that point I want to interrupt jscs from processing more files, fail out, and let my custom reporter function access the files that have been processed, so I can look at their .jscs objects and customize some output.
My problem here is that I don't understand the docs when they say: .pipe(jscs.reporter('name-of-reporter')). How does a string value invoke my reporter (which currently exists as a function I've imported called libs.reporters.myJSCSReporter)? I know pipe() expects Stream objects, so I can't just put a function in the .pipe() call.
I hope I've explained myself well enough (please ask for clarifications otherwise).