How could I create a module with learnable parameters when one set of parameters is from Dataset class - deep-learning

So, I have a model with some parameters that I need to train. There are also some parameters in my class Dataset(torch.utils.data.Dataset) which do some preprocessing, and I need to train them together with the model's parameters. Can you please let me know if what I am doing is correct:
params = list(model.parameters())
params.extend(list(Dataset.parameters()))
opt = torch.optim.Adam(params,lr=1e-4)
One more question. Since the Dataset class will generate both train_ds and val_ds, I only need to train the parameters of the Dataset class when generating train_ds; for val_ds, I need to use the already-trained parameters of the Dataset. So how do I create train_ds and val_ds? Should I initially create both of them, train the model with train_ds, and then create them again and use val_ds for testing?

The Dataset class does not have .parameters - it is only used to handle data.
Your model class is derived from nn.Module, which does have .parameters and can be trained.
It seems like you need to move the trainable preprocessing into your model, rather than have it as part of your dataset class.
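A minimal sketch of that idea (the LearnablePreprocess module and its single scale parameter are made up for illustration): keep the Dataset purely for data loading and make the trainable preprocessing the first layer of the model, so its parameters are already included in model.parameters() and the very same trained parameters are applied when you run the validation set through the model.

import torch
import torch.nn as nn

class LearnablePreprocess(nn.Module):
    """Hypothetical trainable preprocessing, here just a learnable scale factor."""
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return x * self.scale

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.preprocess = LearnablePreprocess()  # trained together with the rest
        self.backbone = nn.Linear(10, 1)         # placeholder for your real model

    def forward(self, x):
        return self.backbone(self.preprocess(x))

model = Model()
# model.parameters() already contains the preprocessing parameters,
# so a single optimizer covers everything:
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
# For validation, model.eval() reuses the same (already trained) preprocessing.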

Related

BentoML - Serving a CatBoostClassifier with cat_features

I am trying to create a BentoML service for a CatBoostClassifier model that was trained using a column as a categorical feature. If I save the model and try to make some predictions with the saved model (not as a BentoML service), all works as expected, but when I create the service using BentoML I get an error:
_catboost.CatBoostError: Bad value for num_feature[non_default_doc_idx=0,feature_idx=2]="Tertiary": Cannot convert 'b'Tertiary'' to float
The value is found in a column named 'road_type' and the model was trained using 'object' as the data type for the column.
If I try to give a float or an integer for the 'road_type' column I get the following error
_catboost.CatBoostError: catboost/libs/data/model_dataset_compatibility.cpp:53: Feature road_type is Categorical in model but marked different in the dataset
If someone has encountered the same issue and found a solution I would appreciate it. Thanks!
I have tried different approaches for saving the model or loading the model, but unfortunately it did not work.
You can try to explicitly pass the cat_features to the bentoml runner.
It would be something like this:
import bentoml
from catboost import Pool

runner = bentoml.catboost.get("bentoml_catboost_model:latest").to_runner()
cat_features = [2]  # specify your cat_features indexes
prediction = runner.predict.run(Pool(input_data, cat_features=cat_features))
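If you need this inside the service itself, a rough sketch of how that could look with the BentoML 1.x service API is below; the service name, the API signature, and the feature index 2 for 'road_type' are assumptions, not taken from your setup:

import bentoml
import pandas as pd
from bentoml.io import NumpyNdarray, PandasDataFrame
from catboost import Pool

model_runner = bentoml.catboost.get("bentoml_catboost_model:latest").to_runner()
svc = bentoml.Service("catboost_classifier", runners=[model_runner])

CAT_FEATURES = [2]  # assumed position of 'road_type' in the input frame

@svc.api(input=PandasDataFrame(), output=NumpyNdarray())
def predict(df: pd.DataFrame):
    # Wrap the raw frame in a Pool so CatBoost treats 'road_type' as categorical
    return model_runner.predict.run(Pool(df, cat_features=CAT_FEATURES))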

Add a TensorBoard metric from my PettingZoo environment

I'm using Tensorboard to see the progress of the PettingZoo environment that my agents are playing. I can see the reward go up with time, which is good, but I'd like to add other metrics that are specific to my environment. i.e. I'd like TensorBoard to show me more charts with my metrics and how they improve over time.
The only way I could figure out how to do that was by inserting a few lines into the learn method of OnPolicyAlgorithm that's part of SB3. This works and I got the charts I wanted:
(The two bottom charts are the ones I added.)
But obviously editing library code isn't a good practice. I should make the modifications in my own code, not in the libraries. Is there currently a more elegant way to add a metric from my PettingZoo environment into TensorBoard?
You can add a callback to add your own logs. See the example below. In this case the callback is called every step. There are other callback hooks you can use depending on your use case.
import numpy as np
from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import BaseCallback

model = SAC("MlpPolicy", "Pendulum-v1", tensorboard_log="/tmp/sac/", verbose=1)

class TensorboardCallback(BaseCallback):
    """
    Custom callback for plotting additional values in tensorboard.
    """
    def __init__(self, verbose=0):
        super(TensorboardCallback, self).__init__(verbose)

    def _on_step(self) -> bool:
        # Log scalar value (here a random variable)
        value = np.random.random()
        self.logger.record('random_value', value)
        return True

model.learn(50000, callback=TensorboardCallback())
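Building on the example above, if the value you want comes from your PettingZoo environment rather than a random number, one option (a sketch, assuming your environment exposes an attribute called episode_score) is to read it from the vectorized environment inside the callback:

class EnvMetricCallback(BaseCallback):
    def _on_step(self) -> bool:
        # self.training_env is the VecEnv wrapping your environment;
        # get_attr reads the named attribute from every sub-environment.
        scores = self.training_env.get_attr("episode_score")  # hypothetical attribute
        self.logger.record("env/episode_score", float(np.mean(scores)))
        return True

model.learn(50000, callback=EnvMetricCallback())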

Difference between freezing layer with requires_grad and not passing params to optim in PyTorch

Let's say I train an autoencoder.
I want to freeze the parameters of the encoder for the training, so only the decoder trains.
I can do this using:
# assuming it's a single layer called 'encoder'
model.encoder.weight.requires_grad = False
Or I can pass only the decoder's parameters to the optimizer. Is there a difference?
The most practical way is to iterate through all parameters of the module you want to freeze and set requires_grad to False. This gives you the flexibility to switch your modules on and off without having to initialize a new optimizer each time. You can do this using the parameters() generator available on all nn.Modules:
for param in module.parameters():
    param.requires_grad = False
This method is model agnostic since you don't have to worry whether your module contains multiple layers or sub-modules.
Alternatively, you can call the function nn.Module.requires_grad_ once as:
module.requires_grad_(False)
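To the original question of whether there is a difference: only leaving the encoder's parameters out of the optimizer (sketched below for an autoencoder with encoder and decoder submodules) also prevents them from being updated, but autograd still computes and stores gradients for them on every backward pass, whereas requires_grad = False skips that work entirely.

import torch

# Variant: freeze by omission. Only the decoder's parameters are handed to the
# optimizer, so the encoder is never updated, but its gradients are still
# computed during backward.
optimizer = torch.optim.Adam(model.decoder.parameters(), lr=1e-3)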

Multiplayer game using pygame [duplicate]

We are working on a Top-Down-RPG-like Multiplayer game for learning purposes (and fun!) with some friends. We already have some Entities in the Game and Inputs are working, but the network implementation gives us headaches :D
The Issues
When trying to convert with dict, some values will still contain the pygame.Surface, which I don't want to transfer, and it causes errors when trying to convert them to JSON. Other objects I would like to transfer in a simplified way, like Rectangle, cannot be converted automatically.
Already functional
Client-Server connection
Transfering JSON objects in both directions
Async networking and synchronized putting into a Queue
Situation
A new player connects to the server and wants to get the current game state with all objects.
Data-Structure
We use a "Entity-Component" based architecture, so we separated the game logic very strictly into "systems", while the data is stored in the "components" of each Entity. The Entity is a very simple container and has nothing more than a ID and a list of components. Example Entity (shorten for better readability):
Entity
|-- Component (Moveable)
|-- Component (Graphic)
| |- complex datatypes like pygame.SURFACE
| `- (...)
`- Component (Inventory)
We tried different approaches, but none seems to fit very well, or they feel "hacky".
pickle
Very Python-specific, so not easy to implement clients in other languages in the future. And I've read about security risks when creating objects from network data in the dynamic way pickle offers. It does not even solve the Surface/Rectangle issue.
__dict__
Still contains references to the original objects, so a "cleanup" or "filter" for unwanted datatypes also affects the originals. A deepcopy throws an exception:
...\Python\Python36\lib\copy.py", line 169, in deepcopy
rv = reductor(4)
TypeError: can't pickle pygame.Surface objects
Show some code
The method of the EntityManager class which should generate the snapshot of all entities, including their components. This snapshot should be converted to JSON without any errors - and if possible without much configuration in this core class.
class EntityManager:
    def generate_world_snapshot(self):
        """ Returns a dictionary with all Entities and their components to send
        this to the client. This function will probably generate a lot of data,
        but it's to send the whole current game state when a new player
        connects or when a complete refresh is required """
        # It should be possible to add more objects to the snapshot, so we
        # create our own Snapshot-Datastructure
        result = {'entities': {}}
        entities = self.get_all_entities()
        for e in entities:
            result['entities'][e.id] = deepcopy(e.__dict__)
            # Components are objects, but a dictionary is required for transfer
            cmp_obj_list = result['entities'][e.id]['components']
            # Empty the current list of components; it is going to be filled with
            # dictionaries of each cmp which are cleaned for the dump, because
            # of the errors when directly converting the whole datastructure to JSON
            result['entities'][e.id]['components'] = {}
            for cmp in cmp_obj_list:
                cmp_copy = deepcopy(cmp)
                cmp_dict = cmp_copy.__dict__
                # Only list, dict, int, str, float and None will stay, while
                # other types are simply deleted including their key.
                # Lists and dictionaries are cleaned up recursively as well
                cmp_dict = self.clean_complex_recursive(cmp_dict)
                result['entities'][e.id]['components'][type(cmp_copy).__name__] \
                    = cmp_dict
        logging.debug("EntityMgr: Entity#3: %s" % result['entities'][3])
        return result
Expectation and actual results
We can find a way to manually override elements which we don't want. But as the list of components increases, we would have to put all the filter logic into this core class, which should not contain any component specializations.
Do we really have to put all the logic into the EntityManager for filtering the right objects? This does not feel good, as I would like to have all conversion to JSON done without any hardcoded configuration.
How can we convert all this complex data in the most generic way?
Thanks for reading so far and thank you very much for your help in advance!
Interesting articles which we were already working through, and maybe helpful for others with similar issues:
https://gafferongames.com/post/what_every_programmer_needs_to_know_about_game_networking/
http://code.activestate.com/recipes/408859/
https://docs.python.org/3/library/pickle.html
UPDATE: Solution - thanks to sloth
We ended up with the following architecture, which works really well so far and is also easy to maintain!
The EntityManager now calls the get_state() function of each entity.
class EntityManager:
    def generate_world_snapshot(self):
        """ Returns a dictionary with all Entities and their components to send
        this to the client. This function will probably generate a lot of data,
        but it's to send the whole current game state when a new player
        connects or when a complete refresh is required """
        # It should be possible to add more objects to the snapshot, so we
        # create our own Snapshot-Datastructure
        result = {'entities': {}}
        entities = self.get_all_entities()
        for e in entities:
            result['entities'][e.id] = e.get_state()
        return result
The Entity has only some basic attributes to add to the state and forwards the get_state() call to all the Components:
class Entity:
    def get_state(self):
        state = {'name': self.name, 'id': self.id, 'components': {}}
        for cmp in self.components:
            state['components'][type(cmp).__name__] = cmp.get_state()
        return state
The components themselves now inherit their get_state() method from their new superclass Component, which simply takes care of all simple datatypes:
class Component:
    def __init__(self):
        logging.debug('generic component created')

    def get_state(self):
        state = {}
        for attr, value in self.__dict__.items():
            if value is None or isinstance(value, (str, int, float, bool)):
                state[attr] = value
            elif isinstance(value, (list, dict)):
                # logging.warn("Generating state: not supporting lists yet")
                pass
        return state

class GraphicComponent(Component):
    # (...)
    pass
Now every developer has the option to override this method to create a more detailed get_state() for complex types directly in the component classes (like Graphic, Movement, Inventory, etc.) if the state needs to be saved in a more accurate way - which is a huge win for maintaining the code in the future, as these code pieces live in one class.
The next step is to implement a static method for creating the components from the state in the same class. This makes all of this work really smoothly.
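For illustration, a sketch of what that could look like for one of our components; the MovementComponent name and its fields are made up, following the get_state()/from_state() pattern that sloth suggests below:

class MovementComponent(Component):
    def __init__(self, speed=0.0, direction=0.0):
        super().__init__()
        self.speed = speed
        self.direction = direction

    # get_state() is inherited from Component, since both attributes are simple floats

    @staticmethod
    def from_state(state):
        # rebuild the component on the client from the transferred dictionary
        return MovementComponent(speed=state['speed'], direction=state['direction'])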
Thank you so much sloth for your help.
Do we really have to put all the logic into the EntityManager for filtering the right objects?
No, you should use polymorphism.
You need a way to represent your game state in a form that can be shared between different systems; so maybe give your components a method that will return all of their state, and a factory method that allows you create the component instances out of that very state.
(Python already has the __repr__ magic method, but you don't have to use it)
So instead of doing all the filtering in the entity manager, just let it call this new method on all components and let each component decide what the result will look like.
Something like this:
...
result = {'entities': {}}
entities = self.get_all_entities()
for e in entities:
    result['entities'][e.id] = {'components': {}}
    for cmp in e.components:
        result['entities'][e.id]['components'][type(cmp).__name__] = cmp.get_state()
...
And a component could implement it like this:
class GraphicComponent:
    def __init__(self, pos=...):
        self.image = ...
        self.rect = ...
        self.whatever = ...

    def get_state(self):
        return {'pos_x': self.rect.x, 'pos_y': self.rect.y, 'image': 'name_of_image.jpg'}

    @staticmethod
    def from_state(state):
        return GraphicComponent(pos=(state['pos_x'], state['pos_y']), ...)
And a client's EntityManager that receives the state from the server would iterate over the component list of each entity and call from_state to create the instances.
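A rough sketch of the round trip under that scheme; the COMPONENT_TYPES registry and the way the rebuilt component is attached to a client-side entity are assumptions for illustration:

import json

# Server side: serialize the snapshot produced by the EntityManager
snapshot_json = json.dumps(entity_manager.generate_world_snapshot())

# Client side: rebuild the components from the received state
COMPONENT_TYPES = {'GraphicComponent': GraphicComponent}  # hypothetical registry of component classes
state = json.loads(snapshot_json)
for entity_id, entity_state in state['entities'].items():
    for cmp_name, cmp_state in entity_state['components'].items():
        component = COMPONENT_TYPES[cmp_name].from_state(cmp_state)
        # ...attach the component to the client-side entity created for entity_id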

kotlin, how to simplify passing parameters to base class constructor?

We have a package that we are looking to convert to kotlin from python in order to then be able to migrate systems using that package.
Within the package there are a set of classes that are all variants, or 'flavours' of a common base class.
Most of the code is in the base class which has a significant number of optional parameters. So consider:
open class BaseTree(val height: Int = 10, val roots: Boolean = true, // ...... lots more!!

class FruitTree(val fruitSize: Int, height: Int = 10, roots: Boolean = true,
    // now need all possible parameters for any possible instance
) : BaseTree(height = height, roots = roots // ... yet another variation of the same list
The code is not actually trees, I just thought this was a simple way to convey the idea. There are about 20 parameters to the base class, and around 10 subclasses, and each subclass effectively needs to repeat the same two variations of the parameter list from the base class. A real nightmare if the parameter list ever changes!
Those from a Java background may comment "20 parameters is too many" and miss that these are optional parameters, a language feature which impacts this aspect of the design. 20 required parameters would be crazy, but 10 or even 20 optional parameters is not so uncommon; check sqlalchemy Table for example.
In Python, to call a base class constructor you can have:
def __init__(self, special, *args, **kwargs):
    super().__init__(*args, **kwargs)  # pass all parameters except special to the base constructor
Does anyone know a technique, using a different method (perhaps using interfaces or something?) to avoid repeating this parameter list over and over for each subclass?
There is no design pattern to simplify this use case.
Best solution: refactor the code to use a more Java-like approach: use properties in place of optional parameters.
Use case explained: a widely used class or method having numerous optional parameters is simply not practical in Java, and Kotlin has largely evolved as a way of making Java code better. A Python class with 5 optional parameters, translated to Java with no optional parameters, could need up to 2^5 = 32 different Java signatures to cover every combination of supplied options... in other words, a mess.
Obviously no object should routinely be instantiated with a huge parameter list, so such Python classes normally only evolve when the majority of calls do not need to specify these optional parameters, and the optional parameters are for the exceptional cases. The actual use case here is the implementation of a large number of optional parameters, where it should be very rare for any individual object to be instantiated using more than 3 of them. So a class with 10 optional parameters that is used 500 times in an application would still expect at most 3 of the optional parameters to be used for any one instance. But this is simply a design approach that is not workable in Java, no matter how often the class is reused.
In Java, functions do not have optional parameters, which means this case, where an object is instantiated in this way, simply could never happen in a Java library.
Consider an object with one mandatory constructor parameter and five possible options. In Java these options would each be properties settable via setters; objects would be instantiated, and the setter(s) called to change any relevant option from its default value.
The downside is that these options can no longer be set from the constructor and kept immutable, but the resulting code avoids the long optional parameter lists.
Another approach is to have a group of less 'swiss army knife' objects, with a set of specialised tools replacing the one do-it-all tool, even when the code could be seen as just slightly different nuances of the same theme.
Despite the support for optional parameters in Kotlin, the inheritance structure in Kotlin is not yet optimised for heavy use of this feature.
You can skip the names, like BaseTree(height, roots), by putting the arguments in order, but you cannot do what Python does, because Python is a dynamic language.
It is normal in Java, too, that you have to pass the variables to the super class:
FruitTree(int fruitSize, int height, boolean root) {
    super(height, root);
}
There are about 20 parameters to the base class, and around 10 subclasses
This is most likely a problem with your class design.
Reading your question I started to experiment myself and this is what I came up with:
interface TreeProperties {
    val height: Int
    val roots: Boolean
}

interface FruitTreeProperties : TreeProperties {
    val fruitSize: Int
}

fun treeProps(height: Int = 10, roots: Boolean = true) = object : TreeProperties {
    override val height = height
    override val roots = roots
}

fun TreeProperties.toFruitProperty(fruitSize: Int): FruitTreeProperties = object : FruitTreeProperties, TreeProperties by this {
    override val fruitSize = fruitSize
}

open class BaseTree(val props: TreeProperties)

open class FruitTree(props: FruitTreeProperties) : BaseTree(props)

fun main(args: Array<String>) {
    val largeTree = FruitTree(treeProps(height = 15).toFruitProperty(fruitSize = 5))
    val rootlessTree = BaseTree(treeProps(roots = false))
}
Basically I define the parameters in an interface and extend the interface for sub-classes using the delegate pattern. For convenience I added functions to generate instances of those interfaces, which also use default parameters.
I think this achieves the goal of not repeating parameter lists quite nicely, but it also has its own overhead. Not sure if it is worth it.
If your subclass really has that many parameters in the constructor, there is no way around that. You need to pass them all.
But (mostly) it's not a good sign when a constructor/function has that many parameters...
You are not alone on this. That is already discussed on the gradle-slack channel. Maybe in the future, we will get compiler-help on this, but for now, you need to pass the arguments yourself.