Get mean/variance from gym wrapper after training - reinforcement-learning

I am using gym.wrappers.NormalizeObservation for my training, and it works well. However, since I am using other wrappers on top and also have a vectorised environment, I cannot find a way to retrieve the mean and variance after training is finished, which means I cannot test the model properly. Is there any way around this?
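One possible workaround, sketched below under the assumption that the wrapper keeps its running statistics in an obs_rms attribute with mean and var fields (true for both gym's and gymnasium's NormalizeObservation at the time of writing): walk down the wrapper chain until you reach the NormalizeObservation instance. TimeLimit here merely stands in for whatever wrappers you have on top, and for a vectorised environment the same walk can be applied to the vector wrapper stack.

    import numpy as np
    import gymnasium as gym  # the same approach works with classic `gym`

    def find_wrapper(env, wrapper_type):
        # Walk down the wrapper chain; each wrapper keeps its inner env in `.env`.
        current = env
        while current is not None:
            if isinstance(current, wrapper_type):
                return current
            current = getattr(current, "env", None)
        raise ValueError(f"{wrapper_type.__name__} not found in the wrapper stack")

    def normalize(obs, mean, var, eps=1e-8):
        # Apply the frozen statistics at test time, mimicking the wrapper.
        return (obs - mean) / np.sqrt(var + eps)

    env = gym.make("CartPole-v1")
    env = gym.wrappers.NormalizeObservation(env)
    env = gym.wrappers.TimeLimit(env, max_episode_steps=500)  # wrappers on top

    # ... training happens here ...

    norm = find_wrapper(env, gym.wrappers.NormalizeObservation)
    mean, var = norm.obs_rms.mean, norm.obs_rms.var  # statistics to save for testing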

Related

Custom translator - Model adjustment after training

I've used three parallel sentence files to train my custom translator model, with no dictionary files and no tuning files either. After training finished and I checked the test results, I wanted to make some adjustments to the model. Here are several questions:
Is it possible to tune the model after training? Am I right that the model can't be changed and the only way is to train a new model?
The best approach to adjusting the model is to use tune files. Is it correct?
There is no way to see an autogenerated tune file, so I have to provide my own tuning file for a more manageable tuning process. Is it so?
Could you please describe how the tuning file is generated when I have three sentence files with different numbers of sentences: 55k, 24k and 58k lines? Are all tuning sentences taken from the first file, or from all three files in proportion to their size? Which logic is used?
I wish there were more authoritative answers on this; I'll share what I know as a fellow user.
What Microsoft Custom Translator calls "tuning data" is what is normally known as a validation set. It's just a way to avoid overfitting.
Is it possible to tune the model after training? Am I right that the model can't be changed and the only way is to train a new model?
Yes, with Microsoft Custom Translator you can only train a model based on the generic category you have selected for the project.
(With Google AutoML technically you can choose to train a new model based on one of your previous custom models. However, it's also not usable without some trial and error.)
The best approach to adjusting the model is to use tune files. Is it correct?
It's hard to make a definitive statement on this. The training set also has an effect. A good validation set on top of a bad training set won't get us good results.
There is no way to see an autogenerated tune file, so I have to provide my own tuning file for a more manageable tuning process. Is it so?
Yes, it seems to me that if you let it decide how to split your data into training, tuning and test sets, you can only download the training set and the test set.
Maybe neither includes the tuning set, so theoretically you can diff them. But that doesn't solve the problem of the split being different between different models.
... Which logic is used?
Good question.

TRPO - RL: I need to get an 8-DOF robot arm to move to a specified point using OpenAI Gym with a Gazebo environment

TRPO - RL: I need to get an 8-DOF robot arm to move to a specified point, and I need to implement the TRPO RL code using OpenAI Gym. I already have the Gazebo environment, but I am unsure how to write the code for the reward functions and the algorithm for joint-space motion. Please help.
Reward
Gazebo should be able to tell you the position of the end-effector link, from which we can calculate the progress made towards the specified point after each step (i.e. a positive reward if moving towards the goal, a negative one if moving away, and 0 otherwise).
This alone should encourage the end-effector towards the goal.
You may want to confirm that the system is able to learn with just this basic reward before considering other criteria such as smoothness (avoiding jerky motions), handedness (positioning the elbows on the left/right), etc.
These are significantly harder to specify and will have to be hand-designed according to your needs, possibly based on the joint states and/or some other derivatives that are available in your environment.
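For illustration, a minimal sketch of that progress-based reward, assuming goal_pos and the end-effector positions are 3-D coordinates you read out of Gazebo each step (the function and variable names, and the success threshold, are hypothetical):

    import numpy as np

    def progress_reward(prev_ee_pos, ee_pos, goal_pos, success_radius=1e-2):
        # Reward = reduction in end-effector-to-goal distance this step:
        # positive when moving towards the goal, negative when moving away,
        # and zero when holding still.
        prev_dist = np.linalg.norm(np.asarray(goal_pos) - np.asarray(prev_ee_pos))
        dist = np.linalg.norm(np.asarray(goal_pos) - np.asarray(ee_pos))
        done = dist < success_radius  # terminate once close enough to the goal
        return prev_dist - dist, done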
Motion
This will largely depend on your stack.
I am adding this part just as a passing comment, but, for instance, if you are using ROS as your middleware, then you can easily integrate MoveIt to handle all the movement for you.

Camera image recognition with small sample set

I need to visually recognise some flat pictures shown to a camera. There are not many of them (maybe 30), but discrimination may depend on details. The input may be partly obscured or shadowed and is subject to lighting changes.
The samples need to be updatable.
There are many existing frameworks for object detection, with the most reliable ones depending on deep learning methods (mostly convolutional networks). However, the pretrained models are of course not well optimised to discern flat imagery, and even if I start training from scratch, updating the system for new samples would require a cumbersome training process, if I am right about how this works.
Is it possible to use deep learning while still keeping the sample pool flexible?
Is there any other well known reliable method to detect images from a small sample set?
One can use a well-trained network for visual classification, such as Inception or SqueezeNet, slice off the last layer(s), and add a simple statistical algorithm (for example k-nearest neighbours) that can be taught directly from the samples in a non-iterative fashion.
Most classification-related work, such as insensitivity to lighting and orientation, is then already handled by the pre-trained network, while its output keeps enough information to let a statistical algorithm decide the image class.
An implementation using k-nearest neighbours is shown at https://teachablemachine.withgoogle.com/ ; the source is hosted at https://github.com/googlecreativelab/teachable-machine .
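As a rough sketch of this recipe (here with Keras's pre-trained InceptionV3 standing in for the networks named above, and with reference_images, reference_labels and camera_frame as hypothetical arrays you would supply):

    import numpy as np
    import tensorflow as tf
    from sklearn.neighbors import KNeighborsClassifier

    # Pre-trained network with the classification head sliced off;
    # pooling="avg" collapses the last feature map into one embedding per image.
    base = tf.keras.applications.InceptionV3(
        weights="imagenet", include_top=False, pooling="avg"
    )

    def embed(images):
        # images: float array of shape (n, 299, 299, 3) with values in [0, 255].
        x = tf.keras.applications.inception_v3.preprocess_input(images)
        return base.predict(x, verbose=0)

    # "Training" is just storing embeddings, so adding or replacing one of the
    # ~30 reference pictures needs no gradient descent; the pool stays flexible.
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(embed(reference_images), reference_labels)

    prediction = knn.predict(embed(camera_frame[np.newaxis]))  # one camera frame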
Use transfer learning; you'll still need to build a training set, but you'll get better results than starting with random weights. Try to find a model trained on images similar to yours. You might also do some black-box testing of the selected model with your curated images to baseline its response curve to your images.

Trying to build a deep learning model that plays the Google dinosaur game on its own; it always predicts the same class

Here is the GitHub link to the code. Basically, I generated training and test images.
The training images consist of pictures where the dinosaur jumped and pictures where it did not jump, in two separate directories. Then I made a simple sequential model in Keras consisting of two convolutional layers and trained it on my data.
However, when I test it on my test images, it predicts the same class every time.
print(model.predict_proba(x)) always gives 0. I can't seem to figure out the cause.
Please help me resolve this issue.
PS: I know it has already been done by many. However I wanted to try doing so on my own.
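For reference, a minimal sketch of the kind of model described (input size, layer widths and training-call details are assumptions, not taken from the post). In this game setting, a network that always predicts the same class is often a symptom of class imbalance, since "no jump" frames vastly outnumber "jump" frames:

    import tensorflow as tf

    # Two convolutional layers and a binary "jump / no jump" output.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(80, 160, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # P(jump); act if > 0.5
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

    # If "jump" examples are rare, class_weight makes the model pay more
    # attention to them instead of collapsing to the majority class:
    # model.fit(x_train, y_train, epochs=10, class_weight={0: 1.0, 1: 5.0})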

Deep neural network combined with Q-learning

I'm using joint positions from a Kinect camera as my state space, but I think it's going to be too large (25 joints × 30 frames per second) to feed directly into SARSA or Q-learning.
Right now I'm using the Kinect Gesture Builder program, which uses supervised learning to associate user movement with specific gestures. But that requires supervised training, which I'd like to move away from. I figure the algorithm might pick up certain associations between joints that I would miss when classifying the data myself (hands up, step left, step right, for example).
I think feeding that data into a deep neural network and then passing that into a reinforcement learning algorithm might give me a better result.
There was a paper on this recently. https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
I know Accord.NET has both deep neural networks and RL, but has anyone combined them? Any insights?
If I understand correctly from your question and comment, what you want is an agent that performs discrete actions using visual input (raw pixels from a camera). This looks exactly like what the DeepMind guys recently did, extending the paper you mentioned. Have a look at this. It is the newer (and better) version of playing Atari games. They also provide an official implementation, which you can download here.
There is even an implementation in Neon which works pretty well.
Finally, if you want to use continuous actions, you might be interested in this very recent paper.
To recap: yes, somebody has combined DNN + RL, it works, and if you want to use raw camera data to train an agent with RL, this is definitely one way to go :)
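To make the combination concrete, here is a bare-bones sketch of the Q-learning half in PyTorch (the state and action dimensions are assumptions for a Kinect joint-position setup, and a real DQN adds experience replay and a target network, both of which matter a lot in practice):

    import torch
    import torch.nn as nn

    STATE_DIM, N_ACTIONS, GAMMA = 75, 4, 0.99  # e.g. 25 joints x 3 coordinates

    # Q-network: maps a joint-position state to one Q-value per action.
    q_net = nn.Sequential(
        nn.Linear(STATE_DIM, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, N_ACTIONS),
    )
    optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

    def td_update(state, action, reward, next_state, done):
        # One Q-learning step on a single transition (no replay buffer here).
        q = q_net(state)[action]
        with torch.no_grad():
            target = reward + GAMMA * q_net(next_state).max() * (1.0 - done)
        loss = nn.functional.mse_loss(q, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # e.g. td_update(torch.randn(75), 2, 1.0, torch.randn(75), 0.0)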