I am looking for a hyperparameter optimization (hpo) or neural architectural search (nas) framework/library such as ray tune or optuna that implements or allows to make use of a performance predictor (such as https://arxiv.org/abs/1705.10823 or similar) to speed up the hyperparameter search.
Related
Is there a value-based (Deep) Reinforcement Learning RL algorithm available that is centred fully around learning only the state-value function V(s), rather than to the state-action-value function Q(s,a)?
If not, why not, or, could it easily be implemented?
Any implementations even available in Python, say Pytorch, Tensorflow or even more high-level in RLlib or so?
I ask because
I have a multi-agent problem to simulate where in reality some efficient centralized decision-making that (i) successfully incentivizes truth-telling on behalf of the decentralized agents, and (ii) essentially depends on the value functions of the various actors i (on Vi(si,t+1) for the different achievable post-period states si,t+1 for all actors i), defines the agents' actions. From an individual agents' point of view, the multi-agent nature with gradual learning means the system looks non-stationary as long as training is not finished, and because of the nature of the problem, I'm rather convinced that learning any natural Q(s,a) function for my problem is significantly less efficient than learning simply the terminal value function V(s) from which the centralized mechanism can readily derive the eventual actions for all agents by solving a separate sub-problem based on all agents' values.
The math of the typical DQN with temporal difference learning seems to naturally be adaptable a state-only value based training of a deep network for V(s) instead of the combined Q(s,a). Yet, within the value-based RL subdomain, everybody seems to focus on learning Q(s,a) and I have not found any purely V(s)-learning algos so far (other than analytical & non-deep, traditional Bellman-Equation dynamic programming methods).
I am aware of Dueling DQN (DDQN) but it does not seem to be exactly what I am searching for. 'At least' DDQN has a separate learner for V(s), but overall it still targets to readily learn the Q(s,a) in a decentralized way, which seems not conducive in my case.
pretty new to deep learning, but couldn't seem to find/figure out what are backend weights such as
full_yolo_backend.h5
squeezenet_backend.h5
From what I have found and experimented, these backend weights have fundamentally different model architectures such as
yolov2 model has 40+ layers but the backend only 20+ layers (?)
you can build on top of the backend model with your own networks (?)
using backend models tend to yield poorer results (?)
I was hoping to seek some explanation on backend weights vs actual models for learning purposes. Thank you so much!
I'm note sure which implementation you are using but in many applications, you can consider a deep model as a feature extractor whose output is more or less task-agnostic, followed by a number of task-specific heads.
The choice of backend depends on your specific constraints in terms of tradeoff between accuracy and computational complexity. Examples of classical but time-consuming choices for backends are resnet-101, resnet-50 or VGG that can be coupled with FPN (feature pyramid networks) to yield multiscale features. However, if speed is your main concern then you can use smaller backends such as different MobileNet architectures or even the vanilla networks such as the ones used in the original Yolov1/v2 papers (tinyYolo is an extreme case).
Once you have chosen your backend (you can use a pretrained one), you can load its weights (that is what your *h5 files are). On top of that, you will add a small head that will carry the tasks that you need: this can be classification, bbox regression, or like in MaskRCNN forground/background segmentation. For Yolov2, you can just add very few, for example 3 convolutional layers (with non-linearities of course) that will output a tensor of size
BxC1xC2xAxP
#B==batch size
#C1==number vertical of cells
#C2==number of horizontal cells
#C3==number of anchors
#C4==number of parameters (i.e. bbx parameters, class prediction, confidence)
Then, you can just save/load the weights of this head separately. When you are happy with your results though, training jointly (end-to-end) will usually give you a small boost in accuracy.
Finally, to come back to your last questions, I assume that you are getting poor results with the backends because you are only loading backend weights but not the weights of the heads. Another possibility is that you are using a head trained with a backends X but that you are switching the backend to Y. In that case since the head expects different features, it's natural to see a drop in performance.
I am learning about the approach employed in Reinforcement Learning for robotics and I came across the concept of Evolutionary Strategies. But I couldn't understand how RL and ES are different. Can anyone please explain?
To my understanding, I know of two main ones.
1) Reinforcement learning uses the concept of one agent, and the agent learns by interacting with the environment in different ways. In evolutionary algorithms, they usually start with many "agents" and only the "strong ones survive" (the agents with characteristics that yield the lowest loss).
2) Reinforcement learning agent(s) learns both positive and negative actions, but evolutionary algorithms only learns the optimal, and the negative or suboptimal solution information are discarded and lost.
Example
You want to build an algorithm to regulate the temperature in the room.
The room is 15 °C, and you want it to be 23 °C.
Using Reinforcement learning, the agent will try a bunch of different actions to increase and decrease the temperature. Eventually, it learns that increasing the temperature yields a good reward. But it also learns that reducing the temperature will yield a bad reward.
For evolutionary algorithms, it initiates with a bunch of random agents that all have a preprogrammed set of actions it is going to do. Then the agents that has the "increase temperature" action survives, and moves onto the next generation. Eventually, only agents that increase the temperature survive and are deemed the best solution. However, the algorithm does not know what happens if you decrease the temperature.
TL;DR: RL is usually one agent, trying different actions, and learning and remembering all info (positive or negative). EM uses many agents that guess many actions, only the agents that have the optimal actions survive. Basically a brute force way to solve a problem.
I think the biggest difference between Evolutionary Strategies and Reinforcement Learning is that ES is a global optimization technique while RL is a local optimization technique. So RL can converge to a local optima converging faster while ES converges slower to a global minima.
Evolution Strategies optimization happens on a population level. An evolution strategy algorithm in an iterative fashion (i) samples a batch of candidate solutions from the search space (ii) evaluates them and (iii) discards the ones with low fitness values. The sampling for a new iteration (or generation) happens around the mean of the best scoring candidate solutions from the previous iteration. Doing so enables evolution strategies to direct the search towards a promising location in the search space.
Reinforcement learning requires the problem to be formulated as a Markov Decision Process (MDP). An RL agent optimizes its behavior (or policy) by maximizing a cumulative reward signal received on a transition from one state to another. Since the problem is abstracted as an MDP learning can happen on a step or episode level. Learning per step (or N steps) is done via temporal-Difference learning (TD) and per episode is done via Monte Carlo methods. So far I am talking about learning via action-value functions (learning the values of actions). Another way of learning is by optimizing the parameters of a neural network representing the policy of the agent directly via gradient ascent. This approach is introduced in the REINFORCE algorithm and the general approach known as policy-based RL.
For a comprehensive comparison check out this paper https://arxiv.org/pdf/2110.01411.pdf
Is there any easy way to merge properties of PPO with an A3C method? A3C methods run a number of parrel actors and optimize the parameters. I am trying to merge PPO with A3C.
PPO has a built-in mechanism(surrogate clipping objective function) to prevent large gradient updates & generally outperforms A3C on most continuous control environments.
In order for PPO to enjoy the benefits of parallel computing like A3C, Distributed PPO(DPPO) is the way to go.
Check out the links below to find out more information about DPPO.
Pseudo code from the original DeepMind paper
Original DeepMind paper: Emergence of Locomotion Behaviours in Rich Environments
If you plan to implement your DPPO code in Python with Tensorflow, I will suggest you to try Ray for the part on distributed execution.
Doing binary classification with infected/uninfected RBCs (something the pretrained DL models have never seen before) using models and weights from Keras. I find the performance of the models (vgg16,19,xception) decrease with increase in the number of training and validation instances. Why?
Maybe it is related to resource management where you are doing inference and the model expands in the memory and it can decrease the performance. This situation will create a lot of Main memory access to perform the forward pass computations and page faults are occurring and it can decrease the performance.
Hope this helps.