Neural Style Transfer with constraints - deep-learning

I want to implement Neural Style Transfer with constraints.
Constraints are usually handled in optimization problems with the help of a Lagrange multiplier: if I have a cost function, a constraint can be included by adding the equation of the constraint, multiplied by what is called a Lagrange multiplier, to the cost.
But how can I implement this in PyTorch or any other deep learning framework?
Hope someone can help.
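A common way to realize this in PyTorch is to treat the constraint as an extra term in the loss and optimize the image directly. Below is a minimal sketch, not a definitive implementation: base_loss stands in for the usual content/style terms, the equality constraint on the mean pixel value is purely hypothetical, and the multiplier is updated by dual ascent in an augmented-Lagrangian fashion.

```
import torch

# Hypothetical stand-in for the usual content + style losses.
content_target = torch.rand(1, 3, 128, 128)

def base_loss(img):
    # Placeholder for content_loss(img) + style_loss(img).
    return ((img - content_target) ** 2).mean()

def constraint(img):
    # Example equality constraint g(img) = 0: keep the mean pixel value at 0.5.
    return img.mean() - 0.5

img = content_target.clone().requires_grad_(True)  # image being optimized
lam = 0.0                                          # Lagrange multiplier estimate
rho = 10.0                                         # penalty weight / dual step size
optimizer = torch.optim.Adam([img], lr=0.01)

for step in range(500):
    optimizer.zero_grad()
    g = constraint(img)
    # Augmented-Lagrangian objective: base loss + lam * g + (rho / 2) * g^2
    loss = base_loss(img) + lam * g + 0.5 * rho * g ** 2
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        lam = lam + rho * constraint(img).item()   # dual ascent on the multiplier
```

In practice, many implementations skip the multiplier update and just keep a fixed penalty weight, which turns the hard constraint into a soft one.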

Related

Deep Value-only Reinforcement Learning: Train V(s) instead of Q(s,a)?

Is there a value-based (deep) reinforcement learning algorithm available that is centred fully around learning only the state-value function V(s), rather than the state-action-value function Q(s,a)?
If not, why not? Or could it easily be implemented?
Are any implementations available in Python, say PyTorch or TensorFlow, or something more high-level such as RLlib?
I ask because
I have a multi-agent problem to simulate in which, in reality, the agents' actions are defined by an efficient centralized decision-making mechanism that (i) successfully incentivizes truth-telling on behalf of the decentralized agents, and (ii) essentially depends on the value functions of the various actors i (on V_i(s_{i,t+1}) for the different achievable post-period states s_{i,t+1} of all actors i). From an individual agent's point of view, the multi-agent nature with gradual learning means the system looks non-stationary as long as training is not finished, and because of the nature of the problem I'm rather convinced that learning any natural Q(s,a) function is significantly less efficient than simply learning the terminal value function V(s), from which the centralized mechanism can readily derive the eventual actions of all agents by solving a separate sub-problem based on all agents' values.
The math of the typical DQN with temporal-difference learning seems naturally adaptable to state-only, value-based training of a deep network for V(s) instead of the combined Q(s,a). Yet, within the value-based RL subdomain, everyone seems to focus on learning Q(s,a), and I have not found any purely V(s)-learning algorithms so far (other than analytical, non-deep, traditional Bellman-equation dynamic programming methods).
I am aware of Dueling DQN (DDQN), but it does not seem to be exactly what I am searching for. At least DDQN has a separate stream that learns V(s), but overall it still aims to learn Q(s,a) in a decentralized way, which seems not conducive in my case.
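For illustration only, a purely V(s)-based TD(0) update with a small PyTorch network could look like the sketch below; the state dimension, network size, and learning rate are assumptions, not part of any published algorithm.

```
import torch
import torch.nn as nn

# Hypothetical state-value network V(s); the input dimension 4 is an assumption.
value_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-3)
gamma = 0.99

def td0_update(state, reward, next_state, done):
    """One TD(0) step: move V(s) towards r + gamma * V(s').
    All arguments are tensors with a leading batch dimension."""
    v = value_net(state)
    with torch.no_grad():
        target = reward + gamma * (1.0 - done) * value_net(next_state)
    loss = (target - v).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example call with random data just to show the shapes.
s, s2 = torch.rand(32, 4), torch.rand(32, 4)
r, d = torch.rand(32, 1), torch.zeros(32, 1)
td0_update(s, r, s2, d)
```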

Why we use identity layer in ResNet?

As I understand it, ResNet has identity layers whose task is to produce an output that is the same as the layer's input. But what is the use of this? What is the benefit of adding layers like this?
Any help will be appreciated
The sole purpose of the ResNet architecture was to fix the problem of degrading/saturating accuracy in deeper networks, caused primarily by vanishing gradients. Identity layers, or skip connections, help prevent this problem because it is very easy for a layer to learn the identity function, where the input equals the output, i.e. f(x) = x. ResNet performed a lot better than other architectures, and one reason, as explained by Andrew Ng in his course, is that skip connections learn the function f(x) = x very easily, and if you are lucky they sometimes learn that function plus other features, which helps the network extract the final features.
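To make the f(x) = x point concrete, here is a minimal sketch of a residual block in PyTorch; the channel sizes and layer choices are assumptions, not the exact blocks from the ResNet paper.

```
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        # Skip connection: if the conv weights go to zero, the block
        # reduces to the identity f(x) = x, which is easy to learn.
        return self.relu(out + x)

block = ResidualBlock(16)
y = block(torch.rand(1, 16, 32, 32))  # output shape equals input shape
```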

How do shared parameters in actor-critic models work?

Hello StackOverflow Community!
I have a question about Actor-Critic Models in Reinforcement Learning.
While watching the Berkeley lectures on policy gradient methods, it is said that in actor-critic algorithms, where we optimize both the policy with some policy parameters and the value function with some value-function parameters, some algorithms (e.g. A2C/A3C) use the same parameters in both optimization problems (i.e. policy parameters = value-function parameters).
I could not understand how this works. I was thinking that we should optimize them separately. How does this shared-parameter solution help us?
Thanks in advance :)
You can do it by sharing some (or all) layers of their networks. If you do, however, you are assuming that there is a common state representation (the intermediate layer output) that is optimal w.r.t. both. This is a very strong assumption and it usually doesn't hold. It has been shown to work for learning from images, where you put (for instance) an autoencoder on top of both the actor and the critic networks and train it using the sum of their loss functions.
This is mentioned in the PPO paper (just before Eq. (9)). However, they say that they share layers only for learning Atari games, not for continuous control problems. They don't say why, but this can be explained as above: Atari games have a low-dimensional state representation that is optimal for both the actor and the critic (e.g., the encoded image learned by an autoencoder), while for continuous control you usually pass a low-dimensional state (coordinates, velocities, ...) directly.
A3C, which you mentioned, was also used mostly for games (Doom, I think).
From my experience, in control tasks sharing layers never worked if the state is already compact.
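As an illustration of the layer-sharing idea, an actor-critic with a shared trunk could be sketched as below; the dimensions, hidden size, and loss weights are assumptions rather than the exact A2C/A3C or PPO architectures.

```
import torch
import torch.nn as nn

class SharedActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        # Shared layers: both heads rely on this common state representation.
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)  # actor: action logits
        self.value_head = nn.Linear(hidden, 1)           # critic: V(s)

    def forward(self, obs):
        features = self.trunk(obs)
        return self.policy_head(features), self.value_head(features)

net = SharedActorCritic(obs_dim=4, n_actions=2)
logits, value = net(torch.rand(1, 4))
# Training typically minimizes policy_loss + c1 * value_loss - c2 * entropy,
# so gradients from both heads flow into the shared trunk.
```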

Inverted Pendulum: model-based or model-free?

This is my first post here, and I came here to discuss or get clarifications on something that I have trouble understanding, namely model-free vs model-based RL methods. I am currently implementing Q-learning, but am not certain I am doing it correctly.
Example: Say I am applying Q-learning to an inverted pendulum, where the reward is given as the absolute distance between the pendulum and the upward position, and the terminal state (or goal state) is defined to be when the pendulum is very close to the upward position.
Would this setup mean that I have a model-free or a model-based setup? From how I have understood it, this would be model-based, as I have a model of the environment that is giving me the reward (R = abs(pos - wantedPos)). But then I saw an implementation of this using Q-learning (https://medium.com/@tuzzer/cart-pole-balancing-with-q-learning-b54c6068d947), which is a model-free algorithm. Now I am clueless...
Thankful for all responses.
Vanilla Q-learning is model-free.
The idea behind reinforcement learning is that an agent is trained to learn an optimal policy from the states and rewards it observes; this is in contrast to trying to model the environment.
If you took a model-based approach, you would be trying to model the environment and ultimately perform value iteration or policy iteration on the Markov decision process.
In reinforcement learning, it is assumed you do not have the MDP, and thus must try to find an optimal policy based on the various rewards you receive from your experiences.
For a longer explanation, check out this post.
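To see why Q-learning is model-free, note that its update uses only sampled transitions (s, a, r, s') and never a transition model P(s'|s,a). A minimal tabular sketch, with the table size and learning rates as assumptions:

```
import numpy as np

n_states, n_actions = 100, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99

def q_update(s, a, r, s_next):
    # Only the observed transition is used; no model of the dynamics appears.
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

q_update(0, 1, 1.0, 5)  # example call with an arbitrary observed transition
```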

How to handle uncertainty in position?

I am working on a car-following problem, and the measurements I am receiving are uncertain (I know that the noise model is Gaussian and its variance is also known). How do I select my next action under this kind of uncertainty?
Basically, how should I change my cost function so that I can optimize my plan by selecting an appropriate action?
Vanilla reinforcement learning is meant for Markov decision processes, where it is assumed that you can fully observe the state. Because your states are noisy, you have a partially observable Markov decision process (POMDP). Theoretically speaking, you should be looking at a different category of RL approaches.
Practically, since you have so much information about the parameters of the uncertainty, you should consider using a Kalman or particle filter to perform state estimation. Then, use the most likely state estimate as the true state in your RL problem. The estimate will be wrong at times, of course, but if you're using a function approximation approach for the value function, the experience can generalize across similar states and you'll be able to learn. The learning performance is going to be proportional to the quality of your state estimate.
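As a concrete illustration of the filtering step, a 1-D position/velocity Kalman filter could look like the sketch below; the constant-velocity dynamics, time step, and process noise are assumptions, and only the measurement variance is taken as known.

```
import numpy as np

dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])  # assumed constant-velocity dynamics
H = np.array([[1.0, 0.0]])             # we only measure position
Qn = 1e-4 * np.eye(2)                  # assumed process noise covariance
R = np.array([[0.25]])                 # known measurement noise variance

x = np.zeros((2, 1))                   # state estimate [position, velocity]
P = np.eye(2)                          # estimate covariance

def kalman_step(z):
    """Predict, then correct with the noisy position measurement z."""
    global x, P
    # Predict
    x = F @ x
    P = F @ P @ F.T + Qn
    # Update
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (np.array([[z]]) - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x.flatten()  # feed this estimate to the RL agent as the "true" state

state_estimate = kalman_step(1.02)  # example noisy position measurement
```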