Why G_{t+1} = v(S_{t+1}) in the Bellman Equation for MRPs? - reinforcement-learning

In <Lecture 2: Markov Decision Processes> by David Silver on page 19, it has the following derived formula:

v(s) = E[R_{t+1} + γ v(S_{t+1}) | S_t = s]

I found it is equal to

v(s) = E[R_{t+1} + γ G_{t+1} | S_t = s]

which means G_{t+1} = v(S_{t+1}), so G_t = v(S_t).
According to the definition of the return:

G_t = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ...

and according to G_t = v(S_t):

v(S_t) = G_t = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ...

But the definition of the value function is

v(s) = E[G_t | S_t = s]

which means

v(s) = E[G_t | S_t = s] = G_t

which is absolutely wrong.
My questions are:
Why does G_{t+1} = v(S_{t+1})?
Where are my derivation mistakes?

The first big error is the claim that E[a + b] = E[a + c] implies b = c; this is not how expectations work. In particular, E[a + b] = E[a] + E[b] and E[a + c] = E[a] + E[c], thus we have E[b] = E[c] (and not b = c!). So G_{t+1} is not equal to v(S_{t+1}); rather E[G_{t+1} | S_{t+1}] = v(S_{t+1}) (which comes from the definition).
In general, equality of function values does not make the arguments equal. In the same way, f(x + a) = f(x + b) does not imply a = b: for f(x) = x², say, it also holds for x = 0, a = -1, b = 1.
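For completeness, here is the step the slide actually takes, written out as a short LaTeX derivation; the key move is the tower property of conditional expectation (a standard identity, not anything specific to this lecture):

\begin{aligned}
v(s) &= \mathbb{E}[G_t \mid S_t = s] \\
     &= \mathbb{E}[R_{t+1} + \gamma G_{t+1} \mid S_t = s] \\
     &= \mathbb{E}\bigl[ R_{t+1} + \gamma\, \mathbb{E}[G_{t+1} \mid S_{t+1}] \bigm| S_t = s \bigr] \\
     &= \mathbb{E}[R_{t+1} + \gamma\, v(S_{t+1}) \mid S_t = s]
\end{aligned}

So G_{t+1} may be replaced by v(S_{t+1}) only inside the outer expectation; the two quantities are never equal pointwise.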


Matlab function in fmincon with fixed value

I have a simple model where I want to minimize the RMSE between my dependent variable y and my model values. The model is: y = alpha + beta'*x.
For minimization, I am using Matlab's fmincon function and am struggling with multiplying my parameter p(2) by x.
MWE:
% data
y = [5.072, 7.1588, 7.263, 4.255, 6.282, 6.9118, 4.044, 7.2595, 6.898, 4.8744, 6.5179, 7.3434, 5.4316, 3.38, 5.464, 5.90, 6.80, 6.193, 6.070, 5.737]
x = [18.3447, 79.86538, 85.09788, 10.5211, 44.4556, 69.567, 8.960, 86.197, 66.857, 16.875, 52.2697, 93.971, 24.35, 5.118, 25.126, 34.037, 61.4445, 42.704, 39.531, 29.988]
% initial values
p_initial = [0, 0];
% function: SEE BELOW
objective = @(p) sqrt(mean((y - y_mod(p)).^2));
% optimization
[param_opt, fval] = fmincon(objective, p_initial)
If I specify my function as follows then it works.
y_mod = @(p) p(1) + p(2).*x
However, it does not work if I use the following code. How can I multiply p(2) by x, where x is not optimized because its values are given?
function f = y_mod(p)
f = p(1) + p(2).*x
end
Here is the output from a script that has the function declaration:
>> modelFitExample2a
RMS Error=0.374, intercept=4.208, slope=0.0388
And here is code for the above. It has many commented lines because it includes alternate ways to fit the data: an inline declaration of y_mod(), or a multi-line declaration of y_mod(), or no y_mod() at all. This version uses the multi-line declaration of y_mod().
%modelFitExample2a.m WCR 2021-01-19
%Reply to stack exchange question on parameter fitting
clear;
global x %needed if y_mod() is defined separately; in that case y_mod() must also declare x global
% data
y = [5.0720, 7.1588, 7.2630, 4.2550, 6.2820, 6.9118, 4.0440, 7.2595, 6.8980, 4.8744...
6.5179, 7.3434, 5.4316, 3.3800, 5.4640, 5.9000, 6.8000, 6.1930, 6.0700, 5.7370];
x = [18.3447,79.8654,85.0979,10.5211,44.4556,69.5670, 8.9600,86.1970,66.8570,16.8750,...
52.2697,93.9710,24.3500, 5.1180,25.1260,34.0370,61.4445,42.7040,39.5310,29.9880];
% initial values
p_initial = [0, 0];
%predictive model with parameter p
%y_mod = @(p) p(1) + p(2)*x;
% objective function
%If you use y_mod(), then you must define it somewhere
objective = @(p) sqrt(mean((y - y_mod(p)).^2));
%objective = @(p) sqrt(mean((y-p(1)-p(2)*x).^2));
% optimization
options = optimset('Display','Notify');
[param_opt, fval] = fmincon(objective,p_initial,[],[],[],[],[],[],[],options);
% display results
fprintf('RMS Error=%.3f, intercept=%.3f, slope=%.4f\n',...
fval,param_opt(1),param_opt(2));
%function declaration: predictive model
%This is an alternative to the inline definition of y_mod() above.
function f = y_mod(p)
global x
f = p(1) + p(2)*x;
end
carl,
The second method, in which you declare y_mod() explicitly (at the end of your script, or in a separate file y_mod.m), does not work because y_mod() does not know what x is. Fix it by declaring x global in the main program at the top, and by declaring x global in y_mod().
%function declaration
function f = y_mod(p)
global x
f = p(1) + p(2)*x;
end
Of course you don't need y_mod() at all. The code also works if you use the following, and in this case, no global x is needed:
% objective function
objective = @(p) sqrt(mean((y-p(1)-p(2)*x).^2));
By the way, you don't need to multiply with .* in y_mod. You may use *, because you are multiplying a scalar by a vector.
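A further alternative (not from the original answer; just a sketch) is to avoid global entirely by making x an explicit argument of y_mod() and letting the anonymous objective capture x from the workspace:

% Sketch: parameterize y_mod by x instead of using a global.
% Assumes y and x are the data vectors defined in the script above.
y_mod = @(p, x) p(1) + p(2)*x;                      % model as a two-argument handle
objective = @(p) sqrt(mean((y - y_mod(p, x)).^2));  % x is captured here by the closure
[param_opt, fval] = fmincon(objective, [0, 0], [], [], [], [], [], [], [], ...
    optimset('Display','Notify'));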

Where is the error in my code below for the approximate solution of a Poisson boundary value problem?

I have to solve the following boundary value problem:

-d²u/dx² + 6u = (-4x² - 6)exp(x²), u(-1) = 0, u(1) = 0

It is also defined in my Matlab code below, but my code doesn't work: I don't get the approximate solution of my system.
I want to know where the problem in my code is, or whether the version of Matlab I have simply can't run the kind of function I have used. Thanks.
Explanation of the method I have used: I have used the finite element method (the Galerkin method), based on assembling elementary matrices into the global stiffness matrix. I multiplied the system by a weight function which satisfies the boundary conditions, then integrated over the elements (integration of the elementary matrices over the range ]-1,1[). I have four elementary matrices. For more information about the method I used, please check this paper (pages 6-8).
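For reference, a sketch of the weak form being assembled (standard Galerkin reasoning for this problem, with test functions w vanishing at x = ±1), written in LaTeX:

\int_{-1}^{1} u'(x)\, w'(x)\, dx + 6 \int_{-1}^{1} u(x)\, w(x)\, dx = \int_{-1}^{1} (-4x^2 - 6)\, e^{x^2}\, w(x)\, dx

For linear elements of length h, the first integral yields the standard element stiffness matrix (1/h)[1 -1; -1 1] and the second the element mass matrix (h/6)[2 1; 1 2]; comparing these against Ke1 and Ke2 in the code below is a good first consistency check.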
Note: The error I have got upon running my code is
The current use of "MatElt2Nd" is inconsistent with it previous use or definition in line 7
Code
function [U] = EquaDiff2(n)
% ----------------------------------
% -d²u/dx² + 6*u = (-4*x^2-6)exp(x^2)
% u(-1) = 0 u(1)= 0
%----------------------------------
function [Ke, Fe] = MatElt2Nd(x1,x2)
% declaration of the function,
% function computing the elementary matrix and elementary vector (for assembly)
% ----------------------------------
x = [-1:2/n:1]'; % modification of the integration bounds
K = zeros(n+1) ;
F = zeros(n+1,1) ;
for i = 1:n
j = i+1;
t = [i j];
x1 = x(i);
x2 = x(j);
[Ke,Fe] = MatElt2Nd(x1,x2);
K(t,t) = K(t,t) + Ke;
F(t) = F(t) + Fe;
end;
K(1,:) = [];
K(:,1) = [];
F(1) = [];
U = K\F;
U = [0.0;U];
t = 0:0.01:1;
return
%-------------------------------------------
% calculation of matrix Ke and vector Fe
%-------------------------------------------
function [Ke,Fe] = MatElt2Nd0(x1,x2)
% newly named nested function is introduced
Ke1 = 1/(x2-x1)*[ 1 -1 % no modification done
-1 1 ] ; % essentially only the elementary
Ke2 =(x2-x1)* [ 2 1 % matrices
1 2 ] ;
N = [(x-x2)/(x1-x2) (x-x1)/(x2-x1)] % shape functions
Fe = simple( int(N' * (-4*x^2-6)*exp(x^2) , x, x1, x2) ) % vector Fe
Ke = Ke1 + 6*Ke2 ;
return
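A note on the quoted error (a sketch of the likely structural fix, not a tested solution): the header function [Ke, Fe] = MatElt2Nd(x1,x2) placed immediately after the EquaDiff2 line opens a second function before the main body, while the loop calls MatElt2Nd and the later definition is named MatElt2Nd0, so MATLAB sees inconsistent uses of the name. One consistent restructuring uses a single name and a separate local function; note that numerical quadrature via integral() and the (h/6)-scaled element mass matrix are my substitutions, not the original code's choices:

function U = EquaDiff2(n)
% -d²u/dx² + 6*u = (-4*x^2-6)exp(x^2), u(-1) = u(1) = 0
x = (-1:2/n:1)';
K = zeros(n+1);
F = zeros(n+1,1);
for i = 1:n
    t = [i i+1];
    [Ke, Fe] = MatElt2Nd(x(i), x(i+1));
    K(t,t) = K(t,t) + Ke;
    F(t) = F(t) + Fe;
end
% impose u(-1) = 0 and u(1) = 0 at both boundary nodes
K([1 end],:) = []; K(:,[1 end]) = [];
F([1 end]) = [];
U = [0; K\F; 0];
end

function [Ke, Fe] = MatElt2Nd(x1, x2)
h  = x2 - x1;
Ke = (1/h)*[1 -1; -1 1] + 6*(h/6)*[2 1; 1 2];   % stiffness + 6*(element mass)
f  = @(xq) (-4*xq.^2 - 6).*exp(xq.^2);          % right-hand side
N1 = @(xq) (xq - x2)/(x1 - x2);                 % linear shape functions
N2 = @(xq) (xq - x1)/(x2 - x1);
Fe = [integral(@(xq) N1(xq).*f(xq), x1, x2);
      integral(@(xq) N2(xq).*f(xq), x1, x2)];
end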
Edit: I have got a general code for that, but I can't make the changes in the general code that are needed to solve my system. Any help?
General Code
% au'(x)+bu"(x)=0 for 0<=x<=d
% BC: u(0)=0 and u(d)=h
%==============================================================
% ======Example======
% Finding an approximate solution to the following BVP using 4 elements of
% equal length.
% u'(x)-u"(x)=0 : 0<=x<=1
% BC: u(0)=0 and u(1)=1
% Solution:
% >> Galerkin(4,1,-1,1,1)
% ==============================================================
% The output of this program is
% 1- The approximate solution (plotted in blue)
% 2- The exact solution (plotted in red)
% 3- The percentage error (plotted in magenta)
%=======================Program Begin==========================
function Galerkin(ne1,a,b,d,h) % Declare function
clc % Clear command window
% Define the Coefficients of the exact solution
% The Exact solution is : u(x)=C1+C2*exp(-ax/b)
% where C2=h/(exp(-a*d/b)-1)and C1=-C2
C2=h/((exp(-a*d/b))-1);
C1=-C2;
% Define element length
le = d/ne1;
% Define x matrix
x = zeros (ne1+1,1); %
for i=2:ne1 +1
x(i,1) = x(i-1,1)+le;
end
% K1 matrix corresponding to the diffusion term (u"(x))
K1 = (b/le) * [1,-1;-1,1]
% K2 matrix corresponding to the convection term (u'(x))
K2 = a*[-1/2 1/2;-1/2 1/2]
% Element stiffness Matrix
Ke = K1+K2
% Global stiffness matrix
%********************Begin Assembly***************************
k = zeros(ne1+1);
for i=1:ne1+1
for j=1:ne1 +1
if (i==j)
if(i==1)
k(i,j)=Ke(1,1);
elseif(i==ne1+1)
k(i,j)=Ke(2,2);
else
k(i,j)=Ke(1,1)+Ke(2,2);
end
elseif(i==j+1)
k(i,j)=Ke(1,2);
elseif(j==i+1)
k(i,j)=Ke(2,1);
else
k(i,j)=0;
end
end
end
%********************End Assembly*****************************
%The Global f Matrix
f = zeros(ne1+1,1);
%BC apply u(0) = 0
f(1,1) = 0;
%BC apply u(d) = h
f(ne1+1,1) = h;
% Display the Global stiffness matrix before striking row
K_Global=k
%Striking first row (u1=0)
k(1,1) = 1;
for i=2:ne1+1
k(1,i) = 0;
k(ne1+1,i) = 0;
end
k(ne1+1,ne1+1) = 1;
% Display the solvable stiffness matrix
K_strike=k
%solving the result and finding the displacement matrix, {u}
u=inv(k)*f
hold on
% ======Calculating Approximate Solution and plotting============
syms X
U_sym=sym(zeros(ne1,1));
dU_sym=sym(zeros(ne1,1));
for i=1:ne1
N1x=1-((X-x(i))/le);
N2x=(X-x(i))/le;
U_X=(u(i)*N1x)+(u(i+1)*N2x);
U_sym(i)=U_X;
dU_sym(i)=diff(U_sym(i));
subplot(3,1,1)
hold on
ezplot(U_sym(i),[x(i) x(i+1)])
subplot(3,1,2)
hold on
% du/dx approximate
ezplot(dU_sym(i),[x(i) x(i+1)])
end

Modifying perform function in Theano.tensor.nnet.softmax

I have just begun using lasagne and Theano to do some machine learning on Python.
I am trying to modify the softmax class in Theano. I want to change how the activation function (softmax) is calculated. Instead of dividing e_x by e_x.sum(axis=1), I want to divide e_x by the sum of three consecutive numbers.
For instance, the result will be as follows:
sm[0] = e_x[0]/(e_x[0]+e_x[1]+e_x[2])
sm[1] = e_x[1]/(e_x[0]+e_x[1]+e_x[2])
sm[2] = e_x[2]/(e_x[0]+e_x[1]+e_x[2])
sm[3] = e_x[3]/(e_x[3]+e_x[4]+e_x[5])
sm[4] = e_x[4]/(e_x[3]+e_x[4]+e_x[5])
sm[5] = e_x[5]/(e_x[3]+e_x[4]+e_x[5])
and so on...
The problem is that I cannot quite grasp how theano carries out the computation.
Here is my main question. Does it suffice to just change the perform() function in the softmax class?
Here is the original perform() function:
def perform(self, node, input_storage, output_storage):
    x, = input_storage
    e_x = numpy.exp(x - x.max(axis=1)[:, None])
    sm = e_x / e_x.sum(axis=1)[:, None]
    output_storage[0][0] = sm
Here is my modified perform()
def myPerform(self, node, input_storage, output_storage):
    x, = input_storage
    e_x = numpy.exp(x - x.max(axis=1)[:, None])
    sm = numpy.zeros_like(e_x)
    for i in range(0, symbolCount):
        total = e_x[3*i] + e_x[3*i+1] + e_x[3*i+2]
        sm[3*i] = e_x[3*i]/total
        sm[3*i+1] = e_x[3*i+1]/total
        sm[3*i+2] = e_x[3*i+2]/total
    output_storage[0][0] = sm
With the current code, I am getting an 'unorderable types:int()>str()' error when I use the predict method in lasagne.
For something like this you're probably better off constructing a custom softmax via symbolic expressions rather than creating (or modifying) an operation.
Your custom softmax can be defined in terms of symbolic expressions. Doing it this way will give you gradients (and other Theano operation bits and pieces) "for free" but might run slightly slower than a custom operation could.
Here's an example:
import numpy
import theano
import theano.tensor as tt

x = tt.matrix()

# Use the built in softmax operation
y1 = tt.nnet.softmax(x)

# A regular softmax operation defined via ordinary Theano symbolic expressions
y2 = tt.exp(x)
y2 = y2 / y2.sum(axis=1)[:, None]

# Custom softmax operation
def custom_softmax(a):
    b = tt.exp(a)
    b1 = b[:, :3] / b[:, :3].sum(axis=1)[:, None]
    b2 = b[:, 3:] / b[:, 3:].sum(axis=1)[:, None]
    return tt.concatenate([b1, b2], axis=1)

y3 = custom_softmax(x)

f = theano.function([x], outputs=[y1, y2, y3])

x_value = [[.1, .2, .3, .4, .5, .6], [.1, .3, .5, .2, .4, .6]]
y1_value, y2_value, y3_value = f(x_value)
assert numpy.allclose(y1_value, y2_value)
assert y3_value.shape == y1_value.shape

a = numpy.exp(.1) + numpy.exp(.2) + numpy.exp(.3)
b = numpy.exp(.4) + numpy.exp(.5) + numpy.exp(.6)
c = numpy.exp(.1) + numpy.exp(.3) + numpy.exp(.5)
d = numpy.exp(.2) + numpy.exp(.4) + numpy.exp(.6)
assert numpy.allclose(y3_value, [
    [numpy.exp(.1) / a, numpy.exp(.2) / a, numpy.exp(.3) / a, numpy.exp(.4) / b, numpy.exp(.5) / b, numpy.exp(.6) / b],
    [numpy.exp(.1) / c, numpy.exp(.3) / c, numpy.exp(.5) / c, numpy.exp(.2) / d, numpy.exp(.4) / d, numpy.exp(.6) / d]
]), y3_value

Solving two coupled non-linear second-order differential equations numerically

I have encountered the following system of differential equations in Lagrangian mechanics. Can you suggest a numerical method, with relevant links and references, for solving it? Also, is there a short implementation in Matlab or Mathematica?
m*x*(dy/dt)² + m*g*cos(y) - M*g - (M-m)*(d²x/dt²) = 0
g*sin(y) + 2*(dx/dt)*(dy/dt) + x*(d²y/dt²) = 0
where dx/dt and dy/dt are the first derivatives of x and y with respect to time, and d²x/dt² and d²y/dt² are the second derivatives.
You can create a vector Y = (x y u v)' so that
dx/dt = u
dy/dt = v
du/dt = d²x/dt²
dv/dt = d²y/dt²
It is possible to isolate the second derivatives from the equations, so you get
d²x/dt² = (m*g*cos(y) + m*x*v² - M*g)/(M-m)
d²y/dt² = -(g*sin(y) + 2*u*v)/x
Now, you can try to solve it using standard ODE solvers, such as Runge-Kutta methods. Matlab has a set of solvers, such as ode23. I didn't test the following, but it would be something like this:
function f = F(t, Y)
% m, M and g must be defined in scope (e.g. constants or globals)
x = Y(1); y = Y(2); u = Y(3); v = Y(4);
f = zeros(4,1); % ode23 expects a column vector
f(1) = u;
f(2) = v;
f(3) = (m*g*cos(y) + m*x*v*v - M*g)/(M-m);
f(4) = -(g*sin(y) + 2*u*v)/x;
end

[T,Y] = ode23(@F, time_period, Y0);
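A minimal runnable sketch of the above (the parameter values, initial conditions and time span here are illustrative assumptions, not from the question):

% Integrate the system with ode23; parameters captured by a closure.
m = 1; M = 2; g = 9.81;                    % assumed illustrative values
F = @(t, Y) [Y(3); Y(4); ...
    (m*g*cos(Y(2)) + m*Y(1)*Y(4)^2 - M*g)/(M-m); ...
    -(g*sin(Y(2)) + 2*Y(3)*Y(4))/Y(1)];
Y0 = [1; 0; 0; sqrt(g)];                   % initial x, y, dx/dt, dy/dt
[T, Y] = ode23(F, [0 0.5], Y0);            % equations are singular at x = 0, so keep the span short
plot(T, Y(:,1), T, Y(:,2));
legend('x(t)', 'y(t)');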

How to use the Newton-Raphson method in Matlab to find the root of an equation?

I am a new user of MATLAB. I want to find the value that makes f(x) = 0, using the Newton-Raphson method. I have tried to write a code, but it seems that it's difficult to implement Newton-Raphson method. This is what I have so far:
function x = newton(x0, tolerance)
tolerance = 1.e-10;
format short e;
Params = load('saved_data.mat');
theta = pi/2;
zeta = cos(theta);
I = eye(Params.n,Params.n);
Q = zeta*I-Params.p*Params.p';
% T is a matrix(5,5)
Mroot = Params.M.^(1/2); %optimization
T = Mroot*Q*Mroot;
% Find the eigenvalues
E = real(eig(T));
% Find the negative eigenvalues
% Find the smallest negative eigenvalue
gamma = min(E);
% Now solve for lambda
M_inv = inv(Params.M); %optimization
zm = Params.zm;
x = x0;
err = (x - xPrev)/x;
while abs(err) > tolerance
xPrev = x;
x = xPrev - f(xPrev)./dfdx(xPrev);
% stop criterion: (f(x) - 0) < tolerance
err = f(x);
end
% stop criterion: change of x < tolerance % err = x - xPrev;
end
The above function is used like so:
% Calculate the functions
Winv = inv(M_inv+x.*Q);
f = @(x)( zm'*M_inv*Winv*M_inv*zm);
dfdx = @(x)(-zm'*M_inv*Winv*Q*M_inv*zm);
x0 = (-1/gamma)/2;
xRoot = newton(x0,1e-10);
The question isn't particularly clear. However, do you need to implement the root finding yourself? If not, then just use Matlab's built-in function fzero (not based on Newton-Raphson).
If you do need your own implementation of the Newton-Raphson method, then I suggest using one of the answers to 'Newton Raphsons method in Matlab?' as your starting point.
Edit: The following isn't answering your question, but is just a note on coding style.
It is useful to split your program up into reusable chunks. In this case, your root finding should be separated from your function construction. I recommend writing your Newton-Raphson method in a separate file and calling it from the script where you define your function and its derivative. Your source would then look something like:
% Define the function (and its derivative) to perform root finding on:
Params = load('saved_data.mat');
theta = pi/2;
zeta = cos(theta);
I = eye(Params.n,Params.n);
Q = zeta*I-Params.p*Params.p';
Mroot = Params.M.^(1/2);
T = Mroot*Q*Mroot; %T is a matrix(5,5)
E = real(eig(T)); % Find the eigen-values
gamma = min(E); % Find the smallest negative eigen value
% Now solve for lambda (what is lambda?)
M_inv = inv(Params.M);
zm = Params.zm;
Winv = inv(M_inv+x.*Q);
f = @(x)( zm'*M_inv*Winv*M_inv*zm);
dfdx = @(x)(-zm'*M_inv*Winv*Q*M_inv*zm);
x0 = (-1./gamma)/2.;
xRoot = newton(f, dfdx, x0, 1e-10);
In newton.m you would have your implementation of the Newton-Raphson method, which takes as arguments the function handles you define (f and dfdx). Using your code given in the question, this would look something like
function root = newton(f, df, x0, tol)
root = x0; % Initial guess for the root
MAXIT = 20; % Maximum number of iterations
for j = 1:MAXIT
dx = f(root) / df(root);
root = root - dx;
% Stop criterion:
if abs(dx) < tol
return
end
end
% Raise error if maximum number of iterations reached.
error('newton: maximum number of allowed iterations exceeded.')
end
Notice that I avoided using an infinite loop.
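For instance, with a function whose root is known (a hypothetical standalone example, independent of the question's saved_data.mat):

% Find sqrt(2) as the positive root of f(x) = x^2 - 2.
f = @(x) x.^2 - 2;
df = @(x) 2*x;
xRoot = newton(f, df, 1, 1e-10);   % converges to 1.41421356...
fprintf('root = %.10f\n', xRoot);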