I am trying to compute
where h(x) = X*theta, theta is nx1 matrix and X is an mxn and y is also m x 1 matrix. alpha,m,theta(j) and x(j)(i) are scalar values. I am trying to vectorize this by below way
theta0 =(X*theta - y)
theta1 = (X*theta - y).*X(:,2)
theta0 = sum(theta0);
theta1 = sum(theta1);
theta0 = theta(1) - theta0*alpha/m;
theta1 = theta(2) - theta1*alpha/m;
theta = [theta0;theta1];
where num_iter=1500 and m=length(y) but my code went into infinite loop. my equivalent for loop solution is below
theta0 =0;
theta1= 0;
for i = 1:m,
theta0 = theta0 + (sum(X(i,:)*theta) - y(i));
theta1 = theta1 + (sum(X(i,:)*theta) - y(i))*X(i,2);
end
theta(1) = theta(1) - theta0*alpha/m;
theta(2) = theta(2) - theta1*alpha/m;
please let me where I went wrong
The formula can be written in a vectorized manner as:
theta = theta - alpha/m * ((X*theta - y).' *X).';
Related
I'm trying to implement Gradient descent in octave. I know I can do it by calculating every value of theta by itself like this
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
theta_0 = theta(1) - alpha / m * sum(X * theta - y);
theta_1 = theta(2) - alpha / m * sum((X * theta - y) .* X(:, 2));
theta = [theta_0; theta_1];
% Save the cost J in every iteration
J_history(iter) = computeCost(X, y, theta);
end
end
But I want to implement it with vectors in one step like this:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
theta = theta - (alpha/m * sum(((X * theta) - y).*X)
% Save the cost J in every iteration
J_history(iter) = computeCost(X, y, theta);
end
end
The problem is when I run the second piece of code I get this error:
syntax error
>>> J_history(iter) = computeCost(X, y, theta);
^
What have I done wrong / how can I fix this?
I am reading "Numerical Methods for Engineers" by Chapra and Canale. In it, they've provided pseudocode for the implementation of Euler's method (for solving ordinary differential equations). Here is the pseucode:
Pseucode for implementing Euler's method
I tried implementing this code in GNU Octave, but depending on the input values, I am getting one of two errors:
The program doesn't give any output at all. I have to press 'Ctrl + C' in order to break execution.
The program gives this message:
error: 'ynew' undefined near line 5 column 21
error: called from
Integrator at line 5 column 9
main at line 18 column 7
I would be very grateful if you could get this program to work for me. I am actually an amateur in GNU Octave. Thank you.
Edit 1: Here is my code. For main.m:
%prompt user
y = input('Initial value of y:');
xi = input('Initial value of x:');
xf = input('Final value of x:');
dx = input('Step size:');
xout = input('Output interval:');
x = xi;
m = 0;
xpm = x;
ypm = y;
while(1)
xend = x + xout;
if xend > xf
xend = xf;
h = dx;
Integrator(x,y,h,xend);
m = m + 1;
xpm = x;
ypm = y;
if x >= xf
break;
endif
endif
end
For Integrator.m:
function Integrator(x,y,h,xend)
while(1)
if xend - x < h
h = xend - x;
Euler(x,y,h,ynew);
y = ynew;
if x >= xend
break;
endif
endif
end
endfunction
For Euler.m:
function Euler(x,y,h,ynew)
Derivs(x,y,dydx);
ynew = y + dydx * h;
x = x + h;
endfunction
For Derivs.m:
function Derivs(x,y,dydx)
dydx = -2 * x^3 + 12 * x^2 - 20 * x + 8.5;
endfunction
Edit 2: I shoud mention that the differential equation which Chapra and Canale have given as an example is:
y'(x) = -2 * x^3 + 12 * x^2 - 20 * x + 8.5
That is why the 'Derivs.m' script shows dydx to be this particular polynomial.
Here is my final code. It has four different M-files:
main.m
%prompt the user
y = input('Initial value of y:');
x = input('Initial value of x:');
xf = input('Final value of x:');
dx = input('Step size:');
xout = dx;
%boring calculations
m = 1;
xp = [x];
yp = [y];
while x < xf
[x,y] = Integrator(x,y,dx,min(xf, x+xout));
m = m+1;
xp(m) = x;
yp(m) = y;
end
%plot the final result
plot(xp,yp);
title('Solution using Euler Method');
ylabel('Dependent variable (y)');
xlabel('Independent variable (x)');
grid on;
Integrator.m
%This function takes in 4 inputs (x,y,h,xend) and returns 2 outputs [x,y]
function [x,y] = Integrator(x,y,h,xend)
while x < xend
h = min(h, xend-x);
[x,y] = Euler(x,y,h);
end
endfunction
Euler.m
%This function takes in 3 inputs (x,y,h) and returns 2 outputs [x,ynew]
function [x,ynew] = Euler(x,y,h)
dydx = Derivs(x,y);
ynew = y + dydx * h;
x = x + h;
endfunction
Derivs.m
%This function takes in 2 inputs (x,y) and returns 1 output [dydx]
function [dydx] = Derivs(x,y)
dydx = -2 * x^3 + 12 * x^2 - 20 * x + 8.5;
endfunction
Your functions should look like
function [x, y] = Integrator(x,y,h,xend)
while x < xend
h = min(h, xend-x)
[x,y] = Euler(x,y,h);
end%while
end%function
as an example. Depending on what you want to do with the result, your main loop might need to collect all the results from the single steps. One variant for that is
m = 1;
xp = [x];
yp = [y];
while x < xf
[x,y] = Integrator(x,y,dx,min(xf, x+xout));
m = m+1;
xp(m) = x;
yp(m) = y;
end%while
I am facing trouble in the increasing oscillation on a simple harmonic oscillator using a backward difference. here is my code in Scilab
function [x] = back(h, tf)
k = 2;
m = 1;
i = 2;
t(i - 1) = 0;
x(i - 1) = 10;
v(i - 1) = 0;
t(i) = t(i - 1) + h
v(i) = v(i - 1) - h * (k / m) * x(i - 1)
while t(i) < tf
t(i + 1) = t(i) + h
x(i + 1) = x(i - 1) - 2 * (k / m) * v(i) * h
i = i + 1
end
plot(t, x, 'b');
endfunction
From your code, I suppose that you are trying to implement the velocity-Verlet scheme. Here is its implementation for a simple oscillator with the differential equation:
function [x] = back(h, tf)
k = 2;
m = 1;
t = 0:h:tf;
x(1) = 10;
v(1) = 0;
for i=2:length(t)
x(i) = x(i - 1) + v(i - 1) * h - k / m * x(i-1) * h^2 / 2;
v(i) = v(i - 1) - k / m * (x(i) + x(i-1)) * h / 2;
end
plot(t, x, 'b');
endfunction
[x] = back(0.01, 10)
I'm not quite sure what you are trying to achieve, neither if your math is correct. But assuming that you want to solve the numerical problem of:
//coefficients of:
k = 2.;
m = 1.;
// with an initial condition of:
t(1) = 0.;
x(1) = 10.;
v(1) = 0.;
// time paramters:
N = 50;
tf = 10;
h = tf / 50.;
for ii = 2:N
t(ii) = t(ii - 1) + h;
x(ii) = x(ii - 1) - 2 * (k / m) * v(ii - 1) * h
v(ii) = v(ii - 1) - h * (k / m) * x(ii - 1)
disp(x(ii))
end
plot(t, x, 'b');
will result in:
which doesn't seem right but anyway. Please check your math again.
I wrote following code for gradientDescent in Octave in .m file as follows:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
% Test values:
X = [1 5; 1 2; 1 4; 1 5];
y = [1 6 4 2]';
theta = [0 0]';
alpha = 0.01;
num_iters = 1000;
% Initialize some useful values:
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
x = X(:,2);
h = theta(1) + (theta(2)*x);
theta_zero = theta(1) - alpha * (1/m) * sum(h-y);
theta_one = theta(2) - alpha * (1/m) * sum((h - y) .* x);
theta = [theta_zero; theta_one];
% ============================================================
% Save the cost J in every iteration
J_history(iter) = computeCost(X, y, theta); % History of J
end
disp(min(J_history));
end
% Code for computeCost function is as follows:
function J = computeCost(X, y, theta)
data =
6.1101 17.5920
5.5277 9.1302
8.5186 13.6620
7.0032 11.8540
5.8598 6.8233
8.3829 11.8860
7.4764 4.3483
8.5781 12.0000
6.4862 6.5987
m = length(y);
J = 0;
X = data(:, 1);
y = data(:, 2);
predictions = X*theta'; % predictions of hypothesis on examples
sqrErrors = (predictions - y).^2; % squared errors
J = 1/(2*m) * sum(sqrErrors);
end
When I run this from octave workspace I get the following error:
Error: A(I) = X: X must have the same size as I
error: called from
gradientDescent at line 55 column 21
I tried many things but unsuccessfully and mentors never replied properly.
Can you please tell me where I may be making a mistake.
Thanks in advance.
Bharat.
I've actually been struggling against this for like 2 months now. What is it that makes these different?
hypotheses= X * theta
temp=(hypotheses-y)'
temp=X(:,1) * temp
temp=temp * (1 / m)
temp=temp * alpha
theta(1)=theta(1)-temp
hypotheses= X * theta
temp=(hypotheses-y)'
temp=temp * (1 / m)
temp=temp * alpha
theta(2)=theta(2)-temp
theta(1) = theta(1) - alpha * (1/m) * ((X * theta) - y)' * X(:, 1);
theta(2) = theta(2) - alpha * (1/m) * ((X * theta) - y)' * X(:, 2);
The latter works. I'm just not sure why..I struggle to understand the need for the matrix inverse .
What you're doing in the first example in the second block you've missed out a step haven't you? I am assuming you concatenated X with a vector of ones.
temp=X(:,2) * temp
The last example will work but can be vectorized even more to be more simple and efficient.
I've assumed you only have 1 feature. it will work the same with multiple features since all that happens is you add an extra column to your X matrix for each feature. Basically you add a vector of ones to x to vectorize the intercept.
You can update a 2x1 matrix of thetas in one line of code. With x concatenate a vector of ones making it a nx2 matrix then you can calculate h(x) by multiplying by the theta vector (2x1), this is (X * theta) bit.
The second part of the vectorization is to transpose (X * theta) - y) which gives you a 1*n matrix which when multiplied by X (an n*2 matrix) will basically aggregate both (h(x)-y)x0 and (h(x)-y)x1. By definition both thetas are done at the same time. This results in a 1*2 matrix of my new theta's which I just transpose again to flip around the vector to be the same dimensions as the theta vector. I can then do a simple scalar multiplication by alpha and vector subtraction with theta.
X = data(:, 1); y = data(:, 2);
m = length(y);
X = [ones(m, 1), data(:,1)];
theta = zeros(2, 1);
iterations = 2000;
alpha = 0.001;
for iter = 1:iterations
theta = theta -((1/m) * ((X * theta) - y)' * X)' * alpha;
end
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
% Performs gradient descent to learn theta. Updates theta by taking num_iters
% gradient steps with learning rate alpha.
% Number of training examples
m = length(y);
% Save the cost J in every iteration in order to plot J vs. num_iters and check for convergence
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
h = X * theta;
stderr = h - y;
theta = theta - (alpha/m) * (stderr' * X)';
J_history(iter) = computeCost(X, y, theta);
end
end
In the first one, if X were a 3x2 matrix and theta were a 2x1 matrix, then "hypotheses" would be a 3x1 matrix.
Assuming y is a 3x1 matrix, then you can perform (hypotheses - y) and get a 3x1 matrix, then the transpose of that 3x1 is a 1x3 matrix assigned to temp.
Then the 1x3 matrix is set to theta(2), but this should not be a matrix.
The last two lines of your code works because, using my mxn examples above,
(X * theta)
would be a 3x1 matrix.
Then that 3x1 matrix is subtracted by y (a 3x1 matrix) and the result is a 3x1 matrix.
(X * theta) - y
So the transpose of the 3x1 matrix is a 1x3 matrix.
((X * theta) - y)'
Finally, a 1x3 matrix times a 3x1 matrix will equal a scalar or 1x1 matrix, which is what you are looking for. I'm sure you knew already, but just to be thorough, the X(:,2) is the second column of the 3x2 matrix, making it a 3x1 matrix.
When you update you need to do like
Start Loop {
temp0 = theta0 - (equation_here);
temp1 = theta1 - (equation_here);
theta0 = temp0;
theta1 = temp1;
} End loop
This can be vectorized more simply with
h = X * theta % m-dimensional matrix (prediction our hypothesis gives per training example)
std_err = h - y % an m-dimensional matrix of errors (one per training example)
theta = theta - (alpha/m) * X' * std_err
Remember X, is the design matrix, and as such each row of X represents a training example and each column of X represents a given component (say the zeroth or first components) across all training examples. Each column of X is therefore exactly the thing we want to multiply element-wise with the std_err before summing to get the corresponding component of the theta vector.
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1 : num_iters
hypothesis = X * theta;
Error = (hypothesis - y);
temp = theta - ((alpha / m) * (Error' * X)');
theta = temp;
J_history(iter) = computeCost(X, y, theta);
end
end
.
.
.
.
.
.
.
.
.
Spoiler alert
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
% ====================== YOUR CODE HERE ======================
% Instructions: Perform a single gradient step on the parameter vector
% theta.
%
% Hint: While debugging, it can be useful to print out the values
% of the cost function (computeCost) and gradient here.
% ========================== BEGIN ===========================
t = zeros(2,1);
J = computeCost(X, y, theta);
t = theta - ((alpha*((theta'*X') - y'))*X/m)';
theta = t;
J1 = computeCost(X, y, theta);
if(J1>J),
break,fprintf('Wrong alpha');
else if(J1==J)
break;
end;
% ========================== END ==============================
% Save the cost J in every iteration
J_history(iter) = sum(computeCost(X, y, theta));
end
end