Storing cost history in a vector - octave

I wrote the following code for gradientDescent in Octave, in a .m file:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
% Test values:
X = [1 5; 1 2; 1 4; 1 5];
y = [1 6 4 2]';
theta = [0 0]';
alpha = 0.01;
num_iters = 1000;
% Initialize some useful values:
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
x = X(:,2);
h = theta(1) + (theta(2)*x);
theta_zero = theta(1) - alpha * (1/m) * sum(h-y);
theta_one = theta(2) - alpha * (1/m) * sum((h - y) .* x);
theta = [theta_zero; theta_one];
% ============================================================
% Save the cost J in every iteration
J_history(iter) = computeCost(X, y, theta); % History of J
end
disp(min(J_history));
end
% Code for computeCost function is as follows:
function J = computeCost(X, y, theta)
data = [
6.1101 17.5920
5.5277 9.1302
8.5186 13.6620
7.0032 11.8540
5.8598 6.8233
8.3829 11.8860
7.4764 4.3483
8.5781 12.0000
6.4862 6.5987
];
m = length(y);
J = 0;
X = data(:, 1);
y = data(:, 2);
predictions = X*theta'; % predictions of hypothesis on examples
sqrErrors = (predictions - y).^2; % squared errors
J = 1/(2*m) * sum(sqrErrors);
end
When I run this from the Octave workspace, I get the following error:
Error: A(I) = X: X must have the same size as I
error: called from
gradientDescent at line 55 column 21
I tried many things without success, and the mentors never replied properly.
Can you please tell me where I may be making a mistake?
Thanks in advance.
Bharat.
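One likely cause, sketched under the assumption that computeCost should work on the same X, y and theta that gradientDescent uses: inside computeCost, X is overwritten with a single column and then multiplied by theta', which is an outer product. The cost therefore comes out as a 1x2 row vector, and a 1x2 value cannot be stored in the single slot J_history(iter). The example below reuses the test values from the question to show the two shapes; J_bad and J_ok are illustrative names.

```octave
X = [1 5; 1 2; 1 4; 1 5];   % design matrix with a leading column of ones
y = [1; 6; 4; 2];
theta = [0; 0];
m = length(y);

% Problem form: a single column times theta' is an outer product (m x 2),
% so the summed cost is a 1x2 row vector, not a scalar.
J_bad = 1/(2*m) * sum((X(:, 2) * theta' - y).^2);
disp(size(J_bad));

% Consistent form: (m x 2) * (2 x 1) gives m x 1 predictions and a scalar cost.
J_ok = 1/(2*m) * sum((X * theta - y).^2);
disp(size(J_ok));
```

If this is the cause, removing the data, X and y reassignments inside computeCost and using X * theta should make the assignment into J_history work.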

Related

Octave changes scale of y axis when I change the x-axis

When I used this code, it plots two functions like this:
a = 2;
t0 = 1;
N = 100;
epsilon = 1e-5;
function t = metodoDeNewton(a, t0, N, epsilon)
t = zeros(1, N+1);
t(1) = t0;
for i = 1:N
t(i+1) = t(i) - (t(i).^2 - (a - sin(t(i)))) ./ (2 .* t(i) - cos(t(i)));
if abs(t(i+1) - t(i)) < epsilon
break;
endif
endfor
endfunction
t = metodoDeNewton(a, t0, N, epsilon);
x = 0:0.01:1;
y1 = t;
y2 = a - sin(t);
l = plot(x, y1, x, y2);
legend({'g(a)', 'h(a)'});
xlabel('a');
ylabel('y');
But when I change x to x = 0:0.01:0.2, the scale of the graph's y-axis changes and I can no longer see the functions.
How can I fix this? Any help would be appreciated!

Weird errors on Gradient Descent in Octave (Syntax errors on known command)

I'm trying to implement gradient descent in Octave. I know I can do it by calculating each value of theta separately, like this:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
theta_0 = theta(1) - alpha / m * sum(X * theta - y);
theta_1 = theta(2) - alpha / m * sum((X * theta - y) .* X(:, 2));
theta = [theta_0; theta_1];
% Save the cost J in every iteration
J_history(iter) = computeCost(X, y, theta);
end
end
But I want to implement it with vectors in one step like this:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
theta = theta - (alpha/m * sum(((X * theta) - y).*X)
% Save the cost J in every iteration
J_history(iter) = computeCost(X, y, theta);
end
end
The problem is that when I run the second piece of code, I get this error:
syntax error
>>> J_history(iter) = computeCost(X, y, theta);
^
What have I done wrong / how can I fix this?
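The error is reported at the J_history line, but the likely culprit is the update line before it, which has four opening parentheses and only three closing ones. Note also that sum(err .* X) would return a 1x2 row vector, so it would need a transpose before being subtracted from theta. A minimal sketch of a balanced, one-step update, using made-up data (the values of X, y and alpha here are assumptions for illustration only):

```octave
X = [1 1; 1 2; 1 3];   % made-up design matrix, first column is the intercept term
y = [1; 2; 3];         % made-up targets lying on y = x
theta = [0; 0];
alpha = 0.1;
m = length(y);

for iter = 1:2000
  % X' * (X * theta - y) sums the per-example gradient terms for every parameter at once
  theta = theta - (alpha / m) * (X' * (X * theta - y));
end

disp(theta');
```

Using X' * err instead of sum(err .* X) keeps the result a column vector with the same shape as theta.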

Implementing Euler's Method in GNU Octave

I am reading "Numerical Methods for Engineers" by Chapra and Canale. In it, they provide pseudocode for implementing Euler's method (for solving ordinary differential equations). Here is the pseudocode:
[Image: pseudocode for implementing Euler's method]
I tried implementing this code in GNU Octave, but depending on the input values, I am getting one of two errors:
The program doesn't give any output at all. I have to press 'Ctrl + C' in order to break execution.
The program gives this message:
error: 'ynew' undefined near line 5 column 21
error: called from
Integrator at line 5 column 9
main at line 18 column 7
I would be very grateful if you could get this program to work for me. I am actually an amateur in GNU Octave. Thank you.
Edit 1: Here is my code. For main.m:
%prompt user
y = input('Initial value of y:');
xi = input('Initial value of x:');
xf = input('Final value of x:');
dx = input('Step size:');
xout = input('Output interval:');
x = xi;
m = 0;
xpm = x;
ypm = y;
while(1)
xend = x + xout;
if xend > xf
xend = xf;
h = dx;
Integrator(x,y,h,xend);
m = m + 1;
xpm = x;
ypm = y;
if x >= xf
break;
endif
endif
end
For Integrator.m:
function Integrator(x,y,h,xend)
while(1)
if xend - x < h
h = xend - x;
Euler(x,y,h,ynew);
y = ynew;
if x >= xend
break;
endif
endif
end
endfunction
For Euler.m:
function Euler(x,y,h,ynew)
Derivs(x,y,dydx);
ynew = y + dydx * h;
x = x + h;
endfunction
For Derivs.m:
function Derivs(x,y,dydx)
dydx = -2 * x^3 + 12 * x^2 - 20 * x + 8.5;
endfunction
Edit 2: I should mention that the differential equation which Chapra and Canale give as an example is:
y'(x) = -2 * x^3 + 12 * x^2 - 20 * x + 8.5
That is why the 'Derivs.m' script shows dydx to be this particular polynomial.
Here is my final code. It has four different M-files:
main.m
%prompt the user
y = input('Initial value of y:');
x = input('Initial value of x:');
xf = input('Final value of x:');
dx = input('Step size:');
xout = dx;
%boring calculations
m = 1;
xp = [x];
yp = [y];
while x < xf
[x,y] = Integrator(x,y,dx,min(xf, x+xout));
m = m+1;
xp(m) = x;
yp(m) = y;
end
%plot the final result
plot(xp,yp);
title('Solution using Euler Method');
ylabel('Dependent variable (y)');
xlabel('Independent variable (x)');
grid on;
Integrator.m
%This function takes in 4 inputs (x,y,h,xend) and returns 2 outputs [x,y]
function [x,y] = Integrator(x,y,h,xend)
while x < xend
h = min(h, xend-x);
[x,y] = Euler(x,y,h);
end
endfunction
Euler.m
%This function takes in 3 inputs (x,y,h) and returns 2 outputs [x,ynew]
function [x,ynew] = Euler(x,y,h)
dydx = Derivs(x,y);
ynew = y + dydx * h;
x = x + h;
endfunction
Derivs.m
%This function takes in 2 inputs (x,y) and returns 1 output [dydx]
function [dydx] = Derivs(x,y)
dydx = -2 * x^3 + 12 * x^2 - 20 * x + 8.5;
endfunction
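The main difference from the first attempt is that each function now returns its results. Octave passes arguments by value, so assigning to an input inside a function changes only a local copy. A small sketch of that behaviour (the function names bump_no_return and bump are hypothetical, chosen only for this illustration):

```octave
1;  % script-file marker so the functions below can be defined inline

function bump_no_return(x)
  x = x + 1;      % changes only the local copy; the caller never sees it
end

function x = bump(x)
  x = x + 1;      % the updated value is handed back through the output argument
end

a = 1;
bump_no_return(a);   % a is unchanged after this call
b = bump(a);         % b receives the incremented value
printf('a = %d, b = %d\n', a, b);
```

This is why ynew and dydx were undefined in the first version: they were listed as inputs, never assigned by the caller, and never returned.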
Your functions should look like
function [x, y] = Integrator(x,y,h,xend)
while x < xend
h = min(h, xend-x);
[x,y] = Euler(x,y,h);
end%while
end%function
as an example. Depending on what you want to do with the result, your main loop might need to collect all the results from the single steps. One variant for that is
m = 1;
xp = [x];
yp = [y];
while x < xf
[x,y] = Integrator(x,y,dx,min(xf, x+xout));
m = m+1;
xp(m) = x;
yp(m) = y;
end%while
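For comparison, the whole method also fits in one short script. This is only a sketch: the initial condition y(0) = 1, the end point 4 and the step size 0.5 are assumptions, since the thread reads them from user input, and the names f, xs and ys are illustrative.

```octave
% Right-hand side from the thread; y is unused but kept for the usual f(x, y) form
f = @(x, y) -2 * x^3 + 12 * x^2 - 20 * x + 8.5;

x0 = 0;  y0 = 1;    % assumed initial condition (not stated in the thread)
xf = 4;  h = 0.5;   % assumed end point and step size
n = round((xf - x0) / h);   % a fixed step count avoids floating-point drift in x

xs = x0 + (0:n) * h;
ys = zeros(1, n + 1);
ys(1) = y0;
for k = 1:n
  % Euler step: follow the slope at the current point for one step of length h
  ys(k + 1) = ys(k) + h * f(xs(k), ys(k));
end

printf('y(%g) is approximately %g\n', xs(end), ys(end));
```

With these assumed values the exact solution, y = -0.5*x^4 + 4*x^3 - 10*x^2 + 8.5*x + 1, gives y(4) = 3, while the Euler estimate is 7, so a step of 0.5 is quite coarse for this equation.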

Linear Regression using fminunc Implementation

I'm trying to implement linear regression with only one feature using fminunc in Octave.
Here is my code.
x = load('/home/battousai/Downloads/ex2Data/ex2x.dat');
y = load('/home/battousai/Downloads/ex2Data/ex2y.dat');
m = length(y);
x = [ones(m , 1) , x];
theta = [0 , 0]';
X0 = [x , y , theta];
options = optimset('GradObj' , 'on' , 'MaxIter' , 1500);
[x , val] = fminunc(@computeCost , X0 , options)
And here is the cost function which returns the gradient as well as the value of the cost function.
function [J , gradient] = computeCost(x , y , theta)
m = length(y);
J = (0.5 / m) .* (x * theta - y )' * (x * theta - y );
gradient = (1/m) .* x' * (x * theta - y);
end
The length of the data set is 50, i.e., the dimensions are 50 x 1. I don't understand how I should pass X0 to fminunc.
Updated Driver Code:
x = load('/home/battousai/Downloads/ex2Data/ex2x.dat');
y = load('/home/battousai/Downloads/ex2Data/ex2y.dat');
m = length(y);
x = [ones(m , 1) x];
theta_initial = [0 , 0];
options = optimset('Display','iter','GradObj','on' , 'MaxIter' , 100);
[X , Cost] = fminunc(@(t)(computeCost(x , y , theta)), theta_initial , options)
Updated Code for Cost function:
function [J , gradient] = computeCost(x , y , theta)
m = length(y);
J = (1/(2*m)) * ((x * theta) - y )' * ((x * theta) - y) ;
gradient = (1 / m) .* x' * ((x * theta) - y);
end
Now I'm getting theta = [0,0], but when I used the normal equation, theta turned out to be [0.750163 , 0.063881].
From the documentation for fminunc:
FCN should accept a vector (array) defining the unknown variables
and
X0 determines a starting guess.
Since your input is a cost function (i.e. it associates your choice of parameter vector with a cost), the only input argument that needs to be optimised via fminunc is theta; x and y (i.e. your observations and your targets) are considered 'given' aspects of the problem and are not things you're trying to optimise. So you can either declare x and y global and access them from your function like so:
function [J , gradient] = computeCost(theta_0)
global x; global y;
% ...
and then call fminunc as: fminunc (@computeCost, t_0, options)
or, keep your computeCost function as computeCost(x, y, theta), and change your fminunc call to something like this:
[x , val] = fminunc(@ (t) computeCost(x, y, t) , t0 , options)
UPDATE: Not sure what you were doing wrong. Here is the full code and an Octave session running it. It seems fine.
%% in file myscript.m
x = load('ex2x.dat');
y = load('ex2y.dat');
m = length(y);
x = [ones(m , 1) , x];
theta_0 = [0 , 0]';
options = optimset('GradObj' , 'on' , 'MaxIter' , 1500);
[theta_opt, cost] = fminunc(@ (t) computeCost(x,y,t) , theta_0 , options)
%% in file computeCost.m
function [J , gradient] = computeCost(x , y , theta)
m = length(y);
J = (0.5 / m) .* (x * theta - y )' * (x * theta - y );
gradient = (1/m) .* x' * (x * theta - y);
end
%% in the octave terminal:
>> myscript
theta_opt =
0.750163
0.063881
cost = 9.8707e-04
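The same pattern can be tried without the ex2 data files. The sketch below uses made-up data lying exactly on y = 1 + 2*x (an assumption for illustration) and compares the fminunc result with the normal-equation solution, as the question did; theta_ne is an illustrative name.

```octave
1;  % script-file marker so the function below can be defined inline

% Cost and gradient for least-squares linear regression (same form as in the thread)
function [J, gradient] = computeCost(x, y, theta)
  m = length(y);
  r = x * theta - y;              % residuals, m x 1
  J = (0.5 / m) * (r' * r);
  gradient = (1 / m) * (x' * r);
end

% Made-up data on the line y = 1 + 2*x (not the ex2 data set)
x = [ones(4, 1), (1:4)'];
y = [3; 5; 7; 9];
theta_0 = [0; 0];

options = optimset('GradObj', 'on', 'MaxIter', 1500);
% The handle fixes x and y and leaves only t for fminunc to vary
[theta_opt, cost] = fminunc(@(t) computeCost(x, y, t), theta_0, options);

% Normal-equation solution as an independent check
theta_ne = (x' * x) \ (x' * y);
disp([theta_opt, theta_ne]);
```

Note that the handle must pass its own argument t through to computeCost; a handle that ignores t makes the cost constant from the optimiser's point of view, which is consistent with the [0,0] result reported in the updated driver code.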

Gradient Descent implementation in octave

I've actually been struggling against this for like 2 months now. What is it that makes these different?
hypotheses= X * theta
temp=(hypotheses-y)'
temp=X(:,1) * temp
temp=temp * (1 / m)
temp=temp * alpha
theta(1)=theta(1)-temp
hypotheses= X * theta
temp=(hypotheses-y)'
temp=temp * (1 / m)
temp=temp * alpha
theta(2)=theta(2)-temp
theta(1) = theta(1) - alpha * (1/m) * ((X * theta) - y)' * X(:, 1);
theta(2) = theta(2) - alpha * (1/m) * ((X * theta) - y)' * X(:, 2);
The latter works. I'm just not sure why. I struggle to understand the need for the matrix inverse.
In the first example, haven't you missed out a step in the second block? I am assuming you concatenated X with a vector of ones.
temp=X(:,2) * temp
The last example will work, but it can be vectorized further to be simpler and more efficient.
I've assumed you only have 1 feature. It works the same with multiple features, since all that happens is that you add an extra column to your X matrix for each feature. Basically, you add a vector of ones to x to vectorize the intercept.
You can update a 2x1 matrix of thetas in one line of code. Concatenate a vector of ones with x, making it an nx2 matrix; then you can calculate h(x) by multiplying by the theta vector (2x1). This is the (X * theta) bit.
The second part of the vectorization is to transpose ((X * theta) - y), which gives you a 1*n matrix; when multiplied by X (an n*2 matrix), this aggregates both (h(x)-y)x0 and (h(x)-y)x1, so both thetas are handled at the same time. The result is a 1*2 matrix of new thetas, which I transpose again so that it has the same dimensions as the theta vector. I can then do a simple scalar multiplication by alpha and a vector subtraction from theta.
X = data(:, 1); y = data(:, 2);
m = length(y);
X = [ones(m, 1), data(:,1)];
theta = zeros(2, 1);
iterations = 2000;
alpha = 0.001;
for iter = 1:iterations
theta = theta -((1/m) * ((X * theta) - y)' * X)' * alpha;
end
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
% Performs gradient descent to learn theta. Updates theta by taking num_iters
% gradient steps with learning rate alpha.
% Number of training examples
m = length(y);
% Save the cost J in every iteration in order to plot J vs. num_iters and check for convergence
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
h = X * theta;
stderr = h - y;
theta = theta - (alpha/m) * (stderr' * X)';
J_history(iter) = computeCost(X, y, theta);
end
end
In the first one, if X were a 3x2 matrix and theta were a 2x1 matrix, then "hypotheses" would be a 3x1 matrix.
Assuming y is a 3x1 matrix, then you can perform (hypotheses - y) and get a 3x1 matrix, then the transpose of that 3x1 is a 1x3 matrix assigned to temp.
Then the 1x3 matrix is assigned to theta(2), but theta(2) should be a scalar, not a matrix.
The last two lines of your code work because, using my example dimensions above,
(X * theta)
would be a 3x1 matrix.
Then that 3x1 matrix is subtracted by y (a 3x1 matrix) and the result is a 3x1 matrix.
(X * theta) - y
So the transpose of the 3x1 matrix is a 1x3 matrix.
((X * theta) - y)'
Finally, a 1x3 matrix times a 3x1 matrix will equal a scalar or 1x1 matrix, which is what you are looking for. I'm sure you knew already, but just to be thorough, the X(:,2) is the second column of the 3x2 matrix, making it a 3x1 matrix.
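The dimension bookkeeping above can be checked directly with size. A small sketch with made-up 3x2 values (the numbers and the names err, row, g2 and outer are assumptions for illustration):

```octave
X = [1 2; 1 3; 1 5];    % 3x2 design matrix (made-up values)
y = [1; 2; 4];          % 3x1 targets
theta = [0.5; 0.5];     % 2x1 parameters

err = X * theta - y;    % 3x1
row = err';             % 1x3
g2 = row * X(:, 2);     % 1x3 times 3x1 gives a scalar
outer = X(:, 2) * row;  % 3x1 times 1x3 gives a 3x3 matrix instead

disp(size(err));
disp(size(g2));
disp(size(outer));
```

The order of the factors is what matters: row times column collapses to a single number, while column times row expands to a full matrix.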
When you update, you need to update both parameters simultaneously, like this:
Start Loop {
temp0 = theta0 - (equation_here);
temp1 = theta1 - (equation_here);
theta0 = temp0;
theta1 = temp1;
} End loop
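To see why the temporaries matter, here is one update step done both ways on made-up data (all values, and the names t_sim and t_seq, are assumptions for illustration). In the sequential version the second parameter is computed from an already-changed first parameter, so the two results differ.

```octave
X = [1 1; 1 2; 1 3];   % made-up design matrix
y = [1; 2; 3];
alpha = 0.1;
m = length(y);
theta = [0; 0];

% Simultaneous update: both new values are computed from the old theta
err = X * theta - y;
temp0 = theta(1) - alpha / m * sum(err .* X(:, 1));
temp1 = theta(2) - alpha / m * sum(err .* X(:, 2));
t_sim = [temp0; temp1];

% Sequential update: theta(1) changes before theta(2) is computed
t_seq = theta;
t_seq(1) = t_seq(1) - alpha / m * sum((X * t_seq - y) .* X(:, 1));
t_seq(2) = t_seq(2) - alpha / m * sum((X * t_seq - y) .* X(:, 2));

disp([t_sim, t_seq]);
```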
This can be vectorized more simply with
h = X * theta; % m x 1 vector (the prediction our hypothesis gives for each training example)
std_err = h - y; % m x 1 vector of errors (one per training example)
theta = theta - (alpha/m) * X' * std_err;
Remember, X is the design matrix, so each row of X represents a training example and each column of X represents a given component (say the zeroth or first component) across all training examples. Each column of X is therefore exactly the thing we want to multiply element-wise with std_err before summing to get the corresponding component of the theta update.
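That equivalence is easy to confirm numerically. A minimal sketch with made-up values (g_vec and g_cols are illustrative names):

```octave
X = [1 2; 1 3; 1 5];   % made-up design matrix
y = [1; 2; 4];
theta = [0; 0];

std_err = X * theta - y;

% Vectorized form: one matrix product handles every column of X
g_vec = X' * std_err;

% Column-by-column form: multiply element-wise, then sum
g_cols = [sum(std_err .* X(:, 1)); sum(std_err .* X(:, 2))];

disp([g_vec, g_cols]);
```

Both columns of the printed matrix should match, since X' * std_err is just those per-column sums stacked into a vector.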
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1 : num_iters
hypothesis = X * theta;
Error = (hypothesis - y);
temp = theta - ((alpha / m) * (Error' * X)');
theta = temp;
J_history(iter) = computeCost(X, y, theta);
end
end
Spoiler alert
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
% ====================== YOUR CODE HERE ======================
% Instructions: Perform a single gradient step on the parameter vector
% theta.
%
% Hint: While debugging, it can be useful to print out the values
% of the cost function (computeCost) and gradient here.
% ========================== BEGIN ===========================
t = zeros(2,1);
J = computeCost(X, y, theta);
t = theta - ((alpha*((theta'*X') - y'))*X/m)';
theta = t;
J1 = computeCost(X, y, theta);
if (J1 > J)
fprintf('Wrong alpha\n'); % print before break, otherwise the message is never shown
break;
elseif (J1 == J)
break;
end
% ========================== END ==============================
% Save the cost J in every iteration
J_history(iter) = sum(computeCost(X, y, theta));
end
end