Gradient Descent implementation in Octave

I've actually been struggling against this for like 2 months now. What is it that makes these different?
hypotheses= X * theta
temp=(hypotheses-y)'
temp=X(:,1) * temp
temp=temp * (1 / m)
temp=temp * alpha
theta(1)=theta(1)-temp
hypotheses= X * theta
temp=(hypotheses-y)'
temp=temp * (1 / m)
temp=temp * alpha
theta(2)=theta(2)-temp
theta(1) = theta(1) - alpha * (1/m) * ((X * theta) - y)' * X(:, 1);
theta(2) = theta(2) - alpha * (1/m) * ((X * theta) - y)' * X(:, 2);
The latter works. I'm just not sure why. I struggle to understand the need for the matrix transpose (the ' operator).

In the second block of your first example you've missed out a step, haven't you? I am assuming you concatenated X with a vector of ones. The missing line is:
temp=X(:,2) * temp
The last example will work, but it can be vectorized further to be simpler and more efficient.
I've assumed you only have 1 feature. It will work the same with multiple features, since all that happens is you add an extra column to your X matrix for each feature. Basically, you add a vector of ones to x to vectorize the intercept.
You can update a 2x1 matrix of thetas in one line of code. Concatenate a vector of ones with x, making X an nx2 matrix; then you can calculate h(x) by multiplying by the theta vector (2x1). This is the (X * theta) bit.
The second part of the vectorization is to transpose ((X * theta) - y), which gives you a 1xn matrix which, when multiplied by X (an nx2 matrix), will basically aggregate both (h(x)-y)x0 and (h(x)-y)x1. By definition both thetas are done at the same time. This results in a 1x2 matrix of update terms for my thetas, which I just transpose again to flip the vector around to the same dimensions as the theta vector. I can then do a simple scalar multiplication by alpha and a vector subtraction from theta.
X = data(:, 1); y = data(:, 2);
m = length(y);
X = [ones(m, 1), data(:,1)];
theta = zeros(2, 1);
iterations = 2000;
alpha = 0.001;
for iter = 1:iterations
theta = theta -((1/m) * ((X * theta) - y)' * X)' * alpha;
end
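For anyone who wants to check the vectorized update outside Octave, here is a minimal sketch of the same loop in plain Python (no libraries). The function name gradient_descent and the toy data are assumptions made up for illustration; they are not from the question.

```python
# Batch gradient descent for one-feature linear regression, pure Python.
# The toy data below lies exactly on y = 1 + 2x (an assumption for the demo).

def gradient_descent(xs, ys, alpha, iterations):
    m = len(ys)
    # Design matrix with a leading column of ones for the intercept.
    X = [[1.0, x] for x in xs]
    theta = [0.0, 0.0]
    for _ in range(iterations):
        # errors = X * theta - y, one entry per training example.
        errors = [row[0] * theta[0] + row[1] * theta[1] - y
                  for row, y in zip(X, ys)]
        # gradient[j] = (1/m) * sum_i errors[i] * X[i][j],
        # which is what ((X * theta - y)' * X)' / m computes in Octave.
        gradient = [sum(e * row[j] for e, row in zip(errors, X)) / m
                    for j in range(2)]
        # Both components are updated from the same errors vector.
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta


theta = gradient_descent([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0],
                         alpha=0.1, iterations=2000)
print(theta)  # should be close to [1.0, 2.0] for this toy data
```

With a learning rate that is too large for the data the values diverge instead, so alpha usually needs some tuning.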

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
% Performs gradient descent to learn theta. Updates theta by taking num_iters
% gradient steps with learning rate alpha.
% Number of training examples
m = length(y);
% Save the cost J in every iteration in order to plot J vs. num_iters and check for convergence
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
h = X * theta;
err = h - y; % named err so it does not shadow Octave's built-in stderr
theta = theta - (alpha/m) * (err' * X)';
J_history(iter) = computeCost(X, y, theta);
end
end

In the first one, if X were a 3x2 matrix and theta were a 2x1 matrix, then "hypotheses" would be a 3x1 matrix.
Assuming y is a 3x1 matrix, then you can perform (hypotheses - y) and get a 3x1 matrix, then the transpose of that 3x1 is a 1x3 matrix assigned to temp.
That 1x3 matrix is then scaled by 1/m and alpha and subtracted from theta(2), so a 1x3 matrix ends up assigned to theta(2), which should be a scalar.
The last two lines of your code work because, using my 3x2 example above,
(X * theta)
would be a 3x1 matrix.
Then that 3x1 matrix is subtracted by y (a 3x1 matrix) and the result is a 3x1 matrix.
(X * theta) - y
So the transpose of the 3x1 matrix is a 1x3 matrix.
((X * theta) - y)'
Finally, a 1x3 matrix times a 3x1 matrix will equal a scalar or 1x1 matrix, which is what you are looking for. I'm sure you knew already, but just to be thorough, the X(:,2) is the second column of the 3x2 matrix, making it a 3x1 matrix.
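To make the dimension bookkeeping concrete, here is a small sketch in plain Python that traces the shapes step by step. The helpers shape, matmul, transpose and subtract are hypothetical names written just for this example, and the numbers are made up.

```python
# Trace the shapes in ((X * theta) - y)' * X(:, 2) using nested lists.

def shape(M):
    return (len(M), len(M[0]))

def matmul(A, B):
    # Row-by-column products; zip(*B) iterates over the columns of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def subtract(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


X = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]]   # 3x2
theta = [[0.5], [1.0]]                      # 2x1
y = [[2.0], [4.0], [5.0]]                   # 3x1

h = matmul(X, theta)                # 3x1
err = subtract(h, y)                # 3x1
err_t = transpose(err)              # 1x3
col2 = [[row[1]] for row in X]      # 3x1, the second column of X
result = matmul(err_t, col2)        # 1x1

for name, M in [("h", h), ("err", err), ("err_t", err_t), ("result", result)]:
    print(name, shape(M))
```

The final product is 1x1, which is why it can be assigned to a single element such as theta(2).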

When you update, you need to update both thetas simultaneously, like this:
Start Loop {
temp0 = theta0 - (equation_here);
temp1 = theta1 - (equation_here);
theta0 = temp0;
theta1 = temp1;
} End loop
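The difference between a simultaneous and a one-at-a-time update can be seen on a single step. This is only a sketch in plain Python with made-up data; step_simultaneous and step_sequential are hypothetical names for the two variants.

```python
# One gradient step on toy data, done simultaneously and sequentially.

def errors_for(theta, X, ys):
    return [r[0] * theta[0] + r[1] * theta[1] - y for r, y in zip(X, ys)]

def step_simultaneous(theta, X, ys, alpha):
    m = len(ys)
    errors = errors_for(theta, X, ys)  # computed once, from the old theta
    temp0 = theta[0] - alpha / m * sum(e * r[0] for e, r in zip(errors, X))
    temp1 = theta[1] - alpha / m * sum(e * r[1] for e, r in zip(errors, X))
    return [temp0, temp1]

def step_sequential(theta, X, ys, alpha):
    m = len(ys)
    theta = list(theta)
    for j in range(2):
        # Recomputed each time, so theta[1] already sees the new theta[0].
        errors = errors_for(theta, X, ys)
        theta[j] -= alpha / m * sum(e * r[j] for e, r in zip(errors, X))
    return theta


X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [2.0, 4.0, 6.0]
sim = step_simultaneous([0.0, 0.0], X, ys, alpha=0.1)
seq = step_sequential([0.0, 0.0], X, ys, alpha=0.1)
print(sim, seq)  # first components agree, second components differ
```

Both variants often still converge for a small alpha, but only the simultaneous one is a true gradient step.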

This can be vectorized more simply with
h = X * theta % m-dimensional vector (the prediction our hypothesis gives per training example)
std_err = h - y % an m-dimensional vector of errors (one per training example)
theta = theta - (alpha/m) * X' * std_err
Remember, X is the design matrix, and as such each row of X represents a training example and each column of X represents a given component (say the zeroth or first component) across all training examples. Each column of X is therefore exactly the thing we want to multiply element-wise with std_err before summing to get the corresponding component of the theta update.

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1 : num_iters
hypothesis = X * theta;
Error = (hypothesis - y);
temp = theta - ((alpha / m) * (Error' * X)');
theta = temp;
J_history(iter) = computeCost(X, y, theta);
end
end

Spoiler alert
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
% ====================== YOUR CODE HERE ======================
% Instructions: Perform a single gradient step on the parameter vector
% theta.
%
% Hint: While debugging, it can be useful to print out the values
% of the cost function (computeCost) and gradient here.
% ========================== BEGIN ===========================
J = computeCost(X, y, theta); % cost before the step
theta = theta - ((alpha * ((theta' * X') - y')) * X / m)';
J1 = computeCost(X, y, theta); % cost after the step
if (J1 > J)
fprintf('Wrong alpha\n'); % report before leaving the loop
break;
elseif (J1 == J)
break; % no further improvement
end
% ========================== END ==============================
% Save the cost J in every iteration
J_history(iter) = sum(computeCost(X, y, theta));
end
end

Related

Weird errors on Gradient Descent in Octave (Syntax errors on known command)

I'm trying to implement Gradient descent in octave. I know I can do it by calculating every value of theta by itself like this
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
theta_0 = theta(1) - alpha / m * sum(X * theta - y);
theta_1 = theta(2) - alpha / m * sum((X * theta - y) .* X(:, 2));
theta = [theta_0; theta_1];
% Save the cost J in every iteration
J_history(iter) = computeCost(X, y, theta);
end
end
But I want to implement it with vectors in one step like this:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
theta = theta - (alpha/m * sum(((X * theta) - y).*X)
% Save the cost J in every iteration
J_history(iter) = computeCost(X, y, theta);
end
end
The problem is when I run the second piece of code I get this error:
syntax error
>>> J_history(iter) = computeCost(X, y, theta);
^
What have I done wrong / how can I fix this?

Storing cost history in a vector

I wrote the following code for gradientDescent in Octave, in a .m file:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
% Test values:
X = [1 5; 1 2; 1 4; 1 5];
y = [1 6 4 2]';
theta = [0 0]';
alpha = 0.01;
num_iters = 1000;
% Initialize some useful values:
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
x = X(:,2);
h = theta(1) + (theta(2)*x);
theta_zero = theta(1) - alpha * (1/m) * sum(h-y);
theta_one = theta(2) - alpha * (1/m) * sum((h - y) .* x);
theta = [theta_zero; theta_one];
% ============================================================
% Save the cost J in every iteration
J_history(iter) = computeCost(X, y, theta); % History of J
end
disp(min(J_history));
end
% Code for computeCost function is as follows:
function J = computeCost(X, y, theta)
data = [
6.1101 17.5920
5.5277 9.1302
8.5186 13.6620
7.0032 11.8540
5.8598 6.8233
8.3829 11.8860
7.4764 4.3483
8.5781 12.0000
6.4862 6.5987
];
m = length(y);
J = 0;
X = data(:, 1);
y = data(:, 2);
predictions = X*theta'; % predictions of hypothesis on examples
sqrErrors = (predictions - y).^2; % squared errors
J = 1/(2*m) * sum(sqrErrors);
end
When I run this from the Octave workspace I get the following error:
Error: A(I) = X: X must have the same size as I
error: called from
gradientDescent at line 55 column 21
I tried many things without success, and the mentors never replied properly.
Can you please tell me where I may be making a mistake?
Thanks in advance.
Bharat.

Mathematic hidden collision

I'm working on a project that uses a tile engine (self-made), and my next task is to create an AI (besides the other AIs that are already done). This one is tricky because the AI should only spot the player if the player is in the AI's line of sight. I tried it with for loops after calculating the ranges (in tiles) [1 tile = 32*32].
Then I thought about creating the equation of a straight line. And here I am, puzzled by the math.
Any idea how I could calculate whether it overlaps one of these "hidden" tiles?
NOTE that I want to use only math!
TileInfo.tileData[la[floor(y / 32)][floor(x / 32)]];
//la -> array of tile positions, if it's >0 then there is a tile.
Say that the viewer is at position (x1,y1) and the target at (x2,y2). Now, I am assuming that there is a set of n contiguous tiles along x and m along y. The lower-left corner of the first of these tiles is at position (x0,y0). The tile size is d along x and t along y. Now the math:
The line connecting viewer and target is
y = y1 + (y2 - y1) * (x - x1) / (x2 - x1)
The corners of the tile block are at points p1 = (x0,y0); p2 = (x0 + n * d, y0); p3 = (x0 + n * d, y0 + m * t); p4 = (x0, y0 + m * t). Now the job is to find out whether that line crosses any of the 4 segments connecting two consecutive corners. Let's take the segment between p1 and p2 (a horizontal line) defined by y = y0. If you put this into the line equation you can find the possible intersection x, which I named xi:
y0 = (y2 - y1) * (xi - x1) / (x2 - x1) + y1
You can invert this equation and find the possible x:
xi = x1 + (y0 - y1) * (x2 - x1) / (y2 - y1)
Now if xi > x0 and xi < x0 + n * d you have an intersection with this segment (provided xi also lies between x1 and x2, since the line of sight is only the stretch from viewer to target). Otherwise this segment does not block the line of sight.
Do the same for the other three segments, whose straight lines are defined by p2 -> p3: x = x0 + n * d; p3 -> p4: y = y0 + m * t; and p4 -> p1: x = x0.
Note that when the segment is horizontal (y = const) you have to put this y into the line-of-sight equation, calculate x and compare this x with the segment's x interval. If the segment is vertical (x = const) then you have to put x into the straight line equation, calculate y and check whether it falls in the segment's y interval or not.
A final remark is that you have to take particular care of cases where x1 = x2 or y1 = y2. These are vertical and horizontal lines of sight and may lead to division by zero in the above equations. The solution: deal with these cases separately.
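The four edge checks above could be sketched roughly as follows, in plain Python. The function name blocks_line_of_sight is hypothetical. The sketch also requires the crossing to lie between the viewer and the target, and it uses a parameter s along the sight line instead of the slope so that vertical and horizontal sight lines do not divide by zero. Grazing a corner, running exactly along an edge, or having both points inside the block are not treated as blocked here.

```python
# Does the segment viewer -> target cross an edge of a block of tiles?
# Block: lower-left corner (x0, y0), n tiles of width d, m tiles of height t.

def blocks_line_of_sight(x1, y1, x2, y2, x0, y0, n, m, d, t):
    left, right = x0, x0 + n * d
    bottom, top = y0, y0 + m * t

    def crosses_horizontal(y_edge):
        if y1 == y2:
            return False  # sight line is parallel to this edge
        s = (y_edge - y1) / (y2 - y1)  # 0 at the viewer, 1 at the target
        if not 0.0 <= s <= 1.0:
            return False  # crossing is not between viewer and target
        xi = x1 + s * (x2 - x1)
        return left < xi < right

    def crosses_vertical(x_edge):
        if x1 == x2:
            return False  # sight line is parallel to this edge
        s = (x_edge - x1) / (x2 - x1)
        if not 0.0 <= s <= 1.0:
            return False
        yi = y1 + s * (y2 - y1)
        return bottom < yi < top

    return (crosses_horizontal(bottom) or crosses_horizontal(top)
            or crosses_vertical(left) or crosses_vertical(right))


# A 2x1 block of 32x32 tiles with its lower-left corner at (64, 64).
print(blocks_line_of_sight(0, 80, 200, 80, 64, 64, 2, 1, 32, 32))
print(blocks_line_of_sight(0, 0, 200, 10, 64, 64, 2, 1, 32, 32))
```

The first call looks straight through the block, the second passes well below it.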

Math - Get x & y coordinates at intervals along a line

I'm trying to get x and y coordinates for points along a line (segment) at even intervals. In my test case, it's every 16 pixels, but the idea is to do it programmatically in ActionScript-3.
I know how to get slope between two points, the y intercept of a line, and a^2 + b^2 = c^2, I just can't recall / figure out how to use slope or angle to get a and b (x and y) given c.
Does anyone know a mathematical formula to figure out a and b given c, y-intercept and slope (or angle)? (AS3 is also fine.)
You have a triangle:
  |\            a^2 + b^2 = c^2 = 16^2 = 256
  | \
  |  \ c        a = sqrt(256 - b^2)
a |   \         b = sqrt(256 - a^2)
  |    \
  |_____\
     b
You also know (m is slope):
a/b = m
a = m*b
From your original triangle:
m*b = a = sqrt(256 - b^2)
m^2 * b^2 = 256 - b^2
Collecting the b^2 terms on one side:
m^2 * b^2 + b^2 = 256
(m^2 + 1) * b^2 = 256
Therefore:
b = 16 / sqrt(m^2 + 1)
I'm lazy so you can find a yourself: a = m*b, or equivalently a = sqrt(256 - b^2)
Let s be the slope.
we have: 1) s^2 = a^2/b^2 ==> a^2 = s^2 * b^2
and: 2) a^2 + b^2 = c^2 = 16*16
substitute a^2 in 2) with 1):
b = 16/sqrt(s^2+1)
and
a = sqrt((s^2 * 256)/(s^2 + 1)) = 16*abs(s)/sqrt(s^2+1)
In the above, I assume you want to get the lengths of a and b. In reality, your s is a signed value, so a could be negative. Therefore, the incremental value of a will really be:
a = 16s/sqrt(s^2+1)
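As a quick check of these formulas, here is a minimal sketch in plain Python (the question is about AS3, so treat this as illustration only). step_from_slope and points_on_line are hypothetical names, and the sketch assumes you step in the direction of increasing x; a vertical line has no finite slope and would need separate handling.

```python
import math

def step_from_slope(s, c=16.0):
    """Return (b, a): the x and y increments for a move of length c."""
    b = c / math.sqrt(s * s + 1.0)
    a = s * b  # signed, so a is negative for a negative slope
    return b, a

def points_on_line(x_start, y_start, s, count, c=16.0):
    """The first `count` points, spaced c apart, along a line of slope s."""
    b, a = step_from_slope(s, c)
    return [(x_start + i * b, y_start + i * a) for i in range(count)]


b, a = step_from_slope(0.75)
print(b, a, math.hypot(a, b))  # the hypotenuse should come out as about 16
print(points_on_line(0.0, 0.0, 0.75, 3))
```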
The Point class built in to Flash has a wonderful set of methods for doing exactly what you want. Define the line using two points and you can use the "interpolate" method to get points further down the line automatically, without any of the trigonometry.
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/geom/Point.html#interpolate()
The Slope is dy/dx. Or in your terms A/B.
Therefore you can step along the line by adding A to the Y coordinate, and B to the X coordinate. You can Scale A and B to make the steps bigger or smaller.
To Calculate the slope and get A and B.
Take two points on the line (X1,Y1) , (X2,Y2)
A= (Y2-Y1)
B= (X2-X1)
If you calculate this with the two points you want to iterate between simply divide A and B by the number of steps you want to take
STEPS=10
yStep= A/STEPS
xStep= B/STEPS
for (i=0;i<STEPS;i++)
{
xCur=x1+xStep*i;
yCur=y1+yStep*i;
}
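If the spacing is fixed (every 16 pixels) rather than the number of steps, the same idea works with a direction vector scaled to unit length. A rough sketch in plain Python, with the hypothetical name points_every; it needs no slope, so vertical lines are fine too.

```python
import math

def points_every(x1, y1, x2, y2, spacing=16.0):
    """Points from (x1, y1) toward (x2, y2), one every `spacing` units."""
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0.0:
        return [(x1, y1)]
    # Unit direction vector: B and A from above, scaled to length 1.
    ux = (x2 - x1) / length
    uy = (y2 - y1) / length
    count = int(length // spacing)  # whole steps that fit on the segment
    return [(x1 + i * spacing * ux, y1 + i * spacing * uy)
            for i in range(count + 1)]


print(points_every(0, 0, 0, 40))       # a vertical segment
print(points_every(0, 0, 30, 40, 10))  # a 3-4-5 triangle, length 50
```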
Given the equation for a line as y=slope*x+intercept, you can simply plug in the x-values and read back the y's.
Your problem is computing the step size along the x-axis (how big a change in x results from a 16-pixel move along the line, which is b in your included plot). Given that you know a^2 + b^2 = 16^2 (by definition) and slope = a/b, you can compute this:
slope = a/b => a = b * slope [multiply both sides by b]
a^2 + b^2 = 16^2 => (b * slope)^2 + b^2 = 16^2 [by substitution from the previous step]
I'll leave it to you to solve for b. After you have b you can compute (x,y) values by:
for x = 0; x += b
y = slope * x + intercept
echo (x,y)
loop

Intersection of parabolic curve and line segment

I have an equation for a parabolic curve intersecting a specified point, in my case where the user clicked on a graph.
// this would typically be mouse coords on the graph
var _target:Point = new Point(100, 50);
public static function plot(x:Number, target:Point):Number{
return (x * x) / target.x * (target.y / target.x);
}
This gives a graph such as this:
I also have a series of line segments defined by start and end coordinates:
startX:Number, startY:Number, endX:Number, endY:Number
I need to find if and where this curve intersects these segments (A):
If it's any help, startX is always < endX
I get the feeling there's a fairly straight forward way to do this, but I don't really know what to search for, nor am I very well versed in "proper" math, so actual code examples would be very much appreciated.
UPDATE:
I've got the intersection working, but my solution gives me the coordinate for the wrong side of the y-axis.
Replacing my target coords with A and B respectively, gives this equation for the plot:
(x * x) / A * (B/A)
// this simplifies down to:
(B * x * x) / (A * A)
// which I am then equating to the line's equation
(B * x * x) / (A * A) = m * x + b
// i run this through wolfram alpha (because i have no idea what i'm doing) and get:
(A * A * m - A * Math.sqrt(A * A * m * m + 4 * b * B)) / (2 * B)
This is a correct answer, but I want the second possible variation.
I've managed to correct this by multiplying m with -1 before the calculation and doing the same with the x value the last calculation returns, but that feels like a hack.
SOLUTION:
public static function intersectsSegment(targetX:Number, targetY:Number, startX:Number, startY:Number, endX:Number, endY:Number):Point {
// slope of the line
var m:Number = (endY - startY) / (endX - startX);
// where the line intersects the y-axis
var b:Number = startY - startX * m;
// solve the two variations of the equation, we may need both
var ix1:Number = solve(targetX, targetY, m, b);
var ix2:Number = solveInverse(targetX, targetY, m, b);
var intersection1:Point;
var intersection2:Point;
// if the intersection is outside the line segment startX/endX it's discarded
if (ix1 > startX && ix1 < endX) intersection1 = new Point(ix1, plot(ix1, targetX, targetY));
if (ix2 > startX && ix2 < endX) intersection2 = new Point(ix2, plot(ix2, targetX, targetY));
// somewhat fiddly code to return the smallest set intersection
if (intersection1 && intersection2) {
// return the intersection with the smaller x value
return intersection1.x < intersection2.x ? intersection1 : intersection2;
} else if (intersection1) {
return intersection1;
}
// this effectively means that we return intersection2 or if that's unset, null
return intersection2;
}
private static function solve(A:Number, B:Number, m:Number, b:Number):Number {
return (m + Math.sqrt(4 * (B / (A * A)) * b + m * m)) / (2 * (B / (A * A)));
}
private static function solveInverse(A:Number, B:Number, m:Number, b:Number):Number {
return (m - Math.sqrt(4 * (B / (A * A)) * b + m * m)) / (2 * (B / (A * A)));
}
public static function plot(x:Number, targetX:Number, targetY:Number):Number{
return (targetY * x * x) / (targetX * targetX);
}
Or, more explicit yet.
If your parabolic curve is
y(x) = A x^2 + B x + C (Eq 1)
and your line is
y(x) = m x + b (Eq 2)
The two possible solutions (+ and -) for x are
x = ((-B + m +- Sqrt[4 A b + B^2 - 4 A C - 2 B m + m^2])/(2 A)) (Eq 3)
You should check whether either of these two x values lies between your segment's endpoints (in x). If one does, just substitute that x into the y = m x + b equation to get the y coordinate of the intersection.
Edit>
To get the last equation you just say that the "y" in eq 1 is equal to the "y" in eq 2 (because you are looking for an intersection!).
That gives you:
A x^2 + B x + C = m x + b
and regrouping
A x^2 + (B-m) x + (C-b) = 0
Which is a quadratic equation.
Equation 3 just gives the two possible solutions of this quadratic.
Edit 2>
re-reading your code, it seems that your parabola is defined by
y(x) = A x^2
where
A = target.y / (target.x)^2
So in your case Eq 3 becomes simply
x = ((m +- Sqrt[4 A b + m^2])/(2 A)) (Eq 3b)
HTH!
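For illustration, Eq 3b plus the segment check could look roughly like this in plain Python (the question itself uses AS3). The names parabola_line_roots and intersect_segment are hypothetical, and the sketch assumes start_x < end_x, as stated in the question, so the slope is always defined.

```python
import math

def parabola_line_roots(A, m, b):
    """Real solutions of A*x^2 = m*x + b (Eq 3b), smallest first."""
    if A == 0:
        raise ValueError("A must be non-zero for a parabola")
    disc = m * m + 4.0 * A * b
    if disc < 0:
        return []  # the line misses the parabola entirely
    root = math.sqrt(disc)
    # A set removes the duplicate when the line is tangent (disc == 0).
    return sorted({(m - root) / (2.0 * A), (m + root) / (2.0 * A)})

def intersect_segment(target_x, target_y, start_x, start_y, end_x, end_y):
    """Intersection points of y = A*x^2 with the segment, left to right."""
    A = target_y / (target_x * target_x)
    m = (end_y - start_y) / (end_x - start_x)  # slope of the line
    b = start_y - start_x * m                  # y-intercept of the line
    return [(x, A * x * x) for x in parabola_line_roots(A, m, b)
            if start_x < x < end_x]


# Parabola through the clicked point (100, 50); horizontal segment at y = 50.
print(intersect_segment(100, 50, 0, 50, 200, 50))
print(intersect_segment(100, 50, -200, 50, 200, 50))
```

The first call should find one crossing near (100, 50); the second segment spans both branches, so it should find two.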
Take the equation for the curve and put your line into y = mx + b form. Solve for x and then determine whether x is between the start and end points of your line segment.
Check out: http://mathcentral.uregina.ca/QQ/database/QQ.09.03/senthil1.html
Are you doing this often enough to desire a separate test to see if an intersection exists before actually computing the intersection point? If so, consider the fact that your parabola is a level set for the function f(x, y) = y - (B * x * x) / (A * A) -- specifically, the one for which f(x, y) = 0. Plug your two endpoints into f(x,y) -- if they have the same sign, they're on the same side of the parabola, while if they have different signs, they're on different sides of the parabola.
Now, you still might have a segment that intersects the parabola twice, and this test doesn't catch that. But something about the way you're defining the problem makes me feel that maybe that's OK for your application.
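A sketch of that sign test in plain Python, using the same A and B (the target coordinates) as in the question. The function names are hypothetical, and as noted above a segment that crosses the parabola twice is reported as not straddling.

```python
def side_of_parabola(x, y, A, B):
    """f(x, y) = y - (B * x * x) / (A * A); zero exactly on the parabola."""
    return y - (B * x * x) / (A * A)

def endpoints_straddle(start_x, start_y, end_x, end_y, A, B):
    """True if the two endpoints lie on opposite sides of the parabola."""
    f_start = side_of_parabola(start_x, start_y, A, B)
    f_end = side_of_parabola(end_x, end_y, A, B)
    return f_start * f_end < 0


# Target (A, B) = (100, 50), horizontal segments at y = 50.
print(endpoints_straddle(0, 50, 200, 50, 100, 50))     # one crossing
print(endpoints_straddle(-200, 50, 200, 50, 100, 50))  # two crossings, missed
```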
In other words, you need to calculate the equation for each line segment, y = Ax + B, and compare it to the curve equation, y = Cx^2 + Dx + E. Set Ax + B - Cx^2 - Dx - E = 0 and see if there is a solution between the startX and endX values.