I'm attempting my first solo project, after taking an introductory course to machine learning, where I'm trying to use linear approximation to predict the outcome of addition/subtraction of two numbers.
I have 3 features: first number, subtraction/addition (0 or 1), and second number.
So my input looks something like this:
3 0 1
4 1 2
3 0 3
With corresponding output like this:
2
6
0
I have (I think) successfully implemented logistic regression algorithm, as the squared error does gradually decrease, but in 100 values, ranging from 0 to 50, the squared error value flattens out at around 685.6 after about 400 iterations.
Graph: Squared Error vs Iterations
.
To fix this, I have tried using a larger dataset for training, getting rid of regularization, and normalizing the input values.
I know that one of the steps to fix high bias is to add complexity to the approximation, but I want to maximize the performance at this particular level. Is it possible to go any further on this level?
My linear approximation code in Octave:
% Iterate
for i = 1 : iter
% hypothesis
h = X * Theta;
% reg theta prep
regTheta = Theta;
regTheta(:, 1) = 0;
% cost calc
J(i, 2) = (1 / (2 * m)) * (sum((h - y) .^ 2) + lambda * sum(sum(regTheta .^ 2,1),2));
% theta calc
Theta = Theta - (alpha / m) * ((h - y)' * X)' + lambda * sum(sum(regTheta, 1), 2);
end
Note: I'm using 0 for lambda, as to ignore regularization.
Related
Im trying to replicate a function in python and was able to code the following using multiple columns in a dataframe but was wondering if there is a python regression function that would do this more effectively. Here is the link to the description of the function. Sorry in advanced not really a stats guy. :)
http://tlc.thinkorswim.com/center/reference/thinkScript/Functions/Statistical/Inertia.html
It states that its the linear regression curve using the least-squares method to approximate data for each set of bars.
input y = close;
input n = 20;
def x = x[1] + 1; # previous value + 1
def a = (n * Sum(x * y, n) - Sum(x, n) * Sum(y, n) ) / ( n * Sum(Sqr(x), n) -Sqr(Sum(x, n)));
def b = (Sum(Sqr(x), n) * Sum(y, n) - Sum(x, n) * Sum(x * y, n) ) / ( n * Sum(Sqr(x), n) - Sqr(Sum(x, n)));
plot InertiaTS = a * x + b;
Thanks
Updated
here is the pandas columns and function. I first defined the xValue and yValue columns and then the following which is the raw calculation:
df['ind1']= ((10 * (df['xValue']*df['ysValue']).rolling(10, min_periods=10).sum() - df['xValue'].rolling(10, min_periods=10).sum()*df['ysValue'].rolling(10, min_periods=10).sum())/ (10 * (df['xValue'] ** 2).rolling(10, min_periods=10).sum() - (df['xValue'].rolling(10, min_periods=10).sum())**2)) * df['xValue'] + (((df['xValue'] ** 2).rolling(10, min_periods=10).sum()*df['ysValue'].rolling(10, min_periods=10).sum() - df['xValue'].rolling(10, min_periods=10).sum()*(df['xValue']*df['ysValue']).rolling(10, min_periods=10).sum())/(10 * (df['xValue'] ** 2).rolling(10, min_periods=10).sum() - (df['xValue'].rolling(10, min_periods=10).sum())**2))
It's not really clear whether you are just looking for a way to perform regression in python or you want to code the algorithm yourself.
if you want a package to do the regression, you can look at scikit-learn
Using,
from sklearn import linear_model
linear_model.LinearRegression()
If you want to code your own algorithm, you can look at gradient descent. you can look at a video by Andrew Ng on coursera - https://www.coursera.org/learn/machine-learning/lecture/GFFPB/gradient-descent-intuition. It's fairly intuitive to code the algorithm, the steps are as follows,
i. define a cost function - this is based on OLS(ordinary least squares) and looks like,
J = 1/2 (h(x) - y ) ^2
ii. take a partial derivative of cost function wrt each feature or j. Here X is the input vector comprised of n features one of which is j.
iii. Update the feature vector using gradient descent -
theta = theta - alpha * (partial derivative)
You can find the details here from Andrew Ng's papaper: http://cs229.stanford.edu/notes/cs229-notes1.pdf
sorry, it's difficult to put latex on SO
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
% theta = GRADIENTDESENT(X, y, theta, alpha, num_iters) updates theta by
% taking num_iters gradient steps with learning rate alpha
% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
% ====================== YOUR CODE HERE ======================
% Instructions: Perform a single gradient step on the parameter vector
% theta.
%
% Hint: While debugging, it can be useful to print out the values
% of the cost function (computeCost) and gradient here.
%
hypothesis = x*theta;
theta_0 = theta(1) - alpha(1/m)*sum((hypothesis-y)*x);
theta_1 = theta(2) - alpha(1/m)*sum((hypothesis-y)*x);
theta(1) = theta_0;
theta(2) = theta_1;
% ============================================================
% Save the cost J in every iteration
J_history(iter) = computeCost(X, y, theta);
end
end
I keep getting this error
error: gradientDescent: subscript indices must be either positive integers less than 2^31 or logicals
on this line right in-between the first theta and =
theta_0 = theta(1) - alpha(1/m)*sum((hypothesis-y)*x);
I'm very new to octave so please go easy on me, and
thank you in advance.
This is from the coursera course on Machine Learning from Week 2
99% sure your error is on the line pointed out by topsig, where you have alpha(1/m)
it would help if you gave an example of input values to your function and what you hoped to see as an output, but I'm assuming from your comment
% taking num_iters gradient steps with learning rate alpha
that alpha is a constant, not a function. as such, you have the line alpha(1/m) without any operator in between. octave sees this as you indexing alpha with the value of 1/m.
i.e., if you had an array
x = [3 4 5]
x*(2) = [6 8 10] %% two times each element in the array
x(2) = [4] %% second element in the array
what you did doesn't seem to make sense, as 'm = length(y)' which will output a scalar, so
x = [3 4 5]; m = 3;
x*(1/m) = x*(1/3) = [1 1.3333 1.6666] %% element / 3
x(1/m) = ___error___ %% the 1/3 element in the array makes no sense
note that for certain errors it always indicates that the location of the error is at the assignment operator (the equal sign at the start of the line). if it points there, you usually have to look elsewhere in the line for the actual error. here, it was yelling at you for trying to apply a non-integer subscript (1/m)
I am new to Prolog and while I can understand the code, I find it hard to create a program. I am trying to create a function that takes an integer and return 2^(integer) example pow(4) returns 16 (2^4). I also need it to be in a loop to keep taking input until user inputs negative integer then it exits.
In this example, C is counter, X is user input, tried to include variable for output but cant think how to integrate it.
pow(0):- 0.
pow(1):- 2.
pow(X):-
X > 1,
X is X-1,
power(X),
C is X-1,
pow(X1),
X is 2*2.
pow(X):- X<0, C is 0.
pow(C).
You really need to read something about Prolog before trying to program in it. Skim through http://en.wikibooks.org/wiki/Prolog, for example.
Prolog doesn't have "functions": there are predicates. All inputs and outputs are via predicate parameters, the predicate itself doesn't return anything.
So pow(0):- 0. and pow(1):- 2. don't make any sense. What you want is pow(0, 0). and pow(1, 2).: let the first parameter be the input, and the second be the output.
X is X-1 also doesn't make sense: in Prolog variables are like algebra variables, X means the same value through the whole system of equations. Variables are basically write-once, and you have to introduce new variables in this and similar cases: X1 is X-1.
Hope that's enough info to get you started.
The [naive] recursive solution:
pow2(0,1) . % base case: any number raised to the 0 power is 1, by definition
pow2(N,M) :- % a positive integral power of 2 is computed thus:
integer(N) , % - verify than N is an inetger
N > 0 , % - verify that N is positive
N1 is N-1 , % - decrement N (towards zero)
pow2(N1,M1) , % - recurse down (when we hit zero, we start popping the stack)
M is M1*2 % - multiply by 2
. %
pow2(N,M) :- % negative integral powers of 2 are computed the same way:
integer(N) , % - verify than N is an integer
N < 0 , % - verify than N is negative
N1 is N+1 , % - increment N (towards zero).
pow2(N1,M) , % - recurse down (we we hit zero, we start popping the stack)
M is M / 2.0 % - divide by 2.
. % Easy!
The above, however, will overflow the stack when the recursion level is sufficiently high (ignoring arithmetic overflow issues). SO...
The tail-recursive solution is optimized away into iteration:
pow2(N,M) :- %
integer(N) , % validate that N is an integer
pow2(N,1,M) % invoke the worker predicate, seeding the accumulator with 1
. %
pow2(0,M,M) . % when we hit zero, we're done
pow2(N,T,M) :- % otherwise...
N > 0 , % - if N is positive,
N1 is N-1 , % - decrement N
T1 is T*2 , % - increment the accumulator
pow2(N1,T1,M) % - recurse down
. %
pow2(N,T,M) :- % otherwise...
N < 0 , % - if N is negative,
N1 is N+1 , % - increment N
T1 is T / 2.0 , % - increment the accumulator
pow2(N1,T1,M) % - recurse down
. %
If I have an FFT implementation of a certain size M (power of 2), how can I calculate the FFT of a set of size P=k*M, where k is a power of 2 as well?
#define M 256
#define P 1024
complex float x[P];
complex float X[P];
// Use FFT_M(y) to calculate X = FFT_P(x) here
[The question is expressed in a general sense on purpose. I know FFT calculation is a huge field and many architecture specific optimizations were researched and developed, but what I am trying to understand is how is this doable in the more abstract level. Note that I am no FFT (or DFT, for that matter) expert, so if an explanation can be laid down in simple terms that would be appreciated]
Here's an algorithm for computing an FFT of size P using two smaller FFT functions, of sizes M and N (the original question call the sizes M and k).
Inputs:
P is the size of the large FFT you wish to compute.
M, N are selected such that MN=P.
x[0...P-1] is the input data.
Setup:
U is a 2D array with M rows and N columns.
y is a vector of length P, which will hold FFT of x.
Algorithm:
step 1. Fill U from x by columns, so that U looks like this:
x(0) x(M) ... x(P-M)
x(1) x(M+1) ... x(P-M+1)
x(2) x(M+2) ... x(P-M+2)
... ... ... ...
x(M-1) x(2M-1) ... x(P-1)
step 2. Replace each row of U with its own FFT (of length N).
step 3. Multiply each element of U(m,n) by exp(-2*pi*j*m*n/P).
step 4. Replace each column of U with its own FFT (of length M).
step 5. Read out the elements of U by rows into y, like this:
y(0) y(1) ... y(N-1)
y(N) y(N+1) ... y(2N-1)
y(2N) y(2N+1) ... y(3N-1)
... ... ... ...
y(P-N) y(P-N-1) ... y(P-1)
Here is MATLAB code which implements this algorithm. You can test it by typing fft_decomposition(randn(256,1), 8);
function y = fft_decomposition(x, M)
% y = fft_decomposition(x, M)
% Computes FFT by decomposing into smaller FFTs.
%
% Inputs:
% x is a 1D array of the input data.
% M is the size of one of the FFTs to use.
%
% Outputs:
% y is the FFT of x. It has been computed using FFTs of size M and
% length(x)/M.
%
% Note that this implementation doesn't explicitly use the 2D array U; it
% works on samples of x in-place.
q = 1; % Offset because MATLAB starts at one. Set to 0 for C code.
x_original = x;
P = length(x);
if mod(P,M)~=0, error('Invalid block size.'); end;
N = P/M;
% step 2: FFT-N on rows of U.
for m = 0 : M-1
x(q+(m:M:P-1)) = fft(x(q+(m:M:P-1)));
end;
% step 3: Twiddle factors.
for m = 0 : M-1
for n = 0 : N-1
x(m+n*M+q) = x(m+n*M+q) * exp(-2*pi*j*m*n/P);
end;
end;
% step 4: FFT-M on columns of U.
for n = 0 : N-1
x(q+n*M+(0:M-1)) = fft(x(q+n*M+(0:M-1)));
end;
% step 5: Re-arrange samples for output.
y = zeros(size(x));
for m = 0 : M-1
for n = 0 : N-1
y(m*N+n+q) = x(m+n*M+q);
end;
end;
err = max(abs(y-fft(x_original)));
fprintf( 1, 'The largest error amplitude is %g\n', err);
return;
% End of fft_decomposition().
kevin_o's response worked quite well. I took his code and eliminated the loops using some basic Matlab tricks. It functionally is identical to his version
function y = fft_decomposition(x, M)
% y = fft_decomposition(x, M)
% Computes FFT by decomposing into smaller FFTs.
%
% Inputs:
% x is a 1D array of the input data.
% M is the size of one of the FFTs to use.
%
% Outputs:
% y is the FFT of x. It has been computed using FFTs of size M and
% length(x)/M.
%
% Note that this implementation doesn't explicitly use the 2D array U; it
% works on samples of x in-place.
q = 1; % Offset because MATLAB starts at one. Set to 0 for C code.
x_original = x;
P = length(x);
if mod(P,M)~=0, error('Invalid block size.'); end;
N = P/M;
% step 2: FFT-N on rows of U.
X=fft(reshape(x,M,N),[],2);
% step 3: Twiddle factors.
X=X.*exp(-j*2*pi*(0:M-1)'*(0:N-1)/P);
% step 4: FFT-M on columns of U.
X=fft(X);
% step 5: Re-arrange samples for output.
x_twiddle=bsxfun(#plus,M*(0:N-1)',(0:M-1))+q;
y=X(x_twiddle(:));
% err = max(abs(y-fft(x_original)));
% fprintf( 1, 'The largest error amplitude is %g\n', err);
return;
% End of fft_decomposition()
You could just use the last log2(k) passes of a radix-2 FFT, assuming the previous FFT results are from appropriately interleaved data subsets.
Well an FFT is basically a recursive type of Fourier Transform. It relies on the fact that as wikipedia puts it:
The best-known FFT algorithms depend upon the factorization of N, but there are FFTs with O(N log N) complexity for >all N, even for prime N. Many FFT algorithms only depend on the fact that e^(-2pi*i/N) is an N-th primitive root of unity, and >thus can be applied to analogous transforms over any finite field, such as number-theoretic transforms. Since the >inverse DFT is the same as the DFT, but with the opposite sign in the exponent and a 1/N factor, any FFT algorithm >can easily be adapted for it.
So this has pretty much already been done in the FFT. If you are talking about getting longer period signals out of your transform you are better off doing an DFT over the data sets of limited frequencies. There might be a way to do it from the frequency domain but IDK if anyone has actually done it. You could be the first!!!! :)
I would like to plot the shifted logistic function as shown from Wolfram Alpha.
In particular, I would like the function to be of the form
y = exp(x - t) / (1 + exp(x - t))
where t > 0. In the link, for example, t is 6. I had originally tried the following:
x = 0:.1:12;
y = exp(x - 6) ./ (1 + exp(x - 6));
plot(x, y);
axis([0 6 0 1])
However, this is not the same as the result from Wolfram Alpha. Here is an export of my plot.
I do not understand what the difference is between what I am trying to do here vs. plotting shifted sin and cosine functions (which works using the same technique).
I am not completely new to Matlab but I do not usually use it in this way.
Edit: My values for x in the code should have been from 0 to 12.
fplot takes as inputs a function handle and a range to plot for:
>> fplot(#(x) exp(x-6) / (1 + exp(x-6)), [0 12])
The beauty of fplot in this case is you don't need to spend time calculating y-values beforehand; you could also extract values from the graph after the fact if you want (by getting the line handle's XData and YData properties).
Your input to Wolfram Alpha is incorrect. It is interpreted as e*(x-6)/(1-e*(x-6)). Use plot y = exp(x - 6) / (1 + exp(x - 6)) for x from 0 to 12 in Wolfram Alpha (see here) for the same results as in MATLAB. Also use axis([0 12 0 1]) (or no axis statement at all on a new plot) to see the full results in MATLAB.
In reply to your comment: use y = exp(1)*(x - 6) ./ (1 + exp(1)*(x - 6)); to do in MATLAB what you were doing in Wolfram Alpha.