Compute gradients for an unknown number of factors - octave

Is there a better way to write this? More specifically, is there a way to remove the loop and calculate grad directly, without iterating?
for j = 1:numel(theta)
  grad(j) = 1 / m * sum((h - y) .* X(:, j));
endfor
h and y are both column vectors, and X is a matrix with an arbitrary number of rows and as many columns as theta has elements.

Your code is already reasonably efficient.
The only other way to write it is to initialize grad as a zeros vector and then compute the whole gradient descent update in one vectorized expression.
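For reference, here is that fully vectorized form as a sketch (assuming, as the question implies, that X is m×n and theta, and therefore grad, is n×1):

% Each column of X is weighted by the residuals and summed in one matrix
% product, which replaces the per-column loop entirely.
grad = (1 / m) * X' * (h - y);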

Why is the time complexity of the Bellman equation n^3 for the direct solution?

For the matrix form of the Bellman equation there is a direct solution, but I don't understand why the computational complexity of this form is n^3. I would appreciate it if anyone could explain. Thanks.
The Bellman equation of the value function in vector form can be written as
V = R + γPV
Where
V is a column vector representing the value function for each state (1..n)
R is a column vector representing the immediate reward after exiting a particular state
γ (gamma) is the discount factor
P is an nxn transition matrix (All the places we may end up)
Because the Bellman equation is a linear equation we can solve it directly. We can solve for V by rearranging the equation like this.
V - γPV = R
(I - γP) V = R
V = (I - γP)^-1 R
The n^3 comes from the fact that you need to perform a matrix inversion (or, equivalently, solve a linear system) to find V, which is O(n^3) using the standard Gauss-Jordan algorithm.
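As a concrete sketch (added for illustration; the 3-state rewards and transition probabilities below are made up), the direct solution in Python with NumPy looks like this:

import numpy as np

gamma = 0.9
R = np.array([1.0, 0.0, 2.0])         # immediate reward per state
P = np.array([[0.5, 0.5, 0.0],        # row-stochastic transition matrix
              [0.1, 0.8, 0.1],
              [0.0, 0.3, 0.7]])

# Solve (I - gamma * P) V = R directly; np.linalg.solve performs an LU
# factorization, which is the O(n^3) step discussed above.
V = np.linalg.solve(np.eye(3) - gamma * P, R)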

How to map a number in a range to another in the same range with no collisions?

Effectively what I'm looking for is a function f(x) that outputs into a pre-defined range. Calling f(f(x)) should be valid as well. The function should be cyclical, so calling f(f(...(x))), where the number of calls equals the size of the range, should give you the original number. Finally, f(x) should not be time dependent and should always give the same output for the same input.
While I can see that taking a list of all possible values and shuffling it would give me something close to what I want, I'd much prefer it if I could simply plug values into the function one at a time so that I do not have to compute the entire range all at once.
I've looked into Minimal Perfect Hash Functions but haven't been able to find one that doesn't use external libraries. I'm okay with using them, but would prefer to not do so.
If an actual range is necessary to help answer my question, I don't think it would need to be bigger than [0, 2^24-1], but the starting and ending values don't matter too much.
You might want to take a look at a Linear Congruential Generator (LCG). You want a full-period generator (say, m = 2^24), which means the parameters must satisfy the Hull-Dobell theorem.
Calling f(f(x)) should be valid as well.
It should work.
the number of calls is equal to the size of the range should give you the original number
Yes: for an LCG with parameters satisfying the Hull-Dobell theorem you get the full period covered exactly once, so applying f for the m-th time puts you back where you started.
The period of such an LCG is exactly equal to m.
should not be time dependent and will always give the same output
An LCG is an O(1) algorithm, and it is 100% reproducible.
An LCG is reversible as well, via the extended Euclidean algorithm; check "Reversible pseudo-random sequence generator" for details.
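A minimal sketch in Python (the constants below are illustrative choices satisfying Hull-Dobell for a power-of-two modulus; they are not from the original answer):

M = 1 << 24          # modulus m = 2^24, the size of the range
A = 1664525          # A - 1 is divisible by 4, as Hull-Dobell requires when 4 divides m
C = 1013904223       # odd, hence coprime to m

def f(x):
    # One LCG step: a bijection on [0, M) whose single cycle has length exactly M.
    return (A * x + C) % M

# The same construction on a tiny modulus makes the full period easy to verify:
m, a, c = 16, 5, 3   # a % 4 == 1 and c is odd, so the period is exactly 16
x, n = 3, 0
while True:
    x, n = (a * x + c) % m, n + 1
    if x == 3:
        break
print(n)             # prints 16: sixteen applications return the starting value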
Minimal perfect hash functions are overkill; all you've asked for is a function f that is
bijective, and
"cyclical" (i.e. applying f N times is the identity: f^N(x) = x).
For a permutation to be cyclical in that way, its order must divide N (or be N, but that's just a special case of dividing N). That in turn means the LCM of the orders of the sub-cycles must divide N. One way to achieve that is to have a single cycle of order N. For power-of-two N, it's also really easy to have lots of small cycles of some other power-of-two order. General permutations do not necessarily satisfy the cycle requirement: they are of course bijective, but the LCM of the orders of the sub-cycles may exceed N.
In the following I will leave all reduction modulo N implicit. Without loss of generality I will assume the range starts at 0 and goes up to N-1, where N is the size of the range.
The only thing I can immediately think of for general N is f(x) = x + c where gcd(c, N) == 1. The GCD condition ensures there is only one cycle, which necessarily has order N.
For power-of-two N I have more inspiration:
f(x) = cx where c is odd. Bijective because gcd(c, N) == 1, so c has a modular multiplicative inverse. Also c^N = 1, because φ(N) = N/2 (since N is a power of two), so c^φ(N) = 1 (Euler's theorem).
f(x) = x XOR c where c < N. Trivially bijective and trivially cycles with a period of 2, which divides N.
f(x) = clmul(x, c) where c is odd and clmul is carry-less multiplication. Bijective because any odd c has a carry-less multiplicative inverse. It has some power-of-two cycle length (less than N), so the cycle length divides N; I don't know why, though. This is a weird one, but it has decent special cases such as x ^ (x << k) and, by symmetry, the "mirrored" version x ^ (x >> k).
f(x) = x >>> k where >>> is bit-rotation. Obviously bijective, and f^N(x) = x >>> Nk, where Nk mod N = 0, so it rotates all the way back to the unrotated position regardless of what k is.
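For concreteness, here are those power-of-two constructions as a Python sketch (N = 2^24 as in the question; the constants c and k are arbitrary illustrative choices):

N = 1 << 24
W = 24                          # word width, log2(N)

def f_add(x, c=7):              # x + c mod N; a single cycle since gcd(c, N) == 1
    return (x + c) % N

def f_mul(x, c=5):              # c * x mod N; bijective because c is odd
    return (c * x) % N

def f_xor(x, c=0xABCDEF):       # x XOR c; its own inverse, so the cycle length is 2
    return x ^ c

def f_rot(x, k=3):              # rotate the W-bit word right by k positions
    return ((x >> k) | (x << (W - k))) & (N - 1)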

Is a second order method worse than a first order method?

I was thinking about an elementary question in numerical analysis.
When discretizing an ordinary differential equation, it is well known that a second order method is more accurate than a first order method, since the truncation error is O(dx^2) for the second order method and O(dx) for the first order method. This is true when 0 < dx < 1.
But what if dx > 1? For example, if the domain is 0 to 10000 and the mesh size is 1000, then dx = 10. In this case, is the second order method less accurate than the first order method, since dx^2 = 100 and dx = 10? We can encounter this when dealing with large scale problems, such as climate modeling (the cloud size could be several kilometers).
A second order method is not "more accurate than a first order method because dx^2 < dx" at some particular value of dx. The order is a statement about the asymptotic rate of convergence as dx becomes small.
Additionally, comparing dx^2 to dx directly doesn't make sense, because dx isn't a unitless quantity, it's a length. So you're trying to compare an area to a length, which is meaningless.
In big-O notation, if a quantity converges with O(dx^2), then that typically means that the error is of the form e = a2 dx^2 + a3 dx^3 + ... The leading coefficient a2 is in the units of X/meters^2, where X is whatever units your error is in, and maybe you use some other length instead of meters. Similarly, for a first order solution, the error is in the form b1 dx + b2 dx^2 + ..., where b1 is in units of X/meters.
So if you decide you can neglect the non-leading terms (which you probably can't for large values of dx), the comparison isn't between dx^2 and dx; it's between a2 dx^2 and b1 dx. There is obviously a crossover between these two error terms, but it's not at dx = 1, it's at dx = b1/a2. If your discretization is that coarse, you're probably not in the asymptotic regime in which you can ignore higher-order terms, and your solution is probably very inaccurate anyway.
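To make the asymptotic claim concrete, here is a small Python experiment (an illustration, not part of the original answer) comparing forward Euler (first order) with the explicit midpoint method (second order) on y' = -y, y(0) = 1, integrated to t = 1:

import math

def euler_err(dx):
    # Global error of forward Euler at t = 1.
    y = 1.0
    for _ in range(round(1 / dx)):
        y += dx * (-y)
    return abs(y - math.exp(-1))

def midpoint_err(dx):
    # Global error of the explicit midpoint method at t = 1.
    y = 1.0
    for _ in range(round(1 / dx)):
        y += dx * -(y + 0.5 * dx * (-y))
    return abs(y - math.exp(-1))

for dx in [0.5, 0.1, 0.02]:
    print(dx, euler_err(dx), midpoint_err(dx))
# Halving dx roughly halves the Euler error but quarters the midpoint error,
# which is exactly what the leading terms b1 dx and a2 dx^2 predict.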

Operate on all rows of a matrix without a loop, octave

Suppose I have a matrix A and I want to obtain the following:
for i = 1:m
  A(i,:) = something which depends on i;
endfor
Is there a way to obtain that without the loop?
Added: OK, I understand I have to be more specific.
I have two matrices B and C (all the matrices we are considering have m rows).
I want to record in the i-th row of A the product of the polynomials written in the i-th rows of B and C (so using the loop I would call the conv function).
Any ideas?
I don't think it's possible to do this without a for loop, as conv only accepts vector inputs, but I might be wrong. I can't see a way of using either bsxfun or arrayfun in conjunction with conv and matrix inputs. I stand to be corrected.
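One workaround in that direction (a sketch added for illustration, not from the original answer; cellfun still loops internally, it just hides the loop) is to split the matrices into one cell per row and let cellfun pair them up:

% conv each pair of rows; every output has the same length, so cell2mat
% can stack the results back into the m-row matrix A.
Bc = mat2cell (B, ones (m, 1), columns (B));
Cc = mat2cell (C, ones (m, 1), columns (C));
A  = cell2mat (cellfun (@conv, Bc, Cc, "UniformOutput", false));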
That is a very, very general question, and not possible to answer without more details. It mainly depends on what i will be involved with. Suppose the following:
for i = 1:m
  A(i,:) += i;
endfor
It could be written with the much more efficient:
A .+ (1:m)'
Just compare:
octave> n = 1000;
octave> A = B = rand (n);
octave> tic; for i = 1:n, B(i,:) += i; endfor; toc
Elapsed time is 0.051 seconds.
octave> tic; C = A.+ (1:n)'; toc
Elapsed time is 0.01 seconds.
octave> isequal (C, B)
ans = 1
If you have a very old version of Octave, you can instead do bsxfun (@plus, A, (1:m)').
However, if i on the right side of the expression is used to index some other variable, then the solution will be different. Maybe the solution is cumsum, or some other cum* function.
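For example (an illustrative made-up case, not from the original question), if the loop built A(i,:) as the sum of the first i rows of some matrix B, the whole loop collapses into one call:

% Equivalent to: for i = 1:m, A(i,:) = sum (B(1:i,:), 1); endfor
A = cumsum (B);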
Your question is basically "how do I vectorize code?", which is a really large subject, and you haven't told us what you're trying to vectorize.

How would you write this algorithm for large combinations in the most compact way?

The number of combinations of k items which can be retrieved from N items is described by the following formula:
c = N! / (k! * (N - k)!)
An example would be how many combinations of 6 Balls can be drawn from a drum of 48 Balls in a lottery draw.
Optimize this formula to run with the smallest O time complexity
This question was inspired by the new WolframAlpha math engine and the fact that it can calculate extremely large combinations very quickly, e.g. the query linked below, and by a subsequent discussion on the topic on another forum.
http://www97.wolframalpha.com/input/?i=20000000+Choose+15000000
I'll post some info/links from that discussion after some people take a stab at the solution.
Any language is acceptable.
Python: O(min(k, n-k)^2)
def choose(n, k):
    k = min(k, n - k)
    p = q = 1
    for i in range(k):
        p *= n - i
        q *= 1 + i
    return p // q   # exact: q always divides p at this point
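For instance, the lottery example from the question checks out (assuming the function above):
>>> choose(48, 6)
12271512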
Analysis:
The size of p and q will increase linearly inside the loop, if n-i and 1+i can be considered to have constant size.
The cost of each multiplication will then also increase linearly.
The sum of these costs over all iterations becomes an arithmetic series over k.
My conclusion: O(k^2)
If rewritten to use floating point numbers, the multiplications will be atomic operations, but we will lose a lot of precision. It even overflows for choose(20000000, 15000000). (Not a big surprise, since the result would be around 0.2119620413 × 10^4884378.)
def choose(n, k):
    k = min(k, n - k)
    result = 1.0
    for i in range(k):
        result *= 1.0 * (n - i) / (1 + i)
    return result
Notice that WolframAlpha returns a "Decimal Approximation". If you don't need absolute precision, you could do the same thing by calculating the factorials with Stirling's Approximation.
Now, Stirling's approximation requires the evaluation of (n/e)^n, where e is the base of the natural logarithm, which will be by far the slowest operation. But this can be done using the techniques outlined in another stackoverflow post.
If you use double precision and repeated squaring to accomplish the exponentiation, the operations will be:
3 evaluations of Stirling's approximation, each requiring O(log n) multiplications and one square root evaluation;
2 multiplications; and
1 division.
The number of operations could probably be reduced with a bit of cleverness, but the total time complexity is going to be O(log n) with this approach. Pretty manageable.
EDIT: There's also bound to be a lot of academic literature on this topic, given how common this calculation is. A good university library could help you track it down.
EDIT2: Also, as pointed out in another response, the values will easily overflow a double, so a floating point type with very extended precision will need to be used for even moderately large values of k and n.
I'd solve it in Mathematica:
Binomial[n, k]
Man, that was easy...
Python: approximation in O(1)?
Using Python's decimal implementation to calculate an approximation. Since it does not use any external loop, and the numbers are limited in size, I think it will execute in O(1).
from decimal import Decimal

ln = lambda z: z.ln()
exp = lambda z: z.exp()
sinh = lambda z: (exp(z) - exp(-z)) / 2
sqrt = lambda z: z.sqrt()

pi = Decimal('3.1415926535897932384626433832795')
e = Decimal('2.7182818284590452353602874713527')

# Stirling's approximation of the gamma function.
# Simplification by Robert H. Windschitl.
# Source: http://en.wikipedia.org/wiki/Stirling%27s_approximation
gamma = lambda z: sqrt(2*pi/z) * (z/e*sqrt(z*sinh(1/z)+1/(810*z**6)))**z

def choose(n, k):
    n = Decimal(str(n))
    k = Decimal(str(k))
    return gamma(n+1)/gamma(k+1)/gamma(n-k+1)
Example:
>>> choose(20000000,15000000)
Decimal('2.087655025913799812289651991E+4884377')
>>> choose(130202807,65101404)
Decimal('1.867575060806365854276707374E+39194946')
Any higher, and it will overflow. The exponent seems to be limited to 40000000.
Given a reasonable number of values for n and k, calculate them in advance and use a lookup table.
It's dodging the issue in some fashion (you're offloading the calculation), but it's a useful technique if you're having to determine large numbers of values.
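A minimal sketch of that idea (MAX_N is an arbitrary illustrative bound), filling the table with Pascal's rule C(n, k) = C(n-1, k-1) + C(n-1, k) so no large multiplications are needed:

MAX_N = 1000   # pick whatever bound your application needs

# table[n][k] == C(n, k); the edges are 1, interior cells follow Pascal's rule
table = [[1] * (n + 1) for n in range(MAX_N + 1)]
for n in range(2, MAX_N + 1):
    for k in range(1, n):
        table[n][k] = table[n - 1][k - 1] + table[n - 1][k]

def choose(n, k):
    return table[n][k] if 0 <= k <= n else 0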
MATLAB:
The cheater's way (using the built-in function NCHOOSEK): 13 characters, O(?)
nchoosek(N,k)
My solution: 36 characters, O(min(k,N-k))
a=min(k,N-k);
prod(N-a+1:N)/prod(1:a)
I know this is a really old question, but I struggled with this problem for a long while until I found a really simple solution written in VB 6. After porting it to C#, here is the result:
public int NChooseK(int n, int k)
{
    var result = 1;
    for (var i = 1; i <= k; i++)
    {
        // After each pass, result == C(n - k + i, i), so the division is exact.
        result *= n - (k - i);
        result /= i;
    }
    return result;
}
The final code is so simple you won't believe it will work until you run it.
Also, the original article gives a nice explanation of how the author reached the final algorithm.