Why the time complexity of bellman equation is n^3 for the direct solution? - reinforcement-learning

For matrix form of bellman equation there is a direct solution. But I dont understand why the computational complexity for this form is n^3. I will be very appreciated if anyone can explain. thx

The Bellman equation of the value function in vector form can be written as
V = R + γPV
Where
V is a column vector representing the value function for each state (1..n)
R is a column vector representing the immediate reward after exiting a particular state
γ (gamma) is the discount factor
P is an nxn transition matrix (All the places we may end up)
Because the Bellman equation is a linear equation we can solve it directly. We can solve for V by rearranging the equation like this.
V - γPV = R
(I - γP) V = R
V = R (I - γP)^-1
Basically, due to the fact that you need to perform matrix inversion to solve for V, which is O(n^3) complexity using the standard Gauss-Jordan algorithm

Related

How to calculate the gradient for a custom deep learning layer that takes a 1X1 neuron as input and multiplies it by a constant NXN matrix?

I want to create a custom deep learning layer that takes as input a 1X1 neuron and uses it to scale a constant, predefined NXN matrix. I do not understand how to calculate the gradient for this layer.
I understand that in this case dLdZ is NXN and dLdX should be 1X1, and I don't understand what dZdX should be to satisfy that, it's obviously not a simple chained derivative where dLdX = dLdZ*dZdX since the dimensions don't match.
The question is not really language depenedent, I write here in Matlab.
%M is the constant NXN matrix
%X is 1X1X1Xb
Z = zeros(N,N,1,b);
for i = 1:b
Z(:,:,:,i) = squeeze(X(:,:,1,i))*M;
end
==============================
edit: the answer I got was very helpful. I now perform the calculation as follows:
dLdX = zeros(1,1,1,b);
for i = 1:b
dLdX(:,:,:,i) =sum(sum(dLdZ(:,:,:,i).*M)));
end
This works perfectly. Thanks!!
I think ur question is a little unclear. I will assume ur goal is to propagate the gradients through ur above defined layer to the batch of scalar values. Let me answer according to how I understand it.
U have parameter X, which is a scalar and of dimension b (b: batch_size). This is used to scale a constant matrix Z, which is of dimension NxN. Lets assume u calculate some scalar loss L immediately from the scaled matrix Z' = Z*X, where Z' is of dimension bxNxN.
Then u can calculate the gradients in X according to:
dL/dX = dL/dZ * dZ/dX --> Note that the dimensions of this product indeed match (unlike ur initial impression) since dL/dZ' is bxNxN and dZ'/dX is bxNxN. Summing over the correct indeces yields dL/dX which is of dimension b.
Did I understand u correct?
Cheers

Compute gradients for unknown number of factor

Is there a better way to write this? More particularly, is there a way to remove the loop and calculate grad directly without iterating?
for j = 1:size(theta)
grad(j) = 1 / m * sum((h - y) .* X(:, j));
endfor
h and y are both vectors, X is a matrix with an arbitrary number of rows and the same number of columns as theta
Your code seems to be pretty optimized.
The only other way; this code can be written; is initializing grad to a zeros matrix and then using the vectorized computation for carrying out the necessary Grad. descent optimization algorithm.

Calculating an integral of two numerical solutions of an ode

I would like to calculate an integral, which is determined by two functions: I(T) = ∫0T i( f(t), g(t)) dt where f and g solves ordinary differential equations and i is known.
The obvious approach would be to derive a differential equation for I and the solve it alongside f and g (which can be done, but is numerically expensive in my case). In my case, however, f solves an equation with an initial condition f(0) and g and equation with a final condition g(T).
My best guess at the moment would be to solve f and g on a grid using a standard ODE solver and then use a standard method for numerical integration with equally spaced t-coordinates or some kind of quadrature rule (basically anything described by Numerical Recipes).
Does anyone have a better solution? That is, a method that takes the specific type of ode solver and its accuracy into account.
Many advanced ODE solvers come with a feature called "dense output". The ODE solver gives you not only the values of f and g on a grid (as specified beforehand), but allows you to use its result to find the values at any time. Combining this with an adaptive quadrature rule should give you an answer to whatever precision you need.

Normal vector from least squares-derived plane

I have a set of points and I can derive a least squares solution in the form:
z = Ax + By + C
The coefficients I compute are correct, but how would I get the vector normal to the plane in an equation of this form? Simply using A, B and C coefficients from this equation don't seem correct as a normal vector using my test dataset.
Following on from dmckee's answer:
a x b = (a2b3 − a3b2), (a3b1 − a1b3), (a1b2 − a2b1)
In your case a1=1, a2=0 a3=A b1=0 b2=1 b3=B
so = (-A), (-B), (1)
Form the two vectors
v1 = <1 0 A>
v2 = <0 1 B>
both of which lie in the plane and take the cross-product:
N = v1 x v2 = <-A, -B, +1> (or v2 x v1 = <A, B, -1> )
It works because the cross-product of two vectors is always perpendicular to both of the inputs. So using two (non-colinear) vectors in the plane gives you a normal.
NB: You probably want a normalized normal, of course, but I'll leave that as an exercise.
A little extra color on the dmckee answer. I'd comment directly, but I do not have enough SO rep yet. ;-(
The plane z = Ax + By + C only contains the points (1, 0, A) and (0, 1, B) when C=0. So, we would be talking about the plane z = Ax + By. Which is fine, of course, since this second plane is parallel to the original one, the unique vertical translation that contains the origin. The orthogonal vector we wish to compute is invariant under translations like this, so no harm done.
Granted, dmckee's phrasing is that his specified "vectors" lie in the plane, not the points, so he's arguably covered. But it strikes me as helpful to explicitly acknowledge the implied translations.
Boy, it's been a while for me on this stuff, too.
Pedantically yours... ;-)

How would you write this algorithm for large combinations in the most compact way?

The number of combinations of k items which can be retrieved from N items is described by the following formula.
N!
c = ___________________
(k! * (N - k)!)
An example would be how many combinations of 6 Balls can be drawn from a drum of 48 Balls in a lottery draw.
Optimize this formula to run with the smallest O time complexity
This question was inspired by the new WolframAlpha math engine and the fact that it can calculate extremely large combinations very quickly. e.g. and a subsequent discussion on the topic on another forum.
http://www97.wolframalpha.com/input/?i=20000000+Choose+15000000
I'll post some info/links from that discussion after some people take a stab at the solution.
Any language is acceptable.
Python: O(min[k,n-k]2)
def choose(n,k):
k = min(k,n-k)
p = q = 1
for i in xrange(k):
p *= n - i
q *= 1 + i
return p/q
Analysis:
The size of p and q will increase linearly inside the loop, if n-i and 1+i can be considered to have constant size.
The cost of each multiplication will then also increase linearly.
This sum of all iterations becomes an arithmetic series over k.
My conclusion: O(k2)
If rewritten to use floating point numbers, the multiplications will be atomic operations, but we will lose a lot of precision. It even overflows for choose(20000000, 15000000). (Not a big surprise, since the result would be around 0.2119620413×104884378.)
def choose(n,k):
k = min(k,n-k)
result = 1.0
for i in xrange(k):
result *= 1.0 * (n - i) / (1 + i)
return result
Notice that WolframAlpha returns a "Decimal Approximation". If you don't need absolute precision, you could do the same thing by calculating the factorials with Stirling's Approximation.
Now, Stirling's approximation requires the evaluation of (n/e)^n, where e is the base of the natural logarithm, which will be by far the slowest operation. But this can be done using the techniques outlined in another stackoverflow post.
If you use double precision and repeated squaring to accomplish the exponentiation, the operations will be:
3 evaluations of a Stirling approximation, each requiring O(log n) multiplications and one square root evaluation.
2 multiplications
1 divisions
The number of operations could probably be reduced with a bit of cleverness, but the total time complexity is going to be O(log n) with this approach. Pretty manageable.
EDIT: There's also bound to be a lot of academic literature on this topic, given how common this calculation is. A good university library could help you track it down.
EDIT2: Also, as pointed out in another response, the values will easily overflow a double, so a floating point type with very extended precision will need to be used for even moderately large values of k and n.
I'd solve it in Mathematica:
Binomial[n, k]
Man, that was easy...
Python: approximation in O(1) ?
Using python decimal implementation to calculate an approximation. Since it does not use any external loop, and the numbers are limited in size, I think it will execute in O(1).
from decimal import Decimal
ln = lambda z: z.ln()
exp = lambda z: z.exp()
sinh = lambda z: (exp(z) - exp(-z))/2
sqrt = lambda z: z.sqrt()
pi = Decimal('3.1415926535897932384626433832795')
e = Decimal('2.7182818284590452353602874713527')
# Stirling's approximation of the gamma-funciton.
# Simplification by Robert H. Windschitl.
# Source: http://en.wikipedia.org/wiki/Stirling%27s_approximation
gamma = lambda z: sqrt(2*pi/z) * (z/e*sqrt(z*sinh(1/z)+1/(810*z**6)))**z
def choose(n, k):
n = Decimal(str(n))
k = Decimal(str(k))
return gamma(n+1)/gamma(k+1)/gamma(n-k+1)
Example:
>>> choose(20000000,15000000)
Decimal('2.087655025913799812289651991E+4884377')
>>> choose(130202807,65101404)
Decimal('1.867575060806365854276707374E+39194946')
Any higher, and it will overflow. The exponent seems to be limited to 40000000.
Given a reasonable number of values for n and K, calculate them in advance and use a lookup table.
It's dodging the issue in some fashion (you're offloading the calculation), but it's a useful technique if you're having to determine large numbers of values.
MATLAB:
The cheater's way (using the built-in function NCHOOSEK): 13 characters, O(?)
nchoosek(N,k)
My solution: 36 characters, O(min(k,N-k))
a=min(k,N-k);
prod(N-a+1:N)/prod(1:a)
I know this is a really old question but I struggled with a solution to this problem for a long while until I found a really simple one written in VB 6 and after porting it to C#, here is the result:
public int NChooseK(int n, int k)
{
var result = 1;
for (var i = 1; i <= k; i++)
{
result *= n - (k - i);
result /= i;
}
return result;
}
The final code is so simple you won't believe it will work until you run it.
Also, the original article gives some nice explanation on how he reached the final algorithm.