Octave genetic algorithm not using lower bounds - octave

Is there a trick to getting Octave's genetic algorithm solver to respect upper and lower bounds? For example,
options = gaoptimset('Generations', 10);
nvars = 6;
LB = ones(1,nvars);
UB = LB*10;
[soln, fval, exitflag] = ga(@fitnessfcn, nvars, [], [], [], [], LB, UB, [], options)
The solver returns a solution that violates the bounds. For example,
soln = 0.551420 1.369775 -0.313379 -0.038621 0.274696 1.359802
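A quick check, using the variables defined above, confirms the violation:
any(soln < LB | soln > UB)   % returns 1: several components lie below LB = 1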
UPDATE: I checked the scripts in the Octave package. I am pretty sure the ga function does not use the upper and lower bound arguments. It does seem to read some upper/lower bound information from the gaoptimset options. I'll play around with it when I have some time.

As of version 0.10, the bounds passed as parameters to ga seem to be pretty much ignored. For now the only way to specify a bound is through the PopInitRange parameter of gaoptimset, which takes a 2xN matrix containing the LBs in the first row and the UBs in the second one. If a 2x1 vector is provided, the same bounds are applied to all variables.
The LB and UB parameters are supposed to mimic the original Matlab ga function, but apparently the implementation has gone its own way.
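For instance, here is a minimal sketch of routing the original question's bounds through PopInitRange (note that this option sets the range of the initial population; whether later generations are kept inside it is not guaranteed):
nvars = 6;
LB = ones(1, nvars);      % lower bounds -> first row
UB = LB * 10;             % upper bounds -> second row
options = gaoptimset('Generations', 10, 'PopInitRange', [LB; UB]);
[soln, fval, exitflag] = ga(@fitnessfcn, nvars, [], [], [], [], [], [], [], options);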

Problem
I can confirm that the bound parameters are not used by the ga() implementation in the Octave ga package.
Workaround
I got around this limitation by checking the bounds inside my score function and adding a large constant penalty, ParaOutOfRangePenalty, whenever a bound is violated. You can implement this similarly in your fitnessfcn():
% Your problem-specific score calculation goes here:
score = myfitnessfcn(data, parameters);
% Add a penalty for each parameter outside the limits ("soft" punishment)
score = score + sum(parameters(:) < lowerBound) * ParaOutOfRangePenalty;
score = score + sum(parameters(:) > upperBound) * ParaOutOfRangePenalty;
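Put together, here is a minimal sketch of such a penalized fitness function as it would be handed to ga (myfitnessfcn, the bound values, and the penalty magnitude are placeholders for your own problem):
function score = penalizedFitness(parameters)
    lowerBound = 1;                 % assumed bounds; replace with your own
    upperBound = 10;
    ParaOutOfRangePenalty = 1e6;    % large constant penalty, tuned to your score scale
    score = myfitnessfcn(parameters);   % your problem-specific objective (placeholder)
    % "Soft" punishment: one penalty per out-of-range parameter
    score = score + sum(parameters(:) < lowerBound) * ParaOutOfRangePenalty;
    score = score + sum(parameters(:) > upperBound) * ParaOutOfRangePenalty;
end
% Then call, e.g.: soln = ga(@penalizedFitness, nvars, [], [], [], [], [], [], [], options);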
Results
I found that the Octave ga() algorithm clearly respects the boundaries when using this method.

Related

Having issue with max_norm parameter of torch.nn.Embedding

I use torch.nn.Embedding to embed my model’s categorical input features; however, I run into problems when I set the max_norm parameter to something other than None.
There is a note on the PyTorch docs page that explains how to use the max_norm parameter through the following example:
import torch
import torch.nn as nn

n, d, m = 3, 5, 7
embedding = nn.Embedding(n, d, max_norm=True)
W = torch.randn((m, d), requires_grad=True)
idx = torch.tensor([1, 2])
a = embedding.weight.clone() @ W.t()  # weight must be cloned for this to be differentiable
b = embedding(idx) @ W.t()  # modifies weight in-place
out = (a.unsqueeze(0) + b.unsqueeze(1))
loss = out.sigmoid().prod()
loss.backward()
I can’t easily understand this example from the docs. What is the purpose of having both ‘a’ and ‘b’, and why is ‘out’ defined as out = (a.unsqueeze(0) + b.unsqueeze(1))?
Do we need to first clone the entire embedding tensor, as in ‘a’, and then find the embeddings for our desired indices, as in ‘b’? And how do ‘a’ and ‘b’ then need to be added?
In my code, I don’t have W explicitly, I am assuming that W is representative of the weights applied by the torch.nn.Linear layers. So, I just need to prepare the input (which includes the embeddings for categorical features) that goes into my network.
I greatly appreciate any instructions on this, as understanding this example would help me adapt my code accordingly.
Because W in the line computing a requires gradients, embedding.weight must be saved so those gradients can be computed in the backward pass. However, in the line computing b, executing embedding(idx) renormalizes the rows of embedding.weight that exceed max_norm - in place. So, without the clone in line a, embedding.weight would be modified when line b executes, changing what was saved for the backward pass to update W. Hence the requirement to clone embedding.weight: to save it before it gets rescaled in line b.
If you don't use embedding.weight outside of the normal forward pass, you don't need to worry about all this.
If you get an error, post it (and your code).

In DQN, how to perform gradient descent when each record in the experience buffer corresponds to only one action?

[Figure: pseudocode of the DQN algorithm, from the linked source.]
At the gradient descent line, there's something I don't quite understand.
For example, if I have 8 actions, then the output Q is a vector of 8 components, right?
But for each record in D, the return y_i is only a scalar with respect to a given action. How can I perform gradient descent on (y_i - Q)^2? I think it's not guaranteed that within a minibatch I have all actions' returns for a state.
You need to calculate the loss only on the Q-value whose action was selected. In your example, assume that for a given row in your mini-batch the chosen action is 3. Then you obtain the corresponding target y_3, the loss is (Q(s,3) - y_3)^2, and you effectively set the loss value of all other actions to zero. You can implement this by using gather_nd in TensorFlow, or by building a one-hot encoding of the actions and multiplying it element-wise with the Q-value vector. Using a one-hot vector you can write:
action_input = tf.placeholder("float", [None, action_len])  # one-hot encoded actions
QValue_batch = tf.reduce_sum(tf.multiply(T_Q_value, action_input), reduction_indices=1)
in which action_input = np.eye(nb_classes)[your_action] (e.g. action 3). The same procedure can be followed with gather_nd:
https://www.tensorflow.org/api_docs/python/tf/gather_nd
I hope this resolves your confusion.

Numerical stability of ODE system

I have to numerically solve an ODE system of the following form:
du_j/dt = f_1(u_j, v_j, t) + g_1(t)v_(j-1) + h_1(t)v_(j+1),
dv_j/dt = f_2(u_j, v_j, t) + g_2(t)u_(j-1) + h_2(t)u_(j+1),
where u_j(t) and v_j(t) are complex-valued scalar functions of time t, f_i, g_i, and h_i are given functions, and j = -N,...,N. This is an initial value problem and the task is to find the solution at a certain time T.
If g_i(t) = h_i(t) = 0, then the equations for different values of j can be solved independently. In this case I obtain stable and accurate solutions with the fourth-order Runge-Kutta method. However, once I turn on the couplings, the results become very unstable with respect to the time grid step and to the explicit form of the functions g_i, h_i.
I guess it is reasonable to try an implicit Runge-Kutta scheme, which might be stable in such a case, but then I would have to invert a huge matrix of size 4*N*c at each step, where c depends on the order of the method (e.g. c = 3 for the Gauss-Legendre method). Of course, the matrix will mostly contain zeros and have a block tridiagonal form, but it still seems very time-consuming.
So I have two questions:
Is there a stable explicit method which works even when the coupling functions g_i and h_i are (very) large?
If an implicit method is, indeed, a good solution, what is the fastest method for inverting a block tridiagonal matrix? At the moment I just perform simple Gaussian elimination, avoiding the redundant operations that arise from the specific structure of the matrix.
Additional info and details that might help us:
I use Fortran 95.
I currently consider g_1(t) = h_1(t) = g_2(t) = h_2(t) = -i*A*F(t)*sin(omega*t), where i is the imaginary unit, A and omega are given constants, and F(t) is a smooth envelope that slowly goes from 0 to 1 and then back from 1 to 0, so F(0) = F(T) = 0.
Initially u_j = v_j = 0 unless j = 0. The functions u_j and v_j with large absolute values of j are extremely small for all t, so the initial peak does not reach the "boundaries".
To 1) There will be no stable explicit method if your coupling functions are very large. This is due to the fact that the stability region of explicit (Runge-Kutta) methods is compact.
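To make this concrete: applied to the scalar test equation du/dt = lambda*u, one RK4 step multiplies u by the stability polynomial R(h*lambda), and |R(h*lambda)| <= 1 is required for stability. Large couplings mean large |lambda|, which forces the step h to shrink. A quick Octave sketch (the polynomial below is the standard RK4 one):
% RK4 advances u_{n+1} = R(h*lambda) * u_n for du/dt = lambda*u
R = @(z) 1 + z + z.^2/2 + z.^3/6 + z.^4/24;
abs(R(-2.5))   % ~0.65 -> |h*lambda| = 2.5 is inside the stability region
abs(R(-3.0))   % ~1.38 -> outside: the step must shrink as |lambda| grows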
To 2) If your matrices are larger than 100x100 you could use this method:
Inverses of Block Tridiagonal Matrices and Rounding Errors.
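Complementing that reference: for solving the linear system at each implicit step you typically do not need an explicit inverse at all; a block version of the Thomas algorithm does forward elimination and back substitution in O(N) block operations. A minimal Octave sketch, assuming the blocks are stored in cell arrays (the Fortran 95 translation is direct):
% Solve A*x = b for block tridiagonal A with n diagonal blocks D{i},
% sub-diagonal blocks Lo{i} (i = 2..n) and super-diagonal blocks Up{i}
% (i = 1..n-1), each m-by-m; b and x are cell arrays of m-vectors.
function x = block_thomas(Lo, D, Up, b)
    n = numel(D);
    % Forward elimination: reduce to block upper triangular form
    for i = 2:n
        factor = Lo{i} / D{i-1};          % Lo{i} * inv(D{i-1})
        D{i} = D{i} - factor * Up{i-1};
        b{i} = b{i} - factor * b{i-1};
    end
    % Back substitution
    x = cell(n, 1);
    x{n} = D{n} \ b{n};
    for i = n-1:-1:1
        x{i} = D{i} \ (b{i} - Up{i} * x{i+1});
    end
end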

Finding the smallest distance in a set of points from the origin

I am to find the smallest distance between a given set of points and the origin. I have a matrix with 2 columns and 10 rows, where each row represents the coordinates of one point. I would like to calculate the distance from each point to the origin and find the smallest one. I would also like to determine which point gives this smallest distance.
In Octave, I calculate this distance by using norm and for each point in my set, I have a distance associated with them and the smallest distance is obviously the one I'm looking for. However, the code I wrote below isn't working the way it should.
function [dist,koor] = bonus4(S)
S= [-6.8667, -44.7967;
-38.0136, -35.5284;
14.4552, -27.1413;
8.4996, 31.7294;
-17.2183, 28.4815;
-37.5100, 14.1941;
-4.2664, -24.4428;
-18.6655, 26.9427;
-15.8828, 18.0170;
17.8440, -22.9164];
for i=1:size(S)
    L=norm(S(i, :))
    dist=norm(S(9, :));
    koor=S(9, :);
end
i = 9 is the correct answer, but I have hard-coded that number in. How do I tell Octave to find that index for me? Specifically, in these lines:
dist=norm(S(9, :));
koor=S(9, :);
I cannot use any packages. I found the geometry package online but I am to solve the task without additional packages.
I'll work off of your original code. Firstly, you want to compute the norm of every point and store these as individual elements in an array. Your current code isn't doing that: it overwrites the variable L, which holds only a single value, at each iteration of the loop.
You'll want to make L an array and store the norms at each iteration of the loop. Once you do this, you'll want to find the location as well as the minimum distance itself. That can be done with one call to min where the first output gives you the minimum distance and the second output gives you the location of the minimum. You can use the second output to slice into your S array to retrieve the actual point.
Last but not least, you need to define S first before calling this function. You are defining S inside the function and that will probably give you unintended results if you want to change the input into this function at each invocation. Therefore, define S first, then call the function:
S= [-6.8667, -44.7967;
-38.0136, -35.5284;
14.4552, -27.1413;
8.4996, 31.7294;
-17.2183, 28.4815;
-37.5100, 14.1941;
-4.2664, -24.4428;
-18.6655, 26.9427;
-15.8828, 18.0170;
17.8440, -22.9164];
function [dist,koor] = bonus4(S)
    %// New - Create an array to store the distances
    L = zeros(size(S,1), 1);
    %// Change to iterate over number of rows
    for i=1:size(S,1)
        L(i)=norm(S(i, :)); %// Change
    end
    [dist,ind] = min(L); %// Find the minimum distance
    koor = S(ind,:); %// Get the actual point
end
Or, make sure you save the above function in a file called bonus4.m, then do this in the Octave command prompt:
octave:1> S= [-6.8667, -44.7967;
> -38.0136, -35.5284;
> 14.4552, -27.1413;
> 8.4996, 31.7294;
> -17.2183, 28.4815;
> -37.5100, 14.1941;
> -4.2664, -24.4428;
> -18.6655, 26.9427;
> -15.8828, 18.0170;
> 17.8440, -22.9164];
octave:2> [dist,koor] = bonus4(S);
Though this code works, I'd argue that it's slow because you're using a for loop. A faster way is to do this completely vectorized. Because norm applied to a matrix means something different than norm applied to a vector, you'll have to compute the distance yourself. Because you are measuring the distance from the origin, you can simply square each of the columns individually, then sum across the columns of each row.
Therefore, you can just do this:
S= [-6.8667, -44.7967;
-38.0136, -35.5284;
14.4552, -27.1413;
8.4996, 31.7294;
-17.2183, 28.4815;
-37.5100, 14.1941;
-4.2664, -24.4428;
-18.6655, 26.9427;
-15.8828, 18.0170;
17.8440, -22.9164];
function [dist,koor] = bonus4(S)
    %// New - Computes the norm of each point
    L = sqrt(sum(S.^2, 2));
    [dist,ind] = min(L); %// Find the minimum distance
    koor = S(ind,:); %// Get the actual point
end
The function sum can be used to sum over a dimension independently. By doing S.^2, you square each term in the points matrix; then, using sum with 2 as the second parameter, you sum across all of the columns of each row. Taking the square root of this result computes the distance of each point to the origin, exactly as the for loop does. However, this (at least to me) is more readable, and I daresay faster for larger sets of points.
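To make the dimension argument concrete, here is what each step produces on a tiny example:
A = [1 2; 3 4];
A.^2                 % -> [1 4; 9 16]   (element-wise squaring)
sum(A.^2, 2)         % -> [5; 25]       (sum across the columns of each row)
sqrt(sum(A.^2, 2))   % -> per-row Euclidean norms, i.e. distances to the origin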

(Easy) Matlab: finding zero spots (fzero)

I'm all new to Matlab and I'm supposed to use this function to find all 3 zero spots.
f.m (my file where the function can be found)
function fval = f(x)
% FVAL = F(X), compute the value of a test function in x
fval = exp(-x) - exp(-2*x) + 0.05*x - 0.25;
So obviously I write "type f" to read my function, but then I try something like fzero('f', 0) and I get the answer 0.4347. I assume that's 1 of my 3 zeros, but how do I find the other 2?
From fzero documentation
x = fzero(fun,x0) tries to find a zero of fun near x0, if x0 is a scalar. fun is a function handle. The value x returned by fzero is near a point where fun changes sign, or NaN if the search fails. In this case, the search terminates when the search interval is expanded until an Inf, NaN, or complex value is found.
So it can't find all zeros by itself, only one! Which one depends on your inputted x0.
Here's an example of how to find some more zeros, if you know the interval. However, it just repeatedly calls fzero for different points in the interval (and can still miss a zero if your discretization is too coarse); a more clever technique will obviously be faster:
http://www.mathworks.nl/support/solutions/en/data/1-19BT9/index.html?product=ML&solution=1-19BT9
As you can see in the documentation and the example above, the proper way of calling fzero is with a function handle (@fun), so in your case:
zero1 = fzero(@f, 0);
From this info you can also see that the actual roots are at 0.434738, 1.47755 and 4.84368. So if you call fzero with 0.4, 1.5 and 4.8 you probably get those values out of it (convergence of fzero depends on which algorithm it uses and what function you feed it).
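For reference, here is a minimal sketch of that scan-and-refine idea for this particular f (the interval [0, 6] and the grid resolution are assumptions; a grid that is too coarse can still miss roots):
f = @(x) exp(-x) - exp(-2*x) + 0.05*x - 0.25;
xs = linspace(0, 6, 200);                % coarse scan grid (assumed)
fs = f(xs);
roots = [];
for k = 1:numel(xs)-1
    if sign(fs(k)) ~= sign(fs(k+1))      % a sign change brackets a root
        roots(end+1) = fzero(f, [xs(k) xs(k+1)]);
    end
end
roots   % -> approximately 0.4347, 1.4776, 4.8437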
Just to complement Gunther Struyf's answer: there's a nice function on the file exchange by Stephen Morris called FindRealRoots. This function finds an approximation to all roots of any function on any interval.
It works by approximating the function with a Chebyshev polynomial, and then computing the roots of that polynomial. This obviously only works well with continuous, smooth and otherwise well-behaved functions, but the function you give seems to have those qualities.
You would use this something like so:
%# find approximate roots
R = FindRealRoots(@f, -1, 10, 100);
%# refine all roots thus found
for ii = 1:numel(R)
    R(ii) = fzero(@f, R(ii));
end