I implemented a custom loss function, but its gradient is always zero and I don't understand why.
The code for the objective function:
def objective(p, output):
    x, y = p
    a = minA
    b = minB
    r = 0.1
    XA = 1/2 - 1/2 * torch.tanh(100*((x - a[0])**2 + (y - a[1])**2 - (r + 0.02)**2))
    XB = 1/2 - 1/2 * torch.tanh(100*((x - b[0])**2 + (y - b[1])**2 - (r + 0.02)**2))
    q = (1-XA)*((1-XB)* output + (XB))
    output_grad, _ = torch.autograd.grad(q, (x, y))
    output_grad.requires_grad_()
    q = output_grad**2
    return q
And the code for training the model (which is a simple, fully connected NN):
model = NN(input_size)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

for e in range(epochs):
    for configuration in total:
        print("Train for configuration", configuration)
        # Training pass
        optimizer.zero_grad()
        # output is q~
        output = model(configuration)
        # loss is the objective function we defined
        loss = objective(configuration, output.item())
        loss.backward()
        optimizer.step()
I really think the problem is in the line output_grad, _ = torch.autograd.grad(q, (x, y)).
(During the training, "configuration" is a point sampled from a distribution, identified by the coordinates x and y.)
Thanks!!
Here I provide the code in a Google Colab session:
Google colab
Tanh is a bounded function and converges quite quickly to 1. Your XA and XB points are defined as
XA = 1/2 - 1/2 * torch.tanh(100*(z1 + z2 - z0))
XB = 1/2 - 1/2 * torch.tanh(100*(z3 + z4 - z0))
Since z1 + z2 - z0 and z3 + z4 - z0 are rather close to 1, you will end up with an input close to 100. This means the tanh will output 1, resulting in XA and XB being zero. You might not want to keep this factor of 100 if you want non-zero outputs.
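For what it's worth, here is a minimal sketch (with made-up values for the point and the circle centre, since minA and minB are not shown) of how the factor of 100 saturates the tanh and zeroes the gradient:

import torch

# Made-up point and circle parameters, purely for illustration
x = torch.tensor(0.5, requires_grad=True)
y = torch.tensor(0.5, requires_grad=True)
a = (0.0, 0.0)
r = 0.1

z = (x - a[0])**2 + (y - a[1])**2 - (r + 0.02)**2   # about 0.49 here
XA_saturated = 0.5 - 0.5 * torch.tanh(100 * z)      # tanh(~49) is exactly 1.0 in float32
XA_mild = 0.5 - 0.5 * torch.tanh(z)                 # without the factor, tanh stays unsaturated

XA_saturated.backward(retain_graph=True)            # retain the shared subgraph for the second backward
print(x.grad)     # tensor(0.) -- the gradient has vanished
x.grad = None
XA_mild.backward()
print(x.grad)     # a non-zero gradient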
I have a problem performing a nonlinear fit with GNU Octave. Basically I need to perform a global fit with some shared parameters, while keeping others fixed.
The following code works perfectly in Matlab, but Octave returns an error:
error: operator *: nonconformant arguments (op1 is 34x1, op2 is 4x1)
Attached are my code and the data to play with:
clear
close all
clc
pkg load optim
D = dlmread('hd', ';'); % raw data
bkg = D(1,2:end); % 4 sensors bkg
x = D(2:end,1); % input signal
Y = D(2:end,2:end); % 4 sensors response
W = 1./Y; % weights
b0 = [7 .04 .01 .1 .5 2 1]; % educated guess for start the fit
%% model function
F = @(b) ((bkg + (b(1) - bkg).*(1-exp(-(b(2:5).*x).^b(6))).^b(7)) - Y) .* W;
opts = optimset("Display", "iter");
lb = [5 .001 .001 .001 .001 .01 1];
ub = [];
[b, resnorm, residual, exitflag, output, lambda, Jacob] = ...
lsqnonlin(F,b0,lb,ub,opts)
To give more info: in the array b0, the parameters b0(1), b0(6) and b0(7) are shared among the 4 datasets, while b0(2:5) are specific to each dataset.
Thank you for your help and suggestions! ;)
Raw data:
0,0.3105,0.31342,0.31183,0.31117
0.013229,0.329,0.3295,0.332,0.372
0.013229,0.328,0.33,0.33,0.373
0.021324,0.33,0.3305,0.33633,0.399
0.021324,0.325,0.3265,0.333,0.397
0.037763,0.33,0.3255,0.34467,0.461
0.037763,0.327,0.3285,0.347,0.456
0.069405,0.338,0.3265,0.36533,0.587
0.069405,0.3395,0.329,0.36667,0.589
0.12991,0.357,0.3385,0.41333,0.831
0.12991,0.358,0.3385,0.41433,0.837
0.25368,0.393,0.347,0.501,1.302
0.25368,0.3915,0.3515,0.498,1.278
0.51227,0.458,0.3735,0.668,2.098
0.51227,0.47,0.3815,0.68467,2.124
1.0137,0.61,0.4175,1.008,3.357
1.0137,0.599,0.422,1,3.318
2.0162,0.89,0.5335,1.645,5.006
2.0162,0.872,0.5325,1.619,4.938
4.0192,1.411,0.716,2.674,6.595
4.0192,1.418,0.7205,2.691,6.766
8.0315,2.34,1.118,4.195,7.176
8.0315,2.33,1.126,4.161,6.74
16.04,3.759,1.751,5.9,7.174
16.04,3.762,1.748,5.911,7.151
32.102,5.418,2.942,7.164,7.149
32.102,5.406,2.941,7.164,7.175
64.142,7.016,4.478,7.174,7.176
64.142,7.018,4.402,7.175,7.175
128.32,7.176,6.078,7.175,7.176
128.32,7.175,6.107,7.175,7.173
255.72,7.165,7.162,7.165,7.165
255.72,7.165,7.164,7.166,7.166
511.71,7.165,7.165,7.165,7.165
511.71,7.165,7.165,7.166,7.164
Given the function definition above, if you call F(b0) in the command window you will get a 34x4 matrix, which is correct, since the variable Y has the same size.
That way I can (in theory) compute the standard lsqnonlin objective, (fit - measured)^2.
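For reference, here is a small NumPy sketch (placeholder data, not the real file) of the broadcasting that produces that 34x4 residual matrix: the 34x1 column x is combined with the 1x4 row b(2:5), so the model evaluates to one column per sensor:

import numpy as np

# Toy stand-ins for the real data: 34 inputs and 4 sensors
x = np.linspace(0.01, 500, 34).reshape(-1, 1)   # 34x1 column, like x in the Octave code
bkg = np.array([0.31, 0.31, 0.31, 0.31])        # 1x4 background row
Y = np.ones((34, 4))                            # placeholder for the 4 sensor responses
W = 1.0 / Y                                     # weights

b = np.array([7, .04, .01, .1, .5, 2, 1])       # same educated guess as b0 (0-based here)

# (34x1) broadcast against (1x4) gives 34x4, matching Y
model = bkg + (b[0] - bkg) * (1 - np.exp(-(b[1:5] * x) ** b[5])) ** b[6]
residuals = (model - Y) * W
print(residuals.shape)   # (34, 4)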
I'm creating a game for kids. It involves creating a triangle using 3 lines. The way I approached this is to draw two arcs (semicircles) from the two end points of a base line, but I couldn't figure out how to find the point of intersection of those two arcs. I've searched for it but only found the point of intersection between two straight lines. Is there any method to find this point of intersection? Below is the figure of two arcs drawn from each end of the baseline.
Assume the centers of the circles are (x1, y1) and (x2, y2), and the radii are R1 and R2. Let the ends of the base be A and B and the target point be T. We know that AT = R1 and BT = R2. IMHO the simplest trick to find T is to notice that the difference of the squares of the distances is a known constant (R1^2 - R2^2), and it is easy to see that the set of points meeting this condition is actually a straight line perpendicular to the base. Circle equations:
(x - x1)^2 + (y-y1)^2 = R1^2
(x - x2)^2 + (y-y2)^2 = R2^2
If we subtract one from another we'll get:
(x2 - x1)(2*x - x1 - x2) + (y2 - y1)(2*y - y1 - y2) = R1^2 - R2^2
Let x0 = (x1 + x2)/2 and y0 = (y1 + y2)/2 be the coordinates of the midpoint of the base. Let also the length of the base be L and its projections dx = x2 - x1 and dy = y2 - y1 (i.e. L^2 = dx^2 + dy^2). And let Q = R1^2 - R2^2. So we can see that
2 * (dx * (x-x0) + dy*(y-y0)) = Q
So the line for all (x,y) pairs with R1^2 - R2^2 = Q = const is a straight line orthogonal to the base (because coefficients are exactly dx and dy).
Let's find the point C on the base that is the intersection with that line. It is easy: it splits the base so that the difference of the squares of the lengths is Q. It is easy to find out that it is the point at a distance L/2 + Q/(2*L) from A and L/2 - Q/(2*L) from B. So now we can find that
TC^2 = R1^2 - (L/2 + Q/(2*L))^2
Substituting back Q and simplifying a bit we can find that
TC^2 = (2*L^2*R1^2 + 2*L^2*R2^2 + 2*R1^2*R2^2 - L^4 - R1^4 - R2^4) / (4*L^2)
So let's
a = (R1^2 - R2^2)/(2*L)
b = sqrt(2*L^2*R1^2 + 2*L^2*R2^2 + 2*R1^2*R2^2 - L^4 - R1^4 - R2^4) / (2*L)
Note that formula for b can also be written in a different form:
b = sqrt[(R1+R2+L)*(-R1+R2+L)*(R1-R2+L)*(R1+R2-L)] / (2*L)
which looks quite similar to Heron's formula. And this is not a surprise, because b is effectively the length of the height dropped onto the base AB from T in the triangle ABT, so its length is 2*S/L where S is the area of the triangle. And the triangle ABT obviously has sides of lengths L, R1 and R2 respectively.
To find the target T we need to move a distance a along the base and a distance b in the perpendicular direction. So the coordinates of T, calculated from the middle of the segment, are:
Xt = x0 + a * dx/L ∓ b * dy/L
Yt = y0 + a * dy/L ± b * dx/L
Here the upper and lower signs give the two solutions: one on either side of the base line.
Special case: if R1 = R2 = R, then a = 0 and b = sqrt(R^2 - (L/2)^2), which makes obvious sense: T lies on the perpendicular bisector of the segment at a distance of sqrt(R^2 - (L/2)^2) from its middle.
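A small Python sketch of this recipe (the function name and the sample values are mine, not from the question):

import math

def circle_intersections(x1, y1, R1, x2, y2, R2):
    # Returns the (up to two) intersection points of the two circles
    dx, dy = x2 - x1, y2 - y1
    L = math.hypot(dx, dy)
    if L == 0:
        raise ValueError("the base has zero length")
    a = (R1**2 - R2**2) / (2 * L)
    b2 = (2*L**2*R1**2 + 2*L**2*R2**2 + 2*R1**2*R2**2 - L**4 - R1**4 - R2**4) / (4 * L**2)
    if b2 < 0:
        return []                       # the circles do not intersect
    b = math.sqrt(b2)
    x0, y0 = (x1 + x2) / 2, (y1 + y2) / 2
    p1 = (x0 + a*dx/L - b*dy/L, y0 + a*dy/L + b*dx/L)
    p2 = (x0 + a*dx/L + b*dy/L, y0 + a*dy/L - b*dx/L)
    return [p1, p2]

# Base from (0, 0) to (4, 0), radii 3 and 3 -> apex at (2, ±sqrt(5))
print(circle_intersections(0, 0, 3, 4, 0, 3))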
Hope this helps.
While you have not stated it clearly, I assume that you have points with coordinates (A.X, A.Y) and (B.X, B.Y) and the lengths of two sides, LenA and LenB, and need to find the coordinates of point C.
So you can make equation system exploiting circle equation:
(C.X - A.X)^2 + (C.Y - A.Y)^2 = LenA^2
(C.X - B.X)^2 + (C.Y - B.Y)^2 = LenB^2
and solve it for unknowns C.X, C.Y.
Note that it is worth subtracting A's coordinates from all the others, making and solving the simpler system (the first equation becomes C'.X^2 + C'.Y^2 = LenA^2), and then adding A's coordinates back again.
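For illustration, a short sketch of solving that system symbolically with SymPy (the numeric values are made up):

from sympy import symbols, Eq, solve

Cx, Cy = symbols('Cx Cy', real=True)

# Made-up known quantities
Ax, Ay, Bx, By = 0, 0, 4, 0
LenA, LenB = 3, 3

system = [
    Eq((Cx - Ax)**2 + (Cy - Ay)**2, LenA**2),
    Eq((Cx - Bx)**2 + (Cy - By)**2, LenB**2),
]
print(solve(system, [Cx, Cy]))   # two solutions, one on each side of AB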
So I actually needed this to design a hopper to lift grapes during the wine harvest. I tried to work it out myself, but the algebra is horrible, so I had a look on the web. In the end I did it myself, but introduced some intermediate variables (which I calculate in Excel; this should also work for the OP, since the goal was a calculated solution). In fairness this is really much the same as the previous solutions, but hopefully a little clearer.
Problem:
What are the coordinates of a point P(Xp,Yp) distance Lq from point Q(Xq,Yq) and distance Lr from point R(Xr,Yr)?
Let us first map the problem onto a new coordinate system where Q is the origin, thus Q' = (0,0); let (x,y) = P'(Xp-Xq, Yp-Yq) and let (a,b) = R'(Xr-Xq, Yr-Yq).
We may now write:
x^2 + y^2 = Lq^2 -(1)
(x-a)^2 + (y-b)^2 = Lr^2 -(2)
Expanding 2:
x^2 - 2ax + a^2 + y^2 - 2by + b^2 = Lr^2
Subtracting 1 and rearranging:
2by = -2ax + a^2 + b^2 - Lr^2 + Lq^2
For convenience, let c = a^2 + b^2 + Lq^2 - Lr^2 (these are all known constants, so c may be easily computed), thus we obtain:
y = -ax/b + c/(2b)
Substituting into 1 we obtain:
x^2 + (-ax/b + c/(2b))^2 = Lq^2
Multiply the entire equation by b^2 and gather terms:
(a^2 + b^2)x^2 - ac x + c^2/4 - Lq^2 b^2 = 0
Let A = (a^2 + b^2), B = -ac, and C = c^2/4 - Lq^2 b^2
Use the general solution for a quadratic
x = (-B ± SQRT(B^2 - 4AC))/(2A)
Substitute back into 1 to get:
y= SQRT(Lq^2 - x^2 )
(This avoids computational difficulties where b = 0)
Map back to original coordinate system
P = (x+Xq, y + Yq)
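A minimal Python sketch of this recipe (variable names follow the derivation; the example point values at the end are made up):

import math

def third_point(Xq, Yq, Lq, Xr, Yr, Lr):
    # Shift the frame so that Q is the origin
    a, b = Xr - Xq, Yr - Yq
    c = a**2 + b**2 + Lq**2 - Lr**2
    A = a**2 + b**2
    B = -a * c
    C = c**2 / 4 - Lq**2 * b**2
    disc = B**2 - 4 * A * C
    if disc < 0:
        return []                                 # the two distances cannot both be met
    points = []
    for sign in (+1, -1):
        x = (-B + sign * math.sqrt(disc)) / (2 * A)
        y = math.sqrt(max(Lq**2 - x**2, 0.0))     # as in the derivation; avoids b = 0 issues
        points.append((x + Xq, y + Yq))           # map back to the original coordinates
    return points

# Q = (0, 0), R = (3, 3), Lq = Lr = 3 -> (3, 0) and (0, 3)
print(third_point(0, 0, 3, 3, 3, 3))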
Hope this helps, sorry about the formatting, I had this all pretty in Word, but lost it
I recently read this paper, which introduces a process called "Warm-Up" (WU) that consists in multiplying the KL-divergence term of the loss by a variable whose value depends on the epoch number (it evolves linearly from 0 to 1).
I was wondering if this is the right way to do that:
beta = K.variable(value=0.0)

def vae_loss(x, x_decoded_mean):
    # cross entropy
    xent_loss = K.mean(objectives.categorical_crossentropy(x, x_decoded_mean))

    # kl divergence
    for k in range(n_sample):
        epsilon = K.random_normal(shape=(batch_size, latent_dim), mean=0.,
                                  std=1.0)  # used for every z_i sampling
        # Sample several layers of latent variables
        for mean, var in zip(means, variances):
            z_ = mean + K.exp(K.log(var) / 2) * epsilon

            # build z
            try:
                z = tf.concat([z, z_], -1)
            except NameError:
                z = z_
            except TypeError:
                z = z_

            # sum loss (using a MC approximation)
            try:
                loss += K.sum(log_normal2(z_, mean, K.log(var)), -1)
            except NameError:
                loss = K.sum(log_normal2(z_, mean, K.log(var)), -1)
        print("z", z)
        loss -= K.sum(log_stdnormal(z), -1)
        z = None
    kl_loss = loss / n_sample
    print('kl loss:', kl_loss)

    # result
    result = beta*kl_loss + xent_loss
    return result

# define callback to change the value of beta at each epoch
def warmup(epoch):
    value = (epoch/10.0) * (epoch <= 10.0) + 1.0 * (epoch > 10.0)
    print("beta:", value)
    beta = K.variable(value=value)

from keras.callbacks import LambdaCallback
wu_cb = LambdaCallback(on_epoch_end=lambda epoch, log: warmup(epoch))

# train model
vae.fit(
    padded_X_train[:last_train, :, :],
    padded_X_train[:last_train, :, :],
    batch_size=batch_size,
    nb_epoch=nb_epoch,
    verbose=0,
    callbacks=[tb, wu_cb],
    validation_data=(padded_X_test[:last_test, :, :], padded_X_test[:last_test, :, :])
)
This will not work. I tested it to figure out exactly why it was not working. The key thing to remember is that Keras creates a static graph once at the beginning of training.
Therefore, the vae_loss function is called only once to create the loss tensor, which means that the reference to the beta variable will remain the same every time the loss is calculated. However, your warmup function reassigns beta to a new K.variable. Thus, the beta that is used for calculating loss is a different beta than the one that gets updated, and the value will always be 0.
It is an easy fix. Just change this line in your warmup callback:
beta = K.variable(value=value)
to:
K.set_value(beta, value)
This way the actual value in beta gets updated "in place" rather than creating a new variable, and the loss will be properly re-calculated.
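Putting it together, a minimal sketch of the corrected callback (keeping the question's 10-epoch linear ramp):

from keras import backend as K
from keras.callbacks import LambdaCallback

beta = K.variable(0.0)               # created once; vae_loss closes over this same variable

def warmup(epoch):
    value = min(epoch / 10.0, 1.0)   # linear ramp from 0 to 1 over the first 10 epochs
    print("beta:", value)
    K.set_value(beta, value)         # update the existing variable in place

wu_cb = LambdaCallback(on_epoch_end=lambda epoch, log: warmup(epoch))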
I have just begun using Lasagne and Theano to do some machine learning in Python.
I am trying to modify the softmax class in Theano. I want to change how the activation function (softmax) is calculated. Instead of dividing e_x by e_x.sum(axis=1), I want to divide e_x by the sum of three consecutive numbers.
For instance, the result will be as follows:
sm[0] = e_x[0]/(e_x[0]+e_x[1]+e_x[2])
sm[1] = e_x[1]/(e_x[0]+e_x[1]+e_x[2])
sm[2] = e_x[2]/(e_x[0]+e_x[1]+e_x[2])
sm[3] = e_x[3]/(e_x[3]+e_x[4]+e_x[5])
sm[4] = e_x[4]/(e_x[3]+e_x[4]+e_x[5])
sm[5] = e_x[5]/(e_x[3]+e_x[4]+e_x[5])
and so on...
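To make the target concrete, here is a small NumPy sketch (not Theano) of that grouped normalisation, assuming the number of columns is a multiple of 3:

import numpy as np

x = np.array([[.1, .2, .3, .4, .5, .6]])
e_x = np.exp(x - x.max(axis=1, keepdims=True))

# Normalise every block of 3 consecutive entries separately
blocks = e_x.reshape(-1, 3)
sm = (blocks / blocks.sum(axis=1, keepdims=True)).reshape(x.shape)
print(sm)   # the first three entries sum to 1, and so do the last three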
The problem is that I cannot quite grasp how Theano carries out the computation.
Here is my main question. Does it suffice to just change the perform() function in the softmax class?
Here is the original perform() function:
def perform(self, node, input_storage, output_storage):
    x, = input_storage
    e_x = numpy.exp(x - x.max(axis=1)[:, None])
    sm = e_x / e_x.sum(axis=1)[:, None]
    output_storage[0][0] = sm
Here is my modified perform()
def myPerform(self, node, input_storage, output_storage):
    x, = input_storage
    e_x = numpy.exp(x - x.max(axis=1)[:, None])
    sm = numpy.zeros_like(e_x)
    for i in range(0, symbolCount):
        total = e_x[3*i] + e_x[3*i+1] + e_x[3*i+2]
        sm[3*i] = e_x[3*i]/total
        sm[3*i+1] = e_x[3*i+1]/total
        sm[3*i+2] = e_x[3*i+2]/total
    output_storage[0][0] = sm
With the current code, I am getting 'unorderable types:int()>str()' error when I use the predict method in lasagne.
For something like this you're probably better off constructing a custom softmax via symbolic expressions rather than creating (or modifying) an operation.
Your custom softmax can be defined in terms of symbolic expressions. Doing it this way will give you gradients (and other Theano operation bits and pieces) "for free" but might run slightly slower than a custom operation could.
Here's an example:
import numpy
import theano
import theano.tensor as tt

x = tt.matrix()

# Use the built in softmax operation
y1 = tt.nnet.softmax(x)

# A regular softmax operation defined via ordinary Theano symbolic expressions
y2 = tt.exp(x)
y2 = y2 / y2.sum(axis=1)[:, None]

# Custom softmax operation
def custom_softmax(a):
    b = tt.exp(a)
    b1 = b[:, :3] / b[:, :3].sum(axis=1)[:, None]
    b2 = b[:, 3:] / b[:, 3:].sum(axis=1)[:, None]
    return tt.concatenate([b1, b2], axis=1)

y3 = custom_softmax(x)

f = theano.function([x], outputs=[y1, y2, y3])

x_value = [[.1, .2, .3, .4, .5, .6], [.1, .3, .5, .2, .4, .6]]
y1_value, y2_value, y3_value = f(x_value)
assert numpy.allclose(y1_value, y2_value)
assert y3_value.shape == y1_value.shape
a = numpy.exp(.1) + numpy.exp(.2) + numpy.exp(.3)
b = numpy.exp(.4) + numpy.exp(.5) + numpy.exp(.6)
c = numpy.exp(.1) + numpy.exp(.3) + numpy.exp(.5)
d = numpy.exp(.2) + numpy.exp(.4) + numpy.exp(.6)
assert numpy.allclose(y3_value, [
    [numpy.exp(.1) / a, numpy.exp(.2) / a, numpy.exp(.3) / a, numpy.exp(.4) / b, numpy.exp(.5) / b, numpy.exp(.6) / b],
    [numpy.exp(.1) / c, numpy.exp(.3) / c, numpy.exp(.5) / c, numpy.exp(.2) / d, numpy.exp(.4) / d, numpy.exp(.6) / d]
]), y3_value
I have encountered the following system of differential equations in Lagrangian mechanics. Can you suggest a numerical method, with relevant links and references, for how I can solve it? Also, is there a short implementation in Matlab or Mathematica?
m*x*(dy/dt)^2 + m*g*cos(y) - M*g - (M - m)*(d²x/dt²) = 0
g*sin(y) + 2*(dx/dt)*(dy/dt) + x*(d²y/dt²) = 0
where dx/dt and dy/dt are the first derivatives with respect to time, and d²x/dt², d²y/dt² are the second derivatives.
You can create a vector Y = (x y u v)' so that
dx/dt = u
dy/dt = v
du/dt = d²x/dt²
dv/dt = d²y/dt²
It is possible to isolate the second derivatives from the equations, so you get
d²x/dt² = (m*g*cos(y) + m*x*v² - M*g)/(M-m)
d²y/dt² = -(g*sin(y) - 2*u*v)/x
Now, you can try to solve it using standard ODE solvers, such as Runge-Kutta methods. Matlab has a set of solvers, such as ode23. I didn't test the following, but it would be something like this:
function f = F(t, Y)
    % m, M and g must be defined by you; the time span and initial condition Y0 as well
    x = Y(1); y = Y(2); u = Y(3); v = Y(4);
    f = zeros(4, 1);    % ode23 expects a column vector
    f(1) = u;
    f(2) = v;
    f(3) = (m*g*cos(y) + m*x*v*v - M*g)/(M-m);
    f(4) = -(g*sin(y) - 2*u*v)/x;
end

[T, Y] = ode23(@F, time_period, Y0);
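If it's useful, here is an equivalent sketch in Python with scipy.integrate.solve_ivp; the parameter values, initial state and time span below are only placeholders:

import numpy as np
from scipy.integrate import solve_ivp

# Placeholder parameters and initial state [x, y, u, v]
m, M, g = 1.0, 2.0, 9.81
Y0 = [1.0, 0.1, 0.0, 0.0]

def rhs(t, Y):
    x, y, u, v = Y
    du = (m*g*np.cos(y) + m*x*v**2 - M*g) / (M - m)   # d²x/dt², as written in the answer above
    dv = -(g*np.sin(y) - 2*u*v) / x                   # d²y/dt²
    return [u, v, du, dv]

sol = solve_ivp(rhs, (0.0, 0.3), Y0)
print(sol.y[:, -1])   # state at the end of the (placeholder) time span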