I have an optimization problem (optimizing over x[0] and x[1]) where one of the constraints is a function that uses the same constant parameters (a and b) as the objective function:
min f(x, a, b)
x[0] <= 100
x[1] <= 500
g(x, a, b) >= 0.9
But I am not sure how to express the connection between the functions f and g:
x0 = np.array([10, 100])
bnds = ((0, 500), (0, 5000))
arguments = (100, 4)  # These are variables a and b
cons = ({'type': 'ineq', 'fun': lambda x: x[0]},
{'type': 'ineq', 'fun': lambda x: x[1]},
{'type': 'ineq', 'fun': lambda x: g(x, 100, 4)-0.9})
res = minimize(f, x0, args=arguments, method='SLSQP', bounds=bnds, constraints=cons)
print(res.x)
>> x: array([10, 5000])
But for this result, function g evaluates to
g(x,a,b)=0.85434
There is an optimal solution with x=[452, 4188], where
g(x,a,b)=0.901839
How do I need to adapt the constraints so that the constraint on g(x, a, b) is satisfied?
Edit: Obviously the optimization is not successful:
print(res)
>> fun: 1778.86301369863
>> jac: array([1.00019786e+09, 9.31503296e-01])
>> message: 'Inequality constraints incompatible'
>> nfev: 4
>> nit: 1
>> njev: 1
>> status: 4
>> success: False
>> x: array([ 10., 5000.])
f() and g() are not convex, not smooth, and derivatives are not available. Nevertheless, my question aimed for the right syntax (using a function as a constraint). So I tried it with two "simpler" functions (see the executable code below) and it worked. So I assume that my syntax is right and the problem lies with the optimization method "SLSQP".
Is there an optimization method within the SciPy package (some kind of evolutionary algorithm) that I can use to solve my problem, where f() and g() are
not convex,
not smooth, and
derivatives are not available? (See the sketch after the example code below.)
import numpy as np
from scipy.optimize import minimize
def g(x, a, b):
    return (x[0] + x[1] + a + b) / 100

def f(x, a, b):
    return (x[0]*a + x[1]*b) * g(x, a, b)
x0 = np.array([0, 0])
bnds = ((0, 30), (0, 20))
arguments = (2, 3)  # These are variables a and b
cons = ({'type': 'ineq', 'fun': lambda x: x[0]},
{'type': 'ineq', 'fun': lambda x: x[1]},
{'type': 'ineq', 'fun': lambda x: g(x, 2, 3)-0.5}) #<--My question was about the correct syntax of that constraint
res = minimize(f, x0, args=arguments, method='SLSQP', bounds=bnds, constraints=cons)
print(res)
>> fun: 52.50000000000027
>> jac: array([2.05000019, 2.55000019])
>> message: 'Optimization terminated successfully.'
>> nfev: 20
>> nit: 5
>> njev: 5
>> status: 0
>> success: True
>> x: array([30., 15.])
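A possible derivative-free direction is scipy.optimize.differential_evolution, a stochastic global optimizer that needs no gradients and, since SciPy 1.4, accepts nonlinear constraints. Below is a minimal, untested sketch applied to the toy f and g from the example above (the names nlc and the seed value are just illustrative):

import numpy as np
from scipy.optimize import differential_evolution, NonlinearConstraint

def g(x, a, b):
    return (x[0] + x[1] + a + b) / 100

def f(x, a, b):
    return (x[0]*a + x[1]*b) * g(x, a, b)

bnds = [(0, 30), (0, 20)]
arguments = (2, 3)

# require g(x, a, b) >= 0.5, expressed as a NonlinearConstraint
nlc = NonlinearConstraint(lambda x: g(x, *arguments), 0.5, np.inf)

res = differential_evolution(f, bnds, args=arguments, constraints=(nlc,), seed=1)
print(res.x, res.fun)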
I am trying to implement a custom loss function in a Pytorch Autoencoder.
The loss function tries to maximize the cosine similarity between a given output tensor U (a vector) and 100 random vectors J where both U and J have the same dimension of [300]. This is repeated for each batch.
Suppose we have 30 items per batch, then the output tensor is
train_Y.shape = [30,300]
Random_vectors.shape = [30,100,300]
I can implement the loss function in two ways:
All_Y = []
for Y, z_r in zip(train_y, random_vectors):
    Y_cosine_list = []
    for z in z_r:
        cosi = torch.dot(Y, z) / (torch.norm(Y) * torch.norm(z))
        Y_cosine_list.append(cosi)
    All_Y.append(Y_cosine_list)

All_Y = torch.tensor(All_Y).to(device)
train_loss = torch.sum(torch.abs(All_Y)) / dim_0
train_loss = torch.tensor(train_loss.data, requires_grad=True)
or
train_Y = torch.zeros([dim_0, 100])
for i, (Y, z_r) in enumerate(zip(train_Y, random_vectors)):
    for j, z in enumerate(z_r):
        train_Y[i, j] = cos(Y, z)

train_Y = train_Y.to(device)
train_loss = torch.sum(torch.abs(train_Y)) / dim_0
The second one is more elegant and to the point. However, it gives a "CUDA illegal memory access" error. I have checked that the memory is not exceeded in either case. Is there anything wrong with the second implementation?
The first implementation is inelegant and I am not sure that it makes sense from a neural net optimization perspective. But it does not give errors, and I am able to complete training for all the epochs.
PS: I have tried encapsulating this code block in a loss_fn method, but I get the same illegal memory access error.
I have tried everything that I could find for the illegal memory access error: changing GPUs, removing a torch.stack block, etc. But I can't seem to get rid of the problem.
Here is a vectorized way to do it:
import torch
from torch import nn

class CosineLoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x, y):
        """
        Args:
            x (torch.tensor): [batchsize, N, M] - tensor.
            y (torch.tensor): [batchsize, M] - tensor.

        Returns:
            torch.tensor: scalar mean cosine loss
        """
        # dot product along dimension 'm', i.e. multiply and sum along 'm'
        dotp = torch.einsum("bm, bnm -> bn", y, x)
        # L2 norm along dimension 'm', combined by broadcasting
        length = torch.norm(y, dim=-1)[:, None] * torch.norm(x, dim=-1)
        # cosine = dot product of unit vectors
        cos = dotp / length
        return cos.mean()

def test():
    b, n, m = 30, 100, 300
    train_Y = torch.randn(b, m, device='cuda')
    random_vectors = torch.randn(b, n, m, requires_grad=True, device='cuda')
    print(f'{random_vectors.grad = }')

    cosineloss = CosineLoss()
    loss = cosineloss(random_vectors, train_Y)
    print(f'{loss = }')
    loss.backward()
    print(f'{random_vectors.grad.shape = }')
References:
einsum
broadcasting
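For comparison, recent PyTorch versions can likely express the same computation with torch.nn.functional.cosine_similarity, which broadcasts over the extra dimension. A minimal, untested sketch (the tensor names mirror the question):

import torch
import torch.nn.functional as F

b, n, m = 30, 100, 300
train_Y = torch.randn(b, m)            # network outputs
random_vectors = torch.randn(b, n, m)  # random comparison vectors

# unsqueeze train_Y to [b, 1, m] so it broadcasts against [b, n, m]
cos = F.cosine_similarity(random_vectors, train_Y[:, None, :], dim=-1)  # [b, n]
train_loss = torch.sum(torch.abs(cos)) / b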
I am new to Python programming and I am working on deep learning algorithms. I have never seen this type of line in C or C++ programs.
What do the lines predictions[a > 0.5] = 1 and acc = np.mean(predictions == y) mean?
def predict(x, w, b):
    a = sigmoid(w.T @ x + b)
    predictions = np.zeros_like(a)
    predictions[a > 0.5] = 1
    return predictions

def test_model(x, y, w, b):
    predictions = predict(x, w, b)
    acc = np.mean(predictions == y)
    acc = np.asscalar(acc)
    return acc

def main():
    x, y = load_train_data()
    x = flatten(x)
    x = x / 255.  # normalize the data to [0, 1]
    print(f'train accuracy: {test_model(x, y, w, b) * 100:.2f}%')

    x, y = load_test_data()
    x = flatten(x)
    x = x / 255.  # normalize the data to [0, 1]
    print(f'test accuracy: {test_model(x, y, w, b) * 100:.2f}%')
Generally speaking, without knowing more, it is impossible to say, because [] is just invoking a method, which can be defined on any class:
class Indexable:
    def __getitem__(self, index):
        if index:
            return "Truly indexed!"
        else:
            return "Falsely indexed!"

predictions = Indexable()
a = 0.7
predictions[a > 0.5]
# => 'Truly indexed!'
The same is true of the operator >.
However, from context, it is likely that both predictions and a are numpy arrays of the same size, and that a contains numbers.
a > .5 will produce another array of the same size as a with False where the element is .5 or smaller, and True where it is larger than .5.
predictions[b] where b is a boolean array of the same size will produce an array that only contains the elements where b is True:
predictions = np.array([1, 2, 3, 4, 5])
b = [True, False, False, False, True]
predictions[b]
# => array([1, 5])
The indexed assignments are similar, setting a value only where the corresponding element in the index array is True:
predictions[b] = 17
predictions
# => array([17, 2, 3, 4, 17])
So the line you are wondering about sets predictions to 1 at every position where the corresponding element of a is greater than 0.5.
As for the other line you're wondering about, the logic of == is similar to that of > above: predictions == y will give a boolean array telling where predictions and y coincide.
This array is then passed to np.mean, which calculates the arithmetic average of its argument. How do you get an average of [True, False, False, False, True], though? By coercing the values to floats! float(True) is 1.0; float(False) is 0.0. The average of a boolean list thus tells you the fraction of elements that are true: np.mean([True, False, False, False, True]) is the same as np.mean([1.0, 0.0, 0.0, 0.0, 1.0]), giving the result 0.4 (i.e. 40% of the elements are True).
So your line calculates which proportion of predictions is same as y (in other words, accuracy of the prediction, assuming y is the gold data).
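As a concrete (made-up) illustration of both lines:

import numpy as np

a = np.array([0.2, 0.7, 0.9, 0.4])
predictions = np.zeros_like(a)
predictions[a > 0.5] = 1          # -> array([0., 1., 1., 0.])

y = np.array([0., 1., 0., 0.])
acc = np.mean(predictions == y)   # matches at 3 of 4 positions -> 0.75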
I have a function defined as below
\begin{equation}
f(x) = e^{-k1/x}x^{-2}(k1/x+56)^{81}
\end{equation}
Now I want to integrate the function from 0 to infinity.
\begin{equation}
S = \int_{0}^{\infty} f(x) \, dx
\end{equation}
And then I want to find the cumulative function defined as below
\begin{equation}
CDF(p) = \int^{p}_{0} \frac{f(x)}{S} dx
\end{equation}
To do so, I wrote a program in Python.
from matplotlib import pyplot as plt
from scipy.integrate import quad
from math import pi, exp
import numpy as np
def func(x, k1, n):
    w = -1.8*n + 15  # scale the function down
    return (10**w) * exp(-k1/x) * x**(-2) * (k1/x + 56)**n

def S(k1, n):
    return quad(func, 0, 1e+28, args=(k1, n))[0] + quad(func, 1e+28, 1e+33, args=(k1, n))[0]

def CDF(x, k1, n):
    return quad(func, 0, x, args=(k1, n))[0] / S(k1, n)

k1 = 7.7e+27  # when k1 is < 3, CDF does not generate an error
n = 81

print(S(k1, n))
print(CDF(1.1e+27, k1, n))
But unfortunately, CDF(1.1e+27) throws the error "results out of range".
How could I obtain CDF(1.1e+27)?
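For reference, one direction worth trying (a sketch only, untested, assuming the error is an overflow when (k1/x + 56)**n is evaluated directly for small x) is to compute the integrand through its logarithm and exponentiate only at the end; func_log, S2, and CDF2 below are just renamed variants of the functions above:

import numpy as np
from scipy.integrate import quad

def func_log(x, k1, n):
    # same integrand as func(), but evaluated via its logarithm so that the
    # huge intermediate value (k1/x + 56)**n never appears explicitly
    w = -1.8*n + 15
    log_val = w*np.log(10) - k1/x - 2*np.log(x) + n*np.log(k1/x + 56)
    return np.exp(log_val)

def S2(k1, n):
    return quad(func_log, 0, 1e+28, args=(k1, n))[0] + quad(func_log, 1e+28, 1e+33, args=(k1, n))[0]

def CDF2(x, k1, n):
    return quad(func_log, 0, x, args=(k1, n))[0] / S2(k1, n)

print(CDF2(1.1e+27, 7.7e+27, 81))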
Suppose we have a composite type:
mutable struct MyType{TF<:AbstractFloat, TI<:Integer}
    a::TF
    b::TF
end
We define a constructor
function MyType(a; b = 1.0)
    return MyType(a, b)
end
I can broadcast MyType over an array of a's, but how can I do that for b's?
I tried to do
MyType.([1.0, 2.0, 3.0]; [:b, 1.0, :b, 2.0, :b, 3.0,])
But, this does not work.
Note that the above example is totally artificial. In reality, I have a composite type that takes in many fields, many of which are constructed using keyword arguments, and I only want to change a few of them into different values stored in an array.
I don't think you can do this with dot-notation; however, you can manually construct the broadcast call:
julia> struct Foo
           a::Int
           b::Int
           Foo(a; b = 1) = new(a, b)
       end
julia> broadcast((x, y) -> Foo(x, b = y), [1,2,3], [4,5,6])
3-element Array{Foo,1}:
Foo(1, 4)
Foo(2, 5)
Foo(3, 6)
julia> broadcast((x, y) -> Foo(x; y), [1,2,3], [:b=>4,:b=>5,:b=>6])
3-element Array{Foo,1}:
Foo(1, 4)
Foo(2, 5)
Foo(3, 6)
I have just begun using Lasagne and Theano to do some machine learning in Python.
I am trying to modify the softmax class in Theano. I want to change how the activation function (softmax) is calculated. Instead of dividing e_x by e_x.sum(axis=1), I want to divide e_x by the sum of three consecutive numbers.
For instance, the result will be as follows:
sm[0] = e_x[0]/(e_x[0]+e_x[1]+e_x[2])
sm[1] = e_x[1]/(e_x[0]+e_x[1]+e_x[2])
sm[2] = e_x[2]/(e_x[0]+e_x[1]+e_x[2])
sm[3] = e_x[3]/(e_x[3]+e_x[4]+e_x[5])
sm[4] = e_x[4]/(e_x[3]+e_x[4]+e_x[5])
sm[5] = e_x[5]/(e_x[3]+e_x[4]+e_x[5])
and so on...
The problem is that I cannot quite grasp how Theano carries out the computation.
Here is my main question. Does it suffice to just change the perform() function in the softmax class?
Here is the original perform() function:
def perform(self, node, input_storage, output_storage):
    x, = input_storage
    e_x = numpy.exp(x - x.max(axis=1)[:, None])
    sm = e_x / e_x.sum(axis=1)[:, None]
    output_storage[0][0] = sm
Here is my modified perform()
def myPerform(self, node, input_storage, output_storage):
    x, = input_storage
    e_x = numpy.exp(x - x.max(axis=1)[:, None])
    sm = numpy.zeros_like(e_x)
    for i in range(0, symbolCount):
        total = e_x[3*i] + e_x[3*i+1] + e_x[3*i+2]
        sm[3*i] = e_x[3*i] / total
        sm[3*i+1] = e_x[3*i+1] / total
        sm[3*i+2] = e_x[3*i+2] / total
    output_storage[0][0] = sm
With the current code, I am getting 'unorderable types:int()>str()' error when I use the predict method in lasagne.
For something like this you're probably better off constructing a custom softmax via symbolic expressions rather than creating (or modifying) an operation.
Your custom softmax can be defined in terms of symbolic expressions. Doing it this way will give you gradients (and other Theano operation bits and pieces) "for free" but might run slightly slower than a custom operation could.
Here's an example:
import numpy
import theano
import theano.tensor as tt
x = tt.matrix()
# Use the built in softmax operation
y1 = tt.nnet.softmax(x)
# A regular softmax operation defined via ordinary Theano symbolic expressions
y2 = tt.exp(x)
y2 = y2 / y2.sum(axis=1)[:, None]
# Custom softmax operation
def custom_softmax(a):
    b = tt.exp(a)
    b1 = b[:, :3] / b[:, :3].sum(axis=1)[:, None]
    b2 = b[:, 3:] / b[:, 3:].sum(axis=1)[:, None]
    return tt.concatenate([b1, b2], axis=1)
y3 = custom_softmax(x)
f = theano.function([x], outputs=[y1, y2, y3])
x_value = [[.1, .2, .3, .4, .5, .6], [.1, .3, .5, .2, .4, .6]]
y1_value, y2_value, y3_value = f(x_value)
assert numpy.allclose(y1_value, y2_value)
assert y3_value.shape == y1_value.shape
a = numpy.exp(.1) + numpy.exp(.2) + numpy.exp(.3)
b = numpy.exp(.4) + numpy.exp(.5) + numpy.exp(.6)
c = numpy.exp(.1) + numpy.exp(.3) + numpy.exp(.5)
d = numpy.exp(.2) + numpy.exp(.4) + numpy.exp(.6)
assert numpy.allclose(y3_value, [
[numpy.exp(.1) / a, numpy.exp(.2) / a, numpy.exp(.3) / a, numpy.exp(.4) / b, numpy.exp(.5) / b, numpy.exp(.6) / b],
[numpy.exp(.1) / c, numpy.exp(.3) / c, numpy.exp(.5) / c, numpy.exp(.2) / d, numpy.exp(.4) / d, numpy.exp(.6) / d]
]), y3_value
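If the groups of three are meant to tile the whole row (as in the question), a reshape-based variant of custom_softmax avoids hardcoding the column slices. This is a sketch only, untested, and assumes the number of columns is a multiple of the group size:

def grouped_softmax(a, group_size=3):
    # a: matrix whose number of columns is a multiple of group_size
    b = tt.exp(a - a.max(axis=1, keepdims=True))   # subtract row max for numerical stability
    b3 = b.reshape((a.shape[0], a.shape[1] // group_size, group_size))
    b3 = b3 / b3.sum(axis=2, keepdims=True)        # normalize within each group of columns
    return b3.reshape((a.shape[0], a.shape[1]))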