I have a neural network and this is the function for it:
import torch.nn as nn

def mlp(sizes, activation=nn.Tanh, output_activation=nn.Identity):
    # Build a feedforward neural network; the outputs are the logits.
    layers = []
    for j in range(len(sizes)-1):
        act = activation if j < len(sizes)-2 else output_activation
        layers += [nn.Linear(sizes[j], sizes[j+1]), act()]
    return nn.Sequential(*layers)
and sizes=[10, 20]
My input is a tensor of shape [100, 1, 10], and with this network I get an output tensor of shape [100, 20]. What I would like is to add two more layers so that, instead of a matrix, the output is a vector of size 20. Is this possible, considering that the input includes a batch of images?
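For reference, a quick shape check of this setup (a sketch, assuming the mlp function above; note that nn.Linear only transforms the last dimension of its input):

import torch

net = mlp([10, 20])
x = torch.randn(100, 1, 10)  # a batch shaped as described above
print(net(x).shape)          # the Linear layer maps the trailing dimension 10 -> 20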
I am trying to implement a custom loss function in a Pytorch Autoencoder.
The loss function tries to maximize the cosine similarity between a given output tensor U (a vector) and 100 random vectors J, where U and each vector in J have the same dimension of [300]. This is repeated for each batch.
Suppose we have 30 items per batch, then the output tensor is
train_Y.shape = [30,300]
Random_vectors.shape = [30,100,300]
I can implement the loss function in two ways:
All_Y = []
for Y, z_r in zip(train_y, random_vectors):
    Y_cosine_list = []
    for z in z_r:
        cosi = torch.dot(Y, z) / (torch.norm(Y) * torch.norm(z))
        Y_cosine_list.append(cosi)
    All_Y.append(Y_cosine_list)
All_Y = torch.tensor(All_Y).to(device)
train_loss = torch.sum(torch.abs(All_Y)) / dim_0
train_loss = torch.tensor(train_loss.data, requires_grad=True)
or
train_Y = torch.zeros([dim_0, 100])
for i, (Y, z_r) in enumerate(zip(train_Y, random_vectors)):
    for j, z in enumerate(z_r):
        train_Y[i, j] = cos(Y, z)
train_Y = train_Y.to(device)
train_loss = torch.sum(torch.abs(train_Y)) / dim_0
The second one is more elegant and to the point. However, it gives a "CUDA illegal memory access" error. I have checked that the memory is not exceeded in either case. Is there anything wrong with the second implementation?
The first implementation is inelegant, and I am not sure it makes sense from a neural-net optimization perspective, but it does not give errors and I am able to complete training for all the epochs.
P.S.: I have tried encapsulating this code block in a loss_fn method, but I get the same illegal memory access error.
I have tried everything I could find for the illegal memory access error (changing GPUs, removing a torch.stack block, etc.), but I can't seem to get rid of the problem.
Here is a vectorized way to do it
import torch
import torch.nn as nn

class CosineLoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x, y):
        """
        Args:
            x (torch.Tensor): [batchsize, N, M] tensor.
            y (torch.Tensor): [batchsize, M] tensor.
        Returns:
            torch.Tensor: scalar mean cosine loss.
        """
        # dot product along dimension 'm', i.e. multiply and sum along 'm'
        dotp = torch.einsum("bm, bnm -> bn", y, x)
        # L2 norms along dimension 'm', multiplied together via broadcasting
        length = torch.norm(y, dim=-1)[:, None] * torch.norm(x, dim=-1)
        # cosine = dot product of unit vectors
        cos = dotp / length
        return cos.mean()
def test():
    b, n, m = 30, 100, 300
    train_Y = torch.randn(b, m, device='cuda')
    random_vectors = torch.randn(b, n, m, requires_grad=True, device='cuda')
    print(f'{random_vectors.grad = }')

    cosineloss = CosineLoss()
    loss = cosineloss(random_vectors, train_Y)
    print(f'{loss = }')

    loss.backward()
    print(f'{random_vectors.grad.shape = }')
References:
einsum
broadcasting
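Depending on the PyTorch version, torch.nn.functional.cosine_similarity also broadcasts, so the same quantity can be computed without einsum (a sketch, assuming x and y have the shapes documented in forward above):

import torch.nn.functional as F

# x: [batchsize, N, M], y: [batchsize, M]
# unsqueeze y to [batchsize, 1, M] so it broadcasts against x along N
cos = F.cosine_similarity(x, y.unsqueeze(1), dim=-1)  # [batchsize, N]
loss = cos.mean()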
I have solved a differential equation with a neural net. I leave the code below with an example. I want to be able to compute the first derivative of this neural net with respect to its input "x" and evaluate this derivative for any "x".
1- Notice that I compute der = discretization.derivative. Is that the derivative of the neural net with respect to "x"? With this expression, if I type [first(der(phi, u, [x], 0.00001, 1, res.minimizer)) for x in xs] I get something that I wonder whether it is the derivative, but I cannot find a way to extract it into an array, let alone plot it. How can I evaluate this derivative at any point, say for all points in the array defined below as "xs"? Below, in Update, I give a more straightforward approach I took to try to compute the derivative (but still did not succeed).
2- Is there any other way to take the derivative of the neural net with respect to x?
I am new to Julia, so I am struggling a bit with how to manipulate the data types. Thanks for any suggestions!
Update: I found a way to see the symbolic expression for the neural net doing the following:
predict(x) = first(phi(x,res.minimizer))
df(x) = gradient(predict, x)[1]
After running these two lines, typing predict(x) or df(x) in the REPL spits out the full neural net with the weights and biases of the solution. However, I cannot evaluate the gradient; it throws an error. How can I evaluate the gradient of my function predict(x) with respect to x?
The original code creating the neural net and solving the equation
using NeuralPDE, Flux, ModelingToolkit, GalacticOptim, Optim, DiffEqFlux
import ModelingToolkit: Interval, infimum, supremum
@parameters x
@variables u(..)
Dx = Differential(x)
a = 0.5
eq = Dx(u(x)) ~ -log(x*a)
# Initial and boundary conditions
bcs = [u(0.) ~ 0.01]
# Space and time domains
domains = [x ∈ Interval(0.01,1.0)]
# Neural network
n = 15
chain = FastChain(FastDense(1,n,tanh),FastDense(n,1))
discretization = PhysicsInformedNN(chain, QuasiRandomTraining(100))
@named pde_system = PDESystem(eq, bcs, domains, [x], [u(x)])
prob = discretize(pde_system,discretization)
const losses = []
cb = function (p, l)
    push!(losses, l)
    if length(losses) % 100 == 0
        println("Current loss after $(length(losses)) iterations: $(losses[end])")
    end
    return false
end
res = GalacticOptim.solve(prob, ADAM(0.01); cb = cb, maxiters=300)
prob = remake(prob,u0=res.minimizer)
res = GalacticOptim.solve(prob,BFGS(); cb = cb, maxiters=1000)
phi = discretization.phi
der = discretization.derivative
using Plots
analytic_sol_func(x) = (1.0+log(1/a))*x-x*log(x)
dx = 0.05
xs = LinRange(0.01,1.0,50)
u_real = [analytic_sol_func(x) for x in xs]
u_predict = [first(phi(x,res.minimizer)) for x in xs]
x_plot = collect(xs)
xconst = analytic_sol_func(1)*ones(size(xs))
plot(x_plot ,u_real,title = "Solution",linewidth=3)
plot!(x_plot ,u_predict,line =:dashdot,linewidth=2)
The solution I found consists of differentiating the approximation with the help of ForwardDiff.
So if the neural network approximation to the unknown function is called "funcres", then we take its derivative with respect to x as shown below.
using ForwardDiff
funcres(x) = first(phi(x,res.minimizer))
dxu = ForwardDiff.derivative.(funcres, Array(x_plot))
display(plot(x_plot,dxu,title = "Derivative",linewidth=3))
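As a sanity check on dxu, differentiating the analytic solution gives
u(x) = (1 + log(1/a))*x - x*log(x)
u'(x) = (1 + log(1/a)) - (log(x) + 1) = -log(a*x),
which is exactly the right-hand side of the original equation, so the ForwardDiff result can be compared directly against the analytic derivative evaluated on x_plot.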
I am running an object localization task with CNN regression. I got pretty good results with the original data.
However, after I added 5% noise to the original data in the augmentation process, the loss values during training look OK, but the validation loss fluctuates a lot. And when I check the results after training is done, the neural net always predicts a constant output; that is, the output never changes regardless of the input data, which suggests the network got trapped in a dead valley. How could this happen, given that I added variability to the data, and how can I avoid it?
The code to add noise:
sindex = 0
for n in range(n_train):
    cndata = normalized_data[n, :, :, :]
    # plt.matshow(np.squeeze(prepped_data[sindex, :, :, :]))
    # add noise, augmentation, and shift in x and y
    for x in range(1):
        noise = np.random.normal(0, noise_sd, (xres, xres, slices))
        ishift = np.random.randint(low=1, high=(row - xres), size=2)
        prepped_data[sindex, :, :, :, 0] = cndata[ishift[0]:ishift[0] + xres, ishift[1]:ishift[1] + xres, :] + noise
        prepped_label[sindex, :] = labelData[:, n] + [-ishift[0], -ishift[1], 0, -ishift[0], -ishift[1], 0, 0]
        sindex = sindex + 1
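For what it's worth, a quick way to eyeball one augmented sample against its source (a sketch, assuming matplotlib and the arrays defined above):

import matplotlib.pyplot as plt

# compare a middle slice of the first original volume with its augmented version
fig, (ax0, ax1) = plt.subplots(1, 2)
ax0.matshow(normalized_data[0, :, :, slices // 2])
ax1.matshow(prepped_data[0, :, :, slices // 2, 0])
plt.show()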
Loss function definition:
model.compile(optimizer=opt, loss='mean_squared_error', metrics=['accuracy', root_mean_squared_error])
RMSE definition:
def root_mean_squared_error(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))
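As a quick numerical check of this metric (a sketch, assuming K is the Keras backend imported elsewhere in the script, e.g. from tensorflow.keras import backend as K):

y_true = K.constant([[0.0, 0.0]])
y_pred = K.constant([[3.0, 4.0]])
# sqrt(mean([3^2, 4^2])) = sqrt(12.5) ≈ 3.54
print(K.eval(root_mean_squared_error(y_true, y_pred)))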
Here is the original training curve without noise, followed by the training curve after adding noise (training-curve plots not reproduced here).
The Keras docs for the softmax activation state that I can specify which axis the activation is applied to. My model is supposed to output an n-by-k matrix M, where Mij is the probability that the ith letter is symbol j.
n = 7 # number of symbols in the output string (fixed)
k = len("0123456789") # the number of possible symbols
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=((N,))))
...
model.add(layers.Dense(n * k, activation=None))
model.add(layers.Reshape((n, k)))
model.add(layers.Dense(output_dim=n, activation='softmax(x, axis=1)'))
The last line of code doesn't compile as I don't know how to correctly specify the axis (the axis for k in my case) for the softmax activation.
You must use an actual function there, not a string.
Keras allows you to use a few strings for convenience.
The activation functions can be found in keras.activations, and they're listed in the help file.
from keras.activations import softmax

def softMaxAxis1(x):
    return softmax(x, axis=1)

.....
......
model.add(layers.Dense(output_dim=n, activation=softMaxAxis1))
Or even a custom axis:
def softMaxAxis(axis):
    def soft(x):
        return softmax(x, axis=axis)
    return soft

...
model.add(layers.Dense(output_dim=n, activation=softMaxAxis(1)))
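Depending on the Keras version, another option is a dedicated Softmax layer with an explicit axis instead of a custom activation function (a sketch; choose the axis that indexes the dimension you want normalized, just as with softMaxAxis above):

from keras.layers import Softmax

model.add(layers.Dense(n * k, activation=None))
model.add(layers.Reshape((n, k)))
model.add(Softmax(axis=1))  # or axis=-1 to normalize over the k symbols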
I am building an RNN model in Keras for sentences with word embeddings from gensim. I am initializing the embedding layer with GloVe vectors. Since this is a sequential model and sentences have variable lengths, vectors are zero-padded. e.g.
[0, 0, 0, 6, 2, 4]
Let's say the GloVe vectors have dimensions [NUM_VOCAB, EMBEDDING_SIZE]. Since the zero index is masked (ignored), do we need to add an extra row to the GloVe matrix to get the proper indexing of words, so that its dimensions become [NUM_VOCAB+1, EMBEDDING_SIZE]?
It seems like that leaves an unnecessary vector for the model to estimate, unless there is a more elegant way.
glove = Word2Vec.load_word2vec_format(filename)
embedding_matrix = np.vstack([np.zeros(EMBEDDING_SIZE), glove.syn0])
model = Sequential()
# -- this uses Glove as inits
model.add(Embedding(NUM_VOCAB, EMBEDDING_SIZE, input_length=maxlen, mask_zero=True,
                    weights=[embedding_matrix]))
# -- sequence layer
model.add(LSTM(32, return_sequences=False, init='orthogonal'))
model.add(Activation('tanh'))
...
Thanks
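For what it's worth, here is how the index bookkeeping works out with the stacked matrix above (a sketch, assuming numpy and the arrays defined earlier; the row index i is purely illustrative):

# np.vstack puts an all-zero padding vector at row 0,
# so GloVe row i ends up at row i + 1 and index 0 stays free for mask_zero
i = 5
assert np.allclose(embedding_matrix[i + 1], glove.syn0[i])
# for weights=[embedding_matrix] to fit, the Embedding layer's first argument
# must match embedding_matrix.shape[0], i.e. NUM_VOCAB + 1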