Iterative loss function Autoencoders - deep-learning

I am trying to implement a custom loss function in a PyTorch autoencoder.
The loss function tries to maximize the cosine similarity between a given output tensor U (a vector) and 100 random vectors J, where both U and the vectors in J have dimension [300]. This is repeated for each batch.
Suppose we have 30 items per batch; then the output tensor and the random vectors are
train_Y.shape = [30, 300]
Random_vectors.shape = [30, 100, 300]
I can implement the loss function in two ways:
All_Y = []
for Y, z_r in zip(train_y, random_vectors):
    Y_cosine_list = []
    for z in z_r:
        cosi = torch.dot(Y, z) / (torch.norm(Y) * torch.norm(z))
        Y_cosine_list.append(cosi)
    All_Y.append(Y_cosine_list)

All_Y = torch.tensor(All_Y).to(device)
train_loss = torch.sum(torch.abs(All_Y)) / dim_0
train_loss = torch.tensor(train_loss.data, requires_grad=True)
or
train_Y = torch.zeros([dim_0, 100])
for i, (Y, z_r) in enumerate(zip(train_Y, random_vectors)):
    for j, z in enumerate(z_r):
        train_Y[i, j] = cos(Y, z)

train_Y = train_Y.to(device)
train_loss = torch.sum(torch.abs(train_Y)) / dim_0
The second one is more elegant and to the point. However, it gives a "CUDA illegal memory access" error. I have checked that the memory is not exceeded in either case. Is there anything wrong with the second implementation?
The first implementation is inelegant, and I am not sure it makes sense from a neural-net optimization perspective, but it does not give errors and I am able to complete training for all the epochs.
PS: I have tried encapsulating this code block in a loss_fn method, but I get the same illegal memory access error.
I have tried everything I could find for the illegal memory access error - changing GPUs, removing a torch.stack block, etc. - but I can't seem to get rid of the problem.

Here is a vectorized way to do it
import torch
import torch.nn as nn

class CosineLoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x, y):
        """
        Args:
            x (torch.tensor): [batchsize, N, M] - tensor.
            y (torch.tensor): [batchsize, M] - tensor.

        Returns:
            torch.tensor: scalar mean cosine loss
        """
        # dot product along dimension 'm', i.e. multiply and sum along 'm'
        dotp = torch.einsum("bm, bnm -> bn", y, x)
        # L2 norm along dimension 'm', multiplied via broadcasting
        length = torch.norm(y, dim=-1)[:, None] * torch.norm(x, dim=-1)
        # cosine = dot product of unit vectors
        cos = dotp / length
        return cos.mean()

def test():
    b, n, m = 30, 100, 300
    train_Y = torch.randn(b, m, device='cuda')
    random_vectors = torch.randn(b, n, m, requires_grad=True, device='cuda')
    print(f'{random_vectors.grad = }')
    cosineloss = CosineLoss()
    loss = cosineloss(random_vectors, train_Y)
    print(f'{loss = }')
    loss.backward()
    print(f'{random_vectors.grad.shape = }')
References:
einsum
broadcasting
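Alternatively, on a reasonably recent PyTorch where torch.nn.functional.cosine_similarity supports broadcasting, the same loss can be sketched without einsum. This is a minimal sketch mirroring the loss in the question, using random stand-in tensors:
import torch
import torch.nn.functional as F

device = 'cuda' if torch.cuda.is_available() else 'cpu'
b, n, m = 30, 100, 300
train_Y = torch.randn(b, m, device=device)
random_vectors = torch.randn(b, n, m, device=device)

# train_Y[:, None, :] has shape [b, 1, m] and broadcasts against [b, n, m];
# the cosine is taken along the last dimension, giving a [b, n] tensor
cos = F.cosine_similarity(train_Y[:, None, :], random_vectors, dim=-1)
train_loss = torch.sum(torch.abs(cos)) / b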

How can I evaluate and take the derivative of a neural net in Julia

I have solved a differential equation with a neural net. I leave the code below with an example. I want to be able to compute the first derivative of this neural net with respect to its input "x" and evaluate this derivative for any "x".
1- Notice that I compute der = discretization.derivative. Is that the derivative of the neural net with respect to "x"? With this expression, if I type [first(der(phi, u, [x], 0.00001, 1, res.minimizer)) for x in xs] I get something that might be the derivative, but I cannot find a way to extract it into an array, let alone plot it. How can I evaluate this derivative at any point, let's say for all points in the array defined below as "xs"? Below, in Update, I give a more straightforward approach I took to try to compute the derivative (but still did not succeed).
2- Is there any other way to take the derivative of the neural net with respect to x?
I am new to Julia, so I am struggling a bit with how to manipulate the data types. Thanks for any suggestions!
Update: I found a way to see the symbolic expression for the neural net by doing the following:
predict(x) = first(phi(x,res.minimizer))
df(x) = gradient(predict, x)[1]
After running these two lines of code, typing predict(x) or df(x) in the REPL will spit out the full neural net with the weights and biases of the solution. However, I cannot evaluate the gradient; it throws an error. How can I evaluate the gradient of my function predict(x) with respect to x?
The original code creating the neural net and solving the equation
using NeuralPDE, Flux, ModelingToolkit, GalacticOptim, Optim, DiffEqFlux
import ModelingToolkit: Interval, infimum, supremum
@parameters x
@variables u(..)
Dx = Differential(x)
a = 0.5
eq = Dx(u(x)) ~ -log(x*a)
# Initial and boundary conditions
bcs = [u(0.) ~ 0.01]
# Space and time domains
domains = [x ∈ Interval(0.01,1.0)]
# Neural network
n = 15
chain = FastChain(FastDense(1,n,tanh),FastDense(n,1))
discretization = PhysicsInformedNN(chain, QuasiRandomTraining(100))
@named pde_system = PDESystem(eq,bcs,domains,[x],[u(x)])
prob = discretize(pde_system,discretization)
const losses = []
cb = function (p, l)
    push!(losses, l)
    if length(losses) % 100 == 0
        println("Current loss after $(length(losses)) iterations: $(losses[end])")
    end
    return false
end
res = GalacticOptim.solve(prob, ADAM(0.01); cb = cb, maxiters=300)
prob = remake(prob,u0=res.minimizer)
res = GalacticOptim.solve(prob,BFGS(); cb = cb, maxiters=1000)
phi = discretization.phi
der = discretization.derivative
using Plots
analytic_sol_func(x) = (1.0+log(1/a))*x-x*log(x)
dx = 0.05
xs = LinRange(0.01,1.0,50)
u_real = [analytic_sol_func(x) for x in xs]
u_predict = [first(phi(x,res.minimizer)) for x in xs]
x_plot = collect(xs)
xconst = analytic_sol_func(1)*ones(size(xs))
plot(x_plot ,u_real,title = "Solution",linewidth=3)
plot!(x_plot ,u_predict,line =:dashdot,linewidth=2)
The solution I found consists in differentiating the approximation with the help of ForwardDiff.
So if the neural network approximation to the unknown function is called "funcres", then we take its derivative with respect to x as shown below.
using ForwardDiff
funcres(x) = first(phi(x,res.minimizer))
dxu = ForwardDiff.derivative.(funcres, Array(x_plot))
display(plot(x_plot,dxu,title = "Derivative",linewidth=3))

mIoU for multi-class

I would like to understand how mIoU is calculated for multi-class classification. The formula for each class is
IoU = TP / (TP + FP + FN)
and then the average is taken over the classes to get the mIoU. However, I don't understand what happens for the classes that are not represented. The formula becomes a division by 0, so I ignore them and the average is computed only over the represented classes.
The problem is that when a prediction is wrong, the accuracy is lowered a lot, because it adds another class to the average. For instance: in semantic segmentation, the ground truth of an image is made of 4 classes (0,1,2,3) and 6 classes are represented over the dataset. The prediction is also made of 4 classes (0,1,4,5), but all the items classified as 2 and 3 in the ground truth are classified as 4 and 5 in the prediction. In this case, should we calculate the mIoU over 6 classes, even if 4 classes are totally wrong and their respective IoU is 0? So the problem is that if just one pixel is predicted in a class that is not in the ground truth, we have to divide by a higher denominator and it lowers the score a lot.
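To make the effect concrete, here is a made-up numeric illustration of the example above (the 0.8 values are hypothetical):
# classes 0 and 1 score 0.8; classes 2 and 3 (missed) and 4 and 5 (spurious predictions) score 0
ious_gt_classes  = [0.8, 0.8, 0.0, 0.0]                  # only the 4 ground-truth classes
ious_all_classes = [0.8, 0.8, 0.0, 0.0, 0.0, 0.0]        # all 6 classes seen in GT or prediction
print(sum(ious_gt_classes) / len(ious_gt_classes))       # 0.4
print(sum(ious_all_classes) / len(ious_all_classes))     # ~0.27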
Is this the correct way to compute the mIoU for multi-class classification (and for semantic segmentation)?
Instead of calculating the mIoU of each image and then taking the "mean" mIoU over all the images, I calculate the mIoU treating the whole dataset as one big image. If a class is not in the image and is not predicted, I set its IoU equal to 1.
From scratch:
import numpy as np

def miou(gt, pred, nbr_mask):
    height, width = gt.shape[1], gt.shape[2]  # image dimensions (globals in the original post)
    intersection = np.zeros(nbr_mask)  # int = (A and B)
    den = np.zeros(nbr_mask)           # den = A + B = (A or B) + (A and B)
    for i in range(len(gt)):
        for j in range(height):
            for k in range(width):
                if pred[i][j][k] == gt[i][j][k]:
                    intersection[gt[i][j][k]] += 1
                den[pred[i][j][k]] += 1
                den[gt[i][j][k]] += 1
    mIoU = 0
    for i in range(nbr_mask):
        if den[i] != 0:
            mIoU += intersection[i] / (den[i] - intersection[i])
        else:
            mIoU += 1
    mIoU = mIoU / nbr_mask
    return mIoU
With gt the array of ground-truth labels and pred the prediction of the associated images (they have to correspond in the array and be the same size).
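For illustration, a hypothetical toy call of the function above on two 3x3 label maps with 3 classes (the data below is made up):
import numpy as np

# made-up ground truth and prediction: 2 images, 3x3 pixels, classes 0..2
gt = np.array([[[0, 0, 1], [0, 1, 1], [2, 2, 2]],
               [[0, 0, 0], [1, 1, 1], [2, 2, 2]]])
pred = np.array([[[0, 0, 1], [0, 1, 1], [2, 2, 0]],
                 [[0, 0, 0], [1, 1, 2], [2, 2, 2]]])

print(miou(gt, pred, nbr_mask=3))  # ~0.80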
Adding to the previous answer, here is a great, fast and efficient PyTorch GPU implementation for calculating the mIoU and class-wise IoU for a batch of size (N, H, W) (both pred mask and labels), taken from the NeurIPS 2021 paper "Few-Shot Segmentation via Cycle-Consistent Transformer"; the GitHub repo is available here.
def intersectionAndUnionGPU(output, target, K, ignore_index=255):
    # 'K' classes; output and target sizes are N, or N * L, or N * H * W;
    # each value is in the range 0 to K - 1.
    assert (output.dim() in [1, 2, 3])
    assert output.shape == target.shape
    output = output.view(-1)
    target = target.view(-1)
    output[target == ignore_index] = ignore_index
    intersection = output[output == target]
    area_intersection = torch.histc(intersection, bins=K, min=0, max=K-1)
    area_output = torch.histc(output, bins=K, min=0, max=K-1)
    area_target = torch.histc(target, bins=K, min=0, max=K-1)
    area_union = area_output + area_target - area_intersection
    return area_intersection, area_union, area_target
Example usage:
output = torch.rand(4, 5, 224, 224) # model output; batch size=4; channels=5, H,W=224
preds = F.softmax(output, dim=1).argmax(dim=1) # (4, 224, 224)
labels = torch.randint(0,5, (4, 224, 224))
i, u, _ = intersectionAndUnionGPU(preds, labels, 5) # 5 is num_classes
classwise_IOU = i/u # tensor of size (num_classes)
mIOU = i.sum()/u.sum() # mean IOU, taking (i/u).mean() is wrong
Hope this helps everyone!
(A non-GPU implementation is available as well in the repo!)

pytorch nllloss function target shape mismatch

I'm training an LSTM model using PyTorch with a batch size of 256 and NLLLoss() as the loss function.
The loss function is having a problem with the data shape.
The softmax output from the forward pass has shape torch.Size([256, 4, 1181]), where 256 is the batch size, 4 is the sequence length, and 1181 is the vocab size.
The target has shape torch.Size([256, 4]), where 256 is the batch size and 4 is the output sequence length.
When I was testing earlier with a batch size of 1, the model worked fine, but when I added a batch size, it broke. I read that NLLLoss() can take a class target as input instead of a one-hot encoded target.
Am I misunderstanding it? Or did I not format the shape of the target correctly?
class LSTM(nn.Module):
    def __init__(self, embed_size=100, hidden_size=100, vocab_size=1181, embedding_matrix=...):
        super(LSTM, self).__init__()
        self.hidden_size = hidden_size
        self.word_embeddings = nn.Embedding(vocab_size, embed_size)
        self.word_embeddings.load_state_dict({'weight': torch.Tensor(embedding_matrix)})
        self.word_embeddings.weight.requires_grad = False
        self.lstm = nn.LSTM(embed_size, hidden_size)
        self.hidden2out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens):
        batch_size, num_steps = tokens.shape
        embeds = self.word_embeddings(tokens)
        lstm_out, _ = self.lstm(embeds.view(batch_size, num_steps, -1))
        out_space = self.hidden2out(lstm_out.view(batch_size, num_steps, -1))
        out_scores = F.log_softmax(out_space, dim=1)
        return out_scores

model = LSTM(self.config.embed_size, self.config.hidden_size, self.config.vocab_size, self.embedding_matrix)
loss_function = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=self.config.lr)
Error:
~/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
1846 if target.size()[1:] != input.size()[2:]:
1847 raise ValueError('Expected target size {}, got {}'.format(
-> 1848 out_size, target.size()))
1849 input = input.contiguous().view(n, c, 1, -1)
1850 target = target.contiguous().view(n, 1, -1)
ValueError: Expected target size (256, 554), got torch.Size([256, 4])
Your input shape to the loss function is (N, d, C) = (256, 4, 1181) and your target shape is (N, d) = (256, 4); however, according to the docs on NLLLoss, the input should be (N, C, d) for a target of size (N, d).
Supposing x is your network output and y is the target, you can compute the loss by transposing the incorrect dimensions of x as follows:
loss = loss_function(x.transpose(1, 2), y)
Alternatively, since NLLLoss is just averaging all the responses anyway, you can reshape x and y to be (N*d, C) and (N*d). This gives the same result without creating temporary copies of your tensors.
loss = loss_function(x.reshape(N*d, C), y.reshape(N*d))
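As a quick sanity check, here is a minimal sketch with random stand-in tensors (shapes taken from the question) showing that the two forms give the same loss:
import torch
import torch.nn as nn
import torch.nn.functional as F

N, d, C = 256, 4, 1181                              # batch size, sequence length, vocab size
x = F.log_softmax(torch.randn(N, d, C), dim=-1)     # stand-in for the network output
y = torch.randint(0, C, (N, d))                     # stand-in class targets

loss_function = nn.NLLLoss()
loss_a = loss_function(x.transpose(1, 2), y)                    # input (N, C, d), target (N, d)
loss_b = loss_function(x.reshape(N * d, C), y.reshape(N * d))   # input (N*d, C), target (N*d,)
print(loss_a.item(), loss_b.item())                             # the same value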

How to calculate Batch Pairwise Distance in PyTorch efficiently

I have tensors X of shape BxNxD and Y of shape BxMxD.
I want to compute the pairwise distances for each element in the batch, i.e. I want to obtain a BxMxN tensor.
How do I do this?
There is some discussion on this topic here: https://github.com/pytorch/pytorch/issues/9406, but I don't understand it, as there are many implementation details and no actual solution is highlighted.
A naive approach would be to use the answer for non-batched pairwise distances as discussed here: https://discuss.pytorch.org/t/efficient-distance-matrix-computation/9065, i.e.
import torch
import numpy as np

B = 32
N = 128
M = 256
D = 3

X = torch.from_numpy(np.random.normal(size=(B, N, D)))
Y = torch.from_numpy(np.random.normal(size=(B, M, D)))

def pairwise_distances(x, y=None):
    x_norm = (x**2).sum(1).view(-1, 1)
    if y is not None:
        y_t = torch.transpose(y, 0, 1)
        y_norm = (y**2).sum(1).view(1, -1)
    else:
        y_t = torch.transpose(x, 0, 1)
        y_norm = x_norm.view(1, -1)
    dist = x_norm + y_norm - 2.0 * torch.mm(x, y_t)
    return torch.clamp(dist, 0.0, np.inf)

out = []
for b in range(B):
    out.append(pairwise_distances(X[b], Y[b]))
print(torch.stack(out).shape)
How can I do this without looping over B?
Thanks
I had a similar issue and spent some time finding the easiest and fastest solution. You can now compute the batched distance by using torch.cdist, which will give you the BxMxN tensor:
torch.cdist(Y, X)
It also works well if you just want to compute the distances between each pair of rows of two matrices.
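For completeness, a minimal sketch with the shapes from the question. Note that torch.cdist returns Euclidean (p=2) distances, whereas the loop-based snippet above returns squared distances:
import torch

B, N, M, D = 32, 128, 256, 3
X = torch.randn(B, N, D)
Y = torch.randn(B, M, D)

dist = torch.cdist(Y, X)   # pairwise Euclidean distances, shape (B, M, N)
print(dist.shape)          # torch.Size([32, 256, 128])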

Variational auto-encoder: implementing warm-up in Keras

I recently read this paper, which introduces a process called "warm-up" (WU). It consists in multiplying the KL-divergence term of the loss by a variable whose value depends on the epoch number (it evolves linearly from 0 to 1).
I was wondering if this is the right way to do it:
beta = K.variable(value=0.0)

def vae_loss(x, x_decoded_mean):
    # cross entropy
    xent_loss = K.mean(objectives.categorical_crossentropy(x, x_decoded_mean))

    # kl divergence
    for k in range(n_sample):
        epsilon = K.random_normal(shape=(batch_size, latent_dim), mean=0.,
                                  std=1.0)  # used for every z_i sampling
        # Sample several layers of latent variables
        for mean, var in zip(means, variances):
            z_ = mean + K.exp(K.log(var) / 2) * epsilon

            # build z
            try:
                z = tf.concat([z, z_], -1)
            except NameError:
                z = z_
            except TypeError:
                z = z_

            # sum loss (using a MC approximation)
            try:
                loss += K.sum(log_normal2(z_, mean, K.log(var)), -1)
            except NameError:
                loss = K.sum(log_normal2(z_, mean, K.log(var)), -1)
        print("z", z)
        loss -= K.sum(log_stdnormal(z), -1)
        z = None
    kl_loss = loss / n_sample
    print('kl loss:', kl_loss)

    # result
    result = beta * kl_loss + xent_loss
    return result

# define callback to change the value of beta at each epoch
def warmup(epoch):
    value = (epoch / 10.0) * (epoch <= 10.0) + 1.0 * (epoch > 10.0)
    print("beta:", value)
    beta = K.variable(value=value)

from keras.callbacks import LambdaCallback
wu_cb = LambdaCallback(on_epoch_end=lambda epoch, log: warmup(epoch))

# train model
vae.fit(
    padded_X_train[:last_train, :, :],
    padded_X_train[:last_train, :, :],
    batch_size=batch_size,
    nb_epoch=nb_epoch,
    verbose=0,
    callbacks=[tb, wu_cb],
    validation_data=(padded_X_test[:last_test, :, :], padded_X_test[:last_test, :, :])
)
This will not work. I tested it to figure out exactly why it was not working. The key thing to remember is that Keras creates a static graph once at the beginning of training.
Therefore, the vae_loss function is called only once to create the loss tensor, which means that the reference to the beta variable will remain the same every time the loss is calculated. However, your warmup function reassigns beta to a new K.variable. Thus, the beta that is used for calculating loss is a different beta than the one that gets updated, and the value will always be 0.
It is an easy fix. Just change this line in your warmup callback:
beta = K.variable(value=value)
to:
K.set_value(beta, value)
This way the actual value in beta gets updated "in place" rather than creating a new variable, and the loss will be properly re-calculated.
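Putting it together, a minimal sketch of the corrected callback, assuming beta is the same K.variable that is captured inside vae_loss:
from keras import backend as K
from keras.callbacks import LambdaCallback

beta = K.variable(value=0.0)   # the same variable referenced inside vae_loss

def warmup(epoch):
    # linear ramp from 0 to 1 over the first 10 epochs, then hold at 1
    value = (epoch / 10.0) * (epoch <= 10.0) + 1.0 * (epoch > 10.0)
    print("beta:", value)
    K.set_value(beta, value)   # update the existing variable in place

wu_cb = LambdaCallback(on_epoch_end=lambda epoch, log: warmup(epoch))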