mIoU for multi-class - deep-learning

I would like to understand how mIoU is calculated for multi-class classification. The formula for each class is
IoU formula
and then the average is done over the classes to get the mIoU. However, I don't understand what happens for the classes that are not represented. The formula becomes a division by 0, so I ignore them and the average is only computed for the classes represented.
The problem is that when a prediction is wrong, the accuracy is really lowered. It adds another class to make the average. For instance : in semantic segmentation the ground-truth of an image is made of 4 classes (0,1,2,3) and 6 classes are represented over the dataset. The prediction is also made of 4 classes (0,1,4,5) but all the items classified in 2 and 3 (in the ground-truth) are classified in 4 and 5 (in the prediction). In this case should we calculate the mIoU over 6 classes ? Even if 4 classes are totally wrong and there respective IoU is 0 ? So the problem is that if just one pixel is predicted in a class that is not in the ground_truth, we have to divide by a higher denominator and it lows a lot the score.
Is it the correct way to compute the mIoU for multi-class (and the semantic segmentation) ?

Instead of calculating the miou of each image and then calculate the "mean" miou over all the images, I calculate the miou as one big image. If a class is not in the image and is not predicited, I set there respective iou equal to 1.
From scratch :
def miou(gt,pred,nbr_mask):
intersection = np.zeros(nbr_mask) # int = (A and B)
den = np.zeros(nbr_mask) # den = A + B = (A or B) + (A and B)
for i in range(len(gt)):
for j in range(height):
for k in range(width):
if pred[i][j][k]==gt[i][j][k]:
intersection[gt[i][j][k]]+=1
den[pred[i][j][k]] += 1
den[gt[i][j][k]] += 1
mIoU = 0
for i in range(nbr_mask):
if den[i]!=0:
mIoU+=intersection[i]/(den[i]-intersection[i])
else:
mIoU+=1
mIoU=mIoU/nbr_mask
return mIoU
With gt the array of ground truth labels and pred the prediction of theassociated images (have to correspond in the array and be the same size).

Adding to the previous answer, this is a great fast and efficient pytorch GPU implementation of calculating the mIOU and classswise IOU for a batch of size (N, H, W) (both pred mask and labels), taken from the NeurIPS 2021 paper "Few-Shot Segmentation via Cycle-Consistent Transformer", github repo available here.
def intersectionAndUnionGPU(output, target, K, ignore_index=255):
# 'K' classes, output and target sizes are N or N * L or N * H * W, each value in range 0 to K - 1.
assert (output.dim() in [1, 2, 3])
assert output.shape == target.shape
output = output.view(-1)
target = target.view(-1)
output[target == ignore_index] = ignore_index
intersection = output[output == target]
area_intersection = torch.histc(intersection, bins=K, min=0, max=K-1)
area_output = torch.histc(output, bins=K, min=0, max=K-1)
area_target = torch.histc(target, bins=K, min=0, max=K-1)
area_union = area_output + area_target - area_intersection
return area_intersection, area_union, area_target
Example usage:
output = torch.rand(4, 5, 224, 224) # model output; batch size=4; channels=5, H,W=224
preds = F.softmax(output, dim=1).argmax(dim=1) # (4, 224, 224)
labels = torch.randint(0,5, (4, 224, 224))
i, u, _ = intersectionAndUnionGPU(preds, labels, 5) # 5 is num_classes
classwise_IOU = i/u # tensor of size (num_classes)
mIOU = i.sum()/u.sum() # mean IOU, taking (i/u).mean() is wrong
Hope this helps everyone!
(A non-GPU implementation is available as well in the repo!)

Related

How to use PyTorch nn.BatchNorm1d to get equal normalization across features?

i would like to ask a question regarding the nn.BatchNorm1d in PyTorch.
I have one main tensor, which has shape [B, 3, N]. Then, i have two additional tensors which have shape [B, 3, V1] and [B, 3, V2]. I will concatenate the main tensor with the two tensors separately, to construct new tensors [B, 3, N+V1] and [B, 3, N+V2].
I pass my tensors to a plain MLP (consists of conv1d and batchnorm1d). Ideally, i want to predict something "point-wise", like no matter what the number of dimension 2, it has some consistent prediction only given the value. However, the batchnorm1d will have different results given input [B, 3, N+V1] and [B, 3, N+V2], while i am only focusing on first N points in 2nd dimension.
import torch
import torch.nn as nn
# nn.BatchNorm1d
B=2
dim=64
N=40000
V1=1000
v2=2000
torch.manual_seed(0)
x = torch.rand(B, dim, N) # here imgs are flattened from 28x28
v1 = torch.rand(B, dim, V1)
v2 = torch.rand(B, dim, v2)
layer = nn.BatchNorm1d(dim) # batch norm is done on channels
out2 = layer(torch.cat((x, v1), dim=2))
out3 = layer(torch.cat((x, v2), dim=2))
torch.equal(out2[:, :, :N], out3[:, :, :N])
Is there any possible way to have consistent prediction of first N points?
Is this more along the lines of what you're looking for? Normalizing just across the channels?
out2 = torch.cat((x, v1), dim=2) / torch.linalg.norm(torch.cat((x, v1), dim=2), dim=1, keepdim=True)
out3 = torch.cat((x, v2), dim=2) / torch.linalg.norm(torch.cat((x, v2), dim=2), dim=1, keepdim=True)
torch.equal(out2[:, :, :N], out3[:, :, :N])
# True
I think if you want to do something like this within pytorch nn libraries you'll need to transpose your channels and feature dimensions that way you can use LayerNorm1d or InstanceNorm. See here for a nice visual example of the different normalization techniques
Update answer:
In case you want to use an nn module specifically. InstanceNorm or GroupNorm could also get you the response. However the number of channels now differs between the two so you'll need two distinct layers.
layer1 = nn.GroupNorm(V1+N, V1+N)
layer2 = nn.GroupNorm(V2+N, V2+N)
out2 = layer1(torch.cat((x, v1), dim=2).transpose(1,2))
out3 = layer2(torch.cat((x, v2), dim=2).transpose(1,2))
torch.equal(out2[:, :N, :], out3[:, :N, :])
True

Iterative loss function Autoencoders

I am trying to implement a custom loss function in a Pytorch Autoencoder.
The loss function tries to maximize the cosine similarity between a given output tensor U (a vector) and 100 random vectors J where both U and J have the same dimension of [300]. This is repeated for each batch.
Suppose we have 30 items per batch, then the output tensor is
train_Y.shape = [30,300]
Random_vectors.shape = [30,100,300]
I can implement the loss function in two ways:
All_Y =[]
for Y,z_r in zip(train_y, random_vectors):
Y_cosine_list =[]
for z in z_r:
cosi = torch.dot(Y,z) / (torch.norm(Y)*torch.norm(z))
Y_cosine_list.append(cosi)
All_Y.append(Y_cosine_list)
All_Y = torch.tensor(All_Y).to(device)
train_loss = torch.sum(torch.abs(All_Y))/dim_0
train_loss = torch.tensor(train_loss.data, requires_grad = True)
or
train_Y = torch.zeros([dim_0, 100])
for i, (Y,z_r) in enumerate(zip(train_Y, random_vectors)):
for j,z in enumerate(z_r):
train_Y[i,j] = cos(Y,z)
train_Y = train_Y.to(device)
train_loss = torch.sum(torch.abs(train_Y))/dim_0
The second one is more elegant and to the point. However it is giving a "Cuda illegal memory access error". I have checked that the memory is not exceeded in either case. Is there anything wrong with the second implementation?
The first implementation is inelegant and I am not sure that it makes sense from a neural net optimization perspective. But it does not give errors and am able to complete training for all the epochs.
Ps: I have tried encapsulating this code block in a loss_fn method but I get the same illegal memory access error.
I have tried everything that I could find for the illegal memory access error - changing GPUs, removing a torch.stack block etc. But I can't seem to get rid of the problem.
Here is a vectorized way to do it
class CosineLoss(nn.Module):
def __init__(self, ):
super().__init__()
pass
def forward(self, x, y):
"""
Args:
x (torch.tensor): [batchsize, N, M] - tensor.
y (torch.tensor): [batchsize, M] - tensor.
Returns:
torch.tensor: scalar mean cosine loss
"""
# dot product along dimension 'm' i.e multiply and sum along 'm'.
dotp = torch.einsum("bm, bnm -> bn", y, x)
# L2 norm along dimension 'm' and multiply by broadcasting
length = torch.norm(y, dim=-1)[:, None]*torch.norm(x, dim=-1)
# cosine = dotproduct of unit vectors
cos = dotp/length
return cos.mean()
def test():
b, n, m = 30, 100, 300
train_Y = torch.randn(b, m, device='cuda')
random_vectors = torch.randn(b, n, m, requires_grad=True, device='cuda')
print(f'{random_vectors.grad = }')
cosineloss = CosineLoss()
loss = cosineloss(random_vectors, train_Y)
print(f'{loss = }')
loss.backward()
print(f'{random_vectors.grad.shape = }')
References:
einsum
broadcasting

How to calculate Batch Pairwise Distance in PyTorch efficiently

I have tensors X of shape BxNxD and Y of shape BxNxD.
I want to compute the pairwise distances for each element in the batch, i.e. I a BxMxN tensor.
How do I do this?
There is some discussion on this topic here: https://github.com/pytorch/pytorch/issues/9406, but I don't understand it as there are many implementation details while no actual solution is highlighted.
A naive approach would be to use the answer for non-batched pairwise distances as discussed here: https://discuss.pytorch.org/t/efficient-distance-matrix-computation/9065, i.e.
import torch
import numpy as np
B = 32
N = 128
M = 256
D = 3
X = torch.from_numpy(np.random.normal(size=(B, N, D)))
Y = torch.from_numpy(np.random.normal(size=(B, M, D)))
def pairwise_distances(x, y=None):
x_norm = (x**2).sum(1).view(-1, 1)
if y is not None:
y_t = torch.transpose(y, 0, 1)
y_norm = (y**2).sum(1).view(1, -1)
else:
y_t = torch.transpose(x, 0, 1)
y_norm = x_norm.view(1, -1)
dist = x_norm + y_norm - 2.0 * torch.mm(x, y_t)
return torch.clamp(dist, 0.0, np.inf)
out = []
for b in range(B):
out.append(pairwise_distances(X[b], Y[b]))
print(torch.stack(out).shape)
How can I do this without looping over B?
Thanks
I had a similar issue and spent some time to find the easiest and fastest solution. Now you can compute batched distance by using PyTorch cdist which will give you BxMxN tensor:
torch.cdist(Y, X)
Also, it works well if you just want to compute distances between each pair of rows of two matrixes.

Variationnal auto-encoder: implementing warm-up in Keras

I recently read this paper which introduces a process called "Warm-Up" (WU), which consists in multiplying the loss in the KL-divergence by a variable whose value depends on the number of epoch (it evolves linearly from 0 to 1)
I was wondering if this is the good way to do that:
beta = K.variable(value=0.0)
def vae_loss(x, x_decoded_mean):
# cross entropy
xent_loss = K.mean(objectives.categorical_crossentropy(x, x_decoded_mean))
# kl divergence
for k in range(n_sample):
epsilon = K.random_normal(shape=(batch_size, latent_dim), mean=0.,
std=1.0) # used for every z_i sampling
# Sample several layers of latent variables
for mean, var in zip(means, variances):
z_ = mean + K.exp(K.log(var) / 2) * epsilon
# build z
try:
z = tf.concat([z, z_], -1)
except NameError:
z = z_
except TypeError:
z = z_
# sum loss (using a MC approximation)
try:
loss += K.sum(log_normal2(z_, mean, K.log(var)), -1)
except NameError:
loss = K.sum(log_normal2(z_, mean, K.log(var)), -1)
print("z", z)
loss -= K.sum(log_stdnormal(z) , -1)
z = None
kl_loss = loss / n_sample
print('kl loss:', kl_loss)
# result
result = beta*kl_loss + xent_loss
return result
# define callback to change the value of beta at each epoch
def warmup(epoch):
value = (epoch/10.0) * (epoch <= 10.0) + 1.0 * (epoch > 10.0)
print("beta:", value)
beta = K.variable(value=value)
from keras.callbacks import LambdaCallback
wu_cb = LambdaCallback(on_epoch_end=lambda epoch, log: warmup(epoch))
# train model
vae.fit(
padded_X_train[:last_train,:,:],
padded_X_train[:last_train,:,:],
batch_size=batch_size,
nb_epoch=nb_epoch,
verbose=0,
callbacks=[tb, wu_cb],
validation_data=(padded_X_test[:last_test,:,:], padded_X_test[:last_test,:,:])
)
This will not work. I tested it to figure out exactly why it was not working. The key thing to remember is that Keras creates a static graph once at the beginning of training.
Therefore, the vae_loss function is called only once to create the loss tensor, which means that the reference to the beta variable will remain the same every time the loss is calculated. However, your warmup function reassigns beta to a new K.variable. Thus, the beta that is used for calculating loss is a different beta than the one that gets updated, and the value will always be 0.
It is an easy fix. Just change this line in your warmup callback:
beta = K.variable(value=value)
to:
K.set_value(beta, value)
This way the actual value in beta gets updated "in place" rather than creating a new variable, and the loss will be properly re-calculated.

Compare two linear regression models in MATLAB

I want to compare the performance of two models using the F statistic. Here is a reproducible example and the expected results:
load carbig
tbl = table(Acceleration,Cylinders,Horsepower,MPG);
% Testing separetly both models
mdl1 = fitlm(tbl,'MPG~1+Acceleration+Cylinders+Horsepower');
mdl2 = fitlm(tbl,'MPG~1+Acceleration');
% Comparing both models using the F-test and p-value
numerator = (mdl2.SSE-mdl1.SSE)/(mdl1.NumCoefficients-mdl2.NumCoefficients);
denominator = mdl1.SSE/mdl1.DFE;
F = numerator/denominator;
p = 1-fcdf(F,mdl1.NumCoefficients-mdl2.NumCoefficients,mdl1.DFE);
We end up with F = 298.75 and p = 0, indicating mdl1 is significantly better than mdl2, as assessed by the F statistic.
Is there anyway to obtain the F and p values without performing twice fitlm and doing all the computation?
I tried to run a coefTest, as suggested by #Glen_b, however the function is poorly documented and the results are not the ones I'm expecting.
[p,F] = coefTest(mdl1); % p = 0, F = 262.508 (this F test mdl1 vs constant mdl)
[p,F] = coefTest(mdl1,[0,0,1,1]); % p = 0, F = 57.662 (not sure what this is testing)
[p,F] = coefTest(mdl1,[1,1,0,0]); % p = 0, F = 486.810 (idem)
I believe I should carry the test with a different null hypothesis (C) using the function [p,F] = coeffTest(mdl1,H,C). But I don't really know how to do it and there's no example.
This answer is in regards to comparing two linear regression models where one model is a restricted version of the other.
Short answer:
To do an F-test on the restriction that the 3rd and 4th elements of your estimated, coefficient vector b are zero:
[p, F] = coefTest(mdl1, [0, 0, 1, 0; 0, 0, 0, 1]);
Further explanation:
Let b be our estimated vector. Linear restrictions on b are typically written in a matrix form: R*b = r. The restriction that 3rd and 4th element of b are zero would be written:
[0, 0, 1, 0 * b = [0
0, 0, 0, 1] 0];
The matrix [0, 0, 1, 0; 0, 0, 0, 1] is what coefTest calls the H matrix in the docs.
P = coefTest(M,H), with H a numeric matrix having one column for each
coefficient, performs an F test that H*B=0, where B represents the
coefficient vector.
Long version
Sometimes with this econometric routines, it's nice just to write it out yourself so you know what's really going on.
Remove rows with NaN because they just add unrelated complexity:
tbl_dirty = table(Acceleration,Cylinders,Horsepower,MPG);
tbl = tbl_dirty(~any(ismissing(tbl_dirty),2),:);
Do the estimation etc...
n = height(tbl); % number of observations
y = tbl.MPG;
X = [ones(n, 1), tbl.Acceleration, tbl.Cylinders, tbl.Horsepower];
k = size(X,2); % number of variables (including constant)
b = X \ y; % estimate b with least squares
u = y - X * b; % calculates residuals
s2 = u' * u / (n - k); % estimate variance of error term (assuming homoskedasticity, independent observations)
BCOV = inv(X'*X) * s2; % get covariance matrix of b assuming homoskedasticity of error term etc...
bse = diag(BCOV).^.5; % standard errors
R = [0, 0, 1, 0;
0, 0, 0, 1];
r = [0; 0]; % Testing restriction: R * b = r
num_restrictions = size(R, 1);
F = (R*b - r)'*inv(R * BCOV * R')*(R*b - r) / num_restrictions; % F-stat (see Hiyashi for reference)
Fp = 1 - fcdf(F, num_restrictions, n - k); % F p-val
For reference, can look at p. 65 of Hiyashi's book Econometrics.
No, there is not.
Fitlm fits an arbitrary model. In your case a regression model with an intercept and either one or three regressors. It might seem that the model with three regressors can use information from the model with one regressor, but this is only true if there are some restrictions on the model and even then this overlapping information is limited.
Fitlm is a very general framework which can be used for arbitrary models. Doing multiple regressions at the same time with sharing of information can thus get quite complex and is not implemented.
It is possible to implement this yourself for these two specific models. Usually such a linear regression is solved using the covariance matrix:
Beta = (X' X) ^-1 X' y
were X is the data with the variables as columns and y is the target variable. In this case you could reuse part of the covariance matrix for which you only need the columns from the smaller regression: the variation in Acceleration. Since adding 2 new variables adds 8 values yo the covariance matrix you only save 1/9 of the time. Furthermore, the heaviest part is the inversion. Thus the time improvement is very very little.
In short, just do two separate regressions