Variance of the prediction of a two-point change after linear regression

This may be an absolutely basic question, so forgive me if it is.
Let's say I have the following regression output
Call:
lm(formula = y ~ x1 + x2, data = fake)

Residuals:
    Min      1Q  Median      3Q     Max
-2.9434 -0.6851  0.0231  0.6744  3.6313

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.130431   0.056489  -2.309   0.0211 *
x1           0.014597   0.003454   4.226 2.59e-05 ***
x2          -0.025518   0.062429  -0.409   0.6828
---
Now, I want to predict y from new data. I'm specifically interested in the error around my prediction.
Let's say that one row in the new observed dataset has
x1 = 2
x2 = 0
The expected value for y_pred is
y_pred = -0.1304 + 2 * 0.01460
...but what is the standard deviation of that prediction? Can I figure out 95% CIs on that prediction?
And specifically, would I figure those out by applying the std. error of beta_1 twice, once for each unit increase, or do I apply it only once?
EDIT to add: I don't have the original data, just the coefficient estimates and SEs...so calculating this using matrix algebra won't be possible.
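For reference, the variance of the fitted mean at x1 = 2, x2 = 0 is Var(b0) + 2^2 * Var(b1) + 2*2*Cov(b0, b1): the factor of 2 scales the SE of beta_1 (variance scales with the square), rather than the SE being "applied twice" as two independent errors. The covariance term is not printed by summary() (in R it comes from vcov(fit)), so with only the reported SEs the best you can do is drop it. A minimal Python sketch under that assumption (Cov = 0, which is generally not true, so treat the result as approximate):

import math

se_b0, se_b1 = 0.056489, 0.003454   # SEs from the summary above
x1 = 2

# Var(b0 + x1*b1) = Var(b0) + x1**2 * Var(b1) + 2*x1*Cov(b0, b1)
var_mean = se_b0**2 + x1**2 * se_b1**2       # covariance term assumed 0
se_mean = math.sqrt(var_mean)
y_pred = -0.130431 + x1 * 0.014597
print(y_pred - 1.96 * se_mean, y_pred + 1.96 * se_mean)  # rough 95% CI

Note this interval is for the mean response; a 95% prediction interval for a single new observation would additionally require the residual variance.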

Related

glmer - odds ratios & interpreting a binary outcome

Dear Members of the community,
It is my understanding that the summary() of an lmer model reports estimates on the scale of the outcome variable (e.g. reaction time in milliseconds for a given trial).
But what about glmer models where the outcome is strictly binary (e.g. a 0/1 score for a given trial)?
I have read that in this case odds ratios can be used to quantify results estimated on the logit scale, but I am unsure how to apply this to the current example:
> summary(recsem.full)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial  ( logit )
Formula: score ~ 1 + modality + session + (1 + modality | PID) + (1 + modality | stim)
   Data: recsem
Control: glmerControl(optimizer = "nmkbw")

     AIC      BIC   logLik deviance df.resid
  1634.7   1685.5   -808.4   1616.7     2063

Scaled residuals:
    Min      1Q  Median      3Q     Max
-7.3513  0.1192  0.2441  0.4284  2.1665

Random effects:
 Groups Name        Variance Std.Dev. Corr
 PID    (Intercept) 1.8173   1.3481
        modality    1.1752   1.0841   -0.44
 stim   (Intercept) 0.5507   0.7421
        modality    0.2508   0.5008    0.66
Number of obs: 2072, groups: PID, 37; stim, 28

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   2.1191     0.2972   7.129 1.01e-12 ***
modality      0.6690     0.2815   2.376  0.01749 *
session      -0.4239     0.1302  -3.255  0.00113 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
         (Intr) modlty
modality -0.269
session  -0.251 -0.008
Looking at the fixed effects, my interpretation is that both predictors are significant, but how do I interpret or transform the estimates into something more tangible?
Thank you very much in advance for any insights you might be able to share.
Have a nice day!
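As a starting point (a sketch, not part of the original thread): the glmer fixed effects are on the log-odds scale, so exponentiating an estimate gives an odds ratio, and exponentiating the Wald interval gives its confidence interval. For the modality effect reported above:

import math

est, se = 0.6690, 0.2815   # modality estimate and SE from the summary
odds_ratio = math.exp(est)  # ~1.95: the odds are roughly doubled per unit change in modality
ci = (math.exp(est - 1.96 * se), math.exp(est + 1.96 * se))  # ~(1.12, 3.40)
print(odds_ratio, ci)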

mIoU for multi-class

I would like to understand how mIoU is calculated for multi-class classification. The formula for each class is
IoU = TP / (TP + FP + FN)   (equivalently |A ∩ B| / |A ∪ B| for predicted set A and ground-truth set B)
and then the average over the classes gives the mIoU. However, I don't understand what happens for the classes that are not represented: the formula becomes a division by 0, so I ignore them and compute the average only over the classes that are represented.
The problem is that when a prediction is wrong, the score drops sharply, because the wrong prediction adds another class to the average. For instance, in semantic segmentation, suppose the ground truth of an image contains 4 classes (0, 1, 2, 3) and 6 classes are represented over the dataset. The prediction also contains 4 classes (0, 1, 4, 5), but all the pixels labelled 2 and 3 in the ground truth are predicted as 4 and 5. In this case, should we calculate the mIoU over 6 classes, even though 4 of them are totally wrong and their respective IoUs are 0? So the problem is that if just one pixel is predicted in a class that is not in the ground truth, we have to divide by a higher denominator, which lowers the score a lot.
Is this the correct way to compute the mIoU for multi-class problems (and for semantic segmentation)?
Instead of calculating the mIoU of each image and then taking the "mean" mIoU over all the images, I calculate the mIoU as if the dataset were one big image. If a class is neither present in the ground truth nor predicted, I set its IoU equal to 1.
From scratch:
import numpy as np

def miou(gt, pred, nbr_mask):
    intersection = np.zeros(nbr_mask)  # int = (A and B)
    den = np.zeros(nbr_mask)           # den = A + B = (A or B) + (A and B)
    for i in range(len(gt)):
        height, width = gt[i].shape
        for j in range(height):
            for k in range(width):
                if pred[i][j][k] == gt[i][j][k]:
                    intersection[gt[i][j][k]] += 1
                den[pred[i][j][k]] += 1
                den[gt[i][j][k]] += 1
    mIoU = 0
    for i in range(nbr_mask):
        if den[i] != 0:
            mIoU += intersection[i] / (den[i] - intersection[i])  # IoU = int / (A + B - int)
        else:
            mIoU += 1  # class absent from both gt and pred: count its IoU as 1
    mIoU = mIoU / nbr_mask
    return mIoU
Here gt is the array of ground-truth labels and pred the predictions for the associated images (they must correspond index by index and have the same size).
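A quick toy check of the function above, with made-up arrays (two 2x2 images, 3 classes):

import numpy as np

gt   = np.array([[[0, 1], [1, 2]], [[0, 0], [2, 2]]])
pred = np.array([[[0, 1], [2, 2]], [[0, 0], [2, 1]]])
print(miou(gt, pred, 3))  # per-class IoUs are 1.0, 1/3 and 0.5, so ~0.611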
Adding to the previous answer, here is a fast and efficient PyTorch GPU implementation for computing the mIoU and class-wise IoU for a batch of size (N, H, W) (both predicted masks and labels), taken from the NeurIPS 2021 paper "Few-Shot Segmentation via Cycle-Consistent Transformer"; the GitHub repo is available here.
import torch

def intersectionAndUnionGPU(output, target, K, ignore_index=255):
    # 'K' classes; output and target sizes are N, N * L, or N * H * W, each value in range 0 to K - 1.
    assert output.dim() in [1, 2, 3]
    assert output.shape == target.shape
    output = output.view(-1)
    target = target.view(-1)
    output[target == ignore_index] = ignore_index
    intersection = output[output == target]
    # .float() casts: torch.histc expects a floating-point tensor
    area_intersection = torch.histc(intersection.float(), bins=K, min=0, max=K - 1)
    area_output = torch.histc(output.float(), bins=K, min=0, max=K - 1)
    area_target = torch.histc(target.float(), bins=K, min=0, max=K - 1)
    area_union = area_output + area_target - area_intersection
    return area_intersection, area_union, area_target
Example usage:
import torch.nn.functional as F

output = torch.rand(4, 5, 224, 224)  # model output; batch size=4, channels=5, H,W=224
preds = F.softmax(output, dim=1).argmax(dim=1)  # (4, 224, 224)
labels = torch.randint(0, 5, (4, 224, 224))
i, u, _ = intersectionAndUnionGPU(preds, labels, 5)  # 5 is num_classes
classwise_IOU = i / u  # tensor of size (num_classes)
mIOU = i.sum() / u.sum()  # aggregate IoU; per-batch (i/u).mean() is wrong (classes absent from the batch give 0/0)
Hope this helps everyone!
(A non-GPU implementation is available as well in the repo!)

Using linear approximation to perform addition and subtraction | error barrier

After taking an introductory course on machine learning, I'm attempting my first solo project: using linear approximation to predict the outcome of adding or subtracting two numbers.
I have 3 features: first number, subtraction/addition (0 or 1), and second number.
So my input looks something like this:
3 0 1
4 1 2
3 0 3
With corresponding output like this:
2
6
0
I have (I think) successfully implemented the regression algorithm, as the squared error does gradually decrease, but with 100 training values ranging from 0 to 50, the squared error flattens out at around 685.6 after about 400 iterations.
[Graph: squared error vs. iterations]
To fix this, I have tried using a larger dataset for training, getting rid of regularization, and normalizing the input values.
I know that one of the steps to fix high bias is to add complexity to the approximation, but first I want to maximize the performance at this particular level. Is it possible to go any further at this level?
My linear approximation code in Octave:
% Iterate
for i = 1 : iter
  % hypothesis
  h = X * Theta;
  % zero out the bias term so it is not regularized
  % (Theta is a column vector, so index the first element, not a column)
  regTheta = Theta;
  regTheta(1) = 0;
  % cost with L2 penalty
  J(i, 2) = (1 / (2 * m)) * (sum((h - y) .^ 2) + lambda * sum(sum(regTheta .^ 2, 1), 2));
  % gradient step; the regularization gradient is (lambda / m) * regTheta,
  % not a scalar sum added to every parameter
  Theta = Theta - (alpha / m) * (((h - y)' * X)' + lambda * regTheta);
end
Note: I'm using 0 for lambda, so as to ignore regularization.
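For what it's worth, there is a structural reason the error plateaus: the target is y = a + b when the operator is 1 and y = a - b when it is 0, i.e. y = a + (2*op - 1)*b, which contains the product op*b. No model that is linear in the three raw features can represent that product, so the remaining squared error is irreducible bias rather than a tuning problem. A minimal numpy sketch (made-up data, not the original Octave setup) showing that adding op*b as a fourth feature makes the target exactly linear:

import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 51, 1000)
b = rng.integers(0, 51, 1000)
op = rng.integers(0, 2, 1000)            # 0 = subtract, 1 = add
y = a + (2 * op - 1) * b                 # a + b or a - b

X3 = np.column_stack([np.ones(1000), a, op, b])           # the original 3 features
X4 = np.column_stack([np.ones(1000), a, op, b, op * b])   # plus the op*b interaction
for X in (X3, X4):
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(np.mean((X @ theta - y) ** 2))  # large for X3, ~0 for X4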

Variational auto-encoder: implementing warm-up in Keras

I recently read this paper, which introduces a process called "warm-up" (WU): the KL-divergence term of the loss is multiplied by a variable whose value increases linearly from 0 to 1 as a function of the epoch number.
I was wondering if this is the right way to do that:
beta = K.variable(value=0.0)

def vae_loss(x, x_decoded_mean):
    # cross entropy
    xent_loss = K.mean(objectives.categorical_crossentropy(x, x_decoded_mean))
    # kl divergence
    for k in range(n_sample):
        epsilon = K.random_normal(shape=(batch_size, latent_dim), mean=0.,
                                  std=1.0)  # used for every z_i sampling
        # Sample several layers of latent variables
        for mean, var in zip(means, variances):
            z_ = mean + K.exp(K.log(var) / 2) * epsilon
            # build z
            try:
                z = tf.concat([z, z_], -1)
            except NameError:
                z = z_
            except TypeError:
                z = z_
            # sum loss (using a MC approximation)
            try:
                loss += K.sum(log_normal2(z_, mean, K.log(var)), -1)
            except NameError:
                loss = K.sum(log_normal2(z_, mean, K.log(var)), -1)
        print("z", z)
        loss -= K.sum(log_stdnormal(z), -1)
        z = None
    kl_loss = loss / n_sample
    print('kl loss:', kl_loss)
    # result
    result = beta * kl_loss + xent_loss
    return result

# define callback to change the value of beta at each epoch
def warmup(epoch):
    value = (epoch / 10.0) * (epoch <= 10.0) + 1.0 * (epoch > 10.0)
    print("beta:", value)
    beta = K.variable(value=value)

from keras.callbacks import LambdaCallback
wu_cb = LambdaCallback(on_epoch_end=lambda epoch, log: warmup(epoch))

# train model
vae.fit(
    padded_X_train[:last_train, :, :],
    padded_X_train[:last_train, :, :],
    batch_size=batch_size,
    nb_epoch=nb_epoch,
    verbose=0,
    callbacks=[tb, wu_cb],
    validation_data=(padded_X_test[:last_test, :, :], padded_X_test[:last_test, :, :])
)
This will not work. I tested it to figure out exactly why it was not working. The key thing to remember is that Keras creates a static graph once at the beginning of training.
Therefore, the vae_loss function is called only once to create the loss tensor, which means that the reference to the beta variable will remain the same every time the loss is calculated. However, your warmup function reassigns beta to a new K.variable. Thus, the beta that is used for calculating loss is a different beta than the one that gets updated, and the value will always be 0.
It is an easy fix. Just change this line in your warmup callback:
beta = K.variable(value=value)
to:
K.set_value(beta, value)
This way the actual value in beta gets updated "in place" rather than creating a new variable, and the loss will be properly re-calculated.
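Putting it together, the corrected callback might look like this (a sketch reusing the names and imports from the question):

beta = K.variable(value=0.0)

def warmup(epoch):
    value = min(epoch / 10.0, 1.0)   # same linear 0 -> 1 schedule over the first 10 epochs
    print("beta:", value)
    K.set_value(beta, value)         # update the existing variable in place

wu_cb = LambdaCallback(on_epoch_end=lambda epoch, log: warmup(epoch))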

Plotting a 3D function with Octave

I am having a problem graphing a 3D function: when I enter data, I get a linear graph, and the values don't add up when I perform the calculations by hand. I believe the problem is related to using matrices.
INITIAL_VALUE=999999;
INTEREST_RATE=0.1;
MONTHLY_INTEREST_RATE=INTEREST_RATE/12;
# ranges
down_payment=0.2*INITIAL_VALUE:0.1*INITIAL_VALUE:INITIAL_VALUE;
term=180:22.5:360;
[down_paymentn, termn] = meshgrid(down_payment, term);
# functions
principal=INITIAL_VALUE - down_payment;
figure(1);
plot(principal);
grid;
title("Principal (down payment)");
xlabel("down payment $");
ylabel("principal $ (amount borrowed)");
monthly_payment = (MONTHLY_INTEREST_RATE*(INITIAL_VALUE - down_paymentn))/(1 - (1 + MONTHLY_INTEREST_RATE)^-termn);
figure(2);
mesh(down_paymentn, termn, monthly_payment);
title("monthly payment (principal(down payment)) / term months");
xlabel("principal");
ylabel("term (months)");
zlabel("monthly payment");
As I said, the second figure doesn't plot the way I expect. How can I change my formula so that it renders properly?
I tried your script, and got the following error:
error: octave_base_value::array_value(): wrong type argument `complex matrix'
...
Your monthly_payment is a complex matrix (and it shouldn't be).
I guess the problem is the power operator ^. You should be using .^ for element-by-element operations.
From the documentation:
x ^ y
x ** y
Power operator. If x and y are both scalars, this operator returns x raised to the power y. If x is a scalar and y is a square matrix, the result is computed using an eigenvalue expansion. If x is a square matrix, the result is computed by repeated multiplication if y is an integer, and by an eigenvalue expansion if y is not an integer. An error results if both x and y are matrices.
The implementation of this operator needs to be improved.
x .^ y
x .** y
Element by element power operator. If both operands are matrices, the number of rows and columns must both agree.
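Concretely, applying the element-wise operators to the line from the question (note the ./ as well, since the denominator is also a matrix) should give a real-valued result:

monthly_payment = (MONTHLY_INTEREST_RATE * (INITIAL_VALUE - down_paymentn)) ./ (1 - (1 + MONTHLY_INTEREST_RATE) .^ (-termn));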