How to use predict with stored e(b) from an old regression

I know that one can get predicted values as follows:
reg y x1 x2 x3
predict pred_values
Let's say that I run a regression and store the values:
reg y x1 x2
matrix stored_b = e(b)
And then I run another regression (doesn't matter what).
Is it possible to use the predict command with stored_b instead of the current e(b)?
(Of course, I could generate the predicted values by manually computing them based on stored_b, but this could get tedious if there are many coefficients.)

There's no need to create a matrix. Stata has commands that facilitate the task. Try estimates store and estimates restore. An example:
clear
set more off
sysuse auto
// initial regression/predictions
regress price weight
estimates store myest
predict double resid, residuals
// second regression/prediction
regress price mpg
predict double residdiff, residuals
// backup and predict from initial regression results
estimates restore myest
predict double resid2, residuals
// should pass
assert resid == resid2
// should fail
assert resid == residdiff

Related

What is the exact logic of performing batch normalization in deep learning?

After reading the research paper on batchnorm and its various descriptions in forums, it is still not clear to me how the basic computations are performed. The core of my question is: a vector is normalized with respect to the set to which it belongs; we can thus normalize the vectors input to layer 1 using the batch selected from the training set. Each input vector to the next layer needs to be normalized with respect to the set to which it belongs, but how do we get hold of that set?
More precisely, let
N = batch size;
Bi = the set of vectors Xij (j = 1..N) whose normalized values will be input to layer i of the network;
BN = the batch normalization function;
BN(Xij, Bi) = the normalized version of the j-th vector, Xij, with respect to the set Bi.
BN(X1j, B1), j = 1..N, can be calculated because we know B1. These are the inputs to layer 1.
We need BN(X2j, B2), j = 1..N, as inputs to layer 2, but we do not have B2 readily available. My question is how to get B2, B3, etc.
We could process each BN(X1j, B1), j = 1..N, through layer 1 and remember the outputs as X2j (that collection will be B2), then calculate BN(X2j, B2) for each j by normalizing with respect to B2 and feed the results to layer 2, and so on. So the forward pass would consist of many such steps. For simplicity, I have ignored the scale and shift step, as it is not relevant to my question.
Being new to this topic, I would appreciate an expert opinion on it.
Batch norm first subtracts the mean to center the output around zero. Then it divides by the standard deviation to scale the output to unit variance.
This means that each vector is normalized with respect to the batch it belongs to. Assume you run your x through your first layer; let's call its output y1. Batch normalization is then applied to y1 in the form (y1 - y1.mean()) / y1.std(). The 'set' you're talking about is just y1, the batch of outputs of that layer.
All of this is, of course, ignoring the versions with a learnable shift (bias) and scale.
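To make the layer-by-layer flow concrete, here is a minimal NumPy sketch of the forward pass described above; the batch X, the weights W1 and W2, and the ReLU layers are made-up placeholders:
import numpy as np

def batch_norm(Y, eps=1e-5):
    # Normalize each feature across the batch (axis 0): subtract the batch
    # mean and divide by the batch standard deviation.
    return (Y - Y.mean(axis=0)) / (Y.std(axis=0) + eps)

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))   # B1: a batch of N = 32 input vectors
W1 = rng.normal(size=(10, 20))  # placeholder weights for layer 1
W2 = rng.normal(size=(20, 5))   # placeholder weights for layer 2

H1 = np.maximum(batch_norm(X) @ W1, 0)   # layer 1 outputs; this batch is B2
H2 = np.maximum(batch_norm(H1) @ W2, 0)  # B2 is normalized before layer 2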

Wasserstein GAN implementation in PyTorch. How to implement the loss?

I'm currently working on a PyTorch project on Wasserstein GAN (https://arxiv.org/pdf/1701.07875.pdf).
In Wasserstein GAN a new objective function is defined using the Wasserstein distance:
min_G max_{D 1-Lipschitz} E_x~Pr[D(x)] - E_z~p(z)[D(G(z))]
This leads to Algorithm 1 of the paper for training the GAN.
My question is: when implementing lines 5 and 6 of the algorithm in PyTorch, should I be multiplying my loss by -1? As in my code (I use RMSprop as the optimizer for both the generator and the critic):
############################
# (1) Update D network: maximize D(x) - D(G(z))
############################
for n in range(n_critic):
    D.zero_grad()
    real_cpu = data[0].to(device)
    b_size = real_cpu.size(0)
    output = D(real_cpu)
    #errD_real = -criterion(output, label) #DCGAN
    errD_real = torch.mean(output)
    # Calculate gradients for D in backward pass
    errD_real.backward()
    D_x = output.mean().item()

    ## Train with all-fake batch
    # Generate batch of latent vectors
    noise = torch.randn(b_size, 100, device=device)  # Careful: we changed the shape of the input (original: torch.randn(4, 100, 1, 1, device=device))
    # Generate fake image batch with G
    fake = G(noise)
    # Classify all fake batch with D
    output = D(fake.detach())
    # Calculate D's loss on the all-fake batch
    errD_fake = torch.mean(output)
    # Calculate the gradients for this batch
    errD_fake.backward()
    D_G_z1 = output.mean().item()
    # Combine the losses from the all-real and all-fake batches
    errD = -(errD_real - errD_fake)
    # Update D
    optimizerD.step()
    # Clipping weights
    for p in D.parameters():
        p.data.clamp_(-0.01, 0.01)
As you can see, I do the operation errD = -(errD_real - errD_fake), with errD_real and errD_fake being, respectively, the mean of the critic's predictions on the real and fake samples.
To my understanding, RMSprop should update the weights of the critic as follows:
w <- w - alpha * gradient(w)
(alpha being the learning rate divided by the square root of a weighted moving average of the squared gradient).
Since the critic's optimization problem requires "going" in the same direction as the gradient (gradient ascent), the loss has to be multiplied by -1 before the weights are updated.
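For instance, this toy snippet illustrates the sign convention I mean (the objective f(w) = -(w - 3)^2 is made up for the example; RMSprop minimizes whatever loss it is given, so maximizing an objective means minimizing its negation):
import torch

w = torch.tensor(0.0, requires_grad=True)
opt = torch.optim.RMSprop([w], lr=0.1)
for _ in range(500):
    opt.zero_grad()
    objective = -(w - 3) ** 2  # the function we want to MAXIMIZE
    loss = -objective          # so we hand RMSprop its negation
    loss.backward()
    opt.step()
print(w.item())  # approaches 3.0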
Do you think that my reasoning is right?
The program runs, but my results are quite poor.
I follow the same logic for the generator's weights, but this time in order to go in the opposite direction of the gradient:
############################
# (2) Update G network: minimize -D(G(z))
############################
G.zero_grad()
noise = torch.randn(b_size, 100, device=device)
fake = G(noise)
#label.fill_(fake_label) # fake labels are real for generator cost
# Since we just updated D, perform another forward pass of all-fake batch through D
output = D(fake).view(-1)
# Calculate G's loss based on this output
#errG = criterion(output, label) #DCGAN
errG = -torch.mean(output)
# Calculate gradients for G
errG.backward()
D_G_z2 = output.mean().item()
# Update G
optimizerG.step()
Sorry for the long question; I tried to explain my doubt as clearly as possible. Thank you, everyone.
I noticed some errors in the implementation of your discriminator training protocol: you call backward twice, backpropagating the real and fake losses at different steps, while the combined errD is never backpropagated at all.
Technically an implementation using such a scheme is possible, but it is highly unreadable. There is also a sign mistake in errD_real: backpropagating errD_real = torch.mean(output) directly performs a descent step on D(x), so the critic is penalized for assigning high scores to real samples, i.e., for being correct. Overall, your model can converge simply by predicting D(x) < 0 for all inputs.
To fix this, do not call errD_real.backward() or errD_fake.backward(). Simply calling errD.backward() after you define errD works perfectly fine. Otherwise, your generator seems to be correct.
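A minimal sketch of the corrected critic update along these lines, reusing the names from the question (D, G, data, device, n_critic, the 100-dimensional noise, and optimizerD are taken from the question's code and assumed to be defined):
for n in range(n_critic):
    D.zero_grad()
    real_cpu = data[0].to(device)
    b_size = real_cpu.size(0)
    noise = torch.randn(b_size, 100, device=device)
    fake = G(noise)
    # The critic maximizes D(x) - D(G(z)), so we minimize the negation.
    errD = -(torch.mean(D(real_cpu)) - torch.mean(D(fake.detach())))
    # A single backward call on the combined loss.
    errD.backward()
    optimizerD.step()
    # Weight clipping enforces the Lipschitz constraint (Algorithm 1).
    for p in D.parameters():
        p.data.clamp_(-0.01, 0.01)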

How to do Poisson regression in gnuplot

I measured the spectrum of an X-ray tube with a Geiger counter. Such data aren't normally distributed, but instead they follow the Poisson distribution. I want to fit a curve to the measured data:
a = 1
b = 1
f(x) = a*(x/b - 1)/x**2
fit f(x) 'data.txt' via a,b
This will, however, perform least-squares regression, which is theoretically wrong in this case. Instead I need to perform Poisson regression. How do I achieve this?
The link function of the Poisson distribution is ln(μ). Is there a way to use this information, e.g., with gnuplot's link command?
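For illustration (this is not a gnuplot feature), Poisson regression here just means maximizing the Poisson log-likelihood of the counts under the model; a minimal SciPy sketch, assuming data.txt holds x and the measured counts in two columns:
import numpy as np
from scipy.optimize import minimize

# Columns of data.txt: x value, measured count y (assumed layout).
x, y = np.loadtxt("data.txt", unpack=True)

def f(x, a, b):
    # Same model as in the gnuplot snippet above.
    return a * (x / b - 1) / x**2

def neg_loglik(params):
    a, b = params
    mu = f(x, a, b)
    if np.any(mu <= 0):
        return np.inf  # the Poisson mean must be positive
    # Poisson log-likelihood up to an additive constant: sum(y*ln(mu) - mu)
    return -np.sum(y * np.log(mu) - mu)

res = minimize(neg_loglik, x0=[1.0, 1.0], method="Nelder-Mead")
print(res.x)  # fitted a and b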

Stata drops variables that "predict failure perfectly" even though the correlation between the variables isn't 1 or -1?

I am running a logit regression on some data. My dependent variable is binary as are all but one of my independent variables.
When I run my regression, Stata drops many of my independent variables and gives the error:
"variable name" != 0 predicts failure perfectly
"variable name" dropped and "a number" obs not used
I know for a fact that some of the dropped variables don't predict failure perfectly. In other words, the dependent variable can take on the value 1 for either value (1 or 0) of the independent variable.
Why is this happening and how can I resolve it?
A bivariate cross-tabulation does not show the problem. Try this:
http://www.stata.com/support/faqs/statistics/completely-determined-in-logistic-regression/index.html
First confirm that this is what is happening. (For your data, replace x1 and x2 with the independent variables of your model.)
1. Number the covariate patterns:
egen pattern = group(x1 x2)
2. Identify the pattern with only one outcome:
logit y x1 x2
predict p
summarize p
The extremes of p will be almost 0 or almost 1.
tab pattern if p < 1e-7 // (use a value here slightly bigger than the min)
or, if p is almost 1, use "if p > 1 - 1e-7" in the above.
list x1 x2 if pattern == XXXX // (use the value here from the tab step)
The above identifies the covariate pattern.
3. The covariate pattern that predicts the outcome perfectly may be meaningful to the researcher, or it may be an anomaly due to having many variables in the model.
4. Now you must get rid of the collinearity:
logit y x1 x2 if pattern ~= XXXX // (use the value here from the tab step)
Note that there is collinearity. You can omit the variable that logit drops or drop another one. Refit the model with the collinearity removed:
logit y x1
5. You may or may not want to include the covariate pattern that predicts the outcome perfectly; it depends on the answer to (3). If that covariate pattern is meaningful, you may want to exclude these observations from the model:
logit y x1 if pattern ~= XXXX
Here one would report: "Covariate pattern such-and-such predicted the outcome perfectly. The best model for the rest of the data is ....xyz."

Remove outliers with large standardized residuals in Stata

I run a simple regression in Stata for two subsamples and afterwards I want to exclude all observations with standardized residuals larger than 3.0. I tried:
regress y x if subsample_criteria==1
gen st_res1=e(rsta)
regress y x if subsample_criteria==0
gen st_res2=e(rsta)
drop if st_res1 | st_res2 > 3.0
However, the new variables are full of missing values, and the standardized residuals are not stored in st_res1 and st_res2.
I am grateful for any hints!
The problem with your code is that Stata does not know what e(rsta) is (and neither do I), so it creates a missing value, which Stata thinks of as a very large positive number. All missings are greater than 3, so your constraint does not bind.
Ignoring the statistical merits of doing this, here's one way:
sysuse auto, clear
reg price mpg
predict ehat, rstandard
reg price mpg if abs(ehat)<3
Note that I am using the absolute value of the residual, which I think makes more sense here.
First, providing an MCVE is always a good first step (and fairly easy given Stata's sysuse and webuse commands). Now, on to the question.
See help regress postestimation and help predict for the proper syntax for generating new variables with residuals, etc. The syntax is a bit different from the gen command, as you will see below.
Note also that your drop if condition is improperly formatted, and right now is interpreted as drop if st_res1 != 0 | st_res2 > 3.0. (I also assume you want to drop standardized residuals < -3.0, but if this is incorrect, you can remove the abs() function.)
sysuse auto , clear
replace mpg = 10000 in 1/2
replace mpg = 0.0001 in 70
reg mpg weight if foreign
predict rst_for , rstandard
reg mpg weight if !foreign
predict rst_dom , rstandard
drop if abs(rst_for) > 3.0 | abs(rst_dom) > 3.0
Postscript: Note that you may also consider adding if e(sample) to your predict commands, depending on whether you wish to extrapolate the results of the subsample regression to the entire sample and evaluate all residuals, or whether you only wish to drop observations based on in-sample standardized residuals.