Best way to output teffects results from Stata to Excel? - regression

I want to output the ATE, std error, and p-value from this code:
teffects aipw (dep_var, logit) (treatment pred1 pred2 pred3)
I used this code:
putexcel set "$root/filename.xlsx", sheet("5") modify
putexcel A1=`e(stat)'
but it says "ate not found." Shouldn't the ate be stored automatically in e(stat)?

e(stat) stores the statistic that is estimated as a string, i.e. "ate" or "pomeans". This doesn't contain the actual point estimate.
The coefficients and standard errors can be accessed after any estimation command with the following syntax: _b[coef], _se[coef] or [eqno]_b[coef]/_b[eqno:coef] and [eqno]_se[coef]/_se[eqno:coef] in the case of multiple equation models.
You can specify the coeflegend option to most estimation commands to see how coefficients are named.
Example:
. webuse cattaneo2
(Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154)
. teffects aipw (bweight prenatal1 mmarried mage fbaby) (mbsmoke mmarried c.mage##c.mage fbaby medu, probit), coeflegend
Iteration 0: EE criterion = 4.629e-21
Iteration 1: EE criterion = 1.939e-25
Treatment-effects estimation Number of obs = 4,642
Estimator : augmented IPW
Outcome model : linear by ML
Treatment model: probit
----------------------------------------------------------------------------------------
bweight | Coef. Legend
-----------------------+----------------------------------------------------------------
ATE |
mbsmoke |
(smoker vs nonsmoker) | -230.9892 _b[ATE:r1vs0.mbsmoke]
-----------------------+----------------------------------------------------------------
POmean |
mbsmoke |
nonsmoker | 3403.355 _b[POmean:0.mbsmoke]
----------------------------------------------------------------------------------------
. di _b[ATE:r1vs0.mbsmoke]
-230.9892
. di _se[ATE:r1vs0.mbsmoke]
26.210565
Any other statistics can be obtained from r(table). Type matrix list r(table) after the estimation command to see its contents. For example, to obtain the p-value:
mat A = r(table)
scalar pval = A[4,1]
di pval
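As a cross-check on the value pulled from r(table), the p-value can be recomputed by hand from the coefficient and standard error displayed above. This is only a sketch in Python, assuming the reported p-value is the usual two-sided one based on the normal distribution; the function name two_sided_p is made up for illustration.

```python
import math

def two_sided_p(coef, se):
    """Return the z-statistic and the two-sided normal p-value."""
    z = coef / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Values displayed by _b[ATE:r1vs0.mbsmoke] and _se[ATE:r1vs0.mbsmoke]
z, p = two_sided_p(-230.9892, 26.210565)
print(z, p)
```

The z value should match row 3 of r(table) and the p-value row 4, up to rounding of the displayed inputs.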


Problem while implementing the “Continual Learning Through Synaptic Intelligence” paper

I am trying to reproduce the results of the “Continual Learning Through Synaptic Intelligence” paper [1]. I implemented the algorithm as best I could after reading the paper many times. I also looked at its official implementation on GitHub, which is in TensorFlow 1.0, but could not understand much of it because I am not familiar with that framework.
I got some results, but they are not as good as those in the paper. Can anyone help me find out where I am going wrong? Before going into coding details, I want to discuss the pseudocode so that I understand what is wrong with my implementation.
Here is the pseudocode for what I have implemented. Please help me.
lambda = 1
xi = 1e-3
total_tasks = 5
model = NN(total_tasks)
## multi-headed linear model: 784 (input) --> 256 --> 256 --> 2 (output), with 5 separate heads
## the output layer is a 2-neuron head (one separate head per task, 5 tasks in total)
## the output is a vector of size 2 (for 2 classes)
prev_theta = model.theta(copy=True)  # updated at end of task
## model.theta() returns the list of shared parameters (i.e. layer1 and layer2, excluding the output layer)
## copy=True gives a copy of the parameters,
## so it doesn't affect the original params connected to the computational graph
omega_total = zero_like(prev_theta)  ## capital Omega in the paper (per-parameter regularization strength)
omega = zero_like(prev_theta)  ## small omega in the paper (per-parameter contribution to loss)
for task_num in range(total_tasks):
    optimizer = ADAM()  # created before every task (or reset)
    prev_theta_step = model.theta(copy=True)  # updated at end of step
    ## training for this task starts
    for epoch in range(10):
        for steps in range(steps_per_epoch):
            X, Y = train_dataset[task_num].sample()
            ## X is a flattened image of size 784
            ## Y is a binary vector of size 2 ([0,1] or [1,0])
            Y_pred = model(X, task_num)  # model is multi-headed, task_num selects the head
            loss = CROSS_ENTROPY(Y_pred, Y)
            if (task_num > 0):  ## reg_loss starts from the second task
                theta = model.theta()
                ## here copy is not True, so it returns params connected to the computational graph
                reg_loss = torch.sum(omega_total*torch.square(theta - prev_theta))
                loss = loss + lambda*reg_loss
            optimizer.zero_grad()
            loss.backward()
            theta = model.theta(copy=True)
            grads = model.theta_grads()  ## grads of shared parameters only
            omega = omega - grads*(theta - prev_theta_step)
            prev_theta_step = theta
            optimizer.step()
    ## training for this task is complete, update importance parameters
    theta = model.theta(copy=True)
    omega_total += relu( omega/( (theta - prev_theta)**2 + xi) )
    prev_theta = theta
    omega = torch.zeros(theta_shape)
    ## evaluation code
    ...
    ...
    ...
    ## evaluation done
I am also attaching the results I got. In the results, ‘one’ (blue) is without the regularization loss (lambda=0) and ‘two’ (green) is with the regularization loss (lambda=1).
Thank you for reading so far. Kindly help me out.
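One thing that may be worth checking in the pseudocode above is which parameter update each gradient gets multiplied with. In the paper, small omega is a running sum of minus the gradient times the parameter change made in that same step, so that the sum approximates the drop in loss along the training trajectory. Below is a minimal sketch of that bookkeeping on a toy 1-D quadratic loss with plain gradient descent. It is not the network or the Adam optimizer from the question, and the name si_toy and its arguments are hypothetical.

```python
def si_toy(lr=0.01, steps=500, xi=1e-3, target=1.0):
    """Track the SI path integral for L(theta) = 0.5 * (theta - target)**2."""
    def loss(th):
        return 0.5 * (th - target) ** 2

    theta = 0.0
    theta_start = theta
    omega = 0.0  # small omega: running sum of -grad * update
    for _ in range(steps):
        grad = theta - target          # dL/dtheta at the current theta
        new_theta = theta - lr * grad  # plain gradient-descent step
        # pair the gradient with the update it produced in this same step
        omega += -grad * (new_theta - theta)
        theta = new_theta
    loss_drop = loss(theta_start) - loss(theta)
    # capital Omega: clipped omega over squared total displacement plus damping xi
    big_omega = max(omega, 0.0) / ((theta - theta_start) ** 2 + xi)
    return omega, loss_drop, big_omega

omega, loss_drop, big_omega = si_toy()
print(omega, loss_drop, big_omega)
```

With a small learning rate, omega and loss_drop should come out close to each other, which gives a simple sanity check to run on a real implementation as well.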

Token indices sequence length is longer than the specified maximum sequence length for this model (651 > 512) with Hugging face sentiment classifier

I'm trying to get the sentiment of comments using a pretrained Hugging Face sentiment analysis model. It returns an error like: Token indices sequence length is longer than the specified maximum sequence length for this model (651 > 512).
My code is attached below; please take a look.
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import transformers
import pandas as pd
model = AutoModelForSequenceClassification.from_pretrained('/content/drive/MyDrive/Huggingface-Sentiment-Pipeline')
token = AutoTokenizer.from_pretrained('/content/drive/MyDrive/Huggingface-Sentiment-Pipeline')
classifier = pipeline(task='sentiment-analysis', model=model, tokenizer=token)
data = pd.read_csv('/content/drive/MyDrive/DisneylandReviews.csv', encoding='latin-1')
data.head()
Output is
Review
0 If you've ever been to Disneyland anywhere you...
1 Its been a while since d last time we visit HK...
2 Thanks God it wasn t too hot or too humid wh...
3 HK Disneyland is a great compact park. Unfortu...
4 the location is not in the city, took around 1...
Followed by
classifier("My name is mark")
Output is
[{'label': 'POSITIVE', 'score': 0.9953688383102417}]
Followed by code
basic_sentiment = [i['label'] for i in value if 'label' in i]
basic_sentiment
Output is
['POSITIVE']
Appending the total rows to empty list
text = []
for index, row in data.iterrows():
text.append(row['Review'])
I'm trying to get the sentiment for all the rows
sent = []
for i in range(len(data)):
sentiment = classifier(data.iloc[i,0])
sent.append(sentiment)
The error is :
Token indices sequence length is longer than the specified maximum sequence length for this model (651 > 512). Running this sequence through the model will result in indexing errors
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-19-4bb136563e7c> in <module>()
2
3 for i in range(len(data)):
----> 4 sentiment = classifier(data.iloc[i,0])
5 sent.append(sentiment)
11 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1914 # remove once script supports set_grad_enabled
1915 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1916 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1917
1918
IndexError: index out of range in self
Some of the sentences in the Review column of your data frame are too long. When they are converted to tokens and passed to the model, they exceed the model's 512-token sequence length limit, because the embedding of the model used for the sentiment-analysis task was trained on sequences of at most 512 tokens.
To fix this, you can filter out the long sentences and keep only the shorter ones (token length < 512),
or you can truncate the sentences with truncation=True:
sentiment = classifier(data.iloc[i,0], truncation=True)
If you're tokenizing separately from your classification step, this warning can be output during tokenization itself (as opposed to classification).
In my case, I am using a BERT model, so I have MAX_TOKENS=510 (leaving room for the sequence-start and sequence-end tokens).
token = AutoTokenizer.from_pretrained("your model")
tokens = token.tokenize(
text, max_length=MAX_TOKENS, truncation=True
)
Now, when you run your classifier, the tokens are guaranteed not to exceed the maximum length.
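To make the 510 + 2 arithmetic concrete, here is a toy sketch in plain Python. It does not call the transformers library: the token list is a stand-in, and truncate_with_specials is a hypothetical helper that only mimics what truncation does to the length.

```python
def truncate_with_specials(tokens, max_len=512, cls="[CLS]", sep="[SEP]"):
    """Cut a token list so that it fits max_len once [CLS] and [SEP] are added."""
    budget = max_len - 2  # room left after the two special tokens
    return [cls] + list(tokens[:budget]) + [sep]

# Stand-in for a review that tokenizes to 651 tokens, as in the error message
long_review = ["tok"] * 651
model_input = truncate_with_specials(long_review)
print(len(model_input))
```

Inputs shorter than the budget pass through unchanged apart from the two added special tokens.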

Tukey's Test HSD.TEST interaction output is null.

I have a question about using a means separation test on interactions. Is it possible to perform a Tukey's test on interaction terms? One of the interaction terms in my model is significant, and I want to know which of its levels differ.
For example my model is
ModelB<- lm(RATING.2~Allele%in%QTL11.Source+Family%in%QTL11.Source+REP)
anova(ModelB)
Response: RATING.2
Df Sum Sq Mean Sq F value Pr(>F)
REP 2 1.301 0.6507 0.9266 0.3993
Allele:QTL11.Source 8 105.021 13.1276 18.6941 < 2.2e-16 ***
QTL11.Source:Family 37 68.644 1.8552 2.6419 6.873e-05 ***
Residuals 100 70.223 0.7022
---
TukeysTestB<- HSD.test(ModelB,"Allele%in%QTL11.Source")
TukeysTestB
NULL
Why am I getting a NULL output? Is it not possible to test this?

Error in eval(expr, envir, enclos) while using Predict function

When I try to run predict() on the dataset, it keeps giving me error -
Error in eval(expr, envir, enclos) : object 'LoanRange' not found
Here is the part of dataset -
LoanRange Loan.Type N WAFICO WALTV WAOrigRev WAPTValue
1 0-99999 Conventional 109 722.5216 63.55385 6068.239 0.6031879
2 0-99999 FHA 30 696.6348 80.00100 7129.650 0.5623650
3 0-99999 VA 13 698.6986 74.40525 7838.894 0.4892977
4 100000-149999 Conventional 860 731.2333 68.25817 6438.330 0.5962638
5 100000-149999 FHA 285 673.2256 82.42225 8145.068 0.5211495
6 100000-149999 VA 125 704.1686 87.71306 8911.461 0.5020074
7 150000-199999 Conventional 1291 738.7164 70.08944 8125.979 0.6045117
8 150000-199999 FHA 403 672.0891 84.65318 10112.192 0.5199632
9 150000-199999 VA 195 694.1885 90.77495 10909.393 0.5250807
10 200000-249999 Conventional 1162 740.8614 70.65027 8832.563 0.6111419
11 200000-249999 FHA 348 667.6291 85.13457 11013.856 0.5374226
12 200000-249999 VA 221 702.9796 91.76759 11753.642 0.5078298
13 250000-299999 Conventional 948 742.0405 72.22742 9903.160 0.6106858
The following code models the count data N, after checking for overdispersion:
model2=glm(N~Loan.Type+WAFICO+WALTV+WAOrigRev+WAPTValue, family=quasipoisson(link = "log"), data = DF)
summary(model2)
This is what I have done to create a sequence of counts and use the predict function:
countaxis <- seq (0,1500,150)
Y <- predict(model2, list(N=countaxis), type = "response")
At this step, I get the error:
Error in eval(expr, envir, enclos) : object 'LoanRange' not found
Can someone please point out where the problem is?
Think about what exactly you are trying to predict. You are providing the predict function values of N (via countaxis), but in fact the way you set up your model, N is your response variable and the remaining variables are the predictors. That's why R is asking for LoanRange. It actually needs values for LoanRange, Loan.Type, ..., WAPTValue in order to predict N. So you need to feed predict inputs that let the model try to predict N.
For example, you could do something like this:
# create some fake data to predict N
newdata1 = data.frame(rbind(c("0-99999", "Conventional", 722.5216, 63.55385, 6068.239, 0.6031879),
c("150000-199999", "VA", 12.5216, 3.55385, 60.239, 0.0031879)))
colnames(newdata1) = c("LoanRange" ,"Loan.Type", "WAFICO" ,"WALTV" , "WAOrigRev" ,"WAPTValue")
# ensure that numeric variables are indeed numeric and not factors
newdata1$WAFICO = as.numeric(as.character(newdata1$WAFICO))
newdata1$WALTV = as.numeric(as.character(newdata1$WALTV))
newdata1$WAPTValue = as.numeric(as.character(newdata1$WAPTValue))
newdata1$WAOrigRev = as.numeric(as.character(newdata1$WAOrigRev))
# make predictions - this will output values of N
predict(model2, newdata = newdata1, type = "response")
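To see why values of the predictors are required, it may help to spell out what a log-link count model computes when it predicts. The sketch below is plain Python rather than R, it handles numeric predictors only, and the coefficients are made up for illustration (they are not estimates from model2). The name predict_count is hypothetical.

```python
import math

def predict_count(coefs, row):
    """Predicted count under a log link: exp(intercept + sum(beta * x))."""
    eta = coefs["(Intercept)"]
    for name, beta in coefs.items():
        if name == "(Intercept)":
            continue
        # A missing predictor raises KeyError, much like R's "object not found"
        eta += beta * row[name]
    return math.exp(eta)

# Made-up coefficients, for illustration only
coefs = {"(Intercept)": 1.0, "WALTV": 0.01}
print(predict_count(coefs, {"WALTV": 70.0}))
```

Passing only the response, for example {"N": 150}, fails here for the same reason the original call did: the model has nothing to plug into its linear predictor.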

Regression estimation in Eviews

I am estimating the dependence of exports on GDP and human capital. With a linear specification, I get this:
Dependent Variable: EXPORTS
Method: Least Squares
Date: 05/23/15 Time: 18:20
Sample: 1960 2011
Included observations: 52
Variable Coefficient Std. Error t-Statistic Prob.
C 2.63E+10 1.38E+10 1.911506 0.0618
HC -1.36E+10 6.08E+09 -2.233089 0.0301
GDP 2903680. 192313.2 15.09870 0.0000
R-squared 0.967407 Mean dependent var 1.90E+10
Adjusted R-squared 0.966076 S.D. dependent var 2.22E+10
S.E. of regression 4.08E+09 Akaike info criterion 47.15324
Sum squared resid 8.16E+20 Schwarz criterion 47.26581
Log likelihood -1222.984 Hannan-Quinn criter. 47.19640
F-statistic 727.1844 Durbin-Watson stat 0.745562
Prob(F-statistic) 0.000000
The sign of the HC coefficient is negative, which goes against the theory. I have tried logarithmic and exponential forms, but I still get a negative coefficient for HC.
I wonder what the right way to estimate it is.
Thank you in advance.
here is my data
Durbin-Watson stat 0.745562
A Durbin-Watson statistic this far below 2 indicates positive autocorrelation in the residuals of your model.
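For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals, and it is roughly 2(1 - rho), where rho is the first-order autocorrelation of the residuals. A small sketch in plain Python follows; the function names are hypothetical and the residuals in the first call are made up, since the regression's residuals are not shown.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def implied_rho(dw):
    """Approximate first-order autocorrelation implied by DW ~ 2 * (1 - rho)."""
    return 1.0 - dw / 2.0

# Made-up alternating residuals: strongly negative autocorrelation, DW above 2
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
# The statistic reported in the EViews output above
print(implied_rho(0.745562))
```

By this approximation the reported statistic corresponds to a residual autocorrelation of about 0.63.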