Backwards stepwise regression approach in Stata 13

. stepwise, pr(.05) : logit y1 (x1-x7)
begin with full model
p < 0.0500 for all terms in model
Logistic regression Number of obs = 28900
LR chi2(66) = 1182.91
Prob > chi2 = 0.0000
Log likelihood = -28120.170 Pseudo R2 = 0.0213
------------------------------------------------------------------------------
churn | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | .0019635 .0007981 2.46 0.014 .0003992 .0035278
x2 | -.0002809 .0000496 -5.66 0.000 -.0003782 -.0001836
x3 | -.0031225 .0008888 -3.51 0.000 -.0048645 -.0013806
x4 | -.0011958 .0059387 -0.20 0.840 -.0128354 .0104439
x5 | .0007603 .0002804 2.71 0.007 .0002106 .0013099
x6 | .0070912 .0020636 3.44 0.001 .0030467 .0111357
x7 | -.0004919 .0000535 -9.19 0.660 -.0005968 -.0003871
_cons | .1497005 .0952738 1.57 0.116 -.0370327 .3364336
------------------------------------------------------------------------------
Note: 0 failures and 1 success completely determined.
As you can see in the logistic regression output above, x4 and x7 both have p-values greater than 0.05. However, Stata reports "p < 0.0500 for all terms in model", which renders my stepwise approach useless.
Can anyone please advise what I may be doing wrong?

Your syntax insists that all the variables be kept together, so Stata has nowhere to go from where it started. Nothing stepwise can happen with this syntax: the variables are either all in or all out.
See the help: a varlist in parentheses indicates that this group of variables is to be included or excluded together. All the predictors are bound together by what you typed.
After reading the help, all you may need to do is omit the parentheses:
stepwise, pr(.05) : logit y1 x1-x7
(Lack of a Stata tag for a month cut down mightily on the Stata users reading this.)

Related

ANOVA table in R: F-value does not "match the math"

I was playing around with a simple linear model when I noticed that, in the ANOVA table, the ratio MSreg/MSres does not exactly match the F value. The two values are very similar but not the same.
Here is my script:
#quick view of the dataset
> head(my_data)
Diameter Height
1 0.325 0.080
2 0.320 0.100
3 0.280 0.110
4 0.125 0.040
5 0.400 0.135
6 0.335 0.100
#setting up the lm()
> ls1 <- lm(Diameter~Height, data=my_data)
> anova(ls1)
Analysis of Variance Table
Response: Diameter
Df Sum Sq Mean Sq F value Pr(>F)
Height 1 0.82415 0.82415 602.63 < 2.2e-16 ***
Residuals 98 0.13402 0.00137
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Here 0.82415/0.00137=601.5693 which is not the F value in the table. Is there a particular reason for that?
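One likely explanation, assuming the table shows rounded values: the printed Sum Sq and Mean Sq columns are rounded to a few significant digits, while the F value is computed from the unrounded quantities. A quick check (in Python, used here only for the arithmetic) recomputes the residual mean square from the printed Sum Sq and Df instead of using the rounded 0.00137:

```python
# Values copied from the printed ANOVA table (already rounded for display).
ss_reg, df_reg = 0.82415, 1
ss_res, df_res = 0.13402, 98

ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res  # roughly 0.0013676, displayed as 0.00137

# Ratio using the rounded Mean Sq shown in the table
f_from_rounded_ms = ms_reg / 0.00137
# Ratio using the less heavily rounded Sum Sq / Df
f_from_sums = ms_reg / ms_res

print(f"MS residual:             {ms_res:.7f}")
print(f"F using printed Mean Sq: {f_from_rounded_ms:.2f}")
print(f"F using Sum Sq / Df:     {f_from_sums:.2f}")
```

The second ratio lands within a few hundredths of the printed 602.63. The small remaining gap would come from the sums of squares themselves being rounded in the display.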

Which post-hoc test after Welch ANOVA?

I'm doing the statistical evaluation for my master's thesis. Levene's test was significant, so I ran a Welch ANOVA, which was also significant. I then tried the Games-Howell post hoc test, but it didn't work.
Can anybody send me the exact functions I need to run in R to do the Games-Howell post hoc test and to get a compact letter display showing which treatments are not significantly different from each other? I would also like to know whether I did the Welch ANOVA correctly (the R output is below).
Here is the output I have produced so far for the statistical evaluation:
'data.frame': 30 obs. of 3 variables:
$ Dauer: Factor w/ 6 levels "0","2","4","6",..: 1 2 3 4 5 6 1 2 3 4 ...
$ WH : Factor w/ 5 levels "r1","r2","r3",..: 1 1 1 1 1 1 2 2 2 2 ...
$ TSO2 : num 107 86 98 97 88 95 93 96 96 99 ...
> leveneTest(TSO2~Dauer, data=TSO2R)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 5 3.3491 0.01956 *
24
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> oneway.test(TSO2 ~ Dauer, data=TSO2R, var.equal = FALSE) ### Welch ANOVA
One-way analysis of means (not assuming equal variances)
data: TSO2 and Dauer
F = 5.7466, num df = 5.000, denom df = 10.685, p-value = 0.00807
Thank you very much!

Display symbolic expression in octave. Matrix multiplication as an expression and not as a result

I am having a hard time finding out how to display a matrix multiplication as an expression rather than as the result of evaluating it. The expression must be displayed on the command line, not as a plot.
Let's say I have
syms m00 m01 m10 m11;
M = [m00 m01; m10 m11];
syms x0 x1;
X = [x0; x1];
I want to see the expression M * X as a symbolic expression. Something that will be displayed like:
| m00 m01 | * | x0 |
| m10 m11 | | x1 |
and not as the result of evaluating M * X:
| m00*x0 + m01*x1 |
| m10*x0 + m11*x1 |
I have read the documentation for the Octave symbolic package but cannot find a mechanism for this. My current idea is to convert the expressions to LaTeX,
m = latex(M)
x = latex(X)
concatenate the results into a single LaTeX string, and somehow print that string on the Octave command line. No luck so far.

How to calculate the Hamming weight for a vector?

I am trying to calculate the Hamming weight of a vector in Matlab.
function Hamming_weight (vet_dec)
Ham_Weight = sum(dec2bin(vet_dec) == '1')
endfunction
The vector is:
Hamming_weight ([208 15 217 252 128 35 50 252 209 120 97 140 235 220 32 251])
However, this gives the following result, which is not what I want:
Ham_Weight =
10 10 9 9 9 5 5 7
I would be very grateful if you could help me please.

You are summing over the wrong dimension!
sum(dec2bin(vet_dec) == '1',2).'
ans =
3 4 5 6 1 3 3 6 4 4 3 3 6 5 1 7
dec2bin(vet_dec) creates a matrix like this:
11010000
00001111
11011001
11111100
10000000
00100011
00110010
11111100
11010001
01111000
01100001
10001100
11101011
11011100
00100000
11111011
As you can see, you're interested in the sum of each row, not each column. Use the second input argument to sum(x, 2), which specifies the dimension you want to sum along.
Note that this approach is horribly slow, as you can see from this question.
EDIT
For this to be a valid and meaningful MATLAB function, you must change your function definition slightly.
function ham_weight = hamming_weight(vector) % Return the variable ham_weight
ham_weight = sum(dec2bin(vector) == '1', 2).'; % Don't transpose if
% you want a column vector
end % endfunction is not a MATLAB command.
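As a cross-check of the row-versus-column point, here is a minimal sketch in Python rather than MATLAB (the helper name `hamming_weights` is just for illustration). It counts the 1 bits per element, then sums down the columns of the 8-bit strings to show where the unwanted result came from:

```python
def hamming_weights(values):
    """Return the number of 1 bits in each non-negative integer."""
    return [bin(v).count("1") for v in values]

vec = [208, 15, 217, 252, 128, 35, 50, 252,
       209, 120, 97, 140, 235, 220, 32, 251]

# One weight per element: the equivalent of summing along each row
weights = hamming_weights(vec)
print(weights)

# Summing down each bit position instead (the equivalent of sum() without
# a dimension argument) gives one value per column, not per element
bits = [format(v, "08b") for v in vec]
column_sums = [sum(row[i] == "1" for row in bits) for i in range(8)]
print(column_sums)
```

The first list matches the per-element weights shown above (3 4 5 6 1 ...), and the second matches the eight values from the question (10 10 9 9 9 5 5 7).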

How do I create a fitted value with a subset of regression coefficients in place of all coefficients?

I run a simple regression and find the fitted value like this:
sysuse auto, clear
reg price mpg c.mpg#foreign i.rep78 headroom trunk
predict fitted_price, xb
This gives me these coefficients:
-------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------+----------------------------------------------------------------
mpg | -306.1891 77.01548 -3.98 0.000 -460.243 -152.1352
|
foreign#c.mpg |
Foreign | 60.58403 37.24129 1.63 0.109 -13.90964 135.0777
|
rep78 |
2 | 999.7779 2150.269 0.46 0.644 -3301.4 5300.956
3 | 1200.741 2001.853 0.60 0.551 -2803.561 5205.043
4 | 1032.778 2070.513 0.50 0.620 -3108.864 5174.42
5 | 2081.128 2200.998 0.95 0.348 -2321.523 6483.779
|
headroom | -611.7201 502.3401 -1.22 0.228 -1616.55 393.1097
trunk | 134.4143 110.8262 1.21 0.230 -87.27118 356.0998
_cons | 10922.46 2803.271 3.90 0.000 5315.082 16529.84
-------------------------------------------------------------------------------
For purposes of a counterfactual (especially important in time series), I might want to find the fitted value using a subset of the coefficients from this regression. For example, I might want to find the fitted value using all the coefficients from this regression except the coefficient(s) from the interaction between mpg and foreign, i.e. c.mpg#foreign. (Note that this is different from simply running the regression again without the interaction, because that will yield different coefficients).
As of now, I do this:
sysuse auto, clear
reg price mpg c.mpg#foreign i.rep78 headroom trunk
matrix betas = e(b)
local names: colnames betas
foreach name of local names {
if strpos("`name'", "#") > 0 {
scalar define col_idx = colnumb(betas, "`name'")
matrix betas[1, col_idx] = 0
}
}
matrix score fitted_price_no_interact = betas
This isn't a robust solution because it relies on the naming convention of # in the column names of the coefficient matrix, and breaks down if I want to include one set of interactions but not another. I can code something like this for a specific regression, by manually specifying the names, but if I change the regression, I have to manually change the code.
Is there a more robust way to do this, e.g.
predict fitted_price, xb exclude(c.mpg#foreign trunk)
that will simplify this process for me?

Edit 2015-03-29: Use the original method on one subset of interactions, but retain others
A great advantage of your original method is that it can handle interactions of any complexity. The major defect is that it won't ignore interactions that you want to keep in the model. But if you use xi to create these, # won't appear in their names.
sysuse auto, clear
recode rep78 1 = 2 //combine small categories
xi, prefix("") i.rep78*mpg // mpg*i.rep78 won't work
des _I*
reg price mpg foreign c.mpg#foreign _I* headroom trunk
matrix betas = e(b)
local names: colnames betas
foreach name of local names {
if strpos("`name'", "#") > 0 {
scalar define col_idx = colnumb(betas, "`name'")
matrix betas[1, col_idx] = 0
}
}
matrix score fit_sans_mpgXforeign = betas
Edit 2015-03-28
The xi prefix wasn't needed, so, for example, this works in Stata 13.
sysuse auto, clear
gen intx = c.mpg#foreign
reg price mpg foreign i.rep78 headroom trunk intx
predict mhat
gen fitted_sans_interaction = mhat -_b[intx]*intx
Previous Answer
sysuse auto, clear
xi: gen intx = c.mpg#foreign
reg price mpg foreign i.rep78 headroom trunk intx
predict mhat
gen fitted_sans_interaction = mhat -_b[intx]*intx
or even
sysuse auto, clear
xi: gen intx = c.mpg#foreign
reg price c.mpg##foreign i.rep78 headroom trunk intx
predict mhat
gen fitted_sans_interaction = mhat -_b[intx]*intx
I've supplied the main effect of foreign which was omitted in your example.
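Both approaches in this thread (zeroing entries of the coefficient vector before scoring, and subtracting the coefficient times the variable from the full prediction) should give the same partial fitted value. The following is a minimal language-neutral sketch in Python. The function name, term names, coefficients, and data are all made up for illustration and are not taken from the auto dataset:

```python
def linear_prediction(row, coefs, exclude=()):
    """Sum coef * value over all terms, skipping any term named in `exclude`."""
    excluded = set(exclude)
    return sum(b * row[name] for name, b in coefs.items() if name not in excluded)

# Hypothetical coefficients and one hypothetical observation
coefs = {"_cons": 100.0, "mpg": -3.0, "mpg_x_foreign": 0.5, "trunk": 2.0}
row = {"_cons": 1.0, "mpg": 20.0, "mpg_x_foreign": 20.0, "trunk": 10.0}

# Full linear prediction, analogous to predict ..., xb
full = linear_prediction(row, coefs)

# Approach 1: drop the term, analogous to zeroing its coefficient and scoring
partial = linear_prediction(row, coefs, exclude=["mpg_x_foreign"])

# Approach 2: subtract the term's contribution from the full prediction
by_subtraction = full - coefs["mpg_x_foreign"] * row["mpg_x_foreign"]

print(full, partial, by_subtraction)
```

Excluding terms by explicit name, rather than by matching # in the column name, is what lets one interaction be dropped while another stays in.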