I have a data set with the following variables: ID of an individual, current year, year of graduation, degree, income, and a 0/1 variable to indicate treatment. The income is in the same year as the variable year.
What I want is to regress current income on treatment separately for every combination of year, year of graduation, and degree.
That means running many separate regressions, each of which gives me its own coefficient.
I have no idea how to do this. I would normally just use:
reg income treatment
But this will not give me multiple coefficients.
Try something like this:
sysuse auto
statsby, by(foreign rep78): regress mpg weight
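As a rough sketch of how the results could be collected, the example below asks statsby for coefficients and standard errors and replaces the data in memory with one row per group. The variable names in the final comment (grad_year, degree, income, treatment) are assumptions based on the question, not names confirmed by the asker.

```stata
* Runnable sketch on the shipped auto data
sysuse auto, clear

* One regression per (foreign, rep78) cell; -clear- replaces the data
* in memory with the collected results
statsby _b _se, by(foreign rep78) clear: regress mpg weight

* One row per group: slope on weight and its standard error
list foreign rep78 _b_weight _se_weight, sepby(foreign)

* With the question's data this might look like (hypothetical names):
* statsby _b _se, by(year grad_year degree) clear: regress income treatment
```

Cells with very few observations may come back with missing standard errors, so it is worth checking group sizes first.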
I want to run a regression of profits against time and see whether there is a change in the last 6 months. To do so, I want to fit the regression on the data before July and then get predictions for the whole year, to assess whether they differ from the actual values.
I use:
areg profits date if date<td(30jun2021), a(company) vce(cluster date)
predict profits_reg
However, the predictions are only generated for the first 6 months. I don't get predictions for the days from July 1st onward; those cells are empty (.).
How can I ensure that I get predictions for all data in my data set?
What is the Stata code for adding region fixed effects to an ordinary least squares regression? My dependent variable is the sales volume of a product, and the independent variable is a dummy: 1 for a red pamphlet, 0 for a blue pamphlet, distributed to a sample of people across five districts. I want to include region fixed effects in the model. I tried generating dummy variables for the five regions and adding the dummies to the model.
Is this approach correct? If not, which one is?
reg pamph sale income plotsize region1 region2 region3 region4 region5
There are a number of ways to control for group fixed effects.
The simplest (IMO) in your situation is to use a factor variable.
For example:
webuse nlswork
reg ln_w grade age i.ind_code
In your case this would look like:
reg pamph sale income plotsize i.region
This assumes that region is a single variable holding a unique id for each region.
Other options are areg (see help areg) or reghdfe (a community-contributed command; install it with ssc install reghdfe):
areg ln_w grade age, absorb(ind_code)
reghdfe ln_w grade age, absorb(ind_code)
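To illustrate that the factor-variable and absorb approaches estimate the same slopes, here is a small check on the same example data. It is only a sketch: it compares the coefficient on grade from regress with i.ind_code against the one from areg, and the scalar name b_reg is made up for the example.

```stata
webuse nlswork, clear

* Fixed effects entered as explicit indicator variables
regress ln_wage grade age i.ind_code
scalar b_reg = _b[grade]

* Same fixed effects absorbed instead of estimated one by one
areg ln_wage grade age, absorb(ind_code)

* The slope on grade should agree up to numerical precision
display "regress: " b_reg "   areg: " _b[grade]
assert reldif(b_reg, _b[grade]) < 1e-6
```

The main practical difference is the output: areg does not print the group coefficients, which is convenient when there are many groups.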
I am using SPSS to fit a mixed-effects model for the following project:
The participant is being asked some open ended questions and their answers are recorded.
For example, if the participant's answer is related to equality, the variable "equality" is coded as "1". Otherwise, it is coded as "0". Therefore, dependent variable is the variable "equality".
Fixed effects:
- participant's country (Asians vs. Westerners)
- gender (Male vs Female)
- age group (younger age group vs. older age group)
- condition (control group vs. intervention group)
Random effect: Subject ID (participants)
Sample size: over 600 participants
My syntax in spss:
MIXED Equality BY Country Gender AgeGroup Condition
/CRITERIA=CIN(95) MXITER(100) MXSTEP(10) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
/FIXED= Country Gender AgeGroup Condition | SSTYPE(3)
/METHOD=ML
/PRINT=SOLUTION TESTCOV
/RANDOM=INTERCEPT | SUBJECT(SubID_R) COVTYPE(VC).
When running this analysis in spss, the following warning appears:
Iteration was terminated but convergence has not been achieved.
The MIXED procedure continues despite this warning. Subsequent results produced are based on the last iteration. Validity of the model fit is uncertain.
I tried increasing "MXSTEP" from 10 to 10000 in the syntax, but then another warning appears:
The final Hessian matrix is not positive definite although all
convergence criteria are satisfied.
The MIXED procedure continues despite this warning. Validity of subsequent results cannot be ascertained.
I also tried increasing "MXITER", but the warning remains. How can I deal with this problem and get rid of the warning?
Aside from what you've already tried, increasing the number of Fisher scoring steps (the SCORING value in /CRITERIA) can help in some cases. However, it may be that your random intercept variance is truly redundant, in which case you won't be able to resolve this problem with those data and that model.
Also, typically you would not use a linear model for a binary response variable, but would use something like a logistic model (this can be done in GENLINMIXED, under Analyze>Mixed Models>Generalized Linear in the menus).
I'm trying to create a regression that would include a polynomial (let's say 2nd order) of year on a certain interval of year (say 1 to 70) and a number of dummies for certain values of year (say for every year between 45 and 60).
If I didn't have the restriction for dummies, I believe the commands would be:
gen year2=year^2
regress y year year2 i.year if inrange(year,1,70)
I can't make the dummies manually (there will be more than 15 of them in the end). Could anybody help me, please?
If I then want to plot the estimated function without the dummies, why do these two bring different things?
twoway function _b[_cons] +_b[year]*x + _b[year2]*x^2, range(1 70)
twoway function _b[_cons] +_b[year]*year + _b[year2]*year^2, range(1 70)
The way I understood it, _b[_cons], _b[year] and _b[year2] retrieve the previously estimated coefficients for the corresponding independent variables, which are then multiplied by those variables. Why do I get different results, then, if x should be the same thing as year in this case?
I am not sure why Pearly is giving you such a hard time. I think this may be what you're looking for, but let me know if it is something different.
One thing to note: I am using a dataset that comes preloaded with Stata, which is usually a nice way to make an MVCE, as Nick was saying in your other post.
clear
sysuse gnp96
/* variables: gnp, date (quarterly) */
gen year = year(dofq(date)) // get yearly variable
gen year2=year^2 // get the square of the yearly variable
tab year if inrange(year,1970,1975), gen(yr) // generate dummy variables
// the dummy variables generated have missing values for years not
// in the specified range, so we're going to fill those in with 0
foreach v of varlist yr* {
    replace `v' = 0 if `v' == .
}
// here's your regression
regress gnp year year2 yr* if inrange(year,1967,1990)
Here yr* refers to your dummy variables; the * is a wildcard that matches every variable named like yr[something].
The inrange() in the tab command sets the range for the dummy variables, and the inrange() in the regress command sets the range for the year variables.
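An alternative sketch, in case it is useful: building the dummies in a forvalues loop gives 0/1 variables directly, so there are no missing values to fill in afterwards. The d prefix is just a name chosen for this example, and the year window is the same illustrative one used above.

```stata
clear
sysuse gnp96
gen year  = year(dofq(date))    // calendar year from the quarterly date
gen year2 = year^2

* One 0/1 dummy per year in the window; 0 (not missing) everywhere else
forvalues y = 1970/1975 {
    gen byte d`y' = (year == `y')
}

* d1970-d1975 selects the dummies in the order they were created
regress gnp96 year year2 d1970-d1975 if inrange(year, 1967, 1990)
```

Changing the bounds in the forvalues line is all that is needed to cover a longer window, such as years 45 to 60.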
As to your question on using x vs. year: I am only hypothesizing, but I think that when you use x, Stata isn't looking at your variables at all and just evaluates the function along a continuous x axis. Your year variable, by contrast, is discrete (a bunch of integers), so the result looks more like a step function. More information can be found with the command help twoway function.
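For what it's worth, twoway function is documented as plotting y = f(x), so a safe pattern is to write the expression only in terms of x. The sketch below uses the same example data and first copies the estimates into local macros (b0, b1 and b2 are names made up here). That step is optional but makes it explicit that nothing in the plotted expression refers to a dataset variable.

```stata
clear
sysuse gnp96
gen year  = year(dofq(date))
gen year2 = year^2
regress gnp96 year year2

* Freeze the estimates as numbers
local b0 = _b[_cons]
local b1 = _b[year]
local b2 = _b[year2]

* The fitted quadratic as a function of x over the sample years
twoway function y = `b0' + `b1'*x + `b2'*x^2, range(1967 2002)
```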
I am trying to estimate the price elasticity of electricity demand for time series data. In order to handle the endogeneity of price, I am running a panel IV regression.
Households were randomly assigned to one of four treatment groups or a control group, and the assignment places the household in a certain pricing program, so treatment status should be highly correlated with the endogenous variable (log_price). Because assignment was random, treatment status should not be correlated with the error term in the log_usage equation. I also include weather variables and a vector of household variables when not using fixed effects/xtivreg2. Treatment is a dummy variable for any treatment group vs. control. I have tried two regressions, but both give me errors:
(1) xtivreg:
xtset ID datetime
xtivreg log_usage weather (log_price = i.treatment) household_vars
// Error: Matsize too small
--> Here I am told that my matsize is too small, although I have set it to 800. I have a very large data set so maybe this is impossible?
(2) xtivreg2
xtset ID datetime
xtivreg2 log_usage weather (log_price = i.treatment), fe
// Error: Factor variables not allowed
(3) I would actually prefer to somehow differentiate between the different treatment groups (1-4) so I created dummies for each of the groups. The control group is the omitted base group:
xtset ID datetime
xtivreg log_usage weather (log_price = i.group1 i.group2 i.group3 i.group4)
// Error: matsize too small
How do I run xtivreg or xtivreg2 with a binary variable as an instrument (or several as instruments)? And how do I distinguish between any form of treatment vs. control and the individual treatment groups (1-4)?