Performing multiple simple linear regressions in SPSS

I am studying the effect of a dichotomous variable on the variability of a continuous dependent variable. The dependent variable is measured repeatedly, once a month for six months. For every ID, I regress the dependent variable on time to obtain the intercept, slope, and residual standard deviation. Is there a method in SPSS to run the regression for all IDs at once instead of doing the analysis for each ID separately? After a search, I found a method, but not one for repeatedly measured dependent variables. I also tried a linear mixed-effects model, but the slopes and intercepts differ from those obtained from the separate simple linear regressions.
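In SPSS itself, sorting by ID and running SPLIT FILE BY ID before the REGRESSION command repeats the analysis for each ID. For comparison, one way to collect all the per-ID fits in a single pass is sketched below in R (a minimal sketch; dat, ID, month, and y are assumed names):

    # Fit a separate simple linear regression of y on month for each ID
    fits <- lapply(split(dat, dat$ID), function(d) lm(y ~ month, data = d))

    # Collect intercept, slope, and residual SD per ID into one table
    results <- data.frame(
      ID        = names(fits),
      intercept = sapply(fits, function(f) coef(f)[1]),
      slope     = sapply(fits, function(f) coef(f)[2]),
      resid_sd  = sapply(fits, function(f) summary(f)$sigma)
    )

The mixed-model discrepancy is expected, by the way: random-effect intercepts and slopes are shrunk toward the overall mean (partial pooling), so they will generally not match the separate per-ID least-squares estimates.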

Related

Using OLS regression on binary outcome variable

I have previously been told that -- for reasons that make complete sense -- one shouldn't run OLS regressions when the outcome variable is binary (i.e. yes/no, true/false, win/loss, etc). However, I often read papers in economics/other social sciences in which researchers run OLS regressions on binary variables and interpret the coefficients just like they would for a continuous outcome variable. A few questions about this:
Why do they not run a logistic regression? Is there any disadvantage or limitation to using logit models? In economics, for example, I very often see papers using OLS regression for binary variables rather than logit. Can logit only be used in certain situations?
In general, when can one run an OLS regression on ordinal data? If I have a variable that captures "number of times in a week survey respondent does X", can I - in any circumstance - use it as a dependent variable in a linear regression? I often see this being done in literature as well, even though we're always told in introductory statistics/econometrics that outcome variables in an OLS regression should be continuous.
Applying OLS to a binary outcome is called the linear probability model (LPM). Compared to a logistic model, the LPM has advantages in implementation and interpretation that make it an appealing option for researchers conducting impact analysis. In the LPM, the parameters are mean marginal effects, while in logistic regression the parameters are log odds ratios. To obtain mean marginal effects from a logistic regression, we need to calculate the derivative of the predicted probability for every data point and then average those derivatives. Since logistic regression and the LPM usually yield essentially the same expected average impact estimate [1], researchers often prefer the LPM for estimating treatment impacts.
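To make the comparison concrete, here is a minimal sketch in R on simulated data (illustrative only): the LPM slope is read off directly, while the logistic average marginal effect comes from averaging the derivative of the predicted probability over the sample.

    set.seed(1)
    n <- 1000
    x <- rnorm(n)
    y <- rbinom(n, 1, plogis(-0.5 + 0.8 * x))  # simulated binary outcome

    lpm <- lm(y ~ x)              # linear probability model
    coef(lpm)["x"]                # the slope is the mean marginal effect directly

    logit <- glm(y ~ x, family = binomial)
    b <- coef(logit)              # log odds ratios
    mean(dlogis(b[1] + b[2] * x) * b[2])  # average marginal effect; typically close to the LPM slope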
In general, yes, we can definitely apply OLS to an ordinal outcome. As in the binary case, applying OLS to a binary or ordinal outcome results in violations of the assumptions of OLS. However, many econometricians take the view that the practical effect of violating these assumptions is minor, and that the simplicity of interpreting an OLS fit outweighs the technical correctness of an ordered logit or probit model, especially when the ordinal outcome looks quasi-normal.
Reference:
[1] Deke, J. (2014). Using the linear probability model to estimate impacts on binary outcomes in randomized controlled trials. Mathematica Policy Research.

Is there a way to not select a reference category for logistic regression in SPSS?

When doing logistic regression in SPSS, is there a way to remove the reference category in the independent variables so they're all compared against each other equally rather than against the reference category?
When you have a categorical predictor variable, the most fundamental way to encode it for modeling, sometimes referred to as the canonical representation, is to use a 0-1 indicator for each level of the predictor, where each case takes on a value of 1 for the indicator corresponding to its category, and 0 for all the other indicators. The multinomial logistic regression procedure in SPSS (NOMREG) uses this parameterization.
If you run NOMREG with a single categorical predictor with k levels, the design matrix is built with an intercept column and the k indicator variables, unless you suppress the intercept. If the intercept remains in the model, the last indicator will be redundant, linearly dependent on the intercept and the first k-1 indicators. Another way to say this is that the design matrix is of deficient rank, since any of the columns can be predicted given the other k columns.
The same redundancy will be true of any additional categorical predictors entered as main effects (only k-1 of k indicators can be nonredundant). If you add interactions among categorical predictors, an indicator for each combination of levels of the two predictors is generated, but more than one of these will also be redundant given the intercept and main effects preceding the interaction(s).
The fundamental or canonical representation of the model is thus overparameterized, meaning it has more parameters than can be uniquely estimated. There are two approaches commonly used to deal with this. The one used in NOMREG and most other recent regression-type modeling procedures in SPSS is to use a generalized inverse of the cross-product of the design matrix, which has the effect of aliasing the parameters associated with redundant columns to 0. You'll see these parameters represented by 0 values with no standard errors or other statistics in the SPSS output.
The other way used in SPSS to handle the overparameterized nature of the basic model is to reparameterize the design matrix to full rank, which involves creating k-1 coded variables instead of k indicators for each main effect, and creating interaction variables from these. This is the approach taken in LOGISTIC REGRESSION.
Note that the overall model fit and predicted values from a logistic regression (or other form of linear or generalized linear model) will be the same regardless of what choices are made about parameterization, as long as the appropriate total number of unique columns is in the design matrix. Particular parameter estimates are of course highly dependent upon the particular parameterization used, but you can derive the results from any of the valid approaches using the results from any other valid approach.
If there are k levels in a categorical predictor, there are k-1 degrees of freedom for comparing those k groups, meaning that once you've made k-1 linearly independent or nonredundant comparisons, any others can be derived from those.
So the short answer is no, you can't do what you're talking about, but you don't need to, because the results for any valid parameterization will allow you to derive those for any other one.
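To illustrate that last point, here is a small sketch in R with made-up data: under reference coding, the comparison between two non-reference levels is just the difference of their coefficients, which you can verify by refitting with a different reference level.

    set.seed(2)
    g <- factor(sample(c("A", "B", "C"), 300, replace = TRUE))
    y <- rbinom(300, 1, c(A = 0.3, B = 0.5, C = 0.6)[as.character(g)])

    fit <- glm(y ~ g, family = binomial)   # "A" is the reference level
    coef(fit)["gC"] - coef(fit)["gB"]      # derived log odds ratio of C vs B

    fit2 <- glm(y ~ relevel(g, ref = "B"), family = binomial)
    coef(fit2)                             # the C coefficient matches the difference above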

When to choose zero-inflated Poisson models over traditional Poisson regression? When does the proportion of zeros start being problematic?

I am learning generalised linear models on my own and while reading about regression models for count outcomes I found a recommendation to use zero-inflated Poisson or Negative Binomial regressions when facing an "excessive" or "considerable" amount of zero values in the outcome variable. However, I have been having a lot of difficulty trying to find a reference explicitly stating what proportion of zeros should be considered "excessive" to the point of warranting the use of zero-inflated models.
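There may not be a single published cutoff, but one common informal diagnostic is to compare the observed share of zeros with the share a fitted Poisson model implies. A minimal sketch in R (dat, y, and x are assumed names):

    fit <- glm(y ~ x, family = poisson, data = dat)
    mu  <- fitted(fit)
    c(observed  = mean(dat$y == 0),
      predicted = mean(exp(-mu)))   # P(Y = 0) under Poisson with mean mu is exp(-mu)

If the observed share is much larger than the predicted one, that is the usual signal to consider a zero-inflated model (e.g., pscl::zeroinfl) or a negative binomial alternative.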

How to perform multiple logistic regression for a continuous dependent variable with values between 0 and 1?

I thought that logistic regression should only be used for true binary variables (0 or 1), but now one reviewer of my paper is asking me to perform multiple logistic regression for a dependent variable that is relative abundance (i.e. proportion data). I saw that logistic regression can also be used when there is an upper boundary to the value of the dependent variable. However, when I try to perform logistic regression in R with my data, I get the following warning message:
In eval(family$initialize): non-integer #successes in a binomial glm!
My dependent variable is the abundance-weighted number of species (the cumulative abundance of all species that belong to a cluster, or group, in a certain plot) divided by the total abundance in the plot (considering all functional groups), so it varies between 0 and 1. I have seen that some methods to apply logistic regression require two columns as the dependent variable, one with the resulting abundance and the other with 100 minus the abundance (or 1 minus the abundance), but I could not apply this given my limited R knowledge.
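For proportion outcomes like this, one commonly used workaround in R is a fractional logit via the quasibinomial family, which accepts non-integer responses in [0, 1] without that warning. A minimal sketch (prop, treatment, and dat are assumed names):

    # Fractional logit: quasibinomial tolerates non-integer proportions
    fit <- glm(prop ~ treatment, family = quasibinomial(link = "logit"), data = dat)
    summary(fit)

If the proportions are strictly between 0 and 1, beta regression (betareg::betareg) is another common choice.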

How does SPSS assign factor scores for cases where underlying variables were pairwise deleted?

Here's a simplified example of what I'm trying to figure out from a report. All analyses are being run in SPSS, which I don't have and don't use (my experience is with SAS and R).
They were running a regression to predict overall meal satisfaction from food type ordered, self-reported food flavor, and self-reported food texture.
But food flavor and texture are highly correlated, so they conducted a factor analysis, found food flavor and texture load on one factor, and used the factor scores in the regression.
However, about 40% of respondents don't have responses on self-reported food texture, so they used pairwise deletion while making the factors.
My question is when SPSS calculates the factor scores and outputs them as new variables in the data set, what does it do with people who had an input for a factor that was pairwise deleted?
How does it calculate (if it calculates it at all) factor scores for those people who had a response pairwise deleted during the creation of the factors and who therefore have missing data for one of the variables?
Factor scores are a linear combination of their scaled inputs. That is, given normalized variables X_1, ..., X_n and loadings L_1, ..., L_n, the score is

f = \sum_{i=1}^{n} L_i X_i
In your case n = 2. Now suppose one of the X_i is missing. How do you take a sum involving a missing value? Clearly, you cannot... unless you impute it.
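A tiny sketch in R with hypothetical loadings makes the point: any missing input propagates to the score, and mean-imputing a standardized variable simply makes its term drop out.

    L <- c(flavor = 0.8, texture = 0.7)     # hypothetical loadings
    x <- c(flavor = 0.5, texture = NA)      # standardized inputs; texture is missing
    sum(L * x)                              # NA: the score cannot be formed

    x_imp <- replace(x, is.na(x), 0)        # mean imputation on a standardized variable
    sum(L * x_imp)                          # the missing term drops out of the sum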
I don't know what the analyst who created your report did. I suggest you ask them.