Regression coefficients for centered predictor variables: One unit increase *as well as* decrease? - regression

I have a question on how to interpret coefficients in a regression analysis:
I'm doing a (logistic) regression analysis with a mean centered (continuous) predictor (where the 0 value has no meaning). With non-centered predictors, I would interpret the coefficient for the predictor as the change in the outcome variable for one unit increase in the predictor variable; adding one unit of the predictor variable is the only thing that makes sense, since the level for this variable is set to 0.
However, when I have centered predictors, can the coefficient for the predictor be interpreted as the change in the outcome for a unit increase and a unit decrease in the predictor, that is, in both directions away from the mean? -- Obviously, half of my data consists of observations that have lower than average values on the outcome variable, and I'm interested in having meaningful coefficients for these as well ... (I can't find any answer to this, either on the YouTube channels I use for statistics learning or in my (regrettably only 5) statistics books.)
(See the attached screenshots for an example: OLS regressions on the mtcars dataset in R, with mpg (miles per gallon) as outcome and wt (weight in 1000 lbs) as predictor, mean centered in the bottom screenshot.)
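The comparison in the screenshots can be sketched in a few lines. This is only an illustration under stated assumptions: the data are synthetic stand-ins (not the real mtcars values), the model is plain OLS fitted with NumPy, and the variable names are made up for the example. It suggests that centering shifts the intercept to the mean of the outcome but leaves the slope untouched, so the same coefficient describes a one-unit step above or below the mean, with the sign flipped for a decrease. For a logistic model the same reading would apply on the log-odds scale.

```python
import numpy as np

# Synthetic stand-in data (assumption: NOT the real mtcars values).
rng = np.random.default_rng(0)
wt = rng.uniform(1.5, 5.5, size=50)                      # weight in 1000 lbs
mpg = 37.0 - 5.0 * wt + rng.normal(scale=2.0, size=50)   # miles per gallon

# OLS fit on the raw predictor; np.polyfit returns [slope, intercept].
slope_raw, intercept_raw = np.polyfit(wt, mpg, 1)

# OLS fit on the mean-centered predictor.
wt_centered = wt - wt.mean()
slope_centered, intercept_centered = np.polyfit(wt_centered, mpg, 1)

# Predictions at the mean, one unit above, and one unit below.
at_mean = intercept_centered
one_above = intercept_centered + slope_centered * 1.0
one_below = intercept_centered + slope_centered * -1.0

print("slope (raw, centered):", slope_raw, slope_centered)
print("change for +1 unit:", one_above - at_mean)
print("change for -1 unit:", one_below - at_mean)
```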

Related

Determining the values of the filter matrices in a CNN

I am getting started with deep learning and have a basic question on CNNs.
I understand how gradients are adjusted using backpropagation according to a loss function.
But I thought the values of the convolving filter matrices (in CNNs) need to be determined by us.
I'm using Keras and this is how (from a tutorial) the convolution layer was defined:
from keras.models import Sequential
from keras.layers import Conv2D
classifier = Sequential()
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
32 filter matrices with dimensions 3x3 are used.
But how are the values of these 32 3x3 matrices determined?
It's not the gradients that are adjusted. The gradient calculated with the backpropagation algorithm is just the vector of partial derivatives of the loss with respect to each weight in the network, and these components are in turn used to adjust the network weights in order to minimize the loss.
Take a look at this introductory guide.
The weights in the convolution layer in your example will be initialized to random values (according to a specific method), and then tweaked during training, using the gradient at each iteration to adjust each individual weight. Same goes for weights in a fully connected layer, or any other layer with weights.
EDIT: I'm adding some more details about the answer above.
Let's say you have a neural network with a single layer, which has some weights W. Now, during the forward pass, you calculate your output yHat for your network, compare it with your expected output y for your training samples, and compute some cost C (for example, using the quadratic cost function).
Now, you're interested in making the network more accurate, i.e. you'd like to minimize C as much as possible. Imagine you want to find the minimum value for a simple function like f(x)=x^2. You can start at some random point (as you did with your network), then compute the slope of the function at that point (i.e. the derivative) and move downhill, against the slope, until you reach a minimum value (a local minimum at least).
With a neural network it's the same idea, with the difference that your inputs are fixed (the training samples), and you can see your cost function C as having n variables, where n is the number of weights in your network. To minimize C, you need the slope of the cost function C in each direction (i.e. with respect to each variable, each weight w), and that vector of partial derivatives is the gradient.
Once you have the gradient, the part where you "move a bit following the slope" is the weights update part, where you update each network weight according to its partial derivative (in general, you subtract some learning rate multiplied by the partial derivative with respect to that weight).
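The f(x)=x^2 picture can be written out as a few lines of plain Python. This is just a minimal sketch: the starting point, the learning rate and the number of steps are arbitrary choices for illustration, not values from any particular library.

```python
def f(x):
    return x ** 2

def grad_f(x):
    # Derivative of x^2 with respect to x.
    return 2 * x

x = 3.0              # arbitrary starting point (a network would start at random)
learning_rate = 0.1  # arbitrary step size chosen for the illustration

for step in range(100):
    # Move against the slope: subtract learning rate times the derivative.
    x -= learning_rate * grad_f(x)

print(x, f(x))  # both end up very close to 0, the minimum of f
```

In a network, x becomes the full vector of weights and grad_f becomes the gradient that backpropagation computes, but the update line has the same shape.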
A trained network is just a network whose weights have been adjusted over many iterations in such a way that the value of the cost function C over the training dataset is as small as possible.
This is the same for a convolutional layer too: you first initialize the weights at random (i.e. you place yourself at a random position on the plot of the cost function C), then compute the gradients, then "move downhill", i.e. you adjust each weight following the gradient in order to minimize C.
The only difference between a fully connected layer and a convolutional layer is how they calculate their outputs, and how the gradient is in turn computed, but the part where you update each weight with the gradient is the same for every weight in the network.
So, to answer your question, those filters in the convolutional kernels are initially random and are later adjusted with the backpropagation algorithm, as described above.
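To make that concrete, here is a small NumPy sketch (not Keras, and not how Keras implements it internally) in which a single 1-D filter of size 3 starts from random values and is adjusted by gradient descent. Everything here is an assumption made for the illustration: the input signal is random, the target is produced by a known edge-like filter [-1, 0, 1], the cost is the mean squared error, and the sliding operation is the cross-correlation that deep learning libraries usually call "convolution".

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy input signal and the filter we pretend is "the right answer".
x = rng.normal(size=200)
true_kernel = np.array([-1.0, 0.0, 1.0])

# Every length-3 window of x as one row, so the sliding filter
# becomes a single matrix product (shape: 198 x 3).
windows = np.stack([x[i:i + len(x) - 2] for i in range(3)], axis=1)
y = windows @ true_kernel            # target outputs

# The learned filter starts from random values.
w = rng.normal(size=3)
learning_rate = 0.1

for step in range(500):
    y_hat = windows @ w                              # forward pass
    grad = 2 * windows.T @ (y_hat - y) / len(y)      # d(MSE)/dw
    w -= learning_rate * grad                        # weight update

print(w)  # ends up very close to [-1, 0, 1]
```

Nobody typed the edge filter into w; it was recovered only because it lowers the cost. In a real CNN there is no single target filter, so different random starts can end with different, equally useful filters.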
Hope this helps!
Sergio0694 states, "The weights in the convolution layer in your example will be initialized to random values". So if they are random and, say, I want 10 filters, every execution of the algorithm could find different filters. Also, say I have the MNIST data set. Numbers are formed of edges and curves. Is it guaranteed that there will be an edge filter or a curve filter among the 10?
I mean, are the first 10 filters the most meaningful, most distinctive filters we can find?
best

Obtaining multiple output in regression using deep learning

Given a dataset of RGB images of hands and the 3D positions of the hand keypoints, I want to treat this as a regression problem in deep learning. In this case the input will be the RGB image, and the output should be the estimated 3D positions of the keypoints.
I have seen some info about regression, but most of it is about estimating one single value. Is it possible to estimate multiple values (or outputs) all at once?
For now I have referred to this code. This guy is trying to estimate the age of a person in the image.
The output vector from a neural net can represent anything, as long as you define the loss function well. Say you want to detect the (x,y,z) coordinates of 10 keypoints: then just have a 30-element output vector, say (x1,y1,z1,x2,y2,z2,...,x10,y10,z10), where xi,yi,zi denote the coordinates of the ith keypoint; basically you can use any order you feel comfortable with. Just be careful with your loss function. Say you want to calculate an RMSE loss: you would have to extract the triples correctly and then calculate the RMSE loss for each keypoint, or, if you are familiar with linear algebra, just reshape the output into a 3x10 matrix correctly, have your targets also as a 3x10 matrix, and then just use
loss = tf.sqrt(tf.reduce_mean(tf.math.squared_difference(Y1, Y2)))
But once you have settled on an output layout for your net, you will have to stick to it.
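The bookkeeping for the 30-element layout can be sketched with NumPy alone. This is an illustration, not the loss of any specific framework: the function names are made up, the numbers are toy values, and it assumes the (x1,y1,z1,...,x10,y10,z10) ordering, for which a row-major reshape to 10 rows by 3 columns puts one keypoint on each row.

```python
import numpy as np

NUM_KEYPOINTS = 10  # 10 keypoints x 3 coordinates = 30 outputs

def rmse(pred_flat, target_flat):
    # One RMSE over all 30 numbers; the layout does not matter here
    # as long as prediction and target use the same one.
    return float(np.sqrt(np.mean((pred_flat - target_flat) ** 2)))

def per_keypoint_distance(pred_flat, target_flat):
    # Recover the (x, y, z) triples: one row per keypoint.
    pred = pred_flat.reshape(NUM_KEYPOINTS, 3)
    target = target_flat.reshape(NUM_KEYPOINTS, 3)
    # Euclidean distance between predicted and true position per keypoint.
    return np.linalg.norm(pred - target, axis=1)

# Toy example: the prediction is perfect except for the second keypoint,
# which is displaced by (3, 4, 0).
target = np.arange(30, dtype=float)
pred = target.copy()
pred[3:6] += np.array([3.0, 4.0, 0.0])

overall = rmse(pred, target)
distances = per_keypoint_distance(pred, target)
print(overall)
print(distances)
```

If the ordering used for the targets ever differs from the one used for the network output, both functions still run without error but compare the wrong numbers, which is why the layout has to stay fixed.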

Make a prediction using Octave plsregress

I have a good (or at least a self-consistent) calibration set and have applied PCA and, recently, PLS regression to n.i.r. spectra of known mixtures of water and additive to predict the percentage of additive by volume. Thus far I have done self-calibration, and now I want to predict the concentration from an n.i.r. spectrum blindly. Octave returns XLOADINGS, YLOADINGS, XSCORES, YSCORES, COEFFICIENTS, and FITTED with the plsregress command. FITTED is the estimate of concentration. Octave uses the SIMPLS approach.
How do I use these returned variables to predict the concentration given a new sample's spectrum?
Scores are usually denoted by T and loadings by P and X=TP'+E where E is the residual. I am stuck.
Note that T and P are X scores and loadings, respectively. Unlike PCA, PLS has scores and loadings for Y as well (usually denoted U and Q).
While the documentation of plsregress is sketchy at best, the paper it refers to (Sijmen de Jong: SIMPLS: an alternative approach to partial least squares regression, Chemom Intell Lab Syst, 1993, 18, 251-263, DOI: 10.1016/0169-7439(93)85002-X) discusses prediction with equations (36) and (37), which give:
Yhat0 = X0 B
Note that this uses centered data X0 to predict centered y-values. B are the COEFFICIENTS.
I recommend that as a first step you predict your training spectra and make sure you get the correct results (FITTED).
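The centering bookkeeping behind Yhat0 = X0 B can be sketched in NumPy. This is not Octave and does not call plsregress: the coefficient matrix B below comes from ordinary least squares on centered, noise-free toy data, purely as a stand-in for COEFFICIENTS, and predict_centered is a hypothetical helper name. It assumes, as stated above, that B applies to centered data, so a new spectrum is centered with the training column means and the training mean of y is added back at the end.

```python
import numpy as np

def predict_centered(X_new, B, x_mean, y_mean):
    # Center new data with the TRAINING means, apply B, un-center y.
    return (X_new - x_mean) @ B + y_mean

rng = np.random.default_rng(0)

# Toy calibration set: 20 "spectra" with 4 channels, noise-free response.
X_train = rng.normal(size=(20, 4))
true_B = np.array([[1.0], [-2.0], [0.5], [3.0]])
Y_train = X_train @ true_B + 10.0

x_mean = X_train.mean(axis=0)
y_mean = Y_train.mean(axis=0)

# Stand-in for the PLS coefficients: least squares on the centered data.
B, *_ = np.linalg.lstsq(X_train - x_mean, Y_train - y_mean, rcond=None)

# First check: predicting the training spectra should reproduce the fit.
fitted = predict_centered(X_train, B, x_mean, y_mean)

# Then predict a new, unseen spectrum.
X_new = rng.normal(size=(1, 4))
y_new = predict_centered(X_new, B, x_mean, y_mean)
print(y_new)
```

If a particular plsregress version returns coefficients that already include an intercept row, the centering step would not be needed, so it is worth confirming against FITTED as suggested above.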

Loss function for ordinal target on SoftMax over Logistic Regression

I am using Pylearn2 or Caffe to build a deep network. My target is ordinal (ordered categories). I am trying to find a proper loss function but cannot find any in Pylearn2 or Caffe.
I read the paper "Loss Functions for Preference Levels: Regression with Discrete Ordered Labels". I get the general idea, but I am not sure I understand what the thresholds will be if my final layer is a SoftMax over Logistic Regression (outputting probabilities).
Can someone help me by pointing to an implementation of such a loss function?
Thanks
Regards
For both Pylearn2 and Caffe, your labels will need to be 0-4 instead of 1-5; it's just the way they work. The output layer will be 5 units, each essentially a logistic unit, and the softmax can be thought of as an adaptor that normalizes the final outputs. But "softmax" is commonly used as an output type. When training, the value of any individual unit is rarely ever exactly 0.0 or 1.0; it's always a distribution across your units, on which log-loss can be calculated. This loss is used to compare against the "perfect" case, and the error is back-propagated to update your network weights. Note that a raw output from Pylearn2 or Caffe is not a specific digit 0, 1, 2, 3, or 4; it's 5 numbers, each associated with the likelihood of one of the 5 classes. When classifying, one just takes the class with the highest value as the 'winner'.
I'll try to give an example...
Say I have a 3-class problem and I train a network with a 3-unit softmax.
The first unit represents the first class, the second the second, and the third the third.
Say I feed a test case through and get...
0.25, 0.5, 0.25. Here 0.5 is the highest, so a classifier would say "2". This is the softmax output; it makes sure the sum of the output units is one.
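The same 3-class example in NumPy, as a minimal sketch. The raw scores fed into the softmax are made-up values chosen so that the output comes out as 0.25, 0.5, 0.25; the softmax function is written by hand rather than taken from Pylearn2 or Caffe.

```python
import numpy as np

def softmax(scores):
    # Subtract the maximum first for numerical stability.
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Made-up raw scores of the 3 output units for one test case.
raw_scores = np.array([0.0, np.log(2.0), 0.0])

probs = softmax(raw_scores)          # approximately [0.25, 0.5, 0.25]
winner_index = int(np.argmax(probs)) # 0-based index of the largest value
winner_class = winner_index + 1      # 1-based class label, "2" here

print(probs, probs.sum(), winner_class)
```

The 0-based index versus 1-based class label is the same 0-4 versus 1-5 shift mentioned above.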
You should have a look at ordinal (logistic) regression. This is the formal solution to the problem setup you describe (do not use plain regression, as the distance measures of errors are wrong).
https://stats.stackexchange.com/questions/140061/how-to-set-up-neural-network-to-output-ordinal-data
In particular, I recommend looking at the CORAL ordinal regression implementation at
https://github.com/ck37/coral-ordinal/issues.
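One common way to set up ordinal targets for a network is to turn a label k out of K ordered classes into K-1 binary questions of the form "is the label greater than 0?", "greater than 1?", and so on. The sketch below shows only that encoding and the matching decoding in NumPy. It is not code from the repository linked above, the function names are hypothetical, and the probabilities in the example are made up.

```python
import numpy as np

def encode_ordinal(label, num_classes):
    # Label k (0-based) becomes k ones followed by zeros,
    # one entry per threshold "label > j" for j = 0..num_classes-2.
    return (np.arange(num_classes - 1) < label).astype(float)

def decode_ordinal(probs, threshold=0.5):
    # Predicted label = number of thresholds the network believes are passed.
    return int(np.sum(np.asarray(probs) > threshold))

# 5 ordered classes with labels 0..4.
encoded = encode_ordinal(2, 5)                   # [1, 1, 0, 0]
decoded = decode_ordinal([0.9, 0.8, 0.3, 0.1])   # 2

print(encoded, decoded)
```

With this encoding the output layer has K-1 sigmoid units trained with a binary cross-entropy on each, so predicting 4 when the truth is 0 is penalized on more units than predicting 1, which is the ordering information a plain softmax ignores.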

Stata: comparing coefficients from different regressions (different dependent variables)

I'm running OLS fixed-effects regressions and would like to test whether a coefficient is the same across two of them. One of the regressions has a different dependent variable than the other.
How can I do this?
Specifically, one of my regressions is:
xtreg black MAshock i.year, cluster(fips)
The other regression is:
xtreg white MAshock i.year, cluster(fips)
You should run one single regression (with the two outcomes stacked into one dependent variable) and interact everything with a black dummy variable. The coefficient on that interaction term will test whether the coefficients of interest are the same or not.
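The mechanics of that stacked regression can be sketched with NumPy. This is an illustration on synthetic data, not Stata and not the poster's panel: it uses plain OLS without fixed effects, year dummies or clustered standard errors, and all variable names and numbers are made up. It shows that, in the fully interacted stacked model, the coefficient on the interaction equals the difference between the two separately estimated slopes; a test of that coefficient against zero is then the test of equality.

```python
import numpy as np

def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 500
shock = rng.normal(size=n)                                   # stand-in for MAshock
white = 1.0 + 0.5 * shock + rng.normal(scale=0.1, size=n)    # outcome 1
black = 2.0 + 0.8 * shock + rng.normal(scale=0.1, size=n)    # outcome 2

# Two separate regressions: columns are [intercept, shock].
X_single = np.column_stack([np.ones(n), shock])
b_white = ols(X_single, white)
b_black = ols(X_single, black)

# Stack both outcomes into one long dependent variable.
y = np.concatenate([white, black])
is_black = np.concatenate([np.zeros(n), np.ones(n)])   # dummy for the black rows
shock_long = np.concatenate([shock, shock])

# Fully interacted model: intercept, shock, dummy, dummy x shock.
X_stacked = np.column_stack(
    [np.ones(2 * n), shock_long, is_black, is_black * shock_long]
)
b_stacked = ols(X_stacked, y)
interaction = b_stacked[3]

print(interaction, b_black[1] - b_white[1])
```

In the real application the standard error of the interaction term is what matters, and it would need to account for the same units appearing twice in the stacked data, for example by clustering as in the original commands.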