How to ask Stata (or any other statistical package) to estimate the R^2 of a user defined linear model?

Essentially, my task is to take a scatter plot that, under ideal conditions, would have a regression line with a particular slope - say .5 - and come up with some metric of how far off it is from this slope.
My original plan was to compute the scatter plot's actual regression line and compare that model's coefficient to my "ideal" slope. However, I've realized that this method is very sensitive to outliers, in the sense that a single outlier can completely flip the sign of the coefficient.
My thought, therefore, was to ask Stata to compute the R^2 of a model with a slope of .5 -- but I don't know how to do this. Is it possible, in Stata or another package?

In R, with your slope of .5, you could calculate rsquared as
rsquared <- (.5 * (sd(x) / sd(y)))^2
Here's a simple example where I fit a small model and then use this calculation, so the R^2 from both approaches can be compared:
x <- c(3, 4, 5, 7, 10)
y <- c(5, 8, 9, 11, 18)
# Fit a simple linear model and extract its slope
yfit <- lm(y ~ x)
slope <- yfit$coefficients[2]
slope
# R^2 reported by the fit
rsquaredfit <- summary(yfit)$r.squared
rsquaredfit
# R^2 from the formula, using the slope from the fit
rsquared <- (slope * (sd(x) / sd(y)))^2
rsquared

Related

Standard deviation for mean absolute error (MAE) for gaussian plot

I am looking for a formula to calculate the standard deviation of the MAE. I am working on a regression problem.
Let's say I have y and predicted y as below: y = [1, 2, 3, 4, 5] and predicted y = [1.2, 2.5, 3.9, 4.8, 6.2]. My MAE will be 0.72, based on the usual MAE formula.
I want to plot a Gaussian distribution graph for the MAE, for which I need the mean (the MAE) and its standard deviation, similar to the image below. However, I am not sure how to calculate the standard deviation of the MAE. Kindly comment if more information is needed. Thank you.
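As a quick check of the quoted value, the MAE of those two vectors can be reproduced with a few lines of NumPy (a minimal sketch that only reproduces the MAE, not the standard deviation being asked about):
import numpy as np
y = np.array([1, 2, 3, 4, 5], dtype=float)
y_pred = np.array([1.2, 2.5, 3.9, 4.8, 6.2])
abs_err = np.abs(y - y_pred)   # per-sample absolute errors: 0.2, 0.5, 0.9, 0.8, 1.2
mae = abs_err.mean()           # mean absolute error
print(mae)                     # 0.72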

Convolutional Neural Network for Peak Detection and time difference of a bouncing ball

For fun, I tried to create a neural network that can detect the time difference (t2 - t1) of two consecutive bounces of a ball within 1.5 seconds (disregarding the third bounce). The idea is that if you have the time difference of the first two bounces, you can calculate the initial rebound height through a physics formula.
The input to the CNN is a spectrogram image as shown below. The output is a single neuron, which outputs the time difference between the first and second bounce (t2, the second bounce, minus t1, the first bounce). Overall there are 1000 samples in this CNN.
The first two bounces can have the same time difference but occur at different times. For example, one sample might be t2 - t1 = 0.810 - 0.530 = 0.280 and another sample might be 0.980 - 0.700 = 0.280. This is clear in example 1 and example 2.
Example 1 of Spectrogram
Example 2 of Spectrogram
Here is the full code (it isn't much):
https://www.codepile.net/pile/Al51wXl6
Here's the network structure:
cnn = tf.keras.models.Sequential()
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=5, activation='relu', input_shape=[1025, 65, 1]))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu'))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
cnn.add(tf.keras.layers.Flatten())
cnn.add(tf.keras.layers.Dense(units=128, activation='relu'))
cnn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
cnn.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
cnn.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=30)
The output was far off the accuracy I was hoping for:
Mean absolute error: ~0.3
So my question is: am I misunderstanding CNNs, or why can't my CNN perform this task?
Most critical mistake
Choice of loss function and output unit
You have a regression task (predicting the continuous variable time difference), but your loss function is binary_crossentropy, which is for classification. You should use something like "mean_squared_error" instead.
The output neuron's non-linearity is sigmoid, which is for classification (or other outputs that should saturate between 0.0 and 1.0). I recommend using linear instead.
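A minimal sketch of those two changes applied to the model in the question (everything before the last Dense layer stays the same; the mae metric is an addition, chosen because the question reports mean absolute error):
# ... earlier Conv2D / MaxPool2D / Flatten / Dense(128) layers unchanged ...
cnn.add(tf.keras.layers.Dense(units=1, activation='linear'))  # linear output for a real-valued target
cnn.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])
cnn.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=30)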

Why are my Keras Conv2D kernels 3-dimensional?

In a typical CNN, a conv layer will have Y filters of size NxM, and thus it has N x M x Y trainable parameters (not including bias).
Accordingly, in the following simple keras model, I expect the second conv layer to have 16 kernels of size (7x7), and thus kernel weights of size (7x7x16). Why then are its weights actually size (7x7x8x16)?
I understand the mechanics of what is happening: the Conv2D layers are actually doing a 3D convolution, treating the output maps of the previous layer as channels. It has 16 3D kernels of size (7x7x8). What I don't understand is:
why is this Keras's default behavior?
how do I get a "traditional" convolutional layer without dropping down into the low-level API (avoiding that is my reason for using Keras in the first place)?
from keras.models import Sequential
from keras.layers import InputLayer, Conv2D

model = Sequential([
    InputLayer((101, 101, 1)),
    Conv2D(8, (11, 11)),
    Conv2D(16, (7, 7))
])
model.weights
Q1: "and thus kernel weights of size (7x7x16). Why then are its weights actually size (7x7x8x16)?"
No, the kernel weights are not of size (7x7x16).
From cs231n:
Example 2. Suppose an input volume had size [16x16x20]. Then using an example receptive field size of 3x3, every neuron in the Conv Layer would now have a total of 3*3*20 = 180 connections to the input volume. Notice that, again, the connectivity is local in space (e.g. 3x3), but full along the input depth (20).
Note the word "every".
In your model, 7x7 is the spatial size of a single filter, and each filter connects to all 8 output maps of the previous conv layer, so a single filter has 7x7x8 parameters; with 16 such filters, the total is 7x7x8x16.
Q2: "Why is this Keras's default behavior?"
See Q1.
In the typical jargon, when someone refers to a conv layer with N kernels of size (x, y), it is implied that the kernels actually have size (x, y, z), where z is the depth of the input volume to that layer.
Imagine what happens when the input image to the network has R, G, and B channels: each of the initial kernels itself has 3 channels. Subsequent layers are the same, treating the input volume as a multi-channel image, where the channels are now maps of some other feature.
The motion of that 3D kernel as it "sweeps" across the input is only 2D, so it is still referred to as a 2D convolution, and the output of that convolution is a 2D feature map.
Edit:
I found a good quote about this in a recent paper, https://arxiv.org/pdf/1809.02601v1.pdf
"In a convolutional layer, the input feature map X is a W1 × H1 × D1 cube, with W1, H1 and D1 indicating its width, height and depth (also referred to as the number of channels), respectively. The output feature map, similarly, is a cube Z with W2 × H2 × D2 entries. The convolution Z = f(X) is parameterized by D2 convolutional kernels, each of which is a S × S × D1 cube."

Tune input features using backprop in keras

I am trying to implement discriminant condition codes in Keras as proposed in
Xue, Shaofei, et al., "Fast adaptation of deep neural network based
on discriminant codes for speech recognition."
The main idea is that you encode each condition as an input parameter and let the network learn the dependency between the condition and the feature-label mapping. On a new dataset, instead of adapting the entire network, you just tune these weights using backprop. For example, say my network looks like this:
X ---->|----|
       |DNN |----> Y
Z ---->|----|
X: features, Y: labels, Z: condition codes
Now, given a pretrained DNN and X', Y' from a new dataset, I am trying to estimate the Z' that minimizes the prediction error on Y' using backprop. The math seems straightforward, except I am not sure how to implement this in Keras without having access to the backprop itself.
For instance, can I add an Input() layer with trainable=True while all other layers are set to trainable=False? Can backprop in Keras update more than just layer weights? Or is there a way to hack Keras layers to do this?
Any suggestions welcome.
thanks
I figured out how to do this (exactly) in Keras by looking at fchollet's post here.
Using the Keras backend I was able to compute the gradient of my loss w.r.t. Z directly and used it to drive the update.
Code below:
import keras.backend as K
import numpy as np

model.summary()  # pretrained model; X, Y, Z are nodes in its graph (see below)

# Loss of the pretrained model and its (normalized) gradient w.r.t. the condition codes Z
loss = K.categorical_crossentropy(Y, Y_out)
grads = K.gradients(loss, Z)[0]
grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)
iterate = K.function([X, Z], [loss, grads])

# Gradient descent on Z only
step = 0.1
Z_adapt = Z_in.copy()
for i in range(100):
    loss_val, grads_val = iterate([X_in, Z_adapt])
    Z_adapt -= grads_val * step
    print("iter:", i, np.mean(loss_val))

print("Before:")
print(model.evaluate([X_in, Z_in], Y_out))
print("After:")
print(model.evaluate([X_in, Z_adapt], Y_out))
X, Y, Z are nodes in the model graph. Z_in is an initial value for Z'; I set it to an average value from the training set. Z_adapt is the result after 100 iterations of gradient descent and should give you a better result.
Assume that the size of Z is m x n. Then you can first define an input layer of size (m*n) x 1, whose input will be a vector of ones. You can define a dense layer containing m*n neurons and set trainable = True for it. The response of this layer gives you a flattened version of Z. Reshape it appropriately and feed it into the rest of the network, which can be appended after this layer; see the sketch below.
Keep in mind that if the size of Z is too large, the network may not be able to learn a dense layer with that many neurons. In that case, you may need to put additional constraints on it or look into convolutional layers. However, convolutional layers will put some constraints on Z.
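A minimal sketch of that construction, assuming a TensorFlow-era Keras; the sizes m, n, num_features and num_classes are illustrative placeholders, and the frozen dense stack only stands in for the pretrained network from the question:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

m, n = 4, 8          # illustrative size of Z
num_features = 20    # illustrative feature dimension
num_classes = 10     # illustrative number of labels

# A constant all-ones input; the learnable condition codes live in the weights
# of the trainable dense layer applied to it.
ones_in = layers.Input(shape=(m * n,), name="ones")
z_flat = layers.Dense(m * n, use_bias=False, trainable=True, name="z_codes")(ones_in)
# A Reshape((m, n)) layer could follow here if the pretrained network expects a 2-D code.

# Stand-in for the pretrained network: all of its layers are frozen.
x_in = layers.Input(shape=(num_features,), name="features")
h = layers.Concatenate()([x_in, z_flat])
h = layers.Dense(64, activation="relu", trainable=False)(h)
y_out = layers.Dense(num_classes, activation="softmax", trainable=False)(h)

model = tf.keras.Model([x_in, ones_in], y_out)
model.compile(optimizer="adam", loss="categorical_crossentropy")

# Fitting on the new dataset then updates only the "z_codes" weights:
# model.fit([X_new, np.ones((len(X_new), m * n))], Y_new, epochs=...)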

Kalman-filter with 100 data samples containing noise

If I have a series of observations, say 100 samples of x and y:
Is this enough to predict the 101st y corresponding to an x value? Can I use some part of these 100 samples to update some values (considering that noise exists and some data might be corrupt)?
Stack Overflow is directed at coding - so if you have code that you expect to work, and it doesn't, you should post it with your question.
A Kalman filter can help with the problem you describe if you have a model for the dependence of y on x. So, for example, if your model is
y = a * x + b + Gaussian noise, then the Kalman filter is one way to estimate a and b, which then allow you to predict the 101st y from the 101st x.
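Here is a minimal sketch of that idea in Python, assuming the y = a * x + b + Gaussian-noise model above: the filter state is [a, b], updated one observation at a time, and the simulated data and the noise variances R and Q are illustrative guesses rather than values from the question:
import numpy as np

rng = np.random.default_rng(0)

# Simulated data in place of the 100 observations: y = a*x + b + noise
true_a, true_b = 0.8, 2.0
x = rng.uniform(0, 10, size=100)
y = true_a * x + true_b + rng.normal(scale=0.5, size=100)

# Kalman filter with constant state [a, b] and observation y_k = [x_k, 1] @ state
state = np.zeros(2)        # initial guess for [a, b]
P = np.eye(2) * 100.0      # large initial uncertainty
R = 0.5 ** 2               # measurement noise variance (illustrative)
Q = np.eye(2) * 1e-6       # tiny process noise so the filter can keep adapting

for x_k, y_k in zip(x, y):
    P = P + Q                              # predict: the state itself is assumed constant
    H = np.array([x_k, 1.0])               # observation model for this sample
    S = H @ P @ H + R                      # innovation variance
    K = P @ H / S                          # Kalman gain
    state = state + K * (y_k - H @ state)  # update the estimate of [a, b]
    P = P - np.outer(K, H) @ P             # update the uncertainty

x_new = 5.0  # the "101st" x
print("estimated a, b:", state)
print("predicted y:", state[0] * x_new + state[1])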