What does a negative R2 score mean in regression?

I am getting a negative R2 score for support vector regression. But when I try to predict new results, it gives better performance than other algorithms. Does a negative R2 score not impact model performance?

The R2 score can be negative, as stated in the scikit-learn documentation. Despite the name, R2 is not the square of anything, so it can have a negative value without violating any rules of math. R2 is defined as 1 - SS_res/SS_tot, so it is negative exactly when the model's squared error exceeds that of a constant model that always predicts the mean of the target, i.e., when the chosen model does not follow the trend of the data.
It seems that your model may be giving better performance because of over-fitting.
Or maybe check that you are calling the r2_score function in the correct format, with the true values first, since swapping the arguments is a common mistake:
r2_score(y_true, y_pred)
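As a minimal sketch of how a negative score arises (the arrays are made up for illustration), a model that predicts worse than simply guessing the mean of y_true yields R2 < 0:

    from sklearn.metrics import r2_score

    y_true = [1.0, 2.0, 3.0, 4.0]  # mean is 2.5
    y_pred = [4.0, 3.0, 2.0, 1.0]  # anti-correlated with the truth

    # Squared error is 4x that of always predicting the mean, so R2 = -3.0
    print(r2_score(y_true, y_pred))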

Related

Understanding the relationship between the dispersion of residual distribution and R2

I am trying to make the point that when we use big data, traditional pricing variables such as credit score matter less. So I run two separate regressions, one where big data is used and one without. I expect the R2 to be smaller for the big data case because the traditional variables explain less of my outcome variable (say, your interest rate on a loan). But strangely I get an R2 that is larger, and the weird thing is, when I run the regressions and plot the residuals (via predict res, residuals), the big data regression has a higher standard deviation. How is this possible? Wouldn't a larger R2 lead to a lower standard deviation of the residuals? Am I missing something here?
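One way to see how this can happen: for a regression with an intercept, R2 = 1 - Var(residuals)/Var(y), so a larger R2 only forces a smaller residual standard deviation when Var(y) is held fixed. If the two regressions are fit on samples where the outcome varies by different amounts, both R2 and the residual standard deviation can be larger at the same time. A sketch with invented numbers:

    # Hypothetical variances, chosen only to illustrate the identity
    # R2 = 1 - var(residuals) / var(y)
    var_y_a, r2_a = 100.0, 0.50  # regression without big data
    var_y_b, r2_b = 400.0, 0.80  # regression with big data

    sd_res_a = (var_y_a * (1 - r2_a)) ** 0.5  # ~7.07
    sd_res_b = (var_y_b * (1 - r2_b)) ** 0.5  # ~8.94

    # Higher R2, yet a higher residual standard deviation, because
    # var(y) differs between the two samples
    print(sd_res_a, sd_res_b)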

XGBoost regression RMSE individual prediction

I have a simple regression problem with two independent variables and one dependent one. I tried linear regression from statsmodels and scikit-learn, but I get the best results (R2 and RMSE) with the XGBoost regressor.
On the new data set, RMSE is still in line with earlier results, but individual predictions are very different.
For example, the RMSE is 1000, and individual prediction errors vary from 20 to 3000. Thus, predictions are either almost perfectly accurate or deviate strongly in a few cases, and I don't know why that is.
My question is what is the cause of such variations in individual predictions?
When testing your model with new data, it's normal to get some of the predictions wrong. An RMSE of 1000 means that the square root of the average squared difference between the actual and predicted values is 1000; because errors are squared before averaging, a few large errors can dominate the metric. So you can have values that are predicted very well, as well as values that give a very large squared error. The reason for this could be overfitting. It could also be that the new data set contains data that is very different from the data the model was trained on. But since the RMSE is in line with earlier results, I understand that RMSE was around 1000 on the training set as well, so I don't necessarily see a problem with the test set. What I would do is go through the preprocessing steps for the training data and make sure they're done correctly:
- standardize the data and remove possible skewness
- check for collinearity between the independent variables (you only have two, so a simple correlation check like the sketch after this list is enough)
- check that the independent variables have acceptable variance; if a variable barely varies from one data point to the next, it may be useless for explaining the dependent variable
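A minimal version of that correlation check, with made-up stand-ins for the two independent variables:

    import numpy as np

    # Hypothetical stand-ins for the two independent variables
    x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    x2 = np.array([2.1, 3.9, 6.2, 8.1, 9.8])  # roughly 2 * x1

    # Pearson correlation; |corr| near 1 signals strong collinearity
    corr = np.corrcoef(x1, x2)[0, 1]
    print(f"correlation between x1 and x2: {corr:.3f}")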
BTW, what is the R2 score for your regression? It tells you how much of the variability of the target variable is explained by your model. A low R2 score would indicate that the regressors used aren't very useful in explaining your target variable.
You can use scikit-learn's StandardScaler to standardize the data.
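A short sketch of that step, assuming X_train and X_test are placeholder names for your feature matrices:

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    # Placeholder data: two independent variables, a handful of rows
    X_train = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 240.0]])
    X_test = np.array([[2.5, 210.0]])

    scaler = StandardScaler()
    X_train_std = scaler.fit_transform(X_train)  # fit on training data only
    X_test_std = scaler.transform(X_test)        # reuse the training statistics

Fitting the scaler on the training data only, and reusing its statistics for the test set, avoids leaking information from the test set into preprocessing.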

Is BatchNorm turned off when inferencing?

I have read several sources that implicitly suggest batchnorm is turned off for inference, but I have found no definite answer for this.
Most common is to use a moving average of the mean and std for your batch normalization, as done by Keras for example (https://github.com/keras-team/keras/blob/master/keras/layers/normalization.py). If you just turn it off, the network will perform worse on the same data, due to changes in how the images are processed.
This is done by storing a moving average of the mean and of the std over all the batches used while training the network. At inference time, these moving averages are used for normalization instead of the statistics of the current batch.
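A small sketch of this with the Keras BatchNormalization layer (TensorFlow 2.x assumed): the training flag selects between batch statistics and the stored moving averages, so nothing is literally "turned off":

    import numpy as np
    import tensorflow as tf

    bn = tf.keras.layers.BatchNormalization()
    x = np.random.randn(32, 10).astype("float32")

    # training=True: normalize with this batch's mean/std and
    # update the layer's moving averages
    y_train = bn(x, training=True)

    # training=False (inference): normalize with the stored moving
    # averages, independent of the current batch
    y_infer = bn(x, training=False)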

Rapidminer decision tree using cross validation

I am using the ten-fold cross-validation operator. I am using RapidMiner for the first time, so I have some confusion: will I get 10 decision trees as a result? I have read that the accuracy is the average of all results, so what is the final output? The average of all?
The aim of cross validation is to output a prediction about the performance a model will produce when presented with unseen data.
For the 10-fold case, the data is split into 10 partitions. There are 10 possible ways to take 9/10 of the data to make a training set, and these are used to build 10 models. Each model is applied to the remaining partition to produce a performance estimate. The 10 performances are averaged, and the end result is a reasonable estimate of the performance of a model on unseen data.
The remaining question is: what is the final model? The best answer is to use a model built on all the data and to assume it is close enough to the 10 models used to generate the average estimate.
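RapidMiner wires this up with operators, but the same idea expressed in scikit-learn (as an illustration of the concept, not RapidMiner code) makes the two outputs explicit: an averaged performance estimate, plus one final model trained on all the data:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # 10 folds: 10 trees are trained internally, each scored on its
    # held-out partition; only the 10 scores are kept
    scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10)
    print("estimated accuracy:", scores.mean())

    # The final model is a separate tree built on all of the data
    final_model = DecisionTreeClassifier().fit(X, y)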

Intel-MKL FFT performance for some conditions

I am currently using Intel's MKL 2D FFT routines.
I am running into a condition where the performance is changing by a factor of 4-5.
What I am doing is implementing a type of band-pass filter using FFT libraries. The results of the tests are correct, but the speed is an issue.
What I am seeing is about 1.3 sec on the forward FFT and between 1.3 and 6 seconds on the inverse FFT.
I have tracked this down to the weights I am applying after the forward pass of the FFT.
The weights are between 0 and -1, mostly 0 when I am getting the 6 seconds.
If I set the weights to 1 before applying them, the time is 1.3 seconds. Other tests show this kind of behavior without using weights of 1.
My question is: how can the values I am applying cause this kind of slowdown? I could understand a minor change in execution time, but not a change this dramatic.
Thanks,
Jim K
I don't know if this is specific to the MKL version of the FFT or a general issue.
Some CPUs may require many more execution cycles to do floating-point arithmetic when the operands are underflowed (subnormal/denormal) numbers or when the results underflow.
For your filter coefficients, you can try flooring the weights at values far, far larger than zero (relative to the IEEE double or float underflow threshold) and still have a filter with a better than -120 dB stopband. Try that.
Some CPU and OS combinations also allow disabling gradual underflow (e.g., flush-to-zero and denormals-are-zero modes). That may also help.
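A small sketch of the effect in Python with NumPy (timings vary widely by CPU and compiler flags, so the numbers are only illustrative): multiplying by subnormal weights can be far slower than multiplying by exact zeros or normal-range values, and flushing tiny weights to zero sidesteps the problem:

    import time
    import numpy as np

    spectrum = np.random.randn(5_000_000)

    def timed_multiply(weights):
        t0 = time.perf_counter()
        _ = spectrum * weights
        return time.perf_counter() - t0

    w_subnormal = np.full_like(spectrum, 1e-310)  # subnormal doubles
    w_zero = np.zeros_like(spectrum)              # exact zeros stay fast

    print("subnormal weights:", timed_multiply(w_subnormal))
    print("zero weights:     ", timed_multiply(w_zero))

    # Workaround: flush weights below a floor to exactly 0.0; a floor far
    # above the underflow threshold still leaves a very deep stopband
    w_fixed = np.where(np.abs(w_subnormal) < 1e-300, 0.0, w_subnormal)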