# AnalyticBridge

# Use PRESS, not R squared to judge predictive power of regression

R squared, also known as the coefficient of determination, is a popular measure of goodness of fit in regression. However, it says little about how well a regression model will predict future values: it is computed on the same data used to fit the model, and for least squares it never decreases when predictors are added. The PRESS statistic (predicted residual sum of squares) is a better measure of predictive power. It is computed during leave-one-out cross-validation by summing the squared prediction errors of the cases that are left out. As a reminder, in leave-one-out cross-validation, one case of the data set is used as the testing set and the remaining cases are used as the training set. We iterate this process until every case has served as the testing set.
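In symbols (standard textbook notation, not taken from the original post): let $y_i$ be the observed response, $\hat{y}_i$ the fitted value from the full model, $e_i = y_i - \hat{y}_i$ the ordinary residual, and $\hat{y}_{(i)}$ the prediction for case $i$ from the model refitted without case $i$. For ordinary least squares, $h_{ii}$ is the $i$-th diagonal element of the hat matrix $H = X(X^\top X)^{-1}X^\top$.

$$
R^2 = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\text{PRESS} = \sum_{i=1}^{n} \left( y_i - \hat{y}_{(i)} \right)^2 = \sum_{i=1}^{n} \left( \frac{e_i}{1 - h_{ii}} \right)^2
$$

R squared uses in-sample residuals $e_i$, while PRESS uses the out-of-sample errors $y_i - \hat{y}_{(i)}$. The last identity holds for least squares only; it means PRESS can be obtained from a single fit, and it shows why high-leverage cases (with $h_{ii}$ close to 1) can inflate PRESS sharply.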

Here is an example implemented in R, on the gala dataset in the faraway package:

> library(faraway)

> gala[1:3,]

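| Island | Species | Endemics | Area | Elevation | Nearest | Scruz | Adjacent |
|---|---:|---:|---:|---:|---:|---:|---:|
| Baltra | 58 | 23 | 25.09 | 346 | 0.6 | 0.6 | 1.84 |
| Bartolome | 31 | 21 | 1.24 | 109 | 0.6 | 26.3 | 572.33 |
| Caldwell | 3 | 3 | 0.21 | 114 | 2.8 | 58.7 | 0.78 |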

Model 1:

> model1 <- lm(Species ~ Endemics + Area + Elevation, data = gala)

> summary(model1)

....

Residual standard error: 27.29 on 26 degrees of freedom

Multiple R-squared: 0.9492,    Adjusted R-squared: 0.9433

F-statistic: 161.8 on 3 and 26 DF,  p-value: < 2.2e-16

Model 2:

> model2 <- lm(Species ~ I(Endemics^2), data = gala)

> summary(model2)

...

Residual standard error: 27.1 on 28 degrees of freedom

Multiple R-squared: 0.946,     Adjusted R-squared: 0.9441

F-statistic:   491 on 1 and 28 DF,  p-value: < 2.2e-16

Model 3:

> model3 <- lm(Species ~ Endemics + I(Endemics^2), data = gala)

> summary(model3)

.....

Residual standard error: 22.94 on 27 degrees of freedom

Multiple R-squared: 0.9627,    Adjusted R-squared: 0.9599

F-statistic: 348.5 on 2 and 27 DF,  p-value: < 2.2e-16

Here are the AIC (Akaike information criterion), BIC (Bayesian information criterion), and PRESS values of the three models. For all three criteria, lower is better:

Model 1:

> AIC(model1)

289.243

> BIC(model1)

296.249

PRESS(model1)=259520.5

Model 2:

> AIC(model2)

287.0325

> BIC(model2)

291.2361

PRESS(model2)=26382.22

Model 3:

> AIC(model3)

277.9558

> BIC(model3)

283.5606

PRESS(model3)=22567.03

As we can see, PRESS is roughly ten times smaller (better) for models 2 and 3 than for model 1, whereas R squared barely moves (0.9492, 0.946, 0.9627) and even ranks model 1 above model 2. According to PRESS, model 3 has the highest predictive power. It is interesting to note that AIC and BIC also reach their best (lowest) values for model 3.
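A comparison like the one above can be produced as a single table. The sketch below is an illustration under stated assumptions, not the author's code: `compare_models` and `press_hat` are hypothetical helper names, PRESS is computed with the least-squares shortcut (each residual divided by one minus its leverage), and the built-in `mtcars` data stand in for `gala` so that the block runs without the faraway package.

```r
# Hypothetical helpers (illustration only): tabulate R^2, adjusted R^2,
# AIC, BIC and PRESS side by side for a list of fitted lm models.

# PRESS via the least-squares identity: LOO error_i = e_i / (1 - h_ii)
press_hat <- function(fit) sum((residuals(fit) / (1 - hatvalues(fit)))^2)

compare_models <- function(fits) {
  data.frame(
    r2     = sapply(fits, function(f) summary(f)$r.squared),
    adj_r2 = sapply(fits, function(f) summary(f)$adj.r.squared),
    AIC    = sapply(fits, AIC),
    BIC    = sapply(fits, BIC),
    PRESS  = sapply(fits, press_hat),
    row.names = names(fits)
  )
}

# Illustrative candidate models on the built-in mtcars data (assumption:
# these formulas are arbitrary examples, not models from the post).
fits <- list(
  m1 = lm(mpg ~ wt, data = mtcars),
  m2 = lm(mpg ~ wt + I(wt^2), data = mtcars),
  m3 = lm(mpg ~ wt + hp + disp + drat + qsec, data = mtcars)
)
tab <- compare_models(fits)
print(tab[order(tab$PRESS), ])  # smallest PRESS (best predictor) first
```

With faraway loaded, the same helper could be called as `compare_models(list(model1 = model1, model2 = model2, model3 = model3))` on the models fitted above. Sorting by PRESS rather than by `r2` is the point of the table: the largest model always has the highest R squared among nested candidates, but not necessarily the lowest PRESS.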

If you are interested in how I computed the PRESS statistic with cross-validation in R, please check my next blog post.
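Independently of the author's own code, here is a minimal sketch of one way to compute PRESS in R, under the assumption that the data contain no missing values (so row i of the data frame is case i of the fit). The function names `press_loo` and `press_hat` are hypothetical. The first follows the leave-one-out procedure literally; the second uses the least-squares identity that the leave-one-out error for case i equals the ordinary residual divided by one minus its leverage. The built-in `mtcars` data are used so the block runs without the faraway package.

```r
# Minimal sketch (not the author's code): PRESS for a linear model, two ways.
# Assumes no missing values, so row i of `data` is case i of the fit.

# 1) Explicit leave-one-out cross-validation: refit without case i,
#    predict case i, and accumulate the squared prediction error.
press_loo <- function(formula, data) {
  y <- model.response(model.frame(formula, data))
  sq_err <- numeric(nrow(data))
  for (i in seq_len(nrow(data))) {
    fit_i  <- lm(formula, data = data[-i, , drop = FALSE])
    pred_i <- predict(fit_i, newdata = data[i, , drop = FALSE])
    sq_err[i] <- (y[i] - pred_i)^2
  }
  sum(sq_err)
}

# 2) Closed-form shortcut for ordinary least squares: no refitting needed.
#    hatvalues() returns the leverages h_ii of the full fit.
press_hat <- function(fit) {
  sum((residuals(fit) / (1 - hatvalues(fit)))^2)
}

fml    <- mpg ~ wt + I(wt^2)
fit    <- lm(fml, data = mtcars)
p_loop <- press_loo(fml, mtcars)
p_hat  <- press_hat(fit)
print(c(loop = p_loop, shortcut = p_hat))  # the two approaches agree
```

The explicit loop generalizes to models without a closed form (for example, nonlinear or penalized fits), while the shortcut is much faster for ordinary least squares. Because each leverage lies strictly between 0 and 1 when the model has an intercept, PRESS is always larger than the residual sum of squares, which is why it penalizes models that fit the training data well but predict held-out cases poorly.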

Comment by Sean Flanigan on May 13, 2013 at 6:27pm

This is an amazing post. Thanks so much. R-squared discussions tend to launch many bar fights.

Comment by Mirko Krivanek on May 13, 2013 at 11:27am

The ability to predict future performance, rather than goodness of fit on existing data, is a great advantage. This can be achieved using cross-validation, which your method does in a way, through the leave-one-out procedure. It would be nice to see a metric that simultaneously:

• is robust (R squared and PRESS fail)
• is insensitive to the number of observations (R squared fails, not sure about PRESS)
• has predictive power (R squared fails, PRESS wins)

Comment by Vincent Granville on May 13, 2013 at 11:03am

Vincent, to normalize R squared, use the Fisher transform and then apply a t-test to the results. This takes care of both the data variability and the data size. Outliers are a problem, but they will degrade the least-squares fit anyway, regardless of the criterion you use to judge the quality of your model. If you don't want to worry about them, use quantile regression.

Comment by Vincent Granville on May 12, 2013 at 2:41pm

Great reading for statisticians and data scientists. R^2 has many flaws: it is sensitive to outliers and to sample size. An R^2 of 0.65 does not have the same meaning for a data set with 20 observations as for one with 10,000 observations. How do you normalize this?

© 2015   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC