Predicted R-Squared

Predicted R-squared is built on cross-validation. This post examines the leave-one-out cross-validation (LOOCV) method. First, here is the data set used in the past exercises.

Historians Rank   Approval High
2                 84
6                 87
8                 79
9                 83
12                79
29                67
24                71
26                75
10                68
22                89
18                73
38                90
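The code in this post refers to a data frame called Presi_Data that is never constructed on the page. A minimal sketch of how it could be built is shown here. The column names Historians.rank and Approval.High are the ones the model formula uses; reading each row as rank followed by approval rating is an assumption based on the value ranges.

```r
# Rebuild the 12-row data set used in this post.
# Assumption: each row reads as Historians rank, then Approval High.
Presi_Data <- data.frame(
  Historians.rank = c(2, 6, 8, 9, 12, 29, 24, 26, 10, 22, 18, 38),
  Approval.High   = c(84, 87, 79, 83, 79, 67, 71, 75, 68, 89, 73, 90)
)

# Quick look at the structure: 12 observations of 2 numeric variables
str(Presi_Data)
```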

LOOCV then proceeds as follows:

  • The first row is removed from the list, and the regression model is fitted to the other 11 observations (rows 2:12)
  • The model is used to predict y for observation 1 by plugging that observation's x value into the fitted cubic formula
  • The predicted y is subtracted from the actual y for observation 1, and the difference is squared (this is the squared residual)
  • Observation 1 is returned to the list, and observation 2 is removed (leaving rows 1, 3:12)
  • The process continues until the squared residual for the last observation has been collected
  • All the squared residuals are summed to get what is known as PRESS (predicted residual error sum of squares)
  • Predicted $R^2$ = 1 – (PRESS/TSS)
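Written as formulas, the steps above amount to the following. The notation is introduced here for illustration: $\hat{y}_{(i)}$ stands for the prediction of observation $i$ from the model fitted without it, $\bar{y}$ for the mean response, and $n = 12$ for this data set.

```latex
\mathrm{PRESS} = \sum_{i=1}^{n} \left( y_i - \hat{y}_{(i)} \right)^2,
\qquad
\mathrm{TSS} = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2,
\qquad
R^2_{\mathrm{pred}} = 1 - \frac{\mathrm{PRESS}}{\mathrm{TSS}}
```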
In R, the loop looks like this:

# Accumulator for the squared leave-one-out residuals
res_sq <- 0
for (i in seq_len(nrow(Presi_Data))) {
  # Fit the cubic model without observation i
  new_presi <- Presi_Data[-i, ]
  model1 <- lm(Historians.rank ~ Approval.High + I(Approval.High^2) + I(Approval.High^3), data = new_presi)

  # Residual for the held-out observation: actual y minus predicted y
  res <- Presi_Data$Historians.rank[i] - unname(predict(model1, newdata = Presi_Data[i, ]))

  res_sq <- res_sq + res^2
}
res_sq

The final value of res_sq is PRESS.

TSS (or SST) is the total sum of squares, the sum of squared deviations of the response (y) from its mean: $\mathrm{TSS} = \sum_i (y_i - \bar{y})^2$

# Total sum of squares, then predicted R-squared
tss <- sum((Presi_Data$Historians.rank - mean(Presi_Data$Historians.rank))^2)
predict_r_sq <- 1 - (res_sq / tss)
predict_r_sq
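As a cross-check on the loop, PRESS for an ordinary least-squares model can also be obtained from a single fit: the leave-one-out residual equals the ordinary residual divided by one minus the observation's leverage. The sketch below is not part of the original exercise. The helper name predicted_r_squared is hypothetical, and the usage example runs on R's built-in cars data rather than on the data set from this post.

```r
# Predicted R-squared from one fitted lm object, without refitting.
# Relies on the identity: LOO residual = residual / (1 - leverage),
# which holds for unweighted least-squares fits.
predicted_r_squared <- function(fit) {
  y <- model.response(model.frame(fit))                    # observed response
  press <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)  # PRESS
  tss <- sum((y - mean(y))^2)                              # total sum of squares
  1 - press / tss
}

# Usage on the built-in cars data (an illustrative stand-in)
fit <- lm(dist ~ speed + I(speed^2), data = cars)
predicted_r_squared(fit)
```

Applied to the cubic model fitted on all 12 rows of Presi_Data, this should agree with the looped result up to floating-point error.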