Goodness of Fit, Continued

After fitting a linear regression model to the data, you compute R-squared to judge how good the fit is. R-squared measures, on a scale from 0 to 1, how much of the variation in the dependent variable the model explains.

Let’s take the previous example.

The question is: How good is the red line (model) compared to the mean?

That gives you the R-squared.
R² = [Var(mean) – Var(line)] / Var(mean) = 1 – Var(line) / Var(mean)

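In the best case, there is no variation around the model line, so R-squared is 1. In the worst case, the variation around the line is as large as the variation around the mean, so R-squared is 0: the line does no better than simply predicting the mean.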
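The two extreme cases can be checked with a small sketch. The helper `r_squared` and the four-point data set below are made up for illustration and are not part of the original example.

```r
# Hypothetical helper (not from the original text): R-squared from the
# observed values and the values predicted by a model.
r_squared <- function(y, fitted) {
  v_mean <- sum((y - mean(y))^2)   # variation around the mean
  v_line <- sum((y - fitted)^2)    # variation around the model
  (v_mean - v_line) / v_mean
}

y <- c(2, 4, 6, 8)  # illustrative data only

# Best case: the model reproduces every point, so nothing is left over.
best <- r_squared(y, fitted = y)                         # 1

# Worst case: the "model" is just the mean, so nothing is explained.
worst <- r_squared(y, fitted = rep(mean(y), length(y)))  # 0
```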

The variation around the mean = sum of squares of differences between the mean and the actual data = 41.27269.

The variation around the line = sum of squares of differences between the line and the actual data = 13.7627.

Therefore, R² = (41.27269 – 13.7627) / 41.27269 = 0.6665, so the line accounts for about two-thirds of the variation in y.

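# Data from the example
Q1 <- data.frame(x = c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5), y = c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68))

# Variation around the mean: sum of squared differences between y and its mean
V_mean <- sum((Q1$y - mean(Q1$y))^2)

# Variation around the line: sum of squared differences between y and 3 + 0.5*x
V_line <- sum((Q1$y - (3 + 0.5 * Q1$x))^2)

# R-squared: share of the variation around the mean that the line removes
R_squared <- (V_mean - V_line) / V_mean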

Note that the term subtracted from Q1$y in V_line, 3 + 0.5*Q1$x, is the fitted line Y = 3.0 + 0.5 X evaluated at each x.
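As a cross-check, the same quantity can be obtained from R's built-in `lm()`. This is a minimal sketch, assuming the line in the example is the ordinary least-squares fit of y on x; the names `fit` and `r2_manual` are introduced here for illustration.

```r
# Same data as in the example above.
Q1 <- data.frame(
  x = c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5),
  y = c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68)
)

# Fit y = a + b*x by ordinary least squares.
fit <- lm(y ~ x, data = Q1)

coef(fit)                # intercept close to 3.0, slope close to 0.5
summary(fit)$r.squared   # about 0.667

# The manual formula applied to the fitted values gives the same number.
r2_manual <- 1 - sum((Q1$y - fitted(fit))^2) / sum((Q1$y - mean(Q1$y))^2)
```

Because the coefficients 3.0 and 0.5 in the text are rounded, the hand-computed value can differ from `summary(fit)$r.squared` in the later decimal places.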