Difference between R-squared and Adjusted R-squared

In this tutorial, we will cover the difference between R-squared and adjusted R-squared. It includes a detailed theoretical and practical explanation of these two statistical metrics, with examples in R.

R-squared (R²) 

It measures the proportion of the variation in your dependent variable that is explained by all of the independent variables in the model. It assumes that every independent variable in the model helps to explain variation in the dependent variable. In reality, some independent variables (predictors) do not help to explain the dependent (target) variable. In other words, some variables do not contribute to predicting the target variable.

Mathematically, R-squared is calculated by dividing the sum of squares of residuals (SSres) by the total sum of squares (SStot) and subtracting the result from 1. Here SStot measures the total variation, SSreg (the regression sum of squares) measures the explained variation, and SSres measures the unexplained variation.

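For a least-squares model with an intercept, SSres + SSreg = SStot, so R² can equivalently be written as explained variation divided by total variation:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = \frac{SS_{reg}}{SS_{tot}}$$

$$SS_{tot} = \sum_i (y_i - \bar{y})^2, \qquad SS_{reg} = \sum_i (\hat{y}_i - \bar{y})^2, \qquad SS_{res} = \sum_i (y_i - \hat{y}_i)^2$$

Here yᵢ is an observed value, ŷᵢ the corresponding predicted value and ȳ the mean of the observed values.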
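R-squared is also called the coefficient of determination. For a least-squares model with an intercept, it lies between 0% and 100%. An R-squared of 100% means the model explains all the variation in the target variable, and a value of 0% means the model explains none of it. A higher R-squared indicates a closer fit to the observed data, but on its own it does not tell you whether every predictor in the model is worth keeping.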
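The decomposition SSres + SSreg = SStot and the R-squared formula can be checked numerically. The sketch below is a minimal illustration, assuming a least-squares model fitted with lm() on R's built-in mtcars data; the choice of wt and hp as predictors is ours, not part of the tutorial. It computes the three sums of squares by hand and compares both forms of R² with the value reported by summary().

```r
# Least-squares model with an intercept on the built-in mtcars data
# (predictors wt and hp are an illustrative choice, not from the tutorial)
fit <- lm(mpg ~ wt + hp, data = mtcars)

y    <- mtcars$mpg
yhat <- fitted(fit)

ss_tot <- sum((y - mean(y))^2)     # total variation
ss_reg <- sum((yhat - mean(y))^2)  # explained variation
ss_res <- sum((y - yhat)^2)        # unexplained variation

# With an intercept, SStot = SSreg + SSres (up to floating-point error)
print(c(ss_tot = ss_tot, ss_reg_plus_ss_res = ss_reg + ss_res))

r2_from_res <- 1 - ss_res / ss_tot  # 1 - unexplained / total
r2_from_reg <- ss_reg / ss_tot      # explained / total

# Both forms should match the value summary() reports
print(c(r2_from_res, r2_from_reg, summary(fit)$r.squared))
```

Without an intercept, or when ŷ does not come from a least-squares fit, the two forms can disagree, which is why the 1 − SSres/SStot form is the one used in the R script later in this tutorial.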

Adjusted R-Squared

It corrects R-squared for the number of independent variables in the model, so that it rewards only those variables that really help in explaining the dependent variable. It penalizes you for adding independent variables that do not help in predicting the dependent variable.

Adjusted R-squared can be calculated mathematically in terms of sums of squares. The only difference between the R-squared and adjusted R-squared equations is that adjusted R-squared divides each sum of squares by its degrees of freedom.

$$\bar{R}^2 = 1 - \frac{SS_{res}/df_e}{SS_{tot}/df_t}$$
In the above equation, dft is the degrees of freedom, n − 1, of the estimate of the population variance of the dependent variable, and dfe is the degrees of freedom, n − p − 1, of the estimate of the underlying population error variance. Here n is the number of observations and p is the number of independent variables.
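The degrees-of-freedom form can be computed directly from a fitted model. This is a sketch under the same assumptions as before (lm() on the built-in mtcars data with wt and hp as illustrative predictors); it divides each sum of squares by its degrees of freedom and compares the result with the adj.r.squared value from summary().

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

y <- mtcars$mpg
n <- length(y)               # number of observations
p <- length(coef(fit)) - 1   # number of predictors, excluding the intercept

ss_tot <- sum((y - mean(y))^2)
ss_res <- sum(residuals(fit)^2)

df_t <- n - 1                # degrees of freedom for the total variance
df_e <- n - p - 1            # degrees of freedom for the error variance

adj_r2 <- 1 - (ss_res / df_e) / (ss_tot / df_t)

# Should agree with the value reported by summary()
print(c(adj_r2, summary(fit)$adj.r.squared))
```

Note that df_e is the same quantity R stores as the model's residual degrees of freedom, available via df.residual(fit).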

Adjusted R-squared can also be calculated from the value of R-squared, the number of independent variables (predictors) p, and the total sample size n:

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$$
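The formula in terms of R-squared needs no model object at all. Below is a minimal sketch; adj_r_squared() is a hypothetical helper name, not a function from base R or any package. The first call reuses the numbers from the R script later in this tutorial, and the second shows how a weak model with several predictors produces a negative adjusted R-squared.

```r
# Hypothetical helper: adjusted R-squared from R-squared (r2),
# sample size (n) and number of predictors (p)
adj_r_squared <- function(r2, n, p) {
  if (n - p - 1 <= 0) stop("need more observations than p + 1")
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Values from the R script in this tutorial: R-squared 0.6410828, n = 10, p = 3
adj_r_squared(0.6410828, n = 10, p = 3)  # about 0.4616242

# Weak model: R-squared 0.05 with 3 predictors and 10 observations
adj_r_squared(0.05, n = 10, p = 3)       # 1 - 0.95 * 9 / 6, about -0.425
```

Because (n − 1)/(n − p − 1) is at least 1 whenever p ≥ 1, the penalty factor can only shrink R-squared, and it shrinks it more as p grows relative to n.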

Difference between R-square and Adjusted R-square
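  1. Every time you add an independent variable to a model, R-squared increases or stays the same, even if the variable is insignificant; it never decreases. Adjusted R-squared, by contrast, increases only when the new variable improves the model more than would be expected by chance. For a single variable added to a least-squares model, this happens exactly when the variable's t-statistic exceeds 1 in absolute value.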

  2. In the table below, adjusted R-squared is highest when two variables are included and declines when a third variable is added, whereas R-squared increases with the third variable. This suggests that the third variable does not improve the model enough to justify including it.
    [Table: R-Squared vs. Adjusted R-Squared]
  3. For a least-squares model with an intercept, R-squared can never be negative, whereas adjusted R-squared can be negative when R-squared is close to zero.
  4. Adjusted R-squared is always less than or equal to R-squared.
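Points 1 and 4 can be observed by adding a predictor that carries no information. The sketch below assumes the built-in mtcars data and adds a hypothetical column, noise, drawn from a standard normal distribution; the seed is arbitrary, so the exact numbers depend on it and are not shown here. Whatever the seed, R-squared of the larger model is never lower, and adjusted R-squared rises only if the noise variable's |t| happens to exceed 1.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
dat <- mtcars
dat$noise <- rnorm(nrow(dat))     # hypothetical predictor: pure random noise

small <- lm(mpg ~ wt + hp, data = dat)
big   <- lm(mpg ~ wt + hp + noise, data = dat)
s_small <- summary(small)
s_big   <- summary(big)

# t-statistic of the added noise variable
t_noise <- coef(s_big)["noise", "t value"]

print(c(r2_small  = s_small$r.squared,
        r2_big    = s_big$r.squared,      # never lower than r2_small
        adj_small = s_small$adj.r.squared,
        adj_big   = s_big$adj.r.squared,  # higher only if |t_noise| > 1
        t_noise   = t_noise))
```

Changing the seed changes whether adjusted R-squared happens to go up or down, but the two relationships noted in the comments always hold.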
Which is better?
Use adjusted R-squared to compare models with different numbers of independent variables, and when selecting the important predictors (independent variables) for a regression model.
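A simple way to apply this is to fit a set of nested candidate models and keep the one with the highest adjusted R-squared. The sketch below assumes the built-in mtcars data and an illustrative list of formulas of our own choosing; it is not an automatic variable-selection procedure, just a side-by-side comparison of the two metrics.

```r
# Nested candidate models with increasing numbers of predictors
# (an illustrative choice of formulas)
formulas <- list(
  mpg ~ wt,
  mpg ~ wt + hp,
  mpg ~ wt + hp + qsec,
  mpg ~ wt + hp + qsec + drat
)

# Fit each model and collect R-squared and adjusted R-squared
results <- do.call(rbind, lapply(formulas, function(f) {
  s <- summary(lm(f, data = mtcars))
  data.frame(model         = deparse(f),
             predictors    = length(all.vars(f)) - 1,
             r.squared     = s$r.squared,
             adj.r.squared = s$adj.r.squared)
}))
print(results)

# R-squared alone would always favour the largest nested model;
# adjusted R-squared picks the model that best balances fit and size
best <- results[which.max(results$adj.r.squared), ]
print(best)
```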

R Script : Calculate R-Squared and Adjusted R-Squared

Suppose you have the actual and predicted values of the dependent variable. In the script below, we create a small sample of these values.
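# Actual (y) and predicted (yhat) values of the dependent variable
y = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2)
yhat = c(21.5, 21.14, 26.1, 20.2, 17.5, 19.7, 14.9, 22.5, 25.1, 18)
# R-squared = 1 - SSres / SStot
R.squared = 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
print(R.squared)
# [1] 0.6410828
# Adjusted R-squared, assuming n = 10 observations and p = 3 predictors
n = 10
p = 3
adj.r.squared = 1 - (1 - R.squared) * ((n - 1) / (n - p - 1))
print(adj.r.squared)
# [1] 0.4616242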
In this case, the adjusted R-squared is 0.4616242, assuming we have 3 predictors and 10 observations.
