
In this tutorial, we will cover the difference between R-squared and adjusted R-squared. It includes a detailed theoretical explanation of these two statistical metrics and shows how to calculate them in R.

**R-squared (R²)**

R-squared measures the proportion of the variation in your dependent variable that is explained by all of the independent variables in the model. It assumes that every independent variable in the model helps to explain variation in the dependent variable. In reality, some independent variables (predictors) do not help to explain the dependent (target) variable. In other words, some variables do not contribute to predicting the target variable.

Mathematically, R-squared is calculated by dividing the sum of squares of residuals (**SSres**) by the total sum of squares (**SStot**) and then subtracting the result from 1. SStot measures the total variation, **SSreg** measures the explained variation and SSres measures the unexplained variation. For a least-squares model with an intercept, **SSres + SSreg = SStot**, so **R² = Explained variation / Total variation**:

$$
R^2 = \frac{SS_{reg}}{SS_{tot}} = 1 - \frac{SS_{res}}{SS_{tot}}
$$

where $SS_{tot} = \sum_i (y_i - \bar{y})^2$, $SS_{reg} = \sum_i (\hat{y}_i - \bar{y})^2$ and $SS_{res} = \sum_i (y_i - \hat{y}_i)^2$.

R-squared is also called the **coefficient of determination**. It lies between **0%** and **100%**. An R-squared value of 100% means the model explains all the variation in the target variable, and a value of 0% means the model has no explanatory power. Other things being equal, a **higher R-squared value indicates a better-fitting model**.

**Adjusted R-Squared**

Adjusted R-squared measures the proportion of variation explained while accounting for the number of independent variables, so that only variables that genuinely help explain the dependent variable raise its value. It penalizes you for adding independent variables that do not help in predicting the dependent variable.

Adjusted R-squared can be expressed in terms of sums of squares. The only difference between the R-squared and adjusted R-squared equations is the degrees of freedom:

$$
\bar{R}^2 = 1 - \frac{SS_{res} / df_e}{SS_{tot} / df_t}
$$

In this equation, $df_t = n - 1$ is the degrees of freedom of the estimate of the population variance of the dependent variable, and $df_e = n - p - 1$ is the degrees of freedom of the estimate of the underlying population error variance, where $n$ is the sample size and $p$ is the number of independent variables.

Adjusted R-squared can also be calculated from the value of R-squared, the number of independent variables (predictors) $p$ and the total sample size $n$:

$$
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
$$
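To check that the equations above agree with each other, the sketch below fits an ordinary least-squares model and computes R-squared and adjusted R-squared by hand. The built-in `mtcars` dataset and the predictors `wt` and `hp` are illustrative assumptions, not part of the tutorial; `summary()` of an `lm` fit reports `r.squared` and `adj.r.squared`, which serve as a reference.

```r
# Illustrative model: mpg explained by wt and hp on the built-in mtcars data
fit = lm(mpg ~ wt + hp, data = mtcars)

y = mtcars$mpg
yhat = fitted(fit)
n = nrow(mtcars)               # sample size
p = length(coef(fit)) - 1      # number of predictors (excludes the intercept)

# Sums of squares
ss_tot = sum((y - mean(y))^2)     # total variation
ss_reg = sum((yhat - mean(y))^2)  # explained variation
ss_res = sum((y - yhat)^2)        # unexplained variation

# R-squared computed both ways
r2_explained = ss_reg / ss_tot
r2_residual = 1 - ss_res / ss_tot

# Adjusted R-squared from sums of squares and degrees of freedom
df_t = n - 1
df_e = n - p - 1
adj_ss = 1 - (ss_res / df_e) / (ss_tot / df_t)

# Adjusted R-squared from R-squared, n and p
adj_r2 = 1 - (1 - r2_residual) * (n - 1) / (n - p - 1)

# Compare against the values reported by summary()
s = summary(fit)
cat("R-squared:", r2_residual, " summary():", s$r.squared, "\n")
cat("Adjusted R-squared:", adj_r2, " summary():", s$adj.r.squared, "\n")
```

The identity SSres + SSreg = SStot only holds for a least-squares fit with an intercept, which is why the sketch uses `lm()` rather than arbitrary predictions.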

**Difference between R-square and Adjusted R-square**

- Every time you add an independent variable to a model, **R-squared increases** or stays the same, even if the independent variable is insignificant. It never declines. Whereas **Adjusted R-squared** increases only when the new variable improves the fit enough to offset the lost degree of freedom (for a single added variable, when the absolute value of its t-statistic is greater than 1).
- R-squared can never be negative for a least-squares model with an intercept, whereas adjusted R-squared can be negative when R-squared is close to zero.
- Adjusted R-squared is always less than or equal to R-squared.
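The properties in this list can be observed directly. The sketch below is a minimal illustration, assuming the built-in `mtcars` data and a hypothetical `noise` column of random numbers that has no relationship with `mpg`. R-squared cannot go down when `noise` is added; adjusted R-squared rises only if the noise column happens to have a t-statistic above 1 in absolute value. The last lines apply the adjusted R-squared formula to a small R-squared to show that the result can be negative.

```r
set.seed(42)  # make the random noise column reproducible

# Baseline model on the built-in mtcars data (illustrative choice)
base = lm(mpg ~ wt, data = mtcars)

# Add a hypothetical predictor that is pure random noise
mtcars_noise = mtcars
mtcars_noise$noise = rnorm(nrow(mtcars))
bigger = lm(mpg ~ wt + noise, data = mtcars_noise)

r2 = c(base = summary(base)$r.squared, bigger = summary(bigger)$r.squared)
adj = c(base = summary(base)$adj.r.squared, bigger = summary(bigger)$adj.r.squared)
print(rbind(r2, adj))

# Adjusted R-squared from R-squared, sample size n and number of predictors p
adj_from_r2 = function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# R-squared close to zero gives a negative adjusted R-squared
neg = adj_from_r2(0.05, n = 10, p = 3)  # = 1 - 0.95 * 9 / 6 = -0.425
print(neg)
```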

In the table below, adjusted R-squared is highest when two variables are included and declines when a third variable is added, whereas R-squared still increases with the third variable. This means the third variable is insignificant to the model.

Table: R-Squared vs. Adjusted R-Squared

**Which is better?**

Adjusted R-square should be used to compare models with different numbers of independent variables. It should also be used when selecting the important predictors (independent variables) for a regression model.
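A comparison like the one in the table above can be produced by adding predictors one at a time and recording both metrics. This is a sketch under assumptions: the `mtcars` data and the predictor order `wt`, `hp`, `qsec` are hypothetical choices, so the numbers will not match the table in this tutorial. The model with the highest adjusted R-squared is picked as the preferred candidate.

```r
# Hypothetical predictor order; each model adds the next predictor
predictors = c("wt", "hp", "qsec")

results = data.frame(
  n_predictors = seq_along(predictors),
  r.squared = NA_real_,
  adj.r.squared = NA_real_
)

for (k in seq_along(predictors)) {
  # Build the formula mpg ~ first k predictors
  f = reformulate(predictors[1:k], response = "mpg")
  s = summary(lm(f, data = mtcars))
  results$r.squared[k] = s$r.squared
  results$adj.r.squared[k] = s$adj.r.squared
}

print(results)

# Prefer the model with the highest adjusted R-squared
best = which.max(results$adj.r.squared)
cat("Highest adjusted R-squared with", best, "predictor(s)\n")
```

Because the models are nested, the `r.squared` column can only grow from row to row, while `adj.r.squared` may rise or fall.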

**R Script : Calculate R-Squared and Adjusted R-Squared**

Suppose you have actual and predicted values of the dependent variable. In the script below, we create a sample of these values and calculate R-squared.

```r
# Actual values of the dependent variable
y = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2)

# Predicted values of the dependent variable
yhat = c(21.5, 21.14, 26.1, 20.2, 17.5, 19.7, 14.9, 22.5, 25.1, 18)

# R-squared = 1 - SSres / SStot
R.squared = 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
print(R.squared)
```

**Final Result :** R-Squared = 0.6410828

In this case, the adjusted R-squared value is 0.4616242, assuming we have 3 predictors and 10 observations.

```r
n = 10  # number of observations
p = 3   # number of predictors

adj.r.squared = 1 - (1 - R.squared) * ((n - 1) / (n - p - 1))
print(adj.r.squared)
```
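If you need these metrics repeatedly, the two steps of the script can be wrapped in one function. The function name `r2_metrics` is a hypothetical helper, not part of base R; it takes `n` from the length of `y` instead of hard-coding it, and it stops when `n - p - 1` is not positive because the adjusted formula is undefined there.

```r
# Hypothetical helper: R-squared and adjusted R-squared from actual values,
# predicted values and the number of predictors p used by the model
r2_metrics = function(y, yhat, p) {
  n = length(y)                                   # number of observations
  if (n - p - 1 <= 0) stop("need more observations than p + 1")
  ss_res = sum((y - yhat)^2)                      # unexplained variation
  ss_tot = sum((y - mean(y))^2)                   # total variation
  r2 = 1 - ss_res / ss_tot
  adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
  c(r.squared = r2, adj.r.squared = adj)
}

y = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2)
yhat = c(21.5, 21.14, 26.1, 20.2, 17.5, 19.7, 14.9, 22.5, 25.1, 18)

m = r2_metrics(y, yhat, p = 3)
print(m)
```

With the sample values above, this reproduces the results of the script: 0.6410828 for R-squared and 0.4616242 for adjusted R-squared.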
