Standardized vs Unstandardized Regression Coefficient

In one of my predictive models, I found a variable whose unstandardized regression coefficient (aka beta or estimate) was close to zero (.0003), yet statistically significant (p-value < .05). If a variable is significant, its coefficient is significantly different from zero. So the question arises: "Why is the coefficient close to zero if the variable is significant?"

The answer lies in the difference between unstandardized coefficient and standardized coefficient.

If an independent variable is expressed in hundreds of thousands or millions of dollars (e.g., $656,765), its unstandardized estimate can be close to zero. To make the coefficient more interpretable, rescale the variable by dividing it by 1,000 or 100,000 (depending on its magnitude), then run the regression again with the transformed variable. The new coefficient will be larger than the old one by exactly the rescaling factor, while the t-statistic and p-value stay the same.
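As a quick sketch of the rescaling idea (on made-up data, not the model from the post), dividing a dollar-scale predictor by 1,000 multiplies its coefficient by exactly 1,000 while leaving the p-value unchanged:

```r
# Hypothetical data: a predictor measured in dollars
set.seed(42)
income <- runif(100, min = 50000, max = 900000)  # dollar scale
y <- 0.00001 * income + rnorm(100)               # tiny slope on the raw scale

fit_raw <- lm(y ~ income)

income_k   <- income / 1000                      # rescale to thousands of dollars
fit_scaled <- lm(y ~ income_k)

coef(fit_raw)["income"]        # close to zero
coef(fit_scaled)["income_k"]   # exactly 1000 times larger
```

The significance test is unaffected because rescaling the predictor rescales its standard error by the same factor.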

Key takeaway:
The unstandardized coefficient should not be used to drop or rank predictors (aka independent variables), because it does not eliminate the unit of measurement.
A standardized beta close to zero, however, is a genuine red flag.

Detailed Explanation

The concept of standardization, or standardized coefficients, comes into the picture when predictors (aka independent variables) are expressed in different units. Suppose you have three independent variables: age, height and weight. Age is expressed in years, height in cm and weight in kg. Ranking these predictors by their unstandardized coefficients would not be a fair comparison, as the variables are not measured in the same unit.

Real Use of Standardized Coefficients
They are mainly used to rank predictors (or independent/explanatory variables), as standardization eliminates the units of measurement of both the independent and dependent variables. We can rank independent variables by the absolute value of their standardized coefficients: the most important variable has the largest absolute standardized coefficient.


In the next section, we will discuss the interpretation of unstandardized and standardized coefficient in linear regression.

Linear Regression : Unstandardized Coefficient
It represents the amount by which the dependent variable changes when an independent variable changes by one unit, holding the other independent variables constant.
Linear Regression : Standardized Coefficient
The standardized coefficient is measured in standard deviation units. A beta value of 1.25 indicates that a one standard deviation change in the independent variable results in a 1.25 standard deviation increase in the dependent variable.
Calculation of Standardized Coefficient for Linear Regression

Standardize both the dependent and independent variables and use the standardized variables in the regression model to get standardized estimates. By 'standardize', I mean subtract the mean from each observation and divide by the standard deviation; the result is also called a z-score. This makes each variable have mean 0 and standard deviation 1.
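A minimal sketch of this approach in R, using the built-in scale() function to z-score every column and refitting the model (it assumes the MASS package, which ships with R, for the Boston data used later in this post):

```r
library(MASS)  # for the Boston data

# z-score every column (subtract the mean, divide by the sd), then refit
z <- as.data.frame(scale(Boston))
fit_z <- lm(medv ~ ., data = z)

round(coef(fit_z)[-1], 4)  # standardized coefficients; the intercept is ~0
```

The intercept of the standardized fit is (numerically) zero, since all variables now have mean 0.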

Another Approach
Standardized Coefficient for Linear Regression
The standardized coefficient is found by multiplying the unstandardized coefficient by the ratio of the standard deviation of the independent variable to that of the dependent variable: b_std = b × SD(x) / SD(y).
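The two approaches agree exactly, which can be checked on a toy fit (synthetic data, made up for illustration): the slope from a regression on z-scored variables equals the unstandardized slope times SD(x)/SD(y).

```r
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

b <- coef(lm(y ~ x))["x"]      # unstandardized slope
b_std <- b * sd(x) / sd(y)     # formula: multiply by SD(x) / SD(y)

b_z <- coef(lm(scale(y) ~ scale(x)))[2]  # slope from the z-scored fit
# b_std and b_z agree
```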
Interpretation in Logistic Regression

Logistic Regression : Unstandardized Coefficient
If X increases by one unit, the log-odds of Y increases by b units (the value of the unstandardized coefficient), given that the other variables in the model are held constant.
Logistic Regression : Standardized Coefficient
A standardized coefficient of 2.5 means that a one standard deviation increase in the independent variable is associated, on average, with a 2.5 standard deviation increase in the log-odds of the dependent variable.
Calculation of Standardized Coefficient for Logistic Regression
Multiply the unstandardized coefficient by the standard deviation of the predictor, then divide by the standard deviation of the standard logistic distribution, π/√3 (i.e., multiply by √3/π ≈ 0.5513). Because the latent dependent variable in logistic regression is unobserved, the logistic distribution's standard deviation is used in place of SD(y).

Calculate Standardized Coefficient for Linear Regression in R

Let's start building a linear regression model

In the program below, we use the Boston dataset from the MASS package. It contains housing values in suburbs of Boston.
> library(MASS)
> str(Boston)
'data.frame': 506 obs. of  14 variables:
 $ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
 $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
 $ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
 $ chas   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
 $ rm     : num  6.58 6.42 7.18 7 7.15 ...
 $ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
 $ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
 $ rad    : int  1 2 2 3 3 3 5 5 5 5 ...
 $ tax    : num  296 242 242 222 222 222 311 311 311 311 ...
 $ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
 $ black  : num  397 397 393 395 397 ...
 $ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
 $ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...

Data Description

crim – per capita crime rate by town.
zn – proportion of residential land zoned for lots over 25,000 sq. ft.
indus – proportion of non-retail business acres per town.
chas - Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
nox – nitrogen oxides concentration (parts per 10 million).
rm – average number of rooms per dwelling.
age – proportion of owner-occupied units built prior to 1940.
dis – weighted mean of distances to five Boston employment centers.
rad – index of accessibility to radial highways.
tax – full-value property-tax rate per $10,000.
ptratio – pupil-teacher ratio by town.
black - 1000(Bk – 0.63)^2, where Bk is the proportion of blacks by town.
lstat – lower status of the population (percent).
medv – median value of owner-occupied homes in $1000s.

Standardized Coefficient using QuantPsyc Package
library(QuantPsyc)
reg.model <- lm(medv ~ ., data = Boston)

#Standardised coefficients
> lm.beta(reg.model)
        crim           zn        indus         chas          nox           rm 
-0.101017076  0.117715201  0.015335200  0.074198832 -0.223848028  0.291056465 
         age          dis          rad          tax      ptratio        black 
 0.002118638 -0.337836347  0.289749053 -0.226031680 -0.224271231  0.092432232 
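Given standardized coefficients like those above, ranking predictors is just a sort on their absolute values. A sketch that recomputes them from the unstandardized fit using the b × SD(x)/SD(y) formula, so it does not depend on QuantPsyc:

```r
library(MASS)  # for the Boston data

fit <- lm(medv ~ ., data = Boston)
b <- coef(fit)[-1]  # drop the intercept

# standardized coefficients: b * SD(x) / SD(y)
std_b <- b * sapply(Boston[names(b)], sd) / sd(Boston$medv)

# rank predictors from most to least important
sort(abs(std_b), decreasing = TRUE)
```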

R Function : Standardized Coefficients in Linear Regression

We can compute standardized coefficient in R without using any package. See the function below-
stdz.coff <- function (regmodel) {
  b  <- summary(regmodel)$coef[-1, 1]          # unstandardized slopes
  sx <- sapply(regmodel$model[-1], sd)         # SDs of the predictors
  sy <- sapply(regmodel$model[1], sd)          # SD of the response
  beta <- b * sx / sy
  return(beta)
}

stdz.coff(reg.model)

Standardized Coefficient for Logistic Regression in R
#Sample data : Survived (a Yes/No factor) from the Titanic table plus a random predictor
set.seed(1)
Y <- data.frame(Titanic)["Survived"]
X <- runif(32)
mydata <- data.frame(X, Y)

#Logistic regression model
model <- glm(Survived~ X,family=binomial(link='logit'),data=mydata)

#R Function : Standardized Coefficients
stdz.coff <- function (regmodel) {
  b  <- summary(regmodel)$coef[-1, 1]          # unstandardized slopes
  sx <- sapply(regmodel$model[-1], sd)         # SDs of the predictors
  beta <- (3^(1/2)) / pi * sx * b              # divide by pi/sqrt(3), the logistic SD
  return(beta)
}

#Standardized Estimate
stdz.coff(model)

#Unstandardized Estimate
summary(model)$coefficients

In SAS, you can include STB option to get standardized estimates.
proc logistic data = training descending;
class rank (ref ='1');
model admit = gre gpa rank / stb;
run;


About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains like banking, Telecom, HR and Health Insurance.

While I love having friends who agree, I only learn from those who don't.

Let's Get Connected: Email | LinkedIn

4 Responses to "Standardized vs Unstandardized Regression Coefficient"
  1. You give a formula for standardizing independent and dependent variables. Can't the R scale() function be used to do the same thing?

  2. The higher the standardised coefficient the greater the significance?

  3. Very nice post. It is useful to see the use in R. Thanks for the post.

