In one of my predictive models, I found a variable whose unstandardized regression coefficient (aka beta or estimate) was close to zero (.0003) but statistically significant (p-value < .05). If a variable is significant, its coefficient is significantly different from zero. The question arises: "Why is the coefficient value close to zero if the variable is significant?"

The answer lies in the **difference between unstandardized coefficient and standardized coefficient.**

If an independent variable is expressed in large units, such as hundreds of thousands or millions of dollars (e.g., $656,765), it can have an unstandardized estimate close to zero, because the coefficient measures the effect of a change of just one dollar. To make the coefficient more interpretable, we can rescale the variable by dividing it by 1,000 or 100,000 (depending on its magnitude). After rescaling the variable, run the regression analysis again with the transformed variable. The new coefficient equals the old one multiplied by the rescaling factor, so it is no longer close to zero, while its p-value stays exactly the same.
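To make the rescaling point concrete, here is a minimal sketch on simulated data. The variable names (`income`, `income_100k`) and the data-generating numbers are my own assumptions for illustration, not taken from the model discussed above.

```r
# Simulated (hypothetical) data: a predictor measured in raw dollars
set.seed(42)
n <- 200
income <- runif(n, 20000, 900000)
y <- 5 + 0.0003 * income + rnorm(n, sd = 20)

fit_raw <- lm(y ~ income)

# Rescale the predictor to units of $100,000 and refit
income_100k <- income / 100000
fit_scaled <- lm(y ~ income_100k)

coef_raw    <- coef(summary(fit_raw))["income", ]
coef_scaled <- coef(summary(fit_scaled))["income_100k", ]

# The estimate is multiplied by the rescaling factor (100,000) ...
print(coef_raw["Estimate"])
print(coef_scaled["Estimate"])

# ... while the t statistic and p-value do not change
print(coef_raw[c("t value", "Pr(>|t|)")])
print(coef_scaled[c("t value", "Pr(>|t|)")])
```

Only the unit of the coefficient changes; the fit of the model and the significance of the variable are the same in both regressions.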

**Key takeaway:**

Unstandardized coefficients should not be used to drop or rank predictors (aka independent variables), as they still carry the unit of measurement.

But if a standardized beta is close to zero, it's a **REAL PROBLEM.**

**Detailed Explanation**

The concept of standardization or standardized coefficients comes into the picture when predictors (aka independent variables) are expressed in different units. Suppose you have 3 independent variables: age, height and weight. The variable 'age' is expressed in years, height in cm, and weight in kg. If we rank these predictors based on their unstandardized coefficients, it would not be a fair comparison, as the units of these variables are not the same.

**Real Use of Standardized Coefficient**

They are mainly used to rank predictors (independent or explanatory variables), as they eliminate the units of measurement of the independent and dependent variables. We can rank independent variables by the absolute value of their standardized coefficients. The most important variable will have the largest absolute standardized coefficient.
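The ranking idea can be sketched as follows. I am using the built-in `mtcars` data and an arbitrary choice of predictors (`wt`, `hp`, `qsec`) purely for illustration; the point is that the ordering by unstandardized coefficients need not agree with the ordering by standardized coefficients.

```r
# Illustrative model on built-in data; the predictor choice is an assumption
vars <- c("mpg", "wt", "hp", "qsec")
fit_raw <- lm(mpg ~ wt + hp + qsec, data = mtcars[vars])

# Standardize every variable (mean 0, SD 1) and refit
z <- as.data.frame(scale(mtcars[vars]))
fit_std <- lm(mpg ~ wt + hp + qsec, data = z)

unstd <- coef(fit_raw)[-1]   # slopes in original units
std   <- coef(fit_std)[-1]   # standardized slopes

# Rank predictors by absolute coefficient, largest first
rank_unstd <- names(sort(abs(unstd), decreasing = TRUE))
rank_std   <- names(sort(abs(std), decreasing = TRUE))

print(rank_unstd)  # ordering driven partly by units of measurement
print(rank_std)    # unit-free ordering
```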

**Interpretation**

In the next section, we will discuss the interpretation of unstandardized and standardized coefficient in linear regression.

**Linear Regression : Unstandardized Coefficient**

It represents the amount by which the dependent variable changes if we change the independent variable by one unit, keeping the other independent variables constant.
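A quick way to check this interpretation is to compare two predictions that differ by one unit in a single predictor. The sketch below uses the built-in `mtcars` data; the two "cars" are hypothetical rows I made up for the comparison.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Two hypothetical cars that differ only by one unit of wt (hp held constant)
base     <- data.frame(wt = 3, hp = 120)
plus_one <- data.frame(wt = 4, hp = 120)

# Difference in predicted mpg for a one-unit increase in wt
diff_pred <- predict(fit, plus_one) - predict(fit, base)

print(unname(diff_pred))
print(unname(coef(fit)["wt"]))  # same number: the unstandardized coefficient
```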

**Linear Regression : Standardized Coefficient**

The standardized coefficient is measured in units of standard deviation. A beta value of 1.25 indicates that a change of one standard deviation in the independent variable results in a 1.25 standard deviations increase in the dependent variable.

**Calculation of Standardized Coefficient for Linear Regression**

Standardize both dependent and independent variables and use the standardized variables in the regression model to get standardized estimates. By 'standardize', I mean subtract the mean from each observation and divide the result by the standard deviation. The result is also called a z-score. It gives each variable a mean of 0 and a standard deviation of 1.
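Here is a small sketch of that recipe. The helper `zscore` is a name I am introducing for illustration; base R's `scale()` does the same centering and scaling by default, so either can be used. The data (`mtcars`) and the choice of variables are assumptions.

```r
# Manual z-score: subtract the mean, divide by the standard deviation
zscore <- function(v) (v - mean(v)) / sd(v)

vars <- c("mpg", "wt", "hp")
z_manual <- as.data.frame(lapply(mtcars[vars], zscore))

# Base R's scale() performs the same transformation
z_scale <- as.data.frame(scale(mtcars[vars]))

# Each standardized column has mean ~0 and standard deviation 1
print(round(colMeans(z_manual), 10))
print(sapply(z_manual, sd))

# Regressing standardized y on standardized x gives standardized estimates
fit_std <- lm(mpg ~ wt + hp, data = z_manual)
print(coef(fit_std))
```

Because every variable is centered, the intercept of the standardized model is zero up to rounding error.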

**Another Approach**

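The standardized coefficient is found by multiplying the unstandardized coefficient by the ratio of the standard deviations of the independent variable and the dependent variable:

$$\beta_j^{*} = b_j \cdot \frac{s_{x_j}}{s_y}$$

where $b_j$ is the unstandardized coefficient of the j-th independent variable, $s_{x_j}$ is its standard deviation, and $s_y$ is the standard deviation of the dependent variable.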
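The two approaches should give the same numbers. The sketch below checks this on the built-in `mtcars` data (my choice for illustration): it applies the ratio-of-standard-deviations rule to an ordinary fit and compares the result with a refit on z-scored variables.

```r
fit_raw <- lm(mpg ~ wt + hp, data = mtcars)

# Approach 2: unstandardized slope times sd(x) / sd(y)
b  <- coef(fit_raw)[-1]
sx <- sapply(mtcars[c("wt", "hp")], sd)
sy <- sd(mtcars$mpg)
beta_formula <- b * sx / sy

# Approach 1: refit the model on standardized variables
z <- as.data.frame(scale(mtcars[c("mpg", "wt", "hp")]))
beta_refit <- coef(lm(mpg ~ wt + hp, data = z))[-1]

print(beta_formula)
print(beta_refit)
print(all.equal(unname(beta_formula), unname(beta_refit)))
```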

**Interpretation in Logistic Regression**

**Logistic Regression : Unstandardized Coefficient**

If X increases by one unit, the log-odds of Y increase by k units, where k is the unstandardized coefficient of X, given that the other variables in the model are held constant.
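As a rough illustration of this interpretation, the sketch below fits a logistic regression on the built-in `mtcars` data (`am` is a 0/1 variable, `wt` is weight), which is my own choice of example. It compares the predicted log-odds at two hypothetical values of X that are one unit apart.

```r
fit <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)
k <- unname(coef(fit)["wt"])   # unstandardized coefficient of X

# Predicted log-odds at two hypothetical weights one unit apart
lo <- predict(fit, newdata = data.frame(wt = c(3, 4)), type = "link")

print(unname(lo[2] - lo[1]))   # change in log-odds for a one-unit increase
print(k)                       # equals the coefficient
print(exp(k))                  # the same effect expressed as an odds ratio
```

Exponentiating k gives the multiplicative change in the odds, which is often easier to communicate than a change in log-odds.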

**Logistic Regression : Standardized Coefficient**

A standardized coefficient of 2.5 means that a one standard deviation increase in the independent variable is associated, on average, with a 2.5 standard deviation increase in the log odds of the dependent variable.

**Calculation of Standardized Coefficient for Logistic Regression**

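The standardized coefficient for logistic regression is found by multiplying the unstandardized coefficient $b_j$ by the standard deviation of the independent variable $s_{x_j}$ and dividing by $\pi/\sqrt{3}$, the standard deviation of the standard logistic distribution:

$$\beta_j^{*} = \frac{\sqrt{3}}{\pi}\, s_{x_j}\, b_j$$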

**Calculate Standardized Coefficient for Linear Regression in R**

*Let's start building a linear regression model*

In the program below, we are using the Boston dataset. It's about housing values in suburbs of Boston.

    library(MASS)
    data(Boston)
    str(Boston)

    > str(Boston)
    'data.frame': 506 obs. of 14 variables:
     $ crim   : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
     $ zn     : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
     $ indus  : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
     $ chas   : int 0 0 0 0 0 0 0 0 0 0 ...
     $ nox    : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
     $ rm     : num 6.58 6.42 7.18 7 7.15 ...
     $ age    : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
     $ dis    : num 4.09 4.97 4.97 6.06 6.06 ...
     $ rad    : int 1 2 2 3 3 3 5 5 5 5 ...
     $ tax    : num 296 242 242 222 222 222 311 311 311 311 ...
     $ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
     $ black  : num 397 397 393 395 397 ...
     $ lstat  : num 4.98 9.14 4.03 2.94 5.33 ...
     $ medv   : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...

#### Data Description

- crim – per capita crime rate by town.
- zn – proportion of residential land zoned for lots over 25,000 sq. ft.
- indus – proportion of non-retail business acres per town.
- chas – Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
- nox – nitrogen oxides concentration (parts per million).
- rm – average number of rooms per dwelling.
- age – proportion of owner-occupied units built prior to 1940.
- dis – weighted mean of distances to five Boston employment centers.
- rad – index of accessibility to radial highways.
- tax – full-value property-tax rate per $10,000.
- ptratio – pupil-teacher ratio by town.
- black – 1000(Bk – 0.63)^2, where Bk is the proportion of blacks by town.
- lstat – lower status of the population (percent).
- medv – median value of owner-occupied homes in $1000s.

**Standardized Coefficient using QuantPsyc Package**

    reg.model <- lm(medv ~ ., data = Boston)
    # Standardised coefficients
    library(QuantPsyc)
    lm.beta(reg.model)

    > lm.beta(reg.model)
            crim           zn        indus         chas          nox           rm
    -0.101017076  0.117715201  0.015335200  0.074198832 -0.223848028  0.291056465
             age          dis          rad          tax      ptratio        black
     0.002118638 -0.337836347  0.289749053 -0.226031680 -0.224271231  0.092432232
           lstat
    -0.407446933

**R Function : Standardized Coefficients in Linear Regression**

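We can compute standardized coefficients in R without using any package. See the function below.

    stdz.coff <- function(regmodel) {
      # Unstandardized slopes (drop the intercept)
      b <- summary(regmodel)$coef[-1, 1]
      # Standard deviation of each predictor (assumes all predictors are numeric)
      sx <- sapply(regmodel$model[-1], sd)
      # Standard deviation of the dependent variable
      sy <- sapply(regmodel$model[1], sd)
      # Standardized coefficient = slope * sd(x) / sd(y)
      beta <- b * sx / sy
      return(beta)
    }

    stdz.coff(reg.model)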

**Standardized Coefficient for Logistic Regression in R**

    data("Titanic")
    Y <- data.frame(Titanic)["Survived"]
    set.seed(123)  # added so the random predictor is reproducible
    X <- runif(32)
    mydata <- data.frame(X, Y)

    # Logistic regression model
    model <- glm(Survived ~ X, family = binomial(link = 'logit'), data = mydata)

    # R Function : Standardized Coefficients
    # (this redefines stdz.coff from the linear regression section)
    stdz.coff <- function(regmodel) {
      b <- summary(regmodel)$coef[-1, 1]
      sx <- sapply(regmodel$model[-1], sd)
      # sqrt(3) / pi is 1 over the SD of the standard logistic distribution
      beta <- (3^(1/2)) / pi * sx * b
      return(beta)
    }

    # Standardized Estimate
    stdz.coff(model)

    # Unstandardized Estimate
    model$coefficients[-1]

In SAS, you can include the **STB** option to get standardized estimates.

    proc logistic data = training descending;
      class rank (ref = '1');
      model admit = gre gpa rank / stb;
    run;

You give a formula for standardizing independent and dependent variables. Can't the R scale() function be used to do the same thing?

The higher the standardised coefficient the greater the significance?

yes

Very nice post. It is useful to see the use in R. Thanks for the post.