# Relative Importance (Weight) Analysis

This post includes a detailed explanation of relative weight analysis (RWA) along with its implementation in statistical software such as SPSS, SAS, R and Python. The technique is quite popular in the survey analytics world, where it is mainly used for driver/impact analysis. For example, which human resource driver makes employees stay in or leave the organisation? Is 'pay' a more important driver than 'work-life balance'?

Relative Weight (Importance) Analysis
Relative Weight Analysis is a useful technique for calculating the relative importance of predictors (independent variables) when the independent variables are correlated with each other. It is an alternative to multiple regression that addresses the multicollinearity problem and produces an importance ranking of the variables. It answers the question "Which variable is the most important?" by ranking variables based on their contribution to R-square.

Background

When independent variables are correlated, it is difficult to determine the true predictive power of each variable, and hence difficult to rank them, because the coefficients cannot be estimated reliably. Statistically, multicollinearity inflates the standard errors of the coefficient estimates and makes the estimates very sensitive to minor changes in the model. As a result, the coefficients are unstable and difficult to interpret.
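To see the problem concretely, here is a small simulation (not from the original post; the data and variable names are illustrative) in which two nearly identical predictors make the individual regression coefficients unstable even though the overall fit is solid:

```python
# Simulate two highly correlated predictors: x2 is almost a copy of x1.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
y = 2 * x1 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])

# Fit ordinary least squares separately on each half of the data.
b_a, *_ = np.linalg.lstsq(X[:100], y[:100], rcond=None)
b_b, *_ = np.linalg.lstsq(X[100:], y[100:], rcond=None)

print("coefficients (first half): ", np.round(b_a, 2))
print("coefficients (second half):", np.round(b_b, 2))
# The individual slopes of x1 and x2 are poorly determined and can differ
# a lot between halves, but their sum stays close to the true total effect (2).
```

Relative weight analysis sidesteps this instability by working with an orthogonal transformation of the predictors instead of the raw, correlated ones.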

How it works

It creates a set of new independent variables that are maximally related to the original independent variables but are uncorrelated with each other. Because these transformed variables are orthogonal, the dependent variable can be regressed onto them to produce a set of standardized regression coefficients that are unaffected by multicollinearity.

Calculation Steps
1. Compute the correlation matrix among the independent variables.
2. Calculate the eigenvectors and eigenvalues of this correlation matrix.
3. Form the diagonal matrix of eigenvalues and take its square root.
4. Multiply the eigenvector matrix, the matrix from step 3 and the transpose of the eigenvector matrix; call the result lambda.
5. Square each element of lambda.
6. To get the partial effect of each independent variable on the dependent variable, multiply the inverse of lambda (from step 4) by the vector of correlations between the dependent variable and each independent variable (a p x 1 vector, where p is the number of predictors).
7. To calculate R-square, sum the squared elements of the step 6 vector.
8. To calculate the raw relative weights, multiply the matrix from step 5 by the element-wise square of the step 6 vector.
9. To express the raw relative weights as percentages of R-square, divide them by R-square and multiply by 100.
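The nine steps above can be sketched directly in NumPy. This is a minimal illustration on simulated data (the data and variable names here are my own, not from the post):

```python
# Step-by-step relative weight analysis on simulated data.
import numpy as np

rng = np.random.default_rng(42)
n, p = 100, 3
X = rng.normal(size=(n, p))
X[:, 1] += 0.5 * X[:, 0]                     # induce correlation among predictors
y = X @ np.array([1.0, 0.5, 0.2]) + rng.normal(size=n)

R = np.corrcoef(np.column_stack([y, X]), rowvar=False)
Rxx = R[1:, 1:]                              # step 1: predictor correlation matrix
Rxy = R[1:, 0]                               # correlations of y with each predictor (p x 1)

evals, evecs = np.linalg.eigh(Rxx)           # step 2: eigendecomposition
delta = np.diag(np.sqrt(evals))              # step 3: sqrt of the eigenvalue diagonal
lam = evecs @ delta @ evecs.T                # step 4: lambda = V * delta * V'
lam_sq = lam ** 2                            # step 5: element-wise square
beta = np.linalg.solve(lam, Rxy)             # step 6: partial effects
rsquare = np.sum(beta ** 2)                  # step 7: R-square
raw_weights = lam_sq @ beta ** 2             # step 8: raw relative weights
importance = raw_weights / rsquare * 100     # step 9: percentage of R-square

print("R-square:", round(rsquare, 4))
print("raw weights:", np.round(raw_weights, 4))
print("importance (%):", np.round(importance, 2))
# The raw weights sum to R-square, so the importance scores sum to 100.
```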

Important Point

In the next section, I have included programs to run RWA. Before running the analysis, make sure there are no missing values in either the independent or the dependent variables; impute or remove them first. Also ensure that you supply only numeric variables in the `target` and `predictors` arguments of the programs below.
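As a quick pre-flight check, something along these lines can be run before calling the programs (the toy data frame, target and predictor names here are purely illustrative):

```python
# Check for missing values and numeric types before running RWA.
import pandas as pd

df = pd.DataFrame({
    "churn": [1, 0, 1, None, 0],
    "pay": [3.1, 2.8, None, 3.5, 2.9],
    "work_life": [4, 5, 3, 4, 5],
})
target, predictors = "churn", ["pay", "work_life"]

cols = [target] + predictors
print(df[cols].isnull().sum())        # missing-value count per column
df_clean = df[cols].dropna()          # or impute, e.g. df[cols].fillna(df[cols].mean())
print(df_clean.dtypes)                # all columns should be numeric
```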

SPSS Code

***************************************************
********** RELATIVE WEIGHT ANALYSIS ************
***********Author : Deepanshu Bhalla*******************
****************************************************

*Specify a path where INPUT DATA FILE is saved.

*Specify a path where you wish OUTPUT files to be saved.

*Define Independent Variable names.
DEFINE Ivars ( )
var1 var2 var3
!ENDDEFINE .

*Define Dependent Variable name.
DEFINE Target ( )
Churn
!ENDDEFINE .

*Define VARIABLE LABELING for Independent variables.
*The order of the labels must match the order of the independent variables.
*Do not use spaces in labels; separate words with "_" instead.
DEFINE LABELING ( )
Interest_Rate
Renewed_PCT
Account_No
!ENDDEFINE.

GET FILE = 'DataFile'.

CORRELATIONS
/VARIABLES= Target Ivars
/MISSING=PAIRWISE
/MATRIX = OUT (Corr.sav).

oms
/select tables
/if commands =['Regression'] SUBTYPES=['Coefficients']
/destination format =SAV  outfile ='Betas.sav'
/columns sequence =[RALL CALL LALL].

regression
/dependent Target
/method= enter Ivars.
omsend.

GET FILE='Betas.sav'.

FLIP ALL.

COMPUTE var6=INDEX(CASE_LBL, '_Beta').
RECODE var6 (1 THRU HIGHEST=1) (else=0).
SELECT IF var6=1.
EXECUTE .

DELETE VARIABLES CASE_LBL VAR6.

SAVE Outfile = 'Directory\Coefficients.sav'
/ RENAME = (var001=Coefficients).

GET FILE = 'Corr.sav'.
SELECT IF rowtype_ = 'CORR' .
EXECUTE.

matrix.

MGET / FILE = 'Corr.sav'
/ TYPE = CORR.

COMPUTE R = CR.

COMPUTE N = NCOL(R).
COMPUTE RXX = R(2:N,2:N).
COMPUTE RXY = R(2:N,1).
CALL EIGEN(RXX,EVEC,EV).
COMPUTE D = MDIAG(EV).
COMPUTE DELTA = SQRT(D).
COMPUTE LAMBDA = EVEC * DELTA * T(EVEC).
COMPUTE LAMBDASQ = LAMBDA &**2.
COMPUTE BETA1 = INV(LAMBDA) * RXY.
COMPUTE RSQUARE = CSSQ(BETA1).
COMPUTE RAWWGT = LAMBDASQ * BETA1 &**2.
COMPUTE IMPORT = (RAWWGT &/ RSQUARE) * 100.

PRINT RSQUARE /FORMAT=F8.8.
PRINT RAWWGT /FORMAT=F8.8
/TITLE = "Raw Relative Weights" .
PRINT IMPORT /FORMAT=PCT8.8
/TITLE = "Relative Weights as Percentage of R-square" .

SAVE RSQUARE
/OUTFILE='RSQ.sav'.
SAVE RAWWGT
/OUTFILE='Raw.sav'.
SAVE IMPORT
/OUTFILE='Relative.sav'.
END MATRIX.

INPUT PROGRAM.
NUMERIC LABELING (F25).
LOOP #=1 TO 1.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
FLIP.

SAVE OUTFILE = 'Labeling.sav'
/ DROP VAR001
/ RENAME (CASE_LBL=Categories).

MATCH FILES FILE ='Labeling.sav'
/ FILE = 'Raw.sav'
/ Rename = (COL1 = RAW_RELATIVE)
/ FILE = 'Relative.sav'
/ Rename = (COL1 = PERCENT_RSQUARE)
/ FILE = 'Directory\Coefficients.sav'
/ FILE = 'RSQ.sav'
/ Rename = (COL1 = RSQUARE).

FORMATS  RAW_RELATIVE TO RSQUARE (F8.6).

SAVE TRANSLATE OUTFILE='Directory\Final_Output.xls'
/TYPE=XLS
/VERSION=8
/REPLACE
/FIELDNAMES
/CELLS=VALUES.
EXECUTE.

R Code

```
library(dplyr)

rwa <- function(df, target, predictors) {

  df2 <- df %>%
    select(all_of(c(target, predictors))) %>%
    filter(!is.na(.data[[target]]))

  if (any(colSums(is.na(df2)) > 0)) stop("Treat missing values in predictors")

  corAll <- cor(df2, use = "pairwise.complete.obs")
  corX <- corAll[2:ncol(corAll), 2:ncol(corAll)]
  corY <- corAll[2:ncol(corAll), 1]

  eigenX <- eigen(corX)
  D <- diag(eigenX$values)
  delta <- sqrt(D)
  lambda <- eigenX$vectors %*% delta %*% t(eigenX$vectors)
  lambdasq <- lambda ^ 2
  beta <- solve(lambda) %*% corY
  rsquare <- sum(beta ^ 2)
  rawWgt <- lambdasq %*% beta ^ 2
  importance <- (rawWgt / rsquare) * 100

  tbl <- data.frame(predictors,
                    `Raw Relative Weights` = rawWgt,
                    Importance = importance,
                    Beta = beta) %>%
    arrange(desc(Importance))

  list(`Importance Scores` = tbl, Rsquare = rsquare)
}

rwa(df = mtcars, target = "mpg", predictors = c("cyl", "disp", "hp", "gear"))
```

SAS Code

```
FILENAME PROBLY TEMP;
PROC HTTP
METHOD="GET"
OUT=PROBLY;
RUN;

OPTIONS VALIDVARNAME=ANY;
PROC IMPORT
FILE=PROBLY
OUT=WORK.MYDATA REPLACE
DBMS=CSV;
RUN;

%macro rwa (df =, target=, predictors=, output=);

data temp;
set &df (keep = &target &predictors);
where not missing(&target);
run;

proc corr data= temp out=corX(where=(_type_ = 'CORR')) NOMISS noprint;
var &predictors;
run;

proc corr data= temp out=corY(where=(_type_ = 'CORR' and _name_ = "&target") drop=&target) NOMISS noprint;
var &target &predictors;
run;

proc iml;
use corX;
read all var _NUM_ into M; close;
eigenVal=eigval(m);
eigenVec=eigvec(m);
D = diag(eigenVal);
delta  = sqrt(D);
lambda = eigenVec * delta * t(eigenVec);
lambdasq = lambda ## 2;
use corY;
read all var _NUM_ into M2; close;
beta = inv(lambda) * t(m2);
rsquare = sum(beta ## 2);
rawRelWgt = lambdasq * beta##2;
importance = (rawRelWgt / rsquare) * 100;
VarName = {&predictors};
print rsquare;
create &output var {VarName rawRelWgt importance beta} ;
append;
close &output;
quit;

%mend;

%rwa(df=MYDATA, target= mpg, predictors=cyl disp hp gear, output=importanceTbl);
```

Python Code

```
import pandas as pd
import numpy as np

def rwa(df, target, predictors):

    # Combine target and predictors
    allVars = [target] + predictors

    # Keep only rows with non-missing values in the target variable
    df2 = df.loc[df[target].notnull(), allVars]
    corAll = df2.corr()
    corX = corAll.loc[predictors, predictors]
    corY = corAll.loc[predictors, target]

    # eigh is used because the correlation matrix is symmetric,
    # so it returns real eigenvalues and eigenvectors
    w, v = np.linalg.eigh(corX)
    delta = np.sqrt(np.diag(w))
    lam = v @ delta @ v.T
    lambdasq = lam ** 2
    beta = np.linalg.inv(lam) @ corY

    rsquare = np.sum(beta ** 2)
    rawWgt = lambdasq @ beta ** 2
    importance = (rawWgt / rsquare) * 100

    importanceTbl = pd.DataFrame({'Variables': predictors,
                                  'RawRelativeWeights': rawWgt,
                                  'ImportanceScores': importance,
                                  'Beta': beta})
    return importanceTbl, rsquare

# mtcars is assumed to be loaded as a pandas DataFrame
result, rsq = rwa(df=mtcars, target="mpg", predictors=["cyl", "disp", "hp", "gear"])
print(result)
print(rsq)
```
Output
The output contains the importance scores of each predictor (independent variable) along with the R-squared value of the model.
```
$`Importance Scores`
predictors Raw.Relative.Weights Importance       Beta
1         hp            0.2321744   29.79691 -0.4795836
2        cyl            0.2284797   29.32274 -0.4904939
3       disp            0.2221469   28.50999 -0.4835607
4       gear            0.0963886   12.37037  0.2734483

$Rsquare
 0.7791896
```
The sign of Beta indicates whether a predictor impacts the target variable positively or negatively: a negative sign denotes a negative relationship, a positive sign a positive relationship.