Relative Weight Analysis

Deepanshu Bhalla

This post gives a detailed explanation of Relative Weight Analysis (RWA) along with its implementation in statistical software and programming languages such as R, Python, SPSS and SAS.

RWA is quite popular in the survey analytics world, where it is mainly used for driver/impact analysis. For example, which human resource driver makes employees stay in or leave the organisation? Is the 'pay' driver more important than 'work-life balance'? It is also called Relative Importance Analysis.

Relative Weight (Importance) Analysis

Relative Weight Analysis is a useful technique for calculating the relative importance of predictors (independent variables) when the independent variables are correlated with each other. It is an alternative to interpreting multiple regression coefficients: it addresses the multicollinearity problem and helps to rank variables by importance. It answers the question "Which variable is the most important, and how do the variables rank by their contribution to R-Square?".

Background

When independent variables are correlated, it is difficult to determine the true predictive power of each variable. Hence, it is difficult to rank them, because the coefficients cannot be estimated precisely. Statistically, multicollinearity increases the standard errors of the coefficient estimates and makes the estimates very sensitive to minor changes in the model or the data. As a result, the coefficients are unstable and difficult to interpret.
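
The effect described above can be illustrated with a small simulation. This is only a sketch on made-up data: the function name `coef_spread`, the sample size, the number of repetitions and the correlation values are all assumptions chosen for illustration. It repeatedly fits an ordinary least squares model with two predictors whose true slopes are both 1, and compares how much the estimated slope varies when the predictors are uncorrelated versus highly correlated.

```python
import numpy as np

def coef_spread(rho, n=200, reps=500, seed=0):
    """Spread and mean of the OLS slope on x1 across simulated samples,
    for two predictors whose population correlation is rho."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    slopes = []
    for _ in range(reps):
        X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        y = X[:, 0] + X[:, 1] + rng.normal(size=n)   # true slopes are both 1
        A = np.column_stack([np.ones(n), X])          # add an intercept column
        b, *_ = np.linalg.lstsq(A, y, rcond=None)
        slopes.append(b[1])                           # slope on x1
    return float(np.std(slopes)), float(np.mean(slopes))

sd_low, mean_low = coef_spread(rho=0.0)
sd_high, mean_high = coef_spread(rho=0.95)
print("spread of slope, uncorrelated predictors:", round(sd_low, 3))
print("spread of slope, correlated predictors:  ", round(sd_high, 3))
print("average slope, correlated predictors:    ", round(mean_high, 3))
```

In this setup the average slope stays close to 1 in both cases, while the spread of the estimates is several times larger when the predictors are highly correlated. That instability is what makes ranking predictors by their regression coefficients unreliable.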

How it works

It creates a set of new independent variables that are maximally related to the original independent variables but are uncorrelated with each other. Because these transformed variables are uncorrelated, the dependent variable can be regressed onto them, producing a series of standardized regression coefficients that are not distorted by multicollinearity.
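
One way to see what these new variables look like is to build them from raw data with a singular value decomposition. The sketch below uses made-up data, and the variable names (`Xs`, `Z`, `lam`) are assumptions for illustration. It is not how the programs later in this post proceed (they work from the correlation matrix), but it should lead to the same transformation matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Three correlated predictors built from a shared component (made-up data)
shared = rng.normal(size=n)
X = np.column_stack([shared + rng.normal(size=n) for _ in range(3)])

# Standardize so that Xs.T @ Xs / (n - 1) is the correlation matrix
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Singular value decomposition: Xs = P @ diag(s) @ Qt
P, s, Qt = np.linalg.svd(Xs, full_matrices=False)

# New variables: uncorrelated with each other, scaled to unit variance
Z = np.sqrt(n - 1) * P @ Qt

corr_Z = np.corrcoef(Z, rowvar=False)   # should be the identity matrix
lam = Xs.T @ Z / (n - 1)                # correlations between X and Z

print(np.round(corr_Z, 6))
print(np.round(lam, 3))
```

Here `corr_Z` comes out as the identity matrix, so the new variables are uncorrelated, and `lam @ lam` reproduces the correlation matrix of the original predictors. In other words, `lam` is the symmetric square root of that correlation matrix, which is the matrix the eigenvalue-based steps in the next section construct.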

How to calculate Relative Weight Analysis?

Below are the steps to calculate Relative Weight Analysis (RWA):

  1. Compute the correlation matrix of the independent variables
  2. Calculate the eigenvectors and eigenvalues of the above correlation matrix
  3. Build a diagonal matrix from the eigenvalues and then take the square root of this diagonal matrix
  4. Multiply the eigenvector matrix, the matrix from step 3 and the transpose of the eigenvector matrix
  5. Square each element of the matrix from step 4
  6. To calculate the partial effect of each independent variable on the dependent variable, multiply [the inverse of the matrix from step 4] by the vector of correlations between the dependent variable and each independent variable (i.e. a k X 1 matrix, where k is the number of independent variables)
  7. To calculate R-Square, sum the squared elements of the step 6 matrix
  8. To calculate raw relative weights, multiply [the matrix from step 5] by [the element-wise square of the matrix from step 6]
  9. To express raw relative weights as a percentage of R-Square, divide the raw relative weights by R-Square and then multiply by 100.
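
The steps above can be traced on a small 3 x 3 example. The correlation values below are hypothetical and chosen only for illustration, and the variable names are assumptions. The sketch starts directly from correlations, so it also shows how the calculation goes when only a correlation matrix is available rather than raw data.

```python
import numpy as np

# Step 1: hypothetical correlations among three predictors
Rxx = np.array([[1.0, 0.5, 0.3],
                [0.5, 1.0, 0.4],
                [0.3, 0.4, 1.0]])
# Hypothetical correlations between each predictor and the dependent variable
rxy = np.array([0.6, 0.5, 0.4])

evals, evecs = np.linalg.eigh(Rxx)     # step 2: eigh suits symmetric matrices
delta = np.diag(np.sqrt(evals))        # step 3: square root of diagonal matrix
lam = evecs @ delta @ evecs.T          # step 4
lam_sq = lam ** 2                      # step 5: element-wise square
beta = np.linalg.solve(lam, rxy)       # step 6: same as inv(lam) @ rxy
rsquare = np.sum(beta ** 2)            # step 7: sum of squared elements
raw = lam_sq @ beta ** 2               # step 8: raw relative weights
pct = raw / rsquare * 100              # step 9: percentage of R-Square

print("R-Square:", round(float(rsquare), 4))
print("Raw relative weights:", np.round(raw, 4))
print("Percentage of R-Square:", np.round(pct, 2))
```

Two properties are worth checking on any run: the raw relative weights add up to R-Square, and R-Square matches the value an ordinary multiple regression on the same correlations would give.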

Important Point

In the next section I have included programs to run RWA. Before running the analysis, it is important to ensure you don't have missing values in either the independent or the dependent variables. If you have missing values, impute or remove them first. Also ensure you provide only numeric variables in the target and predictors arguments of the programs below.
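
These checks can be sketched in pandas. The helper name `prepare_rwa_data` and the demo data are hypothetical and not part of the programs in this post. Dropping incomplete rows is only one option, since imputation may be preferable depending on the data.

```python
import numpy as np
import pandas as pd

def prepare_rwa_data(df, target, predictors):
    """Keep only the target and predictors, insist on numeric columns,
    and drop rows with a missing value in any of them."""
    cols = [target] + list(predictors)
    data = df.loc[:, cols]
    non_numeric = [c for c in cols
                   if not pd.api.types.is_numeric_dtype(data[c])]
    if non_numeric:
        raise TypeError(f"Non-numeric columns: {non_numeric}")
    return data.dropna()

# Made-up data: one row has a missing target, one a missing predictor
demo = pd.DataFrame({"y":  [1.0, 2.0, np.nan, 4.0],
                     "x1": [0.5, np.nan, 1.5, 2.0],
                     "x2": [3, 4, 5, 6]})
clean = prepare_rwa_data(demo, "y", ["x1", "x2"])
print(clean)
```

The cleaned data frame can then be passed to any of the programs below in place of the original data.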

Calculate Relative Weight Analysis with Python, R, SAS and SPSS

SPSS Code

*Specify a path where INPUT DATA FILE is saved.
FILE HANDLE Datafile /NAME='C:\Documents and Settings\Deepanshu\My Documents\Downloads\examples\RWA Data.sav'.

*Specify a path where you wish OUTPUT files to be saved.
FILE HANDLE Directory /NAME='C:\Documents and Settings\Deepanshu\My Documents\Downloads\examples'.

*Define Independent Variable names.
DEFINE Ivars ( )
var1 var2 var3
!ENDDEFINE .

*Define Dependent Variable name.
DEFINE Target ( )
Churn
!ENDDEFINE .

*Define VARIABLE LABELING for Independent variables.
*Order of variable labeling and independent variables must be same.
*Labels must not contain spaces; separate words with "_" instead.
DEFINE LABELING ( )
Interest_Rate
Renewed_PCT
Account_No
!ENDDEFINE.

GET FILE = 'DataFile'.

CORRELATIONS
/VARIABLES= Target Ivars
/MISSING=PAIRWISE
/MATRIX = OUT (Corr.sav).

oms
  /select tables
  /if commands =['Regression'] SUBTYPES=['Coefficients']
  /destination format =SAV  outfile ='Betas.sav'
  /columns sequence =[RALL CALL LALL].

regression
  /dependent Target
  /method= enter Ivars.
omsend.

GET FILE='Betas.sav'.

FLIP ALL.

COMPUTE var6=INDEX(CASE_LBL, '_Beta').
RECODE var6 (1 THRU HIGHEST=1) (else=0).
SELECT IF var6=1.
EXECUTE .

DELETE VARIABLES CASE_LBL VAR6.

SAVE Outfile = 'Directory\Coefficients.sav'
/ RENAME = (var001=Coefficients).

GET FILE = 'Corr.sav'.
SELECT IF rowtype_ = 'CORR' .
EXECUTE.

matrix.

MGET / FILE = 'Corr.sav'
     / TYPE = CORR.

COMPUTE R = CR.

COMPUTE N = NCOL(R).
COMPUTE RXX = R(2:N,2:N).
COMPUTE RXY = R(2:N,1).
CALL EIGEN(RXX,EVEC,EV).
COMPUTE D = MDIAG(EV).
COMPUTE DELTA = SQRT(D).
COMPUTE LAMBDA = EVEC * DELTA * T(EVEC).
COMPUTE LAMBDASQ = LAMBDA &**2.
COMPUTE BETA1 = INV(LAMBDA) * RXY.
COMPUTE RSQUARE = CSSQ(BETA1).
COMPUTE RAWWGT = LAMBDASQ * BETA1 &**2.
COMPUTE IMPORT = (RAWWGT &/ RSQUARE) * 100.

PRINT RSQUARE /FORMAT=F8.8.
PRINT RAWWGT /FORMAT=F8.8
/TITLE = "Raw Relative Weights" .
PRINT IMPORT /FORMAT=PCT8.8
/TITLE = "Relative Weights as Percentage of R-square" .

SAVE RSQUARE
/OUTFILE='RSQ.sav'.
SAVE RAWWGT
/OUTFILE='Raw.sav'.
SAVE IMPORT
/OUTFILE='Relative.sav'.
END MATRIX.

INPUT PROGRAM.
NUMERIC LABELING (F25).
LOOP #=1 TO 1.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
FLIP.

SAVE OUTFILE = 'Labeling.sav'
/ DROP VAR001
/ RENAME (CASE_LBL=Categories).

MATCH FILES FILE ='Labeling.sav'
/ FILE = 'Raw.sav'
/ Rename = (COL1 = RAW_RELATIVE)
/ FILE = 'Relative.sav'
/ Rename = (COL1 = PERCENT_RSQUARE)
/ FILE = 'Directory\Coefficients.sav'
/ FILE = 'RSQ.sav'
/ Rename = (COL1 = RSQUARE).

FORMATS  RAW_RELATIVE TO RSQUARE (F8.6).

SAVE TRANSLATE OUTFILE='Directory\Final_Output.xls'  
  /TYPE=XLS
  /VERSION=8
  /REPLACE
  /FIELDNAMES
  /CELLS=VALUES.
EXECUTE.

R Code

rwa <- function(df, target, predictors) {

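  # Keep the target and predictors, then drop rows where the target is missing.
  # .data[[target]] looks up the column named by the string in `target`;
  # is.na(target) would only test the string itself and never filter anything.
  df2 <- df %>%
    select(all_of(c(target,predictors))) %>%
    filter(!is.na(.data[[target]]))
  
  if (anyNA(df2)) stop("Treat missing values in predictors")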
  
  corX <- df2 %>% 
    cor(., use = "pairwise.complete.obs") %>% 
    .[2:ncol(.), 2:ncol(.)]
  
  corY <- df2 %>% 
    cor(., use = "pairwise.complete.obs") %>% 
    .[2:ncol(.), 1]
  
  eigenX <- eigen(corX)
  D <- diag(eigenX$values)
  delta <- sqrt(D)
  lambda <- eigenX$vectors %*% delta %*% t(eigenX$vectors)
  lambdasq <- lambda ^ 2
  beta <- solve(lambda) %*% corY
  rsquare <- sum(beta ^ 2)
  rawWgt <- lambdasq %*% beta ^ 2
  importance <- (rawWgt / rsquare) * 100
  tbl <- data.frame(predictors,
                       `Raw Relative Weights` = rawWgt,
                       Importance = importance,
                       Beta = beta) %>% 
    arrange(desc(Importance))
  
  return(list(`Importance Scores` = tbl, Rsquare = rsquare))
  
}


library(dplyr)
mtcars <- read.csv("https://raw.githubusercontent.com/deepanshu88/Datasets/master/UploadedFiles/mtcars.csv")
rwa(df = mtcars, target = "mpg", predictors = c("cyl", "disp", "hp", "gear"))

SAS Code

FILENAME PROBLY TEMP;
PROC HTTP
 URL="https://raw.githubusercontent.com/deepanshu88/Datasets/master/UploadedFiles/mtcars.csv"
 METHOD="GET"
 OUT=PROBLY;
RUN;

OPTIONS VALIDVARNAME=ANY;
PROC IMPORT
  FILE=PROBLY
  OUT=WORK.MYDATA REPLACE
  DBMS=CSV;
RUN;


%macro rwa (df =, target=, predictors=, output=);

data temp;
set &df (keep = &target &predictors);
where not missing(&target);
run;

proc corr data= temp out=corX(where=(_type_ = 'CORR')) NOMISS noprint;
var &predictors;
run;

proc corr data= temp out=corY(where=(_type_ = 'CORR' and _name_ = "&target") drop=&target) NOMISS noprint;
var &target &predictors;
run;

proc iml;
use corX; 
read all var _NUM_ into M; close;
eigenVal=eigval(m);
eigenVec=eigvec(m);
D = diag(eigenVal);
delta  = sqrt(D);
lambda = eigenVec * delta * t(eigenVec);
lambdasq = lambda ## 2;
use corY; 
read all var _NUM_ into M2; close;
beta = inv(lambda) * t(m2);
rsquare = sum(beta ## 2);
rawRelWgt = lambdasq * beta##2;
importance = (rawRelWgt / rsquare) * 100;
VarName = {&predictors};
print rsquare;
create &output var {VarName rawRelWgt importance beta} ;
append;
close &output;
quit;

%mend;

%rwa(df=MYDATA, target= mpg, predictors=cyl disp hp gear, output=importanceTbl);

Python Code

import pandas as pd
import numpy as np
import scipy.linalg
 
def rwa(df, target, predictors):

    # Combine target and predictors
    allVars = predictors.copy()
    allVars.insert(0,target)
    
    # Non-missing values in target variables
    df2  =  df.loc[:,allVars][df[target].notnull()]
    corX = df2.corr().loc[predictors,predictors]
    corY = df2.corr().loc[predictors,target]
    
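    # corX is symmetric, so use eigh: it returns real eigenvalues and
    # eigenvectors (eig returns complex arrays that leak into the output)
    w, v = scipy.linalg.eigh(corX)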
    D = np.diag(w)
    delta = np.sqrt(D)
    l =  np.matmul(np.matmul(v,  delta), np.transpose(v))
    lambdasq = l**2
    beta = np.matmul(np.linalg.inv(l),corY)
    
    rsquare = sum(np.power(beta,2))
    rawWgt  = np.matmul(lambdasq, beta**2)
    importance = (rawWgt / rsquare) * 100
    
    importanceTbl = pd.DataFrame({'Variables' : predictors, 'RawRelativeWeights' : rawWgt, 'ImportanceScores': importance, 'Beta':beta})

    return importanceTbl, rsquare

mtcars = pd.read_csv("https://raw.githubusercontent.com/deepanshu88/Datasets/master/UploadedFiles/mtcars.csv")
result, rsq = rwa(df = mtcars, target = "mpg", predictors = ["cyl", "disp", "hp", "gear"])
print(result)
print(rsq)

Output

The output of the R program consists of the importance score of each predictor (independent variable) along with the R-squared value of the model.

$`Importance Scores`
  predictors Raw.Relative.Weights Importance       Beta
1         hp            0.2321744   29.79691 -0.4795836
2        cyl            0.2284797   29.32274 -0.4904939
3       disp            0.2221469   28.50999 -0.4835607
4       gear            0.0963886   12.37037  0.2734483

$Rsquare
[1] 0.7791896

The sign of Beta indicates whether a predictor variable has a positive or a negative impact on the target variable. A negative sign denotes a negative relationship and a positive sign denotes a positive relationship.

About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

8 Responses to "Relative Weight Analysis"
  1. Thanks for this interesting post.
    However I'm experiencing some difficulties in following some steps, particularly steps 4 and 6
    Could you please post an example of processing, say a 3 x 3 correlation matrix ?
    Thanks in advance

  2. This is excellent!

  3. Thanks a lot. This is very helpful. May I know how to revise the syntax if the raw data is a correlation matrix?

  4. Thank you for the code. Worked really well. Do you have a reference on interpretation of output?

    Replies
    1. Where did you run it? i used spss 20 and produced errors

  5. Thanks for the above, I tried to use the syntax on spss 20 but produces error. Any advice?

  6. How I can calculate RII for multiple variables by using SPSS means for clients, contractor consultant and other factors and there are many questions under above listed parties
