# R : Variable Selection - Wald Chi-Square Analysis

In logistic regression, one way to select the top variables is by their Wald chi-square value. In other words, we run a univariate analysis (a logistic regression with a single predictor) for each independent variable and then pick the important predictors as those with the highest Wald chi-square values.
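
For reference, the Wald chi-square for a single coefficient is the square of the z value that `summary()` reports for a `glm` fit. The symbols below are introduced only for illustration and do not appear in the original post: $\hat{\beta}_j$ is the estimated coefficient of predictor $j$ and $\operatorname{SE}(\hat{\beta}_j)$ is its standard error.

$$
W_j = \left( \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)} \right)^2 = z_j^2
$$

Under the null hypothesis $\beta_j = 0$, $W_j$ approximately follows a chi-square distribution with 1 degree of freedom in large samples, so ranking by $W_j$ is the same as ranking by $|z_j|$.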

#Run Logistic Regression
mylogit <- glm(admit ~ ., data = mydata, family = "binomial")

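#Create Logistic Regression Function
unilogit = function(df, depvar) {
  #Capture the unquoted dependent variable name as a string
  depvar1 = deparse(substitute(depvar))
  #Every column except the dependent variable is treated as a predictor
  lapply(names(df)[which(names(df) != depvar1)], function(x) {
    mylogit = glm(formula(paste(depvar1, "~", x)), data = df, family = "binomial")
    #Coefficient table: Estimate, Std. Error, z value, Pr(>|z|)
    summary(mylogit)$coefficients
  })
}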

#Run Function
univariate = unilogit(mydata, admit)

#Merge all the coefficients
final <- do.call(rbind, univariate)

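#Make the table formattable
univList = cbind(data.frame(Variable = row.names(final)), final)
#Drop the intercept rows so only predictor terms remain
FinalList = subset(univList, Variable != "(Intercept)")
#Wald chi-square is the squared z value
FinalList[,"Wald ChiSquare"] = FinalList[,"z value"]^2
#Rank 1 = largest Wald chi-square
FinalList[,"Rank"] = rank(-FinalList[,"Wald ChiSquare"])
FinalList = FinalList[order(FinalList$Rank),]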
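
The steps above can also be tried end to end on simulated data. The sketch below is a minimal illustration, not the post's own code: the data frame `sim`, its columns `x1`, `x2`, `x3` and `y`, and the effect sizes are all made up for the example, and it assumes numeric predictors (a factor predictor would contribute one coefficient row per non-reference level). It also derives a p-value from the chi-square distribution with 1 degree of freedom, which should agree with the `Pr(>|z|)` column from `summary()`.

```r
# Hypothetical simulated data; all names and effect sizes are illustrative only
set.seed(42)
n <- 500
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# y depends strongly on x1, weakly on x2, and not at all on x3
sim$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * sim$x1 + 0.3 * sim$x2))

predictors <- setdiff(names(sim), "y")

# Fit one single-predictor logistic model per variable and square its z value
wald <- sapply(predictors, function(v) {
  fit <- glm(reformulate(v, response = "y"), data = sim, family = binomial)
  z <- coef(summary(fit))[v, "z value"]
  z^2
})

# Assemble the ranking table; the p-value uses the chi-square(1) reference distribution
ranking <- data.frame(
  Variable = predictors,
  WaldChiSquare = unname(wald),
  PValue = pchisq(unname(wald), df = 1, lower.tail = FALSE)
)
ranking$Rank <- rank(-ranking$WaldChiSquare)
ranking <- ranking[order(ranking$Rank), ]
print(ranking)
```

Because `x1` is simulated with the largest effect, it should come out on top of the ranking; with real data the ordering only reflects univariate association and ignores correlations among predictors.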

Method 2 :

unilogit2 = function(df, depvar, output) {
  dummydt = data.frame(matrix(ncol = 0, nrow = 0))
  depvar1 = deparse(substitute(depvar))
  #Name of the global object that will hold the combined results
  out = deparse(substitute(output))
  xxxx = names(df)[which(names(df) != depvar1)]
  for (i in seq_along(xxxx)) {
    mylogit = glm(formula(paste(depvar1, "~", xxxx[i])), data = df, family = "binomial")
    coeff = data.frame(summary(mylogit)$coefficients)
    #Start the result table on the first pass, then append to it
    if (i == 1) {output = rbind(dummydt, coeff)}
    else {output = rbind(output, coeff)}
    #Write the running result to the global environment under the requested name
    assign(out, output, envir = .GlobalEnv)
  }
}

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 8 years of experience in data science. During his tenure, he has worked with global clients in various domains like Banking, Insurance, Telecom and Human Resource.

2 Responses to "R : Variable Selection - Wald Chi-Square Analysis"
1. Is this for both continuous and categorical variables?

2. Can you please explain which variable has the most impact?
