Detecting Multicollinearity in Categorical Variables

Before fitting regression or tree models, the independent variables should be checked for multicollinearity. Multicollinearity means that the independent variables are highly correlated with each other.

For two categorical variables, multicollinearity can be detected with the Spearman rank correlation coefficient (when both variables are ordinal) or the chi-square test of independence (when they are nominal).
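
As a rough illustration of both checks, here is a minimal Python sketch using only the standard library. The function names and the sample data are hypothetical and not part of the original post. It computes the Spearman coefficient as the Pearson correlation of tie-averaged ranks, and the chi-square statistic from the contingency table, along with Cramér's V as a 0-to-1 measure of strength. It does not compute p-values; a statistics library such as SciPy (`scipy.stats.spearmanr`, `scipy.stats.chi2_contingency`) would normally be used for that.

```python
from collections import Counter
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def chi_square(x, y):
    """Return (chi-square statistic, degrees of freedom, Cramer's V).

    Assumes each variable has at least two distinct levels.
    """
    n = len(x)
    observed = Counter(zip(x, y))
    row_totals, col_totals = Counter(x), Counter(y)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    cramers_v = sqrt(stat / (n * min(len(row_totals) - 1, len(col_totals) - 1)))
    return stat, dof, cramers_v


# Hypothetical ordinal codes (1 = lowest band) for eight customers.
age_group = [1, 1, 2, 2, 3, 3, 4, 4]
income_group = [1, 2, 1, 3, 2, 4, 3, 4]
print("Spearman rho:", round(spearman_rho(age_group, income_group), 3))

# Hypothetical nominal variables.
region = ["north", "north", "south", "south", "east", "east"]
channel = ["web", "web", "store", "store", "web", "store"]
stat, dof, v = chi_square(region, channel)
print("Chi-square:", round(stat, 3), "dof:", dof, "Cramer's V:", round(v, 3))
```

A Spearman coefficient near +1 or -1, or a Cramér's V near 1, suggests the two variables carry largely the same information.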

For a categorical and a continuous variable, multicollinearity can be measured with a t-test (if the categorical variable has 2 categories) or ANOVA (if it has more than 2 categories).
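
The categorical-versus-continuous case can be sketched the same way. The snippet below is a minimal standard-library Python example with a hypothetical function name and made-up data. It computes the one-way ANOVA F statistic and eta squared (the share of variance in the continuous variable explained by the categories). With exactly two categories, this F statistic equals the square of the pooled-variance t statistic, so one function covers both cases. P-values are not computed here; SciPy's `scipy.stats.ttest_ind` and `scipy.stats.f_oneway` would provide them.

```python
from collections import defaultdict


def one_way_anova(categories, values):
    """Return (F statistic, eta squared) for values grouped by category.

    Assumes at least two categories and some variation within groups.
    """
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)

    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared


# Hypothetical data: education level (3 categories) vs. income in thousands.
education = ["school", "school", "school", "college", "college",
             "college", "masters", "masters", "masters"]
income = [22, 25, 24, 31, 35, 33, 44, 41, 47]
f_stat, eta_sq = one_way_anova(education, income)
print("F:", round(f_stat, 2), "eta squared:", round(eta_sq, 3))
```

An eta squared close to 1 means the categorical variable explains most of the variation in the continuous one, which points to strong overlap between the two predictors.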
About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he has worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and Human Resource.

4 Responses to "Detecting Multicollinearity in Categorical Variables"
  1. Hi,

    Thanks for your great work. Could you please explain this in more detail? For example, age group and income group are correlated, but which test would we use in SAS to quantify this relationship?

  2. One question
    Which test is applicable when checking multicollinearity between a categorical and a continuous variable?

  3. If both the dependent and independent variables are categorical, how can a multicollinearity test be done?

    1. Multicollinearity means "Independent variables are highly correlated to each other".
      Your response or dependent variable is not considered while checking multicollinearity.

