In regression and tree models, the independent variables should be checked for multicollinearity; regression models in particular assume that it is absent. Multicollinearity means "independent variables are highly correlated with each other".
For categorical variables, multicollinearity can be detected with the Spearman rank correlation coefficient (ordinal variables) or the chi-square test (nominal variables).
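As a rough illustration of the two checks above, here is a minimal Python sketch that uses only the standard library. The variable names (`region`, `plan`, `age_group`, `income_group`) and the data are hypothetical, invented for the example. Cramér's V is an addition not mentioned in the text: it rescales the chi-square statistic to a 0 to 1 strength measure, since the chi-square test by itself only indicates whether an association exists, not how strong it is.

```python
from collections import Counter
from math import sqrt


def chi_square_and_cramers_v(x, y):
    """Return (chi-square statistic, Cramér's V) for two nominal variables."""
    n = len(x)
    observed = Counter(zip(x, y))  # cell counts of the contingency table
    row_totals = Counter(x)
    col_totals = Counter(y)
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n  # count expected under independence
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    v = sqrt(chi2 / (n * k)) if k > 0 else 0.0
    return chi2, v


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical nominal variables.
region = ["north", "north", "south", "south", "north", "south", "north", "south"]
plan = ["basic", "basic", "premium", "premium", "basic", "premium", "premium", "basic"]
print(chi_square_and_cramers_v(region, plan))

# Hypothetical ordinal variables, coded 1 (lowest) to 4 (highest).
age_group = [1, 1, 2, 2, 3, 3, 4, 4]
income_group = [1, 2, 1, 2, 3, 3, 4, 4]
print(spearman_rho(age_group, income_group))
```

The sketch returns statistics only, not p-values. If SciPy is available, `scipy.stats.chi2_contingency` and `scipy.stats.spearmanr` report p-values as well.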
For a categorical and a continuous variable, multicollinearity can be assessed with a t-test (if the categorical variable has 2 categories) or ANOVA (more than 2 categories).
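The categorical-versus-continuous check can be sketched in the same way. The following standard-library Python example is illustrative only: the variable names (`gender`, `segment`, `income`, `spend`) and the data are hypothetical, the t statistic assumes equal variances (the pooled form), and eta squared is an added effect-size measure, not something the text prescribes.

```python
from collections import defaultdict
from math import sqrt


def group_values(categories, values):
    """Collect the continuous values observed under each category."""
    groups = defaultdict(list)
    for c, v in zip(categories, values):
        groups[c].append(v)
    return groups


def one_way_anova(categories, values):
    """Return (F statistic, eta squared) for a one-way ANOVA."""
    groups = group_values(categories, values)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = 0.0  # variation of the group means around the grand mean
    ss_within = 0.0   # variation of the values around their own group mean
    for g in groups.values():
        mean_g = sum(g) / len(g)
        ss_between += len(g) * (mean_g - grand_mean) ** 2
        ss_within += sum((v - mean_g) ** 2 for v in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared


def pooled_t_statistic(categories, values):
    """Student's t statistic for exactly two categories (equal variances assumed)."""
    groups = group_values(categories, values)
    if len(groups) != 2:
        raise ValueError("the t-test needs exactly two categories")
    a, b = groups.values()
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss = sum((v - mean_a) ** 2 for v in a) + sum((v - mean_b) ** 2 for v in b)
    pooled_var = ss / (len(a) + len(b) - 2)
    return (mean_a - mean_b) / sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


# Two categories: t-test (hypothetical data).
gender = ["f", "f", "f", "m", "m", "m"]
income = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(pooled_t_statistic(gender, income))

# More than two categories: ANOVA (hypothetical data).
segment = ["a", "a", "b", "b", "c", "c"]
spend = [1.0, 3.0, 4.0, 6.0, 8.0, 10.0]
print(one_way_anova(segment, spend))
```

With exactly two categories the two approaches agree: the ANOVA F statistic equals the square of the pooled t statistic. For p-values, `scipy.stats.ttest_ind` and `scipy.stats.f_oneway` are the usual choices if SciPy is installed.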
Hi,
Thanks for your great work. Could you please explain this in more detail? Take age group and income group, for example. Both are correlated, but which test would we use in SAS to quantify this relationship?
One question:
What test is applicable when checking multicollinearity between a categorical and a continuous variable?
If both the dependent and independent variables are categorical, how can a multicollinearity test be done?
Multicollinearity means "independent variables are highly correlated with each other".
Your response (dependent) variable is not considered when checking multicollinearity.
If two categorical variables are significantly associated, can we use both variables in a logistic regression model?
If two categorical variables are significantly associated, you should not use both in a logistic regression model. This is because one of the assumptions of the logistic regression model is the absence of multicollinearity among its independent features. Hence, use only one of them, or engineer a new feature that combines both.