The purpose of this post is to help you understand the difference between linear regression and logistic regression. These regression techniques are two most popular statistical techniques that are generally used practically in various domains. Since these techniques are taught in universities, their usage level is very high in predictive modeling world. In this article, we have listed down 13 differences between these two algorithms.

While Binary logistic regression requires the dependent variable to be binary - two categories only (0/1). Multinominal or ordinary logistic regression can have dependent variable with more than two categories.

While logistic regression is based on

**Difference between Linear and Logistic Regression**

**1. Variable Type :**Linear regression requires the dependent variable to be continuous i.e. numeric values (no categories or groups).While Binary logistic regression requires the dependent variable to be binary - two categories only (0/1). Multinominal or ordinary logistic regression can have dependent variable with more than two categories.

**2. Algorithm :**Linear regression is based on

*least square estimation*which

**says regression coefficients should be chosen in such a way that it minimizes the sum of the squared distances of each observed response to its fitted value.**

While logistic regression is based on

**which says coefficients should be chosen in such a way that it maximizes the**

*Maximum Likelihood Estimation***Probability of Y given X**(likelihood). With ML, the computer uses different "iterations" in which it tries different solutions until it gets the maximum likelihood estimates.

**3. Equation :**

**Multiple Regression Equation :**

Y is target or dependent variable, b0 is intercept. x1,x2,x3...xk are predictors or independent variables. b1,b2,b3....bk is coefficients of respective predictors.

**Logistic Regression Equation :**

**P(y=1) =**e(b0 + b1x1 + b2x2 +-----bkxk) / (1+e(b0+b1x1+ b2x2+------bkxk))

Which further simplifies to :

**P(y=1) =**1 / (1+ exp

**-**(b0+b1x1+ b2x2+------bkxk))

Logistic Regression Equation |

**The above function is called logistic or sigmoid function.**

**4. Curve :**

**Linear Regression : Straight line**

Straight Line : Linear Regression |

**Linear regression**aims at finding the best-fitting straight line which is also called a regression line. In the above figure, the red diagonal line is the best-fitting straight line and consists of the predicted score on Y for each possible value of X. The distance between the points to the regression line represent the errors.

**Logistic Regression : S Curve**

Logistic S-shaped curve |

**5. Linear Relationship :**Linear regression needs a linear relationship between the dependent and independent variables. While logistic regression does not need a linear relationship between the dependent and independent variables.

**6. Normality of Residual :**Linear regression requires error term should be normally distributed. While logistic regression does not require error term should be normally distributed.

**7. Homoscedasticity :**Linear regression assumes that residuals are approximately equal for all predicted dependent variable values. While Logistic regression does not need residuals to be equal for each level of the predicted dependent variable values.

**8. Sample Size :**Linear regression requires 5 cases per independent variable in the analysis.While logistic regression needs at least 10 events per independent variable.

**9. Purpose :**Linear regression is used to estimate the dependent variable incase of a change in independent variables

**.**For example, relationship between number of hours studied and your grades.

Whereas logistic regression is used to calculate the probability of an event. For example, an event can be whether customer will attrite or not in next 6 months.

**10. Interpretation :**Betas or Coefficients of linear regression is interpreted like below -

Keeping all other independent variables constant, how much the dependent variable is expected to increase/decrease with an unit increase in the independent variable.

In logistic regression, we interpret odd ratios -

The effect of a one unit of change in X in the predicted odds ratio with the other variables in the model held constant.

**11. Distribution :**

**Linear regression**assumes normal or gaussian distribution of dependent variable. Whereas,

**Logistic regression**assumes binomial distribution of dependent variable.

**Note :**Gaussian is the same as the normal distribution. See the implementation in R below -

**R Code :**

Create sample data by running the following script

set.seed(123)

y = ifelse(runif(100) < 0.5, 1,0)

x = sample(1:100,100)

y1 = sample(100:1000,100, replace=T)

**Linear Regression**

glm(y1 ~ x, family =gaussian(link = "identity"))

Coefficients:

(Intercept) x

600.6152 -0.8631

Degrees of Freedom: 99 Total (i.e. Null); 98 Residual

Null Deviance: 7339000

Residual Deviance: 7277000 AIC: 1409

**Logistic Regression**

glm(y ~ x, family =Coefficients:binomial(link = "logit"))

(Intercept) x

-0.024018 0.005279

Degrees of Freedom: 99 Total (i.e. Null); 98 Residual

Null Deviance: 137.2

Residual Deviance: 136.6 AIC: 140.6

**12. Link Function**

Linear regression uses

**Identity**link function of gaussian family. Whereas, logistic regression uses

**Logit**function of Binomial family.

**13.**

**Computational Time**: Linear regression is very fast as compared to logistic regression as logistic regression is an iterative process of maximum likelihood.

good explanation

ReplyDeletevery helpfull. i made my assignment easily

ReplyDeletethanks

good

ReplyDeleteits very helful for me doing my assignment but i need 8 more differences and 3 similarities. by the way thankx

ReplyDeleteI have expanded the list of difference between these two algorithms. Thanks!

DeleteI love the way you explain things. Your site has helped me immensely! I can't thank you enough! Keep up the good work!

ReplyDeleteA helpful post. Tks you very much! I have a data with over 48000 observes and category outcome (0 and 1). https://drive.google.com/file/d/0B0viyEhVYPseQlJ5d2pZa3RKc0E/view?usp=sharingBad rate = 80%. I run 2 model for that data: linear and logistics regression. I want to know which model is better? So I have a question: Which common criterions do compare good level of them? In linear model, we have R-squared. Similarity, we have GINI index, -2Loglikehood,sensitivity,specificity,area under ROC curve,... in logistics regression. But have not common criterions to evaluate good level of 2 model. Can you help me? If you can, please give me SAS code for it.

ReplyDeleteA helpful post. Tks you very much! I have a data with over 48000 observes and category outcome (0 and 1). Here is a link to my data: https://drive.google.com/file/d/0B0viyEhVYPseQlJ5d2pZa3RKc0E/view?usp=sharing . I run 2 model for that data: linear and logistics regression. I want to know which model is better? So I have a question: Which common criterions do compare good level of them? In linear model, we have R-squared. Similarity, we have GINI index, -2Loglikehood,sensitivity,specificity,area under ROC curve,... in logistics regression. But have not common criterions to evaluate good level of 2 model. Can you help me? If you can, please give me SAS code for it. Thank you!

ReplyDeletevery helpful

ReplyDeleteIf the independent variable is discrete but continuous...wat technique we can use?

ReplyDeleteCorrection: dependent variable

DeleteDependent variable can not be discrete and continous at the same time. It can be onlt any one at a time. As pointed in article, for continuous variable, you can use linear regression. For discrete variable, you can make categories out of them and then use normal logistic regression.

Delete