Difference between Linear Regression and Logistic Regression

Deepanshu Bhalla 23 Comments , , ,
The purpose of this post is to help you understand the difference between linear regression and logistic regression. These regression techniques are two most popular statistical techniques that are generally used practically in various domains. Since these techniques are taught in universities, their usage level is very high in predictive modeling world. In this article, we have listed down 13 differences between these two algorithms.

Difference between Linear and Logistic Regression

1. Variable Type : Linear regression requires the dependent variable to be continuous i.e. numeric values (no categories or groups).

While Binary logistic regression requires the dependent variable to be binary - two categories only (0/1). Multinominal or ordinary logistic regression can have dependent variable with more than two categories.


2. Algorithm :  Linear regression is based on least square estimation which says regression coefficients should be chosen in such a way that it minimizes the sum of the squared distances of each observed response to its fitted value.

While logistic regression is based on Maximum Likelihood Estimation which says coefficients should be chosen in such a way that it maximizes the Probability of Y given X (likelihood). With ML, the computer uses different "iterations" in which it tries different solutions until it gets the maximum likelihood estimates.

3. Equation : 

Multiple Regression Equation :
Linear Regression Equation

Y is target or dependent variable, b0 is intercept. x1,x2,x3...xk are predictors or independent variables. b1,b2,b3....bk is coefficients of respective predictors.

Logistic Regression Equation :

P(y=1) = e(b0 + b1x1 + b2x2 +-----bkxk) / (1+e(b0+b1x1+ b2x2+------bkxk))

Which further simplifies to :

P(y=1) = 1 / (1+ exp -(b0+b1x1+ b2x2+------bkxk))
Logistic Regression Equation

The above function is called logistic or sigmoid function.

4. Curve : 

Linear Regression : Straight line


Straight Line : Linear Regression

Linear regression aims at finding the best-fitting straight line which is also called a regression line. In the above figure, the red diagonal line is the best-fitting straight line and consists of the predicted score on Y for each possible value of X. The distance between the points to the regression line represent the errors.

Logistic Regression : S Curve
Logistic S-shaped curve
Changing the coefficient leads to change in both the direction and the steepness of the logistic function. It means positive slopes result in an S-shaped curve and negative slopes result in a Z-shaped curve.

5. Linear Relationship : Linear regression needs a linear relationship between the dependent and independent variables. While logistic regression does not need a linear relationship between the dependent and independent variables.

6. Normality of Residual : Linear regression requires error term should be normally distributed. While logistic regression does not require error term should be normally distributed.

7. Homoscedasticity :  Linear regression assumes that residuals are approximately equal for all predicted dependent variable values. While Logistic regression does not need residuals to be equal for each level of the predicted dependent variable values.

8. Sample Size : Linear regression requires 5 cases per independent variable in the analysis.While logistic regression needs at least 10 events per independent variable.

9. Purpose : Linear regression is used to estimate the dependent variable incase of a change in independent variables. For example, relationship between number of hours studied and your grades.
Whereas logistic regression is used to calculate the probability of an event. For example, an event can be whether customer will attrite or not in next 6 months.

10. Interpretation : Betas or Coefficients of linear regression is interpreted like below -
Keeping all other independent variables constant, how much the dependent variable is expected to increase/decrease with an unit increase in the independent variable.
In logistic regression, we interpret odd ratios -
The effect of a one unit of change in X in the predicted odds ratio with the other variables in the model held constant.
11. Distribution :

Linear regression assumes normal or gaussian distribution of dependent variable. Whereas, Logistic regression assumes binomial distribution of dependent variable. Note : Gaussian is the same as the normal distribution. See the implementation in R below -

R Code :
Create sample data by running the following script
set.seed(123)
y = ifelse(runif(100) < 0.5, 1,0)
x = sample(1:100,100)
y1 = sample(100:1000,100, replace=T)
Linear Regression
glm(y1 ~ x, family = gaussian(link = "identity"))
Coefficients:
(Intercept)            x  
   600.6152      -0.8631  

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:    7339000 
Residual Deviance: 7277000 AIC: 1409

Logistic Regression
glm(y ~ x, family = binomial(link = "logit"))
Coefficients:
(Intercept)            x
  -0.024018     0.005279

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:    137.2
Residual Deviance: 136.6 AIC: 140.6

12. Link Function

Linear regression uses Identity link function of gaussian family. Whereas, logistic regression uses Logit function of Binomial family.

13. Computational Time: Linear regression is very fast as compared to logistic regression as logistic regression is an iterative process of maximum likelihood.
Related Posts
Spread the Word!
Share
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

23 Responses to "Difference between Linear Regression and Logistic Regression"
  1. good explanation

    ReplyDelete
  2. very helpfull. i made my assignment easily
    thanks

    ReplyDelete
  3. its very helful for me doing my assignment but i need 8 more differences and 3 similarities. by the way thankx

    ReplyDelete
    Replies
    1. I have expanded the list of difference between these two algorithms. Thanks!

      Delete
  4. I love the way you explain things. Your site has helped me immensely! I can't thank you enough! Keep up the good work!

    ReplyDelete
  5. A helpful post. Tks you very much! I have a data with over 48000 observes and category outcome (0 and 1). https://drive.google.com/file/d/0B0viyEhVYPseQlJ5d2pZa3RKc0E/view?usp=sharingBad rate = 80%. I run 2 model for that data: linear and logistics regression. I want to know which model is better? So I have a question: Which common criterions do compare good level of them? In linear model, we have R-squared. Similarity, we have GINI index, -2Loglikehood,sensitivity,specificity,area under ROC curve,... in logistics regression. But have not common criterions to evaluate good level of 2 model. Can you help me? If you can, please give me SAS code for it.

    ReplyDelete
  6. A helpful post. Tks you very much! I have a data with over 48000 observes and category outcome (0 and 1). Here is a link to my data: https://drive.google.com/file/d/0B0viyEhVYPseQlJ5d2pZa3RKc0E/view?usp=sharing . I run 2 model for that data: linear and logistics regression. I want to know which model is better? So I have a question: Which common criterions do compare good level of them? In linear model, we have R-squared. Similarity, we have GINI index, -2Loglikehood,sensitivity,specificity,area under ROC curve,... in logistics regression. But have not common criterions to evaluate good level of 2 model. Can you help me? If you can, please give me SAS code for it. Thank you!

    ReplyDelete
  7. If the independent variable is discrete but continuous...wat technique we can use?

    ReplyDelete
    Replies
    1. Correction: dependent variable

      Delete
    2. Dependent variable can not be discrete and continous at the same time. It can be onlt any one at a time. As pointed in article, for continuous variable, you can use linear regression. For discrete variable, you can make categories out of them and then use normal logistic regression.

      Delete
  8. Can anyone explain in detail about odds ratio?

    Thanks in advance

    ReplyDelete
  9. An excellent explanation. Very useful indeed Thanks

    ReplyDelete
  10. Superb explanations about both terms.
    Excellent

    ReplyDelete
  11. Very relevant article***

    ReplyDelete
  12. Can you add a reference to support the minimum sample size of 5 for the linear regression, or explain why did you chose 5?

    ReplyDelete
  13. Good explanation very helpful to understand difference between both algorithm

    ReplyDelete
Next → ← Prev