Difference between Linear Regression and Logistic Regression

Live Online Training : Predictive Modeling using SAS

- Explain Advanced Algorithms in Simple English
- Live Projects & Case Studies
- Domain Knowledge
- Job Placement Assistance

The purpose of this post is to help you understand the difference between linear regression and logistic regression. These regression techniques are two most popular statistical techniques that are generally used practically in various domains. Since these techniques are taught in universities, their usage level is very high in predictive modeling world. In this article, we have listed down 13 differences between these two algorithms.

Difference between Linear and Logistic Regression

1. Variable Type : Linear regression requires the dependent variable to be continuous i.e. numeric values (no categories or groups).

While Binary logistic regression requires the dependent variable to be binary - two categories only (0/1). Multinominal or ordinary logistic regression can have dependent variable with more than two categories.


2. Algorithm :  Linear regression is based on least square estimation which says regression coefficients should be chosen in such a way that it minimizes the sum of the squared distances of each observed response to its fitted value.

While logistic regression is based on Maximum Likelihood Estimation which says coefficients should be chosen in such a way that it maximizes the Probability of Y given X (likelihood). With ML, the computer uses different "iterations" in which it tries different solutions until it gets the maximum likelihood estimates.

3. Equation : 

Multiple Regression Equation :
Linear Regression Equation

Y is target or dependent variable, b0 is intercept. x1,x2,x3...xk are predictors or independent variables. b1,b2,b3....bk is coefficients of respective predictors.

Logistic Regression Equation :

P(y=1) = e(b0 + b1x1 + b2x2 +-----bkxk) / (1+e(b0+b1x1+ b2x2+------bkxk))

Which further simplifies to :

P(y=1) = 1 / (1+ exp -(b0+b1x1+ b2x2+------bkxk))
Logistic Regression Equation

The above function is called logistic or sigmoid function.

4. Curve : 

Linear Regression : Straight line


Straight Line : Linear Regression

Linear regression aims at finding the best-fitting straight line which is also called a regression line. In the above figure, the red diagonal line is the best-fitting straight line and consists of the predicted score on Y for each possible value of X. The distance between the points to the regression line represent the errors.

Logistic Regression : S Curve
Logistic S-shaped curve
Changing the coefficient leads to change in both the direction and the steepness of the logistic function. It means positive slopes result in an S-shaped curve and negative slopes result in a Z-shaped curve.

5. Linear Relationship : Linear regression needs a linear relationship between the dependent and independent variables. While logistic regression does not need a linear relationship between the dependent and independent variables.

6. Normality of Residual : Linear regression requires error term should be normally distributed. While logistic regression does not require error term should be normally distributed.

7. Homoscedasticity :  Linear regression assumes that residuals are approximately equal for all predicted dependent variable values. While Logistic regression does not need residuals to be equal for each level of the predicted dependent variable values.

8. Sample Size : Linear regression requires 5 cases per independent variable in the analysis.While logistic regression needs at least 10 events per independent variable.

9. Purpose : Linear regression is used to estimate the dependent variable incase of a change in independent variables. For example, relationship between number of hours studied and your grades.
Whereas logistic regression is used to calculate the probability of an event. For example, an event can be whether customer will attrite or not in next 6 months.

10. Interpretation : Betas or Coefficients of linear regression is interpreted like below -
Keeping all other independent variables constant, how much the dependent variable is expected to increase/decrease with an unit increase in the independent variable.
In logistic regression, we interpret odd ratios -
The effect of a one unit of change in X in the predicted odds ratio with the other variables in the model held constant.
11. Distribution :

Linear regression assumes normal or gaussian distribution of dependent variable. Whereas, Logistic regression assumes binomial distribution of dependent variable. Note : Gaussian is the same as the normal distribution. See the implementation in R below -

R Code :
Create sample data by running the following script
set.seed(123)
y = ifelse(runif(100) < 0.5, 1,0)
x = sample(1:100,100)
y1 = sample(100:1000,100, replace=T)
Linear Regression
glm(y1 ~ x, family = gaussian(link = "identity"))
Coefficients:
(Intercept)            x  
   600.6152      -0.8631  

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:    7339000 
Residual Deviance: 7277000 AIC: 1409

Logistic Regression
glm(y ~ x, family = binomial(link = "logit"))
Coefficients:
(Intercept)            x
  -0.024018     0.005279

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:    137.2
Residual Deviance: 136.6 AIC: 140.6

12. Link Function

Linear regression uses Identity link function of gaussian family. Whereas, logistic regression uses Logit function of Binomial family.

13. Computational Time: Linear regression is very fast as compared to logistic regression as logistic regression is an iterative process of maximum likelihood.

Statistics Tutorials : 50 Statistics Tutorials

About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has close to 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains like retail and commercial banking, Telecom, HR and Automotive.


While I love having friends who agree, I only learn from those who don't.

Let's Get Connected: Email | LinkedIn

Get Free Email Updates :
*Please confirm your email address by clicking on the link sent to your Email*

Related Posts:

11 Responses to "Difference between Linear Regression and Logistic Regression"

  1. good explanation

    ReplyDelete
  2. very helpfull. i made my assignment easily
    thanks

    ReplyDelete
  3. its very helful for me doing my assignment but i need 8 more differences and 3 similarities. by the way thankx

    ReplyDelete
    Replies
    1. I have expanded the list of difference between these two algorithms. Thanks!

      Delete
  4. I love the way you explain things. Your site has helped me immensely! I can't thank you enough! Keep up the good work!

    ReplyDelete
  5. A helpful post. Tks you very much! I have a data with over 48000 observes and category outcome (0 and 1). https://drive.google.com/file/d/0B0viyEhVYPseQlJ5d2pZa3RKc0E/view?usp=sharingBad rate = 80%. I run 2 model for that data: linear and logistics regression. I want to know which model is better? So I have a question: Which common criterions do compare good level of them? In linear model, we have R-squared. Similarity, we have GINI index, -2Loglikehood,sensitivity,specificity,area under ROC curve,... in logistics regression. But have not common criterions to evaluate good level of 2 model. Can you help me? If you can, please give me SAS code for it.

    ReplyDelete
  6. A helpful post. Tks you very much! I have a data with over 48000 observes and category outcome (0 and 1). Here is a link to my data: https://drive.google.com/file/d/0B0viyEhVYPseQlJ5d2pZa3RKc0E/view?usp=sharing . I run 2 model for that data: linear and logistics regression. I want to know which model is better? So I have a question: Which common criterions do compare good level of them? In linear model, we have R-squared. Similarity, we have GINI index, -2Loglikehood,sensitivity,specificity,area under ROC curve,... in logistics regression. But have not common criterions to evaluate good level of 2 model. Can you help me? If you can, please give me SAS code for it. Thank you!

    ReplyDelete
  7. If the independent variable is discrete but continuous...wat technique we can use?

    ReplyDelete
    Replies
    1. Correction: dependent variable

      Delete

Next → ← Prev