Difference between Linear Regression and Logistic Regression

The purpose of this post is to help you understand the difference between linear regression and logistic regression. These regression techniques are two most popular statistical techniques that are generally used practically in various domains. Since these techniques are taught in universities, their usage level is very high in predictive modeling world. In this article, we have listed down 13 differences between these two algorithms.

Difference between Linear and Logistic Regression

1. Variable Type : Linear regression requires the dependent variable to be continuous i.e. numeric values (no categories or groups).

While Binary logistic regression requires the dependent variable to be binary - two categories only (0/1). Multinominal or ordinary logistic regression can have dependent variable with more than two categories.

2. Algorithm : Linear regression is based on least square estimation which says regression coefficients should be chosen in such a way that it minimizes the sum of the squared distances of each observed response to its fitted value.

While logistic regression is based on Maximum Likelihood Estimation which says coefficients should be chosen in such a way that it maximizes the Probability of Y given X (likelihood). With ML, the computer uses different "iterations" in which it tries different solutions until it gets the maximum likelihood estimates.

3. Equation :

Multiple Regression Equation :

Linear Regression Equation

Y is target or dependent variable, b0 is intercept. x1,x2,x3...xk are predictors or independent variables. b1,b2,b3....bk is coefficients of respective predictors.

Logistic Regression Equation :

P(y=1) = e(b0 + b1x1 + b2x2 +-----bkxk) / (1+e(b0+b1x1+ b2x2+------bkxk))

Which further simplifies to :

P(y=1) = 1 / (1+ exp -(b0+b1x1+ b2x2+------bkxk))

Logistic Regression Equation

The above function is called logistic or sigmoid function.

4. Curve :

Linear Regression : Straight line

Straight Line : Linear Regression

Linear regression aims at finding the best-fitting straight line which is also called a regression line. In the above figure, the red diagonal line is the best-fitting straight line and consists of the predicted score on Y for each possible value of X. The distance between the points to the regression line represent the errors.

Logistic Regression : S Curve

Logistic S-shaped curve

Changing the coefficient leads to change in both the direction and the steepness of the logistic function. It means positive slopes result in an S-shaped curve and negative slopes result in a Z-shaped curve.

5. Linear Relationship : Linear regression needs a linear relationship between the dependent and independent variables. While logistic regression does not need a linear relationship between the dependent and independent variables.

6. Normality of Residual : Linear regression requires error term should be normally distributed. While logistic regression does not require error term should be normally distributed.

7. Homoscedasticity : Linear regression assumes that residuals are approximately equal for all predicted dependent variable values. While Logistic regression does not need residuals to be equal for each level of the predicted dependent variable values.

8. Sample Size : Linear regression requires 5 cases per independent variable in the analysis.While logistic regression needs at least 10 events per independent variable.

9. Purpose : Linear regression is used to estimate the dependent variable incase of a change in independent variables. For example, relationship between number of hours studied and your grades.
Whereas logistic regression is used to calculate the probability of an event. For example, an event can be whether customer will attrite or not in next 6 months.

10. Interpretation : Betas or Coefficients of linear regression is interpreted like below -

Keeping all other independent variables constant, how much the dependent variable is expected to increase/decrease with an unit increase in the independent variable.

In logistic regression, we interpret odd ratios -

The effect of a one unit of change in X in the predicted odds ratio with the other variables in the model held constant.

11. Distribution :

Linear regression assumes normal or gaussian distribution of dependent variable. Whereas, Logistic regression assumes binomial distribution of dependent variable. Note : Gaussian is the same as the normal distribution. See the implementation in R below -

R Code :
Create sample data by running the following script

set.seed(123)
y = ifelse(runif(100) < 0.5, 1,0)
x = sample(1:100,100)
y1 = sample(100:1000,100, replace=T)

Linear Regression

glm(y1 ~ x, family = gaussian(link = "identity"))

Coefficients:

(Intercept) x

600.6152 -0.8631

Degrees of Freedom: 99 Total (i.e. Null); 98 Residual

Null Deviance: 7339000

Residual Deviance: 7277000 AIC: 1409

Logistic Regression

glm(y ~ x, family = binomial(link = "logit"))

Coefficients:
(Intercept) x
-0.024018 0.005279

Degrees of Freedom: 99 Total (i.e. Null); 98 Residual
Null Deviance: 137.2
Residual Deviance: 136.6 AIC: 140.6

12. Link Function

Linear regression uses Identity link function of gaussian family. Whereas, logistic regression uses Logit function of Binomial family.

13. Computational Time: Linear regression is very fast as compared to logistic regression as logistic regression is an iterative process of maximum likelihood.

About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

While I love having friends who agree, I only learn from those who don't
Let's Get Connected Email LinkedIn

Post Comment 23 Responses to "Difference between Linear Regression and Logistic Regression"

AnonymousMay 19, 2015 at 6:50 AM
good explanation
AnonymousJune 11, 2015 at 11:02 AM
very helpfull. i made my assignment easily
thanks
UnknownJanuary 31, 2016 at 3:04 AM
good
UnknownMay 31, 2016 at 5:05 AM
its very helful for me doing my assignment but i need 8 more differences and 3 similarities. by the way thankx
PriyankaJanuary 15, 2017 at 11:15 PM
I love the way you explain things. Your site has helped me immensely! I can't thank you enough! Keep up the good work!
AnonymousJanuary 23, 2017 at 9:09 AM
A helpful post. Tks you very much! I have a data with over 48000 observes and category outcome (0 and 1). https://drive.google.com/file/d/0B0viyEhVYPseQlJ5d2pZa3RKc0E/view?usp=sharingBad rate = 80%. I run 2 model for that data: linear and logistics regression. I want to know which model is better? So I have a question: Which common criterions do compare good level of them? In linear model, we have R-squared. Similarity, we have GINI index, -2Loglikehood,sensitivity,specificity,area under ROC curve,... in logistics regression. But have not common criterions to evaluate good level of 2 model. Can you help me? If you can, please give me SAS code for it.
UnknownJanuary 23, 2017 at 9:29 AM
A helpful post. Tks you very much! I have a data with over 48000 observes and category outcome (0 and 1). Here is a link to my data: https://drive.google.com/file/d/0B0viyEhVYPseQlJ5d2pZa3RKc0E/view?usp=sharing . I run 2 model for that data: linear and logistics regression. I want to know which model is better? So I have a question: Which common criterions do compare good level of them? In linear model, we have R-squared. Similarity, we have GINI index, -2Loglikehood,sensitivity,specificity,area under ROC curve,... in logistics regression. But have not common criterions to evaluate good level of 2 model. Can you help me? If you can, please give me SAS code for it. Thank you!
Pritesh TiwariMarch 7, 2017 at 9:53 AM
very helpful
AnonymousJune 18, 2017 at 1:52 AM
If the independent variable is discrete but continuous...wat technique we can use?
AnonymousMay 23, 2018 at 3:56 AM
Can anyone explain in detail about odds ratio?

Thanks in advance
SanthaJuly 20, 2018 at 11:20 PM
An excellent explanation. Very useful indeed Thanks
UnknownAugust 9, 2018 at 7:57 PM
Clear & crisp 👍
UnknownMarch 29, 2019 at 4:56 PM
Excellent Comparison.
UnknownApril 24, 2019 at 3:09 PM
Superb explanations about both terms.
Excellent
dollymeenakJune 17, 2019 at 10:05 PM
Very nicely explained
AnonymousJuly 24, 2019 at 12:49 AM
Very relevant article***
UnknownOctober 9, 2019 at 8:34 AM
Very nice explanation
UnknownJanuary 4, 2021 at 2:13 AM
NICE ARTICLE
AnonymousJanuary 13, 2021 at 6:49 AM
Can you add a reference to support the minimum sample size of 5 for the linear regression, or explain why did you chose 5?
UnknownFebruary 1, 2022 at 2:43 AM
Good explanation very helpful to understand difference between both algorithm