How to Read a Box Plot

Deepanshu Bhalla
What is a Box Plot?

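A box plot summarizes the distribution of a dataset using its quartiles: the box spans the first quartile (Q1) to the third quartile (Q3), a line inside the box marks the median, and the whiskers extend toward the smallest and largest values that are not outliers. It is useful for identifying outliers and visualizing the skewness of a dataset.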

How to Read Box Plot
Interpretation

Outlier : A value is considered an outlier if it is higher than Q3 + 1.5*(Q3-Q1) or lower than Q1 - 1.5*(Q3-Q1). Here Q1 and Q3 are the first and third quartiles, and Q3-Q1 is the interquartile range (IQR).
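
The outlier rule above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names and the sample data are hypothetical, and the quartiles come from the standard library's `statistics.quantiles` with the "inclusive" method. Other software may compute quartiles slightly differently, so the fences can shift a little.

```python
from statistics import quantiles

def outlier_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) using the k * IQR rule."""
    # Assumption: quartiles use the "inclusive" interpolation method.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """Return the values that lie outside the fences."""
    lower, upper = outlier_fences(values, k)
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample: Q1 = 5 and Q3 = 9, so IQR = 4.
scores = [2, 4, 5, 6, 7, 8, 9, 10, 30]
lower, upper = outlier_fences(scores)
print(lower, upper)           # -1.0 15.0
print(find_outliers(scores))  # [30]
```

Only 30 falls outside the range from -1 to 15, so it is the single point a box plot of this sample would draw beyond the whiskers.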

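Symmetric or Normal Distribution : If the box and whiskers are roughly balanced around the median (the median sits near the middle of the box and the whiskers are of similar length), the distribution is approximately symmetric. A normal distribution produces this shape, but a symmetric box plot alone does not prove that the data are normal.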

Positively Skewed : For a distribution that is positively skewed, the box plot shows the median closer to the lower (bottom) quartile, and the upper whisker is typically longer than the lower one.

A distribution is usually "Positively Skewed" when mean > median. It means the data have a long tail of high values: most scores are concentrated at the lower end, while a few high scores pull the mean above the median.

Negatively Skewed : For a distribution that is negatively skewed, the box plot shows the median closer to the upper (top) quartile, and the lower whisker is typically longer than the upper one.

A distribution is usually "Negatively Skewed" when mean < median. It means the data have a long tail of low values: most scores are concentrated at the higher end, while a few low scores pull the mean below the median.
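
One way to turn "the median is closer to one quartile" into a number is the quartile skewness coefficient (often called Bowley's coefficient), (Q3 + Q1 - 2*median) / (Q3 - Q1). The sketch below is only an illustration under stated assumptions: the function names, the tolerance of 0.1 and the three small datasets are hypothetical, and quartiles again use the "inclusive" method from Python's standard library.

```python
from statistics import mean, median, quantiles

def quartile_skewness(values):
    """Bowley's coefficient: (Q3 + Q1 - 2*median) / (Q3 - Q1)."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    # Note: undefined (division by zero) when Q1 equals Q3.
    return (q3 + q1 - 2 * med) / (q3 - q1)

def describe_skew(values, tolerance=0.1):
    """Label the shape from where the median sits inside the box."""
    skew = quartile_skewness(values)
    if skew > tolerance:
        return "positively skewed"   # median nearer to Q1
    if skew < -tolerance:
        return "negatively skewed"   # median nearer to Q3
    return "roughly symmetric"

# Hypothetical datasets for illustration.
right_tail = [1, 2, 2, 3, 3, 4, 6, 9, 15]
left_tail = [1, 7, 10, 12, 13, 13, 14, 14, 15]
balanced = [1, 2, 3, 4, 5, 6, 7, 8, 9]

for name, data in [("right_tail", right_tail),
                   ("left_tail", left_tail),
                   ("balanced", balanced)]:
    print(name, describe_skew(data), mean(data) > median(data))
```

In these samples the label from the box agrees with the mean versus median comparison, but the two checks measure different things (the middle half of the data versus the whole dataset), so they can disagree on real data.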
Box Plots vs. Histogram

Box plots are useful for comparing distributions across different groups or datasets, whereas histograms are useful for visualizing the frequency distribution of a single dataset.
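
The difference can be seen in the numbers each chart is built from. The following sketch is an assumption-laden illustration (hypothetical function names and made-up scores, standard library only): a box plot reduces each group to five numbers that are easy to place side by side, while a histogram keeps a count for every bin of a single dataset. For simplicity the summary uses the raw minimum and maximum and ignores outlier handling.

```python
from collections import Counter
from statistics import quantiles

def five_number_summary(values):
    """The five numbers a box plot draws: min, Q1, median, Q3, max."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))

def histogram_counts(values, bin_width):
    """Count the values in each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical scores for two groups.
group_a = [52, 55, 58, 60, 61, 63, 65, 70, 74]
group_b = [40, 48, 55, 62, 66, 71, 78, 85, 95]

# Box plot view: one compact summary per group, easy to compare.
summary_a = five_number_summary(group_a)
summary_b = five_number_summary(group_b)
print(summary_a)  # (52, 58.0, 61.0, 65.0, 74)
print(summary_b)  # (40, 55.0, 66.0, 78.0, 95)

# Histogram view: the frequency of each bin within one dataset.
hist_a = histogram_counts(group_a, bin_width=10)
print(hist_a)     # {50: 3, 60: 4, 70: 2}
```

Comparing the two summaries shows at a glance that group_b is more spread out, while the bin counts show the shape of group_a in more detail.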

About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

9 Responses to "How to Read a Box Plot"
  1. Thanks a lot for the wonderful explanation. This site is very useful and I love it. Please help us learn more about basic and advanced statistical techniques.
    Thanks in advance.

    1. Thank you so much for your appreciation!! Check out this article - Cluster Analysis using SAS
      http://www.listendata.com/2014/10/cluster-analysis-using-sas.html

  2. Thanks for sharing the above information. This really helped me.
    I believe a box plot is the best way to identify outliers in our linear regression model.
    To create a box plot I specify the plot option in PROC UNIVARIATE in SAS. Do you know any other procedure or option with which we can create a box plot and make it more presentable?

    1. Glad you found it useful. Yes, you can customize the box plot by using PROC BOXPLOT.

  3. Thank you for your articles, they are quite helpful!
    I am having a hard time with the fences though, because the distance from the lower fence to Q1 and the distance from the upper fence to Q3 don't look equal in general. Any help clarifying my confusion would be greatly appreciated.

  4. Could you please explain to me how to judge whether the kurtosis value is positive, negative or zero based on box plot examples? It has been asked in Predictive modeling using SAS e-miner global certification examination.

  5. Thanks, but I think this can be confusing:
    "It means the data constitute higher frequency of high valued scores"

    The use of the word frequency will imply to some that there are more data points above the median. That isn't the case.
    What it is telling me is that the range of scores below the median is smaller than the range of scores above the median, and that's why the mean can be greater than the median and present positive skew.

  6. What if the mean > median, but the box plot shows the median closer to the top quartile?
