Descriptive Statistics

Deepanshu Bhalla
Descriptive statistics answer the following questions:
  • What is the value that best describes the data set?
  • How much does the data set spread from its average value?
  • What are the smallest and largest numbers in the data set?

They provide summary statistics, including Mean, Standard Error, Median, Mode, Standard Deviation, Variance, Kurtosis, Skewness, Range, Minimum, Maximum, Sum, and Count.

Measure of Central Tendency
A measure of central tendency describes a whole set of data with a single value that represents the centre of its distribution.
There are three main measures of central tendency: the mode, the median and the mean. 

Mean, Median and Mode

Mean
It is the sum of the observations divided by the sample size.

The mean of the values 5,6,6,8,9,9,9,9,10,10 is (5+6+6+8+9+9+9+9+10+10)/10 = 8.1
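
The same calculation can be sketched in Python. This is a minimal illustration using the standard-library statistics module; the variable names are chosen here for illustration only.

```python
import statistics

# The ten observations from the example above
values = [5, 6, 6, 8, 9, 9, 9, 9, 10, 10]

# Mean = sum of the observations divided by the sample size
manual_mean = sum(values) / len(values)

# statistics.mean performs the same calculation
library_mean = statistics.mean(values)

print(manual_mean)   # 8.1
print(library_mean)  # 8.1
```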

Limitation :  
It is affected by extreme values. Very large or very small numbers can distort the answer.
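
Median
It is the middle value when the data are sorted. It splits the data in half: half of the data are above the median and half are below it. When the number of observations is even, the median is the average of the two middle values.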

Advantage :  
It is not distorted by extreme values. A few very large or very small numbers barely move it.
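
To see the difference in practice, the sketch below adds one extreme value to the data set used earlier and compares the mean and the median before and after. The value 1000 is a hypothetical outlier chosen only for illustration.

```python
import statistics

values = [5, 6, 6, 8, 9, 9, 9, 9, 10, 10]

# Hypothetical extreme value appended for illustration
with_outlier = values + [1000]

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(values)
median_after = statistics.median(with_outlier)

# The mean jumps from 8.1 to roughly 98, while the median stays at 9
print(mean_before, round(mean_after, 2))
print(median_before, median_after)
```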

Mode
It is the value that occurs most frequently in a dataset.

Advantage :  
It can be used when the data is not numerical.

Disadvantage :
1. There may be no mode at all if no value is repeated.
2. There may be more than one mode.
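
The cases above can be sketched with statistics.multimode, which is available in Python 3.8 and later and returns every value that ties for the highest frequency. The small lists other than the first one are hypothetical examples.

```python
import statistics

# One clear mode
single = statistics.multimode([5, 6, 6, 8, 9, 9, 9, 9, 10, 10])
print(single)  # [9]

# Two values tie for most frequent, so there are two modes
tied = statistics.multimode([1, 2, 2, 3, 3])
print(tied)  # [2, 3]

# No value repeats: every value is returned, so there is no useful mode
no_repeat = statistics.multimode([1, 2, 3])
print(no_repeat)  # [1, 2, 3]

# The mode also works for non-numerical data (hypothetical colours)
colours = statistics.multimode(["red", "blue", "red", "green"])
print(colours)  # ['red']
```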

When to use mean, median and mode?
Mean – When your data is not skewed, i.e. it is roughly symmetric (for example, normally distributed) and there are no extreme values (outliers) in the data set.

Median – When your data is skewed or you are dealing with ordinal (ordered categories) data, e.g. a Likert scale: 1. Strongly dislike, 2. Dislike, 3. Neutral, 4. Like, 5. Strongly like.

Mode – When dealing with nominal (unordered categories) data.

For example, suppose a company is considering expanding into a new area and is studying the container sizes that competitors offer. It would be most interested in the mode, because it wants to know which size sells most often.
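
For ordinal data such as a Likert scale, one possible approach is statistics.median_low, which always returns a value that actually occurs in the data instead of averaging two categories. The responses below are hypothetical, and the labels dictionary is just a helper introduced for this sketch.

```python
import statistics

# Likert codes and their labels
labels = {
    1: "Strongly dislike",
    2: "Dislike",
    3: "Neutral",
    4: "Like",
    5: "Strongly like",
}

# Hypothetical survey responses
responses = [4, 5, 2, 4, 3, 5, 1, 3]

# The ordinary median averages the two middle codes (3 and 4),
# which does not correspond to any category
plain_median = statistics.median(responses)
print(plain_median)  # 3.5

# median_low picks the lower of the two middle values instead
middle = statistics.median_low(responses)
print(labels[middle])  # Neutral
```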

Measure of Dispersion
A measure of dispersion describes how spread out the values are. There are four main measures of variability: range, interquartile range, standard deviation and variance.

Range
It is simply the largest observation minus the smallest observation.

Advantage :
It is easy to calculate.

Disadvantage :
It is very sensitive to outliers and does not use all the observations in a data set.

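
As a rough sketch, the range and the interquartile range (the spread of the middle half of the data) can be computed as follows. Quartile conventions differ between tools; the default method of statistics.quantiles is assumed here, so other software may report a slightly different interquartile range.

```python
import statistics

values = [5, 6, 6, 8, 9, 9, 9, 9, 10, 10]

# Range = largest observation minus smallest observation
data_range = max(values) - min(values)
print(data_range)  # 5

# With n=4, statistics.quantiles returns the three quartile cut points
q1, q2, q3 = statistics.quantiles(values, n=4)

# Interquartile range = third quartile minus first quartile
iqr = q3 - q1
print(iqr)
```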
Standard Deviation
It is a measure of how far the data are spread about the mean. The variance is the square of the standard deviation.

Advantage :  
It gives a better picture of your data than just the mean alone.

Disadvantage :  
1. It doesn't give a clear picture about the whole range of the data.
2. It can give a skewed picture if data contain outliers.
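
The sketch below computes the variance and standard deviation step by step for the data set used earlier. Both the population version (divide by n) and the sample version (divide by n - 1) are shown, since the text does not say which one is meant.

```python
import math
import statistics

values = [5, 6, 6, 8, 9, 9, 9, 9, 10, 10]
n = len(values)
mean = statistics.mean(values)

# Squared distance of each observation from the mean
squared_deviations = [(x - mean) ** 2 for x in values]

# Population version divides by n, sample version by n - 1
population_variance = sum(squared_deviations) / n
sample_variance = sum(squared_deviations) / (n - 1)

# Standard deviation is the square root of the variance
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(round(population_variance, 2), round(population_sd, 2))  # 2.89 1.7
print(round(sample_variance, 2), round(sample_sd, 2))          # 3.21 1.79
```

The statistics module offers the same results through pvariance, pstdev, variance and stdev.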

Skewness
It is a measure of symmetry, or more precisely, of the lack of symmetry. A distribution is symmetric if it looks the same to the left and right of the center point.

Kurtosis
It is a measure of whether the data are peaked or flat relative to a normal distribution. Higher values indicate a higher, sharper peak; lower values indicate a lower, less distinct peak.
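
Skewness and kurtosis can be sketched from the central moments of the data. The helper functions below are written for this illustration and use the simple population (moment-based) formulas, with kurtosis reported as excess kurtosis, i.e. relative to the value 3 of a normal distribution. Spreadsheet and statistics packages often apply sample-size corrections, so their numbers may differ slightly.

```python
def central_moment(data, k):
    """Average of (x - mean) ** k over all observations."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """Third central moment scaled by the variance to the power 1.5."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Fourth central moment scaled by the squared variance, minus 3."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


values = [5, 6, 6, 8, 9, 9, 9, 9, 10, 10]

# Negative skewness: the longer tail is on the left (the low values)
print(round(skewness(values), 2))  # -0.64

# A symmetric data set has zero skewness
print(skewness([1, 2, 3, 4, 5]))  # 0.0

# Negative excess kurtosis: flatter than a normal distribution
print(round(excess_kurtosis(values), 2))
```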

Example 1: Suppose you are asked to calculate the average asset value of top stock funds and check whether there is any variability in the assets of these stock funds. You would answer this question with a measure of central tendency and variability.

Example 2: Suppose you are asked to provide a figure that best describes the annual salary offered to students in ABC College. You would answer this question with a measure of central tendency and variability.
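
A question like Example 2 could be approached with a small summary helper such as the one sketched below. The function name summarize and the salary figures are hypothetical and introduced only for illustration; sample (n - 1) formulas are assumed for the variance and standard deviation, and the standard error is taken as the standard deviation divided by the square root of the count.

```python
import math
import statistics


def summarize(data):
    """Return common descriptive statistics for a list of numbers."""
    n = len(data)
    sd = statistics.stdev(data)  # sample standard deviation
    return {
        "count": n,
        "sum": sum(data),
        "minimum": min(data),
        "maximum": max(data),
        "range": max(data) - min(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.multimode(data),
        "variance": statistics.variance(data),
        "standard_deviation": sd,
        "standard_error": sd / math.sqrt(n),
    }


# Hypothetical annual salary offers, in thousands
salaries = [30, 32, 35, 35, 38, 40, 120]

summary = summarize(salaries)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Here the single large offer pulls the mean well above the median, so the median is the better figure for a typical salary, and the standard deviation shows how much the offers vary.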

About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.
