Descriptive statistics answer the following questions:
- What is the value that best describes the data set?
- How much does a data set spread from its average value?
- What are the smallest and largest numbers in a data set?
Descriptive statistics provide summary measures that include Mean, Standard Error, Median, Mode, Standard Deviation, Variance, Kurtosis, Skewness, Range, Minimum, Maximum, Sum, and Count.
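As a rough illustration of the simpler items in that list, the sketch below computes Count, Sum, Minimum, Maximum, Mean and Standard Error with Python's standard library. The data values are borrowed from the worked mean example in this article, and the standard error is assumed to mean the standard error of the mean (sample standard deviation divided by the square root of the sample size).

```python
import math
import statistics

# Data borrowed from the worked mean example in this article.
values = [5, 6, 6, 8, 9, 9, 9, 9, 10, 10]

count = len(values)
total = sum(values)
minimum = min(values)
maximum = max(values)
mean = total / count

# Assumption: "Standard Error" means the standard error of the mean,
# i.e. sample standard deviation / sqrt(n).
standard_error = statistics.stdev(values) / math.sqrt(count)

print(f"Count={count} Sum={total} Min={minimum} Max={maximum}")
print(f"Mean={mean:.2f} Standard Error={standard_error:.4f}")
```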
Measure of Central Tendency
A measure of central tendency describes a whole set of data with a single value that represents the centre of its distribution.
There are three main measures of central tendency: the mode, the median and the mean.
Mean, Median and Mode
The mean is the sum of the observations divided by the sample size.
The mean of the values 5, 6, 6, 8, 9, 9, 9, 9, 10, 10 is (5+6+6+8+9+9+9+9+10+10)/10 = 8.1.
Limitation: It is affected by extreme values. Very large or very small numbers can distort the answer.
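The arithmetic above, and the effect of an extreme value on it, can be checked with a short sketch. The outlier of 100 is a hypothetical value added only for illustration.

```python
values = [5, 6, 6, 8, 9, 9, 9, 9, 10, 10]

# Mean = sum of the observations / sample size.
mean = sum(values) / len(values)
print(mean)  # 8.1

# Hypothetical extreme value, added only to show the distortion.
with_outlier = values + [100]
mean_with_outlier = sum(with_outlier) / len(with_outlier)
print(round(mean_with_outlier, 2))  # 16.45
```

A single extreme observation roughly doubles the mean here, even though ten of the eleven values are 10 or less.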
The median is the middle value of the sorted data. It splits the data in half: half of the data are above the median and half are below it. With an even number of observations, it is the average of the two middle values.
Advantage: It is NOT affected by extreme values. Very large or very small numbers do not affect it.
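To see that robustness in practice, here is a minimal sketch using `statistics.median` from Python's standard library on the same ten values used for the mean example. The outlier of 100 is again a hypothetical value.

```python
import statistics

values = [5, 6, 6, 8, 9, 9, 9, 9, 10, 10]

# Even number of observations: average of the two middle values (9 and 9).
median = statistics.median(values)

# Hypothetical extreme value; the middle of the sorted data does not move.
with_outlier = values + [100]
median_with_outlier = statistics.median(with_outlier)

print(median, median_with_outlier)
```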
The mode is the value that occurs most frequently in a data set.
Advantage: It can be used when the data are not numerical.
Limitations:
1. There may be no mode at all if no value occurs more than once.
2. There may be more than one mode.
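The three situations described here (a single mode, several modes, no mode) can be sketched with a small helper. The function name `modes` and the colour data are hypothetical, and returning an empty list when no value repeats is a convention chosen for this sketch, not a universal definition.

```python
from collections import Counter

def modes(data):
    """Return all most-frequent values, or [] if no value repeats."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # Convention for this sketch: every value is unique, so no mode.
        return []
    return [value for value, n in counts.items() if n == top]

print(modes([5, 6, 6, 8, 9, 9, 9, 9, 10, 10]))             # single mode
print(modes(["red", "blue", "red", "green", "blue"]))      # two modes
print(modes([1, 2, 3]))                                    # no mode
```

The second call shows that the mode works on non-numerical data, where a mean or median would not be defined.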
When to use mean, median and mode?
Mean – When your data is not skewed, i.e. roughly symmetric (for example, normally distributed), and there are no extreme values (outliers) in the data set.
Median – When your data is skewed or you are dealing with ordinal (ordered categories) data (e.g. a Likert scale: 1. Strongly dislike, 2. Dislike, 3. Neutral, 4. Like, 5. Strongly like).
Mode – When dealing with nominal (unordered categories) data.
In real life, suppose a company is considering expanding into an area and is studying the size of containers that competitors are offering. They would be more interested in the mode because they want to know what size tends to sell most often.
Variability refers to the spread or dispersion of scores. There are four main measures of variability: range, interquartile range, standard deviation and variance.
The range is simply the largest observation minus the smallest observation.
Advantage: It is easy to calculate.
Limitation: It is very sensitive to outliers and does not use all the observations in a data set.
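The sketch below computes the range and, for comparison, the interquartile range (the distance between the first and third quartiles). It assumes Python 3.8 or later for `statistics.quantiles`. Quartile conventions differ between tools, so the exact interquartile range may not match a spreadsheet. The outlier of 100 is hypothetical.

```python
import statistics

values = [5, 6, 6, 8, 9, 9, 9, 9, 10, 10]

# Range: largest observation minus smallest observation.
data_range = max(values) - min(values)

# Quartiles with the library's default method; other tools may use a
# different interpolation rule and give slightly different cut points.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Hypothetical outlier: the range jumps, the interquartile range barely moves.
with_outlier = values + [100]
range_with_outlier = max(with_outlier) - min(with_outlier)
q1_o, _, q3_o = statistics.quantiles(with_outlier, n=4)
iqr_with_outlier = q3_o - q1_o

print(data_range, iqr)
print(range_with_outlier, iqr_with_outlier)
```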
The standard deviation is a measure of the spread of the data about the mean; the variance is its square.
Advantage: It gives a better picture of your data than the mean alone.
Disadvantages:
1. It doesn't give a clear picture of the whole range of the data.
2. It can give a skewed picture if the data contain outliers.
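For the standard deviation and the variance, a minimal sketch with Python's `statistics` module follows. It shows both the sample versions (dividing by n - 1) and the population versions (dividing by n). Which one applies depends on whether the data are a sample or the whole population, which this article does not specify.

```python
import statistics

values = [5, 6, 6, 8, 9, 9, 9, 9, 10, 10]

# Sample statistics: squared deviations from the mean, divided by n - 1.
sample_variance = statistics.variance(values)
sample_std = statistics.stdev(values)

# Population statistics: squared deviations from the mean, divided by n.
population_variance = statistics.pvariance(values)
population_std = statistics.pstdev(values)

print(f"sample:     variance={sample_variance:.4f} std={sample_std:.4f}")
print(f"population: variance={population_variance:.4f} std={population_std:.4f}")
```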
Skewness is a measure of symmetry, or more precisely, of the lack of symmetry. A distribution is symmetric if it looks the same to the left and right of the center point.
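One common way to put a number on this is the moment-based skewness coefficient, the third central moment divided by the 1.5th power of the second. The sketch below assumes that definition. Spreadsheet tools often apply a bias-adjusted sample formula, so their results can differ slightly. The function names are hypothetical.

```python
def central_moment(data, k):
    """k-th central moment: mean of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """Moment-based skewness: m3 / m2 ** 1.5 (no bias adjustment)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

values = [5, 6, 6, 8, 9, 9, 9, 9, 10, 10]
skew = skewness(values)

print(round(skew, 4))               # negative: longer tail on the left
print(skewness([1, 2, 3, 4, 5]))    # symmetric data gives 0.0
```

A value of 0 indicates symmetry, a negative value a longer left tail, and a positive value a longer right tail.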
Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. Higher values indicate a higher, sharper peak with heavier tails; lower values indicate a lower, less distinct peak.
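As a sketch of how such a value can be computed, the code below uses the moment-based excess kurtosis: the fourth central moment divided by the squared second central moment, minus 3, so that a normal distribution scores about 0. This definition is an assumption; spreadsheet tools often use a bias-adjusted sample formula and report somewhat different numbers. The function names are hypothetical.

```python
def central_moment(data, k):
    """k-th central moment: mean of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2 ** 2 - 3 (no bias adjustment)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

values = [5, 6, 6, 8, 9, 9, 9, 9, 10, 10]
kurt = excess_kurtosis(values)

# Negative excess kurtosis: flatter than a normal distribution.
print(round(kurt, 4))
```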
Example 2: Suppose you are asked to provide a figure that best describes the annual salary offered to students in ABC College. You would answer this question with a measure of central tendency and variability.