Evidence-Based Approaches to Public Health: Biostatistics – Descriptive Statistics: Measures of Central Tendency (Mean, Median, Mode)
In this tutorial, we will explore the three primary measures of central tendency: the mean, median, and mode. These statistical tools are used to summarize data by identifying a central point or typical value in a dataset.
By the end of this tutorial, you will understand what each measure of central tendency represents, how it is calculated, and when to use it. Practice questions at the end will help reinforce your understanding.
Table of Contents:
- Introduction to Descriptive Statistics
- What is the Mean?
  - Definition of the Mean
  - How to Calculate the Mean
  - When to Use the Mean
- What is the Median?
  - Definition of the Median
  - How to Calculate the Median
  - When to Use the Median
- What is the Mode?
  - Definition of the Mode
  - How to Calculate the Mode
  - When to Use the Mode
- Practice Questions
- Conclusion
1. Introduction to Descriptive Statistics
Descriptive statistics summarize and describe the main features of a dataset. Measures of central tendency (the mean, median, and mode) identify a single value, or in some cases a set of values, that represents the center of the data. They indicate where the data tend to cluster and help describe the overall pattern in the dataset.
Each measure of central tendency has its own strengths and weaknesses, and it is important to choose the right one depending on the data distribution and the context of the analysis.
2. What is the Mean?
The mean (often referred to as the average) is the most commonly used measure of central tendency. It represents the arithmetic average of a set of values and is calculated by summing all the values in the dataset and then dividing by the number of values.
2.1 How to Calculate the Mean
The formula for calculating the mean is:
[math] \text{Mean} = \frac{\sum{x}}{n} [/math]
Where:
- Σx: The sum of all values in the dataset.
- n: The number of values in the dataset.
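As a minimal sketch of the formula above, the mean can be computed in a few lines of Python. The function name `mean` and the blood pressure readings are illustrative assumptions, not data from this tutorial.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean is undefined for an empty dataset")
    return sum(values) / len(values)


# Hypothetical example data: systolic blood pressure readings in mmHg
readings = [118, 122, 130, 126, 124]

# Sum is 620 and n is 5, so the mean is 620 / 5
print(mean(readings))  # 124.0
```

Python's standard library offers `statistics.mean` for the same calculation; the hand-written version is shown only to mirror the formula term by term.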
2.2 When to Use the Mean
The mean is useful when the data are roughly symmetric, with no extreme outliers, and you want every value to contribute to the summary. Because every value enters the sum, a single extreme value can pull the mean away from the bulk of the data, so the mean can be misleading when the data are heavily skewed.
3. What is the Median?
The median is the middle value in a dataset when the values are arranged in ascending or descending order. If the dataset has an odd number of values, the median is the middle value; if the dataset has an even number of values, the median is the average of the two middle values.
3.1 Definition of the Median
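The median is the value that separates the ordered dataset into two halves: at least half of the values are less than or equal to the median, and at least half are greater than or equal to it. Because it depends on the order of the values rather than their magnitude, the median is much less affected by outliers than the mean, making it a better measure of central tendency for skewed distributions.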
3.2 How to Calculate the Median
To calculate the median:
- Step 1: Arrange the data in ascending order.
- Step 2: If the number of values is odd, the median is the middle value.
- Step 3: If the number of values is even, the median is the average of the two middle values.
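The three steps above can be sketched in Python as follows. This is one possible implementation under simple assumptions (numeric values, a non-empty list); the function name `median` and the sample values are hypothetical.

```python
def median(values):
    """Return the median by sorting the values and taking the middle position(s)."""
    if not values:
        raise ValueError("the median is undefined for an empty dataset")
    ordered = sorted(values)      # Step 1: arrange the data in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # Step 2: odd count, take the middle value
        return ordered[mid]
    # Step 3: even count, average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 3, 9, 5, 1]))  # sorted: 1, 3, 5, 7, 9 -> 5
print(median([7, 3, 9, 5]))     # sorted: 3, 5, 7, 9 -> (5 + 7) / 2 = 6.0
```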
3.3 When to Use the Median
The median is useful when the dataset contains outliers or when the data are skewed. In these situations it represents the center of the data better than the mean does.
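To illustrate why the median holds up better when outliers are present, the sketch below compares both measures before and after adding one extreme value. The lengths of stay are made-up numbers chosen for the illustration; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical hospital lengths of stay, in days
stays = [2, 3, 3, 4, 5]
stays_with_outlier = stays + [60]  # one unusually long stay

print(mean(stays), median(stays))  # 3.4 3

# The single outlier pulls the mean far upward, while the median barely moves
print(round(mean(stays_with_outlier), 2), median(stays_with_outlier))  # 12.83 3.5
```

In this example the mean more than triples, while the median shifts by only half a day.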
4. What is the Mode?
The mode is the value that appears most frequently in a dataset. Unlike the mean and median, the mode can be used for both numerical and categorical data. A dataset may have more than one mode (bimodal or multimodal) if multiple values occur with the same highest frequency.
4.1 Definition of the Mode
The mode is the value that occurs most often in a dataset. If no value repeats, the dataset is usually said to have no mode; if several values share the highest frequency, each of them is a mode.
4.2 How to Calculate the Mode
To calculate the mode:
- Step 1: Count how many times each value occurs in the dataset.
- Step 2: The value with the highest count is the mode. If several values tie for the highest count, each of them is a mode.
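A minimal Python sketch of this counting procedure is shown below. The function name `modes` and the example data are assumptions made for illustration. The function returns a list so that it can report several modes, and it returns an empty list when no value repeats, following the "no mode" convention described above.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, in first-seen order."""
    counts = Counter(values)           # count how often each value occurs
    if not counts:
        return []
    highest = max(counts.values())     # the highest frequency in the dataset
    if highest == 1:
        return []                      # no value repeats, so report no mode
    return [value for value, count in counts.items() if count == highest]


print(modes([1, 2, 2, 3]))               # [2]
print(modes([1, 1, 2, 2, 3]))            # [1, 2] (bimodal)
print(modes([1, 2, 3]))                  # [] (no mode)
print(modes(["A", "O", "O", "B", "O"]))  # ['O'] (categorical data, e.g. blood types)
```

The standard library's `statistics.multimode` is similar but returns every value when none repeats, so the convention used here is a deliberate choice rather than the only option.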
4.3 When to Use the Mode
The mode is useful for categorical data (e.g., identifying the most common category) or for datasets with repeated values. It is less commonly used for continuous data, where exact values rarely repeat; in that case the mode is usually informative only after the values have been grouped into intervals.
5. Practice Questions
Test your understanding of the measures of central tendency with these practice questions. Try answering them before checking the solutions.
Question 1:
A dataset contains the following values: 2, 4, 4, 5, 7, 8, 10. What is the mean of this dataset?
Answer 1:
[math] \text{Mean} = \frac{2 + 4 + 4 + 5 + 7 + 8 + 10}{7} = \frac{40}{7} \approx 5.71 [/math]
Question 2:
The following values represent the number of hours worked by a group of individuals: 20, 25, 30, 35, 40. What is the median?
Answer 2:
Since there is an odd number of values, the median is the middle value, which is 30.
Question 3:
A set of test scores contains the following values: 85, 90, 90, 95, 100. What is the mode?
Answer 3:
The mode is 90, as it appears twice while every other value appears only once.
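If you would like to check your answers by computer, the three questions above can be reproduced with Python's standard `statistics` module. This is only a quick verification sketch; the variable names are illustrative.

```python
from statistics import mean, median, mode

question_1 = [2, 4, 4, 5, 7, 8, 10]
question_2 = [20, 25, 30, 35, 40]
question_3 = [85, 90, 90, 95, 100]

print(round(mean(question_1), 2))  # 40 / 7, rounded to 5.71
print(median(question_2))          # 30
print(mode(question_3))            # 90
```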
6. Conclusion
Measures of central tendency—mean, median, and mode—are fundamental tools in descriptive statistics that help summarize data by identifying its central point. Each measure has its own strengths and is appropriate for different types of data or distributions. Choosing the right measure of central tendency depends on the characteristics of the data, such as whether the data include outliers or are skewed.
Remember:
- The mean is best used for datasets without outliers, as it accounts for all data points.
- The median is best for skewed datasets or those with outliers, as it is less affected by extreme values.
- The mode is useful for categorical data or when identifying the most frequent value in a dataset.
Final Tip for the CPH Exam:
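Ensure you understand how to calculate and interpret each measure of central tendency. In particular, know which measure is most appropriate for a given distribution: in a symmetric distribution such as the normal distribution, the mean, median, and mode coincide, while in a skewed distribution the mean is pulled toward the longer tail and the median is usually the better summary.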
Humanities Moment
The featured image for this article is Albicocca Alessandrina a Mandorla Amara. [Armeniaca Alexandrina; Apricot] (1817-1839) by Giorgio Gallesio (Italian, 1772-1839). Giorgio Gallesio, an Italian botanist of the 18th and 19th centuries, specialized in citrus and authored the influential Traité du citrus, which challenged prevailing ideas by showing hybrids result from outcross pollination, not grafting. He also explored species compatibility and coined the term “dominant” in hereditary studies, cementing his legacy in botany and genetics.