Evidence-Based Approaches to Public Health: Biostatistics – Descriptive Statistics: Measures of Dispersion (Range, Variance, Standard Deviation)
In this tutorial, we will explore the key measures of dispersion used in descriptive statistics: range, variance, and standard deviation. These measures help describe the spread or variability of data in a dataset.
By the end of this tutorial, you will understand what each measure of dispersion represents, how to calculate them, and when to use them. We will also provide practice questions to help reinforce your understanding.
Table of Contents:
- Introduction to Measures of Dispersion
- What is the Range?
  - Definition of Range
  - How to Calculate the Range
  - When to Use the Range
- What is the Variance?
  - Definition of Variance
  - How to Calculate the Variance
  - When to Use the Variance
- What is the Standard Deviation?
  - Definition of Standard Deviation
  - How to Calculate the Standard Deviation
  - When to Use the Standard Deviation
- Practice Questions
- Conclusion
1. Introduction to Measures of Dispersion
Measures of dispersion describe how spread out or scattered the values in a dataset are. While measures of central tendency (mean, median, mode) summarize the center of the data, measures of dispersion provide insights into the variability of the data. This is important because two datasets can have the same mean but different levels of variability, which can impact data interpretation.
The most common measures of dispersion are range, variance, and standard deviation.
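To make the point about equal means concrete, here is a minimal Python sketch. The two datasets and their names (`clinic_a`, `clinic_b`) are invented purely for illustration and do not come from any study.

```python
from statistics import mean

# Two hypothetical datasets with the same mean but very different spread.
clinic_a = [48, 49, 50, 51, 52]
clinic_b = [10, 30, 50, 70, 90]

mean_a, mean_b = mean(clinic_a), mean(clinic_b)

# Range = maximum value minus minimum value.
range_a = max(clinic_a) - min(clinic_a)
range_b = max(clinic_b) - min(clinic_b)

print(mean_a, mean_b)    # 50 50
print(range_a, range_b)  # 4 80
```

Both datasets are centered on 50, yet the second spans a range twenty times wider, which a measure of central tendency alone would not reveal.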
2. What is the Range?
The range is the simplest measure of dispersion. It represents the difference between the highest and lowest values in a dataset, showing the total spread of the data.
2.1 Definition of Range
The range is the difference between the maximum and minimum values in a dataset. It provides a basic measure of the overall spread of the data, but it does not consider the distribution of the values between the extremes.
2.2 How to Calculate the Range
The formula for calculating the range is:
[math] \text{Range} = \text{Maximum Value} - \text{Minimum Value} [/math]
For example, if the highest value in a dataset is 90 and the lowest value is 50, the range is:
[math] \text{Range} = 90 - 50 = 40 [/math]
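The same calculation can be sketched in Python. The helper name `value_range` and the intermediate scores are assumptions made for this example; only the extremes 50 and 90 come from the text above.

```python
def value_range(values):
    """Return the maximum minus the minimum of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("range is undefined for an empty dataset")
    return max(values) - min(values)


# Hypothetical scores; only the extremes (50 and 90) match the example above.
scores = [50, 62, 75, 81, 90]
print(value_range(scores))  # 40
```

Note that the three middle values play no role in the result, which is why the range says nothing about how the data are distributed between the extremes.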
2.3 When to Use the Range
The range is useful when you want a quick, general sense of the spread of data. However, it is sensitive to outliers, as a single extreme value can significantly affect the range. Therefore, it is not always the most reliable measure of dispersion for datasets with outliers.
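A short Python sketch shows how strongly one extreme value can move the range. The numbers are made up for illustration.

```python
# Hypothetical measurements without and with a single extreme value.
baseline = [50, 55, 60, 65, 70]
with_outlier = baseline + [200]

range_baseline = max(baseline) - min(baseline)
range_outlier = max(with_outlier) - min(with_outlier)

print(range_baseline)  # 20
print(range_outlier)   # 150
```

Adding one observation increased the range from 20 to 150 even though the other five values did not change.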
3. What is the Variance?
Variance is a measure of dispersion that summarizes how far the values in a dataset lie from the mean. Because each deviation is squared before averaging, larger deviations from the mean carry disproportionately more weight than smaller ones, which also makes variance sensitive to outliers.
3.1 Definition of Variance
Variance is the average of the squared differences between each data point and the mean of the dataset. It provides insight into the overall variability of the data, but because the differences are squared, variance is not in the same unit as the original data (e.g., if the data are in units of dollars, the variance is in dollars squared).
3.2 How to Calculate the Variance
The formula for variance differs slightly depending on whether you are calculating it for a population or a sample.
- Population variance: [math] \sigma^2 = \frac{\sum (x - \mu)^2}{N} [/math]
- Sample variance: [math] s^2 = \frac{\sum (x - \overline{x})^2}{n - 1} [/math]
Where:
- Σ(x – μ)²: The sum of the squared differences between each value and the population mean (μ).
- Σ(x – x̄)²: The sum of the squared differences between each value and the sample mean (x̄).
- N: The number of values in the population.
- n – 1: The number of values in the sample minus 1 (this is known as Bessel’s correction and is used to provide an unbiased estimate of the population variance).
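The two formulas can be sketched in Python as follows. The function names and the dataset are assumptions chosen for illustration. The standard library's `statistics.pvariance` and `statistics.variance` compute the same quantities and are used here only as a cross-check.

```python
import math
import statistics


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1 (Bessel's correction)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Hypothetical data: mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_variance(data))        # 4.0   (32 / 8)
print(round(sample_variance(data), 3))  # 4.571 (32 / 7)

# Cross-check against the standard library.
assert math.isclose(population_variance(data), statistics.pvariance(data))
assert math.isclose(sample_variance(data), statistics.variance(data))
```

For the same data, the sample variance is always slightly larger than the population variance because it divides by n - 1 rather than N.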
3.3 When to Use the Variance
Variance is useful when you want a measure of variability that takes every value in the dataset into account, not just the extremes, and that weights large deviations from the mean heavily. However, because variance is in squared units, it is often less interpretable than standard deviation, which presents the variability in the same units as the original data.
4. What is the Standard Deviation?
The standard deviation is the square root of the variance. It provides a measure of dispersion in the same units as the original data, making it easier to interpret than variance. Standard deviation is widely used in public health and biostatistics to measure the variability or spread of data.
4.1 Definition of Standard Deviation
Standard deviation measures roughly how far a typical data point lies from the mean. Strictly, it is the square root of the average squared deviation, not the average of the absolute deviations. It is widely used because it is in the same units as the original data, making it easier to understand. A low standard deviation indicates that the data points are close to the mean, while a high standard deviation indicates that the data points are spread out over a wider range of values.
4.2 How to Calculate the Standard Deviation
The formulas for standard deviation are:
- Population standard deviation: [math] \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} [/math]
- Sample standard deviation: [math] s = \sqrt{\frac{\sum (x - \overline{x})^2}{n - 1}} [/math]
Where:
- Σ(x – μ)²: The sum of the squared differences between each value and the population mean (μ).
- Σ(x – x̄)²: The sum of the squared differences between each value and the sample mean (x̄).
- N: The number of values in the population.
- n – 1: The number of values in the sample minus 1.
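As a minimal Python sketch of these formulas, the following computes both versions by hand and compares them with the standard library's `statistics.pstdev` and `statistics.stdev`. The dataset and variable names are hypothetical.

```python
import math
import statistics

# Hypothetical data: mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
data_mean = sum(data) / n
sum_sq = sum((x - data_mean) ** 2 for x in data)

population_sd = math.sqrt(sum_sq / n)    # divide by N
sample_sd = math.sqrt(sum_sq / (n - 1))  # divide by n - 1

print(population_sd)        # 2.0
print(round(sample_sd, 3))  # 2.138

# Cross-check against the standard library.
assert math.isclose(population_sd, statistics.pstdev(data))
assert math.isclose(sample_sd, statistics.stdev(data))
```

Unlike the variance (4.0 in squared units for this data), the standard deviation of 2.0 is expressed in the same units as the observations.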
4.3 When to Use the Standard Deviation
Standard deviation is used when you need a measure of variability that is easy to interpret. It is useful for comparing the spread of datasets measured in the same units and for understanding the typical amount by which data points deviate from the mean.
5. Practice Questions
Test your understanding of the measures of dispersion with these practice questions. Try answering them before checking the solutions.
Question 1:
A dataset contains the following values: 10, 15, 20, 25, 30. What is the range?
Answer 1:
[math] \text{Range} = 30 - 10 = 20 [/math]
Question 2:
A dataset contains the values 5, 7, 8, 10, and 12. Calculate the variance (sample variance).
Answer 2:
Step 1: Find the mean: [math] \overline{x} = \frac{5 + 7 + 8 + 10 + 12}{5} = 8.4 [/math]
Step 2: Subtract the mean from each value and square the result: [math] (5 - 8.4)^2 = 11.56 [/math], [math] (7 - 8.4)^2 = 1.96 [/math], [math] (8 - 8.4)^2 = 0.16 [/math], [math] (10 - 8.4)^2 = 2.56 [/math], [math] (12 - 8.4)^2 = 12.96 [/math]
Step 3: Sum the squared differences: [math] 11.56 + 1.96 + 0.16 + 2.56 + 12.96 = 29.2 [/math]
Step 4: Divide by (n – 1): [math] s^2 = \frac{29.2}{4} = 7.3 [/math]
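If you want to check this answer yourself, the four steps can be reproduced with a short Python sketch. The variable names are assumptions made for this example, and results are rounded for display because floating-point arithmetic introduces tiny rounding errors.

```python
import math
import statistics

values = [5, 7, 8, 10, 12]

# Step 1: find the mean.
x_bar = sum(values) / len(values)

# Step 2: subtract the mean from each value and square the result.
squared_diffs = [(x - x_bar) ** 2 for x in values]

# Step 3: sum the squared differences.
total = sum(squared_diffs)

# Step 4: divide by (n - 1).
s2 = total / (len(values) - 1)

print(x_bar)                                # 8.4
print([round(d, 2) for d in squared_diffs]) # [11.56, 1.96, 0.16, 2.56, 12.96]
print(round(total, 2))                      # 29.2
print(round(s2, 2))                         # 7.3

# Cross-check against the standard library's sample variance.
assert math.isclose(s2, statistics.variance(values))
```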
Question 3:
A sample has a variance of 9. What is the standard deviation?
Answer 3:
[math] s = \sqrt{9} = 3 [/math]
6. Conclusion
Measures of dispersion—range, variance, and standard deviation—are critical tools in descriptive statistics that help us understand the spread or variability of data. While the range provides a quick overview of the data spread, variance and standard deviation offer more detailed insights into how data points deviate from the mean.
Remember:
- The range gives the difference between the maximum and minimum values but is sensitive to outliers.
- The variance measures the average squared deviation from the mean, but its units are not the same as the original data.
- The standard deviation is the square root of the variance and is in the same units as the original data, making it easier to interpret.
Final Tip for the CPH Exam:
Make sure you understand how to calculate and interpret each measure of dispersion, and how they are used in studies.
Humanities Moment
The featured image for this article is Öland (1912) by Helge Johansson (Swedish, 1886–1926). Unfortunately, despite Johansson’s artwork showing up at multiple auctions and the works garnering thousands of dollars, there is very little biographical information available.