How to lie with statistics [2] - Darrell Huff

Chapter 2 - The Well-Chosen Average

The chapter discusses how the different types of average (mean, median, and mode) can all describe the same data yet lead to vastly different interpretations.

  • Mean: The arithmetic average, calculated by summing all values and dividing by the number of values. It is sensitive to outliers: a few extreme values can pull it far from what is typical.

  • Median: The middle value in a dataset when ordered from lowest to highest. It is less affected by outliers and represents the "typical" value in a dataset.

  • Mode: The value that appears most frequently in a dataset. It is not influenced by outliers but may not be representative of the entire dataset.

Example of mean, median and mode: a list of the ages of 10 people is {15, 18, 22, 25, 25, 30, 33, 40, 45, 50}.

  • Mean: (15+18+...+45+50) / 10 = 303 / 10 = 30.3

  • Median: (25 + 30) / 2 = 27.5 (the average of the 5th and 6th values, since the count is even)

  • Mode: 25 (appears twice)
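
The three averages for the age list above can be checked with a short Python sketch. It uses only the standard library's statistics module; the variable names are chosen here for illustration and do not come from the book.

```python
import statistics

# Ages of the 10 people from the example above
ages = [15, 18, 22, 25, 25, 30, 33, 40, 45, 50]

mean_age = statistics.mean(ages)      # sum of values / number of values
median_age = statistics.median(ages)  # average of the two middle values (even count)
mode_age = statistics.mode(ages)      # most frequent value

print(f"Sum:    {sum(ages)}")    # 303
print(f"Mean:   {mean_age}")     # 30.3
print(f"Median: {median_age}")   # 27.5
print(f"Mode:   {mode_age}")     # 25
```

All three numbers are legitimate "averages" of the same ten ages, which is exactly what makes it possible to pick whichever one suits the story being told.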

Examples of the well-chosen average:

  • Income: A company might report a high average (mean) salary to attract employees, even if most employees earn much less. This is because the salaries of a few highly paid executives can significantly inflate the mean. The median salary would be a more accurate reflection of what a typical employee earns.

  • Real Estate: A real estate agent might use the mean to advertise high average home prices in a neighborhood, even if most homes are more affordable and only a few luxury homes drive up the average. The median home price would be a better indicator of the typical cost of a home in the area.

  • Test Scores: A school might report the mean test score of its students, which could be misleading if a few exceptionally high-performing students skew the results. The median test score would provide a more accurate representation of how a typical student performed.
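
The income case can be illustrated with a minimal Python sketch. The salary figures below are invented purely for illustration (they are not from the book): eight ordinary salaries plus two executive salaries.

```python
import statistics

# Hypothetical annual salaries: eight staff members and two executives
salaries = [38_000, 40_000, 42_000, 42_000, 45_000,
            47_000, 50_000, 52_000, 250_000, 400_000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

# Count how many people earn less than the advertised "average" (mean)
below_mean = sum(1 for s in salaries if s < mean_salary)

print(f"Mean salary:   {mean_salary:,.0f}")    # 100,600
print(f"Median salary: {median_salary:,.0f}")  # 46,000
print(f"{below_mean} of {len(salaries)} employees earn less than the mean")  # 8 of 10
```

With these made-up numbers, the mean is more than double the median, and eight of the ten employees earn less than the "average" salary. Both figures are arithmetically correct; only the median describes what a typical employee takes home.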

The Importance of Context:

The chapter emphasizes that understanding the context of the data is crucial for interpreting averages correctly. It's essential to know:

  • Which average is being used: Is it the mean, median, or mode?

  • What is the distribution of the data: Are there outliers that could be skewing the mean?

  • What is the purpose of the statistic: Is it being used to inform, persuade, or mislead?

By asking these questions, readers can become more critical consumers of statistics and avoid being deceived by misleading averages.