How to Lie with Statistics [9] - Darrell Huff
Chapter 9 - How to Statisticulate
"Statisticulate" is a term coined by the author to describe the deliberate misuse of statistics to mislead or deceive. The chapter explores various techniques and tricks employed to manipulate data and present it in a way that supports a desired narrative, often at the expense of accuracy and truthfulness.
Key points:
Percentages That Don't Add Up: This refers to quoting a percentage without stating what it is a percentage of. For example, a company might claim that its product is "50% better" than the competition without saying what was measured or what it was compared against. Is it 50% more effective, 50% cheaper, or 50% more popular? Without a stated base and unit of comparison, the percentage is meaningless.
Spurious Accuracy: This involves presenting data with a level of precision that is not warranted by the underlying methodology or data collection process. For instance, reporting an average income of $45,321.47 suggests a degree of accuracy that is unlikely in income data, which is often rounded or estimated.
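As a rough illustration of the point about unwarranted precision, the sketch below compares a mean quoted to the cent with the same mean rounded to the precision its standard error supports. The eight income values and the helper `round_to_uncertainty` are hypothetical, invented for this example, and rounding to the leading digit of the standard error is one common convention rather than the only reasonable one.

```python
import math
import statistics

# Hypothetical survey: eight self-reported incomes, each already rounded
# to the nearest $1,000 by the respondents (illustrative data only).
incomes = [32_000, 38_000, 41_000, 45_000, 47_000, 52_000, 58_000, 61_000]

mean = statistics.mean(incomes)
# Standard error of the mean: sample standard deviation / sqrt(n).
std_err = statistics.stdev(incomes) / math.sqrt(len(incomes))


def round_to_uncertainty(value, uncertainty):
    """Round value to the decimal position of the leading digit of uncertainty."""
    magnitude = 10 ** math.floor(math.log10(uncertainty))
    return round(value / magnitude) * magnitude


print(f"Quoted to the cent:  ${mean:,.2f}")
print(f"Standard error:      ${std_err:,.2f}")
print(f"Supported precision: about ${round_to_uncertainty(mean, std_err):,.0f}")
```

With a standard error in the thousands of dollars, the digits after the thousands place carry no information, so quoting them only creates an appearance of accuracy.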
The Semiattached Figure: This technique involves quoting a statistic that is technically accurate but measures something other than what the claim implies, so it only appears to support the conclusion. For example, a toothpaste brand might claim that its product reduces cavities by 23%, but the figure might come from a test that measured something else, or from a sample too small or too biased for the difference to be statistically significant.
The Well-Chosen Average: There are three main types of averages: mean, median, and mode. Each can paint a different picture of the data. A company might report its "average" salary as $75,000, but this could be the mean, which is skewed by a few high earners. The median or mode might be much lower and more representative of what most employees actually earn.
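The gap between the three averages is easy to see on a small example. The payroll below is hypothetical, chosen so that the mean matches the $75,000 figure used above; it uses only Python's standard `statistics` module.

```python
import statistics

# Hypothetical payroll: seven staff earn $40,000 to $55,000,
# and two executives earn far more (illustrative data only).
salaries = [40_000, 40_000, 40_000, 45_000, 50_000,
            55_000, 55_000, 125_000, 225_000]

mean_salary = statistics.mean(salaries)      # pulled upward by the two executives
median_salary = statistics.median(salaries)  # the middle employee
mode_salary = statistics.mode(salaries)      # the most common salary

print(f"Mean:   ${mean_salary:,.0f}")
print(f"Median: ${median_salary:,.0f}")
print(f"Mode:   ${mode_salary:,.0f}")
```

All three numbers can honestly be called "the average", yet seven of the nine employees in this made-up payroll earn well below the mean.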
The Impressive Decimal: Adding decimals to a statistic can make it seem more precise and scientific, even if the underlying data is not very accurate. For example, a study might report a 100.4% improvement, which sounds impressive but could be statistically meaningless if the margin of error is high.
The Imaginary Baseline: Graphs can be manipulated to exaggerate or downplay trends by moving the baseline. For instance, a graph of sales might start its vertical axis at a value well above zero, so that a modest increase fills most of the chart and the growth looks far more dramatic than it actually is.
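The size of the distortion can be estimated without drawing anything. The sketch below computes how much taller the second bar looks than the first for a given axis starting point. The function `apparent_ratio` and the sales figures are hypothetical, introduced only for illustration.

```python
def apparent_ratio(first, last, baseline=0.0):
    """Ratio of the drawn heights of two bars when the axis starts at baseline."""
    if baseline >= min(first, last):
        raise ValueError("baseline must lie below both values")
    return (last - baseline) / (first - baseline)


# Hypothetical sales (in $1,000s) for two consecutive years: a 10% increase.
sales_then, sales_now = 100.0, 110.0

honest = apparent_ratio(sales_then, sales_now)           # axis starts at 0
truncated = apparent_ratio(sales_then, sales_now, 95.0)  # axis starts at 95

print(f"Axis from 0:  second bar looks {honest:.1f}x as tall")
print(f"Axis from 95: second bar looks {truncated:.1f}x as tall")
```

The data are identical in both cases. Only the choice of baseline changes how large the increase appears.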
Extrapolation: This involves projecting future trends based on past data, which can be misleading because it assumes that the past patterns will continue indefinitely. For example, predicting future stock prices based on past performance is risky because market conditions can change rapidly.
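A minimal sketch of how a straight-line projection can run past what is possible: the ownership percentages below are hypothetical, and `fit_line` is an ordinary least-squares fit written out by hand so the example needs no external libraries.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Hypothetical data: percentage of households owning a new gadget, years 1 to 5.
years = [1, 2, 3, 4, 5]
ownership_pct = [10.0, 22.0, 34.0, 46.0, 58.0]

slope, intercept = fit_line(years, ownership_pct)
forecast_year_10 = slope * 10 + intercept

print(f"Trend: {slope:.1f} points per year")
print(f"Projected ownership in year 10: {forecast_year_10:.0f}%")
```

The line fits the first five years perfectly and still projects more than 100% ownership by year 10, which cannot happen. The fit says nothing about when the trend will bend.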
The Cautious Statement: This technique involves using vague or qualified language to make a claim seem more credible, even if there is little evidence to support it. For example, a politician might say, "Some experts believe..." to give the impression that their view is widely accepted, even if it's not.
The Misleading Correlation: Correlation does not imply causation. Just because two things happen at the same time or follow a similar pattern doesn't mean one causes the other. For example, a study might find a correlation between ice cream sales and drowning deaths, but this doesn't mean eating ice cream causes drowning. Both could be driven by a third factor, such as hot weather.
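The ice cream example can be simulated. In the sketch below, temperature drives both series and neither affects the other. The coefficients, noise levels, and sample size are assumptions made up for this illustration, and the partial correlation shown is one standard way to control for a third variable, not a general test for causation.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated days: temperature drives both quantities; they never interact.
temperature = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream_sales = [20 * t + rng.gauss(0, 50) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1) for t in temperature]

raw = pearson(ice_cream_sales, drownings)
controlled = partial_correlation(ice_cream_sales, drownings, temperature)

print(f"Ice cream vs. drownings:           r = {raw:.2f}")
print(f"Same, controlling for temperature: r = {controlled:.2f}")
```

The raw correlation is strong even though the simulation contains no link between the two series, and it largely disappears once temperature is held fixed.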
The Unmentioned Error: All data collection and analysis involves some degree of error. Failing to acknowledge this error can mislead the audience. For example, a poll might report a candidate's approval rating as 55% but fail to mention the margin of error. If that margin is ±5 percentage points, the actual rating could plausibly be anywhere from 50% to 60%.
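For the poll example, a rough margin of error can be computed from the sample size. The sketch below assumes a simple random sample, a 95% confidence level, and the usual normal approximation. The sample size of 400 is hypothetical, since the example above does not give one.

```python
import math


def margin_of_error(proportion, sample_size, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    z = 1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


approval = 0.55    # reported approval rating
respondents = 400  # hypothetical sample size

moe = margin_of_error(approval, respondents)
low, high = approval - moe, approval + moe

print(f"Reported: {approval:.0%} ± {moe * 100:.1f} points")
print(f"Plausible range: {low:.1%} to {high:.1%}")
```

Under these assumptions the margin is just under 5 percentage points, so a headline figure of 55% is consistent with a rating only slightly above 50%.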