
The Mean Is Lying to You (And You Keep Falling For It)

/ 3 min read / C. Pearson

Here's a number: the average American household income is about $105,000.


Does that sound right to you? If it doesn't match your experience, or that of anyone you know, congratulations: you've just seen why the arithmetic mean is a terrible summary statistic for most real-world data.

Figure: pie chart titled "Income Distribution". Below median: 60; median to mean: 25; above mean: 15.

The median household income is about $75,000. That $30k gap isn't a rounding error. It's a fundamental property of how the mean behaves with skewed distributions, and it matters far more than most data scientists care to admit.

The Geometry of the Mean

The arithmetic mean is a center of mass. It's the balance point of the distribution. And like a balance point, it gets pulled toward heavy weights — in statistical terms, toward outliers and tail values.
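The balance-point idea is easy to check by hand. Here is a minimal sketch using made-up incomes (in thousands of dollars); the helper name `total_deviation` is mine, not a library function. The signed distances from the mean cancel out exactly, and the same is not true of the median.

```python
from statistics import mean, median

def total_deviation(values, point):
    """Sum of signed distances from `point`; zero means `point` balances the data."""
    return sum(v - point for v in values)

# Hypothetical incomes in thousands of dollars: four typical values, one very large.
incomes = [40, 50, 60, 70, 1000]

m = mean(incomes)      # 1220 / 5 = 244
med = median(incomes)  # middle value = 60

print("mean:", m, "median:", med)
print("deviations around the mean:  ", total_deviation(incomes, m))    # 0
print("deviations around the median:", total_deviation(incomes, med))  # 920
```

One value out of five moved the balance point from 60 to 244. The median did not budge.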

For symmetric distributions (normal, uniform), the mean and median coincide. The mean is a perfectly fine summary. For the Gaussian world that most intro stats courses inhabit, everything is great.

The problem: almost nothing in the real world is normally distributed.

Income: Log-normal. Right-skewed. The mean is always higher than the median.

Home prices: Same story. A few mansions drag the mean way above what typical buyers experience.

Response times: Exponential or Pareto-distributed. The mean is dominated by slow outliers.

Web metrics: Power law distributed. The average is almost meaningless.
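For the log-normal case you can see the gap directly. The sketch below draws a simulated sample and compares it to the closed forms (median = exp(mu), mean = exp(mu + sigma^2 / 2)). The parameters `MU` and `SIGMA` are arbitrary values I picked for illustration; they are not fitted to any real income data.

```python
import math
import random
from statistics import mean, median

# Arbitrary illustrative parameters, NOT fitted to real income data.
MU, SIGMA = 0.0, 0.8

rng = random.Random(42)  # fixed seed so the run is repeatable
sample = [rng.lognormvariate(MU, SIGMA) for _ in range(100_000)]

sample_mean = mean(sample)
sample_median = median(sample)

# Closed forms for a log-normal distribution.
theory_median = math.exp(MU)
theory_mean = math.exp(MU + SIGMA**2 / 2)

print(f"sample median {sample_median:.3f} vs theory {theory_median:.3f}")
print(f"sample mean   {sample_mean:.3f} vs theory {theory_mean:.3f}")
print(f"mean / median ratio in theory: {theory_mean / theory_median:.3f}")
```

Because sigma^2 / 2 is positive whenever there is any spread at all, the mean sits above the median for every log-normal, and the gap widens as sigma grows.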

Where This Actually Hurts

A product team reports that average page load time is 2.1 seconds. Sounds fine. But the median is 0.8 seconds and the p95 is 12 seconds. What's actually happening? Most users get a fast experience, but a significant minority is getting absolutely crushed — and the mean masks this entirely.

An investment fund reports average annual returns of 15%. But the median investor earned 4%, because the gains are concentrated in a few large accounts. Most investors aren't getting anywhere near 15%.

A clinical trial shows the average survival benefit is 3 months. But the distribution is bimodal — most patients see no benefit, and a small subset sees dramatic improvement. "Average 3 months" describes nobody's actual experience.
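The arithmetic behind a result like that is worth spelling out. This is a toy example with invented numbers (100 patients, 85 with no benefit, 15 with 20 extra months each), not data from any real trial. It is one of many splits that would produce a 3-month average.

```python
from statistics import mean, median

# Hypothetical trial, illustrative numbers only:
# 85 patients gain nothing, 15 patients gain 20 months each.
benefit_months = [0] * 85 + [20] * 15

avg = mean(benefit_months)    # (15 * 20) / 100 = 3
mid = median(benefit_months)  # 0

# How many patients actually experienced the "average" benefit?
patients_at_average = sum(1 for b in benefit_months if b == avg)

print("mean benefit:", avg, "months")
print("median benefit:", mid, "months")
print("patients whose benefit equals the mean:", patients_at_average)
```

The mean is 3 months, the median is 0, and not a single patient in the sample gained 3 months.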

The Fix Is Embarrassingly Simple

Report the median. Report quantiles (p25, p50, p75, p95). Show the distribution.
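In practice this is a few lines of code. Below is one possible sketch using only Python's standard library (`statistics.quantiles` needs Python 3.8 or later). The `summarize` helper and the synthetic load times are my own assumptions for illustration; swap in your real measurements.

```python
import random
from statistics import mean, quantiles

def summarize(values):
    """Return the mean alongside p25/p50/p75/p95 for a list of numbers."""
    # quantiles(..., n=100) returns 99 cut points; percentile k sits at index k - 1.
    cuts = quantiles(values, n=100, method="inclusive")
    return {
        "mean": mean(values),
        "p25": cuts[24],
        "p50": cuts[49],
        "p75": cuts[74],
        "p95": cuts[94],
    }

# Synthetic load times in seconds (made up for illustration):
# 90% of requests are fast, 10% hit a slow path.
rng = random.Random(0)
load_times = [rng.expovariate(1 / 0.8) for _ in range(9_000)]
load_times += [rng.expovariate(1 / 10.0) for _ in range(1_000)]

summary = summarize(load_times)
for name, value in summary.items():
    print(f"{name}: {value:.2f}s")
```

Five numbers instead of one, and the slow tail is visible at a glance. A histogram is better still, but even this table beats a lone average.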

That's it. There's no fancy technique here. The fix is literally "don't summarize a distribution with a single number that's sensitive to outliers."

Yet paper after paper, dashboard after dashboard, presentation after presentation: "the average is..."

The mean has its uses. It's mathematically convenient — it minimizes squared error, it's the maximum likelihood estimator for Gaussian data, it decomposes nicely. But mathematical convenience isn't the same as descriptive accuracy.
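The "minimizes squared error" property explains the outlier sensitivity: squaring a big miss makes it count enormously. The median, by contrast, minimizes absolute error. A brute-force sketch over a small invented dataset shows both; the grid search is just for demonstration, not how you would compute either statistic.

```python
from statistics import mean, median

data = [1, 2, 3, 4, 100]  # invented values with one outlier

def squared_error(center):
    return sum((x - center) ** 2 for x in data)

def absolute_error(center):
    return sum(abs(x - center) for x in data)

# Try every integer from 0 to 100 as a candidate "typical value".
candidates = range(0, 101)
best_squared = min(candidates, key=squared_error)
best_absolute = min(candidates, key=absolute_error)

print("minimizes squared error: ", best_squared, "| mean:  ", mean(data))
print("minimizes absolute error:", best_absolute, "| median:", median(data))
```

The squared-error winner lands on 22 (the mean), far from four of the five points. The absolute-error winner lands on 3 (the median), right in the middle of them.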

Next time someone reports an average, ask: what's the distribution? If they can't tell you, that average is telling you essentially nothing.

E[X] is an operator, not an answer.
