probability, cognitive bias, statistics, Bayes, decision making

Base Rate Neglect: Why Your Gut Probability Is Almost Always Wrong

C. Pearson / 4 min read

There's a medical test for a rare disease. The test is 99% accurate. You test positive. How worried should you be?

Artistic display of blue dice in a glass and scattered red dice on a pastel blue background. Photo by DS stories on Pexels.

Most people say: very. After all, 99% accurate sounds nearly certain.

Here's what they're missing. If the disease affects 1 in 10,000 people, and you test 1,000,000 people, roughly 100 actually have it — and the test catches 99 of them. But the test also falsely flags 1% of healthy people. That's about 9,999 false positives. So among everyone who tests positive, fewer than 1% actually have the disease.

Your positive result, from a "99% accurate" test, still means you're almost certainly fine.
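
If you want to check that arithmetic, here's the counting version as a quick sketch (the population size, prevalence, and error rates are just the numbers from the example above):

# Counting version of the example: 1,000,000 people screened for a
# 1-in-10,000 disease with a 99%-sensitive test that wrongly flags
# 1% of healthy people.
population = 1_000_000
prevalence = 1 / 10_000
sensitivity = 0.99
false_positive_rate = 0.01

sick = population * prevalence                               # ~100 people
true_positives = sick * sensitivity                          # ~99 caught
false_positives = (population - sick) * false_positive_rate  # ~9,999 flagged by mistake

share_real = true_positives / (true_positives + false_positives)
print(f"{share_real:.2%} of positive results are real")      # ~0.98%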

This is base rate neglect — and it doesn't just happen in contrived textbook examples. It happens in courtrooms, hiring decisions, fraud detection systems, and your company's A/B test interpretations. Probably more often than you'd like to admit.

What You're Ignoring When You Trust the Headline Number

Base rate neglect happens when you focus so hard on new information — a test result, a data point, a model's output — that you forget to weight it against how likely the thing was to begin with.

Technically, this is a failure to apply Bayes' theorem correctly. The posterior probability of something being true depends on two things: the likelihood of your evidence given that it's true, and the prior probability that it's true at all. Ignore the prior, and your posterior is fiction dressed up as analysis.

The formal relationship looks like this:

graph TD
    A[Prior Probability] --> C{Bayes Update}
    B[Likelihood of Evidence] --> C
    C --> D[Posterior Probability]
    D --> E{New Evidence?}
    E -->|Yes| C
    E -->|No| F[Final Estimate]

Skip node A — pretend there's no prior — and whatever comes out of node D is meaningless. You've solved for the wrong thing entirely.
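
Written as code rather than a diagram, node C is a one-line function, and the loop back from node E is just calling it again with the last posterior as the new prior. A minimal sketch (the function name is mine; the numbers are the test from the opening example, and it assumes the two results are independent):

# One pass through node C: combine a prior with the likelihood of the evidence.
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))

belief = 1 / 10_000                          # node A: the base rate
belief = bayes_update(belief, 0.99, 0.01)    # first positive test  -> ~0.98%
belief = bayes_update(belief, 0.99, 0.01)    # second positive test -> ~49%
print(f"{belief:.1%}")

Two independent positives move you from "almost certainly fine" to roughly a coin flip, which is exactly what a prior-respecting update should do.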

Where This Kills Real Analysis

Consider fraud detection. A model flags transactions as fraudulent with 95% precision on your balanced test set. Sounds production-ready. But precision doesn't survive a change in base rate: if only 0.1% of live transactions are actually fraudulent, that same model drowns in false positives at scale. Your ops team spends their week chasing ghosts. Customers get blocked for buying groceries. The model isn't broken — your interpretation of it is.
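
To put rough numbers on it, here's a sketch with illustrative figures: a model with 95% sensitivity and 95% specificity scores 95% precision on a 50/50 test set, then collapses at a 0.1% fraud rate. (These numbers are assumptions for the sake of the example, not from any real system.)

def precision_at(prevalence, sensitivity, specificity):
    # Expected precision when the same model runs against a given base rate.
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

print(f"Balanced test set:      {precision_at(0.5,   0.95, 0.95):.0%}")   # 95%
print(f"Production, 0.1% fraud: {precision_at(0.001, 0.95, 0.95):.0%}")   # ~2%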

Hiring is another minefield. Suppose a résumé screening tool has a strong correlation with future job performance. People get excited. But if genuinely exceptional candidates are rare (they are), the tool will misclassify enormous numbers of average candidates as exceptional ones — simply because there are so many more of them. You optimize your screen and accidentally exclude the people you wanted.
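
With made-up but plausible numbers (say 10,000 applicants, 1% of them genuinely exceptional, and a screen that passes 90% of the exceptional ones while wrongly passing 10% of everyone else), the flood looks like this:

applicants = 10_000
exceptional = applicants * 0.01            # 100 genuinely exceptional candidates
average = applicants - exceptional         # 9,900 everyone else

passed_exceptional = exceptional * 0.90    # 90 pass the screen
passed_average = average * 0.10            # 990 pass by mistake

share = passed_exceptional / (passed_exceptional + passed_average)
print(f"{share:.0%} of candidates who clear the screen are actually exceptional")  # ~8%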

Same trap. Different label on the box.

Why Smart People Fall For This

Human brains are wired for representativeness. When something looks like the thing you're searching for, it triggers confidence — regardless of how rare that thing is. Kahneman and Tversky documented this thoroughly in the 1970s, and the decades since haven't improved our intuitions much.

The number in front of you feels real. The base rate feels abstract. So the brain discards the abstract and runs with the concrete.

Data scientists aren't immune. A headline accuracy of 98% on a class-imbalanced dataset is a mirage. You've built a model that learned to say "no" almost constantly — and it looks great on paper because "no" is almost always correct when the positive class is rare. The accuracy metric simply forgot to account for how often the positive class shows up at all.
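
Concretely, here's a toy version with a 2% positive class and a "model" that only ever predicts the majority class (illustrative numbers, not a real dataset):

labels = [1] * 20 + [0] * 980     # 2% positive class
predictions = [0] * 1_000         # a model that learned to always say "no"

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
recall = sum(p == y == 1 for p, y in zip(predictions, labels)) / sum(labels)

print(f"accuracy: {accuracy:.0%}")   # 98% -- looks great on paper
print(f"recall:   {recall:.0%}")     # 0%  -- it never finds a single positive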

The Fix Is Uncomfortable

You have to ask the boring, unglamorous question before you do anything else: how common is this thing in the real world?

Not in your training data. Not in your sample. In the actual population you're making decisions about.

Then you let that number do its work. If positive cases are rare, you need very high specificity — not just sensitivity. You need to be aggressive about suppressing false positives, not just false negatives. You need to report precision alongside recall, not accuracy alongside a press release.
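
One practical way to let the base rate do its work is to run the question backwards: given the real-world prevalence, how specific does the test have to be before a flag is worth acting on? A small sketch (the function name and target numbers are mine):

def required_specificity(prevalence, sensitivity, target_precision):
    # Specificity needed so that `target_precision` of flagged cases are real.
    true_positives = sensitivity * prevalence
    allowed_false_positives = true_positives * (1 - target_precision) / target_precision
    return 1 - allowed_false_positives / (1 - prevalence)

# For 50% precision at a 0.1% base rate with 95% sensitivity:
print(f"{required_specificity(0.001, 0.95, 0.50):.2%}")   # ~99.90% specificity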

Reporting posterior probabilities without disclosing the prior is, at best, sloppy. At worst, it's misleading.

None of this is complicated math. It's one formula. The hard part is the discipline to use it — to not get seduced by the shiny new signal and forget to ask what you already knew before it arrived.

Your model doesn't get to ignore the world it operates in. Neither do you.
