statisticsdata sciencedecision makingmeasurement

Goodhart's Law: Why Every Metric You Optimize Will Eventually Betray You

C. Pearson C. Pearson
/ / 4 min read

There is a pattern that repeats itself across business, medicine, education, and machine learning with depressing regularity. Someone picks a number to measure success. They optimize hard for that number. The number goes up. The thing they actually cared about gets worse.

Detailed close-up of a hand pointing at colorful charts with a blue pen on wooden surface. Photo by Lukas Blazek on Pexels.

That pattern has a name: Goodhart's Law.

Charles Goodhart was a British economist who noticed something uncomfortable about monetary policy in the 1970s. The moment a government starts using a statistical measure as a target for policy, that measure stops being a reliable indicator of the thing it was supposed to represent. His original framing was narrow and technical. The broader version, popularized by social scientist Marilyn Strathern, is the one that should follow every data scientist to their grave: When a measure becomes a target, it ceases to be a good measure.

Sit with that for a second. Every metric you have ever used as a performance goal is actively rotting.

The Cobra Effect Is Older Than Goodhart

Before Goodhart formalized this, the British colonial government in India was already living it. They wanted to reduce the cobra population in Delhi, so they offered a bounty per dead cobra. Sensible policy. What happened next was entirely predictable in retrospect: people started breeding cobras to collect bounties. When the government scrapped the program, the breeders released their now-worthless snakes. The cobra population ended up higher than when they started.

This is Goodhart's Law in its purest form. The metric (dead cobras submitted) stopped representing the goal (fewer cobras in the wild) the moment it became a target. The incentive structure ate the signal.

This Is Not a Historical Curiosity

Walk into any organization with OKRs, KPIs, or dashboards and you will find this happening right now. Customer support teams optimizing for average handle time start rushing calls, which increases repeat contacts. Engineering teams measured on lines of code shipped write verbose, low-quality code. Schools teaching to standardized tests produce students who are excellent at standardized tests.

In machine learning, it takes a specific and painful form. You pick a loss function as a proxy for the real-world outcome you want. You optimize ruthlessly. Your model minimizes the loss. The real-world outcome is untouched or degraded. Reward hacking in reinforcement learning is Goodhart's Law with a learning algorithm doing the exploiting instead of a human.

Here is a rough picture of how the failure mode works:

graph TD
    A[Real Goal] --> B[Proxy Metric]
    B --> C{Metric Becomes Target}
    C --> D[Optimization Pressure]
    D --> E[Metric Improves]
    D --> F[Metric Decouples from Goal]
    F --> G[Real Goal Degrades]

Notice that E and G happen simultaneously. The dashboard looks green. The reality is not.

Why Smart People Keep Walking Into This

Measuring the actual goal is hard. Usually it involves time delays, messy data collection, or subjective judgment that resists quantification. So you substitute a proxy. Proxies are fast, clean, and cheap. They feel rigorous because they produce numbers.

The substitution seems reasonable at first because the proxy and the goal are genuinely correlated before optimization begins. Step count correlates with cardiovascular health in a general population. Once you start incentivizing step counts specifically, people shake their phones, take short pointless walks, and avoid activities that don't register well. The correlation evaporates because you destroyed the natural variation that created it.

This is the measurement trap: correlation under observation is not the same as correlation under intervention. Goodhart's Law is, at its root, a warning about treating observational relationships as if they survive optimization pressure.

What You Can Actually Do About It

Rotate your metrics. If no single number is the official target for long enough, it becomes harder to game systematically. This feels uncomfortable to people who want consistency, but consistency is exactly what creates the exploitation window.

Use multiple proxies that are hard to optimize simultaneously. If customer satisfaction, resolution rate, and repeat contact rate all matter, it becomes difficult to juice one without hurting another. The redundancy creates a kind of robustness.

Building in qualitative audits matters too. Numbers should have to answer to reality periodically, not the other way around. Ask whether the metric still feels connected to what you actually care about. When the answer gets murky, that is the earliest warning you will get.

Most importantly: hold your metrics with some suspicion from the start. Every KPI is a simplification of something real, and simplifications under pressure become fictions. Knowing that going in changes how you read the dashboard.

Your metrics are proxies. They are not the thing. Optimizing hard enough for any proxy will eventually teach you the difference.

Get Mean Methods in your inbox

New posts delivered directly. No spam.

No spam. Unsubscribe anytime.

Related Reading