"These two things move together" is where a lot of analysis stops. It's not where a defensible thesis can stop. A correlation strong enough to backtest can still evaporate in production because something else was driving both sides. The work that matters is ruling that out, and there's a toolkit for it. To show it concretely, we'll point each technique at a question we investigated publicly: why has NBA home-court advantage collapsed from a 68% home win rate to 55%? The answer turned out to be the three-point revolution, but believing that required proving it, not just plotting it.
1. Control groups
In your backtest: the strongest way to show that a signal reflects the effect you claim is to find a comparison group that experienced the same broad conditions but not the specific factor you're testing. If the outcome moves in the exposed group and stays flat in the control, the shared conditions (market-wide moves, secular trends) aren't the explanation. Your factor is the one left standing.
Worked proof: the obvious objection to "the three-point shot killed home-court advantage" is that basketball simply professionalized (better travel, analytics, and officiating), which would erode home advantage everywhere. So we used the EuroLeague as a control group: same sport, same era, the same improvements in travel and analytics, and equally hit by COVID. But its three-point volume barely changed (+5 attempts vs. the NBA's +19.5), and its home advantage barely moved (−0.7pp vs. the NBA's −4.4pp). The conditions were shared, only one league had the three-point explosion, and only that league lost its home edge.
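The control-group comparison above reduces to a difference-in-differences: the exposed group's change minus the control group's change. Here is a minimal sketch using only the NBA and EuroLeague figures quoted above. The function name is our own, and the result is only meaningful under the assumption that both leagues would otherwise have trended in parallel.

```python
def diff_in_diff(exposed_change: float, control_change: float) -> float:
    """Change in the exposed group net of the change in the unexposed control.

    Assumes parallel trends: absent the factor, both groups would have moved alike.
    """
    return exposed_change - control_change


# Changes quoted in the text (NBA = exposed group, EuroLeague = control group).
three_point_gap = diff_in_diff(19.5, 5.0)  # extra growth in three-point attempts
home_edge_gap = diff_in_diff(-4.4, -0.7)   # extra change in home advantage, in pp

print(f"Excess three-point growth in the NBA: {three_point_gap:+.1f} attempts")
print(f"Excess change in NBA home advantage: {home_edge_gap:+.1f}pp")
```

In a backtest, the same subtraction applies to the exposed universe's return or hit rate against the unexposed universe's over the same window.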
What this means for your backtest: before attributing returns to your signal, ask which unexposed universe should also have moved if the market, not your factor, were the cause. If that universe stayed flat, you've ruled out the biggest alternative explanation.
2. Natural experiments
In your backtest: you rarely get to run a randomized trial on markets, but the world runs them for you in the form of a rule change, an index reconstitution, a regulatory shift, or an exogenous shock nobody in the data chose. Because the change is external to the actors you're studying, the before/after comparison approximates a controlled experiment, and that's what lets you say "caused," not just "correlated."
Worked proof: from 1994–95 through 1996–97 the NBA shortened the three-point line from 23'9" to 22', a rule change imposed from outside, not chosen by anyone studying home advantage. If the three-point theory were right, the shorter line should have spiked three-point volume and dented home advantage, and both effects should have partly reversed when the line moved back. That's exactly what happened: attempts nearly doubled, home win% fell 2.6pp, and both partly recovered when the line was restored. The forced change produced the predicted effect.
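The on/off/on prediction above can be checked mechanically. This is a minimal sketch, assuming a per-season series keyed by starting year (1994 for 1994–95), so the shortened-line window is `start=1994`, `end=1996`. The function names and the three-season comparison window are illustrative choices, not the project's code. A short window keeps long-run drift, such as the later three-point boom, from swamping the before/after comparison.

```python
from statistics import fmean


def phase_means(series: dict[int, float], start: int, end: int, window: int = 3):
    """Average of a per-season series before, during, and after a temporary rule change.

    Seasons are keyed by starting year; start..end are the seasons the rule was in force.
    Only `window` seasons on either side are used, to limit long-run drift.
    Raises statistics.StatisticsError if any phase has no data.
    """
    pre = [series[s] for s in range(start - window, start) if s in series]
    during = [series[s] for s in range(start, end + 1) if s in series]
    post = [series[s] for s in range(end + 1, end + 1 + window) if s in series]
    return fmean(pre), fmean(during), fmean(post)


def on_off_pattern(series: dict[int, float], start: int, end: int,
                   expected_sign: int, window: int = 3) -> bool:
    """True if the metric moved the predicted way while the rule was in force
    (expected_sign=+1 for a rise, -1 for a fall) and moved back at least partly after."""
    pre, during, post = phase_means(series, start, end, window)
    moved = (during - pre) * expected_sign > 0
    reverted = (during - post) * expected_sign > 0
    return moved and reverted


# Usage with your own per-season data (hypothetical names, data not supplied here):
#   on_off_pattern(three_point_attempts, 1994, 1996, expected_sign=+1)
#   on_off_pattern(home_win_rate, 1994, 1996, expected_sign=-1)
```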
What this means for your backtest: hunt for the exogenous events in your window — a rule change, a venue change, a forced reconstitution — and test whether your signal behaves the way your causal story predicts across them. A signal that survives a natural experiment is in a different class from one that only correlates.
3. Confounder elimination
In your backtest: a confounder is a third variable that drives both your apparent cause and your effect, manufacturing a correlation that isn't causal. The discipline is to enumerate the plausible alternative explanations and knock them down one by one — not to stop at the first relationship that looks strong.
Worked proof: we tested ten hypotheses, not one. Travel distance was ruled out (correlation of −0.01; the decline was uniform across every distance band). Time-zone/jet-lag effects were real per game but flat over time, so they couldn't explain a trend. Pace of play pointed the opposite way — faster play actually increases home advantage. Seasonal fatigue and even weekend-crowd size showed essentially no effect. COVID revealed the crowd's value but didn't cause the decline, which was already underway. Only after the alternatives fell did the three-point explanation stand.
What this means for your backtest: write down every alternative driver before you commit capital, and show your work ruling each out. The credibility of a finding is largely the credibility of the confounders you eliminated.
4. The ecological fallacy (levels of analysis)
In your backtest: a relationship that holds at the aggregate level need not hold within any single group — and vice versa. Conflating a cross-sectional pattern with a time-series one is how a backtest produces a signal that looks powerful but doesn't trade, because it was measured at the wrong level.
Worked proof: across eras, three-point volume and home advantage correlate at a striking −0.88. But within any single season, the team-level correlation is near zero (−0.03). That isn't a contradiction; it's a textbook ecological effect. Within a season, the gap between a team taking 35 versus 40 attempts is noise against team quality; across eras, the shift from 2.4 to 37.6 attempts changed the game for everyone at once. Reading only the within-season number would have killed a real finding; trusting the macro number without understanding why it holds would have meant being right by luck.
What this means for your backtest: be explicit about the level your signal lives at — security, sector, or regime — and don't assume a cross-sectional edge survives as a time-series one. Many "alpha decay" surprises are levels-of-analysis errors in disguise.
Why this is the work
Anyone can find a correlation. The reason an investment committee can trust a number is the chain of evidence behind it — a control group, a natural experiment, the confounders eliminated, the level of analysis stated. That discipline is what separates an analysis worth acting on from a chart. It's the same standard we hold our bespoke data work and technology due diligence to.
The full analysis, code, and datasets behind the worked examples are public in the project repository — reproducibility is part of the standard.