Why Standard Statistics Fail
Sample mean and variance instability when moments don't exist or converge slowly.
We've seen that fat-tailed distributions can have infinite variance or even infinite mean. Now we face a practical question: what happens when we try to use standard statistical methods on fat-tailed data? The short answer: they fail, often catastrophically and silently.
The Unstable Sample Mean
Sample Mean
Given observations $X_1, X_2, \ldots, X_n$, the sample mean is:

$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$
For thin-tailed distributions (like the Gaussian), the sample mean is a wonderful estimator. The Law of Large Numbers guarantees it converges to the true mean, and the variance of the sample mean shrinks predictably: $\mathrm{Var}(\bar{X}_n) = \sigma^2 / n$.
But for fat-tailed distributions, the story is very different.
The Pareto Paradox
Consider a Pareto distribution with tail index $1 < \alpha < 2$:
- The true mean exists (it's finite)
- But the variance is infinite
- Therefore, the variance of the sample mean is also infinite
In practice, the sample mean barely converges. The Law of Large Numbers still holds (the mean is finite), but with infinite variance there is no $\sqrt{n}$ error rate: the estimate jumps around wildly even with millions of observations, and each new extreme observation can dramatically shift it.
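To see this concretely, here is a minimal simulation sketch (numpy, with $\alpha = 1.5$ chosen purely for illustration) that tracks the running sample mean for Gaussian draws and Pareto draws side by side:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

# Thin-tailed benchmark: standard normal, true mean 0
gaussian = rng.standard_normal(n)

# Fat-tailed case: numpy's pareto() samples the Lomax form; adding 1
# gives the classic Pareto on [1, inf). With alpha = 1.5 the true mean
# is alpha / (alpha - 1) = 3, but the variance is infinite.
alpha = 1.5
pareto = rng.pareto(alpha, n) + 1

# Running sample mean after each new observation
counts = np.arange(1, n + 1)
running_gauss = np.cumsum(gaussian) / counts
running_pareto = np.cumsum(pareto) / counts

for k in (1_000, 10_000, 100_000, 1_000_000):
    print(f"n={k:>9,}  gaussian: {running_gauss[k - 1]:+.4f}  "
          f"pareto: {running_pareto[k - 1]:.4f}  (true mean 3.0)")
```

In typical runs the Gaussian column settles near zero almost immediately, while the Pareto column is still lurching toward (or away from) 3 after a million draws, with visible jumps wherever a huge observation lands.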
The Even Worse Sample Variance
Sample Variance
The sample variance is:

$$s_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \bar{X}_n\right)^2$$
For a Pareto distribution with $\alpha < 4$:
- If $\alpha \le 2$, the true variance is infinite, and the sample variance never converges; it drifts upward in jumps as new extremes arrive
- If $2 < \alpha < 4$, the true variance is finite, but the estimator itself has infinite variance (it depends on the fourth moment), so it can wildly overestimate or underestimate the true value
- Adding more data doesn't necessarily help: one extreme observation can dominate the entire estimate
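A quick sketch makes the point (numpy again, with an illustrative $\alpha = 1.5$, so the true variance is infinite). If the estimator were stable, ten independent million-draw samples would give roughly the same number; instead they disagree by orders of magnitude:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten independent samples of one million Pareto(alpha = 1.5) draws each.
# The true variance is infinite, so no finite answer can be "right".
for trial in range(10):
    x = rng.pareto(1.5, 1_000_000) + 1
    print(f"trial {trial}: sample variance = {x.var(ddof=1):14,.1f}")
```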
Estimating Variance of Market Returns
Suppose you're estimating the volatility of stock returns, which empirically show fat tails with a tail index around $\alpha \approx 3$.
Your sample variance will be extremely sensitive to whether your sample happens to include a market crash. Before the 2008 crisis, most samples showed "moderate" volatility. Including 2008 dramatically increased the estimate. This isn't just bad luck — it's a mathematical property of fat tails.
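The sensitivity is easy to demonstrate on synthetic data. The sketch below uses Student-t returns with 3 degrees of freedom as a stand-in for $\alpha \approx 3$ fat tails (the numbers are simulated, not real market data) and compares the volatility estimate with and without the single most extreme day:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic daily returns: Student-t with 3 degrees of freedom has a
# tail index of about 3; scaled to roughly 1% daily volatility.
returns = 0.01 * rng.standard_t(df=3, size=2_520)  # ~10 trading years

full = returns.std(ddof=1)
without_worst = np.delete(returns, np.abs(returns).argmax()).std(ddof=1)

print(f"daily vol, all days:          {full:.4%}")
print(f"daily vol, worst day removed: {without_worst:.4%}")
print(f"shift from removing ONE day:  {(full - without_worst) / without_worst:.1%}")
```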
Where the Assumptions Break
Standard statistical methods are built on assumptions that don't hold for fat-tailed distributions:
| Assumption | Reality Under Fat Tails |
|---|---|
| Finite variance | May be infinite |
| All observations contribute equally | One observation can dominate |
| CLT applies at moderate n | May need impossibly large n |
| Past data represents future risk | Extremes may not be in sample |
The Core Problem
For thin-tailed distributions, outliers are rare and have bounded influence. You can safely ignore them or treat them as errors.
For fat-tailed distributions, outliers ARE the story. They're not noise — they're the most informative part of the data. But they're also what makes estimation so hard.
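One standard diagnostic follows from this: the maximum-to-sum ratio. If $E|X|^p$ is finite, the largest single term's share of $\sum_i |X_i|^p$ goes to zero as $n$ grows; if that moment is infinite, the share keeps rebounding no matter how much data arrives. A minimal sketch (numpy, with $p = 2$ to probe the variance and $\alpha = 1.5$ assumed for illustration):

```python
import numpy as np

def max_to_sum(x, p):
    """Running share of the largest |x|^p term in the running sum of |x|^p.

    Tends to 0 when E|X|^p is finite; keeps bouncing back up when it isn't.
    """
    xp = np.abs(x) ** p
    return np.maximum.accumulate(xp) / np.cumsum(xp)

rng = np.random.default_rng(1)
gauss = rng.standard_normal(100_000)      # all moments finite
pareto = rng.pareto(1.5, 100_000) + 1     # alpha = 1.5: infinite variance

ms_gauss = max_to_sum(gauss, p=2)
ms_pareto = max_to_sum(pareto, p=2)

for n in (1_000, 10_000, 100_000):
    print(f"n={n:>7,}  gaussian: {ms_gauss[n - 1]:.5f}  "
          f"pareto: {ms_pareto[n - 1]:.5f}")
```

For the Gaussian the ratio collapses toward zero; for the Pareto it stays stubbornly large, which is exactly the "one observation can dominate" behavior described above.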
A More Formal Look
For a Pareto distribution with $1 < \alpha < 2$:

$$\mathrm{Var}(\bar{X}_n) = \frac{\mathrm{Var}(X)}{n} = \frac{\infty}{n} = \infty$$

Read: "the variance of the sample mean equals the variance of $X$ divided by $n$, which is infinity divided by $n$, which is still infinity." Dividing infinity by any finite number gives infinity: the sample mean's uncertainty, as measured by variance, doesn't shrink with more data.
This is not a technical oddity. It has profound implications:
- Confidence intervals lose their meaning: the theoretical interval is infinitely wide, and the finite intervals software prints fail to achieve their nominal coverage (see the simulation after this list)
- Hypothesis tests have incorrect Type I error rates
- Regression coefficients converge slowly or not at all
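The confidence-interval failure is easy to check empirically. Here is a minimal coverage sketch (numpy; Pareto with $\alpha = 1.5$, whose true mean is 3): build the textbook 95% interval $\bar{X} \pm 1.96\, s/\sqrt{n}$ across many trials and count how often it actually contains the true mean. Runs of this kind typically cover well below the nominal 95%:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, true_mean = 1.5, 3.0   # Pareto(1.5) on [1, inf) has mean 3
n, trials = 1_000, 2_000

covered = 0
for _ in range(trials):
    x = rng.pareto(alpha, n) + 1
    m = x.mean()
    se = x.std(ddof=1) / np.sqrt(n)
    # Textbook 95% interval: mean +/- 1.96 standard errors
    if m - 1.96 * se <= true_mean <= m + 1.96 * se:
        covered += 1

print(f"nominal coverage: 95.0%   actual: {covered / trials:.1%}")
```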
Standard statistical software will happily compute means, variances, and p-values for fat-tailed data. It won't warn you that these numbers are meaningless. You must know when your data violates the assumptions.
Key Takeaways
- The sample mean is highly unstable for fat-tailed distributions with $1 < \alpha < 2$, even though the true mean exists
- The sample variance doesn't converge for $\alpha \le 2$ and remains wildly unstable for $\alpha < 4$
- Standard statistics assume finite variance and moderate outlier influence — assumptions that fail under fat tails
- Statistical software won't warn you — understanding your data's tail behavior is your responsibility