Practical Detection
Log-log plots, the mean excess function, and other tools for identifying fat tails in real data.
How do you know if your data has fat tails? This section covers practical techniques for detecting heavy-tailed behavior, from visual methods to formal statistical tests.
The Problem with Sample Kurtosis
A natural first thought: compute the sample kurtosis. The Gaussian has kurtosis 3, so higher values might indicate fat tails. But there's a fundamental problem.
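Sample Kurtosis
For observations X₁, ..., Xₙ with sample mean X̄, the sample kurtosis is [(1/n) Σ (Xᵢ - X̄)⁴] / [(1/n) Σ (Xᵢ - X̄)²]².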
The Kurtosis Paradox
If the true distribution has α ≤ 4 (e.g., Pareto with α = 3), the fourth moment diverges and the theoretical kurtosis is infinite. But your sample kurtosis will always be finite, and sometimes surprisingly close to 3!
This is because extreme values are rare in any finite sample. The sample kurtosis underestimates tail heaviness precisely when it matters most.
Simulation Illustration
Take 1000 samples from Pareto with α = 2.5 (infinite kurtosis):
- Sample kurtosis often comes out between 5-15
- Sometimes you get 3-4 (looks Gaussian!)
- Occasionally 50+ (when you catch an extreme)
The sample kurtosis is extremely unstable — it tells you more about whether you happened to catch an extreme observation than about the true distribution.
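The instability is easy to reproduce. The sketch below is a minimal illustration, not a benchmark: the helper names `sample_kurtosis` and `pareto_sample`, the seed, and the choice of 200 repetitions are assumptions made for this example. It draws repeated samples of size 1000 from a Pareto with α = 2.5 and reports the spread of the resulting sample kurtosis values.

```python
import numpy as np

def sample_kurtosis(x):
    """Plain (non-excess) sample kurtosis: fourth central moment over squared variance."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    m4 = np.mean(d ** 4)
    return m4 / m2 ** 2

def pareto_sample(alpha, n, rng, x_m=1.0):
    """Draw n Pareto(alpha, x_m) values by inverse transform sampling."""
    u = rng.random(n)                      # uniform on [0, 1)
    return x_m * (1.0 - u) ** (-1.0 / alpha)

rng = np.random.default_rng(0)             # seed chosen arbitrarily
kurts = np.array([
    sample_kurtosis(pareto_sample(2.5, 1000, rng))
    for _ in range(200)                    # 200 independent samples of size 1000
])

# The spread across repetitions is the point, not any single value.
print("min   :", kurts.min())
print("median:", np.median(kurts))
print("max   :", kurts.max())
```

Every value printed is finite even though the true kurtosis is infinite, and the gap between the smallest and largest value shows how strongly the statistic depends on which extremes happened to land in the sample. The exact numbers depend on the seed.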
Log-Log Plots
The most reliable visual tool for detecting power law tails is the log-log plot.
Log-Log Plot
Plot log P(X > x) against log x. If the distribution has a power law tail P(X > x) ≈ c·x^(-α), this appears as a straight line with slope -α.
The logic:
log P(X > x) = log c - α log x
Read: “Log of the survival probability equals log c minus alpha times log x”
On log-log scale, a power law becomes a straight line with slope -α
Practical Implementation
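- Sort your data: X₍₁₎ ≤ X₍₂₎ ≤ ... ≤ X₍ₙ₎
- Estimate the survival probability at each order statistic: P(X > X₍ᵢ₎) ≈ (n-i)/n
- Plot log((n-i)/n) on the vertical axis against log(X₍ᵢ₎) on the horizontal axis, dropping i = n, where the estimate is 0 and the log is undefined
- Look for linearity in the upper tail (large X values)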
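The steps above can be sketched as follows. This is a rough illustration that assumes strictly positive data; the function names `loglog_survival` and `tail_slope`, and the choice to fit the top 10% of points by least squares, are assumptions of this example rather than a recommended estimator. Least-squares slopes on log-log plots are known to be noisy, so treat the result as a visual aid.

```python
import numpy as np

def loglog_survival(x):
    """Return (log X_(i), log((n - i)/n)) for i = 1..n-1.

    The largest observation is dropped because its empirical
    survival probability is 0 and the log is undefined.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    i = np.arange(1, n)                    # ranks 1..n-1
    surv = (n - i) / n                     # empirical P(X > X_(i))
    return np.log(xs[:-1]), np.log(surv)

def tail_slope(x, tail_fraction=0.1):
    """Least-squares slope of the log-log plot over the upper tail."""
    log_x, log_s = loglog_survival(x)
    k = max(int(tail_fraction * log_x.size), 2)
    slope, _intercept = np.polyfit(log_x[-k:], log_s[-k:], 1)
    return slope

rng = np.random.default_rng(1)             # seed chosen arbitrarily
data = (1.0 - rng.random(20000)) ** (-1.0 / 2.0)   # Pareto, alpha = 2, x_m = 1

print("fitted tail slope:", tail_slope(data))      # should be near -alpha
```

Passing the two arrays from `loglog_survival` to any plotting library gives the log-log plot itself; the fitted slope is only a summary of what the eye should already see in the tail.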
What to Look For
Straight line in the tail region → power law with slope = -α
Downward curve → lighter than power law (exponential, Gaussian)
Upward curve → heavier than a simple power law; if the tail slope is flatter than -1 (α ≤ 1), the mean itself is infinite
The Mean Excess Function
Mean Excess Function
The mean excess function (or mean residual life) at threshold u is:
e(u) = E[X - u | X > u]
Read: “e of u equals the expected value of X minus u, given that X exceeds u”
The average amount by which X exceeds u, among those values that do exceed u
The mean excess function reveals tail behavior through its shape:
Exponential: e(u) = constant
For Exp(λ), e(u) = 1/λ regardless of u. This is the memoryless property.
Gaussian: e(u) → 0
For Gaussian, e(u) decreases to 0 as u → ∞. Light tails exhaust quickly.
Pareto: e(u) increases linearly
For Pareto with α > 1 and threshold u ≥ x_m, e(u) = u/(α-1). The higher the threshold, the more excess on average!
Pareto Mean Excess
For Pareto with α = 2 and x_m = 1:
- e(1) = 1 — average excess above 1 is 1
- e(10) = 10 — average excess above 10 is 10
- e(100) = 100 — average excess above 100 is 100
The conditional mean keeps growing — there's always more extreme territory ahead.
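The three shapes can be compared directly from their closed forms. The sketch below is illustrative only: the function names are made up for this example, the Gaussian case is restricted to the standard normal, and its formula φ(u)/(1 - Φ(u)) - u is the standard inverse Mills ratio result rather than something stated in the text above.

```python
import math

def mean_excess_exponential(u, lam):
    """Exp(lam): e(u) = 1/lam for every threshold u >= 0 (memoryless)."""
    return 1.0 / lam

def mean_excess_std_gaussian(u):
    """Standard normal: e(u) = pdf(u) / P(X > u) - u."""
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    sf = 0.5 * math.erfc(u / math.sqrt(2.0))      # survival function P(X > u)
    return pdf / sf - u

def mean_excess_pareto(u, alpha, x_m=1.0):
    """Pareto(alpha, x_m) with alpha > 1."""
    if alpha <= 1.0:
        raise ValueError("mean excess is infinite for alpha <= 1")
    if u < x_m:
        # Every observation exceeds u, so e(u) = E[X] - u.
        return alpha * x_m / (alpha - 1.0) - u
    return u / (alpha - 1.0)

for u in (1.0, 2.0, 4.0, 8.0):
    print(
        u,
        round(mean_excess_std_gaussian(u), 4),    # shrinks toward 0
        mean_excess_exponential(u, 1.0),          # stays at 1/lam
        mean_excess_pareto(u, 2.0),               # grows linearly in u
    )
```

Reading across each printed row shows the contrast at a glance: the Gaussian column falls, the exponential column does not move, and the Pareto column tracks the threshold.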
Explore: Mean Excess Plots
[Interactive figure: mean excess function e(u) = E[X - u | X > u] compared across three distributions. Gaussian: e(u) → 0 as u → ∞. Exponential: e(u) = 1/λ (constant). Pareto: e(u) = u/(α-1) (linear).]
Practical Detection
To detect fat tails in your data: estimate the mean excess at various thresholds.
- Decreasing e(u): Thin tails (Gaussian-like)
- Constant e(u): Exponential-like (boundary case)
- Increasing e(u): Fat tails — beware!
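On real data the mean excess has to be estimated. A minimal sketch, assuming the thresholds are taken at sample quantiles (an arbitrary choice made here) and using a hypothetical helper `empirical_mean_excess`: for each threshold, average the exceedances above it.

```python
import numpy as np

def empirical_mean_excess(x, thresholds):
    """Average of (x - u) over observations with x > u, for each threshold u."""
    x = np.asarray(x, dtype=float)
    out = []
    for u in thresholds:
        exceed = x[x > u]
        # No exceedances means the estimate is undefined at this threshold.
        out.append(exceed.mean() - u if exceed.size > 0 else np.nan)
    return np.array(out)

rng = np.random.default_rng(2)             # seed chosen arbitrarily
n = 50000
samples = {
    "gaussian": rng.standard_normal(n),
    "exponential": rng.exponential(1.0, n),
    "pareto (alpha=3)": (1.0 - rng.random(n)) ** (-1.0 / 3.0),
}

levels = [0.5, 0.75, 0.9, 0.95, 0.99]      # thresholds at these sample quantiles
for name, x in samples.items():
    u = np.quantile(x, levels)
    print(name, np.round(empirical_mean_excess(x, u), 3))
```

Estimates at the highest thresholds rest on few exceedances and are noisy, so the trend across thresholds matters more than any single value.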
A Practical Detection Checklist
- Visual inspection: Does your histogram have a long right tail with a few extreme values far from the bulk?
- Log-log plot: Is the tail region approximately linear? Estimate slope to get α.
- Mean excess plot: Does e(u) increase with u (fat tails), stay constant (exponential), or decrease (thin tails)?
- Stability test: Remove the largest observation. Does your mean change dramatically? That's a fat tail signature.
- Historical context: In your domain, have events occurred that were “unprecedented” or “impossible” under normal assumptions?
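The stability test in the checklist is simple to run. The sketch below is one possible way to quantify it, assuming positive data with a nonzero mean; the function name `mean_shift_without_max` and the two synthetic samples are illustrative choices, not part of the checklist itself.

```python
import numpy as np

def mean_shift_without_max(x):
    """Relative drop in the sample mean after removing the single largest value."""
    x = np.asarray(x, dtype=float)
    full_mean = x.mean()
    trimmed_mean = np.delete(x, np.argmax(x)).mean()
    return (full_mean - trimmed_mean) / full_mean

rng = np.random.default_rng(3)             # seed chosen arbitrarily
n = 10000
thin = rng.exponential(1.0, n)                      # exponential tail
fat = (1.0 - rng.random(n)) ** (-1.0 / 1.2)         # Pareto, alpha = 1.2

print("exponential:", mean_shift_without_max(thin))
print("pareto     :", mean_shift_without_max(fat))
```

With thin tails, one observation out of ten thousand barely moves the mean. With a tail exponent close to 1, the single largest value can account for a visible fraction of it.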
The Masquerade Problem
With limited data, fat-tailed distributions can masquerade as thin-tailed ones. If you haven't observed an extreme event, your data may look normal. This is not a failure of your tests — it's intrinsic to fat tails.
When in doubt, especially in domains such as finance, catastrophes, or epidemics, assume fat tails until proven otherwise.
Formal Statistical Tests
Several formal estimators and tests exist, though all have limitations:
Hill Estimator
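Estimates the tail exponent α using only the k largest observations. Choice of k is crucial and often difficult: too small and the estimate is dominated by noise, too large and observations from outside the tail bias it.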
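A minimal sketch of the Hill estimator in its usual textbook form, α̂ = k / Σ log(X₍ₙ₋ᵢ₊₁₎ / X₍ₙ₋ₖ₎) summed over the k largest observations. The function name `hill_alpha`, the synthetic data, and the grid of k values are assumptions of this example; the data must be positive in the tail.

```python
import numpy as np

def hill_alpha(x, k):
    """Hill estimate of the tail exponent alpha from the k largest observations."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = xs[n - k - 1]              # the (k+1)-th largest value
    if threshold <= 0:
        raise ValueError("the threshold must be positive to take logs")
    log_excess = np.log(xs[n - k:]) - np.log(threshold)
    return k / log_excess.sum()

rng = np.random.default_rng(4)             # seed chosen arbitrarily
data = (1.0 - rng.random(20000)) ** (-1.0 / 2.0)    # Pareto, alpha = 2

# Scan several k values; a stable region suggests a usable estimate.
for k in (50, 200, 1000, 5000):
    print(k, round(hill_alpha(data, k), 3))
```

On exact Pareto data the estimate is stable across k. On real data the usual practice is to plot the estimate against k and look for a plateau, which is where the difficulty of choosing k shows up.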
Pickands Estimator
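Based on extreme value theory. It applies to any tail type (heavy, exponential-like, or bounded), whereas Hill assumes a power law tail, but it has higher variance than Hill for the same k.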
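A sketch of the Pickands estimator in its standard form, which estimates the extreme value index ξ (for a power law tail, ξ = 1/α) from three upper order statistics: ξ̂ = log[(X₍ₙ₋ₖ₊₁₎ - X₍ₙ₋₂ₖ₊₁₎) / (X₍ₙ₋₂ₖ₊₁₎ - X₍ₙ₋₄ₖ₊₁₎)] / log 2. The function name `pickands_xi`, the sample sizes, and k = 2000 are assumptions of this example, and continuous data without ties is assumed.

```python
import numpy as np

def pickands_xi(x, k):
    """Pickands estimate of the extreme value index xi; requires 4k <= n."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    if k < 1 or 4 * k > n:
        raise ValueError("k must satisfy 1 <= k <= n / 4")
    a = xs[n - k]                          # X_(n-k+1) in 1-based order statistics
    b = xs[n - 2 * k]                      # X_(n-2k+1)
    c = xs[n - 4 * k]                      # X_(n-4k+1)
    return np.log((a - b) / (b - c)) / np.log(2.0)

rng = np.random.default_rng(5)             # seed chosen arbitrarily
n = 100000
pareto = (1.0 - rng.random(n)) ** (-1.0 / 2.0)      # alpha = 2, so xi = 0.5
expo = rng.exponential(1.0, n)                      # exponential tail, xi = 0

print("pareto     :", pickands_xi(pareto, 2000))
print("exponential:", pickands_xi(expo, 2000))
```

An estimate clearly above 0 points to a heavy tail, while a value near 0 is consistent with an exponential-type tail. The estimator needs large samples to be useful because it discards most of the information in the tail.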
Moment Ratio Tests
Compare ratios of sample moments to theoretical values. Sensitive to sample size and extreme observations.
Key Takeaways
- Sample kurtosis is unreliable — it underestimates tail heaviness
- Log-log plots reveal power laws as straight lines with slope -α
- The mean excess function e(u) = E[X-u | X>u] characterizes tail behavior: increasing → fat tails, constant → exponential, decreasing → thin tails
- Limited data can hide fat tails — the masquerade problem
- When uncertain about tail behavior, assume fat tails as the safer default