Practical Detection

Log-log plots, the mean excess function, and other tools for identifying fat tails in real data.

How do you know if your data has fat tails? This section covers practical techniques for detecting heavy-tailed behavior, from visual methods to formal statistical tests.

The Problem with Sample Kurtosis

A natural first thought: compute the sample kurtosis. The Gaussian has kurtosis 3, so higher values might indicate fat tails. But there's a fundamental problem.

Definition

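Sample Kurtosis

For observations X₁, ..., Xₙ with sample mean X̄, the sample kurtosis is the fourth central sample moment divided by the square of the second:

$$\hat{\kappa} = \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^4}{\left(\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right)^2}$$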

Key Insight

The Kurtosis Paradox

If the true distribution has tail exponent α ≤ 4 (e.g., Pareto with α = 3), the fourth moment diverges and the theoretical kurtosis is infinite. But your sample kurtosis will always be finite, and sometimes surprisingly moderate.

This is because extreme values are rare in any finite sample. The sample kurtosis underestimates tail heaviness precisely when it matters most.

Example

Simulation Illustration

Take 1000 samples from Pareto with α = 2.5 (infinite kurtosis):

Every run returns a finite number: with n = 1000 the sample kurtosis can never exceed roughly 1000
The value swings widely from one run to the next
Most of that swing comes from the single largest observation in each run

The sample kurtosis is extremely unstable — it tells you more about whether you happened to catch an extreme observation than about the true distribution.
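The instability is easy to check for yourself. The following is a minimal sketch, not a reference implementation: it assumes the non-excess kurtosis with 1/n moments, draws Pareto samples with x_m = 1 through the standard library's `random.paretovariate`, and uses hypothetical helper names (`sample_kurtosis`, `kurtosis_across_runs`) chosen for illustration.

```python
import random


def sample_kurtosis(xs):
    """Non-excess sample kurtosis m4 / m2**2, using 1/n moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2


def kurtosis_across_runs(alpha=2.5, n=1000, runs=200, seed=0):
    """Sorted sample kurtosis values from independent Pareto(alpha) samples."""
    rng = random.Random(seed)  # fixed seed only so reruns are comparable
    return sorted(
        sample_kurtosis([rng.paretovariate(alpha) for _ in range(n)])
        for _ in range(runs)
    )


if __name__ == "__main__":
    ks = kurtosis_across_runs()
    print("smallest:", round(ks[0], 1))
    print("median  :", round(ks[len(ks) // 2], 1))
    print("largest :", round(ks[-1], 1))
```

Comparing the smallest, median, and largest values across runs shows how much the statistic depends on which extremes each sample happened to contain. Changing `seed` or `runs` gives a feel for the spread.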

Log-Log Plots

The most reliable visual tool for detecting power law tails is the log-log plot.

Definition

Log-Log Plot

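Plot $\log P(X > x)$ against $\log x$. If the distribution has a power law tail $P(X > x) \approx c\,x^{-\alpha}$, this appears as a straight line with slope -α.

The logic:

$$\log P(X > x) = \log c - \alpha \log x$$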

Read: “Log of the survival probability equals log c minus alpha times log x”

On log-log scale, a power law becomes a straight line with slope -α

Practical Implementation

Sort your data: X₍₁₎ ≤ X₍₂₎ ≤ ... ≤ X₍ₙ₎
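Estimate the survival probability at each sorted point: P(X > X₍ᵢ₎) ≈ (n-i)/n
Plot log(X₍ᵢ₎) against log((n-i)/n) for i = 1, ..., n-1 (the largest point has estimated survival 0, which has no logarithm)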
Look for linearity in the upper tail (large X values)
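The steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the data are strictly positive, the largest observation is dropped because its estimated survival probability is zero, and the slope is fitted by ordinary least squares over the top 10% of points. That fit is a convenience for reading the plot rather than a recommended estimator of α. The function names and the `tail_fraction` parameter are hypothetical.

```python
import math
import random


def loglog_survival_points(data):
    """(log x, log S(x)) pairs from the empirical survival function.

    Assumes strictly positive data. The largest observation is dropped
    because its estimated survival probability (n - n) / n is zero.
    """
    xs = sorted(data)
    n = len(xs)
    return [
        (math.log(x), math.log((n - i) / n))
        for i, x in enumerate(xs[:-1], start=1)
    ]


def tail_slope(points, tail_fraction=0.1):
    """Ordinary least-squares slope through the upper part of the plot."""
    k = max(2, int(len(points) * tail_fraction))
    tail = points[-k:]
    mx = sum(x for x, _ in tail) / len(tail)
    my = sum(y for _, y in tail) / len(tail)
    sxy = sum((x - mx) * (y - my) for x, y in tail)
    sxx = sum((x - mx) ** 2 for x, _ in tail)
    return sxy / sxx


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.paretovariate(2.0) for _ in range(20000)]
    slope = tail_slope(loglog_survival_points(sample))
    print("fitted tail slope:", round(slope, 2))
    print("implied alpha    :", round(-slope, 2))
```

The points returned by `loglog_survival_points` can be passed to any plotting library. The fitted slope is only meaningful if the tail region actually looks linear on the plot.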

Key Insight

What to Look For

Straight line in the tail region → power law with slope = -α

Downward curve → lighter than power law (exponential, Gaussian)

Upward curve → heavier than simple power law (possibly infinite mean)

The Mean Excess Function

Definition

Mean Excess Function

The mean excess function (or mean residual life) at threshold u is:

$$e(u) = E[X - u \mid X > u]$$

Read: “e of u equals the expected value of X minus u, given that X exceeds u”

The average amount by which X exceeds u, among those values that do exceed u

The mean excess function reveals tail behavior through its shape:

Exponential: e(u) = constant

For Exp(λ), e(u) = 1/λ regardless of u. This is the memoryless property.

Gaussian: e(u) → 0

For Gaussian, e(u) decreases to 0 as u → ∞. Light tails exhaust quickly.

Pareto: e(u) increases linearly

For Pareto(α > 1) and thresholds u ≥ x_m, e(u) = u/(α-1). The higher the threshold, the more excess on average!
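For readers who want to see where the constant and linear shapes come from, here is a short derivation. It assumes X has survival function S(x) = P(X > x) and a finite mean, and uses the standard identity that writes the mean excess as an integral of the survival function. The Pareto case uses the substitution x = ut.

```latex
% General identity (finite mean assumed), with S(x) = P(X > x)
e(u) = E[X - u \mid X > u] = \frac{1}{S(u)} \int_u^\infty S(x)\,dx

% Exponential with rate \lambda: S(x) = e^{-\lambda x}
e(u) = \int_u^\infty e^{-\lambda (x - u)}\,dx = \frac{1}{\lambda}

% Pareto with S(x) = (x_m / x)^{\alpha}, \alpha > 1, u \ge x_m
e(u) = \int_u^\infty \left(\frac{u}{x}\right)^{\alpha} dx
     = u \int_1^\infty t^{-\alpha}\,dt
     = \frac{u}{\alpha - 1}
```

The Pareto integral converges only for α > 1, which is why the formula carries that restriction.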

Example

Pareto Mean Excess

For Pareto with α = 2 and x_m = 1:

e(1) = 1 — average excess above 1 is 1
e(10) = 10 — average excess above 10 is 10
e(100) = 100 — average excess above 100 is 100

The conditional mean keeps growing — there's always more extreme territory ahead.

Explore: Mean Excess Plots

Compare mean excess functions for different distributions. Notice how the shape immediately reveals the tail character.

[Interactive chart: mean excess function e(u) = E[X - u | X > u] for a Pareto tail with adjustable exponent α (default 2.00), with optional Gaussian and Exponential curves for comparison]

Gaussian

e(u) → 0 as u → ∞

Once you're in the tail, there's not much further to go. Tails exhaust quickly.

Exponential

e(u) = 1/λ (constant)

Memoryless property. No matter how high the threshold, expected excess is the same.

Pareto

e(u) = u/(α-1) (linear)

The higher you go, the more excess to expect. Fat tails never “run out”.

Practical Detection

To detect fat tails in your data: estimate the mean excess at various thresholds.

Decreasing e(u): Thin tails (Gaussian-like)
Constant e(u): Exponential-like (boundary case)
Increasing e(u): Fat tails — beware!
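Estimating the mean excess from data takes only a few lines. The sketch below assumes thresholds taken from sample quantiles, which is one common convention rather than something prescribed here. The helper names `mean_excess` and `mean_excess_curve` are hypothetical.

```python
import random


def mean_excess(data, u):
    """Empirical e(u): average of (x - u) over observations with x > u."""
    excesses = [x - u for x in data if x > u]
    if not excesses:
        raise ValueError("no observations exceed the threshold")
    return sum(excesses) / len(excesses)


def mean_excess_curve(data, quantiles=(0.5, 0.75, 0.9, 0.95, 0.99)):
    """Evaluate e(u) at thresholds taken from the given sample quantiles."""
    xs = sorted(data)
    n = len(xs)
    thresholds = [xs[int(q * (n - 1))] for q in quantiles]
    return [(u, mean_excess(xs, u)) for u in thresholds]


if __name__ == "__main__":
    rng = random.Random(0)
    samples = {
        "gaussian": [rng.gauss(0.0, 1.0) for _ in range(20000)],
        "exponential": [rng.expovariate(1.0) for _ in range(20000)],
        "pareto(alpha=2)": [rng.paretovariate(2.0) for _ in range(20000)],
    }
    for name, xs in samples.items():
        print(name)
        for u, e in mean_excess_curve(xs):
            print(f"  u = {u:8.3f}   e(u) = {e:8.3f}")
```

Estimates at the highest thresholds rest on very few exceedances, so the right-hand end of any empirical mean excess curve is noisy and should be read with caution.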

A Practical Detection Checklist

Visual inspection: Does your histogram have a long right tail with a few extreme values far from the bulk?
Log-log plot: Is the tail region approximately linear? Estimate slope to get α.
Mean excess plot: Does e(u) increase with u (fat tails), stay constant (exponential), or decrease (thin tails)?
Stability test: Remove the largest observation. Does your mean change dramatically? That's a fat tail signature.
Historical context: In your domain, have events occurred that were “unprecedented” or “impossible” under normal assumptions?
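The stability test in the checklist is simple to automate. The sketch below measures the relative drop in the sample mean after removing the single largest observation. It assumes at least two observations and a non-zero mean. The function name is hypothetical, and what counts as a "dramatic" change is left to the reader's judgment.

```python
import random


def mean_shift_without_max(data):
    """Fraction by which the sample mean drops when the maximum is removed.

    Assumes at least two observations and a non-zero sample mean.
    """
    xs = list(data)
    full_mean = sum(xs) / len(xs)
    xs.remove(max(xs))  # drop one copy of the largest value
    trimmed_mean = sum(xs) / len(xs)
    return (full_mean - trimmed_mean) / full_mean


if __name__ == "__main__":
    rng = random.Random(0)
    thin = [abs(rng.gauss(0.0, 1.0)) for _ in range(10000)]
    fat = [rng.paretovariate(1.2) for _ in range(10000)]
    print("half-normal:", f"{mean_shift_without_max(thin):.4%}")
    print("Pareto(1.2):", f"{mean_shift_without_max(fat):.4%}")
```

For thin-tailed data the shift is negligible at this sample size, while for a heavy tail a single point can account for a visible share of the mean.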

Key Insight

The Masquerade Problem

With limited data, fat-tailed distributions can masquerade as thin-tailed ones. If you haven't observed an extreme event, your data may look normal. This is not a failure of your tests — it's intrinsic to fat tails.

When in doubt, and especially in domains such as finance, catastrophes, and epidemics, assume fat tails until proven otherwise.

Formal Statistical Tests

Several formal estimators and tests exist, though all have limitations:

Hill Estimator

Estimates the tail exponent α using only the k largest observations. Choice of k is crucial and often difficult.
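A minimal sketch of the Hill estimator follows, using the common form in which the (k+1)-th largest observation serves as the threshold and α is estimated as the reciprocal of the mean log-excess of the k largest values. The function name and the choice of k values in the demo are illustrative assumptions.

```python
import math
import random


def hill_estimator(data, k):
    """Hill estimate of alpha from the k largest observations.

    Uses the (k+1)-th largest value as the threshold; the tail values
    must be strictly positive.
    """
    xs = sorted(data, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < len(data)")
    threshold = xs[k]
    if threshold <= 0:
        raise ValueError("Hill estimator needs a positive threshold")
    mean_log_excess = sum(math.log(x / threshold) for x in xs[:k]) / k
    return 1.0 / mean_log_excess


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.paretovariate(2.0) for _ in range(20000)]
    for k in (50, 200, 1000, 5000):
        print(f"k = {k:5d}   alpha estimate = {hill_estimator(sample, k):.3f}")
```

Printing the estimate for several values of k is a crude version of a Hill plot: a range of k over which the estimate stays roughly flat is the usual informal guide for choosing k.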

Pickands Estimator

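Also based on extreme value theory. It remains valid for any tail type (heavy, exponential-like, or bounded), which makes it more robust than Hill, but it has higher variance and is therefore less efficient.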
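As a sketch, the Pickands estimator can be computed from three upper order statistics: the k-th, 2k-th, and 4k-th largest values. It estimates the extreme value index ξ, which for a power law tail equals 1/α. The function name and demo parameters are assumptions made for illustration.

```python
import math
import random


def pickands_estimator(data, k):
    """Pickands estimate of the extreme value index xi (xi = 1/alpha
    for a power law tail), from the k-th, 2k-th and 4k-th largest values.
    """
    xs = sorted(data, reverse=True)
    if k < 1 or 4 * k > len(xs):
        raise ValueError("need 1 <= k and 4 * k <= len(data)")
    a, b, c = xs[k - 1], xs[2 * k - 1], xs[4 * k - 1]
    # Ties among these order statistics make the ratio undefined.
    return math.log((a - b) / (b - c)) / math.log(2)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.paretovariate(2.0) for _ in range(20000)]  # true xi = 0.5
    for k in (50, 200, 1000):
        print(f"k = {k:5d}   xi estimate = {pickands_estimator(sample, k):.3f}")
```

Because it uses only three order statistics, the estimate tends to fluctuate more across values of k than the Hill estimate does on the same data.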

Moment Ratio Tests

Compare ratios of sample moments to theoretical values. Sensitive to sample size and extreme observations.

Key Takeaways

Sample kurtosis is unreliable — it underestimates tail heaviness
Log-log plots reveal power laws as straight lines with slope -α
The mean excess function e(u) = E[X-u | X>u] characterizes tail behavior: increasing → fat tails, constant → exponential, decreasing → thin tails
Limited data can hide fat tails — the masquerade problem
When uncertain about tail behavior, assume fat tails as the safer default