Problem Set 3: Estimation

Problems on the Hill estimator and CLT failure.

This problem set focuses on estimation under fat tails. You will work with the Hill estimator — a key tool for estimating tail exponents — and explore why the Central Limit Theorem fails for fat-tailed distributions.

These problems bridge theory and practice, showing why standard statistical methods break down and what alternatives exist.

Problem 6: Hill Estimator

Definition

Problem Statement

Given the 10 largest observations: 50, 45, 40, 38, 35, 33, 30, 28, 27, 25. Estimate the tail exponent $\alpha$ using the Hill estimator with $k = 10$.

The Hill Estimator

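The Hill estimator uses the $k$ largest order statistics to estimate the tail exponent:

$$\hat{\alpha}_k = \left( \frac{1}{k} \sum_{i=1}^{k} \ln \frac{X_{(i)}}{X_{(k)}} \right)^{-1}$$

where $X_{(1)} \ge X_{(2)} \ge \dots \ge X_{(n)}$ are the order statistics (sorted from largest to smallest).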

Step-by-Step

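  1. Order statistics (already sorted): $X_{(1)} = 50, X_{(2)} = 45, \dots, X_{(10)} = 25$
  2. With $k = 10$, the threshold is $X_{(k)} = X_{(10)} = 25$
  3. Compute $\ln(X_{(i)}/X_{(k)})$ for $i = 1, \dots, k$
  4. Sum the logs and divide by $k$
  5. Take the reciprocal to get $\hat{\alpha}$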
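The steps above can be sketched in Python as follows. This is a minimal illustration that assumes the convention where the $k$-th largest observation $X_{(k)}$ is the threshold. Some texts divide by $X_{(k+1)}$ instead, which gives a different number. The function name `hill_alpha` is hypothetical, and the sketch uses all ten observations ($k = 10$).

```python
import math


def hill_alpha(data, k):
    """Hill estimate of the tail exponent alpha from the k largest values.

    Assumes the convention that uses the k-th largest value X_(k) as the
    threshold; texts that use X_(k+1) instead get a different estimate.
    """
    if not 2 <= k <= len(data):
        raise ValueError("k must satisfy 2 <= k <= len(data)")
    # Step 1: order statistics, largest first.
    order = sorted(data, reverse=True)
    # Step 2: the threshold is the k-th largest observation.
    threshold = order[k - 1]
    # Step 3: log-ratios ln(X_(i) / X_(k)) for i = 1..k.
    log_ratios = [math.log(x / threshold) for x in order[:k]]
    # Step 4: average the log-ratios (this estimates 1/alpha).
    xi_hat = sum(log_ratios) / k
    # Step 5: the reciprocal is the tail-exponent estimate.
    return 1.0 / xi_hat


observations = [50, 45, 40, 38, 35, 33, 30, 28, 27, 25]
alpha_hat = hill_alpha(observations, k=10)
print(f"alpha_hat = {alpha_hat:.3f}")  # prints alpha_hat = 3.168
```

Under this convention the log-ratios sum to about 3.156, so $1/\hat{\alpha} \approx 0.316$ and $\hat{\alpha} \approx 3.17$.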
Key Insight

Choosing k

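The choice of $k$ involves a bias-variance tradeoff. If $k$ is too small, the estimate rests on only a few observations and has high variance. If $k$ is too large, it includes non-tail observations, which introduces bias. In practice, one often examines a "Hill plot" showing $\hat{\alpha}_k$ as a function of $k$ and looks for a range of $k$ where the estimate is stable.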
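A Hill plot can be tabulated without any plotting library. The sketch below is illustrative only. It draws a synthetic sample from a Pareto distribution with a known exponent, here assumed to be $\alpha = 1.5$ with scale 1, using the standard-library method `random.Random.paretovariate`. It then prints $\hat{\alpha}_k$ for several values of $k$, using the same $X_{(k)}$-threshold convention. The helper name `hill_curve`, the seed, and the sample size are hypothetical choices.

```python
import math
import random


def hill_curve(data, ks):
    """Return {k: Hill estimate of alpha}, using X_(k) as the threshold."""
    order = sorted(data, reverse=True)
    logs = [math.log(x) for x in order]
    curve = {}
    for k in ks:
        # Mean of ln(X_(i) / X_(k)) over the top k values.
        xi_hat = sum(logs[:k]) / k - logs[k - 1]
        curve[k] = 1.0 / xi_hat
    return curve


rng = random.Random(42)  # fixed seed so the table is reproducible
true_alpha = 1.5
# paretovariate(a) samples a Pareto law with scale 1 and shape a.
sample = [rng.paretovariate(true_alpha) for _ in range(100_000)]

curve = hill_curve(sample, ks=[10, 50, 100, 500, 1000, 5000])
for k, a in curve.items():
    print(f"k = {k:5d}   alpha_hat = {a:.3f}")
```

These synthetic data are exactly Pareto, so the estimates should settle near 1.5 once $k$ is moderately large, while very small $k$ gives noisy values. On real data only the tail is Pareto-like, so large $k$ pulls in non-tail observations and the curve drifts away from the true exponent.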

Problem 7: CLT Failure

Definition

Problem Statement

For $X \sim \text{Pareto}(x_m = 1, \alpha = 1.5)$, compute the variance of the sample mean $\bar{X}_n$. What does this imply about confidence intervals?

Standard CLT Setup

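For i.i.d. random variables $X_1, \dots, X_n$ with mean $\mu$ and finite variance $\sigma^2$, the Central Limit Theorem tells us:

$$\sqrt{n}\,(\bar{X}_n - \mu) \xrightarrow{d} \mathcal{N}(0, \sigma^2), \qquad \operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n}.$$

This is the foundation of confidence intervals. The usual 95% interval is:

$$\bar{X}_n \pm 1.96\,\frac{\hat{\sigma}}{\sqrt{n}},$$

where $\hat{\sigma}$ is the sample standard deviation.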

The Problem with Pareto(1, 1.5)

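  • For a Pareto distribution with scale $x_m$ and tail exponent $\alpha$, the variance is:

$$\operatorname{Var}(X) = \begin{cases} \dfrac{x_m^2\,\alpha}{(\alpha - 1)^2(\alpha - 2)}, & \alpha > 2, \\[6pt] \infty, & 1 < \alpha \le 2. \end{cases}$$

Since $\alpha = 1.5 \le 2$, the variance is infinite! The mean, $\mathbb{E}[X] = \frac{\alpha x_m}{\alpha - 1} = 3$, is still finite.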
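The infinite variance can be checked directly from the first two moments. This short derivation uses the Pareto density $f(x) = \alpha x_m^{\alpha} x^{-\alpha - 1}$ for $x \ge x_m$, with $x_m = 1$ and $\alpha = 1.5$, so $f(x) = 1.5\,x^{-2.5}$:

```latex
\mathbb{E}[X]   = \int_{1}^{\infty} x \cdot 1.5\,x^{-2.5}\,dx
                = 1.5 \int_{1}^{\infty} x^{-3/2}\,dx
                = 1.5 \cdot 2 = 3,
\\[4pt]
\mathbb{E}[X^2] = \int_{1}^{\infty} x^2 \cdot 1.5\,x^{-2.5}\,dx
                = 1.5 \int_{1}^{\infty} x^{-1/2}\,dx
                = 1.5 \lim_{M \to \infty} \bigl( 2\sqrt{M} - 2 \bigr) = \infty.
```

Hence $\operatorname{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 = \infty$, and $\operatorname{Var}(\bar{X}_n) = \operatorname{Var}(X)/n = \infty$ for every $n$.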

Key Insight

The CLT Breaks Down

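When $\alpha < 2$, $\sigma^2 = \infty$, so the formula $\operatorname{Var}(\bar{X}_n) = \sigma^2/n$ gives $\infty$ for every $n$. The variance of the sample mean is infinite regardless of sample size. Standard confidence intervals are therefore meaningless: the $\hat{\sigma}$ they plug in is always finite, yet it is trying to estimate an infinite $\sigma$.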
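A quick Monte Carlo check makes this concrete. The sketch below is a hypothetical experiment, not part of the original problem. It repeatedly draws samples of size $n = 100$, builds the usual interval $\bar{X}_n \pm 1.96\,\hat{\sigma}/\sqrt{n}$, and counts how often the interval contains the true mean. It compares a Gaussian control (mean 3, standard deviation 1, chosen arbitrarily) against Pareto($x_m = 1$, $\alpha = 1.5$), whose true mean is also 3. The function name `ci_coverage`, the seed, and all settings are illustrative assumptions.

```python
import math
import random


def ci_coverage(draw, true_mean, n, reps, z=1.96):
    """Share of intervals mean +/- z * s / sqrt(n) that contain true_mean."""
    hits = 0
    for _ in range(reps):
        xs = [draw() for _ in range(n)]
        mean = sum(xs) / n
        # Sample standard deviation with the usual n - 1 denominator.
        s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
        if abs(mean - true_mean) <= z * s / math.sqrt(n):
            hits += 1
    return hits / reps


rng = random.Random(7)  # fixed seed so reruns give the same numbers
n, reps = 100, 2000

# Control case: Gaussian with mean 3 and standard deviation 1 (finite variance).
gauss_cov = ci_coverage(lambda: rng.gauss(3.0, 1.0), 3.0, n, reps)
# Pareto(x_m = 1, alpha = 1.5): finite mean 3, infinite variance.
pareto_cov = ci_coverage(lambda: rng.paretovariate(1.5), 3.0, n, reps)

print(f"Gaussian coverage: {gauss_cov:.3f}")
print(f"Pareto coverage:   {pareto_cov:.3f}")
```

The Gaussian run should land close to 0.95. The Pareto run typically falls short of the nominal level, and a larger $n$ does not reliably fix this, because the interval's width rests on a sample standard deviation that never settles down.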

Example

Practical Implications

Consider a financial analyst computing a 95% confidence interval for average daily returns. If returns follow a fat-tailed distribution with $\alpha < 2$, the confidence interval has no theoretical basis. The formula still returns a number, since the standard deviation of any finite sample is finite, but that number is pure fiction.

This is why Taleb emphasizes that many statistical procedures become "noise" under fat tails — they appear to give answers, but those answers are meaningless.

What Happens Instead?

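For $1 < \alpha < 2$, a generalized CLT applies. The centered sum $S_n - n\mu$, where $S_n = X_1 + \dots + X_n$, must be scaled by $n^{1/\alpha}$ rather than $\sqrt{n}$, and it converges not to a Gaussian but to an $\alpha$-stable distribution. The sample mean therefore converges much more slowly, at rate $n^{1/\alpha - 1}$ instead of $n^{-1/2}$. For $\alpha = 1.5$ this rate is $n^{-1/3}$.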
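A short calculation shows the practical cost of the slower rate. This is illustrative arithmetic using $\alpha = 1.5$ from Problem 7. The typical error of the sample mean scales like $n^{-1/2}$ under the classical CLT and like $n^{1/\alpha - 1} = n^{-1/3}$ here, so halving the error from a sample of size $n$ requires a new sample size $n'$ with:

```latex
\text{Gaussian: } \left(\frac{n'}{n}\right)^{-1/2} = \frac{1}{2}
  \;\Longrightarrow\; n' = 4n,
\qquad
\alpha = 1.5: \; \left(\frac{n'}{n}\right)^{-1/3} = \frac{1}{2}
  \;\Longrightarrow\; n' = 8n.
```

Equivalently, ten times as much data shrinks the typical error by a factor of $10^{1/2} \approx 3.16$ in the Gaussian case but only $10^{1/3} \approx 2.15$ when $\alpha = 1.5$.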

What You Should Learn

  • The Hill estimator is a practical tool for estimating tail exponents from data
  • The choice of $k$ (the number of order statistics) involves a bias-variance tradeoff
  • Standard CLT-based methods require finite variance: they fail completely for $\alpha < 2$
  • Confidence intervals computed from infinite-variance data are meaningless
  • Under fat tails, we need alternative approaches: stable limits, robust statistics, or bounds-based methods