Problem Set 3: Estimation
Problems on the Hill estimator and CLT failure.
This problem set focuses on estimation under fat tails. You will work with the Hill estimator — a key tool for estimating tail exponents — and explore why the Central Limit Theorem fails for fat-tailed distributions.
These problems bridge theory and practice, showing why standard statistical methods break down and what alternatives exist.
Problem 6: Hill Estimator
Problem Statement
Given the 10 largest observations: 50, 45, 40, 38, 35, 33, 30, 28, 27, 25. Estimate the tail exponent $\alpha$ using the Hill estimator with $k$ upper order statistics.
The Hill Estimator
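The Hill estimator uses the $k$ largest order statistics to estimate the tail exponent $\alpha$:
$$\hat{\alpha} = \left[ \frac{1}{k} \sum_{i=1}^{k} \ln \frac{X_{(i)}}{X_{(k+1)}} \right]^{-1}$$
where $X_{(1)} \ge X_{(2)} \ge \cdots \ge X_{(n)}$ are the order statistics (sorted from largest to smallest).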
Step-by-Step
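- Order statistics (already sorted): $X_{(1)} = 50,\ X_{(2)} = 45,\ \ldots,\ X_{(10)} = 25$
- With the chosen $k$, the threshold is $X_{(k+1)}$
- Compute $\ln\left( X_{(i)} / X_{(k+1)} \right)$ for $i = 1, \ldots, k$
- Sum the logs and divide by $k$
- Take the reciprocal to get $\hat{\alpha}$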
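The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the common form of the estimator in which the threshold is the $(k+1)$-th largest observation, the function name `hill_estimator` is made up for this example, and `example_k = 5` is a hypothetical choice used only to have something to run.

```python
import math


def hill_estimator(observations, k):
    """Hill estimate of the tail exponent from the k largest observations.

    Assumes the threshold is X_(k+1), so at least k + 1 positive
    observations are required.
    """
    xs = sorted(observations, reverse=True)  # X_(1) >= X_(2) >= ...
    if not 1 <= k < len(xs):
        raise ValueError("k must satisfy 1 <= k < number of observations")
    threshold = xs[k]  # X_(k+1) in 1-based notation
    if threshold <= 0:
        raise ValueError("observations above the threshold must be positive")
    # Log of each of the k largest observations relative to the threshold.
    log_ratios = [math.log(x / threshold) for x in xs[:k]]
    mean_log_excess = sum(log_ratios) / k
    if mean_log_excess == 0:
        raise ValueError("all k largest observations equal the threshold")
    return 1.0 / mean_log_excess


top_ten = [50, 45, 40, 38, 35, 33, 30, 28, 27, 25]
example_k = 5  # hypothetical choice, for illustration only
alpha_hat = hill_estimator(top_ten, example_k)
print(f"k = {example_k}: alpha_hat = {alpha_hat:.3f}")
```

Calling the function with a different `k` changes both the threshold and the number of log ratios averaged, so the estimate can move noticeably on a sample this small.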
Choosing k
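The choice of $k$ involves a bias-variance tradeoff. If $k$ is too small, the estimate rests on very few observations and has high variance. If $k$ is too large, it includes non-tail observations, which introduces bias. In practice, one often examines a "Hill plot" showing $\hat{\alpha}$ as a function of $k$ and looks for a range of $k$ over which the estimate is stable.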
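To make the Hill plot idea concrete, the sketch below computes the estimate for every admissible $k$ on a simulated sample. Everything here is an assumption for illustration: the helper name `hill_curve`, the simulated tail exponent of 1.5, the sample size, and the use of $X_{(k+1)}$ as the threshold. It returns the numbers one would plot rather than drawing the plot itself.

```python
import math
import random


def hill_curve(sample):
    """Return [(k, alpha_hat_k)] for k = 1 .. n - 1, with X_(k+1) as threshold."""
    xs = sorted(sample, reverse=True)
    logs = [math.log(x) for x in xs]
    curve = []
    running_log_sum = 0.0
    for k in range(1, len(xs)):
        running_log_sum += logs[k - 1]  # sum of ln X_(1) .. ln X_(k)
        mean_log_excess = running_log_sum / k - logs[k]
        if mean_log_excess > 0:  # skip degenerate cases caused by ties
            curve.append((k, 1.0 / mean_log_excess))
    return curve


random.seed(0)
true_alpha = 1.5  # assumed value for the simulation
sample = [random.paretovariate(true_alpha) for _ in range(5000)]
estimates = dict(hill_curve(sample))
for k in (10, 50, 200, 1000, 4000):
    print(k, round(estimates[k], 3))
```

Because the simulated data are exactly Pareto, there are no non-tail observations, so this run mostly shows the variance side of the tradeoff: estimates at small $k$ are noisy and settle as $k$ grows. The bias at large $k$ appears with data whose body departs from a power law.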
Problem 7: CLT Failure
Problem Statement
For $X \sim \text{Pareto}(1, 1.5)$ (minimum $x_m = 1$, tail exponent $\alpha = 1.5$), compute the variance of the sample mean. What does this imply about confidence intervals?
Standard CLT Setup
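For i.i.d. random variables $X_1, \ldots, X_n$ with mean $\mu$ and finite variance $\sigma^2$, the Central Limit Theorem tells us that, for large $n$, the sample mean is approximately normal:
$$\bar{X}_n \overset{\text{approx.}}{\sim} \mathcal{N}\!\left( \mu, \frac{\sigma^2}{n} \right), \qquad \operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n}$$
This is the foundation of confidence intervals, for example at the 95% level:
$$\bar{X}_n \pm 1.96 \, \frac{\sigma}{\sqrt{n}}$$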
The Problem with Pareto(1, 1.5)
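- For a Pareto distribution with minimum $x_m$ and tail exponent $\alpha > 2$, the variance is:
$$\operatorname{Var}(X) = \frac{x_m^2 \, \alpha}{(\alpha - 1)^2 (\alpha - 2)}$$
Since $\alpha = 1.5 < 2$, this formula does not apply and the variance is infinite!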
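A short derivation shows where the infinity comes from. It assumes the standard Pareto density $f(x) = \alpha x_m^{\alpha} / x^{\alpha + 1}$ for $x \ge x_m$, which is not written out above, with $x_m = 1$ and $\alpha = 1.5$:

$$E[X] = \int_1^{\infty} x \cdot 1.5 \, x^{-2.5} \, dx = 1.5 \int_1^{\infty} x^{-1.5} \, dx = 1.5 \left[ -2 x^{-1/2} \right]_1^{\infty} = 3$$

$$E[X^2] = \int_1^{\infty} x^2 \cdot 1.5 \, x^{-2.5} \, dx = 1.5 \int_1^{\infty} x^{-1/2} \, dx = 1.5 \left[ 2 \sqrt{x} \right]_1^{\infty} = \infty$$

The mean is finite, but the second moment diverges, so $\operatorname{Var}(X) = E[X^2] - (E[X])^2$ is infinite.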
The CLT Breaks Down
When $\sigma^2 = \infty$, the formula $\operatorname{Var}(\bar{X}_n) = \sigma^2 / n$ gives $\infty$ for every $n$. The variance of the sample mean is infinite regardless of sample size. Standard confidence intervals are completely meaningless.
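A small simulation can make this tangible by checking how often a nominal 95% interval actually contains the true mean. The sketch below rests on stated assumptions: samples are drawn from a Pareto with $x_m = 1$, whose mean is $\alpha / (\alpha - 1)$ for $\alpha > 1$; the function name `ci_coverage`, the sample size, and the number of trials are arbitrary choices for illustration.

```python
import math
import random


def ci_coverage(alpha, n, trials, seed=0):
    """Fraction of nominal 95% intervals that contain the true Pareto mean."""
    rng = random.Random(seed)
    true_mean = alpha / (alpha - 1.0)  # mean of Pareto(x_m = 1, alpha), alpha > 1
    hits = 0
    for _ in range(trials):
        xs = [rng.paretovariate(alpha) for _ in range(n)]
        mean = sum(xs) / n
        sample_var = sum((x - mean) ** 2 for x in xs) / (n - 1)
        half_width = 1.96 * math.sqrt(sample_var / n)
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / trials


# alpha = 1.5 has infinite variance; alpha = 4.0 has finite variance.
print("alpha = 1.5:", ci_coverage(1.5, n=500, trials=400))
print("alpha = 4.0:", ci_coverage(4.0, n=500, trials=400))
```

The sample standard deviation is always a finite number, so the interval can always be computed. The point of the comparison is that in the infinite-variance case the computed interval need not cover the true mean at anything like the advertised rate.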
Practical Implications
Consider a financial analyst computing a 95% confidence interval for average daily returns. If returns follow a fat-tailed distribution with $\alpha < 2$, the confidence interval has no theoretical basis. The formula gives a number, but that number is pure fiction.
This is why Taleb emphasizes that many statistical procedures become "noise" under fat tails — they appear to give answers, but those answers are meaningless.
What Happens Instead?
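For $1 < \alpha < 2$, a generalized CLT applies. The suitably centered and scaled sum converges not to a Gaussian but to a stable distribution. The convergence rate is much slower: the error of the sample mean shrinks like $n^{-(1 - 1/\alpha)}$ instead of $n^{-1/2}$.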
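As a worked illustration of how much slower this is, assume the error of the sample mean scales as $n^{-(1 - 1/\alpha)}$ (the usual rate for $1 < \alpha < 2$) and take $\alpha = 1.5$:

$$n^{-(1 - 1/1.5)} = n^{-1/3} \qquad \text{versus} \qquad n^{-1/2} \ \text{in the Gaussian case}$$

To halve the error, the sample size must grow by a factor $c$ with $c^{-1/3} = 1/2$, so $c = 2^3 = 8$, compared with $c = 2^2 = 4$ under the standard CLT. To cut the error tenfold, the factors are $10^3 = 1000$ versus $10^2 = 100$.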
What You Should Learn
- The Hill estimator is a practical tool for estimating tail exponents from data
- The choice of $k$ (number of order statistics) involves a bias-variance tradeoff
- Standard CLT-based methods require finite variance; they fail completely for $\alpha < 2$
- Confidence intervals computed from infinite-variance data are meaningless
- Under fat tails, we need alternative approaches: stable limits, robust statistics, or bounds-based methods