Better Estimators

Trimmed means, the Hill estimator, and other approaches for fat-tailed data.

If the sample mean and variance are unreliable for fat-tailed data, what should we use instead? Several alternative estimators have been developed, each with trade-offs between robustness and efficiency.

The Trimmed Mean

Definition

Trimmed Mean

The trimmed mean removes the top and bottom γ fraction of observations before computing the average:

$$\bar{X}_{\gamma} = \frac{1}{n - 2g} \sum_{i=g+1}^{n-g} X_{(i)}, \qquad g = \lfloor \gamma n \rfloor$$

where $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are the order statistics (sorted values) and γ ∈ (0, 1/2) is the fraction trimmed from each end.

Advantages:

  • More robust to extreme observations
  • Reduces the influence of outliers
  • Simple to compute and understand

Disadvantages:

  • Throws away the most informative observations — the extremes tell you about the tail
  • Choice of trimming percentage is arbitrary
  • May bias the estimate if the distribution is asymmetric

Key Insight

The Fundamental Trade-off

Trimmed means gain stability by ignoring extremes. But in fat-tailed domains, the extremes contain crucial information about risk. You're trading accuracy for precision — getting a more stable number that may miss the point entirely.
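
A minimal sketch in Python (numpy and scipy are assumed; the 5% trim per side and the Pareto sample with α = 1.5 are illustrative choices, not recommendations):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Heavy-tailed sample: classic Pareto with tail exponent alpha = 1.5
# (numpy's pareto() is shifted, so add 1 to get x_m = 1)
x = rng.pareto(1.5, size=10_000) + 1.0

# Manual trimmed mean: drop the lowest and highest 5% of sorted values
gamma = 0.05
x_sorted = np.sort(x)
cut = int(gamma * len(x))
manual = x_sorted[cut:len(x) - cut].mean()

# scipy's built-in version trims the same fraction from each end
builtin = stats.trim_mean(x, proportiontocut=gamma)

print(f"raw mean:     {x.mean():.3f}")
print(f"trimmed mean: {manual:.3f} (scipy: {builtin:.3f})")
```

Note how the trimmed mean sits well below the raw mean here: the discarded top 5% is exactly where a fat-tailed sample carries its weight.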

The Median

Definition

Median

The median is the middle value when observations are sorted:

$$\operatorname{median} = \begin{cases} X_{((n+1)/2)} & \text{if } n \text{ is odd} \\[4pt] \tfrac{1}{2}\bigl(X_{(n/2)} + X_{(n/2+1)}\bigr) & \text{if } n \text{ is even} \end{cases}$$

where $X_{(i)}$ denotes the i-th smallest observation.

The median is extremely robust: no small fraction of extreme values, however large, can drag it far. Its breakdown point is 50%, the maximum possible.

Advantages:

  • Maximum robustness to outliers
  • Always exists (no convergence issues)
  • Often more intuitive than the mean

Disadvantages:

  • Uses none of the tail information
  • Inefficient for thin-tailed data (wastes information)
  • Doesn't tell you about risk — only the typical value

Example

Median Wealth

US median household wealth is about $120,000, while mean wealth is over $700,000. The median is more "typical" — most households are closer to the median. But the mean captures the fact that a few extremely wealthy households exist.

For personal financial planning, the median is relevant. For understanding total economic resources, the mean matters (even if unstable).
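
A quick way to see this trade-off, as a sketch (assuming Pareto samples with α = 1.2, whose true median is 2^(1/1.2) ≈ 1.78):

```python
import numpy as np

rng = np.random.default_rng(0)

# With alpha = 1.2 the mean exists but converges painfully slowly;
# the median locks on almost immediately.
for trial in range(5):
    x = rng.pareto(1.2, size=10_000) + 1.0
    print(f"trial {trial}: mean = {x.mean():8.2f}   median = {np.median(x):.3f}")
```

Across trials the median barely moves while the mean swings by large factors, which is exactly the robustness/information trade-off described above.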

The Hill Estimator

Rather than trying to estimate the mean (which may not converge), we can estimate the tail exponent α directly. This tells us how fat-tailed the distribution is.

Building Intuition: The Log-Log Slope

Before looking at the formula, let's understand why it works. The key insight comes from the Pareto survival function:

$$S(x) = P(X > x) = \left(\frac{x_m}{x}\right)^{\alpha}$$

Taking logarithms of both sides:

$$\log S(x) = \alpha \log x_m - \alpha \log x$$

On a log-log plot, this is a straight line with slope −α!

Key Insight

The Geometric Idea

If we plot the empirical survival function on a log-log scale, points from a Pareto distribution should fall on a straight line. The slope of that line equals −α.

The Hill estimator is essentially fitting this slope using only the tail observations — the k largest values — where the power-law behavior is most reliable.
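
Here is a sketch of that geometric idea in Python (the sample size, seed, and 90th-percentile tail cutoff are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha_true = 2.5
x = rng.pareto(alpha_true, size=5_000) + 1.0   # Pareto with x_m = 1

# Empirical survival function evaluated at the sorted sample values
xs = np.sort(x)
n = len(xs)
surv = 1.0 - np.arange(1, n + 1) / (n + 1)     # plotting positions avoid S = 0

# Fit a straight line to log S(x) vs. log x over the top 10% of the data
tail = xs > np.quantile(xs, 0.90)
slope, intercept = np.polyfit(np.log(xs[tail]), np.log(surv[tail]), 1)
print(f"fitted slope = {slope:.2f}   (expected about {-alpha_true})")
```

The fitted slope should land near −2.5; repeating with different seeds shows how noisy the tail of an empirical survival function can be.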

Interactive: Understanding the Hill Estimator

The Hill estimator works by measuring the slope of the survival function on a log-log plot.

(Interactive controls: true tail index α = 2.50, sample size n = 500.)

Step 1: From Pareto to Log-Log

For a Pareto distribution, the survival function is:

$$S(x) = P(X > x) = \left(\frac{x_m}{x}\right)^{\alpha}$$

Taking logs of both sides:

$$\log S(x) = \alpha \log x_m - \alpha \log x$$

This is a straight line with slope −α on a log-log plot!

Log-Log Plot of Empirical Survival Function

Points should follow a straight line for power-law tails; slope = −α

Step 2: Estimate the Slope

The Hill estimator uses only the k largest observations to estimate the slope. This focuses on the tail, where the power-law behavior matters most.

Mathematically, it computes:

$$\hat{\alpha}_{\text{Hill}} = \frac{k}{\sum_{i=1}^{k} \ln\bigl(X_{(n-i+1)} / X_{(n-k)}\bigr)}$$

This is essentially the reciprocal of the average log-spacing between the k largest observations and the (k+1)-th largest, X_(n−k).
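
As a sketch, this formula translates almost line for line into Python (k = 50 mirrors the readout below; the helper name hill_estimate is ours):

```python
import numpy as np

def hill_estimate(x, k):
    """Hill estimator of the tail exponent alpha from the k largest values."""
    xs = np.sort(x)
    top = xs[-k:]            # the k largest: X_(n-k+1), ..., X_(n)
    threshold = xs[-k - 1]   # X_(n-k), the reference order statistic
    return k / np.sum(np.log(top / threshold))

rng = np.random.default_rng(7)
x = rng.pareto(2.5, size=500) + 1.0   # true alpha = 2.5
print(f"Hill estimate (k = 50): {hill_estimate(x, 50):.2f}")
```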

(Example readout: with k = 50 tail observations and true α = 2.50, the Hill estimate is 2.26, a relative error of 9.7%.)

Hill Estimate vs. Number of Tail Observations (k)

Too small k: high variance. Too large k: bias from non-tail observations.

The k Trade-off

Choosing k is the art of the Hill estimator:

  • k too small: High variance — estimate jumps around
  • k too large: High bias — includes non-tail observations
  • Just right: Look for a "plateau" in the Hill plot where the estimate stabilizes

The Formal Definition

Definition

Hill Estimator

The Hill estimator for the tail exponent α uses the k largest observations:

$$\hat{\alpha}_{\text{Hill}} = \frac{k}{\sum_{i=1}^{k} \ln\bigl(X_{(n-i+1)} / X_{(n-k)}\bigr)}$$

Key insight: The Hill estimator uses only the tail observations, which is precisely where the relevant information lives.

Challenges:

  • Choice of k: Too small gives high variance; too large includes non-tail observations
  • Assumes power-law tails: Will give misleading results for non-power-law distributions
  • Sensitive to dependence: Requires independent observations

Key Insight

Why Estimate Alpha?

Once you know α, you know which moments exist:

  • α > 1: Mean exists
  • α > 2: Variance exists
  • α > n: The n-th moment exists

This guides which estimators and methods are appropriate. If α < 2, don't waste time computing sample variances.
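
A small demonstration of why this matters, as a sketch (Pareto with α = 1.5, so the mean exists but the variance does not):

```python
import numpy as np

rng = np.random.default_rng(3)

# When alpha < 2 the sample variance never settles down:
# it tends to keep growing as n increases instead of converging.
for n in [1_000, 10_000, 100_000, 1_000_000]:
    x = rng.pareto(1.5, size=n) + 1.0
    print(f"n = {n:>9,}   sample variance = {x.var():12.1f}")
```

Each printed variance is a legitimate computation of a quantity that does not exist in the population; more data just produces bigger numbers.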

Interactive: Hill Plot

Explore the Hill estimator by generating Pareto samples and visualizing how the estimate varies with the choice of k:

(Example readout: with true α = 2.0 and n = 1000, the best estimate is 2.00, reached at k ≈ 432.)

The bias-variance tradeoff: Small k uses few data points (high variance, low bias). Large k includes observations from the body of the distribution (low variance, high bias). Look for a "stable" region where the estimate plateaus near the true value.

The Hill estimator estimates the tail exponent α of a Pareto-like distribution using the k largest order statistics. The challenge is choosing k:

  • Too small: High variance — the estimate jumps around
  • Too large: High bias — we include non-tail data
  • Just right: A plateau region where the estimate is stable

In practice, finding this optimal k is one of the key challenges in fat-tailed estimation. The Hill plot helps visualize where the stable region might be.
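
Here is a sketch of how one might compute a Hill plot and scan it for a plateau (the window size of 25 is an arbitrary choice, not a standard rule; real implementations use more careful criteria):

```python
import numpy as np

rng = np.random.default_rng(11)
alpha_true = 2.0
xs = np.sort(rng.pareto(alpha_true, size=1_000) + 1.0)
n = len(xs)

# Hill estimate for every k from 10 up to n/2: this curve is the Hill plot
ks = np.arange(10, n // 2)
est = np.array([k / np.sum(np.log(xs[-k:] / xs[-k - 1])) for k in ks])

# Naive plateau search: pick the k whose neighborhood varies the least
window = 25
spread = np.array([est[i:i + window].std() for i in range(len(est) - window)])
i_stable = spread.argmin()
print(f"most stable region near k = {ks[i_stable]}, "
      f"estimate ≈ {est[i_stable]:.2f} (true α = {alpha_true})")
```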

Maximum Likelihood Estimation

Definition

Maximum Likelihood

Maximum likelihood estimation (MLE) finds the parameters that make the observed data most probable, assuming a specific distribution family:

$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} \prod_{i=1}^{n} f(X_i; \theta)$$

Find the parameter value that would have made our data most likely.
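
For a concrete instance: if you commit to the Pareto family with a known minimum x_m, the MLE for α has a simple closed form, sketched below (x_m = 1 and the sample parameters are illustrative):

```python
import numpy as np

def pareto_alpha_mle(x, x_min=1.0):
    """Closed-form MLE of the Pareto tail exponent, assuming x_min is known."""
    x = np.asarray(x)
    return len(x) / np.sum(np.log(x / x_min))

rng = np.random.default_rng(5)
x = rng.pareto(2.5, size=2_000) + 1.0   # Pareto with x_m = 1, alpha = 2.5
print(f"MLE of alpha: {pareto_alpha_mle(x):.2f}")
```

Note the family resemblance to the Hill estimator, which is essentially this same MLE applied only to the k largest observations.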

Advantages:

  • Uses all the data efficiently
  • Well-understood theoretical properties
  • Can estimate multiple parameters simultaneously

Critical limitation:

Model Misspecification Risk

MLE requires you to correctly specify the distribution family. If you assume Gaussian when the data is Pareto, your estimates will be wrong — and there's no warning. This is especially dangerous because fat-tailed data can look thin-tailed in small samples.

Example

Fitting the Wrong Distribution

You have 100 observations that look roughly bell-shaped. You fit a Gaussian and estimate its mean and standard deviation. Your model predicts the probability of seeing a value greater than 20 is essentially zero.

But if the true distribution is a t-distribution with 3 degrees of freedom, the probability of a value above 20 is on the order of 10⁻⁴: small, but many orders of magnitude larger than the Gaussian prediction. Your model would catastrophically underestimate tail risk.
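
This failure mode is easy to reproduce, as a sketch (scipy's t and norm distributions; the seed and sample size are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)

# True process: Student's t with 3 degrees of freedom (fat-tailed)
data = stats.t.rvs(df=3, size=100, random_state=rng)

# Fit a Gaussian by maximum likelihood; nothing warns us the family is wrong
mu_hat, sigma_hat = stats.norm.fit(data)

threshold = 20
p_gauss = stats.norm.sf(threshold, loc=mu_hat, scale=sigma_hat)
p_true = stats.t.sf(threshold, df=3)
print(f"Gaussian fit: P(X > {threshold}) = {p_gauss:.1e}")
print(f"True t(3):    P(X > {threshold}) = {p_true:.1e}")
```

The fitted Gaussian underestimates the tail probability by many orders of magnitude, and nothing in the fitting procedure flags the problem.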

Comparing Estimators

Estimator      | Uses Extremes?          | Robust?  | Best For
---------------|-------------------------|----------|---------------------------
Sample Mean    | Yes (dominated by them) | No       | Thin tails only
Trimmed Mean   | No (discards them)      | Moderate | Reducing outlier influence
Median         | No                      | Maximum  | Typical value
Hill Estimator | Yes (focuses on them)   | Moderate | Tail exponent
MLE            | Yes                     | No       | Known distribution

Key Takeaways

  • Trimmed means gain robustness by discarding extremes, but lose crucial tail information
  • The median is maximally robust but tells you nothing about risk
  • The Hill estimator directly estimates the tail exponent α, which determines what statistics are meaningful
  • MLE is powerful but dangerous if you specify the wrong distribution family
  • There is no perfect estimator — choose based on what you need to know and what assumptions you can justify