The Pareto Distribution

The quintessential fat-tailed distribution — power laws and the 80/20 rule.

The Pareto distribution is THE fat-tailed distribution. It is named after economist Vilfredo Pareto, who observed that roughly 80% of Italy's land was owned by 20% of the population, and it appears throughout nature and society. The key parameter, the tail exponent α, determines everything about its behavior.

Power Law Functions

Before we define the Pareto distribution, let's understand the mathematical function at its core: the power law.

Definition

Power Law

A power law relationship has the form:

$$y = x^{-\alpha}$$

As x increases, y decreases, but much more slowly than under exponential decay. The exponent α (alpha) controls the rate of decay.

Read: x to the minus alpha

x raised to the power of negative alpha — the larger x gets, the smaller y becomes

The crucial difference from exponential decay:

  • Exponential: decreases by a constant factor for each unit increase in x
  • Power law: decreases by a constant factor only for each multiplicative increase in x (for example, each doubling), so extreme values remain probable
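The contrast between the two decay rules can be checked numerically. The sketch below is illustrative only: the function names and the choice α = 2 are assumptions made for the example.

```python
import math

alpha = 2.0  # illustrative exponent, chosen for this example


def power_law(x):
    return x ** -alpha


def exponential(x):
    return math.exp(-x)


# Exponential: adding 1 to x always multiplies y by the same factor e^-1.
exp_step_ratios = [exponential(x + 1) / exponential(x) for x in (1, 10, 100)]

# Power law: doubling x always multiplies y by the same factor 2^-alpha.
pl_double_ratios = [power_law(2 * x) / power_law(x) for x in (1, 10, 100)]

# Power law: adding 1 to x shrinks y less and less as x grows.
pl_step_ratios = [power_law(x + 1) / power_law(x) for x in (1, 10, 100)]

print(exp_step_ratios)
print(pl_double_ratios)
print(pl_step_ratios)
```

The last list creeps toward 1, which is another way of saying that far out in the tail a power law barely decays from one unit to the next.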

Explore: Power Law Decay

Adjust the exponent α to see how it affects decay rate. Compare with exponential decay to see why power laws create “fat tails.”

Power Law Decay: y = x^(-α)

Unlike exponential decay (which drops rapidly), power law decay is slow. The exponent α controls how fast it decays, but for any fixed α the power law eventually falls more slowly than any exponential.

[Interactive chart: power law decay compared with exponential decay. Sample readout at α = 2.00: y = 0.250 at x = 2, y = 1.0e-2 at x = 10, y = 1.0e-4 at x = 100.]

The Log-Log Signature

On a log-log plot, power laws appear as straight lines. The slope equals −α. This is how we identify power laws in real data.

[Chart: power law and exponential decay on log-log axes]

Reading the log-log plot:

  • Power law (x^(-α)): straight line with slope = −α
  • Exponential (e^(-x)): curves downward (decays faster)
  • If data looks linear on log-log, it suggests a power law
  • The slope tells you the tail exponent α
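As a minimal sketch of this diagnostic, the snippet below evaluates y = x^(-α) on a small grid, takes logs of both axes, and fits a least-squares line. The helper name `loglog_slope`, the grid, and α = 2 are illustrative assumptions, not part of the original material.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


alpha = 2.0
xs = [1, 2, 5, 10, 20, 50, 100]
power_law = [x ** -alpha for x in xs]
exponential = [math.exp(-x) for x in xs]

slope_power = loglog_slope(xs, power_law)   # recovers -alpha
slope_exp = loglog_slope(xs, exponential)   # a forced fit; the curve is not a line
print(slope_power, slope_exp)
```

For the power law the fitted slope matches −α up to rounding. For the exponential a line can still be fitted, but the number it returns depends on the grid, which is the practical sign that the data are not a power law.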

Why power laws create “fat tails”:

  • At x = 10: exponential gives e^(-10) ≈ 0.00005, but x^(-2) = 0.01
  • At x = 100: exponential gives e^(-100) ≈ 4 × 10^(-44), essentially 0, but x^(-2) = 0.0001
  • Power laws stay much larger at extreme values: these are the “fat” tails
  • This means extreme events are far more likely than exponential or Gaussian models predict

Key Insight

The Signature of Fat Tails

Power law decay is slow. Even at x = 100 or x = 1000, there's still appreciable probability. This is why Pareto distributions produce extreme events far more often than Gaussian or exponential distributions predict. When you see a straight line on a log-log plot, you're seeing a power law — and the slope tells you how fat the tail is.

The Pareto PDF

Definition

Pareto Distribution

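A random variable X follows a Pareto distribution with scale parameter x_m > 0 and shape parameter (tail exponent) α > 0 if its PDF is:

$$f(x) = \frac{\alpha \, x_m^{\alpha}}{x^{\alpha + 1}} \quad \text{for } x \ge x_m, \qquad f(x) = 0 \text{ otherwise}$$

We write X ~ Pareto(x_m, α).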
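A quick way to get comfortable with the density is to code it and confirm that it integrates to 1. The sketch below assumes the standard Pareto density α·x_m^α / x^(α+1) on x ≥ x_m; the function names, default parameters, and the log-spaced trapezoid rule are choices made for this example.

```python
import math


def pareto_pdf(x, x_m=1.0, alpha=2.0):
    """Density alpha * x_m**alpha / x**(alpha + 1) for x >= x_m, else 0."""
    if x < x_m:
        return 0.0
    return alpha * x_m ** alpha / x ** (alpha + 1)


def integrate_pdf(x_m=1.0, alpha=2.0, upper=1e6, steps=20000):
    """Trapezoid rule on a log-spaced grid from x_m to upper."""
    log_lo, log_hi = math.log(x_m), math.log(upper)
    h = (log_hi - log_lo) / steps
    total = 0.0
    prev_x = x_m
    prev_f = pareto_pdf(prev_x, x_m, alpha)
    for i in range(1, steps + 1):
        x = math.exp(log_lo + i * h)
        f = pareto_pdf(x, x_m, alpha)
        total += 0.5 * (prev_f + f) * (x - prev_x)
        prev_x, prev_f = x, f
    return total


print(pareto_pdf(1.0), pareto_pdf(10.0))
print(integrate_pdf())  # close to 1; the small gap is the mass beyond `upper`
```

The grid is log-spaced because the density spans many orders of magnitude; a uniform grid would waste almost all of its points in the far tail.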

The Survival Function

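The survival function S(x) = P(X > x) of the Pareto has a remarkably simple form, the quintessential power law:

$$S(x) = P(X > x) = \left(\frac{x_m}{x}\right)^{\alpha} \quad \text{for } x \ge x_m$$

Read: S of x equals x_m over x, all to the power α

The fraction exceeding x is proportional to x to the negative α

And the CDF:

$$F(x) = 1 - S(x) = 1 - \left(\frac{x_m}{x}\right)^{\alpha} \quad \text{for } x \ge x_m$$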
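Because the survival function inverts in closed form, it also gives a simple way to draw Pareto samples. The following is a minimal sketch, assuming S(x) = (x_m / x)^α for x ≥ x_m; the function names, the seed, and the sample size are illustrative.

```python
import random


def pareto_survival(x, x_m=1.0, alpha=2.0):
    """P(X > x) = (x_m / x)**alpha for x >= x_m, and 1 below x_m."""
    return 1.0 if x < x_m else (x_m / x) ** alpha


def pareto_cdf(x, x_m=1.0, alpha=2.0):
    return 1.0 - pareto_survival(x, x_m, alpha)


def pareto_sample(rng, x_m=1.0, alpha=2.0):
    """Inverse-transform sampling: solve S(x) = u for x, with u uniform on (0, 1]."""
    u = 1.0 - rng.random()  # random() is in [0, 1), so u is in (0, 1]
    return x_m * u ** (-1.0 / alpha)


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
draws = [pareto_sample(rng) for _ in range(100_000)]
empirical = sum(d > 10 for d in draws) / len(draws)
print(pareto_survival(10), empirical)
```

The empirical fraction of draws above 10 should land near the theoretical value S(10). Python's standard library offers `random.paretovariate(alpha)` for the x_m = 1 case.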

Key Insight

The Power Law Signature

On a log-log plot, the survival function appears as a straight line:

$$\log S(x) = \alpha \log x_m - \alpha \log x$$

The slope is −α. This linear relationship is the signature of power law distributions and is used to estimate α from data.
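One way to turn this into an estimator is to regress the log of the empirical survival function on log x and negate the slope. The sketch below does that on simulated data. As a cross-check it also computes the maximum-likelihood estimate n / Σ ln(x_i / x_m), a standard alternative that is not discussed in the text above. The function names, seed, and sample size are assumptions for the example.

```python
import math
import random


def survival_slope(samples):
    """Slope of log(empirical survival) against log(x) by least squares."""
    xs = sorted(samples)
    n = len(xs)
    lx = [math.log(x) for x in xs]
    # Fraction of the sample that is >= xs[i]; never zero, so the log is safe.
    ly = [math.log((n - i) / n) for i in range(n)]
    mx = sum(lx) / n
    my = sum(ly) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


def alpha_mle(samples, x_m):
    """Maximum-likelihood estimate of alpha when x_m is known."""
    return len(samples) / sum(math.log(x / x_m) for x in samples)


rng = random.Random(1)  # arbitrary seed for reproducibility
true_alpha = 2.0
# random.paretovariate draws from a Pareto distribution with x_m = 1.
samples = [rng.paretovariate(true_alpha) for _ in range(20_000)]

alpha_from_slope = -survival_slope(samples)
alpha_from_mle = alpha_mle(samples, x_m=1.0)
print(alpha_from_slope, alpha_from_mle)
```

Both estimates should sit near the true α = 2. The regression is the visual method made numeric; with real data the fit is usually restricted to the upper tail, where the power law is expected to hold.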

When Do Moments Exist?

This is where Pareto distributions reveal their wild nature. The n-th moment exists only if α > n:

| α value | Finite Moments | Undefined Moments |
| --- | --- | --- |
| α ≤ 1 | None | Mean and all higher |
| 1 < α ≤ 2 | Mean only | Variance and higher |
| 2 < α ≤ 3 | Mean, Variance | Skewness and higher |
| 3 < α ≤ 4 | Mean, Variance, Skewness | Kurtosis and higher |
| α > 4 | First 4 moments | May still have undefined higher moments |

Definition

Why Moments Diverge

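The n-th moment integral is:

$$E[X^n] = \int_{x_m}^{\infty} x^n \cdot \frac{\alpha \, x_m^{\alpha}}{x^{\alpha + 1}} \, dx = \alpha \, x_m^{\alpha} \int_{x_m}^{\infty} x^{n - \alpha - 1} \, dx$$

This integral converges only if n − α − 1 < −1, i.e., α > n. When α ≤ n, the integral diverges to infinity.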
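For α > n the integral can be carried through to a closed form. This step is not spelled out above; it is a short worked derivation assuming the Pareto density α·x_m^α / x^(α+1) on x ≥ x_m.

$$E[X^n] = \alpha \, x_m^{\alpha} \left[ \frac{x^{\,n - \alpha}}{n - \alpha} \right]_{x_m}^{\infty} = \alpha \, x_m^{\alpha} \cdot \frac{x_m^{\,n - \alpha}}{\alpha - n} = \frac{\alpha \, x_m^{\,n}}{\alpha - n}, \qquad \alpha > n$$

The upper limit contributes zero because n − α < 0. Setting n = 1 gives the mean α·x_m / (α − 1), and the denominator α − n shows directly why each moment blows up as α falls toward n.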

Example

Infinite Mean (α = 0.8)

Consider wealth in an extreme winner-take-all economy with α = 0.8. The theoretical mean wealth is infinite. Any sample average will:

  • Grow without bound as you add more observations
  • Be dominated by the single largest observation
  • Never “converge” to a stable value
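A small simulation makes this behavior visible. The sketch below tracks the running mean of Pareto draws with α = 0.8 at a few sample sizes, together with the share of the total contributed by the single largest draw. The scale x_m = 1, the seed, and the checkpoints are assumptions for the example.

```python
import random

rng = random.Random(42)  # arbitrary seed for reproducibility
alpha = 0.8              # tail exponent from the example; x_m = 1 is assumed

checkpoints = [10 ** k for k in range(1, 6)]  # 10, 100, ..., 100000
running_means = []
total = 0.0
largest = 0.0
for n in range(1, checkpoints[-1] + 1):
    x = rng.paretovariate(alpha)  # Pareto draw with x_m = 1
    total += x
    largest = max(largest, x)
    if n in checkpoints:
        running_means.append(total / n)

max_share = largest / total  # fraction of the whole sum due to one draw
for n, m in zip(checkpoints, running_means):
    print(f"n = {n:>6}: running mean = {m:.1f}")
print(f"largest draw / total = {max_share:.2f}")
```

Changing the seed changes the numbers substantially, which is itself the point: with an infinite theoretical mean there is no stable value for the sample average to settle on.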

Mean and Variance Formulas

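When they exist:

$$E[X] = \frac{\alpha \, x_m}{\alpha - 1} \quad (\alpha > 1), \qquad \mathrm{Var}(X) = \frac{\alpha \, x_m^{2}}{(\alpha - 1)^{2} (\alpha - 2)} \quad (\alpha > 2)$$

Note how each approaches infinity as α approaches its threshold from above.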
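These formulas are easy to wrap in helper functions. The sketch below assumes the standard results E[X] = α·x_m / (α − 1) and Var(X) = α·x_m² / ((α − 1)²(α − 2)). Returning `math.inf` below each threshold is a convention chosen for this example.

```python
import math


def pareto_mean(x_m, alpha):
    """alpha * x_m / (alpha - 1) when alpha > 1, otherwise infinite."""
    if alpha <= 1:
        return math.inf
    return alpha * x_m / (alpha - 1)


def pareto_variance(x_m, alpha):
    """alpha * x_m**2 / ((alpha - 1)**2 * (alpha - 2)) when alpha > 2, otherwise infinite."""
    if alpha <= 2:
        return math.inf
    return alpha * x_m ** 2 / ((alpha - 1) ** 2 * (alpha - 2))


for alpha in (0.8, 1.5, 2.0, 2.1, 3.0):
    print(alpha, pareto_mean(1.0, alpha), pareto_variance(1.0, alpha))
```

Comparing α = 2.1 with α = 3.0 shows how sharply the variance grows as α nears 2 from above.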

Explore: Pareto Tails and Moment Existence

Adjust α and observe how the tail thickness changes. Watch the moment existence indicators as you cross the critical thresholds.

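[Interactive control: α (tail exponent) = 2.00. The readouts below correspond to x_m = 1.]

Survival Function S(x) = P(X > x)

[Chart: survival function on log-log axes]

Slope = -α = -2.0. A straight line on log-log scale is the signature of a power law.

Moment Existence (α = 2.0)

  • Mean: ✓ exists (requires α > 1); Mean = 2.00
  • Variance: ✗ infinite (requires α > 2)
  • Skewness: ✗ infinite (requires α > 3)
  • Kurtosis: ✗ infinite (requires α > 4)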

Extreme Event Probabilities

P(X > 10) = 0.0100 = 1 in 100

P(X > 100) = 0.000100 = 1 in 10,000

P(X > 1000) = 1.00e-6 = 1 in 1,000,000

Compare to Gaussian: P(X > 10σ) ≈ 10^(-23). Pareto tails are vastly fatter.
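The figures above can be reproduced in a few lines. This sketch assumes x_m = 1 and α = 2 for the Pareto tail and uses the complementary error function for the standard normal tail; the function names are illustrative. The two thresholds are not on a common scale (at α = 2 the Pareto variance is infinite), so treat the comparison as an order-of-magnitude illustration.

```python
import math


def pareto_tail(x, x_m=1.0, alpha=2.0):
    """P(X > x) for a Pareto distribution, valid for x >= x_m."""
    return (x_m / x) ** alpha


def gaussian_tail(k):
    """P(Z > k) for a standard normal Z, i.e. k standard deviations above the mean."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


for x in (10, 100, 1000):
    print(f"Pareto   P(X > {x}) = {pareto_tail(x):.3g}")
print(f"Gaussian P(Z > 10) = {gaussian_tail(10):.3g}")
```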

[Chart: Pareto probability density function]

The 80/20 Rule

Pareto observed that in many systems, a small fraction of causes produces a large fraction of effects. This “80/20 rule” (or Pareto Principle) arises from power law distributions:

Example

Deriving 80/20

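For a Pareto distribution with α > 1, the fraction of total “mass” held by the top p fraction of the population is:

$$\text{Share}(p) = p^{(\alpha - 1)/\alpha} = p^{\,1 - 1/\alpha}$$

Setting Share = 0.8 and solving for α when p = 0.2 gives α = ln 5 / ln 4 ≈ 1.16. More generally, α = ln p / (ln p − ln Share):

  • 80% owned by 20% requires α ≈ 1.16
  • 90% owned by 10% requires α ≈ 1.05
  • 99% owned by 1% requires α ≈ 1.002 (approaching α = 1)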
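The same calculation can be scripted for any split. The sketch below assumes the standard Pareto result Share(p) = p^((α − 1)/α) for α > 1 and inverts it for α; the function names are illustrative.

```python
import math


def top_share(p, alpha):
    """Fraction of the total held by the top fraction p, for alpha > 1."""
    return p ** ((alpha - 1.0) / alpha)


def alpha_for_share(p, share):
    """Tail exponent at which the top fraction p holds the given share."""
    return math.log(p) / (math.log(p) - math.log(share))


for p, share in ((0.2, 0.8), (0.1, 0.9), (0.01, 0.99)):
    alpha = alpha_for_share(p, share)
    print(f"top {p:.0%} holds {share:.0%} when alpha = {alpha:.4f}")
```

Feeding the result back into `top_share` recovers the original share, which is a convenient sanity check on the algebra.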

Key Insight

Observed Power Laws

Many real-world phenomena follow approximate power laws:

  • Wealth distribution (α ≈ 1.5-2 in most countries)
  • City sizes (Zipf's law: α ≈ 1)
  • Word frequencies in language (α ≈ 1)
  • Book sales and website traffic (α ≈ 1.5-2)
  • Earthquake magnitudes (Gutenberg-Richter law)

Why the Pareto Distribution Matters

Taleb's work emphasizes that Pareto-like distributions are the rule, not the exception, in many domains. Key implications:

  • Sample statistics are unreliable. If variance is infinite (α ≤ 2), the sample variance doesn't converge.
  • Extremes dominate. A single observation can be larger than the sum of all others.
  • Historical data misleads. If you haven't seen the maximum, your estimates understate risk.
  • The Central Limit Theorem fails. With infinite variance (α < 2), normalized sums converge to a non-Gaussian stable law rather than to a Gaussian.

Example

The Winner-Take-All Effect

Consider 1000 authors with book sales following Pareto(1, 1.5):

  • The median author sells ~1.6 units
  • The theoretical mean is 3 units, although sample means converge to it slowly
  • The top author typically sells on the order of 100 units (about x_m · 1000^(1/α)) and occasionally 1,000+
  • That single author typically accounts for several percent of total sales, and in rare samples far more

This is Extremistan: a single observation can dominate the aggregate.
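A quick simulation is a useful check on figures like these. The sketch below repeats the 1000-author experiment many times with Pareto(1, 1.5) sales and summarizes the typical author, the top author, and the top author's share of the total. The number of trials and the seed are arbitrary choices for the example.

```python
import random
import statistics

rng = random.Random(7)   # arbitrary seed for reproducibility
alpha, n_authors, n_trials = 1.5, 1000, 200

all_sales, top_sales, top_shares = [], [], []
for _ in range(n_trials):
    sales = [rng.paretovariate(alpha) for _ in range(n_authors)]  # x_m = 1
    all_sales.extend(sales)
    top = max(sales)
    top_sales.append(top)
    top_shares.append(top / sum(sales))

print("median sales per author:", statistics.median(all_sales))
print("mean sales per author:  ", statistics.fmean(all_sales))
print("median top-author sales:", statistics.median(top_sales))
print("median top-author share:", statistics.median(top_shares))
print("largest top-author share seen:", max(top_shares))
```

Looking at the spread of `top_shares` across trials, rather than a single run, shows how much the dominance of the top author varies from one sample to the next.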

Key Takeaways

  • Pareto survival function: S(x) = (x_m / x)^α, a power law decay
  • The tail exponent α controls everything: moments, concentration, extremes
  • Mean exists only if α > 1; variance only if α > 2
  • Log-log plots are the diagnostic tool: slope = -α
  • The 80/20 rule and related phenomena arise from power laws
  • Standard statistics (sample mean, variance, confidence intervals) fail when α is too small