The Pareto Distribution
The quintessential fat-tailed distribution — power laws and the 80/20 rule.
The Pareto distribution is THE fat-tailed distribution. Named after economist Vilfredo Pareto, who observed that roughly 80% of Italy's land was owned by 20% of the population, it appears throughout nature and society. Its key parameter, the tail exponent α, determines everything about its behavior.
Power Law Functions
Before we define the Pareto distribution, let's understand the mathematical function at its core: the power law.
Power Law
A power law relationship has the form:

$$y = x^{-\alpha}$$
As x increases, y decreases — but much more slowly than exponential decay. The exponent α (alpha) controls the rate of decay.
Read: “x to the minus alpha”
x raised to the power of negative alpha — the larger x gets, the smaller y becomes
The crucial difference from exponential decay:
- Exponential: decreases by a constant factor for each unit increase in x
- Power law: decreases by a constant factor only each time x is multiplied by a constant, so extreme values remain probable
Figure: Power law decay $y = x^{-\alpha}$ for an adjustable exponent α, compared with exponential decay.
Unlike exponential decay (which drops rapidly), power law decay is slow. The exponent α controls how fast it decays, but for any α the power law eventually decays much more slowly than an exponential.
The Log-Log Signature
On a log-log plot, power laws appear as straight lines. The slope equals −α. This is how we identify power laws in real data.
Reading the log-log plot:
- Power law ($x^{-\alpha}$): straight line with slope = -α
- Exponential ($e^{-x}$): curves downward (decays faster)
- If data looks linear on log-log, it suggests a power law
- The slope tells you the tail exponent α
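The straight-line property can be checked directly. The sketch below is illustrative only: the helper `loglog_slope` and the choice α = 2 are assumptions made for this example. It measures the slope between two points on log-log axes for a power law and for an exponential.

```python
import math

def loglog_slope(f, x1, x2):
    """Slope of log f(x) versus log x between x1 and x2."""
    return (math.log(f(x2)) - math.log(f(x1))) / (math.log(x2) - math.log(x1))

alpha = 2.0  # illustrative tail exponent (an assumption for this example)

def power_law(x):
    return x ** -alpha

def exponential(x):
    return math.exp(-x)

intervals = [(1, 10), (10, 100)]

# Power law: the slope equals -alpha on every interval.
power_slopes = [loglog_slope(power_law, a, b) for a, b in intervals]

# Exponential: the slope keeps getting steeper, so the curve bends downward.
exp_slopes = [loglog_slope(exponential, a, b) for a, b in intervals]

print("power law slopes:  ", power_slopes)
print("exponential slopes:", exp_slopes)
```

The power law slopes are the same on both intervals, while the exponential slopes become more negative as x grows, which is the downward curvature described above.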
Why power laws create “fat tails”:
- At x = 10: exponential gives $e^{-10} \approx 0.00005$, but $x^{-2} = 0.01$
- At x = 100: exponential is essentially 0, but $x^{-2} = 0.0001$
- Power laws stay much larger at extreme values — “fat” tails
- This means extreme events are far more likely than exponential/Gaussian predict
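The numbers in this comparison are easy to reproduce. A minimal sketch, assuming α = 2 as in the bullets above (the helper name `tail_values` is made up for this example):

```python
import math

def tail_values(x, alpha=2.0):
    """Return (exponential, power law) values at x: e^-x and x^-alpha."""
    return math.exp(-x), x ** -alpha

for x in (10, 100, 1000):
    exp_val, pow_val = tail_values(x)
    # math.exp underflows to 0.0 for very negative arguments, so guard the ratio.
    ratio = pow_val / exp_val if exp_val > 0 else math.inf
    print(f"x = {x:>4}: e^-x = {exp_val:.2e}, x^-2 = {pow_val:.2e}, ratio = {ratio:.2e}")
```

The ratio column shows how quickly the gap between the two tails widens as x increases.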
The Signature of Fat Tails
Power law decay is slow. Even at x = 100 or x = 1000, there's still appreciable probability. This is why Pareto distributions produce extreme events far more often than Gaussian or exponential distributions predict. When you see a straight line on a log-log plot, you're seeing a power law — and the slope tells you how fat the tail is.
The Pareto PDF
Pareto Distribution
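A random variable $X$ follows a Pareto distribution with scale parameter $x_m > 0$ and shape parameter (tail exponent) $\alpha > 0$ if its PDF is:

$$f(x) = \frac{\alpha\, x_m^{\alpha}}{x^{\alpha+1}}, \quad x \ge x_m$$

and $f(x) = 0$ for $x < x_m$. We write $X \sim \text{Pareto}(x_m, \alpha)$.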
The Survival Function
The survival function of the Pareto has a remarkably simple form, the quintessential power law:

$$S(x) = P(X > x) = \left(\frac{x_m}{x}\right)^{\alpha}, \quad x \ge x_m$$
Read: “S of x equals x_m over x, all to the power α”
The fraction exceeding x is proportional to x to the negative α
And the CDF:

$$F(x) = 1 - S(x) = 1 - \left(\frac{x_m}{x}\right)^{\alpha}, \quad x \ge x_m$$
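Because the survival function inverts in closed form, Pareto samples can be drawn by inverse-transform sampling. That technique is not part of the text above; it is used here only as a sketch to check the formulas numerically. The function names, the seed, and the parameters x_m = 1, α = 2 are assumptions for illustration.

```python
import random

def pareto_sf(x, x_m, alpha):
    """Survival function S(x) = P(X > x) = (x_m / x)^alpha for x >= x_m."""
    return 1.0 if x < x_m else (x_m / x) ** alpha

def pareto_cdf(x, x_m, alpha):
    """CDF F(x) = 1 - S(x)."""
    return 1.0 - pareto_sf(x, x_m, alpha)

def pareto_sample(n, x_m, alpha, rng):
    """Inverse-transform sampling: solve S(x) = u for x, with u uniform on (0, 1]."""
    return [x_m / (1.0 - rng.random()) ** (1.0 / alpha) for _ in range(n)]

rng = random.Random(0)
x_m, alpha = 1.0, 2.0
sample = pareto_sample(100_000, x_m, alpha, rng)

# Compare the theoretical tail probability with the observed frequency.
empirical = sum(x > 10 for x in sample) / len(sample)
print(f"theory: {pareto_sf(10, x_m, alpha):.4f}, simulated: {empirical:.4f}")
```

The simulated frequency should land close to the theoretical value, with some sampling noise.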
The Power Law Signature
On a log-log plot, the survival function appears as a straight line:

$$\log S(x) = \alpha \log x_m - \alpha \log x$$

The slope is $-\alpha$. This linear relationship is the signature of power law distributions and is used to estimate α from data.
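One rough way to use this in practice is to regress the log of the empirical survival function on log x and read off the slope. The sketch below assumes simulated data with a known exponent (α = 2, x_m = 1) and uses a hand-rolled least-squares fit. The helper `fit_loglog_slope` is hypothetical, and this is meant as a diagnostic illustration rather than a recommended estimator.

```python
import math
import random

def fit_loglog_slope(sample):
    """Least-squares slope of log(empirical survival) against log(x)."""
    xs = sorted(sample)
    n = len(xs)
    log_x = [math.log(x) for x in xs]
    # Fraction of observations >= xs[i]; never zero, so the log is defined.
    log_s = [math.log((n - i) / n) for i in range(n)]
    mean_x = sum(log_x) / n
    mean_s = sum(log_s) / n
    sxy = sum((a - mean_x) * (b - mean_s) for a, b in zip(log_x, log_s))
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    return sxy / sxx

random.seed(42)
true_alpha = 2.0
# random.paretovariate draws from a Pareto distribution with x_m = 1.
sample = [random.paretovariate(true_alpha) for _ in range(10_000)]

alpha_hat = -fit_loglog_slope(sample)
print(f"true alpha = {true_alpha}, estimated alpha = {alpha_hat:.2f}")
```

The estimate depends on the seed and sample size. The largest observations are the noisiest points on the plot, so the fitted slope is only approximate.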
When Do Moments Exist?
This is where Pareto distributions reveal their wild nature. The n-th moment exists only if $\alpha > n$:
| α value | Finite Moments | Undefined Moments |
|---|---|---|
| $\alpha \le 1$ | None | Mean and all higher |
| $1 < \alpha \le 2$ | Mean only | Variance and higher |
| $2 < \alpha \le 3$ | Mean, Variance | Skewness and higher |
| $3 < \alpha \le 4$ | Mean, Variance, Skewness | Kurtosis and higher |
| $\alpha > 4$ | First 4 moments | Moments of order $n \ge \alpha$ |
Why Moments Diverge
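The n-th moment integral is:

$$E[X^n] = \int_{x_m}^{\infty} x^n \, \frac{\alpha\, x_m^{\alpha}}{x^{\alpha+1}} \, dx = \alpha\, x_m^{\alpha} \int_{x_m}^{\infty} x^{n-\alpha-1} \, dx$$

This integral converges only if $n - \alpha - 1 < -1$, i.e., $\alpha > n$. When $\alpha \le n$, the integral diverges to infinity.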
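When α > n the integral can be evaluated in closed form, which makes the divergence at the threshold explicit. A short worked step, assuming the Pareto PDF with scale $x_m$ and exponent α:

```latex
E[X^n] = \alpha\, x_m^{\alpha} \int_{x_m}^{\infty} x^{\,n-\alpha-1}\, dx
       = \alpha\, x_m^{\alpha} \left[ \frac{x^{\,n-\alpha}}{n-\alpha} \right]_{x_m}^{\infty}
       = \frac{\alpha\, x_m^{\,n}}{\alpha - n}, \qquad \alpha > n
```

The denominator α - n shrinks to zero as α approaches n from above, so the n-th moment grows without bound at exactly the threshold where it stops existing.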
Infinite Mean (α = 0.8)
Consider wealth in an extreme winner-take-all economy with $\alpha = 0.8$. The theoretical mean wealth is infinite. Any sample average will:
- Grow without bound as you add more observations
- Be dominated by the single largest observation
- Never “converge” to a stable value
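A small simulation can make this behavior concrete. This is a sketch under assumed settings (x_m = 1, a fixed seed, and checkpoints chosen for illustration); the exact numbers change from run to run if the seed is changed.

```python
import random

random.seed(1)
alpha = 0.8  # below 1, so the theoretical mean is infinite
n = 100_000
checkpoints = (100, 1_000, 10_000, 100_000)

total = 0.0
largest = 0.0
running_means = {}
for i in range(1, n + 1):
    # random.paretovariate draws from a Pareto distribution with x_m = 1.
    x = random.paretovariate(alpha)
    total += x
    largest = max(largest, x)
    if i in checkpoints:
        running_means[i] = total / i

for size, mean in running_means.items():
    print(f"n = {size:>7}: sample mean = {mean:.1f}")
print(f"largest observation / total = {largest / total:.1%}")
```

Rather than settling down, the sample mean tends to jump whenever a new record observation arrives, and the final line reports how much of the total that single largest draw accounts for.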
Mean and Variance Formulas
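When they exist:

$$E[X] = \frac{\alpha\, x_m}{\alpha - 1} \quad (\alpha > 1)$$

$$\text{Var}(X) = \frac{\alpha\, x_m^2}{(\alpha - 1)^2 (\alpha - 2)} \quad (\alpha > 2)$$

Note how each approaches infinity as α approaches its threshold from above (α → 1 for the mean, α → 2 for the variance).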
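The blow-up near the thresholds is easy to see numerically. A minimal sketch of the standard Pareto mean and variance formulas, with x_m = 1 assumed and `math.inf` returned (a convention chosen for this example) when a moment is not finite:

```python
import math

def pareto_mean(x_m, alpha):
    """Theoretical mean; not finite when alpha <= 1."""
    if alpha <= 1:
        return math.inf
    return alpha * x_m / (alpha - 1)

def pareto_variance(x_m, alpha):
    """Theoretical variance; not finite when alpha <= 2."""
    if alpha <= 2:
        return math.inf
    return alpha * x_m ** 2 / ((alpha - 1) ** 2 * (alpha - 2))

for alpha in (3.0, 2.1, 2.01, 1.5, 1.1, 1.01):
    mean = pareto_mean(1.0, alpha)
    var = pareto_variance(1.0, alpha)
    print(f"alpha = {alpha:<5} mean = {mean:<10.4g} variance = {var:.4g}")
```

As α moves from 3.0 down toward 2 the variance grows rapidly, and as α moves toward 1 the mean does the same.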
Figure: Pareto survival function S(x) = P(X > x) on log-log axes for an adjustable α, with moment-existence indicators. At α = 2.0 the slope is -α = -2.0; a straight line on a log-log scale is the signature of a power law.
Extreme Event Probabilities (α = 2.0, x_m = 1)
P(X > 10) = 0.0100 = 1 in 100
P(X > 100) = 0.000100 = 1 in 10,000
P(X > 1000) = 1.00e-6 = 1 in 1,000,000
Compare to Gaussian: P(X > 10σ) ≈ 10^(-23). Pareto tails are vastly fatter.
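These tail probabilities can be reproduced with a few lines of code. The sketch assumes x_m = 1 and α = 2 for the Pareto values and uses the standard library's complementary error function for the Gaussian tail; the function names are illustrative.

```python
import math

def pareto_sf(x, x_m=1.0, alpha=2.0):
    """P(X > x) for a Pareto(x_m, alpha) variable, valid for x >= x_m."""
    return (x_m / x) ** alpha

def gaussian_sf(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

for x in (10, 100, 1000):
    p = pareto_sf(x)
    print(f"Pareto   P(X > {x:>4}) = {p:.3g} (about 1 in {1 / p:,.0f})")

print(f"Gaussian P(Z > 10)   = {gaussian_sf(10):.2e}")
```

The Pareto probabilities fall by a factor of 100 for each tenfold increase in x, while the Gaussian tail at ten standard deviations is many orders of magnitude smaller than any of them.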
Figure: Pareto probability density function.
The 80/20 Rule
Pareto observed that in many systems, a small fraction of causes produces a large fraction of effects. This “80/20 rule” (or Pareto Principle) arises from power law distributions:
Deriving 80/20
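For a Pareto distribution with α > 1, the fraction of total “mass” held by the top p fraction of the population is:

$$\text{Share}(p) = p^{(\alpha - 1)/\alpha} = p^{1 - 1/\alpha}$$

Setting Share = 0.8 and solving for α when p = 0.2:

$$0.8 = 0.2^{1 - 1/\alpha} \;\Rightarrow\; \alpha = \frac{\ln p}{\ln p - \ln(\text{Share})} = \frac{\ln 0.2}{\ln 0.2 - \ln 0.8} = \frac{\ln 5}{\ln 4} \approx 1.16$$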
- 80% owned by 20% requires α ≈ 1.16
- 90% owned by 10% requires α ≈ 1.05
- 99% owned by 1% requires α ≈ 1.002 (approaching α = 1)
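The same calculation works for any concentration level. A minimal sketch, assuming the share relation Share = p^(1 - 1/α) for a Pareto distribution with α > 1; the function names `top_share` and `alpha_for_share` are made up for this example.

```python
import math

def top_share(p, alpha):
    """Fraction of the total held by the top fraction p (requires alpha > 1)."""
    return p ** (1.0 - 1.0 / alpha)

def alpha_for_share(share, p):
    """Tail exponent at which the top fraction p holds the given share."""
    return math.log(p) / (math.log(p) - math.log(share))

for share, p in [(0.80, 0.20), (0.90, 0.10), (0.99, 0.01)]:
    alpha = alpha_for_share(share, p)
    print(f"top {p:.0%} holds {share:.0%}: alpha = {alpha:.3f}, "
          f"check share = {top_share(p, alpha):.2f}")
```

Plugging each solved α back into `top_share` recovers the target share, which is a quick consistency check on the algebra.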
Observed Power Laws
Many real-world phenomena follow approximate power laws:
- Wealth distribution (α ≈ 1.5-2 in most countries)
- City sizes (Zipf's law: α ≈ 1)
- Word frequencies in language (α ≈ 1)
- Book sales and website traffic (α ≈ 1.5-2)
- Earthquake magnitudes (Gutenberg-Richter law)
Why the Pareto Distribution Matters
Taleb's work emphasizes that Pareto-like distributions are the rule, not the exception, in many domains. Key implications:
- Sample statistics are unreliable. If variance is infinite (α ≤ 2), the sample variance doesn't converge.
- Extremes dominate. A single observation can be larger than the sum of all others.
- Historical data misleads. If you haven't seen the maximum, your estimates understate risk.
- The Central Limit Theorem fails. With infinite variance, sums don't converge to Gaussian.
The Winner-Take-All Effect
Consider 1000 authors with book sales following Pareto(1, 1.5):
- The median author sells ~1.6 units
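- The theoretical mean is 3 units, but the sample mean converges slowly because the variance is infinite
- But the top author might sell 1,000+ units
- With the other 999 authors averaging about 3 units each (roughly 3,000 units in total), that single author would account for about a quarter of all sales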
This is Extremistan: a single observation dominates the aggregate.
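The example can be simulated in a few lines. This is a sketch with an arbitrary seed, so the specific figures are one possible draw rather than typical values; rerunning with other seeds shows how much the top author's share varies.

```python
import random
import statistics

random.seed(7)
alpha = 1.5
n_authors = 1000

# random.paretovariate draws from a Pareto distribution with x_m = 1.
sales = [random.paretovariate(alpha) for _ in range(n_authors)]

median_sales = statistics.median(sales)
mean_sales = statistics.fmean(sales)
top_sales = max(sales)
top_share = top_sales / sum(sales)

print(f"median = {median_sales:.2f}, mean = {mean_sales:.2f}")
print(f"top author = {top_sales:.0f} units, {top_share:.1%} of total sales")
```

The median stays close to its theoretical value across seeds, while the maximum and its share of the total swing widely, which is the instability the example is meant to convey.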
Key Takeaways
- Pareto survival function: $S(x) = (x_m / x)^{\alpha}$, a power law decay
- The tail exponent α controls everything: moments, concentration, extremes
- Mean exists only if α > 1; variance only if α > 2
- Log-log plots are the diagnostic tool: slope = -α
- The 80/20 rule and related phenomena arise from power laws
- Standard statistics (sample mean, variance, confidence intervals) fail when α is too small