Expectation and Moments
Mean, variance, skewness, and kurtosis - measuring the shape of distributions.
The PDF tells us everything about a distribution, but it's often useful to summarize a distribution with a few numbers. These summary statistics — called moments — describe the center, spread, and shape of the distribution.
A critical insight for understanding fat tails: moments may not always exist!
Expected Value (Mean)
Expected Value
The expected value (or mean) of a continuous random variable X with PDF f(x) is:

$$E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$$
We often denote the mean as μ (mu).
Mean as Center of Mass
Think of the PDF as a physical shape made of material. The mean μ is the point where you could balance this shape on a fulcrum — the center of mass. Values with higher density contribute more to the average.
Variance and Standard Deviation
Variance
The variance measures how spread out a distribution is around its mean:

$$\mathrm{Var}(X) = E\left[(X - \mu)^2\right]$$

We denote variance as σ² (sigma squared).
Read: “Variance of X equals the expected value of X minus mu, squared”
The average squared distance from the mean — larger variance means more spread
Standard Deviation
The standard deviation is the square root of variance:

$$\sigma = \sqrt{\mathrm{Var}(X)}$$

It's in the same units as X, making it easier to interpret than variance.
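These definitions can be checked numerically. A minimal sketch using only the standard library, with the Exp(1) density f(x) = e^(−x) as an illustrative choice (not from the text above); for Exp(1) the mean, variance, and standard deviation all equal 1:

```python
import math

def integrate(g, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# Illustrative density: Exp(1), f(x) = e^(-x) for x >= 0 (tail beyond 50 is negligible)
f = lambda x: math.exp(-x)

mu = integrate(lambda x: x * f(x), 0, 50)               # E[X] = 1 for Exp(1)
var = integrate(lambda x: (x - mu) ** 2 * f(x), 0, 50)  # Var(X) = 1 for Exp(1)
sigma = math.sqrt(var)                                  # standard deviation = 1

print(round(mu, 4), round(var, 4), round(sigma, 4))
```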
Standard Deviation Interpretation
For a normal distribution, approximately 68% of the probability mass lies within one standard deviation of the mean (μ ± σ), and about 95% lies within two standard deviations (μ ± 2σ).
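These percentages can be verified directly from the standard normal CDF, which the Python standard library exposes via the error function:

```python
import math

def norm_cdf(z):
    """Standard normal CDF, Phi(z), computed via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

within_1 = norm_cdf(1) - norm_cdf(-1)   # P(mu - sigma <= X <= mu + sigma)
within_2 = norm_cdf(2) - norm_cdf(-2)   # P(mu - 2 sigma <= X <= mu + 2 sigma)

print(f"{within_1:.4f}")  # 0.6827
print(f"{within_2:.4f}")  # 0.9545
```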
Explore: Mean and Variance
Adjust the mean and standard deviation to see how they affect the distribution. The shaded region shows ±1 standard deviation from the mean.
[Interactive controls: sliders for the mean μ (center of mass) and standard deviation σ (spread around the mean), with readouts for the variance σ² and the probability mass P(μ−σ ≤ X ≤ μ+σ) in the shaded region.]
The mean as center of mass: If you imagine the PDF as a physical shape, the mean μ is where you could balance it on a fulcrum. The standard deviation σ measures how spread out the distribution is around this balance point.
Higher Moments: Skewness and Kurtosis
The mean (1st moment) and variance (2nd central moment) describe location and spread. Higher moments capture more subtle shape features.
The n-th Moment
The n-th moment of X is:

$$E[X^n] = \int_{-\infty}^{\infty} x^n \, f(x) \, dx$$

The n-th central moment measures deviation from the mean:

$$E\left[(X - \mu)^n\right] = \int_{-\infty}^{\infty} (x - \mu)^n \, f(x) \, dx$$
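A sketch of computing raw moments by numerical integration, again using Exp(1) as an illustrative density (its n-th raw moment is exactly n!):

```python
import math

def integrate(g, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: math.exp(-x)  # Exp(1) density (illustrative choice)

# Raw moments E[X^n] = integral of x^n f(x); for Exp(1) this equals n!
moments = {k: integrate(lambda x, k=k: x ** k * f(x), 0, 60) for k in range(1, 5)}
for k, m in moments.items():
    print(k, round(m, 4), math.factorial(k))  # numeric value vs exact n!
```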
Skewness (3rd Standardized Moment)
Skewness
Skewness (γ, gamma) measures asymmetry of the distribution:

$$\gamma = E\left[\left(\frac{X - \mu}{\sigma}\right)^3\right]$$

- γ = 0: symmetric distribution
- γ > 0: right-skewed (long right tail)
- γ < 0: left-skewed (long left tail)
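To make the sign convention concrete, here is a numerical computation of the skewness of Exp(1), a right-skewed distribution whose skewness is exactly 2 (an illustrative choice, not a distribution discussed above):

```python
import math

def integrate(g, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: math.exp(-x)  # Exp(1): right-skewed, true skewness = 2

mu = integrate(lambda x: x * f(x), 0, 60)
sigma = math.sqrt(integrate(lambda x: (x - mu) ** 2 * f(x), 0, 60))

# 3rd standardized moment: E[((X - mu) / sigma)^3]
gamma = integrate(lambda x: ((x - mu) / sigma) ** 3 * f(x), 0, 60)
print(round(gamma, 3))  # positive, close to 2: long right tail
```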
Kurtosis (4th Standardized Moment)
Kurtosis
Kurtosis (κ) measures tail heaviness:

$$\kappa = E\left[\left(\frac{X - \mu}{\sigma}\right)^4\right]$$

- κ = 3: normal distribution (mesokurtic)
- κ > 3: heavier tails than normal (leptokurtic)
- κ < 3: lighter tails than normal (platykurtic)
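As a concrete check, the uniform distribution on [0, 1] has kurtosis exactly 1.8; a short numerical verification (illustrative sketch):

```python
def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# Uniform(0, 1): f(x) = 1 on [0, 1], mean 1/2
mu = 0.5
m2 = integrate(lambda x: (x - mu) ** 2, 0, 1)  # 2nd central moment = 1/12
m4 = integrate(lambda x: (x - mu) ** 4, 0, 1)  # 4th central moment = 1/80
kappa = m4 / m2 ** 2                           # (1/80) / (1/144) = 1.8

print(round(kappa, 3))  # 1.8: platykurtic
```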
Kurtosis and Fat Tails
High kurtosis is a hallmark of fat tails. But beware: for truly fat-tailed distributions (like Pareto with α ≤ 4), kurtosis may be undefined because the integral diverges! You can't compute something that's infinite.
Explore: Skewness and Kurtosis
See how skewness affects asymmetry and how kurtosis (controlled here via degrees of freedom of a t-distribution) affects tail heaviness.
Skewness (γ): Asymmetry
| Shape | Skewness | Tail behavior |
|---|---|---|
| Left-skewed | γ < 0 | Long left tail |
| Symmetric | γ ≈ 0 | Balanced |
| Right-skewed | γ > 0 | Long right tail |
Kurtosis (κ): Tail Heaviness
Kurtosis measures how heavy the tails are compared to the normal distribution (κ=3). We show three distributions with the same variance but different tail behavior:
| Regime | Kurtosis | Tails | Example |
|---|---|---|---|
| Platykurtic | κ < 3 | Thin tails, bounded | Uniform (κ = 1.8) |
| Mesokurtic | κ = 3 | Reference point | Normal (κ = 3) |
| Leptokurtic | κ > 3 | Fat tails, extremes | t-distribution |
Lower ν → heavier tails (higher κ). As ν → ∞, the t-distribution approaches normal (κ → 3).
Why this matters: The uniform distribution has hard bounds — extreme values are impossible. The normal has thin, exponentially decaying tails. But the t-distribution (and other fat-tailed distributions) allows for extreme events that would be “impossible” under normality. This is Taleb's central concern: real-world phenomena often have κ ≫ 3.
When Moments Don't Exist
This is perhaps the most important concept for understanding fat tails: moments are not guaranteed to exist!
For a moment to exist, the integral must converge to a finite value. For the Pareto distribution with tail exponent α:
| Tail Exponent (α) | Finite Moments |
|---|---|
| α ≤ 1 | None: even the mean is undefined (infinite) |
| 1 < α ≤ 2 | Mean only; variance undefined |
| 2 < α ≤ 3 | Mean and variance; skewness undefined |
| 3 < α ≤ 4 | Mean, variance, and skewness; kurtosis undefined |
| α > 4 | All four: mean, variance, skewness, and kurtosis |
Why Does α > n Determine Moment Existence?
This isn't arbitrary — it comes from the mathematics of integration. Let's trace through the derivation.
Moment Convergence Condition
For a Pareto distribution with PDF

$$f(x) = \frac{\alpha x_m^{\alpha}}{x^{\alpha+1}}, \quad x \ge x_m,$$

the n-th moment is

$$E[X^n] = \int_{x_m}^{\infty} x^n \cdot \frac{\alpha x_m^{\alpha}}{x^{\alpha+1}} \, dx = \alpha x_m^{\alpha} \int_{x_m}^{\infty} x^{n-\alpha-1} \, dx$$

The exponent on x is n − α − 1. This integral converges if and only if the exponent is less than −1, i.e. n − α − 1 < −1, which simplifies to α > n.
Checking the Second Moment (Variance)
For the second moment (n = 2), the exponent on x is 2 − α − 1 = 1 − α:

- If α = 2: exponent is −1 → diverges
- If α = 2.5: exponent is −1.5 → converges
- If α = 3: exponent is −2 → converges

So variance exists only when α > 2.
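This convergence can be checked numerically. A sketch that integrates the second-moment integrand of Pareto(α = 2.5, x_m = 1) up to growing truncation points and compares against the closed form E[X²] = α/(α − 2) = 5:

```python
def truncated_second_moment(alpha, upper, xm=1.0, n=200_000):
    """Midpoint-rule value of the integral of x^2 * alpha * xm^alpha * x^(-(alpha+1))
    from xm up to `upper`."""
    h = (upper - xm) / n
    total = 0.0
    for i in range(n):
        x = xm + (i + 0.5) * h
        total += x ** 2 * alpha * xm ** alpha * x ** (-(alpha + 1))
    return total * h

alpha = 2.5
exact = alpha / (alpha - 2)  # closed form for xm = 1: E[X^2] = 5
results = {u: truncated_second_moment(alpha, u) for u in (10, 100, 1000, 10_000)}
for u, v in results.items():
    print(u, round(v, 3))  # creeps toward the exact value as the limit grows
print("exact:", exact)
```

Repeating the experiment with α = 1.5 (so the exponent on x is −0.5) shows the partial integrals growing without bound instead of stabilizing.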
Interactive: Why Moments Diverge
Integrand: x² × f(x) = const × x^(−1.5)
Integral Accumulation (as upper limit increases)
| Upper limit | 5 | 10 | 20 | 50 | 100 | 200 | 500 | 1000 |
|---|---|---|---|---|---|---|---|---|
| ∫ from 1 to... | 2.76 | 3.42 | 3.88 | 4.29 | 4.50 | 4.65 | 4.78 | 4.84 |
Values stabilize → integral converges to finite value
E[X²] for Pareto(α = 2.5)
Condition: α > 2 → 2.5 > 2
Finite
Taleb's Critical Point
If you're working with data from a fat-tailed distribution but assuming moments exist, your statistical estimates will be meaningless. The sample mean won't converge to anything stable. The sample variance will wildly fluctuate. This isn't bad luck — it's mathematics.
Many real-world phenomena (wealth, market returns, earthquake magnitudes) have α between 1 and 3, meaning variance or even the mean may not exist!
The Cauchy Distribution
The Cauchy distribution (also called the Lorentz distribution) has PDF:

$$f(x) = \frac{1}{\pi (1 + x^2)}$$
This is a Student's t-distribution with 1 degree of freedom. Its tails decay so slowly that even the mean doesn't exist. The sample average of Cauchy random variables doesn't converge — no matter how many samples you take.
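A simulation sketch of this non-convergence, using inverse-CDF sampling (X = tan(π(U − ½))); the seed and sample sizes here are arbitrary choices:

```python
import math
import random

random.seed(0)  # arbitrary seed for reproducibility

def cauchy_sample():
    """Standard Cauchy via inverse-CDF: X = tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (random.random() - 0.5))

n = 100_000
samples = [cauchy_sample() for _ in range(n)]

# Running means at checkpoints: they jump around instead of settling
for k in (100, 1_000, 10_000, 100_000):
    print(k, round(sum(samples[:k]) / k, 3))

# The median, by contrast, has a well-defined limit (0) and does converge
median = sorted(samples)[n // 2]
print("median:", round(median, 4))
```

This is why robust statistics favors the median over the mean for heavy-tailed data: the median of a Cauchy sample converges even though the mean never does.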
Interactive: Exploring the Cauchy Distribution
What You're Seeing
Both distributions look bell-shaped near the center, but watch the tails! The Cauchy's tails are much "fatter" — they decay as 1/x² rather than exponentially.
Try: Toggle log scale to see how dramatically different the tail behavior is. The Gaussian becomes negligible while Cauchy maintains significant probability.
Interactive: When Do Moments Exist?
| Moment | Requires | Status | Value |
|---|---|---|---|
| Mean (1st moment) | α > 1 | ✓ Finite | 1.667 |
| Variance (2nd moment) | α > 2 | ✓ Finite | 2.222 |
| Skewness (3rd moment) | α > 3 | ∞ Infinite | ∞ |
| Kurtosis (4th moment) | α > 4 | ∞ Infinite | ∞ |
Pareto PDF
Moment Integrand: x² × f(x)
Area under curve is finite
E[X²] for Pareto(α = 2.5)
Requires α > 2, currently α = 2.5
5.000
Finite value
Understanding Moment Existence
The n-th moment E[Xⁿ] exists only when the integral ∫ xⁿ f(x) dx converges. For Pareto distributions, this requires α > n.
Why this matters: Many real-world phenomena follow Pareto-like distributions with α between 1 and 3. This means:
- Wealth distribution (α ≈ 1.5): Even the mean is unstable
- City sizes (α ≈ 2): Mean exists, but variance is infinite
- Financial returns: Often α ≈ 3, making higher moments unreliable
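The α > n rule is simple enough to encode directly; a small illustrative helper (the function name is my own, not from the text):

```python
def finite_moments(alpha):
    """For a Pareto tail exponent alpha, the n-th moment is finite iff alpha > n."""
    names = ["mean", "variance", "skewness", "kurtosis"]
    return [name for n, name in enumerate(names, start=1) if alpha > n]

print(finite_moments(1.5))  # ['mean'] -- wealth-like: variance already infinite
print(finite_moments(2.5))  # ['mean', 'variance']
print(finite_moments(0.8))  # [] -- not even the mean exists
```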
Interactive: Sample Mean Convergence (or Not!)
Gaussian (Normal)
Mean: Exists (μ = 0)
Variance: Exists (σ² = 1)
All moments exist. Sample mean converges to true mean.
Final sample mean at n = 500: −0.0126, close to the theoretical behavior of converging to μ = 0.
Why This Matters for Fat Tails
For Gaussian data, the Law of Large Numbers guarantees convergence. The sample mean settles down to the true mean as n → ∞. This is what we expect from "well-behaved" statistics.
Taleb's point: Many real-world distributions (market returns, wealth, etc.) behave more like Cauchy/low-ν Student's t than Gaussian. Standard statistics assumes convergence that may never happen.
Key Takeaways
- Expected value E[X] = μ is the center of mass of the distribution
- Variance Var(X) = σ² measures spread; standard deviation σ is its square root
- Skewness measures asymmetry; kurtosis measures tail heaviness
- Moments require convergent integrals — for fat-tailed distributions, higher moments (and sometimes even the mean) may not exist
- The tail exponent α determines which moments exist for power-law distributions