Expectation and Moments

Mean, variance, skewness, and kurtosis: measuring the shape of distributions.

The PDF tells us everything about a distribution, but it's often useful to summarize a distribution with a few numbers. These summary statistics — called moments — describe the center, spread, and shape of the distribution.

A critical insight for understanding fat tails: moments may not always exist!

Expected Value (Mean)

Definition

Expected Value

The expected value (or mean) of a continuous random variable X with PDF f(x) is:

$$E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$$

We often denote the mean as μ (mu).

Key Insight

Mean as Center of Mass

Think of the PDF as a physical shape made of material. The mean μ is the point where you could balance this shape on a fulcrum — the center of mass. Values with higher density contribute more to the average.
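
A minimal numerical sketch of this idea (Python with NumPy; the normal PDF, grid, and bounds here are illustrative choices, and np.trapezoid is the NumPy 2 name for the trapezoidal rule, called np.trapz in older versions):

```python
import numpy as np

# Approximate E[X] = integral of x * f(x) dx on a grid, using a normal
# PDF with mu = 1.5, sigma = 2.0. The balance point should come out near 1.5.
x = np.linspace(-15, 15, 200_001)          # wide grid; tails are ~0 out here
mu, sigma = 1.5, 2.0
f = np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

mean = np.trapezoid(x * f, x)              # center of mass of the density
print(f"E[X] ~ {mean:.4f}")                # ~1.5000
```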

Variance and Standard Deviation

Definition

Variance

The variance measures how spread out a distribution is around its mean:

$$\mathrm{Var}(X) = E\left[(X - \mu)^2\right] = \int_{-\infty}^{\infty} (x - \mu)^2 \, f(x) \, dx$$

We denote variance as σ² (sigma squared).

Read: the variance of X equals the expected value of (X minus μ) squared.

It is the average squared distance from the mean: larger variance means more spread.

Definition

Standard Deviation

The standard deviation is the square root of variance:

$$\sigma = \sqrt{\mathrm{Var}(X)}$$

It's in the same units as X, making it easier to interpret than variance.
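
Continuing the numerical sketch from the mean section, variance and standard deviation follow directly from these definitions (same illustrative normal PDF):

```python
import numpy as np

# Same discretized normal PDF (mu = 1.5, sigma = 2.0) as in the mean sketch.
x = np.linspace(-15, 15, 200_001)
mu, sigma = 1.5, 2.0
f = np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

mean = np.trapezoid(x * f, x)
var = np.trapezoid((x - mean) ** 2 * f, x)  # E[(X - mu)^2]
std = np.sqrt(var)                          # back in the units of X
print(f"Var(X) ~ {var:.4f}, sigma ~ {std:.4f}")  # ~4.0000, ~2.0000
```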

Example

Standard Deviation Interpretation

For a normal distribution, approximately 68% of the probability mass lies within one standard deviation of the mean (μ ± σ), and about 95% lies within two standard deviations (μ ± 2σ).
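
Those percentages can be read straight off the normal CDF; a quick sketch, assuming scipy is available:

```python
from scipy.stats import norm

# P(mu - k*sigma <= X <= mu + k*sigma) for a standard normal.
for k in (1, 2, 3):
    p = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} sigma: {p:.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```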

Explore: Mean and Variance

Adjust the mean and standard deviation to see how they affect the distribution. The shaded region shows ±1 standard deviation from the mean.

The mean as center of mass: If you imagine the PDF as a physical shape, the mean μ is where you could balance it on a fulcrum. The standard deviation σ measures how spread out the distribution is around this balance point.

Higher Moments: Skewness and Kurtosis

The mean (1st moment) and variance (2nd central moment) describe location and spread. Higher moments capture more subtle shape features.

Definition

The n-th Moment

The n-th moment of X is:

$$E[X^n] = \int_{-\infty}^{\infty} x^n \, f(x) \, dx$$

The n-th central moment measures deviation from the mean:

$$E\left[(X - \mu)^n\right] = \int_{-\infty}^{\infty} (x - \mu)^n \, f(x) \, dx$$

Skewness (3rd Standardized Moment)

Definition

Skewness

Skewness (γ, gamma) measures asymmetry of the distribution:

$$\gamma = E\left[\left(\frac{X - \mu}{\sigma}\right)^3\right]$$

  • γ = 0: symmetric distribution
  • γ > 0: right-skewed (long right tail)
  • γ < 0: left-skewed (long left tail)

Kurtosis (4th Standardized Moment)

Definition

Kurtosis

Kurtosis measures tail heaviness:

$$\kappa = E\left[\left(\frac{X - \mu}{\sigma}\right)^4\right]$$

  • κ = 3: normal distribution (mesokurtic)
  • κ > 3: heavier tails than normal (leptokurtic)
  • κ < 3: lighter tails than normal (platykurtic)

Key Insight

Kurtosis and Fat Tails

High kurtosis is a hallmark of fat tails. But beware: for truly fat-tailed distributions (like Pareto with α ≤ 4), kurtosis may be undefined because the integral diverges! You can't compute something that's infinite.
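
To see both statistics on simulated data, here is a sketch comparing a normal sample with a Student's t sample with ν = 5, whose theoretical kurtosis is κ = 3 + 6/(ν − 4) = 9 (scipy's kurtosis reports excess kurtosis by default, so fisher=False is passed to match the κ convention used here):

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(42)
samples = {
    "normal": rng.normal(size=100_000),
    "t(5)":   rng.standard_t(df=5, size=100_000),
}

for name, data in samples.items():
    g = skew(data)
    k = kurtosis(data, fisher=False)  # Pearson convention: normal gives kappa = 3
    print(f"{name:>6}: skewness = {g:+.3f}, kurtosis = {k:.2f}")
# The normal sample sits near kappa = 3; the t(5) sample lands well above 3,
# and its estimate fluctuates a lot between runs because the estimator's own
# variance depends on the 8th moment, which t(5) does not have.
```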

Explore: Skewness and Kurtosis

See how skewness affects asymmetry and how kurtosis (controlled here via degrees of freedom of a t-distribution) affects tail heaviness.

Skewness (γ): Asymmetry

  • Left-skewed (γ < 0): long left tail
  • Symmetric (γ ≈ 0): balanced
  • Right-skewed (γ > 0): long right tail

Kurtosis (κ): Tail Heaviness

Kurtosis measures how heavy the tails are compared to the normal distribution (κ=3). We show three distributions with the same variance but different tail behavior:

  • Platykurtic (κ < 3): thin tails, bounded support. Example: uniform (κ = 1.8)
  • Mesokurtic (κ = 3): the reference point. Example: normal (κ = 3)
  • Leptokurtic (κ > 3): fat tails, extreme events. Example: t-distribution

Lower ν → heavier tails (higher κ). As ν → ∞, the t-distribution approaches normal (κ → 3).

Why this matters: The uniform distribution has hard bounds — extreme values are impossible. The normal has thin, exponentially decaying tails. But the t-distribution (and other fat-tailed distributions) allows for extreme events that would be “impossible” under normality. This is Taleb's central concern: real-world phenomena often have κ ≫ 3.

When Moments Don't Exist

This is perhaps the most important concept for understanding fat tails: moments are not guaranteed to exist!

For a moment to exist, the integral must converge to a finite value. For the Pareto distribution with tail exponent α:

| Tail Exponent (α) | Finite Moments |
|---|---|
| α ≤ 1 | none; even the mean is undefined (infinite) |
| 1 < α ≤ 2 | mean; variance undefined |
| 2 < α ≤ 3 | mean, variance; skewness undefined |
| 3 < α ≤ 4 | mean, variance, skewness; kurtosis undefined |

Why Does α > n Determine Moment Existence?

This isn't arbitrary — it comes from the mathematics of integration. Let's trace through the derivation.

Definition

Moment Convergence Condition

For a Pareto distribution with PDF $f(x) = \frac{\alpha x_m^{\alpha}}{x^{\alpha+1}}$ for $x \geq x_m$:

$$E[X^n] = \int_{x_m}^{\infty} x^n \cdot \frac{\alpha x_m^{\alpha}}{x^{\alpha+1}} \, dx = \alpha x_m^{\alpha} \int_{x_m}^{\infty} x^{n-\alpha-1} \, dx$$

The exponent on x is n − α − 1. This integral converges if and only if this exponent is less than −1, i.e. if and only if α > n.

Example

Checking the Second Moment (Variance)

For the second moment (n = 2), the integrand's exponent is 2 − α − 1 = 1 − α:

  • If α = 2: the exponent is −1 → the integral diverges (logarithmically)
  • If α = 2.5: the exponent is −1.5 → converges
  • If α = 3: the exponent is −2 → converges

So variance exists only when α > 2.
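
A numerical version of this check (the α values and upper limits are illustrative): integrate x^(1−α) over [1, L] for growing L and watch whether the result stabilizes.

```python
import numpy as np

# Second-moment integrand for a Pareto with x_m = 1, dropping the constant
# factor alpha: x^2 * x^(-(alpha + 1)) = x^(1 - alpha).
for alpha in (2.0, 2.5, 3.0):
    for L in (10, 1_000, 100_000):
        x = np.geomspace(1, L, 500_000)   # log-spaced grid suits power laws
        partial = np.trapezoid(x ** (1 - alpha), x)
        print(f"alpha = {alpha}, L = {L:>7,}: {partial:8.3f}")
    print()
# alpha = 2.0 grows like log(L): the integral diverges.
# alpha = 2.5 and 3.0 approach 1/(alpha - 2), i.e. 2.0 and 1.0: convergence.
```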

Interactive: Why Moments Diverge

See how the tail exponent α determines whether the integral for E[X²] converges. With α = 2.50 and moment order n = 2, the integrand is x² × f(x) = const × x^(−1.5).

Integral accumulation as the upper limit increases:

| Upper limit | 5 | 10 | 20 | 50 | 100 | 200 | 500 | 1000 |
|---|---|---|---|---|---|---|---|---|
| ∫ from 1 to limit | 2.76 | 3.42 | 3.88 | 4.29 | 4.50 | 4.66 | 4.85 | 5.12 |

The values stabilize: the integral converges to a finite value. For Pareto(α = 2.5), the condition α > 2 is satisfied (2.5 > 2), so E[X²] is finite.

Key Insight

Taleb's Critical Point

If you're working with data from a fat-tailed distribution but assuming moments exist, your statistical estimates will be meaningless. The sample mean won't converge to anything stable. The sample variance will wildly fluctuate. This isn't bad luck — it's mathematics.

Many real-world phenomena (wealth, market returns, earthquake magnitudes) have α between 1 and 3, meaning variance or even the mean may not exist!
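
A sketch of that instability: repeated samples from a Pareto with α = 1.5 (mean finite, variance infinite; scipy.stats.pareto with x_m = 1 is an illustrative choice) give sample variances that jump by orders of magnitude:

```python
import numpy as np
from scipy.stats import pareto

rng = np.random.default_rng(7)
alpha = 1.5                 # mean = alpha/(alpha - 1) = 3; variance infinite

for trial in range(5):
    sample = pareto.rvs(alpha, size=100_000, random_state=rng)
    print(f"trial {trial}: mean = {sample.mean():8.2f}, "
          f"variance = {sample.var():14.1f}")
# The sample means hover in the same region (the true mean exists),
# but the sample variances differ wildly from trial to trial.
```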

Example

The Cauchy Distribution

The Cauchy distribution (also called the Lorentz distribution) has PDF:

$$f(x) = \frac{1}{\pi\gamma\left[1 + \left(\frac{x - x_0}{\gamma}\right)^2\right]}$$

where x₀ is the location and γ the scale. The standard Cauchy (x₀ = 0, γ = 1) is a Student's t-distribution with 1 degree of freedom. Its tails decay so slowly that even the mean doesn't exist. The sample average of Cauchy random variables doesn't converge, no matter how many samples you take.

Interactive: Exploring the Cauchy Distribution

Compare the Cauchy distribution to the Gaussian — notice how slowly the tails decay

What You're Seeing

Both distributions look bell-shaped near the center, but watch the tails! The Cauchy's tails are much "fatter" — they decay as 1/x² rather than exponentially.

Try: Toggle log scale to see how dramatically different the tail behavior is. The Gaussian becomes negligible while Cauchy maintains significant probability.
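
A quick numerical look at the same contrast, comparing survival probabilities P(X > x) for the standard normal and standard Cauchy (assuming scipy):

```python
from scipy.stats import cauchy, norm

# P(X > x): the normal tail collapses; the Cauchy tail lingers.
for x in (2, 5, 10):
    print(f"x = {x:>2}: normal tail = {norm.sf(x):.2e}, "
          f"cauchy tail = {cauchy.sf(x):.2e}")
# x =  2: normal ~ 2.3e-02, cauchy ~ 1.5e-01
# x =  5: normal ~ 2.9e-07, cauchy ~ 6.3e-02
# x = 10: normal ~ 7.6e-24, cauchy ~ 3.2e-02
```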

Interactive: When Do Moments Exist?

Explore how the tail index α determines which moments exist for the Pareto distribution
With α = 2.50 (Pareto, x_m = 1):

| Moment | Requires | Status | Value |
|---|---|---|---|
| Mean (1st moment) | α > 1 | ✓ finite | 1.667 |
| Variance (2nd moment) | α > 2 | ✓ finite | 2.222 |
| Skewness (3rd moment) | α > 3 | ∞ infinite | n/a |
| Kurtosis (4th moment) | α > 4 | ∞ infinite | n/a |

For the second moment, the integrand x² × f(x) has a finite area under the curve: E[X²] = 5.000 (requires α > 2; here α = 2.5).

Understanding Moment Existence

The n-th moment E[Xⁿ] exists only when the integral ∫ xⁿ f(x) dx converges. For Pareto distributions, this requires α > n; a quick code sketch follows the list below.

Why this matters: Many real-world phenomena follow Pareto-like distributions with α between 1 and 3. This means:

  • Wealth distribution (α ≈ 1.5): Even the mean is unstable
  • City sizes (α ≈ 2): Mean exists, but variance is infinite
  • Financial returns: Often α ≈ 3, making higher moments unreliable
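
The existence rule as code, using the closed form E[Xⁿ] = α·x_mⁿ/(α − n) for a Pareto with minimum x_m, finite only when α > n (a small sketch; the helper name is ours):

```python
import math

def pareto_moment(n: int, alpha: float, x_m: float = 1.0) -> float:
    """n-th raw moment of Pareto(alpha, x_m); infinite when alpha <= n."""
    if alpha <= n:
        return math.inf
    return alpha * x_m**n / (alpha - n)

alpha = 2.5
for n in range(1, 5):
    print(f"E[X^{n}] = {pareto_moment(n, alpha):.4f}")
# E[X^1] = 1.6667, E[X^2] = 5.0000, E[X^3] = inf, E[X^4] = inf
# Matching the table above: variance = 5 - 1.6667^2 = 2.222.
```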

Interactive: Sample Mean Convergence (or Not!)

Watch how sample means behave as n grows — convergence depends on moment existence
Gaussian (Normal), n up to 500: the mean exists (μ = 0) and the variance exists (σ² = 1); all moments exist, so the sample mean converges to the true mean. In one run the final sample mean at n = 500 was −0.0126, consistent with the theoretical behavior of converging to μ.

Why This Matters for Fat Tails

For Gaussian data, the Law of Large Numbers guarantees convergence. The sample mean settles down to the true mean as n → ∞. This is what we expect from "well-behaved" statistics.

Taleb's point: Many real-world distributions (market returns, wealth, etc.) behave more like Cauchy/low-ν Student's t than Gaussian. Standard statistics assumes convergence that may never happen.
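
A sketch of that contrast: running sample means for Gaussian and Cauchy draws, where the Gaussian column settles and the Cauchy column never does (seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
gauss = rng.normal(size=n)
heavy = rng.standard_cauchy(size=n)

# Running mean: cumulative sum divided by the sample count so far.
counts = np.arange(1, n + 1)
run_g = np.cumsum(gauss) / counts
run_c = np.cumsum(heavy) / counts

for k in (100, 1_000, 10_000, 100_000):
    print(f"n = {k:>7,}: gaussian mean = {run_g[k - 1]:+.4f}, "
          f"cauchy mean = {run_c[k - 1]:+.4f}")
# The Gaussian column drifts toward 0 as the LLN predicts; the Cauchy
# column shows no such stabilization: one extreme draw can move it a lot.
```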

Key Takeaways

  • Expected value E[X] = μ is the center of mass of the distribution
  • Variance Var(X) = σ² measures spread; standard deviation σ is its square root
  • Skewness measures asymmetry; kurtosis measures tail heaviness
  • Moments require convergent integrals — for fat-tailed distributions, higher moments (and sometimes even the mean) may not exist
  • The tail exponent α determines which moments exist for power-law distributions