The Gaussian Distribution
The famous bell curve — its properties, why it's so common, and why Taleb warns against overreliance on it.
The Gaussian distribution — also called the normal distribution or bell curve — is the most famous distribution in statistics. It appears everywhere, from measurement errors to heights to IQ scores. But as Taleb emphasizes, its very familiarity can be dangerous when applied to domains where it doesn't belong.
The Bell Curve Function
Before we look at the full Gaussian formula, let's understand the mathematical function that creates the famous “bell” shape: e−x².
The Bell Curve Shape
The Gaussian's shape comes from the function:

y = e−x²

Read: “e to the minus x squared”

The function that creates the bell shape — peaks at x = 0 and drops rapidly on both sides

This is the exponential function with a negative squared argument. The squaring makes it symmetric (same shape left and right), and the negative sign makes it decay away from x = 0.
What makes e−x² special is how fast it decays. The x² in the exponent means the decay rate accelerates as you move away from zero — this is what creates the Gaussian's “thin tails.”
Explore: The Bell Curve
See how e−x² compares to other decay functions. Notice how quickly it drops to zero — this is why extreme events are so rare under Gaussian assumptions.
The Bell Curve: y = e−x²
The Gaussian's shape comes from e−x² — the exponential of negative x squared. This creates the famous “bell” shape with its distinctive properties.
Notice the differences:
- e−x² (black): Drops fastest — the “thin tails”
- e−|x| (green): Drops slower — the boundary between thin/fat
- 1/(1+x²) (red): Drops slowest — “fat tails” (Cauchy-like)
Why e−x² Creates Thin Tails
The x² in the exponent makes the decay accelerate as you move away from zero. At x = 3, you're computing e−9 ≈ 0.0001. At x = 4, it's e−16 ≈ 0.0000001 — roughly a thousand times smaller again.
Tail Comparison at x = 4
| Function | Value at x=4 | Tail Type |
|---|---|---|
| e−x² | 1.13e-7 | Extremely thin (Gaussian) |
| e−|x| | 1.83e-2 | Thin (Exponential) |
| 1/(1+x²) | 0.0588 | Fat (Cauchy-like) |
The Gaussian (e−x²) is about 522,712× smaller than the Cauchy-like function at x = 4, and this ratio grows rapidly as x increases.
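A few lines of Python (a minimal sketch using only the standard library) reproduce these tail values and the ratio:

```python
import math

x = 4.0
gaussian_like = math.exp(-x**2)     # e^(-x^2): thin (Gaussian-style) tail
exponential   = math.exp(-abs(x))   # e^(-|x|): exponential tail
cauchy_like   = 1 / (1 + x**2)      # 1/(1+x^2): fat (Cauchy-like) tail

print(f"e^(-x^2)  at x=4: {gaussian_like:.2e}")   # ~1.13e-07
print(f"e^(-|x|)  at x=4: {exponential:.2e}")     # ~1.83e-02
print(f"1/(1+x^2) at x=4: {cauchy_like:.4f}")     # ~0.0588
print(f"ratio (Cauchy-like / Gaussian-like): {cauchy_like / gaussian_like:,.0f}")  # ~522,712
```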
The squared exponent is everything:
- e−x decays exponentially (each additional unit of x divides the value by e ≈ 2.7)
- e−x² decays super-exponentially (the rate itself accelerates)
- This is why Gaussian tails are called “thin” — they vanish incredibly fast
- 5-sigma events become “impossible” under Gaussian assumptions
Why This Creates Thin Tails
The squared exponent is everything. While e−x decays at a constant rate (dividing by e ≈ 2.7 for each unit of x), e−x² decays at an accelerating rate. At x = 4, you're computing e−16 — an incredibly small number. This is why 5-sigma events are “impossible” under Gaussian assumptions.
The Gaussian PDF
Gaussian Distribution
A random variable X follows a Gaussian (or normal) distribution with mean μ and standard deviation σ if its probability density function is:

f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))
We write X ~ N(μ, σ²).
The Gaussian is completely determined by two parameters:
- μ (mu) — the mean, median, and mode (all equal due to symmetry)
- σ (sigma) — the standard deviation, controlling the spread
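As a sanity check, here is a minimal sketch of that density written out by hand in Python; the values quoted in the comments are the standard ones for N(0, 1):

```python
import math

def gaussian_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Peak of the standard Gaussian is 1/sqrt(2*pi)
print(gaussian_pdf(0.0))                      # ≈ 0.3989
# Symmetry: same density one sigma above and below the mean
print(gaussian_pdf(1.0), gaussian_pdf(-1.0))  # both ≈ 0.2420
```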
Key Properties
All Moments Exist
Every moment of the Gaussian distribution is finite. The mean is μ, the variance is σ², the skewness is 0 (symmetric), and the kurtosis is exactly 3.
Stability Under Addition
If X₁ ~ N(μ₁, σ₁²) and X₂ ~ N(μ₂, σ₂²) are independent, then:

X₁ + X₂ ~ N(μ₁ + μ₂, σ₁² + σ₂²)
The sum of Gaussians is Gaussian. This property (called “stability”) is rare and powerful — most distributions don't have it.
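A quick simulation sketch illustrates stability: the sum of draws from two independent Gaussians has the mean and variance that N(μ₁ + μ₂, σ₁² + σ₂²) predicts. The parameters below are arbitrary examples:

```python
import random
import statistics

random.seed(0)
mu1, sigma1 = 1.0, 2.0   # X1 ~ N(1, 4)
mu2, sigma2 = 3.0, 1.5   # X2 ~ N(3, 2.25)

# Draw 100,000 independent sums X1 + X2
sums = [random.gauss(mu1, sigma1) + random.gauss(mu2, sigma2) for _ in range(100_000)]

print(statistics.mean(sums))      # ≈ 4.0   (mu1 + mu2)
print(statistics.variance(sums))  # ≈ 6.25  (sigma1^2 + sigma2^2 = 4 + 2.25)
```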
Maximum Entropy
Among all distributions with a given mean and variance, the Gaussian has maximum entropy. It's the “least informative” distribution given those constraints — which is why it often appears when we know little about a system.
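To make this concrete, the closed-form differential entropies of a few unit-variance distributions can be compared directly; the formulas in the comments are standard results, and the script is just a sketch:

```python
import math

# Differential entropy (in nats) of unit-variance distributions
gaussian = 0.5 * math.log(2 * math.pi * math.e)   # 0.5*ln(2*pi*e) ≈ 1.419
laplace  = 1 + math.log(2 / math.sqrt(2))         # scale b = 1/sqrt(2) gives variance 1; ≈ 1.347
uniform  = math.log(math.sqrt(12))                # width sqrt(12) gives variance 1; ≈ 1.242

print(gaussian > laplace > uniform)   # True: the Gaussian has the largest entropy
```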
The 68-95-99.7 Rule
For a Gaussian distribution, nearly all probability mass is concentrated within a few standard deviations of the mean:
68.3% of values fall within 1σ of μ
95.4% of values fall within 2σ of μ
99.7% of values fall within 3σ of μ
99.99994% of values fall within 5σ of μ
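These coverage figures follow directly from the error function, since P(|X − μ| ≤ kσ) = erf(k/√2). A quick check in Python (a sketch using only the standard library):

```python
import math

for k in (1, 2, 3, 5):
    coverage = math.erf(k / math.sqrt(2))   # P(|X - mu| <= k*sigma)
    print(f"within {k} sigma: {coverage:.5%}")
# within 1 sigma: 68.26895%
# within 2 sigma: 95.44997%
# within 3 sigma: 99.73002%
# within 5 sigma: 99.99994%
```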
The Rarity of Extreme Events
Under a Gaussian distribution, a “5-sigma event” (an observation more than 5σ above the mean) should occur roughly once in 3.5 million observations. A 6-sigma event is about 1 in a billion.
Taleb's critique: In financial markets, 5+ sigma events happen far more frequently than this. The 1987 crash was a 20+ sigma event under Gaussian assumptions — essentially impossible. This suggests markets are not Gaussian.
Explore: The Gaussian Distribution
Adjust the parameters to see how μ shifts the center and σ controls the spread. Watch how the 68-95-99.7 bands change as you modify σ.
Extreme Event Probabilities (beyond ±nσ)
P(|X - μ| > 4σ) ≈ 6.33e-5 ≈ 1 in 15,800
P(|X - μ| > 5σ) ≈ 5.73e-7 ≈ 1 in 1.7 million
P(|X - μ| > 6σ) ≈ 1.97e-9 ≈ 1 in 507 million
Under Gaussian assumptions, 5σ events are vanishingly rare. In real markets, they happen regularly.
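The probabilities above come straight from the complementary error function, P(|X − μ| > kσ) = erfc(k/√2); a short sketch for checking them:

```python
import math

for k in (4, 5, 6):
    p = math.erfc(k / math.sqrt(2))   # two-sided: P(|X - mu| > k*sigma)
    print(f"beyond {k} sigma: {p:.2e}  (about 1 in {1/p:.3g})")
# beyond 4 sigma: 6.33e-05  (about 1 in 1.58e+04)
# beyond 5 sigma: 5.73e-07  (about 1 in 1.74e+06)
# beyond 6 sigma: 1.97e-09  (about 1 in 5.07e+08)
```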
Tail Behavior: Light Tails
The survival function of the Gaussian decays extremely fast. For the standard Gaussian and large x:

P(X > x) ≈ (1 / (x√(2π))) · exp(−x²/2)
Read: “P of X greater than x is approximately 1 over x root 2 pi times e to the minus x squared over 2”
The tail probability shrinks faster than any polynomial — exponentially fast in x²
This decay is extremely rapid. Compare:
- At x = 4σ: P(X > x) ≈ 3.2e-5 (about 3 in 100,000)
- At x = 5σ: P(X > x) ≈ 2.9e-7 (about 3 in 10 million)
- At x = 6σ: P(X > x) ≈ 1e-9 (about 1 in a billion)
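The asymptotic formula is easy to compare against the exact tail (computed via erfc); a small sketch for the standard Gaussian:

```python
import math

def tail_exact(x: float) -> float:
    """Exact P(X > x) for a standard Gaussian."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def tail_approx(x: float) -> float:
    """Asymptotic approximation: (1 / (x*sqrt(2*pi))) * exp(-x^2 / 2)."""
    return math.exp(-x * x / 2) / (x * math.sqrt(2 * math.pi))

for x in (4, 5, 6):
    print(x, f"{tail_exact(x):.2e}", f"{tail_approx(x):.2e}")
# 4  3.17e-05  3.35e-05
# 5  2.87e-07  2.97e-07
# 6  9.87e-10  1.01e-09
```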
Visualizing Tail Decay
The log scale on the y-axis reveals how rapidly Gaussian tails vanish. The table shows the exact probability and “1 in N” interpretation for key σ thresholds.
| Threshold | P(X > x) | Frequency |
|---|---|---|
| 3σ | 1.35e-3 | 1 in 741 |
| 4σ | 3.17e-5 | 1 in 31.6 thousand |
| 5σ | 2.87e-7 | 1 in 3.5 million |
| 6σ | 9.87e-10 | 1 in 1.0 billion |
Key insight: The Gaussian tail decays like exp(−x²/2), which is faster than exponential. Each additional σ shrinks the tail probability by an ever-growing factor (roughly 40× from 3σ to 4σ, about 110× from 4σ to 5σ, and nearly 300× from 5σ to 6σ). This makes events beyond 5-6σ essentially “impossible” under Gaussian assumptions.
The problem: In fat-tailed domains like financial markets, events that “should” be 1-in-a-billion occur far more frequently, revealing the Gaussian assumption to be dangerously wrong.
Each additional σ cuts the tail probability by an ever-larger factor. This is what makes extreme events “impossible” under Gaussian assumptions — and why using Gaussian models for fat-tailed phenomena leads to catastrophic underestimation of risk.
Where the Gaussian Applies (and Doesn't)
Good Applications
- Measurement errors — instrumental noise often follows a Gaussian distribution
- Heights of people — bounded by biology, symmetric around mean
- IQ scores — by construction (they're normalized to be Gaussian)
- Sums of many small effects — the Central Limit Theorem applies (see the sketch below)
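A tiny simulation sketch makes the last point concrete: sums of many independent Uniform(0, 1) draws — individually far from Gaussian — already have the mean and spread the Central Limit Theorem predicts, and their histogram looks like a bell curve. The choice of 30 effects and 50,000 trials below is arbitrary:

```python
import random
import statistics

random.seed(1)
n = 30            # number of small independent effects per observation
trials = 50_000

# Each observation is the sum of n independent Uniform(0, 1) effects
sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]

print(statistics.mean(sums))   # ≈ n/2 = 15.0
print(statistics.stdev(sums))  # ≈ sqrt(n/12) ≈ 1.58
# A histogram of `sums` would already look approximately Gaussian.
```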
Dangerous Misapplications
- Financial returns — have fat tails, not Gaussian
- Insurance claims — dominated by rare large events
- Earthquake magnitudes — follow power laws
- Wealth distribution — extremely skewed, unbounded
- Pandemic sizes — fat-tailed with potential for extreme events
Taleb's Warning
The problem isn't that the Gaussian is wrong — it's that we apply it where it doesn't belong. In domains where single observations can dominate (wealth, returns, casualties), Gaussian assumptions lead to gross underestimation of tail risk.
Key Takeaways
- The Gaussian has PDF f(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)), with tails that decay extremely fast
- All moments exist; skewness = 0, kurtosis = 3
- The 68-95-99.7 rule shows how concentrated the distribution is
- Sums of Gaussians remain Gaussian (stability)
- Extreme events are treated as “impossible” — a dangerous assumption in fat-tailed domains
- Taleb's key insight: real-world extremes often vastly exceed Gaussian predictions