Random Variables and Distributions
PDF, CDF, and the Survival Function - the building blocks of probability distributions.
Before we can understand fat tails, we need a solid foundation in probability. This section introduces the key mathematical objects: random variables and their distributions.
What is a Random Variable?
Random Variable
A random variable is a function that assigns numerical values to outcomes of a random experiment. We write $X: \Omega \to \mathbb{R}$, where $\Omega$ is the sample space (all possible outcomes) and $\mathbb{R}$ is the set of real numbers.
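To make the definition concrete, here is a minimal sketch in Python. The two-coin-flip experiment and the names `sample_space`, `num_heads`, and `pmf` are illustrative assumptions, not part of the original lesson. The random variable is literally a function from outcomes to numbers.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips (assumed example): four equally likely outcomes.
sample_space = list(product("HT", repeat=2))

def num_heads(outcome):
    """The random variable X: maps one outcome to a real number."""
    return outcome.count("H")

def pmf(random_variable, outcomes):
    """Probability of each value of X when all outcomes are equally likely."""
    weight = Fraction(1, len(outcomes))
    probs = {}
    for outcome in outcomes:
        value = random_variable(outcome)
        probs[value] = probs.get(value, 0) + weight
    return probs

distribution = pmf(num_heads, sample_space)
for value in sorted(distribution):
    print(f"P(X = {value}) = {distribution[value]}")
```

This is a discrete example; for a continuous random variable the same idea holds, but probabilities are attached to intervals rather than to individual values.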
There are two main types of random variables:
- Discrete: Takes countable values (coin flips, dice rolls, number of customers)
- Continuous: Takes values in an interval (heights, stock returns, time)
For Taleb's work on fat tails, we focus primarily on continuous random variables, though the concepts apply to both types.
The Probability Density Function (PDF)
Probability Density Function
For a continuous random variable $X$, the probability density function $f(x)$ describes the relative likelihood of $X$ taking on different values. It satisfies:
- $f(x) \ge 0$ for all $x$
- $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (total probability is 1)
Normal (Gaussian) Distribution PDF
The famous bell curve, also called the Gaussian distribution or simply the normal distribution, has PDF:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Read: “f of x equals one over sigma root two pi, times e to the minus x minus mu squared over two sigma squared”
The height of the bell curve at point x, centered at μ with spread σ
The standard normal is the special case where $\mu = 0$ (centered at zero) and $\sigma = 1$ (unit spread). It's often written as $\mathcal{N}(0, 1)$.
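The normal PDF can be checked numerically against the two defining properties of a density. The sketch below uses only the standard library; the helper names `normal_pdf` and `integrate`, and the integration range of ±10 standard deviations, are choices made for this illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the bell curve at x, centered at mu with spread sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, a, b, n=20_000):
    """Trapezoid-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Peak of the standard normal: 1 / sqrt(2 * pi).
peak = normal_pdf(0.0)

# Total area; the tails beyond +/-10 are negligibly small.
total_area = integrate(normal_pdf, -10.0, 10.0)

print(f"peak height = {peak:.6f}")
print(f"total area  = {total_area:.6f}")
```

The area should come out very close to 1, and the density is never negative because it is an exponential divided by a positive constant.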
Interactive: Probability as Area Under the Curve
Probability as area: This visualization shows the fundamental relationship between the PDF and probability. The shaded area under the curve between a and b equals P(a ≤ X ≤ b). This is calculated using the CDF: P(a ≤ X ≤ b) = F(b) − F(a).
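The relationship P(a ≤ X ≤ b) = F(b) − F(a) can be verified directly. The sketch below assumes a standard normal distribution and compares the CDF difference with a brute-force sum of the area under the PDF. The function names are illustrative; the closed form of the normal CDF in terms of `math.erf` is standard.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x) = P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def area_under_pdf(a, b, n=100_000):
    """Midpoint Riemann sum of the standard normal PDF over [a, b]."""
    h = (b - a) / n
    return h * sum(normal_pdf(a + (i + 0.5) * h) for i in range(n))

a, b = -1.0, 1.0
via_cdf = normal_cdf(b) - normal_cdf(a)   # F(b) - F(a)
via_area = area_under_pdf(a, b)           # shaded area under the curve

print(f"F(b) - F(a) = {via_cdf:.6f}")
print(f"shaded area = {via_area:.6f}")
```

For the standard normal, both numbers are about 0.6827: roughly 68% of the probability lies within one standard deviation of the mean.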
The Cumulative Distribution Function (CDF)
Cumulative Distribution Function
The CDF $F(x)$ gives the probability that $X$ is less than or equal to $x$:
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$
Key properties of the CDF:
- $F(x)$ is always non-decreasing (never goes down)
- The PDF is the derivative of the CDF: $f(x) = F'(x)$
From PDF to CDF and Back
The CDF is the cumulative (running total) area under the PDF curve. Conversely, the PDF is the slope of the CDF. These two functions contain the same information in different forms.
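The "slope of the CDF" statement can be checked with a finite difference. This is a small sketch assuming the standard normal; `slope` is a hypothetical helper that estimates a derivative with a central difference.

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def slope(f, x, h=1e-5):
    """Central-difference estimate of the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-2.0, 0.0, 1.5):
    print(f"x = {x:+.1f}  slope of CDF = {slope(normal_cdf, x):.6f}  PDF = {normal_pdf(x):.6f}")
```

At each point the estimated slope of the CDF should match the PDF to several decimal places, which is the numerical face of $f(x) = F'(x)$.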
Explore: PDF and CDF
Use this interactive tool to see how the PDF and CDF relate. Adjust parameters and see how the shaded area under the PDF corresponds to the CDF value.
Panels: Probability Density Function (PDF) and Cumulative Distribution Function (CDF). Example reading: P(X ≤ 0.00) = F(0.00) = 0.5000.
The Survival Function
Survival Function
The survival function (also called the tail function or complementary CDF) is:
$$S(x) = P(X > x) = 1 - F(x)$$
Read: “S of x equals the probability that X exceeds x, which equals one minus F of x”
What fraction of the probability mass lies above the value x?
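One practical note when computing tail probabilities: far out in the tail, $F(x)$ is so close to 1 that evaluating $1 - F(x)$ in floating point can lose all precision. The sketch below, assuming a standard normal, contrasts the naive subtraction with a direct computation through the complementary error function `math.erfc`. The function names are illustrative.

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_sf(x):
    """S(x) = P(X > x) for the standard normal, computed directly via erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

x = 10.0
naive = 1.0 - normal_cdf(x)   # F(10) rounds to exactly 1.0 in double precision
stable = normal_sf(x)         # keeps the tiny tail probability

print(f"1 - F({x}) = {naive!r}")
print(f"S({x})     = {stable:.3e}")
```

Mathematically the two expressions are the same; the direct form simply preserves the small numbers that matter when studying tails.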
Why the Survival Function Matters for Fat Tails
The survival function tells us how much probability remains in the tails. For fat-tailed distributions, $S(x)$ decays slowly, so extreme events remain relatively probable even far from the mean.
Taleb's central observation: the difference between distributions is most visible in their survival functions, especially at large values of $x$.
Compare how different distributions behave in the tails:
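- Normal (standard): $S(x)$ shrinks roughly like $e^{-x^2/2}$, faster than any exponential
- Exponential: $S(x) = e^{-\lambda x}$, which decays exponentially
- Pareto (power law): $S(x) = x^{-\alpha}$ for $x \ge 1$, which decays much more slowly!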
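To put numbers on this comparison, the sketch below evaluates the three survival functions at a few points. The parameter choices (standard normal, exponential rate 1, Pareto with α = 2 and minimum value 1) are assumptions made for illustration only.

```python
import math

def normal_sf(x):
    """Standard normal tail probability P(X > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def exponential_sf(x, rate=1.0):
    """Exponential tail probability P(X > x) for x >= 0."""
    return math.exp(-rate * x)

def pareto_sf(x, alpha=2.0, x_min=1.0):
    """Pareto tail probability P(X > x); equals 1 below the minimum value."""
    return (x_min / x) ** alpha if x >= x_min else 1.0

points = (2.0, 5.0, 10.0, 20.0)
rows = [(x, normal_sf(x), exponential_sf(x), pareto_sf(x)) for x in points]

print(f"{'x':>5} {'normal':>12} {'exponential':>12} {'pareto':>12}")
for x, n, e, p in rows:
    print(f"{x:>5.0f} {n:>12.3e} {e:>12.3e} {p:>12.3e}")
```

At every point in this table the ordering is normal < exponential < Pareto, and the gap widens by many orders of magnitude as $x$ grows.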
Preview: Gaussian vs Pareto
Before we dive deeper, here's a visual comparison. The Pareto distribution is the archetypal "fat-tailed" distribution — we'll study it in detail in Module 2.
Gaussian (Normal) — symmetric bell curve
Pareto — heavy right tail
Lower α means a heavier tail (more probability of extreme values). The Pareto distribution shown here is defined only for x ≥ 1 (its minimum value).
Notice how the Gaussian drops off quickly and symmetrically, while the Pareto has a long, slowly-decaying right tail. This "heavy tail" is what makes extreme events more likely under Pareto-like distributions.
Explore: Survival Functions and Tail Behavior
This plot compares how quickly different distributions decay in the tails. Toggle log-log scale to see the characteristic straight-line behavior of power law distributions.
Key insight: On a log-log plot, the Pareto distribution appears as a straight line, while the normal and exponential curves drop off much faster. This straight line is the signature of a power law or fat tail — extreme events remain relatively probable even far from the mean.
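The straight-line signature can be checked without a plot: on log-log axes, the slope between any two points of a Pareto survival function is the same, while for an exponential it keeps getting steeper. The sketch assumes a Pareto with minimum value 1 and α = 1.5, and an exponential with rate 1; `loglog_slope` is a hypothetical helper.

```python
import math

def pareto_sf(x, alpha, x_min=1.0):
    return (x_min / x) ** alpha if x >= x_min else 1.0

def exponential_sf(x, rate=1.0):
    return math.exp(-rate * x)

def loglog_slope(sf, x1, x2):
    """Slope of log S(x) against log x between x1 and x2."""
    return (math.log(sf(x2)) - math.log(sf(x1))) / (math.log(x2) - math.log(x1))

alpha = 1.5  # assumed tail exponent for this illustration

# Slopes over successive decades: 1 to 10, 10 to 100, 100 to 1000.
pareto_slopes = [loglog_slope(lambda x: pareto_sf(x, alpha), x, 10.0 * x)
                 for x in (1.0, 10.0, 100.0)]

# The exponential is only evaluated up to x = 100 to avoid underflow to zero.
exp_slopes = [loglog_slope(exponential_sf, x, 10.0 * x) for x in (1.0, 10.0)]

print("Pareto slopes:     ", [round(s, 6) for s in pareto_slopes])
print("Exponential slopes:", [round(s, 3) for s in exp_slopes])
```

The Pareto slopes all equal −α, so the tail exponent can be read off the log-log plot as the negative of the line's slope.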
Notice: With α ≤ 2, the Pareto distribution has infinite variance. With α ≤ 1, even the mean is infinite!
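A short derivation shows where these thresholds come from. It assumes the Pareto form with minimum value 1 and survival function $S(x) = x^{-\alpha}$, so that the density is $f(x) = -S'(x) = \alpha x^{-\alpha-1}$ for $x \ge 1$. The $k$-th moment is then

$$E[X^k] = \int_1^{\infty} x^k \,\alpha x^{-\alpha-1}\,dx = \alpha \int_1^{\infty} x^{k-\alpha-1}\,dx = \begin{cases} \dfrac{\alpha}{\alpha - k}, & \alpha > k \\[4pt] \infty, & \alpha \le k \end{cases}$$

Setting $k = 1$ shows the mean is finite only when $\alpha > 1$. Setting $k = 2$ shows the second moment, and therefore the variance, is finite only when $\alpha > 2$.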
Key Takeaways
- A random variable assigns numbers to random outcomes
- The PDF describes relative likelihoods; area under the curve gives probability
- The CDF gives cumulative probability
- The survival function measures tail probability — this is where fat tails reveal themselves
- The rate at which $S(x)$ decays determines whether a distribution is thin-tailed or fat-tailed