Key Properties of Expectation

Linearity, independence, and conditional expectation - essential tools for analysis.

The expected value operator E[·] has powerful properties that make probability theory work. These properties are essential tools for analysis — and understanding when they hold (and when they don't) is crucial for working with fat tails.

Linearity of Expectation

Definition

Linearity Property

For any random variables X and Y, and constants a and b:

E[aX + bY] = aE[X] + bE[Y]

Key Insight

Always True — No Independence Required!

Linearity of expectation always holds, regardless of whether X and Y are independent. This makes it one of the most useful tools in probability. You can compute expected values of sums by summing expected values.

Example

Portfolio Returns

If you invest 60% in asset X (expected return 8%) and 40% in asset Y (expected return 5%), your portfolio's expected return is:

E[0.6X + 0.4Y] = 0.6 × 8% + 0.4 × 5% = 6.8%

This works regardless of how X and Y are correlated!
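
As a quick numerical check, here is a minimal Python sketch. The 60/40 weights and the 8% and 5% means come from the example; the volatilities (20%, 15%) and the 0.9 correlation are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

# Correlated returns: means 8% and 5% (from the example); the volatilities
# and correlation are made-up values to show correlation doesn't matter here.
mean = [0.08, 0.05]
cov = [[0.20**2, 0.9 * 0.20 * 0.15],
       [0.9 * 0.20 * 0.15, 0.15**2]]
X, Y = rng.multivariate_normal(mean, cov, size=1_000_000).T

portfolio = 0.6 * X + 0.4 * Y
print(portfolio.mean())                  # ~0.068
print(0.6 * X.mean() + 0.4 * Y.mean())   # same ~0.068: linearity ignores correlation
```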

Independence and Products

Definition

Statistical Independence

Two random variables X and Y are independent if knowing the value of one tells you nothing about the other:

P(X = x, Y = y) = P(X = x) · P(Y = y)

Read: The joint probability equals the product of individual probabilities

The events don't influence each other

Definition

Product of Independent Variables

If X and Y are independent, then:

E[XY] = E[X] · E[Y]

Independence Required!

Unlike linearity, this property requires independence. If X and Y are correlated, then in general E[XY] ≠ E[X]E[Y].

For the variance of a sum, independence matters even more:

Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)

Read: Variance of X plus Y equals variance of X plus variance of Y plus twice the covariance

Only when X and Y are independent (so Cov(X, Y) = 0) does this simplify to:

Var(X + Y) = Var(X) + Var(Y)
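
A minimal NumPy sketch of both facts, with made-up means and variances chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Independent case: product rule and simple variance formula both hold
X = rng.normal(1.0, 2.0, n)   # E[X] = 1, Var(X) = 4
Y = rng.normal(3.0, 1.0, n)   # E[Y] = 3, Var(Y) = 1
print((X * Y).mean(), X.mean() * Y.mean())   # both ~3.0
print((X + Y).var(), X.var() + Y.var())      # both ~5.0

# Correlated case: Z shares randomness with X, so Cov(X, Z) = 4
Z = X + rng.normal(0.0, 1.0, n)
print((X * Z).mean(), X.mean() * Z.mean())   # ~5.0 vs ~1.0: product rule fails
print((X + Z).var(),
      X.var() + Z.var() + 2 * np.cov(X, Z)[0, 1])  # both ~17.0 with the covariance term
```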

Explore: Linearity in Action

See how combining two distributions with different weights affects the result. Notice that E[aX + bY] = aE[X] + bE[Y] always holds, while the variance formula depends on independence.

This demonstrates the linearity property: E[aX + bY] = aE[X] + bE[Y]

Distribution X: μₓ = 0.00 (mean), σₓ = 1.00 (std dev)
Distribution Y: μᵧ = 2.00 (mean), σᵧ = 0.80 (std dev)
Linear combination coefficients: a = 1.00, b = 1.00

[Interactive chart]

Linearity of Expectation

E[1X + 1Y] = 1·E[X] + 1·E[Y]

= 1·(0.00) + 1·(2.00)

= 2.00

Variance (if independent)

Var(1X + 1Y) = 1²·Var(X) + 1²·Var(Y)

= 1.0·(1.00) + 1.0·(0.64)

= 1.64

Key insight: The linearity of expectation always holds, regardless of whether X and Y are independent. However, the simple variance formula Var(aX + bY) = a²Var(X) + b²Var(Y) only holds when X and Y are independent. If they're correlated, there's an additional covariance term.

Conditional Expectation

Definition

Conditional Expectation

The conditional expectation E[X | Y = y] is the expected value of X given that we know the value of Y:

E[X | Y = y]

Read: E of X given Y equals y

The expected value of X when we know Y equals some specific value y

Conditional expectation is itself a random variable — it's a function of Y. A fundamental result connects it to the unconditional expectation:

Definition

Law of Total Expectation

For any random variables X and Y:

E[X] = E[E[X | Y]]

Read: The expected value of X equals the expected value of the conditional expectation of X given Y

Average over all possible conditions to get the overall average

Example

Test Scores and Study Time

A student's test score X depends on hours studied Y.

  • Someone who studied 2 hours: expected score might be 60
  • Someone who studied 6 hours: expected score might be 80
  • Someone who studied 10 hours: expected score might be 95

These are conditional expectations E[X | Y = y]. The overall average score is the weighted average across all study times.

Example

Insurance Claims by Region

An insurance company wants to know expected claim amounts. Let X = claim size and Y = region.

  • Urban areas: E[X | Y = urban] = $2,500
  • Suburban: E[X | Y = suburban] = $1,800
  • Rural: E[X | Y = rural] = $1,200

If 40% of customers are urban, 35% suburban, and 25% rural, then:

E[X] = 0.40 × $2,500 + 0.35 × $1,800 + 0.25 × $1,200 = $1,930
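
In code, the law of total expectation for this example is just a weighted sum (a minimal Python sketch using the figures above):

```python
# E[X] = sum over regions of P(region) * E[X | region]
p         = {"urban": 0.40, "suburban": 0.35, "rural": 0.25}
cond_mean = {"urban": 2500,  "suburban": 1800,  "rural": 1200}

overall = sum(p[r] * cond_mean[r] for r in p)
print(overall)  # 1930.0
```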

Interactive: Conditional Expectation in Action

Scenario: A student's test score (X) depends on hours studied (Y). The conditional expectation E[X|Y=y] tells us the expected score given specific study time.

Y = 4.00 (hours studied), σ = 8.00 (score variability)

[Interactive chart]

Given Y = 4 hours:

E[X | Y = 4] = 50 + 5 × 4 = 70

The dashed vertical line shows this conditional mean

Law of Total Expectation

If we average E[X|Y] over all possible values of Y, we get E[X]:

E[X] = E[E[X|Y]] = 75

This is the overall average score if study time is uniformly distributed from 0-10 hours.

E[X|Y=y] for different study times:

Hours (Y):   0    2    4    6    8    10
E[X | Y]:   50   60   70   80   90   100

Notice: E[X|Y] is itself a random variable — it's a function of Y. Each hour of study adds 5 points to the expected score.

Key insight: Conditional expectation tells us how the average changes when we have partial information. If we know someone studied for 4 hours, our best guess for their score is 70 — not the overall average of 75. The marginal distribution (dashed line) shows what we'd expect without knowing study time.
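
A minimal NumPy simulation of the model described above (conditional mean 50 + 5y, noise σ = 8, study time uniform on 0-10 hours):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

Y = rng.uniform(0, 10, n)                 # hours studied
X = 50 + 5 * Y + rng.normal(0, 8, n)      # score: conditional mean 50 + 5y

# Conditional expectation: average score among those who studied ~4 hours
print(X[np.abs(Y - 4) < 0.1].mean())      # ~70 = E[X | Y = 4]

# Law of total expectation: averaging over all study times recovers E[X]
print(X.mean())                           # ~75 = E[E[X | Y]]
```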

Key Insight

Conditional Expectation and Fat Tails

Under fat tails, conditional expectations can behave unexpectedly. If we condition on observing an extreme event, the expected value of related quantities can shift dramatically. This is related to Taleb's point about how fat-tailed risks are "hidden" — conditioning on having seen only moderate events gives misleading expectations.

Jensen's Inequality

One more crucial property relates expectations of functions. This inequality is fundamental to Taleb's work on risk and decision-making.

Definition

Jensen's Inequality

For a convex function g (curves upward, like g(x) = x²):

E[g(X)] ≥ g(E[X])

For a concave function g (curves downward, like g(x) = √x):

E[g(X)] ≤ g(E[X])

Example

The Dice Example (Squaring)

Roll a fair die. The average roll is E[X] = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.

Square of the average: (E[X])² = 3.5² = 12.25

Average of the squares: E[X²] = (1 + 4 + 9 + 16 + 25 + 36)/6 = 91/6 ≈ 15.17

The average of squares (15.17) exceeds the square of average (12.25). This gap is exactly the variance! Var(X) = E[X²] − (E[X])² = 15.17 − 12.25 ≈ 2.92.
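
The dice computation is small enough to verify exactly; here is a sketch using exact fractions:

```python
from fractions import Fraction

faces = range(1, 7)
E_X  = sum(Fraction(x, 6) for x in faces)       # 7/2  = 3.5
E_X2 = sum(Fraction(x * x, 6) for x in faces)   # 91/6 ≈ 15.17

print(float(E_X) ** 2)          # 12.25: g(E[X]) for g(x) = x**2
print(float(E_X2))              # 15.1666...: E[g(X)]
print(float(E_X2 - E_X ** 2))   # 2.9166...: the Jensen gap = Var(X)
```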

Example

Income and Spending (Concave)

Suppose spending follows (diminishing returns).

Three people earn $10k, $40k, and $90k. Average income = $46.67k.

  • Spending at average income: g($46.67k) = $216
  • Average spending: E[g(income)] = $172

Average spending ($172) is less than spending at average income ($216). For concave functions, the inequality flips: E[g(X)] ≤ g(E[X]).

Interactive: Jensen's Inequality Visualized

The Question: If you roll a fair die and apply a function to the result, is "the average of the transformed values" the same as "the transformation of the average"?

[Interactive chart]

  • Average dice roll: E[X] = 3.50
  • Transform the average: g(E[X]) = 12.25 (blue diamond on the curve)
  • Average of transforms: E[g(X)] = 15.17 (orange star on the chord)

Jensen's Inequality: E[g(X)] ≥ g(E[X])

15.17 ≥ 12.25

For this convex function, the average of outputs exceeds the output of the average by 2.92.

Why the Gap?

For convex functions (curves "bowl upward"), the chord connecting any two points on the curve lies above the curve. E[g(X)] is a weighted average of points on the curve and sits at chord height above E[X]; g(E[X]) is the point directly on the curve below it. The chord is always above the curve, so E[g(X)] ≥ g(E[X]).

Real-world example: If three friends earn $30k, $50k, and $100k, the average income is $60k. But the "average of squares" (about $4.47B) is much larger than the "square of the average" ($3.6B). This matters for understanding variance: Var(X) = E[X²] − (E[X])² is exactly this gap!

Key Insight

Taleb's Point About Convexity

Many real-world payoffs are convex (like options) or concave (like bounded utilities). Under fat tails, the gap between E[g(X)] and g(E[X]) can be enormous.

Example: If g(x) = x² and X has fat tails, then E[X²] (which is infinite for a Pareto with α ≤ 2) vastly exceeds (E[X])². You can't just compute the average and square it!
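
A minimal sketch of that blow-up, assuming a Pareto with tail exponent α = 1.5 and minimum 1 (so E[X] is finite but E[X²] is infinite), sampled by inverting the CDF:

```python
import numpy as np

rng = np.random.default_rng(7)
alpha = 1.5  # E[X] = alpha/(alpha - 1) = 3 is finite; E[X^2] is infinite

for n in [10**3, 10**5, 10**7]:
    X = rng.uniform(size=n) ** (-1 / alpha)   # Pareto(min=1, alpha) via inverse CDF
    # Square of the average stabilizes near 9; average of squares keeps growing
    print(n, X.mean() ** 2, (X ** 2).mean())
```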

Coming up in Module 7: We'll explore Jensen's Inequality in much more depth, including how it relates to option pricing, the barbell strategy, and why convex payoffs are your friend under uncertainty.

Key Takeaways

  • Linearity E[aX + bY] = aE[X] + bE[Y] always holds — no conditions required
  • Products: E[XY] = E[X]E[Y] only when X and Y are independent
  • Variance of sums: Includes covariance unless variables are independent
  • Conditional expectation E[X|Y] lets us compute expectations given partial information
  • Law of Total Expectation: E[X] = E[E[X|Y]] — average over conditions
  • Jensen's Inequality: For convex g, E[g(X)] ≥ g(E[X]) — crucial for understanding payoff nonlinearities under fat tails

Looking Ahead

You now have the probabilistic foundations for Taleb's work. In the next module, we'll explore specific distributions — the Gaussian, exponential, Pareto, and Student's t — to see how they differ in their tail behavior. This is where the abstract concepts become concrete and consequential.