Conditional Expectation & Variance Decomposition

The tower rule and the law of total variance let you solve hierarchical models one layer at a time — condition on the mixing variable, solve the easy inner problem, then average.

By the end you'll be able to apply the tower rule to find an unconditional mean, split a variance into within-group and between-group pieces, and use both on a compound (mixed) model like an aggregate insurance loss.

Predict: right now the three group means (30, 50, 70) are spread far apart. If you drag all three sliders to the same value, what happens to the between-group term? Then drag them together and check the bar below.

The grouping variable \(Y\) takes three values. Each group \(y\) carries a weight \(P(Y=y)\) and its own conditional distribution of \(X\). Drag one group's mean away from the others and watch the between-group variance term grow; widen the shared spread and watch the within-group term grow. Hover a row for its exact contribution.

Group A mean E[X|Y=1] — 30

Group B mean E[X|Y=2] — 50

Group C mean E[X|Y=3] — 70

Within-group spread SD(X|Y=y) — 8

Tower rule

E[X] = E[E[X|Y]] = Σ P(y)·E[X|Y=y]

Law of total variance

Var(X) = E[Var(X|Y)] + Var(E[X|Y])

E[Var(X|Y)]: within-group
Var(E[X|Y]): between-group

The tower rule \(E[X] = E[E[X|Y]]\) and the law of total variance \(\text{Var}(X) = E[\text{Var}(X|Y)] + \text{Var}(E[X|Y])\) let you evaluate hierarchical (mixed) models one layer at a time.

Formal

Tower rule: average the conditional means over the distribution of the conditioning variable to recover the unconditional mean.

\(E[X] = E[\,E[X|Y]\,]\)
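As a minimal sketch of the tower rule on a discrete joint pmf, the snippet below computes \(E[X]\) two ways: directly from the joint table, and by averaging the conditional means over the marginal of \(Y\). The joint pmf values and the names `joint`, `direct_mean`, and `tower_mean` are illustrative assumptions, not values taken from the interactive.

```python
# Hypothetical joint pmf P(X=x, Y=y), keyed by (x, y); probabilities sum to 1.
joint = {
    (0, 1): 0.10, (1, 1): 0.20,
    (0, 2): 0.30, (1, 2): 0.10,
    (0, 3): 0.05, (1, 3): 0.25,
}

def direct_mean(joint):
    """E[X] as the double sum of x * P(X=x, Y=y)."""
    return sum(x * p for (x, _), p in joint.items())

def tower_mean(joint):
    """E[E[X|Y]]: conditional mean for each y, weighted by P(Y=y)."""
    total = 0.0
    for y in {y for (_, y) in joint}:
        # Marginal weight of this group.
        p_y = sum(p for (_, yy), p in joint.items() if yy == y)
        # Conditional mean E[X | Y=y] = sum of x * P(x, y) / P(y).
        cond_mean = sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y
        total += p_y * cond_mean
    return total

print(round(direct_mean(joint), 10), round(tower_mean(joint), 10))
```

The two numbers agree because multiplying each conditional mean by \(P(Y=y)\) cancels the division by \(P(Y=y)\) inside it, leaving the plain double sum.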

Law of total variance: total variance splits into within-group spread plus between-group shift of the conditional mean.

\(\text{Var}(X) = E[\text{Var}(X|Y)] + \text{Var}(E[X|Y])\)

within-group average spread + spread of the group means
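
One way to see why the decomposition holds, sketched here under the assumption that \(X\) has a finite second moment: apply the tower rule to both \(X\) and \(X^2\), then use \(E[X^2|Y] = \text{Var}(X|Y) + (E[X|Y])^2\).

\(\text{Var}(X) = E[X^2] - (E[X])^2 = E[\,E[X^2|Y]\,] - (E[\,E[X|Y]\,])^2\)

\(= E[\,\text{Var}(X|Y) + (E[X|Y])^2\,] - (E[\,E[X|Y]\,])^2\)

\(= E[\text{Var}(X|Y)] + \big(E[(E[X|Y])^2] - (E[\,E[X|Y]\,])^2\big) = E[\text{Var}(X|Y)] + \text{Var}(E[X|Y])\)

The bracketed difference in the last line is the variance of the random variable \(E[X|Y]\), which is where the between-group term comes from.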

Applied

Picture a compound (mixed) model: the number of claims is \(N \sim \text{Poisson}(\lambda)\), and given \(N = n\) the total loss is \(S \sim \text{Normal}(n\mu, n\sigma^2)\), as happens when \(n\) independent claims are each Normal with mean \(\mu\) and variance \(\sigma^2\). Condition on the mixing variable \(N\) first (the "outer" layer), solve the easy inner problem, then average:

\(E[S] = E[E[S|N]] = E[N\mu] = \lambda\mu\)

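\(\text{Var}(S) = E[\text{Var}(S|N)] + \text{Var}(E[S|N]) = E[N\sigma^2] + \text{Var}(N\mu) = \sigma^2 E[N] + \mu^2\,\text{Var}(N) = \lambda\sigma^2 + \lambda\mu^2 = \lambda(\sigma^2 + \mu^2)\)

Both steps use \(E[N] = \text{Var}(N) = \lambda\) for a Poisson count.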

You'll see this "condition on the mixing variable, then decompose" pattern constantly — it's the standard actuarial approach to aggregate-loss and compound-Poisson models.

Worked example

With \(\lambda=10\), \(\mu=200\), \(\sigma^2=5000\): \(E[S] = 10(200) = 2{,}000\) and \(\text{Var}(S) = 10(5000) + 200^2(10) = 50{,}000 + 400{,}000 = 450{,}000\).
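
The worked example can be sanity-checked by simulating the model one layer at a time. This is a rough sketch using only the standard library. The helper names `sample_poisson` and `simulate_aggregate_loss`, the trial count, and the seed are assumptions chosen for illustration, and the Poisson draw uses Knuth's multiplication method because the `random` module has no built-in Poisson sampler.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_aggregate_loss(lam, mu, sigma2, trials, seed=0):
    """Return the sample mean and sample variance of S over `trials` draws."""
    rng = random.Random(seed)
    losses = []
    for _ in range(trials):
        n = sample_poisson(lam, rng)                      # outer layer: claim count N
        s = rng.gauss(n * mu, math.sqrt(n * sigma2))      # inner layer: S given N = n
        losses.append(s)
    mean = sum(losses) / trials
    var = sum((s - mean) ** 2 for s in losses) / (trials - 1)
    return mean, var

lam, mu, sigma2 = 10, 200, 5000
sim_mean, sim_var = simulate_aggregate_loss(lam, mu, sigma2, trials=50_000)
print(f"simulated mean {sim_mean:.0f} vs formula {lam * mu}")
print(f"simulated variance {sim_var:.0f} vs formula {lam * (sigma2 + mu ** 2)}")
```

The simulated figures should land close to 2,000 and 450,000 but will not match them exactly, since they are Monte Carlo estimates.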

Your turn — finish the group-mixture example (matches the interactive's defaults)

Use the interactive's starting values above: \(P(Y{=}1)=0.3, P(Y{=}2)=0.4, P(Y{=}3)=0.3\); group means \(30, 50, 70\); shared SD \(=8\). Fill in the blanks.

Step 1 — tower rule: \(E[X] = 0.3(30) + 0.4(50) + 0.3(70) = \) ?

Reveal step 1

\(E[X] = 9 + 20 + 21 = 50\).

Step 2 — within-group term: \(E[\text{Var}(X|Y)] = \text{SD}^2 = \) ?

Reveal step 2

Every group has the same variance \(8^2 = 64\), so the weighted average is \(E[\text{Var}(X|Y)] = 0.3(64) + 0.4(64) + 0.3(64) = 64\).

Step 3 — between-group term: \(\text{Var}(E[X|Y]) = 0.3(30-50)^2 + 0.4(50-50)^2 + 0.3(70-50)^2 = \) ?

Reveal step 3 (final answer)

\(= 0.3(400) + 0 + 0.3(400) = 120 + 120 = 240\). So \(\text{Var}(X) = 64 + 240 = 304\) — scroll back up and check: that's exactly what the interactive's summary panel shows at its default settings.
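
The three steps above can also be written as a short function. This is a sketch, not the interactive's actual code. The name `decompose` and its argument layout are assumptions. It reproduces the hand calculation at the default settings and then shows what happens when all three group means coincide.

```python
def decompose(weights, means, sds):
    """Return (E[X], E[Var(X|Y)], Var(E[X|Y])) for a finite group mixture."""
    overall_mean = sum(w * m for w, m in zip(weights, means))
    # Within-group term: probability-weighted average of the group variances.
    within = sum(w * sd ** 2 for w, sd in zip(weights, sds))
    # Between-group term: probability-weighted squared distance of each
    # group mean from the overall mean.
    between = sum(w * (m - overall_mean) ** 2 for w, m in zip(weights, means))
    return overall_mean, within, between

weights = [0.3, 0.4, 0.3]
mean_x, within, between = decompose(weights, means=[30, 50, 70], sds=[8, 8, 8])
print(round(mean_x, 6), round(within, 6), round(between, 6), round(within + between, 6))

# Setting all three group means to the same value removes the between-group term.
_, within_eq, between_eq = decompose(weights, means=[50, 50, 50], sds=[8, 8, 8])
print(round(within_eq, 6), round(between_eq, 6))
```

With equal group means, knowing \(Y\) tells you nothing about where \(X\) is centered, so all of the variance is within-group.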

More info — the same split shows up as ANOVA

If you've seen one-way ANOVA, this is the same idea under a different name: the between-group and within-group sums of squares you compute there are the sample versions of \(\text{Var}(E[X|Y])\) and \(E[\text{Var}(X|Y)]\), weighted by group sample counts instead of probabilities. Whenever you condition on a grouping variable and split variance into "spread inside each group" versus "how far apart the group means are," you're applying the law of total variance, whether the groups come from a probability model (this lesson) or from measured data (ANOVA). For more drilling on the actuarial side, see the AnalystPrep video in Dive deeper below, or scroll back up and re-run the interactive with this framing in mind.
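
To make the ANOVA connection concrete, here is a small sketch on made-up measurements. The data and the name `sums_of_squares` are hypothetical. It checks that the total sum of squares equals between plus within, and that dividing each piece by the total sample size gives the law of total variance for the empirical distribution, where each group's weight is its share of the sample.

```python
def sums_of_squares(groups):
    """Return (SS_between, SS_within, SS_total, n) for a list of groups."""
    all_values = [x for g in groups for x in g]
    n = len(all_values)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between: each group mean's squared distance from the grand mean,
    # counted once per observation in that group.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within: squared distance of each observation from its own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    return ss_between, ss_within, ss_total, n

# Made-up measurements for three groups of unequal size.
groups = [[28, 31, 35], [47, 50, 52, 55], [66, 70, 74]]
ss_between, ss_within, ss_total, n = sums_of_squares(groups)

# Dividing by n turns counts into empirical probabilities len(g) / n.
between_term = ss_between / n   # plays the role of Var(E[X|Y])
within_term = ss_within / n     # plays the role of E[Var(X|Y)]
total_var = ss_total / n        # population-style variance of the pooled data
print(round(between_term + within_term, 6), round(total_var, 6))
```

Note that this uses the divide-by-\(n\) variance throughout. The mean squares in an ANOVA table divide by degrees of freedom instead, so they are not the same numbers.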

Check your understanding

Question 1 of 4

N ~ Poisson(λ) claims arrive, and given N=n, aggregate loss S|N=n has mean nμ and variance nσ². Using the tower rule, what is E[S]?

Question 2 of 4

In the variance decomposition Var(X) = E[Var(X|Y)] + Var(E[X|Y]), what does the term Var(E[X|Y]) represent?

Question 3 of 4

If X and Y are independent, what does the law of total variance Var(X) = E[Var(X|Y)] + Var(E[X|Y]) simplify to?

Question 4 of 4

For discrete X and Y with joint pmf P(X=x, Y=y), which computation gives the same value as the tower rule E[X] = E[E[X|Y]]?

Recap

Tower rule: \(E[X] = E[E[X|Y]]\) — average the conditional means over \(Y\) to get the unconditional mean.
Law of total variance: \(\text{Var}(X) = E[\text{Var}(X|Y)] + \text{Var}(E[X|Y])\) — within-group spread plus between-group shift of the conditional mean.
Compound/mixed models: condition on the mixing variable first (e.g. claim count \(N\)), solve the inner problem, then average — the standard actuarial pattern for aggregate losses.
Independence check: if \(X\) and \(Y\) are independent, \(E[X|Y]\) is constant, so \(\text{Var}(E[X|Y]) = 0\) and \(\text{Var}(X) = E[\text{Var}(X|Y)]\) — none of the variance comes from \(Y\).

Dive deeper

MIT 6.012 - Conditional Expectation & the Total Expectation Theorem. Learn the tower rule E[X] = E[E[X|Y]].
Harvard Stat 110 - Lecture 26: Conditional Expectation Continued. Apply Adam's law to layered (mixed) models.
AnalystPrep - Law of Total Variance (Exam P). Drill the law of total variance on Exam P problems.
