Conditional Expectation & Variance Decomposition
The tower rule and the law of total variance let you solve hierarchical models one layer at a time — condition on the mixing variable, solve the easy inner problem, then average.
By the end you'll be able to apply the tower rule to find an unconditional mean, split a variance into within-group and between-group pieces, and use both on a compound (mixed) model like an aggregate insurance loss.
Predict: right now the three group means (30, 50, 70) are spread far apart. If you drag all three sliders to the same value, what happens to the between-group term? Then drag them together and check the bar below.
Three groups, indexed by \(Y\), each carry a weight \(P(Y=y)\) and a conditional distribution of \(X\). Drag one group's mean away from the others and watch the between-group variance term grow; widen the shared spread and watch the within-group term grow. Hover over a row for its exact contribution.
The tower rule \(E[X] = E[E[X|Y]]\) and the law of total variance \(\text{Var}(X) = E[\text{Var}(X|Y)] + \text{Var}(E[X|Y])\) let you evaluate hierarchical (mixed) models one layer at a time.
Tower rule: average the conditional means over the distribution of the conditioning variable to recover the unconditional mean.
\(E[X] = E[\,E[X|Y]\,]\)
Law of total variance: total variance splits into within-group spread plus between-group shift of the conditional mean.
\(\text{Var}(X) = E[\text{Var}(X|Y)] + \text{Var}(E[X|Y])\)
within-group average spread + spread of the group means
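As a quick sanity check of both identities, here is a minimal Python sketch on a small discrete joint pmf. The table `joint` and its probabilities are made up for illustration and are not taken from the lesson's interactive; the helper `cond_moment` is likewise a hypothetical name.

```python
import math
from collections import defaultdict

# Hypothetical joint pmf P(X=x, Y=y); the numbers are illustrative only.
joint = {
    (0, 0): 0.1, (1, 0): 0.2, (2, 0): 0.1,
    (0, 1): 0.1, (1, 1): 0.2, (2, 1): 0.3,
}

# Marginal of Y: P(Y=y) = sum over x of P(X=x, Y=y).
p_y = defaultdict(float)
for (x, y), p in joint.items():
    p_y[y] += p

def cond_moment(y, power):
    """E[X**power | Y=y], computed from the joint pmf."""
    return sum(x**power * p for (x, yy), p in joint.items() if yy == y) / p_y[y]

cond_mean = {y: cond_moment(y, 1) for y in p_y}
cond_var = {y: cond_moment(y, 2) - cond_mean[y] ** 2 for y in p_y}

# Direct (unconditional) mean and variance of X.
mean_direct = sum(x * p for (x, _), p in joint.items())
var_direct = sum(x**2 * p for (x, _), p in joint.items()) - mean_direct**2

# Tower rule: average the conditional means over the distribution of Y.
mean_tower = sum(p_y[y] * cond_mean[y] for y in p_y)

# Law of total variance: within-group term plus between-group term.
within = sum(p_y[y] * cond_var[y] for y in p_y)
between = sum(p_y[y] * (cond_mean[y] - mean_tower) ** 2 for y in p_y)

print("E[X] direct vs tower:", mean_direct, mean_tower)
print("Var(X) direct vs within + between:", var_direct, within + between)
```

Each printed pair should agree up to floating-point rounding, whatever valid joint pmf you substitute.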
Picture a compound (mixed) model: \(N \sim \text{Poisson}(\lambda)\) claims arrive; given \(N = n\), total loss \(S|N=n \sim \text{Normal}(n\mu, n\sigma^2)\). Condition on the mixing variable \(N\) first (the "outer" layer), solve the easy inner problem, then average:
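\(E[S] = E[E[S|N]] = E[N\mu] = \mu E[N] = \lambda\mu\)
\(\text{Var}(S) = E[\text{Var}(S|N)] + \text{Var}(E[S|N]) = E[N\sigma^2] + \text{Var}(N\mu) = \sigma^2 E[N] + \mu^2\,\text{Var}(N) = \lambda\sigma^2 + \lambda\mu^2 = \lambda(\sigma^2 + \mu^2)\)
Both lines use \(E[N] = \text{Var}(N) = \lambda\) for a Poisson count.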
You'll see this "condition on the mixing variable, then decompose" pattern constantly — it's the standard actuarial approach to aggregate-loss and compound-Poisson models.
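The compound-model result can be checked numerically by summing over the claim count directly. The sketch below is one way to do that, not a production routine: the function name `compound_poisson_terms`, the truncation point `n_max`, and the parameter values are all assumptions chosen for illustration.

```python
import math

def compound_poisson_terms(lam, mu, sigma2, n_max=200):
    """Return (E[S], E[Var(S|N)], Var(E[S|N])) by summing over N = 0..n_max.

    Assumes N ~ Poisson(lam), E[S|N=n] = n*mu and Var(S|N=n) = n*sigma2.
    The sum is truncated at n_max, so it is only accurate when the
    Poisson tail beyond n_max is negligible.
    """
    p_n = math.exp(-lam)      # P(N = 0)
    mean_s = 0.0              # accumulates E[E[S|N]]
    within = 0.0              # accumulates E[Var(S|N)]
    cond_mean_sq = 0.0        # accumulates E[(E[S|N])^2]
    for n in range(n_max + 1):
        mean_s += p_n * n * mu
        within += p_n * n * sigma2
        cond_mean_sq += p_n * (n * mu) ** 2
        p_n *= lam / (n + 1)  # recurrence: P(N=n+1) = P(N=n) * lam / (n+1)
    between = cond_mean_sq - mean_s**2
    return mean_s, within, between

# Hypothetical parameters: 3 expected claims, mean severity 100, severity variance 400.
lam, mu, sigma2 = 3.0, 100.0, 400.0
mean_s, within, between = compound_poisson_terms(lam, mu, sigma2)

print("E[S]:", mean_s, "vs lambda*mu:", lam * mu)
print("Var(S):", within + between, "vs lambda*(sigma^2 + mu^2):", lam * (sigma2 + mu**2))
```

Only the conditional mean and variance of \(S\) given \(N\) enter the calculation, so the same check applies to any inner distribution with those two moments, not just the Normal one.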
Your turn — finish the group-mixture example (matches the interactive's defaults)
Use the interactive's starting values above: \(P(Y{=}1)=0.3, P(Y{=}2)=0.4, P(Y{=}3)=0.3\); group means \(30, 50, 70\); shared SD \(=8\). Fill in the blanks.
Step 1 — tower rule: \(E[X] = 0.3(30) + 0.4(50) + 0.3(70) = \) ?
Step 2 — within-group term: \(E[\text{Var}(X|Y)] = \text{SD}^2 = \) ?
Step 3 — between-group term: \(\text{Var}(E[X|Y]) = 0.3(30-50)^2 + 0.4(50-50)^2 + 0.3(70-50)^2 = \) ?
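To check your three blanks, the steps above can be sketched in a few lines of Python. The variable names are illustrative; the weights, means, and shared SD are the starting values stated in the exercise.

```python
import math

weights = [0.3, 0.4, 0.3]     # P(Y=1), P(Y=2), P(Y=3)
means = [30.0, 50.0, 70.0]    # E[X | Y=y] for each group
shared_sd = 8.0               # same standard deviation in every group

# Step 1: tower rule, a weighted average of the group means.
mean_x = sum(w * m for w, m in zip(weights, means))

# Step 2: within-group term. Every group has variance shared_sd**2,
# so the weighted average of the group variances is that same number.
within = sum(w * shared_sd**2 for w in weights)

# Step 3: between-group term, the weighted squared distance of each
# group mean from the overall mean.
between = sum(w * (m - mean_x) ** 2 for w, m in zip(weights, means))

total_var = within + between  # law of total variance

print("E[X] =", mean_x)
print("E[Var(X|Y)] =", within)
print("Var(E[X|Y]) =", between)
print("Var(X) =", total_var)
```

Setting all three entries of `means` to the same value drives the between-group term to zero, which is the prediction the interactive asks you to test.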
More info — the same split shows up as ANOVA
If you've seen one-way ANOVA, this is the same idea under a different name: the between-group and within-group sums of squares you compute there are the sample analogues of \(\text{Var}(E[X|Y])\) and \(E[\text{Var}(X|Y)]\), weighted by group sample counts instead of probabilities. Dividing each sum of squares by the total sample size gives the two terms of the decomposition with the empirical group proportions as weights. So whenever you condition on a grouping variable and split variance into "spread inside each group" versus "how far apart the group means are," you're applying the law of total variance, whether the groups come from a probability model (this lesson) or from measured data (ANOVA). For more practice on the actuarial side, see the AnalystPrep video in Dive deeper below, or scroll back up and re-run the interactive with this framing in mind.
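The ANOVA connection can be seen on a toy data set. The sketch below uses made-up measurements (the `groups` dictionary is hypothetical) and checks that the total sum of squares splits into between-group and within-group pieces; dividing by the sample size turns that split into the law of total variance with empirical group proportions as weights.

```python
import math

# Hypothetical measurements for three groups (illustrative data only).
groups = {
    "a": [28.0, 32.0, 30.0],
    "b": [49.0, 51.0, 47.0, 53.0],
    "c": [70.0, 74.0, 66.0],
}

all_values = [x for xs in groups.values() for x in xs]
n_total = len(all_values)
grand_mean = sum(all_values) / n_total
group_means = {g: sum(xs) / len(xs) for g, xs in groups.items()}

# Within-group sum of squares: deviations from each observation's own group mean.
ss_within = sum((x - group_means[g]) ** 2 for g, xs in groups.items() for x in xs)

# Between-group sum of squares: group-mean deviations from the grand mean,
# each counted once per observation in the group.
ss_between = sum(len(xs) * (group_means[g] - grand_mean) ** 2 for g, xs in groups.items())

# Total sum of squares: deviations from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

print("SS_total vs SS_between + SS_within:", ss_total, ss_between + ss_within)
print("Divided by n:", ss_total / n_total, ss_between / n_total, ss_within / n_total)
```

Here the division uses the total count `n_total` rather than degrees of freedom, because the goal is to mirror the probability-weighted formula, not to form ANOVA mean squares.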
Check your understanding
N ~ Poisson(λ) claims arrive, and given N = n, the aggregate loss S has mean nμ and variance nσ². Using the tower rule, what is E[S]?
In the variance decomposition Var(X) = E[Var(X|Y)] + Var(E[X|Y]), what does the term Var(E[X|Y]) represent?
If X and Y are independent, what does the law of total variance Var(X) = E[Var(X|Y)] + Var(E[X|Y]) simplify to?
For discrete X and Y with joint pmf P(X=x, Y=y), which computation gives the same value as the tower rule E[X] = E[E[X|Y]]?
Recap
- Tower rule: \(E[X] = E[E[X|Y]]\) — average the conditional means over \(Y\) to get the unconditional mean.
- Law of total variance: \(\text{Var}(X) = E[\text{Var}(X|Y)] + \text{Var}(E[X|Y])\) — within-group spread plus between-group shift of the conditional mean.
- Compound/mixed models: condition on the mixing variable first (e.g. claim count \(N\)), solve the inner problem, then average — the standard actuarial pattern for aggregate losses.
- Independence check: if \(X\) and \(Y\) are independent, \(E[X|Y]\) is constant, so \(\text{Var}(E[X|Y]) = 0\) and \(\text{Var}(X) = E[\text{Var}(X|Y)]\) — none of the variance comes from \(Y\).
Dive deeper
- MIT 6.012: Conditional Expectation & the Total Expectation Theorem. Learn the tower rule \(E[X] = E[E[X|Y]]\).
- Harvard Stat 110, Lecture 26: Conditional Expectation Continued. Apply Adam's law to layered (mixed) models.
- AnalystPrep: Law of Total Variance (Exam P). Drill the law of total variance on Exam P problems.