Expectation, Variance & Linearity

By the end of this page you'll compute E[X] and Var(X) for any distribution, and reuse their linearity rules to find E[aX+b] and Var(aX+b) without redoing the sum.

Predict: if you drag more weight onto x = 4, which way does the balance point E[X] move — and does the ±1 SD band get wider or narrower? Then drag a slider below to check. Weights auto-normalize to sum to 1; hover a bar for its exact value and probability.

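[Interactive: a bar chart of the original distribution with readouts for E[X], E[X²], Var(X), and SD(X), plus a linear-transform panel for aX + b with readouts for E[aX+b], Var(aX+b), and SD(aX+b).]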

You'll use two numbers to summarize almost any distribution: the expectation \(E[X]\), and the variance \(\mathrm{Var}(X) = E[X^2] - (E[X])^2\). Both obey linearity rules under \(aX+b\), so you never need to redo the sum from scratch.

Intuitive

Picture \(E[X]\) as the long-run average you'd see if you repeated the experiment forever: it's the balance point of the distribution, exactly like the fulcrum in the interactive above. The standard deviation, \(\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)}\), tells you how far outcomes typically stray from that balance point. A small value means results cluster near the mean; a large one means they're spread out and unpredictable. That's the green band around the fulcrum.
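To make the "long-run average" picture concrete, here is a minimal simulation sketch. It assumes a fair six-sided die as the experiment; the function name `simulate_die_rolls` and the fixed seed are illustrative choices, not part of the lesson.

```python
import random

def simulate_die_rolls(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times; return (sample mean, sample SD)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    mean = sum(rolls) / n_rolls
    # Divide by n (not n - 1) to mirror Var(X) = E[(X - E[X])^2]
    var = sum((r - mean) ** 2 for r in rolls) / n_rolls
    return mean, var ** 0.5

# As n grows, the sample mean should settle near E[X] = 3.5
# and the sample SD near sqrt(35/12), roughly 1.708.
for n in (10, 1_000, 100_000):
    mean, sd = simulate_die_rolls(n)
    print(f"n={n:>7}: sample mean = {mean:.3f}, sample SD = {sd:.3f}")
```

With only a handful of rolls the sample mean can land well away from 3.5; it is the large-n runs that illustrate the balance-point idea.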

Formal

\(E[X] = \sum_x x\,p(x)\) (discrete) or \(\int x f(x)\,dx\) (continuous). For any function \(g\), \(E[g(X)] = \sum_x g(x)p(x)\) or \(\int g(x)f(x)\,dx\). This is the law of the unconscious statistician (LOTUS), and it's how you'll get \(E[X^2]\) without finding \(X^2\)'s own distribution first. Variance is defined as \(\mathrm{Var}(X) = E[(X-E[X])^2]\); expanding the square and applying linearity of expectation gives the shortcut \(\mathrm{Var}(X) = E[X^2] - (E[X])^2\). Linearity: \(E[aX+b] = aE[X]+b\) for any constants \(a, b\). Variance is not linear: \(\mathrm{Var}(aX+b) = a^2\mathrm{Var}(X)\). Shifting by \(b\) doesn't change spread, and scaling by \(a\) stretches spread by \(|a|\), so \(\mathrm{SD}(aX+b) = |a|\,\mathrm{SD}(X)\) and variance picks up \(a^2\).
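The discrete formulas translate almost line for line into code. The sketch below is one possible way to write them; the helper names `expectation` and `variance`, the example pmf, and the constants `a = 3`, `b = 5` are all made up for illustration. It uses exact fractions so the linearity rules can be checked with equality rather than rounding.

```python
from fractions import Fraction

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * p(x) over the support (LOTUS)."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    mean = expectation(pmf)
    return expectation(pmf, lambda x: x * x) - mean ** 2

# Hypothetical pmf: values mapped to probabilities that sum to 1
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 4: Fraction(1, 4)}
a, b = 3, 5

# Build the distribution of Y = aX + b the slow way, value by value
pmf_y = {a * x + b: p for x, p in pmf.items()}

print("E[X]      =", expectation(pmf))
print("Var(X)    =", variance(pmf))
print("E[aX+b]   =", expectation(pmf_y), "vs a*E[X]+b   =", a * expectation(pmf) + b)
print("Var(aX+b) =", variance(pmf_y), "vs a^2*Var(X) =", a ** 2 * variance(pmf))
```

Rebuilding `pmf_y` is the "redo the sum" route; the two right-hand columns show the linearity rules giving the same numbers without it.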

Applied

Say a claim severity \(X\) has \(E[X] = \$2{,}000\), and you apply a 10% expense loading plus a flat \$50 fee. Your loaded premium is \(Y = 1.1X + 50\), so \(E[Y] = 1.1(2000) + 50 = \$2{,}250\). If \(\mathrm{Var}(X) = 1{,}000{,}000\) (in squared dollars, so \(\mathrm{SD}(X) = \$1{,}000\)), then \(\mathrm{Var}(Y) = 1.1^2 \times 1{,}000{,}000 = 1{,}210{,}000\) squared dollars and \(\mathrm{SD}(Y) = \$1{,}100\). Notice the \$50 fee never touches variance; only the 1.1 multiplier does.
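The same arithmetic can be wrapped in a small helper. This is only a sketch: the function name `loaded_premium_stats` and its default arguments are hypothetical, chosen to match the 10% loading and \$50 fee in the example.

```python
def loaded_premium_stats(mean_x, var_x, a=1.1, b=50.0):
    """Mean, variance, and SD of Y = a*X + b, given the mean and variance of X."""
    mean_y = a * mean_x + b          # E[aX+b] = a*E[X] + b
    var_y = a ** 2 * var_x           # Var(aX+b) = a^2 * Var(X); b drops out
    sd_y = abs(a) * var_x ** 0.5     # SD(aX+b) = |a| * SD(X)
    return mean_y, var_y, sd_y

mean_y, var_y, sd_y = loaded_premium_stats(mean_x=2000.0, var_x=1_000_000.0)
print(f"E[Y]   = {mean_y:,.2f} dollars")
print(f"Var(Y) = {var_y:,.2f} squared dollars")
print(f"SD(Y)  = {sd_y:,.2f} dollars")
```

Changing `b` moves only the first printed line, which is the point of the example: the flat fee shifts the mean and leaves the spread alone.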

Worked example

\(X\) is uniform on \(\{1,\dots,6\}\) (a fair die). \(E[X] = 21/6 = 3.5\). \(E[X^2] = 91/6\). \(\mathrm{Var}(X) = 91/6 - 3.5^2 = 35/12 \approx 2.9167\).

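If you want to double-check the die calculation, a short exact-arithmetic sketch like the following works. The variable names are illustrative; it computes the variance both from the shortcut \(E[X^2] - (E[X])^2\) and from the definition \(E[(X - E[X])^2]\) to confirm they agree.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # each face of a fair die is equally likely

mean = sum(x * p for x in faces)                 # E[X]
second_moment = sum(x ** 2 * p for x in faces)   # E[X^2] via LOTUS

var_shortcut = second_moment - mean ** 2                 # E[X^2] - (E[X])^2
var_definition = sum((x - mean) ** 2 * p for x in faces) # E[(X - E[X])^2]

print("E[X]   =", mean)
print("E[X^2] =", second_moment)
print("Var(X) =", var_shortcut, "=", float(var_shortcut))
print("Definition and shortcut agree:", var_shortcut == var_definition)
```
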
Now you try — faded example

\(X\) is uniform on \(\{1,2,3,4\}\) (a fair 4-sided spinner). Find \(E[X]\) and \(\mathrm{Var}(X)\).

Step 1 — \(E[X] = \dfrac{1+2+3+4}{4} = \dfrac{10}{4} = 2.5\).

Step 2 — \(E[X^2] = \dfrac{1^2+2^2+3^2+4^2}{4} = \dfrac{30}{4} = 7.5\).

Step 3 — finish it: \(\mathrm{Var}(X) = E[X^2] - (E[X])^2 = 7.5 - 2.5^2 = \)

Answer: \(7.5 - 6.25 = 1.25\), so \(\mathrm{SD}(X) = \sqrt{1.25} \approx 1.118\).

More info — why Var(aX+b) picks up a² but not b

Here's another way to see it: shifting every outcome by \(b\) (adding a constant) slides the whole distribution sideways without changing how spread out it is relative to its own new mean. The balance point moves by \(b\), but the distances from it don't change, so variance is unaffected. Scaling by \(a\), though, stretches every distance from the mean by a factor of \(|a|\); since variance averages squared distances, that stretch gets squared too, giving \(a^2\). Try it yourself: set \(a=2, b=0\) in the interactive above and watch \(\mathrm{Var}(aX+b)\) jump to 4× the original, while setting \(a=1\) and sliding \(b\) anywhere leaves it untouched. The StatQuest and Harvard Stat 110 links under Dive deeper below both re-derive this more slowly.
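The same argument in symbols, as a short derivation sketch. Write \(\mu = E[X]\) (a shorthand introduced here), so that \(E[aX+b] = a\mu + b\) by linearity. Then

\[
\begin{aligned}
\mathrm{Var}(aX+b) &= E\big[(aX + b - (a\mu + b))^2\big] \\
&= E\big[(a(X - \mu))^2\big] \\
&= a^2\,E\big[(X - \mu)^2\big] \\
&= a^2\,\mathrm{Var}(X).
\end{aligned}
\]

The \(b\) cancels in the first line, which is why shifts never show up in the variance, and the \(a\) sits inside the square, which is why it comes out as \(a^2\).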

Check your understanding

Question 1 of 4

X is uniform on {1,2,3,4,5,6} (a fair die). What are E[X] and Var(X)?

Question 2 of 4

If Var(X) = 4, what is Var(3X + 5)?

Question 3 of 4

Which statement about Var(X) is always true?

Question 4 of 4

X takes values 1, 2, 3 with P(X=1)=p, P(X=2)=2p, P(X=3)=3p. Using the fact that probabilities over the sample space must sum to 1, find E[X].

Recap

  • Expectation \(E[X] = \sum_x x\,p(x)\) (or \(\int x f(x)\,dx\)) — the probability-weighted balance point.
  • Variance \(\mathrm{Var}(X) = E[X^2] - (E[X])^2\) — average squared distance from that balance point; \(\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)}\).
  • LOTUS: \(E[g(X)] = \sum_x g(x)p(x)\) lets you compute \(E[X^2]\) directly, no need for \(X^2\)'s own distribution.
  • Linearity: \(E[aX+b] = aE[X]+b\), but \(\mathrm{Var}(aX+b) = a^2\mathrm{Var}(X)\) — shifts (\(b\)) don't change spread, scaling (\(a\)) does, and it's squared.

Dive deeper
