Total Probability & Bayes' Theorem

You'll build up P(A) from a partition with the law of total probability, then flip the conditioning with Bayes' theorem — turning "how likely is the evidence given the cause" into "how likely is the cause given the evidence."

By the end you'll be able to compute P(A) from a partition, apply Bayes' theorem to invert a conditional probability, and explain why even an accurate test produces many false positives when the condition it's testing for is rare.

Predict: prevalence is currently 1%. If you drag it down toward 0.1%, will P(disease | +) — the posterior — go up or down? Then drag the slider and check.

Each dot is one of 1,000 people. Drag the sliders to change disease prevalence, test sensitivity, and test specificity — watch how the population splits into true/false positives and negatives, and how the posterior responds. Hover a dot or a legend row for its exact count.
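Here is a minimal sketch of the arithmetic behind the four groups. It assumes the interactive scales each probability by the population size to get expected counts; we have not seen the widget's actual code, and `split_population` is a hypothetical helper name.

```python
def split_population(n, prevalence, sensitivity, specificity):
    """Expected counts in each of the four test-outcome groups for n people."""
    sick = n * prevalence
    healthy = n - sick
    true_pos = sick * sensitivity            # sick and test positive
    false_neg = sick - true_pos              # sick but test negative
    false_pos = healthy * (1 - specificity)  # healthy but test positive
    true_neg = healthy - false_pos           # healthy and test negative
    return true_pos, false_neg, false_pos, true_neg


tp, fn, fp, tn = split_population(1000, prevalence=0.01, sensitivity=0.99, specificity=0.95)
posterior = tp / (tp + fp)  # share of positive tests that come from sick people
print(f"TP={tp:.1f} FN={fn:.1f} FP={fp:.1f} TN={tn:.1f}  P(disease | +)={posterior:.3f}")
```

The posterior only needs the two "positive" groups: it is the true positives divided by all positives.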


The law of total probability decomposes \(P(A)\) over a partition \(\{B_i\}\), and Bayes' theorem inverts the conditioning: \(P(B_i \mid A) = \dfrac{P(A \mid B_i)P(B_i)}{\sum_j P(A \mid B_j)P(B_j)}\).

Intuitive

Start with a prior belief about which "branch" of the world you're in (\(P(B_i)\)). Observe some evidence \(A\). Bayes' theorem tells you how to flip the likelihood \(P(A \mid B_i)\) — how likely the evidence is under each branch — into \(P(B_i \mid A)\), your updated belief about which branch you're actually in.

Formal

Law of total probability: \(P(A) = \sum_i P(A \mid B_i)P(B_i)\) for a partition \(\{B_i\}\) of the sample space.
Bayes' theorem: \(P(B_i \mid A) = \dfrac{P(A \mid B_i)P(B_i)}{\sum_j P(A \mid B_j)P(B_j)}\) — the denominator is just the law of total probability applied to \(P(A)\).
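The two formal statements translate almost line for line into code. This is a minimal sketch, assuming the priors are given as a list that sums to 1 (a partition) and the likelihoods \(P(A \mid B_i)\) as a matching list; `bayes_over_partition` is a hypothetical name. It works for any number of branches, not just two.

```python
def bayes_over_partition(priors, likelihoods):
    """Return P(A) and every posterior P(B_i | A).

    priors[i]      = P(B_i), assumed to form a partition (non-negative, summing to 1)
    likelihoods[i] = P(A | B_i)
    """
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1 to form a partition")
    joint = [lik * p for lik, p in zip(likelihoods, priors)]  # P(A | B_i) P(B_i)
    p_a = sum(joint)  # law of total probability
    if p_a == 0:
        raise ValueError("P(A) is zero, so the posterior is undefined")
    return p_a, [j / p_a for j in joint]  # Bayes' theorem, branch by branch


# Two-branch partition from the medical test: B_1 = disease, B_2 = no disease, A = positive test.
p_pos, (p_disease, p_healthy) = bayes_over_partition(priors=[0.01, 0.99], likelihoods=[0.99, 0.05])
print(p_pos, p_disease, p_healthy)
```

Because every branch is divided by the same total \(P(A)\), the posteriors always sum to 1; that is the "normalizing" role of the denominator.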

Applied

An insurer's prior belief that a claim is fraudulent might be \(P(\text{fraud}) = 1\%\). A screening algorithm flags 90% of fraudulent claims but also flags 5% of legitimate ones. Bayes' theorem converts "\(P(\text{flag}\mid\text{fraud})\)" into the number that actually matters for operations: \(P(\text{fraud}\mid\text{flag})\) — the probability a flagged claim is truly fraudulent, which (as in the medical test example) is often much lower than intuition suggests.

Worked example
Test 99% sensitive, 95% specific, prevalence 1%. Per 1,000 people that's 10 people with the disease and 990 without — the same split the interactive shows at its default settings.
True positives: \(10 \times 0.99 = 9.9\). False positives: \(990 \times 0.05 = 49.5\).
Law of total probability: \(P(+) = 0.99(0.01) + 0.05(0.99) = 0.0594\).
Bayes' theorem: \(P(\text{disease}\mid +) = 0.0099 / 0.0594 \approx 0.167\).
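To check the worked example without floating-point noise, you can redo it with exact fractions. This sketch uses Python's standard `fractions.Fraction`; it shows that \(P(+) = 594/10000\) and that the posterior is exactly \(1/6\), so 0.167 is just its rounding.

```python
from fractions import Fraction

prevalence = Fraction(1, 100)       # P(disease)
sensitivity = Fraction(99, 100)     # P(+ | disease)
false_pos_rate = Fraction(5, 100)   # P(+ | no disease) = 1 - specificity

true_pos = sensitivity * prevalence            # 0.0099
false_pos = false_pos_rate * (1 - prevalence)  # 0.0495
p_pos = true_pos + false_pos                   # law of total probability
posterior = true_pos / p_pos                   # Bayes' theorem

print(p_pos, float(p_pos))
print(posterior, float(posterior))
```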
Your turn — finish the fraud-detection version

An insurer's prior is \(P(\text{fraud}) = 1\%\). A screening algorithm has sensitivity \(P(\text{flag}\mid\text{fraud}) = 90\%\) and a false-positive rate \(P(\text{flag}\mid\text{not fraud}) = 5\%\). A claim gets flagged. Fill in the blanks:

Step 1 — law of total probability: \(P(\text{flag}) = 0.90(0.01) + 0.05(0.99) = \) ?

Reveal step 1
\(P(\text{flag}) = 0.009 + 0.0495 = 0.0585\).

Step 2 — Bayes' theorem: \(P(\text{fraud}\mid\text{flag}) = 0.009 / 0.0585 = \) ?

Reveal step 2 (final answer)
\(P(\text{fraud}\mid\text{flag}) \approx 0.154\) — about 15%. Even with a 90% sensitive algorithm, most flagged claims are still legitimate, because fraud is rare. Same base-rate logic as the medical test above.
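Once you've worked the two steps by hand, a few lines of Python confirm them. This is a plain sketch using the exercise's numbers; the variable names are our own.

```python
prior_fraud = 0.01          # P(fraud)
p_flag_given_fraud = 0.90   # sensitivity of the screening algorithm
p_flag_given_legit = 0.05   # false-positive rate on legitimate claims

# Step 1: law of total probability over {fraud, not fraud}
p_flag = p_flag_given_fraud * prior_fraud + p_flag_given_legit * (1 - prior_fraud)
# Step 2: Bayes' theorem
p_fraud_given_flag = p_flag_given_fraud * prior_fraud / p_flag

print(f"P(flag) = {p_flag:.4f}")
print(f"P(fraud | flag) = {p_fraud_given_flag:.3f}")
print(f"P(legitimate | flag) = {1 - p_fraud_given_flag:.3f}")
```

The last line makes the base-rate point explicit: the complement, the share of flagged claims that are legitimate, is well above one half.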
More info — the natural-frequency trick

Percentages can hide the base-rate effect; whole numbers rarely do. Instead of "99% sensitive, 95% specific, 1% prevalence," picture 1,000 people directly, exactly as the interactive above draws them: 10 people have the disease, 990 don't. Of the 10, about 10 (9.9 on average) test positive (true positives). Of the 990, about 50 (49.5 on average) test positive anyway (false positives). So among the roughly 60 positive tests, only about 10 are real: \(10/60 \approx 0.167\), the same answer Bayes' theorem gives. Gigerenzer & Hoffrage (1995) found this "natural frequency" framing makes the base-rate effect click for people who find the fraction form opaque. If the formal version above didn't land, scroll back up and re-run the interactive with this framing in mind, or see the StatQuest walkthrough in Dive deeper below.
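You can also get natural frequencies by brute force: simulate individual people, then count. This Monte Carlo sketch is our own illustration (it is not the method used by Gigerenzer & Hoffrage); `simulate_posterior` is a hypothetical name, and the seed is fixed only so runs are reproducible. With a large population, the counted ratio lands close to \(1/6\).

```python
import random


def simulate_posterior(n_people, prevalence, sensitivity, specificity, seed=0):
    """Estimate P(disease | +) by simulating individual people and counting."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    true_pos = false_pos = 0
    for _ in range(n_people):
        has_disease = rng.random() < prevalence
        if has_disease:
            tests_positive = rng.random() < sensitivity
        else:
            tests_positive = rng.random() < 1 - specificity
        if tests_positive and has_disease:
            true_pos += 1
        elif tests_positive:
            false_pos += 1
    return true_pos, false_pos, true_pos / (true_pos + false_pos)


tp, fp, estimate = simulate_posterior(200_000, prevalence=0.01, sensitivity=0.99, specificity=0.95)
print(f"{tp} true positives, {fp} false positives, estimated P(disease | +) = {estimate:.3f}")
```

The simulated false positives outnumber the true positives roughly five to one, which is the whole base-rate story in two integers.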

Check your understanding

Question 1 of 4

A diagnostic test is 99% sensitive (P(+|disease) = 0.99) and 95% specific (P(−|no disease) = 0.95). Disease prevalence is 1%. Given a positive test result, what is P(disease | +)?

Question 2 of 4

An insurer's prior is P(fraud) = 2%. A screening flag has P(flag|fraud) = 80% and P(flag|not fraud) = 4%. Given a flagged claim, what is P(fraud | flag)?

Question 3 of 4

For the law of total probability P(A) = Σᵢ P(A|Bᵢ)P(Bᵢ) to be valid, what must the events {Bᵢ} satisfy?

Question 4 of 4

Suppose event A is independent of B — that is, P(A|B) = P(A). If you apply Bayes' theorem to compute P(B|A), what do you get?

Recap

  • Law of total probability: \(P(A) = \sum_i P(A \mid B_i)P(B_i)\); add up the contribution \(P(A \mid B_i)P(B_i)\) from every piece of a partition \(\{B_i\}\).
  • Bayes' theorem: \(P(B_i \mid A) = \dfrac{P(A \mid B_i)P(B_i)}{\sum_j P(A \mid B_j)P(B_j)}\) — the numerator is one branch's contribution, the denominator (law of total probability) normalizes it against all branches.
  • Base-rate effect: when the prior \(P(B_i)\) is small, even a highly accurate test yields many false positives relative to true positives, so always sanity-check the posterior against the prior, not just the test's advertised accuracy.
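To see the base-rate effect as numbers, you can sweep the prior while holding the test fixed. This sketch assumes the worked example's 99% sensitivity and 95% specificity as defaults, and the prevalence values are chosen only for illustration. The posterior falls steadily as prevalence falls.

```python
def posterior(prevalence, sensitivity=0.99, specificity=0.95):
    """P(disease | +) via Bayes' theorem for the two-branch partition {disease, no disease}."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


for prev in [0.001, 0.01, 0.1, 0.5]:
    print(f"prevalence {prev:>6.1%} -> P(disease | +) = {posterior(prev):.3f}")
```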

Dive deeper

Sources

  • Law of Total Probability and Bayes' Theorem