10 Conditional Probability, Independence, and Bayes' Theorem

Lecture from: 20.03.2025 | Video: Homelab | Rui Zhangs Notes

Introduction to Probability

This lecture introduces fundamental concepts in probability theory, which are essential tools in the analysis and design of algorithms, especially randomized algorithms. We will cover probability spaces, conditional probability, Bayes’ theorem, and independence.

Probability Space

Discrete Probability Space (Recall)

A discrete probability space is defined by:

- A sample space $Ω = \{ω_{1}, ω_{2}, \dots\}$, which is the set of all possible elementary outcomes of an experiment.
- A probability measure $P r$, which assigns a probability $P r [ω_{i}]$ to each elementary event $ω_{i}$ such that:
  - $0 \leq P r [ω_{i}] \leq 1$ for all $ω_{i} \in Ω$.
  - $\sum_{ω \in Ω} P r [ω] = 1$ (the sum of probabilities of all elementary outcomes is 1).

An event $E$ is any subset of the sample space $Ω$ (i.e., $E \subseteq Ω$ ). The probability of an event $E$ is the sum of the probabilities of the elementary outcomes in $E$ : $P r [E] := \sum_{ω \in E} P r [ω]$

The complementary event to $E$ , denoted $\overset{ˉ}{E}$ , is the set of all outcomes in $Ω$ that are not in $E$ , i.e., $\overset{ˉ}{E} := Ω ∖ E$ . It follows that $P r [\overset{ˉ}{E}] = 1 - P r [E]$ .

Laplace Space (Recall)

A Laplace space (or uniform probability space) is a special type of finite probability space where every elementary outcome $ω \in Ω$ is equally likely.

If $∣Ω∣$ is the total number of elementary outcomes, then for any $ω \in Ω$ : $P r [ω] = \frac{1}{∣Ω∣}$

In a Laplace space, the probability of an event $E$ is given by: $P r [E] = \frac{\text{number of outcomes in } E}{\text{total number of outcomes}} = \frac{∣E∣}{∣Ω∣}$

Example: Drawing a Card

Consider a standard deck of 52 playing cards. The set of cards $C$ can be represented as $C = \{♣, ♠, ♡, ♢\} \times \{2, 3, \dots, 9, 10, J, Q, K, A\}$. If we draw one card at random, the sample space is $Ω = C$, and $∣Ω∣ = 52$. This is a Laplace space, so the probability of drawing any specific card $ω$ is $P r [ω] = 1/52$. For example, the event “drawing an Ace” consists of 4 outcomes, so $P r [\text{Ace}] = 4/52 = 1/13$.
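The card example can be checked by enumerating the sample space directly. The following is a minimal sketch; the suit and rank labels and the helper name `laplace_probability` are illustrative choices, not part of the lecture.

```python
from fractions import Fraction
from itertools import product

# Illustrative labels for the four suits and thirteen ranks.
SUITS = ["clubs", "spades", "hearts", "diamonds"]
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]

# Sample space: every (suit, rank) pair, all equally likely.
deck = list(product(SUITS, RANKS))


def laplace_probability(event, sample_space):
    """Pr[E] = |E| / |Omega| in a uniform (Laplace) space."""
    return Fraction(len(event), len(sample_space))


aces = [card for card in deck if card[1] == "A"]
pr_ace = laplace_probability(aces, deck)
pr_not_ace = 1 - pr_ace  # complement rule

print(pr_ace, pr_not_ace)  # 1/13 12/13
```

Using `Fraction` keeps the result exact, which makes it easy to compare against the hand calculation.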

Example: Card Game Scenario

We shuffle a deck of 52 cards and deal 5 cards to Player 1 and 5 cards to Player 2.

The sample space $Ω$ consists of all possible pairs of hands $(X, Y)$ , where $X$ is Player 1’s 5-card hand and $Y$ is Player 2’s 5-card hand, such that $X, Y \subset C$ , $X \cap Y = \emptyset$ , and $∣ X ∣ = ∣ Y ∣ = 5$ .

The total number of ways to deal these cards is $\binom{52}{5} \binom{47}{5}$.

Let’s consider the event $A :=$ “Player 1 has four Aces”.

To calculate $P r [A]$, it suffices to count Player 1’s hands: every hand of Player 1 can be completed by the same number $\binom{47}{5}$ of hands for Player 2, so this factor cancels. The total number of 5-card hands Player 1 can receive is $\binom{52}{5}$.

Player 1 has four Aces exactly when the hand contains all 4 Aces ($\binom{4}{4} = 1$ way) plus 1 other card from the 48 non-Ace cards ($\binom{48}{1} = 48$ ways). So, there are $1 \cdot 48 = 48$ hands where Player 1 has four Aces.

Thus, the probability is: $P r [A] = \frac{\text{number of hands for Player 1 with 4 Aces}}{\text{total number of 5-card hands for Player 1}} = \frac{\binom{4}{4} \binom{48}{1}}{\binom{52}{5}} = \frac{48}{2{,}598{,}960} \approx 0.000018469 \approx 0.0018\%$
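The counting argument for $P r [A]$ can be reproduced with `math.comb`. This is a small sketch of the calculation; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# All 5-card hands Player 1 can receive from a 52-card deck.
total_hands = comb(52, 5)

# Hands with all four Aces: choose 4 of 4 Aces and 1 of the 48 other cards.
four_ace_hands = comb(4, 4) * comb(48, 1)

pr_four_aces = Fraction(four_ace_hands, total_hands)

print(total_hands)          # 2598960
print(four_ace_hands)       # 48
print(float(pr_four_aces))  # about 1.8469e-05
```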

Conditional Probability

Conditional probability addresses how the probability of an event changes when we know that another event has occurred.

If we are interested in the probability of event $A$ occurring, given that event $B$ has already occurred, we denote this as $P r [A ∣ B]$ (read “probability of A given B”).

Definition

Let $A$ and $B$ be two events in a sample space $Ω$ . If $P r [B] > 0$ , the conditional probability of $A$ given $B$ is defined as: $P r [A ∣ B] := \frac{P r [ A \cap B ]}{P r [ B ]}$ where $A \cap B$ is the event that both $A$ and $B$ occur.

If the underlying space is a Laplace space, this can be written as: $P r [A ∣ B] = \frac{∣ A \cap B ∣/∣Ω∣}{∣ B ∣/∣Ω∣} = \frac{∣ A \cap B ∣}{∣ B ∣}$ This means we restrict our attention to the outcomes where $B$ occurs (this becomes our new, smaller sample space), and then find the fraction of these outcomes where $A$ also occurs.
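The counting form $∣A \cap B∣ / ∣B∣$ can be illustrated on the single-card space from earlier. In this sketch the events are hypothetical choices made for illustration: $A$ is "the card is an Ace" and $B$ is "the card is a J, Q, K, or A".

```python
from fractions import Fraction
from itertools import product

SUITS = ["clubs", "spades", "hearts", "diamonds"]
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
deck = set(product(SUITS, RANKS))

# Illustrative events (not from the lecture).
A = {card for card in deck if card[1] == "A"}
B = {card for card in deck if card[1] in {"J", "Q", "K", "A"}}


def conditional_probability(a, b):
    """Pr[A | B] = |A ∩ B| / |B| in a Laplace space; requires B non-empty."""
    if not b:
        raise ValueError("conditioning event must have positive probability")
    return Fraction(len(a & b), len(b))


pr_a = Fraction(len(A), len(deck))
pr_a_given_b = conditional_probability(A, B)

print(pr_a, pr_a_given_b)  # 1/13 1/4
```

Restricting to the 16 outcomes in $B$ raises the probability of an Ace from $1/13$ to $1/4$.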

Example: Card Game (Continued)

Let $A :=$ “Player 1 has four Aces”. Let $B :=$ “Player 2 is dealt the hand $\{9♣, 5♠, 6♡, 6♢, K♠\}$” (none of these are Aces). What is $P r [A ∣ B]$ ?

Given Player 2’s hand (event $B$), there are $52 - 5 = 47$ cards remaining. Player 1 will receive 5 cards from these 47 cards. The total number of possible hands for Player 1, given Player 2’s hand, is $\binom{47}{5}$. This is the size of our new sample space (outcomes where $B$ occurred).

Now, for event $A \cap B$ (Player 1 has 4 Aces AND Player 2 has their specific hand): Since Player 2 has no Aces, all 4 Aces are still available among the 47 remaining cards. For Player 1 to have 4 Aces, Player 1 must receive all 4 Aces ($\binom{4}{4} = 1$ way) and 1 more card from the $47 - 4 = 43$ cards that are not Aces and not in Player 2’s hand ($\binom{43}{1} = 43$ ways). So, there are $1 \cdot 43 = 43$ hands for Player 1 corresponding to $A \cap B$.

$P r [A ∣ B] = \frac{\text{number of P1 hands with 4 Aces (given P2’s hand)}}{\text{total P1 hands (given P2’s hand)}} = \frac{43}{\binom{47}{5}} = \frac{43}{1{,}533{,}939} \approx 0.000028032 \approx 0.0028\%$

Notice $P r [A ∣ B] \approx 0.0028\%$ is higher than $P r [A] \approx 0.0018\%$. This is because Player 2 holding 5 non-Aces slightly increases the chance of Player 1 getting the Aces from the remaining cards.
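The comparison between $P r [A ∣ B]$ and $P r [A]$ can be verified numerically. The sketch below recomputes both values with `math.comb`; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Unconditional probability that Player 1 holds all four Aces.
pr_a = Fraction(comb(4, 4) * comb(48, 1), comb(52, 5))

# Conditioning on Player 2's five non-Ace cards leaves 47 cards,
# of which 4 are Aces and 43 are other cards.
remaining = 52 - 5
pr_a_given_b = Fraction(comb(4, 4) * comb(remaining - 4, 1), comb(remaining, 5))

print(float(pr_a))          # about 1.8469e-05
print(float(pr_a_given_b))  # about 2.8032e-05
print(pr_a_given_b > pr_a)  # True
```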

The Multiplication Rule (derived from conditional probability)

$P r [A \cap B] = P r [A ∣ B] P r [B]$
Also, by symmetry: $P r [A \cap B] = P r [B ∣ A] P r [A]$

Example: The Two-Children Problem

A family has two children. We assume the births are independent and that each child is male (M) or female (W) with probability 1/2. The sample space for the genders of the two children, written as (older, younger), is $Ω = \{MM, MW, WM, WW\}$, with each outcome having probability $1/4$.

Question 1:

“At least one child is female.” Given this, what’s the probability both are female?

Let $A :=$ “both children are female” $= \{WW\}$. Let $B :=$ “at least one child is female” $= \{MW, WM, WW\}$.

We want $P r [A ∣ B]$.

$A \cap B = \{WW\} \cap \{MW, WM, WW\} = \{WW\}$. $P r [A \cap B] = P r [\{WW\}] = 1/4$. $P r [B] = P r [\{MW, WM, WW\}] = 3/4$. $P r [A ∣ B] = \frac{P r [ A \cap B ]}{P r [ B ]} = \frac{1/4}{3/4} = 1/3$.

Question 2:

“The older child is female.” Given this, what’s the probability both are female?

Let $A :=$ “both children are female” $= \{WW\}$. Let $C :=$ “the older child is female” $= \{WM, WW\}$ (using the (older, younger) notation).

We want $P r [A ∣ C]$.

$A \cap C = \{WW\} \cap \{WM, WW\} = \{WW\}$. $P r [A \cap C] = P r [\{WW\}] = 1/4$. $P r [C] = P r [\{WM, WW\}] = 2/4 = 1/2$. $P r [A ∣ C] = \frac{P r [ A \cap C ]}{P r [ C ]} = \frac{1/4}{1/2} = 1/2$.

The subtle difference in information (“at least one is female” vs. “the older is female”) changes the resulting probability.
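Both answers can be confirmed by enumerating the four outcomes. This is a minimal sketch; the helper names `pr` and `pr_given` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Outcomes are (older, younger); all four are equally likely.
omega = set(product("MW", repeat=2))


def pr(event):
    """Probability of an event in the uniform space omega."""
    return Fraction(len(event), len(omega))


def pr_given(a, b):
    """Pr[A | B] = Pr[A ∩ B] / Pr[B]; requires Pr[B] > 0."""
    return pr(a & b) / pr(b)


both_female = {o for o in omega if o == ("W", "W")}
at_least_one_female = {o for o in omega if "W" in o}
older_female = {o for o in omega if o[0] == "W"}

print(pr_given(both_female, at_least_one_female))  # 1/3
print(pr_given(both_female, older_female))         # 1/2
```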

Chain Rule (Generalization of Multiplication Rule)

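For any $n$ events $A_{1}, A_{2}, \dots, A_{n}$ with $P r [A_{1} \cap \dots \cap A_{n - 1}] > 0$ (so that every conditional probability below is defined): $P r [A_{1} \cap A_{2} \cap \dots \cap A_{n}] = P r [A_{1}] \cdot P r [A_{2} ∣ A_{1}] \cdot P r [A_{3} ∣ A_{1} \cap A_{2}] \cdot \dots \cdot P r [A_{n} ∣ A_{1} \cap \dots \cap A_{n - 1}]$

This is proven by repeatedly applying the multiplication rule $P r [X \cap Y] = P r [X ∣ Y] P r [Y]$. When each conditional probability on the right is expanded by its definition, every denominator cancels with the preceding numerator, so the product telescopes.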
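As a small illustration of the chain rule, consider drawing three cards without replacement and asking for the probability that all three are Aces. This scenario is an assumption chosen for illustration, not an example from the lecture; the result is cross-checked against a direct count.

```python
from fractions import Fraction
from math import comb


def chain_rule(conditionals):
    """Multiply Pr[A_1], Pr[A_2 | A_1], ..., Pr[A_n | A_1 ∩ ... ∩ A_{n-1}]."""
    result = Fraction(1)
    for p in conditionals:
        result *= p
    return result


# A_i = "the i-th card drawn is an Ace". Given that the first i-1 cards
# were Aces, 4-(i-1) Aces remain among 52-(i-1) cards.
factors = [Fraction(4 - i, 52 - i) for i in range(3)]
pr_three_aces = chain_rule(factors)

# Direct count: choose 3 of the 4 Aces out of all 3-card subsets.
pr_direct = Fraction(comb(4, 3), comb(52, 3))

print(pr_three_aces, pr_direct)  # 1/5525 1/5525
```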

The Birthday Problem

In a room with $m$ people, what is the probability that at least two people share the same birthday? (Assume 365 days, all equally likely, ignore leap years).

It’s easier to calculate the complementary probability $P (\text{all birthdays are different})$. Let $m$ be the number of people (balls) and $n = 365$ be the number of possible birthdays (bins).

Let $A_{i}$ be the event that the $i$-th person has a birthday different from the first $i - 1$ people. We want to find $P (A_{1} \cap A_{2} \cap \dots \cap A_{m})$. Using the chain rule: $P (\text{all different}) = P (A_{1}) \cdot P (A_{2} ∣ A_{1}) \cdot P (A_{3} ∣ A_{1} \cap A_{2}) \cdot \dots \cdot P (A_{m} ∣ A_{1} \cap \dots \cap A_{m - 1})$

- $P (A_{1}) = n / n = 1$ (the first person can have any birthday).
- $P (A_{2} ∣ A_{1}) = (n - 1) / n$ (the second person must have one of the $n - 1$ remaining unused birthdays).
- $P (A_{3} ∣ A_{1} \cap A_{2}) = (n - 2) / n$ (the third person must have one of the $n - 2$ remaining unused birthdays).
- …
- $P (A_{i} ∣ A_{1} \cap \dots \cap A_{i - 1}) = (n - (i - 1)) / n$.

So, $P (\text{all different}) = 1 \cdot \frac{n - 1}{n} \cdot \frac{n - 2}{n} \cdot \dots \cdot \frac{n - ( m - 1 )}{n} = \prod_{i = 0}^{m - 1} \frac{n - i}{n}$.

The probability of at least two people sharing a birthday is $1 - P (all different)$ . For $m = 23$ people, this probability is slightly over 0.5. For $m = 25$ , it’s about 0.57.
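The product formula is straightforward to evaluate numerically. The sketch below assumes $n = 365$ equally likely birthdays as in the text; the function names are illustrative.

```python
def pr_all_different(m, n=365):
    """Probability that m people all have distinct birthdays among n days."""
    p = 1.0
    for i in range(m):
        p *= (n - i) / n
    return p


def pr_shared(m, n=365):
    """Probability that at least two of m people share a birthday."""
    return 1.0 - pr_all_different(m, n)


for m in (22, 23, 25):
    print(m, round(pr_shared(m), 4))
# m = 22 is just below 0.5, m = 23 is about 0.507, m = 25 is about 0.569
```

The loop multiplies one factor per person, mirroring the chain-rule derivation term by term.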

Aside: Approximating the Birthday Problem Product

The product $\prod_{i = 0}^{m - 1} (1 - i / n)$ can be approximated using $1 - x \approx e^{- x}$ for small $x$: $P (\text{all different}) \approx \prod_{i = 0}^{m - 1} e^{- i / n} = e^{- \sum_{i = 0}^{m - 1} i / n} = e^{- m (m - 1) / (2 n)}$.

This approximation shows that the probability of all distinct birthdays drops quickly as $m$ increases, especially when $m (m - 1)$ becomes comparable to $2 n$. The “tipping point” where the probability of a shared birthday exceeds 0.5 is found by setting $e^{- m (m - 1) / (2 n)} = 1/2$, i.e. $m (m - 1) = 2 n \ln 2$, which gives $m \approx \sqrt{2 n \ln 2}$. For $n = 365$, this is $m \approx 22.5$.
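To get a feel for the quality of the exponential approximation, it can be compared against the exact product. This is a sketch under the same assumptions ($n = 365$, uniform birthdays); the function names are illustrative.

```python
import math


def exact_all_different(m, n=365):
    """Exact product of (n - i) / n for i = 0, ..., m - 1."""
    p = 1.0
    for i in range(m):
        p *= (n - i) / n
    return p


def approx_all_different(m, n=365):
    """Approximation exp(-m (m - 1) / (2 n)) obtained from 1 - x ≈ e^{-x}."""
    return math.exp(-m * (m - 1) / (2 * n))


n = 365
# Solving m (m - 1) = 2 n ln 2 with m (m - 1) ≈ m^2.
threshold = math.sqrt(2 * n * math.log(2))
print(round(threshold, 2))  # 22.49

for m in (10, 23, 40):
    print(m, round(exact_all_different(m, n), 4), round(approx_all_different(m, n), 4))
```

Since $1 - x \leq e^{-x}$, the approximation slightly overestimates the probability that all birthdays differ.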

Law of Total Probability

Let $A_{1}, A_{2}, \dots, A_{n}$ be a set of events that form a partition of the sample space $Ω$ . This means:

- $A_{i} \cap A_{j} = \emptyset$ for $i \neq j$ (they are pairwise disjoint).
- $A_{1} \cup A_{2} \cup \dots \cup A_{n} = Ω$ (their union is the entire sample space).

For any event $B$ , its probability can be calculated as: $P r [B] = \sum_{i = 1}^{n} P r [B \cap A_{i}]$ Using the multiplication rule, $P r [B \cap A_{i}] = P r [B ∣ A_{i}] P r [A_{i}]$ . So, $P r [B] = \sum_{i = 1}^{n} P r [B ∣ A_{i}] P r [A_{i}]$

This law is useful when it’s easier to calculate the probability of $B$ conditioned on different cases $A_{i}$ .

Example: ETH vs. UZH Football Match

Let $B$ be the event “ETH wins”. Let $A_{1}$ be “Star player Messaldo plays”, and $A_{2}$ be “Messaldo does not play”. Assume $\{A_{1}, A_{2}\}$ forms a partition. Given:

- $P r [B ∣ A_{1}] = 0.80$ (ETH wins if Messaldo plays)
- $P r [B ∣ A_{2}] = 0.30$ (ETH wins if Messaldo doesn’t play)
- $P r [A_{1}] = 0.60$ (Messaldo plays)
- Then $P r [A_{2}] = 1 - P r [A_{1}] = 0.40$ (Messaldo doesn’t play)

$P r [B] = P r [B ∣ A_{1}] P r [A_{1}] + P r [B ∣ A_{2}] P r [A_{2}] = (0.80) (0.60) + (0.30) (0.40) = 0.48 + 0.12 = 0.60$.

So, the overall probability of ETH winning is 60%.
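The law of total probability is a weighted sum, which can be sketched as a short function. The name `total_probability` and the check that the priors sum to 1 are illustrative choices; exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def total_probability(conditionals, priors):
    """Pr[B] = sum_i Pr[B | A_i] * Pr[A_i] for a partition A_1, ..., A_n."""
    if sum(priors) != 1:
        raise ValueError("the priors of a partition must sum to 1")
    return sum(c * p for c, p in zip(conditionals, priors))


# Numbers from the football example.
pr_b_given_a = [Fraction("0.80"), Fraction("0.30")]
pr_a = [Fraction("0.60"), Fraction("0.40")]

pr_eth_wins = total_probability(pr_b_given_a, pr_a)
print(pr_eth_wins, float(pr_eth_wins))  # 3/5 0.6
```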

Bayes’ Theorem

Bayes’ Theorem allows us to “reverse” conditional probabilities. If we know $P r [B ∣ A_{i}]$ and the prior probabilities $P r [A_{i}]$ , we can find the posterior probability $P r [A_{i} ∣ B]$ .

Bayes’ Theorem

Let $A_{1}, A_{2}, \dots, A_{n}$ be a partition of the sample space $Ω$ , and let $B$ be an event with $P r [B] > 0$ . Then for any $A_{i}$ : $P r [A_{i} ∣ B] = \frac{P r [ A _{i} \cap B ]}{P r [ B ]} = \frac{P r [ B ∣ A _{i} ] P r [ A _{i} ]}{P r [ B ]}$ Using the Law of Total Probability for the denominator: $P r [A_{i} ∣ B] = \frac{P r [ B ∣ A _{i} ] P r [ A _{i} ]}{\sum _{j = 1}^{n} P r [ B ∣ A _{j} ] P r [ A _{j} ]}$
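The general formula can be sketched as a function that returns all posteriors at once. As an illustration it is applied to the numbers of the football example above, asking how likely it is that Messaldo played given that ETH won; this question is not posed in the lecture and the function name is hypothetical.

```python
from fractions import Fraction


def bayes_posteriors(likelihoods, priors):
    """Return Pr[A_i | B] for every A_i of a partition.

    likelihoods[i] is Pr[B | A_i] and priors[i] is Pr[A_i].
    """
    joint = [l * p for l, p in zip(likelihoods, priors)]  # Pr[B ∩ A_i]
    evidence = sum(joint)  # Pr[B] by the law of total probability
    if evidence == 0:
        raise ValueError("Pr[B] must be positive")
    return [j / evidence for j in joint]


likelihoods = [Fraction("0.80"), Fraction("0.30")]  # Pr[B | A_1], Pr[B | A_2]
priors = [Fraction("0.60"), Fraction("0.40")]       # Pr[A_1], Pr[A_2]

posteriors = bayes_posteriors(likelihoods, priors)
print(posteriors[0], posteriors[1])  # 4/5 1/5
```

The posteriors always sum to 1, because the denominator is exactly the sum of the numerators.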

Application: Medical Diagnosis

Bayes’ Theorem is frequently used in medical testing to determine the probability that a patient has a disease given a test result.

Let $K$ be the event “patient has the disease” and $\overset{ˉ}{K}$ be “patient does not have the disease”. Let $T_{p os}$ be “test result is positive”. We often know:

- $P r [T_{p os} ∣ K]$ : Sensitivity of the test (probability of a positive test if the disease is present).
- $P r [T_{n e g} ∣ \overset{ˉ}{K}]$ : Specificity of the test (probability of a negative test if the disease is absent).
  - From this, $P r [T_{p os} ∣ \overset{ˉ}{K}] = 1 - P r [T_{n e g} ∣ \overset{ˉ}{K}]$ (probability of a false positive).
- $P r [K]$ : Prior probability (or prevalence) of the disease in the population.

We want to calculate $P r [K ∣ T_{p os}]$ : the probability the patient has the disease given a positive test.

$P r [K ∣ T_{p os}] = \frac{P r [ T _{p os} ∣ K ] P r [ K ]}{P r [ T _{p os} ∣ K ] P r [ K ] + P r [ T _{p os} ∣ \overset{ˉ}{K} ] P r [ \overset{ˉ}{K} ]}$

Example: Breast Cancer Screening (Age 50-69)

- Sensitivity: $P r [T_{p os} ∣ K] = 0.72$ (72%)
- Specificity: $P r [T_{n e g} ∣ \overset{ˉ}{K}] = 0.98$ (98%) $⟹ P r [T_{p os} ∣ \overset{ˉ}{K}] = 0.02$ (2% false positive rate)
- Prevalence: $P r [K] = 0.01$ (1% of women in this age group have breast cancer)
  - $P r [\overset{ˉ}{K}] = 0.99$

$P r [K ∣ T_{p os}] = \frac{( 0.72 ) ( 0.01 )}{( 0.72 ) ( 0.01 ) + ( 0.02 ) ( 0.99 )} = \frac{0.0072}{0.0072 + 0.0198} = \frac{0.0072}{0.0270} \approx 0.267$ or $26.7\%$.

Even with a positive test, the probability of having cancer is only 26.7%. This is often counter-intuitive; ignoring the low prevalence when interpreting a positive result is known as the base rate fallacy.

Example: Breast Cancer Screening (Age 20-29)

Same test sensitivity and specificity.

- Prevalence: $P r [K] = 0.0002$ (0.02% of women in this age group have breast cancer)
  - $P r [\overset{ˉ}{K}] = 0.9998$

$P r [K ∣ T_{p os}] = \frac{( 0.72 ) ( 0.0002 )}{( 0.72 ) ( 0.0002 ) + ( 0.02 ) ( 0.9998 )} = \frac{0.000144}{0.000144 + 0.019996} = \frac{0.000144}{0.02014} \approx 0.00715$ or $0.715\%$.

For a lower-risk population, a positive test indicates a much lower probability of actually having the disease. This highlights the critical role of the prior probability (base rate).
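The two screening calculations differ only in the prevalence, which a small function makes explicit. This is a minimal sketch using the sensitivity, specificity, and prevalence values from the examples; the function name is illustrative.

```python
def posterior_given_positive(sensitivity, specificity, prevalence):
    """Pr[K | T_pos] via Bayes' theorem for a binary test."""
    true_positive = sensitivity * prevalence                   # Pr[T_pos | K] Pr[K]
    false_positive = (1.0 - specificity) * (1.0 - prevalence)  # Pr[T_pos | not K] Pr[not K]
    return true_positive / (true_positive + false_positive)


# Same test, two different prevalences (age 50-69 and age 20-29).
older = posterior_given_positive(0.72, 0.98, 0.01)
younger = posterior_given_positive(0.72, 0.98, 0.0002)

print(round(older, 3))    # 0.267
print(round(younger, 5))  # 0.00715
```

Lowering the prevalence by a factor of 50 lowers the posterior by a factor of roughly 37, even though the test itself is unchanged.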

Continue here: 11 Independence of Multiple Events, Random Variables
