Lecture from: 20.03.2025 | Video: Homelab

Conditional Probability (Bedingte Wahrscheinlichkeit)

Often, we have partial information about the outcome of a random experiment, and we want to know how this information affects the probabilities of other events. This leads to the concept of conditional probability.

Intuition: What is the probability of event $A$ happening, given that we know event $B$ has already occurred?

Motivating Example: Card Game

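Recall the scenario: Shuffle a 52-card deck, deal 5 cards to Player 1 ($H_1$) and 5 cards to Player 2 ($H_2$). The sample space $\Omega$ consists of all pairs of disjoint 5-card hands $(H_1, H_2)$, with $|\Omega| = \binom{52}{5} \cdot \binom{47}{5}$. This is a Laplace space.

Let $A$ be the event “Player 1 has all four Aces”. We previously calculated $\Pr[A]$. To form Player 1’s hand, we must choose all 4 Aces ($\binom{4}{4} = 1$ way) and 1 non-Ace ($\binom{48}{1} = 48$ ways). So Player 1’s hand can be chosen in $1 \cdot 48 = 48$ ways. Once $H_1$ is chosen, Player 2’s hand can be any 5 cards from the remaining 47 cards, so there are $\binom{47}{5}$ ways to choose $H_2$. Hence $|A| = 48 \cdot \binom{47}{5}$ and

$$\Pr[A] = \frac{|A|}{|\Omega|} = \frac{48}{\binom{52}{5}} = \frac{48}{2598960} \approx 1.85 \times 10^{-5}$$

Now, let’s view this from Player 2’s perspective. Suppose Player 2 looks at their hand and sees a specific set of five cards, say $h$. This hand contains no Aces. Does knowing Player 2’s hand change the probability that Player 1 has four Aces?

Player 2 is interested in:

$$\Pr[\text{Player 1 has four Aces} \mid \text{Player 2 has hand } h]$$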

This “probability of A given B” is the conditional probability.

Definition 2.7: Conditional Probability

Let $A$ and $B$ be events in a probability space $(\Omega, \Pr)$. If $\Pr[B] > 0$, the conditional probability of $A$ given $B$, denoted $\Pr[A \mid B]$, is defined as:

$$\Pr[A \mid B] := \frac{\Pr[A \cap B]}{\Pr[B]}$$

Interpretation: Knowing that event $B$ has occurred effectively restricts the possible outcomes to those in $B$. We are interested in the probability that an outcome also belongs to $A$, relative to the “new” sample space $B$. The probability $\Pr[A \cap B]$ represents the portion of the original sample space where both $A$ and $B$ occur. We renormalize this probability by dividing by $\Pr[B]$ so that the probabilities within the restricted space sum to 1.

In summary:

  • $\Pr[A \mid B]$ is the probability that event $A$ occurs, when we already know that event $B$ has occurred.
  • The formula $\Pr[A \mid B] = \Pr[A \cap B] / \Pr[B]$ is how we calculate this conditional probability.

Applying to the Card Example (Intuitive Calculation): If Player 2 has hand $h$ (5 specific non-Ace cards), then there are 47 cards remaining for Player 1. For Player 1 to have 4 Aces, their hand must consist of the 4 Aces and 1 other card from the $52 - 5 - 4 = 43$ cards that are not Player 2’s cards and not Aces. So, given Player 2’s hand $h$, there are 43 possible hands for Player 1 that satisfy event $A$ (“Player 1 has 4 Aces”).

How many possible hands could Player 1 have, given Player 2’s hand $h$? Player 1 must have 5 cards chosen from the 47 cards not in $h$. The total number of possible hands for Player 1 is $\binom{47}{5} = 1533939$.

If we treat the set of possible hands for Player 1 given Player 2’s hand as a new Laplace space, then:

$$\Pr[A \mid \text{Player 2 has hand } h] = \frac{43}{\binom{47}{5}} = \frac{43}{1533939} \approx 2.80 \times 10^{-5}$$

Applying the Definition (Formal Calculation): Let $B$ be the event “Player 2 has the specific hand $h$”. An outcome is a pair $(H_1, H_2)$ of hands for Player 1 and Player 2, so $B$ does not consist of a single outcome: it contains every pair with $H_2 = h$. Player 1 can hold any 5 of the remaining 47 cards, so $|B| = \binom{47}{5}$ and $\Pr[B] = \binom{47}{5} / |\Omega|$. Let $A$ be the event “Player 1 has 4 Aces”. Then $A \cap B$ is the event “Player 1 has 4 Aces AND Player 2 has hand $h$”. The number of outcomes in $A \cap B$: Player 1’s hand must be {4 Aces, 1 non-Ace other than those in $h$} (43 choices for the non-Ace), and Player 2’s hand is fixed as $h$. So $|A \cap B| = 43$ and $\Pr[A \cap B] = 43 / |\Omega|$.

$$\Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]} = \frac{43 / |\Omega|}{\binom{47}{5} / |\Omega|} = \frac{43}{\binom{47}{5}}$$

This matches the intuitive calculation. The probability rose slightly in absolute terms (from about $1.85 \times 10^{-5}$ to about $2.80 \times 10^{-5}$) because Player 2 holding only non-Aces makes it more likely that the Aces are concentrated elsewhere.
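The two counts can be checked numerically. The following is a minimal Python sketch, not part of the lecture; the variable names are chosen here for illustration, and it only assumes the counting arguments above (48 favourable hands for Player 1 without extra information, 43 once Player 2's five non-Ace cards are known).

```python
from fractions import Fraction
from math import comb

# Sample space: ordered pairs of disjoint 5-card hands (Player 1, Player 2).
omega_size = comb(52, 5) * comb(47, 5)

# A: Player 1 holds all four Aces (4 Aces + 1 of 48 non-Aces, any hand for Player 2).
size_a = 48 * comb(47, 5)
pr_a = Fraction(size_a, omega_size)

# B: Player 2 holds one fixed hand of 5 non-Aces; Player 1 gets any 5 of the other 47.
size_b = comb(47, 5)
# A and B together: Player 1 holds the 4 Aces plus 1 of the 43 remaining non-Aces.
size_a_and_b = 43

# Definition of conditional probability: Pr[A | B] = Pr[A and B] / Pr[B].
pr_a_given_b = Fraction(size_a_and_b, omega_size) / Fraction(size_b, omega_size)

print(pr_a, float(pr_a))
print(pr_a_given_b, float(pr_a_given_b))
```

Exact fractions are used so that the comparison between the two probabilities is not affected by floating-point rounding.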

Example: The Two-Child Problem

This classic problem highlights the importance of carefully defining the sample space and the conditioning event.

Scenario: A family has two children. We assume the gender of each child (M=Male, W=Female) is equally likely (like a coin flip) and independent.

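Sample Space: $\Omega = \{MM, MW, WM, WW\}$. Here, the order matters (e.g., MW means older is Male, younger is Female). This is a Laplace space with $|\Omega| = 4$, so $\Pr[\omega] = 1/4$ for every $\omega \in \Omega$.

Events:

  • $A$: “Both children are female”. $A = \{WW\}$. $\Pr[A] = 1/4$.
  • $B$: “At least one child is female”. $B = \{MW, WM, WW\}$. $\Pr[B] = 3/4$.

Question 1: What is the probability that both children are female, given that at least one child is female? We want $\Pr[A \mid B]$. $A \cap B = \{WW\}$. $\Pr[A \cap B] = 1/4$. $\Pr[A \mid B] = \frac{1/4}{3/4} = 1/3$.

Question 2 (Variation): What is the probability that both children are female, given that the older child is female? Let $C$ be the event “Older child is female”. $C = \{WM, WW\}$. $\Pr[C] = 2/4 = 1/2$. We want $\Pr[A \mid C]$. $A \cap C = \{WW\}$. $\Pr[A \cap C] = 1/4$. $\Pr[A \mid C] = \frac{1/4}{1/2} = 1/2$.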

Comparison: Knowing at least one child is female yields a 1/3 probability for both being female. Knowing the older child is female yields a 1/2 probability. The specific information matters!
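Because the sample space has only four outcomes, both answers can be confirmed by brute-force enumeration. This is a small illustrative sketch; the helper names `pr` and `pr_given` are introduced here and are not from the lecture.

```python
from fractions import Fraction
from itertools import product

# Ordered outcomes (older child, younger child), all equally likely.
omega = [older + younger for older, younger in product("MW", repeat=2)]

def pr(event):
    """Probability of an event (a set of outcomes) in the Laplace space omega."""
    return Fraction(len(event), len(omega))

def pr_given(a, b):
    """Conditional probability Pr[A | B] = Pr[A and B] / Pr[B]."""
    return pr(a & b) / pr(b)

both_female = {w for w in omega if w == "WW"}
at_least_one_female = {w for w in omega if "W" in w}
older_female = {w for w in omega if w[0] == "W"}

q1 = pr_given(both_female, at_least_one_female)  # Question 1
q2 = pr_given(both_female, older_female)         # Question 2
print(q1, q2)
```

The only difference between the two questions is the conditioning set: three outcomes in the first case, two in the second.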

Multiplication Rule (Multiplikationssatz)

Rearranging the definition of conditional probability gives a way to calculate the probability of an intersection:

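If $\Pr[B] > 0$, then $\Pr[A \cap B] = \Pr[A \mid B] \cdot \Pr[B]$. Similarly, if $\Pr[A] > 0$, then $\Pr[A \cap B] = \Pr[B \mid A] \cdot \Pr[A]$.

This can be generalized to the intersection of multiple events, often called the Chain Rule of Probability.

Satz 2.10 (Multiplikationssatz): Let $A_1, \dots, A_n$ be events. If $\Pr[A_1 \cap \dots \cap A_n] > 0$ (which implies all intermediate intersections also have positive probability), then:

$$\Pr[A_1 \cap \dots \cap A_n] = \Pr[A_1] \cdot \Pr[A_2 \mid A_1] \cdot \Pr[A_3 \mid A_1 \cap A_2] \cdots \Pr[A_n \mid A_1 \cap \dots \cap A_{n-1}]$$

Proof: The right-hand side is a telescoping product:

$$\frac{\Pr[A_1]}{1} \cdot \frac{\Pr[A_1 \cap A_2]}{\Pr[A_1]} \cdot \frac{\Pr[A_1 \cap A_2 \cap A_3]}{\Pr[A_1 \cap A_2]} \cdots \frac{\Pr[A_1 \cap \dots \cap A_n]}{\Pr[A_1 \cap \dots \cap A_{n-1}]}$$

All intermediate terms cancel out, leaving only $\Pr[A_1 \cap \dots \cap A_n]$. (The division by probabilities is valid because the condition $\Pr[A_1 \cap \dots \cap A_n] > 0$ ensures that $\Pr[A_1 \cap \dots \cap A_i] > 0$ for all $i \le n$.)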
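As a quick sanity check of the chain rule, consider a hypothetical example that is not from the lecture: drawing three cards without replacement from a 52-card deck and asking for the probability that all three are Aces. The sketch below compares the chain-rule product with a direct count.

```python
from fractions import Fraction
from math import comb

# Hypothetical example: A_i = "the i-th card drawn (without replacement) is an Ace".
# Chain rule: Pr[A_1] * Pr[A_2 | A_1] * Pr[A_3 | A_1 and A_2].
chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Direct count: 3-card subsets consisting only of Aces among all 3-card subsets.
direct = Fraction(comb(4, 3), comb(52, 3))

print(chain, direct)
```

Both computations give the same fraction, as the theorem requires.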

Application: The Birthday Problem (Geburtstagsproblem)

Question: In a room with $m$ people, what is the probability that at least two people share the same birthday? (Assuming $n = 365$ possible birthdays, equally likely, ignoring leap years).

Reformulation (Balls and Bins): Randomly assign each of $m$ people (balls) to one of $n$ birthdays (bins). What is the probability that at least one bin contains more than one ball?

Strategy: It’s easier to calculate the probability of the complementary event $A$: “All $m$ people have different birthdays”. The probability we want is $1 - \Pr[A]$.

Sample Space: We can model the assignment of birthdays as an ordered sequence of length $m$, where each element is chosen from $\{1, \dots, n\}$ with replacement. $\Omega = \{1, \dots, n\}^m$. $|\Omega| = n^m$. This is a Laplace space.

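Calculating $\Pr[A]$ using the Multiplication Rule: Let $A_i$ be the event “The $i$-th person’s birthday is different from the birthdays of the first $i-1$ people”. The event $A$ (all birthdays are different) is the intersection $A_1 \cap \dots \cap A_m$. Using the multiplication rule:

$$\Pr[A] = \Pr[A_1] \cdot \Pr[A_2 \mid A_1] \cdots \Pr[A_m \mid A_1 \cap \dots \cap A_{m-1}]$$

  • $\Pr[A_1]$: The first person can have any birthday. This event is certain. $\Pr[A_1] = 1$.
  • $\Pr[A_2 \mid A_1]$: Given the first person has a birthday, there are $n-1$ remaining birthdays out of $n$ that the second person can have to be different. $\Pr[A_2 \mid A_1] = \frac{n-1}{n}$.
  • $\Pr[A_3 \mid A_1 \cap A_2]$: Given the first two people have different birthdays, there are $n-2$ remaining birthdays out of $n$ that the third person can have to be different from the first two. $\Pr[A_3 \mid A_1 \cap A_2] = \frac{n-2}{n}$.
  • $\Pr[A_i \mid A_1 \cap \dots \cap A_{i-1}]$: Given the first $i-1$ people have different birthdays, there are $n-(i-1)$ remaining birthdays out of $n$ that the $i$-th person can have. $\Pr[A_i \mid A_1 \cap \dots \cap A_{i-1}] = \frac{n-i+1}{n}$.

Substituting these into the multiplication rule:

$$\Pr[A] = \prod_{i=1}^{m} \frac{n-i+1}{n} = \prod_{i=1}^{m} \left(1 - \frac{i-1}{n}\right)$$

(with the first term for $i = 1$ being 1).

Probability of a Match: The probability that at least two people share a birthday is:

$$1 - \Pr[A] = 1 - \prod_{i=1}^{m} \left(1 - \frac{i-1}{n}\right)$$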

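Numerical Example (n=365, m=25): Calculating this product for $m = 25$ gives a probability of about 0.431 that all birthdays differ, i.e. $\Pr[A] \approx 0.431$. So, $1 - \Pr[A] \approx 0.569$. There is about a 57% chance that at least two people share a birthday in a room of 25 people. The probability crosses 50% at $m = 23$.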
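The product is easy to evaluate directly. The sketch below is an illustration rather than lecture material; the function names `pr_all_distinct` and `pr_match` are chosen here, and the model assumes $n$ equally likely birthdays as in the text.

```python
def pr_all_distinct(m, n=365):
    """Pr[all m birthdays differ]: product of (n - i + 1) / n for i = 1..m."""
    p = 1.0
    for i in range(1, m + 1):
        p *= (n - i + 1) / n
    return p

def pr_match(m, n=365):
    """Pr[at least two of m people share a birthday]."""
    return 1.0 - pr_all_distinct(m, n)

# Smallest group size at which a shared birthday is more likely than not.
threshold = next(m for m in range(1, 366) if pr_match(m) > 0.5)

print(round(pr_match(25), 4), threshold)
```

Multiplying the factors one at a time avoids the very large intermediate values that computing $n^m$ explicitly would produce.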

Excursion: Approximating Products (Birthday Problem)

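For large $n$ and moderate $m$, we can approximate the product using the fact that $1 - x \approx e^{-x}$ for small $x$. More precisely, $1 - x \le e^{-x}$ for all real $x$, with near-equality when $x$ is close to 0. Applied to each factor:

$$\Pr[A] = \prod_{i=1}^{m} \left(1 - \frac{i-1}{n}\right) \approx \prod_{i=1}^{m} e^{-(i-1)/n} = \exp\left(-\frac{1}{n} \sum_{i=1}^{m} (i-1)\right)$$

Using the formula for the sum of the first $m-1$ integers, $\sum_{i=1}^{m-1} i = \frac{m(m-1)}{2}$:

$$\Pr[A] \approx e^{-m(m-1)/(2n)}$$

This approximation is quite accurate, as shown in the graph on slide 21.

Threshold Behavior: When does $\Pr[A]$ drop to $1/2$? This happens when $e^{-m(m-1)/(2n)} = 1/2$, i.e. when $m(m-1) = 2n \ln 2$.

For large $m$, $m(m-1) \approx m^2$. So, $m^2 \approx 2n \ln 2$, which means $m \approx \sqrt{2 \ln 2} \cdot \sqrt{n} \approx 1.18 \sqrt{n}$. If $n = 365$, $m \approx 1.18 \cdot \sqrt{365} \approx 22.5$. This confirms that the threshold is around $m = 23$.

The approximation also shows:

  • If $m \ll \sqrt{n}$, then $m(m-1)/(2n)$ is small, $\Pr[A] \approx 1$. (Low probability of a match).
  • If $m \gg \sqrt{n}$, then $m(m-1)/(2n)$ is large, $\Pr[A] \approx 0$. (High probability of a match). The transition happens around $m \approx \sqrt{n}$.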
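To see how close the exponential approximation is, the following sketch compares it with the exact product for a few group sizes and evaluates the threshold estimate $\sqrt{2 \ln 2 \cdot n}$. It is an illustration only; the function names and the chosen values of $m$ are assumptions made here, not taken from the slides.

```python
from math import exp, log, sqrt

def pr_match_exact(m, n=365):
    """Exact Pr[at least one shared birthday] via the product formula."""
    p_distinct = 1.0
    for i in range(1, m + 1):
        p_distinct *= (n - i + 1) / n
    return 1.0 - p_distinct

def pr_match_approx(m, n=365):
    """Approximation 1 - exp(-m(m-1)/(2n)), based on 1 - x ~ exp(-x)."""
    return 1.0 - exp(-m * (m - 1) / (2 * n))

def threshold_estimate(n=365):
    """Group size where the approximation reaches 1/2, using m(m-1) ~ m^2."""
    return sqrt(2 * log(2) * n)

for m in (5, 23, 25, 60):
    print(m, round(pr_match_exact(m), 4), round(pr_match_approx(m), 4))
print(round(threshold_estimate(), 2))
```

Since $1 - x \le e^{-x}$, the approximation overestimates the probability that all birthdays differ, so it never exceeds the exact match probability.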

Law of Total Probability (Satz von der totalen Wahrscheinlichkeit)

This law provides a way to calculate the probability of an event by conditioning on a set of mutually exclusive and exhaustive events (a partition).

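Satz 2.12 (Satz von der totalen Wahrscheinlichkeit): Let $A_1, \dots, A_n$ be pairwise disjoint events such that their union contains event $B$ (i.e., $B \subseteq A_1 \cup \dots \cup A_n$). Often, $A_1, \dots, A_n$ form a partition of the entire sample space $\Omega$. Then:

$$\Pr[B] = \sum_{i=1}^{n} \Pr[B \mid A_i] \cdot \Pr[A_i]$$

Derivation: The sets $B \cap A_i$ are pairwise disjoint because the $A_i$ are disjoint. Also, $B = (B \cap A_1) \cup \dots \cup (B \cap A_n)$. Using the addition rule for disjoint events (Lemma 2.2, part 5): $\Pr[B] = \sum_{i=1}^{n} \Pr[B \cap A_i]$. Using the definition of conditional probability, $\Pr[B \cap A_i] = \Pr[B \mid A_i] \cdot \Pr[A_i]$ (assuming $\Pr[A_i] > 0$. If $\Pr[A_i] = 0$, the term is 0 anyway). Substituting this gives: $\Pr[B] = \sum_{i=1}^{n} \Pr[B \mid A_i] \cdot \Pr[A_i]$.

Intuition: To find the total probability of $B$, we consider all possible disjoint scenarios $A_i$ under which $B$ could occur. For each scenario $A_i$, we find the probability that $B$ occurs given $A_i$ ($\Pr[B \mid A_i]$) and weight it by the probability of that scenario occurring ($\Pr[A_i]$). Summing these weighted probabilities gives the total probability of $B$.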

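Example: ETH vs. UZH Football

Let $W$ be the event “ETH wins”. Let $M$ be the event “Star player Messaldo plays”. Let $\bar{M}$ be the event “Messaldo does not play” ($\bar{M} = \Omega \setminus M$). $M$ and $\bar{M}$ form a partition of the sample space.

Given information:

  • $\Pr[W \mid M] = 0.8$ (80%)
  • $\Pr[W \mid \bar{M}] = 0.3$ (30%)
  • $\Pr[M] = 0.6$ (60%)
  • $\Pr[\bar{M}] = 0.4$ (40%)

Using the Law of Total Probability to find the overall probability that ETH wins:

$$\Pr[W] = \Pr[W \mid M] \cdot \Pr[M] + \Pr[W \mid \bar{M}] \cdot \Pr[\bar{M}] = 0.8 \cdot 0.6 + 0.3 \cdot 0.4 = 0.48 + 0.12 = 0.6$$

The overall probability that ETH wins is 60%.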
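The weighted sum is straightforward to express in code. This is a minimal sketch with a helper name (`total_probability`) introduced here for illustration; it assumes the scenarios form a full partition, so their probabilities are checked to sum to 1, which is slightly stricter than the theorem requires. The numbers are the ones given in the football example (win probability 0.8 with the star player, 0.3 without, and a 0.6 chance that he plays).

```python
def total_probability(cond_probs, priors):
    """Pr[B] = sum_i Pr[B | A_i] * Pr[A_i] for pairwise disjoint scenarios A_i.

    Assumption for this sketch: the scenarios form a partition of the sample space.
    """
    if len(cond_probs) != len(priors):
        raise ValueError("need one conditional probability per scenario")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors of a partition must sum to 1")
    return sum(c * p for c, p in zip(cond_probs, priors))

# Scenarios: star player plays (probability 0.6) or does not play (probability 0.4).
pr_win = total_probability(cond_probs=[0.8, 0.3], priors=[0.6, 0.4])
print(round(pr_win, 2))
```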

Bayes’ Theorem

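Bayes’ Theorem relates the conditional probability of an event $A$ given $B$ to the conditional probability of $B$ given $A$. It’s incredibly useful for updating beliefs based on new evidence.

Satz (Satz von Bayes): Let $A_1, \dots, A_n$ be pairwise disjoint events such that $B \subseteq A_1 \cup \dots \cup A_n$. Assume $\Pr[B] > 0$ and $\Pr[A_i] > 0$ for all $i$. Then for any specific $i$:

$$\Pr[A_i \mid B] = \frac{\Pr[A_i \cap B]}{\Pr[B]} = \frac{\Pr[B \mid A_i] \cdot \Pr[A_i]}{\Pr[B]}$$

Using the Law of Total Probability for the denominator, we get the expanded form:

$$\Pr[A_i \mid B] = \frac{\Pr[B \mid A_i] \cdot \Pr[A_i]}{\sum_{j=1}^{n} \Pr[B \mid A_j] \cdot \Pr[A_j]}$$

Derivation: Start with the definition of conditional probability and the multiplication rule:

  1. $\Pr[A_i \mid B] = \frac{\Pr[A_i \cap B]}{\Pr[B]}$
  2. $\Pr[A_i \cap B] = \Pr[B \mid A_i] \cdot \Pr[A_i]$
  3. Substitute (2) into the numerator of (1): $\Pr[A_i \mid B] = \frac{\Pr[B \mid A_i] \cdot \Pr[A_i]}{\Pr[B]}$. Now substitute the Law of Total Probability for $\Pr[B]$ in the denominator to get the expanded form.

Interpretation of Terms:

  • $\Pr[A_i]$: Prior probability of $A_i$. Our belief about $A_i$ before observing $B$.
  • $\Pr[B \mid A_i]$: Likelihood of observing evidence $B$ given that scenario $A_i$ is true.
  • $\Pr[B]$: Evidence probability (marginal likelihood). The total probability of observing $B$, averaged over all possible scenarios $A_j$. Acts as a normalization constant.
  • $\Pr[A_i \mid B]$: Posterior probability of $A_i$. Our updated belief about $A_i$ after observing $B$.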

Application: Medical Testing (Breast Cancer Screening)

This is a classic application showing how Bayes’ Theorem helps interpret diagnostic test results.

Scenario: A screening test (digital mammography) for breast cancer. Events:

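  • $K$: Patient has breast cancer (Krebs).
  • $\bar{K}$: Patient does not have breast cancer.
  • $T^+$: Test result is positive.
  • $T^-$: Test result is negative.

Known Statistics (Example 1: Age 50-69):

  • Sensitivity: $\Pr[T^+ \mid K] = 0.72$ (72%). Probability test is positive, given patient has cancer.
  • Specificity: $\Pr[T^- \mid \bar{K}] = 0.98$ (98%). Probability test is negative, given patient does not have cancer.
  • Prevalence: $\Pr[K] = 0.01$ (1%). Probability a woman in this age group has breast cancer (prior).

Derived Probabilities:

  • False Negative Rate: $\Pr[T^- \mid K] = 1 - \Pr[T^+ \mid K] = 1 - 0.72 = 0.28$.
  • False Positive Rate: $\Pr[T^+ \mid \bar{K}] = 1 - \Pr[T^- \mid \bar{K}] = 1 - 0.98 = 0.02$.
  • Prior (No Cancer): $\Pr[\bar{K}] = 1 - \Pr[K] = 1 - 0.01 = 0.99$.

Question: If a woman in this age group tests positive, what is the probability she actually has breast cancer? We want $\Pr[K \mid T^+]$.

Applying Bayes’ Theorem: The partition events are $K$ and $\bar{K}$. The evidence is $T^+$.

First, find the denominator using the Law of Total Probability:

$$\Pr[T^+] = \Pr[T^+ \mid K] \cdot \Pr[K] + \Pr[T^+ \mid \bar{K}] \cdot \Pr[\bar{K}] = 0.72 \cdot 0.01 + 0.02 \cdot 0.99 = 0.0072 + 0.0198 = 0.0270$$

Now calculate the posterior probability:

$$\Pr[K \mid T^+] = \frac{\Pr[T^+ \mid K] \cdot \Pr[K]}{\Pr[T^+]} = \frac{0.0072}{0.0270} \approx 0.267$$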

Result: Even with a positive test, the probability of actually having breast cancer is only about 26.7%! This is because the prevalence is low, and the false positive rate contributes significantly to the positive results.

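Example 2 (Age 20-29): Prevalence is much lower: the prior probability of cancer is $\Pr[K] = 0.0002$ (0.02%), so $\Pr[\bar{K}] = 0.9998$. Sensitivity and Specificity are assumed the same.

$$\Pr[K \mid T^+] = \frac{0.72 \cdot 0.0002}{0.72 \cdot 0.0002 + 0.02 \cdot 0.9998} = \frac{0.000144}{0.02014} \approx 0.007$$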

Result: For a younger woman with much lower prior risk, a positive test yields only about a 0.7% chance of actually having breast cancer.
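Both screening examples follow the same computation, which can be captured in a few lines. The sketch below is illustrative and not from the lecture; the function name `posterior` and its parameter names are chosen here, and it assumes a two-event partition (disease / no disease) with the sensitivity, specificity, and prevalence values quoted above.

```python
def posterior(prior, sensitivity, specificity):
    """Pr[disease | positive test] via Bayes' theorem.

    Uses the partition {disease, no disease}; the denominator is the
    Law of Total Probability applied to the event "test is positive".
    """
    true_pos = sensitivity * prior                    # Pr[positive | disease] * Pr[disease]
    false_pos = (1.0 - specificity) * (1.0 - prior)   # Pr[positive | healthy] * Pr[healthy]
    return true_pos / (true_pos + false_pos)

p_50_69 = posterior(prior=0.01, sensitivity=0.72, specificity=0.98)
p_20_29 = posterior(prior=0.0002, sensitivity=0.72, specificity=0.98)
print(round(p_50_69, 4), round(p_20_29, 4))
```

Holding the test characteristics fixed and varying only `prior` makes the effect of prevalence on the posterior directly visible.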