Lecture from: 20.03.2025 | Video: Homelab | Rui Zhangs Notes
Introduction to Probability
This lecture introduces fundamental concepts in probability theory, which are essential tools in the analysis and design of algorithms, especially randomized algorithms. We will cover probability spaces, conditional probability, Bayes’ theorem, and independence.
Probability Space
Discrete Probability Space (Recall)
A discrete probability space is defined by:
- A sample space , which is the set of all possible elementary outcomes of an experiment.
- A probability measure , which assigns a probability to each elementary event such that:
- for all .
- (the sum of probabilities of all elementary outcomes is 1).
An event is any subset of the sample space (i.e., ). The probability of an event is the sum of the probabilities of the elementary outcomes in :
The complementary event to , denoted , is the set of all outcomes in that are not in , i.e., . It follows that .
Laplace Space (Recall)
A Laplace space (or uniform probability space) is a special type of finite probability space where every elementary outcome is equally likely.
If is the total number of elementary outcomes, then for any :
In a Laplace space, the probability of an event is given by:
Example: Drawing a Card
Consider a standard deck of 52 playing cards. The set of cards can be represented as . If we draw one card randomly, the sample space is , and . This is a Laplace space, so the probability of drawing any specific card is . For example, the event “drawing an Ace” consists of 4 outcomes. So, .
Example: Card Game Scenario
We shuffle a deck of 52 cards and deal 5 cards to Player 1 and 5 cards to Player 2.
The sample space consists of all possible pairs of hands , where is Player 1’s 5-card hand and is Player 2’s 5-card hand, such that , , and .
The total number of ways to deal these cards is .
Let’s consider the event “Player 1 has four Aces”.
To calculate , we consider the possibilities for Player 1’s hand. The total number of 5-card hands Player 1 can receive is .
The number of ways Player 1 can have four Aces is by choosing all 4 Aces ( way) and 1 other card from the remaining 48 non-Ace cards ( ways). So, there are hands where Player 1 has four Aces.
Thus, the probability is:
Conditional Probability
Conditional probability addresses how the probability of an event changes when we know that another event has occurred.
If we are interested in the probability of event occurring, given that event has already occurred, we denote this as (read “probability of A given B”).
Definition
Let and be two events in a sample space . If , the conditional probability of given is defined as: where is the event that both and occur.
If the underlying space is a Laplace space, this can be written as: This means we restrict our attention to the outcomes where occurs (this becomes our new, smaller sample space), and then find the fraction of these outcomes where also occurs.
Example: Card Game (Continued)
Let “Player 1 has four Aces”. Let “Player 2 is dealt the hand ” (none of these are Aces). What is ?
Given Player 2’s hand (event ), there are cards remaining. Player 1 will receive 5 cards from these 47 cards. The total number of possible hands for Player 1, given Player 2’s hand, is . This is the size of our new sample space (outcomes where occurred).
Now, for event (Player 1 has 4 Aces AND Player 2 has their specific hand): Since Player 2 has no Aces, all 4 Aces are still available among the 47 remaining cards. For Player 1 to have 4 Aces, Player 1 must receive all 4 Aces ( way) and 1 more card from the cards that are not Aces and not in Player 2’s hand ( ways). So, there are hands for Player 1 corresponding to .
Notice is higher than . This is because Player 2 holding 5 non-Aces slightly increases the chance of Player 1 getting the Aces from the remaining cards.
The Multiplication Rule (derived from conditional probability)
- Also, by symmetry:
Example: The Two-Children Problem
A family has two children. We assume child births are independent and probability of male (M) or female (W) is 1/2 each. The sample space for the genders of two children (e.g., older, younger) is , with each outcome having probability .
Question 1:
“At least one child is female.” Given this, what’s the probability both are female?
Let “both children are female” . Let “at least one child is female” .
We want .
. . . .
Question 2:
“The older child is female.” Given this, what’s the probability both are female? Let “both children are female” . Let “the older child is female” (assuming (Older, Younger) notation). We want . . . . .
The subtle difference in information (“at least one is female” vs. “the older is female”) changes the resulting probability.
Chain Rule (Generalization of Multiplication Rule)
For any events : This is proven by repeatedly applying the definition of conditional probability: . The terms telescope.
The Birthday Problem
In a room with people, what is the probability that at least two people share the same birthday? (Assume 365 days, all equally likely, ignore leap years).
It’s easier to calculate the complementary probability: . Let be the number of people (balls) and be the number of possible birthdays (bins).
Let be the event that the -th person has a birthday different from the first people. We want to find . Using the chain rule:
- (The first person can have any birthday).
- (The second person must have one of the remaining unique birthdays).
- (The third person must have one of the remaining unique birthdays).
- …
- .
So, .
The probability of at least two people sharing a birthday is . For people, this probability is slightly over 0.5. For , it’s about 0.57.
Aside: Approximating the Birthday Problem Product
The product can be approximated using for small . .
This approximation shows that the probability of all distinct birthdays drops quickly as increases, especially when becomes comparable to . The “tipping point” where the probability of a shared birthday exceeds 0.5 occurs when . For , this is .
Law of Total Probability
Let be a set of events that form a partition of the sample space . This means:
- for (they are pairwise disjoint).
- (their union is the entire sample space).
For any event , its probability can be calculated as: Using the multiplication rule, . So,
This law is useful when it’s easier to calculate the probability of conditioned on different cases .
Example: ETH vs. UZH Football Match
Let be the event “ETH wins”. Let be “Star player Messaldo plays”, and be “Messaldo does not play”. Assume forms a partition. Given:
- (ETH wins if Messaldo plays)
- (ETH wins if Messaldo doesn’t play)
- (Messaldo plays)
- Then (Messaldo doesn’t play)
. So, the overall probability of ETH winning is 60%.
Bayes’ Theorem
Bayes’ Theorem allows us to “reverse” conditional probabilities. If we know and the prior probabilities , we can find the posterior probability .
Bayes’ Theorem
Let be a partition of the sample space , and let be an event with . Then for any : Using the Law of Total Probability for the denominator:
Application: Medical Diagnosis
Bayes’ Theorem is frequently used in medical testing to determine the probability that a patient has a disease given a test result.
Let be the event “patient has the disease” and be “patient does not have the disease”. Let be “test result is positive”. We often know:
- : Sensitivity of the test (probability of a positive test if the disease is present).
- : Specificity of the test (probability of a negative test if the disease is absent).
- From this, (probability of a false positive).
- : Prior probability (or prevalence) of the disease in the population.
We want to calculate : the probability the patient has the disease given a positive test.
Example: Breast Cancer Screening (Age 50-69)
- Sensitivity: (72%)
- Specificity: (98%) (2% false positive rate)
- Prevalence: (1% of women in this age group have breast cancer)
or . Even with a positive test, the probability of having cancer is only 26.7%. This is often counter-intuitive and is known as the base rate fallacy.
Example: Breast Cancer Screening (Age 20-29)
Same test sensitivity and specificity.
- Prevalence: (0.02% of women in this age group have breast cancer)
or . For a lower-risk population, a positive test indicates a much lower probability of actually having the disease. This highlights the critical role of the prior probability (base rate).
Continue here: 11 Independence of Multiple Events, Random Variables