Lecture from: 27.03.2025 | Video: Homelab
Recap: Random Variables and Expectation
Just to quickly refresh:
A random variable is a function that assigns a numerical value to each outcome in our sample space.
The expected value is defined as $\mathbb{E}[X] = \sum_{x} x \cdot \Pr[X = x]$ or, more fundamentally, $\mathbb{E}[X] = \sum_{\omega \in \Omega} X(\omega) \cdot \Pr[\omega]$.
The superstar property is linearity of expectation: For random variables $X_1, \dots, X_n$ and constants $a_1, \dots, a_n$: $\mathbb{E}[a_1 X_1 + \dots + a_n X_n] = a_1 \mathbb{E}[X_1] + \dots + a_n \mathbb{E}[X_n]$.
This holds regardless of whether the $X_i$'s are independent!
And for an indicator variable $X_A$ for event $A$ ($X_A = 1$ if $A$ occurs, 0 otherwise), we have $\mathbb{E}[X_A] = \Pr[A]$.
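A quick numerical sanity check (my own example, not from the lecture): on the small probability space of two fair dice, linearity of expectation and $\mathbb{E}[X_A] = \Pr[A]$ can be verified directly.

```python
import itertools

# Sample space: all ordered outcomes of rolling two fair dice.
omega = list(itertools.product(range(1, 7), repeat=2))
pr = 1 / len(omega)  # uniform probability of each outcome

# X1, X2 = value of the first / second die; X = X1 + X2.
E_X1 = sum(o[0] * pr for o in omega)
E_X2 = sum(o[1] * pr for o in omega)
E_X = sum((o[0] + o[1]) * pr for o in omega)
print(E_X, E_X1 + E_X2)  # linearity: both are 7.0

# Indicator of the event A = "doubles": E[X_A] = Pr[A] = 6/36.
E_indicator = sum((1 if o[0] == o[1] else 0) * pr for o in omega)
print(E_indicator, 6 / 36)  # both 0.1666...
```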
Analyzing Randomized QuickSort
The task is to sort $n$ elements. QuickSort is a popular algorithm for this. In the randomized version, we pick the pivot element uniformly at random from the current subarray.
Algorithm QUICKSORT(A, l, r):
- If $l < r$ then:
  - Choose a pivot $p$ uniformly at random from the elements $A[l], \dots, A[r]$.
  - Partition $A[l..r]$ around $p$ such that elements $< p$ are to its left, and elements $> p$ are to its right. Let $p$ end up at index $t$. (So $A[l..t-1] < p$ and $A[t+1..r] > p$.)
  - QUICKSORT(A, l, t-1) (recursive call on the "left block")
  - QUICKSORT(A, t+1, r) (recursive call on the "right block")
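As a minimal runnable sketch of the pseudocode above (the in-place partition scheme, the function name `quicksort`, and the comparison counter are my own choices, not from the slides):

```python
import random

def quicksort(A, l, r, counter):
    """Sort A[l..r] in place; counter[0] accumulates element comparisons."""
    if l < r:
        # Choose the pivot uniformly at random from A[l..r] and move it to the end.
        p_idx = random.randint(l, r)
        A[p_idx], A[r] = A[r], A[p_idx]
        pivot = A[r]
        # Partition: elements < pivot go to the left part.
        t = l
        for i in range(l, r):
            counter[0] += 1          # one comparison with the pivot
            if A[i] < pivot:
                A[i], A[t] = A[t], A[i]
                t += 1
        A[t], A[r] = A[r], A[t]      # place the pivot at index t
        quicksort(A, l, t - 1, counter)   # left block
        quicksort(A, t + 1, r, counter)   # right block

A = [5, 2, 9, 1, 7, 3]
cnt = [0]
quicksort(A, 0, len(A) - 1, cnt)
print(A, cnt[0])   # sorted array and number of comparisons made
```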
The running time of QuickSort is dominated by the number of comparisons made. Let $X$ be the random variable representing the total number of comparisons made by Randomized QuickSort on an input array of length $n$. From basic algorithms courses, we know the worst-case runtime is $\Theta(n^2)$.
We want to find $\mathbb{E}[X]$. The known result is $\mathbb{E}[X] = O(n \log n)$ (roughly $2n \ln n$).
Analyzing Comparisons using Indicator Variables
Let the input array be $A$, and let $a_1 < a_2 < \dots < a_n$ be the elements of $A$ in sorted order. (These are the values, not necessarily their initial positions).
Let $X$ be the total number of comparisons.
We can express $X$ as a sum of indicator variables. For every pair of distinct elements $a_i, a_j$ from the sorted list, let $X_{ij}$ be the indicator variable for the event "$a_i$ is compared with $a_j$ during the execution of QuickSort". We can assume $i < j$ without loss of generality, as comparison is symmetric. Then $X = \sum_{1 \le i < j \le n} X_{ij}$.
By linearity of expectation: $\mathbb{E}[X] = \mathbb{E}\left[\sum_{1 \le i < j \le n} X_{ij}\right] = \sum_{1 \le i < j \le n} \mathbb{E}[X_{ij}]$.
And since $X_{ij}$ is an indicator variable, $\mathbb{E}[X_{ij}] = \Pr[X_{ij} = 1]$. So, $\mathbb{E}[X] = \sum_{1 \le i < j \le n} \Pr[X_{ij} = 1]$.
Key Observations about Comparisons in QuickSort
- A pivot element is compared with all other elements currently in its own block.
- Once an element is chosen as a pivot, it is never part of any future block and is never compared again.
- Two elements $a_i$ and $a_j$ (with $i < j$) are compared if and only if one of them is the first element to be chosen as a pivot from the set $S_{ij} = \{a_i, a_{i+1}, \dots, a_j\}$.
- Why? Consider the set of elements $S_{ij} = \{a_i, a_{i+1}, \dots, a_j\}$. These elements are all initially in the same block.
- If an element $a_k$ with $i < k < j$ is chosen as the pivot before either $a_i$ or $a_j$ is chosen as a pivot, then $a_i$ will go to the left sub-block (since $a_i < a_k$) and $a_j$ will go to the right sub-block (since $a_j > a_k$). Once $a_i$ and $a_j$ are in different sub-blocks, they will never be compared.
- If $a_i$ is the first pivot chosen from $S_{ij}$, then $a_i$ will be compared to all other elements of $S_{ij}$ that are still in its block, including $a_j$.
- If $a_j$ is the first pivot chosen from $S_{ij}$, then $a_j$ will be compared to all other elements of $S_{ij}$ that are still in its block, including $a_i$.
So, $\Pr[X_{ij} = 1]$ is the probability that, out of the set $S_{ij}$, either $a_i$ or $a_j$ is chosen as the pivot when that set (or a superset containing it) is first partitioned by one of its members.
The elements in $S_{ij}$ are $a_i, a_{i+1}, \dots, a_j$. There are $j - i + 1$ such elements.
When the block containing all of $S_{ij}$ is partitioned for the first time by choosing a pivot from $S_{ij}$, each of these $j - i + 1$ elements has an equal chance of being chosen as that first pivot from this set. $a_i$ and $a_j$ will be compared if and only if $a_i$ is picked first from $S_{ij}$ or $a_j$ is picked first from $S_{ij}$. There are 2 "favorable" choices (either $a_i$ or $a_j$) out of $j - i + 1$ possibilities for being the first pivot from this set.
Therefore, $\Pr[X_{ij} = 1] = \frac{2}{j - i + 1}$.
Now we can calculate $\mathbb{E}[X]$:
$$\mathbb{E}[X] = \sum_{1 \le i < j \le n} \frac{2}{j - i + 1} = 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{1}{j - i + 1} = 2 \sum_{i=1}^{n-1} \sum_{k=2}^{n-i+1} \frac{1}{k} \le 2 \sum_{i=1}^{n-1} \sum_{k=2}^{n} \frac{1}{k} \le 2n \cdot H_n \le 2n (\ln n + 1) = O(n \log n).$$
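As an empirical check, here is a self-contained sketch (my own; the recursive counting function mirrors the observation that each partition compares the pivot with every other element of its block) comparing the exact sum $2\sum_{i<j}\frac{1}{j-i+1}$ with simulated comparison counts:

```python
import random, math

def count_comparisons(elems):
    """Simulate randomized QuickSort on distinct keys and return the
    number of comparisons: |block| - 1 per partition, plus recursion."""
    if len(elems) <= 1:
        return 0
    pivot = random.choice(elems)
    left = [x for x in elems if x < pivot]
    right = [x for x in elems if x > pivot]
    return len(elems) - 1 + count_comparisons(left) + count_comparisons(right)

n, runs = 200, 200
exact = 2 * sum(1 / (j - i + 1) for i in range(1, n) for j in range(i + 1, n + 1))
empirical = sum(count_comparisons(list(range(n))) for _ in range(runs)) / runs
print(exact, empirical)             # these two should be close
print(2 * n * (math.log(n) + 1))    # the upper bound from the calculation above
```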
More on Indicator Variables
We've seen $X_A(\omega) = 1$ if $\omega \in A$, and $X_A(\omega) = 0$ otherwise. How do indicator variables behave with set operations?
- Intersection: $X_{A \cap B} = X_A \cdot X_B$. An outcome $\omega$ is in $A \cap B$ if it's in $A$ AND it's in $B$. $X_{A \cap B}(\omega) = 1$ iff $X_A(\omega) = 1$ AND $X_B(\omega) = 1$. So, $X_{A \cap B} = X_A \cdot X_B$. (Product of indicator variables)
- Complement: $X_{\bar{A}} = 1 - X_A$. $X_{\bar{A}}(\omega) = 1$ iff $X_A(\omega) = 0$. So, $X_{\bar{A}} = 1 - X_A$.
- Union: $X_{A \cup B}$. What is $X_{A \cup B}$? We know $\omega \in A \cup B$ means $\omega \in A$ OR $\omega \in B$ (or both). Using De Morgan's laws and complement: $X_{A \cup B} = 1 - X_{\overline{A \cup B}} = 1 - X_{\bar{A} \cap \bar{B}} = 1 - (1 - X_A)(1 - X_B) = X_A + X_B - X_A X_B$. So, $X_{A \cup B} = X_A + X_B - X_A X_B$. Alternatively, directly from the inclusion-exclusion principle for probabilities, $\Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B]$, and $\mathbb{E}[X_A] = \Pr[A]$: $\mathbb{E}[X_{A \cup B}] = \mathbb{E}[X_A] + \mathbb{E}[X_B] - \mathbb{E}[X_A X_B]$. Since this holds for all probability distributions, it suggests $X_{A \cup B} = X_A + X_B - X_A X_B$. Let's check: if $\omega \in A \cap B$: $1 + 1 - 1 = 1$. If $\omega \in A \setminus B$: $1 + 0 - 0 = 1$. The remaining cases are analogous. Correct.
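Since each identity only involves the values 0 and 1, a brute-force check over all cases is easy; this tiny sketch (my own, not from the lecture) verifies the three identities above:

```python
# Brute-force check of the indicator identities over all truth assignments.
for a in (0, 1):          # a = X_A(omega)
    for b in (0, 1):      # b = X_B(omega)
        assert a * b == (1 if (a and b) else 0)              # intersection
        assert 1 - a == (0 if a else 1)                      # complement
        assert a + b - a * b == (1 if (a or b) else 0)       # union
print("indicator identities hold in all cases")
```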
Principle of Inclusion-Exclusion via Indicator Variables
Let $B = A_1 \cup A_2 \cup \dots \cup A_n$. We want $\Pr[B]$.
The complement is $\bar{B} = \bar{A_1} \cap \bar{A_2} \cap \dots \cap \bar{A_n}$.
So, $X_{\bar{B}} = X_{\bar{A_1}} X_{\bar{A_2}} \cdots X_{\bar{A_n}} = (1 - X_{A_1})(1 - X_{A_2}) \cdots (1 - X_{A_n})$.
Expanding this product: $X_{\bar{B}} = 1 - \sum_i X_{A_i} + \sum_{i < j} X_{A_i} X_{A_j} - \dots + (-1)^n X_{A_1} X_{A_2} \cdots X_{A_n}$.
Now, $X_B = 1 - X_{\bar{B}} = \sum_i X_{A_i} - \sum_{i < j} X_{A_i} X_{A_j} + \dots + (-1)^{n+1} X_{A_1} X_{A_2} \cdots X_{A_n}$. Note that $X_{A_{i_1}} \cdots X_{A_{i_k}} = X_{A_{i_1} \cap \dots \cap A_{i_k}}$. Taking the expectation of both sides and using $\mathbb{E}[X_A] = \Pr[A]$ and linearity: $\Pr[B] = \sum_i \Pr[A_i] - \sum_{i < j} \Pr[A_i \cap A_j] + \dots + (-1)^{n+1} \Pr[A_1 \cap \dots \cap A_n]$. This is exactly the Principle of Inclusion-Exclusion!
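A small sanity check (my own construction): on a uniform probability space with a few hand-picked events, computing $\Pr[A_1 \cup A_2 \cup A_3]$ directly and via the inclusion-exclusion sum gives the same value.

```python
import itertools

# Uniform probability space over {0,...,9}; three arbitrary events.
omega = set(range(10))
events = [{1, 2, 3, 4}, {3, 4, 5, 6}, {0, 4, 6, 8}]
pr = lambda S: len(S) / len(omega)

# Left-hand side: Pr[A1 ∪ A2 ∪ A3] computed directly.
lhs = pr(set().union(*events))

# Right-hand side: inclusion-exclusion, summing over nonempty index sets.
rhs = 0.0
for k in range(1, len(events) + 1):
    for idx in itertools.combinations(range(len(events)), k):
        inter = set.intersection(*(events[i] for i in idx))
        rhs += (-1) ** (k + 1) * pr(inter)

print(lhs, rhs)  # both 0.8
```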
Common Discrete Probability Distributions
We’ll now look at a few named discrete distributions that appear frequently.
Bernoulli Distribution
TLDR: success or failure + independence
An indicator variable follows a Bernoulli distribution: if an experiment has only two outcomes, "success" (with probability $p$) and "failure" (with probability $1 - p$), then a random variable that is 1 for success and 0 for failure is a Bernoulli random variable.
We write $X \sim \mathrm{Bernoulli}(p)$.
The event $\{X = 1\}$ is "success", so $\Pr[X = 1] = p$.
PMF (German: Dichtefunktion): $\Pr[X = 1] = p$ and $\Pr[X = 0] = 1 - p$.
Expected Value: $\mathbb{E}[X] = 1 \cdot p + 0 \cdot (1 - p) = p$.
Binomial Distribution
TLDR: n independent bernoulli with n choose k wins
Suppose we perform $n$ independent Bernoulli trials, each with the same success probability $p$. Let $X$ be the total number of successes in these $n$ trials. Then $X$ follows a Binomial distribution, written $X \sim \mathrm{Bin}(n, p)$.
To find $\Pr[X = k]$ (probability of exactly $k$ successes in $n$ trials):
- Choose $k$ positions out of $n$ for the successes: $\binom{n}{k}$ ways.
- The probability of one specific sequence of $k$ successes and $n - k$ failures is $p^k (1-p)^{n-k}$ (due to independence).
PMF: $\Pr[X = k] = \binom{n}{k} p^k (1-p)^{n-k}$ for $k = 0, 1, \dots, n$.
Expected Value: If $X = X_1 + X_2 + \dots + X_n$, where the $X_i$ are independent indicator variables for success on trial $i$, then by linearity of expectation: $\mathbb{E}[X] = \sum_{i=1}^{n} \mathbb{E}[X_i] = \sum_{i=1}^{n} p = np$. So, $\mathbb{E}[X] = np$.
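A short sketch (my own, standard library only; the parameters $n = 20$, $p = 0.3$ are arbitrary) that evaluates the Binomial PMF, checks that it sums to 1 with mean $np$, and compares against a simulation of $n$ Bernoulli trials:

```python
import math, random

n, p = 20, 0.3

# PMF: Pr[X = k] = C(n, k) p^k (1-p)^(n-k); these values sum to 1.
pmf = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
print(sum(pmf))                                       # ~1.0
print(sum(k * q for k, q in enumerate(pmf)), n * p)   # E[X] = np = 6.0

# Simulation: X as a sum of n independent Bernoulli(p) trials.
trials = 100_000
mean = sum(sum(random.random() < p for _ in range(n)) for _ in range(trials)) / trials
print(mean)                                           # close to 6.0
```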
Poisson Distribution (Balls and Bins)
TLDR: Limit of Binomial
Consider throwing $n$ balls randomly and independently into $n$ bins.
Let $X$ be the number of balls in the first bin. Each ball lands in the first bin with probability $\frac{1}{n}$. So, $X \sim \mathrm{Bin}(n, \frac{1}{n})$.
$\Pr[X = k] = \binom{n}{k} \left(\frac{1}{n}\right)^k \left(1 - \frac{1}{n}\right)^{n-k}$.
What happens to $\Pr[X = k]$ as $n \to \infty$?
As $n \to \infty$:
- $\frac{n(n-1)\cdots(n-k+1)}{n^k} \to 1$ for fixed $k$. So the product $\binom{n}{k} \left(\frac{1}{n}\right)^k = \frac{n(n-1)\cdots(n-k+1)}{k!\, n^k} \to \frac{1}{k!}$.
- $\left(1 - \frac{1}{n}\right)^{n-k} = \left(1 - \frac{1}{n}\right)^{n} \cdot \left(1 - \frac{1}{n}\right)^{-k}$. We know $\left(1 - \frac{1}{n}\right)^{n} \to e^{-1}$. And $\left(1 - \frac{1}{n}\right)^{-k} \to 1$.
So, $\Pr[X = k] \to \frac{1}{k!} e^{-1}$.
More generally, if $X \sim \mathrm{Bin}(n, \frac{\lambda}{n})$ (so $\mathbb{E}[X] = \lambda$), then as $n \to \infty$: $\Pr[X = k] \to \frac{\lambda^k}{k!} e^{-\lambda}$.
This is the PMF of a Poisson distribution with parameter $\lambda$. We write $X \sim \mathrm{Po}(\lambda)$. $\mathbb{E}[X] = \lambda$.
The Poisson distribution models the number of rare events occurring in a fixed interval of time or space, if these events occur with a known constant mean rate and independently of the time since the last event.
Example: Number of heart attacks in Switzerland in the next hour.
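To see the limit numerically, the following sketch (my own; $\lambda = 2$ and $k = 3$ are arbitrary) compares $\Pr[X = k]$ for $X \sim \mathrm{Bin}(n, \lambda/n)$ with the Poisson value $\frac{\lambda^k}{k!} e^{-\lambda}$ for growing $n$:

```python
import math

lam, k = 2.0, 3   # compare Pr[X = k] for X ~ Bin(n, lam/n) vs Po(lam)

poisson = math.exp(-lam) * lam**k / math.factorial(k)
for n in (10, 100, 1000, 10_000):
    p = lam / n
    binom = math.comb(n, k) * p**k * (1 - p)**(n - k)
    print(n, binom, poisson)   # the binomial value approaches the Poisson value
```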
Geometric Distribution
TLDR: bernoulli until first success + memoryless
Suppose we perform independent Bernoulli trials, each with success probability $p$. Let $X$ be the number of trials until the first success occurs.
Then $X$ follows a Geometric distribution, written $X \sim \mathrm{Geo}(p)$.
Example: Keep flipping a coin (success = Head, prob $p = \frac{1}{2}$) until the first Head appears. $X$ is the number of flips.
For $X = k$ (first success on the $k$-th trial), we must have $k - 1$ failures followed by one success: Sequence: F F … F S ($k-1$ F's, then S)
Probability: $(1-p)^{k-1} p$.
PMF: $\Pr[X = k] = (1-p)^{k-1} p$ for $k = 1, 2, 3, \dots$
Expected Value: $\mathbb{E}[X] = \frac{1}{p}$.
Intuitively, if the success probability is $p = \frac{1}{10}$, you expect to wait 10 trials.
CDF: $\Pr[X \le k] = 1 - \Pr[X > k]$. $X > k$ means the first $k$ trials were all failures. So, $\Pr[X > k] = (1-p)^k$. Thus, $\Pr[X \le k] = 1 - (1-p)^k$ for $k \ge 0$.
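A quick simulation sketch (my own; $p = 0.2$ is arbitrary) that samples $\mathrm{Geo}(p)$ by repeating Bernoulli trials until the first success and checks $\mathbb{E}[X] = 1/p$ and $\Pr[X > k] = (1-p)^k$:

```python
import random

p = 0.2
trials = 100_000

def geometric(p):
    """Number of Bernoulli(p) trials until (and including) the first success."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

samples = [geometric(p) for _ in range(trials)]
print(sum(samples) / trials, 1 / p)                      # mean ≈ 1/p = 5
k = 3
print(sum(s > k for s in samples) / trials, (1 - p)**k)  # Pr[X > 3] ≈ 0.512
```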
Memorylessness Property
A key property of the Geometric distribution is memorylessness.
It states that for all $k, m \ge 0$: $\Pr[X > k + m \mid X > k] = \Pr[X > m]$.
(The slide states this in a slightly different but equivalent form; here we use the formulation via the tail probabilities $\Pr[X > k]$ from the CDF above.)
Let's use this standard definition: If you've already waited $k$ trials without success, the probability that you have to wait more than $m$ additional trials for the first success is the same as the probability that you had to wait more than $m$ trials from the very beginning. The process "forgets" how long it has already waited. Indeed, $\Pr[X > k + m \mid X > k] = \frac{(1-p)^{k+m}}{(1-p)^k} = (1-p)^m = \Pr[X > m]$.
Example: If you've flipped 1000 tails in a row, the probability of getting a Head on the 1001st flip is still $\frac{1}{2}$, same as on the 1st flip.
Negative Binomial Distribution
TLDR: n-th successful bernoulli (i.e. generalize geometric)
This generalizes the Geometric distribution.
Let $X$ be the number of trials until the $n$-th success occurs in a sequence of independent Bernoulli trials (success prob $p$).
Then $X$ follows a Negative Binomial distribution. (The slide calls it $\mathrm{NegativeBinomial}(n)$, but often it's parameterized by $n$ and $p$, like $\mathrm{NB}(n, p)$.)
To have the $n$-th success on the $k$-th trial:
- The $k$-th trial must be a success.
- In the first $k - 1$ trials, there must have been exactly $n - 1$ successes.
The number of ways to arrange $n - 1$ successes in $k - 1$ trials is $\binom{k-1}{n-1}$. The probability of any such specific sequence of $n - 1$ successes (and $k - n$ failures) in the first $k - 1$ trials is $p^{n-1} (1-p)^{k-n}$. Then multiply by $p$ for the success on the $k$-th trial.
PMF: $\Pr[X = k] = \binom{k-1}{n-1} p^{n} (1-p)^{k-n}$ for $k = n, n+1, \dots$
Expected Value: $\mathbb{E}[X] = \frac{n}{p}$. If $X_i$ is the time to the $i$-th success after the $(i-1)$-th success, then $X_i \sim \mathrm{Geo}(p)$ and $X = X_1 + \dots + X_n$, so $\mathbb{E}[X] = \sum_{i=1}^{n} \mathbb{E}[X_i] = \frac{n}{p}$.
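The same decomposition in simulation form (my own sketch; $n = 5$, $p = 0.25$ are arbitrary): counting trials until the $n$-th success and checking that the empirical mean is close to $n/p$.

```python
import random

n, p = 5, 0.25   # wait for the 5th success of Bernoulli(0.25) trials

def trials_until_nth_success(n, p):
    """Number of trials until the n-th success (a sum of n Geo(p) waits)."""
    count, successes = 0, 0
    while successes < n:
        count += 1
        if random.random() < p:
            successes += 1
    return count

runs = 50_000
mean = sum(trials_until_nth_success(n, p) for _ in range(runs)) / runs
print(mean, n / p)   # empirical mean ≈ n/p = 20
```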
Coupon Collector’s Problem
There are $n$ different types of coupons (e.g., pictures in cereal boxes). In each round (e.g., buying a box), you get one coupon chosen uniformly at random from the $n$ types (with replacement).
Let $X$ be the number of rounds until you have collected at least one of each of the $n$ types of coupons. Question: What is $\mathbb{E}[X]$?
Solution Approach
We divide the process into $n$ phases.
Phase $i$: The time (number of rounds) spent while you already have $i - 1$ distinct coupon types and are waiting to collect the $i$-th new distinct type.
Let $X_i$ be the number of rounds in Phase $i$.
The total number of rounds is $X = X_1 + X_2 + \dots + X_n$. By linearity of expectation, $\mathbb{E}[X] = \sum_{i=1}^{n} \mathbb{E}[X_i]$.
Consider Phase $i$: You currently have $i - 1$ distinct coupon types. In any round, the probability of getting any specific coupon is $\frac{1}{n}$. There are $n - (i - 1) = n - i + 1$ coupon types that you don't yet have.
So, the probability of getting a new coupon type (a success for this phase) in any given round is $p_i = \frac{n - i + 1}{n}$.
The number of rounds to get this $i$-th new type is therefore Geometrically distributed: $X_i \sim \mathrm{Geo}\left(\frac{n - i + 1}{n}\right)$. So, $\mathbb{E}[X_i] = \frac{n}{n - i + 1}$.
Expectation
Now, sum the expectations:
$\mathbb{E}[X] = \sum_{i=1}^{n} \mathbb{E}[X_i] = \sum_{i=1}^{n} \frac{n}{n - i + 1}$.
Let $j = n - i + 1$. As $i$ goes from $1$ to $n$, $j$ goes from $n$ down to $1$.
$\mathbb{E}[X] = \sum_{j=1}^{n} \frac{n}{j} = n \sum_{j=1}^{n} \frac{1}{j} = n H_n$.
Since $H_n = \ln n + \gamma + o(1)$ (where $\gamma \approx 0.5772$ is the Euler–Mascheroni constant),
$\mathbb{E}[X] = n H_n \approx n \ln n + \gamma n = \Theta(n \log n)$.
So, on average, you need about $n \ln n$ rounds to collect all $n$ coupons. For example, if $n = 100$, then $\mathbb{E}[X] = 100 \cdot H_{100} \approx 519$.
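A closing simulation sketch (my own; $n = 100$ and the run count are arbitrary): drawing coupons until all types are seen and comparing the empirical average to $n H_n$ and to the approximation $n \ln n$.

```python
import random, math

def coupon_collector_rounds(n):
    """Draw uniform coupons with replacement until all n types are seen."""
    seen, rounds = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        rounds += 1
    return rounds

n, runs = 100, 2000
empirical = sum(coupon_collector_rounds(n) for _ in range(runs)) / runs
harmonic = n * sum(1 / j for j in range(1, n + 1))   # n * H_n
print(empirical, harmonic, n * math.log(n))          # ≈ 519 vs ≈ 461 (n ln n slightly underestimates)
```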