Lecture from: 27.03.2025 | Video: Homelab
Recap: Random Variables and Expectation
Just to quickly refresh:
A random variable is a function that assigns a numerical value to each outcome in our sample space.
The expected value is defined as $\mathbb{E}[X] = \sum_{x} x \cdot \Pr[X = x]$ or, more fundamentally, $\mathbb{E}[X] = \sum_{\omega \in \Omega} X(\omega) \cdot \Pr[\omega]$.
The superstar property is linearity of expectation: For random variables $X_1, \dots, X_n$ and constants $a_1, \dots, a_n$: $\mathbb{E}[a_1 X_1 + \dots + a_n X_n] = a_1 \mathbb{E}[X_1] + \dots + a_n \mathbb{E}[X_n]$.
This holds regardless of whether the $X_i$'s are independent!
And for an indicator variable $X_A$ for event $A$ ($X_A = 1$ if $A$ occurs, 0 otherwise), we have $\mathbb{E}[X_A] = \Pr[A]$.
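A quick numerical sanity check (my own example, not from the lecture): on the small probability space of two fair dice, linearity of expectation and $\mathbb{E}[X_A] = \Pr[A]$ can be verified directly.

```python
import itertools

# Sample space: all ordered outcomes of rolling two fair dice.
omega = list(itertools.product(range(1, 7), repeat=2))
pr = 1 / len(omega)  # uniform probability of each outcome

# X1, X2 = value of the first / second die; X = X1 + X2.
E_X1 = sum(o[0] * pr for o in omega)
E_X2 = sum(o[1] * pr for o in omega)
E_X = sum((o[0] + o[1]) * pr for o in omega)
print(E_X, E_X1 + E_X2)  # linearity: both are 7.0

# Indicator of the event A = "doubles": E[X_A] = Pr[A] = 6/36.
E_indicator = sum((1 if o[0] == o[1] else 0) * pr for o in omega)
print(E_indicator, 6 / 36)  # both 0.1666...
```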
Analyzing Randomized QuickSort
The task is to sort $n$ elements. QuickSort is a popular algorithm for this. In the randomized version, we pick the pivot element uniformly at random from the current subarray.
Algorithm QUICKSORT(A, l, r):
- If $l < r$ then:
  - Choose a pivot $p$ uniformly at random from the elements $A[l], \dots, A[r]$.
  - Partition $A[l..r]$ around $p$ such that elements $< p$ are to its left, and elements $> p$ are to its right. Let $p$ end up at index $t$. (So $A[l..t-1] < p$ and $A[t+1..r] > p$.)
  - QUICKSORT(A, l, t-1) (recursive call on the "left block")
  - QUICKSORT(A, t+1, r) (recursive call on the "right block")
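As a minimal runnable sketch of the pseudocode above (the in-place partition scheme, the function name `quicksort`, and the comparison counter are my own choices, not from the slides):

```python
import random

def quicksort(A, l, r, counter):
    """Sort A[l..r] in place; counter[0] accumulates element comparisons."""
    if l < r:
        # Choose the pivot uniformly at random from A[l..r] and move it to the end.
        p_idx = random.randint(l, r)
        A[p_idx], A[r] = A[r], A[p_idx]
        pivot = A[r]
        # Partition: elements < pivot go to the left part.
        t = l
        for i in range(l, r):
            counter[0] += 1          # one comparison with the pivot
            if A[i] < pivot:
                A[i], A[t] = A[t], A[i]
                t += 1
        A[t], A[r] = A[r], A[t]      # place the pivot at index t
        quicksort(A, l, t - 1, counter)   # left block
        quicksort(A, t + 1, r, counter)   # right block

A = [5, 2, 9, 1, 7, 3]
cnt = [0]
quicksort(A, 0, len(A) - 1, cnt)
print(A, cnt[0])   # sorted array and number of comparisons made
```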
The running time of QuickSort is dominated by the number of comparisons made. Let $X$ be the random variable representing the total number of comparisons made by Randomized QuickSort on an input array of length $n$. From basic algorithms courses, we know the worst-case runtime is $\Theta(n^2)$.
We want to find $\mathbb{E}[X]$. The known result is $\mathbb{E}[X] = O(n \log n)$ (roughly $2n \ln n$).
Analyzing Comparisons using Indicator Variables
Let the input array be $A$, and let $a_1 < a_2 < \dots < a_n$ be the elements of $A$ in sorted order. (These are the values, not necessarily their initial positions).
Let $X$ be the total number of comparisons.
We can express $X$ as a sum of indicator variables. For every pair of distinct elements $a_i, a_j$ from the sorted list, let $X_{ij}$ be the indicator variable for the event "$a_i$ is compared with $a_j$ during the execution of QuickSort". We can assume $i < j$ without loss of generality, as comparison is symmetric. Then $X = \sum_{1 \le i < j \le n} X_{ij}$.
By linearity of expectation: $\mathbb{E}[X] = \mathbb{E}\left[\sum_{1 \le i < j \le n} X_{ij}\right] = \sum_{1 \le i < j \le n} \mathbb{E}[X_{ij}]$.
And since $X_{ij}$ is an indicator variable, $\mathbb{E}[X_{ij}] = \Pr[X_{ij} = 1]$. So, $\mathbb{E}[X] = \sum_{1 \le i < j \le n} \Pr[X_{ij} = 1]$.
Key Observations about Comparisons in QuickSort
- A pivot element is compared with all other elements currently in its own block.
- Once an element is chosen as a pivot, it is never part of any future block and is never compared again.
- Two elements $a_i$ and $a_j$ (with $i < j$) are compared if and only if one of them is the first element to be chosen as a pivot from the set $S_{ij} = \{a_i, a_{i+1}, \dots, a_j\}$.
- Why? Consider the set of elements $S_{ij} = \{a_i, a_{i+1}, \dots, a_j\}$. These elements are all initially in the same block.
- If an element $a_k$ with $i < k < j$ is chosen as the pivot before either $a_i$ or $a_j$ is chosen as a pivot, then $a_i$ will go to the left sub-block (since $a_i < a_k$) and $a_j$ will go to the right sub-block (since $a_j > a_k$). Once $a_i$ and $a_j$ are in different sub-blocks, they will never be compared.
- If $a_i$ is the first pivot chosen from $S_{ij}$, then $a_i$ will be compared to all other elements of $S_{ij}$ that are still in its block, including $a_j$.
- If $a_j$ is the first pivot chosen from $S_{ij}$, then $a_j$ will be compared to all other elements of $S_{ij}$ that are still in its block, including $a_i$.
So, $\Pr[X_{ij} = 1]$ is the probability that, out of the set $S_{ij}$, either $a_i$ or $a_j$ is chosen as the pivot when that set (or a superset containing it) is first partitioned by one of its members.
The elements in $S_{ij}$ are $a_i, a_{i+1}, \dots, a_j$. There are $j - i + 1$ such elements.
When the block containing all of $S_{ij}$ is partitioned for the first time by choosing a pivot from $S_{ij}$, each of these $j - i + 1$ elements has an equal chance of being chosen as that first pivot from this set. $a_i$ and $a_j$ will be compared if and only if $a_i$ is picked first from $S_{ij}$ or $a_j$ is picked first from $S_{ij}$. There are 2 "favorable" choices (either $a_i$ or $a_j$) out of $j - i + 1$ possibilities for being the first pivot from this set.
Therefore, $\Pr[X_{ij} = 1] = \frac{2}{j - i + 1}$.
Now we can calculate $\mathbb{E}[X]$:
$$\mathbb{E}[X] = \sum_{1 \le i < j \le n} \frac{2}{j - i + 1} = 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{1}{j - i + 1} = 2 \sum_{i=1}^{n-1} \sum_{k=2}^{n-i+1} \frac{1}{k} \le 2 \sum_{i=1}^{n-1} \sum_{k=2}^{n} \frac{1}{k} \le 2n \cdot H_n \le 2n (\ln n + 1) = O(n \log n).$$
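As an empirical check, here is a self-contained sketch (my own; the recursive counting function mirrors the observation that each partition compares the pivot with every other element of its block) comparing the exact sum $2\sum_{i<j}\frac{1}{j-i+1}$ with simulated comparison counts:

```python
import random, math

def count_comparisons(elems):
    """Simulate randomized QuickSort on distinct keys and return the
    number of comparisons: |block| - 1 per partition, plus recursion."""
    if len(elems) <= 1:
        return 0
    pivot = random.choice(elems)
    left = [x for x in elems if x < pivot]
    right = [x for x in elems if x > pivot]
    return len(elems) - 1 + count_comparisons(left) + count_comparisons(right)

n, runs = 200, 200
exact = 2 * sum(1 / (j - i + 1) for i in range(1, n) for j in range(i + 1, n + 1))
empirical = sum(count_comparisons(list(range(n))) for _ in range(runs)) / runs
print(exact, empirical)             # these two should be close
print(2 * n * (math.log(n) + 1))    # the upper bound from the calculation above
```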
More on Indicator Variables
We've seen $X_A(\omega) = 1$ if $\omega \in A$, and $X_A(\omega) = 0$ otherwise. How do indicator variables behave with set operations?
- Intersection: $X_{A \cap B} = X_A \cdot X_B$. An outcome $\omega$ is in $A \cap B$ if it's in $A$ AND it's in $B$. $X_{A \cap B}(\omega) = 1$ iff $X_A(\omega) = 1$ AND $X_B(\omega) = 1$. So, $X_{A \cap B} = X_A \cdot X_B$. (Product of indicator variables)
- Complement: $X_{\bar{A}} = 1 - X_A$. $X_{\bar{A}}(\omega) = 1$ iff $X_A(\omega) = 0$. So, $X_{\bar{A}} = 1 - X_A$.
- Union: $X_{A \cup B}$. What is $X_{A \cup B}$? We know $\omega \in A \cup B$ means $\omega \in A$ OR $\omega \in B$ (or both). Using De Morgan's laws and complement: $X_{A \cup B} = 1 - X_{\overline{A \cup B}} = 1 - X_{\bar{A} \cap \bar{B}} = 1 - (1 - X_A)(1 - X_B) = X_A + X_B - X_A X_B$. So, $X_{A \cup B} = X_A + X_B - X_A X_B$. Alternatively, directly from the inclusion-exclusion principle for probabilities, $\Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B]$, and $\mathbb{E}[X_A] = \Pr[A]$: $\mathbb{E}[X_{A \cup B}] = \mathbb{E}[X_A] + \mathbb{E}[X_B] - \mathbb{E}[X_A X_B]$. Since this holds for all probability distributions, it suggests $X_{A \cup B} = X_A + X_B - X_A X_B$. Let's check: if $\omega \in A \cap B$: $1 + 1 - 1 = 1$. If $\omega \in A \setminus B$: $1 + 0 - 0 = 1$. The remaining cases are analogous. Correct.
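Since each identity only involves the values 0 and 1, a brute-force check over all cases is easy; this tiny sketch (my own, not from the lecture) verifies the three identities above:

```python
# Brute-force check of the indicator identities over all truth assignments.
for a in (0, 1):          # a = X_A(omega)
    for b in (0, 1):      # b = X_B(omega)
        assert a * b == (1 if (a and b) else 0)              # intersection
        assert 1 - a == (0 if a else 1)                      # complement
        assert a + b - a * b == (1 if (a or b) else 0)       # union
print("indicator identities hold in all cases")
```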
Principle of Inclusion-Exclusion via Indicator Variables
Let $B = A_1 \cup A_2 \cup \dots \cup A_n$. We want $\Pr[B]$.
The complement is $\bar{B} = \bar{A_1} \cap \bar{A_2} \cap \dots \cap \bar{A_n}$.
So, $X_{\bar{B}} = X_{\bar{A_1}} X_{\bar{A_2}} \cdots X_{\bar{A_n}} = (1 - X_{A_1})(1 - X_{A_2}) \cdots (1 - X_{A_n})$.
Expanding this product: $X_{\bar{B}} = 1 - \sum_i X_{A_i} + \sum_{i < j} X_{A_i} X_{A_j} - \dots + (-1)^n X_{A_1} X_{A_2} \cdots X_{A_n}$.
Now, $X_B = 1 - X_{\bar{B}} = \sum_i X_{A_i} - \sum_{i < j} X_{A_i} X_{A_j} + \dots + (-1)^{n+1} X_{A_1} X_{A_2} \cdots X_{A_n}$. Note that $X_{A_{i_1}} \cdots X_{A_{i_k}} = X_{A_{i_1} \cap \dots \cap A_{i_k}}$. Taking the expectation of both sides and using $\mathbb{E}[X_A] = \Pr[A]$ and linearity: $\Pr[B] = \sum_i \Pr[A_i] - \sum_{i < j} \Pr[A_i \cap A_j] + \dots + (-1)^{n+1} \Pr[A_1 \cap \dots \cap A_n]$. This is exactly the Principle of Inclusion-Exclusion!
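A small sanity check (my own construction): on a uniform probability space with a few hand-picked events, computing $\Pr[A_1 \cup A_2 \cup A_3]$ directly and via the inclusion-exclusion sum gives the same value.

```python
import itertools

# Uniform probability space over {0,...,9}; three arbitrary events.
omega = set(range(10))
events = [{1, 2, 3, 4}, {3, 4, 5, 6}, {0, 4, 6, 8}]
pr = lambda S: len(S) / len(omega)

# Left-hand side: Pr[A1 ∪ A2 ∪ A3] computed directly.
lhs = pr(set().union(*events))

# Right-hand side: inclusion-exclusion, summing over nonempty index sets.
rhs = 0.0
for k in range(1, len(events) + 1):
    for idx in itertools.combinations(range(len(events)), k):
        inter = set.intersection(*(events[i] for i in idx))
        rhs += (-1) ** (k + 1) * pr(inter)

print(lhs, rhs)  # both 0.8
```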
Common Discrete Probability Distributions
We’ll now look at a few named discrete distributions that appear frequently.
Bernoulli Distribution
TLDR: success or failure + independence
An indicator variable follows a Bernoulli distribution: if an experiment has only two outcomes, "success" (with probability $p$) and "failure" (with probability $1 - p$), then a random variable that is 1 for success and 0 for failure is a Bernoulli random variable.
We write $X \sim \mathrm{Bernoulli}(p)$.
The event $\{X = 1\}$ is "success", so $\Pr[X = 1] = p$.
PMF (German: Dichtefunktion): $\Pr[X = 1] = p$ and $\Pr[X = 0] = 1 - p$.
Expected Value: $\mathbb{E}[X] = 1 \cdot p + 0 \cdot (1 - p) = p$.
Binomial Distribution
TLDR: n independent bernoulli with n choose k wins
Suppose we perform $n$ independent Bernoulli trials, each with the same success probability $p$. Let $X$ be the total number of successes in these $n$ trials. Then $X$ follows a Binomial distribution, written $X \sim \mathrm{Bin}(n, p)$.
To find $\Pr[X = k]$ (probability of exactly $k$ successes in $n$ trials):
- Choose $k$ positions out of $n$ for the successes: $\binom{n}{k}$ ways.
- The probability of one specific sequence of $k$ successes and $n - k$ failures is $p^k (1-p)^{n-k}$ (due to independence).
PMF: $\Pr[X = k] = \binom{n}{k} p^k (1-p)^{n-k}$ for $k = 0, 1, \dots, n$.
Expected Value: If $X = X_1 + X_2 + \dots + X_n$, where the $X_i$ are independent indicator variables for success on trial $i$, then by linearity of expectation: $\mathbb{E}[X] = \sum_{i=1}^{n} \mathbb{E}[X_i] = \sum_{i=1}^{n} p = np$. So, $\mathbb{E}[X] = np$.
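A short sketch (my own, standard library only; the parameters $n = 20$, $p = 0.3$ are arbitrary) that evaluates the Binomial PMF, checks that it sums to 1 with mean $np$, and compares against a simulation of $n$ Bernoulli trials:

```python
import math, random

n, p = 20, 0.3

# PMF: Pr[X = k] = C(n, k) p^k (1-p)^(n-k); these values sum to 1.
pmf = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
print(sum(pmf))                                       # ~1.0
print(sum(k * q for k, q in enumerate(pmf)), n * p)   # E[X] = np = 6.0

# Simulation: X as a sum of n independent Bernoulli(p) trials.
trials = 100_000
mean = sum(sum(random.random() < p for _ in range(n)) for _ in range(trials)) / trials
print(mean)                                           # close to 6.0
```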
Poisson Distribution (Balls and Bins)
TLDR: Limit of Binomial
Consider throwing $n$ balls randomly and independently into $n$ bins.
Let $X$ be the number of balls in the first bin. Each ball lands in the first bin with probability $\frac{1}{n}$. So, $X \sim \mathrm{Bin}(n, \frac{1}{n})$.
$\Pr[X = k] = \binom{n}{k} \left(\frac{1}{n}\right)^k \left(1 - \frac{1}{n}\right)^{n-k}$.
What happens to $\Pr[X = k]$ as $n \to \infty$?
As $n \to \infty$:
- $\frac{n(n-1)\cdots(n-k+1)}{n^k} \to 1$ for fixed $k$. So the product $\binom{n}{k} \left(\frac{1}{n}\right)^k = \frac{n(n-1)\cdots(n-k+1)}{k!\, n^k} \to \frac{1}{k!}$.
- $\left(1 - \frac{1}{n}\right)^{n-k} = \left(1 - \frac{1}{n}\right)^{n} \cdot \left(1 - \frac{1}{n}\right)^{-k}$. We know $\left(1 - \frac{1}{n}\right)^{n} \to e^{-1}$. And $\left(1 - \frac{1}{n}\right)^{-k} \to 1$.
So, $\Pr[X = k] \to \frac{1}{k!} e^{-1}$.
More generally, if $X \sim \mathrm{Bin}(n, \frac{\lambda}{n})$ (so $\mathbb{E}[X] = \lambda$), then as $n \to \infty$: $\Pr[X = k] \to \frac{\lambda^k}{k!} e^{-\lambda}$.
This is the PMF of a Poisson distribution with parameter $\lambda$. We write $X \sim \mathrm{Po}(\lambda)$. $\mathbb{E}[X] = \lambda$.
The Poisson distribution models the number of rare events occurring in a fixed interval of time or space, if these events occur with a known constant mean rate and independently of the time since the last event.
Example: Number of heart attacks in Switzerland in the next hour.
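To see the limit numerically, the following sketch (my own; $\lambda = 2$ and $k = 3$ are arbitrary) compares $\Pr[X = k]$ for $X \sim \mathrm{Bin}(n, \lambda/n)$ with the Poisson value $\frac{\lambda^k}{k!} e^{-\lambda}$ for growing $n$:

```python
import math

lam, k = 2.0, 3   # compare Pr[X = k] for X ~ Bin(n, lam/n) vs Po(lam)

poisson = math.exp(-lam) * lam**k / math.factorial(k)
for n in (10, 100, 1000, 10_000):
    p = lam / n
    binom = math.comb(n, k) * p**k * (1 - p)**(n - k)
    print(n, binom, poisson)   # the binomial value approaches the Poisson value
```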
Geometric Distribution
TLDR: bernoulli until first success + memoryless
Suppose we perform independent Bernoulli trials, each with success probability $p$. Let $X$ be the number of trials until the first success occurs.
Then $X$ follows a Geometric distribution, written $X \sim \mathrm{Geo}(p)$.
Example: Keep flipping a coin (success = Head, prob $p = \frac{1}{2}$) until the first Head appears. $X$ is the number of flips.
For $X = k$ (first success on the $k$-th trial), we must have $k - 1$ failures followed by one success: Sequence: F F … F S ($k-1$ F's, then S)
Probability: $(1-p)^{k-1} p$.
PMF: $\Pr[X = k] = (1-p)^{k-1} p$ for $k = 1, 2, 3, \dots$
Expected Value: $\mathbb{E}[X] = \frac{1}{p}$.
Intuitively, if the success probability is $p = \frac{1}{10}$, you expect to wait 10 trials.
CDF: $\Pr[X \le k] = 1 - \Pr[X > k]$. $X > k$ means the first $k$ trials were all failures. So, $\Pr[X > k] = (1-p)^k$. Thus, $\Pr[X \le k] = 1 - (1-p)^k$ for $k \ge 0$.
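A quick simulation sketch (my own; $p = 0.2$ is arbitrary) that samples $\mathrm{Geo}(p)$ by repeating Bernoulli trials until the first success and checks $\mathbb{E}[X] = 1/p$ and $\Pr[X > k] = (1-p)^k$:

```python
import random

p = 0.2
trials = 100_000

def geometric(p):
    """Number of Bernoulli(p) trials until (and including) the first success."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

samples = [geometric(p) for _ in range(trials)]
print(sum(samples) / trials, 1 / p)                      # mean ≈ 1/p = 5
k = 3
print(sum(s > k for s in samples) / trials, (1 - p)**k)  # Pr[X > 3] ≈ 0.512
```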
Memorylessness Property
A key property of the Geometric distribution is memorylessness.
It states that for all $k, m \ge 0$: $\Pr[X > k + m \mid X > k] = \Pr[X > m]$.
(The slide states this in a slightly different but equivalent form; here we use the formulation via the tail probabilities $\Pr[X > k]$ from the CDF above.)
Let's use this standard definition: If you've already waited $k$ trials without success, the probability that you have to wait more than $m$ additional trials for the first success is the same as the probability that you had to wait more than $m$ trials from the very beginning. The process "forgets" how long it has already waited. Indeed, $\Pr[X > k + m \mid X > k] = \frac{(1-p)^{k+m}}{(1-p)^k} = (1-p)^m = \Pr[X > m]$.
Example: If you've flipped 1000 tails in a row, the probability of getting a Head on the 1001st flip is still $\frac{1}{2}$, same as on the 1st flip.
Negative Binomial Distribution
TLDR: n-th successful bernoulli (i.e. generalize geometric)
This generalizes the Geometric distribution.
Let $X$ be the number of trials until the $n$-th success occurs in a sequence of independent Bernoulli trials (success prob $p$).
Then $X$ follows a Negative Binomial distribution. (The slide calls it $\mathrm{NegativeBinomial}(n)$, but often it's parameterized by $n$ and $p$, like $\mathrm{NB}(n, p)$.)
To have the $n$-th success on the $k$-th trial:
- The $k$-th trial must be a success.
- In the first $k - 1$ trials, there must have been exactly $n - 1$ successes.
The number of ways to arrange $n - 1$ successes in $k - 1$ trials is $\binom{k-1}{n-1}$. The probability of any such specific sequence of $n - 1$ successes (and $k - n$ failures) in the first $k - 1$ trials is $p^{n-1} (1-p)^{k-n}$. Then multiply by $p$ for the success on the $k$-th trial.
PMF: $\Pr[X = k] = \binom{k-1}{n-1} p^{n} (1-p)^{k-n}$ for $k = n, n+1, \dots$
Expected Value: $\mathbb{E}[X] = \frac{n}{p}$. If $X_i$ is the time to the $i$-th success after the $(i-1)$-th success, then $X_i \sim \mathrm{Geo}(p)$ and $X = X_1 + \dots + X_n$, so $\mathbb{E}[X] = \sum_{i=1}^{n} \mathbb{E}[X_i] = \frac{n}{p}$.
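The same decomposition in simulation form (my own sketch; $n = 5$, $p = 0.25$ are arbitrary): counting trials until the $n$-th success and checking that the empirical mean is close to $n/p$.

```python
import random

n, p = 5, 0.25   # wait for the 5th success of Bernoulli(0.25) trials

def trials_until_nth_success(n, p):
    """Number of trials until the n-th success (a sum of n Geo(p) waits)."""
    count, successes = 0, 0
    while successes < n:
        count += 1
        if random.random() < p:
            successes += 1
    return count

runs = 50_000
mean = sum(trials_until_nth_success(n, p) for _ in range(runs)) / runs
print(mean, n / p)   # empirical mean ≈ n/p = 20
```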
Coupon Collector’s Problem
There are $n$ different types of coupons (e.g., pictures in cereal boxes). In each round (e.g., buying a box), you get one coupon chosen uniformly at random from the $n$ types (with replacement).
Let $X$ be the number of rounds until you have collected at least one of each of the $n$ types of coupons. Question: What is $\mathbb{E}[X]$?
Solution Approach
We divide the process into $n$ phases.
Phase $i$: The time (number of rounds) spent while you already have $i - 1$ distinct coupon types and are waiting to collect the $i$-th new distinct type.
Let $X_i$ be the number of rounds in Phase $i$.
The total number of rounds is $X = X_1 + X_2 + \dots + X_n$. By linearity of expectation, $\mathbb{E}[X] = \sum_{i=1}^{n} \mathbb{E}[X_i]$.
Consider Phase $i$: You currently have $i - 1$ distinct coupon types. In any round, the probability of getting any specific coupon is $\frac{1}{n}$. There are $n - (i - 1) = n - i + 1$ coupon types that you don't yet have.
So, the probability of getting a new coupon type (a success for this phase) in any given round is $p_i = \frac{n - i + 1}{n}$.
The number of rounds to get this $i$-th new type is therefore Geometrically distributed: $X_i \sim \mathrm{Geo}\left(\frac{n - i + 1}{n}\right)$. So, $\mathbb{E}[X_i] = \frac{n}{n - i + 1}$.
Expectation
Now, sum the expectations:
$\mathbb{E}[X] = \sum_{i=1}^{n} \mathbb{E}[X_i] = \sum_{i=1}^{n} \frac{n}{n - i + 1}$.
Let $j = n - i + 1$. As $i$ goes from $1$ to $n$, $j$ goes from $n$ down to $1$.
$\mathbb{E}[X] = \sum_{j=1}^{n} \frac{n}{j} = n \sum_{j=1}^{n} \frac{1}{j} = n H_n$.
Since $H_n = \ln n + \gamma + o(1)$ (where $\gamma \approx 0.5772$ is the Euler–Mascheroni constant),
$\mathbb{E}[X] = n H_n \approx n \ln n + \gamma n = \Theta(n \log n)$.
So, on average, you need about $n \ln n$ rounds to collect all $n$ coupons. For example, if $n = 100$, then $\mathbb{E}[X] = 100 \cdot H_{100} \approx 519$.
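A closing simulation sketch (my own; $n = 100$ and the run count are arbitrary): drawing coupons until all types are seen and comparing the empirical average to $n H_n$ and to the approximation $n \ln n$.

```python
import random, math

def coupon_collector_rounds(n):
    """Draw uniform coupons with replacement until all n types are seen."""
    seen, rounds = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        rounds += 1
    return rounds

n, runs = 100, 2000
empirical = sum(coupon_collector_rounds(n) for _ in range(runs)) / runs
harmonic = n * sum(1 / j for j in range(1, n + 1))   # n * H_n
print(empirical, harmonic, n * math.log(n))          # ≈ 519 vs ≈ 461 (n ln n slightly underestimates)
```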