Lecture from: 25.03.2025 | Video: Homelab

Independence of Events

Recalling Conditional Probability

Before we talk about independence, let's quickly remember what conditional probability is. If we have two events, say $A$ and $B$, the probability of $A$ happening given that we already know $B$ has happened is written as $\Pr[A \mid B]$. We calculate this as: $$\Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]}$$ (This formula only makes sense if $\Pr[B]$ is not zero, of course!)

Think of it like this: $\Pr[A \mid B]$ measures the probability of event $A$ when the "universe" of possibilities has shrunk down to only those outcomes where $B$ occurred.

The Idea of Independence

Now, what if knowing that event $B$ happened gives us absolutely no new information about whether event $A$ will happen? In such a case, the probability of $A$ remains the same, regardless of $B$. So, intuitively, $A$ is independent of $B$ if: $$\Pr[A \mid B] = \Pr[A]$$

If we plug this into our formula for conditional probability: $$\Pr[A] = \frac{\Pr[A \cap B]}{\Pr[B]}$$

Multiplying both sides by $\Pr[B]$ gives us the formal definition of independence.

Definition of Independence for Two Events

Two events $A$ and $B$ are said to be independent if the probability that both $A$ and $B$ occur is simply the product of their individual probabilities: $$\Pr[A \cap B] = \Pr[A] \cdot \Pr[B]$$

This definition is beautifully symmetric. If $A$ is independent of $B$, then $B$ is also independent of $A$. And it works even if one of the probabilities is zero.

A crucial point: stochastic independence (this mathematical definition) doesn’t always mean the events are “physically” or “causally” unrelated in the real world. They are independent if their probabilities behave this way.

Example: Two Dice Rolls (Revisited)

Remember our example with a red die and a black die?

Let $A$ be the event "the red die shows an even number." We found $\Pr[A] = \frac{1}{2}$. Let $B$ be the event "the sum of the two dice is 7." We found $\Pr[B] = \frac{6}{36} = \frac{1}{6}$. The event $A \cap B$ is "the red die is even AND the sum is 7." The outcomes are $(2,5), (4,3), (6,1)$, so $\Pr[A \cap B] = \frac{3}{36} = \frac{1}{12}$.

Now, let's check: $\Pr[A] \cdot \Pr[B] = \frac{1}{2} \cdot \frac{1}{6} = \frac{1}{12}$. Since $\Pr[A \cap B] = \frac{1}{12}$ is indeed equal to $\Pr[A] \cdot \Pr[B]$, the events "red die is even" and "sum is 7" are independent. Knowing the red die is even doesn't change the likelihood of the sum being 7, and vice-versa.
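Since the sample space is tiny, we can also verify this mechanically. A minimal Python sketch that enumerates all 36 outcomes:

```python
from itertools import product

# All 36 equally likely outcomes (red die, black die).
omega = list(product(range(1, 7), repeat=2))

A = {w for w in omega if w[0] % 2 == 0}      # red die shows an even number
B = {w for w in omega if w[0] + w[1] == 7}   # sum of the two dice is 7

p = lambda ev: len(ev) / len(omega)          # uniform probability

print(p(A), p(B), p(A & B))                  # 0.5 0.1666... 0.0833...
print(p(A & B) == p(A) * p(B))               # True: A and B are independent
```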

Independence of Multiple Events

What if we have more than two events, say $A$, $B$, and $C$? When do we call them independent?

Intuition: If $A$, $B$, and $C$ are independent, then learning about any combination of them shouldn't give us any information about the others. For example, knowing $B$ and $C$ happened shouldn't change the probability of $A$: $\Pr[A \mid B \cap C] = \Pr[A]$. And similarly for other combinations like $\Pr[B \mid A \cap C] = \Pr[B]$, $\Pr[C \mid A \cap B] = \Pr[C]$, etc.

Systematic Definition:

For three events $A$, $B$, and $C$ to be (mutually) independent, we need all of the following conditions to hold:

  1. $\Pr[A \cap B] = \Pr[A] \cdot \Pr[B]$ (A and B are pairwise independent)
  2. $\Pr[A \cap C] = \Pr[A] \cdot \Pr[C]$ (A and C are pairwise independent)
  3. $\Pr[B \cap C] = \Pr[B] \cdot \Pr[C]$ (B and C are pairwise independent)
  4. $\Pr[A \cap B \cap C] = \Pr[A] \cdot \Pr[B] \cdot \Pr[C]$ (The crucial one for mutual independence)

If these hold, then indeed, for example, $\Pr[A \mid B \cap C] = \frac{\Pr[A \cap B \cap C]}{\Pr[B \cap C]} = \frac{\Pr[A] \cdot \Pr[B] \cdot \Pr[C]}{\Pr[B] \cdot \Pr[C]} = \Pr[A]$.

It’s important to note that pairwise independence (conditions 1, 2, 3) does not automatically imply mutual independence (condition 4 is also needed).

Example 1: Numbers from 1 to 8 (where the triple condition holds, but pairwise independence fails)

Let's pick a number uniformly at random from $\{1, 2, \dots, 8\}$. Let $A$ = "the number is in $\{1, 2, 3, 4\}$". $\Pr[A] = \frac{1}{2}$. Let $B$ = "the number is in $\{4, 5, 6, 7\}$". $\Pr[B] = \frac{1}{2}$. Let $C$ = "the number is in $\{4, 5, 6, 7\}$". (So C is the same event as B). $\Pr[C] = \frac{1}{2}$.

Then: $A \cap B \cap C = \{4\}$. So, $\Pr[A \cap B \cap C] = \frac{1}{8}$. And $\Pr[A] \cdot \Pr[B] \cdot \Pr[C] = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{8}$. So, the condition $\Pr[A \cap B \cap C] = \Pr[A] \cdot \Pr[B] \cdot \Pr[C]$ holds.

However, $B$ and $C$ are clearly not independent, because they are the exact same event! $\Pr[B \cap C] = \Pr[B] = \frac{1}{2}$. But $\Pr[B] \cdot \Pr[C] = \frac{1}{4}$. Since $\frac{1}{2} \neq \frac{1}{4}$, $B$ and $C$ are not independent. Thus, $A, B, C$ are not mutually independent, even though the "triple intersection" formula worked out. This example highlights why all pairwise conditions (and the $n$-wise condition) are needed.

Example 2: Two Coin Flips (where pairwise holds, but not mutual)

Let's flip two fair coins. Outcomes: HH, HT, TH, TT (each with probability 1/4). Let $A$ = "first coin is Heads" $= \{HH, HT\}$. $\Pr[A] = \frac{1}{2}$. Let $B$ = "second coin is Heads" $= \{HH, TH\}$. $\Pr[B] = \frac{1}{2}$. Let $C$ = "the two coins show different results" $= \{HT, TH\}$. $\Pr[C] = \frac{1}{2}$.

Let’s check pairwise independence:

  • $A \cap B = \{HH\}$. $\Pr[A \cap B] = \frac{1}{4}$. $\Pr[A] \cdot \Pr[B] = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$. So, $A, B$ are independent.
  • $A \cap C = \{HT\}$. $\Pr[A \cap C] = \frac{1}{4}$. $\Pr[A] \cdot \Pr[C] = \frac{1}{4}$. So, $A, C$ are independent.
  • $B \cap C = \{TH\}$. $\Pr[B \cap C] = \frac{1}{4}$. $\Pr[B] \cdot \Pr[C] = \frac{1}{4}$. So, $B, C$ are independent.

So, all pairs are independent!

Now, what about $A \cap B \cap C$? This means "first is H, second is H, AND results are different". This is impossible! So $A \cap B \cap C = \emptyset$, and $\Pr[A \cap B \cap C] = 0$. But $\Pr[A] \cdot \Pr[B] \cdot \Pr[C] = \frac{1}{8}$. Since $0 \neq \frac{1}{8}$, the events are not mutually independent, even though they are pairwise independent. This is a classic example!
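The same enumeration trick makes this concrete. A minimal Python sketch of the two-coin example:

```python
from itertools import product

omega = list(product("HT", repeat=2))    # HH, HT, TH, TT

A = {w for w in omega if w[0] == "H"}    # first coin is Heads
B = {w for w in omega if w[1] == "H"}    # second coin is Heads
C = {w for w in omega if w[0] != w[1]}   # the two coins differ

p = lambda ev: len(ev) / len(omega)

# All three pairwise conditions hold ...
print(p(A & B) == p(A) * p(B))           # True
print(p(A & C) == p(A) * p(C))           # True
print(p(B & C) == p(B) * p(C))           # True

# ... but the triple condition fails: A ∩ B ∩ C is empty.
print(p(A & B & C), p(A) * p(B) * p(C))  # 0.0 0.125
```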

General Definition of Independence for Many Events

A collection of events $A_1, A_2, \dots, A_n$ are (mutually) independent if for every subcollection of these events, say $A_i$ for $i \in I$ (where $I$ is any subset of $\{1, \dots, n\}$), the probability of their intersection is the product of their individual probabilities: $$\Pr\left[\bigcap_{i \in I} A_i\right] = \prod_{i \in I} \Pr[A_i]$$

This must hold for every subset of indices of size 2 or more, including the full index set. This is a strong condition!

An infinite family of events $A_i$ (for $i \in \mathbb{N}$) is independent if this condition holds for every finite subcollection.

Some Useful Properties (Lemmas about Independence)

If events $A_1, A_2, \dots, A_n$ are independent, then:

  1. $\bar{A}_1, A_2, \dots, A_n$ are also independent (where $\bar{A}_1$ is the complement of $A_1$).
    • Proof sketch: $\Pr[\bar{A}_1 \cap B] = \Pr[B] - \Pr[A_1 \cap B] = \Pr[B] - \Pr[A_1] \cdot \Pr[B] = (1 - \Pr[A_1]) \cdot \Pr[B] = \Pr[\bar{A}_1] \cdot \Pr[B]$ (where $B$ represents the intersection of any subcollection of the $A_i$'s with $i \ge 2$).
  2. If $A, B, C$ are independent, then $A \cap B$ and $C$ are independent.
  3. If $A, B, C$ are independent, then $A \cup B$ and $C$ are independent.
    • Proof sketch for union: $\Pr[(A \cup B) \cap C] = \Pr[(A \cap C) \cup (B \cap C)]$. Using inclusion-exclusion and then independence for intersections: $\Pr[A \cap C] + \Pr[B \cap C] - \Pr[A \cap B \cap C] = \Pr[A] \cdot \Pr[C] + \Pr[B] \cdot \Pr[C] - \Pr[A] \cdot \Pr[B] \cdot \Pr[C]$ (since $A, C$ and $B, C$ are also independent as subcollections) $= (\Pr[A] + \Pr[B] - \Pr[A \cap B]) \cdot \Pr[C] = \Pr[A \cup B] \cdot \Pr[C]$.

An equivalent way to state that events $A_1, \dots, A_n$ are independent is that for any choice of $s_1, \dots, s_n$ (where $s_i \in \{0, 1\}$, and $A_i^1 := A_i$, $A_i^0 := \bar{A}_i$), the following holds: $$\Pr[A_1^{s_1} \cap \dots \cap A_n^{s_n}] = \Pr[A_1^{s_1}] \cdots \Pr[A_n^{s_n}]$$ This means you can "mix and match" events and their complements, and the product rule for intersections still applies.

Visual Cryptography: An Application of Independence (and Dependence)

Visual cryptography is a neat trick. Imagine you have a secret image. You can split it into two (or more) “shares” or transparencies. Each share by itself looks like random noise – you can’t see the image. But if you overlay the correct shares, the secret image magically appears!

How does this relate to independence? Consider a single pixel in the original secret image (either black or white). This pixel is encoded into corresponding sub-pixels on each share.

  • Within a single share: The sub-pixels making up the encoding of one original pixel might be chosen independently (or in a structured but random-looking way). Horizontally adjacent sub-pixels on a share might be independent.
  • Between shares (for a non-secret region): If you look at the sub-pixels in Share 1 corresponding to an original white pixel, and the sub-pixels in Share 2 for that same original white pixel, their combination when overlaid should still look random (or white).
  • Between shares (for a secret region): If you look at the sub-pixels in Share 1 corresponding to an original black pixel, and the sub-pixels in Share 2 for that same original black pixel, their combination when overlaid must result in black. This means the choice of sub-pixels on Share 1 depends on the choice on Share 2 (or they are chosen in a correlated way) to reveal the secret.

The “secret” lies in the dependencies created between the shares in the regions that form the image, while maintaining an appearance of randomness (independence) elsewhere and in each share individually.
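To make this concrete, here is a minimal 1-D sketch in the style of the classic Naor-Shamir 2-out-of-2 scheme (the function names and the pixel encoding are illustrative, not taken from the lecture):

```python
import random

def make_shares(secret):
    """Each secret pixel (0 = white, 1 = black) becomes a pair of
    sub-pixels on each share; 1 means a black sub-pixel."""
    share1, share2 = [], []
    for pixel in secret:
        pattern = random.choice([(0, 1), (1, 0)])  # uniform, independent of the pixel
        share1.append(pattern)
        if pixel == 0:
            share2.append(pattern)                  # white: shares agree
        else:
            share2.append((1 - pattern[0], 1 - pattern[1]))  # black: complement
    return share1, share2

def overlay(s1, s2):
    # Stacking transparencies acts like a logical OR per sub-pixel.
    return [(a | c, b | d) for (a, b), (c, d) in zip(s1, s2)]

secret = [1, 0, 1, 1, 0]  # a tiny 1-D "image"
s1, s2 = make_shares(secret)
print(overlay(s1, s2))    # black pixels give (1, 1); white pixels stay half-black
```

Each share on its own is a uniformly random pattern per pixel, independent of the secret; only the correlation between the shares encodes the image.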

Random Variables

Often, we are not interested in the elementary outcome of an experiment itself, but rather in some numerical quantity associated with that outcome. This is where random variables come in.

What is a Random Variable?

A random variable $X$ is a function $X: \Omega \to \mathbb{R}$ that maps each outcome $\omega$ in the sample space $\Omega$ to a real number.

It's "random" because the outcome $\omega$ is random, and therefore the value $X(\omega)$ is also random. It's a "variable" because it can take on different numerical values.

Examples of Random Variables

  1. Die Roll: Sample space $\Omega = \{1, 2, 3, 4, 5, 6\}$ (the face showing on a die).

    • Let $X(\omega) = \omega$ (the value of the random variable is just the number rolled).
    • Let $Y(\omega) = 1$ if $\omega$ is prime, and $Y(\omega) = 0$ otherwise. Here, $Y$ tells us if the outcome was prime or not.
    • Let $Z(\omega) = \omega^2$, for instance (a more complex function of the outcome).
  2. Coin Flips: Flip a coin 3 times. $\Omega = \{H, T\}^3 = \{HHH, HHT, \dots, TTT\}$.

    • Let $X$ be the "number of Heads". $X(HHH) = 3$, $X(HHT) = 2$, etc.

Notation for Events involving Random Variables

We often write expressions like "$X \le 5$". This is shorthand for the event consisting of all outcomes $\omega$ such that the random variable $X$ takes on a value less than or equal to 5 for that outcome. Formally: "$X \le 5$" denotes the event $\{\omega \in \Omega : X(\omega) \le 5\}$. Similarly, "$X = x$" denotes the event $\{\omega \in \Omega : X(\omega) = x\}$.

Indicator Random Variables

A very important type of random variable is an indicator random variable (or characteristic function).

For any event $A \subseteq \Omega$, its indicator random variable, often denoted $I_A$ or $X_A$, is defined as: $$I_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{if } \omega \notin A \end{cases}$$

Indicator variables are incredibly useful for translating properties of events into numerical values, especially when calculating expected values.

Probability Mass Function (PMF) / Density Function

For a discrete random variable $X$ (one that takes on a countable number of values), its probability mass function (PMF), often denoted $f_X$ or $p_X$, gives the probability that $X$ takes on a specific value $x$: $$f_X(x) = \Pr[X = x]$$

The PMF is defined for all real $x$, but it will be non-zero only for the values that $X$ can actually take. The sum of $f_X(x)$ over all possible values of $X$ must be 1.

Cumulative Distribution Function (CDF)

The cumulative distribution function (CDF) of a random variable $X$, denoted $F_X$, gives the probability that $X$ takes on a value less than or equal to $x$: $$F_X(x) = \Pr[X \le x]$$

The CDF is defined for all real $x$. It is a non-decreasing function, starting at 0 (as $x \to -\infty$) and ending at 1 (as $x \to \infty$).

Example: Three Coin Flips

Let $X$ be the number of Heads in three fair coin flips.

$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$, each outcome with probability $\frac{1}{8}$. Possible values for $X$: 0, 1, 2, 3.

PMF $f_X$: $f_X(0) = \frac{1}{8}$, $f_X(1) = \frac{3}{8}$, $f_X(2) = \frac{3}{8}$, $f_X(3) = \frac{1}{8}$. $f_X(x) = 0$ for other $x$.

CDF $F_X$:

  • $F_X(x) = 0$ for $x < 0$
  • $F_X(x) = \frac{1}{8}$ for $0 \le x < 1$
  • $F_X(x) = \frac{4}{8} = \frac{1}{2}$ for $1 \le x < 2$
  • $F_X(x) = \frac{7}{8}$ for $2 \le x < 3$
  • $F_X(x) = 1$ for $x \ge 3$
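These values are easy to reproduce by enumerating the eight outcomes. A small Python sketch, using exact arithmetic via fractions:

```python
from itertools import product
from fractions import Fraction

omega = list(product("HT", repeat=3))  # 8 equally likely outcomes
X = lambda w: w.count("H")             # number of Heads

pmf = {k: Fraction(sum(1 for w in omega if X(w) == k), len(omega)) for k in range(4)}
cdf = {k: sum(pmf[j] for j in range(k + 1)) for k in range(4)}

print(pmf)  # {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
print(cdf)  # {0: Fraction(1, 8), 1: Fraction(1, 2), 2: Fraction(7, 8), 3: Fraction(1, 1)}
```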

Example: Three Dice Rolls - Number of Odd Outcomes

Roll a fair die three times. Let $Y$ be the number of times an odd number (1, 3, or 5) appears. The probability of rolling an odd number in one roll is $\frac{3}{6} = \frac{1}{2}$. The probability of rolling an even number in one roll is $\frac{1}{2}$. This is identical in structure to the three coin flips example (Heads = Odd, Tails = Even). So, $Y$ has the same distribution as $X$ above. The PMF and CDF plots will look the same as for the coin flip example.

Random Sequences: Which is truly random?

Consider two sequences of K (Kopf/Heads) and Z (Zahl/Tails), each of length 200. One is from 200 actual coin flips, the other is artificially generated. Which one is which?

Sequence 1: (K/Z string of 200 flips, shown on the slide)

Number of Z (Zahl) = 137. Number of K (Kopf) = 63. (Ratio Z:K is 137:63 ≈ 2.17:1)

Sequence 2: (K/Z string of 200 flips, shown on the slide)

Number of Z (Zahl) = 97. Number of K (Kopf) = 103. (Ratio Z:K is 97:103 ≈ 0.94:1)

Typically, humans trying to “fake” a random sequence tend to make it “too balanced” or switch between heads and tails “too often.” True random sequences can have longer runs of the same outcome and can deviate more from a perfect 50/50 split than intuition might suggest for shorter sequences.

We re-analyze slightly different sequences later on:

Sequence 1: (shown on the slide)

Number of Z = 101 (out of 200). Number of K = 99. Ratio Z:K ≈ 1.02:1. Number of changes (K to Z or Z to K) = 112.

Sequence 2: (shown on the slide)

Number of Z = 100. Number of K = 100. Ratio Z:K = 1:1. Number of changes = 133.

The sequence with 112 changes looks more like a real random sequence. For a fair coin, each of the 199 adjacent pairs differs with probability $\frac{1}{2}$, so we expect about $\frac{199}{2} \approx 100$ changes. Sequence 2 has 133 changes, which is significantly more than expected, suggesting it might be the one where someone tried to make it look "random" by switching too often. A sequence with a Z:K ratio far from 1:1 is less likely to be from a fair coin over 200 flips, although not impossible. The top sequence on the slide, with counts 101:99 and 112 changes, is a good candidate for the truly random sequence.
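These summary statistics are simple to compute. A short Python sketch of the counting logic, with simulated flips standing in for the slide's sequences:

```python
import random

def count_changes(seq):
    # Number of positions where an outcome differs from its predecessor.
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)

# A few genuinely random sequences of 200 fair-coin flips.
for _ in range(5):
    seq = [random.choice("KZ") for _ in range(200)]
    print(seq.count("Z"), count_changes(seq))
# Typical output: Z-counts near 100 and change-counts near 199/2 ≈ 100,
# often with noticeable deviation in the counts.
```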

Expected Value

The expected value (or expectation or mean) of a random variable $X$, denoted $E[X]$, is the average value we would expect $X$ to take if we repeated the experiment many times.

Definition of Expected Value

For a discrete random variable $X$ that takes values in a countable set $W_X$, its expected value is: $$E[X] = \sum_{x \in W_X} x \cdot \Pr[X = x]$$

This is a weighted average of the possible values of $X$, where each value is weighted by its probability.

We only consider random variables for which the expected value exists; this means the sum must converge absolutely.

An alternative, often more fundamental, definition using the sample space $\Omega$: $$E[X] = \sum_{\omega \in \Omega} X(\omega) \cdot \Pr[\omega]$$

This definition always works, even for continuous random variables (where the sum becomes an integral). The two definitions are equivalent for discrete random variables.

Special Case for Non-Negative Integer Valued Random Variables

If a random variable $X$ takes values in $\mathbb{N}_0 = \{0, 1, 2, \dots\}$, then its expected value can also be calculated as: $$E[X] = \sum_{i=1}^{\infty} \Pr[X \ge i]$$

This formula is sometimes called the “tail sum formula” for expectation.

Proof: Swapping the order of summation, $$\sum_{i=1}^{\infty} \Pr[X \ge i] = \sum_{i=1}^{\infty} \sum_{j=i}^{\infty} \Pr[X = j] = \sum_{j=1}^{\infty} \sum_{i=1}^{j} \Pr[X = j] = \sum_{j=1}^{\infty} j \cdot \Pr[X = j] = E[X].$$
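As a sanity check, both formulas can be evaluated for the three-coin-flip variable $X$ from earlier; a quick sketch:

```python
from fractions import Fraction

# PMF of X = number of Heads in 3 fair coin flips.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

direct = sum(x * p for x, p in pmf.items())                         # definition of E[X]
tails = sum(sum(p for x, p in pmf.items() if x >= i) for i in range(1, 4))

print(direct, tails)  # 3/2 3/2; the two formulas agree
```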

Linearity of Expectation

This is one of the most powerful properties of expected values. It states that the expectation of a sum of random variables is the sum of their expectations.

Crucially, this holds even if the random variables are dependent!

If $X$ and $Y$ are two random variables defined on the same sample space $\Omega$: Let $Z = X + Y$. This means $Z(\omega) = X(\omega) + Y(\omega)$ for all $\omega \in \Omega$. Then, $E[Z] = E[X] + E[Y]$.

Proof: Using the definition over the sample space, $$E[Z] = \sum_{\omega \in \Omega} (X(\omega) + Y(\omega)) \cdot \Pr[\omega] = \sum_{\omega \in \Omega} X(\omega) \cdot \Pr[\omega] + \sum_{\omega \in \Omega} Y(\omega) \cdot \Pr[\omega] = E[X] + E[Y].$$

General Linearity of Expectation

For random variables $X_1, \dots, X_n$ and constants $a_1, \dots, a_n \in \mathbb{R}$: Let $X = a_1 X_1 + \dots + a_n X_n$. Then, $E[X] = a_1 E[X_1] + \dots + a_n E[X_n]$.

Expected Value of an Indicator Variable

Let $X_A$ be the indicator variable for an event $A$. $X_A(\omega) = 1$ if $\omega \in A$, and $X_A(\omega) = 0$ if $\omega \notin A$.

$E[X_A] = 1 \cdot \Pr[X_A = 1] + 0 \cdot \Pr[X_A = 0] = \Pr[A]$.

The expected value of an indicator variable for an event is simply the probability of that event. This is a very handy fact!

Example: Expected Number of Heads in 100 Coin Flips

Flip a fair coin 100 times. Let $X$ be the total number of Heads. We want to find $E[X]$.

Let $X_i$ be the indicator variable for the event "the $i$-th flip is Heads" (for $i = 1, \dots, 100$). $X_i = 1$ if the $i$-th flip is H, $X_i = 0$ if the $i$-th flip is T. $E[X_i] = \frac{1}{2}$.

The total number of heads can be written as the sum of these indicator variables: $X = X_1 + X_2 + \dots + X_{100}$. By linearity of expectation: $E[X] = E[X_1] + E[X_2] + \dots + E[X_{100}] = \frac{1}{2} + \frac{1}{2} + \dots + \frac{1}{2}$ (100 times) $= 50$. So, the expected number of heads is 50. Notice how easy this was using linearity, without needing to calculate $\Pr[X = k]$ for every $k$.
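A quick simulation confirms this; not needed for the argument, just a sketch to build intuition:

```python
import random

# X = number of Heads in 100 fair flips; linearity predicts E[X] = 50.
trials = 10_000
total = sum(sum(random.randint(0, 1) for _ in range(100)) for _ in range(trials))
print(total / trials)  # close to 50.0
```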

Application: Expected Size of a Randomly Generated Stable Set

A stable set (or independent set) in a graph is a set of vertices where no two vertices are adjacent.

Goal: Find a “large” stable set. This is a hard problem (NP-hard).

Consider a randomized algorithm:

  1. First Round (Node Selection): Each vertex decides to "survive" (join a candidate set $S$) independently with some probability $p$. Let $X$ be the number of vertices in $S$.

  2. Second Round (Edge Removal): For every edge $\{u, v\}$ where both $u$ and $v$ survived into $S$, randomly remove one of $u$ or $v$ from $S$ to form the final stable set $S^*$.

Algorithm

  1. Each vertex is kept with probability $p$ (independently). Let $S$ be the set of kept vertices.
  2. Create an initial set $S^* = S$.
  3. For each edge $\{u, v\}$ where both $u, v \in S$: remove both $u$ and $v$ from $S^*$. (The slide's analysis is a bit different here: it bounds $|S^*| \ge X - Y$, where $Y$ is the number of "bad" edges, since removing one endpoint per bad edge suffices.)

Expectation

Let's analyze the expectation of the size of the stable set $S^*$:

Let $G = (V, E)$ with $|V| = n$ and $|E| = m$.

Define $p$, the probability that a vertex survives the first round (not explicitly stated, but implied in $E[X] = np$).

Let $X_v$ be an indicator variable: $X_v = 1$ if vertex $v$ "survives" the first round, $X_v = 0$ otherwise. $E[X_v] = p$. Let $X = \sum_{v \in V} X_v$ be the number of vertices surviving the first round. By linearity, $E[X] = np$.

Let $Y_e$ be an indicator variable: $Y_e = 1$ if edge $e = \{u, v\}$ "survives" the first round (meaning both $u$ and $v$ survived). Since vertices survive independently, $\Pr[Y_e = 1] = p \cdot p = p^2$. So, $E[Y_e] = p^2$.

Let $Y = \sum_{e \in E} Y_e$ be the number of edges surviving the first round. By linearity, $E[Y] = mp^2$.

The algorithm then constructs a stable set $S^*$. The size of $S^*$ is at least the number of initially selected vertices ($X$) minus the number of "bad" edges ($Y$) for which we have to remove one endpoint. So, $|S^*| \ge X - Y$.

Then, $E[|S^*|] \ge E[X - Y] = E[X] - E[Y] = np - mp^2$ (by linearity). Choosing $p = \frac{n}{2m}$ to maximize this expression (valid when $m \ge n/2$) gives $E[|S^*|] \ge \frac{n^2}{2m} - \frac{n^2}{4m} = \frac{n^2}{4m}$.
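A minimal Python sketch of the two-round algorithm, removing one endpoint per bad edge as in the analysis (the function and variable names are mine):

```python
import random

def random_stable_set(n, edges, p):
    """One run: keep each vertex with probability p, then delete one
    endpoint of every edge whose endpoints are both still present."""
    kept = {v for v in range(n) if random.random() < p}
    for u, v in edges:
        if u in kept and v in kept:
            kept.discard(random.choice((u, v)))  # fix one bad edge
    return kept  # no edge keeps both endpoints, so this is a stable set

# Example: a cycle on n = 10 vertices has m = 10 edges, so p = n/(2m) = 1/2
# and the bound promises expected size at least n^2/(4m) = 2.5.
n = 10
edges = [(i, (i + 1) % n) for i in range(n)]
sizes = [len(random_stable_set(n, edges, 0.5)) for _ in range(10_000)]
print(sum(sizes) / len(sizes))  # empirically around 2.5 or above
```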

Continue here: 12 Randomized QuickSort, Indicator Variables, Common Probability Distributions, Coupon Collector