Lecture from: 01.04.2025 | Video: Homelab
Recap: Random Variables and Their Properties
As a quick reminder:
A random variable $X$ is a function $X: \Omega \to \mathbb{R}$ mapping outcomes from a sample space $\Omega$ to real numbers.
The PMF (Probability Mass Function, or “Dichtefunktion”) $f_X(x) = \Pr[X = x]$ tells us the probability that $X$ takes on a specific value $x$.
The CDF (Cumulative Distribution Function, or “Verteilungsfunktion”) $F_X(x) = \Pr[X \le x]$ gives the probability that $X$ is less than or equal to $x$.
The Expected Value $\mathbb{E}[X] = \sum_{x} x \cdot \Pr[X = x]$ is the average value $X$ takes.
Conditional Random Variables
Sometimes, we are interested in the behavior of a random variable $X$ given that a certain event $A$ (with $\Pr[A] > 0$) has occurred. This leads to the concept of a conditional random variable, denoted $X \mid A$.
The function $X \mid A$ is essentially the same function as $X$, but its domain is restricted to the outcomes $\omega \in A$. We are only looking at the values $X(\omega)$ for those outcomes $\omega$ where $A$ happened.
Conditional PMF, CDF, and Expectation
We can define the PMF, CDF, and expectation for this conditional random variable:
- Conditional PMF: $f_{X \mid A}(x) = \Pr[X = x \mid A] = \frac{\Pr[\{X = x\} \cap A]}{\Pr[A]}$. This is the probability that the random variable $X$ takes the value $x$, given that event $A$ has occurred.
- Conditional CDF: $F_{X \mid A}(x) = \Pr[X \le x \mid A] = \sum_{y \le x} f_{X \mid A}(y)$. This is the probability that $X$ is less than or equal to $x$, given that event $A$ has occurred.
- Conditional Expectation: $\mathbb{E}[X \mid A] = \sum_{x} x \cdot f_{X \mid A}(x)$. Using the definition of conditional probability, $f_{X \mid A}(x) = \frac{\Pr[\{X = x\} \cap A]}{\Pr[A]}$. So, $\mathbb{E}[X \mid A] = \frac{1}{\Pr[A]} \sum_{\omega \in A} X(\omega) \cdot \Pr[\omega]$ (for a Laplace space the last sum simplifies to $\frac{1}{|A|} \sum_{\omega \in A} X(\omega)$; more generally, we sum over $\omega \in A$).
A random variable $X$ is said to be independent of an event $A$ if its conditional distribution given $A$ is the same as its unconditional distribution, i.e., $\Pr[X = x \mid A] = \Pr[X = x]$ for all $x$. This means knowing that $A$ occurred doesn't change how $X$ behaves.
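To make these definitions concrete, here is a minimal Python sketch that computes a conditional PMF and a conditional expectation by direct enumeration; the fair die and the conditioning event $A$ = "outcome is at least 4" are made up for illustration.

```python
from fractions import Fraction

# Minimal sketch: a fair six-sided die, conditioned on the (made-up) event
# A = "the outcome is at least 4".
omega = range(1, 7)                       # sample space {1, ..., 6}
prob = {w: Fraction(1, 6) for w in omega}
X = {w: w for w in omega}                 # X(omega) = the outcome itself

A = {w for w in omega if w >= 4}          # conditioning event, Pr[A] = 1/2
pr_A = sum(prob[w] for w in A)

# Conditional PMF: f_{X|A}(x) = Pr[{X = x} and A] / Pr[A]
f_cond = {}
for w in A:
    f_cond[X[w]] = f_cond.get(X[w], Fraction(0)) + prob[w] / pr_A

# Conditional expectation: E[X | A] = sum_x x * f_{X|A}(x)
e_cond = sum(x * p for x, p in f_cond.items())

print(f_cond)   # {4: Fraction(1, 3), 5: Fraction(1, 3), 6: Fraction(1, 3)}
print(e_cond)   # Fraction(5, 1), i.e. E[X | A] = 5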
Example: Sum of Two Dice Rolls Conditioned on Events
Let $D_1$ be the outcome of the first die roll and $D_2$ the outcome of the second.
- $\mathbb{E}[D_1] = \mathbb{E}[D_2] = 3.5$.
Let $S = D_1 + D_2$ be the sum of the two dice.
- $\mathbb{E}[S] = \mathbb{E}[D_1] + \mathbb{E}[D_2] = 7$.
Consider two events:
- $A$ = “$D_2$ is even” (i.e., $D_2 \in \{2, 4, 6\}$). $\Pr[A] = \frac{1}{2}$.
- $B$ = “$S$ is even” (i.e., $S \in \{2, 4, 6, 8, 10, 12\}$). $\Pr[B] = \frac{1}{2}$.
Expectation of the sum given the second die is even
Let's look at $\mathbb{E}[S \mid A]$.
By linearity of conditional expectation (which holds just like regular expectation):
$\mathbb{E}[S \mid A] = \mathbb{E}[D_1 \mid A] + \mathbb{E}[D_2 \mid A]$.
- $D_1$ (the first die roll) is independent of $A$ (the second die roll being even). So, $\mathbb{E}[D_1 \mid A] = \mathbb{E}[D_1] = 3.5$.
- $\mathbb{E}[D_2 \mid A]$: We are given that $D_2$ is even. The possible values for $D_2$ are $2, 4, 6$, each with probability $\frac{1}{3}$ (within the conditioned space $A$). So, $\mathbb{E}[D_2 \mid A] = \frac{2 + 4 + 6}{3} = 4$. So, $\mathbb{E}[S \mid A] = 3.5 + 4 = 7.5$.
Knowing the second die is even increases the expected sum, which makes sense.
Expectation of the sum given the sum is even
Now consider $\mathbb{E}[S \mid B]$. Again, $\mathbb{E}[S \mid B] = \mathbb{E}[D_1 \mid B] + \mathbb{E}[D_2 \mid B]$.
It turns out that $D_1$ is independent of $B$ (the sum being even doesn't change the probabilities for $D_1$): whatever $D_1$ is, exactly half of the outcomes of $D_2$ make the sum even. For any $d \in \{1, \dots, 6\}$, $\Pr[D_1 = d \mid B] = \Pr[D_1 = d] = \frac{1}{6}$. So $\mathbb{E}[D_1 \mid B] = 3.5$.
Similarly, $\mathbb{E}[D_2 \mid B] = 3.5$. Thus, $\mathbb{E}[S \mid B] = 3.5 + 3.5 = 7$.
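Both results are easy to verify by brute-force enumeration over the 36 equally likely outcomes. A minimal Python sketch (the helper `cond_expectation` is our own naming):

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes of two fair dice and verify
# E[S | A] = 7.5 and E[S | B] = 7 by brute force.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

def cond_expectation(value, event):
    """E[value(w) | event] over the finite sample space."""
    pr_event = sum(p for w in outcomes if event(w))
    return sum(value(w) * p for w in outcomes if event(w)) / pr_event

S = lambda w: w[0] + w[1]
A = lambda w: w[1] % 2 == 0             # second die is even
B = lambda w: (w[0] + w[1]) % 2 == 0    # sum is even

print(cond_expectation(S, A))   # 15/2 (= 7.5)
print(cond_expectation(S, B))   # 7
```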
Multiple Random Variables
Often, we work with several random variables simultaneously, defined on the same sample space.
For example, if we flip $n$ coins: $X_i$ = indicator for Head on the $i$-th flip, and $X = X_1 + \dots + X_n$ = total number of Heads.
Here, $X_1, \dots, X_n$ and $X$ are all random variables on the same space of $n$ coin flips.
Another example:
- Roll a die: $Z$ = outcome of the die.
- Flip that many coins: $K$ = number of Heads obtained. Here $Z$ and $K$ are two random variables related in a sequential experiment.
Joint PMF (Gemeinsame Dichte)
When we have two discrete random variables $X$ and $Y$, their joint probability mass function $f_{X,Y}$ gives the probability that $X$ takes value $x$ AND $Y$ takes value $y$: $f_{X,Y}(x, y) = \Pr[X = x, Y = y]$ (short for $\Pr[\{X = x\} \cap \{Y = y\}]$).
Marginal PMF (Randdichte)
From the joint PMF, we can recover the individual PMF of $X$ (called the marginal PMF of $X$) by summing over all possible values of $Y$: $f_X(x) = \sum_{y} f_{X,Y}(x, y)$.
This is by the law of total probability: the event $\{X = x\}$ is partitioned by the events $\{X = x, Y = y\}$ for all possible $y$. Similarly, $f_Y(y) = \sum_{x} f_{X,Y}(x, y)$.
Example: Die roll (number $Z$) then $Z$ coin flips (number of Heads $K$)
$Z$ is uniform on $\{1, \dots, 6\}$; given $Z = z$, we flip $z$ fair coins and $K$ counts the Heads.
What is $f_{Z,K}(z, k) = \Pr[Z = z, K = k]$?
$\Pr[Z = z] = \frac{1}{6}$ for $z \in \{1, \dots, 6\}$.
Given $Z = z$ (we flip $z$ coins), $K$ (the number of Heads) follows $\mathrm{Bin}(z, \frac{1}{2})$. So, $\Pr[K = k \mid Z = z] = \binom{z}{k} \left(\frac{1}{2}\right)^z$.
Thus, $f_{Z,K}(z, k) = \Pr[Z = z] \cdot \Pr[K = k \mid Z = z] = \frac{1}{6} \binom{z}{k} \left(\frac{1}{2}\right)^z$ for $1 \le z \le 6$ and $0 \le k \le z$ (and $0$ otherwise).
Let's calculate $f_K(0)$ (the marginal probability that $K = 0$):
For $z \in \{1, \dots, 6\}$, $f_{Z,K}(z, 0) = \frac{1}{6}\left(\frac{1}{2}\right)^z$, since $\binom{z}{0} = 1$.
$f_K(0) = \sum_{z=1}^{6} f_{Z,K}(z, 0) = \frac{1}{6} \sum_{z=1}^{6} \left(\frac{1}{2}\right)^z = \frac{1}{6} \cdot \frac{63}{64} = \frac{63}{384} = \frac{21}{128}$.
So $f_K(0) = \frac{21}{128} \approx 0.16$.
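The same marginalization can be checked numerically. The following Python sketch builds the joint PMF $f_{Z,K}$ from the formula above and sums out $Z$:

```python
from fractions import Fraction
from math import comb

# Joint PMF f_{Z,K}(z, k) = (1/6) * C(z, k) * (1/2)^z for 1 <= z <= 6, 0 <= k <= z.
joint = {
    (z, k): Fraction(1, 6) * comb(z, k) * Fraction(1, 2) ** z
    for z in range(1, 7)
    for k in range(0, z + 1)
}

# Marginal PMF of K: sum the joint PMF over all values of Z.
f_K = {}
for (z, k), p in joint.items():
    f_K[k] = f_K.get(k, Fraction(0)) + p

print(f_K[0])               # 21/128
print(sum(f_K.values()))    # 1 (sanity check: the marginal PMF sums to 1)
```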
Independence of Random Variables
Definition of Independence (Version 1 - using events)
TLDR: all events of the form “$X_i = x_i$” are mutually independent, for every choice of values.
Random variables $X_1, \dots, X_n$ are said to be independent if for all choices of values $x_1, \dots, x_n$ (where $x_i$ is in the range of $X_i$), the events “$X_1 = x_1$”, “$X_2 = x_2$”, …, “$X_n = x_n$” are mutually independent events.
This means: $\Pr[X_1 = x_1, \dots, X_n = x_n] = \Pr[X_1 = x_1] \cdots \Pr[X_n = x_n]$. And this must hold not just for the full set of variables, but for every subset of these variables as well (to ensure mutual independence of the events).
Many of these equations are redundant. For example, suppose $X_3$ only takes the values $0$ and $1$:
If we know: $\Pr[X_1 = x_1, X_2 = x_2, X_3 = 0] = \Pr[X_1 = x_1]\Pr[X_2 = x_2]\Pr[X_3 = 0]$ and $\Pr[X_1 = x_1, X_2 = x_2, X_3 = 1] = \Pr[X_1 = x_1]\Pr[X_2 = x_2]\Pr[X_3 = 1]$,
Then $\Pr[X_1 = x_1, X_2 = x_2]$ can be found by summing these two (by the law of total probability over $X_3$): $\Pr[X_1 = x_1, X_2 = x_2] = \Pr[X_1 = x_1]\Pr[X_2 = x_2]\left(\Pr[X_3 = 0] + \Pr[X_3 = 1]\right)$.
Since $\Pr[X_3 = 0] + \Pr[X_3 = 1] = 1$, we get $\Pr[X_1 = x_1, X_2 = x_2] = \Pr[X_1 = x_1]\Pr[X_2 = x_2]$. This means the pairwise independence of $X_1$ and $X_2$ follows from the conditions involving $X_3$.
Definition of Independence (Version 2 - using PMFs)
TLDR: Joint PMF can be decomposed into product of marginal PMFs.
A more concise definition: Random variables $X_1, \dots, X_n$ are independent if their joint PMF is the product of their marginal PMFs: $f_{X_1, \dots, X_n}(x_1, \dots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n)$ for all $(x_1, \dots, x_n)$ in the combined range $W_{X_1} \times \dots \times W_{X_n}$.
This single equation covering all combinations of values implies the subset conditions automatically.
Trick for Two Indicator Variables
For two indicator variables $X$ and $Y$ (each taking only the values $0$ and $1$): $X$ and $Y$ are independent $\iff \Pr[X = 1, Y = 1] = \Pr[X = 1] \cdot \Pr[Y = 1]$.
This is because $\Pr[X = 1, Y = 0] = \Pr[X = 1] - \Pr[X = 1, Y = 1]$, $\Pr[X = 0, Y = 1] = \Pr[Y = 1] - \Pr[X = 1, Y = 1]$, and $\Pr[X = 0, Y = 0] = 1 - \Pr[X = 1] - \Pr[Y = 1] + \Pr[X = 1, Y = 1]$.
If this single condition holds, all other combinations like $\Pr[X = 1, Y = 0]$ also work out. For instance, $\Pr[X = 1, Y = 0] = \Pr[X = 1] - \Pr[X = 1, Y = 1] = \Pr[X = 1] - \Pr[X = 1]\Pr[Y = 1]$ (using the condition) $= \Pr[X = 1]\left(1 - \Pr[Y = 1]\right) = \Pr[X = 1]\Pr[Y = 0]$.
Caution: This shortcut does not generally apply for three or more indicator variables.
Example of Independence/Dependence of Indicator Variables
- $\Omega = \{1, \dots, 6\}$, $\Pr[\omega] = \frac{1}{6}$ for each $\omega$. $X(\omega) = 1$ if $\omega$ is divisible by 2, else $0$. $X = 1$ for $\omega \in \{2, 4, 6\}$. So $\Pr[X = 1] = \frac{3}{6} = \frac{1}{2}$.
$Y(\omega) = 1$ if $\omega$ is divisible by 3, else $0$. $Y = 1$ for $\omega \in \{3, 6\}$. So $\Pr[Y = 1] = \frac{2}{6} = \frac{1}{3}$.
The event $\{X = 1, Y = 1\}$ means $\omega$ is divisible by 2 AND 3, so divisible by 6. This is only $\omega = 6$, so $\Pr[X = 1, Y = 1] = \frac{1}{6}$. Check: $\Pr[X = 1] \cdot \Pr[Y = 1] = \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{6}$. Since these are equal, $X$ and $Y$ are independent.
- $\Omega = \{1, \dots, 5\}$, $\Pr[\omega] = \frac{1}{5}$ for each $\omega$. $X(\omega) = 1$ if $\omega$ is divisible by 2, else $0$. $X = 1$ for $\omega \in \{2, 4\}$, $X = 0$ for $\omega \in \{1, 3, 5\}$. So $\Pr[X = 1] = \frac{2}{5}$.
$Y(\omega) = 1$ if $\omega$ is divisible by 3, else $0$. $Y = 1$ for $\omega = 3$, $Y = 0$ for $\omega \in \{1, 2, 4, 5\}$. So $\Pr[Y = 1] = \frac{1}{5}$.
The event $\{X = 1, Y = 1\}$ means $\omega$ is divisible by 2 AND 3, i.e., by 6. No such $\omega$ exists in $\Omega$, so $\Pr[X = 1, Y = 1] = 0$. Check: $\Pr[X = 1] \cdot \Pr[Y = 1] = \frac{2}{5} \cdot \frac{1}{5} = \frac{2}{25} \neq 0$. Since these differ, $X$ and $Y$ are not independent (a quick computational check of both examples follows below).
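A small Python check of the indicator trick for both examples, using the sample spaces $\{1, \dots, 6\}$ and $\{1, \dots, 5\}$ assumed above:

```python
from fractions import Fraction

def check_indicator_independence(omega):
    """Check Pr[X=1, Y=1] == Pr[X=1] * Pr[Y=1] for the divisibility indicators."""
    p = Fraction(1, len(omega))
    pr_x1 = sum(p for w in omega if w % 2 == 0)
    pr_y1 = sum(p for w in omega if w % 3 == 0)
    pr_both = sum(p for w in omega if w % 2 == 0 and w % 3 == 0)
    return pr_both == pr_x1 * pr_y1

print(check_indicator_independence(range(1, 7)))  # True  (1/6 == 1/2 * 1/3)
print(check_indicator_independence(range(1, 6)))  # False (0 != 2/5 * 1/5)
```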
Properties of Independent Random Variables
- If $X_1, \dots, X_n$ are independent random variables, then for any choice of subsets $S_1, \dots, S_n$ (where $S_i$ are sets of values $X_i$ can take): $\Pr[X_1 \in S_1, \dots, X_n \in S_n] = \Pr[X_1 \in S_1] \cdots \Pr[X_n \in S_n]$. This is a more general statement of independence. (This is “Lemma 2.53”).
- Subsets of independent variables are independent: If $X_1, \dots, X_n$ are independent, then any subcollection, e.g., $X_{i_1}, \dots, X_{i_k}$, is also independent. (This is “Korollar 2.53”).
- Functions of independent variables: If $X_1, \dots, X_n$ are independent random variables, and $f_1, \dots, f_n$ are real-valued functions ($f_i: \mathbb{R} \to \mathbb{R}$), then the random variables $f_1(X_1), \dots, f_n(X_n)$ are also independent. (This is “Satz 2.52”). For example, if $X_1$ and $X_2$ are independent, then $X_1^2$ and $X_2^2$ are also independent.
Sum of Independent Random Variables
If $X$ and $Y$ are two independent discrete random variables, and $Z = X + Y$, what is the PMF of $Z$, $f_Z(z) = \Pr[Z = z]$?
The event $\{Z = z\}$ occurs if $X = x$ and $Y = z - x$ for some $x$.
Since $X$ and $Y$ are independent: $f_Z(z) = \sum_{x} \Pr[X = x] \cdot \Pr[Y = z - x] = \sum_{x} f_X(x) \, f_Y(z - x)$.
This is called the convolution of the PMFs of $X$ and $Y$.
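A minimal Python sketch of this convolution for PMFs with finite support (the dictionary representation "value → probability" is our own choice):

```python
from fractions import Fraction

def convolve(f_X, f_Y):
    """PMF of X + Y for independent X, Y, each given as a dict value -> probability."""
    f_Z = {}
    for x, px in f_X.items():
        for y, py in f_Y.items():
            f_Z[x + y] = f_Z.get(x + y, Fraction(0)) + px * py
    return f_Z

# Example: the sum of two fair dice.
die = {v: Fraction(1, 6) for v in range(1, 7)}
two_dice = convolve(die, die)
print(two_dice[7])   # 1/6, the most likely sum
print(two_dice[2])   # 1/36
```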
Sum of Multiple Independent Random Variables
If $X$, $Y$, and $W$ are independent discrete random variables, and you define $Z = X + Y + W$, then the PMF of $Z$, denoted $f_Z$, is the convolution of the PMFs of $X$, $Y$, and $W$.
Step-by-step:
- First, define the PMFs: $f_X(x) = \Pr[X = x]$, $f_Y(y) = \Pr[Y = y]$, $f_W(w) = \Pr[W = w]$.
- The PMF of $Z$ is: $f_Z(z) = \sum_{x \in W_X} \sum_{y \in W_Y} f_X(x) \, f_Y(y) \, f_W(z - x - y)$, where $W_X$ and $W_Y$ are the supports of $X$ and $Y$ respectively.
This is a double convolution.
Consequences for Common Distributions
This convolution formula leads to some nice “closure” properties for sums of independent random variables from certain families:
- Sum of independent Poissons: If $X \sim \mathrm{Po}(\lambda)$ and $Y \sim \mathrm{Po}(\mu)$ are independent, then $X + Y \sim \mathrm{Po}(\lambda + \mu)$. The sum of independent Poisson variables is also Poisson, with the parameter being the sum of the parameters.
- Sum of independent Binomials (with the same $p$): If $X \sim \mathrm{Bin}(n, p)$ and $Y \sim \mathrm{Bin}(m, p)$ are independent (note: same $p$), then $X + Y \sim \mathrm{Bin}(n + m, p)$. This makes intuitive sense: $X$ is a sum of $n$ Bernoulli($p$) trials, $Y$ is a sum of $m$ Bernoulli($p$) trials. Their sum is like $n + m$ Bernoulli($p$) trials. (A numerical check of both closure properties follows below.)
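As a sanity check, one can convolve the PMFs numerically and compare against the closed-form families. The parameters below ($n = 3$, $m = 5$, $p = 0.4$, $\lambda = 1.5$, $\mu = 2.5$) are arbitrary illustration values, and the Poisson PMFs are truncated.

```python
from math import comb, exp, isclose

def convolve_float(f_X, f_Y):
    """Convolution of two PMFs given as dicts value -> probability (floats)."""
    f_Z = {}
    for x, px in f_X.items():
        for y, py in f_Y.items():
            f_Z[x + y] = f_Z.get(x + y, 0.0) + px * py
    return f_Z

def binom_pmf(n, p):
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

def poisson_pmf(lam, cutoff=60):
    # Truncated Poisson PMF; the tail beyond `cutoff` is negligible here.
    pmf, term = {}, exp(-lam)
    for k in range(cutoff + 1):
        pmf[k] = term
        term *= lam / (k + 1)
    return pmf

# Bin(3, 0.4) + Bin(5, 0.4) should match Bin(8, 0.4).
lhs = convolve_float(binom_pmf(3, 0.4), binom_pmf(5, 0.4))
rhs = binom_pmf(8, 0.4)
print(all(isclose(lhs[k], rhs[k]) for k in rhs))                        # True

# Po(1.5) + Po(2.5) should match Po(4.0), up to truncation/rounding error.
lhs = convolve_float(poisson_pmf(1.5), poisson_pmf(2.5))
rhs = poisson_pmf(4.0)
print(all(isclose(lhs[k], rhs[k], abs_tol=1e-12) for k in range(40)))   # True
```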
Wald’s Identity (Waldsche Identität)
Wald's Identity relates the expected value of a sum of a random number of independent and identically distributed (i.i.d.) random variables to the expected number of summands and the mean of a single summand.
Setup
- Let $X_1, X_2, \dots$ be a sequence of i.i.d. random variables, each with mean $\mu = \mathbb{E}[X_i]$.
- Let $N$ be a non-negative integer-valued random variable (a “stopping time”) that is independent of the sequence $X_1, X_2, \dots$. The value of $N$ tells us how many $X_i$'s to sum.
- Let $Z = \sum_{i=1}^{N} X_i$. If $N = 0$, then $Z = 0$.
Intuition for Setup: Imagine you’re at a casino playing a game.
- $X_i$: The amount you win or lose on the $i$-th play of the game. These are i.i.d. (each play is similar, and outcomes are independent). $\mu$ is your average win/loss per play.
- $N$: The number of times you decide to play. This number could itself be random (e.g., you play until you win a certain amount, or until you've played 5 times, or until you get bored).
- Crucial condition: Your decision on how many games to play ($N$) must be made independently of the actual outcomes of those games. For example, $N$ cannot be “I'll stop playing as soon as I see a big win $X_i$”. If $N$ depends on the $X_i$ values, Wald's Identity (in this simple form) doesn't apply. (The concept of a “stopping time” is more general and can relax this independence, but for this version, independence of $N$ and the sequence $X_1, X_2, \dots$ is key.)
- $Z = \sum_{i=1}^{N} X_i$: Your total winnings/losses after playing $N$ games.
Wald’s Identity states
$\mathbb{E}[Z] = \mathbb{E}[N] \cdot \mathbb{E}[X_1]$ (or, equivalently, $\mathbb{E}[Z] = \mathbb{E}[N] \cdot \mu$).
Example 1: Die Roll and Coin Flips (revisited)
- Roll a fair die. Let $N$ be the outcome. $N \in \{1, \dots, 6\}$, uniform. $\mathbb{E}[N] = 3.5$.
- Flip a fair coin $N$ times. Let $X_i$ be the indicator for Head on the $i$-th flip. $\mathbb{E}[X_i] = \frac{1}{2}$.
- Let $K = \sum_{i=1}^{N} X_i$ be the total number of Heads.
Here, $N$ (the die roll) is independent of the outcomes of the $X_i$'s (the coin flips).
By Wald's Identity: $\mathbb{E}[K] = \mathbb{E}[N] \cdot \mathbb{E}[X_1] = 3.5 \cdot \frac{1}{2} = 1.75$.
Note: We use $\mathbb{E}[X_1]$ in the equation since all the $X_i$ are i.i.d. and their expected value is the same.
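A quick Monte Carlo check of this example (the number of trials and the seed are arbitrary simulation choices):

```python
import random

# Monte Carlo check of Wald's Identity for the die-then-coins experiment:
# E[K] should be E[N] * E[X_1] = 3.5 * 0.5 = 1.75.
random.seed(0)
trials = 200_000
total_heads = 0
for _ in range(trials):
    n = random.randint(1, 6)                                     # the die roll N
    total_heads += sum(random.randint(0, 1) for _ in range(n))   # N fair coin flips
print(total_heads / trials)   # ≈ 1.75
```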
Example 2: Two Phases of Coin Flips
- Phase 1: Flip a fair coin until the first Head appears. Let $N$ be the number of flips in this phase. $N \sim \mathrm{Geo}(\frac{1}{2})$. So $\mathbb{E}[N] = 2$.
- Phase 2: Flip the coin $N$ more times. Let $X_i$ be the indicator for Head on the $i$-th flip in this second phase. $\mathbb{E}[X_i] = \frac{1}{2}$. Let $Y = \sum_{i=1}^{N} X_i$ be the number of Heads in Phase 2.
The number of flips in Phase 2 ($N$) is determined by Phase 1 and is independent of the outcomes ($X_i$) within Phase 2.
By Wald's Identity, $\mathbb{E}[Y] = \mathbb{E}[N] \cdot \mathbb{E}[X_1] = 2 \cdot \frac{1}{2} = 1$.
So, we expect 1 Head in Phase 2.
Let $H$ be the total number of Heads overall (Phase 1 + Phase 2). Phase 1 contributes exactly 1 Head (the one that stopped Phase 1). So, $H = 1 + Y$ and $\mathbb{E}[H] = 1 + \mathbb{E}[Y] = 2$.
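A similar simulation for the two-phase experiment (again with arbitrary simulation parameters):

```python
import random

# Monte Carlo check of the two-phase experiment: E[Y] should be 1 and E[H] = 2.
random.seed(0)
trials = 200_000
total_phase2_heads = 0
total_heads = 0
for _ in range(trials):
    # Phase 1: flip until the first Head; N = number of flips (Head <=> random() < 0.5).
    n = 1
    while random.random() >= 0.5:
        n += 1
    # Phase 2: flip N more times; Y = number of Heads among them.
    y = sum(random.randint(0, 1) for _ in range(n))
    total_phase2_heads += y
    total_heads += 1 + y          # Phase 1 contributes exactly one Head
print(total_phase2_heads / trials)   # ≈ 1
print(total_heads / trials)          # ≈ 2
```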
Variance and Concentration
The expected value $\mathbb{E}[X]$ tells us the “center” or average of a random variable $X$. But it doesn't tell us how spread out the values of $X$ are.
We'd like to say that if we know $\mathbb{E}[X]$, then $X$ is likely to be “close” to $\mathbb{E}[X]$. That is, $X \approx \mathbb{E}[X]$ with high probability.
However, this isn't always true just based on $\mathbb{E}[X]$.
Example:
- $\Pr[X = a] = \frac{1}{2}$ and $\Pr[X = -a] = \frac{1}{2}$ for some constant $a > 0$.
- Then $\mathbb{E}[X] = \frac{1}{2} a + \frac{1}{2}(-a) = 0$.
But $X$ is never close to its expectation 0! It's always $a$ away.
We need a measure of spread or deviation.
Variance
The variance of a random variable $X$, denoted $\mathrm{Var}[X]$ or $\sigma^2$, measures the expected squared deviation of $X$ from its mean $\mu = \mathbb{E}[X]$: $\mathrm{Var}[X] = \mathbb{E}\left[(X - \mu)^2\right]$.
Since $(X - \mu)^2 \ge 0$, $\mathrm{Var}[X] \ge 0$. A small variance means $X$ tends to be close to its mean. A large variance means $X$ can be far from its mean.
Note: One could also measure spread using the average absolute deviation, or even higher powers like the 4th or 6th. But squaring turns out to be mathematically convenient: it leads to cleaner algebra and works well with many powerful tools later on.
Alternative Formula for Variance
Using linearity of expectation: $\mathrm{Var}[X] = \mathbb{E}\left[(X - \mu)^2\right] = \mathbb{E}\left[X^2 - 2\mu X + \mu^2\right] = \mathbb{E}[X^2] - 2\mu\,\mathbb{E}[X] + \mu^2$.
Since $\mu = \mathbb{E}[X]$ is a constant: $\mathrm{Var}[X] = \mathbb{E}[X^2] - 2\mu^2 + \mu^2 = \mathbb{E}[X^2] - \mathbb{E}[X]^2$.
This is often easier to compute: calculate $\mathbb{E}[X^2]$ and $\mathbb{E}[X]$, then use this formula.
The standard deviation $\sigma = \sqrt{\mathrm{Var}[X]}$ is also commonly used. It has the same units as $X$.
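For a fair die, both formulas give the same result; a minimal sketch:

```python
from fractions import Fraction

# Variance of a fair die, computed from the definition and from the
# alternative formula Var[X] = E[X^2] - E[X]^2.
values = range(1, 7)
p = Fraction(1, 6)

mean = sum(x * p for x in values)                      # 7/2
var_def = sum((x - mean) ** 2 * p for x in values)     # E[(X - mu)^2]
var_alt = sum(x**2 * p for x in values) - mean**2      # E[X^2] - E[X]^2

print(mean, var_def, var_alt)   # 7/2 35/12 35/12
```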
Properties of Variance
For any random variable $X$ and real constants $a, b$: $\mathrm{Var}[aX + b] = a^2\,\mathrm{Var}[X]$.
This tells us two important things:
- Shifting a random variable by a constant ($b$) doesn't affect its variance.
- Scaling a random variable by a factor ($a$) multiplies the variance by $a^2$.
Proof Sketch
Let $Y = aX + b$. Then $\mathbb{E}[Y] = a\,\mathbb{E}[X] + b$, and: $\mathrm{Var}[Y] = \mathbb{E}\left[(Y - \mathbb{E}[Y])^2\right] = \mathbb{E}\left[(aX + b - a\,\mathbb{E}[X] - b)^2\right] = \mathbb{E}\left[a^2 (X - \mathbb{E}[X])^2\right] = a^2\,\mathrm{Var}[X]$.
So, the constant shift cancels out—it doesn’t affect how far values are from the mean, only where that mean is located.
The key idea: shifting moves everything together, so the “spread” stays the same. But scaling stretches or compresses the deviations, so it changes the variance.
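A quick numerical check of $\mathrm{Var}[aX + b] = a^2\,\mathrm{Var}[X]$ for a fair die, with arbitrarily chosen $a = 3$, $b = 7$:

```python
from fractions import Fraction

# Check Var[aX + b] = a^2 * Var[X] for a fair die; a and b are arbitrary.
values = range(1, 7)
p = Fraction(1, 6)
a, b = 3, 7

def variance(xs):
    mean = sum(x * p for x in xs)
    return sum((x - mean) ** 2 * p for x in xs)

var_X = variance(values)
var_Y = variance([a * x + b for x in values])
print(var_Y == a**2 * var_X)   # True
```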
Next time, we’ll see how variance lets us quantify how tightly a random variable clusters around its mean, using tools like Markov’s and Chebyshev’s inequalities.
Continue here: 14 Rules for Moments (Expectation, Variance), Estimating Probabilities (Markov, Chebyshev), Chernoff Bounds