Lecture from: 01.04.2025 | Video: Homelab
Recap: Random Variables and Their Properties
As a quick reminder:
A random variable $X$ is a function $X: \Omega \to \mathbb{R}$ mapping outcomes from a sample space $\Omega$ to real numbers.
The PMF (Probability Mass Function, or “Dichtefunktion”) $f_X(x) = \Pr[X = x]$ tells us the probability that $X$ takes on a specific value $x$.
The CDF (Cumulative Distribution Function, or “Verteilungsfunktion”) $F_X(x) = \Pr[X \le x]$ gives the probability that $X$ is less than or equal to $x$.
The Expected Value $\mathbb{E}[X] = \sum_{x} x \cdot \Pr[X = x]$ is the average value $X$ takes.
Conditional Random Variables
Sometimes, we are interested in the behavior of a random variable $X$ given that a certain event $A$ (with $\Pr[A] > 0$) has occurred. This leads to the concept of a conditional random variable, denoted $X \mid A$.
The function $X \mid A$ is essentially the same function as $X$, but its domain is restricted to the outcomes $\omega \in A$. We are only looking at the values $X(\omega)$ for those outcomes $\omega$ where $A$ happened.
Conditional PMF, CDF, and Expectation
We can define the PMF, CDF, and expectation for this conditional random variable:
- Conditional PMF: $f_{X \mid A}(x) = \Pr[X = x \mid A] = \frac{\Pr[\{X = x\} \cap A]}{\Pr[A]}$. This is the probability that the random variable $X$ takes the value $x$, given that event $A$ has occurred.
- Conditional CDF: $F_{X \mid A}(x) = \Pr[X \le x \mid A] = \sum_{y \le x} f_{X \mid A}(y)$. This is the probability that $X$ is less than or equal to $x$, given that event $A$ has occurred.
- Conditional Expectation: $\mathbb{E}[X \mid A] = \sum_{x} x \cdot f_{X \mid A}(x)$. Using the definition of conditional probability, $f_{X \mid A}(x) = \frac{\Pr[\{X = x\} \cap A]}{\Pr[A]}$. So, $\mathbb{E}[X \mid A] = \frac{1}{\Pr[A]} \sum_{\omega \in A} X(\omega) \cdot \Pr[\omega]$ (for a Laplace space the last sum simplifies to $\frac{1}{|A|} \sum_{\omega \in A} X(\omega)$; more generally, we sum over $\omega \in A$).
A random variable $X$ is said to be independent of an event $A$ if its conditional distribution given $A$ is the same as its unconditional distribution, i.e., $\Pr[X = x \mid A] = \Pr[X = x]$ for all $x$. This means knowing that $A$ occurred doesn't change how $X$ behaves.
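To make these definitions concrete, here is a minimal Python sketch that computes a conditional PMF and a conditional expectation by direct enumeration; the fair die and the conditioning event $A$ = "outcome is at least 4" are made up for illustration.

```python
from fractions import Fraction

# Minimal sketch: a fair six-sided die, conditioned on the (made-up) event
# A = "the outcome is at least 4".
omega = range(1, 7)                       # sample space {1, ..., 6}
prob = {w: Fraction(1, 6) for w in omega}
X = {w: w for w in omega}                 # X(omega) = the outcome itself

A = {w for w in omega if w >= 4}          # conditioning event, Pr[A] = 1/2
pr_A = sum(prob[w] for w in A)

# Conditional PMF: f_{X|A}(x) = Pr[{X = x} and A] / Pr[A]
f_cond = {}
for w in A:
    f_cond[X[w]] = f_cond.get(X[w], Fraction(0)) + prob[w] / pr_A

# Conditional expectation: E[X | A] = sum_x x * f_{X|A}(x)
e_cond = sum(x * p for x, p in f_cond.items())

print(f_cond)   # {4: Fraction(1, 3), 5: Fraction(1, 3), 6: Fraction(1, 3)}
print(e_cond)   # Fraction(5, 1), i.e. E[X | A] = 5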
Example: Sum of Two Dice Rolls Conditioned on Events
Let $D_1$ be the outcome of the first die roll and $D_2$ the outcome of the second.
- $\mathbb{E}[D_1] = \mathbb{E}[D_2] = 3.5$.
Let $S = D_1 + D_2$ be the sum of the two dice.
- $\mathbb{E}[S] = \mathbb{E}[D_1] + \mathbb{E}[D_2] = 7$.
Consider two events:
- $A$ = “$D_2$ is even” (i.e., $D_2 \in \{2, 4, 6\}$). $\Pr[A] = \frac{1}{2}$.
- $B$ = “$S$ is even” (i.e., $S \in \{2, 4, 6, 8, 10, 12\}$). $\Pr[B] = \frac{1}{2}$.
Expectation of the sum given the second die is even
Let's look at $\mathbb{E}[S \mid A]$.
By linearity of conditional expectation (which holds just like regular expectation):
$\mathbb{E}[S \mid A] = \mathbb{E}[D_1 \mid A] + \mathbb{E}[D_2 \mid A]$.
- $D_1$ (the first die roll) is independent of $A$ (the second die roll being even). So, $\mathbb{E}[D_1 \mid A] = \mathbb{E}[D_1] = 3.5$.
- $\mathbb{E}[D_2 \mid A]$: We are given that $D_2$ is even. The possible values for $D_2$ are $2, 4, 6$, each with probability $\frac{1}{3}$ (within the conditioned space $A$). So, $\mathbb{E}[D_2 \mid A] = \frac{2 + 4 + 6}{3} = 4$. So, $\mathbb{E}[S \mid A] = 3.5 + 4 = 7.5$.
Knowing the second die is even increases the expected sum, which makes sense.
Expectation of the sum given the sum is even
Now consider $\mathbb{E}[S \mid B]$. Again, $\mathbb{E}[S \mid B] = \mathbb{E}[D_1 \mid B] + \mathbb{E}[D_2 \mid B]$.
It turns out that $D_1$ is independent of $B$ (the sum being even doesn't change the probabilities for $D_1$): whatever $D_1$ is, exactly half of the outcomes of $D_2$ make the sum even. For any $d \in \{1, \dots, 6\}$, $\Pr[D_1 = d \mid B] = \Pr[D_1 = d] = \frac{1}{6}$. So $\mathbb{E}[D_1 \mid B] = 3.5$.
Similarly, $\mathbb{E}[D_2 \mid B] = 3.5$. Thus, $\mathbb{E}[S \mid B] = 3.5 + 3.5 = 7$.
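Both results are easy to verify by brute-force enumeration over the 36 equally likely outcomes. A minimal Python sketch (the helper `cond_expectation` is our own naming):

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes of two fair dice and verify
# E[S | A] = 7.5 and E[S | B] = 7 by brute force.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

def cond_expectation(value, event):
    """E[value(w) | event] over the finite sample space."""
    pr_event = sum(p for w in outcomes if event(w))
    return sum(value(w) * p for w in outcomes if event(w)) / pr_event

S = lambda w: w[0] + w[1]
A = lambda w: w[1] % 2 == 0             # second die is even
B = lambda w: (w[0] + w[1]) % 2 == 0    # sum is even

print(cond_expectation(S, A))   # 15/2 (= 7.5)
print(cond_expectation(S, B))   # 7
```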
Multiple Random Variables
Often, we work with several random variables simultaneously, defined on the same sample space.
For example, if we flip $n$ coins: $X_i$ = indicator for Head on the $i$-th flip, and $X = X_1 + \dots + X_n$ = total number of Heads.
Here, $X_1, \dots, X_n$ and $X$ are all random variables on the same space of $n$ coin flips.
Another example:
- Roll a die: $Z$ = outcome of the die.
- Flip that many coins: $K$ = number of Heads obtained. Here $Z$ and $K$ are two random variables related in a sequential experiment.
Joint PMF (Gemeinsame Dichte)
When we have two discrete random variables $X$ and $Y$, their joint probability mass function $f_{X,Y}$ gives the probability that $X$ takes value $x$ AND $Y$ takes value $y$: $f_{X,Y}(x, y) = \Pr[X = x, Y = y]$ (short for $\Pr[\{X = x\} \cap \{Y = y\}]$).
Marginal PMF (Randdichte)
From the joint PMF, we can recover the individual PMF of $X$ (called the marginal PMF of $X$) by summing over all possible values of $Y$: $f_X(x) = \sum_{y} f_{X,Y}(x, y)$.
This is by the law of total probability: the event $\{X = x\}$ is partitioned by the events $\{X = x, Y = y\}$ for all possible $y$. Similarly, $f_Y(y) = \sum_{x} f_{X,Y}(x, y)$.
Example: Die roll (number $Z$) then $Z$ coin flips (number of Heads $K$)
$Z$ is uniform on $\{1, \dots, 6\}$; given $Z = z$, we flip $z$ fair coins and $K$ counts the Heads.
What is $f_{Z,K}(z, k) = \Pr[Z = z, K = k]$?
$\Pr[Z = z] = \frac{1}{6}$ for $z \in \{1, \dots, 6\}$.
Given $Z = z$ (we flip $z$ coins), $K$ (the number of Heads) follows $\mathrm{Bin}(z, \frac{1}{2})$. So, $\Pr[K = k \mid Z = z] = \binom{z}{k} \left(\frac{1}{2}\right)^z$.
Thus, $f_{Z,K}(z, k) = \Pr[Z = z] \cdot \Pr[K = k \mid Z = z] = \frac{1}{6} \binom{z}{k} \left(\frac{1}{2}\right)^z$ for $1 \le z \le 6$ and $0 \le k \le z$ (and $0$ otherwise).
Let's calculate $f_K(0)$ (the marginal probability that $K = 0$):
For $z \in \{1, \dots, 6\}$, $f_{Z,K}(z, 0) = \frac{1}{6}\left(\frac{1}{2}\right)^z$, since $\binom{z}{0} = 1$.
$f_K(0) = \sum_{z=1}^{6} f_{Z,K}(z, 0) = \frac{1}{6} \sum_{z=1}^{6} \left(\frac{1}{2}\right)^z = \frac{1}{6} \cdot \frac{63}{64} = \frac{63}{384} = \frac{21}{128}$.
So $f_K(0) = \frac{21}{128} \approx 0.16$.
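The same marginalization can be checked numerically. The following Python sketch builds the joint PMF $f_{Z,K}$ from the formula above and sums out $Z$:

```python
from fractions import Fraction
from math import comb

# Joint PMF f_{Z,K}(z, k) = (1/6) * C(z, k) * (1/2)^z for 1 <= z <= 6, 0 <= k <= z.
joint = {
    (z, k): Fraction(1, 6) * comb(z, k) * Fraction(1, 2) ** z
    for z in range(1, 7)
    for k in range(0, z + 1)
}

# Marginal PMF of K: sum the joint PMF over all values of Z.
f_K = {}
for (z, k), p in joint.items():
    f_K[k] = f_K.get(k, Fraction(0)) + p

print(f_K[0])               # 21/128
print(sum(f_K.values()))    # 1 (sanity check: the marginal PMF sums to 1)
```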
Independence of Random Variables
Definition of Independence (Version 1 - using events)
TLDR: all events of the form “$X_i = x_i$” are mutually independent, for every choice of values.
Random variables $X_1, \dots, X_n$ are said to be independent if for all choices of values $x_1, \dots, x_n$ (where $x_i$ is in the range of $X_i$), the events “$X_1 = x_1$”, “$X_2 = x_2$”, …, “$X_n = x_n$” are mutually independent events.
This means: $\Pr[X_1 = x_1, \dots, X_n = x_n] = \Pr[X_1 = x_1] \cdots \Pr[X_n = x_n]$. And this must hold not just for the full set of variables, but for every subset of these variables as well (to ensure mutual independence of the events).
Many of these equations are redundant. For example, suppose $X_3$ only takes the values $0$ and $1$:
If we know: $\Pr[X_1 = x_1, X_2 = x_2, X_3 = 0] = \Pr[X_1 = x_1]\Pr[X_2 = x_2]\Pr[X_3 = 0]$ and $\Pr[X_1 = x_1, X_2 = x_2, X_3 = 1] = \Pr[X_1 = x_1]\Pr[X_2 = x_2]\Pr[X_3 = 1]$,
Then $\Pr[X_1 = x_1, X_2 = x_2]$ can be found by summing these two (by the law of total probability over $X_3$): $\Pr[X_1 = x_1, X_2 = x_2] = \Pr[X_1 = x_1]\Pr[X_2 = x_2]\left(\Pr[X_3 = 0] + \Pr[X_3 = 1]\right)$.
Since $\Pr[X_3 = 0] + \Pr[X_3 = 1] = 1$, we get $\Pr[X_1 = x_1, X_2 = x_2] = \Pr[X_1 = x_1]\Pr[X_2 = x_2]$. This means the pairwise independence of $X_1$ and $X_2$ follows from the conditions involving $X_3$.
Definition of Independence (Version 2 - using PMFs)
TLDR: Joint PMF can be decomposed into product of marginal PMFs.
A more concise definition: Random variables $X_1, \dots, X_n$ are independent if their joint PMF is the product of their marginal PMFs: $f_{X_1, \dots, X_n}(x_1, \dots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n)$ for all $(x_1, \dots, x_n)$ in the combined range $W_{X_1} \times \dots \times W_{X_n}$.
This single equation covering all combinations of values implies the subset conditions automatically.
Trick for Two Indicator Variables
For two indicator variables $X$ and $Y$ (each taking only the values $0$ and $1$): $X$ and $Y$ are independent $\iff \Pr[X = 1, Y = 1] = \Pr[X = 1] \cdot \Pr[Y = 1]$.
This is because $\Pr[X = 1, Y = 0] = \Pr[X = 1] - \Pr[X = 1, Y = 1]$, $\Pr[X = 0, Y = 1] = \Pr[Y = 1] - \Pr[X = 1, Y = 1]$, and $\Pr[X = 0, Y = 0] = 1 - \Pr[X = 1] - \Pr[Y = 1] + \Pr[X = 1, Y = 1]$.
If this single condition holds, all other combinations like $\Pr[X = 1, Y = 0]$ also work out. For instance, $\Pr[X = 1, Y = 0] = \Pr[X = 1] - \Pr[X = 1, Y = 1] = \Pr[X = 1] - \Pr[X = 1]\Pr[Y = 1]$ (using the condition) $= \Pr[X = 1]\left(1 - \Pr[Y = 1]\right) = \Pr[X = 1]\Pr[Y = 0]$.
Caution: This shortcut does not generally apply for three or more indicator variables.
Example of Independence/Dependence of Indicator Variables
- $\Omega = \{1, \dots, 6\}$, $\Pr[\omega] = \frac{1}{6}$ for each $\omega$. $X(\omega) = 1$ if $\omega$ is divisible by 2, else $0$. $X = 1$ for $\omega \in \{2, 4, 6\}$. So $\Pr[X = 1] = \frac{3}{6} = \frac{1}{2}$.
$Y(\omega) = 1$ if $\omega$ is divisible by 3, else $0$. $Y = 1$ for $\omega \in \{3, 6\}$. So $\Pr[Y = 1] = \frac{2}{6} = \frac{1}{3}$.
The event $\{X = 1, Y = 1\}$ means $\omega$ is divisible by 2 AND 3, so divisible by 6. This is only $\omega = 6$, so $\Pr[X = 1, Y = 1] = \frac{1}{6}$. Check: $\Pr[X = 1] \cdot \Pr[Y = 1] = \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{6}$. Since these are equal, $X$ and $Y$ are independent.
- $\Omega = \{1, \dots, 5\}$, $\Pr[\omega] = \frac{1}{5}$ for each $\omega$. $X(\omega) = 1$ if $\omega$ is divisible by 2, else $0$. $X = 1$ for $\omega \in \{2, 4\}$, $X = 0$ for $\omega \in \{1, 3, 5\}$. So $\Pr[X = 1] = \frac{2}{5}$.
$Y(\omega) = 1$ if $\omega$ is divisible by 3, else $0$. $Y = 1$ for $\omega = 3$, $Y = 0$ for $\omega \in \{1, 2, 4, 5\}$. So $\Pr[Y = 1] = \frac{1}{5}$.
The event $\{X = 1, Y = 1\}$ means $\omega$ is divisible by 2 AND 3, i.e., by 6. No such $\omega$ exists in $\Omega$, so $\Pr[X = 1, Y = 1] = 0$. Check: $\Pr[X = 1] \cdot \Pr[Y = 1] = \frac{2}{5} \cdot \frac{1}{5} = \frac{2}{25} \neq 0$. Since these differ, $X$ and $Y$ are not independent (a quick computational check of both examples follows below).
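A small Python check of the indicator trick for both examples, using the sample spaces $\{1, \dots, 6\}$ and $\{1, \dots, 5\}$ assumed above:

```python
from fractions import Fraction

def check_indicator_independence(omega):
    """Check Pr[X=1, Y=1] == Pr[X=1] * Pr[Y=1] for the divisibility indicators."""
    p = Fraction(1, len(omega))
    pr_x1 = sum(p for w in omega if w % 2 == 0)
    pr_y1 = sum(p for w in omega if w % 3 == 0)
    pr_both = sum(p for w in omega if w % 2 == 0 and w % 3 == 0)
    return pr_both == pr_x1 * pr_y1

print(check_indicator_independence(range(1, 7)))  # True  (1/6 == 1/2 * 1/3)
print(check_indicator_independence(range(1, 6)))  # False (0 != 2/5 * 1/5)
```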
Properties of Independent Random Variables
- If $X_1, \dots, X_n$ are independent random variables, then for any choice of subsets $S_1, \dots, S_n$ (where $S_i$ are sets of values $X_i$ can take): $\Pr[X_1 \in S_1, \dots, X_n \in S_n] = \Pr[X_1 \in S_1] \cdots \Pr[X_n \in S_n]$. This is a more general statement of independence. (This is “Lemma 2.53”).
- Subsets of independent variables are independent: If $X_1, \dots, X_n$ are independent, then any subcollection, e.g., $X_{i_1}, \dots, X_{i_k}$, is also independent. (This is “Korollar 2.53”).
- Functions of independent variables: If $X_1, \dots, X_n$ are independent random variables, and $f_1, \dots, f_n$ are real-valued functions ($f_i: \mathbb{R} \to \mathbb{R}$), then the random variables $f_1(X_1), \dots, f_n(X_n)$ are also independent. (This is “Satz 2.52”). For example, if $X_1$ and $X_2$ are independent, then $X_1^2$ and $X_2^2$ are also independent.
Sum of Independent Random Variables
If $X$ and $Y$ are two independent discrete random variables, and $Z = X + Y$, what is the PMF of $Z$, $f_Z(z) = \Pr[Z = z]$?
The event $\{Z = z\}$ occurs if $X = x$ and $Y = z - x$ for some $x$.
Since $X$ and $Y$ are independent: $f_Z(z) = \sum_{x} \Pr[X = x] \cdot \Pr[Y = z - x] = \sum_{x} f_X(x) \, f_Y(z - x)$.
This is called the convolution of the PMFs of $X$ and $Y$.
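A minimal Python sketch of this convolution for PMFs with finite support (the dictionary representation "value → probability" is our own choice):

```python
from fractions import Fraction

def convolve(f_X, f_Y):
    """PMF of X + Y for independent X, Y, each given as a dict value -> probability."""
    f_Z = {}
    for x, px in f_X.items():
        for y, py in f_Y.items():
            f_Z[x + y] = f_Z.get(x + y, Fraction(0)) + px * py
    return f_Z

# Example: the sum of two fair dice.
die = {v: Fraction(1, 6) for v in range(1, 7)}
two_dice = convolve(die, die)
print(two_dice[7])   # 1/6, the most likely sum
print(two_dice[2])   # 1/36
```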
Sum of Multiple Independent Random Variables
If $X$, $Y$, and $W$ are independent discrete random variables, and you define $Z = X + Y + W$, then the PMF of $Z$, denoted $f_Z$, is the convolution of the PMFs of $X$, $Y$, and $W$.
Step-by-step:
- First, define the PMFs: $f_X(x) = \Pr[X = x]$, $f_Y(y) = \Pr[Y = y]$, $f_W(w) = \Pr[W = w]$.
- The PMF of $Z$ is: $f_Z(z) = \sum_{x \in W_X} \sum_{y \in W_Y} f_X(x) \, f_Y(y) \, f_W(z - x - y)$, where $W_X$ and $W_Y$ are the supports of $X$ and $Y$ respectively.
This is a double convolution.
Consequences for Common Distributions
This convolution formula leads to some nice “closure” properties for sums of independent random variables from certain families:
- Sum of independent Poissons: If $X \sim \mathrm{Po}(\lambda)$ and $Y \sim \mathrm{Po}(\mu)$ are independent, then $X + Y \sim \mathrm{Po}(\lambda + \mu)$. The sum of independent Poisson variables is also Poisson, with the parameter being the sum of the parameters.
- Sum of independent Binomials (with the same $p$): If $X \sim \mathrm{Bin}(n, p)$ and $Y \sim \mathrm{Bin}(m, p)$ are independent (note: same $p$), then $X + Y \sim \mathrm{Bin}(n + m, p)$. This makes intuitive sense: $X$ is a sum of $n$ Bernoulli($p$) trials, $Y$ is a sum of $m$ Bernoulli($p$) trials. Their sum is like $n + m$ Bernoulli($p$) trials. (A numerical check of both closure properties follows below.)
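As a sanity check, one can convolve the PMFs numerically and compare against the closed-form families. The parameters below ($n = 3$, $m = 5$, $p = 0.4$, $\lambda = 1.5$, $\mu = 2.5$) are arbitrary illustration values, and the Poisson PMFs are truncated.

```python
from math import comb, exp, isclose

def convolve_float(f_X, f_Y):
    """Convolution of two PMFs given as dicts value -> probability (floats)."""
    f_Z = {}
    for x, px in f_X.items():
        for y, py in f_Y.items():
            f_Z[x + y] = f_Z.get(x + y, 0.0) + px * py
    return f_Z

def binom_pmf(n, p):
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

def poisson_pmf(lam, cutoff=60):
    # Truncated Poisson PMF; the tail beyond `cutoff` is negligible here.
    pmf, term = {}, exp(-lam)
    for k in range(cutoff + 1):
        pmf[k] = term
        term *= lam / (k + 1)
    return pmf

# Bin(3, 0.4) + Bin(5, 0.4) should match Bin(8, 0.4).
lhs = convolve_float(binom_pmf(3, 0.4), binom_pmf(5, 0.4))
rhs = binom_pmf(8, 0.4)
print(all(isclose(lhs[k], rhs[k]) for k in rhs))                        # True

# Po(1.5) + Po(2.5) should match Po(4.0), up to truncation/rounding error.
lhs = convolve_float(poisson_pmf(1.5), poisson_pmf(2.5))
rhs = poisson_pmf(4.0)
print(all(isclose(lhs[k], rhs[k], abs_tol=1e-12) for k in range(40)))   # True
```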
Wald’s Identity (Waldsche Identität)
Wald's Identity relates the expected value of a sum of a random number of independent and identically distributed (i.i.d.) random variables to the expected number of summands and the mean of a single summand.
Setup
- Let $X_1, X_2, \dots$ be a sequence of i.i.d. random variables, each with mean $\mu = \mathbb{E}[X_i]$.
- Let $N$ be a non-negative integer-valued random variable (a “stopping time”) that is independent of the sequence $X_1, X_2, \dots$. The value of $N$ tells us how many $X_i$'s to sum.
- Let $Z = \sum_{i=1}^{N} X_i$. If $N = 0$, then $Z = 0$.
Intuition for Setup: Imagine you’re at a casino playing a game.
- $X_i$: The amount you win or lose on the $i$-th play of the game. These are i.i.d. (each play is similar, and outcomes are independent). $\mu$ is your average win/loss per play.
- $N$: The number of times you decide to play. This number could itself be random (e.g., you play until you win a certain amount, or until you've played 5 times, or until you get bored).
- Crucial condition: Your decision on how many games to play ($N$) must be made independently of the actual outcomes of those games. For example, $N$ cannot be “I'll stop playing as soon as I see a big win $X_i$”. If $N$ depends on the $X_i$ values, Wald's Identity (in this simple form) doesn't apply. (The concept of a “stopping time” is more general and can relax this independence, but for this version, independence of $N$ and the sequence $X_1, X_2, \dots$ is key.)
- $Z = \sum_{i=1}^{N} X_i$: Your total winnings/losses after playing $N$ games.
Wald’s Identity states
$\mathbb{E}[Z] = \mathbb{E}[N] \cdot \mathbb{E}[X_1]$ (or, equivalently, $\mathbb{E}[Z] = \mathbb{E}[N] \cdot \mu$).
Example 1: Die Roll and Coin Flips (revisited)
- Roll a fair die. Let $N$ be the outcome. $N \in \{1, \dots, 6\}$, uniform. $\mathbb{E}[N] = 3.5$.
- Flip a fair coin $N$ times. Let $X_i$ be the indicator for Head on the $i$-th flip. $\mathbb{E}[X_i] = \frac{1}{2}$.
- Let $K = \sum_{i=1}^{N} X_i$ be the total number of Heads.
Here, $N$ (the die roll) is independent of the outcomes of the $X_i$'s (the coin flips).
By Wald's Identity: $\mathbb{E}[K] = \mathbb{E}[N] \cdot \mathbb{E}[X_1] = 3.5 \cdot \frac{1}{2} = 1.75$.
Note: We use $\mathbb{E}[X_1]$ in the equation since all the $X_i$ are i.i.d. and their expected value is the same.
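A quick Monte Carlo check of this example (the number of trials and the seed are arbitrary simulation choices):

```python
import random

# Monte Carlo check of Wald's Identity for the die-then-coins experiment:
# E[K] should be E[N] * E[X_1] = 3.5 * 0.5 = 1.75.
random.seed(0)
trials = 200_000
total_heads = 0
for _ in range(trials):
    n = random.randint(1, 6)                                     # the die roll N
    total_heads += sum(random.randint(0, 1) for _ in range(n))   # N fair coin flips
print(total_heads / trials)   # ≈ 1.75
```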
Example 2: Two Phases of Coin Flips
- Phase 1: Flip a fair coin until the first Head appears. Let $N$ be the number of flips in this phase. $N \sim \mathrm{Geo}(\frac{1}{2})$. So $\mathbb{E}[N] = 2$.
- Phase 2: Flip the coin $N$ more times. Let $X_i$ be the indicator for Head on the $i$-th flip in this second phase. $\mathbb{E}[X_i] = \frac{1}{2}$. Let $Y = \sum_{i=1}^{N} X_i$ be the number of Heads in Phase 2.
The number of flips in Phase 2 ($N$) is determined by Phase 1 and is independent of the outcomes ($X_i$) within Phase 2.
By Wald's Identity, $\mathbb{E}[Y] = \mathbb{E}[N] \cdot \mathbb{E}[X_1] = 2 \cdot \frac{1}{2} = 1$.
So, we expect 1 Head in Phase 2.
Let $H$ be the total number of Heads overall (Phase 1 + Phase 2). Phase 1 contributes exactly 1 Head (the one that stopped Phase 1). So, $H = 1 + Y$ and $\mathbb{E}[H] = 1 + \mathbb{E}[Y] = 2$.
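A similar simulation for the two-phase experiment (again with arbitrary simulation parameters):

```python
import random

# Monte Carlo check of the two-phase experiment: E[Y] should be 1 and E[H] = 2.
random.seed(0)
trials = 200_000
total_phase2_heads = 0
total_heads = 0
for _ in range(trials):
    # Phase 1: flip until the first Head; N = number of flips (Head <=> random() < 0.5).
    n = 1
    while random.random() >= 0.5:
        n += 1
    # Phase 2: flip N more times; Y = number of Heads among them.
    y = sum(random.randint(0, 1) for _ in range(n))
    total_phase2_heads += y
    total_heads += 1 + y          # Phase 1 contributes exactly one Head
print(total_phase2_heads / trials)   # ≈ 1
print(total_heads / trials)          # ≈ 2
```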
Variance and Concentration
The expected value $\mathbb{E}[X]$ tells us the “center” or average of a random variable $X$. But it doesn't tell us how spread out the values of $X$ are.
We'd like to say that if we know $\mathbb{E}[X]$, then $X$ is likely to be “close” to $\mathbb{E}[X]$. That is, $X \approx \mathbb{E}[X]$ with high probability.
However, this isn't always true just based on $\mathbb{E}[X]$.
Example:
- $\Pr[X = a] = \frac{1}{2}$ and $\Pr[X = -a] = \frac{1}{2}$ for some constant $a > 0$.
- Then $\mathbb{E}[X] = \frac{1}{2} a + \frac{1}{2}(-a) = 0$.
But $X$ is never close to its expectation 0! It's always $a$ away.
We need a measure of spread or deviation.
Variance
The variance of a random variable $X$, denoted $\mathrm{Var}[X]$ or $\sigma^2$, measures the expected squared deviation of $X$ from its mean $\mu = \mathbb{E}[X]$: $\mathrm{Var}[X] = \mathbb{E}\left[(X - \mu)^2\right]$.
Since $(X - \mu)^2 \ge 0$, $\mathrm{Var}[X] \ge 0$. A small variance means $X$ tends to be close to its mean. A large variance means $X$ can be far from its mean.
Note: One could also measure spread using the average absolute deviation, or even higher powers like the 4th or 6th. But squaring turns out to be mathematically convenient: it leads to cleaner algebra and works well with many powerful tools later on.
Alternative Formula for Variance
Using linearity of expectation: $\mathrm{Var}[X] = \mathbb{E}\left[(X - \mu)^2\right] = \mathbb{E}\left[X^2 - 2\mu X + \mu^2\right] = \mathbb{E}[X^2] - 2\mu\,\mathbb{E}[X] + \mu^2$.
Since $\mu = \mathbb{E}[X]$ is a constant: $\mathrm{Var}[X] = \mathbb{E}[X^2] - 2\mu^2 + \mu^2 = \mathbb{E}[X^2] - \mathbb{E}[X]^2$.
This is often easier to compute: calculate $\mathbb{E}[X^2]$ and $\mathbb{E}[X]$, then use this formula.
The standard deviation $\sigma = \sqrt{\mathrm{Var}[X]}$ is also commonly used. It has the same units as $X$.
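For a fair die, both formulas give the same result; a minimal sketch:

```python
from fractions import Fraction

# Variance of a fair die, computed from the definition and from the
# alternative formula Var[X] = E[X^2] - E[X]^2.
values = range(1, 7)
p = Fraction(1, 6)

mean = sum(x * p for x in values)                      # 7/2
var_def = sum((x - mean) ** 2 * p for x in values)     # E[(X - mu)^2]
var_alt = sum(x**2 * p for x in values) - mean**2      # E[X^2] - E[X]^2

print(mean, var_def, var_alt)   # 7/2 35/12 35/12
```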
Properties of Variance
For any random variable $X$ and real constants $a, b$: $\mathrm{Var}[aX + b] = a^2\,\mathrm{Var}[X]$.
This tells us two important things:
- Shifting a random variable by a constant ($b$) doesn't affect its variance.
- Scaling a random variable by a factor ($a$) multiplies the variance by $a^2$.
Proof Sketch
Let $Y = aX + b$. Then $\mathbb{E}[Y] = a\,\mathbb{E}[X] + b$, and: $\mathrm{Var}[Y] = \mathbb{E}\left[(Y - \mathbb{E}[Y])^2\right] = \mathbb{E}\left[(aX + b - a\,\mathbb{E}[X] - b)^2\right] = \mathbb{E}\left[a^2 (X - \mathbb{E}[X])^2\right] = a^2\,\mathrm{Var}[X]$.
So, the constant shift cancels out—it doesn’t affect how far values are from the mean, only where that mean is located.
The key idea: shifting moves everything together, so the “spread” stays the same. But scaling stretches or compresses the deviations, so it changes the variance.
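A quick numerical check of $\mathrm{Var}[aX + b] = a^2\,\mathrm{Var}[X]$ for a fair die, with arbitrarily chosen $a = 3$, $b = 7$:

```python
from fractions import Fraction

# Check Var[aX + b] = a^2 * Var[X] for a fair die; a and b are arbitrary.
values = range(1, 7)
p = Fraction(1, 6)
a, b = 3, 7

def variance(xs):
    mean = sum(x * p for x in xs)
    return sum((x - mean) ** 2 * p for x in xs)

var_X = variance(values)
var_Y = variance([a * x + b for x in values])
print(var_Y == a**2 * var_X)   # True
```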
Next time, we’ll see how variance lets us quantify how tightly a random variable clusters around its mean, using tools like Markov’s and Chebyshev’s inequalities.
Continue here: 14 Rules for Moments (Expectation, Variance), Estimating Probabilities (Markov, Chebyshev), Chernoff Bounds