Having explored the fundamentals of probability spaces and events, we now introduce the concept of random variables, a crucial abstraction that allows us to work with numerical outcomes of random experiments. Random variables bridge the gap between probability theory and mathematical analysis, enabling us to use powerful tools from calculus and linear algebra to study random phenomena.

Definition of Random Variables (2.4)

A random variable is formally defined as a function that maps outcomes from a sample space Ω to the real numbers ℝ. It assigns a numerical value to each possible outcome of a random experiment.

Definition 2.25: A random variable is a function X: Ω → ℝ, where Ω is the sample space of a probability space.

The range or value set of a random variable X, denoted W_X, is the set of all possible values that X can take:

W_X := X(Ω) = {X(ω) : ω ∈ Ω}.

For discrete probability spaces (which we primarily consider in this course), the range of a random variable is always countable, meaning it is either finite or countably infinite.

Examples of Random Variables

  • Coin Tosses: Consider the experiment of tossing a fair coin three times. The sample space is Ω = {H, T}³ = {HHH, HHT, HTH, …, TTT}. We can define a random variable X that represents the number of heads in three tosses. For example, X(HHH) = 3, X(HHT) = 2, X(TTT) = 0. The range of X is W_X = {0, 1, 2, 3}.

  • Die Roll: In the experiment of rolling a six-sided die, Ω = {1, 2, 3, 4, 5, 6}. We can define a random variable Y that represents the outcome of the die roll. In this case, Y(ω) = ω for each ω ∈ Ω, and W_Y = {1, 2, 3, 4, 5, 6}.
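The coin-toss example can be made concrete with a short Python sketch; the names `omega`, `X`, and `W_X` are illustrative, mirroring the notation above:

```python
from itertools import product

# Sample space for three tosses of a fair coin: 8 equally likely outcomes.
omega = list(product("HT", repeat=3))

# Random variable X: number of heads in an outcome.
def X(outcome):
    return outcome.count("H")

# Range W_X of X: the set of all values X can take.
W_X = {X(w) for w in omega}

assert len(omega) == 8
assert W_X == {0, 1, 2, 3}
assert X(("H", "H", "T")) == 2
```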

Probability Distribution of a Random Variable

When we study a random variable X, we are often interested in the probabilities of X taking on specific values or falling within certain ranges. This information is captured by the probability distribution of X.

For a discrete random variable X with range W_X, we consider the events {X = x} := {ω ∈ Ω : X(ω) = x} for x ∈ W_X. The probability of X taking the value x is denoted Pr[X = x] and is calculated as the sum of the probabilities of all elementary events in the event {X = x}:

Pr[X = x] = Σ_{ω ∈ Ω : X(ω) = x} Pr[ω].

We can also consider cumulative probabilities, such as Pr[X ≤ x], which represents the probability that the random variable takes a value less than or equal to x.

Density Function and Distribution Function

To characterize the probability distribution of a random variable, we use two key functions:

  • Density Function (f_X): The density function (or probability mass function for discrete random variables) gives the probability that the random variable takes on a specific value x:

    f_X(x) := Pr[X = x].

    The density function is non-zero only for values x in the range W_X.

  • Distribution Function (F_X): The distribution function (or cumulative distribution function, CDF) gives the cumulative probability that the random variable takes a value less than or equal to x:

    F_X(x) := Pr[X ≤ x] = Σ_{x' ∈ W_X, x' ≤ x} f_X(x').

The density function f_X and the distribution function F_X each provide a complete description of the probability distribution of a random variable. Knowing either of these functions allows us to determine all probabilities associated with the random variable.
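As an illustration, the density and distribution function of the "number of heads in three fair coin tosses" variable can be computed by direct enumeration (a minimal sketch; `f` and `F` stand in for f_X and F_X):

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=3))
pr = Fraction(1, 8)                      # probability of each elementary event

def X(w):
    return w.count("H")

def f(x):
    # Density f_X(x) = Pr[X = x]: sum over the matching elementary events.
    return sum((pr for w in omega if X(w) == x), Fraction(0))

def F(x):
    # Distribution function F_X(x) = Pr[X <= x].
    return sum((pr for w in omega if X(w) <= x), Fraction(0))

assert f(1) == Fraction(3, 8)
assert f(5) == 0                          # zero outside the range W_X
assert F(2) == Fraction(7, 8)
assert F(3) == 1
```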

Notation Conventions

To simplify notation, we often use shorthand notations for probabilities involving random variables. Instead of writing Pr[{ω ∈ Ω : X(ω) = x}], we simply write Pr[X = x]. Similarly, Pr[X ≤ x] is used for Pr[{ω ∈ Ω : X(ω) ≤ x}]. We may also use expressions like Pr[X ≥ x], Pr[x₁ ≤ X ≤ x₂], and Pr[X ∈ A] with analogous interpretations.

Expectation (2.4.1)

One of the most important characteristics of a random variable is its expectation or expected value. The expectation of a random variable represents its “average” or “mean” value over many repetitions of the random experiment. It is a measure of the central tendency of the distribution.

Definition 2.27: For a random variable X, the expectation or expected value E[X] is defined as:

E[X] := Σ_{x ∈ W_X} x · Pr[X = x],

provided that the sum converges absolutely. If the sum does not converge absolutely, the expectation is said to be undefined.

For random variables with a finite range, the expectation is always well-defined. However, for random variables with infinite range, we need to ensure convergence of the sum.

Intuitive Interpretation of Expectation

The expectation E[X] can be interpreted as the long-run average value of X in repeated trials of the random experiment. If we perform the experiment many times and record the values of X, the average of these values will tend to approach E[X] as the number of trials increases. For example, in the coin toss example, the expected number of heads in three tosses is E[X] = 0 · 1/8 + 1 · 3/8 + 2 · 3/8 + 3 · 1/8 = 1.5. This does not mean we will ever observe 1.5 heads in a single experiment, but over many repetitions, the average number of heads per experiment will be close to 1.5.
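This long-run-average interpretation is easy to check empirically; the following sketch simulates the three-toss experiment many times (the fixed seed and trial count are arbitrary choices for reproducibility):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Repeat the experiment "toss a fair coin three times, count heads" many times.
trials = 100_000
total = sum(sum(random.randint(0, 1) for _ in range(3)) for _ in range(trials))
average = total / trials

# The empirical average approaches E[X] = 1.5 as the number of trials grows.
assert abs(average - 1.5) < 0.05
```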

Alternative Formula for Expectation (Lemma 2.29)

An alternative, often useful, formula for the expectation expresses it as a sum over all elementary events in the sample space:

Lemma 2.29: For a random variable X:

E[X] = Σ_{ω ∈ Ω} X(ω) · Pr[ω].

This formula directly relates the expectation to the probabilities of elementary events and can be useful in certain calculations.
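For the coin-toss variable, both formulas can be evaluated by enumeration and give the same value (a small verification sketch; the variable names are illustrative):

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=3))
pr = Fraction(1, 8)

def X(w):
    return w.count("H")

# Definition 2.27: sum over the range of X.
by_range = sum(x * sum((pr for w in omega if X(w) == x), Fraction(0))
               for x in {X(w) for w in omega})

# Lemma 2.29: sum over elementary events.
by_outcomes = sum(X(w) * pr for w in omega)

assert by_range == by_outcomes == Fraction(3, 2)
```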

Expectation for Non-negative Integer Random Variables (Theorem 2.30)

For random variables that take only non-negative integer values (i.e., W_X ⊆ ℕ₀), there is another convenient formula for the expectation, expressed as a sum of tail probabilities:

Theorem 2.30: If X is a random variable with W_X ⊆ ℕ₀, then:

E[X] = Σ_{i ≥ 1} Pr[X ≥ i].

This formula expresses the expectation as the sum of probabilities of exceeding each integer value, providing a different perspective on expectation calculation, particularly useful for variables counting events or trials.
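The tail-sum formula can be checked against the direct definition for the coin-toss variable, whose values never exceed 3 (a minimal sketch):

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=3))
pr = Fraction(1, 8)

def X(w):
    return w.count("H")

E = sum(X(w) * pr for w in omega)                        # direct expectation

# Theorem 2.30: E[X] as a sum of tail probabilities Pr[X >= i].
tails = sum(sum((pr for w in omega if X(w) >= i), Fraction(0))
            for i in range(1, 4))                        # X never exceeds 3

assert E == tails == Fraction(3, 2)
```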

Markov’s Inequality

A fundamental inequality in probability theory, Markov’s Inequality, provides an upper bound on the probability that a non-negative random variable exceeds a certain threshold, in terms of its expectation.

Markov’s Inequality: If X is a non-negative random variable and t > 0, then:

Pr[X ≥ t] ≤ E[X] / t.

Markov’s Inequality is remarkably simple yet powerful. It provides a general bound on tail probabilities using only the expectation of the random variable, without requiring detailed knowledge of its distribution. While the bound provided by Markov’s Inequality can be loose in some cases, it is widely applicable and often sufficient for obtaining useful probabilistic estimates, especially when dealing with non-negative random variables.
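The bound is easy to verify exhaustively for the coin-toss variable; the last comment also shows how loose it can be:

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=3))
pr = Fraction(1, 8)

def X(w):
    return w.count("H")

E = sum(X(w) * pr for w in omega)   # E[X] = 3/2

for t in (1, 2, 3):
    tail = sum((pr for w in omega if X(w) >= t), Fraction(0))
    assert tail <= E / t            # Markov: Pr[X >= t] <= E[X] / t

# At t = 3 the bound is loose: Pr[X >= 3] = 1/8, while E[X]/3 = 1/2.
```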

Conditional Expectation

Analogous to conditional probability, we can define conditional expectation, which is the expected value of a random variable given that a certain event has occurred. Conditional expectation allows us to refine our predictions and update our expected values based on new information.

For a random variable X and an event A with Pr[A] > 0, the conditional expectation of X given A, denoted E[X | A], is defined as:

E[X | A] := Σ_{x ∈ W_X} x · Pr[X = x | A],

where Pr[X = x | A] is the conditional probability of the event {X = x} given the event A.
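Conditioning the coin-toss variable on "the first toss is heads" gives a small worked example (the event A is an illustrative choice, not from the text):

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=3))
pr = Fraction(1, 8)

def X(w):
    return w.count("H")

A = [w for w in omega if w[0] == "H"]          # event: the first toss is heads
pr_A = len(A) * pr                              # Pr[A] = 1/2

# E[X | A]: restrict the elementary-event sum to A and renormalise by Pr[A].
cond_exp = sum(X(w) * pr for w in A) / pr_A

assert cond_exp == 2   # one guaranteed head plus one expected head from two tosses
```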

Law of Total Expectation (Theorem 2.32)

Similar to the Law of Total Probability, the Law of Total Expectation (Theorem 2.32) allows us to calculate the expectation of a random variable by partitioning the sample space into disjoint events and considering conditional expectations within each partition.

Theorem 2.32 (Law of Total Expectation): Let A₁, …, Aₙ be pairwise disjoint events that partition the sample space (i.e., A₁ ∪ ⋯ ∪ Aₙ = Ω and Aᵢ ∩ Aⱼ = ∅ for i ≠ j), with Pr[Aᵢ] > 0 for all i. Let X be a random variable. Then, the expectation of X can be calculated as:

E[X] = Σ_{i = 1..n} E[X | Aᵢ] · Pr[Aᵢ].

This law is invaluable when calculating expectations for complex random variables by breaking down the calculation into simpler, conditional expectations. It is widely used in probabilistic analysis of algorithms and systems.
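Partitioning the coin-toss sample space by the outcome of the first toss gives a quick check of the law (a minimal sketch with illustrative names):

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=3))
pr = Fraction(1, 8)

def X(w):
    return w.count("H")

def cond_exp(event):
    # E[X | event] via the elementary-event sum restricted to the event.
    return sum(X(w) * pr for w in event) / (len(event) * pr)

# Partition Omega by the outcome of the first toss.
A1 = [w for w in omega if w[0] == "H"]
A2 = [w for w in omega if w[0] == "T"]

total = cond_exp(A1) * (len(A1) * pr) + cond_exp(A2) * (len(A2) * pr)
direct = sum(X(w) * pr for w in omega)

assert total == direct == Fraction(3, 2)
```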

Linearity of Expectation (Theorem 2.33)

One of the most powerful and frequently used properties of expectation is linearity of expectation (Theorem 2.33). It states that the expectation of a sum of random variables is equal to the sum of their expectations, regardless of whether the random variables are independent or dependent.

Theorem 2.33 (Linearity of Expectation): For random variables X₁, …, Xₙ and constants a₁, …, aₙ, b ∈ ℝ, let X := a₁X₁ + ⋯ + aₙXₙ + b. Then:

E[X] = a₁ · E[X₁] + ⋯ + aₙ · E[Xₙ] + b.

Linearity of expectation is a remarkably versatile tool. It allows us to calculate the expectation of complex random variables by decomposing them into sums of simpler variables, without worrying about dependencies between the variables. This property is extensively used in probabilistic analysis of algorithms, combinatorics, and various other areas.
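A classic illustration is the sum of two fair dice: linearity gives E[D₁ + D₂] = 7 immediately, which a full enumeration of the joint outcomes confirms:

```python
from fractions import Fraction

p = Fraction(1, 6)
faces = range(1, 7)

E_die = sum(v * p for v in faces)   # E[D] = 7/2 for one fair die

# Linearity: E[D1 + D2] = E[D1] + E[D2] = 7, with no independence needed.
assert E_die + E_die == 7

# Cross-check by enumerating the joint outcomes of two independent dice.
E_joint = sum((a + b) * p * p for a in faces for b in faces)
assert E_joint == 7
```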

Indicator Variables

A particularly useful class of random variables in conjunction with linearity of expectation is indicator variables. An indicator variable I_A for an event A is a random variable that takes the value 1 if the event A occurs and 0 otherwise.

Observation 2.35: For an indicator variable I_A of an event A ⊆ Ω:

I_A(ω) = 1 if ω ∈ A, and I_A(ω) = 0 otherwise.

The expectation of an indicator variable is simply the probability of the corresponding event:

E[I_A] = 1 · Pr[A] + 0 · Pr[Ā] = Pr[A].

Indicator variables, combined with linearity of expectation, provide a powerful technique for calculating expectations, especially for counting problems. By expressing a random variable as a sum of indicator variables, we can often simplify the calculation of its expectation significantly.
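A standard counting example: the expected number of fixed points of a uniformly random permutation is 1, because each position contributes an indicator with expectation 1/n. The enumeration below (for n = 4, an illustrative choice) confirms this:

```python
from fractions import Fraction
from itertools import permutations

n = 4
perms = list(permutations(range(n)))
p = Fraction(1, len(perms))                      # uniform random permutation

# I_i indicates the event "position i is a fixed point"; E[I_i] = 1/n.
E_indicator = [sum((p for pi in perms if pi[i] == i), Fraction(0))
               for i in range(n)]
assert all(e == Fraction(1, n) for e in E_indicator)

# By linearity, E[number of fixed points] = n * (1/n) = 1.
assert sum(E_indicator) == 1
```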

Variance (2.4.2)

While expectation provides a measure of the central tendency of a random variable, it does not capture the spread or variability of its distribution. Variance is a measure of how spread out the values of a random variable are around its mean. It quantifies the “dispersion” or “scatter” of the distribution.

Definition 2.39: For a random variable X with expectation μ := E[X], the variance Var[X] is defined as the expected value of the squared deviation from the mean:

Var[X] := E[(X − μ)²] = Σ_{x ∈ W_X} (x − μ)² · Pr[X = x].

The standard deviation σ := √(Var[X]) is the square root of the variance and provides a measure of spread in the original units of the random variable.

Computational Formula for Variance (Theorem 2.40)

An alternative, often computationally convenient, formula for variance is given by:

Theorem 2.40: For any random variable X:

Var[X] = E[X²] − (E[X])².

This formula expresses the variance in terms of the expectation of X² and the square of the expectation of X, often simplifying variance calculations.
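Both variance formulas can be compared on a fair die, where E[X] = 7/2 and E[X²] = 91/6, so Var[X] = 35/12:

```python
from fractions import Fraction

p = Fraction(1, 6)
faces = range(1, 7)

E = sum(v * p for v in faces)                          # E[X] = 7/2
E_sq = sum(v * v * p for v in faces)                   # E[X^2] = 91/6

by_definition = sum((v - E) ** 2 * p for v in faces)   # Definition 2.39
by_shortcut = E_sq - E ** 2                            # Theorem 2.40

assert by_definition == by_shortcut == Fraction(35, 12)
```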

Properties of Variance (Theorem 2.41)

Variance has several important properties that are useful in calculations and analysis.

Theorem 2.41: For a random variable X and constants a, b ∈ ℝ:

Var[aX + b] = a² · Var[X].

This property shows how variance scales and shifts under linear transformations of the random variable. Adding a constant b does not change the variance, while scaling by a factor a scales the variance by a².
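This scaling behaviour can be checked directly on a fair die with an arbitrary transformation, here Y = 3X + 5 (an illustrative choice):

```python
from fractions import Fraction

p = Fraction(1, 6)

def variance(values):
    # Var by Definition 2.39 for six equally likely values.
    E = sum(v * p for v in values)
    return sum((v - E) ** 2 * p for v in values)

faces = [Fraction(v) for v in range(1, 7)]
a, b = 3, 5
transformed = [a * v + b for v in faces]

# Var[aX + b] = a^2 * Var[X]: the shift b drops out, the scale a is squared.
assert variance(transformed) == a ** 2 * variance(faces)
```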

Moments of Random Variables

Expectation and variance are examples of moments of a random variable. Moments provide a way to characterize the distribution of a random variable through numerical values.

Definition 2.42: For a random variable X, the k-th moment is defined as E[X^k], and the k-th central moment is defined as E[(X − E[X])^k].

Expectation is the first moment, and variance is the second central moment. Higher-order moments capture more detailed information about the shape and characteristics of the distribution.

Wald’s Identity (2.6.4)

Wald’s Identity is a powerful result that relates the expectation of a sum of a random number of random variables to the expectations of the number of summands and the individual summands.

Theorem 2.65 (Wald’s Identity): Let N and X be two independent random variables with W_N ⊆ ℕ (N takes values in the natural numbers). Let X₁, X₂, … be independent copies of X. Define Z := X₁ + X₂ + ⋯ + X_N. Then:

E[Z] = E[N] · E[X].

Wald’s Identity is a fundamental result in probability theory with wide applications in areas such as queuing theory, risk theory, and analysis of algorithms. It provides a way to calculate the expectation of a sum when the number of terms in the sum is itself a random variable, provided that certain independence conditions are met.
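A tiny instance makes the identity concrete: let N be uniform on {1, 2} and let each Xᵢ be a fair 0/1 coin flip, independent of N (an illustrative choice). Enumerating all cases reproduces E[Z] = E[N] · E[X]:

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

# N is uniform on {1, 2}; each X_i is a fair 0/1 coin flip, independent of N.
E_N = Fraction(3, 2)
E_X = half

# Enumerate Z = X_1 + ... + X_N over all values of N and all flip sequences.
E_Z = Fraction(0)
for n in (1, 2):
    for xs in product((0, 1), repeat=n):
        E_Z += half * half ** n * sum(xs)   # Pr[N = n] * Pr[flips] * value of Z

assert E_Z == E_N * E_X == Fraction(3, 4)   # Wald: E[Z] = E[N] * E[X]
```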

In summary, this section has introduced the concept of random variables, their probability distributions, expectation, variance, and related properties. These concepts and tools are essential for probabilistic modeling, analysis, and the design of randomized algorithms, which will be explored in subsequent sections.
