Probability Under Constraints: Introducing Conditional Probability
In the preceding section, we explored the basic framework of probability theory. Now we turn to a crucial refinement: conditional probability, which addresses how knowledge that one event has occurred affects the likelihood of another. In essence, it is the probability of an event given that another event has already occurred or is assumed to be true. This is a powerful concept that allows us to update our beliefs and make more informed predictions as we gain new information.
Consider a scenario where we have some prior knowledge that restricts the possible outcomes of a random experiment. This new information effectively reduces our sample space. Conditional probability provides a way to calculate probabilities within this reduced sample space.
Definition of Conditional Probability
Let $A$ and $B$ be events in a probability space $(\Omega, \Pr)$, and assume that $\Pr[B] > 0$. The conditional probability of event $A$ given event $B$, denoted $\Pr[A \mid B]$, is defined as:

$$\Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]}.$$
This definition is fundamental. It tells us that the conditional probability of $A$ given $B$ is the ratio of the probability of both $A$ and $B$ occurring to the probability of $B$ occurring. Intuitively, we are restricting our attention to the outcomes where $B$ occurs (the denominator $\Pr[B]$), and within this restricted space, we are interested in the proportion of outcomes where $A$ also occurs (the numerator $\Pr[A \cap B]$).
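To make the definition concrete, here is a minimal Python check (a hypothetical dice example, not taken from the original text): it computes $\Pr[A \mid B]$ both by the defining ratio and by counting directly inside the reduced sample space $B$.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
omega = list(product(range(1, 7), repeat=2))

A = {(d1, d2) for d1, d2 in omega if d1 + d2 == 8}   # event A: the sum is 8
B = {(d1, d2) for d1, d2 in omega if d1 == 6}        # event B: the first die shows 6

pr = lambda event: Fraction(len(event), len(omega))

# Definition: Pr[A | B] = Pr[A ∩ B] / Pr[B]
pr_A_given_B = pr(A & B) / pr(B)

# Equivalently: count the A-outcomes within the reduced sample space B.
reduced = Fraction(len(A & B), len(B))

print(pr_A_given_B, reduced)  # both are 1/6
assert pr_A_given_B == reduced
```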
Illustrative Example: Card Hands
Imagine a scenario involving card games, such as poker. Let’s revisit the example where player A holds the four Aces and the Two of Hearts. We were interested in the probability of event $S$: “Player B has a straight flush”. Initially, we calculated $\Pr[S]$ in the full sample space of all possible five-card hands for player B.
However, suppose player A, through some observation (perhaps card markings or subtle cues), gains additional information that event $C$ has occurred: “Player B only has clubs in hand”. This new information drastically changes the context. The original sample space is now restricted to a smaller space consisting only of all-clubs hands. The probability of event $S$ in this new, restricted space is the conditional probability $\Pr[S \mid C]$.
According to our definition of conditional probability, we compute $\Pr[S \mid C]$ as:

$$\Pr[S \mid C] = \frac{\Pr[S \cap C]}{\Pr[C]}.$$
Here $S \cap C$ is the event that player B has a straight flush consisting only of clubs, and $\Pr[C]$ is the probability that player B holds only clubs. This conditional probability is significantly higher than the unconditional probability $\Pr[S]$, demonstrating how new information can dramatically alter our probabilistic assessments.
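Under the stated setup (player A holds the four Aces and the Two of Hearts, so 47 cards remain for player B's five-card hand), the relevant counts can be worked out with binomial coefficients. The following sketch is one way to make the numbers concrete; the straight-flush counts in the comments assume the usual convention that a straight does not wrap around, and they are our own reconstruction rather than figures quoted from the original example.

```python
from math import comb

# Player A holds the four Aces and the Two of Hearts; 47 cards remain
# for player B's five-card hand.
total_hands = comb(47, 5)

# Event C: player B holds only clubs. 12 clubs remain
# (the Ace of Clubs is in player A's hand).
pr_C = comb(12, 5) / total_hands

# Event S ∩ C: an all-clubs straight flush. Without the Ace of Clubs,
# the possible runs are 2-6, 3-7, ..., 9-K: 8 hands in total
# (any straight involving an Ace is impossible here).
pr_S_and_C = 8 / total_hands

# Conditional probability via the definition Pr[S | C] = Pr[S ∩ C] / Pr[C].
pr_S_given_C = pr_S_and_C / pr_C
print(pr_S_given_C)  # 8 / C(12,5) = 1/99 ≈ 0.0101

# For comparison, the unconditional probability: 8 straight flushes each in
# clubs, spades, and diamonds (only the Ace is missing) plus 7 in hearts
# (the Two of Hearts is also gone) gives 31 hands.
pr_S = 31 / total_hands
print(pr_S)  # ≈ 2.0e-5, roughly 500 times smaller than Pr[S | C]
```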
Properties of Conditional Probability
Conditional probabilities, defined in this way, inherit all the fundamental properties of probabilities. For a fixed event $B$ with $\Pr[B] > 0$, the function $A \mapsto \Pr[A \mid B]$ is itself a valid probability function. This means that it satisfies the axioms of probability:
- Non-negativity and Boundedness: For any event $A$, $0 \le \Pr[A \mid B] \le 1$.
- Normalization: $\Pr[\Omega \mid B] = 1$.
Furthermore, conditional probabilities obey counterparts of the addition rule and other probability rules. For instance:
- Conditional Probability of the Impossible Event: $\Pr[\emptyset \mid B] = 0$.
- Conditional Probability of the Complement: $\Pr[\bar{A} \mid B] = 1 - \Pr[A \mid B]$.
These properties ensure that conditional probability is a consistent and well-behaved extension of probability theory, allowing us to apply familiar probabilistic reasoning in situations with partial information.
Law of Total Probability (Satz 2.13)
A powerful tool for calculating probabilities is the Law of Total Probability (Satz 2.13). It allows us to compute the probability of an event by partitioning the sample space into disjoint events and considering conditional probabilities on each part of the partition.
Theorem 2.13 (Law of Total Probability): Let $A_1, \ldots, A_n$ be pairwise disjoint events with $\Pr[A_i] > 0$ that partition the sample space $\Omega$, i.e., $A_i \cap A_j = \emptyset$ for $i \ne j$ and $A_1 \cup \cdots \cup A_n = \Omega$. Let $B$ be any event. Then, the probability of event $B$ can be calculated as:

$$\Pr[B] = \sum_{i=1}^{n} \Pr[B \mid A_i] \cdot \Pr[A_i].$$
This law is incredibly useful when calculating $\Pr[B]$ directly is difficult, but it is easier to calculate the conditional probabilities $\Pr[B \mid A_i]$ and the probabilities $\Pr[A_i]$ of the partitioning events. The Law of Total Probability essentially breaks down the calculation of $\Pr[B]$ into a weighted average of conditional probabilities, where the weights are the probabilities of the partitioning events.
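Here is a small numerical illustration (a hypothetical three-machine factory with made-up numbers, not from the original text): the events "the item came from machine $i$" partition the sample space, and the overall defect probability is exactly the weighted average the law prescribes.

```python
from fractions import Fraction as F

# Partition of the sample space: which machine produced a random item.
pr_A = {"M1": F(1, 2), "M2": F(3, 10), "M3": F(1, 5)}   # Pr[A_i], sums to 1

# Conditional defect probabilities Pr[B | A_i] for each machine.
pr_B_given_A = {"M1": F(1, 100), "M2": F(2, 100), "M3": F(5, 100)}

# Law of Total Probability: Pr[B] = sum_i Pr[B | A_i] * Pr[A_i]
pr_B = sum(pr_B_given_A[m] * pr_A[m] for m in pr_A)
print(pr_B)  # 5/1000 + 6/1000 + 10/1000 = 21/1000
```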
Bayes’ Theorem (Satz 2.15)
Another cornerstone result in probability theory, particularly in statistical inference and Bayesian analysis, is Bayes’ Theorem (Satz 2.15). It provides a way to reverse conditional probabilities, i.e., to calculate $\Pr[A_i \mid B]$ in terms of $\Pr[B \mid A_i]$ and the prior probabilities $\Pr[A_i]$ and $\Pr[B]$.
Theorem 2.15 (Bayes’ Theorem): Let $A_1, \ldots, A_n$ be pairwise disjoint events that partition the sample space $\Omega$, i.e., $A_i \cap A_j = \emptyset$ for $i \ne j$ and $A_1 \cup \cdots \cup A_n = \Omega$, with $\Pr[A_i] > 0$ for all $i$. Let $B$ be any event with $\Pr[B] > 0$. Then, for any $i \in \{1, \ldots, n\}$:

$$\Pr[A_i \mid B] = \frac{\Pr[A_i \cap B]}{\Pr[B]} = \frac{\Pr[B \mid A_i] \cdot \Pr[A_i]}{\sum_{j=1}^{n} \Pr[B \mid A_j] \cdot \Pr[A_j]}.$$
Bayes’ Theorem is derived directly from the definition of conditional probability and the Law of Total Probability. It is a powerful tool for updating probabilities in light of new evidence. The numerator $\Pr[B \mid A_i] \cdot \Pr[A_i] = \Pr[A_i \cap B]$ represents the joint probability of $A_i$ and $B$ occurring. The denominator, $\sum_{j=1}^{n} \Pr[B \mid A_j] \cdot \Pr[A_j]$, is simply $\Pr[B]$ by the Law of Total Probability, ensuring that the conditional probabilities $\Pr[A_i \mid B]$ sum to 1 over $i$.
Bayes’ Theorem is widely used in medical diagnosis, spam filtering, machine learning, and many other fields where we need to update probabilities based on observed data.
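The classic diagnostic-test calculation shows the theorem in action (a standard textbook setup with made-up numbers, not from the original text): even a fairly accurate test gives only a small posterior probability when the condition is rare.

```python
from fractions import Fraction as F

# Partition: A1 = "patient has the disease", A2 = its complement.
pr_disease = F(1, 1000)                 # Pr[A1], prior prevalence
pr_healthy = 1 - pr_disease             # Pr[A2]

# Evidence B = "the test is positive".
pr_pos_given_disease = F(99, 100)       # Pr[B | A1], sensitivity
pr_pos_given_healthy = F(5, 100)        # Pr[B | A2], false-positive rate

# Denominator via the Law of Total Probability: Pr[B].
pr_pos = (pr_pos_given_disease * pr_disease
          + pr_pos_given_healthy * pr_healthy)

# Bayes' Theorem: Pr[A1 | B] = Pr[B | A1] * Pr[A1] / Pr[B].
posterior = pr_pos_given_disease * pr_disease / pr_pos
print(float(posterior))  # ≈ 0.019: under 2% despite the positive test
```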
Independence (2.3)
Defining Independence
The concept of independence is central to probability theory. Two events are independent if the occurrence of one event does not affect the probability of the other event occurring. Intuitively, independent events are events that are “unrelated” in a probabilistic sense.
Definition 2.18: Two events $A$ and $B$ are said to be independent if and only if:

$$\Pr[A \cap B] = \Pr[A] \cdot \Pr[B].$$

This definition captures the essence of independence. If $A$ and $B$ are independent, then the probability of both $A$ and $B$ occurring is simply the product of their individual probabilities.
If $\Pr[B] > 0$, then the definition of independence can be rewritten in terms of conditional probability as:

$$\Pr[A \mid B] = \Pr[A].$$

This equivalent form reinforces the intuitive notion of independence: the probability of event $A$ given that event $B$ has occurred is the same as the probability of $A$ without any knowledge of $B$. The occurrence of $B$ provides no information about the likelihood of $A$.
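The following sketch (again hypothetical dice events) checks Definition 2.18 by exact enumeration: with two fair dice, "the first die is even" is independent of "the sum is 7" but not of "the sum is 8".

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))
pr = lambda e: Fraction(len(e), len(omega))

def independent(A, B):
    # Definition 2.18: A and B are independent iff Pr[A ∩ B] = Pr[A] * Pr[B].
    return pr(A & B) == pr(A) * pr(B)

even_first = {(a, b) for a, b in omega if a % 2 == 0}
sum_seven  = {(a, b) for a, b in omega if a + b == 7}
sum_eight  = {(a, b) for a, b in omega if a + b == 8}

print(independent(even_first, sum_seven))  # True:  1/12 == 1/2 * 1/6
print(independent(even_first, sum_eight))  # False: 1/12 != 1/2 * 5/36
```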
Pairwise vs. Mutual Independence
When dealing with more than two events, the concept of independence becomes more nuanced. We distinguish between pairwise independence and mutual independence.
- Pairwise Independence: Events $A_1, \ldots, A_n$ are pairwise independent if every pair of events $A_i$ and $A_j$ (for $i \ne j$) is independent, i.e., $\Pr[A_i \cap A_j] = \Pr[A_i] \cdot \Pr[A_j]$.
- Mutual Independence: Events $A_1, \ldots, A_n$ are mutually independent (or simply independent) if for every subset of indices $I \subseteq \{1, \ldots, n\}$, the probability of the intersection of the events $A_i$ for $i \in I$ is equal to the product of their individual probabilities:

$$\Pr\left[\bigcap_{i \in I} A_i\right] = \prod_{i \in I} \Pr[A_i].$$
Mutual independence is a stronger condition than pairwise independence. Pairwise independence does not imply mutual independence, as Example 2.20 demonstrates; a classic instance is two fair coin flips with the events “first flip heads”, “second flip heads”, and “the two flips differ”, which are pairwise but not mutually independent (see the checker after Definition 2.22 below). Mutual independence requires that all subsets of events satisfy the product rule, not just pairs.
Definition 2.22: Events $A_1, \ldots, A_n$ are said to be independent (mutually independent) if for every subset of indices $I \subseteq \{1, \ldots, n\}$,

$$\Pr\left[\bigcap_{i \in I} A_i\right] = \prod_{i \in I} \Pr[A_i].$$
For an infinite family of events $A_1, A_2, \ldots$, independence is defined similarly, requiring the product rule to hold for every finite subset of indices $I$.
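A brute-force checker makes the definition concrete and shows why verifying it directly is expensive: there are exponentially many subsets to test. The sketch below uses the coin-flip triple mentioned above, which passes every pairwise test but fails on the full intersection.

```python
from fractions import Fraction
from itertools import combinations, product

def mutually_independent(events, omega):
    # Definition 2.22: check the product rule for every subset of size >= 2.
    pr = lambda e: Fraction(len(e), len(omega))
    for k in range(2, len(events) + 1):
        for subset in combinations(events, k):
            inter = set(omega)
            prod = Fraction(1)
            for e in subset:
                inter &= e
                prod *= pr(e)
            if pr(inter) != prod:
                return False
    return True

# Two fair coin flips; H = heads, T = tails.
omega = list(product("HT", repeat=2))
A = {w for w in omega if w[0] == "H"}    # first flip is heads
B = {w for w in omega if w[1] == "H"}    # second flip is heads
C = {w for w in omega if w[0] != w[1]}   # the two flips differ

# A, B, C are pairwise independent ...
print(all(mutually_independent(list(p), omega)
          for p in combinations([A, B, C], 2)))   # True
# ... but not mutually independent: Pr[A ∩ B ∩ C] = 0 != 1/8.
print(mutually_independent([A, B, C], omega))     # False
```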
Checking for Independence (Lemma 2.23)
Checking mutual independence directly using Definition 2.22 can be computationally intensive, as we need to verify the product rule for all subsets of events. Lemma 2.23 provides a more practical equivalent condition for checking independence, especially useful for theoretical analysis.
Lemma 2.23: Events $A_1, \ldots, A_n$ are independent if and only if for all sequences $(s_1, \ldots, s_n) \in \{0, 1\}^n$,

$$\Pr[A_1^{s_1} \cap \cdots \cap A_n^{s_n}] = \Pr[A_1^{s_1}] \cdots \Pr[A_n^{s_n}],$$

where $A_i^{1} = A_i$ and $A_i^{0} = \bar{A}_i$ (the complement of $A_i$).
This lemma states that independence is equivalent to the product rule holding for intersections where each event can be either in its original form or its complement. This condition is often easier to verify in practice than the original definition.
Properties of Independent Events (Lemma 2.24)
Independent events exhibit several useful properties that simplify probabilistic calculations.
Lemma 2.24: If events $A$, $B$, and $C$ are independent, then:
- $A \cap B$ and $C$ are independent.
- $A \cup B$ and $C$ are independent.
This lemma demonstrates that independence is preserved under intersection and union operations, allowing us to construct more complex independent events from simpler ones. This property is crucial for analyzing systems composed of independent components.
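A quick enumeration over three fair coin flips (a hypothetical example) confirms both claims of the lemma.

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=3))
pr = lambda e: Fraction(len(e), len(omega))

A = {w for w in omega if w[0] == "H"}   # first flip heads
B = {w for w in omega if w[1] == "H"}   # second flip heads
C = {w for w in omega if w[2] == "H"}   # third flip heads

# A ∩ B is independent of C ...
assert pr((A & B) & C) == pr(A & B) * pr(C)   # 1/8 == 1/4 * 1/2
# ... and so is A ∪ B.
assert pr((A | B) & C) == pr(A | B) * pr(C)   # 3/8 == 3/4 * 1/2
print("Lemma 2.24 checks out on this example")
```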
Pseudorandomness: Simulating Randomness
In practice, true randomness is often difficult to achieve with deterministic computers. Pseudorandom number generators (PRNGs) are algorithms designed to produce sequences of numbers that appear random, although they are generated deterministically from an initial seed value. While PRNGs are not truly random, they are widely used in simulations, cryptography, and randomized algorithms as a practical approximation of randomness. The quality of a PRNG is judged by how well its output sequences pass statistical tests for randomness and how computationally infeasible it is to predict future outputs given past outputs.
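As a concrete illustration of how a PRNG produces a deterministic yet random-looking sequence, here is a minimal linear congruential generator, one of the oldest PRNG constructions. The multiplier and increment are the well-known Numerical Recipes parameters; this is a toy sketch, and serious applications should use a modern generator (and cryptography a cryptographically secure one).

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Linear congruential generator: x_{n+1} = (a * x_n + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m  # scale to a pseudorandom float in [0, 1)

gen = lcg(seed=42)
sample = [next(gen) for _ in range(3)]
print(sample)

# Determinism: the same seed always reproduces the same sequence.
gen2 = lcg(seed=42)
assert sample == [next(gen2) for _ in range(3)]
```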
In the context of randomized algorithms, we often assume access to truly random bits for simplicity of analysis. However, it is important to recognize that in real-world implementations, pseudorandom numbers are typically used as a substitute for true randomness. The effectiveness of randomized algorithms in practice often hinges on the quality of the PRNG employed.
In summary, this section has introduced the concepts of conditional probability and independence, two cornerstones of probability theory that are essential for reasoning about probabilistic systems and designing randomized algorithms. Conditional probability allows us to update probabilities based on new information, while independence simplifies calculations and allows us to analyze complex systems by breaking them down into independent components. These concepts will be further developed and applied in the subsequent sections.
Prev: 01 Fundamentals, Probability Spaces and Events | Next: 03 Random Variables, Expectation, Variance