Lecture from: 03.04.2025 | Video: Homelab

We’ve established what variance is and some of its basic properties. Now, let’s explore how variance behaves with sums and products of random variables. This will lead us to powerful tools called concentration inequalities, which allow us to bound the probability that a random variable deviates far from its expected value.

Recap: Expectation and Variance

  • Expected Value: $\mathbb{E}[X] = \sum_{x} x \cdot \Pr[X = x]$ is the average value of $X$; it is the quantity we typically want to estimate.

  • Variance: $\text{Var}(X) = \mathbb{E}\big[(X - \mathbb{E}[X])^2\big] = \mathbb{E}[X^2] - \mathbb{E}[X]^2$ measures the expected squared deviation from the mean. A small variance suggests $X$ is usually close to $\mathbb{E}[X]$.

  • Property of Variance: $\text{Var}(aX + b) = a^2\,\text{Var}(X)$. Adding a constant $b$ shifts the distribution but doesn’t change its spread (variance). Scaling by $a$ scales the variance by $a^2$.

Rules for Moments: Expectation and Variance

Let’s now summarize how expectation and variance behave when working with multiple random variables. These foundational properties are often referred to as rules for moments, where:

  • The first moment is the expectation: $\mathbb{E}[X]$.
  • The second central moment is the variance: $\text{Var}(X) = \mathbb{E}\big[(X - \mathbb{E}[X])^2\big]$.

1. Linearity of Expectation

$$\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$$

This holds for all random variables $X$ and $Y$, regardless of whether they are dependent or independent. This property - linearity of expectation - is one of the most powerful and broadly applicable tools in probability.

2. Product of Expectations

$$\mathbb{E}[X \cdot Y] = \mathbb{E}[X] \cdot \mathbb{E}[Y]$$

This identity holds if $X$ and $Y$ are independent.

Proof (for independent $X, Y$):

By independence, $\Pr[X = x, Y = y] = \Pr[X = x] \cdot \Pr[Y = y]$, so:

$$\mathbb{E}[X \cdot Y] = \sum_{x, y} x y \cdot \Pr[X = x, Y = y] = \left(\sum_{x} x \Pr[X = x]\right)\left(\sum_{y} y \Pr[Y = y]\right) = \mathbb{E}[X] \cdot \mathbb{E}[Y].$$

However, this fails for dependent variables.

Counterexample: Let $X$ be a Bernoulli($p$) variable with $0 < p < 1$, and let $Y = X$. Then:

  • $\mathbb{E}[X] = \mathbb{E}[Y] = p$, so $\mathbb{E}[X] \cdot \mathbb{E}[Y] = p^2$.
  • But $X \cdot Y = X^2 = X$ since $X \in \{0, 1\}$, so $\mathbb{E}[X \cdot Y] = \mathbb{E}[X] = p$.

Thus, $\mathbb{E}[X \cdot Y] = p \neq p^2 = \mathbb{E}[X] \cdot \mathbb{E}[Y]$ when $X$ and $Y$ are dependent.
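
As a quick sanity check, the counterexample can be enumerated exactly. A minimal sketch, assuming the choice $p = 1/2$ and $Y = X$ from above:

```python
# Exact check that E[X*Y] != E[X]*E[Y] when Y = X (fully dependent).
p = 0.5  # Bernoulli parameter used for illustration

E_X = p          # E[X] for X ~ Bernoulli(p)
E_Y = p          # Y = X, so same expectation
E_XY = p         # X*Y = X^2 = X, so E[X*Y] = E[X] = p

print("E[X] * E[Y] =", E_X * E_Y)  # 0.25
print("E[X * Y]    =", E_XY)       # 0.5
```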

3. Variance of a Sum

$$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$$

This identity holds if $X$ and $Y$ are independent.

Proof (for independent $X, Y$):

Let $\mu_X = \mathbb{E}[X]$ and $\mu_Y = \mathbb{E}[Y]$. Then:

$$\text{Var}(X + Y) = \mathbb{E}\big[\big((X - \mu_X) + (Y - \mu_Y)\big)^2\big]$$

By linearity of expectation:

$$= \mathbb{E}\big[(X - \mu_X)^2\big] + \mathbb{E}\big[(Y - \mu_Y)^2\big] + 2\,\mathbb{E}\big[(X - \mu_X)(Y - \mu_Y)\big]$$

The final term is the covariance:

$$\text{Cov}(X, Y) = \mathbb{E}\big[(X - \mu_X)(Y - \mu_Y)\big]$$

If $X$ and $Y$ are independent, then so are $X - \mu_X$ and $Y - \mu_Y$, since subtracting constants does not affect independence. Therefore:

$$\mathbb{E}\big[(X - \mu_X)(Y - \mu_Y)\big] = \mathbb{E}[X - \mu_X] \cdot \mathbb{E}[Y - \mu_Y]$$

But:

$$\mathbb{E}[X - \mu_X] = \mathbb{E}[X] - \mu_X = 0 \quad \text{and likewise} \quad \mathbb{E}[Y - \mu_Y] = 0$$

So:

$$\text{Cov}(X, Y) = 0$$

Thus, under independence:

$$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$$

In fact, this result extends to sums of more than two variables: if $X_1, \dots, X_n$ are pairwise independent, then:

$$\text{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \text{Var}(X_i)$$

Pairwise independence is sufficient here, which is strictly weaker than full (mutual) independence; see the sketch below.
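
The pairwise-independence remark can be illustrated with the classic XOR construction: $X$ and $Y$ are independent fair bits and $Z = X \oplus Y$. The three variables are pairwise independent but not mutually independent, yet the variance of their sum still equals the sum of their variances. A minimal Monte Carlo sketch (the sample size is arbitrary):

```python
import random

random.seed(0)
N = 200_000

sums = []
for _ in range(N):
    x = random.randint(0, 1)
    y = random.randint(0, 1)
    z = x ^ y  # pairwise independent of x and of y, but determined by (x, y) jointly
    sums.append(x + y + z)

mean = sum(sums) / N
var_of_sum = sum((s - mean) ** 2 for s in sums) / N

# Each of X, Y, Z is Bernoulli(1/2), so Var = 1/4 each; the rule predicts 3/4.
print("empirical Var(X + Y + Z):", round(var_of_sum, 3))  # ~0.75
print("Var(X) + Var(Y) + Var(Z):", 3 * 0.25)              # 0.75
```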

4. Variance of a Product

The analogous identity $\text{Var}(X \cdot Y) = \text{Var}(X) \cdot \text{Var}(Y)$ is false in general, even when $X$ and $Y$ are independent.

Counterexample: Let $X, Y$ be independent Bernoulli($\tfrac{1}{2}$) variables:

  • So $\text{Var}(X) = \text{Var}(Y) = \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{4}$, and $\text{Var}(X) \cdot \text{Var}(Y) = \tfrac{1}{16}$.

Now let $Z = X \cdot Y$:

  • $Z = 1$ only when $X = Y = 1$ (probability $\tfrac{1}{4}$), and $Z = 0$ otherwise, so $Z \sim \text{Bernoulli}(\tfrac{1}{4})$.
  • So $\text{Var}(Z) = \tfrac{1}{4} \cdot \tfrac{3}{4} = \tfrac{3}{16}$.

Clearly, $\text{Var}(X \cdot Y) = \tfrac{3}{16} \neq \tfrac{1}{16} = \text{Var}(X) \cdot \text{Var}(Y)$.
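
A direct numeric check of this counterexample (using the parameter $\tfrac{1}{2}$ assumed above):

```python
# X, Y independent Bernoulli(1/2); Z = X*Y is Bernoulli(1/4).
var_x = 0.5 * 0.5              # p(1 - p) = 1/4
var_y = 0.5 * 0.5
var_z = 0.25 * (1 - 0.25)      # Bernoulli(1/4) variance = 3/16

print("Var(X) * Var(Y) =", var_x * var_y)  # 0.0625  (1/16)
print("Var(X * Y)      =", var_z)          # 0.1875  (3/16)
```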

Summary of Moment Rules

| Property | Always True? | Conditions Required |
|---|---|---|
| $\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$ | ✅ Always | None |
| $\mathbb{E}[X \cdot Y] = \mathbb{E}[X] \cdot \mathbb{E}[Y]$ | ✅ If independent | $X, Y$ independent |
| $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$ | ✅ If independent | $X, Y$ (pairwise) independent |
| $\text{Var}(X \cdot Y) = \text{Var}(X) \cdot \text{Var}(Y)$ | ❌ Not in general | Fails even if independent |

Estimating Probabilities: Concentration Inequalities

We know the expectation gives the average value of a random variable $X$. But how likely is it that $X$ takes a value far from its mean?

The goal is to bound:

$$\Pr\big[|X - \mathbb{E}[X]| \ge t\big]$$

for some deviation $t > 0$. These bounds are called concentration inequalities: they describe how tightly $X$ is concentrated around its mean.

Markov’s Inequality

This is the most basic concentration inequality. It applies to any non-negative random variable and requires only the expectation.

Intuitive explanation: a non-negative random variable cannot take large values too often; if it exceeded $k \cdot \mathbb{E}[X]$ with probability more than $1/k$, those values alone would already push the average above $\mathbb{E}[X]$.

Theorem (Markov’s Inequality)

Let $X$ be a non-negative random variable. Then for any $a > 0$:

$$\Pr[X \ge a] \le \frac{\mathbb{E}[X]}{a}$$

Proof

We split the expectation as follows (for discrete $X$):

$$\mathbb{E}[X] = \sum_{x} x \cdot \Pr[X = x] = \sum_{x < a} x \cdot \Pr[X = x] + \sum_{x \ge a} x \cdot \Pr[X = x]$$

Since all terms are non-negative, the first sum is non-negative, so:

$$\mathbb{E}[X] \ge \sum_{x \ge a} x \cdot \Pr[X = x]$$

Each $x$ in this sum satisfies $x \ge a$, so replacing $x$ with $a$ gives:

$$\mathbb{E}[X] \ge \sum_{x \ge a} a \cdot \Pr[X = x] = a \cdot \Pr[X \ge a]$$

Rearranging:

$$\Pr[X \ge a] \le \frac{\mathbb{E}[X]}{a}$$

This is a one-sided inequality: it only bounds the probability that $X$ is large. It’s general, but can be loose since it only uses the mean.
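
A small simulation comparing the true tail with Markov's bound. This is only a sketch; the test variable (the sum of two fair dice, mean 7) is an arbitrary non-negative example:

```python
import random

random.seed(1)
N = 100_000

# Non-negative test variable: sum of two fair dice (E[X] = 7).
samples = [random.randint(1, 6) + random.randint(1, 6) for _ in range(N)]
mean = sum(samples) / N

for a in (8, 10, 12):
    tail = sum(1 for s in samples if s >= a) / N
    print(f"a = {a:2d}   Pr[X >= a] ~ {tail:.3f}   Markov bound E[X]/a = {mean / a:.3f}")
```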

Chebyshev’s Inequality

Chebyshev’s inequality gives a two-sided bound using both the mean and the variance.

Theorem (Chebyshev’s Inequality)

Let $X$ have mean $\mu = \mathbb{E}[X]$ and finite variance $\text{Var}(X)$. Then for any $a > 0$:

$$\Pr\big[|X - \mu| \ge a\big] \le \frac{\text{Var}(X)}{a^2}$$

Proof

Let $Y = (X - \mu)^2$. This is a non-negative random variable. The event:

$$\{|X - \mu| \ge a\} \quad \text{is the same as the event} \quad \{Y \ge a^2\}$$

So:

$$\Pr\big[|X - \mu| \ge a\big] = \Pr[Y \ge a^2]$$

Apply Markov’s Inequality to $Y$:

$$\Pr[Y \ge a^2] \le \frac{\mathbb{E}[Y]}{a^2}$$

But $\mathbb{E}[Y] = \mathbb{E}\big[(X - \mu)^2\big] = \text{Var}(X)$, so:

$$\Pr\big[|X - \mu| \ge a\big] \le \frac{\text{Var}(X)}{a^2}$$

Interpretation

Smaller variance means tighter concentration around the mean. The tail probability decays quadratically in the deviation $a$.

Letting $a = k \cdot \sigma$, where $\sigma = \sqrt{\text{Var}(X)}$ is the standard deviation, we get:

$$\Pr\big[|X - \mu| \ge k\sigma\big] \le \frac{1}{k^2}$$

Examples:

  • Deviation of 2 standard deviations: $\Pr\big[|X - \mu| \ge 2\sigma\big] \le \tfrac{1}{4}$
  • Deviation of 10 standard deviations: $\Pr\big[|X - \mu| \ge 10\sigma\big] \le \tfrac{1}{100}$
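
A quick empirical comparison for a binomial test case (a sketch; $n = 100$, $p = 1/2$ chosen arbitrarily), showing that the $1/k^2$ bound holds but is usually far from tight:

```python
import math
import random

random.seed(2)
n, p, N = 100, 0.5, 20_000
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

samples = [sum(random.random() < p for _ in range(n)) for _ in range(N)]

for k in (1, 2, 3):
    tail = sum(1 for x in samples if abs(x - mu) >= k * sigma) / N
    print(f"k = {k}   Pr[|X - mu| >= k*sigma] ~ {tail:.4f}   Chebyshev bound 1/k^2 = {1 / k**2:.4f}")
```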

Variance of Common Distributions

These are useful when applying Chebyshev’s inequality:

  • Bernoulli($p$): $\mathbb{E}[X] = p$, $\text{Var}(X) = p(1 - p)$

  • Binomial($n, p$): $X = \sum_{i=1}^{n} X_i$ with independent $X_i \sim \text{Bernoulli}(p)$, so $\mathbb{E}[X] = np$, $\text{Var}(X) = np(1 - p)$

  • Poisson($\lambda$): $\mathbb{E}[X] = \lambda$, $\text{Var}(X) = \lambda$

  • Geometric($p$): $\mathbb{E}[X] = \frac{1}{p}$, $\text{Var}(X) = \frac{1 - p}{p^2}$
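
These closed forms are easy to sanity-check by simulation. A minimal sketch for the geometric case (number of trials up to and including the first success), with $p = 0.2$ chosen arbitrarily:

```python
import random

random.seed(3)
p, N = 0.2, 200_000

def geometric(p):
    """Number of Bernoulli(p) trials until (and including) the first success."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

samples = [geometric(p) for _ in range(N)]
mean = sum(samples) / N
var = sum((x - mean) ** 2 for x in samples) / N

print("E[X]:   formula", 1 / p, "  empirical", round(mean, 3))            # 5.0
print("Var(X): formula", (1 - p) / p**2, "  empirical", round(var, 2))    # 20.0
```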

Example: Coupon Collector with Chebyshev

Let $X$ be the total time (number of draws) to collect all $n$ coupons. Then:

$$X = \sum_{i=1}^{n} X_i$$

where $X_i \sim \text{Geometric}(p_i)$ is the waiting time for the $i$-th new coupon, and $p_i = \frac{n - i + 1}{n}$.

We know:

$$\mathbb{E}[X] = \sum_{i=1}^{n} \frac{1}{p_i} = n \cdot H_n \approx n \ln n$$

To apply Chebyshev, compute the variance (the $X_i$ are independent):

$$\text{Var}(X) = \sum_{i=1}^{n} \text{Var}(X_i) = \sum_{i=1}^{n} \frac{1 - p_i}{p_i^2} \le \sum_{i=1}^{n} \frac{1}{p_i^2}$$

Substitute $p_i = \frac{n - i + 1}{n}$, and after simplification:

$$\sum_{i=1}^{n} \frac{1}{p_i^2} = n^2 \sum_{k=1}^{n} \frac{1}{k^2} \le n^2 \cdot \frac{\pi^2}{6}$$

So:

$$\text{Var}(X) \le \frac{\pi^2}{6}\, n^2$$

Applying Chebyshev

Let $t = n \ln n$. Then:

$$\Pr\big[|X - \mathbb{E}[X]| \ge n \ln n\big] \le \frac{\pi^2 n^2 / 6}{(n \ln n)^2} = \frac{\pi^2}{6 \ln^2 n} \xrightarrow{\,n \to \infty\,} 0$$

This shows that $X$ is sharply concentrated around $\mathbb{E}[X] \approx n \ln n$.

Sharper Example

Let $t = n \sqrt{\ln n}$. Then:

$$\Pr\big[|X - \mathbb{E}[X]| \ge n \sqrt{\ln n}\big] \le \frac{\pi^2 n^2 / 6}{n^2 \ln n} = \frac{\pi^2}{6 \ln n} \xrightarrow{\,n \to \infty\,} 0$$

So even deviations much smaller than the mean (like $n \sqrt{\ln n}$) become unlikely for large $n$.
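
A simulation of the coupon collector process (a sketch, with $n = 200$ and 2000 runs chosen arbitrarily) illustrating how tightly $X$ concentrates around $n \cdot H_n$:

```python
import random

random.seed(4)
n, runs = 200, 2_000

def collect(n):
    """Draw uniform coupons until all n types have been seen; return the number of draws."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        draws += 1
    return draws

samples = [collect(n) for _ in range(runs)]
H_n = sum(1 / k for k in range(1, n + 1))
expected = n * H_n  # ~ n ln n

rel_dev = [abs(x - expected) / expected for x in samples]
print("n * H_n ~", round(expected, 1))
print("average relative deviation:", round(sum(rel_dev) / runs, 3))
print("maximum relative deviation:", round(max(rel_dev), 3))
```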

Chernoff Bounds: Concentration for Sums of Random Variables

Chebyshev’s inequality is useful for bounding how far a random variable might stray from its expected value, but it’s often too loose, especially for sums of independent random variables. This is where Chernoff bounds become powerful.

Chernoff bounds provide exponentially decreasing bounds on tail probabilities — i.e. the probability that a sum of independent random variables deviates significantly from its expected value.

Setting: Sums of Independent Bernoulli Variables

Suppose $X_1, \dots, X_n$ are independent Bernoulli random variables with $\Pr[X_i = 1] = p_i$, and let

$$X = \sum_{i=1}^{n} X_i, \qquad \mu = \mathbb{E}[X] = \sum_{i=1}^{n} p_i.$$

Then Chernoff bounds give the following tail bounds:

Chernoff Bounds (Standard Forms)

For $0 < \delta \le 1$:

  • Upper tail: $\Pr\big[X \ge (1 + \delta)\mu\big] \le e^{-\delta^2 \mu / 3}$

  • Lower tail: $\Pr\big[X \le (1 - \delta)\mu\big] \le e^{-\delta^2 \mu / 2}$

These bounds are exponentially decreasing in $\delta^2 \mu$, which gives much tighter control than Chebyshev’s polynomial behavior.
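
The bounds are easy to evaluate and compare against the exact binomial tail. The sketch below uses $n = 200$, $p = 1/2$ (so $\mu = 100$), chosen only for illustration:

```python
import math

def chernoff_upper(mu, delta):
    """Upper-tail bound Pr[X >= (1+delta)*mu] <= exp(-delta^2 * mu / 3), for 0 < delta <= 1."""
    return math.exp(-delta**2 * mu / 3)

n, p = 200, 0.5
mu = n * p

for delta in (0.1, 0.3, 0.5):
    threshold = round((1 + delta) * mu)
    exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(threshold, n + 1))
    print(f"delta = {delta}:  exact tail {exact:.2e}   Chernoff bound {chernoff_upper(mu, delta):.2e}")
```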

Examples
  • Try different values of $\delta$: 0.1, 0.5, 1.0 (for concreteness, take $\mu = 100$)

1. $\delta = 0.1$ (10% above the mean)

$$\Pr[X \ge 1.1\,\mu] \le e^{-0.1^2 \cdot 100 / 3} = e^{-1/3} \approx 0.717$$

🟠 Quite likely: about a 71.7% upper bound.

2. $\delta = 0.5$ (50% above the mean)

$$\Pr[X \ge 1.5\,\mu] \le e^{-0.5^2 \cdot 100 / 3} = e^{-25/3} \approx 2.4 \times 10^{-4}$$

🟢 Very unlikely — less than 0.03%.

3. $\delta = 1.0$ (100% above the mean, i.e., double the expectation)

$$\Pr[X \ge 2\,\mu] \le e^{-1^2 \cdot 100 / 3} = e^{-100/3} \approx 3.3 \times 10^{-15}$$

🔵 Astronomically unlikely — almost zero.

Intuition:

The probability decays exponentially fast as you move further above the expected value. The larger the deviation ($\delta$) or the expected value ($\mu$), the sharper the drop.

Comparing Chebyshev and Chernoff for $\text{Bin}(n, \tfrac{1}{2})$

Suppose $X \sim \text{Bin}(n, \tfrac{1}{2})$. Then:

$$\mu = \mathbb{E}[X] = \frac{n}{2}, \qquad \text{Var}(X) = \frac{n}{4}, \qquad \sigma = \frac{\sqrt{n}}{2}$$

Let’s study the probability of a deviation of size $t = k \cdot \sigma$. Rewriting in terms of standard deviations:

$$t = k\sigma = \frac{k\sqrt{n}}{2}$$

So we’re asking: what is the probability that $X$ deviates from its mean by more than $k$ standard deviations?

Chebyshev’s Bound

$$\Pr\big[|X - \mu| \ge k\sigma\big] \le \frac{\text{Var}(X)}{(k\sigma)^2} = \frac{1}{k^2}$$

Substitute $k = 10$ (i.e. a deviation of $10\sigma = 5\sqrt{n}$):

$$\Pr\big[|X - \mu| \ge 10\sigma\big] \le \frac{1}{100}$$

Interpretation: Chebyshev decays only polynomially in $k$. This means even for large deviations (e.g., $k = 10$), the bound is still only $\tfrac{1}{100}$.

Chernoff’s Bound (Upper Tail)

For a deviation of $k\sigma$ above the mean, write $k\sigma = \delta\mu$, so $\delta = \frac{k\sigma}{\mu} = \frac{k}{\sqrt{n}}$, and:

$$\Pr\big[X \ge (1 + \delta)\mu\big] \le e^{-\delta^2 \mu / 3} = e^{-\frac{k^2}{n} \cdot \frac{n}{2} \cdot \frac{1}{3}} = e^{-k^2 / 6}$$

Substitute $k = 10$, then:

$$\Pr\big[X \ge \mu + 10\sigma\big] \le e^{-100/6} \approx 6 \times 10^{-8}$$

Interpretation: Chernoff decays exponentially in $k^2$, giving much smaller probabilities for large deviations.
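
Tabulating both bounds for deviations of $k$ standard deviations from the mean of $\text{Bin}(n, \tfrac{1}{2})$ makes the gap explicit (a sketch; $n = 10{,}000$ chosen arbitrarily):

```python
import math

n = 10_000
mu = n / 2
sigma = math.sqrt(n) / 2

for k in (1, 2, 5, 10):
    chebyshev = 1 / k**2                     # Pr[|X - mu| >= k*sigma] <= 1/k^2
    delta = k * sigma / mu                   # rewrite k*sigma as delta*mu
    chernoff = math.exp(-delta**2 * mu / 3)  # = exp(-k^2 / 6) for Bin(n, 1/2)
    print(f"k = {k:2d}   Chebyshev: {chebyshev:.2e}   Chernoff: {chernoff:.2e}")
```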

General Chernoff Bounds (Summary)

Let $X = \sum_{i=1}^{n} X_i$, where the $X_i \in \{0, 1\}$ are independent.

Let $\mu = \mathbb{E}[X]$.

Then for all $0 < \delta \le 1$:

$$\Pr\big[X \ge (1 + \delta)\mu\big] \le e^{-\delta^2 \mu / 3} \qquad \text{and} \qquad \Pr\big[X \le (1 - \delta)\mu\big] \le e^{-\delta^2 \mu / 2}$$

  1. For large deviations: If $t \ge 6\mu$, then $\Pr[X \ge t] \le 2^{-t}$.

Idea Behind the Proof (MGFs and Markov)

We apply Markov’s inequality, not directly to $X$, but to an exponential transformation $e^{\lambda X}$ of $X$:

Let $\lambda > 0$. Then:

$$\Pr\big[X \ge (1 + \delta)\mu\big] = \Pr\big[e^{\lambda X} \ge e^{\lambda (1 + \delta)\mu}\big] \le \frac{\mathbb{E}\big[e^{\lambda X}\big]}{e^{\lambda (1 + \delta)\mu}}$$

So we want to bound $\mathbb{E}\big[e^{\lambda X}\big]$ (the moment generating function of $X$).

Because $X = \sum_{i=1}^{n} X_i$ and the $X_i$ are independent:

$$\mathbb{E}\big[e^{\lambda X}\big] = \prod_{i=1}^{n} \mathbb{E}\big[e^{\lambda X_i}\big]$$

Each $X_i \sim \text{Bernoulli}(p_i)$, so:

$$\mathbb{E}\big[e^{\lambda X_i}\big] = 1 - p_i + p_i e^{\lambda} = 1 + p_i\big(e^{\lambda} - 1\big) \le e^{p_i (e^{\lambda} - 1)}$$

Therefore:

$$\mathbb{E}\big[e^{\lambda X}\big] \le e^{(e^{\lambda} - 1)\mu}$$

The optimal value of $\lambda$ depends on $\delta$, and we can choose it to minimize this upper bound. The algebra is technical, but with this approach we can derive the exponential bounds above.
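
To make the "choose $\lambda$ to minimize the bound" step concrete, the sketch below grid-searches over $\lambda$ for the bound $\exp\big((e^{\lambda} - 1)\mu - \lambda(1+\delta)\mu\big)$ and compares the optimum with the simplified $e^{-\delta^2\mu/3}$ form ($\mu = 100$, $\delta = 0.3$ are arbitrary illustrative values):

```python
import math

def mgf_bound(mu, delta, lam):
    """Markov applied to e^(lam*X): Pr[X >= (1+delta)mu] <= exp((e^lam - 1)mu - lam(1+delta)mu)."""
    return math.exp((math.exp(lam) - 1) * mu - lam * (1 + delta) * mu)

mu, delta = 100, 0.3

# Coarse grid search over lambda; the true optimum is lam = ln(1 + delta).
best_lam = min((i / 1000 for i in range(1, 2001)), key=lambda lam: mgf_bound(mu, delta, lam))

print("best lambda ~", round(best_lam, 3), "  (ln(1 + delta) =", round(math.log(1 + delta), 3), ")")
print("optimized MGF bound: ", f"{mgf_bound(mu, delta, best_lam):.3e}")
print("exp(-delta^2 mu / 3):", f"{math.exp(-delta**2 * mu / 3):.3e}")
```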

Example: Quick Bound Using $e^{X}$

To get a quick bound on large deviations ($t$ much larger than $\mu$), we can simply fix $\lambda = 1$:

Let $Y = e^{X}$, and suppose we want to bound $\Pr[X \ge t]$. Then:

$$\Pr[X \ge t] = \Pr\big[e^{X} \ge e^{t}\big] \le \frac{\mathbb{E}\big[e^{X}\big]}{e^{t}}$$

Using the same independence and Bernoulli MGF trick:

$$\mathbb{E}\big[e^{X}\big] = \prod_{i=1}^{n} \mathbb{E}\big[e^{X_i}\big] = \prod_{i=1}^{n} \big(1 + p_i(e - 1)\big)$$

Now observe that for any real number $x$, we have $1 + x \le e^{x}$. Applying this to each factor:

$$\mathbb{E}\big[e^{X}\big] \le \prod_{i=1}^{n} e^{p_i (e - 1)} = e^{(e - 1)\mu}$$

So:

$$\Pr[X \ge t] \le \frac{e^{(e - 1)\mu}}{e^{t}} = e^{(e - 1)\mu - t}$$

If we want to make the bound even simpler, observe that:

$$e^{(e - 1)\mu - t} \le 2^{-t} \iff (e - 1)\mu \le (1 - \ln 2)\, t$$

So to ensure $\Pr[X \ge t] \le 2^{-t}$, it’s enough that:

$$t \ge \frac{e - 1}{1 - \ln 2}\,\mu \approx 5.6\,\mu$$

Basically, if $\frac{2^{t} \cdot e^{(e - 1)\mu}}{e^{t}} \le 1$, which happens if the numerator is smaller than the denominator, then we can upper bound $\Pr[X \ge t]$ by $2^{-t}$.

A conservative sufficient condition is:

$$t \ge 6\mu$$

Under this condition, the tail bound becomes very simple:

$$\Pr[X \ge t] \le 2^{-t}$$
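
A quick numeric check (a sketch, using the $\lambda = 1$ bound derived above) that $t \ge 6\mu$ indeed makes $e^{(e-1)\mu - t} \le 2^{-t}$:

```python
import math

for mu in (5, 20, 100):
    t = 6 * mu
    lambda_one_bound = math.exp((math.e - 1) * mu - t)  # the lambda = 1 bound on Pr[X >= t]
    simple_target = 2.0 ** (-t)
    print(f"mu = {mu:3d}, t = {t:3d}:  {lambda_one_bound:.2e} <= {simple_target:.2e}  ->  {lambda_one_bound <= simple_target}")
```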

Conclusion

Chernoff bounds are powerful tools for analyzing sums of independent random variables. Unlike Chebyshev, which offers general but loose control, Chernoff bounds exploit independence to yield strong exponential concentration.

These results are foundational in randomized algorithms, the probabilistic method, machine learning, and theoretical computer science.

Continue here: 15 Randomized Algorithms, Monte Carlo vs Las Vegas, Reducing Error Probability