Lecture from 13.12.2024 | Video: Videos ETHZ

Recap of Previous Results

Over the last few lectures, we established key results concerning eigenvalues, eigenvectors, and diagonalization:

  • Complete Set of Eigenvectors: We explored conditions under which a matrix possesses a complete set of eigenvectors that form a basis for the vector space.

    • We proved that matrices with distinct eigenvalues always have a complete set.
    • We also showed that diagonal matrices and projection matrices always have complete sets of eigenvectors.
    • A crucial concept introduced was that for a matrix to have a complete set of eigenvectors, the geometric multiplicity of each eigenvalue (dimension of its eigenspace) must equal its algebraic multiplicity (number of times it appears as a root of the characteristic polynomial).
  • Diagonalization: If a matrix has a complete set of eigenvectors, it can be diagonalized as $A = X \Lambda X^{-1}$, where $X$ is the matrix whose columns are the eigenvectors, and $\Lambda$ is a diagonal matrix with the corresponding eigenvalues.

  • Similar Matrices: We defined similar matrices ($B = M^{-1} A M$ for some invertible $M$) and proved that similar matrices share the same eigenvalues.

Now, building upon these foundations, we delve into the specific and important case of symmetric matrices. We want to address the question: for which matrices do we have:

  1. All eigenvalues are real.
  2. The geometric multiplicity of each eigenvalue is equal to its algebraic multiplicity.

We’ve already seen that diagonal and projection matrices satisfy these conditions. Having $n$ distinct real eigenvalues also guarantees both properties, but this condition is not necessary. We are seeking a more general characterization. This leads us to the Singular Value Decomposition (SVD), a powerful factorization that applies to all matrices. However, before tackling the SVD, we’ll first analyze a crucial special case: symmetric matrices. Understanding the properties of symmetric matrices will provide essential building blocks for the SVD.

Symmetric Matrices: A Special Case

Definition: A matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if $A = A^T$. That is, the entries of $A$ are symmetric across the main diagonal: $a_{ij} = a_{ji}$ for all $i, j$.

Symmetric matrices possess remarkable properties that greatly simplify their eigenstructure and have profound implications in various applications.

Real Eigenvalues for Symmetric Matrices

Proposition: If $A$ is a real symmetric matrix ($A \in \mathbb{R}^{n \times n}$ and $A = A^T$), then all its eigenvalues are real.

2x2 Example: For $A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$, the characteristic polynomial is $\lambda^2 - (a + c)\lambda + (ac - b^2)$. Its discriminant is $(a + c)^2 - 4(ac - b^2) = (a - c)^2 + 4b^2 \ge 0$, so both roots, and hence both eigenvalues, are real.

Proof: Eigenvalues of Real Symmetric Matrices are Real

This is simpler than it seems! Here’s the key idea: we’ll use the properties of the conjugate transpose (denoted by $(\cdot)^H$) to show that any eigenvalue must be equal to its own conjugate, which means it has to be real.

  1. Conjugate Transpose: For any matrix (or vector) $A$ with complex entries, $A^H$ (also called the Hermitian of the matrix $A$) is formed by taking the transpose of $A$ and then taking the complex conjugate of each entry. For example:

    If $A = \begin{pmatrix} 1 & i \\ 2 & 3 - i \end{pmatrix}$, then $A^H = \begin{pmatrix} 1 & 2 \\ -i & 3 + i \end{pmatrix}$.

    Important properties of the conjugate transpose:

    • For a real matrix $A$, $A^H = A^T$.
    • For a scalar $\lambda$, $\lambda^H = \bar{\lambda}$, its complex conjugate.
    • For a vector $v$, $v^H v = \|v\|^2$, the squared magnitude of $v$.
  2. Setting Up: Let $\lambda$ be an eigenvalue of $A$ with a corresponding eigenvector $v$ (which might be complex). This means: $Av = \lambda v$ with $v \neq 0$.

  3. Taking the Conjugate Transpose: Applying the conjugate transpose to both sides of the equation: $(Av)^H = (\lambda v)^H$, i.e. $v^H A^H = \bar{\lambda}\, v^H$ (using properties from step 1).

  4. Using Symmetry: Since $A$ is real and symmetric, $A^H = A^T = A$. Substituting this into our equation: $v^H A = \bar{\lambda}\, v^H$.

  5. The Trick: Now, let’s be clever. Left-multiply the original eigenvector equation ($Av = \lambda v$) by $v^H$: $v^H A v = \lambda\, v^H v$.

  6. Substituting: From step 4, we know $v^H A = \bar{\lambda}\, v^H$. Substitute this into the left-hand side of the equation from step 5: $\bar{\lambda}\, v^H v = \lambda\, v^H v$.

  7. Conclusion: Since $v$ is non-zero (it’s an eigenvector!), $v^H v = \|v\|^2 > 0$. We can safely divide both sides by $v^H v$: $\bar{\lambda} = \lambda$.

    This means $\lambda$ equals its own conjugate. The only way a complex number can be equal to its conjugate is if it is a real number! Therefore, $\lambda$ must be real.
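
A quick numerical sanity check (not from the lecture, just a minimal sketch assuming NumPy and an arbitrary random matrix):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build an arbitrary real symmetric matrix by symmetrizing a random one.
X = rng.standard_normal((5, 5))
A = (X + X.T) / 2                  # A = A^T

# The general-purpose solver returns complex eigenvalues in general,
# but for a symmetric matrix every imaginary part is (numerically) zero.
lam = np.linalg.eigvals(A)
print(np.max(np.abs(lam.imag)))    # ~0 up to floating-point noise
```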

Orthogonal Eigenvectors for Distinct Eigenvalues of Symmetric Matrices

Proposition: If $A$ is a symmetric matrix and $x$ and $y$ are eigenvectors corresponding to distinct eigenvalues $\lambda_1$ and $\lambda_2$, then $x$ and $y$ are orthogonal, i.e. $x^T y = 0$.

Proof

We have $Ax = \lambda_1 x$ and $Ay = \lambda_2 y$, with $\lambda_1 \neq \lambda_2$.

  1. Pre-multiply the first equation by $y^T$: $y^T A x = \lambda_1\, y^T x$.

  2. Transpose the second equation and post-multiply by $x$: $(Ay)^T = \lambda_2 y^T$, i.e. $y^T A^T = \lambda_2 y^T$. Then $y^T A^T x = \lambda_2\, y^T x$.

  3. Since $A$ is symmetric, $A^T = A$, so: $y^T A x = \lambda_2\, y^T x$.

  4. Now we have two equations:

    $y^T A x = \lambda_1\, y^T x$ and $y^T A x = \lambda_2\, y^T x$.

    Subtracting the second equation from the first: $(\lambda_1 - \lambda_2)\, y^T x = 0$.

  5. Since $\lambda_1 \neq \lambda_2$, we must have $y^T x = 0$, which means $x$ and $y$ are orthogonal.
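
As a small numerical illustration (a sketch only; the matrix below is a hypothetical example, not from the lecture), the eigenvectors of a symmetric matrix with distinct eigenvalues come out pairwise orthogonal:

```python
import numpy as np

# Hypothetical symmetric matrix with three distinct eigenvalues.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

lam, V = np.linalg.eig(A)          # columns of V are eigenvectors
print(lam)                         # three distinct real eigenvalues
# Dot products between eigenvectors of distinct eigenvalues are ~0.
print(V[:, 0] @ V[:, 1], V[:, 0] @ V[:, 2], V[:, 1] @ V[:, 2])
```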

The Spectral Theorem

The Spectral Theorem is a cornerstone result in linear algebra, revealing the profound connection between symmetric matrices and orthonormal bases. It states that real symmetric matrices are always diagonalizable by an orthogonal matrix and have real eigenvalues.

Theorem (Spectral Theorem): Every symmetric matrix $A \in \mathbb{R}^{n \times n}$ possesses the following properties:

  1. Real Eigenvalues: All eigenvalues of $A$ are real.
  2. Orthogonal Eigenvectors: Eigenvectors corresponding to distinct eigenvalues are orthogonal.
  3. Orthonormal Basis of Eigenvectors: There exists an orthonormal basis of $\mathbb{R}^n$ composed entirely of eigenvectors of $A$.
  4. Diagonalization by an Orthogonal Matrix: $A$ can be factored as $A = Q \Lambda Q^T$, where:
    • $Q$ is an orthogonal matrix ($Q^T Q = Q Q^T = I$, so $Q^{-1} = Q^T$) whose columns are the orthonormal eigenvectors of $A$.
    • $\Lambda$ is a diagonal matrix whose diagonal entries are the corresponding eigenvalues of $A$.

Equivalent Formulation: Every symmetric matrix $A \in \mathbb{R}^{n \times n}$ has $n$ real eigenvalues (counting multiplicities) and an orthonormal basis of $\mathbb{R}^n$ formed by eigenvectors of $A$.

Symmetric Matrices and Positive Definiteness... (MIT)
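
As a numerical illustration of the factorization $A = Q \Lambda Q^T$ (a minimal sketch assuming NumPy; the random matrix is an arbitrary example, not from the lecture), `np.linalg.eigh` is the solver specialized for symmetric matrices and returns real eigenvalues together with orthonormal eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4))
A = (X + X.T) / 2                          # arbitrary symmetric matrix

lam, Q = np.linalg.eigh(A)                 # real eigenvalues, orthonormal eigenvectors
Lam = np.diag(lam)

print(np.allclose(Q.T @ Q, np.eye(4)))     # Q is orthogonal: Q^T Q = I
print(np.allclose(A, Q @ Lam @ Q.T))       # A = Q Lambda Q^T
```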

Proof of the Spectral Theorem (by Induction)

Claim: for every $k \in \{1, \dots, n\}$, there exist $k$ orthonormal eigenvectors of $A$.

Base Case ($k = 1$)

We know that $A$ has at least one real eigenvalue $\lambda_1$ (this was proven earlier). Let $x$ be a corresponding eigenvector. Since $x$ is non-zero, we can normalize it to obtain an eigenvector of unit length: $v_1 = x / \|x\|$. This gives us one orthonormal eigenvector.

The Inductive Step

Inductive Hypothesis

Assume that for some $k$, where $1 \le k < n$, there exist $k$ orthonormal eigenvectors $v_1, \dots, v_k$ of the symmetric matrix $A$, with corresponding real eigenvalues $\lambda_1, \dots, \lambda_k$.

Construction
  1. Subspace and its Orthogonal Complement:

    • Let $S = \operatorname{span}(v_1, \dots, v_k)$ be the subspace spanned by the first $k$ orthonormal eigenvectors.
    • Let $S^{\perp}$ be the orthogonal complement of $S$. Since $\dim(S) = k$, we have $\dim(S^{\perp}) = n - k$.
  2. Orthonormal Basis for $S^{\perp}$:

    • Let $u_1, \dots, u_{n-k}$ be an orthonormal basis for $S^{\perp}$.
  3. The Key Idea: To extend to $k+1$ orthonormal eigenvectors, we must find a new eigenvector in $S^{\perp}$. This ensures it is orthogonal to the existing eigenvectors (by the definition of $S^{\perp}$). The strategy involves showing that $A$ maps vectors from $S^{\perp}$ to $S^{\perp}$, which allows us to restrict the problem to a smaller subspace.

Showing that $A$ maps $S^{\perp}$ to $S^{\perp}$

Crucial Lemma: If $A$ is symmetric and $w \in S^{\perp}$, then $Aw \in S^{\perp}$.

Proof: Since $w$ is in $S^{\perp}$, it is orthogonal to each of the vectors $v_1, \dots, v_k$. We must show that $Aw$ is also orthogonal to each $v_i$. Consider the dot product of $Aw$ with an arbitrary $v_i$:

$$v_i^T (A w) = (A^T v_i)^T w = (A v_i)^T w = \lambda_i\, v_i^T w = 0$$

(Recall that $A$ is symmetric so $A^T = A$, $v_i$ is an eigenvector so $A v_i = \lambda_i v_i$, and $w$ is orthogonal to $v_i$ so their dot product is 0.) Since $Aw$ is orthogonal to every basis vector of $S$, $Aw$ is in $S^{\perp}$.

Constructing the Matrix $B$
  1. Matrix $M$: Form a matrix $M = \begin{pmatrix} v_1 & \cdots & v_k & u_1 & \cdots & u_{n-k} \end{pmatrix}$ whose columns are the vectors $v_1, \dots, v_k, u_1, \dots, u_{n-k}$. Note that $M$ is orthogonal, since its columns form an orthonormal basis for $\mathbb{R}^n$. Consequently, $M^T M = M M^T = I$, i.e. $M^{-1} = M^T$.

  2. Matrix $B$: Define $B = M^T A M$.

Then, since $A v_i = \lambda_i v_i$ and $v_i^T v_j = \delta_{ij}$, and $v_i^T A u_j = 0$ and $u_j^T A v_i = 0$ (because $A u_j \in S^{\perp}$ and $A v_i \in S$ lie in orthogonal subspaces):

$$B = M^T A M = \begin{pmatrix} \Lambda_k & 0 \\ 0 & C \end{pmatrix},$$

where $\Lambda_k$ is a $k \times k$ diagonal matrix with $\lambda_1, \dots, \lambda_k$ on the diagonal, and $C$ is an $(n-k) \times (n-k)$ symmetric matrix representing the transformation of vectors in $S^{\perp}$ by $A$.

The top-left block is diagonal and $C$ is symmetric because:

  • $v_i^T A v_j = \lambda_j\, v_i^T v_j = \lambda_j \delta_{ij}$, so the upper-left block is $\operatorname{diag}(\lambda_1, \dots, \lambda_k)$.
  • The entries of $C$ are $c_{ij} = u_i^T A u_j$. Then $c_{ij} = (u_i^T A u_j)^T = u_j^T A^T u_i = u_j^T A u_i = c_{ji}$. Since these are scalars ($1 \times 1$ matrices), the transpose is equal to the original value. Thus $C$ is symmetric.

(A numerical check of this block structure follows below.)
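
The block structure of $B$ can be verified numerically (a minimal sketch with NumPy; the 4×4 matrix, the choice $k = 2$, and the variable names are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
A = (X + X.T) / 2                          # arbitrary symmetric matrix, n = 4

# Pretend we already know k = 2 orthonormal eigenvectors v_1, v_2.
lam_all, V = np.linalg.eigh(A)
V_k = V[:, :2]

# Complete v_1, v_2 to an orthonormal basis M = [v_1 v_2 u_1 u_2];
# the last two columns span the orthogonal complement S^perp.
M, _ = np.linalg.qr(np.hstack([V_k, rng.standard_normal((4, 2))]))

B = M.T @ A @ M
print(np.round(B, 6))
# Expected: diag(lambda_1, lambda_2) in the top-left 2x2 block,
# zero off-diagonal blocks, and a symmetric 2x2 block C bottom-right.
```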

Finding the $(k+1)$-th Eigenvector
1. Eigenvalue and Eigenvector of $C$

Since $C$ is a symmetric $(n-k) \times (n-k)$ matrix, it has at least one real eigenvalue $\mu$ and a corresponding eigenvector $z$ (a consequence of $C$ being symmetric). This means: $Cz = \mu z$ with $z \neq 0$.

2. Constructing $w$

Now, construct a vector $w \in \mathbb{R}^n$ as follows:

$$w = \begin{pmatrix} 0 \\ z \end{pmatrix},$$

where $0$ is the zero vector in $\mathbb{R}^k$. Notice that $w$ has its first $k$ components equal to 0 and its remaining $n-k$ components equal to the components of $z$. Importantly, since $z$ is an eigenvector of $C$, it is non-zero. This means $w$ is non-zero.

3. Showing $w$ is an Eigenvector of $B$

Let’s compute $Bw$:

$$Bw = \begin{pmatrix} \Lambda_k & 0 \\ 0 & C \end{pmatrix} \begin{pmatrix} 0 \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ Cz \end{pmatrix} = \begin{pmatrix} 0 \\ \mu z \end{pmatrix} = \mu w.$$

This demonstrates that $w$ is an eigenvector of $B$ with eigenvalue $\mu$.

4. Connecting back to $A$

Recall that $B = M^T A M$. Therefore, $A = M B M^T$. Now multiply both sides of the equation $Bw = \mu w$ by $M$ on the left:

$$M B w = \mu\, M w \quad\Longrightarrow\quad (M B M^T)(M w) = \mu\, (M w) \quad\Longrightarrow\quad A (M w) = \mu\, (M w).$$

Let $v_{k+1} = M w$. The equation above becomes:

$$A v_{k+1} = \mu\, v_{k+1}.$$

This shows that $v_{k+1}$ is an eigenvector of $A$ with eigenvalue $\mu$!

5. Orthogonality of $v_{k+1}$

Since $v_{k+1} = M w$ and $w = \begin{pmatrix} 0 \\ z \end{pmatrix}$, we have:

$$v_{k+1} = M w = z_1 u_1 + z_2 u_2 + \dots + z_{n-k} u_{n-k}.$$

Since $v_{k+1}$ is a linear combination of the vectors $u_1, \dots, u_{n-k}$, which are in $S^{\perp}$, this confirms that $v_{k+1}$ is orthogonal to all vectors in $S$ and is therefore orthogonal to $v_1, \dots, v_k$. We can normalize $v_{k+1}$ to obtain an orthonormal set of $k+1$ eigenvectors.
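
The whole inductive step can be mimicked numerically (again only a sketch assuming NumPy; the matrix, the choice $k = 2$, and names like `v_new` are illustrative, not from the lecture): take an eigenpair of the small block $C$, lift it with leading zeros, and map it back with $M$:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((4, 4))
A = (X + X.T) / 2                               # arbitrary symmetric matrix, n = 4

lam_all, V = np.linalg.eigh(A)
V_k = V[:, :2]                                  # k = 2 known orthonormal eigenvectors

# Orthonormal basis M = [v_1 v_2 u_1 u_2], B = M^T A M, lower-right block C.
M, _ = np.linalg.qr(np.hstack([V_k, rng.standard_normal((4, 2))]))
B = M.T @ A @ M
C = B[2:, 2:]

mu, Z = np.linalg.eigh(C)                       # eigenpair of the small block C
w = np.concatenate([np.zeros(2), Z[:, 0]])      # lift: w = (0, z)

v_new = M @ w                                   # candidate (k+1)-th eigenvector
print(np.allclose(A @ v_new, mu[0] * v_new))    # eigenvector of A with eigenvalue mu
print(np.allclose(V_k.T @ v_new, 0))            # orthogonal to v_1, v_2
```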

Completing the Induction

We have successfully found a $(k+1)$-th orthonormal eigenvector corresponding to a real eigenvalue. This completes the inductive step, proving that an $n \times n$ symmetric matrix has $n$ orthonormal eigenvectors.

Conclusion

This inductive proof establishes the Spectral Theorem, demonstrating that real symmetric matrices have a complete set of orthonormal eigenvectors that form a basis for $\mathbb{R}^n$, and are therefore diagonalizable by an orthogonal matrix. This decomposition has significant implications for understanding the properties and behavior of symmetric matrices in various applications. The Spectral Theorem underlies many important techniques in data analysis, machine learning, physics, and engineering, highlighting its fundamental importance in linear algebra and related fields.

Consequences of the Spectral Theorem

Corollary:

  • Orthogonal Diagonalization: For any symmetric matrix $A$, there exists an orthogonal matrix $Q$ (whose columns are eigenvectors of $A$) such that $Q^T A Q = \Lambda$, where $\Lambda$ is a diagonal matrix with the eigenvalues of $A$ on its diagonal. This is a direct consequence of the orthonormal basis property. Since the columns of $Q$ are orthonormal, $Q^{-1} = Q^T$, and equivalently $A = Q \Lambda Q^T$.

  • Spectral Decomposition: Let $A$ be a real symmetric matrix. Let $q_1, \dots, q_n$ be an orthonormal basis of eigenvectors of $A$, and let $\lambda_1, \dots, \lambda_n$ be the associated eigenvalues. Then $A$ can be written as a sum of rank-1 projections:

    $$A = \sum_{i=1}^{n} \lambda_i\, q_i q_i^T.$$

    This decomposition expresses $A$ as a weighted sum of projections onto the one-dimensional subspaces spanned by each eigenvector. It reveals how $A$ acts as a combination of scalings along the orthogonal directions defined by its eigenvectors (see the numerical sketch after this list).

  • Rank and Eigenvalues: The rank of a real symmetric matrix $A$ is equal to the number of non-zero eigenvalues (counting multiplicities). This follows directly from the diagonalization: the rank of $A$ is the same as the rank of $\Lambda$, which is the number of non-zero diagonal entries.
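
A short numerical check of the spectral decomposition (a sketch assuming NumPy; the random matrix is illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 4))
A = (X + X.T) / 2                               # arbitrary symmetric matrix

lam, Q = np.linalg.eigh(A)

# The sum of rank-1 projections lambda_i * q_i q_i^T rebuilds A.
A_rebuilt = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(4))
print(np.allclose(A, A_rebuilt))                # True
```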

The Rayleigh Quotient

The Rayleigh quotient is a tool for analyzing the eigenvalues and eigenvectors of a symmetric matrix. It provides a way to estimate eigenvalues and characterize their extremal properties.

Definition

Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix ($A = A^T$). The Rayleigh quotient is a scalar-valued function $R$ defined for any non-zero vector $x \in \mathbb{R}^n$ as:

$$R(x) = \frac{x^T A x}{x^T x}.$$

The denominator $x^T x$ is the squared Euclidean norm of $x$ ($\|x\|^2$), representing the squared length of the vector. The numerator $x^T A x$ is the quadratic form associated with the matrix $A$.

Properties of the Rayleigh Quotient

Boundedness (for Symmetric Matrices): For a symmetric matrix $A$, the Rayleigh quotient is bounded by the minimum and maximum eigenvalues of $A$. Let $\lambda_{\min}$ and $\lambda_{\max}$ be the minimum and maximum eigenvalues of $A$, respectively. Then for any non-zero $x$:

$$\lambda_{\min} \le R(x) \le \lambda_{\max}.$$

This is a crucial property. It tells us that the range of values the Rayleigh quotient can take is restricted by the eigenvalues of the matrix. We might prove this in the next lecture…
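
Even before the proof, the bound is easy to check empirically (a minimal sketch assuming NumPy; matrix and sample size are arbitrary choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 5))
A = (X + X.T) / 2                               # arbitrary symmetric matrix

lam = np.linalg.eigvalsh(A)                     # eigenvalues, sorted ascending

def rayleigh(A, x):
    """Rayleigh quotient R(x) = x^T A x / (x^T x)."""
    return (x @ A @ x) / (x @ x)

# For many random non-zero x, R(x) stays inside [lambda_min, lambda_max].
vals = np.array([rayleigh(A, rng.standard_normal(5)) for _ in range(1000)])
print(lam[0] <= vals.min(), vals.max() <= lam[-1])   # True True
```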

Continue here: 27 Positive (Semi)Definite Matrices, Gram Matrices, SVD