Lecture from 18.12.2024 | Video: Videos ETHZ

This lecture explores the properties of symmetric matrices, leading to the development of the Singular Value Decomposition (SVD). We will begin by reviewing the spectral theorem and the Rayleigh quotient, which provide fundamental insights into the behavior of symmetric matrices. We’ll then move on to positive definite matrices, Gram matrices, and finally arrive at the SVD, a powerful tool for analyzing general matrices.

Symmetric Matrices

From the previous lecture…

Our starting point is the class of real, symmetric matrices. Recall that a matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if $A = A^\top$.

The Spectral Theorem

Theorem (Spectral Theorem)

Let $A \in \mathbb{R}^{n \times n}$ be a real symmetric matrix. Then there exists an orthonormal basis $v_1, \dots, v_n$ of $\mathbb{R}^n$ consisting of eigenvectors of $A$, with corresponding real eigenvalues $\lambda_1, \dots, \lambda_n$. Furthermore, $A$ can be decomposed as:

$$A = \sum_{i=1}^{n} \lambda_i \, v_i v_i^\top$$

Explanation

  • Orthonormal Basis: The eigenvectors $v_1, \dots, v_n$ are mutually orthogonal (their dot products are zero) and have unit length (their norms are 1). They form a basis for $\mathbb{R}^n$, meaning any vector in $\mathbb{R}^n$ can be expressed as a linear combination of these eigenvectors.
  • Decomposition: The spectral theorem states that any symmetric matrix can be expressed as a weighted sum of rank-1 matrices formed by the outer products of its eigenvectors. Each term $\lambda_i v_i v_i^\top$ represents a projection onto the direction of the eigenvector $v_i$, scaled by the eigenvalue $\lambda_i$.

Note that this follows from:

$$A = V \Lambda V^\top$$

where $V$ is the orthogonal matrix whose columns are the eigenvectors $v_1, \dots, v_n$ and $\Lambda$ is a diagonal matrix consisting of the eigenvalues (as shown in the last few lectures).
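
A minimal numerical sketch of the decomposition in NumPy (the small symmetric matrix below is an arbitrary illustrative example, not one from the lecture):

```python
import numpy as np

# A small symmetric matrix (arbitrary example).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh is specialized to symmetric matrices: real eigenvalues, orthonormal eigenvectors.
eigvals, V = np.linalg.eigh(A)

# Rebuild A as the weighted sum of rank-1 projections lambda_i * v_i v_i^T.
A_rebuilt = sum(lam * np.outer(V[:, i], V[:, i]) for i, lam in enumerate(eigvals))

print(np.allclose(A, A_rebuilt))                   # True: A = sum_i lambda_i v_i v_i^T
print(np.allclose(V.T @ V, np.eye(3)))             # True: the eigenvectors are orthonormal
print(np.allclose(A, V @ np.diag(eigvals) @ V.T))  # True: A = V Lambda V^T
```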

The Rayleigh Quotient

Proposition (Rayleigh Quotient)

Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix. The Rayleigh quotient, defined for any non-zero vector $x \in \mathbb{R}^n$, is given by:

$$R(x) = \frac{x^\top A x}{x^\top x}$$

The Rayleigh quotient attains its maximum value $\lambda_{\max}$ at $x = v_{\max}$ and its minimum value $\lambda_{\min}$ at $x = v_{\min}$, where $\lambda_{\max}$ and $\lambda_{\min}$ are the largest and smallest eigenvalues of $A$, and $v_{\max}$ and $v_{\min}$ are their corresponding eigenvectors.

Proof

Since $R(v_{\max}) = \lambda_{\max}$ and $R(v_{\min}) = \lambda_{\min}$, it suffices to show that:

$$\lambda_{\min} \le R(x) \le \lambda_{\max} \quad \text{for all } x \ne 0$$

From the spectral theorem, we know that for any $x \in \mathbb{R}^n$, we can express $x$ as:

$$x = \sum_{i=1}^{n} \alpha_i v_i$$

where $v_1, \dots, v_n$ form an orthonormal basis of eigenvectors of $A$ and $\lambda_1, \dots, \lambda_n$ are the associated eigenvalues.

For all $i$, we have:

$$\lambda_{\min} \alpha_i^2 \le \lambda_i \alpha_i^2 \le \lambda_{\max} \alpha_i^2$$

Summing over all $i$, we get:

$$\lambda_{\min} \sum_{i=1}^{n} \alpha_i^2 \le \sum_{i=1}^{n} \lambda_i \alpha_i^2 \le \lambda_{\max} \sum_{i=1}^{n} \alpha_i^2$$

Dividing by $\sum_{i=1}^{n} \alpha_i^2 > 0$:

$$\lambda_{\min} \le \frac{\sum_{i=1}^{n} \lambda_i \alpha_i^2}{\sum_{i=1}^{n} \alpha_i^2} \le \lambda_{\max}$$

Since the $v_i$'s are orthonormal, the matrix $V$ with the $v_i$'s as columns is orthogonal. Then, we have:

$$x^\top x = \left(\sum_{i=1}^{n} \alpha_i v_i\right)^\top \left(\sum_{j=1}^{n} \alpha_j v_j\right) = \sum_{i=1}^{n} \alpha_i^2$$

So:

$$x^\top A x = \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, v_i^\top A v_j = \sum_{i=1}^{n} \lambda_i \alpha_i^2$$

Therefore:

$$\lambda_{\min} \le \frac{x^\top A x}{x^\top x} = R(x) \le \lambda_{\max}$$

This completes the proof.

Interpretation

  • The Rayleigh quotient provides a way to estimate the eigenvalues of a symmetric matrix without explicitly computing them.
  • It shows that the eigenvalues of a symmetric matrix can be characterized as the stationary values of the Rayleigh quotient.
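
A quick numerical illustration of the proposition (a minimal NumPy sketch using an arbitrary random symmetric matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                        # symmetrize to obtain a symmetric matrix

eigvals, V = np.linalg.eigh(A)           # eigenvalues in ascending order
lam_min, lam_max = eigvals[0], eigvals[-1]

def rayleigh(A, x):
    return (x @ A @ x) / (x @ x)

# The quotient of any non-zero x stays between the extreme eigenvalues...
R = np.array([rayleigh(A, x) for x in rng.standard_normal((1000, 4))])
print(bool(np.all((R >= lam_min - 1e-12) & (R <= lam_max + 1e-12))))  # True

# ...and the bounds are attained at the corresponding eigenvectors.
print(np.isclose(rayleigh(A, V[:, 0]), lam_min))   # True
print(np.isclose(rayleigh(A, V[:, -1]), lam_max))  # True
```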

Positive Definite Matrices

Positive definite and positive semidefinite matrices are special types of symmetric matrices that arise frequently in various fields, including optimization, statistics, mechanics, and machine learning. They possess unique properties that make them particularly useful for characterizing quadratic forms, defining inner products, and ensuring the convexity of functions.

Definition (Positive Definite and Positive Semidefinite)

A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is said to be:

  • Positive Semidefinite (PSD) if all its eigenvalues are non-negative (i.e., $\lambda_i \ge 0$ for all $i$).
  • Positive Definite (PD) if all its eigenvalues are strictly positive (i.e., $\lambda_i > 0$ for all $i$).

Intuition

  • Think of a symmetric matrix $A$ as defining a quadratic form $q(x) = x^\top A x$. If $A$ is PSD, the quadratic form is always non-negative. If $A$ is PD, the quadratic form is always positive, except when $x$ is the zero vector.

  • Geometrically, a PD matrix corresponds to an ellipsoid that is stretched along its principal axes (the eigenvectors of $A$), with the lengths of the semi-axes determined by the eigenvalues. A PSD matrix corresponds to an ellipsoid that might be flattened in some directions (where the eigenvalues are zero).

Positive Definite Matrices and Minima... (MIT)

Proposition: Alternative Characterization of PSD and PD Matrices

The following proposition provides an alternative way to define PSD and PD matrices, which is often easier to work with than the eigenvalue definition.

  • A symmetric matrix $A$ is PSD if and only if $x^\top A x \ge 0$ for all $x \in \mathbb{R}^n$.
  • A symmetric matrix $A$ is PD if and only if $x^\top A x > 0$ for all $x \ne 0$ (i.e., for all non-zero vectors $x$).

Proof

These statements follow directly from the Rayleigh quotient. Recall that the Rayleigh quotient is defined as:

$$R(x) = \frac{x^\top A x}{x^\top x}$$

and that for a symmetric matrix $A$, we have $\lambda_{\min} \le R(x) \le \lambda_{\max}$ for all $x \ne 0$, where $\lambda_{\min}$ and $\lambda_{\max}$ are the smallest and largest eigenvalues of $A$, respectively.

Positive Semidefinite

If all eigenvalues of $A$ are non-negative ($\lambda_i \ge 0$ for all $i$), then $\lambda_{\min} \ge 0$. Since $R(x) \ge \lambda_{\min}$, we have $R(x) \ge 0$ for all $x \ne 0$. This implies $x^\top A x \ge 0$ for all $x$ (since $x^\top x > 0$ for $x \ne 0$, and the case $x = 0$ is trivial).

Conversely, if $x^\top A x \ge 0$ for all $x$, then $R(x) \ge 0$ for all $x \ne 0$. Since $\lambda_{\min}$ is the minimum value of $R(x)$, we must have $\lambda_{\min} \ge 0$, which means all eigenvalues are non-negative.

Positive Definite

If all eigenvalues of $A$ are positive ($\lambda_i > 0$ for all $i$), then $\lambda_{\min} > 0$. Since $R(x) \ge \lambda_{\min}$, we have $R(x) > 0$ for all $x \ne 0$. This implies $x^\top A x > 0$ for all $x \ne 0$.

Conversely, if $x^\top A x > 0$ for all $x \ne 0$, then $R(x) > 0$ for all $x \ne 0$. Since $\lambda_{\min}$ is the minimum value of $R(x)$, we must have $\lambda_{\min} > 0$, which means all eigenvalues are positive.
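
Both characterizations are easy to test numerically. The sketch below is illustrative only: `is_psd` is a hypothetical helper, and the test matrix is of the form $B^\top B$, which is PSD by construction (see the section on Gram matrices below):

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Eigenvalue test: every eigenvalue of the symmetric matrix A is >= 0 (up to tol)."""
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 3))
A = B.T @ B                        # PSD by construction (a Gram matrix, see below)

print(is_psd(A))                   # True

# Quadratic-form test: x^T A x >= 0 for many randomly sampled x.
xs = rng.standard_normal((1000, 3))
quad = np.einsum('ij,jk,ik->i', xs, A, xs)    # all x_i^T A x_i at once
print(bool(np.all(quad >= -1e-10)))           # True
```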

Lemma: Sum of PSD Matrices

If $A, B \in \mathbb{R}^{n \times n}$ are symmetric and PSD, then $A + B$ is also PSD.

Proof

Let $x \in \mathbb{R}^n$ be an arbitrary vector. Since $A$ and $B$ are PSD, we know that:

$$x^\top A x \ge 0 \quad \text{and} \quad x^\top B x \ge 0$$

Adding these inequalities, we get:

$$x^\top A x + x^\top B x \ge 0$$

Using the distributive property of matrix multiplication, we can rewrite this as:

$$x^\top (A + B) x \ge 0$$

Since this holds for all $x \in \mathbb{R}^n$, it follows that $A + B$ is PSD.

A Key Observation: Gram Matrices are PSD

Definition (Gram Matrix)

Given vectors $a_1, \dots, a_n$ in $\mathbb{R}^m$, let $A \in \mathbb{R}^{m \times n}$ be the matrix whose columns are the vectors $a_i$. The Gram matrix of $a_1, \dots, a_n$ is defined as the $n \times n$ matrix $G$ where:

$$G_{ij} = a_i^\top a_j = \langle a_i, a_j \rangle$$

In matrix notation, the Gram matrix can be expressed as:

$$G = A^\top A$$
Interpretation:

  • The Gram matrix captures the pairwise inner products (dot products) between the vectors $a_1, \dots, a_n$.
  • The diagonal entries $G_{ii} = a_i^\top a_i = \|a_i\|^2$ represent the squared lengths (norms) of the vectors.
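
A minimal NumPy sketch of these two points (the matrix below is an arbitrary example; its columns play the role of the vectors $a_i$):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))   # columns a_1, a_2, a_3 live in R^4 (m = 4, n = 3)

G = A.T @ A                       # Gram matrix, shape (3, 3)

# Entry (i, j) is the inner product of column i with column j...
print(np.isclose(G[0, 1], A[:, 0] @ A[:, 1]))          # True
# ...and the diagonal holds the squared norms of the columns.
print(np.allclose(np.diag(G), np.sum(A**2, axis=0)))   # True
```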

Proposition: Properties of Gram Matrices

Let $A \in \mathbb{R}^{m \times n}$. The non-zero eigenvalues of $A^\top A$ are the same as the non-zero eigenvalues of $A A^\top$. Both matrices are also symmetric and PSD.

Proof

Symmetry

$(A^\top A)^\top = A^\top (A^\top)^\top = A^\top A$, and $(A A^\top)^\top = (A^\top)^\top A^\top = A A^\top$. Thus, both $A^\top A$ and $A A^\top$ are symmetric.

PSD

For any $x \in \mathbb{R}^n$, we have $x^\top A^\top A x = (Ax)^\top (Ax) = \|Ax\|^2 \ge 0$. This implies that $A^\top A$ is PSD. Similarly, for any $y \in \mathbb{R}^m$, we have $y^\top A A^\top y = \|A^\top y\|^2 \ge 0$, so $A A^\top$ is also PSD.

Eigenvalues

Before proceeding with the proof, there is a small fact we first need to show:

Why is $\operatorname{rank}(A) = \operatorname{rank}(A^\top) = \operatorname{rank}(A^\top A) = \operatorname{rank}(A A^\top)$?
  1. $\operatorname{rank}(A) = \operatorname{rank}(A^\top)$:

    • The rank of $A$ is equal to the rank of $A^\top$ because row operations (used in Gaussian elimination) do not alter the number of linearly independent rows or columns.
    • The pivots in the row echelon form of $A$ correspond to independent rows and columns, and this property remains invariant under transposition.
  2. $\operatorname{rank}(A) = \operatorname{rank}(A^\top A)$:

    • Use the rank-nullity theorem to argue $\ker(A) = \ker(A^\top A)$:
      • Suppose $x \in \ker(A)$, meaning $Ax = 0$. Then:

        $$A^\top A x = A^\top (Ax) = A^\top 0 = 0$$

        Thus, $x \in \ker(A^\top A)$.

      • Conversely, assume $x \in \ker(A^\top A)$, meaning $A^\top A x = 0$. Then:

        $$0 = x^\top A^\top A x = \|Ax\|^2 \implies Ax = 0$$

        Hence, $x \in \ker(A)$.

      • Since $\ker(A) = \ker(A^\top A)$, their nullities are equal, and the ranks are the same:

        $$\operatorname{rank}(A^\top A) = n - \dim \ker(A^\top A) = n - \dim \ker(A) = \operatorname{rank}(A)$$

  3. $\operatorname{rank}(A^\top) = \operatorname{rank}(A A^\top)$:

    • Similarly, for $A A^\top$:
      • If $y \in \ker(A^\top)$, then $A A^\top y = 0$, so $y \in \ker(A A^\top)$.

      • Conversely, if $y \in \ker(A A^\top)$, then:

        $$0 = y^\top A A^\top y = \|A^\top y\|^2$$

        Since $\|A^\top y\|^2 = 0$, we have $A^\top y = 0$, so $y \in \ker(A^\top)$.

      • Thus, $\ker(A^\top) = \ker(A A^\top)$, and their ranks are equal:

        $$\operatorname{rank}(A A^\top) = m - \dim \ker(A A^\top) = m - \dim \ker(A^\top) = \operatorname{rank}(A^\top)$$

Combining all the above results, we have:

$$\operatorname{rank}(A) = \operatorname{rank}(A^\top) = \operatorname{rank}(A^\top A) = \operatorname{rank}(A A^\top)$$

Now let us continue with the proof…

Proof

Let $r$ be the rank of $A$. We know that:

$$\operatorname{rank}(A^\top A) = \operatorname{rank}(A A^\top) = r$$

Since $A^\top A$ and $A A^\top$ are symmetric, they have a complete set of real eigenvalues and orthogonal eigenvectors. Let $\lambda_1, \dots, \lambda_r$ be the non-zero eigenvalues of $A^\top A$ and $v_1, \dots, v_r$ the corresponding eigenvectors. Let $\mu_1, \dots, \mu_r$ be the non-zero eigenvalues of $A A^\top$ and $u_1, \dots, u_r$ the corresponding eigenvectors.

We have $A^\top A v_i = \lambda_i v_i$. Hence, $A A^\top (A v_i) = A (A^\top A v_i) = \lambda_i (A v_i)$. Thus, $\lambda_i$ is an eigenvalue of $A A^\top$ with eigenvector $A v_i$ (which is non-zero, since $\|A v_i\|^2 = v_i^\top A^\top A v_i = \lambda_i > 0$).

Similarly, $A^\top A (A^\top u_j) = A^\top (A A^\top u_j) = \mu_j (A^\top u_j)$. This shows that $\mu_j$ is an eigenvalue of $A^\top A$ with eigenvector $A^\top u_j$.

Hence, the non-zero eigenvalues of $A^\top A$ and $A A^\top$ coincide.

Implications

  • This proposition establishes that Gram matrices are always symmetric and positive semidefinite.
  • The connection between the eigenvalues of and will be crucial for the development of the SVD.
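
The shared non-zero spectrum is easy to observe numerically; a minimal NumPy sketch with an arbitrary rectangular random matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))           # rectangular example, m = 5, n = 3

eig_AtA = np.linalg.eigvalsh(A.T @ A)     # 3 eigenvalues (ascending order)
eig_AAt = np.linalg.eigvalsh(A @ A.T)     # 5 eigenvalues (ascending order)

# The non-zero eigenvalues coincide; A A^T just carries two extra zeros.
print(np.allclose(eig_AtA, eig_AAt[-3:]))          # True
print(np.allclose(eig_AAt[:2], 0.0, atol=1e-10))   # True
```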

What Else Do We Get for PSD Matrices?

Skipped during the lecture, but part of slides…

Proposition (Cholesky Decomposition)

Every symmetric positive semidefinite matrix $M$ is a Gram matrix $M = R^\top R$ of an upper triangular matrix $R$. The factorization $M = R^\top R$ is known as the Cholesky decomposition.

Proof

Since $M$ is symmetric and PSD, by the spectral theorem, there exists an orthogonal matrix $V$ and a diagonal matrix $\Lambda$ with the non-negative eigenvalues of $M$ on the diagonal such that:

$$M = V \Lambda V^\top$$

Define $\Lambda^{1/2}$ as the diagonal matrix obtained by taking the square root of each diagonal entry of $\Lambda$. Then:

$$M = V \Lambda^{1/2} \Lambda^{1/2} V^\top = \left(\Lambda^{1/2} V^\top\right)^\top \left(\Lambda^{1/2} V^\top\right)$$

Now, consider the QR decomposition of $\Lambda^{1/2} V^\top$:

$$\Lambda^{1/2} V^\top = Q R$$

where $Q$ is an orthogonal matrix and $R$ is an upper triangular matrix. Substituting this into the expression for $M$:

$$M = (Q R)^\top (Q R) = R^\top Q^\top Q R = R^\top R$$

Taking this $R$, we have $M = R^\top R$, where $R$ is an upper triangular matrix. This establishes the Cholesky decomposition.

Significance:

  • The Cholesky decomposition provides an efficient way to factorize a symmetric positive semidefinite matrix.
  • It is widely used in numerical linear algebra, optimization, and statistics.
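
In practice one typically calls a library routine rather than the spectral-theorem construction above. A minimal NumPy sketch (using an arbitrary strictly positive definite matrix, since `np.linalg.cholesky` requires positive definiteness and returns a lower-triangular factor):

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((3, 3))
M = B.T @ B + 0.1 * np.eye(3)      # symmetric and strictly positive definite

# NumPy returns a lower-triangular L with M = L L^T;
# setting R = L^T gives the upper-triangular form M = R^T R used above.
L = np.linalg.cholesky(M)
R = L.T

print(np.allclose(M, R.T @ R))     # True
print(np.allclose(R, np.triu(R)))  # True: R is upper triangular
```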

Introduction to Singular Value Decomposition (SVD)

The Singular Value Decomposition (SVD) is a fundamental matrix factorization technique with wide-ranging applications in linear algebra, data analysis, and machine learning. While the spectral theorem provides a powerful decomposition for symmetric matrices, the SVD extends this concept to all matrices, even those that are not square or symmetric.

The Guiding Question: How can we establish a decomposition analogous to the spectral theorem but applicable to general matrices?

Singular Value Decomposition... (MIT)

Motivation: Extending the Spectral Theorem

Recall that the spectral theorem allows us to decompose a symmetric matrix $A$ as:

$$A = \sum_{i=1}^{n} \lambda_i \, v_i v_i^\top = V \Lambda V^\top$$

where $v_1, \dots, v_n$ are orthonormal eigenvectors of $A$, $\lambda_1, \dots, \lambda_n$ are the corresponding eigenvalues, $V$ is an orthogonal matrix whose columns are the eigenvectors, and $\Lambda$ is a diagonal matrix of eigenvalues. This decomposition is elegant and insightful, but its limitation is that it only applies to symmetric matrices.

The Problem of Different Dimensions

When dealing with a general matrix $A \in \mathbb{R}^{m \times n}$ where $m \ne n$, the concept of eigenvectors and eigenvalues in the traditional sense becomes problematic. The equation $A v = \lambda v$ implies that $A$ maps a vector from $\mathbb{R}^n$ to a scaled version of itself in $\mathbb{R}^m$. These spaces have different dimensions, so the notion of an eigenvector being scaled by a constant doesn’t directly apply.

A New Perspective: Mapping Between Bases

Instead of searching for eigenvectors of $A$ directly, let’s consider the possibility of mapping between different orthonormal bases. Suppose we have an orthonormal basis $v_1, \dots, v_n$ for $\mathbb{R}^n$ and another orthonormal basis $u_1, \dots, u_m$ for $\mathbb{R}^m$. Could we find a way to relate these bases through the action of $A$?

Ideally, we’d like to find a set of relationships of the form:

$$A v_i = \sigma_i u_i$$

where $\sigma_i \ge 0$ are scalar values. This equation suggests that $A$ maps each basis vector $v_i$ in $\mathbb{R}^n$ to a scaled version of a corresponding basis vector $u_i$ in $\mathbb{R}^m$. The scalars $\sigma_i$ would represent the “stretching factors” along these corresponding directions.

Leveraging $A^\top A$ and $A A^\top$

From our earlier work, we know that for any matrix $A \in \mathbb{R}^{m \times n}$, the matrices $A^\top A$ and $A A^\top$ are symmetric and positive semidefinite. Moreover, they share the same non-zero eigenvalues. This suggests that we can apply the spectral theorem to these matrices to obtain orthonormal bases for $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively.

Let’s consider the spectral decompositions:

  • $A^\top A = V \Lambda_1 V^\top$, where $V \in \mathbb{R}^{n \times n}$ is orthogonal, and $\Lambda_1$ is a diagonal matrix with the eigenvalues of $A^\top A$ (which are non-negative).
  • $A A^\top = U \Lambda_2 U^\top$, where $U \in \mathbb{R}^{m \times m}$ is orthogonal, and $\Lambda_2$ is a diagonal matrix with the eigenvalues of $A A^\top$ (also non-negative).

The columns of $V$ and $U$ will serve as our desired orthonormal bases, and the square roots of the non-zero eigenvalues of $A^\top A$ and $A A^\top$ will provide the scaling factors $\sigma_i$.
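
As a sketch of this idea in NumPy (assuming the arbitrary random matrix below has full column rank, so all the $\sigma_i$ are non-zero), the vectors $u_i = A v_i / \sigma_i$ built from the eigenvectors of $A^\top A$ turn out to be orthonormal eigenvectors of $A A^\top$:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))             # m = 5, n = 3; full column rank (assumed)

# Orthonormal eigenbasis of A^T A; the eigenvalues are non-negative.
lam, V = np.linalg.eigh(A.T @ A)
sigma = np.sqrt(np.clip(lam, 0.0, None))    # candidate scaling factors sigma_i

# Define u_i = A v_i / sigma_i (column-wise division), so that A v_i = sigma_i u_i.
U = (A @ V) / sigma

print(np.allclose(U.T @ U, np.eye(3)))      # True: the u_i are orthonormal
print(np.allclose((A @ A.T) @ U, U * lam))  # True: each u_i is an eigenvector of A A^T
```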

Definition: Singular Value Decomposition (SVD)

Let $A \in \mathbb{R}^{m \times n}$ be any matrix. A singular value decomposition of $A$ is a factorization of the form:

$$A = U \Sigma V^\top$$

where:

  • $U \in \mathbb{R}^{m \times m}$ is an orthogonal matrix whose columns are called the left singular vectors of $A$.
  • $V \in \mathbb{R}^{n \times n}$ is an orthogonal matrix whose columns are called the right singular vectors of $A$.
  • $\Sigma \in \mathbb{R}^{m \times n}$ is a diagonal matrix with non-negative entries $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_{\min(m,n)} \ge 0$ on the diagonal, called the singular values of $A$.

We stopped the lecture here, but the following content remained on the slides…

Compact SVD

If $A \in \mathbb{R}^{m \times n}$ has rank $r$, the SVD can be expressed in a more compact form:

$$A = U_r \Sigma_r V_r^\top$$

where:

  • $U_r \in \mathbb{R}^{m \times r}$ contains the first $r$ left singular vectors.
  • $V_r \in \mathbb{R}^{n \times r}$ contains the first $r$ right singular vectors.
  • $\Sigma_r \in \mathbb{R}^{r \times r}$ is a diagonal matrix containing the first $r$ (non-zero) singular values.

This compact form is often more efficient in practice, as it avoids storing and manipulating unnecessary zero singular values and their corresponding singular vectors.
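
A short NumPy sketch of the two forms, using an arbitrary tall random matrix (`np.linalg.svd`'s `full_matrices` flag switches between them; for a full-rank matrix the thin output coincides with the compact SVD):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 3))                  # tall example matrix, m = 5, n = 3

# Full SVD: U is m x m, Vt is n x n, s holds the min(m, n) singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)                # (5, 5) (3,) (3, 3)

# Thin/compact SVD: only the leading columns of U are kept.
Ur, sr, Vtr = np.linalg.svd(A, full_matrices=False)
print(Ur.shape, sr.shape, Vtr.shape)             # (5, 3) (3,) (3, 3)

# The compact factors already reconstruct A exactly.
print(np.allclose(A, Ur @ np.diag(sr) @ Vtr))    # True
```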

Properties of the SVD and Connection to $A^\top A$ and $A A^\top$

The SVD provides a powerful way to understand the structure of a general matrix $A$ by relating it to the associated symmetric matrices $A^\top A$ and $A A^\top$. Let’s explore this connection further.

Suppose $A \in \mathbb{R}^{m \times n}$ has an SVD given by $A = U \Sigma V^\top$.

Relationship with $A A^\top$

Consider the product $A A^\top$:

$$A A^\top = (U \Sigma V^\top)(U \Sigma V^\top)^\top = U \Sigma V^\top V \Sigma^\top U^\top = U (\Sigma \Sigma^\top) U^\top$$

Since $V$ is orthogonal, $V^\top V = I_n$. Also, note that $\Sigma \Sigma^\top$ is an $m \times m$ diagonal matrix. Let’s examine the structure of $\Sigma \Sigma^\top$:

  • If $m > n$, then $\Sigma \Sigma^\top$ has the form:

    $$\Sigma \Sigma^\top = \begin{pmatrix} \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2) & 0 \\ 0 & 0 \end{pmatrix}$$

    where the lower-right $0$ is a zero matrix of size $(m - n) \times (m - n)$.

  • If $m < n$, then $\Sigma \Sigma^\top$ has the form:

    $$\Sigma \Sigma^\top = \operatorname{diag}(\sigma_1^2, \dots, \sigma_m^2)$$

  • If $m = n$, then $\Sigma \Sigma^\top$ is simply a diagonal matrix containing $\sigma_i^2$ for $i = 1, \dots, n$.

Thus, we can make the following observations:

  1. The equation $A A^\top = U (\Sigma \Sigma^\top) U^\top$ is an eigendecomposition of $A A^\top$.
  2. The left singular vectors of $A$ (columns of $U$) are the eigenvectors of $A A^\top$.
  3. The squares of the singular values of $A$ (diagonal entries of $\Sigma \Sigma^\top$) are the eigenvalues of $A A^\top$.
  4. If $m > n$, $A$ has $n$ singular values, while $A A^\top$ has $m$ eigenvalues. The “missing” $m - n$ eigenvalues are zero.

Relationship with $A^\top A$

Now consider the product $A^\top A$:

$$A^\top A = (U \Sigma V^\top)^\top (U \Sigma V^\top) = V \Sigma^\top U^\top U \Sigma V^\top = V (\Sigma^\top \Sigma) V^\top$$

Since $U$ is orthogonal, $U^\top U = I_m$. Also, note that $\Sigma^\top \Sigma$ is an $n \times n$ diagonal matrix. Let’s examine the structure of $\Sigma^\top \Sigma$:

  • If $n > m$, then $\Sigma^\top \Sigma$ has the form:

    $$\Sigma^\top \Sigma = \begin{pmatrix} \operatorname{diag}(\sigma_1^2, \dots, \sigma_m^2) & 0 \\ 0 & 0 \end{pmatrix}$$

    where the lower-right $0$ is a zero matrix of size $(n - m) \times (n - m)$.

  • If $n < m$, then $\Sigma^\top \Sigma$ has the form:

    $$\Sigma^\top \Sigma = \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2)$$

  • If $m = n$, then $\Sigma^\top \Sigma$ is simply a diagonal matrix containing $\sigma_i^2$ for $i = 1, \dots, n$.

We can make the following observations:

  1. The equation $A^\top A = V (\Sigma^\top \Sigma) V^\top$ is an eigendecomposition of $A^\top A$.
  2. The right singular vectors of $A$ (columns of $V$) are the eigenvectors of $A^\top A$.
  3. The squares of the singular values of $A$ (diagonal entries of $\Sigma^\top \Sigma$) are the eigenvalues of $A^\top A$.
  4. If $n > m$, $A$ has $m$ singular values, while $A^\top A$ has $n$ eigenvalues. The “missing” $n - m$ eigenvalues are zero.

Key Insights

  • The SVD provides a direct link between a general matrix $A$ and the symmetric matrices $A^\top A$ and $A A^\top$.
  • The singular values of $A$ are the square roots of the non-zero eigenvalues of both $A^\top A$ and $A A^\top$.
  • The left and right singular vectors of $A$ form orthonormal bases for the column space and row space of $A$, respectively.
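
These connections are easy to verify numerically; a minimal sketch with an arbitrary $4 \times 3$ random matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A)                    # full SVD: U is 4x4, Vt is 3x3

# sigma_i^2 are the non-zero eigenvalues of both A^T A and A A^T.
print(np.allclose(np.sort(s**2), np.linalg.eigvalsh(A.T @ A)))        # True
print(np.allclose(np.sort(s**2), np.linalg.eigvalsh(A @ A.T)[-3:]))   # True

# U diagonalizes A A^T and V diagonalizes A^T A, with sigma_i^2 on the diagonal.
print(np.allclose(U.T @ (A @ A.T) @ U, np.diag(np.append(s**2, 0.0))))  # True
print(np.allclose(Vt @ (A.T @ A) @ Vt.T, np.diag(s**2)))                # True
```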

We’ll look at the SVD proof and other topics next lecture…

Continue here: 28 SVD Theorem and Proof, SVD Abstraction, Pseudoinverse