Lecture from 18.12.2024 | Video: Videos ETHZ

This lecture explores the properties of symmetric matrices, leading to the development of the Singular Value Decomposition (SVD). We will begin by reviewing the spectral theorem and the Rayleigh quotient, which provide fundamental insights into the behavior of symmetric matrices. We’ll then move on to positive definite matrices, Gram matrices, and finally arrive at the SVD, a powerful tool for analyzing general matrices.

Symmetric Matrices

From the previous lecture…

Our starting point is the class of real, symmetric matrices. Recall that a matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if $A = A^\top$.

The Spectral Theorem

Theorem (Spectral Theorem)

Let $A \in \mathbb{R}^{n \times n}$ be a real symmetric matrix. Then there exists an orthonormal basis $v_1, \dots, v_n$ of $\mathbb{R}^n$ consisting of eigenvectors of $A$, with corresponding real eigenvalues $\lambda_1, \dots, \lambda_n$. Furthermore, $A$ can be decomposed as:

$$A = \sum_{i=1}^{n} \lambda_i \, v_i v_i^\top$$

Explanation

  • Orthonormal Basis: The eigenvectors $v_1, \dots, v_n$ are mutually orthogonal (their dot products are zero) and have unit length (their norms are 1). They form a basis for $\mathbb{R}^n$, meaning any vector in $\mathbb{R}^n$ can be expressed as a linear combination of these eigenvectors.
  • Decomposition: The spectral theorem states that any symmetric matrix can be expressed as a weighted sum of rank-1 matrices formed by the outer products of its eigenvectors. Each term $\lambda_i v_i v_i^\top$ represents a projection onto the direction of the eigenvector $v_i$, scaled by the eigenvalue $\lambda_i$.

Note that this follows from:

$$A = V \Lambda V^\top$$

where $V$ is the orthogonal matrix whose columns are the eigenvectors $v_1, \dots, v_n$ and $\Lambda$ is a diagonal matrix consisting of the eigenvalues (as shown in the last few lectures).
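
A minimal numerical sketch of the decomposition in NumPy (the small symmetric matrix below is an arbitrary illustrative example, not one from the lecture):

```python
import numpy as np

# A small symmetric matrix (arbitrary example).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh is specialized to symmetric matrices: real eigenvalues, orthonormal eigenvectors.
eigvals, V = np.linalg.eigh(A)

# Rebuild A as the weighted sum of rank-1 projections lambda_i * v_i v_i^T.
A_rebuilt = sum(lam * np.outer(V[:, i], V[:, i]) for i, lam in enumerate(eigvals))

print(np.allclose(A, A_rebuilt))                   # True: A = sum_i lambda_i v_i v_i^T
print(np.allclose(V.T @ V, np.eye(3)))             # True: the eigenvectors are orthonormal
print(np.allclose(A, V @ np.diag(eigvals) @ V.T))  # True: A = V Lambda V^T
```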

The Rayleigh Quotient

Proposition (Rayleigh Quotient)

Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix. The Rayleigh quotient, defined for any non-zero vector $x \in \mathbb{R}^n$, is given by:

$$R(x) = \frac{x^\top A x}{x^\top x}$$

The Rayleigh quotient attains its maximum value $\lambda_{\max}$ at $x = v_{\max}$ and its minimum value $\lambda_{\min}$ at $x = v_{\min}$, where $\lambda_{\max}$ and $\lambda_{\min}$ are the largest and smallest eigenvalues of $A$, and $v_{\max}$ and $v_{\min}$ are their corresponding eigenvectors.

Proof

Since $R(v_{\max}) = \lambda_{\max}$ and $R(v_{\min}) = \lambda_{\min}$, it suffices to show that:

$$\lambda_{\min} \le R(x) \le \lambda_{\max} \quad \text{for all } x \ne 0$$

From the spectral theorem, we know that for any $x \in \mathbb{R}^n$, we can express $x$ as:

$$x = \sum_{i=1}^{n} \alpha_i v_i$$

where $v_1, \dots, v_n$ form an orthonormal basis of eigenvectors of $A$ and $\lambda_1, \dots, \lambda_n$ are the associated eigenvalues.

For all $i$, we have:

$$\lambda_{\min} \alpha_i^2 \le \lambda_i \alpha_i^2 \le \lambda_{\max} \alpha_i^2$$

Summing over all $i$, we get:

$$\lambda_{\min} \sum_{i=1}^{n} \alpha_i^2 \le \sum_{i=1}^{n} \lambda_i \alpha_i^2 \le \lambda_{\max} \sum_{i=1}^{n} \alpha_i^2$$

Dividing by $\sum_{i=1}^{n} \alpha_i^2 > 0$:

$$\lambda_{\min} \le \frac{\sum_{i=1}^{n} \lambda_i \alpha_i^2}{\sum_{i=1}^{n} \alpha_i^2} \le \lambda_{\max}$$

Since the $v_i$'s are orthonormal, the matrix $V$ with the $v_i$'s as columns is orthogonal. Then, we have:

$$x^\top x = \left(\sum_{i=1}^{n} \alpha_i v_i\right)^\top \left(\sum_{j=1}^{n} \alpha_j v_j\right) = \sum_{i=1}^{n} \alpha_i^2$$

So:

$$x^\top A x = \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, v_i^\top A v_j = \sum_{i=1}^{n} \lambda_i \alpha_i^2$$

Therefore:

$$\lambda_{\min} \le \frac{x^\top A x}{x^\top x} = R(x) \le \lambda_{\max}$$

This completes the proof.

Interpretation

  • The Rayleigh quotient provides a way to estimate the eigenvalues of a symmetric matrix without explicitly computing them.
  • It shows that the eigenvalues of a symmetric matrix can be characterized as the stationary values of the Rayleigh quotient.
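
A quick numerical illustration of the proposition (a minimal NumPy sketch using an arbitrary random symmetric matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                        # symmetrize to obtain a symmetric matrix

eigvals, V = np.linalg.eigh(A)           # eigenvalues in ascending order
lam_min, lam_max = eigvals[0], eigvals[-1]

def rayleigh(A, x):
    return (x @ A @ x) / (x @ x)

# The quotient of any non-zero x stays between the extreme eigenvalues...
R = np.array([rayleigh(A, x) for x in rng.standard_normal((1000, 4))])
print(bool(np.all((R >= lam_min - 1e-12) & (R <= lam_max + 1e-12))))  # True

# ...and the bounds are attained at the corresponding eigenvectors.
print(np.isclose(rayleigh(A, V[:, 0]), lam_min))   # True
print(np.isclose(rayleigh(A, V[:, -1]), lam_max))  # True
```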

Positive Definite Matrices

Positive definite and positive semidefinite matrices are special types of symmetric matrices that arise frequently in various fields, including optimization, statistics, mechanics, and machine learning. They possess unique properties that make them particularly useful for characterizing quadratic forms, defining inner products, and ensuring the convexity of functions.

Definition (Positive Definite and Positive Semidefinite)

A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is said to be:

  • Positive Semidefinite (PSD) if all its eigenvalues are non-negative (i.e., $\lambda_i \ge 0$ for all $i$).
  • Positive Definite (PD) if all its eigenvalues are strictly positive (i.e., $\lambda_i > 0$ for all $i$).

Intuition

  • Think of a symmetric matrix $A$ as defining a quadratic form $q(x) = x^\top A x$. If $A$ is PSD, the quadratic form is always non-negative. If $A$ is PD, the quadratic form is always positive, except when $x$ is the zero vector.

  • Geometrically, a PD matrix corresponds to an ellipsoid that is stretched along its principal axes (the eigenvectors of $A$), with the lengths of the semi-axes determined by the eigenvalues. A PSD matrix corresponds to an ellipsoid that might be flattened in some directions (where the eigenvalues are zero).

Positive Definite Matrices and Minima... (MIT)

Proposition: Alternative Characterization of PSD and PD Matrices

The following proposition provides an alternative way to define PSD and PD matrices, which is often easier to work with than the eigenvalue definition.

  • A symmetric matrix $A$ is PSD if and only if $x^\top A x \ge 0$ for all $x \in \mathbb{R}^n$.
  • A symmetric matrix $A$ is PD if and only if $x^\top A x > 0$ for all $x \ne 0$ (i.e., for all non-zero vectors $x$).

Proof

These statements follow directly from the Rayleigh quotient. Recall that the Rayleigh quotient is defined as:

$$R(x) = \frac{x^\top A x}{x^\top x}$$

and that for a symmetric matrix $A$, we have $\lambda_{\min} \le R(x) \le \lambda_{\max}$ for all $x \ne 0$, where $\lambda_{\min}$ and $\lambda_{\max}$ are the smallest and largest eigenvalues of $A$, respectively.

Positive Semidefinite

If all eigenvalues of $A$ are non-negative ($\lambda_i \ge 0$ for all $i$), then $\lambda_{\min} \ge 0$. Since $R(x) \ge \lambda_{\min}$, we have $R(x) \ge 0$ for all $x \ne 0$. This implies $x^\top A x \ge 0$ for all $x$ (since $x^\top x > 0$ for $x \ne 0$, and the case $x = 0$ is trivial).

Conversely, if $x^\top A x \ge 0$ for all $x$, then $R(x) \ge 0$ for all $x \ne 0$. Since $\lambda_{\min}$ is the minimum value of $R(x)$, we must have $\lambda_{\min} \ge 0$, which means all eigenvalues are non-negative.

Positive Definite

If all eigenvalues of $A$ are positive ($\lambda_i > 0$ for all $i$), then $\lambda_{\min} > 0$. Since $R(x) \ge \lambda_{\min}$, we have $R(x) > 0$ for all $x \ne 0$. This implies $x^\top A x > 0$ for all $x \ne 0$.

Conversely, if $x^\top A x > 0$ for all $x \ne 0$, then $R(x) > 0$ for all $x \ne 0$. Since $\lambda_{\min}$ is the minimum value of $R(x)$, we must have $\lambda_{\min} > 0$, which means all eigenvalues are positive.
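
Both characterizations are easy to test numerically. The sketch below is illustrative only: `is_psd` is a hypothetical helper, and the test matrix is of the form $B^\top B$, which is PSD by construction (see the section on Gram matrices below):

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Eigenvalue test: every eigenvalue of the symmetric matrix A is >= 0 (up to tol)."""
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 3))
A = B.T @ B                        # PSD by construction (a Gram matrix, see below)

print(is_psd(A))                   # True

# Quadratic-form test: x^T A x >= 0 for many randomly sampled x.
xs = rng.standard_normal((1000, 3))
quad = np.einsum('ij,jk,ik->i', xs, A, xs)    # all x_i^T A x_i at once
print(bool(np.all(quad >= -1e-10)))           # True
```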

Lemma: Sum of PSD Matrices

If $A, B \in \mathbb{R}^{n \times n}$ are symmetric and PSD, then $A + B$ is also PSD.

Proof

Let $x \in \mathbb{R}^n$ be an arbitrary vector. Since $A$ and $B$ are PSD, we know that:

$$x^\top A x \ge 0 \quad \text{and} \quad x^\top B x \ge 0$$

Adding these inequalities, we get:

$$x^\top A x + x^\top B x \ge 0$$

Using the distributive property of matrix multiplication, we can rewrite this as:

$$x^\top (A + B) x \ge 0$$

Since this holds for all $x \in \mathbb{R}^n$, it follows that $A + B$ is PSD.

A Key Observation: Gram Matrices are PSD

Definition (Gram Matrix)

Given vectors $a_1, \dots, a_n$ in $\mathbb{R}^m$, let $A \in \mathbb{R}^{m \times n}$ be the matrix whose columns are the vectors $a_i$. The Gram matrix of $a_1, \dots, a_n$ is defined as the $n \times n$ matrix $G$ where:

$$G_{ij} = a_i^\top a_j = \langle a_i, a_j \rangle$$

In matrix notation, the Gram matrix can be expressed as:

$$G = A^\top A$$
Interpretation:

  • The Gram matrix captures the pairwise inner products (dot products) between the vectors $a_1, \dots, a_n$.
  • The diagonal entries $G_{ii} = a_i^\top a_i = \|a_i\|^2$ represent the squared lengths (norms) of the vectors.
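
A minimal NumPy sketch of these two points (the matrix below is an arbitrary example; its columns play the role of the vectors $a_i$):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))   # columns a_1, a_2, a_3 live in R^4 (m = 4, n = 3)

G = A.T @ A                       # Gram matrix, shape (3, 3)

# Entry (i, j) is the inner product of column i with column j...
print(np.isclose(G[0, 1], A[:, 0] @ A[:, 1]))          # True
# ...and the diagonal holds the squared norms of the columns.
print(np.allclose(np.diag(G), np.sum(A**2, axis=0)))   # True
```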

Proposition: Properties of Gram Matrices

Let $A \in \mathbb{R}^{m \times n}$. The non-zero eigenvalues of $A^\top A$ are the same as the non-zero eigenvalues of $A A^\top$. Both matrices are also symmetric and PSD.

Proof

Symmetry

$(A^\top A)^\top = A^\top (A^\top)^\top = A^\top A$, and $(A A^\top)^\top = (A^\top)^\top A^\top = A A^\top$. Thus, both $A^\top A$ and $A A^\top$ are symmetric.

PSD

For any $x \in \mathbb{R}^n$, we have $x^\top A^\top A x = (Ax)^\top (Ax) = \|Ax\|^2 \ge 0$. This implies that $A^\top A$ is PSD. Similarly, for any $y \in \mathbb{R}^m$, we have $y^\top A A^\top y = \|A^\top y\|^2 \ge 0$, so $A A^\top$ is also PSD.

Eigenvalues

Before proceeding with the proof, there is a small fact we first need to show:

Why is $\operatorname{rank}(A) = \operatorname{rank}(A^\top) = \operatorname{rank}(A^\top A) = \operatorname{rank}(A A^\top)$?
  1. $\operatorname{rank}(A) = \operatorname{rank}(A^\top)$:

    • The rank of $A$ is equal to the rank of $A^\top$ because row operations (used in Gaussian elimination) do not alter the number of linearly independent rows or columns.
    • The pivots in the row echelon form of $A$ correspond to independent rows and columns, and this property remains invariant under transposition.
  2. $\operatorname{rank}(A) = \operatorname{rank}(A^\top A)$:

    • Use the rank-nullity theorem to argue $\ker(A) = \ker(A^\top A)$:
      • Suppose $x \in \ker(A)$, meaning $Ax = 0$. Then:

        $$A^\top A x = A^\top (Ax) = A^\top 0 = 0$$

        Thus, $x \in \ker(A^\top A)$.

      • Conversely, assume $x \in \ker(A^\top A)$, meaning $A^\top A x = 0$. Then:

        $$0 = x^\top A^\top A x = \|Ax\|^2 \implies Ax = 0$$

        Hence, $x \in \ker(A)$.

      • Since $\ker(A) = \ker(A^\top A)$, their nullities are equal, and the ranks are the same:

        $$\operatorname{rank}(A^\top A) = n - \dim \ker(A^\top A) = n - \dim \ker(A) = \operatorname{rank}(A)$$

  3. $\operatorname{rank}(A^\top) = \operatorname{rank}(A A^\top)$:

    • Similarly, for $A A^\top$:
      • If $y \in \ker(A^\top)$, then $A A^\top y = 0$, so $y \in \ker(A A^\top)$.

      • Conversely, if $y \in \ker(A A^\top)$, then:

        $$0 = y^\top A A^\top y = \|A^\top y\|^2$$

        Since $\|A^\top y\|^2 = 0$, we have $A^\top y = 0$, so $y \in \ker(A^\top)$.

      • Thus, $\ker(A^\top) = \ker(A A^\top)$, and their ranks are equal:

        $$\operatorname{rank}(A A^\top) = m - \dim \ker(A A^\top) = m - \dim \ker(A^\top) = \operatorname{rank}(A^\top)$$

Combining all the above results, we have:

$$\operatorname{rank}(A) = \operatorname{rank}(A^\top) = \operatorname{rank}(A^\top A) = \operatorname{rank}(A A^\top)$$

Now let us continue with the proof…

Proof

Let $r$ be the rank of $A$. We know that:

$$\operatorname{rank}(A^\top A) = \operatorname{rank}(A A^\top) = r$$

Since $A^\top A$ and $A A^\top$ are symmetric, they have a complete set of real eigenvalues and orthogonal eigenvectors. Let $\lambda_1, \dots, \lambda_r$ be the non-zero eigenvalues of $A^\top A$ and $v_1, \dots, v_r$ the corresponding eigenvectors. Let $\mu_1, \dots, \mu_r$ be the non-zero eigenvalues of $A A^\top$ and $u_1, \dots, u_r$ the corresponding eigenvectors.

We have $A^\top A v_i = \lambda_i v_i$. Hence, $A A^\top (A v_i) = A (A^\top A v_i) = \lambda_i (A v_i)$. Thus, $\lambda_i$ is an eigenvalue of $A A^\top$ with eigenvector $A v_i$ (which is non-zero, since $\|A v_i\|^2 = v_i^\top A^\top A v_i = \lambda_i > 0$).

Similarly, $A^\top A (A^\top u_j) = A^\top (A A^\top u_j) = \mu_j (A^\top u_j)$. This shows that $\mu_j$ is an eigenvalue of $A^\top A$ with eigenvector $A^\top u_j$.

Hence, the non-zero eigenvalues of $A^\top A$ and $A A^\top$ coincide.

Implications

  • This proposition establishes that Gram matrices are always symmetric and positive semidefinite.
  • The connection between the eigenvalues of and will be crucial for the development of the SVD.
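
The shared non-zero spectrum is easy to observe numerically; a minimal NumPy sketch with an arbitrary rectangular random matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))           # rectangular example, m = 5, n = 3

eig_AtA = np.linalg.eigvalsh(A.T @ A)     # 3 eigenvalues (ascending order)
eig_AAt = np.linalg.eigvalsh(A @ A.T)     # 5 eigenvalues (ascending order)

# The non-zero eigenvalues coincide; A A^T just carries two extra zeros.
print(np.allclose(eig_AtA, eig_AAt[-3:]))          # True
print(np.allclose(eig_AAt[:2], 0.0, atol=1e-10))   # True
```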

What Else Do We Get for PSD Matrices?

Skipped during the lecture, but part of slides…

Proposition (Cholesky Decomposition)

Every symmetric positive semidefinite matrix $M$ is a Gram matrix $M = R^\top R$ of an upper triangular matrix $R$. The factorization $M = R^\top R$ is known as the Cholesky decomposition.

Proof

Since $M$ is symmetric and PSD, by the spectral theorem, there exists an orthogonal matrix $V$ and a diagonal matrix $\Lambda$ with the non-negative eigenvalues of $M$ on the diagonal such that:

$$M = V \Lambda V^\top$$

Define $\Lambda^{1/2}$ as the diagonal matrix obtained by taking the square root of each diagonal entry of $\Lambda$. Then:

$$M = V \Lambda^{1/2} \Lambda^{1/2} V^\top = \left(\Lambda^{1/2} V^\top\right)^\top \left(\Lambda^{1/2} V^\top\right)$$

Now, consider the QR decomposition of $\Lambda^{1/2} V^\top$:

$$\Lambda^{1/2} V^\top = Q R$$

where $Q$ is an orthogonal matrix and $R$ is an upper triangular matrix. Substituting this into the expression for $M$:

$$M = (Q R)^\top (Q R) = R^\top Q^\top Q R = R^\top R$$

Taking this $R$, we have $M = R^\top R$, where $R$ is an upper triangular matrix. This establishes the Cholesky decomposition.

Significance:

  • The Cholesky decomposition provides an efficient way to factorize a symmetric positive semidefinite matrix.
  • It is widely used in numerical linear algebra, optimization, and statistics.
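
In practice one typically calls a library routine rather than the spectral-theorem construction above. A minimal NumPy sketch (using an arbitrary strictly positive definite matrix, since `np.linalg.cholesky` requires positive definiteness and returns a lower-triangular factor):

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((3, 3))
M = B.T @ B + 0.1 * np.eye(3)      # symmetric and strictly positive definite

# NumPy returns a lower-triangular L with M = L L^T;
# setting R = L^T gives the upper-triangular form M = R^T R used above.
L = np.linalg.cholesky(M)
R = L.T

print(np.allclose(M, R.T @ R))     # True
print(np.allclose(R, np.triu(R)))  # True: R is upper triangular
```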

Introduction to Singular Value Decomposition (SVD)

The Singular Value Decomposition (SVD) is a fundamental matrix factorization technique with wide-ranging applications in linear algebra, data analysis, and machine learning. While the spectral theorem provides a powerful decomposition for symmetric matrices, the SVD extends this concept to all matrices, even those that are not square or symmetric.

The Guiding Question: How can we establish a decomposition analogous to the spectral theorem but applicable to general matrices?

Singular Value Decomposition... (MIT)

Motivation: Extending the Spectral Theorem

Recall that the spectral theorem allows us to decompose a symmetric matrix $A$ as:

$$A = \sum_{i=1}^{n} \lambda_i \, v_i v_i^\top = V \Lambda V^\top$$

where $v_1, \dots, v_n$ are orthonormal eigenvectors of $A$, $\lambda_1, \dots, \lambda_n$ are the corresponding eigenvalues, $V$ is an orthogonal matrix whose columns are the eigenvectors, and $\Lambda$ is a diagonal matrix of eigenvalues. This decomposition is elegant and insightful, but its limitation is that it only applies to symmetric matrices.

The Problem of Different Dimensions

When dealing with a general matrix $A \in \mathbb{R}^{m \times n}$ where $m \ne n$, the concept of eigenvectors and eigenvalues in the traditional sense becomes problematic. The equation $A v = \lambda v$ implies that $A$ maps a vector from $\mathbb{R}^n$ to a scaled version of itself in $\mathbb{R}^m$. These spaces have different dimensions, so the notion of an eigenvector being scaled by a constant doesn’t directly apply.

A New Perspective: Mapping Between Bases

Instead of searching for eigenvectors of $A$ directly, let’s consider the possibility of mapping between different orthonormal bases. Suppose we have an orthonormal basis $v_1, \dots, v_n$ for $\mathbb{R}^n$ and another orthonormal basis $u_1, \dots, u_m$ for $\mathbb{R}^m$. Could we find a way to relate these bases through the action of $A$?

Ideally, we’d like to find a set of relationships of the form:

$$A v_i = \sigma_i u_i$$

where $\sigma_i \ge 0$ are scalar values. This equation suggests that $A$ maps each basis vector $v_i$ in $\mathbb{R}^n$ to a scaled version of a corresponding basis vector $u_i$ in $\mathbb{R}^m$. The scalars $\sigma_i$ would represent the “stretching factors” along these corresponding directions.

Leveraging $A^\top A$ and $A A^\top$

From our earlier work, we know that for any matrix $A \in \mathbb{R}^{m \times n}$, the matrices $A^\top A$ and $A A^\top$ are symmetric and positive semidefinite. Moreover, they share the same non-zero eigenvalues. This suggests that we can apply the spectral theorem to these matrices to obtain orthonormal bases for $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively.

Let’s consider the spectral decompositions:

  • $A^\top A = V \Lambda_1 V^\top$, where $V \in \mathbb{R}^{n \times n}$ is orthogonal, and $\Lambda_1$ is a diagonal matrix with the eigenvalues of $A^\top A$ (which are non-negative).
  • $A A^\top = U \Lambda_2 U^\top$, where $U \in \mathbb{R}^{m \times m}$ is orthogonal, and $\Lambda_2$ is a diagonal matrix with the eigenvalues of $A A^\top$ (also non-negative).

The columns of $V$ and $U$ will serve as our desired orthonormal bases, and the square roots of the non-zero eigenvalues of $A^\top A$ and $A A^\top$ will provide the scaling factors $\sigma_i$.
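
As a sketch of this idea in NumPy (assuming the arbitrary random matrix below has full column rank, so all the $\sigma_i$ are non-zero), the vectors $u_i = A v_i / \sigma_i$ built from the eigenvectors of $A^\top A$ turn out to be orthonormal eigenvectors of $A A^\top$:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))             # m = 5, n = 3; full column rank (assumed)

# Orthonormal eigenbasis of A^T A; the eigenvalues are non-negative.
lam, V = np.linalg.eigh(A.T @ A)
sigma = np.sqrt(np.clip(lam, 0.0, None))    # candidate scaling factors sigma_i

# Define u_i = A v_i / sigma_i (column-wise division), so that A v_i = sigma_i u_i.
U = (A @ V) / sigma

print(np.allclose(U.T @ U, np.eye(3)))      # True: the u_i are orthonormal
print(np.allclose((A @ A.T) @ U, U * lam))  # True: each u_i is an eigenvector of A A^T
```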

Definition: Singular Value Decomposition (SVD)

Let $A \in \mathbb{R}^{m \times n}$ be any matrix. A singular value decomposition of $A$ is a factorization of the form:

$$A = U \Sigma V^\top$$

where:

  • $U \in \mathbb{R}^{m \times m}$ is an orthogonal matrix whose columns are called the left singular vectors of $A$.
  • $V \in \mathbb{R}^{n \times n}$ is an orthogonal matrix whose columns are called the right singular vectors of $A$.
  • $\Sigma \in \mathbb{R}^{m \times n}$ is a diagonal matrix with non-negative entries $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_{\min(m,n)} \ge 0$ on the diagonal, called the singular values of $A$.

We stopped the lecture here, but the following content remained on the slides…

Compact SVD

If $A \in \mathbb{R}^{m \times n}$ has rank $r$, the SVD can be expressed in a more compact form:

$$A = U_r \Sigma_r V_r^\top$$

where:

  • $U_r \in \mathbb{R}^{m \times r}$ contains the first $r$ left singular vectors.
  • $V_r \in \mathbb{R}^{n \times r}$ contains the first $r$ right singular vectors.
  • $\Sigma_r \in \mathbb{R}^{r \times r}$ is a diagonal matrix containing the first $r$ (non-zero) singular values.

This compact form is often more efficient in practice, as it avoids storing and manipulating unnecessary zero singular values and their corresponding singular vectors.
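
A short NumPy sketch of the two forms, using an arbitrary tall random matrix (`np.linalg.svd`'s `full_matrices` flag switches between them; for a full-rank matrix the thin output coincides with the compact SVD):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 3))                  # tall example matrix, m = 5, n = 3

# Full SVD: U is m x m, Vt is n x n, s holds the min(m, n) singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)                # (5, 5) (3,) (3, 3)

# Thin/compact SVD: only the leading columns of U are kept.
Ur, sr, Vtr = np.linalg.svd(A, full_matrices=False)
print(Ur.shape, sr.shape, Vtr.shape)             # (5, 3) (3,) (3, 3)

# The compact factors already reconstruct A exactly.
print(np.allclose(A, Ur @ np.diag(sr) @ Vtr))    # True
```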

Properties of the SVD and Connection to $A^\top A$ and $A A^\top$

The SVD provides a powerful way to understand the structure of a general matrix $A$ by relating it to the associated symmetric matrices $A^\top A$ and $A A^\top$. Let’s explore this connection further.

Suppose $A \in \mathbb{R}^{m \times n}$ has an SVD given by $A = U \Sigma V^\top$.

Relationship with $A A^\top$

Consider the product $A A^\top$:

$$A A^\top = (U \Sigma V^\top)(U \Sigma V^\top)^\top = U \Sigma V^\top V \Sigma^\top U^\top = U (\Sigma \Sigma^\top) U^\top$$

Since $V$ is orthogonal, $V^\top V = I_n$. Also, note that $\Sigma \Sigma^\top$ is an $m \times m$ diagonal matrix. Let’s examine the structure of $\Sigma \Sigma^\top$:

  • If $m > n$, then $\Sigma \Sigma^\top$ has the form:

    $$\Sigma \Sigma^\top = \begin{pmatrix} \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2) & 0 \\ 0 & 0 \end{pmatrix}$$

    where the lower-right $0$ is a zero matrix of size $(m - n) \times (m - n)$.

  • If $m < n$, then $\Sigma \Sigma^\top$ has the form:

    $$\Sigma \Sigma^\top = \operatorname{diag}(\sigma_1^2, \dots, \sigma_m^2)$$

  • If $m = n$, then $\Sigma \Sigma^\top$ is simply a diagonal matrix containing $\sigma_i^2$ for $i = 1, \dots, n$.

Thus, we can make the following observations:

  1. The equation $A A^\top = U (\Sigma \Sigma^\top) U^\top$ is an eigendecomposition of $A A^\top$.
  2. The left singular vectors of $A$ (columns of $U$) are the eigenvectors of $A A^\top$.
  3. The squares of the singular values of $A$ (diagonal entries of $\Sigma \Sigma^\top$) are the eigenvalues of $A A^\top$.
  4. If $m > n$, $A$ has $n$ singular values, while $A A^\top$ has $m$ eigenvalues. The “missing” $m - n$ eigenvalues are zero.

Relationship with $A^\top A$

Now consider the product $A^\top A$:

$$A^\top A = (U \Sigma V^\top)^\top (U \Sigma V^\top) = V \Sigma^\top U^\top U \Sigma V^\top = V (\Sigma^\top \Sigma) V^\top$$

Since $U$ is orthogonal, $U^\top U = I_m$. Also, note that $\Sigma^\top \Sigma$ is an $n \times n$ diagonal matrix. Let’s examine the structure of $\Sigma^\top \Sigma$:

  • If $n > m$, then $\Sigma^\top \Sigma$ has the form:

    $$\Sigma^\top \Sigma = \begin{pmatrix} \operatorname{diag}(\sigma_1^2, \dots, \sigma_m^2) & 0 \\ 0 & 0 \end{pmatrix}$$

    where the lower-right $0$ is a zero matrix of size $(n - m) \times (n - m)$.

  • If $n < m$, then $\Sigma^\top \Sigma$ has the form:

    $$\Sigma^\top \Sigma = \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2)$$

  • If $m = n$, then $\Sigma^\top \Sigma$ is simply a diagonal matrix containing $\sigma_i^2$ for $i = 1, \dots, n$.

We can make the following observations:

  1. The equation $A^\top A = V (\Sigma^\top \Sigma) V^\top$ is an eigendecomposition of $A^\top A$.
  2. The right singular vectors of $A$ (columns of $V$) are the eigenvectors of $A^\top A$.
  3. The squares of the singular values of $A$ (diagonal entries of $\Sigma^\top \Sigma$) are the eigenvalues of $A^\top A$.
  4. If $n > m$, $A$ has $m$ singular values, while $A^\top A$ has $n$ eigenvalues. The “missing” $n - m$ eigenvalues are zero.

Key Insights

  • The SVD provides a direct link between a general matrix $A$ and the symmetric matrices $A^\top A$ and $A A^\top$.
  • The singular values of $A$ are the square roots of the non-zero eigenvalues of both $A^\top A$ and $A A^\top$.
  • The left and right singular vectors of $A$ form orthonormal bases for the column space and row space of $A$, respectively.
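
These connections are easy to verify numerically; a minimal sketch with an arbitrary $4 \times 3$ random matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A)                    # full SVD: U is 4x4, Vt is 3x3

# sigma_i^2 are the non-zero eigenvalues of both A^T A and A A^T.
print(np.allclose(np.sort(s**2), np.linalg.eigvalsh(A.T @ A)))        # True
print(np.allclose(np.sort(s**2), np.linalg.eigvalsh(A @ A.T)[-3:]))   # True

# U diagonalizes A A^T and V diagonalizes A^T A, with sigma_i^2 on the diagonal.
print(np.allclose(U.T @ (A @ A.T) @ U, np.diag(np.append(s**2, 0.0))))  # True
print(np.allclose(Vt @ (A.T @ A) @ Vt.T, np.diag(s**2)))                # True
```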

We’ll look at the SVD proof and other topics next lecture…

Continue here: 28 SVD Theorem and Proof, SVD Abstraction, Pseudoinverse