Lecture from 20.12.2024 | Video: Videos ETHZ
Singular Value Decomposition
Definition: Singular Value Decomposition (SVD)
Recall from last lecture…
Let be any matrix. A singular value decomposition of is a factorization of the form:
where:
- is an orthogonal matrix whose columns are called the left singular vectors of .
- is an orthogonal matrix whose columns are called the right singular vectors of .
- is a diagonal matrix with non-negative entries on the diagonal, called the singular values of .
Okay, here’s a complete proof of the Singular Value Decomposition (SVD) theorem, first for the general (full) SVD and then for the compact SVD.
Note that the lecturer looked at the case where , however below is the generalized proof for the complete SVD and the compact SVD…
SVD Theorem
Every matrix has a singular value decomposition of the form:
where:
- is an orthogonal matrix whose columns are the left singular vectors of .
- is an orthogonal matrix whose columns are the right singular vectors of .
- is a diagonal matrix with non-negative entries on the diagonal, called the singular values of .
Proof of the Full SVD
-
Spectral Decomposition of : Consider the symmetric matrix . By the spectral theorem, it has a complete set of orthonormal eigenvectors and can be decomposed as:
where is an orthogonal matrix whose columns are the eigenvectors of , and is a diagonal matrix containing the eigenvalues of .
-
Ordering Eigenvalues: Arrange the eigenvalues in in decreasing order: . Note that the eigenvalues are non-negative because is positive semidefinite.
-
Rank and Non-zero Eigenvalues: Let . Then, also has rank , and exactly eigenvalues are non-zero: and .
-
Singular Values: Define the singular values for . Note that only the first singular values are non-zero.
-
Constructing : Let be a diagonal matrix with the singular values on the diagonal, arranged in decreasing order.
-
Defining : Let be the matrix containing the first columns of (eigenvectors of corresponding to the non-zero eigenvalues). Define:
where is the diagonal matrix containing only the first non-zero singular values.
-
Orthogonality of : We need to show that :
-
Extending to an Orthogonal Matrix : Since has orthonormal columns, we can extend it to a full orthogonal matrix by adding orthonormal columns that are also orthogonal to the columns of . We can use the Gram-Schmidt process for this.
-
Showing : We’ll show that by demonstrating that their corresponding columns are equal.
- For :
- For : We need to show that . We know is orthogonal to the columns of , which are linear combinations of the columns of . This means that is orthogonal to every vector in the column space of , which is equivalent to being in the null space of . Thus, .
Combining these, we see that .
-
Final Step: Since is orthogonal, . Multiplying both sides of by on the right, we get:
This completes the proof of the full SVD.
Compact SVD Theorem
Every matrix of rank has a compact singular value decomposition of the form:
where:
- contains the first left singular vectors of .
- contains the first right singular vectors of .
- is a diagonal matrix containing the first (non-zero) singular values of .
Proof of the Compact SVD
The proof of the compact SVD follows directly from the proof of the full SVD. We have already constructed , , and in steps 1-7 above. We also showed that .
In step 9, we showed that for . This can be written in matrix form as:
Multiplying both sides on the right by , and using the fact that , we get:
From the full SVD proof, we know that . By considering only the first columns of , and non-zero entries in , and using the fact that the remaining columns of are in the nullspace of , we can write .
This completes the proof of the compact SVD.
Consequence of the SVD: Sum of Rank-1 Matrices
Not part of the lecture, but part of the slides…
Theorem: A rank- matrix can be expressed as a sum of rank-1 matrices.
Proof: This follows directly from the compact SVD:
Each outer product is a rank-1 matrix, and the SVD expresses as a weighted sum of these rank-1 matrices, where the weights are the singular values.
This decomposition is fundamental in many applications, including low-rank matrix approximation, dimensionality reduction, and data compression. The SVD provides a powerful and insightful way to understand the structure of any matrix.
Further Remarks on the SVD
Abstraction and Change of Basis: Understanding the Motivation for SVD (Remark 1)
Let’s delve into the core idea behind the Singular Value Decomposition (SVD) by exploring how a linear transformation can be represented differently under a change of basis. This provides a powerful way to understand the motivation and significance of the SVD.
Representing Linear Transformations with Respect to Different Bases
Consider a linear transformation represented by the matrix . This means that for any vector , the transformation is computed as , where .
-
Canonical Basis: Typically, we represent vectors using the standard (canonical) basis. In , the canonical basis is , where is a vector with a 1 in the -th position and 0s elsewhere. Similarly, the canonical basis in is . When we write , it’s implicitly understood that is represented in the canonical basis of , and the resulting vector is represented in the canonical basis of .
-
Expanding in the Canonical Basis: We can express the result of the transformation, , as a linear combination of the basis vectors of the codomain, . Let denote the -th component of . Then, we can write:
-
The Role of Abstraction: This formula highlights the role of abstraction. We are expressing the action of the linear transformation in terms of its effect on the components of the input vector, and then using the basis vectors of the codomain to construct the output vector.
Changing the Basis
Now, suppose we want to represent our vectors using different bases. Let:
- be a matrix whose columns form a basis for .
- be a matrix whose columns form a basis for .
Any vector can be expressed as a linear combination of the basis vectors in :
where is the vector of coordinates of with respect to the basis .
Similarly, any vector can be expressed as a linear combination of the basis vectors in :
where is the vector of coordinates of with respect to the basis .
Finding a New Representation for the Linear Transformation
Our goal is to find a matrix that represents the same linear transformation as , but with respect to the new bases and . In other words, we want a matrix such that if and , then:
This means that takes the coordinates of in the basis and produces the coordinates of in the basis .
We have , which can be written as . Multiplying both sides by on the left gives . Since we want , we can define: Multiplying both sides on the left by and on the right by , we also get the equivalent relationship
The Essence of SVD: The SVD seeks a special form of this relationship where:
- and are orthogonal matrices (their columns are orthonormal bases). This makes inverting them easy, as and .
- is a diagonal matrix (ideally with non-negative entries in decreasing order). This makes the transformation easy to understand, as it simply scales the components along the new basis vectors.
In essence, the SVD aims to find the “nicest” possible bases for the domain and codomain of a linear transformation, such that the transformation’s action is reduced to a simple scaling along the new coordinate axes. This is what makes the SVD such a powerful tool for understanding and manipulating linear transformations and matrices.
SVD and its Connection to the Spectral Theorem (Remark 2)
Let’s explore the connection between the Singular Value Decomposition (SVD) of an arbitrary matrix and the spectral theorem applied to the symmetric matrices and . This connection provides a deeper understanding of the SVD and its relationship to eigenvalues and eigenvectors.
Starting with the SVD
Let be an arbitrary matrix, and let its SVD be given by:
where and are orthogonal matrices, and is a diagonal matrix with the singular values of on its diagonal.
Transpose of A
Taking the transpose of , we get:
Forming
Now, let’s consider the product :
Since is orthogonal, . Thus:
Observations
- Eigenvectors of : The equation has the form of an eigendecomposition. The columns of are the eigenvectors of .
- Eigenvalues of : The matrix is a diagonal matrix. Its diagonal entries are the squares of the singular values of (), which are also the eigenvalues of . If , then the last diagonal entries of are zero.
- Spectral Theorem: This decomposition of is precisely the spectral theorem applied to the symmetric matrix . It confirms that has a complete set of orthonormal eigenvectors (the columns of ) and real, non-negative eigenvalues (the diagonal entries of ).
Forming
Similarly, let’s consider the product :
Since is orthogonal, . Thus:
Observations
- Eigenvectors of : The equation is an eigendecomposition of . The columns of are the eigenvectors of .
- Eigenvalues of : The matrix is a diagonal matrix. Its diagonal entries are also the squares of the singular values of (), which are the eigenvalues of . If , then the last diagonal entries of are zero.
- Spectral Theorem: This decomposition of is the spectral theorem applied to the symmetric matrix . It confirms that has a complete set of orthonormal eigenvectors (the columns of ) and real, non-negative eigenvalues (the diagonal entries of ).
A Generalized Notion of Eigenvectors
The SVD, through its connection to and , provides a generalization of the concept of eigenvectors. While eigenvectors are traditionally defined for square matrices, the singular vectors of a general matrix (which can be rectangular) can be thought of as analogous to eigenvectors in the following sense:
- The left singular vectors (columns of ) are eigenvectors of .
- The right singular vectors (columns of ) are eigenvectors of .
- The singular values of are the square roots of the non-zero eigenvalues of both and .
In other words, the SVD identifies two sets of orthonormal vectors (left and right singular vectors) that are related through the action of and , and it provides a set of scaling factors (singular values) that quantify the “stretching” or “shrinking” effect of the linear transformation along these directions. This provides a more general framework for understanding how a linear transformation acts on vectors, even when the matrix is not square or symmetric.
Here are some additional remarks on the properties and applications of the Singular Value Decomposition (SVD), expanding on the concepts presented in the lecture.
SVD and the (Pseudo)inverse (Remark 3)
Invertible Matrices
If is invertible and has an SVD , then its inverse can be easily computed using the SVD components:
Explanation
Since is invertible, all its singular values are non-zero, and is a square, diagonal matrix with non-zero entries. Thus, is simply the diagonal matrix with the reciprocals of the singular values on the diagonal. We can verify that this is the inverse:
Pseudoinverse
The concept of an inverse can be generalized to non-square or singular matrices using the pseudoinverse. The SVD provides a way to define the Moore-Penrose pseudoinverse, denoted by . If , then:
where is obtained from by taking the reciprocal of each non-zero singular value, leaving the zeros in place, and transposing the matrix.
Example
If , then . If , then .
Significance: The pseudoinverse is a powerful tool for solving least-squares problems, finding minimum-norm solutions to underdetermined systems, and analyzing matrix properties.
Compact SVD (Remark 4)
If has rank , its SVD can be represented in a compact form:
where:
- contains the first left singular vectors of (corresponding to the non-zero singular values).
- is a diagonal matrix containing the non-zero singular values of .
- contains the first right singular vectors of (corresponding to the non-zero singular values).
Example
Let be a matrix with rank 3. Then the compact SVD of would have the form:
where is , is , and is .
SVD as a Sum of Rank-1 Matrices (Remark 5)
The SVD provides a way to express any matrix as a sum of rank-1 matrices.
Theorem
Let have rank , with non-zero singular values , left singular vectors , and right singular vectors . Then can be expressed as:
Explanation
This follows directly from the compact SVD:
Expanding the matrix multiplication, we get:
Each term is a rank-1 matrix because it is the outer product of two vectors, and , scaled by the singular value .
Significance
This decomposition shows that any matrix can be built up from a linear combination of simple rank-1 matrices. This is fundamental to many applications, including:
- Low-rank approximation: Truncating the sum after terms provides the best rank- approximation of in terms of the Frobenius norm and the spectral norm.
- Data compression: Storing only the first singular values and corresponding singular vectors allows for a compressed representation of .
- Dimensionality reduction: The SVD can be used to project data onto a lower-dimensional subspace while preserving as much variance as possible (Principal Component Analysis).
The rest of the lecture didn’t seem useful enough to note down as it was mostly a bit of repetition…
This was the last lecture…