Lecture from 20.11.2024 | Video: Videos ETHZ
This lecture covers the Gram-Schmidt process for orthonormalizing a set of vectors and the concept of the pseudoinverse, a generalization of the matrix inverse for non-square or singular matrices.
Preparations for the Gram-Schmidt Process
Task
Given a subspace $U \subseteq \mathbb{R}^n$ spanned by a basis $v_1, \dots, v_k$ (i.e., $U = \operatorname{span}(v_1, \dots, v_k)$), our goal is to construct an orthonormal basis $q_1, \dots, q_k$ for $U$.
The Idea (Two Vectors)
To illustrate the Gram-Schmidt process, consider a subspace $U$ spanned by two vectors $v_1$ and $v_2$.
Steps:
- Normalization: Normalize the first vector to obtain:
  $$q_1 = \frac{v_1}{\|v_1\|}$$
- Orthogonalization: Subtract the projection of $v_2$ onto $q_1$ from $v_2$:
  $$\tilde{q}_2 = v_2 - (v_2^\top q_1)\, q_1$$
  This ensures $\tilde{q}_2$ is orthogonal to $q_1$.
  - Key observation: $\tilde{q}_2 \neq 0$ because $v_1$ and $v_2$ are linearly independent.
- Normalization: Normalize $\tilde{q}_2$ to obtain:
  $$q_2 = \frac{\tilde{q}_2}{\|\tilde{q}_2\|}$$
Claim: $q_1$ and $q_2$ are orthonormal.
- Normalization: By construction, $\|q_1\| = 1$ and $\|q_2\| = 1$.
- Orthogonality: By definition of $\tilde{q}_2$, the projection of $v_2$ onto $q_1$ is removed, leaving $\tilde{q}_2$ orthogonal to $q_1$. To show this explicitly:
  $$q_1^\top \tilde{q}_2 = q_1^\top \big( v_2 - (v_2^\top q_1)\, q_1 \big)$$
  Expanding this:
  $$q_1^\top \tilde{q}_2 = q_1^\top v_2 - (v_2^\top q_1)(q_1^\top q_1)$$
  Since $q_1^\top q_1 = 1$ and $q_1^\top v_2 = v_2^\top q_1$, this simplifies to:
  $$q_1^\top \tilde{q}_2 = q_1^\top v_2 - v_2^\top q_1 = 0$$
  Thus, $q_1$ and $q_2$ are orthogonal, and since they are also normalized, they form an orthonormal set.
The Gram-Schmidt process works by systematically orthogonalizing and normalizing vectors in a given basis. This ensures the resulting vectors are both orthogonal and of unit length, creating an orthonormal basis for the subspace $U$.
The process can be extended to any number of vectors, ensuring that each new vector is orthogonal to all previously processed vectors in the set.
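As a quick numeric illustration of the two-vector construction, the following NumPy sketch uses an arbitrary pair $v_1, v_2$ (chosen only for illustration, not values from the lecture):

```python
import numpy as np

# Arbitrary illustrative vectors (not from the lecture).
v1 = np.array([3.0, 4.0, 0.0])
v2 = np.array([1.0, 2.0, 2.0])

q1 = v1 / np.linalg.norm(v1)              # normalization
q2_tilde = v2 - (v2 @ q1) * q1            # orthogonalization: remove the q1-component of v2
q2 = q2_tilde / np.linalg.norm(q2_tilde)  # normalization

print(np.linalg.norm(q1), np.linalg.norm(q2))  # both 1.0
print(q1 @ q2)                                 # ~0: q1 and q2 are orthogonal
```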
The Gram-Schmidt Process
The Gram-Schmidt process is an algorithm for orthonormalizing a set of linearly independent vectors. It transforms a set of vectors that span a subspace into an orthonormal basis for the same subspace.
Intuition
The process works by iteratively constructing orthonormal vectors from the original basis vectors:
- Start by normalizing the first vector $v_1$ to create $q_1$.
- For each subsequent vector $v_i$, subtract its projection onto the subspace spanned by the previously constructed orthonormal vectors $q_1, \dots, q_{i-1}$.
- This subtraction removes all components of $v_i$ that lie in the directions of the existing $q_j$. What remains is a vector orthogonal to all previous $q_j$.
- Normalize this orthogonal vector to obtain the next orthonormal vector, $q_i$.
This iterative approach guarantees that the resulting set $q_1, \dots, q_k$ is orthonormal and spans the same subspace as the original set $v_1, \dots, v_k$.
Gram-Schmidt Algorithm
- Initialize: Normalize the first vector to obtain:
  $$q_1 = \frac{v_1}{\|v_1\|}$$
- Iterate: For $i = 2, \dots, k$:
  - Orthogonalize: Subtract the projection of $v_i$ onto the subspace spanned by $q_1, \dots, q_{i-1}$:
    $$\tilde{q}_i = v_i - \sum_{j=1}^{i-1} (v_i^\top q_j)\, q_j$$
  - Normalize: Normalize the resulting orthogonal vector:
    $$q_i = \frac{\tilde{q}_i}{\|\tilde{q}_i\|}$$
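A straightforward Python/NumPy sketch of this algorithm (the function name and interface are illustrative, not from the lecture):

```python
import numpy as np

def gram_schmidt(V):
    """Orthonormalize the columns of V, assumed linearly independent.

    Returns Q whose columns q_1, ..., q_k are orthonormal and span
    the same subspace as the columns v_1, ..., v_k of V.
    """
    n, k = V.shape
    Q = np.zeros((n, k))
    for i in range(k):
        q_tilde = V[:, i].astype(float)
        # Orthogonalize: subtract projections onto q_1, ..., q_{i-1}.
        for j in range(i):
            q_tilde = q_tilde - (V[:, i] @ Q[:, j]) * Q[:, j]
        # Normalize; q_tilde is non-zero because the columns are independent.
        Q[:, i] = q_tilde / np.linalg.norm(q_tilde)
    return Q

V = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])  # small example input
Q = gram_schmidt(V)
print(np.allclose(Q.T @ Q, np.eye(2)))              # True: columns are orthonormal
```

In floating-point arithmetic, the modified Gram-Schmidt variant (taking each dot product against the running residual rather than the original column) is numerically more stable; the sketch above follows the classical formulation stated in the algorithm.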
Proof of Correctness (by Induction)
We will prove by induction that for each $i = 1, \dots, k$, the vectors $q_1, \dots, q_i$ form an orthonormal basis for $\operatorname{span}(v_1, \dots, v_i)$.
Base Case ($i = 1$):
- The vector $q_1$ is defined as:
  $$q_1 = \frac{v_1}{\|v_1\|}$$
  Since $v_1$ is scaled to have unit length, $\|q_1\| = 1$.
- The span of $q_1$ is equal to $\operatorname{span}(v_1)$ because $q_1$ is a non-zero scalar multiple of $v_1$.
Thus, $q_1$ forms an orthonormal basis for $\operatorname{span}(v_1)$.
Inductive Hypothesis:
Assume that $q_1, \dots, q_{i-1}$ form an orthonormal basis for $\operatorname{span}(v_1, \dots, v_{i-1})$.
Inductive Step ($i - 1 \to i$):
We need to show that:
- $q_i$ has unit length.
- $q_i$ is orthogonal to $q_1, \dots, q_{i-1}$.
- $q_1, \dots, q_i$ span $\operatorname{span}(v_1, \dots, v_i)$.
1. $q_i$ has unit length:
By construction, $q_i$ is defined as:
$$q_i = \frac{\tilde{q}_i}{\|\tilde{q}_i\|}$$
Since $\tilde{q}_i$ is divided by its own norm, its length is:
$$\|q_i\| = \frac{\|\tilde{q}_i\|}{\|\tilde{q}_i\|} = 1$$
2. $q_i$ is orthogonal to $q_1, \dots, q_{i-1}$:
To prove this, consider any $m \in \{1, \dots, i-1\}$ and compute $q_m^\top \tilde{q}_i$.
Substitute $\tilde{q}_i = v_i - \sum_{j=1}^{i-1} (v_i^\top q_j)\, q_j$:
$$q_m^\top \tilde{q}_i = q_m^\top \Big( v_i - \sum_{j=1}^{i-1} (v_i^\top q_j)\, q_j \Big)$$
Distribute the dot product:
$$q_m^\top \tilde{q}_i = q_m^\top v_i - \sum_{j=1}^{i-1} (v_i^\top q_j)(q_m^\top q_j)$$
Since $q_1, \dots, q_{i-1}$ are orthonormal, $q_m^\top q_j = 0$ for $j \neq m$, and $q_m^\top q_m = 1$. This simplifies to:
$$q_m^\top \tilde{q}_i = q_m^\top v_i - v_i^\top q_m = 0$$
Since $q_i = \tilde{q}_i / \|\tilde{q}_i\|$ is a scalar multiple of $\tilde{q}_i$, we have:
$$q_m^\top q_i = 0$$
Thus, $q_i$ is orthogonal to $q_1, \dots, q_{i-1}$.
3. $q_1, \dots, q_i$ span $\operatorname{span}(v_1, \dots, v_i)$:
By the inductive hypothesis, $q_1, \dots, q_{i-1}$ span $\operatorname{span}(v_1, \dots, v_{i-1})$. The vector $q_i$ is constructed as a linear combination of $v_i$ and $q_1, \dots, q_{i-1}$. Since $q_1, \dots, q_{i-1} \in \operatorname{span}(v_1, \dots, v_{i-1})$ and $v_i \in \operatorname{span}(v_1, \dots, v_i)$, it follows that $q_i \in \operatorname{span}(v_1, \dots, v_i)$. Hence, $\operatorname{span}(q_1, \dots, q_i) \subseteq \operatorname{span}(v_1, \dots, v_i)$.
Because $q_1, \dots, q_i$ are orthonormal and hence linearly independent, and there are $i$ of them in the $i$-dimensional space $\operatorname{span}(v_1, \dots, v_i)$, they form a basis for $\operatorname{span}(v_1, \dots, v_i)$.
Conclusion
The Gram-Schmidt process constructs an orthonormal basis $q_1, \dots, q_k$ for the span of the input vectors $v_1, \dots, v_k$. By iteratively orthogonalizing and normalizing, it ensures that the resulting vectors are orthonormal and span the same subspace.
QR Decomposition
The Gram-Schmidt orthonormalization process naturally gives rise to a valuable matrix factorization called the QR decomposition. This decomposition expresses a matrix as the product of an orthogonal matrix (or a matrix with orthonormal columns) and an upper triangular matrix. It has significant applications in linear algebra computations, including solving least squares problems and finding eigenvalues.
Definition (QR Decomposition):
Let $A$ be an $m \times n$ matrix with linearly independent columns. The QR decomposition of $A$ is given by:
$$A = QR$$
where:
- $Q$ is an $m \times n$ matrix with orthonormal columns (formed by the vectors $q_1, \dots, q_n$ obtained from the Gram-Schmidt process applied to the columns of $A$), and
- $R$ is an $n \times n$ upper triangular matrix.
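In practice this factorization is typically computed with a library routine; a small NumPy sketch (the matrix below is a hypothetical example):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 0.0]])   # hypothetical 3x2 matrix with independent columns

Q, R = np.linalg.qr(A)       # "reduced" QR: Q is 3x2 with orthonormal columns, R is 2x2

print(np.allclose(Q @ R, A))             # True: A = QR
print(np.allclose(Q.T @ Q, np.eye(2)))   # True: orthonormal columns
print(np.allclose(R, np.triu(R)))        # True: R is upper triangular
```

Note that library routines (typically based on Householder reflections) may return $Q$ and $R$ with some columns and rows negated compared to the Gram-Schmidt construction; both are valid QR decompositions.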
Lemma (Properties of QR Decomposition):
In the QR decomposition $A = QR$:
- $R$ is upper triangular and invertible.
- $QQ^\top A = A$. This shows that the columns of $Q$ span the same space as the columns of $A$, i.e., $C(Q) = C(A)$.
Proof of Properties:
1. Upper Triangularity of $R$:
The entries of $R = Q^\top A$ are given by:
$$R_{ij} = q_i^\top a_j$$
where $a_j$ denotes the $j$-th column of $A$. From the Gram-Schmidt process, $q_i$ is orthogonal to $a_j$ for $j < i$ because $q_i$ is orthogonal to the subspace spanned by $q_1, \dots, q_{i-1}$, which contains $a_1, \dots, a_{i-1}$. Therefore, for $i > j$:
$$R_{ij} = q_i^\top a_j = 0$$
This means $R$ is upper triangular.
2. Invertibility of $R$:
The matrix $R$ is constructed as an upper triangular matrix with diagonal entries representing the norms of the orthogonal components of the input vectors (after subtracting projections onto the previously computed orthonormal vectors). Specifically:
$$R_{ii} = q_i^\top a_i = \|\tilde{q}_i\|$$
where $\tilde{q}_i = a_i - \sum_{j=1}^{i-1} (a_i^\top q_j)\, q_j$.
Key Reason for Non-Zero $R_{ii}$:
- The columns of $A$ are assumed to be linearly independent. This ensures that none of the input vectors $a_i$ is in the span of the previous vectors $a_1, \dots, a_{i-1}$. Consequently, after subtracting their projections, the resulting orthogonal component $\tilde{q}_i$ is non-zero.
- Since $R_{ii} = \|\tilde{q}_i\|$ and $\tilde{q}_i \neq 0$, it follows that $R_{ii} \neq 0$ for all $i$.
Invertibility of $R$:
- $R$ is upper triangular by construction, with all diagonal entries $R_{ii} \neq 0$.
- A standard result in linear algebra is that an upper triangular matrix is invertible if and only if all its diagonal entries are non-zero.
3. $QQ^\top A = A$ and $C(Q) = C(A)$:
We prove this using the concept of projection matrices.
Step 1: Projection Matrix for $C(Q)$
The projection matrix onto the column space of $Q$, denoted as $P_Q$, is given by:
$$P_Q = Q (Q^\top Q)^{-1} Q^\top$$
For matrices $Q$ with orthonormal columns, we have $Q^\top Q = I$, where $I$ is the identity matrix. Thus:
$$P_Q = QQ^\top$$
Step 2: Projection of $A$'s Columns onto $C(Q)$
Let $a_j$ denote the $j$-th column of $A$. By the Gram-Schmidt construction, $a_j$ is a linear combination of $q_1, \dots, q_j$, so $a_j \in C(Q)$; the projection of $a_j$ onto $C(Q)$ is therefore the column itself. Formally:
$$P_Q\, a_j = a_j$$
Since $P_Q = QQ^\top$, projecting $a_j$ onto $C(Q)$ leaves it unchanged:
$$QQ^\top a_j = a_j$$
Step 3: Generalization to the Entire Matrix
Applying the above argument to all columns of $A$, we see that the projection of $A$ onto $C(Q)$ is $A$ itself:
$$QQ^\top A = A$$
Step 4: Why $C(Q) = C(A)$?
The equality $QQ^\top A = A$ implies that the columns of $Q$ span the same space as the columns of $A$. Specifically:
- The action of $QQ^\top$ maps any vector in $C(A)$ to itself, ensuring that $C(A) \subseteq C(Q)$.
- Since $Q$ is constructed using the Gram-Schmidt process on the columns of $A$, the columns of $Q$ are linear combinations of the columns of $A$, ensuring $C(Q) \subseteq C(A)$.
Therefore, $C(Q) = C(A)$, completing the proof.
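These lemma properties can also be checked numerically; a minimal sketch, using a hypothetical matrix:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])   # hypothetical matrix with independent columns
Q, R = np.linalg.qr(A)

print(np.allclose(Q @ Q.T @ A, A))          # True: Q Q^T A = A
print(np.all(np.abs(np.diag(R)) > 1e-12))   # True: non-zero diagonal, so R is invertible
# C(Q) = C(A): stacking the columns of A and Q adds no new directions.
print(np.linalg.matrix_rank(np.hstack([A, Q])) == np.linalg.matrix_rank(A))  # True
```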
Example of QR Decomposition
Let $A$ be a matrix with two linearly independent columns.
We will compute its QR decomposition.
Step 1: Apply Gram-Schmidt to Columns of $A$
Let $a_1$ and $a_2$ denote the columns of $A$.
- Normalize $a_1$:
  $$q_1 = \frac{a_1}{\|a_1\|}$$
- Orthogonalize $a_2$ relative to $q_1$:
  The projection of $a_2$ onto $q_1$ is:
  $$(a_2^\top q_1)\, q_1$$
  Compute the scalar $a_2^\top q_1$, then subtract the projection from $a_2$ to get $\tilde{q}_2$:
  $$\tilde{q}_2 = a_2 - (a_2^\top q_1)\, q_1$$
- Normalize $\tilde{q}_2$:
  $$q_2 = \frac{\tilde{q}_2}{\|\tilde{q}_2\|}$$
Step 2: Construct $Q$ and $R$
The orthonormal vectors $q_1$ and $q_2$ form the columns of $Q$:
$$Q = \begin{pmatrix} q_1 & q_2 \end{pmatrix}$$
The matrix $R$ is given by $R = Q^\top A$:
- Compute $R_{11} = q_1^\top a_1 = \|a_1\|$.
- Compute $R_{12} = q_1^\top a_2$.
- Compute $R_{22} = q_2^\top a_2 = \|\tilde{q}_2\|$.
The matrix $R$ is:
$$R = \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix}$$
Final QR Decomposition
Combining the two steps gives $A = QR$ with the $Q$ and $R$ computed above.
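The same two-column computation can be verified on any concrete matrix; a sketch with a hypothetical $A$ (not the lecture's example):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])   # hypothetical 3x2 matrix, not the lecture's example

a1, a2 = A[:, 0], A[:, 1]

# Step 1: Gram-Schmidt on the columns.
q1 = a1 / np.linalg.norm(a1)
q2_tilde = a2 - (a2 @ q1) * q1
q2 = q2_tilde / np.linalg.norm(q2_tilde)

# Step 2: assemble Q and R = Q^T A.
Q = np.column_stack([q1, q2])
R = Q.T @ A

print(np.allclose(Q @ R, A))                           # True: A = QR
print(np.allclose(R[1, 0], 0.0))                       # True: R is upper triangular
print(np.isclose(R[0, 0], np.linalg.norm(a1)))         # True: R_11 = ||a_1||
print(np.isclose(R[1, 1], np.linalg.norm(q2_tilde)))   # True: R_22 = ||q~_2||
```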
Computational Usefulness of QR Decomposition
1. Projections:
Since $C(A) = C(Q)$, projecting a vector $b$ onto $C(A)$ is the same as projecting it onto $C(Q)$. With orthonormal columns in $Q$, the projection simplifies to:
$$P_{C(A)}\, b = Q (Q^\top Q)^{-1} Q^\top b = QQ^\top b$$
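For instance (with a hypothetical $A$ and $b$), the projection via $QQ^\top$ agrees with the normal-equation formula $A(A^\top A)^{-1}A^\top b$:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])   # hypothetical matrix with independent columns
b = np.array([1.0, 2.0, 3.0])

Q, R = np.linalg.qr(A)

p_qr = Q @ (Q.T @ b)                          # projection of b onto C(A) using Q Q^T
p_ne = A @ np.linalg.solve(A.T @ A, A.T @ b)  # same projection via A (A^T A)^{-1} A^T b
print(np.allclose(p_qr, p_ne))                # True
```

Computing $Q(Q^\top b)$ avoids forming and inverting $A^\top A$.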
2. Least Squares:
The least squares solution $\hat{x}$ to $Ax = b$ minimizes $\|Ax - b\|^2$. Using the QR decomposition ($A = QR$), the normal equations $A^\top A \hat{x} = A^\top b$ become:
$$R^\top Q^\top Q R\, \hat{x} = R^\top Q^\top b \quad\Longleftrightarrow\quad R^\top R\, \hat{x} = R^\top Q^\top b$$
Since $R^\top$ is invertible, this simplifies to:
$$R \hat{x} = Q^\top b$$
which can be efficiently solved by back-substitution because $R$ is upper triangular.
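A short sketch of the resulting solver, assuming hypothetical data; `scipy.linalg.solve_triangular` performs the back-substitution:

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])   # hypothetical design matrix
b = np.array([1.0, 2.0, 2.0])

Q, R = np.linalg.qr(A)
x_hat = solve_triangular(R, Q.T @ b)           # back-substitution on R x = Q^T b

x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)  # reference least squares solution
print(np.allclose(x_hat, x_ref))               # True
```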
Continue here: 20 Pseudoinverses, Constructing Pseudoinverses