Lecture from 13.11.2024 | Video: Videos ETHZ
Projections
Definition: Projection onto a Subspace
The projection of a vector onto a subspace is the closest point in to . Formally:
Projection onto a Subspace (General Case)
Given a subspace (where is an matrix whose columns form a basis for ), the projection of onto is:
where satisfies the normal equations:
Derivation: The error vector is orthogonal to , meaning it’s orthogonal to each column of . This leads to the normal equations.
Invertible
Lemma 5.2.4
The matrix is invertible if and only if the matrix has linearly independent columns.
Proof
We will prove the equivalence by chaining together several “if and only if” statements:
-
is invertible has linearly independent columns: This is a fundamental property of matrices. A square matrix is invertible if and only if its columns (or rows) are linearly independent.
-
has linearly independent columns : The nullspace of a matrix contains only the zero vector if and only if its columns are linearly independent. If there were a non-zero vector in the nullspace, it would represent a non-trivial linear combination of the columns that equals zero, contradicting linear independence.
-
: We’ve proven this previously (see the lemma about the link between nullspaces of and ).
-
Columns of are linearly independent: Again, this is a fundamental property. The nullspace of a matrix contains only the zero vector if and only if its columns are linearly independent.
Therefore, by connecting these equivalences, we have shown that is invertible if and only if has linearly independent columns. This result is essential for least squares problems, as it guarantees a unique solution when the columns of are linearly independent.
Explicit Formula for Projection (Theorem 5.2.6)
When the columns of matrix are linearly independent, we can derive a direct formula for the projection of onto the subspace spanned by those columns ().
-
Normal Equations: We know the projection can be expressed as , where satisfies the normal equations: .
-
Invertibility of : A crucial fact (proven in Lemma 5.2.4) is that if has linearly independent columns, then the matrix is invertible. This invertibility is key to finding a direct solution for .
-
Solving for : Since is invertible, we can solve for by multiplying both sides of the normal equations by :
-
The Projection Formula: Substitute this expression for back into :
-
The Projection Matrix: This formula reveals that the projection operation is a linear transformation represented by the matrix . We call the projection matrix. So, the projection is simply:
Properties of P:
-
Idempotence (): Projecting a vector that’s already in the subspace shouldn’t change it. Mathematically, projecting the projection gives the same projection: .
Proof:
-
Projection onto the Orthogonal Complement: The matrix projects onto , the orthogonal complement of . This is because can be uniquely decomposed as the sum of its projection onto and its projection onto : .
Proof: Let be the projection of onto . Then . The error vector is the difference between and its projection :
-
Idempotence of : Similar to , projecting onto twice has the same effect as projecting once: .
-
Invertibility: It’s important to note that while and might not be square and therefore might not have inverses, the matrix is always square. And when the columns of are linearly independent, is guaranteed to be invertible.
Least Squares
Definition
Given and , a least squares solution minimizes the squared error:
Connection to Projections
The least squares solution is equivalent to finding the projection of onto the column space of :
The least squares solution is given by the normal equations: . If has linearly independent columns, the explicit solution is .
Application: Linear Regression
Linear regression aims to find the “best-fitting” straight line through a set of data points , where . We represent this line by the equation , where is the y-intercept and is the slope. “Best-fitting” means minimizing the sum of the squared vertical distances between the data points and the line.
The vertical distance between a data point and the line at is given by . We square these distances to eliminate the absolute value and to penalize larger deviations more heavily. The sum of squared errors is:
We are effectively summing up the error for each . Our goal is to find the values of and that minimize this sum.
Given an we can use them to create a line.
Matrix-Vector Formulation:
We can express this problem concisely using matrices and vectors. Let’s define:
- : This matrix holds the values (and a column of 1s so that we can multiply by to get the constant term).
- : This vector holds the unknowns we want to determine.
- : This vector holds the values (the vertical coordinates of the data points).
Now, the product gives us the predicted values for each :
The difference between the actual values and the predicted values represents the errors: . The sum of squared errors is then . Thus, the linear regression problem becomes:
This is a least squares problem.
Explicit Solution for Linear Regression Coefficients
We can solve this least squares problem using the normal equations:
Solving this system of equations for gives us the optimal values for and .
Here is how to derive that…
The explicit solution for the linear regression coefficients and , when the matrix A has linearly independent columns, is given by:
Expanding and gives:
This formula provides a direct way to calculate and from the data points . It’s important to remember that this explicit form relies on the invertibility of , which holds when the columns of (and thus the values) are not all identical. If the values are all the same, the matrix is singular, and we must resort to other methods to find a “best-fit” solution (which, in the case of all being equal, would represent a vertical line).
Another good video…
Practice problem…
Linear Independence
The columns of are linearly independent unless all the values are identical. If all are the same, the columns of are linearly dependent, and is not invertible, meaning there isn’t a unique least squares solution. Geometrically, this corresponds to all data points lying on a vertical line, in which case any vertical line through the data points would minimize the (vertical) least squares error. However, if there is even a slight variation in the values, the columns of become linearly independent, and a unique least squares solution exists. This unique solution corresponds to the familiar “line of best fit.”
Simplification with Orthogonal Columns (Observation 5.3.3)
The columns of are orthogonal if their dot product is zero. For the given design matrix, this means:
So, the condition for orthogonality of the columns of A is that the sum of the independent variable values () is zero.
Simplified Normal Equations
The normal equations are given by . If the columns of are orthogonal (i.e., ), then becomes:
Simplified Solution: Since is diagonal, its inverse is easy to compute:
And the least squares solution becomes:
Fitting a Parabola (Quadratic Regression)
Just as we fit a line to data points using linear regression, we can fit a parabola (or any higher-degree polynomial) using a similar least squares approach. For a parabola, the model is:
where are the data points, and , , and are the coefficients we want to determine.
Matrix-Vector Formulation:
To formulate this as a least squares problem, we define the following matrices and vectors:
-
. Each row corresponds to a data point, and the columns represent the constant, linear, and quadratic terms.
-
: This vector contains the unknown parabola coefficients.
-
: This vector holds the values of the data points.
The least squares problem is then: Note that we can extend this to cubic, quadratic, etc… powers.
Continue here: 18 Orthonormal Bases, Properties of Orthogonal Matrices