Authors comment: As most of you probably realized, NumCS is very chaotic and (imo) badly explained. I’m doing my best to distill the lecture in this note, but it’s very much possible that there are errors. If you find any (or even wanna improve the note), write in the comments or to me :D
An Overview of Numerical Topics
Numerical analysis is the study of algorithms for solving the problems of continuous mathematics. It is not a single, monolithic theory but rather a collection of powerful techniques designed to tackle distinct classes of problems that are often intractable to solve analytically. The expertise of a numerical analyst lies in diagnosing a problem’s structure to choose the most appropriate and efficient method from this toolkit.
Let’s begin with a survey of the landscape.
- Quadrature (Numerical Integration): The fundamental task is to compute the value of a definite integral, $\int_a^b f(x)\,dx$. In practical applications, the domain of integration is rarely a simple one-dimensional interval. It could be a high-dimensional space ($\mathbb{R}^d$ with large $d$), an infinite domain, or a geometrically complex shape, such as those found in computer graphics. The challenge is to devise methods that are both accurate and efficient, as computational cost can grow exponentially with the dimension of the problem.
- Interpolation: Given a set of discrete data points, $(x_i, y_i)$ for $i = 0, \dots, n$, the objective is to construct a function $p(x)$ that passes exactly through these points, satisfying the condition $p(x_i) = y_i$ for all $i$. This is a fundamental tool for creating continuous models from discrete measurements.
- Numerical Linear Algebra: While you have likely studied methods for solving linear systems like $Ax = b$, the numerical perspective focuses on efficiency and stability. For an $n \times n$ matrix, the standard Gaussian elimination algorithm has a computational complexity of $O(n^3)$ operations. For large-scale problems where $n$ can be in the millions, this is computationally infeasible. We must therefore explore more efficient algorithms, whose suitability critically depends on the specific properties (e.g., sparsity, symmetry) of the matrix $A$. This field also covers essential tools like matrix factorizations ($LU$, $QR$, $SVD$) and methods for solving eigenvalue problems ($Ax = \lambda x$).
- Linear and Nonlinear Least Squares: Often, a system $Ax = b$ has no exact solution. The “least squares” approach seeks the best approximate solution by finding the vector $x$ that minimizes the squared Euclidean norm of the residual error:
$$\min_{x} \|Ax - b\|_2^2$$
This concept extends to the nonlinear world, which forms the bedrock of modern machine learning. The goal is to find a set of model parameters $\theta$ that minimizes the difference between a model’s prediction $f(x_i; \theta)$ and observed data $y_i$:
$$\min_{\theta} \sum_{i} \left( f(x_i; \theta) - y_i \right)^2$$
This is fundamentally an optimization problem and a vast, active area of research.
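For the linear case above, here is a minimal sketch of how the minimization looks in NumPy. The matrix and data are made up purely for illustration, and `np.linalg.lstsq` is just one of several ways to solve such a problem:

```python
import numpy as np

# A small overdetermined system: 3 equations, 2 unknowns,
# so Ax = b generally has no exact solution.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

# np.linalg.lstsq finds the x that minimizes ||Ax - b||_2^2.
x, residual, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)
print(x)         # best-fit parameters
print(residual)  # squared norm of the leftover error
```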
The Core Concerns of Numerics
Across these diverse topics, a set of fundamental principles and concerns consistently guides our work.
- Convergent Sequences: We almost always approximate a continuous problem with a discrete one. It is essential that, as we increase the “resolution” of our discretization, the sequence of approximate solutions converges to the true solution.
- Computational Cost: An algorithm’s practical utility is determined by its efficiency. We must analyze the runtime (Rechenzeit) and memory usage (Speicherverbrauch), typically as a function of the problem size $n$.
- Generality: We aim to develop algorithms that apply to broad classes of problems, not just a single, specific function.
- Quantifiability: Our analysis must be rigorous. We need to derive precise mathematical bounds on the error of our approximations and the complexity of our algorithms.
The Reality of Computer Arithmetic
Before exploring advanced algorithms, we must confront a foundational and practical issue: computers do not perform exact arithmetic. Understanding the limitations of computer arithmetic is essential to writing robust numerical code.
Absolute and Relative Error
Given an exact value $x$ and a computed approximation $\tilde{x}$, we define two primary measures of error:
- Absolute Error: $\epsilon_{\text{abs}} = |\tilde{x} - x|$
- Relative Error: $\epsilon_{\text{rel}} = \dfrac{|\tilde{x} - x|}{|x|}$, for $x \neq 0$.
The absolute error tells you the magnitude of the mistake, while the relative error puts that mistake in context by comparing it to the magnitude of the true value.
Example: The Problem with Relative Error Near Zero
Imagine we are measuring a quantity whose true value is $x = 10^{-10}$. An algorithm produces an approximation $\tilde{x} = 2 \cdot 10^{-10}$.
- The absolute error is tiny: $\epsilon_{\text{abs}} = |2 \cdot 10^{-10} - 10^{-10}| = 10^{-10}$.
- The relative error is enormous: $\epsilon_{\text{rel}} = \dfrac{10^{-10}}{10^{-10}} = 1$, which corresponds to a 100% error! This happens because dividing by a very small number amplifies the error measure. It highlights that for a computer, zero is a special number, and values close to it require careful handling.
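A quick check of these definitions in Python; the values mirror the example above:

```python
x_true = 1e-10            # true value
x_approx = 2e-10          # computed approximation

abs_err = abs(x_approx - x_true)      # 1e-10 -- looks harmless
rel_err = abs_err / abs(x_true)       # 1.0 -- a 100% relative error
print(abs_err, rel_err)
```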
Floating-Point Representation
The source of these issues is that computers cannot represent the infinite set of real numbers. They use a finite subset called floating-point numbers.
A floating-point number is stored in a format analogous to scientific notation:
$$x = \pm\, d \cdot B^{E}$$
- Base (B): The number system’s base. For most modern computers, this is binary ($B = 2$). For human-readable examples, we often use decimal ($B = 10$).
- Mantissa (d): A number with a fixed number of digits, $m$, representing the significant digits of the value. It is typically normalized to be in a standard range, like $B^{-1} \le |d| < 1$. For example, in base 10, the number 5280 would be normalized to $0.528 \cdot 10^4$.
- Exponent (E): An integer that scales the mantissa, representing the number’s order of magnitude. It is restricted to a finite range, $E_{\min} \le E \le E_{\max}$.
Example of Floating-Point Representation
Let’s represent the decimal number 6.25 in a simplified binary floating-point format.
- Convert to Binary: $6 = 110_2$ and $0.25 = 0.01_2$. So, $6.25 = 110.01_2$.
- Normalize: We move the binary point so that it’s to the left of the first non-zero digit, adjusting the exponent accordingly. $110.01_2 = 0.11001_2 \cdot 2^3$.
- Store: The computer would store the mantissa $d = 0.11001_2$ and the exponent $E = 3$.
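You can peek at this decomposition directly. As a small sketch (not part of the lecture code), Python’s `math.frexp` returns a mantissa normalized to $[0.5, 1)$ and an integer exponent, which matches the convention used above:

```python
import math

d, E = math.frexp(6.25)
print(d, E)            # 0.78125 3  ->  6.25 = 0.78125 * 2**3
print(d * 2**E)        # 6.25; and 0.78125 is 0.11001 in binary
```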
This finite representation has two critical consequences:
- Rounding Error: Any number that cannot be represented exactly in this format must be rounded to the nearest available floating-point number.
- Non-uniform Spacing: The gap between representable numbers is smaller near zero and grows larger as the magnitude of the numbers increases.
The First Rule of Numerical Computing
Because of rounding errors, you must never test for exact equality with floating-point numbers. A check like
if x == 0.0:
is unreliable and almost always a bug. The correct approach is to check if the number’s magnitude is smaller than a chosen tolerance:

if abs(x) < tol:
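A minimal demonstration of why the exact comparison fails and what the tolerance check looks like; the tolerance value here is just an illustrative choice:

```python
x = 0.1 + 0.2 - 0.3   # mathematically zero, but...
print(x)              # 5.551115123125783e-17, not 0.0
print(x == 0.0)       # False -- the exact test misfires

tol = 1e-12           # an application-dependent choice
print(abs(x) < tol)   # True -- the robust way to ask "is x numerically zero?"
```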
Catastrophic Cancellation (Auslöschung)
In the world of numerical computation, not all errors are created equal. The most insidious and dramatic source of error isn’t a bug in your code or a flaw in the hardware; it’s a mathematical phenomenon called catastrophic cancellation.
At its heart, the problem is this: subtracting two numbers that are nearly equal to each other.
Think about it like this. Imagine you have two very long wooden planks, and you want to find the tiny difference in their lengths. You measure each one with a tape measure that’s accurate to about a millimeter.
- Plank A: `5.000` meters (but it could be `5.000 +/- 0.001`)
- Plank B: `4.998` meters (but it could be `4.998 +/- 0.001`)

The calculated difference is `0.002` meters, or 2 millimeters. But what’s the uncertainty? It’s now roughly `0.002` meters as well! Your result is the same size as your potential error. The “signal” (the true difference) has been drowned out by the “noise” (the measurement uncertainty).
In floating-point arithmetic, the “uncertainty” comes from the limited precision of the mantissa (see above). When you subtract two very close numbers, the leading, most significant digits, the ones we trust, cancel each other out. The result is then constructed from the trailing, least significant digits, the “noise” where rounding errors live. The computer then normalizes this result, promoting the noisy digits to the front, thereby catastrophically amplifying the relative error.
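You can watch this digit loss happen in a couple of lines; this is a toy illustration, not taken from the lecture:

```python
a = 1.0 + 1e-13      # two numbers that agree in their leading digits
b = 1.0

exact = 1e-13                          # what the difference should be
computed = a - b                       # the leading digits cancel
print(computed)                        # 9.992007221626409e-14
print(abs(computed - exact) / exact)   # relative error ~ 8e-4 after a single subtraction
```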
A Classic Case: The Quadratic Formula
Let’s see this in action with an example that every high school student knows: finding the roots of a quadratic equation.
Consider the polynomial $p(x) = x^2 - \left(\gamma + \frac{1}{\gamma}\right)x + 1$, which depends on a parameter $\gamma$. If you solve this algebraically, you’ll find the exact roots are beautifully simple: $x_1 = \gamma$ and $x_2 = \frac{1}{\gamma}$. This gives us a perfect “ground truth” to test our numerical methods against.
The standard recipe for finding roots is the quadratic formula:
$$x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
For our polynomial, we have $a = 1$, $b = -\left(\gamma + \frac{1}{\gamma}\right)$, and $c = 1$. (We reserve $a$, $b$, $c$ for the coefficients in the quadratic formula and use $\gamma$ for our parameter to avoid confusion.)
Now, let’s consider what happens when our parameter $\gamma$ is very large, say $\gamma = 10^8$.
- The term $b = -\left(\gamma + \frac{1}{\gamma}\right)$ becomes a large negative number.
- The term under the square root, the discriminant, is $b^2 - 4ac = \left(\gamma + \frac{1}{\gamma}\right)^2 - 4$. For a large $\gamma$, $b^2$ is huge, so $\sqrt{b^2 - 4ac}$ is a number extremely close to $|b|$.
This is where the trap is set. To find the two roots, we compute:
- $x_1 = \dfrac{-b + \sqrt{b^2 - 4ac}}{2a}$: Here, we are adding two large, positive numbers ($-b$ and the square root). This is numerically stable and works perfectly fine.
- $x_2 = \dfrac{-b - \sqrt{b^2 - 4ac}}{2a}$: Here, we are subtracting two large, nearly identical numbers. This is our catastrophic cancellation.
An Example in Action
Let’s take $\gamma = 10^8$. The true roots are $x_1 = 10^8$ and $x_2 = 10^{-8}$.
Our term $-b = \gamma + \frac{1}{\gamma} = 100000000.00000001$. The term $\sqrt{b^2 - 4}$ will be calculated by the computer as a number incredibly close to $100000000.00000001$.
When we calculate $x_2$, we perform `(a huge number) - (a slightly different huge number)`. In floating-point precision, this might look like:

100000000.00000001 - 100000000.00000000

The result depends entirely on the last, least reliable digits. We lose almost all significant figures of accuracy, and the computed result for $x_2$ will be wildly incorrect.
The plot below shows this failure in practice. For large $\gamma$, the error in the smaller root (computed with subtraction) explodes, while the larger root (computed with addition) remains accurate.
The Fix: A Stable Algorithm via Reformulation
We can’t change the laws of floating-point arithmetic, but we can change our algorithm to avoid the trap. The key is to sidestep the dangerous subtraction.
Remember Vieta’s formulas from algebra? For a quadratic $ax^2 + bx + c$, the product of the roots is $x_1 \cdot x_2 = \frac{c}{a}$. In our case, $a = 1$ and $c = 1$, so we have the simple and powerful relationship:
$$x_1 \cdot x_2 = 1$$
This gives us a brilliant way out.
The Stable Strategy:
- First, calculate the “safe” root using the addition, which we know is stable: $x_1 = \dfrac{-b + \sqrt{b^2 - 4ac}}{2a}$
- Then, use Vieta’s formula to find the second root without any subtraction: $x_2 = \dfrac{c}{a \cdot x_1} = \dfrac{1}{x_1}$
This two-step method completely avoids catastrophic cancellation. Division by a large, stable number is a perfectly safe operation.
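A sketch of both variants side by side; the function names are mine, and the code assumes the polynomial $x^2 - (\gamma + \frac{1}{\gamma})x + 1$ from above:

```python
import numpy as np

def roots_naive(gamma):
    # p(x) = x^2 - (gamma + 1/gamma) x + 1, i.e. a = c = 1
    b = -(gamma + 1.0 / gamma)
    disc = np.sqrt(b * b - 4.0)
    x1 = (-b + disc) / 2.0    # addition of two large positives: stable
    x2 = (-b - disc) / 2.0    # subtraction of nearly equal numbers: cancellation
    return x1, x2

def roots_stable(gamma):
    b = -(gamma + 1.0 / gamma)
    x1 = (-b + np.sqrt(b * b - 4.0)) / 2.0   # the "safe" root
    x2 = 1.0 / x1                            # Vieta: x1 * x2 = c/a = 1
    return x1, x2

gamma = 1e8                    # true roots: 1e8 and 1e-8
print(roots_naive(gamma))      # the small root has lost most of its digits
print(roots_stable(gamma))     # both roots accurate
```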
As the plot below confirms, this reformulated algorithm computes both roots with high accuracy across the entire range of .
The lesson here is: The underlying mathematics is identical, but the computational recipe matters immensely. A simple algebraic reformulation can turn an unstable method that produces garbage into a robust one that delivers the right answer.
Numerical Differentiation: The Finite and the Infinite
How can a computer, a machine that only knows about discrete numbers and finite steps, find the derivative of a function, a concept built on the infinitely small? This is the core challenge of numerical differentiation.
The Obvious Way (And Why It Fails)
Let’s start with the definition of the derivative we all learned in calculus:
$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$
The most straightforward way to turn this into an algorithm is to simply drop the limit and pick a very, very small number for $h$. This gives us the forward difference formula:
$$f'(x) \approx \frac{f(x+h) - f(x)}{h}$$
How good is this approximation? We can use Taylor’s theorem, which tells us how to approximate a function around a point:
$$f(x+h) = f(x) + h f'(x) + \frac{h^2}{2} f''(\xi), \qquad \xi \in (x, x+h)$$
If we rearrange this to solve for our formula, we can see the error we’re making:
$$\frac{f(x+h) - f(x)}{h} = f'(x) + \frac{h}{2} f''(\xi)$$
This is our truncation error. It’s the piece of the infinite Taylor series we “truncated” or cut off. Since it’s proportional to $h$, we say the error is of order $h$, written as $O(h)$. In theory, this is great news! To get a more accurate answer, just make $h$ smaller.
But reality is not so simple. Let’s try it for a simple test function, say $f(x) = e^x$ at $x = 0$ (where we know the exact answer is $f'(0) = 1$).
Look at that table. It’s a disaster!
- For a while, everything works as expected. As we decrease $h$ from $10^{-1}$ down to about $10^{-8}$, the relative error gets smaller and smaller.
- But then, around $h \approx 10^{-8}$, the error suddenly gets worse.
- For any $h$ smaller than that, the error explodes, until at $h = 10^{-16}$, we have a 100% error and our result is pure garbage.
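A short script that reproduces this experiment for $f(x) = e^x$ at $x = 0$; the exact error values depend on the machine, but the pattern is always the same:

```python
import numpy as np

f, x, exact = np.exp, 0.0, 1.0   # f'(0) = 1 for f(x) = e^x

for k in range(1, 17):
    h = 10.0 ** (-k)
    approx = (f(x + h) - f(x)) / h             # forward difference
    rel_err = abs(approx - exact) / abs(exact)
    print(f"h = 1e-{k:02d}   relative error = {rel_err:.2e}")
```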
What went wrong? We’ve run headfirst into our old enemy: catastrophic cancellation. The numerator, $f(x+h) - f(x)$, is the subtraction of two numbers that become nearly identical as $h \to 0$. This wipes out the significant digits, leaving us with rounding noise.
This reveals a fundamental tension in numerical differentiation:
- Truncation Error: The mathematical error from our approximation. It wants a small $h$.
- Rounding Error: The computational error from finite precision. It gets amplified by small $h$ and wants a large $h$.
The total error is a combination of these two, creating a characteristic “V” shape on a log-log plot (see the plot below). The bottom of the “V” is the best we can do, but the method is fundamentally unstable. We need a better idea.
Idea 1: A Clever Escape to the Complex Plane
What if we could get the derivative without subtraction? It sounds impossible, but a beautiful trick exists if we’re allowed to step into the complex numbers. This is the complex step derivative.
Assume our function $f$ is analytic (infinitely differentiable (aka smooth) and representable by its Taylor series, like `exp`, `sin`, `cos`, etc.). Let’s see what happens when we evaluate it at $x + ih$, where $i = \sqrt{-1}$ and $h$ is a small real number.
The Taylor series is:
$$f(x + ih) = f(x) + ih\, f'(x) + \frac{(ih)^2}{2!} f''(x) + \frac{(ih)^3}{3!} f'''(x) + \dots$$
Now let’s simplify the powers of $i$: $i^2 = -1$, $i^3 = -i$, $i^4 = 1$, etc.
Let’s group the terms into a real part (no $i$) and an imaginary part (with $i$):
$$f(x + ih) = \left[ f(x) - \frac{h^2}{2} f''(x) + \dots \right] + i \left[ h f'(x) - \frac{h^3}{6} f'''(x) + \dots \right]$$
Look closely at the imaginary part. It contains the term $h f'(x)$ that we want! We can isolate it by taking the imaginary part of the whole expression:
$$\operatorname{Im}\!\left[ f(x + ih) \right] = h f'(x) - \frac{h^3}{6} f'''(x) + \dots$$
Now, just divide by $h$:
$$\frac{\operatorname{Im}\!\left[ f(x + ih) \right]}{h} = f'(x) - \frac{h^2}{6} f'''(x) + \dots$$
This gives us our formula:
$$f'(x) = \frac{\operatorname{Im}\!\left[ f(x + ih) \right]}{h} + O(h^2)$$
Why the Complex Step is Brilliant
- No Cancellation: The formula is `Im[f(x+ih)] / h`. There is no subtraction of nearly equal numbers. We have completely sidestepped the cause of our previous failure.
- High Accuracy: The truncation error is now $O(h^2)$. This means if you halve $h$, the error doesn’t just get 2x smaller, it gets 4x smaller. It converges much more quickly.
The plots below show the proof. The forward difference (`diffd1`) shows the V-shaped error profile. The complex step derivative (`diffih`) is a straight line of decreasing error until it hits the limit of machine precision. It is vastly more stable and accurate.
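A minimal complex-step sketch in NumPy; the function name and the default step size are my choices, and any analytic function that accepts complex input works:

```python
import numpy as np

def complex_step_derivative(f, x, h=1e-20):
    # No subtraction anywhere, so even an absurdly small h is safe.
    return np.imag(f(x + 1j * h)) / h

# NumPy's exp, sin, cos, ... all accept complex arguments.
print(complex_step_derivative(np.exp, 0.0))   # exact answer: 1.0
print(complex_step_derivative(np.sin, 1.0))   # exact answer: cos(1) = 0.5403...
```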
Idea 2: Getting More from Reality with Richardson Extrapolation
The complex step is amazing, but what if our function or programming language doesn’t support complex numbers (or setting this up is simply too much of a pain…)? There’s another, profoundly powerful idea called Richardson Extrapolation that works entirely in the real domain.
The trick is to start with a better, more symmetric formula: the central difference formula. We get this by combining two Taylor expansions:
$$f(x+h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + \frac{h^3}{6} f'''(x) + \dots$$
$$f(x-h) = f(x) - h f'(x) + \frac{h^2}{2} f''(x) - \frac{h^3}{6} f'''(x) + \dots$$
If we subtract the second equation from the first, the even-powered terms ($f(x)$, $\frac{h^2}{2} f''(x)$, etc.) cancel out perfectly:
$$f(x+h) - f(x-h) = 2h f'(x) + \frac{h^3}{3} f'''(x) + \dots$$
Solving for $f'(x)$ gives us the central difference formula:
$$f'(x) = \frac{f(x+h) - f(x-h)}{2h} - c_1 h^2 - c_2 h^4 - c_3 h^6 - \dots$$
The coefficients $c_k$ in the error expansion are given by the general formula (but just think of them as some constants):
$$c_k = \frac{f^{(2k+1)}(x)}{(2k+1)!}$$
This is already better than the forward difference because its truncation error is $O(h^2)$. But the real magic is that the error expansion contains only even powers of h. This is the key we can exploit.
Let’s call our approximation $D(h) := \frac{f(x+h) - f(x-h)}{2h}$. We have:
$$D(h) = f'(x) + c_1 h^2 + c_2 h^4 + \dots$$
What if we compute the same approximation but with half the step size, $\frac{h}{2}$?
$$D\!\left(\tfrac{h}{2}\right) = f'(x) + c_1 \frac{h^2}{4} + c_2 \frac{h^4}{16} + \dots$$
Now we have a system of two equations. We can play a simple algebraic game to eliminate the biggest error term, $c_1 h^2$. Multiply the second equation by 4 and subtract the first:
$$4 D\!\left(\tfrac{h}{2}\right) - D(h) = 3 f'(x) - \frac{3}{4} c_2 h^4 + \dots$$
Rearranging gives us a brand new, spectacular approximation:
$$D_1(h) := \frac{4 D\!\left(\tfrac{h}{2}\right) - D(h)}{3} = f'(x) - \frac{1}{4} c_2 h^4 + \dots$$
The error of this new formula is $O(h^4)$! We’ve combined two $O(h^2)$ results to get an $O(h^4)$ result. This process is called Richardson Extrapolation. We can repeat it again and again, combining $O(h^4)$ results to get an $O(h^6)$ result, and so on. This is typically organized in a table called the Richardson Schema.
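One way such a schema can be coded is the triangular table below: column 0 holds the central differences $D(h), D(h/2), D(h/4), \dots$, and every further column cancels the next even power of $h$. This is a sketch under the conventions of this section, not the official lecture code:

```python
import numpy as np

def central_diff(f, x, h):
    return (f(x + h) - f(x - h)) / (2.0 * h)

def richardson(f, x, h, levels=4):
    # T[i, k] is the value in row i, column k of the Richardson schema.
    T = np.zeros((levels, levels))
    for i in range(levels):
        T[i, 0] = central_diff(f, x, h / 2**i)
        for k in range(1, i + 1):
            # Eliminate the h^(2k) error term, exactly as in the 4*D(h/2) - D(h) step.
            T[i, k] = (4**k * T[i, k - 1] - T[i - 1, k - 1]) / (4**k - 1)
    return T[-1, -1]

print(central_diff(np.exp, 1.0, 0.1))   # O(h^2): a few correct digits
print(richardson(np.exp, 1.0, 0.1))     # several more correct digits
print(np.exp(1.0))                      # exact derivative of e^x at x = 1
```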
The Power of Extrapolation
Richardson Extrapolation doesn’t eliminate catastrophic cancellation, but it’s so powerful that it lets us achieve very high accuracy using a relatively large value of $h$, where cancellation is not yet a problem. It’s a method for “accelerating convergence”, getting a great answer from mediocre ones.
The plots below show how the central difference (`diffd2`) is already better than the forward difference, but the result after one step of Richardson extrapolation (`diffRichardsonV`) is orders of magnitude more accurate.
Computational Cost
To compare algorithms, we analyze their asymptotic complexity using Big O notation.
Definition: Big O Notation
We say a function $f(n)$ is $O(g(n))$ if there exists a constant $C > 0$ and a value $n_0$ such that for all $n \ge n_0$, we have $|f(n)| \le C \cdot g(n)$. This describes the growth rate of the computational cost for large problem sizes $n$.
Appendix: Computing with Matrices
The building blocks of many numerical algorithms are matrix and vector operations.
- Inner Product (Dot Product): For vectors $x, y \in \mathbb{C}^n$, the inner product is $x^H y = \sum_{i=1}^{n} \bar{x}_i\, y_i$. Note the complex conjugate on the first vector.
- Outer Product: The outer product $x y^H$ of $x \in \mathbb{C}^m$ and $y \in \mathbb{C}^n$ produces an $m \times n$ matrix.
- Matrix Multiplication: The entry $(AB)_{ij}$ is the inner product of the $i$-th row of $A$ with the $j$-th column of $B$.
In modern numerical environments like Python with NumPy, these operations are highly optimized. It is crucial to use these built-in, vectorized functions rather than writing explicit loops.
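A few of the corresponding NumPy calls, purely for illustration (sizes and data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x, y = rng.standard_normal(n), rng.standard_normal(n)
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))

inner = np.vdot(x, y)     # inner product; conjugates the first argument
outer = np.outer(x, y)    # n x n outer product
C = A @ B                 # matrix-matrix product: O(n^3), but heavily optimized

# The same inner product as an explicit Python loop is far slower --
# always prefer the vectorized routines.
inner_loop = sum(x[i] * y[i] for i in range(n))
```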