Lecture from: 06.05.2024 | Video: Video ETHZ
Review: What We Know About Convex Functions and Their Derivatives
Let’s quickly recap the highlights from our last lecture on convexity:
- The Core Idea: A function is convex if its graph “holds water” - meaning any line segment connecting two points on the graph (a secant line) lies above or on the graph itself.
- First Derivative Test: If our function $f$ (defined on an interval with more than one point) is differentiable, then $f$ is convex if and only if its derivative $f'$ is a monotonically increasing function. Think of it this way: for the graph to bend upwards, its slope must be increasing (or at least not decreasing).
- Second Derivative Test: If $f$ is twice differentiable, things get even simpler. The function is convex if and only if its second derivative is non-negative ($f'' \geq 0$) throughout the interval. Why? Because $f''$ is the derivative of $f'$. So, $f'' \geq 0$ means $f'$ is monotonically increasing, which, by the first test, means $f$ is convex.
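To make the recap tangible, here is a minimal sympy sketch of the second-derivative test (the function $e^x + x^2$ is just an illustrative choice, not one from the lecture):

```python
# A minimal sympy sketch of the second-derivative test (illustrative only):
# pick a sample function, compute f'' symbolically, and confirm it is positive,
# which by the test above means the function is convex on all of R.
import sympy as sp

x = sp.symbols('x', real=True)
f = sp.exp(x) + x**2            # illustrative example function

f2 = sp.diff(f, x, 2)           # f''(x) = exp(x) + 2
print(f2)                       # exp(x) + 2
print(f2.is_positive)           # True: exp(x) > 0 and 2 > 0, so f'' > 0, hence f is convex
```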
Higher Order Derivatives
We’ve talked a lot about the first derivative (slope) and the second derivative (related to concavity/convexity). But why stop there? We can often keep differentiating.
Let $D \subseteq \mathbb{R}$ be our domain, and let’s assume it’s “nice” enough that every point of $D$ is an accumulation point of $D$. (An interval with more than one point is a perfect example).
As a baseline, we define the 0-th derivative of $f$ to be just the function itself: $f^{(0)} := f$.
Definition: What “n-times Differentiable” Means
- Getting to the n-th Derivative: For a positive integer $n$, we say a function $f$ is $n$-times differentiable in $D$ if we can successfully differentiate it $n-1$ times to get $f^{(n-1)}$, and then this $(n-1)$-th derivative is itself differentiable. The $n$-th derivative is then defined as the derivative of the $(n-1)$-th derivative: $f^{(n)} := \left( f^{(n-1)} \right)'$.
Our Shorthand Notation:
- $f^{(1)} = f'$ (the familiar first derivative)
- $f^{(2)} = f''$ (the second derivative)
- $f^{(3)} = f'''$ (the third derivative), and so on.
- Adding Continuity: n-times Continuously Differentiable: A function $f$ is $n$-times continuously differentiable in $D$ if it’s $n$-times differentiable, and its $n$-th derivative, $f^{(n)}$, is a continuous function in $D$. These functions are often denoted $C^n(D)$.
- The Ultimate Smoothness: “Smooth” Functions: A function $f$ is called smooth (or “glatt” in German) in $D$ if it is $n$-times differentiable for all positive integers $n$. You can keep differentiating it forever! This is also sometimes called “infinitely differentiable” or “$\infty$-times differentiable,” and the set of such functions is often denoted $C^\infty(D)$.
An Important Link: Higher Differentiability and Continuity of Lower Derivatives
If a function $f$ is $n$-times differentiable, it implies that its $(n-1)$-th derivative $f^{(n-1)}$ is continuous (in fact, $f$ is $(n-1)$-times continuously differentiable, i.e. $f \in C^{n-1}(D)$). Why? Because for $f^{(n-1)}$ to be differentiable (which it must be for $f^{(n)}$ to exist), $f^{(n-1)}$ itself must be continuous. (Remember our corollary: differentiability at a point implies continuity at that point).
Theorem: Rules for Higher Derivatives (They Behave Nicely!)
Let our domain $D \subseteq \mathbb{R}$ be suitable (every point is an accumulation point), and let $n \in \mathbb{N}$. If $f, g: D \to \mathbb{R}$ are both $n$-times differentiable in $D$, then:
- Sum Rule Still Holds: The sum $f + g$ is also $n$-times differentiable, and the $n$-th derivative of the sum is the sum of the $n$-th derivatives: $(f + g)^{(n)} = f^{(n)} + g^{(n)}$. (This follows quite naturally by applying the basic sum rule repeatedly).
- Generalized Product Rule (also known as Leibniz’s Formula): The product $f \cdot g$ is also $n$-times differentiable, and its $n$-th derivative is given by a formula that looks remarkably like the binomial expansion: $$(f \cdot g)^{(n)} = \sum_{k=0}^{n} \binom{n}{k} f^{(k)} \, g^{(n-k)}$$ Here, $\binom{n}{k}$ are the binomial coefficients, $f^{(0)}$ means $f$ itself, and $g^{(0)}$ means $g$ itself.
Analogy to Binomial Theorem: Recall $(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}$. Leibniz’s formula for derivatives has the same structure, which is a neat connection!
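As a quick sanity check of Leibniz’s formula, here is a small sympy sketch; the choices $f(x) = e^x$, $g(x) = \sin(x)$ and $n = 4$ are mine, purely for illustration:

```python
# Hedged sketch: verify Leibniz's formula (f*g)^(n) = sum_k C(n,k) * f^(k) * g^(n-k)
# for the illustrative choice f(x) = exp(x), g(x) = sin(x) and n = 4.
import sympy as sp

x = sp.symbols('x')
f, g, n = sp.exp(x), sp.sin(x), 4

lhs = sp.diff(f * g, x, n)      # differentiate the product n times directly
rhs = sum(sp.binomial(n, k) * sp.diff(f, x, k) * sp.diff(g, x, n - k)
          for k in range(n + 1))

print(sp.simplify(lhs - rhs))   # 0, so both sides agree
```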
Examples: Functions That Are Infinitely Smooth
- The Champions of Smoothness: $e^x$, $\sin(x)$, $\cos(x)$. These functions are all smooth on the entire real line $\mathbb{R}$.
- For $e^x$, its derivative is always $e^x$. You can differentiate it as many times as you like, and it stays $e^x$.
- For $\sin$ and $\cos$, their derivatives cycle through $\cos$, $-\sin$, $-\cos$ and back to $\sin$. Since each of these is differentiable, the cycle continues indefinitely.
- Polynomials: Smooth Operators Any polynomial function is smooth on $\mathbb{R}$.
- When you differentiate a polynomial, you get another polynomial (of one degree less). Eventually, after enough differentiations, you’ll get a constant, and then zero. Since polynomials and constants are always differentiable, all polynomials are smooth.
- The Natural Logarithm: Smooth on its Domain The function $\ln(x)$ is smooth on its domain $(0, \infty)$.
- Let’s look at its derivatives: $\ln'(x) = \frac{1}{x}$, $\ln''(x) = -\frac{1}{x^2}$, $\ln'''(x) = \frac{2}{x^3}$, ... A pattern emerges! For $n \geq 1$: $$\ln^{(n)}(x) = (-1)^{n-1} \cdot \frac{(n-1)!}{x^n}$$ (You can prove this pattern rigorously using mathematical induction). Since $\frac{(n-1)!}{x^n}$ is well-defined and differentiable for any $n$ as long as $x > 0$, it follows that $\ln$ is smooth on $(0, \infty)$.
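If you want to see the pattern confirmed without doing the induction by hand, a short sympy sketch (my own check, not part of the lecture) compares the first few derivatives with the claimed formula:

```python
# Quick sympy check (illustrative) of the pattern
# ln^(n)(x) = (-1)^(n-1) * (n-1)! / x**n for the first few n.
import sympy as sp

x = sp.symbols('x', positive=True)

for n in range(1, 6):
    lhs = sp.diff(sp.log(x), x, n)
    rhs = (-1)**(n - 1) * sp.factorial(n - 1) / x**n
    print(n, sp.simplify(lhs - rhs))   # prints 0 for every n
```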
Clicker Question Flashback: If $f(x) = \arctan(x)$, what is $f'''(0)$?
We know $f'(x) = \frac{1}{1 + x^2}$. We could keep differentiating this using the quotient rule (or chain rule if we write it as $(1 + x^2)^{-1}$) to find $f''$ and then $f'''$. It gets a bit messy.
We’ll see a more elegant way to solve this later using power series! (For now, we’ll be “lazy” and defer the brute-force calculation. The answer will be revealed!)
Theorem: Higher Derivatives and Algebraic Operations
Let $D \subseteq \mathbb{R}$ be a suitable domain, and $n \in \mathbb{N}$.
- Quotient Rule for Higher Derivatives: If $f, g: D \to \mathbb{R}$ are $n$-times differentiable in $D$, and importantly, $g(x) \neq 0$ for all $x \in D$, then their quotient $\frac{f}{g}$ is also $n$-times differentiable in $D$. (The general formula for $\left( \frac{f}{g} \right)^{(n)}$ is very complicated, so we usually don’t write it out explicitly beyond the first derivative).
- Chain Rule for Higher Derivatives (Faà di Bruno’s Formula): If $D, E \subseteq \mathbb{R}$ are suitable domains, and $f: D \to E$ and $g: E \to \mathbb{R}$ are both $n$-times differentiable functions, then their composition $g \circ f$ is also $n$-times differentiable in $D$. The formulas get complicated quickly:
- n=1 (The familiar Chain Rule): $(g \circ f)'(x) = g'(f(x)) \cdot f'(x)$
- n=2 (Differentiating the first derivative): $(g \circ f)''(x) = \left( g'(f(x)) \cdot f'(x) \right)'$. Using the product rule: $(g \circ f)''(x) = \left( g'(f(x)) \right)' \cdot f'(x) + g'(f(x)) \cdot f''(x)$. Using the chain rule for $\left( g'(f(x)) \right)'$: $\left( g'(f(x)) \right)' = g''(f(x)) \cdot f'(x)$. So, $(g \circ f)''(x) = g''(f(x)) \cdot \left( f'(x) \right)^2 + g'(f(x)) \cdot f''(x)$.
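Here is a small sympy sketch checking the $n = 2$ formula on a concrete pair of functions (the choices $f(x) = \sin(x)$ and $g(y) = e^{y^2}$ are illustrative, not from the lecture):

```python
# Sketch verifying the n = 2 composition formula
#   (g o f)''(x) = g''(f(x)) * f'(x)**2 + g'(f(x)) * f''(x)
# for the illustrative choice f(x) = sin(x), g(y) = exp(y**2).
import sympy as sp

x, y = sp.symbols('x y')
f = sp.sin(x)
g = sp.exp(y**2)

comp = g.subs(y, f)                       # (g o f)(x)
lhs = sp.diff(comp, x, 2)                 # differentiate the composition twice directly

g1 = sp.diff(g, y).subs(y, f)             # g'(f(x))
g2 = sp.diff(g, y, 2).subs(y, f)          # g''(f(x))
rhs = g2 * sp.diff(f, x)**2 + g1 * sp.diff(f, x, 2)

print(sp.simplify(lhs - rhs))             # 0
```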
A General Principle: The properties of being “$n$-times differentiable” usually carry over if you replace it with “smooth”. If the constituent functions are smooth, their sums, products, quotients (where defined), and compositions are also smooth.
Examples: Smoothness from Combining Functions
- Tangent Function is Smooth (on its domain): $\tan(x)$ is smooth wherever it’s defined, which is $\mathbb{R} \setminus \{\frac{\pi}{2} + k\pi : k \in \mathbb{Z}\}$.
- Why? $\sin$ and $\cos$ are smooth everywhere. Their quotient $\tan = \frac{\sin}{\cos}$ is smooth as long as the denominator $\cos$ is not zero.
- Power Functions are Smooth (for $x > 0$): For any real number $\alpha$, the function $x \mapsto x^\alpha$ is smooth on the interval $(0, \infty)$.
- Why? We write $x^\alpha = e^{\alpha \ln(x)}$. The function $\ln(x)$ is smooth on $(0, \infty)$. Multiplying by a constant (i.e., $y \mapsto \alpha y$) is a smooth operation. The exponential function is smooth on $\mathbb{R}$. Since $x^\alpha = e^{\alpha \ln(x)}$ is a composition of these smooth functions, it is itself smooth on $(0, \infty)$ by the (generalized) chain rule.
- Inverse Trigonometric Functions are Smooth (mostly):
- $\arcsin$ and $\arccos$ are smooth on the open interval $(-1, 1)$.
- $\arctan$ is smooth on the entire real line $\mathbb{R}$.
- Let’s see why for $\arcsin$: We found $\arcsin'(x) = \frac{1}{\sqrt{1 - x^2}}$.
The function $x \mapsto 1 - x^2$ is a polynomial, so it’s smooth everywhere. For $x \in (-1, 1)$, the value $1 - x^2$ is in $(0, 1]$.
The function $y \mapsto \frac{1}{\sqrt{y}}$ (which is $y^{-1/2}$) is smooth on $(0, \infty)$ (from our previous example of $y^\alpha$ with $\alpha = -\frac{1}{2}$).
Since $\arcsin'$ is a composition of these smooth functions, it is smooth on $(-1, 1)$.
If the derivative of a function ($f'$) is smooth, then the original function ($f$) must also be smooth. (Think of it this way: if you can differentiate $f'$ infinitely many times, you could certainly differentiate $f$ one more time than that!)
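A quick sympy check of the derivatives used above (purely illustrative): it confirms $\arcsin'(x) = \frac{1}{\sqrt{1-x^2}}$ and $\arctan'(x) = \frac{1}{1+x^2}$, and shows that higher derivatives of $\arcsin$ keep existing on $(-1, 1)$:

```python
# Small sympy check (illustrative) of the derivatives used above, plus a few
# higher derivatives of arcsin to see that differentiation never breaks down on (-1, 1).
import sympy as sp

x = sp.symbols('x')

print(sp.simplify(sp.diff(sp.asin(x), x) - 1 / sp.sqrt(1 - x**2)))   # 0
print(sp.simplify(sp.diff(sp.atan(x), x) - 1 / (1 + x**2)))          # 0

for n in range(2, 5):
    print(n, sp.simplify(sp.diff(sp.asin(x), x, n)))  # defined wherever 1 - x**2 > 0
```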
Power Series and Taylor Approximation: The Link Between Smoothness and Polynomials
What’s the Big Idea Here? Imagine you have a really smooth, curvy function. We already know that special types of infinite sums called “power series” can create such smooth functions. Now, we’re asking the reverse: can we take any smooth function and represent it or approximate it using a polynomial (or an infinitely long polynomial, which is a power series)? This section explores that connection.
This section bridges two important ideas:
- We know that power series define smooth functions (infinitely differentiable functions) inside their interval of convergence. Think of them as “infinitely long polynomials” that behave very nicely.
- We want to see how differentiable (especially smooth) functions can be approximated by polynomials. This naturally leads to the concept of Taylor series, where we use the function’s derivatives at a single point to build these polynomial approximations.
Clicker Question Analysis (A Look Ahead): Consider $f_n(x) = \sqrt{x^2 + \frac{1}{n}}$ for $n \in \mathbb{N}$ and $f(x) = |x|$.
Hold On, A Word of Caution! This example is a bit of a “heads-up.” We’re looking at a sequence of perfectly smooth functions (the $f_n$) that get closer and closer to a function ($f(x) = |x|$) that isn’t smooth everywhere (it has a sharp corner at $x = 0$). This tells us that just because smooth things approach a limit doesn’t mean the limit itself will be smooth. Something more is needed!
- Convergence behavior: Let’s look at the difference: $f_n(x) - |x| = \sqrt{x^2 + \frac{1}{n}} - \sqrt{x^2}$. We can rationalize this (multiply by the conjugate over itself, a common algebraic trick to simplify expressions involving square roots): $f_n(x) - |x| = \frac{1/n}{\sqrt{x^2 + \frac{1}{n}} + |x|}$. The denominator is always at least $\frac{1}{\sqrt{n}}$ (this minimum occurs when $x = 0$). So, $0 \leq f_n(x) - |x| \leq \frac{1/n}{1/\sqrt{n}} = \frac{1}{\sqrt{n}}$.
Since $\frac{1}{\sqrt{n}} \to 0$ as $n \to \infty$, and this bound does not depend on $x$, the sequence $(f_n)$ converges uniformly to $f(x) = |x|$ on $\mathbb{R}$. Uniform convergence means that all parts of the function approach $f$ at roughly the same rate.
- Differentiability contrast: Each function $f_n$ is smooth for all $x \in \mathbb{R}$. Why? Because the term inside the square root, $x^2 + \frac{1}{n}$, is always strictly positive (since $x^2 \geq 0$ and $\frac{1}{n} > 0$). The square root function is smooth as long as its argument is positive. However, the limit function $f(x) = |x|$ is famously not differentiable at $x = 0$. It has a sharp point there, and a derivative needs a well-defined, unique tangent line.
The punchline of this clicker question: Uniform convergence of a sequence of differentiable functions is not enough to guarantee that the limit function is differentiable. We need something more – specifically, we need the sequence of derivatives to also converge nicely (uniformly, in fact).
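A small numerical illustration of this punchline, assuming the sequence $f_n(x) = \sqrt{x^2 + \frac{1}{n}}$ from the clicker question: the sup-distance to $|x|$ shrinks like $\frac{1}{\sqrt{n}}$, yet the slopes near $0$ never settle on a single value:

```python
# Numerical illustration, assuming f_n(x) = sqrt(x**2 + 1/n) as above:
# the sup-distance to |x| is exactly 1/sqrt(n) (attained at x = 0), so f_n -> |x|
# uniformly, yet the slopes near 0 keep sweeping from about -1 to about +1.
import numpy as np

xs = np.linspace(-1.0, 1.0, 20001)

for n in (10, 100, 1000, 10000):
    fn = np.sqrt(xs**2 + 1.0 / n)
    sup_dist = np.max(np.abs(fn - np.abs(xs)))
    print(f"n={n:6d}  sup distance = {sup_dist:.5f}  vs  1/sqrt(n) = {1/np.sqrt(n):.5f}")

n = 10000
for x0 in (-0.05, -0.01, 0.01, 0.05):
    slope = x0 / np.sqrt(x0**2 + 1.0 / n)      # f_n'(x0)
    print(f"f_n'({x0:+.2f}) = {slope:+.3f}")   # no single tangent slope emerges at 0
```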
Theorem: Interchanging Limits and Differentiation
When Can You Swap ‘Limit’ and ‘Derivative’? This is a crucial question in calculus. Imagine you have a sequence of functions $f_n$ that are morphing into a final function $f$. Can you find the slope (derivative) of $f$ by first finding the slopes of all the $f_n$’s and then seeing what those slopes approach? This theorem says “yes, sometimes!” but lays down the specific conditions needed.
Let $I \subseteq \mathbb{R}$ be an interval with more than one point.
Consider a sequence of functions $f_n: I \to \mathbb{R}$, where each $f_n$ is continuously differentiable (meaning $f_n'$ exists and is continuous).
Suppose two conditions hold:
- The sequence of functions $(f_n)$ converges pointwise on $I$ to some function $f$. That is, for each specific $x \in I$, $\lim_{n \to \infty} f_n(x) = f(x)$. (The functions settle down at each point.)
- The sequence of derivatives $(f_n')$ converges uniformly on $I$ to some function $g$. That is, $f_n'(x) \to g(x)$ uniformly for $x \in I$. (The slopes of the functions settle down, and they do so together, across the whole interval.)
Then, the limit function $f$ is differentiable on $I$, and its derivative is exactly $g$. In other words: $$\left( \lim_{n \to \infty} f_n \right)' = \lim_{n \to \infty} f_n'$$ This theorem is a cornerstone because it tells us precisely when we can swap the order of taking a limit and performing differentiation.
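To see the theorem’s hypotheses in action, here is a numpy sketch with an example sequence of my own choosing (not from the lecture): $f_n(x) = x^2 + \frac{\sin(nx)}{n^2}$, whose derivatives $f_n'(x) = 2x + \frac{\cos(nx)}{n}$ converge uniformly to $2x$:

```python
# Illustrative numpy sketch (example sequence of my own choosing):
# f_n(x) = x**2 + sin(n*x)/n**2 converges uniformly to x**2 on [-2, 2], and
# f_n'(x) = 2*x + cos(n*x)/n converges uniformly to 2*x, so the theorem applies
# and (lim f_n)' = lim f_n' = 2*x.
import numpy as np

xs = np.linspace(-2.0, 2.0, 4001)

for n in (1, 10, 100, 1000):
    fn = xs**2 + np.sin(n * xs) / n**2
    fn_prime = 2 * xs + np.cos(n * xs) / n
    print(f"n={n:4d}  sup dist of f_n to x^2: {np.max(np.abs(fn - xs**2)):.2e}"
          f"  sup dist of f_n' to 2x: {np.max(np.abs(fn_prime - 2 * xs)):.2e}")
# Both sup-distances shrink to 0: exactly the (uniform) hypotheses needed to swap
# the limit and the derivative.
```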
Proof Idea (Connecting to MVT and Continuity)
How Does This Swapping Magic Work? (The Intuition) The proof relies on a few key ideas. First, if the derivatives $f_n'$ are all continuous and they converge uniformly to $g$, then $g$ itself must be continuous (a nice property of uniform convergence). To show $f'(x_0) = g(x_0)$, we look at the definition of the derivative of $f$ at $x_0$: $\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}$. The core idea is that because $f_n(x)$ and $f_n(x_0)$ are close to $f(x)$ and $f(x_0)$ (due to pointwise convergence), the fraction $\frac{f(x) - f(x_0)}{x - x_0}$ should be close to $\frac{f_n(x) - f_n(x_0)}{x - x_0}$. The Mean Value Theorem (MVT) steps in here: it says that $\frac{f_n(x) - f_n(x_0)}{x - x_0}$ is exactly equal to $f_n'(\xi)$ for some $\xi$ between $x_0$ and $x$. Now, if $f_n' \to g$ uniformly, then $f_n'(\xi)$ should be close to $g(\xi)$. And since $g$ is continuous, $g(\xi)$ is close to $g(x_0)$ when $x$ (and thus $\xi$) is close to $x_0$. Stringing these “close to” ideas together suggests that the derivative of $f$ at $x_0$ is indeed $g(x_0)$.
Since each $f_n'$ is continuous and $f_n' \to g$ uniformly, the limit function $g$ must be continuous on $I$ (this is a standard result, cited here as Theorem 3.7.4).
We want to show that for any $x_0 \in I$, $f'(x_0) = g(x_0)$.
For $x$ close to $x_0$ and $n$ large: The difference quotient $\frac{f(x) - f(x_0)}{x - x_0}$ (the slope of the secant line for $f$) should be close to $\frac{f_n(x) - f_n(x_0)}{x - x_0}$ (the slope of the secant line for $f_n$).
By the Mean Value Theorem applied to $f_n$ on an interval containing $x_0$ and $x$, there is some $\xi$ between $x_0$ and $x$ such that $\frac{f_n(x) - f_n(x_0)}{x - x_0} = f_n'(\xi)$.
As $x \to x_0$, $\xi$ (which is squeezed between $x_0$ and $x$) also approaches $x_0$. Because $f_n' \to g$ uniformly, $f_n'(\xi)$ is close to $g(\xi)$ for large $n$. Because $g$ is continuous at $x_0$, $g(\xi)$ is close to $g(x_0)$ when $\xi$ is close to $x_0$.
Putting it all together: $\frac{f(x) - f(x_0)}{x - x_0} \approx \frac{f_n(x) - f_n(x_0)}{x - x_0} = f_n'(\xi) \approx g(\xi) \approx g(x_0)$. This makes it plausible that $f'(x_0) = g(x_0)$.
More Formal Sketch of the Proof
Making the Intuition Rigorous (The Epsilon-Delta Dance) The “proof idea” gave us a good gut feeling. Now, we make it mathematically solid using precise definitions of limits (epsilons and deltas). The goal is to show that the difference between the actual slope of $f$ at $x_0$ and the proposed slope $g(x_0)$ can be made arbitrarily small.
Let $\varepsilon > 0$. We need to show that $\left| \frac{f(x) - f(x_0)}{x - x_0} - g(x_0) \right|$ can be made small by choosing $x$ sufficiently close to $x_0$.
- Since $g$ is continuous at $x_0$: for our chosen $\varepsilon$, there exists a $\delta > 0$ such that if $|y - x_0| < \delta$, then $|g(y) - g(x_0)| < \varepsilon$. Since $\xi_n$ will be between $x_0$ and $x$, if $|x - x_0| < \delta$, then $|\xi_n - x_0| < \delta$, so $|g(\xi_n) - g(x_0)| < \varepsilon$. (Statement 1)
- Since $f_n' \to g$ uniformly: for our chosen $\varepsilon$, there exists an integer $N$ such that if $n \geq N$, then $|f_n'(y) - g(y)| < \varepsilon$ for all $y \in I$. (Statement 2)
Now, pick an $x \in I$ with $0 < |x - x_0| < \delta$.
Apply the Mean Value Theorem to $f_n$ on the interval between $x_0$ and $x$: there exists $\xi_n$ between $x_0$ and $x$ such that $\frac{f_n(x) - f_n(x_0)}{x - x_0} = f_n'(\xi_n)$. Note that $\xi_n$ is automatically within the $\delta$-neighborhood of $x_0$ because $x$ is.
For $n \geq N$: We can write: $\frac{f_n(x) - f_n(x_0)}{x - x_0} - g(x_0) = f_n'(\xi_n) - g(x_0) = \left( f_n'(\xi_n) - g(\xi_n) \right) + \left( g(\xi_n) - g(x_0) \right)$.
Using the triangle inequality ($|a + b| \leq |a| + |b|$, together with Statements 1 and 2): $\left| \frac{f_n(x) - f_n(x_0)}{x - x_0} - g(x_0) \right| \leq |f_n'(\xi_n) - g(\xi_n)| + |g(\xi_n) - g(x_0)| < \varepsilon + \varepsilon = 2\varepsilon$. So, for $n \geq N$, we have $\left| \frac{f_n(x) - f_n(x_0)}{x - x_0} - g(x_0) \right| < 2\varepsilon$.
Now, we take the limit as $n \to \infty$. Since $f_n(x) \to f(x)$ for each $x$ (pointwise convergence), the term $\frac{f_n(x) - f_n(x_0)}{x - x_0}$ converges to $\frac{f(x) - f(x_0)}{x - x_0}$.
Thus, taking the limit of the inequality (the $2\varepsilon$ part is independent of $n$): $\left| \frac{f(x) - f(x_0)}{x - x_0} - g(x_0) \right| \leq 2\varepsilon$.
Since this holds for any $x$ in the punctured $\delta$-neighborhood of $x_0$ (meaning $|x - x_0| < \delta$ but $x \neq x_0$), and $\varepsilon$ was an arbitrary positive number, this means that $\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = g(x_0)$ by definition of the limit.
Thus, $f'(x_0) = g(x_0)$.
Application: Differentiating Power Series Term by Term
Power Series Good news! The theorem we just proved about swapping limits and derivatives is exactly what we need to justify differentiating power series (those “infinite polynomials”) term by term. It turns out power series are well-behaved enough to meet the theorem’s conditions within their comfort zone (interval of convergence).
This theorem is the backbone for why we can differentiate power series term by term, a technique that feels as natural as differentiating ordinary polynomials.
Theorem: Term-by-Term Differentiation of Power Series
Official Rule: Differentiate Piece by Piece! This theorem formally states that if you have a power series that defines a function $f$, you can find the derivative $f'$ by simply differentiating each term of the series individually. And importantly, the new “derivative series” will work (converge) in the same $x$-range as the original series.
Let $f(x) = \sum_{k=0}^{\infty} a_k (x - x_0)^k$ be a power series with a radius of convergence $\rho > 0$. (This defines the “comfort zone” where the series behaves nicely.)
Then the function $f$ is differentiable within its interval of convergence $(x_0 - \rho, x_0 + \rho)$.
Its derivative is found by differentiating term by term: $$f'(x) = \sum_{k=1}^{\infty} k \, a_k (x - x_0)^{k-1}$$ Crucially, this new power series (the “differentiated series”) has the same radius of convergence $\rho$ as the original series. This is a powerful result!
Why this works (linking to the previous theorem):
Connecting the Dots: Why Term-by-Term is Okay Why can we do this? Because on any closed subinterval inside the interval of convergence (say, from $x_0 - r$ to $x_0 + r$ where $0 < r < \rho$), a power series converges uniformly. Moreover, the series of its derivatives also converges uniformly on that same subinterval. These are precisely the conditions our “Interchanging Limits and Differentiation” theorem needs! The partial sums of the power series are our $f_n$’s.
On any closed subinterval $[x_0 - r, x_0 + r]$ where $0 < r < \rho$:
- The original power series (viewed as the limit of its sequence of partial sums $f_N(x) = \sum_{k=0}^{N} a_k (x - x_0)^k$) converges uniformly to $f$.
- The series of derivatives (i.e., $\sum_{k=1}^{\infty} k \, a_k (x - x_0)^{k-1}$) is also a power series. It can be shown (often using the same methods as finding the original radius of convergence, like the ratio test for its coefficients) that it has the same radius of convergence $\rho$. Thus, its sequence of partial sums converges uniformly to some function, let’s call it $g$, on $[x_0 - r, x_0 + r]$.
The conditions of the “Interchanging Limits and Differentiation” Theorem are met on these closed subintervals (with $f_N$ being the partial sums of the original series, and $f_N'$ being the partial sums of the differentiated series). So, we can conclude that $f' = g$, which is the sum of the term-by-term differentiated series.
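A concrete sketch of this, using the geometric series as an illustrative example: the term-by-term derivative of its partial sums approaches $\left( \frac{1}{1-x} \right)' = \frac{1}{(1-x)^2}$ inside $(-1, 1)$:

```python
# Sketch with the geometric series as an illustrative example: the partial sums
# f_N(x) = sum_{k=0}^{N} x**k converge to 1/(1-x) on |x| < 1, and their term-by-term
# derivatives converge to (1/(1-x))' = 1/(1-x)**2 on the same interval.
import sympy as sp

x = sp.symbols('x')
N = 30
partial = sum(x**k for k in range(N + 1))        # f_N(x)
partial_diff = sp.diff(partial, x)               # term-by-term derivative of f_N

target = 1 / (1 - x)**2                          # derivative of the limit function
for x0 in (sp.Rational(1, 2), sp.Rational(-3, 4)):
    err = abs(partial_diff.subs(x, x0) - target.subs(x, x0))
    print(x0, sp.N(err))                         # small, and it shrinks further as N grows
```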
Corollary: Power Series Define Smooth Functions, and Taylor Coefficients
Power Series = Super Smooth Functions! And a Bonus: Finding Coefficients! This is a fantastic consequence. If a function can be written as a power series:
- It’s not just differentiable once, but infinitely many times (it’s “smooth”).
- We can keep differentiating term-by-term.
- Most excitingly, if we plug the center point $x_0$ into the function and its derivatives, we can figure out exactly what the coefficients of the original power series must be! They are the famous Taylor coefficients.
If a function $f$ is given by a power series $f(x) = \sum_{k=0}^{\infty} a_k (x - x_0)^k$ with radius of convergence $\rho > 0$, then:
- $f$ is smooth (infinitely differentiable) in the interval $(x_0 - \rho, x_0 + \rho)$. Intuition: Differentiating a polynomial gives another polynomial. Since a power series is like an “infinite polynomial” and term-by-term differentiation yields another power series with the same radius of convergence, we can repeat this process indefinitely.
- Its $n$-th derivative can be found by differentiating term-by-term $n$ times: $$f^{(n)}(x) = \sum_{k=n}^{\infty} k (k-1) \cdots (k - n + 1) \, a_k (x - x_0)^{k-n}$$ (Note: the sum starts at $k = n$ because terms with $k < n$ will become zero after $n$ differentiations.)
- A very important consequence: If we evaluate the $n$-th derivative at $x = x_0$, all terms in the sum vanish except for the very first one (where $k = n$): When $k = n$, the term is $n (n-1) \cdots 1 \cdot a_n \cdot (x_0 - x_0)^0 = n! \, a_n$. (Note that $0^0 = 1$ here; it might be controversial, but the internet and math-stackexchange say so…)
So, $f^{(n)}(x_0) = n! \, a_n$. This gives us a remarkable way to find the coefficients of a power series if we know the function it represents and can calculate its derivatives at $x_0$: $$a_n = \frac{f^{(n)}(x_0)}{n!}$$
These are the Taylor coefficients. This formula is a cornerstone of Taylor series.
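Here is a short sympy sketch of the formula $a_n = \frac{f^{(n)}(x_0)}{n!}$, using $f(x) = \sin(x)$ and $x_0 = 0$ as an illustrative choice and comparing against sympy’s own series expansion:

```python
# Short sympy sketch of a_n = f^(n)(0)/n! for the illustrative choice f(x) = sin(x),
# compared against sympy's own series expansion around 0.
import sympy as sp

x = sp.symbols('x')
f = sp.sin(x)

coeffs = [sp.diff(f, x, n).subs(x, 0) / sp.factorial(n) for n in range(8)]
print(coeffs)                    # [0, 1, 0, -1/6, 0, 1/120, 0, -1/5040]
print(sp.series(f, x, 0, 8))     # x - x**3/6 + x**5/120 - x**7/5040 + O(x**8)
```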
Example
Let’s See It in Action! Consider the series for $-\ln(1-x)$. If we differentiate it term by term, we should get the series for its derivative, $\frac{1}{1-x}$. Let’s check!
Consider $f(x) = \sum_{n=1}^{\infty} \frac{x^n}{n}$.
This series has a radius of convergence $\rho = 1$ (you can check this with the ratio test: $\left| \frac{x^{n+1}/(n+1)}{x^n/n} \right| = \frac{n}{n+1} |x| \to |x|$, so it converges for $|x| < 1$).
For $|x| < 1$: $f'(x) = \sum_{n=1}^{\infty} \frac{n \, x^{n-1}}{n} = \sum_{n=1}^{\infty} x^{n-1}$.
Let $m = n - 1$. As $n$ goes from $1$ to $\infty$, $m$ goes from $0$ to $\infty$. So, $f'(x) = \sum_{m=0}^{\infty} x^m$.
This is the geometric series, which sums to $\frac{1}{1-x}$ for $|x| < 1$.
This makes perfect sense: if $f(x) = -\ln(1-x)$, then $f'(x)$ should be $\frac{1}{1-x}$.
Since $f'(x) = \left( -\ln(1-x) \right)' = \frac{1}{1-x}$ on $(-1, 1)$, and $f(0) = -\ln(1-0) = 0$, we have $f(x) = -\ln(1-x)$ for $|x| < 1$.
And indeed, the series for $-\ln(1-x)$ centered at $x_0 = 0$ is $\sum_{n=1}^{\infty} \frac{x^n}{n}$.
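A quick numeric cross-check of this example (illustrative, evaluated at $x = \frac{1}{2}$): partial sums of $\sum_{n \geq 1} \frac{x^n}{n}$ approach $-\ln(1-x)$, and their term-by-term derivatives approach $\frac{1}{1-x}$:

```python
# Numeric cross-check (illustrative) at x = 0.5: partial sums of sum_{n>=1} x**n/n
# approach -log(1 - x), and their term-by-term derivatives approach 1/(1 - x).
import math

def partial_sum(x, N):
    return sum(x**n / n for n in range(1, N + 1))

def partial_sum_diff(x, N):                    # the series differentiated term by term
    return sum(x**(n - 1) for n in range(1, N + 1))

x0, N = 0.5, 50
print(partial_sum(x0, N), -math.log(1 - x0))   # both are approximately 0.693147
print(partial_sum_diff(x0, N), 1 / (1 - x0))   # both are approximately 2.0
```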
Solving the Clicker Question for $\arctan$ Elegantly
A Clever Shortcut for Derivatives! Remember that earlier “clicker question” about $f(x) = \arctan(x)$? Calculating higher derivatives directly can be tedious. But if we can find a power series for a related function (like $f'$), we can use the formula $a_n = \frac{f^{(n)}(x_0)}{n!}$ (applied to that related function) to pick off the derivative we need from the coefficients!
Let $f(x) = \arctan(x)$. We know that $f'(x) = \frac{1}{1 + x^2}$.
For $|x| < 1$ (which ensures $x^2 < 1$), we can expand $\frac{1}{1 + x^2} = \frac{1}{1 - (-x^2)}$ as a geometric series: $\frac{1}{1 - (-x^2)} = \sum_{k=0}^{\infty} (-x^2)^k = \sum_{k=0}^{\infty} (-1)^k x^{2k}$. So, $f'(x) = 1 - x^2 + x^4 - x^6 + \ldots$
This is the power series for $f'$ centered at $x_0 = 0$.
Let’s write out the coefficients $b_k$ of $f'(x) = \sum_{k=0}^{\infty} b_k x^k$:
$b_0 = 1$ (coefficient of $x^0$, from the term with $k = 0$ in $\sum (-1)^k x^{2k}$), $b_1 = 0$ (there is no $x^1$ term), $b_2 = -1$ (coefficient of $x^2$, from the term with $k = 1$), $b_3 = 0$ (there is no $x^3$ term), $b_4 = 1$ (coefficient of $x^4$, from the term with $k = 2$), and so on. Only even powers of $x$ appear.
We want $f'''(0)$.
Since $f'' = (f')'$, then $f''' = (f')''$, and so $f'''(0) = (f')''(0)$. So, we are looking for $(f')''(0)$.
From the corollary, for a power series $g(x) = \sum_{k=0}^{\infty} b_k x^k$, we have $g^{(n)}(0) = n! \, b_n$.
Applying this to our function $g = f'$ and its coefficients $b_k$: $(f')''(0) = 2! \, b_2$.
The coefficient $b_2$ (of $x^2$) in the series for $f'$ is $-1$. Therefore, $(f')''(0) = 2! \cdot (-1) = -2$. Thus, $f'''(0) = -2$. Much easier than differentiating three times!
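To double-check the answer, here is a small sympy sketch that computes $f'''(0)$ both by brute force and by reading off the series coefficient as above:

```python
# Sanity check with sympy: compute f'''(0) by brute force, and also read it off
# the series coefficient of x**2 in f'(x) = 1/(1 + x**2), as done above.
import sympy as sp

x = sp.symbols('x')
f = sp.atan(x)

print(sp.diff(f, x, 3).subs(x, 0))                           # -2 (direct differentiation)

fprime_series = sp.series(1 / (1 + x**2), x, 0, 6).removeO() # 1 - x**2 + x**4
b2 = fprime_series.coeff(x, 2)                               # b_2 = -1
print(sp.factorial(2) * b2)                                  # 2! * (-1) = -2
```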
Taylor Approximation: Using Derivatives to Build Polynomial Approximations
The Taylor Polynomial The fact that power series coefficients are $a_n = \frac{f^{(n)}(x_0)}{n!}$ is incredibly powerful. It suggests that if a function is smooth enough (has enough derivatives) at a point $x_0$, we can try to build a polynomial that “mimics” the function near $x_0$ by using these very derivatives. This special polynomial is called the Taylor polynomial. The more derivatives it matches, the better it “hugs” the original function near $x_0$.
Our discussion of power series and how their coefficients relate to derivatives, $a_n = \frac{f^{(n)}(x_0)}{n!}$, makes the following idea very natural:
If a function $f$ is sufficiently smooth (has enough derivatives) near a point $x_0$, perhaps we can approximate it well using a polynomial built from these derivatives.
The Taylor polynomial of degree $n$ for $f$ around $x_0$ is defined as: $$T_n f(x; x_0) := \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k$$ This polynomial is constructed with a specific goal: it must match the function and its first $n$ derivatives precisely at the point $x_0$.
- …
The intuition is that if a polynomial shares these fundamental characteristics (value, slope, concavity, rate of change of concavity, etc.) with the function at $x_0$, it should provide a good local approximation to $f$ when $x$ is near $x_0$. The higher the degree $n$, the more derivatives are matched, and generally, the better the approximation becomes over a wider range around $x_0$. This is the foundational idea behind Taylor series (which is $T_\infty f(x; x_0) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k$) and Taylor’s theorem, which provides a way to quantify the error in this approximation.
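As a closing illustration (the choice $f = \cos$ and $x_0 = 0$ is mine, not from the lecture), a short sympy sketch builds $T_n f(x; x_0)$ directly from the definition and shows the approximation error near the center shrinking as $n$ grows:

```python
# Sketch (illustrative choice f = cos, x0 = 0): build T_n f(x; x0) straight from the
# definition and watch the error near the center shrink as the degree n grows.
import sympy as sp

x = sp.symbols('x')
f, x0 = sp.cos(x), 0

def taylor_poly(f, x0, n):
    """Degree-n Taylor polynomial of f around x0, built from the derivatives at x0."""
    return sum(sp.diff(f, x, k).subs(x, x0) / sp.factorial(k) * (x - x0)**k
               for k in range(n + 1))

for n in (2, 4, 8):
    T = taylor_poly(f, x0, n)
    err = abs((f - T).subs(x, sp.Rational(1, 2)))   # error at x = 1/2, near the center
    print(n, sp.expand(T), sp.N(err))
```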
Continue here: 20 Taylor Approximation, Higher Derivative Test for Local Extrema, Riemann Integral, Darboux Sums and Integrals