Lecture from: 06.05.2024 | Video: Video ETHZ
Review: What We Know About Convex Functions and Their Derivatives
Let’s quickly recap the highlights from our last lecture on convexity:
- The Core Idea: A function is convex if its graph “holds water” - meaning any line segment connecting two points on the graph (a secant line) lies above or on the graph itself.
- First Derivative Test: If our function $f$ (defined on an interval with more than one point) is differentiable, then $f$ is convex if and only if its derivative $f'$ is a monotonically increasing function. Think of it this way: for the graph to bend upwards, its slope must be increasing (or at least not decreasing).
- Second Derivative Test: If $f$ is twice differentiable, things get even simpler. The function is convex if and only if its second derivative is non-negative ($f'' \geq 0$) throughout the interval. Why? Because $f''$ is the derivative of $f'$. So, $f'' \geq 0$ means $f'$ is monotonically increasing, which, by the first test, means $f$ is convex.
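To make the recap tangible, here is a minimal sympy sketch of the second-derivative test (the function $e^x + x^2$ is just an illustrative choice, not one from the lecture):

```python
# A minimal sympy sketch of the second-derivative test (illustrative only):
# pick a sample function, compute f'' symbolically, and confirm it is positive,
# which by the test above means the function is convex on all of R.
import sympy as sp

x = sp.symbols('x', real=True)
f = sp.exp(x) + x**2            # illustrative example function

f2 = sp.diff(f, x, 2)           # f''(x) = exp(x) + 2
print(f2)                       # exp(x) + 2
print(f2.is_positive)           # True: exp(x) > 0 and 2 > 0, so f'' > 0, hence f is convex
```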
Higher Order Derivatives
We’ve talked a lot about the first derivative (slope) and the second derivative (related to concavity/convexity). But why stop there? We can often keep differentiating.
Let $D \subseteq \mathbb{R}$ be our domain, and let’s assume it’s “nice” enough that every point of $D$ is an accumulation point of $D$. (An interval with more than one point is a perfect example).
As a baseline, we define the 0-th derivative of $f$ to be just the function itself: $f^{(0)} := f$.
Definition: What “n-times Differentiable” Means
- Getting to the n-th Derivative: For a positive integer $n$, we say a function $f$ is $n$-times differentiable in $D$ if we can successfully differentiate it $n-1$ times to get $f^{(n-1)}$, and then this $(n-1)$-th derivative is itself differentiable. The $n$-th derivative is then defined as the derivative of the $(n-1)$-th derivative: $f^{(n)} := \left( f^{(n-1)} \right)'$.
Our Shorthand Notation:
- $f^{(1)} = f'$ (the familiar first derivative)
- $f^{(2)} = f''$ (the second derivative)
- $f^{(3)} = f'''$ (the third derivative), and so on.
- Adding Continuity: n-times Continuously Differentiable: A function $f$ is $n$-times continuously differentiable in $D$ if it’s $n$-times differentiable, and its $n$-th derivative, $f^{(n)}$, is a continuous function in $D$. These functions are often denoted $C^n(D)$.
- The Ultimate Smoothness: “Smooth” Functions: A function $f$ is called smooth (or “glatt” in German) in $D$ if it is $n$-times differentiable for all positive integers $n$. You can keep differentiating it forever! This is also sometimes called “infinitely differentiable” or “$\infty$-times differentiable,” and the set of such functions is often denoted $C^\infty(D)$.
An Important Link: Higher Differentiability and Continuity of Lower Derivatives
If a function $f$ is $n$-times differentiable, it implies that its $(n-1)$-th derivative $f^{(n-1)}$ is continuous (in fact, $f$ is $(n-1)$-times continuously differentiable, i.e. $f \in C^{n-1}(D)$). Why? Because for $f^{(n-1)}$ to be differentiable (which it must be for $f^{(n)}$ to exist), $f^{(n-1)}$ itself must be continuous. (Remember our corollary: differentiability at a point implies continuity at that point).
Theorem: Rules for Higher Derivatives (They Behave Nicely!)
Let our domain $D \subseteq \mathbb{R}$ be suitable (every point is an accumulation point), and let $n \in \mathbb{N}$. If $f, g: D \to \mathbb{R}$ are both $n$-times differentiable in $D$, then:
- Sum Rule Still Holds: The sum $f + g$ is also $n$-times differentiable, and the $n$-th derivative of the sum is the sum of the $n$-th derivatives: $(f + g)^{(n)} = f^{(n)} + g^{(n)}$. (This follows quite naturally by applying the basic sum rule repeatedly).
- Generalized Product Rule (also known as Leibniz’s Formula): The product $f \cdot g$ is also $n$-times differentiable, and its $n$-th derivative is given by a formula that looks remarkably like the binomial expansion: $$(f \cdot g)^{(n)} = \sum_{k=0}^{n} \binom{n}{k} f^{(k)} \, g^{(n-k)}$$ Here, $\binom{n}{k}$ are the binomial coefficients, $f^{(0)}$ means $f$ itself, and $g^{(0)}$ means $g$ itself.
Analogy to Binomial Theorem: Recall $(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}$. Leibniz’s formula for derivatives has the same structure, which is a neat connection!
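As a quick sanity check of Leibniz’s formula, here is a small sympy sketch; the choices $f(x) = e^x$, $g(x) = \sin(x)$ and $n = 4$ are mine, purely for illustration:

```python
# Hedged sketch: verify Leibniz's formula (f*g)^(n) = sum_k C(n,k) * f^(k) * g^(n-k)
# for the illustrative choice f(x) = exp(x), g(x) = sin(x) and n = 4.
import sympy as sp

x = sp.symbols('x')
f, g, n = sp.exp(x), sp.sin(x), 4

lhs = sp.diff(f * g, x, n)      # differentiate the product n times directly
rhs = sum(sp.binomial(n, k) * sp.diff(f, x, k) * sp.diff(g, x, n - k)
          for k in range(n + 1))

print(sp.simplify(lhs - rhs))   # 0, so both sides agree
```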
Examples: Functions That Are Infinitely Smooth
- The Champions of Smoothness: $e^x$, $\sin(x)$, $\cos(x)$. These functions are all smooth on the entire real line $\mathbb{R}$.
- For $e^x$, its derivative is always $e^x$. You can differentiate it as many times as you like, and it stays $e^x$.
- For $\sin$ and $\cos$, their derivatives cycle through $\cos$, $-\sin$, $-\cos$ and back to $\sin$. Since each of these is differentiable, the cycle continues indefinitely.
- Polynomials: Smooth Operators Any polynomial function is smooth on $\mathbb{R}$.
- When you differentiate a polynomial, you get another polynomial (of one degree less). Eventually, after enough differentiations, you’ll get a constant, and then zero. Since polynomials and constants are always differentiable, all polynomials are smooth.
- The Natural Logarithm: Smooth on its Domain The function $\ln(x)$ is smooth on its domain $(0, \infty)$.
- Let’s look at its derivatives: $\ln'(x) = \frac{1}{x}$, $\ln''(x) = -\frac{1}{x^2}$, $\ln'''(x) = \frac{2}{x^3}$, ... A pattern emerges! For $n \geq 1$: $$\ln^{(n)}(x) = (-1)^{n-1} \cdot \frac{(n-1)!}{x^n}$$ (You can prove this pattern rigorously using mathematical induction). Since $\frac{(n-1)!}{x^n}$ is well-defined and differentiable for any $n$ as long as $x > 0$, it follows that $\ln$ is smooth on $(0, \infty)$.
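If you want to see the pattern confirmed without doing the induction by hand, a short sympy sketch (my own check, not part of the lecture) compares the first few derivatives with the claimed formula:

```python
# Quick sympy check (illustrative) of the pattern
# ln^(n)(x) = (-1)^(n-1) * (n-1)! / x**n for the first few n.
import sympy as sp

x = sp.symbols('x', positive=True)

for n in range(1, 6):
    lhs = sp.diff(sp.log(x), x, n)
    rhs = (-1)**(n - 1) * sp.factorial(n - 1) / x**n
    print(n, sp.simplify(lhs - rhs))   # prints 0 for every n
```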
Clicker Question Flashback: If $f(x) = \arctan(x)$, what is $f'''(0)$?
We know $f'(x) = \frac{1}{1 + x^2}$. We could keep differentiating this using the quotient rule (or chain rule if we write it as $(1 + x^2)^{-1}$) to find $f''$ and then $f'''$. It gets a bit messy.
We’ll see a more elegant way to solve this later using power series! (For now, we’ll be “lazy” and defer the brute-force calculation. The answer will be revealed!)
Theorem: Higher Derivatives and Algebraic Operations
Let $D \subseteq \mathbb{R}$ be a suitable domain, and $n \in \mathbb{N}$.
- Quotient Rule for Higher Derivatives: If $f, g: D \to \mathbb{R}$ are $n$-times differentiable in $D$, and importantly, $g(x) \neq 0$ for all $x \in D$, then their quotient $\frac{f}{g}$ is also $n$-times differentiable in $D$. (The general formula for $\left( \frac{f}{g} \right)^{(n)}$ is very complicated, so we usually don’t write it out explicitly beyond the first derivative).
- Chain Rule for Higher Derivatives (Faà di Bruno’s Formula): If $D, E \subseteq \mathbb{R}$ are suitable domains, and $f: D \to E$ and $g: E \to \mathbb{R}$ are both $n$-times differentiable functions, then their composition $g \circ f$ is also $n$-times differentiable in $D$. The formulas get complicated quickly:
- n=1 (The familiar Chain Rule): $(g \circ f)'(x) = g'(f(x)) \cdot f'(x)$
- n=2 (Differentiating the first derivative): $(g \circ f)''(x) = \left( g'(f(x)) \cdot f'(x) \right)'$. Using the product rule: $(g \circ f)''(x) = \left( g'(f(x)) \right)' \cdot f'(x) + g'(f(x)) \cdot f''(x)$. Using the chain rule for $\left( g'(f(x)) \right)'$: $\left( g'(f(x)) \right)' = g''(f(x)) \cdot f'(x)$. So, $(g \circ f)''(x) = g''(f(x)) \cdot \left( f'(x) \right)^2 + g'(f(x)) \cdot f''(x)$.
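Here is a small sympy sketch checking the $n = 2$ formula on a concrete pair of functions (the choices $f(x) = \sin(x)$ and $g(y) = e^{y^2}$ are illustrative, not from the lecture):

```python
# Sketch verifying the n = 2 composition formula
#   (g o f)''(x) = g''(f(x)) * f'(x)**2 + g'(f(x)) * f''(x)
# for the illustrative choice f(x) = sin(x), g(y) = exp(y**2).
import sympy as sp

x, y = sp.symbols('x y')
f = sp.sin(x)
g = sp.exp(y**2)

comp = g.subs(y, f)                       # (g o f)(x)
lhs = sp.diff(comp, x, 2)                 # differentiate the composition twice directly

g1 = sp.diff(g, y).subs(y, f)             # g'(f(x))
g2 = sp.diff(g, y, 2).subs(y, f)          # g''(f(x))
rhs = g2 * sp.diff(f, x)**2 + g1 * sp.diff(f, x, 2)

print(sp.simplify(lhs - rhs))             # 0
```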
A General Principle: The properties of being “$n$-times differentiable” usually carry over if you replace it with “smooth”. If the constituent functions are smooth, their sums, products, quotients (where defined), and compositions are also smooth.
Examples: Smoothness from Combining Functions
- Tangent Function is Smooth (on its domain): $\tan(x)$ is smooth wherever it’s defined, which is $\mathbb{R} \setminus \{\frac{\pi}{2} + k\pi : k \in \mathbb{Z}\}$.
- Why? $\sin$ and $\cos$ are smooth everywhere. Their quotient $\tan = \frac{\sin}{\cos}$ is smooth as long as the denominator $\cos$ is not zero.
- Power Functions are Smooth (for $x > 0$): For any real number $\alpha$, the function $x \mapsto x^\alpha$ is smooth on the interval $(0, \infty)$.
- Why? We write $x^\alpha = e^{\alpha \ln(x)}$. The function $\ln(x)$ is smooth on $(0, \infty)$. Multiplying by a constant (i.e., $y \mapsto \alpha y$) is a smooth operation. The exponential function is smooth on $\mathbb{R}$. Since $x^\alpha = e^{\alpha \ln(x)}$ is a composition of these smooth functions, it is itself smooth on $(0, \infty)$ by the (generalized) chain rule.
- Inverse Trigonometric Functions are Smooth (mostly):
- $\arcsin$ and $\arccos$ are smooth on the open interval $(-1, 1)$.
- $\arctan$ is smooth on the entire real line $\mathbb{R}$.
- Let’s see why for $\arcsin$: We found $\arcsin'(x) = \frac{1}{\sqrt{1 - x^2}}$.
The function $x \mapsto 1 - x^2$ is a polynomial, so it’s smooth everywhere. For $x \in (-1, 1)$, the value $1 - x^2$ is in $(0, 1]$.
The function $y \mapsto \frac{1}{\sqrt{y}}$ (which is $y^{-1/2}$) is smooth on $(0, \infty)$ (from our previous example of $y^\alpha$ with $\alpha = -\frac{1}{2}$).
Since $\arcsin'$ is a composition of these smooth functions, it is smooth on $(-1, 1)$.
If the derivative of a function ($f'$) is smooth, then the original function ($f$) must also be smooth. (Think of it this way: if you can differentiate $f'$ infinitely many times, you could certainly differentiate $f$ one more time than that!)
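A quick sympy check of the derivatives used above (purely illustrative): it confirms $\arcsin'(x) = \frac{1}{\sqrt{1-x^2}}$ and $\arctan'(x) = \frac{1}{1+x^2}$, and shows that higher derivatives of $\arcsin$ keep existing on $(-1, 1)$:

```python
# Small sympy check (illustrative) of the derivatives used above, plus a few
# higher derivatives of arcsin to see that differentiation never breaks down on (-1, 1).
import sympy as sp

x = sp.symbols('x')

print(sp.simplify(sp.diff(sp.asin(x), x) - 1 / sp.sqrt(1 - x**2)))   # 0
print(sp.simplify(sp.diff(sp.atan(x), x) - 1 / (1 + x**2)))          # 0

for n in range(2, 5):
    print(n, sp.simplify(sp.diff(sp.asin(x), x, n)))  # defined wherever 1 - x**2 > 0
```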
Power Series and Taylor Approximation: The Link Between Smoothness and Polynomials
What’s the Big Idea Here? Imagine you have a really smooth, curvy function. We already know that special types of infinite sums called “power series” can create such smooth functions. Now, we’re asking the reverse: can we take any smooth function and represent it or approximate it using a polynomial (or an infinitely long polynomial, which is a power series)? This section explores that connection.
This section bridges two important ideas:
- We know that power series define smooth functions (infinitely differentiable functions) inside their interval of convergence. Think of them as “infinitely long polynomials” that behave very nicely.
- We want to see how differentiable (especially smooth) functions can be approximated by polynomials. This naturally leads to the concept of Taylor series, where we use the function’s derivatives at a single point to build these polynomial approximations.
Clicker Question Analysis (A Look Ahead): Consider $f_n(x) = \sqrt{x^2 + \frac{1}{n}}$ for $n \in \mathbb{N}$ and $f(x) = |x|$.
Hold On, A Word of Caution! This example is a bit of a “heads-up.” We’re looking at a sequence of perfectly smooth functions (the $f_n$) that get closer and closer to a function ($f(x) = |x|$) that isn’t smooth everywhere (it has a sharp corner at $x = 0$). This tells us that just because smooth things approach a limit doesn’t mean the limit itself will be smooth. Something more is needed!
- Convergence behavior: Let’s look at the difference: $f_n(x) - |x| = \sqrt{x^2 + \frac{1}{n}} - \sqrt{x^2}$. We can rationalize this (multiply by the conjugate over itself, a common algebraic trick to simplify expressions involving square roots): $f_n(x) - |x| = \frac{1/n}{\sqrt{x^2 + \frac{1}{n}} + |x|}$. The denominator is always at least $\frac{1}{\sqrt{n}}$ (this minimum occurs when $x = 0$). So, $0 \leq f_n(x) - |x| \leq \frac{1/n}{1/\sqrt{n}} = \frac{1}{\sqrt{n}}$.
Since $\frac{1}{\sqrt{n}} \to 0$ as $n \to \infty$, and this bound does not depend on $x$, the sequence $(f_n)$ converges uniformly to $f(x) = |x|$ on $\mathbb{R}$. Uniform convergence means that all parts of the function approach $f$ at roughly the same rate.
- Differentiability contrast: Each function $f_n$ is smooth for all $x \in \mathbb{R}$. Why? Because the term inside the square root, $x^2 + \frac{1}{n}$, is always strictly positive (since $x^2 \geq 0$ and $\frac{1}{n} > 0$). The square root function is smooth as long as its argument is positive. However, the limit function $f(x) = |x|$ is famously not differentiable at $x = 0$. It has a sharp point there, and a derivative needs a well-defined, unique tangent line.
The punchline of this clicker question: Uniform convergence of a sequence of differentiable functions is not enough to guarantee that the limit function is differentiable. We need something more – specifically, we need the sequence of derivatives to also converge nicely (uniformly, in fact).
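A small numerical illustration of this punchline, assuming the sequence $f_n(x) = \sqrt{x^2 + \frac{1}{n}}$ from the clicker question: the sup-distance to $|x|$ shrinks like $\frac{1}{\sqrt{n}}$, yet the slopes near $0$ never settle on a single value:

```python
# Numerical illustration, assuming f_n(x) = sqrt(x**2 + 1/n) as above:
# the sup-distance to |x| is exactly 1/sqrt(n) (attained at x = 0), so f_n -> |x|
# uniformly, yet the slopes near 0 keep sweeping from about -1 to about +1.
import numpy as np

xs = np.linspace(-1.0, 1.0, 20001)

for n in (10, 100, 1000, 10000):
    fn = np.sqrt(xs**2 + 1.0 / n)
    sup_dist = np.max(np.abs(fn - np.abs(xs)))
    print(f"n={n:6d}  sup distance = {sup_dist:.5f}  vs  1/sqrt(n) = {1/np.sqrt(n):.5f}")

n = 10000
for x0 in (-0.05, -0.01, 0.01, 0.05):
    slope = x0 / np.sqrt(x0**2 + 1.0 / n)      # f_n'(x0)
    print(f"f_n'({x0:+.2f}) = {slope:+.3f}")   # no single tangent slope emerges at 0
```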
Theorem: Interchanging Limits and Differentiation
When Can You Swap ‘Limit’ and ‘Derivative’? This is a crucial question in calculus. Imagine you have a sequence of functions $f_n$ that are morphing into a final function $f$. Can you find the slope (derivative) of $f$ by first finding the slopes of all the $f_n$’s and then seeing what those slopes approach? This theorem says “yes, sometimes!” but lays down the specific conditions needed.
Let $I \subseteq \mathbb{R}$ be an interval with more than one point.
Consider a sequence of functions $f_n: I \to \mathbb{R}$, where each $f_n$ is continuously differentiable (meaning $f_n'$ exists and is continuous).
Suppose two conditions hold:
- The sequence of functions $(f_n)$ converges pointwise on $I$ to some function $f$. That is, for each specific $x \in I$, $\lim_{n \to \infty} f_n(x) = f(x)$. (The functions settle down at each point.)
- The sequence of derivatives $(f_n')$ converges uniformly on $I$ to some function $g$. That is, $f_n'(x) \to g(x)$ uniformly for $x \in I$. (The slopes of the functions settle down, and they do so together, across the whole interval.)
Then, the limit function $f$ is differentiable on $I$, and its derivative is exactly $g$. In other words: $$\left( \lim_{n \to \infty} f_n \right)' = \lim_{n \to \infty} f_n'$$ This theorem is a cornerstone because it tells us precisely when we can swap the order of taking a limit and performing differentiation.
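To see the theorem’s hypotheses in action, here is a numpy sketch with an example sequence of my own choosing (not from the lecture): $f_n(x) = x^2 + \frac{\sin(nx)}{n^2}$, whose derivatives $f_n'(x) = 2x + \frac{\cos(nx)}{n}$ converge uniformly to $2x$:

```python
# Illustrative numpy sketch (example sequence of my own choosing):
# f_n(x) = x**2 + sin(n*x)/n**2 converges uniformly to x**2 on [-2, 2], and
# f_n'(x) = 2*x + cos(n*x)/n converges uniformly to 2*x, so the theorem applies
# and (lim f_n)' = lim f_n' = 2*x.
import numpy as np

xs = np.linspace(-2.0, 2.0, 4001)

for n in (1, 10, 100, 1000):
    fn = xs**2 + np.sin(n * xs) / n**2
    fn_prime = 2 * xs + np.cos(n * xs) / n
    print(f"n={n:4d}  sup dist of f_n to x^2: {np.max(np.abs(fn - xs**2)):.2e}"
          f"  sup dist of f_n' to 2x: {np.max(np.abs(fn_prime - 2 * xs)):.2e}")
# Both sup-distances shrink to 0: exactly the (uniform) hypotheses needed to swap
# the limit and the derivative.
```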
Proof Idea (Connecting to MVT and Continuity)
How Does This Swapping Magic Work? (The Intuition) The proof relies on a few key ideas. First, if the derivatives $f_n'$ are all continuous and they converge uniformly to $g$, then $g$ itself must be continuous (a nice property of uniform convergence). To show $f'(x_0) = g(x_0)$, we look at the definition of the derivative of $f$ at $x_0$: $\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}$. The core idea is that because $f_n(x)$ and $f_n(x_0)$ are close to $f(x)$ and $f(x_0)$ (due to pointwise convergence), the fraction $\frac{f(x) - f(x_0)}{x - x_0}$ should be close to $\frac{f_n(x) - f_n(x_0)}{x - x_0}$. The Mean Value Theorem (MVT) steps in here: it says that $\frac{f_n(x) - f_n(x_0)}{x - x_0}$ is exactly equal to $f_n'(\xi)$ for some $\xi$ between $x_0$ and $x$. Now, if $f_n' \to g$ uniformly, then $f_n'(\xi)$ should be close to $g(\xi)$. And since $g$ is continuous, $g(\xi)$ is close to $g(x_0)$ when $x$ (and thus $\xi$) is close to $x_0$. Stringing these “close to” ideas together suggests that the derivative of $f$ at $x_0$ is indeed $g(x_0)$.
Since each $f_n'$ is continuous and $f_n' \to g$ uniformly, the limit function $g$ must be continuous on $I$ (this is a standard result, cited here as Theorem 3.7.4).
We want to show that for any $x_0 \in I$, $f'(x_0) = g(x_0)$.
For $x$ close to $x_0$ and $n$ large: The difference quotient $\frac{f(x) - f(x_0)}{x - x_0}$ (the slope of the secant line for $f$) should be close to $\frac{f_n(x) - f_n(x_0)}{x - x_0}$ (the slope of the secant line for $f_n$).
By the Mean Value Theorem applied to $f_n$ on an interval containing $x_0$ and $x$, there is some $\xi$ between $x_0$ and $x$ such that $\frac{f_n(x) - f_n(x_0)}{x - x_0} = f_n'(\xi)$.
As $x \to x_0$, $\xi$ (which is squeezed between $x_0$ and $x$) also approaches $x_0$. Because $f_n' \to g$ uniformly, $f_n'(\xi)$ is close to $g(\xi)$ for large $n$. Because $g$ is continuous at $x_0$, $g(\xi)$ is close to $g(x_0)$ when $\xi$ is close to $x_0$.
Putting it all together: $\frac{f(x) - f(x_0)}{x - x_0} \approx \frac{f_n(x) - f_n(x_0)}{x - x_0} = f_n'(\xi) \approx g(\xi) \approx g(x_0)$. This makes it plausible that $f'(x_0) = g(x_0)$.
More Formal Sketch of the Proof
Making the Intuition Rigorous (The Epsilon-Delta Dance) The “proof idea” gave us a good gut feeling. Now, we make it mathematically solid using precise definitions of limits (epsilons and deltas). The goal is to show that the difference between the actual slope of $f$ at $x_0$ and the proposed slope $g(x_0)$ can be made arbitrarily small.
Let $\varepsilon > 0$. We need to show that $\left| \frac{f(x) - f(x_0)}{x - x_0} - g(x_0) \right|$ can be made small by choosing $x$ sufficiently close to $x_0$.
- Since $g$ is continuous at $x_0$: for our chosen $\varepsilon$, there exists a $\delta > 0$ such that if $|y - x_0| < \delta$, then $|g(y) - g(x_0)| < \varepsilon$. Since $\xi_n$ will be between $x_0$ and $x$, if $|x - x_0| < \delta$, then $|\xi_n - x_0| < \delta$, so $|g(\xi_n) - g(x_0)| < \varepsilon$. (Statement 1)
- Since $f_n' \to g$ uniformly: for our chosen $\varepsilon$, there exists an integer $N$ such that if $n \geq N$, then $|f_n'(y) - g(y)| < \varepsilon$ for all $y \in I$. (Statement 2)
Now, pick an $x \in I$ with $0 < |x - x_0| < \delta$.
Apply the Mean Value Theorem to $f_n$ on the interval between $x_0$ and $x$: there exists $\xi_n$ between $x_0$ and $x$ such that $\frac{f_n(x) - f_n(x_0)}{x - x_0} = f_n'(\xi_n)$. Note that $\xi_n$ is automatically within the $\delta$-neighborhood of $x_0$ because $x$ is.
For $n \geq N$: We can write: $\frac{f_n(x) - f_n(x_0)}{x - x_0} - g(x_0) = f_n'(\xi_n) - g(x_0) = \left( f_n'(\xi_n) - g(\xi_n) \right) + \left( g(\xi_n) - g(x_0) \right)$.
Using the triangle inequality ($|a + b| \leq |a| + |b|$, together with Statements 1 and 2): $\left| \frac{f_n(x) - f_n(x_0)}{x - x_0} - g(x_0) \right| \leq |f_n'(\xi_n) - g(\xi_n)| + |g(\xi_n) - g(x_0)| < \varepsilon + \varepsilon = 2\varepsilon$. So, for $n \geq N$, we have $\left| \frac{f_n(x) - f_n(x_0)}{x - x_0} - g(x_0) \right| < 2\varepsilon$.
Now, we take the limit as $n \to \infty$. Since $f_n(x) \to f(x)$ for each $x$ (pointwise convergence), the term $\frac{f_n(x) - f_n(x_0)}{x - x_0}$ converges to $\frac{f(x) - f(x_0)}{x - x_0}$.
Thus, taking the limit of the inequality (the $2\varepsilon$ part is independent of $n$): $\left| \frac{f(x) - f(x_0)}{x - x_0} - g(x_0) \right| \leq 2\varepsilon$.
Since this holds for any $x$ in the punctured $\delta$-neighborhood of $x_0$ (meaning $|x - x_0| < \delta$ but $x \neq x_0$), and $\varepsilon$ was an arbitrary positive number, this means that $\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = g(x_0)$ by definition of the limit.
Thus, $f'(x_0) = g(x_0)$.
Application: Differentiating Power Series Term by Term
Power Series Good news! The theorem we just proved about swapping limits and derivatives is exactly what we need to justify differentiating power series (those “infinite polynomials”) term by term. It turns out power series are well-behaved enough to meet the theorem’s conditions within their comfort zone (interval of convergence).
This theorem is the backbone for why we can differentiate power series term by term, a technique that feels as natural as differentiating ordinary polynomials.
Theorem: Term-by-Term Differentiation of Power Series
Official Rule: Differentiate Piece by Piece! This theorem formally states that if you have a power series that defines a function $f$, you can find the derivative $f'$ by simply differentiating each term of the series individually. And importantly, the new “derivative series” will work (converge) in the same $x$-range as the original series.
Let $f(x) = \sum_{k=0}^{\infty} a_k (x - x_0)^k$ be a power series with a radius of convergence $\rho > 0$. (This defines the “comfort zone” where the series behaves nicely.)
Then the function $f$ is differentiable within its interval of convergence $(x_0 - \rho, x_0 + \rho)$.
Its derivative is found by differentiating term by term: $$f'(x) = \sum_{k=1}^{\infty} k \, a_k (x - x_0)^{k-1}$$ Crucially, this new power series (the “differentiated series”) has the same radius of convergence $\rho$ as the original series. This is a powerful result!
Why this works (linking to the previous theorem):
Connecting the Dots: Why Term-by-Term is Okay Why can we do this? Because on any closed subinterval inside the interval of convergence (say, from $x_0 - r$ to $x_0 + r$ where $0 < r < \rho$), a power series converges uniformly. Moreover, the series of its derivatives also converges uniformly on that same subinterval. These are precisely the conditions our “Interchanging Limits and Differentiation” theorem needs! The partial sums of the power series are our $f_n$’s.
On any closed subinterval $[x_0 - r, x_0 + r]$ where $0 < r < \rho$:
- The original power series (viewed as the limit of its sequence of partial sums $f_N(x) = \sum_{k=0}^{N} a_k (x - x_0)^k$) converges uniformly to $f$.
- The series of derivatives (i.e., $\sum_{k=1}^{\infty} k \, a_k (x - x_0)^{k-1}$) is also a power series. It can be shown (often using the same methods as finding the original radius of convergence, like the ratio test for its coefficients) that it has the same radius of convergence $\rho$. Thus, its sequence of partial sums converges uniformly to some function, let’s call it $g$, on $[x_0 - r, x_0 + r]$.
The conditions of the “Interchanging Limits and Differentiation” Theorem are met on these closed subintervals (with $f_N$ being the partial sums of the original series, and $f_N'$ being the partial sums of the differentiated series). So, we can conclude that $f' = g$, which is the sum of the term-by-term differentiated series.
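A concrete sketch of this, using the geometric series as an illustrative example: the term-by-term derivative of its partial sums approaches $\left( \frac{1}{1-x} \right)' = \frac{1}{(1-x)^2}$ inside $(-1, 1)$:

```python
# Sketch with the geometric series as an illustrative example: the partial sums
# f_N(x) = sum_{k=0}^{N} x**k converge to 1/(1-x) on |x| < 1, and their term-by-term
# derivatives converge to (1/(1-x))' = 1/(1-x)**2 on the same interval.
import sympy as sp

x = sp.symbols('x')
N = 30
partial = sum(x**k for k in range(N + 1))        # f_N(x)
partial_diff = sp.diff(partial, x)               # term-by-term derivative of f_N

target = 1 / (1 - x)**2                          # derivative of the limit function
for x0 in (sp.Rational(1, 2), sp.Rational(-3, 4)):
    err = abs(partial_diff.subs(x, x0) - target.subs(x, x0))
    print(x0, sp.N(err))                         # small, and it shrinks further as N grows
```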
Corollary: Power Series Define Smooth Functions, and Taylor Coefficients
Power Series = Super Smooth Functions! And a Bonus: Finding Coefficients! This is a fantastic consequence. If a function can be written as a power series:
- It’s not just differentiable once, but infinitely many times (it’s “smooth”).
- We can keep differentiating term-by-term.
- Most excitingly, if we plug the center point $x_0$ into the function and its derivatives, we can figure out exactly what the coefficients of the original power series must be! They are the famous Taylor coefficients.
If a function $f$ is given by a power series $f(x) = \sum_{k=0}^{\infty} a_k (x - x_0)^k$ with radius of convergence $\rho > 0$, then:
- $f$ is smooth (infinitely differentiable) in the interval $(x_0 - \rho, x_0 + \rho)$. Intuition: Differentiating a polynomial gives another polynomial. Since a power series is like an “infinite polynomial” and term-by-term differentiation yields another power series with the same radius of convergence, we can repeat this process indefinitely.
- Its $n$-th derivative can be found by differentiating term-by-term $n$ times: $$f^{(n)}(x) = \sum_{k=n}^{\infty} k (k-1) \cdots (k - n + 1) \, a_k (x - x_0)^{k-n}$$ (Note: the sum starts at $k = n$ because terms with $k < n$ will become zero after $n$ differentiations.)
- A very important consequence: If we evaluate the $n$-th derivative at $x = x_0$, all terms in the sum vanish except for the very first one (where $k = n$): When $k = n$, the term is $n (n-1) \cdots 1 \cdot a_n \cdot (x_0 - x_0)^0 = n! \, a_n$. (Note that $0^0 = 1$ here; it might be controversial, but the internet and math-stackexchange say so…)
So, $f^{(n)}(x_0) = n! \, a_n$. This gives us a remarkable way to find the coefficients of a power series if we know the function it represents and can calculate its derivatives at $x_0$: $$a_n = \frac{f^{(n)}(x_0)}{n!}$$
These are the Taylor coefficients. This formula is a cornerstone of Taylor series.
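Here is a short sympy sketch of the formula $a_n = \frac{f^{(n)}(x_0)}{n!}$, using $f(x) = \sin(x)$ and $x_0 = 0$ as an illustrative choice and comparing against sympy’s own series expansion:

```python
# Short sympy sketch of a_n = f^(n)(0)/n! for the illustrative choice f(x) = sin(x),
# compared against sympy's own series expansion around 0.
import sympy as sp

x = sp.symbols('x')
f = sp.sin(x)

coeffs = [sp.diff(f, x, n).subs(x, 0) / sp.factorial(n) for n in range(8)]
print(coeffs)                    # [0, 1, 0, -1/6, 0, 1/120, 0, -1/5040]
print(sp.series(f, x, 0, 8))     # x - x**3/6 + x**5/120 - x**7/5040 + O(x**8)
```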
Example
Let’s See It in Action! Consider the series for $-\ln(1-x)$. If we differentiate it term by term, we should get the series for its derivative, $\frac{1}{1-x}$. Let’s check!
Consider $f(x) = \sum_{n=1}^{\infty} \frac{x^n}{n}$.
This series has a radius of convergence $\rho = 1$ (you can check this with the ratio test: $\left| \frac{x^{n+1}/(n+1)}{x^n/n} \right| = \frac{n}{n+1} |x| \to |x|$, so it converges for $|x| < 1$).
For $|x| < 1$: $f'(x) = \sum_{n=1}^{\infty} \frac{n \, x^{n-1}}{n} = \sum_{n=1}^{\infty} x^{n-1}$.
Let $m = n - 1$. As $n$ goes from $1$ to $\infty$, $m$ goes from $0$ to $\infty$. So, $f'(x) = \sum_{m=0}^{\infty} x^m$.
This is the geometric series, which sums to $\frac{1}{1-x}$ for $|x| < 1$.
This makes perfect sense: if $f(x) = -\ln(1-x)$, then $f'(x)$ should be $\frac{1}{1-x}$.
Since $f'(x) = \left( -\ln(1-x) \right)' = \frac{1}{1-x}$ on $(-1, 1)$, and $f(0) = -\ln(1-0) = 0$, we have $f(x) = -\ln(1-x)$ for $|x| < 1$.
And indeed, the series for $-\ln(1-x)$ centered at $x_0 = 0$ is $\sum_{n=1}^{\infty} \frac{x^n}{n}$.
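A quick numeric cross-check of this example (illustrative, evaluated at $x = \frac{1}{2}$): partial sums of $\sum_{n \geq 1} \frac{x^n}{n}$ approach $-\ln(1-x)$, and their term-by-term derivatives approach $\frac{1}{1-x}$:

```python
# Numeric cross-check (illustrative) at x = 0.5: partial sums of sum_{n>=1} x**n/n
# approach -log(1 - x), and their term-by-term derivatives approach 1/(1 - x).
import math

def partial_sum(x, N):
    return sum(x**n / n for n in range(1, N + 1))

def partial_sum_diff(x, N):                    # the series differentiated term by term
    return sum(x**(n - 1) for n in range(1, N + 1))

x0, N = 0.5, 50
print(partial_sum(x0, N), -math.log(1 - x0))   # both are approximately 0.693147
print(partial_sum_diff(x0, N), 1 / (1 - x0))   # both are approximately 2.0
```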
Solving the Clicker Question for $\arctan$ Elegantly
A Clever Shortcut for Derivatives! Remember that earlier “clicker question” about $f(x) = \arctan(x)$? Calculating higher derivatives directly can be tedious. But if we can find a power series for a related function (like $f'$), we can use the formula $a_n = \frac{f^{(n)}(x_0)}{n!}$ (applied to that related function) to pick off the derivative we need from the coefficients!
Let $f(x) = \arctan(x)$. We know that $f'(x) = \frac{1}{1 + x^2}$.
For $|x| < 1$ (which ensures $x^2 < 1$), we can expand $\frac{1}{1 + x^2} = \frac{1}{1 - (-x^2)}$ as a geometric series: $\frac{1}{1 - (-x^2)} = \sum_{k=0}^{\infty} (-x^2)^k = \sum_{k=0}^{\infty} (-1)^k x^{2k}$. So, $f'(x) = 1 - x^2 + x^4 - x^6 + \ldots$
This is the power series for $f'$ centered at $x_0 = 0$.
Let’s write out the coefficients $b_k$ of $f'(x) = \sum_{k=0}^{\infty} b_k x^k$:
$b_0 = 1$ (coefficient of $x^0$, from the term with $k = 0$ in $\sum (-1)^k x^{2k}$), $b_1 = 0$ (there is no $x^1$ term), $b_2 = -1$ (coefficient of $x^2$, from the term with $k = 1$), $b_3 = 0$ (there is no $x^3$ term), $b_4 = 1$ (coefficient of $x^4$, from the term with $k = 2$), and so on. Only even powers of $x$ appear.
We want $f'''(0)$.
Since $f'' = (f')'$, then $f''' = (f')''$, and so $f'''(0) = (f')''(0)$. So, we are looking for $(f')''(0)$.
From the corollary, for a power series $g(x) = \sum_{k=0}^{\infty} b_k x^k$, we have $g^{(n)}(0) = n! \, b_n$.
Applying this to our function $g = f'$ and its coefficients $b_k$: $(f')''(0) = 2! \, b_2$.
The coefficient $b_2$ (of $x^2$) in the series for $f'$ is $-1$. Therefore, $(f')''(0) = 2! \cdot (-1) = -2$. Thus, $f'''(0) = -2$. Much easier than differentiating three times!
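To double-check the answer, here is a small sympy sketch that computes $f'''(0)$ both by brute force and by reading off the series coefficient as above:

```python
# Sanity check with sympy: compute f'''(0) by brute force, and also read it off
# the series coefficient of x**2 in f'(x) = 1/(1 + x**2), as done above.
import sympy as sp

x = sp.symbols('x')
f = sp.atan(x)

print(sp.diff(f, x, 3).subs(x, 0))                           # -2 (direct differentiation)

fprime_series = sp.series(1 / (1 + x**2), x, 0, 6).removeO() # 1 - x**2 + x**4
b2 = fprime_series.coeff(x, 2)                               # b_2 = -1
print(sp.factorial(2) * b2)                                  # 2! * (-1) = -2
```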
Taylor Approximation: Using Derivatives to Build Polynomial Approximations
The Taylor Polynomial The fact that power series coefficients are $a_n = \frac{f^{(n)}(x_0)}{n!}$ is incredibly powerful. It suggests that if a function is smooth enough (has enough derivatives) at a point $x_0$, we can try to build a polynomial that “mimics” the function near $x_0$ by using these very derivatives. This special polynomial is called the Taylor polynomial. The more derivatives it matches, the better it “hugs” the original function near $x_0$.
Our discussion of power series and how their coefficients relate to derivatives, $a_n = \frac{f^{(n)}(x_0)}{n!}$, makes the following idea very natural:
If a function $f$ is sufficiently smooth (has enough derivatives) near a point $x_0$, perhaps we can approximate it well using a polynomial built from these derivatives.
The Taylor polynomial of degree $n$ for $f$ around $x_0$ is defined as: $$T_n f(x; x_0) := \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k$$ This polynomial is constructed with a specific goal: it must match the function and its first $n$ derivatives precisely at the point $x_0$.
- …
The intuition is that if a polynomial shares these fundamental characteristics (value, slope, concavity, rate of change of concavity, etc.) with the function at $x_0$, it should provide a good local approximation to $f$ when $x$ is near $x_0$. The higher the degree $n$, the more derivatives are matched, and generally, the better the approximation becomes over a wider range around $x_0$. This is the foundational idea behind Taylor series (which is $T_\infty f(x; x_0) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k$) and Taylor’s theorem, which provides a way to quantify the error in this approximation.
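As a closing illustration (the choice $f = \cos$ and $x_0 = 0$ is mine, not from the lecture), a short sympy sketch builds $T_n f(x; x_0)$ directly from the definition and shows the approximation error near the center shrinking as $n$ grows:

```python
# Sketch (illustrative choice f = cos, x0 = 0): build T_n f(x; x0) straight from the
# definition and watch the error near the center shrink as the degree n grows.
import sympy as sp

x = sp.symbols('x')
f, x0 = sp.cos(x), 0

def taylor_poly(f, x0, n):
    """Degree-n Taylor polynomial of f around x0, built from the derivatives at x0."""
    return sum(sp.diff(f, x, k).subs(x, x0) / sp.factorial(k) * (x - x0)**k
               for k in range(n + 1))

for n in (2, 4, 8):
    T = taylor_poly(f, x0, n)
    err = abs((f - T).subs(x, sp.Rational(1, 2)))   # error at x = 1/2, near the center
    print(n, sp.expand(T), sp.N(err))
```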
Continue here: 20 Taylor Approximation, Higher Derivative Test for Local Extrema, Riemann Integral, Darboux Sums and Integrals