What is the second derivative?, Part I

I’ve recently become interested in higher-order derivatives of vector-valued functions and, inspired by this answer by @Bye_World, I have some questions.

So first, how exactly do we define higher order derivatives of functions $f:\Bbb R^m \to \Bbb R^n$? I know that the first derivative is the linear function $L$ such that $$\lim_{h\to 0}\frac{\|f(x+h)-f(x)-L(h)\|_{\Bbb R^n}}{\|h\|_{\Bbb R^m}}=0$$

I asked a question earlier today about a possible limit definition of the second derivative but found that my guess at one didn’t work. Is there no better way than to define it recursively by $$\lim_{h\to 0}\frac{\|D^{n-1}f(x+h)-D^{n-1}f(x)-L(h)\|_{\Bbb R^n}}{\|h\|_{\Bbb R^m}}=0\, ?$$

And even in that formula, I don’t know how you take account of the fact that the $n$th derivative is evaluated at $n$ different points (or vectors?).

Also @Bye_World mentions “the isomorphism $\mathcal L(X,\mathcal L(X,Y)) \simeq \operatorname{Bil}(X\times X\to Y)$”. What does that mean and how is it related?


The isomorphism is relevant to the recursive definition. Let me generalize the setting a little so that certain things will become smoother.

Let $V,W$ be finite dimensional vector spaces over $\mathbb{R}$ and endow them with arbitrary norms $\|\cdot\|_V$ and $\|\cdot\|_W$. An important fact is that any two norms on a finite dimensional vector space are equivalent, which implies that everything we do in the following paragraphs is independent of the specific norms we have chosen.

Given a function $f \colon V \rightarrow W$, we say that $f$ is differentiable at $p \in V$ if there exists a linear map $L \colon V \rightarrow W$ such that

$$ \lim_{h \to 0} \frac{\|f(p + h) - f(p) - L(h)\|_{W}}{\|h\|_{V}} = 0. $$

The map $L$ is unique when it exists. It is the best linear approximation to $f(x) - f(p)$ near $x = p$, in the sense that $f(x) - f(p) \approx L(x - p)$, and it is denoted by $Df(p)$.
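To see the limit in this definition at work, here is a minimal numerical sketch. The map `f`, the point `p` and the direction are hypothetical choices of mine, with $V = W = \mathbb{R}^2$ and the Euclidean norm; `L` is the hand-computed Jacobian of `f` applied to `h`. The ratio from the definition should shrink as $h$ shrinks.

```python
import math

def f(x, y):
    # Hypothetical example map f: R^2 -> R^2
    return (x * x * y, math.sin(x) + y)

def L(p, h):
    # Candidate derivative Df(p) applied to h (Jacobian of f at p times h)
    x, y = p
    return (2 * x * y * h[0] + x * x * h[1],
            math.cos(x) * h[0] + h[1])

def norm(v):
    # Euclidean norm; any other norm would give the same limit
    return math.sqrt(sum(c * c for c in v))

def remainder_ratio(p, h):
    # ||f(p + h) - f(p) - L(h)|| / ||h||
    fp = f(*p)
    fph = f(p[0] + h[0], p[1] + h[1])
    Lh = L(p, h)
    r = tuple(a - b - c for a, b, c in zip(fph, fp, Lh))
    return norm(r) / norm(h)

p = (1.0, 2.0)
direction = (1.0, -2.0)
ratios = [remainder_ratio(p, (t * direction[0], t * direction[1]))
          for t in (1e-1, 1e-2, 1e-3)]
print(ratios)  # the ratios decrease roughly in proportion to t
```

This only samples one direction, so it illustrates the definition rather than proving differentiability.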

Now let us assume that $f$ is differentiable at each $p \in V$. Then we get a map $Df \colon V \rightarrow \mathcal{L}(V,W)$ which assigns to each point $p \in V$ the derivative $Df(p)$. Since the target space $\mathcal{L}(V,W)$ is itself a finite dimensional vector space, we can repeat the process above (endowing $\mathcal{L}(V,W)$ with an arbitrary norm) and ask whether $Df$ is differentiable at $p \in V$. This will be the case if we can find a linear map $L \colon V \rightarrow \mathcal{L}(V,W)$ such that

$$ \lim_{h \to 0} \frac{\| (Df)(p + h) - (Df)(p) - L(h) \|_{\mathcal{L}(V,W)}}{\|h\|_{V}} = 0. $$

Note that for each $h$, the expression $(Df)(p + h) - (Df)(p) - L(h)$ is itself a linear map from $V$ to $W$. If such a map $L$ exists, we denote it by $(D^2 f)(p)$. This is something that takes time to get used to. The first derivative $(Df)(p)$ is a linear map from $V$ to $W$ but the second derivative is a linear map from $V$ to the vector space $\mathcal{L}(V,W)$ of linear maps between $V$ and $W$!
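As a small worked example (the function is my own illustration, not taken from the question), let $V = \mathbb{R}^2$, $W = \mathbb{R}$ and $f(x,y) = x^2 y$. At $p = (x,y)$ the first derivative is the linear map

$$ (Df)(p)(h) = 2xy\,h_1 + x^2\,h_2. $$

Differentiating the map $p \mapsto (Df)(p)$ in the direction $k = (k_1,k_2)$ gives

$$ \big((D^2 f)(p)(k)\big)(h) = (2y\,k_1 + 2x\,k_2)\,h_1 + 2x\,k_1\,h_2, $$

which is linear in $k$ and, for each fixed $k$, is a linear map in $h$. The remainder in the definition is the linear map

$$ \big((Df)(p+k) - (Df)(p) - (D^2f)(p)(k)\big)(h) = 2k_1k_2\,h_1 + k_1^2\,h_2, $$

whose norm is of order $\|k\|^2$, so after dividing by $\|k\|$ it tends to $0$ as required.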

How can we wrap our heads around this strange beast? Let us see first what this definition gives us in the case of a scalar function $f \colon V \rightarrow \mathbb{R}$. In this case, the first derivative $Df \colon V \rightarrow \mathcal{L}(V,\mathbb{R}) = V^{*}$ has the interpretation of a directional derivative. If we fix a point $p \in V$ and a vector $v \in V$, the quantity $((Df)(p))(v) = Df|_{p}(v)$ will give us the directional derivative of $f$ at $p$ in the direction of $v$:

$$ Df|_p(v) = \lim_{t \to 0} \frac{f(p + tv) - f(p)}{t} = \frac{d}{dt} f(p + tv)\Big|_{t = 0}. $$

The second derivative at a point $p \in V$ will be a linear map $D^2 f|_{p} \colon V \rightarrow \mathcal{L}(V, \mathbb{R})$, that is, an element of $\mathcal{L}(V, \mathcal{L}(V, \mathbb{R}))$. Now, elements of $\mathcal{L}(V, \mathcal{L}(V, \mathbb{R}))$ can be identified with bilinear forms on $V$ using the identification

$$ S \in \mathcal{L}(V, \mathcal{L}(V, \mathbb{R})) \mapsto B_S(v,w) = (S(v))(w). $$
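For readers who think in code, this identification is just currying and uncurrying. The sketch below is only an illustration: the names `curry`, `uncurry` and the matrix `A` are hypothetical, and the bilinear form is taken to be $B(v,w) = v^{T} A w$ on $\mathbb{R}^2$.

```python
def curry(B):
    """Turn a two-argument map B(v, w) into S with S(v)(w) = B(v, w)."""
    return lambda v: (lambda w: B(v, w))

def uncurry(S):
    """Turn S (a map that returns maps) into B_S(v, w) = S(v)(w)."""
    return lambda v, w: S(v)(w)

# Hypothetical bilinear form on R^2 given by a matrix A: B(v, w) = v^T A w
A = [[4.0, 2.0], [2.0, 12.0]]

def B(v, w):
    return sum(v[i] * A[i][j] * w[j] for i in range(2) for j in range(2))

S = curry(B)          # S lives in L(V, L(V, R)): feed it v, get a linear functional
v, w = (1.0, -1.0), (2.0, 3.0)
print(S(v)(w), uncurry(S)(v, w), B(v, w))  # all three values agree
```

The two operations are inverse to each other, which is exactly the content of the isomorphism.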

Thus, we can think of $D^2 f|_{p}$ as something that eats two vectors $v,w \in V$ and returns a scalar. This can again be interpreted in terms of directional derivatives. We first take the directional derivative of $f$ in the direction $w$ and get a function $q \mapsto Df|_{q}(w)$ of $q$. Then we take the directional derivative of this function at $q = p$ in the direction $v$ and get $D^2f|_p(v,w)$. Taking $V = \mathbb{R}^n$ with its standard basis $e_1, \dots, e_n$, we have $D^2f|_p(e_i, e_j) = \frac{\partial^2 f}{\partial x_i \partial x_j}(p)$, so the second derivative at a point $p$ is represented with respect to the standard basis by the Hessian matrix

$$ H(p) = \left( \frac{\partial^2 f}{\partial x_i \partial x_j}(p) \right)_{i,j=1}^n $$

of the second partial derivatives of $f$. If $f$ is sufficiently nice (for instance, twice continuously differentiable), the bilinear form $D^2f|_{p}$ will be symmetric. This corresponds to the fact that $H(p)$ is symmetric, or, in other words, that the mixed partial derivatives commute.
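Here is a minimal numerical sketch of this, under assumptions of my own: the function `f`, the point `p` and the vectors `v`, `w` are hypothetical, and the second derivative is estimated with a central finite difference rather than computed exactly. It compares the estimate of $D^2f|_p(v,w)$ with $v^{T} H(p)\, w$ and checks that swapping $v$ and $w$ gives the same value.

```python
def f(x, y):
    # Hypothetical scalar function on R^2
    return x * x * y + y ** 3

def second_derivative(f, p, v, w, eps=1e-3):
    """Central-difference estimate of D^2 f|_p(v, w)."""
    def at(a, b):
        # f evaluated at p + eps * (a*v + b*w)
        return f(p[0] + eps * (a * v[0] + b * w[0]),
                 p[1] + eps * (a * v[1] + b * w[1]))
    return (at(1, 1) - at(1, -1) - at(-1, 1) + at(-1, -1)) / (4 * eps * eps)

def hessian(p):
    # Hand-computed Hessian of f at p = (x, y)
    x, y = p
    return [[2 * y, 2 * x], [2 * x, 6 * y]]

def bilinear(H, v, w):
    # v^T H w
    return sum(v[i] * H[i][j] * w[j] for i in range(2) for j in range(2))

p = (1.0, 2.0)
v, w = (1.0, -1.0), (2.0, 3.0)
numeric_vw = second_derivative(f, p, v, w)
numeric_wv = second_derivative(f, p, w, v)
exact = bilinear(hessian(p), v, w)
print(numeric_vw, numeric_wv, exact)
```

The finite-difference step `eps` is a judgment call: too large and the truncation error grows, too small and rounding error takes over.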

Similarly, $D^3f|_{p}$ can be identified with a trilinear map $V \times V \times V \rightarrow \mathbb{R}$ which describes how we differentiate $f$ successively in three directions $u,v,w$. Again, if $f$ is nice enough (for instance, three times continuously differentiable), the order of the directions $u,v,w$ does not matter.

The analogue of the Taylor expansion for such a scalar function $f$ will be

$$ f(p + h) = f(p) + Df|_p(h) + \frac{1}{2} D^2f|_{p}(h,h) + \frac{1}{3!} D^3f|_{p}(h,h,h) + \dots $$

where $D^2 f|_{p}(h,h)$ gives us the “quadratic part” (a symmetric bilinear form $B$ determines, and is determined by, the quadratic form $h \mapsto B(h,h)$), $D^3f|_{p}(h,h,h)$ gives us the third order part and so on.
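A quick way to convince yourself of the expansion is to check how the error of the second order truncation scales. The sketch below uses a hypothetical function $f(x,y) = e^x \sin y$ with hand-computed gradient and Hessian. If the expansion is right, shrinking $h$ by a factor of $10$ should shrink the error by roughly a factor of $1000$.

```python
import math

def f(x, y):
    # Hypothetical smooth scalar function on R^2
    return math.exp(x) * math.sin(y)

def taylor2(p, h):
    """f(p) + Df|_p(h) + (1/2) D^2 f|_p(h, h), with hand-computed derivatives."""
    x, y = p
    ex, s, c = math.exp(x), math.sin(y), math.cos(y)
    grad = (ex * s, ex * c)
    H = [[ex * s, ex * c], [ex * c, -ex * s]]
    first = grad[0] * h[0] + grad[1] * h[1]
    second = sum(h[i] * H[i][j] * h[j] for i in range(2) for j in range(2))
    return f(x, y) + first + 0.5 * second

def error(p, t, direction=(1.0, 1.0)):
    # |f(p + h) - second order Taylor polynomial| for h = t * direction
    h = (t * direction[0], t * direction[1])
    return abs(f(p[0] + h[0], p[1] + h[1]) - taylor2(p, h))

p = (0.0, 1.0)
err_big = error(p, 1e-1)
err_small = error(p, 1e-2)
print(err_big, err_small, err_small / err_big)  # ratio is on the order of 1e-3
```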

Finally, for general vector-valued functions, the second derivative can be identified with a “$W$-valued bilinear form”. Namely, we can identify maps in $\mathcal{L}(V, \mathcal{L}(V,W))$ with bilinear maps $V \times V \rightarrow W$ and again, $D^2 f|_{p}(v,w)$ will give us the iterated directional derivative of $f$ in the directions $v$ and then $w$ (or the other way around), only now the result will be a vector in $W$ (as $f$ is a vector-valued function).
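To illustrate the vector-valued case, here is a minimal sketch with a hypothetical map $f \colon \mathbb{R}^2 \to \mathbb{R}^2$, $f(x,y) = (x^2 y,\, x y^2)$. The second derivative is again only estimated, by a central finite difference applied to each component, and the output is a vector in $W = \mathbb{R}^2$.

```python
def f(x, y):
    # Hypothetical map f: R^2 -> R^2
    return (x * x * y, x * y * y)

def second_derivative(f, p, v, w, eps=1e-3):
    """Central-difference estimate of D^2 f|_p(v, w), a vector in W."""
    def at(a, b):
        # f evaluated at p + eps * (a*v + b*w)
        return f(p[0] + eps * (a * v[0] + b * w[0]),
                 p[1] + eps * (a * v[1] + b * w[1]))
    pp, pm, mp, mm = at(1, 1), at(1, -1), at(-1, 1), at(-1, -1)
    return tuple((a - b - c + d) / (4 * eps * eps)
                 for a, b, c, d in zip(pp, pm, mp, mm))

p = (1.0, 2.0)
e1, e2 = (1.0, 0.0), (0.0, 1.0)
mixed_12 = second_derivative(f, p, e1, e2)
mixed_21 = second_derivative(f, p, e2, e1)
print(mixed_12, mixed_21)
```

Each component of the result is the mixed partial derivative of the corresponding component of $f$, here $2x$ and $2y$ evaluated at $p$.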