How to evaluate the derivatives of matrix inverse?

Cliff Taubes wrote in his differential geometry book that:

We now calculate the directional derivatives of the map $$M\rightarrow M^{-1}$$ Let $\alpha\in M(n,\mathbb{R})$ denote any given matrix. Then the directional derivatives of the coordinates of the map $M\rightarrow M^{-1}$ in the direction $\alpha$ are the entries of the matrix $$-M^{-1}\alpha M^{-1}$$ Consider, for example, the coordinate given by the $(i,j)$th entry, $(M^{-1})_{ij}$. The directional derivative in the direction $\alpha$ of this function on $GL(n,\mathbb{R})$ is $$-(M^{-1}\alpha M^{-1})_{ij}$$ In particular, the partial derivative of the function $M\rightarrow (M^{-1})_{ij}$ with respect to the coordinate $M_{rs}$ is $-(M^{-1})_{ir}(M^{-1})_{sj}$.

I am wondering why this is true. He did not give any derivation of this formula, and none of the formulas I know for the matrix inverse produces anything similar to his result. So I venture to ask.


Not sure if this is the type of answer you want, since I’m giving another argument rather than explaining his. However, this is how I usually think of it.

Let $M$ be a matrix and $\delta M$ the infinitesimal perturbation (e.g. $\epsilon$ times the derivative). Now, let $N=M^{-1}$ and $\delta N$ the corresponding perturbation of the inverse so that $N+\delta N=(M+\delta M)^{-1}$. Including only first order perturbations (i.e. ignoring terms with two $\delta$s), this gives
$$
\begin{split}
I=&(M+\delta M)(N+\delta N)=MN+M\,\delta N+\delta M\,N\\
&\implies M\,\delta N=-\delta M\,N=-\delta M\,M^{-1}\\
&\implies \delta N=-M^{-1}\,\delta M\,M^{-1}.\\
\end{split}
$$
Written in terms of derivatives, i.e. $M'=dM/ds$ and $N'=dN/ds$ where $M=M(s)$ and $N=N(s)$ and $M(s)N(s)=I$, the same would be written
$$
0=I'=(MN)'=M'N+MN'\implies N'=-M^{-1}\,M'\,M^{-1}.
$$
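
As a quick sanity check of the formula above, here is a minimal numerical sketch in plain Python. It compares a central finite difference of $s\mapsto (M+s\,\alpha)^{-1}$ at $s=0$ with $-M^{-1}\alpha M^{-1}$ for one $2\times 2$ example. The matrices `M` and `ALPHA` and all helper names are made up for illustration; this is a spot check under those assumptions, not a proof.

```python
# Finite-difference check of d(M^{-1}) = -M^{-1} (dM) M^{-1} for a 2x2 example.
# All names and the sample matrices are illustrative assumptions.

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inv2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def shifted(m, alpha, s):
    """Return M + s * alpha."""
    return [[m[i][j] + s * alpha[i][j] for j in range(2)] for i in range(2)]

def derivative_fd(m, alpha, eps=1e-6):
    """Central difference of s -> (M + s*alpha)^{-1} at s = 0."""
    plus = inv2(shifted(m, alpha, eps))
    minus = inv2(shifted(m, alpha, -eps))
    return [[(plus[i][j] - minus[i][j]) / (2 * eps) for j in range(2)]
            for i in range(2)]

def derivative_formula(m, alpha):
    """Closed form -M^{-1} alpha M^{-1}."""
    n = inv2(m)
    prod = matmul(matmul(n, alpha), n)
    return [[-x for x in row] for row in prod]

M = [[2.0, 1.0], [0.5, 3.0]]        # invertible: det = 5.5
ALPHA = [[0.5, -1.0], [2.0, 0.25]]  # arbitrary direction

fd = derivative_fd(M, ALPHA)
exact = derivative_formula(M, ALPHA)
max_err = max(abs(fd[i][j] - exact[i][j]) for i in range(2) for j in range(2))
print(max_err)  # should be tiny compared with the entries of `exact`
```

Note that the order of the factors matters: replacing the formula by $-\alpha\,M^{-2}$ would generally not match the finite difference, since $\alpha$ and $M^{-1}$ need not commute.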


To address some of the comments, although a bit belatedly:

For example, if you let $M(s)=M+s\Delta M$, this makes the derivative $M'(s)=\Delta M$ for all $s$. This makes $N(s)=M(s)^{-1}=(M+s\Delta M)^{-1}$, and you can use $M(s)\cdot N(s)=I$, and differentiate to get the above expressions.

For any partial derivative, e.g. with respect to $M_{rs}$, just set $\Delta M$ to be the matrix $E^{[rs]}$ with $1$ in cell $(r,s)$ and zero elsewhere, and you get
$$
\frac{\partial}{\partial M_{rs}} M^{-1}
= -M^{-1}\frac{\partial M}{\partial M_{rs}} M^{-1}
= -M^{-1} E^{[rs]} M^{-1}
$$
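Taking the $(i,j)$ entry and using $(E^{[rs]})_{kl}=\delta_{kr}\delta_{ls}$, the double sum $\sum_{k,l}(M^{-1})_{ik}(E^{[rs]})_{kl}(M^{-1})_{lj}$ collapses to a single term, which gives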
$$
\frac{\partial (M^{-1})_{ij}}{\partial M_{rs}}
= -(M^{-1})_{ir}(M^{-1})_{sj}.
$$
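
The index placement in this last formula is easy to get wrong, so here is a small sketch in plain Python that checks all sixteen partial derivatives for one non-symmetric $2\times 2$ matrix by perturbing a single entry $M_{rs}$. The sample matrix and the helper names are assumptions chosen for illustration only.

```python
# Check d(M^{-1})_{ij} / dM_{rs} = -(M^{-1})_{ir} (M^{-1})_{sj} entry by entry.
# The sample matrix and helper names are illustrative assumptions.
from itertools import product

def inv2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def partial_fd(m, i, j, r, s, eps=1e-6):
    """Central difference of (M^{-1})_{ij} with respect to the entry M_{rs}."""
    plus = [row[:] for row in m]
    minus = [row[:] for row in m]
    plus[r][s] += eps    # M + eps * E^{[rs]}
    minus[r][s] -= eps   # M - eps * E^{[rs]}
    return (inv2(plus)[i][j] - inv2(minus)[i][j]) / (2 * eps)

M = [[2.0, 1.0], [0.5, 3.0]]  # non-symmetric, so swapped indices would show up
N = inv2(M)

worst = 0.0
for i, j, r, s in product(range(2), repeat=4):
    formula = -N[i][r] * N[s][j]
    worst = max(worst, abs(partial_fd(M, i, j, r, s) - formula))
print(worst)  # largest deviation over all (i, j, r, s)
```

A non-symmetric matrix is used on purpose: for a symmetric $M$, a formula with transposed indices such as $-(M^{-1})_{ri}(M^{-1})_{js}$ would pass the same check.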

I have the following result. I am assuming you already proved that the inversion map (I will call it $f$) is differentiable. We will look at the total derivative $Df(A)$ at $A\in GL(n,\mathbb{R})$.

Take the identity map $Id:GL(n,\mathbb{R})\to GL(n,\mathbb{R}):A\mapsto A$ and the map $g:GL(n,\mathbb{R})\to GL(n,\mathbb{R}):A\mapsto A\cdot A^{-1}=I_n$. Note that the derivative of $Id$ is $DId(A)(H)=Id(H)=H$ for $A\in GL(n,\mathbb{R})$ and $H\in M(n,\mathbb{R})$, since $Id$ is the restriction of a linear map. Furthermore, note that $g=Id\cdot f$ and that, since $g$ is a constant map, its derivative is zero. Here I use the following result that I will prove later on:

Let $h,k:GL(n,\mathbb{R})\to GL(n,\mathbb{R})$ be differentiable at $A\in GL(n,\mathbb{R})$. Then $$D(h\cdot k)(A)(H)=Dh(A)(H)k(A)+h(A)Dk(A)(H)\;\text{for}\; H\in M(n,\mathbb{R})$$
From this follows:
$$Dg(A)(H)=DId(A)(H)f(A)+Id(A)Df(A)(H)$$
$$0=H\cdot f(A)+A\cdot Df(A)(H)$$
$$-H\cdot A^{-1}=A\cdot Df(A)(H)$$
$$-A^{-1}HA^{-1}=Df(A)(H)$$
This is the desired result. Now we have to show that the product rule I used is true. I will prove it for functions on an open subset of $\mathbb{R}^m$; since the space of $n\times m$ matrices is isomorphic as a vector space to $\mathbb{R}^{nm}$, and $GL(n,\mathbb{R})$ is an open subset of $M(n,\mathbb{R})\cong\mathbb{R}^{n^2}$, it also holds for matrices. Input is welcome, but here it goes:

Suppose we have two functions $f:U\to\mathbb{R}^{n_1n_2}$ and $g:U\to\mathbb{R}^{n_2n_3}$ that are differentiable at $x_0$ with $U\subset\mathbb{R}^m$ an open subset. Define $h:\mathbb{R}^{n_1n_2}\times\mathbb{R}^{n_2n_3}\to\mathbb{R}^{n_1n_3}:(x,y)\mapsto xy$, where $x$ and $y$ are viewed as $n_1\times n_2$ and $n_2\times n_3$ matrices. Note that $h$ is bilinear and thus is differentiable with derivative: $Dh(x,y)(v,w)=h(v,y)+h(x,w)=vy+xw$ (nice exercise to prove this).

We define $k:U\to\mathbb{R}^{n_1n_2}\times\mathbb{R}^{n_2n_3}:x\mapsto (f(x),g(x))$. Note that $k$ is differentiable at $x_0$ if and only if its components are. But its components are $f$ and $g$, which are differentiable at $x_0$ by assumption, thus $k$ is differentiable at $x_0$. Similarly, the derivative of $k$ is the pair of derivatives of its components, $Dk(x_0)=(Df(x_0),Dg(x_0))$.

By the Chain Rule $h\circ k$ is differentiable at $x_0$ with derivative: $$D(h\circ k)(x_0)=Dh(k(x_0))\circ Dk(x_0)$$
$$D(h\circ k)(x_0)=Dh((f(x_0),g(x_0)))\circ (Df(x_0),Dg(x_0))$$
$$D(h\circ k)(x_0)(v)=Df(x_0)(v)\,g(x_0)+f(x_0)\,Dg(x_0)(v)\;\text{for}\; v\in\mathbb{R}^m$$
The last part was obtained by using the identity for the derivative of bilinear maps I gave earlier.
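
To see the product rule in action, here is a rough sketch in plain Python for the simplest case $m=1$, where $Df(x_0)(v)=f'(x_0)\,v$. The two matrix-valued functions `f` and `g` below are arbitrary examples I made up (with their derivatives written out by hand), and the check compares $f'(x_0)g(x_0)+f(x_0)g'(x_0)$ against a central finite difference of $x\mapsto f(x)g(x)$.

```python
# Numerical check of the product rule D(f g) = (Df) g + f (Dg) for
# matrix-valued functions of one real variable. f and g are made-up examples.
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def f(x):
    return [[x, x * x], [1.0, math.sin(x)]]

def df(x):
    """Entrywise derivative of f, computed by hand."""
    return [[1.0, 2 * x], [0.0, math.cos(x)]]

def g(x):
    return [[math.cos(x), 2 * x], [x ** 3, 1.0]]

def dg(x):
    """Entrywise derivative of g, computed by hand."""
    return [[-math.sin(x), 2.0], [3 * x * x, 0.0]]

def product_rule(x):
    """f'(x) g(x) + f(x) g'(x), keeping the order of the factors."""
    left = matmul(df(x), g(x))
    right = matmul(f(x), dg(x))
    return [[left[i][j] + right[i][j] for j in range(2)] for i in range(2)]

def product_fd(x, eps=1e-6):
    """Central difference of x -> f(x) g(x)."""
    plus = matmul(f(x + eps), g(x + eps))
    minus = matmul(f(x - eps), g(x - eps))
    return [[(plus[i][j] - minus[i][j]) / (2 * eps) for j in range(2)]
            for i in range(2)]

X0 = 0.7
rule = product_rule(X0)
fd = product_fd(X0)
err = max(abs(rule[i][j] - fd[i][j]) for i in range(2) for j in range(2))
print(err)  # should be small relative to the entries of `rule`
```

Since matrix multiplication is not commutative, the derivative factor has to stay on the side where the original function stood, which is exactly what the formula $vy+xw$ for $Dh$ encodes.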

Hope this is clear and any additions to the solution are welcome!