If matrices $A$ and $B$ commute, $A$ with distinct eigenvalues, then $B$ is a polynomial in $A$

If $A\in M_{n}$ has $n$ distinct eigenvalues and if $A$ commutes with a given matrix $B\in M_{n}$, how can I show that $B$ is a polynomial in $A$ of degree at most $n-1$? I think first I need to show that $A$ and $B$ are simultaneously diagonalizable, but I don’t know how to begin. Any advice is much appreciated.

Solutions Collecting From Web of "If matrices $A$ and $B$ commute, $A$ with distinct eigenvalues, then $B$ is a polynomial in $A$"

At the question Why does a diagonalization of a matrix B with the basis of a commuting matrix A give a block diagonal matrix? you can see why a basis consisting of eigenvectors for $A$ will also diagonalize $B$.

Since all matrices that commute with $A$ are diagonalized by a basis of eigenvectors for $A$, and conversely, any matrix diagonalized by a basis of eigenvectors for $A$ commutes with $A$, the set of matrices commuting with $A$ forms a vector subspace of $M_n$ of dimension $n$. The matrices $I,A,A^2,\ldots,A^{n-1}$ lie in this subspace. You can show that these matrices are linearly independent using the facts that $A$ has $n$ distinct eigenvalues, and that a nonzero polynomial of degree at most $n-1$ over a field can have at most $n-1$ zeros. I hope that helps.

In the setting of the question, the polynomial $g\in K[X]$ of degree $ < n$ such that $B=g(A)$ is given by Lagrange’s Interpolation Formula. [Here $K$ is the ground field.] Indeed, letting $a_i$ and $b_i$ be the eigenvalues of $A$ and $B$ (numbered in a coherent way), we have
$$
g(X)=\sum_{i=1}^n\ b_i\ \prod_{j\not=i}\ \frac{X-a_j}{a_i-a_j}\quad.
$$

More generally, let $A,B\in M_n(K)$ be two $n$ by $n$ matrices with coefficients in a field $K$, let $f\in K[X]$ be the minimal polynomial of $A$, and let $d$ be its degree. Clearly:

(1) If $B$ is a polynomial in $A$, then there is a unique polynomial $g\in K[X]$ of degree less that $d$ such that $B=g(A)$.

Let $e_1,\dots,e_r$ be the minimal idempotents of $K[A]\simeq K[X]/(f)$ [canonical isomorphism], and put $V_i:=e_iK^n$. Then $K^n$ is the direct sum of the $V_i$. Let $A_i$ be the restriction of $A$ to $V_i$. Then there are distinct monic irreducible polynomials $f_i$, and there are positive integers $m_i$, such that $K[A_i]\simeq K[X]/(f_i^{m_i})$ [canonical isomorphism], and $f$ is the product of the $f_i^{m_i}$.

Assume that $A$ and $B$ commute. Then $BV_i\subset V_i$ for all $i$. Let $B_i$ be the restriction of $B$ to $V_i$.

Then $B$ is a polynomial in $A$ if and only if each $B_i$ is a polynomial in $A_i$.

More precisely, if $B_i=g_i(A_i)$, the polynomial $g$ of (1) is the unique degree less than $d$ solution to the congruences
$$
g\equiv g_i\ \bmod\ f_i^{m_i}
$$
(see the Chinese Remainder Theorem).

If the eigenvalues of $A$ are in $K$ (as can always be assumed), the $f_i$ have degree $1$, and the congruences can be solved by Taylor’s Formula. More precisely, if $f_i=X-a_i$, then
$$
g(X)=\sum_{i=1}^r\ T_i\left(g(X)\frac{(X-a_i)^{m_i}}{f(X)}\right)\frac{f(X)}{(X-a_i)^{m_i}}\quad,
$$
where $T_i(h(X))$ means “degree less than $m_i$ Taylor approximation of $h(X)$ at $X=a_i$”. [Note that $V_i$ is the $a_i$-generalized eigenspace.]

First, if $B$ commutes with $A$, then every eigenspace for $A$ is $B$-stable (see here for a very simple proof). Since these eigenspaces are by assumption $1$-dimensional, any eigenvector for$~A$ is an eigenvector for$~B$ as well. So any change of basis diagonalising $A$ will simultaneously diagonalise$~B$. (But $B$ might have repeated eigenvalues, in which case the converse is not true.)

To show that $B$ is a polynomial in$~A$, we may first perform such a change of basis to reduce to the case where $A$ is diagonal; we need to show that then every diagonal matrix is a polynomial in$~A$. The kernel of the map $P\mapsto P[A]$ from polynomials$~P$ to matrices is generated by the minimal polynomial $(X-\lambda_1)\ldots(X-\lambda_n)$ of$~A$, which has degree$~n$. The restriction of that map to polynomials of degree less than$~n$ is therefore injective, and it maps to the space$~D_n$ of diagonal $n\times n$ matrices which also has dimension$~n$; the map is therefore surjective (so bijective) to$~D_n$, and $B=P[A]$ for some $P$.