Is the proof of this lemma really necessary?

To prove the Cayley-Hamilton theorem in linear algebra, my professor said that a lemma was necessary:

Lemma: Let $A \in M_n(\mathbb{K})$ be an $n\times n$ matrix over a field $\mathbb{K}$, let $b(t) \in M_n(\mathbb{K})[t]$, and let $P(t) = b(t)[A-tI]$. Then $P(A) = 0$.

The theorem (which says that if $f$ is an endomorphism of $V$, then $f$ is a solution to its characteristic polynomial) was then proven thus:

Let $B(t) = \text{adj}[A-tI]$ and $P(t) = B(t)[A-tI]$. Then $P(A)=0$ by the lemma, but also $P(t) = \delta I$ (where $\delta = \det(A-tI)$). Since $\delta = \chi_f(t)$, it follows from $P(A) = 0$ that $\chi_f(A) = 0$.

My question is: since we interpret the $P(t)$ of the theorem as a polynomial with matrix coefficients, isn’t the whole thing kind of obvious from the properties of a polynomial ring? (Assuming we all know how to switch between matrices and endomorphisms.)


The reason we need the lemma is that from $P(t)=b(t)(A-tI)$ one cannot directly conclude that $P(A)=b(A)(A-AI)$.

If $R$ is a commutative ring, then there is a natural map $R[t]\to R^R$ which is a ring homomorphism (we endow $R^R$ with the pointwise ring structure: $(f+g)(r) = f(r)+g(r)$ and $(fg)(r) = f(r)g(r)$ for every $r\in R$). So if $p(t)=q(t)s(t)$, then for every $r\in R$ you have $p(r)=q(r)s(r)$.

But this doesn’t work if $R$ is not commutative. For example, taking $p(t) = at$, $q(t) = t$ and $s(t)=a$, you have $p(t)=q(t)s(t)$ in $R[t]$ (since $t$ is central in $R[t]$ even when $R$ is not commutative), but $p(r) = ar$ while $q(r)s(r) = ra$. So you get $p(r)=q(r)s(r)$ if and only if $a$ and $r$ commute. Thus, while you can certainly define a map $\psi\colon R[t]\to R^R$ by
$$\psi(a_0+a_1t+\cdots+a_nt^n)(r) = a_0 + a_1r + \cdots + a_nr^n,$$
this map is not a ring homomorphism when the ring is not commutative. This is the situation we have here, where the ring $R$ is the ring of $n\times n$ matrices over $\mathbb{K}$, which is not commutative when $n\gt 1$. In particular, from $P(t) = B(t)(A-tI)$ one cannot simply conclude that $P(A)=B(A)(A-AI)$. Doing so implicitly assumes that your map $M_n(\mathbb{K})[t]\to M_n(\mathbb{K})^{M_n(\mathbb{K})}$ is multiplicative, which it is not in this case.
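The example with $p(t)=at$, $q(t)=t$, $s(t)=a$ can be checked numerically. The following is a minimal sketch in $M_2(\mathbb{Q})$; the helper `matmul` and the particular matrices `a` and `r` are illustrative choices made here, not part of the original argument.

```python
def matmul(x, y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Assumed, illustrative choices: a coefficient a and an evaluation point r
# that do not commute with each other.
a = [[0, 1], [0, 0]]
r = [[0, 0], [1, 0]]

# p(t) = a t, so psi(p)(r) = a r (coefficients stay to the left of powers of r).
p_at_r = matmul(a, r)

# q(t) = t and s(t) = a, so psi(q)(r) * psi(s)(r) = r a.
q_times_s_at_r = matmul(r, a)

print(p_at_r)          # a r
print(q_times_s_at_r)  # r a
print(p_at_r == q_times_s_at_r)
```

Although $p(t)=q(t)s(t)$ holds in $R[t]$, the two evaluated values differ here, because `a` and `r` do not commute.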

If your $A$ happens to be central in $M_n(\mathbb{K})$, then it is true that the induced map $M_n(\mathbb{K})[t]\to M_n(\mathbb{K})$ is a homomorphism. But then you would be assuming that your $A$ is a scalar multiple of the identity. It would also be true if the coefficients of the polynomial $b(t)$ centralize $A$, but you are not assuming that. So you do need to prove that in this case you have $P(A)=b(A)(A-AI)$, since it does not follow from the general set-up (the way it would in a commutative setting).
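For the lemma itself, the computation is short once the evaluation convention is fixed. As a sketch, assume (as in the map $\psi$ above) that evaluation keeps coefficients to the left of powers of $A$, and write $b(t)=\sum_{i=0}^m b_i t^i$ with $b_i\in M_n(\mathbb{K})$; the names $m$ and $b_i$ are notation introduced here. Then
$$\begin{aligned}
P(t) &= b(t)(A-tI) = \sum_{i=0}^m b_iA\,t^i - \sum_{i=0}^m b_i\,t^{i+1},\\
P(A) &= \sum_{i=0}^m b_iA\,A^i - \sum_{i=0}^m b_i\,A^{i+1} = 0.
\end{aligned}$$
The cancellation only uses that $A$ commutes with its own powers; it does not need $A$ to commute with the coefficients $b_i$.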

P.S. In fact, this is the subtle point where the proof that a polynomial of degree $n$ over a field has at most $n$ roots breaks down for skew fields/division rings. If $K$ is a division ring, then the division algorithm holds for polynomials with coefficients in $K$, so one can show that for every $p(t)\in K[t]$ and $a(t)\in K[t]$, $a(t)\neq 0$, there exist unique $q(t)$ and $r(t)$ such that $p(t)=q(t)a(t) + r(t)$ and $r(t)=0$ or $\deg(r)\lt \deg(a)$. From this, we can deduce that for every polynomial $p(t)$ and for every $a\in K$, we can write $p(t) = q(t)(t-a) + r$, where $r\in K$. But the proof of the Remainder and Factor Theorems no longer goes through, because we cannot go from $p(t)=q(t)(t-a)+r$ to $p(a)=q(a)(a-a)+r$; and you cannot get the recursion argument to work, because from $p(t)=q(t)(t-a)$, and $p(b)=0$ with $b\neq a$, you cannot deduce that $q(b)=0$. For instance, over the real quaternions, we have $p(t)=t^2+1=(t+i)(t-i)$, but $p(j)=j^2+1=0$ while $(j+i)(j-i) = ij-ji = 2k \neq 0$. I remember that when I first learned the corresponding theorems for polynomial rings, the professor challenged us to identify all the field axioms used in the proofs of the Remainder and Factor Theorems; none of us spotted the use of commutativity in the evaluation map.
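The quaternion example can be verified mechanically. This is a small sketch that stores a quaternion $w + xi + yj + zk$ as a tuple `(w, x, y, z)`; the helpers `qmul`, `qadd` and `qsub` are hypothetical names introduced here for illustration.

```python
def qmul(p, q):
    """Hamilton product of two quaternions stored as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def qadd(p, q):
    """Componentwise sum of two quaternions."""
    return tuple(u + v for u, v in zip(p, q))

def qsub(p, q):
    """Componentwise difference of two quaternions."""
    return tuple(u - v for u, v in zip(p, q))

ONE = (1, 0, 0, 0)
I = (0, 1, 0, 0)
J = (0, 0, 1, 0)

# Evaluate p(t) = t^2 + 1 at t = j.
p_at_j = qadd(qmul(J, J), ONE)

# Evaluate the two factors (t + i) and (t - i) at t = j, then multiply.
factors_at_j = qmul(qadd(J, I), qsub(J, I))

print(p_at_j)        # components of j^2 + 1
print(factors_at_j)  # components of (j + i)(j - i)
```

The first value is the zero quaternion and the second is $2k$, so evaluating the factors separately does not reproduce $p(j)$.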

It’s not clear to me what the exact problem is here, but there is a common pitfall
concerning polynomials with matrix coefficients.

Given a polynomial $P(t)=\sum_i B_i t^i$ we can certainly substitute a matrix
$A$ for $t$ to get $P(A)=\sum_i B_i A^i$. But there is a catch when we multiply
polynomials. If $Q(t)$ is another polynomial with matrix coefficients, then we may have
$$(PQ)(A)\ne P(A)Q(A).$$
A sufficient condition for $(PQ)(A)=P(A)Q(A)$ to hold is that $A$ commutes
with the coefficients of $Q$. But when it doesn’t then in general
$(PQ)(A)\ne P(A)Q(A)$.
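
To see where the condition comes from, write $Q(t)=\sum_j C_j t^j$ (the name $C_j$ for the coefficients of $Q$ is introduced here for illustration). Since $t$ commutes with all coefficients in the polynomial ring, $(PQ)(t)=\sum_{i,j}B_iC_j\,t^{i+j}$, and therefore
$$(PQ)(A)=\sum_{i,j}B_i\,C_j\,A^{i+j},\qquad P(A)Q(A)=\sum_{i,j}B_i\,A^i\,C_j\,A^j.$$
The two sums agree term by term as soon as $A^iC_j=C_jA^i$ for all $i,j$, which holds when $A$ commutes with every $C_j$.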

I’d just like to make the following three observations as a minor complement to the other answers.

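OBSERVATION 1. What the Cayley-Hamilton Theorem says is $$\det\begin{pmatrix}
a_{11}-A & a_{12}&\cdots&a_{1n}\\
a_{21} & a_{22}-A&\cdots&a_{2n}\\
\vdots&\vdots&\ddots&\vdots\\
a_{n1} & a_{n2}&\cdots&a_{nn}-A
\end{pmatrix}=0.$$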

OBSERVATION 2. The proof of the Cayley-Hamilton Theorem I like best (among the ones I know) is on page 21 (proof of Proposition 2.4) of Introduction to Commutative Algebra by Atiyah and MacDonald. The argument can be phrased as follows.

Let $K$ be a commutative ring; let $n$ be a positive integer; let $A=(a_{ij})\in M_n(K)$ be an $n$ by $n$ matrix with entries in $K$; let $\chi$ be its characteristic polynomial; define $B=(b_{ij})\in M_n(K[A])$ by $b_{ij}:=\delta_{ij}\,A-a_{ij}$; observe $$\sum_i\ \ b_{ij}\ e_i=0,\quad\det B=\chi(A);$$ and write $(c_{ij})$ for the adjugate of $B$. Applying (a trivial case of) Fubini’s Theorem to the double sum $\sum_{i,j}\ c_{jk}\ b_{ij}\ e_i$, we get $\chi(A)=0$.
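
Spelled out as a sketch, and assuming that $e_1,\dots,e_n$ denotes the standard basis of $K^n$ (on which the commutative ring $K[A]$ acts), the double sum can be evaluated in two orders. Summing over $i$ first uses $\sum_i b_{ij}e_i=0$; summing over $j$ first uses the adjugate identity $\sum_j b_{ij}c_{jk}=\delta_{ik}\det B$:
$$\sum_j c_{jk}\Big(\sum_i b_{ij}\,e_i\Big)=0,\qquad \sum_i\Big(\sum_j b_{ij}\,c_{jk}\Big)e_i=\sum_i\delta_{ik}\,\det(B)\,e_i=\chi(A)\,e_k.$$
Hence $\chi(A)\,e_k=0$ for every $k$, which gives $\chi(A)=0$. Swapping $c_{jk}$ and $b_{ij}$ is legitimate because both lie in $K[A]$, which is commutative.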

OBSERVATION 3. It’s easy to define the ring $R[X]$ of polynomials in the indeterminate $X$ with coefficients in a noncommutative ring $R$. But when you think in terms of universal properties, you see that this construction is not very natural. So, it’s better, I think, not to introduce it just to prove the Cayley-Hamilton Theorem.

The problem is quite simple: the definition of polynomial multiplication assumes that the coefficients commute with the indeterminates, since e.g. $\rm\; r\ x = x\ r\:.\:$ So for evaluation to preserve polynomial arithmetic, i.e. for it to be a ring homomorphism, one can only evaluate polynomials in rings where the coefficient ring is central, i.e. where coefficients commute with all other ring elements. This concept is so important that it has a name, viz. $\rm\:R\:$-algebras are those rings $\rm\:A\:$ in which polynomials in $\rm R[x]$ can be evaluated. More precisely, this means that $\rm\:A\:$ contains an image $\rm\:R'$ of $\rm\:R\:$ and that for all $\rm\;a\in A\:$ the evaluation map $\rm\:x\to a\:$ is a ring homomorphism $\rm\:R[x]\to A\:$. In particular, evaluating $\rm\;\; r\: x = x\: r\;\;$ at $\rm\:x = a\in A\:$ implies that $\rm\;R'$ is central, i.e. it commutes with all elements of $\rm\:A\:$. That’s precisely all that’s required for the existence of a unique evaluation homomorphism $\rm\:\epsilon_a : x\to a\:$ for every $\rm\:a\in A\:.\:$ This permits one to view the polynomial ring $\rm R[x]$ as the universal $\rm\:R$-algebra generated by $\rm\:x\:.$