Determinant trick, Cayley-Hamilton theorem and Nakayama’s lemma

The post is also available as pdf.

The Cayley-Hamilton theorem is usually presented in a linear algebra context, where it says that an endomorphism of a finite-dimensional F-vector space (equivalently, its matrix representation) satisfies its own characteristic polynomial. In fact, the result generalises to endomorphisms of finitely generated modules over commutative rings; finite generation is what lets us represent an endomorphism by a square matrix, so that the determinant makes sense. There are many proofs available, among them the bogus proof of substituting the matrix into the characteristic polynomial \det (x \cdot I - A) to obtain

    \[ \det (A \cdot I - A) = \det (A - A) = 0. \]

This does not work because the product x \cdot I in the characteristic polynomial is multiplication of a matrix by a scalar, while the bogus proof treats it as matrix multiplication. In this post I will show that the intuition behind this “substitution-and-cancellation” is correct (in a narrow sense), and that the proof can be rescued by a trick which is quite useful in its own right.

I will develop the theory using the language of rings and modules, but if you are not familiar with it, feel free to read “fields” and “vector spaces” instead.

Let A be a commutative ring and M an A-module generated by \{m_1, \dots, m_n\}. Note that M is naturally an \End(M)-module, where \End(M) denotes the ring of A-module endomorphisms of M. For f \in \End(M), write [f] \in \matrixring_n(A) for a matrix representing f with respect to the generators above, i.e. f(m_i) = \sum_j [f]_{ij}m_j. Such a matrix exists because each f(m_i) is an A-linear combination of the generators, but it need not be unique unless the m_i form a basis. There is also a ring homomorphism \mu: A \to \End(M), a \mapsto a \cdot - sending an element to its multiplication action. Let A' = \mu(A).
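To make the convention f(m_i) = \sum_j [f]_{ij}m_j concrete, here is a small Python sketch (my own illustration, not from the source) with A = \mathbb Z and M = \mathbb Z generated by the hypothetical choice m_1 = 2, m_2 = 3. These generators are not a basis, because 3 \cdot m_1 - 2 \cdot m_2 = 0, and as a result the identity map has more than one representing matrix.

```python
# A = Z, M = Z, with the (non-free) generators m_1 = 2, m_2 = 3.
GENS = [2, 3]


def represents(f, matrix, gens):
    """Return True if f(m_i) == sum_j matrix[i][j] * m_j for every generator m_i."""
    return all(
        f(m_i) == sum(a * m_j for a, m_j in zip(row, gens))
        for m_i, row in zip(gens, matrix)
    )


def identity(m):
    """The identity endomorphism of M = Z."""
    return m


R1 = [[1, 0], [0, 1]]    # the obvious choice
R2 = [[-2, 2], [0, 1]]   # 2 = -2*2 + 2*3 and 3 = 0*2 + 1*3

print(represents(identity, R1, GENS), represents(identity, R2, GENS))  # True True
```

Both matrices represent \id_M, so [f] is only determined up to the relations among the generators.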

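There is a technical remark to make: later we will take determinants of matrices over \End(M), which is non-commutative in general, whereas determinants are only well behaved over commutative rings. However, throughout the discussion we are concerned with only one endomorphism \varphi (besides scalar multiplication, of course), so we can restrict the scalars to A'[\varphi], the subring of \End(M) generated by A' and \varphi. This subring is commutative, since every element of A' commutes with every A-linear map.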

Given a module endomorphism \varphi: M \to M, its characteristic polynomial is defined to be

    \[ \chi_{[\varphi]}(x) = \det (x \cdot I - [\varphi]) \in A[x] \]

where I is the n \times n identity matrix and the product x \cdot I is multiplication of a matrix by a scalar. Note that \chi depends in general on the choice of generators (and of the representing matrix). We have

Cayley-Hamilton Theorem

    \[ \chi_{[\varphi]}(\varphi) = 0. \]

Note that this is a relation of endomorphisms with coefficients in A.
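As a sanity check of the statement, the following Python sketch (illustrative only; the helper names are my own) computes \chi_{[\varphi]}(x) = \det(x \cdot I - [\varphi]) in \mathbb Z[x] by cofactor expansion, with polynomials stored as coefficient lists (constant term first), and then evaluates it at the matrix by Horner's rule. It also revisits a hypothetical example: \id_M on M = \mathbb Z with generators 2, 3, represented both by the identity matrix and by [[-2, 2], [0, 1]]. The two characteristic polynomials differ, yet both vanish at \varphi = \id_M, which acts as the integer 1.

```python
def poly_add(p, q):
    """Add two integer polynomials given as coefficient lists (constant term first)."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def poly_mul(p, q):
    """Multiply two integer polynomials given as coefficient lists."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def poly_det(m):
    """Determinant of a square matrix with polynomial entries, by cofactor expansion."""
    if len(m) == 1:
        return m[0][0]
    total = [0]
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        term = poly_mul(entry, poly_det(minor))
        if j % 2:
            term = [-c for c in term]
        total = poly_add(total, term)
    return total


def char_poly(a):
    """Coefficients (constant term first) of det(x*I - a) in Z[x]."""
    n = len(a)
    x_i_minus_a = [[[-a[i][j], 1] if i == j else [-a[i][j]] for j in range(n)]
                   for i in range(n)]
    return poly_det(x_i_minus_a)


def mat_mul(p, q):
    """Product of two integer matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]


def eval_at_matrix(coeffs, a):
    """Evaluate sum_k coeffs[k] * a^k by Horner's rule, with a^0 the identity."""
    n = len(a)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, a)
        for i in range(n):
            result[i][i] += c
    return result


A = [[2, 1, 0], [0, 3, -1], [4, 0, 1]]
chi = char_poly(A)
print(chi)                      # [-2, 11, -6, 1], i.e. x^3 - 6x^2 + 11x - 2
print(eval_at_matrix(chi, A))   # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

# id_M on M = Z with generators 2, 3: End(M) = Z and id_M is the integer 1,
# so chi(id_M) is just the sum of the coefficients.
for rep in ([[1, 0], [0, 1]], [[-2, 2], [0, 1]]):
    c = char_poly(rep)
    print(c, sum(c))            # [1, -2, 1] 0, then [-2, 1, 1] 0
```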

Proof: Let [\varphi]_{ij} = a_{ij} and view M as an A'[\varphi]-module. Since

    \[ \varphi m_i = \sum_j a_{ij}m_j, \]

we have

(*)   \begin{equation*} \sum_j \underbrace{(\varphi \delta_{ij} - a_{ij})}_{\Delta_{ij}} m_j = 0 \end{equation*}


Let N = [\varphi] = (a_{ij}), regarded as a matrix over A'[\varphi] via \mu, and set

    \[ \Delta = \varphi \cdot I - N \in \matrixring_n(A'[\varphi]). \]

Again, the product \varphi \cdot I is multiplication of a matrix by the scalar \varphi, now viewed as an element of the commutative ring A'[\varphi].
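In the free case M = \mathbb Z^n with its standard basis, everything in the proof can be written down explicitly. The Python sketch below is a hypothetical illustration (the names P, build_delta and block_det are my own): it identifies \varphi with an integer matrix P acting on row vectors by v \mapsto vP, so that \varphi(e_i) = \sum_j P_{ij} e_j matches the convention for [\varphi], and identifies \mu(a) with a \cdot I. It builds \Delta as an n \times n array whose entries are n \times n matrices in the commutative ring \mathbb Z[P], checks the relations (*), and computes \det \Delta by cofactor expansion using matrix products. Since all entries lie in \mathbb Z[P], the order of the products does not matter, and the result is the zero matrix, i.e. \det \Delta = \chi_{[\varphi]}(\varphi) = 0.

```python
def identity(n):
    """The n x n identity matrix."""
    return [[int(i == j) for j in range(n)] for i in range(n)]


def mat_mul(p, q):
    """Product of two integer matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]


def mat_lin(c1, p, c2, q):
    """Return c1*p + c2*q for integer scalars c1, c2."""
    return [[c1 * x + c2 * y for x, y in zip(rp, rq)] for rp, rq in zip(p, q)]


def block_det(m):
    """Cofactor expansion along the first row; entries are pairwise commuting matrices."""
    if len(m) == 1:
        return m[0][0]
    size = len(m[0][0])
    total = [[0] * size for _ in range(size)]
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        sign = -1 if j % 2 else 1
        total = mat_lin(1, total, sign, mat_mul(entry, block_det(minor)))
    return total


def build_delta(p):
    """Delta_ij = delta_ij * P - P_ij * I, an n x n array of n x n matrices."""
    n = len(p)
    ident = identity(n)
    return [[mat_lin(int(i == j), p, -p[i][j], ident) for j in range(n)]
            for i in range(n)]


P = [[2, 1, 0], [0, 3, -1], [4, 0, 1]]
delta = build_delta(P)
n = len(P)

# Relation (*): sum_j Delta_ij(e_j) = 0, where Delta_ij(e_j) = e_j Delta_ij
# is row j of the matrix Delta_ij.
for i in range(n):
    image = [sum(delta[i][j][j][k] for j in range(n)) for k in range(n)]
    assert image == [0] * n

print(block_det(delta))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```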

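We claim that it suffices to show \det \Delta = 0 in A'[\varphi]. Indeed, consider the ring homomorphism

    \begin{align*} A[x] &\to A'[\varphi] \\ a &\mapsto \mu(a) \quad (a \in A) \\ x &\mapsto \varphi. \end{align*}

Applied entrywise, it sends x \cdot I - [\varphi] to \Delta. Since the determinant is a polynomial in the matrix entries, it commutes with ring homomorphisms, so \chi_{[\varphi]}(x) \mapsto \chi_{[\varphi]}(\varphi) = \det \Delta. Hence \det \Delta = 0 gives the theorem.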

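To show that \det \Delta = 0, recall the adjugate identity

    \[ (\adj \Delta) \cdot \Delta = \det \Delta \cdot I \in \matrixring_n(A'[\varphi]) \]

where the product on the left is matrix multiplication; the identity holds over any commutative ring. Let (\adj \Delta)_{ij} = b_{ij}. Multiplying the i-th equation of (*) by b_{ki}, summing over i and applying the identity gives

    \[ \sum_{i, j} (b_{ki} \Delta_{ij}) m_j = \sum_j (\det \Delta \, \delta_{kj}) m_j = (\det \Delta) m_k = 0. \]

This holds for every k, and the m_k generate M, so the endomorphism \det \Delta annihilates all of M. Hence \det \Delta = 0 as required.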
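The adjugate identity is what replaces division here: it holds over any commutative ring, even when \det \Delta is not invertible. A minimal Python check over \mathbb Z (the helper names are my own), with the adjugate defined entrywise by (\adj M)_{ij} = (-1)^{i+j} times the determinant of M with row j and column i deleted:

```python
def det(m):
    """Integer determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def delete(m, r, c):
    """Return m with row r and column c removed."""
    return [row[:c] + row[c + 1:] for k, row in enumerate(m) if k != r]


def adjugate(m):
    """(adj m)_{ij} = (-1)^(i+j) * det(m with row j and column i deleted)."""
    n = len(m)
    if n == 1:
        return [[1]]
    return [[(-1) ** (i + j) * det(delete(m, j, i)) for j in range(n)] for i in range(n)]


def mat_mul(p, q):
    """Product of two integer matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]


# One invertible-over-Q example and one singular example: no division is needed.
for m in ([[2, 1, 0], [0, 3, -1], [4, 0, 1]], [[1, 2, 3], [4, 5, 6], [7, 8, 9]]):
    d = det(m)
    scalar = [[d * int(i == j) for j in range(len(m))] for i in range(len(m))]
    print(d, mat_mul(adjugate(m), m) == scalar)  # "2 True", then "0 True"
```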

If you feel that little work is done in the proof and suspect it might be tautological somewhere (as I did when I first saw it), go through it again and convince yourself it is indeed a bona fide proof. Two tricks are used here. First, we extend the scalars by regarding M as an A'[\varphi]-module, so that the action of \varphi becomes scalar multiplication. This is essential since it allows us to treat \varphi and scalar multiplication by A on an equal footing. Second, the ring homomorphism A[x] \to A'[\varphi] substitutes \varphi for x. Aha, the intuition in the bogus proof is correct, but we need a little extra work to sort out the notation and express precisely what we mean.

The key idea in the proof, sometimes called the determinant trick, has many applications in commutative algebra:

Let M be an A-module generated by n elements and \varphi: M \to M an A-module homomorphism. Suppose I is an ideal of A such that \varphi(M) \subseteq IM. Then there is a relation

    \[ \varphi^n + a_1 \varphi^{n - 1} + \dots + a_{n - 1} \varphi + a_n = 0 \]

where a_i \in I^i for all i.

Proof: Let \{m_1, \dots, m_n\} be a set of generators of M. Since \varphi(m_i) \in IM, we can write

    \[ \varphi m_i = \sum_j a_{ij}m_j \]

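with a_{ij} \in I (an element of IM has the form \sum_k x_k y_k with x_k \in I and y_k \in M, and expanding each y_k in terms of the generators gives coefficients in I). Multiply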

    \[ \sum_j \underbrace{(\varphi \delta_{ij} - a_{ij})}_{\Delta_{ij}} m_j = 0 \]

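by \adj \Delta as in the previous proof (working over the commutative ring A'[\varphi]) to deduce that (\det \Delta) m_j = 0 for all j, so \det \Delta = 0 \in \End(M). Expanding the determinant of \Delta = \varphi \cdot I - (a_{ij}) gives the relation: the coefficient of \varphi^{n - i} is, up to sign, a sum of products of i of the entries a_{jk}, so it lies in I^i.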
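A quick numerical illustration of the divisibility claim, assuming A = \mathbb Z and I = 2\mathbb Z (a choice made only for this sketch): take a matrix (a_{ij}) with all entries in 2\mathbb Z, compute its characteristic polynomial by cofactor expansion over \mathbb Z[x] (the helpers are my own), and check that the coefficient a_i of x^{n-i} lies in 2^i\mathbb Z.

```python
def poly_add(p, q):
    """Add two integer polynomials given as coefficient lists (constant term first)."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def poly_mul(p, q):
    """Multiply two integer polynomials given as coefficient lists."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def poly_det(m):
    """Determinant of a square matrix with polynomial entries, by cofactor expansion."""
    if len(m) == 1:
        return m[0][0]
    total = [0]
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        term = poly_mul(entry, poly_det(minor))
        if j % 2:
            term = [-c for c in term]
        total = poly_add(total, term)
    return total


def char_poly(a):
    """Coefficients (constant term first) of det(x*Id - a) in Z[x]."""
    n = len(a)
    return poly_det([[[-a[i][j], 1] if i == j else [-a[i][j]] for j in range(n)]
                     for i in range(n)])


P = [[4, 2, 0], [0, 6, -2], [8, 0, 2]]   # all entries in I = 2Z
chi = char_poly(P)
print(chi)                               # [-16, 44, -12, 1]
n = len(P)
for i in range(1, n + 1):
    a_i = chi[n - i]                     # coefficient of x^(n - i)
    print(i, a_i, a_i % 2 ** i == 0)     # a_i lies in I^i = 2^i Z
```

This prints 1 -12 True, 2 44 True and 3 -16 True.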

An immediate corollary is Nakayama’s Lemma, which alone is an important result in commutative algebra:

Nakayama’s Lemma

If M is a finitely generated A-module and I \ideal A is an ideal such that M = IM, then there exists x \in A such that x - 1 \in I and xM = 0.

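Proof: Apply the trick to \varphi = \id_M, which satisfies \id_M(M) = M = IM. Since \id_M^i = \id_M and the constant term a_n stands for a_n\id_M, we get

    \[ \left(1 + \sum_{i = 1}^n a_i\right) \id_M = 0. \]

Take x = 1 + \sum_{i = 1}^n a_i. Then x - 1 \in I since each a_i \in I^i \subseteq I, and xM = 0.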
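Here is the proof run on a small example of my own choosing: A = \mathbb Z, M = \mathbb Z/n\mathbb Z with the single generator m_1 = 1, and I = d\mathbb Z with \gcd(d, n) = 1, so that d is invertible modulo n and M = IM. Writing \id_M(m_1) = a_{11}m_1 with a_{11} = d \cdot (d^{-1} \bmod n) \in I, the trick gives the 1 \times 1 relation \id_M - a_{11} = 0, so a_1 = -a_{11} and x = 1 + a_1. The Python sketch below (function name hypothetical; it relies on the three-argument pow with exponent -1, available from Python 3.8) computes this x and checks both conclusions.

```python
from math import gcd


def nakayama_element(n, d):
    """Return x with x - 1 in dZ and x * (Z/nZ) = 0, following the proof above.

    Assumes n >= 1 and gcd(d, n) == 1, which is exactly the condition M = IM
    for M = Z/nZ and I = dZ.
    """
    if n < 1 or gcd(d, n) != 1:
        raise ValueError("need n >= 1 and gcd(d, n) == 1")
    # id_M(m_1) = m_1 = a11 * m_1 with a11 = d * (d^{-1} mod n), an element of I.
    a11 = d * pow(d, -1, n)
    # The 1x1 determinant det(id_M - a11) gives a_1 = -a11, so x = 1 + a_1.
    return 1 - a11


x = nakayama_element(3, 2)
print(x, (x - 1) % 2 == 0, x % 3 == 0)  # -3 True True
```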

We use the result to prove a rather interesting fact about module homomorphisms:

Let M be a finitely generated A-module. Then every surjective A-module homomorphism M \to M is also injective, hence an isomorphism.

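Proof: Let \varphi: M \to M be surjective. Regard M as an A'[\varphi]-module (it is still finitely generated, since any finite set of A-module generators also generates it over A'[\varphi]) and let I = (\varphi) \ideal A'[\varphi]. Then M = IM by surjectivity of \varphi. Applying Nakayama’s Lemma over the commutative ring A'[\varphi], there exists x \in A'[\varphi] with x - 1 \in I, say x = 1 + \varphi\psi for some \psi \in A'[\varphi], such that xM = 0, i.e. (1 + \varphi\psi)m = 0 for all m \in M. Hence 1 + \varphi\psi = 0 in \End(M), so \varphi \circ (-\psi) = (-\psi) \circ \varphi = \id_M because A'[\varphi] is commutative. It follows that \varphi is invertible with \varphi^{-1} = -\psi; in particular it is injective.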
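For a free module, the conclusion that \varphi^{-1} lies in A'[\varphi] can be seen explicitly. The sketch below is an illustration that uses the Cayley-Hamilton theorem directly rather than Nakayama’s Lemma: for M = \mathbb Z^2, a surjective endomorphism is given by an integer matrix P with \det P = \pm 1, and P^2 - tP + (\det P)I = 0, where t is the trace, rearranges to P^{-1} = (\det P)(tI - P), a polynomial in P with integer coefficients. The function names are hypothetical.

```python
def inverse_as_polynomial(p):
    """For a 2x2 integer matrix p with det(p) = +-1, return (c0, c1) such that
    p^{-1} = c0 * I + c1 * p."""
    (a, b), (c, d) = p
    trace, det = a + d, a * d - b * c
    if det not in (1, -1):
        raise ValueError("p is not invertible over Z")
    # Cayley-Hamilton: p^2 - trace*p + det*I = 0, so p (trace*I - p) = det*I,
    # and det^{-1} = det because det = +-1.
    return det * trace, -det


def apply_poly(c0, c1, p):
    """Return c0 * I + c1 * p for a 2x2 matrix p."""
    return [[c0 * int(i == j) + c1 * p[i][j] for j in range(2)] for i in range(2)]


def mat_mul(p, q):
    """Product of two 2x2 integer matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


P = [[2, 1], [1, 1]]
c0, c1 = inverse_as_polynomial(P)
Q = apply_poly(c0, c1, P)
print(c0, c1, Q, mat_mul(P, Q))  # 3 -1 [[1, -1], [-1, 2]] [[1, 0], [0, 1]]
```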

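As a side note, the analogous statement for injective homomorphisms fails: an injective endomorphism of a finitely generated module need not be surjective. For example, multiplication by 2, 2 \cdot -: \mathbb Z \to \mathbb Z, is injective but not surjective.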



M. Reid, Undergraduate Commutative Algebra, §2.6 – 2.8