The post is also available as a PDF.
The Cayley-Hamilton theorem is usually presented in a linear algebra context, where it says that a $k$-vector space endomorphism (and its matrix representation) satisfies its own characteristic polynomial. Actually, this result can be generalised a little bit to all modules over commutative rings (which are certainly required to be finitely generated for the determinant to make sense). There are many proofs available, among which is the bogus proof of substituting the matrix into the characteristic polynomial and obtaining
$$\chi_A(A) = \det(A \cdot I_n - A) = \det(0) = 0.$$
The reason this doesn't work is that the product $xI_n$ in the characteristic polynomial $\det(xI_n - A)$ is multiplication of a matrix by a scalar, while the product assumed in the bogus proof is matrix multiplication. In this post I will show that the intuition behind this "substitution-and-cancellation" is correct (in a narrow sense), and the proof can be rescued by a trick which is quite useful in itself.
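To see the type mismatch concretely, consider a $2 \times 2$ matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. The characteristic polynomial is computed with the scalar $x$:
$$\chi_A(x) = \det\begin{pmatrix} x - a & -b \\ -c & x - d \end{pmatrix} = x^2 - (a + d)x + (ad - bc),$$
and the genuine statement $\chi_A(A) = 0$ means
$$A^2 - (a + d)A + (ad - bc)I_2 = 0,$$
where the products are now matrix products and the constant term is accompanied by $I_2$. This is a different expression from $\det(A \cdot I_2 - A)$, in which the matrix $A$ would have to stand in for the scalar $x$.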
I will develop the theory using the language of rings and modules, but if you are not familiar with it, feel free to substitute "fields" and "vector spaces" throughout.
Let $A$ be a commutative ring and $M$ be an $A$-module generated by $m_1, \dots, m_n$. Note that $M$ is naturally an $\operatorname{End}_A(M)$-module and for all $\varphi \in \operatorname{End}_A(M)$, write $[\varphi] = (a_{ij})$ for its representation with respect to the generators above, i.e. $\varphi(m_i) = \sum_{j=1}^n a_{ij} m_j$. In particular, there is a ring homomorphism $A \to \operatorname{End}_A(M)$ sending an element to its multiplication action. Let $A'$ denote its image.
There is a technical remark to make: later we will use determinants of matrices over $\operatorname{End}_A(M)$, which is non-commutative. However, throughout the discussion we are concerned with only one endomorphism $\varphi$ (besides multiplications, of course), so we can restrict the scalars to $A'[\varphi]$, a commutative subring: $A'$ is contained in the centre of $\operatorname{End}_A(M)$, since every $A$-linear map commutes with multiplication by elements of $A$.
Given a module endomorphism $\varphi$, its characteristic polynomial is defined to be
$$\chi_\varphi(x) = \det(xI_n - [\varphi]) \in A[x],$$
where $I_n$ is the $n \times n$ identity matrix and the product $xI_n$ is multiplication of a matrix by a scalar. Note that $\chi_\varphi$ is generator dependent in general. We have
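For example, take $M = A$ and $\varphi = \operatorname{id}$. With respect to the generator $\{1\}$ we may take $[\varphi] = (1)$, whereas with respect to the redundant generating set $\{1, 1\}$ we may take $[\varphi] = I_2$, giving
$$\chi_\varphi(x) = x - 1 \qquad \text{versus} \qquad \chi_\varphi(x) = (x - 1)^2,$$
two different polynomials for the same endomorphism.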
Cayley-Hamilton Theorem
$$\chi_\varphi(\varphi) = 0 \quad \text{in } \operatorname{End}_A(M).$$
Note that this is a relation of endomorphisms with coefficients in $A'$.
Proof: Let $E = A'[\varphi]$ and view $M$ as an $E$-module. Since
$$\varphi(m_i) = \sum_{j=1}^n a_{ij} m_j,$$
we have
$$\Phi \begin{pmatrix} m_1 \\ \vdots \\ m_n \end{pmatrix} = 0 \tag{*}$$
with
$$\Phi = \varphi I_n - (a_{ij}) \in M_n(E).$$
Again, the multiplication $\varphi I_n$ is by the scalar $\varphi$, viewed as an element of the ring $E$.
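For concreteness, when $n = 2$ the relation $(*)$ reads
$$\begin{pmatrix} \varphi - a_{11} & -a_{12} \\ -a_{21} & \varphi - a_{22} \end{pmatrix}\begin{pmatrix} m_1 \\ m_2 \end{pmatrix} = \begin{pmatrix} \varphi(m_1) - a_{11}m_1 - a_{12}m_2 \\ \varphi(m_2) - a_{21}m_1 - a_{22}m_2 \end{pmatrix} = 0,$$
where each entry of $\Phi$ acts on $M$ as an element of $E$.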
Claim that if $\det\Phi = 0$ then we are done: consider the ring homomorphism
$$A[x] \to E, \qquad x \mapsto \varphi,$$
which maps $\chi_\varphi(x)$ to $\det\Phi$ since $\det$ is a polynomial function. So done.
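Spelling out the evaluation: the homomorphism sends each entry $x\,\delta_{ij} - a_{ij}$ of $xI_n - [\varphi]$ to the corresponding entry $\varphi\,\delta_{ij} - a_{ij}$ of $\Phi$, and since $\det$ is a fixed integer polynomial in the $n^2$ entries, it is preserved by any ring homomorphism applied entrywise:
$$\chi_\varphi(x) = \det(xI_n - [\varphi]) \longmapsto \det(\varphi I_n - [\varphi]) = \det\Phi.$$
So if $\det\Phi = 0$ then $\chi_\varphi(\varphi) = 0$.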
To show this, recall that
$$(\operatorname{adj}\Phi)\,\Phi = (\det\Phi)\, I_n,$$
where the multiplication on the left is between matrices. Let $\mathbf{m} = (m_1, \dots, m_n)^T$. Then multiply $(*)$ by $\operatorname{adj}\Phi$ and apply the identity:
$$(\det\Phi)\,\mathbf{m} = (\operatorname{adj}\Phi)\,\Phi\,\mathbf{m} = 0,$$
so $(\det\Phi)(m_i) = 0$ for every generator $m_i$, and hence $\det\Phi = 0$ in $E$, as required.
If you feel that little work is done in the proof and suspect it might be tautological somewhere (a suspicion I had when I first saw this proof), go through it again and convince yourself it is indeed a bona fide proof. There are two tricks used here: firstly, we extend the scalars by recognising $M$ as an $E = A'[\varphi]$-module, so that the action of $\varphi$ becomes scalar multiplication. This is essential since it allows us to treat $\varphi$ and scalar multiplication by elements of $A$ on an equal footing. Secondly, the ring homomorphism $A[x] \to E$ substitutes $\varphi$ in place of $x$. Aha, the intuition in the bogus proof is correct, but we need a little extra work to sort out the notation and express precisely what we mean.
The key idea in the proof, sometimes called the determinant trick, has many applications in commutative algebra:
Let $M$ be an $A$-module generated by $n$ elements and $\varphi: M \to M$ a module homomorphism. Suppose $I$ is an ideal of $A$ such that $\varphi(M) \subseteq IM$. Then there is a relation
$$\varphi^n + a_{n-1}\varphi^{n-1} + \cdots + a_1\varphi + a_0 = 0,$$
where $a_i \in I^{n-i}$ for all $i$.
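As a sanity check, take $n = 1$: then $M$ is generated by a single element $m$ and $\varphi(m) = am$ for some $a \in I$. Since every element of $M$ has the form $bm$, $\varphi$ acts as multiplication by $a$ on all of $M$, giving the relation
$$\varphi + a_0 = 0, \qquad a_0 = -a \in I,$$
as the statement promises.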
Proof: Let $m_1, \dots, m_n$ be a set of generators of $M$. Since $\varphi(m_i) \in IM$, we can write
$$\varphi(m_i) = \sum_{j=1}^n a_{ij} m_j$$
with $a_{ij} \in I$. Multiply
$$\left(\delta_{ij}\varphi - a_{ij}\right)\begin{pmatrix} m_1 \\ \vdots \\ m_n \end{pmatrix} = 0$$
by $\operatorname{adj}(\delta_{ij}\varphi - a_{ij})$; we deduce that $\det(\delta_{ij}\varphi - a_{ij})\,(m_i) = 0$ for all $i$, so $\det(\delta_{ij}\varphi - a_{ij}) = 0$ since the $m_i$ generate $M$. Expand.
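The final "Expand" is where the exponents come from: the coefficient of $\varphi^i$ in the expanded determinant is a signed sum of products of $n - i$ of the entries $a_{jk} \in I$, hence lies in $I^{n-i}$. For instance, when $n = 2$,
$$\det\begin{pmatrix} \varphi - a_{11} & -a_{12} \\ -a_{21} & \varphi - a_{22} \end{pmatrix} = \varphi^2 - \underbrace{(a_{11} + a_{22})}_{\in I}\,\varphi + \underbrace{(a_{11}a_{22} - a_{12}a_{21})}_{\in I^2}.$$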
An immediate corollary is Nakayama's Lemma, which is itself an important result in commutative algebra:
Nakayama’s Lemma
If $M$ is a finitely generated $A$-module and $I \trianglelefteq A$ is an ideal such that $IM = M$, then there exists $x \in A$ such that $x \equiv 1 \pmod I$ and $xM = 0$.
Proof: Apply the trick to $\varphi = \operatorname{id}_M$. Since $\operatorname{id}_M(M) = M = IM$ and all powers of $\operatorname{id}_M$ equal $\operatorname{id}_M$, we get
$$(1 + a_{n-1} + \cdots + a_1 + a_0)\,M = 0$$
with each $a_i \in I$, so $x = 1 + a_{n-1} + \cdots + a_0$ satisfies $x \equiv 1 \pmod I$ and $xM = 0$.
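For a toy example, take $A = \mathbb{Z}$, $M = \mathbb{Z}/6$ and $I = (5)$. Then $IM = M$ because $5$ is invertible mod $6$, and $x = 6$ witnesses the lemma:
$$6 \equiv 1 \pmod 5 \qquad \text{and} \qquad 6 \cdot \mathbb{Z}/6 = 0.$$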
We use the result to prove a rather interesting fact about module homomorphisms:
Let $M$ be a finitely generated $A$-module. Then every surjective module homomorphism $\varphi: M \to M$ is also injective.
Proof: Let $\varphi: M \to M$ be surjective. View $M$ as an $A[x]$-module via $x \cdot m = \varphi(m)$ (it is still finitely generated) and let $I = (x) \trianglelefteq A[x]$. Then $IM = M$ by surjectivity of $\varphi$. Thus by Nakayama's Lemma, there exists $f \in A[x]$ with $f \equiv 1 \pmod{(x)}$, say $f = 1 + xg$, such that $fM = 0$, i.e. for all $m \in M$, $m + g(\varphi)(\varphi(m)) = 0$. It follows that if $\varphi(m) = 0$ then $m = 0$, so $\varphi$ is injective.
As a side note, the converse is not true: injective module homomorphisms need not be surjective. For example, multiplication by $2$ on $\mathbb{Z}$ is injective but not surjective.
References
M. Reid, Undergraduate Commutative Algebra, §2.6 – 2.8
Comments

Even though I also read Reid's chapter on modules, your presentation of the topic really helped me grasp it. Thank you!
I have one question – do you know why in the "determinant trick" we stop talking about the characteristic polynomial? We could say "if the matrix of φ has coefficients in I, then the characteristic polynomial of [φ] also has coefficients in I and χ[φ](x)=0"
But we instead choose to use the wording "φ(M) ⊆ IM and there exists *some* polynomial in I[x] such that evaluation at φ in A'[φ] equals 0". Why is that? I think we lose information by saying *some* polynomial instead of the characteristic polynomial of [φ].