Archimedeans talk: A unified view towards diagonal arguments

I gave the talk “A unified view towards diagonal arguments” at the Archimedeans yesterday and the transcript is now available here.

The talk is on Lawvere’s fixed point theorem: it begins with cartesian closed categories, proceeds to state and prove the theorem in that context (although in the version I delivered, most of the formal treatment of categories was skipped for brevity and better intuition), and goes on to give classical applications, culminating in Gödel’s incompleteness theorem.

As I said in the acknowledgements, I want to thank the Archimedeans for organising this event, which I found a great way to learn what my friends on the other side are doing in their own time. I am also extremely grateful to those who have, in one way or another, helped me with the preparation and delivery of the talk, or offered feedback afterwards.

Finally, I would like to wrap up with a piece of feedback I got from a friend, who asked me if I had some kind of superpowered IA mathmo in mind when I was writing the prerequisites for the talk. Yes and no: you are an extremely strong IA mathmo if you went to the talk and understood everything on the spot. But even if you did not, there is (I hope) some part of the talk that makes sense to you, and more importantly, I hope it has made a dent somewhere in your repertoire of maths knowledge, so that later on, when you come across, say, Gödel’s incompleteness theorem, the idea will click with you. Perhaps not coincidentally, this is also my opinion on category theory: learn it formally at some point and let it sink in over time.

So long!

Symmetric bilinear form vs. quadratic form

It has just struck me that symmetric bilinear forms and quadratic forms are different objects, corresponding respectively to the symmetric tensors and the symmetric algebra of a certain tensor algebra (in this post “tensor algebra” will take the narrow meaning of the 2nd tensor power of a vector space). I first came to this observation when I was told in IB Linear Algebra that to recover a symmetric bilinear form from a quadratic form, one requires the characteristic of the ground field to not be 2. This oddity can be explained by the “averaging” map

    \[ \frac{1}{2}(\varphi(u, v) + \varphi(v, u)) \]

which requires 2 to be a unit. This explanation felt stilted to me and troubled me, since it hinges on this explicit construction and does not tell us a priori why this should be so. It occurred to me recently, when I was going through some old notes, that the formula is precisely the symmetrization map of the tensor algebra. It then follows easily that symmetric bilinear forms are the image of the “averaging” map, and thus form the subspace of symmetric tensors. I’ll elaborate briefly on this idea in this post.
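To see that this is a genuine obstruction and not just an artefact of the formula, here is a standard counterexample in characteristic 2: take k = \mathbb F_2, V = k^2 with standard basis e_1, e_2, and consider the quadratic form

    \[ q(v) = v_1 v_2. \]

If \varphi were a symmetric bilinear form with \varphi(v, v) = q(v), then

    \[ 1 = q(e_1 + e_2) = \varphi(e_1, e_1) + 2\varphi(e_1, e_2) + \varphi(e_2, e_2) = 0 + 0 + 0 = 0, \]

a contradiction, so q does not arise from any symmetric bilinear form.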

Let k be a field and V a finite-dimensional k-vector space. The space of bilinear forms on V can be identified with T^2(V^*), the 2-tensors, by noting that

    \[ \operatorname{Hom}(V \otimes V, k) \cong \operatorname{Hom}(V, \operatorname{Hom}(V, k)) = \operatorname{Hom}(V, V^*) \cong V^* \otimes V^* \]

where the first isomorphism comes from the Hom-tensor adjunction and the last one is an easy exercise (explicitly, \varepsilon \otimes \eta \mapsto (v \mapsto \varepsilon(v)\eta), an isomorphism since V is finite-dimensional). It is also possible to directly construct a bilinear map

    \begin{align*} V^* \times V^* &\to \operatorname{Bilin}(V, V) \\ (\varepsilon, \eta) &\mapsto ((u, v) \mapsto \varepsilon(u) \eta(v)) \end{align*}

In this way, the symmetric tensors in T^2(V^*) are precisely those invariant under the action of S_2, i.e. permutation of the two factors, and thus can be identified with symmetric bilinear forms.

Now if 2! is invertible in k (equivalently, the characteristic of k is not 2), the symmetrization map

    \[ \frac{1}{2!}\operatorname{Sym}:  T^2(V^*) \to T^2(V^*) \]

induces an isomorphism between S^2(V^*), the degree-2 part of the symmetric algebra, and the subspace of symmetric 2-tensors (here \operatorname{Sym}(\varepsilon \otimes \eta) = \varepsilon \otimes \eta + \eta \otimes \varepsilon). To identify quadratic forms with S^2(V^*), we can construct a similar bilinear map from V^* \times V^* to the space of quadratic forms and appeal to the universal property of S^2(V^*): symmetric bilinear maps factor through it. In this way we have shown that symmetric bilinear forms and quadratic forms are isomorphic.
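Concretely, the two directions of this equivalence are the familiar maps: a symmetric bilinear form \varphi gives the quadratic form q(v) = \varphi(v, v), and the polarization identity recovers \varphi from q:

    \[ \varphi(u, v) = \frac{1}{2}\big(q(u + v) - q(u) - q(v)\big), \]

which, once again, visibly requires 2 to be invertible.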

One more thing we get for free from this construction is that the space of bilinear forms is the direct sum of symmetric and skew-symmetric forms: since the symmetrization map (call it \pi) is a projection, we have

    \[ T^2(V^*) \cong \ker(\pi) \oplus \operatorname{im}(\pi) \]

where the former is spanned by elements of the form \varepsilon \otimes \eta - \eta \otimes \varepsilon (precisely the degree-2 part of the ideal we quotient by to form the symmetric algebra) and consists of the skew-symmetric forms, and the latter can be identified with the symmetric forms.
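In matrix terms, after choosing a basis so that a bilinear form is represented by a matrix B, this is just the familiar decomposition

    \[ B = \underbrace{\tfrac{1}{2}(B + B^\top)}_{\text{symmetric}} + \underbrace{\tfrac{1}{2}(B - B^\top)}_{\text{skew-symmetric}}. \]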

Just a word to end the post: as long as the characteristic of the field is not 2, it is possible to generalise this construction (and thus the equivalence) to *-algebras. This is related to \varepsilon-quadratic forms.


References

D. Dummit & R. Foote, Abstract Algebra, § 11.5

Recommendation: A beginner’s guide to forcing

I’ve just finished reading Timothy Chow’s A beginner’s guide to forcing and what a nice read it is! I recommend that all readers, particularly those interested in model theory and set theory, take a look.

I’m particularly fond of the exposition at the beginning addressing the validity of ZFC as the foundation of maths, which I had had qualms about at some point:

One common confusion about models of ZFC stems from a tacit expectation that some people have, namely that we are supposed to suspend all our preconceptions about sets when beginning the study of ZFC. For example, it may have been a surprise to some readers to see that a universe is defined to be a set together with…. Wait a minute—what is a set? Isn’t it circular to define sets in terms of sets? In fact, we are not defining sets in terms of sets, but universes in terms of sets. Once we see that all we are doing is studying a subject called “universe theory” (rather than “set theory”), the apparent circularity disappears.

The reader may still be bothered by the lingering feeling that the point of introducing ZFC is to “make set theory rigorous” or to examine the foundations of mathematics. While it is true that ZFC can be used as a tool for such philosophical investigations, we do not do so in this paper. Instead, we take for granted that ordinary mathematical reasoning—including reasoning about sets—is perfectly valid and does not suddenly become invalid when the object of study is ZFC. That is, we approach the study of ZFC and its models in the same way that one approaches the study of any other mathematical subject.

There is also a very nice explanation of Skolem’s paradox, perhaps the best I’ve seen. So much for the introduction; I’ll leave you to find out what forcing is and how it is used to prove the independence of CH from ZFC.

Quadratic fields

An illustrated summary of IID Number Fields using quadratic fields is available here. Therein the following concepts/results are stated, usually accompanied by a worked example in a quadratic field (a sample of which is given after the list):

  • ring of integers
  • integral basis
  • discriminant
  • Dedekind’s criterion, unique factorisation & class group
  • Minkowski’s theorem and application
  • Dirichlet’s unit theorem
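As a sample, here is the standard description of the ring of integers of a quadratic field K = \mathbb Q(\sqrt d), d squarefree:

    \[ \mathcal O_K = \begin{cases} \mathbb Z[\sqrt d] & \text{if } d \equiv 2, 3 \pmod 4 \\ \mathbb Z\big[\frac{1 + \sqrt d}{2}\big] & \text{if } d \equiv 1 \pmod 4 \end{cases} \]

with discriminant 4d and d respectively.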

Different flavours of geometry

This post is almost verbatim from Undergraduate Algebraic Geometry § 0.3.

The specific flavour of algebraic geometry comes from the use of only polynomial functions (together with rational functions); to explain this, if U \subseteq \mathbb R is an open interval, one can reasonably consider the following rings of functions on U:

  • C^0(U): all continuous functions f: U \to \mathbb R;
  • C^\infty(U): all smooth functions (that is, differentiable to any order);
  • C^\omega(U): all analytic functions (that is, convergent power series);
  • \mathbb R[X]: the polynomial ring, whose elements are viewed as polynomial functions on U.

There are of course inclusions \mathbb R[X] \subset C^\omega(U) \subset C^\infty(U) \subset C^0(U).

These rings of functions correspond to some of the important categories of geometry: C^0(U) to the topological category, C^\infty(U) to the differentiable category (differentiable manifolds), C^\omega(U) to real analytic geometry, and \mathbb R[X] to algebraic geometry.

The point I want to make here is that each of these inclusion signs represents an absolutely huge gap, and that this leads to the main characteristics of geometry in the different categories. Although it’s not stressed very much in school and first-year university calculus, any reasonable way of measuring C^0(U) will reveal that the differentiable functions have measure 0 in the continuous functions. The gap between C^\omega(U) and C^\infty(U) is exemplified by the behaviour of \exp(-1/x^2) (extended by 0 at x = 0), the standard function which is differentiable infinitely often, but for which the Taylor series (at 0) does not converge to the function. Using this one can easily build a bump function that is 1 on a compact set and vanishes outside any given open set containing the compact set. In contrast, an analytic function on U extends (as a convergent power series) to an analytic function of a complex variable on a suitable domain in \mathbb C, so that (using results from complex analysis) if f \in C^\omega(U) vanishes on a real interval, it must vanish identically. This is a kind of “rigid” property which characterises analytic geometry as opposed to differential topology.
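To spell out the standard computation behind that example: let f(x) = \exp(-1/x^2) for x \neq 0 and f(0) = 0. By induction, for x \neq 0 every derivative has the form

    \[ f^{(n)}(x) = P_n(1/x) \exp(-1/x^2) \]

for some polynomial P_n, and each such expression tends to 0 as x \to 0, so f^{(n)}(0) = 0 for all n. The Taylor series of f at 0 is therefore identically zero, yet f(x) > 0 for all x \neq 0.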


References

M. Reid, Undergraduate Algebraic Geometry, § 0.3

Determinant trick, Cayley-Hamilton theorem and Nakayama’s lemma

The post is also available as pdf.

The Cayley-Hamilton theorem is usually presented in a linear algebra context, where it says that an F-vector space endomorphism (and its matrix representation) satisfies its own characteristic polynomial. Actually, this result generalises to modules over commutative rings (the modules certainly need to be finitely generated for the determinant to make sense). There are many proofs available, among which is the bogus proof that substitutes the matrix into the characteristic polynomial \det (x \cdot I - A) and obtains

    \[ \det (A \cdot I - A) = \det (A - A) = 0. \]

The reason it doesn’t work is that the product x \cdot I in the characteristic polynomial is multiplication of a matrix by a scalar, while the product assumed in the bogus proof is matrix multiplication. In this post I will show that the intuition behind this “substitution-and-cancellation” is correct (in a narrow sense), and that the proof can be rescued using a trick which is quite useful in its own right.

I will develop the theory using the language of rings and modules, but if you are not familiar with it, feel free to substitute “fields” and “vector spaces” throughout.

Let A be a commutative ring and M an A-module generated by \{m_1, \dots, m_n\}. Note that M is naturally an \End(M)-module, and for f \in \End(M) we write [f] \in \matrixring_n(A) for a matrix representation of f with respect to the generators above, i.e. f(m_i) = \sum_j [f]_{ij}m_j (such a matrix exists since the m_i generate M, though it need not be unique). In particular, there is a ring homomorphism \mu: A \to \End(M), a \mapsto a \cdot -, sending an element to its multiplication action. Let A' = \mu(A).

There is a technical remark to make: later we will take determinants of matrices over \End(M), which is non-commutative. However, throughout the discussion we are concerned with only one endomorphism \varphi (besides multiplication, of course), so we can restrict the scalars to A'[\varphi], a commutative subring of \End(M): A' is central in \End(M) because multiplication by a scalar commutes with every A-linear map, and \varphi commutes with itself.

Given a module endomorphism \varphi: M \to M, its characteristic polynomial is defined to be

    \[ \chi_{[\varphi]}(x) = \det (x \cdot I - [\varphi]) \in A[x] \]

where I is the n \times n identity matrix and the product x \cdot I is multiplication of a matrix by a scalar. Note that \chi_{[\varphi]} depends on the choice of generators in general (for example, \id on \mathbb Z/2 as a \mathbb Z-module has characteristic polynomial x - 1 with respect to the generating set \{1\}, but (x - 1)^2 with respect to the redundant generating set \{1, 1\}). We have

Cayley-Hamilton Theorem

    \[ \chi_{[\varphi]}(\varphi) = 0. \]

Note that this is a relation of endomorphisms with coefficients in A.
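Before the proof, a sanity check in the familiar case of a 2 \times 2 matrix over a field:

    \[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad \chi_A(x) = x^2 - (a + d)x + (ad - bc), \]

and one checks directly that A^2 - (a + d)A + (ad - bc)I = 0.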


Proof: Let [\varphi]_{ij} = a_{ij} and view M as an A'[\varphi]-module. Since

    \[ \varphi m_i = \sum_j a_{ij}m_j, \]

we have

(*)   \begin{equation*} \sum_j \underbrace{(\varphi \delta_{ij} - a_{ij})}_{\Delta_{ij}} m_j = 0 \end{equation*}

with

    \[ \Delta = \varphi \cdot I - [\varphi] \in \matrixring_n(A'[\varphi]). \]

Again, the multiplication is by the scalar \varphi, viewed as an element of the ring A'[\varphi] \subseteq \End(M).

We claim that if \det \Delta = 0 \in A'[\varphi] then we are done: consider the ring homomorphism

    \begin{align*} A[x] &\to \End(M) \\ x &\mapsto \varphi \end{align*}

which maps \chi_{[\varphi]}(x) \mapsto \chi_{[\varphi]}(\varphi) = \det \Delta, since applying a ring homomorphism entrywise commutes with taking determinants (\det being a polynomial function of the entries). So done.

To show that \det \Delta = 0, recall that

    \[ (\adj \Delta) \cdot \Delta = \det \Delta \cdot I \in \matrixring_n(A'[\varphi]) \]

where the multiplication on the left is between matrices. Let (\adj \Delta)_{ij} = b_{ij}. Multiplying (*) by b_{ki}, summing over i and applying the identity, we get

    \[ \sum_{i, j} (b_{ki} \Delta_{ij}) m_j = \sum_j (\det \Delta \delta_{kj}) m_j = (\det \Delta) m_k = 0. \]

Since this holds for every k and the m_k generate M, \det \Delta acts as zero on all of M, i.e. \det \Delta = 0 in \End(M), as required.


If you feel that little work is done in the proof and suspect it might be tautological somewhere (a suspicion I had when I first saw this proof), go through it again and convince yourself that it is indeed a bona fide proof. There are two tricks used here: firstly, we extend the scalars by regarding M as an A'[\varphi]-module, so that the action of \varphi becomes scalar multiplication. This is the essential step, since it allows us to treat \varphi and multiplication by elements of A on an equal footing. Secondly, the ring homomorphism A[x] \to A'[\varphi] substitutes \varphi in place of x. Aha, the intuition in the bogus proof is correct, but we need a little extra work to sort out the notation and express precisely what we mean.

The key idea in the proof, sometimes called the determinant trick, has many applications in commutative algebra:

Let M be an A-module generated by n elements and \varphi: M \to M a homomorphism. Suppose I is an ideal of A such that \varphi(M) \subseteq IM. Then there is a relation

    \[ \varphi^n + a_1 \varphi^{n - 1} + \dots + a_{n - 1} \varphi + a_n = 0 \]

where a_i \in I^i for all i.


Proof: Let \{m_1, \dots, m_n\} be a set of generators of M. Since \varphi(m_i) \in IM, we can write

    \[ \varphi m_i = \sum_j a_{ij}m_j \]

with a_{ij} \in I. Multiplying

    \[ \sum_j \underbrace{(\varphi \delta_{ij} - a_{ij})}_{\Delta_{ij}} m_j = 0 \]

by \adj \Delta as in the previous proof, we deduce that (\det \Delta) m_j = 0 for all j, so \det \Delta = 0 \in \End(M). Expanding \det \Delta as a polynomial in \varphi gives the required relation: up to sign, the coefficient of \varphi^{n - i} is a sum of i-fold products of the a_{ij}, hence lies in I^i.
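A toy instance of the statement: take A = \mathbb Z, M = \mathbb Z^2, I = 2\mathbb Z, and let \varphi be given by the matrix with all entries equal to 2, so that \varphi(M) \subseteq IM. Then

    \[ \det(x \cdot I - [\varphi]) = (x - 2)^2 - 4 = x^2 - 4x, \]

and indeed \varphi^2 - 4\varphi = 0, with -4 \in I and 0 \in I^2.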


An immediate corollary is Nakayama’s Lemma, which is itself an important result in commutative algebra:

Nakayama’s Lemma

If M is a finitely generated A-module and I \ideal A is such that M = IM, then there exists x \in A such that x - 1 \in I and xM = 0.


Proof: Apply the trick to \id_M, noting \id_M(M) = M = IM. Since \id_M^i = \id_M (and the constant term a_n stands for a_n \id_M), we get

    \[ \left(1 + \sum_{i = 1}^n a_i\right) \id_M = 0. \]

Now take x = 1 + \sum_{i = 1}^n a_i: then x - 1 \in I since each a_i \in I^i \subseteq I, and xM = 0.
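A toy example: take A = \mathbb Z, M = \mathbb Z/4\mathbb Z and I = 3\mathbb Z. Then M = IM since 3 is invertible mod 4, and indeed

    \[ x = 4, \qquad x - 1 = 3 \in I, \qquad xM = 4 \cdot \mathbb Z/4\mathbb Z = 0. \]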


We use the result to prove a rather interesting fact about module homomorphisms:

Let M be a finitely generated A-module. Then every surjective module homomorphism on M is also injective.


Proof: Let \varphi: M \to M be surjective. View M as an A'[\varphi]-module and let I = (\varphi) \ideal A'[\varphi]. Then M = IM by surjectivity of \varphi, so by Nakayama’s Lemma there exists x = 1 + \varphi\psi with \psi \in A'[\varphi] such that (1 + \varphi\psi)M = 0, i.e. for all m \in M, m = \varphi(-\psi(m)). Since \psi commutes with \varphi, -\psi is a two-sided inverse of \varphi; in particular \varphi is injective.


As a side note, the converse is not true: injective module homomorphisms need not be surjective. For example, 2 \cdot -: \mathbb Z \to \mathbb Z.


References

M. Reid, Undergraduate Commutative Algebra, §§ 2.6–2.8