Modular forms, projective structures,
and the four squares theorem
Abstract.
It is well-known that Lagrange’s four-square theorem, stating that every natural number may be written as the sum of four squares, may be proved using methods from the classicaltheory of modular forms and theta functions. We revisit this proof. In doing so, we concentrate on geometry and thereby avoid some of the tricky analysis that is often encountered. Guided by projective differential geometry we find a new route to Lagrange’s theorem.
1991 Mathematics Subject Classification:
11F03, 11F27, 53A20
An artist’s impression of the action of on the unit disc.
In the Polish wycinanka łowicka style by Katarzyna Nurowska.
1. Introduction
In 1770, Lagrange proved that every natural number can be written as the sum of four squares. In 1834, Jacobi gave a formula for the number of different ways that this can be done. More precisely, if we consider the formal power series
$$\theta(q) = \sum_{n\in\mathbb{Z}} q^{n^2} = 1 + 2q + 2q^4 + 2q^9 + \cdots \tag{1}$$
then Lagrange’s Theorem says that all coefficients of $\theta(q)^4$
are positive whilst Jacobi’s Theorem gives a manifestly positive formula for these coefficients. In fact, it is evident from the identity
that, for Lagrange’s theorem, it suffices to show that all odd natural numbers may be written as the sum of four squares whence it suffices to establish Jacobi’s formula in this case, namely that
$$\#\{(a,b,c,d)\in\mathbb{Z}^4 : a^2+b^2+c^2+d^2 = n\} = 8\,\sigma(n) \quad\text{for odd } n, \tag{2}$$
where $\sigma(n)$ is the sum-of-divisors function. The aim of this article is to prove (2). It is well-known that this can be accomplished using modular forms and this is what we shall do. However, some of the tricky analysis can be avoided in favour of geometry. This is one motivation for this article. Another is that a key feature of the usual proof, namely that a certain vector space of modular forms is two-dimensional, is replaced by the two-dimensionality of the solution space to a projectively invariant linear differential equation. This reasoning is potentially applicable for automorphic forms beyond complex analysis.
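As a sanity check on the statement to be proved, the following sketch counts representations by brute force. It assumes that (1) is the series $\sum_{n\in\mathbb{Z}} q^{n^2}$ and that (2) is Jacobi's formula $8\sigma(n)$ for odd $n$; the names `r4_counts`, `sigma`, and `LIMIT` are ours and purely illustrative.

```python
def r4_counts(limit):
    """Coefficients of (sum over n in Z of q^(n^2))^4 up to q^limit.

    The n-th entry counts integer 4-tuples (a, b, c, d), signs and order
    included, with a^2 + b^2 + c^2 + d^2 = n.
    """
    theta = [0] * (limit + 1)
    theta[0] = 1
    k = 1
    while k * k <= limit:
        theta[k * k] = 2  # contributions from n = +k and n = -k
        k += 1

    def multiply(a, b):
        # Truncated product of two power series given as coefficient lists.
        out = [0] * (limit + 1)
        for i, ai in enumerate(a):
            if ai:
                for j in range(limit + 1 - i):
                    out[i + j] += ai * b[j]
        return out

    theta_squared = multiply(theta, theta)
    return multiply(theta_squared, theta_squared)


def sigma(n):
    """Sum of the positive divisors of n."""
    return sum(d for d in range(1, n + 1) if n % d == 0)


LIMIT = 200  # illustrative cutoff, not taken from the article
r4 = r4_counts(LIMIT)
lagrange_holds = all(r4[n] > 0 for n in range(LIMIT + 1))
jacobi_odd_holds = all(r4[n] == 8 * sigma(n) for n in range(1, LIMIT + 1, 2))
print(r4[:6], lagrange_holds, jacobi_odd_holds)
```

Of course, such a finite check proves nothing; it merely makes the target of the article concrete.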
2. The twice-punctured sphere
It is not commonly realised that the first contributor to the theory of modular forms was the cartographer Mercator, who in 1569 found an accurate conformal map of the twice-punctured round sphere. With the punctures at the South and North Poles, this Mercator projection is the default representation of the Earth to be found in ordinary atlases (but we find it convenient to put the southern hemisphere at the top). From a modern perspective, it may be constructed in two steps:
- Use stereographic projection to identify $S^2\setminus\{\text{North Pole}\}$ with the complex plane $\mathbb{C}$.
- Use the complex logarithm to ‘unwrap’ the punctured complex plane $\mathbb{C}\setminus\{0\}$ to its universal cover $\mathbb{C}$.
These two steps are conformal, the first by geometry or calculus, and the second by the Cauchy-Riemann equations. Explicit formulæ are
and we end up with two crucial (and conformal) facts:
- ,
- is a local coördinate on near the South Pole.
Note that this essential appearance of the logarithm in the Mercator projection predates Napier and others (in the seventeenth century).
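The two-step construction can be sketched numerically. The following is a minimal illustration, assuming stereographic projection from the North Pole of the unit sphere (so that the South Pole is sent to $0$) and the principal branch of the logarithm; the normalisation and orientation used in the article's own formulæ may differ, and the function names are ours.

```python
import cmath
import math


def stereographic(lat, lon):
    """Project a point of the unit sphere from the North Pole to the plane.

    lat and lon are in radians; the South Pole (lat = -pi/2) goes to 0.
    """
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return complex(x, y) / (1 - z)


def mercator(lat, lon):
    """Compose stereographic projection with the complex logarithm.

    Returns (east, north): the imaginary and real parts of the logarithm.
    """
    w = cmath.log(stereographic(lat, lon))
    return w.imag, w.real


lat, lon = 0.5, 1.0
east, north = mercator(lat, lon)
# The textbook Mercator ordinate, for comparison.
textbook_north = math.log(math.tan(math.pi / 4 + lat / 2))
print(east, north, textbook_north)
```

The agreement of `north` with the textbook ordinate $\log\tan(\pi/4 + \phi/2)$ reflects the elementary identity $\cos\phi/(1-\sin\phi) = \tan(\pi/4+\phi/2)$.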
The Mercator realisation of the twice-punctured sphere
may already be used to prove some useful identities as follows.
Theorem 1.
If , then
(3) |
Proof.
It is easy to check that the left hand side is uniformly convergent on compact subsets of . It is invariant under and therefore descends to a holomorphic function on the thrice-punctured sphere:
Let us call this function and note that
-
•
as ,
-
•
.
It follows that extends holomorphically through and and has zeroes at these two points whilst at it clearly extends meromorphically with a double pole there. Hence,
for some constant . To compute , we may substitute to find that
Finally, if , then
as required. ∎
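For the reader who likes to see numbers, here is a hedged numerical check. We assume that (3) is the classical identity $\sum_{n\in\mathbb{Z}} (\tau+n)^{-2} = -4\pi^2\, q/(1-q)^2 = -4\pi^2\sum_{m\geq 1} m q^m$ with $q = e^{2\pi i\tau}$, which is what the pole and zero count in the proof suggests; the truncation levels and function names below are our own choices.

```python
import cmath
import math


def lattice_sum(tau, cutoff=100_000):
    """Symmetric partial sum of 1/(tau + n)^2 over |n| <= cutoff."""
    total = 1 / tau**2
    for n in range(1, cutoff + 1):
        total += 1 / (tau + n) ** 2 + 1 / (tau - n) ** 2
    return total


def closed_form(tau):
    """The rational function of q = exp(2 pi i tau) with constant -4 pi^2."""
    q = cmath.exp(2j * math.pi * tau)
    return -4 * math.pi**2 * q / (1 - q) ** 2


def q_series(tau, terms=200):
    """Truncated expansion -4 pi^2 * sum of m q^m, valid for Im(tau) > 0."""
    q = cmath.exp(2j * math.pi * tau)
    return -4 * math.pi**2 * sum(m * q**m for m in range(1, terms + 1))


tau = 0.3 + 0.7j
gap_sum = abs(lattice_sum(tau) - closed_form(tau))
gap_series = abs(closed_form(tau) - q_series(tau))
# The special value used to fix the constant: tau = 1/2 gives pi^2.
gap_half = abs(lattice_sum(0.5) - math.pi**2)
print(gap_sum, gap_series, gap_half)
```

The lattice sum converges slowly (the tail is of order $2/N$ after $N$ terms), so only a few digits of agreement should be expected from `gap_sum` and `gap_half`.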
Corollary 1.
For and ,
(4) |
Proof.
We remark that identities such as (3) and (4) are often established using ‘unfamiliar expressions’ for trigonometric functions and regarded as a ‘standard rite of passage into modular forms’ [2, p. 5]. Already, we see the utility of the Mercator projection in identifying the universal cover of the twice-punctured sphere and it is natural to ask about a similar identification for the thrice-punctured sphere.
3. The thrice-punctured sphere
Our exposition in this section follows advice from Tony Scholl to the first author in 1984.
Let be the thrice-punctured Riemann sphere. More specifically, let us use the standard coördinate , and set
By the Riemann mapping theorem there is a conformal isomorphism between the lower half plane
and the following subset
(5) |
of the upper half plane . In fact, as with all Riemann mappings, there is a three-parameter family thereof and we need to specify just one of them. To do this let us extend the lower half plane as the complement of two rays
extend the target domain as
and consider the Riemann mapping between these extensions that sends to and, at these points, sends the direction to , as shown.
This particular Riemann mapping is chosen so that it intertwines the involution (having fixed point ) with the involution of (fixing and preserving the extended target).
We conclude that the lower half plane is sent to the ‘tile’
and that this mapping holomorphically extends across the line segment to the upper half plane, which itself is sent to a neighbouring and translated tile attached to the right of the original. It is illuminating to view this construction on the sphere
with the lower hemisphere as domain and, replacing the upper half plane by the unit disc with its hyperbolic metric, the target is now the ideal triangle:
(6) |
From this point of view, the mapping extends holomorphically through a ‘portal’ in the equator between and to the upper hemisphere, with the result mapping to
However, there are three such portals to the upper hemisphere, all on an equal footing with respect to the evident three-fold rotational symmetry. Using all three unwraps the thrice-punctured sphere to
(7) |
and, of course, we can keep going to and fro between north and south through our three portals to obtain a tessellation (familiar from the works of M.C. Escher) of the hyperbolic disc and a conformal covering . This is an explicit realisation of the universal covering. We remark that the Little Picard Theorem follows immediately from this realisation.
4. Symmetries of the upper half plane
The reader may be wondering why we viewed the extended target as a domain in the upper half plane
rather than the corresponding domain in the unit disc:
The point is that the upper half plane is more congenial with regard to an explicit realisation of the symmetry group for which this extended tile is a fundamental domain.
Lemma 1.
The two transformations
generate a group of biholomorphisms of the upper half plane , having as fundamental domain.
Proof.
Regarding the ideal triangle (6), the corresponding tessellation (7) is evidently generated by the three hyperbolic reflections in its sides. Viewed in the upper half plane (5), these three reflections are
Therefore, the group we seek may be generated by , , and , namely
But these three transformations are , , and . ∎
It is useful to have an algebraic description of the group generated by and . To this end, and also because we shall need some of this algebra for other purposes later on, we record some well-known properties of the following well-known group.
4.1. The modular group
This is an alternative name for the group , of unit determinant matrices with integer entries. It is generated by
Notice that
There is a normal subgroup . The quotient group is denoted . It is generated by and subject to the relations . The group acts on the upper half plane according to
this action descending to a faithful action of . Indeed, this action identifies as the biholomorphisms of . Having done this, the subgroup acts properly discontinuously on . It is easy to verify and well-known that
(8) |
is a fundamental domain for this action.
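The relations among the generators are easily checked by machine. The sketch below assumes the customary choices $S = \begin{pmatrix} 0 & -1 \\ 1 & 0\end{pmatrix}$ and $T = \begin{pmatrix} 1 & 1 \\ 0 & 1\end{pmatrix}$, which may differ from the article's by a harmless convention; the helper names are ours.

```python
def matmul(a, b):
    """Product of two 2x2 integer matrices given as nested tuples."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    return (
        (a11 * b11 + a12 * b21, a11 * b12 + a12 * b22),
        (a21 * b11 + a22 * b21, a21 * b12 + a22 * b22),
    )


def mobius(m, tau):
    """Action of a matrix on the upper half plane by tau -> (a tau + b)/(c tau + d)."""
    (a, b), (c, d) = m
    return (a * tau + b) / (c * tau + d)


S = ((0, -1), (1, 0))  # assumed convention: tau -> -1/tau
T = ((1, 1), (0, 1))   # assumed convention: tau -> tau + 1
MINUS_IDENTITY = ((-1, 0), (0, -1))

S_squared = matmul(S, S)
ST = matmul(S, T)
ST_cubed = matmul(ST, matmul(ST, ST))

tau = 0.25 + 2j
image_under_S = mobius(S, tau)
image_under_T = mobius(T, tau)
print(S_squared, ST_cubed, image_under_S, image_under_T)
```

Both `S_squared` and `ST_cubed` come out as minus the identity, so these elements act trivially on the upper half plane, in accordance with the relations in the quotient group.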
4.2. Some congruence subgroups
Let us consider the following two subgroups of .
-
•
.
-
•
.
It is clear that
and easily verified that has 48 elements. In particular, the subgroup has index 48 in . Also the homomorphism
shows that of index . Therefore, whilst is not a normal subgroup of , it has index .
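The count of 48 can be confirmed by brute force. We assume here that the finite group in question is the group of $2\times 2$ matrices of unit determinant with entries in $\mathbb{Z}/4$; the function name `sl2_mod` is ours.

```python
from itertools import product


def sl2_mod(n):
    """All 2x2 matrices (a, b, c, d) over Z/n with ad - bc = 1 modulo n."""
    return [
        (a, b, c, d)
        for a, b, c, d in product(range(n), repeat=4)
        if (a * d - b * c) % n == 1 % n
    ]


order_mod_4 = len(sl2_mod(4))
order_mod_2 = len(sl2_mod(2))
print(order_mod_4, order_mod_2)
```

The same enumeration modulo 2 yields a group of order 6, a number that reappears as an index in the proof of Lemma 2 below.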
We may now achieve our goal of an algebraic description of the group generated by and .
Lemma 2.
The subgroup of generated by
and |
is .
Proof.
We give a geometric proof by comparing fundamental domains. To this end we note that
(9) |
is a perfectly good alternative to the usual (8) as a fundamental domain for the action of . Moreover, six hyperbolic copies of this alternative may be used to tile the fundamental domain concerning the action of Lemma 1:
(10) |
We have observed that has index . It follows that
has index and, therefore, that may be regarded as a subgroup of of index . Certainly,
Equality follows because, as subgroups of , they have the same index of , as (10) shows. ∎
It is usual to introduce another congruence subgroup of the modular group
but it has already occurred in our proof above as .
In summary, the group acts on the upper half plane by
The resulting homomorphism is a double cover, having as kernel. The subgroup descends to
which acts discontinuously and without fixed points. The resulting mapping
is an explicit (and conformal) realisation of the universal cover of the thrice-punctured sphere .
Note that there is still a certain amount of mystery built into this realisation, which can be traced back to our use of the non-constructive Riemann mapping theorem at the start of Section 3. This mystery now shows up in our having two natural local coördinates near the South Pole. On the one hand, we may write , as we did for the twice-punctured sphere, to obtain a local holomorphic coördinate replacing for as . On the other hand, we have, by construction, the global meromorphic coördinate on the sphere with the South Pole at . It follows that is a holomorphic function of near and vice versa. For the moment, the relationship between and is mysterious save that various key points coincide:
It is clear, however, that acquires a projective structure: a preferred set of local coördinates related by Möbius transformations. In fact, it is better: we have defined up to freedom (real Möbius transformations).
5. Puncture repair
The main upshot of the reasoning in Sections 3–4 is a realisation of the thrice-punctured Riemann sphere as the upper half plane modulo the action of , an explicit subgroup of acting properly discontinuously and without fixed points. Furthermore, it is evident from this construction, that may be compactified as the Riemann sphere (using, for example, the coördinate change ). In fact, an argument due to Ahlfors and Beurling [1] shows that there are no other conformal compactifications.
Theorem 2.
Suppose is a compact Riemann surface with a conformal isomorphism onto an open subset of . Then must be conformal to the Riemann sphere with the standard embedding.
Proof.
In fact, this is a local result as in the following picture,
taken from [3]. The punctured open disc is assumed to be conformally isomorphic to the open set (but nothing is supposed concerning the boundary of in ). We conclude that is conformally the disc and the punctured disc, tautologically included. To see this, we calculate in polar coördinates on the unit disc. We know that there is a smooth positive function defined for so that the metric smoothly extends from to . We will encounter a contradiction if contains two or more points since, in this case, the concentric curves , as , have length bounded away from zero in the metric . More explicitly,
is bounded away from zero as . On the other hand, the area of the region in is estimated by Cauchy-Schwarz as
and is therefore forced to be infinite.∎
Otherwise said, there is no difference between the Riemann sphere, either marked at or punctured there. Thus, it makes intrinsic sense on to consider holomorphic -forms that are restricted from meromorphic -forms on with poles only at the marks. Of special interest is the space (in traditional arcane notation)
Theorem 3.
There is a canonical isomorphism
Proof.
The isomorphism is given by
with being a consequence of the Residue Theorem. ∎
In particular, there is the special meromorphic -form
6. Automorphisms of the thrice-punctured sphere
By the Ahlfors-Beurling Theorem, automorphisms of correspond to permutations of and there are two particular ones that we shall find useful. Firstly, since
it follows that
(11) |
induces an automorphism of . In the -coördinate, it is the one that swops and but fixes , namely .
Secondly, since
it follows that
(12) |
is the automorphism of that swops and whilst fixing . Close to , we recognise it as . In the -coördinate, it is
7. The normal distribution
At this point, rather bizarrely, it is useful to discuss the normal distribution
and its well-known invariance under the Fourier transform
More generally, integration by substitution shows that
(13) |
for any . The Poisson summation formula says that
for a suitably well-behaved function (for example, one that lies in Schwartz space). For , as in (13), we find that
(14) |
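The consequence of Poisson summation for Gaussians may be tested numerically. The sketch assumes that (14) is the classical statement $\sum_{n\in\mathbb{Z}} e^{-\pi n^2 t} = t^{-1/2}\sum_{n\in\mathbb{Z}} e^{-\pi n^2/t}$ for real $t>0$, possibly up to the article's normalisation; the name `theta_real` and the sample values of $t$ are ours.

```python
import math


def theta_real(t, terms=50):
    """Truncation of the sum over n in Z of exp(-pi n^2 t), for real t > 0."""
    return 1 + 2 * sum(math.exp(-math.pi * n * n * t) for n in range(1, terms + 1))


samples = (0.1, 0.5, 1.0, 2.0, 7.5)
# Largest discrepancy between the two sides of the assumed identity.
max_gap = max(abs(theta_real(t) - theta_real(1 / t) / math.sqrt(t)) for t in samples)
print(max_gap)
```

The terms decay so rapidly that fifty of them are far more than is needed at these sample points.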
8. A miracle
An outrageous suggestion is to view the formal power series (1) as defining a holomorphic function of the complex variable (now called Jacobi’s theta function). Clearly, it is convergent for . Hence, setting , we obtain a holomorphic function of for . Then a miracle occurs:
Theorem 4.
For , we have
Equivalently, if we define by
and consider the holomorphic -form then
(15) |
Proof.
Notice that the transformation has already made its appearance (11) as inducing an automorphism of , the thrice-punctured sphere. If we also introduce by
then it is clear that and . Hence, we see that
(16) |
Finally, to obtain a geometric interpretation of (15) we note that
is given by
and recall that and together generate . Note that in accordance with (15) and (16). Putting all this together, we have proved the following.
Theorem 5.
The holomorphic -form descends to the thrice-punctured sphere and, under the automorphism , satisfies .
Corollary 2.
In the usual -coördinate on the thrice-punctured sphere,
Proof.
From we see that and so
near and, in particular, meromorphically extends through , having a simple pole there with residue . This is a coördinate-free statement and so also applies in the -coördinate:
Recall that in the -coördinate, the automorphism interchanges with whilst fixing . The relation , implies that also has a pole at with residue . Finally, the behaviour of at may be investigated by means of the automorphism (12), let us call it , which swops and whilst fixing . In particular, we may easily compare along the imaginary -axis with its behaviour along the translated axis :
It is clear that has only a simple pole at . But is real-valued when or , and the -expansion coefficients are all non-negative, so is dominated by as . The possibility of an essential singularity is excluded by the observation that the intersection of any semicircle centred at with an appropriately chosen fundamental domain containing is a finite curve, so the maximal value of , as runs along the semicircle, is bounded by for some real . So the behaviour of at is certainly no worse than the behaviour at .
In summary, the holomorphic -form on enjoys a meromorphic extension to with
- a simple pole at with residue ,
- a simple pole at with residue ,
- at worst a simple pole at .
By the residue theorem, the sum of the residues of any meromorphic -form on any Riemann surface is zero. It follows that has poles only at and . Having identified precisely two poles, it cannot have any zeros. At this point is determined as stated. ∎
9. An Eisenstein series
Introduce
and, by absolute convergence, observe that
(17) |
Theorem 6.
(18) |
where (and recall that ).
Proof.
This is a straightforward application of (4):
10. The Ramanujan ODE
Following Ramanujan, let
(21) |
defined for . The following identity was proved by Ramanujan [4, identities (17), (27), (28), and (30)], as a corollary of his straightforward but inspired proof of a certain identity between Lambert series. These Lambert series identities were elucidated by van der Pol [6], who showed that they ultimately derive from the product formula and transformation formula for Jacobi’s theta function. A direct combinatorial proof is due to Skoruppa [5].
Theorem 7.
As (formal) power series,
(22) |
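One of the identities from [4] can be checked coefficient by coefficient in exact integer arithmetic. We assume, in Ramanujan's notation, $P = 1 - 24\sum \sigma_1(n) q^n$ and $Q = 1 + 240\sum \sigma_3(n) q^n$, and we test $12\, q\, dP/dq = P^2 - Q$; this may not be literally the form in which (22) is stated, and all names in the sketch are ours.

```python
def sigma_k(n, k):
    """Sum of the k-th powers of the positive divisors of n."""
    return sum(d**k for d in range(1, n + 1) if n % d == 0)


N = 30  # illustrative truncation order
P = [1] + [-24 * sigma_k(n, 1) for n in range(1, N + 1)]
Q = [1] + [240 * sigma_k(n, 3) for n in range(1, N + 1)]

# Coefficients of P^2 up to q^N (Cauchy product).
P_squared = [sum(P[i] * P[n - i] for i in range(n + 1)) for n in range(N + 1)]

# q d/dq multiplies the coefficient of q^n by n.
lhs = [12 * n * P[n] for n in range(N + 1)]
rhs = [P_squared[n] - Q[n] for n in range(N + 1)]
identity_holds = lhs == rhs
print(identity_holds, lhs[:4])
```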
As usual, by setting , we may view as a holomorphic function for . A change of variables gives
(23) |
an equivalent statement to (22). Locally, we may write
(24) |
and (23) becomes . Thus, we are led to consider
(25) |
for a holomorphic function and (22) says that is a solution of (25). We may investigate the solutions of the linear equation (25) quite explicitly. Firstly, we may figure out much more about as follows.
Lemma 3.
We may take
a globally defined holomorphic function .
Proof.
As an aside, we note that the resulting power series expansion
where, as one obtains easily from (26),
(27) |
has integer coefficients. Indeed, the generating function of is the -expansion of a Lambert series
which, upon rewriting, assumes the form
But the -expansion of this infinite product is well-known. It is the generating function of the manifestly integral partition numbers :
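The integrality can be watched in action. The sketch below uses the standard recurrence $n\,p(n) = \sum_{k=1}^{n} \sigma(k)\, p(n-k)$, which is what logarithmic differentiation of the infinite product gives; we do not claim that this is precisely the recursion (27), and the function names are ours.

```python
def sigma(n):
    """Sum of the positive divisors of n."""
    return sum(d for d in range(1, n + 1) if n % d == 0)


def partitions_via_sigma(limit):
    """Partition numbers p(0..limit) from n p(n) = sum of sigma(k) p(n - k)."""
    p = [1]
    for n in range(1, limit + 1):
        total = sum(sigma(k) * p[n - k] for k in range(1, n + 1))
        if total % n != 0:
            raise ArithmeticError("division by n is not exact")
        p.append(total // n)
    return p


def partitions_direct(limit):
    """Partition numbers by expanding the product of 1/(1 - q^m) directly."""
    p = [1] + [0] * limit
    for part in range(1, limit + 1):
        for n in range(part, limit + 1):
            p[n] += p[n - part]
    return p


LIMIT = 40  # illustrative cutoff
from_recurrence = partitions_via_sigma(LIMIT)
from_product = partitions_direct(LIMIT)
print(from_recurrence[:11], from_recurrence == from_product)
```

That the division by $n$ is always exact is not obvious from the recurrence alone; it is the product formula that makes the integrality manifest.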
Returning to (26), we find that satisfies
and, recalling that , we find that .
Let denote the solution space of (25). As is simply-connected, we conclude that is two-dimensional and in Lemma 3 we have already found one non-zero element in . To complete our understanding of it suffices to find another linearly independent element:
Lemma 4.
There is a convergent power series
so that is in .
Proof.
We try as an Ansatz in (25). A calculation shows that (25) reduces to
whereas substituting instead, gives
Each of these gives a recursion relation for the coefficients of a formal power series for the function in question, namely
where and, for ,
By Lemma 3, we know that the power series converges for (and, from this formal point of view, the content of (22) is that the recursion relation (27) yields the same coefficients ). From these recurrence relations it is clear, by induction, that . It follows that also converges for and we are done.∎
In summary, Lemmata 3 and 4 give us a basis for of the form
and are holomorphic functions on the unit disc . Also notice that both and are strictly positive along the imaginary axis in . In particular, we conclude that .
Theorem 8.
The equation (25) is projectively invariant.
Proof.
Firstly, we must explain what the phrase ‘projectively invariant’ means. There is no local structure in the conformal geometry of (an -dimensional complex manifold is locally biholomorphic to ; end of story). Globally, however, the group acts conformally on and this may be recorded as local information on , specifically as a collection of preferred local coördinates, namely and its translates
Roughly speaking, this is a ‘projective structure.’ In any case, to say that (25) is ‘projectively invariant’ is to say that it respects the action of . For this to be true we decree that
(28) |
(In the language of projective differential geometry is a ‘projective density of weight .’) From (17), (18), and (19), we already know that
and so it suffices to show that
which is an elementary consequence of the chain rule. ∎
Recall that , the solution space of (25), is two-dimensional. In accordance with Theorem 8, the group , generated by
(29) |
is represented on . More specifically, if solves (25) then, according to (28), so do
Theorem 9.
The holomorphic function satisfies
(30) |
for .
Proof.
It suffices to prove (30) for the generators and of , specifically that
The first of these holds by Lemma 3, which implies that . To establish the second identity, it suffices to show that for some constant : if , then
so
Therefore
and so
in other words, from (24),
as required. To finish the proof, let us consider the action of on . If , then we may set to obtain as a basis of . By construction
By Lemma 3, we already know that and, from Lemma 4, we know that the action of on is diagonalisable with the other eigenvalue being . In other words
for some constant . In , the matrices (29) satisfy the relations
These same relations must hold for their action on . For this is evident and for we conclude that . Therefore, since
we find that
However, in Lemma 4, we already found in an eigenvector for the action of on with eigenvalue . It follows that
(31) |
for some constant . We have already observed that whereas, substituting into , we find that
Therefore, the only option in (31) is that and so . Hence, assuming that we have found that . This contradiction finishes the proof. ∎
Corollary 3.
The holomorphic -form
is -invariant.
Proof.
We need only check invariance under the generators of :
The first of these is clear since . For the second, we may use Theorem 9 immediately to conclude that
but also that
Subtracting these identities gives
But
the factor of cancels, and we are done. ∎
Lemma 5.
Suppose is a holomorphic function and let . In order that extend to a meromorphic differential form on the unit disc with at worst a simple pole at , it is necessary and sufficient that
- ,
- is bounded on the rectangle .
Proof.
The first condition ensures that is, in fact, a holomorphic function of and then, since the second condition says that is bounded on the disc at which point Riemann’s removable singularities theorem implies that extends holomorphically across the origin: . Therefore,
as required. ∎
Now consider the holomorphic -form
With , as usual, it follows from the definition (21) of that
and so and, in particular, extends holomorphically across . Now we ask what happens at the cusps, a sensible question in view of Corollary 3.
The change of coördinates sends our usual fundamental domain for into itself whilst sending
(it’s a half turn about in the hyperbolic metric on ). In order to figure out the behaviour of let us firstly consider the holomorphic -form . We may view it in the coördinate :
and employ Theorem 9 to conclude that
Of course, whilst is periodic under , is not. Thus, the first stipulation of Lemma 5 in this case (namely, the periodicity of ) is not satisfied. But on the rectangle in the statement of Lemma 5, this function is at least bounded. Now, if we apply the same reasoning to the holomorphic -form , then the boundedness hypothesis of Lemma 5 is again satisfied, and again periodicity fails. When we subtract from , periodicity is restored in view of Corollary 3 and boundedness persists! Lemma 5 now applies and we conclude that has no worse than a simple pole at . Similar reasoning applies concerning the cusp at . With more care we could even compute the residues at these points (but this is an optional extra).
To conclude, we have verified that
and
are meromorphic one-forms on the thrice-punctured sphere with poles and zeros in the same locations. It follows that one is a constant multiple of the other, and the proof of (2) is complete upon comparing their power series expansions in . (The full force of the Jacobi four-square theorem, namely that the number of ways of representing an integer as a sum of four squares of integers is equal to eight times the sum of those of its divisors that are not divisible by four, follows from (2) in an elementary fashion.)
References
- [1] L. Ahlfors and A. Beurling, Conformal invariants and function-theoretic null-sets, Acta Math. 83 (1950) 101–129.
- [2] F.I. Diamond and J. Shurman, A First Course in Modular Forms, Springer 2005.
- [3] M.G. Eastwood and A.R. Gover, Volume growth and puncture repair in conformal geometry, Jour. Geom. Phys. 127 (2018) 128–132.
- [4] S. Ramanujan, On certain arithmetical functions, Trans. Cambridge Philos. Soc. 22 (1916) 159–184.
- [5] N.-P. Skoruppa, A quick combinatorial proof of Eisenstein series identities, Jour. Number Theory 43 (1993) 68–73.
- [6] B. van der Pol, On a non-linear partial differential equation satisfied by the logarithm of the Jacobian theta-functions, with arithmetical applications, Parts I and II, Indagationes Math. 13 (1951) 261–284.