
An illustrated introduction to the arithmetic of Apollonian circle packings, continued fractions, and other thin orbits

Katherine E. Stange Department of Mathematics, University of Colorado, Campus Box 395, Boulder, Colorado 80309-0395 kstange@math.colorado.edu
(Date: , Draft #1)
Abstract.

These notes cover and expand upon the material for two summer schools: The first, which was held at CIRM, Marseille, France, July 10-14, 2023, as part of Renormalization and Visualization for packing, billiard and surfaces, was titled Number theory as a door to geometry, dynamics and illustration. The second was held at NUS IMS in Singapore, June 3-7, 2024, as part of Computational Aspects of Thin Groups, and was titled Integral packings and number theory. Both courses were put together by a number theorist for students and researchers in other fields. They cover a web of ideas relating to Apollonian circle packings, integral orbits, thin groups, hyperbolic geometry, continued fractions, and Diophantine approximation. The connection of geometry and dynamics to number theory gives an opportunity to illustrate arithmetic by appealing to our visual intuition.

THIS IS A DRAFT;
PLEASE REPORT ERRORS

1. Introduction

These notes were meant to serve for two separate summer schools. The first, which was held at CIRM, Marseille, France, July 10-14, 2023, as part of Renormalization and Visualization for packing, billiard and surfaces, was titled Number theory as a door to geometry, dynamics and illustration. The abstract was:

The course will explore several related topics in number theory with dynamical and/or geometric facets: continued fractions, Diophantine approximation, and Apollonian circle packings. We will focus on both theoretical and experimental tools; a parallel goal will be to experience the role of visualization and illustration in mathematical research.

In covering background material, the approach will emphasize the visual and dynamical:

  1. (1)

    Continued fractions, quadratic forms, and Diophantine approximation.

  2. (2)

    Hyperbolic geometry, Minkowski space, and Kleinian groups.

With these tools at hand, we will study some areas of current research:

  1. (1)

    The geometry of Diophantine approximation and continued fractions in the complex plane, including algebraic starscapes and Schmidt arrangements.

  2. (2)

    Apollonian circle packings, with an emphasis on their surprising relationships to the preceding topics.

The second was held at NUS IMS in Singapore, June 3-7, 2024, as part of Computational Aspects of Thin Groups, and was titled Integral packings and number theory. The abstract for this course was:

The course will use Apollonian circle packings as a central example for connections between number theory and thin groups. The symmetries of such a packing are governed by a thin group called the Apollonian group, and the curvatures form an orbit of that group. Our goal is to study such orbits, particularly in the case that the orbit consists entirely of integers. Some of the topics that are entwined with the study of these packings include quadratic forms, hyperbolic geometry in 2 and 3 dimensions, arithmetic geometry, continued fractions, spectral theory of graphs and strong approximation. I will give a tour of the area, with the goal of introducing the number theory perspective on such problems, highlighting the tools at hand, and finishing by considering the wider class of problems that can be phrased as questions about the arithmetic of thin orbits.

In both cases, the audience did not consist of number theorists, and my goal is to highlight connections between number theory and other domains, and to share some of the number theorist’s perspective on ideas familiar in other domains. The notes will contain a number of examples of the ways that number theory leads to geometry and dynamics. As a number theorist, I see these relationships as number theory questions with geometric answers, but this is surely not the only way to see them.

Three motivating questions are:

  1. (1)

    Which complex numbers can be well approximated by algebraic numbers (of various flavours)?

  2. (2)

    What curvatures appear in a primitive integral Apollonian circle packing or another thin orbit?

  3. (3)

    For imaginary quadratic fields $K$, how is the number theory of $\mathcal{O}_{K}$ visible in the Bianchi group $\operatorname{PSL}_{2}(\mathcal{O}_{K})$?

It turns out these are all connected to one another by their underlying geometry. And they are all generalizations of very classical questions in number theory about the integer solutions to equations, the distribution of rationals in the real line, and the study of continued fraction expansions.

I must admit that a particular love of the author is the Apollonian circle packing. If one is interested in the classical number theoretic topics of $\operatorname{PSL}_{2}(\mathbb{Z})$, continued fractions, or quadratic forms, and one begins to wonder about higher dimensional analogues, one comes very naturally to study $\operatorname{PSL}_{2}(\mathbb{Z}[i])$, Schmidt’s continued fractions, and Hermitian forms. Studying these geometrically, one discovers that the essential new riddle that lies wrapped in the mystery inside the enigma…is the Apollonian circle packing. It is a halfway house, an unavoidable geometric feature, but an orbit of a group that is not itself an arithmetic group. It begs the same questions we ask for Diophantine equations, without being one itself. This is its allure.

Finally, these topics are all examples of the interplay between research and the illustration of mathematics. The use of computers to explore mathematics with our highly evolved visual cortices is not only productive, but also sensual, in the language of Conway [Con97]. One of my goals in these notes is to emphasize the utility and beauty of these explorations, and make a case for their wider use in research.

These notes are written by a number theorist. I apologize in advance for the inevitable injustice in making connections to other domains which I cannot fully follow up. I hope, however, that this insufficiency will serve as a motivation. There are a great number of interesting unexplored species in this jungle.

One final note to the reader: on the off chance that your enthusiasm causes you to rush headlong (as mine sometimes does), I would warn you that some important statements are contained in the exercises, so their statements are an important part of the narrative.

1.0.1. Acknowledgements

I owe a great debt of gratitude to Jayadev Athreya for inspiration and support over many years, and for his invitation to participate in his Morlet semester at Centre international de rencontres mathématiques, which eventually led to these notes. Thank you to all the organizers of the Research School, Renormalization and Visualization for packing, billiard and surfaces, held July 10-14, 2023, for which these lecture notes were originally drafted, and to the organizers of the Programme Computational Aspects of Thin Groups, held June 3-7, 2024, for which these lecture notes were adapted and expanded. These organizers include Jayadev Athreya and Nicolas Bedaride in the first case; and Bettina Eick, Eamonn O’Brien, Alan W. Reid, and Ser Peow Tan in the second. Thank you to all the students and researchers who attended and contributed feedback; and gratitude to the Centre international de rencontres mathématiques (CIRM) and the National University of Singapore Institute for Mathematical Sciences (NUS IMS) for hosting and supporting these conferences. Thank you to James Rickards for TA help at the IMS school. Thank you to Elena Fuchs, Daniel Martin and James Rickards for consultation, and to many for sharing permission to use their images (individually noted in each case). Thank you to Glen Whitney for feedback. Thank you to Summer Haag for her detailed reading and feedback, and willingness to dive into the minus signs. And finally, thank you to all my intrepid collaborators on all the projects which are mentioned in these notes.

2. A number theory perspective

2.1. Diophantine problems

Given a system of polynomial equations, what are the integer or rational solutions? Such problems are called Diophantine problems and make up a significant part of modern and ancient number theory.

Some examples of Diophantine problems are:

  1. (1)

    Solutions to a single equation $f(x_{1},\ldots,x_{n})=0$, amongst the most famous of which are quadratic forms $ax^{2}+bxy+cy^{2}=m$ and elliptic curves $y^{2}=x^{3}+ax+b$.

  2. (2)

    Generalizing the previous case, solutions to systems of equations in several variables, which represent algebraic varieties such as abelian varieties, curves and surfaces.

  3. (3)

    Many problems reduce to or can be phrased as relating to Diophantine problems, such as the famous congruent number problem, the units in a ring of integers (e.g., Pell’s equation), and the arithmetic study of function iteration (arithmetic dynamics).

  4. (4)

    Among the more significant examples are algebraic varieties arising as moduli spaces in the study of other problems (e.g., modular curves), where knowing the points amounts to classifying mathematical objects of some type.

  5. (5)

    The study of matrix groups and algebraic groups more generally. This includes most/many Lie groups.

  6. (6)

    In the context of a group such as an algebraic group, the study of the elements of an orbit.

The depth and breadth of the partial list above amply motivates the study of Diophantine equations.

As we shall see, in the process of trying to understand these problems, we are led naturally to look for solutions modulo $p$, in the $p$-adics $\mathbb{Q}_{p}$, in the real numbers $\mathbb{R}$ and in the complex numbers $\mathbb{C}$. Knowledge of the solutions in each of these other realms has a role to play in the original realm – $\mathbb{Q}$ or $\mathbb{Z}$ – which is in some sense the hardest place to look for solutions. We are also led naturally to generalize these questions to number fields and their rings of integers.

2.2. Quadratic forms

There are some natural starting points for the study of Diophantine equations. After linear equations such as $ax+by=c$ (which one solves with the Euclidean algorithm, which is central to everything), the historical and logical next step is quadratic equations and quadratic forms. These will have an important role to play throughout the entirety of these notes.

An $n$-ary quadratic form is a homogeneous polynomial in $n$ variables of degree $2$. (The term form refers to the homogeneity.) Therefore, a binary quadratic form is a homogeneous polynomial in two variables of degree $2$, which will necessarily have the shape

\[ f(x,y)=ax^{2}+bxy+cy^{2} \]

for some coefficients $a,b,c$, which we will take to be in $\mathbb{R}$, to begin.

Binary quadratic forms have a beautiful Diophantine theory, whose geometric incarnations we will meet later. The main question is to ask: what are the values represented by a quadratic form? For the number theorist, we ask especially for forms with integer coefficients, evaluated on integer inputs. A famous example is the following:

Theorem 2.1 (Fermat).

The quadratic form $x^{2}+y^{2}$ represents exactly those odd primes which are $1$ modulo $4$.

There are many proofs of this fact, arising from many different tools. It can be seen as a property of the Gaussian integers $\mathbb{Z}[i]$ and quadratic reciprocity (Corollary 2.8 below), as a result of the fact that $\pi>2$ via the geometry of lattices (Exercise 2.3), proven using dynamics (Exercise 2.9), or proven using Jacobi sums [IR82] or even partition theory [Chr16].
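As a quick numerical sanity check of Theorem 2.1 (an experiment, not a proof), the following Python sketch searches by brute force for a representation $p=x^{2}+y^{2}$ of each small odd prime. The helper names `is_prime` and `two_square_rep` are illustrative choices and do not come from these notes.

```python
from math import isqrt

def is_prime(n: int) -> bool:
    """Trial division; adequate for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))

def two_square_rep(n: int):
    """Return (x, y) with x*x + y*y == n and 0 <= x <= y, or None."""
    x = 0
    while 2 * x * x <= n:
        y = isqrt(n - x * x)
        if x * x + y * y == n:
            return (x, y)
        x += 1
    return None

# Compare representability with the residue of p modulo 4.
for p in range(3, 60, 2):
    if is_prime(p):
        print(p, p % 4, two_square_rep(p))
```

In this range the primes that receive a representation are exactly those with residue $1$ modulo $4$, as the theorem predicts.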

The discriminant $\Delta_{f}$ of the form $f$ is $b^{2}-4ac$. If the form takes both negative and positive values when evaluated at $(x,y)\in\mathbb{Z}^{2}$, we call it indefinite. Otherwise, there are two cases: it can be semi-definite (taking the value $0$ on some non-trivial input) or definite (if it does not). If it is definite, its values away from $(0,0)$ all have the same sign, so it is either positive or negative.

As number theorists, our interest will be in the case $a,b,c\in\mathbb{Q}$. Up to a scalar multiple, it is convenient to take $a,b,c\in\mathbb{Z}$ having no common factor. This is called a primitive integral binary quadratic form. We are mainly interested in positive definite primitive integral binary quadratic forms. This cumbersome terminology teases the limit of human patience, so some authors abbreviate such forms as ‘PDPIBQFs’ or something similarly awkward; we will just say ‘integral quadratic form’ and imply all the abovementioned qualifiers unless stated otherwise. If we say ‘real quadratic form’ we mean a positive definite binary quadratic form with real coefficients.

We wish to abstract away the choice of coordinates inherent in this definition: we would like to identify $f(x,y)$ and $g(X,Y)$ if they are related by a linear change of variables $X=\alpha x+\beta y$, $Y=\gamma x+\delta y$, where $\alpha,\beta,\gamma,\delta\in\mathbb{Z}$ and $\alpha\delta-\beta\gamma=1$. We write this as a matrix action of $\operatorname{SL}_{2}(\mathbb{Z})$:

\[ g(X,Y)=M\cdot f(x,y)=f(M\cdot(x,y)^{t}),\quad M=\begin{pmatrix}\alpha&\beta\\ \gamma&\delta\end{pmatrix}\in\operatorname{SL}_{2}(\mathbb{Z}). \]

We call two such forms properly equivalent or just equivalent (there is also a notion of $\operatorname{GL}_{2}(\mathbb{Z})$-equivalence, but we will prefer this one for now). Equivalent forms share much of their behaviour. Two equivalent forms will represent the same set of integers upon integer inputs, and will have the same discriminant. If $f$ is primitive, then $M\cdot f$ is also primitive. Finally, observe that the matrix $-I\in\operatorname{SL}_{2}(\mathbb{Z})$ has no effect on a form $f$, so that it is just as reasonable to talk about the action of $\operatorname{PSL}_{2}(\mathbb{Z}):=\operatorname{SL}_{2}(\mathbb{Z})/\pm I$.
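To make the action concrete, here is a small Python sketch that expands $f(\alpha x+\beta y,\gamma x+\delta y)$ into new coefficients and checks on one example that the discriminant is unchanged. The representation of a form as a triple `(a, b, c)` and the names `act`, `disc` and `evaluate` are assumptions made for this illustration.

```python
def act(M, f):
    """Coefficients of (M.f)(x, y) = f(al*x + be*y, ga*x + de*y)."""
    (al, be), (ga, de) = M
    a, b, c = f
    return (a * al * al + b * al * ga + c * ga * ga,
            2 * a * al * be + b * (al * de + be * ga) + 2 * c * ga * de,
            a * be * be + b * be * de + c * de * de)

def disc(f):
    """Discriminant b^2 - 4ac of the form (a, b, c)."""
    a, b, c = f
    return b * b - 4 * a * c

def evaluate(f, x, y):
    """Value a*x^2 + b*x*y + c*y^2."""
    a, b, c = f
    return a * x * x + b * x * y + c * y * y

f = (1, 0, 1)            # x^2 + y^2
T = ((1, 1), (0, 1))     # integer matrix of determinant 1
g = act(T, f)            # (x + y)^2 + y^2, i.e. x^2 + 2xy + 2y^2
print(g, disc(f), disc(g))
```

Since $g(x,y)=f(x+y,y)$ and the substitution is invertible over $\mathbb{Z}$, the two forms take the same set of values on integer inputs.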

Exercise 2.2.

Verify the statement that proper equivalence preserves primitivity, discriminant and the set of integers represented.

Exercise 2.3.

This exercise outlines a geometric proof that $x^{2}+y^{2}$ represents exactly those odd primes that are $1$ modulo $4$. It turns on the fact that $\pi>2$.

  1. (1)

    Observe that $x^{2}+y^{2}$ cannot represent anything which is $3$ modulo $4$.

  2. (2)

    Let $p\equiv 1 \pmod{4}$. Show that $-1$ is a square modulo $p$ (this can be accomplished using the group theory of $(\mathbb{Z}/p\mathbb{Z})^{*}$, which is cyclic). Conclude that $p$ divides $m^{2}+1$ for some $m$.

  3. (3)

    Prove Minkowski’s Theorem in dimension two: Any convex set in $\mathbb{R}^{2}$ which is symmetric about the origin and of volume exceeding $4$ contains a non-zero integer point.

  4. (4)

    Consider the lattice $\Lambda$ of $\mathbb{R}^{2}$ generated as the $\mathbb{Z}$-span of $(1,m)$ and $(0,p)$. Show that all elements $\mathbf{v}\in\Lambda$ have norm squared $||\mathbf{v}||^{2}$ divisible by $p$. Compute the covolume of this lattice (the area of the fundamental parallelogram).

  5. (5)

    Conclude the argument using Minkowski’s Theorem to show there is an element $\mathbf{v}\in\Lambda$ having $||\mathbf{v}||^{2}=p$.
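The lattice in this exercise can be explored numerically. The sketch below, under the assumption that $p$ is a prime with $p\equiv 1 \pmod{4}$, finds some $m$ with $m^{2}\equiv -1 \pmod{p}$ by brute force and then searches the lattice spanned by $(1,m)$ and $(0,p)$ for a vector of squared norm $p$. The function names are hypothetical, and the search stands in for (rather than proves) the Minkowski argument.

```python
from math import isqrt

def sqrt_minus_one(p: int) -> int:
    """Smallest m >= 1 with m*m + 1 divisible by p (brute force)."""
    for m in range(1, p):
        if (m * m + 1) % p == 0:
            return m
    raise ValueError("-1 is not a square modulo p")

def short_vector(p: int):
    """A vector a*(1, m) + b*(0, p) of squared norm p."""
    m = sqrt_minus_one(p)
    for a in range(1, isqrt(p) + 1):
        y = (a * m) % p          # second coordinate, up to multiples of p
        if y > p // 2:
            y -= p               # choose the representative nearest 0
        if a * a + y * y == p:
            return (a, y)
    raise ValueError("no vector of squared norm p found")

for p in (5, 13, 17, 29, 97):
    print(p, short_vector(p))
```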

2.3. Local-to-global

Consider the question of when a positive integer $n$ is a sum of three squares: $n=x^{2}+y^{2}+z^{2}$. It was first observed by Legendre that this has a solution $(x,y,z)$ if and only if $n$ is not of the form $4^{a}(8b+7)$; in particular, no $n$ which is $7$ modulo $8$ is a sum of three squares. We can rephrase this to say that the equation has a solution in $\mathbb{Z}$ if and only if it has a solution modulo every prime power, with the only obstruction coming from powers of $2$. This characterizes $n$’s representability in terms of the ‘local’ congruence conditions ‘at’ each prime. In fact, Fermat’s Theorem, that an odd prime can be represented as a sum of squares if and only if it is $1 \pmod{4}$, can be viewed from this lens also. This is called a local-to-global principle.
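The three squares criterion is easy to test experimentally. The following Python sketch compares a brute-force search with the full form of Legendre’s condition, namely that $n$ is not of the shape $4^{a}(8b+7)$; the function names are illustrative only.

```python
from math import isqrt

def is_sum_of_three_squares(n: int) -> bool:
    """Brute-force search for n = x^2 + y^2 + z^2 with 0 <= x <= y."""
    for x in range(isqrt(n) + 1):
        for y in range(x, isqrt(n - x * x) + 1):
            z2 = n - x * x - y * y
            z = isqrt(z2)
            if z * z == z2:
                return True
    return False

def excluded_by_legendre(n: int) -> bool:
    """True when the positive integer n has the form 4^a (8b + 7)."""
    while n % 4 == 0:
        n //= 4
    return n % 8 == 7

misses = [n for n in range(1, 200) if not is_sum_of_three_squares(n)]
print(misses)
```

For example $28=4\cdot 7$ is not a sum of three squares even though it is $4$ modulo $8$, which is why the factor $4^{a}$ is needed in the criterion.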

To formalize this just slightly, let $f_{i}(x_{1},\ldots,x_{n})=0$ be a system of polynomial equations with coefficients in $\mathbb{Q}$. Suppose this system has a solution in $\mathbb{Q}$, say $(x_{1},\ldots,x_{n})\in\mathbb{Q}^{n}$. Since $\mathbb{Q}$ embeds in $\mathbb{R}$ and in $\mathbb{Q}_{p}$ for all $p$, the $\mathbb{Q}$-solution, called a global solution, will also provide local solutions everywhere, i.e. solutions in $\mathbb{Q}_{p}$ and $\mathbb{R}$. It is traditional in this setting to consider $\infty$ a prime of $\mathbb{Q}$, along with the usual primes, calling all of these places or valuations; the primes are finite places and $\infty$ is the infinite place. The field $\mathbb{R}=:\mathbb{Q}_{\infty}$ is the completion of $\mathbb{Q}$ at the place $\infty$ and $\mathbb{Q}_{p}$ is the completion at the place $p$, for each prime $p$. These completions, called local fields, are actually easier fields in which to study polynomial solutions. All of this has generalizations for number fields, but we will stick to $\mathbb{Q}$ for the moment.

Thus we have observed that global solutions imply local ones. In the case of quadratic forms, the converse is true:

Theorem 2.4 (Hasse-Minkowski Theorem).

Let $Q(x_{1},\ldots,x_{n})$ be an $n$-ary quadratic form with coefficients in $\mathbb{Q}$. Then $Q=0$ has a non-trivial solution in $\mathbb{Q}$ (i.e. $(x_{1},\ldots,x_{n})\in\mathbb{Q}^{n}$) if and only if it has a non-trivial solution in $\mathbb{Q}_{p}$ for all $p$ and in $\mathbb{R}$.

There is a version of this theorem over number fields more generally. For a proof see [Voi21, Theorem 14.3.3]; it is non-trivial. When such a converse holds, i.e. there are global solutions if and only if there are solutions everywhere locally, then we say that a variety satisfies the Hasse principle or a local-to-global principle. The three squares example of Legendre at the beginning of this section demonstrates an integral flavour of the local-to-global principle, obtained by looking at the integers $\mathbb{Z}$ and $\mathbb{Z}_{p}$.

The Hasse principle (when it holds) is powerful. In particular, determining if a variety $X$ has solutions over a local field is a finite computation (for $\mathbb{Q}_{p}$, check for solutions modulo $p$ and then use Hensel’s Lemma). By contrast, determining the existence of global solutions for a variety without the Hasse principle can be quite difficult. In fact, the general problem of determining if there are integer solutions to a Diophantine problem is undecidable; this is known as Hilbert’s Tenth Problem [Mat93].
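To illustrate the "solve modulo $p$, then lift" step, here is a minimal Python sketch of Hensel lifting in its Newton-iteration form for a polynomial in one variable. It assumes the starting root is simple, meaning the derivative is non-zero modulo $p$ there; the function name `hensel_lift` is hypothetical, and `pow(x, -1, m)` for modular inverses requires Python 3.8 or later.

```python
def hensel_lift(f, df, r, p, k):
    """Lift a simple root r of f modulo p to a root modulo p**k.

    f and df are callables for the polynomial and its derivative.
    Assumes f(r) % p == 0 and df(r) % p != 0.
    """
    modulus = p
    for _ in range(k - 1):
        modulus *= p
        inverse = pow(df(r), -1, modulus)   # df(r) is a unit modulo p
        r = (r - f(r) * inverse) % modulus  # one Newton step
    return r

# Example: x^2 + 1 has the root 2 modulo 5; lift it to a root modulo 5^4.
f = lambda x: x * x + 1
df = lambda x: 2 * x
root = hensel_lift(f, df, 2, 5, 4)
print(root, f(root) % 5**4)
```

Each step produces a root modulo the next power of $p$ that reduces to the previous one, so the sequence of lifts defines a solution in $\mathbb{Z}_{p}$.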

Exercise 2.5.

Show that $x^{2}+y^{2}+7z^{2}=0$ has no non-trivial $\mathbb{R}$-solutions, no non-trivial $\mathbb{Q}_{7}$-solutions and no non-trivial $\mathbb{Q}$-solutions.

2.4. Quadratic reciprocity

Quadratic reciprocity was first observed by Legendre and Euler and proved by Gauss. Whereas Sunzi’s Theorem (more commonly known as the Chinese Remainder Theorem) can be viewed as a statement that coprime moduli are ‘independent’ in a certain way, quadratic reciprocity describes a deep way in which $\mathbb{Z}/p\mathbb{Z}$ and $\mathbb{Z}/q\mathbb{Z}$, for primes $p\neq q$, do interact.

Definition 2.6.

The Legendre symbol is defined for an integer $a$ and a prime $p$ as:

\[ \left(\frac{a}{p}\right)=\left\{\begin{array}{ll}1&a\text{ is a non-zero square modulo }p\\ -1&a\text{ is not a square modulo }p\\ 0&a\text{ is zero modulo }p\end{array}\right. \]
Theorem 2.7 (Quadratic reciprocity).

Let $p$ and $q$ be distinct odd primes. Then

\[ \left(\frac{p}{q}\right)\left(\frac{q}{p}\right)=(-1)^{\frac{p-1}{2}\frac{q-1}{2}} \]

Furthermore,

  1. (1)

    $-1$ is a square modulo $p$ if and only if $p\equiv 1 \pmod{4}$

  2. (2)

    $2$ is a square modulo $p$ if and only if $p\equiv\pm 1 \pmod{8}$
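The reciprocity law and its two supplements can be checked numerically for small primes. The sketch below computes the Legendre symbol via Euler’s criterion, $a^{(p-1)/2}\equiv\left(\frac{a}{p}\right) \pmod{p}$ for odd primes $p$, which is a standard fact not stated in these notes; the function name `legendre` is an illustrative choice.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]

for p in odd_primes:
    for q in odd_primes:
        if p != q:
            sign = (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
            # Quadratic reciprocity for the pair (p, q).
            assert legendre(p, q) * legendre(q, p) == sign

# The supplementary laws for -1 and 2.
for p in odd_primes:
    print(p, legendre(-1, p), legendre(2, p))
```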

It is said that there are hundreds of proofs of quadratic reciprocity in the literature (indeed, Lemmermeyer maintains a list, currently over 300 entries: https://www.mathi.uni-heidelberg.de/~flemmermeyer/qrg_proofs.html). Among the themes recurring in these proofs we often see aspects of the Fourier transform, or the signs of permutations, among others. We now provide a proof of Fermat’s Theorem which depends on quadratic reciprocity.

Corollary 2.8.

The integers $n$ which can be represented as a sum of two integer squares are exactly those for which, for each prime $p$ that divides $n$, either (a) $p$ divides $n$ to an even power; or (b) $p\equiv 1,2 \pmod{4}$.

Proof.

First one shows that a prime $p\equiv 3 \pmod{4}$ cannot be represented, simply because there’s no solution modulo $4$.

Then one shows that primes congruent to $1 \pmod{4}$ can be represented, as follows. By quadratic reciprocity, such a prime $p=4k+1$ has the property that $-1$ is a square modulo $p$. Therefore $4k\equiv-1\equiv x^{2} \pmod{p}$. Thus $np=x^{2}+1$, a sum of squares. Thus $(x+i)(x-i)=np$ in $\mathbb{Z}[i]$, which has unique factorization. Since the gcd of $x+i$ and $x-i$ divides $2i$, and $p$ is odd, we see that $p$ cannot be prime in $\mathbb{Z}[i]$ (since otherwise, it would divide both, by symmetry under conjugation). Therefore $p=(a+bi)(c+di)$, where neither factor is a unit times an integer, so $abcd\neq 0$. Then since $p\in\mathbb{Z}$, $c+di=k(a-bi)$ for some $k\in\mathbb{Z}$. Since $p$ is prime, $k=\pm 1$. So we have $p=a^{2}+b^{2}$.

Finally, observe that sums of squares are exactly the norms of Gaussian integers. We have shown that every prime $p\equiv 1 \pmod{4}$ is such a norm, and no prime $p\equiv 3 \pmod{4}$ is such a norm. Also, $2=1^{2}+1^{2}=N(1+i)$. By unique factorization, the norms are exactly those which can be produced as products of norms of Gaussian primes, which we have now essentially classified. ∎
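The criterion of Corollary 2.8 can be compared against a brute-force search. In the sketch below, `criterion` factors $n$ by trial division and rejects $n$ exactly when some prime $p\equiv 3 \pmod{4}$ divides it to an odd power; both function names are assumptions made for this illustration.

```python
from math import isqrt

def is_sum_of_two_squares(n: int) -> bool:
    """Brute-force search for n = x^2 + y^2 with x, y >= 0."""
    for x in range(isqrt(n) + 1):
        y = isqrt(n - x * x)
        if x * x + y * y == n:
            return True
    return False

def criterion(n: int) -> bool:
    """True when every prime p = 3 (mod 4) divides n to an even power."""
    p = 2
    while p * p <= n:
        e = 0
        while n % p == 0:      # only primes can divide what remains of n
            n //= p
            e += 1
        if p % 4 == 3 and e % 2 == 1:
            return False
        p += 1
    # Any leftover factor n > 1 is a prime appearing to the first power.
    return not (n > 1 and n % 4 == 3)

print([n for n in range(1, 60) if criterion(n)])
```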

Exercise 2.9.

(Zagier [Zag90]) There is a short proof of Corollary 2.8 using an argument about fixed points. Suppose $p\equiv 1 \pmod{4}$. Let $S=\{(x,y,z)\in(\mathbb{Z}^{>0})^{3}:x^{2}+4yz=p\}$ be the set of triples of natural numbers solving $x^{2}+4yz=p$.

  1. (1)

    Show the following is an involution of $S$ and has exactly one fixed point:

    \[ (x,y,z)\mapsto\begin{cases}(x+2z,z,y-x-z)&\text{if }x<y-z;\\ (2y-x,y,x-y+z)&\text{if }y-z<x<2y;\\ (x-2y,x-y+z,y)&\text{if }x>2y.\end{cases} \]
  2. (2)

    What other (more trivial) involutions are there?

  3. (3)

    Prove $p$ can be written as a sum of two squares.
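Before attempting the exercise, it can help to watch the map act on a small example. The Python sketch below enumerates $S$ for a given prime and applies the three-case map; the names `solutions` and `zagier` are hypothetical. The boundary cases $x=y-z$ and $x=2y$ are not handled separately, on the assumption that $p$ is prime, in which case they cannot occur.

```python
from math import isqrt

def solutions(p: int):
    """All triples (x, y, z) of positive integers with x^2 + 4yz = p."""
    triples = []
    for x in range(1, isqrt(p) + 1):
        rest = p - x * x
        if rest > 0 and rest % 4 == 0:
            yz = rest // 4
            for y in range(1, yz + 1):
                if yz % y == 0:
                    triples.append((x, y, yz // y))
    return triples

def zagier(t):
    """The three-case map from the exercise."""
    x, y, z = t
    if x < y - z:
        return (x + 2 * z, z, y - x - z)
    if x < 2 * y:
        return (2 * y - x, y, x - y + z)
    return (x - 2 * y, x - y + z, y)

p = 13
S = solutions(p)
for t in S:
    print(t, "->", zagier(t))
```

For $p=13$ the set $S$ has three elements, one of which is fixed by the map while the other two are exchanged.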

2.5. Brauer-Manin obstructions

A famous counterexample to the Hasse principle due to Selmer is given by

\[ 3x^{3}+4y^{3}+5z^{3}=0 \]

which has local solutions everywhere but no global solutions. When the Hasse principle fails, many of the failures are captured by a Brauer-Manin obstruction, which arises from reciprocity laws. We will illustrate the phenomenon by an example of a local-to-global failure due to Iskovskikh [Isk71]:

\[ y^{2}+z^{2}=(3-x^{2})(x^{2}-2) \]
Proposition 2.10.

The equation above has local solutions everywhere, but no global solutions.

Proof.

To show that it has a solution in $\mathbb{R}$, take $x^{2}>2$ but $x^{2}<3$ so the right hand side is positive. For $\mathbb{Q}_{p}$, it suffices to find solutions modulo $p$ and lift by Hensel’s lemma; we leave this as an exercise.

To show there are no global $\mathbb{Q}$-solutions, first, let us homogenize the equation, so we can look for $\mathbb{Z}$ solutions:

\[ X:\ y^{2}+z^{2}=(3t^{2}-x^{2})(x^{2}-2t^{2})=:A(t,x)B(t,x). \]

We may assume that there is no common factor to the quadruple $x,y,z,t$.

First we examine the equation modulo $4$. By running through the possibilities for $x$ and $t$ being even or odd, we restrict the possible values of $(A,B)$ modulo $4$: $\{(0,0),(2,3),(3,1),(3,2)\}$. However, $(A,B)\equiv(0,0) \pmod{4}$ if and only if $x,t$ are both even if and only if $x,y,z,t$ are all even. But we have ruled out such a common factor, so we have the list:

\[ \{(2,3),(3,1),(3,2)\}. \]

We will now consider primes $p$ dividing at least one of $A$ or $B$.

First, suppose $p$ divides both $x$ and $t$. Since $x,y,z,t$ cannot have a common factor, we may assume without loss of generality that $p$ does not divide $z$. Then considering the equation modulo $p$, we find that $-1\equiv y^{2}/z^{2}$ is a square modulo $p$. Since our previous work ruled out $p=2$ (as $x$ and $t$ are not both even), we conclude that $p\equiv 1 \pmod{4}$ by the first supplementary law of quadratic reciprocity (Theorem 2.7(1)).

Second, suppose $p$ divides at most one of $x$ and $t$. Then $p$ cannot divide both $A$ and $B$. Let $p^{k}$ be the maximum power to which it appears. On the left, $p^{k}$ divides a sum of squares, so $p^{k}$ must be $0$, $1$ or $2 \pmod{4}$ (Corollary 2.8).

Hence in either case, all the maximal prime powers that divide $A$ or $B$ are $0$, $1$ or $2 \pmod{4}$ and therefore $(A,B) \pmod{4}$ can only be a member of the list

\[ \{(0,0),(1,1),(2,2),(0,1),(1,0),(2,0),(0,2),(1,2),(2,1)\} \]

Comparing with the list from the first part, we discover that there are no solutions. ∎
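The first list in the proof is a finite computation and can be reproduced mechanically. The following Python sketch runs over all residues of $x$ and $t$ modulo $4$ and records the pairs $(A,B)=(3t^{2}-x^{2},\,x^{2}-2t^{2})$ modulo $4$; the function name `residues_AB` is an illustrative choice.

```python
def residues_AB():
    """All values of (A, B) mod 4, where A = 3t^2 - x^2 and B = x^2 - 2t^2."""
    pairs = set()
    for x in range(4):
        for t in range(4):
            A = (3 * t * t - x * x) % 4
            B = (x * x - 2 * t * t) % 4
            pairs.add((A, B))
    return pairs

first_list = residues_AB() - {(0, 0)}
# Pairs allowed by the prime-power argument: both entries in {0, 1, 2}.
second_list = {(a, b) for a in range(3) for b in range(3)}
print(sorted(first_list), first_list & second_list)
```

Every pair in the first list contains a $3$, while no pair in the second list does, so the intersection is empty.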

The proof is completely elementary with the exception of the use of the first supplementary law of quadratic reciprocity (once directly and once in the form of Corollary 2.8). Therefore we think of this obstruction as arising from quadratic reciprocity.

Exercise 2.11.

Complete the proof above by finding solutions in $\mathbb{Q}_{p}$ for each $p$.

2.6. The modular group $\operatorname{PSL}_{2}(\mathbb{Z})$

We met $\operatorname{SL}_{2}(\mathbb{Z})$ and $\operatorname{PSL}_{2}(\mathbb{Z})$ as natural groups of symmetries on quadratic forms. They have many other roles to play in mathematics, some of them geometric. This leads to certain connections between quadratic forms and geometry. Recalling these essential and beautiful facts is our next task.

We will begin with the action of $\operatorname{SL}_{2}(\mathbb{Z})$ on the upper half plane $\mathbb{H}^{2}_{U}$. We define the upper half plane as

\[ \mathbb{H}^{2}_{U}=\{z\in\mathbb{C}:\Im(z)>0\}\subseteq\mathbb{C}. \]

The notation $\Im$ is for ‘imaginary part’ (this symbol is LaTeX’s command \Im and was traditionally used in typesetting old European books); later $\Re$ will be for real part. The action is via Möbius transformations:

\[ \begin{pmatrix}a&b\\ c&d\end{pmatrix}\cdot z=\frac{az+b}{cz+d} \]

in terms of the usual $\mathbb{C}$ structure. The fact that the matrix $-I$ acts as the trivial Möbius transformation motivates our use of the projectivization $\operatorname{PSL}_{2}(\mathbb{Z}):=\operatorname{SL}_{2}(\mathbb{Z})/\pm I$.

Exercise 2.12.

Möbius transformations with coefficients from $\mathbb{C}$, i.e. $\operatorname{PSL}_{2}(\mathbb{C})$, act on the extended complex plane $\widehat{\mathbb{C}}:=\mathbb{C}\cup\{\infty\}$.

  1. (1)

    Amongst these, show that the Möbius transformations preserving $\mathbb{H}^{2}_{U}$ are exactly those with real coefficients and positive determinant.

  2. (2)

    In $\widehat{\mathbb{C}}$, we consider straight lines to be circles through $\infty$. Show that Möbius transformations take circles to circles and preserve angles.

A standard fundamental region for the action of $\operatorname{PSL}_{2}(\mathbb{Z})$ on $\mathbb{H}^{2}_{U}$ is as follows.

Theorem 2.13 (for example, [Sil94, Proposition 1.5]).

Let

\[ \mathcal{F}=\left\{z\in\mathbb{H}^{2}_{U}:|z|\geq 1,\ -\frac{1}{2}<\Re(z)\leq\frac{1}{2},\ \left(|z|=1\Rightarrow\Re(z)\geq 0\right)\right\}. \]

Then $\mathcal{F}$ is a fundamental domain for the action of $\operatorname{PSL}_{2}(\mathbb{Z})$ on $\mathbb{H}^{2}_{U}$, meaning that for all $z$, exactly one $\operatorname{PSL}_{2}(\mathbb{Z})$-translate of $z$ lies in $\mathcal{F}$.

To visualize this theorem, it is useful to consider two important elements of $\operatorname{PSL}_{2}(\mathbb{Z})$:

\[ S=\begin{pmatrix}0&-1\\ 1&0\end{pmatrix},\quad T=\begin{pmatrix}1&1\\ 0&1\end{pmatrix}. \]

The action of $S$ and $T$ as Möbius transformations is inversion $z\mapsto-1/z$ and translation $z\mapsto z+1$, respectively. See Figure 2.1 for an image of $\mathcal{F}$ and some of its translates under $S$ and $T$.

Exercise 2.14.
  1. (1)

    Show that if $M=\begin{pmatrix}a&b\\ c&d\end{pmatrix}\in\operatorname{PSL}_{2}(\mathbb{Z})$, then

    \[ \Im(M\cdot z)=\frac{\Im(z)}{|cz+d|^{2}}. \]
  2. (2)

    Prove that any $\operatorname{PSL}_{2}(\mathbb{Z})$ orbit intersects $\mathcal{F}$ (this proves a part of Theorem 2.13). To do so, first show that there is an element $M\in\operatorname{PSL}_{2}(\mathbb{Z})$ which maximizes the imaginary part of $M\cdot z$. Then illustrate how to move this element of the orbit into $\mathcal{F}$ using $S$ and $T$.
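The second part of this exercise suggests an algorithm: translate by a power of $T$ into the strip $-\frac{1}{2}<\Re(z)\leq\frac{1}{2}$, apply $S$ whenever the point lies inside the unit circle, and repeat. A floating-point Python sketch follows. The name `reduce_to_F` and the step limit are assumptions of this illustration, and boundary points of $\mathcal{F}$ are not treated exactly, since rounding error can place a point on either side of the boundary.

```python
from math import ceil

def reduce_to_F(z: complex, max_steps: int = 10_000):
    """Move z (with positive imaginary part) into the closure of F.

    Returns (w, moves), where moves records the powers of T and the
    applications of S used, in order.
    """
    moves = []
    for _ in range(max_steps):
        n = ceil(z.real - 0.5)        # then -1/2 < Re(z) - n <= 1/2
        if n:
            z -= n                    # apply T^(-n)
            moves.append(("T", -n))
        if abs(z) >= 1:
            return z, moves
        z = -1 / z                    # apply S; the imaginary part grows
        moves.append(("S", 1))
    raise RuntimeError("step limit reached")

w, moves = reduce_to_F(0.3 + 0.01j)
print(w, moves)
```

Each application of $S$ to a point inside the unit circle strictly increases the imaginary part, by the formula in part (1), which is why the loop terminates.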

The $\operatorname{PSL}_{2}(\mathbb{Z})$-stabilizer of each point $z\in\mathcal{F}$ is trivial, with the exception of the following special points:

\[ \operatorname{Stab}(i)=\{I,S\},\quad\operatorname{Stab}(e^{2\pi i/6})=\{I,TS,(TS)^{2}\}. \]

Figure 2.1. The fundamental region $\mathcal{F}$ of $\operatorname{SL}_{2}(\mathbb{Z})$ and some of its translates.

In fact, these two elements $S$ and $T$ generate $\operatorname{PSL}_{2}(\mathbb{Z})$.

Theorem 2.15 (for example, [Sil94, Corollary 1.6]).

The group $\operatorname{PSL}_{2}(\mathbb{Z})$ is generated by $S$ and $T$. In fact, $\operatorname{PSL}_{2}(\mathbb{Z})$ is the free product of the subgroups generated by $S$ and $ST$, of order $2$ and $3$ respectively. We have

\[ \operatorname{PSL}_{2}(\mathbb{Z})\cong\langle S,T:S^{2}=1,(ST)^{3}=1\rangle. \]
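The relations in Theorem 2.15 and the two special stabilizers stated before it can be verified by direct computation. In the Python sketch below, matrices are nested tuples and the helper names `matmul` and `mobius` are illustrative. Note that in $\operatorname{SL}_{2}(\mathbb{Z})$ the products equal $-I$, which is the identity in $\operatorname{PSL}_{2}(\mathbb{Z})$.

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def mobius(M, z):
    """Moebius action of M = ((a, b), (c, d)) on a complex number z."""
    (a, b), (c, d) = M
    return (a * z + b) / (c * z + d)

S = ((0, -1), (1, 0))
T = ((1, 1), (0, 1))
NEG_I = ((-1, 0), (0, -1))
ST = matmul(S, T)
TS = matmul(T, S)

print(matmul(S, S))                  # S^2
print(matmul(ST, matmul(ST, ST)))    # (ST)^3

rho = complex(0.5, 3 ** 0.5 / 2)     # e^(2 pi i / 6)
print(mobius(S, 1j), mobius(TS, rho))
```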

2.7. Quadratic forms in the upper half plane

Our next goal is to parametrize real quadratic forms $f(x,y):=ax^{2}+bxy+cy^{2}$ in some useful way. To a quadratic form $f(x,y)=ax^{2}+bxy+cy^{2}$, we might associate the polynomial in one variable $f(x,1)=ax^{2}+bx+c$; and to such a polynomial we might naturally associate the quadratic form $ax^{2}+bxy+cy^{2}$. This association preserves discriminant.

Such a polynomial with $\Delta<0$ has two complex roots, one of which will lie in $\mathbb{H}^{2}_{U}$. Thus, to positive definite quadratic forms, we may associate points in the upper half plane, and this association is clearly a bijection, at least as long as we take the form $f$ up to scaling by an element of $\mathbb{R}^{*}$.

However, associating the root zz (or the pair (z,z¯)(z,\overline{z})) to the form ff breaks a symmetry, at the very least between xx and yy, but more generally it is not invariant under our preferred SL2()\operatorname{SL}_{2}(\mathbb{Z})-equivalence. First of all, it is more natural to say that the solutions to f(x,y)=ax2+bxy+cy2=0f(x,y)=ax^{2}+bxy+cy^{2}=0 are the two projective points [z:1][z:1] and [z¯:1][\overline{z}:1] in 1()\mathbb{P}^{1}(\mathbb{C}), and therefore to identify a solution [z:1][z:1] with [λz:λ1][\lambda z:\lambda 1], for λ\lambda\in\mathbb{C}^{*}. Then, from the perspective of SL2()\operatorname{SL}_{2}(\mathbb{Z})-equivalence of quadratic forms, we wish to identify the roots of f(x,y)f(x,y) with those of Mf(x,y)M\cdot f(x,y), for each M=(αβγδ)SL2()M=\begin{pmatrix}\alpha&\beta\\ \gamma&\delta\end{pmatrix}\in\operatorname{SL}_{2}(\mathbb{Z}), which are

\[[\alpha z+\beta:\gamma z+\delta],\quad[\alpha\overline{z}+\beta:\gamma\overline{z}+\delta].\]

Because we are in projective space, it is more natural to projectivize and identify the roots under $\operatorname{PSL}_{2}(\mathbb{Z})$-equivalence, since $-I$ has no effect on the form at all.\footnote{By which I mean, $f(-x,-y)=f(x,y)$.}

This motivates the following theorem.

Theorem 2.16.

Define a map $\rho$ on the collection of primitive integral positive definite binary quadratic forms up to $\mathbb{R}^{*}$ scaling, taking values in $\mathbb{H}^{2}_{U}$, by letting $\rho(f)=z$, where $z$ is the root of $f(x,1)$ in $\mathbb{H}^{2}_{U}$. Then this map is $\operatorname{PSL}_{2}(\mathbb{Z})$-equivariant, where the action on $f(x,y)$ is by proper equivalence, and the action on $\mathbb{H}^{2}_{U}$ is by Möbius transformation. In other words, $M\cdot\rho(f)=\rho(M\cdot f)$ for $M\in\operatorname{PSL}_{2}(\mathbb{Z})$.

Exercise 2.17.

Prove Theorem 2.16.
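Before attempting a proof, the identity $M\cdot\rho(f)=\rho(M\cdot f)$ can be sanity-checked numerically. The following is a minimal sketch, not a definitive implementation: it assumes the convention $(M\cdot f)(x,y)=f(M^{-1}(x,y))$ for the action on forms (chosen because it moves the root $[z:1]$ to $[\alpha z+\beta:\gamma z+\delta]$, as described above), it represents a form by its coefficient triple, and all function names are illustrative.

```python
import cmath


def root_in_upper_half_plane(form):
    """Root of a x^2 + b x + c with positive imaginary part (assumes a > 0, b^2 - 4ac < 0)."""
    a, b, c = form
    return (-b + cmath.sqrt(b * b - 4 * a * c)) / (2 * a)


def act_on_form(M, form):
    """Assumed convention: (M . f)(x, y) = f(M^{-1}(x, y)) for M of determinant 1."""
    (al, be), (ga, de) = M
    a, b, c = form
    # M^{-1}(x, y) = (de*x - be*y, -ga*x + al*y); expand f at that point.
    A = a * de * de - b * de * ga + c * ga * ga
    B = -2 * a * de * be + b * (al * de + be * ga) - 2 * c * ga * al
    C = a * be * be - b * be * al + c * al * al
    return (A, B, C)


def mobius(M, z):
    """Moebius action of M on a point z of the upper half plane."""
    (al, be), (ga, de) = M
    return (al * z + be) / (ga * z + de)


def equivariance_defect(M, form):
    """|M . rho(f) - rho(M . f)|, which should vanish up to rounding."""
    lhs = mobius(M, root_in_upper_half_plane(form))
    rhs = root_in_upper_half_plane(act_on_form(M, form))
    return abs(lhs - rhs)
```

For instance, `equivariance_defect(((2, 1), (1, 1)), (2, 1, 3))` should be zero up to floating-point error. A numerical check of this kind is evidence, not a proof.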

The following exercise addresses classical questions of Gauss and Legendre: how many equivalence classes of primitive integral quadratic forms exist for a fixed discriminant?

Exercise 2.18.
  1. (1)

    Show that every real quadratic form is equivalent to one of the form Ax2+Bxy+Cy2Ax^{2}+Bxy+Cy^{2} satisfying (i) |B|AC|B|\leq A\leq C and (ii) B0B\geq 0 whenever one of the \leq in part (i) is an equality. Hint: use the upper half plane.

  2. (2)

    Use the previous exercise to determine how many inequivalent primitive integral quadratic forms there are of discriminant 4-4 and 20-20.

  3. (3)

    Fix Δ<0\Delta<0. Prove that there are finitely many distinct equivalence classes of integral quadratic forms of discriminant Δ\Delta. Can you give a bound?
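The reduction conditions in part (1) turn the counting question into a finite search. As a hedged sketch (the bound on $A$ used below follows from $|B|\leq A\leq C$, which gives $3A^{2}\leq 4AC-B^{2}=|\Delta|$; the function name is illustrative), one can list the primitive integral forms satisfying (i) and (ii) for a given negative discriminant:

```python
from math import gcd, isqrt


def reduced_forms(disc):
    """List primitive integral forms (A, B, C) with B^2 - 4AC = disc < 0 satisfying (i) and (ii)."""
    if disc >= 0 or disc % 4 not in (0, 1):
        raise ValueError("need a negative discriminant congruent to 0 or 1 mod 4")
    forms = []
    # |B| <= A <= C forces 3 A^2 <= |disc|, so A is bounded.
    for A in range(1, isqrt((-disc) // 3) + 1):
        for B in range(-A, A + 1):
            numerator = B * B - disc  # equals 4 A C
            if numerator % (4 * A) != 0:
                continue
            C = numerator // (4 * A)
            if C < A or gcd(gcd(A, abs(B)), C) != 1:
                continue
            # Condition (ii): on the boundary |B| = A or A = C, keep only B >= 0.
            if B < 0 and (abs(B) == A or A == C):
                continue
            forms.append((A, B, C))
    return forms
```

For example, `reduced_forms(-23)` returns `[(1, 1, 6), (2, -1, 3), (2, 1, 3)]`. That the length of this list counts equivalence classes relies on the uniqueness of the reduced representative, which the code does not verify.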

2.8. Lattices in the upper half plane

The upper half plane also parametrizes certain lattices.\footnote{A common place to first meet this idea is in the complex theory of elliptic curves.} A lattice is a discrete\footnote{In the metric topology of $\mathbb{C}$.} subgroup of the additive group of $\mathbb{C}$. We say it is rank two if it is isomorphic to $\mathbb{Z}^{2}$ as a $\mathbb{Z}$-module.

Exercise 2.19.

Show that a lattice in \mathbb{C} is of rank two if and only if it spans \mathbb{C} as an \mathbb{R}-vector space.

The fundamental observation is that 1()\mathbb{P}^{1}(\mathbb{C}) is (almost) in bijection with rank two lattice bases (up to scaling) in \mathbb{C}, via [z:w]w+z[z:w]\leftrightarrow w\mathbb{Z}+z\mathbb{Z}. I say almost because points for which z/wz/w\in\mathbb{R} give rise to \mathbb{Z}-modules which do not span \mathbb{C} (being either rank one or not discrete). If we restrict to the upper half plane U2^1()\mathbb{H}^{2}_{U}\subseteq\widehat{\mathbb{C}}\cong\mathbb{P}^{1}(\mathbb{C}), then this associates to each zU2z\in\mathbb{H}^{2}_{U} the rank two lattice +z\mathbb{Z}+z\mathbb{Z}. As mentioned, we must consider lattices only up to homothety, i.e., scaling by \mathbb{C}^{*}. That is, we say two lattices Λ1\Lambda_{1} and Λ2\Lambda_{2} are homothetic, writing Λ1Λ2\Lambda_{1}\sim\Lambda_{2}, if Λ1=λΛ2\Lambda_{1}=\lambda\Lambda_{2} for some λ\lambda\in\mathbb{C}^{*}. It is not hard to verify this is an equivalence relation.

A rank two lattice in $\mathbb{C}$ comes with an orientation: if the angle from the first basis vector to the second is less than $\pi$, then it is positively oriented and otherwise it is negatively oriented. The condition $\Im(z)>0$ implies that the associated lattices are positively oriented.

Theorem 2.20.

The upper half plane U2\mathbb{H}^{2}_{U} is in bijection with homothety classes of positively oriented rank two lattice bases in \mathbb{C}, by the following map:

\[z\mapsto\mathbb{Z}+z\mathbb{Z}.\]

Furthermore, the bijection is PSL2()\operatorname{PSL}_{2}(\mathbb{Z})-equivariant, where PSL2()\operatorname{PSL}_{2}(\mathbb{Z}) acts on lattices by change of basis:

\[\begin{pmatrix}a&b\\ c&d\end{pmatrix}\cdot(\mathbb{Z}+z\mathbb{Z})=(cz+d)\mathbb{Z}+(az+b)\mathbb{Z}.\]
Exercise 2.21.

Prove the theorem.

2.9. Lattices and quadratic forms

Let us return to the question raised above: What is preserved under the action of PSL2()\operatorname{PSL}_{2}(\mathbb{Z}) on quadratic forms, if not the roots [z:1][z:1] and [z¯:1][\overline{z}:1]? The answer: the collection of the totality of roots of all equivalent forms. By the last section, one way to encapsulate this data is as a \mathbb{Z}-lattice, whose bases are in bijection with the roots. If [z:1]1()[z:1]\in\mathbb{P}^{1}(\mathbb{C}) represents a root of the form f(x,y)f(x,y), then we set Λf:=+z\Lambda_{f}:=\mathbb{Z}+z\mathbb{Z}. Since we always have two conjugate roots, we always have two conjugate lattices, exactly one of which is positively oriented, and we can choose that lattice to associate to our quadratic form.

From an equivalence class of lattices up to homothety, we can also recover the quadratic form: use a homothety to write the lattice $\Lambda$ as $\mathbb{Z}+z\mathbb{Z}$ where $z\in\mathbb{H}^{2}_{U}$, take the minimal polynomial $ax^{2}+bx+c$ of $z$, and let $f(x,y)=ax^{2}+bxy+cy^{2}$, defined up to scaling by $\mathbb{R}^{*}$. Another way to recover the form from the lattice is to take $\alpha\mathbb{Z}+\beta\mathbb{Z}$ to the quadratic form $N(x,y)=N(y\alpha+x\beta)$ where $N(z)=|z|^{2}$, the square of the complex absolute value. Thus, the lattice's relationship to the form can be seen in two interesting ways: first, as the lattice whose bases form the totality of roots of all equivalent forms; and second, as the lattice whose squared vector lengths are the values of the quadratic form. It is perhaps not entirely obvious that these ideas are the same.

Exercise 2.22.

Let zU2z\in\mathbb{H}^{2}_{U}. Define

\[f(x,y):=\begin{pmatrix}x&y\end{pmatrix}\begin{pmatrix}1\\ -z\end{pmatrix}\overline{\begin{pmatrix}1&-z\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}}.\]

From this, recover the two interpretations given above of the lattice +z\mathbb{Z}+z\mathbb{Z} (which is SL2()\operatorname{SL}_{2}(\mathbb{Z})-equivalent to zz\mathbb{Z}-\mathbb{Z}) as carrying information about the form ff.

In light of this connection, we will use the following notation:

\[\mathcal{Q}_{\mathbb{R}}:=\left\{\begin{subarray}{c}\text{positive definite real binary quadratic forms}\\ \text{up to scaling by $\mathbb{R}^{*}$}\end{subarray}\right\}\]

\[\mathcal{L}_{\mathbb{R}}:=\left\{\begin{subarray}{c}\text{homothety classes of positively oriented rank two $\mathbb{Z}$-lattices $\Lambda=\alpha\mathbb{Z}+\beta\mathbb{Z}$ in $\mathbb{C}$}\end{subarray}\right\}\]

When we wish to focus on integral forms, we define

\[\mathcal{Q}_{\mathbb{Z}}:=\left\{\begin{subarray}{c}\text{positive definite primitive}\\ \text{integral binary quadratic forms}\end{subarray}\right\}\]

\[\mathcal{L}_{\mathbb{Z}}:=\left\{\begin{subarray}{c}\text{homothety classes of positively oriented rank two $\mathbb{Z}$-lattices $\Lambda=\alpha\mathbb{Z}+\beta\mathbb{Z}$ in $\mathbb{C}$,}\\ \text{where $\alpha/\beta$ is an algebraic number of degree $2$.}\end{subarray}\right\}\]

Then our description so far can be filled out to obtain the following classical result.

Theorem 2.23.

There is a PSL2()\operatorname{PSL}_{2}(\mathbb{Z})-equivariant bijection between 𝒬\mathcal{Q}_{\mathbb{R}} and \mathcal{L}_{\mathbb{R}}, restricting to a bijection between 𝒬\mathcal{Q}_{\mathbb{Z}} and \mathcal{L}_{\mathbb{Z}}.

Proof.

The \mathbb{R} case is essentially a collection of our work so far. Observe that the map from forms to lattices, on 𝒬\mathcal{Q}_{\mathbb{Z}}, returns lattices +z\mathbb{Z}+z\mathbb{Z} where zz is a quadratic algebraic number. Conversely, if zz is quadratic algebraic, then the form has \mathbb{Q} coefficients and can be normalized to lie in 𝒬\mathcal{Q}_{\mathbb{Z}}. ∎

In fact, the lattices of \mathcal{L}_{\mathbb{Z}} are exactly those which arise as fractional ideals of imaginary quadratic fields.

Exercise 2.24.

The order of a rank-two lattice Λ=α+β\Lambda=\alpha\mathbb{Z}+\beta\mathbb{Z}\subseteq\mathbb{C} is ord(Λ):={z:zΛΛ}\operatorname{ord}(\Lambda):=\{z\in\mathbb{C}:z\Lambda\subseteq\Lambda\}. Suppose the lattice is associated under Theorem 2.23 to an integral quadratic form. Show that the order is a subring of an imaginary quadratic field. Which field?

2.10. Diophantine approximation

We now return to one of the most basic questions of number theory, which can be asked about the real line, but answered with the geometry of the upper half plane. How do real numbers lie in relation to rational numbers in \mathbb{R}?

One simple way to answer this question is to describe the decimal system, which is a sort of addressing system for real numbers by successive approximation by rationals with 1010-power denominator. There are, however, many ways to approximate a real number by rationals. Diophantine approximation asks us to approximate a real number α\alpha by the ‘best’ rationals p/qp/q in the sense that |p/qα||p/q-\alpha| is small while |q||q| is simultaneously small. One way to measure the quality of an approximation is to study the quantity

\[-\log_{q}\left(\min_{p}\left|\frac{p}{q}-\alpha\right|\right).\]

Since we can always expect an approximation to within 1q\frac{1}{q}, this quantity is bounded below by 11. But we can often do significantly better. Phrased in what is sometimes a more traditional way, we ask, for an exponent η\eta, whether we can find, or can find infinitely many, p/qp/q satisfying

\[\left|\frac{p}{q}-\alpha\right|<\frac{1}{q^{\eta}}.\]

Further evidence for the appropriateness of this measure of ‘goodness’ of an approximation is given by Dirichlet’s Theorem.

Theorem 2.25 (Dirichlet).

Let α\alpha\in\mathbb{R}. Then α\alpha is irrational if and only if there exist infinitely many distinct p/qp/q\in\mathbb{Q} such that

\[\left|\frac{p}{q}-\alpha\right|<\frac{1}{q^{2}}.\tag{1}\]

We interpret this as saying that rational numbers are “poorly approximable” and irrationals are “well approximable.”

Proof.

Consider an irrational $\alpha\in\mathbb{R}$. Divide the unit interval $[0,1]$ into $Q$ equal subintervals, where $Q>0$ is an integer. Amongst the $Q+1$ real numbers $0,\alpha,2\alpha,\ldots,Q\alpha$, there must be two whose fractional parts\footnote{The fractional part of $x$ is $x-\lfloor x\rfloor$, the number $x$ minus the largest integer less than or equal to $x$. It lies in the unit interval $[0,1)$.} fall into the same subinterval. Call these $i\alpha$ and $j\alpha$, where $0\leq i<j\leq Q$. Then

\[\left|(j-i)\alpha-p\right|<\frac{1}{Q}\]

for an appropriate choice of integer $p$. Then let $q:=j-i\leq Q$, so that

\[\left|\frac{p}{q}-\alpha\right|<\frac{1}{qQ}\leq\frac{1}{q^{2}}.\]

Thus we have found one example. To generate another example, distinct from any $p_{i}/q_{i}$ that may have come before, we choose $Q^{\prime}$ so that

\[\left|p_{i}-q_{i}\alpha\right|>\frac{1}{Q^{\prime}}\]

for all $i$ (notice that this is possible because $\alpha$ is irrational, so $q_{i}\alpha$ is never an integer), and then repeat the argument above. In this way, we generate infinitely many distinct $p/q\in\mathbb{Q}$ having the desired property.

On the other hand, rational numbers ‘repel’ one another in the sense that for any distinct $p_{1}/q_{1},p_{2}/q_{2}\in\mathbb{Q}$,

\[\left|\frac{p_{1}}{q_{1}}-\frac{p_{2}}{q_{2}}\right|\geq\frac{1}{q_{1}q_{2}}.\]

In particular, for $q_{2}>q_{1}$,

\[\left|\frac{p_{1}}{q_{1}}-\frac{p_{2}}{q_{2}}\right|>\frac{1}{q_{2}^{2}}.\]

This means that if α\alpha is rational, (1) can only be satisfied finitely many times. ∎
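The pigeonhole argument in the proof is constructive, and can be run directly. The sketch below is one possible transcription under stated assumptions: the function name is illustrative, and $\alpha$ should be supplied as a `Fraction` (a rational stand-in for the real number) so that the fractional parts are computed exactly.

```python
from fractions import Fraction
from math import floor


def dirichlet_approximation(alpha, Q):
    """Return (p, q) with 1 <= q <= Q and |q*alpha - p| < 1/Q, found by pigeonhole."""
    seen = {}  # subinterval index -> multiplier whose fractional part landed there
    for j in range(Q + 1):
        fractional_part = j * alpha - floor(j * alpha)
        box = floor(fractional_part * Q)  # index of [box/Q, (box+1)/Q)
        if box in seen:
            i = seen[box]
            q = j - i
            p = floor(j * alpha) - floor(i * alpha)
            return p, q
        seen[box] = j
    raise AssertionError("unreachable: Q + 1 points in Q boxes must collide")
```

With `alpha = Fraction(3141592653589793, 10**15)` and `Q = 10`, the multiples $\alpha$ and $8\alpha$ land in the same subinterval, and the function returns `(22, 7)`. The search costs about $Q$ steps per approximation, which is the slowness that the continued fraction algorithm avoids.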

The theorem is illustrated in Figure 2.2. Dirichlet’s theorem is sharp in the sense that it fails for any exponent exceeding 22 on the right side.

Figure 2.2. Translucent white disks of radius 1/q21/q^{2} at p/qp/q on a black background (so a lighter colour indicates an overlap of more disks). The real line passing horizontally through the middle of the picture ranges from 1-1 to 22; each integer appears as a distinct ‘pinch point.’ Dirichlet’s theorem states that the points lying under infinitely many disks are exactly the irrationals. (The illusory three ‘eggs’ are actually pairwise overlaps of the disks of radius 11 centred at each integer.) Image: Edmund Harriss, Steve Trettel and Katherine E. Stange.
Exercise 2.26.

We will show that the golden ratio $\alpha=\frac{1+\sqrt{5}}{2}$ is particularly poorly approximable.\footnote{Enjoy this: https://slate.com/technology/2021/06/golden-ratio-phi-irrational-number-ellenberg-shape.html} Let $f$ be the minimal polynomial for $\alpha$. Let $p/q\in\mathbb{Q}$. Obtain a lower bound for $|f(p/q)|$ in terms of $q$. Factoring $f(p/q)=(p/q-\alpha)(p/q-\overline{\alpha})$, and bounding the absolute value of the second factor from above, prove that there exists some constant $K$ such that $\left|\frac{p}{q}-\alpha\right|\geq\frac{1}{Kq^{2}}$ for all but finitely many rationals $p/q$. Fun fact: $\alpha$ is best approximated by ratios $F_{n}/F_{n-1}$ where $F_{n}$ is the Fibonacci sequence.
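A quick experiment makes the claim of Exercise 2.26 plausible. The following sketch, with illustrative function names and ordinary floating-point arithmetic, tabulates $q^{2}\left|p/q-\alpha\right|$ along ratios of consecutive Fibonacci numbers, together with the integer $q^{2}f(p/q)=p^{2}-pq-q^{2}$ for the minimal polynomial $f(x)=x^{2}-x-1$.

```python
from math import sqrt

PHI = (1 + sqrt(5)) / 2


def fibonacci_ratios(count):
    """Return the pairs (p, q) = (F_{k+1}, F_k) for k = 1, ..., count."""
    pairs = []
    p, q = 1, 1
    for _ in range(count):
        pairs.append((p, q))
        p, q = p + q, p
    return pairs


def scaled_error(p, q):
    """q^2 * |p/q - phi|; staying away from 0 reflects poor approximability."""
    return q * q * abs(p / q - PHI)


def norm_value(p, q):
    """The integer q^2 * f(p/q) for f(x) = x^2 - x - 1."""
    return p * p - p * q - q * q
```

Along these ratios `norm_value` is always $\pm 1$, the smallest nonzero size an integer can have, and `scaled_error` settles near $1/\sqrt{5}$ instead of tending to zero.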

2.11. The Farey subdivision

How do we find the set of ‘good approximations,’ that is to say, solutions to (1)? The pigeonhole principle proof above provides an algorithm, albeit a slow one. There is a geometric story, however, which gives rise to a very efficient algorithm: the theory of continued fractions. My favourite version of this story is due to Caroline Series [Ser85a, Ser85b].

Figure 2.3. The upper half plane, with example geodesics.

We need a little of the geometry of $\mathbb{H}^{2}_{U}$ here, which we will detail further in a subsequent section. In particular, $\mathbb{H}^{2}_{U}$ is a model of the hyperbolic plane in which geodesics are exactly the straight vertical lines, and the upper half-circles centered on the real line (see Figure 2.3). For Series’ story, we study the Farey tessellation, shown in Figure 2.6. To generate this image, do the following:

  1. (1)

    draw vertical lines upward from each integer;

  2. (2)

    connect each pair of consecutive integers by a geodesic;

  3. (3)

    for each pair of rational numbers p1/q1p_{1}/q_{1} and p2/q2p_{2}/q_{2} connected by a geodesic, define their mediant (p1+p2)/(q1+q2)(p_{1}+p_{2})/(q_{1}+q_{2}) and draw a geodesic connecting p1/q1p_{1}/q_{1} to the mediant, as well as a geodesic connecting p2/q2p_{2}/q_{2} to the mediant;

  4. (4)

    repeat the last step, ad infinitum.
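The steps above can be sketched in a few lines, restricted to the unit interval and ignoring the geometry: only the endpoints of the geodesics are tracked. Fractions are stored as pairs `(p, q)`, and the function name is illustrative.

```python
def farey_levels(depth):
    """Endpoints in [0, 1] present after `depth` rounds of mediant insertion."""
    level = [(0, 1), (1, 1)]
    for _ in range(depth):
        refined = [level[0]]
        for (p1, q1), (p2, q2) in zip(level, level[1:]):
            refined.append((p1 + p2, q1 + q2))  # mediant of two neighbours
            refined.append((p2, q2))
        level = refined
    return level
```

For example, `farey_levels(2)` returns `[(0, 1), (1, 3), (1, 2), (2, 3), (1, 1)]`. Neighbouring entries $p_{1}/q_{1}<p_{2}/q_{2}$ always satisfy $p_{2}q_{1}-p_{1}q_{2}=1$, which is why every mediant is automatically in lowest terms.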

This mediant operation is actually quite natural if we view $\mathbb{Q}\cup\{\infty\}$ as $\mathbb{P}^{1}(\mathbb{Z})$, the projectivization\footnote{For any ring $R$, my notation for a projective space is: $\mathbb{P}^{n}(R):=\{\mathbf{x}:=(x_{1},\ldots,x_{n})\neq 0:x_{i}\in R\}/(\mathbf{x}=\lambda\mathbf{x},\lambda\in R^{*})$. In topological or geometric contexts, authors write $F\mathbb{P}^{n}$ for the projective space over the field $F$.} of the square lattice $\mathbb{Z}^{2}$. Each rational number, by taking it in reduced form, corresponds to an element of $\mathbb{Z}^{2}$ with a sightline to the origin (i.e. a primitive vector). Then the mediant operation is vector addition of these primitive vectors (reduced fractions). One way to ‘see’ this is to consider the lattice $\mathbb{Z}^{2}$ in the plane. Standing at the origin and looking out, we see the vertices of the lattice as a copy of $\mathbb{P}^{1}(\mathbb{Z})$ (see Figure 2.4). The ‘nearby’ points are those with small vector entries, and these occur earlier in the Farey subdivision. Visually, if the lattice points are indicated by spheres, they would appear as larger dots (compare Figures 2.4 and 2.5).

This process actually generates all the rational numbers, as the following exercise demonstrates.

Figure 2.4. The view floating slightly above the origin of 2\mathbb{Z}^{2}.
Figure 2.5. The rationals p/qp/q indicated with disks of radius 320q2\frac{3}{20q^{2}}; similar to the view of 2\mathbb{Z}^{2} from the origin in Figure 2.4 (Image: [HST22, Figure 15]).
Exercise 2.27.

Begin with any positive rational number in lowest form $p/q$. Show that there exists $u/v\in\mathbb{Q}$ such that $pv-uq=1$. Then perform a Euclidean-style algorithm on the vectors $\begin{pmatrix}p\\ q\end{pmatrix}$ and $\begin{pmatrix}u\\ v\end{pmatrix}$, repeatedly subtracting one from the other until attaining the pair of standard basis vectors for $\mathbb{Z}^{2}$. Explain why this is a proof that the Farey tessellation generates all rational numbers as geodesic endpoints.

In this image, there is a ‘bubble,’ or geodesic, lying above the unit interval [0,1][0,1] and then it is further subdivided into two bubbles, each of which is further subdivided, etc. This gives a way to ‘organize’ the real line: compare this to the more common decimal subdivision in Figure 2.7. In either system, we can specify the location of a real number by indicating which interval it lies in, then which subinterval of that, etc., iteratively. In the decimal system, we choose the leftmost endpoint of each successive interval, and call the resulting sequence the decimal expansion.

Figure 2.6. The Farey subdivision of the upper half plane, i.e. orbit of the geodesic from 0 to \infty under PSL2()\operatorname{PSL}_{2}(\mathbb{Z}).
Figure 2.7. At left, the decimal subdivision of the unit interval, to three levels; at right the Farey subdivision, to nine levels.

What if we use the Farey subdivision? We obtain the continued fraction expansion. To describe the ‘path’ to α\alpha through the Farey ‘froth’ of semicircles, we use the language of geodesics. It was long known that continued fractions were connected to the paths of geodesics in the upper half plane and to PSL2()\operatorname{PSL}_{2}(\mathbb{Z}), but it was Caroline Series who realized that the Farey subdivision is perhaps the most natural way to describe this relationship.

Figure 2.8. The upper geodesic cut (of this hyperbolic triangle) is labelled L, since the region to the left (viewed by an ant riding the geodesic flow) has one cusp. We similarly label the lower cut R. As we look down at the page, this appears counter-intuitive, but to the ant crawling along the page in the direction indicated by the geodesic, this is the intuitive left or right forking choice.
Figure 2.9. An example cutting sequence for π\pi, as in Theorem 2.31, showing the first two convergents, 33 and 22/722/7. Top image: the thick dark cluster at left is the integer 33 and the vertical line is indicating π\pi. Bottom image: a detail of the first image, where we see π\pi just to the left of 22/722/7. In both images, the red geodesics indicate the subdivisions crossed as we approach π\pi. The greater the number of red geodesics converging to a point, the better the approximation (and the larger the corresponding continued fraction coefficient). After using L3L^{3} to enter the region between 33 and 44, the top image shows R7R^{7} (red lines emanating from 33 are the walls crossed); the bottom image shows L15L^{15}. The continued fraction expansion of π\pi begins L3R7L15R1L292R1L1R1L^{3}R^{7}L^{15}R^{1}L^{292}R^{1}L^{1}R^{1}\cdots.
Definition 2.28.

Consider any geodesic, heading out from the imaginary axis to a positive real number α\alpha. Label it by L or R as it passes through each region of the Farey tessellation (or subdivision), as follows. The geodesic cuts each triangular fundamental region it passes through into two parts, one having one cusp, and one having two. Label it L if the region to the left of the geodesic (from the perspective of its direction of travel) has one cusp, and R otherwise (Figure 2.8). If the geodesic heads directly into a cusp, label with either an L or an R. The resulting word (generated from left to right), is the Series continued fraction.

The sequence La0Ra1L^{a_{0}}R^{a_{1}}\cdots is an example of a cutting sequence. See Figure 2.9 for an example.
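One way to experiment with cutting sequences without drawing geodesics is to follow the nested Farey intervals that contain $\alpha$. The sketch below rests on an assumption about conventions: it records L when $\alpha$ lies above the mediant of the current interval and R when it lies below, a choice made so that the output agrees with the expansion of $\pi$ quoted in the caption of Figure 2.9. The function names are illustrative.

```python
from fractions import Fraction
from itertools import groupby
from math import pi


def cutting_sequence(alpha, max_letters):
    """L/R letters recording how the Farey intervals close in on alpha > 0."""
    hi, lo = (1, 0), (0, 1)  # the current interval runs from lo = 0/1 up to hi = 1/0
    letters = []
    while len(letters) < max_letters:
        p, q = hi[0] + lo[0], hi[1] + lo[1]  # mediant of the two endpoints
        if alpha * q > p:
            letters.append("L")
            lo = (p, q)
        elif alpha * q < p:
            letters.append("R")
            hi = (p, q)
        else:
            break  # alpha equals the mediant: the geodesic runs into a cusp
    return "".join(letters)


def run_lengths(word):
    """Compress a word such as 'LLLRR' into [('L', 3), ('R', 2)]."""
    return [(letter, len(list(group))) for letter, group in groupby(word)]
```

Here `run_lengths(cutting_sequence(pi, 26))` gives `[('L', 3), ('R', 7), ('L', 15), ('R', 1)]`, and `cutting_sequence(Fraction(17, 5), 50)` gives `'LLLRRL'` before stopping at the cusp, where Definition 2.28 allows a final L or R. For irrational input a float is only reliable for the first few dozen letters.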

Exercise 2.29.

Determine the beginning of the continued fraction expansion of ee, and the two full expansions for 17/517/5.

Returning to the lattice interpretation, the element α\alpha\in\mathbb{R} corresponds to a ray from the origin at slope α\alpha. If α\alpha is irrational, it will never hit a lattice point. However, there are ‘near misses:’ lattice points that it passes close to as it heads out from the origin. These are the corners of the fundamental triangles of the Farey tessellation at which we ‘turn’ left or right.

2.12. The matrix continued fraction expansion

To formalize the continued fraction process and the good rational approximations it will produce, we consider $\operatorname{SL}_{2}^{+}(\mathbb{Z})$ (the $+$ indicates that we include only those matrices whose entries are non-negative) acting on $\mathbb{P}^{1}(\mathbb{Q})$. This monoid (group without inverses), or semigroup,\footnote{A semigroup is a group without the requirement for inverses or an identity. In the literature of continued fractions, $\operatorname{SL}_{2}^{+}(\mathbb{Z})$ is often called a semigroup, although we would normally include the identity.} is generated from the identity matrix by two matrices:

\[M_{L}=\begin{pmatrix}1&1\\ 0&1\end{pmatrix},\quad M_{R}=\begin{pmatrix}1&0\\ 1&1\end{pmatrix}.\]

Phrased another way, $\operatorname{SL}_{2}^{+}(\mathbb{Z})$ is exactly the collection of matrices formed as finite words (including the empty word) in the letters $M_{L}$ and $M_{R}$. Form a tree whose root is the identity matrix, and for each matrix, the left child is obtained by multiplying on the right by $M_{L}$ and the right child is obtained by multiplying on the right by $M_{R}$, as illustrated in Figure 2.10.

Exercise 2.30.

Prove that SL2+()\operatorname{SL}_{2}^{+}(\mathbb{Z}) is generated as a semigroup by MLM_{L} and MRM_{R}. (See Exercise 2.27.) Show that PSL2()\operatorname{PSL}_{2}(\mathbb{Z}) is generated as a group from SL2+()\operatorname{SL}_{2}^{+}(\mathbb{Z}) by the addition of (0110)\begin{pmatrix}0&-1\\ 1&0\end{pmatrix}.
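The first part of Exercise 2.30 has an algorithmic flavour: a matrix in $\operatorname{SL}_{2}^{+}(\mathbb{Z})$ can be peeled apart by column subtraction, in the spirit of Exercise 2.27. The sketch below is a hedged illustration, not a proof. It relies on the fact, which the exercise asks you to justify, that for a non-identity matrix exactly one column dominates the other entrywise. Matrices are tuples of rows and the function names are illustrative.

```python
def word_in_generators(M):
    """Write M in SL_2^+(Z) as a word in 'L' and 'R' (standing for M_L and M_R)."""
    (a, b), (c, d) = M
    if min(a, b, c, d) < 0 or a * d - b * c != 1:
        raise ValueError("expected non-negative entries and determinant 1")
    letters = []
    while (a, b, c, d) != (1, 0, 0, 1):
        if b >= a and d >= c:
            # M = N * M_L, where N has second column (b - a, d - c).
            b, d = b - a, d - c
            letters.append("L")
        else:
            # Otherwise the first column dominates and M = N * M_R.
            a, c = a - b, c - d
            letters.append("R")
    # Letters were stripped from the right-hand end, so reverse them.
    return "".join(reversed(letters))


def matrix_of_word(word):
    """Multiply out a word in 'L' and 'R', reading left to right."""
    a, b, c, d = 1, 0, 0, 1
    for letter in word:
        if letter == "L":
            b, d = a + b, c + d  # right multiplication by M_L
        else:
            a, c = a + b, c + d  # right multiplication by M_R
    return ((a, b), (c, d))
```

For example, `word_in_generators(((3, 2), (1, 1)))` returns `'LLR'`, matching the position of that matrix in the tree of Figure 2.10.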

\[
\begin{array}{c}
\begin{pmatrix}1&0\\ 0&1\end{pmatrix}\\[6pt]
\begin{pmatrix}1&1\\ 0&1\end{pmatrix}\qquad\begin{pmatrix}1&0\\ 1&1\end{pmatrix}\\[6pt]
\begin{pmatrix}1&2\\ 0&1\end{pmatrix}\quad\begin{pmatrix}2&1\\ 1&1\end{pmatrix}\quad\begin{pmatrix}1&1\\ 1&2\end{pmatrix}\quad\begin{pmatrix}1&0\\ 2&1\end{pmatrix}\\[6pt]
\begin{pmatrix}1&3\\ 0&1\end{pmatrix}\;\begin{pmatrix}3&2\\ 1&1\end{pmatrix}\;\begin{pmatrix}2&3\\ 1&2\end{pmatrix}\;\begin{pmatrix}3&1\\ 2&1\end{pmatrix}\;\begin{pmatrix}1&2\\ 1&3\end{pmatrix}\;\begin{pmatrix}2&1\\ 3&2\end{pmatrix}\;\begin{pmatrix}1&1\\ 2&3\end{pmatrix}\;\begin{pmatrix}1&0\\ 3&1\end{pmatrix}
\end{array}
\]
Figure 2.10. The SL2+()\operatorname{SL}_{2}^{+}(\mathbb{Z}) matrix tree, shown to four levels.

From the previous exercise, one can conclude that the Farey tessellation can equivalently be described as the orbit in $\mathbb{H}^{2}_{U}$ of the geodesic from $0$ to $\infty$ under $\operatorname{PSL}_{2}(\mathbb{Z})$.

Each geodesic in the picture, once we choose a direction, corresponds to a unique element $\begin{pmatrix}p_{1}&p_{2}\\ q_{1}&q_{2}\end{pmatrix}$ by writing its rational endpoints $p_{1}/q_{1}$ and $p_{2}/q_{2}$ as column vectors. In fact, this matrix lies in $\operatorname{PSL}_{2}(\mathbb{Z})$, and the geodesic is the image under it of the vertical imaginary axis (in a downward direction), which corresponds to $\begin{pmatrix}1&0\\ 0&1\end{pmatrix}$, connecting $0$ to $\infty$. Therefore, the pairs of rational numbers connected by a geodesic have the property that $p_{1}q_{2}-p_{2}q_{1}=1$, which earns them the name of a unimodular pair.

The continued fraction of a real number α\alpha is more commonly defined by an iterative algebraic process, generating an expression

\[\alpha=a_{0}+\dfrac{1}{a_{1}+\dfrac{1}{a_{2}+\dfrac{1}{a_{3}+\cdots}}}.\tag{2}\]

The finite truncations of this expression give rational numbers

\[p_{k}/q_{k}=a_{0}+\dfrac{1}{a_{1}+\dfrac{1}{a_{2}+\dfrac{1}{a_{3}+\cdots+1/a_{k}}}},\]

called the convergents of α\alpha. The aia_{i} are called the partial quotients or coefficients.
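The iterative algebraic process is short enough to write out. The sketch below is one standard way to do it, with illustrative function names. The recurrence $p_{k}=a_{k}p_{k-1}+p_{k-2}$, $q_{k}=a_{k}q_{k-1}+q_{k-2}$ used for the convergents is the usual textbook one and is not stated in the text above. Passing a `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from math import floor


def continued_fraction(alpha, max_terms):
    """Partial quotients a_0, a_1, ... of alpha, stopping early if alpha is rational."""
    terms = []
    for _ in range(max_terms):
        a = floor(alpha)
        terms.append(a)
        if alpha == a:
            break
        alpha = 1 / (alpha - a)  # invert the fractional part and repeat
    return terms


def convergents(terms):
    """Convergents p_k/q_k of [a_0; a_1, a_2, ...] via the standard recurrence."""
    p_prev, q_prev = 1, 0
    p, q = terms[0], 1
    result = [Fraction(p, q)]
    for a in terms[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append(Fraction(p, q))
    return result
```

For example, `continued_fraction(Fraction(17, 5), 10)` returns `[3, 2, 2]`, and `convergents([3, 7, 15, 1])` returns the fractions $3$, $22/7$, $333/106$, $355/113$.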

Now a cutting sequence through the Farey tessellation can be interpreted as a word in $M_{R}$ and $M_{L}$, by simply replacing $R\leftarrow M_{R}$ and $L\leftarrow M_{L}$. This is justified by the observation that if a triangular region $T$ has endpoints $a_{1}/b_{1}$, $(a_{1}+a_{2})/(b_{1}+b_{2})$, $a_{2}/b_{2}$, then the triangular region $TM_{L}$ (interpreting the matrix as a Möbius transformation) has triangular endpoints $a_{1}/b_{1}$, $(2a_{1}+a_{2})/(2b_{1}+b_{2})$, $(a_{1}+a_{2})/(b_{1}+b_{2})$, i.e., it corresponds to the left sub-interval.

The following proposition now states that the ‘traditional’ expansion (2) is the same as Series’ geodesic expansion interpreted as matrices.

Theorem 2.31 (Series [Ser85b]).

Suppose

\[L^{a_{0}}R^{a_{1}}L^{a_{2}}R^{a_{3}}\cdots\]

is the L,R-sequence for a geodesic originating on the imaginary axis and ending at $\alpha\in\mathbb{R}$. Then a classical continued fraction expansion of $\alpha$ is

\[a_{0}+\dfrac{1}{a_{1}+\dfrac{1}{a_{2}+\dfrac{1}{a_{3}+\cdots}}}.\]

In more detail, if $\alpha$ has cutting sequence $L^{a_{0}}R^{a_{1}}\cdots$, define $M_{n}:=M_{L}^{a_{0}}M_{R}^{a_{1}}\cdots M_{*}^{a_{n}}\in\operatorname{SL}_{2}^{+}(\mathbb{Z})$. Then the convergents of the continued fraction expansion of $\alpha$ are given by

\[\begin{pmatrix}p_{n}\\ q_{n}\end{pmatrix}=M_{n}\begin{pmatrix}1\\ 0\end{pmatrix},\text{ or }M_{n}\begin{pmatrix}0\\ 1\end{pmatrix},\]

depending on whether the last letter was $R$ or $L$, respectively. Furthermore, if $\alpha$ is rational with lowest terms representation $p/q$, then there are exactly two expansions of $\alpha$ (since heading into a cusp is the last step), and we have

\[\begin{pmatrix}p\\ q\end{pmatrix}=\left\{\begin{array}{ll}L^{a_{0}}R^{a_{1}}L^{a_{2}}R^{a_{3}}\cdots\begin{pmatrix}1\\ 0\end{pmatrix}&\text{if the expansion ends in R}\\ L^{a_{0}}R^{a_{1}}L^{a_{2}}R^{a_{3}}\cdots\begin{pmatrix}0\\ 1\end{pmatrix}&\text{if the expansion ends in L}\\ \end{array}\right.\]

Observe that ‘heading into the mediant’ to end a rational approximation would mean evaluating at the vector (1,1)t(1,1)^{t}, so instead we head one step lower to either left or right; this constitutes the ambiguity in the second part of the theorem.

The following exercise should convince you that Theorem 2.31 is correct.

Exercise 2.32.

Computationally verify the following:

\[L^{k}\begin{pmatrix}0\\ 1\end{pmatrix}=\begin{pmatrix}k\\ 1\end{pmatrix},\quad L^{k}R^{\ell}\begin{pmatrix}1\\ 0\end{pmatrix}=\begin{pmatrix}1+\ell k\\ \ell\end{pmatrix},\quad L^{k}R^{\ell}L^{t}\begin{pmatrix}0\\ 1\end{pmatrix}=\begin{pmatrix}t+k+k\ell t\\ \ell t+1\end{pmatrix}.\]

Verify also that

\[\frac{k}{1}=k,\quad\frac{1+\ell k}{\ell}=k+\frac{1}{\ell},\quad\frac{t+k+k\ell t}{\ell t+1}=k+\frac{1}{\ell+\frac{1}{t}}.\]
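The matrix identities of Exercise 2.32 can be checked by machine for a range of exponents. The sketch below uses plain integer arithmetic on tuples; the helper names are illustrative, and a finite check of this kind supports the general identities without proving them.

```python
from itertools import product

M_L = ((1, 1), (0, 1))
M_R = ((1, 0), (1, 1))


def mat_mul(A, B):
    """Product of two 2x2 integer matrices stored as tuples of rows."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def mat_pow(A, n):
    """A raised to the non-negative integer power n."""
    result = ((1, 0), (0, 1))
    for _ in range(n):
        result = mat_mul(result, A)
    return result


def apply(A, v):
    """Matrix A applied to the column vector v."""
    return (A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1])


def identities_hold(bound):
    """Check the three displayed identities for all 1 <= k, l, t <= bound."""
    for k, l, t in product(range(1, bound + 1), repeat=3):
        Lk, Rl, Lt = mat_pow(M_L, k), mat_pow(M_R, l), mat_pow(M_L, t)
        LkRl = mat_mul(Lk, Rl)
        if apply(Lk, (0, 1)) != (k, 1):
            return False
        if apply(LkRl, (1, 0)) != (1 + l * k, l):
            return False
        if apply(mat_mul(LkRl, Lt), (0, 1)) != (t + k + k * l * t, l * t + 1):
            return False
    return True
```

As a concrete instance, `apply(mat_mul(mat_pow(M_L, 3), mat_pow(M_R, 7)), (1, 0))` is `(22, 7)`, the convergent $22/7$ of $\pi$ corresponding to $L^{3}R^{7}$.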
Exercise 2.33.

Verify Theorem 2.31 for the beginning of the continued fraction expansion of ee, and the two full expansions for 17/517/5.

We complete our study of continued fractions with one of the principal reasons they are studied: that the continued fraction convergents capture good approximations.

Theorem 2.34.

Any rational approximation p/qp/q\in\mathbb{Q} to α\alpha\in\mathbb{R} satisfying |pqα|<12q2\left|\frac{p}{q}-\alpha\right|<\frac{1}{2q^{2}} is a continued fraction convergent of α\alpha.
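Theorem 2.34 can be tested by brute force on an example. The sketch below makes one simplifying assumption: $\alpha$ is replaced by an exact rational stand-in (here the `Fraction` equal to the floating-point value of $\pi$), so that every comparison is exact and the convergents can be computed to the end. The function names are illustrative, and the convergents are computed with the usual recurrence for numerators and denominators.

```python
from fractions import Fraction
from math import floor, pi


def good_approximations(alpha, max_q):
    """All rationals p/q with q <= max_q and |p/q - alpha| < 1/(2 q^2)."""
    found = set()
    for q in range(1, max_q + 1):
        p = round(alpha * q)  # only the nearest numerator can qualify
        if abs(Fraction(p, q) - alpha) < Fraction(1, 2 * q * q):
            found.add(Fraction(p, q))
    return found


def convergent_set(alpha):
    """Exact set of continued fraction convergents of a rational alpha."""
    p_prev, q_prev, p, q = 0, 1, 1, 0
    result = set()
    x = alpha
    while True:
        a = floor(x)
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.add(Fraction(p, q))
        if x == a:
            return result
        x = 1 / (x - a)
```

With `alpha = Fraction(pi)`, every element of `good_approximations(alpha, 200)` lies in `convergent_set(alpha)`. The converse fails: $333/106$ is a convergent, but it does not satisfy the inequality of the theorem.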

Exercise 2.35.

To prove Theorem 2.34, do the following:

  1. (1)

    Let p/qp/q be a good approximation in the sense of Theorem 2.34. Suppose without loss of generality that p/q<αp/q<\alpha. Show that p/qp/q must be the closest rational to α\alpha, from below, with denominator q\leq q.

  2. (2)

    Let p/qp^{\prime}/q^{\prime} be the closest rational to α\alpha, from above, with denominator q\leq q.

  3. (3)

    Conclude that $p/q$ and $p^{\prime}/q^{\prime}$ are a unimodular pair and therefore the endpoints of a bubble in the Farey tessellation containing $\alpha$.

  4. (4)

    Now consider the mediant MM between these two rational numbers. Show that it must lie to the right of α\alpha, by measuring the distance between p/qp/q and MM (and using the good approximation property of p/qp/q).

  5. (5)

    Show that if $\alpha$ lies in a bubble of the Farey tessellation, then the bubble endpoint with smallest denominator is a convergent.

2.13. Indefinite quadratic forms and real quadratic irrationalities

Recall that definite quadratic forms have a reduction theory (Exercise 2.18 and Figure 2.1). In short, reduction of a quadratic form is a process of navigating through $\operatorname{PSL}_{2}(\mathbb{Z})$ to obtain a reduced representative of the equivalence class. There is also a reduction theory of indefinite quadratic forms; it can be explained with the Farey tessellation and is closely related to continued fractions. We turn to this now.

Primitive integral indefinite quadratic forms ax2+bxy+cy2ax^{2}+bxy+cy^{2}, a,b,ca,b,c\in\mathbb{Z}, correspond to a conjugate pair of roots α,α¯=b±Δ2a\alpha,\overline{\alpha}=\frac{-b\pm\sqrt{\Delta}}{2a} in the real line. Such a pair α\alpha, α¯\overline{\alpha} is called reduced if (up to swapping the two roots),

\[\overline{\alpha}<-1<0<\alpha<1.\]

It is an elementary exercise to show that this is equivalent to the conditions

\[0\leq b<\sqrt{\Delta},\quad 0<\sqrt{\Delta}-b\leq 2a\leq\sqrt{\Delta}+b,\]

or, to the satisfyingly simple

\[b>|a+c|,\quad ac<0.\]
Exercise 2.36.

Show the equivalences just mentioned.

In particular, fixing Δ\Delta, there are only finitely many possible values for bb, therefore only finitely many possible aa, and for each such pair only one possible cc. So there are only finitely many reduced forms for each discriminant.

Next, observe that the transformation

\[(\alpha,\overline{\alpha})\mapsto\left(\frac{1}{\alpha}-\left\lfloor\frac{1}{\alpha}\right\rfloor,\frac{1}{\overline{\alpha}}-\left\lfloor\frac{1}{\alpha}\right\rfloor\right),\]

takes (α,α¯)(\alpha,\overline{\alpha}) to another reduced pair. Furthermore, under the reduced assumption, this operation on pairs is a bijection, with inverse

\[(\alpha,\overline{\alpha})\mapsto\left(\frac{1}{\alpha-1-\lfloor\overline{\alpha}\rfloor},\frac{1}{\overline{\alpha}-1-\lfloor\overline{\alpha}\rfloor}\right).\]

Thus, the reduced forms fall into cycles, where each step is realized by a bespoke Möbius map (namely, z1/z1/αz\mapsto 1/z-\lfloor 1/\alpha\rfloor).
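To see the cycles concretely, one can carry out the step $\alpha \mapsto 1/\alpha - \lfloor 1/\alpha \rfloor$ directly on coefficients. The sketch below is an illustration under assumptions: the names `gauss_step` and `cycle` are hypothetical, the input is a reduced form with $a > 0 > c$ and non-square discriminant, and the new form is rescaled by $-1$ so that its leading coefficient stays positive. It uses $1/\alpha = (b + \sqrt{\Delta})/(-2c)$, so the floor can be computed in exact integer arithmetic.

```python
from math import isqrt

def gauss_step(form):
    """One step alpha -> 1/alpha - floor(1/alpha), acting on (a, b, c)."""
    a, b, c = form
    disc = b * b - 4 * a * c
    n = (b + isqrt(disc)) // (-2 * c)        # n = floor(1/alpha)
    # Substituting alpha = 1/(x + n) into a*alpha^2 + b*alpha + c = 0 gives
    # c x^2 + (b + 2cn) x + (a + bn + cn^2) = 0; negate so the new a is > 0.
    return (-c, -(b + 2 * c * n), -(a + b * n + c * n * n)), n

def cycle(form):
    """Follow gauss_step until the starting form recurs."""
    forms, digits = [], []
    current = form
    while True:
        forms.append(current)
        current, n = gauss_step(current)
        digits.append(n)
        if current == form:
            return forms, digits

print(cycle((1, 2, -2)))   # two forms, digits 1 and 2
```

The recorded digits are the partial quotients of $\alpha$; for the form $x^2 + 2xy - 2y^2$ the root is $\alpha = \sqrt{3} - 1$, whose continued fraction is $[0; 1, 2, 1, 2, \ldots]$.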

Proposition 2.37.

For each primitive integral indefinite binary quadratic form ax2+bxy+cy2ax^{2}+bxy+cy^{2} corresponding to pair (α,α¯)(\alpha,\overline{\alpha}), there is a reduced form with pair (α,α¯)(\alpha^{\prime},\overline{\alpha}^{\prime}) such that α=Mα\alpha=M\alpha^{\prime} where MPSL2+()M\in\operatorname{PSL}_{2}^{+}(\mathbb{Z}).

Proof.

We prove it only in the case that α>α¯>0\alpha>\overline{\alpha}>0 (other cases can be reduced to this case). There is some triangle of the Farey tesselation with corners aa, bb, cc such that 0<a<α¯<b<α<c0<a<\overline{\alpha}<b<\alpha<c. Then, by the transitivity of the Farey tesselation, this triangle is an image under some M0PSL2+()M_{0}\in\operatorname{PSL}_{2}^{+}(\mathbb{Z}) of the triangle with corners 0, 11 and \infty, in such a way that 0<α¯0<1<α0<0<\overline{\alpha}_{0}<1<\alpha_{0}<\infty, and M0(α0)=αM_{0}(\alpha_{0})=\alpha, M0(α¯0)=α¯M_{0}(\overline{\alpha}_{0})=\overline{\alpha}. By composing with a translation, we can assume instead that α¯0<0<α0<1\overline{\alpha}_{0}<0<\alpha_{0}<1.

If $\overline{\alpha}_0 < -1 < 0 < \alpha_0 < 1$, we are done. If not, then $-1 < \overline{\alpha}_0 < 0 < \alpha_0 < 1$, and there is some pair $\alpha_1, \overline{\alpha}_1$ with $\overline{\alpha}_1 < -1 < 0 < \alpha_1 < 1$ and $M_1 \in \operatorname{PSL}_2(\mathbb{Z})$ such that $M_1(\alpha_1) = \alpha_0$, $M_1(\overline{\alpha}_1) = \overline{\alpha}_0$; in fact $M_1 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix}$, $b \geq 1$ will suffice. Note that although $M_1 \notin \operatorname{PSL}_2^+(\mathbb{Z})$, the composition $M_0 M_1 \in \operatorname{PSL}_2^+(\mathbb{Z})$. The proof is illustrated in Figure 2.11. ∎

Refer to caption
Figure 2.11. A flow chart that relates an indefinite form to a reduced one via a transformation from SL2+()\operatorname{SL}_{2}^{+}(\mathbb{Z}), illustrating the proof of Proposition 2.37.

Combining the proof above, which gives the ‘pre-periodic’ portion of a continued fraction expansion, with the observed cycles of reduced forms, we obtain a corollary.

Corollary 2.38.

Let α\alpha be a real quadratic irrational. Then the continued fraction expansion of α\alpha is eventually periodic, and perfectly periodic if α\alpha is reduced.

Exercise 2.39.

Provide the details of the corollary above. Amongst these details is the converse that any eventually periodic continued fraction converges to a real quadratic irrational.

Exercise 2.40.

Determine the automorphisms of an indefinite form, by using the fact that such an automorphism is a Möbius transformation fixing the roots. Show that such automorphisms are given by solutions to the Pell equation $X^2 - \Delta Y^2 = 4$. Next, observe that the Möbius transformations that circumnavigate once around a cycle of indefinite forms (as described above) fix a reduced form, and therefore correspond to solutions to Pell's equation. This leads to an algorithm to compute solutions to Pell's equation by continued fractions. (Such automorphisms correspond to multiplying the associated ideal of $\mathbb{Q}(\sqrt{\Delta})$ by a unit.)

[Footnote: As is the case with so much mathematical nomenclature, the Pell equation has at best a tenuous connection to the person it is named after. Pell revised a translation of a book that discussed it, approximately two millennia after it was first discussed.]
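As an experiment alongside this exercise, one might multiply the matrices met on the way around a cycle and read off $X$ and $Y$. The following is a hedged sketch of one possible route, not a prescribed solution: `pell_from_cycle` is a hypothetical name, the input is assumed to be a primitive reduced form with $a > 0 > c$, and each step $\alpha_{\text{old}} = 1/(\alpha_{\text{new}} + n)$ is recorded as the matrix $\begin{pmatrix} 0 & 1 \\ 1 & n \end{pmatrix}$. The product fixes $\alpha$, and its trace $X$ and lower-left entry $aY$ satisfy $X^2 - \Delta Y^2 = 4\det$, where $\det = \pm 1$ according to the parity of the cycle length.

```python
from math import isqrt

def pell_from_cycle(a, b, c):
    """Return (X, Y, det) with X^2 - disc*Y^2 = 4*det, det = +1 or -1."""
    disc = b * b - 4 * a * c
    start = form = (a, b, c)
    m = ((1, 0), (0, 1))                      # running matrix product
    while True:
        fa, fb, fc = form
        n = (fb + isqrt(disc)) // (-2 * fc)   # floor(1/alpha) for this form
        form = (-fc, -(fb + 2 * fc * n), -(fa + fb * n + fc * n * n))
        (p, q), (r, s) = m
        m = ((q, p + n * q), (s, r + n * s))  # m times [[0, 1], [1, n]]
        if form == start:
            break
    (p, q), (r, s) = m
    return p + s, r // a, p * s - q * r

print(pell_from_cycle(1, 2, -2))   # disc = 12
```

When the determinant is $-1$ (odd cycle length), the pair solves $X^2 - \Delta Y^2 = -4$ instead; squaring the matrix, i.e. going around the cycle twice, then yields a solution with $+4$.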

The Farey diagram has a dual graph called the topograph, put to lovely use by Conway [Con97] to prove all the basic facts about binary quadratic forms, both definite and indefinite. A wonderful description of this point of view is Allen Hatcher’s Topology of Numbers [Hat22].

2.14. Lagrange spectrum

Dirichlet’s Theorem differentiates the approximation properties of rationals and irrationals. We might ask a more nuanced question, which is, given a constant CC, whether for an irrational α\alpha, there are infinitely many p/qp/q\in\mathbb{Q} such that

\[ \left| \frac{p}{q} - \alpha \right| < \frac{C}{q^2}. \]

For C=1C=1, there are infinitely many such approximations. This remains true as CC decreases to 1/51/\sqrt{5}, where Hurwitz’ Theorem states that it still holds. But after that, there are only finitely many approximations for any α\alpha whose continued fraction ends in all 11’s. These are called ‘noble numbers’ and include the golden ratio. More formally, we can define

\[ v(\alpha) = \inf\{ C : |\alpha - p/q| < C/q^2 \text{ for infinitely many } p/q \in \mathbb{Q} \}. \]

Markoff showed that there is a discrete set of values v(α)v(\alpha) above 1/31/3; the α\alpha that realize these discrete values are called Markoff irrationalities, and the values themselves are the discrete part of the Lagrange spectrum (sometimes called Markoff spectrum, as for example by Series [Ser85a]; the terminology is occasionally murky).
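The extreme value $v(\alpha) = 1/\sqrt{5}$ for the golden ratio can be observed numerically. The sketch below is only an illustration: the function name is hypothetical, and it looks only at the convergents $F_{n+1}/F_n$ (ratios of consecutive Fibonacci numbers), computing $q^2 |\alpha - p/q|$ with the standard `decimal` module at 60 digits of precision.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60   # enough digits for denominators of moderate size

def scaled_errors(count):
    """q^2 * |phi - p/q| along the convergents p/q of the golden ratio."""
    phi = (1 + Decimal(5).sqrt()) / 2
    p, q = 1, 1                     # first convergent 1/1
    out = []
    for _ in range(count):
        out.append(q * q * abs(phi - Decimal(p) / Decimal(q)))
        p, q = p + q, p             # next ratio of consecutive Fibonacci numbers
    return out

errors = scaled_errors(40)
print(errors[-1], 1 / Decimal(5).sqrt())
```

The scaled errors oscillate around $1/\sqrt{5} \approx 0.4472$ and converge to it, which is consistent with there being infinitely many approximations for every $C > 1/\sqrt{5}$ and only finitely many for smaller $C$.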

It turns out that the discrete elements of the Markoff spectrum are associated to a simple loop on the punctured torus. See [Ser85b, Ser85a] for the details of this story, but in short, one can obtain the punctured torus as a quotient of U2\mathbb{H}^{2}_{U} by a subgroup of PSL2()\operatorname{PSL}_{2}(\mathbb{Z}), and the geodesic in question is the geodesic whose endpoint is the irrational to be approximated, and the cutting sequence in this situation is closely related to the continued fraction expansion.

2.15. Roth’s Theorem

The Diophantine approximation of algebraic numbers has been of particular interest. It turns out that they are ‘poorly approximable’ in the sense that the exponent 2 in Dirichlet’s Theorem is best possible.

Theorem 2.41 (Roth [Rot55]).

Let α\alpha be an algebraic number of degree d2d\geq 2. Then for every ϵ>0\epsilon>0, there are only finitely many p/qp/q\in\mathbb{Q} such that

\[ \left| \frac{p}{q} - \alpha \right| < \frac{1}{q^{2+\epsilon}}. \]

In fact, most numbers (almost all, in a measure-theoretic sense) are poorly approximable in the same way (Sprindz̆uk [Sz69]). So is it possible to find some that are well-approximable? Liouville constructs such numbers, which can be done cleverly (Exercise 2.42).

Exercise 2.42.

Show that α:=k=0110k!\alpha:=\sum_{k=0}^{\infty}\frac{1}{10^{k!}} is very well-approximable in the following sense. There are infinitely many pn/qnp_{n}/q_{n}\in\mathbb{Q} such that |pn/qnα|<1/qnn\left|p_{n}/q_{n}-\alpha\right|<1/q_{n}^{n}.
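Before proving the claim, it can be checked in exact rational arithmetic for small $n$. This is a sketch under a stated assumption: since $\alpha$ itself is an infinite sum, the code uses a longer partial sum as a stand-in for $\alpha$ (the neglected tail is far smaller than the quantities compared). The names are hypothetical; $p_n/q_n$ is the $n$-th partial sum, with $q_n = 10^{n!}$.

```python
from fractions import Fraction
from math import factorial

def liouville_partial(n):
    """The partial sum of 1/10^(k!) for k = 0, ..., n, as an exact fraction."""
    return sum(Fraction(1, 10 ** factorial(k)) for k in range(n + 1))

def check(n, depth=2):
    """Test |alpha - p_n/q_n| < 1/q_n^n, with alpha replaced by a longer partial sum."""
    alpha_stand_in = liouville_partial(n + depth)
    q = 10 ** factorial(n)
    return abs(alpha_stand_in - liouville_partial(n)) < Fraction(1, q ** n)

print([check(n) for n in range(1, 5)])
```

The inequality holds because the tail after the $n$-th term is less than $2 \cdot 10^{-(n+1)!} = 2/q_n^{n+1}$, which is smaller than $1/q_n^n$.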

Exercise 2.43.

This exercise outlines a proof due to Liouville that any algebraic number α\alpha of degree d2d\geq 2 has only finitely many approximations p/qp/q\in\mathbb{Q} satisfying |pqα|<1qd+ϵ\left|\frac{p}{q}-\alpha\right|<\frac{1}{q^{d+\epsilon}} (in other words, Roth’s theorem with exponent d+ϵd+\epsilon instead of 2+ϵ2+\epsilon). This method of proof contains the seeds of the proof of Roth’s theorem.

  1. (1)

    Consider $g(x)$, the minimal polynomial of $\alpha$ but scaled to lie in $\mathbb{Z}[x]$ with coefficients having no common factor. Give a lower bound on $|g(p/q)|$.

  2. (2)

    Use the mean value theorem on the difference g(α)g(p/q)g(\alpha)-g(p/q), to give a lower bound on |αp/q||\alpha-p/q|.

  3. (3)

    Argue that this lower bound is independent of pp and qq.

  4. (4)

    Comparing the two estimates above, complete the argument.

3. Hyperbolic and Minkowski geometry

Having seen the importance of the upper half plane and the Möbius action in number theory, we will now turn to studying this geometry in earnest, including a higher-dimensional generalization. We assume general knowledge of hyperbolic space, but will need the details of several models and the isometries between them. Our approach to defining a model of hyperbolic space is to emphasize the underlying space and its isometries, in the spirit of Klein’s Erlanger Programm [Trk07]. More specifically, we take an approach based on linear algebra, also in the spirit of Klein; for a more detailed treatment, see for example [Par08]. Here we depart from the standard treatment to give the isometries between several models of hyperbolic space in terms of roots and coefficients of polynomials; for this specific treatment, see [HST22].

3.1. Minkowski space

Refer to caption
Figure 3.1. The Minkowski space M3M^{3}. To visualize M4M^{4}, imagine adding one spatial dimension.

Consider an n+1n+1 dimensional real vector space Mn,1:=n+1M^{n,1}:=\mathbb{R}^{n+1}. Put a quadratic form QQ of signature (n,1)(n,1) and its associated bilinear form ,Q\langle\cdot,\cdot\rangle_{Q} on this space: we will now call it a Minkowski space.

Exercise 3.1.

Suppose we are working over a field of characteristic not equal to 22. Show that if Q(𝐱)Q(\mathbf{x}) is a quadratic form on a vector space, then 𝐱,𝐲:=12(Q(𝐱+𝐲)Q(𝐱)Q(𝐲))\langle\mathbf{x},\mathbf{y}\rangle:=\frac{1}{2}\left(Q(\mathbf{x}+\mathbf{y})-Q(\mathbf{x})-Q(\mathbf{y})\right) is a bilinear form. Conversely, and inversely, show how to recover a quadratic form from a bilinear form.

The forms QQ and ,Q\langle\cdot,\cdot\rangle_{Q} endow Mn,1M^{n,1} with geometry. The zero locus Q=0Q=0 is a double cone emanating from the origin, called the light cone. Outside the cone, the level sets Q=cQ=c, c>0c>0 are one-sheeted hyperboloids. Inside the cone, the level sets Q=cQ=c, c<0c<0 are two-sheeted hyperboloids.

Projectivizing this space to obtain Mn,1\mathbb{P}M^{n,1}, we can take the subset inside the cone:

\[ \mathbb{H}^n_M := \{ [x_0 : x_1 : \cdots : x_n] : Q(x_0,\ldots,x_n) < 0 \} \subset \mathbb{P}M^{n,1}; \]

one may think of this as obtained by gluing the two sheets of the two-sheeted hyperboloid. Then Mn\mathbb{H}^{n}_{M} is a model of hyperbolic nn-space, called the hyperboloid model. See Figure 3.1.

The metric is given by the differential

\[ ds^2 = Q(dx_0, dx_1, \ldots, dx_n), \]

or the distance function dMd_{M} satisfying

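\[ \cosh\left(d_M(\mathbf{u},\mathbf{v})\right) = \frac{|\langle \mathbf{u},\mathbf{v}\rangle_Q|}{\sqrt{Q(\mathbf{u})Q(\mathbf{v})}}. \tag{3} \]

The absolute value makes the right-hand side independent of the sign of the chosen representatives of the two projective points.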

The quadratic space Mn,1M^{n,1} is acted upon by SOQ()SOn,1()SO_{Q}(\mathbb{R})\cong SO_{n,1}(\mathbb{R}), the special orthogonal transformations preserving the form QQ. Their action takes Mn\mathbb{H}^{n}_{M} to itself, acting as hyperbolic isometries.

Geodesics in Mn\mathbb{H}^{n}_{M} are obtained as the intersections with lines of Mn,1\mathbb{P}M^{n,1} (one may think of this as intersecting planes in n+1\mathbb{R}^{n+1} with the two-sheeted hyperboloid). The light cone itself can be thought of as the ‘boundary at infinity’ of Mn\mathbb{H}^{n}_{M}, and the intersection of the aforementioned projective line (or affine plane) with the cone gives the limit points of the geodesic. In higher dimensions, we obtain geodesic surfaces, spaces, etc. by intersecting higher dimensional subspaces.

3.2. The upper half plane

We’ve seen that the quadratics have an affinity for the Möbius action on the upper half plane. The upper half plane $\mathbb{H}^2_U$ is a model of the hyperbolic plane, whose isometries are $\operatorname{PSL}_2(\mathbb{R})$. The metric is given by the differential

\[ ds^2 = \frac{dz\, d\overline{z}}{\Im(z)^2} \]

or the distance function dUd_{U} satisfying

\[ \cosh\left(d_U(z,w)\right) = 1 + \frac{|z-w|^2}{2\Im(z)\Im(w)}. \tag{4} \]

The hyperbolic isometries are the Möbius transformations which we met earlier, PSL2()\operatorname{PSL}_{2}(\mathbb{R}). The geodesics in the upper half plane consist of the restrictions to U2\mathbb{H}^{2}_{U} of lines (z)=a\Re(z)=a in \mathbb{C} and circles in \mathbb{C} centred on \mathbb{R} (Figure 2.3). The boundary of U2\mathbb{H}^{2}_{U} is

\[ \partial\mathbb{H}^2_U := \widehat{\mathbb{R}} := \mathbb{R} \cup \{\infty\} \]

where the point \infty is an ideal (in other words, mathematically imagined) point ‘at infinity.’ The point \infty is a limit point for any geodesic arising from a vertical line (z)=a\Re(z)=a, the other limit point being aa. Half-circle geodesics have as their two limit points the intersection of the circle with \mathbb{R}. Observe that the isometries PSL2()\operatorname{PSL}_{2}(\mathbb{R}), interpreted on the extended plane ^:={}\widehat{\mathbb{C}}:=\mathbb{C}\cup\{\infty\}, take U2\partial\mathbb{H}^{2}_{U} to itself.

3.3. Relating the upper half plane and hyperboloid models

There is a beautiful dictionary between the hyperboloid model and the upper half plane model of the hyperbolic plane, carried out by a hyperbolic isometry between the two, and it contains some surprises. The essential observation is that Δ(A,B,C)=B24AC\Delta(A,B,C)=B^{2}-4AC is a signature 2,12,1 form, but is also the discriminant form for quadratic polynomials Ax2+Bx+CAx^{2}+Bx+C, so we can think of the Minkowski space 2,1\mathbb{R}^{2,1} as parametrizing such polynomials by taking Q=ΔQ=\Delta. Then the associated inner product is

\[ \langle (A_1,B_1,C_1), (A_2,B_2,C_2) \rangle_Q = B_1 B_2 - 2 A_1 C_2 - 2 A_2 C_1. \]

If we are interested in the roots of such polynomials, then it is natural to take a projectivization of this space, as the roots are unaffected by scaling the polynomial by a constant.
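The inner product formula attached to $Q = \Delta$ can be sanity-checked against the polarization identity of Exercise 3.1. The sketch below uses hypothetical function names and integer test vectors. It also checks the identity $B^2 - 4AC = B^2 + (A-C)^2 - (A+C)^2$, which is one way to see two positive squares and one negative square.

```python
def Q(v):
    """The discriminant form Q(A, B, C) = B^2 - 4AC."""
    A, B, C = v
    return B * B - 4 * A * C

def pairing_by_polarization(u, v):
    """(Q(u + v) - Q(u) - Q(v)) / 2, which is always an integer for integer input."""
    total = tuple(x + y for x, y in zip(u, v))
    return (Q(total) - Q(u) - Q(v)) // 2

def pairing_formula(u, v):
    """B1*B2 - 2*A1*C2 - 2*A2*C1."""
    (A1, B1, C1), (A2, B2, C2) = u, v
    return B1 * B2 - 2 * A1 * C2 - 2 * A2 * C1

u, v = (1, -2, 2), (1, 4, 13)
print(pairing_by_polarization(u, v), pairing_formula(u, v))
```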

Exercise 3.2.

Demonstrate that Δ(A,B,C)\Delta(A,B,C) is a signature 2,12,1 form.

Viewed as a space of polynomials, the light cone cuts out those polynomials with complex conjugate roots as the interior of the light cone (Δ<0\Delta<0), leaving those with distinct real roots on the exterior (Δ>0\Delta>0). The light cone itself corresponds to the quadratic polynomial having a double root (Δ=0\Delta=0).

To be more formal, consider now the projectivized interior of the light cone:

\[ \mathbb{H}^2_M := \{[A:B:C] : A,B,C \in \mathbb{R},\ B^2 < 4AC\} = \{Ax^2+Bx+C : A,B,C \in \mathbb{R},\ B^2 < 4AC\}. \]

Then, the hyperbolic isometry between M2\mathbb{H}^{2}_{M} and U2\mathbb{H}^{2}_{U} is essentially the quadratic formula!

Theorem 3.3 ([HST22, Theorem 4.9]).

The map from $\mathbb{H}^2_M$ to $\mathbb{H}^2_U$ that takes $[A:B:C]$ to the root of positive imaginary part of the polynomial $Ax^2+Bx+C$, i.e., the quadratic formula, is a hyperbolic isometry. The inverse to this map takes $\alpha \in \mathbb{C}\smallsetminus\mathbb{R}$ to $[1 : -\alpha-\overline{\alpha} : \alpha\overline{\alpha}]$.
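A quick numerical experiment makes the isometry plausible before proving it. The following sketch uses hypothetical names and makes one assumption explicit: the hyperboloid distance is computed with the absolute value of the pairing, so that the result does not depend on the sign of the representative of $[A:B:C]$. It compares that distance with the upper half plane formula (4).

```python
import math

def to_coefficients(z):
    """Inverse map: z in the upper half plane -> (1, -(z + conj z), z * conj z)."""
    return (1.0, -2.0 * z.real, abs(z) ** 2)

def to_root(A, B, C):
    """Quadratic formula: the root of A x^2 + B x + C with positive imaginary part."""
    disc = B * B - 4 * A * C
    assert disc < 0, "the point must lie inside the light cone"
    root = complex(-B, math.sqrt(-disc)) / (2 * A)
    return root if root.imag > 0 else root.conjugate()

def Q(v):
    A, B, C = v
    return B * B - 4 * A * C

def pairing(u, v):
    (A1, B1, C1), (A2, B2, C2) = u, v
    return B1 * B2 - 2 * A1 * C2 - 2 * A2 * C1

def cosh_M(u, v):
    # Assumption: absolute value of the pairing, for sign-independence.
    return abs(pairing(u, v)) / math.sqrt(Q(u) * Q(v))

def cosh_U(z, w):
    return 1 + abs(z - w) ** 2 / (2 * z.imag * w.imag)

z, w = complex(1, 1), complex(-2, 3)
print(cosh_M(to_coefficients(z), to_coefficients(w)), cosh_U(z, w))
```

For these two sample points both expressions evaluate to $19/6$, up to rounding.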

We can now line up the hyperbolic isometries on either side of the identification between U2\mathbb{H}^{2}_{U} and M2\mathbb{H}^{2}_{M}.

Theorem 3.4 ([HST22, Observation 4.6]).

Identify elements [A:B:C][A:B:C] of M2\mathbb{H}^{2}_{M} with matrices DA,B,C:=(CB/2B/2A)D_{A,B,C}:=\begin{pmatrix}C&B/2\\ B/2&A\end{pmatrix}. Then the isometry of Theorem 3.3 between U2\mathbb{H}^{2}_{U} and M2\mathbb{H}^{2}_{M} is PSL2()\operatorname{PSL}_{2}(\mathbb{R})-equivariant, relating the action of PSL2()\operatorname{PSL}_{2}(\mathbb{R}) via Möbius transforms on U2\mathbb{H}^{2}_{U} to the action MDA,B,C:=M1DA,B,CMtM\cdot D_{A,B,C}:=M^{-1}D_{A,B,C}M^{-t} on M2\mathbb{H}^{2}_{M}.

Writing this out explicitly gives a representation of PSL2()\operatorname{PSL}_{2}(\mathbb{R}) in OQ()O_{Q}(\mathbb{R}), i.e. the effect of MM on [A:B:C][A:B:C] is multiplication by the following 3×33\times 3 matrix:

\[ \operatorname{PSL}_2(\mathbb{R}) \rightarrow O_Q(\mathbb{R}), \quad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto \begin{pmatrix} a^2 & -ac & c^2 \\ -2ab & bc+ad & -2cd \\ b^2 & -bd & d^2 \end{pmatrix}. \]

One advantage of the ambient Minkowski space containing M2\mathbb{H}^{2}_{M} is that there’s a satisfying analogous relationship between the space outside the light cone and the geodesics of U2\mathbb{H}^{2}_{U}. For this, we might give a name to the projectivized exterior of the light cone:

\[ \mathbb{D}^2_M := \{[A:B:C] : A,B,C \in \mathbb{R},\ B^2 > 4AC\} = \{Ax^2+Bx+C : A,B,C \in \mathbb{R},\ B^2 > 4AC\}. \]

The 𝔻\mathbb{D} is an anti de Sitter space, which is a term from physics. See Figure 3.1.

Theorem 3.5 ([HST22, Observation 4.11]).

The following two maps from 𝔻M2\mathbb{D}^{2}_{M} to the space of geodesics of U2\mathbb{H}^{2}_{U} coincide:

  1. (1)

    given on [A:B:C][A:B:C] by returning the geodesic whose endpoints are exactly the two real roots of Ax2+Bx+CAx^{2}+Bx+C;

  2. (2)

    given on [A:B:C][A:B:C] by first taking the plane normal to [A:B:C][A:B:C] in M2,1M^{2,1} (with respect to the Minkowski norm), then intersecting it with M2\mathbb{H}^{2}_{M}, and finally composing with the hyperbolic isometry of Theorem 3.3.

Exercise 3.6.

Prove Theorems 3.3, 3.4, 3.5.

Refer to caption
Figure 3.2. M3M^{3} with example plane cutting M2\mathbb{H}^{2}_{M} (giving a hyperbolic geodesic, in thick pen) and ray piercing M2\mathbb{H}^{2}_{M} (giving a hyperbolic point, in thick pen).

We summarize the situation in a table:

\begin{tabular}{l|l}
upper half plane $\mathbb{H}^2_U$ & Minkowski space $M^{2,1}$ \\
\hline
points & vectors inside the light cone \\
root of $Ax^2+Bx+C$, $Q<0$ & vector $[A:B:C]$, $Q<0$, or matrix $D_{A,B,C} := \begin{pmatrix} C & B/2 \\ B/2 & A \end{pmatrix}$ \\
geodesics & planes cutting the light cone \\
geodesic joining roots of $Ax^2+Bx+C$, $Q>0$ & plane normal to vector $[A:B:C]$, $Q>0$ \\
Möbius action of $M \in \operatorname{PSL}_2(\mathbb{R})$ & action $D_{A,B,C} \mapsto M^{-1} D_{A,B,C} M^{-t}$, $M \in \operatorname{PSL}_2(\mathbb{R})$
\end{tabular}

It makes sense to think of the one-sheeted hyperboloid as the space of geodesics of the upper half plane, since any point in that space is a vector normal to a plane cutting out a geodesic.

3.4. The Hamilton quaternions and upper half space

Refer to caption
Figure 3.3. The upper half space, with example geodesic planes and lines.

The ring of Hamilton quaternions is the ring

\[ H = \{x + yi + zj + wk : x,y,z,w \in \mathbb{R}\} \]

with relations i2=j2=k2=1i^{2}=j^{2}=k^{2}=-1 and k=ij=jik=ij=-ji. There is a quaternionic conjugation:

\[ \overline{x+yi+zj+wk} = x - yi - zj - wk. \]

Analogously to the upper half plane, we can define the upper half space

\[ \mathbb{H}^3_U := \{x+yi+zj : z > 0\} \subseteq H. \]

This is a standard model of hyperbolic 33-space, thought of as a halfspace of 3={x+yi+zj}\mathbb{R}^{3}=\{x+yi+zj\}, whose boundary is U3:=^:={}\partial\mathbb{H}^{3}_{U}:=\widehat{\mathbb{C}}:=\mathbb{C}\cup\{\infty\}, consisting of a copy of \mathbb{C} (thought of as z=0z=0 in 3\mathbb{R}^{3} or z=w=0z=w=0 in H), augmented by a point at \infty. We can also write α:=x+yi\alpha:=x+yi\in\mathbb{C} and an element of U3\mathbb{H}^{3}_{U} as α+zj\alpha+zj, as in Figure 3.3.

The differential is

\[ ds^2 = \frac{d\alpha\, d\overline{\alpha} + dz^2}{z^2}, \]

and the distance function dUd_{U} satisfies

\[ \cosh\left(d_U(\alpha+zj, \beta+wj)\right) = 1 + \frac{|\alpha-\beta|^2 + (z-w)^2}{2zw}. \tag{5} \]

There is an action of PSL2()\operatorname{PSL}_{2}(\mathbb{C}) via Möbius transformations on the boundary of the upper half space, ^\widehat{\mathbb{C}}. This action extends to a unique action on U3\mathbb{H}^{3}_{U} by hyperbolic isometries, namely

\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot (\alpha+zj) = (a(\alpha+zj)+b)(c(\alpha+zj)+d)^{-1}, \quad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \operatorname{PSL}_2(\mathbb{C}), \quad \alpha \in \mathbb{C}, \quad z \in \mathbb{R}, \]

where we must keep in mind the non-commutativity of ii and jj, so that order matters; and the notion of inverse takes place in the quaternions, so that

\[ (c(\alpha+zj)+d)^{-1} = \frac{(\overline{\alpha}-zj)\overline{c} + \overline{d}}{|c\alpha+d|^2 + |cz|^2}. \tag{6} \]

Again, order matters. Note that for z=0z=0 the given action restricts to the usual Möbius action on U3\partial\mathbb{H}^{3}_{U}.
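The quaternionic Möbius action is easy to experiment with if a quaternion $u + vj$ ($u, v \in \mathbb{C}$) is stored as the pair $(u, v)$, with the product rule $(u_1 + v_1 j)(u_2 + v_2 j) = (u_1 u_2 - v_1 \overline{v_2}) + (u_1 v_2 + v_1 \overline{u_2}) j$ that follows from $j u = \overline{u} j$. The sketch below is an illustration only: the names are hypothetical, the matrix is assumed to have determinant $1$, and the inverse is taken from equation (6).

```python
def mobius_h3(matrix, point):
    """Act on alpha + z j (given as (alpha, z)) by a 2x2 complex matrix of determinant 1."""
    (a, b), (c, d) = [[complex(t) for t in row] for row in matrix]
    alpha, z = complex(point[0]), float(point[1])
    num_u, num_v = a * alpha + b, a * z        # numerator a(alpha + zj) + b
    den_u, den_v = c * alpha + d, c * z        # denominator c(alpha + zj) + d
    norm = abs(den_u) ** 2 + abs(den_v) ** 2
    inv_u, inv_v = den_u.conjugate() / norm, -den_v / norm   # equation (6)
    # Quaternion product numerator * inverse, in this order.
    out_u = num_u * inv_u - num_v * inv_v.conjugate()
    out_v = num_u * inv_v + num_v * inv_u.conjugate()
    # For determinant 1 the j-coefficient is z / norm, so it is real.
    return out_u, out_v.real

def cosh_dist(P, R):
    """Right-hand side of the distance formula (5)."""
    (alpha, z), (beta, w) = P, R
    return 1 + (abs(alpha - beta) ** 2 + (z - w) ** 2) / (2 * z * w)

a, b, c = 1 + 2j, 0.5j, 2 - 1j
m = ((a, b), (c, (1 + b * c) / a))             # chosen so that ad - bc = 1
P, R = (0.3 + 0.4j, 1.5), (-1 + 2j, 0.2)
print(cosh_dist(P, R), cosh_dist(mobius_h3(m, P), mobius_h3(m, R)))
```

The two printed values agree up to rounding, and the $j$-coordinate of each image stays positive, in line with parts (3) and (4) of the exercise that follows.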

Exercise 3.7.
  1. (1)

    Prove that any element of PSL2()\operatorname{PSL}_{2}(\mathbb{C}) can be expressed as a composition of translation (zz+βz\mapsto z+\beta), scaling (zβzz\mapsto\beta z) and circle inversion (z1/zz\mapsto 1/z).

  2. (2)

    Prove equation (6).

  3. (3)

    Prove that the given Möbius action takes U3\mathbb{H}^{3}_{U} to itself.

  4. (4)

    Prove that the given Möbius action on U3\mathbb{H}^{3}_{U} acts by isometry.

We define the upper half space model of 3\mathbb{H}^{3} to be U3\mathbb{H}^{3}_{U} with isometries given by the Möbius action of PSL2()\operatorname{PSL}_{2}(\mathbb{C}) as above.

The geodesics are given by the restriction to U3\mathbb{H}^{3}_{U} of any circle contained in a vertical plane with centre on U3\partial\mathbb{H}^{3}_{U}, together with vertical lines α=α0\alpha=\alpha_{0} (Figure 3.3).

3.5. Relating the hyperboloid and upper half-space models for hyperbolic 33-space

The story in Section 3.3 has an analog for hyperbolic 33-space in place of the hyperbolic plane. Recall that we viewed M2,1M^{2,1} as a space of symmetric matrices DA,B,C:=(CB/2B/2A)D_{A,B,C}:=\begin{pmatrix}C&B/2\\ B/2&A\end{pmatrix} associated to quadratic forms Ax2+Bxy+Cy2Ax^{2}+Bxy+Cy^{2} and quadratic polynomials Ax2+Bx+CAx^{2}+Bx+C.

Moving up from M2,1M^{2,1} to M3,1M^{3,1}, we can now consider M3,1M^{3,1} to be a space of Hermitian matrices

\[ \begin{pmatrix} q & -r+si \\ -r-si & p \end{pmatrix}, \quad p,q,r,s \in \mathbb{R}, \]

with negative determinant $Q(p,q,r,s) = r^2+s^2-pq$ playing the role of the signature $3,1$ form. The term Hermitian matrix means that $M = M^{\dagger}$, where $\dagger$ represents the complex conjugate transpose. The associated bilinear form is

\[ \langle (p_1,q_1,r_1,s_1), (p_2,q_2,r_2,s_2) \rangle_Q = r_1 r_2 + s_1 s_2 - \frac{p_1 q_2}{2} - \frac{p_2 q_1}{2}. \]

The form QQ breaks the space into the interior and exterior of the light cone Q=0Q=0, as before.

If we projectivize, then we obtain Hermitian matrices up to scaling. Let us define, as before,

\begin{align*}
\mathbb{D}^3_M &:= \{[p:q:r:s] : p,q,r,s \in \mathbb{R},\ r^2+s^2 > pq\}, \\
\mathbb{H}^3_M &:= \{[p:q:r:s] : p,q,r,s \in \mathbb{R},\ r^2+s^2 < pq\}.
\end{align*}

Revisit Figure 3.1 and imagine adding one spatial dimension.

Then we have an isometry between our two models of hyperbolic 33-space, reminiscent of the quadratic formula.

Theorem 3.8.

The following map is an isometry from M3\mathbb{H}^{3}_{M} to U3\mathbb{H}^{3}_{U}:

\[ [p:q:r:s] \mapsto \frac{r}{p} + \frac{s}{p} i + \frac{\sqrt{pq - r^2 - s^2}}{p} j. \]

The inverse is

\[ a+bi+cj \mapsto [1 : a^2+b^2+c^2 : a : b]. \]
Proof.

That they are inverses is easy. We will show distances are preserved. By the transitivity of the hyperbolic isometries of hyperbolic $3$-space on the tangent bundle, we need only compare points $j$ and $aj$ in $\mathbb{H}^3_U$, which correspond to points $[1:1:0:0]$ and $[1:a^2:0:0]$ in $\mathbb{H}^3_M$. Specifically, we can move the first point to $j$ by an isometry (by transitivity on points), and then move the tangent direction to the second point to point upward along the $j$ line (by transitivity on the tangent bundle). For more details, consult the analogous [HST22, Theorem 4.9]. Then we need only compute the distance before and after the isometry. Using (3), where the pairing of the two vectors is $-(1+a^2)/2$, taken in absolute value, and the product of their $Q$-values is $a^2$, we find

\[ \cosh(d_M([1:1:0:0],[1:a^2:0:0])) = \frac{1+a^2}{2a}. \]

Using (5), we find

\[ \cosh\left(d_U(j,aj)\right) = \frac{1+a^2}{2a}. \]
∎

This is somewhat analogous to the ‘quadratic formula’ isometry between ‘coefficient space’ M2\mathbb{H}^{2}_{M} and ‘root space’ U2\mathbb{H}^{2}_{U} discussed previously, in that the map takes [p:q:r:s][p:q:r:s] to a quaternionic root of the polynomial

\[ p\left(Z - \frac{r+si}{p}\right)^2 - \frac{r^2+s^2-pq}{p}. \tag{7} \]

The only quaternionic root to this polynomial that lies in $\mathbb{H}^3_U$ is the root mentioned. We also obtain an equivariance statement.

Theorem 3.9.

Identify elements [p:q:r:s][p:q:r:s] of M3\mathbb{H}^{3}_{M} with matrices Hp,q,r,s:=(qr+sirsip)H_{p,q,r,s}:=\begin{pmatrix}q&-r+si\\ -r-si&p\end{pmatrix}. Then the map of Theorem 3.8 between upper half space U3\mathbb{H}^{3}_{U} and M3\mathbb{H}^{3}_{M} is PSL2()\operatorname{PSL}_{2}(\mathbb{C})-equivariant, relating the action of PSL2()\operatorname{PSL}_{2}(\mathbb{C}) via Möbius transformations on U3\mathbb{H}^{3}_{U} to the action MHp,q,r,s:=M1Hp,q,r,sMM\cdot H_{p,q,r,s}:=M^{-1}H_{p,q,r,s}M^{-\dagger} on M3\mathbb{H}^{3}_{M}.

Exercise 3.10.

Prove Theorem 3.9 and give the corresponding representation of PSL2()\operatorname{PSL}_{2}(\mathbb{C}) in OQ()O_{Q}(\mathbb{R}).

3.6. The space of circles

By a circle in ^\widehat{\mathbb{C}}, we mean any circle in \mathbb{C} or the union of any straight line in \mathbb{C} with \infty. These latter we think of as circles through \infty, and they have curvature 0 (infinite radius).

Associated to a Hermitian matrix such as M=(qr+sirsip)M=\begin{pmatrix}q&-r+si\\ -r-si&p\end{pmatrix} in the last section, we have a Hermitian form

\[ H_M(Z,W) := \begin{pmatrix} W & Z \end{pmatrix} \overline{M} \begin{pmatrix} \overline{W} \\ \overline{Z} \end{pmatrix} = pZ\overline{Z} + (-r+si)Z\overline{W} + (-r-si)\overline{Z}W + qW\overline{W}, \quad p,q,r,s \in \mathbb{R}. \]

When $Q(p,q,r,s) > 0$ (i.e. $M \in \mathbb{D}^3_M$), the locus $H_M(Z,W) = 0$ in $\mathbb{P}^1(\mathbb{C})$ (or $H_M(z,1) = 0$ in $\mathbb{C}\cup\{\infty\}$) is a circle.

Theorem 3.11.

Let [p:q:r:s]𝔻M3[p:q:r:s]\in\mathbb{D}^{3}_{M}. Let HM(Z,W)H_{M}(Z,W) be the associated Hermitian form. Then the roots ZZ\in\mathbb{C} of HM(Z,1)H_{M}(Z,1) form a circle in ^\widehat{\mathbb{C}}. The circle is the boundary of a unique geodesic plane in the upper half space. Conversely, every circle arises uniquely in this way.

Proof.

The equation HM(Z,1)=0H_{M}(Z,1)=0 can be expressed as

\[ \left(Z - \frac{r+si}{p}\right)\left(\overline{Z} - \frac{r-si}{p}\right) = \frac{r^2+s^2-pq}{p^2}, \]

so represents a circle with centre $(r+si)/p$ and radius $\sqrt{r^2+s^2-pq}/|p|$ whenever $r^2+s^2 > pq$ and $p \neq 0$. This last inequality follows from $[p:q:r:s] \in \mathbb{D}^3_M$. If $p = 0$, the equation is linear in the real and imaginary parts of $Z$ and gives a straight line, i.e. a circle through $\infty$. For the converse, observe that the circle determines $r/p, s/p, q/p$, hence $[p:q:r:s] \in \mathbb{P}M^{3,1}$. ∎

Thus, it makes sense to consider the projectivization of the one-sheeted hyperboloid 𝔻M3\mathbb{D}^{3}_{M} to be a parameter space for the circles in ^\widehat{\mathbb{C}}, i.e. the space of circles. Normalizing so that r2+s2pq=1r^{2}+s^{2}-pq=1, the coordinates have the following somewhat standard names:

  1. (1)

    pp = curvature, which is the inverse of radius;

  2. (2)

    rr = real part of curvature times center (the curvature times centre is sometimes more simply called the curvature-centre);

  3. (3)

    ss = imaginary part of curvature times center;

  4. (4)

    qq = co-curvature, which is the curvature of the inversion of the circle through the unit circle.
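The coordinates in this list can be computed from a centre and radius, and inverted again via Theorem 3.11. The sketch below uses hypothetical function names and assumes a genuine circle (finite positive radius, so $p \neq 0$). The formula used for the co-curvature, $q = (|\text{centre}|^2 - \text{radius}^2)/\text{radius}$, is the one forced by the normalization $r^2 + s^2 - pq = 1$.

```python
import math

def circle_to_coords(centre, radius):
    """Normalized (p, q, r, s) for the circle with the given complex centre and radius."""
    p = 1 / radius                                   # curvature
    r, s = centre.real / radius, centre.imag / radius  # curvature times centre
    q = (abs(centre) ** 2 - radius ** 2) / radius    # co-curvature
    return p, q, r, s

def coords_to_circle(p, q, r, s):
    """Centre and radius from any representative of [p:q:r:s] with p != 0."""
    big_q = r * r + s * s - p * q
    assert big_q > 0, "the point must lie outside the light cone"
    return complex(r, s) / p, math.sqrt(big_q) / abs(p)

coords = circle_to_coords(3 + 4j, 2.0)
print(coords, coords_to_circle(*coords))
```

Since `coords_to_circle` only uses ratios, rescaling $(p,q,r,s)$ by a nonzero constant returns the same circle, as it should for a projective point.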

Exercise 3.12.

Show that the circle associated to each point of 𝔻M3\mathbb{D}^{3}_{M} via Theorem 3.11 is the same circle obtained by the following process. For a vector 𝐯𝔻M3\mathbf{v}\in\mathbb{D}^{3}_{M}, take the hyperplane PP perpendicular to 𝐯\mathbf{v} in the Minkowski geometry (i.e., with respect to ,Q\langle\cdot,\cdot\rangle_{Q}). Take the intersection of PP with M3\mathbb{H}^{3}_{M} and call the intersection II. Then use the hyperbolic isometry of Theorem 3.8 to map II into U3\mathbb{H}^{3}_{U}. This should provide a geodesic plane GG. Take the circle 𝒞\mathcal{C} at the boundary of GG. Revisit Figure 3.2, imagining an extra spatial dimension.

Recall that PSL2()\operatorname{PSL}_{2}(\mathbb{C}) acts on ^\widehat{\mathbb{C}}. It will be helpful later to understand the image of ^\widehat{\mathbb{R}} under a Möbius transformation; this is a straightforward computation.

Proposition 3.13 ([Sta18b, Proposition 3.7]).

Let zαz+βγz+δz\mapsto\frac{\alpha z+\beta}{\gamma z+\delta} be a Möbius transformation. Then the image of ^\widehat{\mathbb{R}} under the transformation is a circle with curvature equal to 2(γ¯δ)=i(γδ¯γ¯δ)2\Im(\overline{\gamma}{\delta})=i(\gamma\overline{\delta}-\overline{\gamma}\delta), and curvature times center equal to i(αδ¯γ¯β)i(\alpha\overline{\delta}-\overline{\gamma}\beta). Finally, the co-curvature is 2(α¯β)=i(αβ¯α¯β)2\Im(\overline{\alpha}\beta)=i(\alpha\overline{\beta}-\overline{\alpha}\beta).
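The proposition can be tested numerically by pushing three real points through the transformation and fitting a circle through their images. The following is a sketch with hypothetical names, assuming $\alpha\delta - \beta\gamma = 1$ and that the image is a genuine circle (nonzero curvature). The curvature is signed, so only its absolute value is compared with the fitted radius.

```python
def image_circle_formula(alpha, beta, gamma, delta):
    """Curvature, curvature times centre, and co-curvature as in Proposition 3.13."""
    curvature = 2 * (gamma.conjugate() * delta).imag
    curv_centre = 1j * (alpha * delta.conjugate() - gamma.conjugate() * beta)
    co_curvature = 2 * (alpha.conjugate() * beta).imag
    return curvature, curv_centre, co_curvature

def circle_through(z1, z2, z3):
    """Centre and radius of the circle through three non-collinear complex points."""
    w = (z3 - z1) / (z2 - z1)
    centre = z1 + (z2 - z1) * (w - abs(w) ** 2) / (w - w.conjugate())
    return centre, abs(centre - z1)

def mobius(alpha, beta, gamma, delta, x):
    return (alpha * x + beta) / (gamma * x + delta)

alpha, beta, gamma = 2 + 1j, 1 + 0j, 1 - 1j
delta = (1 + beta * gamma) / alpha                 # forces determinant 1
curv, curv_centre, co_curv = image_circle_formula(alpha, beta, gamma, delta)
fitted = circle_through(*(mobius(alpha, beta, gamma, delta, x) for x in (-1, 0, 1)))
print(curv_centre / curv, 1 / abs(curv), fitted)
```

For this sample matrix both methods give the circle with centre $3 + 1.5i$ and radius $2.5$.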

Exercise 3.14.

Prove Proposition 3.13. Furthermore, assume the Möbius transformation is in PSL2([i])\operatorname{PSL}_{2}(\mathbb{Z}[i]), and show that for such a circle, the resulting vector (p,q,r,s)4(p,q,r,s)\in\mathbb{Z}^{4} in the space of circles is congruent to (0,0,0,1)(0,0,0,1) modulo 22.

Exercise 3.15.

What are the correct definitions of curvature and curvature-center for straight lines?

4. Diophantine approximation in the complex plane

It is natural to ask some of our fundamental Diophantine approximation questions for complex numbers. This is where we can reap the benefits of the geometric perspective of the last section. Since the rationals cannot approximate complex numbers off the real line, we must begin by asking what we are hoping to approximate by. One natural answer is to ask to approximate by algebraic numbers of a fixed degree. Another is to approximate by elements of a fixed number field. Here, we will consider the first (later, we will consider the second).

The next question is how to measure the size of an approximation. In the real/rational case, we used the denominator of p/qp/q\in\mathbb{Q} as a natural measure of size. For a general algebraic number α\alpha, having minimal polynomial adxd++a0[x]a_{d}x^{d}+\cdots+a_{0}\in\mathbb{Z}[x], where gcd(ai)=1{\operatorname{gcd}}(a_{i})=1, the naïve height is defined as

\[ H(\alpha) := \max_i |a_i|. \]

Then, in analogy to Dirichlet’s and Roth’s theorems, Koksma defines [Kok39]

\[ k_d(\alpha) := \sup\{ k : \text{there exist infinitely many algebraic $\beta$ of degree $\leq d$ such that } |\alpha - \beta| < 1/H(\beta)^k \}. \]

In this language, Dirichlet’s Theorem states that $k_1(\alpha) \leq 2$ for rational $\alpha$ and $k_1(\alpha) \geq 2$ for $\alpha \in \mathbb{R}\smallsetminus\mathbb{Q}$. Roth’s theorem says that $k_1(\alpha) = 2$ for real algebraic irrational $\alpha$, and Sprindz̆uk says $k_1(\alpha) = 2$ for almost all real $\alpha$.

Theorem 4.1 (Sprindz̆uk, [Spr69]).

kd(α)=d+1k_{d}(\alpha)=d+1 for almost all α\alpha\in\mathbb{R} and kd(α)=d+12k_{d}(\alpha)=\frac{d+1}{2} for almost all α\alpha\in\mathbb{C}\smallsetminus\mathbb{R}.

This theorem shows that approximation by algebraic numbers is fundamentally different on the real line than off of it. Next, here is a tantalizingly precise description of $k_d(\alpha)$ for $\alpha$ algebraic.

Theorem 4.2 (Bugeaud-Evertse [BE09]).

Let α\alpha\in\mathbb{C}\smallsetminus\mathbb{R} be algebraic. Then

\[ k_d(\alpha) = \begin{cases} \frac{d+1}{2} \text{ or } \frac{d+2}{2} & \text{if } \deg(\alpha) \geq d+2,\ d \text{ even}, \\ \min\left\{ \frac{\deg(\alpha)}{2}, \frac{d+1}{2} \right\} & \text{otherwise.} \end{cases} \]

For $d=2$, $\deg(\alpha)>2$:

\[ k_2(\alpha) = \begin{cases} 2 & \text{if } 1, \alpha\overline{\alpha}, \alpha+\overline{\alpha} \text{ are $\mathbb{Q}$-linearly dependent}, \\ 3/2 & \text{otherwise.} \end{cases} \]

The underlying methods for this are algebraic. This theorem, like Roth’s Theorem, follows from a far-reaching theorem called Schmidt’s subspace theorem.

4.1. Quadratics

We are interested in finding geometric connections or explanations for Theorem 4.2. In particular, the result shows that approximation by quadratics is quite different for certain types of complex numbers compared to others. Our starting point is an analog to Figure 2.5 showing the rational numbers. In Figure 4.1, we see the quadratic algebraic numbers in the upper half plane, sized by their discriminants.

Figure 4.1. The quadratic algebraic numbers in the upper half plane, sized by discriminant. Image: Edmund Harriss, Steve Trettel, Katherine E. Stange.

In this picture, our eyes infer the $\operatorname{SL}_{2}(\mathbb{Z})$-tessellation from Figure 2.1, but with the geodesic boundaries – and many more geodesics besides – filled in with the pearly necklaces of Figure 2.5. In light of the $\mathbb{Z}^{2}$ hiding in Figure 2.5, this image asks us a question: are we viewing a higher dimensional lattice under some transformation?

To answer this question, we return to the coefficient and root spaces $\mathbb{H}^{2}_{M}$ and $\mathbb{H}^{2}_{U}$ of Section 3. Within the coefficient space $\mathbb{H}^{2}_{M}$, we have the quadratic algebraics: $\mathbb{H}^{2}_{M}(\mathbb{Z}):=\{[a:b:c]:a,b,c\in\mathbb{Z},b^{2}<4ac\}$. This is the projectivization of the portion of the 3D lattice $\mathbb{Z}^{3}$ which lies inside the light cone. The naïve height is, roughly speaking, a measure of how far the representative $[a:b:c]$ with $\gcd(a,b,c)=1$ is from the origin. So we are in a situation not unlike our description of continued fractions in the real line as the projectivization of the lattice $\mathbb{Z}^{2}$. That is, the relationship between $\mathbb{H}^{2}_{M}(\mathbb{Z})$ and Figure 4.1 is described by the transformation of Theorem 3.3, which describes Figure 4.1 as an image of a 3D lattice under a certain geometric transformation. See Figure 4.2.

Figure 4.2. The transformation of polynomials with integer coefficients from ‘coefficient space’ $\mathbb{H}^{2}_{M}$ to ‘root space’ $\mathbb{H}^{2}_{U}$. From top left, left-to-right and top-to-bottom: The lattice $\mathbb{Z}^{3}\subseteq M^{2,1}$ representing polynomials with integer coefficients (i.e., having algebraic roots); the cone $B^{2}-4AC=0$ and the portion of the lattice within it, representing those polynomials with complex conjugate algebraic roots; the projection onto a plane perpendicular to the axis of symmetry of the cone (the Klein disc model); the image of the lattice under the hyperbolic isometry of Theorem 3.3 in the upper half plane. Image: [HST22, Figure 24].

Under this transformation, the planes inside the lattice $\mathbb{Z}^{3}$ projectivize to geodesics on the Poincaré disc model and hence in the upper half plane; these are the visually striking necklaces in Figure 4.1. We have the following consequence:

Proposition 4.3 ([HST22, Observation 4.12]).

Let $\alpha\in\mathbb{C}\smallsetminus\mathbb{R}$. Then $1,\alpha\overline{\alpha},\alpha+\overline{\alpha}$ are $\mathbb{Q}$-dependent if and only if $\alpha$ lies on a geodesic whose limit points form a conjugate pair of points in a real quadratic field, or a pair of rational points.

We call such geodesics rational geodesics. The main reference for the remainder of this section is [HST22].

Exercise 4.4.

Prove Proposition 4.3.

4.2. The Diophantine approximation of quadratics

Having this geometric interpretation of the way the quadratic algebraic numbers fill out the complex upper half plane, we make some geometrically motivated choices for the notions of distance, complexity and goodness in Diophantine approximation, that differ somewhat from the algebraically motivated ones made classically:

  1. We size quadratics by their discriminant. Unlike the naïve height and other classical measures of arithmetic complexity, the discriminant is invariant under $\operatorname{PSL}_{2}(\mathbb{Z})$, the natural symmetries of the space. In the fundamental region, it tracks with the naïve height, so this is not a violent change (see [HST22, §5.2.4]).

  2. We use the hyperbolic metric in the upper half plane, not the Euclidean one. Again, this respects the action of $\operatorname{PSL}_{2}(\mathbb{Z})$, without doing great violence in any bounded region. Let $\alpha$ and $\beta$ correspond to points $f_{\alpha}$ and $f_{\beta}$ in $\mathbb{H}^{2}_{M}$. The distance in $\mathbb{H}^{2}_{M}$ dictated by the Minkowski geometry is given by

    \[ d_{M}(f_{\alpha},f_{\beta})=\operatorname{acosh}\left(\frac{-\langle f_{\alpha},f_{\beta}\rangle}{\sqrt{\Delta_{\alpha}\Delta_{\beta}}}\right). \]
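The distance formula can be checked numerically against the familiar upper half plane metric. The sketch below is only illustrative: it assumes that the pairing is the polarization of the discriminant, $\langle f,g\rangle=b_{1}b_{2}-2a_{1}c_{2}-2a_{2}c_{1}$ (so that $\langle f,f\rangle=b^{2}-4ac$), and that both leading coefficients are positive; these normalizations are our assumptions and are not restated in this section. All function names are hypothetical.

```python
import math


def pairing(f, g):
    """Assumed bilinear form with pairing(f, f) = b^2 - 4ac (the discriminant)."""
    a1, b1, c1 = f
    a2, b2, c2 = g
    return b1 * b2 - 2 * a1 * c2 - 2 * a2 * c1


def dist_coefficient_space(f, g):
    """d_M(f, g) = acosh(-<f, g> / sqrt(Delta_f * Delta_g))."""
    delta_f, delta_g = pairing(f, f), pairing(g, g)
    ratio = -pairing(f, g) / math.sqrt(delta_f * delta_g)
    return math.acosh(max(ratio, 1.0))  # clamp rounding error just below 1


def root(f):
    """Root of ax^2 + bx + c in the upper half plane (needs a > 0, b^2 < 4ac)."""
    a, b, c = f
    return complex(-b, math.sqrt(4 * a * c - b * b)) / (2 * a)


def dist_upper_half_plane(z, w):
    """Standard hyperbolic distance between two points of the upper half plane."""
    return math.acosh(1 + abs(z - w) ** 2 / (2 * z.imag * w.imag))


f = (1, 0, 1)   # x^2 + 1, root i
g = (1, 0, 4)   # x^2 + 4, root 2i
h = (2, -2, 5)  # 2x^2 - 2x + 5, root (1 + 3i)/2
```

For instance, the points $i$ and $2i$ are at hyperbolic distance $\log 2$, and the coefficient-space formula applied to $x^{2}+1$ and $x^{2}+4$ returns the same value under the stated normalization.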
Figure 4.3. Quadratics are shown in grey. Quartics are shown in two shades of blue. Dots are sized by discriminant. Image: [HST22, Figure 33].

As a consequence, we can re-interpret complex approximation results under these choices.

Theorem 4.5 ([HST22, Theorem 6.3]).

Let $\alpha\in\mathbb{C}\smallsetminus\mathbb{R}$ not be quadratic algebraic, but lying on a rational geodesic. Then there exists $K_{\alpha}>0$, depending on the $\operatorname{PSL}_{2}(\mathbb{Z})$-orbit of $\alpha$, so that there are infinitely many quadratic $\beta$ on that geodesic with

\[ d_{U}(\alpha,\beta)\leq\operatorname{acosh}\left(1+\frac{K_{\alpha}}{|\Delta_{\beta}|^{2}}\right). \]
Proof.

We provide only a sketch, but the proof is an adaptation of the proof of Dirichlet’s Theorem. We begin by writing the element of coefficient space corresponding to $\alpha$ as

\[ [1/(\alpha+\overline{\alpha}):1:\alpha\overline{\alpha}/(\alpha+\overline{\alpha})]=[\alpha_{1}:1:\alpha_{2}]. \]

Then, we consider the multiples of this vector modulo $[\mathbb{Z}:\mathbb{Z}:\mathbb{Z}]$. Dirichlet’s box principle\footnote{More commonly known as the pigeonhole principle, although I’ve heard it claimed that this very proof is the first recorded use of the idea in the mathematical literature.} tells us two multiples are close to one another, so their difference, which we write as $n[\alpha_{1}:1:\alpha_{2}]$, is close to a lattice element, which we denote $[p_{1}:n:p_{2}]$; this is a candidate good approximation $f_{\beta}$. This tells us that certain linear forms are small:

\[ |n\alpha_{1}-p_{1}|,\quad|n\alpha_{2}-p_{2}|,\quad|\alpha_{1}p_{2}-\alpha_{2}p_{1}|. \]

This gives a small discriminant pairing $\langle f_{\alpha},f_{\beta}\rangle$, which in turn gives a small hyperbolic distance. ∎
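The box principle step can be illustrated by the classical simultaneous approximation statement: for real $\alpha_{1},\alpha_{2}$ and an integer $N\geq 1$ there is some $1\leq n\leq N^{2}$ with both $n\alpha_{1}$ and $n\alpha_{2}$ within $1/N$ of an integer. The following sketch finds such an $n$ by brute force; it illustrates only this step, not the full argument of [HST22], and the function names and the sample values $\sqrt{2},\sqrt{3}$ are our own choices.

```python
import math


def dist_to_nearest_integer(x):
    return abs(x - round(x))


def simultaneous_approximation(alpha1, alpha2, N):
    """Return (n, p1, p2) with 1 <= n <= N**2 and |n*alpha_i - p_i| < 1/N.

    The box principle guarantees that such an n exists; here it is located
    by exhaustive search rather than by the pigeonhole argument itself.
    """
    for n in range(1, N * N + 1):
        if (dist_to_nearest_integer(n * alpha1) < 1 / N
                and dist_to_nearest_integer(n * alpha2) < 1 / N):
            return n, round(n * alpha1), round(n * alpha2)
    raise AssertionError("unreachable in exact arithmetic")


n, p1, p2 = simultaneous_approximation(math.sqrt(2), math.sqrt(3), 10)
```

In the proof sketch, the triple $[p_{1}:n:p_{2}]$ found this way plays the role of the candidate approximation $f_{\beta}$.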

Figure 4.4. The cubics in the upper half plane [HST22, Figure 1].

Next, we re-interpret the result of Bugeaud and Evertse (see Figure 4.3).

Theorem 4.6 ([HST22, Theorem 6.6]).

Let $\alpha\in\mathbb{C}\smallsetminus\mathbb{R}$ be algebraic but not quadratic. Let $\epsilon>0$. If $\alpha$ lies on a rational geodesic, then there are only finitely many quadratic $\beta$ such that

\[ d_{U}(\alpha,\beta)\leq\operatorname{acosh}\left(1+\frac{1}{|\Delta_{\beta}|^{2+\epsilon}}\right). \]

Whether $\alpha$ is on a rational geodesic or not, amongst $\beta$ not sharing a rational geodesic with $\alpha$, there are only finitely many such that

\[ d_{U}(\alpha,\beta)\leq\operatorname{acosh}\left(1+\frac{1}{|\Delta_{\beta}|^{3/2+\epsilon}}\right). \]

The proof proceeds in a similar way, by moving to the same linear forms as in Theorem 4.5, and then one applies Schmidt’s subspace theorem.

4.3. Cubics and beyond

The cubics having two complex conjugate roots can be treated in a similar way, although the discriminant locus is more complicated. The lattice of coefficients is 4-dimensional, so the image is much more complicated and layered when drawn as roots in the upper half plane. It can help to embed it in a 3-dimensional space made up of the upper half plane root together with the real root. It is natural to use a disc model for the upper half plane, and a circle for the real line, resulting in a torus as their product. Some images are shown in Figures 4.4–4.6.

Figure 4.5. A detail of the cubics of Figure 4.4.
Figure 4.6. A 3d view of the cubic roots embedded in a torus (this image was created using Emily Dumas’ SL(View) software https://www.dumas.io/) [HST22, Figure 10].

4.4. Open problems

There are a great many open problems motivated by this perspective.

  1. Is there a geodesic flow / continued fraction theory for good approximations by quadratics?

  2. Is there a natural analog to a Lagrange spectrum?

  3. Is there an analog to the Farey subdivision?

  4. And many more; see [HST22].

5. Apollonian circle packings: geometric aspects

5.1. Schmidt subdivision

Asmus Schmidt developed a beautiful complex analogue to the Farey subdivision [Sch75]. Recall that the Farey subdivision on $\widehat{\mathbb{R}}=\partial\mathbb{H}^{2}_{U}$ is the boundary of a nested set of geodesics in $\mathbb{H}^{2}_{U}$. Similarly, Schmidt’s subdivision views $\widehat{\mathbb{C}}$ as the boundary of the upper half space, and the subdivision is formed as the boundary of geodesic planes in the upper half space. The initial subdivision divides the plane by the use of two lines and two circles into eight regions (see Figure 5.1, top). Each triangle is subdivided as in the lower portion of Figure 5.1, by inserting a new circle tangent to all three sides. Repeating this, we obtain nested regions like in Figure 5.2. (Schmidt’s subdivision also incorporates dual circles orthogonal to these, but we will ignore them for now.)

Figure 5.1. A screenshot of Asmus Schmidt’s paper, Diophantine Approximation of Complex Numbers [Sch75, Figure 1, 1*]. Currently suppressed because permission has not yet been requested.
Figure 5.2. Schmidt’s subdivisions, iterated (without the dual circles).

The analogy is as follows:

\begin{tabular}{lll}
Continued fractions & Farey subdivision & Schmidt subdivision \\
\hline
 & $\widehat{\mathbb{R}}$ & $\widehat{\mathbb{C}}$ \\
 & $\operatorname{PSL}_{2}(\mathbb{Z})$ & $\operatorname{PSL}_{2}(\mathbb{Z}[i])$ \\
 & intervals & circles and triangles \\
convergents & endpoints & tangency points \\
coefficients & series of nested geodesics & series of nested geodesic planes \\
 & & Apollonian circle packings \\
\end{tabular}

The last line of the table is a new and very rich phenomenon that only exists in the $\widehat{\mathbb{C}}$ case. Unlike the Farey subdivision, the Schmidt subdivisions are built out of intermediate ‘pieces,’ called Apollonian circle packings. It is this new object that is so fascinating.

5.2. Descartes quadruples and Apollonian circle packings

A Descartes quadruple is a collection of four circles, all pairwise mutually tangent, of disjoint interiors. Given three mutually tangent circles, there are exactly two circles, called Soddy circles, which can complete the triple to a Descartes quadruple. To construct an Apollonian circle packing, begin with three mutually tangent circles. At each stage, add in any missing Soddy circles for any mutually tangent triple in the packing, and repeat ad infinitum (Figure 5.3).

Famously proven by Descartes and Princess Elisabeth of Bohemia\footnote{What we know is as follows. Descartes proposed the problem (of finding the radii of the circles tangent to a given triple of circles) to Elisabeth in 1643, and she provided a solution, which is lost, but left Descartes, in his own words, ‘filled with joy.’ (Keep in mind he was writing to a Princess!) Descartes’ side of the correspondence having survived, we have two of his solutions, the second of which assumes the three starting circles are mutually tangent, and most closely matches what is now often called Descartes’ Theorem in this context.} in correspondence [Sha07, pp. 73–81; AT 4:37, 4:44, 4:45], four circles form a Descartes quadruple if and only if their curvatures $a,b,c,d$ satisfy $Q(a,b,c,d)=0$ for the quadratic form

\[ Q(a,b,c,d)=(a+b+c+d)^{2}-2(a^{2}+b^{2}+c^{2}+d^{2}). \tag{8} \]

In particular, given a mutually tangent triple of curvatures $a,b,c$, the Soddy circles have curvatures $d_{1}$ and $d_{2}$ satisfying $Q(a,b,c,d_{i})=0$; these are related by

\[ d_{1}+d_{2}=2(a+b+c). \]
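Solving $Q(a,b,c,d)=0$ as a quadratic in $d$ gives $d=a+b+c\pm 2\sqrt{ab+bc+ca}$, which makes the relation $d_{1}+d_{2}=2(a+b+c)$ visible. The following Python sketch (with hypothetical function names, and assuming $ab+bc+ca\geq 0$, as holds for a genuine tangent triple) computes both Soddy curvatures.

```python
import math


def descartes_form(a, b, c, d):
    """Q(a,b,c,d) = (a+b+c+d)^2 - 2(a^2+b^2+c^2+d^2), as in equation (8)."""
    return (a + b + c + d) ** 2 - 2 * (a * a + b * b + c * c + d * d)


def soddy_curvatures(a, b, c):
    """Both solutions d of Q(a, b, c, d) = 0, smaller one first.

    Assumes ab + bc + ca >= 0, so that the square root is real.
    """
    s = a + b + c
    root = 2 * math.sqrt(a * b + b * c + c * a)
    return s - root, s + root


# The triple (2, 2, 3) sits inside the packing with outer circle of curvature -1.
d1, d2 = soddy_curvatures(2, 2, 3)
```

Here the two solutions are $-1$ (the enclosing circle, with negative curvature by the usual orientation convention) and $15$.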

There are various generalizations of this theorem, including to spheres in higher dimension [LMW02, AR23]. We will sometimes refer to the quadruple of curvatures as a Descartes quadruple also, at the risk of some minor confusion.

As we will exploit later, this has the beautiful consequence that if one begins with a Descartes quadruple of integer curvatures $a,b,c,d$, then the entire packing will consist of integer curvatures. Such an integral packing is called primitive if it has no common factor among its curvatures. Some examples are shown in Figure 5.4.


Figure 5.3. Generating an Apollonian circle packing; at each stage, curvilinear triangles are filled with tangent circles. In the final packing, curvatures are indicated.
Figure 5.4. Several examples of Apollonian circle packings. The packing at left can be zoomed in quite amazingly far if you are on a computer screen, and was produced with user-friendly public software created by James Rickards [Ric23]. The packing with people in it was drawn by Katherine Sanden; a few other fanciful packings can be enjoyed in [FS11].

5.3. Apollonian group

The structure of the Apollonian packing is governed by the Apollonian group $\mathcal{A}$, which acts freely and transitively on the collection of its unordered Descartes quadruples [GLM+05, Theorem 4.3]. In particular, it acts on quadruples of curvatures $(a,b,c,d)$ satisfying $Q(a,b,c,d)=0$, and therefore is considered a subgroup of the orthogonal group $O_{Q}(\mathbb{Z})$ preserving the form $Q$. The Descartes form $Q$ has signature $(3,1)$. Specifically, $\mathcal{A}$ is generated by the four matrices

\[ S_{1}:=\begin{pmatrix}-1&0&0&0\\ 2&1&0&0\\ 2&0&1&0\\ 2&0&0&1\end{pmatrix},\quad S_{2}:=\begin{pmatrix}1&2&0&0\\ 0&-1&0&0\\ 0&2&1&0\\ 0&2&0&1\end{pmatrix},\quad S_{3}:=\begin{pmatrix}1&0&2&0\\ 0&1&2&0\\ 0&0&-1&0\\ 0&0&2&1\end{pmatrix},\quad S_{4}:=\begin{pmatrix}1&0&0&2\\ 0&1&0&2\\ 0&0&1&2\\ 0&0&0&-1\end{pmatrix}, \tag{9} \]

acting on row vectors of curvatures from the right. Each generator corresponds to fixing three of the circles in a Descartes quadruple and ‘swapping’ out one Soddy circle for its alternative (see Figure 5.5). The Descartes quadruples in any one packing constitute one orbit of the Apollonian group.
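To make the action concrete, the sketch below applies the four matrices of (9) to the row vector $(-1,2,2,3)$ and checks that each swap preserves $Q=0$ and is an involution. The helper names `act` and `descartes_form` are ours; the sample quadruple is one convenient choice.

```python
# The generators S_1, ..., S_4 exactly as displayed in equation (9).
S = [
    [[-1, 0, 0, 0], [2, 1, 0, 0], [2, 0, 1, 0], [2, 0, 0, 1]],
    [[1, 2, 0, 0], [0, -1, 0, 0], [0, 2, 1, 0], [0, 2, 0, 1]],
    [[1, 0, 2, 0], [0, 1, 2, 0], [0, 0, -1, 0], [0, 0, 2, 1]],
    [[1, 0, 0, 2], [0, 1, 0, 2], [0, 0, 1, 2], [0, 0, 0, -1]],
]


def act(row, matrix):
    """Right action of a 4x4 matrix on a row vector of curvatures."""
    return tuple(sum(row[i] * matrix[i][j] for i in range(4)) for j in range(4))


def descartes_form(q):
    """Q(a,b,c,d) = (a+b+c+d)^2 - 2(a^2+b^2+c^2+d^2)."""
    return sum(q) ** 2 - 2 * sum(x * x for x in q)


root = (-1, 2, 2, 3)
swapped = [act(root, s) for s in S]
```

The $i$-th swap replaces the $i$-th curvature $d$ by $2(\text{sum of the other three})-d$, in agreement with $d_{1}+d_{2}=2(a+b+c)$. For this particular quadruple the fourth swap leaves the curvatures unchanged, since both Soddy circles of the triple $(-1,2,2)$ have curvature $3$.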

Figure 5.5. An Apollonian swap, replacing the fourth circle $C_{4}$ with its alternate $C_{4}^{\prime}$.
Figure 5.6. Cayley graph of $\mathcal{A}$, shown to two levels.

This group, whose definition goes back to Hirst [Hir67], is our point of access to the rich arithmetic structure of the curvatures. Fuchs showed that it was a thin group [Fuc11], meaning that it has infinite index in $O_{Q}(\mathbb{Z})$ and yet is Zariski dense in the algebraic group $O_{Q}$. Zariski density means that if a polynomial $f(\ldots,x_{ij},\ldots)$ vanishes on elements of $\mathcal{A}$ (the $x_{ij}$ representing the entries of the matrix), then it vanishes for all matrices in $O_{Q}(\mathbb{R})$. In other words, elements of $\mathcal{A}$ cannot be detected by any polynomial condition on their matrix entries.

These matrices satisfy the relations $S_{i}^{2}=I$, and in fact there are no other relations [GLM+05, Proof of Theorem 4.3], so that

\[ \mathcal{A}=\left\langle S_{1},S_{2},S_{3},S_{4}:S_{i}^{2}=1\right\rangle<\operatorname{O}_{Q}(\mathbb{Z}). \]

This means that the Cayley graph is particularly nice. The Cayley graph of a group $G$ with respect to a generating set $S$ is the graph whose vertices are the elements of $G$ and which has a directed edge from $g$ to $sg$ for all $s\in S$ and $g\in G$. In the case of the Apollonian group, we take $S$ to be the set of generators $S_{1},S_{2},S_{3},S_{4}$. Since these are involutions, we can consider the Cayley graph to be an undirected graph of degree $4$. It will be a tree, as in Figure 5.6.

Figure 5.7. Inversion $S_{1}^{\perp}$.
Figure 5.8. The base quadruple.

5.4. Super-Apollonian group

Graham, Lagarias, Mallows, Wilks and Yan defined the super-Apollonian group by adding four circle inversions $S_{i}^{\perp}$ to the Apollonian group, one for inverting into each of the four circles of the Descartes quadruple (Figure 5.7). The matrix for $S_{i}^{\perp}$ is the transpose of the matrix $S_{i}$, which is a consequence of a type of duality [GLM+05]. The presentation is

\[ \langle S_{1},S_{2},S_{3},S_{4},S_{1}^{\perp},S_{2}^{\perp},S_{3}^{\perp},S_{4}^{\perp}:S_{i}^{2}=(S_{i}^{\perp})^{2}=1,\ S_{j}S_{k}^{\perp}=S_{k}^{\perp}S_{j},\ j\neq k\rangle. \]

This has finite index in $O_{Q}(\mathbb{Z})$, so it is no longer thin. The words of length $5$ taken in normal form (eliminating $S_{i}^{2}$, $(S_{i}^{\perp})^{2}$, $S_{i}^{\perp}S_{j}$) are shown in Figure 5.9. In fact, the full orbit coincides with Schmidt’s subdivision shown in Figure 5.2. This gives a little bit of perspective on the way in which the Apollonian circle packings form an essential ‘piece’ of Schmidt’s vision of $\widehat{\mathbb{C}}$.
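The relations in this presentation can be verified directly on the matrices, taking $S_{i}^{\perp}$ to be the transpose of $S_{i}$ as stated above. The sketch below builds $S_{i}$ as the identity matrix with its $i$-th column replaced by $2$'s and a $-1$ on the diagonal (this reproduces the matrices of the previous subsection); the helper names are ours.

```python
def generator(i):
    """S_{i+1}: identity with column i replaced by 2's and -1 on the diagonal."""
    m = [[int(r == c) for c in range(4)] for r in range(4)]
    for r in range(4):
        m[r][i] = -1 if r == i else 2
    return m


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


IDENTITY = [[int(r == c) for c in range(4)] for r in range(4)]
S = [generator(i) for i in range(4)]
S_perp = [transpose(s) for s in S]

# Involutions: S_i^2 = (S_i^perp)^2 = 1.
involutions_ok = all(matmul(m, m) == IDENTITY for m in S + S_perp)
# Commutation: S_j S_k^perp = S_k^perp S_j whenever j != k.
commute_ok = all(matmul(S[j], S_perp[k]) == matmul(S_perp[k], S[j])
                 for j in range(4) for k in range(4) if j != k)
# For j = k the two generators do not commute.
same_index_commute = matmul(S[0], S_perp[0]) == matmul(S_perp[0], S[0])
```

This only confirms that the listed relations hold for these matrices; that there are no further relations is the content of the cited result and is not checked here.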

Figure 5.9. Circles obtained from the base quadruple of Figure 5.8 using words of length six in normal form from the super-Apollonian group. This illustrates the invariant measure associated to super-Apollonian continued fractions. Image: Robert Hines; see also [CFHS19, Figure 7].

Using this super-Apollonian perspective, one can define continued fractions for the complex plane, for approximating elements of $\mathbb{C}$ by Gaussian rationals [CFHS19]; this is distinct from, but related to, Schmidt’s method [Sch75].

5.5. Geometric Apollonian group

The Apollonian group has another incarnation, sometimes called the geometric Apollonian group (as distinct from its algebraic version above).

From this perspective, we view the strip packing as the orbit of the base quadruple shown in Figure 5.8, under a group of Möbius transformations called the geometric Apollonian group. In fact, these transformations can be taken to have entries from $\mathbb{Z}[i]$, the Gaussian integers, so that we obtain a group $\mathcal{A}^{geo}<\operatorname{PSL}_{2}(\mathbb{Z}[i])\rtimes\langle\tau\rangle$, where $\tau$ is complex conjugation. This larger group of conformal maps on $\widehat{\mathbb{C}}$ is sometimes called the generalized Möbius transformations [GLM+05]. One set of generators that suffices is [Sta18a, equation (8)]

\[ z\mapsto\frac{(2i+1)\overline{z}-2}{2\overline{z}+2i-1},\quad z\mapsto-\overline{z}+2,\quad z\mapsto\frac{\overline{z}}{2\overline{z}-1},\quad z\mapsto-\overline{z}. \]

This generates the strip packing (Figure 5.12), although one must then scale by a factor of $2$ to obtain a primitive integral packing (note that the base quadruple in Figure 5.8 has curvatures $0,0,2,2$). All other Apollonian packings are images of the strip packing under some Möbius transformation.
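As a quick numerical sanity check, each of the four displayed maps is an orientation-reversing involution, as one expects of a reflection in a circle or line. The sketch below uses only the four formulas above; the names `g1`, ..., `g4` and the sample points are ours, and the fixed-point checks are spot checks at single points rather than proofs.

```python
def g1(z):
    zb = z.conjugate()
    return ((2j + 1) * zb - 2) / (2 * zb + 2j - 1)


def g2(z):
    return -z.conjugate() + 2


def g3(z):
    zb = z.conjugate()
    return zb / (2 * zb - 1)


def g4(z):
    return -z.conjugate()


generators = [g1, g2, g3, g4]

# Applying any generator twice should return the starting point.
z0 = 0.3 + 0.7j
involution_errors = [abs(g(g(z0)) - z0) for g in generators]

# Sample fixed points: (1 + i)/2 for g1 and g3, 1 + 3i for g2, 5i for g4.
fixed_point_errors = [
    abs(g1(0.5 + 0.5j) - (0.5 + 0.5j)),
    abs(g2(1 + 3j) - (1 + 3j)),
    abs(g3(0.5 + 0.5j) - (0.5 + 0.5j)),
    abs(g4(5j) - 5j),
]
```

The fixed points of $z\mapsto-\overline{z}$ and $z\mapsto-\overline{z}+2$ are the vertical lines $\Re z=0$ and $\Re z=1$, which pass through tangency points of the base quadruple of Figure 5.8.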

Exercise 5.1.

Using the fact that the Möbius transformations act triply transitively on $\widehat{\mathbb{C}}$, show that any two Apollonian packings are related by a Möbius transformation.

To relate the algebraic and geometric Apollonian groups, one can use the exceptional isomorphism

\[ \operatorname{PGL}_{2}(\mathbb{C})\rightarrow\operatorname{SO}^{+}_{1,3}(\mathbb{R}). \tag{10} \]

For now, we leave aside the details (which are finicky). This is sometimes discussed in the language of the spin homomorphism $\operatorname{SL}_{2}(\mathbb{C})\rightarrow\operatorname{SO}_{1,3}(\mathbb{R})$ or spin double cover. See [Fuc11, GLM+05].

The group $\operatorname{PGL}_{2}(\mathbb{C})$ can be identified with $\operatorname{Isom}(\mathbb{H}^{3}_{U})$, the isometries of the hyperbolic 3-space realized as the upper half space whose boundary is $\widehat{\mathbb{C}}$. Under this interpretation, $\mathcal{A}^{geo}$ is a Kleinian group (that is, a discrete subgroup of $\operatorname{PGL}_{2}(\mathbb{C})$), with an infinite volume quotient hyperbolic $3$-manifold $\mathcal{A}^{geo}\backslash\mathbb{H}^{3}_{U}$. It is geometrically finite (in this context, this means the fundamental domain can be taken to have finitely many sides).

5.6. Limit set and circle growth

Viewing an Apollonian packing $\mathcal{P}$ as a subset of $\widehat{\mathbb{C}}$, we define the residual set $\Lambda(\mathcal{P})$ as the closure of the union of the circles of the packing. There are countably many circles and countably many tangency points, but there are uncountably many points in $\Lambda(\mathcal{P})$ which are added in the closure process. The complement of $\Lambda(\mathcal{P})$ consists of the interiors of all the circles.

For $\mathcal{P}$ the strip packing, $\Lambda(\mathcal{P})$ is equal to the limit set of $\mathcal{A}^{geo}$, i.e., the set of accumulation points for the orbit of a point under $\mathcal{A}^{geo}$. A similar statement holds for other packings, taking an appropriate conjugate of $\mathcal{A}^{geo}$.

Theorem 5.2 ([McM98]).

The Hausdorff dimension of $\Lambda(\mathcal{P})$ is $\alpha:=1.30568\ldots$.

This constant, for which no closed form is known, was recently rigorously computed to an impressive 128 digits by Vytnova and Wormell [VW24].

Exercise 5.3.

Let $\Lambda^{*}(\mathcal{P})$ temporarily denote the complement of the interiors of the circles of $\mathcal{P}$. Hirst [Hir67] showed this has Hausdorff dimension less than $2$, hence measure zero. Use this to show that $\Lambda^{*}(\mathcal{P})=\Lambda(\mathcal{P})$.

The Hausdorff dimension controls the growth of the number of circles in the packing in terms of size:

Theorem 5.4 ([KO11]).

Let $N_{\mathcal{P}}(X)=\#\{C\in\mathcal{P}:\operatorname{curv}(C)<X\}$. Then

\[ N_{\mathcal{P}}(X)\sim c_{\mathcal{P}}X^{\alpha}. \]

An error term and other refinements came afterward [LO13, OS12]. The constant $c_{\mathcal{P}}$ is called the Apollonian constant for the packing $\mathcal{P}$ and a formula is given in [Vin14, Remark 2.9].

For further details in this direction, consult [GLM+05, Theorem 4.1], [Oh14] and [Oh10].

5.7. Quadratic forms

The geometric Apollonian group has as a subgroup the $2$-congruence subgroup, the kernel of reduction modulo $2$ on $\operatorname{PSL}_{2}(\mathbb{Z})$:

\[ \Gamma(2):=\{M\in\operatorname{PSL}_{2}(\mathbb{Z}):M\equiv I\pmod{2}\}\subseteq\operatorname{PSL}_{2}(\mathbb{Z}). \]

This is an arithmetic group (much more is known for arithmetic groups than for thin groups), and is a Fuchsian group, that is, a discrete subgroup of $\operatorname{PSL}_{2}(\mathbb{R})$. The group $\Gamma(2)$ is the subgroup of $\mathcal{A}^{geo}$ stabilizing $\widehat{\mathbb{R}}$.

Take the three circles of the base quadruple tangent to $\widehat{\mathbb{R}}$ (Figure 5.8): those of radius $1/2$ centred on $i/2$ and $1+i/2$, and the circle $i+\widehat{\mathbb{R}}$, i.e. the horizontal line through $i$ (this is indeed tangent to $\widehat{\mathbb{R}}$ at $\infty$; exercise). The group $\Gamma(2)$ preserves tangencies, and so the orbit of the base quadruple under $\Gamma(2)$ gives a family of circles tangent to $\widehat{\mathbb{R}}$. This is the family of the Ford circles: the circles tangent to $\widehat{\mathbb{R}}$ at each rational number $p/q\in\mathbb{Q}$ (in lowest form) of radius $1/(2q^{2})$ (plus the horizontal line through $i$). See Figure 5.10. Their curvatures, as a family, are exactly twice the perfect squares.
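The Ford circles are easy to experiment with in exact arithmetic. The following sketch (function names ours) builds the circle at $p/q$ with radius $1/(2q^{2})$ and tests external tangency by comparing the squared distance between centres with the squared sum of radii; a short computation shows that two Ford circles at $p/q$ and $r/s$ are tangent exactly when $|ps-qr|=1$.

```python
from fractions import Fraction
from math import gcd


def ford_circle(p, q):
    """(centre_x, centre_y, radius) of the Ford circle at p/q, q > 0, gcd(p, q) = 1."""
    if q <= 0 or gcd(p, q) != 1:
        raise ValueError("p/q must be in lowest terms with q > 0")
    r = Fraction(1, 2 * q * q)
    return Fraction(p, q), r, r  # centre height equals radius: tangent to the real line


def are_tangent(c1, c2):
    """External tangency: distance between centres equals the sum of the radii."""
    (x1, y1, r1), (x2, y2, r2) = c1, c2
    return (x1 - x2) ** 2 + (y1 - y2) ** 2 == (r1 + r2) ** 2


def curvature(circle):
    return 1 / circle[2]


neighbours = are_tangent(ford_circle(1, 2), ford_circle(1, 3))      # |1*3 - 2*1| = 1
also_neighbours = are_tangent(ford_circle(0, 1), ford_circle(1, 2))  # |0*2 - 1*1| = 1
not_neighbours = are_tangent(ford_circle(1, 3), ford_circle(2, 3))   # |1*3 - 3*2| = 3
```

The curvature of the circle at $p/q$ is $2q^{2}$, for example $98$ at $3/7$.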

Exercise 5.5.

Prove this statement (that the Ford circles are the indicated orbit).

Figure 5.10. The Ford circles.
Figure 5.11. A mother circle (orange) and the family of circles tangent to it (blue). Other circles in the packing are green.

More generally, fix a circle $\mathcal{C}$, which we will call the mother circle. Letting $M\cdot\widehat{\mathbb{R}}=\mathcal{C}$, the subgroup $M\Gamma(2)M^{-1}$ is the stabilizer of $\mathcal{C}$ in $M\mathcal{A}^{geo}M^{-1}$ and gives rise to a collection of circles tangent to $\mathcal{C}$ (see Figure 5.11) as the image of the Ford circles under $M$. The curvatures of this family of circles are exactly the primitively represented values of a translated quadratic form. Filling out the details of this relationship proves the following theorem, first observed in this form by Sarnak [Sar], but present in another form in [GLM+03].

Theorem 5.6.

Let $\mathcal{C}$ be a circle of curvature $c$ within an Apollonian circle packing $\mathcal{P}\subseteq\widehat{\mathbb{C}}$. Then there is a real binary quadratic form $f_{\mathcal{C}}(x,y)$ of discriminant $-4c^{2}$ such that the set of curvatures of circles tangent to $\mathcal{C}$ within $\mathcal{P}$ is the set

\[ \{f_{\mathcal{C}}(x,y)-c:x,y\in\mathbb{Z},\ \gcd(x,y)=1\}. \]
Proof.

The original observation was derived as a consequence of Descartes’ relation. Here we give a proof using $\mathcal{A}^{geo}$, beginning with the strip packing. Recall that $\mathcal{A}^{geo}$ generates the strip packing from the base quadruple $(0,0,2,2)$ of Figure 5.8. Consider the circle which is the horizontal line through $i$. The corresponding Möbius transformation from $\widehat{\mathbb{R}}$ is $\begin{pmatrix}1&i\\ 0&1\end{pmatrix}$. If we post-compose by $\Gamma(2)$ (which, we recall, acts to permute the circles tangent to $\widehat{\mathbb{R}}$), we have a coset

\[ \Gamma(2)\begin{pmatrix}1&i\\ 0&1\end{pmatrix} \]

of Möbius transformations representing the circles tangent to $\widehat{\mathbb{R}}$. However, these are all oriented opposite to how they should be (having negative curvatures), so we must post-compose by, say, $\begin{pmatrix}0&1\\ 1&0\end{pmatrix}$.

Now more generally, consider a packing that is the image of the strip packing under some $M\in\operatorname{PSL}_{2}(\mathbb{C})$, where we wish to parametrize the family of circles tangent to $\mathcal{C}=M\cdot\widehat{\mathbb{R}}$. Then this family is given by:

\[ M\begin{pmatrix}0&1\\ 1&0\end{pmatrix}\Gamma(2)\begin{pmatrix}1&i\\ 0&1\end{pmatrix}. \]

Now we apply Proposition 3.13 to compute the curvatures of this family.

Taking an element $\begin{pmatrix}x&r\\ y&s\end{pmatrix}\in\Gamma(2)$, we obtain

\begin{align*}
2\Im\left(\overline{(x\delta+y\gamma)}\left((r\delta+s\gamma)+i(x\delta+y\gamma)\right)\right) &=2\Im\left(\overline{(x\delta+y\gamma)}(r\delta+s\gamma)\right)+2\Im\left(\overline{(x\delta+y\gamma)}\,i(x\delta+y\gamma)\right)\\
&=-2\Im(\overline{\gamma}\delta)+2N(x\delta+y\gamma).
\end{align*}

The first term is the curvature of $\mathcal{C}$. The second is a quadratic form in integral variables $x,y$. ∎

Exercise 5.7.

Using the proof above, recover the fact that the Ford circles have curvatures $2x^{2}$.

Exercise 5.8.

Prove the theorem from the Descartes relation.

The proof hints at how to compute the form $f_{\mathcal{C}}$. Fix a Descartes quadruple $\mathcal{C}_{1},\mathcal{C}_{2},\mathcal{C}_{3},\mathcal{C}_{4}$ containing $\mathcal{C}_{1}=\mathcal{C}$, the mother circle. Write $[n,a,b,c]$ for the quadruple of curvatures in the same order. Choose $M$ to take $\infty$, $0$, $1$ to $a,b,c$. Then, the form is

\[ f_{\mathcal{C}}(x,y)=(n+a)x^{2}+(n+a+b-c)xy+(n+b)y^{2}; \]

because then we recover the curvatures $a,b,c$ from $f_{\mathcal{C}}(x,y)-n$ where $(x,y)=(1,0),(0,1),(1,-1)$ respectively (with the sign of the cross term as displayed, the value at $(x,y)=(1,1)$ is instead the other Soddy curvature $2(n+a+b)-c$). Notice that a different choice of quadruple including $\mathcal{C}_{1}$ corresponds to a different $M$ and a change of variables on $f_{\mathcal{C}}$ within the $\operatorname{PGL}_{2}(\mathbb{Z})$-equivalence class. The following strong statement encompasses this.
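A direct computation with the displayed formula is a useful check on sign conventions. The sketch below (helper names ours, sample quadruple $[n,a,b,c]=[-1,2,3,6]$ chosen by us) evaluates $f_{\mathcal{C}}(x,y)-n$ with the form exactly as written above. With that sign of the cross term, the values at $(1,0)$ and $(0,1)$ are $a$ and $b$, the value at $(1,-1)$ is $c$, and the value at $(1,1)$ is the other Soddy curvature $2(n+a+b)-c$.

```python
from math import gcd


def tangent_form(n, a, b, c):
    """Coefficients (A, B, C) of f(x, y) = A x^2 + B xy + C y^2 as displayed."""
    return n + a, n + a + b - c, n + b


def value(form, x, y):
    A, B, C = form
    return A * x * x + B * x * y + C * y * y


n, a, b, c = -1, 2, 3, 6          # a Descartes quadruple: Q(-1, 2, 3, 6) = 0
form = tangent_form(n, a, b, c)   # x^2 - 2xy + 2y^2
A, B, C = form
discriminant = B * B - 4 * A * C

# Curvatures of circles tangent to the mother circle, from small coprime (x, y).
tangent_curvatures = sorted({
    value(form, x, y) - n
    for x in range(-3, 4) for y in range(-3, 4)
    if gcd(x, y) == 1
})
```

Here $f_{\mathcal{C}}(x,y)-n=(x-y)^{2}+y^{2}+1$, so the circles tangent to the outer circle of curvature $-1$ have curvatures $2,3,6,11,14,\ldots$, one more than a sum of two coprime squares.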

Proposition 5.9 ( [GLM+03, Theorem 4.2], [Ric24, Proposition 3.1.2] ).

Let $n\in\mathbb{R}$. The quadruples $[n,a,b,c]\in\mathbb{R}^{4}$ satisfying the Descartes quadratic form (8) biject with the set of positive semi-definite (this means not taking negative values) binary quadratic forms $Ax^{2}+Bxy+Cy^{2}$ of discriminant $-4n^{2}$. The map is

\[ \phi:[n,a,b,c]\mapsto(n+a)x^{2}+(n+a+b-c)xy+(n+b)y^{2}. \]

Furthermore, if we identify quadruples $[n,a,b,c]$ under the action of $S_{2}$, $S_{3}$, $S_{4}$ and under permutation of the last three entries, then equivalence classes of Descartes quadruples $[n,a,b,c]$ are in bijection with $\operatorname{GL}_{2}(\mathbb{Z})$-equivalence classes of positive semi-definite binary quadratic forms of discriminant $-4n^{2}$.

We have seen that these forms are a reflection of the Fuchsian group $\Gamma(2)$ and its conjugates inside the Apollonian group. These form one of the main tools used to study Apollonian circle packings (more generally, these are a special feature of certain thin Kleinian groups that cause them to behave similarly to the Apollonian case [FSZ19]).

5.7.1. The space of Descartes quadruples over \mathbb{R}

Proposition 5.9 allows us to map Descartes quadruples into the upper half plane $\mathbb{H}^{2}_{U}$, using the correspondence $Ax^{2}+Bxy+Cy^{2}\mapsto\frac{-B\pm\sqrt{B^{2}-4AC}}{2A}$ as studied in Section 3.3. Thus, $\mathbb{H}^{2}_{U}$ is the space of pairs $(\mathcal{C},A)$ where $\mathcal{C}$ is a circle and $A$ an Apollonian circle packing containing it, where these pairs are identified under affine transformations (i.e. we only think of the geometry of the packing, not its position in space).

Figure 5.12. The various types of packings, clockwise from upper left: bounded, strip, half-plane, full-plane. Images: James Rickards.
Figure 5.13. The upper half plane as a parameter space of Descartes quadruples. That’s right, the parameter space of Apollonian circle packings, marked by type, is itself an Apollonian circle packing! In this image, the interior of each circle contains bounded packings, and they are labelled by their depth element, i.e., the elements of the Apollonian group which must be applied to obtain the ‘nearest’ quadruple involving the outer bounding circle. Image: [Ric24, Figure 6].

This raises an interesting question: can we explore this parameter space? Since we are in the context of real curvatures (not necessarily integral or rational), the packing generated by a quadruple has more variety than we have so far discussed: there are bounded packings, half-plane packings, and full-plane packings (see Figure 5.12).

It is natural to ask which type of packing one obtains – for example, what is the locus, in the upper half plane, of the bounded packings? The answer is striking: colouring the parameter space according to the type of packing results in the strip packing itself in the upper half plane! Let $\mathbb{H}^{2}_{U}$ denote the upper half plane considered as a parameter space of quadruples, and let $\mathcal{P}$ be the strip packing inside $\mathbb{H}^{2}_{U}$. Then:

  1. the bounded packings occur exactly in the interiors of the circles of $\mathcal{P}$;

  2. the strip packing occurs at each tangency point of $\mathcal{P}$;

  3. the half-plane packings occur on the circles of $\mathcal{P}$, away from tangency points;

  4. the full-plane packings occur at all remaining points, which are the points in $\Lambda(\mathcal{P})\smallsetminus\mathcal{P}$.

In Figure 5.13, we label the circles of 𝒫\mathcal{P} by their depth: the number of Apollonian swaps required to reach the outer circle. This is explained in [Ric24]; see also variations on this parameter space in [Hol21] and [Koc20].

6. Apollonian circle packings: number theory aspects

6.1. What is known about curvatures?

Which integer curvatures appear as curvatures in a primitive integral Apollonian circle packing? It is evident that not all curvatures can appear, since the packing doesn’t have enough geometric ‘room’ to fit all the small curvatures. However, once we start collating the larger curvatures, we begin to see many repeats. So it may be reasonable to believe that we eventually begin to see every sufficiently large curvature.

Figure 6.1. An Apollonian packing coloured by residue modulo $3$ and $5$, respectively.

This is not true, however; see Figure 6.1. Graham, Lagarias, Mallows, Wilks and Yan observed a congruence restriction on curvatures [GLM+03]. Precisely, they observed that certain residue classes modulo $24$ are entirely avoided by the set of curvatures in a primitive integral Apollonian circle packing. Specifically, when reducing the curvatures of an Apollonian packing modulo $24$, one obtains one of the following six possible sets of admissible curvatures (for a complete proof, see [HKRS24, Proposition 2.1]):

type        residues
$(6,1)$     $0, 1, 4, 9, 12, 16$
$(6,5)$     $0, 5, 8, 12, 20, 21$
$(6,13)$    $0, 4, 12, 13, 16, 21$
$(6,17)$    $0, 8, 9, 12, 17, 20$
$(8,7)$     $3, 6, 7, 10, 15, 18, 19, 22$
$(8,11)$    $2, 3, 6, 11, 14, 15, 18, 23$

Each admissible set is assigned a type for reference, which is $(n,k)$ where $n$ is the number of residues and $k$ is the smallest residue coprime to $24$.
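The table can be checked experimentally. The following is a minimal sketch, assuming the standard swap rule that replaces a curvature $a$ in a Descartes quadruple $(a,b,c,d)$ by $2(b+c+d)-a$, and assuming the starting quadruple is a root quadruple (so that curvatures grow as we move away from it). The function name `curvatures_up_to` is hypothetical, chosen for illustration only; the root $(-1,2,2,3)$ is the packing that appears in Figure 6.3.

```python
def curvatures_up_to(root, bound):
    """Curvatures (with multiplicity) of all circles of curvature <= bound.

    Assumes `root` is a root Descartes quadruple, so that every swap other
    than the one just performed produces a strictly larger curvature.
    """
    found = list(root)
    stack = [(tuple(root), None)]  # (quadruple, index of the last swap)
    while stack:
        quad, last = stack.pop()
        for i in range(4):
            if i == last:  # repeating the same swap only undoes it
                continue
            new = 2 * (sum(quad) - quad[i]) - quad[i]
            if new <= bound:
                found.append(new)
                stack.append((quad[:i] + (new,) + quad[i + 1:], i))
    return found


curvatures = curvatures_up_to((-1, 2, 2, 3), 500)
residues = sorted({c % 24 for c in curvatures})
print(residues)  # [2, 3, 6, 11, 14, 15, 18, 23]
```

The residues found are exactly the row of type $(8,11)$ in the table.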

To understand the local (i.e., congruence) obstructions that may appear for a modulus $n$, one can label the Cayley graph by Descartes quadruples (beginning from the root labelled with any fixed quadruple of the packing), and reduce modulo $n$, identifying vertices with identical quadruples modulo $n$. We obtain a picture such as that in Figure 6.2. For some moduli, the reduced Cayley graph is small, and some residue classes are missed amongst the curvatures (as in the figure). This proves that the corresponding residue class can never occur as the residue of a curvature in the Apollonian circle packing.
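The reduction described above can be sketched as a graph search on quadruples modulo $n$. This is only an illustration under stated assumptions: the swap rule $a\mapsto 2(b+c+d)-a$ is applied coordinate-wise modulo $n$, the function name `orbit_mod` is hypothetical, and the packing $(-1,2,2,3)$ is chosen for convenience (it need not be the packing drawn in Figure 6.2).

```python
def orbit_mod(root, n):
    """Vertices of the reduced Cayley graph: ordered quadruples modulo n."""
    start = tuple(c % n for c in root)
    seen = {start}
    queue = [start]
    while queue:
        quad = queue.pop()
        for i in range(4):
            new = (2 * (sum(quad) - quad[i]) - quad[i]) % n
            nxt = quad[:i] + (new,) + quad[i + 1:]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


vertices = orbit_mod((-1, 2, 2, 3), 8)
hit = sorted({c for quad in vertices for c in quad})
missed = [r for r in range(8) if r not in hit]
print(hit, missed)  # [2, 3, 6, 7] [0, 1, 4, 5]
```

Because the search exhausts a finite graph, the missed residues are provably absent from the packing, not merely unobserved.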

Figure 6.2. An Apollonian packing reduced modulo $8$.

Graham, Lagarias, Mallows, Wilks and Yan conjectured that this obstruction is essentially all we expect, giving the set of curvatures a positive density in the natural numbers [GLM+03, Positive Density Conjecture]; and that maybe even only finitely many exceptions beyond the congruence obstructions occur [GLM+03, Strong Density Conjecture]. This came to be known as the local-to-global conjecture for Apollonian circle packings, later refined by Fuchs-Sanden [FS11], who collected data on several packings, computing curvatures up to size $5\times 10^{8}$. In particular, they computed the multiplicity of appearance for curvatures. An example of the results is shown in Figure 6.3, and this indicates that, for most admissible curvatures, the expected multiplicity grows with the curvature size.

Figure 6.3. On the $x$-axis, the number of times a curvature appears. On the $y$-axis, the number of curvatures with that multiplicity of appearance. The graph concerns curvatures which are $11~(\textup{mod}~24)$ in the packing generated by the quadruple $(-1,2,2,3)$, in the range $10^{6}$ to $10^{8}$. Image: [FS11, Figure 10].

If we write $\mathcal{K}(N):=\#\{n<N:n\text{ is a curvature in }\mathcal{P}\}$, then the conjecture implies that

\[\mathcal{K}(N)=cN+O(1), \tag{11}\]

where $c=\frac{\#\text{ admissible curvatures modulo }24}{24}$.

Over time, lower bounds on the number of distinct integer curvatures which appear in a packing have gradually improved. In the original paper of Graham, Lagarias, Mallows, Wilks and Yan, it was shown that $\mathcal{K}(N)\gg\sqrt{N}$. Sarnak was able to show that $\mathcal{K}(N)\gg\frac{N}{\sqrt{\log N}}$ [Sar]. Positive density (meaning a positive proportion of integers) was shown by Bourgain and Fuchs: $\mathcal{K}(N)\gg N$ [BF11]. This was improved to density one using the Hardy-Littlewood circle method by Bourgain and Kontorovich: $\exists\eta>0,\;\mathcal{K}(N)=cN+O(N^{1-\eta})$, where $\eta$ is effectively computable [BK14]. Finally, this result was extended to other packings by Fuchs, Stange and Zhang [FSZ19].

However, it was recently discovered [HKRS24] that most Apollonian circle packings are subject to some additional, more subtle restrictions on their curvatures: the local-to-global conjecture is false! The source of these new obstructions is quadratic reciprocity. Recall from Definition 2.6 that the Legendre symbol for an integer $a$ and prime $p$ is defined as:

\[\left(\frac{a}{p}\right)=\left\{\begin{array}{ll}1&a\text{ is a non-zero square modulo }p\\ -1&a\text{ is not a square modulo }p\\ 0&a\text{ is zero modulo }p\end{array}\right.\]

This is multiplicative in the numerator:

\[\left(\frac{ab}{p}\right)=\left(\frac{a}{p}\right)\left(\frac{b}{p}\right).\]

More generally, the Jacobi symbol extends the Legendre symbol multiplicatively for odd positive denominators:

\[\left(\frac{a}{p_{1}p_{2}}\right)=\left(\frac{a}{p_{1}}\right)\left(\frac{a}{p_{2}}\right).\]

One can extend even further to the Kronecker symbol defined for all integers. Then, quadratic reciprocity says that for two odd primes $p$ and $q$, there is a symmetry between the behaviour of $p$ modulo $q$ and $q$ modulo $p$:

\[\left(\frac{p}{q}\right)=(-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}}\left(\frac{q}{p}\right).\]

There are special rules for $2$ and $-1$.
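For experimentation, the Jacobi symbol can be computed without factoring by combining reciprocity with the special rule for $2$. The following is a minimal sketch of the standard binary algorithm; the function name `jacobi` is ours, and only odd positive denominators are handled (the Kronecker extension is not attempted here).

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for an odd positive integer n."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be odd and positive")
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:  # pull out factors of 2 using the rule for (2/n)
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a  # reciprocity: flip the symbol
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


print(jacobi(8, 5))  # -1
```

For a prime denominator $p$ the value agrees with Euler's criterion $a^{(p-1)/2} \bmod p$, which gives an independent check.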

To demonstrate where these new reciprocity obstructions arise from, we prove a single example.

Theorem 6.1 ([HKRS24, Theorem 1.9]).

The Apollonian circle packing generated by the quadruple $(-3,5,8,8)$ has no square curvatures.

Proof.

This packing has the property that all curvatures are $0$ or $1~(\textup{mod}~4)$. Let $\mathcal{C}$ be a circle of curvature $n$. By Theorem 5.6, the circles tangent to $\mathcal{C}$ have curvatures arising as the primitively represented values of a translated quadratic form $f_{\mathcal{C}}(x,y)-n$ of discriminant $-4n^{2}$. Therefore the form $f_{\mathcal{C}}(x,y)$ becomes degenerate modulo $n$, being equivalent to $Ax^{2}$ for some coefficient $A$ after a change of variables. We see then that the invertible values of $f_{\mathcal{C}}$ lie in one multiplicative coset of the squares (see [HKRS24, Proposition 4.1]). Thus we define $\chi_{2}(\mathcal{C})$ to be the unique non-zero value of the Kronecker symbol $\left(\frac{c}{n}\right)$, where $c$ ranges over the curvatures of circles tangent to $\mathcal{C}$.

Now suppose two circles $\mathcal{C}_{1}$ and $\mathcal{C}_{2}$ are tangent in the packing, having coprime curvatures $a$ and $b$, respectively. Then, by quadratic reciprocity,

\[\chi_{2}(\mathcal{C}_{1})\chi_{2}(\mathcal{C}_{2})=\left(\dfrac{a}{b}\right)\left(\dfrac{b}{a}\right)=1.\]

So $\chi_{2}(\mathcal{C}_{1})=\chi_{2}(\mathcal{C}_{2})$.

By [HKRS24, Corollary 4.7], any two circles are connected by a path of consecutively pairwise coprime curvatures, and so $\chi_{2}(\mathcal{C})$ is constant across the entire packing. It remains to compute this value using one pair of circles, say in the root quadruple:

\[\chi_{2}(\mathcal{P})=\left(\dfrac{8}{5}\right)=-1.\]

If there did exist a circle $\mathcal{C}$ of square curvature in $\mathcal{P}$, it would give $\chi_{2}(\mathcal{P})=1$, a contradiction. ∎
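The statement of Theorem 6.1 is easy to test numerically in a finite range. The sketch below is an illustration only, not part of the proof: it assumes the standard swap rule $a\mapsto 2(b+c+d)-a$ and that $(-3,5,8,8)$ is a root quadruple, and the function name `curvatures_up_to` and the bound $5000$ are arbitrary choices made here.

```python
from math import isqrt


def curvatures_up_to(root, bound):
    """Curvatures (with multiplicity) of all circles of curvature <= bound."""
    found = list(root)
    stack = [(tuple(root), None)]  # (quadruple, index of the last swap)
    while stack:
        quad, last = stack.pop()
        for i in range(4):
            if i == last:  # repeating the same swap only undoes it
                continue
            new = 2 * (sum(quad) - quad[i]) - quad[i]
            if new <= bound:
                found.append(new)
                stack.append((quad[:i] + (new,) + quad[i + 1:], i))
    return found


curvatures = curvatures_up_to((-3, 5, 8, 8), 5000)
squares = sorted({c for c in curvatures if c >= 0 and isqrt(c) ** 2 == c})
classes_mod_4 = sorted({c % 4 for c in curvatures})
print(squares, classes_mod_4)  # [] [0, 1]
```

No square appears, even though squares are allowed by the condition that curvatures are $0$ or $1$ modulo $4$.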

Exercise 6.2.

Prove that any two circles in an integral Apollonian circle packing are connected by a path of consecutively coprime curvatures.

In general, we define two functions $\chi_{2}$ and $\chi_{4}$ on the set of Descartes quadruples, or, if you prefer, on the set of pairs $(\mathcal{C},\mathcal{P})$ of a circle in a packing. These should be thought of in the vein of characters or Legendre symbols; in particular, they take values $\chi_{2}\in\{\pm 1\}$ and $\chi_{4}\in\{\pm 1,\pm i\}$. To define them in general is a little complicated, but we can explain $\chi_{2}(\mathcal{C})$ fairly simply in the case of a packing of type $(6,*)$ as the Legendre symbol $\left(\frac{a}{b}\right)$ where $b$ is the curvature of $\mathcal{C}$, and $a$ is a coprime curvature tangent to $\mathcal{C}$, and $a$ and $b$ are both coprime to $6$. The symbol $\chi_{4}$ is only relevant, hence only defined, for types $(6,1)$ and $(6,17)$.

Then, it is a consequence of quadratic reciprocity that the values of $\chi_{2}$ are well-defined for a packing $\mathcal{P}$ independent of the circle $\mathcal{C}$. Hence we can write $\chi_{2}(\mathcal{P})$. Similarly, $\chi_{4}(\mathcal{P})$ is well-defined because of quartic reciprocity.

Theorem 6.3 (Haag-Kertzer-Rickards-Stange [HKRS24]).

Let $\mathcal{P}$ be a primitive integral Apollonian circle packing. There is an explicit list of obstructions (families of missing curvatures) of the form $\{ux^{2}:x\in\mathbb{Z}\}$ for $u\mid 6$ or $\{ux^{4}:x\in\mathbb{Z}\}$ for $u\mid 36$ that are missed by the list of curvatures in $\mathcal{P}$. The list is determined entirely by the set of admissible curvatures, and by the value(s) $\chi_{2}(\mathcal{P})$ and $\chi_{4}(\mathcal{P})$, if defined.

The full list is given here, where the type is now extended to be of the form $(n,k,\chi_{2})$ or $(n,k,\chi_{2},\chi_{4})$:

type              quadratic obstructions          quartic obstructions
$(6,1,1,1)$       none                            none
$(6,1,1,-1)$      none                            $n^{4},4n^{4},9n^{4},36n^{4}$
$(6,1,-1)$        $n^{2},2n^{2},3n^{2},6n^{2}$    none
$(6,5,1)$         $2n^{2},3n^{2}$                 none
$(6,5,-1)$        $n^{2},6n^{2}$                  none
$(6,13,1)$        $2n^{2},6n^{2}$                 none
$(6,13,-1)$       $n^{2},3n^{2}$                  none
$(6,17,1,1)$      $3n^{2},6n^{2}$                 $9n^{4},36n^{4}$
$(6,17,1,-1)$     $3n^{2},6n^{2}$                 $n^{4},4n^{4}$
$(6,17,-1)$       $n^{2},2n^{2}$                  none
$(8,7,1)$         $3n^{2},6n^{2}$                 none
$(8,7,-1)$        $2n^{2}$                        none
$(8,11,1)$        none                            none
$(8,11,-1)$       $2n^{2},3n^{2},6n^{2}$          none

These families of powers are called reciprocity obstructions and more specifically quadratic obstructions and quartic obstructions. Certain packings have no reciprocity obstructions at all, but many (most, in a suitable sense) do.

We use the term missing for the curvatures which do not appear in a packing but are allowed by the congruence obstructions. Curvatures which are allowed by the known linear (congruence), quadratic, and quartic obstructions but are still nevertheless missing, are called sporadic.

Conjecture 6.4 (Haag-Kertzer-Rickards-Stange [HKRS24]).

Let $\mathcal{P}$ be a primitive integral Apollonian circle packing. Then the sporadic set $S(\mathcal{P})$ is finite.

This actually says that, instead of (11), we are in most cases asserting at best that

\[\mathcal{K}(N)=cN+O(\sqrt{N}). \tag{12}\]

Haag, Kertzer, Rickards and Stange collected data on the missing curvatures up to bounds between $10^{10}$ and $10^{12}$ in a few dozen small packings, and the data supports the conjecture that the sporadic set is finite, as the set peters out in that range and appears to end.

For a nice exposition on Apollonian circle packings and their integer curvatures (before the newest obstructions were found, however), see [Fuc13]. The rest of this section is devoted to some of the key tools that appear in proofs concerning integral packings, particularly lower bounds on the number of distinct curvatures.

6.2. Integral quadratic forms

We have seen in Theorem 5.6 that the circles tangent to a fixed ‘mother’ circle represent the values of a translated quadratic form. It is furthermore the case that when $\mathcal{P}$ is a primitive integral packing, then the form is a primitive integral positive semi-definite form, and the multiplicity with which a curvature $k\in\mathbb{Z}$ appears is exactly the number of primitive solutions $(x,y)$ to $f_{\mathcal{C}}(x,y)-c=k$, where $c$ is the curvature of $\mathcal{C}$. Since quadratic forms are (arguably) well-understood, many of the analytic lower bound results so far mentioned involve collecting ensembles of such forms and restricting their overlap to grow large collections of curvatures.

In the section on Schmidt arrangements that follows, we will see that the images of the strip packing $(0,0,2,2)$ under $\operatorname{PSL}_{2}(\mathbb{Z}[i])$ include all primitive integral packings (all scaled by two); see Theorem 7.3. We will see that up to similarity, they are in bijection with ideal classes of orders of the Gaussian integers.

6.3. Expander graphs

Consider the Cayley graphs of the Apollonian group modulo $p^{m}$. To see how these are useful, we consider the spectral theory of graphs, which is to say, studying the eigenvalues of the adjacency matrix. The graph we are interested in, denoted $\mathcal{A}_{p^{m}}$, is the Cayley graph of $\mathcal{A}^{geo}/p^{m}$ (coefficients of the matrices taken modulo $p^{m}$) using the images of the standard swaps as generators. This is still $4$-regular, just as for the Cayley graph of $\mathcal{A}$. These Cayley graphs, taken as a family, are an expander family, meaning that they are well-connected in a precise sense as $p^{m}$ grows.

We will develop the basics of the spectral theory slightly more generally. Let $\mathcal{G}$ be a $d$-regular graph with vertex set $V$ of size $n$ and edge set $E$. (The spectral theory of non-regular graphs is significantly more complex in various ways.) Let $A$ be the $n\times n$ adjacency matrix of $\mathcal{G}$, whose $ij$-th entry is $1$ when the $i$-th vertex is connected to the $j$-th vertex, and $0$ otherwise. The normalized adjacency matrix $\frac{1}{d}A$ can be thought of as an operator controlling the flow of mass between vertices. If $\mathbf{x}$ is a vector whose entries are indexed by the vertices of $\mathcal{G}$, then $\frac{1}{d}A\mathbf{x}$ has entries

\[\frac{1}{d}\sum_{w\sim v}x_{w},\]

where $v\sim w$ denotes adjacency, so the sum is over the neighbours. Imagine the vector $\mathbf{x}$ denotes a distribution of mass amongst the vertices of $\mathcal{G}$. Then $\frac{1}{d}A\mathbf{x}$ is the mass distribution after each vertex ‘gives away’ its mass uniformly to its neighbours (that is, it sends $1/d$ of its mass to each neighbour), and, consequently, receives $1/d$ of the mass of each of its neighbours. This is a type of Markov chain.

Under this perspective, an eigenvector of this matrix is a distribution of mass which is scaled under such a flow. One such eigenvector is the uniform distribution (the same mass at all vertices), which has eigenvalue $1$. In fact, the eigenvalues $\lambda_{i}$ of this matrix are real, and satisfy

\[1=\lambda_{0}\geq\lambda_{1}\geq\cdots\geq-1.\]

That they are real is a property of symmetric matrices. Moreover, since the adjacency matrix is real, symmetric, non-negative and irreducible, there is an orthonormal basis of eigenvectors.

If $\mathcal{G}$ is connected, then $\lambda_{0}>\lambda_{1}$ (see Exercise 6.7). The size of this spectral gap measures the connectedness of the graph in some sense. One intuition for this is to consider the convergence of the Markov chain toward the uniform distribution. Let $\mathbf{v}_{0},\ldots,\mathbf{v}_{n-1}$ be an orthonormal basis of eigenvectors such that $\frac{1}{d}A\mathbf{v}_{i}=\lambda_{i}\mathbf{v}_{i}$. Let $\mathbf{w}$ be any mass distribution on the graph, and write $\mathbf{w}=\sum_{i}\alpha_{i}\mathbf{v}_{i}$. Then we have

\[\left(\frac{1}{d}A\right)^{k}\mathbf{w}=\alpha_{0}\mathbf{v}_{0}+\sum_{i>0}\lambda_{i}^{k}\alpha_{i}\mathbf{v}_{i}.\]

From this, using that the basis of eigenvectors is orthonormal, and writing $\lambda:=\max_{i>0}|\lambda_{i}|$,

\[\left\|\left(\frac{1}{d}A\right)^{k}\mathbf{w}-\alpha_{0}\mathbf{v}_{0}\right\|_{2}^{2}=\sum_{i>0}|\lambda_{i}|^{2k}|\alpha_{i}|^{2}\leq\lambda^{2k}\|\mathbf{w}\|_{2}^{2}.\]

Thus we see that the rate of convergence to a uniform distribution is controlled by the spectral gap (i.e., by how far $\lambda$ is from $1$).
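To see this convergence concretely, here is a small sketch on the Petersen graph, a $3$-regular graph on $10$ vertices whose adjacency eigenvalues are $3$, $1$ and $-2$, so that every non-trivial eigenvalue of the normalized matrix has absolute value at most $2/3$. The choice of graph and the helper names `petersen` and `step` are ours, for illustration only.

```python
from math import sqrt


def petersen():
    """Neighbour sets of the Petersen graph on vertices 0..9."""
    nbrs = {v: set() for v in range(10)}

    def link(u, v):
        nbrs[u].add(v)
        nbrs[v].add(u)

    for i in range(5):
        link(i, (i + 1) % 5)          # outer pentagon
        link(i, i + 5)                # spokes
        link(i + 5, (i + 2) % 5 + 5)  # inner pentagram
    return nbrs


def step(nbrs, x):
    """Apply the normalized adjacency matrix (1/d) A to the mass vector x."""
    d = len(nbrs[0])
    return [sum(x[w] for w in nbrs[v]) / d for v in range(len(x))]


nbrs = petersen()
x = [1.0] + [0.0] * 9  # all the mass starts at vertex 0
distances = []
for k in range(1, 21):
    x = step(nbrs, x)
    distances.append(sqrt(sum((mass - 0.1) ** 2 for mass in x)))

for k in (1, 5, 10, 20):
    print(k, distances[k - 1], (2 / 3) ** k)
```

The printed distances to the uniform distribution stay below $(2/3)^{k}$, matching the geometric rate predicted by the eigenvalue bound.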

Definition 6.5.

A family of graphs, $\mathcal{G}_{i}$ for $i\geq 1$, is called an expander family of degree $d$ if the following hold:

  (1) the $\mathcal{G}_{i}$ are finite $d$-regular connected graphs with $|\mathcal{G}_{i}|\rightarrow\infty$;
  (2) for each $i$, let $\lambda_{i}$ be the largest eigenvalue in absolute value besides $\pm 1$ of the normalized adjacency matrix of $\mathcal{G}_{i}$; then

    \[\epsilon:=\limsup_{i\rightarrow\infty}\lambda_{i}<1.\]

If the graph is ‘easily cut’ in the sense that removing a small number of edges can disconnect it, then these edges form a bottleneck to rapid convergence. Another measure of the same ‘connectedness’ is given in this language. For any subset $S\subseteq V$, write $\partial S\subseteq E$ for the ‘boundary’ of $S$, i.e. the edges connecting a vertex of $S$ to a vertex of the complement. The Cheeger constant of $\mathcal{G}$ is

\[h(\mathcal{G})=\min_{S\subseteq V,\,0<|S|\leq|V|/2}\frac{|\partial S|}{|S|}.\]

This measures how easy it is to disconnect the graph by removing edges. Then we can replace condition (2) above with the existence of an $\epsilon>0$ such that $h(\mathcal{G}_{i})>\epsilon$ for all $i$.
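For very small graphs the Cheeger constant can be computed directly from the definition by trying every non-empty subset of at most half the vertices. The sketch below does exactly that; it is exponential in the number of vertices and is meant only to make the definition concrete. The function name `cheeger_constant` and the two example graphs are our own choices.

```python
from itertools import combinations


def cheeger_constant(n, edges):
    """Brute-force Cheeger constant of a graph on vertices 0..n-1."""
    best = None
    for size in range(1, n // 2 + 1):
        for subset in combinations(range(n), size):
            s = set(subset)
            # boundary edges have exactly one endpoint in s
            boundary = sum(1 for u, v in edges if (u in s) != (v in s))
            ratio = boundary / size
            if best is None or ratio < best:
                best = ratio
    return best


cycle8 = [(i, (i + 1) % 8) for i in range(8)]
k4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
print(cheeger_constant(8, cycle8))  # 0.5
print(cheeger_constant(4, k4))      # 2.0
```

The cycle is easy to cut (two edges separate half of it), while the complete graph is not; for cycles of growing length the constant tends to $0$, so cycles do not form an expander family.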

It is possible to prove a spectral gap for the Apollonian Cayley graphs $\mathcal{A}_{p^{m}}:=\mathcal{A}^{geo}/p^{m}$ in a combinatorial way, by showing that every vertex can be reached in a bounded number of steps, where the bound is independent of the growing parameter $p^{m}$. The existence of short paths to all points in the graph implies ‘good mixing’ and ‘connectedness’; this is the ‘combinatorial spectral gap’ used in [FSZ19, Section 8].

In the next section we will prove strong approximation, but a byproduct of this proof is the spectral gap.

Theorem 6.6.

The Cayley graphs $\mathcal{A}^{geo}/p^{m}$ form an expander family.

We will discuss the use of this tool at the end of the next section. A good reference for expander graphs for those interested in Cayley graphs is [Kow19].

Exercise 6.7.

Let $G$ be a $d$-regular graph with $k$ connected components. Show that $\lambda_{0}=\lambda_{1}=\cdots=\lambda_{k-1}$ by finding independent eigenvectors. Conversely, show that when $k=1$, $\lambda_{0}>\lambda_{1}$.

Exercise 6.8.

Let $G$ be a connected $d$-regular graph. Show that the eigenvectors for $\lambda_{i}<1$ have the property that the sum of their entries is $0$. This shows that they are orthogonal to the eigenvector for $\lambda_{0}=1$.

6.4. Strong approximation

An algebraic group $G$ has strong approximation if the maps $G(\mathbb{Z})\rightarrow G(\mathbb{Z}/p^{m}\mathbb{Z})$ are surjective. For example, $\operatorname{GL}_{2}$ fails this property since matrices in $\operatorname{GL}_{2}(\mathbb{Z}/p\mathbb{Z})$ whose determinants are not $\pm 1$ are not in the image. However, it does hold for $\operatorname{SL}_{2}$ [DSV03].

In particular, the reduction map modulo $\mathfrak{a}$ for any ideal $\mathfrak{a}$ of $\mathbb{Z}[i]$, $\operatorname{SL}_{2}(\mathbb{Z}[i])\rightarrow\operatorname{SL}_{2}(\mathbb{Z}[i]/\mathfrak{a})$, is always surjective. One way to measure the ‘size’ of a subgroup like $\mathcal{A}^{geo}$ is to ask whether the reduction maps $\mathcal{A}^{geo}\rightarrow\operatorname{SL}_{2}(\mathbb{Z}/p^{m}\mathbb{Z})$ are surjective. The Apollonian group almost has this property: it holds for sufficiently large prime powers. Thus we say that $\mathcal{A}^{geo}$ itself has strong approximation.
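Surjectivity for $\operatorname{SL}_{2}$ can be checked by hand for small moduli. The sketch below is an experiment rather than a proof: it uses the fact that the two elementary matrices with a single off-diagonal $1$ lie in $\operatorname{SL}_{2}(\mathbb{Z})$, closes their reductions modulo $n$ under multiplication, and compares the result with a brute-force list of all determinant-one matrices modulo $n$. The names `generated_subgroup`, `sl2_mod` and `GENS` are hypothetical.

```python
from itertools import product


def generated_subgroup(gens, n):
    """Closure of the identity under right multiplication by gens, modulo n.

    Matrices are stored as tuples (a, b, c, d) meaning [[a, b], [c, d]].
    In a finite group this closure is the subgroup generated by gens.
    """
    identity = (1 % n, 0, 0, 1 % n)
    seen = {identity}
    queue = [identity]
    while queue:
        a, b, c, d = queue.pop()
        for e, f, g, h in gens:
            m = ((a * e + b * g) % n, (a * f + b * h) % n,
                 (c * e + d * g) % n, (c * f + d * h) % n)
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return seen


def sl2_mod(n):
    """All 2x2 matrices modulo n with determinant 1, by brute force."""
    return {m for m in product(range(n), repeat=4)
            if (m[0] * m[3] - m[1] * m[2]) % n == 1 % n}


GENS = [(1, 1, 0, 1), (1, 0, 1, 1)]  # both lie in SL_2(Z)
for n in (4, 5, 9):
    image = generated_subgroup(GENS, n)
    print(n, len(image), image == sl2_mod(n))
```

For each modulus tried, the image of $\operatorname{SL}_{2}(\mathbb{Z})$ is all of $\operatorname{SL}_{2}(\mathbb{Z}/n\mathbb{Z})$; for example it has $120$ elements when $n=5$.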

For primes besides $2$ and $3$, the proof is by construction, using the fact that $z\mapsto z+i$ is in the Apollonian group.

Theorem 6.9.

The Apollonian group $\mathcal{A}^{geo}$ satisfies $\mathcal{A}^{geo}/p^{m}\cong\operatorname{SL}_{2}(\mathbb{Z}/p^{m}\mathbb{Z})$ for all $p\geq 5$.

The following proof is an adaptation of that of Varjú [BK14, Appendix] and [FSZ19].

Proof.

Let $p\geq 5$ be prime and $m\geq 1$ be an integer. Then there exists a pair $x,y\in\mathbb{Z}$ such that $x^{2}+y^{2}\equiv 1~(\textup{mod}~p^{m})$ and $(xy,p)=1$ [Cas78, Exercise 13(v)]. Consider reduction modulo $p^{m}$ on $\mathbb{Z}[i]$ (the following will work whether $p$ is split or inert; we write $i$ for the image of $i$ under reduction). The image matrix $T_{0}:=\begin{pmatrix}x&y\\ -y&x\end{pmatrix}$ lies in $\operatorname{SL}_{2}(\mathbb{Z}/p^{m}\mathbb{Z})$ by construction. Note that $T_{0}$ has fixed points $\pm i$. Then there exists a lift $T_{1}:=\begin{pmatrix}x_{0}&-y_{0}\\ y_{0}&x_{0}\end{pmatrix}\in\operatorname{SL}_{2}(\mathbb{Z})$ of $T_{0}$ by the strong approximation of $\operatorname{SL}_{2}$.

Let $T:=\begin{pmatrix}1&i\\ 0&1\end{pmatrix}\in\mathcal{A}$. Then $TT_{1}T^{-1}$ has fixed points $T(\pm i)=\{0,2i\}$, the first of which we can conjugate to $\infty$ using $S\in\operatorname{SL}_{2}(\mathbb{Z})$. Call the result $T_{2}:=STT_{1}T^{-1}S^{-1}\in\mathcal{A}$, which fixes $\infty$ modulo $p^{m}$. That is, it must have the form

\[T_{2}=\begin{pmatrix}a_{0}&b\\ 0&a_{1}\end{pmatrix}.\]

Here, $a_{0}$ and $a_{1}$ are the eigenvalues of $T_{2}$, and hence also of $T_{1}$, which are $x\pm yi$. In particular, $a_{0}^{2}\notin\mathbb{Z}/p^{m}\mathbb{Z}$ (observe that $y\not\equiv 0~(\textup{mod}~p)$ by construction, so the reduction of $x+iy$ modulo $p^{m}$ is in $\mathbb{Z}[i]/p^{m}\mathbb{Z}[i]\smallsetminus\mathbb{Z}/p^{m}\mathbb{Z}$), and it is invertible (also by construction). Now let

\[T_{3,n}:=T_{2}\begin{pmatrix}1&n\\ 0&1\end{pmatrix}T_{2}^{-1}\equiv\begin{pmatrix}1&na_{0}^{2}\\ 0&1\end{pmatrix}\]

where the upper right corner of $T_{3,1}$ is invertible, but not in $\mathbb{Z}/p^{m}\mathbb{Z}$. Hence $a_{0}^{2}$ and $1$ generate $\mathbb{Z}[i]/p^{m}\mathbb{Z}[i]$. This implies that all upper triangular matrices are in $\mathcal{A}$. Similarly we obtain all lower triangular matrices. By combining these, we have everything except those things whose lower left entry is divisible by $p$. That is, we have more than half of $\operatorname{SL}_{2}(\mathbb{Z}/p^{m}\mathbb{Z})$. Therefore we must generate it all. ∎

Although the theorem fails for $(p,m)\in\{(2,1),(2,2),(2,3),(3,1)\}$, for higher powers of $2$ and $3$, we do recover predictable behaviour, in the following sense. Although $\mathcal{A}_{3}$ is not all of $\operatorname{SL}_{2}(\mathbb{Z}/3\mathbb{Z})$, when lifting from $\mathcal{A}_{3}$ to $\mathcal{A}_{3^{2}}$, we do obtain ‘all’ of the valid lifts, meaning that although of course we cannot recover $\operatorname{SL}_{2}(\mathbb{Z}/3^{2}\mathbb{Z})$, we do not ‘lose even more.’ More precisely, for any $m>m_{p}$ (where $m_{2}=2$ and $m_{3}=1$), if the reduction of $M\in\operatorname{SL}_{2}(\mathbb{Z}/p^{m}\mathbb{Z})$ modulo $p^{m_{p}}$ lies in $\mathcal{A}_{p^{m_{p}}}$, then $M$ itself lies in $\mathcal{A}_{p^{m}}$.

In fact, the proof above for $p\geq 5$ shows that all of $\operatorname{SL}_{2}(\mathbb{Z}/p^{m}\mathbb{Z})$ is generated by words of a bounded finite length independent of $p$ and $m$; a similar result for $2$ and $3$ combines with this to give a spectral gap, showing that the Cayley graphs form an expander family (Theorem 6.6).

Strong approximation and the spectral gap turn out to be an important tool in the proofs that many curvatures appear. The rough idea is that for an expander graph, there is rapid mixing, so that as $n$ and $m$ grow, all curvatures modulo $n$ and modulo $m$ will be appearing regularly, so that all residues modulo $nm$ are also likely to occur regularly. Analytic methods allow one to extrapolate that most integers will eventually occur. One might think of this as a sort of explicit Sunzi’s Theorem for Apollonian packings.

Exercise 6.10.

Prove that the natural reduction map $\operatorname{PSL}_{2}(\mathbb{Z})\rightarrow\operatorname{PSL}_{2}(\mathbb{Z}/n\mathbb{Z})$ is surjective. Show that this is a group homomorphism and find the kernel.

6.5. Orbits of thin groups more generally

The results and conjectures above can be viewed as statements about orbits of the Apollonian group. To be precise, in this perspective, explored in more depth in [Kon13], the curvatures of a fixed packing form a set $\{\pi_{i}(\mathbf{v}):\mathbf{v}\in\mathbf{v}_{0}\mathcal{A},1\leq i\leq 4\}$, where $\pi_{i}$ is projection on the $i$-th coordinate, $\mathcal{A}$ is the Apollonian group, and $\mathbf{v}_{0}$ is a row vector of curvatures of some fixed Descartes quadruple. We consider the Apollonian group as a subgroup $\mathcal{A}$ of $O_{Q}(\mathbb{Z})$.

Another famous conjecture is part of the same general type.

Conjecture 6.11 (Zaremba’s Conjecture, [Zar72]).

There exists a positive constant $Z$ such that every natural number is the denominator of some rational number (in reduced form) whose continued fraction partial quotients are $\leq Z$.

For example, the continued fraction expansions of all rationals with denominator $7$ are:

\[\frac{1}{7}=[0;7],\quad\frac{2}{7}=[0;3,2],\quad\frac{3}{7}=[0;2,3],\quad\frac{4}{7}=[0;1,1,3],\quad\frac{5}{7}=[0;1,2,2],\quad\frac{6}{7}=[0;1,6].\]

A reasonable guess with current data is that Zaremba’s conjecture holds for $Z=5$. In the example above, things are ‘so far so good’ for $Z=5$ because at least one of the expansions (in fact, $4$ of them) involves only partial quotients $\leq 5$. We know that denominators $6$, $54$, and $150$ fail for $Z=4$, but we do not know of any further failures. Niederreiter [Nie78] conjectured that even for $Z=3$ there are only finitely many exceptions; Hensley [Hen96] conjectured this for $Z=2$. Of course, it certainly fails for $Z=1$, whose continued fraction expansions have only Fibonacci sequence denominators.
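These small cases are easy to reproduce. The following sketch computes expansions with the Euclidean algorithm (so the last partial quotient is at least $2$, as in the expansions displayed above) and searches for denominators that fail for $Z=4$. The function names and the search bound $200$ are our own choices, and the leading quotient $a_{0}=0$ is ignored when testing the bound.

```python
from math import gcd


def partial_quotients(b, q):
    """Continued fraction [a0; a1, ..., an] of b/q via the Euclidean algorithm."""
    out = []
    while q:
        out.append(b // q)
        b, q = q, b % q
    return out


def zaremba_ok(q, Z):
    """True if some reduced b/q in (0, 1) has all partial quotients <= Z."""
    return any(max(partial_quotients(b, q)[1:]) <= Z
               for b in range(1, q) if gcd(b, q) == 1)


expansions = {b: partial_quotients(b, 7) for b in range(1, 7)}
print(expansions[4])  # [0, 1, 1, 3]

failures = [q for q in range(2, 201) if not zaremba_ok(q, 4)]
print(failures)  # [6, 54, 150]
```

Raising the bound lets one probe further, though the cost grows roughly quadratically in the bound with this naive search.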

One might consider the Cantor-like set $C_{Z}$ of real numbers whose continued fraction expansions contain only partial quotients $a_{i}\leq Z$. Or, even more generally, a Cantor set $C_{S}$ for a set $S$ of allowable partial quotients. Hensley conjectured that a Zaremba-like statement will hold for partial quotients in $S$ if and only if the Hausdorff dimension of $C_{S}$ exceeds $1/2$ [Hen96]. However, Bourgain and Kontorovich found a counterexample, $S=\{2,4,6,8,10\}$, which has a congruence-like obstruction: denominators congruent to $3~(\textup{mod}~4)$ cannot appear [BK11].

Zaremba’s conjecture and its variants can be phrased as a thin orbit question. As we saw before, we can generate continued fraction convergents as the columns of elements $M\in\operatorname{SL}_{2}^{+}(\mathbb{Z})$. A slight reformulation is that every continued fraction convergent can be found as a column of a matrix of the form

\[\begin{pmatrix}0&1\\ 1&a_{0}\end{pmatrix}\begin{pmatrix}0&1\\ 1&a_{1}\end{pmatrix}\begin{pmatrix}0&1\\ 1&a_{2}\end{pmatrix}\cdots\begin{pmatrix}0&1\\ 1&a_{n}\end{pmatrix}\]

where the $a_{i}$ are the partial quotients. To restrict the partial quotients, we simply generate a semi-group (i.e., no inverses) by the matrices $\begin{pmatrix}0&1\\ 1&a\end{pmatrix}$ where $a$ is in our set $S$ of allowable partial quotients. In recent work, Rickards and Stange find reciprocity obstructions in this context [RS24].
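As a quick check of the matrix formulation, the sketch below multiplies out the product for a list of partial quotients. The function name `cf_matrix` is hypothetical, and matrices are stored as nested tuples of rows.

```python
def cf_matrix(quotients):
    """Product of the matrices [[0, 1], [1, a]] over the given partial quotients."""
    m = ((1, 0), (0, 1))
    for a in quotients:
        (p, q), (r, s) = m
        # right-multiply m by [[0, 1], [1, a]]
        m = ((q, p + q * a), (s, r + s * a))
    return m


print(cf_matrix([0, 1, 1, 3]))  # ((2, 7), (1, 4))
```

With this ordering of the factors, the second column $(7,4)$ holds the denominator and numerator of $4/7=[0;1,1,3]$, and the first column $(2,1)$ holds those of the previous convergent $1/2=[0;1,1]$.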

7. Schmidt arrangements

7.1. $\operatorname{PSL}_{2}(\mathcal{O}_{K})$-orbit

As we saw before, Schmidt defined a subdivision of the complex plane that involved Apollonian circle packings. See Figure 7.1. The easiest way to generate this image is to take the image of $\widehat{\mathbb{R}}$ under $\operatorname{PSL}_{2}(\mathbb{Z}[i])$. Suitable references for this section are [Sta18b, Sta18a, Mar22].

Figure 7.1. The Gaussian Schmidt arrangement.

The definition of the Schmidt arrangement can be given more generally in terms of an imaginary quadratic field, so that Figure 7.1 is for the Gaussian field $\mathbb{Q}(i)$. Another example is given in Figure 7.2.

Definition 7.1.

The Schmidt arrangement $\mathcal{S}_{K}$ for an imaginary quadratic ring $\mathcal{O}_{K}$ is the image of $\widehat{\mathbb{R}}$ under $\operatorname{PSL}_{2}(\mathcal{O}_{K})$.

We will be mainly interested in the case $\mathcal{O}_{K}=\mathbb{Z}[i]$, because of its connection to the Apollonian packing.

Figure 7.2. The Schmidt arrangement for $\mathbb{Q}(\sqrt{-15})$.

We begin with some basic properties, all of which are a consequence of Proposition 3.13 (see also the proof of Theorem 5.6).

Proposition 7.2.
  (1) The Schmidt arrangement has symmetry by translation by $\mathbb{Z}[i]$ and by rotation of $\pi$ about the origin.
  (2) The curvatures of the circles in the Schmidt arrangement lie in $2\mathbb{Z}$.
  (3) The circles are either tangent or disjoint.
  (4) The points of tangency are Gaussian rationals.

In particular, up to a scaling factor of $2$, we’ll think of the circles as having integral curvature. In fact, we’ll call half the curvature the reduced curvature for convenience; this lies in $\mathbb{Z}$.

The following fact was known to Graham, Lagarias, Mallows, Wilks and Yan in the context of the Apollonian super-packing.

Theorem 7.3 ([GLM+05, Sta18a]).

Every single primitive integral Apollonian circle packing appears in the Schmidt arrangement of $\mathbb{Q}(i)$ exactly once up to translation by $\mathbb{Z}[i]$ and rotation about the origin by $\pi$.

We saw in Theorem 5.6 that every circle in a packing has a natural quadratic form associated to it. We also saw in Section 2.7 that quadratic forms are in bijection with lattices. In fact, every circle has a lattice associated to it.

Theorem 7.4.

Let $\mathcal{C}$ be a circle in the Schmidt arrangement for $\mathbb{Q}(i)$, given as an image of $\widehat{\mathbb{R}}$ by $\begin{pmatrix}\alpha&\beta\\ \gamma&\delta\end{pmatrix}$. Let $\Lambda=\gamma\mathbb{Z}+\delta\mathbb{Z}$. Then the denominators of the tangency points touching $\mathcal{C}$ are exactly the elements of $\Lambda$, and at tangency point $\sigma/\rho$, the circles tangent have curvatures $\Im(\gamma\overline{\delta})+2kN(\rho)$, $k\in\mathbb{Z}$.

7.2. Lattice in the space of circles

As an orbit of a circle under Möbius transformations, we can ask what the Schmidt arrangement is as a subset of the space of circles. It is actually a very nicely described subset, as given by Daniel Martin (who did this for general Schmidt arrangements). This is very useful for drawing images.

Proposition 7.5 ([Mar22, Definition 3.1 and Theorem 3.11]).

The circles of the Schmidt arrangement are exactly those circles having curvature $2p^{\prime}$ and curvature center $2r^{\prime}+(2s^{\prime}+1)i$ for integers $p^{\prime},r^{\prime},s^{\prime}$ with $p^{\prime}\mid r^{\prime 2}+s^{\prime 2}+s^{\prime}$.

For example, for $p^{\prime}=1$, the condition is always satisfied and we have centers $r^{\prime}+(s^{\prime}+1/2)i$ for all $r^{\prime},s^{\prime}\in\mathbb{Z}$.
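Since the proposition is useful for drawing, here is a minimal sketch that lists the circles of positive curvature whose centres lie in the unit square, up to a bound on the reduced curvature $p^{\prime}$. It assumes the usual convention that the curvature-centre is the curvature times the centre, so the centre is $(r^{\prime}+(s^{\prime}+1/2)i)/p^{\prime}$ and the radius is $1/(2p^{\prime})$. The function name `schmidt_circles` is hypothetical, and the plotting itself is left out.

```python
def schmidt_circles(max_reduced_curvature):
    """Circles (centre_x, centre_y, radius) with centre in [0, 1) x [0, 1)."""
    circles = []
    for p in range(1, max_reduced_curvature + 1):
        for r in range(p):        # centre_x = r / p lies in [0, 1)
            for s in range(p):    # centre_y = (s + 1/2) / p lies in [0, 1)
                if (r * r + s * s + s) % p == 0:
                    circles.append((r / p, (s + 0.5) / p, 1 / (2 * p)))
    return circles


for circle in schmidt_circles(2):
    print(circle)
```

Translating these circles by $\mathbb{Z}[i]$ then fills in the rest of the picture, by the symmetry in Proposition 7.2.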

Proof.

We will describe the Gaussian Schmidt arrangement as the intersection of the one-sheeted hyperboloid with a lattice in the space of circles. The lattice will be

$$\Lambda:=\{(p,q,r,s)\in\mathbb{Z}^{4}:(p,q,r,s)\equiv(0,0,0,1)\pmod{2}\}\subset\mathbb{R}^{4},$$

coordinates representing curvature, co-curvature, real and imaginary parts of curvature-centre in the space of circles, as usual. By Exercise 3.14, circles of the Schmidt arrangement lie in $\Lambda$.

The image $G$ of $\operatorname{PSL}_{2}(\mathbb{Z}[i])$ in $O_{Q}(\mathbb{R})$ (under the exceptional isomorphism (10)) lies in $O_{Q}(\mathbb{Z})$, and satisfies $G\Lambda\subseteq\Lambda$. The extended real line $\widehat{\mathbb{R}}$ corresponds to $(0,0,0,-1)\in\Lambda$, which lies on the one-sheeted hyperboloid $r^{2}+s^{2}-pq=1$. Thus, since $G$ preserves length, the entire Schmidt arrangement lies in the one-sheeted hyperboloid $r^{2}+s^{2}-pq=1$.

Conversely, assume we have a circle $\mathcal{C}$ in the lattice and on the hyperboloid. Choose a Gaussian rational point $\alpha/\beta\in\mathcal{C}$. There is an element of $\operatorname{PSL}_{2}(\mathbb{Z}[i])$ mapping $\alpha/\beta$ to $0$ because there is a solution to $\alpha x+\beta y=1$ (by coprimality of numerator and denominator). So without loss of generality, we can assume our circle touches $0$, and therefore has co-curvature $q=0$ (exercise). Suppose it has curvature $p$ and curvature-center $r+si$. It will still lie in the lattice, since $G\Lambda\subseteq\Lambda$. Consider the matrix

$$M:=\begin{pmatrix}0&-s+ir\\ 1&ip/2\end{pmatrix}.$$

Its determinant $s-ir$ satisfies $|s-ir|^{2}=r^{2}+s^{2}-p\cdot 0=1$, so it lies in $\{\pm 1,\pm i\}$. As $s$ is odd, the determinant is $\pm 1$, and so $M\in\operatorname{PSL}_{2}(\mathbb{Z}[i])$. Since $\mathcal{C}=M\cdot\widehat{\mathbb{R}}$, it is in the Schmidt arrangement.
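As a numerical sanity check of this step (not part of the original argument), one can confirm that $M$ carries real points onto the circle through $0$ with curvature $p$ and curvature-center $r+si$, that is, with center $(r+si)/p$ and radius $1/p$. The Python sketch below assumes $p>0$ and $r^{2}+s^{2}=1$, as in the proof; the function names and the sample points are hypothetical.

```python
def mobius(M, x):
    """Apply the Mobius transformation with matrix M = ((a, b), (c, d)) to x."""
    (a, b), (c, d) = M
    return (a * x + b) / (c * x + d)


def max_deviation(p, r, s, samples=(-3.0, -1.0, 0.0, 0.5, 2.0, 10.0)):
    """Largest value of | |M.x - centre| - 1/p | over the real sample points,
    where M = ((0, -s + i r), (1, i p / 2)) and centre = (r + s i) / p.

    Assumes p > 0 and r^2 + s^2 = 1, so the result should be near zero."""
    M = ((0, complex(-s, r)), (1, complex(0, p / 2)))
    centre = complex(r, s) / p
    return max(abs(abs(mobius(M, x) - centre) - 1 / p) for x in samples)


# Example usage: the two possibilities (r, s) = (0, 1) and (0, -1).
print(max_deviation(4, 0, 1))
print(max_deviation(6, 0, -1))
```

Both printed values should be zero up to floating-point rounding.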

Finally, we simply work out the condition of intersecting the lattice with the one-sheeted hyperboloid. Writing $p=2p'$, $q=2q'$, $r=2r'$, $s=2s'+1$, the condition that $q'\in\mathbb{Z}$ and $pq-r^{2}-s^{2}+1=0$ becomes

$$p'\mid r'^{2}+s'^{2}+s'.$$
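For readers who want the substitution spelled out, here is the routine computation, using only the quantities $p=2p'$, $q=2q'$, $r=2r'$, $s=2s'+1$ defined above:

```latex
\begin{align*}
pq - r^{2} - s^{2} + 1
  &= (2p')(2q') - (2r')^{2} - (2s'+1)^{2} + 1 \\
  &= 4p'q' - 4r'^{2} - 4s'^{2} - 4s' - 1 + 1 \\
  &= 4\bigl(p'q' - r'^{2} - s'^{2} - s'\bigr).
\end{align*}
```

So the hyperboloid equation holds exactly when $p'q'=r'^{2}+s'^{2}+s'$, and for $p'\neq 0$ an integer $q'$ with this property exists exactly when $p'$ divides $r'^{2}+s'^{2}+s'$.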

7.3. Visual structure of imaginary quadratic fields

The Schmidt arrangement $\mathcal{S}_{K}$ gives visual form to the arithmetic of $K$. $K$-Bianchi circles intersect only at $K$-points and only at ‘unit angles’, i.e. angles $\theta$ such that $e^{i\theta}$ lies in the unit group of $\mathcal{O}_{K}$. Their curvatures lie in $\sqrt{-\Delta}\mathbb{Z}$. Furthermore, the circles themselves are in bijection with certain ideal classes:

Theorem 7.6 ([Sta18b, Theorem 1.4]).

$K$-Bianchi circles of curvature $f$, modulo translation into the fundamental region of $\mathcal{O}_{K}$, and rotation by unit angles, are in bijection with the invertible ideal classes of the order of conductor $f$ which extend to the trivial class in $\mathcal{O}_{K}$.

This bijection is very explicit: if $M\cdot\widehat{\mathbb{R}}$ is a $K$-Bianchi circle of curvature $f$, where $M=\begin{pmatrix}\alpha&\beta\\ \gamma&\delta\end{pmatrix}$, then $\gamma\mathbb{Z}+\delta\mathbb{Z}$ is an ideal with conductor $f$. This is the lattice we saw as the lattice of denominators in the last section.

Furthermore, the connectedness of $\mathcal{S}_{K}$ is easily characterised:

Theorem 7.7 ([Sta18b, Theorem 1.5]).

$\mathcal{S}_{K}$ is connected if and only if $\mathcal{O}_{K}$ is Euclidean.

Figure 7.3. The geodesic surfaces that form the Schmidt arrangement, 3d printed (front and back views).

The arithmetic of Kleinian groups has a long history. The Möbius transformations of $\widehat{\mathbb{C}}$ extend to hyperbolic isometries of the upper half-space model of hyperbolic $3$-space, for which $\widehat{\mathbb{C}}$ is the boundary. The quotient of this space by a Bianchi group defines a Bianchi orbifold, and the arithmetic of the field is known to play an important role in the topology and geometry of these orbifolds (see, for example, [MR03]). As the simplest example, the number of cusps of the Bianchi orbifold is the class number of the field. The Schmidt arrangement is another aspect of the orbifold: in essence, $\mathcal{S}_{K}$ represents a particular choice of geodesic surface in the orbifold. The classification of geodesic surfaces in Bianchi orbifolds is not yet well understood.

Figure 7.4. A $\mathbb{Q}(\sqrt{-2})$-Apollonian packing with curvatures shown.
Figure 7.5. A $\mathbb{Q}(\sqrt{-7})$-Apollonian packing.

There is a simple geometric criterion for an Apollonian circle packing as a subset of $\mathcal{S}_{\mathbb{Q}(i)}$: it is obtained from any one circle by adding on the largest exteriorly tangent circle at each tangency point. This criterion, applied to other Schmidt arrangements, gives $K$-Apollonian packings for other imaginary quadratic fields (Figures 7.4 and 7.5). Along with these packings come $K$-Apollonian groups for which these packings are the limit set. These are thin groups acting on appropriate clusters of circles (analogous to the notion of Descartes quadruple) to generate the packing. For example, in $\mathbb{Q}(\sqrt{-2})$ the relevant cluster has as tangency graph a cube, and there are six swaps through faces, so that the $K$-Apollonian group is a free product of six copies of $\mathbb{Z}/2\mathbb{Z}$. The description of the local obstructions can be extended to $K$-Apollonian packings [Sta18a].

8. A postscript

I have had the good fortune of a great deal of freedom in choosing and exploring mathematical projects. I have found myself attracted to subjects in number theory that have an essential geometric aspect, in particular one which can illuminate proofs and demand its own questions. I fell in love with quadratic forms and continued fractions through the Farey subdivision and Conway’s topograph. I studied curves in graduate school, and learned that it was the genus – their topology – that controlled their Diophantine behaviour. I believe strongly that number theory – probably all number theory – is essentially geometric, and that algebra, although powerful and in possession of a beauty of its own, can at times obscure a hidden geometric splendour.

Of my own research explored in these notes, every project involved extensive computer experiments, often visual ones. As I discovered the properties of Schmidt arrangements, I held an old-fashioned compass to a computer print-out to find patterns. As I worked with Harriss and Trettel on algebraic numbers, we made an explicit decision to let aesthetics drive our computer experiments, which is how choices like sizing by discriminant and measuring approximation in hyperbolic geometry were born.

We are just discovering the many ways in which Apollonian circle packings are essential and natural objects in number theory, much like elliptic curves. But my favourite justification is geometric: draw the Schmidt arrangement, which is, in a very real sense, the way that Gaussian rationals choose to organize themselves, and the Apollonian packings are an essential intermediate geometric piece. They pop out of the picture. They cannot be avoided: they are dictated by nature.

I’ve learned many things from these experiences. I’ve learned that the visual cortex is a powerful tool for creativity and intuition, as well as reasoning. Patterns emerge to our visual cortex that will pass unnoticed in numerical data. We are essentially visual and social beings, and as such, we vividly recall our visual and social encounters. This is why mathematics is often best conveyed in stories and pictures. We remember mathematical objects that appear, to us, to have personality and shape. As mathematicians, we do this, more-or-less unconsciously, with the most abstract mathematical objects; they become our friends, our tormentors, our landscapes. Why not also do it explicitly?

I encourage the reader to wander the pages of John H. Conway and Francis Y. C. Fung’s The Sensual Quadratic Form [Con97], Martin H. Weissman’s An Illustrated Theory of Numbers [Wei17], and Allen Hatcher’s Topology of Numbers [Hat22]; to value an illustrative and visual approach to mathematics; and, at the risk of sentimentality, to follow one’s heart in mathematical research and elsewhere.

References

  • [AR23] Jorge L. Ramírez Alfonsín and Iván Rasskin. A polytopal generalization of Apollonian packings and Descartes’ theorem, 2023.
  • [BE09] Yann Bugeaud and Jan-Hendrik Evertse. Approximation of complex algebraic numbers by algebraic numbers of bounded degree. Ann. Sc. Norm. Super. Pisa Cl. Sci. (5), 8(2):333–368, 2009.
  • [BF11] Jean Bourgain and Elena Fuchs. A proof of the positive density conjecture for integer Apollonian circle packings. J. Amer. Math. Soc., 24(4):945–967, 2011.
  • [BK11] Jean Bourgain and Alex Kontorovich. On Zaremba’s conjecture. C. R. Math. Acad. Sci. Paris, 349(9-10):493–495, 2011.
  • [BK14] Jean Bourgain and Alex Kontorovich. On the local-global conjecture for integral Apollonian gaskets. Invent. Math., 196(3):589–650, 2014. With an appendix by Péter P. Varjú.
  • [Cas78] J. W. S. Cassels. Rational quadratic forms, volume 13 of London Mathematical Society Monographs. Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], London-New York, 1978.
  • [CFHS19] Sneha Chaubey, Elena Fuchs, Robert Hines, and Katherine E. Stange. The dynamics of super-Apollonian continued fractions. Trans. Amer. Math. Soc., 372(4):2287–2334, 2019.
  • [Chr16] A. David Christopher. A partition-theoretic proof of Fermat’s two squares theorem. Discrete Math., 339(4):1410–1411, 2016.
  • [Con97] John H. Conway. The sensual (quadratic) form, volume 26 of Carus Mathematical Monographs. Mathematical Association of America, Washington, DC, 1997. With the assistance of Francis Y. C. Fung.
  • [DSV03] Giuliana Davidoff, Peter Sarnak, and Alain Valette. Elementary number theory, group theory, and Ramanujan graphs, volume 55 of London Mathematical Society Student Texts. Cambridge University Press, Cambridge, 2003.
  • [FS11] Elena Fuchs and Katherine Sanden. Some experiments with integral Apollonian circle packings. Exp. Math., 20(4):380–399, 2011.
  • [FSZ19] Elena Fuchs, Katherine E. Stange, and Xin Zhang. Local-global principles in circle packings. Compos. Math., 155(6):1118–1170, 2019.
  • [Fuc11] Elena Fuchs. Strong approximation in the Apollonian group. J. Number Theory, 131(12):2282–2302, 2011.
  • [Fuc13] Elena Fuchs. Counting problems in Apollonian packings. Bull. Amer. Math. Soc. (N.S.), 50(2):229–266, 2013.
  • [GLM+03] Ronald L. Graham, Jeffrey C. Lagarias, Colin L. Mallows, Allan R. Wilks, and Catherine H. Yan. Apollonian circle packings: number theory. J. Number Theory, 100(1):1–45, 2003.
  • [GLM+05] Ronald L. Graham, Jeffrey C. Lagarias, Colin L. Mallows, Allan R. Wilks, and Catherine H. Yan. Apollonian circle packings: geometry and group theory. I. The Apollonian group. Discrete Comput. Geom., 34(4):547–585, 2005.
  • [Hat22] Allen Hatcher. Topology of numbers. American Mathematical Society, Providence, RI, [2022] ©2022.
  • [Hen96] Douglas Hensley. Erratum: “A polynomial time algorithm for the Hausdorff dimension of continued fraction Cantor sets”. J. Number Theory, 59(2):419, 1996.
  • [Hir67] K. E. Hirst. The Apollonian packing of circles. J. London Math. Soc., 42:281–291, 1967.
  • [HKRS24] Summer Haag, Clyde Kertzer, James Rickards, and Katherine Stange. The local-global conjecture for Apollonian circle packings is false. Ann. of Math. (2), 200(2):749–770, 2024.
  • [Hol21] Jan E. Holly. What type of Apollonian circle packing will appear? Amer. Math. Monthly, 128(7):611–629, 2021.
  • [HST22] Edmund Harriss, Katherine E. Stange, and Steve Trettel. Algebraic number starscapes. Exp. Math., 31(4):1098–1149, 2022.
  • [IR82] Kenneth F. Ireland and Michael I. Rosen. A classical introduction to modern number theory, volume 84 of Graduate Texts in Mathematics. Springer-Verlag, New York-Berlin, revised edition, 1982.
  • [Isk71] V. A. Iskovskikh. A counterexample to the Hasse principle for systems of two quadratic forms in five variables. Mat. Zametki, 10:253–257, 1971.
  • [KO11] Alex Kontorovich and Hee Oh. Apollonian circle packings and closed horospheres on hyperbolic 3-manifolds. J. Amer. Math. Soc., 24(3):603–648, 2011. With an appendix by Oh and Nimish Shah.
  • [Koc20] Jerzy Kocik. Apollonian depth and the accidental fractal, 2020.
  • [Kok39] J. F. Koksma. Über die Mahlersche Klasseneinteilung der transzendenten Zahlen und die Approximation komplexer Zahlen durch algebraische Zahlen. Monatsh. Math. Phys., 48:176–189, 1939.
  • [Kon13] Alex Kontorovich. From Apollonius to Zaremba: local-global phenomena in thin orbits. Bull. Amer. Math. Soc. (N.S.), 50(2):187–228, 2013.
  • [Kow19] Emmanuel Kowalski. An introduction to expander graphs, volume 26 of Cours Spécialisés [Specialized Courses]. Société Mathématique de France, Paris, 2019.
  • [LMW02] Jeffrey C. Lagarias, Colin L. Mallows, and Allan R. Wilks. Beyond the Descartes circle theorem. Amer. Math. Monthly, 109(4):338–361, 2002.
  • [LO13] Min Lee and Hee Oh. Effective circle count for Apollonian packings and closed horospheres. Geom. Funct. Anal., 23(2):580–621, 2013.
  • [Mar22] Daniel Martin. A geometric study of circle packings and ideal class groups, 2022.
  • [Mat93] Yuri V. Matiyasevich. Hilbert’s tenth problem. Foundations of Computing Series. MIT Press, Cambridge, MA, 1993. Translated from the 1993 Russian original by the author, With a foreword by Martin Davis.
  • [McM98] Curtis T. McMullen. Hausdorff dimension and conformal dynamics. III. Computation of dimension. Amer. J. Math., 120(4):691–721, 1998.
  • [MR03] Colin Maclachlan and Alan W. Reid. The arithmetic of hyperbolic 3-manifolds, volume 219 of Graduate Texts in Mathematics. Springer-Verlag, New York, 2003.
  • [Nie78] Harald Niederreiter. Quasi-Monte Carlo methods and pseudo-random numbers. Bull. Amer. Math. Soc., 84(6):957–1041, 1978.
  • [Oh10] Hee Oh. Dynamics on geometrically finite hyperbolic manifolds with applications to Apollonian circle packings and beyond. In Proceedings of the International Congress of Mathematicians. Volume III, pages 1308–1331. Hindustan Book Agency, New Delhi, 2010.
  • [Oh14] Hee Oh. Apollonian circle packings: dynamics and number theory. Jpn. J. Math., 9(1):69–97, 2014.
  • [OS12] Hee Oh and Nimish Shah. The asymptotic distribution of circles in the orbits of Kleinian groups. Invent. Math., 187(1):1–35, 2012.
  • [Par08] John R. Parker. Hyperbolic spaces: The Jyväskylä notes, 2008.
  • [Ric23] James Rickards. Apollonian. https://github.com/JamesRickards-Canada/Apollonian, 2023.
  • [Ric24] James Rickards. The Apollonian staircase. Int. Math. Res. Not. IMRN, (2):1–33, 2024.
  • [Rot55] K. F. Roth. Rational approximations to algebraic numbers. Mathematika, 2:1–20; corrigendum, 168, 1955.
  • [RS24] James Rickards and Katherine E. Stange. Reciprocity obstructions in semigroup orbits in $\mathrm{SL}(2,\mathbb{Z})$, 2024.
  • [Sar] Peter Sarnak. Letter to Lagarias. http://www.math.princeton.edu/sarnak.
  • [Sch75] Asmus L. Schmidt. Diophantine approximation of complex numbers. Acta Math., 134:1–85, 1975.
  • [Ser85a] Caroline Series. The geometry of Markoff numbers. Math. Intelligencer, 7(3):20–29, 1985.
  • [Ser85b] Caroline Series. The modular surface and continued fractions. J. London Math. Soc. (2), 31(1):69–80, 1985.
  • [Sha07] Lisa Shapiro, editor. The Correspondence Between Princess Elisabeth of Bohemia and René Descartes. University of Chicago Press, 2007.
  • [Sil94] Joseph H. Silverman. Advanced topics in the arithmetic of elliptic curves, volume 151 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1994.
  • [Spr69] V. G. Sprindžuk. Mahler’s problem in metric number theory. Translations of Mathematical Monographs, Vol. 25. American Mathematical Society, Providence, R.I., 1969. Translated from the Russian by B. Volkmann.
  • [Sta18a] Katherine E. Stange. The Apollonian structure of Bianchi groups. Trans. Amer. Math. Soc., 370(9):6169–6219, 2018.
  • [Sta18b] Katherine E. Stange. Visualizing the arithmetic of imaginary quadratic fields. Int. Math. Res. Not. IMRN, (12):3908–3938, 2018.
  • [Sz69] V. G. Sprindžuk. Mahler’s problem in metric number theory, volume 25 of Translations of Mathematical Monographs. American Mathematical Society, Providence, RI, 1969. Translated from the Russian by B. Volkmann.
  • [Trk07] D. Trkovská. Felix Klein and his Erlanger Programm, 2007.
  • [Vin14] Ilya Vinogradov. Effective bisector estimate with application to Apollonian circle packings. Int. Math. Res. Not. IMRN, (12):3217–3262, 2014.
  • [Voi21] John Voight. Quaternion algebras, volume 288 of Graduate Texts in Mathematics. Springer, Cham, [2021] ©2021.
  • [VW24] Polina Vytnova and Caroline Wormell. Hausdorff dimension of the Apollonian gasket, 2024.
  • [Wei17] Martin H. Weissman. An illustrated theory of numbers. American Mathematical Society, Providence, RI, 2017.
  • [Zag90] D. Zagier. A one-sentence proof that every prime $p\equiv 1\pmod{4}$ is a sum of two squares. Amer. Math. Monthly, 97(2):144, 1990.
  • [Zar72] S. K. Zaremba. La méthode des “bons treillis” pour le calcul des intégrales multiples. In Applications of number theory to numerical analysis (Proc. Sympos., Univ. Montréal, Montreal, Que., 1971), pages 39–119. Academic Press, New York-London, 1972.