An illustrated introduction to the arithmetic of Apollonian circle packings, continued fractions, and other thin orbits

Katherine E. Stange Department of Mathematics, University of Colorado, Campus Box 395, Boulder, Colorado 80309-0395 kstange@math.colorado.edu

(Date: , Draft #1)

Abstract.

These notes cover and expand upon the material for two summer schools: The first, which was held at CIRM, Marseille, France, July 10-14, 2023, as part of Renormalization and Visualization for packing, billiard and surfaces, was titled Number theory as a door to geometry, dynamics and illustration. The second was held at NUS IMS in Singapore, June 3-7, 2024, as part of Computational Aspects of Thin Groups, and was titled Integral packings and number theory. Both courses were put together by a number theorist for students and researchers in other fields. They cover a web of ideas relating to Apollonian circle packings, integral orbits, thin groups, hyperbolic geometry, continued fractions, and Diophantine approximation. The connection of geometry and dynamics to number theory gives an opportunity to illustrate arithmetic by appealing to our visual intuition.


THIS IS A DRAFT;
PLEASE REPORT ERRORS

1. Introduction

These notes were meant to serve for two separate summer schools. The first, which was held at CIRM, Marseille, France, July 10-14, 2023, as part of Renormalization and Visualization for packing, billiard and surfaces, was titled Number theory as a door to geometry, dynamics and illustration. The abstract was:

The course will explore several related topics in number theory with dynamical and/or geometric facets: continued fractions, Diophantine approximation, and Apollonian circle packings. We will focus on both theoretical and experimental tools; a parallel goal will be to experience the role of visualization and illustration in mathematical research.

In covering background material, the approach will emphasize the visual and dynamical:

(1)

Continued fractions, quadratic forms, and Diophantine approximation.

(2)

Hyperbolic geometry, Minkowski space, and Kleinian groups.

With these tools at hand, we will study some areas of current research:

(1)

The geometry of Diophantine approximation and continued fractions in the complex plane, including algebraic starscapes and Schmidt arrangements.

(2)

Apollonian circle packings, with an emphasis on their surprising relationships to the preceding topics.

The second was held at NUS IMS in Singapore, June 3-7, 2024, as part of Computational Aspects of Thin Groups, and was titled Integral packings and number theory. The abstract for this course was:

The course will use Apollonian circle packings as a central example for connections between number theory and thin groups. The symmetries of such a packing are governed by a thin group called the Apollonian group, and the curvatures form an orbit of that group. Our goal is to study such orbits, particularly in the case that the orbit consists entirely of integers. Some of the topics that are entwined with the study of these packings include quadratic forms, hyperbolic geometry in 2 and 3 dimensions, arithmetic geometry, continued fractions, spectral theory of graphs and strong approximation. I will give a tour of the area, with the goal of introducing the number theory perspective on such problems, highlighting the tools at hand, and finishing by considering the wider class of problems that can be phrased as questions about the arithmetic of thin orbits.

In both cases, the audience did not consist of number theorists, and my goal is to highlight connections between number theory and other domains, and to share some of the number theorist’s perspective on ideas familiar in other domains. The notes will contain a number of examples of the ways that number theory leads to geometry and dynamics. As a number theorist, I see these relationships as number theory questions with geometric answers, but this is surely not the only way to see them.

Three motivating questions are:

(1)

Which complex numbers can be well approximated by algebraic numbers (of various flavours)?
(2)

What curvatures appear in a primitive integral Apollonian circle packing or another thin orbit?
(3)

For imaginary quadratic fields $K$ , how is the number theory of $\mathcal{O}_{K}$ visible in the Bianchi group $\operatorname{PSL}_{2}(\mathcal{O}_{K})$ ?

It turns out these are all connected to one another by their underlying geometry. And they are all generalizations of very classical questions in number theory about the integer solutions to equations, the distribution of rationals in the real line, and the study of continued fraction expansions.

I must admit that a particular love of the author is the Apollonian circle packing. If one is interested in the classical number theoretic topics of $\operatorname{PSL}_{2}(\mathbb{Z})$ , continued fractions, or quadratic forms, and one begins to wonder about higher dimensional analogues, one comes very naturally to study $\operatorname{PSL}_{2}(\mathbb{Z}[i])$ , Schmidt’s continued fractions, and Hermitian forms. Studying these geometrically, one discovers that the essential new riddle that lies wrapped in the mystery inside the enigma…is the Apollonian circle packing. It is a halfway house, an unavoidable geometric feature, but an orbit of a group that is not itself an arithmetic group. It begs the same questions we ask for Diophantine equations, without being one itself. This is its allure.

Finally, these topics are all examples of the interplay between research and the illustration of mathematics. The use of computers to explore mathematics with our highly evolved visual cortices is not only productive, but also sensual, in the language of Conway [Con97]. One of my goals in these notes is to emphasize the utility and beauty of these explorations, and make a case for their wider use in research.

These notes are written by a number theorist. I apologize in advance for the inevitable injustice in making connections to other domains which I cannot fully follow up. I hope, however, that this insufficiency will serve as a motivation. There are a great number of interesting unexplored species in this jungle.

One final note to the reader: on the off chance that your enthusiasm causes you to rush headlong (as mine sometimes does), I would warn you that some important statements are contained in the exercises, so they are an important part of the narrative.

1.0.1. Acknowledgements

I owe a great debt of gratitude to Jayadev Athreya for inspiration and support over many years, and for his invitation to participate in his Morlet semester at Centre international de rencontres mathématiques, which eventually led to these notes. Thank you to all the organizers of the Research School, Renormalization and Visualization for packing, billiard and surfaces, held July 10-14, 2023, for which these lecture notes were originally drafted, and to the organizers of the Programme Computational Aspects of Thin Groups, held June 3-7, 2024, for which these lecture notes were adapted and expanded. These organizers include Jayadev Athreya and Nicolas Bedaride in the first case; and Bettina Eick, Eamonn O’Brien, Alan W. Reid, and Ser Peow Tan in the second. Thank you to all the students and researchers who attended and contributed feedback; and gratitude to the Centre international de rencontres mathématiques (CIRM) and the National University of Singapore Institute for Mathematical Sciences (NUS IMS) for hosting and supporting these conferences. Thank you to James Rickards for TA help at the IMS school. Thank you to Elena Fuchs, Daniel Martin and James Rickards for consultation, and to many for sharing permission to use their images (individually noted in each case). Thank you to Glen Whitney for feedback. Thank you to Summer Haag for her detailed reading and feedback, and willingness to dive into the minus signs. And finally, thank you to all my intrepid collaborators on all the projects which are mentioned in these notes.

2. A number theory perspective

2.1. Diophantine problems

Given a system of polynomial equations, what are the integer or rational solutions? Such problems are called Diophantine problems and make up a significant part of modern and ancient number theory.

Some examples of Diophantine problems are:

(1)

Solutions to a single equation $f(x_{1},\ldots,x_{n})=0$ , amongst the most famous of which are quadratic forms $ax^{2}+bxy+cy^{2}=m$ and elliptic curves $y^{2}=x^{3}+ax+b$ .
(2)

Generalizing the previous case, solutions to systems of equations in several variables, which represent algebraic varieties such as abelian varieties, curves and surfaces.
(3)

Many problems reduce to or can be phrased as relating to Diophantine problems, such as the famous congruent number problem, the units in a ring of integers (e.g., Pell’s equation), and the arithmetic study of function iteration (arithmetic dynamics).
(4)

Among the more significant examples are algebraic varieties arising as moduli spaces in the study of other problems (e.g., modular curves), where knowing the points amounts to classifying mathematical objects of some type.
(5)

The study of matrix groups and algebraic groups more generally. This includes most/many Lie groups.
(6)

In the context of a group such as an algebraic group, the study of the elements of an orbit.

The depth and breadth of the partial list above amply motivates the study of Diophantine equations.

As we shall see, in the process of trying to understand these problems, we are led naturally to look for solutions modulo $p$ , in the $p$ -adics $\mathbb{Q}_{p}$ , in the real numbers $\mathbb{R}$ and in the complex numbers $\mathbb{C}$ . Knowledge of the solutions in each of these other realms has a role to play in the original realm – $\mathbb{Q}$ or $\mathbb{Z}$ – which is in some sense the hardest place to look for solutions. We are also led naturally to generalize these questions to number fields and their rings of integers.

2.2. Quadratic forms

There are some natural starting points for the study of Diophantine equations. After linear equations such as $ax+by=c$ (which one solves with the Euclidean algorithm, which is central to everything), the historical and logical next step is quadratic equations and quadratic forms. These will have an important role to play throughout the entirety of these notes.

An $n$ -ary quadratic form is a homogeneous polynomial in $n$ variables of degree $2$ . (The term form refers to the homogeneity.) Therefore, a binary quadratic form is a homogeneous polynomial in two variables of degree $2$ , which will necessarily have the shape

f(x,y)=ax^{2}+bxy+cy^{2}

for some coefficients $a,b,c$ , which we will take to be in $\mathbb{R}$ , to begin.

Binary quadratic forms have a beautiful Diophantine theory, whose geometric incarnations we will meet later. The main question is to ask: what are the values represented by a quadratic form? For the number theorist, we ask especially for forms with integer coefficients, evaluated on integer inputs. A famous example is the following:

Theorem 2.1 (Fermat).

The quadratic form $x^{2}+y^{2}$ represents exactly those odd primes which are $1$ modulo $4$ .

There are many proofs of this fact, arising from many different tools. It can be seen as a property of the Gaussian integers $\mathbb{Z}[i]$ and quadratic reciprocity (Corollary 2.8 below), as a result of the fact that $\pi>2$ via the geometry of lattices (Exercise 2.3), or using dynamics (Exercise 2.9); it can also be proven using Jacobi sums [IR82] or even partition theory [Chr16].
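Before turning to proofs, the statement of Theorem 2.1 is easy to test by brute force. The following is only an illustrative sketch, not part of any of the proofs cited above; the helper names `is_prime` and `two_squares` are hypothetical names introduced here.

```python
from math import isqrt

def is_prime(n: int) -> bool:
    """Trial division; adequate for the small range explored here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))

def two_squares(n: int):
    """Return (x, y) with x*x + y*y == n and 0 <= x <= y, or None if none exists."""
    for x in range(isqrt(n // 2) + 1):
        y = isqrt(n - x * x)
        if x * x + y * y == n:
            return (x, y)
    return None

# Tabulate odd primes, their residue modulo 4, and a representation if one exists.
for p in range(3, 60):
    if is_prime(p):
        print(p, p % 4, two_squares(p))
```

In the printed table, a representation should appear exactly in the rows whose middle column is 1, in agreement with the theorem.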

The discriminant $\Delta_{f}$ of the form $f$ is $b^{2}-4ac$ . If the form takes both negative and positive values when evaluated at $(x,y)\in\mathbb{Z}^{2}$ , we call it indefinite. Otherwise, there are two cases: it can be semi-definite (taking the value $0$ on some non-trivial input) or definite (if it does not). If it is definite, its values away from $(0,0)$ all have the same sign, so it is either positive or negative.

As number theorists, our interest will be in the case $a,b,c\in\mathbb{Q}$ . Up to a scalar multiple, it is convenient to take $a,b,c\in\mathbb{Z}$ having no common factor. This is called a primitive integral binary quadratic form. We are mainly interested in positive definite primitive integral binary quadratic forms. This cumbersome terminology teases the limit of human patience, so some authors abbreviate such forms as ‘PDPIBQFs’ or something similarly awkward; we will just say ‘integral quadratic form’ and imply all the abovementioned qualifiers unless stated otherwise. If we say ‘real quadratic form’ we mean a positive definite binary quadratic form with real coefficients.

We wish to abstract away the choice of coordinates inherent in this definition: we would like to identify $f(x,y)$ and $g(X,Y)$ if they are related by a linear change of variables $X=\alpha x+\beta y$ , $Y=\gamma x+\delta y$ , where $\alpha,\beta,\delta,\gamma\in\mathbb{Z}$ and $\alpha\delta-\beta\gamma=1$ . We write this as a matrix action of $\operatorname{SL}_{2}(\mathbb{Z})$ :

g(X,Y)=M\cdot f(x,y)=f(M\cdot(x,y)^{t}),\quad M=\begin{pmatrix}\alpha&\beta\\ \gamma&\delta\end{pmatrix}\in\operatorname{SL}_{2}(\mathbb{Z}).

We call two such forms properly equivalent or just equivalent (there is also a notion of $\operatorname{GL}_{2}(\mathbb{Z})$ -equivalence, but we will prefer this one for now). Equivalent forms share much of their behaviour. Two equivalent forms will represent the same set of integers upon integer inputs, and will have the same discriminant. If $f$ is primitive, then $M\cdot f$ is also primitive. Finally, observe that the matrix $-I\in\operatorname{SL}_{2}(\mathbb{Z})$ has no effect on a form $f$ , so that it is just as reasonable to talk about the action of $\operatorname{PSL}_{2}(\mathbb{Z}):=\operatorname{SL}_{2}(\mathbb{Z})/\pm I$ .
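To experiment with proper equivalence, one can carry out the change of variables explicitly. The sketch below assumes the convention $(M\cdot f)(x,y)=f(\alpha x+\beta y,\gamma x+\delta y)$ read off from the displayed formula; a form is stored as a triple `(a, b, c)`, and the names `act`, `disc`, `content` and `evaluate` are hypothetical helpers introduced for illustration. It is a numerical sanity check, not a proof.

```python
from math import gcd

def act(M, f):
    """Coefficients of f(alpha*x + beta*y, gamma*x + delta*y) for f = (a, b, c)."""
    (al, be), (ga, de) = M
    a, b, c = f
    return (a * al * al + b * al * ga + c * ga * ga,
            2 * a * al * be + b * (al * de + be * ga) + 2 * c * ga * de,
            a * be * be + b * be * de + c * de * de)

def disc(f):
    """Discriminant b^2 - 4ac."""
    a, b, c = f
    return b * b - 4 * a * c

def content(f):
    """Greatest common divisor of the coefficients (1 means primitive)."""
    a, b, c = f
    return gcd(gcd(abs(a), abs(b)), abs(c))

def evaluate(f, x, y):
    a, b, c = f
    return a * x * x + b * x * y + c * y * y

f = (2, 1, 3)
M = ((2, 1), (1, 1))           # determinant 2*1 - 1*1 = 1
g = act(M, f)
print(g, disc(f), disc(g), content(f), content(g))
# The transformed form takes the value f takes at the transformed point.
assert evaluate(g, 3, -5) == evaluate(f, 2 * 3 + 1 * (-5), 1 * 3 + 1 * (-5))
```

Trying a matrix of determinant other than $1$ shows how the discriminant scales by the square of the determinant.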

Exercise 2.2.

Verify the statement that proper equivalence preserves primitivity, discriminant and the set of integers represented.

Exercise 2.3.

This exercise outlines a geometric proof that $x^{2}+y^{2}$ represents exactly those odd primes that are $1$ modulo $4$ . It turns on the fact that $\pi>2$ .

(1)

Observe that $x^{2}+y^{2}$ cannot represent anything which is $3$ modulo $4$ .
(2)

Let $p\equiv 1~{}(\textup{mod}~{}4)$ . Show that $-1$ is a square modulo $p$ (this can be accomplished using the group theory of $(\mathbb{Z}/p\mathbb{Z})^{*}$ , which is cyclic). Conclude that $p$ divides $m^{2}+1$ for some $m$ .
(3)

Prove Minkowski’s Theorem in dimension two: Any convex set in $\mathbb{R}^{2}$ which is symmetric about the origin and of volume exceeding $4$ contains a non-zero integer point.
(4)

Consider the lattice $\Lambda$ of $\mathbb{R}^{2}$ generated as the $\mathbb{Z}$ -span of $(1,m)$ and $(0,p)$ . Show that all elements $\mathbf{v}\in\Lambda$ have norm squared $||\mathbf{v}||^{2}$ divisible by $p$ . Compute the covolume of this lattice (the area of the fundamental parallelogram).
(5)

Conclude the argument using Minkowski’s Theorem to show there is an element $\mathbf{v}\in\Lambda$ having $||\mathbf{v}||^{2}=p$ .

2.3. Local-to-global

Consider the question of when a positive integer $n$ is a sum of three squares: $n=x^{2}+y^{2}+z^{2}$ . It was first observed by Legendre that this has a solution $(x,y,z)$ if and only if $n$ is not of the form $4^{a}(8b+7)$ ; for $n$ not divisible by $4$ , this says that $n$ is not $7$ modulo $8$ . We can rephrase this to say that, for such $n$ , the equation has a solution in $\mathbb{Z}$ if and only if it has a solution modulo every odd prime, and modulo $8$ . This characterizes $n$ ’s representability in terms of the ‘local’ congruence conditions ‘at’ each prime. In fact, Fermat’s Theorem, that an odd prime can be represented as a sum of two squares if and only if it is $1~{}(\textup{mod}~{}4)$ , can be viewed from this lens also. This is called a local-to-global principle.
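The three squares criterion can be compared against a brute-force search. In the usual statement of Legendre's theorem the excluded integers are those of the form $4^{a}(8b+7)$ , which for $n$ not divisible by $4$ is the condition $n\equiv 7~(\textup{mod}~8)$ . The sketch below checks this for small $n$ ; the function names are hypothetical and the search is deliberately naive.

```python
from math import isqrt

def is_sum_of_three_squares(n: int) -> bool:
    """Brute-force search for x^2 + y^2 + z^2 == n with x, y, z >= 0."""
    for x in range(isqrt(n) + 1):
        for y in range(x, isqrt(n - x * x) + 1):
            z = isqrt(n - x * x - y * y)
            if x * x + y * y + z * z == n:
                return True
    return False

def excluded_by_legendre(n: int) -> bool:
    """True when n > 0 has the form 4^a * (8b + 7)."""
    while n % 4 == 0:
        n //= 4
    return n % 8 == 7

# Compare the search with the congruence criterion for small n.
mismatches = [n for n in range(1, 500)
              if is_sum_of_three_squares(n) == excluded_by_legendre(n)]
print(mismatches)   # an empty list means the two agree on this range
```

Note that $28=4\cdot 7$ is excluded even though it is $4$ modulo $8$ , which is why the power of $4$ has to be stripped first.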

To formalize this just slightly, let $f_{i}(x_{1},\ldots,x_{n})=0$ be a system of polynomial equations with coefficients in $\mathbb{Q}$ . Suppose this system has a solution in $\mathbb{Q}$ , say $(x_{1},\ldots,x_{n})\in\mathbb{Q}^{n}$ . Since $\mathbb{Q}$ embeds in $\mathbb{R}$ and in $\mathbb{Q}_{p}$ for all $p$ , the $\mathbb{Q}$ -solution, called a global solution, will also provide local solutions everywhere, i.e. solutions in $\mathbb{Q}_{p}$ and $\mathbb{R}$ . It is traditional in this setting to consider $\infty$ a prime of $\mathbb{Q}$ , along with the usual primes, calling all of these places or valuations; the primes are finite places and $\infty$ is the infinite place. The field $\mathbb{R}=:\mathbb{Q}_{\infty}$ is the completion of $\mathbb{Q}$ at the place $\infty$ and $\mathbb{Q}_{p}$ is the completion at the place $p$ , for each prime $p$ . These completions, called local fields, are actually easier fields in which to study polynomial solutions. All of this has generalizations for number fields, but we will stick to $\mathbb{Q}$ for the moment.

Thus we have observed that global solutions imply local ones. In the case of quadratic forms, the converse is true:

Theorem 2.4 (Hasse-Minkowski Theorem).

Let $Q(x_{1},\ldots,x_{n})$ be an $n$ -ary quadratic form with coefficients in $\mathbb{Q}$ . Then $Q=0$ has a non-trivial solution in $\mathbb{Q}$ (i.e. $(x_{1},\ldots,x_{n})\in\mathbb{Q}^{n}$ , not all zero) if and only if it has a non-trivial solution in $\mathbb{Q}_{p}$ for all $p$ and in $\mathbb{R}$ .

There is a version of this theorem over number fields more generally. For a proof see [Voi21, Theorem 14.3.3]; it is non-trivial. When such a converse holds, i.e. there are global solutions if and only if there are solutions everywhere locally, then we say that a variety satisfies the Hasse principle or a local-to-global principle. The three squares example of Legendre at the beginning of this section demonstrates an integral flavour of the local-to-global principle, obtained by looking at the integers $\mathbb{Z}$ and $\mathbb{Z}_{p}$ .

The Hasse principle (when it holds) is powerful. In particular, determining if a variety $X$ has solutions over a local field is a finite computation (for $\mathbb{Q}_{p}$ , check for solutions modulo $p$ and then use Hensel’s Lemma). By contrast, determining the existence of global solutions for a variety without the Hasse principle can be quite difficult. In fact, the general problem of determining if there are integer solutions to a Diophantine problem is undecidable; this is known as Hilbert’s Tenth Problem [Mat93].
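To make the phrase "check for solutions modulo $p$ and then use Hensel's Lemma" concrete, here is a minimal sketch of Hensel lifting in the simplest situation: a polynomial in one variable and a root modulo $p$ at which the derivative is non-zero modulo $p$ . The function name `hensel_lift` and its signature are assumptions made for this illustration, and the modular inverse via `pow(x, -1, m)` requires Python 3.8 or later.

```python
def hensel_lift(f, df, r, p, k):
    """Lift a simple root r of f modulo p to a root of f modulo p**k.

    f and df are Python callables for the polynomial and its derivative.
    """
    assert f(r) % p == 0 and df(r) % p != 0, "need a simple root modulo p"
    modulus = p
    for _ in range(k - 1):
        modulus *= p
        # Newton step r <- r - f(r)/f'(r), carried out modulo the next power of p.
        r = (r - f(r) * pow(df(r), -1, modulus)) % modulus
    return r

# Example: a square root of -1 in the 5-adic integers, to precision 5^6.
f = lambda x: x * x + 1
df = lambda x: 2 * x
root = hensel_lift(f, df, 2, 5, 6)
print(root, f(root) % 5**6)
```

Each pass through the loop gains one more $p$ -adic digit, so a single solution modulo $p$ determines a solution in $\mathbb{Z}_{p}$ .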

Exercise 2.5.

Show that $x^{2}+y^{2}+7z^{2}=0$ has no non-trivial $\mathbb{R}$ -solutions, no non-trivial $\mathbb{Q}_{7}$ -solutions and no non-trivial $\mathbb{Q}$ -solutions.

2.4. Quadratic reciprocity

Quadratic reciprocity was first observed by Legendre and Euler and proved by Gauss. Whereas Sunzi’s Theorem (more commonly known as the Chinese Remainder Theorem) can be viewed as a statement that coprime moduli are ‘independent’ in a certain way, quadratic reciprocity describes a deep way in which $\mathbb{Z}/p\mathbb{Z}$ and $\mathbb{Z}/q\mathbb{Z}$ , for primes $p\neq q$ , do interact.

Definition 2.6.

The Legendre symbol is defined for an integer $a$ and an odd prime $p$ as:

\left(\frac{a}{p}\right)=\left\{\begin{array}[]{ll}1&a\text{ is a non-zero square modulo }p\\ -1&a\text{ is not a square modulo }p\\ 0&a\text{ is zero modulo }p\\ \end{array}\right.

Theorem 2.7 (Quadratic reciprocity).

Let $p$ and $q$ be distinct odd primes. Then

\left(\frac{p}{q}\right)\left(\frac{q}{p}\right)=(-1)^{\frac{p-1}{2}\frac{q-1}{2}}

Furthermore,

(1)

$-1$ is a square modulo $p$ if and only if $p\equiv 1~{}(\textup{mod}~{}4)$
(2)

$2$ is a square modulo $p$ if and only if $p\equiv\pm 1~{}(\textup{mod}~{}8)$
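The Legendre symbol is easy to compute, which makes Theorem 2.7 pleasant to test numerically. The sketch below uses Euler's criterion, $\left(\frac{a}{p}\right)\equiv a^{(p-1)/2}~(\textup{mod}~p)$ , which is a standard fact not stated above; the function names are hypothetical.

```python
from math import isqrt

def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)      # r is 0, 1 or p - 1
    return -1 if r == p - 1 else r

odd_primes = [p for p in range(3, 100) if is_prime(p)]
for p in odd_primes:
    # The two supplementary laws.
    assert legendre(-1, p) == (1 if p % 4 == 1 else -1)
    assert legendre(2, p) == (1 if p % 8 in (1, 7) else -1)
    # The main law, for every pair of distinct odd primes below 100.
    for q in odd_primes:
        if q != p:
            sign = (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
            assert legendre(p, q) * legendre(q, p) == sign
print("reciprocity verified for odd primes below 100")
```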

It is said that there are hundreds of proofs of quadratic reciprocity in the literature (indeed, Lemmermeyer maintains a list, currently of over 300 entries: https://www.mathi.uni-heidelberg.de/~flemmermeyer/qrg_proofs.html). Among the themes recurring in these proofs we often see aspects of the Fourier transform, or the signs of permutations, among others. We now provide a proof of Fermat’s Theorem which depends on quadratic reciprocity.

Corollary 2.8.

The integers $n$ which can be represented as a sum of two integer squares are exactly those for which, for each prime $p$ that divides $n$ , either (a) $p$ divides $n$ to an even power; or (b) $p\equiv 1,2~{}(\textup{mod}~{}4)$ .

Proof.

First one shows that a prime $p\equiv 3~{}(\textup{mod}~{}4)$ cannot be represented, simply because there’s no solution modulo $4$ .

Then one shows that primes congruent to $1~{}(\textup{mod}~{}4)$ can be represented, as follows. By quadratic reciprocity, such a prime $p=4k+1$ has the property that $-1$ is a square modulo $p$ . Therefore $4k\equiv-1\equiv x^{2}~{}(\textup{mod}~{}p)$ for some integer $x$ . Thus $np=x^{2}+1$ for some integer $n$ , a sum of squares. Thus $(x+i)(x-i)=np$ in $\mathbb{Z}[i]$ , which has unique factorization. Since the gcd of $x+i$ and $x-i$ is at most a factor of $2i$ , and $p$ is odd, we see that $p$ cannot be prime in $\mathbb{Z}[i]$ (since otherwise, it would divide both, by symmetry under conjugation). Therefore $p=(a+bi)(c+di)$ , where neither factor is a unit times an integer, so $abcd\neq 0$ . Then since $p\in\mathbb{Z}$ , $c+di=k(a-bi)$ for some $k\in\mathbb{Z}$ . Since $p$ is prime, $k=\pm 1$ . So we have $p=a^{2}+b^{2}$ .

Finally, observe that sums of squares are exactly the norms of Gaussian integers. We have shown that every prime $p\equiv 1~{}(\textup{mod}~{}4)$ is such a norm, and no prime $p\equiv 3~{}(\textup{mod}~{}4)$ is such a norm. Also, $2=1^{2}+1^{2}=N(1+i)$ . By unique factorization, the norms are exactly those which can be produced as products of norms of Gaussian primes, which we have now essentially classified. ∎

Exercise 2.9.

(Zagier [Zag90]) There is a short proof of the prime case of Corollary 2.8 using an argument about fixed points. Suppose $p\equiv 1~{}(\textup{mod}~{}4)$ . Let $S=\{(x,y,z)\in(\mathbb{Z}^{>0})^{3}:x^{2}+4yz=p\}$ be the set of triples of natural numbers solving $x^{2}+4yz=p$ .

(1)

Show the following is an involution of $S$ and has exactly one fixed point:

(x,y,z)\mapsto\begin{cases}(x+2z,z,y-x-z)&\text{if }x<y-z;\\ (2y-x,y,x-y+z)&\text{if }y-z<x<2y;\\ (x-2y,x-y+z,y)&\text{if }x>2y.\end{cases}

(2)

What other (more trivial) involutions are there?
(3)

Prove $p$ can be written as a sum of two squares.
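Before attempting the exercise, it can help to watch the map act on the finite set $S$ for a few small primes. The following sketch enumerates $S$ and checks experimentally that the map is an involution of $S$ with a single fixed point; it proves nothing, and the names `zagier_set` and `zagier_map` are hypothetical.

```python
from math import isqrt

def zagier_set(p: int):
    """All triples (x, y, z) of positive integers with x^2 + 4yz == p."""
    S = []
    for x in range(1, isqrt(p) + 1):
        rest = p - x * x
        if rest > 0 and rest % 4 == 0:
            m = rest // 4
            S.extend((x, y, m // y) for y in range(1, m + 1) if m % y == 0)
    return S

def zagier_map(t):
    """The three-case map from the exercise."""
    x, y, z = t
    if x < y - z:
        return (x + 2 * z, z, y - x - z)
    if y - z < x < 2 * y:
        return (2 * y - x, y, x - y + z)
    assert x > 2 * y, "boundary cases do not occur when p is prime"
    return (x - 2 * y, x - y + z, y)

for p in (5, 13, 17, 29, 37, 41):
    S = zagier_set(p)
    assert all(zagier_map(t) in S and zagier_map(zagier_map(t)) == t for t in S)
    fixed = [t for t in S if zagier_map(t) == t]
    print(p, len(S), fixed)
```

The printed sizes of $S$ are worth a second look when thinking about parts (2) and (3).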

2.5. Brauer-Manin obstructions

A famous counterexample to the Hasse principle due to Selmer is given by

3x^{3}+4y^{3}+5z^{3}=0

which has local solutions everywhere but no global solutions. When the Hasse principle fails, many of the failures are captured by a Brauer-Manin obstruction, which arises from reciprocity laws. We will illustrate the phenomenon by an example of a local-to-global failure due to Iskovskikh [Isk71]:

y^{2}+z^{2}=(3-x^{2})(x^{2}-2)

Proposition 2.10.

The equation above has local solutions everywhere, but no global solutions.

Proof.

To show that it has a solution in $\mathbb{R}$ , take $x^{2}>2$ but $x^{2}<3$ so the right hand side is positive. For $\mathbb{Q}_{p}$ , it suffices to find solutions modulo $p$ and lift by Hensel’s lemma; we leave this as an exercise.

To show there are no global $\mathbb{Q}$ -solutions, first, let us homogenize the equation, so we can look for $\mathbb{Z}$ solutions:

X:y^{2}+z^{2}=(3t^{2}-x^{2})(x^{2}-2t^{2})=:A(t,x)B(t,x).

We may assume that there is no common factor to the quadruple $x,y,z,t$ .

First we examine the equation modulo $4$ . By running through the possibilities for $x$ and $t$ being even or odd, we restrict the possible values of $(A,B)$ modulo $4$ : $\{(0,0),(2,3),(3,1),(3,2)\}$ . However, $(A,B)\equiv(0,0)~{}(\textup{mod}~{}4)$ if and only if $x,t$ are both even if and only if $x,y,z,t$ are all even. But we have ruled out such a common factor, so we have the list:

\{(2,3),(3,1),(3,2)\}.

We will now consider primes $p$ dividing at least one of $A$ or $B$ .

First, suppose $p$ divides both $x$ and $t$ . Since $x,y,z,t$ cannot have a common factor, we may assume without loss of generality that $p$ does not divide $z$ . Then considering the equation modulo $p$ , we find that $-1\equiv y^{2}/z^{2}$ is a square modulo $p$ . Since our previous work ruled out $p=2$ (as $x$ and $t$ are not both even), we conclude that $p\equiv 1~{}(\textup{mod}~{}4)$ by the first supplementary law of quadratic reciprocity (Theorem 2.7(1)).

Second, suppose $p$ divides at most one of $x$ and $t$ . Then $p$ cannot divide both $A$ and $B$ . Let $p^{k}$ be the maximum power to which it appears. On the left, $p^{k}$ divides a sum of squares, so $p^{k}$ must be $0$ , $1$ or $2~{}(\textup{mod}~{}4)$ (Corollary 2.8).

Hence in either case, all the maximal prime powers that divide $A$ or $B$ are $0$ , $1$ or $2~{}(\textup{mod}~{}4)$ and therefore $(A,B)~{}(\textup{mod}~{}4)$ can only be a member of the list

\{(0,0),(1,1),(2,2),(0,1),(1,0),(2,0),(0,2),(1,2),(2,1)\}

Comparing with the list from the first part, we discover that there are no solutions. ∎
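The two finite lists in the proof are easy to regenerate by machine, which is a useful guard against slips in the case analysis. The sketch below only reproduces the bookkeeping modulo $4$ ; the number-theoretic input (Corollary 2.8 and the supplementary law) is taken for granted, and the variable names are hypothetical.

```python
from itertools import product

# First list: residues of (A, B) = (3t^2 - x^2, x^2 - 2t^2) modulo 4.
first = {((3 * t * t - x * x) % 4, (x * x - 2 * t * t) % 4)
         for x in range(4) for t in range(4)}
print(sorted(first))            # [(0, 0), (2, 3), (3, 1), (3, 2)]

# (0, 0) only arises when x and t are both even, which was ruled out.
first_primitive = first - {(0, 0)}

# Second list: both A and B are 0, 1 or 2 modulo 4.
second = set(product((0, 1, 2), repeat=2))

# The contradiction: the two lists have nothing in common.
print(first_primitive & second)  # set()
```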

The proof is completely elementary with the exception of the use of the first supplementary law of quadratic reciprocity (once directly and once in the form of Corollary 2.8). Therefore we think of this obstruction as arising from quadratic reciprocity.

Exercise 2.11.

Complete the proof above by finding solutions in $\mathbb{Q}_{p}$ for each $p$ .

2.6. The modular group $\operatorname{PSL}_{2}(\mathbb{Z})$

We met $\operatorname{SL}_{2}(\mathbb{Z})$ and $\operatorname{PSL}_{2}(\mathbb{Z})$ as natural groups of symmetries on quadratic forms. They have many other roles to play in mathematics, some of them geometric. This leads to certain connections between quadratic forms and geometry. Recalling these essential and beautiful facts is our next task.

We will begin with the action of $\operatorname{SL}_{2}(\mathbb{Z})$ on the upper half plane $\mathbb{H}^{2}_{U}$ . We define the upper half plane as

\mathbb{H}^{2}_{U}=\{z\in\mathbb{C}:\Im(z)>0\}\subseteq\mathbb{C}.

The notation $\Im$ is for ‘imaginary part’ (this symbol is LaTeX’s command \Im and was traditionally used in typesetting old European books); later $\Re$ will be for real part. The action is via Möbius transformations:

\begin{pmatrix}a&b\\ c&d\end{pmatrix}\cdot z=\frac{az+b}{cz+d}

in terms of the usual $\mathbb{C}$ structure. The fact that the matrix $-I$ acts as the trivial Möbius transformation motivates our use of the projectivization $\operatorname{PSL}_{2}(\mathbb{Z}):=\operatorname{SL}_{2}(\mathbb{Z})/\pm I$ .

Exercise 2.12.

Möbius transformations with coefficients from $\mathbb{C}$ , i.e. $\operatorname{PSL}_{2}(\mathbb{C})$ , act on the extended complex plane $\widehat{\mathbb{C}}:=\mathbb{C}\cup\{\infty\}$ .

(1)

Amongst these, show that the Möbius transformations preserving $\mathbb{H}^{2}_{U}$ are exactly those with real coefficients and positive determinant.
(2)

In $\widehat{\mathbb{C}}$ , we consider straight lines to be circles through $\infty$ . Show that Möbius transformations take circles to circles and preserve angles.

A standard fundamental region for the action of $\operatorname{PSL}_{2}(\mathbb{Z})$ on $\mathbb{H}^{2}_{U}$ is as follows.

Theorem 2.13 (for example, [Sil94, Proposition 1.5]).

Let

\mathcal{F}=\left\{z\in\mathbb{H}^{2}_{U}:|z|\geq 1,-\frac{1}{2}<\Re(z)\leq\frac{1}{2},\left(|z|=1\Rightarrow\Re(z)\geq 0\right)\right\}.

Then $\mathcal{F}$ is a fundamental domain for the action of $\operatorname{PSL}_{2}(\mathbb{Z})$ on $\mathbb{H}^{2}_{U}$ , meaning that for all $z$ , exactly one $\operatorname{PSL}_{2}(\mathbb{Z})$ -translate of $z$ lies in $\mathcal{F}$ .

To visualize this theorem, it is useful to consider two important elements of $\operatorname{PSL}_{2}(\mathbb{Z})$ :

S=\begin{pmatrix}0&-1\\ 1&0\end{pmatrix},\quad T=\begin{pmatrix}1&1\\ 0&1\end{pmatrix}.

The action of $S$ and $T$ as Möbius transformations is inversion $z\mapsto-1/z$ and translation $z\mapsto z+1$ , respectively. See Figure 2.1 for an image of $\mathcal{F}$ and some of its translates under $S$ and $T$ .

Exercise 2.14.

(1)

Show that if $M=\begin{pmatrix}a&b\\ c&d\end{pmatrix}\in\operatorname{PSL}_{2}(\mathbb{Z})$ , then

$\Im(M\cdot z)=\frac{\Im(z)}{|cz+d|^{2}}.$
(2)

Prove that any $\operatorname{PSL}_{2}(\mathbb{Z})$ orbit intersects $\mathcal{F}$ (this proves a part of Theorem 2.13). To do so, first show that there is an element $M\in\operatorname{PSL}_{2}(\mathbb{Z})$ which maximizes the imaginary part of $M\cdot z$ . Then illustrate how to move this element of the orbit into $\mathcal{F}$ using $S$ and $T$ .
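The strategy suggested in part (2) can be tried out numerically. The sketch below alternates translations by powers of $T$ with the inversion $S$ until the point lands in the closure of $\mathcal{F}$ , and records the matrix used. It works in floating point, so the boundary conventions of $\mathcal{F}$ are only respected up to rounding; the function name `reduce_to_F` and the step limit are assumptions made for this illustration.

```python
import math

def reduce_to_F(z: complex, max_steps: int = 1000):
    """Return (w, M) with w = M.z, |w| >= 1 and -1/2 < Re(w) <= 1/2.

    M is a 2x2 integer matrix ((a, b), (c, d)) of determinant 1.
    """
    a, b, c, d = 1, 0, 0, 1
    for _ in range(max_steps):
        n = math.ceil(z.real - 0.5)        # then -1/2 < Re(z) - n <= 1/2
        z -= n
        a, b = a - n * c, b - n * d        # left-multiply by T^(-n)
        if abs(z) >= 1:
            return z, ((a, b), (c, d))
        z = -1 / z                         # apply S; this increases Im(z)
        a, b, c, d = -c, -d, a, b          # left-multiply by S
    raise RuntimeError("no reduction found within max_steps")

z0 = complex(0.3, 0.01)
w, M = reduce_to_F(z0)
(a, b), (c, d) = M
print(w, M)
print((a * z0 + b) / (c * z0 + d))         # should agree with w up to rounding
```

Each application of $S$ to a point inside the unit circle strictly increases the imaginary part, by the formula in part (1), which is what the termination argument in the exercise exploits.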

The $\operatorname{PSL}_{2}(\mathbb{Z})$ -stabilizer of each point $z\in\mathcal{F}$ is trivial, with the exception of the following special points:

\operatorname{Stab}(i)=\{I,S\},\quad\operatorname{Stab}(e^{2\pi i/6})=\{I,TS,(TS)^{2}\}.

Figure 2.1. The fundamental region $\mathcal{F}$ of $\operatorname{SL}_{2}(\mathbb{Z})$ and some of its translates.

In fact, these two elements $S$ and $T$ generate $\operatorname{PSL}_{2}(\mathbb{Z})$ .

Theorem 2.15 (for example, [Sil94, Corollary 1.6]).

The group $\operatorname{PSL}_{2}(\mathbb{Z})$ is generated by $S$ and $T$ . In fact, $\operatorname{PSL}_{2}(\mathbb{Z})$ is the free product of the subgroups generated by $S$ and $ST$ , of order $2$ and $3$ respectively. We have

\operatorname{PSL}_{2}(\mathbb{Z})\cong\langle S,T:S^{2}=1,(ST)^{3}=1\rangle.
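The two relations in the presentation can be confirmed by multiplying matrices. In $\operatorname{SL}_{2}(\mathbb{Z})$ both products equal $-I$ , which becomes the identity in $\operatorname{PSL}_{2}(\mathbb{Z})$ . This is only a check of the relations; it says nothing about the absence of further relations. The helper `matmul` is a hypothetical name.

```python
def matmul(A, B):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

S = ((0, -1), (1, 0))
T = ((1, 1), (0, 1))
ST = matmul(S, T)

S_squared = matmul(S, S)
ST_cubed = matmul(ST, matmul(ST, ST))
print(S_squared)   # ((-1, 0), (0, -1))
print(ST_cubed)    # ((-1, 0), (0, -1))
```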

2.7. Quadratic forms in the upper half plane

Our next goal is to parametrize real quadratic forms $f(x,y):=ax^{2}+bxy+cy^{2}$ in some useful way. To a quadratic form $f(x,y)=ax^{2}+bxy+cy^{2}$ , we might associate the polynomial in one variable $f(x,1)=ax^{2}+bx+c$ ; and to such a minimal polynomial we might naturally associate the quadratic form $ax^{2}+bxy+cy^{2}$ . This association preserves discriminant.

A minimal polynomial with $\Delta<0$ has two complex roots, one of which will lie in $\mathbb{H}^{2}_{U}$ . Thus, to positive definite quadratic forms, we may associate points in the upper half plane, and this association is clearly a bijection, at least as long as we take the form $f$ up to scaling by an element of $\mathbb{R}^{*}$ .

However, associating the root $z$ (or the pair $(z,\overline{z})$ ) to the form $f$ breaks a symmetry, at the very least between $x$ and $y$ , but more generally it is not invariant under our preferred $\operatorname{SL}_{2}(\mathbb{Z})$ -equivalence. First of all, it is more natural to say that the solutions to $f(x,y)=ax^{2}+bxy+cy^{2}=0$ are the two projective points $[z:1]$ and $[\overline{z}:1]$ in $\mathbb{P}^{1}(\mathbb{C})$ , and therefore to identify a solution $[z:1]$ with $[\lambda z:\lambda 1]$ , for $\lambda\in\mathbb{C}^{*}$ . Then, from the perspective of $\operatorname{SL}_{2}(\mathbb{Z})$ -equivalence of quadratic forms, we wish to identify the roots of $f(x,y)$ with those of $M\cdot f(x,y)$ , for each $M=\begin{pmatrix}\alpha&\beta\\ \gamma&\delta\end{pmatrix}\in\operatorname{SL}_{2}(\mathbb{Z})$ , which are

[\alpha z+\beta:\gamma z+\delta],\quad[\alpha\overline{z}+\beta:\gamma\overline{z}+\delta].

Because we are in projective space, it is more natural to projectivize and identify the roots under $\operatorname{PSL}_{2}(\mathbb{Z})$ -equivalence, since $-I$ has no effect on the form at all (by which I mean, $f(-x,-y)=f(x,y)$ ).

This motivates the following theorem.

Theorem 2.16.

Define a map $\rho$ on the collection of primitive integral binary quadratic forms up to $\mathbb{R}^{*}$ scaling, taking values in $\mathbb{H}^{2}_{U}$ , by letting $\rho(f)=z$ , where $z$ is the root of $f(x,1)$ in $\mathbb{H}^{2}_{U}$ . Then this map is $\operatorname{PSL}_{2}(\mathbb{Z})$ -equivariant, where the action on $f(x,y)$ is by proper equivalence, and the action on $\mathbb{H}^{2}_{U}$ is by Möbius transformation. In other words, $M\cdot\rho(f)=\rho(M\cdot f)$ for $M\in\operatorname{PSL}_{2}(\mathbb{Z})$ .

Exercise 2.17.

Prove Theorem 2.16.

The following exercise addresses classical questions of Gauss and Legendre: how many equivalence classes of primitive integral quadratic forms exist for a fixed discriminant?

Exercise 2.18.

(1)

Show that every real quadratic form is equivalent to one of the form $Ax^{2}+Bxy+Cy^{2}$ satisfying (i) $|B|\leq A\leq C$ and (ii) $B\geq 0$ whenever one of the $\leq$ in part (i) is an equality. Hint: use the upper half plane.
(2)

Use the previous exercise to determine how many inequivalent primitive integral quadratic forms there are of discriminant $-4$ and $-20$ .
(3)

Fix $\Delta<0$ . Prove that there are finitely many distinct equivalence classes of integral quadratic forms of discriminant $\Delta$ . Can you give a bound?

2.8. Lattices in the upper half plane

The upper half plane also parametrizes certain lattices (a common place to first meet this idea is in the complex theory of elliptic curves). A lattice is a discrete (in the metric topology of $\mathbb{C}$ ) subgroup of the additive group of $\mathbb{C}$ . We say it is rank two if it is isomorphic to $\mathbb{Z}^{2}$ as a $\mathbb{Z}$ -module.

Exercise 2.19.

Show that a lattice in $\mathbb{C}$ is of rank two if and only if it spans $\mathbb{C}$ as an $\mathbb{R}$ -vector space.

The fundamental observation is that $\mathbb{P}^{1}(\mathbb{C})$ is (almost) in bijection with rank two lattice bases (up to scaling) in $\mathbb{C}$ , via $[z:w]\leftrightarrow w\mathbb{Z}+z\mathbb{Z}$ . I say almost because points for which $z/w\in\mathbb{R}$ give rise to $\mathbb{Z}$ -modules which do not span $\mathbb{C}$ (being either rank one or not discrete). If we restrict to the upper half plane $\mathbb{H}^{2}_{U}\subseteq\widehat{\mathbb{C}}\cong\mathbb{P}^{1}(\mathbb{C})$ , then this associates to each $z\in\mathbb{H}^{2}_{U}$ the rank two lattice $\mathbb{Z}+z\mathbb{Z}$ . As mentioned, we must consider lattices only up to homothety, i.e., scaling by $\mathbb{C}^{*}$ . That is, we say two lattices $\Lambda_{1}$ and $\Lambda_{2}$ are homothetic, writing $\Lambda_{1}\sim\Lambda_{2}$ , if $\Lambda_{1}=\lambda\Lambda_{2}$ for some $\lambda\in\mathbb{C}^{*}$ . It is not hard to verify this is an equivalence relation.

A rank two lattice basis in $\mathbb{C}$ comes with an orientation: if the angle from the first basis vector to the second is less than $\pi$ , then it is positively oriented and otherwise it is negatively oriented. The condition $\Im(z)>0$ implies that the associated lattice bases are positively oriented.

Theorem 2.20.

The upper half plane $\mathbb{H}^{2}_{U}$ is in bijection with homothety classes of positively oriented rank two lattice bases in $\mathbb{C}$ , by the following map:

z\mapsto\mathbb{Z}+z\mathbb{Z}.

Furthermore, the bijection is $\operatorname{PSL}_{2}(\mathbb{Z})$ -equivariant, where $\operatorname{PSL}_{2}(\mathbb{Z})$ acts on lattices by change of basis:

\begin{pmatrix}a&b\\ c&d\end{pmatrix}\cdot(\mathbb{Z}+z\mathbb{Z})=(cz+d)\mathbb{Z}+(az+b)\mathbb{Z}.

Exercise 2.21.

Prove the theorem.

2.9. Lattices and quadratic forms

Let us return to the question raised above: What is preserved under the action of $\operatorname{PSL}_{2}(\mathbb{Z})$ on quadratic forms, if not the roots $[z:1]$ and $[\overline{z}:1]$ ? The answer: the collection of the totality of roots of all equivalent forms. By the last section, one way to encapsulate this data is as a $\mathbb{Z}$ -lattice, whose bases are in bijection with the roots. If $[z:1]\in\mathbb{P}^{1}(\mathbb{C})$ represents a root of the form $f(x,y)$ , then we set $\Lambda_{f}:=\mathbb{Z}+z\mathbb{Z}$ . Since we always have two conjugate roots, we always have two conjugate lattices, exactly one of which is positively oriented, and we can choose that lattice to associate to our quadratic form.

From an equivalence class of lattices up to homothety, we can also recover the quadratic form: use a homothety to write the lattice $\Lambda$ as $\mathbb{Z}+z\mathbb{Z}$ where $z\in\mathbb{H}^{2}_{U}$ , take the minimal polynomial $ax^{2}+bx+c$ of $z$ , and let $f(x,y)=ax^{2}+bxy+cy^{2}$ , defined up to scaling by $\mathbb{R}^{*}$ . Another way to recover the form from the lattice is to take $\alpha\mathbb{Z}+\beta\mathbb{Z}$ to the quadratic form $N(x,y)=N(y\alpha+x\beta)$ where $N(z)=|z|^{2}$ , the square of the complex absolute value. Thus, the lattice’s relationship to the form can be seen in two interesting ways: first, as the lattice whose bases form the totality of roots of all equivalent forms; and second, as the lattice whose squared vector lengths are the values of the quadratic form. It is perhaps not entirely obvious that these ideas are the same.

Exercise 2.22.

Let $z\in\mathbb{H}^{2}_{U}$ . Define

f(x,y):=\begin{pmatrix}x&y\end{pmatrix}\begin{pmatrix}1\\ -z\end{pmatrix}\overline{\begin{pmatrix}1&-z\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}}.

From this, recover the two interpretations given above of the lattice $\mathbb{Z}+z\mathbb{Z}$ (which is $\operatorname{SL}_{2}(\mathbb{Z})$ -equivalent to $z\mathbb{Z}-\mathbb{Z}$ ) as carrying information about the form $f$ .

In light of this connection, we will use the following notation:

\mathcal{Q}_{\mathbb{R}}:=\left\{\begin{subarray}{c}\text{positive definite real binary quadratic forms}\\ \text{up to scaling by $\mathbb{R}^{*}$}\end{subarray}\right\}

\mathcal{L}_{\mathbb{R}}:=\left\{\begin{subarray}{c}\text{homothety classes of positively oriented rank two $\mathbb{Z}$-lattices $\Lambda=\alpha\mathbb{Z}+\beta\mathbb{Z}$ in $\mathbb{C}$}\end{subarray}\right\}

When we wish to focus on integral forms, we define

\mathcal{Q}_{\mathbb{Z}}:=\left\{\begin{subarray}{c}\text{positive definite primitive}\\ \text{integral binary quadratic forms}\end{subarray}\right\}

\mathcal{L}_{\mathbb{Z}}:=\left\{\begin{subarray}{c}\text{homothety classes of positively oriented rank two $\mathbb{Z}$-lattices $\Lambda=\alpha\mathbb{Z}+\beta\mathbb{Z}$ in $\mathbb{C}$,}\\ \text{where $\alpha/\beta$ is an algebraic number of degree $2$.}\end{subarray}\right\}

Then our description so far can be filled out to obtain the following classical result.

Theorem 2.23.

There is a $\operatorname{PSL}_{2}(\mathbb{Z})$ -equivariant bijection between $\mathcal{Q}_{\mathbb{R}}$ and $\mathcal{L}_{\mathbb{R}}$ , restricting to a bijection between $\mathcal{Q}_{\mathbb{Z}}$ and $\mathcal{L}_{\mathbb{Z}}$ .

Proof.

The $\mathbb{R}$ case is essentially a collection of our work so far. Observe that the map from forms to lattices, on $\mathcal{Q}_{\mathbb{Z}}$ , returns lattices $\mathbb{Z}+z\mathbb{Z}$ where $z$ is a quadratic algebraic number. Conversely, if $z$ is quadratic algebraic, then the form has $\mathbb{Q}$ coefficients and can be normalized to lie in $\mathcal{Q}_{\mathbb{Z}}$ . ∎

In fact, the lattices of $\mathcal{L}_{\mathbb{Z}}$ are exactly those which arise as fractional ideals of imaginary quadratic fields.

Exercise 2.24.

The order of a rank-two lattice $\Lambda=\alpha\mathbb{Z}+\beta\mathbb{Z}\subseteq\mathbb{C}$ is $\operatorname{ord}(\Lambda):=\{z\in\mathbb{C}:z\Lambda\subseteq\Lambda\}$ . Suppose the lattice is associated under Theorem 2.23 to an integral quadratic form. Show that the order is a subring of an imaginary quadratic field. Which field?

2.10. Diophantine approximation

We now return to one of the most basic questions of number theory, which can be asked about the real line, but answered with the geometry of the upper half plane. How do real numbers lie in relation to rational numbers in $\mathbb{R}$ ?

One simple way to answer this question is to describe the decimal system, which is a sort of addressing system for real numbers by successive approximation by rationals with $10$ -power denominator. There are, however, many ways to approximate a real number by rationals. Diophantine approximation asks us to approximate a real number $\alpha$ by the ‘best’ rationals $p/q$ in the sense that $|p/q-\alpha|$ is small while $|q|$ is simultaneously small. One way to measure the quality of an approximation is to study the quantity

-\log_{q}\left(\min_{p}\left|\frac{p}{q}-\alpha\right|\right).

Since we can always expect an approximation to within $\frac{1}{q}$ , this quantity is bounded below by $1$ . But we can often do significantly better. Phrased in what is sometimes a more traditional way, we ask, for an exponent $\eta$ , whether we can find, or can find infinitely many, $p/q$ satisfying

\left|\frac{p}{q}-\alpha\right|<\frac{1}{q^{\eta}}.

Further evidence for the appropriateness of this measure of ‘goodness’ of an approximation is given by Dirichlet’s Theorem.

Theorem 2.25 (Dirichlet).

Let $\alpha\in\mathbb{R}$ . Then $\alpha$ is irrational if and only if there exist infinitely many distinct $p/q\in\mathbb{Q}$ such that

\left|\frac{p}{q}-\alpha\right|<\frac{1}{q^{2}}.

(1)

We interpret this as saying that rational numbers are “poorly approximable” and irrationals are “well approximable.”

Proof.

Consider an irrational $\alpha\in\mathbb{R}$ . Divide the unit interval $[0,1]$ into $Q$ equal subintervals, where $Q>0$ is an integer. Amongst the $Q+1$ real numbers $0,\alpha,2\alpha,\ldots,Q\alpha$ , there must be two whose fractional parts fall into the same subinterval. (The fractional part of $x$ is $x-\lfloor x\rfloor$ , the number $x$ minus the largest integer less than or equal to $x$ ; it lies in the unit interval $[0,1)$ .) Call these $i\alpha$ and $j\alpha$ , where $0\leq i<j\leq Q$ . Then

\left|(j-i)\alpha-p\right|<\frac{1}{Q}

for an appropriate choice of integer $p$ . Then let $q:=j-i\leq Q$ and

\left|\frac{p}{q}-\alpha\right|<\frac{1}{qQ}\leq\frac{1}{q^{2}}.

Thus we have found one example. To generate another example, distinct from any $p_{i}/q_{i}$ that may have come before, we choose $Q^{\prime}$ so that

\left|p_{i}-q_{i}\alpha\right|>\frac{1}{Q^{\prime}}

for all $i$ (notice that this is possible because $\alpha$ is irrational, so $q_{i}\alpha$ is never an integer), and then repeat the argument above. In this way, we generate infinitely many distinct $p/q\in\mathbb{Q}$ having the desired property.

On the other hand, rational numbers ‘repel’ one another in the sense that for any distinct $p_{1}/q_{1},p_{2}/q_{2}\in\mathbb{Q}$ ,

\left|\frac{p_{1}}{q_{1}}-\frac{p_{2}}{q_{2}}\right|\geq\frac{1}{q_{1}q_{2}}.

In particular, for $q_{2}>q_{1}$ ,

\left|\frac{p_{1}}{q_{1}}-\frac{p_{2}}{q_{2}}\right|>\frac{1}{q_{2}^{2}}.

This means that if $\alpha$ is rational, (1) can only be satisfied finitely many times. ∎
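The pigeonhole step in the proof above can be run directly. The following is a minimal sketch, not an efficient method: the function name `dirichlet_approximation` is hypothetical, and it uses floating-point arithmetic, so it is only reliable when no fractional part lands extremely close to a subinterval boundary.

```python
import math

def dirichlet_approximation(alpha, Q):
    """Return (p, q) with 1 <= q <= Q and |q*alpha - p| < 1/Q, by pigeonhole."""
    seen = {}  # subinterval index -> multiplier whose fractional part landed there
    for i in range(Q + 1):
        frac = i * alpha - math.floor(i * alpha)
        box = min(int(frac * Q), Q - 1)  # index of the subinterval [box/Q, (box+1)/Q)
        if box in seen:
            j = seen[box]                # j < i, and j*alpha, i*alpha share a subinterval
            q = i - j
            p = math.floor(i * alpha) - math.floor(j * alpha)
            return p, q
        seen[box] = i
    raise AssertionError("unreachable: Q + 1 points in Q boxes must collide")

p, q = dirichlet_approximation(math.sqrt(2), 10)
print(p, q, abs(p / q - math.sqrt(2)))
```

For $\alpha=\sqrt{2}$ and $Q=10$ the first collision occurs between $0\cdot\alpha$ and $5\alpha$ , producing the approximation $7/5$ .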

The theorem is illustrated in Figure 2.2. Dirichlet’s theorem is sharp in the sense that it fails for any exponent exceeding $2$ on the right side.

Exercise 2.26.

We will show that the golden ratio $\alpha=\frac{1+\sqrt{5}}{2}$ is particularly poorly approximable (enjoy this: https://slate.com/technology/2021/06/golden-ratio-phi-irrational-number-ellenberg-shape.html). Let $f$ be the minimal polynomial for $\alpha$ . Let $p/q\in\mathbb{Q}$ . Obtain a lower bound for $|f(p/q)|$ in terms of $q$ . Factoring $f(p/q)=(p/q-\alpha)(p/q-\overline{\alpha})$ , and bounding the second factor above, prove that there exists some constant $K$ such that $\left|\frac{p}{q}-\alpha\right|\geq\frac{1}{Kq^{2}}$ for all but finitely many rationals $p/q$ . Fun fact: $\alpha$ is best approximated by ratios $F_{n}/F_{n-1}$ where $F_{n}$ is the $n$ -th Fibonacci number.

2.11. The Farey subdivision

How do we find the set of ‘good approximations,’ that is to say, solutions to (1)? The pigeonhole principle proof above provides an algorithm, albeit a slow one. There is a geometric story, however, which gives rise to a very efficient algorithm: the theory of continued fractions. My favourite version of this story is due to Caroline Series [Ser85a, Ser85b].

We need a little of the geometry of $\mathbb{H}^{2}_{U}$ here, which we will detail further in a subsequent section. In particular, $\mathbb{H}^{2}_{U}$ is a model of the hyperbolic plane in which geodesics are exactly the straight vertical lines, and the upper half-circles centered on the real line (see Figure 2.3). For Series’ story, we study the Farey tessellation, shown in Figure 2.6. To generate this image, do the following:

(1)

draw vertical lines upward from each integer;
(2)

connect each pair of consecutive integers by a geodesic;
(3)

for each pair of rational numbers $p_{1}/q_{1}$ and $p_{2}/q_{2}$ connected by a geodesic, define their mediant $(p_{1}+p_{2})/(q_{1}+q_{2})$ and draw a geodesic connecting $p_{1}/q_{1}$ to the mediant, as well as a geodesic connecting $p_{2}/q_{2}$ to the mediant;
(4)

repeat the last step, ad infinitum.
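The steps above can be sketched in a few lines, tracking only the rational endpoints inside $[0,1]$ rather than drawing the geodesics. The function name `farey_levels` is a hypothetical choice, and the sketch takes mediants only of adjacent endpoints, which produces the same new endpoints at each round.

```python
def farey_levels(levels):
    """Endpoints in [0, 1] after `levels` rounds of mediant insertion."""
    # Fractions are stored as (p, q) pairs, in increasing order.
    points = [(0, 1), (1, 1)]
    for _ in range(levels):
        refined = [points[0]]
        for (p1, q1), (p2, q2) in zip(points, points[1:]):
            refined.append((p1 + p2, q1 + q2))  # the mediant of two neighbours
            refined.append((p2, q2))
        points = refined
    return points

print(farey_levels(2))  # [(0, 1), (1, 3), (1, 2), (2, 3), (1, 1)]
```

Neighbouring endpoints $p_{1}/q_{1}<p_{2}/q_{2}$ in the output always satisfy $p_{2}q_{1}-p_{1}q_{2}=1$ , so every pair produced is automatically in lowest terms.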

This mediant operation is actually quite natural if we view $\mathbb{Q}\cup\{\infty\}$ as $\mathbb{P}^{1}(\mathbb{Z})$ , the projectivization of the square lattice $\mathbb{Z}^{2}$ . (For any ring $R$ , my notation for a projective space is: $\mathbb{P}^{n}(R):=\{\mathbf{x}:=(x_{0},\ldots,x_{n})\neq 0:x_{i}\in R\}/(\mathbf{x}=\lambda\mathbf{x},\lambda\in R^{*})$ . In topological or geometric contexts, authors write $F\mathbb{P}^{n}$ for the projective space over the field $F$ .) Each rational number, by taking it in reduced form, corresponds to an element of $\mathbb{Z}^{2}$ with a sightline to the origin (i.e. a primitive vector). Then the mediant operation is vector addition of these primitive vectors (reduced fractions). One way to ‘see’ this is to consider the lattice $\mathbb{Z}^{2}$ in the plane. Standing at the origin and looking out, we see the vertices of the lattice as a copy of $\mathbb{P}^{1}(\mathbb{Z})$ (see Figure 2.4). The ‘nearby’ points are those with small vector entries, and these occur earlier in the Farey subdivision. Visually, if the lattice points are indicated by spheres, they would appear as larger dots (compare Figures 2.4 and 2.5).

This process actually generates all the rational numbers, as the following exercise demonstrates.

Exercise 2.27.

Begin with any positive rational number in lowest form $p/q$ . Show that there exists $u/v\in\mathbb{Q}$ such that $pv-uq=1$ . Then perform a Euclidean-style algorithm on the vectors $\begin{pmatrix}p\\ q\end{pmatrix}$ and $\begin{pmatrix}u\\ v\end{pmatrix}$ , repeatedly subtracting one from the other until attaining the pair of standard basis vectors for $\mathbb{Z}^{2}$ . Explain why this is a proof that the Farey tessellation generates all rational numbers as geodesic endpoints.

In this image, there is a ‘bubble,’ or geodesic, lying above the unit interval $[0,1]$ and then it is further subdivided into two bubbles, each of which is further subdivided, etc. This gives a way to ‘organize’ the real line: compare this to the more common decimal subdivision in Figure 2.7. In either system, we can specify the location of a real number by indicating which interval it lies in, then which subinterval of that, etc., iteratively. In the decimal system, we choose the leftmost endpoint of each successive interval, and call the resulting sequence the decimal expansion.

What if we use the Farey subdivision? We obtain the continued fraction expansion. To describe the ‘path’ to $\alpha$ through the Farey ‘froth’ of semicircles, we use the language of geodesics. It was long known that continued fractions were connected to the paths of geodesics in the upper half plane and to $\operatorname{PSL}_{2}(\mathbb{Z})$ , but it was Caroline Series who realized that the Farey subdivision is perhaps the most natural way to describe this relationship.

Definition 2.28.

Consider any geodesic, heading out from the imaginary axis to a positive real number $\alpha$ . Label it by L or R as it passes through each region of the Farey tessellation (or subdivision), as follows. The geodesic cuts each triangular fundamental region it passes through into two parts, one having one cusp, and one having two. Label it L if the region to the left of the geodesic (from the perspective of its direction of travel) has one cusp, and R otherwise (Figure 2.8). If the geodesic heads directly into a cusp, label with either an L or an R. The resulting word (generated from left to right), is the Series continued fraction.

The sequence $L^{a_{0}}R^{a_{1}}\cdots$ is an example of a cutting sequence. See Figure 2.9 for an example.

Exercise 2.29.

Determine the beginning of the continued fraction expansion of $e$ , and the two full expansions for $17/5$ .

Returning to the lattice interpretation, the element $\alpha\in\mathbb{R}$ corresponds to a ray from the origin at slope $\alpha$ . If $\alpha$ is irrational, it will never hit a lattice point. However, there are ‘near misses:’ lattice points that it passes close to as it heads out from the origin. These are the corners of the fundamental triangles of the Farey tessellation at which we ‘turn’ left or right.

2.12. The matrix continued fraction expansion

To formalize the continued fraction process and the good rational approximations it will produce, we consider $\operatorname{SL}_{2}^{+}(\mathbb{Z})$ (the $+$ indicates that we include only those matrices whose entries are non-negative) acting on $\mathbb{P}^{1}(\mathbb{Q})$ . This monoid (group without inverses), or semigroup (a semigroup is a group without the requirement for inverses or an identity; in the literature of continued fractions, $\operatorname{SL}_{2}^{+}(\mathbb{Z})$ is often called a semigroup, although we would normally include the identity), is generated from the identity matrix by two matrices:

M_{L}=\begin{pmatrix}1&1\\ 0&1\end{pmatrix},\quad M_{R}=\begin{pmatrix}1&0\\ 1&1\end{pmatrix}.

Phrased another way, $\operatorname{SL}_{2}^{+}(\mathbb{Z})$ is exactly the collection of matrices formed as finite words (including the empty word) in the letters $M_{L}$ and $M_{R}$ . Form a tree whose root is the identity matrix, and for each matrix, the left child is obtained by multiplying on the right by $M_{L}$ and the right child is obtained by multiplying on the right by $M_{R}$ , as illustrated in Figure 2.10.

Exercise 2.30.

Prove that $\operatorname{SL}_{2}^{+}(\mathbb{Z})$ is generated as a semigroup by $M_{L}$ and $M_{R}$ . (See Exercise 2.27.) Show that $\operatorname{PSL}_{2}(\mathbb{Z})$ is generated as a group from $\operatorname{SL}_{2}^{+}(\mathbb{Z})$ by the addition of $\begin{pmatrix}0&-1\\ 1&0\end{pmatrix}$ .

Figure 2.10. The $\operatorname{SL}_{2}^{+}(\mathbb{Z})$ matrix tree, shown to four levels.

From the previous exercise, one can conclude that the Farey tessellation can equivalently be described as the orbit in $\mathbb{H}^{2}_{U}$ of the geodesic from $0$ to $\infty$ under $\operatorname{PSL}_{2}(\mathbb{Z})$ .

Each geodesic in the picture, once we choose a direction, corresponds to a unique element $\begin{pmatrix}p_{1}&p_{2}\\ q_{1}&q_{2}\end{pmatrix}$ by writing its rational endpoints $p_{1}/q_{1}$ and $p_{2}/q_{2}$ as column vectors. In fact, this matrix lies in $\operatorname{PSL}_{2}(\mathbb{Z})$ as the image of the vertical imaginary axis (in a downward direction), which corresponds to $\begin{pmatrix}1&0\\ 0&1\end{pmatrix}$ , connecting $0$ to $\infty$ . Therefore, the pairs of rational numbers connected by a geodesic have the property that $p_{1}q_{2}-p_{2}q_{1}=1$ , which earns them the name of a unimodular pair.

The continued fraction of a real number $\alpha$ is more commonly defined by an iterative algebraic process, generating an expression

\alpha=a_{0}+\dfrac{1}{a_{1}+\dfrac{1}{a_{2}+\dfrac{1}{a_{3}+\cdots}}}.

(2)

The finite truncations of this expression give rational numbers

p_{k}/q_{k}=a_{0}+\dfrac{1}{a_{1}+\dfrac{1}{a_{2}+\dfrac{1}{a_{3}+\cdots+1/a_{k}}}},

called the convergents of $\alpha$ . The $a_{i}$ are called the partial quotients or coefficients.
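For a rational input, the iterative algebraic process can be sketched with exact arithmetic as follows. The function names are hypothetical, and the recurrence $p_{k}=a_{k}p_{k-1}+p_{k-2}$ , $q_{k}=a_{k}q_{k-1}+q_{k-2}$ used to assemble the convergents is the standard one, which is not derived in these notes.

```python
from fractions import Fraction
import math

def partial_quotients(x):
    """Partial quotients a_0, a_1, ... of a rational x, by floor-and-invert."""
    quotients = []
    while True:
        a = math.floor(x)
        quotients.append(a)
        x -= a
        if x == 0:          # the expansion of a rational terminates
            return quotients
        x = 1 / x

def convergents(quotients):
    """Convergents p_k/q_k built from the partial quotients by the recurrence."""
    p_prev, q_prev = 1, 0          # formal convergent p_{-1}/q_{-1}
    p, q = quotients[0], 1
    result = [Fraction(p, q)]
    for a in quotients[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append(Fraction(p, q))
    return result

a = partial_quotients(Fraction(10, 7))
print(a, convergents(a))
```

For $10/7$ this gives partial quotients $1,2,3$ and convergents $1$ , $3/2$ , $10/7$ . Of the two expansions of a rational number, this procedure returns the shorter one.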

Now a cutting sequence through the Farey tessellation can be interpreted as a word in $M_{R}$ and $M_{L}$ , by simply replacing $R\leftarrow M_{R}$ and $L\leftarrow M_{L}$ . This is justified by the observation that if a triangular region $T$ has endpoints $a_{1}/b_{1}$ , $(a_{1}+a_{2})/(b_{1}+b_{2})$ , $a_{2}/b_{2}$ , then the triangular region $TM_{L}$ (interpreting the matrix as a Möbius transformation) has triangular endpoints $a_{1}/b_{1}$ , $(2a_{1}+a_{2})/(2b_{1}+b_{2})$ , $(a_{1}+a_{2})/(b_{1}+b_{2})$ , i.e., it corresponds to the left sub-interval.

The following theorem now states that the ‘traditional’ expansion (2) is the same as Series’ geodesic expansion interpreted as matrices.

Theorem 2.31 (Series [Ser85b]).

Suppose

L^{a_{0}}R^{a_{1}}L^{a_{2}}R^{a_{3}}\cdots

is the L,R-sequence for a geodesic originating on the imaginary axis and ending at $\alpha\in\mathbb{R}$ . Then a classical continued fraction expansion of $\alpha$ is

a_{0}+\dfrac{1}{a_{1}+\dfrac{1}{a_{2}+\dfrac{1}{a_{3}+\cdots}}}.

In more detail, if $\alpha$ has cutting sequence $L^{a_{0}}R^{a_{1}}\cdots$ , define $M_{n}:=M_{L}^{a_{0}}M_{R}^{a_{1}}\cdots M_{*}^{a_{n}}\in\operatorname{SL}_{2}^{+}(\mathbb{Z})$ . Then the convergents of the continued fraction expansion of $\alpha$ are given by

\begin{pmatrix}p_{n}\\ q_{n}\end{pmatrix}=M_{n}\begin{pmatrix}1\\ 0\end{pmatrix},\text{ or }M_{n}\begin{pmatrix}0\\ 1\end{pmatrix},

depending on whether the last letter was $R$ or $L$ , respectively. Furthermore, if $\alpha$ is rational with lowest terms representation $p/q$ , then there are exactly two expansions of $\alpha$ (since heading into a cusp is the last step), and we have

\begin{pmatrix}p\\ q\end{pmatrix}=\left\{\begin{array}[]{ll}L^{a_{0}}R^{a_{1}}L^{a_{2}}R^{a_{3}}\cdots\begin{pmatrix}1\\ 0\end{pmatrix}&\text{if the expansion ends in R}\\ L^{a_{0}}R^{a_{1}}L^{a_{2}}R^{a_{3}}\cdots\begin{pmatrix}0\\ 1\end{pmatrix}&\text{if the expansion ends in L}\\ \end{array}\right.

Observe that ‘heading into the mediant’ to end a rational approximation would mean evaluating at the vector $(1,1)^{t}$ , so instead we head one step lower to either left or right; this constitutes the ambiguity in the second part of the theorem.
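The matrix form of the expansion can be explored with a short sketch. It assumes a reading of the conventions above in which the first column of a matrix in $\operatorname{SL}_{2}^{+}(\mathbb{Z})$ is the larger endpoint of the current interval, so that the letter $L$ is taken when $\alpha$ lies above the mediant and $R$ when it lies below. The names `matmul` and `cutting_word` are hypothetical.

```python
from fractions import Fraction

M_L = ((1, 1), (0, 1))
M_R = ((1, 0), (1, 1))
IDENTITY = ((1, 0), (0, 1))

def matmul(A, B):
    """Product of two 2x2 integer matrices stored as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def cutting_word(alpha, max_letters=50):
    """Descend toward alpha > 0; return the L/R word and the matrix it spells.

    Stops early when alpha equals the mediant (the geodesic heads into a cusp).
    """
    word, M = "", IDENTITY
    for _ in range(max_letters):
        mediant = Fraction(M[0][0] + M[0][1], M[1][0] + M[1][1])
        if alpha == mediant:
            break
        if alpha > mediant:
            word, M = word + "L", matmul(M, M_L)
        else:
            word, M = word + "R", matmul(M, M_R)
    return word, M

word, M = cutting_word(Fraction(10, 7))
print(word, M)                 # LRRLL ((3, 7), (2, 5))
print(matmul(M, M_L))          # second column is (10, 7)
print(matmul(M, M_R))          # first column is (10, 7)
```

For $\alpha=10/7$ the descent stops after the word $LR^{2}L^{2}$ , when the mediant equals $\alpha$ . Appending a final $L$ and evaluating at $(0,1)^{t}$ , or appending a final $R$ and evaluating at $(1,0)^{t}$ , both return $(10,7)^{t}$ , matching the two expansions $[1;2,3]$ and $[1;2,2,1]$ .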

The following exercise should convince you that Theorem 2.31 is correct.

Exercise 2.32.

Computationally verify the following:

L^{k}\begin{pmatrix}0\\ 1\end{pmatrix}=\begin{pmatrix}k\\ 1\end{pmatrix},\quad L^{k}R^{\ell}\begin{pmatrix}1\\ 0\end{pmatrix}=\begin{pmatrix}1+\ell k\\ \ell\end{pmatrix},\quad L^{k}R^{\ell}L^{t}\begin{pmatrix}0\\ 1\end{pmatrix}=\begin{pmatrix}t+k+k\ell t\\ \ell t+1\end{pmatrix}.

Verify also that

\frac{k}{1}=k,\quad\frac{1+\ell k}{\ell}=k+\frac{1}{\ell},\quad\frac{t+k+k\ell t}{\ell t+1}=k+\frac{1}{\ell+\frac{1}{t}}.

Exercise 2.33.

Verify Theorem 2.31 for the beginning of the continued fraction expansion of $e$ , and the two full expansions for $17/5$ .

We complete our study of continued fractions with one of the principal reasons they are studied: that the continued fraction convergents capture good approximations.

Theorem 2.34.

Any rational approximation $p/q\in\mathbb{Q}$ to $\alpha\in\mathbb{R}$ satisfying $\left|\frac{p}{q}-\alpha\right|<\frac{1}{2q^{2}}$ is a continued fraction convergent of $\alpha$ .
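A brute-force experiment can make the theorem plausible before proving it. The sketch below uses exact rational arithmetic; the test value $103993/33102$ (a rational close to $\pi$ ) and the denominator bound $200$ are arbitrary choices, and the function names are hypothetical.

```python
from fractions import Fraction
import math

def convergents_of(alpha):
    """All convergents of a rational alpha, by the floor-and-invert algorithm."""
    result = []
    p_prev, q_prev, p, q = 0, 1, 1, 0   # formal seeds p_{-2}/q_{-2} and p_{-1}/q_{-1}
    x = alpha
    while True:
        a = math.floor(x)
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append(Fraction(p, q))
        if x == a:
            return result
        x = 1 / (x - a)

def good_approximations(alpha, max_q):
    """Rationals p/q with q <= max_q and |p/q - alpha| < 1/(2 q^2)."""
    found = set()
    for q in range(1, max_q + 1):
        base = math.floor(alpha * q)
        for p in (base, base + 1):      # only the nearest numerators can qualify
            if abs(Fraction(p, q) - alpha) < Fraction(1, 2 * q * q):
                found.add(Fraction(p, q))
    return found

alpha = Fraction(103993, 33102)
print(sorted(good_approximations(alpha, 200)))
print(convergents_of(alpha))
```

Every fraction in the first list also appears in the second, as the theorem predicts. The converse fails in general: a convergent need not satisfy the bound $1/(2q^{2})$ .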

Exercise 2.35.

To prove Theorem 2.34, do the following:

(1)

Let $p/q$ be a good approximation in the sense of Theorem 2.34. Suppose without loss of generality that $p/q<\alpha$ . Show that $p/q$ must be the closest rational to $\alpha$ , from below, with denominator $\leq q$ .
(2)

Let $p^{\prime}/q^{\prime}$ be the closest rational to $\alpha$ , from above, with denominator $\leq q$ .
(3)

Conclude that $p/q$ and $p^{\prime}/q^{\prime}$ are a unimodular pair and therefore the endpoints of a bubble in the Farey tessellation containing $\alpha$ .
(4)

Now consider the mediant $M$ between these two rational numbers. Show that it must lie to the right of $\alpha$ , by measuring the distance between $p/q$ and $M$ (and using the good approximation property of $p/q$ ).
(5)

Show that if $\alpha$ lies in a bubble of the Farey tessellation, then the bubble endpoint with smallest denominator is a convergent.

2.13. Indefinite quadratic forms and real quadratic irrationalities

Recall that definite quadratic forms have a reduction theory (Exercise 2.18 and Figure 2.1). In short, reduction of a quadratic form is a process of navigating through $\operatorname{PSL}_{2}(\mathbb{Z})$ to obtain a reduced representative of the equivalence class. There is also a reduction theory of indefinite quadratic forms, and it can be explained with the Farey tessellation and is closely related to continued fractions. We turn to this now.

Primitive integral indefinite quadratic forms $ax^{2}+bxy+cy^{2}$ , $a,b,c\in\mathbb{Z}$ , correspond to a conjugate pair of roots $\alpha,\overline{\alpha}=\frac{-b\pm\sqrt{\Delta}}{2a}$ in the real line. Such a pair $\alpha$ , $\overline{\alpha}$ is called reduced if (up to swapping the two roots),

\overline{\alpha}<-1<0<\alpha<1.

It is an elementary exercise to show that this is equivalent to the conditions

0\leq b<\sqrt{\Delta},\quad 0<\sqrt{\Delta}-b\leq 2a\leq\sqrt{\Delta}+b,

or, to the satisfyingly simple

b>|a+c|,\quad ac<0.

Exercise 2.36.

Show the equivalences just mentioned.

In particular, fixing $\Delta$ , there are only finitely many possible values for $b$ , therefore only finitely many possible $a$ , and for each such pair only one possible $c$ . So there are only finitely many reduced forms for each discriminant.

Next, observe that the transformation

(\alpha,\overline{\alpha})\mapsto\left(\frac{1}{\alpha}-\left\lfloor\frac{1}{\alpha}\right\rfloor,\frac{1}{\overline{\alpha}}-\left\lfloor\frac{1}{\alpha}\right\rfloor\right),

takes $(\alpha,\overline{\alpha})$ to another reduced pair. Furthermore, under the reduced assumption, this operation on pairs is a bijection, with inverse

(\alpha,\overline{\alpha})\mapsto\left(\frac{1}{\alpha-1-\lfloor\overline{\alpha}\rfloor},\frac{1}{\overline{\alpha}-1-\lfloor\overline{\alpha}\rfloor}\right).

Thus, the reduced forms fall into cycles, where each step is realized by a bespoke Möbius map (namely, $z\mapsto 1/z-\lfloor 1/\alpha\rfloor$ ).
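The cycles can be computed exactly with integer arithmetic. The sketch below represents a reduced root as $\alpha=(P+\sqrt{D})/Q$ with integers $P$ , $Q>0$ , $D$ and $Q$ dividing $D-P^{2}$ ; this encoding and the function names are assumptions made for illustration, not notation from these notes.

```python
from math import isqrt

def gauss_step(P, Q, D):
    """Apply alpha -> 1/alpha - floor(1/alpha) to alpha = (P + sqrt(D)) / Q.

    Assumes alpha is reduced, Q > 0, and Q divides D - P*P.
    Returns floor(1/alpha) together with the new pair (P, Q).
    """
    Q_inv = (D - P * P) // Q        # 1/alpha = (-P + sqrt(D)) / Q_inv, with Q_inv > 0
    n = (-P + isqrt(D)) // Q_inv    # floor(1/alpha), computed without floating point
    return n, (-P - n * Q_inv, Q_inv)

def reduced_cycle(P, Q, D):
    """Iterate gauss_step until a pair repeats; return the cycle and the quotients."""
    cycle, quotients = [], []
    state = (P, Q)
    while state not in cycle:
        cycle.append(state)
        n, state = gauss_step(state[0], state[1], D)
        quotients.append(n)
    return cycle, quotients

# alpha = sqrt(3) - 1 is reduced: its conjugate is -sqrt(3) - 1 < -1.
print(reduced_cycle(-1, 1, 3))   # ([(-1, 1), (-1, 2)], [1, 2])
```

Starting from $\alpha=\sqrt{3}-1$ , the cycle has length two, passing through $(\sqrt{3}-1)/2$ and recording the quotients $1,2$ , in agreement with the periodic expansion $\sqrt{3}-1=[0;1,2,1,2,\ldots]$ .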

Proposition 2.37.

For each primitive integral indefinite binary quadratic form $ax^{2}+bxy+cy^{2}$ corresponding to pair $(\alpha,\overline{\alpha})$ , there is a reduced form with pair $(\alpha^{\prime},\overline{\alpha}^{\prime})$ such that $\alpha=M\alpha^{\prime}$ where $M\in\operatorname{PSL}_{2}^{+}(\mathbb{Z})$ .

Proof.

We prove it only in the case that $\alpha>\overline{\alpha}>0$ (other cases can be reduced to this case). There is some triangle of the Farey tessellation with corners $a$ , $b$ , $c$ such that $0<a<\overline{\alpha}<b<\alpha<c$ . Then, by the transitivity of the Farey tessellation, this triangle is an image under some $M_{0}\in\operatorname{PSL}_{2}^{+}(\mathbb{Z})$ of the triangle with corners $0$ , $1$ and $\infty$ , in such a way that $0<\overline{\alpha}_{0}<1<\alpha_{0}<\infty$ , and $M_{0}(\alpha_{0})=\alpha$ , $M_{0}(\overline{\alpha}_{0})=\overline{\alpha}$ . By composing with a translation, we can assume instead that $\overline{\alpha}_{0}<0<\alpha_{0}<1$ .

If $\overline{\alpha}_{0}<-1<0<\alpha_{0}<1$ , we are done. If not, then $-1<\overline{\alpha}_{0}<0<\alpha_{0}<1$ , and there is some $\alpha_{1},\overline{\alpha}_{1}$ with $\overline{\alpha}_{1}<-1<0<\alpha_{1}<1$ and $M_{1}\in\operatorname{PSL}_{2}(\mathbb{Z})$ such that $M_{1}(\alpha_{1})=\alpha_{0}$ , $M_{1}(\overline{\alpha}_{1})=\overline{\alpha}_{0}$ ; in fact $M_{1}=\begin{pmatrix}0&-1\\ 1&0\end{pmatrix}\begin{pmatrix}1&b\\ 0&1\end{pmatrix}$ , $b\geq 1$ will suffice. Note that although $M_{1}\notin\operatorname{PSL}_{2}^{+}(\mathbb{Z})$ , the composition $M_{0}M_{1}\in\operatorname{PSL}_{2}^{+}(\mathbb{Z})$ . The proof is illustrated in Figure 2.11. ∎

Combining the proof above, which gives the ‘pre-periodic’ portion of a continued fraction expansion, with the observed cycles of reduced forms, we obtain a corollary.

Corollary 2.38.

Let $\alpha$ be a real quadratic irrational. Then the continued fraction expansion of $\alpha$ is eventually periodic, and perfectly periodic if $\alpha$ is reduced.

Exercise 2.39.

Provide the details of the corollary above. Amongst these details is the converse that any eventually periodic continued fraction converges to a real quadratic irrational.

Exercise 2.40.

Determine the automorphisms of an indefinite form, by using the fact that such an automorphism is a Möbius transformation fixing the roots. Show that such automorphisms are given by solutions to the Pell equation $X^{2}-\Delta Y^{2}=4$ . (As is the case with so much mathematical nomenclature, this equation has at best a tenuous connection to the person it is named after. Pell revised a translation of a book that discussed it, approximately two millennia after it was first discussed.) Next, observe that the Möbius transformations that circumnavigate once around a cycle of indefinite forms (as described above) fix a reduced form, and therefore correspond to solutions to Pell’s equation. This leads to an algorithm to compute solutions to Pell’s equation by continued fractions. (Such automorphisms correspond to multiplying the associated ideal of $\mathbb{Q}(\sqrt{\Delta})$ by a unit.)

The Farey diagram has a dual graph called the topograph, put to lovely use by Conway [Con97] to prove all the basic facts about binary quadratic forms, both definite and indefinite. A wonderful description of this point of view is Allen Hatcher’s Topology of Numbers [Hat22].

2.14. Lagrange spectrum

Dirichlet’s Theorem differentiates the approximation properties of rationals and irrationals. We might ask a more nuanced question, which is, given a constant $C$ , whether for an irrational $\alpha$ , there are infinitely many $p/q\in\mathbb{Q}$ such that

\left|\frac{p}{q}-\alpha\right|<\frac{C}{q^{2}}.

For $C=1$ , there are infinitely many such approximations. This remains true as $C$ decreases to $1/\sqrt{5}$ , where Hurwitz’ Theorem states that it still holds. But after that, there are only finitely many approximations for any $\alpha$ whose continued fraction ends in all $1$ ’s. These are called ‘noble numbers’ and include the golden ratio. More formally, we can define

v(\alpha)=\inf\{C:|\alpha-p/q|<C/q^{2}\text{ for infinitely many }p/q\in\mathbb{Q}\}.
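For the golden ratio, the value $1/\sqrt{5}$ can be observed numerically along ratios of consecutive Fibonacci numbers. This is only a floating-point illustration under the assumption that moderate indices suffice; the function name `scaled_errors` is hypothetical, and precision degrades for large Fibonacci numbers.

```python
import math

def scaled_errors(count):
    """Values q^2 * |p/q - phi| for p/q = F_{n+1}/F_n, n = 1..count."""
    phi = (1 + math.sqrt(5)) / 2
    values = []
    p, q = 1, 1                               # F_2 / F_1
    for _ in range(count):
        values.append(q * abs(p - q * phi))   # equals q^2 * |p/q - phi|
        p, q = p + q, p                       # next ratio of consecutive Fibonacci numbers
    return values

values = scaled_errors(20)
print(values[-1], 1 / math.sqrt(5))
```

The printed values agree to about eight decimal places. The sequence oscillates around $1/\sqrt{5}$ , so for any $C>1/\sqrt{5}$ these ratios eventually satisfy the inequality, while for $C<1/\sqrt{5}$ only finitely many of them do.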

Markoff showed that there is a discrete set of values $v(\alpha)$ above $1/3$ ; the $\alpha$ that realize these discrete values are called Markoff irrationalities, and the values themselves are the discrete part of the Lagrange spectrum (sometimes called Markoff spectrum, as for example by Series [Ser85a]; the terminology is occasionally murky).

It turns out that the discrete elements of the Markoff spectrum are associated to a simple loop on the punctured torus. See [Ser85b, Ser85a] for the details of this story, but in short, one can obtain the punctured torus as a quotient of $\mathbb{H}^{2}_{U}$ by a subgroup of $\operatorname{PSL}_{2}(\mathbb{Z})$ , and the geodesic in question is the geodesic whose endpoint is the irrational to be approximated, and the cutting sequence in this situation is closely related to the continued fraction expansion.

2.15. Roth’s Theorem

The Diophantine approximation of algebraic numbers has been of particular interest. It turns out that they are ‘poorly approximable’ in the sense that the exponent 2 in Dirichlet’s Theorem is best possible.

Theorem 2.41 (Roth [Rot55]).

Let $\alpha$ be an algebraic number of degree $d\geq 2$ . Then for every $\epsilon>0$ , there are only finitely many $p/q\in\mathbb{Q}$ such that

\left|\frac{p}{q}-\alpha\right|<\frac{1}{q^{2+\epsilon}}.

In fact, most numbers (almost all, in a measure-theoretic sense) are poorly approximable in the same way (Sprindžuk [Sz69]). So is it possible to find some that are well-approximable? Liouville constructed such numbers by a clever device (Exercise 2.42).

Exercise 2.42.

Show that $\alpha:=\sum_{k=0}^{\infty}\frac{1}{10^{k!}}$ is very well-approximable in the following sense. There are infinitely many $p_{n}/q_{n}\in\mathbb{Q}$ such that $\left|p_{n}/q_{n}-\alpha\right|<1/q_{n}^{n}$ .

Exercise 2.43.

This exercise outlines a proof due to Liouville that any algebraic number $\alpha$ of degree $d\geq 2$ has only finitely many approximations $p/q\in\mathbb{Q}$ satisfying $\left|\frac{p}{q}-\alpha\right|<\frac{1}{q^{d+\epsilon}}$ (in other words, Roth’s theorem with exponent $d+\epsilon$ instead of $2+\epsilon$ ). This method of proof contains the seeds of the proof of Roth’s theorem.

(1)

Consider $g(x)$ , the minimal polynomial of $\alpha$ but scaled to lie in $\mathbb{Z}[x]$ with coefficients having no common factor. Give a lower bound on $|g(p/q)|$ .
(2)

Use the mean value theorem on the difference $g(\alpha)-g(p/q)$ , to give a lower bound on $|\alpha-p/q|$ .
(3)

Argue that this lower bound is independent of $p$ and $q$ .
(4)

Comparing the two estimates above, complete the argument.

3. Hyperbolic and Minkowski geometry

Having seen the importance of the upper half plane and the Möbius action in number theory, we will now turn to studying this geometry in earnest, including a higher-dimensional generalization. We assume general knowledge of hyperbolic space, but will need the details of several models and the isometries between them. Our approach to defining a model of hyperbolic space is to emphasize the underlying space and its isometries, in the spirit of Klein’s Erlanger Programm [Trk07]. More specifically, we take an approach based on linear algebra, also in the spirit of Klein; for a more detailed treatment, see for example [Par08]. Here we depart from the standard treatment to give the isometries between several models of hyperbolic space in terms of roots and coefficients of polynomials; for this specific treatment, see [HST22].

3.1. Minkowski space

Consider an $n+1$ dimensional real vector space $M^{n,1}:=\mathbb{R}^{n+1}$ . Put a quadratic form $Q$ of signature $(n,1)$ and its associated bilinear form $\langle\cdot,\cdot\rangle_{Q}$ on this space: we will now call it a Minkowski space.

Exercise 3.1.

Suppose we are working over a field of characteristic not equal to $2$ . Show that if $Q(\mathbf{x})$ is a quadratic form on a vector space, then $\langle\mathbf{x},\mathbf{y}\rangle:=\frac{1}{2}\left(Q(\mathbf{x}+\mathbf{y})-Q(\mathbf{x})-Q(\mathbf{y})\right)$ is a bilinear form. Conversely, and inversely, show how to recover a quadratic form from a bilinear form.

The forms $Q$ and $\langle\cdot,\cdot\rangle_{Q}$ endow $M^{n,1}$ with geometry. The zero locus $Q=0$ is a double cone emanating from the origin, called the light cone. Outside the cone, the level sets $Q=c$ , $c>0$ are one-sheeted hyperboloids. Inside the cone, the level sets $Q=c$ , $c<0$ are two-sheeted hyperboloids.

Projectivizing this space to obtain $\mathbb{P}M^{n,1}$ , we can take the subset inside the cone:

\mathbb{H}^{n}_{M}:=\{[x_{0}:x_{1}:\cdots:x_{n}]:Q(x_{0},\ldots,x_{n})<0\}\subset\mathbb{P}M^{n,1};

one may think of this as obtained by gluing the two sheets of the two-sheeted hyperboloid. Then $\mathbb{H}^{n}_{M}$ is a model of hyperbolic $n$ -space, called the hyperboloid model. See Figure 3.1.

The metric is given by the differential

ds^{2}=Q(dx_{0},dx_{1},\ldots,dx_{n}),

or the distance function $d_{M}$ satisfying

\cosh\left(d_{M}(\mathbf{u},\mathbf{v})\right)=\frac{\left|\langle\mathbf{u},\mathbf{v}\rangle_{Q}\right|}{\sqrt{Q(\mathbf{u})Q(\mathbf{v})}}.

(3)

The quadratic space $M^{n,1}$ is acted upon by $SO_{Q}(\mathbb{R})\cong SO_{n,1}(\mathbb{R})$ , the special orthogonal transformations preserving the form $Q$ . Their action takes $\mathbb{H}^{n}_{M}$ to itself, acting as hyperbolic isometries.

Geodesics in $\mathbb{H}^{n}_{M}$ are obtained as the intersections with lines of $\mathbb{P}M^{n,1}$ (one may think of this as intersecting planes through the origin in $\mathbb{R}^{n+1}$ with the two-sheeted hyperboloid). The light cone itself can be thought of as the ‘boundary at infinity’ of $\mathbb{H}^{n}_{M}$ , and the intersection of the aforementioned projective line (or plane through the origin) with the cone gives the limit points of the geodesic. In higher dimensions, we obtain geodesic surfaces, spaces, etc. by intersecting with higher dimensional subspaces.

3.2. The upper half plane

We’ve seen that the quadratics have an affinity for the Möbius action on the upper half plane. The upper half plane $\mathbb{H}^{2}_{U}$ is a model of the hyperbolic plane, whose isometries are $\operatorname{PSL}_{2}(\mathbb{R})$ . The metric is given by the differential

ds^{2}=\frac{dz\;d\overline{z}}{\Im(z)^{2}}

or the distance function $d_{U}$ satisfying

\cosh\left(d_{U}(z,w)\right)=1+\frac{|z-w|^{2}}{2\Im(z)\Im(w)}.

(4)

The hyperbolic isometries are the Möbius transformations which we met earlier, $\operatorname{PSL}_{2}(\mathbb{R})$ . The geodesics in the upper half plane consist of the restrictions to $\mathbb{H}^{2}_{U}$ of lines $\Re(z)=a$ in $\mathbb{C}$ and circles in $\mathbb{C}$ centred on $\mathbb{R}$ (Figure 2.3). The boundary of $\mathbb{H}^{2}_{U}$ is

\partial\mathbb{H}^{2}_{U}:=\widehat{\mathbb{R}}:=\mathbb{R}\cup\{\infty\}

where the point $\infty$ is an ideal (in other words, mathematically imagined) point ‘at infinity.’ The point $\infty$ is a limit point for any geodesic arising from a vertical line $\Re(z)=a$ , the other limit point being $a$ . Half-circle geodesics have as their two limit points the intersection of the circle with $\mathbb{R}$ . Observe that the isometries $\operatorname{PSL}_{2}(\mathbb{R})$ , interpreted on the extended plane $\widehat{\mathbb{C}}:=\mathbb{C}\cup\{\infty\}$ , take $\partial\mathbb{H}^{2}_{U}$ to itself.

3.3. Relating the upper half plane and hyperboloid models

There is a beautiful dictionary between the hyperboloid model and the upper half plane model of the hyperbolic plane, carried out by a hyperbolic isometry between the two, and it contains some surprises. The essential observation is that $\Delta(A,B,C)=B^{2}-4AC$ is a signature $2,1$ form, but is also the discriminant form for quadratic polynomials $Ax^{2}+Bx+C$ , so we can think of the Minkowski space $\mathbb{R}^{2,1}$ as parametrizing such polynomials by taking $Q=\Delta$ . Then the associated inner product is

\langle(A_{1},B_{1},C_{1}),(A_{2},B_{2},C_{2})\rangle_{Q}=B_{1}B_{2}-2A_{1}C_{2}-2A_{2}C_{1}.
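As a sanity check, the displayed pairing can be compared with the polarization formula of Exercise 3.1 on a small grid of integer vectors. This is a sketch with hypothetical function names (`disc`, `polar`, `pairing`); it verifies the formula on examples and does not replace the exercise.

```python
from fractions import Fraction
from itertools import product

def disc(v):
    """The form Q = Delta(A, B, C) = B^2 - 4AC."""
    A, B, C = v
    return B * B - 4 * A * C

def polar(u, v):
    """Bilinear form obtained by polarization: (Q(u+v) - Q(u) - Q(v)) / 2."""
    w = tuple(x + y for x, y in zip(u, v))
    return Fraction(disc(w) - disc(u) - disc(v), 2)

def pairing(u, v):
    """The explicit pairing B1*B2 - 2*A1*C2 - 2*A2*C1 displayed above."""
    A1, B1, C1 = u
    A2, B2, C2 = v
    return B1 * B2 - 2 * A1 * C2 - 2 * A2 * C1

# Compare the two expressions on all integer vectors with entries in [-2, 2].
vectors = list(product(range(-2, 3), repeat=3))
assert all(polar(u, v) == pairing(u, v) for u in vectors for v in vectors)
# The pairing of a vector with itself recovers the quadratic form.
assert all(pairing(u, u) == disc(u) for u in vectors)
```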

If we are interested in the roots of such polynomials, then it is natural to take a projectivization of this space, as the roots are unaffected by scaling the polynomial by a constant.

Exercise 3.2.

Demonstrate that $\Delta(A,B,C)$ is a signature $2,1$ form.

Viewed as a space of polynomials, the light cone cuts out those polynomials with complex conjugate roots as the interior of the light cone ( $\Delta<0$ ), leaving those with distinct real roots on the exterior ( $\Delta>0$ ). The light cone itself corresponds to the quadratic polynomial having a double root ( $\Delta=0$ ).

To be more formal, consider now the projectivized interior of the light cone:

\mathbb{H}^{2}_{M}:=\{[A:B:C]:A,B,C\in\mathbb{R},B^{2}<4AC\}=\{Ax^{2}+Bx+C:A,B,C\in\mathbb{R},B^{2}<4AC\}.

Then, the hyperbolic isometry between $\mathbb{H}^{2}_{M}$ and $\mathbb{H}^{2}_{U}$ is essentially the quadratic formula!

Theorem 3.3 ([HST22, Theorem 4.9]).

The map from $\mathbb{H}^{2}_{M}$ to $\mathbb{H}^{2}_{U}$ given on $[A:B:C]$ by taking the root of positive imaginary part of the polynomial $Ax^{2}+Bx+C$ , i.e., the quadratic formula, is a hyperbolic isometry. The inverse to this map takes $\alpha\in\mathbb{H}^{2}_{U}$ to $[1:-\alpha-\overline{\alpha}:\alpha\overline{\alpha}]$ .
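The statement can be tested numerically by computing the distance on both sides. The sketch below uses hypothetical function names and floating point arithmetic. On the Minkowski side it takes the absolute value of the pairing, which is an assumption made here so that the answer does not depend on the sign of the chosen projective representative.

```python
import cmath
import math

def disc(v):
    A, B, C = v
    return B * B - 4 * A * C

def root_in_upper_half_plane(A, B, C):
    """Quadratic formula: the root of Ax^2 + Bx + C with positive imaginary part."""
    d = disc((A, B, C))
    if d >= 0:
        raise ValueError("need B^2 < 4AC")
    root = (-B + cmath.sqrt(d)) / (2 * A)
    return root if root.imag > 0 else root.conjugate()

def coefficients(alpha):
    """Inverse map: alpha -> (1, -(alpha + conj(alpha)), alpha * conj(alpha))."""
    return (1.0, -2 * alpha.real, alpha.real ** 2 + alpha.imag ** 2)

def cosh_dist_upper(z, w):
    """cosh of the hyperbolic distance in the upper half plane, equation (4)."""
    return 1 + abs(z - w) ** 2 / (2 * z.imag * w.imag)

def cosh_dist_minkowski(u, v):
    """cosh of the distance in the hyperboloid model with Q = Delta."""
    pair = u[1] * v[1] - 2 * u[0] * v[2] - 2 * v[0] * u[2]
    return abs(pair) / math.sqrt(disc(u) * disc(v))

u, v = (2, 2, 3), (1, -2, 5)
z, w = root_in_upper_half_plane(*u), root_in_upper_half_plane(*v)
print(cosh_dist_minkowski(u, v), cosh_dist_upper(z, w))  # the two values should agree
```

For instance, $[1:0:1]$ and $[1:-2:5]$ have roots $i$ and $1+2i$ , and both sides give $\cosh d=3/2$ .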

We can now line up the hyperbolic isometries on either side of the identification between $\mathbb{H}^{2}_{U}$ and $\mathbb{H}^{2}_{M}$ .

Theorem 3.4 ([HST22, Observation 4.6]).

Identify elements $[A:B:C]$ of $\mathbb{H}^{2}_{M}$ with matrices $D_{A,B,C}:=\begin{pmatrix}C&B/2\\ B/2&A\end{pmatrix}$ . Then the isometry of Theorem 3.3 between $\mathbb{H}^{2}_{U}$ and $\mathbb{H}^{2}_{M}$ is $\operatorname{PSL}_{2}(\mathbb{R})$ -equivariant, relating the action of $\operatorname{PSL}_{2}(\mathbb{R})$ via Möbius transforms on $\mathbb{H}^{2}_{U}$ to the action $M\cdot D_{A,B,C}:=M^{-1}D_{A,B,C}M^{-t}$ on $\mathbb{H}^{2}_{M}$ .

Writing this out explicitly gives a representation of $\operatorname{PSL}_{2}(\mathbb{R})$ in $O_{Q}(\mathbb{R})$ , i.e. the effect of $M$ on $[A:B:C]$ is multiplication by the following $3\times 3$ matrix:

\operatorname{PSL}_{2}(\mathbb{R})\rightarrow O_{Q}(\mathbb{R}),\quad\begin{pmatrix}a&b\\ c&d\end{pmatrix}\mapsto\begin{pmatrix}a^{2}&-ac&c^{2}\\ -2ab&bc+ad&-2cd\\ b^{2}&-bd&d^{2}\end{pmatrix}.
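The displayed $3\times 3$ matrix can be checked against the matrix action $M^{-1}D_{A,B,C}M^{-t}$ on examples. The sketch below, with hypothetical helper names, does this for a few integer matrices of determinant $1$ and also confirms that the discriminant is preserved, as it must be for an element of $O_{Q}(\mathbb{R})$ . It tests only these two properties, on examples.

```python
from fractions import Fraction

def rep3(a, b, c, d):
    """Rows of the 3x3 matrix displayed above, for the matrix (a b; c d)."""
    return [[a * a, -a * c, c * c],
            [-2 * a * b, b * c + a * d, -2 * c * d],
            [b * b, -b * d, d * d]]

def apply3(R, v):
    """Multiply the column vector v = (A, B, C) by the 3x3 matrix R."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))

def matmul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def act_on_D(a, b, c, d, v):
    """Compute M^{-1} D M^{-t} for D = (C, B/2; B/2, A), assuming ad - bc = 1."""
    A, B, C = v
    half_B = Fraction(B, 2)
    D = [[C, half_B], [half_B, A]]
    M_inv = [[d, -b], [-c, a]]
    M_inv_t = [[d, -c], [-b, a]]
    E = matmul2(matmul2(M_inv, D), M_inv_t)
    return (E[1][1], 2 * E[0][1], E[0][0])  # read off (A', B', C')

def disc(v):
    A, B, C = v
    return B * B - 4 * A * C

for M in [(2, 1, 1, 1), (1, 3, 0, 1), (0, -1, 1, 0), (3, 2, 4, 3)]:
    for v in [(1, 1, 1), (2, -1, 5), (1, 4, 1)]:
        image = apply3(rep3(*M), v)
        assert image == act_on_D(*M, v)
        assert disc(image) == disc(v)
```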

One advantage of the ambient Minkowski space containing $\mathbb{H}^{2}_{M}$ is that there’s a satisfying analogous relationship between the space outside the light cone and the geodesics of $\mathbb{H}^{2}_{U}$ . For this, we might give a name to the projectivized exterior of the light cone:

\mathbb{D}^{2}_{M}:=\{[A:B:C]:A,B,C\in\mathbb{R},B^{2}>4AC\}=\{Ax^{2}+Bx+C:A,B,C\in\mathbb{R},B^{2}>4AC\}.

The $\mathbb{D}$ is an anti de Sitter space, which is a term from physics. See Figure 3.1.

Theorem 3.5 ([HST22, Observation 4.11]).

The following two maps from $\mathbb{D}^{2}_{M}$ to the space of geodesics of $\mathbb{H}^{2}_{U}$ coincide:

(1)

given on $[A:B:C]$ by returning the geodesic whose endpoints are exactly the two real roots of $Ax^{2}+Bx+C$ ;
(2)

given on $[A:B:C]$ by first taking the plane normal to $[A:B:C]$ in $M^{2,1}$ (with respect to the Minkowski norm), then intersecting it with $\mathbb{H}^{2}_{M}$ , and finally composing with the hyperbolic isometry of Theorem 3.3.

Exercise 3.6.

Prove Theorems 3.3, 3.4, 3.5.

We summarize the situation in a table:

upper half plane $\mathbb{H}^{2}_{U}$	Minkowski space $M^{2,1}$
points	vectors inside the light cone
root of $Ax^{2}+Bx+C$ , $Q<0$	vector $[A:B:C]$ , $Q<0$ , or matrix $D_{A,B,C}:=\begin{pmatrix}C&B/2\\ B/2&A\end{pmatrix}$
geodesics	planes cutting the light cone
geodesic joining roots of $Ax^{2}+Bx+C$ , $Q>0$	plane normal to vector $[A:B:C]$ , $Q>0$
Möbius action of $M\in\operatorname{PSL}_{2}(\mathbb{R})$	action $D_{A,B,C}\mapsto M^{-1}D_{A,B,C}M^{-t}$ , $M\in\operatorname{PSL}_{2}(\mathbb{R})$

It makes sense to think of the one-sheeted hyperboloid as the space of geodesics of the upper half plane, since any point in that space is a vector normal to a plane cutting out a geodesic.

3.4. The Hamilton quaternions and upper half space

The ring of Hamilton quaternions is the ring

H=\{x+yi+zj+wk:x,y,z,w\in\mathbb{R}\}

with relations $i^{2}=j^{2}=k^{2}=-1$ and $k=ij=-ji$ . There is a quaternionic conjugation:

\overline{x+yi+zj+wk}=x-yi-zj-wk.

Analogously to the upper half plane, we can define the upper half space

\mathbb{H}^{3}_{U}:=\{x+yi+zj:z>0\}\subseteq H.

This is a standard model of hyperbolic $3$ -space, thought of as a halfspace of $\mathbb{R}^{3}=\{x+yi+zj\}$ , whose boundary is $\partial\mathbb{H}^{3}_{U}:=\widehat{\mathbb{C}}:=\mathbb{C}\cup\{\infty\}$ , consisting of a copy of $\mathbb{C}$ (thought of as $z=0$ in $\mathbb{R}^{3}$ or $z=w=0$ in H), augmented by a point at $\infty$ . We can also write $\alpha:=x+yi\in\mathbb{C}$ and an element of $\mathbb{H}^{3}_{U}$ as $\alpha+zj$ , as in Figure 3.3.

The differential is

ds^{2}=\frac{d\alpha d\overline{\alpha}+dz^{2}}{z^{2}},

and the distance function $d_{U}$ satisfies

\cosh\left(d_{U}(\alpha+zj,\beta+wj)\right)=1+\frac{|\alpha-\beta|^{2}+(z-w)^{2}}{2zw}.

(5)

There is an action of $\operatorname{PSL}_{2}(\mathbb{C})$ via Möbius transformations on the boundary of the upper half space, $\widehat{\mathbb{C}}$ . This action extends to a unique action on $\mathbb{H}^{3}_{U}$ by hyperbolic isometries, namely

\begin{pmatrix}a&b\\ c&d\end{pmatrix}\cdot(\alpha+zj)=(a(\alpha+zj)+b)(c(\alpha+zj)+d)^{-1},\quad\begin{pmatrix}a&b\\ c&d\end{pmatrix}\in\operatorname{PSL}_{2}(\mathbb{C}),\quad\alpha\in\mathbb{C},\quad z\in\mathbb{R},

where we must keep in mind the non-commutativity of $i$ and $j$ , so that order matters; and the notion of inverse takes place in the quaternions, so that

(c(\alpha+zj)+d)^{-1}=\frac{(\overline{\alpha}-zj)\overline{c}+\overline{d}}{|c\alpha+d|^{2}+|cz|^{2}}.

(6)

Again, order matters. Note that for $z=0$ the given action restricts to the usual Möbius action on $\partial\mathbb{H}^{3}_{U}$ .
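To experiment with this action, one can implement just enough quaternion arithmetic to evaluate $(aq+b)(cq+d)^{-1}$ . The following sketch stores a quaternion $u+vj$ as a pair of complex numbers $(u,v)$ and uses the rule $jc=\overline{c}j$ ; this representation and all function names are choices made for the illustration. It then checks on one example that the image stays in the upper half space and that the distance (5) is unchanged.

```python
def qmul(x, y):
    """Product of quaternions x = u1 + v1 j and y = u2 + v2 j (u, v complex)."""
    u1, v1 = x
    u2, v2 = y
    return (u1 * u2 - v1 * v2.conjugate(), u1 * v2 + v1 * u2.conjugate())

def qinv(x):
    """Inverse of u + v j: the conjugate (conj(u) - v j) divided by the norm."""
    u, v = x
    n = abs(u) ** 2 + abs(v) ** 2
    return (u.conjugate() / n, -v / n)

def mobius(a, b, c, d, point):
    """Image of point = (alpha, z), meaning alpha + z j, under (a b; c d)."""
    alpha, z = point
    q = (complex(alpha), complex(z))
    num = qmul((complex(a), 0j), q)
    num = (num[0] + b, num[1])
    den = qmul((complex(c), 0j), q)
    den = (den[0] + d, den[1])
    return qmul(num, qinv(den))  # order matters: numerator times inverse

def cosh_dist(P, R):
    """cosh of the distance between alpha + z j and beta + w j, equation (5)."""
    (alpha, z), (beta, w) = P, R
    return 1 + (abs(alpha - beta) ** 2 + (z - w) ** 2) / (2 * z * w)

M = (1, 1j, 1j, 0)  # determinant 1 * 0 - (1j)(1j) = 1
P, R = (0.3 + 0.2j, 1.5), (-1 + 0.5j, 0.7)
P_image, R_image = mobius(*M, P), mobius(*M, R)
# The j-coefficient of each image should be real and positive.
heights = [P_image[1], R_image[1]]
print(heights)
print(cosh_dist(P, R),
      cosh_dist((P_image[0], P_image[1].real), (R_image[0], R_image[1].real)))
```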

Exercise 3.7.

(1)

Prove that any element of $\operatorname{PSL}_{2}(\mathbb{C})$ can be expressed as a composition of translation ( $z\mapsto z+\beta$ ), scaling ( $z\mapsto\beta z$ ) and circle inversion ( $z\mapsto 1/z$ ).
(2)

Prove equation (6).
(3)

Prove that the given Möbius action takes $\mathbb{H}^{3}_{U}$ to itself.
(4)

Prove that the given Möbius action on $\mathbb{H}^{3}_{U}$ acts by isometry.

We define the upper half space model of $\mathbb{H}^{3}$ to be $\mathbb{H}^{3}_{U}$ with isometries given by the Möbius action of $\operatorname{PSL}_{2}(\mathbb{C})$ as above.

The geodesics are given by the restriction to $\mathbb{H}^{3}_{U}$ of any circle contained in a vertical plane with centre on $\partial\mathbb{H}^{3}_{U}$ , together with vertical lines $\alpha=\alpha_{0}$ (Figure 3.3).

3.5. Relating the hyperboloid and upper half-space models for hyperbolic $3$ -space

The story in Section 3.3 has an analog for hyperbolic $3$ -space in place of the hyperbolic plane. Recall that we viewed $M^{2,1}$ as a space of symmetric matrices $D_{A,B,C}:=\begin{pmatrix}C&B/2\\ B/2&A\end{pmatrix}$ associated to quadratic forms $Ax^{2}+Bxy+Cy^{2}$ and quadratic polynomials $Ax^{2}+Bx+C$ .

Moving up from $M^{2,1}$ to $M^{3,1}$ , we can now consider $M^{3,1}$ to be a space of Hermitian matrices

\begin{pmatrix}q&-r+si\\ -r-si&p\end{pmatrix},\quad p,q,r,s\in\mathbb{R},

with the negative of the determinant, $Q(p,q,r,s)=r^{2}+s^{2}-pq$ , playing the role of the signature $3,1$ form. The term Hermitian matrix means that $M=M^{\dagger}$ where $\dagger$ represents the complex conjugate transpose. The associated bilinear form is

\langle(p_{1},q_{1},r_{1},s_{1}),(p_{2},q_{2},r_{2},s_{2})\rangle_{Q}=r_{1}r_{2}+s_{1}s_{2}-\frac{p_{1}q_{2}}{2}-\frac{p_{2}q_{1}}{2}.

The form $Q$ breaks the space into the interior and exterior of the light cone $Q=0$ , as before.

If we projectivize, then we obtain Hermitian matrices up to scaling. Let us define, as before,

\mathbb{D}^{3}_{M}:=\{[p:q:r:s]:p,q,r,s\in\mathbb{R},r^{2}+s^{2}>pq\},

\mathbb{H}^{3}_{M}:=\{[p:q:r:s]:p,q,r,s\in\mathbb{R},r^{2}+s^{2}<pq\}.

Revisit Figure 3.1 and imagine adding one spatial dimension.

Then we have an isometry between our two models of hyperbolic $3$ -space, reminiscent of the quadratic formula.

Theorem 3.8.

The following map is an isometry from $\mathbb{H}^{3}_{M}$ to $\mathbb{H}^{3}_{U}$ :

[p:q:r:s]\mapsto\frac{r}{p}+\frac{s}{p}i+\frac{\sqrt{pq-r^{2}-s^{2}}}{p}j.

The inverse is

a+bi+cj\mapsto[1:a^{2}+b^{2}+c^{2}:a:b].

Proof.

That they are inverses is easy. We will show distances are preserved. By the transitivity of the hyperbolic isometries of hyperbolic $3$ -space on the tangent bundle, we need only compare points $j$ and $aj$ in $\mathbb{H}^{3}_{U}$ , which correspond to points $[1:1:0:0]$ and $[1:a^{2}:0:0]$ in $\mathbb{H}^{3}_{M}$ . Specifically, we can move the first point to $j$ by an isometry (by transitivity on points), and then move the tangent direction to the second point to point upward along the $j$ line (by transitivity on the tangent bundle). For more details, consult the analogous [HST22, Theorem 4.9]. Then we need only compute the distance before and after the isometry. For these two points the pairing is $-(1+a^{2})/2$ and the product of the values of $Q$ is $a^{2}$ , so (3), read with the absolute value of the pairing, gives

\cosh(d_{M}([1:1:0:0],[1:a^{2}:0:0]))=\frac{1+a^{2}}{2a}.

Using (5), we find

\cosh\left(d_{U}(j,aj)\right)=\frac{1+a^{2}}{2a}.

∎

This is somewhat analogous to the ‘quadratic formula’ isometry between ‘coefficient space’ $\mathbb{H}^{2}_{M}$ and ‘root space’ $\mathbb{H}^{2}_{U}$ discussed previously, in that the map takes $[p:q:r:s]$ to a quaternionic root of the polynomial

p\left(Z-\frac{r+si}{p}\right)^{2}-\frac{r^{2}+s^{2}-pq}{p}.

(7)

The only quaternionic root to this polynomial that lies in $\mathbb{H}^{3}_{U}$ is the root mentioned. We also obtain an equivariance statement.

Theorem 3.9.

Identify elements $[p:q:r:s]$ of $\mathbb{H}^{3}_{M}$ with matrices $H_{p,q,r,s}:=\begin{pmatrix}q&-r+si\\ -r-si&p\end{pmatrix}$ . Then the map of Theorem 3.8 between upper half space $\mathbb{H}^{3}_{U}$ and $\mathbb{H}^{3}_{M}$ is $\operatorname{PSL}_{2}(\mathbb{C})$ -equivariant, relating the action of $\operatorname{PSL}_{2}(\mathbb{C})$ via Möbius transformations on $\mathbb{H}^{3}_{U}$ to the action $M\cdot H_{p,q,r,s}:=M^{-1}H_{p,q,r,s}M^{-\dagger}$ on $\mathbb{H}^{3}_{M}$ .

Exercise 3.10.

Prove Theorem 3.9 and give the corresponding representation of $\operatorname{PSL}_{2}(\mathbb{C})$ in $O_{Q}(\mathbb{R})$ .

3.6. The space of circles

By a circle in $\widehat{\mathbb{C}}$ , we mean any circle in $\mathbb{C}$ or the union of any straight line in $\mathbb{C}$ with $\infty$ . These latter we think of as circles through $\infty$ , and they have curvature $0$ (infinite radius).

Associated to a Hermitian matrix such as $M=\begin{pmatrix}q&-r+si\\ -r-si&p\end{pmatrix}$ in the last section, we have a Hermitian form

H_{M}(Z,W):=\begin{pmatrix}W&Z\end{pmatrix}\overline{M}\begin{pmatrix}\overline{W}\\ \overline{Z}\end{pmatrix}=pZ\overline{Z}+(-r+si)Z\overline{W}+(-r-si)\overline{Z}W+qW\overline{W},\quad p,q,r,s\in\mathbb{R}.

When $Q>0$ (i.e. $M\in\mathbb{D}^{3}_{M}$ ), the locus $H_{M}(Z,W)=0$ in $\mathbb{P}^{1}(\mathbb{C})$ (or $H_{M}(z,1)=0$ in $\mathbb{C}\cup\{\infty\}$ ) is a circle.

Theorem 3.11.

Let $[p:q:r:s]\in\mathbb{D}^{3}_{M}$ . Let $H_{M}(Z,W)$ be the associated Hermitian form. Then the roots $Z\in\mathbb{C}$ of $H_{M}(Z,1)$ form a circle in $\widehat{\mathbb{C}}$ . The circle is the boundary of a unique geodesic plane in the upper half space. Conversely, every circle arises uniquely in this way.

Proof.

The equation $H_{M}(Z,1)=0$ can be expressed as

\left(Z-\frac{r+si}{p}\right)\left(\overline{Z}-\frac{r-si}{p}\right)=\frac{r^{2}+s^{2}-pq}{p^{2}},

so it represents a circle with centre $(r+si)/p$ and radius $\sqrt{r^{2}+s^{2}-pq}/p$ whenever $r^{2}+s^{2}>pq$ . This last inequality follows from $[p:q:r:s]\in\mathbb{D}^{3}_{M}$ . For the converse, observe that the circle determines $r/p,s/p,q/p$ , hence $[p:q:r:s]\in\mathbb{P}M^{3,1}$ . ∎

Thus, it makes sense to consider the projectivization of the one-sheeted hyperboloid $\mathbb{D}^{3}_{M}$ to be a parameter space for the circles in $\widehat{\mathbb{C}}$ , i.e. the space of circles. Normalizing so that $r^{2}+s^{2}-pq=1$ , the coordinates have the following somewhat standard names:

(1)

$p$ = curvature, which is the inverse of radius;
(2)

$r$ = real part of curvature times centre (the curvature times centre is sometimes more simply called the curvature-centre);
(3)

$s$ = imaginary part of curvature times center;
(4)

$q$ = co-curvature, which is the curvature of the inversion of the circle through the unit circle.
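These coordinates are easy to compute from a centre and radius. The following sketch (hypothetical function names, floating point, circles of finite positive radius only) converts between the two descriptions and evaluates the Hermitian form $H_{M}(Z,1)$ , so that one can confirm that points of the circle are zeros of the form and that the normalization $r^{2}+s^{2}-pq=1$ holds.

```python
import cmath
import math

def circle_to_vector(centre, radius):
    """(p, q, r, s) = (curvature, co-curvature, Re and Im of curvature * centre)."""
    p = 1 / radius
    cc = centre / radius
    q = (abs(centre) ** 2 - radius ** 2) / radius
    return (p, q, cc.real, cc.imag)

def vector_to_circle(p, q, r, s):
    """Centre and radius of the circle attached to [p:q:r:s], assuming p != 0."""
    Q = r * r + s * s - p * q
    if Q <= 0 or p == 0:
        raise ValueError("need r^2 + s^2 > pq and p != 0")
    return (complex(r, s) / p, math.sqrt(Q) / abs(p))

def hermitian_form(vec, Z):
    """H_M(Z, 1) = p|Z|^2 + (-r+si)Z + (-r-si)conj(Z) + q."""
    p, q, r, s = vec
    return (p * abs(Z) ** 2 + complex(-r, s) * Z
            + complex(-r, -s) * Z.conjugate() + q)

vec = circle_to_vector(1 + 2j, 0.5)
p, q, r, s = vec
print(vec, r * r + s * s - p * q)  # the last number should be 1 up to rounding
for k in range(8):
    Z = (1 + 2j) + 0.5 * cmath.exp(2j * math.pi * k / 8)
    assert abs(hermitian_form(vec, Z)) < 1e-9
```

Scaling the vector by a nonzero constant does not change the circle returned by `vector_to_circle`, reflecting the projectivization.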

Exercise 3.12.

Show that the circle associated to each point of $\mathbb{D}^{3}_{M}$ via Theorem 3.11 is the same circle obtained by the following process. For a vector $\mathbf{v}\in\mathbb{D}^{3}_{M}$ , take the hyperplane $P$ perpendicular to $\mathbf{v}$ in the Minkowski geometry (i.e., with respect to $\langle\cdot,\cdot\rangle_{Q}$ ). Take the intersection of $P$ with $\mathbb{H}^{3}_{M}$ and call the intersection $I$ . Then use the hyperbolic isometry of Theorem 3.8 to map $I$ into $\mathbb{H}^{3}_{U}$ . This should provide a geodesic plane $G$ . Take the circle $\mathcal{C}$ at the boundary of $G$ . Revisit Figure 3.2, imagining an extra spatial dimension.

Recall that $\operatorname{PSL}_{2}(\mathbb{C})$ acts on $\widehat{\mathbb{C}}$ . It will be helpful later to understand the image of $\widehat{\mathbb{R}}$ under a Möbius transformation; this is a straightforward computation.

Proposition 3.13 ([Sta18b, Proposition 3.7]).

Let $z\mapsto\frac{\alpha z+\beta}{\gamma z+\delta}$ be a Möbius transformation. Then the image of $\widehat{\mathbb{R}}$ under the transformation is a circle with curvature equal to $2\Im(\overline{\gamma}{\delta})=i(\gamma\overline{\delta}-\overline{\gamma}\delta)$ , and curvature times center equal to $i(\alpha\overline{\delta}-\overline{\gamma}\beta)$ . Finally, the co-curvature is $2\Im(\overline{\alpha}\beta)=i(\alpha\overline{\beta}-\overline{\alpha}\beta)$ .
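The proposition is easy to test numerically on an example. In the sketch below (hypothetical function names), the matrix with rows $(2,i)$ and $(-i,1)$ has determinant $1$ , and we check that the images of several real points lie on the circle whose curvature and curvature times center are given by the formulas. The curvature produced by the formula is signed, so the radius is taken to be the reciprocal of its absolute value; the formulas assume the curvature is nonzero, i.e. that the image is not a line.

```python
def image_of_real_line(alpha, beta, gamma, delta):
    """Curvature p, co-curvature q and curvature-centre cc from Proposition 3.13."""
    alpha, beta, gamma, delta = map(complex, (alpha, beta, gamma, delta))
    p = 2 * (gamma.conjugate() * delta).imag
    cc = 1j * (alpha * delta.conjugate() - gamma.conjugate() * beta)
    q = 2 * (alpha.conjugate() * beta).imag
    return p, q, cc

def mobius(alpha, beta, gamma, delta, t):
    """The Moebius transformation applied to a point t."""
    return (alpha * t + beta) / (gamma * t + delta)

M = (2, 1j, -1j, 1)  # determinant 2 * 1 - (1j)(-1j) = 1
p, q, cc = image_of_real_line(*M)
centre, radius = cc / p, 1 / abs(p)
for t in (-3.0, -1.0, 0.0, 0.5, 2.0, 10.0):
    # Each image point should be at distance `radius` from `centre`.
    assert abs(abs(mobius(*M, t) - centre) - radius) < 1e-12
print(p, q, cc)
```

In this example the vector $(p,q,r,s)$ is $(2,4,0,3)$ , which satisfies $r^{2}+s^{2}-pq=1$ .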

Exercise 3.14.

Prove Proposition 3.13. Furthermore, assume the Möbius transformation is in $\operatorname{PSL}_{2}(\mathbb{Z}[i])$ , and show that for such a circle, the resulting vector $(p,q,r,s)\in\mathbb{Z}^{4}$ in the space of circles is congruent to $(0,0,0,1)$ modulo $2$ .

Exercise 3.15.

What are the correct definitions of curvature and curvature-center for straight lines?

4. Diophantine approximation in the complex plane

It is natural to ask some of our fundamental Diophantine approximation questions for complex numbers. This is where we can reap the benefits of the geometric perspective of the last section. Since the rationals cannot approximate complex numbers off the real line, we must begin by asking what we are hoping to approximate by. One natural answer is to ask to approximate by algebraic numbers of a fixed degree. Another is to approximate by elements of a fixed number field. Here, we will consider the first (later, we will consider the second).

The next question is how to measure the size of an approximation. In the real/rational case, we used the denominator of $p/q\in\mathbb{Q}$ as a natural measure of size. For a general algebraic number $\alpha$ , having minimal polynomial $a_{d}x^{d}+\cdots+a_{0}\in\mathbb{Z}[x]$ , where ${\operatorname{gcd}}(a_{i})=1$ , the naïve height is defined as

H(\alpha):=\max_{i}|a_{i}|.

Then, in analogy to Dirichlet’s and Roth’s theorems, Koksma defines [Kok39]

k_{d}(\alpha):=\sup\{k:\text{ there exist infinitely many algebraic $\beta$ of degree $\leq d$ such that }|\alpha-\beta|<1/H(\beta)^{k}\}.

In this language, Dirichlet’s Theorem states that $k_{1}(\alpha)\leq 2$ for rational $\alpha$ and $\geq 2$ for $\alpha\in\mathbb{R}\smallsetminus\mathbb{Q}$ . Roth’s theorem says that $k_{1}(\alpha)=2$ for algebraic $\alpha$ , and Sprindz̆uk says $k_{1}(\alpha)=2$ for almost all real $\alpha$ .

Theorem 4.1 (Sprindz̆uk, [Spr69]).

$k_{d}(\alpha)=d+1$ for almost all $\alpha\in\mathbb{R}$ and $k_{d}(\alpha)=\frac{d+1}{2}$ for almost all $\alpha\in\mathbb{C}\smallsetminus\mathbb{R}$ .

This theorem shows that approximation by algebraic numbers is fundamentally different on the real line than off it. Next, here is a tantalizingly precise description of $k_{d}(\alpha)$ for $\alpha$ algebraic.

Theorem 4.2 (Bugeaud-Evertse [BE09]).

Let $\alpha\in\mathbb{C}\smallsetminus\mathbb{R}$ be algebraic. Then

k_{d}(\alpha)=\begin{cases}\frac{d+1}{2}\text{ or }\frac{d+2}{2}&\text{if }\deg(\alpha)\geq d+2,\ d\text{ even},\\ \min\left\{\frac{\deg(\alpha)}{2},\frac{d+1}{2}\right\}&\text{otherwise.}\end{cases}

For $d=2$ , $\deg(\alpha)>2$ :

k_{2}(\alpha)=\begin{cases}2&\text{if }1,\alpha\overline{\alpha},\alpha+\overline{\alpha}\text{ are $\mathbb{Q}$-dependent,}\\ 3/2&\text{otherwise.}\end{cases}

The underlying methods for this are algebraic. This theorem, like Roth’s Theorem, follows from a far-reaching theorem called Schmidt’s subspace theorem.

4.1. Quadratics

We are interested in finding geometric connections or explanations for Theorem 4.2. In particular, the result shows that approximation by quadratics is quite different for certain types of complex numbers compared to others. Our starting point is an analog to Figure 2.5 showing the rational numbers. In Figure 4.1, we see the quadratic algebraic numbers in the upper half plane, sized by their discriminants.

In this picture, our eyes infer the $\operatorname{SL}_{2}(\mathbb{Z})$-tessellation from Figure 2.1, but with the geodesic boundaries, and many more geodesics besides, filled in with the pearly necklaces of Figure 2.5. In light of the $\mathbb{Z}^{2}$ hiding in Figure 2.5, this image asks us a question: are we viewing a higher dimensional lattice under some transformation?

To answer this question, we return to the coefficient and root spaces $\mathbb{H}^{2}_{M}$ and $\mathbb{H}^{2}_{U}$ of Section 3. Within the coefficient space $\mathbb{H}^{2}_{M}$ , we have the quadratic algebraics: $\mathbb{H}^{2}_{M}(\mathbb{Z}):=\{[a:b:c]:a,b,c\in\mathbb{Z},b^{2}<4ac\}$ . This is the projectivization of the portion of the 3D lattice $\mathbb{Z}^{3}$ which lies inside the light cone. The naïve height is, roughly speaking, a measure of how far the representative $[a:b:c]$ with ${\operatorname{gcd}}(a,b,c)=1$ is from the origin. So we are in a situation not unlike our description of continued fractions in the real line as the projectivization of the lattice $\mathbb{Z}^{2}$ . That is, the relationship between $\mathbb{H}^{2}_{M}(\mathbb{Z})$ and Figure 4.1 is described by the transformation of Theorem 3.3, which describes Figure 4.1 as an image of a 3D lattice under a certain geometric transformation. See Figure 4.2.
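To make this concrete, one can enumerate the lattice points inside the light cone and send each to its root in the upper half plane. The sketch below is one possible way to produce such data and is not necessarily how the figures were made; the function name and the cut-off by naïve height are choices made for the illustration. Each point is listed once by taking the primitive representative with $a>0$ .

```python
from math import gcd, sqrt

def quadratic_points(height):
    """Roots in the upper half plane of primitive integer quadratics
    ax^2 + bx + c with b^2 < 4ac and naive height at most `height`."""
    points = []
    for a in range(1, height + 1):            # a > 0 fixes the projective sign
        for b in range(-height, height + 1):
            for c in range(1, height + 1):    # b^2 < 4ac and a > 0 force c > 0
                d = b * b - 4 * a * c
                if d < 0 and gcd(gcd(a, b), c) == 1:
                    root = complex(-b / (2 * a), sqrt(-d) / (2 * a))
                    points.append(((a, b, c), d, root))
    return points

for coeffs, d, root in quadratic_points(1):
    print(coeffs, d, root)
```

Plotting the roots with a dot size that decreases with $|d|$ gives a picture in the spirit of Figure 4.1.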

Under this transformation, the planes inside the lattice $\mathbb{Z}^{3}$ projectivize to geodesics on the Poincaré disc model and hence in the upper half plane; these are the visually striking necklaces in Figure 4.1. We have the following consequence:

Proposition 4.3 ([HST22, Observation 4.12]).

Let $\alpha\in\mathbb{C}\smallsetminus\mathbb{R}$ . Then $1,\alpha\overline{\alpha},\alpha+\overline{\alpha}$ are $\mathbb{Q}$ -dependent if and only if $\alpha$ lies on a geodesic whose limit points form a conjugate pair of points in a real quadratic field, or a pair of rational points.

We call such geodesics rational geodesics. The main reference for the remainder of this section is [HST22].

Exercise 4.4.

Prove Proposition 4.3.

4.2. The Diophantine approximation of quadratics

Having this geometric interpretation of the way the quadratic algebraic numbers fill out the complex upper half plane, we make some geometrically motivated choices for the notions of distance, complexity and goodness in Diophantine approximation, that differ somewhat from the algebraically motivated ones made classically:

(1)

We size quadratics by their discriminant. Unlike the naïve height and other classical measures of arithmetic complexity, the discriminant is invariant under $\operatorname{PSL}_{2}(\mathbb{Z})$ , the natural symmetries of the space. In the fundamental region, it tracks with the naïve height, so this is not a violent change (see [HST22, §5.2.4]).

(2)

We use the hyperbolic metric in the upper half plane, not the Euclidean one. Again, this respects the action of $\operatorname{PSL}_{2}(\mathbb{Z})$ , without doing great violence in any bounded region. Let $\alpha$ and $\beta$ correspond to points $f_{\alpha}$ and $f_{\beta}$ in $\mathbb{H}^{2}_{M}$ . The distance in $\mathbb{H}^{2}_{M}$ dictated by the Minkowski geometry is given by

d_{M}(f_{\alpha},f_{\beta})=\operatorname{acosh}\left(\frac{-\langle f_{\alpha},f_{\beta}\rangle}{\sqrt{\Delta_{\alpha}\Delta_{\beta}}}\right).

As a consequence, we can re-interpret complex approximation results under these choices.

Theorem 4.5 ([HST22, Theorem 6.3]).

Let $\alpha\in\mathbb{C}\smallsetminus\mathbb{R}$ not be quadratic algebraic, but lie on a rational geodesic. Then there exists $K_{\alpha}>0$ , depending on the $\operatorname{PSL}_{2}(\mathbb{Z})$ -orbit of $\alpha$ , so that there are infinitely many quadratic $\beta$ on that geodesic with

d_{U}(\alpha,\beta)\leq\operatorname{acosh}\left(1+\frac{K_{\alpha}}{|\Delta_{\beta}|^{2}}\right).

Proof.

We provide only a sketch, but the proof is an adaptation of the proof of Dirichlet’s Theorem. We begin by writing the element of coefficient space corresponding to $\alpha$ as

[-1/(\alpha+\overline{\alpha}):1:-\alpha\overline{\alpha}/(\alpha+\overline{\alpha})]=[\alpha_{1}:1:\alpha_{2}].

Then, we consider the multiples of this vector modulo $[\mathbb{Z}:\mathbb{Z}:\mathbb{Z}]$ . Dirichlet’s box principle (more commonly known as the pigeonhole principle, although I’ve heard it claimed that this very proof is the first recorded use of the idea in the mathematical literature) tells us two multiples are close to one another, so their difference, which we write as $n[\alpha_{1}:1:\alpha_{2}]$ , is close to a lattice element, which we denote $[p_{1}:n:p_{2}]$ ; this is a candidate good approximation $f_{\beta}$ . This tells us that certain linear forms are small:

|n\alpha_{1}-p_{1}|,\quad|n\alpha_{2}-p_{2}|,\quad|\alpha_{1}p_{2}-\alpha_{2}p_{1}|.

This gives a small discriminant pairing $\langle f_{\alpha},f_{\beta}\rangle$ , which in turn gives a small hyperbolic distance. ∎

Next, we re-interpret the result of Bugeaud and Evertse (see Figure 4.3).

Theorem 4.6 ([HST22, Theorem 6.6]).

Let $\alpha\in\mathbb{C}\smallsetminus\mathbb{R}$ be algebraic but not quadratic. Let $\epsilon>0$ . If $\alpha$ lies on a rational geodesic, then there are only finitely many quadratic $\beta$ such that

d_{U}(\alpha,\beta)\leq\operatorname{acosh}\left(1+\frac{1}{|\Delta_{\beta}|^{{2+\epsilon}}}\right).

Whether $\alpha$ is on a rational geodesic or not, amongst $\beta$ not sharing a rational geodesic with $\alpha$ , there are only finitely many such that

d_{U}(\alpha,\beta)\leq\operatorname{acosh}\left(1+\frac{1}{|\Delta_{\beta}|^{{3/2+\epsilon}}}\right).

The proof proceeds in a similar way, by moving to the same linear forms as in Theorem 4.5, and then one applies Schmidt’s subspace theorem.

4.3. Cubics and beyond

The cubics having two complex conjugate roots can be treated in a similar way, although the discriminant locus is more complicated. The lattice of coefficients is 4-dimensional, so the image is much more complicated and layered when drawn as roots in the upper half plane. It can help to embed it in a 3-dimensional space made up of the upper half plane root together with the real root. It is natural to use a disc model for the upper half plane, and a circle for the real line, resulting in a torus as their product. Some images are shown in Figures 4.4–4.6.

Figure 4.5. A detail of the cubics of Figure 4.4.

Figure 4.6. A 3d view of the cubic roots embedded in a torus (this image was created using Emily Dumas’ SL(View) software https://www.dumas.io/) [HST22, Figure 10].

4.4. Open problems

There are a great many open problems motivated by this perspective.

(1)

Is there a geodesic flow / continued fraction theory for good approximations by quadratics?
(2)

Is there a natural analog to a Lagrange spectrum?
(3)

Is there an analog to the Farey subdivision?
(4)

And many more; see [HST22].

5. Apollonian circle packings: geometric aspects

5.1. Schmidt subdivision

Asmus Schmidt developed a beautiful complex analogue to the Farey subdivision [Sch75]. Recall that the Farey subdivision on $\widehat{\mathbb{R}}=\partial\mathbb{H}^{2}_{U}$ is the boundary of a nested set of geodesics in $\mathbb{H}^{2}_{U}$ . Similarly, Schmidt’s subdivision views $\widehat{\mathbb{C}}$ as the boundary of the upper half space, and the subdivision is formed as the boundary of geodesic planes in the upper half space. The initial subdivision divides the plane by the use of two lines and two circles into eight regions (see Figure 5.1, top). Each triangle is subdivided as in the lower portion of Figure 5.1, by inserting a new circle tangent to all three sides. Repeating this, we obtain nested regions like in Figure 5.2. (Schmidt’s subdivision also incorporates dual circles orthogonal to these, but we will ignore them for now.)

Figure 5.1. A screenshot of Asmus Schmidt’s paper, Diophantine Approximation of Complex Numbers [Sch75, Figure 1, 1*]. Currently suppressed because permission has not yet been requested.

The analogy is as follows:

Continued fractions	Farey subdivision	Schmidt subdivision
	$\widehat{\mathbb{R}}$	$\widehat{\mathbb{C}}$
	$\operatorname{PSL}_{2}(\mathbb{Z})$	$\operatorname{PSL}_{2}(\mathbb{Z}[i])$
	intervals	circles and triangles
convergents	endpoints	tangency points
coefficients	series of nested geodesics	series of nested geodesic planes
	—	Apollonian circle packings

The last line of the table is a new and very rich phenomenon that only exists in the $\widehat{\mathbb{C}}$ case. Unlike the Farey subdivision, the Schmidt subdivisions are built out of intermediate ‘pieces,’ called Apollonian circle packings. It is this new object that is so fascinating.

5.2. Descartes quadruples and Apollonian circle packings

A Descartes quadruple is a collection of four circles, all pairwise mutually tangent, of disjoint interiors. Given three mutually tangent circles, there are exactly two circles, called Soddy circles, which can complete the triple to a Descartes quadruple. To construct an Apollonian circle packing, begin with three mutually tangent circles. At each stage, add in any missing Soddy circles for any mutually tangent triple in the packing, and repeat ad infinitum (Figure 5.3).

As famously proven by Descartes and Princess Elisabeth of Bohemia¹³ in correspondence [Sha07, pp. 73–81; AT 4:37, 4:44, 4:45], four circles form a Descartes quadruple if and only if their curvatures $a,b,c,d$ satisfy $Q(a,b,c,d)=0$ for the quadratic form

Q(a,b,c,d)=(a+b+c+d)^{2}-2(a^{2}+b^{2}+c^{2}+d^{2}).

(8)

¹³ What we know is as follows. Descartes proposed the problem (of finding the radii of the circles tangent to a given triple of circles) to Elisabeth in 1643, and she provided a solution, which is lost, but left Descartes, in his own words, ‘filled with joy.’ (Keep in mind he was writing to a Princess!) Descartes’ side of the correspondence having survived, we have two of his solutions, the second of which assumes the three starting circles are mutually tangent, and most closely matches what is now often called Descartes’ Theorem in this context.

In particular, given a mutually tangent triple of curvatures $a,b,c$ , the Soddy circles have curvatures $d_{1}$ and $d_{2}$ satisfying $Q(a,b,c,d_{i})=0$ ; these are related by

d_{1}+d_{2}=2(a+b+c).

There are various generalizations of this theorem, including to spheres in higher dimension [LMW02, AR23]. We will sometimes refer to the quadruple of curvatures as a Descartes quadruple also, at the risk of some minor confusion.

As we will exploit later, this has the beautiful consequence that if one begins with a Descartes quadruple of integer curvatures $a,b,c,d$ , then the entire packing will consist of integer curvatures. Such an integral packing is called primitive if it has no common factor among its curvatures. Some examples are shown in Figure 5.4.
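As a small illustration of Descartes’ relation, the following Python sketch evaluates $Q$ and solves $Q(a,b,c,d)=0$ as a quadratic in $d$, which gives $d=(a+b+c)\pm 2\sqrt{ab+bc+ca}$. The function names `descartes_form` and `soddy_curvatures` are our own, and the solver assumes integer input for which $ab+bc+ca$ is a perfect square (this is the situation for a triple drawn from an integral quadruple).

```python
from math import isqrt

def descartes_form(a, b, c, d):
    """Q(a,b,c,d) = (a+b+c+d)^2 - 2(a^2+b^2+c^2+d^2)."""
    return (a + b + c + d) ** 2 - 2 * (a * a + b * b + c * c + d * d)

def soddy_curvatures(a, b, c):
    """Both roots d of Q(a,b,c,d) = 0, i.e. d = (a+b+c) -/+ 2*sqrt(ab+bc+ca).

    Assumption (ours): a, b, c are integers and ab+bc+ca is a perfect square.
    """
    m = a * b + b * c + c * a
    if m < 0 or isqrt(m) ** 2 != m:
        raise ValueError("ab+bc+ca must be a non-negative perfect square")
    s, r = a + b + c, isqrt(m)
    return s - 2 * r, s + 2 * r

d1, d2 = soddy_curvatures(2, 2, 3)
print(d1, d2)                      # -1 15
print(d1 + d2 == 2 * (2 + 2 + 3))  # True
```

The two outputs are the curvatures of the outer circle and of the small circle in the gap of the triple $2,2,3$; their sum illustrates the relation $d_{1}+d_{2}=2(a+b+c)$.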

5.3. Apollonian group

The structure of the Apollonian packing is governed by the Apollonian group $\mathcal{A}$ , which acts freely and transitively on the collection of its unordered Descartes quadruples [GLM⁺05, Theorem 4.3]. In particular, it acts on quadruples of curvatures $(a,b,c,d)$ satisfying $Q(a,b,c,d)=0$ , and therefore is considered a subgroup of the orthogonal group $O_{Q}(\mathbb{Z})$ preserving the form $Q$ . The Descartes form $Q$ has signature $(3,1)$ . Specifically, $\mathcal{A}$ is generated by the four matrices

S_{1}:=\begin{pmatrix}-1&0&0&0\\ 2&1&0&0\\ 2&0&1&0\\ 2&0&0&1\\ \end{pmatrix},\quad S_{2}:=\begin{pmatrix}1&2&0&0\\ 0&-1&0&0\\ 0&2&1&0\\ 0&2&0&1\\ \end{pmatrix},\quad S_{3}:=\begin{pmatrix}1&0&2&0\\ 0&1&2&0\\ 0&0&-1&0\\ 0&0&2&1\\ \end{pmatrix},\quad S_{4}:=\begin{pmatrix}1&0&0&2\\ 0&1&0&2\\ 0&0&1&2\\ 0&0&0&-1\\ \end{pmatrix},

(9)

acting on row vectors of curvatures from the right. Each generator corresponds to fixing three of the circles in a Descartes quadruple and ‘swapping’ out one Soddy circle for its alternative (see Figure 5.5). The Descartes quadruples in any one packing constitute one orbit of the Apollonian group.
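The action on row vectors can be checked directly. The sketch below builds the four generators of (9) as tuples of tuples and applies them to the quadruple $(-1,2,2,3)$; the helper names (`swap_matrix`, `row_times`, `mat_mul`) are hypothetical and chosen only for this illustration.

```python
def swap_matrix(i):
    """S_{i+1}: the identity, except column i is (2, 2, 2, 2) with -1 on the diagonal."""
    return tuple(
        tuple((-1 if r == i else 2) if c == i else int(r == c) for c in range(4))
        for r in range(4)
    )

def row_times(v, M):
    """Row vector v times matrix M (the action from the right)."""
    return tuple(sum(v[r] * M[r][c] for r in range(4)) for c in range(4))

def mat_mul(A, B):
    """Matrix product AB, computed row by row."""
    return tuple(row_times(row, B) for row in A)

def descartes_form(v):
    """Q(v) = (sum of entries)^2 - 2 * (sum of squares)."""
    return sum(v) ** 2 - 2 * sum(x * x for x in v)

GENS = [swap_matrix(i) for i in range(4)]
IDENTITY = tuple(tuple(int(r == c) for c in range(4)) for r in range(4))

quad = (-1, 2, 2, 3)
for S in GENS:
    image = row_times(quad, S)
    # Each swap changes one entry and keeps Q = 0.
    print(image, descartes_form(image))
```

Note that $S_{4}$ fixes the vector $(-1,2,2,3)$ even though, geometrically, the circle of curvature $3$ is replaced by a different circle of the same curvature.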

Figure 5.5. An Apollonian swap, replacing the fourth circle $C_{4}$ with its alternate $C_{4}^{\prime}$ .

Figure 5.6. Cayley graph of $\mathcal{A}$ , shown to two levels.

This group, whose definition goes back to Hirst [Hir67], is our point of access to the rich arithmetic structure of the curvatures. Fuchs showed that it is a thin group [Fuc11], meaning that it has infinite index in $O_{Q}(\mathbb{Z})$ and yet is Zariski dense in the algebraic group $O_{Q}$ . Zariski density means that if a polynomial $f(\ldots,x_{ij},\ldots)$ vanishes on all elements of $\mathcal{A}$ (the $x_{ij}$ representing the entries of the matrix), then it vanishes on all matrices in $O_{Q}(\mathbb{R})$ . In other words, elements of $\mathcal{A}$ cannot be detected by any polynomial condition on their matrix entries.

These matrices satisfy the relations $S_{i}^{2}=I$ , and in fact there are no other relations [GLM⁺05, Proof of Theorem 4.3], so that

\mathcal{A}=\left\langle S_{1},S_{2},S_{3},S_{4}:S_{i}^{2}=1\right\rangle<\operatorname{O}_{Q}(\mathbb{Z}).

This means that the Cayley graph is particularly nice. The Cayley graph of a group $G$ with respect to a generating set $S$ is the graph whose vertices are the elements of $G$ and which has a directed edge from $g$ to $sg$ for all $s\in S$ and $g\in G$ . In the case of the Apollonian group, we take $S$ to be the set of generators $S_{1},S_{2},S_{3},S_{4}$ . Since these are involutions, we can consider the Cayley graph to be an undirected graph of degree $4$ . It will be a tree, as in Figure 5.6.
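The tree structure can be tested experimentally to small depth. In the sketch below (helper names are ours), we list all reduced words, meaning words in $S_{1},\ldots,S_{4}$ with no letter repeated consecutively, of length at most $4$, and check that they give pairwise distinct matrices. This is only a finite sanity check consistent with the absence of further relations, not a proof.

```python
def swap_matrix(i):
    """S_{i+1} as a tuple of tuples (see equation (9))."""
    return tuple(
        tuple((-1 if r == i else 2) if c == i else int(r == c) for c in range(4))
        for r in range(4)
    )

def mat_mul(A, B):
    """Matrix product AB for 4x4 matrices stored as tuples of tuples."""
    return tuple(
        tuple(sum(A[r][k] * B[k][c] for k in range(4)) for c in range(4))
        for r in range(4)
    )

GENS = [swap_matrix(i) for i in range(4)]
IDENTITY = tuple(tuple(int(r == c) for c in range(4)) for r in range(4))

def reduced_words(max_len):
    """All words in the letters 0..3, of length <= max_len, with no letter doubled."""
    words, frontier = [()], [()]
    for _ in range(max_len):
        frontier = [w + (i,) for w in frontier for i in range(4)
                    if not w or w[-1] != i]
        words.extend(frontier)
    return words

def word_matrix(word):
    """Product of the generators named by the word, left to right."""
    M = IDENTITY
    for i in word:
        M = mat_mul(M, GENS[i])
    return M

words = reduced_words(4)
matrices = {word_matrix(w) for w in words}
print(len(words), len(matrices))  # 161 161
```

The count $161=1+4+12+36+108$ is the number of vertices within distance $4$ of the root in a $4$-regular tree.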

Figure 5.7. Inversion $S_{1}^{\perp}$ .

Figure 5.8. The base quadruple.

5.4. Super-Apollonian group

Graham, Lagarias, Mallows, Wilks and Yan defined the super-Apollonian group by adding four circle inversions $S_{i}^{\perp}$ to the Apollonian group, one for inverting into each of the four circles of the Descartes quadruple (Figure 5.7). The matrix for $S_{i}^{\perp}$ is the transpose of the matrix $S_{i}$ , which is a consequence of a type of duality [GLM⁺05]. The presentation is

\langle S_{1},S_{2},S_{3},S_{4},S_{1}^{\perp},S_{2}^{\perp},S_{3}^{\perp},S_{4}^{\perp}:S_{i}^{2}=(S_{i}^{\perp})^{2}=1,\ S_{j}S_{k}^{\perp}=S_{k}^{\perp}S_{j},\ j\neq k\rangle.

This has finite index in $O_{Q}(\mathbb{Z})$ , so it is no longer thin. The words of length $5$ taken in normal form (eliminating $S_{i}^{2}$ , $(S_{i}^{\perp})^{2}$ , $S_{i}^{\perp}S_{j}$ ) are shown in Figure 5.9. In fact, the full orbit coincides with Schmidt’s subdivision shown in Figure 5.2. This gives a little bit of perspective on the way in which the Apollonian circle packings form an essential ‘piece’ of Schmidt’s vision of $\widehat{\mathbb{C}}$ .
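The relations among the $S_{i}$ and their transposes are easy to confirm numerically. The following sketch (with hypothetical helper names) checks that each transpose is an involution, that $S_{j}$ commutes with $S_{k}^{\perp}$ for $j\neq k$, and that $S_{1}$ does not commute with $S_{1}^{\perp}$. It verifies that these relations hold, not that they suffice to present the group.

```python
def swap_matrix(i):
    """S_{i+1} as a tuple of tuples (see equation (9))."""
    return tuple(
        tuple((-1 if r == i else 2) if c == i else int(r == c) for c in range(4))
        for r in range(4)
    )

def mat_mul(A, B):
    """Matrix product AB for 4x4 matrices stored as tuples of tuples."""
    return tuple(
        tuple(sum(A[r][k] * B[k][c] for k in range(4)) for c in range(4))
        for r in range(4)
    )

def transpose(M):
    """Transpose of a matrix stored as a tuple of tuples."""
    return tuple(zip(*M))

IDENTITY = tuple(tuple(int(r == c) for c in range(4)) for r in range(4))
S = [swap_matrix(i) for i in range(4)]
S_PERP = [transpose(M) for M in S]  # matrices of the inversions S_i^perp

involutions = all(mat_mul(M, M) == IDENTITY for M in S + S_PERP)
commuting = all(
    mat_mul(S[j], S_PERP[k]) == mat_mul(S_PERP[k], S[j])
    for j in range(4) for k in range(4) if j != k
)
same_index_commutes = mat_mul(S[0], S_PERP[0]) == mat_mul(S_PERP[0], S[0])
print(involutions, commuting, same_index_commutes)  # True True False
```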

Using this super-Apollonian perspective, one can define continued fractions for the complex plane, for approximating elements of $\mathbb{C}$ by Gaussian rationals [CFHS19]; this is distinct from, but related to, Schmidt’s method [Sch75].

5.5. Geometric Apollonian group

The Apollonian group has another incarnation, sometimes called the geometric Apollonian group (as distinct from its algebraic version above).

From this perspective, we view the strip packing as the orbit of the base quadruple shown in Figure 5.8, under a group of Möbius transformations called the geometric Apollonian group. In fact, these transformations can be taken to have entries from $\mathbb{Z}[i]$ , the Gaussian integers, so that we obtain a group $\mathcal{A}^{geo}<\operatorname{PSL}_{2}(\mathbb{Z}[i])\rtimes\langle\tau\rangle$ , where $\tau$ is complex conjugation. This larger group of conformal maps on $\widehat{\mathbb{C}}$ is sometimes called the generalized Möbius transformations [GLM⁺05]. One set of generators that suffices is [Sta18a, equation (8)]

z\mapsto\frac{(2i+1)\overline{z}-2}{2\overline{z}+2i-1},\quad z\mapsto-\overline{z}+2,\quad z\mapsto\frac{\overline{z}}{2\overline{z}-1},\quad z\mapsto-\overline{z}.

This generates the strip packing (Figure 5.12), although one must then scale by a factor of $2$ to obtain a primitive integral packing (note that the base quadruple in Figure 5.8 has curvatures $0,0,2,2$ ). All other Apollonian packings are images of the strip packing under some Möbius transformation.

Exercise 5.1.

Using the fact that the Möbius transformations act triply transitively on $\widehat{\mathbb{C}}$ , show that any two Apollonian packings are related by a Möbius transformation.

To relate the algebraic and geometric Apollonian groups, one can use the exceptional isomorphism

\operatorname{PGL}_{2}(\mathbb{C})\rightarrow\operatorname{SO}^{+}_{1,3}(\mathbb{R}).

(10)

For now, we leave aside the details (which are finicky). This is sometimes discussed in the language of the spin homomorphism $\operatorname{SL}_{2}(\mathbb{C})\rightarrow\operatorname{SO}_{1,3}(\mathbb{R})$ or spin double cover. See [Fuc11, GLM⁺05].

The group $\operatorname{PGL}_{2}(\mathbb{C})$ can be identified with $\operatorname{Isom}(\mathbb{H}^{3}_{U})$ , the isometries of the hyperbolic 3-space realized as the upper half space whose boundary is $\widehat{\mathbb{C}}$ . Under this interpretation, $\mathcal{A}^{geo}$ is a Kleinian group (that is, a discrete subgroup of $\operatorname{PGL}_{2}(\mathbb{C})$ ), with an infinite volume quotient hyperbolic $3$ -manifold $\mathcal{A}^{geo}\backslash\mathbb{H}^{3}_{U}$ . It is geometrically finite (in this context, this means the fundamental domain can be taken to have finitely many sides).

5.6. Limit set and circle growth

Viewing an Apollonian packing $\mathcal{P}$ as a subset of $\widehat{\mathbb{C}}$ , we define the residual set $\Lambda(\mathcal{P})$ as the closure of the union of the circles of the packing. There are countably many circles and countably many tangency points, but there are uncountably many points in $\Lambda(\mathcal{P})$ which are added in the closure process. The complement of $\Lambda(\mathcal{P})$ consists of the interiors of all the circles.

For $\mathcal{P}$ the strip packing, $\Lambda(\mathcal{P})$ is equal to the limit set of $\mathcal{A}^{geo}$ (i.e., the set of accumulation points for the orbit of a point under $\mathcal{A}^{geo}$ ). A similar statement holds for other packings, taking an appropriate conjugate of $\mathcal{A}^{geo}$ .

Theorem 5.2 ([McM98]).

The Hausdorff dimension of $\Lambda(\mathcal{P})$ is $\alpha:=1.30568\ldots$ .

This constant, for which no closed form is known, was recently rigorously computed to an impressive 128 digits by Vytnova and Wormell [VW24].

Exercise 5.3.

Let $\Lambda^{*}(\mathcal{P})$ temporarily denote the complement of the interiors of $\mathcal{P}$ . Hirst [Hir67] showed this has Hausdorff dimension less than $2$ , hence measure zero. Use this to show that $\Lambda^{*}(\mathcal{P})=\Lambda(\mathcal{P})$ .

The Hausdorff dimension controls the growth of the number of circles in the packing in terms of size:

Theorem 5.4 ([KO11]).

Let $N_{\mathcal{P}}(X)=\#\{C\in\mathcal{P}:\operatorname{curv}(C)<X\}$ . Then

N_{\mathcal{P}}(X)\sim c_{\mathcal{P}}X^{\alpha}.

An error term and other refinements came afterward [LO13, OS12]. The constant $c_{\mathcal{P}}$ is called the Apollonian constant for the packing $\mathcal{P}$ and a formula is given in [Vin14, Remark 2.9].
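To experiment with $N_{\mathcal{P}}(X)$, one can enumerate circles by walking the tree of Apollonian swaps. The sketch below is one way to do this, under assumptions that we state explicitly: the packing is bounded, and the starting quadruple is a root quadruple (no swap decreases a curvature), so that curvatures do not decrease along any branch and a branch can be abandoned once it exceeds the bound. The function name `curvatures_below` is ours.

```python
def curvatures_below(root, bound):
    """Sorted curvatures (with multiplicity) of all circles of curvature < bound.

    Assumptions (ours): `root` is a root quadruple of a bounded packing.
    """
    found = [c for c in root if c < bound]
    stack = [(tuple(root), None)]  # (quadruple, index of the most recent swap)
    while stack:
        quad, last = stack.pop()
        total = sum(quad)
        for i in range(4):
            if i == last:
                continue  # swapping back would undo the previous move
            new = 2 * (total - quad[i]) - quad[i]  # the other Soddy curvature
            if new < bound:
                found.append(new)
                stack.append((quad[:i] + (new,) + quad[i + 1:], i))
    return sorted(found)

print(curvatures_below((-1, 2, 2, 3), 12))
# [-1, 2, 2, 3, 3, 6, 6, 6, 6, 11, 11, 11, 11]

for X in (100, 1000, 10000):
    print(X, len(curvatures_below((-1, 2, 2, 3), X)))
```

Comparing the printed counts for increasing $X$ gives a rough empirical feel for the growth rate, although convergence to the exponent $\alpha$ is slow.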

For further details in this direction, consult [GLM⁺05, Theorem 4.1], [Oh14] and [Oh10].

5.7. Quadratic forms

The geometric Apollonian group has as a subgroup the $2$ -congruence subgroup, the kernel of reduction modulo $2$ on $\operatorname{PSL}_{2}(\mathbb{Z})$ :

\Gamma(2):=\{M\in\operatorname{PSL}_{2}(\mathbb{Z}):M\equiv I~{}(\textup{mod}~{}2)\}\subseteq\operatorname{PSL}_{2}(\mathbb{Z}).

This is an arithmetic group (much more is known for arithmetic groups than for thin groups), and is a Fuchsian group, that is, a discrete subgroup of $\operatorname{PSL}_{2}(\mathbb{R})$ . The group $\Gamma(2)$ is the subgroup of $\mathcal{A}^{geo}$ stabilizing $\widehat{\mathbb{R}}$ .

Take the three circles of the base quadruple tangent to $\widehat{\mathbb{R}}$ (Figure 5.8); those of radius $1/2$ centred on $i/2$ and $1+i/2$ , and the circle $i+\widehat{\mathbb{R}}$ , i.e. the horizontal line through $i$ (this is indeed tangent to $\widehat{\mathbb{R}}$ at $\infty$ ; exercise). The group $\Gamma(2)$ preserves tangencies, and so the orbit of the base quadruple under $\Gamma(2)$ gives a family of circles tangent to $\widehat{\mathbb{R}}$ . This is the family of the Ford circles: the circles tangent to $\widehat{\mathbb{R}}$ at each rational number $p/q\in\mathbb{Q}$ (in lowest form) of radius $1/(2q^{2})$ (plus the horizontal line through $i$ ). See Figure 5.10. Their curvatures, as a family, are exactly twice the perfect squares.

Exercise 5.5.

Prove this statement (that the Ford circles are the indicated orbit).

More generally, fix a circle $\mathcal{C}$ , which we will call the mother circle. Letting $M\cdot\widehat{\mathbb{R}}=\mathcal{C}$ , the subgroup $M\Gamma(2)M^{-1}$ is the stabilizer of $\mathcal{C}$ in $M\mathcal{A}^{geo}M^{-1}$ and gives rise to a collection of circles tangent to $\mathcal{C}$ (see Figure 5.11) as the image of the Ford circles under $M$ . The curvatures of this family of circles are exactly the primitively represented values of a translated quadratic form. Filling out the details of this relationship proves the following theorem, first observed in this form by Sarnak [Sar], but present in another form in [GLM⁺03].

Theorem 5.6.

Let $\mathcal{C}$ be a circle of curvature $c$ within an Apollonian circle packing $\mathcal{P}\subseteq\widehat{\mathbb{C}}$ . Then there is a real binary quadratic form $f_{\mathcal{C}}(x,y)$ of discriminant $-4c^{2}$ such that the set of curvatures of circles tangent to $\mathcal{C}$ within $\mathcal{P}$ is the set

\{f_{\mathcal{C}}(x,y)-c:x,y\in\mathbb{Z},(x,y)=1\}.

Proof.

The original observation was derived as a consequence of Descartes’ relation. Here we give a proof using $\mathcal{A}^{geo}$ , beginning with the strip packing. Recall that $\mathcal{A}^{geo}$ generates the strip packing from the base quadruple $(0,0,2,2)$ of Figure 5.8. Consider the circle which is the horizontal line through $i$ . The corresponding Möbius transformation from $\widehat{\mathbb{R}}$ is $\begin{pmatrix}1&i\\ 0&1\end{pmatrix}$ . If we post-compose by $\Gamma(2)$ (which, we recall, acts to permute the circles tangent to $\widehat{\mathbb{R}}$ ), we have a coset

\Gamma(2)\begin{pmatrix}1&i\\ 0&1\end{pmatrix}

of Möbius transformations representing the circles tangent to $\widehat{\mathbb{R}}$ . However, these are all oriented opposite to how they should be (having negative curvatures), so we must post-compose by, say, $\begin{pmatrix}0&1\\ 1&0\end{pmatrix}$ .

Now more generally, consider a packing that is the image of the strip packing under some $M\in\operatorname{PSL}_{2}(\mathbb{C})$ , where we wish to parametrize the family of circles tangent to $\mathcal{C}=M\cdot\widehat{\mathbb{R}}$ . Then this family is given by:

M\begin{pmatrix}0&1\\ 1&0\end{pmatrix}\Gamma(2)\begin{pmatrix}1&i\\ 0&1\end{pmatrix}.

Now we apply Proposition 3.13 to compute the curvatures of this family.

Taking an element $\begin{pmatrix}x&r\\ y&s\end{pmatrix}\in\Gamma(2)$ , we obtain

\begin{aligned}2\Im\left(\overline{(x\delta+y\gamma)}\left((r\delta+s\gamma)+i(x\delta+y\gamma)\right)\right)&=2\Im\left(\overline{(x\delta+y\gamma)}(r\delta+s\gamma)\right)+2\Im\left(\overline{(x\delta+y\gamma)}\,i(x\delta+y\gamma)\right)\\ &=-2\Im(\overline{\gamma}\delta)+2N(x\delta+y\gamma).\end{aligned}

The first term is the curvature of $\mathcal{C}$ . The second is a quadratic form in integral variables $x,y$ . ∎

Exercise 5.7.

Using the proof above, recover the fact that the Ford circles have curvatures $2x^{2}$ .

Exercise 5.8.

Prove the theorem from the Descartes relation.

The proof hints at how to compute the form $f_{\mathcal{C}}$ . Fix a Descartes quadruple $\mathcal{C}_{1},\mathcal{C}_{2},\mathcal{C}_{3},\mathcal{C}_{4}$ containing $\mathcal{C}_{1}=\mathcal{C}$ , the mother circle. Write $[n,a,b,c]$ for the quadruple of curvatures in the same order. Choose $M$ to take $\infty$ , $0$ , $1$ to $a,b,c$ . Then, the form is

f_{\mathcal{C}}(x,y)=(n+a)x^{2}+(n+a+b-c)xy+(n+b)y^{2};

because then we recover the curvatures $a,b,c$ from $f_{\mathcal{C}}(x,y)-n$ where $(x,y)=(1,0),(0,1),(1,-1)$ respectively (the value at $(1,1)$ is instead the other Soddy curvature $2(n+a+b)-c$ ). Notice that a different choice of quadruple including $\mathcal{C}_{1}$ corresponds to a different $M$ and a change of variables on $f_{\mathcal{C}}$ within the $\operatorname{PGL}_{2}(\mathbb{Z})$ -equivalence class. The following strong statement encompasses this.

Proposition 5.9 ( [GLM⁺03, Theorem 4.2], [Ric24, Proposition 3.1.2] ).

Let $n\in\mathbb{R}$ . The quadruples $[n,a,b,c]\in\mathbb{R}^{4}$ satisfying $Q(n,a,b,c)=0$ for the Descartes quadratic form (8) biject with the set of positive semi-definite (this means not taking negative values) binary quadratic forms $Ax^{2}+Bxy+Cy^{2}$ of discriminant $-4n^{2}$ . The map is

\phi:[n,a,b,c]\mapsto(n+a)x^{2}+(n+a+b-c)xy+(n+b)y^{2}.

Furthermore, if we identify quadruples $[n,a,b,c]$ under the action of $S_{2}$ , $S_{3}$ , $S_{4}$ and under permutation of the last three entries, then equivalence classes of Descartes quadruples $[n,a,b,c]$ are in bijection with $\operatorname{GL}_{2}(\mathbb{Z})$ -equivalence classes of positive semi-definite binary quadratic forms of discriminant $-4n^{2}$ .

We have seen that these forms are a reflection of the Fuchsian group $\Gamma(2)$ and its conjugates inside the Apollonian group. These form one of the main tools used to study Apollonian circle packings (more generally, these are a special feature of certain thin Kleinian groups that cause them to behave similarly to the Apollonian case [FSZ19]).
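The map $\phi$ is simple enough to experiment with directly. The following sketch (function names are ours) computes the form attached to a quadruple, checks its discriminant, and evaluates $f_{\mathcal{C}}(x,y)-n$ at a few primitive vectors; we use the quadruple $[-1,2,3,6]$ as an example.

```python
def form_from_quadruple(n, a, b, c):
    """Coefficients (A, B, C) of phi([n,a,b,c]) = A x^2 + B xy + C y^2."""
    return n + a, n + a + b - c, n + b

def quadruple_from_form(form, n):
    """Inverse of phi for a fixed first entry n."""
    A, B, C = form
    return n, A - n, C - n, A + C - B - n

def discriminant(form):
    """B^2 - 4AC."""
    A, B, C = form
    return B * B - 4 * A * C

def translated_value(form, n, x, y):
    """f(x, y) - n, the curvature attached to the primitive vector (x, y)."""
    A, B, C = form
    return A * x * x + B * x * y + C * y * y - n

n, a, b, c = -1, 2, 3, 6
f = form_from_quadruple(n, a, b, c)
print(f, discriminant(f))  # (1, -2, 2) -4
print([translated_value(f, n, x, y) for (x, y) in [(1, 0), (0, 1), (1, -1), (1, 1)]])
# [2, 3, 6, 2]
```

In this example the vectors $(1,0)$, $(0,1)$, $(1,-1)$ return $a$, $b$, $c$, while $(1,1)$ returns $2(n+a+b)-c=2$, the curvature obtained from $c$ by an Apollonian swap.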

5.7.1. The space of Descartes quadruples over $\mathbb{R}$

Proposition 5.9 allows us to map Descartes quadruples into the upper half plane $\mathbb{H}^{2}_{U}$ , using the correspondence $Ax^{2}+Bxy+Cy^{2}\mapsto\frac{-B\pm\sqrt{B^{2}-4AC}}{2A}$ as studied in Section 3.3. Thus, $\mathbb{H}^{2}_{U}$ is the space of pairs $(\mathcal{C},A)$ where $\mathcal{C}$ is a circle and $A$ an Apollonian circle packing containing it, where these pairs are identified under affine transformations (i.e. we only think of the geometry of the packing, not its position in space).

This raises an interesting question: can we explore this parameter space? Since we are in the context of real curvatures (not necessarily integral or rational), the packing generated by a quadruple has more variety than we have so far discussed: there are bounded packings, half-plane packings, and full-plane packings (see Figure 5.12).

It is natural to ask which type of packing one obtains – for example, what is the locus, in the upper half plane, of the bounded packings? The answer is striking: colouring the parameter space according to the type of packing results in the strip packing itself in the upper half plane! Let $\mathbb{H}^{2}_{U}$ denote the upper half plane considered as a parameter space of quadruples, and let $\mathcal{P}$ be the strip packing inside $\mathbb{H}^{2}_{U}$ . Then:

(1)

the bounded packings occur exactly in the interiors of the circles of $\mathcal{P}$ ;
(2)

the strip packing occurs at each tangency point of $\mathcal{P}$ ;
(3)

the half-plane packings occur on the circles of $\mathcal{P}$ , away from tangency points;
(4)

the full-plane packings occur at all remaining points, which are the points in $\Lambda(\mathcal{P})\smallsetminus\mathcal{P}$ .

In Figure 5.13, we label the circles of $\mathcal{P}$ by their depth: the number of Apollonian swaps required to reach the outer circle. This is explained in [Ric24]; see also variations on this parameter space in [Hol21] and [Koc20].

6. Apollonian circle packings: number theory aspects

6.1. What is known about curvatures?

Which integers appear as curvatures in a primitive integral Apollonian circle packing? It is evident that not all curvatures can appear, since the packing doesn’t have enough geometric ‘room’ to fit all the small curvatures. However, once we start collating the larger curvatures, we begin to see many repeats. So it may be reasonable to believe that we eventually begin to see every sufficiently large curvature.

This is not true, however; see Figure 6.1. Graham, Lagarias, Mallows, Wilks and Yan observed a congruence restriction on curvatures [GLM⁺03]. Precisely, they observed that certain residue classes modulo $24$ are entirely avoided by the set of curvatures in a primitive integral Apollonian circle packing. Specifically, when reducing the curvatures of an Apollonian packing modulo $24$ , one obtains one of the following six possible sets of admissible curvatures (for a complete proof, see [HKRS24, Proposition 2.1]):

type	residues
$(6,1)$	$0,1,4,9,12,16$
$(6,5)$	$0,5,8,12,20,21$
$(6,13)$	$0,4,12,13,16,21$
$(6,17)$	$0,8,9,12,17,20$
$(8,7)$	$3,6,7,10,15,18,19,22$
$(8,11)$	$2,3,6,11,14,15,18,23$

Each admissible set is assigned a type for reference, which is $(n,k)$ where $n$ is the number of residues and $k$ is the smallest residue coprime to $24$ .

To understand the local (i.e., congruence) obstructions that may appear for a modulus $n$ , one can label the Cayley graph by Descartes quadruples (beginning from the root labelled with any fixed quadruple of the packing), and reduce modulo $n$ , identifying vertices with identical quadruples modulo $n$ . We obtain a picture such as that in Figure 6.2. For some moduli, the reduced Cayley graph is small, and some residue classes are missed amongst the curvatures (as in the figure). This proves that the corresponding residue class can never occur as the residue of a curvature in the Apollonian circle packing.
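The reduced Cayley graph is straightforward to compute by breadth-first search. The sketch below (function name and return format are our own choices) explores the orbit of a quadruple modulo $m$ under the four swaps and reports which residues occur among its entries, together with the number of vertices found.

```python
from collections import deque

def residues_mod(quad, modulus=24):
    """Residues occurring in the orbit of `quad` modulo `modulus`.

    Returns (sorted list of residues, number of distinct quadruples mod modulus).
    """
    start = tuple(c % modulus for c in quad)
    seen = {start}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        total = sum(q)
        for i in range(4):
            # Apollonian swap in position i, reduced modulo `modulus`.
            new = (2 * (total - q[i]) - q[i]) % modulus
            nxt = q[:i] + (new,) + q[i + 1:]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted({c for q in seen for c in q}), len(seen)

print(residues_mod((-1, 2, 2, 3))[0])  # [2, 3, 6, 11, 14, 15, 18, 23]
print(residues_mod((-3, 5, 8, 8))[0])  # [0, 5, 8, 12, 20, 21]
```

The two outputs are the admissible sets of types $(8,11)$ and $(6,5)$ from the table above.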

Graham, Lagarias, Mallows, Wilks and Yan conjectured that this obstruction is essentially all we expect, giving the set of curvatures a positive density in the natural numbers [GLM⁺03, Positive Density Conjecture]; and that maybe even only finitely many exceptions beyond the congruence obstructions occur [GLM⁺03, Strong Density Conjecture]. This came to be known as the local-to-global conjecture for Apollonian circle packings, later refined by Fuchs-Sanden [FS11], who collected data on several packings, computing curvatures up to size $5\times 10^{8}$ . In particular, they computed the multiplicity of appearance for curvatures. An example of the results is shown in Figure 6.3, and this indicates that, for most admissible curvatures, the expected multiplicity grows with the curvature size.

If we write $\mathcal{K}(N):=\#\{n<N:n\text{ is a curvature in ${\mathcal{P}}$}\}$ , then the conjecture implies that

\mathcal{K}(N)=cN+O(1),

(11)

where $c=\frac{\mbox{\# admissible curvatures modulo $24$}}{24}$ .

Over time, lower bounds on the number of distinct integer curvatures which appear in a packing have gradually improved. In the original paper of Graham, Lagarias, Mallows, Wilks and Yan, it was shown that $\mathcal{K}(N)\gg\sqrt{N}$ . Sarnak was able to show that $\mathcal{K}(N)\gg\frac{N}{\sqrt{\log N}}$ [Sar]. Positive density (meaning a positive proportion of integers) was shown by Bourgain and Fuchs: $\mathcal{K}(N)\gg N$ [BF11]. This was improved to density one using the Hardy-Littlewood circle method by Bourgain and Kontorovich: $\exists\eta>0,\;\mathcal{K}(N)=cN+O(N^{1-\eta})$ , where $\eta$ is effectively computable [BK14]. Finally, this result was extended to other packings by Fuchs, Stange and Zhang [FSZ19].

However, it was recently discovered [HKRS24] that most Apollonian circle packings are subject to some additional more subtle restrictions on their curvatures: the local-to-global conjecture is false! The source of these new obstructions is quadratic reciprocity. Recall from Definition 2.6 that the Legendre symbol for an integer $a$ and prime $p$ is defined as:

\left(\frac{a}{p}\right)=\left\{\begin{array}[]{ll}1&a\text{ is a non-zero square modulo }p\\ -1&a\text{ is not a square modulo }p\\ 0&a\text{ is zero modulo }p\\ \end{array}\right.

This is multiplicative in the numerator:

\left(\frac{ab}{p}\right)=\left(\frac{a}{p}\right)\left(\frac{b}{p}\right).

More generally, the Jacobi symbol extends the Legendre symbol multiplicatively for odd positive denominators:

\left(\frac{a}{p_{1}p_{2}}\right)=\left(\frac{a}{p_{1}}\right)\left(\frac{a}{p_{2}}\right).

One can extend even further to the Kronecker symbol defined for all integers. Then, quadratic reciprocity says that for two odd primes $p$ and $q$ , there is a symmetry between the behaviour of $p$ modulo $q$ and $q$ modulo $p$ :

\left(\frac{p}{q}\right)=(-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}}\left(\frac{q}{p}\right).

There are special rules for $2$ and $-1$ .
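For experimentation, the Jacobi symbol can be computed without factoring, using reciprocity together with the supplementary rule for $2$ (namely that $\left(\frac{2}{n}\right)=-1$ exactly when $n\equiv 3,5$ modulo $8$). The sketch below is the standard textbook algorithm, not something taken from the references of these notes; the Legendre symbol via Euler’s criterion is included for comparison.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for an odd positive integer n."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be odd and positive")
    a %= n
    result = 1
    while a:
        while a % 2 == 0:            # pull out factors of 2 using the rule for (2/n)
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                  # reciprocity: flip, with a sign if both are 3 mod 4
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def legendre(a, p):
    """Legendre symbol for an odd prime p, via Euler's criterion a^((p-1)/2) mod p."""
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

print(jacobi(8, 5), legendre(8, 5))  # -1 -1
primes = [3, 5, 7, 11, 13, 17, 19, 23]
print(all(
    jacobi(p, q) * jacobi(q, p) == (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
    for p in primes for q in primes if p != q
))  # True
```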

To demonstrate where these new reciprocity obstructions come from, we prove a single example.

Theorem 6.1 ([HKRS24, Theorem 1.9]).

The Apollonian circle packing generated by the quadruple $(-3,5,8,8)$ has no square curvatures.

Proof.

This packing has the property that all curvatures are $0$ or $1~{}(\textup{mod}~{}4)$ . Let $\mathcal{C}$ be a circle of curvature $n$ . By Theorem 5.6, the circles tangent to $\mathcal{C}$ have curvatures arising as the primitively represented values of a translated quadratic form $f_{\mathcal{C}}(x,y)-n$ of discriminant $-4n^{2}$ . Therefore the form $f_{\mathcal{C}}(x,y)$ becomes degenerate modulo $n$ , being equivalent to $Ax^{2}$ for some coefficient $A$ after a change of variables. We see then that the invertible values of $f_{\mathcal{C}}$ lie in one multiplicative coset of the squares (see [HKRS24, Proposition 4.1]). Thus we define $\chi_{2}(\mathcal{C})$ to be the unique non-zero value of the Kronecker symbol $\left(\frac{c}{n}\right)$ , where $c$ ranges over the curvatures of circles tangent to $\mathcal{C}$ .

Now suppose two circles $\mathcal{C}_{1}$ and $\mathcal{C}_{2}$ are tangent in the packing, having coprime curvatures $a$ and $b$ , respectively. Then, by quadratic reciprocity (recall that all curvatures here are $0$ or $1$ modulo $4$ ),

\chi_{2}(\mathcal{C}_{1})\chi_{2}(\mathcal{C}_{2})=\left(\dfrac{a}{b}\right)\left(\dfrac{b}{a}\right)=1.

So $\chi_{2}(\mathcal{C}_{1})=\chi_{2}(\mathcal{C}_{2})$ .

By [HKRS24, Corollary 4.7], any two circles are connected by a path of consecutively pairwise coprime curvatures, and so $\chi_{2}(\mathcal{C})$ is constant across the entire packing. It remains to compute this value using one pair of circles, say in the root quadruple:

\chi_{2}(\mathcal{P})=\left(\dfrac{8}{5}\right)=-1.

If there did exist a circle $\mathcal{C}$ of square curvature in $\mathcal{P}$ , it would give $\chi_{2}(\mathcal{P})=1$ , a contradiction. ∎
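The theorem can be sanity-checked numerically up to a modest bound. The sketch below enumerates the curvatures below $2000$ in the packing generated by $(-3,5,8,8)$, using the same tree walk over Apollonian swaps as one would use for counting circles, and looks for perfect squares. The function names and the bound are our own choices, and a finite search is of course only consistent with the theorem rather than a substitute for the proof.

```python
from math import isqrt

def curvatures_below(root, bound):
    """Curvatures (with multiplicity) below `bound`.

    Assumptions (ours): `root` is a root quadruple of a bounded packing.
    """
    found = [c for c in root if c < bound]
    stack = [(tuple(root), None)]  # (quadruple, index of the most recent swap)
    while stack:
        quad, last = stack.pop()
        total = sum(quad)
        for i in range(4):
            if i == last:
                continue  # do not undo the previous swap
            new = 2 * (total - quad[i]) - quad[i]
            if new < bound:
                found.append(new)
                stack.append((quad[:i] + (new,) + quad[i + 1:], i))
    return found

def legendre(a, p):
    """Legendre symbol for an odd prime p, via Euler's criterion."""
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

curvs = curvatures_below((-3, 5, 8, 8), 2000)
squares = [c for c in curvs if c > 0 and isqrt(c) ** 2 == c]
print(len(curvs), squares)                 # the list of squares is empty
print(all(c % 4 in (0, 1) for c in curvs)) # True
print(legendre(8, 5))                      # -1, the value of chi_2 used in the proof
```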

Exercise 6.2.

Prove that any two circles in an integral Apollonian circle packing are connected by a path of consecutively coprime curvatures.

In general, we define two functions $\chi_{2}$ and $\chi_{4}$ on the set of Descartes quadruples, or, if you prefer, on the set of pairs $(\mathcal{C},\mathcal{P})$ of a circle in a packing. These should be thought of in the vein of characters or Legendre symbols; in particular, they take values $\chi_{2}\in\{\pm 1\}$ and $\chi_{4}\in\{\pm 1,\pm i\}$ . To define them in general is a little complicated, but we can explain $\chi_{2}(\mathcal{C})$ fairly simply in the case of a packing of type $(6,*)$ as the Legendre symbol $\left(\frac{a}{b}\right)$ where $b$ is the curvature of $\mathcal{C}$ , and $a$ is a coprime curvature tangent to $\mathcal{C}$ , and $a$ and $b$ are both coprime to $6$ . The symbol $\chi_{4}$ is only relevant, hence only defined, for types $(6,1)$ and $(6,17)$ .

Then, it is a consequence of quadratic reciprocity that the values of $\chi_{2}$ are well-defined for a packing $\mathcal{P}$ independent of the circle $\mathcal{C}$ . Hence we can write $\chi_{2}(\mathcal{P})$ . Similarly, $\chi_{4}(\mathcal{P})$ is well-defined because of quartic reciprocity.

Theorem 6.3 (Haag-Kertzer-Rickards-Stange [HKRS24]).

Let $\mathcal{P}$ be a primitive integral Apollonian circle packing. There is an explicit list of obstructions (families of missing curvatures) of the form $\{ux^{2}:x\in\mathbb{Z}\}$ for $u\mid 6$ or $\{ux^{4}:x\in\mathbb{Z}\}$ for $u\mid 36$ that are missed by the list of curvatures in $\mathcal{P}$ . The list is determined entirely by the set of admissible curvatures, and by the value(s) $\chi_{2}(\mathcal{P})$ and $\chi_{4}(\mathcal{P})$ , if defined.

The full list is given here, where the type is now extended to be of the form $(n,k,\chi_{2})$ or $(n,k,\chi_{2},\chi_{4})$ :

type	quadratic obstructions	quartic obstructions
$(6,1,1,1)$
$(6,1,1,-1)$		$n^{4},4n^{4},9n^{4},36n^{4}$
$(6,1,-1)$	$n^{2},2n^{2},3n^{2},6n^{2}$
$(6,5,1)$	$2n^{2},3n^{2}$
$(6,5,-1)$	$n^{2},6n^{2}$
$(6,13,1)$	$2n^{2},6n^{2}$
$(6,13,-1)$	$n^{2},3n^{2}$
$(6,17,1,1)$	$3n^{2},6n^{2}$	$9n^{4},36n^{4}$
$(6,17,1,-1)$	$3n^{2},6n^{2}$	$n^{4},4n^{4}$
$(6,17,-1)$	$n^{2},2n^{2}$
$(8,7,1)$	$3n^{2},6n^{2}$
$(8,7,-1)$	$2n^{2}$
$(8,11,1)$
$(8,11,-1)$	$2n^{2},3n^{2},6n^{2}$

These families of powers are called reciprocity obstructions and more specifically quadratic obstructions and quartic obstructions. Certain packings have no reciprocity obstructions at all, but many (most, in a suitable sense) do.

We use the term missing for the curvatures which do not appear in a packing but are allowed by the congruence obstructions. Curvatures which are allowed by the known linear (congruence), quadratic, and quartic obstructions but are still nevertheless missing, are called sporadic.

Conjecture 6.4 (Haag-Kertzer-Rickards-Stange [HKRS24]).

Let ${\mathcal{P}}$ be a primitive integral Apollonian circle packing. Then the sporadic set $S({\mathcal{P}})$ is finite.

This actually says that, instead of (11), we are in most cases asserting at best that

\mathcal{K}(N)=cN+O(\sqrt{N}).

(12)

Haag, Kertzer, Rickards and Stange collected data on the missing curvatures up to bounds between $10^{10}$ and $10^{12}$ in a few dozen small packings, and the data supports the conjecture that the sporadic set is finite, as the set peters out in that range and appears to end.

For a nice exposition on Apollonian circle packings and their integer curvatures (before the newest obstructions were found, however), see [Fuc13]. The rest of this section is devoted to some of the key tools that appear in proofs concerning integral packings, particularly lower bounds on the number of distinct curvatures.

6.2. Integral quadratic forms

We have seen in Theorem 5.6 that the circles tangent to a fixed ‘mother’ circle represent the values of a translated quadratic form. It is furthermore the case that when $\mathcal{P}$ is a primitive integral packing, then the form is a primitive integral positive semi-definite form, and the multiplicity with which a curvature $k\in\mathbb{Z}$ appears is exactly the number of primitive solutions $(x,y)$ to $f_{\mathcal{C}}(x,y)-c=k$ . Since quadratic forms are (arguably) well-understood, many of the analytic lower bound results so far mentioned involve collecting ensembles of such forms and restricting their overlap to grow large collections of curvatures.

In the section on Schmidt arrangements that follows, we will see that the images of the strip packing $(0,0,2,2)$ under $\operatorname{PSL}_{2}(\mathbb{Z}[i])$ include all primitive integral packings (all scaled by two); see Theorem 7.3. We will see that up to similarity, they are in bijection with ideal classes of orders of the Gaussian integers.

6.3. Expander graphs

Consider the Cayley graphs of the Apollonian group modulo $p^{m}$ . To see how these are useful, we consider the spectral theory of graphs, which is to say, studying the eigenvalues of the adjacency matrix. The graph we are interested in, denoted $\mathcal{A}_{p^{m}}$ , is the Cayley graph of $\mathcal{A}^{geo}/p^{m}$ (coefficients of the matrices taken modulo $p^{m}$ ) using the images of the standard swaps as generators. This is still $4$ -regular, just as for the Cayley graph of $\mathcal{A}$ . These Cayley graphs, taken as a family, are an expander family, meaning that they are well-connected in a precise sense as $p^{m}$ grows.

We will develop the basics of the spectral theory slightly more generally. Let $\mathcal{G}$ be a $d$ -regular graph with vertex set $V$ of size $n$ and edge set $E$ . (The spectral theory of non-regular graphs is significantly more complex in various ways.) Let $A$ be the $n\times n$ adjacency matrix of $\mathcal{G}$ , whose $ij$ -th entry is $1$ when the $i$ -th vertex is connected to the $j$ -th vertex, and $0$ otherwise. The normalized adjacency matrix $\frac{1}{d}A$ can be thought of as an operator controlling the flow of mass between vertices. If $\mathbf{x}$ is a vector whose entries are indexed by the vertices of $\mathcal{G}$ , then $\frac{1}{d}A\mathbf{x}$ has entries

\frac{1}{d}\sum_{w\sim v}x_{w},

where $v\sim w$ denotes adjacency, so the sum is over the neighbours. Imagine the vector $\mathbf{x}$ denotes a distribution of mass amongst the vertices of $\mathcal{G}$ . Then $\frac{1}{d}A\mathbf{x}$ is the mass distribution after each vertex ‘gives away’ its mass uniformly to its neighbours (that is, it sends $1/d$ of its mass to each neighbour), and, consequently, receives $1/d$ of the mass of each of its neighbours. This is a type of Markov chain.

Under this perspective, an eigenvector of this matrix is a distribution of mass which is scaled under such a flow. One such eigenvector is the uniform distribution (the same mass at all vertices), which has eigenvalue $1$ . In fact, the eigenvalues $\lambda_{i}$ of this matrix are real, and satisfy

1=\lambda_{0}\geq\lambda_{1}\geq\cdots\geq-1.

That they are real is a property of symmetric matrices; that they lie in $[-1,1]$ follows because each row of $\frac{1}{d}A$ is non-negative and sums to $1$. Moreover, since the adjacency matrix is real and symmetric, there is an orthonormal basis of eigenvectors.

If $\mathcal{G}$ is connected, then $\lambda_{0}>\lambda_{1}$ (see Exercise 6.7). The size of this spectral gap measures the connectedness of the graph in some sense. One intuition for this is to consider the convergence of the Markov chain toward the uniform distribution. Let $\mathbf{v}_{0},\ldots,\mathbf{v}_{n-1}$ be an orthonormal basis of eigenvectors such that $\frac{1}{d}A\mathbf{v}_{i}=\lambda_{i}\mathbf{v}_{i}$ . Let $\mathbf{w}$ be any mass distribution on the graph. Then $\mathbf{w}=\sum\alpha_{i}\mathbf{v}_{i}$ . Then we have

\left(\frac{1}{d}A\right)^{k}\mathbf{w}=\alpha_{0}\mathbf{v}_{0}+\sum_{i>0}\lambda_{i}^{k}\alpha_{i}\mathbf{v}_{i}.

From this, using that the basis of eigenvectors is orthonormal,

\left|\left|\left(\frac{1}{d}A\right)^{k}\mathbf{w}-\alpha_{0}\mathbf{v}_{0}\right|\right|_{2}^{2}=\sum_{i>0}|\lambda_{i}|^{2k}|\alpha_{i}|^{2}\leq|\lambda_{1}|^{2k}||\mathbf{w}||^{2}.

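Thus we see that the rate of convergence to the uniform distribution $\alpha_{0}\mathbf{v}_{0}$ is controlled by the spectral gap (i.e., by $|\lambda_{1}|$, or more precisely by the largest $|\lambda_{i}|$ with $i>0$, in case a negative eigenvalue is larger in absolute value).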
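To make the convergence estimate concrete, here is a minimal Python sketch on the $5$-cycle, which is $2$-regular and connected. The choice of graph and the helper names (`cycle_graph`, `flow_step`, `distance_to_uniform`) are assumptions made for illustration. For the $n$-cycle the eigenvalues of $\frac{1}{d}A$ are $\cos(2\pi j/n)$, so for $n=5$ the largest absolute value other than $1$ is $\cos(\pi/5)\approx 0.809$, attained by a negative eigenvalue.

```python
from fractions import Fraction
import math

def cycle_graph(n):
    """Adjacency lists of the n-cycle, a 2-regular connected graph."""
    return [[(v - 1) % n, (v + 1) % n] for v in range(n)]

def flow_step(neighbours, x):
    """Apply (1/d) A once: each vertex receives 1/d of each neighbour's mass."""
    d = len(neighbours[0])
    return [sum(x[w] for w in neighbours[v]) / d for v in range(len(x))]

def distance_to_uniform(x):
    """Euclidean distance from x to the uniform distribution with the same total mass."""
    mean = sum(x) / len(x)
    return math.sqrt(sum(float(xi - mean) ** 2 for xi in x))

n = 5
neighbours = cycle_graph(n)
x = [Fraction(1)] + [Fraction(0)] * (n - 1)  # all mass starts on vertex 0
rho = math.cos(math.pi / n)                  # largest |eigenvalue| other than 1 on the 5-cycle

distances = [distance_to_uniform(x)]
for k in range(1, 21):
    x = flow_step(neighbours, x)
    distances.append(distance_to_uniform(x))
    # Observed distance next to the bound rho^k * (initial distance).
    print(k, distances[k], rho ** k * distances[0])
```

The total mass is conserved exactly (the entries are rationals), and the printed distances decay geometrically at rate $\cos(\pi/5)$ rather than at the rate of the second largest eigenvalue $\cos(2\pi/5)$.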

Definition 6.5.

A family of graphs, $\mathcal{G}_{i}$ for $i\geq 1$ , is called an expander family of degree $d$ if the following hold:

(1)

the $\mathcal{G}_{i}$ are finite $d$ -regular connected graphs with $|\mathcal{G}_{i}|\rightarrow\infty$ ;
(2)

for each $i$ , let $\lambda_{i}$ be the largest eigenvalue in absolute value besides $\pm 1$ of the normalized adjacency matrix of $\mathcal{G}_{i}$ ; then

$\epsilon:=\limsup_{i\rightarrow\infty}\lambda_{i}<1.$

If the graph is ‘easily cut’ in the sense that removing a small number of edges can disconnect it, then these edges form a bottleneck to rapid convergence. Another measure of the same ‘connectedness’ is given in this language. For any subset $S\subseteq V$ , write $\partial S\subseteq E$ for the ‘boundary’ of $S$ , i.e. the edges connecting a vertex of $S$ to a vertex of the complement. The Cheeger constant of $\mathcal{G}$ is

h(\mathcal{G})=\min_{S\subseteq V,|S|\leq|V|/2}\frac{|\partial S|}{|S|}.

This measures how easy it is to disconnect the graph by removing edges. Then we can replace condition (2) above with the existence of an $\epsilon>0$ such that $h(\mathcal{G}_{i})>\epsilon$ for all $i$.
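For small graphs the Cheeger constant can be computed directly from the definition by trying every subset $S$ with $|S|\leq|V|/2$. The following Python sketch does exactly that; it is exponential in the number of vertices and meant only for experimentation, and the function name `cheeger_constant` and the edge-list encoding are our own choices.

```python
from fractions import Fraction
from itertools import combinations

def cheeger_constant(n, edges):
    """Brute-force h(G) for a graph on vertices 0..n-1 given as a list of edges (u, v)."""
    best = None
    for size in range(1, n // 2 + 1):
        for subset in combinations(range(n), size):
            s = set(subset)
            # Boundary edges join a vertex of S to a vertex of the complement.
            boundary = sum(1 for (u, v) in edges if (u in s) != (v in s))
            ratio = Fraction(boundary, size)
            if best is None or ratio < best:
                best = ratio
    return best

cycle6 = [(v, (v + 1) % 6) for v in range(6)]                # the 6-cycle
complete4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]  # the complete graph K_4

print(cheeger_constant(6, cycle6))     # 2/3: cut the cycle into two arcs of length 3
print(cheeger_constant(4, complete4))  # 2: every pair of vertices has 4 boundary edges
```

The cycle is ‘easily cut’ (two edges suffice to separate half of the vertices), and long cycles have Cheeger constant tending to $0$; this is the behaviour an expander family must avoid.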

It is possible to prove a spectral gap for the Apollonian Cayley graphs $\mathcal{A}_{p^{m}}:=\mathcal{A}^{geo}/p^{m}$ in a combinatorial way, by showing that every vertex can be reached in a bounded number of steps, where the bound is independent of the growing parameter $p^{m}$ . The existence of short paths to all points in the graph implies ‘good mixing’ and ‘connectedness’; this is the ‘combinatorial spectral gap’ used in [FSZ19, Section 8].

In the next section we will prove strong approximation, but a byproduct of this proof is the spectral gap.

Theorem 6.6.

The Cayley graphs $\mathcal{A}^{geo}/p^{m}$ form an expander family.

We will discuss the use of this tool at the end of the next section. A good reference for expander graphs for those interested in Cayley graphs is [Kow19].

Exercise 6.7.

Let $G$ be a $d$ -regular graph with $k$ connected components. Show that $\lambda_{0}=\lambda_{1}=\cdots=\lambda_{k-1}$ by finding independent eigenvectors. Conversely, show that when $k=1$ , $\lambda_{0}>\lambda_{1}$ .

Exercise 6.8.

Let $G$ be a connected $d$ -regular graph. Show that the eigenvectors for $\lambda_{i}<1$ have the property that the sum of their entries is $0$ . This shows that they are orthogonal to the eigenvector for $\lambda_{0}=1$ .

6.4. Strong approximation

An algebraic group $G$ has strong approximation if the reduction maps $G(\mathbb{Z})\rightarrow G(\mathbb{Z}/p^{m}\mathbb{Z})$ are surjective. For example, $\operatorname{GL}_{2}$ fails this property since matrices in $\operatorname{GL}_{2}(\mathbb{Z}/p\mathbb{Z})$ with determinants other than $\pm 1$ are not in the image. However, it does hold for $\operatorname{SL}_{2}$ [DSV03].
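Surjectivity for $\operatorname{SL}_{2}$ can be checked experimentally for small moduli. The sketch below (Python; all function names are ours) uses the standard fact that $\operatorname{SL}_{2}(\mathbb{Z})$ is generated by $T=\begin{pmatrix}1&1\\ 0&1\end{pmatrix}$ and $S=\begin{pmatrix}0&-1\\ 1&0\end{pmatrix}$, so the image of reduction modulo $n$ is the subgroup generated by the reductions of $S$ and $T$. It compares this subgroup with a brute-force list of all determinant-one matrices modulo $n$. This is a numerical check, not a proof.

```python
from itertools import product

def mat_mul(a, b, n):
    """Multiply 2x2 matrices stored as (a11, a12, a21, a22), reducing modulo n."""
    return (
        (a[0] * b[0] + a[1] * b[2]) % n,
        (a[0] * b[1] + a[1] * b[3]) % n,
        (a[2] * b[0] + a[3] * b[2]) % n,
        (a[2] * b[1] + a[3] * b[3]) % n,
    )

def image_of_sl2z(n):
    """Subgroup of SL_2(Z/n) generated by the reductions of T and S (n >= 2)."""
    gens = [(1, 1, 0, 1), (0, (-1) % n, 1, 0)]
    identity = (1, 0, 0, 1)
    seen = {identity}
    frontier = [identity]
    # In a finite group, products of generators alone already give the whole subgroup.
    while frontier:
        new = []
        for g in frontier:
            for h in gens:
                gh = mat_mul(g, h, n)
                if gh not in seen:
                    seen.add(gh)
                    new.append(gh)
        frontier = new
    return seen

def sl2_mod(n):
    """All 2x2 matrices over Z/n with determinant 1, by exhaustive search."""
    return {m for m in product(range(n), repeat=4)
            if (m[0] * m[3] - m[1] * m[2]) % n == 1}

for n in range(2, 8):
    print(n, len(image_of_sl2z(n)), len(sl2_mod(n)))
```

For each modulus the two counts agree, so every matrix in $\operatorname{SL}_{2}(\mathbb{Z}/n\mathbb{Z})$ is the reduction of an integer matrix of determinant one.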

In particular, the reduction map modulo $\mathfrak{a}$ for any ideal $\mathfrak{a}$ of $\mathbb{Z}[i]$, $\operatorname{SL}_{2}(\mathbb{Z}[i])\rightarrow\operatorname{SL}_{2}(\mathbb{Z}[i]/\mathfrak{a})$, is always surjective. One way to measure the ‘size’ of a subgroup like $\mathcal{A}^{geo}$ is to ask whether the reduction maps $\mathcal{A}^{geo}\rightarrow\operatorname{SL}_{2}(\mathbb{Z}/p^{m}\mathbb{Z})$ are surjective. The Apollonian group almost has this property: it holds for sufficiently large prime powers. Thus we say that $\mathcal{A}^{geo}$ itself has strong approximation.

For primes besides $2$ and $3$ , the proof is by construction, using the fact that $z\mapsto z+i$ is in the Apollonian group.

Theorem 6.9.

The Apollonian group $\mathcal{A}^{geo}$ satisfies $\mathcal{A}^{geo}/p^{m}\cong\operatorname{SL}_{2}(\mathbb{Z}/p^{m}\mathbb{Z})$ for all $p\geq 5$ .

The following proof is an adaptation of that of Varjú [BK14, Appendix] and [FSZ19].

Proof.

Let $p\geq 5$ be prime and $m\geq 1$ be an integer. Then there exists a pair $x,y\in\mathbb{Z}$ such that $x^{2}+y^{2}\equiv 1~{}(\textup{mod}~{}p^{m})$ and $(xy,p)=1$ [Cas78, Exercise 13(v)]. Consider reduction modulo $p^{m}$ on $\mathbb{Z}[i]$ (the following will work whether $p$ is split or inert; we write $i$ for the image of $i$ under reduction). The image matrix $T_{0}:=\begin{pmatrix}x&y\\ -y&x\end{pmatrix}$ lies in $\operatorname{SL}_{2}(\mathbb{Z}/p^{m}\mathbb{Z})$ by construction. Note that $T_{0}$ has fixed points $\pm i$ . Then there exists a lift $T_{1}:=\begin{pmatrix}x_{0}&-y_{0}\\ y_{0}&x_{0}\end{pmatrix}\in\operatorname{SL}_{2}(\mathbb{Z})$ of $T_{0}$ by the strong approximation of $\operatorname{SL}_{2}$ .

Let $T:=\begin{pmatrix}1&i\\ 0&1\end{pmatrix}\in\mathcal{A}$ . Then $TT_{1}T^{-1}$ has fixed points $T(\pm i)=\{0,2i\}$ , the first of which we can conjugate to $\infty$ using $S\in\operatorname{SL}_{2}(\mathbb{Z})$ . Call the result $T_{2}:=STT_{1}T^{-1}S^{-1}\in\mathcal{A}$ , which fixes $\infty$ modulo $p^{m}$ . That is, it must have the form

T_{2}=\begin{pmatrix}a_{0}&b\\ 0&a_{1}\end{pmatrix}.

Here, $a_{0}$ and $a_{1}$ are the eigenvalues of $T_{2}$ , and hence also of $T_{1}$ , which are $x\pm yi$ . In particular, $a_{0}^{2}\notin\mathbb{Z}/p^{m}\mathbb{Z}$ (observe that $y\not\equiv 0~{}(\textup{mod}~{}p)$ by construction, so the reduction of $x+iy$ modulo $p^{m}$ is in $\mathbb{Z}[i]/p^{m}\mathbb{Z}[i]\smallsetminus\mathbb{Z}/p^{m}\mathbb{Z}$ ), and it is invertible (also by construction). Now let

T_{3,n}:=T_{2}\begin{pmatrix}1&n\\ 0&1\end{pmatrix}T_{2}^{-1}\equiv\begin{pmatrix}1&na_{0}^{2}\\ 0&1\end{pmatrix}

where the upper right corner of $T_{3,1}$ is invertible, but not in $\mathbb{Z}/p^{m}\mathbb{Z}$ . Hence $a_{0}^{2}$ and $1$ generate $\mathbb{Z}[i]/p^{m}\mathbb{Z}[i]$ . This implies that all upper triangular matrices are in $\mathcal{A}$ . Similarly we obtain all lower triangular matrices. By combining these, we have everything except those things whose lower left entry is divisible by $p$ . That is, we have more than half of $\operatorname{SL}_{2}(\mathbb{Z}/p^{m}\mathbb{Z})$ . Therefore we must generate it all. ∎
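The first step of the proof can be explored by brute force. The following Python sketch (the function name `circle_point` is ours, introduced only for illustration) searches for a pair $(x,y)$ with $x^{2}+y^{2}\equiv 1~(\textup{mod}~p^{m})$ and $xy$ coprime to $p$, and then confirms that the resulting matrix $T_{0}$ has determinant $1$ modulo $p^{m}$.

```python
def circle_point(p, m):
    """Return (x, y) with x^2 + y^2 = 1 (mod p^m) and p dividing neither x nor y, or None."""
    q = p ** m
    for x in range(1, q):
        if x % p == 0:
            continue
        for y in range(1, q):
            if y % p != 0 and (x * x + y * y) % q == 1:
                return x, y
    return None

for p, m in [(7, 1), (7, 2), (11, 1), (13, 1)]:
    q = p ** m
    x, y = circle_point(p, m)
    # T_0 = [[x, y], [-y, x]] has determinant x^2 + y^2.
    determinant = (x * x + y * y) % q
    print(p, m, (x, y), determinant)

for p in (2, 3, 5):
    print(p, circle_point(p, 1))
```

In this search, pairs are found immediately for $p=7$, $11$ and $13$. For $p=2$, $3$ and $5$ the search returns `None`: for instance, the nonzero squares modulo $5$ are $1$ and $4$, and no two of them sum to $1$. The case $p=5$ therefore appears to need a different starting matrix from the one this sketch looks for.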

Although the theorem fails for $(p,m)\in\{(2,1),(2,2),(2,3),(3,1)\}$, for higher powers of $2$ and $3$ we do recover predictable behaviour, in the following sense. Although $\mathcal{A}_{3}$ is not all of $\operatorname{SL}_{2}(\mathbb{Z}/3\mathbb{Z})$, when lifting from $\mathcal{A}_{3}$ to $\mathcal{A}_{3^{2}}$, we do obtain ‘all’ of the valid lifts, meaning that although of course we cannot recover $\operatorname{SL}_{2}(\mathbb{Z}/3^{2}\mathbb{Z})$, we do not ‘lose even more.’ More precisely, for any $m>m_{p}$ (where $m_{2}=2$ and $m_{3}=1$), if the reduction of $M\in\operatorname{SL}_{2}(\mathbb{Z}/p^{m}\mathbb{Z})$ modulo $p^{m_{p}}$ lies in $\mathcal{A}_{p^{m_{p}}}$, then $M$ itself lies in $\mathcal{A}_{p^{m}}$.

In fact, the proof above for $p\geq 5$ shows that all of $\operatorname{SL}_{2}(\mathbb{Z}/p^{m}\mathbb{Z})$ is generated by words of a bounded finite length independent of $p$ and $m$ ; a similar result for $2$ and $3$ combines with this to give a spectral gap, showing that the Cayley graphs form an expander family (Theorem 6.6).

Strong approximation and the spectral gap turn out to be important tools in the proofs that many curvatures appear. The rough idea is that for an expander graph, there is rapid mixing, so that as $n$ and $m$ grow, all curvatures modulo $n$ and modulo $m$ will be appearing regularly, so that all residues modulo $nm$ are also likely to occur regularly. Analytic methods allow one to extrapolate that most integers will eventually occur. One might think of this as a sort of explicit Sunzi’s Theorem for Apollonian packings.

Exercise 6.10.

Prove that the natural reduction map $\operatorname{PSL}_{2}(\mathbb{Z})\rightarrow\operatorname{PSL}_{2}(\mathbb{Z}/n\mathbb{Z})$ is surjective. Show that this is a group homomorphism and find the kernel.

6.5. Orbits of thin groups more generally

These can be viewed as statements about orbits of the Apollonian group. To be precise, in this perspective, explored in more depth in [Kon13], the curvatures of a fixed packing form a set $\{\pi_{i}(\mathbf{v}):\mathbf{v}\in\mathbf{v}_{0}\mathcal{A},1\leq i\leq 4\}$ , where $\pi_{i}$ is projection on the $i$ -th coordinate, $\mathcal{A}$ is the Apollonian group, and $\mathbf{v}_{0}$ is a row vector of curvatures of some fixed Descartes quadruple. We consider the Apollonian group as a subgroup $\mathcal{A}$ of $O_{Q}(\mathbb{Z})$ .
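As a small illustration of the orbit description, the sketch below (Python; the function names are ours) starts from the Descartes quadruple $\mathbf{v}_{0}=(-1,2,2,3)$, chosen here as an example, and applies the four swaps, each of which replaces one curvature $k_{i}$ by $2(\text{sum of the other three})-k_{i}$. It collects the coordinates of all quadruples reached by words of bounded length.

```python
def swap(quad, i):
    """Replace the i-th curvature by that of the other circle tangent to the remaining three."""
    others = sum(quad) - quad[i]
    new = list(quad)
    new[i] = 2 * others - quad[i]
    return tuple(new)

def orbit(v0, depth):
    """All quadruples reachable from v0 by at most `depth` swaps."""
    seen = {tuple(v0)}
    frontier = [tuple(v0)]
    for _ in range(depth):
        nxt = []
        for quad in frontier:
            for i in range(4):
                w = swap(quad, i)
                if w not in seen:
                    seen.add(w)
                    nxt.append(w)
        frontier = nxt
    return seen

def curvatures(quads):
    """The set of coordinates pi_i(v) over all quadruples v, sorted."""
    return sorted({k for quad in quads for k in quad})

def is_descartes(quad):
    """Check the Descartes relation 2(a^2+b^2+c^2+d^2) = (a+b+c+d)^2."""
    return 2 * sum(k * k for k in quad) == sum(quad) ** 2

v0 = (-1, 2, 2, 3)
print(curvatures(orbit(v0, 1)))
print(curvatures(orbit(v0, 4))[:20])
```

Every quadruple in the orbit again satisfies the Descartes relation, which is the statement that the swaps preserve the quadratic form $Q$.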

Another famous conjecture is part of the same general type.

Conjecture 6.11 (Zaremba’s Conjecture, [Zar72]).

There exists a positive constant $Z$ such that every natural number is the denominator of some rational number (in reduced form) whose continued fraction partial quotients are $\leq Z$ .

For example, the continued fraction expansions of all rationals with denominator $7$ are:

\frac{1}{7}=[0;7],\quad\frac{2}{7}=[0;3,2],\quad\frac{3}{7}=[0;2,3],\quad\frac{4}{7}=[0;1,1,3],\quad\frac{5}{7}=[0;1,2,2],\quad\frac{6}{7}=[0;1,6].

A reasonable guess with current data is that Zaremba’s conjecture holds for $Z=5$. In the example above, things are ‘so far so good’ for $Z=5$ because at least one of the expansions (in fact, 4 of them) involves only partial quotients $\leq 5$. We know that denominators $6$, $54$, and $150$ fail for $Z=4$, but we do not know of any further failures. Niederreiter [Nie78] conjectured that even for $Z=3$ there are only finitely many exceptions; Hensley [Hen96] conjectured this for $Z=2$. Of course, it certainly fails for $Z=1$, whose continued fraction expansions have only Fibonacci sequence denominators.
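These statements are easy to experiment with. The Python sketch below (function names are ours) computes continued fraction expansions with the Euclidean algorithm and lists the denominators $d$ for which no reduced fraction $a/d$ has all partial quotients $\leq Z$. It assumes the convention used in the display above, in which the final partial quotient is at least $2$ (so $6/7=[0;1,6]$ rather than $[0;1,5,1]$).

```python
from math import gcd

def continued_fraction(a, d):
    """Partial quotients [a0; a1, ..., an] of a/d produced by the Euclidean algorithm."""
    quotients = []
    while d:
        quotients.append(a // d)
        a, d = d, a % d
    return quotients

def has_bounded_fraction(d, Z):
    """True if some reduced a/d with 0 < a < d has all partial quotients a1, ..., an <= Z."""
    for a in range(1, d):
        if gcd(a, d) == 1 and max(continued_fraction(a, d)[1:]) <= Z:
            return True
    return False

def zaremba_failures(Z, limit):
    """Denominators 2 <= d <= limit with no admissible numerator for the bound Z."""
    return [d for d in range(2, limit + 1) if not has_bounded_fraction(d, Z)]

for a in range(1, 7):
    print(a, 7, continued_fraction(a, 7))

print(zaremba_failures(4, 150))
```

Changing the bound or the limit in the last line gives a quick way to explore the conjectures of Niederreiter and Hensley numerically.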

One might consider the Cantor-like set $C_{Z}$ of real numbers whose continued fraction expansions contain only partial quotients $a_{i}\leq Z$, or, even more generally, a Cantor set $C_{S}$ for a set $S$ of allowable partial quotients. Hensley conjectured that a Zaremba-like statement will hold for partial quotients in $S$ if and only if the Hausdorff dimension of $C_{S}$ exceeds $1/2$ [Hen96]. However, Bourgain and Kontorovich found the counterexample $S=\{2,4,6,8,10\}$, which has a congruence-like obstruction: denominators congruent to $3~{}(\textup{mod}~{}4)$ cannot appear [BK11].

Zaremba’s conjecture and its variants can be phrased as a thin orbit question. As we saw before, we can generate continued fraction convergents as the columns of elements $M\in\operatorname{SL}_{2}^{+}(\mathbb{Z})$. A slight reformulation is that every continued fraction convergent can be found as a column of a matrix of the form

\begin{pmatrix}0&1\\ 1&a_{0}\end{pmatrix}\begin{pmatrix}0&1\\ 1&a_{1}\end{pmatrix}\begin{pmatrix}0&1\\ 1&a_{2}\end{pmatrix}\cdots\begin{pmatrix}0&1\\ 1&a_{n}\end{pmatrix}

where the $a_{i}$ are the partial quotients. To restrict the partial quotients, we simply generate a semi-group (i.e., no inverses) by the matrices $\begin{pmatrix}0&1\\ 1&a\end{pmatrix}$ where $a$ is in our set $S$ of allowable partial quotients. In recent work, Rickards and Stange find reciprocity obstructions in this context [RS24].
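The matrix product is simple to compute by hand or by machine. The following Python sketch (helper names are ours) multiplies the matrices $\begin{pmatrix}0&1\\ 1&a_{i}\end{pmatrix}$ for $4/7=[0;1,1,3]$ and for the all-ones expansion corresponding to $Z=1$.

```python
def mat_mul(a, b):
    """Product of two 2x2 integer matrices given as nested lists."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]

def cf_matrix(partial_quotients):
    """Product of the matrices [[0, 1], [1, a]] over the given partial quotients, in order."""
    m = [[1, 0], [0, 1]]
    for a in partial_quotients:
        m = mat_mul(m, [[0, 1], [1, a]])
    return m

print(cf_matrix([0, 1, 1, 3]))  # [[2, 7], [1, 4]]
print(cf_matrix([1] * 6))       # [[5, 8], [8, 13]]
```

With this ordering of the product, the second column of the first matrix is $(7,4)$, recording the convergent $4/7$ with the denominator in the top entry, and the first column $(2,1)$ records the previous convergent $1/2$. The second matrix has consecutive Fibonacci numbers as entries, in line with the remark above about $Z=1$.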

7. Schmidt arrangements

7.1. $\operatorname{PSL}_{2}(\mathcal{O}_{K})$-orbit

As we saw before, Schmidt defined a subdivision of the complex plane that involved Apollonian circle packings. See Figure 7.1. The easiest way to generate this image is to take the image of $\widehat{\mathbb{R}}$ under $\operatorname{PSL}_{2}(\mathbb{Z}[i])$ . Suitable references for this section are [Sta18b, Sta18a, Mar22].

The definition of the Schmidt arrangement can be given more generally in terms of an imaginary quadratic field, so that Figure 7.1 is for the Gaussian field $\mathbb{Q}(i)$ . Another example is given in Figure 7.2.

Definition 7.1.

The Schmidt arrangement $\mathcal{S}_{K}$ for an imaginary quadratic ring $\mathcal{O}_{K}$ is the image of $\widehat{\mathbb{R}}$ under $\operatorname{PSL}_{2}(\mathcal{O}_{K})$ .

We will be mainly interested in the case $\mathcal{O}_{K}=\mathbb{Z}[i]$ , because of its connection to the Apollonian packing.

We begin with some basic properties, all of which are a consequence of Proposition 3.13 (see also the proof of Theorem 5.6).

Proposition 7.2.

(1)

The Schmidt arrangement is symmetric under translation by $\mathbb{Z}[i]$ and under rotation by $\pi$ about the origin.
(2)

The curvatures of the circles in the Schmidt arrangement lie in $2\mathbb{Z}$ .
(3)

The circles are either tangent or disjoint.
(4)

The points of tangency are Gaussian rationals.

In particular, up to a scaling factor of $2$ , we’ll think of the circles as having integral curvature. In fact, we’ll call half the curvature the reduced curvature for convenience; this lies in $\mathbb{Z}$ .

The following fact was known to Graham, Lagarias, Mallows, Wilks and Yan in the context of the Apollonian super-packing.

Theorem 7.3 ([GLM⁺05, Sta18a]).

Every single primitive integral Apollonian circle packing appears in the Schmidt arrangement of $\mathbb{Q}(i)$ exactly once up to translation by $\mathbb{Z}[i]$ and rotation about the origin by $\pi$ .

We saw in Theorem 5.6 that every circle in a packing has a natural quadratic form associated to it. We also saw in Section 2.7 that quadratic forms are in bijection with lattices. In fact, every circle has a lattice associated to it.

Theorem 7.4.

Let $\mathcal{C}$ be a circle in the Schmidt arrangement for $\mathbb{Q}(i)$ , given as an image of $\widehat{\mathbb{R}}$ by $\begin{pmatrix}\alpha&\beta\\ \gamma&\delta\end{pmatrix}$ . Let $\Lambda=\gamma\mathbb{Z}+\delta\mathbb{Z}$ . Then the denominators of the tangency points touching $\mathcal{C}$ are exactly the elements of $\Lambda$ , and at tangency point $\sigma/\rho$ , the circles tangent have curvatures $\Im(\gamma\overline{\delta})+2kN(\rho)$ , $k\in\mathbb{Z}$ .

7.2. Lattice in the space of circles

Since the Schmidt arrangement is an orbit of a circle under Möbius transformations, we can ask what it is as a subset of the space of circles. It is actually a very nicely described subset, as shown by Daniel Martin (who did this for general Schmidt arrangements). This is very useful for drawing images.

Proposition 7.5 ([Mar22, Definition 3.1 and Theorem 3.11]).

The circles of the Schmidt arrangement are exactly those circles having curvature $2p^{\prime}$ and curvature-center $2r^{\prime}+(2s^{\prime}+1)i$, where $p^{\prime},r^{\prime},s^{\prime}\in\mathbb{Z}$ satisfy $p^{\prime}\mid r^{\prime 2}+s^{\prime 2}+s^{\prime}$.

For example, for $p^{\prime}=1$, the condition is always satisfied and we have centers $r^{\prime}+(s^{\prime}+1/2)i$ for all $r^{\prime},s^{\prime}\in\mathbb{Z}$.

Proof.

We will describe the Gaussian Schmidt arrangement as the intersection of the one-sheeted hyperboloid with a lattice in the space of circles. The lattice will be

\Lambda:=\{(p,q,r,s)\in\mathbb{Z}^{4}:(p,q,r,s)\equiv(0,0,0,1)~{}(\textup{mod}~{}2)\}\subset\mathbb{R}^{4},

coordinates representing curvature, co-curvature, real and imaginary parts of curvature-centre in the space of circles, as usual. By Exercise 3.14, circles of the Schmidt arrangement lie in $\Lambda$ .

The image $G$ of $\operatorname{PSL}_{2}(\mathbb{Z}[i])$ in $O_{Q}(\mathbb{R})$ (under the exceptional isomorphism (10)) lies in $O_{Q}(\mathbb{Z})$ , and satisfies $G\Lambda\subseteq\Lambda$ . The extended real line $\widehat{\mathbb{R}}$ corresponds to $(0,0,0,-1)\in\Lambda$ , which lies on the one-sheeted hyperboloid $r^{2}+s^{2}-pq=1$ . Thus, since $G$ preserves length, the entire Schmidt arrangement lies in the one-sheeted hyperboloid $r^{2}+s^{2}-pq=1$ .

Conversely, assume we have a circle $\mathcal{C}$ in the lattice and on the hyperboloid. Choose a Gaussian rational point $\alpha/\beta\in\mathcal{C}$ . There is an element of $\operatorname{PSL}_{2}(\mathbb{Z}[i])$ mapping $\alpha/\beta$ to $0$ because there is a solution to $\alpha x+\beta y=1$ (by coprimality of numerator and denominator). So without loss of generality, we can assume our circle touches $0$ , and therefore has co-curvature $q=0$ (exercise). Suppose it has curvature $p$ and curvature-center $r+si$ . It will still lie in the lattice, since $G\Lambda\subseteq\Lambda$ . Consider the matrix

M:=\begin{pmatrix}0&-s+ir\\ 1&ip/2\end{pmatrix}.

Its determinant $s-ir$ satisfies $|s-ir|^{2}=r^{2}+s^{2}-p\cdot 0=1$, so it lies in $\{\pm 1,\pm i\}$. As $s$ is odd, the determinant is $\pm 1$, and so $M\in\operatorname{PSL}_{2}(\mathbb{Z}[i])$. Since $\mathcal{C}=M\cdot\widehat{\mathbb{R}}$, it is in the Schmidt arrangement.

Finally, we simply work out the condition of intersecting the lattice with the one-sheeted hyperboloid. Writing $p=2p^{\prime}$ , $q=2q^{\prime}$ , $r=2r^{\prime}$ , $s=2s^{\prime}+1$ , the condition that $q^{\prime}\in\mathbb{Z}$ and $pq-r^{2}-s^{2}+1=0$ becomes

p^{\prime}\mid r^{\prime 2}+s^{\prime 2}+s^{\prime}.

∎
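The divisibility condition of Proposition 7.5 translates directly into a procedure for listing circles, which is how one might begin to draw the arrangement. The Python sketch below is one possible such procedure, with names of our own choosing; it restricts to positive reduced curvature $p^{\prime}\geq 1$ (so it omits lines) and to a finite window of parameters $r^{\prime},s^{\prime}$. A circle of curvature $2p^{\prime}$ and curvature-center $2r^{\prime}+(2s^{\prime}+1)i$ has radius $1/(2p^{\prime})$ and center $(2r^{\prime}+(2s^{\prime}+1)i)/(2p^{\prime})$.

```python
from fractions import Fraction
from itertools import combinations

def schmidt_circles(max_reduced_curvature, r_values, s_values):
    """List (center_x, center_y, radius) for parameters with p' | r'^2 + s'^2 + s'."""
    circles = []
    for p in range(1, max_reduced_curvature + 1):
        for r in r_values:
            for s in s_values:
                if (r * r + s * s + s) % p == 0:
                    circles.append((Fraction(2 * r, 2 * p),
                                    Fraction(2 * s + 1, 2 * p),
                                    Fraction(1, 2 * p)))
    return circles

def cross(c1, c2):
    """True if two distinct circles meet in two points (not tangent, disjoint, or nested)."""
    (x1, y1, t1), (x2, y2, t2) = c1, c2
    dist2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
    return (t1 - t2) ** 2 < dist2 < (t1 + t2) ** 2

circles = schmidt_circles(4, range(-4, 5), range(-4, 5))
print(len(circles))
print(any(cross(a, b) for a, b in combinations(circles, 2)))  # False
```

Because the arithmetic is exact, the final check is a genuine test of Proposition 7.2(3) on this finite sample: no two of the listed circles cross.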

7.3. Visual structure of imaginary quadratic fields

The Schmidt arrangement $\mathcal{S}_{K}$ gives visual form to the arithmetic of $K$ . $K$ -Bianchi circles intersect only at $K$ -points and only at ‘unit angles’, i.e. angles $\theta$ such that $e^{i\theta}$ lies in the unit group of $\mathcal{O}_{K}$ . Their curvatures lie in $\sqrt{-\Delta}\mathbb{Z}$ . Furthermore, the circles themselves are in bijection with certain ideal classes:

Theorem 7.6 ([Sta18b, Theorem 1.4]).

$K$ -Bianchi circles of curvature $f$ , modulo translation into the fundamental region of $\mathcal{O}_{K}$ , and rotation by unit angles, are in bijection with the invertible ideal classes of the order of conductor $f$ which extend to the trivial class in $\mathcal{O}_{K}$ .

This bijection is very explicit: if $M\cdot\widehat{\mathbb{R}}$ is a $K$ -Bianchi circle of curvature $f$ , where $M=\tiny\begin{pmatrix}\alpha&\beta\\ \gamma&\delta\end{pmatrix}$ , then $\gamma\mathbb{Z}+\delta\mathbb{Z}$ is an ideal with conductor $f$ . This is the lattice we saw as the lattice of denominators in the last section.

Furthermore, the connectedness of $\mathcal{S}_{K}$ is easily characterised:

Theorem 7.7 ([Sta18b, Theorem 1.5]).

$\mathcal{S}_{K}$ is connected if and only if $\mathcal{O}_{K}$ is Euclidean.

The arithmetic of Kleinian groups has a long history. The Möbius transformations of $\widehat{\mathbb{C}}$ extend to hyperbolic isometries of the upper half-space model of hyperbolic $3$-space, for which $\widehat{\mathbb{C}}$ is the boundary. The quotient of this space by a Bianchi group defines a Bianchi orbifold, and the arithmetic of the field is known to play an important role in the topology and geometry of these orbifolds (see, for example, [MR03]). As the simplest example, the cusps of the Bianchi orbifold are in bijection with the ideal classes of $\mathcal{O}_{K}$, so that the number of cusps is the class number. The Schmidt arrangement is another aspect of the orbifold: in essence, $\mathcal{S}_{K}$ represents a particular choice of geodesic surface in the orbifold. The classification of geodesic surfaces in Bianchi orbifolds is not yet well understood.

There is a simple geometric criterion for an Apollonian circle packing as a subset of ${\mathcal{S}}_{\mathbb{Q}(i)}$ : it is obtained from any one circle by adding on the largest exteriorly tangent circle at each tangency point. This criterion, applied to other Schmidt arrangements, gives $K$ -Apollonian packings for other imaginary quadratic fields (Figures 7.4 and 7.5). Along with these packings come $K$ -Apollonian groups for which these packings are the limit set. These are thin groups acting on appropriate clusters of circles (analogous to the notion of Descartes quadruple) to generate the packing. For example, in $\mathbb{Q}(\sqrt{-2})$ the relevant cluster has as tangency graph a cube, and there are six swaps through faces, so that the $K$ -Apollonian group is a free product of six copies of $\mathbb{Z}/2\mathbb{Z}$ . The description of the local obstructions can be extended to $K$ -Apollonian packings [Sta18a].

8. A postscript

I have had the good fortune of a great deal of freedom in choosing and exploring mathematical projects. I have found myself attracted to subjects in number theory that have an essential geometric aspect, in particular one which can illuminate proofs and demand its own questions. I fell in love with quadratic forms and continued fractions through the Farey subdivision and Conway’s topograph. I studied curves in graduate school, and learned that it was the genus – their topology – that controlled their Diophantine behaviour. I believe strongly that number theory – probably all number theory – is essentially geometric, and that algebra, although powerful and in possession of a beauty of its own, can at times obscure a hidden geometric splendour.

Of my own research explored in these notes, every project involved extensive computer experiments, often visual ones. As I discovered the properties of Schmidt arrangements, I held an old fashioned compass to a computer print-out to find patterns. As I worked with Harriss and Trettel on algebraic numbers, we made an explicit decision to let aesthetics drive our computer experiments, which is how choices like sizing by discriminant and measuring approximation in hyperbolic geometry were born.

We are just discovering the many ways in which Apollonian circle packings are essential and natural objects in number theory, much like elliptic curves. But my favourite justification is geometric: draw the Schmidt arrangement, which is, in a very real sense, the way that Gaussian rationals choose to organize themselves, and the Apollonian packings are an essential intermediate geometric piece. They pop out of the picture. They cannot be avoided: they are dictated by nature.

I’ve learned many things from these experiences. I’ve learned that the visual cortex is a powerful tool for creativity and intuition, as well as reasoning. Patterns emerge to our visual cortex that will pass unnoticed in numerical data. We are essentially visual and social beings, and as such, we vividly recall our visual and social encounters. This is why mathematics is often best conveyed in stories and pictures. We remember mathematical objects that appear, to us, to have personality and shape. As mathematicians, we do this, more-or-less unconsciously, with the most abstract mathematical objects; they become our friends, our tormentors, our landscapes. Why not also do it explicitly?

I encourage the reader to wander the pages of John H. Conway and Francis Y. C. Fung’s The Sensual Quadratic Form [Con97], Martin H. Weissman’s An Illustrated Theory of Numbers [Wei17], and Allen Hatcher’s Topology of Numbers [Hat22]; to value an illustrative and visual approach to mathematics; and, at the risk of sentimentality, to follow one’s heart in mathematical research and elsewhere.

References

[AR23] Jorge L. Ramírez Alfonsín and Iván Rasskin. A polytopal generalization of Apollonian packings and Descartes’ theorem, 2023.
[BE09] Yann Bugeaud and Jan-Hendrik Evertse. Approximation of complex algebraic numbers by algebraic numbers of bounded degree. Ann. Sc. Norm. Super. Pisa Cl. Sci. (5), 8(2):333–368, 2009.
[BF11] Jean Bourgain and Elena Fuchs. A proof of the positive density conjecture for integer Apollonian circle packings. J. Amer. Math. Soc., 24(4):945–967, 2011.
[BK11] Jean Bourgain and Alex Kontorovich. On Zaremba’s conjecture. C. R. Math. Acad. Sci. Paris, 349(9-10):493–495, 2011.
[BK14] Jean Bourgain and Alex Kontorovich. On the local-global conjecture for integral Apollonian gaskets. Invent. Math., 196(3):589–650, 2014. With an appendix by Péter P. Varjú.
[Cas78] J. W. S. Cassels. Rational quadratic forms, volume 13 of London Mathematical Society Monographs. Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], London-New York, 1978.
[CFHS19] Sneha Chaubey, Elena Fuchs, Robert Hines, and Katherine E. Stange. The dynamics of super-Apollonian continued fractions. Trans. Amer. Math. Soc., 372(4):2287–2334, 2019.
[Chr16] A. David Christopher. A partition-theoretic proof of Fermat’s two squares theorem. Discrete Math., 339(4):1410–1411, 2016.
[Con97] John H. Conway. The sensual (quadratic) form, volume 26 of Carus Mathematical Monographs. Mathematical Association of America, Washington, DC, 1997. With the assistance of Francis Y. C. Fung.
[DSV03] Giuliana Davidoff, Peter Sarnak, and Alain Valette. Elementary number theory, group theory, and Ramanujan graphs, volume 55 of London Mathematical Society Student Texts. Cambridge University Press, Cambridge, 2003.
[FS11] Elena Fuchs and Katherine Sanden. Some experiments with integral Apollonian circle packings. Exp. Math., 20(4):380–399, 2011.
[FSZ19] Elena Fuchs, Katherine E. Stange, and Xin Zhang. Local-global principles in circle packings. Compos. Math., 155(6):1118–1170, 2019.
[Fuc11] Elena Fuchs. Strong approximation in the Apollonian group. J. Number Theory, 131(12):2282–2302, 2011.
[Fuc13] Elena Fuchs. Counting problems in Apollonian packings. Bull. Amer. Math. Soc. (N.S.), 50(2):229–266, 2013.
[GLM⁺03] Ronald L. Graham, Jeffrey C. Lagarias, Colin L. Mallows, Allan R. Wilks, and Catherine H. Yan. Apollonian circle packings: number theory. J. Number Theory, 100(1):1–45, 2003.
[GLM⁺05] Ronald L. Graham, Jeffrey C. Lagarias, Colin L. Mallows, Allan R. Wilks, and Catherine H. Yan. Apollonian circle packings: geometry and group theory. I. The Apollonian group. Discrete Comput. Geom., 34(4):547–585, 2005.
[Hen96] Douglas Hensley. Erratum: “A polynomial time algorithm for the Hausdorff dimension of continued fraction Cantor sets”. J. Number Theory, 59(2):419, 1996.
[Hir67] K. E. Hirst. The Apollonian packing of circles. J. London Math. Soc., 42:281–291, 1967.
[HKRS24] Summer Haag, Clyde Kertzer, James Rickards, and Katherine Stange. The local-global conjecture for Apollonian circle packings is false. Ann. of Math. (2), 200(2):749–770, 2024.
[Hol21] Jan E. Holly. What type of Apollonian circle packing will appear? Amer. Math. Monthly, 128(7):611–629, 2021.
[HST22] Edmund Harriss, Katherine E. Stange, and Steve Trettel. Algebraic number starscapes. Exp. Math., 31(4):1098–1149, 2022.
[IR82] Kenneth F. Ireland and Michael I. Rosen. A classical introduction to modern number theory, volume 84 of Graduate Texts in Mathematics. Springer-Verlag, New York-Berlin, revised edition, 1982.
[Isk71] V. A. Iskovskikh. A counterexample to the Hasse principle for systems of two quadratic forms in five variables. Mat. Zametki, 10:253–257, 1971.
[KO11] Alex Kontorovich and Hee Oh. Apollonian circle packings and closed horospheres on hyperbolic 3-manifolds. J. Amer. Math. Soc., 24(3):603–648, 2011. With an appendix by Oh and Nimish Shah.
[Koc20] Jerzy Kocik. Apollonian depth and the accidental fractal, 2020.
[Kok39] J. F. Koksma. Über die Mahlersche Klasseneinteilung der transzendenten Zahlen und die Approximation komplexer Zahlen durch algebraische Zahlen. Monatsh. Math. Phys., 48:176–189, 1939.
[Kon13] Alex Kontorovich. From Apollonius to Zaremba: local-global phenomena in thin orbits. Bull. Amer. Math. Soc. (N.S.), 50(2):187–228, 2013.
[Kow19] Emmanuel Kowalski. An introduction to expander graphs, volume 26 of Cours Spécialisés [Specialized Courses]. Société Mathématique de France, Paris, 2019.
[LMW02] Jeffrey C. Lagarias, Colin L. Mallows, and Allan R. Wilks. Beyond the Descartes circle theorem. Amer. Math. Monthly, 109(4):338–361, 2002.
[LO13] Min Lee and Hee Oh. Effective circle count for Apollonian packings and closed horospheres. Geom. Funct. Anal., 23(2):580–621, 2013.
[Mar22] Daniel Martin. A geometric study of circle packings and ideal class groups, 2022.
[Mat93] Yuri V. Matiyasevich. Hilbert’s tenth problem. Foundations of Computing Series. MIT Press, Cambridge, MA, 1993. Translated from the 1993 Russian original by the author, With a foreword by Martin Davis.
[McM98] Curtis T. McMullen. Hausdorff dimension and conformal dynamics. III. Computation of dimension. Amer. J. Math., 120(4):691–721, 1998.
[MR03] Colin Maclachlan and Alan W. Reid. The arithmetic of hyperbolic 3-manifolds, volume 219 of Graduate Texts in Mathematics. Springer-Verlag, New York, 2003.
[Nie78] Harald Niederreiter. Quasi-Monte Carlo methods and pseudo-random numbers. Bull. Amer. Math. Soc., 84(6):957–1041, 1978.
[Oh10] Hee Oh. Dynamics on geometrically finite hyperbolic manifolds with applications to Apollonian circle packings and beyond. In Proceedings of the International Congress of Mathematicians. Volume III, pages 1308–1331. Hindustan Book Agency, New Delhi, 2010.
[Oh14] Hee Oh. Apollonian circle packings: dynamics and number theory. Jpn. J. Math., 9(1):69–97, 2014.
[OS12] Hee Oh and Nimish Shah. The asymptotic distribution of circles in the orbits of Kleinian groups. Invent. Math., 187(1):1–35, 2012.
[Par08] John R. Parker. Hyperbolic spaces: The Jyväskylä notes, 2008.
[Ric23] James Rickards. Apollonian. https://github.com/JamesRickards-Canada/Apollonian, 2023.
[Ric24] James Rickards. The Apollonian staircase. Int. Math. Res. Not. IMRN, (2):1–33, 2024.
[Rot55] K. F. Roth. Rational approximations to algebraic numbers. Mathematika, 2:1–20; corrigendum, 168, 1955.
[RS24] James Rickards and Katherine E. Stange. Reciprocity obstructions in semigroup orbits in SL(2, Z), 2024.
[Sar] Peter Sarnak. Letter to Lagarias. http://www.math.princeton.edu/sarnak.
[Sch75] Asmus L. Schmidt. Diophantine approximation of complex numbers. Acta Math., 134:1–85, 1975.
[Ser85a] Caroline Series. The geometry of Markoff numbers. Math. Intelligencer, 7(3):20–29, 1985.
[Ser85b] Caroline Series. The modular surface and continued fractions. J. London Math. Soc. (2), 31(1):69–80, 1985.
[Sha07] Lisa Shapiro, editor. The Correspondence Between Princess Elisabeth of Bohemia and René Descartes. University of Chicago Press, 2007.
[Sil94] Joseph H. Silverman. Advanced topics in the arithmetic of elliptic curves, volume 151 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1994.
[Spr69] V. G. Sprindžuk. Mahler’s problem in metric number theory. Translations of Mathematical Monographs, Vol. 25. American Mathematical Society, Providence, R.I., 1969. Translated from the Russian by B. Volkmann.
[Sta18a] Katherine E. Stange. The Apollonian structure of Bianchi groups. Trans. Amer. Math. Soc., 370(9):6169–6219, 2018.
[Sta18b] Katherine E. Stange. Visualizing the arithmetic of imaginary quadratic fields. Int. Math. Res. Not. IMRN, (12):3908–3938, 2018.
[Sz69] V. G. Sprindžuk. Mahler’s problem in metric number theory, volume 25 of Translations of Mathematical Monographs. American Mathematical Society, Providence, RI, 1969. Translated from the Russian by B. Volkmann.
[Trk07] D. Trkovská. Felix Klein and his Erlanger Programm, 2007.
[Vin14] Ilya Vinogradov. Effective bisector estimate with application to Apollonian circle packings. Int. Math. Res. Not. IMRN, (12):3217–3262, 2014.
[VW24] Polina Vytnova and Caroline Wormell. Hausdorff dimension of the Apollonian gasket, 2024.
[Wei17] Martin H. Weissman. An illustrated theory of numbers. American Mathematical Society, Providence, RI, 2017.
[Zag90] D. Zagier. A one-sentence proof that every prime $p \equiv 1 \pmod{4}$ is a sum of two squares. Amer. Math. Monthly, 97(2):144, 1990.
[Zar72] S. K. Zaremba. La méthode des “bons treillis” pour le calcul des intégrales multiples. In Applications of number theory to numerical analysis (Proc. Sympos., Univ. Montréal, Montreal, Que., 1971), pages 39–119. Academic Press, New York-London, 1972.
