Topoi of automata I:
Four topoi of automata and regular languages

Ryuya Hora

Abstract.

Both topos theory and automata theory are known for their multi-faceted nature and relationship with topology, algebra, logic, and category theory. This paper aims to clarify the topos-theoretic aspects of automata theory, particularly demonstrating through two main theorems how regular (and non-regular) languages arise in topos-theoretic calculation. First, it is shown that the four different notions of automata form four types of Grothendieck topoi, illustrating how the technical details of automata theory are described by topos theory. Second, we observe that the four characterizations of regular languages (DFA, Myhill-Nerode theorem, finite monoids, profinite words) provide Morita-equivalent definitions of a single Boolean-ringed topos, situating this within the context of Olivia Caramello’s ‘Toposes as Bridges.’

This paper also serves as a preparation for follow-up papers, which deal with the relationship between hyperconnected geometric morphisms and algebraic/geometric aspects of formal language theory.

Key words and phrases:

Automaton, topos, regular language, coalgebra, finite monoid, profinite word, Myhill-Nerode theorem

2020 Mathematics Subject Classification:

18F10, 68Q70, 20M35, 18B20

Graduate School of Mathematical Sciences, University of Tokyo. hora@ms.u-tokyo

1. Introduction

This series of papers aims to propose a topos-theoretic framework for automata theory with the following future goals:

•

to unify aspects of automata theory in terms of topoi, and
•

to introduce geometric methods into automata theory.

The connection between category theory and automata theory has a rich history. There is a vast number of studies on this connection, including [Adá74, Jac17, Rut19, CP20, GPA22], as well as studies connecting topos theory and automata theory [Law04, Ura17, GPA22, Boc+23, Iwa24].

As far as the author knows, the novelty of this paper is to consider topoi consisting of automata (rather than automata in topoi, or topoi constructed from automata-theoretic gadgets). Our starting point is the following fact: the category of automata (defined as coalgebras $Q\to Q^{\Sigma}\times\{\top,\bot\}$ ) is a presheaf topos over the category of languages (see corollary 2.13). In this series of papers, we will provide various “Grothendieck topoi of automata”, which can be regarded as variants of this topos. Some of them are presheaf topoi, but some are not.

The structure of this first paper is as follows:

section 2:: Introducing four topoi of automata.
section 3:: Proving that four characterizations of regular languages provide four descriptions of a single boolean-ringed topos $({\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}},\mathcal{R})$ .

1. Introducing four topoi of automata.

In section 2, by introducing and calculating four topoi of automata, we will see that some automata-theoretic topics naturally arise in our approach of ‘topoi of automata.’ These include language recognition, the coalgebraic treatment, automata minimization, quotients of languages, and regular languages. (See table 1, table 2, and table 3, though some rows in the tables will be treated in the follow-up papers.)

topos theory		automata theory
sheaf in $\Sigma\text{-}\mathbf{Set}$	$\leftrightsquigarrow$	word action
point of $\Sigma\text{-}\mathbf{Set}$	$\leftrightsquigarrow$	infinite word
The canonical point $p$ of $\Sigma\text{-}\mathbf{Set}$	$\leftrightsquigarrow$	Run of Moore machine
The internal Boolean algebra $p_{\ast}\{\top,\bot\}$	$\leftrightsquigarrow$	Boolean algebra of languages
Path action on $p_{\ast}\{\top,\bot\}$	$\leftrightsquigarrow$	Quotient of language
Morphism to $p_{\ast}\{\top,\bot\}$	$\leftrightsquigarrow$	automaton $=$ $2x^{\Sigma}$ -coalgebra
Image of the Yoneda map $\text{よ}(\ast)\to\mathcal{L}$	$\leftrightsquigarrow$	minimal automata
hyperconnected quotient $\Sigma\text{-}\mathbf{Set}\to{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$	$\leftrightsquigarrow$	Regular languages
Generated hyperconnected quotient	$\leftrightsquigarrow$	Syntactic monoid

Table 1. Some correspondences on $\Sigma\text{-}\mathbf{Set}$

topos theory		automata theory
sheaf in $\mathbf{Atmt}$	$\leftrightsquigarrow$	automaton $=$ $2x^{\Sigma}$ -coalgebra
Structure map of an étale space	$\leftrightsquigarrow$	language recognition $=$ coinduction
étale covering $\mathbf{Atmt}\twoheadrightarrow\Sigma\text{-}\mathbf{Set}$	$\leftrightsquigarrow$	Forgetting the accept states
essential point of $\mathbf{Atmt}$	$\leftrightsquigarrow$	a language
Open subtopos of $\mathbf{Atmt}$	$\leftrightsquigarrow$	a quotient-stable language class

Table 2. Some correspondences on $\mathbf{Atmt}$

topos theory		automata theory
the canonical Boolean algebra	$\leftrightsquigarrow$	Boolean algebra of regular languages
The ringed site $(\Sigma\text{-}\mathbf{FinSet},J),\mathrm{DFA}$	$\leftrightsquigarrow$	recognition by DFA
The ringed site $(\Sigma\text{-}\mathbf{FinMon},J),\mathcal{P}$	$\leftrightsquigarrow$	recognition by finite monoids
The topological monoid action ${\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}\simeq\mathbf{Cont}(\widehat{{{\Sigma}^{\ast}}})$	$\leftrightsquigarrow$	profinite words description

Table 3. Some correspondences on ${\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$

2. Proving that four characterizations of regular languages provide four descriptions of a single boolean-ringed topos $({\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}},\mathcal{R})$ .

In section 3, we will deal with regular languages. Regular languages are a class of languages defined by certain finiteness properties and are known to have many characterizations (see fig. 1):

•

They are accepted by finite automata.
•

They are recognized by finite monoids.
•

They are (pullbacks of) clopen sets of profinite words.
•

Their corresponding Nerode congruence has only finitely many equivalence classes.

Figure 1. Four characterizations of regular languages are Morita equivalent.

We show that these data are Morita equivalent, in the sense that we construct (a priori four) Boolean-ringed topoi from these four data and prove that they are equivalent. The author regards this as an example of Olivia Caramello’s slogan of ‘toposes as bridges’ (see [Car23]), at least in a broader sense. This unified perspective demonstrates that the diverse views on regular languages can be interpreted as a single multifaceted topos.

On the follow-up papers

The contents of the follow-up papers include how points of the topoi categorify infinite words and how the complete lattice of hyperconnected quotients generalizes classes of languages and corresponding syntactic monoids.

Acknowledgement

The author would like to thank his supervisor, Ryu Hasegawa, for his continuous and helpful advice. I would like to thank Takeo Uramoto for his advice and for suggesting a connection with the variety theorem, Yuhi Kamio for his enlightening explanation of algebraic language theory, Morgan Rogers for the discussion on automata as topological monoid actions, Victor Iwaniack for topos theoretic automata theory, and Ryoma Sin’ya for his fascinating introduction to the field of automata.

I would like to extend my gratitude to Keisuke Hoshino, Takao Yuyama, Yusuke Inoue, Isao Ishikawa, Yuzuki Haga, David Jaz Myers, Ivan Tomasic, Igor Bakovic, and Joshua Wrigley for their helpful and encouraging discussions.

This research is supported by FoPM, WINGS Program, the University of Tokyo.

Notation 1.1.

In this paper, we fix a finite set $\Sigma$ , called the alphabet. Let ${{\Sigma}^{\ast}}$ denote the set of all words over $\Sigma$ , i.e., the free monoid on the set $\Sigma$ .

2. Four topoi of automata

This section aims to introduce four topoi of automata

$\Sigma\text{-}\mathbf{Set}$ :: The topos of word actions (section 2.1)
$\mathbf{Atmt}$ :: The topos of (coalgebraic) automata (section 2.2)
${\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$ :: The topos of orbit-finite word actions (section 2.3)
$\mathbf{Atmt}_{\mathrm{o.f.}}$ :: The topos of orbit-finite automata (section 2.4)

and to provide a theoretical framework for the following sections and the follow-up papers.

2.1. $\Sigma\text{-}\mathbf{Set}$ : The presheaf topos of word actions

This subsection focuses on the simplest topos in this paper, the topos of word actions $\Sigma\text{-}\mathbf{Set}$ . Although the topos might seem too trivial to be interesting, we will observe that this topos has depth in its simplicity and naturally includes aspects of formal language theory.

2.1.1. Word actions form a topos

As mentioned in notation 1.1, the alphabet $\Sigma$ is fixed throughout the paper.

Definition 2.1.

We adopt the following definitions and terminologies.

•

A $\Sigma$ -set is a (possibly infinite) set $Q$ equipped with a function $\delta\colon Q\times\Sigma\to Q$ .
•

An element of $Q$ is called a state, and the associated function $\delta$ is called the transition function.
•

A morphism of $\Sigma$ -sets $f\colon(Q_{1},\delta_{1})\to(Q_{2},\delta_{2})$ is a function $f\colon Q_{1}\to Q_{2}$ that commutes with their transition functions.
•

The category of $\Sigma$ -sets is denoted by $\Sigma\text{-}\mathbf{Set}$ .

The following (seemingly boring) proposition is our starting point:

Proposition 2.2.

The category of $\Sigma$ -sets is equivalent to the category of right ${{\Sigma}^{\ast}}$ -actions:

\Sigma\text{-}\mathbf{Set}\simeq\mathbf{PSh}({{\Sigma}^{\ast}}).

In particular, it is a presheaf topos (and hence, a Grothendieck topos).

Proof.

For a $\Sigma$ -set $(Q,\delta)$ , the right action of a word $w=a_{1}\dots a_{n}$ on $q\in Q$ is defined by

qw\coloneqq\delta(\dots\delta(\delta(q,a_{1}),a_{2})\dots,a_{n}).

This construction naturally induces an equivalence of categories $\Sigma\text{-}\mathbf{Set}\to\mathbf{PSh}({{\Sigma}^{\ast}})$ . ∎
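The extension of a transition function to a right action of words can be illustrated by a small sketch. The function names (`act`, `is_monoid_action`) and the example $\Sigma$ -set are assumptions made only for illustration, and the check below tests the action laws on short words only, so it is a sanity check rather than a proof.

```python
from functools import reduce
from itertools import product

def act(delta, q, word):
    """Right action q.w: feed the letters of `word` to delta from left to right."""
    return reduce(delta, word, q)

def words_up_to(alphabet, max_len):
    """All words over `alphabet` of length at most `max_len`, as strings."""
    return ["".join(p) for n in range(max_len + 1) for p in product(alphabet, repeat=n)]

def is_monoid_action(delta, states, alphabet, max_len=3):
    """Check q.eps == q and q.(uv) == (q.u).v on all words up to `max_len`."""
    ws = words_up_to(alphabet, max_len)
    return all(
        act(delta, q, "") == q
        and all(act(delta, q, u + v) == act(delta, act(delta, q, u), v)
                for u in ws for v in ws)
        for q in states
    )

# Hypothetical Sigma-set: Q = {0, 1, 2}, "a" adds 1 modulo 3, "b" resets to 0.
def delta_example(q, letter):
    return (q + 1) % 3 if letter == "a" else 0

print(act(delta_example, 0, "aab"))                      # state reached from 0 by "aab"
print(is_monoid_action(delta_example, range(3), "ab"))   # action laws on short words
```

Representing words as Python strings makes concatenation in the free monoid literally string concatenation, which keeps the action laws easy to read.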

Remark 2.3 (Studies on the topos of word actions.).

This topos has been studied from several points of view. Of course, this topos is an example of a topos of (topological or discrete) monoid actions [Rog19, Rog23]. Even the case where $\Sigma$ is a singleton has been of interest [LS09, Tom20, HK24]. The author also studied this topos (where $\Sigma$ is infinite) from the viewpoint of the study of quotient topoi [KH24].

2.1.2. The canonical point and the internal Boolean algebra of languages

This sub-subsection explains how the notion of languages arises from the topos $\Sigma\text{-}\mathbf{Set}$ . This also serves as a preparation for section 2.2.

Definition 2.4.

We adopt the following definitions:

•

A language is a subset of the free monoid ${{\Sigma}^{\ast}}$ .
•

The set of languages $\mathcal{P}({{\Sigma}^{\ast}})$ will be denoted by $\mathcal{L}$ .
•

The action $\delta\colon\mathcal{L}\times\Sigma\to\mathcal{L}$ is defined by the left quotient

$\delta(L,a)\coloneqq\{v\in{{\Sigma}^{\ast}}\mid av\in L\},$

and $\delta(L,a)$ is denoted by $a^{-1}L$ . (Footnote 1: In this paper, terminology and notation in automata theory are basically borrowed from [Pin22].)
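As a minimal sketch of the left quotient, a language can be modelled as a predicate on strings. This encoding, the helper names, and the example language are assumptions for illustration and are not taken from the paper.

```python
def left_quotient(language, a):
    """a^{-1}L = { v | av in L }, with L given as a predicate on strings."""
    return lambda v: language(a + v)

def act_on_language(language, word):
    """Right action L.w = w^{-1}L, obtained by applying delta letter by letter."""
    for a in word:
        language = left_quotient(language, a)
    return language

# Hypothetical example language: words over {a, b} ending in "ab".
def ends_with_ab(w):
    return w.endswith("ab")

residual = act_on_language(ends_with_ab, "ba")
print(residual("b"))   # asks whether "ba" + "b" belongs to the language
```

With this encoding, membership of $v$ in $w^{-1}L$ is membership of $wv$ in $L$, which is how the action on $\mathcal{L}$ unfolds over a whole word.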

Recall that a point of a Grothendieck topos $\mathcal{E}$ is defined as a geometric morphism from the topos $\mathbf{Set}\simeq\mathbf{Sh}(1)$ to the topos $\mathcal{E}$ . A pointed topos is a Grothendieck topos $\mathcal{E}$ equipped with a point $p\colon\mathbf{Set}\to\mathcal{E}$ .

We will show that the notion of languages naturally arises from the topos $\Sigma\text{-}\mathbf{Set}$ , using the next general lemma (see also [Rog23, Lemma 2.3]):

Lemma 2.5 (Canonical boolean algebra in a pointed topos).

For a pointed topos $p\colon\mathbf{Set}\to\mathcal{E}$ , the Boolean operations on $\{\top,\bot\}$ induce an internal Boolean algebra structure on the object $p_{\ast}\{\top,\bot\}$ in $\mathcal{E}$ .

Proof.

Since the right adjoint functor $p_{\ast}$ preserves all finite products, it preserves all internal algebras. ∎

We call this internal boolean algebra $p_{\ast}\{\top,\bot\}$ the canonical Boolean algebra of a pointed topos $p\colon\mathbf{Set}\to\mathcal{E}$ . To apply this to our topos $\Sigma\text{-}\mathbf{Set}$ , we introduce the notion of the canonical point. This terminology is due to [Rog19].

Definition 2.6.

The canonical point $p\colon\mathbf{Set}\to\Sigma\text{-}\mathbf{Set}$ of the topos $\Sigma\text{-}\mathbf{Set}$ is the geometric morphism

p\colon\mathbf{Set}\simeq\mathbf{PSh}(1)\to\mathbf{PSh}({{\Sigma}^{\ast}})\simeq\Sigma\text{-}\mathbf{Set},

induced by the unique monoid homomorphism $1\to{{\Sigma}^{\ast}}$ , where the inverse image functor $p^{\ast}$ is the forgetful functor.

The direct image part $p_{\ast}$ is calculated by the formula of pointwise right Kan extension, and the result is as follows:

Lemma 2.7.

The direct image part $p_{\ast}\colon\mathbf{Set}\to\Sigma\text{-}\mathbf{Set}$ sends a set $X$ to the set of functions $p_{\ast}(X)=X^{{{\Sigma}^{\ast}}}$ equipped with the ${{\Sigma}^{\ast}}$ -action $(\phi w)(v)=\phi(wv)$ .

The following proposition provides a categorical description of the boolean algebra of languages and its universality, which serves as a foundation of this paper. For example, as we will see in the next subsection, this universality implies the famous coinductive description of language recognition.

Proposition 2.8 (The canonical Boolean algebra consists of all languages).

The $\Sigma$ -set of languages $\mathcal{L}$ is isomorphic to the canonical Boolean algebra $p_{\ast}(\{\top,\bot\})$ (lemma 2.5), i.e.,

\mathcal{L}\cong p_{\ast}(\{\top,\bot\})\text{ in }\Sigma\text{-}\mathbf{Set}.

Proof.

The isomorphism $\mathcal{L}\cong{\{\top,\bot\}}^{{{\Sigma}^{\ast}}}\cong p_{\ast}(\{\top,\bot\})$ follows from the above lemma. ∎

Remark 2.9 (The point $p^{\ast}\dashv p_{\ast}$ describes computation by Moore machine).

This adjunction $p^{\ast}\dashv p_{\ast}$ exhibits the behavior of Moore machines. More precisely, for a set $O$ (of outputs) and a $\Sigma$ -set $(Q,\delta)$ , the adjunction provides the one-to-one correspondence between

Output assignment to each state:: a function $g^{\sharp}\colon p^{*}(Q,\delta)=Q\to O$ , and
(Curried) run:: a $\Sigma$ -set morphism $g^{\flat}\colon(Q,\delta)\to O^{{{\Sigma}^{\ast}}}=p_{*}O$ ,

where $g^{\flat}$ sends $q\in Q$ to

{g^{\flat}}(q)\colon{{\Sigma}^{\ast}}\to O\colon w\mapsto g^{\sharp}(qw).

This is exactly the same as the computation by a Moore machine, since $g^{\sharp}(qw)$ is the output of the run with the initial state $q\in Q$ and the input word $w\in{{\Sigma}^{\ast}}$ . Language recognition is the special case where $O=\{\top,\bot\}$ (corollary 2.15).
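The correspondence between an output assignment $g^{\sharp}$ and the curried run $g^{\flat}$ can be sketched as follows. The names `curried_run` and `shift` and the example machine are hypothetical. The sketch only illustrates the formula $g^{\flat}(q)(w)=g^{\sharp}(qw)$ and the action $(\phi w)(v)=\phi(wv)$ from lemma 2.7.

```python
def act(delta, q, word):
    """Right action q.w, applying delta letter by letter."""
    for a in word:
        q = delta(q, a)
    return q

def curried_run(delta, output, q):
    """g_flat(q): the function w |-> g_sharp(q.w), an element of O^(Sigma*)."""
    return lambda w: output(act(delta, q, w))

def shift(phi, w):
    """The action on O^(Sigma*) from lemma 2.7: (phi.w)(v) = phi(wv)."""
    return lambda v: phi(w + v)

# Hypothetical Moore machine: states count letters "a" modulo 3,
# and the output assignment g_sharp labels each state with a string.
def delta_mod3(q, letter):
    return (q + 1) % 3 if letter == "a" else q

def g_sharp(q):
    return ["zero", "one", "two"][q]

phi = curried_run(delta_mod3, g_sharp, 0)
print(phi("abaa"))   # output of the run from state 0 on the input "abaa"
```

Evaluating $g^{\flat}(q)$ at the empty word recovers $g^{\sharp}(q)$, which is the direction of the adjunction going back from runs to output assignments.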

2.2. $\mathbf{Atmt}$ : The presheaf topos of (coalgebraic) automata

This subsection aims to define a (presheaf) topos of automata, which rephrases the coalgebraic treatment of automata. We will observe that the category of (coalgebraically defined) automata is a presheaf topos (corollary 2.13), and that language recognition ( $=$ coinduction) is the structure map of a slice topos (corollary 2.15).

2.2.1. (Coalgebraic) automata form a topos

We define the category of automata as follows:

Definition 2.10.

We adopt the following definitions and terminologies:

•

An automaton is a $\Sigma$ -set $(Q,\delta\colon Q\times\Sigma\to Q)$ equipped with a subset $F\subset Q$ . (Footnote 2: Our definition of automata does not contain the notion of start states. However, start states will naturally appear in our formulation, for example, in corollary 2.15, in remark 2.16, and also in the follow-up paper.)
•

An element of $F$ is called an accept state.
•

A morphism of automata $f\colon(Q_{1},\delta_{1},F_{1})\to(Q_{2},\delta_{2},F_{2})$ is a $\Sigma$ -set morphism $f\colon(Q_{1},\delta_{1})\to(Q_{2},\delta_{2})$ that preserves and reflects accept states (i.e., $F_{1}=f^{-1}(F_{2})$ ).
•

The category of automata is denoted by $\mathbf{Atmt}$ .
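For finite automata, the two conditions on a morphism can be checked mechanically. The following sketch assumes a hypothetical encoding of an automaton as a triple `(states, delta, accepting)` with `delta` a dictionary keyed by `(state, letter)`. The example automata are illustrative only.

```python
def is_automaton_morphism(f, source, target, alphabet):
    """Check that f commutes with transitions and that F1 == f^{-1}(F2)."""
    states1, delta1, accepting1 = source
    _, delta2, accepting2 = target
    commutes = all(f(delta1[q, a]) == delta2[f(q), a]
                   for q in states1 for a in alphabet)
    preserves_and_reflects = all((q in accepting1) == (f(q) in accepting2)
                                 for q in states1)
    return commutes and preserves_and_reflects

ALPHABET = "ab"

def counter(modulus, accepting):
    """Hypothetical automaton counting occurrences of "a" modulo `modulus`."""
    states = range(modulus)
    delta = {(q, a): (q + 1) % modulus if a == "a" else q
             for q in states for a in ALPHABET}
    return states, delta, set(accepting)

mod4_even = counter(4, {0, 2})   # accepts words with an even number of "a"
mod4_zero = counter(4, {0})      # accepts words whose number of "a" is divisible by 4
mod2_even = counter(2, {0})

def reduce_mod2(q):
    return q % 2

print(is_automaton_morphism(reduce_mod2, mod4_even, mod2_even, ALPHABET))
print(is_automaton_morphism(reduce_mod2, mod4_zero, mod2_even, ALPHABET))
```

In the second call the map still commutes with transitions, but state 2 is sent to an accept state without being one, so accept states are not reflected.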

Remark 2.11 (Automata as coalgebras).

In the category theory community, the above definition of automata has been considered in the context of coalgebras. More precisely, the category of automata $\mathbf{Atmt}$ is equivalent to the category of coalgebras of the endofunctor $2x^{\Sigma}\colon\mathbf{Set}\to\mathbf{Set}\colon X\mapsto\{\text{accept},\text{reject}\}\times X^{\Sigma}$ :

\mathbf{Atmt}\simeq\mathbf{Coalg}_{2x^{\Sigma}}

(For more details, see textbooks including [Jac17, Rut19].)

Theorem 2.12.

The category of automata $\mathbf{Atmt}$ is equivalent to the slice category $\Sigma\text{-}\mathbf{Set}/\mathcal{L}$ .

\mathbf{Atmt}\simeq\Sigma\text{-}\mathbf{Set}/\mathcal{L}

Proof.

Due to proposition 2.8 and the adjunction $p^{\ast}\dashv p_{\ast}$ (remark 2.9), a $\Sigma$ -set morphism $(Q,\delta)\to\mathcal{L}\cong p_{\ast}\{\top,\bot\}$ corresponds to a function $\chi_{F}\colon Q\to\{\top,\bot\}$ , which specifies the set of accept states $F\subset Q$ . ∎

Geometrically speaking, the topos $\mathbf{Atmt}$ is an étale covering over the topos $\Sigma\text{-}\mathbf{Set}$ .

Corollary 2.13.

The following four categories are mutually equivalent, and hence they are presheaf topoi (in particular, Grothendieck topoi).

•

$\mathbf{Atmt}$
•

$\mathbf{Coalg}_{2x^{\Sigma}}$ (remark 2.11)
•

$\Sigma\text{-}\mathbf{Set}/\mathcal{L}$
•

$\mathbf{PSh}({{\int}\mathcal{L}})$ , where the category of languages ${{\int}\mathcal{L}}$ is defined to be the category of elements of $\mathcal{L}\in\mathrm{ob}(\mathbf{PSh}({{\Sigma}^{\ast}}))$ .

Proof.

We have observed the equivalence between the first three categories. For the last one $\mathbf{PSh}({{\int}\mathcal{L}})$ , this follows from the general fact of a slice of a presheaf topos $\mathbf{PSh}(\mathcal{C})/P\simeq\mathbf{PSh}({\int}P)$ . ∎

2.2.2. Language recognition

By abuse of notation, we will refer to the automaton of languages defined below and the $\Sigma$ -set of languages, by the same symbol $\mathcal{L}$ .

Definition 2.14.

The automaton of languages, which is also denoted by $\mathcal{L}$ , is the $\Sigma$ -set of languages $\mathcal{L}$ equipped with the set of accept states $F\subset\mathcal{L}$ defined by

F\coloneqq\{L\in\mathcal{L}\mid\varepsilon\in L\},

where $\varepsilon$ denotes the empty word.

Theorem 2.12 immediately implies (and provides a new perspective on) the following famous theorem in the coalgebraic theory of automata.

Corollary 2.15 (Categorical description of language recognition).

The automaton of languages $\mathcal{L}$ is the terminal object of $\mathbf{Atmt}$ . Furthermore, for an automaton $A=(Q,\delta,F)$ , the unique morphism $A\to\mathcal{L}$ in $\mathbf{Atmt}$ sends a state $q\in Q$ to the language $(\in\mathcal{L})$ that the automaton $(Q,\delta,F)$ equipped with the starting state $q$ recognizes.

Proof.

This is usually proven by the theory of final coalgebras. (See, for example, [Jac17].) But we will derive it from theorem 2.12. Since the terminal object of a slice category $\mathcal{C}/c$ is the identity map $\mathrm{id}_{c}\colon c\to c$ in general, the terminal object of $\mathbf{Atmt}\simeq\Sigma\text{-}\mathbf{Set}/\mathcal{L}$ is the identity map $\mathrm{id}_{\mathcal{L}}\colon\mathcal{L}\to\mathcal{L}$ , which corresponds to the automaton of languages. The latter statement follows from the calculation of the adjunction $p^{\ast}\dashv p_{\ast}$ (remark 2.9). ∎
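The statement of corollary 2.15 can be tested on a small example: the assignment sending a state to its recognized language should commute with transitions (left quotients on the side of $\mathcal{L}$ ) and should preserve and reflect accept states (membership of the empty word on the side of $\mathcal{L}$ ). The sketch below models languages as predicates and compares them only on words of bounded length. Both choices, and the example automaton, are assumptions for illustration.

```python
from itertools import product

def recognized_language(delta, accepting, q):
    """The language accepted from start state q, as a predicate on words."""
    def language(word):
        state = q
        for a in word:
            state = delta(state, a)
        return state in accepting
    return language

def left_quotient(language, a):
    """Transition of the automaton of languages: a^{-1}L = {v | av in L}."""
    return lambda v: language(a + v)

def agree(lang1, lang2, alphabet, max_len=4):
    """Compare two predicates on all words of length <= max_len (a finite test only)."""
    words = ("".join(p) for n in range(max_len + 1) for p in product(alphabet, repeat=n))
    return all(lang1(w) == lang2(w) for w in words)

# Hypothetical automaton over {a, b}: remember the last letter read, accept if it is "a".
STATES = ["start", "a", "b"]
ACCEPTING = {"a"}

def delta(q, letter):
    return letter

commutes = all(
    agree(recognized_language(delta, ACCEPTING, delta(q, a)),
          left_quotient(recognized_language(delta, ACCEPTING, q), a), "ab")
    for q in STATES for a in "ab"
)
accept_states_match = all(
    (q in ACCEPTING) == recognized_language(delta, ACCEPTING, q)("") for q in STATES
)
print(commutes, accept_states_match)
```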

Remark 2.16 (Yoneda point of view on initial state, recognition, and the minimal automaton).

Our topos-theoretic framework also describes automata minimization, which resembles the functorial approach [CP20]. Let $\text{よ}(\ast)$ denote the free $\Sigma$ -set, whose underlying set is ${{\Sigma}^{\ast}}$ and whose action is given by the concatenation of words. It is the unique representable presheaf in $\Sigma\text{-}\mathbf{Set}$ . By the Yoneda lemma, a diagram

\text{よ}(\ast)\to(Q,\delta)\to\mathcal{L}

in $\Sigma\text{-}\mathbf{Set}$ corresponds to the data of an automaton $(Q,\delta,F)$ equipped with an initial state $q_{0}\in Q$ . Their composite

\text{よ}(\ast)\to\mathcal{L}

corresponds to the recognized language $L\in\mathcal{L}$ by the Yoneda lemma.

Conversely, for any language $L\in\mathcal{L}$ , there is the corresponding morphism

\lceil L\rceil\colon\text{よ}(\ast)\to\mathcal{L}

by the Yoneda lemma. Since $\Sigma\text{-}\mathbf{Set}$ is a topos, there is an epi-mono factorization, which provides a canonically constructed new automaton (equipped with an initial state):

\text{よ}(\ast)\twoheadrightarrow\mathcal{A}(L)\rightarrowtail\mathcal{L},

and this new automaton $\mathcal{A}(L)$ coincides with the so-called minimal automaton (or Nerode automaton; for details, see [Pin22, 4.6 Minimal automata]) of the language $L$ .
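For a language given by a finite automaton with an initial state, the image described in this remark can be computed in two steps: restrict to the states reachable from the initial state, and then identify states that recognize the same language. The sketch below uses the standard partition-refinement procedure (Moore's algorithm) for the second step, which is a swapped-in technique and not a construction from the paper. The dictionary encoding and the example automaton are assumptions.

```python
def accepts(delta, accepting, q0, word):
    """Run the automaton from q0 on `word` and report acceptance."""
    q = q0
    for a in word:
        q = delta[q, a]
    return q in accepting

def minimal_automaton(delta, alphabet, accepting, q0):
    """Return (delta, accepting, initial state) of the image in the automaton of languages."""
    # Step 1: states reachable from q0 (the image of the free Sigma-set in Q).
    reachable, stack = {q0}, [q0]
    while stack:
        q = stack.pop()
        for a in alphabet:
            r = delta[q, a]
            if r not in reachable:
                reachable.add(r)
                stack.append(r)
    # Step 2: refine the accept/reject partition until it is stable.
    block = {q: int(q in accepting) for q in reachable}
    while True:
        ids, refined = {}, {}
        for q in reachable:
            signature = (block[q], tuple(block[delta[q, a]] for a in alphabet))
            refined[q] = ids.setdefault(signature, len(ids))
        stable = len(set(refined.values())) == len(set(block.values()))
        block = refined
        if stable:
            break
    min_delta = {(block[q], a): block[delta[q, a]] for q in reachable for a in alphabet}
    min_accepting = {block[q] for q in reachable if q in accepting}
    return min_delta, min_accepting, block[q0]

# Hypothetical automaton: count "a" modulo 4, accept even counts; state 9 is unreachable.
ALPHABET = "ab"
DELTA = {(q, a): (q + 1) % 4 if a == "a" else q for q in range(4) for a in ALPHABET}
DELTA.update({(9, "a"): 9, (9, "b"): 9})
ACCEPTING = {0, 2}

min_delta, min_accepting, min_q0 = minimal_automaton(DELTA, ALPHABET, ACCEPTING, 0)
print(len({state for (state, _) in min_delta}))   # number of states after minimization
```

The names of the states in the result are arbitrary block identifiers. Only the resulting automaton up to isomorphism is meaningful.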

2.3. ${\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$ : The Grothendieck topos of orbit-finite word actions

So far, both $\Sigma\text{-}\mathbf{Set}$ and $\mathbf{Atmt}$ are presheaf topoi, and we have not made any finiteness assumption. However, needless to say, finiteness is crucial for the theory of regular languages. For example, a regular language is defined as a language recognizable by a finite automaton. So, our next question is: how can we deal with finiteness in our framework? To answer this question, we will define a Grothendieck topos, which is not a presheaf topos.

This subsection focuses on the category of orbit-finite $\Sigma$ -sets, which turns out to be a Grothendieck topos (proposition 2.19). The content of the present subsection will be generalized to a broader context in the follow-up paper.

2.3.1. Orbit-finite word actions form a topos

We introduce the notion of an orbit-finite $\Sigma$ -set and show that orbit-finite $\Sigma$ -sets form a (pointed) Grothendieck topos. Then, we demonstrate that this topos characterizes the class of regular languages in complete parallel with section 2.1.

Definition 2.17.

We define the notion of orbit-finiteness as follows:

•

For a $\Sigma$ -set $(Q,\delta)$ and its state $q\in Q$ , the orbit of $q$ is the set $\{qw\mid w\in{{\Sigma}^{\ast}}\}\subset Q$ .
•

A $\Sigma$ -set $(Q,\delta)$ is orbit-finite, if, for any $q\in Q$ , its orbit is a finite set.
•

The category of orbit-finite $\Sigma$ -sets, which is a full subcategory of $\Sigma\text{-}\mathbf{Set}$ , is denoted by ${\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$ .

Remark 2.18 (Why orbit-finite, not finite?).

Regular languages are defined as languages that are recognized by finite automata. Then why do we consider orbit-finite automata instead of finite automata? There are many reasons, but they can be broadly divided into two:

(1)

It is equivalent to say that regular languages are those accepted by orbit-finite automata.
(2)

Orbit-finite automata are closed under more constructions than finite automata.

The first reason indicates that we do not necessarily need to stick to finite automata. The latter ensures good categorical properties, such as the existence of small colimits, specifically appearing as:

•

${\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$ becomes a Grothendieck topos (proposition 2.19),
•

the collection of all regular languages forms an orbit-finite (but not finite) automaton (proposition 2.24).

The category of finite $\Sigma$ -sets will be utilized as a site for the topos (proposition 3.8). See also remark 3.6 and [Ura17].

The goal of this sub-subsection is to prove the following proposition. For the notion of hyperconnected geometric morphisms, see appendix A, [Joh02], or [Joh81].

Proposition 2.19.

The category ${\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$ is a Grothendieck topos, and there is a hyperconnected geometric morphism

h\colon\Sigma\text{-}\mathbf{Set}\to{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}

whose inverse image part $h^{\ast}\colon{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}\to\Sigma\text{-}\mathbf{Set}$ is the canonical embedding functor.

Proof.

Due to lemma A.2, it is enough to prove that the full subcategory ${\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}\to\Sigma\text{-}\mathbf{Set}$ is closed under taking small coproducts, finite products, and subquotients (subobjects and quotient objects). This immediately follows from a concrete calculation (and the fact that finite limits, subobjects, and quotient objects are preserved by the forgetful functor $\Sigma\text{-}\mathbf{Set}\to\mathbf{Set}$ ). ∎

We have proven the above proposition by abstract nonsense, but we also provide a concrete description of the right adjoint for later reference.

Lemma 2.20 (Construction of $h_{\ast}$ ).

We can construct the right adjoint $h_{*}\colon\Sigma\text{-}\mathbf{Set}\to{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$ as follows:

•

For a $\Sigma$ -set $(Q,\delta)$ and its subset

$Q_{\text{fin}}\coloneqq\{q\in Q\mid\text{the orbit of }q\text{ is finite}\},$

$(Q_{\text{fin}},\delta)$ is the maximum orbit-finite sub $\Sigma$ -set.
•

The above construction $(Q,\delta)\mapsto(Q_{\text{fin}},\delta)$ defines the right adjoint to the full embedding functor $h^{*}\colon{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}\to\Sigma\text{-}\mathbf{Set}$ .

Proof.

The former part is easy to prove: notice that $Q_{\text{fin}}$ is closed under the ${{\Sigma}^{\ast}}$ -action. The latter part follows from the former one and lemma A.2. ∎

We obtained the concrete description $h_{\ast}(Q,\delta)=(Q_{\text{fin}},\delta)$ from lemma 2.20. The monic counit (see definition A.1) is the inclusion $\epsilon_{(Q,\delta)}\colon(Q_{\text{fin}},\delta)\rightarrowtail(Q,\delta)$ .
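The orbit of a state and the subset $Q_{\text{fin}}$ can be explored by a breadth-first search. Since finiteness of an orbit cannot be decided in general for an arbitrary computable transition function, the sketch below uses a cut-off `limit` and reports `None` when the search exceeds it. The cut-off and the example $\Sigma$ -set on the integers are assumptions made for illustration.

```python
from collections import deque

def orbit(delta, alphabet, q, limit=1000):
    """Breadth-first search of {q.w | w in Sigma*}.

    Returns the orbit as a set, or None if more than `limit` states are found
    (in which case the search is inconclusive, or the orbit is infinite).
    """
    seen = {q}
    queue = deque([q])
    while queue:
        s = queue.popleft()
        for a in alphabet:
            t = delta(s, a)
            if t not in seen:
                seen.add(t)
                if len(seen) > limit:
                    return None
                queue.append(t)
    return seen

def finite_part(delta, alphabet, candidates, limit=1000):
    """Those candidate states whose orbit was found to be finite within `limit`."""
    return [q for q in candidates if orbit(delta, alphabet, q, limit) is not None]

# Hypothetical Sigma-set on the integers with Sigma = {"a"}:
# non-negative states move up forever, negative states fall into the cycle -1 -> -2 -> -3 -> -1.
def delta_example(n, letter):
    if n >= 0:
        return n + 1
    return -((-n) % 3) - 1

print(orbit(delta_example, "a", -7))
print(finite_part(delta_example, "a", range(-5, 5), limit=50))
```

In this example the negative integers form the orbit-finite part, and they are closed under the action, in line with the first item of lemma 2.20.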

2.3.2. The canonical point and the internal Boolean algebra of regular languages

We will observe the canonical point and the internal Boolean algebra of ${\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$ in parallel with section 2.1.2.

Definition 2.21.

The canonical point ${p_{\mathrm{o.f.}}}$ of ${\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$ is the composite

\mathbf{Set}\xrightarrow{p}\Sigma\text{-}\mathbf{Set}\xrightarrow{h}{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}

of the canonical point $p$ (definition 2.6) followed by the hyperconnected geometric morphism $h$ (proposition 2.19). (Footnote 3: This terminology is also a special case of [Rog23]. Furthermore, the definition of this point as a composite of an essential surjective point followed by a hyperconnected geometric morphism is nothing other than the characterization of toposes of topological monoid actions ([Rog23, Theorem 3.20]). We will come back to this observation shortly in section 3.4 and extensively in the follow-up paper.)

Since ${\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$ is a pointed topos, there is the associated internal Boolean algebra ${p_{\mathrm{o.f.}}}_{\ast}(\{\top,\bot\})$ (by lemma 2.5). We will prove that it is the Boolean algebra of regular languages. To prove it, we need the Myhill-Nerode theorem (see [Pin22]), stated in our terminology. (Footnote 4: The Myhill-Nerode theorem in terms of Nerode congruences will appear in the follow-up paper, utilizing the theory of a local state classifier [Hor24].)

Following remark 2.16, let $\text{よ}(\ast)\cong{{\Sigma}^{\ast}}$ denote the free $\Sigma$ -set, and let $\lceil L\rceil\colon\text{よ}(\ast)\to\mathcal{L}$ denote the morphism corresponding to a language $L\in\mathcal{L}$ by the Yoneda lemma. Using this notation, the Myhill-Nerode theorem can be stated as follows.

Lemma 2.22 (Myhill-Nerode theorem).

For a language $L\in\mathcal{L}$ , the following are equivalent:

(1)

$L$ is regular (i.e., recognized by a finite automaton).
(2)

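The Yoneda-corresponding morphism

\lceil L\rceil\colon\text{よ}(\ast)\to\mathcal{L}

factors through a finite $\Sigma$ -set. (Footnote 5: A $\Sigma$ -set is said to be finite if its underlying set is finite (definition 3.4).)
(3)

The image of

\lceil L\rceil\colon\text{よ}(\ast)\to\mathcal{L}

is finite.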
(4)

The orbit of $L\in\mathcal{L}$ , which is $\{w^{-1}L\mid w\in{{\Sigma}^{\ast}}\}\subset\mathcal{L}$ , is finite.

Proof.

The first two conditions are just paraphrases of each other due to remark 2.16. The last two conditions are clearly equivalent, since $\{w^{-1}L\mid w\in{{\Sigma}^{\ast}}\}\subset\mathcal{L}$ is the image of $\lceil L\rceil$ . The equivalence between the second and third conditions follows from the image factorization in $\Sigma\text{-}\mathbf{Set}$ . ∎
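The fourth condition can be explored numerically, at least as a heuristic. The sketch below models a language as a predicate, approximates each quotient $w^{-1}L$ by its values on probe words of bounded length, and counts the distinct approximations for prefixes $w$ of bounded length. The bounds, the helper names, and the two example languages are assumptions. A finite experiment of this kind can suggest, but never prove, that an orbit is finite.

```python
from itertools import product

def words_up_to(alphabet, max_len):
    """All words over `alphabet` of length at most `max_len`, as strings."""
    return ["".join(p) for n in range(max_len + 1) for p in product(alphabet, repeat=n)]

def residual_fingerprints(language, alphabet, prefix_len, probe_len):
    """Distinct quotients w^{-1}L for |w| <= prefix_len, compared on probes of length <= probe_len."""
    probes = words_up_to(alphabet, probe_len)
    return {tuple(language(w + v) for v in probes)
            for w in words_up_to(alphabet, prefix_len)}

def even_number_of_a(w):
    """A regular language: words with an even number of "a"."""
    return w.count("a") % 2 == 0

def a_n_b_n(w):
    """A non-regular language: words of the form a^n b^n."""
    n = len(w) // 2
    return w == "a" * n + "b" * n

for bound in range(5):
    print(bound,
          len(residual_fingerprints(even_number_of_a, "ab", bound, 4)),
          len(residual_fingerprints(a_n_b_n, "ab", bound, 4)))
```

For the first language the count stabilizes, whereas for the second it keeps growing with the prefix bound, since the quotients by $a^{k}$ are pairwise distinguished by the probes $b^{k}$ .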

Definition 2.23 (orbit-finite $\Sigma$ -set of regular languages).

The orbit-finite $\Sigma$ -set of regular languages, i.e., the sub- $\Sigma$ -set of $\mathcal{L}$ consisting of all regular languages, is denoted by $\mathcal{R}$ .

Proposition 2.24 (The canonical Boolean algebra consists of regular languages).

The canonical internal Boolean algebra (described in lemma 2.5) of the pointed topos ${p_{\mathrm{o.f.}}}\colon\mathbf{Set}\to{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$ is isomorphic to the orbit-finite $\Sigma$ -set of regular languages:

\mathcal{R}\cong{p_{\mathrm{o.f.}}}_{\ast}(\{\top,\bot\})\text{ in }{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}.

Proof.

We have the following isomorphisms

{p_{\mathrm{o.f.}}}_{\ast}(\{\top,\bot\})\cong h_{\ast}(p_{\ast}(\{\top,\bot\}))\cong h_{\ast}(\mathcal{L}),

by proposition 2.8. Furthermore, lemma 2.20 and lemma 2.22 imply that $h_{\ast}(\mathcal{L})\cong\mathcal{R}$ , which completes the proof. ∎

2.4. $\mathbf{Atmt}_{\mathrm{o.f.}}$ : The Grothendieck topos of orbit-finite automata

Definition 2.25.

We adopt the following definitions and terminologies:

•

An automaton $(Q,\delta,F)$ is orbit-finite if its underlying $\Sigma$ -set $(Q,\delta)$ is orbit-finite.
•

The category of orbit-finite automata is denoted by $\mathbf{Atmt}_{\mathrm{o.f.}}$ .

In parallel with theorem 2.12, we obtain the following “slice description” of $\mathbf{Atmt}_{\mathrm{o.f.}}$ .

Proposition 2.26.

The category of orbit-finite automata is equivalent to the slice category

\mathbf{Atmt}_{\mathrm{o.f.}}\simeq{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}/\mathcal{R}.

Proof.

The same proof as theorem 2.12. ∎

Then we obtain a variant of corollary 2.15 for regular languages.

Corollary 2.27.

The category $\mathbf{Atmt}_{\mathrm{o.f.}}$ is a Grothendieck topos.

Proof.

Every slice category of a Grothendieck topos is again a Grothendieck topos. ∎

Proposition 2.28.

We can observe the recognition of regular languages in the Grothendieck topos $\mathbf{Atmt}_{\mathrm{o.f.}}$ :

•

The terminal object of $\mathbf{Atmt}_{\mathrm{o.f.}}$ is the orbit-finite automaton of regular languages $\mathcal{R}$ .
•

Furthermore, for an orbit-finite automaton $(Q,\delta,F)$ , the unique map $!\colon(Q,\delta,F)\to\mathcal{R}$ sends a state $q\in Q$ to the regular language that $(Q,\delta,q,F)$ recognizes with the start state $q$ .

In the follow-up paper, the content of this subsection will be generalized to other hyperconnected geometric morphisms from $\Sigma\text{-}\mathbf{Set}$ , so that we can consider other classes of (possibly non-regular) languages.

3. Four Morita equivalent definitions of the Boolean-ringed topos of regular languages $({\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}},\mathcal{R})$

In this section, we will observe that the following four characterizations of regular languages provide four different “Morita equivalent” definitions of the single Boolean-ringed topos $({\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}},\mathcal{R})$ .

A language $L$ is regular, if and only if

section 3.1 Myhill-Nerode theorem:: its orbit $\{w^{-1}L\mid w\in{{\Sigma}^{\ast}}\}$ is finite.
section 3.2 DFA:: it is recognized by a deterministic finite automaton.
section 3.3 Finite monoids:: there is a monoid homomorphism $f\colon{{\Sigma}^{\ast}}\to M$ to a finite monoid $M$ and a subset $S\in\mathcal{P}(M)$ such that $L=f^{-1}(S)$ .
section 3.4 Profinite words:: there is a clopen subset $S\subset\widehat{{{\Sigma}^{\ast}}}$ such that $L$ is the inverse image of $S$ along the canonical embedding ${{\Sigma}^{\ast}}\to\widehat{{{\Sigma}^{\ast}}}$ .

Put simply, what we aim to do in this section is to categorify the “equivalence between these conditions” and lift it to “isomorphism between structures.”
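Recognition by a finite monoid, the third characterization in the list above, can be sketched directly: a monoid homomorphism out of the free monoid is determined by the images of the letters, and the recognized language is the preimage of a subset. The helper names and the two example monoids below are assumptions for illustration.

```python
def extend(letter_image, multiply, unit):
    """The unique monoid homomorphism f: Sigma* -> M extending `letter_image`."""
    def f(word):
        m = unit
        for a in word:
            m = multiply(m, letter_image[a])
        return m
    return f

def preimage(f, subset):
    """The language f^{-1}(S), as a predicate on words."""
    return lambda w: f(w) in subset

# Hypothetical example 1: M = Z/2 under addition, "a" -> 1, "b" -> 0, S = {0}.
parity = extend({"a": 1, "b": 0}, lambda x, y: (x + y) % 2, 0)
even_number_of_a = preimage(parity, {0})

# Hypothetical example 2: M = {"e", "a", "b"} with unit "e" and x * y = y whenever y != "e".
def last_wins(x, y):
    return x if y == "e" else y

last_letter = extend({"a": "a", "b": "b"}, last_wins, "e")
ends_with_a = preimage(last_letter, {"a"})

print(even_number_of_a("abab"), ends_with_a("bba"))
```

The second monoid is not a group, which illustrates that the characterization genuinely concerns finite monoids rather than finite groups.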

3.1. Description by the canonical point $=$ Myhill-Nerode theorem

We adopt the following terminology:

Definition 3.1.

A Boolean-ringed topos is a Grothendieck topos equipped with an internal Boolean algebra (as a “structure sheaf”). (Footnote 6: We prefer the word ‘Boolean ring,’ just because it is more conventional to say ‘ringed topos’ rather than ‘algebra-ed topos.’)

For example, every pointed topos is canonically a Boolean-ringed topos by lemma 2.5.

Definition 3.2.

The Boolean-ringed topos of regular languages $({\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}},\mathcal{R})$ is the (pointed) topos ${\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$ (definition 2.17 and proposition 2.19) equipped with the canonical internal Boolean algebra (structure sheaf) $\mathcal{R}$ (proposition 2.24).

As we have seen in proposition 2.24, the Boolean-ringed topos $({\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}},\mathcal{R})$ is the one induced by the canonical point

{p_{\mathrm{o.f.}}}\colon\mathbf{Set}\to{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}.

This is essentially equivalent to the Myhill-Nerode theorem (lemma 2.22).

Remark 3.3 (Connection with the Nerode-congruence.).

Usually, the Myhill-Nerode theorem is stated in terms of (Nerode-)congruence. The follow-up paper will make the connection with Nerode-congruence more explicit. We will observe that the local state classifier $\Xi$ (defined in [Hor24]) consists of right congruences of ${{\Sigma}^{\ast}}$ and that the canonical morphism $\xi_{\mathcal{L}}\colon\mathcal{L}\to\Xi$ sends a language $L$ to its Nerode-congruence ${\sim}_{L}$ . The Myhill-Nerode theorem will be paraphrased as a pullback diagram along the morphism $\xi_{\mathcal{L}}$ .

3.2. Description by DFA

Definition 3.4.

We adopt the following terminology:

•

A $\Sigma$ -set $(Q,\delta)$ is finite if the set of states $Q$ is a finite set.
•

The category of finite $\Sigma$ -sets is denoted by $\Sigma\text{-}\mathbf{FinSet}$ .
•

Let $J$ be the Grothendieck topology on $\Sigma\text{-}\mathbf{FinSet}$ generated by jointly surjective families.

The next lemma is also mentioned in [Ura17].

Lemma 3.5 (Equivalence between the underlying topoi).

$\mathbf{Sh}(\Sigma\text{-}\mathbf{FinSet},J)\simeq{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$

Proof.

Since $\Sigma\text{-}\mathbf{FinSet}$ is a full subcategory of ${\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$ , by Giraud’s theorem it is enough to show that finite $\Sigma$ -sets form a generating set of ${\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$ . This follows from the definition of orbit-finiteness: every element of an orbit-finite $\Sigma$ -set lies in its orbit, which is a finite sub- $\Sigma$ -set. ∎
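The generating property used in this proof can be made concrete: the orbit of an element is a finite sub- $\Sigma$ -set containing it. The sketch below computes an orbit by breadth-first search. The $\Sigma$ -set on the natural numbers used here (the function `delta`) is a hypothetical example chosen for illustration, and the search terminates only because every orbit of this example is finite.

```python
from collections import deque

def orbit(q, delta, alphabet):
    """Return {q.w | w in Sigma^*} by breadth-first search.

    The loop terminates exactly when the orbit of q is finite.
    """
    seen = {q}
    queue = deque([q])
    while queue:
        x = queue.popleft()
        for a in alphabet:
            y = delta(x, a)
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen

def delta(n, a):
    """Hypothetical orbit-finite action on the natural numbers:
    'a' sends n to (n + 1) mod 3 and 'b' sends n to n mod 3."""
    return (n + 1) % 3 if a == "a" else n % 3
```

For instance, `orbit(7, delta, "ab")` is a finite set that contains 7 and is closed under both letters, so it is a finite sub- $\Sigma$ -set of an infinite (but orbit-finite) $\Sigma$ -set.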

Remark 3.6.

Unless $\Sigma$ is empty, the category $\Sigma\text{-}\mathbf{FinSet}$ is not an elementary topos and, a fortiori, not a Grothendieck topos. However, it belongs to a well-behaved class of categories, the semi-Galois categories [Ura17].

We define the $J$ -sheaf $\mathrm{DFA}\colon\Sigma\text{-}\mathbf{FinSet}^{\mathrm{op}}\to\mathbf{Set}$ of deterministic finite automata. The set $\mathrm{DFA}(Q,\delta)$ is intended to be the set of all DFA structures over $(Q,\delta)$ , i.e. $\mathrm{DFA}(Q,\delta)\coloneqq\{(Q,\delta,F)\mid F\subset Q\}$ . This can be simplified as follows.

Definition 3.7.

The sheaf of DFA is the $J$ -sheaf of Boolean algebras

\mathrm{DFA}\colon\Sigma\text{-}\mathbf{FinSet}^{\mathrm{op}}\to\mathbf{BoolAlg}\colon(Q,\delta)\mapsto\mathcal{P}(Q).

The proof of the following proposition verifies that $\mathrm{DFA}$ is indeed a $J$ -sheaf of Boolean algebras.

Proposition 3.8.

The two Boolean-ringed topoi are equivalent:

({\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}},\mathcal{R})\simeq(\mathbf{Sh}(\Sigma\text{-}\mathbf{FinSet},J),\mathrm{DFA}).

Proof.

The equivalence of topoi is due to lemma 3.5. Since $\mathcal{R}$ is an internal Boolean algebra of ${\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$ (proposition 2.24) and the equivalence $N\colon{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}\xrightarrow{{\simeq}}\mathbf{Sh}(\Sigma\text{-}\mathbf{FinSet},J)$ is given by

X\mapsto\left(N(X)\colon(Q,\delta)\mapsto{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}((Q,\delta),X)\right),

the corresponding internal Boolean algebra is given by $N(\mathcal{R})$

N(\mathcal{R})\colon(Q,\delta)\mapsto{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}((Q,\delta),\mathcal{R}).

Then, proposition 2.24 implies

N(\mathcal{R})(Q,\delta)\cong{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}((Q,\delta),\mathcal{R})\cong\mathbf{Set}(Q,\{\top,\bot\})\cong\mathrm{DFA}(Q,\delta),

which completes the proof. ∎

Notice that this equivalence of two Boolean-ringed topoi actually captures the notion of language recognition by DFA, since the correspondence $\mathrm{DFA}(Q,\delta)\cong{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}((Q,\delta),\mathcal{R})$ is nothing but the language recognition (remarks 2.9 and 2.28).
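The correspondence between DFA structures and morphisms into $\mathcal{R}$ can be checked by hand on a small example. The sketch below assumes that the action on languages is the left quotient $L\mapsto a^{-1}L$ , as in the Myhill-Nerode characterization; the three-state $\Sigma$ -set, the function names, and the length bound are hypothetical choices for illustration. Equivariance is tested only on words of bounded length, so this is a sanity check rather than a proof.

```python
from itertools import product

SIGMA = "ab"
Q = [0, 1, 2]

def delta(q, a):
    """Hypothetical transition function on Q = {0, 1, 2}."""
    return (q + 1) % 3 if a == "a" else 0

def run(q, word):
    """Extend delta to words: the state q.w reached from q by reading word."""
    for a in word:
        q = delta(q, a)
    return q

def recognized(F):
    """Send a DFA structure F (a subset of Q) to the map q |-> L_q, where
    L_q = {w | q.w in F}. Each L_q is returned as a membership predicate."""
    return lambda q: (lambda word: run(q, word) in F)

def words(max_len):
    """All words over SIGMA of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(SIGMA, repeat=n):
            yield "".join(letters)

def is_equivariant(F, max_len=5):
    """Check L_{delta(q, a)} = a^{-1} L_q on all words up to max_len."""
    lang = recognized(F)
    return all(
        lang(delta(q, a))(w) == lang(q)(a + w)
        for q in Q for a in SIGMA for w in words(max_len)
    )

def accepting_set(lang):
    """Inverse direction: F is the set of states whose language
    contains the empty word."""
    return {q for q in Q if lang(q)("")}
```

Under these assumptions, every subset $F\subset Q$ gives an equivariant map, and `accepting_set(recognized(F))` recovers `F`, which mirrors the bijection $\mathcal{P}(Q)\cong{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}((Q,\delta),\mathcal{R})$ on this example.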

3.3. Description by finite monoids

This subsection aims to reconstruct $\mathcal{R}$ in terms of the recognizability by finite monoids.

Definition 3.9.

A finite $\Sigma$ -monoid is a pair of a finite monoid $M$ and a $\Sigma$ -indexed family of elements $\{m_{a}\}_{a\in\Sigma}$ .

Definition 3.10.

We define the category $\Sigma\text{-}\mathbf{FinMon}$ as follows:

•

an object is a finite $\Sigma$ -monoid $(M,\{m_{a}\}_{a\in\Sigma})$ ,
•

a morphism $(M,\{m_{a}\}_{a\in\Sigma})\to(M^{\prime},\{m^{\prime}_{a}\}_{a\in\Sigma})$ is a function $f\colon M\to M^{\prime}$ such that $f(xm_{a})=f(x)m^{\prime}_{a}$ for any $a\in\Sigma$ and $x\in M$ .

Notice that $\Sigma\text{-}\mathbf{FinMon}$ is a full subcategory of ${\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$ (not of ${{\Sigma}^{\ast}}/\mathbf{Monoids}$ ), since a finite $\Sigma$ -monoid $(M,\{m_{a}\}_{a\in\Sigma})$ can be regarded as a finite $\Sigma$ -set $(M,\overline{\delta})$ with $\overline{\delta}(m,a)\coloneqq m\cdot m_{a}$ .

Letting $J$ denote the Grothendieck topology generated by the jointly surjective families in ${\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$ , we obtain the following proposition.

Proposition 3.11.

The following two Boolean-ringed topoi are equivalent:

({\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}},\mathcal{R})\simeq(\mathbf{Sh}(\Sigma\text{-}\mathbf{FinMon},J),\mathcal{P}),

where $\mathcal{P}$ denotes the power set functor $\Sigma\text{-}\mathbf{FinMon}^{\mathrm{op}}\xrightarrow{U}\mathbf{FinSet}^{\mathrm{op}}\xrightarrow{\mathcal{P}}\mathbf{Set}$ .

Proof.

The proof is almost the same as the proof of proposition 3.8. The only non-trivial part is proving that $\Sigma\text{-}\mathbf{FinMon}\hookrightarrow{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$ is a generating full subcategory. It is enough to show that, for an arbitrary finite $\Sigma$ -set $(Q,\delta)$ and an element $q_{0}\in Q$ , there exists a finite $\Sigma$ -monoid $(M,\{m_{a}\}_{a\in\Sigma})$ and a $\Sigma$ -set homomorphism $f\colon(M,\overline{\delta})\to(Q,\delta)$ such that $q_{0}\in\mathrm{Im}(f)$ . Let us consider $\mathrm{End}(Q)^{\mathrm{op}}$ , the opposite monoid of the endofunction monoid $\mathrm{End}(Q)$ . In other words, elements of the monoid $\mathrm{End}(Q)^{\mathrm{op}}$ are functions $Q\to Q$ , and the multiplication $\phi\cdot\psi$ is defined to be $\psi\circ\phi$ . The family of endomorphisms $\{\delta({-},a)\colon Q\to Q\}_{a\in\Sigma}$ makes it a finite $\Sigma$ -monoid. The function

f\colon\mathrm{End}(Q)^{\mathrm{op}}\to Q\colon\phi\mapsto\phi(q_{0})

is a $\Sigma$ -set morphism, since $(\phi\cdot\delta({-},a))(q_{0})=\delta(\phi(q_{0}),a)$ . The element $q_{0}$ belongs to the image $\mathrm{Im}(f)$ , since the identity function $\mathrm{id}_{Q}\in\mathrm{End}(Q)^{\mathrm{op}}$ is sent to $q_{0}$ . ∎
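The construction in this proof is small enough to verify exhaustively. The following sketch builds $\mathrm{End}(Q)^{\mathrm{op}}$ for a hypothetical three-state $\Sigma$ -set (the transition function `delta` and all names are assumptions for illustration) and provides the evaluation map $f$ , so that its compatibility with the two actions can be checked for every endofunction.

```python
from itertools import product

SIGMA = "ab"
Q = (0, 1, 2)

def delta(q, a):
    """Hypothetical transition function on Q = {0, 1, 2}."""
    return (q + 1) % 3 if a == "a" else 0

# An endofunction phi : Q -> Q is stored as a tuple with phi[q] = phi(q).
END_Q = list(product(Q, repeat=len(Q)))
IDENTITY = tuple(Q)

def mul(phi, psi):
    """Multiplication in End(Q)^op: phi . psi is defined as psi after phi."""
    return tuple(psi[phi[q]] for q in Q)

# The Sigma-indexed family m_a = delta(-, a).
M = {a: tuple(delta(q, a) for q in Q) for a in SIGMA}

def delta_bar(phi, a):
    """Induced Sigma-set structure on the monoid: phi |-> phi . m_a."""
    return mul(phi, M[a])

def f(phi, q0):
    """Evaluation at q0, the map End(Q)^op -> Q of the proof."""
    return phi[q0]
```

Here `f(delta_bar(phi, a), q0)` equals `delta(f(phi, q0), a)` for all 27 endofunctions, and `f(IDENTITY, q0)` is `q0`, so $q_{0}$ lies in the image of $f$ .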

Remark 3.12 (This is not satisfying enough!).

The category $\Sigma\text{-}\mathbf{FinMon}$ is not quite monoid-theoretic, in the sense that two isomorphic objects in $\Sigma\text{-}\mathbf{FinMon}$ might be non-isomorphic as monoids. A more natural way to understand monoid-theoretic aspects, including the theory of syntactic monoids, will be proposed in the follow-up paper.

Remark 3.13 (Other generating sets).

The arguments in section 3.2 and section 3.3 only utilize the fact that the considered full subcategories, namely $\Sigma\text{-}\mathbf{FinSet}$ and $\Sigma\text{-}\mathbf{FinMon}$ , are generating subcategories. We can do the same for other generating sets of objects.

3.4. Description by clopen subsets of profinite words

This subsection needs a few preliminaries on topological monoids and profinite words. First, we recall the basics of the topos of topological monoid actions. See [Rog23] for an extensive study of this topic.

Lemma 3.14 (Recall on the toposes of topological monoid actions. [Rog23]).

For a topological monoid $M$ (footnote: [Rog23] deals with monoids equipped with a topology, whose multiplication is not necessarily continuous),

•

the topos of continuous actions of $M$ , denoted by $\mathbf{Cont}(M)$ , is defined to be the full subcategory of $\mathbf{PSh}(M)$ that consists of $M$ -sets $(X,X\times M\to X)$ such that the action map $X\times M\to X$ is continuous with respect to the discrete topology on $X$ and the product topology on $X\times M$ .
•

$\mathbf{Cont}(M)$ is a Grothendieck topos.
•

The forgetful functor $U\colon\mathbf{Cont}(M)\to\mathbf{Set}$ has a right adjoint, and the resulting adjunction defines a surjective geometric morphism $\mathbf{Set}\to\mathbf{Cont}(M)$ . We call this point $p\colon\mathbf{Set}\to\mathbf{Cont}(M)$ the canonical point.

Lemma 3.15.

For a compact topological monoid $M$ , the corresponding internal Boolean algebra of the pointed topos

p\colon\mathbf{Set}\to\mathbf{Cont}(M)

(given by lemma 2.5) is the Boolean algebra of clopen subsets $\mathrm{Clopen}(M)$ .

Proof.

The point $p\colon\mathbf{Set}\to\mathbf{Cont}(M)$ is decomposed into the composite of $\mathbf{Set}\to\mathbf{PSh}(M)\to\mathbf{Cont}(M)$ , where $\mathbf{PSh}(M)$ denotes the topos of discrete actions. This decomposition allows us to calculate the internal Boolean algebra $B$ as a Boolean subalgebra of $\mathcal{P}(M)$ . What we will prove is that $B$ coincides with the Boolean subalgebra $\mathrm{Clopen}(M)\subset\mathcal{P}(M)$ .

The calculation of the direct image functor $\mathbf{PSh}(M)\to\mathbf{Cont}(M)$ implies that a subset $S\subset M$ belongs to $B$ if and only if every equivalence class of the equivalence relation

a\sim_{S}b\iff a^{-1}S=b^{-1}S

is open (see [Rog23, Scholium 2.9] for the details). Here, $a^{-1}S$ denotes $\{m\in M\mid am\in S\}$ .

First, we will prove that $B\subset\mathrm{Clopen}(M)$ (without the assumption of compactness). Let $S\in B$ and $s\in S$ . Since $se=s\in S$ for the unit $e$ , we have $e\in s^{-1}S$ , and hence $s\sim_{S}b\iff s^{-1}S=b^{-1}S\implies e\in b^{-1}S\iff b\in S$ . Thus $S$ contains the (open) equivalence class of each of its elements, so $S$ is open. $S$ is also closed, because $M\setminus S$ also belongs to $B$ .
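The algebraic step of this argument, namely that each class of $\sim_{S}$ is either contained in $S$ or disjoint from it, can be tested on a finite monoid. The sketch below uses the multiplicative monoid $\mathbb{Z}/6$ with the discrete topology, which is an assumption made for illustration; in that case every subset is clopen and every class is automatically open, so the code illustrates only the saturation property, not the topological content of the lemma.

```python
# Hypothetical finite example: the multiplicative monoid Z/6 with unit 1.
M = range(6)
E = 1

def mul(x, y):
    """Monoid multiplication in Z/6."""
    return (x * y) % 6

def left_quotient(a, S):
    """a^{-1}S = {m in M | a m in S}."""
    return frozenset(m for m in M if mul(a, m) in S)

def classes(S):
    """Equivalence classes of a ~_S b  iff  a^{-1}S = b^{-1}S."""
    by_quotient = {}
    for a in M:
        by_quotient.setdefault(left_quotient(a, S), set()).add(a)
    return list(by_quotient.values())

def is_saturated(S):
    """True if every class of ~_S lies inside S or is disjoint from S."""
    S = set(S)
    return all(c <= S or c.isdisjoint(S) for c in classes(S))
```

Calling `is_saturated` on any subset of `M` returns `True`, in line with the implication $s\sim_{S}b\implies b\in S$ for $s\in S$ .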

We will prove $B\supset\mathrm{Clopen}(M)$ (using the assumption of compactness). Let $S\subset M$ be an arbitrary clopen subset, and $a\in M$ be an arbitrary element. It is enough to construct an open neighborhood $a\in U\subset M$ such that $\forall b\in U,\;a\sim_{S}b$ . For each $m\in M$ , we can take open neighborhoods $a\in U_{m}\subset M$ and $m\in V_{m}\subset M$ such that

\forall a^{\prime}\in U_{m},\forall m^{\prime}\in V_{m},\;(a^{\prime}m^{\prime}\in S\iff am\in S),

since the multiplication map $M\times M\to M$ is continuous and $S$ is clopen. The compactness of $M$ allows us to pick a finite subcover $M=V_{m_{1}}\cup\dots\cup V_{m_{n}}$ . Take an arbitrary element $b$ of $U\coloneqq U_{m_{1}}\cap\dots\cap U_{m_{n}}$ . What we need to prove is that $a\sim_{S}b$ . Take an arbitrary element $m\in M$ . For $1\leq i\leq n$ with $m\in V_{m_{i}}$ , we have

bm\in S\iff am_{i}\in S\iff am\in S,

which proves that $a\sim_{S}b$ . ∎

In particular, if the topological monoid $M$ is profinite, then the corresponding internal Boolean algebra is its Stone dual (equipped with the canonical continuous $M$ action).

Definition 3.16.

Let $\widehat{{{\Sigma}^{\ast}}}$ denote the topological monoid of profinite words, i.e., the profinite completion of the monoid ${{\Sigma}^{\ast}}$ . Let $\mathbf{Cont}(\widehat{{{\Sigma}^{\ast}}})$ be its topos of continuous actions.

For a detailed explanation of the notion of profinite words and its relation to automata theory, see [Pin22]. See also [Ura17] for the description of the topos $\mathbf{Cont}(\widehat{{{\Sigma}^{\ast}}})$ .

Lemma 3.17.

The pointed topos $p\colon\mathbf{Set}\to\mathbf{Cont}(\widehat{{{\Sigma}^{\ast}}})$ is equivalent to $p\colon\mathbf{Set}\to{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$ as pointed topoi.

Proof.

Since $\Sigma\text{-}\mathbf{Set}$ is the topos of (continuous) actions of the discrete topological monoid ${{\Sigma}^{\ast}}$ , the canonical inclusion $\iota\colon{{\Sigma}^{\ast}}\rightarrowtail\widehat{{{\Sigma}^{\ast}}}$ induces a geometric morphism $g\colon\Sigma\text{-}\mathbf{Set}\to\mathbf{Cont}(\widehat{{{\Sigma}^{\ast}}})$ , where $g^{*}$ is given by the restriction of the action along $\iota$ .

By construction, $g^{*}$ is faithful. The denseness of $\iota\colon{{\Sigma}^{\ast}}\to\widehat{{{\Sigma}^{\ast}}}$ implies that $g^{*}$ is full, i.e., the geometric morphism $g\colon\Sigma\text{-}\mathbf{Set}\to\mathbf{Cont}(\widehat{{{\Sigma}^{\ast}}})$ is connected. The compactness of $\widehat{{{\Sigma}^{\ast}}}$ implies that each orbit of a continuous $\widehat{{{\Sigma}^{\ast}}}$ -action is finite, i.e., the geometric morphism $g$ factors through the hyperconnected geometric morphism $h\colon\Sigma\text{-}\mathbf{Set}\to{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}$ as $g=e\circ h$ for a geometric morphism $e\colon{\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}}\to\mathbf{Cont}(\widehat{{{\Sigma}^{\ast}}})$ .

Since $g$ and $h$ are connected, so is $e$ . The remaining task is to prove that $e^{*}$ is essentially surjective. Since $e^{*}$ is coreflective (and hence creates all colimits), it suffices to prove that the essential image of $e^{*}$ contains a generating set. By the definition of profinite completion, every finite quotient monoid ${{\Sigma}^{\ast}}\twoheadrightarrow M$ is a continuous quotient of $\widehat{{{\Sigma}^{\ast}}}$ , which implies that the canonical action $M\times\Sigma\to M$ belongs to the essential image of $e^{*}$ . The same argument as in the proof of proposition 3.11 completes the proof. ∎

Proposition 3.18.

The following two Boolean-ringed topoi are equivalent:

({\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}},\mathcal{R})\simeq(\mathbf{Cont}(\widehat{{{\Sigma}^{\ast}}}),\mathrm{Clopen}(\widehat{{{\Sigma}^{\ast}}})).

Proof.

This immediately follows from lemma 3.15 and lemma 3.17. ∎

As a summary of this section, we obtain the following theorem:

Theorem 3.19.

The following four Boolean-ringed topoi are all equivalent to the Boolean-ringed topos of regular languages $({\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}},\mathcal{R})$ .

Myhill-Nerode:: $({\Sigma\text{-}\mathbf{Set}}_{\mathrm{o.f.}},p_{\ast}(\{\top,\bot\}))$
DFA:: $(\mathbf{Sh}(\Sigma\text{-}\mathbf{FinSet},J),\mathrm{DFA})$
Finite Monoids and their subsets:: $(\mathbf{Sh}(\Sigma\text{-}\mathbf{FinMon},J),\mathcal{P})$
Profinite Words and their clopen subsets:: $(\mathbf{Cont}(\widehat{{{\Sigma}^{\ast}}}),\mathrm{Clopen}(\widehat{{{\Sigma}^{\ast}}}))$

Appendix A Preliminaries on hyperconnected geometric morphism

This appendix aims to recall the notion of hyperconnected geometric morphisms. See [Joh02], or [Joh81] for more details.

Definition A.1 (Hyperconnected geometric morphism).

A geometric morphism $f\colon\mathcal{E}\to\mathcal{F}$ is called hyperconnected if its inverse image functor $f^{\ast}$ is fully faithful (i.e., $f$ is connected) and satisfies one (hence both) of the following equivalent conditions:

•

the essential image of $f^{*}$ is closed under subquotients.
•

the counit $\epsilon\colon f^{*}f_{*}\Rightarrow\mathrm{id}_{\mathcal{E}}$ is monic.

Lemma A.2.

For a Grothendieck topos $\mathcal{E}$ , if a full subcategory $\iota\colon\mathcal{F}\hookrightarrow\mathcal{E}$ is closed under

•

small coproducts,
•

finite products, and
•

subquotients (subobjects and quotient objects),

then $\mathcal{F}$ is also a Grothendieck topos, and there is a hyperconnected geometric morphism $h\colon\mathcal{E}\to\mathcal{F}$ , whose inverse image functor $h^{*}\colon\mathcal{F}\to\mathcal{E}$ coincides with the embedding functor $\iota$ .

Proof.

Under the assumption, $\iota$ admits a right adjoint $R\colon\mathcal{E}\to\mathcal{F}$ , which sends an object $X\in\mathrm{ob}\mathcal{E}$ to the maximum subobject belonging to (the essential image of the embedding of) $\mathcal{F}$ .

Since $\iota$ preserves all finite limits, which are constructed from finite products and (regular) subobjects, this adjunction is lex coreflective and, in particular, lex comonadic. This proves that $\mathcal{F}$ is the category of coalgebras of the lex comonad $\iota\circ R$ , and hence that $\mathcal{F}$ is an elementary topos. Furthermore, since the essential image of $\iota$ is closed under subquotients, the adjunction $\iota\dashv R$ defines a hyperconnected geometric morphism $h\colon\mathcal{E}\to\mathcal{F}$ .

Since $\mathcal{E}$ is a Grothendieck topos, we can prove that $\mathcal{F}$ is also a Grothendieck topos (see [Rog21, Theorem 1.8.5] for a proof). This completes the proof. ∎

Conversely, for any hyperconnected geometric morphism $h\colon\mathcal{E}\to\mathcal{F}$ from a Grothendieck topos $\mathcal{E}$ , the essential image of $h^{*}$ satisfies the assumptions of lemma A.2. So every hyperconnected geometric morphism arises from the construction of lemma A.2.

References

[Adá74] Jiří Adámek “Free algebras and automata realizations in the language of categories” In Commentationes Mathematicae Universitatis Carolinae 15.4 Charles University in Prague, Faculty of Mathematics and Physics, 1974, pp. 589–602
[Boc+23] Guido Boccali et al. “The semibicategory of Moore automata” In arXiv:2305.00272, 2023
[Car23] Olivia Caramello “The unification of mathematics via topos theory” In Logic in Question: Talks from the Annual Sorbonne Logic Workshop (2011-2019), 2023, pp. 563–601 Springer
[CP20] Thomas Colcombet and Daniela Petrişan “Automata minimization: a functorial approach” In Logical Methods in Computer Science 16 Episciences.org, 2020
[GPA22] Alexandre Goy, Daniela Petrişan and Marc Aiguier “Powerset-like monads weakly distribute over themselves in toposes and compact Hausdorff spaces” In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021), 2022 Schloss Dagstuhl–Leibniz-Zentrum für Informatik
[Hor24] Ryuya Hora “Internal parameterization of hyperconnected quotients” In Theory and Applications of Categories 42.11, 2024, pp. 263–313
[HK24] Ryuya Hora and Yuhi Kamio “Quotient toposes of discrete dynamical systems” In Journal of Pure and Applied Algebra Elsevier, 2024, pp. 107657
[Iwa24] Victor Iwaniack “Automata in W-Toposes, and General Myhill-Nerode Theorems” In International Workshop on Coalgebraic Methods in Computer Science, 2024, pp. 93–113 Springer
[Jac17] Bart Jacobs “Introduction to coalgebra” Cambridge University Press, 2017
[Joh81] Peter T. Johnstone “Factorization theorems for geometric morphisms, I” In Cahiers de topologie et géométrie différentielle catégoriques 22.1, 1981, pp. 3–17
[Joh02] Peter T. Johnstone “Sketches of an Elephant: A Topos Theory Compendium, Volume 1” Oxford University Press, 2002
[KH24] Yuhi Kamio and Ryuya Hora “A solution to the first Lawvere’s problem A Grothendieck topos that has a proper class many quotient topoi” In arXiv:2407.17105, 2024
[Law04] F William Lawvere “Functorial concepts of complexity for finite automata” In Theory and Applications of Categories 13.10 Citeseer, 2004, pp. 164–168
[LS09] F. Lawvere and Stephen H. Schanuel “Conceptual mathematics: a first introduction to categories Second Edition” Cambridge University Press, 2009
[Pin22] Jean-Éric Pin “Mathematical foundations of automata theory”, 2022
[Rog19] Morgan Rogers “Toposes of Discrete Monoid Actions” In arXiv:1905.10277, 2019
[Rog21] Morgan Rogers “On Supercompactly and Compactly Generated Toposes” In Theory and Applications of Categories 37.32, 2021, pp. 1017–1079
[Rog23] Morgan Rogers “Toposes of topological monoid actions” In Compositionality: the open-access journal for the mathematics of composition 5 Episciences.org, 2023
[Rut19] Jan Rutten “The Method of Coalgebra: exercises in coinduction” Amsterdam: CWI, 2019
[Tom20] Ivan Tomasic “A topos-theoretic view of difference algebra” In arXiv:2001.09075, 2020
[Ura17] Takeo Uramoto “Semi-galois Categories I: The Classical Eilenberg Variety Theory” In arXiv:1512.04389, 2017
