
Topoi of automata I:
Four topoi of automata and regular languages

Ryuya Hora
Abstract.

Both topos theory and automata theory are known for their multi-faceted nature and relationship with topology, algebra, logic, and category theory. This paper aims to clarify the topos-theoretic aspects of automata theory, particularly demonstrating through two main theorems how regular (and non-regular) languages arise in topos-theoretic calculation. First, it is shown that the four different notions of automata form four types of Grothendieck topoi, illustrating how the technical details of automata theory are described by topos theory. Second, we observe that the four characterizations of regular languages (DFA, Myhill-Nerode theorem, finite monoids, profinite words) provide Morita-equivalent definitions of a single Boolean-ringed topos, situating this within the context of Olivia Caramello’s ‘Toposes as Bridges.’

This paper also serves as a preparation for follow-up papers, which deal with the relationship between hyperconnected geometric morphisms and algebraic/geometric aspects of formal language theory.

Key words and phrases:
Automaton, topos, regular language, coalgebra, finite monoid, profinite word, Myhill-Nerode theorem
2020 Mathematics Subject Classification:
18F10, 68Q70, 20M35, 18B20
Graduate School of Mathematical Sciences, University of Tokyo. hora@ms.u-tokyo

1. Introduction

This series of papers aims to propose a topos-theoretic framework for automata theory with the following future goals:

  • to unify aspects of automata theory in terms of topoi, and

  • to introduce geometric methods into automata theory.

The connection between category theory and automata theory has a long history. Numerous studies relate category theory to automata theory, including [Adá74, Jac17, Rut19, CP20, GPA22], and others relate topos theory to automata theory [Law04, Ura17, GPA22, Boc+23, Iwa24].

As far as the author knows, the novelty of this paper is to consider topoi consisting of automata (rather than automata in topoi, or topoi constructed from automata-theoretic gadgets). Our starting point is the following fact: the category of automata (defined as coalgebras $Q\to Q^{\Sigma}\times\{\top,\bot\}$) is a presheaf topos over the category of languages (see corollary 2.13). In this series of papers, we will provide various “Grothendieck topoi of automata”, which can be regarded as variants of this topos. Some of them are presheaf topoi, but some are not.

The structure of this first paper is as follows:

section 2:

Introducing four topoi of automata.

section 3:

Proving that four characterizations of regular languages provide four descriptions of a single Boolean-ringed topos $(\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}},\mathcal{R})$.

1. Introducing four topoi of automata.

In section 2, by introducing and calculating four topoi of automata, we will see that several automata-theoretic topics arise naturally in our approach of ‘topoi of automata.’ These include language recognition, the coalgebraic treatment, automata minimization, quotients of languages, and regular languages. (See table 1, table 2, and table 3, though some rows in the tables will be treated in the follow-up papers.)

| topos theory | automata theory |
| --- | --- |
| sheaf in $\Sigma\text{-}\mathbf{Set}$ | word action |
| point of $\Sigma\text{-}\mathbf{Set}$ | infinite word |
| The canonical point $p$ of $\Sigma\text{-}\mathbf{Set}$ | Run of Moore machine |
| The internal Boolean algebra $p_{\ast}\{\top,\bot\}$ | Boolean algebra of languages |
| Path action on $p_{\ast}\{\top,\bot\}$ | Quotient of language |
| Morphism to $p_{\ast}\{\top,\bot\}$ | automaton = $2x^{\Sigma}$-coalgebra |
| Image of the Yoneda map $\text{よ}(\ast)\to\mathcal{L}$ | minimal automata |
| hyperconnected quotient $\Sigma\text{-}\mathbf{Set}\to\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$ | Regular languages |
| Generated hyperconnected quotient | Syntactic monoid |

Table 1. Some correspondence on $\Sigma\text{-}\mathbf{Set}$

| topos theory | automata theory |
| --- | --- |
| sheaf in $\mathbf{Atmt}$ | automaton = $2x^{\Sigma}$-coalgebra |
| Structure map of an étale space | language recognition = coinduction |
| étale covering $\mathbf{Atmt}\twoheadrightarrow\Sigma\text{-}\mathbf{Set}$ | Forgetting the accept states |
| essential point of $\mathbf{Atmt}$ | a language |
| Open subtopos of $\mathbf{Atmt}$ | a quotient-stable language class |

Table 2. Some correspondence on $\mathbf{Atmt}$

| topos theory | automata theory |
| --- | --- |
| the canonical Boolean algebra | Boolean algebra of regular languages |
| The ringed site $(\Sigma\text{-}\mathbf{FinSet},J),\mathrm{DFA}$ | recognition by DFA |
| The ringed site $(\Sigma\text{-}\mathbf{FinMon},J),\mathcal{P}$ | recognition by finite monoids |
| The topological monoid action $\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}\simeq\mathbf{Cont}(\widehat{\Sigma^{\ast}})$ | profinite words description |

Table 3. Some correspondence on $\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$
2. Proving that four characterizations of regular languages provide four descriptions of a single Boolean-ringed topos $(\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}},\mathcal{R})$.

In section 3, we will deal with regular languages. Regular languages are a class of languages defined by certain finiteness properties and are known to have many characterizations (see fig. 1):

  • They are accepted by finite automata.

  • They are recognized by finite monoids.

  • They are (pullbacks of) clopen sets of profinite words.

  • Their corresponding Nerode congruence has only finitely many equivalence classes.

[Figure 1: diagram in which the four characterizations (DFA, finite monoids, Myhill-Nerode, profinite words) surround the ringed topos $(\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}},\mathcal{R})$, labelled respectively by the ringed site $(\Sigma\text{-}\mathbf{FinSet},\mathrm{DFA})$, the ringed site $(\Sigma\text{-}\mathbf{FinMon},\mathcal{P})$, a hyperconnected geometric morphism, and $\mathbf{Cont}(\widehat{\Sigma^{\ast}})$.]
Figure 1. Four characterizations of regular languages are Morita equivalent.

We show that these data are Morita equivalent, in the sense that we construct (a priori four) Boolean-ringed topoi from these four data and prove that they are equivalent. The author regards this as an example of Olivia Caramello’s slogan of ‘toposes as bridges’ (see [Car23]), at least in a broader sense. This unified perspective demonstrates that the diverse views on regular languages can be interpreted as a single multifaceted topos.

On the follow-up papers

The contents of the follow-up papers include how points of the topoi categorify infinite words and how the complete lattice of hyperconnected quotients generalizes classes of languages and corresponding syntactic monoids.

Acknowledgement

The author would like to thank his supervisor, Ryu Hasegawa, for his continuous and helpful advice. I would like to thank Takeo Uramoto for his advice and for suggesting a connection with the variety theorem, Yuhi Kamio for his enlightening explanation of algebraic language theory, Morgan Rogers for the discussion on automata as topological monoid actions, Victor Iwaniack for topos theoretic automata theory, and Ryoma Sin’ya for his fascinating introduction to the field of automata.

I would like to extend my gratitude to Keisuke Hoshino, Takao Yuyama, Yusuke Inoue, Isao Ishikawa, Yuzuki Haga, David Jaz Myers, Ivan Tomasic, Igor Bakovic, and Joshua Wrigley for their helpful and encouraging discussions.

This research is supported by FoPM, WINGS Program, the University of Tokyo.

Notation 1.1.

In this note, we fix a finite alphabet $\Sigma$. Let $\Sigma^{\ast}$ denote the set of all words, i.e., the free monoid over the set $\Sigma$.

2. Four topoi of automata

This section aims to introduce four topoi of automata

$\Sigma\text{-}\mathbf{Set}$:

The topos of word actions (section 2.1)

$\mathbf{Atmt}$:

The topos of (coalgebraic) automata (section 2.2)

$\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$:

The topos of orbit-finite word actions (section 2.3)

$\mathbf{Atmt}_{\mathrm{o.f.}}$:

The topos of orbit-finite automata (section 2.4)

and to provide a theoretical framework for the following sections and the follow-up papers.

2.1. $\Sigma\text{-}\mathbf{Set}$: The presheaf topos of word actions

This subsection focuses on the simplest topos in this paper, the topos of word actions $\Sigma\text{-}\mathbf{Set}$. Although this topos might seem too trivial to be interesting, we will observe that it has depth in its simplicity and naturally includes aspects of formal language theory.

2.1.1. Word actions form a topos

As mentioned in notation 1.1, the alphabet $\Sigma$ is fixed throughout the paper.

Definition 2.1.

We adopt the following definitions and terminologies.

  • A $\Sigma$-set is a (possibly infinite) set $Q$ equipped with a function $\delta\colon Q\times\Sigma\to Q$.

  • An element of $Q$ is called a state, and the associated function $\delta$ is called the transition function.

  • A morphism of $\Sigma$-sets $f\colon(Q_{1},\delta_{1})\to(Q_{2},\delta_{2})$ is a function $f\colon Q_{1}\to Q_{2}$ that commutes with the transition functions, i.e., the following diagram commutes:

    \[
    \begin{array}{ccc}
    Q_{1}\times\Sigma & \xrightarrow{\ \delta_{1}\ } & Q_{1}\\
    {\scriptstyle f\times\mathrm{id}_{\Sigma}}\downarrow & & \downarrow{\scriptstyle f}\\
    Q_{2}\times\Sigma & \xrightarrow{\ \delta_{2}\ } & Q_{2}
    \end{array}
    \]

  • The category of $\Sigma$-sets is denoted by $\Sigma\text{-}\mathbf{Set}$.

The following (seemingly boring) proposition is our starting point:

Proposition 2.2.

The category of $\Sigma$-sets is equivalent to the category of right $\Sigma^{\ast}$-actions:

\[\Sigma\text{-}\mathbf{Set}\simeq\mathbf{PSh}(\Sigma^{\ast}).\]

In particular, it is a presheaf topos (and hence, a Grothendieck topos).

Proof.

For a $\Sigma$-set $(Q,\delta)$, the right action of a word $w=a_{1}\dots a_{n}$ on $q\in Q$ is defined by

\[qw\coloneqq\delta(\dots\delta(\delta(q,a_{1}),a_{2})\dots,a_{n}).\]

This construction naturally induces an equivalence of categories $\Sigma\text{-}\mathbf{Set}\to\mathbf{PSh}(\Sigma^{\ast})$. ∎
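
As a minimal computational sketch of the word action used in this proof (an illustration only, not part of the formal development), the extension of a transition function to words is a left fold over the letters. The names `act` and `example_delta`, and the three-state example, are assumptions introduced here for illustration.

```python
from functools import reduce

def act(delta, q, word):
    """Right action q·w: apply the transition function letter by letter."""
    return reduce(delta, word, q)

# Hypothetical example: Q = {0, 1, 2}, Sigma = {"a", "b"}.
# The letter "a" advances the state cyclically; the letter "b" resets it to 0.
def example_delta(q, a):
    return (q + 1) % 3 if a == "a" else 0

# Acting by the concatenation uv agrees with acting by u and then by v.
q_after_u = act(example_delta, 1, "ab")
q_after_uv = act(example_delta, q_after_u, "a")
same = q_after_uv == act(example_delta, 1, "aba")
```

The empty word acts as the identity because the fold over an empty sequence returns the initial value, which matches the unit law of a monoid action.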

Remark 2.3 (Studies on the topos of word actions.).

This topos has been studied from several points of view. Of course, it is an example of a topos of (topological or discrete) monoid actions [Rog19, Rog23]. Even the case where $\Sigma$ is a singleton has been of interest [LS09, Tom20, HK24]. The author also studied this topos (where $\Sigma$ is infinite) from the viewpoint of the study of quotient topoi [KH24].

2.1.2. The canonical point and the internal Boolean algebra of languages

This sub-subsection explains how the notion of languages arises from the topos $\Sigma\text{-}\mathbf{Set}$. This also serves as a preparation for section 2.2.

Definition 2.4.

We adopt the following definitions:

  • A language is a subset of the free monoid $\Sigma^{\ast}$.

  • The set of languages $\mathcal{P}(\Sigma^{\ast})$ will be denoted by $\mathcal{L}$.

  • The action $\delta\colon\mathcal{L}\times\Sigma\to\mathcal{L}$ is defined by the left quotient

    \[\delta(L,a)\coloneqq\{v\in\Sigma^{\ast}\mid av\in L\},\]

    and $\delta(L,a)$ is denoted by $a^{-1}L$. (Footnote 1: In this paper, terminology and notation in automata theory are mostly borrowed from [Pin22].)
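
The left quotient can be sketched computationally by representing a language as a membership predicate on strings. This is only an illustrative encoding under that assumption; the names `left_quotient` and `ends_in_ab` are hypothetical.

```python
def left_quotient(L, a):
    """Return a^{-1}L = { v | a v in L }, where L is a predicate on strings."""
    return lambda v: L(a + v)

# Hypothetical example language: words over {a, b} ending in "ab".
def ends_in_ab(w):
    return w.endswith("ab")

# "b" lies in a^{-1}L because "ab" lies in L; the empty word does not.
a_quotient = left_quotient(ends_in_ab, "a")
```

Applying `left_quotient` first with `a` and then with `b` tests membership of `"ab" + v`, so iterating the construction yields the right action of words on languages.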

Recall that a point of a Grothendieck topos $\mathcal{E}$ is defined as a geometric morphism from the topos $\mathbf{Set}\simeq\mathbf{Sh}(1)$ to the topos $\mathcal{E}$. A pointed topos is a Grothendieck topos $\mathcal{E}$ equipped with a point $p\colon\mathbf{Set}\to\mathcal{E}$.

We will show that the notion of languages naturally arises from the topos $\Sigma\text{-}\mathbf{Set}$, using the next general lemma (see also [Rog23, Lemma 2.3]):

Lemma 2.5 (Canonical Boolean algebra in a pointed topos).

For a pointed topos $p\colon\mathbf{Set}\to\mathcal{E}$, the Boolean operations on $\{\top,\bot\}$ induce an internal Boolean algebra structure on the object $p_{\ast}\{\top,\bot\}$ in $\mathcal{E}$.

Proof.

Since the right adjoint functor $p_{\ast}$ preserves finite products, it preserves internal algebras. ∎

We call this internal Boolean algebra $p_{\ast}\{\top,\bot\}$ the canonical Boolean algebra of a pointed topos $p\colon\mathbf{Set}\to\mathcal{E}$. To apply this to our topos $\Sigma\text{-}\mathbf{Set}$, we introduce the notion of the canonical point. This terminology is due to [Rog19].

Definition 2.6.

The canonical point $p\colon\mathbf{Set}\to\Sigma\text{-}\mathbf{Set}$ of the topos $\Sigma\text{-}\mathbf{Set}$ is the geometric morphism

\[p\colon\mathbf{Set}\simeq\mathbf{PSh}(1)\to\mathbf{PSh}(\Sigma^{\ast})\simeq\Sigma\text{-}\mathbf{Set},\]

induced by the unique monoid homomorphism $1\to\Sigma^{\ast}$, where the inverse image functor $p^{\ast}$ is the forgetful functor.

The direct image part $p_{\ast}$ is calculated by the formula for pointwise right Kan extensions, and the result is as follows:

Lemma 2.7.

The direct image part $p_{\ast}\colon\mathbf{Set}\to\Sigma\text{-}\mathbf{Set}$ sends a set $X$ to the set of functions $p_{\ast}(X)=X^{\Sigma^{\ast}}$ equipped with the $\Sigma^{\ast}$-action $(\phi w)(v)=\phi(wv)$.

The following proposition provides a categorical description of the Boolean algebra of languages and its universality, which serves as a foundation of this paper. For example, as we will see in the next subsection, this universality implies the famous coinductive description of language recognition.

Proposition 2.8 (The canonical Boolean algebra consists of all languages).

The $\Sigma$-set of languages $\mathcal{L}$ is isomorphic to the canonical Boolean algebra $p_{\ast}(\{\top,\bot\})$ (lemma 2.5), i.e.,

\[\mathcal{L}\cong p_{\ast}(\{\top,\bot\})\text{ in }\Sigma\text{-}\mathbf{Set}.\]

Proof.

The isomorphism $\mathcal{L}\cong\{\top,\bot\}^{\Sigma^{\ast}}\cong p_{\ast}(\{\top,\bot\})$ follows from the above lemma. ∎

Remark 2.9 (The point $p^{\ast}\dashv p_{\ast}$ describes computation by Moore machines).

The adjunction $p^{\ast}\dashv p_{\ast}$ exhibits the behavior of Moore machines. More precisely, for a set $O$ (of outputs) and a $\Sigma$-set $(Q,\delta)$, the adjunction provides a one-to-one correspondence between

Output assignment to each state:

a function $g^{\sharp}\colon p^{\ast}(Q,\delta)=Q\to O$, and

(Curried) run:

a $\Sigma$-set morphism $g^{\flat}\colon(Q,\delta)\to O^{\Sigma^{\ast}}=p_{\ast}O$,

where $g^{\flat}$ sends $q\in Q$ to

\[g^{\flat}(q)\colon\Sigma^{\ast}\to O\colon w\mapsto g^{\sharp}(qw).\]

This is exactly the same as the computation by a Moore machine, since $g^{\sharp}(qw)$ is the output of the run with the initial state $q\in Q$ and the input word $w\in\Sigma^{\ast}$. Language recognition is the special case where $O=\{\top,\bot\}$ (corollary 2.15).
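
The transposition described in this remark can be sketched in a few lines, assuming states and outputs are ordinary Python values. The function `curried_run` and the parity machine below are hypothetical illustrations, not constructions taken from the paper.

```python
from functools import reduce

def curried_run(delta, output):
    """Transpose an output assignment Q -> O into q |-> (w |-> output(q·w))."""
    def g_flat(q):
        return lambda w: output(reduce(delta, w, q))
    return g_flat

# Hypothetical Moore machine: two states tracking the parity of the letter "a".
def parity_delta(q, a):
    return 1 - q if a == "a" else q

def parity_output(q):
    return q == 0  # output True exactly in the "even" state

run = curried_run(parity_delta, parity_output)
```

With the action $(\phi w)(v)=\phi(wv)$ of lemma 2.7, equivariance of the transposed map amounts to `run(parity_delta(q, a))(v) == run(q)(a + v)`, which can be checked on sample inputs.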

2.2. $\mathbf{Atmt}$: The presheaf topos of (coalgebraic) automata

This subsection aims to define a (presheaf) topos of automata, which rephrases the coalgebraic treatment of automata. We will observe that the category of (coalgebraically defined) automata is a presheaf topos (corollary 2.13), and that language recognition (= coinduction) is the structure map of a slice topos (corollary 2.15).

2.2.1. (Coalgebraic) automata form a topos

We define the category of automata as follows:

Definition 2.10.

We adopt the following definitions and terminologies:

  • An automaton is a $\Sigma$-set $(Q,\delta\colon Q\times\Sigma\to Q)$ equipped with a subset $F\subset Q$. (Footnote 2: Our definition of automata does not contain the notion of start states. However, start states will naturally appear in our formulation, for example, in corollary 2.15, in remark 2.16, and also in the follow-up paper.)

  • An element of $F$ is called an accept state.

  • A morphism of automata $f\colon(Q_{1},\delta_{1},F_{1})\to(Q_{2},\delta_{2},F_{2})$ is a $\Sigma$-set morphism $f\colon(Q_{1},\delta_{1})\to(Q_{2},\delta_{2})$ that preserves and reflects accept states (i.e., $F_{1}=f^{-1}(F_{2})$).

  • The category of automata is denoted by $\mathbf{Atmt}$.

Remark 2.11 (Automata as coalgebras).

In the category theory community, the above definition of automata has been considered in the context of coalgebras. More precisely, the category of automata $\mathbf{Atmt}$ is equivalent to the category of coalgebras of the endofunctor $2x^{\Sigma}\colon\mathbf{Set}\to\mathbf{Set}\colon X\mapsto\{\text{accept},\text{reject}\}\times X^{\Sigma}$:

\[\mathbf{Atmt}\simeq\mathbf{Coalg}_{2x^{\Sigma}}.\]

(For more details, see textbooks including [Jac17, Rut19].)
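
On objects, the passage between the two presentations can be sketched as follows, assuming a finite state set so that the accept states can be enumerated. The helper names `to_coalgebra` and `from_coalgebra` and the two-state example are hypothetical.

```python
def to_coalgebra(delta, accepting, alphabet):
    """Package (delta, F) as a single map Q -> {True, False} x Q^Sigma."""
    def structure(q):
        return (q in accepting, {a: delta(q, a) for a in alphabet})
    return structure

def from_coalgebra(structure, states):
    """Recover the transition function and the accept states from the map."""
    def delta(q, a):
        return structure(q)[1][a]
    accepting = {q for q in states if structure(q)[0]}
    return delta, accepting

# Hypothetical example: two states, the letter "a" toggles, state 0 accepts.
def toggle(q, a):
    return 1 - q if a == "a" else q

structure = to_coalgebra(toggle, {0}, "ab")
recovered_delta, recovered_accepting = from_coalgebra(structure, {0, 1})
```

The round trip returns the original data, which is the object-level content of the equivalence; the correspondence on morphisms is not modelled in this sketch.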

Theorem 2.12.

The category of automata $\mathbf{Atmt}$ is equivalent to the slice category $\Sigma\text{-}\mathbf{Set}/\mathcal{L}$:

\[\mathbf{Atmt}\simeq\Sigma\text{-}\mathbf{Set}/\mathcal{L}.\]

Proof.

Due to proposition 2.8 and the adjunction $p^{\ast}\dashv p_{\ast}$, a $\Sigma$-set morphism $(Q,\delta)\to\mathcal{L}$ corresponds to a function $\chi_{F}\colon Q\to\{\top,\bot\}$, which specifies the set of accept states $F\subset Q$. ∎

Geometrically speaking, the topos $\mathbf{Atmt}$ is an étale covering over the topos $\Sigma\text{-}\mathbf{Set}$.

Corollary 2.13.

The following four categories are mutually equivalent, and hence they are presheaf topoi (in particular, Grothendieck topoi).

  • $\mathbf{Atmt}$

  • $\mathbf{Coalg}_{2x^{\Sigma}}$ (remark 2.11)

  • $\Sigma\text{-}\mathbf{Set}/\mathcal{L}$

  • $\mathbf{PSh}(\int\mathcal{L})$, where the category of languages $\int\mathcal{L}$ is defined to be the category of elements of $\mathcal{L}\in\mathrm{ob}(\mathbf{PSh}(\Sigma^{\ast}))$.

Proof.

We have observed the equivalence between the first three categories. The equivalence with the last one, $\mathbf{PSh}(\int\mathcal{L})$, follows from the general fact that a slice of a presheaf topos is again a presheaf topos: $\mathbf{PSh}(\mathcal{C})/P\simeq\mathbf{PSh}(\int P)$. ∎

2.2.2. Language recognition

By abuse of notation, we will refer to the automaton of languages defined below and the $\Sigma$-set of languages by the same symbol $\mathcal{L}$.

Definition 2.14.

The automaton of languages, which is also denoted by $\mathcal{L}$, is the $\Sigma$-set of languages $\mathcal{L}$ equipped with the set of accept states $F\subset\mathcal{L}$ defined by

\[F\coloneqq\{L\in\mathcal{L}\mid\varepsilon\in L\},\]

where $\varepsilon$ denotes the empty word.

Theorem 2.12 immediately implies (and provides a new perspective on) the following famous theorem in the coalgebraic theory of automata.

Corollary 2.15 (Categorical description of language recognition).

The automaton of languages $\mathcal{L}$ is the terminal object of $\mathbf{Atmt}$. Furthermore, for an automaton $A=(Q,\delta,F)$, the unique morphism $A\to\mathcal{L}$ in $\mathbf{Atmt}$ sends a state $q\in Q$ to the language $(\in\mathcal{L})$ that the automaton $(Q,\delta,F)$ equipped with the starting state $q$ recognizes.

Proof.

This is usually proven by the theory of final coalgebras (see, for example, [Jac17]), but we will derive it from theorem 2.12. Since the terminal object of a slice category $\mathcal{C}/c$ is the identity map $\mathrm{id}_{c}\colon c\to c$ in general, the terminal object of $\mathbf{Atmt}\simeq\Sigma\text{-}\mathbf{Set}/\mathcal{L}$ is the identity map $\mathrm{id}_{\mathcal{L}}\colon\mathcal{L}\to\mathcal{L}$, which corresponds to the automaton of languages. The latter statement follows from the calculation of the adjunction $p^{\ast}\dashv p_{\ast}$ (remark 2.9). ∎

Remark 2.16 (Yoneda point of view on initial state, recognition, and the minimal automaton).

Our topos-theoretic framework also describes automata minimization, in a way that resembles the functorial approach [CP20]. Let $\text{よ}(\ast)$ denote the free $\Sigma$-set, whose underlying set is $\Sigma^{\ast}$ and whose action is given by the concatenation of words. It is the unique representable presheaf in $\Sigma\text{-}\mathbf{Set}$. By the Yoneda lemma, a diagram

\[\text{よ}(\ast)\xrightarrow{\ \lceil q_{0}\rceil\ }(Q,\delta)\xrightarrow{\ \chi_{F}\ }\mathcal{L}\]

in $\Sigma\text{-}\mathbf{Set}$ corresponds to the data of an automaton $(Q,\delta,F)$ equipped with an initial state $q_{0}\in Q$. Their composite

\[\text{よ}(\ast)\to\mathcal{L}\]

corresponds to the recognized language $L\in\mathcal{L}$ by the Yoneda lemma.

Conversely, for any language $L\in\mathcal{L}$, there is the corresponding morphism

\[\text{よ}(\ast)\xrightarrow{\ \lceil L\rceil\ }\mathcal{L}\]

by the Yoneda lemma. Since $\Sigma\text{-}\mathbf{Set}$ is a topos, there is an epi-mono factorization of $\lceil L\rceil$, which provides a canonically constructed new automaton (equipped with an initial state):

\[\text{よ}(\ast)\twoheadrightarrow\mathcal{A}(L)\rightarrowtail\mathcal{L},\]

and this new automaton $\mathcal{A}(L)$ coincides with the so-called minimal automaton (or Nerode automaton; for details, see [Pin22, 4.6 Minimal automata]) of the language $L$.
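
For a language presented by a finite automaton with an initial state, the image factorization above can be approximated computationally in two steps: restrict to the states reachable from the initial state, then identify states that recognize the same language. The sketch below uses Moore-style partition refinement for the second step; this is a standard minimization technique substituted here for illustration, not the author's construction, and the name `minimal_automaton` and the example are hypothetical.

```python
def minimal_automaton(alphabet, delta, accepting, q0):
    """Return (number of states, transitions, accept classes, start class)."""
    # Step 1: states reachable from q0 (the image of the map picking q0).
    reachable, frontier = {q0}, [q0]
    while frontier:
        q = frontier.pop()
        for a in alphabet:
            r = delta(q, a)
            if r not in reachable:
                reachable.add(r)
                frontier.append(r)
    # Step 2: merge states with the same language by refining the partition
    # {accepting, non-accepting} until the number of classes stops growing.
    block = {q: int(q in accepting) for q in reachable}
    while True:
        ids, refined = {}, {}
        for q in reachable:
            sig = (block[q], tuple(block[delta(q, a)] for a in alphabet))
            refined[q] = ids.setdefault(sig, len(ids))
        stable = len(ids) == len(set(block.values()))
        block = refined
        if stable:
            break
    trans = {(block[q], a): block[delta(q, a)] for q in reachable for a in alphabet}
    final = {block[q] for q in reachable if q in accepting}
    return len(set(block.values())), trans, final, block[q0]

# Hypothetical example: four states counting the letter "a" modulo 4, accepting
# on even counts; the recognized language only depends on the count modulo 2.
def count_mod_4(q, a):
    return (q + 1) % 4 if a == "a" else q

size, trans, final, start = minimal_automaton("ab", count_mod_4, {0, 2}, 0)
```

In this example the four states collapse to two classes, matching the two left quotients of the language of words with an even number of occurrences of `a`.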

2.3. $\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$: The Grothendieck topos of orbit-finite word actions

So far, both $\Sigma\text{-}\mathbf{Set}$ and $\mathbf{Atmt}$ are presheaf topoi, and we have not imposed any finiteness assumption. However, needless to say, finiteness is crucial for the theory of regular languages. For example, a regular language is defined as a language recognizable by a finite automaton. So, our next question is: how can we deal with finiteness in our framework? To answer this question, we will define a Grothendieck topos, which is not a presheaf topos.

This subsection focuses on the category of orbit-finite $\Sigma$-sets, which turns out to be a Grothendieck topos (proposition 2.19). The content of the present subsection will be generalized to a broader context in the follow-up paper.

2.3.1. Orbit-finite word actions form a topos

We introduce the notion of an orbit-finite $\Sigma$-set and show that these form a (pointed) Grothendieck topos. Then, we demonstrate that this topos characterizes the class of regular languages in complete parallel with section 2.1.

Definition 2.17.

We define the notion of local finiteness as follows:

  • For a $\Sigma$-set $(Q,\delta)$ and its state $q\in Q$, the orbit of $q$ is the set $\{qw\mid w\in\Sigma^{\ast}\}\subset Q$.

  • A $\Sigma$-set $(Q,\delta)$ is orbit-finite if, for any $q\in Q$, its orbit is a finite set.

  • The category of orbit-finite $\Sigma$-sets, which is a full subcategory of $\Sigma\text{-}\mathbf{Set}$, is denoted by $\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$.
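
The orbit of a state can be explored by a graph search over the transition function. The sketch below is only illustrative: since finiteness of an orbit is not decidable by search in general, it gives up after a hypothetical `limit` of states, and the two example actions on the natural numbers are assumptions chosen for illustration.

```python
def orbit(delta, alphabet, q, limit=10_000):
    """Return the orbit {q·w} as a set, or None once `limit` states are exceeded."""
    seen, frontier = {q}, [q]
    while frontier:
        s = frontier.pop()
        for a in alphabet:
            t = delta(s, a)
            if t not in seen:
                if len(seen) >= limit:
                    return None  # search abandoned; the orbit may be infinite
                seen.add(t)
                frontier.append(t)
    return seen

# Hypothetical infinite Sigma-set on the natural numbers whose orbits are finite.
def halve(n, a):
    return n // 2

# Hypothetical Sigma-set on the natural numbers whose orbits are infinite.
def successor(n, a):
    return n + 1

finite_orbit = orbit(halve, "a", 5)
abandoned = orbit(successor, "a", 0, limit=100)
```

The first example shows why orbit-finiteness is weaker than finiteness: the underlying set is infinite, yet every state reaches only finitely many states.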

Remark 2.18 (Why orbit-finite, not finite?).

Regular languages are defined as languages that are recognized by finite automata. Then why do we consider orbit-finite automata instead of finite automata? There are many reasons, but they can be broadly divided into two:

  (1) It is equivalent to say that regular languages are those accepted by orbit-finite automata.

  (2) Orbit-finite automata are closed under more constructions than finite automata.

The first reason indicates that we do not necessarily need to stick to finite automata. The latter ensures good categorical properties, such as the existence of small colimits, which specifically appear as follows:

  • $\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$ becomes a Grothendieck topos (proposition 2.19),

  • the collection of all regular languages forms an orbit-finite (but not finite) automaton (proposition 2.24).

The category of finite $\Sigma$-sets will be utilized as a site for the topos (proposition 3.8). See also remark 3.6 and [Ura17].

The goal of this sub-subsection is to prove the following proposition. For the notion of hyperconnected geometric morphisms, see appendix A, [Joh02], or [Joh81].

Proposition 2.19.

The category $\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$ is a Grothendieck topos, and there is a hyperconnected geometric morphism

\[h\colon\Sigma\text{-}\mathbf{Set}\to\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}\]

whose inverse image part $h^{\ast}\colon\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}\to\Sigma\text{-}\mathbf{Set}$ is the canonical embedding functor.

Proof.

Due to lemma A.2, it is enough to prove that the full subcategory $\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}\to\Sigma\text{-}\mathbf{Set}$ is closed under taking small coproducts, finite products, and subquotients (subobjects and quotient objects). This immediately follows from a concrete calculation (and the fact that finite limits, subobjects, and quotient objects are preserved by the forgetful functor $\Sigma\text{-}\mathbf{Set}\to\mathbf{Set}$). ∎

We have proven the above proposition by abstract nonsense, but we also provide a concrete description of the right adjoint for later reference.

Lemma 2.20 (Construction of hh_{\ast}).

We can construct the right adjoint $h_{\ast}\colon\Sigma\text{-}\mathbf{Set}\to\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$ as follows:

  • For a $\Sigma$-set $(Q,\delta)$ and its subset

    \[Q_{\text{fin}}\coloneqq\{q\in Q\mid\text{the orbit of }q\text{ is finite}\},\]

    $(Q_{\text{fin}},\delta)$ is the maximum orbit-finite sub-$\Sigma$-set.

  • The above construction $(Q,\delta)\mapsto(Q_{\text{fin}},\delta)$ defines the right adjoint to the full embedding functor $h^{\ast}\colon\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}\to\Sigma\text{-}\mathbf{Set}$.

Proof.

The former part is easy to prove: notice that $Q_{\text{fin}}$ is closed under the $\Sigma^{\ast}$-action. The latter part follows from the former one and lemma A.2. ∎

We obtained the concrete description $h_{\ast}(Q,\delta)=(Q_{\text{fin}},\delta)$ from lemma 2.20. The monic counit (see definition A.1) is the inclusion $\epsilon_{(Q,\delta)}\colon(Q_{\text{fin}},\delta)\rightarrowtail(Q,\delta)$.

2.3.2. The canonical point and the internal Boolean algebra of regular languages

We will observe the canonical point and the internal Boolean algebra of $\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$ in parallel with section 2.1.2.

Definition 2.21.

The canonical point $p_{\mathrm{o.f.}}$ of $\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$ is the composite

\[p_{\mathrm{o.f.}}\colon\mathbf{Set}\xrightarrow{\ p\ }\Sigma\text{-}\mathbf{Set}\xrightarrow{\ h\ }\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}.\]

(Footnote 3: This terminology is also a special case of [Rog23]. Furthermore, the definition of this point as a composite of an essential surjective point followed by a hyperconnected geometric morphism is nothing other than the characterization of toposes of topological monoid actions ([Rog23, Theorem 3.20]). We will come back to this observation briefly in section 3.4 and extensively in the follow-up paper.)

Since $\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$ is a pointed topos, there is the associated internal Boolean algebra ${p_{\mathrm{o.f.}}}_{\ast}(\{\top,\bot\})$ (by lemma 2.5). We will prove that it is the Boolean algebra of regular languages. To prove it, we need the Myhill-Nerode theorem (see [Pin22]) in our terminology. (Footnote 4: The Myhill-Nerode theorem in terms of Nerode congruences will appear in the follow-up paper, utilizing the theory of a local state classifier [Hor24].)

As in remark 2.16, let $\text{よ}(\ast)\cong\Sigma^{\ast}$ denote the free $\Sigma$-set, and let $\lceil L\rceil\colon\text{よ}(\ast)\to\mathcal{L}$ denote the morphism corresponding to a language $L\in\mathcal{L}$ by the Yoneda lemma. Using this, the Myhill-Nerode theorem is stated as follows:

Lemma 2.22 (Myhill-Nerode theorem).

For a language $L\in\mathcal{L}$, the following are equivalent:

  (1) $L$ is regular (i.e., recognized by a finite automaton).

  (2) The Yoneda-corresponding morphism $\lceil L\rceil\colon\text{よ}(\ast)\to\mathcal{L}$ factors through a finite $\Sigma$-set. (Footnote 5: A $\Sigma$-set is said to be finite if its underlying set is finite (definition 3.4).)

  (3) The image of $\lceil L\rceil\colon\text{よ}(\ast)\to\mathcal{L}$ is finite.

  (4) The orbit of $L\in\mathcal{L}$, which is $\{w^{-1}L\mid w\in\Sigma^{\ast}\}\subset\mathcal{L}$, is finite.

Proof.

The first two conditions are just paraphrases of each other, due to remark 2.16. The last two conditions are more apparently equivalent, since $\{w^{-1}L\mid w\in\Sigma^{\ast}\}\subset\mathcal{L}$ is the image of $\lceil L\rceil$. The equivalence between the second and third conditions follows from the image factorization in $\Sigma\text{-}\mathbf{Set}$. ∎
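
Condition (4) can be probed on examples by representing languages as membership predicates. The sketch below exhibits witnesses showing that the left quotients $(a^{i})^{-1}L$ of the standard non-regular language $L=\{a^{n}b^{n}\}$ are pairwise distinct, so its orbit is infinite. The helper names and the sampling of words up to length 4 are assumptions made for illustration; the sampled comparison for the second language is evidence on test words, not a proof.

```python
from itertools import product

def quotient_by_word(L, w):
    """Return w^{-1}L = { v | w v in L } for a language given as a predicate."""
    return lambda v: L(w + v)

def anbn(w):
    """Membership in { a^n b^n | n >= 0 }."""
    n = len(w) // 2
    return len(w) % 2 == 0 and w == "a" * n + "b" * n

def even_as(w):
    """Membership in the regular language of words with an even number of a's."""
    return w.count("a") % 2 == 0

# The word b^i lies in (a^i)^{-1}L but in no (a^j)^{-1}L with j != i.
separated = all(
    quotient_by_word(anbn, "a" * j)("b" * i) == (i == j)
    for i in range(6) for j in range(6)
)

# On all sampled words, (aa)^{-1}L agrees with L for the regular example.
samples = ["".join(p) for n in range(5) for p in product("ab", repeat=n)]
agrees = all(quotient_by_word(even_as, "aa")(v) == even_as(v) for v in samples)
```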

Definition 2.23 (Orbit-finite $\Sigma$-set of regular languages).

The orbit-finite $\Sigma$-set of regular languages is denoted by $\mathcal{R}$.

Proposition 2.24 (The canonical Boolean algebra consists of regular languages).

The canonical internal Boolean algebra (described in lemma 2.5) of the pointed topos $p_{\mathrm{o.f.}}\colon\mathbf{Set}\to\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$ is isomorphic to the orbit-finite $\Sigma$-set of regular languages:

\[\mathcal{R}\cong{p_{\mathrm{o.f.}}}_{\ast}(\{\top,\bot\})\text{ in }\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}.\]

Proof.

We have the following isomorphisms

\[{p_{\mathrm{o.f.}}}_{\ast}(\{\top,\bot\})\cong h_{\ast}(p_{\ast}(\{\top,\bot\}))\cong h_{\ast}(\mathcal{L}),\]

by proposition 2.8. Furthermore, lemma 2.20 and lemma 2.22 imply that $h_{\ast}(\mathcal{L})\cong\mathcal{R}$, which completes the proof. ∎

2.4. $\mathbf{Atmt}_{\mathrm{o.f.}}$: The Grothendieck topos of orbit-finite automata

Definition 2.25.

We adopt the following definitions and terminologies:

  • An automaton $(Q,\delta,F)$ is orbit-finite if its underlying $\Sigma$-set $(Q,\delta)$ is orbit-finite.

  • The category of orbit-finite automata is denoted by $\mathbf{Atmt}_{\mathrm{o.f.}}$.

In parallel with theorem 2.12, we obtain the following “slice description” of $\mathbf{Atmt}_{\mathrm{o.f.}}$.

Proposition 2.26.

The category of orbit-finite automata is equivalent to the slice category

\[\mathbf{Atmt}_{\mathrm{o.f.}}\simeq\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}/\mathcal{R}.\]

Proof.

The same proof as for theorem 2.12 applies. ∎

Then we obtain a variant of corollary 2.15 for regular languages.

Corollary 2.27.

The category 𝐀𝐭𝐦𝐭o.f.\mathbf{Atmt}_{\mathrm{o.f.}} is a Grothendieck topos.

Proof.

Every slice category of a Grothendieck topos is again a Grothendieck topos. ∎

Proposition 2.28.

We can observe the recognition of regular languages in the Grothendieck topos $\mathbf{Atmt}_{\mathrm{o.f.}}$:

  • The terminal object of $\mathbf{Atmt}_{\mathrm{o.f.}}$ is the orbit-finite automaton of regular languages $\mathcal{R}$.

  • Furthermore, for an orbit-finite automaton $(Q,\delta,F)$, the unique map $!\colon(Q,\delta,F)\to\mathcal{R}$ sends a state $q\in Q$ to the regular language recognized by $(Q,\delta,q,F)$, that is, by the automaton with start state $q$.
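The unique map into $\mathcal{R}$ can be made concrete for a finite automaton. The following Python sketch is only an illustration under stated assumptions: the names `run`, `language_classes`, and the 4-cycle example are hypothetical and not taken from the paper. It labels states by Moore-style partition refinement, so that two states receive the same label exactly when they are sent to the same regular language.

```python
def run(delta, state, word):
    """Act on `state` by `word`, one letter at a time."""
    for letter in word:
        state = delta[(state, letter)]
    return state


def language_classes(states, sigma, delta, accepting):
    """Label states so that two states share a label iff they recognize the same language."""
    # Initial split: does the state accept the empty word?
    label = {q: q in accepting for q in states}
    while True:
        # Refine each class by the classes of the one-letter successors.
        signature = {
            q: (label[q], tuple(label[delta[(q, a)]] for a in sigma))
            for q in states
        }
        ids = {}
        refined = {q: ids.setdefault(signature[q], len(ids)) for q in states}
        if len(set(refined.values())) == len(set(label.values())):
            return refined  # no class was split, so the partition is stable
        label = refined


# Hypothetical example: a 4-cycle over the one-letter alphabet {a}.
STATES = (0, 1, 2, 3)
SIGMA = ("a",)
DELTA = {(q, "a"): (q + 1) % 4 for q in STATES}
ACCEPTING = {0, 2}

CLASSES = language_classes(STATES, SIGMA, DELTA, ACCEPTING)
```

In this example the states 0 and 2 recognize the same language (words of even length), so the map into $\mathcal{R}$ identifies them; the labels stand in for elements of $\mathcal{R}$.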

In the follow-up paper, the content of this subsection will be generalized to other hyperconnected geometric morphisms from $\Sigma\text{-}\mathbf{Set}$, so that we can consider other classes of (possibly non-regular) languages.

3. Four Morita equivalent definitions of the Boolean-ringed topos of regular languages $(\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}},\mathcal{R})$

In this section, we will observe that the following four characterizations of regular languages provide four different “Morita equivalent” definitions of the single Boolean-ringed topos $(\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}},\mathcal{R})$.

A language $L$ is regular if and only if

section 3.1 Myhill-Nerode theorem:

its orbit $\{w^{-1}L\mid w\in\Sigma^{\ast}\}$ is finite.

section 3.2 DFA:

it is recognized by a deterministic finite automaton.

section 3.3 Finite monoids:

there is a monoid homomorphism $f\colon\Sigma^{\ast}\to M$ to a finite monoid $M$ and a subset $S\in\mathcal{P}(M)$ such that $L=f^{-1}(S)$.

section 3.4 Profinite words:

there is a clopen subset $S\subset\widehat{\Sigma^{\ast}}$ such that $L$ is the inverse image of $S$ along the canonical embedding $\Sigma^{\ast}\to\widehat{\Sigma^{\ast}}$.
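As a small sanity check of the first and third characterizations, the following Python sketch treats one hypothetical example language, the words over $\{a,b\}$ with an even number of $a$'s. All names are illustrative assumptions. The orbit $\{w^{-1}L\}$ is only approximated by testing finitely many short continuations, so this is evidence rather than a proof.

```python
from itertools import product

SIGMA = "ab"


def words(max_len):
    """All words over SIGMA of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(SIGMA, repeat=n):
            yield "".join(letters)


def in_language(word):
    """Membership in L = words with an even number of a's."""
    return word.count("a") % 2 == 0


def monoid_image(word):
    """The homomorphism f: Sigma^* -> Z/2 with f(a) = 1 and f(b) = 0."""
    value = 0
    for letter in word:
        value = (value + int(letter == "a")) % 2
    return value


S = {0}  # the subset of Z/2 with L = f^{-1}(S)


def quotient_signature(prefix, probe_len=3):
    """Finite approximation of w^{-1}L: membership of prefix + u for short u."""
    return tuple(in_language(prefix + u) for u in words(probe_len))


# Distinct approximated left quotients among all prefixes of length <= 4.
SIGNATURES = {quotient_signature(w) for w in words(4)}
# Agreement of L with f^{-1}(S) on all words of length <= 6.
MONOID_AGREES = all(in_language(w) == (monoid_image(w) in S) for w in words(6))
```

Here the approximated orbit has two elements, matching the two-element monoid $\mathbb{Z}/2$ that recognizes the language.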

Put simply, what we aim to do in this section is to categorify the “equivalence between these conditions” and lift it to “isomorphism between structures.”

3.1. Description by the canonical point = Myhill-Nerode theorem

We adopt the following terminology:

Definition 3.1.

A Boolean-ringed topos\footnote{We prefer the word ‘Boolean ring’ simply because it is more conventional to say ‘ringed topos’ than ‘algebra-ed topos.’} is a Grothendieck topos equipped with an internal Boolean algebra (as a “structure sheaf”).

For example, every pointed topos is canonically a Boolean-ringed topos by lemma 2.5.

Definition 3.2.

The Boolean-ringed topos of regular languages $(\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}},\mathcal{R})$ is the (pointed) topos $\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$ (definitions 2.17 and 2.19) equipped with the canonical internal Boolean algebra (structure sheaf) $\mathcal{R}$ (proposition 2.24).

As we have seen in proposition 2.24, the Boolean-ringed topos $(\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}},\mathcal{R})$ is the one induced by the canonical point

\[p_{\mathrm{o.f.}}\colon\mathbf{Set}\to\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}.\]

This is essentially equivalent to the Myhill-Nerode theorem (lemma 2.22).

Remark 3.3 (Connection with the Nerode-congruence.).

Usually, the Myhill-Nerode theorem is stated in terms of the (Nerode-)congruence. The follow-up paper will make the connection with the Nerode-congruence more explicit. We will observe that the local state classifier $\Xi$ (defined in [Hor24]) consists of right congruences of $\Sigma^{\ast}$ and that the canonical morphism $\xi_{\mathcal{L}}\colon\mathcal{L}\to\Xi$ sends a language $L$ to its Nerode-congruence ${\sim}_{L}$. The Myhill-Nerode theorem will be paraphrased as a pullback diagram along the morphism $\xi_{\mathcal{L}}$.

3.2. Description by DFA

Definition 3.4.

We adopt the following terminologies:

  • A $\Sigma$-set $(Q,\delta)$ is finite if the set of states $Q$ is a finite set.

  • The category of finite $\Sigma$-sets is denoted by $\Sigma\text{-}\mathbf{FinSet}$.

  • Let $J$ be the Grothendieck topology on $\Sigma\text{-}\mathbf{FinSet}$ generated by jointly surjective families.

The next lemma is also mentioned in [Ura17].

Lemma 3.5 (Equivalence between the underlying topoi).

\[\mathbf{Sh}(\Sigma\text{-}\mathbf{FinSet},J)\simeq\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}\]

Proof.

Since $\Sigma\text{-}\mathbf{FinSet}$ is a full subcategory of $\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$, due to Giraud’s theorem, it is enough to show that finite $\Sigma$-sets form a generating set of $\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$. This follows easily from the definition of orbit-finiteness. ∎

Remark 3.6.

Unless $\Sigma$ is empty, the category $\Sigma\text{-}\mathbf{FinSet}$ is not an elementary topos and, a fortiori, not a Grothendieck topos. However, it belongs to a well-behaved class of categories, the semi-Galois categories [Ura17].

We define the $J$-sheaf $\mathrm{DFA}\colon\Sigma\text{-}\mathbf{FinSet}^{\mathrm{op}}\to\mathbf{Set}$ of deterministic finite automata (DFA). The set $\mathrm{DFA}(Q,\delta)$ is intended to be the set of all DFA structures over $(Q,\delta)$, i.e. $\mathrm{DFA}(Q,\delta)\coloneqq\{(Q,\delta,F)\mid F\subset Q\}$. This can be simplified as follows.

Definition 3.7.

The sheaf of DFA is the $J$-sheaf of Boolean algebras

\[\mathrm{DFA}\colon\Sigma\text{-}\mathbf{FinSet}^{\mathrm{op}}\to\mathbf{BoolAlg}\colon(Q,\delta)\mapsto\mathcal{P}(Q).\]

The proof of the following proposition verifies that $\mathrm{DFA}$ is indeed a $J$-sheaf of Boolean algebras.

Proposition 3.8.

The two Boolean-ringed topoi are equivalent:

\[(\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}},\mathcal{R})\simeq(\mathbf{Sh}(\Sigma\text{-}\mathbf{FinSet},J),\mathrm{DFA}).\]
Proof.

The equivalence of topoi is due to lemma 3.5. Since we have proven that $\mathcal{R}$ is an internal Boolean algebra in $\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$, and the equivalence $N\colon\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}\xrightarrow{\simeq}\mathbf{Sh}(\Sigma\text{-}\mathbf{FinSet},J)$ is given by

\[X\mapsto\left(N(X)\colon(Q,\delta)\mapsto\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}((Q,\delta),X)\right),\]

the corresponding internal Boolean algebra is given by $N(\mathcal{R})$:

\[N(\mathcal{R})\colon(Q,\delta)\mapsto\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}((Q,\delta),\mathcal{R}).\]

Then, proposition 2.24 implies

\[N(\mathcal{R})(Q,\delta)\cong\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}((Q,\delta),\mathcal{R})\cong\mathbf{Set}(Q,\{\top,\bot\})\cong\mathrm{DFA}(Q,\delta),\]

which completes the proof. ∎

Notice that this equivalence of two Boolean-ringed topoi actually captures the notion of language recognition by DFA, since the correspondence $\mathrm{DFA}(Q,\delta)\cong\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}((Q,\delta),\mathcal{R})$ is nothing but language recognition (remark 2.9 and proposition 2.28).
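The correspondence between accepting sets $F\subset Q$ and $\Sigma$-set maps $(Q,\delta)\to\mathcal{R}$ can be checked on a small example. The Python sketch below is an illustration under assumptions: the three-state $\Sigma$-set `DELTA` and all function names are hypothetical, and languages are truncated to words of bounded length, so equivariance is verified only up to that bound.

```python
from itertools import combinations, product

Q = (0, 1, 2)
SIGMA = "ab"
# Hypothetical transition function of a finite Sigma-set on Q.
DELTA = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 0,
         (0, "b"): 0, (1, "b"): 0, (2, "b"): 2}


def words(max_len):
    """All words over SIGMA of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(SIGMA, repeat=n):
            yield "".join(letters)


def run(state, word):
    """Act on `state` by `word`, one letter at a time."""
    for letter in word:
        state = DELTA[(state, letter)]
    return state


def language(state, accepting, max_len):
    """Truncation of the language recognized from `state` with accepting set `accepting`."""
    return frozenset(w for w in words(max_len) if run(state, w) in accepting)


def subsets(xs):
    """All subsets of xs, i.e. all DFA structures on the Sigma-set."""
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def is_equivariant(accepting, max_len=4):
    """Check that q |-> L_q commutes with the actions: L_{delta(q,a)} = a^{-1} L_q."""
    for q in Q:
        for a in SIGMA:
            derivative = frozenset(
                w[1:] for w in language(q, accepting, max_len) if w.startswith(a)
            )
            if derivative != language(DELTA[(q, a)], accepting, max_len - 1):
                return False
    return True


def recover_accepting(accepting, max_len=4):
    """Recover F from the map q |-> L_q as the states whose language contains the empty word."""
    return frozenset(q for q in Q if "" in language(q, accepting, max_len))
```

Each of the eight subsets of `Q` yields an equivariant assignment of languages to states, and the accepting set is recovered from that assignment, which mirrors the bijection in the proof of proposition 3.8.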

3.3. Description by finite monoids

This subsection aims to reconstruct $\mathcal{R}$ in terms of recognizability by finite monoids.

Definition 3.9.

A finite $\Sigma$-monoid is a pair of a finite monoid $M$ and a $\Sigma$-indexed family of elements $\{m_{a}\}_{a\in\Sigma}$.

Definition 3.10.

We define the category $\Sigma\text{-}\mathbf{FinMon}$ as follows:

  • an object is a finite $\Sigma$-monoid $(M,\{m_{a}\}_{a\in\Sigma})$,

  • a morphism $(M,\{m_{a}\}_{a\in\Sigma})\to(M^{\prime},\{m^{\prime}_{a}\}_{a\in\Sigma})$ is a function $f\colon M\to M^{\prime}$ such that $f(xm_{a})=f(x)m^{\prime}_{a}$ for any $a\in\Sigma$ and $x\in M$.

Notice that $\Sigma\text{-}\mathbf{FinMon}$ is a full subcategory of $\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$ (not of $\Sigma^{\ast}/\mathbf{Monoids}$), since a finite $\Sigma$-monoid $(M,\{m_{a}\}_{a\in\Sigma})$ can be regarded as a finite $\Sigma$-set $(M,\overline{\delta})$ with $\overline{\delta}(m,a)\coloneqq m\cdot m_{a}$.

Letting $J$ denote the Grothendieck topology generated by the jointly surjective families in $\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$, we obtain the following proposition.

Proposition 3.11.

The following two Boolean-ringed topoi are equivalent:

\[(\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}},\mathcal{R})\simeq(\mathbf{Sh}(\Sigma\text{-}\mathbf{FinMon},J),\mathcal{P}),\]

where $\mathcal{P}$ denotes the power set functor $\Sigma\text{-}\mathbf{FinMon}^{\mathrm{op}}\xrightarrow{U}\mathbf{FinSet}^{\mathrm{op}}\xrightarrow{\mathcal{P}}\mathbf{Set}$.

Proof.

The proof is almost the same as the proof of proposition 3.8. The only non-trivial part is proving that $\Sigma\text{-}\mathbf{FinMon}\hookrightarrow\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$ is a generating full subcategory. It is enough to show that, for an arbitrary finite $\Sigma$-set $(Q,\delta)$ and an element $q_{0}\in Q$, there exist a finite $\Sigma$-monoid $(M,\{m_{a}\}_{a\in\Sigma})$ and a $\Sigma$-set homomorphism $f\colon(M,\overline{\delta})\to(Q,\delta)$ such that $q_{0}\in\mathrm{Im}(f)$. Let us consider $\mathrm{End}(Q)^{\mathrm{op}}$, the opposite monoid of the endofunction monoid $\mathrm{End}(Q)$. In other words, elements of the monoid $\mathrm{End}(Q)^{\mathrm{op}}$ are functions $Q\to Q$, and the multiplication $\phi\cdot\psi$ is defined to be $\psi\circ\phi$. The family of endofunctions $\{\delta({-},a)\colon Q\to Q\}_{a\in\Sigma}$ makes it a finite $\Sigma$-monoid. The function

\[f\colon\mathrm{End}(Q)^{\mathrm{op}}\to Q\colon\phi\mapsto\phi(q_{0})\]

is a $\Sigma$-set morphism, since $(\phi\cdot\delta({-},a))(q_{0})=\delta(\phi(q_{0}),a)$. The element $q_{0}$ belongs to the image $\mathrm{Im}(f)$, since the identity function $\mathrm{id}_{Q}\in\mathrm{End}(Q)^{\mathrm{op}}$ is sent to $q_{0}$. ∎
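The construction in this proof is finite and can be checked mechanically. The following Python sketch is a minimal illustration, assuming a hypothetical three-state $\Sigma$-set `DELTA`; endofunctions are encoded as tuples, and the names `mul`, `M_LETTER`, and `f` are chosen here for illustration only.

```python
from itertools import product

Q = (0, 1, 2)
SIGMA = "ab"
# Hypothetical transition function of a finite Sigma-set on Q.
DELTA = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 0,
         (0, "b"): 0, (1, "b"): 0, (2, "b"): 2}

# Elements of End(Q)^op: a tuple phi encodes the function q |-> phi[q].
END_Q = list(product(Q, repeat=len(Q)))
IDENTITY = tuple(Q)


def mul(phi, psi):
    """Multiplication in End(Q)^op: phi . psi := psi o phi."""
    return tuple(psi[phi[q]] for q in Q)


# The Sigma-indexed family m_a := delta(-, a).
M_LETTER = {a: tuple(DELTA[(q, a)] for q in Q) for a in SIGMA}


def f(phi, q0):
    """The map End(Q)^op -> Q sending phi to phi(q0)."""
    return phi[q0]


def is_sigma_set_morphism(q0):
    """Check f(phi . m_a) = delta(f(phi), a) for all phi and all letters a."""
    return all(
        f(mul(phi, M_LETTER[a]), q0) == DELTA[(f(phi, q0), a)]
        for phi in END_Q
        for a in SIGMA
    )
```

For every choice of `q0`, the check succeeds and `f(IDENTITY, q0)` returns `q0`, so `q0` lies in the image, as the proof requires.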

Remark 3.12 (This is not satisfying enough!).

The category $\Sigma\text{-}\mathbf{FinMon}$ is not quite monoid-theoretic, in the sense that two isomorphic objects in $\Sigma\text{-}\mathbf{FinMon}$ might be non-isomorphic as monoids. A more natural way to understand monoid-theoretic aspects, including the theory of syntactic monoids, will be proposed in the follow-up paper.

Remark 3.13 (Other generating sets).

The arguments in section 3.2 and section 3.3 only utilize the fact that the considered full subcategories, namely $\Sigma\text{-}\mathbf{FinSet}$ and $\Sigma\text{-}\mathbf{FinMon}$, are generating subcategories. We can do the same for other generating sets of objects.

3.4. Description by clopen subsets of profinite words

This subsection needs a few preliminaries on topological monoids and profinite words. First, we recall the basics of the topos of topological monoid actions. See [Rog23] for an extensive study of this topic.

Lemma 3.14 (Recall on the toposes of topological monoid actions. [Rog23]).

For a topological monoid\footnote{[Rog23] deals with monoids equipped with a topology whose multiplication is not necessarily continuous.} $M$,

  • the topos $\mathbf{Cont}(M)$ of continuous actions of $M$ is defined to be the full subcategory of $\mathbf{PSh}(M)$ that consists of $M$-sets $(X,X\times M\to X)$ such that the action map $X\times M\to X$ is continuous with respect to the discrete topology on $X$ and the product topology on $X\times M$.

  • $\mathbf{Cont}(M)$ is a Grothendieck topos.

  • The forgetful functor $U\colon\mathbf{Cont}(M)\to\mathbf{Set}$ has a right adjoint, and the adjunction

    \[U\colon\mathbf{Cont}(M)\rightleftarrows\mathbf{Set}\]

    (with $U$ the left adjoint) defines a surjective geometric morphism $\mathbf{Set}\to\mathbf{Cont}(M)$. We call this point $p\colon\mathbf{Set}\to\mathbf{Cont}(M)$ the canonical point.

Lemma 3.15.

For a compact topological monoid $M$, the corresponding internal Boolean algebra of the pointed topos

\[p\colon\mathbf{Set}\to\mathbf{Cont}(M)\]

(given by lemma 2.5) is the Boolean algebra of clopen subsets $\mathrm{Clopen}(M)$.

Proof.

The point $p\colon\mathbf{Set}\to\mathbf{Cont}(M)$ is decomposed into the composite $\mathbf{Set}\to\mathbf{PSh}(M)\to\mathbf{Cont}(M)$, where $\mathbf{PSh}(M)$ denotes the topos of discrete actions. This decomposition allows us to calculate the internal Boolean algebra $B$ as a Boolean subalgebra of $\mathcal{P}(M)$. What we will prove is that $B$ coincides with the Boolean subalgebra $\mathrm{Clopen}(M)\subset\mathcal{P}(M)$.

The calculation of the direct image functor $\mathbf{PSh}(M)\to\mathbf{Cont}(M)$ implies that a subset $S\subset M$ belongs to $B$ if and only if every equivalence class of the equivalence relation

\[a\sim_{S}b\iff a^{-1}S=b^{-1}S\]

is open (see [Rog23, Scholium 2.9] for the details). Here, $a^{-1}S$ denotes $\{m\in M\mid am\in S\}$.

First, we prove that $B\subset\mathrm{Clopen}(M)$ (without assuming compactness). For every $s\in S$ and $b\in M$, we have

\[s\sim_{S}b\iff s^{-1}S=b^{-1}S\implies e\in b^{-1}S\iff b\in S,\]

where $e$ denotes the unit of $M$ (note that $e\in s^{-1}S$). Hence, if $S\in B$, then $S$ is a union of open equivalence classes and is therefore open. Since $M\setminus S$ also belongs to $B$, the subset $S$ is also closed.

Next, we prove $B\supset\mathrm{Clopen}(M)$ (using the assumption of compactness). Let $S\subset M$ be an arbitrary clopen subset, and let $a\in M$ be an arbitrary element. It is enough to construct an open neighborhood $a\in U\subset M$ such that $\forall b\in U,\;a\sim_{S}b$. For each $m\in M$, we can take open neighborhoods $a\in U_{m}\subset M$ and $m\in V_{m}\subset M$ such that

\[\forall a^{\prime}\in U_{m},\forall m^{\prime}\in V_{m},\;(a^{\prime}m^{\prime}\in S\iff am\in S),\]

since the multiplication map $M\times M\to M$ is continuous and $S$ is clopen. The compactness of $M$ allows us to pick a finite subcover $M=V_{m_{1}}\cup\dots\cup V_{m_{n}}$. Take an arbitrary element $b$ of $U\coloneqq U_{m_{1}}\cap\dots\cap U_{m_{n}}$. What we need to prove is that $a\sim_{S}b$. Take an arbitrary element $m\in M$. For $1\leq i\leq n$ with $m\in V_{m_{i}}$, we have

\[bm\in S\iff am_{i}\in S\iff am\in S,\]

which proves that $a\sim_{S}b$. ∎

In particular, if the topological monoid $M$ is profinite, then the corresponding internal Boolean algebra is its Stone dual (equipped with the canonical continuous $M$-action).

Definition 3.16.

Let $\widehat{\Sigma^{\ast}}$ denote the topological monoid of profinite words, i.e., the profinite completion of the monoid $\Sigma^{\ast}$. Let $\mathbf{Cont}(\widehat{\Sigma^{\ast}})$ be its topos of continuous actions.

For a detailed explanation of the notion of profinite words and its relation to automata theory, see [Pin22]. See also [Ura17] for the description of the topos $\mathbf{Cont}(\widehat{\Sigma^{\ast}})$.

Lemma 3.17.

The pointed topos $p\colon\mathbf{Set}\to\mathbf{Cont}(\widehat{\Sigma^{\ast}})$ is equivalent to $p\colon\mathbf{Set}\to\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$ as pointed topoi:

\[e\colon\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}\xrightarrow{\simeq}\mathbf{Cont}(\widehat{\Sigma^{\ast}}),\qquad e\circ p\cong p.\]
Proof.

Since $\Sigma\text{-}\mathbf{Set}$ is the topos of (continuous) actions of the discrete topological monoid $\Sigma^{\ast}$, the canonical inclusion $\iota\colon\Sigma^{\ast}\rightarrowtail\widehat{\Sigma^{\ast}}$ induces a geometric morphism $g\colon\Sigma\text{-}\mathbf{Set}\to\mathbf{Cont}(\widehat{\Sigma^{\ast}})$, where $g^{*}$ is given by the restriction of the action along $\iota$:

\[\Sigma^{\ast}\xrightarrow{\iota}\widehat{\Sigma^{\ast}}\to\mathrm{End}(Q).\]

By construction, $g^{*}$ is faithful. The denseness of $\iota\colon\Sigma^{\ast}\to\widehat{\Sigma^{\ast}}$ implies that $g^{*}$ is full, i.e., the geometric morphism $g\colon\Sigma\text{-}\mathbf{Set}\to\mathbf{Cont}(\widehat{\Sigma^{\ast}})$ is connected. The compactness of $\widehat{\Sigma^{\ast}}$ implies that each orbit of a continuous $\widehat{\Sigma^{\ast}}$-action is finite, i.e., the geometric morphism $g$ factors through the hyperconnected geometric morphism $h\colon\Sigma\text{-}\mathbf{Set}\to\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}}$ as follows:

\[g\cong e\circ h,\qquad e\circ p\cong p.\]

Since $g$ and $h$ are connected, so is $e$. The remaining task is to prove that $e^{*}$ is essentially surjective. Since $e^{*}$ is coreflective (and hence creates all colimits), it suffices to prove that the essential image of $e^{*}$ contains a generating set. By the definition of profinite completion, every finite quotient monoid $\Sigma^{\ast}\twoheadrightarrow M$ is a continuous quotient of $\widehat{\Sigma^{\ast}}$, which implies that the canonical action $M\times\Sigma\to M$ belongs to the essential image of $e^{*}$. The same argument as in proposition 3.11 completes the proof. ∎

Proposition 3.18.

The following two Boolean-ringed topoi are equivalent:

\[(\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}},\mathcal{R})\simeq(\mathbf{Cont}(\widehat{\Sigma^{\ast}}),\mathrm{Clopen}(\widehat{\Sigma^{\ast}})).\]
Proof.

This immediately follows from lemma 3.15 and lemma 3.17. ∎

As a summary of this section, we obtain the following theorem:

Theorem 3.19.

The following four Boolean-ringed topoi are all equivalent to the Boolean-ringed topos of regular languages $(\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}},\mathcal{R})$.

Myhill-Nerode:

$(\Sigma\text{-}\mathbf{Set}_{\mathrm{o.f.}},p_{\ast}(\{\top,\bot\}))$

DFA:

$(\mathbf{Sh}(\Sigma\text{-}\mathbf{FinSet},J),\mathrm{DFA})$

Finite monoids and their subsets:

$(\mathbf{Sh}(\Sigma\text{-}\mathbf{FinMon},J),\mathcal{P})$

Profinite words and their clopen subsets:

$(\mathbf{Cont}(\widehat{\Sigma^{\ast}}),\mathrm{Clopen}(\widehat{\Sigma^{\ast}}))$

Appendix A. Preliminaries on hyperconnected geometric morphisms

This appendix recalls the notion of hyperconnected geometric morphisms. See [Joh02] or [Joh81] for more details.

Definition A.1 (Hyperconnected geometric morphism).

A geometric morphism $f\colon\mathcal{E}\to\mathcal{F}$ is called hyperconnected if its inverse image functor $f^{\ast}$ is fully faithful (i.e., $f$ is connected) and satisfies the following equivalent conditions:

  • the essential image of $f^{\ast}$ is closed under subquotients.

  • the counit $\epsilon\colon f^{\ast}f_{\ast}\Rightarrow\mathrm{id}_{\mathcal{E}}$ is monic.

Lemma A.2.

For a Grothendieck topos $\mathcal{E}$, if a full subcategory $\iota\colon\mathcal{F}\hookrightarrow\mathcal{E}$ is closed under

  • small coproducts,

  • finite products, and

  • subquotients (subobjects and quotient objects),

then $\mathcal{F}$ is also a Grothendieck topos, and there is a hyperconnected geometric morphism $h\colon\mathcal{E}\to\mathcal{F}$ whose inverse image functor $h^{*}\colon\mathcal{F}\to\mathcal{E}$ coincides with the embedding functor $\iota$.

Proof.

Under the assumption, $\iota$ admits a right adjoint $R\colon\mathcal{E}\to\mathcal{F}$, which sends an object $X\in\mathrm{ob}\,\mathcal{E}$ to the maximum subobject belonging to (the essential image of the embedding of) $\mathcal{F}$.

\[\iota\colon\mathcal{F}\rightleftarrows\mathcal{E}\colon R,\qquad\iota\dashv R.\]

Since $\iota$ preserves all finite limits, which are constructed from finite products and (regular) subobjects, this adjunction is lex coreflective and, in particular, lex comonadic. This proves that $\mathcal{F}$ is the category of coalgebras of the lex comonad $\iota\circ R$, and that $\mathcal{F}$ is an elementary topos. Furthermore, since the essential image of $\iota$ is closed under subquotients, the adjunction $\iota\dashv R$ defines a hyperconnected geometric morphism

\[h\colon\mathcal{E}\to\mathcal{F}.\]

Since $\mathcal{E}$ is a Grothendieck topos, we can prove that $\mathcal{F}$ is also a Grothendieck topos (see [Rog21, Theorem 1.8.5] for a proof). This completes the proof. ∎

Conversely, for any hyperconnected geometric morphism $h\colon\mathcal{E}\to\mathcal{F}$ from a Grothendieck topos $\mathcal{E}$, the essential image of $h^{*}$ satisfies the assumption of lemma A.2. So every hyperconnected geometric morphism is constructed by lemma A.2.

References

  • [Adá74] Jiří Adámek “Free algebras and automata realizations in the language of categories” In Commentationes Mathematicae Universitatis Carolinae 15.4 Charles University in Prague, Faculty of Mathematics and Physics, 1974, pp. 589–602
  • [Boc+23] Guido Boccali et al. “The semibicategory of Moore automata” In arXiv:2305.00272, 2023
  • [Car23] Olivia Caramello “The unification of mathematics via topos theory” In Logic in Question: Talks from the Annual Sorbonne Logic Workshop (2011-2019), 2023, pp. 563–601 Springer
  • [CP20] Thomas Colcombet and Daniela Petrişan “Automata minimization: a functorial approach” In Logical Methods in Computer Science 16 Episciences.org, 2020
  • [GPA22] Alexandre Goy, Daniela Petrişan and Marc Aiguier “Powerset-like monads weakly distribute over themselves in toposes and compact Hausdorff spaces” In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021), 2022 Schloss Dagstuhl–Leibniz-Zentrum für Informatik
  • [Hor24] Ryuya Hora “Internal parameterization of hyperconnected quotients” In Theory and Applications of Categories 42.11, 2024, pp. 263–313
  • [HK24] Ryuya Hora and Yuhi Kamio “Quotient toposes of discrete dynamical systems” In Journal of Pure and Applied Algebra Elsevier, 2024, pp. 107657
  • [Iwa24] Victor Iwaniack “Automata in W-Toposes, and General Myhill-Nerode Theorems” In International Workshop on Coalgebraic Methods in Computer Science, 2024, pp. 93–113 Springer
  • [Jac17] Bart Jacobs “Introduction to coalgebra” Cambridge University Press, 2017
  • [Joh81] Peter T. Johnstone “Factorization theorems for geometric morphisms, I” In Cahiers de topologie et géométrie différentielle catégoriques 22.1, 1981, pp. 3–17
  • [Joh02] Peter T. Johnstone “Sketches of an Elephant: A Topos Theory Compendium, Volume 1” Oxford University Press, 2002
  • [KH24] Yuhi Kamio and Ryuya Hora “A solution to the first Lawvere’s problem: A Grothendieck topos that has a proper class many quotient topoi” In arXiv:2407.17105, 2024
  • [Law04] F William Lawvere “Functorial concepts of complexity for finite automata” In Theory and Applications of Categories 13.10 Citeseer, 2004, pp. 164–168
  • [LS09] F. Lawvere and Stephen H. Schanuel “Conceptual mathematics: a first introduction to categories Second Edition” Cambridge University Press, 2009
  • [Pin22] Jean-Éric Pin “Mathematical foundations of automata theory”, 2022
  • [Rog19] Morgan Rogers “Toposes of Discrete Monoid Actions” In arXiv:1905.10277, 2019
  • [Rog21] Morgan Rogers “On Supercompactly and Compactly Generated Toposes” In Theory and Applications of Categories 37.32, 2021, pp. 1017–1079
  • [Rog23] Morgan Rogers “Toposes of topological monoid actions” In Compositionality: the open-access journal for the mathematics of composition 5 Episciences.org, 2023
  • [Rut19] Jan Rutten “The Method of Coalgebra: exercises in coinduction” Amsterdam: CWI, 2019
  • [Tom20] Ivan Tomasic “A topos-theoretic view of difference algebra” In arXiv:2001.09075, 2020
  • [Ura17] Takeo Uramoto “Semi-galois Categories I: The Classical Eilenberg Variety Theory” In arXiv:1512.04389, 2017