
Decision trees for regular factorial languages

Mikhail Moshkov
Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia. Email: mikhail.moshkov@kaust.edu.sa.
Abstract

In this paper, we study arbitrary regular factorial languages over a finite alphabet $\Sigma$. For the set of words $L(n)$ of length $n$ belonging to a regular factorial language $L$, we investigate the depth of decision trees solving the recognition and the membership problems deterministically and nondeterministically. In the case of the recognition problem, for a given word from $L(n)$, we should recognize it using queries each of which, for some $i\in\{1,\ldots,n\}$, returns the $i$th letter of the word. In the case of the membership problem, for a given word over the alphabet $\Sigma$ of length $n$, we should recognize whether it belongs to the set $L(n)$ using the same queries. For a given problem and type of trees, instead of the minimum depth $h(n)$ of a decision tree of the considered type solving the problem for $L(n)$, we study the smoothed minimum depth $H(n)=\max\{h(m):m\leq n\}$. With the growth of $n$, the smoothed minimum depth of decision trees solving the recognition problem deterministically is either bounded from above by a constant, or grows logarithmically, or grows linearly. For the other cases (decision trees solving the recognition problem nondeterministically, and decision trees solving the membership problem deterministically and nondeterministically), with the growth of $n$, the smoothed minimum depth of decision trees is either bounded from above by a constant or grows linearly. As corollaries of the obtained results, we study the joint behavior of the smoothed minimum depths of decision trees for the four considered cases and describe five complexity classes of regular factorial languages. We also investigate the class of regular factorial languages over the alphabet $\{0,1\}$ each of which is given by one forbidden word.

Keywords: regular factorial language, recognition problem, membership problem, deterministic decision tree, nondeterministic decision tree.

1 Introduction

In this paper, we study arbitrary regular factorial languages over a finite alphabet $\Sigma$. For the set of words $L(n)$ of length $n$ belonging to a regular factorial language $L$, we investigate the depth of decision trees solving the recognition and the membership problems deterministically and nondeterministically. In the case of the recognition problem, for a given word from $L(n)$, we should recognize it using queries each of which, for some $i\in\{1,\ldots,n\}$, returns the $i$th letter of the word. In the case of the membership problem, for a given word over the alphabet $\Sigma$ of length $n$, we should recognize whether it belongs to $L(n)$ using the same queries.

For a given problem (the recognition problem or the membership problem) and type of trees (solving the problem deterministically or nondeterministically), instead of the minimum depth $h(n)$ of a decision tree of the considered type solving the problem for $L(n)$, we study the smoothed minimum depth $H(n)=\max\{h(m):m\leq n\}$. The reason is that the graph of the function $h(n)$ may have a sawtooth form.

For an arbitrary regular factorial language, with the growth of $n$, the smoothed minimum depth of decision trees solving the recognition problem deterministically is either bounded from above by a constant, or grows logarithmically, or grows linearly. These results follow immediately from more general results obtained in [5] for arbitrary regular languages.

For the other cases (decision trees solving the recognition problem nondeterministically, and decision trees solving the membership problem deterministically and nondeterministically), with the growth of $n$, the smoothed minimum depth of decision trees is either bounded from above by a constant or grows linearly. In the conference paper [4], a classification of arbitrary regular languages depending on the smoothed minimum depth of decision trees solving the recognition problem nondeterministically was announced without proofs. In the present paper, we consider a simpler classification for regular factorial languages with full proofs. The results related to decision trees solving the membership problem are new.

As corollaries of the obtained results, we study the joint behavior of the smoothed minimum depths of decision trees for the four considered cases and describe five complexity classes of regular factorial languages. We also investigate the class of regular factorial languages over the alphabet $E=\{0,1\}$ each of which is given by one forbidden word.

We should mention a recent paper [6] in which similar results were obtained for subword-closed languages over the alphabet $E$, where by a subword we mean a subsequence. It is clear that each subword-closed language is a factorial language. Moreover, each subword-closed language over a finite alphabet is a regular language [2]. One can show that the language $L(00)$ over the alphabet $E$ given by one forbidden word $00$ is a regular factorial language that is not subword-closed (for example, $010\in L(00)$, but its subsequence $00$ does not belong to $L(00)$). Therefore, the class of subword-closed languages over the alphabet $E$ is a proper subclass of the class of regular factorial languages over the alphabet $E$.

The main difference between the present paper and [6] is that, in the latter paper, we do not assume that the subword-closed languages are given by sources (partial deterministic finite automata). Instead of this, we describe simple criteria (based on the presence in the language of words of special types) for the behavior of the minimum depths of decision trees solving the problem of recognition deterministically and nondeterministically. Differently formulated criteria for the behavior of the minimum depth of decision trees solving the recognition problem require very different proofs. One more difference is that in [6] we directly consider the minimum depth of decision trees since it is a fairly smooth function for subword-closed languages.

The rest of the paper is organized as follows. In Section 2, we consider the main notions; in Section 3, the main results; and in Section 4, two corollaries of these results.

2 Main Notions

In this section, we discuss the notions related to regular factorial languages and decision trees solving problems of recognition and membership for these languages.

2.1 Regular Factorial Languages

Let $\omega=\{0,1,2,\ldots\}$ be the set of nonnegative integers and $\Sigma$ be a finite alphabet with at least two letters. By $\Sigma^{\ast}$, we denote the set of all finite words over the alphabet $\Sigma$, including the empty word $\lambda$. A word $w\in\Sigma^{\ast}$ is called a factor of a word $u\in\Sigma^{\ast}$ if $u=v_{1}wv_{2}$ for some $v_{1},v_{2}\in\Sigma^{\ast}$. A language $L\subseteq\Sigma^{\ast}$ is called factorial if it contains all factors of its words. A word $w\in\Sigma^{\ast}$ is called a minimal forbidden word for $L$ if $w\notin L$ and all proper factors of $w$ belong to $L$. We denote by $\mathit{MF}(L)$ the language of minimal forbidden words for $L$. It is known [1] that a factorial language $L$ is regular if and only if the language $\mathit{MF}(L)$ is regular. In particular, a factorial language $L$ with a finite set of minimal forbidden words $\mathit{MF}(L)$ is regular. In this paper, we study arbitrary nonempty regular factorial languages.
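For small examples, factoriality and minimal forbidden words can be checked by brute force. The following Python sketch is only an illustration under our own assumptions: words are Python strings, a language is given by a membership predicate `in_lang`, and `minimal_forbidden_words` searches only words of length at most `max_len`, so it returns the part of $\mathit{MF}(L)$ of bounded length. The function names are hypothetical.

```python
from itertools import product


def factors(w):
    """All factors (contiguous subwords) of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def is_factorial(words):
    """Check that a finite set of words contains all factors of its words."""
    words = set(words)
    return all(factors(w) <= words for w in words)


def minimal_forbidden_words(in_lang, alphabet, max_len):
    """Minimal forbidden words of length <= max_len for the language
    described by the membership predicate in_lang."""
    result = set()
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            proper = factors(w) - {w}
            if not in_lang(w) and all(in_lang(v) for v in proper):
                result.add(w)
    return result


# The language L(00) of words over {0,1} that avoid the factor 00.
print(minimal_forbidden_words(lambda w: "00" not in w, "01", 5))  # prints {'00'}
```

As expected, the only minimal forbidden word of $L(00)$ is $00$ itself: every longer word outside $L(00)$ has the proper factor $00$.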

A source over the alphabet $\Sigma$ is a triple $I=(G,q_{0},Q)$, where $G$ is a finite directed graph, possibly with multiple edges and loops, in which each edge is labeled with a letter from $\Sigma$ and the edges leaving each node are labeled with pairwise different letters, $q_{0}$ is a node of $G$ called initial, and $Q$ is a nonempty set of nodes of $G$ called terminal. Note that the source $I$ can be interpreted as a partial deterministic finite automaton.

A path of the source $I$ is an arbitrary sequence $\xi=v_{1},d_{1},\ldots,v_{m},d_{m},v_{m+1}$ of nodes and edges of $G$ such that the edge $d_{i}$ leaves the node $v_{i}$ and enters the node $v_{i+1}$ for $i=1,\ldots,m$. We now define a word $w(\xi)$ from $\Sigma^{\ast}$ in the following way: if $m=0$, then $w(\xi)=\lambda$. Let $m>0$ and let $\delta_{j}$ be the letter attached to the edge $d_{j}$, $j=1,\ldots,m$. Then $w(\xi)=\delta_{1}\cdots\delta_{m}$. We say that the path $\xi$ generates the word $w(\xi)$. Note that different paths that start in the same node generate different words.

We denote by $\Xi(I)$ the set of all paths of the source $I$ each of which starts in the node $q_{0}$ and finishes in a node from $Q$. Let

$$L_{I}=\{w(\xi):\xi\in\Xi(I)\}.$$

We say that the source $I$ generates the language $L_{I}$. It is well known that $L_{I}$ is a regular language.
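A source can be stored as a transition table. In the sketch below (a representation we chose for illustration, not one taken from the paper), the graph $G$ is a Python dictionary `delta` that maps each node to a dictionary from letters to successor nodes; since the edges leaving a node carry pairwise different letters, this is exactly a partial deterministic automaton. The hypothetical function `words_of_length` enumerates $L_{I}\cap\Sigma^{n}$ by following paths from $q_{0}$. The example source for $L(00)$ is our own: node `A` means that the word read so far is empty or ends with $1$, and node `B` means that it ends with $0$.

```python
def words_of_length(delta, q0, terminal, n):
    """All words of length n generated by paths from q0 to a terminal node."""
    result = []

    def walk(node, prefix):
        if len(prefix) == n:
            if node in terminal:
                result.append(prefix)
            return
        # Letters on edges leaving a node are pairwise different, so each
        # (node, letter) pair determines at most one successor.
        for letter, successor in sorted(delta[node].items()):
            walk(successor, prefix + letter)

    walk(q0, "")
    return result


# Hypothetical source for the words over {0,1} without the factor 00.
DELTA = {"A": {"0": "B", "1": "A"}, "B": {"1": "A"}}
print(words_of_length(DELTA, "A", {"A", "B"}, 3))
# prints ['010', '011', '101', '110', '111']
```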

The source $I$ is called everywhere defined over the alphabet $\Sigma$ if exactly $|\Sigma|$ edges leave each node of $G$. Note that these edges are labeled with pairwise different letters from $\Sigma$. The source $I$ is called reduced if, for each node of $G$, there exists a path from $\Xi(I)$ that contains this node. It is known [3] that, for each regular language over the alphabet $\Sigma$, there exists a source that is everywhere defined over $\Sigma$ and generates this language. Therefore, for each nonempty regular language, there exists a reduced source that generates this language (it suffices to delete from such a source all nodes that do not belong to any path from $\Xi(I)$).

Let $L$ be a regular factorial language and $I=(G,q_{0},Q)$ be a reduced source that generates the language $L$. Since the language $L$ is factorial, we can additionally assume that each node of the graph $G$ is terminal; this does not change the language generated by $I$ (since $I$ is reduced, each path from $q_{0}$ can be extended to a path from $\Xi(I)$, so the word it generates is a prefix, and hence a factor, of a word from $L$). The source $I$ will be called t-reduced if it is reduced and each node of the graph $G$ is terminal. From now on, we assume that the considered regular factorial language $L$ is nonempty and is given by a t-reduced source that generates it.
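The reducedness condition amounts to two reachability checks: every node must be reachable from $q_{0}$, and a terminal node must be reachable from it. The Python sketch below is our own illustration with hypothetical helper names; it uses the transition-dictionary representation in which every node appears as a key of `delta`, and `t_reduce` simply declares all nodes terminal after checking that the source is reduced, which preserves the language only when that language is factorial.

```python
from collections import deque


def reachable(adjacency, start):
    """Nodes reachable from start in a graph given as node -> iterable of nodes."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_reduced(delta, q0, terminal):
    """True if every node lies on a path from q0 to a terminal node."""
    forward = {u: set(out.values()) for u, out in delta.items()}
    backward = {}
    for u, successors in forward.items():
        for v in successors:
            backward.setdefault(v, set()).add(u)
    from_q0 = reachable(forward, q0)
    to_terminal = set()
    for q in terminal:
        to_terminal |= reachable(backward, q)
    return set(delta) <= from_q0 & to_terminal


def t_reduce(delta, q0, terminal):
    """Declare every node terminal (valid for sources of factorial languages)."""
    if not is_reduced(delta, q0, terminal):
        raise ValueError("the source is not reduced")
    return delta, q0, set(delta)


DELTA = {"A": {"0": "B", "1": "A"}, "B": {"1": "A"}}
print(is_reduced(DELTA, "A", {"A", "B"}))  # prints True
```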

2.2 Decision Trees for Recognition and Membership Problems

Let $L$ be a regular factorial language over the alphabet $\Sigma$. For any natural $n$, denote $L(n)=L\cap\Sigma^{n}$, where $\Sigma^{n}$ is the set of words over the alphabet $\Sigma$ whose length is equal to $n$. We consider two problems related to the set $L(n)$. The problem of recognition: for a given word from $L(n)$, we should recognize it using attributes (queries) $l_{1}^{n},\ldots,l_{n}^{n}$, where $l_{i}^{n}$, $i\in\{1,\ldots,n\}$, is a function from $\Sigma^{n}$ to $\Sigma$ such that $l_{i}^{n}(a_{1}\cdots a_{n})=a_{i}$ for any word $a_{1}\cdots a_{n}\in\Sigma^{n}$. The problem of membership: for a given word from $\Sigma^{n}$, we should recognize whether this word belongs to the set $L(n)$ using the same attributes. To solve these problems, we use decision trees over $L(n)$.

A decision tree over $L(n)$ is a marked finite directed tree with a root, which has the following properties:

  • The root and the edges leaving the root are not labeled.

  • Each node that is neither the root nor a terminal node is labeled with an attribute from the set $\{l_{1}^{n},\ldots,l_{n}^{n}\}$.

  • Each edge leaving a node that is not the root is labeled with a letter from the alphabet $\Sigma$.

A decision tree over $L(n)$ is called deterministic if it satisfies the following conditions:

  • Exactly one edge leaves the root.

  • For any node that is neither the root nor a terminal node, the edges leaving this node are labeled with pairwise different letters.

Let $\Gamma$ be a decision tree over $L(n)$. A complete path in $\Gamma$ is any sequence $\xi=v_{0},e_{0},\ldots,v_{m},e_{m},v_{m+1}$ of nodes and edges of $\Gamma$ such that $v_{0}$ is the root, $v_{m+1}$ is a terminal node, and the edge $e_{i}$ leaves the node $v_{i}$ and enters the node $v_{i+1}$ for $i=0,\ldots,m$. We define a subset $\Sigma(n,\xi)$ of the set $\Sigma^{n}$ in the following way: if $m=0$, then $\Sigma(n,\xi)=\Sigma^{n}$. Let $m>0$, the attribute $l_{i_{j}}^{n}$ be attached to the node $v_{j}$, and $b_{j}$ be the letter attached to the edge $e_{j}$, $j=1,\ldots,m$. Then

$$\Sigma(n,\xi)=\{a_{1}\cdots a_{n}\in\Sigma^{n}:a_{i_{1}}=b_{1},\ldots,a_{i_{m}}=b_{m}\}.$$

Let $L(n)\neq\emptyset$. We say that a decision tree $\Gamma$ over $L(n)$ solves the problem of recognition for $L(n)$ nondeterministically if $\Gamma$ satisfies the following conditions:

  • Each terminal node of $\Gamma$ is labeled with a word from $L(n)$.

  • For any word $w\in L(n)$, there exists a complete path $\xi$ in the tree $\Gamma$ such that $w\in\Sigma(n,\xi)$.

  • For any word $w\in L(n)$ and for any complete path $\xi$ in the tree $\Gamma$ such that $w\in\Sigma(n,\xi)$, the terminal node of the path $\xi$ is labeled with the word $w$.

We say that a decision tree $\Gamma$ over $L(n)$ solves the problem of recognition for $L(n)$ deterministically if $\Gamma$ is a deterministic decision tree that solves the problem of recognition for $L(n)$ nondeterministically.

We say that a decision tree $\Gamma$ over $L(n)$ solves the problem of membership for $L(n)$ nondeterministically if $\Gamma$ satisfies the following conditions:

  • Each terminal node of $\Gamma$ is labeled with a number from the set $\{0,1\}$.

  • For any word $w\in\Sigma^{n}$, there exists a complete path $\xi$ in the tree $\Gamma$ such that $w\in\Sigma(n,\xi)$.

  • For any word $w\in\Sigma^{n}$ and for any complete path $\xi$ in the tree $\Gamma$ such that $w\in\Sigma(n,\xi)$, the terminal node of the path $\xi$ is labeled with the number $1$ if $w\in L(n)$ and with the number $0$ otherwise.

We say that a decision tree $\Gamma$ over $L(n)$ solves the problem of membership for $L(n)$ deterministically if $\Gamma$ is a deterministic decision tree that solves the problem of membership for $L(n)$ nondeterministically.

Let $\Gamma$ be a decision tree over $L(n)$. We denote by $h(\Gamma)$ the maximum number of nodes in a complete path in $\Gamma$ that are neither the root nor a terminal node. The value $h(\Gamma)$ is called the depth of the decision tree $\Gamma$.
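To make these definitions concrete for deterministic trees, the following Python sketch (a representation of our own choosing) encodes a tree with nested tuples: `("leaf", label)` is a terminal node, and `("query", i, children)` is a node labeled with the attribute $l_{i}^{n}$ whose outgoing edges are given by the dictionary `children` from letters to subtrees. The unlabeled root with its single outgoing edge is left implicit, so `depth` counts the query nodes on a path, as $h(\Gamma)$ does. The helpers `solves_recognition` and `solves_membership` are hypothetical names; they check the conditions above by following, for each word, the unique path consistent with it.

```python
def evaluate(tree, word):
    """Follow the unique path of a deterministic tree that is consistent with
    word and return the label of its terminal node (None if there is none)."""
    while tree[0] == "query":
        _, i, children = tree
        letter = word[i - 1]  # the attribute l_i^n returns the i-th letter
        if letter not in children:
            return None  # no complete path is consistent with word
        tree = children[letter]
    return tree[1]


def depth(tree):
    """h(Gamma): maximum number of query nodes on a root-to-leaf path
    (every query node is assumed to have at least one child)."""
    if tree[0] == "leaf":
        return 0
    return 1 + max(depth(child) for child in tree[2].values())


def solves_recognition(tree, words_n):
    """Deterministic recognition for L(n) = words_n."""
    return all(evaluate(tree, w) == w for w in words_n)


def solves_membership(tree, all_words_n, in_lang):
    """Deterministic membership: label 1 on words of L(n) and 0 otherwise."""
    return all(evaluate(tree, w) == int(in_lang(w)) for w in all_words_n)


# Words of length 2 without the factor 00: L(2) = {01, 10, 11}.
RECOGNIZE = ("query", 1, {"0": ("leaf", "01"),
                          "1": ("query", 2, {"0": ("leaf", "10"), "1": ("leaf", "11")})})
MEMBER = ("query", 1, {"1": ("leaf", 1),
                       "0": ("query", 2, {"0": ("leaf", 0), "1": ("leaf", 1)})})
print(solves_recognition(RECOGNIZE, ["01", "10", "11"]), depth(RECOGNIZE))  # prints True 2
print(solves_membership(MEMBER, ["00", "01", "10", "11"], lambda w: "00" not in w))  # prints True
```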

We denote by $h_{L}^{ra}(n)$ ($h_{L}^{rd}(n)$) the minimum depth of a decision tree over $L(n)$ that solves the problem of recognition for $L(n)$ nondeterministically (deterministically). If $L(n)=\emptyset$, then $h_{L}^{ra}(n)=h_{L}^{rd}(n)=0$.

We denote by $h_{L}^{ma}(n)$ ($h_{L}^{md}(n)$) the minimum depth of a decision tree over $L(n)$ that solves the problem of membership for $L(n)$ nondeterministically (deterministically). If $L(n)=\emptyset$, then $h_{L}^{ma}(n)=h_{L}^{md}(n)=0$.
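For very small $n$, the values $h_{L}^{rd}(n)$ and $h_{L}^{md}(n)$ can be computed by exhaustive search over deterministic trees. A subproblem is the set of answers obtained so far; it needs no further queries when all words consistent with these answers require the same terminal label (the word itself for recognition, the number $0$ or $1$ for membership), and otherwise we try every unasked position and take the worst answer. The Python sketch below is our own brute-force illustration, exponential in $n$ and not an algorithm from the paper; the language is given by a hypothetical membership predicate `in_lang`, and positions are numbered from $0$ in the code.

```python
from functools import lru_cache
from itertools import product


def min_depth(words, n, alphabet, label):
    """Minimum depth of a deterministic decision tree over positions 0..n-1
    that outputs label(w) for every w in words (tiny n only)."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def solve(answers):  # answers: sorted tuple of (position, letter) pairs
        consistent = [w for w in words if all(w[i] == a for i, a in answers)]
        if len({label(w) for w in consistent}) <= 1:
            return 0  # a single terminal node suffices
        asked = {i for i, _ in answers}
        best = n
        for i in range(n):
            if i not in asked:
                worst = max(solve(tuple(sorted(answers + ((i, a),))))
                            for a in alphabet)
                best = min(best, 1 + worst)
        return best

    return solve(())


def all_words(n, alphabet):
    return ["".join(p) for p in product(alphabet, repeat=n)]


def h_rd(in_lang, n, alphabet):
    """Recognition: the terminal label must be the word itself."""
    words_n = [w for w in all_words(n, alphabet) if in_lang(w)]
    return min_depth(words_n, n, alphabet, lambda w: w)


def h_md(in_lang, n, alphabet):
    """Membership: the terminal label is 1 on L(n) and 0 on the other words."""
    return min_depth(all_words(n, alphabet), n, alphabet, lambda w: int(in_lang(w)))


avoid_00 = lambda w: "00" not in w  # the language L(00)
print(h_rd(avoid_00, 2, "01"), h_md(avoid_00, 2, "01"))  # prints 2 2
```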

3 Bounds on Decision Tree Depth

Let $L$ be a nonempty regular factorial language. In this section, we consider the behavior of four functions $H_{L}^{ra}$, $H_{L}^{rd}$, $H_{L}^{ma}$, and $H_{L}^{md}$ defined on the set $\omega\setminus\{0\}$ and with values from $\omega$. For any natural $n$,

$$\begin{aligned}
H_{L}^{ra}(n) &= \max\{h_{L}^{ra}(m):1\leq m\leq n\},\\
H_{L}^{rd}(n) &= \max\{h_{L}^{rd}(m):1\leq m\leq n\},\\
H_{L}^{ma}(n) &= \max\{h_{L}^{ma}(m):1\leq m\leq n\},\\
H_{L}^{md}(n) &= \max\{h_{L}^{md}(m):1\leq m\leq n\}.
\end{aligned}$$

For any pair $bc\in\{ra,rd,ma,md\}$, the function $H_{L}^{bc}(n)$ is a smoothed analog of the function $h_{L}^{bc}(n)$: it is nondecreasing, and $H_{L}^{bc}(n)\geq h_{L}^{bc}(n)$ for any natural $n$.

3.1 Decision Trees Solving Recognition Problem Deterministically

Let $I=(G,q_{0},Q)$ be a t-reduced source over the alphabet $\Sigma$. A path of the source $I$ is called a cycle of the source $I$ if there is at least one edge in this path and the first node of this path is equal to the last node of this path. A cycle of the source $I$ is called elementary if the nodes of this cycle, with the exception of the last node, are pairwise different.

The source $I$ is called simple if every two different elementary cycles of the source $I$ have no common nodes. Let $I$ be a simple source and $\xi$ be a path of the source $I$. The number of different elementary cycles of the source $I$ that have common nodes with $\xi$ is denoted by $cl(\xi)$ and is called the cyclic length of the path $\xi$. The value

$$cl(I)=\max\{cl(\xi):\xi\in\Xi(I)\}$$

is called the cyclic length of the source $I$.
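For small sources, elementary cycles, simplicity, and the cyclic length can be computed mechanically. The Python sketch below is our own illustration, not an algorithm from the paper. It uses a transition dictionary `delta` in which every node appears as a key and nodes are mutually comparable (for example, integers), and it assumes, as for a t-reduced source, that all nodes are terminal, so every path starting in $q_{0}$ belongs to $\Xi(I)$. It relies on the fact that in a simple source no two different elementary cycles are reachable from each other; hence $cl(I)$ is the length of a longest chain of cycles reachable from $q_{0}$ in which each cycle is reachable from the previous one.

```python
from collections import deque


def reachable(delta, start):
    """Set of nodes reachable from start (including start)."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in delta[node].values():
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def elementary_cycles(delta):
    """Each elementary cycle once, as a list of (node, letter) edges that
    starts at the smallest node of the cycle."""
    cycles = []
    for s in sorted(delta):
        stack = [(s, [])]
        while stack:
            node, path = stack.pop()
            for letter, nxt in sorted(delta[node].items()):
                edges = path + [(node, letter)]
                if nxt == s:
                    cycles.append(edges)
                elif nxt > s and nxt not in {u for u, _ in edges}:
                    stack.append((nxt, edges))
    return cycles


def is_simple(cycles):
    """True if different elementary cycles have no common nodes."""
    nodes = [{u for u, _ in c} for c in cycles]
    return all(nodes[i].isdisjoint(nodes[j])
               for i in range(len(nodes)) for j in range(i + 1, len(nodes)))


def cyclic_length(delta, q0):
    """cl(I) for a simple source whose nodes are all terminal."""
    cycles = elementary_cycles(delta)
    if not is_simple(cycles):
        raise ValueError("cl(I) is defined only for simple sources")
    from_q0 = reachable(delta, q0)
    nodes = [{u for u, _ in c} for c in cycles if c[0][0] in from_q0]
    # "Cycle k reaches cycle m" is acyclic in a simple source.
    succ = {k: [m for m in range(len(nodes))
                if m != k and nodes[m] & reachable(delta, min(nodes[k]))]
            for k in range(len(nodes))}
    memo = {}

    def chain(k):  # longest chain of cycles starting with cycle k
        if k not in memo:
            memo[k] = 1 + max((chain(m) for m in succ[k]), default=0)
        return memo[k]

    return max((chain(k) for k in range(len(nodes))), default=0)


# Hypothetical source for {0^i 1^j}: a loop labeled 0, then a loop labeled 1.
print(cyclic_length({0: {"0": 0, "1": 1}, 1: {"1": 1}}, 0))  # prints 2
```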

Let $I$ be a simple source, $C$ be an elementary cycle of the source $I$, and $v$ be a node of the cycle $C$. Beginning with the node $v$, the cycle $C$ generates an infinite periodic word over the alphabet $\Sigma$. This word will be denoted by $W(I,C,v)$. We denote by $r(I,C,v)$ the minimum period of the word $W(I,C,v)$. The source $I$ is called dependent if there exist two different elementary cycles $C_{1}$ and $C_{2}$ of the source $I$, nodes $v_{1}$ and $v_{2}$ of the cycles $C_{1}$ and $C_{2}$, respectively, and a path $\pi$ of the source $I$ from $v_{1}$ to $v_{2}$, which satisfy the following conditions: $W(I,C_{1},v_{1})=W(I,C_{2},v_{2})$ and the length of the path $\pi$ is divisible by $r(I,C_{1},v_{1})$. If the source $I$ is not dependent, then it is called independent. The next theorem follows immediately from Theorem 2.1 of [5].
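The dependence condition can also be tested mechanically once the elementary cycles are known. In the following Python sketch (our own illustration; the helper names are hypothetical), each elementary cycle is supplied as a list of `(node, letter)` edges. Two infinite periodic words $x^{\omega}$ and $y^{\omega}$ are compared using the standard fact that $x^{\omega}=y^{\omega}$ if and only if $xy=yx$; the minimum period of $x^{\omega}$ is the length of the shortest word $z$ with $x=z^{k}$; and a path from $v_{1}$ to $v_{2}$ whose length is divisible by $r$ is found by breadth-first search over pairs (node, length mod $r$).

```python
from collections import deque


def min_period(x):
    """Minimum period of the infinite word x x x ...: the length of the
    shortest word z such that x = z^k."""
    n = len(x)
    return next(p for p in range(1, n + 1) if n % p == 0 and x[:p] * (n // p) == x)


def rotations(cycle):
    """Pairs (v, word read once around the cycle starting at v)."""
    letters = [a for _, a in cycle]
    return [(cycle[k][0], "".join(letters[k:] + letters[:k])) for k in range(len(cycle))]


def has_path_with_length_divisible(delta, v1, v2, r):
    """Is there a path from v1 to v2 whose length is a multiple of r?"""
    start = (v1, 0)
    seen, queue = {start}, deque([start])
    while queue:
        node, k = queue.popleft()
        if node == v2 and k == 0:
            return True
        for nxt in delta[node].values():
            state = (nxt, (k + 1) % r)
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


def is_dependent(delta, cycles):
    """Dependence test for a simple source with the given elementary cycles."""
    for i, c1 in enumerate(cycles):
        for j, c2 in enumerate(cycles):
            if i == j:
                continue
            for v1, x1 in rotations(c1):
                for v2, x2 in rotations(c2):
                    same_word = x1 + x2 == x2 + x1  # W(I,C1,v1) = W(I,C2,v2)
                    if same_word and has_path_with_length_divisible(
                            delta, v1, v2, min_period(x1)):
                        return True
    return False


# Hypothetical source for {0^i 1^j 0^k : j in {0, 1}}: two loops labeled 0.
D4 = {0: {"0": 0, "1": 1}, 1: {"0": 1}}
print(is_dependent(D4, [[(0, "0")], [(1, "0")]]))  # prints True
```

The divisibility condition matters: if the two cycles generating $0101\cdots$ are joined only by paths of odd length, the source is independent.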

Theorem 1.

Let $L$ be a nonempty regular factorial language over the alphabet $\Sigma$ and $I$ be a t-reduced source that generates the language $L$. Then the following statements hold:

(a) If $I$ is an independent simple source and $cl(I)\leq 1$, then $H_{L}^{rd}(n)=O(1)$.

(b) If $I$ is an independent simple source and $cl(I)\geq 2$, then $H_{L}^{rd}(n)=\Theta(\log n)$.

(c) If $I$ is not an independent simple source, then $H_{L}^{rd}(n)=\Theta(n)$.

3.2 Decision Trees Solving Recognition Problem Nondeterministically

Let $L$ be a nonempty regular factorial language over the alphabet $\Sigma$. For any natural $n$, we define a parameter $T_{L}(n)$ of the language $L$. If $L(n)=\emptyset$, then $T_{L}(n)=0$. Let $L(n)\neq\emptyset$, $w=a_{1}\cdots a_{n}\in L(n)$, and $J\subseteq\{1,\ldots,n\}$. Denote $L(w,J)=\{b_{1}\cdots b_{n}\in L(n):b_{j}=a_{j},j\in J\}$ (if $J=\emptyset$, then $L(w,J)=L(n)$) and $M_{L}(n,w)=\min\{|J|:J\subseteq\{1,\ldots,n\},|L(w,J)|=1\}$. Then

$$T_{L}(n)=\max\{M_{L}(n,w):w\in L(n)\}.$$

Note that, for any word $w\in L(n)$, $M_{L}(n,w)$ is the minimum number of letters of the word $w$ that allow us to distinguish it from all other words belonging to $L(n)$. One can show that $h_{L}^{ra}(n)=T_{L}(n)$.
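Since $h_{L}^{ra}(n)=T_{L}(n)$, the nondeterministic recognition depth can be computed for small $n$ directly from the definition of $M_{L}(n,w)$. The following brute-force Python sketch is our own illustration (exponential in $n$); the language is given by a membership predicate, and positions are numbered from $0$ in the code. For the language of words with at most one letter $1$ (the language $L_{I_{4}}$ of Section 4.1), one expects $T_{L}(n)=n$, since the word $0^{n}$ must be separated from every word with a single $1$.

```python
from itertools import combinations, product


def language_slice(in_lang, n, alphabet):
    """L(n): the words of length n that belong to the language."""
    return [w for w in ("".join(p) for p in product(alphabet, repeat=n)) if in_lang(w)]


def M(words_n, w):
    """M_L(n, w): fewest positions J such that w is the only word of
    L(n) = words_n agreeing with w on J."""
    n = len(w)
    for size in range(n + 1):
        for J in combinations(range(n), size):
            agreeing = [u for u in words_n if all(u[j] == w[j] for j in J)]
            if len(agreeing) == 1:
                return size
    raise ValueError("w must belong to words_n")


def T(in_lang, n, alphabet):
    """T_L(n) = max of M_L(n, w) over w in L(n), and 0 if L(n) is empty."""
    words_n = language_slice(in_lang, n, alphabet)
    return max((M(words_n, w) for w in words_n), default=0)


print(T(lambda w: w.count("1") <= 1, 4, "01"))  # prints 4
```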

Theorem 2.

Let $L$ be a nonempty regular factorial language over the alphabet $\Sigma$ and $I=(G,q_{0},Q)$ be a t-reduced source that generates the language $L$. Then the following statements hold:

(a) If $I$ is an independent simple source, then $H_{L}^{ra}(n)=O(1)$.

(b) If $I$ is not an independent simple source, then $H_{L}^{ra}(n)=\Theta(n)$.

Proof.

(a) Let $I$ be an independent simple source and $cl(I)\leq 1$. By Theorem 1, $H_{L}^{rd}(n)=O(1)$. It is clear that $H_{L}^{ra}(n)\leq H_{L}^{rd}(n)$. Therefore, $H_{L}^{ra}(n)=O(1)$.

Let $I$ be an independent simple source and $cl(I)\geq 2$. Let $n$ be a natural number. If $L(n)=\emptyset$, then $T_{L}(n)=0$. Let $L(n)\neq\emptyset$. Denote by $d$ the number of nodes in the graph $G$. In the proof of Lemma 4.5 of [5], it was proved that $M_{L}(n,w)\leq d(4d+1)$ for any word $w\in L(n)$. Therefore, $T_{L}(n)\leq d(4d+1)$. Thus, $h_{L}^{ra}(n)\leq d(4d+1)$ for any natural $n$, and $H_{L}^{ra}(n)=O(1)$.

(b) Let $I$ be a source that is not simple, and let $C_{1},C_{2}$ be different elementary cycles of the source $I$ that have a common node $v$. Since $I$ is a t-reduced source, it contains a path $\xi$ from the node $q_{0}$ to the node $v$, and $v$ is a terminal node. Let the length of the path $\xi$ be equal to $a$, the length of the cycle $C_{1}$ be equal to $b$, and the length of the cycle $C_{2}$ be equal to $c$. Let $\alpha$ be the word generated by the path $\xi$, $\beta$ be the word generated by the path from $v$ to $v$ obtained by passing $c$ times along the cycle $C_{1}$, and $\gamma$ be the word generated by the path from $v$ to $v$ obtained by passing $b$ times along the cycle $C_{2}$. The words $\beta$ and $\gamma$ are different (they are generated by different paths starting in the same node) and have the same length $bc$.

Consider the sequence of numbers $n_{i}=a+ibc$, $i=1,2,\ldots$. Let $i\in\omega\setminus\{0\}$. The set $L(n_{i})$ contains the word $\alpha\gamma^{i}$ and the words $\alpha\gamma^{j}\beta\gamma^{i-j-1}$ for $j=0,\ldots,i-1$. It is easy to show that $M_{L}(n_{i},\alpha\gamma^{i})\geq i$: to distinguish the word $\alpha\gamma^{i}$ from the words $\alpha\gamma^{j}\beta\gamma^{i-j-1}$, $j=0,\ldots,i-1$, we need to use at least one letter from each of the $i$ words $\gamma$ appearing in $\alpha\gamma^{i}$. Therefore, $T_{L}(n_{i})\geq i$ and $h_{L}^{ra}(n_{i})\geq i=(n_{i}-a)/(bc)$. Let $n\geq n_{1}$ and let $i$ be the maximum natural number such that $n\geq n_{i}$. Evidently, $n-n_{i}\leq bc$. Hence, $H_{L}^{ra}(n)\geq h_{L}^{ra}(n_{i})\geq(n-bc-a)/(bc)$. Therefore, $H_{L}^{ra}(n)\geq n/(2bc)$ for large enough $n$ (namely, for $n\geq 2(a+bc)$). The inequality $H_{L}^{ra}(n)\leq n$ is obvious. Thus, $H_{L}^{ra}(n)=\Theta(n)$.
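The counting argument can be checked numerically on a small example of our own choosing. Take the source for the words over $\{0,1\}$ without the factor $00$ with the initial node $A$, a second node $B$, and edges $A\xrightarrow{1}A$, $A\xrightarrow{0}B$, $B\xrightarrow{1}A$ (all nodes terminal). The cycle $C_{1}$ (the loop labeled $1$) and the cycle $C_{2}$ (from $A$ through $B$ back to $A$, generating $01$) share the node $v=A=q_{0}$, so $a=0$, $b=1$, $c=2$, $\alpha=\lambda$, $\beta=11$, $\gamma=01$, and $n_{i}=2i$. The Python sketch below builds the words $\alpha\gamma^{i}$ and $\alpha\gamma^{j}\beta\gamma^{i-j-1}$, checks that they belong to $L(n_{i})$, and computes $M_{L}(n_{i},\alpha\gamma^{i})$ by brute force.

```python
from itertools import combinations, product


def in_lang(w):
    """Membership in the example language: no factor 00."""
    return "00" not in w


def M(n, w):
    """M_L(n, w) by brute force over position sets J of increasing size."""
    words_n = [u for u in ("".join(p) for p in product("01", repeat=n)) if in_lang(u)]
    for size in range(n + 1):
        for J in combinations(range(n), size):
            if sum(all(u[j] == w[j] for j in J) for u in words_n) == 1:
                return size


alpha, beta, gamma = "", "11", "01"  # a = 0, b = 1, c = 2 in the notation above
for i in range(1, 5):
    target = alpha + gamma * i
    rivals = [alpha + gamma * j + beta + gamma * (i - j - 1) for j in range(i)]
    assert all(in_lang(u) for u in [target] + rivals)
    m = M(len(target), target)
    assert m >= i  # the bound M_L(n_i, alpha gamma^i) >= i from the proof
    print(i, len(target), m)
```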

Let $I$ be a dependent simple source. Then there exist two different elementary cycles $C_{1}$ and $C_{2}$ of the source $I$, nodes $v_{1}$ and $v_{2}$ of the cycles $C_{1}$ and $C_{2}$, respectively, and a path $\pi$ of the source $I$ from $v_{1}$ to $v_{2}$, which satisfy the following conditions: $W(I,C_{1},v_{1})=W(I,C_{2},v_{2})$ and the length of the path $\pi$ is divisible by $r(I,C_{1},v_{1})$. Recall that, for $i=1,2$, $W(I,C_{i},v_{i})$ is the infinite periodic word over the alphabet $\Sigma$ generated by the cycle $C_{i}$ beginning with the node $v_{i}$, and $r(I,C_{1},v_{1})$ is the minimum period of the word $W(I,C_{1},v_{1})$. Since $I$ is a t-reduced source, it contains a path $\xi$ from the node $q_{0}$ to the node $v_{1}$, and all nodes of the graph $G$ are terminal. Let the path $\xi$ generate the word $\alpha$ of length $a$. Denote $r=r(I,C_{1},v_{1})$. Let the length of the cycle $C_{1}$ be equal to $br$ (this length is a period of $W(I,C_{1},v_{1})$ and is therefore a multiple of $r$), the length of the path $\pi$ be equal to $cr$, and the path $\pi$ generate the word $\beta$. Denote by $\gamma$ the prefix of length $r$ of the word $W(I,C_{1},v_{1})$. We now define two words of length $rbc$: $u=\gamma^{bc}$ and $w=\beta\gamma^{c(b-1)}$. It is clear that $u\neq w$: both words are generated by paths starting in $v_{1}$, the path generating $u$ stays on $C_{1}$, and the path generating $w$ passes through the node $v_{2}$, which does not belong to $C_{1}$.

Consider the sequence of numbers $n_{i}=a+irbc$, $i=1,2,\ldots$. Let $i\in\omega\setminus\{0\}$. The set $L(n_{i})$ contains the word $\alpha u^{i}$ and the words $\alpha u^{j}wu^{i-j-1}$ for $j=0,\ldots,i-1$. It is easy to show that $M_{L}(n_{i},\alpha u^{i})\geq i$: to distinguish the word $\alpha u^{i}$ from the words $\alpha u^{j}wu^{i-j-1}$, $j=0,\ldots,i-1$, we need to use at least one letter from each of the $i$ words $u$ appearing in $\alpha u^{i}$. Therefore, $T_{L}(n_{i})\geq i$ and $h_{L}^{ra}(n_{i})\geq i=(n_{i}-a)/(rbc)$. Let $n\geq n_{1}$ and let $i$ be the maximum natural number such that $n\geq n_{i}$. Evidently, $n-n_{i}\leq rbc$. Hence, $H_{L}^{ra}(n)\geq h_{L}^{ra}(n_{i})\geq(n-rbc-a)/(rbc)$. Therefore, $H_{L}^{ra}(n)\geq n/(2rbc)$ for large enough $n$. The inequality $H_{L}^{ra}(n)\leq n$ is obvious. Thus, $H_{L}^{ra}(n)=\Theta(n)$. ∎
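The construction for a dependent simple source can be illustrated in the same way on a hypothetical example. Consider the source with nodes $0,\ldots,4$, the cycle $C_{1}$ given by $0\xrightarrow{0}1\xrightarrow{1}0$, the cycle $C_{2}$ given by $2\xrightarrow{0}3\xrightarrow{1}2$, and the path $\pi$ given by $0\xrightarrow{1}4\xrightarrow{1}2$, with $q_{0}=v_{1}=0$, $v_{2}=2$, and all nodes terminal. Here $W(I,C_{1},v_{1})=W(I,C_{2},v_{2})=0101\cdots$, so $r=2$, $b=1$, $c=1$, $\alpha=\lambda$, $\beta=11$, $\gamma=01$, $u=01$, and $w=11$. The Python sketch checks that all words $\alpha u^{i}$ and $\alpha u^{j}wu^{i-j-1}$ are generated by the source and that each of the latter differs from $\alpha u^{i}$ only inside the block $u$ with index $j$ (counting from $0$), which is the fact behind $M_{L}(n_{i},\alpha u^{i})\geq i$.

```python
DELTA = {0: {"0": 1, "1": 4}, 1: {"1": 0}, 2: {"0": 3}, 3: {"1": 2}, 4: {"1": 2}}
Q0 = 0


def generated(word):
    """Run the source from q0; all nodes are terminal, so a word is generated
    exactly when the run never gets stuck."""
    node = Q0
    for letter in word:
        if letter not in DELTA[node]:
            return False
        node = DELTA[node][letter]
    return True


r, b, c = 2, 1, 1
alpha, beta, gamma = "", "11", "01"
u = gamma * (b * c)
w = beta + gamma * (c * (b - 1))
assert len(u) == len(w) == r * b * c and u != w

for i in range(1, 6):
    target = alpha + u * i
    assert generated(target) and len(target) == len(alpha) + i * r * b * c
    for j in range(i):
        rival = alpha + u * j + w + u * (i - j - 1)
        assert generated(rival)
        # All positions where rival and target differ lie in block j.
        diff = [p for p in range(len(target)) if target[p] != rival[p]]
        block = range(len(alpha) + j * len(u), len(alpha) + (j + 1) * len(u))
        assert diff and all(p in block for p in diff)
```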

Note that in the general case (when we consider not only factorial languages), the classification of reduced sources depending on the minimum depth of decision trees solving the recognition problem nondeterministically is more complicated [4]. In particular, there exists a dependent simple reduced source $I_{0}$ (see Fig. 1), with the initial node labeled with the symbol $+$ and the unique terminal node labeled with the symbol $*$, that generates the regular language $L_{0}=\{0^{i}10^{j}:i,j\in\omega\}$ over the alphabet $\{0,1\}$, which is not factorial and for which $H_{L_{0}}^{ra}(n)=O(1)$.

Figure 1: Source $I_{0}$

3.3 Decision Trees Solving Membership Problem

For a regular factorial language $L$ over the alphabet $\Sigma$, we denote by $L^{C}$ its complementary language $\Sigma^{\ast}\setminus L$. The notation $|L|=\infty$ means that $L$ is an infinite language, and the notation $|L|<\infty$ means that $L$ is a finite language.

Theorem 3.

Let $L$ be a regular factorial language over the alphabet $\Sigma$.

(a) If $|L|=\infty$ and $L^{C}\neq\emptyset$, then $H_{L}^{md}(n)=\Theta(n)$ and $H_{L}^{ma}(n)=\Theta(n)$.

(b) If $|L|<\infty$ or $L^{C}=\emptyset$, then $H_{L}^{md}(n)=O(1)$ and $H_{L}^{ma}(n)=O(1)$.

Proof.

It is clear that $h_{L}^{ma}(n)\leq h_{L}^{md}(n)$ for any natural $n$.

(a) Let $|L|=\infty$, $L^{C}\neq\emptyset$, and $w_{0}$ be a word of minimum length from $L^{C}$. Denote by $t$ the length of $w_{0}$. Since $|L|=\infty$ and $L$ is factorial, $L(n)\neq\emptyset$ for any natural $n$. Let $n$ be a natural number such that $n>t$ and $\Gamma$ be a decision tree over $L(n)$ that solves the problem of membership for $L(n)$ nondeterministically and has the minimum depth. Let $w\in L(n)$ and $\xi$ be a complete path in $\Gamma$ such that $w\in\Sigma(n,\xi)$. Then the terminal node of $\xi$ is labeled with the number $1$. Beginning with the first letter, we divide the word $w$ into $\lfloor n/t\rfloor$ blocks with $t$ letters in each and the suffix of length $n-t\lfloor n/t\rfloor$. Let us assume that the number of nodes labeled with attributes in $\xi$ is less than $\lfloor n/t\rfloor$. Then there is a block such that the queries (attributes) attached to the nodes of $\xi$ do not ask about letters from this block. We replace this block in the word $w$ with the word $w_{0}$ and denote by $w^{\prime}$ the obtained word. Since $w^{\prime}$ contains the factor $w_{0}\notin L$ and $L$ is factorial, $w^{\prime}\notin L$; moreover, $w^{\prime}\in\Sigma(n,\xi)$. This is impossible since the terminal node of the path $\xi$ is labeled with the number $1$. Therefore, the depth of $\Gamma$ is greater than or equal to $\lfloor n/t\rfloor$. Thus, $h_{L}^{ma}(n)\geq\lfloor n/t\rfloor$. It is easy to construct a decision tree over $L(n)$ that solves the problem of membership for $L(n)$ deterministically and has depth $n$ (it queries all letters of the word). Therefore, $h_{L}^{md}(n)\leq n$. Thus, $H_{L}^{md}(n)=\Theta(n)$ and $H_{L}^{ma}(n)=\Theta(n)$.
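Both halves of the proof of (a) are easy to simulate. In the Python sketch below (our own illustration; the helper names are hypothetical), `fool` performs the replacement step: given a word $w$, the set of positions queried on a path (numbered from $0$), and a shortest forbidden word $w_{0}$, it returns a word $w^{\prime}$ that agrees with $w$ on all queried positions but contains $w_{0}$, or `None` when every full block is queried. The function `decide_by_reading_everything` plays the role of the trivial deterministic tree of depth at most $n$, realized by running a t-reduced source on the queried letters.

```python
def fool(word, queried, w0):
    """Replace a length-t block of word that contains no queried position
    with w0; return None if every full block contains a queried position."""
    t = len(w0)
    for k in range(len(word) // t):
        block = range(k * t, (k + 1) * t)
        if not any(p in queried for p in block):
            return word[:k * t] + w0 + word[(k + 1) * t:]
    return None


def decide_by_reading_everything(word, delta, q0):
    """Membership with at most n queries: read the letters one by one and run
    the source (all nodes terminal); answer 1 or 0."""
    node = q0
    for i in range(len(word)):
        letter = word[i]  # the answer to the query l_{i+1}^n
        if letter not in delta[node]:
            return 0
        node = delta[node][letter]
    return 1


# L(00) with w0 = 00 and t = 2: two queries cannot cover floor(7/2) = 3 blocks.
DELTA = {"A": {"0": "B", "1": "A"}, "B": {"1": "A"}}
w_prime = fool("1111111", {0, 3}, "00")
print(w_prime, decide_by_reading_everything(w_prime, DELTA, "A"))  # prints 1111001 0
```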

(b) Let $|L|<\infty$. Then there exists a natural $m$ such that $L(n)=\emptyset$ for any natural $n\geq m$. Therefore, for each natural $n\geq m$, $h_{L}^{md}(n)=0$ and $h_{L}^{ma}(n)=0$. Thus, $H_{L}^{md}(n)=O(1)$ and $H_{L}^{ma}(n)=O(1)$.

Let $L^{C}=\emptyset$, $n$ be a natural number, and $\Gamma$ be a decision tree over $L(n)$ that consists of the root, a terminal node labeled with $1$, and an edge that leaves the root and enters the terminal node. One can show that $\Gamma$ solves the problem of membership for $L(n)$ deterministically and has depth $0$. Therefore, $h_{L}^{md}(n)=0$ and $h_{L}^{ma}(n)=0$. Thus, $H_{L}^{md}(n)=O(1)$ and $H_{L}^{ma}(n)=O(1)$. ∎

4 Corollaries

In this section, we consider two corollaries of Theorems 1, 2, and 3.

4.1 Joint Behavior of Functions $H_{L}^{ra}$, $H_{L}^{rd}$, $H_{L}^{ma}$, and $H_{L}^{md}$

In this section, we assume that each regular factorial language over the alphabet $\Sigma$ is given by a t-reduced source $I$ that generates the considered language, denoted by $L_{I}$. To study all possible types of joint behavior of the functions $H_{L_{I}}^{rd}$, $H_{L_{I}}^{ra}$, $H_{L_{I}}^{md}$, and $H_{L_{I}}^{ma}$, we consider five classes of regular factorial languages $\mathcal{F}_{1},\ldots,\mathcal{F}_{5}$ described in columns 2–4 of Table 1. In particular, $\mathcal{F}_{1}$ consists of all regular factorial languages $L_{I}$ for which the source $I$ is an independent simple source and $cl(I)=0$. It is easy to show that the complexity classes $\mathcal{F}_{1},\ldots,\mathcal{F}_{5}$ are pairwise disjoint and that each regular factorial language $L_{I}$ belongs to one of these classes. The behavior of the functions $H_{L_{I}}^{rd}$, $H_{L_{I}}^{ra}$, $H_{L_{I}}^{md}$, and $H_{L_{I}}^{ma}$ for languages from these classes is described in the last four columns of Table 1. For each class, the results in Table 1 for the functions $H_{L_{I}}^{rd}$ and $H_{L_{I}}^{ra}$ follow directly from Theorems 1 and 2.

Table 1: Complexity classes $\mathcal{F}_{1},\ldots,\mathcal{F}_{5}$

| Class | $I$ is an independent simple source | $cl(I)$ | $L_{I}^{C}$ | $H_{L_{I}}^{rd}$ | $H_{L_{I}}^{ra}$ | $H_{L_{I}}^{md}$ | $H_{L_{I}}^{ma}$ |
|---|---|---|---|---|---|---|---|
| $\mathcal{F}_{1}$ | Yes | $=0$ | | $O(1)$ | $O(1)$ | $O(1)$ | $O(1)$ |
| $\mathcal{F}_{2}$ | Yes | $=1$ | | $O(1)$ | $O(1)$ | $\Theta(n)$ | $\Theta(n)$ |
| $\mathcal{F}_{3}$ | Yes | $\geq 2$ | | $\Theta(\log n)$ | $O(1)$ | $\Theta(n)$ | $\Theta(n)$ |
| $\mathcal{F}_{4}$ | No | | $\neq\emptyset$ | $\Theta(n)$ | $\Theta(n)$ | $\Theta(n)$ | $\Theta(n)$ |
| $\mathcal{F}_{5}$ | No | | $=\emptyset$ | $\Theta(n)$ | $\Theta(n)$ | $O(1)$ | $O(1)$ |
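Table 1 can be read as a small decision procedure. The Python sketch below (the encoding and names are ours) maps the three source properties of columns 2–4 to a class and looks up the four growth rates. For a t-reduced source, $L_{I}^{C}=\emptyset$ holds exactly when every node has an outgoing edge for every letter (every node is reachable from $q_{0}$ and terminal), which `complement_is_empty` checks; whether $I$ is an independent simple source and the value of $cl(I)$ are assumed to be computed separately.

```python
# Growth of (H^rd, H^ra, H^md, H^ma) for each class, copied from Table 1.
GROWTH = {
    "F1": ("O(1)", "O(1)", "O(1)", "O(1)"),
    "F2": ("O(1)", "O(1)", "Theta(n)", "Theta(n)"),
    "F3": ("Theta(log n)", "O(1)", "Theta(n)", "Theta(n)"),
    "F4": ("Theta(n)", "Theta(n)", "Theta(n)", "Theta(n)"),
    "F5": ("Theta(n)", "Theta(n)", "O(1)", "O(1)"),
}


def complement_is_empty(delta, alphabet):
    """For a t-reduced source: L_I = Sigma* iff no node misses a letter."""
    return all(set(out) == set(alphabet) for out in delta.values())


def complexity_class(independent_simple, cl, complement_empty):
    """Columns 2-4 of Table 1: cl matters only for independent simple
    sources, and complement_empty only for the remaining ones."""
    if independent_simple:
        return "F1" if cl == 0 else "F2" if cl == 1 else "F3"
    return "F5" if complement_empty else "F4"


# A one-node source with loops labeled 0 and 1 generates E* and is not simple.
cls = complexity_class(False, None, complement_is_empty({0: {"0": 0, "1": 0}}, "01"))
print(cls, GROWTH[cls])  # prints F5 ('Theta(n)', 'Theta(n)', 'O(1)', 'O(1)')
```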

We now consider the behavior of the functions $H_{L_{I}}^{md}$ and $H_{L_{I}}^{ma}$ for each of the classes $\mathcal{F}_{1},\ldots,\mathcal{F}_{5}$. Let $I=(G,q_{0},Q)$ be a t-reduced source over the alphabet $\Sigma$ that generates a regular factorial language.

Let $L_{I}\in\mathcal{F}_{1}$. Since $cl(I)=0$ and each node of the t-reduced source $I$ lies on a path from $\Xi(I)$, $G$ is a directed acyclic graph, and the language $L_{I}$ is finite. Using Theorem 3, we obtain $H_{L_{I}}^{md}(n)=O(1)$ and $H_{L_{I}}^{ma}(n)=O(1)$.

Let $L_{I}\in\mathcal{F}_{2}$. Since $cl(I)=1$, $G$ contains a cycle, and the language $L_{I}$ is infinite. By Lemma 4.2 of [5], $|L_{I}(n)|=O(1)$. Since $|\Sigma|\geq 2$, the number $|\Sigma^{n}|=|\Sigma|^{n}$ is unbounded, so $L_{I}(n)\neq\Sigma^{n}$ for large enough $n$ and, therefore, $L_{I}^{C}\neq\emptyset$. Using Theorem 3, we obtain $H_{L_{I}}^{md}(n)=\Theta(n)$ and $H_{L_{I}}^{ma}(n)=\Theta(n)$.

Let $L_{I}\in\mathcal{F}_{3}$. Since $cl(I)\geq 2$, $G$ contains a cycle, and the language $L_{I}$ is infinite. By Lemma 4.2 of [5], $|L_{I}(n)|=O(n^{cl(I)})$. This bound is polynomial in $n$, while $|\Sigma^{n}|=|\Sigma|^{n}$ grows exponentially; therefore, $L_{I}^{C}\neq\emptyset$. Using Theorem 3, we obtain $H_{L_{I}}^{md}(n)=\Theta(n)$ and $H_{L_{I}}^{ma}(n)=\Theta(n)$.

Let $L_{I}\in\mathcal{F}_{4}$. Since $I$ is not an independent simple source, $G$ contains a cycle, and the language $L_{I}$ is infinite. By the definition of the class $\mathcal{F}_{4}$, $L_{I}^{C}\neq\emptyset$. Using Theorem 3, we obtain $H_{L_{I}}^{md}(n)=\Theta(n)$ and $H_{L_{I}}^{ma}(n)=\Theta(n)$.

Let $L_{I}\in\mathcal{F}_{5}$. Then $L_{I}^{C}=\emptyset$. Using Theorem 3, we obtain $H_{L_{I}}^{md}(n)=O(1)$ and $H_{L_{I}}^{ma}(n)=O(1)$.

We now show that the classes $\mathcal{F}_{1},\ldots,\mathcal{F}_{5}$ are nonempty. For simplicity, we assume that $\Sigma=E$, where $E=\{0,1\}$. It is easy to generalize the considered examples to the case of an arbitrary finite alphabet $\Sigma$ with at least two letters. In the examples of sources, the initial node is labeled with the symbol $+$, and all nodes are terminal.

Figure 2: Source $I_{1}$

Denote by $I_{1}$ the source over the alphabet $E$ depicted in Fig. 2. One can show that $I_{1}$ is an independent simple t-reduced source and $cl(I_{1})=0$. This source generates the language $L_{I_{1}}=\{\lambda,0\}$, which is factorial. Therefore $L_{I_{1}}\in\mathcal{F}_{1}$.

Figure 3: Source $I_{2}$

Denote by $I_{2}$ the source over the alphabet $E$ depicted in Fig. 3. One can show that $I_{2}$ is an independent simple t-reduced source and $cl(I_{2})=1$. This source generates the language $L_{I_{2}}=\{0^{i}:i\in\omega\}$, which is factorial. Therefore $L_{I_{2}}\in\mathcal{F}_{2}$.

Figure 4: Source $I_{3}$

Denote by $I_{3}$ the source over the alphabet $E$ depicted in Fig. 4. One can show that $I_{3}$ is an independent simple t-reduced source and $cl(I_{3})=2$. This source generates the language $L_{I_{3}}=\{0^{i}1^{j}:i,j\in\omega\}$, which is factorial. Therefore $L_{I_{3}}\in\mathcal{F}_{3}$.

Figure 5: Source $I_{4}$

Denote by $I_{4}$ the source over the alphabet $E$ depicted in Fig. 5. One can show that $I_{4}$ is a dependent simple t-reduced source generating the language $L_{I_{4}}=\{0^{i}1^{j}0^{k}:i,k\in\omega,\ j\in\{0,1\}\}$, which is factorial. It is clear that $L_{I_{4}}^{C}\neq\emptyset$. Therefore $L_{I_{4}}\in\mathcal{F}_{4}$.

Figure 6: Source $I_{5}$

Denote by $I_{5}$ the source over the alphabet $E$ depicted in Fig. 6. One can show that $I_{5}$ is a t-reduced source that is not simple. This source generates the language $L_{I_{5}}=E^{\ast}$, which is factorial. It is clear that $L_{I_{5}}^{C}=\emptyset$. Therefore $L_{I_{5}}\in\mathcal{F}_{5}$.
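The five example languages are also given above by explicit descriptions, so their factoriality can be spot-checked by brute force. The Python sketch below is only an illustration under stated assumptions: each language is encoded as a regular expression transcribed from its textual description (not from Figs. 2 to 6), and the names `EXAMPLES`, `is_factorial`, and `count` are hypothetical. It checks closure under factors for all words up to a fixed length and counts $|L_{I_k}(n)|$ for small $n$.

```python
import re
from itertools import product

# Regular expressions transcribed from the textual descriptions of
# L_{I_1}, ..., L_{I_5} (an assumption: the figures are not reproduced here).
EXAMPLES = {
    1: r"0?",       # {lambda, 0}
    2: r"0*",       # {0^i : i in omega}
    3: r"0*1*",     # {0^i 1^j : i, j in omega}
    4: r"0*1?0*",   # {0^i 1^j 0^k : i, k in omega, j in {0, 1}}
    5: r"[01]*",    # E^*
}


def words(max_len):
    """Yield all words over E = {0, 1} of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            yield "".join(letters)


def factors(w):
    """All factors (contiguous subwords) of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def is_factorial(pattern, max_len):
    """Check that every factor of every member of length <= max_len is a member."""
    member = re.compile(pattern).fullmatch
    return all(
        all(member(f) for f in factors(w))
        for w in words(max_len)
        if member(w)
    )


def count(pattern, n):
    """|L(n)|: the number of members of length exactly n."""
    member = re.compile(pattern).fullmatch
    return sum(1 for letters in product("01", repeat=n) if member("".join(letters)))


if __name__ == "__main__":
    for k, pattern in EXAMPLES.items():
        sizes = [count(pattern, n) for n in range(7)]
        print(f"L_I{k}: factorial={is_factorial(pattern, 8)}, |L(n)|, n=0..6: {sizes}")
```

The counts are bounded for $L_{I_{2}}$ and grow linearly for $L_{I_{3}}$, which is consistent with the bounds from Lemma 4.2 [5] quoted above; a finite check of this kind is, of course, not a proof of factoriality.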

A regular factorial language $L$ can be generated by different t-reduced sources. However, for each such source $I$, the language $L_{I}=L$ belongs to the same complexity class. Assume the contrary: there exist a regular factorial language $L$ and two t-reduced sources $I_{1}$ and $I_{2}$ that generate it and for which the languages $L_{I_{1}}$ and $L_{I_{2}}$ belong to different complexity classes. Then, for some pair $bc\in\{rd,ra,md,ma\}$, the functions $H_{L_{I_{1}}}^{bc}$ and $H_{L_{I_{2}}}^{bc}$ have different behavior. This is impossible: since $L_{I_{1}}=L_{I_{2}}=L$, we have $H_{L_{I_{1}}}^{bc}(n)=H_{L_{I_{2}}}^{bc}(n)$ for any natural $n$.

4.2 Languages Over Alphabet $\{0,1\}$ Given by One Forbidden Word

Let $E=\{0,1\}$, $\alpha\in E^{\ast}$, and $\alpha\neq\lambda$. We denote by $L(\alpha)$ the language over the alphabet $E$ that consists of all words from $E^{\ast}$ that do not contain $\alpha$ as a factor. This is a regular factorial language with $\mathit{MF}(L(\alpha))=\{\alpha\}$. The following theorem indicates, for each nonempty word $\alpha\in E^{\ast}$, the complexity class $\mathcal{F}_{i}$ to which the language $L(\alpha)$ belongs.

Theorem 4.

Let $\alpha\in E^{\ast}$ and $\alpha\neq\lambda$.

(a) If $\alpha\in\{0,1\}$, then $L(\alpha)\in\mathcal{F}_{2}$.

(b) If $\alpha\in\{01,10\}$, then $L(\alpha)\in\mathcal{F}_{3}$.

(c) If $\alpha\notin\{0,1,01,10\}$, then $L(\alpha)\in\mathcal{F}_{4}$.
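The claim $\mathit{MF}(L(\alpha))=\{\alpha\}$ made before Theorem 4 can be spot-checked by brute force. The sketch below assumes the usual notion of a minimal forbidden word of a factorial language $L$: a word outside $L$ all of whose proper factors lie in $L$. Because $L(\alpha)$ is factorial, it suffices to test the two longest proper factors, obtained by deleting the first or the last letter. The helper names are hypothetical.

```python
from itertools import product


def in_L(alpha, w):
    """Membership in L(alpha): w does not contain alpha as a factor."""
    return alpha not in w


def minimal_forbidden(alpha, max_len):
    """Minimal forbidden words of L(alpha) of length at most max_len.

    w is minimal forbidden if w is not in L(alpha) but every proper factor
    of w is; since L(alpha) is factorial, checking w[1:] and w[:-1] suffices.
    """
    result = set()
    for n in range(1, max_len + 1):
        for letters in product("01", repeat=n):
            w = "".join(letters)
            if not in_L(alpha, w) and in_L(alpha, w[1:]) and in_L(alpha, w[:-1]):
                result.add(w)
    return result


if __name__ == "__main__":
    for alpha in ["0", "01", "00", "010", "0110"]:
        print(alpha, minimal_forbidden(alpha, len(alpha) + 3))
```

A word containing $\alpha$ whose two longest proper factors avoid $\alpha$ must have its occurrence of $\alpha$ span the whole word, which is why the search only ever returns $\{\alpha\}$.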

We now describe a t-reduced source $I(\alpha)$ that generates the language $L(\alpha)$ for a nonempty word $\alpha\in E^{\ast}$. Let $\alpha=a_{1}\cdots a_{n}$, $\alpha_{0}=\lambda$, and $\alpha_{i}=a_{1}\cdots a_{i}$ for $i=1,\ldots,n-1$. The set $P(\alpha)=\{\alpha_{0},\alpha_{1},\ldots,\alpha_{n-1}\}$ is the set of all proper prefixes of the word $\alpha$. Then $I(\alpha)=(G,q_{0},Q)$, where the set of nodes of the graph $G$ is equal to $P(\alpha)$, $q_{0}=\alpha_{0}$, and $Q=P(\alpha)$. For $i=0,\ldots,n-2$, an edge labeled with the letter $a_{i+1}$ leaves the node $\alpha_{i}$ and enters the node $\alpha_{i+1}$. For $i=0,\ldots,n-1$, an edge labeled with the letter $\bar{a}_{i+1}$ leaves the node $\alpha_{i}$ and enters the node $\alpha_{j}$, where $\alpha_{j}$ is the longest suffix of the word $\alpha_{i}\bar{a}_{i+1}$ that belongs to $P(\alpha)$; here $\bar{a}_{i+1}=0$ if $a_{i+1}=1$ and $\bar{a}_{i+1}=1$ if $a_{i+1}=0$. It is easy to show that $I(\alpha)$ is a t-reduced source over the alphabet $E$. From Theorem 10 [1] it follows that the source $I(\alpha)$ generates the language $L(\alpha)$.
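The construction of $I(\alpha)$ is short enough to transcribe directly. The Python sketch below is an illustration, not code from the paper: it encodes the node $\alpha_i$ by the integer $i$, stores the edges as a partial transition map, and, since all nodes are terminal, treats a word as generated exactly when it labels a path from $\alpha_0$. It then checks on short words the consequence of Theorem 10 [1] that $I(\alpha)$ generates precisely the words avoiding $\alpha$. The names `flip`, `build_source`, and `generates` are hypothetical.

```python
from itertools import product


def flip(letter):
    """The complementary letter: bar(0) = 1 and bar(1) = 0."""
    return "1" if letter == "0" else "0"


def build_source(alpha):
    """Edges of I(alpha) as a dict (i, letter) -> j; node i stands for alpha_i.

    Node 0 (the empty prefix) is initial and every node is terminal, so only
    the labeled edges need to be stored.
    """
    n = len(alpha)
    prefixes = [alpha[:i] for i in range(n)]   # P(alpha) = {alpha_0, ..., alpha_{n-1}}
    node = {p: i for i, p in enumerate(prefixes)}
    edges = {}
    for i in range(n):
        a = alpha[i]                           # the letter a_{i+1}
        if i <= n - 2:
            edges[(i, a)] = i + 1              # alpha_i --a_{i+1}--> alpha_{i+1}
        w = prefixes[i] + flip(a)              # the word alpha_i bar(a_{i+1})
        # Longest suffix of w lying in P(alpha); the empty suffix always does.
        j = next(node[w[k:]] for k in range(len(w) + 1) if w[k:] in node)
        edges[(i, flip(a))] = j
    return edges


def generates(edges, word):
    """True iff word labels a path that starts at the initial node 0."""
    state = 0
    for letter in word:
        if (state, letter) not in edges:
            return False
        state = edges[(state, letter)]
    return True


if __name__ == "__main__":
    print(build_source("00"))  # {(0, '0'): 1, (0, '1'): 0, (1, '1'): 0}
    # Brute-force comparison with L(alpha) for |alpha| <= 4 and words of length <= 8.
    for k in range(1, 5):
        for t in product("01", repeat=k):
            alpha = "".join(t)
            edges = build_source(alpha)
            for m in range(9):
                for s in product("01", repeat=m):
                    w = "".join(s)
                    assert generates(edges, w) == (alpha not in w)
    print("I(alpha) agrees with L(alpha) on all tested words")
```

The resulting graph resembles the string-matching automaton for $\alpha$ with the final state and the edge reading $a_{n}$ removed, which is why an occurrence of $\alpha$ makes the walk fail.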

Let $\alpha\in E^{\ast}\setminus\{\lambda\}$ and $\alpha=a_{1}\cdots a_{n}$. We denote by $\bar{\alpha}$ the word $\bar{a}_{1}\cdots\bar{a}_{n}$. The following statement is easy to prove.

Lemma 1.

Let $\alpha\in E^{\ast}$ and $\alpha\neq\lambda$. Then $H_{L(\bar{\alpha})}^{bc}(n)=H_{L(\alpha)}^{bc}(n)$ for any pair $bc\in\{rd,ra,md,ma\}$ and any natural $n$.
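One natural route to Lemma 1 (a sketch, not necessarily the paper's proof) uses the letter-wise complement $w\mapsto\bar{w}$. This map is a bijection of $E^{n}$ that sends $L(\alpha)(n)$ onto $L(\bar{\alpha})(n)$, so exchanging the answers $0$ and $1$ on the edges of a decision tree for one problem gives a tree of the same depth for the corresponding problem for $\bar{\alpha}$. The code below checks only the set identity, for short words; `bar` and `L_n` are hypothetical names.

```python
from itertools import product

FLIP = str.maketrans("01", "10")


def bar(w):
    """Letter-wise complement: bar(a_1 ... a_n) = bar(a_1) ... bar(a_n)."""
    return w.translate(FLIP)


def L_n(alpha, n):
    """L(alpha)(n): the words of length n over {0, 1} avoiding alpha as a factor."""
    return {"".join(t) for t in product("01", repeat=n) if alpha not in "".join(t)}


if __name__ == "__main__":
    for alpha in ["0", "01", "001", "0110"]:
        ok = all({bar(w) for w in L_n(alpha, n)} == L_n(bar(alpha), n) for n in range(8))
        print(alpha, bar(alpha), ok)
```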

Lemma 2.

Let $\alpha\in E^{\ast}\setminus\{\lambda\}$, $\beta\in E^{\ast}$, and $L(\alpha)\in\mathcal{F}_{4}$. Then $L(\alpha\beta)\in\mathcal{F}_{4}$.

Proof.

Since $L(\alpha)\in\mathcal{F}_{4}$, $H_{L(\alpha)}^{rd}(n)=\Theta(n)$ and $H_{L(\alpha)}^{ra}(n)=\Theta(n)$. Clearly $L(\alpha)\subseteq L(\alpha\beta)$, since a word that does not contain $\alpha$ as a factor cannot contain $\alpha\beta$ as a factor. Using this fact, it is not difficult to prove that $H_{L(\alpha)}^{rd}(n)\leq H_{L(\alpha\beta)}^{rd}(n)$ and $H_{L(\alpha)}^{ra}(n)\leq H_{L(\alpha\beta)}^{ra}(n)$ for any natural $n$. From here and from Theorems 1 and 2, it follows that $H_{L(\alpha\beta)}^{rd}(n)=\Theta(n)$ and $H_{L(\alpha\beta)}^{ra}(n)=\Theta(n)$.

Since $\alpha\beta\notin L(\alpha\beta)$, we have $L(\alpha\beta)^{C}\neq\emptyset$. The source $I(\alpha\beta)$ contains at least one cycle, formed by the edge that leaves and enters the node $\lambda$ and is labeled with the letter $\bar{a}_{1}$, where $a_{1}$ is the first letter of the word $\alpha$. Therefore the language $L(\alpha\beta)$ is infinite. By Theorem 3, $H_{L(\alpha\beta)}^{md}(n)=\Theta(n)$ and $H_{L(\alpha\beta)}^{ma}(n)=\Theta(n)$. Thus, $L(\alpha\beta)\in\mathcal{F}_{4}$. ∎
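The three language-level facts used in the proof of Lemma 2 can be spot-checked on short words: the inclusion $L(\alpha)\subseteq L(\alpha\beta)$, the fact that $\alpha\beta$ is forbidden, and the infinite family $\bar{a}_{1}^{m}$ traced by the loop at $\lambda$ (these words never contain $\alpha\beta$, which begins with $a_{1}$). The sketch below is illustrative only; `check_lemma2_ingredients` is a hypothetical helper and says nothing about decision-tree depth.

```python
from itertools import product


def words(n):
    """All words of length n over {0, 1}."""
    return ("".join(t) for t in product("01", repeat=n))


def avoids(w, alpha):
    """True iff w does not contain alpha as a factor, i.e. w is in L(alpha)."""
    return alpha not in w


def check_lemma2_ingredients(alpha, beta, max_n):
    ab = alpha + beta
    bar_a1 = "1" if alpha[0] == "0" else "0"
    # (i) L(alpha)(n) is a subset of L(alpha beta)(n) for n <= max_n.
    inclusion = all(
        avoids(w, ab) for n in range(max_n + 1) for w in words(n) if avoids(w, alpha)
    )
    # (ii) bar(a_1)^m avoids alpha beta, because alpha beta starts with a_1;
    #      these are the words read along the loop at lambda in I(alpha beta).
    infinite_witness = all(avoids(bar_a1 * m, ab) for m in range(max_n + 1))
    # (iii) alpha beta itself does not belong to L(alpha beta).
    forbidden = not avoids(ab, ab)
    return inclusion and infinite_witness and forbidden


if __name__ == "__main__":
    print(check_lemma2_ingredients("00", "1", 8))
```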

Proof of Theorem 4.

In each figure depicting a source $I(\alpha)$, $\alpha\in E^{\ast}\setminus\{\lambda\}$, each node is labeled with the corresponding prefix of the word $\alpha$.

Figure 7: Source $I(0)$

(a) The source $I(0)$ is depicted in Fig. 7. This is an independent simple t-reduced source with $cl(I(0))=1$. Therefore $L(0)\in\mathcal{F}_{2}$. By Lemma 1, $L(1)\in\mathcal{F}_{2}$.

Figure 8: Source $I(01)$

(b) The source $I(01)$ is depicted in Fig. 8. This is an independent simple t-reduced source with $cl(I(01))=2$. Therefore $L(01)\in\mathcal{F}_{3}$. By Lemma 1, $L(10)\in\mathcal{F}_{3}$.

Figure 9: Source $I(00)$

(c) The source $I(00)$ is depicted in Fig. 9. This source is not simple. It is clear that $L(00)^{C}\neq\emptyset$. Therefore $L(00)\in\mathcal{F}_{4}$. By Lemma 1, $L(11)\in\mathcal{F}_{4}$. Using Lemma 2, we obtain $L(000),L(001),L(110),L(111)\in\mathcal{F}_{4}$.

Figure 10: Source $I(010)$

The source $I(010)$ is depicted in Fig. 10. This source is not simple. It is clear that $L(010)^{C}\neq\emptyset$. Therefore $L(010)\in\mathcal{F}_{4}$. By Lemma 1, $L(101)\in\mathcal{F}_{4}$.

Figure 11: Source $I(011)$

The source $I(011)$ is depicted in Fig. 11. This source is not simple. It is clear that $L(011)^{C}\neq\emptyset$. Therefore $L(011)\in\mathcal{F}_{4}$. By Lemma 1, $L(100)\in\mathcal{F}_{4}$.

We have proved that $L(\alpha)\in\mathcal{F}_{4}$ for any word $\alpha\in E^{\ast}$ of length three. Since every word of length at least four has a prefix of length three, Lemma 2 yields $L(\alpha)\in\mathcal{F}_{4}$ for any word $\alpha\in E^{\ast}$ of length greater than or equal to four. ∎
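Theorem 4 amounts to a simple decision procedure on $\alpha$. The sketch below (the names `theorem4_class` and `size` are hypothetical) returns the index $i$ of the class $\mathcal{F}_{i}$ containing $L(\alpha)$ and, for illustration, counts $|L(\alpha)(n)|$ by brute force. For $\alpha=0$ the counts are constant and for $\alpha=01$ they equal $n+1$, in line with the bounds from Lemma 4.2 [5] quoted above; for $\alpha=00$ they are the Fibonacci numbers.

```python
from itertools import product


def theorem4_class(alpha):
    """Index i of the class F_i that contains L(alpha), as stated in Theorem 4."""
    if not alpha or set(alpha) - {"0", "1"}:
        raise ValueError("alpha must be a nonempty word over {0, 1}")
    if alpha in ("0", "1"):
        return 2
    if alpha in ("01", "10"):
        return 3
    return 4


def size(alpha, n):
    """|L(alpha)(n)|, computed by enumerating all words of length n."""
    return sum(1 for t in product("01", repeat=n) if alpha not in "".join(t))


if __name__ == "__main__":
    for alpha in ["0", "01", "00", "010"]:
        print(alpha, theorem4_class(alpha), [size(alpha, n) for n in range(8)])
```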

Acknowledgments

Research reported in this publication was supported by King Abdullah University of Science and Technology (KAUST).

References

  • [1] Crochemore, M., Mignosi, F., Restivo, A.: Automata and forbidden words. Inf. Process. Lett. 67(3), 111–117 (1998)
  • [2] Haines, L.H.: On free monoids partially ordered by embedding. J. Comb. Theory 6, 94–98 (1969)
  • [3] Markov, A.A.: Introduction into Coding Theory (in Russian). Nauka, Moscow (1982)
  • [4] Moshkov, M.: Complexity of deterministic and nondeterministic decision trees for regular language word recognition. In: S. Bozapalidis (ed.) Proceedings of the 3rd International Conference Developments in Language Theory, DLT 1997, Thessaloniki, Greece, July 20–23, 1997, pp. 343–349. Aristotle University of Thessaloniki (1997)
  • [5] Moshkov, M.: Decision trees for regular language word recognition. Fundam. Inform. 41(4), 449–461 (2000)
  • [6] Moshkov, M.: Decision trees for binary subword-closed languages. CoRR abs/2201.01493 (2022). URL https://arxiv.org/abs/2201.01493