Abelian Repetition Threshold Revisited
{elena.petrova,arseny.shur}@urfu.ru
Abstract
The Abelian repetition threshold is the number separating avoidable and unavoidable fractional Abelian powers over the -letter alphabet. The exact values of are unknown; the lower bounds were proved in [A.V. Samsonov, A.M. Shur. On Abelian repetition threshold. RAIRO ITA, 2012] and conjectured to be tight. We present a method for studying Abelian power-free languages using random walks in prefix trees, together with some experimental results obtained by this method. On the basis of these results, we conjecture that the lower bounds for by Samsonov and Shur are not tight for all and prove this conjecture for . Namely, we show that in all these cases.
Keywords:
Abelian-power-free language, repetition threshold, prefix tree, random walk
1 Introduction
Two words are Abelian equivalent if they have the same multiset of letters (in other terms, if they are anagrams of each other); an Abelian repetition is a pair of Abelian equivalent factors in a word. The study of Abelian repetitions originated from the question of Erdős [6]: does there exist an infinite finitary word having no consecutive pair of Abelian equivalent factors? The factors of the form , where and are Abelian equivalent, are now called Abelian squares. In modern terms, Erdős's question reads: "are Abelian squares avoidable over some finite alphabet?" This question was answered in the affirmative by Evdokimov [7]; the smallest possible alphabet has cardinality 4, as was proved by Keränen [8]. In a similar way, Abelian th powers are defined for arbitrary . Dekking [5] constructed infinite ternary words without Abelian cubes and infinite binary words without Abelian 4th powers. The results by Dekking and Keränen form an Abelian analog of the seminal result by Thue [14]: there exist an infinite ternary word containing no squares (factors of the form ) and an infinite binary word containing no cubes (factors of the form ).
Integral powers of words were later generalized to rational (fractional) powers: given a word of length , take a length- prefix of the infinite word ; then is the th power of ( is assumed). Usually, is referred to as a -power. A word is said to be -power-free if it contains no -powers with . Fractional powers gave rise to the notion of the repetition threshold, which is the function . The value is known since Thue [15]. Dejean [4] showed that and conjectured the remaining values (proved by Pansiot [10]) and for (proved through the efforts of many authors [9, 1, 3, 12]). An extension of the notions of fractional power and repetition threshold to the Abelian case was proposed by Samsonov and Shur [13]. Integral Abelian powers can be generalized to fractional ones in several ways; however, for the case , one definition of an Abelian -power is preferable due to its symmetric nature. According to this definition, a word is an Abelian th power of the word if , , and is Abelian equivalent to (in [13], such Abelian powers were called strong). Note that the reversal of an Abelian -power is also an Abelian -power. In this paper, we consider only strong Abelian powers; see Section 2 for the definition in the case . Given the definition of fractional Abelian powers, one naturally defines Abelian -power-free words and the Abelian repetition threshold . Cassaigne and Currie [2] showed that for any there exists an Abelian -power-free word over a finite alphabet of size . Surely, this bound is very loose, but it proves that . In [13], the lower bounds , for were proved and conjectured to be tight; in full, this conjecture is as follows.
Conjecture 1 ([13])
; ; ; for .
Up to now, no exact values of are known. One reason for the lack of progress in estimating is that the set of -ary Abelian -power-free words can be finite but so huge that these words cannot be enumerated by exhaustive search.
In the present study, we approach Abelian -power-free words using randomized depth-first search. The language of all -ary Abelian -power-free words is viewed as a prefix tree : the elements of are nodes of the tree and is an ancestor of in the tree iff is a prefix of . The search starts at the root (empty word) and is organized as follows. Reaching a node for the first time, we choose a random letter , check ad hoc whether , and descend to if this node exists; on subsequent visits to , we choose a random letter among the letters not chosen at before, and proceed in the same way. If no letter remains to choose, we return to the parent of (and thus will never visit again). We repeated the search multiple times and analysed the maximum level of a node reached in the tree and the change of the level during the search. Based on this analysis, we state
Conjecture 2
; ; ; ; ; for .
2 Definitions and Notation
We study finite words over finite alphabets, using standard notation for a (linearly ordered) alphabet, for its size, for the set of all finite words over , including the empty word . For a length- word we write ; the elements of the range are positions in , the length of is denoted by . A word is a factor of if for some (possibly empty) words and ; the condition (resp., ) means that is a prefix (resp., suffix) of . Any factor of can be represented as for some and ( means ). A factor of can have several such representations; we say that specifies the occurrence of at position .
A -power of a word is the concatenation of copies of , denoted by . This notion can be extended to -powers for an arbitrary rational . The -power of is the word such that and is a prefix of . A word is -free (resp., -free) if no one of its factors is a -power with (resp., ).
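The definition of a fractional power can be illustrated by a small helper. This is our own sketch (the function name `beta_power` is not from the paper); it assumes the resulting length is an integer, as the definition requires.

```python
from fractions import Fraction

def beta_power(w, beta):
    """Return the beta-power of w: the prefix of the infinite word
    w w w ... whose length is beta * |w|.

    Illustrative sketch; assumes beta * |w| is an integer."""
    length = Fraction(beta) * len(w)
    assert length.denominator == 1, "beta * |w| must be an integer"
    return ''.join(w[i % len(w)] for i in range(length.numerator))
```

For example, `beta_power("abc", Fraction(5, 3))` yields the 5/3-power `"abcab"` of `"abc"`.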
The Parikh vector of a word is an integer vector of length whose coordinates are the numbers of occurrences of the letters from in . Thus, the word over the alphabet has the Parikh vector . Two words and are Abelian equivalent (denoted by ) if . The reversal of a word is the word . Clearly, . A nonempty word is an Abelian th power (-A-power) if , where for all indices . A 2-A-power is an Abelian square, and a 3-A-power is an Abelian cube. Thus, -A-powers generalize -powers by relaxing the equality of factors to their Abelian equivalence. However, there are many ways to generalize the notion of an -power to the Abelian case, and all of them have drawbacks. The reason is that does not imply for any pair of factors of and . If , we define an -A-power as a word such that and . The advantage of this definition is that the reversal of an -A-power is an -A-power as well. For the situation is worse: all natural definitions compatible with the definition of -A-power are not symmetric with respect to reversals (see [13] for more details). So we give the definition which is compatible with the case : an -A-power is a word such that , , , and is Abelian equivalent to a prefix of . In [13], such words are called strong Abelian -powers. For a given , -A-free and -A-free words are defined in the same way as -free (-free) words. It is convenient to extend rational numbers with “numbers” of the form , postulating the equivalence of the inequalities and (resp., and ).
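The notions of Parikh vector, Abelian equivalence and Abelian square translate directly into code; the following minimal sketch (our own helper names) represents a Parikh vector as a multiset of letters.

```python
from collections import Counter

def parikh(w):
    # Parikh vector of w as a Counter: letter -> number of occurrences
    return Counter(w)

def abelian_equivalent(u, v):
    # u ~ v iff they have the same Parikh vector (i.e., are anagrams)
    return parikh(u) == parikh(v)

def is_abelian_square(w):
    # w = uv with |u| = |v| and u ~ v; only possible for even length
    n = len(w)
    return n > 0 and n % 2 == 0 and abelian_equivalent(w[:n // 2], w[n // 2:])
```

For instance, `"abba"` is an Abelian square (its halves `"ab"` and `"ba"` are anagrams) while `"aabb"` is not.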
A language is any subset of . The reversal of a language consists of the reversals of all words in . The -A-free language over (where belongs to extended rationals) consists of all -A-free words over and is denoted by . These languages are the main objects of our study aimed at finding the Abelian repetition threshold, which is the function . The languages are closed under permutations: if is a permutation of the alphabet, then the words and are -A-free for exactly the same values of . This makes it possible to enumerate the words of such languages by considering only lexmin words: a word is lexmin if for any permutation of .
Suppose that a language is factorial (i.e., closed under taking factors of its words); for example, all languages are factorial. Then can be represented by its prefix tree , which is a rooted labeled tree whose nodes are elements of and edges have the form , where is a letter. Thus is an ancestor of iff is a prefix of . We study the languages through different types of search in their prefix trees.
3 Algorithms
In this section we present the algorithms we developed for use in our experiments. First we describe the random depth-first search in the prefix tree of an arbitrary factorial language . Given a number , the algorithm visits distinct nodes of following the depth-first order and returns the maximum level of a visited node. The search can be easily augmented to return the word corresponding to the node of maximum level, or to log the sequence of levels of visited nodes. Algorithm 1 below describes one iteration of the search. In the algorithm, is the word corresponding to the current node; is the set of all letters such that the search has not tried the node yet; is the maximum level reached so far; is the number of visited nodes; is the predicate returning if and otherwise. Lines 3 and 8 refer to the updates of data structures used to compute . The search starts with , , . A variant of this search algorithm was used in [11] to numerically estimate the entropy of some -free and -A-free languages.
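Since the pseudocode of Algorithm 1 is not reproduced here, the following Python sketch is our reconstruction of the randomized depth-first search it describes; the names `is_in_L`, `sigma`, `max_nodes` are ours, and the membership predicate is supplied by the caller.

```python
import random

def random_dfs(is_in_L, sigma, max_nodes, seed=0):
    """Randomized DFS in the prefix tree of a factorial language.

    is_in_L(word) is the membership predicate; sigma is the alphabet
    (a string). Returns the maximum level (word length) reached among
    the first max_nodes visited nodes. Our sketch of Algorithm 1."""
    rng = random.Random(seed)
    word = []                                  # the current node
    untried = [rng.sample(sigma, len(sigma))]  # untried letters per level
    visited, max_level = 1, 0                  # the root counts as visited
    while visited < max_nodes and untried:
        if untried[-1]:
            word.append(untried[-1].pop())     # try a random untried letter
            if is_in_L(''.join(word)):         # the child is a node of the tree
                visited += 1
                max_level = max(max_level, len(word))
                untried.append(rng.sample(sigma, len(sigma)))
            else:
                word.pop()
        else:                                  # all children tried: backtrack
            untried.pop()
            if word:
                word.pop()
    return max_level
```

As a toy usage example, running this search on the (infinite) ternary language of words without two equal adjacent letters descends one level per visited node.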
The key to an efficient search is a fast algorithm computing the predicate . The following fact is very useful: if is a proper prefix of , then the node for exists and hence . Below we present four algorithms checking, for a given , whether a word has a -A-power with as a suffix. The cases and are considered in Sections 3.1 and 3.2 respectively.
3.1 Avoiding small powers
Let and let be a word of length such that all proper prefixes of are -A-free. To prove that is -A-free, it is necessary and sufficient to show that
- no suffix of can be written as such that , , and .
Remark 1
Since Abelian equivalence is not preserved by taking factors of any sort, the ratio in can significantly exceed . For example, all proper prefixes, and even all proper factors, of the word are -A-free, while is an Abelian square. Hence for each suffix of one should check multiple candidates for the factor in . The number of such candidates can be as big as ; in total, candidates for the pair should be analysed.
A reasonable approach is to store the Parikh vectors of all prefixes of ; they take words of space in total and require time for an update when a letter is appended to or deleted from the right end of the word. Then the Parikh vector of each factor of is obtained as the difference of the Parikh vectors of the corresponding prefixes. So one comparison of factors takes time, which means time for performing all comparisons of candidate pairs in in a naive way (see Remark 1). Algorithm 2 below gets many of these comparisons for free. It makes use of two length- arrays for each letter : is the number of occurrences of in (= a coordinate of the Parikh vector of the prefix ) and is the position of the th from the left occurrence of in the current word . Each of the arrays can be updated in time when a letter is appended to or deleted from the right end of . We specify lines 3 and 8 of Algorithm 1 as follows. At line 3, we delete the Parikh vector of from the -arrays and delete the last element of , where is the last letter of . At line 8, we add the Parikh vector of to the -arrays and add a new element to .
The arrays and are used to compute two auxiliary functions, and . The function returns the Parikh vector of ; its coordinates are just the differences of the form . The function returns the biggest number such that , or zero if there is no such number. Thus the function returns 0 if contains, for some , fewer 's than ; i.e., if . If no such exists, then .
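The two arrays and the two auxiliary functions can be rendered as follows; `occ`, `pos`, `pv` and `find` are our names for our reconstruction (positions are 1-based, as in the paper). `find(v, r)` returns the largest start position `i` such that the factor ending at `r` and starting at `i` contains `v` componentwise, i.e., the start of the shortest such factor, or 0 if none exists.

```python
def make_indexes(w, sigma):
    """occ[a][i]: occurrences of a in w[1..i]; pos[a][j]: 1-based position
    of the (j+1)-th occurrence of a in w. Our reconstruction."""
    occ = {a: [0] * (len(w) + 1) for a in sigma}
    pos = {a: [] for a in sigma}
    for i, c in enumerate(w, 1):
        for a in sigma:
            occ[a][i] = occ[a][i - 1] + (a == c)
        pos[c].append(i)
    return occ, pos

def pv(occ, sigma, i, j):
    # Parikh vector of w[i..j] as a difference of prefix Parikh vectors
    return {a: occ[a][j] - occ[a][i - 1] for a in sigma}

def find(occ, pos, sigma, v, r):
    """Largest i such that the Parikh vector of w[i..r] dominates v
    componentwise; 0 if no such i (or if v is the zero vector)."""
    start = r + 1
    for a in sigma:
        need = v[a]
        if need == 0:
            continue
        have = occ[a][r]          # occurrences of a in w[1..r]
        if have < need:
            return 0              # some letter is deficient
        # to include `need` occurrences of a, the factor must start
        # no later than the (have - need + 1)-th occurrence of a
        start = min(start, pos[a][have - need])
    return start if start <= r else 0
```

For `w = "abacaba"`, the shortest factor ending at position 7 containing two `a`'s and one `b` starts at position 5 (the factor `"aba"`).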
Proposition 1
Let be a number such that and be a word all proper prefixes of which are -A-free. Then Algorithm 2 correctly detects whether is -A-free.
Proof
Let us show that Algorithm 2 verifies the condition . The outer cycle of the algorithm fixes the first position of the suffix of ; the suffixes are analysed in the order of increasing length . If a forbidden suffix is detected during the iteration (we discuss the correctness and completeness of this detection below), then the algorithm breaks the outer cycle in line 14 and returns . Thus at the current iteration of the outer cycle the condition is already verified for all shorter suffixes. The iteration uses a simple observation: if , then every word , containing , satisfies . We proceed as follows. We fix the rightmost position where a factor satisfying can end. Initially , as can immediately precede (see the example in Remark 1). Then we compute the shortest factor such that . If , the suffix of violates . Otherwise cannot begin later than at the position , by the construction of . Hence we decrease by setting and repeat the above procedure in a loop. The verification of for ends successfully when either does not exist (i.e., for the current value of ) or is too small (i.e., with means ). The described process is illustrated by Fig. 1. Details are provided below.


In lines 4–6 the algorithm calls to compute and to find the mentioned factor for . If , then and hence every suffix of satisfies . Moreover, one can observe that for each which immediately verifies for all suffixes of that are longer than . Hence in this case the verification of is already done; respectively, the algorithm breaks the outer cycle in line 7 and returns .
If no break has happened, the algorithm enters the inner cycle, checks whether (line 9) and breaks with the output if this condition holds. If it does not, the algorithm decreases as described above (line 12) and computes the new factor (line 13). If does not exist, gets 0, which results in the immediate exit from the inner cycle. If is computed but its position is too small, then the cycle is also exited. The exit means the end of the th iteration.
Thus, Algorithm 2 returns only if it finds a suffix of which violates . For the other direction, let violate such that is minimal over all suffixes violating it. Then Algorithm 2 cannot stop before the iteration which checks . During this iteration, cannot become smaller than by the definition of the factor . As decreases at each iteration of the inner cycle, eventually will be found. Thus the algorithm indeed verifies and thereby detects the -A-freeness of . ∎
Remark 2
As both and work in time, Algorithm 2 processes a word of length in time, where is the total number of the inner cycle iterations during the course of the algorithm. Clearly, . The results of experiments suggest that in expectation. This is indeed the case if is a random word, as Lemma 1 below shows. This lemma implies that the iteration processing a suffix builds, in expectation, words .
Lemma 1
Suppose that an infinite word is chosen uniformly at random among all -ary infinite words, is a prefix of , and is the shortest prefix of such that . Then the expected length of is .
Proof
Let , . First consider the case . The process can be viewed as follows: first, is generated by tosses of a fair coin; then another tosses generate some prefix of ; additional tosses are made one by one until the desired result is reached after tosses. The Parikh vector of a word of a known length over is determined by the number of 1’s. Hence is a random variable with the binomial distribution ; similarly, is a random variable with the same distribution.
The vector has the form for some integer . To obtain , we should make “successful” tosses with the probability of “success” being ; hence the expectation of equals . Thus it remains to find the expectation of . Since , we see that is the standard deviation of by definition.
Due to symmetry, and have the same distribution. Hence we can replace by . The random variable has the binomial distribution , so its standard deviation is . Thus , as desired.
Over larger alphabets the expectation of can only increase. The easiest way to see this is to split arbitrarily into two subsets and of equal size. Then with respect to has a deficiency of letters from one of these subsets, say, . By the argument for the binary alphabet, additional letters are needed, in expectation, to cover this deficiency. This is a necessary (but not sufficient) condition to obtain the word . Hence . ∎
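The binary case of Lemma 1 is easy to check empirically. The sketch below is our reading of the (elided) statement: with `u` the first `n` letters of a random binary word and `v` the factor starting right after `u`, it measures how many letters beyond position `2n` are needed, on average, until the Parikh vector of `v` dominates that of `u`; by the proof above this average grows like the standard deviation of a binomial, i.e., proportionally to the square root of `n`.

```python
import random

def extra_length(n, trials, seed=0):
    """Monte Carlo sketch for the binary case of Lemma 1: average number
    of letters beyond position 2n until the extension v of w[n+1..2n]
    satisfies Parikh(v) >= Parikh(w[1..n]) componentwise."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        w = [rng.randrange(2) for _ in range(2 * n)]
        ones_u = sum(w[:n])
        ones_v = sum(w[n:])
        # exactly one of the two deficiencies below can be positive
        need1 = max(0, ones_u - ones_v)              # missing 1's
        need0 = max(0, ones_v - ones_u)              # missing 0's
        k = 0
        while need1 > 0 or need0 > 0:
            k += 1                                   # one extra coin toss
            if rng.randrange(2):
                need1 = max(0, need1 - 1)
            else:
                need0 = max(0, need0 - 1)
        total += k
    return total / trials
```

Quadrupling `n` should roughly double the average extra length, in line with the square-root growth.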
Algorithm 2 significantly speeds up the naive algorithm but is still rather slow. For the case , a much faster dictionary-based Algorithm 3 is presented below. However, Algorithm 3 consumes memory quadratic in ; this limits the depth of the search to values of about for an ordinary laptop.
When processing a word , the algorithm keeps in the dictionary all factors of , up to Abelian equivalence. Recall that a dictionary contains a set of pairs (key, value), where all keys are unique, and supports fast lookup, addition and deletion by key. For the dictionary used in Algorithm 3, the keys are Parikh vectors and the values are lists of positions, in increasing order, of the factors having this Parikh vector. The algorithm accesses only the last (maximal) element of the list. Let us describe the updates of the dictionary (lines 3 and 8 of Algorithm 1). At line 3, we delete all suffixes of from the dictionary. For a suffix , this means the deletion of the last element from the list ; if the list becomes empty, the entry for is also deleted. At line 8, all suffixes of are added to the dictionary. For a suffix , if was not in the dictionary, an entry is created; then the position is added to the end of the list .
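The dictionary and its two update operations can be sketched as follows; `FactorDict` and its method names are ours, and Parikh vectors are made hashable by sorting their letter counts. Pushing a word registers all its suffixes (the factors ending at the new last position), popping unregisters them; only the last position of each equivalence class is queried.

```python
from collections import Counter, defaultdict

def parikh_key(w):
    # a hashable rendering of the Parikh vector, to be used as a key
    return tuple(sorted(Counter(w).items()))

class FactorDict:
    """Positions of the factors of the current word, grouped by Parikh
    vector. Our reconstruction of the dictionary used by Algorithm 3."""
    def __init__(self):
        self.d = defaultdict(list)

    def push_word(self, w):
        # line 8 of Algorithm 1: register every suffix of the new word w,
        # i.e., every factor ending at the freshly appended position
        for i in range(len(w)):
            self.d[parikh_key(w[i:])].append(i + 1)   # 1-based start position

    def pop_word(self, w):
        # line 3 of Algorithm 1: unregister every suffix of w before
        # its last letter is deleted
        for i in range(len(w)):
            k = parikh_key(w[i:])
            self.d[k].pop()
            if not self.d[k]:
                del self.d[k]

    def last_position(self, v):
        # rightmost registered occurrence of a factor Abelian equivalent
        # to v; 0 if there is none
        lst = self.d.get(parikh_key(v))
        return lst[-1] if lst else 0
```

For example, after appending the letters of `"aba"` one by one, the class of `"ab"` contains the occurrences at positions 1 (`"ab"`) and 2 (`"ba"`).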
Proposition 2
Let be a number such that and be a word all proper prefixes of which are -A-free. Then Algorithm 3 correctly detects whether is -A-free.
Proof
Let us show that Algorithm 3 verifies . First suppose that the algorithm returned . Then it broke from the for cycle (line 9); let be the last suffix processed. The lookup by the key returned the position of a factor , and the condition in line 8 was true. Then the suffix violates . Indeed, since was found by the key ; the first inequality means that and do not overlap in ; the second inequality is equivalent to .
Now suppose that the algorithm returned . Aiming at a contradiction, assume that has a suffix violating . Let be the shortest such suffix. Consider the iteration of the for cycle where was processed. The key was present in the dictionary because . If (line 7) corresponded to the from our "bad" suffix, i.e., , then both inequalities in line 8 held because and do not overlap in and . But then the algorithm would have returned , contradicting our assumption. Hence was the position of some other which occurs in later than . By the choice of the suffix , cannot have a shorter suffix with . This means that the occurrences of and overlap (see Fig. 2).

Note that and also overlap. Otherwise, has a prefix of the form and , contradicting the condition that all proper prefixes of are -A-free. Then , , , as shown in Fig. 2, and are nonempty. We observe that imply and . By the condition on the prefixes of , . Hence and then . Therefore , so the suffix of violates . But , contradicting the choice of . This contradiction proves that satisfies . ∎
Dictionaries based on hash table techniques such as linear probing or cuckoo hashing guarantee expected constant time per dictionary operation. As Algorithm 3 consists of a single cycle with a constant number of operations inside, the following statement is straightforward.
Proposition 3
For a word of length , Algorithm 3 performs operations, including dictionary operations.
Remark 3
A slight modification of Algorithm 3 allows one to process the important case within the same complexity bound. The whole argument from the proof of Proposition 2 remains valid for except for one specific situation: in Fig. 2 it is possible that is empty and . In this situation Algorithm 3 misses the Abelian square . To fix this, we add a patch after line 7:
7.5: if then
As an example, consider . Processing the suffix , Algorithm 3 retrieves from the dictionary by the key . The corresponding factor overlaps and the condition in line 8 would fail for . However, satisfies the condition in the inserted line 7.5 and thus the factor at will be reached. The condition in line 8 holds for and the forbidden Abelian repetition is detected.
Remark 4
Algorithm 3 can be further modified to work for all . If we replace the patch from Remark 3 with the following one:
7.5: while do
the algorithm will find the closest factor which does not overlap with . This new patch introduces an inner cycle and thus affects the time complexity but the algorithm remains faster in practice than Algorithm 2.
3.2 Avoiding big powers
Let . The case for ternary words and the case for binary words are relevant to the study of the Abelian repetition threshold. We provide here the algorithms for the first case; the algorithms for the second case are very similar (the only difference is that one should check for Abelian cubes instead of Abelian squares). An Abelian -power with has the form , where , is equivalent to a prefix of , and . We can write the Abelian square as where and . Consequently, if all proper prefixes of a word are -A-free, then is -A-free iff the following analog of holds:
- no suffix of can be written as such that , , is an Abelian square, and .
Verifying for , we proceed for its suffix as follows. Within the range determined by , we search for all factors such that . The search is organized as in Algorithm 2 (see Fig. 1). For each we consider the corresponding suffix of and check whether is an Abelian square. If so, violates . In Algorithm 4 below, we make use of the arrays and functions designed for Algorithm 2.
Proposition 4
Let be a number such that and be a word all proper prefixes of which are -A-free. Then Algorithm 4 correctly detects whether is -A-free.
Proof
Algorithm 4 is similar to Algorithm 2, so we focus on their difference. If some suffix violates , then ; hence the range for the outer cycle in line 3. For a fixed we repeatedly seek the shortest factor with the given right bound and the property . If (the condition in line 9 holds), then is a candidate for in the suffix violating . The initial value for (line 4) is set to ensure the condition . The candidate found in line 9 is checked in line 10 for the remaining condition: is an Abelian square. Namely, we check that is even and its left and right halves have the same Parikh vector. If this condition holds, the algorithm breaks both the inner and the outer cycle and returns . If the condition fails, we decrease by 1 and compute the factor for this new right bound. The rest is the same as in Algorithm 2. So we can conclude that Algorithm 4 verifies . ∎
Remark 5
Algorithm 4 is rather slow. But it appears that dual Abelian powers can be detected by a much faster Algorithm 5. Let us give the definitions. As was mentioned in Section 2, the reversal of an -A-power for is not necessarily an -power. For example, is a -A-power while does not begin with an Abelian square. We call a dual -A-power if is an -A-power; the notion of dual -A-free word is defined by analogy with -A-free word. Dual -A-free words are exactly the reversals of -A-free words.
Assume that all proper prefixes of a word are dual -A-free, where . Then is dual -A-free iff the following analog of holds:
- no suffix of can be written as such that , is equivalent to a suffix of , and .
Proposition 5
Let be a number such that and be a word all proper prefixes of which are dual -A-free. Then Algorithm 5 correctly detects whether is dual -A-free.
Proof
If some suffix violates , then ; hence the range for the outer cycle in line 3. The general scheme is as follows. For each processed suffix , the algorithm first checks whether ends with an Abelian square (); if so, it checks whether is preceded by some which is equivalent to a suffix of . If such an is found, the algorithm detects a violation of and stops. If either or is not found, the algorithm moves to the next appropriate suffix. Let us consider the details.
In line 6, the shortest such that is a suffix of and is computed. If (the condition in line 7), then is found and we enter the inner cycle to find . If , we note that the suffixes of of lengths between and cannot be Abelian squares; then the next suffix to be considered has the length , as is set in line 18. In the inner cycle, a similar idea is implemented: for each processed suffix of the algorithm finds the shortest word satisfying (line 11). If (line 12), then is found; otherwise, the next suffix of to be checked is of length (line 15). The inner cycle breaks if this length exceeds the length of .
We have shown that Algorithm 5 stops with the answer if it finds a suffix of that violates ; if it finishes the check of a suffix without breaking, or skips this suffix altogether, then has no suffix violating . Therefore, the algorithm verifies . ∎
Algorithm 5 works extremely fast compared to other algorithms from this section. The following statement holds for the case of the ternary alphabet.
Proposition 6
For a word picked uniformly at random from the set , Algorithm 5 works in expected time.
Proof
Lemma 1 says that the expected length of the word found in line 6 is and thus, in expectation, the assignment in line 18 leads to skipping suffixes of . Hence the expected total number of processed suffixes of is . By the same lemma, the inner cycle for a suffix runs, in expectation, iterations, so its expected time complexity is . Thus, processing the suffix of length , Algorithm 5 performs operations, where is the probability to enter the inner cycle, i.e., the probability that two random ternary words of length are Abelian equivalent. Let us estimate this probability. One has
where the denominator is the number of ternary words of length and the numerator is the maximum size of a class of Abelian equivalent ternary words of length . This maximum, reached for (almost) equal letter counts, can be estimated via Stirling's formula as . Thus . Then Algorithm 5 performs, in expectation, operations per iteration of the outer cycle. The result now follows. ∎
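The key probability in this argument, that two uniformly random ternary words of the same length are Abelian equivalent, can be estimated by a quick Monte Carlo experiment (our own sketch). By the estimate above it decays roughly like the inverse of the length.

```python
import random
from collections import Counter

def p_abelian_equiv(n, trials, seed=1):
    """Monte Carlo estimate of the probability that two uniformly random
    ternary words of length n are Abelian equivalent."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        u = [rng.randrange(3) for _ in range(n)]
        v = [rng.randrange(3) for _ in range(n)]
        if Counter(u) == Counter(v):        # same Parikh vector
            hits += 1
    return hits / trials
```

Quadrupling the length should roughly quarter the estimated probability, consistent with the inverse-linear decay.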
4 Experimental results
We ran a large series of experiments for -A-free languages over the alphabets of size . Each experiment is a set of random walks in the prefix tree of a given language. Each walk follows the random depth-first search (Algorithm 1), with the number of visited nodes being of order to . The ultimate aim of every experiment was to make a well-grounded conjecture about the (in)finiteness of the studied language.
Our initial expectation was that random walks would demonstrate two easily distinguishable types of behaviour:
• infinite-like: the level of the current node is (almost) proportional to the number of nodes visited, or
• finite-like: from some point, the level of the current node oscillates near the maximum reached earlier.
However, the situation is not that straightforward: very long oscillations of the level were detected during random walks even in some languages which are known to be infinite; for example, in the binary 4-A-free language. To overcome this unwanted behaviour, we endowed Algorithm 1 with a "forced backtrack" rule:
• let be the maximum level of a node reached so far; if nodes were visited since the last update of or since the last forced backtrack, then make a forced backtrack: from the current node, move edges up the tree and continue the search from the node reached.
Here and are some heuristically chosen monotone functions; we used and . Forced backtracking deletes the last letters of the current word in order to escape a big finite subtree that the search would otherwise have to traverse in full. The use of forced backtracking allowed us to classify the walks in almost all studied languages either as infinite-like or as finite-like. The results presented below are grouped by the alphabets.
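The forced-backtrack rule can be added to the search sketch as follows. This is our reconstruction: `f` and `g` stand for the paper's heuristic threshold functions (their concrete choices are not reproduced in this text), and in this simplified version the untried branches below the cut point are abandoned rather than revisited.

```python
import random

def dfs_with_forced_backtrack(is_in_L, sigma, max_nodes, f, g, seed=0):
    """Randomized DFS with forced backtracking (our sketch).

    If f(h) nodes have been visited since the last update of the maximum
    level h (or since the last forced backtrack), the last g(h) letters
    of the current word are deleted and the search continues from there."""
    rng = random.Random(seed)
    word = []
    untried = [rng.sample(sigma, len(sigma))]
    h, visited, since = 0, 1, 0
    while visited < max_nodes and untried:
        if since >= f(h) and word:
            cut = min(g(h), len(word))    # forced backtrack: cut edges up
            del word[-cut:]
            del untried[-cut:]
            since = 0
            continue
        if untried[-1]:
            word.append(untried[-1].pop())
            if is_in_L(''.join(word)):
                visited += 1
                since += 1
                if len(word) > h:         # new maximum level
                    h, since = len(word), 0
                untried.append(rng.sample(sigma, len(sigma)))
            else:
                word.pop()
        else:                             # ordinary backtrack
            untried.pop()
            if word:
                word.pop()
    return h
```

With a huge threshold `f` the rule never fires and the search behaves as before; with a small `f` on a finite language the walk repeatedly jumps up the tree instead of crawling through an exhausted subtree.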
4.1 Alphabets with , and 10 letters
In [13], it was proved (Theorem 3.1) that for all and conjectured that the equality holds in all cases. However, the random search reveals a different picture. For each of the languages , , we ran the random search with forced backtrack, using Algorithm 3 to decide membership in the language; the search terminated when nodes were visited. We repeated the search 100 times with and another 100 times with . The results, presented in columns 3–8 of Table 1, clearly demonstrate finite-like behaviour of random walks. Moreover, the results suggest that none of these languages contains a word much longer than 100 symbols. So we were able to prove the following result by (optimized) exhaustive search.
Theorem 4.1
One has for .
Alphabet size | Avoided power | Maximum length | ||||||
---|---|---|---|---|---|---|---|---|
6 | 112 | 98.9 | 98 | 114 | 101.1 | 101 | 116 | |
7 | 116 | 100.3 | 100 | 124 | 103.9 | 102 | 125 | |
8 | 103 | 94.8 | 95 | 102 | 96.2 | 96 | 105 | |
9 | 108 | 95.6 | 96 | 107 | 98.8 | 99 | 117 | |
10 | 121 | 107.7 | 108 | 128 | 111.6 | 111 | 148* |
A length- word is called a -permutation if all its letters are pairwise distinct. We use the following lemma to reduce the search space.
Lemma 2
Let , , and let , and be subsets of defined as follows:
- is the set of all such that has the prefix and contains no -permutations;
- is the set of all such that has the prefix and contains no -permutations;
- is the set of all having the prefix .
Then is finite iff each of , and is finite.
Proof
The necessity is immediate from the definitions; let us prove the sufficiency. Let and let be the lengths of the longest words in , and respectively. Let us show that . If is a word and is a letter, then is an Abelian -power. Hence
- any factor of of length is a -permutation.
Now consider a factor of such that . By , one can write , where does not contain the letters and . If and , then is an Abelian -power, which is impossible since as a factor of . Hence either begins or ends with a -permutation. Thus contains a -permutation beginning at the first or second position. Then contains a -permutation due to finiteness of ; moreover, this permutation ends no later than at position and thus begins no later than at position . Similarly, due to finiteness of , contains a -permutation no later than at position . Finally, the finiteness of implies the upper bound . In particular, is finite.∎
We ran (non-randomized) depth-first search on the prefix trees of the languages , and for the cases , using Algorithm 3 to detect Abelian powers, and proved that all these trees are finite. According to Lemma 2, this proves Theorem 4.1. The total number of visited nodes was approximately 0.43 billion for ; 0.90 billion for ; 6.29 billion for ; 8.14 billion for ; and more than 500 billion for . The last case required about 2000 hours of (single-core) processing time on an ordinary laptop.
Remark 6
For each it is feasible to run a single search which enumerates all lexmin words in the language . We performed these searches and found the maximum length of a word in each language (the last column of Table 1) and the distribution of words by their length (see Fig. 3 for the case ). For , such a single search would require too many resources; here the value in the last column of Table 1 is the length of the longest word in .

Theorem 4.1 raises the question of the avoidance of bigger -A-powers over the same alphabets. As the next step, we ran experiments for the languages . The results for are presented in Table 2; random walks in these languages clearly demonstrate finite-like behaviour, while proving finiteness by exhaustive search looks hardly possible. In contrast, the walks in the 6-ary language demonstrate infinite-like behaviour: the average value of for our experiments with is greater than . We note that the obtained words are too long for Algorithm 3, so we had to use the slower Algorithm 2. Finally, we constructed random walks for the languages (). They also demonstrate infinite-like behaviour. The obtained experimental results allow us to state the part of Conjecture 2 concerning the alphabets with six or more letters.
Alphabet size | Avoided power | max | mean | median | max | mean | median
---|---|---|---|---|---|---|---
7 | | 510 | 374.5 | 371 | 510 | 397.5 | 394
8 | | 211 | 179.7 | 179 | 223 | 185.0 | 184
9 | | 192 | 157.2 | 156 | 191 | 162.3 | 161
10 | | 175 | 154.0 | 154 | 187 | 159.7 | 158
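The random walks themselves can be sketched as follows, once more in the simplified Abelian-square setting (a stand-in for the languages actually studied): descend by picking a uniformly random extendable child, and backtrack one level whenever the current node is a leaf (forced backtracking). The statistic of interest is the maximum level reached within a budget of visited nodes:

```python
import random
from collections import Counter

def ends_with_abelian_square(w):
    """Test whether w ends with uv where u and v are Abelian equivalent."""
    n = len(w)
    for half in range(1, n // 2 + 1):
        if Counter(w[n - 2 * half : n - half]) == Counter(w[n - half :]):
            return True
    return False

def random_walk(alphabet, steps, seed=0):
    """Random walk down the prefix tree of Abelian-square-free words
    with forced backtracking; returns the maximum level (word length)
    reached within the given number of steps."""
    rng = random.Random(seed)
    w = ""
    max_level = 0
    for _ in range(steps):
        children = [c for c in alphabet
                    if not ends_with_abelian_square(w + c)]
        if children:
            w += rng.choice(children)
            max_level = max(max_level, len(w))
        elif w:
            w = w[:-1]  # leaf reached: forced backtrack
        else:
            break  # the root itself has no children
    return max_level
```

In a finite tree the maximum level quickly saturates at the tree's depth no matter how long the walk runs (finite-like behaviour), whereas in an infinite tree it keeps growing with the node budget (infinite-like behaviour); this is the dichotomy the tables record.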
4.2 Alphabets with 2, 3, 4, and 5 letters
Random walks in the prefix tree of the language demonstrate infinite-like behaviour; Fig. 4 shows an example of the dependence of the level of the current node on the number of nodes visited. Although we could not push random walks significantly farther down the tree (Algorithm 3 uses too much space to work at such deep levels, so we relied on the slower Algorithm 2), we obtained sufficient evidence to support Conjecture 1 for .

Remark 7
For smaller alphabets, we studied the languages , , and , indicated by Conjecture 1 as infinite. To detect Abelian powers, we used Algorithm 3 for the quaternary alphabet; for the ternary and binary alphabets, we worked with the reversals of , and to benefit from the speed of Algorithm 5, which detects dual Abelian powers. Random walks (with forced backtracking) in each of the three languages show finite-like behaviour; see Table 3 and the example in Fig. 5. Longer random searches lead to somewhat better results, especially on average, due to multiple forced backtracks; however, nothing resembles a steady growth of the maximum level with the total number of visited nodes. So the experimental results justify the lower bounds from Conjecture 2 for . To get the upper bound for the ternary alphabet, we ran random walks for the language , with results similar to those obtained for : all walks demonstrate infinite-like behaviour; the level is reached within minutes.
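The reversal trick used here rests on a simple fact: the reversal of an Abelian power is again an Abelian power (reversal preserves Parikh vectors, and the reversal of a factor of w is a factor of the reversal of w), so a word contains an Abelian square if and only if its reversal does. The following brute-force sketch (Abelian squares only; Algorithm 5 itself is not reproduced in this chunk) verifies this equivalence exhaustively on short ternary words:

```python
from collections import Counter
from itertools import product

def has_abelian_square(w):
    """Brute-force test: does w contain a factor uv with u and v
    Abelian equivalent?"""
    n = len(w)
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):
            if Counter(w[i:i + half]) == Counter(w[i + half:i + 2 * half]):
                return True
    return False

# The reversal of an Abelian square uv is the Abelian square
# reverse(v)reverse(u), so the predicate is reversal-invariant:
for t in product("abc", repeat=6):
    s = "".join(t)
    assert has_abelian_square(s) == has_abelian_square(s[::-1])
```

This invariance is what makes it legitimate to search the reversed languages with a detector tuned to dual Abelian powers instead of the original ones.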
Overall, we conclude that the experiments we conducted justify the formulation of Conjecture 2.
Alphabet size | Avoided power | max | mean | median | max | mean | median | max | mean | median
---|---|---|---|---|---|---|---|---|---|---
2 | | 775 | 435.8 | 416 | 706 | 477.0 | 453 | 759 | 589.7 | 588
3 | | 3344 | 1700.0 | 1671 | 5363 | 2228.8 | 2140 | 5449 | 3078.1 | 3148
4 | | 1367 | 861.2 | 835 | 1734 | 986.8 | 956 | 2453 | 1414.7 | 1369
We ran two additional experiments with the language . First, we took the longest word found by random search (it has length 2453) and extended it to the left with another long random search, repeated multiple times. The longest word obtained has length 3152, which seems to be a fair approximation of the maximum length of a word in . Second, we tried an exhaustive enumeration of the words in to understand how fast the initial growth is and how far we can reach. We discovered that the language contains billions of lexmin words of length 90, compared to billions of such words of length 89. Hence 90 is still quite far from the length at which the number of words is maximal.

5 Future Work
Clearly, the main challenge in the topic is to find the exact values of the Abelian repetition threshold. Even finding one such value would be great progress. Choosing the case to start with, we would suggest proving because in this case the lower bound has already been checked by exhaustive search in [13]. For all other alphabets, the proof of the lower bounds suggested in Conjecture 2 is already a challenging task which cannot be solved by brute force.
Another piece of work is to refine Conjecture 2 by suggesting the precise values of , , , and . For bigger , random walks demonstrate an obvious “phase transition” at the point : the behaviour of a walk switches from finite-like for to infinite-like for . However, the situation with small alphabets can be trickier. For example, we tried the next natural candidate for , namely, . For random walks in , with and forced backtracks, the obtained maximum levels in our experiments ranged from 3000 to 20000; such large lengths show that there is no hope of seeing a clear-cut phase transition in experiments with random walks.
Finally, we want to draw attention to the following fact. The quaternary 2-A-free word constructed by Keränen [8] contains arbitrarily long factors of the form , where is a letter and ; thus it is not -A-free for any . Similarly, the word constructed by Dekking [5] for the ternary (resp., binary) alphabet is not -A-free for any (resp., ). Hence some new constructions are necessary to improve upper bounds for .
References
- [1] Carpi, A.: On Dejean’s conjecture over large alphabets. Theoret. Comput. Sci. 385, 137–151 (2007)
- [2] Cassaigne, J., Currie, J.D.: Words strongly avoiding fractional powers. Eur. J. Comb. 20(8), 725–737 (1999)
- [3] Currie, J.D., Rampersad, N.: A proof of Dejean’s conjecture. Math. Comp. 80, 1063–1070 (2011)
- [4] Dejean, F.: Sur un théorème de Thue. J. Combin. Theory. Ser. A 13, 90–99 (1972)
- [5] Dekking, F.M.: Strongly non-repetitive sequences and progression-free sets. J. Combin. Theory. Ser. A 27, 181–185 (1979)
- [6] Erdös, P.: Some unsolved problems. Magyar Tud. Akad. Mat. Kutató Int. Közl. 6, 221–264 (1961)
- [7] Evdokimov, A.A.: Strongly asymmetric sequences generated by a finite number of symbols. Soviet Math. Dokl. 9, 536–539 (1968)
- [8] Keränen, V.: Abelian squares are avoidable on 4 letters. In: Kuich, W. (ed.) Proc. ICALP 1992. LNCS, vol. 623, pp. 41–52. Springer-Verlag (1992)
- [9] Moulin-Ollagnier, J.: Proof of Dejean’s conjecture for alphabets with 5, 6, 7, 8, 9, 10 and 11 letters. Theoret. Comput. Sci. 95, 187–205 (1992)
- [10] Pansiot, J.J.: A propos d’une conjecture de F. Dejean sur les répétitions dans les mots. Discr. Appl. Math. 7, 297–311 (1984)
- [11] Petrova, E.A., Shur, A.M.: Branching frequency and Markov entropy of repetition-free languages. In: Proc. DLT 2021. LNCS, vol. 12811, pp. 328–341. Springer (2021)
- [12] Rao, M.: Last cases of Dejean’s conjecture. Theoret. Comput. Sci. 412, 3010–3018 (2011)
- [13] Samsonov, A.V., Shur, A.M.: On Abelian repetition threshold. RAIRO Theor. Inf. Appl. 46, 147–163 (2012)
- [14] Thue, A.: Über unendliche Zeichenreihen. Norske vid. Selsk. Skr. Mat. Nat. Kl. 7, 1–22 (1906)
- [15] Thue, A.: Über die gegenseitige Lage gleicher Teile gewisser Zeichenreihen. Norske vid. Selsk. Skr. Mat. Nat. Kl. 1, 1–67 (1912)