
Ural Federal University, Ekaterinburg, Russia
{elena.petrova,arseny.shur}@urfu.ru

Abelian Repetition Threshold Revisited

Elena A. Petrova    Arseny M. Shur
Abstract

The Abelian repetition threshold ${\sf ART}(k)$ is the number separating avoidable and unavoidable fractional Abelian powers over the $k$-letter alphabet. The exact values of ${\sf ART}(k)$ are unknown; the lower bounds were proved in [A.V. Samsonov, A.M. Shur. On Abelian repetition threshold. RAIRO ITA, 2012] and conjectured to be tight. We present a method for studying Abelian power-free languages by means of random walks in prefix trees, together with some experimental results obtained by this method. On the basis of these results, we conjecture that the lower bounds for ${\sf ART}(k)$ by Samsonov and Shur are not tight for all $k\neq 5$ and prove this conjecture for $k=6,7,8,9,10$. Namely, we show that ${\sf ART}(k)>\frac{k-2}{k-3}$ in all these cases.

Keywords:
Abelian-power-free language, repetition threshold, prefix tree, random walk

1 Introduction

Two words are Abelian equivalent if they have the same multiset of letters (in other terms, if they are anagrams of each other); an Abelian repetition is a pair of Abelian equivalent factors in a word. The study of Abelian repetitions originated from a question of Erdős [6]: does there exist an infinite word over a finite alphabet having no consecutive pair of Abelian equivalent factors? The factors of the form $u\bar{u}$, where $u$ and $\bar{u}$ are Abelian equivalent, are now called Abelian squares. In modern terms, Erdős’s question reads “are Abelian squares avoidable over some finite alphabet?” This question was answered in the affirmative by Evdokimov [7]; the smallest possible alphabet has cardinality 4, as was proved by Keränen [8]. In a similar way, Abelian $k$th powers are defined for arbitrary $k\geq 2$. Dekking [5] constructed infinite ternary words without Abelian cubes and infinite binary words without Abelian 4th powers. The results by Dekking and Keränen form an Abelian analog of the seminal result by Thue [14]: there exist an infinite ternary word containing no squares (factors of the form $uu$) and an infinite binary word containing no cubes (factors of the form $uuu$).

Integral powers of words were later generalized to rational (fractional) powers: given a word $u$ of length $n$, take the length-$m$ prefix $v$ of the infinite word $uuu\cdots$, where $m>n$; then $v$ is the $(\frac{m}{n})$th power of $u$. Usually, $v$ is referred to as a $(\frac{m}{n})$-power. A word is said to be $\alpha$-power-free if it contains no $(\frac{m}{n})$-powers with $\frac{m}{n}\geq\alpha$. Fractional powers gave rise to the notion of repetition threshold, which is the function ${\sf RT}(k)=\inf\{\alpha:\text{there exists an infinite $k$-ary $\alpha$-power-free word}\}$. The value ${\sf RT}(2)=2$ has been known since Thue [15]. Dejean [4] showed that ${\sf RT}(3)=7/4$ and conjectured the remaining values ${\sf RT}(4)=7/5$ (proved by Pansiot [10]) and ${\sf RT}(k)=\frac{k}{k-1}$ for $k\geq 5$ (proved by the efforts of many authors [9, 1, 3, 12]). An extension of the notions of fractional power and repetition threshold to the Abelian case was proposed by Samsonov and Shur [13]. Integral Abelian powers can be generalized to fractional ones in several ways; however, for the case $m\leq 2n$, one definition of an Abelian $(\frac{m}{n})$-power is preferable due to its symmetric nature. According to this definition, a word $vu\bar{v}$ is an Abelian $(\frac{m}{n})$th power of the word $vu$ if $|vu|=n$, $|vu\bar{v}|=m$, and $\bar{v}$ is Abelian equivalent to $v$ (in [13], such Abelian powers were called strong). Note that the reversal of an Abelian $(\frac{m}{n})$-power is also an Abelian $(\frac{m}{n})$-power. In this paper, we consider only strong Abelian powers; see Section 2 for the definition in the case $m>2n$. Given the definition of fractional Abelian powers, one naturally defines Abelian $\alpha$-power-free words and the Abelian repetition threshold ${\sf ART}(k)=\inf\{\alpha:\text{there exists an infinite $k$-ary Abelian $\alpha$-power-free word}\}$. Cassaigne and Currie [2] showed that for any $\varepsilon>0$ there exists an infinite Abelian $(1+\varepsilon)$-power-free word over a finite alphabet of size $2^{\mathrm{poly}(\varepsilon^{-1})}$. Surely, this bound is very loose, but it proves that $\lim_{k\to\infty}{\sf ART}(k)=1$. In [13], the lower bounds ${\sf ART}(4)\geq 9/5$ and ${\sf ART}(k)\geq\frac{k-2}{k-3}$ for $k\geq 5$ were proved and conjectured to be tight; in full, this conjecture is as follows.

Conjecture 1 ([13])

${\sf ART}(2)=11/3$; ${\sf ART}(3)=2$; ${\sf ART}(4)=9/5$; ${\sf ART}(k)=\frac{k-2}{k-3}$ for $k\geq 5$.

Up to now, no exact values of ${\sf ART}(k)$ are known. One reason for the lack of progress in estimating ${\sf ART}(k)$ is the fact that the number of $k$-ary Abelian $\alpha$-power-free words can be finite but so huge that these words cannot be enumerated by exhaustive search.

In the present study, we approach Abelian $\alpha$-power-free words using randomized depth-first search. The language ${\sf AF}(k,\alpha)$ of all $k$-ary Abelian $\alpha$-power-free words is viewed as a prefix tree $\mathcal{T}_{k,\alpha}$: the elements of ${\sf AF}(k,\alpha)$ are the nodes of the tree, and $u$ is an ancestor of $v$ in the tree iff $u$ is a prefix of $v$. The search starts at the root (the empty word) and is organized as follows. Reaching a node $u$ for the first time, we choose a random letter $a$, check on the fly whether $ua\in{\sf AF}(k,\alpha)$, and descend to $ua$ if this node exists; on subsequent visits to $u$, we choose a random letter among the letters not chosen at $u$ before and proceed in the same way. If there is no choice left, we return to the parent of $u$ (and thus will never reach $u$ again). We repeated the search multiple times and analysed the maximum level of a node reached in the tree and the change of level during the search. Based on this analysis, we state

Conjecture 2

${\sf ART}(2)>11/3$; $2<{\sf ART}(3)\leq 5/2$; ${\sf ART}(4)>9/5$; ${\sf ART}(5)=3/2$; $4/3<{\sf ART}(6)<3/2$; ${\sf ART}(k)=\frac{k-3}{k-4}$ for $k\geq 7$.

As a first step in proving this conjecture, we prove ${\sf ART}(k)>\frac{k-2}{k-3}$ for $k=6,7,8,9,10$ by exhaustive search (Theorem 4.1 in Section 4).

The paper is organized as follows. After the preliminary Section 2, we describe the algorithms used in our study and the results obtained through experiments in Sections 3 and 4, respectively. Section 5 contains some final remarks and prospects for future studies.

2 Definitions and Notation

We study finite words over finite alphabets, using the standard notation: $\Sigma$ for a (linearly ordered) alphabet, $\sigma$ for its size, and $\Sigma^{*}$ for the set of all finite words over $\Sigma$, including the empty word $\lambda$. For a length-$n$ word $u\in\Sigma^{*}$ we write $u=u[1..n]$; the elements of the range $[1..n]$ are positions in $u$, and the length of $u$ is denoted by $|u|$. A word $w$ is a factor of $u$ if $u=vwz$ for some (possibly empty) words $v$ and $z$; the condition $v=\lambda$ (resp., $z=\lambda$) means that $w$ is a prefix (resp., suffix) of $u$. Any factor $w$ of $u$ can be represented as $w=u[i..j]$ for some $i$ and $j$ ($j<i$ means $w=\lambda$). A factor $w$ of $u$ can have several such representations; we say that $u[i..j]$ specifies the occurrence of $w$ at position $i$.

A $k$-power of a word $u$ is the concatenation of $k$ copies of $u$, denoted by $u^{k}$. This notion can be extended to $\alpha$-powers for an arbitrary rational $\alpha>1$. The $\alpha$-power of $u$ is the word $u^{\alpha}=u\cdots uu^{\prime}$ such that $|u^{\alpha}|=\alpha|u|$ and $u^{\prime}$ is a prefix of $u$. A word is $\alpha$-free (resp., $\alpha^{+}\!$-free) if none of its factors is a $\beta$-power with $\beta\geq\alpha$ (resp., $\beta>\alpha$).

The Parikh vector $\Psi(u)$ of a word $u\in\Sigma^{*}$ is an integer vector of length $\sigma$ whose coordinates are the numbers of occurrences of the letters from $\Sigma$ in $u$. Thus, the word $acabac$ over the alphabet $\Sigma=\{a<b<c<d\}$ has the Parikh vector $(3,1,2,0)$. Two words $u$ and $v$ are Abelian equivalent (denoted by $u\sim v$) if $\Psi(u)=\Psi(v)$. The reversal of a word $u=u[1..n]$ is the word $u^{R}=u[n]u[n{-}1]\cdots u[1]$. Clearly, $u\sim u^{R}$. A nonempty word $u$ is an Abelian $k$th power ($k$-A-power) if $u=w_{1}\cdots w_{k}$, where $w_{i}\sim w_{j}$ for all indices $i,j$. A 2-A-power is an Abelian square, and a 3-A-power is an Abelian cube. Thus, $k$-A-powers generalize $k$-powers by relaxing the equality of factors to their Abelian equivalence. However, there are many ways to generalize the notion of an $\alpha$-power to the Abelian case, and all of them have drawbacks. The reason is that $u\sim v$ does not imply $u[i..j]\sim v[i..j]$ for any pair of factors of $u$ and $v$. If $1<\alpha\leq 2$, we define an $\alpha$-A-power as a word $vuv^{\prime}$ such that $\frac{|vuv^{\prime}|}{|vu|}=\alpha$ and $v\sim v^{\prime}$. The advantage of this definition is that the reversal of an $\alpha$-A-power is an $\alpha$-A-power as well. For $\alpha>2$ the situation is worse: all natural definitions compatible with the definition of a $k$-A-power are not symmetric with respect to reversals (see [13] for more details). So we give the definition which is compatible with the case $\alpha\leq 2$: an $\alpha$-A-power is a word $u_{1}\cdots u_{k}u^{\prime}$ such that $\frac{|u_{1}\cdots u_{k}u^{\prime}|}{|u_{1}|}=\alpha$, $k=\lfloor\alpha\rfloor$, $u_{1}\sim\cdots\sim u_{k}$, and $u^{\prime}$ is Abelian equivalent to a prefix of $u_{1}$. In [13], such words are called strong Abelian $\alpha$-powers. For a given $\alpha$, $\alpha$-A-free and $\alpha^{+}\!$-A-free words are defined in the same way as $\alpha$-free ($\alpha^{+}\!$-free) words. It is convenient to extend the rational numbers with “numbers” of the form $\alpha^{+}$, postulating the equivalence of the inequalities $\beta>\alpha$ and $\beta\geq\alpha^{+}$ (resp., $\beta\leq\alpha$ and $\beta<\alpha^{+}$).
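For illustration, here is a minimal Python sketch (our own, not from the paper) of the two notions just defined: Abelian equivalence via Parikh vectors and the test of whether a given word is itself an $\alpha$-A-power for $1<\alpha\leq 2$. The function names are ours.

from collections import Counter
from fractions import Fraction

def parikh(word):
    """Parikh vector of `word`, represented as a letter -> multiplicity map."""
    return Counter(word)

def is_alpha_A_power(word, alpha):
    """Is `word` an alpha-A-power for 1 < alpha <= 2?
    That is, word = v u v' with |word| / |vu| = alpha and v ~ v'."""
    m = len(word)
    period = Fraction(m) / alpha          # |vu|
    if period.denominator != 1:
        return False                      # alpha does not fit the length of the word
    n = int(period)
    v, v_bar = word[:m - n], word[n:]     # the two ends that must be Abelian equivalent
    return parikh(v) == parikh(v_bar)

# the word from Remark 1 below is an Abelian square (a 2-A-power):
print(is_alpha_A_power("abcdebdaec", Fraction(2)))   # True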

A language is any subset of $\Sigma^{*}$. The reversal $L^{R}$ of a language $L$ consists of the reversals of all words in $L$. The $\alpha$-A-free language over $\Sigma$ (where $\alpha$ belongs to the extended rationals) consists of all $\alpha$-A-free words over $\Sigma$ and is denoted by ${\sf AF}(\sigma,\alpha)$. These languages are the main objects of our study, which is aimed at finding the Abelian repetition threshold, i.e., the function ${\sf ART}(k)=\inf\{\alpha:{\sf AF}(k,\alpha)\text{ is infinite}\}$. The languages ${\sf AF}(\sigma,\alpha)$ are closed under permutations: if $\pi$ is a permutation of the alphabet, then the words $u$ and $\pi(u)$ are $\alpha$-A-free for exactly the same values of $\alpha$. This makes it possible to enumerate the words of the languages ${\sf AF}(\sigma,\alpha)$ by considering only lexmin words: a word $u\in\Sigma^{*}$ is lexmin if $u\leq\pi(u)$ for every permutation $\pi$ of $\Sigma$.
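For example, the lexmin property is easy to test (a small sketch of ours; it relies on the standard observation that the lexicographically least image of a word under alphabet permutations renames its letters, in order of first occurrence, to the smallest letters of $\Sigma$):

def is_lexmin(word, alphabet):
    """A word is lexmin iff its distinct letters, listed in the order of their
    first occurrence, form a prefix of the ordered alphabet."""
    order = []
    for letter in word:
        if letter not in order:
            order.append(letter)
    return list(alphabet[:len(order)]) == order

print(is_lexmin("0102", "0123"))   # True
print(is_lexmin("1012", "0123"))   # False (renaming 0 <-> 1 gives the smaller word 0102)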

Suppose that a language $L$ is factorial (i.e., closed under taking factors of its words); for example, all languages ${\sf AF}(\sigma,\alpha)$ are factorial. Then $L$ can be represented by its prefix tree $\mathcal{T}_{L}$, which is a rooted labeled tree whose nodes are the elements of $L$ and whose edges have the form $u\xrightarrow{a}ua$, where $a$ is a letter. Thus $u$ is an ancestor of $v$ iff $u$ is a prefix of $v$. We study the languages ${\sf AF}(\sigma,\alpha)$ through different types of search in their prefix trees.

3 Algorithms

In this section we present the algorithms we developed for use in our experiments. First we describe the random depth-first search in the prefix tree $\mathcal{T}=\mathcal{T}_{L}$ of an arbitrary factorial language $L$. Given a number $N$, the algorithm visits $N$ distinct nodes of $\mathcal{T}$ following the depth-first order and returns the maximum level of a visited node. The search can easily be augmented to return the word corresponding to the node of maximum level, or to log the sequence of levels of visited nodes. Algorithm 1 below describes one iteration of the search. In the algorithm, $u=u[1..n]$ is the word corresponding to the current node; ${\sf Set}[u]$ is the set of all letters $a$ such that the search has not tried the node $ua$ yet; ${\sf ml}$ is the maximum level reached so far; $\mathsf{count}$ is the number of visited nodes; $\mathcal{L}(u)$ is the predicate returning $\mathsf{true}$ if $u\in L$ and $\mathsf{false}$ otherwise. Lines 3 and 8 refer to the updates of the data structures used to compute $\mathcal{L}(u)$. The search starts with $u=\lambda$, ${\sf ml}=0$, $\mathsf{count}=1$. A variant of this search algorithm was used in [11] to numerically estimate the entropy of some $\alpha$-free and $\alpha$-A-free languages.

Algorithm 1 Random depth-first search in $\mathcal{T}(L)$: one iteration
1: if $\mathsf{count}=N$ then break ▷ search finished
2: if ${\sf Set}[u]=\varnothing$ then ▷ all children of $u$ were visited
3:    [update data structures]
4:    $u\leftarrow u[1..|u|{-}1]$ ▷ return to the parent of $u$
5: else
6:    $a\leftarrow$ random(${\sf Set}[u]$); ${\sf Set}[u]\leftarrow{\sf Set}[u]\setminus\{a\}$ ▷ take a random unused letter
7:    if $\mathcal{L}(ua)$ then ▷ the node $ua$ is in $\mathcal{T}(L)$
8:       [update data structures]
9:       $u\leftarrow ua$; ${\sf Set}[u]\leftarrow\Sigma$; $\mathsf{count}\leftarrow\mathsf{count}+1$ ▷ visit $ua$ next
10:      if $|u|>{\sf ml}$ then ${\sf ml}\leftarrow|u|$ ▷ update the maximum level
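To make the loop concrete, here is a minimal Python sketch of the whole search (our illustration, not the authors' code); the membership predicate in_language plays the role of $\mathcal{L}$, and the data-structure updates of lines 3 and 8 are omitted.

import random

def random_dfs(alphabet, in_language, N):
    """Random depth-first search in the prefix tree of a factorial language.
    Visits up to N distinct nodes and returns the maximum level reached."""
    u = []                            # current word, as a list of letters
    unused = [set(alphabet)]          # unused[d] = letters not yet tried at depth d
    ml, count = 0, 1                  # maximum level, number of visited nodes
    while count < N:
        if not unused[-1]:            # all children of the current node were tried
            if not u:
                break                 # the whole (finite) tree has been traversed
            u.pop()                   # return to the parent
            unused.pop()
        else:
            a = random.choice(sorted(unused[-1]))
            unused[-1].discard(a)
            if in_language(u + [a]):  # the child ua exists in the prefix tree
                u.append(a)
                unused.append(set(alphabet))
                count += 1
                ml = max(ml, len(u))
    return ml

For instance, random_dfs("abc", lambda w: True, 1000) walks straight down a full ternary tree and returns 999.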

The key to an efficient search is a fast algorithm computing the predicate $\mathcal{L}(u)$. The following fact is very useful: if $u^{\prime}$ is a proper prefix of $u$, then the node for $u^{\prime}$ exists and hence $\mathcal{L}(u^{\prime})=\mathsf{true}$. Below we present four algorithms checking, for a given $\alpha$, whether a word has a $\beta$-A-power with $\beta\geq\alpha$ as a suffix. The cases $\alpha<2$ and $\alpha>2$ are considered in Sections 3.1 and 3.2, respectively.

3.1 Avoiding small powers

Let $\alpha<2$ and let $u$ be a word of length $n$ such that all proper prefixes of $u$ are $\alpha$-A-free. To prove that $u$ is $\alpha$-A-free, it is necessary and sufficient to show that

  • ($\star$) no suffix of $u$ can be written as $xyz$ such that $|z|>0$, $x\sim z$, and $\frac{|xyz|}{|xy|}\geq\alpha$.

Remark 1

Since Abelian equivalence is not preserved under taking factors, the ratio $\frac{|xyz|}{|xy|}$ in ($\star$) can significantly exceed $\alpha$. For example, all proper prefixes, and even all proper factors, of the word $u=abcde\,bdaec$ are $\frac{3}{2}$-A-free, while $u$ is an Abelian square. Hence for each suffix $z$ of $u$ one should check multiple candidates for the factor $x$ in ($\star$). The number of such candidates can be as big as $\Theta(n)$; in total, $\Theta(n^{2})$ candidates for the pair $(x,z)$ should be analysed.

A reasonable approach is to store the Parikh vectors of all prefixes of $u$; they occupy $O(n\sigma)$ machine words in total and require $O(\sigma)$ time for an update when a letter is appended to or deleted from the right end of the word. Then the Parikh vector of each factor of $u$ is obtained as the difference of the Parikh vectors of the corresponding prefixes. So one comparison of factors takes $O(\sigma)$ time, which means $\Theta(n^{2}\sigma)$ time for performing all comparisons of candidate pairs $(x,z)$ in ($\star$) in a naive way (see Remark 1). Algorithm 2 below gets many of these comparisons for free. It makes use of two length-$n$ arrays for each letter $a\in\Sigma$: $c_{a}[i]$ is the number of occurrences of $a$ in $u[1..i]$ (i.e., a coordinate of the Parikh vector of the prefix $u[1..i]$) and $d_{a}[i]$ is the position of the $i$th occurrence of $a$ (counted from the left) in the current word $u$. Each of the arrays can be updated in $O(\sigma)$ time when a letter is appended to or deleted from the right end of $u$. We specify lines 3 and 8 of Algorithm 1 as follows. At line 3, we delete the Parikh vector of $u$ from the $c$-arrays and delete the last element of $d_{b}$, where $b$ is the last letter of $u$. At line 8, we add the Parikh vector of $ua$ to the $c$-arrays and add the new element $|ua|$ to $d_{a}$.

The arrays $c_{a}$ and $d_{a}$ are used to compute two auxiliary functions, $\mathsf{Parikh}(i,j)$ and $\mathsf{cover}(\vec{P},j)$. The function $\mathsf{Parikh}(i,j)$ returns the Parikh vector of $u[i..j]$; its coordinates are just the differences of the form $c_{a}[j]-c_{a}[i-1]$. The function $\mathsf{cover}(\vec{P},j)$ returns the largest number $i$ such that $\Psi(u[i..j])\geq\vec{P}$, or zero if there is no such number. Thus the function returns 0 if $u[1..j]$ contains, for some $a$, fewer $a$'s than $\vec{P}[a]$, i.e., if $c_{a}[j]<\vec{P}[a]$. If no such $a$ exists, then $i=\min_{a\in\Sigma}\{d_{a}[c_{a}[j]-\vec{P}[a]+1]\}$.
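A minimal Python sketch of these two helpers (our illustration; 1-based indices as in the paper, the tables are built once for a fixed word, and, as in Algorithms 2 and 4, the vector $\vec{P}$ is assumed to be nonzero):

def build_tables(u, alphabet):
    """c[a][i] = number of a's in u[1..i]; d[a][k] = position of the k-th
    occurrence of a in u (1-based; index 0 is a placeholder)."""
    n = len(u)
    c = {a: [0] * (n + 1) for a in alphabet}
    d = {a: [0] for a in alphabet}
    for i in range(1, n + 1):
        for a in alphabet:
            c[a][i] = c[a][i - 1] + (u[i - 1] == a)
        d[u[i - 1]].append(i)
    return c, d

def parikh_of_range(c, alphabet, i, j):
    """Parikh vector of u[i..j], as a dict letter -> count."""
    return {a: c[a][j] - c[a][i - 1] for a in alphabet}

def cover(c, d, alphabet, P, j):
    """Largest i with Psi(u[i..j]) >= P coordinate-wise, or 0 if none exists
    (P is assumed to be nonzero)."""
    if any(c[a][j] < P[a] for a in alphabet):
        return 0                      # u[1..j] lacks some letter
    return min(d[a][c[a][j] - P[a] + 1] for a in alphabet if P[a] > 0)

For example, with u = "abcab", P = {"a": 1, "b": 1, "c": 0} and j = 5, cover returns 4: the suffix u[4..5] = "ab" is the shortest factor ending at position 5 whose Parikh vector dominates P.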

Algorithm 2 Abelian powers detection (case $\alpha<2$)
1: function $\mathsf{alphafree}(u)$ ▷ $u$ = word; $n=|u|$
2: $\mathit{free}\leftarrow\mathsf{true}$ ▷ $\alpha$-A-freeness flag
3: for $i=n$ downto $1+\lceil n/2\rceil$ do ▷ $z=u[i..n]$
4:    $\mathit{right}\leftarrow i-1$
5:    $\mathit{len}\leftarrow n-i+1$; $\vec{P}\leftarrow\mathsf{Parikh}(i,n)$ ▷ length and Parikh vector of $z$
6:    $\mathit{left}\leftarrow\mathsf{cover}(\vec{P},\mathit{right})$
7:    if $\mathit{left}=0$ then break ▷ $\Psi(u[1..\mathit{right}])\not\geq\Psi(z)$
8:    while $\mathit{left}\geq\max\{1,\lceil\frac{\alpha i-1-n}{\alpha-1}\rceil\}$ do ▷ guarantees $\frac{|xyz|}{|xy|}\geq\alpha$
9:       if $\mathit{right}-\mathit{left}+1=\mathit{len}$ then ▷ $x=u[\mathit{left}..\mathit{right}]\sim z$
10:         $\mathit{free}\leftarrow\mathsf{false}$; break
11:      else ▷ shift $\mathit{right}$ leftwards, skip redundant comparisons
12:         $\mathit{right}\leftarrow\mathit{left}+\mathit{len}-1$
13:      $\mathit{left}\leftarrow\mathsf{cover}(\vec{P},\mathit{right})$
14:   if $\mathit{free}=\mathsf{false}$ then break
15: return $\mathit{free}$ ▷ the answer to “is $u$ $\alpha$-A-free?”
Proposition 1

Let $\alpha$ be a number such that $1<\alpha<2$ and let $u$ be a word all proper prefixes of which are $\alpha$-A-free. Then Algorithm 2 correctly detects whether $u$ is $\alpha$-A-free.

Proof

Let us show that Algorithm 2 verifies the condition ($\star$). The outer cycle of the algorithm fixes the first position $i$ of the suffix $z$ of $u$; the suffixes are analysed in order of increasing length $\mathit{len}=|z|$. If a forbidden suffix $xyz$ is detected during an iteration (we discuss the correctness and completeness of this detection below), then the algorithm breaks the outer cycle in line 14 and returns $\mathsf{false}$. Thus at the current iteration of the outer cycle the condition ($\star$) is already verified for all shorter suffixes. The iteration uses a simple observation: if $x\sim z$, then every word $v$ containing $x$ satisfies $\Psi(v)\geq\Psi(z)$. We proceed as follows. We fix the rightmost position $\mathit{right}$ where a factor $x$ satisfying $x\sim z$ can end. Initially $\mathit{right}=i-1$, as $x$ can immediately precede $z$ (see the example in Remark 1). Then we compute the shortest factor $v=u[\mathit{left}..\mathit{right}]$ such that $\Psi(v)\geq\Psi(z)$. If $v=x$, the suffix $xz$ of $u$ violates ($\star$). Otherwise $x$ cannot begin later than at the position $\mathit{left}$, by the construction of $v$. Hence we decrease $\mathit{right}$ by setting $\mathit{right}=\mathit{left}+|z|-1$ and repeat the above procedure in a loop. The verification of ($\star$) for $z$ ends successfully when either $v$ does not exist (i.e., $\Psi(u[1..\mathit{right}])\not\geq\Psi(z)$ for the current value of $\mathit{right}$) or $\mathit{left}$ is too small (i.e., $xyz=u[\mathit{left}..n]$ with $|x|=|z|$ implies $\frac{|xyz|}{|xy|}<\alpha$). The described process is illustrated by Fig. 1. Details are provided below.

Figure 1: Illustrating the proof of Proposition 1. Processing the suffix $z$, Algorithm 2 successively finds three words $v$ satisfying $\Psi(v)\geq\Psi(z)$. In the left picture, the position $\mathit{left}$ of $v^{(3)}$ is smaller than the bound in line 8, so the verification of ($\star$) for $z$ is finished. In the right picture, $|v^{(3)}|=|z|$, so a forbidden suffix, starting with $v^{(3)}$, is detected.

In lines 4–6 the algorithm calls $\mathsf{Parikh}$ to compute $\Psi(z)$ and $\mathsf{cover}$ to find the mentioned factor $v=u[\mathit{left}..\mathit{right}]$ for $\mathit{right}=i-1$. If $\mathit{left}=0$, then $\Psi(u[1..i{-}1])\not\geq\Psi(u[i..n])$ and hence every suffix $xyz$ of $u$ with $z=u[i..n]$ satisfies $x\not\sim z$. Moreover, one can observe that $\Psi(u[1..j{-}1])\not\geq\Psi(u[j..n])$ for each $j<i$, which immediately verifies ($\star$) for all suffixes of $u$ that are longer than $z$. Hence in this case the verification of ($\star$) is already done; accordingly, the algorithm breaks the outer cycle in line 7 and returns $\mathsf{true}$.

If no break has happened, the algorithm enters the inner cycle, checks whether $v=x$ (line 9), and breaks with the output $\mathsf{false}$ if this condition holds. If it does not, the algorithm decreases $\mathit{right}$ as described above (line 12) and computes the new factor $v$ (line 13). If $v$ does not exist, $\mathit{left}$ gets the value 0, which results in an immediate exit from the inner cycle. If $v$ is computed but its position is too small, then the cycle is also exited. The exit means the end of the $i$th iteration.

Thus, Algorithm 2 returns $\mathsf{false}$ only if it finds a suffix $xyz$ of $u$ which violates ($\star$). For the other direction, let $xyz=u[j..n]$ violate ($\star$), with $|z|$ minimal over all suffixes violating it. Then Algorithm 2 cannot stop before the iteration which checks $z$. During this iteration, $\mathit{left}$ cannot become smaller than $j$, by the definition of the factor $v$. As $\mathit{right}$ decreases at each iteration of the inner cycle, eventually $x$ will be found. Thus the algorithm indeed verifies ($\star$) and hence correctly detects the $\alpha$-A-freeness of $u$. ∎

Remark 2

As both $\mathsf{Parikh}$ and $\mathsf{cover}$ work in $O(\sigma)$ time, Algorithm 2 processes a word $u$ of length $n$ in $O((K+n)\sigma)$ time, where $K$ is the total number of inner cycle iterations during the course of the algorithm. Clearly, $K=O(n^{2})$. The results of experiments suggest that $K=\Theta(n^{3/2})$ in expectation. This is indeed the case if $u$ is a random word, as Lemma 1 below shows. This lemma implies that the iteration processing a suffix $z$ builds, in expectation, $O(\sqrt{|z|})$ words $v$.

Lemma 1

Suppose that an infinite word $\mathbf{w}$ is chosen uniformly at random among all $\sigma$-ary infinite words, $z$ is a prefix of $\mathbf{w}$, and $v$ is the shortest prefix of $\mathbf{w}[|z|{+}1..\infty]$ such that $\Psi(v)\geq\Psi(z)$. Then the expected length of $v$ is $|z|+\Omega(\sqrt{|z|})$.

Proof

Let $\ell=|z|$ and $\delta=|v|-|z|$. First consider the case $\Sigma=\{0,1\}$. The process can be viewed as follows: first, $z$ is generated by $\ell$ tosses of a fair coin; then another $\ell$ tosses generate a prefix $x$ of $v$; additional tosses are made one by one until the desired result $\Psi(v)\geq\Psi(z)$ is reached after $\delta$ additional tosses. The Parikh vector of a word of known length over $\Sigma$ is determined by the number of 1's. Hence $\Psi(z)$ is determined by a random variable $\xi$ (the number of 1's in $z$) with the binomial distribution $\mathsf{bin}(\ell,\frac{1}{2})$; similarly, $\Psi(x)$ is determined by a random variable $\eta$ with the same distribution.

The vector $\Psi(z)-\Psi(x)$ has the form $(-m,m)$ for some integer $m$. To obtain $v$, we should make $|m|$ “successful” tosses, the probability of “success” being $1/2$; hence the conditional expectation of $\delta$ equals $2|m|$. Thus it remains to find the expectation of $|m|=|\xi-\eta|$. Since $E(\xi-\eta)=0$, the value $E(|\xi-\eta|)$ is the mean absolute deviation of $\xi-\eta$, which is within a constant factor of its standard deviation.

Due to symmetry, $\eta$ and $\ell-\eta$ have the same distribution. Hence we can replace $\xi-\eta$ by $\xi+\eta-\ell$. The random variable $\xi+\eta$ has the binomial distribution $\mathsf{bin}(2\ell,\frac{1}{2})$, so its standard deviation is $\sqrt{\ell/2}$. Thus $E(\delta)=2E(|\xi-\eta|)=\Theta(\sqrt{\ell})=\Omega(\sqrt{\ell})$, as desired.

Over larger alphabets the expectation of $\delta$ can only increase. The easiest way to see this is to split $\Sigma$ arbitrarily into two subsets $K_{1}$ and $K_{2}$ of equal size. Then $x$, compared to $z$, has a deficiency of letters from one of these subsets, say, $K_{1}$. By the argument for the binary alphabet, $\Omega(\sqrt{\ell})$ additional letters are needed, in expectation, to cover this deficiency. This is a necessary (but not sufficient) condition for obtaining the word $v$. Hence $E(\delta)=\Omega(\sqrt{\ell})$. ∎
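The $\Theta(\sqrt{\ell})$ behaviour of $\delta$ is easy to check empirically; the following small Monte Carlo sketch (ours, not part of the paper's toolchain) estimates $E(\delta)$ for a few lengths.

import random

def expected_overshoot(sigma, ell, trials=2000):
    """Estimate E(delta) from Lemma 1: z is a random word of length ell over a
    sigma-letter alphabet, v is the shortest random continuation whose Parikh
    vector dominates Psi(z); the function averages |v| - |z| over many trials."""
    total = 0
    for _ in range(trials):
        need = [0] * sigma
        for _ in range(ell):                 # Parikh vector of z
            need[random.randrange(sigma)] += 1
        length = 0
        while any(need):                     # extend v until Psi(v) >= Psi(z)
            a = random.randrange(sigma)
            if need[a] > 0:
                need[a] -= 1
            length += 1
        total += length - ell                # delta for this trial
    return total / trials

# the averages should grow roughly like sqrt(ell):
for ell in (100, 400, 1600):
    print(ell, expected_overshoot(2, ell))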

Algorithm 2 significantly speeds up the naive algorithm but is still rather slow. For the case $\alpha\leq 3/2$ a much faster dictionary-based Algorithm 3 is presented below. However, Algorithm 3 consumes an amount of memory which is quadratic in $n$; this limits the depth of the search to about $10^{4}$ on an ordinary laptop.

When processing a word $u=u[1..n]$, the algorithm keeps in the dictionary all factors of $u[1..n{-}1]$, up to Abelian equivalence. Recall that a dictionary contains a set of pairs (key, value), where all keys are unique, and supports fast lookup, addition, and deletion by key. For the dictionary used in Algorithm 3, the keys are Parikh vectors and the values are lists of positions, in increasing order, of the factors having this Parikh vector. The algorithm accesses only the last (maximal) element of the list. Let us describe the updates of the dictionary (lines 3 and 8 of Algorithm 1). At line 3, we delete all suffixes of $u$ from the dictionary. For a suffix $z$, this means the deletion of the last element from the list $dict[\Psi(z)]$; if the list becomes empty, the entry for $\Psi(z)$ is also deleted. At line 8, all suffixes of $ua$ are added to the dictionary. For a suffix $z$, if $\Psi(z)$ was not in the dictionary, an entry is created; then the position $|ua|-|z|+1$ is added to the end of the list $dict[\Psi(z)]$.
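A minimal Python sketch of this dictionary maintenance (ours; Parikh vectors are stored as tuples over a fixed ordered alphabet, positions are 1-based as in the paper):

from collections import defaultdict

# the dictionary: Parikh vector (as a tuple) -> increasing list of positions
dictionary = defaultdict(list)

def add_suffixes(dct, u, alphabet):
    """Update for line 8 of Algorithm 1: register all suffixes of the new
    current word u by appending their starting positions to the lists keyed
    by their Parikh vectors."""
    counts = {a: 0 for a in alphabet}
    for start in range(len(u) - 1, -1, -1):     # suffixes by increasing length
        counts[u[start]] += 1
        key = tuple(counts[a] for a in alphabet)
        dct[key].append(start + 1)              # lists stay sorted, see the text

def remove_suffixes(dct, u, alphabet):
    """Update for line 3 of Algorithm 1: undo add_suffixes(dct, u, alphabet)."""
    counts = {a: 0 for a in alphabet}
    for start in range(len(u) - 1, -1, -1):
        counts[u[start]] += 1
        key = tuple(counts[a] for a in alphabet)
        dct[key].pop()                          # the suffix is the last entry
        if not dct[key]:
            del dct[key]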

Algorithm 3 Dictionary-based Abelian powers detection ($\alpha\leq 3/2$)
1: function $\mathsf{alphafreedict}(u)$ ▷ $u$ = word; $n=|u|$
2: $\mathit{free}\leftarrow\mathsf{true}$ ▷ $\alpha$-A-freeness flag
3: $\vec{P}\leftarrow\vec{0}$
4: for $i=n$ downto $1+\lceil n/2\rceil$ do ▷ $z=u[i..n]$
5:    $\mathit{len}\leftarrow n-i+1$ ▷ length of $z$
6:    $\vec{P}[u[i]]\leftarrow\vec{P}[u[i]]+1$ ▷ get $\Psi(z)$ from $\Psi(u[i{+}1..n])$
7:    $\mathit{pos}\leftarrow dict[\vec{P}].last$ ▷ position of the last occurrence of some $x\sim z$, if it exists
8:    if $\mathit{pos}\leq i-\mathit{len}$ and $\mathit{pos}\geq\lceil\frac{\alpha i-1-n}{\alpha-1}\rceil$ then ▷ $xyz$ is forbidden
9:       $\mathit{free}\leftarrow\mathsf{false}$; break
10: return $\mathit{free}$ ▷ the answer to “is $u$ $\alpha$-A-free?”
Proposition 2

Let $\alpha$ be a number such that $1<\alpha\leq 3/2$ and let $u$ be a word all proper prefixes of which are $\alpha$-A-free. Then Algorithm 3 correctly detects whether $u$ is $\alpha$-A-free.

Proof

Let us show that Algorithm 3 verifies ($\star$). First suppose that the algorithm returned $\mathsf{false}$. Then it broke from the for cycle (line 9); let $z$ be the last suffix processed. The lookup by the key $\Psi(z)$ returned the position of a factor $x\sim z$, and the condition in line 8 was true. Then the suffix $xyz=u[\mathit{pos}..n]$ violates ($\star$). Indeed, $x\sim z$ since $x$ was found by the key $\Psi(z)$; the first inequality means that $x$ and $z$ do not overlap in $u$; the second inequality is equivalent to $\frac{|xyz|}{|xy|}\geq\alpha$.

Now suppose that the algorithm returned $\mathsf{true}$. Aiming at a contradiction, assume that $u$ has a suffix violating ($\star$). Let $xyz$ (with $x\sim z$) be the shortest such suffix. Consider the iteration of the for cycle where $z$ was processed. The key $\Psi(z)$ was present in the dictionary because $x\sim z$. If $\mathit{pos}$ (line 7) corresponded to the $x$ from our “bad” suffix, i.e., $xyz=u[\mathit{pos}..n]$, then both inequalities in line 8 held, because $x$ and $z$ do not overlap in $u$ and $\frac{|xyz|}{|xy|}\geq\alpha$. But then the algorithm would have returned $\mathsf{false}$, contradicting our assumption. Hence $\mathit{pos}$ was the position of some other $x^{\prime}\sim z$ which occurs in $u$ later than $x$. By the choice of the suffix $xyz$, the word $u$ cannot have a shorter suffix $x^{\prime}y^{\prime}z$ with $x^{\prime}\sim z$ not overlapping $z$ (such a suffix would violate ($\star$) as well, since shortening $y$ only increases the ratio in ($\star$)). This means that the occurrences of $x^{\prime}$ and $z$ overlap (see Fig. 2).

Figure 2: Location of Abelian equivalent factors (Proposition 2).

Note that $x^{\prime}$ and $x$ also overlap. Otherwise, $xyz$ has a prefix of the form $x\hat{y}x^{\prime}$ with $\frac{|x\hat{y}x^{\prime}|}{|x\hat{y}|}\geq\frac{|xyz|}{|xy|}\geq\alpha$, contradicting the condition that all proper prefixes of $u$ are $\alpha$-A-free. Then $x=x_{1}x_{2}$, $x^{\prime}=x_{2}yx_{3}$, $z=x_{3}z_{1}$, as shown in Fig. 2, and $x_{1},x_{2},x_{3},z_{1}$ are nonempty. We observe that $x\sim x^{\prime}\sim z$ implies $x_{1}\sim yx_{3}$ and $x_{2}y\sim z_{1}$. By the condition on the prefixes of $u$, $\frac{|x_{1}x_{2}yx_{3}|}{|x_{1}x_{2}|}<\alpha\leq 3/2$. Hence $|yx_{3}|<|x_{2}|$ and then $|x_{3}|<|x_{2}y|=|z_{1}|$. Therefore $\frac{|x_{2}yx_{3}z_{1}|}{|x_{2}yx_{3}|}>3/2\geq\alpha$, so the suffix $x^{\prime}z_{1}=x_{2}yx_{3}z_{1}$ of $u$ violates ($\star$). But $|x^{\prime}z_{1}|<|xyz|$, contradicting the choice of $xyz$. This contradiction proves that $u$ satisfies ($\star$). ∎

Dictionaries based on hash table techniques such as linear probing or cuckoo hashing guarantee expected constant time per dictionary operation. As Algorithm 3 consists of a single cycle with a constant number of operations inside, the following statement is straightforward.

Proposition 3

For a word of length $n$, Algorithm 3 performs $O(n)$ operations, including dictionary operations.

Remark 3

A slight modification of Algorithm 3 allows one to process the important case $\alpha=(3/2)^{+}$ within the same complexity bound. The whole argument from the proof of Proposition 2 remains valid for $\alpha=(3/2)^{+}$ except for one specific situation: in Fig. 2 it is possible that $y$ is empty and $|x_{1}|=|x_{2}|=|x_{3}|=|z_{1}|$. In this situation Algorithm 3 misses the Abelian square $xz$. To fix this, we add a patch after line 7:
7.5: if $\mathit{pos}=i-\mathit{len}/2$ then $\mathit{pos}\leftarrow\mathit{pos}.next$
As an example, consider $u=abcdbadc$. Processing the suffix $z=badc$, Algorithm 3 retrieves $\mathit{pos}=3$ from the dictionary by the key $\Psi(z)$. The corresponding factor $x^{\prime}=cdba$ overlaps $z$, and the condition in line 8 would fail for this $\mathit{pos}$. However, $\mathit{pos}$ satisfies the condition in the inserted line 7.5, and thus the factor $x=abcd$ at position 1 will be reached. The condition in line 8 holds for $\mathit{pos}=1$, and the forbidden Abelian repetition is detected.

Remark 4

Algorithm 3 can be further modified to work for all $\alpha<2$. If we replace the patch from Remark 3 with the following one:
7.5: while $\mathit{pos}>i-\mathit{len}$ do $\mathit{pos}\leftarrow\mathit{pos}.next$
then the algorithm will find the closest factor $x\sim z$ which does not overlap with $z$. This new patch introduces an inner cycle and thus affects the time complexity, but the algorithm remains faster in practice than Algorithm 2.

3.2 Avoiding big powers

Let $\alpha>2$. The case $2<\alpha<3$ for ternary words and the case $3<\alpha<4$ for binary words are relevant to the study of the Abelian repetition threshold. We provide here the algorithms for the first case; the algorithms for the second case are very similar (the only difference is that one should check for Abelian cubes instead of Abelian squares). An Abelian $\beta$-power with $\alpha\leq\beta\leq 3$ has the form $ZZ^{\prime}z$, where $Z\sim Z^{\prime}$, $z$ is Abelian equivalent to a prefix of $Z$, and $\frac{|ZZ^{\prime}z|}{|Z|}\geq\alpha$. We can write the Abelian square $ZZ^{\prime}$ as $xy$, where $|x|\leq|y|$ and $x\sim z$. Consequently, if all proper prefixes of a word $u$ are $\alpha$-A-free, then $u$ is $\alpha$-A-free iff the following analog of ($\star$) holds:

  • ($*$) no suffix of $u$ can be written as $xyz$ such that $|y|\geq|x|>0$, $x\sim z$, $xy$ is an Abelian square, and $\frac{2|xyz|}{|xy|}\geq\alpha$.

Verifying ($*$) for $u$, we proceed for its suffix $z$ as follows. Within the range determined by $\alpha$, we search for all factors $x=u[\mathit{left}..\mathit{right}]$ such that $x\sim z$. The search is organized as in Algorithm 2 (see Fig. 1). For each such $x$ we consider the corresponding suffix $xyz$ of $u$ and check whether $xy$ is an Abelian square. If yes, $xyz$ violates ($*$). In Algorithm 4 below, we make use of the arrays $c_{a},d_{a}$ and the functions $\mathsf{Parikh},\mathsf{cover}$ designed for Algorithm 2.

Algorithm 4 Abelian powers detection (case $2<\alpha<3$)
1: function $\mathsf{ALPHAfree}(u)$ ▷ $u$ = word; $n=|u|$
2: $\mathit{free}\leftarrow\mathsf{true}$ ▷ $\alpha$-A-freeness flag
3: for $i=n$ downto $1+\lceil 2n/3\rceil$ do ▷ $z=u[i..n]$
4:    $\mathit{right}\leftarrow i-1$
5:    $\mathit{len}\leftarrow n-i+1$; $\vec{P}\leftarrow\mathsf{Parikh}(i,n)$ ▷ length and Parikh vector of $z$
6:    $\mathit{left}\leftarrow\mathsf{cover}(\vec{P},\mathit{right})$
7:    if $\mathit{left}=0$ then break ▷ $\Psi(u[1..\mathit{right}])\not\geq\Psi(z)$
8:    while $\mathit{left}\geq\max\{1,\lceil\frac{\alpha i-1-2n}{\alpha-2}\rceil\}$ do ▷ guarantees $\frac{2|xyz|}{|xy|}\geq\alpha$
9:       if $\mathit{left}+\mathit{len}-1=\mathit{right}$ then ▷ $x=u[\mathit{left}..\mathit{right}]\sim z$
10:         if $2\mid(i-\mathit{left})$ and $\sum_{a\in\Sigma}\bigl|c_{a}[i{-}1]+c_{a}[\mathit{left}{-}1]-2c_{a}[\frac{i+\mathit{left}}{2}{-}1]\bigr|=0$ then
11:            $\mathit{free}\leftarrow\mathsf{false}$; break ▷ $xy=u[\mathit{left}..i{-}1]$ is an Abelian square
12:         else
13:            $\mathit{right}\leftarrow\mathit{right}-1$ ▷ right bound for the next search
14:      else
15:         $\mathit{right}\leftarrow\mathit{left}+\mathit{len}-1$ ▷ right bound for the next search
16:      $\mathit{left}\leftarrow\mathsf{cover}(\vec{P},\mathit{right})$
17:   if $\mathit{free}=\mathsf{false}$ then break
18: return $\mathit{free}$ ▷ the answer to “is $u$ $\alpha$-A-free?”
Proposition 4

Let $\alpha$ be a number such that $2<\alpha<3$ and let $u$ be a word all proper prefixes of which are $\alpha$-A-free. Then Algorithm 4 correctly detects whether $u$ is $\alpha$-A-free.

Proof

Algorithm 4 is similar to Algorithm 2, so we focus on the differences. If some suffix $xyz$ violates ($*$), then $|z|\leq|xyz|/3\leq n/3$; hence the range for the outer cycle in line 3. For a fixed $z$ we repeatedly seek the shortest factor $v=u[\mathit{left}..\mathit{right}]$ with the given right bound and the property $\Psi(v)\geq\Psi(z)$. If $|v|=|z|$ (the condition in line 9 holds), then $v$ is a candidate for $x$ in a suffix $xyz$ violating ($*$). The initial value for $\mathit{right}$ (line 4) is set to ensure the condition $|x|\leq|y|$. The candidate found in line 9 is checked in line 10 for the remaining condition: $xy$ is an Abelian square. Namely, we check that $|xy|$ is even and that its left and right halves have the same Parikh vector. If this condition holds, the algorithm breaks both the inner and the outer cycle and returns $\mathsf{false}$. If the condition fails, we decrease $\mathit{right}$ by 1 and compute the factor $v$ for this new right bound. The rest is the same as in Algorithm 2. So we can conclude that Algorithm 4 verifies ($*$). ∎
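The Abelian-square test in line 10 uses only the prefix counts $c_{a}$; a small Python sketch of this check (our helper name, 1-based indices):

def is_abelian_square(c, alphabet, left, i):
    """Check whether u[left..i-1] is an Abelian square, given the prefix counts
    c[a][j] = number of a's in u[1..j] (with c[a][0] = 0)."""
    length = i - left                    # |xy|, the length of u[left..i-1]
    if length % 2:
        return False                     # an Abelian square has even length
    mid = left + length // 2             # first position of the right half
    return all(c[a][mid - 1] - c[a][left - 1] == c[a][i - 1] - c[a][mid - 1]
               for a in alphabet)

This is equivalent to the sum in line 10 being zero: every summand vanishes exactly when the two halves $u[\mathit{left}..\frac{i+\mathit{left}}{2}{-}1]$ and $u[\frac{i+\mathit{left}}{2}..i{-}1]$ have equal Parikh vectors.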

Remark 5

Each iteration of the inner cycle of Algorithm 4 takes $O(\sigma)$ time, so Algorithm 4 has the same time complexity $O((K+n)\sigma)$ as Algorithm 2. (Recall that $K$ is the total number of iterations of the inner cycle during the processing of $u$.)

Algorithm 4 is rather slow. But it appears that dual Abelian powers can be detected by a much faster Algorithm 5. Let us give the definitions. As was mentioned in Section 2, the reversal of an $\alpha$-A-power for $\alpha>2$ is not necessarily an $\alpha$-A-power. For example, $u=abc\,bac\,a$ is a $(7/3)$-A-power, while $u^{R}=aca\,bcb\,a$ does not begin with an Abelian square. We call $u$ a dual $\alpha$-A-power if $u^{R}$ is an $\alpha$-A-power; the notion of a dual $\alpha$-A-free word is defined by analogy with that of an $\alpha$-A-free word. Dual $\alpha$-A-free words are exactly the reversals of $\alpha$-A-free words.

Assume that all proper prefixes of a word $u$ are dual $\alpha$-A-free, where $2<\alpha<3$. Then $u$ is dual $\alpha$-A-free iff the following analog of ($*$) holds:

  • ($\dagger$) no suffix of $u$ can be written as $xyz$ such that $y\sim z$, $x$ is Abelian equivalent to a suffix of $z$, and $\frac{|xyz|}{|z|}\geq\alpha$.

Algorithm 5 Dual Abelian powers detection ($2<\alpha<3$)
1: function $\mathsf{dualALPHAfree}(u)$ ▷ $u$ = word; $n=|u|$
2: $\mathit{free}\leftarrow\mathsf{true}$ ▷ $\alpha$-A-freeness flag
3: $i\leftarrow n$
4: while $i\geq 1+\lceil\frac{\alpha-1}{\alpha}n\rceil$ do ▷ $z=u[i..n]$
5:    $\mathit{len}\leftarrow n-i+1$; $\vec{P}\leftarrow\mathsf{Parikh}(i,n)$ ▷ length and Parikh vector of $z$
6:    $\mathit{left}\leftarrow\mathsf{cover}(\vec{P},i-1)$ ▷ computing $v$
7:    if $\mathit{left}+\mathit{len}=i$ then ▷ $|v|=|z|\Rightarrow v=y\sim z$
8:       $j\leftarrow\lceil(\alpha-2)\cdot\mathit{len}\rceil$ ▷ minimal length of $x$
9:       while $j\leq\mathit{len}$ do ▷ possible lengths of $x$
10:         $\vec{P}_{1}\leftarrow\mathsf{Parikh}(n{-}j{+}1,n)$ ▷ Parikh vector of the length-$j$ suffix of $z$
11:         $\mathit{left}_{1}\leftarrow\mathsf{cover}(\vec{P}_{1},\mathit{left}-1)$ ▷ computing $v_{1}$ for $x$
12:         if $\mathit{left}_{1}+j=\mathit{left}$ then ▷ $x$ is found
13:            $\mathit{free}\leftarrow\mathsf{false}$; break
14:         else
15:            $j\leftarrow\mathit{left}-\mathit{left}_{1}$
16:      $i\leftarrow i-1$
17:   else
18:      $i\leftarrow\lceil(n+\mathit{left})/2\rceil$
19:   if $\mathit{free}=\mathsf{false}$ then break
20: return $\mathit{free}$ ▷ the answer to “is $u$ dual $\alpha$-A-free?”
Proposition 5

Let $\alpha$ be a number such that $2<\alpha<3$ and let $u$ be a word all proper prefixes of which are dual $\alpha$-A-free. Then Algorithm 5 correctly detects whether $u$ is dual $\alpha$-A-free.

Proof

If some suffix $xyz$ violates ($\dagger$), then $|z|\leq|xyz|/\alpha\leq n/\alpha$; hence the range for the outer cycle in line 4. The general scheme is as follows. For each processed suffix $z$, the algorithm first checks whether $u$ ends with an Abelian square $yz$ ($y\sim z$); if yes, it checks whether $yz$ is preceded by some $x$ which is equivalent to a suffix of $z$. If such an $x$ is found, the algorithm detects a violation of ($\dagger$) and stops. If either $x$ or $y$ is not found, the algorithm moves to the next appropriate suffix. Let us consider the details.

In line 6, the shortest $v=u[\mathit{left}..i{-}1]$ such that $vz$ is a suffix of $u$ and $\Psi(v)\geq\Psi(z)$ is computed. If $|v|=|z|$ (the condition in line 7), then $y=v$ is found and we enter the inner cycle to find $x$. If $|v|>|z|$, we note that the suffixes of $u$ of lengths between $2|z|$ and $|vz|-1$ cannot be Abelian squares; then the next suffix to be considered has length $\lceil\frac{|vz|}{2}\rceil$, as is set in line 18. In the inner cycle, a similar idea is implemented: for each processed suffix $z_{1}$ of $z$, the algorithm finds the shortest word $v_{1}=u[\mathit{left}_{1}..\mathit{left}-1]$ satisfying $\Psi(v_{1})\geq\Psi(z_{1})$ (line 11). If $|v_{1}|=|z_{1}|$ (line 12), then $x$ is found; otherwise, the next suffix of $z$ to be checked is the one of length $|v_{1}|$ (line 15). The inner cycle breaks if this length exceeds the length of $z$.

We have shown that Algorithm 5 stops with the answer $\mathsf{false}$ if it finds a suffix $xyz$ of $u$ that violates ($\dagger$); if it finishes the check of a suffix $z$ without breaking, or skips this suffix altogether, then $u$ has no suffix $xyz$ with this $z$ violating ($\dagger$). Therefore, the algorithm verifies ($\dagger$). ∎

Algorithm 5 works extremely fast compared to other algorithms from this section. The following statement holds for the case of the ternary alphabet.

Proposition 6

For a word $u$ picked uniformly at random from the set $\{0,1,2\}^{n}$, Algorithm 5 works in $\Theta(\sqrt{n})$ expected time.

Proof

Lemma 1 says that the expected length of the word $v$ found in line 6 is $|z|+\Omega(\sqrt{|z|})$; thus, in expectation, the assignment in line 18 leads to skipping $\Omega(\sqrt{|z|})$ suffixes of $u$. Hence the expected total number of processed suffixes of $u$ is $O(\sqrt{n})$. By the same lemma, the inner cycle for a suffix $z$ runs, in expectation, $O(\sqrt{|z|})$ iterations, so its expected time complexity is $O(\sqrt{|z|})$. Thus, processing the suffix of length $\ell$, Algorithm 5 performs $O(1)+p_{\ell}\cdot O(\sqrt{\ell})$ operations, where $p_{\ell}$ is the probability of entering the inner cycle, i.e., the probability that two random ternary words of length $\ell$ are Abelian equivalent. Let us estimate this probability. One has

\[p_{\ell}\leq\max_{k_{1},k_{2},k_{3}}\binom{\ell}{k_{1},k_{2},k_{3}}\Big/3^{\ell},\]

where the denominator is the number of ternary words of length $\ell$ and the numerator is the maximum size of a class of Abelian equivalent ternary words of length $\ell$. This maximum, reached for (almost) equal $k_{1},k_{2},k_{3}$, can be estimated by Stirling's formula as $\Theta(3^{\ell}/\ell)$. Thus $p_{\ell}=O(1/\ell)$. Then Algorithm 5 performs, in expectation, $O(1)$ operations per iteration of the outer cycle. The result now follows. ∎
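For completeness, here is a short calculation behind this estimate (assuming for simplicity that $3\mid\ell$):
\[\binom{\ell}{\ell/3,\,\ell/3,\,\ell/3}=\frac{\ell!}{((\ell/3)!)^{3}}\sim\frac{\sqrt{2\pi\ell}\,(\ell/e)^{\ell}}{\bigl(\sqrt{2\pi\ell/3}\,(\ell/(3e))^{\ell/3}\bigr)^{3}}=\frac{\sqrt{2\pi\ell}\cdot 3^{\ell}}{(2\pi\ell/3)^{3/2}}=\frac{3\sqrt{3}}{2\pi}\cdot\frac{3^{\ell}}{\ell}=\Theta\Bigl(\frac{3^{\ell}}{\ell}\Bigr).\]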

4 Experimental results

We ran a big series of experiments for $\alpha$-A-free languages over the alphabets of size $2,3,\ldots,10$. Each of the experiments is a set of random walks in the prefix tree of a given language. Each walk follows the random depth-first search (Algorithm 1), with the number $N$ of visited nodes being of order $10^{5}$ to $10^{7}$. The ultimate aim of every experiment was to make a well-grounded conjecture about the (in)finiteness of the studied language.

Our initial expectation was that random walks would demonstrate two easily distinguishable types of behaviour:

  • infinite-like: the level of the current node is (almost) proportional to the number of nodes visited, or

  • finite-like: from some point, the level of the current node oscillates near the maximum reached earlier.

However, the situation is not that straightforward: very long oscillations of the level were detected during random walks even in some languages which are known to be infinite, for example, in the binary 4-A-free language. To overcome such unwanted behaviour, we endowed Algorithm 1 with a “forced backtrack” rule:

  • let ${\sf ml}=k$ be the maximum level of a node reached so far; if $f(k)$ nodes were visited since the last update of ${\sf ml}$ or since the last forced backtrack, then make a forced backtrack: from the current node, move $g(k)$ edges up the tree and continue the search from the node reached.

Here $f(k)$ and $g(k)$ are heuristically chosen monotone functions; we used $f(k)=\lceil k^{3/2}\rceil$ and $g(k)=\lceil k^{1/2}\rceil$. Forced backtracking deletes the last $g(k)$ letters of the current word in order to get out of a big finite subtree that the search would otherwise have to traverse. The use of forced backtracking allowed us to classify the walks in almost all studied languages either as infinite-like or as finite-like. The results presented below are grouped by the alphabets.
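In terms of the Python sketch given after Algorithm 1, the rule can be grafted onto the search loop roughly as follows (our illustration; the counter since counts the nodes visited since the last update of ml or the last forced backtrack):

import math

def f(k): return math.ceil(k ** 1.5)     # nodes allowed without progress
def g(k): return math.ceil(k ** 0.5)     # number of edges to move up the tree

def maybe_force_backtrack(u, unused, ml, since):
    """If the walk has stalled, truncate the current word u by g(ml) letters
    and reset the stall counter; returns the updated counter."""
    if ml > 0 and since >= f(ml):
        for _ in range(min(g(ml), len(u))):
            u.pop()                      # delete the last letter of the current word
            unused.pop()                 # and forget its set of untried letters
        since = 0
    return since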

4.1 Alphabets with 6, 7, 8, 9, and 10 letters

In [13], it was proved (Theorem 3.1) that ${\sf ART}(k)\geq\frac{k-2}{k-3}$ for all $k\geq 5$ and conjectured that the equality holds in all cases. However, the random search reveals a different picture. For each of the languages ${\sf AF}(k,\frac{k-2}{k-3}^{+})$, $k=6,7,8,9,10$, we ran the random search with forced backtrack, using Algorithm 3 to decide membership in the language; the search terminated when $N$ nodes were visited. We repeated the search 100 times with $N=10^{6}$ and another 100 times with $N=2\cdot 10^{6}$. The results, presented in columns 3–8 of Table 1, clearly demonstrate finite-like behaviour of random walks. Moreover, the results suggest that none of these languages contains a word much longer than 100 symbols. So we were able to prove the following result by (optimized) exhaustive search.

Theorem 4.1

One has ${\sf ART}(k)>\frac{k-2}{k-3}$ for $k=6,7,8,9,10$.

Alphabet size   Avoided power   N=10^6                    N=2·10^6                  Maximum length
                                ml_max   ml_av   ml_med   ml_max   ml_av   ml_med
6               (4/3)^+         112      98.9    98       114      101.1   101       116
7               (5/4)^+         116      100.3   100      124      103.9   102       125
8               (6/5)^+         103      94.8    95       102      96.2    96        105
9               (7/6)^+         108      95.6    96       107      98.8    99        117
10              (8/7)^+         121      107.7   108      128      111.6   111       148*
Table 1: Maximum levels ${\sf ml}$ reached by random walks in some Abelian power-free languages. Columns 3–5 (resp., 6–8) show the maximum, average, and median values of ${\sf ml}$ among 100 random walks visiting $N=10^{6}$ (resp., $N=2\cdot 10^{6}$) nodes each. Column 9 shows the length of a longest word in the language, found by exhaustive search.

A word of length $n$ is called an $n$-permutation if all its letters are pairwise distinct. We use the following lemma to reduce the search space.

Lemma 2

Let $k\geq 6$, $\alpha=\frac{k-2}{k-3}^{+}$, and let $L_{1},L_{2}$, and $L_{3}$ be the subsets of $L={\sf AF}(k,\alpha)$ defined as follows:
- $L_{1}$ is the set of all $w\in L$ such that $w$ has the prefix $01\cdots(k{-}3)$ and contains no $(k{-}1)$-permutations;
- $L_{2}$ is the set of all $w\in L$ such that $w$ has the prefix $01\cdots(k{-}2)$ and contains no $k$-permutations;
- $L_{3}$ is the set of all $w\in L$ having the prefix $01\cdots(k{-}1)$.
Then $L$ is finite iff each of $L_{1},L_{2}$, and $L_{3}$ is finite.

Proof

The necessity is immediate from the definitions; let us prove sufficiency. Let $w\in L$ and let $\ell_{1},\ell_{2},\ell_{3}$ be the lengths of the longest words in $L_{1},L_{2}$, and $L_{3}$, respectively. Let us show that $|w|<\ell_{1}+\ell_{2}+\ell_{3}$. If $u$ is a word and $a$ is a letter, then $aua$ is an Abelian $\frac{|u|+2}{|u|+1}$-power. Hence

  • ($\diamond$) any factor of $w$ of length $k-3$ is a $(k-3)$-permutation.

Now consider a factor $u$ of $w$ such that $|u|=k-1$. By ($\diamond$), one can write $u=abu^{\prime}cd$, where $u^{\prime}$ does not contain the letters $a,b,c,d$ and $b\neq c$. If $a=c$ and $b=d$, then $u$ is an Abelian $\frac{k-1}{k-3}$-power, which is impossible since $u\in L$ as a factor of $w$. Hence $u$ either begins or ends with a $(k-2)$-permutation. Thus $w$ contains a $(k-2)$-permutation beginning at the first or second position. Then $w$ contains a $(k-1)$-permutation due to the finiteness of $L_{1}$; moreover, this permutation ends no later than at position $2+\ell_{1}$ and thus begins no later than at position $4-k+\ell_{1}$. Similarly, due to the finiteness of $L_{2}$, $w$ contains a $k$-permutation beginning no later than at position $5-2k+\ell_{1}+\ell_{2}$. Finally, the finiteness of $L_{3}$ implies the upper bound $|w|\leq 4-2k+\ell_{1}+\ell_{2}+\ell_{3}$. In particular, $L$ is finite. ∎

We ran a (non-randomized) depth-first search on the prefix trees of the languages $L_{1},L_{2}$, and $L_{3}$ for the cases $k=6,7,8,9,10$, using Algorithm 3 to detect Abelian powers, and proved that all these trees are finite. According to Lemma 2, this proves Theorem 4.1. The total number of visited nodes was approximately 0.43 billion for $k=8$; 0.90 billion for $k=7$; 6.29 billion for $k=6$; 8.14 billion for $k=9$; and more than 500 billion for $k=10$. The last case required about 2000 hours of single-core processing time on an ordinary laptop.

Remark 6

For each $k=6,7,8,9$ it is feasible to run a single search which enumerates all lexmin words in the language ${\sf AF}(k,\frac{k-2}{k-3}^{+})$. We performed these searches and found the maximum length of a word in each language (the last column of Table 1) and the distribution of words by their length (see Fig. 3 for the case $k=8$). For $k=10$, such a single search would require too many resources; here the value in the last column of Table 1 is the length of the longest word in $L_{1}\cup L_{2}\cup L_{3}$.

Figure 3: Distribution of the number of lexmin words by length in the language ${\sf AF}(8,\frac{6}{5}^{+})$ (logarithmic scale).

Theorem 4.1 raises the question of the avoidance of bigger $\alpha$-A-powers over the same alphabets. As the next step, we ran experiments for the languages ${\sf AF}(k,\frac{k-3}{k-4})$. The results for $k=7,8,9,10$ are presented in Table 2; random walks in these languages clearly demonstrate finite-like behaviour, while proving finiteness by exhaustive search looks hardly possible. On the contrary, the walks in the 6-ary language ${\sf AF}(6,\frac{3}{2})$ demonstrate an infinite-like behaviour: the average value of ${\sf ml}$ for our experiments with $N=10^{5}$ is greater than $5\cdot 10^{4}$. We note that the obtained words are too long for Algorithm 3, so we had to use the slower Algorithm 2. Finally, we constructed random walks for the languages ${\sf AF}(k,\frac{k-3}{k-4}^{+})$ ($k=7,8,9,10$). They also demonstrate infinite-like behaviour. The obtained experimental results allow us to state the part of Conjecture 2 concerning the alphabets with 6 or more letters.

Alphabet size   Avoided power   N=10^6                    N=2·10^6
                                ml_max   ml_av   ml_med   ml_max   ml_av   ml_med
7               4/3             510      374.5   371      510      397.5   394
8               5/4             211      179.7   179      223      185.0   184
9               6/5             192      157.2   156      191      162.3   161
10              7/6             175      154.0   154      187      159.7   158
Table 2: Maximum levels ${\sf ml}$ reached by random walks in some Abelian power-free languages. Columns 3–5 (resp., 6–8) show the maximum, average, and median values of ${\sf ml}$ among 100 random walks visiting $N=10^{6}$ (resp., $N=2\cdot 10^{6}$) nodes each.

4.2 Alphabets with 2, 3, 4, and 5 letters

Random walks in the prefix tree of the language ${\sf AF}(5,\frac{3}{2}^{+})$ demonstrate the infinite-like behaviour; Fig. 4 shows an example of the dependence of the level of the current node on the number of nodes visited. Although we could not push random walks significantly farther down the tree (Algorithm 3 uses too much space to work on such big levels, so we relied on the slower Algorithm 2), we obtained sufficient evidence to support Conjecture 1 for $k=5$.

Figure 4: An infinite-like random walk in the language ${\sf AF}(5,\frac{3}{2}^{+})$: a point $(n,m)$ of the graph means that the $n$th node visited by the walk has depth $m$.
Remark 7

As the language ${\sf AF}(5,\frac{3}{2}^{+})$ is supposed to be infinite, it is interesting to estimate its growth. Based on the technique described in [11], we estimate the number of words of length $n$ in ${\sf AF}(5,\frac{3}{2}^{+})$ as growing exponentially in $n$ at a rate close to 1.5. The upper bound 2.335 [13] on this rate is thus very loose.

For smaller alphabets, we studied the languages ${\sf AF}(4,\frac{9}{5}^{+})$, ${\sf AF}(3,2^{+})$, and ${\sf AF}(2,\frac{11}{3}^{+})$, indicated by Conjecture 1 as infinite. To detect Abelian powers, we used Algorithm 3 for the quaternary alphabet; for the ternary and binary alphabets, we worked with the reversals of ${\sf AF}(3,2^{+})$ and ${\sf AF}(2,\frac{11}{3}^{+})$ to benefit from the speed of Algorithm 5, which detects dual Abelian powers. Random walks (with forced backtracking) in each of the three languages show finite-like behaviour; see Table 3 and the example in Fig. 5. Longer random searches lead to somewhat better results, especially on average, due to multiple forced backtracks; however, nothing resembles a steady growth of the maximum level with the total number of visited nodes. So the experimental results justify the lower bounds from Conjecture 2 for $k=2,3,4$. To get the upper bound for the ternary alphabet, we ran random walks for the language ${\sf AF}(3,\frac{5}{2}^{+})$, with results similar to those obtained for ${\sf AF}(5,\frac{3}{2}^{+})$: all walks demonstrate the infinite-like behaviour; the level ${\sf ml}=10^{5}$ is reached within minutes.

Overall, we conclude that the experiments we conducted justify the formulation of Conjecture 2.

Alphabet size   Avoided power   N=10^6                   N=2·10^6                 N=10^7
                                ml_max  ml_av   ml_med   ml_max  ml_av   ml_med   ml_max  ml_av   ml_med
2               (11/3)^+        775     435.8   416      706     477.0   453      759     589.7   588
3               2^+             3344    1700.0  1671     5363    2228.8  2140     5449    3078.1  3148
4               (9/5)^+         1367    861.2   835      1734    986.8   956      2453    1414.7  1369
Table 3: Maximum levels ${\sf ml}$ reached by random walks in some Abelian power-free languages. Columns 3–5 (resp., 6–8, 9–11) show the maximum, average, and median values of ${\sf ml}$ among 100 random walks visiting $N=10^{6}$ (resp., $N=2\cdot 10^{6}$, $N=10^{7}$) nodes each.

We ran two additional experiments with the language ${\sf AF}(4,\frac{9}{5}^{+})$. First, we took the longest word found by random search (it has length 2453) and extended it to the left with another long random search, repeated multiple times. The longest obtained word has length 3152, which seems to be a fair approximation of the maximum length of a word in ${\sf AF}(4,\frac{9}{5}^{+})$. Second, we tried an exhaustive enumeration of the words in ${\sf AF}(4,\frac{9}{5}^{+})$ to understand how fast the initial growth is and how far we can reach. We discovered that the language contains 10.68 billion lexmin words of length 90, compared to 9.49 billion such words of length 89. Hence, 90 is still quite far from the length at which the number of words is maximal.

Figure 5: A finite-like random walk in the language ${\sf AF}(4,\frac{9}{5}^{+})$: a point $(n,m)$ of the graph means that the $n$th node visited by the walk has depth $m$.

5 Future Work

Clearly, the main challenge in this topic is to find the exact values of the Abelian repetition threshold. Even finding one such value would be great progress. Choosing a case to start with, we would suggest proving ${\sf ART}(5)=3/2$, because in this case the lower bound has already been checked by exhaustive search in [13]. For all other alphabets, the proof of the lower bounds suggested in Conjecture 2 is already a challenging task which cannot be solved by brute force.

Another piece of work is to refine Conjecture 2 by suggesting the precise values of ${\sf ART}(2)$, ${\sf ART}(3)$, ${\sf ART}(4)$, and ${\sf ART}(6)$. For bigger $k$, random walks demonstrate an obvious “phase transition” at the point $\frac{k-3}{k-4}$: the behaviour of a walk switches from finite-like for ${\sf AF}(k,\frac{k-3}{k-4})$ to infinite-like for ${\sf AF}(k,\frac{k-3}{k-4}^{+})$. However, the situation with small alphabets can be trickier. For example, we tried the next natural candidate for ${\sf ART}(4)$, namely, $11/6$. For the random walks in ${\sf AF}(4,\frac{11}{6}^{+})$, with $N=10^{6}$ and forced backtracks, the maximum levels obtained in our experiments varied from 3000 to 20000; such big lengths show that there is no hope of seeing a clear-cut phase transition in experiments with random walks.

Finally, we want to draw attention to the following fact. The quaternary 2-A-free word constructed by Keränen [8] contains arbitrarily long factors of the form $xa\bar{x}$, where $a$ is a letter and $x\sim\bar{x}$; thus it is not $\alpha$-A-free for any $\alpha<2$. Similarly, the word constructed by Dekking [5] for the ternary (resp., binary) alphabet is not $\alpha$-A-free for any $\alpha<3$ (resp., $\alpha<4$). Hence some new constructions are necessary to improve the upper bounds for ${\sf ART}$.

References

  • [1] Carpi, A.: On Dejean’s conjecture over large alphabets. Theoret. Comput. Sci. 385, 137–151 (2007)
  • [2] Cassaigne, J., Currie, J.D.: Words strongly avoiding fractional powers. Eur. J. Comb. 20(8), 725–737 (1999)
  • [3] Currie, J.D., Rampersad, N.: A proof of Dejean’s conjecture. Math. Comp. 80, 1063–1070 (2011)
  • [4] Dejean, F.: Sur un théorème de Thue. J. Combin. Theory. Ser. A 13, 90–99 (1972)
  • [5] Dekking, F.M.: Strongly non-repetitive sequences and progression-free sets. J. Combin. Theory. Ser. A 27, 181–185 (1979)
  • [6] Erdős, P.: Some unsolved problems. Magyar Tud. Akad. Mat. Kutató Int. Közl. 6, 221–264 (1961)
  • [7] Evdokimov, A.A.: Strongly asymmetric sequences generated by a finite number of symbols. Soviet Math. Dokl. 9, 536–539 (1968)
  • [8] Keränen, V.: Abelian squares are avoidable on 4 letters. In: Kuich, W. (ed.) Proc. ICALP 1992. LNCS, vol. 623, pp. 41–52. Springer-Verlag (1992)
  • [9] Moulin-Ollagnier, J.: Proof of Dejean’s conjecture for alphabets with 5,6,7,8,9,105,6,7,8,9,10 and 1111 letters. Theoret. Comput. Sci. 95, 187–205 (1992)
  • [10] Pansiot, J.J.: A propos d’une conjecture de F. Dejean sur les répétitions dans les mots. Discr. Appl. Math. 7, 297–311 (1984)
  • [11] Petrova, E.A., Shur, A.M.: Branching frequency and Markov entropy of repetition-free languages. In: Developments in Language Theory - 25th International Conference, DLT, Proceedings. Lecture Notes in Computer Science, vol. 12811, pp. 328–341. Springer (2021)
  • [12] Rao, M.: Last cases of Dejean’s conjecture. Theoret. Comput. Sci. 412, 3010–3018 (2011)
  • [13] Samsonov, A.V., Shur, A.M.: On Abelian repetition threshold. RAIRO Theor. Inf. Appl. 46, 147–163 (2012)
  • [14] Thue, A.: Über unendliche Zeichenreihen. Norske vid. Selsk. Skr. Mat. Nat. Kl. 7, 1–22 (1906)
  • [15] Thue, A.: Über die gegenseitige Lage gleicher Teile gewisser Zeichenreihen. Norske vid. Selsk. Skr. Mat. Nat. Kl. 1, 1–67 (1912)