Abelian Repetition Threshold Revisited
{elena.petrova,arseny.shur}@urfu.ru
Abstract
The Abelian repetition threshold is the number separating avoidable and unavoidable fractional Abelian powers over the -letter alphabet. The exact values of are unknown; the lower bounds were proved in [A.V. Samsonov, A.M. Shur. On Abelian repetition threshold. RAIRO ITA, 2012] and conjectured to be tight. We present a method for studying Abelian power-free languages using random walks in prefix trees, together with some experimental results obtained by this method. On the basis of these results, we conjecture that the lower bounds for by Samsonov and Shur are not tight for all and prove this conjecture for . Namely, we show that in all these cases.
Keywords:
Abelian-power-free language, repetition threshold, prefix tree, random walk
1 Introduction
Two words are Abelian equivalent if they have the same multiset of letters (in other terms, if they are anagrams of each other); an Abelian repetition is a pair of Abelian equivalent factors in a word. The study of Abelian repetitions originated from the question of Erdős [6]: does there exist an infinite finitary word having no consecutive pair of Abelian equivalent factors? The factors of the form , where and are Abelian equivalent, are now called Abelian squares. In modern terms, Erdős's question reads: "are Abelian squares avoidable over some finite alphabet?" This question was answered in the affirmative by Evdokimov [7]; the smallest possible alphabet has cardinality 4, as was proved by Keränen [8]. In a similar way, Abelian th powers are defined for arbitrary . Dekking [5] constructed infinite ternary words without Abelian cubes and infinite binary words without Abelian 4th powers. The results by Dekking and Keränen form an Abelian analog of the seminal result by Thue [14]: there exist an infinite ternary word containing no squares (factors of the form ) and an infinite binary word containing no cubes (factors of the form ).
Integral powers of words were later generalized to rational (fractional) powers: given a word of length , take a length- prefix of the infinite word ; then is the th power of ( is assumed). Usually, is referred to as a -power. A word is said to be -power-free if it contains no -powers with . Fractional powers gave rise to the notion of the repetition threshold, which is the function . The value is known since Thue [15]. Dejean [4] showed that and conjectured the remaining values (proved by Pansiot [10]) and for (proved through the efforts of many authors [9, 1, 3, 12]). An extension of the notions of fractional power and repetition threshold to the Abelian case was proposed by Samsonov and Shur [13]. Integral Abelian powers can be generalized to fractional ones in several ways; however, for the case , one definition of an Abelian -power is preferable due to its symmetric nature. According to this definition, a word is an Abelian th power of the word if , , and is Abelian equivalent to (in [13], such Abelian powers were called strong). Note that the reversal of an Abelian -power is also an Abelian -power. In this paper, we consider only strong Abelian powers; see Section 2 for the definition in the case . Given the definition of fractional Abelian powers, one naturally defines Abelian -power-free words and the Abelian repetition threshold . Cassaigne and Currie [2] showed that for any there exists an Abelian -power-free word over a finite alphabet of size . Surely, this bound is very loose, but it proves that . In [13], the lower bounds , for were proved and conjectured to be tight; in full, this conjecture is as follows.
Conjecture 1 ([13])
; ; ; for .
Up to now, no exact values of are known. One reason for the lack of progress in estimating is that the set of -ary Abelian -power-free words can be finite but so huge that these words cannot be enumerated by exhaustive search.
In the present study, we approach Abelian -power-free words using randomized depth-first search. The language of all -ary Abelian -power-free words is viewed as a prefix tree : the elements of are nodes of the tree and is an ancestor of in the tree iff is a prefix of . The search starts at the root (empty word) and is organized as follows. Reaching a node for the first time, we choose a random letter , check ad hoc whether , and descend to if this node exists; on subsequent visits to , we choose a random letter among the letters not chosen at before, and proceed in the same way. If no letter remains to choose, we return to the parent of (and thus will never visit again). We repeated the search multiple times and analysed the maximum level of a node reached in the tree and the change of the level during the search. Based on this analysis, we state
Conjecture 2
; ; ; ; ; for .
2 Definitions and Notation
We study finite words over finite alphabets, using standard notation for a (linearly ordered) alphabet, for its size, for the set of all finite words over , including the empty word . For a length- word we write ; the elements of the range are positions in , the length of is denoted by . A word is a factor of if for some (possibly empty) words and ; the condition (resp., ) means that is a prefix (resp., suffix) of . Any factor of can be represented as for some and ( means ). A factor of can have several such representations; we say that specifies the occurrence of at position .
A -power of a word is the concatenation of copies of , denoted by . This notion can be extended to -powers for an arbitrary rational . The -power of is the word such that and is a prefix of . A word is -free (resp., -free) if no one of its factors is a -power with (resp., ).
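The definition of a fractional power can be illustrated by a small helper. This is our own sketch (the function name `beta_power` is not from the paper); it assumes the resulting length is an integer, as the definition requires.

```python
from fractions import Fraction

def beta_power(w, beta):
    """Return the beta-power of w: the prefix of the infinite word
    w w w ... whose length is beta * |w|.

    Illustrative sketch; assumes beta * |w| is an integer."""
    length = Fraction(beta) * len(w)
    assert length.denominator == 1, "beta * |w| must be an integer"
    return ''.join(w[i % len(w)] for i in range(length.numerator))
```

For example, `beta_power("abc", Fraction(5, 3))` yields the 5/3-power `"abcab"` of `"abc"`.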
The Parikh vector of a word is an integer vector of length whose coordinates are the numbers of occurrences of the letters from in . Thus, the word over the alphabet has the Parikh vector . Two words and are Abelian equivalent (denoted by ) if . The reversal of a word is the word . Clearly, . A nonempty word is an Abelian th power (-A-power) if , where for all indices . A 2-A-power is an Abelian square, and a 3-A-power is an Abelian cube. Thus, -A-powers generalize -powers by relaxing the equality of factors to their Abelian equivalence. However, there are many ways to generalize the notion of an -power to the Abelian case, and all of them have drawbacks. The reason is that does not imply for any pair of factors of and . If , we define an -A-power as a word such that and . The advantage of this definition is that the reversal of an -A-power is an -A-power as well. For the situation is worse: all natural definitions compatible with the definition of -A-power are not symmetric with respect to reversals (see [13] for more details). So we give the definition which is compatible with the case : an -A-power is a word such that , , , and is Abelian equivalent to a prefix of . In [13], such words are called strong Abelian -powers. For a given , -A-free and -A-free words are defined in the same way as -free (-free) words. It is convenient to extend rational numbers with “numbers” of the form , postulating the equivalence of the inequalities and (resp., and ).
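The notions of Parikh vector, Abelian equivalence and Abelian square translate directly into code; the following minimal sketch (our own helper names) represents a Parikh vector as a multiset of letters.

```python
from collections import Counter

def parikh(w):
    # Parikh vector of w as a Counter: letter -> number of occurrences
    return Counter(w)

def abelian_equivalent(u, v):
    # u ~ v iff they have the same Parikh vector (i.e., are anagrams)
    return parikh(u) == parikh(v)

def is_abelian_square(w):
    # w = uv with |u| = |v| and u ~ v; only possible for even length
    n = len(w)
    return n > 0 and n % 2 == 0 and abelian_equivalent(w[:n // 2], w[n // 2:])
```

For instance, `"abba"` is an Abelian square (its halves `"ab"` and `"ba"` are anagrams) while `"aabb"` is not.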
A language is any subset of . The reversal of a language consists of the reversals of all words in . The -A-free language over (where belongs to extended rationals) consists of all -A-free words over and is denoted by . These languages are the main objects of our study aimed at finding the Abelian repetition threshold, which is the function . The languages are closed under permutations: if is a permutation of the alphabet, then the words and are -A-free for exactly the same values of . This makes it possible to enumerate the words of such languages by considering only lexmin words: a word is lexmin if for any permutation of .
Suppose that a language is factorial (i.e., closed under taking factors of its words); for example, all languages are factorial. Then can be represented by its prefix tree , which is a rooted labeled tree whose nodes are elements of and edges have the form , where is a letter. Thus is an ancestor of iff is a prefix of . We study the languages through different types of search in their prefix trees.
3 Algorithms
In this section we present the algorithms we developed for use in our experiments. First we describe the random depth-first search in the prefix tree of an arbitrary factorial language . Given a number , the algorithm visits distinct nodes of following the depth-first order and returns the maximum level of a visited node. The search can be easily augmented to return the word corresponding to the node of maximum level, or to log the sequence of levels of visited nodes. Algorithm 1 below describes one iteration of the search. In the algorithm, is the word corresponding to the current node; is the set of all letters such that the search has not tried the node yet; is the maximum level reached so far; is the number of visited nodes; is the predicate returning if and otherwise. Lines 3 and 8 refer to the updates of data structures used to compute . The search starts with , , . A variant of this search algorithm was used in [11] to numerically estimate the entropy of some -free and -A-free languages.
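Since the pseudocode of Algorithm 1 is not reproduced here, the following Python sketch is our reconstruction of the randomized depth-first search it describes; the names `is_in_L`, `sigma`, `max_nodes` are ours, and the membership predicate is supplied by the caller.

```python
import random

def random_dfs(is_in_L, sigma, max_nodes, seed=0):
    """Randomized DFS in the prefix tree of a factorial language.

    is_in_L(word) is the membership predicate; sigma is the alphabet
    (a string). Returns the maximum level (word length) reached among
    the first max_nodes visited nodes. Our sketch of Algorithm 1."""
    rng = random.Random(seed)
    word = []                                  # the current node
    untried = [rng.sample(sigma, len(sigma))]  # untried letters per level
    visited, max_level = 1, 0                  # the root counts as visited
    while visited < max_nodes and untried:
        if untried[-1]:
            word.append(untried[-1].pop())     # try a random untried letter
            if is_in_L(''.join(word)):         # the child is a node of the tree
                visited += 1
                max_level = max(max_level, len(word))
                untried.append(rng.sample(sigma, len(sigma)))
            else:
                word.pop()
        else:                                  # all children tried: backtrack
            untried.pop()
            if word:
                word.pop()
    return max_level
```

As a toy usage example, running this search on the (infinite) ternary language of words without two equal adjacent letters descends one level per visited node.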
The key to an efficient search is a fast algorithm computing the predicate . The following fact is very useful: if is a proper prefix of , then the node for exists and hence . Below we present four algorithms checking, for a given , whether a word has a -A-power with as a suffix. The cases and are considered in Sections 3.1 and 3.2 respectively.
3.1 Avoiding small powers
Let and let be a word of length such that all proper prefixes of are -A-free. To prove that is -A-free, it is necessary and sufficient to show that
- no suffix of can be written as such that , , and .
Remark 1
Since Abelian equivalence is not preserved by taking factors of any sort, the ratio in can significantly exceed . For example, all proper prefixes, and even all proper factors, of the word are -A-free, while is an Abelian square. Hence for each suffix of one should check multiple candidates for the factor in . The number of such candidates can be as big as ; in total, candidates for the pair should be analysed.
A reasonable approach is to store the Parikh vectors of all prefixes of ; they take words of space in total and require time for an update when a letter is appended to or deleted from the right end of the word. Then the Parikh vector of each factor of is obtained as the difference of the Parikh vectors of the corresponding prefixes. So one comparison of factors takes time, which means time for performing all comparisons of candidate pairs in in a naive way (see Remark 1). Algorithm 2 below gets many of these comparisons for free. It makes use of two length- arrays for each letter : is the number of occurrences of in (= a coordinate of the Parikh vector of the prefix ) and is the position of the th from the left occurrence of in the current word . Each of the arrays can be updated in time when a letter is appended to or deleted from the right end of . We specify lines 3 and 8 of Algorithm 1 as follows. At line 3, we delete the Parikh vector of from the -arrays and delete the last element of , where is the last letter of . At line 8, we add the Parikh vector of to the -arrays and add a new element to .
The arrays and are used to compute two auxiliary functions, and . The function returns the Parikh vector of ; its coordinates are just the differences of the form . The function returns the biggest number such that , or zero if there is no such number. Thus the function returns 0 if contains, for some , fewer 's than ; i.e., if . If no such exists, then .
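The two arrays and the two auxiliary functions can be rendered as follows; `occ`, `pos`, `pv` and `find` are our names for our reconstruction (positions are 1-based, as in the paper). `find(v, r)` returns the largest start position `i` such that the factor ending at `r` and starting at `i` contains `v` componentwise, i.e., the start of the shortest such factor, or 0 if none exists.

```python
def make_indexes(w, sigma):
    """occ[a][i]: occurrences of a in w[1..i]; pos[a][j]: 1-based position
    of the (j+1)-th occurrence of a in w. Our reconstruction."""
    occ = {a: [0] * (len(w) + 1) for a in sigma}
    pos = {a: [] for a in sigma}
    for i, c in enumerate(w, 1):
        for a in sigma:
            occ[a][i] = occ[a][i - 1] + (a == c)
        pos[c].append(i)
    return occ, pos

def pv(occ, sigma, i, j):
    # Parikh vector of w[i..j] as a difference of prefix Parikh vectors
    return {a: occ[a][j] - occ[a][i - 1] for a in sigma}

def find(occ, pos, sigma, v, r):
    """Largest i such that the Parikh vector of w[i..r] dominates v
    componentwise; 0 if no such i (or if v is the zero vector)."""
    start = r + 1
    for a in sigma:
        need = v[a]
        if need == 0:
            continue
        have = occ[a][r]          # occurrences of a in w[1..r]
        if have < need:
            return 0              # some letter is deficient
        # to include `need` occurrences of a, the factor must start
        # no later than the (have - need + 1)-th occurrence of a
        start = min(start, pos[a][have - need])
    return start if start <= r else 0
```

For `w = "abacaba"`, the shortest factor ending at position 7 containing two `a`'s and one `b` starts at position 5 (the factor `"aba"`).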
Proposition 1
Let be a number such that and be a word all proper prefixes of which are -A-free. Then Algorithm 2 correctly detects whether is -A-free.
Proof
Let us show that Algorithm 2 verifies the condition . The outer cycle of the algorithm fixes the first position of the suffix of ; the suffixes are analysed in the order of increasing length . If a forbidden suffix is detected during the iteration (we discuss the correctness and completeness of this detection below), then the algorithm breaks the outer cycle in line 14 and returns . Thus at the current iteration of the outer cycle the condition is already verified for all shorter suffixes. The iteration uses a simple observation: if , then every word , containing , satisfies . We proceed as follows. We fix the rightmost position where a factor satisfying can end. Initially , as can immediately precede (see the example in Remark 1). Then we compute the shortest factor such that . If , the suffix of violates . Otherwise cannot begin later than at the position , by the construction of . Hence we decrease by setting and repeat the above procedure in a loop. The verification of for ends successfully when either does not exist (i.e., for the current value of ) or is too small (i.e., with means ). The described process is illustrated by Fig. 1. Details are provided below.


In lines 4–6 the algorithm calls to compute and to find the mentioned factor for . If , then and hence every suffix of satisfies . Moreover, one can observe that for each which immediately verifies for all suffixes of that are longer than . Hence in this case the verification of is already done; respectively, the algorithm breaks the outer cycle in line 7 and returns .
If no break has happened, the algorithm enters the inner cycle, checks whether (line 9) and breaks with the output if this condition holds. If it does not, the algorithm decreases as described above (line 12) and computes the new factor (line 13). If does not exist, gets 0, which results in the immediate exit from the inner cycle. If is computed but its position is too small, then the cycle is also exited. The exit means the end of the th iteration.
Thus, Algorithm 2 returns only if it finds a suffix of which violates . For the other direction, let violate such that is minimal over all suffixes violating it. Then Algorithm 2 cannot stop before the iteration which checks . During this iteration, cannot become smaller than by the definition of the factor . As decreases at each iteration of the inner cycle, eventually will be found. Thus the algorithm indeed verifies and thereby detects the -A-freeness of . ∎
Remark 2
As both and work in time, Algorithm 2 processes a word of length in time, where is the total number of the inner cycle iterations during the course of the algorithm. Clearly, . The results of experiments suggest that in expectation. This is indeed the case if is a random word, as Lemma 1 below shows. This lemma implies that the iteration processing a suffix builds, in expectation, words .
Lemma 1
Suppose that an infinite word is chosen uniformly at random among all -ary infinite words, is a prefix of , and is the shortest prefix of such that . Then the expected length of is .
Proof
Let , . First consider the case . The process can be viewed as follows: first, is generated by tosses of a fair coin; then another tosses generate some prefix of ; additional tosses are made one by one until the desired result is reached after tosses. The Parikh vector of a word of a known length over is determined by the number of 1’s. Hence is a random variable with the binomial distribution ; similarly, is a random variable with the same distribution.
The vector has the form for some integer . To obtain , we should make “successful” tosses with the probability of “success” being ; hence the expectation of equals . Thus it remains to find the expectation of . Since , we see that is the standard deviation of by definition.
Due to symmetry, and have the same distribution. Hence we can replace by . The random variable has the binomial distribution , so its standard deviation is . Thus , as desired.
Over larger alphabets the expectation of can only increase. The easiest way to see this is to split arbitrarily into two subsets and of equal size. Then with respect to has a deficiency of letters from one of these subsets, say, . By the argument for the binary alphabet, additional letters are needed, in expectation, to cover this deficiency. This is a necessary (but not sufficient) condition to obtain the word . Hence . ∎
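The binary case of Lemma 1 is easy to check empirically. The sketch below is our reading of the (elided) statement: with `u` the first `n` letters of a random binary word and `v` the factor starting right after `u`, it measures how many letters beyond position `2n` are needed, on average, until the Parikh vector of `v` dominates that of `u`; by the proof above this average grows like the standard deviation of a binomial, i.e., proportionally to the square root of `n`.

```python
import random

def extra_length(n, trials, seed=0):
    """Monte Carlo sketch for the binary case of Lemma 1: average number
    of letters beyond position 2n until the extension v of w[n+1..2n]
    satisfies Parikh(v) >= Parikh(w[1..n]) componentwise."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        w = [rng.randrange(2) for _ in range(2 * n)]
        ones_u = sum(w[:n])
        ones_v = sum(w[n:])
        # exactly one of the two deficiencies below can be positive
        need1 = max(0, ones_u - ones_v)              # missing 1's
        need0 = max(0, ones_v - ones_u)              # missing 0's
        k = 0
        while need1 > 0 or need0 > 0:
            k += 1                                   # one extra coin toss
            if rng.randrange(2):
                need1 = max(0, need1 - 1)
            else:
                need0 = max(0, need0 - 1)
        total += k
    return total / trials
```

Quadrupling `n` should roughly double the average extra length, in line with the square-root growth.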
Algorithm 2 significantly speeds up the naive algorithm but is still rather slow. For the case , a much faster dictionary-based Algorithm 3 is presented below. However, Algorithm 3 consumes memory quadratic in ; this limits the depth of the search to values of about for an ordinary laptop.
When processing a word , the algorithm keeps in the dictionary all factors of , up to Abelian equivalence. Recall that a dictionary contains a set of pairs (key, value), where all keys are unique, and supports fast lookup, addition and deletion by key. For the dictionary used in Algorithm 3, the keys are Parikh vectors and the values are lists of positions, in increasing order, of the factors having this Parikh vector. The algorithm accesses only the last (maximal) element of the list. Let us describe the updates of the dictionary (lines 3 and 8 of Algorithm 1). At line 3, we delete all suffixes of from the dictionary. For a suffix , this means the deletion of the last element from the list ; if the list becomes empty, the entry for is also deleted. At line 8, all suffixes of are added to the dictionary. For a suffix , if was not in the dictionary, an entry is created; then the position is added to the end of the list .
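The dictionary and its two update operations can be sketched as follows; `FactorDict` and its method names are ours, and Parikh vectors are made hashable by sorting their letter counts. Pushing a word registers all its suffixes (the factors ending at the new last position), popping unregisters them; only the last position of each equivalence class is queried.

```python
from collections import Counter, defaultdict

def parikh_key(w):
    # a hashable rendering of the Parikh vector, to be used as a key
    return tuple(sorted(Counter(w).items()))

class FactorDict:
    """Positions of the factors of the current word, grouped by Parikh
    vector. Our reconstruction of the dictionary used by Algorithm 3."""
    def __init__(self):
        self.d = defaultdict(list)

    def push_word(self, w):
        # line 8 of Algorithm 1: register every suffix of the new word w,
        # i.e., every factor ending at the freshly appended position
        for i in range(len(w)):
            self.d[parikh_key(w[i:])].append(i + 1)   # 1-based start position

    def pop_word(self, w):
        # line 3 of Algorithm 1: unregister every suffix of w before
        # its last letter is deleted
        for i in range(len(w)):
            k = parikh_key(w[i:])
            self.d[k].pop()
            if not self.d[k]:
                del self.d[k]

    def last_position(self, v):
        # rightmost registered occurrence of a factor Abelian equivalent
        # to v; 0 if there is none
        lst = self.d.get(parikh_key(v))
        return lst[-1] if lst else 0
```

For example, after appending the letters of `"aba"` one by one, the class of `"ab"` contains the occurrences at positions 1 (`"ab"`) and 2 (`"ba"`).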
Proposition 2
Let be a number such that and be a word all proper prefixes of which are -A-free. Then Algorithm 3 correctly detects whether is -A-free.
Proof
Let us show that Algorithm 3 verifies . First suppose that the algorithm returned . Then it broke from the for cycle (line 9); let be the last suffix processed. The lookup by the key returned the position of a factor , and the condition in line 8 was true. Then the suffix violates . Indeed, since was found by the key ; the first inequality means that and do not overlap in ; the second inequality is equivalent to .
Now suppose that the algorithm returned . Aiming at a contradiction, assume that has a suffix violating . Let be the shortest such suffix. Consider the iteration of the for cycle where was processed. The key was present in the dictionary because . If (line 7) corresponded to the from our "bad" suffix, i.e., , then both inequalities in line 8 held because and do not overlap in and . But then the algorithm would have returned , contradicting our assumption. Hence was the position of some other which occurs in later than . By the choice of the suffix , cannot have a shorter suffix with . This means that the occurrences of and overlap (see Fig. 2).

Note that and also overlap. Otherwise, has a prefix of the form and , contradicting the condition that all proper prefixes of are -A-free. Then , , , as shown in Fig. 2, and are nonempty. We observe that imply and . By the condition on the prefixes of , . Hence and then . Therefore , so the suffix of violates . But , contradicting the choice of . This contradiction proves that satisfies . ∎
Dictionaries based on hash table techniques such as linear probing or cuckoo hashing guarantee expected constant time per dictionary operation. As Algorithm 3 consists of a single cycle with a constant number of operations inside, the following statement is straightforward.
Proposition 3
For a word of length , Algorithm 3 performs operations, including dictionary operations.
Remark 3
A slight modification of Algorithm 3 allows one to process the important case within the same complexity bound. The whole argument from the proof of Proposition 2 remains valid for except for one specific situation: in Fig. 2 it is possible that is empty and . In this situation Algorithm 3 misses the Abelian square . To fix this, we add a patch after line 7:
7.5: if then
As an example, consider . Processing the suffix , Algorithm 3 retrieves from the dictionary by the key . The corresponding factor overlaps and the condition in line 8 would fail for . However, satisfies the condition in the inserted line 7.5 and thus the factor at will be reached. The condition in line 8 holds for and the forbidden Abelian repetition is detected.
Remark 4
Algorithm 3 can be further modified to work for all . If we replace the patch from Remark 3 with the following one:
7.5: while do
the algorithm will find the closest factor which does not overlap with . This new patch introduces an inner cycle and thus affects the time complexity but the algorithm remains faster in practice than Algorithm 2.
3.2 Avoiding big powers
Let . The case for ternary words and the case for binary words are relevant to the study of the Abelian repetition threshold. We provide here the algorithms for the first case; the algorithms for the second case are very similar (the only difference is that one should check for Abelian cubes instead of Abelian squares). An Abelian -power with has the form , where , is equivalent to a prefix of , and . We can write the Abelian square as where and . Consequently, if all proper prefixes of a word are -A-free, then is -A-free iff the following analog of holds:
- no suffix of can be written as such that , , is an Abelian square, and .
Verifying for , we proceed for its suffix as follows. Within the range determined by , we search for all factors such that . The search is organized as in Algorithm 2 (see Fig. 1). For each we consider the corresponding suffix of and check whether is an Abelian square. If so, violates . In Algorithm 4 below, we make use of the arrays and functions designed for Algorithm 2.
Proposition 4
Let be a number such that and be a word all proper prefixes of which are -A-free. Then Algorithm 4 correctly detects whether is -A-free.
Proof
Algorithm 4 is similar to Algorithm 2, so we focus on their difference. If some suffix violates , then ; hence the range for the outer cycle in line 3. For a fixed we repeatedly seek the shortest factor with the given right bound and the property . If (the condition in line 9 holds), then is a candidate for in the suffix violating . The initial value for (line 4) is set to ensure the condition . The candidate found in line 9 is checked in line 10 for the remaining condition: is an Abelian square. Namely, we check that is even and its left and right halves have the same Parikh vector. If this condition holds, the algorithm breaks both the inner and the outer cycle and returns . If the condition fails, we decrease by 1 and compute the factor for this new right bound. The rest is the same as in Algorithm 2. So we can conclude that Algorithm 4 verifies . ∎
Remark 5
Algorithm 4 is rather slow. But it appears that dual Abelian powers can be detected by a much faster Algorithm 5. Let us give the definitions. As was mentioned in Section 2, the reversal of an -A-power for is not necessarily an -power. For example, is a -A-power while does not begin with an Abelian square. We call a dual -A-power if is an -A-power; the notion of dual -A-free word is defined by analogy with -A-free word. Dual -A-free words are exactly the reversals of -A-free words.
Assume that all proper prefixes of a word are dual -A-free, where . Then is dual -A-free iff the following analog of holds:
- no suffix of can be written as such that , is equivalent to a suffix of , and .
Proposition 5
Let be a number such that and be a word all proper prefixes of which are dual -A-free. Then Algorithm 5 correctly detects whether is dual -A-free.
Proof
If some suffix violates , then ; hence the range for the outer cycle in line 3. The general scheme is as follows. For each processed suffix , the algorithm first checks whether ends with an Abelian square (); if so, it checks whether is preceded by some which is equivalent to a suffix of . If such an is found, the algorithm detects a violation of and stops. If either or is not found, the algorithm moves to the next appropriate suffix. Let us consider the details.
In line 6, the shortest such that is a suffix of and is computed. If (the condition in line 7), then is found and we enter the inner cycle to find . If , we note that the suffixes of of lengths between and cannot be Abelian squares; then the next suffix to be considered has the length , as is set in line 18. In the inner cycle, a similar idea is implemented: for each processed suffix of the algorithm finds the shortest word satisfying (line 11). If (line 12), then is found; otherwise, the next suffix of to be checked is of length (line 15). The inner cycle breaks if this length exceeds the length of .
We have shown that Algorithm 5 stops with the answer if it finds a suffix of that violates ; if it finishes the check of a suffix without breaking, or skips this suffix altogether, then has no suffix violating . Therefore, the algorithm verifies . ∎
Algorithm 5 works extremely fast compared to other algorithms from this section. The following statement holds for the case of the ternary alphabet.
Proposition 6
For a word picked uniformly at random from the set , Algorithm 5 works in expected time.
Proof
Lemma 1 says that the expected length of the word found in line 6 is and thus, in expectation, the assignment in line 18 leads to skipping suffixes of . Hence the expected total number of processed suffixes of is . By the same lemma, the inner cycle for a suffix runs, in expectation, iterations, so its expected time complexity is . Thus, processing the suffix of length , Algorithm 5 performs operations, where is the probability to enter the inner cycle, i.e., the probability that two random ternary words of length are Abelian equivalent. Let us estimate this probability. One has
where the denominator is the number of ternary words of length and the numerator is the maximum size of a class of Abelian equivalent ternary words of length . This maximum, reached for (almost) equal letter counts, can be estimated via Stirling's formula as . Thus . Then Algorithm 5 performs, in expectation, operations per iteration of the outer cycle. The result now follows. ∎
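The key probability in this argument, that two uniformly random ternary words of the same length are Abelian equivalent, can be estimated by a quick Monte Carlo experiment (our own sketch). By the estimate above it decays roughly like the inverse of the length.

```python
import random
from collections import Counter

def p_abelian_equiv(n, trials, seed=1):
    """Monte Carlo estimate of the probability that two uniformly random
    ternary words of length n are Abelian equivalent."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        u = [rng.randrange(3) for _ in range(n)]
        v = [rng.randrange(3) for _ in range(n)]
        if Counter(u) == Counter(v):        # same Parikh vector
            hits += 1
    return hits / trials
```

Quadrupling the length should roughly quarter the estimated probability, consistent with the inverse-linear decay.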
4 Experimental results
We ran a large series of experiments for -A-free languages over the alphabets of size . Each experiment is a set of random walks in the prefix tree of a given language. Each walk follows the random depth-first search (Algorithm 1), with the number of visited nodes being of order to . The ultimate aim of every experiment was to make a well-grounded conjecture about the (in)finiteness of the studied language.
Our initial expectation was that random walks would demonstrate two easily distinguishable types of behaviour:
• infinite-like: the level of the current node is (almost) proportional to the number of nodes visited, or
• finite-like: from some point, the level of the current node oscillates near the maximum reached earlier.
However, the situation is not that straightforward: very long oscillations of the level were detected during random walks even in some languages which are known to be infinite; for example, in the binary 4-A-free language. To overcome this unwanted behaviour, we endowed Algorithm 1 with a "forced backtrack" rule:
• let be the maximum level of a node reached so far; if nodes were visited since the last update of or since the last forced backtrack, then make a forced backtrack: from the current node, move edges up the tree and continue the search from the node reached.
Here and are some heuristically chosen monotone functions; we used and . Forced backtracking deletes the last letters of the current word in order to escape a big finite subtree that the search would otherwise have to traverse in full. The use of forced backtracking allowed us to classify the walks in almost all studied languages either as infinite-like or as finite-like. The results presented below are grouped by the alphabets.
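The forced-backtrack rule can be added to the search sketch as follows. This is our reconstruction: `f` and `g` stand for the paper's heuristic threshold functions (their concrete choices are not reproduced in this text), and in this simplified version the untried branches below the cut point are abandoned rather than revisited.

```python
import random

def dfs_with_forced_backtrack(is_in_L, sigma, max_nodes, f, g, seed=0):
    """Randomized DFS with forced backtracking (our sketch).

    If f(h) nodes have been visited since the last update of the maximum
    level h (or since the last forced backtrack), the last g(h) letters
    of the current word are deleted and the search continues from there."""
    rng = random.Random(seed)
    word = []
    untried = [rng.sample(sigma, len(sigma))]
    h, visited, since = 0, 1, 0
    while visited < max_nodes and untried:
        if since >= f(h) and word:
            cut = min(g(h), len(word))    # forced backtrack: cut edges up
            del word[-cut:]
            del untried[-cut:]
            since = 0
            continue
        if untried[-1]:
            word.append(untried[-1].pop())
            if is_in_L(''.join(word)):
                visited += 1
                since += 1
                if len(word) > h:         # new maximum level
                    h, since = len(word), 0
                untried.append(rng.sample(sigma, len(sigma)))
            else:
                word.pop()
        else:                             # ordinary backtrack
            untried.pop()
            if word:
                word.pop()
    return h
```

With a huge threshold `f` the rule never fires and the search behaves as before; with a small `f` on a finite language the walk repeatedly jumps up the tree instead of crawling through an exhausted subtree.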
4.1 Alphabets with , and 10 letters
In [13], it was proved (Theorem 3.1) that for all and conjectured that the equality holds in all cases. However, the random search reveals a different picture. For each of the languages , , we ran the random search with forced backtrack, using Algorithm 3 to decide membership in the language; the search terminated when nodes were visited. We repeated the search 100 times with and another 100 times with . The results, presented in columns 3–8 of Table 1, clearly demonstrate finite-like behaviour of random walks. Moreover, the results suggest that none of these languages contains a word much longer than 100 symbols. So we were able to prove the following result by (optimized) exhaustive search.
Theorem 4.1
One has for .
Alphabet size | Avoided power | Maximum length | ||||||
---|---|---|---|---|---|---|---|---|
6 | 112 | 98.9 | 98 | 114 | 101.1 | 101 | 116 | |
7 | 116 | 100.3 | 100 | 124 | 103.9 | 102 | 125 | |
8 | 103 | 94.8 | 95 | 102 | 96.2 | 96 | 105 | |
9 | 108 | 95.6 | 96 | 107 | 98.8 | 99 | 117 | |
10 | 121 | 107.7 | 108 | 128 | 111.6 | 111 | 148* |
A length- word is called a -permutation if all its letters are pairwise distinct. We use the following lemma to reduce the search space.
Lemma 2
Let , , and let , and be subsets of defined as follows:
- is the set of all such that has the prefix and contains no -permutations;
- is the set of all such that has the prefix and contains no -permutations;
- is the set of all having the prefix .
Then is finite iff each of , and is finite.
Proof
The necessity is immediate from the definitions; let us prove the sufficiency. Let and let be the lengths of the longest words in , and respectively. Let us show that . If is a word and is a letter, then is an Abelian -power. Hence
- any factor of of length is a -permutation.
Now consider a factor of such that . By , one can write , where does not contain the letters and . If and , then is an Abelian -power, which is impossible since as a factor of . Hence either begins or ends with a -permutation. Thus contains a -permutation beginning at the first or second position. Then contains a -permutation due to finiteness of ; moreover, this permutation ends no later than at position and thus begins no later than at position . Similarly, due to finiteness of , contains a -permutation no later than at position . Finally, the finiteness of implies the upper bound . In particular, is finite.∎
We ran (non-randomized) depth-first search on the prefix trees of the languages , and for the cases , using Algorithm 3 to detect Abelian powers, and proved that all these trees are finite. According to Lemma 2, this proves Theorem 4.1. The total number of visited nodes was approximately 0.43 billion for ; 0.90 billion for ; 6.29 billion for ; 8.14 billion for ; and more than 500 billion for . The last case required about 2000 hours of (single-core) processing time on an ordinary laptop.
Remark 6
For each it is feasible to run a single search which enumerates all lexmin words in the language . We performed these searches and found the maximum length of a word in each language (the last column of Table 1) and the distribution of words by their length (see Fig. 3 for the case ). For , such a single search would require too many resources; here the value in the last column of Table 1 is the length of the longest word in .

Theorem 4.1 raises the question of the avoidance of bigger -A-powers over the same alphabets. As the next step, we ran experiments for the languages . The results for are presented in Table 2; random walks in these languages clearly demonstrate finite-like behaviour, while proving finiteness by exhaustive search looks hardly possible. In contrast, the walks in the 6-ary language demonstrate infinite-like behaviour: the average value of for our experiments with is greater than . We note that the obtained words are too long for Algorithm 3, so we had to use the slower Algorithm 2. Finally, we constructed random walks for the languages (). They also demonstrate infinite-like behaviour. The obtained experimental results allow us to state the part of Conjecture 2 concerning the alphabets with six or more letters.
Alphabet size | Avoided power | max | mean | median | max | mean | median
---|---|---|---|---|---|---|---
7 | | 510 | 374.5 | 371 | 510 | 397.5 | 394
8 | | 211 | 179.7 | 179 | 223 | 185.0 | 184
9 | | 192 | 157.2 | 156 | 191 | 162.3 | 161
10 | | 175 | 154.0 | 154 | 187 | 159.7 | 158
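The random walks themselves can be sketched as follows, once more in the simplified Abelian-square setting (a stand-in for the languages actually studied): descend by picking a uniformly random extendable child, and backtrack one level whenever the current node is a leaf (forced backtracking). The statistic of interest is the maximum level reached within a budget of visited nodes:

```python
import random
from collections import Counter

def ends_with_abelian_square(w):
    """Test whether w ends with uv where u and v are Abelian equivalent."""
    n = len(w)
    for half in range(1, n // 2 + 1):
        if Counter(w[n - 2 * half : n - half]) == Counter(w[n - half :]):
            return True
    return False

def random_walk(alphabet, steps, seed=0):
    """Random walk down the prefix tree of Abelian-square-free words
    with forced backtracking; returns the maximum level (word length)
    reached within the given number of steps."""
    rng = random.Random(seed)
    w = ""
    max_level = 0
    for _ in range(steps):
        children = [c for c in alphabet
                    if not ends_with_abelian_square(w + c)]
        if children:
            w += rng.choice(children)
            max_level = max(max_level, len(w))
        elif w:
            w = w[:-1]  # leaf reached: forced backtrack
        else:
            break  # the root itself has no children
    return max_level
```

In a finite tree the maximum level quickly saturates at the tree's depth no matter how long the walk runs (finite-like behaviour), whereas in an infinite tree it keeps growing with the node budget (infinite-like behaviour); this is the dichotomy the tables record.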
4.2 Alphabets with 2, 3, 4, and 5 letters
Random walks in the prefix tree of the language demonstrate infinite-like behaviour; Fig. 4 shows an example of the dependence of the level of the current node on the number of nodes visited. Although we could not push random walks significantly farther down the tree (Algorithm 3 uses too much space to work at such deep levels, so we relied on the slower Algorithm 2), we obtained sufficient evidence to support Conjecture 1 for .

Remark 7
For smaller alphabets, we studied the languages , , and , indicated by Conjecture 1 as infinite. To detect Abelian powers, we used Algorithm 3 for the quaternary alphabet; for the ternary and binary alphabets, we worked with the reversals of , and to benefit from the speed of Algorithm 5, which detects dual Abelian powers. Random walks (with forced backtracking) in each of the three languages show finite-like behaviour; see Table 3 and the example in Fig. 5. Longer random searches lead to somewhat better results, especially on average, due to multiple forced backtracks; however, nothing resembles a steady growth of the maximum level with the total number of visited nodes. So the experimental results justify the lower bounds from Conjecture 2 for . To get the upper bound for the ternary alphabet, we ran random walks for the language , with results similar to those obtained for : all walks demonstrate infinite-like behaviour; the level is reached within minutes.
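The reversal trick used here rests on a simple fact: the reversal of an Abelian power is again an Abelian power (reversal preserves Parikh vectors, and the reversal of a factor of w is a factor of the reversal of w), so a word contains an Abelian square if and only if its reversal does. The following brute-force sketch (Abelian squares only; Algorithm 5 itself is not reproduced in this chunk) verifies this equivalence exhaustively on short ternary words:

```python
from collections import Counter
from itertools import product

def has_abelian_square(w):
    """Brute-force test: does w contain a factor uv with u and v
    Abelian equivalent?"""
    n = len(w)
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):
            if Counter(w[i:i + half]) == Counter(w[i + half:i + 2 * half]):
                return True
    return False

# The reversal of an Abelian square uv is the Abelian square
# reverse(v)reverse(u), so the predicate is reversal-invariant:
for t in product("abc", repeat=6):
    s = "".join(t)
    assert has_abelian_square(s) == has_abelian_square(s[::-1])
```

This invariance is what makes it legitimate to search the reversed languages with a detector tuned to dual Abelian powers instead of the original ones.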
Overall, we conclude that the experiments we conducted justify the formulation of Conjecture 2.
Alphabet size | Avoided power | max | mean | median | max | mean | median | max | mean | median
---|---|---|---|---|---|---|---|---|---|---
2 | | 775 | 435.8 | 416 | 706 | 477.0 | 453 | 759 | 589.7 | 588
3 | | 3344 | 1700.0 | 1671 | 5363 | 2228.8 | 2140 | 5449 | 3078.1 | 3148
4 | | 1367 | 861.2 | 835 | 1734 | 986.8 | 956 | 2453 | 1414.7 | 1369
We ran two additional experiments with the language . First, we took the longest word found by random search (it has length 2453) and extended it to the left with another long random search, repeated multiple times. The longest word obtained has length 3152, which seems to be a fair approximation of the maximum length of a word in . Second, we tried an exhaustive enumeration of the words in to understand how fast the initial growth is and how far we can reach. We discovered that the language contains billions of lexmin words of length 90, compared to billions of such words of length 89. Hence 90 is still quite far from the length at which the number of words is maximal.

5 Future Work
Clearly, the main challenge in the topic is to find the exact values of the Abelian repetition threshold. Even finding one such value would be great progress. Choosing the case to start with, we would suggest proving because in this case the lower bound has already been checked by exhaustive search in [13]. For all other alphabets, the proof of the lower bounds suggested in Conjecture 2 is already a challenging task which cannot be solved by brute force.
Another piece of work is to refine Conjecture 2 by suggesting the precise values of , , , and . For bigger , random walks demonstrate an obvious “phase transition” at the point : the behaviour of a walk switches from finite-like for to infinite-like for . However, the situation with small alphabets can be trickier. For example, we tried the next natural candidate for , namely, . For random walks in , with and forced backtracks, the obtained maximum levels in our experiments ranged from 3000 to 20000; such large lengths show that there is no hope of seeing a clear-cut phase transition in experiments with random walks.
Finally, we want to draw attention to the following fact. The quaternary 2-A-free word constructed by Keränen [8] contains arbitrarily long factors of the form , where is a letter and ; thus it is not -A-free for any . Similarly, the word constructed by Dekking [5] for the ternary (resp., binary) alphabet is not -A-free for any (resp., ). Hence some new constructions are necessary to improve upper bounds for .
References
- [1] Carpi, A.: On Dejean’s conjecture over large alphabets. Theoret. Comput. Sci. 385, 137–151 (2007)
- [2] Cassaigne, J., Currie, J.D.: Words strongly avoiding fractional powers. Eur. J. Comb. 20(8), 725–737 (1999)
- [3] Currie, J.D., Rampersad, N.: A proof of Dejean’s conjecture. Math. Comp. 80, 1063–1070 (2011)
- [4] Dejean, F.: Sur un théorème de Thue. J. Combin. Theory. Ser. A 13, 90–99 (1972)
- [5] Dekking, F.M.: Strongly non-repetitive sequences and progression-free sets. J. Combin. Theory. Ser. A 27, 181–185 (1979)
- [6] Erdös, P.: Some unsolved problems. Magyar Tud. Akad. Mat. Kutató Int. Közl. 6, 221–264 (1961)
- [7] Evdokimov, A.A.: Strongly asymmetric sequences generated by a finite number of symbols. Soviet Math. Dokl. 9, 536–539 (1968)
- [8] Keränen, V.: Abelian squares are avoidable on 4 letters. In: Kuich, W. (ed.) Proc. ICALP 1992. LNCS, vol. 623, pp. 41–52. Springer-Verlag (1992)
- [9] Moulin-Ollagnier, J.: Proof of Dejean’s conjecture for alphabets with 5, 6, 7, 8, 9, 10 and 11 letters. Theoret. Comput. Sci. 95, 187–205 (1992)
- [10] Pansiot, J.J.: A propos d’une conjecture de F. Dejean sur les répétitions dans les mots. Discr. Appl. Math. 7, 297–311 (1984)
- [11] Petrova, E.A., Shur, A.M.: Branching frequency and Markov entropy of repetition-free languages. In: Proc. DLT 2021. LNCS, vol. 12811, pp. 328–341. Springer (2021)
- [12] Rao, M.: Last cases of Dejean’s conjecture. Theoret. Comput. Sci. 412, 3010–3018 (2011)
- [13] Samsonov, A.V., Shur, A.M.: On Abelian repetition threshold. RAIRO Theor. Inf. Appl. 46, 147–163 (2012)
- [14] Thue, A.: Über unendliche Zeichenreihen. Norske vid. Selsk. Skr. Mat. Nat. Kl. 7, 1–22 (1906)
- [15] Thue, A.: Über die gegenseitige Lage gleicher Teile gewisser Zeichenreihen. Norske vid. Selsk. Skr. Mat. Nat. Kl. 1, 1–67 (1912)