Extensions and reductions of square-free words
Abstract.
A word is square-free if it does not contain a nonempty word of the form $XX$ as a factor. A famous 1906 result of Thue asserts that there exist arbitrarily long square-free words over a $3$-letter alphabet. We study square-free words with additional properties involving single-letter deletions and extensions of words.
A square-free word is steady if it remains square-free after deletion of any single letter. We prove that there exist infinitely many steady words over a $4$-letter alphabet. We also demonstrate that one may construct steady words of any length by picking letters from arbitrary alphabets of size $7$ assigned to the positions of the constructed word. We conjecture that both bounds can be lowered to $4$, which is best possible.
In the opposite direction, we consider square-free words that remain square-free after insertion of a single (suitably chosen) letter at every possible position in the word. We call them bifurcate. We prove a somewhat surprising fact, that over a fixed alphabet with at least three letters, every steady word is bifurcate. We also consider families of bifurcate words possessing a natural tree structure. In particular, we prove that there exists an infinite tree of doubly infinite bifurcate words over an alphabet of size $12$.
1. Introduction
A square is a nonempty word of the form $XX$. A word $W$ contains a square if it can be written as $W = UXXV$, with $X$ nonempty, while $U$ and $V$ are possibly empty. Otherwise, $W$ is called square-free. A famous theorem of Thue [29] (see [3]) asserts that there exist infinitely many square-free words over a $3$-letter alphabet. This result initiated combinatorics on words, a whole branch of mathematics abundant in deep results, exciting problems, and unexpected connections to diverse areas of science (see [1, 2, 4, 6, 21, 22]).
In this paper we study square-free words with additional properties involving two natural operations, a single-letter extension and a single-letter deletion, defined as follows.
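Let $W = w_1w_2\cdots w_n$ be a word of length $n$ over a fixed alphabet $\mathbb{A}$. For $0 \le i \le n$, let $P_i$ and $S_i$ denote the prefix and the suffix of $W$ of length $i$, respectively. Notice that the words $P_0$ and $S_0$ are empty. In particular, we have $W = P_iS_{n-i}$ for every $0 \le i \le n$. An extension of $W$ at position $i$ is a word of the form $P_i\,x\,S_{n-i}$, where $x$ is any letter from $\mathbb{A}$. In this case we also say that $W$ is a reduction of $P_i\,x\,S_{n-i}$.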
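Both operations are easy to state in code. The following minimal Python sketch is our own illustration, not taken from the paper. The names `is_square_free`, `extensions` and `reductions` are hypothetical. Words are represented as Python strings with one character per letter. Position `i` means "after the prefix of length `i`", so it runs from `0` to `len(w)`. The square test is the naive one over all windows, which is enough for short words.

```python
def is_square_free(w: str) -> bool:
    """True if w has no factor of the form XX with X nonempty."""
    n = len(w)
    for h in range(1, n // 2 + 1):          # h = |X|
        for s in range(n - 2 * h + 1):      # the factor XX starts at index s
            if w[s:s + h] == w[s + h:s + 2 * h]:
                return False
    return True


def extensions(w: str, alphabet: str, i: int) -> list[str]:
    """Extensions of w at position i (0 <= i <= len(w)): the prefix of
    length i, then one letter, then the suffix of length len(w) - i."""
    return [w[:i] + x + w[i:] for x in alphabet]


def reductions(w: str) -> list[str]:
    """Words obtained from w by deleting exactly one letter."""
    return [w[:i] + w[i + 1:] for i in range(len(w))]


print(extensions("ab", "abc", 1))  # ['aab', 'abb', 'acb']
print(reductions("abc"))           # ['bc', 'ac', 'ab']
print([v for v in extensions("ab", "abc", 1) if is_square_free(v)])  # ['acb']
```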
A square-free word $W$ is extremal over $\mathbb{A}$ if there is no square-free extension of $W$. Grytczuk, Kordulewski and Niewiadomski [15] proved that there exist infinitely many extremal ternary words, the shortest one being
They also conjectured that there are no extremal words over an alphabet of size $4$. Recently, Hong and Zhang [17] proved that this is true for alphabets of size at least $17$.
Harju [16] introduced a complementary concept of irreducible words. These are square-free words in which every non-trivial reduction (one where the deleted letter is neither the first nor the last letter of the word) contains a square. He proved that for any there exists a ternary irreducible word of length .
In this article we consider square-free words with the very opposite properties, defined as follows: a square-free word is steady if it remains square-free after deleting any single letter. For instance, the word is steady since all of its four reductions
are square-free, while is not steady since one of its reductions is . Generally, every ternary square-free word of length at least $6$ contains a factor of the form $aba$ (with $a$ and $b$ letters), and therefore is not steady. However, there are steady words of any given length over larger alphabets, as we prove in Theorem 1. We also consider a general variant of this statement in the following list setting.
Conjecture 1.
Let $n$ be a positive integer and let $A_1, A_2, \ldots, A_n$ be a sequence of alphabets of size $4$. Then there exists a steady word $w_1w_2\cdots w_n$ with $w_i \in A_i$ for every $i$.
In Theorem 2 we prove that the statement of the conjecture holds for alphabets with at least $7$ letters. Let us mention that the analogous conjecture for pure square-free words (with alphabets of size $3$), stated in [11], is still open. Currently the best general result confirms it for alphabets of size $4$ (see [13, 12, 26] for three different proofs). Recently Rosenfeld [27] proved that it holds when the union of all alphabets is a $4$-element set.
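Steadiness can be tested directly from the definition. The word must be square-free, and each of its reductions must be square-free too. The Python sketch below is our own, and the helper names are hypothetical. The last line is a brute-force check over $\{a,b,c\}$ showing that no ternary word of length $5$ is steady. Such a word, if square-free, either contains a factor $xyx$ (delete $y$) or has the form $xyzxy$ (delete $z$).

```python
from itertools import product


def is_square_free(w: str) -> bool:
    """True if w has no factor XX with X nonempty (naive check)."""
    n = len(w)
    return not any(
        w[s:s + h] == w[s + h:s + 2 * h]
        for h in range(1, n // 2 + 1)
        for s in range(n - 2 * h + 1)
    )


def is_steady(w: str) -> bool:
    """Square-free, and still square-free after deleting any single letter."""
    return is_square_free(w) and all(
        is_square_free(w[:i] + w[i + 1:]) for i in range(len(w))
    )


print(is_steady("abca"))   # True
print(is_steady("abcab"))  # False: deleting 'c' leaves the square abab
print(is_steady("abcd"))   # True

# No ternary word of length 5 is steady.
print(any(is_steady("".join(p)) for p in product("abc", repeat=5)))  # False
```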
We also consider square-free words defined similarly with respect to extensions of words. A square-free word is bifurcate over a fixed alphabet if it has at least one square-free extension at every position. For instance, the word is bifurcate over and here are its five square-free extensions:
Thus the word is both steady and bifurcate. This is not a coincidence: for an alphabet with at least three letters, every steady word is bifurcate, as we prove in Theorem 3.
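Bifurcation can be checked mechanically in the same way. For every position we ask whether at least one letter of the alphabet yields a square-free extension. The following Python sketch is a hypothetical illustration under the same string representation; the example words are our own choice. The last line uses the fact that a letter occurring nowhere in $w$ can be inserted anywhere: a square containing it would have to contain it twice.

```python
def is_square_free(w: str) -> bool:
    """True if w has no factor XX with X nonempty (naive check)."""
    n = len(w)
    return not any(
        w[s:s + h] == w[s + h:s + 2 * h]
        for h in range(1, n // 2 + 1)
        for s in range(n - 2 * h + 1)
    )


def is_bifurcate(w: str, alphabet: str) -> bool:
    """Square-free, with a square-free extension at every position 0..len(w)."""
    return is_square_free(w) and all(
        any(is_square_free(w[:i] + x + w[i:]) for x in alphabet)
        for i in range(len(w) + 1)
    )


print(is_bifurcate("abca", "abc"))  # True
print(is_bifurcate("ab", "ab"))     # False: aab and abb both contain squares

# A fresh letter 'd' can always be inserted into a ternary square-free word.
print(is_bifurcate("abcacbabcbac", "abcd"))  # True
```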
Clearly, ternary bifurcate words cannot be too long. Indeed, every ternary square-free word of length at least contains a factor of the form (or its reversal). On the other hand, any ternary square-free word is bifurcate over a $4$-letter alphabet, since the fourth letter can be inserted at any position without creating a square. One may, however, ask about the existence of an infinite chain of bifurcate quaternary words.
Conjecture 2.
There exists an infinite sequence of quaternary bifurcate words $W_1, W_2, \ldots$ such that $W_{i+1}$ is a single-letter extension of $W_i$, for each $i \ge 1$.
A much stronger property holds over alphabets of size at least $12$. Namely, there exists a complete bifurcate tree of bifurcate words (rooted at any single letter), in which every word of length $n$ has $n+1$ descendants, corresponding to the extensions at the $n+1$ different positions. Each of them is again bifurcate, the same is true for all of their descendants, and so on, ad infinitum. This curious fact follows easily from a result of Kündgen and Pelsmajer [20] on nonrepetitive colorings of outerplanar graphs, as noted in [14] in the different context of on-line Thue games. We recall this short argument for completeness. It seems plausible, however, that the number of letters actually needed for such a property is much smaller.
Conjecture 3.
There exists a complete bifurcate tree over an alphabet of size $5$.
It is not hard to verify that the above conjecture is tight, as we demonstrate at the end of the following section.
2. Results
We shall present proofs of our results in the following subsections.
2.1. Steady words over a $4$-letter alphabet
For a rational $r > 1$, a fractional $r$-power is a word $W = X^{\lfloor r \rfloor}Y$, where $Y$ is a prefix of $X$ of length $(r - \lfloor r \rfloor)|X|$. We say that $r$ is the exponent of the word $W$. Dejean's famous conjecture (now a theorem) states that the infimum of $r$ for which there exist infinite $k$-ary words without factors of exponent greater than $r$ is equal to
$$RT(k) = \begin{cases} 7/4 & \text{if } k = 3,\\ 7/5 & \text{if } k = 4,\\ k/(k-1) & \text{if } k = 2 \text{ or } k \ge 5. \end{cases}$$
The case $k = 2$ is a simple consequence of the classical theorem of Thue [29]. The case $k = 3$ was proved by Dejean [8] and the case $k = 4$ by Pansiot [25]. The cases up to $k = 14$ were proved by Ollagnier [24] and by Noori and Currie [23]. Carpi [5] showed that the statement holds for every $k \ge 33$, while the remaining cases were proved by Currie and Rampersad [7].
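The exponent of a concrete word is easy to compute. A nonempty word of length $L$ whose smallest period is $p$ is a fractional $(L/p)$-power, with $X$ its prefix of length $p$. A word therefore avoids factors of exponent greater than $r$ exactly when the largest such ratio over its factors is at most $r$. The Python sketch below is our own, and the names `smallest_period` and `max_exponent` are hypothetical. It uses exact fractions and a naive cubic-time scan.

```python
from fractions import Fraction


def smallest_period(u: str) -> int:
    """Least p >= 1 with u[j] == u[j + p] whenever both indices exist."""
    for p in range(1, len(u)):
        if all(u[j] == u[j + p] for j in range(len(u) - p)):
            return p
    return len(u)  # every nonempty word has period len(u)


def max_exponent(w: str) -> Fraction:
    """Largest exponent |u| / per(u) over all nonempty factors u of w."""
    best = Fraction(0)
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            u = w[i:j]
            best = max(best, Fraction(len(u), smallest_period(u)))
    return best


print(max_exponent("abab"))   # 2
print(max_exponent("abcab"))  # 5/3
print(max_exponent("abcd"))   # 1
```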
Theorem 1.
There exist arbitrarily long steady words over a $4$-letter alphabet.
Proof.
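From Dejean’s theorem we get that there exist arbitrarily long quaternary words without factors of exponent greater than $7/5$. Let $W$ be any such word. Notice that any factor of the form $XYX$ in $W$, with $X$ nonempty, must satisfy $|Y| > |X|$, where $|U|$ denotes the length of a word $U$. Indeed, the factor $XYX$ has period $|X| + |Y|$, so its exponent is at least $(2|X| + |Y|)/(|X| + |Y|)$. Hence the opposite inequality, $|Y| \le |X|$, implies that $W$ contains a factor with exponent at least $3/2$, but $3/2 > 7/5$.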
We claim that any word with the above separation property is steady. Assume to the contrary that deleting some single interior letter in the word generates a square. We distinguish two cases corresponding to the relative position of the deleted letter:
- (1) for some letter and words , where is non-empty. If we put and , we immediately get a contradiction with the separation property of , as .
- (2) for some letter and words , where and are non-empty. If , then the factor contradicts the separation property of (by putting and ). Otherwise, we have , which implies that . In this case, the factor contradicts the separation property of (by taking and ).
Thus, the word is steady, which completes the proof. ∎
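The argument can be sanity-checked by computer for short words. The sketch below is our own, and all names in it are hypothetical. It builds quaternary words with no factor of exponent greater than $7/5$ by depth-first search. At each step only the suffixes are tested, because every other factor already occurs in the previous word. Each word found is then tested for steadiness directly. This is an empirical check only: the existence of arbitrarily long such words is Dejean's theorem, not something the search proves.

```python
from fractions import Fraction

LIMIT = Fraction(7, 5)  # Dejean's threshold for a 4-letter alphabet


def new_suffixes_ok(w: str) -> bool:
    """No suffix of w has exponent > 7/5 (w[:-1] is assumed to pass already)."""
    n = len(w)
    for length in range(2, n + 1):
        u = w[n - length:]
        for p in range(1, length):
            if Fraction(length, p) <= LIMIT:
                break  # larger periods only give smaller exponents
            if all(u[j] == u[j + p] for j in range(length - p)):
                return False
    return True


def dejean_words(n: int, alphabet: str = "abcd"):
    """Yield, by depth-first search, all words of length n over alphabet
    with no factor of exponent greater than 7/5."""
    def extend(w: str):
        if len(w) == n:
            yield w
            return
        for x in alphabet:
            if new_suffixes_ok(w + x):
                yield from extend(w + x)
    yield from extend("")


def is_square_free(w: str) -> bool:
    n = len(w)
    return not any(w[s:s + h] == w[s + h:s + 2 * h]
                   for h in range(1, n // 2 + 1) for s in range(n - 2 * h + 1))


def is_steady(w: str) -> bool:
    return is_square_free(w) and all(
        is_square_free(w[:i] + w[i + 1:]) for i in range(len(w)))


first = next(dejean_words(20))
print(len(first), is_steady(first))                 # 20 True
print(all(is_steady(w) for w in dejean_words(10)))  # True
```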
| 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 5 | 5 | 7 | 9 | 12 | 16 | 21 | 28 | 37 | 45 | 58 | 73 |

| 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 93 | 101 | 124 | 150 | 179 | 216 | 257 | 309 | 376 | 453 | 551 | 662 | 798 | 957 | 1149 |
2.2. Steady words from lists of size $7$
In this subsection we consider the list variant of steady words. The proof of the following theorem is inspired by a beautiful argument due to Rosenfeld [26], which he used for the analogous problem for square-free words.
Theorem 2.
Let be a sequence of alphabets such that for all . For every there exist at least steady words , such that for all .
Proof.
Let be the set of steady words , such that for all and set to be the size of . Similarly, let be the set of words with for all , such that but is in . We think of as the number of ways we can fail by appending a new symbol at the end of a steady word of length . Our goal is to give a good upper bound on .
Claim 1.
Proof.
Let us partition into subsets defined such that the word from is in if the suffix of of length can be reduced to a square by removing a single letter, and is not in . For example contains words that end with or , contains words that end with (note that , and are not possible) and contains words that end with , or (again, , and are not possible). It is clearly a partition by the definition of .
Let be the size of . Note that , because every word from can be obtained by appending to some word from a repetition of the last or one before last letter. We also have because, as remarked earlier, contains only words that end with (where and are single letters) and each such word can be obtained by repeating a third- and second-from-last letter of a word from .
Now consider a word from for . Since a suffix of of length can be reduced to a square by removing a single non-final letter, we have that either (a) or (b) , where is a single letter and and are words, where and is nonempty. We will count the number of words from that fit those cases separately for every possible position of (i.e. the length of ). Note that in case (a) we must have , because otherwise would be contained in . Therefore, there are possible lengths of and for each of those lengths there are compatible words in , as each such word can be obtained from a word from by appending at the end; this totals to .
Similarly, in case (b) , because otherwise would not be contained in . If the length of is (respectively ), then there are at most (resp. ) compatible words in , obtained by repeating (resp. ) letters from (resp. ). If the length of is at least (for which there are possibilities), then the number of compatible words is at most , as each of them can be obtained by repeating letters from a word in and picking one of seven letters from the list .
After summing the above estimates in both cases, we obtain that for
This, together with our estimations on and , implies that
This concludes the proof of the claim. ∎
Now we inductively show that for all . It is clearly true for . Note that , so by Claim 1 we obtain that
Using the induction assumption and then calculating sums of the geometric series it follows that
which completes the proof. ∎
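The proof above is a counting argument and does not construct the word. For experimentation, a plain backtracking search is enough. The following Python sketch is not the authors' method, only a hypothetical illustration. The 12-letter pool, the random seed and the list length are arbitrary choices. Pruning is sound because every prefix of a steady word is steady. For lists of size $7$, Theorem 2 guarantees that the search never fails, but the search itself proves nothing.

```python
import random
from typing import Optional


def is_square_free(w: str) -> bool:
    n = len(w)
    return not any(w[s:s + h] == w[s + h:s + 2 * h]
                   for h in range(1, n // 2 + 1) for s in range(n - 2 * h + 1))


def is_steady(w: str) -> bool:
    return is_square_free(w) and all(
        is_square_free(w[:i] + w[i + 1:]) for i in range(len(w)))


def steady_from_lists(lists: list[str]) -> Optional[str]:
    """Depth-first search for a steady word w with w[i] in lists[i] for all i."""
    def extend(w: str) -> Optional[str]:
        if len(w) == len(lists):
            return w
        for x in lists[len(w)]:
            if is_steady(w + x):  # prefixes of steady words are steady
                found = extend(w + x)
                if found is not None:
                    return found
        return None
    return extend("")


rng = random.Random(2021)   # fixed, arbitrary seed
universe = "abcdefghijkl"   # hypothetical pool the lists are drawn from
lists = ["".join(rng.sample(universe, 7)) for _ in range(20)]
w = steady_from_lists(lists)
print(w)
```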
2.3. Steady words are bifurcate
As it turns out, the only square-free word which is steady but not bifurcate is over the alphabet .
Theorem 3.
Let be a fixed alphabet with at least three letters. Every steady word over is bifurcate over .
Proof.
Up to renaming of letters, the words $a$, $ab$, and $abc$ are the only steady words of length at most $3$. Each of them is bifurcate over any alphabet with at least three letters, so the theorem holds for them.
Thus, let us assume that $n \ge 4$ and let the square-free word $W = w_1w_2\cdots w_n$ over $\mathbb{A}$ be steady. Such a word does not contain a factor of the form $aba$, so $w_i \neq w_{i+2}$ for every $1 \le i \le n-2$.
We show that for every $0 \le i \le n$ there exists a letter $x \in \mathbb{A}$ such that the word
$$w_1\cdots w_i\,x\,w_{i+1}\cdots w_n$$
is square-free. The main idea is that creating a palindromic factor in the extended word helps to keep it square-free. Further, we use the simple fact that an extension $W'$ of a square-free word $W$ is square-free if and only if every factor of $W'$ which contains the inserted letter is square-free.
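The simple fact used in this proof also gives a cheaper test in practice. To decide whether an extension of a square-free word is square-free, only the windows covering the inserted letter need to be inspected. Every other factor already occurs in the original word. The Python sketch below is our own illustration with hypothetical names. It compares this local test with the full check on every extension of a sample square-free word.

```python
def is_square_free(w: str) -> bool:
    n = len(w)
    return not any(w[s:s + h] == w[s + h:s + 2 * h]
                   for h in range(1, n // 2 + 1) for s in range(n - 2 * h + 1))


def square_through(v: str, i: int) -> bool:
    """True if some square factor v[s:s + 2h] of v covers index i."""
    n = len(v)
    for h in range(1, n // 2 + 1):
        # windows [s, s + 2h) with s <= i < s + 2h that fit inside v
        for s in range(max(0, i - 2 * h + 1), min(i, n - 2 * h) + 1):
            if v[s:s + h] == v[s + h:s + 2 * h]:
                return True
    return False


def extension_is_square_free(w: str, i: int, x: str) -> bool:
    """For square-free w, decide whether w[:i] + x + w[i:] is square-free by
    inspecting only the factors that contain the inserted letter."""
    return not square_through(w[:i] + x + w[i:], i)


w = "abcacbabcbac"  # a square-free ternary word
print(all(
    extension_is_square_free(w, i, x) == is_square_free(w[:i] + x + w[i:])
    for i in range(len(w) + 1)
    for x in "abcd"
))  # True
```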
Case 1 ( and ).
Consider the extension of the word by the letter at the beginning:
We will show that this extension is square-free. Since is square-free, it is sufficient to show that every prefix of is square-free. Of course, , and are square-free. Let us notice that , so is square-free. Finally, is a unique factor of the form in the word , so it cannot be a prefix of a square factor of length greater than 5 in .
Analogously, one can show that the word is square-free.
Case 2 ( and ).
Consider the extension of by inserting the letter between the first and the second letter of . The resulting word,
is square-free, since , , and is a unique factor of form . So, both words, and , are the prefixes of square-free factors.
Analogously, one can show that the word
is square-free.
Case 3 ().
In this part, let us assume that , , and , where , , and are pairwise different letters. Thus
Let us investigate the extension
of the word , that is,
We will show that is indeed a square-free word.
First notice that the word is actually an extension of the word by inserting the second letter, , at the beginning. Since is a steady word, we obtain that is square-free, by Case 1.
Next notice that the word cannot contain a square as a suffix since it ends with the unique palindrome . Therefore, it is sufficient to show that there are no squares in prefixes and , and that no other potential square in may contain the factor .
Subcase 1 (No squares in ).
Let us recall that is steady. If there is a square suffix in , then
for some words and (where is possibly empty) and hence
which gives us a contradiction with the assumption that is steady.
Subcase 2 (No squares in ).
If there is a square suffix in , then
for some, possibly empty, words and . This construction implies that
This word contains a reduction
which contradicts the fact that is steady.
Subcase 3 (No squares with the factor ).
If has a square which contains a unique factor , then
for certain word , and so has to contain a factor
Let us notice that cannot contain a factor since it would imply that is not square-free. Moreover, cannot contain a factor since it would imply that has a reduction with a factor , which contradicts the fact that is steady.
The proof is complete.
∎
Let us notice that the converse statement is not true: a word over the alphabet is bifurcate but not steady.
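Theorem 3 can be sanity-checked by brute force for short words. The Python sketch below is our own, and the function names are hypothetical. It tests every word over three and four letters up to a small length; this is a finite check, not a substitute for the proof. The last two lines use example words of our own choosing. The first shows why three letters are needed: $ab$ is steady but not bifurcate over $\{a,b\}$. The second shows that the converse fails: $aba$ is bifurcate over $\{a,b,c\}$ but not steady.

```python
from itertools import product


def is_square_free(w: str) -> bool:
    n = len(w)
    return not any(w[s:s + h] == w[s + h:s + 2 * h]
                   for h in range(1, n // 2 + 1) for s in range(n - 2 * h + 1))


def is_steady(w: str) -> bool:
    return is_square_free(w) and all(
        is_square_free(w[:i] + w[i + 1:]) for i in range(len(w)))


def is_bifurcate(w: str, alphabet: str) -> bool:
    return is_square_free(w) and all(
        any(is_square_free(w[:i] + x + w[i:]) for x in alphabet)
        for i in range(len(w) + 1))


def steady_implies_bifurcate(alphabet: str, max_len: int) -> bool:
    """Exhaustively test Theorem 3 for all words over alphabet of length <= max_len."""
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if is_steady(w) and not is_bifurcate(w, alphabet):
                return False
    return True


print(steady_implies_bifurcate("abc", 8))            # True
print(steady_implies_bifurcate("abcd", 7))           # True
print(is_steady("ab"), is_bifurcate("ab", "ab"))     # True False
print(is_steady("aba"), is_bifurcate("aba", "abc"))  # False True
```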
2.4. Bifurcate trees of words
We will now prove the aforementioned result on bifurcate trees. Let us start with a formal definition.
A bifurcate tree is any family of bifurcate words over a fixed alphabet arranged in a rooted tree so that the descendants of a word are its single-letter extensions at different positions. Thus, a word of length $n$ may have at most $n+1$ descendants. A bifurcate tree is called complete if every vertex has the maximum possible number of descendants. Notice that such a tree must be infinite.
We will demonstrate that complete bifurcate trees exist over alphabets of size at least $12$. This fact follows easily from the results concerning on-line nonrepetitive games obtained in [14] and independently in [19], but we recall the proof for completeness. The key idea is to apply the following result from graph coloring.
A coloring $c$ of the vertices of a graph $G$ is square-free if for every simple path $v_1v_2\cdots v_m$ in $G$, the word $c(v_1)c(v_2)\cdots c(v_m)$ is square-free. A graph is planar if it can be drawn in the plane without crossing edges. A planar graph is called outerplanar if it has a plane drawing in which all vertices are incident to the outer face.
Theorem 4 (Kündgen and Pelsmajer [20]).
Every outerplanar graph has a square-free coloring using at most $12$ colors.
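Completeness can only be checked up to a finite depth by computer. The Python sketch below is our own, and the name `complete_to_depth` is hypothetical. It asks whether a word can be the root of a complete bifurcate tree truncated after `depth` further levels. Every position must admit a letter whose extension again roots such a tree, and the words on the last level must be bifurcate. Memoization with `functools.lru_cache` avoids recomputing repeated subtrees.

```python
from functools import lru_cache


def is_square_free(w: str) -> bool:
    n = len(w)
    return not any(w[s:s + h] == w[s + h:s + 2 * h]
                   for h in range(1, n // 2 + 1) for s in range(n - 2 * h + 1))


@lru_cache(maxsize=None)
def complete_to_depth(w: str, alphabet: str, depth: int) -> bool:
    """Can w root a complete bifurcate tree truncated after `depth` levels?

    w must be square-free; for depth == 0 it must be bifurcate, and for
    depth > 0 every position i needs a letter x such that w[:i] + x + w[i:]
    again roots such a tree with depth - 1 levels.
    """
    if not is_square_free(w):
        return False
    if depth == 0:
        return all(
            any(is_square_free(w[:i] + x + w[i:]) for x in alphabet)
            for i in range(len(w) + 1)
        )
    return all(
        any(complete_to_depth(w[:i] + x + w[i:], alphabet, depth - 1)
            for x in alphabet)
        for i in range(len(w) + 1)
    )


print(complete_to_depth("a", "abc", 1))           # True
print(complete_to_depth("a", "ab", 1))            # False: aa has a square, ba is not bifurcate
print(complete_to_depth("a", "abcdefghijkl", 2))  # True
```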
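For very small graphs, square-freeness of a coloring can be verified by listing all simple paths, in both directions, and reading the colors along each one. The Python sketch below is a hypothetical illustration: it takes exponential time and is only meant to make the definition concrete on a $4$-cycle. Colors are single characters, so each path reads as a string.

```python
def is_square_free(w: str) -> bool:
    n = len(w)
    return not any(w[s:s + h] == w[s + h:s + 2 * h]
                   for h in range(1, n // 2 + 1) for s in range(n - 2 * h + 1))


def simple_paths(adj: dict[int, list[int]]):
    """Yield every simple path of the graph as a list of vertices."""
    def walk(path: list[int], seen: frozenset[int]):
        yield path
        for u in adj[path[-1]]:
            if u not in seen:
                yield from walk(path + [u], seen | {u})
    for v in adj:
        yield from walk([v], frozenset([v]))


def is_square_free_coloring(adj: dict[int, list[int]], color: dict[int, str]) -> bool:
    """True if the colors read along every simple path form a square-free word."""
    return all(
        is_square_free("".join(color[v] for v in path)) for path in simple_paths(adj)
    )


cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # the 4-cycle 0-1-2-3-0
print(is_square_free_coloring(cycle, {0: "a", 1: "b", 2: "c", 3: "d"}))  # True
print(is_square_free_coloring(cycle, {0: "a", 1: "b", 2: "a", 3: "b"}))  # False: 0,1,2,3 reads abab
```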

We will apply this theorem to the infinite graph constructed as follows. Let us denote and for . Let , , denote the simple path on with edges joining consecutive elements in . Let be the union of all paths with (see Figure 1) and let be the countable union of all graphs , .
Theorem 5.
There exists a complete bifurcate tree over an alphabet of size $12$.
Proof.
Clearly, every finite graph is outerplanar, hence by Theorem 4 it has a square-free coloring with $12$ colors. By compactness, the infinite graph also has a $12$-coloring without a square on any simple path. Fix one such square-free coloring of the graph . It is now easy to extract a complete bifurcate tree out of this coloring in the following way.
The root of is . Its descendants are and . Each of these has the following sets of descendants,
and
respectively. In general, every word in corresponds to a path in the graph with vertices taken from different paths and closest to each other as the rational points in the unit interval. All these words are square-free by the more general property of the coloring . ∎
One may easily extend this result to doubly infinite words, with the notions of bifurcate words and trees extended in a natural way.
Theorem 6.
There exists a complete bifurcate tree of doubly infinite words over an alphabet of size $12$.
Proof.
Consider the graph on the set of all integers which is a countable union of copies of the graph inserted in every interval , for every integer . By Theorem 4 the graph has a square-free coloring using at most $12$ colors. As the root for the constructed tree one may take the doubly infinite word
This word is clearly bifurcate and the whole assertion follows similarly as in the previous proof. ∎
On the other hand, it is not difficult to establish that the above results are no longer true over alphabet of size four.
Theorem 7.
There is no complete bifurcate tree with words of length more than over a $4$-letter alphabet.
Proof.
Let and suppose that is a complete bifurcate tree over . We may assume that is a word in . Then an extension of in the middle is either or . Notice that each of these words can be factorized as such that is a word over the alphabet and is a word over the alphabet . This property will be preserved in every further extension at the position separating from . So, the longest possible square-free word in has the form , which proves the assertion. ∎
3. Final remarks
Let us conclude the paper with some suggestions for future research.
First notice that the assertion of Theorem 4 is actually much stronger than needed for deriving conclusions on bifurcate trees. Indeed, the $12$-coloring it provides is square-free on all possible paths, while for our purposes it suffices to consider only directed paths going always to the right. More formally, let denote the directed graph obtained from by orienting every edge to the right (towards the larger number).
Problem 1.
Determine the least possible such that there is a -coloring of in which all directed paths are square-free.
By Theorem 4 we know that $k \le 12$, but most probably this is not the best possible bound. Clearly, any improvement of this constant would improve the statements of Theorems 5 and 6. Conversely, by Theorem 7 we know that $k \ge 5$.
Notice that the family of graphs we used in the proof of Theorem 5 is actually a quite restricted subclass of planar graphs, which in turn is just one of the minor-closed classes of graphs. It has recently been proved by Dujmović, Esperet, Joret, Walczak, and Wood [10] that every such class (except the class of all finite graphs) has bounded square-free chromatic number. In particular, every planar graph has a square-free coloring using at most $768$ colors. Perhaps these results could be used to derive other interesting properties of words. For such applications it is sufficient to restrict to oriented planar graphs, that is, directed graphs arising from simple planar graphs by fixing for every edge one of the two possible orientations.
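The directed requirement in Problem 1 is genuinely weaker than square-freeness on all paths. The Python sketch below checks only increasing paths, with each edge oriented towards its larger endpoint. It uses a small hypothetical graph of our own, not the graph from the proofs. On this graph every increasing path is square-free, while the undirected path $0,2,1,3$ reads the square $abab$.

```python
def is_square_free(w: str) -> bool:
    n = len(w)
    return not any(w[s:s + h] == w[s + h:s + 2 * h]
                   for h in range(1, n // 2 + 1) for s in range(n - 2 * h + 1))


def increasing_paths(n: int, edges: list[tuple[int, int]]):
    """Yield every directed path v0 < v1 < ... in the graph on 0..n-1 whose
    edges are oriented towards their larger endpoint."""
    out: dict[int, list[int]] = {v: [] for v in range(n)}
    for u, v in edges:
        out[min(u, v)].append(max(u, v))

    def walk(path: list[int]):
        yield path
        for nxt in out[path[-1]]:
            yield from walk(path + [nxt])

    for v in range(n):
        yield from walk([v])


def directed_square_free(n: int, edges: list[tuple[int, int]], color: dict[int, str]) -> bool:
    """True if the colors along every increasing path form a square-free word."""
    return all(is_square_free("".join(color[v] for v in p))
               for p in increasing_paths(n, edges))


edges = [(0, 2), (1, 2), (1, 3)]
color = {0: "a", 1: "a", 2: "b", 3: "b"}
print(directed_square_free(4, edges, color))    # True
print("".join(color[v] for v in [0, 2, 1, 3]))  # abab, read along an undirected path
```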
Problem 2.
Determine the least possible such that there is a -coloring of any oriented planar graph in which all directed paths are square-free.
Finally, let us mention another striking connection between words and graph colorings. Let be fixed, and consider all possible proper vertex -colorings of the graph . Identifying colors with letters, one may think of these colorings as words over a -letter alphabet. Let us denote this set by . Clearly, every word in has length equal to , the number of vertices in the graph .
Let be any word of length and let be any subset of . Denote by the subword of along the set of indices . The following statement is a simple consequence of the celebrated Four Color Theorem (see [18], [28]).
Theorem 8.
For every pair of positive integers and any set of positive integers , with and , there exists a word such that .
References
- [1] J.-P. Allouche, J. Shallit, Automatic Sequences. Theory, Applications, Generalizations, Cambridge University Press, Cambridge, 2003.
- [2] D. R. Bean, A. Ehrenfeucht, and G. F. McNulty, Avoidable patterns in strings of symbols, Pacific J. Math. 85 (1979), 261–294.
- [3] J. Berstel, Axel Thue’s papers on repetitions in words: a translation, Publications du LaCIM 20 (1995).
- [4] J. Berstel, D. Perrin, The origins of combinatorics on words, Europ. J. Combin. 28 (2007), 996–1022.
- [5] A. Carpi, On Dejean’s conjecture over large alphabets, Theoret. Comput. Sci. 385 (2007), 135–151.
- [6] J. D. Currie, Pattern avoidance: themes and variations, Theoretical Computer Science 339 (2005), 7–18.
- [7] J. D. Currie, N. Rampersad, A proof of Dejean’s conjecture, Math. Comput. 80(274) (2011), 1063–1070.
- [8] F. Dejean, Sur un théorème de Thue, J. Combin. Theory Ser. A 13 (1972), 90–99.
- [9] B. Descartes and R. Descartes, La coloration des Cartes, Eureka, 31 (1968) 29–31.
- [10] V. Dujmović, L. Esperet, G. Joret, B. Walczak, and D. R. Wood, Planar graphs have bounded nonrepetitive chromatic number, Advances in Combinatorics, 5, (2020).
- [11] J. Grytczuk, Thue type problems for graphs, points, and numbers. Discrete Math. 308 (2008), 4419–4429.
- [12] J. Grytczuk, J. Przybyło, and X. Zhu, Nonrepetitive list colourings of paths, Random Structures and Algorithms 38 (2011), 162–173.
- [13] J. Grytczuk, J. Kozik, and P. Micek, New approach to nonrepetitive sequences, Random Structures and Algorithms 42 (2013), 214–225.
- [14] J. Grytczuk, P. Szafruga, M. Zmarz, Online version of the theorem of Thue, Inform. Process. Lett. 113 (2013), 193–195.
- [15] J. Grytczuk, H. Kordulewski, and A. Niewiadomski, Extremal square-free words, Electron. J. Combin. 27 (2020), P1.48.
- [16] T. Harju, Disposability in square-free words, Theoretical Computer Science 862 (2021), 155–159.
- [17] L. Hong and S. Zhang, No extremal square-free words over large alphabets, Available at https://arxiv.org/abs/2107.13123.
- [18] T.R. Jensen, B. Toft, Graph Coloring Problems, John Wiley, New York (1995).
- [19] B. Keszegh, X. Zhu, A note about online nonrepetitive coloring k-trees, Discrete Applied Mathematics 285 (2020) 108–112.
- [20] A. Kündgen and M. J. Pelsmajer, Nonrepetitive colorings of graphs of bounded tree-width, Discrete Math. 308 (2008) 4473–4478.
- [21] M. Lothaire, Combinatorics on Words, Addison-Wesley, Reading, MA, 1983.
- [22] M. Lothaire, Algebraic Combinatorics on Words, Cambridge University Press, 2002.
- [23] M. M. Noori, J. D. Currie, Dejean’s conjecture and Sturmian words, European J. Combin. 28 (2007), 876–890.
- [24] J. M. Ollagnier, Proof of Dejean’s conjecture for alphabets with 5, 6, 7, 8, 9, 10 and 11 letters, Theoret. Comput. Sci. 95 (1992), 187–205.
- [25] J.-J. Pansiot, A propos d’une conjecture de F. Dejean sur les répétitions dans les mots, Discrete Appl. Math. 7 (1984), 297–311.
- [26] M. Rosenfeld, Another approach to non-repetitive colorings of graphs of bounded degree, Electron. J. Combin. 27 (2020), P3.43.
- [27] M. Rosenfeld, Avoiding squares over words with lists of size three amongst four symbols, arXiv:2104.09965 [math.CO].
- [28] R. Thomas, An update on the Four-Color Theorem, Notices Amer. Math. Soc., 45/7, (1998) 848–859.
- [29] A. Thue, Über unendliche Zeichenreihen, Norske vid. Selsk. Skr. Mat. Nat. Kl. 7 (1906), 1–22. Reprinted in Selected Mathematical Papers of Axel Thue, T. Nagell, editor, Universitetsforlaget, Oslo, 1977, pp. 139–158.
- [30] D. R. Wood, Nonrepetitive graph colouring, Dynamic Surveys, Electronic J. Combin. (2021) DS24.