
Extensions and reductions of square-free words

Michał Dębski, Faculty of Mathematics and Information Science, Warsaw University of Technology, 00-662 Warsaw, Poland, m.debski@mini.pw.edu.pl
Jarosław Grytczuk, Faculty of Mathematics and Information Science, Warsaw University of Technology, 00-662 Warsaw, Poland, j.grytczuk@mini.pw.edu.pl
Bartłomiej Pawlik, Institute of Mathematics, Silesian University of Technology, 44-100 Gliwice, Poland, bpawlik@polsl.pl
Abstract.

A word is square-free if it does not contain a nonempty word of the form $XX$ as a factor. A famous 1906 result of Thue asserts that there exist arbitrarily long square-free words over a $3$-letter alphabet. We study square-free words with additional properties involving single-letter deletions and extensions of words.

A square-free word is steady if it remains square-free after deletion of any single letter. We prove that there exist infinitely many steady words over a $4$-letter alphabet. We also demonstrate that one may construct steady words of any length by picking letters from arbitrary alphabets of size $7$ assigned to the positions of the constructed word. We conjecture that both bounds can be lowered to $4$, which is best possible.

In the opposite direction, we consider square-free words that remain square-free after insertion of a single (suitably chosen) letter at every possible position in the word. We call them bifurcate. We prove a somewhat surprising fact, that over a fixed alphabet with at least three letters, every steady word is bifurcate. We also consider families of bifurcate words possessing a natural tree structure. In particular, we prove that there exists an infinite tree of doubly infinite bifurcate words over an alphabet of size $12$.

1. Introduction

A square is a nonempty word of the form $XX$. A word $W$ contains a square if it can be written as $W=PXXS$, with $X$ nonempty, while $P$ and $S$ are possibly empty. Otherwise, $W$ is called square-free. A famous theorem of Thue [29] (see [3]) asserts that there exist infinitely many square-free words over a $3$-letter alphabet. This result initiated combinatorics on words, a whole branch of mathematics rich in deep results, exciting problems, and unexpected connections to diverse areas of science (see [1, 2, 4, 6, 21, 22]).

In this paper we study square-free words with additional properties involving two natural operations, a single-letter extension and a single-letter deletion, defined as follows.

Let $W$ be a word of length $n$ over a fixed alphabet $\mathcal{A}$. For $i=0,1,\dots,n$, let $P_{i}(W)$ and $S_{i}(W)$ denote the prefix and the suffix of $W$ of length $i$, respectively. Notice that the words $P_{0}(W)$ and $S_{0}(W)$ are empty. In particular, we have $W=P_{i}(W)S_{n-i}(W)$ for every $i=0,1,\dots,n$. An extension of $W$ at position $i$ is a word of the form $U=P_{i}(W)\mathtt{x}S_{n-i}(W)$, where $\mathtt{x}$ is any letter from $\mathcal{A}$. In this case we also say that $W$ is a reduction of $U$.
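The definitions above translate directly into string operations. The following is a minimal sketch, assuming words are represented as Python strings; the function names `prefix`, `suffix`, `extension` and `reductions` are illustrative choices and do not come from the paper.

```python
def prefix(w, i):
    """P_i(W): the prefix of w of length i."""
    return w[:i]


def suffix(w, i):
    """S_i(W): the suffix of w of length i (empty for i = 0)."""
    return w[len(w) - i:]


def extension(w, i, x):
    """The extension P_i(W) x S_{n-i}(W) of w at position i by the letter x."""
    n = len(w)
    if not 0 <= i <= n:
        raise ValueError("position must satisfy 0 <= i <= len(w)")
    return prefix(w, i) + x + suffix(w, n - i)


def reductions(w):
    """All single-letter reductions of w, deleting letters from left to right."""
    return [w[:k] + w[k + 1:] for k in range(len(w))]
```

For example, `extension("1231", 1, "3")` gives `"13231"`, and `"1231"` appears among the reductions of that extension.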

A square-free word $W$ is extremal over $\mathcal{A}$ if there is no square-free extension of $W$. Grytczuk, Kordulewski and Niewiadomski [15] proved that there exist infinitely many extremal ternary words, the shortest one being

$$\mathtt{1231213231232123121323123}.$$

They also conjectured that there are no extremal words over an alphabet of size $4$. Recently, Hong and Zhang [17] proved that this is true for an alphabet of size $17$.

Harju [16] introduced a complementary concept of irreducible words. These are square-free words in which every non-trivial reduction (one where the deleted letter is neither the first nor the last letter of the word) contains a square. He proved that for any $n\notin\{4,5,7,12\}$ there exists a ternary irreducible word of length $n$.

In this article we consider square-free words with the opposite property: a square-free word is steady if it remains square-free after deleting any single letter. For instance, the word $\mathtt{1231}$ is steady since all of its four reductions

$$\mathtt{231},\ \mathtt{131},\ \mathtt{121},\ \mathtt{123}$$

are square-free, while $\mathtt{1213}$ is not steady since one of its reductions is $\mathtt{113}$. Generally, every ternary square-free word of length at least $6$ must contain a factor of the form $\mathtt{xyx}$, and therefore is not steady (deleting $\mathtt{y}$ creates the square $\mathtt{xx}$). However, there are steady words of any given length over larger alphabets, as we prove in Theorem 1. We also consider a general variant of this statement in the following list setting.
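The two notions just introduced are easy to test by brute force. Below is a small sketch, assuming words are Python strings; the helper names (`is_square_free`, `is_steady`, `steady_words`) are hypothetical and chosen only for illustration.

```python
from itertools import product


def is_square_free(w):
    """True if w has no factor of the form XX with X nonempty."""
    n = len(w)
    for start in range(n):
        for half in range(1, (n - start) // 2 + 1):
            if w[start:start + half] == w[start + half:start + 2 * half]:
                return False
    return True


def is_steady(w):
    """True if w is square-free and stays square-free after deleting any one letter."""
    return is_square_free(w) and all(
        is_square_free(w[:k] + w[k + 1:]) for k in range(len(w))
    )


def steady_words(alphabet, n):
    """All steady words of length n over the given alphabet (exhaustive search)."""
    return ["".join(t) for t in product(alphabet, repeat=n) if is_steady("".join(t))]
```

With these helpers, `is_steady("1231")` holds while `is_steady("1213")` does not, and `steady_words("123", 6)` is empty, in line with the remark about ternary words of length at least 6.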

Conjecture 1.

Let $n$ be a positive integer and let $\mathcal{A}_{1},\mathcal{A}_{2},\ldots,\mathcal{A}_{n}$ be a sequence of alphabets of size $4$. Then there exists a steady word $W=w_{1}w_{2}\cdots w_{n}$, with $w_{i}\in\mathcal{A}_{i}$ for every $i=1,2,\ldots,n$.

In Theorem 2 we prove that the statement of the conjecture holds for alphabets with at least $7$ letters. Let us mention that the analogous conjecture for plain square-free words (with alphabets of size $3$), stated in [11], is still open. Currently the best general result confirms it for alphabets of size $4$ (see [13, 12, 26] for three different proofs). Recently Rosenfeld [27] proved that it holds when the union of all alphabets is a $4$-element set.

We also consider square-free words defined similarly with respect to extensions of words. A square-free word is bifurcate over a fixed alphabet $\mathcal{A}$ if it has at least one square-free extension at every position. For instance, the word $\mathtt{1231}$ is bifurcate over $\{\mathtt{1},\mathtt{2},\mathtt{3}\}$ and here are five of its square-free extensions, one for each position:

$$\mathtt{\underline{2}1231},\ \mathtt{1\underline{3}231},\ \mathtt{12\underline{1}31},\ \mathtt{123\underline{2}1},\ \mathtt{1231\underline{2}}.$$

Thus the word $\mathtt{1231}$ is both steady and bifurcate. This is not a coincidence: for an alphabet with at least three letters, every steady word is bifurcate, as we prove in Theorem 3.
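The bifurcate property can be checked position by position. The sketch below assumes string words and a string of alphabet letters; the names `square_free_extensions` and `is_bifurcate` are illustrative and not taken from the paper.

```python
def is_square_free(w):
    """True if w has no factor of the form XX with X nonempty."""
    n = len(w)
    for start in range(n):
        for half in range(1, (n - start) // 2 + 1):
            if w[start:start + half] == w[start + half:start + 2 * half]:
                return False
    return True


def square_free_extensions(w, alphabet):
    """For each position i = 0..len(w), list the square-free extensions of w at i."""
    result = []
    for i in range(len(w) + 1):
        candidates = [w[:i] + x + w[i:] for x in alphabet]
        result.append([u for u in candidates if is_square_free(u)])
    return result


def is_bifurcate(w, alphabet):
    """True if w is square-free and has a square-free extension at every position."""
    return is_square_free(w) and all(square_free_extensions(w, alphabet))
```

Calling `square_free_extensions("1231", "123")` returns, for each of the five positions, a list that contains the corresponding underlined extension shown above.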

Clearly, ternary bifurcate words cannot be too long. Indeed, every ternary square-free word of length at least $6$ contains a factor of the form $\mathtt{xyxz}$ (or its reversal). On the other hand, any ternary square-free word is bifurcate over a $4$-letter alphabet. One may, however, inquire about the existence of an infinite chain of bifurcate quaternary words.

Conjecture 2.

There exists an infinite sequence of quaternary bifurcate words $W_{1},W_{2},\ldots$ such that $W_{i+1}$ is a single-letter extension of $W_{i}$, for each $i=1,2,\ldots$.

A much stronger property holds over alphabets of size at least $12$. Namely, there exists a complete bifurcate tree of words (rooted at any single letter), in which every word of length $n$ has $n+1$ descendants, corresponding to the extensions at different positions. Each of these descendants is again bifurcate, and the same is true for all of their descendants, and so on, ad infinitum. This curious fact follows easily from a result of Kündgen and Pelsmajer on nonrepetitive colorings of outerplanar graphs, as noted in [14] in a different context of the on-line Thue games. We will recall this short argument for completeness. It seems plausible, however, that the actual number of letters needed for such a property may be much smaller.

Conjecture 3.

There exists a complete bifurcate tree over an alphabet of size $5$.

It is not hard to verify that the above conjecture is tight, as we demonstrate at the end of the following section.

2. Results

We shall present proofs of our results in the following subsections.

2.1. Steady words over a 4-letter alphabet

For a rational $r\in[1,2]$, a fractional $r$-power is a word $W=XP$, where $P$ is a prefix of $W$ of length $(r-1)|X|$. We say that $r$ is the exponent of the word $W$. Dejean's conjecture (now a theorem) states that the infimum of $r$ for which there exist infinite $n$-ary words without factors of exponent greater than $r$ is equal to

$$\left\{\begin{array}{ll}7/4&\mbox{ for }n=3,\\ 7/5&\mbox{ for }n=4,\\ n/(n-1)&\mbox{ for }n\neq 3,4.\end{array}\right.$$

The case $n=2$ is a simple consequence of the classical theorem of Thue [29]. The case $n=3$ was proved by Dejean [8] and the case $n=4$ by Pansiot [25]. Cases up to $n=14$ were proved by Ollagnier [24] and Noori and Currie [23]. Carpi [5] showed that the statement holds for every $n\geqslant 33$, while the remaining cases were proved by Currie and Rampersad [7].
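To make the notion of exponent concrete, here is a small sketch that tabulates the threshold from Dejean's theorem and computes the largest exponent of a factor of a finite word. The function names are illustrative assumptions; also note that `max_exponent` measures length divided by period without capping the value at 2, which is a slight generalisation of the definition above.

```python
from fractions import Fraction


def repetition_threshold(n):
    """The threshold from Dejean's theorem for an n-letter alphabet, n >= 2."""
    if n < 2:
        raise ValueError("the alphabet must have at least two letters")
    if n == 3:
        return Fraction(7, 4)
    if n == 4:
        return Fraction(7, 5)
    return Fraction(n, n - 1)


def max_exponent(w):
    """Largest value of |F| / p over factors F of w having period p."""
    n = len(w)
    best = Fraction(1) if w else Fraction(0)
    for start in range(n):
        for period in range(1, n - start):
            # Extend the factor starting at `start` while it keeps the period.
            run = 0
            while start + period + run < n and w[start + run] == w[start + period + run]:
                run += 1
            best = max(best, Fraction(period + run, period))
    return best
```

For instance, `max_exponent("abca")` is `4/3`, which is below the quaternary threshold `7/5`, whereas `max_exponent("aba")` is `3/2`, which exceeds it.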

Theorem 1.

There exist arbitrarily long steady words over a $4$-letter alphabet.

Proof.

From Dejean’s theorem we get that there exist arbitrarily long quaternary words without factors of exponent greater than $\frac{7}{5}$. Let $S$ be any such word. Notice that any factor of the form $XYX$ in $S$ must satisfy $|Y|>|X|$, where $|W|$ denotes the length of a word $W$. Indeed, the factor $XYX$ has exponent $\frac{2|X|+|Y|}{|X|+|Y|}=1+\frac{|X|}{|X|+|Y|}$, so the opposite inequality, $|Y|\leqslant|X|$, implies that $S$ contains a factor with exponent at least $\frac{3}{2}$, but $\frac{3}{2}>\frac{7}{5}$.

We claim that any word with the above separation property is steady. Assume to the contrary that deleting some single interior letter in the word $S$ generates a square. Up to reversal of $S$, we may assume that the deleted letter does not lie in the second half of the created square. We distinguish two cases corresponding to the relative position of the deleted letter:

  1. $S=AC\mathtt{a}CB$ for some letter $\mathtt{a}$ and words $A,B,C$, where $C$ is non-empty. If we put $X=C$ and $Y=\mathtt{a}$, we immediately get a contradiction with the separation property of $S$, as $|C|\geqslant 1$.

  2. $S=AC^{\prime}\mathtt{a}C^{\prime\prime}C^{\prime}C^{\prime\prime}B$ for some letter $\mathtt{a}$ and words $A,B,C^{\prime},C^{\prime\prime}$, where $C^{\prime}$ and $C^{\prime\prime}$ are non-empty. If $|C^{\prime}|\leqslant|C^{\prime\prime}|$, then the factor $C^{\prime\prime}C^{\prime}C^{\prime\prime}$ contradicts the separation property of $S$ (by putting $X=C^{\prime\prime}$ and $Y=C^{\prime}$). Otherwise, we have $|C^{\prime\prime}|<|C^{\prime}|$, which implies that $|\mathtt{a}C^{\prime\prime}|\leqslant|C^{\prime}|$. In this case, the factor $C^{\prime}(\mathtt{a}C^{\prime\prime})C^{\prime}$ contradicts the separation property of $S$ (by taking $X=C^{\prime}$ and $Y=\mathtt{a}C^{\prime\prime}$).

Thus, the word $S$ is steady, which completes the proof. ∎
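The key step of the proof, that the separation property forces steadiness, can be sanity-checked exhaustively on short words. The sketch below is only an illustration under the assumption that $Y$ may be empty in a factor $XYX$ (so that the separation property includes square-freeness); all function names are hypothetical.

```python
from itertools import product


def is_square_free(w):
    """True if w has no factor of the form XX with X nonempty."""
    n = len(w)
    for start in range(n):
        for half in range(1, (n - start) // 2 + 1):
            if w[start:start + half] == w[start + half:start + 2 * half]:
                return False
    return True


def is_steady(w):
    """True if w and all of its single-letter reductions are square-free."""
    return is_square_free(w) and all(
        is_square_free(w[:k] + w[k + 1:]) for k in range(len(w))
    )


def has_separation_property(w):
    """True if every factor XYX of w (X nonempty, Y possibly empty) has |Y| > |X|."""
    n = len(w)
    for i in range(n):
        for x_len in range(1, n):
            for y_len in range(x_len + 1):
                j = i + x_len + y_len  # start of the second copy of X
                if j + x_len > n:
                    break
                if w[i:i + x_len] == w[j:j + x_len]:
                    return False
    return True


def separated_words(alphabet, n):
    """All words of length n over the alphabet with the separation property."""
    words = ("".join(t) for t in product(alphabet, repeat=n))
    return [w for w in words if has_separation_property(w)]
```

Running `all(is_steady(w) for w in separated_words("abcd", n))` for small `n` is a quick way to see the implication at work; it is a finite check and of course not a substitute for the proof.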

n      3     4     5     6     7     8     9    10    11    12    13    14    15    16    17
N      1     2     3     5     5     7     9    12    16    21    28    37    45    58    73

n     18    19    20    21    22    23    24    25    26    27    28    29    30    31    32
N     93   101   124   150   179   216   257   309   376   453   551   662   798   957  1149

Table 1. The number $N$ of quaternary steady words of length $n$ (up to permutations of symbols).
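The first few entries of Table 1 can be reproduced with a short search. This is a sketch under the assumption that "up to permutations of symbols" means counting one representative per class of words that differ by renaming letters; here the representative is the word whose letters first appear in the order 0, 1, 2, 3. The function names are illustrative. The search extends words one letter at a time, which is valid because every prefix of a steady word is steady.

```python
def is_square_free(w):
    """True if the sequence w has no factor of the form XX with X nonempty."""
    n = len(w)
    for start in range(n):
        for half in range(1, (n - start) // 2 + 1):
            if w[start:start + half] == w[start + half:start + 2 * half]:
                return False
    return True


def is_steady(w):
    """True if w and all of its single-letter reductions are square-free."""
    return is_square_free(w) and all(
        is_square_free(w[:k] + w[k + 1:]) for k in range(len(w))
    )


def canonical_steady_words(n, k=4):
    """Steady words of length n over {0, ..., k-1}, one per renaming class."""
    words = [()]
    for _ in range(n):
        longer = []
        for w in words:
            used = max(w) + 1 if w else 0
            # Reuse an old letter, or introduce the next unused one (if any).
            for letter in range(min(used + 1, k)):
                candidate = w + (letter,)
                if is_steady(candidate):
                    longer.append(candidate)
        words = longer
    return words
```

For lengths 3 to 6 the counts `len(canonical_steady_words(n))` are 1, 2, 3 and 5, matching the first columns of the table.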

2.2. Steady words from lists of size 7

In this subsection we consider the list variant of steady words. The proof of the following theorem is inspired by a beautiful argument due to Rosenfeld [26], used by him in an analogous problem for square-free words.

Theorem 2.

Let $(\mathcal{A}_{1},\mathcal{A}_{2},\ldots)$ be a sequence of alphabets such that $\left|\mathcal{A}_{i}\right|=7$ for all $i$. For every $N$ there exist at least $4^{N}$ steady words $w=w_{1}w_{2}\ldots w_{N}$ such that $w_{i}\in\mathcal{A}_{i}$ for all $1\leqslant i\leqslant N$.

Proof.

Let $\mathbb{C}_{n}$ be the set of steady words $w=w_{1}w_{2}\ldots w_{n}$ such that $w_{i}\in\mathcal{A}_{i}$ for all $i$, and let $C_{n}$ be the size of $\mathbb{C}_{n}$. Similarly, let $\mathbb{F}_{n}$ be the set of words $f=f_{1}f_{2}\ldots f_{n}$ with $f_{i}\in\mathcal{A}_{i}$ for all $i$, such that $f\notin\mathbb{C}_{n}$ but $f_{1}f_{2}\ldots f_{n-1}$ is in $\mathbb{C}_{n-1}$, and let $F_{n}$ be the size of $\mathbb{F}_{n}$. We think of $F_{n}$ as the number of ways we can fail by appending a new symbol at the end of a steady word of length $n-1$. Our goal is to give a good upper bound on $F_{n}$. In the sums below we use the convention that $C_{m}=0$ for $m<0$.

Claim 1.
$$F_{n+1}\leqslant 2C_{n}+2C_{n-1}+\sum_{i=0}^{\infty}\left(3+8i\right)C_{n-2-i}.$$
Proof.

Let us partition $\mathbb{F}_{n+1}$ into subsets $\mathbb{D}_{1},\mathbb{D}_{2},\mathbb{D}_{3},\ldots$ defined so that a word $f=f_{1}f_{2}\ldots f_{n+1}$ from $\mathbb{F}_{n+1}$ is in $\mathbb{D}_{j}$ if the suffix of $f$ of length $2j+1$ can be reduced to a square by removing a single letter, and $f$ is not in $\mathbb{D}_{j-1}$. For example, $\mathbb{D}_{1}$ contains words that end with $aa$ or $axa$, $\mathbb{D}_{2}$ contains words that end with $abxab$ (note that $abab$, $axbab$ and $abaxb$ are not possible), and $\mathbb{D}_{3}$ contains words that end with $abxcabc$, $abcxabc$ or $abcaxbc$ (again, $abcabc$, $axbcabc$ and $abcabxc$ are not possible). It is clearly a partition by the definition of $\mathbb{F}_{n+1}$.

Let $D_{j}$ be the size of $\mathbb{D}_{j}$. Note that $D_{1}\leqslant 2C_{n}$, because every word from $\mathbb{D}_{1}$ can be obtained by appending to some word from $\mathbb{C}_{n}$ a repetition of its last or second-to-last letter. We also have $D_{2}\leqslant C_{n-1}$ because, as remarked earlier, $\mathbb{D}_{2}$ contains only words that end with $abxab$ (where $a$, $b$ and $x$ are single letters) and each such word can be obtained by repeating the third- and second-from-last letters of a word from $\mathbb{C}_{n-1}$.

Now consider a word $f=f_{1}f_{2}\ldots f_{n+1}$ from $\mathbb{D}_{j}$ for $j>2$. Since a suffix of $f$ of length $2j+1$ can be reduced to a square by removing a single non-final letter, we have that either (a) $f_{n-2j+1}f_{n-2j+2}\ldots f_{n+1}=PxQPQ$ or (b) $f_{n-2j+1}f_{n-2j+2}\ldots f_{n+1}=PQPxQ$, where $x$ is a single letter and $P$ and $Q$ are words with $\left|P\right|+\left|Q\right|=j$ and $Q$ nonempty. We will count the number of words $f$ from $\mathbb{D}_{j}$ that fit those cases separately for every possible position of $x$ (i.e. the length of $P$). Note that in case (a) we must have $\left|P\right|\geqslant 2$, because otherwise $f$ would be contained in $\mathbb{D}_{j-1}$. Therefore, there are $j-2$ possible lengths of $P$ and for each of those lengths there are at most $C_{n+1-j}$ compatible words in $\mathbb{D}_{j}$, as each such word can be obtained from a word from $\mathbb{C}_{n+1-j}$ by appending $PQ$ at the end; this totals to at most $(j-2)C_{n+1-j}$.

Similarly, in case (b) we have $\left|Q\right|\geqslant 2$, because otherwise $f$ would not be contained in $\mathbb{F}_{n+1}$. If the length of $P$ is $0$ (respectively $1$), then there are at most $C_{n+1-j}$ (resp. $C_{n+2-j}$) compatible words in $\mathbb{D}_{j}$, obtained by repeating $j$ (resp. $j-1$) letters of a word from $\mathbb{C}_{n+1-j}$ (resp. $\mathbb{C}_{n+2-j}$). If the length of $P$ is at least $2$ (for which there are $j-3$ possibilities), then the number of compatible words is at most $7C_{n+1-j}$, as each of them can be obtained by repeating $j$ letters of a word from $\mathbb{C}_{n+1-j}$ and picking one of seven letters from the list $\mathcal{A}_{n+1-\left|P\right|}$.

After summing the above estimates in both cases, we obtain that for $j>2$

$$D_{j}\leqslant\left(j-2\right)C_{n+1-j}+C_{n+1-j}+C_{n+2-j}+7\left(j-3\right)C_{n+1-j}=C_{n+2-j}+\left(8j-22\right)C_{n+1-j}.$$

This, together with our estimates on $D_{1}$ and $D_{2}$, implies that

$$\begin{aligned}
F_{n+1}&\leqslant 2C_{n}+C_{n-1}+\sum_{j=3}^{\infty}\left(C_{n+2-j}+\left(8j-22\right)C_{n+1-j}\right)\\
&=2C_{n}+C_{n-1}+C_{n-1}+\sum_{i=0}^{\infty}C_{n-2-i}+\sum_{i=0}^{\infty}\left(8i+2\right)C_{n-2-i}\\
&=2C_{n}+2C_{n-1}+\sum_{i=0}^{\infty}\left(3+8i\right)C_{n-2-i},
\end{aligned}$$

which concludes the proof of the claim. ∎

Now we inductively show that $C_{n}\geqslant 4C_{n-1}$ for all $n>0$. It is clearly true for $n=1$, as $C_{0}=1$ and $C_{1}=7$. Note that $C_{n+1}=7C_{n}-F_{n+1}$, so by Claim 1 we obtain that

$$C_{n+1}\geqslant 5C_{n}-2C_{n-1}-\sum_{i=0}^{\infty}\left(3+8i\right)C_{n-2-i}.$$

Using the induction assumption and then calculating sums of the geometric series it follows that

$$\begin{aligned}
C_{n+1}&\geqslant 5C_{n}-2\frac{C_{n}}{4}-\sum_{i=0}^{\infty}\left(3+8i\right)\frac{C_{n}}{4^{2+i}}=C_{n}\left(\frac{9}{2}-3\sum_{i=0}^{\infty}\frac{1}{4^{2+i}}-\frac{1}{2}\sum_{i=1}^{\infty}\frac{i}{4^{i}}\right)\\
&=C_{n}\left(\frac{9}{2}-\frac{1}{4}-\frac{1}{2}\sum_{i=1}^{\infty}\sum_{j=1}^{i}\frac{1}{4^{i}}\right)=C_{n}\left(\frac{17}{4}-\frac{1}{2}\sum_{j=1}^{\infty}\sum_{i=j}^{\infty}\frac{1}{4^{i}}\right)\\
&=C_{n}\left(\frac{17}{4}-\frac{1}{2}\sum_{j=1}^{\infty}\frac{1}{4^{j}}\cdot\frac{4}{3}\right)=C_{n}\left(\frac{17}{4}-\frac{2}{3}\sum_{j=1}^{\infty}\frac{1}{4^{j}}\right)\\
&=C_{n}\left(\frac{17}{4}-\frac{2}{9}\right)>4C_{n},
\end{aligned}$$

which completes the proof. ∎
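The final constant in the computation above can be double-checked with exact rational arithmetic. The snippet below is merely a numerical cross-check of the two series used in the proof; the variable names are illustrative.

```python
from fractions import Fraction

quarter = Fraction(1, 4)

# Closed forms of the two series appearing in the estimate.
geometric = quarter ** 2 / (1 - quarter)   # sum over i >= 0 of 1 / 4^(2+i)
weighted = quarter / (1 - quarter) ** 2    # sum over i >= 1 of i / 4^i

# Growth factor: 5 - 2/4 - 3 * geometric - (1/2) * weighted.
growth = Fraction(9, 2) - 3 * geometric - Fraction(1, 2) * weighted

# Independent check: a partial sum of (3 + 8i) / 4^(2+i) stays just below
# the value 3 * geometric + (1/2) * weighted that it converges to.
partial = sum(Fraction(3 + 8 * i, 4 ** (2 + i)) for i in range(40))
limit = 3 * geometric + Fraction(1, 2) * weighted
```

Here `growth` equals `145/36`, that is $\frac{17}{4}-\frac{2}{9}$, which is slightly larger than 4 as required by the induction step.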

2.3. Steady words are bifurcate

As it turns out, steady words that are not bifurcate exist only over alphabets with fewer than three letters; an example is the word $\mathtt{1}$ over the alphabet $\{\mathtt{1}\}$.

Theorem 3.

Let $\mathcal{A}$ be a fixed alphabet with at least three letters. Every steady word over $\mathcal{A}$ is bifurcate over $\mathcal{A}$.

Proof.

The words $\mathtt{1}$, $\mathtt{12}$, and $\mathtt{123}$ are, up to permutations of letters, the only steady words of length at most $3$. These words are also bifurcate, so the theorem holds for them.

Thus, let us assume that $n\geqslant 4$ and let the square-free word $W=w_{1}w_{2}\ldots w_{n}$ over $\mathcal{A}$ be steady. Such a word does not contain a factor $\mathtt{xyx}$, so $w_{i}\neq w_{i+2}$ for every $1\leqslant i\leqslant n-2$.

We show that for every $0\leqslant j\leqslant n$ there exists $\mathtt{x}\in\mathcal{A}$ such that the word

$$P_{j}(W)\mathtt{x}S_{n-j}(W)$$

is square-free. The main idea is to show that creating a palindromic factor $\mathtt{xyx}$ in the extended word works in favor of its square-freeness. Further, we use the simple fact that an extension $A\mathtt{x}B$ of a square-free word $AB$ is square-free if and only if every factor of $A\mathtt{x}B$ which contains $\mathtt{x}$ is square-free.

Case 1 ($j=0$ and $j=n$).

Consider the extension of the word $W=w_{1}w_{2}\cdots w_{n}$ by the letter $w_{2}$ at the beginning:

$$Y=\underline{w_{2}}w_{1}w_{2}\ldots w_{n}.$$

We will show that this extension is square-free. Since $w_{1}w_{2}\ldots w_{n}$ is square-free, it is sufficient to show that every prefix of $Y$ is square-free. Of course, $w_{2}$, $w_{2}w_{1}$ and $w_{2}w_{1}w_{2}$ are square-free. Let us notice that $w_{1}\neq w_{3}$, so $P_{4}(Y)$ is square-free. Finally, $w_{2}w_{1}w_{2}$ is the unique factor of the form $\mathtt{xyx}$ in the word $Y$, so it cannot be a prefix of a square factor of length greater than $5$ in $Y$.

Analogously, one can show that the word $w_{1}\ldots w_{n-1}w_{n}\underline{w_{n-1}}$ is square-free.

Case 2 ($j=1$ and $j=n-1$).

Consider the extension of $W$ by inserting the letter $w_{3}$ between the first and the second letter of $W$. The resulting word,

$$Y=w_{1}\underline{w_{3}}w_{2}w_{3}\ldots w_{n},$$

is square-free, since $w_{1}\neq w_{3}$, $w_{2}\neq w_{4}$, and $w_{3}w_{2}w_{3}$ is the unique factor of the form $\mathtt{xyx}$. So, both words, $w_{1}w_{3}w_{2}w_{3}$ and $w_{3}w_{2}w_{3}$, are prefixes of square-free factors only.

Analogously, one can show that the word

$$Y=w_{1}\ldots w_{n-2}w_{n-1}\underline{w_{n-2}}w_{n}$$

is square-free.

Case 3 ($2\leqslant j\leqslant n-2$).

In this part, let us assume that $w_{j}=\mathtt{a}$, $w_{j+1}=\mathtt{b}$, and $w_{j+2}=\mathtt{c}$, where $\mathtt{a}$, $\mathtt{b}$, and $\mathtt{c}$ are pairwise different letters. Thus

$$W=P_{j-1}(W)\mathtt{abc}S_{n-j-2}(W).$$

Let us investigate the extension

$$Z=P_{j}(W)\mathtt{c}S_{n-j}(W)$$

of the word $W$, that is,

$$Z=w_{1}w_{2}\ldots\mathtt{a}\underline{\mathtt{c}}\mathtt{bc}w_{j+3}\ldots w_{n},\quad\mbox{ with }\quad P_{j+3}(Z)=w_{1}w_{2}\ldots\mathtt{a}\underline{\mathtt{c}}\mathtt{bc}\quad\mbox{ and }\quad S_{n-j+1}(Z)=\underline{\mathtt{c}}\mathtt{bc}w_{j+3}\ldots w_{n}.$$

We will show that $Z$ is indeed a square-free word.

First notice that the word $S_{n-j+1}(Z)$ is actually the extension of the word $S_{n-j}(W)$ obtained by inserting its second letter, $\mathtt{c}$, at the beginning. Since $S_{n-j}(W)$ is a steady word, we obtain that $S_{n-j+1}(Z)$ is square-free, by Case 1.

Next notice that the word $P_{j+3}(Z)$ cannot contain a square as a suffix since it ends with the unique palindrome $\mathtt{cbc}$. Therefore, it is sufficient to show that there are no squares in the prefixes $P_{j+1}(Z)$ and $P_{j+2}(Z)$, and that no other potential square in $Z$ may contain the factor $\mathtt{cbc}$.

Subcase 1 (No squares in $P_{j+1}(Z)$).

Let us recall that $P_{j}(Z)=P_{j}(W)$ is steady. If there is a square suffix in $P_{j+1}(Z)$, then

$$P_{j+1}(Z)=AB\mathtt{c}B\mathtt{c}$$

for some words $A$ and $B$ (where $A$ is possibly empty) and hence

$$P_{j}(Z)=AB\mathtt{c}B=P_{j}(W),$$

which gives us a contradiction with the assumption that $W$ is steady.

Subcase 2 (No squares in $P_{j+2}(Z)$).

If there is a square suffix in $P_{j+2}(Z)$, then

$$P_{j+2}(Z)=AB\mathtt{cb}B\mathtt{cb}$$

for some, possibly empty, words $A$ and $B$. This construction implies that

$$P_{j+1}(W)=AB\mathtt{cb}B\mathtt{b}.$$

This word has a reduction

$$AB\mathtt{b}B\mathtt{b},$$

which contradicts the fact that $W$ is steady.

Subcase 3 (No squares with the factor $\mathtt{cbc}$).

If $Z$ has a square $U$ which contains the unique factor $\mathtt{cbc}$, then

$$U=\mathtt{bc}A\mathtt{c}\,\mathtt{bc}A\mathtt{c}\ \mbox{ or }\ U=\mathtt{c}A\mathtt{cb}\,\mathtt{c}A\mathtt{cb}$$

for a certain word $A$, and so $W$ has to contain a factor

$$U_{1}=\mathtt{bc}A\mathtt{bc}A\mathtt{c}\ \mbox{ or }\ U_{2}=\mathtt{c}A\mathtt{bc}A\mathtt{cb}.$$

Let us notice that $W$ cannot contain a factor $U_{1}$ since it would imply that $W$ is not square-free. Moreover, $W$ cannot contain a factor $U_{2}$ since it would imply that $W$ has a reduction with a factor $\mathtt{c}A\mathtt{c}A\mathtt{cb}$, which contradicts the fact that $W$ is steady.

The proof is complete.

Let us notice that the converse statement is not true: the word $\mathtt{12312}$ over the alphabet $\{\mathtt{1,2,3}\}$ is bifurcate, but not steady.
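Both Theorem 3 and the failure of its converse can be observed on short words by exhaustive search. The following sketch assumes string words and uses hypothetical helper names; it checks only finitely many cases and is meant as an illustration rather than as part of the proof.

```python
from itertools import product


def is_square_free(w):
    """True if w has no factor of the form XX with X nonempty."""
    n = len(w)
    for start in range(n):
        for half in range(1, (n - start) // 2 + 1):
            if w[start:start + half] == w[start + half:start + 2 * half]:
                return False
    return True


def is_steady(w):
    """True if w and all of its single-letter reductions are square-free."""
    return is_square_free(w) and all(
        is_square_free(w[:k] + w[k + 1:]) for k in range(len(w))
    )


def is_bifurcate(w, alphabet):
    """True if w is square-free with a square-free extension at every position."""
    return is_square_free(w) and all(
        any(is_square_free(w[:i] + x + w[i:]) for x in alphabet)
        for i in range(len(w) + 1)
    )


def steady_implies_bifurcate(alphabet, max_length):
    """Check the implication for all words of length 1..max_length."""
    for n in range(1, max_length + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if is_steady(w) and not is_bifurcate(w, alphabet):
                return False
    return True
```

For example, `steady_implies_bifurcate("1234", 5)` returns `True`, while `"12312"` is bifurcate over `"123"` without being steady (deleting the letter `3` leaves the square `1212`).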

2.4. Bifurcate trees of words

We will now prove the aforementioned result on bifurcate trees. Let us start with a formal definition.

A bifurcate tree is any family $\mathbb{B}$ of bifurcate words over a fixed alphabet arranged in a rooted tree so that the descendants of a word are its single-letter extensions at different positions. Thus, a word of length $n$ may have at most $n+1$ descendants. A bifurcate tree is called complete if every vertex has the maximum possible number of descendants. Notice that such a tree must be infinite.

We will demonstrate that complete bifurcate trees exist over alphabets of size at least $12$. This fact follows easily from the results concerning on-line nonrepetitive games obtained in [14] and independently in [19], but we recall the proof for completeness. The key idea is to apply the following result from graph coloring.

A coloring $c$ of the vertices of a graph $G$ is square-free if for every simple path $v_{1}v_{2}\ldots v_{n}$ in $G$, the word $c(v_{1})c(v_{2})\cdots c(v_{n})$ is square-free. A graph is planar if it can be drawn on the plane without crossing edges. A planar graph is called outerplanar if it has a plane drawing such that all vertices are incident to the outer face.

Theorem 4 (Kündgen and Pelsmajer [20]).

Every outerplanar graph has a square-free coloring using at most $12$ colors.

Figure 1. The graph $D_{3}$.

We will apply this theorem to the infinite graph $D$ constructed as follows. Let us denote $V_{0}=\{0,1\}$ and $V_{n}=\{0,\frac{1}{2^{n}},\frac{2}{2^{n}},\frac{3}{2^{n}},\ldots,\frac{2^{n}-1}{2^{n}},1\}$ for $n\geqslant 1$. Let $P_{n}$, $n\geqslant 0$, denote the simple path on $V_{n}$ with edges joining consecutive elements of $V_{n}$. Let $D_{n}$ be the union of all paths $P_{j}$ with $0\leqslant j\leqslant n$ (see Figure 1) and let $D$ be the countable union of all graphs $D_{n}$, $n\geqslant 0$.

Theorem 5.

There exists a complete bifurcate tree over an alphabet of size $12$.

Proof.

Clearly, every finite graph $D_{n}$ is outerplanar, hence by Theorem 4 it has a square-free coloring with $12$ colors. By compactness, the infinite graph $D$ also has a $12$-coloring without a square on any simple path. Fix one such square-free coloring $c$ of the graph $D$. It is now easy to extract a complete bifurcate tree $\mathcal{B}$ out of this coloring in the following way.

The root of $\mathcal{B}$ is $c(\frac{1}{2})$. Its descendants are $c(\frac{1}{4})c(\frac{1}{2})$ and $c(\frac{1}{2})c(\frac{3}{4})$. These have the following sets of descendants,

$$c\left(\frac{1}{8}\right)c\left(\frac{1}{4}\right)c\left(\frac{1}{2}\right),\quad c\left(\frac{1}{4}\right)c\left(\frac{3}{8}\right)c\left(\frac{1}{2}\right),\quad c\left(\frac{1}{4}\right)c\left(\frac{1}{2}\right)c\left(\frac{5}{8}\right)$$

and

$$c\left(\frac{3}{8}\right)c\left(\frac{1}{2}\right)c\left(\frac{3}{4}\right),\quad c\left(\frac{1}{2}\right)c\left(\frac{5}{8}\right)c\left(\frac{3}{4}\right),\quad c\left(\frac{1}{2}\right)c\left(\frac{3}{4}\right)c\left(\frac{7}{8}\right),$$

respectively. In general, every word $W$ in $\mathcal{B}$ corresponds to a path in the graph $D$ with vertices taken from different paths $P_{n}$ and closest to each other as rational points in the unit interval. All these words are square-free by the more general property of the coloring $c$. ∎
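The paths of $D$ underlying the words of the tree can be generated explicitly. The sketch below works only with the vertices (dyadic rationals), not with an actual $12$-coloring, which the proof obtains non-constructively. It assumes one particular reading of "closest to each other": a new vertex inserted between two consecutive vertices is their midpoint, and a new first or last vertex lies at distance $1/2^{k+1}$ from its neighbour when the current path has $k$ vertices. The function names are hypothetical.

```python
from fractions import Fraction


def adjacent_in_D(u, v):
    """True if u < v are joined by an edge of some path P_m of the graph D."""
    d = v - u
    if d <= 0 or d.numerator != 1:
        return False
    if d.denominator & (d.denominator - 1):  # the gap must be 1 / 2^m
        return False
    return (u / d).denominator == 1          # u must be a multiple of 1 / 2^m


def descendants(path):
    """The k + 1 one-vertex extensions of a path with k vertices (assumed rule)."""
    k = len(path)
    step = Fraction(1, 2 ** (k + 1))
    result = [(path[0] - step,) + path]
    for i in range(k - 1):
        middle = (path[i] + path[i + 1]) / 2
        result.append(path[:i + 1] + (middle,) + path[i + 1:])
    result.append(path + (path[-1] + step,))
    return result


root = (Fraction(1, 2),)
```

Starting from `root`, the function reproduces the vertex sequences listed in the proof, for instance `descendants((Fraction(1, 4), Fraction(1, 2)))` yields the paths through $\frac{1}{8},\frac{1}{4},\frac{1}{2}$, through $\frac{1}{4},\frac{3}{8},\frac{1}{2}$, and through $\frac{1}{4},\frac{1}{2},\frac{5}{8}$. Reading the colors $c(\cdot)$ along such a path gives the corresponding word.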

One may easily extend this result to doubly infinite words, with the notions of bifurcate words and trees extended in a natural way.

Theorem 6.

There exists a complete bifurcate tree of doubly infinite words over an alphabet of size $12$.

Proof.

Consider the graph $G$ on the set of all integers which is a countable union of copies of the graph $D$ inserted in every interval $[n,n+1]$, for every integer $n$. By Theorem 4 the graph $G$ has a square-free coloring $c$ using at most $12$ colors. As the root for the constructed tree one may take the doubly infinite word

$$R=\cdots c(-3)c(-2)c(-1)c(0)c(1)c(2)\cdots.$$

This word is clearly bifurcate and the whole assertion follows similarly as in the previous proof. ∎

On the other hand, it is not difficult to establish that the above results are no longer true over an alphabet of size four.

Theorem 7.

There is no complete bifurcate tree with words of length more than $6$ over a $4$-letter alphabet.

Proof.

Let $\mathcal{A}=\{\mathtt{a,b,c,d}\}$ and suppose that $\mathbb{B}$ is a complete bifurcate tree over $\mathcal{A}$. We may assume that $\mathtt{ac}$ is a word in $\mathbb{B}$. Then an extension of $\mathtt{ac}$ in the middle is either $\mathtt{abc}$ or $\mathtt{adc}$. Notice that each of these words can be factorized as $XY$ such that $X$ is a word over the alphabet $\{\mathtt{a,b}\}$ and $Y$ is a word over the alphabet $\{\mathtt{c,d}\}$. This property will be preserved in every further extension at the position separating $X$ from $Y$. So, the longest possible square-free word in $\mathbb{B}$ has the form $\mathtt{abacdc}$, which proves the assertion. ∎
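The length bound in the proof rests on the fact that a square-free word of the form $XY$, with $X$ over $\{\mathtt{a,b}\}$ and $Y$ over $\{\mathtt{c,d}\}$, has at most six letters. The following sketch confirms this by exhaustive search over all such words with each part of length at most `max_side`; the function name and the search bound are illustrative assumptions.

```python
from itertools import product


def is_square_free(w):
    """True if w has no factor of the form XX with X nonempty."""
    n = len(w)
    for start in range(n):
        for half in range(1, (n - start) // 2 + 1):
            if w[start:start + half] == w[start + half:start + 2 * half]:
                return False
    return True


def longest_split_words(max_side=4):
    """Longest square-free words XY with X over {a, b} and Y over {c, d}."""
    best = []
    for x_len in range(max_side + 1):
        for y_len in range(max_side + 1):
            for x in product("ab", repeat=x_len):
                for y in product("cd", repeat=y_len):
                    w = "".join(x + y)
                    if not is_square_free(w):
                        continue
                    if best and len(w) < len(best[0]):
                        continue
                    if best and len(w) > len(best[0]):
                        best = []  # found a longer word, restart the list
                    best.append(w)
    return sorted(best)
```

The search returns four words of length 6, among them `"abacdc"`, the word named in the proof.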

3. Final remarks

Let us conclude the paper with some suggestions for future research.

First notice that the assertion of Theorem 4 is actually much stronger than needed for deriving conclusions on bifurcate trees. Indeed, the $12$-coloring it provides is square-free on all possible paths while for our purposes it is sufficient to consider only directed paths going always to the right. More formally, let $D^{*}$ denote the directed graph obtained from $D$ by orienting every edge to the right (towards the larger number).

Problem 1.

Determine the least possible $k$ such that there is a $k$-coloring of $D^{*}$ in which all directed paths are square-free.

By Theorem 4 we know that $k\leqslant 12$, but most probably this is not the best possible bound. Clearly, any improvement for the constant $k$ would give an improvement in the statements of Theorems 5 and 6. On the other hand, by Theorem 7 we know that $k\geqslant 5$.

Notice that the family of graphs $D_{n}$ we used in the proof of Theorem 5 is actually a quite restricted subclass of planar graphs, which in turn is just one of the minor-closed classes of graphs. It has been recently proved by Dujmović, Esperet, Joret, Walczak, and Wood [10] that every such class (except the class of all finite graphs) has bounded square-free chromatic number. In particular, every planar graph has a square-free coloring using at most $768$ colors. Perhaps these results could be used to derive other interesting properties of words. For such applications it is sufficient to restrict to oriented planar graphs, that is, directed graphs arising from simple planar graphs by fixing for every edge one of the two possible orientations.

Problem 2.

Determine the least possible $k$ such that there is a $k$-coloring of any oriented planar graph in which all directed paths are square-free.

Finally, let us mention another striking connection between words and graph colorings. Let $n\geqslant 0$ be fixed, and consider all possible proper vertex $4$-colorings of the graph $D_{n}$. Identifying colors with letters, one may think of these colorings as words over a $4$-letter alphabet. Let us denote this set by $\mathbb{D}_{n}$. Clearly, every word in $\mathbb{D}_{n}$ has length equal to $2^{n}+1$, the number of vertices in the graph $D_{n}$.
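For small $n$ the set $\mathbb{D}_{n}$ can be listed directly. The sketch below builds the edge set of $D_{n}$ from the definition of the paths $P_{j}$ and enumerates all proper $4$-colorings as words read from left to right along the vertices; the function names and the choice of letters `"1234"` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def edges_of_D(n):
    """Edges of D_n: consecutive points of V_j for every 0 <= j <= n."""
    edges = set()
    for j in range(n + 1):
        step = Fraction(1, 2 ** j)
        for t in range(2 ** j):
            edges.add((t * step, (t + 1) * step))
    return edges


def proper_colorings(n, colors="1234"):
    """All proper vertex colorings of D_n, written as words of length 2^n + 1."""
    vertices = [Fraction(t, 2 ** n) for t in range(2 ** n + 1)]
    index = {v: i for i, v in enumerate(vertices)}
    edges = [(index[u], index[v]) for u, v in edges_of_D(n)]
    words = []
    for w in product(colors, repeat=len(vertices)):
        if all(w[a] != w[b] for a, b in edges):
            words.append("".join(w))
    return words
```

With four colors this gives 12 words for $D_{0}$, 24 for $D_{1}$ and 96 for $D_{2}$. The enumeration is exponential in the number of vertices, so it is only practical for very small $n$.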

Let $W$ be any word of length $N$ and let $A$ be any subset of $\{1,2,\ldots,N\}$. Denote by $W_{A}$ the subword of $W$ along the set of indices $A$. The following statement is a simple consequence of the celebrated Four Color Theorem (see [18], [28]).

Theorem 8.

For every pair of positive integers $n\leqslant N$ and any set of positive integers $A$, with $|A|=2^{n}+1$ and $\max A\leqslant 2^{N}+1$, there exists a word $W\in\mathbb{D}_{N}$ such that $W_{A}\in\mathbb{D}_{n}$.

What is more surprising is that this statement is actually equivalent to the Four Color Theorem, as proved by Descartes and Descartes [9] (see [18]). Perhaps one could prove it directly, without referring to graph coloring and without huge computer verifications.

References

  • [1] J.-P. Allouche, J. Shallit, Automatic Sequences. Theory, Applications, Generalizations, Cambridge University Press, Cambridge, 2003.
  • [2] D. R. Bean, A. Ehrenfeucht, and G. F. McNulty, Avoidable patterns in strings of symbols, Pacific J. Math. 85 (1979), 261–294.
  • [3] J. Berstel, Axel Thue’s papers on repetitions in words: a translation, Publications du LaCIM 20 (1995).
  • [4] J. Berstel, D. Perrin, The origins of combinatorics on words, Europ. J. Combin. 28 (2007), 996–1022.
  • [5] A. Carpi, On Dejean’s conjecture over large alphabets, Theoret. Comput. Sci. 385 (2007), 135–151.
  • [6] J. D. Currie, Pattern avoidance: themes and variations, Theoretical Computer Science 339 (2005), 7–18.
  • [7] J. D. Currie, N. Rampersad, A proof of Dejean’s conjecture, Math. Comput. 80(274) (2011), 1063–1070.
  • [8] F. Dejean, Sur un théorème de Thue, J. Combin. Theory Ser. A 13 (1972), 90–99.
  • [9] B. Descartes and R. Descartes, La coloration des Cartes, Eureka, 31 (1968) 29–31.
  • [10] V. Dujmović, L. Esperet, G. Joret, B. Walczak, and D. R. Wood, Planar graphs have bounded nonrepetitive chromatic number, Advances in Combinatorics, 5, (2020).
  • [11] J. Grytczuk, Thue type problems for graphs, points, and numbers. Discrete Math. 308 (2008), 4419–4429.
  • [12] J. Grytczuk, J. Przybyło, and X. Zhu, Nonrepetitive list colourings of paths, Random Structures and Algorithms 38 (2011), 162–173.
  • [13] J. Grytczuk, J. Kozik, and P. Micek, New approach to nonrepetitive sequences, Random Structures and Algorithms 42 (2013), 214–225.
  • [14] J. Grytczuk, P. Szafruga, M. Zmarz, Online version of the theorem of Thue, Inform. Process. Lett. 113 (2013), 193–195.
  • [15] J. Grytczuk, H. Kordulewski, and A. Niewiadomski, Extremal square-free words, Electron. J. Combin. 27 (2020), P1.48.
  • [16] T. Harju. Disposability in square-free words. Theoretical Computer Science 862 (2021), 155–159.
  • [17] L. Hong and S. Zhang, No extremal square-free words over large alphabets, Available at https://arxiv.org/abs/2107.13123.
  • [18] T.R. Jensen, B. Toft, Graph Coloring Problems, John Wiley, New York (1995).
  • [19] B. Keszegh, X. Zhu, A note about online nonrepetitive coloring k-trees, Discrete Applied Mathematics 285 (2020) 108–112.
  • [20] A. Kündgen and M. J. Pelsmajer, Nonrepetitive colorings of graphs of bounded tree-width, Discrete Math. 308 (2008) 4473–4478.
  • [21] M. Lothaire, Combinatorics on Words, Addison-Wesley, Reading, MA, 1983.
  • [22] M. Lothaire, Algebraic Combinatorics on Words, Cambridge University Press, 2002.
  • [23] M. M. Noori, J. D. Currie, Dejean’s conjecture and Sturmian words, European J. Combin. 28 (2007), 876–890.
  • [24] J. M. Ollagnier, Proof of Dejean’s conjecture for alphabets with 5, 6, 7, 8, 9, 10 and 11 letters, Theoret. Comput. Sci. 95 (1992), 187–205.
  • [25] J.-J. Pansiot, A propos d’une conjecture de F. Dejean sur les répétitions dans les mots, Discrete Appl. Math. 7 (1984), 297–311.
  • [26] M. Rosenfeld, Another approach to non-repetitive colorings of graphs of bounded degree, Electron. J. Combin. 27 (2020), P3.43.
  • [27] M. Rosenfeld, Avoiding squares over words with lists of size three amongst four symbols, arXiv:2104.09965 [math.CO].
  • [28] R. Thomas, An update on the Four-Color Theorem, Notices Amer. Math. Soc., 45/7, (1998) 848–859.
  • [29] A. Thue, Über unendliche Zeichenreihen, Norske vid. Selsk. Skr. Mat. Nat. Kl. 7 (1906), 1–22. Reprinted in Selected Mathematical Papers of Axel Thue, T. Nagell, editor, Universitetsforlaget, Oslo, 1977, pp. 139–158.
  • [30] D. R. Wood, Nonrepetitive graph colouring, Dynamic Surveys, Electronic J. Combin. (2021) DS24.