
Compressibility and probabilistic proofs

Alexander Shen (LIRMM, CNRS & University of Montpellier; on leave from IITP RAS, Moscow, Russia). E-mail: alexander.shen@lirmm.fr. Supported by ANR-15-CE40-0016-01 RaCAF grant.
Abstract

We consider several examples of probabilistic existence proofs using compressibility arguments, including some results that involve the Lovász local lemma.

1 Probabilistic proofs: a toy example

There are many well-known probabilistic proofs that objects with some property exist. Such a proof estimates the probability that a random object violates the requirements and shows that this probability is small (or at least strictly less than $1$). Let us look at a toy example.

Consider an $n\times n$ Boolean matrix and its $k\times k$ minor (the intersection of $k$ rows and $k$ columns chosen arbitrarily). We say that the minor is monochromatic if all its elements are equal (either all zeros or all ones).

Proposition.

For large enough $n$ and for $k=O(\log n)$, there exists an $(n\times n)$-matrix that does not contain a monochromatic $(k\times k)$-minor.

Proof.

We repeat the same simple proof three times, in three different languages.

(Probabilistic language) Let us choose the matrix elements using independent tosses of a fair coin. For a given choice of $k$ columns and $k$ rows, the probability of getting a monochromatic minor at their intersection is $2^{-k^{2}+1}$. (Both the all-zero minor and the all-one minor have probability $2^{-k^{2}}$.) There are at most $n^{k}$ choices for the columns and the same number for the rows, so by the union bound the probability of getting at least one monochromatic minor is bounded by

\[n^{k}\times n^{k}\times 2^{-k^{2}+1}=2^{2k\log n-k^{2}+1}=2^{k(2\log n-k)+1},\]

and the last expression is less than $1$ if, say, $k=3\log n$ and $n$ is sufficiently large.

(Combinatorial language) Let us count the number of bad matrices. For a given choice of columns and rows, we have $2$ possibilities for the minor and $2^{n^{2}-k^{2}}$ possibilities for the rest, and there are at most $n^{k}$ choices for the rows and as many for the columns, so the total number of matrices with a monochromatic minor is at most

\[n^{k}\times n^{k}\times 2\times 2^{n^{2}-k^{2}}=2^{n^{2}+2k\log n-k^{2}+1}=2^{n^{2}+k(2\log n-k)+1},\]

and this is less than $2^{n^{2}}$, the total number of Boolean $(n\times n)$-matrices.

(Compression language) To specify a matrix that has a monochromatic minor, it is enough to specify $2k$ numbers between $1$ and $n$ (the row and column numbers), the color of the monochromatic minor ($0$ or $1$), and the remaining $n^{2}-k^{2}$ bits of the matrix (their positions are already known). So we save $k^{2}$ bits (compared to the straightforward list of all $n^{2}$ bits) and use $2k\log n+1$ bits instead (each number in the range $1\ldots n$ requires $\log n$ bits; to be exact, we may use $\lceil\log n\rceil$). So every matrix with a monochromatic minor can be compressed if $2k\log n+1\ll k^{2}$, and not all matrices are compressible. ∎
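To see the numbers at work, here is a small Python check (ours, not from the paper) that evaluates the union bound $2^{k(2\log n-k)+1}$ for $k=3\lceil\log n\rceil$:

    import math

    def union_bound(n: int, k: int) -> float:
        # n^k * n^k * 2^(-k^2+1): the union bound on the probability that
        # a random n-by-n Boolean matrix has a monochromatic k-by-k minor
        return 2.0 ** (2 * k * math.log2(n) - k * k + 1)

    for n in [256, 1024, 4096]:
        k = 3 * math.ceil(math.log2(n))  # k = 3 log n, as in the proof
        print(n, k, union_bound(n, k))   # far below 1 already for n = 256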

Of course, these three arguments are the same: in the second one we multiply probabilities by $2^{n^{2}}$, and in the third one we take logarithms. However, the compression language provides some new viewpoint that may help our intuition.

2 A bit more interesting example

In this example we want to put bits (zeros and ones) around a circle in an “essentially asymmetric” way: each rotation of the circle should change at least a fixed percentage of bits. More precisely, we are interested in the following statement:

Proposition.

There exists $\varepsilon>0$ such that for every sufficiently large $n$ there exists a sequence $x_{0}x_{1}\ldots x_{n-1}$ of bits such that for every $k=1,2,\ldots,n-1$ the cyclic shift by $k$ positions produces a sequence

\[y_{0}=x_{k},\ y_{1}=x_{k+1},\ \ldots,\ y_{n-1}=x_{k-1}\]

that differs from $x$ in at least $\varepsilon n$ positions (the Hamming distance between $x$ and $y$ is at least $\varepsilon n$).

Figure 1: A string $x_{0}\ldots x_{n-1}$ is bad if most of the dotted lines connect equal bits
Proof.

Assume that some rotation (cyclic shift by $k$ positions) transforms $x$ into a string $y$ that coincides with $x$ almost everywhere. We may assume that $k\leq n/2$: the cyclic shift by $k$ positions changes as many bits as the cyclic shift by $n-k$ positions (the inverse one). Imagine that we dictate the string $x$ from left to right. The first $k$ bits we dictate normally. But then the bits start to repeat (mostly) the previous ones ($k$ positions before), so we can just say “the same” or “not the same”, and if $\varepsilon$ is small, most of the time we say “the same”. Technically, we have at most $\varepsilon n$ different bits, and at least $n-k\geq n/2$ bits to dictate after the first $k$, so the fraction of “not the same” signals is at most $2\varepsilon$. It is well known that strings of symbols where some symbols appear more often than others can be encoded efficiently. Shannon tells us that a string of two symbols with frequencies $p$ and $q$ (so $p+q=1$) can be encoded using

\[H(p,q)=p\log\frac{1}{p}+q\log\frac{1}{q}\]

bits per symbol, and that $H(p,q)=1$ only when $p=q=1/2$. In our case, for small $\varepsilon$, one of the frequencies is close to $0$ (at most $2\varepsilon$) and the other one is close to $1$, so $H(p,q)$ is significantly less than $1$. So we get a significant compression for every string that is bad for the theorem; therefore, most strings are good (so good strings do exist).

More precisely, every string $x_{0}\ldots x_{n-1}$ that does not satisfy the requirements can be described by:

  • $k$  [$\log n$ bits]

  • $x_{0},\ldots,x_{k-1}$  [$k$ bits]

  • $x_{k}\oplus x_{0},\ x_{k+1}\oplus x_{1},\ \ldots,\ x_{n-1}\oplus x_{n-k-1}$  [$n-k$ bits where the fraction of $1$s is at most $2\varepsilon$, compressed to $(n-k)H(2\varepsilon,1-2\varepsilon)$ bits]

For $\varepsilon<1/4$ and for large enough $n$, the savings in the third part (compared to $n-k$) outweigh the $\log n$ bits spent in the first part. ∎
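For concreteness, here is a small Python sketch (our illustration, not part of the original argument) comparing the length of this three-part description with the trivial $n$ bits; the worst case is $k=n/2$, since the entropy saving applies only to the last $n-k$ positions:

    import math

    def H(p: float) -> float:
        # binary Shannon entropy H(p, 1-p) in bits
        if p <= 0 or p >= 1:
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def description_length(n: int, k: int, eps: float) -> float:
        # log n bits for k, then k bits for the prefix, then
        # (n - k) H(2 eps, 1 - 2 eps) bits for the XOR sequence
        return math.log2(n) + k + (n - k) * H(2 * eps)

    n, eps = 10**6, 0.05
    print(description_length(n, n // 2, eps) < n)   # True: about 0.73 n bits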

Of course, this is essentially a counting argument: the number of strings of length $n-k$ where the fraction of $1$s is at most $2\varepsilon$ is bounded by $2^{H(2\varepsilon,1-2\varepsilon)(n-k)}$, and we show that the bound for the number of bad strings,

\[\sum_{k=1}^{n/2}2^{k}\,2^{H(2\varepsilon,1-2\varepsilon)(n-k)}\]

is less than the total number of strings ($2^{n}$). Still, the compression metaphor makes the proof more intuitive, at least for some readers.

3 Lovász local lemma and Moser–Tardos algorithm

In our examples of probabilistic proofs, we proved the existence of objects having some property by showing that most objects have this property (in other words, that the probability of this property being true is close to $1$ under some natural distribution). Not all probabilistic proofs go like that. One of the exceptions is the famous Lovász local lemma (see, e.g., [1]). It can be used in situations where the union bound does not work: we have too many bad events, and the sum of their probabilities exceeds $1$ even if the probability of each one is very small. Still, the Lovász local lemma shows that these bad events do not cover the probability space entirely, assuming that the bad events are “mainly independent”. The probability of avoiding all these bad events is exponentially small, but the Lovász local lemma provides a positive lower bound for it.

This means, in particular, that we cannot hope to construct an object satisfying the requirements by random trials, so the bound provided by the Lovász local lemma does not give us a randomized algorithm that constructs an object with the required properties with probability close to $1$. Much later, Moser and Tardos [4, 5] suggested such an algorithm, in fact a very simple one. In other terms, they suggested a different distribution under which good objects form a majority.

We do not discuss the statement of the Lovász local lemma and the Moser–Tardos algorithm in general. Instead, we provide two examples where they can be used, together with compression-language proofs that can be considered as ad hoc versions of the Moser–Tardos argument. These two examples are (1) satisfiability of formulas in conjunctive normal form (CNF) and (2) strings without forbidden factors.

4 Satisfiable CNF

A CNF (conjunctive normal form) is a propositional formula that is a conjunction of clauses. Each clause is a disjunction of literals; a literal is a propositional variable or its negation. For example, the CNF

\[(\lnot p_{1}\lor p_{2}\lor p_{4})\land(\lnot p_{2}\lor p_{3}\lor\lnot p_{4})\]

consists of two clauses. The first one prohibits the case $p_{1}=\textsc{true}$, $p_{2}=\textsc{false}$, $p_{4}=\textsc{false}$; the second one prohibits the case $p_{2}=\textsc{true}$, $p_{3}=\textsc{false}$, $p_{4}=\textsc{true}$. A CNF is satisfiable if it has a satisfying assignment (one that makes all clauses true, avoiding the prohibited combinations). In our example there are many satisfying assignments: for example, if $p_{1}=\textsc{false}$ and $p_{3}=\textsc{true}$, then all values of the other variables are OK.

We will consider CNFs where every clause contains $n$ literals with $n$ different variables (from some pool of variables that may contain many more than $n$ variables). For a random assignment (each variable is set by an independent toss of a fair coin), the probability of violating a clause of this type is $2^{-n}$ (one of the $2^{n}$ combinations of values for its $n$ variables is forbidden). Therefore, if the number of clauses of this type is less than $2^{n}$, then the formula is satisfiable. This bound is tight: using $2^{n}$ clauses with the same variables, we can forbid all the combinations and get an unsatisfiable CNF.

The following result says that we can guarantee satisfiability for formulas with many more clauses. In fact, the total number of clauses may be arbitrary (but we still consider finite formulas, of course). The only thing we need is “limited dependence” between clauses. Let us say that two clauses are neighbors if they have a common variable (or several common variables). Clauses that are not neighbors correspond to independent events (for a random assignment). The following statement says that if the number of neighbors of each clause is bounded, then the CNF is guaranteed to be satisfiable.

Proposition.

Assume that each clause in some CNF contains $n$ literals with different variables and has at most $2^{n-3}$ neighbor clauses. Then the CNF is satisfiable.

Note that $2^{n-3}$ is a rather tight bound: to forbid all the combinations for some $n$ variables, we need only $2^{n}$ clauses.

Proof.

It is convenient to present the proof in the compression language, as suggested by Lance Fortnow. Consider the following procedure $\textsc{Fix}(C)$ whose argument is a clause (from our CNF).

{ $C$ is false }

$\textsc{Fix}(C)$:
  $\textsc{Resample}(C)$
  for all $C'$ that are neighbors of $C$:
    if $C'$ is false then $\textsc{Fix}(C')$

{ $C$ is true; other clauses that were true remain true }

Here $\textsc{Resample}(C)$ is the procedure that assigns fresh random values to all variables in $C$. The pre-condition (the first line) says that the procedure is called only in a situation where $C$ is false. The post-condition (the last line) says that if the procedure terminates, then $C$ is true after termination and, moreover, all other clauses of our CNF that were true before the call remain true. (The ones that were false may be true or false.)
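For readers who prefer running code, here is a minimal Python transcription of the procedure (a sketch; the representation of a literal as a pair (variable index, sign) is our own choice, not something from the paper):

    import random

    def is_true(clause, values):
        # a clause is a tuple of literals (v, sign); (v, True) means the
        # positive literal p_v, (v, False) means its negation
        return any(values[v] == sign for v, sign in clause)

    def resample(clause, values):
        for v, _ in clause:                    # fresh random values for all
            values[v] = random.random() < 0.5  # variables of the clause

    def neighbors(clause, cnf):
        vs = {v for v, _ in clause}
        # clauses sharing a variable; note that clause is its own neighbor
        return [c for c in cnf if vs & {v for v, _ in c}]

    def fix(clause, cnf, values):
        # precondition: clause is false
        resample(clause, values)
        for c in neighbors(clause, cnf):
            if not is_true(c, values):
                fix(c, cnf, values)
        # postcondition: clause is true; previously true clauses remain true

    def solve(cnf, num_vars):
        values = [random.random() < 0.5 for _ in range(num_vars)]
        for c in cnf:                          # fix all clauses one by one
            if not is_true(c, values):
                fix(c, cnf, values)
        return values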

Note that so far we have said nothing about termination: the procedure is randomized, and it may happen that it does not terminate (for example, if all the $\textsc{Resample}$ calls are unlucky and choose the same old bad values).

Simple observation: if we have such a procedure, we may apply it to all clauses one by one and after all calls (assuming they terminate and the procedure works according to the specification) we get a satisfying assignment.

Another simple observation: it is easy to prove the “conditional correctness” of the procedure $\textsc{Fix}(C)$. In other words, it achieves its goal assuming that (1) it terminates and (2) all the recursive calls $\textsc{Fix}(C')$ achieve their goals. This is almost obvious: the $\textsc{Resample}(C)$ call may destroy (= make false) only clauses that are neighbors of $C$, and all these clauses are $\textsc{Fix}$-ed after that. Note that $C$ is its own neighbor, so the for-loop also includes a recursive call $\textsc{Fix}(C)$; after all these calls (which terminate and satisfy the post-condition by assumption), the clause $C$ and all its neighbors are true, and no other clause is damaged.

Note that the last argument remains valid even if we delete the only line that actually changes something, i.e., the line $\textsc{Resample}(C)$. In this case the procedure never changes anything but is still conditionally correct; it just does not terminate if one of the clauses is false.

It remains to prove that the call $\textsc{Fix}(C)$ terminates with high probability. In fact, it terminates with probability $1$ if there are no time limits, and in polynomial time with probability exponentially close to $1$. To prove this, one may use a compression argument: we show that if the procedure works for a long time without terminating, then the sequence of random bits used for resampling is compressible. We assume that each call of $\textsc{Resample}$ uses $n$ fresh bits from this sequence. Finally, we note that this compressibility may happen only with exponentially small probability.

Imagine that $\textsc{Fix}(C)$ is called and during its recursive execution performs many calls

\[\textsc{Resample}(C_{1}),\ \ldots,\ \textsc{Resample}(C_{N})\]

(in this order) but does not terminate (yet). We stop it at some moment and examine the values of all the variables.

Lemma.

Knowing the values of the variables after these calls and the sequence $C_{1},\ldots,C_{N}$, we can reconstruct all the $Nn$ random bits used for resampling.

Proof of the lemma.

Let us go backwards. By assumption, we know the values of all variables after the calls. The procedure $\textsc{Resample}(C_{N})$ is called only when $C_{N}$ is false, and there is only one $n$-tuple of values that makes $C_{N}$ false. Therefore, we know the values of all variables before the last call, and we also know the random bits used for the last resampling (since we know the values of the variables after the resampling).

The same argument shows that we can reconstruct the values of the variables before the preceding call $\textsc{Resample}(C_{N-1})$, together with the random bits used for that resampling, etc. ∎
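The backward decoding is easy to express in code; here is a sketch in the same hypothetical representation as above (with a dictionary mapping variables to Boolean values):

    def recover_bits(final_values, resampled_clauses):
        # final_values: variable -> value after the run;
        # resampled_clauses: the sequence C_1, ..., C_N
        values = dict(final_values)
        bits = []
        for clause in reversed(resampled_clauses):
            # the current values of the clause's variables are exactly the
            # n fresh bits consumed by this Resample call
            bits.append([values[v] for v, _ in clause])
            # before the call the clause was false: every literal was false
            for v, sign in clause:
                values[v] = not sign
        bits.reverse()
        return bits   # bits[i] are the random bits used by Resample(C_i)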

Now we need to show that the sequence of clauses $C_{1},\ldots,C_{N}$ used for resampling can be described by fewer bits than $nN$ (the number of random bits used). Here we use the assumption that each clause has at most $2^{n-3}$ neighbors and that the clauses $C'$ for which $\textsc{Fix}(C')$ is called from $\textsc{Fix}(C)$ are neighbors of $C$.

One could try to say that since $C_{i+1}$ is a neighbor of $C_{i}$, we need only $n-3$ bits to specify it (there are at most $2^{n-3}$ neighbors by assumption), so we save $3$ bits per clause (compared to the $n$ random bits used for its resampling). But this argument is wrong: $C_{i+1}$ is not always a neighbor of $C_{i}$, since we may return from a recursive call that caused the resampling of $C_{i}$ and then make a new recursive call that resamples $C_{i+1}$.

To get a correct argument, we should look more closely at the tree of recursive calls generated by one call $\textsc{Fix}(C)$ (Fig. 2). In this tree, the sons of each vertex correspond to neighbor clauses of the father clause.

Figure 2: The tree of recursive calls for $\textsc{Fix}(C_{1})$ (up to some moment)

The sequence of calls is determined by a walk in this tree, but we go both up and down, not only up (as we assumed in the wrong argument). How many bits do we need to encode this walk (and therefore the sequence of calls)? We use one bit to distinguish between steps up and steps down. If we are going down, no other information is needed. If we are going up (and resampling a new clause), we need one bit to say that we are going up and $n-3$ bits for the number of the neighbor we are going to. For accounting purposes we combine these bits with the bit needed to encode the step back (which may happen later or not happen at all), and we see that in total we need at most $(n-3)+1+1=n-1$ bits per resampling. This is still less than $n$, so we save one bit for each resampling. If $N$ is much bigger than the number of variables, we indeed compress the sequence of random bits used for resampling, and this happens only with exponentially small probability.

This argument finishes the proof. ∎

5 Tetris and forbidden factors

The next example is taken from word combinatorics. Assume that a list of binary strings $F_{1},\ldots,F_{k}$ is given. These $F_{i}$ are considered as “forbidden factors”: this means that we want to construct a (long) string $X$ that does not have any of the $F_{i}$ as a factor (i.e., none of the $F_{i}$ is a substring of $X$). This may be possible or not, depending on the list. For example, if we consider the two strings $0$ and $11$ as forbidden factors, every string of length $2$ or more has a forbidden factor (we cannot use zeros at all, and two ones in a row are forbidden).

The more forbidden factors we have, the greater the chances that they block growth, in the sense that every sufficiently long string has a forbidden factor. Of course, not only the number of factors matters: e.g., if we consider $0$ and $00$ as forbidden factors, then there are long strings of ones without forbidden factors. However, now we are interested in quantitative results of the following type: if the number of forbidden factors of length $j$ is $a_{j}$, and the numbers $a_{j}$ are “not too big”, then there exist arbitrarily long strings without forbidden factors.

This question can be analyzed with many different tools, including the Lovász local lemma (see [8]) and Kolmogorov complexity. Using a complexity argument, Levin proved that if $a_{j}=2^{\alpha j}$ for some constant $\alpha<1$, then there exist a constant $M$ and an infinite sequence that does not contain forbidden factors of length greater than $M$. (See [9, Section 8.5] for Levin's argument and other related results.) A nice sufficient condition was suggested by Miller [3]; we formulate the statement for an arbitrary alphabet size.

Proposition.

Consider an alphabet with $m$ letters. Assume that for each $j\geq 2$ we have $a_{j}$ “forbidden” strings of length $j$. Assume that there exists a constant $x>0$ such that

\[\sum_{j\geq 2}a_{j}x^{j}<mx-1.\]

Then there exist arbitrarily long strings that do not contain forbidden substrings.

Remarks. 1. We do not consider $j=1$, since this means that some letters are deleted from the alphabet.

2. By compactness the statement implies that there exists an infinite sequence with no forbidden factors.

3. The constant $x$ should be at least $1/m$; otherwise the right-hand side is negative. This means that $a_{j}/m^{j}$ should be small, and this corresponds to our intuition ($a_{j}$ should be significantly less than $m^{j}$, the total number of strings of length $j$).

The original proof from [3] uses an ingenious potential function defined on strings: Miller shows that if its value is less than $1$, then one can add some letter preserving this property. It turns out (rather mysteriously) that exactly the same condition can be obtained by a completely different argument (following [2, 6]), so probably the inequality is more fundamental than it may seem! This argument is based on compression.

Proof.

Here is the idea. We start with the empty string and add randomly chosen letters to its right end. If some forbidden string appears as a suffix, it is immediately deleted. So forbidden strings may appear only as suffixes, and only for a short time. After this “backtracking” we continue adding new letters. (This resembles the famous game of Tetris, where blocks fall down and then disappear under some conditions.)

We want to show that if this process is unsuccessful in the sense that after many steps we still have a short string, then the sequence of added random letters is compressible; this cannot always happen, and therefore a long string without forbidden factors exists. Let us keep a “record” (log file) of this process: a sequence of symbols “$+$” and “$+\langle\text{deleted string}\rangle$” (for each forbidden string we have a symbol, plus one more symbol without a string). If a letter was added and no forbidden string appeared, we just add “$+$” to the record. If we had to delete some forbidden string $s$ after a letter was added, we write this string in brackets after the $+$ sign. Note that we do not record the added letters, only the deleted substrings. (It may happen that several forbidden suffixes appear; in this case we may choose any of them.)
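A Python sketch of the process and its record may be helpful (our illustration; note that after a deletion the current string is a prefix of an earlier allowed string, so one deletion per step suffices):

    import random

    def tetris(forbidden, alphabet, steps, rng=random.Random(0)):
        current, record = [], []
        for _ in range(steps):
            current.append(rng.choice(alphabet))   # one random letter
            record.append("+")
            for f in forbidden:
                if len(current) >= len(f) and "".join(current[-len(f):]) == f:
                    del current[-len(f):]          # delete the forbidden suffix
                    record[-1] = "+<" + f + ">"    # ...and log which one
                    break                          # if several appear, any one
        return "".join(current), record

    # by construction the result never contains 010 or 111 as a factor
    s, rec = tetris(["010", "111"], "01", 1000)
    print(len(s), s[:40])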

Lemma.

At every stage of the process the current string and the record uniquely determine the sequence of random letters used.

Proof of the lemma.

Having this information, we can reconstruct the configuration going backwards. The reversed process has steps where a forbidden string is added back (and we know which one, since it is written in brackets in the record), and also steps where a letter is deleted (and we know which letter is deleted, i.e., which random letter was added when moving forwards). ∎

If after many (say, $T$) steps we still have a short current string, then the sequence of random letters can be described by the record (due to the lemma; we may ignore the current-string part, since it is short). As we will see, the record can be encoded with fewer bits than it should take (i.e., fewer than $T\log m$ bits). Let us describe this encoding and show that it is efficient (assuming the inequality $\sum a_{j}x^{j}<mx-1$).

We use arithmetic encoding for the record. Arithmetic encoding for an alphabet of $M$ symbols starts by choosing positive reals $q_{1},\ldots,q_{M}$ such that $q_{1}+\ldots+q_{M}=1$. Then we split the interval $[0,1]$ into parts of lengths $q_{1},\ldots,q_{M}$ that correspond to these $M$ symbols. Adding a new symbol corresponds to splitting the current interval in the same proportions and choosing the corresponding subinterval. For example, the sequence $(a,b)$ corresponds to the $b$th subinterval of the $a$th interval; this interval has length $q_{a}q_{b}$. The sequence $(a,b,\ldots,c)$ corresponds to an interval of length $q_{a}q_{b}\ldots q_{c}$ and can be reconstructed from any point of this interval (assuming $q_{1},\ldots,q_{M}$ are fixed); to specify some binary fraction in this interval, we need at most $-\log(q_{a}q_{b}\ldots q_{c})+O(1)$ bits, i.e., $-\log q_{a}-\log q_{b}-\ldots-\log q_{c}+O(1)$ bits.

Now let us apply this technique to our situation. For a $+$ without brackets we use $\log(1/p_{0})$ bits, and for $+\langle s\rangle$, where $s$ is of length $j$, we use $\log(1/p_{j})+\log a_{j}$ bits. Here the $p_{j}$ are some positive reals to be chosen later; we need $p_{0}+\sum p_{j}=1$. Indeed, we may split $p_{j}$ into $a_{j}$ equal parts (of size $p_{j}/a_{j}$) and use these parts as the $q_{s}$ in the description of arithmetic coding above; the splitting adds $\log a_{j}$ to the code length for strings of length $j$.

To bound the total number of bits used for encoding the record, we perform amortised accounting and show that the average number of bits per letter is less than $\log m$. Note that the number of letters is equal to the number of $+$ signs in the record. Each $+$ without brackets increases the length of the string by one letter, and we want to use less than $\log m-c$ bits for its encoding, where $c>0$ is some constant saying how much is saved as a reserve for the amortised analysis. And $+\langle s\rangle$ for a string $s$ of length $j$ decreases the length by $j-1$, so we want to use less than $\log m+c(j-1)$ bits (using the reserve).

So we need:

\[\log(1/p_{0})<\log m-c;\qquad\log(1/p_{j})+\log a_{j}<\log m+c(j-1)\]

together with

\[p_{0}+\sum_{j\geq 2}p_{j}=1.\]

Technically, it is easier to use non-strict inequalities in the first two cases and a strict one in the last case (and then increase the $p_{i}$ a bit):

\[\log(1/p_{0})\leq\log m-c;\quad\log(1/p_{j})+\log a_{j}\leq\log m+c(j-1);\quad p_{0}+\sum_{j\geq 2}p_{j}<1.\]

Then for a given $c$ we take the minimal possible $p_{i}$:

\[p_{0}=\frac{1}{m2^{-c}},\qquad p_{j}=\frac{a_{j}(2^{-c})^{j}}{m2^{-c}},\]

and it remains to show that the sum is less than $1$ for a suitable choice of $c$. Let $x=2^{-c}$; then the inequality can be rewritten as

\[\frac{1}{mx}+\sum_{j\geq 2}\frac{a_{j}x^{j}}{mx}<1,\]

or

\[\sum_{j\geq 2}a_{j}x^{j}<mx-1,\]

and this is our assumption.
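One can check numerically that the choice of the $p_{i}$ is consistent: $p_{0}+\sum p_{j}<1$ holds exactly when the inequality of the proposition holds. Here is a sketch (the counts $a_{j}$ are made up for illustration):

    def check(m, a, x):
        # a: dict mapping a length j >= 2 to the count a_j of forbidden strings
        miller = sum(aj * x**j for j, aj in a.items()) < m * x - 1
        p0 = 1 / (m * x)                  # p_0 = 1/(m 2^{-c}) with x = 2^{-c}
        total = p0 + sum(aj * x**j / (m * x) for j, aj in a.items())
        return total, miller

    m, a = 2, {3: 1}   # hypothetical: binary alphabet, one forbidden 3-string
    for x in [0.6, 0.7, 0.8]:
        print(x, check(m, a, x))   # total < 1 exactly when miller is True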

Now we see the role of this mystical $x$ in the condition: it is just a parameter that determines the constant used for the amortised analysis. ∎

Acknowledgement. The author thanks his LIRMM colleagues, in particular Pascal Ochem and Daniel Gonçalves, as well as the participants of the Kolmogorov seminar in Moscow.

References

  • [1] N. Alon, J.H. Spencer, The Probabilistic Method, Wiley, 2004.
  • [2] D. Gonçalves, M. Montassier, A. Pinlou, Entropy compression method applied to graph colorings, https://arxiv.org/pdf/1406.4380.pdf.
  • [3] J. Miller, Two notes on subshifts, Proceedings of the AMS, 140, 1617–1622 (2012).
  • [4] R. Moser, A constructive proof of the Lovász local lemma, https://arxiv.org/abs/0810.4812.
  • [5] R. Moser, G. Tardos, A constructive proof of the general Lovász local lemma, Journal of the ACM, 57(2), 11.1–11.15 (2010).
  • [6] P. Ochem, A. Pinlou, Application of Entropy Compression in Pattern Avoidance, The Electronic Journal of Combinatorics, 21:2, paper P2.7 (2014).
  • [7] A. Rumyantsev, A. Shen, Probabilistic Constructions of Computable Objects and a Computable Version of Lovász Local Lemma, Fundamenta Informaticae, 132, 1–14 (2013), see also https://arxiv.org/abs/1305.1535
  • [8] A. Rumyantsev, M. Ushakov, Forbidden substrings, Kolmogorov complexity and almost periodic sequences, STACS 2006 Proceedings, Lecture Notes in Computer Science, 3884, 396–407, see also https://arxiv.org/abs/1009.4455.
  • [9] A. Shen, V.A. Uspensky, N. Vereshchagin, Kolmogorov complexity and algorithmic randomness, to be published by the AMS, www.lirmm.fr/~ashen/kolmbook-eng.pdf. (Russian version published by MCCME (Moscow), 2013.)

Appendix

There is one more sufficient condition for the existence of arbitrarily long sequences that avoid forbidden substrings.¹ Here it is: if the power series for

\[\frac{1}{1-mx+a_{2}x^{2}+a_{3}x^{3}+\ldots}\]

(where $a_{j}$ is the number of forbidden strings of length $j$) has all positive coefficients, then there exist arbitrarily long strings without forbidden substrings. Moreover, in this case the number of $n$-letter strings without forbidden substrings is at least $g_{n}$, where $g_{n}$ is the $n$th coefficient of this inverse series.

¹ A more general algebraic fact about ideals in a free algebra with $m$ generators is sometimes called the Golod theorem; N. Rampersad in https://arxiv.org/pdf/0907.4667.pdf gives a reference to Rowen's book (L. Rowen, Ring Theory, vol. II, Pure and Applied Mathematics, Academic Press, Boston, 1988, Lemma 6.2.7). This more general statement concerns ideals generated not necessarily by strings (products of generators), but by arbitrary uniform elements. The original paper is: E. S. Golod, I. R. Shafarevich, On the class field tower [О башне полей классов], Izvestiya AN SSSR, Ser. Mat., 28:2, 261–272 (1964), http://www.mathnet.ru/links/f17df1a72a73e5e73887c19b7d47e277/im2955.pdf.

To prove this result, consider the number $s_{k}$ of allowed strings of length $k$ (strings without forbidden factors). It is easy to see that

\[s_{k+1}\geq s_{k}m-s_{k-1}a_{2}-s_{k-2}a_{3}-\ldots-s_{1}a_{k}-s_{0}a_{k+1}.\]

Indeed, we can add each of the $m$ letters to each of the $s_{k}$ allowed strings of length $k$, and then we should exclude the cases where a forbidden string appears at the end. This forbidden string may have length $2$ (then there are at most $s_{k-1}a_{2}$ possibilities), or length $3$ (there are at most $s_{k-2}a_{3}$ possibilities), etc. (Note that $s_{0}=1$ and $s_{1}=m$; note also that we may get a string with two forbidden suffixes, but this is OK, since we have an inequality.) These inequalities can be rephrased as follows: the product

\[(1+mx+s_{2}x^{2}+s_{3}x^{3}+\ldots)(1-mx+a_{2}x^{2}+a_{3}x^{3}+\ldots)\]

has only non-negative coefficients. Denote the second factor by $A$; if

\[1/A=1+mx+g_{2}x^{2}+g_{3}x^{3}+\ldots\]

has only positive coefficients $g_{i}$ (as our assumption says), then the first factor, $1+mx+s_{2}x^{2}+\ldots$, equals $1/A$ times the displayed product, i.e., it is a product of two series with non-negative coefficients. The second of these series starts with $1$, so the $n$th coefficient of their product, i.e., $s_{n}$, is not less than the $n$th coefficient of the first one, i.e., $g_{n}$.
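The coefficients $g_{n}$ are easy to compute: the relation $A\cdot(1/A)=1$ gives $g_{n}=mg_{n-1}-\sum_{j\geq 2}a_{j}g_{n-j}$. A short Python sketch (the counts $a_{j}$ are hypothetical):

    def inverse_coeffs(m, a, N):
        # coefficients g_0..g_N of 1/(1 - m x + sum_j a_j x^j)
        g = [1]
        for n in range(1, N + 1):
            g.append(m * g[-1] - sum(a.get(j, 0) * g[n - j]
                                     for j in range(2, n + 1)))
        return g

    # example: m = 2, one forbidden string of length 3
    print(inverse_coeffs(2, {3: 1}, 8))
    # [1, 2, 4, 7, 12, 20, 33, 54, 88]: all positive, so arbitrarily long
    # strings avoiding the forbidden factor exist, and s_n >= g_n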

Surprisingly, this condition is closely related to the one considered above, as shown by Dmitry Piontkovsky (his name is misprinted in the publication: Д.И. Пиотковский, On the growth of graded algebras with a small number of defining relations [О росте градуированных алгебр с небольшим числом определяющих соотношений], Uspekhi Mat. Nauk, 48:3(291), 199–200 (1993), http://www.mathnet.ru/links/6034910939adb12fff0cd8fb9745dfc8/rm1307.pdf):

Proposition.

The series

\[\frac{1}{1-mx+a_{2}x^{2}+a_{3}x^{3}+\ldots}\]

has all positive coefficients if and only if the series in the denominator has a root on the positive part of the real line.

Proof.

Assume that the series in the denominator does not have a positive root but the inverse series has all positive coefficients. In fact, non-negative coefficients are enough to get a contradiction. For a series with all non-negative coefficients, or with finitely many negative coefficients, the radius of convergence is determined by the behavior of the sum on the real line: when the argument approaches the radius of convergence, the sum of the series goes to infinity. Now we have the product of two series,

\[(1-mx+a_{2}x^{2}+a_{3}x^{3}+\ldots)(1+mx+g_{2}x^{2}+g_{3}x^{3}+\ldots)=1.\]

One of these series must have a finite radius of convergence: otherwise both are defined everywhere and the product is everywhere equal to $1$, although both factors are large for large $x$. Consider the smaller of the two convergence radii; one of the series goes to infinity near the corresponding point of the real line, so the other one converges to zero there, so the latter has a bigger convergence radius and reaches zero on the real line. Finally, note that only the first factor (the denominator) may have a zero, since the other one has all non-negative coefficients.

Now assume that the denominator has a zero; we have to prove that the inverse series has only positive coefficients. In general, the following result is true (D. Piontkovsky): if the series

\[A=a_{0}+a_{1}x+a_{2}x^{2}+\ldots\]

has $a_{0}>0$ and $a_{2},a_{3},\ldots\geq 0$, and for some positive $x$ this series converges to $0$, then the inverse series has all positive coefficients. To prove this statement, let $\alpha$ be the root, so $A(\alpha)=0$. Recall the long division process that computes the inverse series. It produces a sequence of remainders: the first one, $R^{(0)}$, is $1$; then we subtract from the $k$th remainder

\[R^{(k)}=R^{(k)}_{k}x^{k}+R^{(k)}_{k+1}x^{k+1}+\ldots\]

the product $(R^{(k)}_{k}/a_{0})Ax^{k}$ to cancel the first term, and get the next remainder $R^{(k+1)}$. By induction we prove that each remainder $R^{(k)}$ has the following properties:

  • $R^{(k)}(\alpha)=1$;

  • all the coefficients $R^{(k)}_{k+1}$, $R^{(k)}_{k+2},\ldots$, i.e., all except the first one, are negative or zero;

  • the first coefficient $R^{(k)}_{k}$ is positive.

The first claim is true since it was true for $R^{(k-1)}$ by the induction assumption and we subtract a series that equals zero at $\alpha$.

The second claim: by the induction assumption, the first coefficient of $R^{(k-1)}$ was positive, so we subtract the series $(R^{(k-1)}_{k-1}/a_{0})Ax^{k-1}$, whose first coefficient is positive and whose third, fourth, etc. coefficients are non-negative. The first term cancels the first term of $R^{(k-1)}$; the second term does not matter now; and all the subsequent coefficients remain negative or zero, since we subtract non-negative numbers from non-positive ones.

Finally, the third claim is a consequence of the first two: if the sum is positive (equal to $1$ at the positive point $\alpha$) and all the terms except one are non-positive, then the remaining term is positive.

Therefore, all coefficients in the inverse series are positive. ∎
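The equivalence can also be tested numerically; here is a sketch with hypothetical data (the root search is restricted to $(0,1]$, which suffices for these examples):

    def A(x, m, a):
        # the denominator 1 - m x + sum_j a_j x^j
        return 1 - m * x + sum(aj * x**j for j, aj in a.items())

    def has_positive_root(m, a, hi=1.0, steps=80):
        if A(hi, m, a) > 0:         # A(0) = 1 > 0, so we need a sign change
            return False            # on (0, hi]; then bisect
        lo = 0.0
        for _ in range(steps):
            mid = (lo + hi) / 2
            if A(mid, m, a) > 0:
                lo = mid
            else:
                hi = mid
        return True

    def inverse_is_positive(m, a, N=60):
        g = [1]                     # same recurrence as in the previous sketch
        for n in range(1, N + 1):
            g.append(m * g[-1] - sum(a.get(j, 0) * g[n - j]
                                     for j in range(2, n + 1)))
        return all(c > 0 for c in g[1:])

    for a in [{3: 1}, {2: 1, 3: 1}]:
        print(a, has_positive_root(2, a), inverse_is_positive(2, a))
    # prints True True for the first example and False False for the second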

Note that we have shown that if all coefficients of the series $1/A$ are non-negative, then they are all positive. Also note that we get a slightly stronger result than with the entropy argument, where we required the series to reach a negative value (now reaching zero is enough).