
A construction of a $\lambda$-Poisson generic sequence

Verónica Becher and Gabriel Sac Himelfarb
Abstract

Years ago Zeev Rudnick defined the $\lambda$-Poisson generic sequences as the infinite sequences of symbols in a finite alphabet where the number of occurrences of long words in the initial segments follows the Poisson distribution with parameter $\lambda$. Although almost all sequences, with respect to the uniform measure, are Poisson generic, no explicit instance has yet been given. In this note we give a construction of an explicit $\lambda$-Poisson generic sequence over any alphabet and any positive $\lambda$, except for the case of the two-symbol alphabet, in which it is required that $\lambda$ be less than or equal to the natural logarithm of $2$. Since $\lambda$-Poisson genericity implies Borel normality, the constructed sequences are Borel normal. The same construction provides explicit instances of Borel normal sequences that are not $\lambda$-Poisson generic.

Keywords: Poisson generic, normal numbers, de Bruijn sequence

MSC Classification: 11K16, 05A05, 60G55.

1 Introduction and Statement of Results

A real number is Poisson generic to an integer base $b$ greater than or equal to $2$ if the number of occurrences of long blocks in the initial segments of its base-$b$ expansion follows the Poisson distribution. The definition was given years ago by Zeev Rudnick [1, 11], who thought of it as a property stronger than Borel normality that still holds for almost all real numbers with respect to the Lebesgue measure. (Footnote 1: He called the notion supernormality. Personal communication from Z. Rudnick to V. Becher, 24 May 2017.) He was motivated by his result in [10] that in almost all dilates of lacunary sequences the number of elements in a random interval of the size of the mean spacing follows the Poisson law. Rudnick asked for an explicit instance of a Poisson generic real number.

Since we consider fractional expansions of real numbers in a fixed integer base, the definition of Poisson genericity can be given for infinite sequences of symbols in a finite alphabet. We write $\mathbb{N}_{0}$ for the set of non-negative integers, and $\mathbb{N}$ for the set of positive integers. Let $\Omega$ be an alphabet of $b$ symbols, for $b\geq 2$. We write $\Omega^{\mathbb{N}}$ for the set of infinite sequences of symbols in $\Omega$. The finite sequences of symbols in $\Omega$ are called words and $\Omega^{k}$ denotes the set of words of length $k$.

We number the positions in words and infinite sequences starting at $1$ and we write $w[i...j]$ for the subsequence of $w$ that begins in position $i$ and ends in position $j$. For a word $w$ we denote its length as $|w|$. Given two words $w$ and $v$, the number of occurrences of $w$ in $v$ is:

$$|v|_{w}=\#\{1\leq i\leq|v|-|w|+1:v[i...i+|w|-1]=w\}.$$

For example, $|0001|_{00}=2$.
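The counting function above allows overlapping occurrences, which is easy to get wrong in code. The following is a minimal sketch in Python; the function name `occurrences` is ours and words are represented as plain strings (an assumption for illustration).

```python
def occurrences(v: str, w: str) -> int:
    """Return |v|_w, the number of (possibly overlapping) occurrences of w in v."""
    n, m = len(v), len(w)
    # Slide a window of length |w| over every start position of v,
    # mirroring the set-builder definition (0-based indices here).
    return sum(1 for i in range(n - m + 1) if v[i:i + m] == w)


print(occurrences("0001", "00"))  # the example from the text
```

Note that Python's built-in `str.count` counts non-overlapping occurrences only, so it would not reproduce the value $2$ for this example.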

For $x\in\Omega^{\mathbb{N}}$, a positive real number $\lambda$, $i\in\mathbb{N}_{0}$ and $k\in\mathbb{N}$ we write $Z^{\lambda}_{i,k}(x)$ for the proportion of words of length $k$ that occur exactly $i$ times in $x[1...\lfloor\lambda b^{k}\rfloor+k-1]$,

$$Z^{\lambda}_{i,k}(x)=\frac{\#\{w\in\Omega^{k}:|x[1...\lfloor\lambda b^{k}\rfloor+k-1]|_{w}=i\}}{b^{k}}.$$
Definition 1.

Let $\lambda$ be a positive real number. A sequence $x\in\Omega^{\mathbb{N}}$ is $\lambda$-Poisson generic if for every $i\in\mathbb{N}_{0}$,

$$\lim_{k\rightarrow\infty}Z^{\lambda}_{i,k}(x)=e^{-\lambda}\frac{\lambda^{i}}{i!}.$$

A sequence is Poisson generic if it is $\lambda$-Poisson generic for all positive real numbers $\lambda$.
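For a concrete feel of the quantities $Z^{\lambda}_{i,k}(x)$, here is a small sketch that computes them for one fixed $k$ from a finite prefix of $x$. The names `Z` and `poisson_pmf` are ours, and the sketch assumes that $x$ is given as a string over an alphabet of exactly $b$ symbols and is long enough to contain the required prefix.

```python
import math
from collections import Counter
from fractions import Fraction


def Z(x: str, b: int, k: int, lam) -> dict:
    """Return {i: Z_{i,k}^lam(x)} for every i that actually occurs, plus i = 0."""
    n = math.floor(lam * b**k) + k - 1          # length of the prefix to inspect
    if len(x) < n:
        raise ValueError("x is too short for this k and lambda")
    prefix = x[:n]
    # Occurrence count of every word of length k that appears in the prefix.
    counts = Counter(prefix[s:s + k] for s in range(n - k + 1))
    hist = Counter(counts.values())             # how many words occur exactly i times
    hist[0] = b**k - len(counts)                # words of length k that never occur
    return {i: Fraction(c, b**k) for i, c in hist.items()}


def poisson_pmf(lam: float, i: int) -> float:
    """Target value e^{-lam} lam^i / i! from Definition 1."""
    return math.exp(-lam) * lam**i / math.factorial(i)


# A cyclic de Bruijn word of order 2 over {0,1,2}, extended by its first symbol.
z = Z("0121100220", b=3, k=2, lam=1)
print(z[0], z[1], poisson_pmf(1, 0))
```

In this usage example every word of length $2$ occurs exactly once in the prefix, so the proportion for $i=1$ is $1$ and the proportion for $i=0$ is $0$, far from the Poisson value $e^{-1}$.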

The $\lambda$-Poisson generic property can be thought of in terms of random allocations of balls in bins, where the $N=\lfloor\lambda b^{k}\rfloor$ initial words of length $k$ of a random sequence are the balls, and the $b^{k}$ possible words of length $k$ are the bins. These allocations are almost independent: it can be checked that the probability that two words in $\Omega^{k}$ picked uniformly at random appear in fixed overlapping positions is exactly $b^{-2k}$, as if they were independent. The occupancy of a random bin satisfies a Poisson law in the limit; the proof can be read from [8, Example III.10].

Benjamin Weiss and Yuval Peres [11] proved that almost all sequences with respect to the uniform measure are Poisson generic. (Footnote 2: The uniform measure over $\Omega^{\mathbb{N}}$ is the infinite product of the uniform measure over the alphabet $\Omega$. The uniform measure on $\Omega^{\mathbb{N}}$ coincides with the Lebesgue measure when we identify the real numbers with their fractional expansions in each given integer base.) In fact, they proved the following stronger result: Consider the finite probability spaces $\Omega^{k}$, $k\in\mathbb{N}$, with the uniform probability measure $\mu^{k}$. Fix $x\in\Omega^{\mathbb{N}}$, and define on these spaces, for each bounded Borel set $S\subset\mathbb{R}^{+}$, the integer valued random variable $M_{k}^{x}(S)$ in the following way: $M_{k}^{x}(S)(\omega)$ counts how many times the word $\omega$ occurs in $x$ at a position in the set $\mathbb{N}\cap\{b^{k}s:s\in S\}$. Then, for almost all $x$ with respect to the uniform measure, $M_{k}^{x}(\cdot)$ converges in distribution to the Poisson point process in the positive real line as $k$ goes to infinity, see also [1, Theorem 1]. Since

$$Z_{i,k}^{\lambda}(x)=\mu^{k}\left(\{\omega\in\Omega^{k}:M^{x}_{k}((0,\lambda])(\omega)=i\}\right)$$

it follows that almost all $x\in\Omega^{\mathbb{N}}$ with respect to the uniform measure are Poisson generic. Despite this result, no explicit example has yet been given. The following is the main result of this note and its corollary gives a construction of an explicit $\lambda$-Poisson generic sequence over any alphabet and any positive $\lambda$, except for the case of the two-symbol alphabet, in which it is required that $\lambda$ be less than or equal to the natural logarithm of $2$.

Theorem 1.

Let $\lambda$ be a positive real number and $\Omega$ a $b$-symbol alphabet. Let $(p_{i})_{i\in\mathbb{N}_{0}}$ be a sequence of non-negative real numbers such that $\sum_{i\geq 0}p_{i}=1$ and $\sum_{i\geq 0}ip_{i}=\lambda$, and let $p_{0}$ be greater than or equal to $1/2$ if $b=2$. Then, there is a construction of an infinite sequence $x$ over alphabet $\Omega$, which satisfies for every $i\in\mathbb{N}_{0}$,

$$\lim_{k\rightarrow\infty}Z^{\lambda}_{i,k}(x)=p_{i}.$$

By taking $p_{i}=e^{-\lambda}\lambda^{i}/i!$ we obtain the promised result. In the sequel we write $\ln$ for the natural logarithm, namely, the logarithm in base $e$.

Corollary 1.

Let $\Omega$ be a $b$-symbol alphabet. In case $b=2$, fix a positive real number $\lambda$ less than or equal to $\ln(2)$; otherwise fix any positive real number $\lambda$. Then, there is a construction of a $\lambda$-Poisson generic sequence $x\in\Omega^{\mathbb{N}}$.

To prove Theorem 1 we give a construction that consists in concatenating segments of any infinite de Bruijn sequence (see Definition 3), which is a sequence that satisfies that each initial segment of length $b^{k}$ is a cyclic de Bruijn word of order $k$ [2, Theorem 1]. Our construction works by selecting segments of this given sequence and repeating them as many times as determined by the probabilities $p_{i}$, for every $i\in\mathbb{N}_{0}$.

Remark 1.

An infinite sequence $x=a_{1}a_{2}\ldots$ of symbols in a given alphabet is computable exactly when the map $k\mapsto a_{k}$ is computable. Since the set of computable sequences is countable, it has uniform measure $0$, so the existence of computable Poisson generic sequences does not necessarily follow from the fact that the set of Poisson generic sequences has full measure. In [1, Theorem 2] it is shown that there exist countably many Poisson generic computable sequences. Theorem 1 yields an explicit computable instance whenever $(p_{i})_{i\in\mathbb{N}}$ is a computable sequence of real numbers, which means that the map $(i,n)\mapsto$ the $n$-th digit in the base-$b$ expansion of $p_{i}$ is computable.

For the next result we consider Borel’s definition of normality for sequences of symbols in a given alphabet. An introduction to the theory of normal numbers can be read from [9, 3].

Definition 2.

Let $\Omega$ be a $b$-symbol alphabet, $b\geq 2$. A sequence $x\in\Omega^{\mathbb{N}}$ is Borel normal if every word $w$ occurs in $x$ with the same limiting frequency as every other word of the same length,

$$\lim_{n\to\infty}\frac{|x[1...n]|_{w}}{n}=b^{-|w|}.$$

In [11] Weiss showed that $1$-Poisson genericity implies Borel normality and that the two notions do not coincide, witnessed by the famous Champernowne sequence. (Footnote 3: Benjamin Weiss first presented this proof at the Institute for Advanced Study, Princeton University USA on June 16, 2010, as part of his conference on “Random-like behavior in deterministic systems”. It is available at https://www.youtube.com/watch?v=8AB7591De68&ab_channel=InstituteforAdvancedStudy. It was transcribed and completed in [4].) It is immediate to see that the infinite de Bruijn sequences (see Definition 3) are not $1$-Poisson generic either. In Theorem 2 we present a Borel normality criterion that generalizes this fact. In contrast to Theorem 1, this result has no limitations in the case of the two-symbol alphabet.

Theorem 2.

Let $\Omega$ be a $b$-symbol alphabet, $b\geq 2$, and let $x\in\Omega^{\mathbb{N}}$. We fix a positive real number $\lambda$ and define for every $i\in\mathbb{N}_{0}$ the numbers $p_{i}=\liminf_{k\rightarrow\infty}Z_{i,k}^{\lambda}(x)$. If the numbers $p_{i}$ satisfy $\sum_{i\geq 0}ip_{i}=\lambda$ then $x$ is Borel normal to base $b$.

Remark 2.

It is easy to verify that if the numbers $p_{i}$ are defined as in Theorem 2 then it is always the case that $\sum_{i\geq 0}ip_{i}\leq\lambda$ (for example, it follows from Fatou’s Lemma).
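One way to spell out the Fatou argument, as a sketch: each of the $\lfloor\lambda b^{k}\rfloor$ starting positions in the prefix $x[1...\lfloor\lambda b^{k}\rfloor+k-1]$ contributes one occurrence of some word of length $k$, so the weighted sum of the proportions is bounded by $\lambda$ for every $k$, and Fatou's Lemma for series moves the $\liminf$ outside the sum.

```latex
\sum_{i\geq 0} i\,Z^{\lambda}_{i,k}(x)
  = \frac{1}{b^{k}}\sum_{w\in\Omega^{k}}
      \bigl|x[1...\lfloor\lambda b^{k}\rfloor+k-1]\bigr|_{w}
  = \frac{\lfloor\lambda b^{k}\rfloor}{b^{k}} \leq \lambda,
\qquad\text{hence}\qquad
\sum_{i\geq 0} i\,p_{i}
  = \sum_{i\geq 0} i\,\liminf_{k\to\infty} Z^{\lambda}_{i,k}(x)
  \leq \liminf_{k\to\infty}\sum_{i\geq 0} i\,Z^{\lambda}_{i,k}(x)
  \leq \lambda .
```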

The following is a consequence of Theorem 2.

Corollary 2.

Every $\lambda$-Poisson generic sequence is Borel normal, but the two notions do not coincide. The construction in Theorem 1 yields infinitely many Borel normal sequences which are not $\lambda$-Poisson generic.

2 Proof of Theorem 1

2.1 The construction

A cyclic de Bruijn word of order $n$ in a $b$-symbol alphabet is a word $w$ of length $b^{n}$ where each word of length $n$ occurs exactly once in the circular word determined by $w$. The classical reference is [6], but de Bruijn words were also found independently by I. J. Good and by N. Korobov around the same time. Our construction is based on the following property of de Bruijn words.

Lemma 1 (Becher and Heiber [2, Theorem 1]).
  1. Every cyclic de Bruijn word of order $n$ over an alphabet of at least three symbols can be extended to a cyclic de Bruijn word of order $n+1$.

  2. No de Bruijn word of order $n$ in two symbols can be extended to order $n+1$, but every one can be extended to order $n+2$.

For example, consider the alphabet $\{0,1,2\}$. Then, $012110022$ is a cyclic de Bruijn word of order $2$ which can be extended to $012110022010200011120212221$, which is a cyclic de Bruijn word of order $3$.
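The two words in this example can be checked mechanically. The sketch below tests the defining property directly, reading the word circularly; the function name `is_cyclic_de_bruijn` is ours, and words are assumed to be strings over at most $b$ distinct symbols.

```python
def is_cyclic_de_bruijn(w: str, b: int, n: int) -> bool:
    """Check that w is a cyclic de Bruijn word of order n over a b-symbol alphabet."""
    if len(w) != b**n or len(set(w)) > b:
        return False
    # Append the first n-1 symbols so that windows wrapping around are visible.
    circular = w + w[:n - 1]
    windows = {circular[i:i + n] for i in range(len(w))}
    # b^n windows over b symbols are all distinct iff every word occurs exactly once.
    return len(windows) == b**n


order2 = "012110022"
order3 = "012110022010200011120212221"
print(is_cyclic_de_bruijn(order2, 3, 2), is_cyclic_de_bruijn(order3, 3, 3))
print(order3.startswith(order2))  # the order-3 word extends the order-2 word
```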

Lemma 1 allows us to define infinite de Bruijn sequences.

Definition 3.

An infinite de Bruijn sequence in a $b$-symbol alphabet $\Omega$, $b\geq 3$, is an infinite sequence $x\in\Omega^{\mathbb{N}}$ such that for every $k\in\mathbb{N}$, $x[1...b^{k}]$ is a cyclic de Bruijn word of order $k$. In the case $b=2$, we say $x\in\Omega^{\mathbb{N}}$ is an infinite de Bruijn sequence if for every $k\in\mathbb{N}$, $x[1...2^{2k-1}]$ is a cyclic de Bruijn word of order $2k-1$.

Given a real number $y\in[0,1)$, we write $\{y\}_{k}$ for the truncation to $k$ digits of the unique base-$b$ representation of $y$ which does not end in an infinite tail of $(b-1)$’s. In the sole case $y=1$, we choose the base-$b$ representation $\sum_{i\geq 1}(b-1)b^{-i}$.
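The truncation $\{y\}_{k}$ can be computed exactly with rational arithmetic. The following sketch assumes $y$ is given as a `fractions.Fraction`; the function name `truncate` is ours. For $y\in[0,1)$ the representation without a tail of $(b-1)$'s is the one obtained by taking floors, and $y=1$ is handled separately as the text prescribes.

```python
from fractions import Fraction
from math import floor


def truncate(y: Fraction, k: int, b: int) -> Fraction:
    """Return {y}_k, the truncation of y to k base-b digits."""
    if y == 1:
        # 1 = 0.(b-1)(b-1)... ; keeping k digits gives 1 - b^{-k}.
        return Fraction(b**k - 1, b**k)
    if not 0 <= y < 1:
        raise ValueError("y must lie in [0, 1]")
    return Fraction(floor(y * b**k), b**k)


print(truncate(Fraction(5, 18), 2, 3))  # 5/18 = 0.0211..._3, so two digits give 0.02_3
```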

Construction.

Let $\lambda$ be a positive real number and $\Omega$ a $b$-symbol alphabet, $b\geq 2$. Let $(p_{i})_{i\in\mathbb{N}_{0}}$ be a sequence of non-negative real numbers such that $\sum_{i\geq 0}p_{i}=1$ and $\sum_{i\geq 0}ip_{i}=\lambda$, and let $p_{0}\geq 1/2$ if $b=2$. We define $g:\mathbb{N}\rightarrow\mathbb{N}$ as $g(k)=\left\lceil\frac{k}{2}\right\rceil$ and we define the real numbers $(p_{i}^{k})_{i\geq 0,k\geq 1}$ inductively as follows. For every $i\geq 1$,

$$p_{i}^{1}=\{p_{i}\}_{g(1)},\qquad p_{0}^{1}=1-\sum_{i\geq 1}p_{i}^{1}.$$

And for every $k\geq 1$ and $i\geq 1$,

$$p_{i}^{k+1}=\frac{1}{b}p_{i}^{k}+\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)},\qquad p_{0}^{k+1}=1-\sum_{i\geq 1}p_{i}^{k+1}.$$
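To see the recursion in action, the sketch below computes the approximations $p_{i}^{k}$ exactly for a distribution with finitely many non-zero terms, using $g(k)=\lceil k/2\rceil$. The helper names `trunc`, `g` and `approximations` are ours, and the sketch assumes every $p_{i}$ with $i\geq 1$ is a `Fraction` strictly less than $1$.

```python
from fractions import Fraction
from math import floor


def trunc(y: Fraction, k: int, b: int) -> Fraction:
    """{y}_k for y in [0, 1): keep the first k base-b digits."""
    return Fraction(floor(y * b**k), b**k)


def g(k: int) -> int:
    """g(k) = ceil(k / 2), as in the construction."""
    return (k + 1) // 2


def approximations(p: dict, b: int, steps: int) -> list:
    """Return [p^1, ..., p^steps]; each p^k maps i (including 0) to p_i^k.

    p maps each i >= 1 to p_i; p_0 is implied by the normalization.
    """
    rows = []
    cur = {i: trunc(pi, g(1), b) for i, pi in p.items()}
    for k in range(1, steps + 1):
        if k > 1:
            cur = {i: cur[i] / b + trunc(Fraction(b - 1, b) * p[i], g(k), b)
                   for i in p}
        row = dict(cur)
        row[0] = 1 - sum(cur.values())   # p_0^k absorbs the remaining mass
        rows.append(row)
    return rows


p = {1: Fraction(1, 2), 2: Fraction(5, 18), 3: Fraction(2, 9)}
for k, row in enumerate(approximations(p, b=3, steps=6), start=1):
    print(k, row)
```

Each $p_{i}^{k}$ with $i\geq 1$ approaches $p_{i}$ from below, which is why $p_{0}^{k}$ is defined by subtraction rather than by truncation.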

We fix an infinite de Bruijn sequence $A$ over the alphabet $\Omega$. We define $A_{k}$ to be $A[1...b^{k}]$.

Given a sequence $w$ of length $b^{k}$ we say $\delta$ is a block in $w$ if it is a subsequence of $w$ and $|\delta|=b^{j}\leq b^{k}$ for some $j\in\mathbb{N}_{0}$. We say that a block $\delta$ in $w$ has absolute length $|\delta|$ and relative length $|\delta|b^{-k}$ with respect to $w$.

The construction works by steps. Let $x_{k}$ be the output of the construction after Step $k$. For all $k$, $x_{k}$ is a prefix of $x_{k+1}$. The output of the construction is the infinite word $x$ obtained as the limit of the finite words $x_{k}$. Start with $x_{0}$ equal to the empty word.

Step 1.

In this first step, we consider the base-$b$ expansion of $p_{i}^{1}$, for $i\geq 1$,

$$p_{i}^{1}=0.c_{i}$$

For each $c_{i}$, $i\geq 1$, we choose $c_{i}$ blocks of relative length $b^{-1}$ with respect to $A_{1}$, that is, blocks of length $1$ (if $c_{i}=0$ we do not choose any blocks). The selected blocks should be non-overlapping. This is possible thanks to the fact that $\sum_{i\geq 1}p_{i}^{1}\leq 1$. We select the blocks from left to right, leaving no gaps at the beginning or in between blocks.

The output of the construction after Step 1 is the concatenation of the chosen blocks, in any order, where for every $i\geq 1$ each of the $c_{i}$ selected blocks is repeated exactly $i$ times.

Step k+1.

We consider the base-$b$ expansion of $\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)}$ for $i\geq 1$:

$$\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)}=0.a_{i,1}a_{i,2}...a_{i,g(k+1)}$$

where $a_{i,j}\in\{0,1,2,...,b-1\}$.

We select blocks in $A_{k+1}$ in the following manner: for each $a_{i,j}$, $i\geq 1$, $j\leq g(k+1)$, we choose $a_{i,j}$ blocks of relative length $b^{-j}$ with respect to $A_{k+1}$. If $a_{i,j}=0$ we do not select any blocks. Notice that only finitely many blocks are selected. All the selected blocks should be non-overlapping. This is possible due to the fact that

$$\sum_{i\geq 1}\sum_{1\leq j\leq g(k+1)}a_{i,j}\frac{1}{b^{j}}=\sum_{i\geq 1}\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)}\leq\frac{b-1}{b}=\frac{|A_{k+1}[b^{k}+1...b^{k+1}]|}{b^{k+1}}.$$

In the case $b\geq 3$, we can select the blocks anywhere in $A_{k+1}$, so there could be gaps between the blocks selected at step $k$ and the ones at step $k+1$. For example, we may take blocks from $A_{k+1}[b^{k}+1...b^{k+1}]$. In the case $b=2$, however, we do not allow gaps.

The construction now appends the chosen blocks to $x_{k}$. For every $i\geq 1$, $j\leq g(k+1)$, each of the $a_{i,j}$ selected blocks is repeated exactly $i$ times. We refer to each of the chosen blocks of $A$ as constituent segments in the output $x_{k+1}$. We say that the concatenation of $i$-many copies of a constituent segment corresponding to $a_{i,j}$ is a run segment in the output.

2.2 An example

To illustrate the way the construction works, we give an example of three steps of the execution. Just to make the example more enlightening, we set $g(k)=k$ in this section. Take $p_{0}=0$, $p_{1}=1/2$, $p_{2}=5/18$, $p_{3}=2/9$, and $p_{i}=0$ for $i\geq 4$. In this case $\lambda=31/18$. Now fix $b=3$, $\Omega=\{0,1,2\}$ and

$$A=012110022010200011120212221...$$

The relevant base-$3$ expansions are:

$p_{1}=0.1111...$ and $\frac{2}{3}p_{1}=0.1000...$

$p_{2}=0.0211...$ and $\frac{2}{3}p_{2}=0.0120...$

$p_{3}=0.0200...$ and $\frac{2}{3}p_{3}=0.0110...$

Step 1:

$$A=\boxed{0}\,12\;\Big|\;110022\;\Big|\;010200011120212221\;\Big|\;...$$
$$x_{1}=\boxed{0}$$

Step 2:

$$A=012\;\Big|\;\boxed{110}\,\boxed{0}\,\boxed{2}\,2\;\Big|\;010200011120212221\;\Big|\;...$$
$$x_{2}=0\,\boxed{110}\,\boxed{0}\,\boxed{0}\,\boxed{2}\,\boxed{2}\,\boxed{2}$$

In this case 0, 110, 0 and 2 are the constituent segments of $x_{2}$.

Step 3:

$$A=012\;\Big|\;110022\;\Big|\;\boxed{010200011}\,\boxed{120}\,\boxed{212}\,\boxed{2}\,\boxed{2}\,\boxed{1}\;\Big|\;...$$
$$x_{3}=011000222\,\boxed{010200011}\,\boxed{120}\,\boxed{120}\,\boxed{212}\,\boxed{212}\,\boxed{212}\,\boxed{2}\,\boxed{2}\,\boxed{2}\,\boxed{2}\,\boxed{1}\,\boxed{1}\,\boxed{1}$$

In this case, 120120 is the run segment corresponding to the constituent segment 120, and 212212212 is the run segment corresponding to the constituent segment 212.
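The three steps above can be replayed with a short program. This is a sketch under stated assumptions, not the authors' implementation: it takes a finite prefix of $A$ as a string, always picks the blocks for step $k\geq 2$ from $A[b^{k-1}+1...b^{k}]$ from left to right, and orders them by decreasing block length and then by increasing $i$, which is one admissible choice that happens to match the example. The names `digits` and `construct` are ours, and every $p_{i}$ is assumed to be a `Fraction` strictly less than $1$.

```python
from fractions import Fraction


def digits(y: Fraction, b: int, n: int) -> list:
    """First n base-b digits of y in [0, 1)."""
    out = []
    for _ in range(n):
        y *= b
        d = int(y)          # floor, since y is non-negative
        out.append(d)
        y -= d
    return out


def construct(A: str, p: dict, b: int, steps: int, g=lambda k: k) -> str:
    """Return x_steps, built from the prefix A of an infinite de Bruijn sequence.

    p maps each i >= 1 to p_i. The default g(k) = k is the choice of Section 2.2.
    """
    x = ""
    for k in range(1, steps + 1):
        pos = 0 if k == 1 else b**(k - 1)        # start of the unused part of A_k
        n = g(k)
        factor = Fraction(1) if k == 1 else Fraction(b - 1, b)
        expansion = {i: digits(factor * pi, b, n) for i, pi in p.items()}
        for j in range(1, n + 1):                # blocks of relative length b^{-j}
            length = b**k // b**j
            for i in sorted(p):
                for _ in range(expansion[i][j - 1]):
                    block = A[pos:pos + length]  # a constituent segment
                    pos += length
                    x += block * i               # its run segment
    return x


A = "012110022010200011120212221"
p = {1: Fraction(1, 2), 2: Fraction(5, 18), 3: Fraction(2, 9)}
for k in (1, 2, 3):
    print(k, construct(A, p, b=3, steps=k))
```

Passing `g=lambda k: (k + 1) // 2` instead gives the function $g(k)=\lceil k/2\rceil$ used in the actual construction.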

2.3 Correctness

To prove the correctness of the construction we use the following fact and five lemmas.

Fact 1.

For every $y\in[0,1)$ and every $k\geq 1$, $y-\frac{1}{b^{k}}<\{y\}_{k}\leq y$. In the case $y=1$, $\{y\}_{k}=y-\frac{1}{b^{k}}$.

Lemma 2.

Let $b=2$ and $k\geq 2$. Then, at step $k$ of the construction it is always possible to choose all necessary blocks from $A[1...2^{k-1}]$.

Proof.

First of all, notice that for $k\geq 1$, the total relative length with respect to $A_{k+1}$ of the blocks we need to choose at step $k+1$ is

$$\sum_{i\geq 1}\sum_{1\leq j\leq g(k+1)}a_{i,j}\frac{1}{2^{j}}=\sum_{i\geq 1}\left\{\frac{1}{2}p_{i}\right\}_{g(k+1)}\leq\frac{1}{2}\sum_{i\geq 1}p_{i}=\frac{1}{2}(1-p_{0})\leq\frac{1}{4}=\frac{|A[2^{k-1}+1...2^{k}]|}{2^{k+1}},$$

where $a_{i,j}$ has the same meaning as in the construction, and we used the hypothesis $p_{0}\geq\frac{1}{2}$ in the last inequality.

This means that $A[2^{k-1}+1...2^{k}]$ has enough space to accommodate all the necessary blocks at step $k+1$. Then, we only need to check that $A[2^{k-2}+1...2^{k-1}]$ is free at step $k$ for every $k\geq 2$. We can check it inductively. In the first step, the used proportion of $A_{1}$ is

$$\sum_{i\geq 1}p_{i}^{1}\leq\sum_{i\geq 1}p_{i}=1-p_{0}\leq\frac{1}{2},$$

where the last inequality holds because $p_{0}\geq 1/2$. Then, at least half of $A_{1}=A[1...2]$ remains unused after step 1, so $A[2^{2-2}+1...2^{2-1}]=A[2...2]$ is free at step $2$. This proves the base case.

Now suppose that at step $k$, $A[2^{k-2}+1...2^{k-1}]$ is free. Thanks to the first observation, we can choose all necessary blocks there. This leaves $A[2^{k-1}+1...2^{k}]$ free to use at step $k+1$. ∎

Lemma 3.

For every $i\geq 1$ the sum of the relative lengths with respect to $A_{k}$ of all constituent segments in the output $x_{k}$ that are repeated exactly $i$ times is $p_{i}^{k}$.

Proof.

It can easily be checked by induction on $k$, using the definition of $p_{i}^{k}$ and the way the construction operates. If $k=1$, this is immediately true by Step 1 of the construction. Assuming the statement is true for $k$, let us see it is also true for $k+1$. Notice that blocks that occur in $x_{k}$ have a relative length in $A_{k+1}$ which is $1/b$ of their relative length in $A_{k}$. The extra blocks added contribute $\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)}$ to the sum. Then, the sum of the relative lengths with respect to $A_{k+1}$ is

$$\frac{1}{b}p_{i}^{k}+\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)}=p_{i}^{k+1}.$$

Lemma 4.

For every $i\in\mathbb{N}_{0}$, $\lim\limits_{k\rightarrow\infty}p_{i}^{k}=p_{i}$. In fact, for every $i\geq 1$, $k\geq 1$, the following estimation holds,

$$p_{i}-\frac{k}{b^{g(k)}}\leq p_{i}^{k}\leq p_{i}.\qquad(\dagger)$$
Proof.

For $i\geq 1$ we prove ($\dagger$) by induction on $k$. If $k=1$ it follows immediately from the definition of $p_{i}^{1}$ and Fact 1. For the inductive step, notice that

$$p_{i}^{k+1}=\frac{1}{b}p_{i}^{k}+\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)}\leq\frac{1}{b}p_{i}+\frac{b-1}{b}p_{i}\leq p_{i}$$

and

$$\begin{aligned}
p_{i}-p_{i}^{k+1}&=p_{i}-\left(\frac{1}{b}p_{i}^{k}+\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)}\right)\\
&\leq p_{i}-\frac{1}{b}\left(p_{i}-\frac{k}{b^{g(k)}}\right)-\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)}\\
&\leq\frac{b-1}{b}p_{i}-\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)}+\frac{k}{b^{1+g(k)}}\\
&\leq\frac{1}{b^{g(k+1)}}+\frac{k}{b^{1+g(k)}}\\
&\leq\frac{k+1}{b^{g(k+1)}}.
\end{aligned}$$

In the last inequality we used the fact that $g(k+1)\leq g(k)+1$.
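For completeness, the last inequality can be unpacked as follows; this only makes explicit how the condition $g(k+1)\leq g(k)+1$ enters.

```latex
g(k+1)\leq g(k)+1
\;\Longrightarrow\;
b^{1+g(k)}\geq b^{g(k+1)}
\;\Longrightarrow\;
\frac{1}{b^{g(k+1)}}+\frac{k}{b^{1+g(k)}}
\leq\frac{1}{b^{g(k+1)}}+\frac{k}{b^{g(k+1)}}
=\frac{k+1}{b^{g(k+1)}} .
```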

In the case of $i=0$,

$$|p_{0}^{k}-p_{0}|=\left|1-\sum_{i\geq 1}p_{i}^{k}-\left(1-\sum_{i\geq 1}p_{i}\right)\right|=\sum_{i\geq 1}(p_{i}-p_{i}^{k}).$$

Given $\varepsilon>0$, there exists $N>0$ such that $\sum_{i\geq N+1}p_{i}<\frac{\varepsilon}{2}$. Then,

$$|p_{0}^{k}-p_{0}|\leq\sum_{i=1}^{N}(p_{i}-p_{i}^{k})+\sum_{i\geq N+1}p_{i}<\frac{kN}{b^{g(k)}}+\frac{\varepsilon}{2}.$$

If $k$ is big enough, then $\frac{kN}{b^{g(k)}}<\frac{\varepsilon}{2}$ and $|p_{0}^{k}-p_{0}|<\varepsilon$, as desired. ∎

Lemma 5.

Let $x_{k}$ be the word output by the construction after step $k$. Then,

$$\lim_{k\rightarrow\infty}\frac{\bigl|\lfloor\lambda b^{k}\rfloor+k-1-|x_{k}|\bigr|}{b^{k}}=0.$$
Proof.

By Lemma 3, $|x_{k}|=b^{k}\sum\limits_{i\geq 1}ip_{i}^{k}$. Then,

$$\frac{\bigl|\lfloor\lambda b^{k}\rfloor+k-1-|x_{k}|\bigr|}{b^{k}}\leq\frac{k-1}{b^{k}}+\frac{|\lfloor\lambda b^{k}\rfloor-\lambda b^{k}|}{b^{k}}+\left|\lambda-\sum\limits_{i\geq 1}ip_{i}^{k}\right|\leq\frac{k}{b^{k}}+\left|\lambda-\sum\limits_{i\geq 1}ip_{i}^{k}\right|.$$

It suffices to prove then that the last term converges to zero. Recall that $(p_{i})_{i\in\mathbb{N}_{0}}$ satisfies $\lambda=\sum\limits_{i\geq 1}ip_{i}$. Hence,

$$\left|\lambda-\sum\limits_{i\geq 1}ip_{i}^{k}\right|=\sum\limits_{i\geq 1}i(p_{i}-p_{i}^{k}).$$

Given $\varepsilon>0$ take $N$ big enough so that $\sum\limits_{i\geq N+1}ip_{i}<\frac{\varepsilon}{2}$. By means of Equation ($\dagger$) of Lemma 4,

$$\sum\limits_{i\geq 1}i(p_{i}-p_{i}^{k})\leq\sum\limits_{i=1}^{N}i\frac{k}{b^{g(k)}}+\sum\limits_{i\geq N+1}ip_{i}<\frac{k}{b^{g(k)}}\frac{N(N+1)}{2}+\frac{\varepsilon}{2}.$$

Clearly, when $k$ is large enough, $\frac{k}{b^{g(k)}}\frac{N(N+1)}{2}<\frac{\varepsilon}{2}$. ∎
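As a sanity check of the identity $|x_{k}|=b^{k}\sum_{i\geq 1}ip_{i}^{k}$, consider the example of Section 2.2 (with $b=3$ and $g(k)=k$) at $k=3$. Reading the constituent segments off $x_{3}$ and applying Lemma 3 gives the values below; the gap to $\lfloor\lambda b^{3}\rfloor+3-1$ is still visible at this small $k$ and only vanishes relative to $b^{k}$ in the limit.

```latex
p_{1}^{3}=\tfrac{13}{27},\quad p_{2}^{3}=\tfrac{6}{27},\quad p_{3}^{3}=\tfrac{5}{27},
\qquad
b^{3}\sum_{i\geq 1} i\,p_{i}^{3} = 13 + 2\cdot 6 + 3\cdot 5 = 40 = |x_{3}|,
\qquad
\lfloor\lambda b^{3}\rfloor + 3 - 1 = \lfloor 46.5\rfloor + 2 = 48 .
```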

Recall that each of the blocks of $A$ used at step $k$ is a constituent segment in the output $x_{k}$, and the concatenation of $i$-many copies of a constituent segment corresponding to $a_{i,j}$ is a run segment. Let $B_{k}$ be the number of run segments in the output $x_{k}$.

Lemma 6.

The quantities $B_{k}$ satisfy

$$\lim_{k\rightarrow\infty}\frac{kB_{k}}{b^{k}}=0.$$
Proof.

Notice that for every $\ell\geq 2$, with $a_{i,j}$ denoting the digits used at step $\ell$,

$$\begin{aligned}
\frac{1}{b^{g(\ell)}}(B_{\ell}-B_{\ell-1})&=\frac{1}{b^{g(\ell)}}\sum_{i\geq 1}\sum_{j=1}^{g(\ell)}a_{i,j}\\
&\leq\sum_{i\geq 1}\sum_{j=1}^{g(\ell)}\frac{1}{b^{j}}a_{i,j}\\
&\leq\sum_{i\geq 1}\frac{b-1}{b}p_{i}\\
&\leq 1.
\end{aligned}$$

Recall $g(\ell)=\left\lceil\frac{\ell}{2}\right\rceil$. Notice

$$B_{k}=B_{1}+\sum\limits_{\ell=2}^{k}(B_{\ell}-B_{\ell-1}).$$

Then we obtain

$$\begin{aligned}
\frac{kB_{k}}{b^{k}}&\leq\frac{k\sum\limits_{\ell=2}^{k}b^{g(\ell)}+kB_{1}}{b^{k}}\\
&\leq\frac{k\sum\limits_{\ell=0}^{k}b^{1+\ell/2}+kB_{1}}{b^{k}}\\
&\leq\frac{b(b^{1/2}-1)^{-1}k(b^{(k+1)/2}-1)+kB_{1}}{b^{k}},
\end{aligned}$$

which converges to $0$ as $k$ goes to infinity. ∎

Remark 3.

Notice there are many other alternatives for the function $g$. For instance, any function which satisfies the following conditions is a suitable choice:

  • $g(k)\leq k/m$ for every $k\in\mathbb{N}$, where $m$ is a constant greater than $1$.

  • $g(k+1)\leq g(k)+1$ for every $k\in\mathbb{N}$.

  • $\frac{k}{b^{g(k)}}\xrightarrow{k\rightarrow\infty}0$

The first and second conditions ensure that Lemmas 6 and 4 still hold, respectively. The third condition guarantees that $\lim_{k\rightarrow\infty}p_{i}^{k}=p_{i}$. For example, $g(k)=\lceil\sqrt{k}\rceil$ is a possible choice, but $g(k)=\lceil\log_{b}(k)\rceil$ is not because the last condition fails.

We are now ready for the proof of Theorem 1.

Proof of Theorem 1.

First we consider the case $i\geq 1$ and we estimate the value of $Z^{\lambda}_{i,k}(x)$. Since we are only interested in the value of $Z^{\lambda}_{i,k}(x)$ as $k$ tends to infinity, by Lemma 5, it suffices to count occurrences of words in $x_{k}$ instead of $x[1...\lfloor\lambda b^{k}\rfloor+k-1]$.

Let us see that the chosen constituent segments up to step $k$ do not have any words of length $k$ in common. By Definition 3, if $b\geq 3$, each word of length $k$ occurs exactly once in $A_{k}$ so the claim follows. If $b=2$ and $k$ is odd, then $A[1...2^{k}]$ is a de Bruijn word of order $k$, so it does not repeat words of length $k$. If $b=2$ and $k$ is even, $A[1...2^{k-1}]$ is a de Bruijn word of order $k-1$, so it does not repeat words of length $k-1$ (hence, it does not repeat words of length $k$ either). Since up to step $k$ the construction has picked blocks only from $A[1...2^{k-1}]$ (thanks to Lemma 2), two different constituent segments share no words of length $k$.

Therefore, if a constituent segment $w$ of relative length $b^{-j}$ with respect to $A_{k}$ and absolute length $b^{k-j}$ is repeated exactly $i$ times in $x_{k}$, we may assume it contributes $b^{k-j}$ words to the numerator of the counting function $Z^{\lambda}_{i,k}(x)$, that is, it contributes to $Z^{\lambda}_{i,k}(x)$ its relative length with respect to $A_{k}$. To see this, notice that the words of length $k$ that could make the actual count differ from this approximation belong to one of the following groups:

  • 1.

    For each run segment corresponding to a constituent segment of length at least k\displaystyle k, we need to consider words that occur in between the constituent segments inside the run segment, which are at most k\displaystyle k different words, and words across the ending of the run segment and the beginning of the next one, which are also at most k\displaystyle k. Then, the number of such words is at most 2k\displaystyle 2k for each run segment.

  • 2.

    We also need to consider the number of different words in run segments composed of constituent segments of length s<k\displaystyle s<k. In this case, the first s\displaystyle s words of length k\displaystyle k in the run segment are repeated throughout the segment. There are less than k\displaystyle k extra words that occur across the end of the run segment and the beginning of the next one . Thus, there are less than s+k<2k\displaystyle s+k<2k different words for each such run segment.

The error introduced by the previous assumption is then bounded by 2kBk\displaystyle 2kB_{k}, which by Lemma 6 becomes negligible as k\displaystyle k tends to infinity.

By the previous approximations, we can estimate Zi,kλ(x)\displaystyle Z^{\lambda}_{i,k}(x) as the sum of the relative lengths of all constituent segments that occur exactly i\displaystyle i times in xk\displaystyle x_{k}, which is pik\displaystyle p_{i}^{k} by Lemma 3. We can then compute the limit of Zi,kλ(x)\displaystyle Z^{\lambda}_{i,k}(x) as k\displaystyle k tends to infinity as limkpik\displaystyle\lim\limits_{k\rightarrow\infty}p_{i}^{k}, which is equal to pi\displaystyle p_{i}, thanks to Lemma 4.

To conclude, we must consider the case $i=0$. We need to compute

\[
Z^{\lambda}_{0,k}(x)=\frac{\#\{w\in\Omega^{k}:|x[1...\lfloor\lambda b^{k}\rfloor+k-1]|_{w}=0\}}{b^{k}}.
\]

The numerator is equal to the total number of words of length $k$ minus the number of words of length $k$ that occur at least once. Using the same approximations as before, we estimate the ratio as the relative length of the unused portion of $A_{k}$ after step $k$, that is,

\[
1-\sum_{i\geq 1}p_{i}^{k}=p_{0}^{k}.
\]

By Lemma 4, we know that $p_{0}^{k}$ converges to $p_{0}$ as $k$ goes to infinity. This concludes the proof. ∎

Remark 4.

Our construction solves the problem of exhibiting a $\lambda$-Poisson generic sequence for any fixed positive $\lambda$ over a $b$-symbol alphabet with $b\geq 3$, and for $\lambda\leq\ln(2)$ in the case $b=2$. It does not, however, allow us to generate a Poisson generic sequence. This is because we use an infinite de Bruijn sequence for the construction. Suppose $b\geq 3$ and we construct $x$ for $\lambda=1$. Then the frequencies for $\lambda=1/b$ and $i\geq 1$ satisfy

\[
\lim_{k\to\infty}\left(Z^{1/b}_{i,k+1}(x)-\frac{1}{b}Z^{1}_{i,k}(x)\right)=0.
\]

But this relation does not hold for the probability mass function of the Poisson distribution:

\[
e^{-1/b}\frac{1}{b^{i}\,i!}\neq e^{-1}\frac{1}{b\,i!}.
\]

In view of Remark 4, it remains open whether, by using a different sequence $A$ as the source of blocks, our construction could be adapted to produce a Poisson generic sequence over any alphabet of size $b\geq 2$.
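The mismatch between the two sides of the inequality in Remark 4 can be seen with concrete numbers. The sketch below is only an illustration: the function name `poisson_pmf` and the value $b=3$ are assumptions, not notation from the paper. It compares the Poisson probability with parameter $1/b$ against $1/b$ times the Poisson probability with parameter $1$.

```python
from math import exp, factorial


def poisson_pmf(lam: float, i: int) -> float:
    """Probability mass e^{-lam} * lam^i / i! of the Poisson distribution."""
    return exp(-lam) * lam ** i / factorial(i)


if __name__ == "__main__":
    b = 3  # illustrative alphabet size (an assumption)
    for i in range(1, 5):
        left = poisson_pmf(1 / b, i)   # e^{-1/b} / (b^i * i!)
        right = poisson_pmf(1, i) / b  # e^{-1} / (b * i!)
        print(i, left, right)
```

The two columns differ for every $i\geq 1$, which is why frequencies that scale by exactly $1/b$ cannot match the Poisson law for both parameters at once.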

2.4 On the case of the two-symbol alphabet

Infinite de Bruijn sequences over an alphabet of more than two symbols satisfy, for every $k$, that any two disjoint segments occurring before position $b^{k}$ do not share words of length $k$ or longer. Over the two-symbol alphabet this is not guaranteed. Consequently, Theorem 1 solves the problem of finding $\lambda$-Poisson generic sequences over the two-symbol alphabet only partially, namely for $\lambda\leq\ln(2)$. To solve this problem completely it would suffice to use what we may call a quasi-de Bruijn sequence, which we define as an infinite sequence $x\in\Omega^{\mathbb{N}}$ that satisfies

\[
\lim_{k\to\infty}Z_{1,k}^{1}(x)=1.
\]

That is, the proportion of words of length $k$ that do not occur exactly once in the prefix $x[1...2^{k}+k-1]$ converges to zero. It is straightforward to see that the construction of Theorem 1 can be run using a quasi-de Bruijn sequence over the two-symbol alphabet as input.

Observe that the infinite de Bruijn sequences over the two-symbol alphabet are not necessarily quasi-de Bruijn because, although their initial segments of length $2^{2k-1}$ are cyclic de Bruijn words, the initial segments of length $2^{2k}$ are not. We do not know of any construction over the two-symbol alphabet proved to be quasi-de Bruijn. There is, however, empirical evidence supporting the hypothesis that the Ehrenfeucht-Mycielski sequence [7] is indeed a quasi-de Bruijn sequence. Not much is known about this sequence: as of today, it has not even been proved that the limiting frequencies of zeros and ones are equal to $1/2$.
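An experiment of this kind can be sketched as follows. This is not the authors' code. It assumes the usual description of the Ehrenfeucht-Mycielski sequence (start with $0$; at each step find the longest suffix that also occurs earlier, and append the complement of the symbol following its most recent earlier occurrence), and the names `ehrenfeucht_mycielski` and `exactly_once_fraction` are hypothetical. The second function computes $Z^{1}_{1,k}(x)$ over the two-symbol alphabet for a finite prefix.

```python
from collections import Counter


def ehrenfeucht_mycielski(n: int) -> str:
    """First n symbols (n >= 1) of the Ehrenfeucht-Mycielski sequence."""
    s = "0"
    while len(s) < n:
        # The empty suffix always matches; its latest earlier occurrence
        # is followed by the last symbol of s.
        best_len, best_pos = 0, len(s) - 1
        length = 1
        while length < len(s):
            # Latest occurrence of the suffix that ends before the last symbol.
            pos = s.rfind(s[len(s) - length:], 0, len(s) - 1)
            if pos < 0:
                break  # longer suffixes cannot occur earlier either
            best_len, best_pos = length, pos
            length += 1
        follower = s[best_pos + best_len]
        s += "1" if follower == "0" else "0"
    return s[:n]


def exactly_once_fraction(x: str, k: int) -> float:
    """Fraction of the 2**k binary words of length k that occur exactly
    once in the prefix of x of length 2**k + k - 1."""
    n = 2 ** k
    if len(x) < n + k - 1:
        raise ValueError("prefix too short for this k")
    counts = Counter(x[j:j + k] for j in range(n))
    return sum(1 for c in counts.values() if c == 1) / n


if __name__ == "__main__":
    k_max = 10  # small illustrative range; the naive search is slow for large k
    x = ehrenfeucht_mycielski(2 ** k_max + k_max - 1)
    for k in range(2, k_max + 1):
        print(k, exactly_once_fraction(x, k))
```

Values computed this way for small $k$ are only suggestive and say nothing rigorous about the limit.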

3 Proof of Theorem 2

The proof of Theorem 2 is a simple generalization of Weiss's proof that $1$-Poisson genericity implies Borel normality [11]. It relies on a classical result due to Pyatetskii-Shapiro from 1951.

Lemma 7 (Pyatetskii-Shapiro [3, Theorem 4.6]).

Let $\Omega$ be a $b$-symbol alphabet, $b\geq 2$. Let $x\in\Omega^{\mathbb{N}}$. If there exists a positive constant $C$ such that for every $\ell\in\mathbb{N}$ and every word $w\in\Omega^{\ell}$,

\[
\limsup_{N\to\infty}\frac{|x[1...N]|_{w}}{N}\leq Cb^{-\ell},
\]

then $x$ is Borel normal.

Let $w$ be a fixed word. We define $Bad(k,w,\varepsilon)$ as the set of words of length $k$ in which the frequency of $w$ differs from the frequency expected for Borel normality by at least $\varepsilon$:

\[
Bad(k,w,\varepsilon)=\left\{v\in\Omega^{k}:\left||v|_{w}-kb^{-|w|}\right|\geq\varepsilon k\right\}.
\]

The cardinality of the set $Bad(k,w,\varepsilon)$ decays exponentially in $k$ relative to $b^{k}$. This follows from Bernstein's inequality and was proved in the early works on Borel normal numbers, such as [5]; since then, many works have computed a similar upper bound.
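For very small parameters the set $Bad(k,w,\varepsilon)$ can be enumerated by brute force and compared with the bound stated in Lemma 8. The following is a rough sketch under assumed parameters ($b=2$, $w=0$, $\varepsilon=1/2$, chosen so that the hypotheses of Lemma 8 hold for $k\geq 12$); the function names are hypothetical, and occurrences are counted with overlaps.

```python
from itertools import product
from math import exp


def occurrences(v: str, w: str) -> int:
    """Number of (possibly overlapping) occurrences of w in v."""
    return sum(v.startswith(w, j) for j in range(len(v) - len(w) + 1))


def bad_count(k: int, w: str, eps: float, alphabet: str) -> int:
    """Cardinality of Bad(k, w, eps), obtained by enumerating all words."""
    b = len(alphabet)
    expected = k * b ** (-len(w))
    return sum(
        1
        for letters in product(alphabet, repeat=k)
        if abs(occurrences("".join(letters), w) - expected) >= eps * k
    )


def lemma8_bound(k: int, ell: int, eps: float, b: int) -> float:
    """Right-hand side 4 * ell * b^(k+ell) * exp(-b^ell * eps^2 * k / (6 * ell))."""
    return 4 * ell * b ** (k + ell) * exp(-(b ** ell) * eps ** 2 * k / (6 * ell))


if __name__ == "__main__":
    for k in (12, 14, 16):
        # Compare the exact count with the upper bound for each k.
        print(k, bad_count(k, "0", 0.5, "01"), lemma8_bound(k, 1, 0.5, 2))
```

Exhaustive enumeration costs $b^{k}$ word checks, so this is only feasible for small $k$; it illustrates the definition rather than the asymptotic statement.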

Lemma 8.

Assume a $b$-symbol alphabet. Let $k$ and $\ell$ be positive integers and let $\varepsilon$ be such that $6/\lfloor k/\ell\rfloor\leq\varepsilon\leq 1/b^{\ell}$. Then, for every word $w$ of length $\ell$,

\[
|Bad(k,w,\varepsilon)|<4\ell\, b^{k+\ell}e^{-b^{\ell}\varepsilon^{2}k/(6\ell)}.
\]

We can now give the pending proof of Theorem 2.

Proof of Theorem 2.

We need to prove that $x$ is Borel normal for the $b$-symbol alphabet, given that for all $i\in\mathbb{N}_{0}$,

\[
\begin{aligned}
\liminf_{k\to\infty}Z^{\lambda}_{i,k}(x)&=p_{i},\\
\sum_{i\geq 0}ip_{i}&=\lambda.
\end{aligned}
\]

Fix a positive real number $\varepsilon$. By hypothesis we know that $\sum_{i\geq 0}ip_{i}=\lambda$. Let $i_{0}$ be such that $\sum_{i>i_{0}}ip_{i}<\frac{\lambda\varepsilon}{2}$. It follows that

\[
\sum_{i=0}^{i_{0}}ip_{i}>\lambda\left(1-\frac{\varepsilon}{2}\right).
\]

Let $k_{0}$ be such that for all $k>k_{0}$ and $0\leq i\leq i_{0}$,

\[
Z^{\lambda}_{i,k}(x)>p_{i}-\frac{\lambda\varepsilon}{2i_{0}^{2}}.
\]

Consider the positions from $1$ to $\lfloor\lambda b^{k}\rfloor$ in $x$. We say that a position is blamed if the word of length $k$ starting at that position occurs more than $i_{0}$ times in the prefix $x[1...\lfloor\lambda b^{k}\rfloor+k-1]$ of $x$. For $k>k_{0}$, we can bound the number of blamed positions between $1$ and $\lfloor\lambda b^{k}\rfloor$ in the following way:

\[
\begin{aligned}
\lfloor\lambda b^{k}\rfloor-\sum_{i=0}^{i_{0}}i\,Z^{\lambda}_{i,k}(x)\,b^{k} &<\lambda b^{k}-b^{k}\sum_{i=0}^{i_{0}}\left(ip_{i}-\frac{\lambda\varepsilon}{2i_{0}}\right)\\
&<\lambda b^{k}+\frac{b^{k}\lambda\varepsilon}{2}-b^{k}\sum_{i=0}^{i_{0}}ip_{i}\\
&<\lambda b^{k}+\frac{b^{k}\lambda\varepsilon}{2}-b^{k}\lambda\left(1-\frac{\varepsilon}{2}\right)\\
&=\lambda\varepsilon b^{k}.
\end{aligned}
\]

We cover the positions from $1$ to $\lfloor\lambda b^{k}\rfloor+k-1$ with non-overlapping words of length $k$ such that no word starts at a blamed position and every position that is not blamed is covered by exactly one word. We refer to these words as covering words. Notice that a covering word may contain blamed positions, as long as its first position is not blamed.
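The text does not specify how the covering words are chosen. One natural choice, shown here only as a sketch, is a greedy left-to-right scan: skip blamed positions, and start a covering word at the first position that is neither blamed nor already covered. The function names are hypothetical, positions are 0-based, and `n` plays the role of $\lfloor\lambda b^{k}\rfloor$; the input string must have length at least `n + k - 1`.

```python
from collections import Counter


def blamed_positions(x: str, k: int, n: int, i0: int) -> set:
    """0-based positions j < n whose length-k word occurs more than i0
    times among the n windows of the prefix of length n + k - 1."""
    counts = Counter(x[j:j + k] for j in range(n))
    return {j for j in range(n) if counts[x[j:j + k]] > i0}


def greedy_cover(k: int, n: int, blamed: set) -> list:
    """Start positions of non-overlapping covering words of length k.
    No word starts at a blamed position, and every non-blamed position
    below n lies in exactly one word."""
    starts, j = [], 0
    while j < n:
        if j in blamed:
            j += 1          # blamed positions never start a covering word
        else:
            starts.append(j)
            j += k          # the word covers positions j, ..., j + k - 1
    return starts


if __name__ == "__main__":
    x, k, n, i0 = "0000011010", 2, 8, 2  # toy parameters (assumptions)
    blamed = blamed_positions(x, k, n, i0)
    print(sorted(blamed), greedy_cover(k, n, blamed))
```

In this toy input the word `00` occurs four times among the eight windows, so its four starting positions are blamed and the cover begins only after them.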

Occurrences of $w$ in $x[1...\lfloor\lambda b^{k}\rfloor+k-1]$ fall into one of the following categories:

  • Occurrences of $w$ starting at a blamed position. The number of such occurrences is bounded by the number of blamed positions, which is at most $\lambda\varepsilon b^{k}$.

  • Occurrences of $w$ not fully contained in a covering word. Since there are at most $k^{-1}\lambda b^{k}$ covering words, the number of these occurrences is bounded by $|w|k^{-1}\lambda b^{k}$.

  • Occurrences of $w$ contained in a covering word that is in $Bad(k,w,\varepsilon)$. Each covering word can occur at most $i_{0}$ times, and can contain at most $k$ occurrences of $w$. Hence there are at most $i_{0}k|Bad(k,w,\varepsilon)|$ occurrences of $w$ in this case. Notice that for sufficiently large $k$ we have $\varepsilon\geq 6/\lfloor k/|w|\rfloor$, so we can use the bound in Lemma 8.

  • Occurrences of $w$ contained in a covering word that is not in $Bad(k,w,\varepsilon)$. Each such word contains at most $kb^{-|w|}+\varepsilon k$ occurrences of $w$. Since there are at most $k^{-1}\lambda b^{k}$ covering words, the total number of such occurrences is at most $\lambda b^{k}(b^{-|w|}+\varepsilon)$.

Combining the upper bounds for all categories and dividing by $\lambda b^{k}$ yields the following upper bound for the number of occurrences of $w$ in $x[1...\lfloor\lambda b^{k}\rfloor+k-1]$:

\[
\frac{|x[1...\lfloor\lambda b^{k}\rfloor+k-1]|_{w}}{\lambda b^{k}}\leq\varepsilon+\frac{|w|}{k}+\frac{4i_{0}|w|b^{|w|}}{\lambda}\,k\,e^{-\varepsilon^{2}kb^{|w|}/(6|w|)}+b^{-|w|}+\varepsilon.
\]

Taking the limit superior we obtain

\[
\limsup_{k\to\infty}\frac{|x[1...\lfloor\lambda b^{k}\rfloor+k-1]|_{w}}{\lambda b^{k}}\leq 2\varepsilon+b^{-|w|}.
\]

Since this holds for every positive $\varepsilon\leq 1/b^{|w|}$, it follows that

\[
\limsup_{k\to\infty}\frac{|x[1...\lfloor\lambda b^{k}\rfloor+k-1]|_{w}}{\lambda b^{k}}\leq b^{-|w|}.
\]

To show that $x$ is Borel normal we apply Lemma 7. Fix $N$ and let $k$ be such that $\lambda b^{k-1}\leq N<\lambda b^{k}$. Since $x[1...N]$ is a prefix of $x[1...\lfloor\lambda b^{k}\rfloor+k-1]$ and $N\geq\lambda b^{k-1}$, the bounds obtained before give

\[
\limsup_{N\to\infty}\frac{|x[1...N]|_{w}}{N}\leq\limsup_{k\to\infty}\frac{|x[1...\lfloor\lambda b^{k}\rfloor+k-1]|_{w}}{\lambda b^{k-1}}\leq b^{1-|w|}.
\]

Thus the hypothesis of Lemma 7 holds with $C=b$, and we conclude that $x$ is Borel normal. ∎


Acknowledgements. The authors are grateful to Zeev Rudnick and to Benjamin Weiss for their comments on our investigations and for having insisted that we solve this problem. Gabriel Sac Himelfarb is supported by the student fellowship “Beca de Estímulo a las Vocaciones Científicas” convocatoria 2020, Consejo Interuniversitario Nacional, Argentina. Verónica Becher is supported by Agencia Nacional de Promoción Científica y Tecnológica grant PICT-2018-02315 and by Universidad de Buenos Aires grant Ubacyt 20020170100309BA.

References

  • [1] Nicolás Alvarez, Verónica Becher, and Martín Mereb. Poisson generic sequences. International Mathematics Research Notices, rnac234, 2022. DOI 10.1093/imrn/rnac234.
  • [2] Verónica Becher and Pablo Ariel Heiber. On extending de Bruijn sequences. Information Processing Letters, 111:930–932, 2011.
  • [3] Yann Bugeaud. Distribution modulo one and Diophantine approximation, volume 193 of Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge, 2012.
  • [4] Lucas Puterman Colomer. Very normal numbers, 2019. Tesis de Licenciatura en Ciencias de la Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires.
  • [5] Arthur H. Copeland and Paul Erdős. Note on normal numbers. Bulletin of the American Mathematical Society, 52:857–860, 1946.
  • [6] Nicolaas Govert de Bruijn. A combinatorial problem. Indagationes Mathematicae, 8:461–467, 1946.
  • [7] Andrzej Ehrenfeucht and Jan Mycielski. A pseudorandom sequence – How random is it? American Mathematical Monthly, 99(4):373–375, 1992.
  • [8] Philippe Flajolet and Robert Sedgewick. Analytic Combinatorics. Cambridge University Press, 2009.
  • [9] Lauwerens Kuipers and Harald Niederreiter. Uniform distribution of sequences. Dover, 2006.
  • [10] Zeev Rudnick and Alexandru Zaharescu. The distribution of spacings between fractional parts of lacunary sequences. Forum Mathematicum, 14(5):691–712, 2002.
  • [11] Benjamin Weiss. Poisson generic points. Jean-Morlet Chair 2020 - Conference: Diophantine Problems, Determinism and Randomness, Centre International de Rencontres Mathématiques, November 23 to 29, 2020. Audiovisual resource: doi:10.24350/CIRM.V.19690103.

Verónica Becher
Departamento de Computación, Facultad de Ciencias Exactas y Naturales & ICC, Universidad de Buenos Aires & CONICET, Argentina
vbecher@dc.uba.ar

Gabriel Sac Himelfarb
Departamento de Matemática & Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Argentina
gabrielsachimelfarb@gmail.com