A construction of a $\displaystyle\lambda$ -Poisson generic sequence

Verónica Becher and Gabriel Sac Himelfarb

Abstract

Years ago Zeev Rudnick defined the $\displaystyle\lambda$ -Poisson generic sequences as the infinite sequences of symbols in a finite alphabet where the number of occurrences of long words in the initial segments follows the Poisson distribution with parameter $\displaystyle\lambda$ . Although almost all sequences, with respect to the uniform measure, are Poisson generic, no explicit instance has yet been given. In this note we give a construction of an explicit $\displaystyle\lambda$ -Poisson generic sequence over any alphabet and any positive $\displaystyle\lambda$ , except for the case of the two-symbol alphabet, in which it is required that $\displaystyle\lambda$ be less than or equal to the natural logarithm of $\displaystyle 2$ . Since $\displaystyle\lambda$ -Poisson genericity implies Borel normality, the constructed sequences are Borel normal. The same construction provides explicit instances of Borel normal sequences that are not $\displaystyle\lambda$ -Poisson generic.

Keywords: Poisson generic, normal numbers, de Bruijn sequence

MSC Classification: 11K16, 05A05, 60G55.

1 Introduction and Statement of Results

A real number is Poisson generic to an integer base $\displaystyle b$ greater than or equal to $\displaystyle 2$ if the number of occurrences of long blocks in the initial segments of its base- $\displaystyle b$ expansion follows the Poisson distribution. The definition was given years ago by Zeev Rudnick [1, 11], who thought of it as a property stronger than Borel normality that still holds for almost all real numbers with respect to the Lebesgue measure (footnote 1: he called the notion supernormality; personal communication from Z. Rudnick to V. Becher, 24 May 2017). He was motivated by his result in [10] that in almost all dilates of lacunary sequences the number of elements in a random interval of the size of the mean spacing follows the Poisson law. Rudnick asked for an explicit instance of a Poisson generic real number.

Since we consider fractional expansions of real numbers in a fixed integer base, the definition of Poisson genericity can be given for infinite sequences of symbols in a finite alphabet. We write $\displaystyle\mathbb{N}_{0}$ for the set of non-negative integers, and $\displaystyle\mathbb{N}$ for the set of positive integers. Let $\displaystyle\Omega$ be an alphabet of $\displaystyle b$ symbols, for $\displaystyle b\geq 2$ . We write $\displaystyle\Omega^{\mathbb{N}}$ for the set of infinite sequences of symbols in $\displaystyle\Omega$ . The finite sequences of symbols in $\displaystyle\Omega$ are called words and $\displaystyle\Omega^{k}$ denotes the set of words of length $\displaystyle k$ .

We number the positions in words and infinite sequences starting at $\displaystyle 1$ and we write $\displaystyle w[i...j]$ for the subsequence of $\displaystyle w$ that begins in position $\displaystyle i$ and ends in position $\displaystyle j$ . For a word $\displaystyle w$ we denote its length as $\displaystyle|w|$ . Given two words $\displaystyle w$ and $\displaystyle v$ , the number of occurrences of $\displaystyle w$ in $\displaystyle v$ is:

$\displaystyle|v|_{w}=\#\{1\leq i\leq|v|-|w|+1:v[i...i+|w|-1]=w\}.$

For example, $\displaystyle|0001|_{00}=2$ .
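The counting function can be sketched in a few lines of Python. The function name `occurrences` and the representation of words as strings are assumptions made only for this illustration; overlapping occurrences are counted, as in the definition.

```python
def occurrences(v: str, w: str) -> int:
    """Number of positions i at which the word w occurs in the word v.

    Occurrences may overlap, matching the definition of |v|_w.
    """
    n = len(w)
    # The range is empty when w is longer than v, so the count is 0.
    return sum(1 for i in range(len(v) - n + 1) if v[i:i + n] == w)


print(occurrences("0001", "00"))  # 2
```

The slice `v[i:i + n]` plays the role of $\displaystyle v[i...i+|w|-1]$ , with positions shifted to start at $\displaystyle 0$ .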

For $\displaystyle x\in\Omega^{\mathbb{N}}$ , a positive real number $\displaystyle\lambda$ , $\displaystyle i\in\mathbb{N}_{0}$ and $\displaystyle k\in\mathbb{N}$ we write $\displaystyle Z^{\lambda}_{i,k}(x)$ for the proportion of words of length $\displaystyle k$ that occur exactly $\displaystyle i$ times in $\displaystyle x[1..\lfloor\lambda b^{k}\rfloor+k-1]$ ,

\displaystyle Z^{\lambda}_{i,k}(x)=\frac{\#\{w\in\Omega^{k}:|x[1...\lfloor\lambda b^{k}\rfloor+k-1]|_{w}=i\}}{b^{k}}.

Definition 1.

Let $\displaystyle\lambda$ be a positive real number. A sequence $\displaystyle x\in\Omega^{\mathbb{N}}$ is $\displaystyle\lambda$ -Poisson generic if for every $\displaystyle i\in\mathbb{N}_{0}$ ,

\displaystyle\lim_{k\rightarrow\infty}Z^{\lambda}_{i,k}(x)=e^{-\lambda}\frac{\lambda^{i}}{i!}.

A sequence is Poisson generic if it is $\displaystyle\lambda$ -Poisson generic for all positive real numbers $\displaystyle\lambda$ .
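As a rough illustration of the definition, the quantity $\displaystyle Z^{\lambda}_{i,k}(x)$ can be computed from a sufficiently long finite prefix of $\displaystyle x$ . The sketch below assumes that symbols are single characters, that the prefix is passed as a string, and that the name `Z` and its argument order are free choices made for this example.

```python
from collections import Counter
from math import floor


def Z(x: str, lam: float, i: int, k: int, b: int) -> float:
    """Proportion of the b**k words of length k that occur exactly i times
    in the prefix of x of length floor(lam * b**k) + k - 1."""
    positions = floor(lam * b**k)      # number of starting positions examined
    needed = positions + k - 1         # length of the prefix that is read
    if len(x) < needed:
        raise ValueError("the given prefix of x is too short")
    counts = Counter(x[s:s + k] for s in range(positions))
    if i == 0:
        # Words that never occur are not stored in the counter.
        return (b**k - len(counts)) / b**k
    return sum(1 for c in counts.values() if c == i) / b**k


# In "00110" every binary word of length 2 occurs exactly once.
print(Z("00110", 1, 1, 2, 2))  # 1.0
```

For a single fixed $\displaystyle k$ the value says little; the definition concerns the limit as $\displaystyle k$ grows.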

The $\displaystyle\lambda$ -Poisson generic property can be thought of in terms of random allocations of balls in bins, where the $\displaystyle N=\lfloor\lambda b^{k}\rfloor$ initial words of length $\displaystyle k$ of a random sequence are the balls, and the $\displaystyle b^{k}$ possible words of length $\displaystyle k$ are the bins. These allocations are almost independent: it can be checked that the probability that two words in $\displaystyle\Omega^{k}$ picked uniformly at random appear in fixed overlapping positions is exactly $\displaystyle b^{-2k}$ , as if they were independent. The occupancy of a random bin satisfies a Poisson law in the limit; the proof can be read from [8, Example III.10].
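The balls-in-bins heuristic can be explored numerically. The sketch below idealizes the situation by throwing fully independent balls, which is not the same as reading overlapping words from a sequence. The function names, the seeded pseudo-random generator and the chosen parameters are all assumptions for illustration.

```python
import random
from collections import Counter
from math import exp, factorial, floor


def occupancy_histogram(lam, b, k, max_i, seed=0):
    """Throw floor(lam * b**k) independent balls into b**k bins and return
    the proportion of bins holding exactly i balls, for i = 0..max_i."""
    rng = random.Random(seed)          # seeded so that runs are repeatable
    bins = b**k
    balls = floor(lam * bins)
    load = Counter(rng.randrange(bins) for _ in range(balls))
    per_load = Counter(load.values())
    per_load[0] = bins - len(load)     # bins that received no ball
    return [per_load[i] / bins for i in range(max_i + 1)]


def poisson_pmf(lam, i):
    """Poisson probability e^{-lam} lam^i / i!."""
    return exp(-lam) * lam**i / factorial(i)


hist = occupancy_histogram(lam=1.0, b=2, k=12, max_i=3)
for i, h in enumerate(hist):
    print(i, round(h, 3), round(poisson_pmf(1.0, i), 3))
```

With a few thousand bins the empirical proportions are typically already close to the Poisson values, which is the behaviour the limit statement describes.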

Benjamin Weiss and Yuval Peres [11] proved that almost all sequences with respect to the uniform measure are Poisson generic (footnote 2: the uniform measure over $\displaystyle\Omega^{\mathbb{N}}$ is the infinite product of the uniform measure over the alphabet $\displaystyle\Omega$ ; it coincides with the Lebesgue measure when we identify the real numbers with their fractional expansions in each given integer base). In fact, they proved the following stronger result: Consider the finite probability spaces $\displaystyle\Omega^{k}$ , $\displaystyle k\in\mathbb{N}$ , with the uniform probability measure $\displaystyle\mu^{k}$ . Fix $\displaystyle x\in\Omega^{\mathbb{N}}$ , and define on these spaces, for each bounded Borel set $\displaystyle S\subset\mathbb{R}^{+}$ , the integer valued random variable $\displaystyle M_{k}^{x}(S)$ in the following way: $\displaystyle M_{k}^{x}(S)(\omega)$ counts how many times the word $\displaystyle\omega$ occurs in $\displaystyle x$ at a position in the set $\displaystyle{\mathbb{N}}\cap\{b^{k}s:s\in S\}$ . Then, for almost all $\displaystyle x$ with respect to the uniform measure, $\displaystyle M_{k}^{x}(\cdot)$ converges in distribution to the Poisson point process in the positive real line as $\displaystyle k$ tends to infinity, see also [1, Theorem 1]. Since

\displaystyle Z_{i,k}^{\lambda}(x)=\mu^{k}\left(\{\omega\in\Omega^{k}:M^{x}_{k}((0,\lambda])(\omega)=i\}\right)

it follows that almost all $\displaystyle x\in\Omega^{\mathbb{N}}$ with respect to the uniform measure are Poisson generic. Despite this result, no explicit example has yet been given. The following is the main result of this note and its corollary gives a construction of an explicit $\displaystyle\lambda$ -Poisson generic sequence over any alphabet and any positive $\displaystyle\lambda$ , except for the case of the two-symbol alphabet, in which it is required that $\displaystyle\lambda$ be less than or equal to the natural logarithm of $\displaystyle 2$ .

Theorem 1.

Let $\displaystyle\lambda$ be a positive real number and $\displaystyle\Omega$ a $\displaystyle b$ -symbol alphabet. Let $\displaystyle(p_{i})_{i\in\mathbb{N}_{0}}$ be a sequence of non-negative real numbers such that $\displaystyle\sum\limits_{i\geq 0}p_{i}=1$ and $\displaystyle\sum\limits_{i\geq 0}ip_{i}=\lambda$ , and let $\displaystyle p_{0}$ be greater than or equal to $\displaystyle 1/2$ if $\displaystyle b=2$ . Then, there is a construction of an infinite sequence $\displaystyle x$ over alphabet $\displaystyle\Omega$ , which satisfies for every $\displaystyle i\in\mathbb{N}_{0}$ ,

\displaystyle\lim_{k\rightarrow\infty}Z^{\lambda}_{i,k}(x)=p_{i}.

By taking $\displaystyle p_{i}=e^{-\lambda}\lambda^{i}/i!$ we obtain the promised result. In the sequel we write $\displaystyle\ln$ for the natural logarithm, namely, the logarithm in base $\displaystyle e$ .

Corollary 1.

Let $\displaystyle\Omega$ be a $\displaystyle b$ -symbol alphabet. In case $\displaystyle b=2$ , fix a positive real number $\displaystyle\lambda$ less than or equal to $\displaystyle\ln(2)$ ; otherwise fix any positive real number $\displaystyle\lambda$ . Then, there is a construction of a $\displaystyle\lambda$ -Poisson generic sequence $\displaystyle x\in\Omega^{\mathbb{N}}$ .

To prove Theorem 1 we give a construction that consists in concatenating segments of any infinite de Bruijn sequence (see Definition 3), which is a sequence that satisfies that each initial segment of length $\displaystyle b^{k}$ is a cyclic de Bruijn word of order $\displaystyle k$ [2, Theorem 1]. Our construction works by selecting segments of this given sequence and repeating them as many times as determined by the probabilities $\displaystyle p_{i}$ , for every $\displaystyle i\in\mathbb{N}_{0}$ .

Remark 1.

An infinite sequence $\displaystyle x=a_{1}a_{2}\ldots$ of symbols in a given alphabet is computable exactly when the map $\displaystyle k\mapsto a_{k}$ is computable. Since the set of computable sequences is countable, it has uniform measure $\displaystyle 0$ , so the existence of computable Poisson generic sequences does not necessarily follow from the fact that the set of Poisson generic sequences has full measure. In [1, Theorem 2] it is shown that there exist countably many Poisson generic computable sequences. Theorem 1 yields an explicit computable instance whenever $\displaystyle(p_{i})_{i\in\mathbb{N}}$ is a computable sequence of real numbers, which means that the map $\displaystyle(i,n)\mapsto$ the $\displaystyle n$ -th digit in the base- $\displaystyle b$ expansion of $\displaystyle p_{i}$ , is computable.

For the next result we consider Borel’s definition of normality for sequences of symbols in a given alphabet. An introduction to the theory of normal numbers can be read from [9, 3].

Definition 2.

Let $\displaystyle\Omega$ be a $\displaystyle b$ -symbol alphabet, $\displaystyle b\geq 2$ . A sequence $\displaystyle x\in\Omega^{{\mathbb{N}}}$ is Borel normal if every word $\displaystyle w$ occurs in $\displaystyle x$ with the same limiting frequency as every other word of the same length,

\displaystyle\lim_{n\to\infty}\frac{|x[1..n]|_{w}}{n}=b^{-|w|}.

In [11] Weiss showed that $\displaystyle 1$ -Poisson genericity implies Borel normality and that the two notions do not coincide, witnessed by the famous Champernowne sequence (footnote 3: Benjamin Weiss first presented this proof at the Institute for Advanced Study, Princeton University USA on June 16, 2010, as part of his conference on “Random-like behavior in deterministic systems”. It is available at https://www.youtube.com/watch?v=8AB7591De68&ab_channel=InstituteforAdvancedStudy. It was transcribed and completed in [4]). It is immediate to see that the infinite de Bruijn sequences (see Definition 3) are not $\displaystyle 1$ -Poisson generic either. In Theorem 2 we present a Borel normality criterion that generalizes this fact. In contrast to Theorem 1, this result has no limitations in the case of the two-symbol alphabet.

Theorem 2.

Let $\displaystyle\Omega$ be a $\displaystyle b$ -symbol alphabet, $\displaystyle b\geq 2$ , and let $\displaystyle x\in\Omega^{\mathbb{N}}$ . We fix a positive real number $\displaystyle\lambda$ and define for every $\displaystyle i\in\mathbb{N}_{0}$ the numbers $\displaystyle p_{i}=\liminf_{k\rightarrow\infty}Z_{i,k}^{\lambda}(x)$ . If the numbers $\displaystyle p_{i}$ satisfy $\displaystyle\sum_{i\geq 0}ip_{i}=\lambda$ then $\displaystyle x$ is Borel normal to base $\displaystyle b$ .

Remark 2.

It is easy to verify that if numbers $\displaystyle p_{i}$ are defined as in Theorem 2 then it is always the case that $\displaystyle\sum\limits_{i\geq 0}ip_{i}\leq\lambda$ (for example, it follows from Fatou’s Lemma).
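One way to spell out this argument, given here as a sketch: each of the $\displaystyle\lfloor\lambda b^{k}\rfloor$ starting positions in the prefix $\displaystyle x[1...\lfloor\lambda b^{k}\rfloor+k-1]$ contributes one occurrence to exactly one word of length $\displaystyle k$ , so

\displaystyle\sum_{i\geq 0}i\,Z^{\lambda}_{i,k}(x)=\frac{\lfloor\lambda b^{k}\rfloor}{b^{k}}\leq\lambda\quad\text{for every }k\in\mathbb{N},

and Fatou’s Lemma, applied to the counting measure on $\displaystyle\mathbb{N}_{0}$ , gives

\displaystyle\sum_{i\geq 0}ip_{i}=\sum_{i\geq 0}\liminf_{k\rightarrow\infty}i\,Z^{\lambda}_{i,k}(x)\leq\liminf_{k\rightarrow\infty}\sum_{i\geq 0}i\,Z^{\lambda}_{i,k}(x)\leq\lambda.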

The following is a consequence of Theorem 2.

Corollary 2.

Every $\displaystyle\lambda$ -Poisson generic sequence is Borel normal, but the two notions do not coincide. The construction in Theorem 1 yields infinitely many Borel normal sequences which are not $\displaystyle\lambda$ -Poisson generic.

2 Proof of Theorem 1

2.1 The construction

A cyclic de Bruijn word of order $\displaystyle n$ in a $\displaystyle b$ -symbol alphabet is a word $\displaystyle w$ of length $\displaystyle b^{n}$ where each word of length $\displaystyle n$ occurs exactly once in the circular word determined by $\displaystyle w$ . The classical reference is [6], but these words were also found independently by I. J. Good and by N. Korobov around the same time. Our construction is based on the following property of de Bruijn words.

Lemma 1 (Becher and Heiber [2, Theorem 1]).

1. Every cyclic de Bruijn word of order $\displaystyle n$ over an alphabet of at least three symbols can be extended to a cyclic de Bruijn word of order $\displaystyle n+1$ .
2. No de Bruijn word of order $\displaystyle n$ in two symbols can be extended to order $\displaystyle n+1$ , but every such word can be extended to order $\displaystyle n+2$ .

For example, consider the alphabet $\displaystyle\{0,1,2\}$ . Then, $\displaystyle 012110022$ is a cyclic de Bruijn word of order $\displaystyle 2$ which can be extended to $\displaystyle 012110022010200011120212221$ , which is a cyclic de Bruijn word of order $\displaystyle 3$ .
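The two words in this example can be checked mechanically. The following is a minimal sketch, assuming words are given as strings; the function name `is_cyclic_de_bruijn` is chosen only for this illustration.

```python
from itertools import product


def is_cyclic_de_bruijn(word: str, n: int, alphabet: str) -> bool:
    """True if every word of length n over the alphabet occurs exactly once
    in the circular word determined by `word`."""
    b = len(alphabet)
    if len(word) != b**n:
        return False
    # Appending the first n - 1 symbols lets us read the windows that wrap around.
    doubled = word + word[:n - 1]
    seen = {doubled[s:s + n] for s in range(len(word))}
    # There are b**n windows, so seeing all b**n words means each occurs once.
    return seen == {"".join(t) for t in product(alphabet, repeat=n)}


print(is_cyclic_de_bruijn("012110022", 2, "012"))                    # True
print(is_cyclic_de_bruijn("012110022010200011120212221", 3, "012"))  # True
```

The second word starts with the first one, which is the extension property stated in Lemma 1.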

Lemma 1 allows us to define infinite de Bruijn sequences.

Definition 3.

An infinite de Bruijn sequence in a $\displaystyle b$ -symbol alphabet $\displaystyle\Omega$ , $\displaystyle b\geq 3$ , is an infinite sequence $\displaystyle x\in\Omega^{\mathbb{N}}$ such that for every $\displaystyle k\in\mathbb{N}$ , $\displaystyle x[1...b^{k}]$ is a cyclic de Bruijn word of order $\displaystyle k$ . In the case $\displaystyle b=2$ , we say $\displaystyle x\in\Omega^{\mathbb{N}}$ is an infinite de Bruijn sequence if for every $\displaystyle k\in\mathbb{N}$ , $\displaystyle x[1...2^{2k-1}]$ is a cyclic de Bruijn word of order $\displaystyle 2k-1$ .

Given a real number $\displaystyle y\in[0,1)$ , we write $\displaystyle\{y\}_{k}$ for the truncation to $\displaystyle k$ digits of the unique base- $\displaystyle b$ representation of $\displaystyle y$ which does not end in an infinite tail of $\displaystyle(b-1)$ ’s. In the sole case $\displaystyle y=1$ , we choose the base- $\displaystyle b$ representation $\displaystyle\sum_{i\geq 1}(b-1)b^{-i}$ .
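The truncation $\displaystyle\{y\}_{k}$ can be computed exactly for rational inputs. The sketch below assumes $\displaystyle y$ is given as a Python `Fraction`; the name `truncate` is a choice made for this example.

```python
from fractions import Fraction
from math import floor


def truncate(y: Fraction, k: int, b: int) -> Fraction:
    """Truncation of y in [0, 1] to k base-b digits, as a fraction."""
    if not 0 <= y <= 1:
        raise ValueError("y must lie in [0, 1]")
    if y == 1:
        # Representation 0.(b-1)(b-1)(b-1)... is used in this sole case.
        return 1 - Fraction(1, b**k)
    # floor(y * b^k) / b^k keeps the first k digits of the expansion
    # that does not end in an infinite tail of (b-1)'s.
    return Fraction(floor(y * b**k), b**k)


print(truncate(Fraction(5, 18), 2, 3))  # 2/9
```

Here $\displaystyle 5/18=0.0211...$ in base $\displaystyle 3$ , so keeping two digits gives $\displaystyle 0.02=2/9$ .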

Construction.

Let $\displaystyle\lambda$ be a positive real number and $\displaystyle\Omega$ a $\displaystyle b$ -symbol alphabet , $\displaystyle b\geq 2$ . Let $\displaystyle(p_{i})_{i\in\mathbb{N}_{0}}$ be a sequence of non-negative real numbers such that $\displaystyle\sum\limits_{i\geq 0}p_{i}=1$ and $\displaystyle\sum\limits_{i\geq 0}ip_{i}=\lambda$ , and let $\displaystyle p_{0}\geq 1/2$ if $\displaystyle b=2$ . We define $\displaystyle g:\mathbb{N}\rightarrow\mathbb{N}$ as $\displaystyle g(k)=\left\lceil\frac{k}{2}\right\rceil$ and we define the real numbers $\displaystyle(p_{i}^{k})_{i\geq 0,k\geq 1}$ inductively as follows. For every $\displaystyle i\geq 1$ ,

\displaystyle\begin{aligned}p_{i}^{1}&=\{p_{i}\}_{g(1)},\\ p_{0}^{1}&=1-\sum_{i\geq 1}p_{i}^{1}.\end{aligned}

And for every $\displaystyle k\geq 1$ and $\displaystyle i\geq 1$ ,

\displaystyle\begin{aligned}p_{i}^{k+1}&=\frac{1}{b}p_{i}^{k}+\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)},\\ p_{0}^{k+1}&=1-\sum_{i\geq 1}p_{i}^{k+1}.\end{aligned}
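The recursion can be evaluated exactly when the $\displaystyle p_{i}$ are rational and only finitely many of them are non-zero. Both restrictions are assumptions of this sketch, not of the construction, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import floor


def trunc(y, k, b):
    """Truncation of y in [0, 1) to k base-b digits."""
    return Fraction(floor(y * b**k), b**k)


def g(k):
    """g(k) = ceil(k / 2)."""
    return (k + 1) // 2


def approximations(p, b, steps):
    """p maps each i >= 1 to p_i (finite support, each p_i < 1).
    Returns a list whose entry k - 1 maps i to p_i^k, including i = 0."""
    cur = {i: trunc(pi, g(1), b) for i, pi in p.items()}
    rows = []
    for k in range(1, steps + 1):
        if k > 1:
            cur = {i: cur[i] / b + trunc(Fraction(b - 1, b) * p[i], g(k), b)
                   for i in p}
        row = dict(cur)
        row[0] = 1 - sum(cur.values())   # p_0^k is defined by complement
        rows.append(row)
    return rows


p = {1: Fraction(1, 2), 2: Fraction(5, 18), 3: Fraction(2, 9)}
for k, row in enumerate(approximations(p, b=3, steps=6), start=1):
    print(k, [str(row[i]) for i in sorted(row)])
```

Exact fractions are used so that the truncation errors are not blurred by floating-point rounding.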

We fix an infinite de Bruijn sequence $\displaystyle A$ over the alphabet $\displaystyle\Omega$ . We define $\displaystyle A_{k}$ to be $\displaystyle A[1...b^{k}]$ .

Given a sequence $\displaystyle w$ of length $\displaystyle b^{k}$ we say $\displaystyle\delta$ is a block in $\displaystyle w$ if it is a subsequence of $\displaystyle w$ and $\displaystyle|\delta|=b^{j}\leq b^{k}$ for some $\displaystyle j\in\mathbb{N}_{0}$ . We say that a block $\displaystyle\delta$ in $\displaystyle w$ has absolute length $\displaystyle|\delta|$ and relative length $\displaystyle|\delta|b^{-k}$ with respect to $\displaystyle w$ .

The construction works by steps. Let $\displaystyle x_{k}$ be the output of the construction after Step $\displaystyle k$ . For all $\displaystyle k$ , $\displaystyle x_{k}$ is a prefix of $\displaystyle x_{k+1}$ . The output of the construction is the infinite word $\displaystyle x$ obtained as the limit of the finite words $\displaystyle x_{k}$ . Start with $\displaystyle x_{0}$ equal to the empty word.

Step 1.

In this first step, we consider the base- $\displaystyle b$ expansion of $\displaystyle p_{i}^{1}$ , for $\displaystyle i\geq 1$ ,

\displaystyle p_{i}^{1}=0.c_{i}

For each $\displaystyle c_{i}$ , $\displaystyle i\geq 1$ , we choose $\displaystyle c_{i}$ blocks of relative length $\displaystyle b^{-1}$ with respect to $\displaystyle A_{1}$ , that is, blocks of length $\displaystyle 1$ (if $\displaystyle c_{i}=0$ we don’t choose any blocks). The selected blocks should be non-overlapping. This is possible thanks to the fact that $\displaystyle\sum_{i\geq 1}p_{i}^{1}\leq 1$ . We select the blocks from left to right, leaving no gaps at the beginning or in between blocks.

The output of the construction after Step 1 is the concatenation of the chosen blocks, in any order, where for every $\displaystyle i\geq 1$ each of the $\displaystyle c_{i}$ selected blocks is repeated exactly $\displaystyle i$ times.

Step k+1.

We consider the base- $\displaystyle b$ expansion of $\displaystyle\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)}$ for $\displaystyle i\geq 1$ :

\displaystyle\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)}=0.a_{i,1}a_{i,2}\ldots a_{i,g(k+1)}

where $\displaystyle a_{i,j}\in\{0,1,2,...,b-1\}$ .

We select blocks in $\displaystyle A_{k+1}$ in the following manner: for each $\displaystyle a_{i,j}$ , $\displaystyle i\geq 1$ , $\displaystyle j\leq g(k+1)$ , we choose $\displaystyle a_{i,j}$ blocks of relative length $\displaystyle b^{-j}$ with respect to $\displaystyle A_{k+1}$ . If $\displaystyle a_{i,j}=0$ we don’t select any blocks. Notice that only finitely many blocks are selected. All the selected blocks should be non-overlapping. This is possible due to the fact that

\displaystyle\sum_{i\geq 1}\sum_{1\leq j\leq g(k+1)}a_{i,j}\frac{1}{b^{j}}=\sum_{i\geq 1}\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)}\leq\frac{b-1}{b}=\frac{|A_{k+1}[b^{k}+1...b^{k+1}]|}{b^{k+1}}

In the case $\displaystyle b\geq 3$ , we can select the blocks anywhere in $\displaystyle A_{k+1}$ , so there could be gaps between the blocks selected at step $\displaystyle k$ and the ones at step $\displaystyle k+1$ . For example, we may take blocks from $\displaystyle A_{k+1}[b^{k}+1...b^{k+1}]$ . In the case $\displaystyle b=2$ , however, we do not allow gaps.

The construction now appends the chosen blocks to $\displaystyle x_{k}$ . For every $\displaystyle i\geq 1$ , $\displaystyle j\leq g(k+1)$ , each of the $\displaystyle a_{i,j}$ selected blocks is repeated exactly $\displaystyle i$ times. We refer to each of the chosen blocks of $\displaystyle A$ as constituent segments in the output $\displaystyle x_{k+1}$ . We say that the concatenation of $\displaystyle i$ -many copies of a constituent segment corresponding to $\displaystyle a_{i,j}$ is a run segment in the output.

2.2 An example

To illustrate the way the construction works, we give an example of three steps of the execution. Just to make the example more enlightening, we set $\displaystyle g(k)=k$ in this section. Take $\displaystyle p_{0}=0$ , $\displaystyle p_{1}=1/2$ , $\displaystyle p_{2}=5/18$ , $\displaystyle p_{3}=2/9$ , and $\displaystyle p_{i}=0$ for $\displaystyle i\geq 4$ . In this case $\displaystyle\lambda=31/18$ . Now fix $\displaystyle b=3$ , $\displaystyle\Omega=\{0,1,2\}$ and

\displaystyle A=012110022010200011120212221...

$\displaystyle p_{1}=0.1111...$ and $\displaystyle\frac{2}{3}p_{1}=0.1000...$

$\displaystyle p_{2}=0.0211...$ and $\displaystyle\frac{2}{3}p_{2}=0.0120...$

$\displaystyle p_{3}=0.0200...$ and $\displaystyle\frac{2}{3}p_{3}=0.0110...$

Step 1:

\displaystyle A=\fcolorbox[gray]{0}{0.9}{0}12\bigg{|}110022\bigg{|}010200011120212221\bigg{|}...

\displaystyle x_{1}=\fcolorbox[gray]{0}{0.9}{0}

Step 2:

\displaystyle A=012\bigg{|}\fcolorbox[gray]{0}{0.9}{110}\fcolorbox[gray]{0}{0.7}{0}\fcolorbox[gray]{0}{0.5}{2}2\bigg{|}010200011120212221\bigg{|}...

\displaystyle x_{2}=0\fcolorbox[gray]{0}{0.9}{110}\fcolorbox[gray]{0}{0.7}{0}\fcolorbox[gray]{0}{0.7}{0}\fcolorbox[gray]{0}{0.5}{2}\fcolorbox[gray]{0}{0.5}{2}\fcolorbox[gray]{0}{0.5}{2}

In this case 0, 110, 0 and 2 are the constituent segments of $\displaystyle x_{2}$ .

Step 3:

\displaystyle A=012\bigg{|}110022\bigg{|}\fcolorbox[gray]{0}{0.9}{010200011}\fcolorbox[gray]{0}{0.7}{120}\fcolorbox[gray]{0}{0.5}{212}\fcolorbox[gray]{0}{0.7}{2}\fcolorbox[gray]{0}{0.7}{2}\fcolorbox[gray]{0}{0.5}{1}\bigg{|}...

\displaystyle x_{3}=011000222\fcolorbox[gray]{0}{0.9}{010200011}\fcolorbox[gray]{0}{0.7}{120}\fcolorbox[gray]{0}{0.7}{120}\fcolorbox[gray]{0}{0.5}{212}\fcolorbox[gray]{0}{0.5}{212}\fcolorbox[gray]{0}{0.5}{212}\fcolorbox[gray]{0}{0.7}{2}\fcolorbox[gray]{0}{0.7}{2}\fcolorbox[gray]{0}{0.7}{2}\fcolorbox[gray]{0}{0.7}{2}\fcolorbox[gray]{0}{0.5}{1}\fcolorbox[gray]{0}{0.5}{1}\fcolorbox[gray]{0}{0.5}{1}

In this case, 120120 is the run segment corresponding to the constituent segment 120, and 212212212 is the run segment corresponding to the constituent segment 212.
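The three steps of this example can be replayed with a short script. It is only a sketch under several assumptions: the alphabet has at least three symbols, $\displaystyle g(k)\leq k$ , the $\displaystyle p_{i}$ are rationals smaller than $\displaystyle 1$ with finite support, the blocks of step $\displaystyle k\geq 2$ are taken left to right from $\displaystyle A[b^{k-1}+1...b^{k}]$ , and larger blocks are selected first. The names `digits` and `construct` are hypothetical.

```python
from fractions import Fraction
from math import floor


def digits(y, k, b):
    """First k base-b digits of a rational y in [0, 1)."""
    out = []
    for _ in range(k):
        y *= b
        d = floor(y)
        out.append(d)
        y -= d
    return out


def construct(A, p, b, steps, g):
    """Return x_steps, reading blocks from the prefix A of a de Bruijn sequence."""
    x = ""
    for k in range(1, steps + 1):
        if k == 1:
            pos, scale = 0, Fraction(1)                       # Step 1 uses p_i itself
        else:
            pos, scale = b**(k - 1), Fraction(b - 1, b)       # fresh part of A_k
        digs = {i: digits(scale * pi, g(k), b) for i, pi in p.items()}
        for j in range(1, g(k) + 1):
            length = b**(k - j)                               # relative length b^{-j}
            for i in sorted(digs):
                for _ in range(digs[i][j - 1]):
                    x += A[pos:pos + length] * i              # run segment: i copies
                    pos += length
    return x


A = "012110022010200011120212221"
p = {1: Fraction(1, 2), 2: Fraction(5, 18), 3: Fraction(2, 9)}
print(construct(A, p, b=3, steps=3, g=lambda k: k))
```

With these choices the script reproduces the words $\displaystyle x_{1}$ , $\displaystyle x_{2}$ and $\displaystyle x_{3}$ shown above; other admissible selection orders would give different, equally valid outputs.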

2.3 Correctness

To prove the correctness of the construction we use the following fact and five lemmas.

Fact 1.

For every $\displaystyle y\in[0,1)$ and every $\displaystyle k\geq 1$ , $\displaystyle y-\frac{1}{b^{k}}<\{y\}_{k}\leq y$ . In the case $\displaystyle y=1$ , $\displaystyle\{y\}_{k}=y-\frac{1}{b^{k}}$ .

Lemma 2.

Let $\displaystyle b=2$ and $\displaystyle k\geq 2$ . Then, at step $\displaystyle k$ of the construction it is always possible to choose all necessary blocks from $\displaystyle A[1...2^{k-1}]$ .

Proof.

First of all, notice that for $\displaystyle k\geq 1$ , the relative length with respect to $\displaystyle A_{k+1}$ of the blocks we need to choose at step $\displaystyle k+1$ is

\displaystyle\sum_{i\geq 1}\sum_{1\leq j\leq g(k+1)}a_{i,j}\frac{1}{2^{j}}=\sum_{i\geq 1}\left\{\frac{1}{2}p_{i}\right\}_{g(k+1)}\leq\frac{1}{2}\sum_{i\geq 1}p_{i}=\frac{1}{2}(1-p_{0})\leq\frac{1}{4}=\frac{|A[2^{k-1}+1...2^{k}]|}{2^{k+1}},

where $\displaystyle a_{i,j}$ has the same meaning as in the construction, and we used the hypothesis $\displaystyle p_{0}\geq\frac{1}{2}$ in the last inequality.

This means that $\displaystyle A[2^{k-1}+1...2^{k}]$ has enough space to accommodate all the necessary blocks at step $\displaystyle k+1$ . Then, we only need to check that $\displaystyle A[2^{k-2}+1\dots 2^{k-1}]$ is free at step $\displaystyle k$ for every $\displaystyle k\geq 2$ . We can check it inductively. In the first step, the used proportion of $\displaystyle A_{1}$ is

\displaystyle\sum_{i\geq 1}p_{i}^{1}\leq\sum_{i\geq 1}p_{i}=1-p_{0}\leq\frac{1}{2},

where the last inequality holds because $\displaystyle p_{0}\geq 1/2$ . Then, at least half of $\displaystyle A_{1}=A[1...2]$ remains unused after step 1, so $\displaystyle A[2^{2-2}+1...2^{2-1}]=A[2...2]$ is free at step $\displaystyle 2$ . This proves the base case.

Now suppose that at step $\displaystyle k$ , $\displaystyle A[2^{k-2}+1...2^{k-1}]$ is free. Thanks to the first observation, we can choose all necessary blocks there. This leaves $\displaystyle A[2^{k-1}+1...2^{k}]$ free to use at step $\displaystyle k+1$ .

∎

Lemma 3.

For every $\displaystyle i\geq 1$ the sum of the relative lengths with respect to $\displaystyle A_{k}$ of all constituent segments in the output $\displaystyle x_{k}$ that are repeated exactly $\displaystyle i$ times is $\displaystyle p_{i}^{k}$ .

Proof.

It can easily be checked by induction on $\displaystyle k$ , using the definition of $\displaystyle p_{i}^{k}$ and the way the construction operates. If $\displaystyle k=1$ , this is immediately true by Step 1 of the construction. Assuming the statement is true for $\displaystyle k$ , let us see it is also true for $\displaystyle k+1$ . Notice that blocks that occur in $\displaystyle x_{k}$ have a relative length in $\displaystyle A_{k+1}$ which is $\displaystyle 1/b$ of their relative length in $\displaystyle A_{k}$ . The extra blocks added contribute with $\displaystyle\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)}$ to the sum. Then, the sum of the relative lengths with respect to $\displaystyle A_{k+1}$ is

\displaystyle\frac{1}{b}p_{i}^{k}+\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)}=p_{i}^{k+1}.

∎

Lemma 4.

For every $\displaystyle i\in\mathbb{N}_{0}$ , $\displaystyle\lim\limits_{k\rightarrow\infty}p_{i}^{k}=p_{i}$ . In fact, for every $\displaystyle i\geq 1$ , $\displaystyle k\geq 1$ , the following estimation holds,

\displaystyle p_{i}-\frac{k}{b^{g(k)}}\leq p_{i}^{k}\leq p_{i}.\qquad(\dagger)

Proof.

For $\displaystyle i\geq 1$ we prove ( $\displaystyle\dagger$ ) by induction on $\displaystyle k$ . If $\displaystyle k=1$ it follows immediately from the definition of $\displaystyle p_{i}^{1}$ and Fact 1. For the inductive step, notice that

\displaystyle p_{i}^{k+1}=\frac{1}{b}p_{i}^{k}+\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)}\leq\frac{1}{b}p_{i}+\frac{b-1}{b}p_{i}=p_{i}.

Moreover, by the inductive hypothesis and Fact 1,

\displaystyle\begin{aligned}p_{i}-p_{i}^{k+1}&=p_{i}-\left(\frac{1}{b}p_{i}^{k}+\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)}\right)\\ &\leq p_{i}-\frac{1}{b}\left(p_{i}-\frac{k}{b^{g(k)}}\right)-\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)}\\ &=\frac{b-1}{b}p_{i}-\left\{\frac{b-1}{b}p_{i}\right\}_{g(k+1)}+\frac{k}{b^{1+g(k)}}\\ &\leq\frac{1}{b^{g(k+1)}}+\frac{k}{b^{1+g(k)}}\\ &\leq\frac{k+1}{b^{g(k+1)}}.\end{aligned}

In the last inequality we used the fact that $\displaystyle g(k+1)\leq g(k)+1$ .

In the case of $\displaystyle i=0$ ,

\displaystyle\displaystyle|p_{0}^{k}-p_{0}|=\left|1-\sum_{i\geq 1}p_{i}^{k}-\left(1-\sum_{i\geq 1}p_{i}\right)\right|=\sum_{i\geq 1}(p_{i}-p_{i}^{k}).

Given $\displaystyle\varepsilon>0$ , there exists $\displaystyle N>0$ such that $\displaystyle\sum_{i\geq N+1}p_{i}<\frac{\varepsilon}{2}$ . Then,

\displaystyle|p_{0}^{k}-p_{0}|\leq\sum_{i=1}^{N}(p_{i}-p_{i}^{k})+\sum_{i\geq N+1}p_{i}<\frac{kN}{b^{g(k)}}+\frac{\varepsilon}{2}.

If $\displaystyle k$ is big enough, then $\displaystyle\frac{kN}{b^{g(k)}}<\frac{\varepsilon}{2}$ and $\displaystyle|p_{0}^{k}-p_{0}|<\varepsilon$ , as desired. ∎

Lemma 5.

Let $\displaystyle x_{k}$ be the word output by the construction after step $\displaystyle k$ . Then,

\displaystyle\lim_{k\rightarrow\infty}\frac{|\lfloor\lambda b^{k}\rfloor+k-1-|x_{k}||}{b^{k}}=0.

Proof.

By Lemma 3, $\displaystyle|x_{k}|=b^{k}\sum\limits_{i\geq 1}ip_{i}^{k}$ . Then,

\displaystyle\frac{|\lfloor\lambda b^{k}\rfloor+k-1-|x_{k}||}{b^{k}}\leq\frac{k-1}{b^{k}}+\frac{|\lfloor\lambda b^{k}\rfloor-\lambda b^{k}|}{b^{k}}+\left|\lambda-\sum\limits_{i\geq 1}ip_{i}^{k}\right|\leq\frac{k}{b^{k}}+\left|\lambda-\sum\limits_{i\geq 1}ip_{i}^{k}\right|.

It suffices to prove then that the last term converges to zero. Recall that $\displaystyle(p_{i})_{i\in\mathbb{N}_{0}}$ satisfies $\displaystyle\lambda=\sum\limits_{i\geq 1}ip_{i}$ . Hence,

\displaystyle\left|\lambda-\sum\limits_{i\geq 1}ip_{i}^{k}\right|=\sum\limits_{i\geq 1}i(p_{i}-p_{i}^{k}).

Given $\displaystyle\varepsilon>0$ take $\displaystyle N$ big enough so that $\displaystyle\sum\limits_{i\geq N+1}ip_{i}<\frac{\varepsilon}{2}$ . By means of Equation ( $\displaystyle\dagger$ ) of Lemma 4,

\displaystyle\sum\limits_{i\geq 1}i(p_{i}-p_{i}^{k})\leq\sum\limits_{i=1}^{N}i\frac{k}{b^{g(k)}}+\sum\limits_{i\geq N+1}ip_{i}<\frac{k}{b^{g(k)}}\frac{N(N+1)}{2}+\frac{\varepsilon}{2}.

Clearly, when $\displaystyle k$ is large enough, $\displaystyle\frac{k}{b^{g(k)}}\frac{N(N+1)}{2}<\frac{\varepsilon}{2}$ . ∎

Recall that each of the blocks of $\displaystyle A$ used at step $\displaystyle k$ is a constituent segment in the output $\displaystyle x_{k}$ , and the concatenation of $\displaystyle i$ -many copies of a constituent segment corresponding to $\displaystyle a_{i,j}$ is a run segment. Let $\displaystyle B_{k}$ be the number of run segments in the output $\displaystyle x_{k}$ .

Lemma 6.

The quantities $\displaystyle B_{k}$ satisfy

\displaystyle\lim_{k\rightarrow\infty}\frac{kB_{k}}{b^{k}}=0.

Proof.

Notice that for every $\displaystyle\ell\geq 2$ , writing $\displaystyle a_{i,j}$ for the digits used at step $\displaystyle\ell$ of the construction,

\displaystyle\begin{aligned}\frac{1}{b^{g(\ell)}}(B_{\ell}-B_{\ell-1})&=\frac{1}{b^{g(\ell)}}\sum_{i\geq 1}\sum_{j=1}^{g(\ell)}a_{i,j}\\ &\leq\sum_{i\geq 1}\sum_{j=1}^{g(\ell)}\frac{1}{b^{j}}a_{i,j}\\ &\leq\sum_{i\geq 1}\frac{b-1}{b}p_{i}\\ &\leq 1.\end{aligned}

Recall $\displaystyle g(\ell)=\left\lceil\frac{\ell}{2}\right\rceil$ . Notice

\displaystyle B_{k}=B_{1}+\sum\limits_{\ell=2}^{k}(B_{\ell}-B_{\ell-1}).

Then we obtain

\displaystyle\begin{aligned}\frac{kB_{k}}{b^{k}}&\leq\frac{k\sum\limits_{\ell=2}^{k}b^{g(\ell)}+kB_{1}}{b^{k}}\\ &\leq\frac{k\sum\limits_{\ell=0}^{k}b^{1+\ell/2}+kB_{1}}{b^{k}}\\ &\leq\frac{b(b^{1/2}-1)^{-1}k(b^{(k+1)/2}-1)+kB_{1}}{b^{k}},\end{aligned}

which converges to $\displaystyle 0$ as $\displaystyle k$ goes to infinity. ∎

Remark 3.

Notice there are many other alternatives for the function $\displaystyle g$ . For instance, any function which satisfies the following conditions is a suitable choice:

• $\displaystyle g(k)\leq k/m$ for every $\displaystyle k\in\mathbb{N}$ , where $\displaystyle m$ is a constant greater than $\displaystyle 1$ .
• $\displaystyle g(k+1)\leq g(k)+1$ for every $\displaystyle k\in\mathbb{N}$ .
• $\displaystyle\frac{k}{b^{g(k)}}\xrightarrow{k\rightarrow\infty}0$ .

The first and second conditions ensure that Lemmas 6 and 4 still hold, respectively. The third condition guarantees that $\displaystyle\lim_{k\rightarrow\infty}p_{i}^{k}=p_{i}$ . For example, $\displaystyle g(k)=\lceil\sqrt{k}\rceil$ is a possible choice, but $\displaystyle g(k)=\lceil\log_{b}(k)\rceil$ is not because the last condition fails.
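The difference between the two candidate functions can be seen numerically. The sketch below uses exact integer arithmetic; the helper names `g_sqrt`, `g_log` and `ratio` are assumptions made for this illustration.

```python
from fractions import Fraction
from math import isqrt


def g_sqrt(k: int) -> int:
    """ceil(sqrt(k)) for k >= 1, computed with integers only."""
    return isqrt(k - 1) + 1


def g_log(k: int, b: int) -> int:
    """ceil(log_b(k)) for k >= 1: the least e with b**e >= k."""
    e = 0
    while b**e < k:
        e += 1
    return e


def ratio(k: int, b: int, g_value: int) -> Fraction:
    """The quantity k / b^{g(k)} from the third condition."""
    return Fraction(k, b**g_value)


b = 3
for k in (10, 100, 1000):
    print(k, float(ratio(k, b, g_sqrt(k))), float(ratio(k, b, g_log(k, b))))
```

For the logarithmic choice the ratio stays above $\displaystyle 1/b$ , since $\displaystyle b^{\lceil\log_{b}(k)\rceil}<bk$ , so it cannot tend to $\displaystyle 0$ ; for the square-root choice it decays quickly.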

We are now ready for the proof of Theorem 1.

Proof of Theorem 1.

First we consider the case $\displaystyle i\geq 1$ and we estimate the value of $\displaystyle Z^{\lambda}_{i,k}(x)$ . Since we are only interested in the value of $\displaystyle Z^{\lambda}_{i,k}(x)$ as $\displaystyle k$ tends to infinity, by Lemma 5, it suffices to count occurrences of words in $\displaystyle x_{k}$ instead of $\displaystyle x[1...\lfloor\lambda b^{k}\rfloor+k-1]$ .

Let us see that the chosen constituent segments up to step $\displaystyle k$ don’t have any words of length $\displaystyle k$ in common. By Definition 3, if $\displaystyle b\geq 3$ , each word of length $\displaystyle k$ occurs at most once in $\displaystyle A_{k}$ read as a linear word (and exactly once cyclically), so the claim follows. If $\displaystyle b=2$ and $\displaystyle k$ is odd, then $\displaystyle A[1...2^{k}]$ is a de Bruijn word of order $\displaystyle k$ , so it does not repeat words of length $\displaystyle k$ . If $\displaystyle b=2$ and $\displaystyle k$ is even, $\displaystyle A[1...2^{k-1}]$ is a de Bruijn word of order $\displaystyle k-1$ , so it does not repeat words of length $\displaystyle k-1$ (hence, it does not repeat words of length $\displaystyle k$ either). Since up to step $\displaystyle k$ the construction has picked blocks only from $\displaystyle A[1...2^{k-1}]$ (thanks to Lemma 2), two different constituent segments share no words of length $\displaystyle k$ .

Therefore, if a constituent segment $\displaystyle w$ of relative length $\displaystyle b^{-j}$ with respect to $\displaystyle A_{k}$ and absolute length $\displaystyle b^{k-j}$ is repeated exactly $\displaystyle i$ times in $\displaystyle x_{k}$ , we may assume it contributes with $\displaystyle b^{k-j}$ words to the numerator of the counting function $\displaystyle Z^{\lambda}_{i,k}(x)$ , that is, it contributes to $\displaystyle Z^{\lambda}_{i,k}(x)$ with its relative length with respect to $\displaystyle A_{k}$ . To see this, notice that the words of length $\displaystyle k$ that could make the actual count differ from this approximation belong to one of the following groups:

1.

For each run segment corresponding to a constituent segment of length at least $\displaystyle k$ , we need to consider words that occur in between the constituent segments inside the run segment, which are at most $\displaystyle k$ different words, and words across the ending of the run segment and the beginning of the next one, which are also at most $\displaystyle k$ . Then, the number of such words is at most $\displaystyle 2k$ for each run segment.
2.

We also need to consider the number of different words in run segments composed of constituent segments of length $\displaystyle s<k$ . In this case, the first $\displaystyle s$ words of length $\displaystyle k$ in the run segment are repeated throughout the segment. There are fewer than $\displaystyle k$ extra words that occur across the end of the run segment and the beginning of the next one. Thus, there are fewer than $\displaystyle s+k<2k$ different words for each such run segment.

The error introduced by the previous assumption is then bounded by $\displaystyle 2kB_{k}$ , which by Lemma 6 becomes negligible as $\displaystyle k$ tends to infinity.

By the previous approximations, we can estimate $\displaystyle Z^{\lambda}_{i,k}(x)$ as the sum of the relative lengths of all constituent segments that occur exactly $\displaystyle i$ times in $\displaystyle x_{k}$ , which is $\displaystyle p_{i}^{k}$ by Lemma 3. We can then compute the limit of $\displaystyle Z^{\lambda}_{i,k}(x)$ as $\displaystyle k$ tends to infinity as $\displaystyle\lim\limits_{k\rightarrow\infty}p_{i}^{k}$ , which is equal to $\displaystyle p_{i}$ , thanks to Lemma 4.

To conclude, we must consider the case $\displaystyle i=0$ . We need to compute

\displaystyle Z^{\lambda}_{0,k}(x)=\frac{\#\{w\in\Omega^{k}:|x[1...\lfloor\lambda b^{k}\rfloor+k-1]|_{w}=0\}}{b^{k}}.

The numerator is equal to the total number of words of length $\displaystyle k$ minus the number of words of length $\displaystyle k$ that occur at least once. Using the same approximations as before, we estimate the ratio as the relative length of the unused portion of $\displaystyle A_{k}$ after step $\displaystyle k$ , that is,

\displaystyle 1-\sum\limits_{i\geq 1}p_{i}^{k}=p_{0}^{k}.

By Lemma 4, we know that $\displaystyle p_{0}^{k}$ converges to $\displaystyle p_{0}$ as $\displaystyle k$ goes to infinity. This concludes the proof. ∎
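The counting function $\displaystyle Z^{\lambda}_{i,k}(x)$ used throughout the proof can be evaluated directly on a finite prefix. The following is a minimal sketch, assuming the sequence is available as a Python string of sufficient length whose symbols all belong to the given alphabet; the function name `z_values` and its parameters are hypothetical and not part of the construction.

```python
from collections import Counter
from math import floor


def z_values(x, alphabet, lam, k):
    """Return a dict i -> Z_{i,k}^lam(x), computed on the prefix
    x[1 .. floor(lam * b^k) + k - 1] (1-indexed, as in the text)."""
    b = len(alphabet)
    n = floor(lam * b**k)  # number of starting positions considered
    if len(x) < n + k - 1:
        raise ValueError("the given prefix of x is too short")
    # Count every occurrence of a word of length k (overlaps allowed).
    occurrences = Counter(x[p:p + k] for p in range(n))
    # histogram[i] = number of words occurring exactly i times, for i >= 1.
    histogram = Counter(occurrences.values())
    # Words that never occur: all b^k words minus those seen at least once.
    histogram[0] = b**k - len(occurrences)
    return {i: count / b**k for i, count in histogram.items()}


if __name__ == "__main__":
    # "0011" is a cyclic de Bruijn word of order 2; unrolled it is "00110".
    print(z_values("00110", "01", 1, 2))
```

The case $\displaystyle i=0$ is obtained by complement, mirroring the argument above: the words that never occur are those not seen among the counted ones.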

Remark 4.

Our construction solves the problem of exhibiting a $\displaystyle\lambda$ -Poisson generic sequence for any fixed positive $\displaystyle\lambda$ for a $\displaystyle b$ -symbol alphabet with $\displaystyle b\geq 3$ , and for $\displaystyle\lambda\leq\ln(2)$ in case $\displaystyle b=2$ . It does not, however, allow us to generate a Poisson generic sequence. This is because we use an infinite de Bruijn sequence for the construction. Suppose $\displaystyle b\geq 3$ and we construct $\displaystyle x$ for $\displaystyle\lambda=1$ . Then the frequencies for $\displaystyle\lambda=1/b$ , $\displaystyle i\geq 1$ , satisfy,

\displaystyle\lim_{k\rightarrow\infty}Z^{1/b}_{i,k+1}(x)-\frac{1}{b}Z^{1}_{i,k}(x)=0.

But this relation does not hold in the case of the probability mass function of the Poisson distribution:

\displaystyle e^{-1/b}\frac{1}{b^{i}i!}\neq e^{-1}\frac{1}{bi!}.

In view of Remark 4 it remains unanswered if by using a different sequence $\displaystyle A$ as the source for blocks, our construction could be adapted to produce a Poisson generic sequence, over any alphabet of size $\displaystyle b\geq 2$ .
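The mismatch noted in Remark 4 can be checked numerically. The sketch below simply evaluates both sides of the displayed inequality for small values of $\displaystyle b$ and $\displaystyle i$ ; the helper names `poisson_pmf` and `remark4_sides` are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(lam, i):
    """Probability mass function of the Poisson distribution at i."""
    return exp(-lam) * lam**i / factorial(i)


def remark4_sides(b, i):
    """Return (e^{-1/b} / (b^i i!), e^{-1} / (b i!))."""
    left = poisson_pmf(1 / b, i)   # Poisson mass with parameter 1/b
    right = poisson_pmf(1, i) / b  # Poisson mass with parameter 1, scaled by 1/b
    return left, right


if __name__ == "__main__":
    for b in range(3, 6):
        for i in range(1, 4):
            left, right = remark4_sides(b, i)
            print(b, i, left, right)
```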

2.4 On the case of the two-symbol alphabet

Infinite de Bruijn sequences over an alphabet of more than two symbols satisfy, for every $\displaystyle k$ , that any two disjoint segments occurring before position $\displaystyle b^{k}$ do not share words of length $\displaystyle k$ or longer. The two-symbol alphabet does not guarantee this. Consequently, Theorem 1 solves the problem of finding $\displaystyle\lambda$ -Poisson generic sequences over the two-symbol alphabet only partially, namely for $\displaystyle\lambda\leq\ln(2)$ . To solve this problem completely it would suffice to use what we may call a quasi-de Bruijn sequence, which we define as an infinite sequence $\displaystyle x\in\Omega^{\mathbb{N}}$ that satisfies

\displaystyle\lim_{k\rightarrow\infty}Z_{1,k}^{1}(x)=1.

That is, the proportion of words of length $\displaystyle k$ that do not occur exactly once in the prefixes $\displaystyle x[1...2^{k}+k-1]$ converges to zero. It is quite immediate to see that we can run the construction of Theorem 1 but using as an input a quasi-de Bruijn sequence in the two-symbol alphabet.

Observe that the infinite de Bruijn sequences in the two-symbol alphabet are not necessarily quasi-de Bruijn because although their initial segments of length $\displaystyle 2^{2k-1}$ are cyclic de Bruijn words, the initial segments of length $\displaystyle 2^{2k}$ are not. We do not know of any construction in the two-symbol alphabet proved to be quasi-de Bruijn. There is, however, empirical evidence supporting the hypothesis that the Ehrenfeucht-Mycielski sequence [7] is indeed a quasi-de Bruijn sequence. Not much is known about this sequence: as of today, it hasn’t even been proven whether the limiting frequencies of zeros and ones are equal to $\displaystyle 1/2$ .
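An experiment of this kind could be set up as follows. This is only a sketch and not a reproduction of any computation reported here. It assumes the usual description of the Ehrenfeucht-Mycielski sequence: start with 010, find the longest suffix that also occurs earlier, take its most recent earlier occurrence, and append the complement of the bit that follows it. The function names are hypothetical, and the generator is deliberately naive, so it is suitable only for short prefixes.

```python
from collections import Counter


def ehrenfeucht_mycielski(n):
    """First n bits of the Ehrenfeucht-Mycielski sequence (assumed start: 010)."""
    s = "010"
    while len(s) < n:
        length = 1
        # Grow the suffix while it still occurs somewhere ending before the last bit.
        while length < len(s) and s.rfind(s[-(length + 1):], 0, len(s) - 1) != -1:
            length += 1
        # Most recent earlier occurrence of the longest repeated suffix.
        pos = s.rfind(s[-length:], 0, len(s) - 1)
        follower = s[pos + length]
        s += "1" if follower == "0" else "0"
    return s[:n]


def proportion_occurring_once(x, k):
    """Z_{1,k}^1(x) over the two-symbol alphabet: the fraction of words of
    length k occurring exactly once in x[1 .. 2^k + k - 1]."""
    n = 2**k
    if len(x) < n + k - 1:
        raise ValueError("the given prefix of x is too short")
    counts = Counter(x[p:p + k] for p in range(n))
    return sum(1 for c in counts.values() if c == 1) / n


if __name__ == "__main__":
    max_k = 10
    x = ehrenfeucht_mycielski(2**max_k + max_k - 1)
    for k in range(2, max_k + 1):
        print(k, proportion_occurring_once(x, k))
```

A sequence is quasi-de Bruijn when these proportions tend to $\displaystyle 1$ ; a finite table of values can suggest, but of course not prove, such a limit.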

3 Proof of Theorem 2

The proof of Theorem 2 is a simple generalization of Weiss’s proof that $\displaystyle 1$ -Poisson genericity implies Borel normality [11]. It relies on a classical result due to Pyatetskii-Shapiro in 1951.

Lemma 7 (Pyatetskii-Shapiro [3, Theorem 4.6]).

Let $\displaystyle\Omega$ be a $\displaystyle b$ -symbol alphabet, $\displaystyle b\geq 2$ . Let $\displaystyle x\in\Omega^{\mathbb{N}}$ . If there exists a positive constant $\displaystyle C$ such that for every $\displaystyle\ell\in\mathbb{N}$ and every word $\displaystyle w\in\Omega^{\ell}$ ,

\displaystyle\limsup_{N\rightarrow\infty}\frac{|x[1...N]|_{w}}{N}\leq Cb^{-\ell},

then $\displaystyle x$ is Borel normal.

Let $\displaystyle w$ be a fixed word. We define the set $\displaystyle Bad(k,w,\varepsilon)$ as the set of words of length $\displaystyle k$ where the frequency of $\displaystyle w$ differs from the expected frequency for Borel normality by more than $\displaystyle\varepsilon$ .

\displaystyle Bad(k,w,\varepsilon)=\left\{v\in\Omega^{k}:\left||v|_{w}-kb^{-|w|}\right|\geq\varepsilon k\right\}

The cardinality of the set $\displaystyle Bad(k,w,\varepsilon)$ has exponential decay in $\displaystyle k$ . This follows from Bernstein’s inequality and was proved in the early works on Borel normal numbers, such as [5]; since then, each work computes a similar upper bound.

Lemma 8.

Assume a $\displaystyle b$ -symbol alphabet. Let $\displaystyle k$ and $\displaystyle\ell$ be positive integers and let $\displaystyle\varepsilon$ be such that $\displaystyle 6/\lfloor k/\ell\rfloor\leq\varepsilon\leq 1/b^{\ell}$ . Then, for every word $\displaystyle w$ of length $\displaystyle\ell$ ,

\displaystyle|Bad(k,w,\varepsilon)|<4\ell b^{k+\ell}e^{-b^{\ell}\varepsilon^{2}k/(6\ell)}.
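For very small parameters the set $\displaystyle Bad(k,w,\varepsilon)$ can be enumerated by brute force and compared with the bound of Lemma 8. The sketch below does this under the assumption that words are represented as Python strings; the function names are hypothetical. For such small $\displaystyle k$ the bound may well exceed $\displaystyle b^{k}$ , so the comparison illustrates the definitions rather than the strength of the estimate.

```python
from itertools import product
from math import exp


def count_occurrences(v, w):
    """|v|_w: number of (possibly overlapping) occurrences of w in v."""
    return sum(1 for j in range(len(v) - len(w) + 1) if v[j:j + len(w)] == w)


def bad_set(alphabet, k, w, eps):
    """All words v of length k with | |v|_w - k b^{-|w|} | >= eps * k."""
    b = len(alphabet)
    expected = k * b ** (-len(w))
    words = ("".join(t) for t in product(alphabet, repeat=k))
    return [v for v in words if abs(count_occurrences(v, w) - expected) >= eps * k]


def lemma8_bound(b, k, ell, eps):
    """Right-hand side of Lemma 8: 4 l b^{k+l} exp(-b^l eps^2 k / (6 l))."""
    return 4 * ell * b ** (k + ell) * exp(-(b**ell) * eps**2 * k / (6 * ell))


if __name__ == "__main__":
    # b = 2, l = 1, k = 12, eps = 1/2 satisfies 6 / floor(k / l) <= eps <= 1 / b^l.
    bad = bad_set("01", 12, "1", 0.5)
    print(len(bad), lemma8_bound(2, 12, 1, 0.5))
```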

We can now give the awaited proof.

Proof of Theorem 2.

We need to prove that $\displaystyle x$ is Borel normal for the $\displaystyle b$ -symbol alphabet, given that for all $\displaystyle i\in\mathbb{N}_{0}$ ,

\displaystyle\liminf_{k\to\infty}Z^{\lambda}_{i,k}(x)=p_{i},

\displaystyle\sum_{i\geq 0}ip_{i}=\lambda.

Fix a positive real number $\displaystyle\varepsilon$ . By hypothesis we know that $\displaystyle\sum_{i\geq 0}ip_{i}=\lambda$ . Let $\displaystyle i_{0}$ be such that $\displaystyle\sum_{i>i_{0}}ip_{i}<\frac{\lambda\varepsilon}{2}.$ It follows that

\displaystyle\sum_{i=0}^{i_{0}}ip_{i}>\lambda\left(1-\frac{\varepsilon}{2}\right).

Let $\displaystyle k_{0}$ be such that for all $\displaystyle k>k_{0}$ and $\displaystyle 0\leq i\leq i_{0}$ ,

\displaystyle Z^{\lambda}_{i,k}(x)>p_{i}-\frac{\lambda\varepsilon}{2i_{0}^{2}}.

Consider the positions from $\displaystyle 1$ to $\displaystyle\lfloor\lambda b^{k}\rfloor$ in $\displaystyle x$ . We say a position is blamed if the word of length $\displaystyle k$ starting at that position occurs more than $\displaystyle i_{0}$ times in the prefix $\displaystyle x[1...\lfloor\lambda b^{k}\rfloor+k-1]$ of $\displaystyle x$ . For $\displaystyle k>k_{0}$ , we can bound the number of blamed positions between $\displaystyle 1$ and $\displaystyle\lfloor\lambda b^{k}\rfloor$ in the following way:

\displaystyle\begin{aligned}
\lfloor\lambda b^{k}\rfloor-\sum_{i=0}^{i_{0}}i\,Z^{\lambda}_{i,k}(x)b^{k}&<\lambda b^{k}-b^{k}\sum_{i=0}^{i_{0}}\left(ip_{i}-\frac{\lambda\varepsilon}{2i_{0}}\right)\\
&<\lambda b^{k}+\frac{b^{k}\lambda\varepsilon}{2}-b^{k}\sum_{i=0}^{i_{0}}ip_{i}\\
&<\lambda b^{k}+\frac{b^{k}\lambda\varepsilon}{2}-b^{k}\lambda\left(1-\frac{\varepsilon}{2}\right)\\
&<\lambda\varepsilon b^{k}.
\end{aligned}

We cover the positions from $\displaystyle 1$ to $\displaystyle\lfloor\lambda b^{k}\rfloor+k-1$ with non-overlapping words of length $\displaystyle k$ such that no word starts at a blamed position and every position that is not blamed is covered by exactly one word. We refer to these words as covering words. Notice that a covering word may contain blamed positions as long as they are not the first one.
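The notions of blamed position and covering word can be made concrete on a finite prefix. The following sketch marks the blamed positions and then builds one possible cover greedily from left to right. The greedy choice is an assumption made for illustration, since the argument only needs some cover with the stated properties. The function name is hypothetical, and positions are 0-indexed in the code whereas they are 1-indexed in the text.

```python
from collections import Counter
from math import floor


def blamed_and_cover(x, b, lam, k, i0):
    """Return (blamed, starts) for the prefix x[1 .. floor(lam * b^k) + k - 1].

    blamed[p] is True when the word of length k starting at position p occurs
    more than i0 times in the prefix; starts lists the starting positions of
    the covering words chosen greedily."""
    n = floor(lam * b**k)
    prefix = x[:n + k - 1]
    if len(prefix) < n + k - 1:
        raise ValueError("the given prefix of x is too short")
    counts = Counter(prefix[p:p + k] for p in range(n))
    blamed = [counts[prefix[p:p + k]] > i0 for p in range(n)]
    starts = []
    p = 0
    while p < n:
        if blamed[p]:
            p += 1            # a covering word never starts at a blamed position
        else:
            starts.append(p)  # cover positions p .. p + k - 1 with one word
            p += k
    return blamed, starts


if __name__ == "__main__":
    blamed, starts = blamed_and_cover("000001101", 2, 2, 2, 2)
    print(blamed, starts)
```

In this sketch the number of blamed positions equals $\displaystyle\lfloor\lambda b^{k}\rfloor$ minus the number of positions whose word occurs at most $\displaystyle i_{0}$ times, which is the quantity bounded in the preceding display.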

Occurrences of $\displaystyle w$ in $\displaystyle x[1...\lfloor\lambda b^{k}\rfloor+k-1]$ fall into one of the following categories:

•

occurrences of $\displaystyle w$ starting at a blamed position. The number of such occurrences is bounded by the number of blamed positions, which is at most $\displaystyle\lambda\varepsilon b^{k}$ .
•

occurrences of $\displaystyle w$ not fully contained in a covering word. Since there are at most $\displaystyle k^{-1}\lambda b^{k}$ covering words, the number of these occurrences is bounded by $\displaystyle|w|k^{-1}\lambda b^{k}$ .
•

occurrences of $\displaystyle w$ contained in a covering word which is in $\displaystyle Bad(k,w,\varepsilon)$ . Each covering word can occur at most $\displaystyle i_{0}$ times, and can contain at most $\displaystyle k$ occurrences of $\displaystyle w$ . Then there are at most $\displaystyle i_{0}k|Bad(k,w,\varepsilon)|$ occurrences of $\displaystyle w$ in this case. Notice that for sufficiently large $\displaystyle k$ , $\displaystyle\varepsilon\geq 6/\lfloor k/|w|\rfloor$ , so we can use the bound in Lemma 8.
•

occurrences contained in a covering word which is not in $\displaystyle Bad(k,w,\varepsilon)$ . Each such word contains at most $\displaystyle kb^{-|w|}+\varepsilon k$ occurrences of $\displaystyle w$ . Since there are at most $\displaystyle k^{-1}\lambda b^{k}$ covering words, the total number of such occurrences is at most $\displaystyle\lambda b^{k}(b^{-|w|}+\varepsilon)$ .

Combining the upper bounds for all categories and dividing by $\displaystyle\lambda b^{k}$ yields the following upper bound for the relative number of occurrences of $\displaystyle w$ in $\displaystyle x[1...\lfloor\lambda b^{k}\rfloor+k-1]$ ,

\displaystyle\frac{|x[1...\lfloor\lambda b^{k}\rfloor+k-1]|_{w}}{\lambda b^{k}}\leq\varepsilon+\frac{1}{k}|w|+\frac{4i_{0}|w|b^{|w|}}{\lambda}ke^{-\varepsilon^{2}kb^{|w|}/(6|w|)}+b^{-|w|}+\varepsilon.

Taking the limit superior we obtain

\displaystyle\limsup_{k\to\infty}\frac{|x[1...\lfloor\lambda b^{k}\rfloor+k-1]|_{w}}{\lambda b^{k}}\leq 2\varepsilon+b^{-|w|}.

Since this holds for every $\displaystyle\varepsilon\leq 1/b^{|w|}$ , it follows that

\displaystyle\limsup_{k\to\infty}\frac{|x[1...\lfloor\lambda b^{k}\rfloor+k-1]|_{w}}{\lambda b^{k}}\leq b^{-|w|}.

To show that $\displaystyle x$ is Borel normal we apply Lemma 7. Fix $\displaystyle N$ and let $\displaystyle k$ be such that $\displaystyle\lambda b^{k-1}\leq N<\lambda b^{k}$ . Then, using the bounds obtained before,

\displaystyle\limsup_{N\to\infty}\frac{|x[1...N]|_{w}}{N}\leq\limsup_{k\to\infty}\frac{|x[1...\lfloor\lambda b^{k}\rfloor+k-1]|_{w}}{\lambda b^{k-1}}\leq b^{1-|w|}.

Thus the hypothesis of Lemma 7 holds with $\displaystyle C=b$ , and we conclude that $\displaystyle x$ is Borel normal. ∎

Acknowledgements. The authors are grateful to Zeev Rudnick and to Benjamin Weiss for their comments on our investigations and for having insisted that we solve this problem. Gabriel Sac Himelfarb is supported by the student fellowship “Beca de Estímulo a las Vocaciones Científicas” convocatoria 2020, Consejo Interuniversitario Nacional, Argentina. Verónica Becher is supported by Agencia Nacional de Promoción Científica y Tecnológica grant PICT-2018-02315 and by Universidad de Buenos Aires grant Ubacyt 20020170100309BA.

References

[1] Nicolás Alvarez, Verónica Becher, and Martín Mereb. Poisson generic sequences. International Mathematics Research Notices, rnac234, 2022. DOI 10.1093/imrn/rnac234.
[2] Verónica Becher and Pablo Ariel Heiber. On extending de Bruijn sequences. Information Processing Letters, 111:930–932, 2011.
[3] Yann Bugeaud. Distribution modulo one and Diophantine approximation, volume 193 of Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge, 2012.
[4] Lucas Puterman Colomer. Very normal numbers, 2019. Tesis de Licenciatura en Ciencias de la Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires.
[5] Arthur H. Copeland and Paul Erdős. Note on normal numbers. Bulletin of the American Mathematical Society, 52:857–860, 1946.
[6] Nicolaas Govert de Bruijn. A combinatorial problem. Indagationes Mathematicae, 8:461–467, 1946.
[7] Andrzej Ehrenfeucht and Jan Mycielski. A pseudorandom sequence – How random is it? American Mathematical Monthly, 99(4):373–375, 1992.
[8] Philippe Flajolet and Robert Sedgewick. Analytic Combinatorics. Cambridge University Press, 2009.
[9] Lauwerens Kuipers and Harald Niederreiter. Uniform distribution of sequences. Dover, 2006.
[10] Zeev Rudnick and Alexandru Zaharescu. The distribution of spacings between fractional parts of lacunary sequences. Forum Mathematicum, 14(5):691–712, 2002.
[11] Benjamin Weiss. Poisson generic points. Jean-Morlet Chair 2020 - Conference: Diophantine Problems, Determinism and Randomness, Centre International de Rencontres Mathématiques, November 23 to 29, 2020. Audiovisual resource: doi:10.24350/CIRM.V.19690103.

Verónica Becher
Departamento de Computación, Facultad de Ciencias Exactas y Naturales & ICC, Universidad de Buenos Aires & CONICET, Argentina
vbecher@dc.uba.ar

Gabriel Sac Himelfarb
Departamento de Matemática & Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Argentina
gabrielsachimelfarb@gmail.com
