
Introduction to weak Pinsker filtrations

Séverin Benzoni
Abstract

We introduce the so-called weak Pinsker dynamical filtrations, whose existence in any ergodic system follows from the universality of the weak Pinsker property, recently proved by Austin [1]. These dynamical filtrations appear as a potential tool to describe and classify positive entropy systems. We explore the links between the asymptotic structure of weak Pinsker filtrations and the properties of the underlying dynamical system. A central question is whether, on a given system, the structure of weak Pinsker filtrations is unique up to isomorphism. We give a partial answer, in the case where the underlying system is Bernoulli. We conclude our work by giving two explicit examples of weak Pinsker filtrations.

1 Introduction

In 1958, Kolmogorov and Sinaï introduced the notion of entropy in ergodic theory: the Kolmogorov-Sinaï entropy (or KS-entropy). The same year, Kolmogorov introduced another important notion: K-systems. He defined a K-system as a dynamical system $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ on which there is a generator $\xi$ whose tail $\sigma$-algebra $\bigcap_{n\geq 1}\sigma(\xi_{]-\infty,-n]})$ is trivial. There is an equivalent definition that is more intrinsic to the system: $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ is a K-system if, and only if, every non-trivial observable $\xi_{0}$ satisfies $h_{\mu}(\xi,T)>0$ (a proof of this equivalence, and a more complete presentation of this notion, can be found in [4]). It is also equivalent to assume that the Pinsker factor of the system is trivial, the Pinsker factor being the $\sigma$-algebra

\Pi_{\mathbf{X}}=\{A\in\mathscr{A}\;|\;h(\mathbbm{1}_{A},T)=0\}.

The Pinsker factor is simply the largest factor of $\mathbf{X}$ that has entropy 0. Therefore, a K-system has no non-trivial factor of entropy 0: it is entirely non-deterministic. For example, the most elementary K-systems are the Bernoulli shifts. They are K-systems because i.i.d. processes satisfy Kolmogorov's 0-1 law.

Entropy is an invariant that quantifies the “chaos” of a dynamical system, or more precisely its unpredictability, and many of the questions that arose after its discovery were aimed at understanding the structure of this “chaos”. The first question, which Kolmogorov asked after proving that Bernoulli shifts are K-systems, was whether all K-systems are Bernoulli shifts, which would imply that these chaotic systems have a very simple structure. More general questions then emerged, and we will return to them in the following paragraphs.

The discovery of entropy first led to non-isomorphism results, particularly for Bernoulli shifts: two isomorphic Bernoulli shifts must have the same entropy. The converse of this result, shown by Ornstein [11, 12], is one of the most notable successes of the KS-entropy. But the ramifications of Ornstein’s theory go far beyond Bernoulli shifts, and have had a profound impact on the evolution of ergodic theory. We will confine ourselves here to telling the story of the weak Pinsker property.

In the early 1960s, Pinsker, then working in Moscow with Kolmogorov, showed that any K-factor of $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ is independent of the Pinsker factor $\Pi_{\mathbf{X}}$ (see [19], but this reference is in Russian). Following this result, although the existence of any specific K-factor had not yet been proved, he issued a conjecture (later called the "Pinsker conjecture"): any system of non-zero entropy is isomorphic to the direct product of its Pinsker factor and a K-system. A few years later, Sinaï published [23], which seemed to confirm this conjecture: he proved the existence of a factor of $\mathbf{X}$ isomorphic to a Bernoulli shift of the same entropy as $\mathbf{X}$. Given Pinsker's independence result, it would have been sufficient to prove that the factor constructed by Sinaï and the Pinsker factor generate the entire $\sigma$-algebra $\mathscr{A}$ to obtain a result even stronger than Pinsker's conjecture: $\mathbf{X}$ would then be isomorphic to the direct product of its Pinsker factor and a Bernoulli shift. This "strong Pinsker conjecture" would also have proved that any K-system is isomorphic to a Bernoulli shift.

But this conjecture turned out to be false: Ornstein published a first example of a non-Bernoulli K-system [15], which contradicts the strong Pinsker conjecture. Many other counterexamples were then built, showing that the family of all K-systems is very broad and leaving little hope for a complete classification of those systems. Among these counterexamples is a construction by Ornstein [13] that can be used to contradict Pinsker's conjecture. He then refined this result by constructing a mixing system that does not satisfy Pinsker's conjecture [14]. Thus, all the conjectures formulated in the early years of the study of KS-entropy were wrong, revealing a wide variety of possible phenomena.

One of the ramifications of Ornstein's theory can be found in the work of Thouvenot, who, starting in 1975, became interested in relatively Bernoulli systems and developed a "relative" version of Ornstein's theory. Following this work, in his 1977 paper [24], he introduced the weak Pinsker property: for any $\varepsilon>0$, $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ is isomorphic to the direct product of a Bernoulli shift $\mathbf{B}$ and a system $\mathbf{X}_{\varepsilon}$ of entropy at most $\varepsilon$:

\mathbf{X}\cong\mathbf{X}_{\varepsilon}\otimes\mathbf{B}. \qquad(1)

For four decades, however, it was unclear whether all systems verified the weak Pinsker property. But in 2018, Austin published a paper on the subject [1] in which he proved that all ergodic systems satisfy the weak Pinsker property.

We can then iterate this splitting operation: take $(\varepsilon_{n})_{n\leq-1}$ an increasing sequence of positive numbers such that $\lim_{n\rightarrow-\infty}\varepsilon_{n}=0$, and start by splitting $\mathbf{X}$ into

\mathbf{X}\cong\mathbf{X}_{\varepsilon_{-1}}\otimes\mathbf{B}_{-1},

then split $\mathbf{X}_{\varepsilon_{-1}}$ into

\mathbf{X}_{\varepsilon_{-1}}\cong\mathbf{X}_{\varepsilon_{-2}}\otimes\mathbf{B}_{-2},

and so on. This yields a sequence of systems $(\mathbf{X}_{\varepsilon_{n}})_{n\leq-1}$ such that, for every $n\leq-2$, $\mathbf{X}_{\varepsilon_{n}}$ is a factor of $\mathbf{X}_{\varepsilon_{n+1}}$. Composing the factor maps shows that each $\mathbf{X}_{\varepsilon_{n}}$ is a factor of $\mathbf{X}$, and therefore generates a $T$-invariant $\sigma$-algebra $\mathscr{F}_{n}:=\sigma(\mathbf{X}_{\varepsilon_{n}})\subset\mathscr{A}$. Because of our iterative construction, we see that $\mathscr{F}_{n}\subset\mathscr{F}_{n+1}$, so the sequence $\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0}$, where we set $\mathscr{F}_{0}:=\mathscr{A}$, is a filtration. This is what we call a weak Pinsker filtration on $\mathbf{X}$ (see Definition 2.20).

The purpose of this paper is to discuss the links between weak Pinsker filtrations and the structure of dynamical systems with positive entropy. Weak Pinsker filtrations fall into the category of dynamical filtrations, i.e. filtrations $\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0}$ on a dynamical system for which each $\sigma$-algebra $\mathscr{F}_{n}$ is $T$-invariant. A framework for the study of such filtrations was introduced in [10], and it will guide our approach to weak Pinsker filtrations. In Section 2.2, we introduce the necessary concepts from the theory of dynamical filtrations. This framework focuses on the various possible structures of filtrations whose tail $\sigma$-algebra $\bigcap_{n\leq 0}\mathscr{F}_{n}$ is trivial, which is the type of weak Pinsker filtration that appears on K-systems (see Theorem 2.23). Therefore, the study of weak Pinsker filtrations we suggest is mainly aimed at classifying K-systems, and in particular non-Bernoulli K-systems.

In Section 2, we give an overview of the results and open questions that arise from the study of the properties of weak Pinsker filtrations, and their relation to the structure of the underlying dynamical system. One of those questions concerns the uniqueness, up to isomorphism, of weak Pinsker filtrations. In Section 3, we give a partial answer to this uniqueness problem in the case of Bernoulli systems. That section is based on ideas suggested to us by Thouvenot. The main result of this paper is Theorem 3.1. Finally, in Section 4, we give explicit examples of weak Pinsker filtrations, in order to give a more concrete meaning to all of those abstract notions.

2 Weak Pinsker filtrations and related questions

In this section, we introduce the notion of weak Pinsker filtrations and the tools necessary to study them, and we state some of the main questions concerning those filtrations. Since weak Pinsker filtrations are dynamical filtrations, we will use the framework for classifying dynamical filtrations presented in Section 2.2 below.

2.1 Basic notation

A dynamical system is a quadruple $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ such that $(X,\mathscr{A},\mu)$ is a Lebesgue probability space and $T$ is an invertible measure-preserving transformation.

Let $\mathscr{B},\mathscr{C}\subset\mathscr{A}$ be sub-$\sigma$-algebras. We write $\mathscr{B}\subset\mathscr{C}$ mod $\mu$ if, for every $B\in\mathscr{B}$, there exists $C\in\mathscr{C}$ such that $\mu(B\Delta C)=0$. Then $\mathscr{B}=\mathscr{C}$ mod $\mu$ if $\mathscr{B}\subset\mathscr{C}$ mod $\mu$ and $\mathscr{C}\subset\mathscr{B}$ mod $\mu$. We denote by $\mathscr{B}\vee\mathscr{C}$ the smallest $\sigma$-algebra that contains $\mathscr{B}$ and $\mathscr{C}$. We say that $\mathscr{B}$ is a factor $\sigma$-algebra (or $T$-invariant $\sigma$-algebra) if $T^{-1}(\mathscr{B})=\mathscr{B}$ mod $\mu$. Let $\mathscr{B}$, $\mathscr{C}$ and $\mathscr{D}$ be sub-$\sigma$-algebras of $\mathscr{A}$. We say that $\mathscr{B}$ and $\mathscr{C}$ are relatively independent over $\mathscr{D}$ if, for any $\mathscr{B}$-measurable bounded function $B$ and any $\mathscr{C}$-measurable bounded function $C$,

\mathbb{E}[BC\,|\,\mathscr{D}]=\mathbb{E}[B\,|\,\mathscr{D}]\;\mathbb{E}[C\,|\,\mathscr{D}]\quad\text{almost surely}.

In this case, we write $\mathscr{B}\perp\!\!\!\perp_{\mathscr{D}}\mathscr{C}$. If $\mathscr{D}$ is trivial, $\mathscr{B}$ and $\mathscr{C}$ are independent, which we denote $\mathscr{B}\perp\!\!\!\perp\mathscr{C}$.
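Relative independence can be checked concretely on a finite probability space. The following sketch is our own toy illustration (the distribution and names are assumptions, not taken from the paper): we build observables that are conditionally independent over $\sigma(Z)$ and verify the defining product formula for conditional expectations.

```python
from itertools import product

# Toy space: omega = (z, b, c), where Z is a fair coin and, conditionally on Z = z,
# B and C are independent Bernoulli variables with a z-dependent parameter.
p_given = {0: 0.2, 1: 0.7}  # P(B = 1 | Z = z) = P(C = 1 | Z = z) (assumed values)

mu = {}
for z, b, c in product((0, 1), repeat=3):
    pb = p_given[z] if b == 1 else 1 - p_given[z]
    pc = p_given[z] if c == 1 else 1 - p_given[z]
    mu[(z, b, c)] = 0.5 * pb * pc

def cond_exp(f, z0):
    """E[f | Z = z0], computed directly on the finite space."""
    mass = sum(p for (z, b, c), p in mu.items() if z == z0)
    return sum(f(z, b, c) * p for (z, b, c), p in mu.items() if z == z0) / mass

# E[BC | Z] = E[B | Z] E[C | Z] on every atom of sigma(Z): B and C are
# relatively independent over sigma(Z).
for z0 in (0, 1):
    lhs = cond_exp(lambda z, b, c: b * c, z0)
    rhs = cond_exp(lambda z, b, c: b, z0) * cond_exp(lambda z, b, c: c, z0)
    assert abs(lhs - rhs) < 1e-12
```

Note that $B$ and $C$ are not independent here ($\mathbb{E}[BC]=0.265\neq 0.2025=\mathbb{E}[B]\,\mathbb{E}[C]$), so relative independence over $\mathscr{D}$ is genuinely weaker than plain independence.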

If we have two systems $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ and $\mathbf{Y}:=(Y,\mathscr{B},\nu,S)$, a factor map is a measurable map $\pi:X\longrightarrow Y$ such that $\pi_{*}\mu=\nu$ and $\pi\circ T=S\circ\pi$ $\mu$-almost surely. If such a map exists, we say that $\mathbf{Y}$ is a factor of $\mathbf{X}$, and we denote by $\sigma(\pi):=\pi^{-1}(\mathscr{B})$ the $\sigma$-algebra generated by $\pi$. Conversely, we also say that $\mathbf{X}$ is an extension of $\mathbf{Y}$, or that $\mathbf{Y}$ is embedded in $\mathbf{X}$. Moreover, if there exist invariant sets $X_{0}\subset X$ and $Y_{0}\subset Y$ of full measure such that $\pi:X_{0}\longrightarrow Y_{0}$ is a bijection, then $\pi$ is an isomorphism and we write $\mathbf{X}\cong\mathbf{Y}$.

For a given factor $\sigma$-algebra $\mathscr{B}$, the quadruple $(X,\mathscr{B},\mu,T)$ is in general not a dynamical system, since $\mathscr{B}$ need not separate points on $X$, in which case $(X,\mathscr{B},\mu)$ is not a Lebesgue probability space. However, there always exist a dynamical system $\mathbf{Y}$ and a factor map $\pi:\mathbf{X}\longrightarrow\mathbf{Y}$ such that $\sigma(\pi)=\mathscr{B}$ mod $\mu$. This representation is not unique, but for a given factor $\mathscr{B}$, all such systems $\mathbf{Y}$ are isomorphic, and there is a canonical construction yielding a system $\mathbf{X}/\mathscr{B}$ and a factor map $p_{\mathscr{B}}:\mathbf{X}\longrightarrow\mathbf{X}/\mathscr{B}$ such that $\sigma(p_{\mathscr{B}})=\mathscr{B}$ mod $\mu$ (see [6, Chapter 2, Section 2]).

2.2 Dynamical filtrations

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a dynamical system. A dynamical filtration is a pair $(\mathscr{F},T)$ such that $\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0}$ is a filtration in discrete negative time (i.e. $\mathscr{F}_{n}\subset\mathscr{F}_{n+1}$) on $\mathscr{A}$, and each $\mathscr{F}_{n}$ is $T$-invariant. The theory of dynamical filtrations was initiated by Paul Lanthier in [9, 10]. For our present work, the primary notion we need is a precise definition of what it means for two dynamical filtrations to be isomorphic:

Definition 2.1.

Let $(\mathscr{F},T_{1})$ be a dynamical filtration on $\mathbf{X}_{1}:=(X_{1},\mathscr{A}_{1},\mu_{1},T_{1})$ and $(\mathscr{G},T_{2})$ a dynamical filtration on $\mathbf{X}_{2}:=(X_{2},\mathscr{A}_{2},\mu_{2},T_{2})$. We say that $(\mathscr{F},T_{1})$ and $(\mathscr{G},T_{2})$ are isomorphic if there is an isomorphism $\Phi:\mathbf{X}_{1}/\mathscr{F}_{0}\rightarrow\mathbf{X}_{2}/\mathscr{G}_{0}$ such that, for all $n\leq 0$, $\Phi(\mathscr{F}_{n})=\mathscr{G}_{n}$ mod $\mu_{2}$.

We will discuss two specific families of filtrations:

Definition 2.2.

Let $(\mathscr{F},T)$ be a dynamical filtration on $\mathbf{X}:=(X,\mathscr{A},\mu,T)$. It is of product type if there is a sequence $(\mathscr{C}_{n})_{n\leq 0}$ of mutually independent factor $\sigma$-algebras such that

\forall n\leq 0,\quad\mathscr{F}_{n}=\bigvee_{k\leq n}\mathscr{C}_{k}\;\text{ mod }\mu.
Definition 2.3.

Let $(\mathscr{F},T)$ be a dynamical filtration on $\mathbf{X}:=(X,\mathscr{A},\mu,T)$. It is Kolmogorovian if its tail $\sigma$-algebra is trivial, i.e. if $\bigcap_{n\leq 0}\mathscr{F}_{n}=\{\varnothing,X\}$ mod $\mu$.

In particular, because of Kolmogorov's 0-1 law, product type filtrations are Kolmogorovian.

In the theory of dynamical filtrations, additional properties, such as standardness and I-cosiness, appear naturally (see [10]). However, for now, we have not found relevant applications of those notions in the study of weak Pinsker filtrations, so we do not discuss them in this paper. That being said, since standardness and I-cosiness are weaker properties than being of product type, a first step in investigating further the examples of Section 4 could be to determine whether they are standard, I-cosy, or neither.

2.3 Reminders on KS-entropy

The notion of entropy first appeared in mathematics in 1948, introduced by Shannon in his foundational work on information theory [21]. It is defined as follows:

Definition 2.4 (Shannon entropy).

Let $(X,\mathscr{A},\mu)$ be a probability space and $\xi:X\rightarrow A$ a random variable, with $A$ finite or countable. The Shannon entropy of $\xi$ is

H_{\mu}(\xi):=-\sum_{a\in A}\mu(\{\xi=a\})\cdot\log\mu(\{\xi=a\}).

The number $H_{\mu}(\xi)$ gives the average amount of information given by the random variable $\xi$. If we have a probability measure $\rho$ defined directly on $A$, we can also define the entropy of that measure:

H(\rho):=-\sum_{a\in A}\rho(a)\cdot\log\rho(a).
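As a concrete companion to this formula, here is a minimal computation of $H(\rho)$ for a toy distribution of our own choosing (natural logarithm, with the usual convention $0\log 0=0$):

```python
import math

def shannon_entropy(rho):
    """H(rho) = -sum_a rho(a) log rho(a), with the convention 0 log 0 = 0."""
    return sum(-p * math.log(p) for p in rho.values() if p > 0)

uniform = {a: 1 / 4 for a in "abcd"}
print(shannon_entropy(uniform))     # log(4): the uniform law maximizes entropy on 4 letters
print(shannon_entropy({"a": 1.0}))  # 0.0: a deterministic variable carries no information
```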

One can also define conditional entropy: for two random variables $\chi_{1}:X\rightarrow Y_{1}$ and $\chi_{2}:X\rightarrow Y_{2}$, we define

H_{\mu}(\chi_{1}\,|\,\chi_{2}):=\sum_{y_{2}\in Y_{2}}\mu(\chi_{2}=y_{2})\sum_{y_{1}\in Y_{1}}\varphi(\mu(\chi_{1}=y_{1}\,|\,\chi_{2}=y_{2})),

where $\varphi(x)=-x\cdot\log(x)$. This quantifies the missing information required to determine $\chi_{1}$ once $\chi_{2}$ is known. We refer to [3, Chapter 2, Section 6] for the basic properties of this notion. In the present work, conditional entropy will only serve as a computational tool, via Fano's inequality (see [3, Theorem 6.2] for a proof).

Lemma 2.5 (Fano’s inequality).

Let $\chi_{1}:X\rightarrow A$ and $\chi_{2}:X\rightarrow A$ be two $A$-valued random variables. Set $d:=\mu(\chi_{1}\neq\chi_{2})$. We have

H_{\mu}(\chi_{1}\,|\,\chi_{2})\leq\varphi(d)+\varphi(1-d)+d\log(\#A-1).

In particular, for $\varepsilon\in\,]0,e^{-1}[$, if $\chi$ is an $A$-valued random variable such that there exists $a_{0}\in A$ satisfying $\mu(\chi=a_{0})\geq 1-\varepsilon$, then

H_{\mu}(\chi)\leq\varepsilon(1+\log(\varepsilon^{-1})+\log(\#A-1)).
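Fano's inequality can be verified numerically on a toy joint law (our own construction, not from the paper): take $\chi_{2}$ uniform on a three-letter alphabet and $\chi_{1}$ a noisy copy of it, equal to $\chi_{2}$ with probability $1-d$ and uniform on the remaining letters otherwise. For this symmetric noise the bound is in fact attained.

```python
import math
from itertools import product

phi = lambda x: 0.0 if x == 0 else -x * math.log(x)

A = (0, 1, 2)
d = 0.15  # d = mu(chi1 != chi2)
# Joint law: chi2 uniform on A; given chi2, chi1 = chi2 with probability 1 - d,
# otherwise uniform on the #A - 1 remaining letters.
joint = {(y1, y2): ((1 - d) if y1 == y2 else d / (len(A) - 1)) / len(A)
         for y1, y2 in product(A, A)}

# Conditional entropy H(chi1 | chi2), computed from the definition.
H_cond = 0.0
for y2 in A:
    p2 = sum(joint[(y1, y2)] for y1 in A)
    H_cond += p2 * sum(phi(joint[(y1, y2)] / p2) for y1 in A)

fano_bound = phi(d) + phi(1 - d) + d * math.log(len(A) - 1)
assert H_cond <= fano_bound + 1e-12  # Fano's inequality (here with equality)
```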

In 1958, Kolmogorov and Sinaï used entropy to quantify the randomness of measure-preserving dynamical systems.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a dynamical system. To any random variable $\xi_{0}:X\rightarrow A$, with $A$ finite, we associate the corresponding $T$-process $\xi:X\rightarrow A^{\mathbb{Z}}$:

\xi:=(\xi_{n})_{n\in\mathbb{Z}}:=(\xi_{0}\circ T^{n})_{n\in\mathbb{Z}}.

Also, for $F\subset\mathbb{Z}$, set $\xi_{F}:=(\xi_{n})_{n\in F}$.

The Kolmogorov-Sinaï entropy (or KS-entropy) of a dynamical system is:

Definition 2.6 (Kolmogorov-Sinaï entropy).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a dynamical system. For a finite-valued random variable $\xi_{0}:X\rightarrow A$, define

h_{\mu}(\xi,T):=\lim_{n\rightarrow\infty}\frac{1}{n}H_{\mu}(\xi_{\llbracket 0,n\rrbracket}).

For a $T$-invariant $\sigma$-algebra $\mathscr{B}\subset\mathscr{A}$, define

h_{\mu}(\mathscr{B},T):=\sup\{h_{\mu}(\xi,T)\,;\;\xi_{0}\text{ a }\mathscr{B}\text{-measurable random variable}\}.

Finally, set

h(\mathbf{X}):=h_{\mu}(\mathscr{A},T).
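The limit defining $h_{\mu}(\xi,T)$ can be watched converge on a toy example (our own illustration, with assumed transition values): for the $T$-process generated by a stationary two-state Markov chain, the exact block entropies satisfy $H_{\mu}(\xi_{[0,n[})=H_{\mu}(\xi_{0})+(n-1)H_{\mu}(\xi_{1}\,|\,\xi_{0})$, so $\frac{1}{n}H_{\mu}(\xi_{[0,n[})$ decreases to the entropy rate $H_{\mu}(\xi_{1}\,|\,\xi_{0})$.

```python
import math
from itertools import product

phi = lambda x: 0.0 if x == 0 else -x * math.log(x)

P = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}  # transition probabilities (assumed values)
pi = {0: 0.8, 1: 0.2}                            # stationary law: pi P = pi

def block_entropy(n):
    """Exact Shannon entropy of the block (xi_0, ..., xi_{n-1})."""
    H = 0.0
    for w in product((0, 1), repeat=n):
        p = pi[w[0]]
        for a, b in zip(w, w[1:]):
            p *= P[a][b]
        H += phi(p)
    return H

rate = sum(pi[a] * phi(P[a][b]) for a in (0, 1) for b in (0, 1))  # H(xi_1 | xi_0)
print([round(block_entropy(n) / n, 4) for n in (1, 4, 8, 12)])  # → [0.5004, 0.4211, 0.4079, 0.4035]
print(round(rate, 4))                                           # → 0.3947
```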

The KS-entropy satisfies the following continuity result:

Lemma 2.7.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a dynamical system and $\xi_{0}:X\rightarrow A$ a random variable, with $A$ finite. For every $\varepsilon>0$, there exists $\delta>0$ such that, for any random variable $\zeta_{0}:X\rightarrow A$ satisfying $\mu(\zeta_{0}\neq\xi_{0})\leq\delta$, we have

|h_{\mu}(\xi,T)-h_{\mu}(\zeta,T)|\leq\varepsilon.
Proof.

In this proof, we will use Fano’s inequality (Lemma 2.5). Specifically, we compute:

h_{\mu}(\xi,T)=\lim_{n\rightarrow\infty}\frac{1}{n}H_{\mu}(\xi_{[0,n[})\leq\lim_{n\rightarrow\infty}\frac{1}{n}H_{\mu}((\xi\vee\zeta)_{[0,n[})
\leq\lim_{n\rightarrow\infty}\frac{1}{n}\left(H_{\mu}(\zeta_{[0,n[})+\sum_{j=0}^{n-1}H_{\mu}(\xi_{j}\,|\,\zeta_{[0,n[})\right)
\leq\lim_{n\rightarrow\infty}\frac{1}{n}\left(H_{\mu}(\zeta_{[0,n[})+\sum_{j=0}^{n-1}H_{\mu}(\xi_{j}\,|\,\zeta_{j})\right)
\leq h_{\mu}(\zeta,T)+H_{\mu}(\xi_{0}\,|\,\zeta_{0})
\leq h_{\mu}(\zeta,T)+\varphi(d)+\varphi(1-d)+d\log(\#A-1),

where $\varphi(x)=-x\cdot\log(x)$ and $d:=\mu(\xi_{0}\neq\zeta_{0})$; the last bound is Fano's inequality. Since $\varphi$ is continuous and vanishes at $0$, there exists $\delta>0$ such that, if $d\leq\delta$, we have

h_{\mu}(\xi,T)\leq h_{\mu}(\zeta,T)+\varepsilon.

By exchanging the roles of $\xi$ and $\zeta$ in the same reasoning, we obtain the reverse inequality, which ends the proof. ∎

It is useful to locate the deterministic aspects of a dynamical system. We do that by considering the Pinsker factor of the system. For any factor $\sigma$-algebra $\mathscr{B}$, we define

\Pi_{\mathscr{B}}=\{A\in\mathscr{B}\;|\;h(\mathbbm{1}_{A},T)=0\}.

The Pinsker factor of the system $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ is then defined as $\Pi_{\mathbf{X}}:=\Pi_{\mathscr{A}}$. We will use the following basic result, which can be found in [18, Theorem 14]:

Lemma 2.8.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a dynamical system and $\mathscr{B}$ and $\mathscr{C}$ be independent factor $\sigma$-algebras. We have

\Pi_{\mathscr{B}\vee\mathscr{C}}=\Pi_{\mathscr{B}}\vee\Pi_{\mathscr{C}}.

To be able to compute the entropy of a system, the following result proves to be most useful.

Theorem 2.9 (Kolmogorov-Sinaï).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a dynamical system. Consider a finite-valued random variable $\xi_{0}:X\rightarrow A$ and the corresponding $T$-process $\xi:=(\xi_{0}\circ T^{n})_{n\in\mathbb{Z}}$. Then we have

h_{\mu}(\sigma(\xi),T)=h_{\mu}(\xi,T).

In particular, if $\xi$ is a generator of $\mathscr{A}$ (i.e. $\mathscr{A}=\sigma(\xi)$ mod $\mu$), then $h(\mathbf{X})=h_{\mu}(\xi,T)$.

2.4 Ornstein’s theory and its relative version

From the definition, one easily sees that KS-entropy is invariant under isomorphism of dynamical systems, which makes it a useful tool in the classification of measure preserving dynamical systems. The most remarkable classification results concern Bernoulli and relatively Bernoulli systems:

Definition 2.10 (Bernoulli and relatively Bernoulli).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a dynamical system.

We say that $\mathbf{X}$ (or $\mathscr{A}$) is Bernoulli if there exists a random variable $\xi_{0}:X\rightarrow A$ such that the corresponding $T$-process $\xi:=(\xi_{0}\circ T^{n})_{n\in\mathbb{Z}}$ is i.i.d. and generates $\mathscr{A}$, i.e. we have $\sigma(\xi)=\mathscr{A}$ mod $\mu$.

Let $\mathscr{B}\subset\mathscr{A}$ be a factor $\sigma$-algebra. We say that $\mathbf{X}$ (or $\mathscr{A}$) is relatively Bernoulli over $\mathscr{B}$ if there is an i.i.d. process of the form $\xi:=(\xi_{0}\circ T^{n})_{n\in\mathbb{Z}}$ such that $\sigma(\xi)$ is independent of $\mathscr{B}$ and $\mathscr{A}=\mathscr{B}\vee\sigma(\xi)$ mod $\mu$.

Those two definitions coincide when $\mathscr{B}$ is the trivial factor $\sigma$-algebra: $\mathbf{X}$ is relatively Bernoulli over $\{\varnothing,X\}$ if and only if $\mathbf{X}$ is Bernoulli.

Remark 2.11.

We can consider another approach to define Bernoulli systems: take $A$ a finite or countable set and $\rho$ a probability measure on $A$. On the product probability space $(A^{\mathbb{Z}},\rho^{\otimes\mathbb{Z}})$, consider the transformation

S:(a_{n})_{n\in\mathbb{Z}}\mapsto(a_{n+1})_{n\in\mathbb{Z}}.

The map $S$ is called the shift on $A^{\mathbb{Z}}$. One can easily check that $\rho^{\otimes\mathbb{Z}}$ is $S$-invariant. Therefore, this yields a measure-preserving dynamical system

\mathbf{B}:=(A^{\mathbb{Z}},\rho^{\otimes\mathbb{Z}},S), \qquad(2)

which is called a Bernoulli shift. A system is then Bernoulli if and only if it is isomorphic to a Bernoulli shift. Similarly, we can see that a system $\mathbf{X}$ is relatively Bernoulli over a factor $\sigma$-algebra $\mathscr{B}$ if and only if $\mathbf{X}$ is isomorphic to a system of the form $\mathbf{Y}\otimes\mathbf{B}$ via a factor map $\varphi:\mathbf{X}\longrightarrow\mathbf{Y}\times\mathbf{B}$ such that $\sigma(\pi_{\mathbf{Y}}\circ\varphi)=\mathscr{B}$ mod $\mu$ (where $\pi_{\mathbf{Y}}$ is the projection of $\mathbf{Y}\otimes\mathbf{B}$ onto $\mathbf{Y}$).
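The $S$-invariance of $\rho^{\otimes\mathbb{Z}}$ can be checked on cylinder sets, which determine the measure. Below is a small sketch (the alphabet and the values of $\rho$ are our own toy choices): on a finite window of coordinates, the law of $(x_{0},\dots,x_{L-1})$ coincides with the law of $(x_{1},\dots,x_{L})$, which is exactly invariance under the shift on cylinders.

```python
from itertools import product
from math import prod

rho = {"a": 0.5, "b": 0.3, "c": 0.2}  # toy probability measure on A
L = 3

# Marginal of rho^{tensor Z} on the coordinates 0, ..., L: a product measure.
window = {w: prod(rho[x] for x in w) for w in product(rho, repeat=L + 1)}

def marginal(positions):
    """Push the window measure onto the given coordinates."""
    out = {}
    for w, p in window.items():
        key = tuple(w[i] for i in positions)
        out[key] = out.get(key, 0.0) + p
    return out

law_before = marginal(range(0, L))     # law of (x_0, ..., x_{L-1})
law_after = marginal(range(1, L + 1))  # law of (x_1, ..., x_L), i.e. after applying S
assert all(abs(law_before[w] - law_after[w]) < 1e-12 for w in law_before)
```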

Using Theorem 2.9, it is easy to compute the entropy of a Bernoulli system. Let $\xi$ be an i.i.d. process on $\mathbf{X}$ that generates $\mathscr{A}$. We then have

h(\mathbf{X})=h_{\mu}(\mathscr{A},T)=h_{\mu}(\xi,T)=\lim_{n\rightarrow\infty}\frac{1}{n}H_{\mu}(\xi_{\llbracket 0,n\rrbracket})=\lim_{n\rightarrow\infty}\frac{1}{n}\sum_{i=0}^{n}H_{\mu}(\xi_{i})=H_{\mu}(\xi_{0}).

In particular, if $\mathbf{X}$ is isomorphic to a system of the form (2), we get

h(\mathbf{X})=h(\mathbf{B})=-\sum_{a\in A}\rho(a)\log(\rho(a)).
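This computation can be mirrored numerically (our own toy illustration, with an assumed biased-coin distribution): under the product measure, the block entropy of $n$ coordinates is exactly $n\cdot H(\rho)$, so the normalized limit defining $h(\mathbf{B})$ is just $H(\rho)$.

```python
import math
from itertools import product

phi = lambda x: 0.0 if x == 0 else -x * math.log(x)
rho = {0: 0.7, 1: 0.3}  # assumed biased-coin distribution

H_rho = sum(phi(p) for p in rho.values())

def block_entropy(n):
    """Entropy of n i.i.d. coordinates under rho^{tensor n}."""
    return sum(phi(math.prod(rho[a] for a in w)) for w in product(rho, repeat=n))

for n in (1, 2, 5, 10):
    assert abs(block_entropy(n) - n * H_rho) < 1e-9  # additivity: H_n = n H(rho)
```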

Since two isomorphic systems must have the same entropy, this computation yields a non-isomorphism result between any two Bernoulli systems of different entropies. Remarkably, Ornstein proved that the converse is also true:

Theorem 2.12 (Ornstein [11, 12]).

If $\mathbf{X}$ and $\mathbf{Y}$ are Bernoulli systems such that $h(\mathbf{X})=h(\mathbf{Y})$, then $\mathbf{X}\cong\mathbf{Y}$.

This means that the KS-entropy gives a complete classification of Bernoulli systems. An outstanding result that emerged from Ornstein's theory was a criterion characterizing Bernoulli systems: finite determination. However, although this notion is useful for proving abstract results, when studying a given system it is not easy to know whether or not it is finitely determined. Because of that, another criterion, called very weak Bernoullicity, was developed (see [16, Section 7]). This is the criterion we are interested in.

For the remainder of this section, we assume that the processes are defined on finite alphabets. We first need a technical definition. Given a finite alphabet $A$, an integer $\ell\geq 1$ and two words $\mathbf{a},\mathbf{b}\in A^{\ell}$ of length $\ell$ on $A$, we define the normalized Hamming metric between $\mathbf{a}$ and $\mathbf{b}$ as

d_{\ell}(\mathbf{a},\mathbf{b}):=\frac{1}{\ell}\#\{i\in\llbracket 1,\ell\rrbracket\,|\,a_{i}\neq b_{i}\},

where $\mathbf{a}=(a_{1},\dots,a_{\ell})$ and $\mathbf{b}=(b_{1},\dots,b_{\ell})$. We then consider the corresponding transportation metric on $\mathscr{P}(A^{\ell})$:

\forall\mu,\nu\in\mathscr{P}(A^{\ell}),\;\bar{d}_{\ell}(\mu,\nu):=\inf\left\{\int d_{\ell}(\mathbf{a},\mathbf{b})\,d\lambda(\mathbf{a},\mathbf{b})\,;\;\lambda\text{ a coupling of }\mu\text{ and }\nu\right\}.
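For $\ell=1$, the Hamming cost is a 0-1 cost, and a classical coupling argument shows that the infimum is attained by the maximal coupling: $\bar{d}_{1}(\mu,\nu)$ equals the total variation distance. The sketch below (the toy distributions are our own) computes $d_{\ell}$ from its definition and $\bar{d}_{1}$ via this identity.

```python
def d_hamming(a, b):
    """Normalized Hamming metric d_l between two words of the same length."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b)) / len(a)

def dbar_1(mu, nu):
    """For l = 1, inf over couplings of P(a != b) = total variation distance."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(a, 0.0) - nu.get(a, 0.0)) for a in support)

print(d_hamming("abab", "abba"))  # → 0.5 (two mismatches out of four letters)
mu = {"a": 0.5, "b": 0.5}
nu = {"a": 0.3, "b": 0.2, "c": 0.5}
print(round(dbar_1(mu, nu), 10))  # → 0.5
```

For $\ell>1$ the infimum is a small linear program over couplings; the identity above only covers the one-letter case.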

Then a process $\xi$ is said to be very weak Bernoulli if, for some $\ell\geq 1$, the conditional law of $\xi_{[0,\ell[}$ given the past of $\xi$ is close enough to the law of $\xi_{[0,\ell[}$ in the $\bar{d}_{\ell}$ metric. More formally, we state:

Definition 2.13 (Very weak Bernoulli).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be an ergodic dynamical system, equipped with a $T$-process $\xi$ taking values in a finite alphabet. We say that $\xi$ is very weak Bernoulli if, for every $\varepsilon>0$, there exists $\ell\geq 1$ such that, for every $m\geq 1$, we have

\int\bar{d}_{\ell}\left(\nu_{\ell}(\cdot\,|\,\mathbf{a}_{[-m,0[}),\nu_{\ell}(\cdot)\right)d\nu(\mathbf{a})\leq\varepsilon,

where $\nu$ is the law of $\xi$ and, for $I\subset\mathbb{Z}$, $\nu_{\ell}(\cdot\,|\,\mathbf{a}_{I})$ is the conditional law of $\xi_{[0,\ell[}$ given that $\xi_{I}$ equals $\mathbf{a}_{I}$.

If $\mathscr{A}=\sigma(\xi)$, we say that $\mathbf{X}$ (or $\mathscr{A}$) is very weak Bernoulli.

The fact that very weak Bernoullicity characterizes Bernoulli systems can be stated as follows:

Theorem 2.14 (see [16, 17]).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a dynamical system. A $T$-process $\xi$ on $\mathbf{X}$ is very weak Bernoulli if and only if $\sigma(\xi)$ is Bernoulli.

Following the work of Ornstein, Thouvenot studied relatively Bernoulli systems and adapted the definitions of finite determination and very weak Bernoullicity to get criteria that characterize relatively Bernoulli systems. Here we give his adaptation of very weak Bernoullicity:

Definition 2.15 (Relatively very weak Bernoulli).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be an ergodic dynamical system, equipped with two $T$-processes $\xi$ and $\eta$ with finite alphabets. We say that $\xi$ is relatively very weak Bernoulli over $\eta$ if, for every $\varepsilon>0$, there exists $\ell\geq 1$ such that, for every $m\geq 1$ and all $k\geq 1$ large enough, we have

\int\bar{d}_{\ell}\left(\nu_{\ell}(\cdot\,|\,\mathbf{a}_{[-m,0[},\mathbf{b}_{[-k,k]}),\nu_{\ell}(\cdot\,|\,\mathbf{b}_{[-k,k]})\right)d\nu(\mathbf{a},\mathbf{b})\leq\varepsilon,

where $\nu$ is the law of $(\xi,\eta)$ and, for $I,J\subset\mathbb{Z}$, $\nu_{\ell}(\cdot\,|\,\mathbf{a}_{I},\mathbf{b}_{J})$ is the conditional law of $\xi_{[0,\ell[}$ given that $\xi_{I}$ equals $\mathbf{a}_{I}$ and that $\eta_{J}$ equals $\mathbf{b}_{J}$.

If $\mathscr{A}=\sigma(\xi)$ and $\mathscr{B}=\sigma(\eta)$, we say that $\mathbf{X}$ (or $\mathscr{A}$) is relatively very weak Bernoulli over $\mathscr{B}$.

Many early results from Thouvenot’s theory were stated for relatively finitely determined systems. However, for our work, relative very weak Bernoullicity is a more convenient notion. Fortunately, we have the following equivalence, which enables us to apply to relatively very weak Bernoulli processes results originally stated for relatively finitely determined processes:

Theorem 2.16 (see [20]).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be an ergodic system and $\xi$ and $\eta$ be $T$-processes with finite alphabets defined on $\mathbf{X}$. Then $\xi$ is relatively very weak Bernoulli over $\eta$ if and only if it is relatively finitely determined over $\eta$.

We give a summary of the results we will use:

Lemma 2.17.

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be a finite entropy dynamical system and $\mathscr{B}$ a factor $\sigma$-algebra. Let $\xi$ and $\eta$ be $T$-processes with finite alphabets defined on $\mathbf{X}$ such that $\mathscr{A}=\sigma(\xi)$ and $\mathscr{B}=\sigma(\eta)$. If $\xi$ is relatively very weak Bernoulli over $\eta$, then

  (i) $\mathbf{X}$ is relatively Bernoulli over $\mathscr{B}$,

  (ii) any $T$-process $\rho$ on $\mathbf{X}$ is relatively very weak Bernoulli over $\eta$,

  (iii) for any factor $\sigma$-algebra $\mathscr{C}\subset\mathscr{A}$, $\mathscr{B}\vee\mathscr{C}$ is relatively very weak Bernoulli over $\mathscr{B}$,

  (iv) any factor $\sigma$-algebra $\mathscr{C}\subset\mathscr{A}$ that is independent of $\mathscr{B}$ is Bernoulli.

Proof.

We prove the lemma mainly by referring to the literature. Statement (i) follows from [26, Proposition 5] and Theorem 2.16. Then (ii) follows from [26, Proposition 4] and Theorem 2.16, and (iii) follows from (ii). Let us prove (iv): take $\rho$ a process on $\mathbf{X}$ such that $\mathscr{C}=\sigma(\rho)$ mod $\mu$. From (ii), we know that $\rho$ is relatively very weak Bernoulli over $\eta$. Moreover, since $\mathscr{C}$ is independent of $\mathscr{B}$, $\rho$ is independent of $\eta$. One can then notice that, once this independence is taken into account in the definition of relative very weak Bernoullicity, it follows that $\rho$ is very weak Bernoulli. Finally, Theorem 2.14 tells us that $\mathscr{C}=\sigma(\rho)$ is Bernoulli. ∎

We have just given many definitions and results concerning processes with finite alphabets and the $\sigma$-algebras they generate. The following result of Krieger tells us that they are applicable on any finite entropy system:

Theorem 2.18 (see [8]).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be an ergodic dynamical system and $\mathscr{B}\subset\mathscr{A}$ be a factor $\sigma$-algebra. If $h_{\mu}(\mathscr{B},T)<\infty$, there exist a finite alphabet $A$ and a random variable $\xi_{0}:X\rightarrow A$ such that

\mathscr{B}=\sigma(\{\xi_{0}\circ T^{n}\}_{n\in\mathbb{Z}})\text{ mod }\mu.

We then say that the process $\xi:=(\xi_{0}\circ T^{n})_{n\in\mathbb{Z}}$ is a finite generator of $\mathscr{B}$.

2.5 Positive entropy systems and weak Pinsker filtrations

In 2018, Austin proved the following:

Theorem 2.19 (Austin, 2018, [1]).

Let $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ be an ergodic dynamical system. For every $\varepsilon>0$, there exists a factor $\sigma$-algebra $\mathscr{B}$ such that:

  • $h_{\mu}(\mathscr{B},T)\leq\varepsilon$,

  • $\mathbf{X}$ is relatively Bernoulli over $\mathscr{B}$.

In other words, $\mathbf{X}$ has the weak Pinsker property (as in (1)).

Definition 2.20.

Let 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T) be a dynamical system and :=(n)n0\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0} a dynamical filtration on 𝐗\mathbf{X} such that 0=𝒜\mathscr{F}_{0}=\mathscr{A}. We say that \mathscr{F} is a weak Pinsker filtration if

  • for every n1n\leq-1, n+1\mathscr{F}_{n+1} is relatively Bernoulli over n\mathscr{F}_{n},

  • and

    limnhμ(n,T)=0.\lim_{n\rightarrow-\infty}h_{\mu}(\mathscr{F}_{n},T)=0.

Then, by iterating Austin’s theorem, we see that we can obtain weak Pinsker filtrations on any ergodic system:

Proposition 2.21.

Let 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T) be a dynamical system. If 𝐗\mathbf{X} is ergodic, there exists a weak Pinsker filtration on 𝐗\mathbf{X}. More specifically, for every increasing sequence (εn)n1(\varepsilon_{n})_{n\leq-1} such that ε1h(𝐗)\varepsilon_{-1}\leq h(\mathbf{X}) and limnεn=0\lim_{n\rightarrow-\infty}\varepsilon_{n}=0, there exists a weak Pinsker filtration (n)n0(\mathscr{F}_{n})_{n\leq 0} such that n1\forall n\leq-1, hμ(n,T)=εnh_{\mu}(\mathscr{F}_{n},T)=\varepsilon_{n}.

This simply tells us that weak Pinsker filtrations exist, but gives no explicit description. To start understanding those filtrations better, we can first link them to the Pinsker factor of the system:

Proposition 2.22.

Let 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T) be a dynamical system and :=(n)n0\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0} a weak Pinsker filtration on 𝐗\mathbf{X}. Then the tail σ\sigma-algebra :=n0n\mathscr{F}_{-\infty}:=\bigcap_{n\leq 0}\mathscr{F}_{n} is the Pinsker factor of 𝐗\mathbf{X}.

Proof.

Let :=(n)n0\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0} be a weak Pinsker filtration on 𝐗\mathbf{X}. Since, for n00n_{0}\leq 0, n0\mathscr{F}_{-\infty}\subset\mathscr{F}_{n_{0}}, it follows that hμ(,T)hμ(n0,T)h_{\mu}(\mathscr{F}_{-\infty},T)\leq h_{\mu}(\mathscr{F}_{n_{0}},T). Then, by taking n0n_{0}\rightarrow-\infty, this yields hμ(,T)=0h_{\mu}(\mathscr{F}_{-\infty},T)=0. Therefore, Π𝐗\mathscr{F}_{-\infty}\subset\Pi_{\mathbf{X}}.

Conversely, let us show that, for every n0n\leq 0, Π𝐗n\Pi_{\mathbf{X}}\subset\mathscr{F}_{n}. Since \mathscr{F} is a weak Pinsker filtration, we can choose n𝒜\mathscr{B}_{n}\subset\mathscr{A} a Bernoulli factor σ\sigma-algebra such that

nn and nn=𝒜 mod μ.\mathscr{F}_{n}\perp\!\!\!\perp\mathscr{B}_{n}\;\text{ and }\;\mathscr{F}_{n}\vee\mathscr{B}_{n}=\mathscr{A}\text{ mod }\mu.

Then we use Lemma 2.8:

Π𝐗=Π𝒜=ΠnΠn=Πnn,\Pi_{\mathbf{X}}=\Pi_{\mathscr{A}}=\Pi_{\mathscr{F}_{n}}\vee\Pi_{\mathscr{B}_{n}}=\Pi_{\mathscr{F}_{n}}\subset\mathscr{F}_{n},

because, n\mathscr{B}_{n} being Bernoulli, its Pinsker factor is trivial. ∎

Weak Pinsker filtrations are dynamical filtrations, and in Section 2.2, we introduced tools to classify dynamical filtrations, which we use here. While trying to connect the properties of a weak Pinsker filtration with the properties of the underlying system, we get the following simple results:

Theorem 2.23.

Let 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T) be a dynamical system and :=(n)n0\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0} be a weak Pinsker filtration on 𝐗\mathbf{X}. Then

  1. (i)

    𝐗\mathbf{X} is a K-system if and only if \mathscr{F} is Kolmogorovian, i.e. n0n={,X}\bigcap_{n\leq 0}\mathscr{F}_{n}=\{\varnothing,X\} mod μ\mu.

  2. (ii)

    If the filtration \mathscr{F} is of product-type, then 𝐗\mathbf{X} is Bernoulli.

Proof.

We know that a system is K if and only if its Pinsker factor is trivial. Then the equivalence in (i) follows from Proposition 2.22.

We now prove (ii). Assume that \mathscr{F} is a weak Pinsker filtration of product type. This means that there exists a sequence (n)n0(\mathscr{B}_{n})_{n\leq 0} of mutually independent factor σ\sigma-algebras such that n=knk\mathscr{F}_{n}=\bigvee_{k\leq n}\mathscr{B}_{k}. Let n0n\leq 0. We know that n\mathscr{F}_{n} is relatively Bernoulli over n1\mathscr{F}_{n-1} and that n\mathscr{B}_{n} is independent of n1\mathscr{F}_{n-1}. So, Lemma 2.17 tells us that n\mathscr{B}_{n} is Bernoulli. Therefore, we have 𝒜=0=k0k\mathscr{A}=\mathscr{F}_{0}=\bigvee_{k\leq 0}\mathscr{B}_{k}, which shows that we can write 𝒜\mathscr{A} as a product of mutually independent Bernoulli factors. Hence, 𝒜\mathscr{A} is Bernoulli. ∎

However, this result leaves many open questions. First, we can ask whether the converse of (ii) is true. As we remark at the end of Section 2.6, on a Bernoulli shift there is at least one weak Pinsker filtration of product type, so the converse of (ii) is equivalent to the uniqueness problem given in Question 2.27. Another open direction is to consider other properties from the theory of dynamical filtrations, like standardness or I-cosiness, and ask what it implies about the system when a weak Pinsker filtration has those properties:

Question 2.24.

What can we say about 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T) if there is a weak Pinsker filtration \mathscr{F} on 𝐗\mathbf{X} that is standard? In that case, is 𝐗\mathbf{X} Bernoulli? And what if the weak Pinsker filtration is I-cosy?

Our hope is that answering those questions could give additional information on the structure of non-Bernoulli K-systems. For precise definitions of standardness and I-cosiness, see [10] or [2].

2.6 The uniqueness problem

Let 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T) be an ergodic dynamical system. As mentioned in Proposition 2.21, the fact that every ergodic system satisfies the weak Pinsker property implies that, for any given increasing sequence (εn)n1(\varepsilon_{n})_{n\leq-1} that goes to 0 at -\infty and such that ε1h(𝐗)\varepsilon_{-1}\leq h(\mathbf{X}), there exists a weak Pinsker filtration \mathscr{F} on 𝐗\mathbf{X} such that hμ(n,T)=εnh_{\mu}(\mathscr{F}_{n},T)=\varepsilon_{n}. But this filtration is not unique. Indeed, in the splitting result given by the weak Pinsker property (1), the choice of the factor σ\sigma-algebra generated by 𝐗ε\mathbf{X}_{\varepsilon} is not unique. For example, take a system of the form

𝐗:=𝐙𝐁1𝐁2,\mathbf{X}:=\mathbf{Z}\otimes\mathbf{B}_{1}\otimes\mathbf{B}_{2},

where 𝐙\mathbf{Z} is a 0 entropy system and 𝐁1\mathbf{B}_{1} and 𝐁2\mathbf{B}_{2} are Bernoulli shifts of equal entropy. Note that 𝐙𝐁1\mathbf{Z}\otimes\mathbf{B}_{1} and 𝐙𝐁2\mathbf{Z}\otimes\mathbf{B}_{2} generate two different factor σ\sigma-algebras on 𝐗\mathbf{X}. But they are both factors over which 𝐗\mathbf{X} is relatively Bernoulli, and they have the same entropy. However, we can notice in this example that 𝐙𝐁1\mathbf{Z}\otimes\mathbf{B}_{1} and 𝐙𝐁2\mathbf{Z}\otimes\mathbf{B}_{2} are isomorphic. This observation hints to a general result:

Theorem 2.25 (From Thouvenot in [25]).

Let 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T) and 𝐘:=(Y,,ν,S)\mathbf{Y}:=(Y,\mathscr{B},\nu,S) be ergodic dynamical systems and 𝐁\mathbf{B} be a Bernoulli shift of finite entropy. If 𝐗𝐁\mathbf{X}\otimes\mathbf{B} and 𝐘𝐁\mathbf{Y}\otimes\mathbf{B} are isomorphic, then 𝐗\mathbf{X} and 𝐘\mathbf{Y} are isomorphic.

Proof.

This proof relies on the weak Pinsker property of 𝐗\mathbf{X} and 𝐘\mathbf{Y}, and on Lemma 2.17. We also use repeatedly the fact that Bernoulli shifts with the same entropy are isomorphic.

Since 𝐗𝐁\mathbf{X}\otimes\mathbf{B} and 𝐘𝐁\mathbf{Y}\otimes\mathbf{B} are isomorphic, we have:

h(𝐗)=h(𝐗𝐁)h(𝐁)=h(𝐘𝐁)h(𝐁)=h(𝐘).\displaystyle h(\mathbf{X})=h(\mathbf{X}\otimes\mathbf{B})-h(\mathbf{B})=h(\mathbf{Y}\otimes\mathbf{B})-h(\mathbf{B})=h(\mathbf{Y}).

Set a:=h(𝐗)=h(𝐘)a:=h(\mathbf{X})=h(\mathbf{Y}). We can then apply the weak Pinsker property of 𝐗\mathbf{X} and 𝐘\mathbf{Y} to find two systems 𝐗^\hat{\mathbf{X}}, 𝐘^\hat{\mathbf{Y}} and a Bernoulli shift 𝐁^\hat{\mathbf{B}} such that

h(𝐗^)=h(𝐘^)a/3,h(\hat{\mathbf{X}})=h(\hat{\mathbf{Y}})\leq a/3,

and

𝐗𝐗^𝐁^ and 𝐘𝐘^𝐁^.\mathbf{X}\cong\hat{\mathbf{X}}\otimes\hat{\mathbf{B}}\;\text{ and }\;\mathbf{Y}\cong\hat{\mathbf{Y}}\otimes\hat{\mathbf{B}}.

This implies

𝐗^(𝐁^𝐁)𝐘^(𝐁^𝐁).\hat{\mathbf{X}}\otimes(\hat{\mathbf{B}}\otimes\mathbf{B})\cong\hat{\mathbf{Y}}\otimes(\hat{\mathbf{B}}\otimes\mathbf{B}).

In other words, there is a system 𝐙\mathbf{Z} and two factor maps p𝐗^:𝐙𝐗^p_{\hat{\mathbf{X}}}:\mathbf{Z}\longrightarrow\hat{\mathbf{X}} and p𝐘^:𝐙𝐘^p_{\hat{\mathbf{Y}}}:\mathbf{Z}\longrightarrow\hat{\mathbf{Y}} such that 𝐙\mathbf{Z} is relatively Bernoulli over p𝐗^p_{\hat{\mathbf{X}}} and relatively Bernoulli over p𝐘^p_{\hat{\mathbf{Y}}}. But then, Lemma 2.17 tells us that the factor σ\sigma-algebra σ(p𝐗^p𝐘^)\sigma(p_{\hat{\mathbf{X}}}\vee p_{\hat{\mathbf{Y}}}) is relatively very weak Bernoulli over p𝐗^p_{\hat{\mathbf{X}}} and relatively very weak Bernoulli over p𝐘^p_{\hat{\mathbf{Y}}}. Therefore, there exist a Bernoulli shift 𝐁~\tilde{\mathbf{B}} and two factor maps φ1:𝐙𝐁~\varphi_{1}:\mathbf{Z}\longrightarrow\tilde{\mathbf{B}} and φ2:𝐙𝐁~\varphi_{2}:\mathbf{Z}\longrightarrow\tilde{\mathbf{B}} such that φ1p𝐗^\varphi_{1}\perp\!\!\!\perp p_{\hat{\mathbf{X}}}, φ2p𝐘^\varphi_{2}\perp\!\!\!\perp p_{\hat{\mathbf{Y}}} and

σ(p𝐗^φ1)=σ(p𝐗^p𝐘^)=σ(p𝐘^φ2).\sigma(p_{\hat{\mathbf{X}}}\vee\varphi_{1})=\sigma(p_{\hat{\mathbf{X}}}\vee p_{\hat{\mathbf{Y}}})=\sigma(p_{\hat{\mathbf{Y}}}\vee\varphi_{2}).

This implies that

𝐗^𝐁~𝐘^𝐁~.\hat{\mathbf{X}}\otimes\tilde{\mathbf{B}}\cong\hat{\mathbf{Y}}\otimes\tilde{\mathbf{B}}.

But, since we chose to have h(𝐗^)=h(𝐘^)a/3h(\hat{\mathbf{X}})=h(\hat{\mathbf{Y}})\leq a/3, we get

h(𝐁~)h(p𝐗^p𝐘^)h(𝐗^)+h(𝐘^)2a/3h(𝐁^).\displaystyle h(\tilde{\mathbf{B}})\leq h(p_{\hat{\mathbf{X}}}\vee p_{\hat{\mathbf{Y}}})\leq h(\hat{\mathbf{X}})+h(\hat{\mathbf{Y}})\leq 2a/3\leq h(\hat{\mathbf{B}}).

Given a last Bernoulli shift 𝐁¯\overline{\mathbf{B}} of entropy h(𝐁^)h(𝐁~)h(\hat{\mathbf{B}})-h(\tilde{\mathbf{B}}) we get 𝐁^𝐁~𝐁¯\hat{\mathbf{B}}\cong\tilde{\mathbf{B}}\otimes\overline{\mathbf{B}} and

𝐗𝐗^𝐁^𝐗^𝐁~𝐁¯𝐘^𝐁~𝐁¯𝐘^𝐁^𝐘.\displaystyle\mathbf{X}\cong\hat{\mathbf{X}}\otimes\hat{\mathbf{B}}\cong\hat{\mathbf{X}}\otimes\tilde{\mathbf{B}}\otimes\overline{\mathbf{B}}\cong\hat{\mathbf{Y}}\otimes\tilde{\mathbf{B}}\otimes\overline{\mathbf{B}}\cong\hat{\mathbf{Y}}\otimes\hat{\mathbf{B}}\cong\mathbf{Y}.

As a consequence of this result, we see that if :=(n)n0\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0} and 𝒢:=(𝒢n)n0\mathscr{G}:=(\mathscr{G}_{n})_{n\leq 0} are two weak Pinsker filtrations on 𝐗\mathbf{X} such that, for all n0n\leq 0, hμ(n,T)=hμ(𝒢n,T)h_{\mu}(\mathscr{F}_{n},T)=h_{\mu}(\mathscr{G}_{n},T), then, for each n0n\leq 0, the factor systems 𝐗/n\mathbf{X}/\mathscr{F}_{n} and 𝐗/𝒢n\mathbf{X}/\mathscr{G}_{n} must be isomorphic.

However, this only gives “local isomorphisms”, and it does not necessarily mean that the filtrations \mathscr{F} and 𝒢\mathscr{G} are isomorphic (according to the notion of isomorphism introduced in Definition 2.1). Therefore, the following is still an open question:

Question 2.26.

Let 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T) be an ergodic dynamical system. Are all weak Pinsker filtrations on 𝐗\mathbf{X} with the same entropy isomorphic?

This question is what we call the uniqueness problem.

If 𝐗\mathbf{X} is a Bernoulli shift, and if we take an increasing sequence (εn)n0(\varepsilon_{n})_{n\leq 0} such that ε0=h(𝐗)\varepsilon_{0}=h(\mathbf{X}) and limnεn=0\lim_{n\rightarrow-\infty}\varepsilon_{n}=0, we can take Bernoulli shifts (𝐁n)n0(\mathbf{B}_{n})_{n\leq 0} such that h(𝐁n)=εnεn1h(\mathbf{B}_{n})=\varepsilon_{n}-\varepsilon_{n-1}, and define the system

𝐁:=n0𝐁n.\mathbf{B}:=\bigotimes_{n\leq 0}\mathbf{B}_{n}.

It is a Bernoulli shift of entropy ε0=h(𝐗)\varepsilon_{0}=h(\mathbf{X}), so it is isomorphic to 𝐗\mathbf{X}. Through this isomorphism, the factors of the form kn𝐁k\bigotimes_{k\leq n}\mathbf{B}_{k} generate a product type weak Pinsker filtration on 𝐗\mathbf{X}. Therefore, in the case where 𝐗\mathbf{X} is a Bernoulli shift, the uniqueness problem becomes:

Question 2.27.

Let 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T) be a Bernoulli shift. Are all weak Pinsker filtrations on 𝐗\mathbf{X} of product type?

3 Uniqueness problem on Bernoulli systems

In this section, we present our efforts to tackle Question 2.27. The ideas developed here come from discussions with Jean-Paul Thouvenot, and we thank him for those insights. Specifically, we are going to show:

Theorem 3.1.

Let 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T) be a Bernoulli system and let :=(n)n0\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0} be a weak Pinsker filtration. There exists some sub-sequence (nk)k0(\mathscr{F}_{n_{k}})_{k\leq 0} which is a weak Pinsker filtration of product type.

The fact that we are only able to describe the structure of a sub-sequence of \mathscr{F}, for now, seems to be significant. Indeed, we can compare that result to a well-known result from Vershik about static filtrations on a probability space: any filtration whose tail σ\sigma-algebra n0n\bigcap_{n\leq 0}\mathscr{F}_{n} is trivial has a sub-sequence that is standard (see [5, Theorem 3]). However, there are many examples of non-standard filtrations with trivial tail σ\sigma-algebra. Therefore, although the context of Vershik’s result is very different, it emphasizes that Theorem 3.1 does not give a complete answer to Question 2.27.

The main step in proving Theorem 3.1 is contained in the following proposition:

Proposition 3.2.

Let 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T) be a Bernoulli system of finite entropy and 𝒫0:XA\mathcal{P}_{0}:X\rightarrow A a finite generator of 𝒜\mathscr{A}, i.e. a finite valued random variable such that 𝒜=σ({𝒫0Tn}n)\mathscr{A}=\sigma(\{\mathcal{P}_{0}\circ T^{n}\}_{n\in\mathbb{Z}}). For every ε>0\varepsilon>0, there exists δ>0\delta>0 such that, if 𝒜\mathscr{H}\subset\mathscr{A} is a factor σ\sigma-algebra such that 𝐗\mathbf{X} is relatively Bernoulli over \mathscr{H}, and if hμ(,T)δh_{\mu}(\mathscr{H},T)\leq\delta, there is a Bernoulli factor σ\sigma-algebra \mathscr{B} such that

  1. (i)

    \mathscr{B}\perp\!\!\!\perp\mathscr{H},

  2. (ii)

    𝒜=\mathscr{A}=\mathscr{H}\vee\mathscr{B} mod μ\mu,

  3. (iii)

    and 𝒫0ε\mathcal{P}_{0}\preceq_{\varepsilon}\mathscr{B}.

In this proposition, Krieger’s theorem (Theorem 2.18) ensures the existence of a finite generator 𝒫0\mathcal{P}_{0} since 𝐗\mathbf{X} has finite entropy. The notation “𝒫0ε\mathcal{P}_{0}\preceq_{\varepsilon}\mathscr{B}”, which we use many times below, means that there exists a \mathscr{B}-measurable random variable 𝒬0\mathcal{Q}_{0} such that μ(𝒫0𝒬0)ε\mu(\mathcal{P}_{0}\neq\mathcal{Q}_{0})\leq\varepsilon.

The existence of a Bernoulli factor satisfying (i) and (ii) is simply the definition of relative Bernoullicity; the important part of this proposition is the ability to build a Bernoulli complement that also satisfies (iii). Iterating this result will then yield Theorem 3.1 (see Section 3.3).

3.1 The technical lemma

In this section, we tackle the main technical and constructive part of the proof of Proposition 3.2. It is contained in Lemma 3.7.

In Section 2.4, we introduced the notion of very weak Bernoullicity, which gives a characterization of Bernoulli systems. Here, we use another equivalent notion: extremality, due to Thouvenot (see [27, Definition 6.3]).

Definition 3.3.

Let 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T) be an ergodic dynamical system and ξ:=(ξ0Tn)n\xi:=(\xi_{0}\circ T^{n})_{n\in\mathbb{Z}} be a process where ξ0\xi_{0} takes values in some finite alphabet AA. We say that ξ\xi is extremal if, for every ε>0\varepsilon>0, there exist δ>0\delta>0 and NN\in\mathbb{N}, such that for every N\ell\geq N and every random variable 𝒬:XB\mathcal{Q}:X\rightarrow B with #B2δ\#B\leq 2^{\delta\ell}, there is a set B0BB_{0}\subset B such that μ(𝒬B0)1ε\mu(\mathcal{Q}\in B_{0})\geq 1-\varepsilon and for bB0b\in B_{0}, we have:

d¯(ν(|b),ν())ε,\bar{d_{\ell}}(\nu_{\ell}(\cdot\,|\,b),\nu_{\ell}(\cdot))\leq\varepsilon,

where ν\nu_{\ell} is the law of ξ[0,[\xi_{[0,\ell[} and ν(|b)\nu_{\ell}(\cdot\,|\,b) is the law of ξ[0,[\xi_{[0,\ell[} given that 𝒬\mathcal{Q} equals bb.

In [27, Theorem 6.4], it is shown that extremality is equivalent to very weak Bernoullicity (and hence to Bernoullicity). In particular, we will use the fact that any process defined on a Bernoulli system is extremal.
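To fix ideas about the d̄\bar{d} distance used above: for a single coordinate (=1\ell=1), d¯1\bar{d}_{1} between two laws on a finite alphabet is the minimal probability of disagreement over all couplings, which equals their total variation distance. The following sketch is purely illustrative (the toy laws are our own choice, not from the text):

```python
# Sketch (toy values are ours): for one coordinate, the d-bar distance between
# two laws on a finite alphabet is the minimal P(X != Y) over all couplings
# (X, Y), which equals the total variation distance computed below.
def dbar1(law1, law2):
    support = set(law1) | set(law2)
    return sum(abs(law1.get(b, 0.0) - law2.get(b, 0.0)) for b in support) / 2

assert dbar1({"a": 0.5, "b": 0.5}, {"a": 0.5, "b": 0.5}) == 0.0
assert abs(dbar1({"a": 0.7, "b": 0.3}, {"a": 0.5, "b": 0.5}) - 0.2) < 1e-12
```

For 2\ell\geq 2, the distance d¯\bar{d}_{\ell} additionally optimizes a normalized Hamming cost over couplings of the two laws on AA^{\ell}, so this sketch only covers the one-coordinate case.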

The proof of Lemma 3.7 uses several methods that are standard in Ornstein’s theory of Bernoulli shifts (a presentation can be found in [16] or [22]). Therefore, we need to introduce some commonly used notions and results from that theory. We start with the following combinatorial result:

Lemma 3.4 (Hall’s marriage lemma [7]).

Let EE and FF be finite sets, and {Je}eE\{J_{e}\}_{e\in E} be a family of subsets of FF: eE,JeF\forall e\in E,J_{e}\subset F. There exists an injective map ψ:EF\psi:E\rightarrow F such that eE,ψ(e)Je\forall e\in E,\psi(e)\in J_{e} if, and only if for every IEI\subset E, we have

#I#eIJe.\#I\leq\#\bigcup_{e\in I}J_{e}.
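On small instances, Hall’s condition can be checked by brute force, and the injective map ψ\psi can be built by the standard augmenting-path argument. The following sketch is purely illustrative (the example sets are our own choice):

```python
# Illustrative sketch of Hall's marriage lemma (example data is ours).
# hall_condition checks the condition by brute force; find_matching builds
# the injective map psi via the standard augmenting-path argument.
from itertools import combinations

def hall_condition(J):
    """Check that every subset I of E satisfies #I <= #(union of J_e, e in I)."""
    E = list(J)
    for r in range(1, len(E) + 1):
        for I in combinations(E, r):
            if r > len(set().union(*(J[e] for e in I))):
                return False
    return True

def find_matching(J):
    """Return an injective psi with psi(e) in J_e, or None if none exists."""
    match = {}  # f -> e currently matched to f

    def try_assign(e, seen):
        for f in J[e]:
            if f not in seen:
                seen.add(f)
                # f is free, or its current partner can be reassigned elsewhere
                if f not in match or try_assign(match[f], seen):
                    match[f] = e
                    return True
        return False

    for e in J:
        if not try_assign(e, set()):
            return None
    return {e: f for f, e in match.items()}

J = {"e1": {1, 2}, "e2": {2, 3}, "e3": {1, 3}}
psi = find_matching(J)
assert hall_condition(J) and psi is not None
assert all(psi[e] in J[e] for e in J) and len(set(psi.values())) == len(J)
```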

The main way in which the entropy of the processes is used in our arguments comes from the Shannon-McMillan-Breiman Theorem (see [3, Theorem 13.1]):

Theorem 3.5.

Let 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T) be an ergodic dynamical system and ξ0:XA\xi_{0}:X\rightarrow A. For 𝐚A[0,n[\boldsymbol{a}\in A^{[0,n[}, define

pn(𝒂):=μ(ξ[0,n[=𝒂).p_{n}(\boldsymbol{a}):=\mu(\xi_{[0,n[}=\boldsymbol{a}).

We have

limn1nlog(pn(ξ[0,n[))=hμ(ξ,T),μ-almost surely.\lim_{n\rightarrow\infty}-\frac{1}{n}\log(p_{n}(\xi_{[0,n[}))=h_{\mu}(\xi,T),\;\mu\text{-almost surely}.

In particular, we also have the convergence in probability: for every ε>0\varepsilon>0, there exists N1N\geq 1 such that for every nNn\geq N, there exists a set 𝒜nA[0,n[\mathcal{A}_{n}\subset A^{[0,n[} such that μ(ξ[0,n[𝒜n)1ε\mu(\xi_{[0,n[}\in\mathcal{A}_{n})\geq 1-\varepsilon and for every 𝐚𝒜n\boldsymbol{a}\in\mathcal{A}_{n},

2(hμ(ξ,T)+ε)nμ(ξ[0,n[=𝒂)2(hμ(ξ,T)ε)n.2^{-(h_{\mu}(\xi,T)+\varepsilon)n}\leq\mu(\xi_{[0,n[}=\boldsymbol{a})\leq 2^{-(h_{\mu}(\xi,T)-\varepsilon)n}.
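As a purely illustrative sanity check, the convergence in the Shannon-McMillan-Breiman theorem can be observed numerically in the simplest case of an i.i.d. Bernoulli(pp) process (our own toy example, where pnp_{n} factorizes and the statement reduces to the law of large numbers):

```python
# Illustrative check of Shannon-McMillan-Breiman for an i.i.d. Bernoulli(p)
# process (toy example of our own choosing): p_n(a) = p^k (1-p)^(n-k) where
# k is the number of 1's, so -(1/n) log2 p_n depends only on the empirical
# frequency of 1's and converges to the entropy h(p).
import math
import random

def entropy(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

random.seed(0)
p, n = 0.3, 200_000
x = [1 if random.random() < p else 0 for _ in range(n)]
k = sum(x)  # number of 1's in x_[0,n[
# p_n(x_[0,n[) = p^k (1-p)^(n-k), hence:
smb = -(k / n) * math.log2(p) - (1 - k / n) * math.log2(1 - p)
assert abs(smb - entropy(p)) < 0.01
```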

We also need to introduce another tool that is commonly used in Ornstein’s theory: Rokhlin towers. On a dynamical system 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T), to get a tower of height nn, we need a set FF such that the sets TjFT^{j}F, for 0jn10\leq j\leq n-1, are disjoint. Then the family 𝒯:=(F,TF,,Tn1F)\mathcal{T}:=(F,TF,...,T^{n-1}F) is what we call a Rokhlin tower, or, in short, a tower. However, we will also refer to the set j=0n1TjF\bigsqcup_{j=0}^{n-1}T^{j}F as a tower; in particular, we will often write μ(𝒯)\mu(\mathcal{T}) for μ(j=0n1TjF)\mu(\bigsqcup_{j=0}^{n-1}T^{j}F). The following result guarantees that Rokhlin towers of arbitrary height and total measure almost 1 exist under quite general conditions:

Proposition 3.6 (See [22]).

Let 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T) be an ergodic dynamical system and ξ0\xi_{0} a finite valued random variable. Assume that μ\mu is non-atomic. For all n1n\geq 1 and ε>0\varepsilon>0, there exists a measurable set FXF\subset X such that the sets TjFT^{j}F, for j[0,n[j\in[0,n[, are disjoint, μ(j=0n1TjF)1ε\mu(\bigcup_{j=0}^{n-1}T^{j}F)\geq 1-\varepsilon and (ξ0|F)=(ξ0)\mathcal{L}(\xi_{0}\,|\,F)=\mathcal{L}(\xi_{0}).

The set FF is called the base of the tower 𝒯\mathcal{T} and the sets TjFT^{j}F are the levels. For any set EFE\subset F, the family

CE:={TjE}0jn1C_{E}:=\{T^{j}E\}_{0\leq j\leq n-1}

is a tower, and we say that it is a column of 𝒯\mathcal{T}. If ξ0:XA\xi_{0}:X\rightarrow A is a random variable, we will be interested in the columns defined by sets of the form F𝒂:=F{ξ[0,n[=𝒂}F_{\boldsymbol{a}}:=F\cap\{\xi_{[0,n[}=\boldsymbol{a}\} with 𝒂A[0,n[\boldsymbol{a}\in A^{[0,n[}. We say that 𝒂\boldsymbol{a} is the ξ\xi-name of the column C𝒂:=CF𝒂C_{\boldsymbol{a}}:=C_{F_{\boldsymbol{a}}}. The columns {C𝒂}𝒂A[0,n[\{C_{\boldsymbol{a}}\}_{\boldsymbol{a}\in A^{[0,n[}} give a partition of the levels of 𝒯\mathcal{T}. Now, conversely, assume that we have a partition of FF given by sets E1,,EpE_{1},...,E_{p}, then the columns CE1,,CEpC_{E_{1}},...,C_{E_{p}} give a partition of the levels of 𝒯\mathcal{T}. If, moreover, we associate to each column CEiC_{E_{i}} a name 𝒂(i)A[0,n[\boldsymbol{a}^{(i)}\in A^{[0,n[} of length nn, we can define a random variable ξ0\xi_{0} on the levels of 𝒯\mathcal{T} so that, for every ii, we have CEi=C𝒂(i)C_{E_{i}}=C_{\boldsymbol{a}^{(i)}}. We obtain this random variable simply by setting, for i1,p,j0,ni\in\llbracket 1,p\rrbracket,j\in\llbracket 0,n\llbracket

ξ0=𝒂j(i) on TjEi.\xi_{0}=\boldsymbol{a}_{j}^{(i)}\;\text{ on }\;T^{j}E_{i}.
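The correspondence between column names and the values of ξ0\xi_{0} on the levels can be made concrete with a minimal sketch (the height, indices and names below are hypothetical choices of our own):

```python
# Minimal sketch (indices and names are ours) of the coding framework: the
# base F is partitioned into columns E_1, ..., E_p, each column i carries a
# name a^(i) of length n, and xi_0 is defined on the level T^j E_i as the
# j-th letter of a^(i).
n = 4
names = {1: "abca", 2: "bbac", 3: "caab"}  # a^(i) in A^[0,n[ for i = 1, 2, 3

def xi0(i, j):
    """Value of xi_0 on the level T^j E_i (0 <= j < n)."""
    return names[i][j]

# Reading xi_0 along a single climb through column 2 recovers its name:
assert "".join(xi0(2, j) for j in range(n)) == "bbac"
```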

This is the framework we will use to construct our random variables. We are now ready to turn our attention to the following:

Lemma 3.7.

Let 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T) be a Bernoulli system of finite entropy and 𝒫0:XA\mathcal{P}_{0}:X\rightarrow A a finite generator of 𝒜\mathscr{A}. For every ε>0\varepsilon>0, there exists δ>0\delta>0 satisfying the following:

  • if 0:XH\mathcal{H}_{0}:X\rightarrow H is a finite valued random variable whose associated process :=(0Tn)n\mathcal{H}:=(\mathcal{H}_{0}\circ T^{n})_{n\in\mathbb{Z}} satisfies hμ(,T)δh_{\mu}(\mathcal{H},T)\leq\delta,

  • and ξ:=(ξ0Tn)n\xi:=(\xi_{0}\circ T^{n})_{n\in\mathbb{Z}} is a BB-valued (for some finite set BB) i.i.d. process independent from \mathcal{H} such that 𝒜=σ()σ(ξ)\mathscr{A}=\sigma(\mathcal{H})\vee\sigma(\xi) mod μ\mu,

then for any α>0\alpha>0, there exists a process ξ~\tilde{\xi} such that

  1. (i)

    d¯1((ξ0),(ξ~0))α\bar{d}_{1}(\mathcal{L}(\xi_{0}),\mathcal{L}(\tilde{\xi}_{0}))\leq\alpha,

  2. (ii)

    0hμ(ξ,T)hμ(ξ~,T)α0\leq h_{\mu}(\mathcal{H}\vee\xi,T)-h_{\mu}(\mathcal{H}\vee\tilde{\xi},T)\leq\alpha,

  3. (iii)

    and 𝒫0εσ(ξ~)\mathcal{P}_{0}\preceq_{\varepsilon}\sigma(\tilde{\xi}).

The proof of the lemma being quite intricate, we start by giving a sketch of it. First, we will need a Rokhlin tower 𝒯n\mathcal{T}_{n} of very large height nn. This tower is then divided into the columns C𝒉C_{\boldsymbol{h}} (see (10)) generated by \mathcal{H}. Each of those columns is then divided into sub-columns C𝒉𝒃C_{\boldsymbol{h}}^{\boldsymbol{b}} (see (14)) generated by ξ\xi. Because ξ\mathcal{H}\vee\xi generates 𝒜\mathscr{A}, we can approach 𝒫0\mathcal{P}_{0} by some random variable 𝒫~0\tilde{\mathcal{P}}_{0} depending on finitely many coordinates of ξ\mathcal{H}\vee\xi. This enables us to associate to each C𝒉𝒃C_{\boldsymbol{h}}^{\boldsymbol{b}} a word 𝒫~[0,n[(𝒉,𝒃)\tilde{\mathcal{P}}_{[0,n[}(\boldsymbol{h},\boldsymbol{b}) which gives a good approximation of 𝒫0\mathcal{P}_{0} on the levels of C𝒉𝒃C_{\boldsymbol{h}}^{\boldsymbol{b}}. We will define ξ~0\tilde{\xi}_{0} by giving C𝒉𝒃C_{\boldsymbol{h}}^{\boldsymbol{b}} a new ξ~\tilde{\xi}-name, to replace 𝒃\boldsymbol{b}. Our goal is to choose those names so that we can get a good approximation of 𝒫~[0,n[(𝒉,𝒃)\tilde{\mathcal{P}}_{[0,n[}(\boldsymbol{h},\boldsymbol{b}) by simply knowing the ξ~\tilde{\xi}-name of C𝒉𝒃C_{\boldsymbol{h}}^{\boldsymbol{b}}, regardless of 𝒉\boldsymbol{h}. To do that, we fix a column C𝒉0C_{\boldsymbol{h}_{0}} and use it as a “model” for the other columns. Then the extremality of 𝒫\mathcal{P} comes into play: it tells us that, for most choices of 𝒉\boldsymbol{h}, the families {𝒫~[0,n[(𝒉0,𝒃)}𝒃n\{\tilde{\mathcal{P}}_{[0,n[}(\boldsymbol{h}_{0},\boldsymbol{b})\}_{\boldsymbol{b}\in\mathcal{B}_{n}} and {𝒫~[0,n[(𝒉,𝒃)}𝒃n\{\tilde{\mathcal{P}}_{[0,n[}(\boldsymbol{h},\boldsymbol{b})\}_{\boldsymbol{b}\in\mathcal{B}_{n}} are quite similar. 
More specifically, we show that, for most 𝒃\boldsymbol{b}, there are names 𝒃~\tilde{\boldsymbol{b}} such that dn(𝒫~[0,n[(𝒉0,𝒃~),𝒫~[0,n[(𝒉,𝒃)){d_{n}(\tilde{\mathcal{P}}_{[0,n[}(\boldsymbol{h}_{0},\tilde{\boldsymbol{b}}),\tilde{\mathcal{P}}_{[0,n[}(\boldsymbol{h},\boldsymbol{b}))} is small. Those names are then suitable ξ~\tilde{\xi}-names for C𝒉𝒃C^{\boldsymbol{b}}_{\boldsymbol{h}}. However, when we choose among those suitable names, we need to make sure that we are not giving the same name to too many columns, otherwise we might lose too much information and fail to obtain (ii). This is done using Hall’s marriage lemma.

Proof of Lemma 3.7.

In this proof, we use many parameters, which we introduce below in a specific order to highlight the way they depend on each other:

  1. (a)

    Let ε>0\varepsilon>0. This parameter is chosen first, as it appears in the statement of the lemma. Then we choose δ>0\delta>0 and N1N\geq 1, as the numbers associated to ε3/4\varepsilon^{3}/4 in the definition of extremality of 𝒫\mathcal{P}. We assume that hμ(,T)<δh_{\mu}(\mathcal{H},T)<\delta.

  2. (b)

    Let α>0\alpha>0. This is another arbitrarily small parameter that appears in the statement of the lemma. It does not depend on ε\varepsilon nor δ\delta.

  3. (c)

    Next, we introduce 0<γ<10<\gamma<1, which must be small relative to α\alpha and ε\varepsilon for (ii) and (iii) to hold. Specifically, we require that γε/2\gamma\leq\varepsilon/2, and that the bound in Lemma 2.7 holds with error α\alpha, whenever μ(ξ0ζ0)2γ\mu(\xi_{0}\neq\zeta_{0})\leq 2\gamma, for any random variables ξ0\xi_{0} and ζ0\zeta_{0}.

  4. (d)

    Then we take β>0\beta>0, which is our most used parameter. We set β\beta satisfying the following:

    β<{δhμ(,T);min(ε3/24,ε/4);min(γ,γhμ(ξ,T)/5);α/7.\beta<\left\{\begin{array}[]{l}\delta-h_{\mu}(\mathcal{H},T);\\ \min(\varepsilon^{3}/24,\varepsilon/4);\\ \min(\gamma,\gamma h_{\mu}(\xi,T)/5);\\ \alpha/7.\end{array}\right.

    Once β\beta is fixed, we choose n01n_{0}\geq 1 such that 𝒫0β2/2(ξ)[n0,n0]\mathcal{P}_{0}\preceq_{\beta^{2}/2}(\mathcal{H}\vee\xi)_{[-n_{0},n_{0}]}.

  5. (e)

    Finally, we choose an integer nn, which will be the height of the Rokhlin tower. It is chosen larger than NN. We also need it to be large enough for us to apply the Shannon-McMillan-Breiman theorem, as well as Birkhoff’s ergodic theorem. As nn appears in many estimates, it needs to be large enough depending on ε\varepsilon, δ\delta, γ\gamma, β\beta and n0n_{0}. It would be quite tedious to give an explicit lower bound for nn, so, since all other parameters are now fixed and do not depend on nn, we simply point out throughout the proof the estimates where nn needs to be large.

Having now established the parameters, we begin the proof.

Step 1: The setup of the tower

As mentioned in (e), we choose nn so that we can apply the Shannon-McMillan-Breiman theorem (i.e. Theorem 3.5) and Birkhoff’s ergodic theorem to obtain two sets n0H[0,n[\mathcal{E}_{n}^{0}\subset H^{[0,n[} and n0B[0,n[\mathcal{B}_{n}^{0}\subset B^{[0,n[} such that

μ([0,n[n0)1β/2, and μ(ξ[0,n[n0)1β/3,\begin{gathered}\mu(\mathcal{H}_{[0,n[}\in\mathcal{E}_{n}^{0})\geq 1-\beta/2,\;\text{ and }\;\mu(\xi_{[0,n[}\in\mathcal{B}_{n}^{0})\geq 1-\beta/3,\end{gathered} (3)

on which the estimates (5), (6), (8) and (15) hold. Later in the proof, we will see that we can take subsets nn0\mathcal{E}_{n}\subset\mathcal{E}_{n}^{0} and nn0\mathcal{B}_{n}\subset\mathcal{B}_{n}^{0} such that

μ([0,n[n)1β, and μ(ξ[0,n[n)1β,\begin{gathered}\mu(\mathcal{H}_{[0,n[}\in\mathcal{E}_{n})\geq 1-\beta,\;\text{ and }\;\mu(\xi_{[0,n[}\in\mathcal{B}_{n})\geq 1-\beta,\end{gathered} (4)

on which we also have (12) and (16). The fact that (12) holds for n\mathcal{E}_{n} appears in Step 2 and the fact that (16) holds for n\mathcal{B}_{n} appears in Step 3. Until then, we only use (4). For now, we present some of the estimates we have announced.

The first estimates given by the Shannon-McMillan-Breiman theorem are:

𝒉n, 2(hμ(,T)+β)nμ([0,n[=𝒉)2(hμ(,T)β)n,\displaystyle\forall\boldsymbol{h}\in\mathcal{E}_{n},\;2^{-(h_{\mu}(\mathcal{H},T)+\beta)n}\leq\mu(\mathcal{H}_{[0,n[}=\boldsymbol{h})\leq 2^{-(h_{\mu}(\mathcal{H},T)-\beta)n}, (5)
𝒃n, 2(hμ(ξ,T)+β)nμ(ξ[0,n[=𝒃)2(hμ(ξ,T)β)n.\displaystyle\forall\boldsymbol{b}\in\mathcal{B}_{n},\;2^{-(h_{\mu}(\xi,T)+\beta)n}\leq\mu(\xi_{[0,n[}=\boldsymbol{b})\leq 2^{-(h_{\mu}(\xi,T)-\beta)n}. (6)

For any sequence 𝒃B[0,n[\boldsymbol{b}\in B^{[0,n[} and any element bBb^{\prime}\in B, denote by fn(𝒃,b)f_{n}(\boldsymbol{b},b^{\prime}) the frequency at which the element bb^{\prime} appears in the sequence 𝒃\boldsymbol{b}. This can also be defined as follows:

x{ξ[0,n[=𝒃},fn(𝒃,b):=1nj=0n1𝟙{ξ0=b}(Tjx).\forall x\in\{\xi_{[0,n[}=\boldsymbol{b}\},\;f_{n}(\boldsymbol{b},b^{\prime}):=\frac{1}{n}\sum_{j=0}^{n-1}\mathbbm{1}_{\{\xi_{0}=b^{\prime}\}}(T^{j}x). (7)

From this definition of fnf_{n}, it becomes clear that, as announced earlier, the estimate given by Birkhoff’s ergodic theorem is: for every 𝒃n0\boldsymbol{b}\in\mathcal{B}_{n}^{0},

bB|fn(𝒃,b)μ(ξ0=b)|β.\sum_{b^{\prime}\in B}|f_{n}(\boldsymbol{b},b^{\prime})-\mu(\xi_{0}=b^{\prime})|\leq\beta. (8)
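The estimate (8) is just the ergodic theorem applied to the indicators of the sets {ξ0=b}\{\xi_{0}=b^{\prime}\}; in the i.i.d. case it reduces to the law of large numbers, which the following purely illustrative sketch (alphabet and law of our own choosing) checks numerically:

```python
# Illustration (alphabet and law are ours) of estimate (8): the empirical
# frequencies f_n(b, b') of letters in a long block converge to the law of
# xi_0; for an i.i.d. process this is the law of large numbers.
import random

random.seed(1)
B = ["a", "b", "c"]
probs = {"a": 0.5, "b": 0.3, "c": 0.2}
n = 100_000
b_seq = random.choices(B, weights=[probs[bp] for bp in B], k=n)

def f_n(seq, bp):
    """Frequency at which the letter bp appears in the sequence seq."""
    return seq.count(bp) / len(seq)

# The left-hand side of (8), small when n is large:
dev = sum(abs(f_n(b_seq, bp) - probs[bp]) for bp in B)
assert dev < 0.02
```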

Since ξ\mathcal{H}\vee\xi generates 𝒜\mathscr{A}, as said in (d), we can find n01n_{0}\geq 1 so that 𝒫0β2/2(ξ)[n0,n0]\mathcal{P}_{0}\preceq_{\beta^{2}/2}(\mathcal{H}\vee\xi)_{[-n_{0},n_{0}]}. This means that there exists a (ξ)[n0,n0](\mathcal{H}\vee\xi)_{[-n_{0},n_{0}]}-measurable random variable 𝒫~0\tilde{\mathcal{P}}_{0} such that μ(𝒫~0𝒫0)β2/2\mu(\tilde{\mathcal{P}}_{0}\neq\mathcal{P}_{0})\leq\beta^{2}/2.

By making use of Proposition 3.6, we can build a set GG such that F:=TGF^{\prime}:=TG is disjoint from GG and FF^{\prime} is the base of a tower 𝒢n:={TjF}0jn1\mathcal{G}_{n}:=\{T^{j}F^{\prime}\}_{0\leq j\leq n-1} such that μ(𝒢n)1β\mu(\mathcal{G}_{n})\geq 1-\beta and

((𝒫~ξ)[0,n[|F)=((𝒫~ξ)[0,n[).\mathcal{L}((\mathcal{H}\vee\tilde{\mathcal{P}}\vee\xi)_{[0,n[}\,|\,F^{\prime})=\mathcal{L}((\mathcal{H}\vee\tilde{\mathcal{P}}\vee\xi)_{[0,n[}). (9)

The set GG will be useful later to code the entrance of the tower. We slightly reduce the tower by setting F:=F{[0,n[n}{ξ[0,n[n}F:=F^{\prime}\cap\{\mathcal{H}_{[0,n[}\in\mathcal{E}_{n}\}\cap\{\xi_{[0,n[}\in\mathcal{B}_{n}\} and 𝒯n:={TjF}0jn1\mathcal{T}_{n}:=\{T^{j}F\}_{0\leq j\leq n-1}. One can then use (9) with our previous estimates to see that μ(𝒯n)13β\mu(\mathcal{T}_{n})\geq 1-3\beta (by making sure that 1/nβ1/n\leq\beta).

We then split 𝒯n\mathcal{T}_{n} into \mathcal{H}-columns: for 𝒉n\boldsymbol{h}\in{\mathcal{E}}_{n}, we define

C𝒉:={Tj(F{[0,n[=𝒉})}0jn1,C_{\boldsymbol{h}}:=\{T^{j}(F\cap\{\mathcal{H}_{[0,n[}=\boldsymbol{h}\})\}_{0\leq j\leq n-1}, (10)

so that 𝒯n=𝒉nC𝒉\mathcal{T}_{n}=\bigsqcup_{\boldsymbol{h}\in{\mathcal{E}}_{n}}C_{\boldsymbol{h}} (we mean that the levels of 𝒯n\mathcal{T}_{n} are disjoint unions of the levels of C𝒉C_{\boldsymbol{h}}). For each 𝒉n\boldsymbol{h}\in\mathcal{E}_{n}, we say that C𝒉C_{\boldsymbol{h}} is the column of [0,n[\mathcal{H}_{[0,n[}-name 𝒉\boldsymbol{h}. We also denote by F𝒉:=F{[0,n[=𝒉}F_{\boldsymbol{h}}:=F\cap\{\mathcal{H}_{[0,n[}=\boldsymbol{h}\} the base of C𝒉C_{\boldsymbol{h}}.

Step 2: Using the extremality of 𝒫\mathcal{P}

We plan on modifying ξ\xi into a process ξ~\tilde{\xi} so that the joint law of 𝒫ξ~\mathcal{P}\vee\tilde{\xi} is almost the same in most of the columns {C𝒉}𝒉n\{C_{\boldsymbol{h}}\}_{\boldsymbol{h}\in\mathcal{E}_{n}}. We start by using the fact that 𝐗\mathbf{X} is Bernoulli to see that the law of 𝒫\mathcal{P} is almost the same on each column C𝒉C_{\boldsymbol{h}}. Indeed, since 𝐗\mathbf{X} is Bernoulli, 𝒫\mathcal{P} is extremal, and we fixed δ>0\delta>0 and N1N\geq 1 as the numbers associated to ε3/4\varepsilon^{3}/4 in the definition of extremality and assume that hμ(,T)<δh_{\mu}(\mathcal{H},T)<\delta (see (a)). On the other hand, from (5), we deduce that

#n2(hμ(,T)+β)n.\#\mathcal{E}_{n}\leq 2^{(h_{\mu}(\mathcal{H},T)+\beta)n}.

Next we define the partition

𝒬:={on {[0,n[n}{ξ[0,n[n}[0,n[on {[0,n[n}{ξ[0,n[n}.\mathcal{Q}:=\left\{\begin{array}[]{ll}*&\text{on }\{\mathcal{H}_{[0,n[}\notin\mathcal{E}_{n}\}\cup\{\xi_{[0,n[}\notin\mathcal{B}_{n}\}\\ \mathcal{H}_{[0,n[}&\text{on }\{\mathcal{H}_{[0,n[}\in\mathcal{E}_{n}\}\cap\{\xi_{[0,n[}\in\mathcal{B}_{n}\}\\ \end{array}.\right.

In particular, we know that μ(𝒬=)2β\mu(\mathcal{Q}=*)\leq 2\beta. Moreover, the number of values taken by 𝒬\mathcal{Q} is bounded by

#n+12(hμ(,T)+β)n+12nδ,\#\mathcal{E}_{n}+1\leq 2^{(h_{\mu}(\mathcal{H},T)+\beta)n}+1\leq 2^{n\delta},

since β<δhμ(,T)\beta<\delta-h_{\mu}(\mathcal{H},T). Therefore the extremality of 𝒫\mathcal{P} tells us that, since nNn\geq N, there exists a subset ¯nn\bar{\mathcal{E}}_{n}\subset{\mathcal{E}}_{n} such that

μ(𝒬(¯n{}))ε3/4+2βε3ε,\mu(\mathcal{Q}\notin(\bar{\mathcal{E}}_{n}\cup\{*\}))\leq\varepsilon^{3}/4+2\beta\leq\varepsilon^{3}\leq\varepsilon, (11)

and for 𝒉¯n\boldsymbol{h}\in\bar{\mathcal{E}}_{n}, we have

d¯n((𝒫[0,n[|𝒬=𝒉),(𝒫[0,n[))ε3/4.\bar{d}_{n}(\mathcal{L}(\mathcal{P}_{[0,n[}\,|\,\mathcal{Q}=\boldsymbol{h}),\mathcal{L}(\mathcal{P}_{[0,n[}))\leq\varepsilon^{3}/4.

As mentioned at the start of the proof, the set n\mathcal{E}_{n} is chosen so that (4) holds and we have

𝒉n,μ(𝒫~0𝒫0|𝒬=𝒉)β.\forall\boldsymbol{h}\in\mathcal{E}_{n},\;\mu(\tilde{\mathcal{P}}_{0}\neq\mathcal{P}_{0}\,|\,\mathcal{Q}=\boldsymbol{h})\leq\beta. (12)

This is possible because μ(𝒫0𝒫~0)β2/2\mu(\mathcal{P}_{0}\neq\tilde{\mathcal{P}}_{0})\leq\beta^{2}/2. Combining (12) with the fact that βε3/24\beta\leq\varepsilon^{3}/24, we obtain, for 𝒉¯n\boldsymbol{h}\in\bar{\mathcal{E}}_{n}:

d¯n((𝒫~[0,n[|𝒬=𝒉),(𝒫~[0,n[))ε3/3.\bar{d}_{n}(\mathcal{L}(\tilde{\mathcal{P}}_{[0,n[}\,|\,\mathcal{Q}=\boldsymbol{h}),\mathcal{L}(\tilde{\mathcal{P}}_{[0,n[}))\leq\varepsilon^{3}/3. (13)

Step 3: Framework for the construction of ξ~0\tilde{\xi}_{0}

We start the construction of ξ~\tilde{\xi} by setting ξ~0:=\tilde{\xi}_{0}:=* on G=T1FG=T^{-1}F^{\prime}, where * represents a symbol that does not belong to BB. Later in the proof, this will allow us to detect the entrance into 𝒯n\mathcal{T}_{n} from the value of the process ξ~\tilde{\xi}. Then define ξ~0\tilde{\xi}_{0} to take any value in BB on the rest of 𝒯nc\mathcal{T}_{n}^{c}. For 𝒉n\¯n\boldsymbol{h}\in\mathcal{E}_{n}\backslash\bar{\mathcal{E}}_{n}, on C𝒉C_{\boldsymbol{h}}, we set ξ~0:=ξ0\tilde{\xi}_{0}:=\xi_{0}. We are left with defining our new random variable ξ~0\tilde{\xi}_{0} on the columns C𝒉C_{\boldsymbol{h}}, with 𝒉¯n\boldsymbol{h}\in\bar{\mathcal{E}}_{n}. We start by fixing 𝒉0¯n\boldsymbol{h}_{0}\in\bar{\mathcal{E}}_{n}, and the column C𝒉0C_{\boldsymbol{h}_{0}} will serve as a “model” for the other columns.

Next we fix an 𝒉¯n\boldsymbol{h}\in\bar{\mathcal{E}}_{n}. We define sub-columns of C𝒉C_{\boldsymbol{h}}: for 𝒃n\boldsymbol{b}\in\mathcal{B}_{n},

C𝒉𝒃:={Tj(F{[0,n[=𝒉}{ξ[0,n[=𝒃})}0jn1.C_{\boldsymbol{h}}^{\boldsymbol{b}}:=\{T^{j}(F\cap\{\mathcal{H}_{[0,n[}=\boldsymbol{h}\}\cap\{\xi_{[0,n[}=\boldsymbol{b}\})\}_{0\leq j\leq n-1}. (14)

We say that 𝒃\boldsymbol{b} is the ξ\xi-name of C𝒉𝒃C_{\boldsymbol{h}}^{\boldsymbol{b}}. Because of our definition of FF and (9), the set n\mathcal{B}_{n} gives us exactly the ξ\xi-names of all the sub-columns in C𝒉C_{\boldsymbol{h}}. We will then give each sub-column C𝒉𝒃C_{\boldsymbol{h}}^{\boldsymbol{b}} a new word 𝒃~n\tilde{\boldsymbol{b}}\in\mathcal{B}_{n} and define the random variable ξ~0\tilde{\xi}_{0} on C𝒉𝒃C_{\boldsymbol{h}}^{\boldsymbol{b}} as the only variable such that 𝒃~\tilde{\boldsymbol{b}} is the ξ~\tilde{\xi}-name of C𝒉𝒃C_{\boldsymbol{h}}^{\boldsymbol{b}}. This means that to conclude the construction of ξ~0\tilde{\xi}_{0} on C𝒉C_{\boldsymbol{h}}, we simply need to build a map φ𝒉:nn,\varphi_{\boldsymbol{h}}:\mathcal{B}_{n}\longrightarrow\mathcal{B}_{n}, and the properties we will obtain on ξ~\tilde{\xi} will follow from our choice for φ𝒉\varphi_{\boldsymbol{h}}.

In order to give us some additional leeway, we use the parameter γ>0\gamma>0 introduced at the start of the proof: we define n1:=(1γ)nnn_{1}:=\lfloor(1-\gamma)n\rfloor\leq n, and for 𝒃n\boldsymbol{b}\in\mathcal{B}_{n}, we denote by 𝒃n1:=𝒃[0,n1[B[0,n1[\boldsymbol{b}_{n_{1}}:=\boldsymbol{b}_{[0,n_{1}[}\in B^{[0,n_{1}[} the truncated sub-sequence of 𝒃\boldsymbol{b} of length n1n_{1}. Conversely, for 𝒃¯B[0,n1[\bar{\boldsymbol{b}}\in B^{[0,n_{1}[}, define

B(𝒃¯):={𝒃n|𝒃n1=𝒃¯},B(\bar{\boldsymbol{b}}):=\{\boldsymbol{b}\in\mathcal{B}_{n}\,|\,\boldsymbol{b}_{n_{1}}=\bar{\boldsymbol{b}}\},

and

n1:={𝒃¯B[0,n1[|B(𝒃¯)}.\mathcal{B}_{n_{1}}:=\{\bar{\boldsymbol{b}}\in B^{[0,n_{1}[}\,|\,B(\bar{\boldsymbol{b}})\neq\varnothing\}.
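Purely as an illustration of these definitions (outside the formal proof), the fibers B(𝒃¯)B(\bar{\boldsymbol{b}}) and the prefix set n1\mathcal{B}_{n_{1}} can be computed mechanically from any finite set of words; the alphabet and word set below are toy placeholders, not the objects of the proof.

```python
from collections import defaultdict

def truncation_sets(words, n1):
    """Group length-n words by their length-n1 prefix.

    Returns (B, B_n1) where B maps each prefix b_bar to the fiber
    B(b_bar) = {b in words | b[:n1] == b_bar}, and B_n1 is the set
    of prefixes with a non-empty fiber.
    """
    B = defaultdict(set)
    for b in words:
        B[b[:n1]].add(b)
    return dict(B), set(B)

# Toy example: words of length 4 over {0, 1}, truncated to length 2.
words = {"0010", "0011", "0110", "1100"}
B, B_n1 = truncation_sets(words, 2)
# B_n1 == {"00", "01", "11"} and B["00"] == {"0010", "0011"}
```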

We will obtain the map φ𝒉\varphi_{\boldsymbol{h}} by building an injective map ψ𝒉:n1n\psi_{\boldsymbol{h}}:\mathcal{B}_{n_{1}}\longrightarrow\mathcal{B}_{n} and setting φ𝒉(𝒃):=ψ𝒉(𝒃n1)\varphi_{\boldsymbol{h}}(\boldsymbol{b}):=\psi_{\boldsymbol{h}}(\boldsymbol{b}_{n_{1}}). We start by recalling that we chose nn and n\mathcal{B}_{n} so that the estimate given by the Shannon-McMillan-Breiman Theorem, i.e. (6), still holds when replacing nn by n1n_{1}. More precisely, we mean that, for 𝒃¯n1\bar{\boldsymbol{b}}\in\mathcal{B}_{n_{1}}

2(hμ(ξ,T)+β)n1μ(ξ[0,n1[=𝒃¯)2(hμ(ξ,T)β)n1.2^{-(h_{\mu}(\xi,T)+\beta)n_{1}}\leq\mu(\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}})\leq 2^{-(h_{\mu}(\xi,T)-\beta)n_{1}}. (15)

Moreover, we stated at the start of the proof that n\mathcal{B}_{n} is chosen such that, for 𝒃¯n1\bar{\boldsymbol{b}}\in\mathcal{B}_{n_{1}}

μ(ξ[0,n1[=𝒃¯,ξ[0,n[n)12μ(ξ[0,n1[=𝒃¯).\mu(\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}},\xi_{[0,n[}\in\mathcal{B}_{n})\geq\frac{1}{2}\mu(\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}}). (16)

We need to prove that statement. We do this by considering the set

𝒞:={𝒃¯n10|μ(ξ[0,n1[=𝒃¯,ξ[0,n[n0)12μ(ξ[0,n1[=𝒃¯)}.\mathcal{C}:=\{\bar{\boldsymbol{b}}\in\mathcal{B}_{n_{1}}^{0}\,|\,\mu(\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}},\xi_{[0,n[}\notin\mathcal{B}_{n}^{0})\geq\frac{1}{2}\mu(\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}})\}.

From the definition of 𝒞\mathcal{C}, we get

12μ(ξ[0,n1[𝒞)μ(ξ[0,n[n0)β/3.\frac{1}{2}\mu(\xi_{[0,n_{1}[}\in\mathcal{C})\leq\mu(\xi_{[0,n[}\notin\mathcal{B}_{n}^{0})\leq\beta/3.

Then, we define n\mathcal{B}_{n} as n:={𝒃n0|𝒃n1𝒞}\mathcal{B}_{n}:=\{\boldsymbol{b}\in\mathcal{B}_{n}^{0}\,|\,\boldsymbol{b}_{n_{1}}\notin\mathcal{C}\}, and easily get that μ(ξ[0,n[n)1β\mu(\xi_{[0,n[}\in\mathcal{B}_{n})\geq 1-\beta. Next, because the set we removed from n0\mathcal{B}_{n}^{0} is measurable with respect to the truncated sequences of length n1n_{1}, for 𝒃¯𝒞\bar{\boldsymbol{b}}\notin\mathcal{C}, we get

μ(ξ[0,n1[=𝒃¯,ξ[0,n[n)=μ(ξ[0,n1[=𝒃¯,ξ[0,n[n0),\mu(\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}},\xi_{[0,n[}\in\mathcal{B}_{n})=\mu(\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}},\xi_{[0,n[}\in\mathcal{B}_{n}^{0}),

so (16) follows from the definition of 𝒞\mathcal{C}.

Finally, putting (15) and (16) together, we get, for 𝒃¯n1\bar{\boldsymbol{b}}\in\mathcal{B}_{n_{1}}

μ(ξ[0,n1[=𝒃¯,ξ[0,n[n)122(hμ(ξ,T)+β)n12(hμ(ξ,T)+2β)n1,\mu(\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}},\xi_{[0,n[}\in\mathcal{B}_{n})\geq\frac{1}{2}2^{-(h_{\mu}(\xi,T)+\beta)n_{1}}\geq 2^{-(h_{\mu}(\xi,T)+2\beta)n_{1}}, (17)

by making sure that n1n_{1} is large enough. This will enable us to control the measure of the part of the truncated column over {ξ[0,n1[=𝒃¯}\{\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}}\} that is in 𝒯n\mathcal{T}_{n}.

Step 4: Estimates for Hall’s marriage lemma

From (6) and (15), we can tell that

#n1\displaystyle\#\mathcal{B}_{n_{1}} 2(hμ(ξ,T)+β)n12(hμ(ξ,T)+β)(1γ)n\displaystyle\leq 2^{(h_{\mu}(\xi,T)+\beta)n_{1}}\leq 2^{(h_{\mu}(\xi,T)+\beta)(1-\gamma)n}
2(hμ(ξ,T)γhμ(ξ,T)+β)n(1β)2(hμ(ξ,T)β)n#n,\displaystyle\leq 2^{(h_{\mu}(\xi,T)-\gamma h_{\mu}(\xi,T)+\beta)n}\leq(1-\beta)2^{(h_{\mu}(\xi,T)-\beta)n}\leq\#\mathcal{B}_{n},

since 2β<γhμ(ξ,T)2\beta<\gamma h_{\mu}(\xi,T) and we can choose nn large enough. This inequality is also clearly true from the definition of n1\mathcal{B}_{n_{1}}, but we include this computation, as a similar one will be essential later in the proof. That being said, this inequality means that it is possible to find an injective map from n1\mathcal{B}_{n_{1}} to n\mathcal{B}_{n}, but we want to be more specific about which injective map we choose. To that end, we will make use of Hall’s marriage lemma: for each 𝒃¯n1\bar{\boldsymbol{b}}\in\mathcal{B}_{n_{1}}, we need to specify which elements of n\mathcal{B}_{n} we consider suitable ξ~\tilde{\xi}-names for the columns {C𝒉𝒃;𝒃B(𝒃¯)}\{C_{\boldsymbol{h}}^{\boldsymbol{b}};\,\boldsymbol{b}\in B(\bar{\boldsymbol{b}})\}.
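To recall how Hall’s marriage lemma operates in this kind of argument, here is a small self-contained sketch (with toy data, not the sets of the proof): given a family of “suitable” target sets, it builds an injective choice function landing in those sets via the standard augmenting-path algorithm, and fails exactly when Hall’s condition fails.

```python
def hall_matching(J):
    """Injective map psi with psi[x] in J[x] for every key x of J,
    built with the standard augmenting-path (Kuhn) algorithm.
    Returns None iff Hall's condition fails, i.e. some subset I of
    the keys satisfies #I > #union of the sets J[x] for x in I."""
    match = {}  # target -> key currently assigned to it

    def try_assign(x, seen):
        for y in J[x]:
            if y in seen:
                continue
            seen.add(y)
            # y is free, or its current owner can be re-assigned elsewhere.
            if y not in match or try_assign(match[y], seen):
                match[y] = x
                return True
        return False

    for x in J:
        if not try_assign(x, set()):
            return None
    return {x: y for y, x in match.items()}

# Toy instance: three truncated names, each with its suitable targets.
J = {"a": {1, 2}, "b": {1}, "c": {2, 3}}
psi = hall_matching(J)
# psi is injective with psi[x] in J[x] for each x, e.g. {"a": 2, "b": 1, "c": 3}
```

In the proof, the keys play the role of ¯n1(𝒉)\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h}) and the targets that of n\mathcal{B}_{n}; the counting estimates of Step 4 are exactly what verifies Hall’s condition.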

We recall that n0n_{0} is the integer chosen so that 𝒫~0\tilde{\mathcal{P}}_{0} is (ξ)[n0,n0](\mathcal{H}\vee\xi)_{[-n_{0},n_{0}]}-measurable. Define Ln:=[n0,n1n0[L_{n}:=[n_{0},n_{1}-n_{0}[\subset\mathbb{Z} and :=n12n0\ell:=n_{1}-2n_{0} the length of LnL_{n}. Because 𝒫~0\tilde{\mathcal{P}}_{0} is (ξ)[n0,n0](\mathcal{H}\vee\xi)_{[-n_{0},n_{0}]}-measurable, 𝒫~Ln\tilde{\mathcal{P}}_{L_{n}} is (ξ)[0,n1[(\mathcal{H}\vee\xi)_{[0,n_{1}[}-measurable. So, for 𝒉\boldsymbol{h} fixed, for each 𝒃¯n1\bar{\boldsymbol{b}}\in\mathcal{B}_{n_{1}}, on the set {[0,n[=𝒉,ξ[0,n1[=𝒃¯}\{\mathcal{H}_{[0,n[}=\boldsymbol{h},\xi_{[0,n_{1}[}=\bar{\boldsymbol{b}}\}, there can be only one value of 𝒫~Ln\tilde{\mathcal{P}}_{L_{n}}, which we denote 𝒫~Ln(𝒉,𝒃¯)\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h},\bar{\boldsymbol{b}}).

For 𝒃¯n1\bar{\boldsymbol{b}}\in\mathcal{B}_{n_{1}}, the suitable corresponding ξ~\tilde{\xi}-names will be the elements 𝒃n\boldsymbol{b}\in\mathcal{B}_{n} for which d(𝒫~Ln(𝒉0,𝒃n1),𝒫~Ln(𝒉,𝒃¯))εd_{\ell}(\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h}_{0},\boldsymbol{b}_{n_{1}}),\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h},\bar{\boldsymbol{b}}))\leq\varepsilon. More formally, we set

J𝒃¯:={𝒃n|d(𝒫~Ln(𝒉0,𝒃n1),𝒫~Ln(𝒉,𝒃¯))ε},J_{\bar{\boldsymbol{b}}}:=\{\boldsymbol{b}\in\mathcal{B}_{n}\,|\,d_{\ell}(\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h}_{0},\boldsymbol{b}_{n_{1}}),\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h},\bar{\boldsymbol{b}}))\leq\varepsilon\},

and we want to build ψ𝒉\psi_{\boldsymbol{h}} so that we have

ψ𝒉(𝒃¯)J𝒃¯,\psi_{\boldsymbol{h}}(\bar{\boldsymbol{b}})\in J_{\bar{\boldsymbol{b}}}, (18)

for as many 𝒃¯n1\bar{\boldsymbol{b}}\in\mathcal{B}_{n_{1}} as possible.

From (13), it follows that

d¯n((𝒫~[0,n[|𝒬=𝒉),(𝒫~[0,n[|𝒬=𝒉0))2ε3/3.\bar{d}_{n}(\mathcal{L}(\tilde{\mathcal{P}}_{[0,n[}\,|\,\mathcal{Q}=\boldsymbol{h}),\mathcal{L}(\tilde{\mathcal{P}}_{[0,n[}\,|\,\mathcal{Q}=\boldsymbol{h}_{0}))\leq 2\varepsilon^{3}/3.

Therefore:

d¯((𝒫~Ln|𝒬=𝒉)\displaystyle\bar{d}_{\ell}(\mathcal{L}(\tilde{\mathcal{P}}_{L_{n}}\,|\,\mathcal{Q}=\boldsymbol{h}) ,(𝒫~Ln|𝒬=𝒉0))\displaystyle,\mathcal{L}(\tilde{\mathcal{P}}_{L_{n}}\,|\,\mathcal{Q}=\boldsymbol{h}_{0}))
nn12n0d¯n((𝒫~[0,n[|𝒬=𝒉),(𝒫~[0,n[|𝒬=𝒉0))\displaystyle\leq\frac{n}{n_{1}-2n_{0}}\bar{d}_{n}(\mathcal{L}(\tilde{\mathcal{P}}_{[0,n[}\,|\,\mathcal{Q}=\boldsymbol{h}),\mathcal{L}(\tilde{\mathcal{P}}_{[0,n[}\,|\,\mathcal{Q}=\boldsymbol{h}_{0}))
n(1γ)nn02ε3/3<ε3,\displaystyle\leq\frac{n}{(1-\gamma)n-n_{0}}2\varepsilon^{3}/3<\varepsilon^{3},

by choosing nn large enough. So there exists a coupling λ𝒫(ALn×ALn)\lambda\in\mathscr{P}(A^{L_{n}}\times A^{L_{n}}) of (𝒫~Ln|𝒬=𝒉)\mathcal{L}(\tilde{\mathcal{P}}_{L_{n}}\,|\,\mathcal{Q}=\boldsymbol{h}) and (𝒫~Ln|𝒬=𝒉0)\mathcal{L}(\tilde{\mathcal{P}}_{L_{n}}\,|\,\mathcal{Q}=\boldsymbol{h}_{0}) such that

d(𝒑1,𝒑2)𝑑λ(𝒑1,𝒑2)ε3.\int d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})d\lambda(\boldsymbol{p}_{1},\boldsymbol{p}_{2})\leq\varepsilon^{3}.

Denote by λ1\lambda_{1} and λ2\lambda_{2} the marginals of λ\lambda, i.e. λ1=(𝒫~Ln|𝒬=𝒉)\lambda_{1}=\mathcal{L}(\tilde{\mathcal{P}}_{L_{n}}\,|\,\mathcal{Q}=\boldsymbol{h}) and λ2=(𝒫~Ln|𝒬=𝒉0)\lambda_{2}=\mathcal{L}(\tilde{\mathcal{P}}_{L_{n}}\,|\,\mathcal{Q}=\boldsymbol{h}_{0}). We are interested in the set 𝒜ALn\mathcal{A}_{\ell}\subset A^{L_{n}} defined by

𝒜:={𝒑ALn;λ(d(𝒑1,𝒑2)ε|𝒑1=𝒑)1ε}.\mathcal{A}_{\ell}:=\{\boldsymbol{p}\in A^{L_{n}}\,;\,\lambda(d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})\leq\varepsilon\,|\,\boldsymbol{p}_{1}=\boldsymbol{p})\geq 1-\varepsilon\}.

The following gives an estimate on the measure of 𝒜\mathcal{A}_{\ell}:

ε3\displaystyle\varepsilon^{3}\geq d(𝒑1,𝒑2)𝑑λ(𝒑1,𝒑2)𝒑1𝒜d(𝒑1,𝒑2)𝑑λ(𝒑1,𝒑2)\displaystyle\int d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})d\lambda(\boldsymbol{p}_{1},\boldsymbol{p}_{2})\geq\int_{\boldsymbol{p}_{1}\notin\mathcal{A}_{\ell}}d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})d\lambda(\boldsymbol{p}_{1},\boldsymbol{p}_{2})
=𝒑𝒜d(𝒑1,𝒑2)𝑑λ(𝒑1,𝒑2|𝒑1=𝒑)𝑑λ1(𝒑)\displaystyle=\int_{\boldsymbol{p}\notin\mathcal{A}_{\ell}}\int d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})d\lambda(\boldsymbol{p}_{1},\boldsymbol{p}_{2}\,|\,\boldsymbol{p}_{1}=\boldsymbol{p})d\lambda_{1}(\boldsymbol{p})
𝒑𝒜λ(d(𝒑1,𝒑2)>ε|𝒑1=𝒑)ε𝑑λ1(𝒑)\displaystyle\geq\int_{\boldsymbol{p}\notin\mathcal{A}_{\ell}}\lambda(d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})>\varepsilon\,|\,\boldsymbol{p}_{1}=\boldsymbol{p})\cdot\varepsilon d\lambda_{1}(\boldsymbol{p})
>ε2μ(𝒫~Ln𝒜|𝒬=𝒉),\displaystyle>\varepsilon^{2}\mu(\tilde{\mathcal{P}}_{L_{n}}\notin\mathcal{A}_{\ell}\,|\,\mathcal{Q}=\boldsymbol{h}),

so μ(𝒫~Ln𝒜|𝒬=𝒉)<ε\mu(\tilde{\mathcal{P}}_{L_{n}}\notin\mathcal{A}_{\ell}\,|\,\mathcal{Q}=\boldsymbol{h})<\varepsilon. In other words, if we set

¯n1(𝒉):={𝒃¯n1|𝒫~Ln(𝒉,𝒃¯)𝒜},\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h}):=\{\bar{\boldsymbol{b}}\in\mathcal{B}_{n_{1}}\,|\,\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h},\bar{\boldsymbol{b}})\in\mathcal{A}_{\ell}\},

we have μ(ξ[0,n1[¯n1(𝒉)|𝒬=𝒉)1ε\mu(\xi_{[0,n_{1}[}\in\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h})\,|\,\mathcal{Q}=\boldsymbol{h})\geq 1-\varepsilon. The set ¯n1(𝒉)\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h}) is the set on which we want (18) to hold. Hall’s marriage lemma tells us that there exists an injective map ψ𝒉:¯n1(𝒉)n\psi_{\boldsymbol{h}}:\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h})\rightarrow\mathcal{B}_{n} for which (18) is true if we have the following:

I¯n1(𝒉),#I#𝒃¯IJ𝒃¯.\forall I\subset\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h}),\;\#I\leq\#\bigcup_{\bar{\boldsymbol{b}}\in I}J_{\bar{\boldsymbol{b}}}. (19)

Let I¯n1(𝒉)I\subset\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h}). Consider K:=𝒃¯I{𝒫~Ln(𝒉,𝒃¯)}𝒜K:=\bigcup_{\bar{\boldsymbol{b}}\in I}\{\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h},\bar{\boldsymbol{b}})\}\subset\mathcal{A}_{\ell} and note that

𝒃¯IJ𝒃¯={𝒃n|d(𝒫~Ln(𝒉0,𝒃n1),K)ε}.\bigcup_{\bar{\boldsymbol{b}}\in I}J_{\bar{\boldsymbol{b}}}=\{\boldsymbol{b}\in\mathcal{B}_{n}\,|\,d_{\ell}(\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h}_{0},\boldsymbol{b}_{n_{1}}),K)\leq\varepsilon\}.

Taking that into account, we have

#𝒃¯IJ𝒃¯2(hμ(ξ,T)β)nμ(d(𝒫~Ln(𝒉𝟎,ξ[0,n1[),K)ε,ξ[0,n[n)=2(hμ(ξ,T)β)nμ(d(𝒫~Ln,K)ε,ξ[0,n[n|[0,n[=𝒉0)=2(hμ(ξ,T)β)nμ(ξ[0,n[n)μ(d(𝒫~Ln,K)ε|𝒬=𝒉0)2(hμ(ξ,T)β)n(1β)λ(d(𝒑1,𝒑2)ε,𝒑1K)2(hμ(ξ,T)2β)n𝒑Kλ(d(𝒑1,𝒑2)ε|𝒑1=𝒑)𝑑λ1(𝒑)2(hμ(ξ,T)2β)n(1ε)λ1(K), because K𝒜2(hμ(ξ,T)3β)nμ(𝒫~LnK|𝒬=𝒉).\begin{split}\#\bigcup_{\bar{\boldsymbol{b}}\in I}J_{\bar{\boldsymbol{b}}}&\geq 2^{(h_{\mu}(\xi,T)-\beta)n}\mu(d_{\ell}(\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h_{0}},\xi_{[0,n_{1}[}),K)\leq\varepsilon,\xi_{[0,n[}\in\mathcal{B}_{n})\\ &=2^{(h_{\mu}(\xi,T)-\beta)n}\mu(d_{\ell}(\tilde{\mathcal{P}}_{L_{n}},K)\leq\varepsilon,\xi_{[0,n[}\in\mathcal{B}_{n}\,|\,\mathcal{H}_{[0,n[}=\boldsymbol{h}_{0})\\ &=2^{(h_{\mu}(\xi,T)-\beta)n}\mu(\xi_{[0,n[}\in\mathcal{B}_{n})\mu(d_{\ell}(\tilde{\mathcal{P}}_{L_{n}},K)\leq\varepsilon\,|\,\mathcal{Q}=\boldsymbol{h}_{0})\\ &\geq 2^{(h_{\mu}(\xi,T)-\beta)n}(1-\beta)\lambda(d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})\leq\varepsilon,\boldsymbol{p}_{1}\in K)\\ &\geq 2^{(h_{\mu}(\xi,T)-2\beta)n}\int_{\boldsymbol{p}\in K}\lambda(d_{\ell}(\boldsymbol{p}_{1},\boldsymbol{p}_{2})\leq\varepsilon\,|\,\boldsymbol{p}_{1}=\boldsymbol{p})d\lambda_{1}(\boldsymbol{p})\\ &\geq 2^{(h_{\mu}(\xi,T)-2\beta)n}(1-\varepsilon)\lambda_{1}(K),\,\text{ because }K\subset\mathcal{A}_{\ell}\\ &\geq 2^{(h_{\mu}(\xi,T)-3\beta)n}\mu(\tilde{\mathcal{P}}_{L_{n}}\in K\,|\,\mathcal{Q}=\boldsymbol{h}).\end{split} (20)

where, again, we make sure that nn is large enough. Moreover, using (17), we get

#I\displaystyle\#I 2(hμ(ξ,T)+2β)n1μ(ξ[0,n1[I,ξ[0,n[n)\displaystyle\leq 2^{(h_{\mu}(\xi,T)+2\beta)n_{1}}\mu(\xi_{[0,n_{1}[}\in I,\xi_{[0,n[}\in\mathcal{B}_{n})
2(hμ(ξ,T)+2β)n1μ(ξ[0,n1[I|ξ[0,n[n)\displaystyle\leq 2^{(h_{\mu}(\xi,T)+2\beta)n_{1}}\mu(\xi_{[0,n_{1}[}\in I\,|\,\xi_{[0,n[}\in\mathcal{B}_{n})
=2(hμ(ξ,T)+2β)n1μ(ξ[0,n1[I|𝒬=𝒉), because ξ\displaystyle=2^{(h_{\mu}(\xi,T)+2\beta)n_{1}}\mu(\xi_{[0,n_{1}[}\in I\,|\,\mathcal{Q}=\boldsymbol{h}),\,\text{ because }\xi\perp\!\!\!\perp\mathcal{H}
2(hμ(ξ,T)+2β)n1μ(𝒫~LnK|𝒬=𝒉),\displaystyle\leq 2^{(h_{\mu}(\xi,T)+2\beta)n_{1}}\mu(\tilde{\mathcal{P}}_{L_{n}}\in K\,|\,\mathcal{Q}=\boldsymbol{h}),

by definition of KK. Together with (20), it yields

#I\displaystyle\#I 2((1γ)(hμ(ξ,T)+2β)(hμ(ξ,T)3β))n#𝒃¯IJ𝒃¯\displaystyle\leq 2^{((1-\gamma)(h_{\mu}(\xi,T)+2\beta)-(h_{\mu}(\xi,T)-3\beta))n}\#\bigcup_{\bar{\boldsymbol{b}}\in I}J_{\bar{\boldsymbol{b}}}
2(5βγhμ(ξ,T))n#𝒃¯IJ𝒃¯#𝒃¯IJ𝒃¯,\displaystyle\leq 2^{(5\beta-\gamma h_{\mu}(\xi,T))n}\#\bigcup_{\bar{\boldsymbol{b}}\in I}J_{\bar{\boldsymbol{b}}}\leq\#\bigcup_{\bar{\boldsymbol{b}}\in I}J_{\bar{\boldsymbol{b}}},

since 5βγhμ(ξ,T)5\beta\leq\gamma h_{\mu}(\xi,T). Therefore there exists an injective map ψ𝒉:¯n1(𝒉)n\psi_{\boldsymbol{h}}:\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h})\rightarrow\mathcal{B}_{n} for which (18) holds. As we noted that #n1#n\#\mathcal{B}_{n_{1}}\leq\#\mathcal{B}_{n}, ψ𝒉\psi_{\boldsymbol{h}} can then be extended to an injective map defined on n1\mathcal{B}_{n_{1}} (still taking values in n\mathcal{B}_{n}). We recall that, with ψ𝒉\psi_{\boldsymbol{h}} built, we set φ𝒉(𝒃):=ψ𝒉(𝒃n1)\varphi_{\boldsymbol{h}}(\boldsymbol{b}):=\psi_{\boldsymbol{h}}(\boldsymbol{b}_{n_{1}}).

As we announced at the start of our reasoning, we define ξ~0\tilde{\xi}_{0} on the levels of C𝒉C_{\boldsymbol{h}} so that the ξ~\tilde{\xi}-name of each sub-column C𝒉𝒃C_{\boldsymbol{h}}^{\boldsymbol{b}} is φ𝒉(𝒃)=ψ𝒉(𝒃n1)\varphi_{\boldsymbol{h}}(\boldsymbol{b})=\psi_{\boldsymbol{h}}(\boldsymbol{b}_{n_{1}}). Since this construction can be done with every 𝒉¯n\boldsymbol{h}\in\bar{\mathcal{E}}_{n} (with the map ψ𝒉\psi_{\boldsymbol{h}} depending on 𝒉\boldsymbol{h}), we have completed the construction of ξ~0\tilde{\xi}_{0}. We now need to check that ξ~\tilde{\xi} satisfies the conditions (i), (ii) and (iii) of our lemma.

Step 5: Proving that ξ~\tilde{\xi} satisfies (i), (ii) and (iii)

We start by estimating the law of ξ~0\tilde{\xi}_{0}. Since μ(𝒯n)13β\mu(\mathcal{T}_{n})\geq 1-3\beta, we have

bB\displaystyle\sum_{b\in B} |μ(ξ~0=b)μ(ξ0=b)|bB|μ({ξ~0=b}𝒯n)μ(ξ0=b)μ(𝒯n)|+6β\displaystyle|\mu(\tilde{\xi}_{0}=b)-\mu(\xi_{0}=b)|\leq\sum_{b\in B}|\mu(\{\tilde{\xi}_{0}=b\}\cap\mathcal{T}_{n})-\mu(\xi_{0}=b)\mu(\mathcal{T}_{n})|+6\beta
bB𝒉n,𝒃n|μ({ξ~0=b}C𝒉𝒃)μ(ξ0=b)μ(C𝒉𝒃)|+6β.\displaystyle\leq\sum_{b\in B}\sum_{\boldsymbol{h}\in\mathcal{E}_{n},\boldsymbol{b}\in\mathcal{B}_{n}}|\mu(\{\tilde{\xi}_{0}=b\}\cap C_{\boldsymbol{h}}^{\boldsymbol{b}})-\mu(\xi_{0}=b)\mu(C_{\boldsymbol{h}}^{\boldsymbol{b}})|+6\beta.

We recall that fn(𝒃,b)f_{n}(\boldsymbol{b},b^{\prime}) is the frequency at which the element bb^{\prime} appears in the sequence 𝒃\boldsymbol{b} (see (7)). Moreover, one can see that, since φ𝒉(𝒃)\varphi_{\boldsymbol{h}}(\boldsymbol{b}) is the ξ~\tilde{\xi}-name of C𝒉𝒃C_{\boldsymbol{h}}^{\boldsymbol{b}} and all the levels of C𝒉𝒃C_{\boldsymbol{h}}^{\boldsymbol{b}} have the same measure, we have

μ({ξ~0=b}C𝒉𝒃)=μ(C𝒉𝒃)fn(φ𝒉(𝒃),b).\mu(\{\tilde{\xi}_{0}=b\}\cap C_{\boldsymbol{h}}^{\boldsymbol{b}})=\mu(C_{\boldsymbol{h}}^{\boldsymbol{b}})\cdot f_{n}(\varphi_{\boldsymbol{h}}(\boldsymbol{b}),b).

Therefore, because φ𝒉\varphi_{\boldsymbol{h}} takes values in n\mathcal{B}_{n}, (8) yields:

bB|μ(ξ~0=b)μ(ξ0=b)|\displaystyle\sum_{b\in B}|\mu(\tilde{\xi}_{0}=b)-\mu(\xi_{0}=b)| bB𝒉n,𝒃nμ(C𝒉𝒃)|fn(φ𝒉(𝒃),b)μ(ξ0=b)|+6β\displaystyle\leq\sum_{b\in B}\sum_{\boldsymbol{h}\in\mathcal{E}_{n},\boldsymbol{b}\in\mathcal{B}_{n}}\mu(C_{\boldsymbol{h}}^{\boldsymbol{b}})|f_{n}(\varphi_{\boldsymbol{h}}(\boldsymbol{b}),b)-\mu(\xi_{0}=b)|+6\beta
βμ(𝒯n)+6β7β.\displaystyle\leq\beta\mu(\mathcal{T}_{n})+6\beta\leq 7\beta.

This means that d¯1((ξ~0),(ξ0))7βα\bar{d}_{1}(\mathcal{L}(\tilde{\xi}_{0}),\mathcal{L}(\xi_{0}))\leq 7\beta\leq\alpha (using (d)).
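The two elementary quantities used in this estimate, the frequency fnf_{n} and the distance d¯1\bar{d}_{1} between laws on a finite alphabet, can be illustrated concretely; the word and reference law below are made up for the example, and we use the standard identification of d¯1\bar{d}_{1} with the total variation distance, which is in particular bounded by the 1\ell^{1}-distance appearing in the computation above.

```python
def freq(word, letter):
    """f_n(b, b'): frequency at which the letter b' appears in the word b."""
    return word.count(letter) / len(word)

def d_bar_1(p, q, alphabet):
    """d_bar_1 between two laws on a finite alphabet: the minimal
    probability of disagreement over all couplings, which equals the
    total variation distance (1/2) * sum_b |p(b) - q(b)|."""
    return 0.5 * sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in alphabet)

# Toy data: empirical law of a word vs. a uniform reference law on {0, 1}.
word = "0100110"
emp = {b: freq(word, b) for b in "01"}   # {'0': 4/7, '1': 3/7}
ref = {"0": 0.5, "1": 0.5}
dist = d_bar_1(emp, ref, "01")           # 0.5 * 2 * |4/7 - 1/2| = 1/14
```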

We now turn our attention to the entropy of ξ~\mathcal{H}\vee\tilde{\xi}. The ξ~\tilde{\xi}-name of a column C𝒉𝒃C_{\boldsymbol{h}}^{\boldsymbol{b}} is ψ𝒉(𝒃n1)\psi_{\boldsymbol{h}}(\boldsymbol{b}_{n_{1}}), and since ψ𝒉\psi_{\boldsymbol{h}} is injective, we can deduce 𝒃n1\boldsymbol{b}_{n_{1}} from the ξ~\tilde{\xi}-name of C𝒉𝒃C_{\boldsymbol{h}}^{\boldsymbol{b}}. This means that, on the levels of the truncated tower 𝒯n1:={TjF}0jn11\mathcal{T}_{n_{1}}:=\{T^{j}F\}_{0\leq j\leq n_{1}-1}, ξ0\xi_{0} is (ξ~)[n1,n[(\mathcal{H}\vee\tilde{\xi})_{[-n_{1},n[}-measurable. Indeed, if xx is in 𝒯n1\mathcal{T}_{n_{1}} and the sequence (ξ~)[n1,n[(x)(\mathcal{H}\vee\tilde{\xi})_{[-n_{1},n[}(x) is known, the sequence ξ~[n1,0[(x)\tilde{\xi}_{[-n_{1},0[}(x) must contain a “*”, which indicates the moment the past orbit of xx passes through GG before entering 𝒯n\mathcal{T}_{n}. So the position of “*” in ξ~[n1,0[(x)\tilde{\xi}_{[-n_{1},0[}(x) tells us the index of the level of 𝒯n1\mathcal{T}_{n_{1}} the point xx is on, which we call j0j_{0}. In other words, we mean that Tj0xFT^{-j_{0}}x\in F. Then, (ξ~)[j0,nj0[(\mathcal{H}\vee\tilde{\xi})_{[-j_{0},n-j_{0}[} gives the (ξ~)(\mathcal{H}\vee\tilde{\xi})-name of the column xx is on, from which we deduce the truncated ξ\xi-name of length n1n_{1} of the column. Finally, the j0j_{0}-th letter of that name gives us ξ0(x)\xi_{0}(x).

Therefore, if we combine the previous paragraph with the fact that μ(𝒯n1)(1γ)μ(𝒯n)\mu(\mathcal{T}_{n_{1}})\geq(1-\gamma)\mu(\mathcal{T}_{n}), there exists a (ξ~)[n1,n[(\mathcal{H}\vee\tilde{\xi})_{[-n_{1},n[}-measurable random variable χ0\chi_{0} such that μ(χ0ξ0)β+γ2γ\mu(\chi_{0}\neq\xi_{0})\leq\beta+\gamma\leq 2\gamma (because βγ\beta\leq\gamma). So, by the choice of γ\gamma made in (c), we can apply Lemma 2.7 and conclude that

hμ(ξ,T)\displaystyle h_{\mu}(\mathcal{H}\vee\xi,T) hμ(χ,T)+αhμ(ξ~,T)+α.\displaystyle\leq h_{\mu}(\mathcal{H}\vee\chi,T)+\alpha\leq h_{\mu}(\mathcal{H}\vee\tilde{\xi},T)+\alpha.

Because ξ\mathcal{H}\vee\xi generates 𝒜\mathscr{A}, we also have the converse inequality

hμ(ξ~,T)hμ(ξ,T),h_{\mu}(\mathcal{H}\vee\tilde{\xi},T)\leq h_{\mu}(\mathcal{H}\vee\xi,T),

so we have proved that ξ~\tilde{\xi} satisfies condition (ii) of our lemma.

We are now left with proving (iii). Suppose that ξ~[n,n[(x)\tilde{\xi}_{[-n,n[}(x) is known. If the symbol “*” appears in ξ~[n,0[\tilde{\xi}_{[-n,0[}, then xx is in 𝒯n\mathcal{T}_{n} and the position of “*” tells us the index j0j_{0} of the level of 𝒯n\mathcal{T}_{n} the point xx is on. Then, using the notation introduced in our construction above, we can look at the random variable 𝒫~j0(𝒉0,ξ~[j0,n1j0[)\tilde{\mathcal{P}}_{j_{0}}(\boldsymbol{h}_{0},\tilde{\xi}_{[-j_{0},n_{1}-j_{0}[}). It is ξ~[n,n[\tilde{\xi}_{[-n,n[}-measurable, and we are going to show that it satisfies

μ(𝒫~j0(𝒉0,ξ~[j0,n1j0[)𝒫0)5ε.\mu(\tilde{\mathcal{P}}_{j_{0}}(\boldsymbol{h}_{0},\tilde{\xi}_{[-j_{0},n_{1}-j_{0}[})\neq\mathcal{P}_{0})\leq 5\varepsilon. (21)

We start by looking at a column C𝒉C_{\boldsymbol{h}} for some 𝒉¯n\boldsymbol{h}\in\bar{\mathcal{E}}_{n}. We then split it into sub-columns C𝒉𝒃C_{\boldsymbol{h}}^{\boldsymbol{b}}. If 𝒃n1¯n1(𝒉)\boldsymbol{b}_{n_{1}}\in\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h}), we are going to use (18). First, we need to remember that if xx is in C𝒉𝒃C_{\boldsymbol{h}}^{\boldsymbol{b}}, then ξ~[j0,nj0[(x)\tilde{\xi}_{[-j_{0},n-j_{0}[}(x) gives the ξ~\tilde{\xi}-name of the column C𝒉𝒃C_{\boldsymbol{h}}^{\boldsymbol{b}}. But, by construction, that name is ψ𝒉(𝒃n1)\psi_{\boldsymbol{h}}(\boldsymbol{b}_{n_{1}}), and, because we are looking at the case where 𝒉¯n\boldsymbol{h}\in\bar{\mathcal{E}}_{n} and 𝒃n1¯n1(𝒉)\boldsymbol{b}_{n_{1}}\in\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h}), (18) holds. So we have

d(𝒫~Ln(𝒉0,ξ~[j0,n1j0[),𝒫~Ln(𝒉,𝒃n1))ε.d_{\ell}(\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h}_{0},\tilde{\xi}_{[-j_{0},n_{1}-j_{0}[}),\tilde{\mathcal{P}}_{L_{n}}(\boldsymbol{h},\boldsymbol{b}_{n_{1}}))\leq\varepsilon.

We recall that Ln=[n0,n1n0[L_{n}=[n_{0},n_{1}-n_{0}[ and \ell is its length. By definition of dd_{\ell}, we know that the number of levels j0Lnj_{0}\in L_{n} on which we have 𝒫~j0(𝒉0,ξ~[j0,n1j0[)=𝒫~j0(𝒉,𝒃n1)\tilde{\mathcal{P}}_{j_{0}}(\boldsymbol{h}_{0},\tilde{\xi}_{[-j_{0},n_{1}-j_{0}[})=\tilde{\mathcal{P}}_{j_{0}}(\boldsymbol{h},\boldsymbol{b}_{n_{1}}) is greater than (1ε)(1-\varepsilon)\ell. Moreover, by construction, for j0Lnj_{0}\in L_{n}, on the j0j_{0}-th level of C𝒉𝒃C_{\boldsymbol{h}}^{\boldsymbol{b}}, we have 𝒫~0=𝒫~j0(𝒉,𝒃n1)\tilde{\mathcal{P}}_{0}=\tilde{\mathcal{P}}_{j_{0}}(\boldsymbol{h},\boldsymbol{b}_{n_{1}}). Finally, since =n12n0\ell=n_{1}-2n_{0}, we have

μ(𝒫~j0(𝒉0,ξ~[j0,n1j0[)𝒫~0|C𝒉𝒃)\displaystyle\mu(\tilde{\mathcal{P}}_{j_{0}}(\boldsymbol{h}_{0},\tilde{\xi}_{[-j_{0},n_{1}-j_{0}[})\neq\tilde{\mathcal{P}}_{0}\,|\,C_{\boldsymbol{h}}^{\boldsymbol{b}}) n(1ε)(n12n0)n\displaystyle\leq\frac{n-(1-\varepsilon)(n_{1}-2n_{0})}{n}
n(1ε)(1γ)n+2n0n\displaystyle\leq\frac{n-(1-\varepsilon)(1-\gamma)n+2n_{0}}{n}
ε+γ+2n0n2ε,\displaystyle\leq\varepsilon+\gamma+\frac{2n_{0}}{n}\leq 2\varepsilon,

since γε/2\gamma\leq\varepsilon/2 and we can assume that nn is large enough so that 2n0/nε/22n_{0}/n\leq\varepsilon/2. Moreover, the fact that 𝒉¯n\boldsymbol{h}\in\bar{\mathcal{E}}_{n} implies that μ(ξ[0,n1[¯n1(𝒉)|𝒬=𝒉)1ε\mu(\xi_{[0,n_{1}[}\in\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h})\,|\,\mathcal{Q}=\boldsymbol{h})\geq 1-\varepsilon, and, combining it with (9), we can see that

μ(𝒃n1¯n1(𝒉)C𝒉𝒃|C𝒉)ε.\mu\left(\left.\bigcup_{\boldsymbol{b}_{n_{1}}\notin\bar{\mathcal{B}}_{n_{1}}(\boldsymbol{h})}C_{\boldsymbol{h}}^{\boldsymbol{b}}\,\right|\,C_{\boldsymbol{h}}\right)\leq\varepsilon.

Therefore

μ(𝒫~j0(𝒉0,ξ~[j0,n1j0[)𝒫~0|C𝒉)3ε.\mu(\tilde{\mathcal{P}}_{j_{0}}(\boldsymbol{h}_{0},\tilde{\xi}_{[-j_{0},n_{1}-j_{0}[})\neq\tilde{\mathcal{P}}_{0}\,|\,C_{\boldsymbol{h}})\leq 3\varepsilon.

Next

μ(𝒫~j0(𝒉0,ξ~[j0,n1j0[)𝒫~0)\displaystyle\mu(\tilde{\mathcal{P}}_{j_{0}}(\boldsymbol{h}_{0},\tilde{\xi}_{[-j_{0},n_{1}-j_{0}[})\neq\tilde{\mathcal{P}}_{0}) 3ε+μ(𝒉¯nC𝒉)+μ(𝒯nc)\displaystyle\leq 3\varepsilon+\mu\left(\bigcup_{\boldsymbol{h}\notin\bar{\mathcal{E}}_{n}}C_{\boldsymbol{h}}\right)+\mu(\mathcal{T}_{n}^{c})
3ε+ε+μ(𝒯nc) using (9) and (11)\displaystyle\leq 3\varepsilon+\varepsilon+\mu(\mathcal{T}_{n}^{c})\;\text{ using \eqref{eq:indep_base_tower} and \eqref{eq:set_extremality}}
4ε+3β.\displaystyle\leq 4\varepsilon+3\beta.

Finally, 𝒫~0\tilde{\mathcal{P}}_{0} was chosen so that μ(𝒫~0𝒫0)β2β\mu(\tilde{\mathcal{P}}_{0}\neq\mathcal{P}_{0})\leq\beta^{2}\leq\beta. Since βε/4\beta\leq\varepsilon/4, this proves (21), and therefore, up to replacing ε\varepsilon by ε/5\varepsilon/5, we have shown that

𝒫0εσ(ξ~).\mathcal{P}_{0}\preceq_{\varepsilon}\sigma(\tilde{\xi}).

3.2 Application of the technical lemma

We are now left with proving Proposition 3.2 using Lemma 3.7. This is done using some abstract results from Thouvenot [26, Proposition 2’, Proposition 3]. We restate those results in our notation, with a slight simplification adapted to our setup.

First, [26, Proposition 2’] tells us that a process that is close enough, in law and entropy, to an i.i.d. process independent from \mathscr{H} can be turned into an i.i.d. process independent from \mathscr{H}.

Proposition 3.8.

Let 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T) be an ergodic system of finite entropy. Let \mathcal{H} be a finite valued process defined on 𝐗\mathbf{X} and ρ\rho be a probability measure on a finite alphabet BB. For every ε>0\varepsilon>0, there exists α>0\alpha>0 such that if a random variable ξ~0:XB\tilde{\xi}_{0}:X\rightarrow B satisfies

  1. (i)

    d¯1(ρ,(ξ~0))α\bar{d}_{1}(\rho,\mathcal{L}(\tilde{\xi}_{0}))\leq\alpha,

  2. (ii)

    and 0hμ(,T)+H(ρ)hμ(ξ~,T)α0\leq h_{\mu}(\mathcal{H},T)+H(\rho)-h_{\mu}(\mathcal{H}\vee\tilde{\xi},T)\leq\alpha,

then there exists a random variable ξ0\xi^{\prime}_{0} of law ρ\rho such that the process ξ:=(ξ0Tn)n\xi^{\prime}:=(\xi^{\prime}_{0}\circ T^{n})_{n\in\mathbb{Z}} is i.i.d., independent from \mathcal{H} and we have

μ(ξ~0ξ0)ε.\mu(\tilde{\xi}_{0}\neq\xi^{\prime}_{0})\leq\varepsilon.

Next, [26, Proposition 3] tells us that on a system that is relatively Bernoulli over a factor \mathscr{H}, any i.i.d. process independent from \mathscr{H} with the right entropy can be turned into an independent complement of \mathscr{H}:

Proposition 3.9.

Let 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T) be an ergodic system, \mathcal{H} a finite valued process and ξ\xi a finite valued i.i.d. process independent from \mathcal{H} such that 𝒜=σ()σ(ξ)\mathscr{A}=\sigma(\mathcal{H})\vee\sigma(\xi) mod μ\mu. For any ε>0\varepsilon>0 and any i.i.d. process ζ\zeta independent from \mathcal{H} such that hμ(ξ,T)=hμ(ζ,T)h_{\mu}(\xi,T)=h_{\mu}(\zeta,T), there exists ζ~0\tilde{\zeta}_{0} such that (ζ~)=(ζ)\mathcal{L}(\mathcal{H}\vee\tilde{\zeta})=\mathcal{L}(\mathcal{H}\vee\zeta), 𝒜=σ()σ(ζ~)\mathscr{A}=\sigma(\mathcal{H})\vee\sigma(\tilde{\zeta}) mod μ\mu and

μ(ζ~0ζ0)ε.\mu(\tilde{\zeta}_{0}\neq\zeta_{0})\leq\varepsilon.

We are now fully equipped to end the proof of Proposition 3.2:

Proof of Proposition 3.2.

Let 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T) be a Bernoulli shift of finite entropy and 𝒫0:XA\mathcal{P}_{0}:X\rightarrow A a finite valued random variable such that 𝒜=σ({𝒫0Tn}n)\mathscr{A}=\sigma(\{\mathcal{P}_{0}\circ T^{n}\}_{n\in\mathbb{Z}}). As \mathscr{H} is a factor σ\sigma-algebra of 𝐗\mathbf{X}, it has finite entropy; therefore there exists a finite valued random variable 0:XH\mathcal{H}_{0}:X\rightarrow H such that the process :=(0Tn)n\mathcal{H}:=(\mathcal{H}_{0}\circ T^{n})_{n\in\mathbb{Z}} generates \mathscr{H}. Lastly, we take an i.i.d. process ξ\xi independent from \mathscr{H} such that 𝒜=σ(ξ)\mathscr{A}=\mathscr{H}\vee\sigma(\xi) mod μ\mu. Let ε>0\varepsilon>0.

Now, Lemma 3.7 tells us that there is δ>0\delta>0 for which, if hμ(,T)δh_{\mu}(\mathscr{H},T)\leq\delta, then for any α>0\alpha>0, there is a random variable ξ~0\tilde{\xi}_{0} such that

  1. (i)

    d¯1((ξ0),(ξ~0))α\bar{d}_{1}(\mathcal{L}(\xi_{0}),\mathcal{L}(\tilde{\xi}_{0}))\leq\alpha,

  2. (ii)

    0hμ(ξ,T)hμ(ξ~,T)α0\leq h_{\mu}(\mathcal{H}\vee\xi,T)-h_{\mu}(\mathcal{H}\vee\tilde{\xi},T)\leq\alpha,

  3. (iii)

    and 𝒫0ε/4σ(ξ~)\mathcal{P}_{0}\preceq_{\varepsilon/4}\sigma(\tilde{\xi}).

Denote by 𝒫~0\tilde{\mathcal{P}}_{0} a ξ~\tilde{\xi}-measurable random variable such that μ(𝒫~0𝒫0)ε/4\mu(\tilde{\mathcal{P}}_{0}\neq\mathcal{P}_{0})\leq\varepsilon/4. We can find an integer N1N\geq 1 for which 𝒫~0ε/4ξ~[N,N]\tilde{\mathcal{P}}_{0}\preceq_{\varepsilon/4}\tilde{\xi}_{[-N,N]} and set ε1:=ε/(4(2N+1))\varepsilon_{1}:=\varepsilon/(4(2N+1)). If α\alpha is chosen small enough, then Proposition 3.8 tells us that there is a random variable ξ0\xi^{\prime}_{0} such that the process (ξ0Tn)n(\xi^{\prime}_{0}\circ T^{n})_{n\in\mathbb{Z}} is i.i.d., independent from \mathcal{H} and we have μ(ξ0ξ~0)ε1\mu(\xi^{\prime}_{0}\neq\tilde{\xi}_{0})\leq\varepsilon_{1}. Finally, Proposition 3.9 tells us that we can then find a random variable ξ0′′\xi^{\prime\prime}_{0} for which the process (ξ0′′Tn)n(\xi^{\prime\prime}_{0}\circ T^{n})_{n\in\mathbb{Z}} is still i.i.d., independent from \mathcal{H}, but we also have that 𝒜=σ(ξ′′)\mathscr{A}=\mathscr{H}\vee\sigma(\xi^{\prime\prime}) mod μ\mu and μ(ξ0′′ξ0)ε1\mu(\xi^{\prime\prime}_{0}\neq\xi^{\prime}_{0})\leq\varepsilon_{1}. So we have μ(ξ0′′ξ~0)2ε1\mu(\xi^{\prime\prime}_{0}\neq\tilde{\xi}_{0})\leq 2\varepsilon_{1}.

Combining that with the fact that 𝒫~0ε/4ξ~[N,N]\tilde{\mathcal{P}}_{0}\preceq_{\varepsilon/4}\tilde{\xi}_{[-N,N]}, we get that 𝒫~03ε/4ξ[N,N]′′\tilde{\mathcal{P}}_{0}\preceq_{3\varepsilon/4}\xi^{\prime\prime}_{[-N,N]}, so

𝒫0εξ[N,N]′′.\mathcal{P}_{0}\preceq_{\varepsilon}\xi^{\prime\prime}_{[-N,N]}.

Setting :=σ(ξ′′)\mathscr{B}:=\sigma(\xi^{\prime\prime}), we get the Bernoulli factor desired to prove our proposition. ∎

3.3 Proof of Theorem 3.1

Having concluded the proof of Proposition 3.2 in the previous section, we now show how Theorem 3.1 follows from that proposition:

Proof of Theorem 3.1.

Let 𝐗:=(X,𝒜,μ,T)\mathbf{X}:=(X,\mathscr{A},\mu,T) be a Bernoulli system and :=(n)n0\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0} be a weak Pinsker filtration. Since \mathscr{F} is a weak Pinsker filtration, if (\mathscr{F}_{n})_{n\leq-1} is of product type, so is \mathscr{F}. Therefore, up to replacing 𝐗\mathbf{X} by the factor generated by 1\mathscr{F}_{-1}, we can assume that 𝐗\mathbf{X} has finite entropy. Thanks to Theorem 2.18, this means that we can fix a finite alphabet AA and a random variable 𝒫0:XA\mathcal{P}_{0}:X\rightarrow A such that the corresponding process 𝒫:=(𝒫0Ti)i\mathcal{P}:=(\mathcal{P}_{0}\circ T^{i})_{i\in\mathbb{Z}} generates 𝒜\mathscr{A}, i.e. 𝒜=σ(𝒫)\mathscr{A}=\sigma(\mathcal{P}) mod μ\mu. Let (εk)k1(\varepsilon_{k})_{k\geq 1} be a decreasing sequence of positive numbers such that limkεk=0\lim_{k\rightarrow\infty}\varepsilon_{k}=0.

We need to build a strictly increasing sequence (nk)k0(n_{k})_{k\leq 0} such that (nk)k0(\mathscr{F}_{n_{k}})_{k\leq 0} is of product type. We start by setting n0=0n_{0}=0. Since limnhμ(n,T)=0\lim_{n\rightarrow-\infty}h_{\mu}(\mathscr{F}_{n},T)=0, we can choose n10n_{-1}\leq 0 large enough (in absolute value), so that hμ(n1,T)h_{\mu}(\mathscr{F}_{n_{-1}},T) is small enough for Proposition 3.2 to enable us to build a Bernoulli factor σ\sigma-algebra n1\mathscr{B}_{n_{-1}} that is an independent complement of n1\mathscr{F}_{n_{-1}} such that 𝒫0ε1n1\mathcal{P}_{0}\preceq_{\varepsilon_{1}}\mathscr{B}_{n_{-1}}.

Now take $k\leq-1$ and assume that we have built mutually independent Bernoulli factors $(\mathscr{B}_{n_{-1}},...,\mathscr{B}_{n_{k}})$ such that, for $k\leq j\leq-1$, $\mathscr{B}_{n_{j}}$ is independent from $\mathscr{F}_{n_{j}}$, $\mathscr{F}_{n_{j+1}}=\mathscr{F}_{n_{j}}\vee\mathscr{B}_{n_{j}}$, and we have

𝒫0ε|k|kj1nj.\mathcal{P}_{0}\preceq_{\varepsilon_{|k|}}\bigvee_{k\leq j\leq-1}\mathscr{B}_{n_{j}}. (22)

By construction of the nj\mathscr{B}_{n_{j}}, we know that 𝒫\mathcal{P} is measurable with respect to nkkj1nj\mathscr{F}_{n_{k}}\vee\bigvee_{k\leq j\leq-1}\mathscr{B}_{n_{j}}. Moreover, using again Theorem 2.18, there is a random variable 𝒫0(k)\mathcal{P}^{(k)}_{0} such that the process 𝒫(k):=(𝒫0(k)Ti)i\mathcal{P}^{(k)}:=(\mathcal{P}^{(k)}_{0}\circ T^{i})_{i\in\mathbb{Z}} generates nk\mathscr{F}_{n_{k}}. So there exists an integer N1N\geq 1 such that

𝒫0ε|k|+1/2𝒫[N,N](k)kj1nj.\mathcal{P}_{0}\preceq_{\varepsilon_{|k|+1}/2}\mathcal{P}^{(k)}_{[-N,N]}\vee\bigvee_{k\leq j\leq-1}\mathscr{B}_{n_{j}}. (23)

Then set $\tilde{\varepsilon}:=\varepsilon_{|k|+1}/(2(2N+1))>0$. As we did above, we choose $n_{k-1}<n_{k}$ large enough in absolute value so that $h_{\mu}(\mathscr{F}_{n_{k-1}},T)$ is small enough for us to apply Proposition 3.2 and find a Bernoulli factor $\mathscr{B}_{n_{k-1}}\subset\mathscr{F}_{n_{k}}$ such that $\mathscr{B}_{n_{k-1}}\perp\!\!\!\perp\mathscr{F}_{n_{k-1}}$, $\mathscr{F}_{n_{k}}=\mathscr{F}_{n_{k-1}}\vee\mathscr{B}_{n_{k-1}}$ and

𝒫0(k)ε~nk1.\mathcal{P}^{(k)}_{0}\preceq_{\tilde{\varepsilon}}\mathscr{B}_{n_{k-1}}. (24)

Putting (23) and (24) together, we get

𝒫0ε|k|+1k1j1nj.\mathcal{P}_{0}\preceq_{\varepsilon_{|k|+1}}\bigvee_{k-1\leq j\leq-1}\mathscr{B}_{n_{j}}.

Iterating this for every $k\leq-1$ completes our construction of $(n_{k})_{k\leq 0}$ and $(\mathscr{B}_{n_{k}})_{k\leq-1}$, and (22) holds for every $k\leq-1$. It then follows that $\mathcal{P}_{0}$ is measurable with respect to

j1nj.\bigvee_{j\leq-1}\mathscr{B}_{n_{j}}.

Since the nj\mathscr{B}_{n_{j}} are factor σ\sigma-algebras, the full process 𝒫\mathcal{P} is also j1nj\bigvee_{j\leq-1}\mathscr{B}_{n_{j}}-measurable. Finally, 𝒫\mathcal{P} generates 𝒜\mathscr{A}, so

j1nj=𝒜=0 mod μ.\bigvee_{j\leq-1}\mathscr{B}_{n_{j}}=\mathscr{A}=\mathscr{F}_{0}\,\text{ mod }\mu.

Let k1k\leq-1, and set 1:=jk1nj\mathscr{E}_{1}:=\bigvee_{j\leq k-1}\mathscr{B}_{n_{j}} and 2:=kj1nj\mathscr{E}_{2}:=\bigvee_{k\leq j\leq-1}\mathscr{B}_{n_{j}}. By construction, we have

1nk,nk2, and 0=12.\mathscr{E}_{1}\subset\mathscr{F}_{n_{k}},\;\mathscr{F}_{n_{k}}\perp\!\!\!\perp\mathscr{E}_{2},\;\text{ and }\;\mathscr{F}_{0}=\mathscr{E}_{1}\vee\mathscr{E}_{2}.

We use this to see that if ff is nk\mathscr{F}_{n_{k}}-measurable, we have

f=𝔼[f|0]=𝔼[f|12]=𝔼[f|1],f=\mathbb{E}[f\,|\,\mathscr{F}_{0}]=\mathbb{E}[f\,|\,\mathscr{E}_{1}\vee\mathscr{E}_{2}]=\mathbb{E}[f\,|\,\mathscr{E}_{1}],

which proves that

$\mathscr{F}_{n_{k}}=\mathscr{E}_{1}=\bigvee_{j\leq k-1}\mathscr{B}_{n_{j}}\;\text{ mod }\mu.$ ∎

4 Examples of weak Pinsker filtrations generated by a cellular automaton

Up to this point, we have discussed the existence and abstract properties of weak Pinsker filtrations. Now we want to give explicit examples to get a more concrete idea of what those objects can look like. We take inspiration from [10] and use cellular automata to generate our filtrations. We describe in the following paragraphs how this is done.

Let $A$ be a finite alphabet. A cellular automaton (or, more precisely, a deterministic cellular automaton) $\tau:A^{\mathbb{Z}}\rightarrow A^{\mathbb{Z}}$ maps $A^{\mathbb{Z}}$ to itself as follows: take a finite set $F\subset\mathbb{Z}$, which we call a neighborhood, and a local map $\tau_{0}:A^{F}\rightarrow A$. Then define

τ:(an)n(τ0((an+k)kF))n.\tau:(a_{n})_{n\in\mathbb{Z}}\mapsto(\tau_{0}((a_{n+k})_{k\in F}))_{n\in\mathbb{Z}}.

Here, we will only consider examples in which F={0,1}F=\{0,1\}. Therefore, our automata will be determined by a local map of the form τ0:A{0,1}A\tau_{0}:A^{\{0,1\}}\rightarrow A. One can note that, by construction, cellular automata commute with the shift transformation

S:(an)n(an+1)n.S:(a_{n})_{n\in\mathbb{Z}}\mapsto(a_{n+1})_{n\in\mathbb{Z}}.

So we can consider a dynamical system of the form $\mathbf{Y}:=(A^{\mathbb{Z}},\mathscr{B},\nu,S)$, where $\nu$ is an $S$-invariant measure, and note that the $\sigma$-algebra $\sigma(\tau)$ generated by $\tau$ is a factor $\sigma$-algebra. We can do better and iterate $\tau$ to generate a filtration:

for n0,n:=σ(τ|n|).\text{for }n\leq 0,\;\mathscr{F}_{n}:=\sigma(\tau^{|n|}).

In that case, each n\mathscr{F}_{n} is a factor σ\sigma-algebra of 𝐘\mathbf{Y}, and therefore :=(n)n0\mathscr{F}:=(\mathscr{F}_{n})_{n\leq 0} is a dynamical filtration. So, we see that cellular automata give a natural way to construct dynamical filtrations.
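The commutation of $\tau$ with $S$ (which is what makes $\sigma(\tau)$ a factor $\sigma$-algebra above) is easy to check numerically. The following sketch applies a local map with neighborhood $F=\{0,1\}$ on a finite window; the window shrinking by one symbol at each step is only a truncation artifact, and the three-letter alphabet is an arbitrary illustrative choice.

```python
import random

def shift(seq):
    """Left shift S on a finite window: drop the first symbol."""
    return seq[1:]

def automaton(local_map, seq):
    """Cellular automaton with neighborhood F = {0, 1} on a finite window:
    tau((a_n))_n = local_map(a_n, a_{n+1}); the window shrinks by one symbol."""
    return [local_map(seq[i], seq[i + 1]) for i in range(len(seq) - 1)]

# The local map from (25): keep a symbol iff it equals its right neighbour.
tau0 = lambda a, b: a if a == b else 0

random.seed(0)
x = [random.choice([0, 1, 2]) for _ in range(50)]

# tau commutes with the shift: tau(S x) == S(tau x) on the common window.
lhs = automaton(tau0, shift(x))
rhs = shift(automaton(tau0, x))
assert lhs == rhs
```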

In fact, the theory of dynamical filtrations we presented in Section 2.2 was initiated in [10] in the setting of filtrations generated by cellular automata. However, the automata studied there preserve the product measure, and therefore the entropy of the associated factor σ\sigma-algebras n\mathscr{F}_{n} will be the same for every n0n\leq 0. This prevents the filtration from being weak Pinsker.

Here, we will consider a different automaton: take a finite alphabet $A$ and assume that one element of $A$ is labeled “0”. Then define the following local map

τ0:A2A(α1,α2){α1 if α1=α20 otherwise.\begin{array}[]{cccc}\tau_{0}:&A^{2}&\longrightarrow&A\\ &(\alpha_{1},\alpha_{2})&\mapsto&\left\{\begin{array}[]{ll}\alpha_{1}&\text{ if }\,\alpha_{1}=\alpha_{2}\\ 0&\text{ otherwise}.\end{array}\right.\end{array} (25)

The associated automaton eliminates isolated elements, replacing them with 0, and a maximal string of the form $\alpha\cdots\alpha\alpha$ is replaced with $\alpha\cdots\alpha 0$. For example, if $A=\{0,1\}$, this gives:

[Uncaptioned image]

Therefore, as we iterate the automaton, the proportion of “0” increases, as all other symbols are gradually replaced by “0”. Heuristically, this indicates that the entropy of the factor $\sigma$-algebras $\sigma(\tau^{|n|})$ will go to zero as $n$ goes to infinity. But to state this rigorously, one needs to specify the system $\mathbf{Y}:=(A^{\mathbb{Z}},\mathscr{B},\nu,S)$ on which we define $\mathscr{F}$; more precisely, it is the alphabet $A$ and the measure $\nu$ that need to be specified. However, the entropy $h_{\nu}(\mathscr{F}_{n})$ goes to 0 regardless of the choice of $A$ and $\nu$:
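This heuristic can be watched in action by iterating the automaton on a long random word and tracking the proportion of “0”. This is only an illustration on a finite window (with an arbitrary three-letter alphabet and an i.i.d. uniform word, both illustrative assumptions), not part of the proof:

```python
import random

def tau(seq):
    """One step of the automaton (25) on a finite window (output shrinks by 1).
    Note tau0(0, b) = 0 for every b, so 0s persist and new 0s keep appearing."""
    return [a if a == b else 0 for a, b in zip(seq, seq[1:])]

random.seed(1)
x = [random.choice([0, 1, 2]) for _ in range(20000)]

fracs = []
for _ in range(6):
    fracs.append(x.count(0) / len(x))
    x = tau(x)

# The proportion of 0s is non-decreasing along the iteration and grows quickly.
assert all(fracs[i] <= fracs[i + 1] for i in range(len(fracs) - 1))
assert fracs[-1] > fracs[0]
```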

Proposition 4.1.

Let 𝐘:=(A,,ν,S)\mathbf{Y}:=(A^{\mathbb{Z}},\mathscr{B},\nu,S), where ν\nu is a SS-invariant measure and let ξ\xi be the coordinate process on 𝐘\mathbf{Y}. For every n1n\geq 1, we have

hν(τnξ,S)log(#An2)n.h_{\nu}(\tau^{n}\xi,S)\leq\frac{\log(\#An^{2})}{n}.
Proof.

Let BnAnB_{n}\subset A^{n} be the set of values taken by (τnξ)[0,n[(\tau^{n}\xi)_{[0,n[}. We know that

Hν((τnξ)[0,n[)log(#Bn).H_{\nu}((\tau^{n}\xi)_{[0,n[})\leq\log(\#B_{n}).

Because of the structure of $\tau$, in $\tau^{n}\xi$, for $\alpha\neq 0$, any run of “$\alpha$” lies between two runs of “0” of length at least $n$. Therefore, the window $(\tau^{n}\xi)_{[0,n[}$ cannot meet two distinct runs of nonzero symbols: it is either a string of “0” or consists of a single run of “$\alpha$” (with $\alpha\neq 0$) surrounded by “0”. So

#Bn1+(#A1)n2#An2.\#B_{n}\leq 1+(\#A-1)n^{2}\leq\#An^{2}.

In conclusion

hν(τnξ,S)1nHν((τnξ)[0,n[)log(#An2)n.h_{\nu}(\tau^{n}\xi,S)\leq\frac{1}{n}H_{\nu}((\tau^{n}\xi)_{[0,n[})\leq\frac{\log(\#An^{2})}{n}.
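The counting step in this proof can be checked by brute force for small $n$: since $(\tau^{n}\xi)_{[0,n[}$ only depends on $\xi_{[0,2n[}$, enumerating every input of length $2n$ produces all possible windows, and their number must respect the bound $\#B_{n}\leq 1+(\#A-1)n^{2}$. A small sketch (alphabet sizes chosen for speed):

```python
from itertools import product

def tau(seq):
    """One application of the automaton (25): a symbol survives iff it
    equals its right neighbour; otherwise it becomes 0."""
    return tuple(a if a == b else 0 for a, b in zip(seq, seq[1:]))

def count_windows(alphabet, n):
    """Enumerate every input of length 2n (which determines (tau^n xi)_{[0,n[})
    and count the distinct windows, i.e. #B_n."""
    windows = set()
    for xi in product(alphabet, repeat=2 * n):
        w = xi
        for _ in range(n):
            w = tau(w)
        windows.add(w)
    return len(windows)

for alphabet in ((0, 1), (0, 1, 2)):
    for n in (2, 3):
        assert count_windows(alphabet, n) <= 1 + (len(alphabet) - 1) * n ** 2
```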

In Section 4.1, we deal with the case where 𝐘\mathbf{Y} is a Bernoulli shift, and in Section 4.2, we deal with the case where 𝐘\mathbf{Y} is Ornstein’s example of a non-Bernoulli K-system from [15]. In both cases, by Proposition 4.1, the entropy of the filtration generated by the cellular automaton goes to zero. Then we look at each example separately to show the more involved result: each n+1\mathscr{F}_{n+1} is relatively Bernoulli over n\mathscr{F}_{n}. Therefore, we get two examples of weak Pinsker filtrations.

It is interesting to note that these two filtrations are very similar in their construction; however, while the filtration on Ornstein's K-system (or any of its sub-sequences) cannot be of product type (otherwise, the system would be Bernoulli), we know from Theorem 3.1 that the filtration on the Bernoulli shift admits a sub-sequence of product type. This shows that there can be subtle differences in the asymptotic structure of weak Pinsker filtrations.

4.1 A cellular automaton on a Bernoulli shift

Here, we consider a Bernoulli shift $\mathbf{Y}:=(A^{\mathbb{Z}},\mathscr{B},\nu,S)$ where $\nu$ is a product measure. To avoid unnecessarily complicated notation, we will also assume that $A=\{0,1\}$ and $\nu:=(\frac{1}{2}(\delta_{0}+\delta_{1}))^{\otimes\mathbb{Z}}$. The local function (25) then becomes:

τ0:{0,1}2{0,1}α{1 if α=(1,1)0 otherwise.\begin{array}[]{cccc}\tau_{0}:&\{0,1\}^{2}&\longrightarrow&\{0,1\}\\ &\alpha&\mapsto&\left\{\begin{array}[]{ll}1&\text{ if }\,\alpha=(1,1)\\ 0&\text{ otherwise}.\end{array}\right.\end{array}

We then study the corresponding automaton:

τ:{0,1}{0,1}(an)n(τ0(an,an+1))n\begin{array}[]{cccc}\tau:&\{0,1\}^{\mathbb{Z}}&\longrightarrow&\{0,1\}^{\mathbb{Z}}\\ &(a_{n})_{n\in\mathbb{Z}}&\mapsto&(\tau_{0}(a_{n},a_{n+1}))_{n\in\mathbb{Z}}\\ \end{array}

The automaton replaces an isolated “1” with a “0” and shortens each run of “1” by replacing its final “1” with a “0”.

Theorem 4.2.

On the system 𝐘:=({0,1},,ν,S)\mathbf{Y}:=(\{0,1\}^{\mathbb{Z}},\mathscr{B},\nu,S), the filtration given by :=(σ(τ|n|))n0\mathscr{F}:=(\sigma(\tau^{|n|}))_{n\leq 0} is a weak Pinsker filtration. That is, for every n1n\leq-1, n+1\mathscr{F}_{n+1} is relatively Bernoulli over n\mathscr{F}_{n} and we have

hν(n)n0.h_{\nu}(\mathscr{F}_{n})\underset{n\rightarrow-\infty}{\longrightarrow}0. (26)

The convergence of the entropy follows from Proposition 4.1. However, when 𝐘\mathbf{Y} is a Bernoulli shift, we can compute a better bound, as stated in Proposition 4.3.

Proposition 4.3.

Let ξ\xi denote the coordinate process on 𝐘\mathbf{Y}. For every n0n\geq 0, we have

hν(τnξ,S)3log(2)2n/2.h_{\nu}(\tau^{n}\xi,S)\leq 3\log(2)2^{-n/2}.
Proof.

Let n0n\geq 0. One can see that τnξ\tau^{n}\xi is 11 at ii if and only if ξ\xi is 11 over the entire segment [i,i+n][i,i+n], as shown below:

[Uncaptioned image]

We set kn:=n/2k_{n}:=\lceil n/2\rceil, and we remark that

ν({i[0,kn],(τnξ)i=1})ν({ξ[kn,n]=(1,,1)})1/2nkn+11/2n/2.\nu(\{\exists i\in[0,k_{n}],(\tau^{n}\xi)_{i}=1\})\leq\nu(\{\xi_{[k_{n},n]}=(1,...,1)\})\leq 1/2^{n-k_{n}+1}\leq 1/2^{n/2}.

Then, combining this with Fano’s inequality (Lemma 2.5) we get

Hν((τnξ)[0,kn])2n/2(1+log(2kn+1)+log(2n/2))2n/23(kn+1)log(2),H_{\nu}((\tau^{n}\xi)_{[0,k_{n}]})\leq 2^{-n/2}(1+\log(2^{k_{n}+1})+\log(2^{n/2}))\leq 2^{-n/2}3(k_{n}+1)\log(2),

and we can conclude for the KS-entropy:

hν(τnξ,S)1kn+1Hν((τnξ)[0,kn])3log(2)2n/2.h_{\nu}(\tau^{n}\xi,S)\leq\frac{1}{k_{n}+1}H_{\nu}((\tau^{n}\xi)_{[0,k_{n}]})\leq 3\log(2)2^{-n/2}.
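The key observation of this proof — $(\tau^{n}\xi)_{i}=1$ if and only if $\xi$ is 1 on the whole segment $[i,i+n]$ — can be verified exhaustively for a small $n$, since $(\tau^{n}\xi)_{0}$ only depends on $\xi_{[0,n]}$:

```python
from itertools import product

def tau(seq):
    """Automaton on A = {0, 1}: keep a 1 iff its right neighbour is also 1."""
    return tuple(1 if a == b == 1 else 0 for a, b in zip(seq, seq[1:]))

def top(xi):
    """Return (tau^n xi)_0 for a window xi of length n + 1."""
    w = tuple(xi)
    while len(w) > 1:
        w = tau(w)
    return w[0]

n = 4
for xi in product((0, 1), repeat=n + 1):
    # (tau^n xi)_0 = 1  <=>  xi is 1 on the whole segment [0, n]
    assert (top(xi) == 1) == all(b == 1 for b in xi)
```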

In addition, we give the following simple lemma on conditional independence:

Lemma 4.4.

Let (X,𝒜,μ)(X,\mathscr{A},\mu) be a probability space and 𝒵\mathscr{Z} a sub-σ\sigma-algebra. Let AA, BB, UU and VV be random variables such that

(A,U)𝒵(B,V).(A,U)\perp\!\!\!\perp_{\mathscr{Z}}(B,V).

Then we have

(A,B|U,V,𝒵)\displaystyle\mathcal{L}(A,B\,|\,U,V,\mathscr{Z}) =(A|U,𝒵)(B|V,𝒵)\displaystyle=\mathcal{L}(A\,|\,U,\mathscr{Z})\otimes\mathcal{L}(B\,|\,V,\mathscr{Z})
=(A|U,V,𝒵)(B|U,V,𝒵)\displaystyle=\mathcal{L}(A\,|\,U,V,\mathscr{Z})\otimes\mathcal{L}(B\,|\,U,V,\mathscr{Z})
Proof.

This follows from the fact that, if $A^{\prime}$, $B^{\prime}$, $U^{\prime}$ and $V^{\prime}$ are bounded random variables that are respectively $A$-, $B$-, $U$- and $V$-measurable, then:

𝔼[ABUV|𝒵]\displaystyle\mathbb{E}[A^{\prime}\cdot B^{\prime}\cdot U^{\prime}\cdot V^{\prime}\,|\,\mathscr{Z}] =𝔼[AU|𝒵]𝔼[BV|𝒵]\displaystyle=\mathbb{E}[A^{\prime}\cdot U^{\prime}\,|\,\mathscr{Z}]\cdot\mathbb{E}[B^{\prime}\cdot V^{\prime}\,|\,\mathscr{Z}]
=𝔼[𝔼[A|U,𝒵]U|𝒵]𝔼[𝔼[B|V,𝒵]V|𝒵]\displaystyle=\mathbb{E}[\mathbb{E}[A^{\prime}\,|\,U,\mathscr{Z}]\cdot U^{\prime}\,|\,\mathscr{Z}]\cdot\mathbb{E}[\mathbb{E}[B^{\prime}\,|\,V,\mathscr{Z}]\cdot V^{\prime}\,|\,\mathscr{Z}]
=𝔼[𝔼[A|U,𝒵]U𝔼[B|V,𝒵]V|𝒵].\displaystyle=\mathbb{E}[\mathbb{E}[A^{\prime}\,|\,U,\mathscr{Z}]\cdot U^{\prime}\cdot\mathbb{E}[B^{\prime}\,|\,V,\mathscr{Z}]\cdot V^{\prime}\,|\,\mathscr{Z}].

Proposition 4.5.

Let ξ\xi be the coordinate process on 𝐘\mathbf{Y}. For every n1n\geq 1, ξ\xi is relatively very weak Bernoulli over τnξ\tau^{n}\xi.

Proof.

Set η:=τnξ\eta:=\tau^{n}\xi. Relative very weak Bernoullicity was defined in Definition 2.15. We recall some notation: take λ𝒫({0,1}×{0,1})\lambda\in\mathscr{P}(\{0,1\}^{\mathbb{Z}}\times\{0,1\}^{\mathbb{Z}}) to be the law of (η,ξ)(\eta,\xi), and for I,JI,J\subset\mathbb{Z} and a,b{0,1}a,b\in\{0,1\}^{\mathbb{Z}}, λ(|aI,bJ)\lambda_{\ell}(\cdot\,|\,a_{I},b_{J}) is the conditional law of ξ[0,[\xi_{[0,\ell[} given that ηI=aI\eta_{I}=a_{I} and ξJ=bJ\xi_{J}=b_{J}.

Let ε>0\varepsilon>0. We need to show that there exists 1\ell\geq 1 such that for every m1m\geq 1 and for k1k\geq 1 large enough, we have

d¯(λ(|a[k,k],b[m,0]),λ(|a[k,k]))dλ(a,b)ε.\int\bar{d_{\ell}}\left(\lambda_{\ell}(\cdot\,|\,a_{[-k,k]},b_{[-m,0]}),\lambda_{\ell}(\cdot\,|\,a_{[-k,k]})\right)d\lambda(a,b)\leq\varepsilon. (27)

Let $m\geq 1$. We start by noting that some “1” must appear in $\eta$: indeed, the law of large numbers tells us that there exists $\ell_{0}\geq 1$ such that

μ({i[0,0[;ηi=1}:=A)1ε.\mu(\underset{:=A}{\underbrace{\{\exists i\in[0,\ell_{0}[\,;\;\eta_{i}=1\}}})\geq 1-\varepsilon. (28)

We then set $\ell:=\lceil\frac{1}{\varepsilon}\rceil\ell_{0}$. Next, we take $k\geq\ell_{0}$ so that $\eta_{[-k,k]}$ entirely determines $A$.

We fix i[0,0[i\in[0,\ell_{0}[. First, we note that, as we can see on the following image

[Uncaptioned image]

if $\eta_{i}=1$, then $(\xi_{]-\infty,i[},\eta_{]-\infty,i[})$ is $\xi_{]-\infty,i[}$-measurable and $(\xi_{]i,\infty[},\eta_{]i,\infty[})$ is $\xi_{]i+n,\infty[}$-measurable. So, since the variables $\{\xi_{j}\}_{j\in\mathbb{Z}}$ are independent, given $\{\eta_{i}=1\}$ the variables $(\xi_{]-\infty,i[},\eta_{]-\infty,i[})$ and $(\xi_{]i,\infty[},\eta_{]i,\infty[})$ are independent. Finally, using Lemma 4.4, for $a\in\{0,1\}^{\mathbb{Z}}$ such that $a_{i}=1$, we get:

\mathcal{L}(\xi_{]-\infty,i[},\xi_{]i,\infty[}\,|\,\eta_{[-k,k]}=a_{[-k,k]}) = \mathcal{L}(\xi_{]-\infty,i[}\,|\,\eta_{[-k,i[}=a_{[-k,i[},\,\eta_{i}=1)\otimes\mathcal{L}(\xi_{]i,\infty[}\,|\,\eta_{]i,k]}=a_{]i,k]},\,\eta_{i}=1)
= \mathcal{L}(\xi_{]-\infty,i[}\,|\,\eta_{[-k,k]}=a_{[-k,k]})\otimes\mathcal{L}(\xi_{]i,\infty[}\,|\,\eta_{[-k,k]}=a_{[-k,k]}).

Therefore, if η[k,k]\eta_{[-k,k]} is chosen so that there exists i[0,0[i\in[0,\ell_{0}[ such that ηi=1\eta_{i}=1, we see that ξ[m,0[\xi_{[-m,0[} and ξ[0,[\xi_{[\ell_{0},\ell[} are independent given η[k,k]\eta_{[-k,k]}.

We are now ready to prove (27). For any b{0,1}b\in\{0,1\}^{\mathbb{Z}} and any a{0,1}a\in\{0,1\}^{\mathbb{Z}} such that there exists i[0,0[i\in[0,\ell_{0}[ such that ai=1a_{i}=1, the fact that ξ[m,0]\xi_{[-m,0]} and ξ[0,[\xi_{[\ell_{0},\ell[} are relatively independent given {η[k,k]=a[k,k]}\{\eta_{[-k,k]}=a_{[-k,k]}\} implies that the measures λ(|a[k,k],b[m,0])\lambda_{\ell}(\cdot\,|\,a_{[-k,k]},b_{[-m,0]}) and λ(|a[k,k])\lambda_{\ell}(\cdot\,|\,a_{[-k,k]}) have the same marginal on the coordinates of [0,[[\ell_{0},\ell[. So the relative product of those measures over ξ[0,[\xi_{[\ell_{0},\ell[} is a coupling under which the copies of ξ[0,[\xi_{[\ell_{0},\ell[} coincide. It follows that

d¯(λ(|a[k,k],b[m,0]),λ(|a[k,k]))0/ε.\bar{d_{\ell}}\left(\lambda_{\ell}(\cdot\,|\,a_{[-k,k]},b_{[-m,0]}),\lambda_{\ell}(\cdot\,|\,a_{[-k,k]})\right)\leq\ell_{0}/\ell\leq\varepsilon. (29)

By combining (28) and (29), we can conclude that

d¯(λ(|a[k,k],b[m,0]),λ(|a[k,k]))dλ(a,b)2ε.\int\bar{d_{\ell}}\left(\lambda_{\ell}(\cdot\,|\,a_{[-k,k]},b_{[-m,0]}),\lambda_{\ell}(\cdot\,|\,a_{[-k,k]})\right)d\lambda(a,b)\leq 2\varepsilon.

Proof of Theorem 4.2.

First of all, (26) follows directly from Proposition 4.3. Next, from Proposition 4.5, it follows that 0\mathscr{F}_{0} is relatively very weak Bernoulli over n\mathscr{F}_{n}, so n+1\mathscr{F}_{n+1} is relatively very weak Bernoulli over n\mathscr{F}_{n} (by part (iii) of Lemma 2.17), so n+1\mathscr{F}_{n+1} is relatively Bernoulli over n\mathscr{F}_{n} (by part (i) of Lemma 2.17). ∎

4.2 A cellular automaton on Ornstein’s K-process

Here, we consider the non-Bernoulli K-system introduced by Ornstein in [15]. A more detailed presentation of this system is given in [16, Part III], but we give a sketch of the construction for completeness. It is a process defined on the alphabet {0,e,f,s}\{0,e,f,s\}. We set h(r)h(r), s(r)s(r) and f(r)f(r) to be integers depending on rr\in\mathbb{N} used in the construction of the process. For r1r\geq 1, an rr-block is a random sequence of length h(r)h(r) on the alphabet {0,e,f,s}\{0,e,f,s\}, whose law we define inductively.

To get a $1$-block, take $k_{1}\in\llbracket 1,f(1)-1\rrbracket$ chosen uniformly at random, and consider a sequence that starts with a string of $k_{1}$ “f”, followed by a string of $h(0)$ “0”, and ends with a string of $f(1)-k_{1}$ “e”:

[Uncaptioned image]

This construction implies that h(1)=h(0)+f(1)h(1)=h(0)+f(1).

To get an rr-block, take kr1,f(r)1k_{r}\in\llbracket 1,f(r)-1\rrbracket chosen uniformly at random, and 2r2^{r} i.i.d. random variables (ξi(r1))i1,2r(\xi^{(r-1)}_{i})_{i\in\llbracket 1,2^{r}\rrbracket} such that each ξi(r1)\xi^{(r-1)}_{i} is an (r1)(r-1)-block. The rr-block is then built as follows:

[Uncaptioned image]

So an $r$-block starts with a string of $k_{r}$ “f”, and ends with a string of $f(r)-k_{r}$ “e”. In between, we put all the $(r-1)$-blocks separated by strings of “s”, so that each $\xi_{i}^{(r-1)}$ is placed between two strings of “s” of respective lengths $i\,s(r)$ and $(i+1)\,s(r)$. In particular, $h(r)$ is entirely determined by $h(r-1)$, $f(r)$ and $s(r)$.
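To make the inductive description concrete, here is a sketch that samples an $r$-block. The parameter values used below ($h(0)=2$, $f(r)\equiv 3$, $s(r)\equiv 2$) are toy assumptions, far from the values needed in Ornstein's construction; only the layout — $k_{r}$ “f”, the $2^{r}$ $(r-1)$-blocks between growing runs of “s”, then $f(r)-k_{r}$ “e” — follows the text. The length recursion this layout implies is checked against the assembled block.

```python
import random

def r_block(r, h0, f, s, rng):
    """Sample an r-block following the inductive description in the text.
    h0 is h(0); f and s are the functions r -> f(r), r -> s(r)."""
    if r == 1:
        k = rng.randrange(1, f(1))            # k_1 uniform in [1, f(1)-1]
        return ['f'] * k + ['0'] * h0 + ['e'] * (f(1) - k)
    k = rng.randrange(1, f(r))
    block = ['f'] * k
    for i in range(1, 2 ** r + 1):            # 2^r i.i.d. (r-1)-blocks
        block += ['s'] * (i * s(r))           # run of s of length i*s(r) before xi_i
        block += r_block(r - 1, h0, f, s, rng)
    block += ['s'] * ((2 ** r + 1) * s(r))    # final run of s after xi_{2^r}
    block += ['e'] * (f(r) - k)
    return block

def h_len(r, h0, f, s):
    """Length recursion implied by the layout: the s-runs contribute
    s(r) * (1 + 2 + ... + (2^r + 1)) in total."""
    if r == 1:
        return h0 + f(1)
    return f(r) + 2 ** r * h_len(r - 1, h0, f, s) \
        + s(r) * (2 ** r + 1) * (2 ** r + 2) // 2

rng = random.Random(0)
f = lambda r: 3   # assumed toy value of f(r)
s = lambda r: 2   # assumed toy value of s(r)
b = r_block(3, 2, f, s, rng)
assert len(b) == h_len(3, 2, f, s)
assert b[0] == 'f' and b[-1] == 'e'
```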

Ornstein’s K-system is then built by constructing an increasing sequence of towers (𝒯r)r1(\mathcal{T}_{r})_{r\geq 1} such that X:=r1𝒯rX:=\bigcup_{r\geq 1}\mathcal{T}_{r}. A tower 𝒯r\mathcal{T}_{r} is given by its base FrF_{r} for which the sets {TiFr}i[0,h(r)[\{T^{i}F_{r}\}_{i\in[0,h(r)[} are disjoint and

𝒯r:=i=0h(r)1TiFr.\mathcal{T}_{r}:=\bigsqcup_{i=0}^{h(r)-1}T^{i}F_{r}.

Through a cutting and stacking method, Ornstein builds in [15] the towers (𝒯r)r1(\mathcal{T}_{r})_{r\geq 1} along with a process ξ\xi so that the law of ξ[0,h(r)[\xi_{[0,h(r)[} given FrF_{r} is the law of an rr-block. In other words, this means that the columns of the form

C𝐚:=i=0h(r)1Ti(Fr{ξ[0,h(r)[=𝐚}), for 𝐚{0,e,f,s}h(r),C_{\mathbf{a}}:=\bigsqcup_{i=0}^{h(r)-1}T^{i}(F_{r}\cap\{\xi_{[0,h(r)[}=\mathbf{a}\}),\,\text{ for }\mathbf{a}\in\{0,e,f,s\}^{h(r)},

partition $\mathcal{T}_{r}$ according to the law of an $r$-block. Denote by $\mathbf{X}:=(X,\mathscr{A},\mu,T)$ the resulting dynamical system. A proper choice of $h(r)$, $s(r)$ and $f(r)$ ensures that this construction gives a finite measure. Then $\xi$ is a factor map onto the system

𝐘:=({0,e,f,s},,ν,S),\mathbf{Y}:=(\{0,e,f,s\}^{\mathbb{Z}},\mathscr{B},\nu,S),

where ν\nu is the law of ξ\xi.

Since ξ\xi is a process on the alphabet {0,e,f,s}\{0,e,f,s\}, the local function (25) becomes:

τ0:{0,e,f,s}2{0,e,f,s}(α1,α2){α1 if α1=α20 otherwise.\begin{array}[]{cccc}\tau_{0}:&\{0,e,f,s\}^{2}&\longrightarrow&\{0,e,f,s\}\\ &(\alpha_{1},\alpha_{2})&\mapsto&\left\{\begin{array}[]{ll}\alpha_{1}&\text{ if }\,\alpha_{1}=\alpha_{2}\\ 0&\text{ otherwise}.\end{array}\right.\end{array}

From now on, τ\tau denotes the corresponding cellular automaton. Similarly to what we did in Section 4.1, we prove

Theorem 4.6.

On the system 𝐘:=({0,e,f,s},,ν,S)\mathbf{Y}:=(\{0,e,f,s\}^{\mathbb{Z}},\mathscr{B},\nu,S), the filtration given by :=(σ(τ|n|))n0\mathscr{F}:=(\sigma(\tau^{|n|}))_{n\leq 0} is a weak Pinsker filtration. That is, for every n1n\leq-1, n+1\mathscr{F}_{n+1} is relatively Bernoulli over n\mathscr{F}_{n} and we have

hν(n)n0.h_{\nu}(\mathscr{F}_{n})\underset{n\rightarrow-\infty}{\longrightarrow}0. (30)

The overall structure of the proof will resemble Section 4.1, but the details are adapted to the specific structure of Ornstein’s process. First, the convergence to 0 of the entropy follows from Proposition 4.1. We could also adapt the proof of Proposition 4.3 to get that convergence, but it does not give a better rate of convergence than Proposition 4.1, so we do not give any details.

Proposition 4.7.

If ξ\xi is the process defined above, then for every n1n\geq 1, ξ\xi is relatively very weak Bernoulli over τnξ\tau^{n}\xi.

Proof.

We set η:=τnξ\eta:=\tau^{n}\xi. Let ε>0\varepsilon>0. Once again, we need to show that there exists 1\ell\geq 1 such that for every m1m\geq 1 and for k1k\geq 1 large enough, we have

d¯(λ(|a[k,k],b[m,0]),λ(|a[k,k]))dλ(a,b)ε,\int\bar{d_{\ell}}\left(\lambda_{\ell}(\cdot\,|\,a_{[-k,k]},b_{[-m,0]}),\lambda_{\ell}(\cdot\,|\,a_{[-k,k]})\right)d\lambda(a,b)\leq\varepsilon,

where λ\lambda is the law of (η,ξ)(\eta,\xi) and, for I,JI,J\subset\mathbb{Z}, λ(|aI,bJ)\lambda_{\ell}(\cdot\,|\,a_{I},b_{J}) is the conditional law of ξ[0,[\xi_{[0,\ell[} given that ηI\eta_{I} equals aIa_{I} and that ξJ\xi_{J} equals bJb_{J}.

Let $m\geq 1$. We choose $r$ so that $s(r+1)\geq n+1$. By construction of $\xi$, for any $r$-block in $\xi$, there exists $i\in[1,2^{r+1}]$ such that this $r$-block comes after a string of $i\cdot s(r+1)$ “s” and is followed by a string of $(i+1)\cdot s(r+1)$ “s”. Therefore, by knowing the positions of all the strings of “s” of length at least $s(r+1)$, we know the position of every $r$-block.

Moreover, since we chose $s(r+1)\geq n+1$, we can say that, for $k\in\mathbb{Z}$, we have $\xi_{[k,k+s(r+1)[}=(s,...,s)$ if and only if $\eta_{[k,k+s(r+1)-n[}=(s,...,s)$. This means that the positions of the $r$-blocks contained in a segment $[k_{1},k_{2}]$ are $\eta_{[k_{1}-N,k_{2}+N]}$-measurable, for $N$ large enough (for example $N=(2^{r+1}+1)s(r+1)$).
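The combinatorial step just used — reading off the block positions from the long runs of “s” — amounts to elementary run detection. A sketch on a toy word (the word and the threshold are illustrative assumptions, not Ornstein's parameters):

```python
def long_s_runs(seq, threshold):
    """Return the [start, end) intervals of the maximal runs of 's'
    whose length is at least `threshold`."""
    runs, i = [], 0
    while i < len(seq):
        if seq[i] == 's':
            j = i
            while j < len(seq) and seq[j] == 's':
                j += 1
            if j - i >= threshold:
                runs.append((i, j))
            i = j
        else:
            i += 1
    return runs

# Toy word: two mock blocks 'f0e' flanked by s-runs of lengths 2, 4 and 2;
# with threshold 4 only the long middle separator is detected.
w = list('ss' + 'f0e' + 'ssss' + 'f0e' + 'ss')
assert long_s_runs(w, 4) == [(5, 9)]
```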

By choosing rr large enough, we can also assume that μ(𝒯r)1ε/2\mu(\mathcal{T}_{r})\geq 1-\varepsilon/2. Using Birkhoff’s ergodic theorem, for \ell large enough, the set

A:={xX;1j=01𝟙𝒯r(Tj(x))>1ε},A:=\left\{x\in X;\,\frac{1}{\ell}\sum_{j=0}^{\ell-1}\mathbbm{1}_{\mathcal{T}_{r}}(T^{j}(x))>1-\varepsilon\right\},

satisfies μ(A)>1ε\mu(A)>1-\varepsilon.

In other words, for xAx\in A, the number of elements in the sequence ξ[0,[(x)\xi_{[0,\ell[}(x) that are part of an rr-block is greater than (1ε)(1-\varepsilon)\ell. However, among the intervals on which those rr-blocks are supported, two of them may not be included in [0,[[0,\ell[, and can intersect \[0,[\mathbb{Z}\backslash[0,\ell[. But, if h(r)/ε/2h(r)/\ell\leq\varepsilon/2, there are at most ε\varepsilon\ell elements in those two intervals. To sum up, we get that the number of elements in the sequence ξ[0,[(x)\xi_{[0,\ell[}(x) that are part of an rr-block, and for which the position of that rr-block is contained on the segment [0,[[0,\ell[, is greater than (12ε)(1-2\varepsilon)\ell. Then, we choose k1k\geq 1 so that the positions of the rr-blocks contained in [m,[[-m,\ell[ are η[k,k]\eta_{[-k,k]}-measurable (in particular, AA is η[k,k]\eta_{[-k,k]}-measurable). So we have the following configuration for ξ[m,[\xi_{[-m,\ell[}:

[Uncaptioned image]

where the BiB_{i} are the positions of the rr-blocks supported on [0,[[0,\ell[, and we have shown that #i=1pBi(12ε).\#\bigsqcup_{i=1}^{p}B_{i}\geq(1-2\varepsilon)\ell.

Denote by $I_{\ell}:=\{B_{i}\}_{1\leq i\leq p}$ the random variable that gives the positions of the $r$-blocks on the segment $[0,\ell[$. By construction of $\xi$, we know that, given $I_{\ell}$, for any $r$-block $B$, the variables $\xi_{B}$ and $\xi_{B^{c}}$ are independent. Moreover, we know that any $r$-block lies between two strings of at least $n+1$ “s”. Therefore, we see that if $I_{\ell}$ is fixed, for any $r$-block $B$, $\eta_{B}$ is $\xi_{B}$-measurable and $\eta_{B^{c}}$ is $\xi_{B^{c}}$-measurable.

Let us give details on the proof of that last claim: we write BcB^{c} as the union of BB^{-} and B+B^{+}, the infinite intervals that come before and after BB respectively. Given the structure of our automaton, it is always true that ηB+\eta_{B^{+}} is ξB+\xi_{B^{+}}-measurable. At the boundary between BB^{-} and BB, we have the following configuration:

[Uncaptioned image]

Indeed, in the construction of the blocks, we see that $\xi$ must put an “f” in the first box of $B$. Therefore, we must have “0” in the red boxes, so the values that $\eta$ takes on the $n+1$ boxes preceding $B$ are determined. For the remaining boxes of $B^{-}$, it follows from the structure of $\tau$ that the values of $\eta$ are determined by $\xi_{B^{-}}$, since we are at distance at least $n+1$ from $B$. This shows that $\eta_{B^{-}}$ is $\xi_{B^{-}}$-measurable. A similar reasoning at the boundary between $B$ and $B^{+}$ shows that $\eta_{B}$ is $\xi_{B}$-measurable. Since $\eta_{B^{+}}$ is always $\xi_{B^{+}}$-measurable, we have proven that $\eta_{B}$ is $\xi_{B}$-measurable and $\eta_{B^{c}}$ is $\xi_{B^{c}}$-measurable.

But, we also know from the structure of ξ\xi that, given II_{\ell}, ξB\xi_{B} and ξBc\xi_{B^{c}} are independent. The previous paragraph enables us to use Lemma 4.4 to extend that to: given Iη[k,k]I_{\ell}\vee\eta_{[-k,k]}, ξB\xi_{B} and ξBc\xi_{B^{c}} are independent. Finally, since II_{\ell} is η[k,k]\eta_{[-k,k]}-measurable, this yields that ξB\xi_{B} and ξBc\xi_{B^{c}} are relatively independent given η[k,k]\eta_{[-k,k]}.

This independence tells us that, for all sequences $a$ and $b$, $\lambda_{\ell}(\cdot\,|\,a_{[-k,k]},b_{[-m,0]})$ and $\lambda_{\ell}(\cdot\,|\,a_{[-k,k]})$ have the same marginals on the coordinates of the $r$-blocks $B$ contained in $[0,\ell[$. Moreover, if $a$ is chosen so that $\{\eta_{[-k,k]}=a_{[-k,k]}\}$ is a subset of $A$, we know that the positions of the $r$-blocks cover at least $(1-2\varepsilon)\ell$ elements of $[0,\ell[$. Then, by considering the relative product of $\lambda_{\ell}(\cdot\,|\,a_{[-k,k]},b_{[-m,0]})$ and $\lambda_{\ell}(\cdot\,|\,a_{[-k,k]})$ over $\{\xi_{B_{i}}\}_{1\leq i\leq p}$, we get:

d¯(λ(|a[k,k],b[m,0]),λ(|a[k,k]))2ε.\bar{d_{\ell}}\left(\lambda_{\ell}(\cdot\,|\,a_{[-k,k]},b_{[-m,0]}),\lambda_{\ell}(\cdot\,|\,a_{[-k,k]})\right)\leq 2\varepsilon.

Finally, since μ(A)1ε\mu(A)\geq 1-\varepsilon, this yields

\int\bar{d_{\ell}}\left(\lambda_{\ell}(\cdot\,|\,a_{[-k,k]},b_{[-m,0]}),\lambda_{\ell}(\cdot\,|\,a_{[-k,k]})\right)d\lambda(a,b)\leq 3\varepsilon.

Remark 4.8.

We see that the proofs of Theorem 4.6 and Theorem 4.2 are very similar. In both cases, we have a process $\xi$ whose conditional law given $\tau^{n}\xi$ is made of random blocks separated by deterministic blocks, and the random blocks are filled independently from each other. The main difference that prevents Ornstein's K-process from being Bernoulli is that the positions of the $r$-blocks are determined by the long strings of “s”, and this creates correlations over long distances. But once we condition on $\tau^{n}\xi$, those strings of “s” are entirely determined. We are therefore left with filling all the $r$-blocks independently, and the past no longer has a significant influence on the future.

In that sense, when we look at the relative structure of Ornstein's K-process over $\tau^{n}$, the non-Bernoulli aspects disappear. However, the asymptotic properties of the weak Pinsker filtrations obtained by applying $\{\tau^{n}\}_{n\geq 1}$ differ depending on whether we start with a Bernoulli process or with a non-Bernoulli K-process. Therefore, a better understanding of the various properties of weak Pinsker filtrations could help to develop a new classification of non-Bernoulli K-systems.

Acknowledgements

The author thanks Jean-Paul Thouvenot for the fruitful discussions, the insights and the ongoing exchanges regarding the present work.

References

  • [1] Tim Austin. Measure concentration and the weak Pinsker property. Publ. Math., Inst. Hautes Étud. Sci., 128:1–119, 2018.
  • [2] Séverin Benzoni. Confined extensions and non-standard dynamical filtrations. Stud. Math., 276(3):233–270, 2024.
  • [3] P. Billingsley. Ergodic theory and information. John Wiley & Sons, Hoboken, NJ, 1965.
  • [4] I. P. Cornfeld, S. V. Fomin, and Ya. G. Sinai. Ergodic theory. Translated from the Russian by A. B. Sossinskii, volume 245. Springer, Berlin, 1982.
  • [5] M. Émery and W. Schachermayer. On Vershik’s standardness criterion and Tsirelson’s notion of cosiness. In Séminaire de Probabilités XXXV, pages 265–305. Berlin: Springer, 2001.
  • [6] Eli Glasner. Ergodic theory via joinings, volume 101. Providence, RI: American Mathematical Society (AMS), 2003.
  • [7] Marshall Hall, Jr. Combinatorial theory. Wiley Classics Library. John Wiley & Sons, Inc., New York, second edition, 1998. A Wiley-Interscience Publication.
  • [8] W. Krieger. On entropy and generators of measure-preserving transformations. Trans. Am. Math. Soc., 149:453–464, 1970.
  • [9] Paul Lanthier. Aspects ergodiques et algébriques des automates cellulaires. PhD thesis, Université de Rouen Normandie, 2020.
  • [10] Paul Lanthier and Thierry De La Rue. Classification of backward filtrations and factor filtrations: examples from cellular automata. Ergodic Theory and Dynamical Systems, 42(9):2890–2922, 2022.
  • [11] D. Ornstein. Bernoulli shifts with the same entropy are isomorphic. Adv. Math., 4:337–352, 1970.
  • [12] Donald Ornstein. Two Bernoulli shifts with infinite entropy are isomorphic. Adv. Math., 5:339–348, 1971.
  • [13] Donald S. Ornstein. A K-automorphism with no square root and Pinsker’s conjecture. Adv. Math., 10:89–102, 1973.
  • [14] Donald S. Ornstein. A mixing transformation for which Pinsker’s conjecture fails. Adv. Math., 10:103–123, 1973.
  • [15] Donald S. Ornstein. An example of a Kolmogorov automorphism that is not a Bernoulli shift. Adv. Math., 10:49–62, 1973.
  • [16] Donald S. Ornstein. Ergodic theory, randomness, and dynamical systems, volume 5 of Yale Mathematical Monographs. Yale University Press, New Haven - London, 1974.
  • [17] Donald S. Ornstein and Benjamin Weiss. Finitely determined implies very weak Bernoulli. Isr. J. Math., 17:94–104, 1974.
  • [18] William Parry. Topics in ergodic theory. Reprint of the 1981 original, volume 75 of Camb. Tracts Math. Cambridge, MA: Cambridge University Press, 2004.
  • [19] M. S. Pinsker. Dynamical systems with completely positive or zero entropy. Sov. Math., Dokl., 1:937–938, 1960.
  • [20] M. Rahe. Relatively finitely determined implies relatively very weak Bernoulli. Can. J. Math., 30:531–548, 1978.
  • [21] C. E. Shannon. A mathematical theory of communication. Bell System Tech. J., 27:379–423, 623–656, 1948.
  • [22] Paul Shields. The theory of Bernoulli shifts. Chicago Lectures in Mathematics. The University of Chicago Press, Chicago - London, 1973.
  • [23] Ja. G. Sinaĭ. On a weak isomorphism of transformations with invariant measure. Mat. Sb. (N.S.), 63 (105):23–42, 1964.
  • [24] J.-P. Thouvenot. On the stability of the weak Pinsker property. Isr. J. Math., 27:150–162, 1977.
  • [25] J.-P. Thouvenot. Two facts concerning the transformations which satisfy the weak Pinsker property. Ergodic Theory Dyn. Syst., 28(2):689–695, 2008.
  • [26] Jean-Paul Thouvenot. Quelques propriétés des systèmes dynamiques qui se decomposent en un produit de deux systèmes dont l’un est un schema de Bernoulli. Isr. J. Math., 21:177–207, 1975.
  • [27] Jean-Paul Thouvenot. Entropy, isomorphism and equivalence in ergodic theory. In Handbook of dynamical systems. Volume 1A, pages 205–238. Amsterdam: North-Holland, 2002.